Wikidata Training: the MediaWiki API

Table of Contents

  1. Documentation
  2. Data model
  3. Use the API directly
  4. Use pywikibot

Documentation

Wikidata uses the Wikibase extension to MediaWiki (the wiki software), so its API documentation lives on the MediaWiki site.

All Wikibase-specific API calls start with wb, e.g.:
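A few of these modules are wbgetentities, wbsearchentities, wbgetclaims, wbeditentity and wbcreateclaim. As a minimal sketch of what such a call looks like, the snippet below only builds request URLs and sends nothing. The helper name build_api_url is made up for this example; the endpoint and the parameter names are the ones I know from the public Wikidata API, so please check them against the documentation.

```python
from urllib.parse import urlencode

# Public Wikidata endpoint of the MediaWiki action API.
API_URL = "https://www.wikidata.org/w/api.php"


def build_api_url(action, **params):
    """Return a GET URL for the given API action (no request is sent).

    Hypothetical helper for illustration only.
    """
    query = {"action": action, "format": "json", **params}
    return f"{API_URL}?{urlencode(query)}"


# Search for entities whose label matches "Berlin".
search_url = build_api_url("wbsearchentities", search="Berlin", language="en")

# Fetch only the English label of item Q64 (Berlin).
entity_url = build_api_url("wbgetentities", ids="Q64", props="labels", languages="en")

print(search_url)
print(entity_url)
```

Pasting either URL into a browser should return a JSON document.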

Let's play around with the sandbox a bit to get a feeling for the API:

Data model

The Wikibase data model needs some explanation.

Items

Items are Entities that are typically represented by a wiki page (at least in some Wikipedia languages). They can be viewed as "the thing that a wiki page is about," which could be an individual thing (the person Albert Einstein), a general class of things (the class of all physicists), or any other concept that is the subject of some Wikipedia page (including things like History of Berlin).

Examples:
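Items are identified by a Q-number, for instance Q937 (Albert Einstein) or Q64 (Berlin). The sketch below shows roughly what an Item looks like in the JSON returned by the API, written as a Python dict. It is heavily abridged, the description text is illustrative rather than copied from Wikidata, and get_label is a helper made up for this example.

```python
# Abridged, illustrative Item in the API's JSON layout.
item = {
    "id": "Q64",
    "type": "item",
    "labels": {
        "en": {"language": "en", "value": "Berlin"},
        "de": {"language": "de", "value": "Berlin"},
    },
    # Illustrative wording, not the live description.
    "descriptions": {"en": {"language": "en", "value": "capital of Germany"}},
    # Links from the Item to the wiki pages that are about it.
    "sitelinks": {"enwiki": {"site": "enwiki", "title": "Berlin"}},
    # Statements are grouped by property ID under "claims" (left empty here).
    "claims": {},
}


def get_label(entity, language="en"):
    """Return the label in the given language, or None if there is none."""
    term = entity.get("labels", {}).get(language)
    return term["value"] if term else None


print(get_label(item))
print(get_label(item, "fr"))
```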

Properties

Properties are Entities that describe a relationship between Items (or other Entities) and Values of the property. Typical properties are population (using numbers as values), binomial name (using strings as values), but also has father and author of (both using Items as values).

Examples:
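Properties are identified by a P-number, and each Property has a fixed datatype that determines what kind of value it takes. The mapping below is a small sketch from memory (P1082 population, P225 taxon name, P22 father), so please verify the IDs on Wikidata before relying on them. The helper entity_kind is made up for this example and only distinguishes Items from Properties.

```python
# Property ID -> datatype, as far as I recall them; verify before use.
PROPERTY_DATATYPES = {
    "P1082": "quantity",      # population: a number
    "P225": "string",         # taxon name: a string
    "P22": "wikibase-item",   # father: another Item
}


def entity_kind(entity_id):
    """Classify an entity ID by its prefix (Q = item, P = property)."""
    if entity_id.startswith("Q"):
        return "item"
    if entity_id.startswith("P"):
        return "property"
    raise ValueError(f"unexpected entity ID: {entity_id!r}")


for pid, datatype in PROPERTY_DATATYPES.items():
    print(pid, entity_kind(pid), datatype)
```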

Snaks

Snaks are the basic information structures used to describe Entities in Wikidata. They are an integral part of each Statement (which can be viewed as a collection of Snaks about an Entity, together with a list of references).

Examples:

Note: Snaks do not mention the subject to which they refer (Berlin, Zürich); the subject is given by the context in which a Snak is used (typically as part of a Statement).
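To make this concrete, here is a sketch of a Snak in the API's JSON layout, written as a Python dict. The population figure is invented for illustration, and snak_value is a helper made up for this example. The field "snaktype" is "value" for a concrete value, and "somevalue" or "novalue" when the value is unknown or known not to exist.

```python
# A Snak saying "population (P1082) is 3,500,000".
# The amount is illustrative, not a real figure.
population_snak = {
    "snaktype": "value",
    "property": "P1082",
    "datatype": "quantity",
    "datavalue": {
        "type": "quantity",
        "value": {"amount": "+3500000", "unit": "1"},
    },
}

# A Snak saying "father (P22): no value", with no datavalue at all.
no_father_snak = {"snaktype": "novalue", "property": "P22"}


def snak_value(snak):
    """Return the raw value of a Snak, or None for somevalue/novalue."""
    if snak["snaktype"] != "value":
        return None
    return snak["datavalue"]["value"]


print(snak_value(population_snak))
print(snak_value(no_father_snak))
```

Neither dict contains a key for the subject: the same population Snak could be attached to Berlin or to Zürich.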

Statements

A Statement consists of a claim about an Entity together with a list of references for this claim. Every Statement refers to one particular Entity, called the subject of the Statement. There is always one main Snak that forms the most important part of the Statement. In addition, there can be zero or more PropertySnaks that describe the Statement in more detail. These qualifier Snaks (or "qualifiers" for short) store additional information that does not directly refer to the subject (e.g., the time at which the main part of the Statement was valid). References are provided as a list (the order is significant in some contexts, especially for displaying a main reference).

Examples:
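The sketch below assembles one Statement in the API's JSON layout: a main Snak, one qualifier and one reference. All values are invented for illustration, the time value is abridged (the real format has more fields), and the helpers make_snak and summarize are made up for this example. The property IDs are the ones I recall (P585 point in time, P854 reference URL).

```python
def make_snak(prop, datatype, value_type, value):
    """Build a value Snak in the JSON layout used by the API."""
    return {
        "snaktype": "value",
        "property": prop,
        "datatype": datatype,
        "datavalue": {"type": value_type, "value": value},
    }


statement = {
    "type": "statement",
    "rank": "normal",
    # Main Snak: population (P1082), illustrative amount.
    "mainsnak": make_snak(
        "P1082", "quantity", "quantity", {"amount": "+3500000", "unit": "1"}
    ),
    # Qualifiers, grouped by property: point in time (P585).
    "qualifiers": {
        "P585": [
            make_snak(
                "P585", "time", "time",
                {"time": "+2020-12-31T00:00:00Z", "precision": 11},
            )
        ]
    },
    # References: a list in which each entry groups its own Snaks.
    "references": [
        {
            "snaks": {
                # Hypothetical reference URL (P854).
                "P854": [
                    make_snak("P854", "url", "string", "https://example.org/census")
                ]
            }
        }
    ],
}


def summarize(statement):
    """Return the main property, qualifier properties and reference count."""
    return {
        "main_property": statement["mainsnak"]["property"],
        "qualifier_properties": sorted(statement.get("qualifiers", {})),
        "reference_count": len(statement.get("references", [])),
    }


print(summarize(statement))
```

The subject does not appear inside the Statement either. In an Item's JSON, Statements are stored under that Item's "claims" key, which is how the subject is known.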

Use the API directly
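As a minimal sketch, the API can be called with nothing but the Python standard library. The function names and the User-Agent string below are made up for this example. The demo at the bottom runs offline against a hand-written sample response in the layout that wbgetentities returns; fetch_label would perform the real request and needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://www.wikidata.org/w/api.php"
# Hypothetical User-Agent; replace it with one that identifies your tool.
USER_AGENT = "wikidata-training-example/0.1"


def build_request(entity_id, language="en"):
    """Prepare (but do not send) a wbgetentities request for one label."""
    query = urlencode({
        "action": "wbgetentities",
        "ids": entity_id,
        "props": "labels",
        "languages": language,
        "format": "json",
    })
    return Request(f"{API_URL}?{query}", headers={"User-Agent": USER_AGENT})


def extract_label(response, entity_id, language="en"):
    """Pull one label out of a decoded wbgetentities response."""
    return response["entities"][entity_id]["labels"][language]["value"]


def fetch_label(entity_id, language="en"):
    """Send the request and return the label (needs network access)."""
    with urlopen(build_request(entity_id, language), timeout=30) as reply:
        return extract_label(json.load(reply), entity_id, language)


# Offline demo with a hand-written sample response.
sample_response = {
    "entities": {
        "Q64": {
            "type": "item",
            "id": "Q64",
            "labels": {"en": {"language": "en", "value": "Berlin"}},
        }
    },
    "success": 1,
}
print(extract_label(sample_response, "Q64"))
```

With network access, fetch_label("Q64") sends the same request to the live API.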

Use pywikibot

pywikibot is a Python library based on the MediaWiki API. In this notebook we will see how to use the API from Python with pywikibot and lay the groundwork to later develop a bot or tool for Wikidata.

Use pywikibot for Wikidata:
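The following is a minimal sketch of reading an Item with pywikibot, assuming pywikibot is installed and configured (see the setup notes further down). The function name fetch_item_label is made up for this example. The import sits inside the function because importing pywikibot normally requires a user-config.py; calling the function needs network access.

```python
def fetch_item_label(qid, language="en"):
    """Return the label of a Wikidata Item, or None if it has none.

    Hypothetical helper; needs a configured pywikibot and network access.
    """
    # Imported here so that this snippet can be loaded without a user-config.py.
    import pywikibot

    # Connect to Wikidata and get its data repository.
    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()

    # Load the Item; get() fetches labels, descriptions, claims, etc.
    item = pywikibot.ItemPage(repo, qid)
    item.get()

    return item.labels.get(language)
```

For example, fetch_item_label("Q64") should return the English label of the Item Q64.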

If you want to set up pywikibot on your computer, check this tutorial: https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial/Setting_up_Shop

Quick steps:

  1. Create a new directory for your project
  2. Clone pywikibot in this directory: git clone --recursive https://gerrit.wikimedia.org/r/pywikibot/core.git pywikibot
  3. In the pywikibot directory, run python generate_user_files.py to create user-config.py
  4. Run python pwb.py login to log in with your account