Accessing Wikidata Content

This notebook provides a tutorial on how to access content in Wikidata via either the JSON dumps or the API. It has three stages:

Accessing the Wikidata JSON Dumps

This is an example of how to parse the JSON dumps and gather statements (property:value) for all items with at least one Wikipedia sitelink. The filtering can be adjusted to whatever is desired. Note that the JSON dumps are over 50 GB compressed, so processing them can easily take a full day. If this is done via PAWS, the service will time out.
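The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the notebook's own code: it assumes the dump is a bz2-compressed file with one entity per line inside a JSON array, and the function names (`iter_items`, `extract_statements`, `has_wikipedia_sitelink`, `parse_dump`) and the `NON_WIKIPEDIA_WIKIS` exclusion set are hypothetical helpers introduced here. The sitelink check is a heuristic and the exclusion set may be incomplete.

```python
import bz2
import json

# Assumption: site keys ending in "wiki" are Wikipedias, except for a few
# sister projects that share the suffix. This set may be incomplete.
NON_WIKIPEDIA_WIKIS = {"commonswiki", "specieswiki", "metawiki",
                       "wikidatawiki", "mediawikiwiki", "sourceswiki"}


def has_wikipedia_sitelink(entity):
    """Return True if the entity links to at least one Wikipedia edition."""
    return any(site.endswith("wiki") and site not in NON_WIKIPEDIA_WIKIS
               for site in entity.get("sitelinks", {}))


def extract_statements(entity):
    """Collect (property, value) pairs from the entity's main snaks."""
    pairs = []
    for prop, claims in entity.get("claims", {}).items():
        for claim in claims:
            snak = claim.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # skip "somevalue" / "novalue" snaks
            value = snak["datavalue"]["value"]
            # Item-valued statements carry a dict with an "id" such as "Q5".
            if isinstance(value, dict) and "id" in value:
                value = value["id"]
            pairs.append((prop, value))
    return pairs


def iter_items(lines):
    """Yield (item_id, statements) for items with a Wikipedia sitelink."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # array brackets and blank lines carry no entity
        entity = json.loads(line)
        if entity.get("type") == "item" and has_wikipedia_sitelink(entity):
            yield entity["id"], extract_statements(entity)


def parse_dump(path):
    """Stream a bz2-compressed dump without loading it into memory."""
    with bz2.open(path, "rt", encoding="utf-8") as fin:
        yield from iter_items(fin)
```

Because `iter_items` accepts any iterable of lines, the filter can be tried on a handful of sample lines before committing to a full pass over the dump with `parse_dump`.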

Show an example of the data

Accessing the Wikidata APIs

The Wikidata APIs can be much faster than the dumps if you already know which items you are interested in and there are relatively few of them (e.g., hundreds or low thousands). To demonstrate, we'll show how to use the wbgetentities endpoint, which allows you to get all the statements and sitelinks associated with a Wikidata item. We choose a random sample of 10 items from the JSON dump to compare.
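A call to wbgetentities might look roughly like the sketch below. It uses only the standard library and is not necessarily how the notebook issues its requests. The helper names (`build_params`, `fetch_entities`) and the User-Agent string are hypothetical, and the batch size of 50 assumes the usual per-request limit on ids for clients without elevated rights.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"
BATCH_SIZE = 50  # assumed per-request id limit for ordinary clients


def build_params(qids):
    """Build the query parameters for one wbgetentities request."""
    return {
        "action": "wbgetentities",
        "ids": "|".join(qids),        # e.g. "Q42|Q64"
        "props": "claims|sitelinks",  # statements and sitelinks only
        "format": "json",
    }


def fetch_entities(qids, user_agent="wikidata-tutorial-example/0.1 (hypothetical)"):
    """Fetch entities in batches and return a dict keyed by item ID."""
    entities = {}
    for start in range(0, len(qids), BATCH_SIZE):
        batch = qids[start:start + BATCH_SIZE]
        url = API_URL + "?" + urllib.parse.urlencode(build_params(batch))
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        entities.update(data.get("entities", {}))
    return entities
```

Calling `fetch_entities` with the sampled item IDs returns each entity's `claims` and `sitelinks` in the same JSON structure used in the dumps, which makes a side-by-side comparison straightforward.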

NOTE: the APIs are up to date, while the JSON dumps are snapshots taken at specific points in time and are always at least several days behind. The data you get from the JSON dumps might therefore differ from what the APIs return if users have edited the Wikidata items in the intervening days.