SPARQL as a Wikidata Pywikibot generator

In Wikidata, complex queries can be performed because the data is stored in a structured way. SPARQL is the query language used by the Wikibase technology (which drives Wikidata).

SPARQL is designed for querying what is loosely called key-value-like data, which matches how Wikidata stores its data: as (property, value) tuples attached to an item. More precisely, it is a query language for RDF. RDF (Resource Description Framework) is a W3C specification for modeling metadata as graphs, i.e. it gives a standard way to write down certain kinds of relational diagrams.

To run and test SPARQL queries on Wikidata, a query service (the Wikidata Query Service) was created. Use it while going through the tutorial.

1. Turtle

The basic building block of a SPARQL query is an RDF triple written in Turtle syntax, which this tutorial loosely calls a "turtle". Turtle stands for "Terse RDF Triple Language". A triple is a 3-tuple whose items represent a subject, a predicate and an object. In Wikidata terms, the three items are subject, property and value.

For example, in Wikidata, we can write the following turtles:

In Wikidata, SPARQL has some special definitions (prefixes) which have been given pre-defined meanings. The wdt: and wd: prefixes:

These prefixes are not fixed: other prefixes can be defined using @prefix (in SPARQL itself the keyword is written PREFIX). Hence, you can simply consider that the following two lines are always added to every query by default:

@prefix wd: <>
@prefix wdt: <>

Hence, the above-mentioned turtles will be written as the following in SPARQL:
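As an illustrative sketch of what the prefixes buy us, the statement "Douglas Adams (Q42) is an instance of (P31) human (Q5)" can be shortened from full IRIs to prefixed names. The namespace IRIs in PREFIXES and the helper names shorten and format_triple are assumptions made for this example; they are not spelled out in this tutorial.

```python
# Assumed namespace IRIs for the two prefixes (not given in this tutorial).
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}


def shorten(iri):
    """Return the prefixed form of a full IRI, or the IRI in angle brackets."""
    for prefix, namespace in PREFIXES.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return f"<{iri}>"  # no known prefix: keep the full IRI


def format_triple(subject, predicate, obj):
    """Render one (subject, predicate, object) triple in prefixed form."""
    return " ".join(shorten(term) for term in (subject, predicate, obj)) + " ."


# "Douglas Adams (Q42) is an instance of (P31) human (Q5)"
triple = (
    "http://www.wikidata.org/entity/Q42",
    "http://www.wikidata.org/prop/direct/P31",
    "http://www.wikidata.org/entity/Q5",
)
print(format_triple(*triple))  # prints: wd:Q42 wdt:P31 wd:Q5 .
```

The wd: prefix is used for items (subjects and values) and wdt: for properties, which is why the same pattern appears in the queries in the rest of this tutorial.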

The standard prefixes used by Wikidata are:

@prefix wd: <> 
@prefix wdt: <>
@prefix wikibase: <>
@prefix p: <>
@prefix ps: <>
@prefix pq: <>
@prefix rdfs: <>

2. Writing a simple RDF query

Using turtles, we can define a basic query which fetches all items with a specific property value. The syntax for this is:

SELECT ?item WHERE { ?item wdt:P31 wd:Q5 . } LIMIT 100

Here ?item is a variable. The query above means "Return all items that are an instance of human, limited to 100 items". Let us try fetching this data in Pywikibot:

The pywikibot.pagegenerators.WikidataSPARQLPageGenerator function is restricted, as it only accepts queries whose result is a single column of items, each of which becomes an ItemPage. By default it also expects the variable name to be ?item. SPARQL itself is considerably more flexible, as it can generate many other types of output.
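A minimal sketch of how this could look, assuming Pywikibot is installed and configured for Wikidata. The helper names instances_of_query and fetch_items are made up for this example; WikidataSPARQLPageGenerator is the Pywikibot page generator for SPARQL queries.

```python
def instances_of_query(class_id, limit=100):
    """Build the query shown above for any class item, e.g. 'Q5' (human)."""
    return f"SELECT ?item WHERE {{ ?item wdt:P31 wd:{class_id} . }} LIMIT {limit}"


def fetch_items(query):
    """Return a generator of ItemPage objects for a query that selects ?item."""
    # Imported lazily so the query helper above works without Pywikibot.
    import pywikibot
    from pywikibot import pagegenerators

    site = pywikibot.Site("wikidata", "wikidata")
    return pagegenerators.WikidataSPARQLPageGenerator(query, site=site)


# Usage (needs network access and a Pywikibot configuration):
#     for item in fetch_items(instances_of_query("Q5")):
#         print(item.title())  # the item ID of each result
```

Building the query in a small function keeps the ?item variable name in one place, which matters because the generator looks for exactly that name.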

For example, try the following query which should list all the places Douglas Adams (Q42) was educated at (P69):
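A sketch of such a query, assuming the result variable is called ?school (a hypothetical name chosen for this example), passed to the same page generator:

```python
# Q42 = Douglas Adams, P69 = educated at; ?school is a made-up variable name.
EDUCATED_AT_QUERY = "SELECT ?school WHERE { wd:Q42 wdt:P69 ?school . }"


def try_generator(query=EDUCATED_AT_QUERY):
    """Feed a query without an ?item column to the page generator."""
    # Imported lazily; calling this needs Pywikibot and network access.
    import pywikibot
    from pywikibot import pagegenerators

    site = pywikibot.Site("wikidata", "wikidata")
    # The only result column is ?school, so the generator has no ?item to read.
    return list(pagegenerators.WikidataSPARQLPageGenerator(query, site=site))
```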

This raises a KeyError saying that item was not found. Running the same query on the query service gives the appropriate result. Run the next code block and click the "Run" button to see the query:

3. Running generic SPARQL queries in Pywikibot

Pywikibot can also be used to run arbitrary SPARQL queries using the SparqlQuery class:
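A minimal sketch, assuming Pywikibot is installed and configured. run_raw and binding_values are hypothetical helper names, and the sample dictionary is hand-written placeholder data in the standard SPARQL JSON results layout, not real query output.

```python
def run_raw(query):
    """Run any SPARQL query and return the raw JSON result as a dict."""
    # Imported lazily; calling this needs Pywikibot and network access.
    import pywikibot
    from pywikibot.data.sparql import SparqlQuery

    repo = pywikibot.Site("wikidata", "wikidata").data_repository()
    return SparqlQuery(repo=repo).query(query)


def binding_values(raw, variable):
    """Collect the plain values of one variable from a raw JSON result."""
    rows = raw["results"]["bindings"]
    return [row[variable]["value"] for row in rows if variable in row]


# Placeholder data shaped like a SPARQL JSON result (illustrative only).
sample = {
    "head": {"vars": ["item"]},
    "results": {
        "bindings": [
            {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1"}},
            {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q2"}},
        ]
    },
}
print(binding_values(sample, "item"))
```

Each row in the raw result maps a variable name to a small dictionary with a type and a value, which is why the values have to be dug out by hand.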

The result returned by SparqlQuery is rather raw: it is just the RDF result converted to JSON. Hence the usual Pywikibot API based on ItemPage and Claim is normally an easier way to get data from the pages, once the appropriate page generator has been created.

If you're sure that the query is going to be a SELECT query, then the .select() function is a much cleaner way to get the data, as it parses the JSON and sanitizes it:
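A sketch of the same idea with .select(), under the same assumptions as before (Pywikibot installed and configured). run_select and column are hypothetical helper names, and the sample rows are placeholder data in the shape .select() is expected to return: one dictionary per row, mapping variable names to plain values.

```python
def run_select(query):
    """Run a SELECT query and return one dict per result row."""
    # Imported lazily; calling this needs Pywikibot and network access.
    import pywikibot
    from pywikibot.data.sparql import SparqlQuery

    repo = pywikibot.Site("wikidata", "wikidata").data_repository()
    return SparqlQuery(repo=repo).select(query)


def column(rows, variable):
    """Pull a single variable out of the rows returned by run_select."""
    return [row[variable] for row in rows]


# Placeholder rows (illustrative only, not real query output).
sample_rows = [
    {"item": "http://www.wikidata.org/entity/Q1"},
    {"item": "http://www.wikidata.org/entity/Q2"},
]
print(column(sample_rows, "item"))
```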

But the data here still contains the URL given by RDF rather than an ItemPage, so it is rather limited in functionality.
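One way around this limitation is to cut the item ID out of each URL and build the ItemPage yourself. The sketch below assumes entity URLs end in the item ID, as in http://www.wikidata.org/entity/Q42; qid_from_url and to_item_pages are hypothetical helpers, not part of Pywikibot.

```python
def qid_from_url(url):
    """Return the last path segment of an entity URL, e.g. 'Q42'."""
    return url.rstrip("/").rsplit("/", 1)[-1]


def to_item_pages(urls):
    """Turn entity URLs into ItemPage objects, skipping non-item URLs."""
    # Imported lazily; calling this needs Pywikibot and a configuration.
    import pywikibot

    repo = pywikibot.Site("wikidata", "wikidata").data_repository()
    ids = (qid_from_url(url) for url in urls)
    return [pywikibot.ItemPage(repo, qid) for qid in ids if qid.startswith("Q")]


print(qid_from_url("http://www.wikidata.org/entity/Q42"))  # prints: Q42
```

Once the results are ItemPage objects, the usual ItemPage and Claim API can be used on them.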


For a more elaborate RDF guide on SPARQL check out

For the complete guide to Wikidata's SPARQL check out

Also, check out the example queries in and to understand more complex queries.