Querying Wikidata with SPARQL

An introduction at the Wikidata Zurich Meetup #2
September 5, 2018. Zurich, Switzerland.
This notebook is licensed by CC-BY-SA
Cristina Sarasua (University of Zurich)
See also Wiki page of the Meetup


Introduction

Wikidata's data can be accessed in many different ways. Depending on what exactly you would like to access (e.g., item descriptions, or page edit history) you may:

In this hands-on we will focus on querying data items of the knowledge base using the SPARQL query language --- the language that is used to query Linked Data and Semantic Web data in general.

Wikidata Documentation

If you would like to look up further information about Wikidata, I would recommend you read the article at the Communications of the ACM journal, by Denny Vrandečić and Markus Krötzsch: Wikidata: a Free Collaborative, Knowledge Base, and the following information:

Wikidata's data (model and data access)

Querying Wikidata with SPARQL

Querying from Python

I won't showcase these due to time restrictions, but they are also useful tools to work with SPARQL and RDF data in Python:

When you try out queries, you often need to look up the names/IDs of items, properties, or maybe you need to look up the way qualifiers and references are stated. In that case, I highly recommend you to look at examples of item descriptions

E.g. https://www.wikidata.org/wiki/Q72 E.g. https://www.wikidata.org/wiki/Q11943

Tip! If you would like to keep up-to-date and discover new very sophisticated SPARQL queries that you can execute, you should not miss the updatey by Wikidata-Facts in Twitter! =) https://twitter.com/WikidataFacts


Part I - SPARQL and Wikidata Query Service

The query service is a Web service that accesses Wikidata's data and can be used by humans through this GUI and by programs to execute data queries written in SPARQL.

https://query.wikidata.org

Exploring SPARQL features

Let's look at the kind of things we can do in SPARQL when we want to query data.

> Get items (ID and label) of a particular type

Example: query to retrieve the ID and label of train stations in Zurich

select ?item ?itemLabel
where{
  ?item wdt:P31 wd:Q55488 .
  ?item wdt:P131 wd:Q72 .
  SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

What if someone described the station using a more specific type of item than train (e.g. dead-end station)? We can use property paths and ask for the items that have been defined as train stations or subclasses of train stations:

select ?item ?itemLabel
where{
  ?item wdt:P31/wdt:P279* wd:Q55488 .
   ?item wdt:P131 wd:Q72 .
  SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

We can also order the results by e.g. the label.

` select ?item ?itemLabel where{ ?item wdt:P31/wdt:P279* wd:Q55488 . ?item wdt:P131 wd:Q72 . SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . } } ORDER BY ?itemLabel

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

Sometimes, when you ask for some information (e.g., opening date) it can happen that not all stations have that information. So this query will be restrictive and will give you only 16 results (those that have that information):

select ?item ?itemLabel ?openingDate
where{
  ?item wdt:P31/wdt:P279* wd:Q55488 .
   ?item wdt:P131 wd:Q72 .
  ?item wdt:P1619 ?openingDate .

  SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

What you can do is to "relax" the condition and ask for the stations, and get the opening date (in case they have it). You will still get the other information in the cases where the station does not have the opening date.

select ?item ?itemLabel ?openingDate
where{
  ?item wdt:P31/wdt:P279* wd:Q55488 .
   ?item wdt:P131 wd:Q72 .
  OPTIONAL{?item wdt:P1619 ?openingDate .}

  SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

Maybe you would like to retrieve the train stations and bus stations. In that case you can use UNION.

```` select ?item ?itemLabel ?openingDate where{ {?item wdt:P31/wdt:P279 wd:Q55488 .} UNION {?item wdt:P31/wdt:P279 wd:Q494829.}

?item wdt:P131 wd:Q72 . OPTIONAL{?item wdt:P1619 ?openingDate .}

SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . } }

```

You can also count, limit and many more things!

> Get Wikipedia and Wikimedia Commons URLs

Example: retrieve the id, label of things located in neighbourhoods of Zurich, with their Wikipedia article and Wikimedia commons URL

SELECT ?quartier ?item ?itemLabel ?article ?image ?prop
{?item wdt:P131 ?quartier .
 ?quartier wdt:P31 wd:Q19644586 .
  OPTIONAL {?article schema:about ?item .
    ?article schema:isPartOf <https://en.wikipedia.org/>.
 ?item wdt:P18 ?image}

 SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

> Federated queries

Example: query example from from https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Federated_queries cited: ''find a value for TOID (P3120) from the UK Ordnance Survey for items that have a GSS code (2011) (P836) but no TOID (P3120):''

SELECT ?item ?itemLabel ?gss ?os_toid WHERE {
    ?item p:P836 ?gss_stmt .
    ?gss_stmt ps:P836 ?gss .
    FILTER NOT EXISTS { ?gss_stmt pq:P582 [] } .
    FILTER NOT EXISTS { ?item wdt:P3120 [] } .

    SERVICE <http://data.ordnancesurvey.co.uk/datasets/os-linked-data/apis/sparql> {
      ?os_toid <http://data.ordnancesurvey.co.uk/ontology/admingeo/gssCode> ?gss
    }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

Exploring features of the Wikidata Query Service

> Queries using / retrieving qualifiers and references --- two valuable features in Wikidata

Example: query to the population of the neighbourhoods of Zurich at a particular point in time, and the source of the information.

PREFIX pr: <http://www.wikidata.org/prop/reference/>
PREFIX prov: <http://www.w3.org/ns/prov#>

SELECT ?itemLabel ?itemDescription ?item ?population ?time ?ref WHERE {
  ?item wdt:P625 ?coord.
  ?item wdt:P31 wd:Q19644586.
  ?item wdt:P131 wd:Q72.
  OPTIONAL {
    ?item p:P1082 ?statement.
    ?statement prov:wasDerivedFrom ?source.
    ?source pr:P854 ?ref.
    ?statement ps:P1082 ?population.
    ?statement pq:P585 ?time.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

> Display query results in a map

Example: query to show the mountains of Switzerland in a map, by height (meters above the sea level).

#defaultView:Map
SELECT ?item ?itemLabel ?coord ?height ?layer ?image 
WHERE { ?item wdt:P31 wd:Q8502. ?item wdt:P625 ?coord. 
       ?item wdt:P17 wd:Q39.  ?item wdt:P2044 ?height . 
       BIND( IF(?height < 1000, "<1000 metres", 
                IF(?height < 2000, "1000 - 2000 metres", 
                   IF(?height < 3000, "2000 - 3000 metres", 
                      IF(?height < 4000, "3000 - 4000 metres", "> 4000 metres")))) AS ?layer). 
       OPTIONAL {?item wdt:P18 ?image.} 

       SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
`

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

> Retrieve things around / in the surrounding area

Example: query to retrieve max. 100 libraries around the UZH Zentrum building, with a radius of 2km.

#defaultView:Map
SELECT ?thing ?location ?distance ?thingLabel WHERE {

    SERVICE wikibase:around { 
      ?thing wdt:P625 ?location . 
      bd:serviceParam wikibase:center "Point(8.548333 47.374722)"^^geo:wktLiteral .
      bd:serviceParam wikibase:radius "2" . 
      bd:serviceParam wikibase:distance ?distance .
    } 
    ?thing wdt:P31/wdt:P279* wd:Q7075

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY ?distance LIMIT 100
`

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

Example: query to retrieve max. 100 fountains that we know have been added from a specific catalogue, the one maintained by the City of Zurich drinking water supply (https://www.wikidata.org/wiki/Q53629101)

#defaultView:Map
SELECT ?thing ?location ?distance ?thingLabel WHERE {

    SERVICE wikibase:around { 
      ?thing wdt:P625 ?location . 
      bd:serviceParam wikibase:center "Point(8.548333 47.374722)"^^geo:wktLiteral .
      bd:serviceParam wikibase:radius "2" . 
      bd:serviceParam wikibase:distance ?distance .
    } 
    ?thing p:P528 ?statement . ?statement pq:P972 wd:Q53629101 .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY ?distance LIMIT 100

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

> Display query results in a bubble-chart

Example: query to plot in a bubble chart the number of floods per country in Europe since 1980.

#defaultView:BubbleChart
SELECT ?country ?countryLabel (count(distinct ?item) as ?count) 
WHERE
{
  ?item wdt:P31/wdt:P279* wd:Q8068 .#wd:Q8065 .
  ?item wdt:P17 ?country .
  ?country wdt:P30 wd:Q46 .
  ?item wdt:P580 ?starttime .
  FILTER(?starttime >= "1980-00-00T00:00:00Z"^^xsd:dateTime)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
GROUP BY ?country ?countryLabel

Execute this query in the Wikidata Query Service (Open this link, and click on the "play" symbol to execute the query live).

Part II - Importing and using SPARQL query results in your Jupyter-Python Notebook

Part III - Executing SPARQL queries via the Wikidata (MediaWiki) API