MediaWiki REST API docs


Search Wikipedia articles

The MediaWiki REST API lets you build apps and scripts that interact with any MediaWiki-based wiki. In this tutorial, we'll use the REST API search endpoints to search for articles about the Solar System on English Wikipedia.


To access English Wikipedia, we'll use English Wikipedia's domain (en.wikipedia.org) and the REST API v1 path (/w/rest.php/v1). To search a different Wikipedia, change the language code from en to the code of the language you'd like to search. For example, use ru.wikipedia.org for Russian Wikipedia or vi.wikipedia.org for Vietnamese Wikipedia.
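As a minimal sketch, the base URL can be assembled from the language code alone. The helper name rest_api_base is hypothetical and is not part of the API.

```python
def rest_api_base(language: str = "en") -> str:
    """Return the REST API v1 base URL for a Wikipedia language edition."""
    # Only the language subdomain changes between Wikipedias, e.g. "ru" or "vi".
    return f"https://{language}.wikipedia.org/w/rest.php/v1"


print(rest_api_base())      # https://en.wikipedia.org/w/rest.php/v1
print(rest_api_base("vi"))  # https://vi.wikipedia.org/w/rest.php/v1
```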

To keep the results manageable, we'll use the limit parameter to restrict the response to three results. The search pages endpoint returns a pages array of page objects, each giving us information about one article. The title property gives us the article title as it appears on the page, while the key property gives us the title in a URL-friendly format. We can use the key to construct the article's URL from English Wikipedia's domain and article path (/wiki).
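The request and the URL construction might look like the following sketch. It uses only the Python standard library rather than a third-party HTTP client. The function names and the User-Agent string are hypothetical; Wikimedia asks clients to send a descriptive User-Agent, so replace the placeholder contact details with your own.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://en.wikipedia.org"
# Hypothetical placeholder; use your own tool name and contact details.
USER_AGENT = "SearchTutorialSketch/0.1 (you@example.org)"


def build_search_url(query: str, limit: int = 3) -> str:
    """Build a search pages URL with the q and limit parameters."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    return f"{BASE_URL}/w/rest.php/v1/search/page?{params}"


def search_pages(query: str, limit: int = 3) -> list:
    """Fetch search results and return the list of page objects."""
    request = urllib.request.Request(
        build_search_url(query, limit), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    # The response body wraps the results in a "pages" array.
    return data["pages"]


def article_url(page: dict) -> str:
    """Combine the domain, the /wiki article path, and the page key."""
    return f"{BASE_URL}/wiki/{page['key']}"


# Usage (requires network access):
# for page in search_pages("solar system"):
#     print(page["title"], article_url(page))
```

Keeping URL building separate from the network call makes the URL logic easy to check offline.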

Now that we have a simple list of search results, we can use other properties of the page object to add more detail. description provides a short summary of the topic, generated from the article's corresponding entry on Wikidata. If a Wikidata entry isn't available, description will be null, so we'll need to account for this when adding descriptions to our search results.
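A minimal sketch of handling a missing description follows. It assumes the page objects are dictionaries parsed from the JSON response, where null becomes None. The helper name describe is hypothetical.

```python
def describe(page: dict) -> str:
    """Return a one-line summary, tolerating a null description."""
    description = page.get("description")
    # description is None (JSON null) when no Wikidata entry is available;
    # an empty string is treated as missing too.
    if not description:
        return page["title"]
    return f"{page['title']}: {description}"


# Hypothetical sample page objects, not real API output:
print(describe({"title": "Example", "description": None}))  # Example
```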

To get more information about how the article relates to the search query, we can use the excerpt property, which gives us a few lines from the article in HTML. In the excerpt, search terms are wrapped in span tags with class="searchmatch", which makes them easy to style in your app. Printing the excerpt of any result, such as the last article in our search results, shows these highlighted terms.
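If you need the matched terms themselves and not just their styling, a sketch using the standard library's HTML parser could collect the text inside each searchmatch span. The class and function names are hypothetical. The sketch assumes the class attribute is exactly "searchmatch" and that the spans are not nested.

```python
from html.parser import HTMLParser


class SearchMatchParser(HTMLParser):
    """Collect the text wrapped in <span class="searchmatch"> tags."""

    def __init__(self):
        super().__init__()
        self.in_match = False
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "span" and ("class", "searchmatch") in attrs:
            self.in_match = True
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_match = False

    def handle_data(self, data):
        if self.in_match:
            self.matches[-1] += data


def highlighted_terms(excerpt: str) -> list:
    """Return the highlighted search terms in an excerpt, in order."""
    parser = SearchMatchParser()
    parser.feed(excerpt)
    parser.close()
    return parser.matches


# Hypothetical excerpt, not real API output:
sample = 'The <span class="searchmatch">Solar</span> <span class="searchmatch">System</span> is'
print(highlighted_terms(sample))  # ['Solar', 'System']
```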

For this notebook, we can import the HTML class from the IPython display module to render the titles and excerpts as formatted HTML.
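One way to build that HTML is sketched below. It assumes page objects shaped like the search pages results and English Wikipedia's /wiki path. The function names and the bold styling are illustrative choices. IPython is imported only inside show, so the string-building part runs outside a notebook as well.

```python
import html


def result_html(page: dict) -> str:
    """Build an HTML snippet with a linked title and the excerpt."""
    # Escape the title and URL; the excerpt is already HTML from the API,
    # so it is inserted as-is to keep the searchmatch spans.
    title = html.escape(page["title"])
    url = html.escape("https://en.wikipedia.org/wiki/" + page["key"])
    return f'<p><a href="{url}">{title}</a><br>{page["excerpt"]}</p>'


def show(pages: list) -> None:
    """Render results in a notebook (requires IPython)."""
    from IPython.display import HTML, display

    style = "<style>.searchmatch { font-weight: bold; }</style>"
    display(HTML(style + "".join(result_html(p) for p in pages)))
```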

To add images to our search results, we can use the thumbnail object's url, height, and width properties. Like description, thumbnail will be null if no thumbnail is available. To account for this, we'll use a try statement that substitutes a default image of the Wikipedia globe in case of an exception. To render the image, we'll import the Image class from the IPython display module.
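A sketch of the try-based fallback follows. DEFAULT_THUMBNAIL is a hypothetical placeholder, so substitute your own default image and size. The sketch also assumes thumbnail URLs may be protocol-relative (starting with //) and adds an https: scheme in that case.

```python
# Hypothetical placeholder for the default Wikipedia globe image: (url, width, height).
DEFAULT_THUMBNAIL = ("wikipedia-globe.png", 60, 60)


def thumbnail_info(page: dict) -> tuple:
    """Return (url, width, height) for a page, falling back to a default."""
    try:
        thumbnail = page["thumbnail"]
        url = thumbnail["url"]
        # Add a scheme if the URL is protocol-relative ("//upload...").
        if url.startswith("//"):
            url = "https:" + url
        return url, thumbnail["width"], thumbnail["height"]
    except (KeyError, TypeError):
        # TypeError: thumbnail is None (JSON null); KeyError: a field is missing.
        return DEFAULT_THUMBNAIL
```

In a notebook, the returned values can be passed to IPython's Image(url=..., width=..., height=...).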

So far, we've used the search pages endpoint, which searches the content of pages for the most relevant results. In some cases, though, you may want results based on the page title, for example when building a typeahead search that suggests relevant pages by title as the user types.

We can run the same search using the autocomplete page title endpoint. Because this endpoint is designed for faster, title-based searching, it doesn't return a content excerpt, so we'll remove the excerpt from the output.
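A minimal sketch of the title search swaps search/page for search/title in the path and keeps only the titles. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder; use your own tool name and contact details.
USER_AGENT = "SearchTutorialSketch/0.1 (you@example.org)"


def build_title_search_url(query: str, limit: int = 3, language: str = "en") -> str:
    """Build an autocomplete page title URL (search/title, not search/page)."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    return f"https://{language}.wikipedia.org/w/rest.php/v1/search/title?{params}"


def suggest_titles(query: str, limit: int = 3) -> list:
    """Return suggested titles for a typeahead box, ignoring the excerpt."""
    request = urllib.request.Request(
        build_title_search_url(query, limit), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as response:
        pages = json.load(response)["pages"]
    return [page["title"] for page in pages]


# Usage (requires network access):
# print(suggest_titles("solar"))
```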

You should now be able to use the REST API search endpoints to create a Wikipedia search app. You can also use these endpoints with any Wikimedia project; try searching Wiktionary for dictionary entries, Wikiquote for famous quotes, and more.
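To target another project, only the domain changes. The following sketch assumes the project is served at language.project.org. That holds for Wikipedia, Wiktionary, and Wikiquote but not for every Wikimedia site. The function name is hypothetical.

```python
import urllib.parse


def project_search_url(project: str, query: str, language: str = "en", limit: int = 3) -> str:
    """Build a search pages URL for a Wikimedia project such as "wiktionary"."""
    # Assumes a <language>.<project>.org domain layout.
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    return f"https://{language}.{project}.org/w/rest.php/v1/search/page?{params}"


print(project_search_url("wiktionary", "planet"))
print(project_search_url("wikiquote", "galileo"))
```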

To fork, edit, and re-run this Jupyter notebook, download the source and upload it to PAWS using your Wikimedia account.

For more information about these endpoints, see the API reference. To share your feedback on this tutorial, post a comment to the REST API discussion page.




This tutorial is licensed under the Creative Commons Attribution-ShareAlike License.