mwapi Example

In this notebook, we'll show you the basics of using mwapi to get data out of MediaWiki APIs like those available for Wikipedia, Wiktionary, Commons, and Wikidata. The mwapi library is very basic. It provides a thin wrapper and some simple convenience functions around the basic MediaWiki API structure.

This notebook will proceed in three parts that perform increasingly advanced actions.

  1. Running a basic query
  2. Using query continuation
  3. Connecting your bot via OAuth

Part 1: Running a basic query

We'll start by constructing a session object.

Note that the library complains that a user_agent argument wasn't provided. This is OK and you'll be allowed to continue, but it's highly recommended that you set user_agent to a description of what you are doing and who you are, so that the operations engineers can contact you about your API usage.
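Here is a minimal sketch of constructing the session with a user agent. The helper names (`build_user_agent`, `make_session`), the tool description, the contact address, and the English Wikipedia host are all placeholders chosen for illustration; only `mwapi.Session(host, user_agent=...)` comes from the library.

```python
def build_user_agent(tool, contact):
    """Combine a tool description and a contact address into one string."""
    return "{0} <{1}>".format(tool, contact)


def make_session(host, user_agent):
    """Construct an mwapi session for the wiki at `host`."""
    # Imported here so build_user_agent() can be used without mwapi installed.
    import mwapi
    return mwapi.Session(host, user_agent=user_agent)


# Hypothetical values: replace them with your own tool name and contact.
USER_AGENT = build_user_agent("mwapi demo notebook", "someone@example.org")

# In the notebook you would then run:
#   session = make_session("https://en.wikipedia.org", USER_AGENT)
```

Any string works as a user agent; the point is that it says what the tool does and how to reach you.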

OK. No more warning. :) Now to actually perform a query. In the request below, we're going to get the content of the last 10 edits to my talk page.
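A request like this one could be sketched as follows. The function name `fetch_recent_revisions`, the page title in the usage comment, and the exact choice of `rvprop` fields are assumptions made for illustration; the keyword arguments are ordinary MediaWiki API parameters that `session.get()` passes through.

```python
def fetch_recent_revisions(session, title, limit=10):
    """Request the most recent `limit` revisions of `title`, with content."""
    return session.get(
        action="query",
        prop="revisions",
        titles=title,
        # Which pieces of each revision to return, separated by "|".
        rvprop="ids|timestamp|user|comment|content",
        rvslots="main",  # return the main content slot of each revision
        rvlimit=limit,   # revisions come back newest first by default
    )


# With a session in hand you would run something like this
# (the title below is a placeholder, not the author's talk page):
#   doc = fetch_recent_revisions(session, "User talk:Example")
```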

As you can see, the library will give you back a JSON-style Python dict. Let's list out the fields we got back.
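One way to list the fields is sketched below. The helper `revision_fields` and the `sample_doc` dict are made up for illustration: the sample is hand-written to resemble a response, not real API output. The helper accepts both response layouts, since `pages` is a dict keyed by page ID by default and a list when `formatversion=2` is requested.

```python
def revision_fields(doc):
    """Return the sorted field names found across all revisions in `doc`."""
    pages = doc["query"]["pages"]
    # Default format: dict keyed by page ID. formatversion=2: a list.
    page_list = pages.values() if isinstance(pages, dict) else pages
    fields = set()
    for page in page_list:
        for revision in page.get("revisions", []):
            fields.update(revision.keys())
    return sorted(fields)


# Hand-written stand-in for a response; the values are invented.
sample_doc = {
    "batchcomplete": "",
    "query": {
        "pages": {
            "123": {
                "pageid": 123,
                "ns": 3,
                "title": "User talk:Example",
                "revisions": [
                    {
                        "revid": 1,
                        "parentid": 0,
                        "user": "Example",
                        "timestamp": "2016-01-01T00:00:00Z",
                        "comment": "hello",
                    }
                ],
            }
        }
    },
}

print(sorted(sample_doc.keys()))    # top-level keys of the response
print(revision_fields(sample_doc))  # keys found on the revisions
```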

Part 2: Using query continuation

The example we worked through in part 1 is great when we only want a few items out of the API, but what about when we want to read the entire history of a page? The API only returns so many revisions at a time and provides a continuation strategy to allow for sequential queries to retrieve large responses. mwapi provides some nice utilities for automating continuation. We'll explore this by providing the continuation=True parameter to get() and using the continuation to analyze the entire history of my user talk page.
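A continued query along these lines might look like the sketch below. The function name `history_docs`, the batch size, and the selected `rvprop` fields are illustrative assumptions; `continuation=True` is the mwapi parameter described above, and it makes `get()` return a generator of response dicts rather than a single dict.

```python
def history_docs(session, title, batch_size=50):
    """Return a generator of API responses covering the history of `title`."""
    return session.get(
        action="query",
        prop="revisions",
        titles=title,
        rvprop="ids|timestamp|user|size",
        rvlimit=batch_size,  # revisions per request
        rvdir="newer",       # oldest revision first
        continuation=True,   # keep requesting until the API reports no more
    )


# With a session in hand you would run something like this
# (the title below is a placeholder):
#   docs = history_docs(session, "User talk:Example")
```

No request is sent until the generator is iterated, so building `docs` is cheap; the work happens in the loop that consumes it.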

The docs variable now contains a generator of query results that make up this continuation. We can process them in a loop to generate some stats.
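As a sketch of such a loop, the function below counts edits per user across every response in the continuation. The name `edits_per_user` and the `sample_docs` list are invented for illustration; the sample stands in for the generator so the logic can be tried without calling the API.

```python
from collections import Counter


def edits_per_user(docs):
    """Count revisions per user across an iterable of API responses."""
    counts = Counter()
    for doc in docs:
        pages = doc["query"]["pages"]
        # Default format: dict keyed by page ID. formatversion=2: a list.
        page_list = pages.values() if isinstance(pages, dict) else pages
        for page in page_list:
            for revision in page.get("revisions", []):
                # Revisions whose user name is hidden carry no "user" key.
                counts[revision.get("user", "(hidden)")] += 1
    return counts


# Two hand-written batches standing in for a real continuation.
sample_docs = [
    {"query": {"pages": {"1": {"revisions": [{"user": "Alice"}, {"user": "Bob"}]}}}},
    {"query": {"pages": {"1": {"revisions": [{"user": "Alice"}]}}}},
]

stats = edits_per_user(sample_docs)
for user, count in stats.most_common():
    print(user, count)
```

Because the function only iterates over `docs`, it works the same way on the generator returned by a continued query.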

Note: The next code block fails on PAWS because PAWS uses mwapi 0.3.1, and the code needs mwapi 0.4.0 or later.