Accessing Edit Tag APIs

If you already know which articles you are interested in, and there are relatively few of them (e.g., hundreds or low thousands), the Revisions API can be a much simpler way to get edit tag data for each article.

NOTE: The APIs return up-to-date data, whereas the MediaWiki dumps are snapshots taken at specific points in time and are always at least several days behind. The data you get from the dumps might therefore differ from the API results if edits have been made to a page in the intervening days.

As an example, let's count the number of mobile and non-mobile edits for the Simple English Wikipedia page about London.
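A minimal sketch of that count using only the standard library. It makes some assumptions that are not in the original text: the endpoint is the Simple English Wikipedia API, mobile edits are identified by the "mobile edit" tag, and the names fetch_revisions, count_mobile and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Simple English Wikipedia endpoint and the "mobile edit" tag.
API_URL = "https://simple.wikipedia.org/w/api.php"
MOBILE_TAG = "mobile edit"


def count_mobile(revisions):
    """Return (mobile, non_mobile) counts for an iterable of revision dicts."""
    revisions = list(revisions)
    mobile = sum(1 for rev in revisions if MOBILE_TAG in rev.get("tags", []))
    return mobile, len(revisions) - mobile


def fetch_revisions(title):
    """Yield every revision of `title`, following the API's continuation."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|tags",
        "rvlimit": "max",
    }
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(params)
        # Hypothetical User-Agent; replace it with one that identifies you.
        request = urllib.request.Request(
            url, headers={"User-Agent": "edit-tag-tutorial-sketch/0.1"}
        )
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        for page in data["query"]["pages"]:
            yield from page.get("revisions", [])
        if "continue" not in data:
            break
        # Carry the continuation tokens into the next request.
        params.update(data["continue"])


# Example (performs network requests, so it is left commented out):
# mobile, non_mobile = count_mobile(fetch_revisions("London"))
# print(mobile, non_mobile)
```

Keeping the counting separate from the fetching means the same count_mobile function can be reused on revisions obtained some other way, for example from a saved API response.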

Parsing the history dump

The history dump is more structured and far easier to parse, because a specialised package (mwxml: MediaWiki XML) has been made for that purpose. As you loop through the parsed history dump, record the revision IDs of all the edits and store them so that each one is mapped to the year in which the edit was made.

Note that the namespace check (namespace 0) keeps only article edits, removing things like talk page edits. It is an optional step that comes in very handy if you have computational limitations (time and memory). Also, the revision IDs are converted to strings so that they can easily be compared with the mobile edit revision IDs, which are already strings.
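The loop described above could be sketched as follows. This is an illustration under assumptions, not the original code: the function names and the dump path argument are hypothetical, the dump is assumed to be bz2-compressed, and mwxml is imported lazily so the grouping function can be used without it.

```python
import bz2
from collections import defaultdict


def revision_ids_by_year(pages, namespaces=(0,)):
    """Map each year to the set of revision IDs (as strings) made in it.

    `pages` is any iterable of page objects that expose `.namespace` and
    yield revisions with `.id` and `.timestamp`, as mwxml pages do.
    Pass namespaces=None to skip the optional namespace filter.
    """
    ids_by_year = defaultdict(set)
    for page in pages:
        if namespaces is not None and page.namespace not in namespaces:
            continue  # e.g. skip talk pages
        for revision in page:
            year = int(revision.timestamp.strftime("%Y"))
            # Store IDs as strings so they compare directly with the
            # mobile edit revision IDs, which are already strings.
            ids_by_year[year].add(str(revision.id))
    return ids_by_year


def load_history_dump(path):
    """Parse a bz2-compressed history dump (hypothetical path) with mwxml."""
    import mwxml  # third-party package: pip install mwxml

    with bz2.open(path, "rt", encoding="utf-8") as dump_file:
        dump = mwxml.Dump.from_file(dump_file)
        return revision_ids_by_year(dump)
```

With the IDs grouped this way, the mobile edits in a given year are the set intersection of that year's IDs with the mobile edit revision IDs.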