Some notes about the cxpublishedtranslations API:

Get set of translated articles to dig more deeply into
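A minimal sketch of how such a set could be requested, assuming the `cxpublishedtranslations` list module accepts `from`, `to`, `limit`, and `offset` parameters and returns its records under `result` → `translations`. The helper names and the user-agent string are hypothetical, introduced only for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical identifier; replace it with something that identifies you.
USER_AGENT = "cx-notes-example/0.1 (replace with your contact info)"


def build_translations_params(source_lang, target_lang, limit=500, offset=0):
    """Query parameters for one page of published translations."""
    return {
        "action": "query",
        "format": "json",
        "list": "cxpublishedtranslations",
        "from": source_lang,
        "to": target_lang,
        "limit": limit,
        "offset": offset,
    }


def extract_translations(response):
    """Return the translation records from a decoded JSON response.

    Assumes the records sit under result -> translations.
    """
    return response.get("result", {}).get("translations", [])


def fetch_translations(source_lang, target_lang, limit=500, offset=0):
    """Fetch one page of translation records (makes a network request)."""
    params = build_translations_params(source_lang, target_lang, limit, offset)
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as reply:
        return extract_translations(json.load(reply))
```

Calling `fetch_translations("en", "es", limit=10)` would request ten English-to-Spanish records; increasing `offset` by `limit` on each call pages through the rest.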

Alternative view of the data via Pandas
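As a rough illustration, a list of translation records can be flattened into a table with `pandas.json_normalize`. The field names (`translationId`, `sourceTitle`, `targetTitle`, `publishedDate`, `stats`) and the timestamp format are assumptions about the API output, and the sample records are made up.

```python
import pandas as pd


def translations_to_frame(translations):
    """Flatten translation records (including nested stats) into a DataFrame."""
    df = pd.json_normalize(translations)
    if "publishedDate" in df.columns:
        # Assumes MediaWiki-style timestamps such as 20200101120000.
        df["publishedDate"] = pd.to_datetime(
            df["publishedDate"], format="%Y%m%d%H%M%S", errors="coerce"
        )
    return df


# Hypothetical records shaped like the assumed API output.
sample = [
    {
        "translationId": "1",
        "sourceTitle": "Gradient boosting",
        "targetTitle": "Example target A",
        "publishedDate": "20200101120000",
        "stats": {"any": 0.9, "human": 0.4, "mt": 0.5},
    },
    {
        "translationId": "2",
        "sourceTitle": "Example source B",
        "targetTitle": "Example target B",
        "publishedDate": "20210615083000",
        "stats": {"any": 1.0, "human": 0.8, "mt": 0.2},
    },
]

df = translations_to_frame(sample)
# Nested stats become dotted columns such as "stats.mt".
print(df[["sourceTitle", "targetTitle", "stats.human", "stats.mt"]])
```

With the data in a DataFrame, sorting by `stats.mt` or filtering on `publishedDate` is a quick way to pick out articles worth a closer look.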

Get corresponding parallel translation

Parallel translations can be accessed through either the dump files or the API. Use the dump files if you plan to analyze the entire corpus of translated articles, or a large proportion of it. The API is best suited to looking at a few examples.
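For the API route, a sketch along the following lines may help. It assumes the `contenttranslationcorpora` list module takes `translationid`, `striphtml`, and `types` parameters and nests its output under `query` → `contenttranslationcorpora` → `sections`, with each section holding `source`, `mt`, and `user` entries that carry a `content` field. Check that shape against a real response before relying on it; the function names are hypothetical.

```python
def build_parallel_params(translation_id, types=("source", "mt", "user")):
    """Query parameters for the parallel text of one translation."""
    return {
        "action": "query",
        "format": "json",
        "list": "contenttranslationcorpora",
        "translationid": translation_id,
        "striphtml": "true",
        "types": "|".join(types),
    }


def extract_sections(response):
    """Return one row per section with source, MT, and user text.

    Assumes sections are keyed by section id; a missing text type
    is reported as None.
    """
    sections = (
        response.get("query", {})
        .get("contenttranslationcorpora", {})
        .get("sections", {})
    )
    if not isinstance(sections, dict):
        return []
    rows = []
    for section_id, section in sections.items():
        row = {"section_id": section_id}
        for kind in ("source", "mt", "user"):
            part = section.get(kind) or {}
            row[kind] = part.get("content")
        rows.append(row)
    return rows
```

The parameters can be sent to the same `api.php` endpoint used for the list of translations; comparing the `mt` and `user` text per section shows how much the translator changed the machine output.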

Dump files




One starting point is to better understand what happens to a translated article after it is created. The page history for every Wikipedia article is publicly available. Each article also has a corresponding talk page, where editors may discuss the content of the page and related matters. If you are unfamiliar with how to access this content, see Wikipedia's overviews of how to access page history and talk pages.

For example, for the English version of Gradient Boosting, these can be found at:

Go through the edit histories for a few articles and begin to identify whether any trends emerge about the types of edits that happen to translated articles. Compare the translated and source articles in their current state. What types of content were added after the translation? Are the articles diverging in content or staying similar? What sorts of discussions occur on the talk pages of translated articles?

Eventually we can do this in a more robust manner: choosing more carefully which articles to examine, developing more concrete questions to answer, building a codebook for annotating article histories, content, or discussions, etc.

Quantitative Analyses

More data is available about the translations and what happened to the articles afterwards. Try comparing statistics such as edit counts and pageviews between the source and translated versions of articles. More advanced analyses in a project might eventually compare translated articles with similar articles that were not translated, or classify edits by type for a finer-grained analysis of what happens to translated articles.

You can programmatically access page views for source/translated pages:
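One option is the Wikimedia Pageviews REST API. The sketch below builds a per-article request URL and sums the `views` field of the returned `items`. The helper names are hypothetical, and the defaults (monthly granularity, all access methods, human users only) are illustrative choices.

```python
import urllib.parse

PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(lang, title, start, end, granularity="monthly",
                  access="all-access", agent="user"):
    """Build the per-article pageviews URL for one language edition.

    start and end are dates written as YYYYMMDD.
    """
    project = f"{lang}.wikipedia.org"
    # Titles use underscores for spaces and must be percent-encoded.
    article = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return "/".join(
        [PAGEVIEWS_BASE, project, access, agent, article, granularity, start, end]
    )


def total_views(response):
    """Sum the views over all items in a decoded pageviews response."""
    return sum(item["views"] for item in response.get("items", []))


print(pageviews_url("en", "Gradient boosting", "20200101", "20200131", "daily"))
```

Building one URL for the source title and one for the target title, each with its own language code, gives directly comparable view counts for the same date range.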

You can access page history as detailed below:
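A minimal sketch using the MediaWiki Action API's `prop=revisions` module, assuming `formatversion=2` output in which `query.pages` is a list. The helper names are hypothetical; the request should go to the `api.php` endpoint of the language edition that hosts the page.

```python
def build_history_params(title, limit=50, rvcontinue=None):
    """Query parameters for one batch of revisions, oldest first."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment|size",
        "rvlimit": limit,
        "rvdir": "newer",
    }
    if rvcontinue:
        # Token from the previous response; resumes where it stopped.
        params["rvcontinue"] = rvcontinue
    return params


def extract_revisions(response):
    """Return (revisions, continuation token or None) from a response."""
    revisions = []
    for page in response.get("query", {}).get("pages", []):
        revisions.extend(page.get("revisions", []))
    next_token = response.get("continue", {}).get("rvcontinue")
    return revisions, next_token


def size_changes(revisions):
    """Byte change of each revision relative to the one before it.

    Expects revisions ordered oldest first; the first entry is compared
    with an empty page, so it equals that revision's full size.
    """
    changes = []
    previous = 0
    for rev in revisions:
        changes.append(rev["size"] - previous)
        previous = rev["size"]
    return changes
```

Repeating the request with the returned `rvcontinue` token until it comes back as `None` collects the full history; the per-revision size changes are one simple signal of how much an article grew after translation.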