Studying the Wikipedia Clickstream

This notebook is a tutorial on studying reader behavior using the monthly Wikipedia Clickstream dumps. It has three stages:

Accessing Wikipedia Clickstream Dumps

This is an example of how to parse the Wikipedia clickstream dump for a wiki and gather the data for a given article.
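The parsing step can be sketched as below. This is a minimal sketch, not the notebook's own implementation. It assumes the dump is a gzip-compressed, tab-separated file with four columns (prev, curr, type, n) and that titles use underscores instead of spaces. The names Edge, parse_clickstream, edges_for_article, and open_dump are hypothetical helpers introduced for illustration.

```python
import gzip
from typing import Iterable, Iterator, List, NamedTuple


class Edge(NamedTuple):
    prev: str   # source: an article title or a special source such as "other-search"
    curr: str   # destination article title
    type: str   # kind of transition, e.g. "link", "external", "other"
    n: int      # how many times this (prev, curr) pair occurred in the month


def parse_clickstream(lines: Iterable[str]) -> Iterator[Edge]:
    """Yield one Edge per well-formed line; silently skip malformed lines."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4 or not parts[3].isdigit():
            continue
        prev, curr, link_type, n = parts
        yield Edge(prev, curr, link_type, int(n))


def edges_for_article(lines: Iterable[str], title: str) -> List[Edge]:
    """Keep only the rows where the article is the source or the destination."""
    title = title.replace(" ", "_")  # assumed: dump titles use underscores
    return [e for e in parse_clickstream(lines) if title in (e.prev, e.curr)]


def open_dump(path: str):
    """Open a downloaded .tsv.gz dump as a stream of text lines."""
    return gzip.open(path, mode="rt", encoding="utf-8")
```

With a downloaded dump, usage would look like `edges_for_article(open_dump(path), "List of dinosaurs of the Morrison Formation")`, where `path` points at the file for the wiki and month you chose. Streaming line by line avoids loading the whole dump into memory.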

Finding Your Article in Other Languages

The langlinks API is a simple, automatic way to get all the other language versions of an article. For example, querying it for List of dinosaurs of the Morrison Formation shows that the article exists not just in English but also in French.
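A langlinks lookup might look like the following sketch. It uses the MediaWiki Action API with action=query and prop=langlinks, and assumes formatversion=2, where pages come back as a list and each langlink has "lang" and "title" keys. The function names and the user_agent placeholder are hypothetical. Continuation for articles with more than 500 language links is not handled.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://{lang}.wikipedia.org/w/api.php"


def langlinks_params(title):
    """Query parameters for a langlinks lookup on one title."""
    return {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "500",        # up to 500 links per request; no continuation here
        "redirects": "1",        # follow redirects to the canonical article
        "format": "json",
        "formatversion": "2",
    }


def parse_langlinks(response):
    """Return {language code: title in that language} from an API response dict."""
    result = {}
    for page in response.get("query", {}).get("pages", []):
        for link in page.get("langlinks", []):
            result[link["lang"]] = link["title"]
    return result


def fetch_langlinks(title, lang="en",
                    user_agent="clickstream-tutorial (replace with contact info)"):
    """Call the API on the given language edition and parse the result."""
    url = API_URL.format(lang=lang) + "?" + urllib.parse.urlencode(langlinks_params(title))
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as resp:
        return parse_langlinks(json.load(resp))
```

Calling `fetch_langlinks("List of dinosaurs of the Morrison Formation")` should return a dictionary that includes an "fr" entry if the French version exists. Keeping the parsing separate from the network call makes it easy to check the logic against a saved response.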

Compare Reader Behavior across Languages

Here you want to explore what's similar and what's different between how readers interact with your article depending on the language they are reading it in. Provide hypotheses for any large differences you see. You don't have to do any formal statistical tests unless you want to -- it can be just observations you have about the data. Feel free to focus on the article you chose or expand to other articles.
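One possible starting point, offered as a suggestion and not as part of the original assignment: raw counts differ a lot between wikis of different sizes, so it can help to compare what share of an article's incoming clicks each source contributes. The sketch below assumes rows shaped like the dump's four columns (prev, curr, type, n); incoming_shares is a hypothetical helper.

```python
from collections import defaultdict


def incoming_shares(edges, article):
    """Return {source: fraction of all clicks into `article`}.

    `edges` is an iterable of (prev, curr, type, n) tuples.
    """
    article = article.replace(" ", "_")  # assumed: dump titles use underscores
    counts = defaultdict(int)
    for prev, curr, _type, n in edges:
        if curr == article:
            counts[prev] += n
    total = sum(counts.values())
    if total == 0:
        return {}
    return {prev: n / total for prev, n in counts.items()}
```

Running this once per language gives two dictionaries of shares that can be compared directly, for example to see whether search engines account for a larger fraction of traffic in one language than in another.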

Remember, you can use the langlinks API to check whether an article in one language corresponds to an article in another language. For instance, the French clickstream dataset shows that readers went from the article Formation de Morrison to Liste des dinosaures de la formation de Morrison in French Wikipedia 15 times in January. The English clickstream dataset shows that readers went from the article Morrison Formation to List of dinosaurs of the Morrison Formation in English Wikipedia 401 times in January. With the langlinks API, we can verify that this reading path is equivalent in French and English (same source and destination articles, just different languages).
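The path-matching idea can be sketched as follows, using the January counts quoted above. This assumes you already have per-language edge counts keyed by (source, destination) and a title mapping built from langlinks results. align_paths is a hypothetical helper, and the underscore normalization reflects the assumption that dump titles use underscores while the API returns spaces.

```python
def normalize(title):
    """Convert an API-style title (spaces) to a dump-style title (underscores)."""
    return title.replace(" ", "_")


def align_paths(edges_a, a_to_b, edges_b):
    """Pair up equivalent reading paths in two languages.

    edges_a, edges_b: {(prev, curr): count} for language A and language B.
    a_to_b: {title in A: title in B}, e.g. built from langlinks lookups.
    Returns a list of (path in A, path in B, count in A, count in B).
    """
    mapping = {normalize(a): normalize(b) for a, b in a_to_b.items()}
    aligned = []
    for (prev, curr), count_a in edges_a.items():
        prev_b = mapping.get(prev)
        curr_b = mapping.get(curr)
        if prev_b is None or curr_b is None:
            continue  # one end of the path has no known equivalent in B
        count_b = edges_b.get((prev_b, curr_b))
        if count_b is not None:
            aligned.append(((prev, curr), (prev_b, curr_b), count_a, count_b))
    return aligned


fr_edges = {("Formation_de_Morrison",
             "Liste_des_dinosaures_de_la_formation_de_Morrison"): 15}
en_edges = {("Morrison_Formation",
             "List_of_dinosaurs_of_the_Morrison_Formation"): 401}
fr_to_en = {
    "Formation de Morrison": "Morrison Formation",
    "Liste des dinosaures de la formation de Morrison":
        "List of dinosaurs of the Morrison Formation",
}

matches = align_paths(fr_edges, fr_to_en, en_edges)
```

Here `matches` contains a single entry pairing the French path (15 clicks) with the English path (401 clicks). Paths that exist in only one language are dropped, which is itself worth examining as a difference between the two wikis.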

Future Analyses

TODO: Describe what additional patterns you might want to explore in the data (and why). You don't have to know how to do the analyses.