Studying the Wikipedia Clickstream

This notebook is a tutorial on how to study reader behavior using the monthly Wikipedia Clickstream dumps. It has three stages:

Accessing Wikipedia Clickstream Dumps

This is an example of how to parse through the Wikipedia clickstream dumps for a wiki and gather the data for a given article.
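A minimal sketch of that parsing step follows. It assumes the dump is a gzip-compressed, tab-separated file with four columns per row (source, destination, link type, count) and no header row. The function names and the local file path are hypothetical, not part of any library.

```python
import gzip


def rows_for_article(lines, title):
    """Yield (prev, curr, link_type, n) for rows that involve `title`.

    `lines` is any iterable of tab-separated text lines. Titles in the
    dumps use underscores instead of spaces.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            continue  # skip malformed rows instead of failing
        prev, curr, link_type, n = parts
        if prev != title and curr != title:
            continue
        try:
            yield prev, curr, link_type, int(n)
        except ValueError:
            continue  # skip rows whose count is not an integer


def read_dump(path, title):
    """Stream a .tsv.gz dump from disk and keep only rows for `title`."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return list(rows_for_article(handle, title))


# Hypothetical usage, assuming the dump was downloaded beforehand:
# rows = read_dump("clickstream-enwiki-YYYY-MM.tsv.gz",
#                  "List_of_dinosaurs_of_the_Morrison_Formation")
```

Streaming line by line keeps memory use low, which matters because the dumps for the larger wikis contain many millions of rows.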

Finding Your Article in Other Languages

The langlinks API is a simple (automatic) way to get all the other language versions of a particular article. For example, you can do this with List of dinosaurs of the Morrison Formation and see that it exists not just in English but also in French.
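As a rough sketch, the lookup can be done with the MediaWiki Action API using `action=query` and `prop=langlinks`. The helper names and the User-Agent string below are hypothetical, and the parser assumes the `formatversion=2` response shape, where each language link carries `lang` and `title` keys. Articles with very many language links may need API continuation, which this sketch does not handle.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def parse_langlinks(response):
    """Turn a langlinks API response (formatversion=2) into {lang: title}."""
    links = {}
    for page in response.get("query", {}).get("pages", []):
        for link in page.get("langlinks", []):
            links[link["lang"]] = link["title"]
    return links


def fetch_langlinks(title, api_url=API_URL):
    """Query the langlinks API for `title` and return {lang: title}."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    url = api_url + "?" + urllib.parse.urlencode(params)
    # A descriptive User-Agent is good practice; this one is a placeholder.
    request = urllib.request.Request(
        url, headers={"User-Agent": "clickstream-tutorial-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as reply:
        return parse_langlinks(json.load(reply))


# Hypothetical usage (requires network access):
# fetch_langlinks("List of dinosaurs of the Morrison Formation")
```

Note that the API returns titles with spaces, while the clickstream dumps use underscores, so titles need converting before they are matched against dump rows.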

Compare Reader Behavior across Languages

Here you want to explore what's similar and what's different between how readers interact with your article depending on the language they are reading it in. Provide hypotheses for any large differences you see. You don't have to do any formal statistical tests unless you want to -- it can be just observations you have about the data. Feel free to focus on the article you chose or expand to other articles.

Remember, you can use the langlinks API to see whether an article in one language is the same as one in another language. For instance, in the French clickstream dataset we see that someone went from the article Formation de Morrison to Liste des dinosaures de la formation de Morrison in French Wikipedia 15 times in January. From the English clickstream dataset, we could see that someone went from the article Morrison Formation to List of dinosaurs of the Morrison Formation in English Wikipedia 401 times in January. With the langlinks API, we can verify that this reading path is equivalent in French and English (same source and destination article, just different languages).
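The comparison described above can be sketched as follows. It assumes you have already built a dictionary of click counts per (source, destination) pair for each language and a French-to-English title mapping from the langlinks API. All function and variable names here are hypothetical; the two counts are the ones quoted in the paragraph above.

```python
def translate(title, mapping):
    """Map a title to its English equivalent, or None if unknown.

    Pseudo-sources such as 'other-search' are shared by every dump,
    so they are passed through unchanged.
    """
    if title.startswith("other-"):
        return title
    return mapping.get(title)


def align_paths(en_counts, other_counts, other_to_en):
    """Return [((en_prev, en_curr), n_en, n_other)] for paths in both dumps."""
    aligned = []
    for (prev, curr), n_other in other_counts.items():
        en_prev = translate(prev, other_to_en)
        en_curr = translate(curr, other_to_en)
        if en_prev is None or en_curr is None:
            continue  # no known English equivalent for this article
        n_en = en_counts.get((en_prev, en_curr))
        if n_en is not None:
            aligned.append(((en_prev, en_curr), n_en, n_other))
    return aligned


# Counts quoted in the text above, with dump-style underscore titles.
en_counts = {
    ("Morrison_Formation", "List_of_dinosaurs_of_the_Morrison_Formation"): 401,
}
fr_counts = {
    ("Formation_de_Morrison",
     "Liste_des_dinosaures_de_la_formation_de_Morrison"): 15,
}
fr_to_en = {
    "Formation_de_Morrison": "Morrison_Formation",
    "Liste_des_dinosaures_de_la_formation_de_Morrison":
        "List_of_dinosaurs_of_the_Morrison_Formation",
}

for pair, n_en, n_fr in align_paths(en_counts, fr_counts, fr_to_en):
    print(pair, "en:", n_en, "fr:", n_fr)
```

Paths that exist in only one language are dropped here; depending on the question, those unmatched paths may be the more interesting ones to inspect.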

Analyses of the article Westdeutscher_Rundfunk (English and German clickstreams)

1. The three graphs above (one for the English clickstream and two for the German clickstream) show the article Westdeutscher_Rundfunk_Köln (in English: Westdeutscher_Rundfunk). They suggest that a reader of German Wikipedia is considerably more likely to end up on the Westdeutscher_Rundfunk article than a reader of English Wikipedia is.

Westdeutscher_Rundfunk as a source article analysis

2. In the English clickstream, Westdeutscher_Rundfunk does not appear as a source article; readers reach it almost only as a destination. In the German clickstream, by contrast, readers also start from Westdeutscher_Rundfunk and then click through to another article. As a source article, Westdeutscher_Rundfunk was clicked at least 64 times, which suggests that the article is of more interest to German-language readers than to English-language readers. The most-followed path from Westdeutscher_Rundfunk in the German clickstream leads to the article Friedrich_Nowottny, with more than 250 pageviews.

Westdeutscher_Rundfunk as a destination article analysis

3. The most common source-destination pair in the German clickstream is other-internal and Westdeutscher_Rundfunk, with more than 50,000 pageviews. In the English clickstream, the most common source-destination pair is other-external and Westdeutscher_Rundfunk, with more than 5,000 pageviews.

Other observations

4. In January, Westdeutscher_Rundfunk received more than 85k pageviews in German, compared with about 8k pageviews for the same article in English.

5. One similarity in the data is that the broadcaster Westdeutscher_Rundfunk draws a substantial number of pageviews in both languages, which suggests that it matters to both English-speaking and German-speaking readers.
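Figures like the ones in observations 2 to 4 (totals as a source, totals as a destination, and the most common pair) can be computed with a short helper along these lines. This is a sketch, not the code used to produce the numbers above: `summarize` is a hypothetical name, the rows are assumed to be (source, destination, link type, count) tuples, and the example values are made up for illustration.

```python
from collections import Counter


def summarize(rows, title):
    """Summarize traffic into and out of `title` from clickstream rows."""
    incoming = Counter()  # clicks per source that led to `title`
    outgoing = Counter()  # clicks per destination reached from `title`
    for prev, curr, _link_type, n in rows:
        if curr == title:
            incoming[prev] += n
        elif prev == title:
            outgoing[curr] += n
    return {
        "incoming_total": sum(incoming.values()),
        "outgoing_total": sum(outgoing.values()),
        "top_source": incoming.most_common(1)[0] if incoming else None,
        "top_destination": outgoing.most_common(1)[0] if outgoing else None,
    }


# Made-up rows for illustration only; these are not real dump values.
toy_rows = [
    ("other-internal", "Article_X", "external", 50),
    ("Article_A", "Article_X", "link", 5),
    ("Article_X", "Article_B", "link", 7),
]
print(summarize(toy_rows, "Article_X"))
```

Running the same helper on the rows from each language gives directly comparable summaries, which makes differences such as a missing outgoing side easy to spot.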

Future Analyses

TODO: Describe what additional patterns you might want to explore in the data (and why). You don't have to know how to do the analyses.

1) I am interested in learning how a trending topic becomes noticeable in the Wikipedia clickstream across different audiences, that is, across different languages. Secondly, I am curious to explore how human cognition might be reflected in Wikipedia clickstreams.