Studying the Wikipedia Clickstream

This notebook is a tutorial on studying reader behavior with the Wikipedia Clickstream, using the monthly dumps. It has three stages:

Accessing Wikipedia Clickstream Dumps

This is an example of how to parse the Wikipedia Clickstream dump for a wiki and gather the rows that involve a given article.
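As a minimal sketch of that parsing step, the snippet below assumes the dump is a tab-separated file with four fields per line (source, destination, link type, count). The function names, the toy titles, and the counts are hypothetical and only illustrate the shape of the data; the file path in the final comment is a placeholder.

```python
import gzip
import io


def open_dump(path):
    """Open a gzip-compressed dump as text (path is supplied by the caller)."""
    return gzip.open(path, "rt", encoding="utf-8")


def parse_clickstream(lines):
    """Yield (prev, curr, link_type, n) tuples from tab-separated lines."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            continue  # skip malformed rows
        prev, curr, link_type, n = parts
        if not n.isdigit():
            continue  # skip rows whose count is not a plain integer
        yield prev, curr, link_type, int(n)


def rows_for_article(lines, title):
    """Collect every parsed row where `title` is the source or the destination."""
    return [row for row in parse_clickstream(lines) if title in (row[0], row[1])]


# Toy stand-in for a dump file; titles and counts are made up.
sample = io.StringIO(
    "other-search\tArticle_A\texternal\t120\n"
    "Article_B\tArticle_A\tlink\t40\n"
    "Article_A\tArticle_C\tlink\t25\n"
    "Article_B\tArticle_C\tlink\t15\n"
)
article_rows = rows_for_article(sample, "Article_A")
print(article_rows)

# With a real dump (placeholder path), the same call would look like:
# with open_dump("path/to/clickstream-dump.tsv.gz") as f:
#     article_rows = rows_for_article(f, "Article_A")
```

Splitting on tabs by hand, rather than using a CSV reader with default quoting, avoids surprises when a title contains a quote character.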

Reading the Wikipedia Clickstream data

How many unique destinations are there?
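One way to answer this, sketched on toy rows, is to collect the destination field into a set. The rows below are hypothetical (source, destination, type, count) tuples, not values from the dump.

```python
# Hypothetical rows in (prev, curr, link_type, n) form.
rows = [
    ("other-search", "Article_A", "external", 120),
    ("Article_B", "Article_A", "link", 40),
    ("Article_A", "Article_C", "link", 25),
]

# A set keeps each destination title once, however many sources lead to it.
unique_destinations = {curr for _prev, curr, _type, _n in rows}
print(len(unique_destinations))
```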

What proportion of pageviews come from "other-search"?
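A rough sketch of this proportion: sum the counts whose source is "other-search" and divide by the sum of all counts. The helper name `source_share` and the toy rows are assumptions for illustration. Note that the result is a share of the counts recorded in the clickstream, which may differ from total pageviews if low-count pairs are omitted from the dump.

```python
def source_share(rows, source="other-search"):
    """Return the fraction of all recorded counts that come from `source`."""
    total = sum(n for _prev, _curr, _type, n in rows)
    from_source = sum(n for prev, _curr, _type, n in rows if prev == source)
    return from_source / total if total else 0.0


# Hypothetical rows in (prev, curr, link_type, n) form.
rows = [
    ("other-search", "Article_A", "external", 60),
    ("Article_B", "Article_A", "link", 30),
    ("other-empty", "Article_C", "external", 10),
]
share = source_share(rows)
print(f"{share:.1%}")
```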

Which source-destination pair is the most common?

Most common: (other-search, Richard_Ramirez), with a count of 4258415
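A lookup of this kind can be sketched with `collections.Counter`, keyed on the (source, destination) pair. The rows below are toy values, not the dump itself, so the printed pair is illustrative only.

```python
from collections import Counter

# Hypothetical rows in (prev, curr, link_type, n) form.
rows = [
    ("other-search", "Article_A", "external", 120),
    ("Article_B", "Article_A", "link", 40),
    ("Article_A", "Article_C", "link", 25),
]

# Sum counts per pair, in case a pair appears on more than one row.
pair_counts = Counter()
for prev, curr, _type, n in rows:
    pair_counts[(prev, curr)] += n

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```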

The Popular Article

Validate that the article is truly popular
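One possible check, sketched under the assumption that "popular" means a high total of incoming counts, is to sum the counts per destination and look at where the article ranks. The helpers `incoming_totals` and `rank_of` and the toy rows are hypothetical.

```python
from collections import Counter


def incoming_totals(rows):
    """Sum the counts of all rows arriving at each destination."""
    totals = Counter()
    for _prev, curr, _type, n in rows:
        totals[curr] += n
    return totals


def rank_of(totals, title):
    """Return the 1-based rank of `title` by incoming count, or None if absent."""
    ranked = [name for name, _count in totals.most_common()]
    return ranked.index(title) + 1 if title in ranked else None


# Hypothetical rows in (prev, curr, link_type, n) form.
rows = [
    ("other-search", "Article_A", "external", 120),
    ("Article_B", "Article_A", "link", 40),
    ("Article_A", "Article_C", "link", 25),
    ("Article_B", "Article_C", "link", 15),
]
totals = incoming_totals(rows)
print(totals["Article_A"], rank_of(totals, "Article_A"))
```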

Retrieve rows containing the popular article
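A sketch of this retrieval step: keep the rows where the article is the destination (incoming traffic) separately from those where it is the source (outgoing traffic), each sorted by count. The helper name `split_rows` and the sample rows are assumptions for illustration.

```python
def split_rows(rows, title):
    """Return (incoming, outgoing) rows for `title`, largest counts first."""
    incoming = sorted((r for r in rows if r[1] == title), key=lambda r: -r[3])
    outgoing = sorted((r for r in rows if r[0] == title), key=lambda r: -r[3])
    return incoming, outgoing


# Hypothetical rows in (prev, curr, link_type, n) form.
rows = [
    ("Article_B", "Article_A", "link", 40),
    ("other-search", "Article_A", "external", 120),
    ("Article_A", "Article_C", "link", 25),
    ("Article_B", "Article_C", "link", 15),
]
incoming, outgoing = split_rows(rows, "Article_A")
print(len(incoming), len(outgoing))
```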

Constructing the multipartite graph with NetworkX

Graph Explanation

The graph is multipartite, or more precisely tripartite: we divide the articles into three groups:

We arrange these groups from left to right in that order, so the arrows generally point from left to right as well.

However, some articles are both a source and a destination, which shows up as arrows pointing from the middle to the left.
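The layering described above can be sketched without any graph library. This example assumes the three groups are sources (left, layer 0), the popular article (middle, layer 1), and destinations (right, layer 2), and that an article which is both a source and a destination stays on the left. The function name `tripartite_layers` and the rows are hypothetical.

```python
def tripartite_layers(rows, center):
    """Assign each article a layer and collect the edges touching `center`."""
    layers = {center: 1}
    edges = []
    # Sources of the center article go on the left.
    for prev, curr, _type, n in rows:
        if curr == center and prev != center:
            layers[prev] = 0
            edges.append((prev, center, n))
    # Destinations go on the right unless they are already placed as sources.
    for prev, curr, _type, n in rows:
        if prev == center and curr != center:
            layers.setdefault(curr, 2)
            edges.append((center, curr, n))
    return layers, edges


# Hypothetical rows in (prev, curr, link_type, n) form.
rows = [
    ("other-search", "Article_X", "external", 100),
    ("Article_B", "Article_X", "link", 30),
    ("Article_X", "Article_C", "link", 20),
    ("Article_X", "Article_B", "link", 10),
    ("Article_B", "Article_C", "link", 5),  # does not touch the center; ignored
]
layers, edges = tripartite_layers(rows, "Article_X")

# Edges that run from the middle back to the left layer.
backward = [(u, v, n) for u, v, n in edges if layers[u] > layers[v]]
print(layers)
print(backward)
```

With NetworkX, the layer numbers could be stored as a node attribute and passed to `multipartite_layout` through its `subset_key` argument to obtain the left-to-right positions.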

For the graph below,