Coup baseline analysis

Coup revisions

Categories

Image usage

Category network

Tag network

Language-tag similarity

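import json
import pandas as pd

# Load the topic -> parent-language table (no header row; first column is the index).
topics_parent_lang_df = pd.read_csv('topics_parent_lang.csv', encoding='utf8', header=None, index_col=0)

# Column 1 holds the parent language for each topic.
topics_parent_lang_dict = topics_parent_lang_df[1].to_dict()
topics_parent_lang_dict

# Save the mapping as JSON for later reuse.
with open('topics_parent_lang.json', 'w') as f:
    json.dump(topics_parent_lang_dict, f)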

Get the names of all the images.

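We want to align image_tags_dict with the files in lang_images, but the filenames written to disk are Unicode-normalized differently from the names as they are encoded on Wikipedia. Writing to disk appears to have decomposed accented characters into base characters plus combining modifiers (consistent with NFD), whereas Wikipedia appears to use the composed representation (consistent with NFC). The two sets of names therefore have to be normalized to the same form before they can be matched.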
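One way to reconcile the two encodings is to normalize both sides to the same form before comparing. The sketch below is a minimal illustration using the standard-library `unicodedata` module. The sample filenames, the `disk_filenames` list, and the `nfc` helper are hypothetical, and it assumes that `image_tags_dict` is keyed by image name, which may not match the notebook's actual structure.

```python
import unicodedata


def nfc(name: str) -> str:
    """Return the composed (NFC) form of a filename."""
    return unicodedata.normalize("NFC", name)


# Hypothetical sample data: the tag dictionary uses the composed form,
# while the name read back from disk uses "e" plus a combining acute accent.
image_tags_dict = {"Caf\u00e9.jpg": ["building"], "Map.png": ["map"]}
disk_filenames = ["Cafe\u0301.jpg", "Map.png"]

# A naive lookup misses the decomposed name even though it renders identically.
naive_matches = [f for f in disk_filenames if f in image_tags_dict]

# Normalize both sides to NFC, then look up each disk filename.
tags_by_nfc_name = {nfc(name): tags for name, tags in image_tags_dict.items()}
aligned = {
    f: tags_by_nfc_name[nfc(f)]
    for f in disk_filenames
    if nfc(f) in tags_by_nfc_name
}
```

Normalizing to NFC on both sides avoids having to know which side was decomposed; normalizing both to NFD would work equally well as long as the choice is consistent.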


These images were downloaded and tagged but apparently do not appear in any of the languages.
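A check like this can be expressed as a set difference. The following is only a sketch under assumed data shapes: `tagged_images` and `images_by_lang` are hypothetical names and values, standing in for the tagged image names and a per-language mapping of the images in use.

```python
# Hypothetical inputs: the set of tagged image names and, for each language,
# the set of image names used in that language.
tagged_images = {"A.jpg", "B.png", "C.svg"}
images_by_lang = {
    "en": {"A.jpg"},
    "tr": {"A.jpg", "B.png"},
}

# Union of every image that appears in at least one language.
used_images = set().union(*images_by_lang.values())

# Tagged images that are not used in any language.
unused_images = sorted(tagged_images - used_images)
```

If the names come from different sources, they should be Unicode-normalized to the same form first, otherwise visually identical names may be reported as unused.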

Revision network

Image usage

Image-language network

Image-page network

Image-language network

Image-lang-tag tripartite network