EDA (exploratory data analysis) plays a crucial role in understanding the what, why, and how of the problem statement.

Importing the different modules and exploring their usage

Pandas is a widely used data manipulation library built on top of NumPy. Its key data structure, the DataFrame, stores tabular data as rows of observations and columns of variables, and provides methods to manipulate that data.
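
As a minimal sketch of the DataFrame idea (the column names and values below are hypothetical toy data, not the notebook's dataset), each row is one observation and each column one variable; rows can be filtered by a condition and new columns derived from existing ones:

```python
import pandas as pd

# Hypothetical toy data: each row is one observation, each column one variable.
df = pd.DataFrame({
    "title": ["Alpha", "Beta", "Gamma"],
    "views": [120, 45, 300],
    "language": ["en", "fr", "en"],
})

# Select the rows that meet a condition.
english = df[df["language"] == "en"]

# Derive a new column from an existing one.
df["views_share"] = df["views"] / df["views"].sum()

print(df.shape)                   # (3, 4)
print(english["title"].tolist())  # ['Alpha', 'Gamma']
```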

json is Python's built-in module for JSON (JavaScript Object Notation), a lightweight data-interchange format. I ran multiple examples to get acquainted with the JSON format, its syntax, and its data types: string, number, object (JSON object), array, boolean, and null.
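
The example below is a small sketch (the JSON document is made up for illustration) showing how each JSON data type maps to a Python type when parsed with `json.loads`, and how `json.dumps` maps Python `True`/`None` back to JSON `true`/`null`:

```python
import json

# A hypothetical JSON document that uses every JSON value type.
text = """
{
  "name": "example",
  "count": 3,
  "ratio": 0.5,
  "tags": ["a", "b"],
  "meta": {"active": true},
  "parent": null
}
"""

data = json.loads(text)

# Map each key to the name of the Python type it was parsed into.
types = {key: type(value).__name__ for key, value in data.items()}
print(types)
# {'name': 'str', 'count': 'int', 'ratio': 'float', 'tags': 'list', 'meta': 'dict', 'parent': 'NoneType'}

# Serialising back: Python True/None become JSON true/null.
print(json.dumps({"active": True, "parent": None}))
# {"active": true, "parent": null}
```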

re is Python's built-in module for working with regular expressions. I learned its different functions and special sequences.
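
A short sketch of a few common `re` functions and special sequences (`\d` for digits, `\s` for whitespace); the sample string is invented for illustration:

```python
import re

text = "Edited on 2019-03-14 and 2020-11-02 by User:Example"

# re.search returns the first match object (or None).
first = re.search(r"\d{4}-\d{2}-\d{2}", text)
print(first.group())  # 2019-03-14

# re.findall returns every non-overlapping match; with groups, each match is a tuple.
dates = re.findall(r"(\d{4})-(\d{2})-(\d{2})", text)
print(dates)  # [('2019', '03', '14'), ('2020', '11', '02')]

# re.sub replaces matches; \s+ matches one or more whitespace characters.
print(re.sub(r"\s+", "_", "a  b\tc"))  # a_b_c

# re.match only matches at the start of the string.
print(re.match(r"User", text))  # None
```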

Seaborn and Matplotlib are imported for visualization, because infographics convey analysis results more quickly and with more impact than text paragraphs describing the same results.

I imported the MediaWiki API and read through all of its sections and the actions it can perform, such as querying.
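
The notebook does not show which MediaWiki client was imported, so this sketch assumes nothing about it and uses only Python's standard library to build a MediaWiki Action API `action=query` URL. The endpoint (English Wikipedia) and the helper name `build_query_url` are hypothetical choices for illustration, and the code only builds the URL; it does not send a request:

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint: English Wikipedia's Action API.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles, prop="info"):
    """Hypothetical helper: build (but do not send) an action=query request URL."""
    params = {
        "action": "query",           # the Query module
        "format": "json",            # ask for a JSON response
        "titles": "|".join(titles),  # multiple titles are pipe-separated
        "prop": prop,                # which page properties to return
    }
    return API_URL + "?" + urlencode(params)


url = build_query_url(["Python (programming language)", "Pandas (software)"])
print(url)

# Decode the query string again to check what would be sent.
query = parse_qs(urlparse(url).query)
print(query["titles"])  # ['Python (programming language)|Pandas (software)']
```

Sending the URL with any HTTP client (for example `requests.get(url).json()`) would return the query result as JSON.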

Interaction with the API

To see what the data looks like, I ran the following:
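
The exact commands are not preserved here; as a sketch of a typical first look at a DataFrame, the code below uses a small hypothetical stand-in for the notebook's `Data` frame (its columns and values are invented):

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `Data` frame.
Data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "title": ["A", "B", "C", "D"],
    "size": [1500, 230, 870, 4100],
})

print(Data.head(2))     # first rows
print(Data.shape)       # (rows, columns) -> (4, 3)
print(Data.dtypes)      # type of each column
Data.info()             # non-null counts and memory usage
print(Data.describe())  # summary statistics for the numeric columns
```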

To check whether there are any missing or null values:
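
A minimal sketch of the usual missing-value checks in pandas, again on a hypothetical frame with a couple of deliberate gaps: `isnull().sum()` counts gaps per column, and `isnull().any(axis=1)` flags the rows that contain at least one:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few gaps.
Data = pd.DataFrame({
    "title": ["A", None, "C", "D"],
    "size": [1500, 230, np.nan, 4100],
    "lang": ["en", "fr", "de", "en"],
})

missing_per_column = Data.isnull().sum()  # count of NaN/None per column
print(missing_per_column)

print(Data.isnull().values.any())  # True if any cell is missing

# Rows containing at least one missing value.
rows_with_gaps = Data[Data.isnull().any(axis=1)]
print(rows_with_gaps.index.tolist())  # [1, 2]
```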

There are 11 columns.

Noted the observations....

As discussed in the earlier Contribution #1, in the Jupyter notebook named "kickstart- exploration of the data and other modules", stats is a Python dictionary, so accessing the value for each corresponding key is awkward. I therefore made each value of stats into a separate column in Data. The stats carry the most weight, because the whole analysis is built on them.
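
One way to turn a column of dictionaries into separate columns is `pd.json_normalize`; this is a sketch, not necessarily how the notebook did it. The frame below is hypothetical: the keys `any` and `mt` are borrowed from the heatmap code later in this section, the values are invented, and naming the expanded columns `Data_stats` is an assumption based on that code:

```python
import pandas as pd

# Hypothetical frame whose `stats` column holds one dictionary per row.
Data = pd.DataFrame({
    "title": ["A", "B", "C"],
    "stats": [
        {"any": 10, "mt": 4},
        {"any": 7, "mt": 0},
        {"any": 3, "mt": 1},
    ],
})

# Expand each dictionary into its own columns, aligned on the original index.
Data_stats = pd.json_normalize(Data["stats"].tolist())
Data_stats.index = Data.index

# Replace the dictionary column with the expanded columns.
Data = pd.concat([Data.drop(columns="stats"), Data_stats], axis=1)

print(Data.columns.tolist())  # ['title', 'any', 'mt']
```

Resetting `Data_stats.index` to `Data.index` matters because `pd.concat(axis=1)` aligns rows by index label, not by position.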

Sort by multiple columns:
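
A small sketch of a multi-column sort with `DataFrame.sort_values`, on hypothetical data: `by` takes a list of columns in priority order, and `ascending` takes a matching list of directions, so ties in the first column are broken by the second:

```python
import pandas as pd

# Hypothetical stats table.
Data = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "any": [10, 7, 10, 3],
    "mt": [4, 0, 1, 1],
})

# Sort by `any` descending; break ties with `mt` ascending.
ranked = Data.sort_values(by=["any", "mt"], ascending=[False, True])
print(ranked["title"].tolist())  # ['C', 'A', 'B', 'D']
```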

For a better comparison among the stats:

import matplotlib.pyplot as plt
import seaborn as sns

# One box plot per stats column, laid out in a grid four plots wide
l = Data_stats.columns.values
number_of_columns = 4
# Whole number of rows needed to hold every column (rounded up)
number_of_rows = (len(l) - 1) // number_of_columns + 1
# plt.figure makes the size apply to the plots below
plt.figure(figsize=(number_of_columns, 5 * number_of_rows))
sns.set_style('whitegrid')
for i in range(len(l)):
    plt.subplot(number_of_rows, number_of_columns, i + 1)
    sns.boxplot(y=Data[l[i]], color='green')
plt.tight_layout()

import matplotlib.pyplot as plt
import seaborn as sns

# Heatmap of the "any" and "mt" stats for the first 8 rows
plt.figure(figsize=(7, 7))
sns.heatmap(Data_stats[["any", "mt"]][:8], cmap='Blues', annot=False)

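import matplotlib.pyplot as plt

# Keep only the numeric columns
Data_num = Data.select_dtypes(include=['float64', 'int64'])
# Use the first 10 rows
Data_num1 = Data_num[:10]
# One histogram per numeric column
Data_num1.hist(figsize=(16, 20), bins=50, xlabelsize=8, ylabelsize=8)
plt.show()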