{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# COVID-19 Related Articles (Strongly related)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(Updated March 21st) New definition. For the former definition, see this [notebook](https://paws-public.wmflabs.org/paws/user/Diego_%28WMF%29/notebooks/CoronaVirusStronglyRelatedPages-FisrtMethodologyMarch18.ipynb)\n", "\n", "Here we focus on articles related to COVID-19 (Q84263196) and the 2019–20 COVID-19 pandemic (Q81068910), selecting them through strong Wikidata connections such as \"main subject\", \"part of\" or \"has cause\".\n", "\n", "This approach tries to solve the \"Tom Hanks problem\": articles about celebrities (such as Tom Hanks) that are linked to COVID-19 only through \"medical condition\" or other weak connections.\n", "For the full list of articles related to COVID-19, see [this notebook](https://paws-public.wmflabs.org/paws/user/Diego_%28WMF%29/notebooks/Corona%20virus%20related%20pages.ipynb)\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: SPARQLWrapper in /srv/paws/lib/python3.6/site-packages\r\n", "Requirement already satisfied: rdflib>=4.0 in /srv/paws/lib/python3.6/site-packages (from SPARQLWrapper)\r\n", "Requirement already satisfied: pyparsing in /srv/paws/lib/python3.6/site-packages (from rdflib>=4.0->SPARQLWrapper)\r\n", "Requirement already satisfied: isodate in /srv/paws/lib/python3.6/site-packages (from rdflib>=4.0->SPARQLWrapper)\r\n", "Requirement already satisfied: six in /srv/paws/lib/python3.6/site-packages (from isodate->rdflib>=4.0->SPARQLWrapper)\r\n" ] } ], "source": [ "#install dependencies\n", "!pip install SPARQLWrapper" ] }, { "cell_type": "code", 
"execution_count": 17, "metadata": {}, "outputs": [], "source": [ "from SPARQLWrapper import SPARQLWrapper, JSON\n", "import pandas as pd\n", "\n", "sparql = SPARQLWrapper(\"https://query.wikidata.org/sparql\")\n", "#https://w.wiki/KvX (Thanks User:Dipsacus_fullonum)\n", "# All statements with item, property, value and rank with COVID-19 (Q84263196) as value for qualifier.\n", "\n", "sparql.setQuery(\"\"\"\n", "SELECT ?item ?itemLabel ?property ?propertyLabel ?value ?valueLabel ?rank ?qualifier ?qualifierLabel\n", "WHERE\n", "{\n", " ?item ?claim ?statement.\n", " ?property wikibase:claim ?claim.\n", " ?property wikibase:statementProperty ?sprop.\n", " ?statement ?sprop ?value.\n", " ?statement wikibase:rank ?rank. \n", " ?statement ?qprop wd:Q84263196. # COVID-19\n", "\n", " \n", " ?qualifier wikibase:qualifier ?qprop.\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\". }\n", "}\n", "\"\"\")\n", "sparql.setReturnFormat(JSON)\n", "results = sparql.query().convert()\n", "\n", "allStatements = pd.io.json.json_normalize(results['results']['bindings'])\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "disease outbreak 370\n", "human 7\n", "treatment 2\n", "drug repositioning 1\n", "diagnostic test 1\n", "pneumonia 1\n", "medical diagnosis 1\n", "mascot character 1\n", "2020-03-05T00:00:00Z 1\n", "vaccine 1\n", "hierarchy of hazard controls 1\n", "drug development 1\n", "pandemic 1\n", "moe anthropomorphic character 1\n", "Name: valueLabel.value, dtype: int64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "allStatements['valueLabel.value'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Example of pages main subject COVID-19" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " item.value \\\n", "0 http://www.wikidata.org/entity/Q88973921 \n", "1 http://www.wikidata.org/entity/Q88976960 \n", "2 http://www.wikidata.org/entity/Q86901049 \n", "3 http://www.wikidata.org/entity/Q87343682 \n", "4 http://www.wikidata.org/entity/Q87402404 \n", "5 http://www.wikidata.org/entity/Q87406428 \n", "6 http://www.wikidata.org/entity/Q87412028 \n", "7 http://www.wikidata.org/entity/Q87409953 \n", "8 http://www.wikidata.org/entity/Q87418063 \n", "9 http://www.wikidata.org/entity/Q87414741 \n", "\n", " itemLabel.value \n", "0 2020 coronavirus pandemic in Manitoba \n", "1 2020 coronavirus pandemic in Saint Vincent and... \n", "2 COVID-19 testing \n", "3 2020 coronavirus outbreak in Tunisia \n", "4 2019–20 coronavirus outbreak in Hubei \n", "5 2020 coronavirus outbreak in Americas \n", "6 2020 coronavirus outbreak in Brunei \n", "7 2020 coronavirus pandemic in Washington \n", "8 Category:Coronavirus disease 2019 survivors \n", "9 2020 coronavirus pandemic in New York " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "allStatements[['item.value','itemLabel.value']].head(10)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# All truthy statements with COVID-19 (Q84263196) as value.\n", "#https://w.wiki/KvZ (Thanks User:Dipsacus_fullonum)\n", "\n", "sparql.setQuery(\"\"\"\n", "SELECT ?item ?itemLabel ?property ?propertyLabel\n", "WHERE\n", "{\n", " ?item ?claim wd:Q84263196.\n", " ?property wikibase:directClaim ?claim.\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\". }\n", "}\"\"\")\n", "sparql.setReturnFormat(JSON)\n", "results = sparql.query().convert()\n", "\n", "truthy = pd.io.json.json_normalize(results['results']['bindings'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Example All truthy statements with COVID-19 (Q84263196) as value." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " item.value \\\n", "257 http://www.wikidata.org/entity/Q5753193 \n", "1439 http://www.wikidata.org/entity/Q88975335 \n", "30 http://www.wikidata.org/entity/Q321770 \n", "1627 http://www.wikidata.org/entity/Q88976733 \n", "1129 http://www.wikidata.org/entity/Q88876722 \n", "1698 http://www.wikidata.org/entity/Q88977417 \n", "374 http://www.wikidata.org/entity/Q18275390 \n", "1486 http://www.wikidata.org/entity/Q88975920 \n", "282 http://www.wikidata.org/entity/Q6925992 \n", "123 http://www.wikidata.org/entity/Q3390432 \n", "\n", " itemLabel.value propertyLabel.value \n", "257 Carolina Darias San Sebastián medical condition \n", "1439 Perceptions of the Adult US Population regardi... main subject \n", "30 Aaron Tveit medical condition \n", "1627 Protocol for a randomized controlled trial tes... main subject \n", "1129 Narciso Arranz Cerezo cause of death \n", "1698 SARS-CoV-2 receptor ACE2 and TMPRSS2 are predo... main subject \n", "374 Steve Padilla medical condition \n", "1486 Lymphopenia predicts disease severity of COVID... 
main subject \n", "282 Mousa Shubairi Zanjani medical condition \n", "123 Benet Joanet Jiménez cause of death " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "truthy[['item.value','itemLabel.value','propertyLabel.value']].sample(10).head(10)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "main subject 1055\n", "medical condition 553\n", "cause of death 208\n", "has cause 198\n", "research intervention 13\n", "category combines topics 3\n", "has effect 3\n", "named after 3\n", "facet of 3\n", "different from 3\n", "has immediate cause 2\n", "medical condition treated 1\n", "Wikimedia portal's main topic 1\n", "category's main topic 1\n", "item for this sense 1\n", "represents 1\n", "interested in 1\n", "instance of 1\n", "vaccine for 1\n", "Name: propertyLabel.value, dtype: int64" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "truthy['propertyLabel.value'].value_counts()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " itemLabel.value propertyLabel.value\n", "537 Recent advances in the detection of respirator... main subject\n", "538 The continuing 2019-nCoV epidemic threat of no... main subject\n", "539 Clinical features of patients infected with 20... main subject\n", "540 Early Transmission Dynamics in Wuhan, China, o... main subject\n", "541 2019-nCoV, first death outside China main subject" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "truthy[truthy['propertyLabel.value'] == 'main subject'][['itemLabel.value','propertyLabel.value']].head(5)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " itemLabel.value propertyLabel.value\n", "510 2019–20 COVID-19 pandemic has cause\n", "511 2019–20 coronavirus pandemic in mainland China has cause\n", "512 2019–20 coronavirus outbreak in Japan has cause\n", "513 2019–20 COVID-19 outbreak in South Korea has cause\n", "514 2019–20 coronavirus outbreak in Vietnam has cause" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "truthy[truthy['propertyLabel.value'] == 'has cause'][['itemLabel.value','propertyLabel.value']].head(5)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " itemLabel.value propertyLabel.value\n", "594 timeline of the 2019–20 coronavirus pandemic facet of\n", "595 SARS-CoV-2 transmission facet of\n", "1096 Ordinance of January 30, 2020 facet of" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "truthy[truthy['propertyLabel.value'] == ('facet of' or 'main subject')][['itemLabel.value','propertyLabel.value']].head(5)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " itemLabel.value propertyLabel.value\n", "12 Mikel Arteta medical condition\n", "13 Rüştü Reçber medical condition\n", "14 Marko Pantelić medical condition\n", "15 Suk Hyun-jun medical condition\n", "16 Thomas Kahlenberg medical condition" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "truthy[truthy['propertyLabel.value'] == 'medical condition'][['itemLabel.value','propertyLabel.value']].head(5)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "#Remove medical condition and cause of death\n", "mainSubject = truthy[truthy['propertyLabel.value'] != 'medical condition']\n", "mainSubject = mainSubject[mainSubject['propertyLabel.value'] != 'cause of death']\n", "mainSubject = mainSubject[mainSubject['propertyLabel.value'] != 'medical condition treated']" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " itemLabel.value propertyLabel.value\n", "1442 Modelling the coronavirus disease (COVID-19) o... main subject\n", "1221 Clinical characteristics of 2019 novel coronav... main subject\n", "1205 Preliminary identification of potential vaccin... main subject\n", "835 Clinical characteristics of 24 asymptomatic in... main subject\n", "1783 SARS-CoV-2 specific antibody responses in COVI... main subject" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mainSubject[['itemLabel.value','propertyLabel.value']].sample(5).head(5)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "#All truthy statements with 2019–20 COVID-19 pandemic (Q81068910) as value.\n", "#https://w.wiki/Kvd (Thanks User:Dipsacus_fullonum)\n", "\n", "sparql.setQuery(\"\"\"\n", "# \n", "SELECT ?item ?itemLabel ?property ?propertyLabel WHERE {\n", " ?item ?claim wd:Q81068910. #2019–20 COVID-19 pandemic\n", " ?property wikibase:directClaim ?claim.\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\". 
}\n", "}\n", "\"\"\")\n", "sparql.setReturnFormat(JSON)\n", "results = sparql.query().convert()\n", "\n", "Q81068910 = pd.io.json.json_normalize(results['results']['bindings'])" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "main subject 991\n", "part of 532\n", "facet of 40\n", "category combines topics 21\n", "has cause 11\n", "significant event 8\n", "has effect 3\n", "has contributing factor 3\n", "category contains 2\n", "field of work 2\n", "different from 2\n", "template's main topic 1\n", "interested in 1\n", "notable works 1\n", "category's main topic 1\n", "Wikimedia portal's main topic 1\n", "Name: propertyLabel.value, dtype: int64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Q81068910['propertyLabel.value'].value_counts()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " itemLabel.value propertyLabel.value\n", "2 2019–20 coronavirus pandemic in mainland China part of\n", "3 2019–20 coronavirus outbreak in Japan part of\n", "4 2019–20 COVID-19 outbreak in South Korea part of\n", "5 2019–20 coronavirus outbreak in Vietnam part of\n", "6 2019–20 coronavirus outbreak in Singapore part of" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Q81068910[Q81068910['propertyLabel.value'] == 'part of'][['itemLabel.value','propertyLabel.value']].head(5)\n" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " itemLabel.value propertyLabel.value\n", "33 2020 Hubei lockdowns has cause\n", "34 Huoshenshan Hospital has cause\n", "35 the Central Leading Group for the Response to ... has cause\n", "36 Leishenshan Hospital has cause\n", "37 evacuations related to the 2019–20 coronavirus... has cause" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Q81068910[Q81068910['propertyLabel.value'] == 'has cause'][['itemLabel.value','propertyLabel.value']].head(5)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " itemLabel.value propertyLabel.value\n", "297 Black Monday has contributing factor\n", "298 Black Thursday has contributing factor\n", "299 2020 stock market crash has contributing factor" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Q81068910[Q81068910['propertyLabel.value'] == 'has contributing factor'][['itemLabel.value','propertyLabel.value']].head(5)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "#removing not strong connections such.\n", "Q81068910Strong = Q81068910[Q81068910['propertyLabel.value'] != 'field of work']\n", "Q81068910Strong = Q81068910Strong[Q81068910Strong['propertyLabel.value'] != 'interested in']\n", "Q81068910Strong = Q81068910Strong[Q81068910Strong['propertyLabel.value'] != 'notable works']\n", "Q81068910Strong = Q81068910Strong[Q81068910Strong['propertyLabel.value'] != 'Wikimedia portal\\'s main topic']\n", "Q81068910Strong = Q81068910Strong[Q81068910Strong['propertyLabel.value'] != 'category combines topics']\n", "\n" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "#Getting Qs ids\n", "mainSubjectQ = [ link.split('/')[-1] for link in mainSubject['item.value'].tolist()]\n", "allStatementsQ = [ link.split('/')[-1] for link in allStatements['item.value'].tolist()]\n", "Q81068910StrongQ = [ link.split('/')[-1] for link in Q81068910Strong['item.value'].tolist()]\n", "## merging both sets\n", "#adding both sets & seeds\n", "strongQs = set(mainSubjectQ).union(allStatementsQ).union({'Q84263196','Q83741704'})\n" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "import pickle\n", "with open('strongQsCovid-19_20200325.pickle','wb') as f:\n", " pickle.dump(strongQs,f)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "#On the strong set\n", "import requests\n", "sitelinks_base = 
'https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&props=sitelinks&ids=' \n", "sitelinks = []\n", "\n", "for Q in strongQs:\n", " url = sitelinks_base + Q\n", " sitelinks.append(requests.get(url=url).json())\n", "\n", "pagesPerProject = {}\n", "for s in sitelinks:\n", " if 'entities' in s:\n", " for k,v in s['entities'].items():\n", " if 'sitelinks' in v:\n", " for wiki,data in v['sitelinks'].items():\n", " page = data['title']\n", " project ='%s.wikipedia' % wiki.replace('wiki','')\n", " pagesPerProject[project] = pagesPerProject.get(project,[])\n", " pagesPerProject[project].append(page)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Write Wikitext" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "with open('pagesPerProjectStronglyRelated20200402.wikitext','w') as f:\n", " for project, pages in pagesPerProject.items():\n", " projectcode = project.split('.')[0]\n", " f.write('\\n== %s == \\n \\n' % project )\n", " for page in pages:\n", " f.write('* [[%s:%s|%s]]\\n' % (projectcode,page,page)) " ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "import pickle\n", "with open('pagesPerProjectStronglyRelated20200402.pickle','wb') as f:\n", " pickle.dump(pagesPerProject,f)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }
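The label filtering and Q-id extraction steps of this notebook can be sketched on a small hand-made frame. The rows below are illustrative stand-ins shaped like the normalized SPARQL bindings (not real query output), and `WEAK_LABELS` and `strong_qids` are hypothetical names introduced only for this sketch.

```python
import pandas as pd

# Illustrative rows shaped like the normalized SPARQL bindings; the values
# are made-up stand-ins, not real query output.
rows = pd.DataFrame({
    "item.value": [
        "http://www.wikidata.org/entity/Q1",
        "http://www.wikidata.org/entity/Q2",
        "http://www.wikidata.org/entity/Q3",
        "http://www.wikidata.org/entity/Q4",
    ],
    "propertyLabel.value": [
        "main subject", "medical condition", "facet of", "cause of death",
    ],
})

# Hypothetical name: property labels treated as weak connections.
WEAK_LABELS = ["medical condition", "cause of death", "medical condition treated"]


def strong_qids(frame, weak_labels):
    """Drop rows with a weak property label and return the remaining Q-ids."""
    strong = frame[~frame["propertyLabel.value"].isin(weak_labels)]
    # The Q-id is the last path segment of the entity URI.
    return {uri.rsplit("/", 1)[-1] for uri in strong["item.value"]}


# Matching several labels at once needs isin(); in plain Python the
# expression ('facet of' or 'main subject') is just 'facet of'.
several = rows[rows["propertyLabel.value"].isin(["facet of", "main subject"])]

print(strong_qids(rows, WEAK_LABELS))
print(len(several))
```

Keeping the weak labels in one list makes the definition of "strongly related" easy to review and adjust in a single place.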
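The sitelinks loop in this notebook issues one `wbgetentities` request per item and maps every sitelink key to a `.wikipedia` project. A possible variant, sketched below, batches the ids (the API accepts several ids joined with `|`; the batch size of 50 is what I understand the limit for regular clients to be) and keeps only keys that look like Wikipedia editions. The helper names, the `User-Agent` string and the `NON_WIKIPEDIA` exclusion set are assumptions made for this sketch, and the exclusion set is not exhaustive.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"
# Assumption: site ids ending in "wiki" that are not Wikipedia editions (not exhaustive).
NON_WIKIPEDIA = {"commonswiki", "specieswiki", "metawiki", "wikidatawiki"}


def chunks(ids, size=50):
    """Yield successive lists of at most `size` ids."""
    ids = sorted(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_entities(batch):
    """Fetch the sitelinks of one batch of Q-ids (performs a network request)."""
    query = urlencode({
        "action": "wbgetentities",
        "format": "json",
        "props": "sitelinks",
        "ids": "|".join(batch),
    })
    # Hypothetical User-Agent value; replace it with one identifying your tool.
    request = Request(API + "?" + query, headers={"User-Agent": "covid-pages-sketch/0.1"})
    with urlopen(request) as response:
        return json.load(response).get("entities", {})


def pages_per_project(entities):
    """Group Wikipedia page titles by project, skipping sister projects."""
    grouped = {}
    for entity in entities.values():
        for site, data in entity.get("sitelinks", {}).items():
            if not site.endswith("wiki") or site in NON_WIKIPEDIA:
                continue
            project = site[:-len("wiki")] + ".wikipedia"
            grouped.setdefault(project, []).append(data["title"])
    return grouped


# Offline example with a hand-written response fragment (illustrative data).
sample = {
    "Q1": {"sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Example"},
        "enwikiquote": {"site": "enwikiquote", "title": "Example"},
        "commonswiki": {"site": "commonswiki", "title": "Category:Example"},
    }},
    "Q2": {"id": "Q2", "missing": ""},
}
print(pages_per_project(sample))
print([len(batch) for batch in chunks({"Q%d" % i for i in range(120)})])
```

With the real data, each batch from `chunks(strongQs)` would be passed to `fetch_entities` and the returned dictionaries merged before calling `pages_per_project`; that part is left out here because it needs network access.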