{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# COVID-19 Related Articles (Strongly related)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(Updated March 21th) New definition. To find the former definition go to this [notebook](https://paws-public.wmflabs.org/paws/user/Diego_%28WMF%29/notebooks/CoronaVirusStronglyRelatedPages-FisrtMethodologyMarch18.ipynb)\n",
"\n",
"Here we focus in articles related to COVID-19 (Q84263196) and 2019–20 COVID-19 pandemic (Q81068910). Finding relevant connections such as \"Main Subject\", \"Part Of\" or \"Has caused\".\n",
"\n",
"This approach tries to solve the \"Tom Hanks' problem\", that is articles about celebrities \"Medical Condition\" such as (Tom Hanks) or other weak connections with COVID.\n",
"To get a full list of articles realted with COVID-19 go to [this notebook](https://paws-public.wmflabs.org/paws/user/Diego_%28WMF%29/notebooks/Corona%20virus%20related%20pages.ipynb)\n"
]
},
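{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the idea above (toy data, not query results): each row links an item to COVID-19 through one property, and rows whose property is a weak connection are dropped. The rows and the `weak_properties` list are illustrative assumptions; the column names match the ones used later in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Toy rows for illustration only (hypothetical, not fetched from Wikidata)\n",
"toy = pd.DataFrame({\n",
"    'itemLabel.value': ['Tom Hanks', 'Example paper', 'Example outbreak'],\n",
"    'propertyLabel.value': ['medical condition', 'main subject', 'has cause'],\n",
"})\n",
"\n",
"# Assumed list of weak connections to exclude\n",
"weak_properties = ['medical condition', 'cause of death']\n",
"\n",
"# Keep only the rows whose property is not a weak connection\n",
"strong = toy[~toy['propertyLabel.value'].isin(weak_properties)]\n",
"strong"
]
},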
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: SPARQLWrapper in /srv/paws/lib/python3.6/site-packages\r\n",
"Requirement already satisfied: rdflib>=4.0 in /srv/paws/lib/python3.6/site-packages (from SPARQLWrapper)\r\n",
"Requirement already satisfied: pyparsing in /srv/paws/lib/python3.6/site-packages (from rdflib>=4.0->SPARQLWrapper)\r\n",
"Requirement already satisfied: isodate in /srv/paws/lib/python3.6/site-packages (from rdflib>=4.0->SPARQLWrapper)\r\n",
"Requirement already satisfied: six in /srv/paws/lib/python3.6/site-packages (from isodate->rdflib>=4.0->SPARQLWrapper)\r\n"
]
}
],
"source": [
"#install dependencies\n",
"!pip install SPARQLWrapper"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"from SPARQLWrapper import SPARQLWrapper, JSON\n",
"import pandas as pd\n",
"\n",
"sparql = SPARQLWrapper(\"https://query.wikidata.org/sparql\")\n",
"#https://w.wiki/KvX (Thanks User:Dipsacus_fullonum)\n",
"# All statements with item, property, value and rank with COVID-19 (Q84263196) as value for qualifier.\n",
"\n",
"sparql.setQuery(\"\"\"\n",
"SELECT ?item ?itemLabel ?property ?propertyLabel ?value ?valueLabel ?rank ?qualifier ?qualifierLabel\n",
"WHERE\n",
"{\n",
" ?item ?claim ?statement.\n",
" ?property wikibase:claim ?claim.\n",
" ?property wikibase:statementProperty ?sprop.\n",
" ?statement ?sprop ?value.\n",
" ?statement wikibase:rank ?rank. \n",
" ?statement ?qprop wd:Q84263196. # COVID-19\n",
"\n",
" \n",
" ?qualifier wikibase:qualifier ?qprop.\n",
" SERVICE wikibase:label { bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\". }\n",
"}\n",
"\"\"\")\n",
"sparql.setReturnFormat(JSON)\n",
"results = sparql.query().convert()\n",
"\n",
"allStatements = pd.io.json.json_normalize(results['results']['bindings'])\n"
]
},
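{
"cell_type": "markdown",
"metadata": {},
"source": [
"The query, convert and normalize steps above are repeated for each query in this notebook. A possible way to wrap them, as a sketch only: `run_query` is a hypothetical helper name, and the `json_normalize` lookup assumes that newer pandas versions expose the function at the top level while older ones keep it under `pd.io.json`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from SPARQLWrapper import SPARQLWrapper, JSON\n",
"\n",
"# Use the top-level json_normalize when available, else the older location\n",
"json_normalize = getattr(pd, 'json_normalize', None) or pd.io.json.json_normalize\n",
"\n",
"def run_query(query, endpoint='https://query.wikidata.org/sparql'):\n",
"    # Hypothetical helper: run a SPARQL query and return its bindings as a flat DataFrame\n",
"    client = SPARQLWrapper(endpoint)\n",
"    client.setQuery(query)\n",
"    client.setReturnFormat(JSON)\n",
"    results = client.query().convert()\n",
"    return json_normalize(results['results']['bindings'])\n",
"\n",
"# Example usage (P921 is the 'main subject' property):\n",
"# sample = run_query('SELECT ?item WHERE { ?item wdt:P921 wd:Q84263196 } LIMIT 5')"
]
},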
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"disease outbreak 370\n",
"human 7\n",
"treatment 2\n",
"drug repositioning 1\n",
"diagnostic test 1\n",
"pneumonia 1\n",
"medical diagnosis 1\n",
"mascot character 1\n",
"2020-03-05T00:00:00Z 1\n",
"vaccine 1\n",
"hierarchy of hazard controls 1\n",
"drug development 1\n",
"pandemic 1\n",
"moe anthropomorphic character 1\n",
"Name: valueLabel.value, dtype: int64"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"allStatements['valueLabel.value'].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Example of pages main subject COVID-19"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" item.value | \n",
" itemLabel.value | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" http://www.wikidata.org/entity/Q88973921 | \n",
" 2020 coronavirus pandemic in Manitoba | \n",
"
\n",
" \n",
" 1 | \n",
" http://www.wikidata.org/entity/Q88976960 | \n",
" 2020 coronavirus pandemic in Saint Vincent and... | \n",
"
\n",
" \n",
" 2 | \n",
" http://www.wikidata.org/entity/Q86901049 | \n",
" COVID-19 testing | \n",
"
\n",
" \n",
" 3 | \n",
" http://www.wikidata.org/entity/Q87343682 | \n",
" 2020 coronavirus outbreak in Tunisia | \n",
"
\n",
" \n",
" 4 | \n",
" http://www.wikidata.org/entity/Q87402404 | \n",
" 2019–20 coronavirus outbreak in Hubei | \n",
"
\n",
" \n",
" 5 | \n",
" http://www.wikidata.org/entity/Q87406428 | \n",
" 2020 coronavirus outbreak in Americas | \n",
"
\n",
" \n",
" 6 | \n",
" http://www.wikidata.org/entity/Q87412028 | \n",
" 2020 coronavirus outbreak in Brunei | \n",
"
\n",
" \n",
" 7 | \n",
" http://www.wikidata.org/entity/Q87409953 | \n",
" 2020 coronavirus pandemic in Washington | \n",
"
\n",
" \n",
" 8 | \n",
" http://www.wikidata.org/entity/Q87418063 | \n",
" Category:Coronavirus disease 2019 survivors | \n",
"
\n",
" \n",
" 9 | \n",
" http://www.wikidata.org/entity/Q87414741 | \n",
" 2020 coronavirus pandemic in New York | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" item.value \\\n",
"0 http://www.wikidata.org/entity/Q88973921 \n",
"1 http://www.wikidata.org/entity/Q88976960 \n",
"2 http://www.wikidata.org/entity/Q86901049 \n",
"3 http://www.wikidata.org/entity/Q87343682 \n",
"4 http://www.wikidata.org/entity/Q87402404 \n",
"5 http://www.wikidata.org/entity/Q87406428 \n",
"6 http://www.wikidata.org/entity/Q87412028 \n",
"7 http://www.wikidata.org/entity/Q87409953 \n",
"8 http://www.wikidata.org/entity/Q87418063 \n",
"9 http://www.wikidata.org/entity/Q87414741 \n",
"\n",
" itemLabel.value \n",
"0 2020 coronavirus pandemic in Manitoba \n",
"1 2020 coronavirus pandemic in Saint Vincent and... \n",
"2 COVID-19 testing \n",
"3 2020 coronavirus outbreak in Tunisia \n",
"4 2019–20 coronavirus outbreak in Hubei \n",
"5 2020 coronavirus outbreak in Americas \n",
"6 2020 coronavirus outbreak in Brunei \n",
"7 2020 coronavirus pandemic in Washington \n",
"8 Category:Coronavirus disease 2019 survivors \n",
"9 2020 coronavirus pandemic in New York "
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"allStatements[['item.value','itemLabel.value']].head(10)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"# All truthy statements with COVID-19 (Q84263196) as value.\n",
"#https://w.wiki/KvZ (Thanks User:Dipsacus_fullonum)\n",
"\n",
"sparql.setQuery(\"\"\"\n",
"SELECT ?item ?itemLabel ?property ?propertyLabel\n",
"WHERE\n",
"{\n",
" ?item ?claim wd:Q84263196.\n",
" ?property wikibase:directClaim ?claim.\n",
" SERVICE wikibase:label { bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\". }\n",
"}\"\"\")\n",
"sparql.setReturnFormat(JSON)\n",
"results = sparql.query().convert()\n",
"\n",
"truthy = pd.io.json.json_normalize(results['results']['bindings'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Example All truthy statements with COVID-19 (Q84263196) as value."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" item.value | \n",
" itemLabel.value | \n",
" propertyLabel.value | \n",
"
\n",
" \n",
" \n",
" \n",
" 257 | \n",
" http://www.wikidata.org/entity/Q5753193 | \n",
" Carolina Darias San Sebastián | \n",
" medical condition | \n",
"
\n",
" \n",
" 1439 | \n",
" http://www.wikidata.org/entity/Q88975335 | \n",
" Perceptions of the Adult US Population regardi... | \n",
" main subject | \n",
"
\n",
" \n",
" 30 | \n",
" http://www.wikidata.org/entity/Q321770 | \n",
" Aaron Tveit | \n",
" medical condition | \n",
"
\n",
" \n",
" 1627 | \n",
" http://www.wikidata.org/entity/Q88976733 | \n",
" Protocol for a randomized controlled trial tes... | \n",
" main subject | \n",
"
\n",
" \n",
" 1129 | \n",
" http://www.wikidata.org/entity/Q88876722 | \n",
" Narciso Arranz Cerezo | \n",
" cause of death | \n",
"
\n",
" \n",
" 1698 | \n",
" http://www.wikidata.org/entity/Q88977417 | \n",
" SARS-CoV-2 receptor ACE2 and TMPRSS2 are predo... | \n",
" main subject | \n",
"
\n",
" \n",
" 374 | \n",
" http://www.wikidata.org/entity/Q18275390 | \n",
" Steve Padilla | \n",
" medical condition | \n",
"
\n",
" \n",
" 1486 | \n",
" http://www.wikidata.org/entity/Q88975920 | \n",
" Lymphopenia predicts disease severity of COVID... | \n",
" main subject | \n",
"
\n",
" \n",
" 282 | \n",
" http://www.wikidata.org/entity/Q6925992 | \n",
" Mousa Shubairi Zanjani | \n",
" medical condition | \n",
"
\n",
" \n",
" 123 | \n",
" http://www.wikidata.org/entity/Q3390432 | \n",
" Benet Joanet Jiménez | \n",
" cause of death | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" item.value \\\n",
"257 http://www.wikidata.org/entity/Q5753193 \n",
"1439 http://www.wikidata.org/entity/Q88975335 \n",
"30 http://www.wikidata.org/entity/Q321770 \n",
"1627 http://www.wikidata.org/entity/Q88976733 \n",
"1129 http://www.wikidata.org/entity/Q88876722 \n",
"1698 http://www.wikidata.org/entity/Q88977417 \n",
"374 http://www.wikidata.org/entity/Q18275390 \n",
"1486 http://www.wikidata.org/entity/Q88975920 \n",
"282 http://www.wikidata.org/entity/Q6925992 \n",
"123 http://www.wikidata.org/entity/Q3390432 \n",
"\n",
" itemLabel.value propertyLabel.value \n",
"257 Carolina Darias San Sebastián medical condition \n",
"1439 Perceptions of the Adult US Population regardi... main subject \n",
"30 Aaron Tveit medical condition \n",
"1627 Protocol for a randomized controlled trial tes... main subject \n",
"1129 Narciso Arranz Cerezo cause of death \n",
"1698 SARS-CoV-2 receptor ACE2 and TMPRSS2 are predo... main subject \n",
"374 Steve Padilla medical condition \n",
"1486 Lymphopenia predicts disease severity of COVID... main subject \n",
"282 Mousa Shubairi Zanjani medical condition \n",
"123 Benet Joanet Jiménez cause of death "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"truthy[['item.value','itemLabel.value','propertyLabel.value']].sample(10).head(10)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"main subject 1055\n",
"medical condition 553\n",
"cause of death 208\n",
"has cause 198\n",
"research intervention 13\n",
"category combines topics 3\n",
"has effect 3\n",
"named after 3\n",
"facet of 3\n",
"different from 3\n",
"has immediate cause 2\n",
"medical condition treated 1\n",
"Wikimedia portal's main topic 1\n",
"category's main topic 1\n",
"item for this sense 1\n",
"represents 1\n",
"interested in 1\n",
"instance of 1\n",
"vaccine for 1\n",
"Name: propertyLabel.value, dtype: int64"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"truthy['propertyLabel.value'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" itemLabel.value | \n",
" propertyLabel.value | \n",
"
\n",
" \n",
" \n",
" \n",
" 537 | \n",
" Recent advances in the detection of respirator... | \n",
" main subject | \n",
"
\n",
" \n",
" 538 | \n",
" The continuing 2019-nCoV epidemic threat of no... | \n",
" main subject | \n",
"
\n",
" \n",
" 539 | \n",
" Clinical features of patients infected with 20... | \n",
" main subject | \n",
"
\n",
" \n",
" 540 | \n",
" Early Transmission Dynamics in Wuhan, China, o... | \n",
" main subject | \n",
"
\n",
" \n",
" 541 | \n",
" 2019-nCoV, first death outside China | \n",
" main subject | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" itemLabel.value propertyLabel.value\n",
"537 Recent advances in the detection of respirator... main subject\n",
"538 The continuing 2019-nCoV epidemic threat of no... main subject\n",
"539 Clinical features of patients infected with 20... main subject\n",
"540 Early Transmission Dynamics in Wuhan, China, o... main subject\n",
"541 2019-nCoV, first death outside China main subject"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"truthy[truthy['propertyLabel.value'] == 'main subject'][['itemLabel.value','propertyLabel.value']].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" itemLabel.value | \n",
" propertyLabel.value | \n",
"
\n",
" \n",
" \n",
" \n",
" 510 | \n",
" 2019–20 COVID-19 pandemic | \n",
" has cause | \n",
"
\n",
" \n",
" 511 | \n",
" 2019–20 coronavirus pandemic in mainland China | \n",
" has cause | \n",
"
\n",
" \n",
" 512 | \n",
" 2019–20 coronavirus outbreak in Japan | \n",
" has cause | \n",
"
\n",
" \n",
" 513 | \n",
" 2019–20 COVID-19 outbreak in South Korea | \n",
" has cause | \n",
"
\n",
" \n",
" 514 | \n",
" 2019–20 coronavirus outbreak in Vietnam | \n",
" has cause | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" itemLabel.value propertyLabel.value\n",
"510 2019–20 COVID-19 pandemic has cause\n",
"511 2019–20 coronavirus pandemic in mainland China has cause\n",
"512 2019–20 coronavirus outbreak in Japan has cause\n",
"513 2019–20 COVID-19 outbreak in South Korea has cause\n",
"514 2019–20 coronavirus outbreak in Vietnam has cause"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"truthy[truthy['propertyLabel.value'] == 'has cause'][['itemLabel.value','propertyLabel.value']].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" itemLabel.value | \n",
" propertyLabel.value | \n",
"
\n",
" \n",
" \n",
" \n",
" 594 | \n",
" timeline of the 2019–20 coronavirus pandemic | \n",
" facet of | \n",
"
\n",
" \n",
" 595 | \n",
" SARS-CoV-2 transmission | \n",
" facet of | \n",
"
\n",
" \n",
" 1096 | \n",
" Ordinance of January 30, 2020 | \n",
" facet of | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" itemLabel.value propertyLabel.value\n",
"594 timeline of the 2019–20 coronavirus pandemic facet of\n",
"595 SARS-CoV-2 transmission facet of\n",
"1096 Ordinance of January 30, 2020 facet of"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"truthy[truthy['propertyLabel.value'] == ('facet of' or 'main subject')][['itemLabel.value','propertyLabel.value']].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" itemLabel.value | \n",
" propertyLabel.value | \n",
"
\n",
" \n",
" \n",
" \n",
" 12 | \n",
" Mikel Arteta | \n",
" medical condition | \n",
"
\n",
" \n",
" 13 | \n",
" Rüştü Reçber | \n",
" medical condition | \n",
"
\n",
" \n",
" 14 | \n",
" Marko Pantelić | \n",
" medical condition | \n",
"
\n",
" \n",
" 15 | \n",
" Suk Hyun-jun | \n",
" medical condition | \n",
"
\n",
" \n",
" 16 | \n",
" Thomas Kahlenberg | \n",
" medical condition | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" itemLabel.value propertyLabel.value\n",
"12 Mikel Arteta medical condition\n",
"13 Rüştü Reçber medical condition\n",
"14 Marko Pantelić medical condition\n",
"15 Suk Hyun-jun medical condition\n",
"16 Thomas Kahlenberg medical condition"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"truthy[truthy['propertyLabel.value'] == 'medical condition'][['itemLabel.value','propertyLabel.value']].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"#Remove medical condition and cause of death\n",
"mainSubject = truthy[truthy['propertyLabel.value'] != 'medical condition']\n",
"mainSubject = mainSubject[mainSubject['propertyLabel.value'] != 'cause of death']\n",
"mainSubject = mainSubject[mainSubject['propertyLabel.value'] != 'medical condition treated']"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" itemLabel.value | \n",
" propertyLabel.value | \n",
"
\n",
" \n",
" \n",
" \n",
" 1442 | \n",
" Modelling the coronavirus disease (COVID-19) o... | \n",
" main subject | \n",
"
\n",
" \n",
" 1221 | \n",
" Clinical characteristics of 2019 novel coronav... | \n",
" main subject | \n",
"
\n",
" \n",
" 1205 | \n",
" Preliminary identification of potential vaccin... | \n",
" main subject | \n",
"
\n",
" \n",
" 835 | \n",
" Clinical characteristics of 24 asymptomatic in... | \n",
" main subject | \n",
"
\n",
" \n",
" 1783 | \n",
" SARS-CoV-2 specific antibody responses in COVI... | \n",
" main subject | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" itemLabel.value propertyLabel.value\n",
"1442 Modelling the coronavirus disease (COVID-19) o... main subject\n",
"1221 Clinical characteristics of 2019 novel coronav... main subject\n",
"1205 Preliminary identification of potential vaccin... main subject\n",
"835 Clinical characteristics of 24 asymptomatic in... main subject\n",
"1783 SARS-CoV-2 specific antibody responses in COVI... main subject"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mainSubject[['itemLabel.value','propertyLabel.value']].sample(5).head(5)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"#All truthy statements with 2019–20 COVID-19 pandemic (Q81068910) as value.\n",
"#https://w.wiki/Kvd (Thanks User:Dipsacus_fullonum)\n",
"\n",
"sparql.setQuery(\"\"\"\n",
"# \n",
"SELECT ?item ?itemLabel ?property ?propertyLabel WHERE {\n",
" ?item ?claim wd:Q81068910. #2019–20 COVID-19 pandemic\n",
" ?property wikibase:directClaim ?claim.\n",
" SERVICE wikibase:label { bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\". }\n",
"}\n",
"\"\"\")\n",
"sparql.setReturnFormat(JSON)\n",
"results = sparql.query().convert()\n",
"\n",
"Q81068910 = pd.io.json.json_normalize(results['results']['bindings'])"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"main subject 991\n",
"part of 532\n",
"facet of 40\n",
"category combines topics 21\n",
"has cause 11\n",
"significant event 8\n",
"has effect 3\n",
"has contributing factor 3\n",
"category contains 2\n",
"field of work 2\n",
"different from 2\n",
"template's main topic 1\n",
"interested in 1\n",
"notable works 1\n",
"category's main topic 1\n",
"Wikimedia portal's main topic 1\n",
"Name: propertyLabel.value, dtype: int64"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Q81068910['propertyLabel.value'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" itemLabel.value | \n",
" propertyLabel.value | \n",
"
\n",
" \n",
" \n",
" \n",
" 2 | \n",
" 2019–20 coronavirus pandemic in mainland China | \n",
" part of | \n",
"
\n",
" \n",
" 3 | \n",
" 2019–20 coronavirus outbreak in Japan | \n",
" part of | \n",
"
\n",
" \n",
" 4 | \n",
" 2019–20 COVID-19 outbreak in South Korea | \n",
" part of | \n",
"
\n",
" \n",
" 5 | \n",
" 2019–20 coronavirus outbreak in Vietnam | \n",
" part of | \n",
"
\n",
" \n",
" 6 | \n",
" 2019–20 coronavirus outbreak in Singapore | \n",
" part of | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" itemLabel.value propertyLabel.value\n",
"2 2019–20 coronavirus pandemic in mainland China part of\n",
"3 2019–20 coronavirus outbreak in Japan part of\n",
"4 2019–20 COVID-19 outbreak in South Korea part of\n",
"5 2019–20 coronavirus outbreak in Vietnam part of\n",
"6 2019–20 coronavirus outbreak in Singapore part of"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Q81068910[Q81068910['propertyLabel.value'] == 'part of'][['itemLabel.value','propertyLabel.value']].head(5)\n"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" itemLabel.value | \n",
" propertyLabel.value | \n",
"
\n",
" \n",
" \n",
" \n",
" 33 | \n",
" 2020 Hubei lockdowns | \n",
" has cause | \n",
"
\n",
" \n",
" 34 | \n",
" Huoshenshan Hospital | \n",
" has cause | \n",
"
\n",
" \n",
" 35 | \n",
" the Central Leading Group for the Response to ... | \n",
" has cause | \n",
"
\n",
" \n",
" 36 | \n",
" Leishenshan Hospital | \n",
" has cause | \n",
"
\n",
" \n",
" 37 | \n",
" evacuations related to the 2019–20 coronavirus... | \n",
" has cause | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" itemLabel.value propertyLabel.value\n",
"33 2020 Hubei lockdowns has cause\n",
"34 Huoshenshan Hospital has cause\n",
"35 the Central Leading Group for the Response to ... has cause\n",
"36 Leishenshan Hospital has cause\n",
"37 evacuations related to the 2019–20 coronavirus... has cause"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Q81068910[Q81068910['propertyLabel.value'] == 'has cause'][['itemLabel.value','propertyLabel.value']].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" itemLabel.value | \n",
" propertyLabel.value | \n",
"
\n",
" \n",
" \n",
" \n",
" 297 | \n",
" Black Monday | \n",
" has contributing factor | \n",
"
\n",
" \n",
" 298 | \n",
" Black Thursday | \n",
" has contributing factor | \n",
"
\n",
" \n",
" 299 | \n",
" 2020 stock market crash | \n",
" has contributing factor | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" itemLabel.value propertyLabel.value\n",
"297 Black Monday has contributing factor\n",
"298 Black Thursday has contributing factor\n",
"299 2020 stock market crash has contributing factor"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Q81068910[Q81068910['propertyLabel.value'] == 'has contributing factor'][['itemLabel.value','propertyLabel.value']].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"#removing not strong connections such.\n",
"Q81068910Strong = Q81068910[Q81068910['propertyLabel.value'] != 'field of work']\n",
"Q81068910Strong = Q81068910Strong[Q81068910Strong['propertyLabel.value'] != 'interested in']\n",
"Q81068910Strong = Q81068910Strong[Q81068910Strong['propertyLabel.value'] != 'notable works']\n",
"Q81068910Strong = Q81068910Strong[Q81068910Strong['propertyLabel.value'] != 'Wikimedia portal\\'s main topic']\n",
"Q81068910Strong = Q81068910Strong[Q81068910Strong['propertyLabel.value'] != 'category combines topics']\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"#Getting Qs ids\n",
"mainSubjectQ = [ link.split('/')[-1] for link in mainSubject['item.value'].tolist()]\n",
"allStatementsQ = [ link.split('/')[-1] for link in allStatements['item.value'].tolist()]\n",
"Q81068910StrongQ = [ link.split('/')[-1] for link in Q81068910Strong['item.value'].tolist()]\n",
"## merging both sets\n",
"#adding both sets & seeds\n",
"strongQs = set(mainSubjectQ).union(allStatementsQ).union({'Q84263196','Q83741704'})\n"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"import pickle\n",
"with open('strongQsCovid-19_20200325.pickle','wb') as f:\n",
" pickle.dump(strongQs,f)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"#On the strong set\n",
"import requests\n",
"sitelinks_base = 'https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&props=sitelinks&ids=' \n",
"sitelinks = []\n",
"\n",
"for Q in strongQs:\n",
" url = sitelinks_base + Q\n",
" sitelinks.append(requests.get(url=url).json())\n",
"\n",
"pagesPerProject = {}\n",
"for s in sitelinks:\n",
" if 'entities' in s:\n",
" for k,v in s['entities'].items():\n",
" if 'sitelinks' in v:\n",
" for wiki,data in v['sitelinks'].items():\n",
" page = data['title']\n",
" project ='%s.wikipedia' % wiki.replace('wiki','')\n",
" pagesPerProject[project] = pagesPerProject.get(project,[])\n",
" pagesPerProject[project].append(page)"
]
},
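{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above sends one request per item. As a hedged sketch (not part of the original workflow), the same `wbgetentities` call can take several ids joined with `|`; the batch size of 50 assumed here is the usual per-request limit for regular clients. `chunked` and `fetch_sitelinks` are hypothetical helper names."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"API_URL = 'https://www.wikidata.org/w/api.php'\n",
"\n",
"def chunked(items, size=50):\n",
"    # Yield successive lists of at most `size` items\n",
"    items = list(items)\n",
"    for start in range(0, len(items), size):\n",
"        yield items[start:start + size]\n",
"\n",
"def fetch_sitelinks(qids, batch_size=50):\n",
"    # Return a dict mapping each Q-id to its sitelinks (empty dict if it has none)\n",
"    entities = {}\n",
"    for batch in chunked(sorted(qids), batch_size):\n",
"        params = {\n",
"            'action': 'wbgetentities',\n",
"            'format': 'json',\n",
"            'props': 'sitelinks',\n",
"            'ids': '|'.join(batch),\n",
"        }\n",
"        response = requests.get(API_URL, params=params)\n",
"        response.raise_for_status()\n",
"        for qid, entity in response.json().get('entities', {}).items():\n",
"            entities[qid] = entity.get('sitelinks', {})\n",
"    return entities\n",
"\n",
"# Example usage (assumes strongQs from the cells above):\n",
"# sitelinksByItem = fetch_sitelinks(strongQs)"
]
},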
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Write Wikitext"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"with open('pagesPerProjectStronglyRelated20200402.wikitext','w') as f:\n",
" for project, pages in pagesPerProject.items():\n",
" projectcode = project.split('.')[0]\n",
" f.write('\\n== %s == \\n \\n' % project )\n",
" for page in pages:\n",
" f.write('* [[%s:%s|%s]]\\n' % (projectcode,page,page)) "
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"import pickle\n",
"with open('pagesPerProjectStronglyRelated20200402.pickle','wb') as f:\n",
" pickle.dump(pagesPerProject,f)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}