Let's begin writing simple scripts that pull data from existing structured or semi-structured sources and add it to Wikidata using the pywikibot API.
Some possible tasks are:
GitHub releases are publicly visible on a project's Releases page. Even when a project is not developed on GitHub, it generally has a mirror repository there that syncs all the commits, branches, tags, and releases. Hence, there is a lot of data on GitHub which can be used in Wikidata!
Related info to this task:
Examples with lots of versions: systemd (Q286124), Debian (Q7593), Linux kernel (Q14579)
Possible things to work on:
import datetime
import json
from pprint import pprint
from urllib.request import urlopen

import pywikibot  # not used yet; needed once the results are written to Wikidata

WIKIDATA_ITEM = 'Q48464'
GITHUB_PAGE = 'https://github.com/joshumax/hurd'
DATE_FORMAT = '%Y-%m-%dT%H:%M:%SZ'  # timestamp format returned by the GitHub API


def get_releases(link):
    # Turn a repository page URL into its GitHub API "releases" endpoint.
    link = link.replace('github.com', 'api.github.com/repos')
    link += ('' if link[-1] == '/' else '/') + 'releases'
    return link


def get_tags(link):
    # Turn a repository page URL into its GitHub API "tags" endpoint.
    link = link.replace('github.com', 'api.github.com/repos')
    link += ('' if link[-1] == '/' else '/') + 'tags'
    return link


def get_json(url):
    # Fetch a URL and decode the JSON body.
    response = urlopen(url)
    data = json.loads(response.read().decode('utf-8'))
    return data


def get_tag_versions(tags):
    # A tag carries no date of its own, so look up the commit it points to.
    # This costs one extra API request per tag.
    for tag in tags:
        print("Working on tag: ", tag['name'])
        commit_info = get_json(tag['commit']['url'])
        date = datetime.datetime.strptime(commit_info['commit']['author']['date'], DATE_FORMAT)
        yield {"name": tag['name'], "date": date}


def get_release_versions(releases):
    for release in releases:
        # The release title may be empty; fall back to the tag name.
        name = release['name'] or release['tag_name']
        print("Working on release: ", name)
        date = datetime.datetime.strptime(release['published_at'], DATE_FORMAT)
        yield {"name": name, "date": date}


if __name__ == '__main__':
    print(get_releases(GITHUB_PAGE))
    tags = get_json(get_tags(GITHUB_PAGE))
    releases = get_json(get_releases(GITHUB_PAGE))
    pprint(list(get_tag_versions(tags)))
    pprint(list(get_release_versions(releases)))
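The script above only prints the versions it finds; it does not write anything to Wikidata yet. A minimal sketch of that missing step follows. It assumes each version is stored as a "software version identifier" (P348) statement with a "publication date" (P577) qualifier. The helper names `normalize_version` and `add_version` are hypothetical, and it is probably wise to try this against test.wikidata.org before touching real items.

```python
import re


def normalize_version(name):
    """Strip surrounding whitespace and a leading 'v' from a tag name.

    Hypothetical helper: real projects use many tag naming schemes, and this
    only handles the common 'v1.2.3' form.
    """
    return re.sub(r'^[vV](?=\d)', '', name.strip())


def add_version(repo, item_id, version, date, summary='Add software version'):
    """Add a P348 statement with a P577 qualifier unless it already exists.

    `repo` is a pywikibot data repository, `date` a datetime object.
    Returns True if a statement was added, False if it was already there.
    """
    # Imported lazily so normalize_version() works without pywikibot installed.
    import pywikibot

    item = pywikibot.ItemPage(repo, item_id)
    item.get()  # load the existing statements

    # Skip versions that are already recorded on the item.
    for existing in item.claims.get('P348', []):
        if existing.getTarget() == version:
            return False

    claim = pywikibot.Claim(repo, 'P348')
    claim.setTarget(version)
    item.addClaim(claim, summary=summary)

    # Attach the release date as a qualifier on the saved statement.
    qualifier = pywikibot.Claim(repo, 'P577')
    qualifier.setTarget(
        pywikibot.WbTime(year=date.year, month=date.month, day=date.day))
    claim.addQualifier(qualifier, summary='Add publication date')
    return True
```

With a configured pywikibot, `repo` would come from `pywikibot.Site('wikidata', 'wikidata').data_repository()`, and each dictionary yielded by `get_tag_versions` or `get_release_versions` could be passed on as `add_version(repo, WIKIDATA_ITEM, normalize_version(v['name']), v['date'])`.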
Some organizations, like GNOME, host all the versions of their applications on FTP. This gives us a single place from which the versions of many different programs can be scraped.
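As a rough illustration of the scraping step, the sketch below extracts version numbers from a list of file names such as one returned by a directory listing. It assumes archives are named `<package>-<version>.tar.<ext>` with purely numeric, dot-separated versions; the function name `versions_from_listing` is hypothetical, and how the file names are fetched (FTP or an HTTP index page) is left open.

```python
import re

# Matches names like "gedit-3.36.2.tar.xz" (assumed naming scheme).
ARCHIVE_RE = re.compile(
    r'^(?P<name>[A-Za-z0-9_+.-]+?)-(?P<version>\d+(?:\.\d+)*)\.tar\.(?:gz|bz2|xz)$')


def versions_from_listing(filenames, package):
    """Return the versions of `package` found in `filenames`, oldest first."""
    found = set()
    for filename in filenames:
        match = ARCHIVE_RE.match(filename)
        if match and match.group('name') == package:
            found.add(match.group('version'))
    # Sort numerically so that 3.4.0 comes before 3.36.2.
    return sorted(found, key=lambda v: tuple(int(part) for part in v.split('.')))
```

Checksum files, "LATEST" markers, and archives of other packages in the same directory are ignored because they either fail the pattern or carry a different package name.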
Related info to the task:
Use the CHANGELOG shipped with the software to find its released versions and update the corresponding Wikidata item.
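Changelog formats vary a lot between projects, so any parser has to pick a convention. The sketch below assumes Markdown headings in the "Keep a Changelog" style, for example `## [1.1.0] - 2019-02-15`; the function name `parse_changelog` is hypothetical, and other formats would need their own pattern.

```python
import datetime
import re

# "## [1.1.0] - 2019-02-15", "## v1.1.0" or "## 1.1.0" (assumed heading style).
HEADING_RE = re.compile(
    r'^##\s+\[?v?(?P<version>\d+(?:\.\d+)+)\]?'
    r'\s*(?:-\s*(?P<date>\d{4}-\d{2}-\d{2}))?\s*$')


def parse_changelog(text):
    """Return a list of {'name', 'date'} dicts in the order they appear."""
    versions = []
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if not match:
            continue  # ordinary text, sub-headings, "Unreleased", ...
        date = None
        if match.group('date'):
            date = datetime.datetime.strptime(match.group('date'), '%Y-%m-%d').date()
        versions.append({'name': match.group('version'), 'date': date})
    return versions
```

The result has the same shape as the dictionaries produced by the GitHub script, except that `date` is `None` when a heading gives no date.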
Related info to the task:
Population data is publicly available for most countries. This task aims to add that data to the respective city/district/state/country items in Wikidata, together with its source and the point in time the figure refers to.
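One possible shape for such an import is sketched below. It assumes the source data has already been matched to Wikidata items and saved as CSV with the columns `qid`, `population`, `year`, and `source_url` (a hypothetical layout), and that the figure is stored as "population" (P1082) with a "point in time" (P585) qualifier and a "reference URL" (P854) source. The function names are hypothetical as well.

```python
import csv
import io


def read_population_rows(csv_text):
    """Parse CSV text with the assumed columns into typed dictionaries."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            'qid': row['qid'].strip(),
            'population': int(row['population']),
            'year': int(row['year']),
            'source_url': row['source_url'].strip(),
        })
    return rows


def add_population(repo, row):
    """Write one parsed row to Wikidata (`repo` is a pywikibot data repository)."""
    # Imported lazily so read_population_rows() works without pywikibot installed.
    import pywikibot

    item = pywikibot.ItemPage(repo, row['qid'])

    claim = pywikibot.Claim(repo, 'P1082')
    claim.setTarget(pywikibot.WbQuantity(amount=row['population'], site=repo))
    item.addClaim(claim, summary='Add population')

    # Record the year the figure refers to.
    when = pywikibot.Claim(repo, 'P585')
    when.setTarget(pywikibot.WbTime(year=row['year']))
    claim.addQualifier(when, summary='Add point in time')

    # Record where the figure came from.
    source = pywikibot.Claim(repo, 'P854')
    source.setTarget(row['source_url'])
    claim.addSources([source], summary='Add source')
```

This sketch does not check whether an item already has a population statement for the same year; a real import would want that check to avoid duplicates.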
Related info to this task: