DS4UX PAWS cheatsheet

This notebook demonstrates the various connections that you can make to external datasets within PAWS. The notebook uses Markdown for text formatting (cheatsheet).

Used for the University of Washington course HCDE598a (Spring 2016): Data Science for User Experience Research

Forking a Notebook

step 1: get the URL of another public PAWS notebook (example: http://paws-public.wmflabs.org/paws-public/EpochFail/projects/examples/mwapi.ipynb)

step 2: add the format=raw parameter to that URL to download the raw .ipynb file: http://paws-public.wmflabs.org/paws-public/EpochFail/projects/examples/mwapi.ipynb?format=raw

step 3: log into your PAWS account and use "upload" to upload this copy into your own directory
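Step 2 can also be done in code. The following is a minimal sketch using only the Python standard library; the helper name raw_notebook_url is made up for this example and is not part of PAWS.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def raw_notebook_url(notebook_url):
    """Return the notebook URL with format=raw set in its query string.

    Hypothetical helper for illustration; any existing query
    parameters on the URL are preserved.
    """
    parts = urlsplit(notebook_url)
    query = dict(parse_qsl(parts.query))
    query["format"] = "raw"
    return urlunsplit(parts._replace(query=urlencode(query)))


example = ("http://paws-public.wmflabs.org/paws-public/"
           "EpochFail/projects/examples/mwapi.ipynb")
print(raw_notebook_url(example))
```

The printed URL is the one to open (or download) before uploading the file in step 3.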

API connections

You can connect to all sorts of APIs!

MediaWiki API

You can use python-mwapi (docs) to run queries against the MediaWiki API. You can also test your queries in the API sandbox.
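As a rough sketch of what such a query might look like, the example below asks for the most recent revisions of a page. The function names, the example wiki host, and the user agent string are assumptions chosen for illustration; replace the user agent with your own contact details. mwapi is imported inside the function so the parameter helper can be tried without the library installed.

```python
def build_revisions_query(title, limit=5):
    """Build the parameters for a 'recent revisions of a page' query.

    Hypothetical helper; the parameter names are standard
    MediaWiki API parameters for action=query&prop=revisions.
    """
    return {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user",
        "rvlimit": limit,
        "formatversion": 2,  # pages come back as a list, not a dict
    }


def fetch_revisions(title, host="https://en.wikipedia.org", limit=5):
    """Run the query with python-mwapi and return the revision list."""
    import mwapi  # imported here so the helper above works without it

    # Placeholder user agent: an assumption, use your own contact info.
    session = mwapi.Session(host, user_agent="DS4UX PAWS demo (you@example.org)")
    response = session.get(**build_revisions_query(title, limit))
    return response["query"]["pages"][0].get("revisions", [])
```

In a notebook cell, fetch_revisions("User experience") would return a list of dictionaries, one per revision. The same parameters can be pasted into the API sandbox to check the query first.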

MediaWiki Pageview API

You can use python-mwviews (docs) to run queries against the MediaWiki Pageview API (blog post).
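To show what a pageview request looks like underneath, here is a minimal sketch that calls the Wikimedia REST endpoint for per-article pageviews directly with the standard library. It does not use python-mwviews; the function names and the user agent string are assumptions made for this example.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia",
                  access="all-access", agent="user", granularity="daily"):
    """Build the per-article pageviews URL (dates as YYYYMMDD strings)."""
    # Titles use underscores instead of spaces and must be URL-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([PAGEVIEWS_BASE, project, access, agent,
                     title, granularity, start, end])


def fetch_pageviews(article, start, end):
    """Return a list of (timestamp, views) pairs for one article."""
    # Placeholder user agent: an assumption, use your own contact info.
    request = Request(pageviews_url(article, start, end),
                      headers={"User-Agent": "DS4UX PAWS demo (you@example.org)"})
    with urlopen(request) as response:
        data = json.load(response)
    return [(item["timestamp"], item["views"]) for item in data["items"]]
```

For example, fetch_pageviews("Albert Einstein", "20160401", "20160407") requests one week of daily counts.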

Wikidata API

UNTESTED: Query Wikidata using query.wikidata.org (user manual) with Yuvi's wdqs module, which has no documentation (yet!).
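Because the wdqs module is undocumented, the sketch below does not use it. Instead it sends a SPARQL query to the query.wikidata.org endpoint with the standard library. The function names, the example query, and the user agent string are assumptions for illustration only.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Example query: five items that are an instance of (P31) house cat (Q146).
CAT_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def sparql_url(query):
    """Build a GET URL that asks the endpoint for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def run_sparql(query):
    """Run the query and return the list of result bindings."""
    # Placeholder user agent: an assumption, use your own contact info.
    request = Request(sparql_url(query),
                      headers={"User-Agent": "DS4UX PAWS demo (you@example.org)"})
    with urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Calling run_sparql(CAT_QUERY) returns one dictionary per result row, keyed by the variable names in the SELECT clause.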

Any other API

TODO: check with Socrata API

Database connections

You can use pymysql to run queries against the Wikimedia replica databases. See also Manual:Database access and Manual:Database layout on MediaWiki.org.
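A minimal sketch of a replica query follows. It assumes that PAWS exposes the database host and credentials through the environment variables MYSQL_HOST, MYSQL_USERNAME, and MYSQL_PASSWORD; check your own environment, since these names are an assumption here. The function names are also made up for this example.

```python
import os


def replica_connection_kwargs(database="enwiki_p", env=os.environ):
    """Collect pymysql.connect() arguments from environment variables.

    The variable names are an assumption about the PAWS setup.
    """
    return {
        "host": env["MYSQL_HOST"],
        "user": env["MYSQL_USERNAME"],
        "password": env["MYSQL_PASSWORD"],
        "database": database,
        "charset": "utf8",
    }


def recent_changes(limit=5):
    """Return the titles and timestamps of the latest recent changes."""
    import pymysql  # imported here so the helper above works without it

    conn = pymysql.connect(**replica_connection_kwargs())
    try:
        with conn.cursor() as cursor:
            cursor.execute(
                "SELECT rc_title, rc_timestamp FROM recentchanges "
                "ORDER BY rc_timestamp DESC LIMIT %s",
                (limit,),
            )
            return cursor.fetchall()
    finally:
        conn.close()
```

Passing the limit as a query parameter, rather than formatting it into the SQL string, lets pymysql handle the escaping.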

Access special datasets

Explanation goes here

Access HostBot db

Note: this is not working yet. The HostBot database credentials need to be stored in my home directory so they can be accessed from my PAWS environment. TODO: ask Yuvi to help set this up.

Import text data

Import CSV data