Using Pywikibot with PAWS¶

In this tutorial you'll learn about Pywikibot, a Python library that can be used to automate tasks on wikis, and how to get started with it using either Python 3 notebooks or the Terminal in PAWS.

See the PAWS documentation on Wikitech for more information and tutorials.

Table of Contents¶

  • What is Pywikibot?
  • Pywikibot and PAWS
  • What can you do with Pywikibot?
  • Before you start
  • Use a Python 3 notebook to work with Pywikibot in PAWS
  • Use a terminal to work with Pywikibot in PAWS
  • Explore Pywikibot scripts
  • Pywikibot and Wikidata
  • Further documentation
  • Example notebooks
  • The Pywikibot community

What is Pywikibot?¶

Pywikibot is a Python library and collection of tools that help automate work on MediaWiki sites. A Python library is a reusable collection of code focused on a specific purpose (math, data science, game development, etc.).

The Pywikibot library was originally designed for Wikipedia; it is now used throughout the Wikimedia Foundation's projects and on many other MediaWiki wikis.

Pywikibot and PAWS¶

Pywikibot makes it possible to use scripts to automate tasks on wikis. It can be used in several environments: your own computer, Toolforge, and PAWS.

Of these three options PAWS requires the least set-up and does not require advanced technical knowledge. This makes it ideal for newcomers who are learning about Wikimedia technology and those wishing to run scripts without setting up their own environments.

Note: PAWS is most suitable for lightweight, one-time tasks. For scheduled tasks or tasks that need some heavy lifting, Toolforge is the suggested environment.

Learn more about why the PAWS service may be a good fit for your project or whether you should choose an alternative service.

Python 3 Notebook or Terminal?¶

From your PAWS control panel, you have two options that make it possible for you to work with the Pywikibot library: Python 3 Notebooks and the terminal available in PAWS. This tutorial will cover the basics of each.

Both methods work. The one you choose will depend on your comfort level with either method.

Do you need to know Python to use Pywikibot with PAWS?¶

This tutorial covers some basic tasks you can run using Pywikibot in a Python 3 notebook. Python is a general purpose programming language used by many in the Wikimedia technical community. While it's not necessary to know Python in order to use Pywikibot, it can be very useful to have at least some basic knowledge.

Python resources¶

  • Python for beginners

From your PAWS control panel you can work with Pywikibot either in a Python 3 notebook or a terminal. This tutorial will cover the basics of using both of these tools.

Do you need to know the command line to use Pywikibot with PAWS?¶

This tutorial covers some basic tasks you can run using Pywikibot and the terminal. If you choose to use Pywikibot in the terminal, it is helpful to have a basic understanding of the command line.



What can you do with Pywikibot?¶

Pywikibot is a Python library that makes it possible to use scripts to perform a variety of tasks on wikis. Some examples include creating multiple pages at once, adding categories, adding labels in Wikidata, etc.

You can find an extensive list of scripts in the Pywikibot manual on MediaWiki. These are built-in scripts that you can use without coding anything.

You may also find it helpful to explore some existing notebooks that use Pywikibot to gain a better understanding of what you can do with it.

Understand users and user behavior on a wiki¶

  • See the global block history for a user across wikis
  • Pages created from external links by non-autoconfirmed users. This can be used to reduce spam on wikis.

Make it easier for editors to organize articles and information¶

  • Extract information about stubs for editors who are considering merging them. This example uses the category "Rural localities in Russia."
  • Search for pages with deprecated templates

Contribute to Wikidata¶

  • Add missing labels to Wikidata

You can find a more extensive list of recipes, how-tos, and examples on Wikitech.



Before you start¶

  • Determine whether PAWS is right for your project.
  • Sign up for a Wikimedia Account.
  • Read the PAWS Getting Started tutorial.
  • Make sure to use the Test Wikipedia to practice and test scripts before running them on live wikis. You are responsible for the scripts you run, so be careful when using them on a site other than the Test Wiki.
  • Don't write passwords or private information in your notebook. It's public!
  • Remember to run the cells of your notebook in the correct order.
  • Make sure you have set up a user-config.py file if you plan to use Pywikibot from the terminal.


Use a Python 3 notebook to work with Pywikibot in PAWS¶

In this tutorial, we'll show you how to perform some simple tasks using Pywikibot in a Python 3 notebook in PAWS. You will not need to install any software or upload additional files.

Note: If you are following along and want to practice, use the Test Wikipedia to ensure you don't inadvertently make mistakes on your wiki.

Get started with a Python 3 notebook¶

1) Launch PAWS in your browser. 2) Create a new Python 3 notebook from your control panel.

3) Now, you can import the Pywikibot library. In the code cell enter the following and click run:

In [21]:
import pywikibot

Next, you will need to connect Pywikibot to the wiki you want to work with. For our tutorial we will connect to the Test Wikipedia. You won't need to enter login credentials because you have already logged in to PAWS using OAuth.

To connect to the wiki you want to work with, you will need to create an APISite object that includes the language and the family of your wiki.

For example:

In [22]:
site = pywikibot.Site('language', 'family')

In the code cell enter the following and click run:

In [23]:
site = pywikibot.Site('test', 'wikipedia')

You are now connected to the Test Wikipedia and can begin to perform basic tasks using Pywikibot.

Note: If you want to check whether this was successful, type site in a code cell and click run.

In [24]:
site
Out[24]:
APISite("test", "wikipedia")

Later when you want to connect to a different wiki, you can use the same code above. You'll just need to swap out the "language" and "family."

For example if you wish to connect to English Wikipedia you would type the following and click run:

In [25]:
site = pywikibot.Site('en', 'wikipedia')
In [26]:
site
Out[26]:
APISite("en", "wikipedia")
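Because only the language code and the family change from one wiki to the next, connecting to several wikis can be written as a loop. The following is a minimal sketch, not part of Pywikibot itself: `connect_many`, its `site_factory` parameter, and `TUTORIAL_WIKIS` are hypothetical names introduced here, and the factory falls back to `pywikibot.Site` as used in the cells above.

```python
def connect_many(pairs, site_factory=None):
    """Return a dict mapping each (language, family) pair to a site object.

    `site_factory` is a hypothetical hook for this sketch; when it is
    omitted, pywikibot.Site is used.
    """
    if site_factory is None:
        import pywikibot  # imported lazily, only when a real connection is wanted
        site_factory = pywikibot.Site
    return {(code, family): site_factory(code, family) for code, family in pairs}


# The language/family pairs used so far in this tutorial.
TUTORIAL_WIKIS = [("test", "wikipedia"), ("en", "wikipedia")]
```

In a PAWS notebook, `sites = connect_many(TUTORIAL_WIKIS)` should give you one site object per pair, which you can then look up with `sites[("test", "wikipedia")]`.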

Some basic tasks you can perform using the Python 3 notebooks¶

With Pywikibot you can run a large number of scripts to perform an array of tasks.

Create a page¶

In [27]:
site = pywikibot.Site('test', 'wikipedia')
page = pywikibot.Page(site, 'Test:Pegasus')
page.save('test edit') 
Page [[Test:Pegasus]] saved

Fetch a page¶

If you want to fetch a page from the wiki you are connected to, you can do so by using the following script:

In [28]:
page = pywikibot.Page(site, 'Page name')

In the following example we are still working with the Test Wiki. The page we are fetching is called "Test:Pegasus."

After we run the script, we can check whether this page exists by typing page.exists() into the cell. The output True tells us that the page exists. If it did not exist, the output would be False.

In [29]:
page.exists()
Out[29]:
True
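The result of `page.exists()` can also be used to guard an edit, so that a script creates a page only when it is missing. The helper below is a sketch under that assumption; `create_if_missing` is a hypothetical name, and it relies only on the `exists()`, `text`, and `save()` members already used in this tutorial.

```python
def create_if_missing(page, text, summary):
    """Save `text` to `page` only when the page does not exist yet.

    Returns True if the page was created and False if it was left alone.
    `page` is expected to behave like a pywikibot.Page object.
    """
    if page.exists():
        return False  # never overwrite an existing page
    page.text = text
    page.save(summary)
    return True
```

For example, `create_if_missing(pywikibot.Page(site, 'Test:Pegasus'), 'A pegasus is a flying horse.', 'test edit')` would leave the page untouched if it already exists.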

Add text to a page¶

You can add text to your page using Pywikibot as well.

In [30]:
page = pywikibot.Page(site, 'Test:Pegasus')
In [31]:
page.text = 'A pegasus is a flying horse.'
page.save('test edit')
Sleeping for 9.3 seconds, 2020-10-09 17:05:14
Page [[Test:Pegasus]] saved

Now, we want to view the text on the page, so we will type page.text into the cell. When we run it, we will retrieve the page text.

In [32]:
page.text
Out[32]:
'A pegasus is a flying horse.'
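Assigning to `page.text` replaces the whole page. If you want to keep what is already there, read the current text first and add to it. The following is a minimal sketch; `append_line` is a hypothetical helper name, not a Pywikibot function.

```python
def append_line(page, line, summary):
    """Append `line` to the end of `page`, save it, and return the new text.

    `page` is expected to behave like a pywikibot.Page object.
    """
    current = page.text.rstrip("\n")
    # Avoid a leading blank line when the page is empty.
    page.text = f"{current}\n{line}" if current else line
    page.save(summary)
    return page.text
```

Called as `append_line(page, 'It has wings.', 'test edit')`, this would add a second line below the existing sentence instead of overwriting it.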

For our examples, we have been running separate cells, but you can run multiple lines at once:

In [33]:
import pywikibot

site = pywikibot.Site('test', 'wikipedia')
page = pywikibot.Page(site, 'test')
page.text = 'Hello world!'
page.save('test edit')
page.text
Sleeping for 9.3 seconds, 2020-10-09 17:05:24
Page [[Test]] saved
Out[33]:
'Hello world!'
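Since you are responsible for the edits your scripts make, it can help to look at a change before calling `page.save()`. The sketch below uses Python's standard `difflib` module to compare the current text with the proposed text; `preview_change` is a hypothetical helper, not part of Pywikibot.

```python
import difflib


def preview_change(old_text, new_text, title="page"):
    """Return a unified diff between the current and the proposed wikitext."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"{title} (current)",
        tofile=f"{title} (proposed)",
        lineterm="",
    )
    return "\n".join(diff)


# Compare the two texts used earlier in this tutorial.
print(preview_change("A pegasus is a flying horse.", "Hello world!", title="Test"))
```

In a notebook you might run `print(preview_change(old_text, new_text))` with the text you fetched and the text you plan to save, and only save once the diff looks right. An empty result means the two texts are identical.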


Use a terminal to work with Pywikibot in PAWS¶

In this tutorial, we'll show you how to perform some simple tasks using Pywikibot in a terminal in PAWS.

Note: If you are following along and want to practice, use the Test Wikipedia to ensure you don't inadvertently make mistakes on your wiki.

Set up user-config.py¶

When you are ready to work with a wiki, you'll need to connect it to Pywikibot by setting up a user-config.py file before you run any scripts. The user-config.py file contains information about the family or type of wiki you are working on and its language. This connects Pywikibot to your wiki and ensures your scripts will be run in the appropriate place.

For our tutorial, we'll be working with Test Wikipedia, and we'll set up a simple user-config.py file that connects Pywikibot to Test Wikipedia. You can explore more examples of user-config.py files to gain a better understanding of what they are and how to set them up. You may also want to look up the code for the language of the wiki you plan to work with.

Now, let's set up a basic user-config.py file for PAWS.

1) Launch PAWS in your browser. 2) Create a text file from your control panel.

3) Name the file user-config.py.

4) In the file, enter the language and family of your wiki, as well as your bot's username.

In our case we are using the test wiki, so we would enter the following:

In [34]:
mylang = 'test'
family = 'wikipedia'
usernames['wikipedia']['test'] = 'BOTNAME'

5) Under the File tab click Save.

You should now see user-config.py in the list of files in your PAWS control panel. You can alter it at any time by opening the file and editing it. Note: When you are working with the terminal in PAWS and you wish to work with a different wiki, make sure to change your user-config.py to reflect this.

For example, if you want to work in English Wikipedia, your user-config.py will include:

In [35]:
mylang = 'en'
family = 'wikipedia'

If you want to work with Wikimedia Commons, your user-config.py will include:

In [36]:
mylang = 'commons'
family = 'commons'

You can find more information about user-config.py in the Pywikibot manual on MediaWiki.
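The three settings shown above follow the same pattern for every wiki, so the file contents can be generated from a language code, a family, and a username. This is a minimal sketch for illustration; `render_user_config` is a hypothetical helper, and `BOTNAME` is the same placeholder used in the example above.

```python
def render_user_config(lang, family, username):
    """Return the text of a minimal user-config.py for one wiki."""
    return (
        f"mylang = {lang!r}\n"
        f"family = {family!r}\n"
        f"usernames[{family!r}][{lang!r}] = {username!r}\n"
    )


# Reproduce the Test Wikipedia configuration from this tutorial.
print(render_user_config("test", "wikipedia", "BOTNAME"))
```

Calling `render_user_config("commons", "commons", "BOTNAME")` produces the Wikimedia Commons variant. You would still need to save the resulting text as user-config.py yourself.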

Get started with the PAWS terminal¶

1) Launch PAWS in your browser. 2) Create a new Terminal from your control panel.

3) You'll be taken to a terminal.

If you wish to see the commands available to you, type ls /bin/ into the terminal and hit Enter.

In [37]:
ls /bin/
bash*          date*           lessecho*       pwd*         uname*
bunzip2*       dd*             lessfile@       rbash@       uncompress*
bzcat*         df*             lesskey*        readlink*    vdir*
bzcmp@         dir*            lesspipe*       rm*          wdctl*
bzdiff*        dmesg*          ln*             rmdir*       which*
bzegrep@       dnsdomainname@  login*          rnano@       ypdomainname@
bzexe*         domainname@     ls*             run-parts*   zcat*
bzfgrep@       echo*           lsblk*          sed*         zcmp*
bzgrep*        egrep*          mkdir*          sh@          zdiff*
bzip2*         false*          mknod*          sh.distrib@  zegrep*
bzip2recover*  fgrep*          mktemp*         sleep*       zfgrep*
bzless@        findmnt*        more*           stty*        zforce*
bzmore*        grep*           mount*          su*          zgrep*
cat*           gunzip*         mountpoint*     sync*        zless*
chgrp*         gzexe*          mv*             tar*         zmore*
chmod*         gzip*           nano*           tempfile*    znew*
chown*         hostname*       nisdomainname@  touch*
cp*            kill*           pidof@          true*
dash*          less*           ps*             umount*
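If you prefer to check for a specific command from a notebook rather than reading through the listing, Python's standard `shutil.which` function looks a command up on your PATH. The sketch below is only an illustration; `find_commands` is a hypothetical helper, and whether a given command is found depends on your PAWS environment.

```python
import shutil


def find_commands(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Look up a few of the commands mentioned in this tutorial.
for name, path in find_commands(["ls", "nano", "pwb.py"]).items():
    print(f"{name}: {path or 'not found'}")
```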

Log in to Test Wikipedia¶

For our tutorial you will not need to enter login credentials because you have already logged in to PAWS using OAuth.

Type the following into your terminal:

In [38]:
$ pwb.py login

The terminal should now indicate that you are logged in to Test Wikipedia.

Some basic tasks you can perform using the PAWS terminal¶

Create a page¶

In the following example, you'll create your User Talk page on the Test Wiki. Type the following in your terminal, making sure to replace <username> with your own username.

In [39]:
$ pwb.py add_text -up -talk -page:"User talk:<username>" -text:"Hello. ~~~~"

The terminal will show the change it is about to make. Notice that you will have the option to accept the change.
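The quotation marks in the command matter: the page title and the text both contain spaces, and the quotes keep each of them together as a single argument. The sketch below builds the same argument list in Python and lets the standard `shlex` module add the quoting; `add_text_args` and the username `ExampleUser` are hypothetical and used only for illustration.

```python
import shlex


def add_text_args(username, text):
    """Build the argument list for the add_text command shown above."""
    return [
        "pwb.py", "add_text", "-up", "-talk",
        f"-page:User talk:{username}",
        f"-text:{text}",
    ]


args = add_text_args("ExampleUser", "Hello. ~~~~")
# shlex.join adds the quoting a shell needs for arguments containing spaces.
print(shlex.join(args))
```

The printed line may use single quotes around whole arguments rather than double quotes around their values. Both forms pass the same arguments to the script.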

Fetch a page¶

You can fetch a page by name and save it to your PAWS control panel as a text file by typing the following:

In [40]:
pwb.py listpages -page:"<page>" -save

Once the command has finished, check the PAWS control panel, and you'll find the page saved there as a .txt file.

If you open this file, you will find the text of the page you fetched.
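A similar result can be reached from a Python 3 notebook by writing `page.text` to a file yourself. This is a sketch, not a reimplementation of the listpages script: `save_page_text` is a hypothetical helper, and the file naming it uses (replacing ":" and "/" in the title with "_") is an assumption made for this example.

```python
from pathlib import Path


def save_page_text(page, directory="."):
    """Write the wikitext of `page` to <directory>/<title>.txt and return the path.

    `page` is expected to behave like a pywikibot.Page object.
    """
    # Replace characters that are awkward in file names.
    safe_title = page.title().replace("/", "_").replace(":", "_")
    path = Path(directory) / f"{safe_title}.txt"
    path.write_text(page.text, encoding="utf-8")
    return path
```

With the page fetched earlier, `save_page_text(pywikibot.Page(site, 'Test:Pegasus'))` would write a file named Test_Pegasus.txt to the current directory.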

Explore Pywikibot scripts¶

This tutorial only covers some very basic things you can do with Pywikibot in PAWS.

Your next step would be to explore and write scripts that are more complex. You can find a list of scripts here:

  • Global bot scripts
  • Scripts package

Pywikibot and Wikidata¶

Many people use Pywikibot to work with Wikidata.

A deep exploration of this is beyond the scope of this tutorial, but you can find more information in the Wikidata section of the Pywikibot manual on MediaWiki and the Wikidata:Pywikibot - Python 3 Tutorial on Wikidata.
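As a small taste of the kind of check behind a task such as adding missing labels, the sketch below finds the languages for which an item has no label yet. It works on a plain mapping from language code to label; the helper name `missing_label_languages` and the example data are hypothetical, and how you load an item's labels with Pywikibot is covered in the tutorials linked above.

```python
def missing_label_languages(labels, wanted):
    """Return the language codes in `wanted` that have no label yet.

    `labels` is any mapping from language code to label text.
    """
    return [code for code in wanted if not labels.get(code)]


# Hypothetical label data, for illustration only.
example_labels = {"en": "Pegasus", "fr": "Pégase"}
print(missing_label_languages(example_labels, ["en", "de", "fr", "hy"]))
```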



Documentation on wikis¶

  • PAWS - Learn about PAWS and how to use Jupyter Notebooks to support your wiki projects.
  • Pywikibot - Pywikibot technical documentation on MediaWiki.

Example notebooks¶

Here you'll find a selection of user notebooks that use Pywikibot:

  • Add copyright to items in Wikidata
  • Add copyright, creator to items in Wikidata
  • Add awards to Wikidata category Sports Hall of Fame
  • Add references to items already in Wikidata
  • Auto Wikiproject
  • Add short descriptions to biographies on Wikipedia EN
  • Add items to Wikidata
  • Change qualifier in P39 statements - Wikidata
  • Make changes to pages using MyPySQL and Pywikibot - HY Wikipedia
  • Remove broken files
  • Investigate bot issues
  • Policy changes - ZH Wikipedia
  • Teahouse archives answers
  • Analyze number of new editors per month
  • Categorize images after the end of Wiki Loves Love
  • Clean history merge list - Wikiproject history
  • Categorize images from Wiki Loves Earth
  • Move and recategorize patronymic names on Commons
  • Dead interlanguage links
  • Fix BDA Ids on Wikidata
  • Fix titles on Wikidata
  • Get articles without images
  • Global replace in Wikipedia DE
  • Categorize graves in cemeteries - commons
  • Mass remove claims - Wikidata
  • A script to move pages
  • Get files with NASA image template - Commons
  • Remove redirect class
  • Check userpage authorship - RU Wikipedia
  • Fix bad interwiki links
  • Recategorize and move pages
  • Upload text
  • Parse data from talk pages
  • Add a property to a category - Wikidata
  • Autostatus update for Wikiproject
  • Batch delete and unlink images
  • Identify unhelpful file names on Commons
  • Bulk deprecate a template
  • Bulk deprecate an index parameter
  • Add statements to candidates in Canada elections - Wikidata
  • Move all pages from one subcategory to another
  • Create new user pages
  • Redirect a talk page
  • Relicense uploads to Wikimedia Commons
  • Replace page text
  • Update a redirect

Community¶

  • Pywikibot Communication - Find links to discussion groups and lists.