In this tutorial you'll learn about Pywikibot, a Python library that can be used to automate tasks on wikis, and how to get started with it using either Python 3 notebooks or the Terminal in PAWS.
See the PAWS documentation on Wikitech for more information and tutorials.
Pywikibot is a Python library and collection of tools that help automate work on MediaWiki sites. A Python library is a reusable piece of code focused on a specific purpose (math, data science, game development, etc.).
The Pywikibot library was originally designed for Wikipedia; it is now used throughout the Wikimedia Foundation's projects and on many other MediaWiki wikis.
Pywikibot makes it possible to use scripts to automate tasks on wikis. It can be used in several environments: your own computer, Toolforge, and PAWS.
Of these three options, PAWS requires the least setup and does not require advanced technical knowledge. This makes it ideal for newcomers who are learning about Wikimedia technology and for those who wish to run scripts without setting up their own environments.
Note: PAWS is most suitable for lightweight, one-time tasks. For scheduled tasks or tasks that need some heavy lifting, Toolforge is the suggested environment.
Learn more about why the PAWS service may be a good fit for your project or whether you should choose an alternative service.
From your PAWS control panel, you have two options that make it possible for you to work with the Pywikibot library: Python 3 Notebooks and the terminal available in PAWS. This tutorial will cover the basics of each.
Both methods work. The one you choose will depend on your comfort level with either method.
This tutorial covers some basic tasks you can run using Pywikibot in a Python 3 notebook. Python is a general-purpose programming language used by many in the Wikimedia technical community. While it's not necessary to know Python in order to use Pywikibot, it can be very useful to have at least some basic knowledge.
From your PAWS control panel, you can work with Pywikibot either in a Python 3 notebook or in a terminal. This tutorial will cover the basics of using both of these tools.
This tutorial covers some basic tasks you can run using Pywikibot and the terminal. If you choose to use Pywikibot in the terminal, it is helpful to have a basic understanding of the command line.
Pywikibot is a Python library that makes it possible to use scripts to perform a variety of tasks on wikis. Some examples include creating multiple pages at once, adding categories, adding labels in Wikidata, etc.
You can find an extensive list of scripts in the Pywikibot manual on MediaWiki. These are built-in scripts that you can use without coding anything.
You may also find it helpful to explore some existing notebooks that use Pywikibot to gain a better understanding of what you can do with it.
You can find a more extensive list of recipes, how-tos, and examples on Wikitech.
You will need a user-config.py file if you plan to use Pywikibot from the terminal.
In this tutorial, we'll show you how to perform some simple tasks using Pywikibot in a Python 3 notebook in PAWS. You will not need to install any software or upload additional files.
Note: If you are following along and want to practice, use the Test Wikipedia to ensure you don't inadvertently make mistakes on your wiki.
1) Launch PAWS in your browser.
2) Create a new Python 3 notebook from your control panel.
3) Now, you can import the Pywikibot library. In the code cell enter the following and click run:
import pywikibot
Next, you will need to connect Pywikibot to the wiki you want to work with. For our tutorial, we will connect to the Test Wikipedia. You won't need to enter login credentials because you have already logged into PAWS using OAuth.
To connect to the wiki you want to work with, you will need to create an APISite object that includes the language and the family of your wiki.
For example:
site = pywikibot.Site('language', 'family')
In the code cell enter the following and click run:
site = pywikibot.Site('test', 'wikipedia')
You are now connected to the Test Wikipedia and can begin to perform basic tasks using Pywikibot.
Note: To check that this was successful, type site in a code cell and click run:
site
APISite("test", "wikipedia")
Later, when you want to connect to a different wiki, you can use the same code as above. You'll just need to swap out the "language" and "family" values.
For example, if you wish to connect to English Wikipedia, you would type the following and click run:
site = pywikibot.Site('en', 'wikipedia')
site
APISite("en", "wikipedia")
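The language and family pairs that appear in this tutorial can be kept in a small lookup table, so that switching wikis means changing one label. The following is a minimal sketch, not part of Pywikibot itself: the names KNOWN_WIKIS and connect, and the labels used as dictionary keys, are hypothetical and chosen only for illustration.

```python
# Language/family pairs mentioned in this tutorial.
# The dictionary keys ("test", "enwiki", "commons") are hypothetical labels.
KNOWN_WIKIS = {
    "test": ("test", "wikipedia"),
    "enwiki": ("en", "wikipedia"),
    "commons": ("commons", "commons"),
}


def connect(label):
    """Return a Pywikibot site object for one of the labels above."""
    # Imported here so the lookup table can be inspected without Pywikibot.
    import pywikibot

    language, family = KNOWN_WIKIS[label]
    return pywikibot.Site(language, family)


# In a PAWS notebook you could then write, for example:
#     site = connect("test")
```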
With Pywikibot you can run a large number of scripts to perform an array of tasks.
site = pywikibot.Site('test', 'wikipedia')
page = pywikibot.Page(site, 'Test:Pegasus')
page.save('test edit')
Page [[Test:Pegasus]] saved
If you want to fetch a page from the wiki you are connected to, you can do so by using the following script:
page = pywikibot.Page(site, 'Page name')
In the following example, we are still working with Test Wikipedia. The page we are fetching is called "Test:Pegasus".
After we run the script, we can check whether this page exists by typing page.exists() into the cell. The output True tells us that the page exists. If the page does not exist, the output is False.
page.exists()
True
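The existence check can be combined with reading the page text, so that a missing page is handled explicitly. This is a minimal sketch: the helper name text_or_none is hypothetical, and it relies only on the exists() method and the text attribute shown in this tutorial.

```python
def text_or_none(page):
    """Return the page's wikitext, or None if the page does not exist."""
    if page.exists():
        return page.text
    return None


# In a PAWS notebook, after connecting with
# site = pywikibot.Site('test', 'wikipedia'), you could write:
#     page = pywikibot.Page(site, 'Test:Pegasus')
#     print(text_or_none(page))
```

Returning None rather than an empty string lets the caller tell a missing page apart from an existing page that happens to be blank.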
You can add text to your page using Pywikibot as well.
page = pywikibot.Page(site, 'Test:Pegasus')
page.text = 'A pegasus is a flying horse.'
page.save('test edit')
Sleeping for 9.3 seconds, 2020-10-09 17:05:14
Page [[Test:Pegasus]] saved
Now, we want to view the text on the page, so we will type page.text into the cell. When we run it, we will retrieve the page text.
page.text
'A pegasus is a flying horse.'
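Note that assigning to page.text replaces the whole text of the page. If you would rather add a line to what is already there, you can read the current text first and write back the combined result. The sketch below assumes only the text attribute and the save() call used above; the helper name append_line is hypothetical.

```python
def append_line(page, line, summary):
    """Add `line` to the end of the page text and save with an edit summary."""
    existing = page.text
    # Separate old and new content with a newline unless the page is empty.
    page.text = f"{existing}\n{line}" if existing else line
    page.save(summary)


# In a PAWS notebook, with `page` fetched as shown above:
#     append_line(page, 'It appears in Greek mythology.', 'test edit')
```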
For our examples, we have been running separate cells, but you can run multiple lines at once:
import pywikibot
site = pywikibot.Site('test', 'wikipedia')
page = pywikibot.Page(site, 'test')
page.text = 'Hello world!'
page.save('test edit')
page.text
Sleeping for 9.3 seconds, 2020-10-09 17:05:24
Page [[Test]] saved
'Hello world!'
In this tutorial, we'll show you how to perform some simple tasks using Pywikibot in a terminal in PAWS.
Note: If you are following along and want to practice, use the Test Wikipedia to ensure you don't inadvertently make mistakes on your wiki.
When you are ready to work with a wiki, you'll need to connect it to Pywikibot by setting up a user-config.py file before you run any scripts. The user-config.py file contains information about the family or type of wiki you are working on and its language. This connects Pywikibot to your wiki and ensures your scripts will be run in the appropriate place.
For our tutorial, we'll be working with Test Wikipedia, and we'll set up a simple user-config.py file that connects Pywikibot to Test Wikipedia. You can explore more examples of user-config.py files to gain a better understanding of what they are and how to set them up. You may also want to look up the code for the language of the wiki you plan to work with.
Now, let's set up a basic user-config.py file for PAWS.
1) Launch PAWS in your browser.
2) Create a text file from your control panel.
3) Give the file the title user-config.py.
4) In the file, indicate the language and family of your wiki, as well as your bot's username.
In our case we are using the test wiki, so we would enter the following:
mylang = 'test'
family = 'wikipedia'
usernames['wikipedia']['test'] = 'BOTNAME'
5) Under the File tab click Save.
You should now see the user-config.py file in the index of files in your PAWS control panel. You can alter it at any time by clicking through to the document and editing it.
Note: When you are working with the terminal in PAWS and you wish to work with a different wiki, make sure to change your user-config.py to reflect this.
For example, if you want to work in English Wikipedia, your user-config.py will include:
mylang = 'en'
family = 'wikipedia'
If you want to work with Wikimedia Commons, your user-config.py will include:
mylang = 'commons'
family = 'commons'
You can find more information about user-config.py in the Pywikibot manual on MediaWiki.
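The three settings shown above follow one pattern, so the file content for another wiki can be produced from a language code, a family, and a username. The following is a minimal sketch that only builds the text; the function name render_user_config is hypothetical, and you would still paste or save the result as user-config.py yourself.

```python
def render_user_config(language, family, username):
    """Return the text of a basic user-config.py for one wiki."""
    lines = [
        f"mylang = {language!r}",
        f"family = {family!r}",
        # Same form as the usernames line in the Test Wikipedia example.
        f"usernames[{family!r}][{language!r}] = {username!r}",
    ]
    return "\n".join(lines) + "\n"


# 'BOTNAME' is the placeholder username used in this tutorial.
print(render_user_config("test", "wikipedia", "BOTNAME"))
```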
1) Launch PAWS in your browser.
2) Create a new Terminal from your control panel.
3) You'll be taken to a terminal.
If you wish to see the commands available to you, type ls /bin/ into the terminal and hit Enter.
ls /bin/
bash* date* lessecho* pwd* uname* bunzip2* dd* lessfile@ rbash@ uncompress* bzcat* df* lesskey* readlink* vdir* bzcmp@ dir* lesspipe* rm* wdctl* bzdiff* dmesg* ln* rmdir* which* bzegrep@ dnsdomainname@ login* rnano@ ypdomainname@ bzexe* domainname@ ls* run-parts* zcat* bzfgrep@ echo* lsblk* sed* zcmp* bzgrep* egrep* mkdir* sh@ zdiff* bzip2* false* mknod* sh.distrib@ zegrep* bzip2recover* fgrep* mktemp* sleep* zfgrep* bzless@ findmnt* more* stty* zforce* bzmore* grep* mount* su* zgrep* cat* gunzip* mountpoint* sync* zless* chgrp* gzexe* mv* tar* zmore* chmod* gzip* nano* tempfile* znew* chown* hostname* nisdomainname@ touch* cp* kill* pidof@ true* dash* less* ps* umount*
For our tutorial, you will not need to enter login credentials because you have already logged into PAWS using OAuth. Type the following into your terminal:
$ pwb.py login
The terminal should now indicate that you are logged in to Test Wikipedia.
In the following example, you'll create your User Talk page on the Test Wiki. Type the following in your terminal, making sure to replace <username> with your own username.
$ pwb.py add_text -up -talk -page:"User talk:<username>" -text:"Hello. ~~~~"
You will see something similar to the following in your terminal. Notice how you will have the option to accept the changes.
You can fetch a page by name and save it to your PAWS control panel as a text file by typing the following:
$ pwb.py listpages -page:"<page>" -save
Your terminal will look like this:
Once you've saved the page, check the PAWS control panel, and you'll find the page as a .txt file there:
If you click through this file, you will see the text of the page you fetched.
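If you run these terminal commands often, it can help to assemble them in Python so that the quoting of page titles and text is handled for you. The sketch below only builds the two command lines used in this section and does not run them; the function names and the ExampleUser username are hypothetical, and the flags are copied from the commands above.

```python
import shlex


def add_text_command(username, text):
    """Build the add_text command from this tutorial as an argument list."""
    return [
        "pwb.py", "add_text", "-up", "-talk",
        f"-page:User talk:{username}",
        f"-text:{text}",
    ]


def listpages_save_command(page_title):
    """Build the listpages command that saves one page as a text file."""
    return ["pwb.py", "listpages", f"-page:{page_title}", "-save"]


# shlex.join adds the quoting that a shell needs around spaces.
command = add_text_command("ExampleUser", "Hello. ~~~~")
print(shlex.join(command))
print(shlex.join(listpages_save_command("Test:Pegasus")))
```

Keeping each command as a list of arguments avoids having to escape quotes by hand.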
This tutorial only covers some very basic things you can do with Pywikibot in PAWS.
Your next step would be to explore and write scripts that are more complex. You can find a list of scripts here:
Many people use Pywikibot to work with Wikidata.
A deep exploration of this is beyond the scope of this tutorial, but you can find more information in the Wikidata section of the Pywikibot manual on MediaWiki and the Wikidata:Pywikibot - Python 3 Tutorial on Wikidata.
Here you'll find a selection of user notebooks that use Pywikibot: