Getting the most edited shows that will end in 2016

Challenge overview

  1. Create a CSV file that contains a list of all the Wikipedia articles for shows that are ending this year, and how many revisions have been made to each article.
  2. Which of the shows that are being cancelled this year have been edited the most? Which ones have been edited the least?

This challenge can be accomplished without any functions at all, just using some combination of (nested) 'for' loops, while loops, etc.

However, using functions helps us avoid writing the same commands over and over again, and keeps our code well organized. Today we'll walk through how to design a Python script that uses functions to solve this challenge.

Step 0. Outlining what our code will do

A good way to start any complex coding project is to write an outline of what needs to be done, using notes and/or pseudocode.

Outlining your coding challenge (or final project) this way will help you stay on track and avoid getting stuck—even if you don't use functions! But it's especially helpful if you are going to use functions, because when you break the task down into steps like this, it helps you identify discrete sets of operations that naturally 'go together'.

Step 1. Write an API query function

Step 2 involves making a series of API queries using almost exactly the same parameters—the only thing that changes is the title of the show we're asking for revisions for.

Your main method (which I cover in the next step) will show what code is being executed, and in what order. But most of execution (the for loops, while loops, and if statements) will happen elsewhere—inside functions.

This sounds like a perfect opportunity to write a function. Let's plan it out. Again, using notes/pseudocode first.

Function 1 outline

If you follow all the steps above for every show in the list, you will have collected all the data you need for analysis. Here's what a final version of this function might look like.

Function 1 code

Step 2. Create a Main method

By the 'main' method, I mean the part of the script that controls what happens and in what order it happens. Before today, our whole scripts have essentially been the 'main' method, because we have been ordering everything sequentially—we start executing at the top of the script, and we finish at the bottom.

But when you are writing complex operations in your code (e.g. lots of for loops, while loops, and if statements), it quickly becomes hard to read. And it becomes even harder to get a 'birds eye view' of what the code is doing.

Your main method is where you execute the code that is inside the functions.

Your main method is usually placed at the bottom of the file, even though it is the first part of the Python file that is run (other than any import statements).

That's why the functions have to be above the main method: when Python sees a function call in the main method, it checks against the functions that it has seen so far. It's really the same situation as with show_titles: Python can't execute any operations against the contents of show_titles unless it has seen that list before it is asked to do something with it.

Step 3: Make a CSV printing function

Now we have written one function. As you can see, it makes it somewhat easier to see what's happening at a high level (especially since we gave our function a descriptive name). But it might not be clear yet why we went through the extra trouble of making a function for this, rather than just using a for loop.

For our next step, we will write a function that takes the dict we just created, shows_2016, and output it to a CSV file.

Function 2 outline

Function 2 code

This function is a bit different from the last one. First, it takes two variables, rather than one. Second, it doesn't return anything. That's because in this case, it doesn't need to: its job is to take input from Main and then save that input to a file on your hard drive or server.

Here's what our Main method looks like once we've implemented this new function:

Things to notice:

Steps 4-5: getting min and max values

Now that we've saved our results in a dictionary (shows_2016) and as a CSV file (tv_show_2016.csv) there are at least three ways we could always find out the min/max values:

  1. by opening that file in Microsoft Excel or another spreadsheet program, and sorting by the second column.
  2. by looping through our dictionary and comparing each rev count against the largest one we've seen so far, then doing the same thing looking for the smallest count.
  3. by creating sorted lists out of our dictionary data and picking the first/last value in the list

All of these solutions are fine, but #3 is probably the quickest.

Now we've accomplished our task and we can all hang up our coding gloves and go home. Right?

Not so fast

This solution is reasonably clear and succinct. But it's not very extensible. For example:

If we want to do any of this, or if we think we might want to do it in the future, without having to re-write all this code, let's take a look at our code for finding the min/max value, and see if we can make it shorter and more abstract by using a function.

Function 3 outline

Function 3 code

This function is very short. But that's okay. Functions can be very short—how much a particular function does is up to you. Many programmers are fond of short functions, because they make the code very modular and abstract.

How long should your functions be? A good rule of thumb is to try to make each function do only one thing... of course what constitutes "one thing" is up to you. Ultimately, the best way to figure out how much a single function should do is to experiment.

Here's what our Main method looks like once we've implemented this new function:

The final piece: if __name__ == "__main__":

The code we've written so far is much clearer, better organized, and easier to update than our first version. But there's one more (optional) thing that we can do to make full use of the functions we've created: add if __name__ == "__main__": at the top of our main method (which until we've been designating with an all-caps comment string).

Note that when we use if __name__ == "__main__":, we have to indent all of our 'main' stuff under it, just like we would for a function. That's because the main method is a function (method and function are roughly interchangable words in Python, like library and module). By using if __name__ == "__main__":, we're making the functional nature of 'main' explicit.

Using if __name__ == "__main__": also allows us to use our script as a module and use the functions we've created in other scripts (or in the terminal) by importing our script into that environment, and passing values to whatever function we want.

We will demonstrate this in TextWrangler using the script week8/, but I will paste what the main method of that script looks like below, for clarity.