# Start (as usual) by loading libraries
import numpy as np
import pandas as pd

def get_wac(year, state = "ca"):
    '''
    Arguments
    year: int, the year we want to bring in data for
    state: string, two-letter code of state for which we want the data (default "ca")
    
    Returns
    A pandas DataFrame with the WAC for a specific state and year
    '''
    
    base_url = 'https://lehd.ces.census.gov/data/lodes/LODES7/'
    file_specs = '{st}/wac/{st}_wac_S000_JT00_{yr}.csv.gz'.format(st = state, yr = year)
    file_name = base_url + file_specs
    
    # print("The URL for the file is at: " + file_name)
    output = pd.read_csv(file_name, compression='gzip')
    return output

Loops

Sometimes, we want to run the same code many times over. In these cases, we can use loops so that we don’t have to copy and paste the code over and over. To demonstrate how loops work, we’ll first look at a basic for loop.

for i in range(0,10):
    print(i)
0
1
2
3
4
5
6
7
8
9

Here, we are looping through the numbers 0 to 9 and printing them out. Let’s break down how each part works.

First, consider the first line.

for i in range(0,10):

This line tells Python to loop through the values 0 to 9, assigning each one to i in turn. Note that range(0,10) starts at 0 and stops before 10, so 10 itself is never used. The code runs once with i=0, then goes back and runs everything again with i=1, and so on. After the iteration with i=9, the loop stops.
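To check exactly which values a loop will visit, we can convert the range to a list and look at it. This is only a small sketch; the variable name values is chosen here for illustration.

```python
# range(start, stop) yields start, start + 1, ..., stop - 1.
# The stop value itself is excluded.
values = list(range(0, 10))
print(values)       # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(len(values))  # 10

# range(10) is shorthand for range(0, 10).
same_values = list(range(10))
print(same_values == values)  # True
```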

Notice that the second line is indented. In Python, we use the indentation to delineate when the loop starts and ends. Everything after the colon that is indented is part of the for loop. The for loop ends when a line isn’t indented.

Consider the following code and think about what you expect it to print out before running it.

for i in range(0,10):
    print(i)
print("We're done now.")
0
1
2
3
4
5
6
7
8
9
We're done now.

Since the line with print("We're done now.") isn’t indented, it isn’t part of the loop and isn’t repeated. The loop body consists of just print(i), which runs ten times; the final message prints once, after the loop finishes.

Using a For Loop to Read In CSV Files

Now that we’ve gone over the basics of how a for loop works, let’s apply it to reading in multiple CSV files. We’ve already created a function that takes a year and reads the corresponding CSV file. We want to do this for many years automatically, rather than changing the year by hand and rerunning the code each time. In other words, we want a loop that runs the same code multiple times, with only the year changed.

Part of our task is a bit easier, since we’ve already created a function that does what we want. Now, all we need to do is loop through the years we want, calling that function with a different argument for the year.
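Before downloading anything, it can help to preview which files such a loop would request. The sketch below uses a hypothetical helper, build_wac_url, which is not part of the original code: it repeats the URL construction from get_wac but skips the download, so it runs quickly and without a network connection.

```python
def build_wac_url(year, state="ca"):
    # Hypothetical helper: same URL pieces as get_wac, but nothing is downloaded.
    base_url = 'https://lehd.ces.census.gov/data/lodes/LODES7/'
    file_specs = '{st}/wac/{st}_wac_S000_JT00_{yr}.csv.gz'.format(st=state, yr=year)
    return base_url + file_specs

# Call the same function once per year, changing only the argument.
urls = []
for year in range(2009, 2012):
    urls.append(build_wac_url(year))
    print(urls[-1])
```

Each pass through the loop calls the function with a different year, which is the same pattern we will use with get_wac.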

There’s one small complication, though: how will we automate storage of these Data Frame objects? There are multiple possibilities, but we’ll use a Python dictionary.
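As a quick refresher, a dictionary stores values under keys, and we can add entries one at a time inside a loop. The minimal sketch below uses a made-up dictionary named squares purely for illustration.

```python
# Start with an empty dictionary and add one entry per pass through the loop.
squares = {}
for n in range(1, 4):
    squares[n] = n * n  # the key is n, the value is n squared

print(squares)               # {1: 1, 2: 4, 3: 9}
print(squares[2])            # 4
print(list(squares.keys()))  # [1, 2, 3]
```

Storing Data Frames works the same way: the key identifies the entry, and the value can be any Python object.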

Using Loops and Functions to Bring in Multiple Datasets

We’ll start by creating an empty dictionary in which we’ll store the Data Frames that we read in. Then, we’re going to loop through a few years (here, we’ll do 2009 to 2015), calling the get_wac function we created earlier to store the appropriate dataset in the dictionary. We’ll also make sure to provide the proper key when storing the dataset, so that we can easily access it.

# Initialize an empty dictionary.
wac_all_years = {}

# This loop might take a little bit of time.
# If you want to see progress while it runs, uncomment the second line in the loop.
for i in range(2009,2016):
    wac_all_years[i] = get_wac(year = i)
    # print("WAC for " + str(i) + " obtained.")

After running the loop, wac_all_years should contain seven Data Frames, each accessible using the year as the key.
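One way to confirm that a loop like this filled the dictionary as expected is to count the entries and list the keys. The sketch below avoids any download by using a hypothetical stand-in, fake_get_wac, which returns a text label instead of a Data Frame; the names fake_get_wac and demo_all_years are assumptions made for this illustration.

```python
def fake_get_wac(year, state="ca"):
    # Hypothetical stand-in for get_wac: returns a label instead of downloading data.
    return "WAC data for {st} in {yr}".format(st=state, yr=year)

demo_all_years = {}
for i in range(2009, 2016):
    demo_all_years[i] = fake_get_wac(year=i)

print(len(demo_all_years))         # 7
print(list(demo_all_years.keys())) # [2009, 2010, 2011, 2012, 2013, 2014, 2015]

# .items() yields (key, value) pairs, so we can loop over years and contents together.
for year, value in demo_all_years.items():
    print(year, value)
```

The same checks, len() and keys(), apply unchanged to a dictionary of Data Frames.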

Let’s look at one of the years.

wac_all_years[2009].head(10)
        w_geocode  C000  CA01  CA02  CA03  CE01  CE02  CE03  CNS01  CNS02  ...  CFA02  CFA03  CFA04  CFA05  CFS01  CFS02  CFS03  CFS04  CFS05  createdate
0  60014001001000    11     1     6     4     0     4     7      0      0  ...      0      0      0      0      0      0      0      0      0    20160228
1  60014001001007    27     7    15     5     2     6    19      0      0  ...      0      0      0      0      0      0      0      0      0    20160228
2  60014001001008     1     0     1     0     0     0     1      0      0  ...      0      0      0      0      0      0      0      0      0    20160228
3  60014001001009     1     0     1     0     0     0     1      0      0  ...      0      0      0      0      0      0      0      0      0    20160228
4  60014001001010     1     0     1     0     0     1     0      0      0  ...      0      0      0      0      0      0      0      0      0    20160228
5  60014001001012     3     1     1     1     0     0     3      0      0  ...      0      0      0      0      0      0      0      0      0    20160228
6  60014001001013     1     1     0     0     0     0     1      0      0  ...      0      0      0      0      0      0      0      0      0    20160228
7  60014001001014     1     1     0     0     0     0     1      0      0  ...      0      0      0      0      0      0      0      0      0    20160228
8  60014001001015     1     0     1     0     1     0     0      0      0  ...      0      0      0      0      0      0      0      0      0    20160228
9  60014001001016     2     1     0     1     0     0     2      0      0  ...      0      0      0      0      0      0      0      0      0    20160228

10 rows × 53 columns

Here, we’re looking up the value in the dictionary wac_all_years that has the key 2009, then using the head() method on that Data Frame object to take a peek at what the first 10 rows of the data look like.
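To see how the argument to head() controls the number of rows returned, here is a small sketch on a made-up Data Frame. The name demo and its values are placeholders for illustration, not real WAC data.

```python
import pandas as pd

# A small made-up Data Frame with 20 rows (placeholder values only).
demo = pd.DataFrame({"block": list(range(100, 120)), "jobs": list(range(20))})

print(demo.shape)           # (20, 2)
print(demo.head().shape)    # (5, 2): with no argument, head() returns 5 rows
print(demo.head(10).shape)  # (10, 2)

# head() returns a new Data Frame; the original keeps all of its rows.
first_ten = demo.head(10)
print(demo.shape)           # (20, 2)
```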

If we wanted to work more extensively with one of the years (rather than just looking at it as we’ve done here), we might want to use something like

wac_09 = wac_all_years[2009]

That way, we can just use wac_09.
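One thing to keep in mind is that a plain assignment like this gives the same Data Frame a second name rather than making a copy, so in-place changes show up under both names. If we want to experiment without touching the dictionary entry, the copy() method gives us an independent Data Frame. The sketch below uses a made-up stand-in dictionary, demo_all_years, with placeholder values.

```python
import pandas as pd

# Made-up stand-in for wac_all_years (placeholder values, not real WAC data).
demo_all_years = {2009: pd.DataFrame({"C000": [11, 27, 1]})}

same_object = demo_all_years[2009]           # another name for the same Data Frame
separate_copy = demo_all_years[2009].copy()  # an independent copy

# Adding a column through the second name changes the dictionary entry too.
same_object["doubled"] = same_object["C000"] * 2
# Adding a column to the copy leaves the dictionary entry alone.
separate_copy["tripled"] = separate_copy["C000"] * 3

print("doubled" in demo_all_years[2009].columns)  # True
print("tripled" in demo_all_years[2009].columns)  # False
```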

Checkpoint: Use Functions and Loops to Bring in Your Data for Multiple Years

Using what we’ve learned above, try to apply the same methods to bring in multiple years’ worth of data for a different state. Remember to name objects differently so that you don’t overwrite anything.