Loops
# Start (as usual) by loading libraries
import numpy as np
import pandas as pd
def get_wac(year, state="ca"):
    '''
    Arguments
        year: int, the year we want to bring in data for
        state: string, two-letter code of state for which we want the data
    Returns
        A pandas DataFrame with the WAC for a specific state and year
    '''
    base_url = 'https://lehd.ces.census.gov/data/lodes/LODES7/'
    file_specs = '{st}/wac/{st}_wac_S000_JT00_{yr}.csv.gz'.format(st=state, yr=year)
    file_name = base_url + file_specs
    # print("The URL for the file is at: " + file_name)
    output = pd.read_csv(file_name, compression='gzip')
    return output
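To see which address the function requests without downloading anything, the URL-building step can be run on its own. This is only a sketch: the helper name `build_wac_url` is hypothetical and introduced here for illustration, while the URL pieces are copied from `get_wac` above.

```python
def build_wac_url(year, state="ca"):
    # Hypothetical helper: repeats only the URL-building lines of get_wac,
    # so nothing is downloaded.
    base_url = 'https://lehd.ces.census.gov/data/lodes/LODES7/'
    file_specs = '{st}/wac/{st}_wac_S000_JT00_{yr}.csv.gz'.format(st=state, yr=year)
    return base_url + file_specs

# Build the URL for the default state ("ca") and the year 2009.
url_2009 = build_wac_url(2009)
print(url_2009)
```

Changing the `year` or `state` argument changes only the matching pieces of the file name, which is what makes the function convenient to call repeatedly.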
Loops#
Sometimes we want to run the same code many times over. In these cases, we can use a loop so that we don’t have to copy and paste the code repeatedly. To demonstrate how loops work, we’ll first look at a basic for loop.
for i in range(0,10):
    print(i)
0
1
2
3
4
5
6
7
8
9
Here, we are looping through the numbers 0 to 9 and printing them out. Let’s break down how each part works.
First, consider the first line.
for i in range(0,10):
This indicates that we will be looping through the values 0 to 9, with i taking the next value in each iteration. That is, the code will use i=0 for one iteration. Then, it will go back and do everything again, except using i=1. This keeps going until it hits i=9, after which it stops. Note that range(0,10) stops before 10, so the value 10 itself is never used.
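One way to check which values a `range` produces is to convert it to a list. The short sketch below uses illustrative variable names and only built-in Python.

```python
# range(0, 10) starts at 0 and stops before 10.
values = list(range(0, 10))
print(values)

# With a single argument, range starts at 0 by default,
# so range(10) produces the same values.
same_values = list(range(10))
print(values == same_values)
```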
Notice that the second line is indented. In Python, indentation marks where the body of the loop starts and ends. Every indented line after the colon is part of the for loop, and the loop body ends at the first line that isn’t indented.
Consider the following code and think about what you expect it to print out before running it.
for i in range(0,10):
    print(i)
print("We're done now.")
0
1
2
3
4
5
6
7
8
9
We're done now.
Since the line with print("We're done now.") isn’t indented, it isn’t part of the loop, so it runs only once, after the loop has finished. Only print(i) is repeated.
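The same rule applies when the loop body does something other than print. As a small illustration (the variable name `total` is just an example), the indented line below runs ten times, while the unindented `print` runs once at the end.

```python
total = 0
for i in range(0, 10):
    # Indented: runs once per iteration, adding the current i.
    total = total + i
# Not indented: runs once, after the loop has finished.
print(total)
```

If the `print(total)` line were indented as well, the running total would be printed on every iteration instead of only once.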
Using a For Loop to Read In CSV Files#
Now that we’ve gone over the basics of how a for loop works, let’s apply it to reading in multiple CSV files. We’ve already created a function that takes a year and reads a CSV file. We want to do this for many years automatically, rather than changing the year by hand and running the code again each time. In other words, we want a loop that runs the same code multiple times, with only the year changed.
Part of our task is a bit easier, since we’ve already created a function that does what we want. Now, all we need to do is loop through the years we want, calling that function with a different argument for the year.
There’s one small complication, though: how will we automate storage of these DataFrame objects? There are multiple possibilities, but the way we’ll do it is with a Python dictionary.
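For readers new to dictionaries, here is a minimal sketch of the idea, using short text labels as stand-ins for the DataFrames. The names `labels_by_year` and `year` are illustrative only.

```python
# Start with an empty dictionary.
labels_by_year = {}

# Store one value per year, using the year itself as the key.
for year in range(2009, 2012):
    labels_by_year[year] = "data for " + str(year)

# Look up a single value by its key.
print(labels_by_year[2010])

# List all the keys that were stored.
print(list(labels_by_year.keys()))
```

Each key maps to exactly one value, so assigning to the same key again would overwrite what was stored there.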
Using Loops and Functions to Bring in Multiple Datasets#
We’ll start by creating an empty dictionary in which we’ll store the DataFrames that we read in. Then, we’re going to loop through a few years (here, we’ll do 2009 to 2015), calling the get_wac function we created earlier to store the appropriate dataset in the dictionary. We’ll also make sure to provide the proper key when storing the dataset, so that we can easily access it.
# Initialize an empty dictionary.
wac_all_years = {}
# This loop might take a little bit of time.
# If you want to see progress while it runs, uncomment the second line in the loop.
for i in range(2009,2016):
    wac_all_years[i] = get_wac(year = i)
    # print("WAC for " + str(i) + " obtained.")
After running the loop, wac_all_years should contain seven DataFrames, each accessible using the year as the key.
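A quick way to confirm the count and the keys is to inspect the dictionary after the loop. The sketch below avoids any downloads by using a hypothetical stand-in, `get_wac_stub`, that returns a placeholder string instead of a DataFrame. The same two checks should work on the real `wac_all_years`.

```python
def get_wac_stub(year):
    # Hypothetical stand-in for get_wac: returns a placeholder, no download.
    return "WAC " + str(year)

wac_stub_all_years = {}
for i in range(2009, 2016):
    wac_stub_all_years[i] = get_wac_stub(year=i)

# Number of entries stored in the dictionary.
print(len(wac_stub_all_years))

# The keys, in order: one per year from 2009 through 2015.
print(sorted(wac_stub_all_years.keys()))
```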
Let’s look at one of the years.
wac_all_years[2009].head(10)
| | w_geocode | C000 | CA01 | CA02 | CA03 | CE01 | CE02 | CE03 | CNS01 | CNS02 | ... | CFA02 | CFA03 | CFA04 | CFA05 | CFS01 | CFS02 | CFS03 | CFS04 | CFS05 | createdate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 60014001001000 | 11 | 1 | 6 | 4 | 0 | 4 | 7 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20160228 |
1 | 60014001001007 | 27 | 7 | 15 | 5 | 2 | 6 | 19 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20160228 |
2 | 60014001001008 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20160228 |
3 | 60014001001009 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20160228 |
4 | 60014001001010 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20160228 |
5 | 60014001001012 | 3 | 1 | 1 | 1 | 0 | 0 | 3 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20160228 |
6 | 60014001001013 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20160228 |
7 | 60014001001014 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20160228 |
8 | 60014001001015 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20160228 |
9 | 60014001001016 | 2 | 1 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20160228 |
10 rows × 53 columns
Here, we’re looking at the value in the dictionary wac_all_years that has the key 2009, then using the head() method on that DataFrame object to take a peek at what the first few rows of the data look like.
If we wanted to work more extensively with one of the years (rather than just looking at it as we’ve done here), we might want to use something like
wac_09 = wac_all_years[2009]
That way, we can just use wac_09.
Checkpoint: Use Functions and Loops to Bring in Your Data for Multiple Years#
Using what we’ve learned above, try to apply the same methods to bring in multiple years’ worth of data for a different state. Remember to name objects differently so that you don’t overwrite anything.