Functions and Loops

In the previous chapter, we looked at how to load a dataset into a pandas DataFrame and explore it. In this workbook, we’ll look at how we can use Python to automate and speed up analysis. This will be useful later on, when we want to analyze and visualize multiple years’ worth of data.

Motivating Question

So far, we’ve only looked at data from 2015. However, part of what makes the LODES data so useful is that it is available for many years. The data are spread across multiple CSV files. Analyzing data across years can reveal insights that a single year cannot, so we want to be able to bring the files together. In this chapter, we’ll explore how to do that as we try to answer the following questions:

How does the number of jobs differ by age group and by year? What trends appear across years?

To answer these, we’ll introduce you to creating Python functions to make reading datasets easier. Then, we’ll use them within loops to automate reading in datasets.

Looking at Multiple Years

In the previous chapters, we looked at the 2015 dataset. We might want to look at multiple years to see what changes over time, such as the differences in industries by year. However, bringing in the CSV files one by one quickly gets tedious. Instead, we can use Python to automate reading in the datasets, which makes the whole task much easier to manage.
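To make the idea concrete, here is a minimal sketch of a function used inside a for loop. The function name `make_file_name`, the state code `"ca"`, the file name pattern, and the range of years are all hypothetical placeholders chosen for illustration; they are not the actual LODES naming scheme.

```python
def make_file_name(year, state="ca"):
    """Return a (hypothetical) file name for one year of data."""
    return f"{state}_wac_{year}.csv"


# Years we want to bring in (an assumed range, for illustration only).
years = range(2013, 2016)

# Call the function once per year instead of typing each name by hand.
file_names = []
for year in years:
    file_names.append(make_file_name(year))

print(file_names)
# ['ca_wac_2013.csv', 'ca_wac_2014.csv', 'ca_wac_2015.csv']
```

The function holds the one piece of logic that is the same for every year, and the loop supplies the part that changes. Adding another year then only requires changing `years`.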

In the following sections, we’ll go over how to automate downloading the LODES files by creating functions and for loops. To set everything up, refer to page 2 of the LODES data documentation (you can access it here). That page describes the directory tree and explains that the CSV files are compressed using the GZip algorithm. You don’t need to worry too much about the details here, but we use this information to decide exactly how to automate the downloading process.
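As a rough sketch of what reading GZip-compressed CSV files in a loop can look like, the example below builds two tiny compressed files in memory and reads them with pandas. The helper name `read_gzipped_csv`, the sample values, and the choice of years are assumptions made for illustration; the real files would come from the LODES directory tree described in the documentation.

```python
import gzip
import io

import pandas as pd


def read_gzipped_csv(source):
    """Read a GZip-compressed CSV from a path, URL, or file-like object."""
    return pd.read_csv(source, compression="gzip")


# Made-up sample data standing in for two years of files.
samples = {
    2014: "w_geocode,C000\n1,10\n2,20\n",
    2015: "w_geocode,C000\n1,12\n2,18\n",
}

frames = []
for year, text in samples.items():
    # Compress the text so it resembles a downloaded .csv.gz file.
    buffer = io.BytesIO(gzip.compress(text.encode("utf-8")))
    df = read_gzipped_csv(buffer)
    df["year"] = year  # Record which year each row came from.
    frames.append(df)

# Stack the yearly DataFrames into a single DataFrame.
all_years = pd.concat(frames, ignore_index=True)
print(all_years.shape)
# (4, 3)
```

When `pd.read_csv` is given a path or URL ending in `.gz`, its default `compression="infer"` setting detects the compression from the extension, so passing `compression="gzip"` explicitly is mainly needed for in-memory objects like the ones above.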