Introduction to SQL and Relational Databases#

Python is very useful and powerful for doing data analysis. However, it is not meant to be used with extremely large datasets, or to be used as a method of storing data. This is where SQL and relational databases come in. In this chapter, we will explore how relational databases work to store data, and how to use SQL to organize and extract the data that we want.

The LODES Data Schema#

In the SQLite database that we will be using in this book, we have the following tables:

  • ca_wac_2015: California Workplace Area Characteristics

  • ca_rac_2015: California Residence Area Characteristics

  • ca_od_2015: California Origin-Destination

  • ca_xwalk: California Block-level Crosswalk

Motivating Question#

In this chapter, we will focus on the following motivating question:

What are the Characteristics of Jobs by Census Block?

The LODES data has a wealth of information about jobs at the census block level. We want to explore this, so that we can characterize the data that is available to us. That is, for any given state, we want to answer, for example, some of the following questions:

  • How many census blocks contain workplaces?

  • What were the most jobs in a census block?

  • How many census blocks had over 50 jobs? Over 100?

  • Among census blocks containing worplaces, what is the average number of jobs per census block?

These, as well as other questions about the data we might answer, can help us better understand the distribution of jobs by location. In this notebook, try to keep these types of questions in mind as we explore the data.