Introduction to SQL and Relational Databases#

Python is very useful and powerful for doing data analysis. However, it is not meant to be used with extremely large datasets, or to be used as a method of storing data. This is where SQL and relational databases come in. In this chapter, we will explore how relational databases work to store data, and how to use SQL to organize and extract the data that we want.

The LODES Data Schema#

In the SQLite database that we will be using in this book, we have the following tables:

ca_wac_2015: California Workplace Area Characteristics
ca_rac_2015: California Residence Area Characteristics
ca_od_2015: California Origin-Destination
ca_xwalk: California Block-level Crosswalk

Motivating Question#

In this chapter, we will focus on the following motivating question:

What are the Characteristics of Jobs by Census Block?

The LODES data has a wealth of information about jobs at the census block level. We want to explore this, so that we can characterize the data that is available to us. That is, for any given state, we want to answer, for example, some of the following questions:

How many census blocks contain workplaces?
What were the most jobs in a census block?
How many census blocks had over 50 jobs? Over 100?
Among census blocks containing worplaces, what is the average number of jobs per census block?

These, as well as other questions about the data we might answer, can help us better understand the distribution of jobs by location. In this notebook, try to keep these types of questions in mind as we explore the data.

Introduction to Python and SQL for Data Analysis

Introduction to SQL and Relational Databases

Contents

Introduction to SQL and Relational Databases#

The LODES Data Schema#

Motivating Question#