%LOAD ../../lodes.db

Different Types of Joins#

We’ve so far done only one type of join, an inner join. This is the default join (which is why we didn’t need to specify anything more in the code). However, there are different types of joins.

Left and Right Joins in SQL#

Suppose we want to look at every single census block in one table, only filling in information from the second table if it exists. We’ll illustrate this using Table A and Table B from before. Recall that our JOIN created Table C:

blockid

C000

CA01

2

10

2

5

22

4

6

9

1

Instead, we want to create the following table:

blockid

C000

CA01

1

5

null

2

10

2

3

2

null

4

6

null

5

22

4

6

9

1

Here, we’ve kept every single row in Table A, and simply filled in the information from Table B if it existed for that blockid. This is called a LEFT JOIN, since we’re taking the table on the left (that is, Table A) and adding the information from Table B onto that. We could have also done a RIGHT JOIN, which does the same thing, except flipping the tables, giving us something that looks like:

blockid

C000

CA01

2

10

2

5

22

4

6

9

1

7

null

2

8

null

0

Applying Left Joins#

When might you use left or right joins? Suppose you want to know which census blocks don’t have any jobs. Then, we’d want to make sure that we keep all of the census blocks in the geography crosswalk, even if they aren’t present in the workplace area characteristics tables, and add in the workplace area characteristics. This would mean that any blocks with a null value in the C000 column would be blocks without any jobs.

To do this JOIN, we can use the LEFT JOIN statement.

SELECT w_geocode, tabblk2010, c000, ca01, ca02, ca03, ctyname, cbsaname  FROM ca_wac_2015
LEFT JOIN ca_xwalk 
ON ca_wac_2015.w_geocode = ca_xwalk.tabblk2010
LIMIT 10;
w_geocode tabblk2010 c000 ca01 ca02 ca03 ctyname cbsaname
060014001001007 060014001001007 30 2 16 12 Alameda County, CA San Francisco-Oakland-Hayward, CA
060014001001008 060014001001008 4 0 1 3 Alameda County, CA San Francisco-Oakland-Hayward, CA
060014001001011 060014001001011 3 2 1 0 Alameda County, CA San Francisco-Oakland-Hayward, CA
060014001001017 060014001001017 11 3 3 5 Alameda County, CA San Francisco-Oakland-Hayward, CA
060014001001024 060014001001024 10 3 3 4 Alameda County, CA San Francisco-Oakland-Hayward, CA
060014001001026 060014001001026 3 0 2 1 Alameda County, CA San Francisco-Oakland-Hayward, CA
060014001001027 060014001001027 13 3 3 7 Alameda County, CA San Francisco-Oakland-Hayward, CA
060014001001032 060014001001032 13 2 4 7 Alameda County, CA San Francisco-Oakland-Hayward, CA
060014001001033 060014001001033 2 0 0 2 Alameda County, CA San Francisco-Oakland-Hayward, CA
060014001001034 060014001001034 1 0 0 1 Alameda County, CA San Francisco-Oakland-Hayward, CA

This is very similar to what we’ve done already with JOIN, except we add the word LEFT to it. When doing LEFT and RIGHT JOINs, make sure to keep track of which one is first. Here, ca_wac_2015 comes first, so that’s the table on the “left” side.

Right JOINs are not supported in SQLite3, so you aren’t able to practice doing them here. However, it’s useful to keep in mind for more sophisticated databases.

Full Outer Join#

An outer join keeps all unique ids, then puts NULL if it isn’t part of that table. This is similar to a LEFT or RIGHT JOIN, except instead of only keeping all IDs from one table, it keeps them from both tables. Consider our example with Table A and Table B. We want to join them such that we get a table that looks like:

blockid

C000

CA01

1

5

null

2

10

2

3

2

null

4

6

null

5

22

4

6

9

1

7

null

2

8

null

0

In a way, it’s like combining the LEFT and RIGHT JOINs so that we have all information from both tables.

Applying Outer Joins#

Suppose we want to know which census blocks that contain either the residences of people with jobs in the state or the census blocks of the location of the workplace, but not both. We use FULL OUTER JOIN for that. Unfortunately, we aren’t able to show the outer join here, as it isn’t supported by SQLite. We’ve provided the code here, but it won’t run, so just make sure to keep it in mind for the future.

SELECT * FROM ca_wac_2015
FULL OUTER JOIN ca_rac_2015 
ON ca_wac_2015.w_geocode = ca_rac_2015.h_geocode 
LIMIT 30;

This will let us see which census blocks contain values for both workplace characteristics and residence characteristics.

Checkpoint: Types of Joins#

Consider the following situations. How would you answer the question posed? What type of join should you use for each one? Which tables do you need to join? Try doing the join.

  • How many census blocks in the state contain a workplace and how many don’t? Recall that the geography crosswalk table contains information about the census blocks in the state.

  • Which county contains the most census blocks containing a workplace?

  • Which metropolitan/micropolitan area had the most census blocks containing a residence of a worker in the state?