Different Types of Joins
Contents
%LOAD ../../lodes.db
Different Types of Joins#
We’ve so far done only one type of join, an inner join. This is the default join (which is why we didn’t need to specify anything more in the code). However, there are different types of joins.
Left and Right Joins in SQL#
Suppose we want to look at every single census block in one table, only filling in information from the second table if it exists. We’ll illustrate this using Table A and Table B from before. Recall that our JOIN
created Table C:
blockid |
C000 |
CA01 |
---|---|---|
2 |
10 |
2 |
5 |
22 |
4 |
6 |
9 |
1 |
Instead, we want to create the following table:
blockid |
C000 |
CA01 |
---|---|---|
1 |
5 |
null |
2 |
10 |
2 |
3 |
2 |
null |
4 |
6 |
null |
5 |
22 |
4 |
6 |
9 |
1 |
Here, we’ve kept every single row in Table A, and simply filled in the information from Table B if it existed for that blockid
. This is called a LEFT JOIN, since we’re taking the table on the left (that is, Table A) and adding the information from Table B onto that. We could have also done a RIGHT JOIN, which does the same thing, except flipping the tables, giving us something that looks like:
blockid |
C000 |
CA01 |
---|---|---|
2 |
10 |
2 |
5 |
22 |
4 |
6 |
9 |
1 |
7 |
null |
2 |
8 |
null |
0 |
Applying Left Joins#
When might you use left or right joins? Suppose you want to know which census blocks don’t have any jobs. Then, we’d want to make sure that we keep all of the census blocks in the geography crosswalk, even if they aren’t present in the workplace area characteristics tables, and add in the workplace area characteristics. This would mean that any blocks with a null
value in the C000
column would be blocks without any jobs.
To do this JOIN
, we can use the LEFT JOIN
statement.
SELECT w_geocode, tabblk2010, c000, ca01, ca02, ca03, ctyname, cbsaname FROM ca_wac_2015
LEFT JOIN ca_xwalk
ON ca_wac_2015.w_geocode = ca_xwalk.tabblk2010
LIMIT 10;
w_geocode | tabblk2010 | c000 | ca01 | ca02 | ca03 | ctyname | cbsaname |
---|---|---|---|---|---|---|---|
060014001001007 | 060014001001007 | 30 | 2 | 16 | 12 | Alameda County, CA | San Francisco-Oakland-Hayward, CA |
060014001001008 | 060014001001008 | 4 | 0 | 1 | 3 | Alameda County, CA | San Francisco-Oakland-Hayward, CA |
060014001001011 | 060014001001011 | 3 | 2 | 1 | 0 | Alameda County, CA | San Francisco-Oakland-Hayward, CA |
060014001001017 | 060014001001017 | 11 | 3 | 3 | 5 | Alameda County, CA | San Francisco-Oakland-Hayward, CA |
060014001001024 | 060014001001024 | 10 | 3 | 3 | 4 | Alameda County, CA | San Francisco-Oakland-Hayward, CA |
060014001001026 | 060014001001026 | 3 | 0 | 2 | 1 | Alameda County, CA | San Francisco-Oakland-Hayward, CA |
060014001001027 | 060014001001027 | 13 | 3 | 3 | 7 | Alameda County, CA | San Francisco-Oakland-Hayward, CA |
060014001001032 | 060014001001032 | 13 | 2 | 4 | 7 | Alameda County, CA | San Francisco-Oakland-Hayward, CA |
060014001001033 | 060014001001033 | 2 | 0 | 0 | 2 | Alameda County, CA | San Francisco-Oakland-Hayward, CA |
060014001001034 | 060014001001034 | 1 | 0 | 0 | 1 | Alameda County, CA | San Francisco-Oakland-Hayward, CA |
This is very similar to what we’ve done already with JOIN
, except we add the word LEFT
to it. When doing LEFT
and RIGHT JOIN
s, make sure to keep track of which one is first. Here, ca_wac_2015
comes first, so that’s the table on the “left” side.
Right JOINs are not supported in SQLite3, so you aren’t able to practice doing them here. However, it’s useful to keep in mind for more sophisticated databases.
Full Outer Join#
An outer join keeps all unique ids, then puts NULL
if it isn’t part of that table. This is similar to a LEFT
or RIGHT JOIN
, except instead of only keeping all IDs from one table, it keeps them from both tables. Consider our example with Table A and Table B. We want to join them such that we get a table that looks like:
blockid |
C000 |
CA01 |
---|---|---|
1 |
5 |
null |
2 |
10 |
2 |
3 |
2 |
null |
4 |
6 |
null |
5 |
22 |
4 |
6 |
9 |
1 |
7 |
null |
2 |
8 |
null |
0 |
In a way, it’s like combining the LEFT
and RIGHT JOIN
s so that we have all information from both tables.
Applying Outer Joins#
Suppose we want to know which census blocks that contain either the residences of people with jobs in the state or the census blocks of the location of the workplace, but not both. We use FULL OUTER JOIN
for that. Unfortunately, we aren’t able to show the outer join here, as it isn’t supported by SQLite. We’ve provided the code here, but it won’t run, so just make sure to keep it in mind for the future.
SELECT * FROM ca_wac_2015
FULL OUTER JOIN ca_rac_2015
ON ca_wac_2015.w_geocode = ca_rac_2015.h_geocode
LIMIT 30;
This will let us see which census blocks contain values for both workplace characteristics and residence characteristics.
Checkpoint: Types of Joins#
Consider the following situations. How would you answer the question posed? What type of join should you use for each one? Which tables do you need to join? Try doing the join.
How many census blocks in the state contain a workplace and how many don’t? Recall that the geography crosswalk table contains information about the census blocks in the state.
Which county contains the most census blocks containing a workplace?
Which metropolitan/micropolitan area had the most census blocks containing a residence of a worker in the state?