Using Conditional Statements#

Suppose we want to look at a subset of the data. We can use conditional statements to do this.

SELECT *
FROM ca_wac_2015 
WHERE c000 < 100
LIMIT 10;
w_geocode c000 ca01 ca02 ca03 ce01 ce02 ce03 cns01 cns02 cns03 cns04 cns05 cns06 cns07 cns08 cns09 cns10 cns11 cns12 cns13 cns14 cns15 cns16 cns17 cns18 cns19 cns20 cr01 cr02 cr03 cr04 cr05 cr07 ct01 ct02 cd01 cd02 cd03 cd04 cs01 cs02 cfa01 cfa02 cfa03 cfa04 cfa05 cfs01 cfs02 cfs03 cfs04 cfs05 createdate over100
060014001001007 30 2 16 12 4 2 24 0 0 0 0 0 0 0 0 0 1 0 25 0 0 0 0 0 0 4 0 25 1 0 3 1 0 27 3 3 6 10 9 7 23 0 0 0 0 0 0 0 0 0 0 20170919 0
060014001001008 4 0 1 3 0 0 4 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 2 0 0 2 0 0 4 0 0 2 1 1 0 4 0 0 0 0 0 0 0 0 0 0 20170919 0
060014001001011 3 2 1 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 3 0 3 0 0 0 0 1 0 3 0 0 0 0 0 0 0 0 0 0 20170919 0
060014001001017 11 3 3 5 2 2 7 0 0 0 0 0 0 0 0 0 0 0 11 0 0 0 0 0 0 0 0 10 0 0 1 0 0 11 0 0 0 3 5 10 1 0 0 0 0 0 0 0 0 0 0 20170919 0
060014001001024 10 3 3 4 7 1 2 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 2 0 0 6 0 4 0 0 5 0 1 10 0 0 2 3 2 2 8 0 0 0 0 0 0 0 0 0 0 20170919 0
060014001001026 3 0 2 1 0 2 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 1 0 1 0 0 2 0 0 2 1 0 0 2 1 1 2 0 0 0 0 0 0 0 0 0 0 20170919 0
060014001001027 13 3 3 7 4 5 4 0 0 0 0 0 0 0 0 0 0 0 3 0 0 1 1 0 5 3 0 9 0 0 4 0 0 10 3 1 4 2 3 6 7 0 0 0 0 0 0 0 0 0 0 20170919 0
060014001001032 13 2 4 7 3 2 8 0 0 0 0 0 0 6 0 0 0 0 6 0 0 0 0 0 0 1 0 6 0 0 7 0 0 12 1 0 4 3 4 7 6 0 0 0 0 0 0 0 0 0 0 20170919 0
060014001001033 2 0 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 2 0 0 0 0 0 2 0 0 0 0 2 0 2 0 0 0 0 0 0 0 0 0 0 20170919 0
060014001001034 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 20170919 0

Using a query like the one above can be useful for finding if there are any data entry errors or missing values. Since it’s not possible to have an age less 0, if there are any rows with negative age, this is likely an error or the method used to code missing values (e.g. -1).

We can also use more complicated conditional statements.

SELECT count(*) 
FROM ca_wac_2015
WHERE (c000 > 50) AND (c000 < 100);
count(*)
17877

This subsets to rows in which c000 is greater than 50 and c000 is less than 100. That is, this subsets to census blocks with between 50 and 100 total jobs. Using OR works in the same way.

SELECT count(*) 
FROM ca_wac_2015
WHERE (c000 <= 50) OR (c000 >= 100);
count(*)
225585

This subsets to rows in which c000 is less than or equal to 50 or c000 is greater than or equal to 100. This query should, in other words, capture the rest of the rows.

Common Comparison Operators#

Though there are some more complicated comparison operators (if you’re curious, feel free to look up what LIKE and IN do), these should cover most of what you want to do.

  • =: equal to

  • != or “<>”: not equal to

  • <: less than

  • <=: less-than-or-equal-to

  • >: greater than

  • >=: greater-than-or-equal-to

  • IS NULL and IS NOT NULL: The signifier of a row in a column not having a value is a special keyword: NULL. To check for NULL, you use IS NULL or IS NOT NULL, rather than “=” or “!=”. For example, to count the number of rows with NULL values for c000 we might use the following:

      SELECT count(*) 
      FROM ca_wac_2015
      WHERE c000 IS NULL;
    

Creating Variables#

Suppose we want to create a new column in the table that acts as a “flag” for which rows fit a certain condition, so that you can use them later. We can do this using the ALTER TABLE statement.

ALTER TABLE ca_wac_2015 ADD over100 BOOL;
UPDATE ca_wac_2015 SET over100 = 0;
UPDATE ca_wac_2015 SET over100 = 1 WHERE c000 > 100;

Let’s break this down line by line. First, we use ALTER TABLE, then specify the table we want to alter. In this case, we want to alter the ca_wac_2015 table. Then, we ADD a new column, over100. We designate this as a BOOL for boolean (that is, a TRUE/FALSE value) column.

After we create this new column, we need to fill it with the appropriate values. First, we’re going to set everything in the column to be 0 (or False). To do this, we use UPDATE, specify the appropriate table, then use SET over100 = 0. Then, we replace the value with 1 (or True) if the value in c000 for that row is above 100. We again use UPDATE in a similar manner, except we add a WHERE clause, so that it only set the value to TRUE if a certain condition is met – in this case, that c000 > 100.

Checkpoint: Using Conditional Statements#

Answer the questions below, making sure to write out the queries used to answer the questions.

  • How many census blocks contain more than 200 jobs?

  • How many census blocks contain residences of fewer than 25 workers?

  • How many census blocks contain workplaces with more than 10 workers with a Bachelor’s degree or higher?