Reading SQL tables using Pandas
import pandas as pd
import numpy as np
from sqlalchemy import create_engine
import sqlite3
conn = sqlite3.connect("../../lodes.db")
Reading SQL tables using Pandas#
Now that we’ve created our engine to connect to the database, we can use the read_sql
function in pandas
to write SQL queries and get tables out as DataFrames.
df = pd.read_sql("SELECT * FROM ca_wac_2015",conn)
Here, pd.read_sql()
outputs the table that the SQL query that we wrote as a string would return. In this case, it’s simply the ca_wac_2015
table. Of course, you can include more complicated queries, such as joins, if you’d like.
Let’s look at the data to make sure we got what we wanted.
df.head()
w_geocode | c000 | ca01 | ca02 | ca03 | ce01 | ce02 | ce03 | cns01 | cns02 | ... | cfa03 | cfa04 | cfa05 | cfs01 | cfs02 | cfs03 | cfs04 | cfs05 | createdate | over100 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 060014001001007 | 30 | 2 | 16 | 12 | 4 | 2 | 24 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20170919 | 0 |
1 | 060014001001008 | 4 | 0 | 1 | 3 | 0 | 0 | 4 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20170919 | 0 |
2 | 060014001001011 | 3 | 2 | 1 | 0 | 0 | 3 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20170919 | 0 |
3 | 060014001001017 | 11 | 3 | 3 | 5 | 2 | 2 | 7 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20170919 | 0 |
4 | 060014001001024 | 10 | 3 | 3 | 4 | 7 | 1 | 2 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20170919 | 0 |
5 rows × 54 columns
We can use more complicated SQL queries as well, bringing in portions of tables. This is more often what you’ll want to do, as tables in SQL can get quite large, and you may want data that has already been joined together.
query = """
SELECT w_geocode, c000, cbsaname, ctyname
FROM ca_wac_2015 a
JOIN ca_xwalk b
ON a.w_geocode = b.tabblk2010;
"""
df = pd.read_sql(query,conn)
Here, we are first constructing a query (using the triple quotation marks to create the string over multiple lines for readability), then putting that query into the read_sql()
function, which outputs the result of the query as a Data Frame. Let’s take a quick look at it.
df.head()
w_geocode | c000 | cbsaname | ctyname | |
---|---|---|---|---|
0 | 060014001001007 | 30 | San Francisco-Oakland-Hayward, CA | Alameda County, CA |
1 | 060014001001008 | 4 | San Francisco-Oakland-Hayward, CA | Alameda County, CA |
2 | 060014001001011 | 3 | San Francisco-Oakland-Hayward, CA | Alameda County, CA |
3 | 060014001001017 | 11 | San Francisco-Oakland-Hayward, CA | Alameda County, CA |
4 | 060014001001024 | 10 | San Francisco-Oakland-Hayward, CA | Alameda County, CA |