# Writing Data Frames to an SQL Database
```python
import pandas as pd
import numpy as np
from sqlalchemy import create_engine
import sqlite3

# Connect to the SQLite database we have been working with.
conn = sqlite3.connect("../../lodes.db")
```
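Note that pandas can talk to the database through either a SQLAlchemy engine or a raw `sqlite3` connection like the one above. A minimal sketch of the engine route, assuming the same `lodes.db` file:

```python
# Equivalent connection through SQLAlchemy (same database file).
engine = create_engine('sqlite:///../../lodes.db')
```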
We can use the connection we made to the database to also write tables to it. We do this using the `to_sql` DataFrame method in pandas. Let's try this out by putting a new state's data into our `lodes.db` database.
```python
# Download the 2015 Workplace Area Characteristics (WAC) data for Maryland.
data_url = 'https://lehd.ces.census.gov/data/lodes/LODES7/md/wac/md_wac_S000_JT00_2015.csv.gz'
df_md = pd.read_csv(data_url, compression='gzip')
df_md.head()
```
| | w_geocode | C000 | CA01 | CA02 | CA03 | CE01 | CE02 | CE03 | CNS01 | CNS02 | ... | CFA02 | CFA03 | CFA04 | CFA05 | CFS01 | CFS02 | CFS03 | CFS04 | CFS05 | createdate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 240010001001023 | 8 | 3 | 4 | 1 | 4 | 4 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20190826 |
| 1 | 240010001001025 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20190826 |
| 2 | 240010001001054 | 10 | 2 | 3 | 5 | 7 | 3 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20190826 |
| 3 | 240010001001113 | 2 | 0 | 2 | 0 | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20190826 |
| 4 | 240010001002061 | 8 | 4 | 4 | 0 | 7 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20190826 |

5 rows × 53 columns
First, we will lowercase all of the column names to make the table easier to work with in SQL. Remember, unquoted column names in SQL are treated as case-insensitive (and some engines, such as PostgreSQL, fold them to lowercase), so if we bring this data into the database with uppercase column names, we'll need to use double quotes to refer to them (e.g., `SELECT "C000" FROM ...` instead of `SELECT c000 FROM ...`), which can become a headache to manage down the road.
```python
# Lowercase all of the column names.
df_md.columns = df_md.columns.str.lower()
df_md.columns
```

```
Index(['w_geocode', 'c000', 'ca01', 'ca02', 'ca03', 'ce01', 'ce02', 'ce03',
       'cns01', 'cns02', 'cns03', 'cns04', 'cns05', 'cns06', 'cns07', 'cns08',
       'cns09', 'cns10', 'cns11', 'cns12', 'cns13', 'cns14', 'cns15', 'cns16',
       'cns17', 'cns18', 'cns19', 'cns20', 'cr01', 'cr02', 'cr03', 'cr04',
       'cr05', 'cr07', 'ct01', 'ct02', 'cd01', 'cd02', 'cd03', 'cd04', 'cs01',
       'cs02', 'cfa01', 'cfa02', 'cfa03', 'cfa04', 'cfa05', 'cfs01', 'cfs02',
       'cfs03', 'cfs04', 'cfs05', 'createdate'],
      dtype='object')
```
```python
# Write the DataFrame to the database as a new table named md_wac_2015.
df_md.to_sql('md_wac_2015', conn, if_exists='replace')
```

```
33450
```
Here, `to_sql()` takes the DataFrame and writes it to the database through the connection `conn`, naming the table after the value given in the first argument (in this case, `md_wac_2015`). The value printed above, 33450, is the number of rows written. The last argument, `if_exists='replace'`, makes it so that we replace the table if it already exists. This isn't always the best option, because it will overwrite any updates you might have made, but it works for our purposes: since we are bringing in a new table, there is nothing about the old one worth keeping.
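For reference, `if_exists` also accepts two other values. A minimal sketch (the `wac_all_states` table name is hypothetical):

```python
# 'fail' (the default) raises a ValueError if the table already exists,
# so running this line now would error, since we just created md_wac_2015:
# df_md.to_sql('md_wac_2015', conn, if_exists='fail')

# 'append' adds rows to an existing table instead of replacing it, e.g.,
# for stacking several states' WAC files into one combined table.
df_md.to_sql('wac_all_states', conn, if_exists='append')
```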
Let’s bring this dataset back to make sure we got what we wanted.
```python
# Read the table back out of the database to check it.
df = pd.read_sql("SELECT * FROM md_wac_2015", conn)
df.head()
```
| | index | w_geocode | c000 | ca01 | ca02 | ca03 | ce01 | ce02 | ce03 | cns01 | ... | cfa02 | cfa03 | cfa04 | cfa05 | cfs01 | cfs02 | cfs03 | cfs04 | cfs05 | createdate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 240010001001023 | 8 | 3 | 4 | 1 | 4 | 4 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20190826 |
| 1 | 1 | 240010001001025 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20190826 |
| 2 | 2 | 240010001001054 | 10 | 2 | 3 | 5 | 7 | 3 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20190826 |
| 3 | 3 | 240010001001113 | 2 | 0 | 2 | 0 | 0 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20190826 |
| 4 | 4 | 240010001002061 | 8 | 4 | 4 | 0 | 7 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20190826 |

5 rows × 54 columns
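Notice that the table came back with 54 columns instead of 53: by default, `to_sql` stores the DataFrame's index as an extra `index` column. If you don't want that column in the database, you can pass `index=False` when writing:

```python
# Rewrite the table without storing the DataFrame index as a column.
df_md.to_sql('md_wac_2015', conn, if_exists='replace', index=False)
```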
## Checkpoint: Put More Datasets into the Database
Try the same process with a different dataset: put it into the database, making sure you lowercase all of the column names, and check that it has been stored properly by bringing it back into Python. A sketch of one possible approach follows.
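If you get stuck, here is one way to do it, using Virginia as an example (assuming the Virginia file follows the same LODES7 URL pattern as the Maryland file above):

```python
# Download the 2015 WAC data for Virginia (URL pattern assumed to
# match the Maryland file used above).
data_url = 'https://lehd.ces.census.gov/data/lodes/LODES7/va/wac/va_wac_S000_JT00_2015.csv.gz'
df_va = pd.read_csv(data_url, compression='gzip')

# Lowercase the column names before writing to the database.
df_va.columns = df_va.columns.str.lower()

# Write the table, then read it back to confirm it arrived intact.
df_va.to_sql('va_wac_2015', conn, if_exists='replace', index=False)
pd.read_sql("SELECT * FROM va_wac_2015", conn).head()
```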