import pandas as pd
import numpy as np
from sqlalchemy import create_engine
import sqlite3

conn = sqlite3.connect("../../lodes.db")

Writing Data Frames to an SQL Database#

We can use the connection we created to the database to also write tables to it. We do this using the to_sql DataFrame method in pandas. Let’s try this out by putting a new state’s data into our lodes.db database.

data_url = 'https://lehd.ces.census.gov/data/lodes/LODES7/md/wac/md_wac_S000_JT00_2015.csv.gz'
df_md = pd.read_csv(data_url, compression='gzip')
df_md.head()
w_geocode C000 CA01 CA02 CA03 CE01 CE02 CE03 CNS01 CNS02 ... CFA02 CFA03 CFA04 CFA05 CFS01 CFS02 CFS03 CFS04 CFS05 createdate
0 240010001001023 8 3 4 1 4 4 0 0 0 ... 0 0 0 0 0 0 0 0 0 20190826
1 240010001001025 1 0 1 0 0 1 0 0 0 ... 0 0 0 0 0 0 0 0 0 20190826
2 240010001001054 10 2 3 5 7 3 0 0 0 ... 0 0 0 0 0 0 0 0 0 20190826
3 240010001001113 2 0 2 0 0 1 1 0 0 ... 0 0 0 0 0 0 0 0 0 20190826
4 240010001002061 8 4 4 0 7 1 0 0 0 ... 0 0 0 0 0 0 0 0 0 20190826

5 rows × 53 columns

First, we will lowercase all column names to make them easier to work with in SQL. Remember, unquoted identifiers in SQL are case-insensitive, and many tools fold them to lowercase, so if we bring this data into the database with uppercase column names, we may need to use double quotes to refer to them, which can become a headache to manage down the road.

df_md.columns = df_md.columns.str.lower()
df_md.columns
Index(['w_geocode', 'c000', 'ca01', 'ca02', 'ca03', 'ce01', 'ce02', 'ce03',
       'cns01', 'cns02', 'cns03', 'cns04', 'cns05', 'cns06', 'cns07', 'cns08',
       'cns09', 'cns10', 'cns11', 'cns12', 'cns13', 'cns14', 'cns15', 'cns16',
       'cns17', 'cns18', 'cns19', 'cns20', 'cr01', 'cr02', 'cr03', 'cr04',
       'cr05', 'cr07', 'ct01', 'ct02', 'cd01', 'cd02', 'cd03', 'cd04', 'cs01',
       'cs02', 'cfa01', 'cfa02', 'cfa03', 'cfa04', 'cfa05', 'cfs01', 'cfs02',
       'cfs03', 'cfs04', 'cfs05', 'createdate'],
      dtype='object')
df_md.to_sql('md_wac_2015', conn, if_exists='replace')
33450

Here, to_sql() takes the DataFrame and writes it into the database through the connection conn, naming the table with the value given in the first argument (in this case, md_wac_2015). The last argument, if_exists='replace', makes it so that the table is replaced if it already exists. This isn’t always the best option to use, because it will overwrite any updates you might have made, but it works for our purposes: since we are bringing in a fresh copy of the table, there is nothing about the old one worth keeping. The 33450 printed above is the number of rows that were written. Note also that, by default, to_sql writes the DataFrame index as an extra column, which is why the table we read back below has 54 columns instead of 53.
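To see how the to_sql arguments behave without touching the LODES data, here is a minimal sketch using a throwaway in-memory SQLite database and a toy DataFrame (both are assumptions for illustration, not the real data). It shows index=False, which skips writing the index column, and if_exists='append', which adds rows instead of replacing the table.

```python
import sqlite3
import pandas as pd

# Toy stand-in for the LODES data (hypothetical values)
toy = pd.DataFrame({'w_geocode': ['240010001001023', '240010001001025'],
                    'c000': [8, 1]})

demo_conn = sqlite3.connect(':memory:')  # throwaway in-memory database

# index=False skips writing the DataFrame index as an extra column;
# in recent pandas versions, to_sql returns the number of rows written
toy.to_sql('toy_wac', demo_conn, if_exists='replace', index=False)

# if_exists='fail' (the default) raises an error when the table exists;
# if_exists='append' adds the rows to the existing table instead
toy.to_sql('toy_wac', demo_conn, if_exists='append', index=False)

print(pd.read_sql('SELECT COUNT(*) AS n FROM toy_wac', demo_conn)['n'][0])  # 4
```

Because we wrote the two-row DataFrame twice (once with replace, once with append), the table ends up with four rows.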

Let’s bring this dataset back to make sure we got what we wanted.

df = pd.read_sql("SELECT * FROM md_wac_2015", conn)
df.head()
index w_geocode c000 ca01 ca02 ca03 ce01 ce02 ce03 cns01 ... cfa02 cfa03 cfa04 cfa05 cfs01 cfs02 cfs03 cfs04 cfs05 createdate
0 0 240010001001023 8 3 4 1 4 4 0 0 ... 0 0 0 0 0 0 0 0 0 20190826
1 1 240010001001025 1 0 1 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 20190826
2 2 240010001001054 10 2 3 5 7 3 0 0 ... 0 0 0 0 0 0 0 0 0 20190826
3 3 240010001001113 2 0 2 0 0 1 1 0 ... 0 0 0 0 0 0 0 0 0 20190826
4 4 240010001002061 8 4 4 0 7 1 0 0 ... 0 0 0 0 0 0 0 0 0 20190826

5 rows × 54 columns

Checkpoint: Put More Datasets into the Database#

Try doing the same process of putting data into a database with a different dataset. Make sure you lowercase all columns, and check to make sure it has been put into the database properly by bringing it back into Python.
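The steps above can be wrapped into a small helper so the same process is easy to repeat for each new dataset. This is a minimal sketch: the function name load_into_db and the toy DataFrame are hypothetical, and it uses an in-memory database so it runs on its own; in practice you would pass your real conn and a DataFrame read from another state’s LODES file.

```python
import sqlite3
import pandas as pd

def load_into_db(df, table_name, conn):
    """Lowercase column names, write df to the database, and read it back.

    Hypothetical helper; replaces the table if it already exists.
    """
    df = df.copy()
    df.columns = df.columns.str.lower()
    df.to_sql(table_name, conn, if_exists='replace', index=False)
    # Read the table back so the caller can verify the round trip
    return pd.read_sql(f'SELECT * FROM {table_name}', conn)

# Toy stand-in for another state's WAC file (hypothetical values)
toy = pd.DataFrame({'W_GEOCODE': ['110010001001000'], 'C000': [12]})
conn = sqlite3.connect(':memory:')
check = load_into_db(toy, 'dc_wac_2015', conn)
print(check.columns.tolist())  # ['w_geocode', 'c000']
```

Checking that the read-back DataFrame has the lowercased columns and the expected rows confirms the data made it into the database properly.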