{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data\n", "\n", "In this book, we will use a set of public datasets from the Longitudinal Employer-Household Dynamics (LEHD) program of the United States Census Bureau. In particular, we will use the LEHD Origin-Destination Employment Statistics (LODES) data. These data are tabulated from administrative records and describe the workplaces and residences of workers at the census block level. We will use four main types of LODES data:\n", "\n", "- **Workplace Area Characteristics (WAC):** Census block level. Job totals for the workplaces located in each census block.\n", "- **Residence Area Characteristics (RAC):** Census block level. Job totals for the workers who live in each census block.\n", "- **Origin-Destination (OD):** Census block pair level. Job totals for each pair of a home (origin) census block and a workplace (destination) census block.\n", "- **Crosswalk (xwalk):** Census block level. Lists every census block in the state together with the geographic areas that contain it (e.g., city, county)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Workplace Area Characteristics (WAC) and Residence Area Characteristics (RAC)\n", "\n", "The WAC and RAC data generally look like the following:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [ "hide_input" ] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " w_geocode C000 CA01 CA02 CA03 CE01 CE02 CE03 CNS01 CNS02 \\\n", "0 240010001001023 8 3 4 1 4 4 0 0 0 \n", "1 240010001001025 1 0 1 0 0 1 0 0 0 \n", "2 240010001001054 10 2 3 5 7 3 0 0 0 \n", "3 240010001001113 2 0 2 0 0 1 1 0 0 \n", "4 240010001002061 8 4 4 0 7 1 0 0 0 \n", "\n", " ... CFA02 CFA03 CFA04 CFA05 CFS01 CFS02 CFS03 CFS04 CFS05 \\\n", "0 ... 0 0 0 0 0 0 0 0 0 \n", "1 ... 0 0 0 0 0 0 0 0 0 \n", "2 ... 0 0 0 0 0 0 0 0 0 \n", "3 ... 0 0 0 0 0 0 0 0 0 \n", "4 ... 0 0 0 0 0 0 0 0 0 \n", "\n", " createdate \n", "0 20190826 \n", "1 20190826 \n", "2 20190826 \n", "3 20190826 \n", "4 20190826 \n", "\n", "[5 rows x 53 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd \n", "URL = 'https://lehd.ces.census.gov/data/lodes/LODES7/md/wac/md_wac_S000_JT00_2015.csv.gz'\n", "pd.read_csv(URL, compression='gzip').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, each row represents a **census block** (this particular table contains data from Maryland). The `w_geocode` column holds the **block code**, which serves as the unique identifier for the census block, and the `C000` variable is the total number of jobs in that census block. The remaining variables break the number of jobs down by various categories. For example, `CA01`, `CA02`, and `CA03` break down the jobs by age group:\n", "- `CA01`: Number of jobs for workers age 29 or younger\n", "- `CA02`: Number of jobs for workers age 30 to 54\n", "- `CA03`: Number of jobs for workers age 55 or older\n", "\n", "Because every job falls into exactly one of these age groups, the three columns sum to the value in `C000`. \n", "\n", "The RAC data are organized the same way, except that they count jobs by where workers live rather than where they work, and the block code column is `h_geocode` instead of `w_geocode`. So, the `C000` column in the RAC data is the total number of jobs held by workers who live in that census block. 
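
As a quick check of the age-group identity described above (`CA01` + `CA02` + `CA03` = `C000`), here is a minimal sketch that rebuilds the first five Maryland WAC rows shown in the output above. It keeps only the block code, the total, and the age-group columns, and then tests each row. The helper `parts_sum_to_total` is a hypothetical name used only for illustration. On the full file, you would pass it the DataFrame returned by `pd.read_csv(URL, compression='gzip')`, and the same check works for the RAC data.

```python
import pandas as pd

# Hypothetical helper (illustration only): True for each row whose
# component columns add up exactly to the total column.
def parts_sum_to_total(df, total, parts):
    return df[parts].sum(axis=1) == df[total]

# The first five rows of the Maryland 2015 WAC table displayed above,
# restricted to the block code, the job total, and the three age groups.
# Block codes are kept as strings so that leading zeros would survive.
wac = pd.DataFrame({
    'w_geocode': ['240010001001023', '240010001001025', '240010001001054',
                  '240010001001113', '240010001002061'],
    'C000': [8, 1, 10, 2, 8],
    'CA01': [3, 0, 2, 0, 4],
    'CA02': [4, 1, 3, 2, 4],
    'CA03': [1, 0, 5, 0, 0],
})

age_ok = parts_sum_to_total(wac, 'C000', ['CA01', 'CA02', 'CA03'])
print(age_ok.all())  # True for these five rows
```

Any `False` entry would flag a block whose age breakdown does not add up to its total. The block codes are stored as strings here. When reading the real files, passing `dtype={'w_geocode': str}` (or `'h_geocode'` for RAC) to `pd.read_csv` keeps them as text, which matters for states whose FIPS codes begin with a zero.
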
In the RAC data, the `CA01`, `CA02`, and `CA03` variables split `C000` by the age group of the workers living in that census block. \n", "\n", "Note that for both of these datasets, the unit of observation is the census block." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Origin-Destination (OD)\n", "\n", "The OD file looks like this:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [ "hide_input" ] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " w_geocode h_geocode S000 SA01 SA02 SA03 SE01 SE02 SE03 \\\n", "0 240010001001023 240010001002184 1 0 1 0 0 1 0 \n", "1 240010001001023 240010001003108 1 0 1 0 0 1 0 \n", "2 240010001001023 240010002003023 1 0 0 1 1 0 0 \n", "3 240010001001023 240010022001060 1 0 1 0 0 1 0 \n", "4 240010001001023 240430107002095 1 1 0 0 1 0 0 \n", "\n", " SI01 SI02 SI03 createdate \n", "0 0 1 0 20190826 \n", "1 0 1 0 20190826 \n", "2 0 1 0 20190826 \n", "3 0 1 0 20190826 \n", "4 0 1 0 20190826 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "URL = 'https://lehd.ces.census.gov/data/lodes/LODES7/md/od/md_od_main_JT00_2015.csv.gz'\n", "pd.read_csv(URL, compression='gzip').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, each row represents a `w_geocode`-`h_geocode` pair: a pair of census blocks for which at least one person worked in the `w_geocode` (workplace) census block and lived in the `h_geocode` (home) census block. The `S000` variable is the number of people who lived in the `h_geocode` census block and worked in the `w_geocode` census block. The remaining `S` variables break that count down by category (for example, `SA01`, `SA02`, and `SA03` split it by the same age groups as `CA01`, `CA02`, and `CA03`)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Crosswalk\n", "\n", "The crosswalk file looks like this:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [ "hide_input" ] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " tabblk2010 st stusps stname cty ctyname \\\n", "0 240037312011001 24 MD Maryland 24003 Anne Arundel County, MD \n", "1 240037012001003 24 MD Maryland 24003 Anne Arundel County, MD \n", "2 240037025001034 24 MD Maryland 24003 Anne Arundel County, MD \n", "3 240037027022009 24 MD Maryland 24003 Anne Arundel County, MD \n", "4 240037025004020 24 MD Maryland 24003 Anne Arundel County, MD \n", "\n", " trct trctname bgrp \\\n", "0 24003731201 7312.01 (Anne Arundel, MD) 240037312011 \n", "1 24003701200 7012 (Anne Arundel, MD) 240037012001 \n", "2 24003702500 7025 (Anne Arundel, MD) 240037025001 \n", "3 24003702702 7027.02 (Anne Arundel, MD) 240037027022 \n", "4 24003702500 7025 (Anne Arundel, MD) 240037025004 \n", "\n", " bgrpname ... stanrcname necta nectaname \\\n", "0 1 (Tract 7312.01, Anne Arundel, MD) ... NaN 99999 NaN \n", "1 1 (Tract 7012, Anne Arundel, MD) ... NaN 99999 NaN \n", "2 1 (Tract 7025, Anne Arundel, MD) ... NaN 99999 NaN \n", "3 2 (Tract 7027.02, Anne Arundel, MD) ... NaN 99999 NaN \n", "4 4 (Tract 7025, Anne Arundel, MD) ... 
NaN 99999 NaN \n", "\n", " mil milname stwib stwibname blklatdd blklondd \\\n", "0 NaN NaN 24001001 01 Anne Arundel WIA 39.086213 -76.536457 \n", "1 NaN NaN 24001001 01 Anne Arundel WIA 38.926495 -76.537151 \n", "2 NaN NaN 24001001 01 Anne Arundel WIA 38.951701 -76.550784 \n", "3 NaN NaN 24001001 01 Anne Arundel WIA 39.011417 -76.527626 \n", "4 NaN NaN 24001001 01 Anne Arundel WIA 38.947590 -76.538524 \n", "\n", " createdate \n", "0 20211018 \n", "1 20211018 \n", "2 20211018 \n", "3 20211018 \n", "4 20211018 \n", "\n", "[5 rows x 43 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "URL = 'https://lehd.ces.census.gov/data/lodes/LODES7/md/md_xwalk.csv.gz'\n", "pd.read_csv(URL, compression='gzip').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, each row represents a census block, identified by `tabblk2010` (the same 15-digit block code that appears as `w_geocode` and `h_geocode` in the other files). The remaining columns describe the larger geographies that contain the block, such as the state (`st`, `stusps`, `stname`), county (`cty`, `ctyname`), census tract (`trct`, `trctname`), and block group (`bgrp`, `bgrpname`), along with the block's latitude and longitude (`blklatdd`, `blklondd`). These codes nest: `cty` is the first five digits of the block code, `trct` the first eleven, and `bgrp` the first twelve.\n", "\n", "For more information about these datasets, see the [LODES technical documentation](https://lehd.ces.census.gov/data/lodes/LODES7/LODESTechDoc7.4.pdf)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 4 }