Search results

Showing 1 – 3 of 3 results.

Self-published

Census Data Workflow using IPUMS NHGIS API: Data Request and Download (ICPSR 120305)

Released/updated on: 2020-07-31

Time period: 2005-01-01--2009-01-01

This archive provides two Jupyter Notebooks to explore metadata and retrieve data using the IPUMS NHGIS API. American Community Survey (ACS) 5-year estimates 2005-2009 at the block group and county levels are requested and downloaded for Galveston County, Texas. Data of interest are race and ethnicity, and median household income. Block group and county shapefiles are also downloaded.

The Python code was developed in Google Colaboratory (Google Colab for short), a hosted Jupyter notebook environment that streamlines package installation, code collaboration, and management. The notebooks use Google Drive for file storage and include extensive markdown and comments. The notebooks can be adapted for use in other environments (e.g., Jupyter Notebook) and for reading and writing files on a local, shared, or cloud drive (e.g., Google Drive).

The first notebook explores metadata to identify the relevant datasets and tables and the parameters needed for the subsequent data request and retrieval. The second notebook uses those parameters to construct a data request, download the data extract, and unzip the files so that they are ready for analysis. The downloaded data are also stored separately with this archive.
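The unzip step described above can be sketched with the Python standard library alone. This is a minimal illustration, not the notebook's actual code: the function name `unzip_extract` and the paths passed to it are hypothetical, and it assumes the extract has already been downloaded as a ZIP archive.

```python
import zipfile
from pathlib import Path


def unzip_extract(zip_path, dest_dir):
    """Unpack a downloaded extract archive and return the extracted file paths.

    Hypothetical helper for illustration; the notebook may organize this differently.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    # Keep only files (directory entries end with "/"), sorted for stable output.
    return sorted(dest / name for name in names if not name.endswith("/"))
```

Calling `unzip_extract("extract.zip", "data")` would unpack the archive into a `data` folder and return the list of files found, which can then be read with a tabular or spatial library of choice.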

The data referenced in this archive are used in the research listed in the Related Publications section and in ongoing research at the Texas A&M University Department of Landscape Architecture and Urban Planning (LAUP) and the Hazard Reduction and Recovery Center (HRRC).

Self-published

NHGIS Census Data Workflow using Stata and Python, Miami-Dade County 1990 Block Groups (ICPSR 119390)

Released/updated on: 2020-05-18

Time period: 1990-01-01--1990-01-01

This data archive provides Stata .do files and a Jupyter Notebook for cleaning, manipulating, exploring, and mapping block group and county level data. The workflow is meant to provide useful tools and code to any researcher working with Census data at the block group level or other geographic levels. The benefit of using NHGIS data is the ability to obtain older datasets that are not easily available for download from the Census Bureau. Additionally, NHGIS provides the data in a standardized format with codebooks.

The workflow is broken out into sub-tasks: obtain data, clean data, and explore data. Each task is designed to be completed prior to the subsequent task. The workflow follows guidance from Long (2009).

Stata .do file features:

Loops, local and global macros, and other advanced functions for generating and manipulating variables
Commands to generate and export descriptive statistics tables, histograms, boxplots and between-area analyses to Microsoft Word and image files

Jupyter Notebook features:

Code and markdown commentary to read in and map the output file from Stata and merge with shapefiles

The data in this workflow example are used in analyses by Peacock et al. (2014) and Zhang and Peacock (2009).

Self-published

Population Distribution Workflow using Census API in Jupyter Notebook: Dynamic Map of Census Tracts in Boone County, KY, 2000 (ICPSR 120382)

Released/updated on: 2020-07-31

Time period: 2000-01-01--2000-01-01

This archive reproduces a figure titled "Figure 3.2 Boone County population distribution" from Wang and vom Hofe (2007, p.60). The archive provides a Jupyter Notebook that uses Python and can be run in Google Colaboratory. The workflow uses the Census API to retrieve data, reproduce the figure, and ensure reproducibility for anyone accessing this archive.

The Python code was developed in Google Colaboratory (Google Colab for short), a hosted Jupyter notebook environment that streamlines package installation, code collaboration, and management. The Census API is used to obtain population counts from the 2000 Decennial Census (Summary File 1, 100% data). Shapefiles are downloaded from the TIGER/Line FTP Server. All downloaded data are kept in the notebook's temporary working directory while in use. The data and shapefiles are stored separately with this archive. The final map is also stored as an HTML file.

The notebook features extensive explanations, comments, code snippets, and code output. The notebook can be viewed in a PDF format or downloaded and opened in Google Colab. References to external resources are also provided for the various functional components.

The notebook features code that performs the following functions:

install/import necessary Python packages
download the Census Tract shapefile from the TIGER/Line FTP Server
download Census data via the Census API
manipulate Census tabular data
merge Census data with TIGER/Line shapefile
apply a coordinate reference system
calculate land area and population density
map and export the map to HTML
export the map to ESRI shapefile
export the table to CSV
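The merge and density steps in the list above can be illustrated with a small standard-library sketch. It is not the notebook's code: the function name, the tract IDs, and the numbers are hypothetical, and it assumes land area is already available in square meters for each tract.

```python
SQ_M_PER_SQ_KM = 1_000_000


def population_density(population_by_tract, land_area_sq_m_by_tract):
    """Join population counts to land areas by tract ID; return people per sq km."""
    density = {}
    for tract_id, population in population_by_tract.items():
        area_sq_m = land_area_sq_m_by_tract.get(tract_id)
        if not area_sq_m:
            # Skip tracts with no matching geometry or zero land area.
            continue
        density[tract_id] = population / (area_sq_m / SQ_M_PER_SQ_KM)
    return density


# Hypothetical values for illustration only, not actual Boone County data.
population = {"tract_A": 5000, "tract_B": 1200}
land_area = {"tract_A": 2_500_000, "tract_B": 12_000_000}
print(population_density(population, land_area))
```

Keying both tables on the same tract identifier mirrors the merge between the tabular Census data and the shapefile attributes; tracts without a match are skipped rather than assigned a density.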

The notebook can be modified to perform the same operations for any county in the United States by changing the state and county FIPS code parameters for the TIGER/Line shapefile and Census API downloads. The notebook can be adapted for use in other environments (e.g., Jupyter Notebook) and for reading and writing files on a local, shared, or cloud drive (e.g., Google Drive).
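As a rough sketch of how the state and county FIPS parameters might drive the Census API request, the function below builds a tract-level query URL. The endpoint path, the variable name `P001001` (assumed here to be total population in Summary File 1), and the function name are assumptions for illustration and should be checked against the notebook and the Census API documentation; no request is sent.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint for the 2000 Decennial Census, Summary File 1.
BASE_URL = "https://api.census.gov/data/2000/dec/sf1"


def tract_population_url(state_fips, county_fips, variable="P001001"):
    """Build a tract-level query URL for one county (hypothetical helper)."""
    params = {
        "get": f"{variable},NAME",
        "for": "tract:*",
        "in": f"state:{state_fips} county:{county_fips}",
    }
    # Keep ":", "*", and "," readable; encode the space as %20.
    return f"{BASE_URL}?{urlencode(params, safe=':*,', quote_via=quote)}"


# Boone County, KY is assumed here to be state FIPS 21, county FIPS 015.
print(tract_population_url("21", "015"))
```

Switching to another county would then only require passing different FIPS codes, which is the kind of parameter change the paragraph above describes.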