2020 Census Redistricting Data (P.L. 94-171) Noisy Measurement File, United States (ICPSR 38855)
Version Date: Jun 15, 2023
Principal Investigator(s):
John M. Abowd, United States. Bureau of the Census;
Robert Ashmead, United States. Bureau of the Census;
Ryan Cumings-Menon, United States. Bureau of the Census;
Simson Garfinkel, (formerly) United States. Bureau of the Census;
Micah Heineck, Knexus Research Corporation;
Christine Heiss, Knexus Research Corporation;
Robert Johns, Knexus Research Corporation;
Daniel Kifer, United States. Bureau of the Census; Pennsylvania State University;
Philip Leclerc, United States. Bureau of the Census;
Ashwin Machanavajjhala, Duke University; Tumult Labs;
Brett Moran, United States. Bureau of the Census;
William Sexton, (formerly) United States. Bureau of the Census; Tumult Labs;
Matthew Spence, United States. Bureau of the Census;
Pavel Zhuravlev, United States. Bureau of the Census
https://doi.org/10.3886/ICPSR38855.v1
Version V1
Summary
The 2020 Census Redistricting Data (P.L. 94-171) Noisy Measurement File (NMF) is an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) TopDown Algorithm (TDA), as described in Abowd, J. et al. [2022] (https://doi.org/10.1162/99608f92.529e3cb9) and implemented in the DAS 2020 Redistricting Production Code. It was generated while the DAS executed the algorithm that produced the 2020 Census Redistricting Data (P.L. 94-171) Summary File. The NMFs are privacy-protected: they were generated using the Census Bureau's implementation of the Discrete Gaussian Mechanism, calibrated to satisfy zero-Concentrated Differential Privacy with bounded neighbors.
The NMF values, called "noisy measurements," are the output of applying the Discrete Gaussian Mechanism to counts from the 2020 Census Edited File (CEF). They are integer-valued, but they are generally inconsistent with one another (for example, in a county composed of two tracts, the noisy measurement for the county's total population may not equal the sum of the noisy measurements of the two tracts' total populations) and frequently negative (especially when the population being measured was small). The DAS code later post-processed the noisy measurements into microdata that satisfy various constraints. The NMF documented here contains both the noisy measurements and the data needed to represent the DAS constraints; thus, applying the production code base to the NMF could reproduce the steps the DAS took to produce microdata from the noisy measurements.
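The inconsistency and negativity described above can be illustrated with a small Python sketch. It is not the production sampler: it approximates a discrete Gaussian by truncating the support, and the tract names, true counts, and noise scale are hypothetical values chosen only for illustration.

```python
import math
import random


def sample_discrete_gaussian(sigma, rng, tail=12):
    """Draw one integer from a truncated discrete Gaussian centered at 0.

    The probability of integer k is proportional to exp(-k^2 / (2 sigma^2)).
    Truncating the support at +/- tail * sigma is an approximation made here
    for brevity; it is NOT the exact sampler used by the DAS.
    """
    bound = max(1, math.ceil(tail * sigma))
    support = list(range(-bound, bound + 1))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in support]
    return rng.choices(support, weights=weights, k=1)[0]


def noisy_measurement(true_count, sigma, rng):
    """Return an integer noisy measurement of an integer count."""
    return true_count + sample_discrete_gaussian(sigma, rng)


rng = random.Random(2020)  # fixed seed so the sketch is repeatable
sigma = 5.0  # hypothetical noise scale

# Hypothetical true counts for a county made of two tracts.
tract_counts = {"tract_A": 3, "tract_B": 1200}
county_count = sum(tract_counts.values())

# Each query is measured independently, so the noise terms do not cancel.
noisy_tracts = {
    name: noisy_measurement(count, sigma, rng)
    for name, count in tract_counts.items()
}
noisy_county = noisy_measurement(county_count, sigma, rng)

# The gap is generally nonzero: the county measurement need not equal
# the sum of the tract measurements.
gap = noisy_county - sum(noisy_tracts.values())
print(noisy_tracts, noisy_county, gap)
```

Because the noise is symmetric around zero, a small true count such as the hypothetical 3 in tract_A can easily yield a negative measurement, while every measurement stays integer-valued.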
The 2020 Census Redistricting Data (P.L. 94-171) Noisy Measurement File includes zero-Concentrated Differentially Private (zCDP) (Bun, M. and Steinke, T. [2016]) noisy measurements, implemented via the discrete Gaussian mechanism. These are estimated counts of individuals and housing units included in the 2020 Census Edited File (CEF), which includes confidential data initially collected in the 2020 Census of Population and Housing. The noisy measurements included in this file were subsequently post-processed by the TopDown Algorithm (TDA) to produce the 2020 Census Redistricting Data (P.L. 94-171) Summary File.
The NMF provides estimates of counts of persons in the CEF by various characteristics and combinations of characteristics, including their reported race and ethnicity, whether they were of voting age, whether they resided in a housing unit or in one of 7 group quarters types, and their census block of residence. Each estimate is the count after the addition of discrete Gaussian noise, with the scale parameter determined by the privacy-loss budget allocation for that particular query under zCDP. Noisy measurements of the counts of occupied and vacant housing units by census block are also included. Lastly, data on constraints are provided: information into which the Disclosure Avoidance System (DAS) infused no noise and which the TDA used to post-process the noisy measurements into the 2020 Census Redistricting Data (P.L. 94-171) Summary File.
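As a rough illustration of how a privacy-loss budget allocation can determine a noise scale, the sketch below uses the standard zCDP guarantee for Gaussian-type noise: adding noise with variance sigma^2 to a query with L2 sensitivity Delta satisfies rho-zCDP with rho = Delta^2 / (2 sigma^2). The function name and the example values of rho and Delta^2 are hypothetical; this record does not state the actual per-query allocations or sensitivities used in production.

```python
import math
from fractions import Fraction


def noise_variance_for_rho(rho, l2_sensitivity_sq):
    """Return sigma^2 such that the mechanism satisfies rho-zCDP.

    Solves rho = Delta^2 / (2 sigma^2) for sigma^2, where
    l2_sensitivity_sq is Delta^2, the squared L2 sensitivity of the query.
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    return l2_sensitivity_sq / (2 * rho)


# Hypothetical allocation: rho = 1/100 for one query.
rho = Fraction(1, 100)

# Assumption for illustration: under bounded neighbors, replacing one record
# moves one unit between two histogram cells, so Delta^2 = 1^2 + 1^2 = 2.
delta_sq = 2

variance = noise_variance_for_rho(rho, delta_sq)
sigma = math.sqrt(variance)
print(variance, sigma)  # 100 10.0
```

A smaller rho (less privacy-loss budget) gives a larger variance, which is why queries with small allocations produce noisier measurements.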
These data are available for download (i.e., access is not restricted). Due to their size, they must be downloaded through the link on this metadata page rather than through the standard ICPSR download. The link leads to the Globus site where the data are housed. A README file with pertinent information is located in the Globus repository.
The Globus holding site requires users to create an account to access these data. Accounts can be created through existing institutional access or through personal access.
Please see the Globus "How to get Started" page for more information.
Data Collection Notes
- Visit the U.S. Census Bureau's Disclosure Avoidance Modernization page to learn more about the use of differential privacy in the 2020 Census.
Data Source
The primary source for the 2020 Census Redistricting Data (P.L. 94-171) Noisy Measurement File was direct collection of responses from the population of the United States. A number of source documents are useful for understanding the NMFs. Chief among these are:
- 2020 Census Redistricting Data (P.L. 94-171) Summary File Technical Documentation complete-technical-documents.html#redistricting
- DAS 2020 Redistricting Production Code Release (public GitHub repository for the 2020 Census DAS, vintaged as of the commit used to produce the official production run of the Redistricting product). The zCDP framework NMFs were generated in a for-internal-use-only pickled form (https://docs.python.org/3/library/pickle.html; https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.pickleFile.html) as a byproduct of the use of this code. A stand-alone script was developed and used to convert these internal-use NMFs into the Parquet format used in this product; that script is not yet publicly available.
These data are freely available to data users at ICPSR member institutions. The curation and dissemination of this study are provided by the institutional members of ICPSR.