2020 Census Demographic and Housing Characteristics (DHC) Noisy Measurement File (NMF) (ICPSR 38937)
Version Date: Oct 24, 2023
Principal Investigator(s):
John M. Abowd, United States. Bureau of the Census;
Robert Ashmead, United States. Bureau of the Census;
Ryan Cumings-Menon, United States. Bureau of the Census;
Simson Garfinkel, (formerly) United States. Bureau of the Census;
Micah Heineck, Knexus Research Corporation;
Christine Heiss, Knexus Research Corporation;
Robert Johns, Knexus Research Corporation;
Daniel Kifer, United States. Bureau of the Census;
Philip Leclerc, United States. Bureau of the Census;
Ashwin Machanavajjhala, Duke University; Tumult Labs;
Brett Moran, United States. Bureau of the Census;
William Sexton, (formerly) United States. Bureau of the Census; Tumult Labs;
Matthew Spence, United States. Bureau of the Census;
Pavel Zhuravlev, United States. Bureau of the Census
https://doi.org/10.3886/ICPSR38937.v1
Version V1
Summary
The 2020 Census Demographic and Housing Characteristics Noisy Measurement File is an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) TopDown Algorithm (TDA), as described in Abowd, J. et al. [2022] and implemented in DAS_2020_DHC_Production_Code/das_decennial/programs/engine/primitives.py on the main branch of the uscensusbureau/DAS_2020_DHC_Production_Code repository (github.com). The file contains zero-Concentrated Differentially Private (zCDP) (Bun, M. and Steinke, T. [2016]) noisy measurements, implemented via the discrete Gaussian mechanism (Canonne, C. et al. [2023]), which added positive or negative integer-valued noise to each count. The underlying counts are tabulations of individuals and housing units in the 2020 Census Edited File (CEF), which contains confidential data collected in the 2020 Census of Population and Housing; the noisy measurements are therefore noisy estimates of those confidential counts.
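To make the mechanism concrete, the following is a minimal Python sketch of an exact discrete Gaussian sampler, following the rejection-sampling construction in Canonne et al. (a Bernoulli(exp(-γ)) sampler, then a discrete Laplace proposal, then a rejection step that yields the discrete Gaussian). It is not the DAS production code in primitives.py, which may be structured differently and uses its own source of randomness; every function name here is hypothetical, and Python's `random` module is used only for illustration (it is not cryptographically secure). The helper `zcdp_rho` applies the standard result that adding N_Z(0, σ²) noise to a query with L2 sensitivity Δ satisfies ρ-zCDP with ρ = Δ²/(2σ²).

```python
from fractions import Fraction
import math
import random


def bernoulli(p: Fraction, rng: random.Random) -> int:
    """Return 1 with probability p, for a rational p in [0, 1]."""
    return 1 if rng.randrange(p.denominator) < p.numerator else 0


def bernoulli_exp(gamma: Fraction, rng: random.Random) -> int:
    """Return 1 with probability exp(-gamma), for a rational gamma >= 0."""
    if gamma <= 1:
        k = 1
        while bernoulli(gamma / k, rng):
            k += 1
        return k % 2  # the stopping index k is odd with probability exp(-gamma)
    # exp(-gamma) = exp(-1) ** floor(gamma) * exp(-(gamma - floor(gamma)))
    whole = math.floor(gamma)
    for _ in range(whole):
        if not bernoulli_exp(Fraction(1), rng):
            return 0
    return bernoulli_exp(gamma - whole, rng)


def discrete_laplace(t: int, rng: random.Random) -> int:
    """Sample y with P(y) proportional to exp(-|y| / t), for an integer t >= 1."""
    while True:
        u = rng.randrange(t)
        if not bernoulli_exp(Fraction(u, t), rng):
            continue  # keep u with probability exp(-u / t)
        v = 0
        while bernoulli_exp(Fraction(1), rng):
            v += 1  # geometric: P(v) proportional to exp(-v)
        x = u + t * v  # now P(x) is proportional to exp(-x / t)
        negative = rng.randrange(2)
        if negative and x == 0:
            continue  # reject "-0" so that zero is not counted twice
        return -x if negative else x


def discrete_gaussian(sigma2: Fraction, rng: random.Random) -> int:
    """Sample y with P(y) proportional to exp(-y**2 / (2 * sigma2))."""
    t = math.floor(math.sqrt(sigma2)) + 1
    while True:
        y = discrete_laplace(t, rng)
        # Accept the Laplace proposal with probability proportional to Gaussian/Laplace
        gamma = (abs(y) - sigma2 / t) ** 2 / (2 * sigma2)
        if bernoulli_exp(gamma, rng):
            return y


def zcdp_rho(sigma2: Fraction, sensitivity: int = 1) -> Fraction:
    """rho of the rho-zCDP guarantee for adding N_Z(0, sigma2) noise to a query."""
    return Fraction(sensitivity ** 2) / (2 * sigma2)


def add_noise(counts, sigma2: Fraction, rng: random.Random):
    """Return integer noisy counts: each count plus independent discrete Gaussian noise."""
    return [c + discrete_gaussian(sigma2, rng) for c in counts]
```

For example, `add_noise([120, 0, 37], Fraction(4), random.Random(2020))` on these toy (not Census) counts returns three integers, and a small true count such as 0 can come back negative, which is consistent with the positive or negative noise described above.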
The noisy measurements in this file were subsequently post-processed by the TDA to produce the Census Demographic and Housing Characteristics Summary File. In addition to the noisy measurements, the file includes constraints based on invariants (counts computed without noise), with the exception of the state-level total populations, which can be obtained separately from data.census.gov.
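As a much-simplified illustration of how an invariant can constrain post-processing, the sketch below finds the real-valued vector closest in squared (L2) distance to a set of noisy counts whose entries sum exactly to an invariant total; the solution shifts every entry by the same amount. This is not the TDA, which solves larger constrained optimization problems across the geographic hierarchy and also requires nonnegative integer outputs. The function name is hypothetical.

```python
from fractions import Fraction
from typing import List


def project_to_invariant_total(noisy: List[int], total: int) -> List[Fraction]:
    """Return the L2-closest vector to `noisy` whose entries sum to `total`.

    Minimizing sum((x_i - noisy_i)**2) subject to sum(x_i) == total gives
    x_i = noisy_i + (total - sum(noisy)) / n, an equal shift of every entry.
    """
    if not noisy:
        raise ValueError("need at least one count")
    shift = Fraction(total - sum(noisy), len(noisy))  # exact rational shift
    return [Fraction(x) + shift for x in noisy]
```

For instance, `project_to_invariant_total([10, -2, 7], 18)` shifts each entry by 1, giving 11, -1, and 8; the remaining negative entry shows why the production algorithm needs constraints beyond the invariant total.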
The Noisy Measurement File was produced using the official "production settings," the final set of algorithmic parameters and privacy-loss budget allocations that were used to produce the 2020 Census Redistricting Data (P.L. 94-171) Summary File and the 2020 Census Demographic and Housing Characteristics File.
The noisy measurements are produced in an early stage of the TDA. Afterward, these noisy measurements are post-processed to ensure internal and hierarchical consistency within the resulting tables. The Census Bureau has released these noisy measurements to enable data users to evaluate the impact of disclosure avoidance variability on 2020 Census data. The 2020 Census Demographic and Housing Characteristics (DHC) Noisy Measurement File has been cleared for public dissemination by the Census Bureau Disclosure Review Board (CBDRB-FY22-DSEP-004).
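One hedged way to gauge disclosure avoidance variability for a single noisy measurement: if the noise variance σ² applied to a particular query is known (for example, from the documented production privacy-loss allocations, which are not reproduced here), the sketch below computes the smallest integer k such that discrete Gaussian noise Y satisfies P(|Y| ≤ k) ≥ a chosen coverage. It assumes zero-centered discrete Gaussian noise as described above and says nothing about the error in the post-processed published counts, which the TDA alters. The function names are hypothetical.

```python
import math
from typing import Dict


def discrete_gaussian_pmf(sigma2: float, bound: int) -> Dict[int, float]:
    """P(Y = y) for |y| <= bound, Y ~ N_Z(0, sigma2), normalized over that range."""
    weights = {y: math.exp(-y * y / (2 * sigma2)) for y in range(-bound, bound + 1)}
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


def margin_of_error(sigma2: float, coverage: float = 0.9) -> int:
    """Smallest integer k with P(|Y| <= k) >= coverage for Y ~ N_Z(0, sigma2)."""
    if sigma2 <= 0 or not 0 < coverage < 1:
        raise ValueError("need sigma2 > 0 and 0 < coverage < 1")
    # Truncate about 40 standard deviations out; the mass beyond is negligible
    bound = int(40 * math.sqrt(sigma2)) + 10
    pmf = discrete_gaussian_pmf(sigma2, bound)
    k, mass = 0, pmf[0]
    while mass < coverage and k < bound:
        k += 1
        mass += pmf[k] + pmf[-k]  # add both tails symmetrically
    return k
```

With σ² = 4, `margin_of_error(4, 0.9)` returns 3: at that variance, at least 90% of noise draws fall within ±3 of the confidential count.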
These data are available for download (i.e., they are not restricted-access). Because of their size, they must be downloaded through the link on this metadata page rather than through the standard ICPSR download. The link leads to the Globus site where the data are housed. A README file is located in the Globus repository; please refer to it for further information.
The Globus site requires users to create an account to access these data. Accounts can be created using existing institutional credentials or a personal login.
Please see the Globus "How to get Started" page for more information.
Data Collection Notes
- Visit the U.S. Census Bureau's Disclosure Avoidance Modernization page to learn more about the use of differential privacy in the 2020 Census.
- Copies of the original data collection instruments can be found at: Decennial Census Questionnaires and Instructions.
- The readme is located here: 2020_DHC_NMF_README.
Data Source
The primary source for the 2020 Census Demographic and Housing Characteristics Noisy Measurement File was direct collection of responses from the population of the United States. A number of source documents are useful for understanding the NMFs. Chief among these are:
- 2020 Census Redistricting Data (P.L. 94-171) Summary File Technical Documentation (complete-technical-documents.html#dhc-and-dp)
- DAS 2020 Redistricting Production Code Release (public GitHub repository for the 2020 Census DAS, vintaged as of the commit used to produce the official production run of the Redistricting product). The zCDP-framework NMFs were generated as a byproduct of running this code, in a pickled form intended for internal use only (https://docs.python.org/3/library/pickle.html; https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.pickleFile.html). A stand-alone script, not yet publicly available, was developed and used to convert these internal-use NMFs into the Parquet format used in this product.
Notes
These data are freely available to data users at ICPSR member institutions. The curation and dissemination of this study are provided by the institutional members of ICPSR.