This study is provided by ICPSR. ICPSR provides leadership and training in data access, curation, and methods of analysis for a diverse and expanding social science research community.
Census of Population and Housing, 2000 [United States]: Selected Subsets From Summary File 3 (ICPSR 13402)
Principal Investigator(s): United States Department of Commerce. Bureau of the Census; Inter-university Consortium for Political and Social Research
Prepared by the Inter-university Consortium for Political and Social Research, this data collection consists of selected subsets extracted from the Census of Population and Housing, 2000, Summary File 3 (SF3). The SF3 data contain information compiled from the questions asked of a sample of persons and housing units enumerated in Census 2000. Population items include sex, age, race, Hispanic or Latino origin, household relationship, marital status, caregiving by grandparents, language and ability to speak English, ancestry, place of birth, citizenship status and year of entry to the United States, migration, place of work, journey to work, school enrollment, educational attainment, veteran status, disability, employment status, industry, occupation, class of worker, income, and poverty status. Housing items include housing unit vacancy status, housing unit tenure (owner/renter), number of rooms, number of bedrooms, year moved into unit, occupants per room, units in structure, year structure built, heating fuel, telephone service, plumbing and kitchen facilities, vehicles available, value of home, rent, and shelter costs. The information in SF3 is presented in 813 tables, one variable per table cell, plus additional variables with geographic information. Cases in the summary file data are classified by levels of observation, known as "summary levels" in the Census Bureau's nomenclature, which served as the selection criteria for the subsets generated by ICPSR. 
Each subset comprises all of the cases in one of 10 summary levels: the nation (summary level 010), states (summary level 040), Metropolitan Statistical Areas (MSA)/Consolidated Metropolitan Statistical Areas (CMSA) (summary level 380), Primary Metropolitan Statistical Areas (PMSA) (summary level 385), places (summary level 160), counties (summary level 050), county subdivisions (summary level 060), whole census tracts (summary level 140), census tracts in places (summary level 158), and 5-Digit ZIP Code Tabulation Areas (ZCTA) (summary level 860). Four files are supplied for the summary level 860 subset: a single file that contains all of the SF3 tables, plus three smaller files, each of which contains about one third of the tables. Five files are supplied for each of the summary level 010, 040, 380, 385, 160, and 050 subsets: a single file that contains all of the SF3 tables, plus four smaller files, each of which contains approximately one quarter of the tables. Fifteen files are provided for each of the summary level 140 and 158 subsets. There is a national file with all of the SF3 tables, plus two smaller national files, each of which contains approximately one half of the tables. Additionally, there are three files for each of the four census regions (Northeast, Midwest, South, and West): a file with all tables and two smaller files each containing about one half of the tables. Twenty files are supplied for summary level 060. There is a national file with all tables, plus three smaller national files, each of which contains approximately one third of the tables. In addition, there are four files for each of the four census regions: a file with all tables and three smaller files each containing about one third of the tables.
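The file counts stated above can be checked with a short tally. This is only an illustrative sketch: the names REGIONS, LAYOUT, and file_counts are hypothetical and are not part of the data collection, and the layout table simply restates the description in the preceding paragraph.

```python
# Illustrative tally of the files supplied per summary level.
# All names here are hypothetical; the numbers restate the text above.

REGIONS = ("Northeast", "Midwest", "South", "West")

# components: number of smaller files accompanying each all-tables file
# regional:   whether the same set of files is repeated for each census region
LAYOUT = {
    "010": {"components": 4, "regional": False},
    "040": {"components": 4, "regional": False},
    "380": {"components": 4, "regional": False},
    "385": {"components": 4, "regional": False},
    "160": {"components": 4, "regional": False},
    "050": {"components": 4, "regional": False},
    "860": {"components": 3, "regional": False},
    "140": {"components": 2, "regional": True},
    "158": {"components": 2, "regional": True},
    "060": {"components": 3, "regional": True},
}


def file_counts(layout=LAYOUT):
    """Return {summary level: number of files} implied by the layout."""
    counts = {}
    for sumlev, spec in layout.items():
        per_set = 1 + spec["components"]  # all-tables file plus component files
        n_sets = 1 + (len(REGIONS) if spec["regional"] else 0)  # national (+ regions)
        counts[sumlev] = per_set * n_sets
    return counts


if __name__ == "__main__":
    for sumlev, n in sorted(file_counts().items()):
        print(sumlev, n)
```

Under these assumptions the tally reproduces the figures in the text: four files for summary level 860, five for each of 010, 040, 380, 385, 160, and 050, fifteen for each of 140 and 158, and twenty for 060.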
These data are freely available.
United States Department of Commerce. Bureau of the Census, and Inter-university Consortium for Political and Social Research. Census of Population and Housing, 2000 [United States]: Selected Subsets From Summary File 3. ICPSR13402-v2. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2004. http://doi.org/10.3886/ICPSR13402.v2
Persistent URL: http://doi.org/10.3886/ICPSR13402.v2
This study was funded by:
- United States Department of Health and Human Services. National Institutes of Health (R01 HD42564)
Scope of Study
Geographic Coverage: United States
Date of Collection:
Universe: All persons and housing units in the United States.
Data Types: census data
Data Collection Notes:
(1) The original SF3 data comprise 4,081 files. For the nation as a whole, every state, the District of Columbia, and Puerto Rico, there is one column-delimited file that contains geographic identifiers (the geographic header record file or "Geo" file), plus 76 comma-delimited table files, each of which contains a portion of the SF3 tables. For states, the District of Columbia, and Puerto Rico, the variables in the Geo file and table files 1-18 and 56-62 are shown down to the block group level, but the variables in table files 19-55 and 63-76 are only shown down to the census tract level. Consequently, table files 19-55 and 63-76 have fewer records than the Geo file and table files 1-18 and 56-62. In comparison, every one of the 77 national files has the same number of cases.

(2) The summary level 010, 040, 380, 385, 160, 050, 060, and 860 subsets were generated from the 77 national-level files. Initial steps in the production of each of these subsets involved sorting the national-level Geo file and 76 table files in ascending order of the common identification variable LOGRECNO, reformatting the Geo file as a comma-delimited file, and stripping off the first five identification variables from each of the 76 table files (FILEID, STUSAB, CHARITER, CIFSN, and LOGRECNO). Next, the reformatted Geo file was merged with the stripped table files, so that corresponding records in the Geo and table files were joined as a single record in the merged file. The nth record of the merged file was created by concatenating the nth record of the reformatted Geo file, the nth record of the first stripped table file, the nth record of the second stripped table file, and so on, up to the nth record of the 76th stripped table file. Finally, the subset was generated by extracting from the merged file all cases with a given value for SUMLEV, the variable that identifies the summary level. For example, the states subset was produced by extracting all cases coded 040 for SUMLEV.

(3) The summary level 140 and 158 subsets were produced one state at a time from the 4,004 state files (including the District of Columbia and Puerto Rico). Initial steps in the production of a subset for a state involved sorting its Geo file and 76 table files in ascending order of the common identification variable LOGRECNO, reformatting the Geo file as a comma-delimited file, removing records with data below the tract level from table files 1-18 and 56-62 and corresponding records in the Geo file, and stripping off the first five identification variables from each of the 76 table files. Next, the reformatted Geo file was merged with the stripped table files so that corresponding records in the Geo and table files were joined as a single record in the merged file. A state subset was generated by extracting from the merged file all cases with a given value for SUMLEV, i.e., 140 or 158. After subsets were produced for every state, the national and regional subsets were generated by combining their component state subsets in ascending order of their state Federal Information Processing Standards (FIPS) codes.

(4) Due to the large size of the summary level 140, 158, and 060 subsets, which may be too large for some systems, these subsets are also provided as four smaller regional files: Northeast, South, Midwest, and West. The following states are included in the regional files.
- Northeast: Connecticut, Massachusetts, Maine, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, and Vermont.
- Midwest: Iowa, Illinois, Indiana, Kansas, Michigan, Minnesota, Missouri, North Dakota, Nebraska, Ohio, South Dakota, and Wisconsin.
- South: Alabama, Arkansas, District of Columbia, Delaware, Florida, Georgia, Kentucky, Louisiana, Maryland, Mississippi, North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia, West Virginia, and Puerto Rico.
- West: Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, New Mexico, Nevada, Oregon, Utah, Washington, and Wyoming.

(5) To allow for compatibility with SPSS 10, which cannot read raw data files with a record length greater than 32,767, the summary file subsets, including the regional files, are supplied as smaller component files, each with a record length less than the SPSS limit. These component files supplement the complete files containing all of the SF3 variables, all of which have record lengths much larger than 32,767. Four component files are provided for the summary level 010, 040, 380, 385, 160, and 050 subsets: a "first quarter" file with the Geo variables and tables P1-P160I, a "second quarter" file with the Geo variables and tables PCT1-PCT68C, a "third quarter" file with the Geo variables and tables PCT68D-H26, and a "fourth quarter" file with the Geo variables and tables H27-HCT48I. Three component files are supplied for the summary level 060 and 860 subsets: a "first third" file with the Geo variables and tables P1-PCT27, a "second third" file with the Geo variables and tables PCT28-PCT73G, and a "third third" file with the Geo variables and tables PCT73H-HCT48I. Two component files are supplied for the summary level 140 and 158 subsets: a "first half" file with the Geo variables and tables P1-PCT69I and a "second half" file with the Geo variables and tables PCT70A-HCT48I.

(6) The implied decimal places in variables INTPTLAT (latitude) and INTPTLON (longitude) in the original data files were made explicit in the subsets. Additionally, the values of all the Geo variables were enclosed in quotes in the subsets, except for variables AREALAND, AREAWATR, POP100, HU100, INTPTLAT, and INTPTLON.

(7) The data definition statements were tested with SAS 8, SPSS 10, and Stata/SE 8.

(8) The codebook is provided by the principal investigators as a Portable Document Format (PDF) file. The PDF file format was developed by Adobe Systems Incorporated and can be accessed using PDF reader software, such as the Adobe Acrobat Reader. Information on how to obtain a copy of the Acrobat Reader is provided on the ICPSR Web site.

(9) The codebook documents data collection procedures, concepts, and individual variables in the original Summary File data as well as the ICPSR-produced subsets, but not the layout and structure of the subsets. That information is contained in the data dictionary files provided with this collection.

(10) Each subset contains all of the geographic component iterations in its summary level, if any.
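The merge-and-extract procedure described in notes (2) and (3) can be illustrated with a minimal sketch. It is not ICPSR's actual processing code. It assumes the Geo file has already been reformatted as comma-delimited rows and that every file holds the same number of records. The function name merge_and_subset and the column-index parameters are hypothetical; the positions of SUMLEV and LOGRECNO in the Geo record must be supplied by the caller.

```python
from typing import List, Sequence

# The five identification variables that lead every table-file record:
# FILEID, STUSAB, CHARITER, CIFSN, LOGRECNO (LOGRECNO is the fifth).
N_ID_FIELDS = 5
TABLE_LOGRECNO_IDX = N_ID_FIELDS - 1


def merge_and_subset(
    geo_rows: Sequence[Sequence[str]],
    table_files: Sequence[Sequence[Sequence[str]]],
    sumlev: str,
    geo_sumlev_idx: int,    # assumption: position of SUMLEV in a Geo row
    geo_logrecno_idx: int,  # assumption: position of LOGRECNO in a Geo row
) -> List[List[str]]:
    """Join Geo and table records on LOGRECNO, then keep one summary level."""
    # Step 1: sort every file in ascending order of LOGRECNO.
    geo_sorted = sorted(geo_rows, key=lambda r: int(r[geo_logrecno_idx]))
    tables_sorted = [
        sorted(rows, key=lambda r: int(r[TABLE_LOGRECNO_IDX]))
        for rows in table_files
    ]
    for rows in tables_sorted:
        if len(rows) != len(geo_sorted):
            raise ValueError("every file must have the same number of records")

    # Step 2: build the nth merged record from the nth record of each file,
    # dropping the five identification variables from each table record.
    merged = []
    for n, geo in enumerate(geo_sorted):
        record = list(geo)
        for rows in tables_sorted:
            row = rows[n]
            if int(row[TABLE_LOGRECNO_IDX]) != int(geo[geo_logrecno_idx]):
                raise ValueError(f"LOGRECNO mismatch at record {n}")
            record.extend(row[N_ID_FIELDS:])
        merged.append(record)

    # Step 3: extract all cases with the requested SUMLEV value.
    return [r for r in merged if r[geo_sumlev_idx] == sumlev]
```

For the state-level procedure in note (3), records below the tract level would first have to be removed from the Geo file and from table files 1-18 and 56-62 so that all files have matching records; that step is omitted from this sketch.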
Sample: Every person and housing unit in the United States was asked basic demographic and housing questions, for example, race, age, relationship to householder, housing unit vacancy status, and housing unit tenure. A sample of these people and housing units was asked more detailed questions about items such as income, occupation, and housing costs. The sampling unit for Census 2000 was the housing unit, including all occupants. There were four different housing unit sampling rates: 1-in-8, 1-in-6, 1-in-4, and 1-in-2 (designed for an overall average of about 1-in-6). The Census Bureau assigned these varying rates based on pre-census occupied housing unit estimates of various geographic and statistical entities, such as incorporated places and interim census tracts. For people living in group quarters or enumerated at long-form eligible service sites (shelters and soup kitchens), the sampling unit was the person and the sampling rate was 1-in-6.
Original ICPSR Release: 2003-06-13
- 2006-01-18 File CB13402.ALL.PDF was removed from any previous datasets and flagged as a study-level file, so that it will accompany all downloads.
- 2004-07-26 Subsets for six more summary levels were added to the data collection: the nation (summary level 010), PMSAs (summary level 385), places (summary level 160), county subdivisions (summary level 060), 5-Digit ZCTAs (summary level 860), and census tracts in places (summary level 158). The documentation has been revised to reflect these additions.