Missing Data in the Uniform Crime Reports (UCR), 1977-2000 [United States] (ICPSR 32061)
This study reexamined and recoded missing data in the Uniform Crime Reports (UCR) for the years 1977 to 2000 for all police agencies in the United States. The principal investigator cleaned the data for 20,067 Originating Agency Identifiers (ORIs) contained in the Offenses-Known UCR data from 1977 to 2000. Data cleaning involved performing agency name checks and creating new numerical codes for different types of missing data. These codes identify whether a record was aggregated to a particular month, whether no data were reported (true missing), whether more than one index crime was missing, whether a particular index crime (motor vehicle theft, larceny, burglary, assault, robbery, rape, murder) was missing, whether a value was assigned as missing by the researcher according to the "rule of 20", whether a value was an outlier, whether an ORI was covered by another agency, and whether an agency did not exist during a particular time period.
The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.
WARNING: This study is over 150MB in size and may take several minutes to download on a typical internet connection.
Targonski, Joseph. Missing Data in the Uniform Crime Reports (UCR), 1977-2000 [United States]. ICPSR32061-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2012-11-26. http://doi.org/10.3886/ICPSR32061.v1
Persistent URL: http://doi.org/10.3886/ICPSR32061.v1
This study was funded by:
- United States Department of Justice. Office of Justice Programs. National Institute of Justice (2004-IJ-CX-0006)
Scope of Study
Subject Terms: assault, auto theft, burglary, crime rates, crime reporting, crime statistics, larceny, law enforcement, murder, offenses, police, police departments, police records, police reports, rape, records management, robbery, Uniform Crime Reports
Geographic Coverage: United States
The principal investigator submitted data for this project in Microsoft Excel format. ICPSR is distributing the Microsoft Excel data so that secondary users can view the color codes developed by the principal investigator for the various forms of missing data. Additionally, ICPSR converted the original Microsoft Excel data into a full suite of formats for preservation and dissemination, including SAS, SPSS, and Stata formats.
More detailed information about imputation methodologies in the Offenses-Known Uniform Crime Reports, data cleaning, and the creation and testing of simulation datasets is available in the project's report (Targonski, 2011; NCJ 235152).
The principal investigator cleaned the data for 20,067 Originating Agency Identifiers (ORIs) based on the Offenses-Known Uniform Crime Reporting (UCR) Program Data from 1977 to 2000. The UCR Offenses-Known data collection assembles crime tabulations from what is known as the "Return A" form, which police agencies submit monthly. The form covers the crime index, which encompasses murder, rape, robbery, aggravated assault, burglary, larceny, and motor vehicle theft.
The principal investigator merged the agency-level files from 1977-2000, using the ORI as the key variable, to create a single longitudinal dataset. The principal investigator then prepared and cleaned the longitudinal dataset to create the final version distributed as part of this data collection. Data cleaning entailed performing agency name checks, identifying "true missing" values, creating monthly aggregation missing value codes, identifying agencies that are "covered by" another agency, flagging non-existent agencies, creating researcher-assigned missing values according to the "rule of 20", and accounting for negative values as well as outlier values. Specifically, the principal investigator performed the following data cleaning tasks:
- Agency name checks were performed to ensure the ORI code for each year refers to one and only one agency and to determine the years in which the ORI existed.
- Any month with a missing value for the Return A variable DATE LAST UPDATE was recoded as a "true missing" value (-99).
- To accurately account for the number of months reported, months that were flagged as missing by the DATE LAST UPDATE variable were recoded using distinct monthly aggregation missing value codes (-112 through -102).
- Some smaller agencies choose to report their UCR data through a larger neighboring agency rather than reporting directly to the FBI or a state reporting agency. This is a "covered by" situation, in which the larger agency acts as the "covering" agency. To support the analysis of missing data in these cases, a missing value code (-85) was assigned to the months in which an agency was covered by another agency.
- For the years between 1977 and 2000 in which an ORI was not in existence, a separate missing value code (-80) was assigned to the months in which that agency did not exist.
- A missing value code (-90) was assigned according to a "rule of 20". The "rule of 20" established that an ORI with an average of 20 or more index crimes per month could not have zero index crimes in a month for which DATE LAST UPDATE flagged the Return A as submitted.
- To screen negative values for outliers, -4 was set as the cutoff for legitimate values. Any value less than -4 was recoded as a missing value (-99), since it was most likely a data entry error.
- To identify additional outlier values during data screening, each agency's trend was examined graphically. Outliers detected in the crime index through this process were also recoded as -90.
- For the crimes of motor vehicle theft, larceny, burglary, assault, robbery, rape, and murder, missing data codes (-97 through -91) were assigned if a particular index crime was missing. If more than one index crime was missing, a separate missing data code (-98) was assigned.
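The coding scheme described in the tasks above can be illustrated with a minimal Python sketch. It is not the principal investigator's code. The function names (`classify`, `screen_series`), the use of `None` to stand for a month with a missing DATE LAST UPDATE, and the way the monthly average is computed (the mean over submitted months with legitimate values) are all assumptions made for illustration. The sketch also does not map the codes -97 through -91 to individual crimes, since that mapping is not stated here.

```python
# Missing-value codes as listed in the data cleaning tasks above.
TRUE_MISSING = -99             # no data reported, or a value below the cutoff
MULTIPLE_CRIMES_MISSING = -98  # more than one index crime missing
RULE_OF_20_OR_OUTLIER = -90    # "rule of 20" violations and graphical outliers
COVERED_BY = -85               # agency covered by another agency
NOT_IN_EXISTENCE = -80         # agency did not exist in that period
NEGATIVE_CUTOFF = -4           # values below this are treated as entry errors


def classify(value):
    """Return a descriptive label for one cleaned monthly crime index value."""
    if value >= NEGATIVE_CUTOFF:
        return "reported"
    if -112 <= value <= -102:
        return "monthly aggregation"
    if value == TRUE_MISSING:
        return "true missing"
    if value == MULTIPLE_CRIMES_MISSING:
        return "more than one index crime missing"
    if -97 <= value <= -91:
        return "one index crime missing"
    if value == RULE_OF_20_OR_OUTLIER:
        return "rule of 20 or outlier"
    if value == COVERED_BY:
        return "covered by another agency"
    if value == NOT_IN_EXISTENCE:
        return "agency not in existence"
    return "unrecognized"


def screen_series(raw):
    """Apply three of the screens to one agency's raw monthly series.

    `raw` holds monthly crime index counts; None marks a month whose
    DATE LAST UPDATE is missing (an assumption of this sketch).
    """
    # Assumption: the average is taken over submitted, legitimate values.
    legitimate = [v for v in raw if v is not None and v >= NEGATIVE_CUTOFF]
    mean = sum(legitimate) / len(legitimate) if legitimate else 0.0

    cleaned = []
    for v in raw:
        if v is None:
            cleaned.append(TRUE_MISSING)           # not submitted
        elif v < NEGATIVE_CUTOFF:
            cleaned.append(TRUE_MISSING)           # likely data entry error
        elif v == 0 and mean >= 20:
            cleaned.append(RULE_OF_20_OR_OUTLIER)  # implausible zero
        else:
            cleaned.append(v)
    return cleaned


example = screen_series([30, 28, 0, None, -7, 45, -2])
labels = [classify(v) for v in example]
```

In this example the zero in the third month is flagged because the agency averages more than 20 index crimes per month, while the value -2 is kept as a legitimate small negative value.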
Sample: The sample consists of 20,067 police agencies in the United States, as identified by all Originating Agency Identifiers (ORIs) in the Offenses-Known Uniform Crime Reporting data from 1977 to 2000.
Data Source:
UNIFORM CRIME REPORTING PROGRAM DATA: 1975-1997 [ICPSR 9028]
UNIFORM CRIME REPORTING PROGRAM DATA: OFFENSES KNOWN AND CLEARANCES BY ARREST, 1998 [ICPSR 2904]
UNIFORM CRIME REPORTING PROGRAM DATA: OFFENSES KNOWN AND CLEARANCES BY ARREST, 1999 [ICPSR 3158]
UNIFORM CRIME REPORTING PROGRAM DATA: OFFENSES KNOWN AND CLEARANCES BY ARREST, 2000 [ICPSR 3447]
Description of Variables: This study contains a total of 410 variables including an Originating Agency Identifier (ORI) name and code, population totals by year, covering agency by year, statistical metropolitan area by year, county code by year, FBI group by year, and FBI crime index totals by month and year.
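Because the monthly crime index totals carry negative missing data codes alongside real counts, secondary users will likely want to separate the two before computing totals. The following Python sketch shows one way to do so for a single agency-year. The function name `summarize_year` and the threshold constant are hypothetical, and the sketch assumes that every value at or below -80 is a missing data code, which matches the codes documented for this collection (-80 through -112) but should be checked against the codebook.

```python
# Assumption: all documented missing data codes are at or below -80,
# while legitimate values are -4 or greater.
MISSING_CODE_CEILING = -80


def summarize_year(monthly_values):
    """Count reported months and sum their crime index totals.

    Months carrying a missing data code are excluded from both figures.
    The total is None when no month was reported.
    """
    valid = [v for v in monthly_values if v > MISSING_CODE_CEILING]
    return {
        "months_reported": len(valid),
        "total": sum(valid) if valid else None,
    }


# Twelve illustrative monthly values for one hypothetical agency-year.
year = [120, 115, -99, 130, -85, 0, 98, -90, 101, 110, -104, 125]
summary = summarize_year(year)
```

Reporting the number of valid months alongside the total makes it clear when an annual figure is based on partial data.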
ICPSR performed the following processing on this data collection:
- Created variable labels and/or value labels.
- Checked for undocumented or out-of-range codes.
Original ICPSR Release: 2012-11-26