Uniform Crime Reports
Smallest Geographic Unit:
Date of Collection:
Unit of Observation:
All police agencies in the United States between 1977 and 2000.
administrative records data
Data Collection Notes:
The principal investigator submitted data for this project in Microsoft Excel format. ICPSR is distributing the Microsoft Excel data so that secondary users can view the color codes developed by the principal investigator for the various forms of missing data. Additionally, ICPSR converted the original Microsoft Excel data into a full suite of formats for preservation and dissemination, including SAS, SPSS, and Stata formats.
More detailed information about imputation methodologies in the Offenses-Known Uniform Crime Reports, data cleaning, and the creation and testing of simulation datasets is available in the project's report (Targonski, 2011; NCJ 235152).
The purpose of this study was to reexamine and recode missing data in the Uniform Crime Reports (UCR) for the years 1977 to 2000 for all police agencies in the United States.
The principal investigator cleaned the data for 20,067 Originating Agency Identifiers (ORIs) in the Offenses-Known Uniform Crime Reporting (UCR) Program Data from 1977 to 2000. The UCR Offenses-Known data collection assembles crime tabulations from the "Return A" form, which police agencies submit monthly. These tabulations include the crime index, which encompasses murder, rape, robbery, aggravated assault, burglary, larceny, and motor vehicle theft.
The agency-level files from 1977-2000 were merged by the principal investigator using the ORI as the key variable to create a single longitudinal dataset. The longitudinal dataset was further prepared and cleaned by the principal investigator to create the final version that is being distributed as part of this data collection. Data cleaning entailed performing agency name checks, identifying "true missing" values, creating monthly aggregation missing value codes, identifying agencies that are "covered by" another agency, flagging non-existent agencies, creating researcher assigned missing values according to the "rule of 20", and accounting for negative values as well as outlier values. Specifically, the principal investigator performed the following data cleaning tasks:
- Agency name checks were performed to ensure the ORI code for each year refers to one and only one agency and to determine the years in which the ORI existed.
- Any month with a missing value for the Return A variable DATE LAST UPDATE was recoded as a "true missing" value (-99).
- To accurately account for the number of months reported, months that were flagged as missing by the DATE LAST UPDATE were recoded using distinct monthly aggregation missing value codes (-112 through -102).
- Some smaller agencies choose to report their UCR data through a larger neighboring agency rather than reporting directly to the FBI or a state reporting agency. This is a "covered by" situation, in which the larger agency acts as the "covering" agency. To account for this in the analysis of missing data, a missing value code (-85) was assigned to months in which the agency was covered by another agency.
- For the years between 1977 and 2000 in which an ORI was not in existence, a separate missing value code (-80) was assigned to the months in which that agency did not exist.
- A missing value code (-90) was assigned according to a "rule of 20". The "rule of 20" established that an ORI with an average of 20 or more index crimes per month could not have zero index crimes in a month, if the DATE LAST UPDATE flagged the Return A as being submitted.
- For the purpose of screening outliers in the negative values, -4 was determined as the cutoff for legitimate values. Any values less than -4 were recoded as missing values (-99), since they were most likely data entry errors.
- To identify additional outlier values, as part of the data screening process, each agency's trend was examined graphically. In the process, outliers were detected for the crime index. The outlier values were also recoded as -90.
- For the crimes of motor vehicle theft, larceny, burglary, assault, robbery, rape, and murder, missing data codes (-97 through -91) were assigned if a particular index crime was missing. If more than one index crime was missing, a separate missing data code (-98) was assigned.
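Three of the recoding rules listed above can be illustrated with a short Python sketch: the "true missing" code, the -4 cutoff for negative values, and the "rule of 20". This is not the principal investigator's code. The function name, the use of `None` for a missing DATE LAST UPDATE, and the choice to average only over months that remain valid after the first step are assumptions made for illustration; the project's report (Targonski, 2011) documents the actual procedure.

```python
# Missing value codes taken from the cleaning steps described above.
TRUE_MISSING = -99     # DATE LAST UPDATE missing, or value below the -4 cutoff
RULE_OF_20 = -90       # implausible zero for a high-volume agency
NEGATIVE_CUTOFF = -4   # values from -4 through -1 are kept as legitimate


def recode_agency_months(counts, date_last_update):
    """Recode one agency's monthly crime index counts (hypothetical helper).

    counts: list of ints, one per month.
    date_last_update: list of the same length; None marks a month whose
        DATE LAST UPDATE is missing (an assumed representation).
    """
    if len(counts) != len(date_last_update):
        raise ValueError("counts and date_last_update must have equal length")

    # Step 1: months never submitted, or with values below the cutoff,
    # become "true missing".
    recoded = []
    for value, updated in zip(counts, date_last_update):
        if updated is None or value < NEGATIVE_CUTOFF:
            recoded.append(TRUE_MISSING)
        else:
            recoded.append(value)

    # Step 2: rule of 20. Assumption: the monthly average is taken over
    # the months that are still valid after step 1.
    valid = [v for v in recoded if v >= NEGATIVE_CUTOFF]
    average = sum(valid) / len(valid) if valid else 0.0
    if average >= 20:
        # Any zero still present belongs to a submitted month, so it is
        # treated as implausible and recoded.
        recoded = [RULE_OF_20 if v == 0 else v for v in recoded]
    return recoded
```

For an agency with counts `[30, 0, 25, -7, 40, 10]` where the third month has no DATE LAST UPDATE, the sketch returns `[30, -90, -99, -99, 40, 10]`: the unsubmitted month and the -7 become -99, and the zero is recoded because the remaining months average 20.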
The sample consists of 20,067 police agencies in the United States, as identified by all Originating Agency Identifiers (ORIs) in the Offenses-Known Uniform Crime Reporting data from 1977 to 2000.
Mode of Data Collection:
UNIFORM CRIME REPORTING PROGRAM DATA: 1975-1997 [ICPSR 9028]
UNIFORM CRIME REPORTING PROGRAM DATA: OFFENSES KNOWN AND CLEARANCES BY ARREST, 1998 [ICPSR 2904]
UNIFORM CRIME REPORTING PROGRAM DATA: OFFENSES KNOWN AND CLEARANCES BY ARREST, 1999 [ICPSR 3158]
UNIFORM CRIME REPORTING PROGRAM DATA: OFFENSES KNOWN AND CLEARANCES BY ARREST, 2000 [ICPSR 3447]
Description of Variables:
This study contains a total of 410 variables including an Originating Agency Identifier (ORI) name and code, population totals by year, covering agency by year, statistical metropolitan area by year, county code by year, FBI group by year, and FBI crime index totals by month and year.
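Because the monthly crime index variables mix real counts with negative missing value codes, secondary users will usually want to separate the two before computing totals or averages. The Python sketch below shows one way to do so, assuming the -4 cutoff described in the data cleaning steps applies to the variable being analyzed. The function names are hypothetical and are not part of the distributed files.

```python
import math

# Values from -4 upward are legitimate; anything lower is a missing value
# code (for example -99, -90, -85, -80, or the monthly aggregation codes).
LEGITIMATE_MINIMUM = -4


def to_analysis_values(monthly_values):
    """Return the values as floats, with missing value codes set to NaN."""
    return [
        float(v) if v >= LEGITIMATE_MINIMUM else math.nan
        for v in monthly_values
    ]


def months_reported(monthly_values):
    """Count the months that hold a legitimate value rather than a code."""
    return sum(1 for v in monthly_values if v >= LEGITIMATE_MINIMUM)
```

Applied to `[12, -99, -2, -85, 0]`, `months_reported` returns 3, and `to_analysis_values` keeps 12, -2, and 0 while replacing the two codes with NaN. The code itself is lost in this step, so keep the original column alongside the converted one if the reason for missingness matters to the analysis.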
Presence of Common Scales:
Extent of Processing:
ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection:
- Created variable labels and/or value labels.
- Checked for undocumented or out-of-range codes.