National Archive of Criminal Justice Data
This dataset is maintained and distributed by the National Archive of Criminal Justice Data (NACJD), the criminal justice archive within ICPSR. NACJD is primarily sponsored by three agencies within the U.S. Department of Justice: the Bureau of Justice Statistics, the National Institute of Justice, and the Office of Juvenile Justice and Delinquency Prevention.
Development of Crime Forecasting and Mapping Systems for Use by Police in Pittsburgh, Pennsylvania, and Rochester, New York, 1990-2001 (ICPSR 4545)
Principal Investigator(s): Cohen, Jacqueline, Carnegie Mellon University; Gorr, Wilpen L., Carnegie Mellon University
This study was designed to develop crime forecasting as an application area for police in support of tactical deployment of resources. Data on crime offense reports and computer aided dispatch (CAD) drug calls and shots fired calls were collected from the Pittsburgh, Pennsylvania Bureau of Police for the years 1990 through 2001. Data on crime offense reports were collected from the Rochester, New York Police Department from January 1991 through December 2001. The Rochester CAD drug calls and shots fired calls were collected from January 1993 through May 2001. A total of 1,643,828 records (769,293 crime offense and 874,535 CAD) were collected from Pittsburgh, while 538,893 records (530,050 crime offense and 8,843 CAD) were collected from Rochester. ArcView 3.3 and GDT Dynamap 2000 Street centerline maps were used to address match the data, with some of the Pittsburgh data being cleaned to fix obvious errors and increase address match percentages. A SAS program was used to eliminate duplicate CAD calls based on time and location of the calls. For the 1990 through 1999 Pittsburgh crime offense data, the address match rate was 91 percent. The match rate for the 2000 through 2001 Pittsburgh crime offense data was 72 percent. The Pittsburgh CAD data address match rate for 1990 through 1999 was 85 percent, while for 2000 through 2001 the match rate was 100 percent because the new CAD system supplied incident coordinates. For Rochester, the address match rate was 96 percent for the crime offense data and 95 percent for the CAD data. Spatial overlay in ArcView was used to add geographic area identifiers for each data point: precinct, car beat, car beat plus, and 1990 Census tract.
The crimes included for both Pittsburgh and Rochester were aggravated assault, arson, burglary, criminal mischief, misconduct, family violence, gambling, larceny, liquor law violations, motor vehicle theft, murder/manslaughter, prostitution, public drunkenness, rape, robbery, simple assaults, trespassing, vandalism, weapons, CAD drugs, and CAD shots fired.
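The duplicate-removal step described above was done in SAS, and the rule it applied is not documented here. As a rough illustration only, the following Python sketch keeps the first call at a location and drops later calls at the same location that arrive within a short time window. The function name, the tuple layout, and the 10-minute window are assumptions made for this example, not details of the original program.

```python
from datetime import datetime, timedelta


def drop_duplicate_calls(calls, window=timedelta(minutes=10)):
    """Keep one CAD call per location within a time window.

    calls  -- iterable of (timestamp, location) tuples (hypothetical layout)
    window -- assumed duplicate window; the study does not state the value used
    """
    kept = []
    last_kept = {}  # location -> timestamp of the most recent kept call
    for timestamp, location in sorted(calls):
        previous = last_kept.get(location)
        # A call is a duplicate if the same location already has a kept call
        # no more than `window` earlier.
        if previous is None or timestamp - previous > window:
            kept.append((timestamp, location))
            last_kept[location] = timestamp
    return kept
```

The window is measured from the last kept call rather than the last call seen, so a long run of repeat calls about one incident does not collapse into a single record indefinitely.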
These data are freely available.
Cohen, Jacqueline, and Wilpen L. Gorr. Development of Crime Forecasting and Mapping Systems for Use by Police in Pittsburgh, Pennsylvania, and Rochester, New York, 1990-2001. ICPSR04545-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2006-08-31. http://doi.org/10.3886/ICPSR04545.v1
Persistent URL: http://doi.org/10.3886/ICPSR04545.v1
This study was funded by:
- United States Department of Justice. Office of Justice Programs. National Institute of Justice (2001-IJ-CX-0018)
Scope of Study
Subject Terms: aggravated assault, arson, assault, burglary, crime mapping, crime patterns, crime prediction, driving under the influence, drug law offenses, fraud, geographic information systems, homicide, rape, robbery, trend analysis
Smallest Geographic Unit: police car patrol beats
Date of Collection:
Unit of Observation: police car patrol beats
All criminal offenses or computer aided dispatch (CAD) calls in Pittsburgh, Pennsylvania, from 1990 through 2001.
All criminal offenses or computer aided dispatch (CAD) calls in Rochester, New York, from 1991 through 2001.
Data Types: administrative records data
Data Collection Notes:
The files are provided in a WinZip archive with 73 files in three folders. The Statistical Data Files folder provides data for Pittsburgh and Rochester in comma-separated text files. The GIS folder provides geographic files for Pittsburgh and Rochester for use with mapping software. The Report Files folder provides the final report, a data dictionary, and the first and last five observations. The WinZip archive must be extracted to the C:\ drive in order for the ArcView project file to work correctly.
Study Purpose: The purpose of this study was to develop crime forecasting as an application area for police in support of tactical deployment of resources. The crime forecasting methods and models included (1) a multivariate model for estimating crime seasonality based on demographic and land use characteristics, and (2) leading indicator models with 4 and 12 time lags. An application of tracking signals as a supporting crime analysis tool to automatically detect crime series pattern changes was also introduced.
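The description does not say which tracking signal formulation the researchers used. As one common possibility, the sketch below computes a smoothed-error tracking signal (the ratio of the exponentially smoothed forecast error to the exponentially smoothed absolute error). The function name and the smoothing constant are assumptions for illustration.

```python
def tracking_signals(actuals, forecasts, alpha=0.2):
    """Return a smoothed-error tracking signal for each period.

    Values lie between -1 and 1. Values near +1 or -1 mean the forecasts
    have been persistently too low or too high, which may indicate a
    change in the pattern of the crime series.
    """
    smoothed_error = 0.0
    smoothed_abs_error = 0.0
    signals = []
    for actual, forecast in zip(actuals, forecasts):
        error = actual - forecast
        smoothed_error = alpha * error + (1 - alpha) * smoothed_error
        smoothed_abs_error = alpha * abs(error) + (1 - alpha) * smoothed_abs_error
        if smoothed_abs_error > 0:
            signals.append(smoothed_error / smoothed_abs_error)
        else:
            signals.append(0.0)  # no error observed yet
    return signals
```

Because both smoothed quantities start at zero, the signal for the first period with a nonzero error is always +1 or -1, so an analyst would typically ignore the first few periods or compare the signal against a threshold only after a warm-up.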
Study Design: The crime data collected for this study were from two Northeastern, mid-sized cities: Pittsburgh, Pennsylvania, and Rochester, New York. The researchers had previously collected all crime offense reports and computer aided dispatch (CAD) calls from the Pittsburgh Bureau of Police for the years 1990 through 1998. The current study added the years 1999 through 2001. Since Pittsburgh started using a new record management system in 2000, all of the 1990 through 1999 data had to be reprocessed to ensure that the 1999 data were treated identically to the 1990 through 1998 data and to make as smooth a connection as possible to the new format of the 2000 and 2001 data. The 1990 through 1999 offense datasets were in 17 flat files extracted from an old mainframe system. Oracle SQL Loader was used to import the data into an Oracle database. The imported data were in 13 tables. The tables were then exported into an Access database. In Access, links were created between the tables and various queries were created to limit crime records to offenses only. Several fields were concatenated to get a complete street address for each crime record. A crime code table, created by the researchers, was joined to the database so that each crime record would have a consistent descriptive crime name that matched the Rochester data. The resultant table containing the Pittsburgh offense data for 1990 through 1999 has 637,166 records. The Pittsburgh offense data for 2000 and 2001 were taken from an Oracle database. The 132,127 records were appended to the earlier data, with a crime code table added so each crime record has a descriptive major code. The Pittsburgh CAD data have 874,535 records. Only CAD drugs and CAD shots are used in the forecast models. CAD data could not be obtained for November and December of 1999. Instead, simple exponential smoothing was used to forecast those two months, and the forecasts are used as data values in the datasets. 
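Simple exponential smoothing, used above to fill in the two missing months of Pittsburgh CAD data, can be sketched as follows. The smoothing constant and the choice to initialize the level with the first observation are assumptions for this example; the description does not report the values the researchers used.

```python
def ses_forecast(series, alpha=0.3, horizon=2):
    """Forecast `horizon` future periods with simple exponential smoothing.

    The level is updated as: level = alpha * y + (1 - alpha) * level.
    Simple exponential smoothing has no trend or seasonal term, so every
    future period receives the same forecast (the final level).
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    level = series[0]  # assumed initialization: first observed value
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon
```

For a monthly series ending in October 1999, `ses_forecast(counts, horizon=2)` would give the stand-in values for November and December 1999. Both months receive the same value, which is a property of the method rather than of the data.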
A SAS program was used to eliminate duplicate CAD calls based on the time and location of the calls. The total number of crime offense and CAD records for Pittsburgh is 1,643,828. ArcView 3.3 and GDT Dynamap 2000 Street centerline maps were used to address match the Pittsburgh data. Some data were cleaned to fix obvious errors and increase address match percentages. For the 1990 through 1999 crime offense data the address match rate was 91 percent. The match rate for the 2000 through 2001 crime offense data was 72 percent. The CAD data address match rate for 1990 through 1999 was 85 percent, while for 2000 through 2001 the match rate was 100 percent because the new CAD system supplied incident coordinates. Once the data addresses were matched, spatial overlay in ArcView was used to add geographic area identifiers for each data point: precinct, car beat, car beat plus, and 1990 Census tract. Car beat plus is an aggregation of car beats designed to increase monthly average crime volumes while keeping the resultant districts from crossing precinct boundaries and maintaining compact areas. Car beats are aggregations of census tracts and were the patrol districts used by the Pittsburgh Bureau of Police during the study period. The next step was to aggregate a number of crime types to monthly time series for each geography.
The Rochester offense data contain 530,050 records from January 1991 to December 2001. All files were imported and processed in Access. The Rochester CAD data cover January 1993 to May 2001 and contain 3,767,002 records. However, only the 8,843 records containing the CAD shots and drugs data were used. Again, SAS was used to eliminate duplicate CAD calls. The total number of crime offense and CAD records for Rochester is 538,893. ArcView 3.3 and GDT Dynamap 2000 Street centerline maps were also used to address match the Rochester data. No data cleaning was necessary. The address match rate was 96 percent for the Rochester crime offense data and 95 percent for the CAD data. Spatial overlay followed in the same fashion as in Pittsburgh.
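The aggregation of geocoded incidents to monthly time series for each geography can be illustrated with a short sketch. The researchers' own aggregation procedure is not described here, so the tuple layout and function names below are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_counts(incidents):
    """Count incidents by (area_id, crime_type, year, month).

    incidents -- iterable of (area_id, crime_type, date) tuples, where
                 area_id is any geography identifier (tract, car beat, ...)
    """
    counts = Counter()
    for area_id, crime_type, day in incidents:
        counts[(area_id, crime_type, day.year, day.month)] += 1
    return counts


def to_series(counts, area_id, crime_type, start, end):
    """Build a dense monthly series, with zeros for months without incidents.

    start, end -- (year, month) tuples, both inclusive
    """
    series = []
    year, month = start
    while (year, month) <= end:
        # Looking up a missing key in a Counter returns 0 without adding it.
        series.append(counts[(area_id, crime_type, year, month)])
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return series
```

Filling empty months with zeros matters for forecasting at small geographies such as car beats, where many area and crime type combinations have no incidents in a given month.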
Sample: For Pittsburgh, 769,293 crime offense records and 874,535 computer aided dispatch (CAD) drug call and shots fired call records, for a total of 1,643,828 records, are included in the data. For Rochester, 530,050 crime offense records and 8,843 CAD drug call and shots fired call records, for a total of 538,893 records, are included in the data.
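The record counts reported in this description are internally consistent, which a user can confirm with a quick check like the one below. All figures are taken from the text above; the dictionary layout and function name are just one way to organize them.

```python
# Record counts as reported in this study description.
RECORDS = {
    "Pittsburgh": {"offense": 769_293, "cad": 874_535, "total": 1_643_828},
    "Rochester": {"offense": 530_050, "cad": 8_843, "total": 538_893},
}

# Pittsburgh offense records by source period (1990-1999 and 2000-2001).
PITTSBURGH_OFFENSE_PARTS = (637_166, 132_127)


def totals_consistent(city):
    """Return True if offense + CAD records equal the reported city total."""
    counts = RECORDS[city]
    return counts["offense"] + counts["cad"] == counts["total"]
```

The same kind of check can be repeated against the row counts of the extracted files, keeping in mind that the distributed files are monthly aggregates by geography rather than individual incident records.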
Mode of Data Collection: record abstracts
The individual offense incident and computer aided dispatch (CAD) data were obtained from the Pittsburgh, Pennsylvania Bureau of Police and Rochester, New York Police Department.
Description of Variables: The data include an identification number for the observation unit's geography: tract, beat number, beat plus number, or precinct number. Two date variables, year and month, are included. Finally, computer aided dispatch (CAD) drug calls, shots fired calls, and crime offense variables are included. Thirty crime offense variables are included in the data: aggravated assault, arson, assaults by prisoners, assaults on officers, burglaries, criminal mischief, misconduct, drug offense, drunken driving, embezzlement, family violence, forgery, fraud, gambling, larceny, liquor violation, motor vehicle theft, murder/manslaughter, negligent manslaughter, public drunkenness, rape, robbery, run away, simple assault, receiving stolen property, trespassing, vagrancy, vandalism, weapons charges, prostitution, and sex offense.
Response Rates: Not applicable.
Presence of Common Scales: none
Original ICPSR Release: 2006-08-31