The purpose of this study was to develop crime
forecasting as an application area for police in support of tactical
deployment of resources. The crime forecasting methods and models
included (1) a multivariate model for estimating crime seasonality
based on demographic and land use data, and (2) leading
indicator models with 4 and 12 time lags. An application of tracking
signals as a supporting crime analysis tool to automatically detect
crime series pattern changes was also introduced.
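As a rough illustration of how a tracking signal can flag a change in a crime series, the sketch below uses one common textbook form: the running sum of forecast errors divided by the running mean absolute deviation (MAD) of those errors. The study's own formulation is not given in this description, so the function names, the threshold of 4, and the sample counts are all assumptions made for illustration.

```python
def tracking_signals(actuals, forecasts):
    """Return the running tracking signal (cumulative error / MAD) per period."""
    signals = []
    cum_error = 0.0      # running sum of signed forecast errors
    cum_abs_error = 0.0  # running sum of absolute forecast errors
    for n, (actual, forecast) in enumerate(zip(actuals, forecasts), start=1):
        error = actual - forecast
        cum_error += error
        cum_abs_error += abs(error)
        mad = cum_abs_error / n
        # With a MAD of zero every forecast so far was exact, so report 0.
        signals.append(cum_error / mad if mad > 0 else 0.0)
    return signals


def first_flagged_period(signals, threshold=4.0):
    """Return the zero-based index of the first period whose signal exceeds
    the threshold in absolute value, or None. The threshold is an assumed value."""
    for index, signal in enumerate(signals):
        if abs(signal) > threshold:
            return index
    return None


# Hypothetical monthly counts for one area: the level shifts up in the fifth month.
actuals = [20, 22, 19, 21, 30, 31, 33, 32, 34]
forecasts = [21] * len(actuals)  # a forecast that does not adapt to the shift

signals = tracking_signals(actuals, forecasts)
flagged = first_flagged_period(signals)
```

With these made-up numbers the signal stays small while the errors change sign and grows once the forecast errs in the same direction month after month, which is the behavior an analyst would want flagged.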
The crime data collected for this study were from
two Northeastern, mid-sized cities: Pittsburgh, Pennsylvania, and
Rochester, New York. The researchers had previously collected all
crime offense reports and computer aided dispatch (CAD) calls from the
Pittsburgh Bureau of Police for the years 1990 through 1998. The
current study added the years 1999 through 2001. Since Pittsburgh
started using a new record management system in 2000, all of the 1990
through 1999 data had to be reprocessed to ensure that the 1999 data
were treated identically to the 1990 through 1998 data and to make as
smooth a connection as possible to the new format of the 2000 and 2001
data. The 1990 through 1999 offense datasets were in 17 flat files
extracted from an old mainframe system. Oracle SQL Loader was used to
import the data into an Oracle database. The imported data were in 13
tables. The tables were then exported into an Access database. In
Access, links were created between the tables and various queries were
created to limit crime records to offenses only. Several fields were
concatenated to get a complete street address for each crime record. A
crime code table, created by the researchers, was joined to the
database so that each crime record would have a consistent descriptive
crime name that matched the Rochester data. The resultant table
containing the Pittsburgh offense data for 1990 through 1999 has
637,166 records. The Pittsburgh offense data for 2000 and 2001 were
taken from an Oracle database. The 132,127 records were appended to
the earlier data, with a crime code table added so each crime record
has a descriptive major code. The Pittsburgh CAD data have 874,535
records. Only CAD drug calls and CAD shots fired calls are used in the forecast
models. CAD data could not be obtained for November and December of
1999. Instead, simple exponential smoothing was used to forecast those
two months, and the forecasts are used as data values in the datasets.
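The gap-filling step can be sketched as follows. This is a minimal illustration of simple exponential smoothing, not the researchers' program: the smoothing constant `alpha`, the choice to initialize the level with the first observation, and the function names are assumptions, since none of these details are reported here. Because simple exponential smoothing produces a flat forecast, both missing months receive the same value.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the final smoothed level of the series.

    The level starts at the first observation (an assumed initialization)
    and is updated as level = alpha * value + (1 - alpha) * level.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def fill_missing_months(series, n_missing, alpha=0.3):
    """Append n_missing forecast values to a copy of the observed series.

    The default alpha of 0.3 is an illustrative assumption.
    """
    forecast = simple_exponential_smoothing(series, alpha)
    return list(series) + [forecast] * n_missing


# Hypothetical monthly call counts through October, extended by two months.
observed = [10, 20]
filled = fill_missing_months(observed, n_missing=2, alpha=0.5)
```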
A SAS program was used to eliminate duplicated CAD calls based on the
time and location of the calls. The total number of crime offense and
CAD records for Pittsburgh is 1,643,828. ArcView 3.3 and GDT Dynamap
2000 Street centerline maps were used to address match the Pittsburgh
data. Some data were cleaned to fix obvious errors and increase
address match percentages. For the 1990 through 1999 crime offense
data the address match rate was 91 percent. The match rate for the
2000 through 2001 crime offense data was 72 percent. The CAD data
address match rate for 1990 through 1999 was 85 percent, while for
2000 through 2001 the match rate was 100 percent because the new CAD
system supplied incident coordinates. Once the data addresses were
matched, spatial overlay in ArcView was used to add geographic area
identifiers for each data point: precinct, car beat, car beat plus,
and 1990 Census tract. Car beat plus is an aggregation of car beats
designed to increase monthly average crime volumes while keeping the
resultant districts from crossing precinct boundaries and maintaining
compact areas. Car beats are aggregations of census tracts and were
the patrol districts used by the Pittsburgh Bureau of Police during
the study period. The next step was to aggregate a number of crime
types to monthly time series for each geography. The Rochester
offense data contain 530,050 records from January 1991 to December
2001. All files were imported and processed in Access. The Rochester
CAD data cover January 1993 to May 2001 and contain 3,767,002
records. However, only the 8,843 CAD shots fired and drug call
records were used. Again, SAS was used to eliminate duplicate CAD
calls. The total number of crime offense and CAD records for
Rochester is 538,893. ArcView 3.3 and GDT Dynamap 2000 Street
centerline maps were also used to address match the Rochester data. No
data cleaning was necessary. The address match rate for the Rochester
crime offense data was 96 percent, and the rate was 95 percent for the CAD
data. Spatial overlay followed in the same fashion as in Pittsburgh.
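The duplicate removal and monthly aggregation steps described above can be sketched in a few lines. The study used SAS and Access for this work, so the Python below is only an illustration. The description says duplicates were identified by the time and location of the calls; treating an exact match on timestamp and address as a duplicate is an assumption, as are the field names `time`, `address`, and `car_beat` and the sample records.

```python
from collections import Counter
from datetime import datetime


def drop_duplicate_calls(calls):
    """Keep the first call for each (time, address) pair, preserving order."""
    seen = set()
    unique = []
    for call in calls:
        key = (call["time"], call["address"])
        if key not in seen:
            seen.add(key)
            unique.append(call)
    return unique


def monthly_counts(records, area_field):
    """Count records per (area, year, month) to form monthly time series."""
    counts = Counter()
    for record in records:
        when = record["time"]
        counts[(record[area_field], when.year, when.month)] += 1
    return counts


# Hypothetical CAD calls; the second record repeats the first.
calls = [
    {"time": datetime(1995, 3, 1, 22, 15), "address": "100 MAIN ST", "car_beat": "2-1"},
    {"time": datetime(1995, 3, 1, 22, 15), "address": "100 MAIN ST", "car_beat": "2-1"},
    {"time": datetime(1995, 3, 9, 1, 5), "address": "45 OAK AVE", "car_beat": "2-1"},
    {"time": datetime(1995, 4, 2, 13, 0), "address": "100 MAIN ST", "car_beat": "3-4"},
]

unique_calls = drop_duplicate_calls(calls)
series = monthly_counts(unique_calls, "car_beat")
```

The same counting function can be pointed at a different area field, such as a precinct or census tract identifier, to build the series for each geography.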
For Pittsburgh, 769,293 crime offense records and 874,535
computer aided dispatch (CAD) drug call and shots fired call records,
for a total of 1,643,828 records, are included in the data. For
Rochester, 530,050 crime offense records and 8,843 CAD drug call and
shots fired call records, for a total of 538,893 records, are included
in the data.
All criminal offenses or computer aided dispatch (CAD)
calls in Pittsburgh, Pennsylvania, from 1990 through 2001.
All criminal offenses or computer aided dispatch (CAD)
calls in Rochester, New York, from 1991 through 2001.
police car patrol beats
The individual offense incident and computer aided
dispatch (CAD) data were obtained from the Pittsburgh, Pennsylvania
Bureau of Police and Rochester, New York Police Department.
administrative records data
The data include an identification number for the
observation unit's geography: tract, beat number, beat plus number,
or precinct number. Two date variables, year and month, are included.
Finally, computer aided dispatch (CAD) drug calls, shots fired calls,
and crime offense variables are included. Thirty crime offense
variables, aggravated assault, arson, assaults by prisoners, assaults
on officers, burglaries, criminal mischief, misconduct, drug offense,
drunken driving, embezzlement, family violence, forgery, fraud,
gambling, larceny, liquor violation, motor vehicle theft,
murder/manslaughter, negligent manslaughter, public drunkenness, rape,
robbery, run away, simple assault, receiving stolen property,
trespassing, vagrancy, vandalism, weapons charges, prostitution, and
sex offense are included in the data.
Not applicable.
none