Case Tracking and Mapping System Developed for the United States Attorney's Office, Southern District of New York, 1997-1998 (ICPSR 2929)
Crime Hot Spot Forecasting with Data from the Pittsburgh [Pennsylvania] Bureau of Police, 1990-1998 (ICPSR 3469)
This study used crime count data from the Pittsburgh, Pennsylvania, Bureau of Police offense reports and 911 computer-aided dispatch (CAD) calls to determine the best univariate forecast method for crime and to evaluate the value of leading indicator crime forecast models.
The researchers used the rolling-horizon experimental design, a design that maximizes the number of forecasts for a given time series at different times and under different conditions. Under this design, several forecast models are used to make alternative forecasts in parallel. For each forecast model included in an experiment, the researchers estimated models on training data, forecasted one month ahead to new data not previously seen by the model, and calculated and saved the forecast error. Then they added the observed value of the previously forecasted data point to the next month's training data, dropped the oldest historical data point, and forecasted the following month's data point. This process continued over a number of months.
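The rolling-horizon procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the researchers' forecasting program: the two forecast methods (`naive_forecast`, `moving_average_forecast`), the window length, and the monthly counts are hypothetical stand-ins chosen to show how several models are run in parallel over a sliding training window.

```python
from statistics import mean

def naive_forecast(train):
    """Hypothetical model 1: next month equals the last observed month."""
    return train[-1]

def moving_average_forecast(train):
    """Hypothetical model 2: next month equals the mean of the training window."""
    return mean(train)

def rolling_horizon(series, window, models):
    """Run every model in parallel over a fixed-length rolling window.

    Returns {model_name: [one-month-ahead forecast errors]}.
    """
    errors = {name: [] for name in models}
    for start in range(len(series) - window):
        train = series[start:start + window]   # current training data
        actual = series[start + window]        # month not yet seen by the models
        for name, model in models.items():
            errors[name].append(actual - model(train))  # save the forecast error
        # On the next pass the window slides one month: the value just
        # forecasted joins the training data and the oldest point is dropped.
    return errors

counts = [12, 15, 11, 14, 18, 16, 13, 17]  # made-up monthly crime counts
models = {"naive": naive_forecast, "moving_average": moving_average_forecast}
errors = rolling_horizon(counts, window=4, models=models)
mae = {name: mean(abs(e) for e in errs) for name, errs in errors.items()}
```

Comparing a summary such as the mean absolute error in `mae` across models is one simple way to rank forecast methods; the study's own error measures may differ.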
A total of 15 statistical datasets and 3 geographic information systems (GIS) shapefiles resulted from this study.
The statistical datasets consist of
- Univariate Forecast Data by Police Precinct (Dataset 1) with 3,240 cases
- Output Data from the Univariate Forecasting Program: Sectors and Forecast Errors (Dataset 2) with 17,892 cases
- Multivariate, Leading Indicator Forecast Data by Grid Cell (Dataset 3) with 5,940 cases
- Output Data from the 911 Drug Calls Forecast Program (Dataset 4) with 5,112 cases
- Output Data from the Part One Property Crimes Forecast Program (Dataset 5) with 5,112 cases
- Output Data from the Part One Violent Crimes Forecast Program (Dataset 6) with 5,112 cases
- Input Data for the Regression Forecast Program for 911 Drug Calls (Dataset 7) with 10,011 cases
- Input Data for the Regression Forecast Program for Part One Property Crimes (Dataset 8) with 10,011 cases
- Input Data for the Regression Forecast Program for Part One Violent Crimes (Dataset 9) with 10,011 cases
- Output Data from Regression Forecast Program for 911 Drug Calls: Estimated Coefficients for Leading Indicator Models (Dataset 10) with 36 cases
- Output Data from Regression Forecast Program for Part One Property Crimes: Estimated Coefficients for Leading Indicator Models (Dataset 11) with 36 cases
- Output Data from Regression Forecast Program for Part One Violent Crimes: Estimated Coefficients for Leading Indicator Models (Dataset 12) with 36 cases
- Output Data from Regression Forecast Program for 911 Drug Calls: Forecast Errors (Dataset 13) with 4,936 cases
- Output Data from Regression Forecast Program for Part One Property Crimes: Forecast Errors (Dataset 14) with 4,936 cases
- Output Data from Regression Forecast Program for Part One Violent Crimes: Forecast Errors (Dataset 15) with 4,936 cases.
The GIS Shapefiles (Dataset 16) are provided with the study in a single zip file. Included are polygon data for the 4,000 foot, square, uniform grid system used for much of the Pittsburgh crime data (grid400); polygon data for the 6 police precincts, alternatively called districts or zones, of Pittsburgh (policedist); and polygon data for the 3 major rivers in Pittsburgh: the Allegheny, Monongahela, and Ohio (rivers).
CrimeMapTutorial Workbooks and Sample Data for ArcView and MapInfo, 2000 (ICPSR 3143)
CrimeStat III: A Spatial Statistics Program for the Analysis of Crime Incident Locations (Version 3.3), United States, 2010 (ICPSR 2824)
CrimeStat III is a spatial statistics program for the analysis of crime incident locations. It was developed by Ned Levine and Associates under the direction of Ned Levine, PhD, and funded by grants from the National Institute of Justice (grants 1997-IJ-CX-0040, 1999-IJ-CX-0044, 2002-IJ-CX-0007, and 2005-IJ-CX-K037). The program is Windows-based and interfaces with most desktop GIS programs. Its purpose is to provide supplemental statistical tools to aid law enforcement agencies and criminal justice researchers in their crime mapping efforts. CrimeStat is used by many police departments around the country as well as by criminal justice and other researchers.
The program inputs incident locations (e.g., robbery locations) in 'dbf', 'shp', ASCII or ODBC-compliant formats using either spherical or projected coordinates. It calculates various spatial statistics and writes graphical objects to ArcGIS, MapInfo, Surfer for Windows, and other GIS packages.
CrimeStat is organized into five sections:
Data Setup
- Primary file - this is a file of incident or point locations with X and Y coordinates. The coordinate system can be either spherical (lat/lon) or projected. Intensity and weight values are allowed. Each incident can have an associated time value.
- Secondary file - this is an associated file of incident or point locations with X and Y coordinates. The coordinate system has to be the same as the primary file. Intensity and weight values are allowed. The secondary file is used for comparison with the primary file in the risk-adjusted nearest neighbor clustering routine and the dual kernel interpolation.
- Reference file - this is a grid file that overlays the study area. Normally, it is a regular grid though irregular ones can be imported. CrimeStat can generate the grid if given the X and Y coordinates for the lower-left and upper-right corners.
- Measurement parameters - This page identifies the type of distance measurement (direct, indirect or network) to be used and specifies parameters for the area of the study region and the length of the street network. CrimeStat III has the ability to utilize a network for linking points. Each segment can be weighted by travel time, travel speed, travel cost or simple distance. This allows the interaction between points to be estimated more realistically.
- Spatial distribution - statistics for describing the spatial distribution of incidents, such as the mean center, center of minimum distance, standard deviational ellipse, the convex hull, or directional mean.
- Spatial autocorrelation - statistics for describing the amount of spatial autocorrelation between zones, including general spatial autocorrelation indices (Moran's I, Geary's C, and the Getis-Ord General G) and correlograms that calculate spatial autocorrelation for different distance separations (the Moran, Geary, and Getis-Ord correlograms). Several of these routines can simulate confidence intervals with a Monte Carlo simulation.
- Distance analysis I - statistics for describing properties of distances between incidents including nearest neighbor analysis, linear nearest neighbor analysis, and Ripley's K statistic. There is also a routine that assigns the primary points to the secondary points, either on the basis of nearest neighbor or point-in-polygon, and then sums the results by the secondary point values.
- Distance analysis II - calculates matrices representing the distance between points for the primary file, for the distance between the primary and secondary points, and for the distance between either the primary or secondary file and the grid.
- 'Hot spot' analysis I - routines for conducting 'hot spot' analysis including the mode, the fuzzy mode, hierarchical nearest neighbor clustering, and risk-adjusted nearest neighbor hierarchical clustering. The hierarchical nearest neighbor hot spots can be output as ellipses or convex hulls.
- 'Hot spot' analysis II - more routines for conducting hot spot analysis including the Spatial and Temporal Analysis of Crime (STAC), K-means clustering, Anselin's local Moran, and the Getis-Ord local G statistics. The STAC and K-means hot spots can be output as ellipses or convex hulls. All of these routines can simulate confidence intervals with a Monte Carlo simulation.
- Interpolation I - a single-variable kernel density estimation routine for producing a surface or contour estimate of the density of incidents (e.g., burglaries) and a dual-variable kernel density estimation routine for comparing the density of incidents to the density of an underlying baseline (e.g., burglaries relative to the number of households).
- Interpolation II - a Head Bang routine for smoothing zonal data that can be applied to events (volumes), rates or can be used to create rates. In addition, there is an interpolated Head Bang routine for interpolating the smoothed Head Bang result to grid cells.
- Space-time analysis - a set of tools for analyzing clustering in time and in space. These include the Knox and Mantel indices, which look for the relationship between time and space; a spatial-temporal moving average; and the Correlated Walk Analysis module, which analyzes and predicts the behavior of a serial offender.
- Journey to crime analysis - a simple criminal justice method for estimating the likely location of a serial offender given the distribution of incidents and a model of travel distance. The routine allows the user to estimate a travel model with a calibration file and apply it to the serial events. It can be used to identify a likely location given the distribution of 'points' and assumptions about travel behavior. There is a routine for drawing lines between origins and destinations (crime trips).
- Bayesian journey to crime analysis - an advanced criminal justice method for estimating the likely location of a serial offender given the distribution of incidents, a model of travel distance, and an origin-destination matrix showing the relationship between where crimes were committed and where offenders lived. A diagnostics routine analyzes serial offenders whose residence is known and estimates which of several journey to crime estimates is most accurate. A selected method can be applied to identify a likely residence location of a single serial offender given the distribution of incidents, assumptions about travel behavior, and the origin of offenders who committed crimes in the same locations.
- Regression modeling - a module for analyzing a relationship between a dependent variable and one or more independent variables. The CrimeStat regression module includes both Ordinary Least Squares and Poisson-based regression models, estimated from Maximum Likelihood (MLE) or Markov Chain Monte Carlo (MCMC) algorithms. The current version includes six different models including OLS, Poisson with Linear Dispersion Correction, Poisson-Gamma and a Poisson-Gamma-Conditional Autoregressive (CAR) spatial regression model. The module can handle very large datasets through a Block Sampling approach. There is also a module for applying estimated coefficients to a new dataset to make predictions.
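Two of the simpler statistics named in the list above, the mean center and nearest neighbor analysis, can be illustrated with a short Python sketch. This is not CrimeStat code and does not reproduce its options: it assumes projected (planar) coordinates, direct (Euclidean) distance, no edge correction, and a study-region area supplied by the user. The function names and sample points are hypothetical.

```python
import math

def mean_center(points):
    """Mean X and mean Y of the incident locations."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def nearest_neighbor_index(points, area):
    """Observed mean nearest neighbor distance divided by the distance
    expected for a spatially random pattern of the same density.

    Values below 1 suggest clustering; values above 1 suggest dispersion.
    Assumes planar coordinates and applies no edge correction.
    """
    n = len(points)
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)  # expected mean distance under randomness
    return observed / expected

# Four made-up incidents packed into one corner of a 10 x 10 study region.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
center = mean_center(points)
nni = nearest_neighbor_index(points, area=100.0)
```

In this toy example the index is well below 1 because the four incidents sit far closer together than a random pattern over the whole region would place them.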
Crime travel demand modeling is a new module in CrimeStat III. It is an application of travel demand modeling, widely used in transportation planning, to crime analysis. The analysis is done by zones. First, crime 'trips' are defined as a link between an offender residence/origin location and a crime location. The number of crimes originating in each zone is counted, as is the number of crimes ending in each zone. Second, the model is run sequentially in four separate stages with multiple routines in each stage:
- Trip Generation - Separate models are produced that predict the number of crimes originating in each zone (origins) and the number of crimes ending in each zone (destinations). CrimeStat III uses a multivariate Poisson regression model, with stepwise options, to create the prediction. Trips from outside the study area (external trips) can be added to the origin model to account for travel from outside the region. Once the models are created, a balancing procedure ensures that the number of origins equals the number of destinations.
- Trip Distribution - Using the predicted number of crime trips originating in each zone and the predicted number of trips occurring in each zone, the second stage distributes trips from each zone to every other zone using a gravity model. There are routines for calculating the actual (observed) distribution from individual data, for estimating the prediction coefficients, and for applying the predicted coefficients to the predicted origins and destinations. Another routine allows a comparison of the predicted trip distribution with the observed trip distribution.
- Mode Split - The predicted number of trips for each zone-to-zone pair can be split into likely travel modes using an accessibility function that approximates the utility of one mode relative to the others.
- Network Assignment - Finally, the predicted trips from each zone to every other zone by travel mode are assigned to a likely route based on the shortest path algorithm. The output includes the likely routes taken for each origin-destination zone pair and the total volume of trips on network links. This step requires a travel network, one for each travel mode. There are additional utilities for calculating transit networks from station/stop locations and for testing for one-way streets.
- Parameters can be saved and re-loaded.
- Tab colors can be changed.
- Monte Carlo simulation data can be output.
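The Trip Distribution stage of the crime travel demand module, described above, distributes trips between zones with a gravity model. The sketch below shows the general idea in Python under loudly stated assumptions: it uses an origin-constrained form with a negative exponential impedance function and a hypothetical `beta` parameter, none of which is confirmed by the description as CrimeStat's actual specification. The zone totals and travel costs are invented.

```python
import math

def gravity_trips(origins, destinations, cost, beta=0.5):
    """Distribute each zone's predicted origins across destination zones.

    Trips from zone i to zone j are proportional to the destination total
    for j, discounted by travel cost: D_j * exp(-beta * cost[i][j]).
    Each row is scaled so that it sums to the origin total for zone i.
    (Assumed functional form, for illustration only.)
    """
    trips = []
    for i, origin_total in enumerate(origins):
        weights = [
            d * math.exp(-beta * cost[i][j])
            for j, d in enumerate(destinations)
        ]
        scale = origin_total / sum(weights)
        trips.append([w * scale for w in weights])
    return trips

origins = [100.0, 50.0]        # predicted crime trips starting in each zone
destinations = [60.0, 90.0]    # predicted crime trips ending in each zone
cost = [[1.0, 4.0],            # hypothetical zone-to-zone travel cost
        [4.0, 1.0]]
trips = gravity_trips(origins, destinations, cost)
```

An origin-constrained model like this one guarantees only that row totals match the origins; matching the destination totals as well would need an additional balancing step.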
CrimeStat is accompanied by sample datasets and a manual that gives the background behind the statistics and examples. The manual also discusses applications of CrimeStat developed by other analysts and researchers. The program and sample data sets are in Windows-based zipped files that can be downloaded. The manual is a set of individual chapters in PDF files. They can be viewed online or downloaded. If downloading the PDF chapters separately, they should be saved into the same directory as the CrimeStat program. If the PDF file names are not renamed, they can be accessed directly from the program's help menu.
CrimeStat Libraries
The CrimeStat Libraries (version 1.0) are component objects that allow for the functions of CrimeStat to be programmed directly into custom software or systems. The CrimeStat Libraries include all of the routines that were developed through version 2.0 of the regular CrimeStat program, including spatial description, hot spot analysis, and kernel density interpolation routines. Additional spatial autocorrelation routines have been included. The libraries can input dbf, shape, and ASCII text files and can output to shape files, MIF/MID files, ASCII text files, and KML files.
CrimeStat III User Workbook and Data (ICPSR 23622)
Detection of Crime, Resource Deployment, and Predictors of Success: A Multi-Level Analysis of CCTV in Newark, New Jersey, 2007-2011 (ICPSR 34619)
The Detection of Crime, Resource Deployment, and Predictors of Success: A Multi-Level Analysis of Closed-Circuit Television (CCTV) in Newark, NJ collection represents the findings of a multi-level analysis of the Newark, New Jersey Police Department's video surveillance system. This collection contains multiple quantitative data files (Datasets 1-14) as well as spatial data files (Dataset 15 and Dataset 16). The overall project was separated into three components:
- Component 1 (Dataset 1, Individual CCTV Detections and Calls-For-Service Data and Dataset 2, Weekly CCTV Detections in Newark Data) evaluates CCTV's ability to increase the "certainty of punishment" in target areas;
- Component 2 (Dataset 3, Overall Crime Incidents Data; Dataset 4, Auto Theft Incidents Data; Dataset 5, Property Crime Incidents Data; Dataset 6, Robbery Incidents Data; Dataset 7, Theft From Auto Incidents Data; Dataset 8, Violent Crime Incidents Data; Dataset 9, Attributes of CCTV Catchment Zones Data; Dataset 10, Attributes of CCTV Camera Viewsheds Data; and Dataset 15, Impact of Micro-Level Features Spatial Data) analyzes the context under which CCTV cameras best deter crime. Micro-level factors were grouped into five categories: environmental features, line-of-sight, camera design and enforcement activity (including both crime and arrests); and
- Component 3 (Dataset 11, Calls-for-service Occurring Within CCTV Scheme Catchment Zones During the Experimental Period Data; Dataset 12, Calls-for-service Occurring Within CCTV Schemes During the Experimental Period Data; Dataset 13, Targeted Surveillances Conducted by the Experimental Operators Data; Dataset 14, Weekly Surveillance Activity Data; and Dataset 16, Randomized Controlled Trial Spatial Data) was a randomized, controlled trial measuring the effects of coupling proactive CCTV monitoring with directed patrol units.
Over 40 separate four-hour tours of duty, an additional camera operator was funded to monitor specific CCTV cameras in Newark. Two patrol units were dedicated solely to the operators and were tasked exclusively with responding to incidents of concern detected on the experimental cameras. Variables throughout the datasets include police report and incident dates, crime type, disposition code, number of each type of incident that occurred in a viewshed precinct, number of CCTV detections that resulted in any police enforcement, and number of schools, retail stores, bars and public transit within the catchment zone.
Evaluation of the Community Supervision Mapping System for Released Prisoners in Rhode Island, 2008-2010 (ICPSR 32004)
Explaining Developmental Crime Trajectories at Places: A Study of "Crime Waves" and "Crime Drops" at Micro Units of Geography in Seattle, Washington, 1989-2004 (ICPSR 28161)
Exploratory Spatial Data Approach to Identify the Context of Unemployment-Crime Linkages in Virginia, 1995-2000 (ICPSR 4546)
Geographies of Urban Crime in Nashville, Tennessee, Portland, Oregon, and Tucson, Arizona, 1998-2002 (ICPSR 4547)
Integrating Data to Reduce Violence, Milwaukee, WI, 2015-2016 (ICPSR 36591)
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed.
The study investigated the feasibility of implementing the Cardiff Model. The Cardiff Model is a unique violence surveillance system and intervention that involves data sharing and violence prevention planning between law enforcement and the medical field. Anonymized data on assaults from emergency and police departments (EDs; PDs) are combined to detail assault incidents and "hotspots." Data are discussed by a multidisciplinary consortium, which develops and implements a data-informed violence prevention action plan that includes behavioral, environmental, and policy changes to impact violence. Model actions led to decreases in injurious assaults and this model is now statutory in the United Kingdom.
The Cardiff Model has never been translated to the U.S. and would require an investigation within our health care system and in different geographical and population contexts. This study investigated the feasibility of essential Cardiff Model Components in order to refine study procedures and situate this community to request further funds for full model implementation.
As part of this study, researchers collected a number of feasibility measures from ED and study staff to evaluate the feasibility of translating included model components. Geospatial and statistical analyses investigated the added benefit of the combined ED, PD and Emergency Medical Services (EMS) data.
The study contains 1 SPSS data file (CHW Data_1.1.15 to 7.31.16.sav (n=748; 14 variables)), 1 STATA data file (nurse survey data.dta (n=43; 26 variables)), a text document (Nurse Survey_Qualitative data.txt), and 1 Excel file (CHW Incidents_Block level data only.xlsx).
A Multi-Jurisdictional Test of Risk Terrain Modeling and a Place-Based Evaluation of Environmental Risk-Based Patrol Deployment Strategies, 6 U.S. States, 2012-2014 (ICPSR 36369)
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed.
The study used a place-based method of evaluation and spatial units of analysis to measure the extent to which allocating police patrols to high-risk areas affected the frequency and spatial distribution of new crime events in 5 U.S. cities. High-risk areas were defined using risk terrain modeling methods. Risk terrain modeling, or RTM, is a geospatial method of operationalizing the spatial influence of risk factors to common geographic units.
The collection contains 333 shape files, 8 SPSS files, and 9 Excel files. The shape files include both city level risk factor locations and crime data from police departments. SPSS and Excel files contain output from GIS data used for analysis.
Policing by Place: A Proposed Multi-level Analysis of the Effectiveness of Risk Terrain Modeling for Allocating Police Resources, 2014-2015 [New York City] (ICPSR 36899)
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed.
This study contains data from a project by the New York City Police Department (NYPD) involving GIS data on environmental risk factors that correlate with criminal behavior. The general goal of this project was to test whether risk terrain modeling (RTM) could accurately and effectively predict different crime types occurring across New York City. The ultimate aim was to build an enforcement prediction model to test strategies for effectiveness before deploying resources. Three separate phases were completed to assess the effectiveness and applicability of RTM to New York City and the NYPD. A total of four boroughs (Manhattan, Brooklyn, the Bronx, Queens), four patrol boroughs (Brooklyn North, Brooklyn South, Queens North, Queens South), and four precincts (24th, 44th, 73rd, 110th) were examined in 6-month time periods between 2014 and 2015. Across each time period, a total of three different crime types were analyzed: street robberies, felony assaults, and shootings.
The study includes three shapefiles relating to New York City Boundaries, four shapefiles relating to criminal offenses, and 40 shapefiles relating to risk factors.
Quantifying the Size and Geographic Extent of CCTV's Impact on Reducing Crime in Philadelphia, Pennsylvania, 2003-2013 (ICPSR 35514)
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed.
This study was designed to investigate whether the presence of CCTV cameras can reduce crime by studying the cameras and crime statistics of a controlled area. The viewsheds of over 100 CCTV cameras within the city of Philadelphia, Pennsylvania were defined and grouped into 13 clusters, and camera locations were digitally mapped. Crime data from 2003-2013 was collected from areas that were visible to the selected cameras, as well as data from control and displacement areas using an incident reporting database that records the location of crime events. Demographic information was also collected from the mapped areas, such as population density, household information, and data on the specific camera(s) in the area. This study also investigated the perception of CCTV cameras, and interviewed members of the public regarding topics such as what they thought the camera could see, who was watching the camera feed, and if they were concerned about being filmed.
Regional Crime Analysis Geographic Information System (RCAGIS) (ICPSR 3372)
Spatial Analysis of Crime in Appalachia [United States], 1977-1996 (ICPSR 3260)
Spatial Configuration of Places Related to Homicide Events in Washington, DC, 1990-2002 (ICPSR 4544)
The purpose of this research was to further understanding of why crime occurs where it does by exploring the spatial etiology of homicides that occurred in Washington, DC, during the 13-year period 1990-2002.
The researchers accessed records from the case management system of the Metropolitan Police, District of Columbia (MPDC) Homicide Division to collect data regarding offenders and victims associated with the homicide cases. Using geographic information systems (GIS) software, the researchers geocoded the addresses of the incident location, the victim's residence, and offender's residence for each homicide case. They then calculated both Euclidean distance and shortest path distance along the streets between each address per case. Upon applying the concept of triad as developed by Block et al. (2004) in order to create a unit of analysis for studying the convergence of victims and offenders in space, the researchers categorized the triads according to the geometry of locations associated with each case. (Dots represented homicides in which the victim and offender both lived in the residence where the homicide occurred; lines represented homicides that occurred in the home of either the victim or the offender; and triangles represented three non-coincident locations: the separate residences of the victim and offender, as well as the location of the homicide incident.) The researchers then classified each triad according to two separate mobility triangle classification schemes: Traditional Mobility, based on shared or disparate social areas, and Distance Mobility, based on relative distance categories between locations. Finally, the researchers classified each triad by the neighborhood associated with the location of the homicide incident, the location of the victim's residence, and the location of the offender's residence.
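The geometric classification of triads described above can be sketched as follows. This Python example is an illustration, not the researchers' procedure: it assumes projected (planar) coordinates that match exactly when two addresses coincide, it treats any case with exactly two distinct locations as a line (including a shared residence with the incident elsewhere, which the description does not address), and it computes only Euclidean distance, since the shortest path distance along streets would require a street network. Function names and coordinates are hypothetical.

```python
import math

def classify_triad(incident, victim_home, offender_home):
    """Label a triad by the number of distinct locations it contains.

    1 distinct point  -> "dot"      (all three locations coincide)
    2 distinct points -> "line"     (assumption: any two locations coincide)
    3 distinct points -> "triangle" (three non-coincident locations)
    """
    distinct = {incident, victim_home, offender_home}
    return {1: "dot", 2: "line", 3: "triangle"}[len(distinct)]

def triad_distances(incident, victim_home, offender_home):
    """Euclidean distance between each pair of addresses in the triad."""
    return {
        "victim_to_incident": math.dist(victim_home, incident),
        "offender_to_incident": math.dist(offender_home, incident),
        "victim_to_offender": math.dist(victim_home, offender_home),
    }

# Made-up coordinates: the homicide occurs at the offender's residence.
incident = (0.0, 0.0)
victim_home = (3.0, 4.0)
offender_home = (0.0, 0.0)

shape = classify_triad(incident, victim_home, offender_home)
distances = triad_distances(incident, victim_home, offender_home)
```

With real geocoded addresses, exact coordinate equality is fragile, so a small distance tolerance would likely be needed before declaring two locations coincident.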
A total of 3 statistical datasets and 7 geographic information systems (GIS) shapefiles resulted from this study. Note: All datasets exclude open homicide cases. The statistical datasets consist of Offender Characteristics (Dataset 1) with 2,966 cases; Victim Characteristics (Dataset 2) with 2,311 cases; and Triads Data (Dataset 3) with 2,510 cases. The GIS shapefiles have been grouped into a zip file (Dataset 4). Included are point data for homicide locations, offender residences, triads, and victim residences; line data for streets in the District of Columbia, Maryland, and Virginia; and polygon data for neighborhood clusters in the District of Columbia.