Using Machine Learning to Identify High-Risk Domestic Violence Offenders in New York City, New York, 2006-2017 (ICPSR 38540)

Name: Using Machine Learning to Identify High-Risk Domestic Violence Offenders in New York City, New York, 2006-2017
Published: 2024-02-12
License: https://www.icpsr.umich.edu/web/ICPSR/studies/38540/terms

Version Date: Feb 12, 2024

Principal Investigator(s):
Jens Ludwig, University of Chicago. Crime Lab

https://doi.org/10.3886/ICPSR38540.v1

Version V1

Summary

To address the relative difficulty in predicting domestic violence incidents and effectively targeting resources, the University of Chicago Crime Lab and the New York Police Department (NYPD) collaborated to develop and test a machine learning-based statistical model to predict the risk of domestic violence victimization in New York City.

Phase 1 of the project was to develop a statistical model using machine learning techniques. NYPD administrative records dated between January 2006 and January 2017 were used as input data to build and refine the tool. Due to the lack of unique identifiers for victims in the records, the research team also used data from the Chicago Police Department to create a probabilistic record linkage toolkit (Name Match) to identify which records belonged to the same person within and across data sources.

In Phase 2, the researchers aimed to field test the tool's ability to identify individuals at risk of repeated domestic violence through a large-scale randomized controlled trial. The trial was designed to compare the individuals selected by officers with those selected by the tool, and to measure the effects of regular home visits to high-priority individuals thought to be at risk of serious domestic assault.

This collection contains only the machine learning code files (R and Python) created during secondary analysis, which have been released as a zipped package. Please refer to the Data Roadmap for instructions on how to obtain the original NYPD data. To access the Name Match algorithm and documentation, please visit the GitHub repository.

Citation

Ludwig, Jens. Using Machine Learning to Identify High-Risk Domestic Violence Offenders in New York City, New York, 2006-2017. Inter-university Consortium for Political and Social Research [distributor], 2024-02-12. https://doi.org/10.3886/ICPSR38540.v1

Funding

United States Department of Justice. Office of Justice Programs. National Institute of Justice (2017-VA-CX-0033)

Subject Terms

criminal justice system; domestic violence; intimate partner violence; prediction; risk assessment; technology; victims

Geographic Coverage

New York City; United States; New York (state)

Smallest Geographic Unit

None

Distributor(s)

Inter-university Consortium for Political and Social Research

Time Period(s)

2006-01-01 -- 2019-05-30

Date of Collection

2006-01-01 -- 2019-05-30

Study Purpose

The purpose of this project was to develop and test a machine learning-based statistical model that predicts the risk of domestic violence victimization, in order to improve the targeting of resources in New York City.

Study Design

The machine learning tool incorporated New York Police Department (NYPD) administrative records covering all of New York City between January 2006 and January 2017, including domestic incident reports, criminal complaints, arrests, aided reports, shootings, and homicides.

While unique identifiers were present for arrestees, they did not exist for victims. The research team therefore developed a probabilistic record linkage algorithm, Name Match, to identify which records belonged to the same person within and across data sources. The algorithm compares identifying fields (e.g., name, birthdate, address, sex, race) between two records and predicts whether they refer to the same person. With Name Match, the researchers created a victim-level dataset linking domestic violence victims and offenders to past and future law enforcement incidents. To generate predictions of violent felony domestic violence victimization over a 12-month follow-up period, the researchers used data from 2006 to 2014, dividing it into a training set and a test set.
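
The pairwise comparison step described above can be illustrated with a short Python sketch. This is not the Name Match implementation: the record fields, the use of difflib string similarity, the hand-set logistic weights, and the 0.5 threshold are all hypothetical choices made for illustration. A production linkage model would learn its weights from record pairs with known match status and would typically use blocking so that not every pair of records has to be compared.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class PersonRecord:
    # Hypothetical schema: the fields mirror those the study says are compared.
    first_name: str
    last_name: str
    birthdate: str  # ISO date string, e.g. "1985-03-14"
    sex: str
    race: str
    address: str


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] based on difflib's ratio()."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def comparison_features(r1: PersonRecord, r2: PersonRecord) -> dict:
    """Field-by-field agreement features for one candidate pair of records."""
    return {
        "first_name_sim": similarity(r1.first_name, r2.first_name),
        "last_name_sim": similarity(r1.last_name, r2.last_name),
        "birthdate_exact": float(r1.birthdate == r2.birthdate),
        "sex_match": float(r1.sex == r2.sex),
        "race_match": float(r1.race == r2.race),
        "address_sim": similarity(r1.address, r2.address),
    }


# Hypothetical hand-set weights; a real model would estimate these from labeled pairs.
WEIGHTS = {
    "first_name_sim": 2.5,
    "last_name_sim": 3.0,
    "birthdate_exact": 3.5,
    "sex_match": 0.5,
    "race_match": 0.5,
    "address_sim": 1.0,
}
BIAS = -7.0


def match_probability(r1: PersonRecord, r2: PersonRecord) -> float:
    """Logistic score: estimated probability that both records describe one person."""
    feats = comparison_features(r1, r2)
    z = BIAS + sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_same_person(r1: PersonRecord, r2: PersonRecord, threshold: float = 0.5) -> bool:
    """Declare a match when the estimated probability reaches the (hypothetical) threshold."""
    return match_probability(r1, r2) >= threshold


a = PersonRecord("Maria", "Lopez", "1985-03-14", "F", "Hispanic", "123 Main Street")
b = PersonRecord("Marie", "Lopez", "1985-03-14", "F", "Hispanic", "123 Main St")
c = PersonRecord("John", "Smith", "1970-11-02", "M", "White", "9 Elm Ave")
print(is_same_person(a, b), is_same_person(a, c))  # prints: True False
```

Because the weights are invented, the probabilities this sketch returns are only meaningful relative to one another; calibrated match probabilities require a model trained on pairs whose true match status is known.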

Phase 2 was designed to test the model in the field. The researchers sought to determine whether the statistical model or domestic violence officers selected victims at higher risk for regular home visits, and to estimate the treatment effect of home visits on violent felony domestic violence revictimization. Launched in July 2017, the field intervention involved 60 NYPD commands randomized into treatment and control groups (30 per group). Control commands operated as usual. In treatment commands, two individuals per officer were added to the high-priority list of those receiving regular home visits: one selected by the algorithm and one selected by officers. However, due to external constraints, the study design was modified to add a quasi-experiment comparing individuals who received home visits with those who did not, based on residence in a particular NYPD command area.
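
The cluster-randomized design described above, with NYPD commands rather than individuals as the unit of randomization, can be sketched as follows. The command labels, the seed, and the rule that the algorithm's pick is the highest-scoring person not already on the list and not chosen by the officer are hypothetical; the source does not describe the exact randomization procedure or selection rule.

```python
import random


def randomize_commands(command_ids, n_treatment, seed):
    """Cluster randomization: assign whole commands, not individuals, to an arm."""
    rng = random.Random(seed)
    ordered = sorted(command_ids)  # fixed input order so the seed alone determines the draw
    treated = set(rng.sample(ordered, n_treatment))
    return {cid: ("treatment" if cid in treated else "control") for cid in ordered}


def expand_priority_list(current, risk_scores, officer_pick):
    """Add one model-selected and one officer-selected person to an officer's list.

    Hypothetical rule: the model's pick is the highest-scoring person who is not
    already on the list and was not chosen by the officer.
    """
    candidates = {
        person: score
        for person, score in risk_scores.items()
        if person not in current and person != officer_pick
    }
    model_pick = max(candidates, key=candidates.get)
    return current + [model_pick, officer_pick]


# Hypothetical labels for the 60 participating commands.
commands = [f"CMD-{i:02d}" for i in range(1, 61)]
assignment = randomize_commands(commands, n_treatment=30, seed=2017)
print(sum(arm == "treatment" for arm in assignment.values()))  # prints: 30

print(expand_priority_list(["A"], {"A": 0.9, "B": 0.7, "C": 0.4, "D": 0.8}, officer_pick="D"))
# prints: ['A', 'B', 'D']
```

Randomizing at the command level keeps every officer in a command under the same condition, which limits contamination between arms; the trade-off is that statistical power depends on the number of commands rather than the number of individuals.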

Sample

Not applicable.

Time Method

Cross-sectional

Universe

New York Police Department administrative records covering all of New York City between January 2006 and January 2017.

Unit(s) of Observation

Event/Process, Individual

Data Source

New York Police Department (NYPD)

Data Type(s)

administrative records data

Mode of Data Collection

record abstracts

Description of Variables

The following variables were used in the secondary analysis and tool development:

Victim and/or offender personally identifiable information (PII): name, birthdate, reported age, sex, race, home address, unique ID (if available)
Incident details: date, time, precinct, address, type of incident, penal code, law code, police department code description, narrative description of incident
Other incident indicators: fatal vs. non-fatal, whether a desk appearance ticket was issued, whether the arrest was victim-driven or proactive

Response Rates

Not applicable.

Presence of Common Scales

None

Original Release Date

2024-02-12

Weight

Not applicable.

Notes

These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed.
The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.
ICPSR usually offers files in multiple formats for researchers to be able to access data and documentation in formats that work well within their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.