Learn More About the Chicago Homicide Dataset
HOMICIDES IN CHICAGO, 1965-1995
Study Abstract
The Chicago Homicide Dataset, one of the largest and most comprehensive datasets on violence ever collected in the United States, contains detailed information on every homicide in Chicago police records from 1965 to 1995 -- over 100 variables and nearly 23,000 homicides. The Chicago Homicide Dataset is organized so that questions about victims, offenders, or incidents (and the inter-relationships among them) can be answered. For example, it is possible to analyze the risk of death and the risk of becoming an offender for a specific type of homicide (such as street gang-related, spousal, or instrumental), for specific racial/ethnic, age, and gender groups, and within specific neighborhoods, and to follow these patterns for more than 30 years.
Study Design
The source of the data is police investigation files. Victims and offenders are those identified by police investigation. Victims are the people who died. Offenders include all those known to the police, whether or not arrested. Offenders in cases that have been cleared exceptionally are also counted (e.g., offender died or charges were rejected by the Assistant State's Attorney). Based on the original investigation report, Detective Division staff in the Crime Analysis Unit fill out a Murder Analysis Report (MAR) for each homicide, using a coding scheme similar to that of the Chicago Homicide Dataset, to measure various factors of the case (e.g., causal factor, relationship, location and weapon). The Murder Analysis Report is a one-page summary of each homicide, which has been maintained since 1965 by the Crime Analysis Unit of the Chicago Police Department.
Since 1982, the Chicago Police Department (CPD) has maintained data on murder cases in an automated system called RAMIS. The Crime Analysis Unit downloads RAMIS information to .dbf (dBASE) files, which Illinois Criminal Justice Information Authority (ICJIA) staff convert into SPSS data entry files. Coders working at CPD then check the RAMIS data against the MAR for each case and add variables, additional codes, and narrative not coded in RAMIS. When the coders have a question about the MAR information, or need clarification about what happened in a particular case, officers in the Crime Analysis Unit advise them on the correct codes and definitions. All coding and data entry are carried out in the Crime Analysis Unit. Coders are supervised closely and trained continuously. ICJIA staff run standard cleaning programs on the data to automate the cleaning process and to detect coding errors.
In general, these homicides are defined at the police investigation stage, without regard to later criminal justice decisions. The standard of proof required by the courts is not the same as the "preponderance of evidence" standard required at the police level. For example, a 1970s police investigation determined that an arson homicide, in which 23 nursing home residents were killed, was perpetrated by a cleaning woman employed by the nursing home. Although there was enough evidence to prosecute the cleaning woman, she was not convicted. Nonetheless, these 23 elderly victims are included in the Chicago Homicide Dataset -- with the cleaning woman as the offender -- because by police standards of proof, the cleaning woman did indeed commit the homicides.
Data are received from CPD in two separate files. One file contains offender demographics and has one record per offender. The victim demographics and the rest of the variables are received in a victim-level file. The two data files are linked by a unique identifier for each victim, the homicide file number (HOMINUM). The file containing the offender demographics is converted to a victim-level format, then merged with the victim-level file by HOMINUM. Information on multiple offenders, up to five, is appended to each victim record. If the incident involves more than one victim, there will be more than one record for that case. Multiple records for one homicide case cannot be linked using the public version of the Chicago Homicide Dataset, because RDNUMBER, the variable needed to link records belonging to the same case, is not present in these data.
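The conversion and merge described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used: HOMINUM comes from the text, but the offender field names (AGE, SEX), the victim field VICAGE, the appended names (OFF1AGE, OFF1SEX, and so on), and the sample records are all hypothetical.

```python
from collections import defaultdict

MAX_OFFENDERS = 5  # the text says up to five offenders are appended per victim


def group_offenders(offender_rows):
    """Group one-record-per-offender rows by HOMINUM, keeping at most five."""
    grouped = defaultdict(list)
    for row in offender_rows:
        if len(grouped[row["HOMINUM"]]) < MAX_OFFENDERS:
            grouped[row["HOMINUM"]].append(row)
    return grouped


def merge_files(victim_rows, offender_rows):
    """Append numbered offender fields (hypothetical names) to each victim record."""
    grouped = group_offenders(offender_rows)
    merged = []
    for victim in victim_rows:
        record = dict(victim)
        offenders = grouped.get(victim["HOMINUM"], [])
        for slot in range(1, MAX_OFFENDERS + 1):
            # Unused offender slots are filled with None.
            offender = offenders[slot - 1] if slot <= len(offenders) else {}
            record[f"OFF{slot}AGE"] = offender.get("AGE")
            record[f"OFF{slot}SEX"] = offender.get("SEX")
        merged.append(record)
    return merged


# Invented sample data: one victim with two offenders, one with none known.
victims = [
    {"HOMINUM": 101, "VICAGE": 34},
    {"HOMINUM": 102, "VICAGE": 19},
]
offenders = [
    {"HOMINUM": 101, "AGE": 27, "SEX": "M"},
    {"HOMINUM": 101, "AGE": 31, "SEX": "F"},
]
merged = merge_files(victims, offenders)
```

The result keeps one record per victim, so an incident with several victims still yields several records, each carrying the same offender information.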
Monthly or yearly totals based on the Chicago Homicide Dataset may not equal official Chicago Police Department (CPD) totals, which are usually based on booking date. Booking can lag the incident: a case may become known to the police months or even years after the initial occurrence, or booking may be delayed by a lengthy investigation or because the victim died some time after the attack. The variable BOOKYEAR measures the year the homicide was booked by CPD. This variable is important for understanding year-to-year changes in variable definitions and changes in police area and district boundaries. When data for a new year are collected, we also update the information on any earlier cases that have been cleared in the interim. Updating for cases booked in a given year will increase the number of Dataset cases occurring in previous years and decrease the number of cases in recent years; therefore, the most recent years in the dataset should be considered preliminary.
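A small sketch may help show why totals by booking year and totals by year of occurrence can disagree. BOOKYEAR is named in the text; OCCYEAR is a hypothetical stand-in for whatever variable records the year of the incident, and the three records are invented.

```python
from collections import Counter

# Invented records; OCCYEAR is a hypothetical name for the year of occurrence.
records = [
    {"HOMINUM": 1, "OCCYEAR": 1993, "BOOKYEAR": 1993},
    {"HOMINUM": 2, "OCCYEAR": 1993, "BOOKYEAR": 1994},  # booked the year after the incident
    {"HOMINUM": 3, "OCCYEAR": 1994, "BOOKYEAR": 1994},
]

# Tally the same cases two ways.
by_occurrence = Counter(r["OCCYEAR"] for r in records)
by_booking = Counter(r["BOOKYEAR"] for r in records)

# Cases booked in a later year than the one in which they occurred.
late_booked = [r["HOMINUM"] for r in records if r["BOOKYEAR"] > r["OCCYEAR"]]
```

Here the 1993 total is two by occurrence but one by booking, which is the kind of gap the paragraph above describes.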
Variables
The complete public version of the Chicago Homicide Dataset contains 115 variables. Data are provided on the relationship of victim to offender; whether the victim or offender had previously committed a violent or nonviolent offense; time of occurrence and place of homicide; type of weapon used; cause and motivation for the incident; whether the incident involved drugs, alcohol, gangs, child abuse, or a domestic relationship; whether and how the offender was identified; and information on the death of the offender(s). Geographic variables include the census tract, community area, police district, and police area. Demographic variables such as the age, sex, and race of each victim and offender are also provided.
Some variables were created with the intention of making the Dataset more comprehensive and "user-friendly." Note especially the following created variables which have been added to the dataset:
SYNDROME: SYNDROME is a useful variable that provides the researcher with a quick answer to the question: "What were the broad circumstances surrounding this homicide?" SYNDROME was created by combining elements of both relationship and motive variables to create values such as "gang-related", "instrumental", "spousal attack", "child abuse" and "other family, expressive".
DRUGTOT & INTOXTOT: These two variables were created in order to provide succinct information about the prevalence of alcohol and drug use and drug motive, in various combinations, in a homicide.
SEXRACE: This variable combines the gender and the race/ethnicity of the homicide victim in a single value, which makes the dataset more convenient to use. Five additional variables of the same kind measure the gender and race/ethnicity of up to five offenders.
PLACE and counterparts, WEAPON and counterparts: These summary variables categorize the values of LOCATION and WEAPCAL into groups that possess similar characteristics. For example, POUTDOOR groups together all locations that are outdoors, and WHANDGUN groups together all weapons that are handguns. CALIBER is a very important weapon variable for the analysis of homicides committed with a firearm. It has been extremely useful in examining the surge of homicides in the early 1990s.
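The way a summary variable collapses detailed codes into a yes/no indicator can be sketched as follows. POUTDOOR, WHANDGUN, LOCATION, and WEAPCAL are named in the text, but the category labels below are hypothetical placeholders; the real code values are listed in the codebook and are not reproduced here.

```python
# Hypothetical category labels standing in for the real LOCATION and WEAPCAL codes.
OUTDOOR_LOCATIONS = {"street", "alley", "park"}
HANDGUN_WEAPONS = {"revolver", "semi-automatic pistol"}


def add_summary_flags(record):
    """Return a copy of the record with 0/1 summary indicators added."""
    flagged = dict(record)
    flagged["POUTDOOR"] = int(record["LOCATION"] in OUTDOOR_LOCATIONS)
    flagged["WHANDGUN"] = int(record["WEAPCAL"] in HANDGUN_WEAPONS)
    return flagged


# Invented sample cases.
cases = [
    {"LOCATION": "street", "WEAPCAL": "revolver"},
    {"LOCATION": "residence", "WEAPCAL": "knife"},
]
flagged_cases = [add_summary_flags(c) for c in cases]

# Count cases that are both outdoors and committed with a handgun.
outdoor_handgun = sum(1 for c in flagged_cases if c["POUTDOOR"] and c["WHANDGUN"])
```

Because the public dataset already includes these summary variables, a sketch like this is mainly useful for defining additional groupings of one's own.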
INVEST (1-5): These variables measure the outcome of the police investigation for up to five offenders in the years 1990 and after. Data collection in the years prior to 1990 included information on only the first offender, and the dataset contains two additional variables that measure this same factor for those years. INVSTGN contains information for the years 1965 to 1981, and INVEST contains information for the years 1982 to 1989. Two variables are needed because data were collected in a different manner in those two periods. For example, INVEST separates "exceptional clearance" into two values (death of offender and bar to prosecution), whereas INVSTGN groups them together. Distinguishing the two elements of "exceptional clearance" provides a more descriptive measurement of the outcome of the police investigation. With the addition of the 1990 data, a new variation of INVEST was created to include information on up to five offenders.
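An analysis that spans all three periods needs a single first-offender outcome. One possible approach, sketched below, picks the variable that applies to the case's year and collapses the two finer "exceptional clearance" values so that later years are comparable with INVSTGN. The assumptions are substantial: INVEST1 is an assumed name for the first of the five post-1989 variables, the string labels stand in for the real numeric codes, and which year variable should drive the selection is left to the analyst.

```python
# Hypothetical value labels; the real codes are documented in the codebook.
EXCEPTIONAL_SUBTYPES = {"death of offender", "bar to prosecution"}


def first_offender_outcome(record, year):
    """Return a first-offender outcome comparable across all three periods."""
    if year <= 1981:
        # INVSTGN already groups both kinds of exceptional clearance together.
        return record.get("INVSTGN")
    # INVEST covers 1982-1989; INVEST1 (assumed name) covers 1990 onward.
    value = record.get("INVEST") if year <= 1989 else record.get("INVEST1")
    if value in EXCEPTIONAL_SUBTYPES:
        return "exceptional clearance"
    return value


# Invented examples, one or more per collection period.
examples = [
    ({"INVSTGN": "exceptional clearance"}, 1975),
    ({"INVEST": "death of offender"}, 1985),
    ({"INVEST1": "bar to prosecution"}, 1992),
    ({"INVEST1": "cleared by arrest"}, 1993),
]
outcomes = [first_offender_outcome(rec, yr) for rec, yr in examples]
```

Collapsing loses the finer distinction available from 1982 onward, so analyses limited to those years may prefer to keep the original values.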
The values "Missing" and "Unknown" are used to indicate that information for a certain variable is not known to the police according to the MAR, or the variable does not apply to the circumstances of a particular case. For example, if the race variable for the third offender is coded "Missing", this means that either the race of the third offender is not known to the police or a third offender was not involved in the incident.
Data Collection Notes
This unique set of data has been compiled with the close cooperation of the Chicago Police Department over many years by Carolyn Rebecca Block of the Illinois Criminal Justice Information Authority and Richard L. Block of Loyola University Chicago. Initially, the data collection was established by Richard Block and Franklin Zimring of the University of Chicago Law School, working with the Chicago Police Department. Margo Wilson and Martin Daly of McMaster University also have contributed to data collection, and numerous researchers and policy makers have used the data for policy analysis or causal modeling.
Support for the Chicago Homicide Project has been provided over the years by the Illinois Criminal Justice Information Authority, Loyola University Chicago, and the University of Chicago Law School, under grants from the National Institute of Justice, the Ford Foundation, the Bureau of Justice Statistics, the National Institute of Mental Health, the Harry Frank Guggenheim Foundation, the National Institute for Occupational Safety and Health, and, most recently, the Joyce Foundation. The Illinois Criminal Justice Information Authority has maintained the Chicago Homicide Dataset since 1979.
ICPSR Processing
Original data and documentation files were reformatted by ICPSR. ICPSR also performed checks for undocumented codes, standardized some missing data codes, and generated SAS and SPSS data definition statements for the public version of the Chicago Homicide Dataset (ICPSR 6399).

