LEAIC Learning Guide (R): Background

Description of the LEAIC

Criminal justice research may require merging disparate data sources that have no common match keys. The Law Enforcement Agency Identifiers Crosswalk (LEAIC) file facilitates linking reported crime data with other types of data, such as socio-economic data. It does this by including a record for each law enforcement reporting entity and access identifier for the National Crime Information Center (NCIC). Essentially, if an entity (law enforcement agency or section of a law enforcement agency) is capable of reporting crime information, it is included in the file. The LEAIC records contain common match keys for merging agency-reported crime data and other government data. These linkage variables include the Originating Agency Identifier (ORI) code, Federal Information Processing Standards (FIPS) state, county and place codes, and Governments Integrated Directory government identifier codes.

How the LEAIC works in brief

Let’s look at the example below. Table 1 shows the population of five cities, which are identified by letters. Table 2 shows the unemployment rate in the same five cities, now identified by numbers. We would like to combine these tables and see how population and unemployment are related. However, the two tables share no identifier: Table 1 labels cities by letter and Table 2 labels them by number. This is where the crosswalk comes in. Table 3 is our crosswalk (the LEAIC for this example). It contains a column with our identifiers from Table 1, a column with our identifiers from Table 2, and a (supplementary) column giving us the name of the city. Now we have a data set that lets us match the information from Table 1 with Table 2. We can merge Table 1 with Table 3 because the column City_symbol is present in both. The merged file can then be merged with Table 2 because the column City_number is now present in both. Table 4 shows what the final data set looks like.

Table 1

City_symbol  Population
A            681,170
B            8,538,000
C            1,568,000
D            864,816
E            120,782

Table 2

City_number  Unemployment Rate
1            5.9%
2            3.9%
3            5.9%
4            2.7%
5            1.8%

Table 3

City_symbol  City_number  City
A            1            Washington D.C.
B            2            New York City
C            3            Philadelphia
D            4            San Francisco
E            5            Ann Arbor

Table 4

City_symbol  City_number  City             Population  Unemployment Rate
A            1            Washington D.C.  681,170     5.9%
B            2            New York City    8,538,000   3.9%
C            3            Philadelphia     1,568,000   5.9%
D            4            San Francisco    864,816     2.7%
E            5            Ann Arbor        120,782     1.8%
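
The two-step merge that produces Table 4 can be sketched in base R. This is a minimal illustration that uses only the values from Tables 1 through 3. The object names (table1, table2, table3, step1, table4) and the column name Unemployment_Rate are chosen here for the example, and the percentages are stored as plain numbers.

```r
# Table 1: population, keyed by City_symbol
table1 <- data.frame(
  City_symbol = c("A", "B", "C", "D", "E"),
  Population  = c(681170, 8538000, 1568000, 864816, 120782),
  stringsAsFactors = FALSE
)

# Table 2: unemployment rate in percent, keyed by City_number
table2 <- data.frame(
  City_number       = 1:5,
  Unemployment_Rate = c(5.9, 3.9, 5.9, 2.7, 1.8)
)

# Table 3: the crosswalk, holding both identifiers plus the city name
table3 <- data.frame(
  City_symbol = c("A", "B", "C", "D", "E"),
  City_number = 1:5,
  City        = c("Washington D.C.", "New York City", "Philadelphia",
                  "San Francisco", "Ann Arbor"),
  stringsAsFactors = FALSE
)

# Step 1: attach population to the crosswalk using the shared City_symbol
step1 <- merge(table3, table1, by = "City_symbol")

# Step 2: attach unemployment using City_number, which step1 now carries
table4 <- merge(step1, table2, by = "City_number")

# Reorder the columns to match the layout of Table 4
table4 <- table4[, c("City_symbol", "City_number", "City",
                     "Population", "Unemployment_Rate")]
print(table4)
```

Neither merge would be possible without Table 3, since it is the only table that contains both City_symbol and City_number.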

The original motivation for creating the LEAIC file was to enable linking Uniform Crime Reporting (UCR) Program data and/or National Incident-Based Reporting System (NIBRS) data, produced by the Federal Bureau of Investigation (FBI), with socio-economic data produced by the Census Bureau (e.g., to examine crime rates and poverty information at the city level). A file such as the LEAIC is necessary to facilitate this type of research because of the different coding systems used by the data sources.

While FBI crime data contains a wealth of information, it has a number of limitations that restrict its use – limitations alleviated by the LEAIC. The FBI data uses Originating Agency Identifier (ORI) codes to identify agencies, and these codes are not used by other government data producers. For example, a researcher who would like to examine the relationship between poverty and crime would be unable to do so using only FBI data. Government poverty data (available through the Census) uses Federal Information Processing Standards (FIPS) codes to specify locations whereas FBI data uses ORI codes. This disconnect prevents pairing poverty data (or almost any other government data set) with FBI crime data because the identifying codes do not match. The LEAIC solves this problem because, for each ORI, it contains the matching FIPS code. The LEAIC can therefore be merged with both FBI data and other government data sets.

If the above researcher wanted to examine city-level poverty and crime, they would encounter another problem. The FBI data does not include codes for places (cities, townships, etc.). The smallest geographic unit available is the county, meaning that analyses of smaller units are not possible. In addition, some cities have multiple reporting agencies. For example, the City of Philadelphia has multiple police agencies (e.g. Philadelphia Police, Philadelphia Sheriff, University of Pennsylvania Police). Using FBI data alone would allow neither analysis at the city level nor assurance that all crime in the city is properly aggregated, as we would not know which agencies are in which cities.

The LEAIC solves these problems by allowing a match between an individual agency from FBI data and a city from another data source (e.g. the Census). The agencies of Philadelphia Police, Philadelphia Sheriff, and University of Pennsylvania Police, for example, could be matched with poverty data for the City of Philadelphia and aggregated to the city level. At this city level, it is possible to analyze how city poverty affects city crime. The key to these analyses is the LEAIC’s ability to match FBI data with other government data. The Census Bureau typically uses FIPS codes to geographically identify counties and states. In addition, the FIPS system has codes for places (county subdivisions, cities, census-designated places). The LEAIC file “crosswalks” the UCR/NIBRS and FIPS state and county codes; it also adds FIPS place codes to law enforcement agency records. Consequently, a city-level analysis of crime and poverty could be done by merging UCR crime data to the LEAIC file by ORI code (contained in both the UCR/NIBRS and LEAIC files) and then merging the result to Census data using FIPS state and place codes (contained in both the Census and LEAIC).
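
The ORI-to-FIPS workflow described above can be sketched in base R as follows. Everything in this sketch is invented for illustration: the ORI values, the FIPS codes, the counts, the poverty rates, and the column names (ORI, FIPS_STATE, FIPS_PLACE, homicides, poverty_rate) are placeholders and are not the variable names used in the actual LEAIC, UCR/NIBRS, or Census files.

```r
# Agency-level crime counts keyed by ORI (all values are made up)
crime <- data.frame(
  ORI       = c("XX0000001", "XX0000002", "XX0000003", "XX0000004"),
  homicides = c(10, 1, 0, 4),
  stringsAsFactors = FALSE
)

# Crosswalk: one row per agency, giving the FIPS state and place codes.
# Codes are stored as character strings so leading zeros are preserved.
crosswalk <- data.frame(
  ORI        = c("XX0000001", "XX0000002", "XX0000003", "XX0000004"),
  FIPS_STATE = c("99", "99", "99", "99"),
  FIPS_PLACE = c("00001", "00001", "00001", "00002"),
  stringsAsFactors = FALSE
)

# Place-level data keyed by FIPS state and place codes (made up)
census <- data.frame(
  FIPS_STATE   = c("99", "99"),
  FIPS_PLACE   = c("00001", "00002"),
  poverty_rate = c(25.0, 12.5),
  stringsAsFactors = FALSE
)

# Step 1: attach FIPS codes to each agency's crime counts by ORI
crime_xw <- merge(crime, crosswalk, by = "ORI")

# Step 2: sum over all agencies that fall within the same place
city_crime <- aggregate(homicides ~ FIPS_STATE + FIPS_PLACE,
                        data = crime_xw, FUN = sum)

# Step 3: attach the place-level data by FIPS state and place codes
city <- merge(city_crime, census, by = c("FIPS_STATE", "FIPS_PLACE"))
print(city)
```

The aggregation step matters because several agencies can serve the same place. Here three hypothetical agencies share place 00001, so their counts are summed before the place-level merge.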

In this learning guide, we will examine crime rates of different age and gender groups across states. There are many other uses of the LEAIC. For example, it has information about congressional and judicial districts, allowing you to examine crime in those jurisdictions. As a collection of reporting law enforcement agencies, it serves as a census of police departments in the United States. Other uses are limited only by the availability of proper match keys on the data you’re interested in. When considering whether to use the LEAIC, check to see that both of your data sets have match keys in the LEAIC. If they do, then the LEAIC is the right choice for your project.

What is NIBRS?

In this guide, we will use the National Incident-Based Reporting System (NIBRS) data to calculate homicide victimizations. Why NIBRS? There are two FBI-produced data sources that include homicide victim sex, age and geography information. One is the Uniform Crime Reporting (UCR) Program Data: Supplementary Homicide Reports (SHR). These data are a supplement to the Uniform Crime Reporting Program Data: Offenses Known and Clearances by Arrest data. The Offenses Known data provide total counts by crime type and law enforcement agency by month. No specific victim information is included. The SHR data provide supplemental information such as victim and offender age, sex, race, weapon type, and location. The SHR data contain one record per homicide incident. Each incident can contain up to 11 victims. As such, multiple-victim incidents contain information for victim 1 up to victim 11 organized on the same record. The unit of analysis in the SHR is the incident, not the victim.
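
To make the difference between an incident-level record and a victim-level file concrete, here is a minimal base R sketch of restructuring an SHR-style wide layout into one row per victim. The column names (incident_id, vic1_age, vic1_sex, and so on) and all values are hypothetical and do not match the real SHR variable names. Only two victim slots are shown, whereas the text above notes that an SHR record can hold up to 11.

```r
# Toy incident-level data: one row per incident, victims stored side by side
shr <- data.frame(
  incident_id = c(1, 2),
  vic1_age    = c(34, 19),
  vic1_sex    = c("M", "F"),
  vic2_age    = c(NA, 22),
  vic2_sex    = c(NA, "M"),
  stringsAsFactors = FALSE
)

max_victims <- 2  # hypothetical; set to the number of victim slots in the data

# Build one block of rows per victim slot, then stack the blocks
victim_blocks <- lapply(seq_len(max_victims), function(i) {
  data.frame(
    incident_id = shr$incident_id,
    victim_num  = i,
    age         = shr[[paste0("vic", i, "_age")]],
    sex         = shr[[paste0("vic", i, "_sex")]],
    stringsAsFactors = FALSE
  )
})
victims <- do.call(rbind, victim_blocks)

# Drop empty slots (incidents with fewer victims than max_victims)
victims <- victims[!is.na(victims$age) | !is.na(victims$sex), ]

# Sort so that each incident's victims appear together
victims <- victims[order(victims$incident_id, victims$victim_num), ]
print(victims)
```

After the restructuring, the unit of analysis is the victim rather than the incident, so victim counts by age or sex can be tabulated directly.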

The other source of data on homicide victims is the NIBRS data, specifically the victim record. NIBRS is an FBI data set that “captures details on each single crime incident – as well as on separate offenses within the same incident – including information on victims, known offenders, relationships between victims and offenders, arrestees, and property involved in the crimes.” The details provided for each incident allow researchers to answer questions about crime that would be unanswerable with UCR data. For example, NIBRS includes information about whether the offender is suspected of drug use at the time of the incident. Researchers could use NIBRS to determine which crimes involved intoxicated offenders, whether intoxication rates have changed over time for any crimes, or even which days of the month offenders are more likely to be intoxicated. The same questions could not be answered with UCR data, which provides only monthly crime totals.

Unlike the UCR, NIBRS contains detailed information on every crime reported to the police. The UCR uses a Hierarchy Rule for reporting, meaning that only the most serious crime in an incident is reported. If a robbery led to the death of the victim, both a robbery and a homicide occurred. However, since homicide is more serious than robbery, the UCR would report only the homicide. NIBRS would report both. As such, any agency that switches from UCR to NIBRS would see a drastic increase in its reported crime (though no real change in crime). In addition, switching from the summarized data of the UCR to the incident-level details of NIBRS imposes financial costs on the department. For these reasons, agency participation in NIBRS is relatively low.
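
The effect of the Hierarchy Rule on crime counts can be illustrated with a small base R sketch. The incidents, the column names, and the two-level seriousness ranking below are simplified assumptions made for this example. They are not the FBI's actual offense hierarchy or data layout.

```r
# Toy offense-level data: incident 1 is a robbery that ended in a homicide,
# incident 2 is a robbery only
offenses <- data.frame(
  incident_id = c(1, 1, 2),
  offense     = c("robbery", "homicide", "robbery"),
  stringsAsFactors = FALSE
)

# Assumed seriousness ranking for the example (lower number = more serious)
severity_rank <- c(homicide = 1, robbery = 2)
offenses$severity <- unname(severity_rank[offenses$offense])

# NIBRS-style counting: every offense in every incident is counted
nibrs_counts <- table(offenses$offense)

# UCR-style counting: keep only the most serious offense in each incident
by_severity <- offenses[order(offenses$incident_id, offenses$severity), ]
ucr_offenses <- by_severity[!duplicated(by_severity$incident_id), ]
ucr_counts <- table(ucr_offenses$offense)

print(nibrs_counts)
print(ucr_counts)
```

The same two incidents yield three offenses under NIBRS-style counting and two under the Hierarchy Rule, which is why an agency's reported totals can rise after switching systems even when crime itself has not changed.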

As of 2020, 8,742 law enforcement agencies representing 48.9 percent of the population were reporting NIBRS data to the UCR Program. This limited participation is a drawback of using NIBRS. The UCR is more widely used, providing information on nearly all agencies in the country. NIBRS therefore gives researchers more detailed insight into crime, but for a far smaller set of jurisdictions than the UCR covers. For this guide, we chose NIBRS because its detail on each crime exceeds that of the UCR, allowing us to answer questions such as how murder rates differ among age and gender groups.

In short, NIBRS allows easier examination of homicides than does the UCR data, and provides more details of more types of crime. Using NIBRS also means we do not need to restructure the data to the victim level as we would with the SHR data. Furthermore, the FBI transitioned the UCR program to NIBRS-only data collection in 2021. That’s a good reason to become familiar with NIBRS.

NIBRS data as received from the FBI are in one large file with 11 types of records. The 2013 data has approximately 31.8 million records (i.e. crimes) and is approximately 5.8 gigabytes in size. Because NIBRS provides so much data, each file is very large. NACJD makes two versions of NIBRS available. The “regular” NIBRS data is the data received from the FBI but split into 11 files corresponding to the 11 record types (each record type corresponds to a specific part of the data, e.g. Offenses, Victims, Offenders). The data files can be analyzed separately, or some or all of them can be restructured and merged by the ORI and incident number variables.
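
As a rough illustration of merging segment files by ORI and incident number, the base R sketch below joins a toy victim file to a toy offender file. The column names (ORI, INCNUM, victim_age, offender_age) and all values are hypothetical and are not the actual NIBRS variable names.

```r
# Toy victim segment: one row per victim
victims <- data.frame(
  ORI        = c("XX0000001", "XX0000001", "XX0000002"),
  INCNUM     = c("A1", "A1", "A1"),
  victim_age = c(30, 41, 25),
  stringsAsFactors = FALSE
)

# Toy offender segment: one row per offender
offenders <- data.frame(
  ORI          = c("XX0000001", "XX0000002", "XX0000002"),
  INCNUM       = c("A1", "A1", "A1"),
  offender_age = c(22, 35, 37),
  stringsAsFactors = FALSE
)

# Merge on BOTH keys. In this toy data the incident number "A1" appears
# under two different agencies, so INCNUM alone would not identify an incident.
victim_offender <- merge(victims, offenders, by = c("ORI", "INCNUM"))
print(victim_offender)
```

Note that the merged file has one row per victim-offender pair within an incident, so the unit of analysis is no longer the victim or the offender. This change in unit is part of the data management work that combining segment files involves.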

Substantial data management work is required to combine multiple NIBRS data files into a cohesive file with a consistent unit of analysis. Creating a cohesive file is often needed when using information from multiple segment files. For example, information about victims and offenders is stored in separate files, which must be merged to analyze their features together. NACJD does this work in creating the NIBRS Extract Files, which are four files with units of analysis corresponding to the crime incident, victim, offender and arrestee. The extract files are large, but they can make working with NIBRS data much easier.