Developing a Common Metric for Evaluating Police Performance in Deadly Force Situations in the United States, 2009-2011 (ICPSR 33141)
Principal Investigator(s): Vila, Bryan, Washington State University-Spokane
This study developed interval-level measurement scales for evaluating police officer performance during real or simulated deadly force situations. In a two-day concept mapping focus group, experts identified statements describing two sets of dynamics: the difficulty (D) of a deadly force situation and the performance (P) of a police officer in that situation. These statements were then operationalized into measurable Likert-scale items, which 291 use of force instructors from more than 100 agencies across the United States scored using an online survey instrument. The resulting dataset contains a total of 685 variables, including 312 difficulty statement items, 278 performance statement items, and 94 variables that measure the demographic characteristics of the scorers.
A downloadable version of data for this study is available; however, certain identifying information in the downloadable version may have been masked or edited to protect respondent privacy. Additional data not included in the downloadable version are available in a restricted version of this data collection. For more information about the differences between the downloadable data and the restricted data for this study, please refer to the codebook notes section of the codebook. Users interested in obtaining restricted data must complete and sign a Restricted Data Use Agreement, describe the research project and data protection plan, and obtain IRB approval or notice of exemption for their research.
These data are freely available.
Vila, Bryan. Developing a Common Metric for Evaluating Police Performance in Deadly Force Situations in the United States, 2009-2011. ICPSR33141-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2014-06-18. http://doi.org/10.3886/ICPSR33141.v1
Persistent URL: http://doi.org/10.3886/ICPSR33141.v1
This study was funded by:
- United States Department of Justice. Office of Justice Programs. National Institute of Justice (2008-IJ-CX-0015)
Scope of Study
Smallest Geographic Unit: Jurisdiction (The data contain dichotomous variables that indicate whether the respondent reported working in a particular state or US territory.)
Geographic Coverage: United States
Date of Collection:
Unit of Observation: individual
Universe: Use of force instructors in the United States.
Data Types: survey data
Data Collection Notes:
Users should be aware that a case count discrepancy exists between the data reported in Appendix G ("Use of Force Instructor Raters' Demographics") of "Final Report: Developing a Common Metric for Evaluating Police Performance in Deadly Force Situations" (Vila, 2011) and the data provided to ICPSR. Appendix G cites 323 individuals; the dataset contains 291 individuals.
Users should be aware that the numbers of difficulty statement (D-scale) items and performance statement (P-scale) items differ between "Final Report: Developing a Common Metric for Evaluating Police Performance in Deadly Force Situations" (Vila, 2011) and the data provided to ICPSR. For difficulty statement items, the final report references 311, while the dataset contains 312. For performance statement items, the final report references 289, while the dataset contains 278.
Users should be aware that the principal investigator withheld the data for the following categories of demographic variables from submission to ICPSR: Law Enforcement Training Details, Sympathetic Nervous Response Details, Deadly Force Situation Experience Details, Military Experience Details, Final Thoughts (e.g., general comments) Details, and Letter of Appreciation Details.
Study Purpose: The purpose of the study was to develop two valid and reliable interval-level measurement scales for evaluating police officer performance during real or simulated deadly force situations. The P-scale is intended to measure or predict the quality of an officer's outcome-relevant behavior (i.e., performance) in a deadly force situation. The D-scale is intended to measure attributes of a deadly force situation that affect the probability of an unforeseen or undesirable outcome (i.e., difficulty) and thus make it harder for a police officer to steer the situation toward an optimal outcome.
The study's research team first conducted an intense, two-day concept mapping focus group with a diverse gathering of 17 leading experts on policing and deadly force. During the concept mapping process, experts were asked to identify only the critical variables that affect the difficulty (D) of a deadly force situation and the performance (P) of a police officer in that situation. The outcome of this process was a long list of statements about discrete, measurable variables that affect situation difficulty ("D-statements") and police officer performance ("P-statements").
The research team then developed a Likert scale for each statement to specify the range of values that statement could take. To determine how much each value contributed to the difficulty of a deadly encounter, or to a police officer's performance in that encounter, the team next had 291 use of force instructors from more than 100 different agencies across the United States score the D-statements and P-statements online using a Thurstone equal-appearing-interval scaling process. The raters scored the statements through "Survey Monkey," an online survey platform, which was also used to gather demographic information from each of the use of force instructors.
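The description does not say how the raters' judgments were aggregated into scale values. In the classic Thurstone equal-appearing-interval method, a statement's scale value is the median of the judges' ratings, and the interquartile range (the Q value) serves as an index of how much the judges disagreed. The following is a minimal Python sketch under that assumption; the function name `thurstone_scale_value` is hypothetical and is not part of the dataset or the study's materials.

```python
from statistics import median, quantiles


def thurstone_scale_value(ratings):
    """Return (scale_value, q_value) for one statement's ratings.

    Assumes the classic Thurstone rule, not a documented study procedure:
    scale_value is the median rating across judges, and q_value is the
    interquartile range, an index of rater disagreement.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return median(ratings), q3 - q1


# Hypothetical ratings of one P-statement on the -6..+6 scale.
print(thurstone_scale_value([3, 4, 4, 5, 5, 5, 6]))  # (5, 1.0)
```

In the classic procedure, statements with large Q values are candidates for revision or removal, because the judges did not agree on where they fall on the scale.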
Because the total number of statements was large, the research team divided the D-statement set into four random subsets of items to be scored (D1-D4) and the P-statement set into four random subsets of items to be scored (P1-P4). Each use of force instructor then received a random P+D subset pair to score. Once raters had completed a subset pair, they were offered an opportunity to score another pair at a later date. Raters who were willing were re-contacted the next day and offered a random pair that did not duplicate any subset they had scored previously. Thus, each rater had the opportunity to score all of the items.
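The subset rotation described above can be sketched in Python as follows. This is an illustration only, assuming that subsets are formed by shuffling the statements and dealing them round-robin into four groups (indices 0-3 standing in for D1-D4 and P1-P4), and that a rater who accepts every offer keeps drawing unscored pairs; the function names are hypothetical, and the study's actual randomization procedure is not documented here.

```python
import random


def split_into_subsets(items, k=4, rng=random):
    """Shuffle the items and deal them round-robin into k random subsets."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def next_pair(scored_p, scored_d, k=4, rng=random):
    """Return a random (P, D) subset pair the rater has not scored yet.

    scored_p and scored_d are sets of subset indices (0..k-1) the rater
    has already completed. Returns None when no unscored pair remains.
    """
    p_left = [i for i in range(k) if i not in scored_p]
    d_left = [i for i in range(k) if i not in scored_d]
    if not p_left or not d_left:
        return None
    return rng.choice(p_left), rng.choice(d_left)


# Hypothetical run using this study's item counts, seeded for repeatability.
rng = random.Random(0)
d_subsets = split_into_subsets(range(312), rng=rng)  # stands in for D1-D4
p_subsets = split_into_subsets(range(278), rng=rng)  # stands in for P1-P4

# A rater who accepts every offer keeps scoring new pairs until none remain.
scored_p, scored_d = set(), set()
while (pair := next_pair(scored_p, scored_d, rng=rng)) is not None:
    p_idx, d_idx = pair
    scored_p.add(p_idx)
    scored_d.add(d_idx)
```

Because both statement sets are split into the same number of subsets, a rater who accepts every offer runs out of unscored P and D subsets at the same time, after four rounds, and has then scored every item. The sketch models only the rounds, not the next-day timing of re-contact.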
Sample: The subjects for the Thurstone scaling process were 291 use of force instructors from more than 100 different agencies across the United States. The research team used a snowball recruiting process, in which experts from the concept mapping group recommended the survey to colleagues, police agencies and departments, and fraternal orders around the country. Requests for use of force instructors to participate in the survey were also sent to members of the Force Science Institute, the National Tactical Officers Association, the International Law Enforcement Training Association, and other organizations via their respective websites. Interested instructors were asked to email the research team to request participation. After their employment was verified and they were enrolled as survey participants, they were also encouraged to ask other use of force instructors to participate in the survey.
Time Method: Cross-sectional
Mode of Data Collection: web-based survey
Web-based surveys administered to use of force instructors.
Description of Variables: The dataset contains a total of 685 variables, including 312 difficulty statement items, 278 performance statement items, and 94 variables that measure the demographic characteristics of the scorers. Demographic variables include sex; age; race; whether the scorer has ever been an armed, sworn law enforcement officer; current status as a law enforcement officer (if any); year that academy training was completed; rank; types of agency worked for; periods of time worked at each agency; the approximate number of sworn officers in each agency worked for; current primary duty assignment; patrol type in current primary duty assignment (if applicable); primary patrol area type (e.g., urban, suburban, rural, etc.); and states/jurisdictions worked for during his/her career.
Response Rates: Not available.
Presence of Common Scales: The Deadly Force Judgment and Decision Making Metrics (DFJDM) developed in this study consist of two Likert-type scales: one that measures the most important dimensions of performance in deadly force situations (the P-scale) and another that measures the relative difficulty of different deadly force situations (the D-scale). The dataset contains 278 performance statement items and 312 difficulty statement items. P-scale statement scores range from -6 (extremely negative impact) to +6 (extremely positive impact); a score of 0 equals no impact on performance. D-scale statement scores range from 0 (no impact) to 6 (highest impact).
Extent of Processing: ICPSR performed the following processing steps for this data collection:
- Standardized missing values.
- Checked for undocumented or out-of-range codes.
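A user who wants to repeat the out-of-range check listed above on the statement items could apply the score ranges stated under Presence of Common Scales (P-scale items from -6 to +6, D-scale items from 0 to 6). The following Python sketch is one way to do that; the function name is hypothetical, and the missing-value code used in the example (-9) is a placeholder assumption, not the dataset's actual code. Consult the codebook for the standardized missing values.

```python
# Valid score ranges from the study description:
# P-scale items -6..+6, D-scale items 0..6.
VALID_RANGE = {"P": (-6, 6), "D": (0, 6)}


def out_of_range(scale, scores, missing=frozenset()):
    """Return (position, value) pairs that fall outside the scale's range.

    None and any code in `missing` are skipped as missing data; the
    actual missing-value codes are an assumption (see the codebook).
    """
    lo, hi = VALID_RANGE[scale]
    return [
        (i, v)
        for i, v in enumerate(scores)
        if v is not None and v not in missing and not lo <= v <= hi
    ]


# Hypothetical responses; -9 stands in for a missing-value code.
print(out_of_range("D", [0, 3, 6, -1, 7, -9], missing={-9}))  # [(3, -1), (4, 7)]
print(out_of_range("P", [-6, 0, 6, -7, None]))                 # [(3, -7)]
```

Passing the missing-value codes explicitly keeps them from being reported as errors, which matters for the D-scale, where a negative code would otherwise fall below the valid minimum of 0.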
Original ICPSR Release: 2014-06-18