The purpose of the study was to develop two valid and reliable interval-level measurement scales for evaluating police officer performance during real or simulated deadly force situations. The P-scale is intended to measure or predict the quality of an officer's outcome-relevant behavior (i.e., performance) in a deadly force situation. The D-scale is intended to measure attributes of a deadly force situation that increase the probability of an unforeseen or undesirable outcome (i.e., difficulty) and thus make it harder for an officer to steer the situation toward an optimal outcome.
The study's research team first conducted an intensive two-day concept mapping focus group with a diverse group of 17 leading experts on policing and deadly force. During the concept mapping process, experts were asked to identify only the critical variables that affect the difficulty (D) of a deadly force situation and the performance (P) of a police officer in that situation. The outcome of this process was a long list of statements about discrete, measurable variables that affect situation difficulty ("D-statements") and police officer performance ("P-statements").
The research team then developed a Likert scale for each statement to specify the range of values that statement could take. To determine how much each value of a statement contributed to the difficulty of a deadly encounter, or to a police officer's performance in that encounter, the team next had 291 use of force instructors from more than 100 agencies across the United States score the D-statements and P-statements online using a Thurstone equal-appearing-interval scaling process. "Survey Monkey," an online survey platform, was used to administer the scoring efficiently and to collect demographic information from each instructor.
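In the classic Thurstone equal-appearing-interval method, each statement is summarized by two statistics computed over the judges' ratings: the median, taken as the statement's scale value, and the interquartile range (Q), taken as an index of ambiguity. The description above does not state exactly which statistics this study computed, so the Python sketch below simply assumes the classic median/IQR approach; the function names and the statement IDs in the example are hypothetical.

```python
from statistics import median, quantiles


def thurstone_item_stats(ratings):
    """Summarize one statement's judge ratings, Thurstone style.

    Returns (scale_value, ambiguity):
      scale_value: median rating across judges
      ambiguity:   interquartile range Q3 - Q1; a wide spread means the
                   judges disagreed about where the statement belongs
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings per statement")
    # quantiles() sorts internally; n=4 returns the three quartile cut points.
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return median(ratings), q3 - q1


def scale_statements(ratings_by_item):
    """Apply thurstone_item_stats to every statement ID in a dict."""
    return {item: thurstone_item_stats(r) for item, r in ratings_by_item.items()}


if __name__ == "__main__":
    # Hypothetical statement IDs and ratings, for illustration only.
    example = {
        "P-example": [3, 4, 5, 5, 6],  # P-scale ratings lie in -6..+6
        "D-example": [0, 1, 1, 2, 6],  # D-scale ratings lie in 0..6
    }
    for item, (value, q) in scale_statements(example).items():
        print(item, value, q)
    # P-example 5 1.0
    # D-example 1 1.0
```

Statements with a large Q are the ones a Thurstone analysis would typically flag as ambiguous and consider dropping or rewording.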
Because the total number of statements was large, the research team divided the D-statement set into four random subsets of items to be scored (D1-D4) and the P-statement set into four random subsets (P1-P4). Each use of force instructor then received a random P+D subset pair to score. Once a rater had completed a subset pair, he/she was offered the opportunity to score another pair at a later date. Raters who agreed were re-contacted the next day and offered a random pair that did not duplicate any subset they had already scored. Because there were four P-subsets and four D-subsets, a rater who completed four such pairs would have scored every item; thus, each rater had the opportunity to score all of the items.
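The subset-and-pair assignment described above can be sketched as follows. This is a minimal illustration, not the research team's actual procedure: it assumes the items were shuffled and dealt round-robin into four subsets, and that "did not duplicate" means neither the P nor the D subset of a new pair had been scored before. The function names are hypothetical; only the subset labels come from the description.

```python
import random

# Subset labels from the study description; their contents here are illustrative.
P_SUBSETS = ("P1", "P2", "P3", "P4")
D_SUBSETS = ("D1", "D2", "D3", "D4")


def split_into_subsets(items, k, rng):
    """Shuffle `items` and deal them round-robin into k disjoint subsets."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def next_pair(already_scored, rng):
    """Return a random (P, D) subset pair the rater has not scored yet,
    or None once every P and D subset has been scored."""
    p_left = [s for s in P_SUBSETS if s not in already_scored]
    d_left = [s for s in D_SUBSETS if s not in already_scored]
    if not p_left or not d_left:
        return None
    return rng.choice(p_left), rng.choice(d_left)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the simulation is repeatable
    p_sets = split_into_subsets(range(278), 4, rng)  # 278 P-statement items
    d_sets = split_into_subsets(range(312), 4, rng)  # 312 D-statement items
    print([len(s) for s in p_sets], [len(s) for s in d_sets])
    # [70, 70, 69, 69] [78, 78, 78, 78]

    # Simulate one rater who agrees to every follow-up session.
    scored = set()
    while (pair := next_pair(scored, rng)) is not None:
        scored.update(pair)
        print("scored", pair)
```

Because each session consumes one unused P-subset and one unused D-subset, a rater who accepts every invitation finishes in exactly four sessions with all eight subsets scored.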
The subjects for the Thurstone scaling process were 291 use of force instructors from more than 100 different agencies across the United States. The research team used a snowball recruiting process, whereby experts from the concept mapping group recommended the survey to colleagues, police agencies and departments, and fraternal orders around the country. Requests for use of force instructors to participate in the survey were also sent to members of the Force Science Institute, the National Tactical Officers Association, the International Law Enforcement Training Association, and other organizations via their respective websites. Interested instructors were asked to email the research team to request participation. After their employment was verified and they were enrolled as survey participants, they were also encouraged to ask other use of force instructors to participate in the survey.
Use of force instructors in the United States.
Web-based surveys administered to use of force instructors.
The dataset contains a total of 685 variables, comprising 312 difficulty statement items, 278 performance statement items, and 94 variables that measure the demographic characteristics of the scorers. Demographic variables include sex; age; race; whether the scorer has ever been an armed, sworn law enforcement officer; current status as a law enforcement officer (if any); year that academy training was completed; rank; types of agencies worked for; periods of time worked at each agency; the approximate number of sworn officers in each agency worked for; current primary duty assignment; patrol type in current primary duty assignment (if applicable); primary patrol area type (e.g., urban, suburban, rural); and states/jurisdictions worked for during his/her career.
The Deadly Force Judgment and Decision Making Metrics (DFJDM) developed in this study comprise two Likert-type scales, one that measures the most important dimensions of performance in deadly force situations (the P-scale) and another that measures the relative difficulty of different deadly force situations (the D-scale). The dataset contains 278 performance statement items and 312 difficulty statement items. P-scale statement scores range from -6 (extremely negative impact) to +6 (extremely positive impact); a score of 0 indicates no impact on performance. D-scale statement scores range from 0 (no impact) to 6 (highest impact).
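The documented score ranges can be enforced when loading or re-entering statement scores. The sketch below assumes that scores are whole numbers on each Likert-type scale (the description gives only the endpoints and the meaning of 0); the dictionary and helper names are hypothetical.

```python
# Documented score ranges for the DFJDM statement items.
SCORE_RANGES = {
    "P": (-6, 6),  # -6 extremely negative, 0 no impact, +6 extremely positive
    "D": (0, 6),   # 0 no impact, 6 highest impact
}


def validate_score(scale, score):
    """Return `score` unchanged if it is a whole number inside the documented
    range for `scale` ("P" or "D"); otherwise raise ValueError.
    Assumes integer Likert points; an unknown scale key raises KeyError."""
    low, high = SCORE_RANGES[scale]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int):
        raise ValueError(f"{scale}-score must be an integer, got {score!r}")
    if not low <= score <= high:
        raise ValueError(f"{scale}-score {score} outside {low}..{high}")
    return score


def p_direction(score):
    """Describe the direction of a P-scale score's impact on performance."""
    validate_score("P", score)
    if score > 0:
        return "positive impact"
    if score < 0:
        return "negative impact"
    return "no impact"


if __name__ == "__main__":
    print(p_direction(-4))  # negative impact
    print(validate_score("D", 6))  # 6
```

Keeping the sign information separate for the P-scale reflects the asymmetry in the description: P-scores can help or hurt performance, while D-scores only add difficulty.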