Indicators of Sex Trafficking in Online Escort Ads, 7 U.S. states, 2013-2020 (ICPSR 38328)
Version Date: Jan 30, 2024
Principal Investigator(s):
Kristina Lugo-Graulich, Justice Research and Statistics Association
https://doi.org/10.3886/ICPSR38328.v1
Version V1
Summary
With the aim of improving precision in sex trafficking victim identification and investigations, this exploratory, mixed-methods study had two objectives: 1) To investigate whether there are indicators that differentiate online escort ads related to sex trafficking from ads for non-trafficked sex work, and 2) if so, to determine which indicators are most likely to predict whether the ad represents a case of sex trafficking.
Research activities took place over a three-year period (2018-2021). First, the research team developed the set of indicators to test based on previous literature and insight from three sets of focus groups: law enforcement and victim advocates, trafficking survivors, and non-trafficked sex workers. Focus groups also provided insight into indicators that may be misinterpreted and into how advertising practices have changed, especially since the passage of FOSTA (Allow States and Victims to Fight Online Sex Trafficking Act) and SESTA (Stop Enabling Sex Traffickers Act), and the shutdown of Backpage.com by the FBI.
Second, the researchers collected investigative file information on closed cases involving escort ads from several locations in the United States, using phone numbers identified in each case to pull associated ads missing from case files from one of three web scraper databases (the MEMEX archive and the active TellFinder and HTI Labs' Law Enforcement Assistant for Dismantling Sex Trafficking Networks (LEADS) web scrapers). The final dataset includes 318 closed commercial sex and massage cases investigated in seven states, with 1,586 unique associated ads covering 35 U.S. states and Ontario, Canada. Researchers also pulled additional ads not present in the case files from the scraper archives to conduct three case studies of trafficking movement patterns, network management, and advertising structures to provide context for the hypothesis test results.
Finally, after analysis of the ad- and case-level data, a second round of focus groups was conducted to obtain each group's responses to the results, advice on interpretation, and input on recommendations.
The case-level (DS1) and ad-level (DS2) quantitative data are currently available in this collection. The qualitative data will be released at a future date. Please refer to the ICPSR README and the study documentation for more information about the files.
Citation
Funding
Subject Terms
Geographic Coverage
Smallest Geographic Unit
State
Restrictions
Access to these data is restricted. Users interested in obtaining these data must complete a Restricted Data Use Agreement, specify the reason for the request, and obtain IRB approval or notice of exemption for their research.
Distributor(s)
Time Period(s)
Date of Collection
Study Purpose
The purpose of the study was twofold: 1) To examine whether there are indicators that can differentiate escort ads related to sex trafficking from ads for consensual, non-trafficking sex work, and 2) if so, to determine which indicators are most likely to predict whether the ad represents a trafficking case.
Study Design
Data collection took place across three phases between 2018 and 2021. Prior to the main data collection, the research team conducted focus groups with law enforcement and victim advocates, sex trafficking survivors, and consensual (non-trafficked) sex workers (2018). These initial interview data were used to refine subsequent data collection. Next, on-site fieldwork was conducted at four locations in three U.S. states (San Diego, California; San Francisco, California; Georgia; Texas), including site interviews with investigators and manual reviews of closed case files. Case files procured from these sites were supplemented with data collected remotely from four other states, and ads associated with the reviewed cases were extracted via web scraping databases. Following analysis of case- and ad-level data, a second round of focus groups was conducted with the original stakeholders for guidance on interpretation and further input (2021).
Selection of fieldwork sites. Fieldwork sites were the Georgia Bureau of Investigation (GBI), San Diego County District Attorney's Office, San Francisco District Attorney's Office, Texas Department of Public Safety, and Human Trafficking Initiative (HTI) Labs in Omaha, Nebraska. On-site fieldwork was conducted for the first four sites; closures due to COVID-19 resulted in adding a virtual site (Nebraska). Sites were selected based on their work with trafficking cases, heterogeneity in geographic region and types of trafficking, and willingness to share data.
Focus group interviews. The lead contacts at each fieldwork site, plus two victim advocates, participated in the law enforcement and victim advocates' focus group. A trafficking survivor-consultant recruited the panel of survivors that served in the survivors' focus group from among her contacts. The non-trafficked sex workers' focus group was recruited with the help of a sex workers' rights organizer and activist.
Site interviews and case review. Investigators at each fieldwork site were interviewed to ascertain their experiences with using escort ads in sex trafficking investigations. All investigators available were interviewed (n=27). Data on 318 cases were collected in total: 114 from fieldwork locations (California, Georgia, Nebraska, and Texas) and 204 from DeAngelo's "ground truth" set (New York, Oregon, and New Mexico).
Ad-level data. Data for ads associated with cases dated 2018 and newer were pulled from the web scrapers TellFinder and HTI Labs' Law Enforcement Assistant for Dismantling Sex Trafficking Networks (LEADS). Additional ad data were sampled from DeAngelo and colleagues' "ground truth set", which consisted of phone numbers associated with over 41,000 confirmed sex trafficking victims and a small sample of "negative" case ads over a period of two years. These phone numbers were used to pull the associated ads from the MEMEX web scraper archive (pre-2018). All states that had at least one ad associated with a "negative/unknown" case were represented in the shared dataset, adding Oregon, New York, and New Mexico to the analysis.
Case studies. Three movement-based case studies were created to highlight small, medium, and large travel patterns using detailed case descriptions available in the case-level data. All ads collected from fieldwork for each case were included in the analyses.
Sample
Case- and ad-level data: Closed case files from each agency were selected for coding if they met the following criteria:
- Case year (year the investigation began) was 2013 or later. For cases between 2013 and 2015, which pre-dated the MEMEX project, at least one ad must have been present in the case file itself for coding (electronically saved or a printout).
- Case must fall into the categories of human trafficking or sex trafficking-adjacent activities (prostitution, promoting prostitution).
- Case must involve online escort ads.
- Ads associated with the case were not ads for recruitment of victims.
- Some level of case detail must be available, such as a police report(s), prosecutorial files, indictments, interview transcripts, or similar.
All available ads were coded for every case, with one exception. For outlier cases (those with 31 or more associated ads, some with hundreds), a power analysis was conducted, and ads were then randomly sampled from each outlier case in line with the N suggested by the power analysis. Analyses were clustered on case number to account for interrelatedness among ads from the same case. From DeAngelo and colleagues' "ground truth" dataset, the research team received a stratified random sample of ads from three additional states that had at least one "negative" case ad. All "negative" ads were included, as well as a random draw of 100 additional phone numbers per state (see final project report for details). This allowed for an oversampling of massage ads, which were underrepresented in data received from fieldwork sites.
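The cap on outlier cases described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the research team's procedure: the row layout, the function name `sample_ads_per_case`, and the `target_n=30` value are hypothetical. The source gives the 31-ad outlier threshold but not the N suggested by the power analysis.

```python
import random
from collections import defaultdict


def sample_ads_per_case(ads, threshold=31, target_n=30, seed=0):
    """Keep every ad for ordinary cases; randomly sample ads from outlier cases.

    ads: list of dicts with keys "CASENO" and "AD_NO" (assumed layout).
    threshold: a case with this many ads or more is treated as an outlier.
    target_n: placeholder for the N from a power analysis (assumed value).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_case = defaultdict(list)
    for ad in ads:
        by_case[ad["CASENO"]].append(ad)

    kept = []
    for caseno in sorted(by_case):
        case_ads = by_case[caseno]
        if len(case_ads) >= threshold:
            # Sample without replacement, never asking for more than exist.
            case_ads = rng.sample(case_ads, min(target_n, len(case_ads)))
        kept.extend(case_ads)
    return kept


# Toy data: case 1 has 5 ads (kept in full), case 2 has 200 ads (sampled).
ads = [{"CASENO": 1, "AD_NO": i} for i in range(5)]
ads += [{"CASENO": 2, "AD_NO": 100 + i} for i in range(200)]
kept = sample_ads_per_case(ads)
counts = {c: sum(1 for ad in kept if ad["CASENO"] == c) for c in (1, 2)}
```

Because sampled ads from one case remain correlated, any model fit on the result would still need standard errors clustered on the case number, as the study describes.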
Time Method
Universe
- Closed sex trafficking cases investigated in study sites.
- Online escort ads associated with closed sex trafficking cases.
- Criminal investigators and other law enforcement, victim advocates, sex trafficking survivors, and consensual (non-trafficked) sex workers.
Unit(s) of Observation
Data Source
MEMEX Database (archived, Claremont Graduate School)
San Francisco District Attorney's Office (SFDA)
Law Enforcement Assistant for Dismantling Sex Trafficking Networks (LEADS) (HTI Labs)
Nebraska State Patrol (via HTI Labs)
Georgia Bureau of Investigation (GBI)
TellFinder (Uncharted, Inc.)
Ground Truth dataset compiled by Dr. Greg DeAngelo and colleagues (Cafarella et al., 2021)
San Diego County District Attorney's Office (SDCDA)
Nebraska Attorney General (via HTI Labs)
Texas Department of Public Safety (TXDPS)
Data Type(s)
Mode of Data Collection
Description of Variables
The quantitative case- and ad-level files, along with the qualitative case-level files, can be linked together using the CASENO variable. The quantitative and qualitative ad-level files can be linked with the AD_NO variable.
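As a rough illustration of the linkage just described, the sketch below attaches case-level fields to ad-level rows by matching on CASENO. Only CASENO and AD_NO come from the study description; the `STATE` field, the `case_` prefix, the toy values, and the function name `link_ads_to_cases` are assumptions made for the example.

```python
def link_ads_to_cases(cases, ads):
    """Attach case-level fields to each ad row by matching on CASENO.

    Behaves like a left join: every ad is kept, and ads whose case is
    missing from `cases` simply get no case-level fields.
    """
    case_by_no = {case["CASENO"]: case for case in cases}
    linked = []
    for ad in ads:
        case = case_by_no.get(ad["CASENO"], {})
        # Prefix case fields so they cannot overwrite ad fields of the same name.
        row = {f"case_{key}": value for key, value in case.items() if key != "CASENO"}
        row.update(ad)
        linked.append(row)
    return linked


# Toy rows standing in for the case-level (DS1) and ad-level (DS2) files.
cases = [{"CASENO": 1, "STATE": "GA"}, {"CASENO": 2, "STATE": "TX"}]
ads = [
    {"AD_NO": 10, "CASENO": 1},
    {"AD_NO": 11, "CASENO": 1},
    {"AD_NO": 12, "CASENO": 2},
]
linked = link_ads_to_cases(cases, ads)
```

The relationship is one case to many ads, so the linked table has one row per ad.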
Case-level quantitative data contains the associated de-identified case number, date of case, state, city/jurisdiction (if available), and various descriptive items about the case (Yes if present): number and ages of perpetrators, victims, and sex workers, relationship between victim and perpetrator, and general incident location.
Ad-level quantitative data contains the case number, an ad identification number, date ad was posted, state, city/county (if available), and various descriptive items about the ad (Yes if present): actual age vs. stated age of provider, language markers, client preference markers, identity verification markers, emojis used, and photos used.
Case-level qualitative data contains the case number, narrative description of the case, and indicator for whether it was a trafficking case.
Ad-level qualitative data contains the ad number, ad text, and trafficking indicator.
Case study data contains the city and state specified by the ad where services were available and the ad text.
Semi-structured investigator interviews covered challenges in investigating human trafficking cases, how ads are used in investigations, task force involvement, and supplementary questions to fill missing information gaps about reviewed case files.
Initial focus groups with investigators covered how ads are used in trafficking investigations, challenges to using ads, and tools that could help maximize the ability to use ads in investigations. Victim advocates were asked to provide impressions on how the processes described by investigators would impact victims' safety and suggestions for reducing harm. Follow-up focus groups walked through trafficking indicators determined in previous research phases and asked participants for input on validity, interpretation, and additional context. Participants were also asked about changes in their work after the shutdown of Backpage.
Focus groups with sex trafficking survivors and consensual (non-trafficked) sex workers walked through trafficking indicators and asked participants for input on validity and interpretation, with additional insight on how online ads were created and used for context. Participants also expressed any concerns about the research and resulting guidelines for law enforcement.
Response Rates
Not applicable.
Presence of Common Scales
None
Original Release Date
2024-01-30
Version History
2024-01-30 ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection:
- Checked for undocumented or out-of-range codes.
Weight
For the ad- and case-level quantitative data, cases were weighted by state to reflect the probability that a given case would be included in the data. The weights rest on an estimate of the total number of trafficking or trafficking-adjacent cases that came to the attention of authorities in each state, for each year in which case data are available. These totals were estimated from the number of cases reported to the National Human Trafficking Hotline, run by Polaris, for each state and year in the dataset. A raking procedure in Stata was used to calculate the probability weights.
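For readers unfamiliar with raking, the sketch below shows the general idea (iterative proportional fitting) in Python. It is not the Stata procedure used for this collection: the function name `rake`, the `STATE` and `YEAR` field names, and the target totals are all hypothetical, and the real weights were derived from hotline report counts that are not reproduced here.

```python
def rake(rows, targets, max_iter=100, tol=1e-9):
    """Iterative proportional fitting over one or more margins.

    rows: list of dicts holding each case's margin values (e.g. STATE, YEAR).
    targets: {variable: {level: target_total}}; totals must agree across variables.
    Returns one weight per row.
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_change = 0.0
        for var, level_targets in targets.items():
            # Current weighted total for each level of this margin.
            totals = {}
            for row, w in zip(rows, weights):
                totals[row[var]] = totals.get(row[var], 0.0) + w
            # Scale each row so this margin matches its target exactly.
            for i, row in enumerate(rows):
                factor = level_targets[row[var]] / totals[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # all margins already match; stop early
            break
    return weights


# Toy sample of five cases with made-up population totals per state and year.
rows = [
    {"STATE": "GA", "YEAR": 2018},
    {"STATE": "GA", "YEAR": 2019},
    {"STATE": "TX", "YEAR": 2018},
    {"STATE": "TX", "YEAR": 2019},
    {"STATE": "TX", "YEAR": 2019},
]
targets = {
    "STATE": {"GA": 100.0, "TX": 200.0},
    "YEAR": {2018: 120.0, 2019: 180.0},
}
weights = rake(rows, targets)
```

Each pass rescales the weights to match one margin at a time, and repeating the passes brings all margins into agreement at once, which is why the target totals for every variable must sum to the same grand total.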
Notes
The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.
One or more files in this data collection have special restrictions. Restricted data files are not available for direct download from the website; click on the Restricted Data button to learn more.