Population Genetic Issues for Forensic DNA Profiles, 2020-2023 (ICPSR 39194)

Version Date: Jan 30, 2025

Principal Investigator(s):
Bruce S. Weir, University of Washington

https://doi.org/10.3886/ICPSR39194.v1

Version V1

This study was a survey of published Y-chromosome haplotype frequencies in order to compare the performance of alternative approaches for calculating match probabilities. Researchers examined 31,011 PowerPlex Y23 profiles at the population, metapopulation and world levels.

Weir, Bruce S. Population Genetic Issues for Forensic DNA Profiles, 2020-2023. Inter-university Consortium for Political and Social Research [distributor], 2025-01-30. https://doi.org/10.3886/ICPSR39194.v1

United States Department of Justice. Office of Justice Programs. National Institute of Justice (2020-DQ-BX-0022)
Inter-university Consortium for Political and Social Research

2020 -- 2023
  1. The study materials include a file named "Readme.txt". This file was provided by the P.I. with information regarding the data file "YSTRData.xlsx". This file states "The analyses conducted in this project were for the 22 loci that form the PowerPlex Y23 set of genetic markers." When asked whether this should state "...23 loci...", the P.I. responded "This is a problem with the 'Powerplex Y23' name! There really are only 22 separate loci once DYSII.I is ignored."

The purpose of the study was to produce statistical procedures for providing quantitative strength to DNA evidence. The work extended population genetic theory for accommodating population structure in calculating match probabilities for autosomal and lineage markers, and combinations of these markers. The study addressed the population genetic issues arising from the forensic use of next-generation sequencing.

Researchers compiled a total of 97,592 profiles from eight metapopulation groups (South Asian, East Asian, American, Native American, African, Western Eurasian, Middle Eastern, and Oceanian). Data were obtained from 235 publications, including cases where two references gave overlapping data. Although researchers retained both references, only one copy of each duplicated profile was added to the database.

A data verification process was undertaken to address the presence of missing, extreme (defined as alleles outside the known range), and unconventional data points (defined as alleles containing nonnumeric values). Communication with the publication authors allowed researchers to resolve some, but not all, of the data considered missing, extreme, or unconventional. In some cases, allele designations were translated due to a change in allele naming conventions between the date of original publication and the present.
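The three categories above can be sketched as a small classifier. This is only an illustration, not the study's actual procedure: the function name `classify_allele` and the example range of 7 to 18 repeats are hypothetical, and the real known range differs by locus.

```python
import re
from typing import Optional, Tuple

# Matches plain or intermediate allele designations such as "13" or "11.2".
_NUMERIC = re.compile(r"\d+(\.\d+)?")


def classify_allele(value: Optional[str], known_range: Tuple[float, float]) -> str:
    """Label one allele call as 'missing', 'unconventional', 'extreme', or 'ok'.

    known_range is a hypothetical (low, high) pair for the locus in question.
    """
    if value is None or value.strip() == "":
        return "missing"
    text = value.strip()
    if _NUMERIC.fullmatch(text) is None:
        # Contains nonnumeric content, e.g. "13,14" or "null".
        return "unconventional"
    low, high = known_range
    allele = float(text)
    if allele < low or allele > high:
        # Numeric, but outside the known range for this locus.
        return "extreme"
    return "ok"


# Illustrative calls with an assumed range of 7 to 18 repeats.
for call in ["13", "11.2", "25", "13,14", ""]:
    print(repr(call), classify_allele(call, (7, 18)))
```

Calls flagged by a check of this kind would still need manual follow-up, such as the author correspondence described above.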

Some publications had overlapping profiles for which data in one paper were also in another paper. Researchers decided to include only the larger published data set to avoid including multiple instances of the same profile (from the same person) in the database.

Mainland Chinese Y-Short Tandem Repeat (Y-STR) data were removed due to concerns about the methodologies used to gather such data (including non-consensual gathering or publication of genetic data). Two Han samples remain, from Singapore and Mongolia.

The subset of data that had full 22-locus profiles for the PowerPlex Y23 system was used for the analyses. Researchers performed analyses at three levels: population, metapopulation, and the world population. Any population that had fewer than 10 profiles after consolidation of samples described in the source publications was not used further. This left 31,011 profiles.
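The selection described above might look like the following sketch. The record layout (a dict with `population` and `alleles` keys) and the function name `select_profiles` are hypothetical and do not reflect the structure of "YSTRData.xlsx". The sketch also assumes that the 10-profile threshold is applied after restricting to complete profiles.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

# Hypothetical record layout:
# {"population": "PopA", "alleles": {"LocusA": "13", "LocusB": None}}
Profile = Dict[str, object]


def select_profiles(
    profiles: List[Profile],
    loci: Sequence[str],
    min_population_size: int = 10,
) -> List[Profile]:
    """Keep complete profiles from populations that meet the size threshold."""
    # Step 1: keep only profiles with a recorded allele at every locus.
    complete = [
        p for p in profiles
        if all(p["alleles"].get(locus) is not None for locus in loci)
    ]
    # Step 2: count the remaining profiles per population.
    counts = Counter(p["population"] for p in complete)
    # Step 3: drop populations with fewer than min_population_size profiles.
    return [p for p in complete if counts[p["population"]] >= min_population_size]


# Illustrative data with two made-up loci and two made-up populations.
loci = ["LocusA", "LocusB"]
data: List[Profile] = (
    [{"population": "PopA", "alleles": {"LocusA": "13", "LocusB": "29"}}] * 10
    + [{"population": "PopB", "alleles": {"LocusA": "14", "LocusB": "30"}}] * 9
    + [{"population": "PopB", "alleles": {"LocusA": "14", "LocusB": None}}]
)
kept = select_profiles(data, loci)
print(len(kept), {p["population"] for p in kept})
```

Applying the threshold before the completeness filter would retain a different set of populations, so the order of the two steps is part of the assumption.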

Y-Short Tandem Repeat (Y-STR) profiles were obtained from publicly accessible data in a wide-ranging selection of published papers. Notably, the data compilation used references given by the Y-Chromosome STR Haplotype Reference Database (YHRD), as well as an exhaustive survey of supplementary reference lists and bibliographic databases of scientific publications (such as ScienceDirect).

Cross-sectional

Adult men throughout the world.

Chromosomes

2025-01-30

Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.