Population Genetic Issues for Forensic DNA Profiles, 2020-2023 (ICPSR 39194)
Version Date: Jan 30, 2025
Principal Investigator(s):
Bruce S. Weir, University of Washington
https://doi.org/10.3886/ICPSR39194.v1
Version V1
Summary
This study surveyed published Y-chromosome haplotype frequencies to compare the performance of alternative approaches for calculating match probabilities. Researchers examined 31,011 PowerPlex Y23 profiles at the population, metapopulation, and world levels.
Data Collection Notes
The study materials include a file named "Readme.txt", provided by the P.I., with information about the data file "YSTRData.xlsx". The Readme states: "The analyses conducted in this project were for the 22 loci that form the PowerPlex Y23 set of genetic markers." When asked whether this should read "...23 loci...", the P.I. responded: "This is a problem with the 'Powerplex Y23' name! There really are only 22 separate loci once DYSII.I is ignored."
Study Purpose
The purpose of the study was to develop statistical procedures for quantifying the strength of DNA evidence. The work extended population genetic theory to accommodate population structure when calculating match probabilities for autosomal markers, lineage markers, and combinations of the two. The study also addressed population genetic issues arising from the forensic use of next-generation sequencing.
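For a lineage marker such as a Y-STR haplotype, one standard way to build population structure into a match probability is to treat the whole haplotype as a single allele and use θ + (1 − θ)p, where p is the haplotype's frequency and θ is a coancestry coefficient for the subpopulation. The Python sketch below assumes that form; the function name and inputs are illustrative, and it is not necessarily the procedure developed in this study.

```python
def lineage_match_probability(p: float, theta: float) -> float:
    """Structure-adjusted match probability for a lineage-marker haplotype.

    Assumes the single-allele form theta + (1 - theta) * p.
    p     -- frequency of the haplotype in the reference population, in [0, 1]
    theta -- coancestry coefficient for the subpopulation, in [0, 1]
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    # theta = 0 recovers the unadjusted value p; larger theta pushes the
    # result toward 1, reflecting shared ancestry within the subpopulation.
    return theta + (1.0 - theta) * p
```

Because the correction adds roughly θ to the estimate, it matters most for rare haplotypes: when p is much smaller than θ, the adjusted value is close to θ rather than p.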
Study Design
Researchers compiled a total of 97,592 profiles from eight metapopulation groups (South Asian, East Asian, American, Native American, African, Western Eurasian, Middle Eastern, and Oceanian). Data were obtained from 235 publications, including cases in which two references reported overlapping data. Although researchers retained both references in such cases, only one copy of each duplicated profile was added to the database.
A data verification process was undertaken to address missing, extreme (alleles outside the known range), and unconventional (alleles with nonnumeric values) data points. Communication with the publication authors allowed researchers to resolve some, but not all, of the missing, extreme, or unconventional data. Where allele naming conventions had changed between the original publication and the present, allele designations were translated to the current convention.
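The three verification categories can be expressed as a simple per-allele check. In this Python sketch the function name is hypothetical, the bounds in KNOWN_RANGES are placeholders rather than published allele ranges, and treating only empty entries as missing is an assumption about how the source data were encoded.

```python
import re
from typing import Dict, Optional, Tuple

# Placeholder bounds in repeat units; NOT published allele ranges.
KNOWN_RANGES: Dict[str, Tuple[float, float]] = {
    "DYS19": (10.0, 19.0),
    "DYS391": (6.0, 14.0),
}

# Integer or decimal designations such as "14" or "13.2" count as numeric.
NUMERIC_ALLELE = re.compile(r"\d+(\.\d+)?")


def classify_allele(locus: str, value: Optional[str]) -> str:
    """Label one allele call as 'missing', 'unconventional', 'extreme', or 'ok'."""
    if value is None or not value.strip():
        return "missing"  # assumption: missing calls are stored as None or blank
    value = value.strip()
    if NUMERIC_ALLELE.fullmatch(value) is None:
        return "unconventional"  # nonnumeric designation, e.g. "10a"
    low, high = KNOWN_RANGES[locus]
    # Microvariants such as "13.2" are compared by numeric value, a simplification.
    if not low <= float(value) <= high:
        return "extreme"
    return "ok"
```

For example, `classify_allele("DYS391", "10a")` is flagged as unconventional, while `classify_allele("DYS19", "25")` is flagged as extreme under the placeholder bounds.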
When profiles in one publication also appeared in another, researchers included only the larger of the two published data sets, to avoid adding multiple instances of the same profile (from the same person) to the database.
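One way to apply the larger-data-set rule, including chains of overlapping publications, is to visit publications from largest to smallest and skip any that overlap one already kept. This greedy Python sketch is an assumption about how the rule might be implemented, not the study's documented procedure; the function name, publication IDs, and sizes are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def select_datasets(
    sizes: Dict[str, int],
    overlaps: List[Tuple[str, str]],
) -> Set[str]:
    """Choose publications to load, keeping the larger of any overlapping pair.

    sizes    -- number of profiles reported by each publication
    overlaps -- pairs of publications known to report some of the same profiles
    """
    neighbours: Dict[str, Set[str]] = {pub: set() for pub in sizes}
    for a, b in overlaps:
        neighbours[a].add(b)
        neighbours[b].add(a)

    kept: Set[str] = set()
    # Largest first (ties broken by ID for determinism), so a larger data set
    # is always considered before any smaller one that overlaps it.
    for pub in sorted(sizes, key=lambda p: (-sizes[p], p)):
        if neighbours[pub].isdisjoint(kept):
            kept.add(pub)
    return kept
```

With sizes {"A": 100, "B": 50, "C": 30} and overlapping pairs (A, B) and (B, C), B is dropped because it overlaps A, while C is kept because its only overlapping partner, B, was never loaded.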
Mainland Chinese Y-chromosome short tandem repeat (Y-STR) data were removed because of concerns about the methods used to gather them (including non-consensual collection or publication of genetic data). Two Han samples remain, from Singapore and Mongolia.
The analyses used the subset of data with complete 22-locus profiles for the PowerPlex Y23 system. Researchers performed analyses at three levels: population, metapopulation, and world. Any population with fewer than 10 profiles after consolidation of the samples described in the source publications was excluded from further analysis. This left 31,011 profiles.
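The filtering and grouping steps can be sketched in Python as follows. The profile layout and function names are hypothetical, and applying the minimum-population-size rule after restricting to complete profiles is an assumption, since the description does not fix the order of the two steps.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (population, metapopulation, haplotype), where the
# haplotype is a tuple of allele calls and None marks an untyped locus.
Profile = Tuple[str, str, Tuple[Optional[str], ...]]

N_LOCI = 22        # PowerPlex Y23 loci, counted as in the study's Readme
MIN_POP_SIZE = 10  # populations with fewer profiles are excluded


def filter_profiles(profiles: List[Profile]) -> List[Profile]:
    """Keep complete 22-locus profiles from populations with at least MIN_POP_SIZE of them."""
    complete = [
        p for p in profiles
        if len(p[2]) == N_LOCI and all(allele is not None for allele in p[2])
    ]
    # Assumption: population sizes are counted after dropping incomplete profiles.
    pop_sizes = Counter(pop for pop, _, _ in complete)
    return [p for p in complete if pop_sizes[p[0]] >= MIN_POP_SIZE]


def haplotype_counts(profiles: List[Profile]) -> Dict[str, Dict[str, Counter]]:
    """Tally haplotype occurrences at the population, metapopulation, and world levels."""
    levels: Dict[str, Dict[str, Counter]] = {
        "population": {},
        "metapopulation": {},
        "world": {"world": Counter()},
    }
    for pop, meta, hap in profiles:
        levels["population"].setdefault(pop, Counter())[hap] += 1
        levels["metapopulation"].setdefault(meta, Counter())[hap] += 1
        levels["world"]["world"][hap] += 1
    return levels
```

Keeping separate tallies per level lets the same haplotype's frequency be read off at whichever level a given match-probability approach calls for.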
Sample
Y-STR profiles were obtained from publicly accessible data in a wide range of reputable publications. The compilation drew on the references listed by the Y-Chromosome STR Haplotype Reference Database (YHRD), as well as an exhaustive survey of supplementary reference lists and bibliographic databases of scientific publications (such as ScienceDirect).
Universe
Adult men throughout the world.
The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.