Probabilistic Genotyping of Microhaplotype Data, 1993-2003 (ICPSR 38888)

Version Date: Nov 16, 2023

Principal Investigator(s):
Daniele Podini, George Washington University. Department of Forensic Sciences

https://doi.org/10.3886/ICPSR38888.v1

Version V1

Microhaplotypes (MHs) are an emerging type of forensic DNA marker, defined as sets of single nucleotide polymorphisms (SNPs) within a short distance of each other that display multiple allelic combinations. Although less polymorphic than short tandem repeat polymorphisms (STRs), they offer several advantages: all alleles within a locus are the same size, stutter artifacts are absent, and mutation rates are lower than those of STRs. Several MH multiplex panels have been reported, including the 74-locus panel developed in the research team's laboratory. Casework implementation of such large panels is feasible only if paired with probabilistic genotyping (PG), because manual deconvolution of complex mixtures would be excessively time-consuming and incompatible with conventional forensic DNA laboratory operations.

In this study, the DNA-View Mixture Solution and EuroForMix PG software packages were adapted to process MH data from 74 loci analyzed on the Ion S5 massively parallel sequencing (MPS) platform. Relative fluorescence unit (RFU) values were replaced by allele-sequence coverage, and the adapted software was tested on a set of DNA mixtures. The goals of this project were to (1) adapt and thoroughly test the two PG software platforms for the use of multiplex MH data for DNA mixture interpretation, and (2) generate a data repository of mixtures and references that can be used by developers and users to adapt other PG software to take in MH data.

The data are organized into spreadsheet-type files. They consist of log10 likelihood ratios (Log10LR) for the hypotheses tested on the mixture samples. Log10LRs are averaged across samples and grouped by the four population allele-frequency sets used: African American, European, Asian American, and Southwest Hispanic.
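The averaging described above can be sketched as follows. This is a minimal illustration, not the investigators' procedure: the record layout (sample, population, Log10LR) and the numeric values are hypothetical placeholders, and the actual spreadsheet columns should be taken from the ICPSR README.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (sample_id, population, log10_lr).
# The values are placeholders for illustration, not results from the collection.
records = [
    ("mix01", "African American", 10.0),
    ("mix02", "African American", 12.0),
    ("mix01", "European", 9.0),
    ("mix02", "European", 11.0),
]


def mean_log10_lr_by_population(rows):
    """Group Log10LR values by population and return the arithmetic mean of each group."""
    grouped = defaultdict(list)
    for _sample_id, population, log10_lr in rows:
        grouped[population].append(log10_lr)
    return {population: mean(values) for population, values in grouped.items()}


averages = mean_log10_lr_by_population(records)
```

Note that the arithmetic mean of Log10LR values corresponds to the log10 of the geometric mean of the underlying likelihood ratios, so it should not be read as the log of the average LR.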

Podini, Daniele. Probabilistic Genotyping of Microhaplotype Data, 1993-2003. Inter-university Consortium for Political and Social Research [distributor], 2023-11-16. https://doi.org/10.3886/ICPSR38888.v1

United States Department of Justice. Office of Justice Programs. National Institute of Justice (2019-DU-BX-0023)

None

Inter-university Consortium for Political and Social Research

1993 -- 2003
2020 -- 2022
  1. These data are a Fast Track Release and are distributed as they were received from the data producer. The files have been zipped for release, but not checked or processed. Users should refer to the accompanying ICPSR README file for a brief description of the files available with this collection and consult with the investigator(s) if further information is needed.

  2. Geographic origin for the collected samples was not provided.

The main research question of this project examined whether microhaplotypes (MHs) are amenable and effective for mixture analysis and deconvolution using currently available probabilistic genotyping (PG) software originally developed for short tandem repeat polymorphism (STR) analysis. During this project, the research team proposed to:

  1. Adapt and thoroughly test PG software packages (DNA-View Mixture Solution and EuroForMix) for the use of multiplex MH data for DNA mixture interpretation.
  2. Generate a publicly available data repository containing MH profiles from over 30 DNA mixtures (and corresponding references) that can be used (1) by developers to adapt other PG software to take in MH data and (2) by practitioners for education and training.
  3. Develop guidelines for designing and conducting a sound validation study for casework implementation of MH-based PG.

Some of the implications for criminal justice policy and practice include:

  • Increased power of discrimination, which will generally strengthen the evidence.
  • Complement and enhance DNA mixture interpretation, which will lead to less controversy (and possibly faster trials).
  • Not having to model stutter and preferential amplification of shorter alleles will benefit probabilistic genotyping software developers, who would be able to produce more robust and rapid mixture deconvolution tools based on massively parallel sequencing (MPS) data from MHs compared with STRs.
  • The number of samples considered suitable for comparison in routine casework will increase, reducing inconclusive reports.
  • Past ('cold') cases in which it was possible to determine that a minor contributor was present but not suitable for comparison could be retested, potentially leading to exonerations and/or convictions. This would impact only cases with identified suspects, as there is currently no MH DNA database.
  • Enable ancestry prediction capabilities on complex mixtures (practically impossible with single nucleotide polymorphisms).
  • Enhanced relationship testing capabilities. This could expand the number of family reunification cases that are suitable for DNA testing (i.e., at greater distance on the family tree than what conventional STRs allow for sound statistics).

Several sets of DNA mixtures, totaling 49 individual mixtures, were prepared from unrelated individuals of different ancestries. The samples used were part of a collection obtained between 1993 and 2003. Three sets of mixtures were created with different numbers of contributors and at different contributor ratios to simulate different types of DNA mixtures potentially found at a crime scene.

EuroForMix and DNA-View Mixture Solution probabilistic genotyping software were adapted to process microhaplotype (MH) data from the 74 loci analyzed on the Ion S5 massively parallel sequencing (MPS) platform. Relative fluorescence unit (RFU) values were replaced by allele-sequence coverage. Default parameters were used for the analyses, with stutter modeling switched off. Mixtures with varying numbers of contributors and contributor ratios were generated and tested with both conventional short tandem repeat (STR) 24-plex assays and the MH 74-plex assay.
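The substitution of coverage for RFU can be illustrated with a short sketch. It assumes a generic long-format table in which each allele call carries a "height" field; the marker names, field names, and the `min_reads` threshold are all hypothetical and do not reproduce the input format or settings of either software package.

```python
from typing import Dict, List, NamedTuple


class AlleleCall(NamedTuple):
    marker: str  # locus name (hypothetical naming)
    allele: str  # microhaplotype allele written as its string of SNP bases
    height: int  # read coverage, placed where an RFU peak height would go


def coverage_to_calls(
    coverage: Dict[str, Dict[str, int]], min_reads: int = 0
) -> List[AlleleCall]:
    """Turn per-locus allele read counts into peak-height-style allele calls.

    min_reads is an illustrative analytical threshold (an assumption, not a
    value taken from the study); alleles below it are dropped.
    """
    calls = []
    for marker, alleles in coverage.items():
        for allele, reads in alleles.items():
            if reads >= min_reads:
                calls.append(AlleleCall(marker, allele, reads))
    return calls


# Hypothetical coverage for one locus of a two-person mixture.
example = {"mh_locus_A": {"ACG": 500, "ATG": 120, "GCG": 5}}
calls = coverage_to_calls(example, min_reads=10)
```

Because all alleles at an MH locus have the same length, no stutter positions need to be represented in such a table, which is consistent with stutter modeling being switched off.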

Both STR and MH data generated from these mixtures were processed with probabilistic genotyping software, simulating multiple scenarios for each mixture by changing the number of known and unknown contributors in the hypotheses tested. Non-contributor tests were also performed to evaluate and compare the significance of the likelihood ratios obtained when the numerator hypothesis (Hp) included known true contributors versus known non-contributors to the mixture. Additionally, most scenarios were tested using allele frequencies from the four major US populations (African American, European, Asian American, and Southwest Hispanic).
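As a reminder of how the reported quantity is read, the sketch below computes a Log10LR from the probability of the evidence under the numerator hypothesis (Hp) and under the alternative hypothesis, here called Hd. The function name and the likelihood values are hypothetical; in practice both probabilities come from the probabilistic genotyping software, not from user code.

```python
import math


def log10_lr(likelihood_hp: float, likelihood_hd: float) -> float:
    """Return log10 of the likelihood ratio P(E | Hp) / P(E | Hd).

    Working with differences of logs avoids dividing very small numbers.
    """
    if likelihood_hp <= 0 or likelihood_hd <= 0:
        raise ValueError("likelihoods must be positive")
    return math.log10(likelihood_hp) - math.log10(likelihood_hd)


# Placeholder likelihoods for illustration only.
true_contributor = log10_lr(1e-20, 1e-26)  # positive: evidence favors Hp
non_contributor = log10_lr(1e-30, 1e-26)   # negative: evidence favors Hd
```

A non-contributor test in this sense checks that placing a known non-contributor in Hp yields a Log10LR below zero, while a true contributor yields a value above zero.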

Cross-sectional
Likelihood ratio

Data in the collection include results, summary charts, graphs, and allele frequency tables.

Not applicable

None

2023-11-16

2023-11-16 ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection:

  • Checked for undocumented or out-of-range codes.

Not applicable

Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.

  • ICPSR usually offers files in multiple formats for researchers to be able to access data and documentation in formats that work well within their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.

This dataset is maintained and distributed by the National Archive of Criminal Justice Data (NACJD), the criminal justice archive within ICPSR. NACJD is primarily sponsored by three agencies within the U.S. Department of Justice: the Bureau of Justice Statistics, the National Institute of Justice, and the Office of Juvenile Justice and Delinquency Prevention.