Machine learning used to identify areas for public health intervention that are hidden in health data

January 27, 2023

Published this week in PLoS ONE, the article “Inferred networks, machine learning, and health data” demonstrated the use of machine learning as a tool for understanding medical and health datasets. The authors, Matta et al., presented a framework that applies graph inference techniques and clustering to assess the public health significance of HIV, drug use, homelessness, and health insurance. They mined data from the Sexual Acquisition and Transmission of HIV Cooperative Agreement Program (SATHCAP), 2006-2008 [United States] Restricted Use Files (ICPSR 29181), which they accessed via the National Addiction & HIV Data Archive Program (NAHDAP).
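For readers unfamiliar with the general idea, the sketch below illustrates what "graph inference followed by clustering" can look like on survey-style data. It is not the authors' method. It assumes a toy set of respondents, links any two whose "yes" answers overlap strongly (Jaccard similarity, with an arbitrary threshold of 0.5), and treats each connected group as a cluster. The respondent IDs, item names, threshold, and helper functions are all hypothetical and are not drawn from SATHCAP.

```python
from itertools import combinations

# Hypothetical toy data: respondent id -> set of survey items answered "yes".
# These item names are invented for illustration; they are NOT SATHCAP variables.
responses = {
    "r1": {"injects_drugs", "homeless", "uninsured"},
    "r2": {"injects_drugs", "homeless"},
    "r3": {"injects_drugs", "homeless", "uninsured"},
    "r4": {"insured", "housed"},
    "r5": {"insured", "housed", "employed"},
}


def jaccard(a, b):
    """Share of items two respondents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def infer_edges(data, threshold):
    """Infer a network: link respondents whose answers are similar enough."""
    return [
        (u, v)
        for u, v in combinations(sorted(data), 2)
        if jaccard(data[u], data[v]) >= threshold
    ]


def connected_components(nodes, edges):
    """Cluster the inferred network by grouping mutually reachable respondents."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


def shared_attributes(data, cluster):
    """Items that every respondent in the cluster answered 'yes' to."""
    return set.intersection(*(data[r] for r in cluster))


edges = infer_edges(responses, threshold=0.5)  # threshold is an arbitrary assumption
clusters = connected_components(responses, edges)
for cluster in clusters:
    print(cluster, sorted(shared_attributes(responses, cluster)))
```

In this toy example, the attributes shared within each cluster are the kind of signal that could point toward a group for targeted intervention. A real analysis would involve far more variables and a more careful choice of similarity measure and clustering algorithm.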

SATHCAP is a cross-sectional study conducted in three US cities to assess the role of drug use in the sexual transmission of HIV from traditional high-risk groups. The key research questions for SATHCAP were: (1) To what extent do HIV infections among populations of drug users (DU) and men who have sex with men (MSM) spread to uninfected non-DU and non-MSM individuals through sexual activity? (2) What is the role of drugs in this spread? (3) What individual, behavioral, network, and structural characteristics determine the speed, extent, and path of this spread? Respondents were asked about their sexual relationships with their partners, the names of the drugs they used, their methods of using and sharing drugs, and the types of sexual activity they engaged in with their partners.

Matta et al. were attracted to the robustness of the SATHCAP data, stating that it “can be subjected to standard statistical analysis, the referral chains can be interpreted as complex networks, and the survey design itself can be used to learn more about respondent-driven sampling.” Its size was also key. Participants were asked nearly 1,500 questions, yielding a large number of variables for studying hidden relationships among a variety of public health issues. By identifying meaningful attributes associated with clustered groups, the authors’ methodology “provided a basis for targeted intervention to help prevent HIV, to improve the lives of marginalized groups like injecting drug users and the homeless, and to show the importance of insurance in the mitigation of various health challenges.” See the study home page to read more findings resulting from analyses of data from the SATHCAP.