Synthetic Data Generation of Health and Demographic Surveillance Systems Dataset, Kenya, 2019-2020 (ICPSR 39209)

Version Date: Oct 1, 2024

Principal Investigator(s):
Akbar K. Waljee, University of Michigan. Medical School

https://doi.org/10.3886/ICPSR39209.v1

Version V1

Surveillance data play a vital role in estimating the burden of diseases, pathogens, exposures, behaviors, and susceptibility in populations, providing insights that can inform the design of policies and targeted public health interventions. The use of a Health and Demographic Surveillance System (HDSS) in the Kilifi region of Kenya has led to the collection of massive amounts of data on the demographics and health events of different populations. This has necessitated the adoption of tools and techniques that enhance data analysis and yield insights to improve the accuracy and efficiency of decision-making. Machine learning (ML) and artificial intelligence (AI) based techniques are promising for extracting insights from HDSS data, given their ability to capture complex relationships and interactions in data. However, broad utilization of HDSS datasets using AI/ML is currently challenging because most of these datasets are not AI-ready. Contributing factors include, but are not limited to, regulatory concerns around privacy and confidentiality, heterogeneity in data laws across countries that limits the accessibility of data, and a lack of sufficient datasets for training AI/ML models. Synthetic data generation offers a potential strategy to enhance the accessibility of datasets by creating synthetic datasets that uphold privacy and confidentiality, are suitable for training AI/ML models, and can also augment existing datasets used to train those models. These synthetic datasets, generated from two separate rounds of data collection, represent a version of the real data while retaining the relationships inherent in the data. For more information please visit The Aga Khan University Website.

Waljee, Akbar K. Synthetic Data Generation of Health and Demographic Surveillance Systems Dataset, Kenya, 2019-2020. Inter-university Consortium for Political and Social Research [distributor], 2024-10-01. https://doi.org/10.3886/ICPSR39209.v1

Region

Inter-university Consortium for Political and Social Research

2019 -- 2020
2019 (Round 6), 2020 (Round 8)
  1. Information on key indicators and their definitions within KRHDSS, along with variable attributes, can be found in the P.I. documentation.

Longitudinal, Cross-sectional

All residents of rural Kaloleni and Rabai sub-counties in the coastal area of Kenya.

Individual, Household

The HDSS is a population-based demographic and health surveillance system established in 2017. It was developed and is maintained by the Aga Khan University in collaboration with the Health Management of two rural sub-counties on the coast of Kenya. The HDSS captures both cross-sectional and longitudinal data for information on forty populations, health and social determinants of health, and vital events, following the Ministry of Health guidelines for household-level data collection in Kenya. In each data collection round, demographics, household composition, health events, and vital events (i.e., births and deaths) are collected and shared with the local health department. In addition, the HDSS serves as a sampling frame for surveys and as a source of population controls in case-control studies. It also links these data longitudinally on unique individual identifiers (IDs), providing a basis for individual and cluster tracking in randomized trials and cohort studies.
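The longitudinal linkage on unique individual IDs described above can be illustrated with a minimal sketch. It assumes each round is available as a list of records carrying an ID field. The field name `individual_id`, the function `link_rounds`, and the toy records are hypothetical and are not taken from the P.I. documentation; the actual variable names in the data files may differ.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def link_rounds(
    earlier: List[Record],
    later: List[Record],
    id_key: str = "individual_id",  # hypothetical field name
) -> Tuple[List[Tuple[Record, Record]], List[object], List[object]]:
    """Pair records from two collection rounds that share the same ID.

    Returns the linked pairs, the IDs seen only in the earlier round,
    and the IDs seen only in the later round.
    """
    # Index each round by ID (assumes IDs are unique within a round).
    earlier_by_id = {rec[id_key]: rec for rec in earlier}
    later_by_id = {rec[id_key]: rec for rec in later}

    linked = [
        (earlier_by_id[i], later_by_id[i])
        for i in earlier_by_id
        if i in later_by_id
    ]
    only_earlier = [i for i in earlier_by_id if i not in later_by_id]
    only_later = [i for i in later_by_id if i not in earlier_by_id]
    return linked, only_earlier, only_later


# Made-up toy records standing in for two rounds (not real or synthetic study data).
round_6 = [{"individual_id": "A1", "age": 30}, {"individual_id": "B2", "age": 45}]
round_8 = [{"individual_id": "A1", "age": 31}, {"individual_id": "C3", "age": 2}]

linked, only_r6, only_r8 = link_rounds(round_6, round_8)
```

IDs that appear in only one round are returned separately because, in a surveillance setting, they may correspond to events such as births, deaths, or migration between rounds; how to interpret them depends on the study documentation.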

2024-10-01

Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.