Data Harmonization

Introduction

"Consistent large-scale microdata that extend over many decades and span national boundaries with fine geographic detail provide a unique laboratory for studying demographic processes and for testing social and economic models." -Steven Ruggles, Demography, 2014.

Data harmonization refers to all efforts to combine data from different sources and provide users with a comparable view of data from different studies. This process is becoming increasingly important in demography and sociology research, since the need for data harmonization is growing rapidly as both the volume of existing data and the need to share it explode.
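
As a rough illustration of what a "comparable view" can mean in practice, the sketch below recodes one variable from two sources into a shared scheme using a crosswalk table. Everything in it is an assumption made for illustration: the survey names, the marital-status codes, the harmonized categories, and the `harmonize` helper are hypothetical and are not taken from any of the projects listed on this page.

```python
# Hypothetical example: two surveys record marital status with different codes.
# All survey names, codes, and categories below are illustrative assumptions.

# One crosswalk per source maps that study's original codes to a shared scheme.
CROSSWALKS = {
    "survey_a": {
        1: "married",
        2: "never_married",
        3: "widowed",
        4: "divorced_or_separated",
    },
    "survey_b": {
        "M": "married",
        "S": "never_married",
        "W": "widowed",
        "D": "divorced_or_separated",
        "SEP": "divorced_or_separated",  # finer source category is collapsed
    },
}


def harmonize(record, source):
    """Return a new record carrying a comparable marital-status variable."""
    crosswalk = CROSSWALKS[source]
    return {
        "source": source,
        "person_id": record["person_id"],
        # Codes absent from the crosswalk become None instead of being guessed.
        "marst_harmonized": crosswalk.get(record["marst"]),
    }


survey_a = [{"person_id": 1, "marst": 1}, {"person_id": 2, "marst": 4}]
survey_b = [{"person_id": 1, "marst": "SEP"}, {"person_id": 2, "marst": "X"}]

# Pool the two sources once both share the same variable and categories.
pooled = ([harmonize(r, "survey_a") for r in survey_a]
          + [harmonize(r, "survey_b") for r in survey_b])

for row in pooled:
    print(row)
```

Keeping the crosswalk as data rather than as branching logic makes each recoding decision easy to document and review. Collapsing to the coarsest common category, as done here, is only one possible design choice.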

Examples of Data Harmonization Efforts

  • University of Minnesota Population Center -- IPUMS Data

    Steven Ruggles is a Professor of History at the University of Minnesota and the Director of the Minnesota Population Center. He has been working on a set of harmonized resources for demographic research:

    • IPUMS International, harmonized data from 1960 forward on people in 130 censuses from around the world.

    • IPUMS USA, harmonized data on people in the U.S. census and American Community Survey, from 1850 to the present.

    • IPUMS CPS, harmonized data on people in the Current Population Survey, every March from 1962 to the present.

    • North Atlantic Population Project, harmonized data from the 1800s censuses of Canada, Great Britain, Norway, Sweden, and the U.S.

    • Integrated Health Interview Series, harmonized data on people in the U.S. National Health Interview Survey, from the 1960s to the present.

    • American Time Use Survey, harmonized data from 2003 forward on how U.S. adults use their time.

    • Terra Populus, integrated data on the world's population and environment, from 1960 to the present.

  • McGill University Health Centre Research Institute, Canada

    Dr. Isabel Fortier leads the DataSHaPER (Data Schema and Harmonization Platform for Epidemiological Research) program at the Research Institute of the McGill University Health Centre. DataSHaPER aims to facilitate the prospective harmonization of emerging biobanks. She also serves as a coordinator of data harmonization for the BioSHaRE (Biobank Standardization and Harmonization for Research Excellence in the European Union) project. Dr. Fortier has been working on harmonizing data across large European cohort studies, including UK Biobank, Lifelines, KORA, LifeGene, Estonian Genome Center, Nord-Trøndelag Health Study, National Child Development Study, and National FINRISK Study.

  • RAND

    Jinkook Lee is an adjunct staff member at the RAND Corporation. She leads the research network of the Health and Retirement Studies around the world and, with her colleagues at RAND, has developed the Survey Meta Data Repository. In 2010 she also received a five-year NIH grant entitled Harmonization of Cross-national Studies of Aging to the Health and Retirement Study.

Selected Publications

Angrisani, M., Lee, J. (2012). Harmonization of Cross-National Studies of Aging to the Health and Retirement Study: Income Measures. RAND Corporation Working Papers.

Angrisani, M., Lee, J. (2012). Harmonization of Cross-National Studies of Aging to the Health and Retirement Study: Wealth Measures. RAND Corporation Working Papers.

Delavande, A., Lee, J., Yoong, J. K. (2012). Harmonization of Cross-National Studies of Aging to the Health and Retirement Study: Expectations. RAND Corporation Working Papers.

Delavande, A., Lee, J., Yoong, J. K. (2013). Harmonized LASI Pilot Data Documentation: Version A. RAND Corporation Working Papers.

DeWaard, J., Kim, K., Raymer, J. (2012). Migration systems in Europe: evidence from harmonized flow data. Demography, 49, 1307-33.

Fortier, I., Burton, P. R., Robson, P. J., Ferretti, V., Little, J., L'Heureux, F., Deschênes, M., Knoppers, B. M., Doiron, D., Keers, J. C., Linksted, P., Harris, J. R., Lachance, G., Boileau, C., Pedersen, N. L., Hamilton, C. M., Hveem, K., Borugian, M. J., Gallagher, R. P., McLaughlin, J., Parker, L., Potter, J. D., Gallacher, J., Kaaks, R., Liu, B., Sprosen, T., Vilain, A., Atkinson, S. A., Rengifo, A., Morton, R., Metspalu, A., Wichmann, H. E., Tremblay, M., Chisholm, R. L., Garcia-Montero, A., Hillege, H., Litton, J. E., Palmer, L. J., Perola, M., Wolffenbuttel, B. H., Peltonen, L., Hudson, T. J. (2010). Quality, quantity and harmony: the DataSHaPER approach to integrating data across bioclinical studies. International Journal of Epidemiology, 39, 1383-93.

Fortier, I., Doiron, D., Burton, P., Raina, P. (2011). Invited commentary: consolidating data harmonization--how to obtain quality and applicability? American Journal of Epidemiology, 174, 261-4.

Fortier, I., Doiron, D., Little, J., Ferretti, V., L'Heureux, F., Stolk, R. P., Knoppers, B. M., Hudson, T. J., Burton, P. R. (2011). Is rigorous retrospective harmonization possible? Application of the DataSHaPER approach across 53 large studies. International Journal of Epidemiology, 40, 1314-28.

Fortier, I., Doiron, D., Wolfson, C., Raina, P. (2012). Harmonizing data for collaborative research on aging: why should we foster such an agenda? Canadian Journal on Aging, 31, 95-9.

Granda, P., Wolf, C., Hadorn, R. (2010). Harmonizing Survey Data. In J. A. Harkness, M. Braun, B. Edwards, T. P. Johnson, L. Lyberg, P. Ph. Mohler, B.-E. Pennell, T. W. Smith (Eds.), Survey Methods in Multinational, Multiregional, and Multicultural Contexts. Hoboken, NJ: John Wiley & Sons. doi: 10.1002/9780470609927.ch17

Hu, P., Lee, J. (2012). Harmonization of Cross-National Studies of Aging to the Health and Retirement Study: Chronic Medical Conditions. RAND Corporation Working Papers.

Kashyap, R., Esteve, A., García-Román, J. (2015). Potential (Mis)match? Marriage Markets Amidst Sociodemographic Change in India, 2005-2050. Demography, 52, 183-208.

King, G. (2011). Ensuring the data-rich future of the social sciences. Science, 331, 719-721.

Kornrich, S., Furstenberg, F. (2013). Investing in children: changes in parental spending on children, 1972-2007. Demography, 50, 1-23.

McCaa, R., Ruggles, S. (2002). The census in global perspective and the coming microdata revolution. Scandinavian Population Studies, 13, 7-30.

NIA Workshop Summary Report. (2012). Harmonization Strategies for Behavioral, Social Science, and Genetic Research: Workshop Summary.

Noble, P., Van Riper, D., Ruggles, S., Schroeder, J., Hindman, M. (2011). Harmonizing Disparate Data across Time and Place: The Integrated Spatio-Temporal Aggregate Data Series. Historical Methods, 44, 79-85.

Roberts, E., Ruggles, S., Dillon, L. Y., Gardarsdottir, O., Oldervoll, J., Thorvaldsen, G., Woollard, M. (2003). The North Atlantic Population Project: An overview. Historical Methods, 36, 80-88.

Ruggles, S. (2014). Big microdata for population research. Demography, 51, 287-97.

Ruggles, S. (2005). The Minnesota Population Center Data Integration Projects: Challenges of harmonizing census microdata across time and place. Proceedings of the American Statistical Association, Government Statistics Section, Alexandria, VA: American Statistical Association, pp. 1405-1415.

Ruggles, S. (2006). Linking historical censuses: a new approach. History and Computing, 14, 213-224.

Ruggles, S., Roberts, E., Sarkar, S., Sobek, M. (2011). The North Atlantic Population Project: Progress and prospects. Historical Methods, 44, 1-7.

Sevilla, A., Gimenez-Nadal, J. I., Gershuny, J. (2012). Leisure inequality in the United States: 1965-2003. Demography, 49, 939-64.

Shih, R. A., Lee, J., Das, L. (2012). Harmonization of Cross-National Studies of Aging to the Health and Retirement Study: Cognition. RAND Corporation Working Papers.

Sobek, M., Cleveland, L., Flood, S., Hall, P. K., King, M. L., Ruggles, S., Schroeder, M. (2011). Big Data: Large-Scale Historical Infrastructure from the Minnesota Population Center. Historical Methods, 44, 61-68.

Zissimopoulos, J., Lee, J., Carroll, J. (2012). Harmonization of Cross-National Studies of Aging to the Health and Retirement Study: Financial Transfer. RAND Corporation Working Papers.