Maintaining datasets to support the data linkage community

LinkageLibrary is a National Science Foundation-funded community and repository for researchers involved in combining datasets, facilitating comparison of different algorithms, and promoting transparency and replicability of research. We invite computer scientists, statisticians, and social, behavioral, economic, and health scientists to deposit code and/or data, and to join the conversation.

Why is there a need for LinkageLibrary?

Combining datasets is a powerful means of expanding information and analysis. For example, income could be added to census data at the individual or household level. Contextual data could also be merged, such as distance to the nearest grocery store. The resulting merged dataset would allow nuanced research into food insecurity and availability.

Lack of a community space to share ideas, data and linking techniques hinders research, transparency and replication. Different disciplines refer to combining datasets using different terms—data linkage, entity resolution, deduplication, entity clustering, object identification, duplicate detection, and others. Researchers approach the basic problem in different ways, emphasize different assessments and objectives, and oftentimes reinvent the wheel. Comparisons of different linking methods are rare.

What are the benefits of LinkageLibrary?

  • You can help build a cross-disciplinary community around data linkage, learning from others who have already tackled the task you face or training the next generation of multidisciplinary data scientists.
  • You can create or download new record linkage and evaluation methods, and real data.
  • You can help improve reproducibility of analyses, close the gap between research and practice, and develop critical collaborations between researchers, users, and data custodians.
  • Your research will get more exposure.

Come join us in the LinkageLibrary!

PI: Margaret Levenstein, Inter-university Consortium for Political and Social Research, Ross School of Business, School of Information, and Survey Research Center

Co-PI: Susan Hautaniemi Leonard, Inter-university Consortium for Political and Social Research


  • Hye-Chung Kum, Health Policy & Management; Computer Science & Engineering, Texas A&M University
  • Luiza Antonie, School of Computer Science, University of Guelph

LinkageLibrary is funded by NSF grant #1744065.