Announcing LinkageLibrary, a new resource for researchers who merge data
The University of Michigan and Texas A&M have launched the LinkageLibrary, a new website to share and preserve linked data projects. The website is hosted by ICPSR at the U-M Institute for Social Research. We asked the project team about the need for the LinkageLibrary, and how it will support the data linkage community.
Why is there a need for LinkageLibrary?
Combining datasets is a powerful way to expand the information available for analysis. For example, income could be added to census data at the individual or household level. Contextual data could also be merged, such as distance to the nearest grocery store. The resulting merged dataset would allow nuanced research into food insecurity and food availability.
The lack of a community space to share ideas, data and linking techniques hinders research, transparency and replication. Different disciplines use different terms for combining datasets: data linkage, entity resolution, deduplication, entity clustering, object identification, duplicate detection and others. Researchers approach the same basic problem in different ways, emphasize different assessments and objectives, and often reinvent the wheel. Comparisons of different linking methods are rare.
How will LinkageLibrary support the data linkage community?
LinkageLibrary maintains datasets to support the data linkage community. The project is an NSF-funded collaboration between the University of Michigan and Texas A&M that brings together social scientists and data scientists who combine datasets, giving them both a community and a repository. LinkageLibrary facilitates comparison of different algorithms and promotes transparency and replicability of research. It also preserves linked data and data linkage methodologies.
What can members of the data linkage community do to get involved?
LinkageLibrary is asking the research community to be a part of this exciting new interactive repository: create a project, comment on a project, add to a project, or recommend the repository to others. By joining LinkageLibrary, you can help build a cross-disciplinary community around data linkage, learning from others who have already tackled the task you face or training the next generation of multidisciplinary data scientists. You can create or download new record linkage and evaluation methods, as well as real data. You can help improve the reproducibility of analyses, close the gap between research and practice, and develop critical collaborations between researchers, users, and data custodians. Your research will also get more exposure.
What’s next?
Come join us in the LinkageLibrary! We invite computer scientists, statisticians, and social, behavioral, economic, and health scientists to deposit code and/or data, and to join the conversation. Contact the LinkageLibrary team for more information.
Feb 21, 2019