ICPSR Editor
The Virtual Data Enclave (VDE) at ICPSR is now available to accept a variety of restricted-use data from depositors.
The VDE provides researchers access to quantitative and qualitative restricted-use data in a secure environment. It is a virtual machine launched from the researcher's desktop but operating on a remote server at ICPSR, similar to remotely logging into another physical computer. The virtual machine is isolated from the user's physical desktop computer, so files cannot be downloaded to it. Users are also prevented from emailing, copying, or otherwise moving files outside of the secure environment.
Data remain on ICPSR file servers and are accessed and analyzed by researchers virtually. The results of analyses are reviewed by ICPSR for disclosure risk before they are transferred to researchers.
Providing data to researchers
"The VDE offers new ways of providing important data to the research community while respecting the confidentiality of subjects," said ICPSR Director George Alter. "We are reminded every day that computer security is a complex problem, and the VDE is an important new tool for reconciling data access with data protection."
Two thematic archives at ICPSR have experience using the virtual environment to provide access to data. They are the Bill & Melinda Gates Foundation-sponsored Measures of Effective Teaching Longitudinal Database (MET LDB) and the Substance Abuse and Mental Health Data Archive (SAMHDA), funded by the Substance Abuse & Mental Health Services Administration. (SAMHDA's virtual enclave is named the Data Portal.)
About 100 data files, constituting 10 studies across the two archives, are currently housed within the virtual enclave.
Other archives at ICPSR are exploring or implementing VDE access, said Asmat Noori, assistant director of ICPSR's Computer and Network Services department.
The VDE is a standard Windows desktop environment with Microsoft Office and a broad range of widely used statistical packages and GIS software.
Benefits for depositors and funders
For the depositor or funder, the VDE offers several benefits. It:
- Distributes data more securely than physically sending data files to users. "The VDE keeps data on the secure server, so we're not sending it 'out' anywhere," said Linda Detterman, ICPSR director of Marketing and Membership. "It's not being downloaded, and it's not being delivered on a CD. It's not going to go across the firewall."
- Reduces the risk of a research subject being identified, as the results of analyses are reviewed by ICPSR before they are released.
- Ensures access to data is removed at the end of the contract period.
"We've found that depositors are broadening their field of vision in terms of what data they will consider depositing," said Johanna Bleckman, a manager of the MET LDB project. "Data that were formerly assumed to be too risky are being seriously considered for access and secondary analysis via ICPSR, which is a big win for the field. We've also seen a fair amount of interest in exploring VDE access to data with contextual variables or geocodes."
The number of VDE users continues to grow. About 60 projects or research groups, using about 300 user accounts, have access to files in the enclave.
Benefits for researchers
For researchers, the VDE:
- Provides a collaborative work environment for teams. "The VDE is particularly useful when researchers are at different institutions," said John Marcotte, SAMHDA project director.
- Makes updates to the data available to users more quickly.
- Simplifies data security plans and requirements for users. "They don't need to worry about data encryption, firewalls, access control, backups, etc. All that is handled on our end," Noori said.
- Offers an opportunity for access to data that would otherwise be off-limits due to significant disclosure risk. "Data now or in the future offered via the VDE would have either undergone more stringent risk mitigation -- data transformations that protect confidentiality but often reduce analytic value to some extent -- or access would have been offered via the ICPSR Physical Enclave, requiring a trip to Ann Arbor for analysis," said Bleckman.
Part of geospatial data project
Additionally, the VDE is a key element in a two-year, NSF-funded research project, "Research on Unique Confidentiality Risks & Geospatial Data Sharing within a Virtual Archive." The project explores the unique confidentiality characteristics of geospatial data and tests various methods of masking such data within the VDE. Douglas Richardson, executive director of the Association of American Geographers, is Principal Investigator. Alter is co-PI.
"The virtual data environment allows the sharing of confidential geospatial research data among researchers, and it also allows some of that data to be masked and removed from the VDE for publication, distribution, and so forth, once it has been transformed," Richardson said.
Bleckman said researchers have helped ICPSR enhance the enclave over the past two years. "They have provided feedback on the user experience, and we have refined the tool and the experience in response."
Technology utilized in public-access service
The technologies of the VDE also are being employed in ICPSR's new public-access data sharing service, openICPSR, for handling restricted-use datasets. "A virtual environment is an expensive and complicated thing to build, and we've got experience using it," Detterman said. "So it's a great thing that openICPSR can utilize the existing virtual environment infrastructure and our knowledge about using it."
"ICPSR loves data and wants to see people use data," said Marcotte. "When data are restricted-use, the virtual environment provides an additional avenue for making them available to researchers."
Data providers interested in depositing data for use in the virtual enclave should contact Amy Pienta, ICPSR director of Acquisitions (apienta@umich.edu).