Guidelines for OSTP Data Access Plan

In February 2013, the Executive Office of the President's Office of Science and Technology Policy (OSTP) published a memo entitled "Increasing Access to the Results of Federally Funded Scientific Research," which directs Federal agencies with more than $100 million in annual research and development expenditures to develop a public access plan for disseminating the results of the research they fund. ICPSR strongly supports this memorandum and believes it will "promote re-use of scientific data, maximize the return on investments in data collection, and prevent the loss of thousands of potentially valuable datasets."

ICPSR currently partners with several federal agencies to help them fulfill the mandate to provide open access to results of federally funded research. These agencies leverage ICPSR's capacity to curate, preserve, and disseminate data efficiently.

To help these and other federal agencies develop their public access plans, ICPSR is providing guidance on how to meet the requirements laid out in the memo. In the sections below, we provide an overview of each requirement, discuss why it matters, and identify key issues to consider when formulating a plan. We also provide a glossary of terms for specialized definitions.

We stress that standards and guidelines already exist for many of these requirements, and that existing specialized, long-lived, and sustainable repositories can mediate between the needs of scientific disciplines and data preservation requirements.

Please contact us for more information about our work or these guidelines.

Elements of a Public Access Plan for Scientific Data

Maximize access

"Maximize access, by the general public and without charge, to digitally formatted scientific data created with Federal funds"

Description

Increasing access to research data prevents the duplication of effort, provides accountability and verification of research results, and increases opportunities for innovation and collaboration.

Finding and accessing data in repositories requires descriptive metadata ("data about data") in standard, machine-actionable form. Metadata help search engines find and catalog data, and enable researchers to perform detailed searches across data collections and to understand their context. In the social sciences, for instance, the Data Documentation Initiative (DDI) is an established international standard for describing data.

For an inventory of metadata standards across scientific disciplines, see the Digital Curation Centre website.
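
As a simple illustration of what machine-actionable descriptive metadata can look like, the sketch below builds a minimal, study-level record with DDI-like fields and serializes it as JSON. The field names and values are assumptions for demonstration only; a full DDI record is far richer and is typically expressed in XML.

```python
import json

# A minimal, illustrative study-level metadata record using DDI-like
# descriptive fields. Names and values are hypothetical placeholders.
study_metadata = {
    "title": "Example Survey of Community Health, 2024",
    "principal_investigator": "Jane Researcher",
    "funding_agency": "Example Federal Agency",
    "distributor": "Example Data Repository",
    "abstract": "Brief description of the study design and universe.",
    "time_period": {"start": "2024-01", "end": "2024-06"},
    "geographic_coverage": "United States",
    "keywords": ["health", "survey data", "secondary analysis"],
    "variables": [
        {"name": "AGE", "label": "Respondent age in years"},
        {"name": "REGION", "label": "Census region of residence"},
    ],
}

# Serializing the record to JSON makes it easy for catalogs and search
# engines to index and for other tools to parse.
print(json.dumps(study_metadata, indent=2))
```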

Access involves not just finding data, but also knowing how to use and interpret them. Incomplete, incorrect, or messy data limit use and reuse. Proprietary or obsolete data formats can be unreadable or limit access. Repositories 'curate', or enhance, data to make them complete, self-explanatory, and usable for future researchers. This includes adding descriptive labels, correcting coding errors, gathering documentation, and standardizing the final versions of files. Curation is crucial to maximizing access.

For guidelines on preparing and curating data for archiving, see ICPSR's Guide to Social Science Data Preparation and Archiving and the UK Data Archive's Managing and Sharing Data guide.
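
To make these curation steps concrete, here is a brief sketch of a few routine enhancements using the pandas library: standardizing missing-value codes, attaching descriptive labels, recording simple variable documentation, and writing the results to open formats. The file names, column names, and codes are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical raw data file and column names, used only for illustration.
raw = pd.read_csv("raw_survey.csv")

# Standardize common numeric missing-value codes (e.g., -9, 99) to true
# missing values so analysts do not mistake them for valid responses.
raw = raw.replace([-9, 99], pd.NA)

# Attach human-readable labels for coded categories so the data are
# self-explanatory without the original codebook in hand.
region_labels = {1: "Northeast", 2: "Midwest", 3: "South", 4: "West"}
raw["REGION_LABEL"] = raw["REGION"].map(region_labels)

# Record simple variable-level documentation alongside the data.
codebook = pd.DataFrame(
    {"variable": ["AGE", "REGION"],
     "label": ["Respondent age in years", "Census region of residence"]}
)

# Write curated outputs in open, non-proprietary formats.
raw.to_csv("curated_survey.csv", index=False)
codebook.to_csv("curated_survey_codebook.csv", index=False)
```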

Issues to Consider

  • Which descriptive metadata standards will your agency use to help researchers discover and find data? Adequate metadata to describe collections and facilitate discovery are essential; otherwise, data are difficult to find and understand.
  • Which curation standards will your agency promote so data are accurate and useful? Incomplete or messy collections are not as useful or valuable as curated data.

Examples

Protect confidentiality and privacy

"...protecting confidentiality and personal privacy"

Description

A growing number of studies include sensitive and confidential data. Stringent protections must be in place to guard these data while still providing access to them. Robust methods, such as those promoted by the American Statistical Association, exist for evaluating and treating disclosure risks, and repositories can offer infrastructure, including virtual and physical data enclaves, for protecting and safely sharing confidential data.

For more information, see ICPSR's Confidentiality page or DataONE's Identify data sensitivity page.
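
As one illustration of how a disclosure review might begin, the sketch below flags small cells in a cross-tabulation of quasi-identifiers, a common first check before suppression, coarsening, or other treatment. The variables, file name, and cell-size threshold are assumptions for demonstration, not a prescribed procedure.

```python
import pandas as pd

# Hypothetical de-identified file and quasi-identifiers, for illustration only.
data = pd.read_csv("study_respondents.csv")
quasi_identifiers = ["AGE_GROUP", "REGION", "OCCUPATION"]

# Count how many respondents share each combination of quasi-identifiers.
cell_sizes = data.groupby(quasi_identifiers).size().reset_index(name="n")

# Combinations below a minimum cell size (an assumed threshold of 5 here)
# may make individuals re-identifiable and warrant treatment such as
# suppression, collapsing categories, or top/bottom coding.
THRESHOLD = 5
risky_cells = cell_sizes[cell_sizes["n"] < THRESHOLD]

print(f"{len(risky_cells)} combinations fall below the cell-size threshold:")
print(risky_cells.to_string(index=False))
```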

Issues to Consider

  • Who will be responsible for reviewing and treating data for confidentiality issues? Data security is of utmost legal and ethical concern.
  • How will disclosure review and treatment impact the future use and reuse of the data? While steps to anonymize data are necessary, these must be done in consideration of the impact they will have on future use.

Examples

Preserve intellectual property rights and commercial interests

"...recognizing proprietary interests, business confidential information, and intellectual property rights and avoiding significant negative impact on intellectual property rights, innovation, and U.S. competitiveness"

Description

Original research may be both commercially valuable and proprietary. There are several approaches to managing these interests, including tailoring copyright and patent licenses (for example, through Creative Commons licenses) and imposing an embargo period or delayed dissemination on distribution. Ultimately, though, all proprietary and personal interests should be considered with an eye toward eventual public access.

Issues to Consider

  • Which licensing options are optimal for your research community? Will your agency require all data, for instance, to be copyright free? Or will data producers be able to choose freely according to their needs and desires?
  • How can commercial interests be reconciled with the mandate of providing open access to data?

Examples

Balance demands of long-term preservation and access

"...preserving the balance between the relative value of long-term preservation and access and the associated cost and administrative burden"

Description

Digital preservation is the proactive and ongoing management of digital content to extend its lifespan and mitigate loss from physical deterioration, format obsolescence, and hardware and software failure. Preserving digital data requires much more than storing files on a server or desktop. At the same time, we also recognize that not all data are worth preserving indefinitely; less valuable or easily reproducible data may be preserved for shorter periods -- perhaps five to ten years, depending on the scientific discipline.
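
One concrete piece of this ongoing management is routine fixity checking: recomputing file checksums and comparing them against values recorded at deposit so that silent corruption is detected early. The sketch below assumes a simple directory layout and JSON manifest, both hypothetical, to illustrate the idea.

```python
import hashlib
import json
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Compute the SHA-256 checksum of a file in manageable chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Assumed layout: archived files live under 'archive/' and a JSON manifest
# maps each relative file path to the checksum recorded at deposit time.
archive_dir = Path("archive")
manifest = json.loads(Path("manifest.json").read_text())

# Periodically re-verify every file against its recorded checksum; any
# mismatch signals corruption that should trigger recovery from a replica.
for relative_path, recorded in manifest.items():
    current = file_sha256(archive_dir / relative_path)
    status = "OK" if current == recorded else "MISMATCH"
    print(f"{status}  {relative_path}")
```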

Selection and appraisal guidelines that make it clear what to save or discard ensure that the limited resources available for long-term preservation and access are spent wisely. Selection criteria consider factors like availability, confidentiality, copyright, quality, file format, and financial commitment.

For example selection and appraisal guidelines, see the National Archives and Records Administration (NARA) Appraisal Policy or the Data Preservation Alliance for the Social Sciences (Data-PASS) website.

Long-term costs and administrative burdens are essential to consider when selecting data for preservation. The University of California Curation Center has "developed an analytic framework for modeling the full economic costs of preservation," including an interactive spreadsheet. The Keeping Research Data Safe project, funded by the Joint Information Systems Committee (JISC) in the UK, also produced tools and methodologies for "assessing the costs and benefits of curation and preservation of research data."

Issues to Consider

  • Which data will your agency target and select for long-term preservation? Not all data may fit the agency's scope and goals. What are the criteria for selecting which data to preserve?
  • How long will data be preserved and made accessible? Data are costly to preserve for the long term, and not all data must be preserved in perpetuity.
  • What are the long-term preservation costs to make research data available? Understanding the actual financial costs will sharpen selection and retention policies and decisions.

Examples

No examples of this from the policies surveyed

Use of data management plans

"Ensure that all extramural researchers receiving Federal grants and contracts for scientific research and intramural researchers develop data management plans and, as appropriate, describing how they will provide for long-term preservation of, and access to, scientific data in digital formats resulting from federally funded research, or explaining why long-term preservation and access cannot be justified"

Description

Data management plans provide opportunities for researchers to manage and curate their data more actively from project inception to completion. Careful planning helps ensure quality data products when projects are completed. Recommended components of a plan include descriptions of the nature and scale of the data collection, file format types, metadata standards used, and any intellectual property or confidentiality concerns that exist.

For more information about data management plans, see ICPSR's Guidelines for Effective Data Management Plans, the Digital Curation Centre's Data Management Plans page, and MIT's Data Management Plans page.
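
A data management plan can also be captured in a structured, machine-readable form so its components are easy to review and compare across proposals. The sketch below records the components recommended above as a simple Python dictionary and writes them out as JSON; the field names and values are illustrative assumptions, not an agency template.

```python
import json

# Illustrative structured representation of a data management plan.
# Component names follow the recommendations above; values are hypothetical.
data_management_plan = {
    "project_title": "Example Longitudinal Study of Civic Participation",
    "data_description": {
        "nature": "Survey responses from approximately 5,000 adults",
        "scale": "Three waves of data collection over six years",
    },
    "file_formats": ["CSV for data files", "PDF/A for documentation"],
    "metadata_standard": "DDI",
    "intellectual_property": "No proprietary instruments; data released openly",
    "confidentiality": "Direct identifiers removed; disclosure review before release",
    "preservation_and_access": "Deposit in a domain repository at project completion",
}

# Writing the plan as JSON keeps it reviewable by both people and tools.
with open("data_management_plan.json", "w") as out:
    json.dump(data_management_plan, out, indent=2)
```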

Issues to Consider

  • Should your agency mandate which elements must be included in the data management plan? Specific disciplines may have different standards related to data management planning, and certain elements may be more relevant to some research than others. However, having standard formats for data management planning may ease an agency's evaluation of plans.
  • What resources can be provided to educate and aid researchers in the writing of effective data management plans? Data management planning is a relatively new area, so many researchers may not be familiar with what should be included in an effective plan.

Examples

Include cost of data management in funding proposals

"Allow the inclusion of appropriate costs for data management and access in proposals for Federal funding for scientific research"

Description

Data management services carry real costs, ranging from personnel to storage to software. Estimating and planning for these costs at the beginning of a project ensures long-term investment in the research data. Just as maintenance costs are routinely built into physical infrastructure development, so too should data management costs be built into data development. Long-term access to data requires durable institutions that plan on a scale of decades and even generations.

For guidance on costs to include when creating funding proposals, see DataONE's Provide budget information for your data management plan page and the UK Data Archive's Costing Tool: Data Management Planning guide.
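
As a rough illustration of budgeting for these costs, the sketch below totals a few common line items (curation labor, a repository deposit fee, and storage over a retention period) using placeholder rates. The categories and figures are assumptions for demonstration, not recommended amounts.

```python
# Illustrative cost estimate for data management in a funding proposal.
# All rates and quantities below are placeholder assumptions.
curation_hours = 120            # staff time to clean, label, and document data
curation_rate = 45.00           # assumed hourly rate, in dollars
deposit_fee = 1500.00           # assumed one-time repository deposit fee
storage_gb = 50                 # expected volume of curated data
storage_rate_per_gb_year = 0.60 # assumed annual storage cost per gigabyte
retention_years = 10            # assumed retention period

labor_cost = curation_hours * curation_rate
storage_cost = storage_gb * storage_rate_per_gb_year * retention_years
total = labor_cost + deposit_fee + storage_cost

print(f"Curation labor:  ${labor_cost:,.2f}")
print(f"Repository fee:  ${deposit_fee:,.2f}")
print(f"Storage ({retention_years} yr): ${storage_cost:,.2f}")
print(f"Total estimate:  ${total:,.2f}")
```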

Issues to Consider

  • How will your agency determine what constitutes reasonable costs for data management? Whether this is a dollar amount or a percentage of the total project budget, having a cap or an expected range for the cost of data management will provide more context for researchers and aid in the evaluation of plans.
  • What funding models are appropriate for supporting long-term data management? Although funding may be limited to the duration of the grant period, data management is not a one-time cost. How can long-term preservation and access be built into proposals?
  • What existing, additional, or new funding tied to proposals will support access and preservation of data?

Examples

Evaluate data management plans

"Ensure appropriate evaluation of the merits of submitted data management plans"

Description

Data management plans give insight into researchers' intentions for their data both during and after the research project. Plans help researchers prepare to work with and preserve their data, help repositories prepare to accession and provide access to the data, and help agencies understand their communities' needs for archiving and access. Evaluation helps refine plans so they are realistic and attainable.
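
One way to make such evaluation systematic is a simple rubric applied to every submitted plan. The sketch below scores a hypothetical plan against a handful of weighted criteria drawn from the components discussed earlier; the criteria and weights are illustrative assumptions, not a prescribed standard.

```python
# Illustrative rubric for reviewing a data management plan.
# Criteria and weights are assumptions for demonstration purposes.
criteria = {
    "describes nature and scale of data": 2,
    "names file formats and metadata standard": 2,
    "addresses confidentiality and IP concerns": 2,
    "identifies repository for preservation and access": 3,
    "includes realistic data management budget": 1,
}

def score_plan(plan_checklist: dict[str, bool]) -> tuple[int, int]:
    """Return (points earned, points possible) for a reviewed plan."""
    earned = sum(weight for item, weight in criteria.items()
                 if plan_checklist.get(item))
    possible = sum(criteria.values())
    return earned, possible

# Example review of a hypothetical submission.
review = {
    "describes nature and scale of data": True,
    "names file formats and metadata standard": True,
    "addresses confidentiality and IP concerns": False,
    "identifies repository for preservation and access": True,
    "includes realistic data management budget": True,
}
earned, possible = score_plan(review)
print(f"Plan scores {earned} of {possible} points")
```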

Issues to Consider

  • What standards will be used to determine whether a data management plan is sufficient for a proposed research project? Because data management plans will be specific to the given research project and/or scientific community, can they be assessed based on a standard set of criteria?
  • How will the agency handle data management plans that fall short of criteria? If a data management plan does not meet expected quality standards, agencies should determine how to respond.
  • How will the merits of a data management plan be considered alongside other factors in the evaluation process? Agencies should determine the weighting of this aspect of the funding application against other portions of the proposal.

Examples

Ensure researcher compliance with data management plans

"Include mechanisms to ensure that intramural and extramural researchers comply with data management plans and policies"

Description

If data management plans are to be a standard component of funding applications, funding recipients should be held accountable for deviations from their originally stated plans. As a benefit, monitoring and ensuring compliance should increase the quality of data deposited in repositories and ultimately made available to the public.

Issues to Consider

  • How will funders determine what constitutes compliance with data management plans? Since data management is an ongoing process, establishing clear criteria for assessing compliance will be crucial. Does deposit in a trustworthy digital repository qualify as compliant? Is the quality of the final, curated data product also assessed when judging compliance?
  • What penalties will be in place for researchers who do not comply with data management plans? Will non-compliant researchers be penalized when applying for future funding? Will current funding be withheld for non-compliance?
  • What deadlines will be set for researcher compliance with data management plans? Will compliance be measured during just the lifetime of the funding or extend for a period of time after project funding ends?

Examples

Promote public deposit of data

"Promote the deposit of data in publicly accessible databases, where appropriate and available"

Description

Public deposit of data helps to ensure the long-term accessibility and preservation of the data. It removes the burden of ongoing maintenance and care from the researcher and provides a stable system to which data can be entrusted. Centralized databases also provide more comprehensive and discoverable resources in one place. Data hosted in repositories are indexed by major search engines and are widely accessible to the public.

Many sustainable online repositories are now available to host and archive research data. These may include discipline-specific repositories, archives administered by funding agencies, or institutional repositories.

Databib, a searchable directory of over 500 research data repositories, can help locate relevant repositories by subject area.

Data producers need to trust that the data they archive will be properly stored and shared, rather than lost, corrupted, or neglected. Data consumers need to trust that the data they receive are the original, unaltered version saved by the producer. The Open Archival Information System (OAIS) Reference Model, the Trusted Digital Repository (TDR) Checklist (ISO/DIS 16363), and the Data Seal of Approval are standards that guide repositories in documenting and verifying that they are organizationally, procedurally, and technologically sound as data custodians.

Issues to Consider

  • Which research data repository will your agency use or recommend to store and disseminate data? There are many repositories available, although not all provide the same services, target similar disciplines, or are set up for long-term, trusted preservation and access.
  • How will your agency ensure that selected repositories are trustworthy, secure, and long-lived? Standards exist to gauge whether repositories can be trusted to store and disseminate valuable research data.
  • How will publicly deposited data be promoted by the agency?

Examples

Private-sector cooperation to improve access

"Encourage cooperation with the private sector to improve data access and compatibility, including through the formation of public-private partnerships with foundations and other research funding organizations"

Description

Since data stewardship can be such a costly and technologically demanding proposition, partnerships with other data stewards and producers can provide opportunities for innovation and collaboration. Cooperation between funding agencies and the private sector can take a number of forms. From collaborating with service providers (such as publishers or web services companies) to develop tools and services, to pooling resources with foundations and private funding organizations, these relationships can result in benefits for all parties involved. Two examples of partnerships between repositories and private-sector companies are Flickr Commons and Google Books; while these projects may differ from those undertaken in the preservation and dissemination of scientific data, they are a useful reference point for understanding the benefits and risks involved.

Issues to Consider

  • What funding structures will be in place to ensure that all organizations involved are benefiting from the partnership? For the partnership to be successful, all parties must ensure that the terms of the agreement are clearly laid out. With well-articulated responsibilities and desired outcomes, all partners may benefit.
  • Will the partnership require any rights to be transferred to the private organization? If the partner requires that copyright be transferred to that organization, access restrictions on the content may result, and the collaboration may go against the ideals of providing free public access to the datasets.
  • How does private-sector cooperation affect access restrictions and intellectual property concerns? If there are confidential or proprietary data involved in the collaborative project, special attention must be paid to protecting these data.

Examples

No examples of this from the policies surveyed

Mechanisms for identification & attribution of data

"Develop approaches for identifying and providing appropriate attribution to scientific data sets that are made available under the plan"

Description

Properly citing data encourages the replication of scientific results, improves research standards, guarantees persistent reference, and gives proper credit to data producers.

Citing data is straightforward. Each citation must include the basic elements that allow a unique dataset to be identified over time: title, author, date, version, and persistent identifier (such as a Digital Object Identifier (DOI), Uniform Resource Name (URN), or Handle System identifier). Some academic journals, such as the American Sociological Review, have already adopted a set of standards for citing data. DataCite, an international consortium, strives to improve and support data citation.

For more information, see ICPSR's Data Citations page, IASSIST's Quick Guide to Data Citation, the Digital Curation Centre's guide How to Cite Datasets and Link to Publications, or DataONE's Provide identifier for dataset used page.
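
To show how these citation elements fit together, the sketch below assembles them into a single citation string. The author, title, distributor, and DOI are invented placeholders, and the exact ordering and punctuation will vary by style guide.

```python
def format_data_citation(author: str, year: int, title: str,
                         distributor: str, version: str, doi: str) -> str:
    """Assemble the basic citation elements into one string.

    The ordering and punctuation here are one common pattern; follow the
    style required by the journal or repository in question.
    """
    return (f"{author} ({year}). {title} ({version}) [Data set]. "
            f"{distributor}. https://doi.org/{doi}")

# All values below are hypothetical placeholders for illustration.
print(format_data_citation(
    author="Researcher, J., and Analyst, K.",
    year=2024,
    title="Example Survey of Community Health",
    distributor="Example Data Repository",
    version="Version 2",
    doi="10.0000/example-doi",
))
```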

Issues to Consider

  • How can funders encourage consistent citation methods for data? This is also the responsibility of researchers, secondary data users, professional organizations, librarians, and others. The basic elements of data citation are clear.

Examples

Data stewardship workforce development

"In coordination with other agencies and the private sector, support training, education, and workforce development related to scientific data management, analysis, storage, preservation, and stewardship"

Description

As stakeholders in the research data lifecycle, funding agencies should ensure that those engaging with research data at every stage are trained in the necessary skills and aware of their responsibilities. Training both data producers and data stewards in appropriate methods for managing, curating, and preserving research data will help ensure the ongoing accessibility of the research.

The National Science Board emphasized the importance of data stewardship training and development in a 2005 report titled Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century: "Data scientists materially determine the quality of the data collections that now play a vital role in research. Their role is new, so it is crucial that the professional career of data scientist be defined and recognized so that it will attract the best and brightest."

Recent years have seen a number of data stewardship training and workforce development efforts in the United States. ICPSR, for example, hosts data stewardship courses as part of its Summer Program in Quantitative Methods of Social Research.

Issues to Consider

  • How can staff be trained in these new competencies and roles relating to digital stewardship? Digital curation may be a minor part of any single staff member's responsibilities, so training should contextualize these activities in terms of broader objectives and processes.
  • What partnerships (with universities, data repositories, etc.) can support the development of these programs?
  • How can agencies create and foster a culture that is supportive of data stewardship and curation? Without cultural buy-in, scientists may hesitate to fully participate.

Examples

Long-term support for repository development

"Provide for the assessment of long-term needs for the preservation of scientific data in fields that the agency supports and outline options for developing and sustaining repositories for scientific data in digital formats, taking into account the efforts of public and private sector entities"

Description

We advocate long-term funding for specialized, long-lived, trustworthy, and sustainable repositories that can mediate between the needs of scientific disciplines and data preservation requirements. As digital data management becomes an increasingly important part of scientific research, funding agencies must contribute to the developing ecosystem of services and technologies that support access to and preservation of data.

As we noted in a position paper in January 2013, "Long-term access to data requires durable institutions that plan on a scale of decades and even generations. Such planning is difficult when grant cycles are of limited duration, and proposed projects are rated for innovation and transformation but not for reliability or permanence."

Issues to Consider

  • Who will bear the costs of documenting and preserving all of the data collections? Will the funding agencies fully support all costs? The researchers? The consumers?
  • How can limited resources for data archiving be focused on data with the highest value for secondary analysis? Long-term preservation may constrain resources and require attention to be prioritized across and within data collections.
  • How can cost models be developed to support future preservation costs? Preservation costs can be difficult to predict. Although no cost model is guaranteed to predict the future financial requirements of a repository project, they can help agencies prepare for the long-term nature of this significant investment.

Examples

Acknowledgements

We thank Emily Reynolds and Gavin Strassel, both students at the University of Michigan School of Information, for contributing to the development of this page.
