Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
NOTE: By downloading ICPSR metadata records, you agree to ICPSR's Conditions of Use regarding those records.
ICPSR provides study-level metadata via OAI-PMH. Using OAI-PMH is pretty straightforward:
- Ask your IT staff to install an OAI harvester.
- In the harvester software, enter the base URL below as well as the metadataPrefix for the format you wish to download.
- Run the software.
Some harvesters run at the Unix/Linux command line; others use simple web interfaces. ICPSR tested its OAI-PMH implementation using jOAI, which uses a web interface.
An OAI-PMH request is basically a base URL, http://www.icpsr.umich.edu/icpsrweb/ICPSR/oai/studies, with one to three variables tacked onto the end of the URL. The three variables are:
- verb
- identifier
- metadataPrefix (format)
So a URL to retrieve the metadata record for ICPSR 6849 in Dublin Core format would look like this:
For the non-techies, variables are added to the end of a URL after a question mark. Individual variables are constructed as fieldname=value and are separated with ampersands.
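For those who would rather script it, here is a minimal Python sketch of how such a URL could be assembled from the base URL and the three variables. The helper name build_request_url is made up for illustration, and the prefix oai_dc is an assumption: it is the Dublin Core prefix defined by the OAI-PMH specification, so check it against the list of supported prefixes before relying on it.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.icpsr.umich.edu/icpsrweb/ICPSR/oai/studies"


def build_request_url(verb, identifier=None, metadata_prefix=None):
    """Append fieldname=value pairs to the base URL after a question mark."""
    params = {"verb": verb}
    if identifier is not None:
        params["identifier"] = identifier
    if metadata_prefix is not None:
        params["metadataPrefix"] = metadata_prefix
    # urlencode joins the pairs with ampersands and escapes special characters
    return BASE_URL + "?" + urlencode(params)


# "oai_dc" is assumed here (the standard OAI-PMH Dublin Core prefix)
url = build_request_url("GetRecord", identifier="6849", metadata_prefix="oai_dc")
print(url)
```

Verbs that need no other variables, such as Identify, can be built the same way by passing only the verb.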
The metadataPrefix variable spells out the format of the output. For example, you might choose to obtain MARC output, or DDI XML, or Dublin Core. At present, ICPSR supports the following prefixes:
- Dublin Core -
- DDI 2.5 -
- DDI 2.5 with Citations -
- MARC21XML -
ICPSR maintains a bibliography of publications that cite its datasets. If you choose DDI 2.5 with Citations, those citations will be included with the metadata. Please note that this will probably slow down the data harvesting, as it involves importing an additional 65,000+ citation records. Most datasets have 5-20 related citations; our most popular dataset has over 5,000 related citations.
If you would like ICPSR to provide additional formats/objects, please contact us at firstname.lastname@example.org.
The verb variable spells out what kind of result you want to obtain; not all OAI-PMH verbs are useful for our particular implementation of OAI-PMH. The useful verbs are:
- ListRecords - Retrieves 50 records at a time. ICPSR has over 9000 studies, so we use something called a resumptionToken, which will enable scripts to retrieve the entire collection in 50-record increments.
- GetRecord - Returns an individual metadata record; requires an identifier.
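If you are writing your own script rather than using a harvester, the resumptionToken loop for ListRecords might look roughly like the sketch below. It is not ICPSR's own code. The function names harvest and fetch_url are hypothetical, and the sketch assumes the responses follow the standard OAI-PMH 2.0 XML namespace, with an empty or missing resumptionToken marking the last page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"  # standard OAI-PMH 2.0 namespace
BASE_URL = "http://www.icpsr.umich.edu/icpsrweb/ICPSR/oai/studies"


def fetch_url(url):
    """Download one response body; can be replaced by a stub for offline testing."""
    with urlopen(url) as response:
        return response.read()


def harvest(metadata_prefix, fetch=fetch_url, base_url=BASE_URL):
    """Yield every <record> element, following resumptionTokens page by page."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        yield from root.iter(OAI_NS + "record")
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no token, or an empty one, marks the last page
        # Follow-up requests carry only the verb and the token (OAI-PMH rule)
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Iterating over harvest with one of the supported prefixes would then walk the whole collection one page at a time. Passing your own fetch function makes it easy to add delays or retries between requests.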
In addition, there are other OAI-PMH verbs that we don't fully utilize:
- Identify - This just provides a little information on the OAI-PMH service and repository.
- ListSets - Not used by ICPSR.
- ListIdentifiers - This returns a list of ICPSR identifiers (and the release date for each). Since ICPSR identifiers are just the study numbers, this isn't terribly useful.
- ListMetadataFormats - Lists the available metadata formats for a given record; requires an identifier. As the formats ICPSR supports are already listed above, this is mostly useless.
The identifier variable enables you to spell out which object you wish to retrieve, in this case a study. ICPSR identifiers are just the study numbers. You can use either the zero-padded 5-digit study number or the study number without padding; that is, both 06849 and 6849 will work.
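If your local records store study numbers inconsistently, a small helper like the following sketch can produce both accepted forms. The function name identifier_forms is made up for illustration and is not part of any ICPSR tool.

```python
def identifier_forms(study_number):
    """Return the unpadded and the zero-padded 5-digit forms of a study number."""
    number = int(str(study_number).strip())
    if number < 0:
        raise ValueError("study numbers cannot be negative")
    return str(number), f"{number:05d}"


print(identifier_forms("06849"))  # prints ('6849', '06849')
```

Either value in the returned pair can be used as the identifier variable.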
ICPSR can provide some support for OAI-PMH if our server is not responding or the retrieved metadata is not valid. We can also add metadata formats if there is sufficient demand. We cannot provide support for installing or implementing OAI harvesters; that responsibility lies with your IT staff.
If you have questions, please email us at email@example.com.
ICPSR originally tested its OAI-PMH implementation using jOAI. As of 2015-12-18, we were able to download the full set of 9400+ metadata records in three different formats and all the XML was valid.