Guide to Social Science Data Preparation and Archiving:
Phase 3: Data Collection and File Creation

Qualitative Data

With proper and complete documentation, archived qualitative data can provide a rich source of research material to be reanalyzed, reworked, and compared to other data.

The UK Data Service, a national center of expertise in data archiving in the United Kingdom, suggests five possible reuses of qualitative data (2007):

  • Comparative research, replication or restudy of original research -- comparing with other data sources or providing comparison over time or between social groups or regions, etc.
  • Re-analysis -- asking new questions of the data and making different interpretations than the original researcher made.
  • Research design and methodological advancement -- designing a new study or developing a methodology or research tool by studying sampling methods, data collection, and fieldwork strategies.
  • Description -- describing the contemporary and historical attributes, attitudes and behavior of individuals, societies, groups, or organizations.
  • Teaching and learning -- providing unique materials for teaching and learning research methods.

Types of qualitative data

Examples of types of qualitative data that may be archived for secondary analysis include:

  • In-depth/unstructured interviews, including video
  • Semi-structured interviews
  • Structured interview questionnaires containing substantial open comments
  • Focus groups
  • Unstructured or semi-structured diaries
  • Observation field notes/technical fieldwork notes
  • Case study notes
  • Minutes of meetings
  • Press clippings
  • Court transcripts

This list is not exhaustive. Questions about what can be submitted for deposit should be discussed with archive staff.

Confidentiality in qualitative data

Ideally, before submitting qualitative data to an archive, data depositors should remove information that would allow any of their research subjects to be identified. This process can be made less arduous by creating an anonymization scheme before data collection and anonymizing the data as the qualitative files are created for analysis.

The following are examples of modifications that can be made to qualitative data to ensure respondent confidentiality (Marz and Dunn, 2000):

  • Replace actual names with generalized text. For example, “John” can be changed to “uncle” or “Mrs. Briggs” to “teacher.” When more than one person has the same relationship to the respondent, a number can be appended to represent each unique individual -- e.g., friend1, friend2. Demographic information can also be substituted for actual names of individuals, e.g., “John” can be changed to “M/W/20” for male, white, 20 years old. Pseudonyms can be used; however, they may not be as informative to future users as other methods of name replacement. Note that the names to be replaced are not only those of people: they may also be store names, names of juvenile facilities, transportation systems, program names, neighborhood names, or other geographic locations, along with their acronyms and well-known or frequently used nicknames.
  • Replace dates. Dates referring to specific events, especially birthdates or events involving the criminal justice system, should be replaced with some general marker for the information, e.g., “month,” “month/year,” or “mm/dd/yy.”
  • Remove unique and/or publicized items. If the item cannot be generalized using one of the above options, the entire text may need to be removed and explicitly marked as such, e.g., using either “description of event removed,” or ellipses (“ ... ”) as a general indicator.
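The modifications listed above could be applied programmatically along the following lines. This is a minimal Python sketch, not a procedure prescribed by any archive: the `anonymize` function, the `NAME_LABELS` mapping, the numeric date pattern, and the bracketed removal marker are all assumptions made for illustration.

```python
import re

# Hypothetical replacement scheme; names and labels are illustrative only.
NAME_LABELS = {
    "John": "uncle",
    "Mrs. Briggs": "teacher",
    "Dave": "friend1",
    "Maria": "friend2",
}

# Matches numeric dates such as 3/14/98 or 03/14/1998 (an assumed format).
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})\b")

# Assumed marker; the brackets are added here to set it off from the transcript.
REMOVED_MARKER = "[description of event removed]"


def anonymize(text, name_labels, unique_passages=()):
    """Return text with unique passages, names, and dates masked."""
    # Remove unique or publicized passages first, marking each removal.
    for passage in unique_passages:
        text = text.replace(passage, REMOVED_MARKER)

    # Replace longer names first so a full name is handled before any
    # shorter name that it contains.
    for name in sorted(name_labels, key=len, reverse=True):
        label = name_labels[name]
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        text = pattern.sub(lambda match, label=label: label, text)

    # Replace specific dates with a general marker.
    return DATE_PATTERN.sub("mm/dd/yy", text)


sample = "John drove Mrs. Briggs home on 03/14/1998 after the fire at the old mill."
masked = anonymize(sample, NAME_LABELS, ["the fire at the old mill"])
print(masked)
```

Pattern matching of this kind only catches the exact strings it is given, so nicknames, misspellings, and identifying details that emerge from context still call for a manual read of each transcript.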

Since investigators are most familiar with their data, they are asked to use their judgment on whether certain qualitative information in combination with the rest of the text or related quantitative information could allow an individual to be identified.

Data depositors should document any modifications made to mask confidential information in the qualitative data. This ensures that archive staff do not make unnecessary changes to the investigator’s modifications when performing their confidentiality review. The documentation will also be made available to secondary users to assist them in using the data.
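One way to keep such a record is a simple tabular log with one row per modification. The sketch below is only an illustration: the `Modification` fields, the `write_log` helper, and the example file name are hypothetical, and an archive may prefer a different format. The log deliberately records the kind of information masked and the marker now in the text, never the original identifying value.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Modification:
    file_name: str      # transcript in which the change was made
    kind: str           # e.g. "name", "date", "removal"
    replaced_with: str  # the marker that now appears in the text
    note: str = ""      # rationale; never the original identifying value


def write_log(modifications, stream):
    """Write the modifications to an open text stream as CSV."""
    column_names = [field.name for field in fields(Modification)]
    writer = csv.DictWriter(stream, fieldnames=column_names)
    writer.writeheader()
    for modification in modifications:
        writer.writerow(asdict(modification))


log = [
    Modification("interview_01.txt", "name", "teacher", "teacher named by respondent"),
    Modification("interview_01.txt", "date", "mm/dd/yy", "date of court hearing"),
]
buffer = io.StringIO()
write_log(log, buffer)
print(buffer.getvalue())
```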

Documentation for qualitative data

For qualitative data to be used in secondary analysis, it is extremely important that the data are well documented. Any information that could provide context and clarity to a secondary user should be provided. Specifically, documentation for qualitative data should include:

  • Full documentation of research methods and practices (including the informed consent process)
  • Blank copy of informed consent form with IRB approval number
  • Details on setting of interviews
  • Details on selection of interview subjects
  • Instructions given to interviewers
  • Data collection instruments such as interview questionnaires
  • Steps taken to remove direct identifiers in the data (e.g., name, address)
  • Any problems that arose during the selection and/or interview process and how they were handled
  • Interview roster

The purpose of the interview roster is twofold. First, it provides archive staff a means of checking the completeness and accuracy of the data collection provided for archiving. Second, the interview roster provides a summary listing of available interviews to a secondary user to allow for a more focused review of the data.
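The completeness check described above can be sketched as a comparison between the roster and the deposited files. The example below is a minimal illustration under stated assumptions: the `check_roster` function is hypothetical, and it assumes that each transcript file is named after the interview identifier listed in the roster, which the guide does not require.

```python
from pathlib import Path


def check_roster(roster_ids, file_names):
    """Compare roster identifiers with transcript file names.

    Returns two sorted lists: identifiers on the roster with no matching
    file, and files whose identifier is not on the roster.
    """
    expected = set(roster_ids)
    # Assumes the file stem (name without extension) is the interview ID.
    found = {Path(name).stem for name in file_names}
    missing_files = sorted(expected - found)
    unlisted_files = sorted(found - expected)
    return missing_files, unlisted_files


roster = ["INT001", "INT002", "INT003"]
deposited = ["INT001.txt", "INT003.txt", "INT009.txt"]
missing, unlisted = check_roster(roster, deposited)
print("On roster but not deposited:", missing)
print("Deposited but not on roster:", unlisted)
```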