What is a codebook?
A codebook describes the contents, structure, and layout of a data collection. A well-documented codebook "contains information intended to be complete and self-explanatory for each variable in a data file." [1]
Codebooks begin with basic front matter, including the study title, name of the principal investigator(s), table of contents, and an introduction describing the purpose and format of the codebook. Some codebooks also include methodological details, such as how weights were computed, along with the data collection instruments. Others, especially those for larger or more complex data collections, leave that material to a separate user guide or a separately published data collection instrument.
The main body of a codebook contains unambiguous variable-level details. These include the following, illustrated with an example from the National Longitudinal Survey of Youth, 1979 [2]:
- Variable name: The name or number assigned to each variable in the data collection. Some researchers prefer to use mnemonic abbreviations (e.g., EMPLOY1), while others use alphanumeric patterns (e.g., VAR001). For survey data, try to name variables after the question numbers, e.g., Q1, Q2b, etc. [In the example, H40-SF12-2]
- Variable label: A brief description to identify the variable for the user. Where possible, use the exact question or research wording. ["SF12 - ASSESSMENT OF R'S GENERAL HEALTH"]
- Question text: Where applicable, the exact wording from survey questions. ["In general, would you say your health is . . ."]
- Values: The actual coded values in the data for this variable. [1, 2, 3, 4, 5]
- Value labels: The textual descriptions of the codes. [Excellent, Very Good, Good, Fair, Poor]
- Summary statistics: Where appropriate and depending on the type of variable, provide unweighted summary statistics for quick reference. For categorical variables, for instance, frequency counts showing the number of times a value occurs and the percentage of cases that value represents for the variable are appropriate. For continuous variables, minimum, maximum, and median values are relevant.
- Missing data: Where applicable, the values and labels of missing data. Missing data can bias an analysis, so it is important to document how it is coded. Remember to describe all missing codes, including "system missing" and blank. [e.g., Refusal (-1)]
- Universe and skip patterns: Where applicable, information about the population to which the variable refers, as well as the preceding and following variables. [e.g., Default Next Question: H00035.00]
- Notes: Additional notes, remarks, or comments that contextualize the information conveyed in the variable or relay special instructions. For measures or questions from copyrighted instruments, the notes field is the appropriate location to cite the source.
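As a rough illustration, the fields listed above can be held in a small record type, and the unweighted summary statistics computed from the coded data. This is a minimal sketch, not a standard codebook format: the class and function names are invented for illustration, the entry reuses the name, label, codes, and refusal code from the example above, and the response values are made up.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import median


@dataclass
class VariableEntry:
    """One variable-level codebook entry (field names are illustrative)."""
    name: str
    label: str
    question_text: str = ""
    value_labels: dict = field(default_factory=dict)   # code -> label
    missing_codes: dict = field(default_factory=dict)  # code -> label
    universe: str = ""
    notes: str = ""


def frequency_table(entry, responses):
    """Unweighted count and percentage of all cases for each code present.

    Missing codes stay in the table so they are documented, and any code
    without a label is flagged rather than silently dropped.
    """
    counts = Counter(responses)
    total = len(responses)
    labels = {**entry.value_labels, **entry.missing_codes}
    return [
        (code, labels.get(code, "UNDOCUMENTED"), n, round(100 * n / total, 1))
        for code, n in sorted(counts.items())
    ]


def continuous_summary(values):
    """Minimum, maximum, and median for a continuous variable."""
    return {"min": min(values), "max": max(values), "median": median(values)}


entry = VariableEntry(
    name="H40-SF12-2",
    label="SF12 - ASSESSMENT OF R'S GENERAL HEALTH",
    question_text="In general, would you say your health is . . .",
    value_labels={1: "Excellent", 2: "Very Good", 3: "Good", 4: "Fair", 5: "Poor"},
    missing_codes={-1: "Refusal"},
)

# Invented responses, used only to exercise the functions.
responses = [1, 2, 2, 3, 3, 3, 4, 5, -1, 2]
table = frequency_table(entry, responses)
for code, label, n, pct in table:
    print(f"{code:>3}  {label:<12} {n:>3}  {pct:>5}%")

# Invented continuous values (for example, ages in years).
summary = continuous_summary([34, 41, 29, 52])
print(summary)
```

Keeping the missing codes in a separate mapping from the substantive value labels makes it harder to overlook them when the frequencies are written into the codebook.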
For variables that are compiled, created, or constructed, such as those in the Aging of Veterans of the Union Army: Military, Pension, and Medical Records, 1820-1940 [3] study and the Welfare, Children, and Families: A Three-City Study [4], fewer details are needed: variable name and label, as well as a description of how the data were compiled or created.
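To make the idea concrete, here is a minimal sketch of documenting a constructed variable alongside the rule that creates it. Everything in it is hypothetical: the variable names AGE and AGEGRP, the cut points, and the record type are assumptions chosen for illustration and are not drawn from either study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstructedVariable:
    """Codebook entry for a derived variable: name, label, and derivation."""
    name: str
    label: str
    derivation: str


def age_group(age):
    """Recode age in years into three groups (hypothetical cut points)."""
    if age < 35:
        return 1
    if age < 65:
        return 2
    return 3


# The derivation text restates, in plain language, exactly what age_group does.
AGEGRP = ConstructedVariable(
    name="AGEGRP",
    label="Age group of respondent",
    derivation="Recoded from AGE: 1 = under 35, 2 = 35 to 64, 3 = 65 or older.",
)

print(AGEGRP.name, "-", AGEGRP.label)
print(AGEGRP.derivation)
print([age_group(a) for a in (34, 35, 64, 65)])
```

Writing the derivation text next to the recode logic makes it easier to keep the two in agreement when the rule changes.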
The order of variable descriptions in the codebook usually matches the order of the data. To enhance usability for complex or larger data collections, researchers sometimes add appendices listing variable names and labels alphabetically, by sample characteristic, or according to the substantive groups to which they belong, e.g., Demographic Variables, Health Status Variables. These appendices help users locate variables of interest.
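Such appendices can be generated from the variable list rather than maintained by hand. The following sketch assumes each variable is available as a (name, label, group) record; the records and function names are hypothetical and serve only to show the two orderings.

```python
from collections import defaultdict

# Hypothetical records in data-file order: (variable name, label, group).
variables = [
    ("Q3", "Self-rated general health", "Health Status Variables"),
    ("Q1", "Age at interview", "Demographic Variables"),
    ("Q2b", "Highest grade completed", "Demographic Variables"),
]


def alphabetical_index(variables):
    """Variable names and labels sorted by name, ignoring case."""
    pairs = [(name, label) for name, label, _group in variables]
    return sorted(pairs, key=lambda pair: pair[0].lower())


def grouped_index(variables):
    """Variable names and labels listed under each substantive group."""
    groups = defaultdict(list)
    for name, label, group in variables:
        groups[group].append((name, label))
    # Sort group headings, then the variables within each group.
    return {group: sorted(groups[group]) for group in sorted(groups)}


for name, label in alphabetical_index(variables):
    print(f"{name:<5} {label}")

for group, members in grouped_index(variables).items():
    print(group)
    for name, label in members:
        print(f"  {name:<5} {label}")
```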
Codebooks come in a variety of shapes and formats. As long as the content is complete and self-explanatory, the stylistic touches can match the needs of the research project.
Below are additional examples of variable-level details from a wide variety of research codebooks.
American National Election Study, 2008-2009 Panel Study [5]
National Longitudinal Study of Adolescent Health (Add Health), 1994-1995 [6]
General Social Surveys, 1972-2008 [7]
National Survey on Drug Use and Health, 2009 [8]
Capital Punishment in the United States, 1973-2008 [9]
UK Data Archive, "Documenting Your Data/Data Level/Structured Tabular Data": http://www.data-archive.ac.uk/create-manage/document/data-level?index=1
Institute for Health and Care Research Quality Handbook: http://www.emgo.nl/kc/codebook/
Princeton University Data and Statistical Services, "How to Use a Codebook": http://dss.princeton.edu/online_help/analysis/codebook.htm
UCLA Social Science Data Archive, "Codebooks": http://dataarchives.ss.ucla.edu/tutor/tutcode.htm
[1] Guide to the NLSY97 Data. Retrieved August 1, 2011, from http://www.nlsinfo.org/nlsy97/97guide/chap3.htm#threethree
[2] Ohio State University. Center for Human Resource Research. National Longitudinal Survey of Youth, 1979 [Computer file]. ICPSR04683-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2007-09-17. doi:10.3886/ICPSR04683
[3] Fogel, Robert W., et al. Aging of Veterans of the Union Army: Military, Pension, and Medical Records, 1820-1940 [Computer file]. ICPSR06837-v6. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2006-06-05. doi:10.3886/ICPSR06837
[4] Angel, Ronald, Linda Burton, P. Lindsay Chase-Lansdale, Andrew Cherlin, and Robert Moffitt. Welfare, Children, and Families: A Three-City Study [Computer file]. ICPSR04701-v7. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2009-02-10. doi:10.3886/ICPSR04701
[5] American National Election Study, 2008-2009 Panel Study Frequency codebook, version 20090903. Retrieved August 1, 2011, from http://electionstudies.org/studypages/2008_2009panel/anes2008_2009panel_fcodebook.txt
[6] National Longitudinal Study of Adolescent Health (Add Health), Wave I School Administrator Codebook. Retrieved August 1, 2011, from http://www.cpc.unc.edu/projects/addhealth/codebooks/wave1/index.html
[7] Davis, James A., Tom W. Smith, and Peter V. Marsden. General Social Surveys, 1972-2008 [Cumulative File] [Computer file]. ICPSR25962-v2. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut/Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2010-02-08. doi:10.3886/ICPSR25962
[8] United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Office of Applied Studies. National Survey on Drug Use and Health, 2009 [Computer file]. ICPSR29621-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-11-16. doi:10.3886/ICPSR29621
[9] United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics. Capital Punishment in the United States, 1973-2008 [Computer file]. ICPSR27982-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-09-07. doi:10.3886/ICPSR27982