Learn: A Tutorial on Searching Codebooks for Variables

If you were looking at a "raw" dataset, all you would see are rows and columns of numbers called a "data matrix." There are several ways to make sense of a data matrix. First, statistical packages (e.g., SPSS and Stata) and spreadsheet programs (e.g., Excel) often contain labels for what each column in the data matrix represents. In the datasets you will be using for your investigation, each column of numbers represents a variable, and each row of numbers represents a "case" in the data, which is another term for the unit of analysis.
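To make the idea of rows as cases and columns as variables concrete, here is a minimal Python sketch. The numbers are invented for illustration, and the column names "year" and "sex" are hypothetical; only "beerbar" is a variable name that actually appears in the DDB codebook.

```python
# Hypothetical toy data matrix: each inner list is one case (row),
# and each position in the list is one variable (column).
# All values are invented for illustration.
raw_matrix = [
    [1975, 2, 1],
    [1975, 1, 4],
    [1976, 2, 3],
]

# Column labels, as a statistical package or spreadsheet might store them.
# "year" and "sex" are made-up names; "beerbar" is taken from the DDB codebook.
column_labels = ["year", "sex", "beerbar"]


def label_cases(matrix, labels):
    """Pair every value in every row with the label of its column."""
    return [dict(zip(labels, row)) for row in matrix]


cases = label_cases(raw_matrix, column_labels)
n_cases = len(raw_matrix)          # number of rows
n_variables = len(column_labels)   # number of columns

print(cases[0])
print(n_cases, "cases x", n_variables, "variables")
```

Without the labels, the first row is just three numbers. With them, it reads as one individual's answers to three questions.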

For example, in the DDB dataset, the unit of analysis is the individual. Therefore, if you were looking at the raw dataset, there would be exactly 84,989 rows, one for each person surveyed. Why so many? If you remember, the DDB Needham marketing company surveyed 3,500-4,000 individuals each year beginning in 1975, and the dataset for your investigation goes through 1998.
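A quick back-of-the-envelope check shows why the row count is plausible. This sketch assumes one survey per year for every year from 1975 through 1998, which the text implies but does not state outright.

```python
# Assumption: one survey wave per year, 1975 through 1998 inclusive.
first_year, last_year = 1975, 1998
n_years = last_year - first_year + 1

# Range of respondents per survey, as given in the text.
low_per_year, high_per_year = 3500, 4000

low_total = n_years * low_per_year
high_total = n_years * high_per_year

reported_rows = 84989  # row count given in the text
in_range = low_total <= reported_rows <= high_total

print(n_years, low_total, high_total, in_range)  # 24 84000 96000 True
```

Under that assumption, 24 surveys would produce between 84,000 and 96,000 cases, and 84,989 falls inside that range.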

However, what is unique about using the SDA Web-based software is that you never actually see the raw dataset. The next best way to make sense of it is to look at a codebook. A codebook is like a guidebook for datasets. One of its most important functions is to provide details about each variable in the dataset.

Let's look at an example using the DDB data.

When you first open the DDB dataset, you will see the Authorized Download window, where you must first log on to your account at the ICPSR data archive at the University of Michigan (if you have not already created an account, you can do so now).

Once you have logged on, the following window will appear where you can select Open Extra Codebook Window at the top:


Another window will open that will allow you to select from the three search options under Indexes. Select the Sequential Variable List. Once you do this, a window will appear with the following groupings for the DDB dataset: Sample Information, Leisure/Social/Personal Activities, Civic/Political Activities, Media Activities, Social and Political Attitudes, and Demographic and Social Characteristics.


On the left-hand side is the shorthand variable name, followed immediately by a brief description of the variable. For example, under Leisure/Social/Personal Activities, "beerbar" is the shorthand name for the variable "Went to a bar or tavern (freq last 12 months)." This is not the most detailed description possible, since the best detail would be the exact question wording. However, for the DDB data, this is all that we have, and it is sufficient for our investigation.
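One way to picture the variable list is as a lookup table from shorthand names to descriptions. The sketch below is only an illustration of that idea in Python, not how the SDA software stores its codebook. The single entry is copied from the text above, and the helper functions `describe` and `search` are hypothetical names.

```python
# A tiny stand-in for a codebook. The one entry is copied from the
# tutorial text; the real DDB codebook lists many more variables.
codebook = {
    "beerbar": {
        "group": "Leisure/Social/Personal Activities",
        "description": "Went to a bar or tavern (freq last 12 months)",
    },
}


def describe(variable_name, book):
    """Return a one-line summary for a shorthand variable name."""
    entry = book.get(variable_name)
    if entry is None:
        return f"{variable_name}: not found in codebook"
    return f"{variable_name} [{entry['group']}]: {entry['description']}"


def search(keyword, book):
    """Return the names of variables whose description contains the keyword."""
    keyword = keyword.lower()
    return sorted(
        name
        for name, entry in book.items()
        if keyword in entry["description"].lower()
    )


print(describe("beerbar", codebook))
print(search("tavern", codebook))
```

Searching by keyword is useful when you know the topic you care about but not the shorthand name the dataset uses for it.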

If you select "beerbar," a window showing the frequency of the variable will appear.


This frequency reveals even more information about the variable. For example, a '1' means that an individual in the survey did not go to a bar or tavern in the last 12 months. In addition to what codes such as '1' represent, there is a lot more information packed into this frequency that you will learn about next in Exercise 1. But first, let's return to Exercise 1 and practice what you have just learned.
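If it helps to see how a frequency is built from coded answers, here is a minimal Python sketch. The responses are invented and are not real DDB data. Only the meaning of code '1' comes from the text above, so the other codes are deliberately left unlabeled, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

# Invented coded responses for illustration only; not real DDB data.
responses = [1, 1, 2, 1, 3, 2, 1]

# Only code 1 is explained in the tutorial text, so it is the only
# code labeled here. Look up the other codes in the codebook itself.
value_labels = {1: "Did not go to a bar or tavern in the last 12 months"}


def frequency_table(values, labels):
    """Count how often each code occurs and attach its label if known."""
    counts = Counter(values)
    total = len(values)
    rows = []
    for code in sorted(counts):
        label = labels.get(code, "(see codebook for label)")
        percent = round(100 * counts[code] / total, 1)
        rows.append((code, label, counts[code], percent))
    return rows


table = frequency_table(responses, value_labels)
for code, label, count, percent in table:
    print(code, label, count, percent)
```

Each row of the result pairs a code with its label, the number of cases that gave that answer, and the share of all cases it represents.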