Survey Research Methods

The study of voting behavior generally relies on information from sample surveys. Aggregate election statistics from states or counties, another common type of election data, are useful for examining the patterns of election results, such as differences in the presidential vote among the 50 states, but such data are not suitable for an analysis that focuses on the individual voter. In order to investigate the factors that affect how people vote, we need information on individuals. Such information commonly includes data on voting behavior, attitudes and beliefs, and personal characteristics. As it is impractical to obtain this information for each member of the electorate, the common procedure is to draw a sample of people from the population and interview these individuals. Once collected, survey data are usually processed and stored in a form allowing for computer-assisted data analysis. This data analysis generally focuses on describing and explaining patterns of political opinion and electoral behavior.

This Dataset

The data for this instructional package are drawn from the 2012 American National Election Study (ANES), sponsored by the University of Michigan and Stanford University. Funding for the 2012 ANES came from the National Science Foundation (NSF). The study interviewed more than 5,900 respondents before and after the election. The study designers were interested in how people respond to different kinds of surveys, so they designed the 2012 ANES to be conducted both through face-to-face interviews and through the Internet. Approximately one-third of the respondents were interviewed face-to-face, while the other two-thirds participated in a Web-based interview. Only a portion of all the information collected by the study is contained in this dataset, and the selected data have been prepared for instructional purposes.

Efficient data analysis requires that the data be recorded, coded, processed, and stored according to standard procedures. This involves representing all information by numeric codes. For example, the information that John Smith is an evangelical Protestant might be stored by recording a value of "2" (evangelical Protestant) on R10 (religion) for respondent "907" (John Smith). This numerically coded information is placed on a storage medium — such as a memory stick — allowing the data to be analyzed with the aid of a computer.


In order to use a dataset, a codebook is needed. The codebook describes the dataset by providing a list of all variables, an explanation of each variable, and a description of the possible values for each variable. The codebook also indicates how the data are stored and organized for use by the computer. A codebook can be thought of as a combination of a map and an index to the dataset.
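The relationship between numeric codes and a codebook can be sketched in a few lines of Python. This is only an illustration: the variable name R10, the code 2, and respondent 907 come from the example above, while the other value labels and the `decode` helper are hypothetical and are not taken from the actual ANES codebook.

```python
# Hypothetical codebook fragment: variable name -> label and value codes.
# Only code 2 for R10 comes from the text; codes 1 and 3 are invented for illustration.
CODEBOOK = {
    "R10": {
        "label": "Religion",
        "values": {
            1: "Mainline Protestant",
            2: "Evangelical Protestant",
            3: "Catholic",
        },
    },
}

# One respondent's record, stored purely as numeric codes.
record = {"respondent_id": 907, "R10": 2}


def decode(record, variable, codebook=CODEBOOK):
    """Translate a stored numeric code back into its codebook description."""
    entry = codebook[variable]
    code = record[variable]
    return f'{entry["label"]}: {entry["values"].get(code, "undocumented code")}'


print(decode(record, "R10"))  # Religion: Evangelical Protestant
```

The dataset itself holds only the numbers; without the codebook, the value 2 on R10 would be uninterpretable.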

Survey Sampling

Many people ask how it is possible to make any generalizations about the American public on the basis of a survey sample of about 5,900 individuals. The truth of the matter is this: it is not possible to do so unless some methodical type of sampling scheme is used. If we just stood on a street corner and asked questions of the first 5,900 people who walked by, we could not, of course, draw any conclusions about the attitudes and opinions of the American public. If, however, we use a methodical sampling scheme, a sample of similar size can yield accurate generalizations about the American public.

A full explanation of the theory of survey sampling is beyond the scope of this instructional package. However, we can introduce some basics. The goal of any social science survey is to reduce the error in making generalizations about the population. Error can have two origins — systematic and random. The goal of proper sampling is to reduce both of these. Systematic error is much more serious than is random error, so we are especially concerned about it.
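One way to see why systematic error is the more serious of the two is a small simulation. The sketch below is illustrative only: the true share of 50 percent, the 5-point bias, and the function name `estimated_share` are all assumptions made up for this example. Random error shrinks as the sample grows, while a systematic error of fixed size stays in the estimate no matter how many people are interviewed.

```python
import random


def estimated_share(n, true_share=0.50, bias=0.0, seed=0):
    """Simulate one survey of n respondents and return the estimated share.

    `bias` is a hypothetical systematic error: it shifts the chance that any
    sampled respondent gives the answer of interest, however large n is.
    """
    rng = random.Random(seed)
    p = true_share + bias
    hits = sum(rng.random() < p for _ in range(n))
    return hits / n


for n in (100, 10_000, 1_000_000):
    unbiased = estimated_share(n, seed=n)
    biased = estimated_share(n, bias=0.05, seed=n)
    print(f"n={n:>9,}  random error only: {unbiased:.3f}  "
          f"with systematic error: {biased:.3f}")
```

As n increases, the first column should settle near 0.50, while the second should settle near 0.55 rather than the true value.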

We can reduce error in a survey through good sampling.

The most basic form of sampling is the simple random sample. This involves drawing a sample from a list of all members of the population in such a way that everybody in the population has the same chance of being selected for the sample.
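A minimal sketch of a simple random sample, assuming a complete list of the population is available (here a made-up list of 10,000 ID numbers). The function name `simple_random_sample` and the sizes are hypothetical; the drawing itself uses the standard library's `random.Random.sample`, which selects without replacement and gives every member the same chance of selection.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement, each with equal probability."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population list: ID numbers 1 through 10,000.
population = list(range(1, 10001))
sample = simple_random_sample(population, 500, seed=42)

print(len(sample), len(set(sample)))  # 500 500 (no one is drawn twice)
```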

Simple random samples are not appropriate for many social science applications. Often, we want samples in which subgroups (women, Southerners, union members, etc.) appear in roughly the same proportions as they do in the population. A simple random sample will not guarantee this. Stratified probability sampling comes closer to this guarantee.
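The idea behind stratification can be sketched as follows. This is a simplified illustration of proportionate stratified sampling, not the ANES design: the two strata, their sizes, and the function name `stratified_sample` are invented for the example. The population is divided into strata first, and a random sample is then drawn within each stratum in proportion to its size.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Draw about n units, allocating to each stratum in proportion to its size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)  # proportional allocation
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population of 10,000 split into two strata.
strata = {
    "South": [f"S{i}" for i in range(3700)],
    "Non-South": [f"N{i}" for i in range(6300)],
}
sample = stratified_sample(strata, 1000, seed=1)

print({name: len(units) for name, units in sample.items()})
# {'South': 370, 'Non-South': 630}
```

Because the allocation is fixed in advance, the sample matches the population on the stratifying variable exactly, something a simple random sample achieves only approximately.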

Simple random samples are impractical in national face-to-face surveys for two main reasons:

  • There is no national list of American adults;
  • The sample would be scattered all over the U.S., making it very expensive to conduct face-to-face interviews.

Therefore, a form of stratified cluster probability sampling is used in national face-to-face surveys.

The actual form of sampling depends on whether the interviews will be conducted in person or by telephone or in some other way such as on the Internet.

Sources of Error in Surveys

Potential sources of error in national surveys include:

  • The sampling procedure itself. Since surveys are based on samples and not the entire population, there is a certain amount of random sampling error in any survey. For a properly constructed national sample with about 2,000 respondents, the margin of error is around +/-2 percentage points. For the large sample of 5,900 in the 2012 ANES, the margin of error should be about +/-1.3 percentage points, but the complex nature of the sample makes it difficult to use simple margin of error calculations;

  • Certain unavoidable systematic errors. For example, the ANES does not conduct face-to-face interviews in Alaska and Hawaii. Also, homeless people and those in penal or mental institutions are not sampled;

  • Survey nonresponse. This is the result of not being able to contact a potential respondent;

  • Refusals to cooperate with the survey by potential respondents. As the number of surveys and polls has increased in recent years, respondents have displayed survey fatigue, and nonresponse has increased over time;

  • Lack of candor by respondents. This involves questions that have socially acceptable answers (Did you vote in the election?) or socially unacceptable answers (Did you cheat on your income tax last year?);

  • Inability of respondents to remember past behaviors. For example, did you contact a public official about some matter in the past year?;

  • Respondents misconstruing survey questions as exams. This can result in them providing answers to questions that they really have not thought much about (non-attitudes);

  • Badly trained interviewers. These might give respondents cues as to how to answer questions, or mis-record respondents' answers, or falsify data;

  • Errors in data preparation, coding, and processing. These can occur when the survey responses are entered into a computer data file.

When these and other errors are random in nature, they are annoying, but when the errors are systematic they can cause great trouble. We can reduce systematic error in a survey through proper training of interviewers and adherence to proper norms of developing and conducting surveys, such as those developed by the American Association for Public Opinion Research (AAPOR).

It is important to be aware of the potential sources of error when working with survey data. Small differences in percentages may be the result of error and so be virtually meaningless.
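The margin-of-error figures quoted in the list above can be approximated with the standard textbook formula for a proportion from a simple random sample. The sketch below assumes the conventional 95 percent confidence level (z = 1.96) and the most conservative case of a 50/50 split; the function name `margin_of_error` is a label chosen for this example. As noted above, the complex design of the 2012 ANES sample means this simple calculation is only a rough guide.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion, simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (2000, 5900):
    print(n, round(100 * margin_of_error(n), 1))
# 2000 2.2
# 5900 1.3
```

With a margin of this size on each percentage, a difference of a point or two between two survey percentages may reflect nothing more than sampling error.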

2012 ANES Data

The 2012 American National Election Study was conducted both face-to-face and via the Internet. The face-to-face interviews utilized Computer-Assisted Personal Interviewing (CAPI) technology: either the interviewer read the questions from a tablet screen to the respondent and recorded his or her answers, or the interviewer handed the tablet to the respondent, who recorded his or her own answers. Great care was taken in identifying the sample, training interviewers, and conducting the interviews. The care that ANES takes in conducting surveys results in data of a very high quality, but it is also expensive. Face-to-face surveys are more expensive to conduct than are telephone or Internet surveys because face-to-face surveys require interviewers to be sent out into the field. To ensure that face-to-face interviews are of high quality, the field interviewers must be very highly trained, as there is little supervision when they are out of the office. Many researchers feel that face-to-face interviews yield "richer" data; the interviewer can ask a variety of follow-up questions, can spend adequate time making the respondent feel comfortable in answering sensitive questions, and can note any special circumstances about the interview that might have affected the answers given. Face-to-face interviews were conducted by Abt SRBI, with one interview conducted before the November election and one after the election. Some 2,056 respondents were interviewed face-to-face.

The Internet sample was identified by GfK (formerly Knowledge Networks). GfK recruits participants for Internet surveys via email and then selects samples from its lists of potential respondents. Respondents for the 2012 ANES Internet survey were selected using both random digit dialing (RDD) sampling of telephone numbers and address-based sampling (ABS) methodology (described below). Respondents for the Internet survey logged into a website where the questions were displayed and answered them online. Some 3,860 respondents were interviewed twice before the November election and twice after the election.

The response rate for the face-to-face 2012 ANES is approximately 49 percent, low in comparison to previous ANES studies. The response rate for the Internet survey is more difficult to calculate but is within Knowledge Panel expectations. The dataset for this instructional package includes only respondents who were interviewed both before and after the election.

The data for this instructional module are weighted. Weighting a dataset is a technical procedure to correct the data for several basic factors. The goal of weighting is to produce a dataset that is more demographically representative of the population. When the data are weighted, some respondents count for more than one person (they have a sample weight greater than 1.0) and some count for less than one person (they have a sample weight less than 1.0). You need not be overly concerned about weighting, as the dataset is designed to be automatically weighted when you open it, and you will only sometimes notice that you are working with weighted data. Some small anomalies that you may occasionally notice will be the result of working with weighted data.
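How weights change a result can be seen in a toy calculation. The four responses and their weights below are made up for illustration and have nothing to do with the actual ANES weights; `weighted_share` is a hypothetical helper. Each respondent contributes his or her weight, rather than a count of one, to the totals.

```python
def weighted_share(responses, weights, category):
    """Share of the weighted sample giving `category` as the answer."""
    total = sum(weights)
    matching = sum(w for r, w in zip(responses, weights) if r == category)
    return matching / total


# Hypothetical data: four respondents and their sample weights.
responses = ["yes", "no", "yes", "no"]
weights = [1.5, 0.5, 1.0, 1.0]

unweighted = responses.count("yes") / len(responses)
weighted = weighted_share(responses, weights, "yes")

print(unweighted, weighted)  # 0.5 0.625
```

Because weighted counts are sums of weights, they are often not whole numbers, which is one source of the small anomalies mentioned above.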