Survey Research Methods

The study of voting behavior generally relies on information from sample surveys. Aggregate election statistics from states or counties, another common type of election data, are useful for examining the patterns of election results, such as differences in the presidential vote among the 50 states, but such data are not suitable for an analysis that focuses on the individual voter. In order to investigate the factors that affect how people vote, we need information on individuals. Such information commonly includes data on voting behavior, attitudes and beliefs, and personal characteristics. As it is impractical to obtain this information for each member of the electorate, the common procedure is to draw a sample of people from the population and interview these individuals. Once collected, survey data are usually processed and stored in a form allowing for computer-assisted data analysis. This data analysis generally focuses on describing and explaining patterns of political opinion and electoral behavior.

This Dataset

The data for this instructional package are drawn from the 2016 American National Election Study (ANES), sponsored by the University of Michigan and Stanford University. Funding for the 2016 ANES came from the National Science Foundation (NSF). The study interviewed more than 4,270 respondents before and after the election, but we have included only the 3,649 respondents who completed both pre-election and post-election interviews. The study designers were interested in how people respond to different kinds of surveys, so they designed the 2016 ANES to be conducted both through face-to-face interviews and through the Internet. Approximately 27 percent of the respondents were interviewed face-to-face, while the other 73 percent participated in a Web-based interview. Only a portion of all the information collected by the study is contained in this dataset, and the selected data have been prepared for instructional purposes.

Efficient data analysis requires that the data be recorded, coded, processed, and stored according to standard procedures. This involves representing all information by numeric codes. For example, the information that John Smith is an evangelical Protestant might be stored by recording a value of "2" (evangelical Protestant) on R12 (religion) for respondent "907" (John Smith). This numerically coded information is placed on a storage medium—such as a memory stick—allowing the data to be analyzed with the aid of a computer.


In order to use a dataset, a codebook is needed. The codebook describes the dataset by providing a list of all variables, an explanation of each variable, and a description of the possible values for each variable. The codebook also indicates how the data are stored and organized for use by the computer. A codebook can be thought of as a combination of a map and an index to the dataset.
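The relationship between coded data and a codebook can be sketched in a few lines of Python. This is only an illustration, not the actual ANES file layout: the dictionary structure and the `decode` helper are hypothetical, and the only values taken from the text are respondent 907, variable R12 (religion), and code 2 (evangelical Protestant).

```python
# Hypothetical codebook: variable name -> label and meaning of each code.
# Only code 2 for R12 comes from the text; a real codebook lists every value.
CODEBOOK = {
    "R12": {
        "label": "religion",
        "values": {2: "evangelical Protestant"},
    },
}

# Hypothetical data store: one record of numeric codes per respondent ID.
DATA = {907: {"R12": 2}}


def decode(respondent_id, variable):
    """Translate a stored numeric code back into its codebook meaning."""
    code = DATA[respondent_id][variable]
    entry = CODEBOOK[variable]
    return entry["label"], entry["values"].get(code, "unknown code")


print(decode(907, "R12"))
```

The data file holds only numbers; the codebook is what makes those numbers interpretable.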

Survey Sampling

Many people ask how it is possible to make any generalizations about the American public on the basis of a survey sample of 3,649 individuals. The truth of the matter is this: it is not possible to do so unless some methodical type of sampling scheme is used. If we just stood on a street corner and asked questions of the first 3,649 people who walked by, we could not, of course, draw any conclusions about the attitudes and opinions of the American public. If, however, we use a methodical sampling scheme, a sample of similar size can yield accurate generalizations about the American public.

A full explanation of the theory of survey sampling is beyond the scope of this instructional package. However, we can introduce some basics. The goal of any social science survey is to reduce the error in making generalizations about the population. Error can have two origins—systematic and random. The goal of proper sampling is to reduce both of these. Systematic error is much more serious than is random error, so we are especially concerned about it.

We can reduce error in a survey through good sampling.

The most basic form of sampling is the simple random sample. This involves drawing a sample from a list of all members of the population in such a way that everybody in the population has the same chance of being selected for the sample.
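A minimal sketch of a simple random sample, assuming a complete list of the population is available. The population of 10,000 ID numbers, the sample size of 100, and the function name are made up for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member has the same chance."""
    rng = random.Random(seed)  # a fixed seed only makes the example repeatable
    return rng.sample(population, n)


# Hypothetical population list: ID numbers 1 through 10,000.
population = list(range(1, 10001))
sample = simple_random_sample(population, 100, seed=1)
print(len(sample), "people drawn, for example:", sample[:5])
```

The essential requirement is the list itself: without a complete list of the population, this procedure cannot be carried out.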

Simple random samples are not appropriate for many social science applications. Often, we want to be sure that subgroups (women, southerners, Latinos, etc.) are represented in the sample in roughly the same proportions as in the population. A simple random sample will not guarantee this. Stratified probability sampling comes closer to this guarantee.
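One common way to stratify, shown here only as a sketch, is proportional allocation: divide the population into subgroups and draw a random sample within each subgroup in proportion to its size. The population below (300 southerners and 700 others) and the function name are hypothetical, and this is not a description of the ANES design.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, n, seed=None):
    """Draw about n members, allocating draws to strata by population share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        # Rounding means the total can differ slightly from n in general.
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: each person is a (region, ID) pair.
population = [("South", i) for i in range(300)] + [("Other", i) for i in range(700)]
sample = stratified_sample(population, lambda person: person[0], 100, seed=1)
south = sum(1 for region, _ in sample if region == "South")
print(south, "of", len(sample), "sampled people are from the South")
```

Because the draw is made within each subgroup, the sample reproduces the 30 percent share of southerners by construction rather than by chance.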

Simple random samples are impractical in national face-to-face surveys for two main reasons:

  • There is no national list of American adults;
  • The sample would be scattered all over the U.S., making it very expensive to conduct face-to-face interviews.

Therefore, a form of stratified cluster probability sampling is used in national face-to-face surveys.

The actual form of sampling depends on whether the interviews will be conducted in person or by telephone or in some other way such as on the Internet.

Sources of Error in Surveys

Errors are of two types: random and systematic. Random errors occur any time we measure anything; re-measuring the same thing will often give a slightly different result, which is why carpenters say "measure twice, cut once." Systematic error occurs when the device we are using to do our measuring is badly calibrated, like a ruler with the first half-inch missing. Surveys can contain both types of error.
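The difference between the two kinds of error can be illustrated with a small simulation. The numbers are invented for the example: a board 12 inches long, random measurement noise with a standard deviation of 0.1 inch, and a ruler missing its first half-inch, which makes every reading 0.5 inch too long.

```python
import random
import statistics


def measure(true_length, n, bias=0.0, noise_sd=0.1, seed=0):
    """Simulate n readings: truth plus a fixed bias plus random noise."""
    rng = random.Random(seed)
    return [true_length + bias + rng.gauss(0, noise_sd) for _ in range(n)]


true_length = 12.0
good_ruler = measure(true_length, 10_000)             # random error only
short_ruler = measure(true_length, 10_000, bias=0.5)  # random plus systematic

print("good ruler mean:  ", round(statistics.mean(good_ruler), 2))
print("broken ruler mean:", round(statistics.mean(short_ruler), 2))
```

Averaging many readings cancels most of the random error, but the systematic error remains no matter how many measurements are taken.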

We can typically deal effectively with random error by phrasing our conclusions in probabilistic terms rather than certainties. So we say, for example, that one position on an issue is favored by more people than the other but that the difference is within the margin of error for a sample of the size taken for this survey.

We also attempt to minimize systematic error by making sure that our questions are well worded, by training our interviewers well, and by adhering to proper norms for developing and conducting surveys, such as those developed by the American Association for Public Opinion Research (AAPOR).

Potential sources of error in national surveys include:

  • The sampling procedure itself. Since surveys are based on samples and not the entire population, there is a certain amount of random sampling error in any survey. For a properly constructed national sample with about 2,000 respondents, the margin of error is around +/-2 percentage points. For the large sample of 4,270 in the 2016 ANES, the margin of error should be about +/-1.5 percentage points, but the complex nature of the sample makes it difficult to use simple margin of error calculations;
  • Certain unavoidable systematic errors in the sample. For example, the ANES does not conduct face-to-face interviews in Alaska and Hawaii. Also, homeless people and those in penal or mental institutions are not sampled;
  • Survey nonresponse. This is the result of not being able to contact a potential respondent. If nonrespondents differ systematically from respondents, the survey results will be biased;
  • Refusals to cooperate with the survey by potential respondents. As the number of surveys and polls has increased in recent years, respondents have displayed survey fatigue, and refusals have increased over time;
  • Lack of candor by respondents. This involves questions that have socially acceptable answers (Did you vote in the election?) or socially unacceptable answers (Did you cheat on your income tax last year?);
  • Inability of respondents to remember past behaviors. For many students of politics, it is difficult to believe that some people just don't remember who they voted for, or they remember it wrongly. ANES has found, for example, that after an election more people remember voting for the winner than actually voted for him or her. Other political behaviors are harder for people to recall (did you contact a public official about some matter in the past year?);
  • Respondents misconstruing survey questions as exams. This can result in them providing answers to questions that they really have not thought much about (what has been termed non-attitudes);
  • Badly trained interviewers. They may give respondents cues as to how to answer questions, or mis-record respondents' answers, or falsify data;
  • Errors in the preparation, coding, and processing. These can occur when entering the survey into a computer data file.

It is important to be aware of the potential sources of error when working with survey data. Small differences in percentages may be the result of error and so be virtually meaningless.
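The margins of error quoted above for the sampling procedure can be approximated with the textbook formula for a simple random sample. The sketch below assumes a 95 percent confidence level (z = 1.96) and the most conservative case of a 50/50 split (p = 0.5). These are assumptions of the example, and, as noted above, the complex design of the actual ANES sample means the simple formula is only a rough guide.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (2000, 4270):
    points = 100 * margin_of_error(n)
    print(f"n = {n}: about +/-{points:.1f} percentage points")
```

Under these assumptions the formula gives roughly 2.2 points for 2,000 respondents and 1.5 points for 4,270, in line with the figures in the text. A difference between two percentages that is smaller than this margin should not be treated as meaningful.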

2016 ANES Data

The 2016 American National Election Study was conducted both face-to-face and via the Internet. For the face-to-face interviews, the interviewer read the questions from a laptop screen to the respondent and recorded his or her answers. For certain sensitive questions, such as household income, the interviewer handed the laptop to the respondent, who recorded his or her own answers, a process known as computer-assisted self-interviewing. Great care was taken in identifying the sample, training interviewers, and conducting the interviews. The care that ANES takes in conducting surveys results in data of very high quality, but it is also expensive. Face-to-face surveys are more expensive to conduct than are telephone or Internet surveys because face-to-face surveys require interviewers to be sent out into the field. To ensure that face-to-face interviews are of high quality, the field interviewers must be very highly trained, as there is little supervision when they are out of the office. Many researchers feel that face-to-face interviews yield "richer" data; the interviewer can ask a variety of follow-up questions, can spend adequate time making the respondent feel comfortable in answering sensitive questions, and can note any special circumstances about the interview that might have affected the answers given. Face-to-face interviews were conducted by Westat, with one interview conducted before the November election and one after the election. Some 1,181 respondents were interviewed face-to-face. Interviews were conducted in both English and Spanish.

Respondents for the Internet survey logged in to a website where the questions were displayed and answered them online. Some 3,090 respondents participated in the Internet survey. They also completed interviews before and after Election Day.

The response rate for the face-to-face 2016 ANES was approximately 50 percent, while the response rate for the Internet survey was 44 percent. The dataset for this instructional package includes only respondents who were interviewed both before and after the election.

The data for this instructional module are weighted. Weighting a dataset is a technical procedure to correct the data for several basic factors. The goal of weighting is to produce a dataset that is more demographically representative of the population. When the data are weighted, some respondents count for more than one person (they have a sample weight greater than 1.0) and some count for less than one person (they have a sample weight less than 1.0). You need not be overly concerned about weighting, as the dataset is designed to be automatically weighted when you open it, and you will only sometimes notice that you are working with weighted data. Some small anomalies that you may occasionally notice will be the result of working with weighted data.
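The effect of weighting on a simple percentage can be sketched as follows. The four respondents, their answers, and their weights are invented for illustration and are not ANES values; the function name is likewise hypothetical.

```python
def weighted_share(responses, weights, category):
    """Share of the weighted sample giving a particular answer."""
    total = sum(weights)
    matching = sum(w for r, w in zip(responses, weights) if r == category)
    return matching / total


# Hypothetical data: four respondents, two answers, one weight per respondent.
responses = ["A", "B", "A", "B"]
weights = [1.5, 0.5, 1.0, 1.0]

unweighted = responses.count("A") / len(responses)
weighted = weighted_share(responses, weights, "A")
print("unweighted share of A:", unweighted)
print("weighted share of A:  ", weighted)
```

Here the first respondent counts for one and a half people and the second for half a person, so the weighted share of answer A (62.5 percent) differs from the unweighted share (50 percent). Small discrepancies of this kind are the anomalies mentioned above.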