Survey Research Methods
The study of voting behavior generally relies on information from sample surveys. Aggregate election statistics from states or counties, another common type of election data, are useful for examining the patterns of election results, such as differences in the presidential vote among the fifty states, but such data are not suitable for an analysis that focuses on the individual voter. In order to investigate the factors that affect how people vote, we need information on individuals. Such information commonly includes data on voting behavior, attitudes and beliefs, personal characteristics, and so on. Since it is impractical to obtain this information for each member of the electorate, the common procedure is to draw a sample of people from the population and interview these individuals. Once collected, survey data are usually processed and stored in a form allowing for computer-assisted data analysis. This data analysis generally focuses on describing and explaining patterns of political opinion and electoral behavior.
The data for this instructional package are drawn from the 2008 American National Election Study (ANES), conducted by the Center for Political Research at The University of Michigan. These data are available for download as ICPSR Study Number 25383. Based on a very large sample (over 2,000 people), the study interviewed respondents both before and after the election. Only a portion of all the information collected by the study is contained in this dataset, and the selected data have been prepared especially for instructional purposes.
Efficient data analysis requires that the data be recorded, coded, processed, and stored according to standard procedures. Essentially, this involves representing all information by numeric codes. For example, the information that John Smith is an evangelical Protestant would be stored by recording a value of "2" (evangelical Protestant) on variable 179 (religious affiliation) for respondent "907" (John Smith). This numerically coded information is placed on a storage medium, such as a memory stick, so that the data can be analyzed with the aid of a computer. In the past, many large surveys were analyzed on large "mainframe" computers; nowadays, personal computers are powerful enough to handle such analyses.
In order to use a dataset, a codebook is needed. The codebook describes the dataset by providing a list of all variables, an explanation of each variable, and a description of the possible values for each variable. The codebook also indicates how the data are stored and organized for use by the computer. A codebook can thus be thought of as a combination of a map and an index to the dataset.
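To make the "map and index" idea concrete, here is a minimal sketch of a codebook as a lookup table. Only variable 179, its code 2, and respondent 907 come from the example above; the dictionary layout and the `decode` helper are hypothetical illustrations, not the actual ANES codebook format.

```python
# Hypothetical miniature codebook. Only variable 179 and code 2 come from the
# text; the structure itself is an illustrative assumption.
CODEBOOK = {
    179: {
        "label": "religious affiliation",
        "values": {2: "evangelical Protestant"},
    },
}

# Raw numeric data: respondent ID -> {variable number: numeric code}
DATA = {
    907: {179: 2},  # respondent 907 (John Smith in the text's example)
}


def decode(respondent_id: int, variable: int) -> str:
    """Translate a stored numeric code into its meaning using the codebook."""
    code = DATA[respondent_id][variable]
    entry = CODEBOOK[variable]
    value_label = entry["values"].get(code, f"undocumented code {code}")
    return f"{entry['label']}: {value_label}"


print(decode(907, 179))  # religious affiliation: evangelical Protestant
```

Without the codebook, the stored value "2" is just a number; the codebook is what turns it back into information.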
C. Survey Sampling
Many people ask how it is possible to make any generalizations about the American public on the basis of a survey sample of about 2000 individuals. The truth of the matter is that it is not possible unless a methodical sampling scheme is used. If we simply stood on a street corner and questioned the first 2000 people who walked by, we could not, of course, draw any conclusions about the attitudes and opinions of the American public. If, however, we use a proper sampling scheme, a sample of similar size can yield quite accurate generalizations about the American public.
A full explanation of the theory of survey sampling is beyond the scope of this instructional package. However, we can introduce some basics. The goal of any social science survey is to reduce the error in making generalizations about the population. Error can have two origins: systematic and random. Good sampling reduces both. Systematic error is much more serious than random error, so we are especially concerned about it: random error tends to cancel out and shrinks as the sample grows, whereas systematic error pushes results consistently in one direction and does not shrink no matter how large the sample is.
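The difference between the two kinds of error can be seen in a small simulation. This is a sketch under stated assumptions: the true population share (50%) and the size of the systematic bias (5 points, standing in for something like a flawed street-corner sample) are hypothetical values chosen for illustration.

```python
import random
import statistics

TRUE_SHARE = 0.50  # hypothetical true share of the population holding some opinion


def estimate(n, bias, rng):
    """Share answering 'yes' in one sample of n people.

    bias > 0 models a systematic error (e.g., a flawed sampling scheme that
    over-represents 'yes' people); bias = 0 leaves only random sampling error.
    """
    p = TRUE_SHARE + bias
    return sum(rng.random() < p for _ in range(n)) / n


def average_error(n, bias, trials=200, seed=1):
    """Return (mean error, spread) of the estimate over many repeated samples."""
    rng = random.Random(seed)
    estimates = [estimate(n, bias, rng) for _ in range(trials)]
    return statistics.mean(estimates) - TRUE_SHARE, statistics.stdev(estimates)


for n in (100, 2000):
    for bias in (0.0, 0.05):
        mean_error, spread = average_error(n, bias)
        print(f"n={n:5d} bias={bias:.2f} mean error={mean_error:+.3f} spread={spread:.3f}")
```

The spread (random error) should fall from roughly 0.05 at n=100 to roughly 0.01 at n=2000, while the mean error in the biased runs should stay near 0.05 at both sample sizes: a bigger sample does nothing to fix systematic error.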
The most basic form of sampling is the simple random sample. This involves drawing a sample from a list of all members of the population in such a way that everybody in the population has the same chance of being selected for the sample.
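A simple random sample can be sketched with Python's standard `random.sample`, which draws without replacement so that every member of the list has the same chance of selection. The population list here is a hypothetical stand-in for a complete sampling frame.

```python
import random

# Hypothetical sampling frame: a complete list of 10,000 population members.
population = [f"person_{i}" for i in range(10_000)]

rng = random.Random(2008)  # fixed seed so the draw is reproducible
sample = rng.sample(population, k=2000)  # equal chance for everyone, no repeats

# Each member's chance of selection is k / N.
print(len(sample) / len(population))  # 0.2
```

Note that the sketch depends entirely on having the full list, which is exactly what national surveys lack.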
Simple random samples are not appropriate for most social science applications. Often, we want to be sure that sub-groups (women, Southerners, union members, etc.) make up about the same share of the sample as they do of the population. A simple random sample does not guarantee this; stratified probability sampling comes closer to doing so.
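One common form, proportional stratified sampling, divides the population into strata and then draws a simple random sample within each stratum, sized to the stratum's share of the population. The sketch below is an illustration of that general technique, not the ANES design; the 37% Southern share and the `stratified_sample` helper are hypothetical.

```python
import random
from collections import Counter


def stratified_sample(frame, stratum_of, n, seed=0):
    """Proportional stratified sample: each stratum receives a share of the n
    interviews equal to its share of the frame, then a simple random sample is
    drawn within each stratum. (Rounding can make the total differ slightly
    from n when shares do not divide evenly.)"""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(frame))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical frame: 10,000 people, 37% of them Southerners.
frame = [("South" if i < 3700 else "Non-South", i) for i in range(10_000)]
sample = stratified_sample(frame, stratum_of=lambda unit: unit[0], n=2000)
counts = Counter(region for region, _ in sample)
print(counts)
```

Every draw yields exactly 740 Southerners out of 2000 (37%), whereas a simple random sample of the same frame would give a number that varies from draw to draw.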
Simple random samples are impractical in national surveys for two main reasons:
- There is no national list of American adults
- The sample would be scattered all over the US, making it very expensive to conduct face-to-face interviews
Therefore, a form of stratified probability sampling is used in national surveys.
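The second problem above, scattered interviews, is commonly handled by sampling in stages: first select geographic areas, then select people within the chosen areas, so interviewers can work in a limited number of places. The sketch below is a generic two-stage cluster design assumed for illustration; real national designs such as the ANES have more stages and more elaborate selection rules, and the area counts and `two_stage_sample` helper here are hypothetical.

```python
import random


def two_stage_sample(areas, n_areas, per_area, seed=0):
    """areas: dict mapping area name -> list of residents (hypothetical frame).
    Stage 1: randomly choose n_areas areas.
    Stage 2: randomly choose per_area residents within each chosen area."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(areas), n_areas)
    return {area: rng.sample(areas[area], per_area) for area in chosen}


# Hypothetical frame: 500 equal-sized areas of 200 residents (100,000 people).
areas = {f"area_{a}": [f"area_{a}_person_{i}" for i in range(200)] for a in range(500)}
result = two_stage_sample(areas, n_areas=100, per_area=20)
print(len(result), sum(len(people) for people in result.values()))  # 100 2000
```

Because the areas here are equal in size, every resident still has the same chance of selection, (100/500) × (20/200) = 0.02, yet all 2000 interviews fall in only 100 places. With unequal area sizes, selection probabilities would have to be adjusted to keep the sample representative.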
D. Sources of Error in Surveys
Potential sources of error in national surveys include:
- the sampling procedure itself. Since surveys are based on samples rather than the entire population, there is a certain amount of random sampling error in any survey. For a properly constructed national sample such as the ANES, with about 2000 respondents, the margin of error is less than ±3 percentage points;
- certain unavoidable systematic errors: the ANES does not interview in Alaska and Hawaii, for example;
- survey non-response as a result of not being able to contact a potential respondent;
- refusals to cooperate with the survey by potential respondents. As the number of surveys and polls has increased in recent years, respondents have displayed what has been called survey fatigue, and non-response has increased over time;
- lack of candor by respondents on questions that have socially acceptable answers (Did you vote in the election?) or socially unacceptable answers (Did you cheat on your income tax last year?);
- the inability of respondents to remember past behaviors (Who did you vote for in the last presidential election?);
- respondents misconstruing survey questions as exams and so providing answers to questions that they really have not thought much about (non-attitudes);
- badly trained interviewers who might give respondents cues as to how to answer questions, or who mis-record respondents' answers, or who falsify data, and so on;
- errors in the preparation, coding, and processing of the survey into a computer data file.
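The ±3-point figure for sampling error can be checked with the standard formula for the margin of error of a proportion, z·√(p(1−p)/n), using p = 0.5 (the worst case) and z = 1.96 for 95% confidence. The `design_effect` factor is an assumption added here to show how clustered designs inflate the error; the value 1.5 is hypothetical, not an ANES figure.

```python
import math


def margin_of_error(p, n, z=1.96, design_effect=1.0):
    """Approximate 95% margin of error for a sample proportion p with n
    respondents. design_effect > 1 inflates the variance to reflect a
    clustered or stratified design (assumed value, for illustration)."""
    return z * math.sqrt(design_effect * p * (1 - p) / n)


print(round(100 * margin_of_error(0.5, 2000), 1))                     # 2.2
print(round(100 * margin_of_error(0.5, 2000, design_effect=1.5), 1))  # 2.7
```

Both values stay below 3 percentage points. Keep in mind that this formula covers only random sampling error; none of the other sources of error listed above are captured by it.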
Where these and other errors are random in nature, they are annoying; when the errors are systematic, however, they bias results in one direction and can cause great trouble. We can reduce systematic error in a survey through proper training of interviewers and adherence to accepted standards for developing and conducting surveys, such as those developed by the American Association for Public Opinion Research (AAPOR).
It is important to be aware of the potential sources of error when working with survey data. Small differences in percentages may be the result of error and so be virtually meaningless.
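A rough way to judge whether a difference between two percentages is larger than sampling error alone would produce is to compare it with the standard error of the difference. This sketch assumes two independent simple random samples and a 95% level; the group sizes and percentages in the examples are hypothetical, and it ignores design effects and non-sampling error.

```python
import math


def difference_is_clear(p1, n1, p2, n2, z=1.96):
    """True if |p1 - p2| exceeds z standard errors of the difference between
    two independent sample proportions (simple random sampling assumed)."""
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return abs(p1 - p2) > z * se_diff


# Hypothetical: 52% of 1000 women vs. 49% of 1000 men.
print(difference_is_clear(0.52, 1000, 0.49, 1000))  # False
# Hypothetical: 55% vs. 45% in groups of the same size.
print(difference_is_clear(0.55, 1000, 0.45, 1000))  # True
```

With groups of about 1000, a 3-point gap falls within the roughly 4.4-point band that sampling error alone could produce, while a 10-point gap does not.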
The 2008 American National Election Study was conducted entirely face-to-face using Computer-Assisted Personal Interviewing (CAPI) technology.1 Great care was taken in identifying the sample, training interviewers, and conducting the interviews. Each interviewer was given a laptop computer with survey questionnaire software pre-installed, so that the interviewer could enter the respondent's answers as the interview proceeded. The care that ANES takes in conducting surveys results in data of very high quality, but it is also expensive. Face-to-face surveys cost more to conduct than telephone surveys, since face-to-face surveys require interviewers to be sent out into the field, while telephone interviewers can all sit in one room. To ensure that face-to-face interviews are of high quality, field interviewers must be very highly trained, since there is little supervision when they are out of the office. Many researchers feel that face-to-face interviews yield 'richer' data: the interviewer can ask a variety of follow-up questions, can spend adequate time making the respondent feel comfortable in answering sensitive questions, and can note any special circumstances about the interview that might have affected the answers given.
All respondents in the 2008 ANES were interviewed by RTI International in the fall of 2008, both before and after the election. The response rate for the pre-election survey was approximately 60%. In all, 2,323 people were interviewed before the election, and 2,102 of these were successfully re-interviewed after the election. The dataset for this instructional package includes only the 2,102 respondents who were interviewed both before and after the election.
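The figures above imply how many respondents were lost between the two waves. The arithmetic, using only the counts stated in the text:

```python
pre_election = 2323   # interviewed before the election
post_election = 2102  # of those, re-interviewed after the election

reinterview_rate = post_election / pre_election
print(f"re-interviewed: {reinterview_rate:.1%}")              # re-interviewed: 90.5%
print(f"lost between waves: {pre_election - post_election}")  # lost between waves: 221
```

So roughly one pre-election respondent in ten does not appear in the instructional dataset.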
The data for this instructional module are weighted. Weighting a dataset is a technical procedure that corrects the data for several basic factors, such as unequal chances of being selected and differences in response rates among groups. The goal of weighting is to produce a dataset that is more demographically representative of the population. When the data are weighted, some respondents count for more than one person (they have a sample weight greater than 1.0) and some count for less than one person (they have a sample weight less than 1.0). You need not be overly concerned about weighting: the dataset is designed to be weighted automatically when you open it, and you will only sometimes notice that you are working with weighted data. Some small anomalies that you may occasionally notice will be the result of working with weighted data. This point is discussed in greater detail elsewhere on this website.
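What "counting for more or less than one person" means can be shown with a weighted percentage. In this minimal sketch, the four respondents, their answers, and their weights are all hypothetical.

```python
def weighted_share(values, weights):
    """Weighted proportion of respondents with value 1: each respondent
    counts as `weight` people instead of exactly one."""
    total = sum(weights)
    return sum(w for v, w in zip(values, weights) if v == 1) / total


voted = [1, 0, 1, 0]              # hypothetical answers (1 = voted)
weights = [1.5, 0.5, 1.0, 1.0]    # hypothetical sample weights

print(sum(voted) / len(voted))         # 0.5   (unweighted)
print(weighted_share(voted, weights))  # 0.625 (weighted)
```

The weighted figure differs from the raw count because the first respondent stands in for one and a half people. Weighted counts also need not be whole numbers, which is one source of the small anomalies mentioned above, such as table cells with fractional numbers of cases.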
1 For the 2008 survey, ANES for the first time allowed a portion of the study to be self-administered. This portion of the data is not utilized in the 2008 SETUPS, but those interested in the topic can read more at http://www.electionstudies.org/conferences/duke/abstracts.html. [Site no longer available.]