Survey Research Methods
The study of voting behavior generally relies on information from sample surveys. Aggregate election statistics from states or counties, another common type of election data, are useful for examining the patterns of election results, such as differences in the presidential vote among the 50 states, but such data are not suitable for an analysis that focuses on the individual voter. In order to investigate the factors that affect how people vote, we need information on individuals. Such information commonly includes data on voting behavior, attitudes and beliefs, personal characteristics, and so on. Since it is impractical to obtain this information for each member of the electorate, the common procedure is to draw a sample of people from the population and interview these individuals. Once collected, survey data are usually processed and stored in a form allowing for computer-assisted data analysis. This data analysis generally focuses on describing and explaining patterns of political opinion and electoral behavior.
The data for this instructional package are drawn from the 2004 National Election Study (NES), conducted by the Center for Political Studies at The University of Michigan. Based on a very large sample (over 1,000 people), the study interviewed respondents both before and after the election. Only a portion of the information collected by the study is contained in this dataset, and the selected data have been prepared especially for instructional purposes.
Efficient data analysis requires that the data be recorded, coded, processed, and stored according to standard procedures. Essentially, this involves representing all information by numeric codes. For example, the information that John Smith is an evangelical Protestant would be stored by recording a value of "2" (evangelical Protestant) on variable 145 (religious affiliation) for respondent "907" (John Smith). This numerically coded information is placed on a storage medium--such as a CD--allowing the data to be analyzed with the aid of a computer. In the past, many large surveys were analyzed with larger "mainframe" computers; nowadays, powerful microcomputers make it possible for data analysts to analyze data on personal computers.
In order to use a dataset, a codebook is needed. The codebook describes the dataset by providing a list of all variables, an explanation of each variable, and a description of the possible values for each variable. The codebook also indicates how the data are stored and organized for use by the computer. A codebook can thus be thought of as a combination of a map and an index to the dataset.
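The relationship between coded data and a codebook can be sketched in a few lines of Python. This is only an illustration, not the actual NES file layout: the dictionary structure and the function name decode are assumptions, and the only code taken from the text is the value 2 (evangelical Protestant) on variable 145 for respondent 907. The label for code 1 is invented.

```python
# Hypothetical codebook fragment: variable number -> name and value labels.
# Only code 2 = "evangelical Protestant" comes from the text; the label for
# code 1 is invented for illustration.
CODEBOOK = {
    145: {
        "name": "religious affiliation",
        "values": {
            1: "mainline Protestant (illustrative)",
            2: "evangelical Protestant",
        },
    },
}

# One coded record per respondent: respondent ID -> {variable number: code}.
DATA = {907: {145: 2}}


def decode(respondent_id, variable):
    """Translate a stored numeric code back into its codebook label."""
    code = DATA[respondent_id][variable]
    entry = CODEBOOK[variable]
    return f"{entry['name']}: {entry['values'][code]}"


print(decode(907, 145))  # religious affiliation: evangelical Protestant
```

The data file holds only numbers; the codebook supplies their meaning, which is why one cannot be used without the other.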
Many people ask how it is possible to make any generalizations about the American public on the basis of a survey sample of about 1,000 individuals. The truth of the matter is this: it is not possible to do so unless some methodical type of sampling scheme is used. If we just stood on a street corner and asked questions of the first 1,000 people who walked by, we could not, of course, draw any conclusions about the attitudes and opinions of the American public. If, however, we use a proper sampling scheme, a sample of similar size can yield quite accurate generalizations about the American public.
A full explanation of the theory of survey sampling is beyond the scope of this instructional package. However, we can introduce some basics. The goal of any social science survey is to reduce the error in making generalizations about the population. Error can have two origins--systematic and random. The goal of proper sampling is to reduce both of these. Systematic error is much more serious than random error, because it pushes results in one direction and does not shrink as the sample grows, so we are especially concerned about it.
We can reduce error in a survey through good sampling.
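The difference between random and systematic error can be illustrated with a small simulation. Everything here is hypothetical: the population, the 50 percent figure, and the flawed procedure that reaches holders of the opinion twice as often as others are assumptions chosen only to make the contrast visible.

```python
import random

rng = random.Random(1)  # fixed seed so the run can be repeated

# Hypothetical population of 100,000 people; exactly half hold some
# opinion (coded 1) and half do not (coded 0).
population = [1] * 50_000 + [0] * 50_000

# Hypothetical flawed frame: people without the opinion are only half
# as likely to be reachable, so they are under-represented.
biased_frame = [1] * 50_000 + [0] * 25_000


def pct(sample):
    """Percentage of a sample holding the opinion."""
    return 100 * sum(sample) / len(sample)


# Random error only: each sample of 1,000 misses 50% by a little,
# in no consistent direction.
random_estimates = [pct(rng.sample(population, 1000)) for _ in range(200)]

# Systematic error: every sample is pulled toward the same wrong answer.
biased_estimates = [pct(rng.sample(biased_frame, 1000)) for _ in range(200)]

mean_random = sum(random_estimates) / len(random_estimates)
mean_biased = sum(biased_estimates) / len(biased_estimates)
print(f"average of random samples: {mean_random:.1f}%")
print(f"average of biased samples: {mean_biased:.1f}%")
```

In this sketch the random samples average out close to the true 50 percent, while the biased samples settle near 67 percent no matter how many are drawn. Repetition and larger samples help with random error but do nothing for systematic error.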
The most basic form of sampling is the simple random sample. This involves drawing a sample from a list of all members of the population in such a way that everybody in the population has the same chance of being selected for the sample.
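A simple random sample can be sketched as follows, assuming a complete list of the population is available. The list of 10,000 names is hypothetical; random.sample from the Python standard library draws without replacement, so each member has the same chance of selection.

```python
import random

# Hypothetical sampling frame: a list of every member of a small population.
population = [f"person_{i}" for i in range(10_000)]

rng = random.Random(2004)  # fixed seed so the draw can be reproduced

# Draw 1,000 members without replacement; each of the 10,000 members
# has the same 1-in-10 chance of ending up in the sample.
sample = rng.sample(population, k=1000)

print(len(sample), len(set(sample)))  # 1000 1000
```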
Simple random samples are not appropriate for most social science applications. Often, we want to be sure that subgroups (women, Southerners, union members, etc.) appear in the sample in about the same proportions as in the population. A simple random sample will not guarantee this. Stratified probability sampling comes closer to this guarantee.
Simple random samples are impractical in national surveys for two main reasons:
There is no national list of American adults;
The sample would be scattered all over the US, making it very expensive to conduct face-to-face interviews.
Therefore, a form of stratified probability sampling is used in national surveys.
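The idea behind stratified sampling can be sketched with proportional allocation: divide the population into strata, then draw a random sample within each stratum in proportion to its size. This is a simplified illustration, not the procedure used in any actual national survey. The regions, their sizes, and the function name stratified_sample are assumptions.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed for a repeatable draw

# Hypothetical population of 10,000 people, each tagged with a region
# (the stratum). The regional sizes are invented for illustration.
sizes = {"Northeast": 2000, "Midwest": 2000, "South": 3500, "West": 2500}
population = [(region, i) for region, n in sizes.items() for i in range(n)]


def stratified_sample(people, total_n, rng):
    """Draw a simple random sample within each stratum, sized in
    proportion to that stratum's share of the population."""
    strata = {}
    for person in people:
        strata.setdefault(person[0], []).append(person)
    sample = []
    for members in strata.values():
        k = round(total_n * len(members) / len(people))
        sample += rng.sample(members, k)
    return sample


sample = stratified_sample(population, 1000, rng)
counts = Counter(region for region, _ in sample)
print(counts)
```

With these sizes the sample contains exactly 200, 200, 350, and 250 people from the four regions, matching their population shares. A simple random sample of 1,000 would only come close to those numbers by chance.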
Sources of error in surveys
Potential sources of error in national surveys include:
The sampling procedure itself. Since surveys are based on samples and not the entire population, there is a certain amount of random sampling error in any survey. For a properly constructed national sample such as the NES with about 1,000 respondents, the margin of error is about +/-4 percentage points;
Certain unavoidable systematic errors--the NES does not interview in Alaska and Hawaii, for example;
Survey non-response as a result of not being able to contact a potential respondent;
Refusals to cooperate with the survey by potential respondents. As the number of surveys and polls has increased in recent years, respondents have displayed what has been called survey fatigue, and non-response has increased over time;
Lack of candor by respondents on questions that have socially acceptable answers (Did you vote in the election?) or socially unacceptable answers (Did you cheat on your income tax last year?);
The inability of respondents to remember past behaviors (Who did you vote for in the last presidential election?);
Respondents misconstruing survey questions as exams and so providing answers to questions that they really have not thought much about (non-attitudes);
Badly trained interviewers who might give respondents cues as to how to answer questions, or who misrecord respondents' answers, or who falsify data, and so on;
Errors in the preparation, coding, and processing of the survey into a computer data file.
Where these and other errors are random in nature, they are annoying, but when the errors are systematic they can cause great trouble. We can reduce systematic error in a survey through proper training of interviewers and adherence to proper norms of developing and conducting surveys, such as those developed by the American Association for Public Opinion Research (AAPOR).
It is important to be aware of the potential sources of error when working with survey data. Small differences in percentages may be the result of error and so be virtually meaningless.
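The size of random sampling error can be approximated with the standard formula for the margin of error of a proportion. The sketch below uses the usual 95 percent multiplier of 1.96 and the worst case of a 50/50 split. The design effect of 1.6 is an assumed value, not a published NES figure; it is included only to show how a clustered sample of about 1,000 could have a larger margin than a simple random sample of the same size.

```python
import math


def margin_of_error(n, p=0.5, z=1.96, design_effect=1.0):
    """Approximate 95% margin of error, in percentage points, for a
    proportion p estimated from n respondents."""
    return 100 * z * math.sqrt(design_effect * p * (1 - p) / n)


# Simple random sample of 1,000 respondents.
srs = margin_of_error(1000)

# Same sample size with an assumed design effect of 1.6 (hypothetical).
clustered = margin_of_error(1000, design_effect=1.6)

print(round(srs, 1), round(clustered, 1))  # 3.1 3.9
```

Under these assumptions, a gap of two or three percentage points between two groups falls within the margin of error and should not be read as a real difference.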
The 2004 NES data
The 2004 National Election Study was conducted entirely face-to-face. Great care was taken in identifying the sample, training interviewers, and conducting the interviews. Each interviewer was given a laptop computer with survey questionnaire software pre-installed so that the interviewer could enter the respondent's data as the interview proceeded. The care that NES takes in conducting surveys results in data of very high quality, but such care is also expensive. Face-to-face surveys are more expensive to conduct than telephone surveys, since face-to-face surveys require interviewers to be sent out into the field while telephone interviewers can all sit in one room. To ensure that face-to-face interviews are of high quality, the field interviewers must be very highly trained, since there is little supervision when they are out of the office. Many researchers feel that face-to-face interviews yield "richer" data; the interviewer can ask a variety of follow-up questions, can spend adequate time making the respondent feel comfortable in answering sensitive questions, and can note any special circumstances about the interview that might have affected the answers given.
The national sample for the NES is based on methodology developed by the Survey Research Center at the University of Michigan and the National Opinion Research Center at The University of Chicago. The sample was drawn by an area probability method that relies on US Census figures and maps of the country.
All respondents were interviewed in the fall of 2004 before and after the election. The response rate for the pre-election survey was approximately 66%, the highest it has been since 1992. Some 1,212 people were interviewed before the election and 1,066 of these were successfully interviewed after the election. The dataset for this instructional package includes only the 1,066 respondents who were interviewed both before and after the election.
The data for this instructional module are weighted. Weighting a dataset is a technical procedure to correct the data for several basic factors. The goal of weighting is to produce a dataset that is more demographically representative of the population. When the data are weighted, some respondents count for more than one person (they have a sample weight greater than 1.0) and some count for less than one person (they have a sample weight less than 1.0). You need not be overly concerned about weighting, as the dataset is designed to be automatically weighted when you open it, and you will only sometimes notice that you are working with weighted data. Some small anomalies that you may occasionally notice will be the result of working with weighted data. This point is discussed more in later sections.
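The effect of weighting on a percentage can be sketched with a handful of made-up respondents. The categories and weights below are invented for illustration; the actual weights are supplied with the dataset, and the function name weighted_percentages is an assumption.

```python
# Hypothetical respondents: (answer category, sample weight).
respondents = [
    ("Candidate A", 1.2),
    ("Candidate B", 0.8),
    ("Candidate A", 0.9),
    ("Candidate B", 1.1),
    ("Candidate B", 1.0),
]


def weighted_percentages(rows):
    """Percentage of the weighted total falling in each category."""
    totals = {}
    for category, weight in rows:
        totals[category] = totals.get(category, 0.0) + weight
    grand_total = sum(totals.values())
    return {c: 100 * t / grand_total for c, t in totals.items()}


# Unweighted: every respondent counts as exactly one person.
unweighted = weighted_percentages([(c, 1.0) for c, _ in respondents])

# Weighted: each respondent counts for his or her sample weight.
weighted = weighted_percentages(respondents)

print({c: round(v, 1) for c, v in unweighted.items()})
print({c: round(v, 1) for c, v in weighted.items()})
```

In this example Candidate A receives 40 percent of the unweighted answers but 42 percent of the weighted ones. Weighted category totals (here 2.1 and 2.9 respondents) need not be whole numbers, which is one kind of small anomaly that weighted data can produce.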