2000 Florida Ballots Project (ICPSR 36207)

Published: Oct 22, 2015

Principal Investigator(s):
NORC at the University of Chicago; The New York Times; The Wall Street Journal; The Washington Post Company; Tribune Publishing; CNN; Associated Press; St. Petersburg Times; The Palm Beach Post


Version V1

In the United States presidential election of November 2000, approximately 180,000 ballots in Florida's 67 counties were uncertified because they failed to register a "valid" vote for president. These ballots included those in which no vote was recorded (undervotes) and those in which people voted for more than one candidate (overvotes). The 2000 Florida Ballots Project examined the undervotes and overvotes. The goal of the project was not to declare a "winner," but rather to carefully examine the ballots to assess the relative reliability of the three major types of ballot systems used in Florida. The results of this assessment may help state legislatures, other decision-makers, and developers of ballot systems to work toward more reliable ballot systems in the future.

This collection contains seven separate data sets.

The first data set is the "Raw Data File," which contains one record for each ballot examined. In addition to ballot information, each record includes county name, FIPS code, ballot system, and other identifying information. The unique identifier for each record is recorded in the variable BALNUM and can be used to link the data sets.

The second data set is the "Aligned Data File." This data set matches the Raw Data File except for the variables associated with the candidates. The raw file presents all chad-level data (including chads that represent a particular candidate). The aligned data file includes only those data that apply to candidate chads, and data from three coding systems are contained in the same variable for each candidate.

The third data set is the "Recode Data File." At random intervals, after coding a group of ballots, the coders were instructed to recode the same ballots as a check on intra-coder reliability (consistency within a coder). These second codings are contained in the recode data file. The recode data file differs from the aligned data file in its variable names: variables with the suffix C1, C2, or C3 in the aligned data have the suffix R1, R2, or R3, respectively, in the recode data.

The fourth data set is the "Comment Data File," a ballot-level file containing all comments made by coders during the coding of ballots. It contains one record for each ballot for which at least one of the three coders recorded a comment; 5,407 ballots had at least one coder comment and are contained in this file.

The fifth data set is the "Coder Demographic Data File," which contains the results of a questionnaire given to each coder employed by NORC for the Florida Ballots Project. This file contains one record for each coder and includes information such as the sex, marital status, age, income level, ethnicity, and political affiliation of each coder. The ID field contains the identification number of the coder, which can be used as a link to the raw and aligned data files.

The sixth and seventh data sets are the "Orange County Raw Data File" and "Orange County Aligned Data File." These two data sets are identical in structure to the raw and aligned data files, respectively. Each file has 417 records. These data files are being made available because the 966 undervotes and 1,383 overvotes reported by Orange County on election day (which ultimately informed the certified totals) could not be segregated by the county officials responsible for producing the ballots for NORC review. The NORC coders were initially shown only 640 undervotes and 1,197 overvotes. At the time of initial coding, more than 400 of the ballots rejected by machines on election day simply could not be distinguished from ballots that were accepted and certified on election day.
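
The suffix convention that relates the aligned and recode data files (C1, C2, C3 versus R1, R2, R3) can be illustrated with a short Python sketch. This is not part of the collection's documentation. The variable stem CAND and the toy records are hypothetical, and the sketch assumes that a paired variable differs only in its final suffix.

```python
import re

def recode_name(aligned_name):
    """Map an aligned-file variable name ending in C1, C2, or C3
    to the corresponding recode-file name ending in R1, R2, or R3."""
    return re.sub(r"C([123])$", r"R\1", aligned_name)

def consistent_codes(aligned_row, recode_row):
    """For each C1-C3 variable with a recode counterpart, report whether
    the first coding and the second coding agree."""
    result = {}
    for name, value in aligned_row.items():
        partner = recode_name(name)
        # Skip variables without a coder suffix (for example, BALNUM)
        # and variables that were not recoded.
        if partner != name and partner in recode_row:
            result[name] = (value == recode_row[partner])
    return result

# Hypothetical toy records; CAND is an illustrative stem, not a real variable.
aligned = {"BALNUM": 7, "CANDC1": 1, "CANDC2": 0, "CANDC3": 1}
recode = {"BALNUM": 7, "CANDR1": 1, "CANDR2": 1}
checks = consistent_codes(aligned, recode)
```

In this toy case the first coder's two codings agree, the second coder's do not, and the third coder has no recode to compare.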

NORC at the University of Chicago, The New York Times, The Wall Street Journal, The Washington Post Company, Tribune Publishing, CNN, … The Palm Beach Post. 2000 Florida Ballots Project. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2015-10-22. https://doi.org/10.3886/ICPSR36207.v1

2001-02 -- 2001-05

This collection includes a zipped package containing supplemental data files on voting totals by county and precinct, and an accompanying instructional document.

Additional information about the 2000 Florida Ballots Project can be found at the 2000 Florida Ballots Project Web site.

The unique ballot identifier variable BALNUM is present in multiple data sets, and can be used to link or merge data sets for analysis.
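
As a minimal sketch of such a link, the following Python example joins two sets of records on BALNUM using only the standard library. The column names COUNTY and COMMENT and the toy values are hypothetical placeholders, and the sketch assumes each BALNUM appears at most once in the second data set.

```python
def link_by_balnum(left_rows, right_rows, key="BALNUM"):
    """Inner-join two lists of dict records on a shared identifier."""
    # Index the right-hand records by identifier for constant-time lookup.
    right_index = {row[key]: row for row in right_rows}
    linked = []
    for row in left_rows:
        match = right_index.get(row[key])
        if match is not None:
            merged = dict(row)
            # Add only the columns the left-hand record does not already have.
            for col, val in match.items():
                merged.setdefault(col, val)
            linked.append(merged)
    return linked

# Hypothetical toy records standing in for two of the data files.
raw = [
    {"BALNUM": 1, "COUNTY": "County A"},
    {"BALNUM": 2, "COUNTY": "County B"},
]
comments = [{"BALNUM": 2, "COMMENT": "illustrative comment text"}]

linked = link_by_balnum(raw, comments)
```

Only ballots present in both inputs are kept, which suits a file such as the comment data file that covers a subset of ballots.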

Every voting system yields slightly different results with each pass through the ballots. The 2000 Florida Ballots Project used the extent of this variation to assess the relative reliability of the different systems used in Florida. The results of the assessment may help state and local governments improve their ability to assess the will of the voters, through selecting systems that count ballots with a high degree of reliability.

Each ballot was assessed by calibrating the variation from pass to pass of the various ballot systems against the results of a careful hand examination by trained observers. These observers noted in detail all aspects of the ballot that might have helped identify the voter intent. The project tried to minimize variation by using three-person teams of observers, each team member working independently, to classify each ballot into categories based on the varying interpretations Florida canvassing boards have confronted in manual recounts of machine-readable ballots.
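
One way an analyst might summarize the three independent codings of a single ballot is sketched below in Python. This is an illustration only, not the project's own procedure, and the category labels are hypothetical.

```python
from collections import Counter

def agreement(codes):
    """Summarize independent codings of one ballot.

    Returns a (level, code) pair, where level is "unanimous", "majority",
    or "none", and code is the agreed category (None if no majority)."""
    counts = Counter(codes)
    code, n = counts.most_common(1)[0]
    if n == len(codes):
        return ("unanimous", code)
    if n > len(codes) / 2:
        return ("majority", code)
    return ("none", None)

# Hypothetical category labels for a three-person team.
all_agree = agreement(["dimple", "dimple", "dimple"])
two_agree = agreement(["dimple", "dimple", "no mark"])
none_agree = agreement(["dimple", "no mark", "detached"])
```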

The data were not sampled. Rather, they were collected from the approximately 180,000 ballots cast in Florida during the 2000 presidential election that failed to register a "valid" vote.

Uncertified ballots for the 2000 United States presidential election in Florida.



observational data

coded on-site observation



2015-10-22 ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection:

  • Created variable labels and/or value labels.
  • Standardized missing values.
  • Checked for undocumented or out-of-range codes.

The data are not weighted, and no weight variables are present in the data set.


  • Data in this collection are available only to users at ICPSR member institutions.

