Gates Millennium Scholars (GMS) Survey Data Cohort 5, United States, 2004-2009 (ICPSR 34439)

Version Date: Oct 1, 2019

Principal Investigator(s):
Bill & Melinda Gates Foundation


Version V4

  • V4 [2019-10-01]
  • V3 [2019-04-01] unpublished
  • V2 [2018-02-15] unpublished
  • V1 [2013-04-05] unpublished
GMS Cohort 5, 2004-2009

In 1999, the Bill and Melinda Gates Foundation started the Gates Millennium Scholars Program (GMS), a 20-year initiative intended to expand access to higher education for high-achieving, low-income minority students. In addition to its academic objectives, GMS also aims to create future leaders in minority groups. The program is administered by the United Negro College Fund (UNCF). Awardees can receive the scholarship for up to 5 years as an undergraduate and 4 years as a graduate student. The scholarship is renewable through graduate school in math, science, engineering, library science, and education.

In order to see how GMS has impacted students and to know how to better prepare minority students for college, the Bill and Melinda Gates Foundation has commissioned a survey of recipients. Cohorts are composed of both recipients and non-recipients. Non-recipients are defined as individuals who were asked to go on to the scholar confirmation/verification phase, but did not become a scholar for one or more reasons.

Baseline, first follow-up, second follow-up survey, and longitudinal survey data have been collected from both recipients and non-recipients.

Bill & Melinda Gates Foundation. Gates Millennium Scholars (GMS) Survey Data Cohort 5, United States, 2004-2009. Inter-university Consortium for Political and Social Research [distributor], 2019-10-01.

Bill and Melinda Gates Foundation


Public and restricted versions of the data are included in this collection. Due to the sensitive nature of the restricted data, users will need to complete a Restricted Data Use Agreement before they can obtain the restricted version. These forms can be accessed on the download page associated with this dataset.

Inter-university Consortium for Political and Social Research
2004 -- 2009
2005 -- 2009
  1. Due to the timing of the end of the field period and the end of the project, the GMS Cohort 5 Freshman 2nd Follow-up survey dataset is limited compared to the other studies in this series: no preliminary data cleaning was conducted. More specifically, data from this cohort did not undergo the full coding, data cleaning, and weight estimation process that was standard for other cohorts in the Gates Millennium Scholars studies before deposit with ICPSR.

  2. This study is related to the other studies within the Gates Millennium Scholars (GMS) Survey Data series. For similar study information and characteristics, including administrative data applicable across cohorts, please refer to all studies within the data series.
  3. The questionnaire document is the exact same version for both the public (DS0001) and restricted (DS0002) datasets.

  4. Some variables in the data collection reference industry and occupation codes from the 1990 Census. No documentation was provided to ICPSR to supplement these codes; users are therefore encouraged to visit the United States Census Bureau website for additional information.

  5. For additional information on the Gates Millennium Scholars Program, please visit the GMS website.

The purpose of the Gates Millennium Scholars (GMS) Tracking and Longitudinal study is to gather data on the lives of scholars and selected non-recipients in order to analyze the effects of the program on the educational, civic, and personal lives of selected sample members. The Gates Foundation hopes to generate research that will help improve the educational attainment and achievement of minority students.

This study was conducted using cross-sectional survey files and a longitudinal survey design. For this study, a cohort is defined as all Gates Millennium Scholars (GMS) scholars and a representative sample of non-recipients. Non-recipients are defined as individuals who were asked to go on to the scholar confirmation/verification phase, but did not become a scholar for one or more reasons. Each cohort will be asked to fill out a baseline survey and five follow-up surveys.

The baseline survey occurs after the applicants' first year out of high school, when they have made the transition to college or the workforce. The first follow-up corresponds to their third year out of high school, which for many is their junior year in college. The second follow-up occurs at the end of the applicants' fifth year out of high school, which may mark the transition to graduate or professional school, or into the workforce.

The cross-sectional data collection contains the responses of approximately 1,000 scholarship recipients in the first year of Gates Millennium Scholars (GMS) and an additional sample of approximately 1,000 non-recipients. To be eligible, students had to meet several qualifications. They must:

  • (1) be of African American, American Indian/Alaska Native, Asian American, Hispanic/Latino, or Pacific Islander background;
  • (2) be full-time students entering college or university;
  • (3) have a GPA of at least 3.3 on a 4.0 scale;
  • (4) be eligible for Pell Grants; and
  • (5) be leaders in community service, extracurricular, or other activities.

A stratified sample design was used for non-recipients in order to enable powerful comparisons at the level of racial/ethnic groups between freshman and continuing undergraduate students. All Pell-Grant-in-place non-recipients were included in the samples for each racial/ethnic group. The remaining cases were drawn from the pool of non-recipients without Pell Grants in place. The goal was to obtain 300 complete interviews from each racial/ethnic group except American Indians. Because of their small number, all American Indian non-recipients were included in the samples of both populations. The designs are likely to produce a design effect for non-recipients because the samples are not proportional to their representation in the population as a whole. This effect is likely to be offset by stratification. Therefore, when the responses are weighted with respect to their relative response rates, the population estimates accurately reflect the proportion of the groups in the population.
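The design effect mentioned above can be approximated from a set of weights. The sketch below uses Kish's approximation for unequal weighting, which is an assumption here: the study documentation does not say how the design effect was estimated. The weight values are made up for illustration; only the target of 300 completed interviews per group comes from the description above.

```python
def kish_design_effect(weights):
    """Kish's approximate design effect due to unequal weighting.

    deff = n * sum(w^2) / (sum(w))^2, which equals 1.0 when all weights are equal.
    """
    n = len(weights)
    if n == 0:
        raise ValueError("weights must be non-empty")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    total_sq = sum(w * w for w in weights)
    return n * total_sq / (total * total)


def effective_sample_size(weights):
    """Number of equally weighted cases with roughly the same precision."""
    return len(weights) / kish_design_effect(weights)


# Hypothetical weights for one racial/ethnic group of 300 completed interviews.
equal_weights = [1.0] * 300                    # proportional sample
unequal_weights = [1.0] * 150 + [3.0] * 150    # half the cases under-sampled

deff_equal = kish_design_effect(equal_weights)
deff_unequal = kish_design_effect(unequal_weights)
n_eff_unequal = effective_sample_size(unequal_weights)
```

With these illustrative weights, the unequal design yields a design effect of 1.25, so 300 interviews carry about as much information as 240 equally weighted ones. This approximation captures only the weighting component and ignores any precision gained from stratification.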

Longitudinal: Cohort / Event-based, Cross-sectional

Population of American high school students graduating in the year of 2004 who meet the criteria established by the Gates Foundation as stated in the sampling.


National Opinion Research Center (NORC) collected data to be delivered and produced by the Bill and Melinda Gates Foundation, 2005-2009.

See Variables Section corresponding to Cohort 5 Year Codebook. The survey included questions that address the topics of

  • (a) social, cultural, linguistic, economic background;
  • (b) race/ethnicity and gender patterns;
  • (c) high school preparation and experiences;
  • (d) the role of financial aid;
  • (e) college choice;
  • (f) major choice;
  • (g) engagement and leadership in college;
  • (h) academic achievement, persistence, and completions;
  • (i) graduate education plans;
  • (j) career choice and transition to the workplace; and
  • (k) democratic values and leadership after college.

Please refer to the final report for response rate information.

Several Likert Scales



2019-10-01 The collection has been updated to correct an error in the previously released data.

2019-04-01 The surveys administered at Baseline, 1st Follow-Up, and 2nd Follow-Up were merged together into a single file for cohort 5. There is now a single merged public-use file and a single merged restricted-use file. Non-cognitive scores were removed from this study and will be added back into other data forthcoming in the Gates Millennium Scholars (GMS) series. The data and associated documentation were updated to current curation standards.


Cohort 5 (2009) 2nd Follow-Up Survey (Public and Restricted) have been added to this study. The original datasets (5 and 6) were incorrectly labeled and have been corrected to say Freshman Cohort 5 (2005, 2007) Longitudinal Survey (Baseline, 1st Follow-Up).

2018-02-15 The citation of this study may have changed due to the new version control system that has been implemented. The previous citation was:
  • Bill & Melinda Gates Foundation. Gates Millennium Scholars (GMS) Survey Data Cohort 5, United States, 2004-2009. ICPSR34439-v4. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2019-10-01.

2013-04-05 ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection:

  • Created variable labels and/or value labels.
  • Standardized missing values.
  • Created online analysis version with question text.
  • Performed recodes and/or calculated derived variables.
  • Checked for undocumented or out-of-range codes.

In this study, the entire population of scholars and the entire population of non-recipients were asked to participate in the survey, so weights were only needed to compensate for sub-population differences in non-response to the survey. The responses of non-recipients are weighted to account for differences in the selection probabilities of the sample members. No such adjustment was necessary for the scholars, since all scholars were selected for the survey with certainty. For the cross-sectional datasets (baseline and first follow-up), the weight variables are named BL_R1_WGT and FU1_R2_WGT, respectively.
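As a minimal sketch of how such a weight might be applied, the following computes a weighted proportion for non-recipients at baseline. Only the weight name BL_R1_WGT comes from the documentation above. The RECIPIENT and ENROLLED columns and all values are hypothetical placeholders, and skipping records with missing values is an assumption rather than documented practice.

```python
def weighted_mean(records, value_key, weight_key):
    """Weighted mean of value_key, skipping rows with a missing value or weight."""
    numerator = 0.0
    denominator = 0.0
    for row in records:
        value = row.get(value_key)
        weight = row.get(weight_key)
        if value is None or weight is None:
            continue  # assumption: item non-response is simply dropped
        numerator += weight * value
        denominator += weight
    if denominator == 0:
        raise ValueError("no usable records with a positive total weight")
    return numerator / denominator


# Made-up baseline records; RECIPIENT and ENROLLED are hypothetical variable names.
rows = [
    {"RECIPIENT": 1, "ENROLLED": 1, "BL_R1_WGT": 1.0},
    {"RECIPIENT": 1, "ENROLLED": 1, "BL_R1_WGT": 1.0},
    {"RECIPIENT": 0, "ENROLLED": 1, "BL_R1_WGT": 2.0},
    {"RECIPIENT": 0, "ENROLLED": 0, "BL_R1_WGT": 4.0},
    {"RECIPIENT": 0, "ENROLLED": None, "BL_R1_WGT": 3.0},
]

non_recipients = [r for r in rows if r["RECIPIENT"] == 0]
enrolled_share = weighted_mean(non_recipients, "ENROLLED", "BL_R1_WGT")
overall_share = weighted_mean(rows, "ENROLLED", "BL_R1_WGT")
```

The same function would apply to first follow-up variables with FU1_R2_WGT as the weight key. Actual variable names and missing-value codes should be taken from the Cohort 5 codebook.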



  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.

  • One or more files in this data collection have special restrictions. Restricted data files are not available for direct download from the website; click on the Restricted Data button to learn more.


This study is provided by Resource Center for Minority Data (RCMD).