Version Date: Jun 14, 2024
Principal Investigator(s):
United States Department of Health and Human Services. National Institutes of Health. National Institute on Drug Abuse;
United States Department of Health and Human Services. Food and Drug Administration. Center for Tobacco Products
https://doi.org/10.3886/ICPSR36231.v39
Version V39
The PATH Study was launched in 2011 to inform the Food and Drug Administration's regulatory activities under the Family Smoking Prevention and Tobacco Control Act (TCA). The PATH Study is a collaboration between the National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), and the Center for Tobacco Products (CTP), Food and Drug Administration (FDA). The study sampled over 150,000 mailing addresses across the United States to create a national sample of people who use or do not use tobacco.
45,971 adults and youth constitute the first (baseline) wave, Wave 1, of data collected by this longitudinal cohort study. These 45,971 adults and youth along with 7,207 "shadow youth" (youth ages 9 to 11 sampled at Wave 1) make up the 53,178 participants that constitute the Wave 1 Cohort. Respondents are asked to complete an interview at each follow-up wave. Youth who turn 18 by the current wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, "shadow youth" are considered "aged-up youth" upon turning 12 years old, when they are asked to complete an interview after parental consent.
At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 4. This sample was recruited from residential addresses not selected for Wave 1 in the same sampled Primary Sampling Units (PSUs) and segments using similar within-household sampling procedures. This "replenishment sample" was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the civilian, noninstitutionalized population at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort.
At Wave 7, a probability sample of 14,863 adults, youth, and shadow youth ages 9 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 7. This sample was recruited from residential addresses not selected for Wave 1 or Wave 4 in the same sampled PSUs and segments using similar within-household sampling procedures. This "second replenishment sample" was combined for estimation and analysis purposes with the Wave 7 adult and youth respondents from the Wave 4 Cohort who were at least age 15 and in the civilian, noninstitutionalized population at the time of Wave 7. This combined set of Wave 7 participants, 46,169 participants in total, forms the Wave 7 Cohort.
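As a quick arithmetic check, the participant counts given in this description accumulate as follows. The second total coincides with the 67,276 study participants reported for the ever/never-use files (DS4503, DS5503, DS6503), and the third coincides with the 82,139 cases in the State Design Data (DS0002). Reading those files as covering every person sampled in the three samples is an inference from the matching totals, not something this description states outright.

```latex
\begin{aligned}
\text{Wave 1 Cohort} &= 45{,}971 + 7{,}207 = 53{,}178 \\
\text{Wave 1 Cohort} + \text{Wave 4 replenishment sample} &= 53{,}178 + 14{,}098 = 67{,}276 \\
\text{previous total} + \text{Wave 7 replenishment sample} &= 67{,}276 + 14{,}863 = 82{,}139
\end{aligned}
```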
Please refer to the Restricted-Use Files User Guide for further details about children designated as "shadow youth" and about the formation of the Wave 1, Wave 4, and Wave 7 Cohorts.
Dataset 0002 (DS0002) contains the data from the State Design Data. This file contains 7 variables and 82,139 cases. The state identifier in the State Design file reflects the participant's state of residence at the time of selection and recruitment for the PATH Study.
Dataset 1011 (DS1011) contains the data from the Wave 1 Adult Questionnaire. This data file contains 2,021 variables and 32,320 cases. Each of the cases represents a single, completed interview.
Dataset 1012 (DS1012) contains the data from the Wave 1 Youth and Parent Questionnaire. This file contains 1,431 variables and 13,651 cases.
Dataset 1411 (DS1411) contains the Wave 1 State Identifier data for Adults and has 5 variables and 32,320 cases. Dataset 1412 (DS1412) contains the Wave 1 State Identifier data for Youth and Parents and has 5 variables and 13,651 cases. The same 5 variables are in each State Identifier dataset, including PERSONID for linking the State Identifier to the questionnaire and biomarker data and 3 variables designating the state (state Federal Information Processing Standards (FIPS) code, state abbreviation, and full name of the state). The State Identifier values in these datasets represent participants' state of residence at the time of Wave 1, which is also their state of residence at the time of recruitment.
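Because each State Identifier file shares PERSONID with the corresponding questionnaire file (for example, 32,320 adult cases in both DS1011 and DS1411), attaching state to questionnaire records is a key join on PERSONID. The following is a minimal sketch in plain Python, assuming each file has already been read into a list of dictionaries; PERSONID is the only documented name, while STATE_ABBR and Q1 are hypothetical placeholders. Appendix G of the Restricted-Use Files User Guide remains the authoritative source for linking code.

```python
from typing import Dict, List


def attach_state(questionnaire: List[dict], state_ids: List[dict]) -> List[dict]:
    """Left-join State Identifier rows onto questionnaire rows by PERSONID.

    Field names other than PERSONID (e.g. STATE_ABBR) are hypothetical.
    """
    by_id: Dict[str, dict] = {}
    for row in state_ids:
        pid = row["PERSONID"]
        if pid in by_id:
            # Each person should appear once per State Identifier file.
            raise ValueError(f"duplicate PERSONID in state file: {pid}")
        by_id[pid] = row

    merged = []
    for row in questionnaire:
        combined = dict(row)
        # Copy state fields without overwriting any questionnaire field.
        for key, value in by_id.get(row["PERSONID"], {}).items():
            combined.setdefault(key, value)
        merged.append(combined)
    return merged
```

A left join keeps every questionnaire case even if a state record is missing, which makes unmatched cases easy to spot afterward.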
Dataset 1611 (DS1611) contains the Tobacco Universal Product Code (UPC) data from Wave 1. This data file contains 32 variables and 8,601 cases. This file contains UPC values on the packages of tobacco products used or in the possession of adult respondents at the time of Wave 1. The UPC values can be used to identify and validate the specific products used by respondents and augment the analyses of the characteristics of tobacco products used by these respondents at the time of Wave 1.
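Before matching UPC values against product lists, one common screening step is to flag mistyped codes using the standard UPC-A check-digit rule. This sketch assumes the values are stored as 12-digit UPC-A strings with leading zeros preserved; the codebook may store them differently (for example as numbers, or with EAN-13 or UPC-E codes mixed in). A valid check digit only shows that a code is well formed, not that it identifies a tobacco product.

```python
def upc_a_is_valid(code: str) -> bool:
    """Return True if `code` is a 12-digit UPC-A string with a correct check digit."""
    if len(code) != 12 or not all(c in "0123456789" for c in code):
        return False
    digits = [int(c) for c in code]
    # Positions 1, 3, ..., 11 (indices 0, 2, ..., 10) are weighted by 3;
    # positions 2, 4, ..., 10 (indices 1, 3, ..., 9) are weighted by 1.
    total = 3 * sum(digits[0:11:2]) + sum(digits[1:11:2])
    check = (10 - total % 10) % 10
    return check == digits[11]
```

For example, `upc_a_is_valid("036000291452")` returns `True`, while changing the last digit makes it return `False`.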
Dataset 1801 (DS1801) contains Location Characteristics for Wave 1 Adults. This data file contains 4 variables and 32,320 cases.
Dataset 1802 (DS1802) contains Location Characteristics for Wave 1 Youth. This data file contains 4 variables and 13,651 cases.
Dataset 1901 (DS1901) contains Study Research Derived Variables for Wave 1 Adults created by PATH Study analysts. This data file contains 104 variables and 32,320 cases.
Dataset 1902 (DS1902) contains Study Research Derived Variables for Wave 1 Youth created by PATH Study analysts. This data file contains 89 variables and 13,651 cases.
Dataset 2011 (DS2011) contains the data from the Wave 2 Adult Questionnaire. This data file contains 2,421 variables and 28,362 cases. Of these cases, 26,447 also completed a Wave 1 Adult Questionnaire. The other 1,915 cases are "aged-up adults" having previously completed a Wave 1 Youth Questionnaire.
Dataset 2012 (DS2012) contains the data from the Wave 2 Youth and Parent Questionnaire. This data file contains 1,596 variables and 12,172 cases. Of these cases, 10,081 also completed a Wave 1 Youth Questionnaire. The other 2,091 cases are "aged-up youth" having previously been sampled as "shadow youth."
Dataset 2411 (DS2411) contains the Wave 2 State Identifier data for Adults and has 5 variables and 28,362 cases. Dataset 2412 (DS2412) contains the Wave 2 State Identifier data for Youth and Parents and has 5 variables and 12,172 cases. The same 5 variables are in each State Identifier dataset, including PERSONID for linking the State Identifier to the questionnaire and biomarker data and 3 variables designating the state (state FIPS, state abbreviation, and full name of the state). The State Identifier values in these datasets represent participants' state of residence at the time of Wave 2.
Dataset 2611 (DS2611) contains the Tobacco Universal Product Code (UPC) data from Wave 2. This data file contains 32 variables and 7,295 cases. This file contains UPC values on the packages of tobacco products used or in the possession of adult respondents at the time of Wave 2. The UPC values can be used to identify and validate the specific products used by respondents and augment the analyses of the characteristics of tobacco products used by these respondents at the time of Wave 2.
Dataset 2801 (DS2801) contains Location Characteristics for Wave 2 Adults. This data file contains 4 variables and 28,362 cases.
Dataset 2802 (DS2802) contains Location Characteristics for Wave 2 Youth. This data file contains 4 variables and 12,172 cases.
Dataset 2901 (DS2901) contains Study Research Derived Variables for Wave 2 Adults created by PATH Study analysts. This data file contains 178 variables and 28,362 cases.
Dataset 2902 (DS2902) contains Study Research Derived Variables for Wave 2 Youth created by PATH Study analysts. This data file contains 123 variables and 12,172 cases.
Dataset 3011 (DS3011) contains the data from the Wave 3 Adult Questionnaire. This data file contains 2,359 variables and 28,148 cases. Of these cases, 26,241 are continuing adults having completed a prior Adult Questionnaire. The other 1,907 cases are "aged-up adults" having previously completed a Youth Questionnaire.
Dataset 3012 (DS3012) contains the data from the Wave 3 Youth and Parent Questionnaire. This data file contains 1,492 variables and 11,814 cases. Of these cases, 9,769 are continuing youth having completed a prior Youth Interview. The other 2,045 cases are "aged-up youth" having previously been sampled as "shadow youth."
Datasets 3111, 3211, 3112, and 3212 (DS3111, DS3211, DS3112, and DS3212) are data files comprising the weight variables for Wave 3. The weight variables for Wave 1 and Wave 2 are included in the main data files. However, starting with Wave 3, the weight variables have been separated into individual data files. The "all-waves" weight files contain weights for respondents who completed an interview for all waves in which they were old enough to do so or verified their information with the study for waves in which they were not old enough to be interviewed. The "single-wave" weight files contain weights for all respondents in Wave 3 regardless of their participation in previous waves.
Dataset 3503 (DS3503) contains data derived from responses to Wave 1-3 questionnaires indicating if participants had ever/never used various tobacco products as of the Wave 3 study period. This data file contains 25 variables for all 53,178 study participants as of Wave 3. This file is provided for reference only to simplify the definitions of tobacco use variables in the Adult and Youth data files for subsequent waves.
Dataset 3411 (DS3411) contains the Wave 3 State Identifier data for Adults and has 5 variables and 28,148 cases. Dataset 3412 (DS3412) contains the Wave 3 State Identifier data for Youth and Parents and has 5 variables and 11,814 cases. The same 5 variables are in each State Identifier dataset, including PERSONID for linking the State Identifier to the questionnaire and biomarker data and 3 variables designating the state (state FIPS, state abbreviation, and full name of the state). The State Identifier values in these datasets represent participants' state of residence at the time of Wave 3.
Dataset 3611 (DS3611) contains the Tobacco Universal Product Code (UPC) data from Wave 3. This data file contains 32 variables and 6,768 cases. This file contains UPC values on the packages of tobacco products used or in the possession of adult respondents at the time of Wave 3. The UPC values can be used to identify and validate the specific products used by respondents and augment the analyses of the characteristics of tobacco products used by these respondents at the time of Wave 3.
Dataset 3801 (DS3801) contains Location Characteristics for Wave 3 Adults. This data file contains 4 variables and 28,148 cases.
Dataset 3802 (DS3802) contains Location Characteristics for Wave 3 Youth. This data file contains 4 variables and 11,814 cases.
Dataset 3901 (DS3901) contains Study Research Derived Variables for Wave 3 Adults created by PATH Study analysts. This data file contains 107 variables and 28,148 cases.
Dataset 3902 (DS3902) contains Study Research Derived Variables for Wave 3 Youth created by PATH Study analysts. This data file contains 88 variables and 11,814 cases.
Dataset 4001 (DS4001) contains the data from the Wave 4 Adult Questionnaire. This data file contains 2,504 variables and 33,822 cases. Of these cases, 25,857 are continuing adults having completed a prior Adult Questionnaire, 1,900 are "aged-up adults" having previously completed a Youth Questionnaire, and 6,065 are "replenishment sample adults" (also known as "new cohort adults" in the annotated instrument).
Dataset 4002 (DS4002) contains the data from the Wave 4 Youth and Parent Questionnaire. This data file contains 1,600 variables and 14,798 cases. Of these cases, 9,365 are continuing youth having completed a prior Youth Interview, 1,694 cases are "aged-up youth" having previously been sampled as "shadow youth," and 3,739 are "replenishment sample youth" (also known as "new cohort youth" in the annotated instrument).
Datasets 4111, 4211, 4321, 4112, 4212, and 4322 (DS4111, DS4211, DS4321, DS4112, DS4212, and DS4322) are data files comprising the weight variables for Wave 4. In Wave 4, the weight variables have been separated into individual data files corresponding to the Wave 1 and Wave 4 Cohorts and different weight types. The "all-waves" weight files contain weights for those Wave 1 Cohort respondents who completed an interview for all waves in which they were old enough to do so or verified their information for waves in which they were not old enough to be interviewed. The "single-wave" weight files contain weights for Wave 1 Cohort respondents at Wave 4 who completed an interview at Wave 1, regardless of their participation in the intervening waves. The "cross-sectional" weight files contain weights for all respondents in the Wave 4 Cohort.
Dataset 4401 (DS4401) contains the Wave 4 State Identifier data for Adults and has 5 variables and 33,822 cases. Dataset 4402 (DS4402) contains the Wave 4 State Identifier data for Youth and Parents and has 5 variables and 14,798 cases. The same 5 variables are in each State Identifier dataset, including PERSONID for linking the State Identifier to the questionnaire and biomarker data and 3 variables designating the state (state FIPS, state abbreviation, and full name of the state). The State Identifier values in these datasets represent participants' state of residence at the time of Wave 4. For adults and youth from the replenishment sample, the values also represent state of residence at the time of recruitment.
Dataset 4503 (DS4503) contains data derived from responses to Wave 1-4 questionnaires, indicating if participants had ever/never used various tobacco products as of the Wave 4 data collection period. This data file contains 27 variables for all 67,276 study participants as of the Wave 4 data collection. This file is provided for reference only to simplify the definitions of tobacco use variables in the Adult and Youth data files for subsequent waves.
Dataset 4601 (DS4601) contains the Tobacco Universal Product Code (UPC) data from Wave 4. This data file contains 32 variables and 7,684 cases. This file contains UPC values on the packages of tobacco products used or in the possession of adult respondents at the time of Wave 4. The UPC values can be used to identify and validate the specific products used by respondents and augment the analyses of the characteristics of tobacco products used by these respondents at the time of Wave 4.
Dataset 4801 (DS4801) contains Location Characteristics for Wave 4 Adults. This data file contains 4 variables and 33,822 cases.
Dataset 4802 (DS4802) contains Location Characteristics for Wave 4 Youth. This data file contains 4 variables and 14,798 cases.
Dataset 5001 (DS5001) contains the data from the Wave 5 Adult Questionnaire. This data file contains 2,606 variables and 34,309 cases. Of these cases, 29,876 are continuing adults having completed a prior Adult Questionnaire and 4,433 are "aged-up adults" having previously completed a Youth Questionnaire.
Dataset 5002 (DS5002) contains the data from the Wave 5 Youth and Parent Questionnaire. This data file contains 1,776 variables and 12,098 cases. Of these cases, 10,446 are continuing youth having completed a prior Youth Interview and 1,652 cases are "aged-up youth" having previously been sampled as "shadow youth."
Datasets 5111, 5112, 5211, 5212, 5221, 5222, 5711, 5712, 5721, and 5722 (DS5111, DS5112, DS5211, DS5212, DS5221, DS5222, DS5711, DS5712, DS5721, and DS5722) are data files comprising the weight variables for Wave 5. In Wave 5, the weight variables are in individual data files corresponding to the Wave 1 and Wave 4 Cohorts and different weight types. The "all-waves" weight files contain weights for those Wave 1 Cohort participants who completed a Wave 5 interview and completed interviews (if old enough to do so) or verified their information (if not old enough to be interviewed) in Waves 1, 2, 3, and 4.
There are two separate sets of files with "single-wave" weights: one for the Wave 1 Cohort and one for the Wave 4 Cohort. The "single-wave" weight files for the Wave 1 Cohort contain weights for participants who completed an interview in Wave 1 and in Wave 5, regardless of their participation in the intervening waves. The "single-wave" weight files for the Wave 4 Cohort contain weights for all Wave 5 interview respondents in the Wave 4 Cohort.
There are also two separate sets of files with "special collection all-waves" weights: one for the Wave 1 Cohort and one for the Wave 4 Cohort. The "special collection all-waves" weight files for the Wave 1 Cohort contain weights for participants who completed a Wave 5 interview and completed interviews (if old enough to do so) or verified their information (if not old enough to be interviewed) in Waves 1, 2, 3, 4, and the special collection in Wave 4.5. The "special collection all-waves" weight files for the Wave 4 Cohort contain weights for participants who completed a Wave 5 interview and completed interviews (if old enough to do so) or verified their information (if not old enough to be interviewed) in Wave 4 and the special collection in Wave 4.5.
Dataset 5401 (DS5401) contains the Wave 5 State Identifier data for Adults and has 5 variables and 34,309 cases. Dataset 5402 (DS5402) contains the Wave 5 State Identifier data for Youth and Parents, and has 5 variables and 12,098 cases. The same 5 variables are in each State Identifier dataset, including PERSONID for linking the State Identifier to the questionnaire and biomarker data and 3 variables designating the state (state FIPS, state abbreviation, and full name of the state). The State Identifier values in these datasets represent participants' state of residence at the time of Wave 5.
Dataset 5503 (DS5503) contains data derived from responses to Wave 1-5 (including Wave 4.5) questionnaires indicating if participants had ever/never used various tobacco products as of the Wave 5 data collection period. This data file contains 26 variables for all 67,276 study participants as of the Wave 5 data collection. This file is provided for reference only to simplify the definitions of tobacco use variables in the Adult and Youth data files for subsequent waves.
Dataset 5601 (DS5601) contains the Tobacco Universal Product Code (UPC) data from Wave 5. This data file contains 33 variables and 6,678 cases. This file contains UPC values on the packages of tobacco products used or in the possession of adult respondents at the time of Wave 5. The UPC values can be used to identify and validate the specific products used by respondents and augment the analyses of the characteristics of tobacco products used by these respondents at the time of Wave 5.
Dataset 5801 (DS5801) contains Location Characteristics for Wave 5 Adults. This data file contains 4 variables and 34,309 cases.
Dataset 5802 (DS5802) contains Location Characteristics for Wave 5 Youth. This data file contains 4 variables and 12,098 cases.
Dataset 6001 (DS6001) contains the data from the Wave 6 Adult Questionnaire. This data file contains 2,935 variables and 30,516 cases. Of these cases, 28,852 are continuing adults having completed a prior Adult Questionnaire and 1,664 are "aged-up adults" having previously completed a Youth Questionnaire.
Dataset 6002 (DS6002) contains the data from the Wave 6 Youth and Parent Questionnaire. This data file contains 2,080 variables and 5,652 cases. Of these cases, 5,622 are continuing youth having completed a prior Youth Interview and 60 cases are "aged-up youth" having previously been sampled as "shadow youth."
Datasets 6111, 6112, 6121, 6122, 6211, 6212, 6221, 6222, 6711, 6712, 6721, and 6722 (DS6111, DS6112, DS6121, DS6122, DS6211, DS6212, DS6221, DS6222, DS6711, DS6712, DS6721, and DS6722) are data files comprising the weight variables for Wave 6. In Wave 6, the weight variables are in individual data files corresponding to the Wave 1 and Wave 4 Cohorts and different weight types. There are two separate sets of files with "all-waves" weights: one for the Wave 1 Cohort and one for the Wave 4 Cohort. The "all-waves" weight files for the Wave 1 Cohort contain weights for participants who completed a Wave 6 interview and completed interviews (if old enough to do so) or verified their information (if not old enough to be interviewed) in Waves 1, 2, 3, 4, and 5. The "all-waves" weight files for the Wave 4 Cohort contain weights for participants who completed a Wave 6 interview and completed interviews (if old enough to do so) or verified their information (if not old enough to be interviewed) in Waves 4 and 5.
There are two separate sets of files with "single-wave" weights: one for the Wave 1 Cohort and one for the Wave 4 Cohort. The "single-wave" weight files for the Wave 1 Cohort contain weights for participants who completed an interview in Wave 1 and in Wave 6, regardless of their participation in the intervening waves. The "single-wave" weight files for the Wave 4 Cohort contain weights for participants who completed an interview in Wave 4 and in Wave 6, regardless of their participation in the intervening waves.
There are also two separate sets of files with "special collection all-waves" weights: one for the Wave 1 Cohort and one for the Wave 4 Cohort. The "special collection all-waves" weight files for the Wave 1 Cohort contain weights for participants who completed a Wave 6 interview and completed interviews (if old enough to do so) or verified their information (if not old enough to be interviewed) in Waves 1, 2, 3, 4, and 5 and in the special collections in Wave 4.5 and in Wave 5.5 or PATH-ATS. The "special collection all-waves" weight files for the Wave 4 Cohort contain weights for participants who completed a Wave 6 interview and completed interviews (if old enough to do so) or verified their information (if not old enough to be interviewed) in Waves 4 and 5 and in the special collections in Wave 4.5 and in Wave 5.5 or PATH-ATS.
Dataset 6401 (DS6401) contains the Wave 6 State Identifier data for Adults and has 5 variables and 30,516 cases. Dataset 6402 (DS6402) contains the Wave 6 State Identifier data for Youth and Parents, and has 5 variables and 5,652 cases. The same 5 variables are in each State Identifier dataset, including PERSONID for linking the State Identifier to the questionnaire and biomarker data and 3 variables designating the state (state FIPS, state abbreviation, and full name of the state). The State Identifier values in these datasets represent participants' state of residence at the time of Wave 6.
Dataset 6503 (DS6503) contains data derived from responses to questionnaires in Waves 1-6 (including the special collections in Wave 4.5, Wave 5.5, and PATH-ATS) indicating if participants had ever/never used various tobacco products as of the Wave 6 data collection period. This data file contains 24 variables for all 67,276 study participants as of the Wave 6 data collection. This file is provided for reference only to simplify the definitions of tobacco use variables in the Adult and Youth data files for subsequent waves.
Dataset 6601 (DS6601) contains the Tobacco Universal Product Code (UPC) data from Wave 6. This data file contains 53 variables and 5,408 cases. This file contains UPC values on the packages of tobacco products used or in the possession of adult respondents at the time of Wave 6. The UPC values can be used to identify and validate the specific products used by respondents and augment the analyses of the characteristics of tobacco products used by these respondents at the time of Wave 6.
Dataset 7001 (DS7001) contains the data from the Wave 7 Adult Questionnaire. This data file contains 3,221 variables and 30,801 cases. Of these cases, 27,258 are continuing adults having completed a prior Adult Questionnaire, 1,740 are "aged-up adults" having previously completed a Youth Questionnaire, and 1,803 are "replenishment sample adults" (also known as "new cohort adults" in the annotated instrument).
Dataset 7002 (DS7002) contains the data from the Wave 7 Youth and Parent Questionnaire. This data file contains 2,171 variables and 10,834 cases. Of these cases, 3,512 are continuing youth having completed a prior Youth Interview, 1 case is an "aged-up youth" having previously been sampled as "shadow youth," and 7,321 are "replenishment sample youth" (also known as "new cohort youth" in the annotated instrument).
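The case counts quoted for each questionnaire file from Wave 2 onward break down into continuing, aged-up, and (at Waves 4 and 7) replenishment cases, so each breakdown can be checked against its file total. The short script below is an illustrative consistency check that uses only the figures as written in this description; it prints any file whose components do not add up. With these figures it flags one row, Wave 6 Youth (DS6002), where 5,622 + 60 = 5,682 versus the stated 5,652, so those counts are worth confirming against the DS6002 codebook.

```python
# (file, stated file total, component counts) as quoted in the dataset descriptions
COMPOSITION = [
    ("DS2011 Wave 2 Adult", 28362, [26447, 1915]),
    ("DS2012 Wave 2 Youth", 12172, [10081, 2091]),
    ("DS3011 Wave 3 Adult", 28148, [26241, 1907]),
    ("DS3012 Wave 3 Youth", 11814, [9769, 2045]),
    ("DS4001 Wave 4 Adult", 33822, [25857, 1900, 6065]),
    ("DS4002 Wave 4 Youth", 14798, [9365, 1694, 3739]),
    ("DS5001 Wave 5 Adult", 34309, [29876, 4433]),
    ("DS5002 Wave 5 Youth", 12098, [10446, 1652]),
    ("DS6001 Wave 6 Adult", 30516, [28852, 1664]),
    ("DS6002 Wave 6 Youth", 5652, [5622, 60]),
    ("DS7001 Wave 7 Adult", 30801, [27258, 1740, 1803]),
    ("DS7002 Wave 7 Youth", 10834, [3512, 1, 7321]),
]


def mismatches(rows):
    """Return (label, stated total, sum of components) for rows that do not reconcile."""
    return [(label, total, sum(parts)) for label, total, parts in rows if sum(parts) != total]


for label, total, parts_sum in mismatches(COMPOSITION):
    print(f"{label}: components sum to {parts_sum:,}; stated file total is {total:,}")
```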
Datasets 7111, 7112, 7121, 7122, 7211, 7212, 7221, 7222, 7331, 7332, 7711, 7712, 7721, and 7722 (DS7111, DS7112, DS7121, DS7122, DS7211, DS7212, DS7221, DS7222, DS7331, DS7332, DS7711, DS7712, DS7721, and DS7722) are data files comprising the weight variables for Wave 7. In Wave 7, the weight variables are in individual data files corresponding to the Wave 1, Wave 4, and Wave 7 Cohorts and different weight types.
There are two separate sets of files with "all-waves" weights: one for the Wave 1 Cohort and one for the Wave 4 Cohort. The "all-waves" weight files for the Wave 1 Cohort contain weights for participants who completed a Wave 7 interview and completed interviews (if old enough to do so) or verified their information (if not old enough to be interviewed) in Waves 1, 2, 3, 4, 5, and 6. The "all-waves" weight files for the Wave 4 Cohort contain weights for participants who completed a Wave 7 interview and completed interviews (if old enough to do so) or verified their information (if not old enough to be interviewed) in Waves 4, 5, and 6.
There are two separate sets of files with "single-wave" weights: one for the Wave 1 Cohort and one for the Wave 4 Cohort. The "single-wave" weight files for the Wave 1 Cohort contain weights for participants who completed an interview in Wave 1 and in Wave 7, regardless of their participation in the intervening waves. The "single-wave" weight files for the Wave 4 Cohort contain weights for participants who completed an interview in Wave 4 and in Wave 7, regardless of their participation in the intervening waves.
There are also two separate sets of files with "special collection all-waves" weights: one for the Wave 1 Cohort and one for the Wave 4 Cohort. The "special collection all-waves" weight files for the Wave 1 Cohort contain weights for participants who completed a Wave 7 interview and completed interviews (if old enough to do so) or verified their information (if not old enough to be interviewed) in Waves 1, 2, 3, 4, 5, and 6 and in the special collections in Wave 4.5 and in Wave 5.5 or PATH-ATS. The "special collection all-waves" weight files for the Wave 4 Cohort contain weights for participants who completed a Wave 7 interview and completed interviews (if old enough to do so) or verified their information (if not old enough to be interviewed) in Waves 4, 5, and 6 and in the special collections in Wave 4.5 and in Wave 5.5 or PATH-ATS.
The "cross-sectional" weight files contain weights for all respondents in the Wave 7 Cohort.
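The Wave 6 and Wave 7 longitudinal weight definitions above can be read as set-membership rules over a participant's history. The function below is a simplified sketch of that reading, not the study's weighting procedure: it assumes the history is already summarized as sets of main waves interviewed and main waves verified (for participants too young to be interviewed), plus a set of special collections completed, labeled with the hypothetical strings "4.5", "5.5", and "ATS" (PATH-ATS). It ignores nonresponse adjustment and the Wave 7 cross-sectional weights.

```python
from typing import List, Set


def eligible_weight_types(base_wave: int, current_wave: int,
                          interviewed: Set[int], verified: Set[int],
                          special: Set[str]) -> List[str]:
    """Simplified eligibility for Wave 6/7 longitudinal weight types.

    base_wave:    1 for the Wave 1 Cohort, 4 for the Wave 4 Cohort
    current_wave: 6 or 7
    interviewed:  main waves with a completed interview
    verified:     main waves where information was only verified (too young)
    special:      special collections completed (hypothetical labels "4.5", "5.5", "ATS")
    """
    if base_wave not in (1, 4) or current_wave not in (6, 7):
        raise ValueError("sketch covers only the Wave 1/4 Cohorts at Waves 6 and 7")
    if current_wave not in interviewed:
        return []  # every type requires an interview at the current wave

    types: List[str] = []
    covered = interviewed | verified
    # "all-waves": interviewed or verified in every main wave from the cohort's base wave on.
    if set(range(base_wave, current_wave)) <= covered:
        types.append("all-waves")
        # Special collection variant also needs Wave 4.5 and (Wave 5.5 or PATH-ATS).
        if "4.5" in special and ({"5.5", "ATS"} & special):
            types.append("special collection all-waves")
    # "single-wave": interviewed at the cohort's base wave and the current wave,
    # regardless of participation in the intervening waves.
    if base_wave in interviewed:
        types.append("single-wave")
    return types
```

Under these assumptions, a Wave 4 Cohort member interviewed at Waves 4 and 7 but missing Waves 5 and 6 qualifies only for the single-wave weight: `eligible_weight_types(4, 7, {4, 7}, set(), set())` returns `["single-wave"]`.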
Dataset 7401 (DS7401) contains the Wave 7 State Identifier data for Adults and has 5 variables and 30,801 cases. Dataset 7402 (DS7402) contains the Wave 7 State Identifier data for Youth and Parents, and has 5 variables and 10,834 cases. The same 5 variables are in each State Identifier dataset, including PERSONID for linking the State Identifier to the questionnaire and biomarker data and 3 variables designating the state (state FIPS, state abbreviation, and full name of the state). The State Identifier values in these datasets represent participants' state of residence at the time of Wave 7.
Dataset 7601 (DS7601) contains the Tobacco Universal Product Code (UPC) data from Wave 7. This data file contains 53 variables and 4,533 cases. This file contains UPC values on the packages of tobacco products used or in the possession of adult respondents at the time of Wave 7. The UPC values can be used to identify and validate the specific products used by respondents and augment the analyses of the characteristics of tobacco products used by these respondents at the time of Wave 7.
Each case in an Adult data file represents a single, completed interview. Each case in a Youth data file represents one youth and his or her parent's responses about that youth. Parents who provided permission for their child to participate in a Youth Interview were asked to complete a brief interview about their child. In both waves of data collection, less than 0.5 percent of the parents did not complete an interview. Most questions are asked about the child.
When multiple youth from the same household were selected to be in the study, the parent(s) completed a separate interview about each youth. If one parent completed two or more interviews, that parent answered questions about himself or herself only once. Those questions were skipped in the subsequent interview(s) about the other child(ren), and the parent's responses were duplicated in the other child(ren)'s data record(s).
Census Region; Census Division; State
Users are reminded that these data are to be used solely for statistical analysis and reporting of aggregated information, and not for the investigation of specific individuals or organizations.
Access to these data is restricted. Users interested in obtaining these data must complete a Restricted Data Use Agreement. Data are provided via ICPSR's Virtual Data Enclave (VDE). Apply for access to these data through the ICPSR VDE portal. Information and instructions are available within the data portal. For further assistance, please consult the VDE Guide to learn about the application process, using the VDE, and requesting disclosure review of VDE output.
Data are provided via ICPSR's Virtual Data Enclave (VDE) where researchers will work with data stored on secure ICPSR servers. Researchers will not possess actual physical copies of the data; however, they may request permission to access selected output outside the virtual environment after review by ICPSR. See the Access Notes to apply for access. Researchers are also encouraged to read the VDE Guide.
The data files contain person-level (PERSONID) and household-level (R0#_HHID) identification variables that allow linkage of people within a file, between Adult and Youth/Parent files, and across waves of data collection. The values in these two variables are random and contain no direct or indirect personally identifiable information. Please review Appendix G in the Restricted-Use Files User Guide for information and programming code on linking files together. The files are sorted by the variable PERSONID.
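Because PERSONID identifies the same person across waves and across Adult and Youth/Parent files, two common linkage tasks reduce to simple operations on that key. The sketch below, which is not the code from Appendix G, finds the people present in every supplied wave file and groups people within one file by household. It assumes each file has been read into a list of dictionaries, and it assumes that the "#" in R0#_HHID stands for the wave number, so R01_HHID is used as a hypothetical Wave 1 name; check the codebook for the actual variable names.

```python
from collections import defaultdict
from typing import Dict, List, Set


def persons_in_all_waves(waves: Dict[int, List[dict]]) -> Set[str]:
    """Return the PERSONIDs that appear in every supplied wave file."""
    id_sets = [{row["PERSONID"] for row in rows} for rows in waves.values()]
    return set.intersection(*id_sets) if id_sets else set()


def group_by_household(rows: List[dict], hhid_var: str) -> Dict[str, List[str]]:
    """Map each household ID to the PERSONIDs sharing it within one file.

    hhid_var is the wave-specific household variable (hypothetically "R01_HHID").
    """
    households: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        households[row[hhid_var]].append(row["PERSONID"])
    return dict(households)
```

Households with more than one PERSONID in a Youth/Parent file are the cases in which a parent may have answered some questions only once, with the responses duplicated across records.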
ICPSR attempted to reproduce all information contained in the questionnaires in the question text used in the codebooks. Some of the longer programming instructions were not incorporated into the question text. In these cases, the question text includes a note directing the user to the full programming instructions in the corresponding section of the questionnaire. The codebook entries for derived and imputed variables include the algorithms used to create them. Users are advised to refer to the Restricted-Use Files User Guide and annotated questionnaires when reviewing the codebooks.
Some variables were withheld to limit the release of information that is a potential risk for disclosure. These variables are listed in Appendix E in the Restricted-Use Files User Guide.
The Youth Interview and Parent Interview questionnaires were distinct and separate questionnaires used in data collection. However, for each wave, both instruments have been combined into a single document since the responses to these instruments are also combined into a single data file.
Both the Adult and Youth questionnaires in each wave include several questions about tobacco brands and products the respondent usually uses and most recently used. For each question, a list of response options was displayed on the computer screen for the respondent to select. For many major brands and products, the displayed list included both a text label and a thumbnail image of the brand logo or product package. The displayed list differed by tobacco product type and included only the brands and products known to exist for that product type. Because these lists are long, they are not provided in a frequency table for each variable in the codebook or in the annotated instrument. For convenience, both the Adult and Youth/Parent codebooks contain an appendix with a frequency table of the top 20 responses for each variable. The PATH Study Master Tobacco Brand and Product Code Guide is available as an Excel workbook file [Documentation.xlsx (Tobacco_Brand)]. The spreadsheets in this Excel workbook file are protected and may not be edited. However, the last spreadsheet contains filters to narrow the complete list. This spreadsheet is the master file of all brand and product responses for these questions from all waves, including any responses that were not in the list of options displayed to the respondent.
The PATH Study Adult Variable Crosswalk and the PATH Study Youth Variable Crosswalk are also available as Excel workbook files. The spreadsheets in these Excel workbook files are protected and may not be edited. These crosswalk files are auxiliary files that can be used in conjunction with the annotated instruments and codebooks to quickly compare content across waves. The crosswalk files link questions across waves for each respective instrument so that users can easily identify the number of waves or time points at which data for specific questions are available for analyses.
In the Wave 1 and Wave 2 Youth/Parent files, the last section of the questionnaire contains demographic and health history questions. A few of the questions were asked of all youth. However, most questions were only asked of emancipated youth. The responses to these questions for non-emancipated youth were coded as "Inapplicable". The questionnaire and codebook note which variables were asked only of emancipated youth. Conversely, in the Parent Interview section the same questions were asked of parents of all sampled youth except for the emancipated youth. In this section the cases for emancipated youth were coded as "Inapplicable". There are a small number of emancipated youth in Waves 3 and 4, but there are no individual questions asked exclusively of emancipated youth.
In both the Adult and Youth/Parent data files, several groups of variables contain the word "RANDOM" in both the variable name and label. This indicates computerized randomization of the question order. These "RANDOM" variables detail the order in which the questions were asked of a particular respondent.
The Wave 1 data files contain 20 variable triplets pertaining to tobacco advertising. The computer randomly selected 20 advertisements and then asked the respondents whether they had seen the ad and whether they liked the ad. The Image ID variable (_AD) identifies which advertisement was displayed to the respondent and can be used to characterize the ad, e.g., by tobacco product and brand. However, vendors did not grant permission to publicly release the actual .jpg and .bmp files containing the images seen by respondents.
Derived and imputed (if present) demographic variables (age, sex, Hispanic ethnicity, and race) are included near the end of the data file. An accompanying imputation flag variable is also included. These variables are distinguished by the variable name starting with "R0#R" and contain the word "DERIVED" or "IMPUTED" in the variable label. Imputed variables are only available on the Wave 1 and Wave 4 data files.
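Because the prefix is systematic, derived and imputed variables can be selected programmatically. The Python sketch below filters a list of column names; it assumes "#" is a single-digit wave number and that the prefix is followed by an underscore (as in R05R_A_FIRST_EPROD_FLAV and R02R_A_P12M_BLUNTONLY_GRILLO). Reading the names from a data file is left out, the column list is illustrative, and checking the label for "DERIVED" or "IMPUTED" would be a separate step. Names with other prefixes, such as R01X_CB_REGION, are not matched.

```python
import re

# "R0#R" prefix: "R0", a one-digit wave number, then "R", followed by an underscore.
DERIVED_PREFIX = re.compile(r"^R0(\d)R_")


def derived_columns(columns, wave=None):
    """Return the column names that follow the R0#R naming convention.

    If `wave` is given, only names for that wave are kept. Checking the
    variable label for "DERIVED" or "IMPUTED" is a separate step.
    """
    selected = []
    for name in columns:
        match = DERIVED_PREFIX.match(name)
        if match and (wave is None or int(match.group(1)) == wave):
            selected.append(name)
    return selected


columns = [
    "PERSONID",                      # identifier, not derived
    "R05R_A_FIRST_EPROD_FLAV",       # Wave 5 Adult derived variable
    "R01X_CB_REGION",                # Wave 1 variable with a different prefix
    "R02R_A_P12M_BLUNTONLY_GRILLO",  # Wave 2 Adult derived variable
]
print(derived_columns(columns))          # both R0#R names
print(derived_columns(columns, wave=2))  # Wave 2 only
```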
Within the "Derived and Imputed Variables" section of the codebooks of the Adult and Youth/Parent files for Wave 1 and Wave 4 only are two geographic variables - Census Region and Census Division. For Wave 1 there are additional variables to designate urban areas and Census Block characteristics.
The Location Characteristics Restricted Use Files (LCRUF) are a set of auxiliary files that provide location-based information for the geographical areas in which respondents lived at the time of their interviews for each wave. The LCRUF for each wave consists of two data files: one for adults and one for youth. The adult LCRUF has one record for every Adult Interview completed in the given wave, and the youth LCRUF has one record for every Youth Interview completed in the given wave. Each file includes locale classifications identifying urban, suburban, town, and rural areas based on the 2021 definitions compiled by the National Center for Education Statistics (NCES). Also included are variables that identify the level of accuracy for matching geocoded information with NCES locale codes. The LCRUF Coding Documentation provides additional information on the geocoding and matching processes. Information regarding usage of these data is available in the Restricted-Use Files User Guide.
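Because each adult or youth LCRUF holds one record per completed interview in the wave, linking it to the questionnaire file for the same wave should be a one-to-one match. The standard-library Python sketch below assumes the two files share the PERSONID identifier (the identifier named in the 2024-06-14 version note for other auxiliary files) and that records are already loaded as dictionaries; the actual linking variables should be confirmed in the Restricted-Use Files User Guide. The column names LOCALE and R05_EXAMPLE are hypothetical.

```python
def link_one_to_one(questionnaire, lcruf, key="PERSONID"):
    """Attach LCRUF fields to questionnaire records that share the same key.

    Both inputs are lists of dicts. Each file should hold at most one record per
    completed interview, so a duplicated key raises ValueError. Returns the merged
    records (in questionnaire order) and the sorted keys found in only one file.
    """

    def index(records, source):
        indexed = {}
        for record in records:
            k = record[key]
            if k in indexed:
                raise ValueError(f"duplicate {key} {k!r} in {source}")
            indexed[k] = record
        return indexed

    q_index = index(questionnaire, "questionnaire file")
    l_index = index(lcruf, "LCRUF")
    merged = [{**rec, **l_index[k]} for k, rec in q_index.items() if k in l_index]
    unmatched = sorted(set(q_index) ^ set(l_index))
    return merged, unmatched


# Hypothetical records; LOCALE and R05_EXAMPLE are illustrative column names.
questionnaire = [
    {"PERSONID": "P0001", "R05_EXAMPLE": 1},
    {"PERSONID": "P0002", "R05_EXAMPLE": 2},
]
lcruf = [
    {"PERSONID": "P0001", "LOCALE": "Suburban"},
    {"PERSONID": "P0002", "LOCALE": "Rural"},
]
merged, unmatched = link_one_to_one(questionnaire, lcruf)
print(merged)
print(unmatched)
```

Reporting unmatched keys rather than silently dropping them makes it easy to spot records that fail to link.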
All Adult and Youth/Parent data files contain additional derived variables. These variables have names starting with "R0#R" and contain the word "DERIVED" in their variable labels. There are several variables for each tobacco category to identify certain classes of respondents with current and former tobacco use.
The Study Research Derived Variables Restricted Use Files (SRDV-RUF) are a set of auxiliary files that provide some derived variables used in tables, published papers, presentations, and other analyses that have been made public by the PATH Study through May 31, 2020. The objective of the SRDV-RUF is to provide access to these derived variables for use by other researchers. The SRDV-RUF for each wave consists of two data files: one for adults and one for youth. The adult SRDV-RUF has one record for every Adult Interview completed in the given wave, and the youth SRDV-RUF has one record for every Youth Interview completed in the given wave. Each file includes a set of variables with complex derivations. For example, some variables include assumptions about the meaning of skip patterns, and others involve combining the data into standardized indexes. An SRDV-RUF Supplement is also available. The Supplement (Excel) includes descriptions and algorithms for additional derived variables used for research purposes by the PATH Study but not included in the SRDV-RUF data files. It also includes citations for published works in which these derived variables were used. The derived variables are linked to citations using ArticleID, which is present in both the codebooks and the SRDV-RUF Supplement. This Supplement is available for reference so that researchers may use the algorithms to derive variables for their own purposes. SRDV-RUF data are available for Waves 1, 2, and 3.
In accordance with the study's informed consent, information is suppressed about individuals who withdrew from the PATH Study. Their information was recoded to a special missing value, designated as -97777.
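Because -97777 is stored as an ordinary number, it must be converted to missing before any numeric summary, or it will badly distort the result. A minimal Python sketch, using hypothetical values and an unweighted mean for brevity; it handles only -97777, while the other missing-value codes are listed in the Missing Values Code table of the User Guide.

```python
WITHDRAWN = -97777  # special missing value for participants who withdrew


def mask_withdrawn(values):
    """Replace the withdrawal code with None so it cannot enter numeric summaries."""
    return [None if v == WITHDRAWN else v for v in values]


def mean_observed(values):
    """Mean of the non-missing values, or None if nothing was observed."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed) if observed else None


ages = [34, WITHDRAWN, 51]                   # hypothetical responses
print(mean_observed(mask_withdrawn(ages)))  # 42.5 rather than a large negative number
```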
Consent forms provided to and signed by the respondents for the various types of interviews conducted and biological samples collected are included with the Wave 1 and Wave 4 files (the Informed Consent forms used for Wave 1 are provided with the Wave 1 files, and the Wave 4 Informed Consent form is provided with the Wave 4 files). Participants provide consent at their initial interview and biological sample collection; consents remain in effect for all subsequent waves. Aged-up adults who responded to a Youth Interview in a previous wave are re-consented as adults at the time of their first Adult Interview.
The Nonresponse Bias Analysis Report for Wave 1 details the response rates and the potential for bias from nonresponse. There are also Nonresponse Bias Analysis Reports for Wave 2, Wave 3, Wave 4, Wave 5, and Wave 6.
The Informed Consent Document and Nonresponse Bias Analysis Reports are specific to each wave. The same files are available as documentation for both the Adult and Youth/Parent data.
The questionnaires in this collection are updated versions of the fielded questionnaires that were annotated for analytic purposes. Spanish versions are also available.
The PATH Study's documentation is available for your use and may be reproduced in whole or in part without permission from NIH's National Institute on Drug Abuse or FDA's Center for Tobacco Products. Citation of the source is appreciated.
Additional background information including answers to frequently asked questions for study participants and researchers can be found in the Researchers section of the PATH Study Series page.
The Restricted-Use Files User Guide provides an overview of the entire PATH Study. The guide covers topics such as sample design, data collection, weighting, response rates, analytic considerations, and programming syntax to run common statistics and link the files together. Researchers should feel free to use the information in the User Guide in their publications, and the guide should be cited as follows:
The data for the PATH Study were collected and prepared by Westat. Westat's work for Waves 1 through 3 was performed under contract number HHSN271201100027C, and its work for Waves 4, 5, 6, and 7 was performed under contract number HHSN271201600001C.
The Population Assessment of Tobacco and Health (PATH) Study is a nationally representative longitudinal cohort study on tobacco use behavior, attitudes and beliefs, and tobacco-related health outcomes among adults and youth in the United States. The study's primary objectives are to:
At Wave 1, the study sampled over 150,000 mailing addresses which, using a four-stage stratified sampling design, yielded a sample of 45,971 respondents (32,320 adults / 13,651 youth) who completed a Wave 1 interview. People who use or do not use tobacco who were at least 9 years old living in a civilian, non-institutionalized setting were considered for participation during Wave 1. Youth who turn 18 by the next wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, 7,207 "shadow youth" (youth ages 9 to 11 sampled at Wave 1) are considered "aged-up youth" upon turning 12 years old, when they are asked to join the study. These 53,178 participants form the Wave 1 Cohort.
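The age rules above determine which instrument a sampled person receives. The Python sketch below is a simplified, hypothetical classifier by age at the current wave: it does not model parental consent, emancipation, or exact birthday timing, and the shadow-youth lower bound is a parameter because it is 10 for the Wave 4 replenishment sample and 9 at Waves 1 and 7. The final lines check the Wave 1 counts quoted in the text.

```python
def interview_type(age, shadow_min_age=9):
    """Simplified classification of a sampled person by age at the current wave.

    shadow_min_age is 9 for the Wave 1 and Wave 7 samples and 10 for the
    Wave 4 replenishment sample. Consent and emancipation rules are not modeled.
    """
    if age >= 18:
        return "Adult Interview"   # includes aged-up adults
    if age >= 12:
        return "Youth Interview"   # includes aged-up youth, after parental consent
    if age >= shadow_min_age:
        return "shadow youth"      # sampled now, interviewed once they turn 12
    return "not eligible"


# Wave 1 counts from the text: interviewed adults and youth, then the full cohort.
WAVE1_ADULTS, WAVE1_YOUTH, WAVE1_SHADOW = 32_320, 13_651, 7_207
assert WAVE1_ADULTS + WAVE1_YOUTH == 45_971
assert WAVE1_ADULTS + WAVE1_YOUTH + WAVE1_SHADOW == 53_178
```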
At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 4. This sample was recruited from close to 174,000 mailing addresses not selected for Wave 1, in the same sampled PSUs and segments using similar within-household sampling procedures. To meet the needs of the Wave 4 Cohort shadow sample, a randomly selected subset of the sampled addresses (115,500, or close to two-thirds of the addresses) was screened solely to identify shadow youth ages 10 to 11. The remaining addresses (close to 58,500) were screened for adults, youth, and shadow youth ages 10 to 11. These are referred to as the "SO" (shadow youth only) and "AYS" (adults, youth, and shadow youth) replenishment samples, respectively. This replenishment sample was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the civilian, noninstitutionalized population at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort.
At Wave 7, a probability sample of 14,863 adults, youth, and shadow youth ages 9 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 7. This sample was recruited from close to 244,000 mailing addresses not selected for Wave 1 or Wave 4, in the same sampled PSUs and segments using similar within-household sampling procedures. To meet the needs of the Wave 7 youth sample and of a Wave 7 Cohort shadow sample, the address sample was randomly divided into three subsamples. A subset of about 111,500 addresses (close to 45 percent) was screened solely to identify youth ages 9 to 14; another subset of about 97,000 addresses (close to 40 percent) was screened to identify youth ages 9 to 17. The remaining addresses (close to 36,000) were screened for adults, youth, and shadow youth ages 9 to 11. These subsamples are referred to as the "YYO" (young youth only, ages 9 to 14), "YO" (youth only, ages 9 to 17), and "AYS" (adults ages 18 and above, youth ages 12 to 17, and shadow youth ages 9 to 11) replenishment samples, respectively. This replenishment sample was combined for estimation and analysis purposes with Wave 7 adult and youth respondents from the Wave 1 and Wave 4 Cohorts who were at least age 15 and in the civilian, noninstitutionalized population at the time of Wave 7. This combined set of Wave 7 participants, 46,169 participants in total, forms the Wave 7 Cohort.
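Both the Wave 4 (SO/AYS) and Wave 7 (YYO/YO/AYS) splits randomly divide an address sample into screening subsamples. As an illustration only, the Python sketch below performs a simple random partition into subsamples of fixed size; the study's actual procedure (for example, any stratification by PSU or segment) is described in the User Guide and is not reproduced here. The 1,000 address IDs are hypothetical, with subsample sizes roughly proportional to the Wave 7 figures.

```python
import random


def partition_addresses(addresses, fixed_sizes, remainder_name, seed=None):
    """Randomly split addresses into named subsamples.

    `fixed_sizes` maps subsample names to sizes; whatever is left over after
    those subsamples are filled goes to `remainder_name`.
    """
    if sum(fixed_sizes.values()) > len(addresses):
        raise ValueError("fixed subsample sizes exceed the number of addresses")
    shuffled = list(addresses)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    parts, start = {}, 0
    for name, size in fixed_sizes.items():
        parts[name] = shuffled[start:start + size]
        start += size
    parts[remainder_name] = shuffled[start:]
    return parts


addresses = [f"ADDR{i:04d}" for i in range(1000)]  # hypothetical address IDs
wave7 = partition_addresses(addresses, {"YYO": 457, "YO": 398}, "AYS", seed=2024)
print({name: len(ids) for name, ids in wave7.items()})
```

A two-way Wave 4 style split works the same way, e.g. `partition_addresses(addresses, {"SO": 664}, "AYS")`, since 115,500 is close to two-thirds of about 174,000.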
The Adult files contain a single record for every adult who completed an interview in the wave. The Youth/Parent files contain a single record for every youth who completed an interview in a given wave. Parents who provided permission for their child to complete a Youth Interview were asked to complete a brief Parent Interview that contained questions about parental supervision, school performance, and tobacco use by youth. The Parent Interview is primarily an interview about the child(ren), not the parent. Almost all youth respondents had a parent or guardian complete the Parent Interview (over 99.0 percent). When multiple youth from the same household were selected to be in the study, the parent(s) completed separate interviews about each youth. If one parent completed multiple interviews, then questions asked about him or her were only asked once and skipped in the other interview(s). The parent's responses were then duplicated for the other child or children.
All data were collected through in-person interviews in Waves 1, 2, 3, 4, and 5. For the continued safety of PATH Study participants and interviewers during the COVID-19 pandemic, data collection for Wave 6 began with telephone interviews only. As conditions improved in certain parts of the country, the PATH Study began in-person interviews with participants on May 7, 2021. All in-person contacts with participants were conducted in compliance with local and state restrictions for COVID-19 mitigation. Wave 6 data were collected with a mix of telephone and in-person interviews. Data collection for Wave 7 began with both telephone and in-person interviews. In addition, a pilot test was conducted to evaluate the feasibility of introducing Web versions of the PATH Study interviews. A sample of continuing adults, parents, and youth was selected to complete the Wave 7 data collection via the Web. Wave 7 data were collected with a mix of telephone, Web, and in-person interviews.
A $2 incentive was mailed to all addresses sampled at Wave 1 and Wave 4 prior to screening. At Wave 7, a $2 incentive was mailed to all addresses in the AYS portion of the replenishment sample. Addresses in the YYO and YO portions of the replenishment sample were prepaid a $5 incentive along with an invitation to complete the pre-screener on the Web, and an offer of an additional $5 for completing the online survey. All households that completed the pre-screener received the completion incentive: $5 if the pre-screener was returned via mail or if it was returned via the Web before the nonresponse follow-up mailing; $10 if the pre-screener was returned via the Web after the nonresponse follow-up mailing.
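The Wave 7 YYO/YO completion-incentive rule is a small decision table. A Python sketch using the amounts stated above; the function and argument names are hypothetical.

```python
PREPAID_YYO_YO = 5  # USD mailed with the Wave 7 YYO/YO invitation


def prescreener_completion_incentive(mode, after_followup_mailing):
    """Completion incentive (USD) for a Wave 7 YYO/YO household that returned the pre-screener.

    mode is "mail" or "web"; after_followup_mailing says whether the return came
    after the nonresponse follow-up mailing.
    """
    if mode == "mail":
        return 5
    if mode == "web":
        return 10 if after_followup_mailing else 5
    raise ValueError(f"unknown return mode: {mode!r}")


# A late Web return: $5 prepaid plus the $10 completion incentive.
print(PREPAID_YYO_YO + prescreener_completion_incentive("web", after_followup_mailing=True))
```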
Adult respondents were paid $35 for their participation in Wave 1, Wave 2, Wave 3, and Wave 4. In Wave 5, Wave 6, and Wave 7, adult respondents were paid $50 for their participation. In Wave 1, Wave 2, Wave 3, and Wave 4, youth were paid $25 to complete the Youth Interview, and their parents were given $10 for each Parent Interview. In Wave 5, Wave 6, and Wave 7, youth were paid $35 to complete the Youth Interview, and their parents were given $15 for each Parent Interview.
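The payment schedule can be expressed as a per-wave lookup. The Python sketch below encodes the amounts stated above and computes a household total under a hypothetical simplifying assumption: every Youth Interview is accompanied by one Parent Interview (parents complete a separate interview about each youth, and over 99 percent of youth had one).

```python
# Respondent payments (USD) by wave, as stated in the text.
ADULT = {wave: 35 if wave <= 4 else 50 for wave in range(1, 8)}
YOUTH = {wave: 25 if wave <= 4 else 35 for wave in range(1, 8)}
PARENT_PER_INTERVIEW = {wave: 10 if wave <= 4 else 15 for wave in range(1, 8)}


def household_payment(wave, n_adult_interviews=0, n_youth_interviews=0):
    """Total paid to one household in a wave.

    Assumes (hypothetically) that every Youth Interview has one accompanying
    Parent Interview, since parents complete a separate interview about each youth.
    """
    return (ADULT[wave] * n_adult_interviews
            + (YOUTH[wave] + PARENT_PER_INTERVIEW[wave]) * n_youth_interviews)


print(household_payment(1, n_adult_interviews=1, n_youth_interviews=2))  # 35 + 2 * (25 + 10)
print(household_payment(5, n_adult_interviews=1, n_youth_interviews=2))  # 50 + 2 * (35 + 15)
```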
A four-stage stratified area probability sample design was used in the PATH Study, with a two-phase design for sampling adults at the final stage. At the first stage, a stratified sample of geographical primary sampling units (PSUs) was selected, in which a PSU is a county or group of counties. For the second stage, within each selected PSU, smaller geographical segments were formed and then a sample of these segments was drawn. At the third stage, the sampling frame consisted of the residential addresses located in these segments. The fourth stage selected adults and youth from the sampled households identified at these addresses, with varying sampling rates for adults by age, race, and tobacco use status. Adults were sampled in two phases - Phase 1 sampling used information provided in the household screener and Phase 2 sampling used information provided by the adult in the Phase 2 screener at the beginning of the Adult Instrument. Please consult the Restricted-Use Files User Guide for additional details about the sampling.
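In a multistage design, a person's unconditional selection probability is the product of the conditional selection probabilities at each stage (here PSU, segment, address, and the two adult sampling phases), and the base weight is its inverse. The Python sketch below illustrates only that arithmetic with made-up probabilities; PATH's final weights include further adjustments, such as for nonresponse, that are documented in the Restricted-Use Files User Guide.

```python
from math import isclose, prod


def overall_selection_probability(stage_probs):
    """Product of the conditional selection probabilities across sampling stages."""
    stage_probs = list(stage_probs)
    if not all(0 < p <= 1 for p in stage_probs):
        raise ValueError("each stage probability must be in (0, 1]")
    return prod(stage_probs)


def base_weight(stage_probs):
    """Inverse of the overall selection probability (before any adjustments)."""
    return 1 / overall_selection_probability(stage_probs)


# Made-up conditional probabilities for one sampled adult.
stages = {
    "PSU": 0.10,
    "segment within PSU": 0.05,
    "address within segment": 0.20,
    "adult, Phase 1 (household screener)": 0.50,
    "adult, Phase 2 given Phase 1 (Phase 2 screener)": 0.80,
}
w = base_weight(stages.values())
assert isclose(w, 2500)  # 1 / (0.10 * 0.05 * 0.20 * 0.50 * 0.80)
print(round(w))
```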
People who use or do not use tobacco products in the civilian, non-institutionalized household population of the United States aged 9 and older at the time of Wave 1 (Wave 1 Cohort); People who use or do not use tobacco products in the civilian, non-institutionalized household population of the United States aged 10 and older at the time of Wave 4 (Wave 4 Cohort); People who use or do not use tobacco products in the civilian, non-institutionalized household population of the United States aged 9 or older at the time of Wave 7 (Wave 7 Cohort)
In Wave 1, Wave 2, Wave 3, Wave 4, Wave 5, Wave 6 and Wave 7 adults and youth were asked about the following types of tobacco products:
Although each section on tobacco products has some unique questions, most questions fit into one of the following categories:
Additional topics, in at least one wave, include:
Most questions asked in the questionnaires are categorical. Other questions ask, for example, the age at which something occurred or the person's body measurements. Responses to these questions are numerical.
The weighted response rates for the Wave 1 Cohort of the PATH Study are shown below. The Wave 1 interview rates are conditional on completion of the Wave 1 screener. The response rates for Waves 2, 3, 4, 5, and 6 are conditional on Wave 1 participation.
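A conditional weighted response rate divides the summed weights of respondents by the summed weights of all eligible members of the conditioning group (for example, Wave 1 participants for a Wave 2 rate). The Python sketch below shows that ratio with hypothetical weights; which weights and eligibility rules PATH actually uses is specified in the Restricted-Use Files User Guide.

```python
def weighted_response_rate(cases):
    """Sum of weights of respondents divided by sum of weights of eligible cases.

    `cases` holds (weight, responded) pairs for the conditioning population only,
    for example Wave 1 participants when computing a Wave 2 rate.
    """
    eligible_weight = responding_weight = 0.0
    for weight, responded in cases:
        eligible_weight += weight
        if responded:
            responding_weight += weight
    if eligible_weight == 0:
        raise ValueError("no eligible cases")
    return responding_weight / eligible_weight


# Hypothetical (weight, responded) pairs
cases = [(100.0, True), (250.0, False), (150.0, True)]
print(weighted_response_rate(cases))
```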
The weighted response rates for the adults and youth in the Wave 4 replenishment sample are shown below. The Wave 4 interview rates for the adults and youth in this sample are conditional on completion of the Wave 4 screener.
The weighted response rates for the Wave 4 Cohort of the PATH Study are shown below. The response rates for Wave 5 are conditional on interview response or shadow youth participation at Wave 4 (for replenishment sample members selected as shadow youth).
The weighted response rates for the adults and youth in the Wave 7 replenishment sample are shown below. The Wave 7 interview rates for the adults and youth in this sample are conditional on completion of the Wave 7 screener.
Please consult the Restricted-Use Files User Guide for further information regarding response rates.
2015-12-19
2024-06-14 Data files and documentation with Location Characteristics for Waves 1, 2, 3, 4, and 5 were added to the collection. State Design Data, Wave 7 Adult and Youth/Parent State Identifier Data and Wave 7 Adult Tobacco Universal Product Code (UPC) Data were updated to correct PERSONID values for some records to match and merge with Questionnaire data files.
2024-04-08 Wave 7 Adult and Youth Questionnaire and Weight data and documentation files were added to the collection along with Wave 7 Adult and Youth/Parent State Identifier Data and Wave 7 Adult Tobacco Universal Product Code (UPC) Data. Wave 6 Ever/Never Reference data and documentation were also added. The State Design data and documentation were updated to include participants recruited in Wave 7. The Restricted Use Files User Guide, State Identifier Restricted Use Files User Guide, and Tobacco UPC Data Restricted Use Files User Guide were updated. The Master Tobacco Brand and Product List, Adult Variable Crosswalk, and Youth Variable Crosswalk were updated to include Wave 7.
2023-10-04 Updated Restricted-Use Files User Guide. Removed Collection Note regarding PATH Data User Forum due to pending decommission.
2023-05-19 Corrected file names for Study Research Derived Variable data files.
2023-04-27 Study Research Derived Variables (SRDV) data files and SRDV Supplement for Waves 1, 2, and 3 were added to the collection.
2023-04-03 2023-03-31 Wave 6 Adult and Youth Questionnaire and Weight data and documentation files were added to the collection along with Wave 6 Adult and Youth/Parent State Identifier Data and Wave 6 Adult Tobacco Universal Product Code (UPC) Data. The Restricted Use Files User Guide and Tobacco UPC Data Restricted Use Files User Guide were updated. The Master Tobacco Brand and Product List, Adult Variable Crosswalk, and Youth Variable Crosswalk were updated to include Wave 6. The Wave 6 Nonresponse Bias Analysis (NRBA) Report was added to the collection.
2022-11-09 Adult and Youth Data files and Codebooks across Waves 1-5 were updated to improve the clarity and consistency of Codebook notes, variable long descriptions, and variable labels. Adult Data files across Waves 1-5 were also updated to reflect withdrawal by a participant (indicated by Special Missing -97777). The Ever/Never Reference data files for Waves 3, 4, and 5 were also updated to reflect participant withdrawal.
Restricted Use Files User Guide was updated to correct Table 5-3 and add clarifications in Section 5.4.3.5.
2022-04-21 Wave 5 Ever/Never Reference data file and documentation added.
2021-12-16 Four derived variables were added to the Wave 5 Adult Questionnaire data file and codebook (DS5001): R05R_A_FIRST_EPROD_FLAV, R05R_A_FIRST_GTRAD_FLAV, R05R_A_FIRST_GRILLO_FLAV, and R05R_A_FIRST_GFILTR_FLAV. Four derived variables were also added to the Wave 5 Youth / Parent Questionnaire data file and codebook (DS5002): R05R_Y_FIRST_EPROD_FLAV, R05R_Y_FIRST_GTRAD_FLAV, R05R_Y_FIRST_GRILLO_FLAV, and R05R_Y_FIRST_GFILTR_FLAV. Minor edits were made for 1 variable in the Wave 5 Adult Questionnaire codebook and 2 variables in the Wave 5 Youth / Parent Questionnaire codebook. Algorithms for 82 derived variables were corrected in the Wave 5 Adult Questionnaire codebook and algorithms for 12 derived variables were corrected in the Wave 5 Youth / Parent Questionnaire codebook. These corrections were communicated to data users on 2021-11-17 in the codebook errata. (Codebook Errata download removed)
2021-11-11 Study was updated to include Codebook Errata for Wave 5 Adult and Youth / Parent codebooks. Codebooks will be updated in December to incorporate revised information and new variables.
2021-06-29 Study was updated to resolve issue of multiple user guide files.
2021-06-24 Data and documentation were updated for the Wave 5 Adult Questionnaire and all associated Wave 5 Adult Weights. Updated Restricted-Use Files (RUF) User Guide, Tobacco Brand Frequencies, Adult and Youth Crosswalks to Include Wave 5, and Wave 5 Non-response Bias Analysis Report. Data and documentation related to the Master Linkage File were removed: please see the Master Linkage File Study (ICPSR 38008).
2021-02-23 Wave 5 Adult and Youth Questionnaire and Weight data files were added to the collection along with Wave 5 Adult and Youth/Parent State Identifier Data and Wave 5 Adult Tobacco Universal Product Code (UPC) Data. The Restricted Use Files User Guide, State Identifier Restricted Use Files User Guide, and Tobacco UPC Data Restricted Use Files User Guide were updated. Data and documentation for the Master linkage file were updated to reflect the addition of these files.
2020-06-24 The study was updated to include the TUPCRUF User Guide.
2020-06-22 State Design Data, Adult State Identifier Data and Youth/Parent State Identifier Data for Waves 2-4 were added to the collection and the State Identifier Restricted Use File User Guide was updated. Tobacco Universal Product Code (UPC) Data for Waves 1-4 were also added to the collection, along with a Tobacco UPC Data Restricted Use File User Guide. Data for the Master linkage file was updated to reflect the addition of these files.
2020-03-31 Data for the Master linkage file was updated.
2020-03-23 Data and documentation for the Master linkage file were updated. Wave 4 Ever/Never data file has been added. Nonresponse Bias Analysis Reports for Waves 1-3 have been updated as well. Codebooks for the Wave 1 Adult, Wave 1 Youth/Parent, and Wave 3 Adult data files have been updated to adjust question text. This study is also being updated to reflect the correct version number.
2020-03-19 Data and documentation for the Master linkage file were updated. Wave 4 Ever/Never data file has been added. Nonresponse Bias Analysis Reports for Waves 1-3 have been updated as well. Codebooks for the Wave 1 Adult, Wave 1 Youth/Parent, and Wave 3 Adult data files have been updated to adjust question text.
2019-11-21 Wave 3 Adult codebook was updated to correct spelling error in question text.
2019-11-05 Dataset and corresponding documentation numbering schemes have changed across all waves/releases.
Adult and Youth Data files across Waves 1-4 were updated to improve the clarity and consistency of variable labels, as well as to reflect the withdrawn participants (indicated by Special Missing -97777). Documentation was updated for 508 compliance at this time.
Wave 1 and Wave 2 Adult data files were updated to include 11 Lifetime Threshold of Use Derived Variables. Wave 1 Youth data files were updated to include 1 Lifetime Threshold of Use Derived Variable. Wave 3 Youth data files were updated to include 8 Lifetime Threshold of Use Derived variables.
2019-05-30 2019-04-08 Data and documentation for the Master linkage file were updated. Wave 4 Adult and Youth Questionnaire and Weight data files have been added.
2019-02-01 Updating to include public codebooks for Wave 1 Adult and Youth State Identifier data files and the Master linkage file and the public User Guide for the State Identifier Restricted-Use Files.
2019-02-01 Wave 1 Adult and Youth State Identifier data files were added to the collection. Data and documentation for the Master linkage file were updated.
2018-10-01 2018-09-28 Data and documentation for the Master linkage file were updated. The Nonresponse Bias Analysis Report is now included for Wave 3.
2018-05-01 Wave 3 Adult and Youth data files were added to the collection. Wave 1 and Wave 2 Adult and Youth data files were updated to improve the clarity and consistency of variable labels, especially in the Nicotine Dependence section.
A new variable was added to Wave 1 and Wave 2 Adult data - R0#_ND_DATA_ROUTE. A second variable was added to the Wave 2 Adult data - R02R_A_P12M_BLUNTONLY_GRILLO. An additional 18 derived variables in the Wave 2 Adult data were revised and replaced the original variables. The revised variables keep the original name with "_REV" appended.
A skip error was identified in the Wave 2 Adult instrument, which resulted in some respondents being asked two questions when they should not have been. Therefore, the affected items, R02_AG0100CG and R02_AG0100FC, contain some extra data. Notes were added to the annotated instrument and codebook to describe the issue.
The User Guide and Questionnaires were also updated to improve understanding of the data files. A Nonresponse Bias Analysis report is now included for Wave 2.
2018-02-15 The citation of this study may have changed due to the new version control system that has been implemented. The previous citation was:
2017-06-19 The Wave 1 and Wave 2 data files, for both Adults and Youth, were updated to correct minor errors, and the questionnaires were updated to correct minor typos and clarify specifications.
2017-04-27 A minor revision was made to both the English and Spanish versions of the Wave 1 Adult questionnaire. The User Guide was also updated. Two Excel crosswalks, one for Adults and one for Youth, were added to the available documentation to highlight the differences between the Wave 1 and Wave 2 files.
2017-04-03 An update was made to internal files to correct an issue with how missing values are displayed online through ICPSR's variables database.
2017-03-23 Minor revisions were made to the Missing Values Code table within the User Guide and both Codebooks for Wave 2.
2017-03-15 Data from Wave 2 of the study were added to the collection. The User Guide and Master Tobacco Brand and Product Code Guide were expanded to include information for Wave 2.
2017-01-31 The variable R01X_CB_REGION in both the Wave 1 Adult and Youth/Parent files was updated to correct an error in the value labels. The values for codes 2 and 3 had been inadvertently swapped. The data did not change; only the value labels for codes 2 and 3 have been corrected.
2016-11-28 An additional 40 derived variables were added to the end of the Wave 1 Youth / Parent file that are similar to those already in the Wave 1 Adult file. Information for individuals who withdrew from the study is denoted in the datasets by the special missing value -97777. Spanish versions of the annotated instruments are also now available.
2016-05-24 The study's title changed with the removal of the year range. The Informed Consent Document and Non-Response Bias Analysis Report were changed from being study level files to being a part of the Wave 1 (DS1001 and DS1002) specific documentation.
2016-04-22 An additional documentation file (Non-response_Report) was added to the collection.
2016-04-20 Updated the file names only for the two study level documentation files (Informed_Consent and Tobacco_Brand) so that each file was easily identified and distinguishable from the other. No change was made to the content in either file.
2016-04-18 Coding was updated for the sexual attraction variables. The questionnaires were revised to enhance the clarity of the ASK statements. The PDF codebooks now contain full question text from the questionnaires. Lastly, the PATH Study Master Tobacco Brand and Product Code Guide and an Informed Consent Document were also released.
2016-01-13 PDF codebooks were released without question text. Updated codebooks that include question text will be released in the near future.
2015-12-19
ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection:
Each data file for Wave 1 and Wave 2 contains weights for use in analyses of the data from the complex PATH Study sample design. The final full-sample person-level weight for Waves 1 and 2 on the Adult file is R0#_A_PWGT, and the final full-sample person-level weight for Waves 1 and 2 on the Youth / Parent file is R0#_Y_PWGT.
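Point estimates use the full-sample person-level weight. The Python sketch below computes a weighted proportion with R01_A_PWGT, the Wave 1 instance of R0#_A_PWGT; the INDICATOR variable and all values are hypothetical, and dropping missing responses is only one possible treatment. Standard errors additionally require the replicate weights and design variables described in the Restricted-Use Files User Guide.

```python
def weighted_mean(values, weights):
    """Weighted mean (ratio estimator) of `values` using survey weights."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no weight on observed values")
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical Wave 1 adult records: a 0/1 indicator and the R01_A_PWGT weight.
records = [
    {"INDICATOR": 1, "R01_A_PWGT": 1200.0},
    {"INDICATOR": 0, "R01_A_PWGT": 800.0},
    {"INDICATOR": 1, "R01_A_PWGT": 2000.0},
    {"INDICATOR": None, "R01_A_PWGT": 500.0},  # missing response, excluded
]
estimate = weighted_mean([r["INDICATOR"] for r in records], [r["R01_A_PWGT"] for r in records])
print(estimate)
```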
The weights for Wave 3 are in two sets of files:
The weights for Wave 4 are in three sets of files corresponding to the Wave 1 and Wave 4 Cohorts and different weight types:
The weights for Wave 5 are in five sets of files corresponding to the Wave 1 and Wave 4 Cohorts and different weight types:
The weights for Wave 6 are in six sets of files corresponding to the Wave 1 and Wave 4 Cohorts and different weight types:
The weights for Wave 7 are in seven sets of files corresponding to the Wave 1, Wave 4, and Wave 7 Cohorts and different weight types:
For each weight mentioned above, there are also 100 replicate weights and design variables (VARPSU and VARSTRAT) for use in variance estimation. Detailed information on how these variables were created, and how and why they should be used is provided in the Restricted-Use Files User Guide.
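Replicate-weight variance estimation re-computes the estimate once per replicate weight and measures the spread of those replicate estimates around the full-sample estimate. The Python sketch below assumes Fay's variant of balanced repeated replication (BRR) with a Fay coefficient of 0.3 and deviations taken around the full-sample estimate; both should be confirmed against the Restricted-Use Files User Guide before use. Replicate weight names such as R01_A_PWGT1 through R01_A_PWGT100 are assumptions, and the example uses only two hypothetical replicates instead of 100.

```python
from math import sqrt


def weighted_mean(values, weights):
    """Weighted mean of `values` using one set of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def fay_brr_se(full_estimate, replicate_estimates, fay=0.3):
    """Standard error from Fay-adjusted BRR.

    var = sum((theta_r - theta)^2) / (R * (1 - fay)^2), with theta the full-sample estimate.
    """
    R = len(replicate_estimates)
    if R == 0:
        raise ValueError("no replicate estimates")
    if not 0 <= fay < 1:
        raise ValueError("fay must be in [0, 1)")
    ss = sum((est - full_estimate) ** 2 for est in replicate_estimates)
    return sqrt(ss / (R * (1 - fay) ** 2))


def estimate_with_se(values, full_weights, replicate_weights, fay=0.3):
    """Weighted mean and its Fay-BRR standard error; replicate_weights is a list of weight lists."""
    theta = weighted_mean(values, full_weights)
    reps = [weighted_mean(values, rw) for rw in replicate_weights]
    return theta, fay_brr_se(theta, reps, fay)


# Hypothetical data: three respondents, full-sample weights, two replicate weight sets.
theta, se = estimate_with_se(
    [1, 0, 1],
    [1200.0, 800.0, 2000.0],
    [[1500.0, 700.0, 1800.0], [900.0, 900.0, 2200.0]],
)
print(theta, se)
```

Computing deviations around the full-sample estimate (rather than the mean of the replicates) is a common convention for replicate variance; the User Guide's syntax examples should be checked for the exact choice PATH recommends.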
Note that the weighting procedures adjust for oversampling of specified population groups and nonresponse. ICPSR strongly recommends that researchers read and understand the sections pertaining to weights before analyzing the data to ensure correct use of these variables.
The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.
One or more files in this data collection have special restrictions. Restricted data files are not available for direct download from the website; click on the Restricted Data button to learn more.