Version Date: Aug 18, 2021
Principal Investigator(s):
United States Department of Health and Human Services. National Institutes of Health. National Institute on Drug Abuse;
United States Department of Health and Human Services. Food and Drug Administration. Center for Tobacco Products
Series:
https://doi.org/10.3886/ICPSR37519.v4
Version V4
You are currently viewing an older version of this data collection. A more recent version may be available.
Additional information about this collection can be found in Version History.
2021-08-18 Data and documentation related to the Master Linkage File were retired: please see the Master Linkage File Study (ICPSR 38008).
2021-02-23 Wave 4.5 Youth/Parent State Identifier Data and Wave 4.5 Ever/Never data files were added to the collection.
2020-06-24 The user guide for this study has been updated and online variable search capabilities have been added for this study.
2020-03-19 ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection:
The PATH Study was launched in 2011 to inform the Food and Drug Administration's regulatory activities under the Family Smoking Prevention and Tobacco Control Act (TCA). The PATH Study is a collaboration between the National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), and the Center for Tobacco Products (CTP), Food and Drug Administration (FDA). The study sampled over 150,000 mailing addresses across the United States to create a national sample of tobacco users and non-users.
A total of 45,971 adults and youth constitute the first (baseline) wave, Wave 1, of data collected by this longitudinal cohort study. These 45,971 adults and youth, along with the 7,207 "shadow youth" (youth ages 9 to 11 sampled at Wave 1), make up the 53,178 participants that constitute the Wave 1 Cohort. Respondents are asked to complete an interview at each follow-up wave. Youth who turn 18 by the current wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, "shadow youth" are considered "aged-up youth" upon turning 12 years old, when they are asked to complete an interview after parental consent.
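The aging-up rules above can be expressed as a small classification function. This is a simplified sketch, not the study's eligibility logic: the function name and the entry codes "adult", "youth", and "shadow" are hypothetical, and real eligibility also depends on consent and on the residence criteria described in the Restricted-Use Files User Guide.

```python
VALID_ENTRY_TYPES = {"adult", "youth", "shadow"}


def respondent_type(age_at_wave: int, entered_as: str) -> str:
    """Classify a Wave 1 Cohort member at a follow-up wave (simplified sketch).

    entered_as is a hypothetical code for how the person entered the study at
    Wave 1: "adult", "youth" (ages 12 to 17), or "shadow" (ages 9 to 11).
    """
    if entered_as not in VALID_ENTRY_TYPES:
        raise ValueError(f"unknown entry type: {entered_as!r}")
    if entered_as == "adult":
        return "adult"
    if age_at_wave >= 18:
        # Turned 18 by this wave: invited to complete the Adult Interview.
        return "aged-up adult"
    if entered_as == "shadow":
        # Shadow youth become "aged-up youth" at 12 (interviewed after parental consent).
        return "aged-up youth" if age_at_wave >= 12 else "shadow youth"
    return "youth"


print(respondent_type(18, "youth"))   # aged-up adult
print(respondent_type(12, "shadow"))  # aged-up youth
print(respondent_type(11, "shadow"))  # shadow youth
```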
At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 4. This sample was recruited from residential addresses not selected for Wave 1 in the same sampled PSUs and segments using similar within-household sampling procedures. This "replenishment sample" was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the civilian, noninstitutionalized population at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort. Please refer to the Restricted-Use Files User Guide, which provides further details about children designated as "shadow youth" and the formation of the Wave 1 and Wave 4 Cohorts.
Wave 4.5 is a special data collection for youth only, namely those who were ages 12 to 17 at the time of the Wave 4.5 interview. For members of the Wave 1 Cohort, Wave 4.5 is the fourth annual follow-up wave; for those sampled at Wave 4, it is the first annual follow-up wave.
Dataset 0001 (DS0001) contains the data from the Master Linkage File. This file contains 54 variables and 67,276 cases. It provides a master list of every person's unique identification number and the type of respondent they were at each wave, starting with Wave 1.
Dataset 1002 (DS1002) contains the data from the Wave 4.5 Youth (and Parent) Questionnaire. This file contains 1,617 variables and 13,131 cases.
Datasets 1112, 1212, and 1222 (DS1112, DS1212, and DS1222) contain the weight variables for Wave 4.5. The "all-waves" weight file contains weights for participants in the Wave 1 Cohort who completed a Wave 4.5 youth interview and completed interviews (if old enough to do so) or verified their information with the study (if not old enough to be interviewed) in Waves 1, 2, 3, and 4.
There are two separate files with "single-wave" weights: one for the Wave 1 Cohort and one for the Wave 4 Cohort. The "single-wave" weight file for the Wave 1 Cohort contains weights for youth who completed an interview in Wave 1 and in Wave 4.5, regardless of their participation in the intervening waves. The "single-wave" weight file for the Wave 4 Cohort contains weights for all Wave 4.5 youth interview respondents in the Wave 4 Cohort.
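The eligibility rules for the three Wave 4.5 weight files can be sketched as a function. The status codes ("interview", "verified"), the cohort labels ("W1", "W4"), and the function name are hypothetical; the sketch follows only the rules stated above and omits any additional conditions the User Guide may impose.

```python
from typing import Dict, Optional, Set

# Hypothetical status codes: "interview" = completed an interview,
# "verified" = too young to be interviewed but verified information, None = neither.
COMPLETED = {"interview", "verified"}


def wave45_weight_files(cohort: str, status: Dict[str, Optional[str]]) -> Set[str]:
    """Return the Wave 4.5 weight files a youth would appear in (simplified sketch)."""
    files: Set[str] = set()
    if status.get("4.5") != "interview":
        return files  # only Wave 4.5 youth interview respondents receive Wave 4.5 weights
    if cohort == "W4":
        files.add("single-wave, Wave 4 Cohort")
        return files
    if cohort != "W1":
        raise ValueError(f"unknown cohort: {cohort!r}")
    # Single-wave weight: interviewed at Wave 1 and Wave 4.5, intervening waves ignored.
    if status.get("1") == "interview":
        files.add("single-wave, Wave 1 Cohort")
    # All-waves weight: interviewed or verified at every wave from 1 through 4.
    if all(status.get(w) in COMPLETED for w in ("1", "2", "3", "4")):
        files.add("all-waves")
    return files
```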
Dataset 1402 (DS1402) contains the Wave 4.5 State Identifier data for Youth (and Parents) and has 5 variables and 13,131 cases. The State Identifier dataset includes PERSONID for linking the State Identifier to the questionnaire data and 3 variables designating the state (state FIPS, state abbreviation, and full name of the state). The State Identifier values in this dataset represent participants' state of residence at the time of Wave 4.5.
Dataset 1503 (DS1503) contains data derived from responses to questionnaires in Wave 1, Wave 2, Wave 3, Wave 4, and Wave 4.5 indicating if participants had ever/never used various tobacco products as of the Wave 4.5 data collection period. This data file contains 26 variables for all 67,276 study participants as of the Wave 4.5 data collection. This file is provided for reference only to simplify the definitions of tobacco use variables in the Adult and Youth data files for subsequent waves.
Users are reminded that these data are to be used solely for statistical analysis and reporting of aggregated information, and not for the investigation of specific individuals or organizations.
Access to these data is restricted. Users interested in obtaining these data must complete a Restricted Data Use Agreement. Data are provided via ICPSR's Virtual Data Enclave (VDE). Apply for access to these data through the ICPSR VDE portal. Information and instructions are available within the data portal. For further assistance please reference the VDE Guide to learn about the application process, about using the VDE, and how to request disclosure review of VDE output.
The PATH Study Data User Forum allows researchers using any PATH Study data files to communicate with each other to ask and answer questions. Announcements, data releases and updates, new publications, upcoming events, and other information for PATH Study data users will also be posted to the forum.
Data are provided via ICPSR's Virtual Data Enclave (VDE) where researchers will work with data stored on secure ICPSR servers. Researchers will not possess actual physical copies of the data; however, they may request permission to access selected output outside the virtual environment after review by ICPSR. See the Access Notes to apply for access. Researchers are also encouraged to read the VDE Guide.
The data files contain person-level (PERSONID) and household-level identification (X0#_HHID) variables allowing linkage of people within a file, between Adult and Youth/Parent files, and across waves of data collection. The values in these two variables are random and contain no direct or indirect personally identifiable information. Please review Appendix D in the Restricted-Use Files User Guide for information and programming code on linking files together. The files are sorted by the variable PERSONID.
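Linking, for example, the Wave 4.5 Youth/Parent questionnaire data to the State Identifier file amounts to a one-to-one join on PERSONID. A minimal standard-library sketch follows, assuming each file has been read into a list of dictionaries with one record per PERSONID. The record layout and every column name other than PERSONID are hypothetical, and Appendix D of the Restricted-Use Files User Guide remains the reference for the official linking code.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def link_on_personid(left: Iterable[Record], right: Iterable[Record]) -> List[Record]:
    """Inner-join two files on PERSONID, keeping the order of `left`.

    Assumes PERSONID is unique within `right` (one record per person per file).
    Columns present in both inputs take the value from `right`.
    """
    index: Dict[object, Record] = {}
    for row in right:
        pid = row["PERSONID"]
        if pid in index:
            raise ValueError(f"duplicate PERSONID in right-hand file: {pid!r}")
        index[pid] = row
    linked: List[Record] = []
    for row in left:
        match = index.get(row["PERSONID"])
        if match is not None:
            linked.append({**row, **match})
    return linked


# Made-up records; Q_EXAMPLE and STATE_ABBR are hypothetical column names.
youth = [{"PERSONID": "P0001", "Q_EXAMPLE": 1}, {"PERSONID": "P0002", "Q_EXAMPLE": 2}]
state = [{"PERSONID": "P0002", "STATE_ABBR": "MD"}]
print(link_on_personid(youth, state))
# [{'PERSONID': 'P0002', 'Q_EXAMPLE': 2, 'STATE_ABBR': 'MD'}]
```

Household-level linkage would follow the same pattern on the X0#_HHID variable, except that a household can contain several people, so the join would no longer be one-to-one.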
ICPSR attempted to duplicate all information contained in the questionnaires into the question text used in the codebooks. Some of the longer programming instructions were not incorporated into the question text. In these cases, the question text includes a note for the user to read the full programming instructions in the corresponding section of the questionnaire. Derived and imputed variables contain the algorithms used in the creation of these variables. Users are advised to refer to the Restricted-Use Files User Guide and annotated questionnaires when reviewing the codebooks.
Some variables were withheld to limit the release of information that is a potential risk for disclosure. These variables are listed in Appendix B in the Restricted-Use Files User Guide.
The Youth Interview and Parent Interview questionnaires were distinct and separate questionnaires used in data collection. However, for each wave, both instruments have been combined into a single document since the responses to these instruments are also combined into a single data file.
The Youth questionnaire in Wave 4.5 includes several questions about the tobacco brands and products the respondent usually uses and most recently used. For each question, a list of response options was displayed on the computer screen for the respondent to select. For many major brands and products, the displayed list included both a text label and a thumbnail image of the brand logo or product package. The displayed list was different for each tobacco product type, showing the brands and products known to exist for that type. Because these lists are long, they are not provided in a frequency table for each variable in the codebook or in the annotated instrument. For convenience, the Youth/Parent codebook contains an appendix with a frequency table of the top 20 responses for each variable. The PATH Study Master Tobacco Brand and Product Code Guide is available as an Excel workbook file [Documentation.xlsx (Tobacco_Brand)]. The spreadsheets in this workbook are protected and may not be edited; however, the last spreadsheet contains filters to narrow the complete list. This spreadsheet is the master file of all brand and product responses for these questions from all waves, including any responses that were not in the list of options displayed to the respondent.
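A frequency table like the top-20 appendix can be approximated with collections.Counter. This is a sketch of one way to tabulate brand or product codes, not ICPSR's procedure; the function name and the example codes are made up, and whether missing-data codes should be excluded is a choice left to the analyst.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def top_responses(values: Iterable[Hashable], n: int = 20,
                  exclude: frozenset = frozenset()) -> List[Tuple[Hashable, int]]:
    """Return the n most frequent response codes, skipping any codes in `exclude`."""
    counts = Counter(v for v in values if v not in exclude)
    return counts.most_common(n)


# Made-up brand/product codes.
print(top_responses([101, 205, 101, 330, 101, 205], n=2))  # [(101, 3), (205, 2)]
```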
In the Parent Interview section, the same questions were asked of parents of all sampled youth except for the emancipated youth. In this section the cases for emancipated youth were coded as "Inapplicable". There are a small number of emancipated youth in Wave 4.5, but there are no individual questions asked exclusively of emancipated youth.
In the Youth/Parent data files, several groups of variables contain the word "RANDOM" in both the variable name and label. This indicates computerized randomization of the question order. These "RANDOM" variables detail the order in which the questions were asked of a particular respondent.
The Youth/Parent data file contains additional derived variables. These variables can be identified by names starting with "X04R" and by the word "DERIVED" in the variable label. There are several such variables for each tobacco category that identify certain classes of current and former tobacco users.
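Because derived variables follow a naming and labeling convention, they can be picked out programmatically. The sketch below assumes the variable labels are available as a name-to-label dictionary (for example, exported from the codebook); the variable names and labels in the example are hypothetical.

```python
from typing import Dict, List


def derived_variable_names(labels: Dict[str, str], prefix: str = "X04R") -> List[str]:
    """Names that start with `prefix` and whose label contains the word DERIVED."""
    return sorted(
        name for name, label in labels.items()
        if name.startswith(prefix) and "DERIVED" in label.upper()
    )


# Hypothetical variable names and labels, for illustration only.
labels = {
    "X04R_EXAMPLE_CURRENT": "DERIVED - Current user of example product",
    "X04R_EXAMPLE_FORMER": "DERIVED - Former user of example product",
    "X04_EXAMPLE_RAW": "Ever used example product",
}
print(derived_variable_names(labels))
# ['X04R_EXAMPLE_CURRENT', 'X04R_EXAMPLE_FORMER']
```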
In accordance with the study's informed consent, information is suppressed about individuals who withdrew from the PATH Study. Their information was recoded to a special missing value, designated as -97777.
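Before analysis, the withdrawal code should be converted to a true missing value so it is never averaged or counted as a response. This sketch handles only -97777; the codebooks may define other reserved negative codes, which would need their own treatment. The function name and field names in the example are hypothetical.

```python
from typing import Dict, Optional

WITHDRAWN_CODE = -97777  # special missing value for participants who withdrew


def recode_withdrawn(record: Dict[str, object]) -> Dict[str, Optional[object]]:
    """Replace the withdrawal code with None so it is never treated as a real value."""
    return {name: (None if value == WITHDRAWN_CODE else value)
            for name, value in record.items()}


print(recode_withdrawn({"PERSONID": "P0003", "Q_EXAMPLE": -97777, "Q_OTHER": 4}))
# {'PERSONID': 'P0003', 'Q_EXAMPLE': None, 'Q_OTHER': 4}
```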
Consent forms provided to and signed by the respondents for the various types of interviews conducted and biological samples collected are included with the Wave 1 and Wave 4 files (the Informed Consent forms used for Wave 1 are provided with the Wave 1 files, and the Wave 4 Informed Consent form is provided with the Wave 4 files). Participants provide consent at their initial interview and biological sample collection; consents remain in effect for all subsequent waves.
The Nonresponse Bias Analysis Report for Wave 4.5 details the response rates and the potential for bias from nonresponse.
The questionnaires in this collection are updated versions of the fielded questionnaires that were annotated for analytic purposes. Spanish versions are also available.
The PATH Study's documentation is available for your use and may be reproduced in whole or in part without permission from NIH's National Institute on Drug Abuse or FDA's Center for Tobacco Products. Citation of the source is appreciated.
Additional background information including answers to frequently asked questions for study participants and researchers can be found in the Researchers section of the PATH Study Series page.
The Restricted-Use Files User Guide provides an overview of the entire PATH Study. The guide covers topics such as sample design, data collection, weighting, response rates, and programming syntax to run common statistics and link the files together. Researchers should feel free to use the information in the User Guide for their publications, and the guide should be cited as follows:
Work for Wave 4.5 was performed under contract number HHSN271201600001C.
The Population Assessment of Tobacco and Health (PATH) Study is a longitudinal cohort study on tobacco use behavior, attitudes and beliefs, and tobacco-related health outcomes among approximately adults and youth in the United States. The study's primary objectives are to:
At Wave 1, the study sampled over 150,000 mailing addresses which, using a four-stage stratified sampling design, yielded a sample of 45,971 respondents (32,320 adults and 13,651 youth) who completed a Wave 1 interview. Tobacco users and non-users who were at least 9 years old and living in a civilian, noninstitutionalized setting were considered for participation during Wave 1. Youth who turn 18 by the next wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, 7,207 "shadow youth" (youth ages 9 to 11 sampled at Wave 1) are considered "aged-up youth" upon turning 12 years old, when they are asked to join the study. These 53,178 participants form the Wave 1 Cohort.
At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 4. This sample was recruited from close to 174,000 mailing addresses not selected for Wave 1, in the same sampled PSUs and segments, using similar within-household sampling procedures. To meet the needs of the Wave 4 Cohort shadow sample, a randomly selected subset of the sampled addresses (115,500, or close to two-thirds of the addresses) was screened solely to identify shadow youth ages 10 to 11. The remaining addresses (close to 58,800) were screened for adults, youth, and shadow youth ages 10 to 11. These are referred to as the "SO" (shadow youth only) and "AYS" (adults, youth, and shadow youth) replenishment samples, respectively. This "replenishment sample" was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the civilian, noninstitutionalized population (CNP) at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort.
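The Wave 1 figures can be reconciled with simple arithmetic: the adult and youth respondents sum to the 45,971 Wave 1 respondents, and adding the 7,207 shadow youth gives the 53,178-member Wave 1 Cohort.

```python
adults, youth = 32_320, 13_651
shadow_youth = 7_207

wave1_respondents = adults + youth               # completed a Wave 1 interview
wave1_cohort = wave1_respondents + shadow_youth  # plus shadow youth ages 9 to 11

print(wave1_respondents, wave1_cohort)  # 45971 53178
```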
The target population for the Wave 1 Cohort in Wave 4.5 is the resident population of the U.S. ages 13 to 17 at the time of Wave 4.5 (other than those who were incarcerated) who were in the U.S. CNP at the time of Wave 1. The target population for the Wave 4 Cohort in Wave 4.5 is the resident population of the U.S. ages 12 to 17 at the time of Wave 4.5 (other than those who were incarcerated) who were in the U.S. CNP at the time of Wave 4.
The Youth/Parent file contains a single record for every youth who completed an interview in Wave 4.5. Parents who provided permission for their child to complete the Youth Interview were asked to complete a brief Parent Interview that contained questions about parental supervision, school performance, and tobacco use by youth. The Parent Interview is primarily an interview about the child(ren), not the parent. Almost all youth respondents had a parent or guardian complete the Parent Interview (over 99.0 percent). When multiple youth from the same household were selected to be in the study, the parent(s) completed separate interviews about each youth. If one parent completed multiple interviews, then questions asked about him or her were only asked once and skipped in the other interview(s). The parent's responses were then duplicated for the other child or children.
A $2 incentive was mailed to all addresses sampled at Wave 1 and Wave 4 prior to screening. Adult respondents were paid $35 for their participation in Wave 1, Wave 2, Wave 3, and Wave 4. In Wave 1, Wave 2, Wave 3, Wave 4, and Wave 4.5, youth were paid $25 to complete the Youth Interview, and their parents were given $10 for each parent interview.
A four-stage stratified area probability sample design was used in the PATH Study, with a two-phase design for sampling adults at the final stage. At the first stage, a stratified sample of geographical primary sampling units (PSUs) was selected, in which a PSU is a county or group of counties. For the second stage, within each selected PSU, smaller geographical segments were formed and then a sample of these segments was drawn. At the third stage, the sampling frame consisted of the residential addresses located in these segments. The fourth stage selected adults and youth from the sampled households identified at these addresses, with varying sampling rates for adults by age, race, and tobacco use status. Adults were sampled in two phases: Phase 1 sampling used information provided in the household screener, and Phase 2 sampling used information provided by the adult in the Phase 2 screener at the beginning of the Adult instrument. Please consult the Restricted-Use Files User Guide for additional details about the sampling. There was no additional sampling for Wave 4.5. Wave 4.5 is a special data collection in which PATH Study participants ages 12 to 17 at the time of Wave 4.5 were interviewed.
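To make the stage structure concrete, here is a toy four-stage selection over a nested frame of strata, PSUs, segments, addresses, and persons. It is illustrative only: it uses simple random selection at stages 1 to 3 and a single per-group Bernoulli rate at stage 4, whereas the actual selection probabilities, stratification, and the two-phase adult sampling are documented in the Restricted-Use Files User Guide. The frame layout, function name, and parameters are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Person = Tuple[str, str]  # (hypothetical person ID, sampling group such as "youth")
Frame = Dict[str, Dict[str, Dict[str, Dict[str, List[Person]]]]]


def four_stage_sample(frame: Frame, n_psus: int, n_segments: int, n_addresses: int,
                      person_rates: Dict[str, float], seed: int = 0) -> List[str]:
    """Toy four-stage sample: stratum -> PSU -> segment -> address -> persons."""
    rng = random.Random(seed)
    selected: List[str] = []
    for stratum in sorted(frame):
        psus = frame[stratum]
        # Stage 1: select PSUs (counties or groups of counties) within each stratum.
        for psu in rng.sample(sorted(psus), min(n_psus, len(psus))):
            segments = psus[psu]
            # Stage 2: select geographic segments within each sampled PSU.
            for seg in rng.sample(sorted(segments), min(n_segments, len(segments))):
                addresses = segments[seg]
                # Stage 3: select residential addresses within each sampled segment.
                for addr in rng.sample(sorted(addresses), min(n_addresses, len(addresses))):
                    # Stage 4: select persons, with a rate that depends on their group.
                    for person_id, group in addresses[addr]:
                        if rng.random() < person_rates.get(group, 0.0):
                            selected.append(person_id)
    return selected


# Made-up frame: 1 stratum, 3 PSUs, 2 segments each, 4 addresses each, 2 persons each.
frame = {"S1": {f"PSU{p}": {f"SEG{s}": {f"ADDR{a}": [(f"P{p}{s}{a}{i}", "youth") for i in range(2)]
                                         for a in range(4)}
                            for s in range(2)}
                for p in range(3)}}
sample = four_stage_sample(frame, n_psus=2, n_segments=1, n_addresses=3,
                           person_rates={"youth": 1.0})
print(len(sample))  # 12
```

With every person-level rate set to 1.0, the sample size is simply the product of the stage sizes (2 PSUs × 1 segment × 3 addresses × 2 persons = 12); lowering the rate for a group thins that group at the final stage only.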
The resident population of the United States who were ages 13-17 at the time of Wave 4.5 (other than those who were incarcerated) and part of the civilian, non-institutionalized household population of the United States at the time of Wave 1 (Wave 1 Cohort); the resident population of the United States who were ages 12-17 at the time of Wave 4.5 (other than those who were incarcerated) and part of the civilian, non-institutionalized household population of the United States at the time of Wave 4 (Wave 4 Cohort).
Parents and youths were asked about the following types of tobacco products:
Although each section on tobacco products has some unique questions, most questions fit into one of the following categories:
Additional topics include:
Most questions asked in the questionnaires are categorical. Other questions ask, for example, the age at which something occurred or the person's body measurements. Responses to these questions are numerical.
The weighted Wave 4.5 youth interview response rate for the Wave 1 Cohort (conditional on Wave 1 participation) was 74.6 percent; the weighted Wave 4.5 youth interview response rate for the Wave 4 Cohort (conditional on Wave 4 participation) was 89.1 percent.
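In general form, a weighted response rate is the sum of weights for responding eligible cases divided by the sum of weights for all eligible cases. The sketch below computes that generic ratio; which weights PATH uses and how eligibility is conditioned on prior-wave participation are described in the Restricted-Use Files User Guide and the Nonresponse Bias Analysis Report. The function name and example inputs are made up.

```python
from typing import Iterable, Tuple


def weighted_response_rate(cases: Iterable[Tuple[float, bool]]) -> float:
    """Percent of eligible weight that responded; cases are (weight, responded) pairs."""
    total = 0.0
    responded = 0.0
    for weight, did_respond in cases:
        total += weight
        if did_respond:
            responded += weight
    if total == 0:
        raise ValueError("no eligible weight")
    return 100.0 * responded / total


print(weighted_response_rate([(2.0, True), (1.0, False), (1.0, True)]))  # 75.0
```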
Please consult the Restricted-Use Files User Guide for further information regarding response rates.
At Wave 4.5, only youth ages 12 to 17 were interviewed, along with parents. There are two longitudinal weights available for analysis of Wave 4.5 data for the Wave 1 Cohort: the all-waves weight and the single-wave weight. The "all-waves" weight file contains weights for those Wave 1 Cohort participants who completed a Wave 4.5 interview and completed interviews (if old enough to do so) or verified their information (if not old enough to be interviewed) in Waves 1, 2, 3, and 4. The Wave 4.5 single-wave weight was assigned to participants who completed an interview in Wave 1 and in Wave 4.5, regardless of their participation in the intervening waves. In addition, there is a single-wave weight for all Wave 4.5 youth interview respondents in the Wave 4 Cohort.
For each weight mentioned above, there are also 100 replicate weights and design variables (VARPSU and VARSTRAT) for use in variance estimation. Detailed information on how these variables were created, and how and why they should be used, is provided in the Restricted-Use Files User Guide.
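For a weighted mean, the replicate weights can be used roughly as follows. This sketch assumes the replicate weights implement Fay's variant of balanced repeated replication (BRR) and uses a Fay coefficient of 0.3 as its default; both the method and the coefficient are assumptions to confirm in the Restricted-Use Files User Guide, which is the authoritative source for variance estimation and for how VARPSU and VARSTRAT should be used. The function names are hypothetical, and in practice established survey software is usually preferable to hand-rolled code.

```python
import math
from typing import List, Sequence, Tuple


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def fay_brr_se(values: Sequence[float], full_weights: Sequence[float],
               replicate_weights: List[Sequence[float]],
               fay: float = 0.3) -> Tuple[float, float]:
    """Weighted mean and its standard error from replicate weights (Fay-adjusted BRR).

    variance = sum_r (theta_r - theta)^2 / (R * (1 - fay)^2), with R replicates
    (R would be 100 for each PATH weight).
    """
    if not replicate_weights:
        raise ValueError("need at least one replicate")
    for rw in replicate_weights:
        if len(rw) != len(values):
            raise ValueError("each replicate needs one weight per case")
    theta = weighted_mean(values, full_weights)  # full-sample estimate
    r = len(replicate_weights)
    squared = sum((weighted_mean(values, rw) - theta) ** 2 for rw in replicate_weights)
    variance = squared / (r * (1.0 - fay) ** 2)
    return theta, math.sqrt(variance)
```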