Population Assessment of Tobacco and Health (PATH) Study [United States] Restricted-Use Files (ICPSR 36231)
Principal Investigator(s): United States Department of Health and Human Services. National Institutes of Health. National Institute on Drug Abuse; United States Department of Health and Human Services. Food and Drug Administration. Center for Tobacco Products
The Population Assessment of Tobacco and Health (PATH) Study was launched in 2011 to inform the Food and Drug Administration's regulatory activities under the Family Smoking Prevention and Tobacco Control Act (TCA). The PATH Study is a collaboration between the National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), and the Center for Tobacco Products (CTP), Food and Drug Administration (FDA). The PATH Study collected data from adults and youth. The study sampled over 150,000 mailing addresses across the United States to create a national sample of tobacco users and non-users. The sample resulted in 45,971 respondents.
These 45,971 individuals constitute the first (baseline) annual wave of data collected by this longitudinal cohort study. Respondents are asked to complete an interview at each follow-up wave. Youth who turn 18 by the current wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, the 7,207 "shadow youth", those not yet 12 at the time of the Wave 1 interview, are considered "aged-up youth" and are asked to join the study upon turning 12, once parental consent has been obtained. Please refer to the User Guide, which provides further details about the children designated as "shadow youth". At each subsequent wave of data collection, the parents of sampled youth are invited to complete a short Parent Interview about their child(ren).
Dataset 0001 (DS0001) contains the data from the Master Linkage file. This file contains 4 variables and 53,178 cases. The file provides a master list of every person's unique identification number and what type of respondent they were for each wave.
Dataset 1001 (DS1001) contains the data from the Wave 1 Adult Interview. This data file contains 2,010 variables and 32,320 cases. Each of the cases represents a single, completed interview.
Dataset 1002 (DS1002) contains the data from the Wave 1 Youth (and Parent) Interviews. This file contains 1,430 variables and 13,651 cases.
Dataset 2001 (DS2001) contains the data from the Wave 2 Adult Interview. This data file contains 2,409 variables and 28,362 cases. Of these cases, 26,447 also completed a Wave 1 Adult Interview. The other 1,915 cases are "aged-up adults" having previously completed a Wave 1 Youth Interview.
Dataset 2002 (DS2002) contains the data from the Wave 2 Youth Interview. This data file contains 1,588 variables and 12,172 cases. Of these cases, 10,081 also completed a Wave 1 Youth Interview. The other 2,091 cases are aged-up youth respondents having previously been flagged as "shadow youth".
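The case counts quoted for the five datasets can be cross-checked with simple arithmetic. The sketch below uses only figures stated in this description. The variable names are illustrative, and reading the Master Linkage total as "Wave 1 respondents plus shadow youth" is an inference from the totals adding up, not a statement from the documentation.

```python
# Case counts quoted in the dataset descriptions above.
wave1_adult = 32_320      # DS1001
wave1_youth = 13_651      # DS1002
shadow_youth = 7_207      # not yet 12 at the time of the Wave 1 interview
master_linkage = 53_178   # DS0001

# Wave 2 files split into continuing respondents and aged-up respondents.
wave2_adult = {"continuing": 26_447, "aged_up_adults": 1_915}   # DS2001
wave2_youth = {"continuing": 10_081, "aged_up_youth": 2_091}    # DS2002

wave1_total = wave1_adult + wave1_youth

checks = {
    "wave1_respondents": wave1_total == 45_971,
    # Inference: the linkage file appears to list respondents plus shadow youth.
    "master_linkage": wave1_total + shadow_youth == master_linkage,
    "ds2001_cases": sum(wave2_adult.values()) == 28_362,
    "ds2002_cases": sum(wave2_youth.values()) == 12_172,
}

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```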
Each case in an Adult data file represents a single, completed interview. Each case in a Youth data file represents one youth and his or her parent's responses about that youth. Parents who provided permission for their child to participate in a Youth Interview were asked to complete a brief interview about their child. In both waves of data collection, fewer than 0.5 percent of the parents did not complete an interview. Most questions are asked in reference to the child.
In Wave 1, about 88 percent of the "parent" respondents were the biological mother or father. When multiple youth from the same household were selected to be in the study, the parent(s) completed separate interviews about each youth. If one parent completed two or more interviews, that parent answered questions about himself/herself only once. Those questions were then skipped in the subsequent interview(s) for the other child(ren), and the responses were duplicated in the record(s) for the other child(ren).
One or more files in this data collection have special restrictions; consult the restrictions note to learn more. You can apply online for access to the restricted-use data. A login is required to apply.
Users are reminded that these data are to be used solely for statistical analysis and reporting of aggregated information, and not for the investigation of specific individuals or organizations.
Access to these data is restricted. Users interested in obtaining these data must complete a Restricted Data Use Agreement. Data are provided via ICPSR's Virtual Data Enclave (VDE). Apply for access to these data through the ICPSR VDE portal. Information and instructions are available within the data portal. For further assistance please reference the VDE Guide to learn about the application process, about using the VDE, and how to request disclosure review of VDE output.
Any public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.
United States Department of Health and Human Services. National Institutes of Health. National Institute on Drug Abuse, and United States Department of Health and Human Services. Food and Drug Administration. Center for Tobacco Products. Population Assessment of Tobacco and Health (PATH) Study [United States] Restricted-Use Files. ICPSR36231-v9. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2017-03-15. http://doi.org/10.3886/ICPSR36231.v9
Persistent URL: http://doi.org/10.3886/ICPSR36231.v9
This study was funded by:
- United States Department of Health and Human Services. National Institutes of Health. National Institute on Drug Abuse
- United States Department of Health and Human Services. Food and Drug Administration. Center for Tobacco Products.
Scope of Study
Subject Terms: adults, advertising, alcohol, bidis, chewing tobacco, cigarettes, cigarillos, cigars, dissolvable tobacco, e-cigarettes, filtered cigars, hookah, kreteks, marijuana, marketing, mental health, nicotine addiction, nicotine dependence, parents, pipe tobacco, second-hand smoke, sexual preference, smokeless tobacco, smoking cessation, snus pouches, social media, substance abuse, tobacco products, tobacco use, youths
Geographic Coverage: United States
Time Period:
- 2013--2014 (Wave 1)
- 2014--2015 (Wave 2)
Date of Collection:
- 2013-09--2014-12 (Wave 1)
- 2014-10--2015-10 (Wave 2)
Data are provided via ICPSR's Virtual Data Enclave (VDE) where researchers will work with data stored on secure ICPSR servers. Researchers will not possess actual physical copies of the data; however, they may request permission to access selected output outside the virtual environment after review by ICPSR. See the Access Notes to apply for access. Researchers are also encouraged to read the VDE Guide.
The data files contain person-level (PERSONID) and household-level identification (R0#_HHID) variables allowing linking of people within a file, between Adult and Youth/Parent files, and across waves of data collection. The values in these two variables are random and contain no direct or indirect personally identifiable information. Please review Appendix D in the User Guide for information and programming code on linking files together. The files are sorted by the variable PERSONID.
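As a rough illustration of the linking described above (the authoritative programming code is in Appendix D of the User Guide), the following sketch joins two toy record sets on PERSONID and groups people by household. The records, the analysis variables `w1_smoker` and `w2_smoker`, and the helper functions are hypothetical. The sketch also assumes that the "#" in R0#_HHID stands for the wave number, so that the Wave 1 household identifier is named R01_HHID.

```python
from collections import defaultdict

# Hypothetical toy records; the real files hold thousands of variables.
wave1_adult = [
    {"PERSONID": "P001", "R01_HHID": "H10", "w1_smoker": 1},
    {"PERSONID": "P002", "R01_HHID": "H10", "w1_smoker": 0},
    {"PERSONID": "P003", "R01_HHID": "H11", "w1_smoker": 1},
]
wave2_adult = [
    {"PERSONID": "P001", "w2_smoker": 0},
    {"PERSONID": "P003", "w2_smoker": 1},
]


def link_waves(left, right, key="PERSONID"):
    """Inner-join two lists of records on a shared person identifier."""
    right_by_id = {row[key]: row for row in right}
    return [
        {**row, **right_by_id[row[key]]}
        for row in left
        if row[key] in right_by_id
    ]


def group_by_household(rows, hh_key="R01_HHID"):
    """Map each household identifier to the PERSONIDs found in it."""
    households = defaultdict(list)
    for row in rows:
        households[row[hh_key]].append(row["PERSONID"])
    return dict(households)


linked = link_waves(wave1_adult, wave2_adult)
households = group_by_household(wave1_adult)

for row in linked:
    print(row["PERSONID"], row["w1_smoker"], row["w2_smoker"])
print(households)
```

An inner join keeps only people present in both waves. An analysis of attrition would instead keep unmatched Wave 1 records.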
ICPSR attempted to duplicate all information contained in the questionnaires into the question text used in the PDF codebooks. Some of the longer programming instructions were not incorporated into the question text. In these cases the question text includes a note pointing to the questionnaire so that a user may read the full programming instructions for further clarity. Derived and imputed variables contain the algorithms used in the creation of these variables. Users are advised to refer to the User Guide and annotated questionnaires when reviewing the codebooks.
Some variables were withheld to limit the release of information that is a potential risk for disclosure. These variables are listed in Appendix B in the User Guide.
The Youth Interview and Parent Interview instruments were distinct and separate instruments used in data collection. However, for each wave both instruments have been combined into a single document since the responses to these instruments are also combined into a single file.
Both the Adult and Youth instruments in each wave include several questions about tobacco brands and products the respondent usually uses and most recently used. For each question, a list of response options was displayed on the computer screen for the respondent to select from. For a large number of major brands and products, the displayed list included both a text label and a thumbnail image of the brand logo or product package. The displayed list was different for each of the tobacco product types, with the brands and products listed being those that were known to exist for the specific tobacco product type. Because these lists are long, they are not provided in a frequency table for each variable in the codebook or in the annotated instrument. For convenience, both the Adult and Youth codebooks contain an appendix that provides a frequency table of the top 20 responses for each variable. The PATH Study Master Tobacco Brand and Product Code Guide is available as an Excel workbook file [Documentation.xlsx (Tobacco_Brand)]. The spreadsheets in this Excel workbook file are protected and may not be edited. However, the last spreadsheet contains filters to narrow the complete list. This particular spreadsheet is the master file of all brand and product responses for these questions from both Wave 1 and Wave 2, including any responses that were not in the list of options displayed to the respondent.
In the Youth / Parent file (DS1002 and DS2002), the last section of the questionnaire contains demographic and health history questions. A few of the questions were asked of all youth. However, the majority of questions were only asked of emancipated youth. The responses to these questions for non-emancipated youth were coded "Inapplicable". The questionnaire and codebook note which variables were only asked of emancipated youth. Conversely, in the Parent Interview section the same questions were asked of parents of all sampled youth with the exception of the emancipated youth. In this section the cases for emancipated youth were coded as "Inapplicable".
In both the Adult and Youth / Parent data files, several groups of variables contain the word "RANDOM" in both the variable name and label. This indicates computerized randomization of the question order. These "RANDOM" variables detail the order in which the questions were asked for a particular respondent.
Each data file contains a section about tobacco advertising. This section contains 20 variable triplets. The computer randomly selected 20 advertisements and then asked the respondents whether they had seen each ad and whether they liked it. The Image ID variable (_AD) identifies the advertisement that was displayed to the respondent and can be used to characterize the ad, e.g., by tobacco product and brand. However, vendors did not grant permission to release the actual .jpg and .bmp files containing the images seen by respondents.
Derived and imputed demographic variables (age, sex, Hispanic ethnicity, and race) are included near the end of each data file. The Adult file also contains education. An accompanying imputation flag variable is also included. These variables are distinguished by the variable name starting with "R0#R" and contain the word "DERIVED" or "IMPUTED" in the variable label.
Within the "Derived and Imputed Variables" section of the PDF codebooks for both waves of the Adult and Youth/Parent files are two geographic variables - Census Region and Census Division. There are additional variables to designate urban areas and Census Block characteristics.
All Adult and Youth/Parent data files contain additional derived variables. These variables can be distinguished by the variable name starting with "R0#R" and contain the word "DERIVED" in the variable label. There are several variables for each tobacco category to identify certain classes of current and former tobacco users.
In accordance with the study's informed consent, information is suppressed about individuals who withdraw from the PATH Study. Their information was recoded to a special missing value, designated as -97777.
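Because -97777 is a numeric code, leaving it in place would distort any summary statistic. A minimal sketch of masking it before analysis follows. The sample values and helper names are hypothetical, and other missing-value codes may exist in the files, so the codebooks remain the reference for a complete list.

```python
WITHDRAWN = -97777  # special missing value named in the study notes


def mask_withdrawn(values, sentinel=WITHDRAWN):
    """Replace the withdrawal code with None so it cannot enter a calculation."""
    return [None if v == sentinel else v for v in values]


def mean_ignoring_missing(values):
    """Average the non-missing values; return None if nothing remains."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept) if kept else None


# Hypothetical responses to a categorical item coded 1 or 2.
raw = [1, 2, -97777, 2, 1, -97777]
clean = mask_withdrawn(raw)

naive_mean = sum(raw) / len(raw)           # badly distorted by the sentinel
masked_mean = mean_ignoring_missing(clean)

print(clean)
print(naive_mean, masked_mean)
```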
The current release only contains the restricted-use versions of Wave 1 and Wave 2 data files. A public-use file for Wave 2 will be made available later in 2017. Wave 3 data files are tentatively planned to be released in 2018.
The documentation file titled Documentation.pdf (Informed_Consent) includes the six consent forms provided to and signed by the respondents for the various types of interviews conducted and biological samples collected. Participants provide consent at their initial interview and biological sample collection; consents remain in effect for subsequent waves. Aged-up adults who previously responded to the youth interview at one wave need to re-consent at the time of the subsequent wave.
The questionnaires in this collection are updated versions of the fielded questionnaires that were annotated for analytic purposes. Spanish versions are also available.
The User Guide provides an overview of the entire PATH Study. The guide covers topics such as sample design, data collection, weighting, response rates, and programming syntax for running common statistics and linking the files together. Researchers should feel free to use the information in the User Guide for their publications, and the guide should be cited as follows:
- United States Department of Health and Human Services. National Institutes of Health. National Institute on Drug Abuse, and United States Department of Health and Human Services. Food and Drug Administration. Center for Tobacco Products. Population Assessment of Tobacco and Health (PATH) Study [United States] Restricted-Use Files, User Guide. ICPSR36231-v9 Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2017-03-15. http://doi.org/10.3886/ICPSR36231.userguide
Additional background information including answers to frequently asked questions for study participants and researchers can be found in the Researchers section of the PATH Study series page.
The data for the PATH Study were collected and prepared by Westat. The contract number under which this work was performed is HHSN271201100027C.
The Population Assessment of Tobacco and Health (PATH) Study is a longitudinal cohort study on tobacco use behavior, attitudes and beliefs, and tobacco-related health outcomes among an estimated 46,000 adults and youth in the United States. Taken directly from the PATH Study web site, the study's primary objectives are to:
- Objective 1: Identify and explain between-person differences and within-person changes in tobacco-use patterns, including the rate and length of use by specific product type and brand, product/brand switching over time, uptake of new products, and dual- and poly-use of tobacco products (i.e., use of multiple products within the same time period, and switching between multiple products).
- Objective 2: Identify between-person differences and within-person changes in risk perceptions regarding harmful and potentially harmful constituents, new and emerging tobacco products, filters and other design features of tobacco products, packaging, and labeling; and, identify other factors that may affect use, such as social influences and individual preferences.
- Objective 3: Characterize the natural history of tobacco dependence, cessation, and relapse including readiness and self-efficacy to quit, motivations for quitting, the number and length of quit attempts, and the length of abstinence related to various tobacco products.
- Objective 4: Update the comprehensive baseline and subsequent waves of data on tobacco-use behaviors and related health conditions (including markers of exposure and tobacco-related disease processes identified from the collection and analysis of biospecimens) to assess between-person differences and within-person changes over time in health conditions potentially related to tobacco use, particularly with use of new and different tobacco products, including modified-risk tobacco products. Each wave may also facilitate the selection of individuals by disease status, biomarker levels, or tobacco use status for participation in small-scale research studies (see Objective 8).
- Objective 5: Assess associations between TCA-specific actions and tobacco-product use, risk perceptions and attitudes, use patterns, cessation outcomes, and tobacco-related intermediate endpoints (e.g., exposure and disease biomarker levels). Analyses will attempt to account for other potential factors, such as demographics, local tobacco-control policies, and social, familial, and economic factors, that may influence the observed patterns.
- Objective 6: Assess between-person differences and within-person changes over time in attitudes, behaviors, exposures to tobacco products, and related biomarkers among and within population subgroups defined by racial-ethnic, gender, age, and risk factors (e.g., pregnancy or co-occurring substance use or mental health disorders).
- Objective 7: To the extent to which sample sizes are sufficient, assess and compare samples of former and never users of tobacco products for between-person differences and within-person changes in relapse and uptake, risk perceptions, and indicators of tobacco exposure and disease processes.
- Objective 8: Use the PATH Study's comprehensive baseline (i.e., Wave 1) and first follow-up wave (i.e., Wave 2) data on tobacco-use behaviors, attitudes, related health conditions (including markers of exposure, tobacco use-related disease processes identified from the collection and analysis of bio-specimens) as a potential basis to screen respondents for participation in small-scale research studies. Such studies would be submitted for approval to the Office of Management and Budget (OMB), for example, through one of the PATH Study's two generic clearances for cognitive testing or for methodological studies, or as an embedded study within a revision request, such as a request to conduct a small-scale research study during a follow-up wave of data and bio-specimen collection.
The study sampled over 150,000 mailing addresses which, using a four-stage stratified sampling design, yielded a sample of 45,971 respondents (32,320 adults / 13,651 youth) who completed a Wave 1 interview. Tobacco users and non-users who were at least 12 years old living in a civilian, non-institutionalized setting were considered for participation during Wave 1. Youth who turn 18 by the next wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, the 7,207 "shadow youth", those not yet 12 at the time of the Wave 1 interview, are considered "aged-up youth" and are asked to join the study upon turning 12.
The Adult files contain a single record for every adult participant. The Youth / Parent files contain a single record for every youth who participated in a given wave. Parents who provided permission for their child to complete a Youth Interview were asked to complete a brief Parent Interview that contained questions about parental supervision, school performance, and tobacco use by youth. The Parent Interview is primarily an interview about the child(ren), not the parent. In both waves, almost all youth had a parent or guardian complete the Parent Interview (over 99.5 percent). When multiple youth from the same household were selected to be in the study, the parent(s) completed separate interviews about each youth. If one parent completed multiple interviews, questions about that parent were asked only once. Those questions were skipped in the other interview(s) and the responses duplicated for the other child(ren).
A $2 incentive was included with the mailed Wave 1 screener. For both Wave 1 and Wave 2, adult respondents were paid $35 for their participation. Youth were paid $25 to complete the Youth Interview, and their parents were given $10 for each Parent Interview.
Sample: A four-stage stratified area probability sample design was used in the PATH Study, with a two-phase design for sampling adults at the final stage. At the first stage, a stratified sample of geographical primary sampling units (PSUs) was selected, in which a PSU is a county or group of counties. For the second stage, within each selected PSU, smaller geographical segments were formed and then a sample of these segments was drawn. At the third stage, the sampling frame consisted of the residential addresses located in these segments. The fourth stage selected adults and youth from the sampled households identified at these addresses, with varying sampling rates for adults by age, race, and tobacco use status. Adults were sampled in two phases - Phase 1 sampling used information provided in the household screener and Phase 2 sampling used information provided by the adult in the Phase 2 screener at the beginning of the adult instrument. Please consult the User Guide for additional details about the sampling.
Each data file contains weights for use in analyses of the data from the complex PATH Study sample design.
The final full-sample person-level weight for each wave of the Adult file is R0#_A_PWGT, and the final full-sample person-level weight for each wave of the Youth / Parent file is R0#_Y_PWGT. There are also 100 replicate weights and design variables (VARPSU and VARSTRAT) for use in variance estimation. Detailed information on how these variables were created, and how and why they should be used is provided in the User Guide. One important note is that the weighting procedures adjust for oversampling of certain population groups and non-response.
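To illustrate how a full-sample weight and a set of replicate weights are typically combined, the sketch below computes a weighted mean and a replicate-based variance. It is not the User Guide's procedure: the generic balanced repeated replication formula with an optional Fay coefficient is an assumption, the toy data use two replicates rather than 100, and the function and parameter names are hypothetical. The correct method and coefficient should be taken from the User Guide.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values using person-level weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def replicate_variance(values, full_weights, replicate_weights, fay=0.0):
    """Replicate-weight variance of a weighted mean.

    var = sum_r (theta_r - theta)^2 / (R * (1 - fay)^2)

    fay=0.0 gives plain balanced repeated replication. Both the method
    and the coefficient are assumptions in this sketch.
    """
    theta = weighted_mean(values, full_weights)
    reps = [weighted_mean(values, w) for w in replicate_weights]
    scale = len(reps) * (1.0 - fay) ** 2
    return sum((t - theta) ** 2 for t in reps) / scale


# Hypothetical toy data: 1 = current user, 0 = not a current user.
values = [1, 0, 1, 0]
full_weights = [100, 200, 150, 50]          # stands in for R0#_A_PWGT
replicate_weights = [                       # two toy replicates, not 100
    [150, 100, 150, 100],
    [50, 300, 150, 0],
]

estimate = weighted_mean(values, full_weights)
variance = replicate_variance(values, full_weights, replicate_weights)
std_error = math.sqrt(variance)

print(f"estimate={estimate:.3f} se={std_error:.3f}")
```

Re-estimating the statistic once per replicate weight captures the effect of the complex design on sampling variability, which a simple-random-sample formula would miss.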
In both waves, adults and youth were asked about seven main types of tobacco products: cigarettes, e-cigarettes, cigars (traditional, cigarillos, filtered), pipes, hookah, smokeless tobacco (snus pouches and other forms of smokeless tobacco), and dissolvable tobacco. Bidis and kreteks were additional types asked about on the Youth Interview, but were not asked about on the Adult Interview. Although each tobacco product section has some unique questions, the majority of the questions fit into one of the following categories:
- Ever use
- Recency of use
- Frequency of use
- Amount of use
- Brands used
- Purchase details
Both files contain additional topics which include:
- Nicotine dependence
- Packaging and health warnings
- Risk and harm perceptions
- Secondhand smoke exposure
- Marketing and advertising
- Media use
- Psychosocial and mental health
- Substance use
- Peer and family influences
Most questions asked in the questionnaires are categorical. Other questions ask, for example, the age at which something occurred or the person's body measurements. The responses to these are typically numerical.
Response Rates:
- Household screener: 54.1 percent (unweighted); 54.0 percent (weighted)
- Wave 1 Adult Interview: 74.8 percent (unweighted); 74.0 percent (weighted)
- Wave 1 Youth Interview: 78.2 percent (unweighted); 78.4 percent (weighted)
Please consult the User Guide for information regarding the response rates for data collected during Wave 2.
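The household screener and Wave 1 interview rates listed above describe successive stages. One common convention, assumed here and not stated in this description, approximates a cumulative response rate by multiplying the stage rates. The products computed below are illustrative and are not figures reported by the study; the User Guide may define overall response rates differently.

```python
# Wave 1 stage-level response rates quoted above, as proportions.
screener = {"unweighted": 0.541, "weighted": 0.540}
adult = {"unweighted": 0.748, "weighted": 0.740}
youth = {"unweighted": 0.782, "weighted": 0.784}


def cumulative_rate(stage_rates):
    """Multiply stage-level rates (assumed convention for an overall rate)."""
    product = 1.0
    for rate in stage_rates:
        product *= rate
    return product


cumulative = {
    kind: {
        "adult": cumulative_rate([screener[kind], adult[kind]]),
        "youth": cumulative_rate([screener[kind], youth[kind]]),
    }
    for kind in ("unweighted", "weighted")
}

for kind, rates in cumulative.items():
    print(f"{kind}: adult {rates['adult']:.1%}, youth {rates['youth']:.1%}")
```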
- Checked for undocumented or out-of-range codes.
Original ICPSR Release: 2015-12-19
- 2017-03-15 Data from Wave 2 of the study were added to the collection. The User Guide and Master Tobacco Brand and Product Code Guide were expanded to include information for Wave 2.
- 2017-01-31 The variable R01X_CB_REGION in both the Adult and Youth/Parent files was updated to correct an error in the value labels. The values for codes 2 and 3 had been inadvertently swapped. The data did not change; only the value labels for codes 2 and 3 have been corrected.
- 2016-11-28 An additional 40 derived variables were added to the end of the Youth / Parent file that are similar to those already in the Adult file. Information for individuals who withdrew from the study is denoted in the datasets by the special missing value -97777. Spanish versions of the annotated instruments are also now available.
- 2016-05-24 The study's title changed with the removal of the year range. The Informed Consent Document and Non-Response Bias Analysis Report were changed from being study level files to being a part of the Wave 1 (DS1001 and DS1002) specific documentation.
- 2016-04-22 An additional documentation file (Non-response_Report) was added to the collection.
- 2016-04-20 Updated the file names only for the two study level documentation files (Informed_Consent and Tobacco_Brand) so that each file was easily identified and distinguishable from the other. No change was made to the content in either file.
- 2016-04-18 Coding was updated for the sexual attraction variables. The questionnaires were revised to enhance the clarity of the ASK statements. The PDF codebooks now contain full question text from the questionnaires. Lastly, the PATH Study Master Tobacco Brand and Product Code Guide and an Informed Consent Document were also released.
- 2016-01-13 PDF codebooks were released without question text. The codebooks will be updated in the near future to include question text.