1987 -- 1991 (6 Month Interval Interviews [Spring and Fall])
1991 -- 2001 (Once yearly interviews)
2006 -- 2007 (Data currently not available)
2009 -- 2010 (Data currently not available)
1987 (Spring--Screening [Cohort 1])
1987 (Fall--Phase A [Cohort 1])
1988 (Spring--Phase B [Cohort 1], Screening [Cohort 2])
1988 (Fall--Phase C [Cohort 1], Phase A [Cohort 2])
1989 (Spring--Phase D [Cohort 1], Phase B [Cohort 2])
1989 (Fall--Phase E [Cohort 1], Phase C [Cohort 2])
1990 (Spring--Phase F [Cohort 1], Phase D [Cohort 2])
1990 (Fall--Phase G [Cohort 1], Phase E [Cohort 2])
1991 (Spring--Phase H [Cohort 1], Phase F [Cohort 2])
1991 (Fall--Phase G [Cohort 2])
1992 (Spring--Phase J [Cohort 1], Phase H [Cohort 2])
1993 (Spring--Phase L [Cohort 1], Phase J [Cohort 2])
1994 (Spring--Phase N [Cohort 1], Phase L [Cohort 2])
1995 (Spring--Phase P [Cohort 1], Phase N [Cohort 2])
1996 (Spring--Phase R [Cohort 1], Phase P [Cohort 2])
1997 (Spring--Phase T [Cohort 1], Phase R [Cohort 2])
1998 (Spring--Phase V [Cohort 1], Phase T [Cohort 2])
1999 (Spring--Phase Y [Cohort 1], Phase V [Cohort 2])
2000 (Spring--Phase AA [Cohort 1], Phase Y [Cohort 2])
2001 (Spring--Phase AA [Cohort 2])
2006 -- 2007 (Fall--Phase CC [Cohort 1 & Cohort 2])
2009 -- 2010 (Fall--Phase DD [Cohort 1 & Cohort 2])
Most variables in the PYS data collection match the questions and labeling information presented in the original survey and interview booklets provided by the research team. However, users should be aware that there are some minor discrepancies between the raw data available as part of this collection and the original survey and interview booklets.
A limited number of questions that appear in the original booklets are not represented by corresponding variables in the data. Variables were dropped from the data collection for various reasons including confidentiality concerns, the presence of duplicate questions in the original questionnaire, and missing unit information that made particular questions/variables meaningless. Additionally, some free-response questions were never coded or entered into the data files.
Variable labels were written with the intention of conveying exactly what is measured by that individual variable, which in some cases requires including some of the text of other questions. For some variables, the answer codes printed in the questionnaire were inadequate for representing the responses given, so additional codes were added in the data file.
When working with the PYS data, please use Cohort 2 booklets unless the question is labeled "Cohort 1 only." Approximately two-thirds of the participants in PYS are part of Cohort 2. Therefore, the PYS standard is to use Cohort 2 variable names and data structure as much as possible. Consequently, Cohort 2 booklets match best with the available datasets.
Open-ended questions have been recoded to categorical variables to facilitate analyses and to protect respondent confidentiality. A detailed inventory of variable-level recodes is not available.
Single questions in which a respondent could circle or select more than one response are represented by multiple dichotomous variables in the data.
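As an illustration of working with such multiple-response items, the following minimal Python sketch collapses a set of dichotomous variables back into the list of options a respondent selected. The variable names (Q12_A, Q12_B, Q12_C), the option labels, and the coding of 1 = selected and 0 = not selected are assumptions made for this example only; the actual names and codes should be taken from the codebook for the dataset in question.

```python
# Hypothetical variable names and labels; the real ones come from the codebooks.
# Assumes 1 = selected and 0 = not selected, which should be verified per dataset.
OPTION_VARS = {"Q12_A": "parent", "Q12_B": "teacher", "Q12_C": "friend"}


def selected_options(record, option_vars=OPTION_VARS):
    """Return the labels of all options coded 1 for this respondent."""
    return [label for var, label in option_vars.items() if record.get(var) == 1]


def count_selected(record, option_vars=OPTION_VARS):
    """Return how many options the respondent selected."""
    return len(selected_options(record, option_vars))


# Two made-up respondents, used only to show the shape of the data.
respondents = [
    {"ID": 1, "Q12_A": 1, "Q12_B": 0, "Q12_C": 1},
    {"ID": 2, "Q12_A": 0, "Q12_B": 0, "Q12_C": 0},
]

for row in respondents:
    print(row["ID"], selected_options(row), count_selected(row))
```

A respondent with no options coded 1 yields an empty list, which may reflect either a true "none selected" answer or missing data, depending on how the item was coded.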
In continuous variables, the PYS research team and ICPSR retained all responses even if a value seems implausible. Users should review continuous variables for high and low outliers before including a variable in their analyses. Researchers may consider top- or bottom-coding responses or re-coding outliers to missing data codes.
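The two approaches mentioned above, top- or bottom-coding and recoding outliers to missing, can be sketched in Python as follows. This is an illustrative sketch only: the function names, the example values, and the plausibility bounds (0 to 24) are assumptions chosen for the example, and missing data are represented here by None rather than by any missing data code used in the PYS files.

```python
def top_bottom_code(values, lower, upper):
    """Clamp each non-missing value into the range [lower, upper]."""
    return [v if v is None else min(max(v, lower), upper) for v in values]


def recode_outliers_to_missing(values, lower, upper, missing_code=None):
    """Replace values outside [lower, upper] with a missing data code."""
    return [
        v if v is None or lower <= v <= upper else missing_code
        for v in values
    ]


# Made-up responses to a hypothetical "hours per day" item; None marks missing.
hours = [2, 5, 0, 40, 3, None, 168]

# The bounds 0 and 24 are an assumption chosen for this example.
capped = top_bottom_code(hours, lower=0, upper=24)
cleaned = recode_outliers_to_missing(hours, lower=0, upper=24)

print(capped)
print(cleaned)
```

Top-coding keeps every case in the analysis but compresses the tails, while recoding to missing drops implausible values entirely; which is appropriate depends on the variable and the analysis.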
In some datasets, the numbering and lettering of questions differ between the data and the documentation. Even where such differences exist, the text of the questions is largely identical in both, so these differences should not significantly impede users' attempts to match questions in the data with the documentation provided.
There may be differences between early versions of the questionnaire booklets and the data in the way answers are coded. Changes were made to coding in the data to make it consistent with questionnaires from subsequent phases.
Please see the readme file for a description of the public use documentation and the 636 datasets in this collection.
The following phases from the youngest sample are included in this collection:
- Screening (1987/1988)
- Phases A - H (1987/1988 - 1991/1992)
- Phase J (1992/1993)
- Phase L (1993/1994)
- Phase N (1994/1995)
- Phase P (1995/1996)
- Phase R (1996/1997)
- Phase T (1997/1998)
- Phase V (1998/1999)
- Phase Y (1999/2000)
- Phase AA (2000/2001)
The following phases from the youngest sample are not included in this collection:
- Phase CC (2006/2007)
- Phase DD (2009/2010)