PreK-3rd Data Resource Center: The First Six Years of Schooling and Beyond

Panel Study of Income Dynamics, Child Development Supplement Resource Guide

Introduction

About the Guide

This resource guide provides a brief overview of the Panel Study of Income Dynamics, Child Development Supplement (PSID-CDS), and specific instructions for creating an extract dataset which you can download to your own computer. It also provides guidance in obtaining access to additional restricted-use data. This document draws extensively on the official PSID documentation. For complete information about the study, users can refer to the PSID Web site and the PSID-CDS Web site.

About the Data

The PSID-CDS, an extension of the core PSID, follows PSID households that had a child under the age of 13 in 1997 to collect information about the children's developmental outcomes in the context of family, community, and school environments. The core PSID surveyed a panel of U.S. families annually between 1968 and 1997 and has surveyed them every two years since 1999. Over 8,000 families were interviewed in the most recent wave in 2005.

The PSID is sponsored by the National Science Foundation, the National Institute on Aging, the National Institute of Child Health and Human Development, the Economic Research Service of the U.S. Department of Agriculture, the Center on Philanthropy at Indiana University, the Office of the Assistant Secretary for Planning and Evaluation, and the U.S. Department of Housing and Urban Development. Content is broad, including employment, income, housing/neighborhood characteristics, program participation, health, health behaviors, expenditures, wealth, marriage and fertility history, demographic outcomes, and more.

In 1997, the PSID-CDS began collecting supplemental developmental information about a subsample of children in PSID households. To date, the CDS has collected data on these children at two points in time, 1997 (CDS-I) and 2002 (CDS-II), with a third wave of data collection taking place in 2007-2008 (CDS-III). Data related to child development is collected through multiple instruments, including in-home interviews with the child and family members, telephone interviews with the primary and secondary caregivers, achievement tests, height and weight assessments, time diaries, interviews with teachers, and curriculum and school administration information.

Restricted access to geo-coded data is also available through special contractual arrangements. These data provide the census geographic locations of households, as well as school identifiers that allow PSID-CDS data to be linked to NCES Common Core of Data and Private School Survey data. For details, contact PSID staff.

Acknowledgements

This resource guide was prepared by Donald J. Hernandez, Department of Sociology, University at Albany, State University of New York. It was developed for the PreK-3rd Data Resource Center: The First Six Years of Schooling and Beyond, a Web site hosted by ICPSR with support from the Foundation for Child Development. The help that PSID staff provided during the preparation of this user's guide is gratefully acknowledged.

Sample

The PSID core families were selected as a nationally representative sample in 1968, with an oversample of low-income households. Core families were surveyed annually between 1968-1997 and then every two years beginning in 1999. Grown children from original PSID families are followed as they split off from their original family units and establish independent households, making it possible to study multigenerational family influences. In 1997 and 1999, the PSID sample was "freshened" to introduce a sample of recent immigrant families. In the same year the sample was trimmed to limit the number of surveyed households. As of 2007, the sample consisted of 8,500 families.

The initial wave of data collection for the CDS took place in 1997, at which time as many as two children ages 0-12 were randomly selected from each of 2,705 PSID families. Of those selected, 2,394 families (88%) were interviewed successfully, yielding data for 3,563 children. For the second round of data collection in 2002 and 2003, 84% (2,019 families) of the 1997 sample were re-contacted, providing data for 2,907 children ages 10-18.
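As a quick check on the figures above, the short Python sketch below recomputes the family-level rates from the counts quoted in the paragraph. The variable and function names are illustrative only, and the children-per-family figure is a derived value that the guide itself does not report.

```python
# Counts quoted in the paragraph above.
selected_families_1997 = 2705
interviewed_families_1997 = 2394
recontacted_families_2002 = 2019
children_1997 = 3563


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of selected families interviewed in CDS-I.
family_response_1997 = pct(interviewed_families_1997, selected_families_1997)

# Share of CDS-I families re-contacted for CDS-II.
family_recontact_2002 = pct(recontacted_families_2002, interviewed_families_1997)

# Derived figure (not reported in the guide): average children per CDS-I family.
children_per_family_1997 = round(children_1997 / interviewed_families_1997, 2)

print(family_response_1997, family_recontact_2002, children_per_family_1997)
```

The two percentages agree with the 88% and 84% quoted above to within rounding.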

A successful primary caregiver interview is required for a child to be counted as included in the study, although it should be noted that response rates for the various data collection instruments differ. (See Table 1 for continuation rates between CDS-I and CDS-II.)

Table 1. Sample Continuation Rates between CDS-I and CDS-II

Instrument | Number completed in CDS-II | Response rate*
Primary caregiver household interview | 2,891 | 91%
Assessments | 2,644 | 91%
Child interview (8+ years) | 2,182 | 91%
Time diary | 2,569 | 88%
Teacher interview | 699 | 54%

* Response rate is calculated from the total number of eligible children in CDS-II.

Additional information about sample composition can be found in the CDS User's Guides for CDS-I and CDS-II.

An adolescent module was incorporated in the study during CDS-II and CDS-III for members of the sample ages 12 and older. After CDS children turn age 18, they are interviewed in a study called "Transition to Adulthood" that focuses on the unique developmental changes which occur when teenagers transition to independent adulthood. After establishing an independent household, they are interviewed as a separate PSID family unit, and, because they are PSID sample members, they will be followed throughout adulthood.

Data Elements

The CDS gathers a broad array of measures on developmental outcomes across the domains of health, psychological well-being, social relationships, cognitive development, achievement motivation, and education as well as a number of measures of the family, neighborhood, and school environments in which sample members live and learn. The breadth and depth of measurement offers a substantively rich resource to study development of children and teens from infancy/early childhood through middle childhood and adolescence.

In addition to the survey measures described above, the CDS collects time diaries from the sample children 3-18 years of age. These diaries provide a basic foundation for understanding how children across ages, ethnic groups, and socioeconomic status engage in a range of activities and social circumstances. They also offer an excellent opportunity to investigate research questions that examine relationships among time spent in various activities, aspects of the family environment, and outcomes related to achievement, social and behavioral development, and health.

Because the CDS is a supplement to the PSID, the study takes advantage of extensive family demographic and economic data about the sample children's family--not only parents, but also grandparents, siblings, cousins, and other relatives--providing more extensive family data than any other nationally-representative longitudinal survey of children and youth in the U.S. This rich data structure allows analysts a unique opportunity to fully link information on children, their parents, their grandparents, and other relatives to take advantage of intergenerational and long-panel dimensions of the data.

In order to create a comprehensive picture of the developmental trajectory of each child in the CDS sample, data are collected from a variety of sources. Primary variable domains collected in the CDS are listed in Table 2, along with examples of specific items in each domain.

The primary caregiver is the respondent of record for each child observation and is the "anchor" for all other data modules collected on the child. The primary caregiver is the adult who takes primary responsibility for the child and cannot be a paid caretaker such as a nanny or babysitter. Children ages eight and older in 1997 were interviewed directly; for children under eight years of age, data was collected from the primary caregiver.

Table 2 (below) presents general categories of data collected directly for the CDS. But the user can also access thousands of additional variables pertaining to the child, and her or his family and neighborhood by linking CDS data to the core PSID using each family's and child's unique identifier. Indeed, variables that capture the demographic and social characteristics necessary for many research projects require extraction of PSID core variables. Instructions for doing so are described in the Create Extract File section.

Table 2. Child Development Supplement: Measurement Domains, Descriptions, and Data Collection Modules

Domain | Description | Module
Health Status & Behaviors* | General health status, chronic conditions, obesity, limitations, health care utilization, health-related expenditures, nutrition, exercise, sleep, smoking | PCG; Child
Psychological & Social Well-Being* | Behavior problems, depression, self-esteem, worry, social well-being; risky behaviors, thrill seeking, anti-social behaviors; drug and alcohol abuse/dependence | PCG; Child
Family Environment* | HOME SF cognitive & emotional stimulation; parental warmth; household tasks; involvement, closeness, conflict w/ father & mother | PCG; OCG
Parental Monitoring* | Caregivers' knowledge of the child's whereabouts, activities, and associations; child disclosure of activities | PCG; OCG; Child
Child Care | Type, frequency of use, and costs of arrangements for children up to Kindergarten | PCG
Education | Parental expectations; enrollment; type of school; tuition; attendance; federal lunch & breakfast programs; attended special class/school for gifted students; classified as needing special education; repeated grade; dropped out | PCG
Achievement | Woodcock-Johnson tests of achievement; course grades; WISC Digit Span short-term memory; ability self-concepts in reading and math | Child
Time Use | Stylized questions about structured and unstructured activities, activities with parents, extracurricular activities, part-time jobs; Time Diary measures of type, number, duration, and location of weekday and weekend activities | PCG; Child
Religiosity* | Comfort, importance of religious affiliation or spirituality | Child
Future Work & Schooling Expectations* | Achieved occupational certainty and identity; job values, career orientation and expectations for future work and schooling; negative economic expectations | Child
Sibling Relationships | Type and frequency of cooperation with, kindness towards, and helping behaviors towards siblings | PCG
Caregiver Social and Psychological Resources | Rosenberg self-esteem; Pearlin self-efficacy; K-6 non-specific psychological distress; social support; parenting attitudes; aggravation in parenting; gender role beliefs; family conflict; economic strain; work schedules; community involvement | PCG; OCG
Absent Parents | Frequency/types of activities absent parents are involved with their children; conflict between resident and absent parent | PCG
Spending & Savings* | Variety of expenses for household members, non-household members, and absent parents; savings mechanisms for child | PCG
School Environment | School type; racial/ethnic composition; pupil-teacher ratio; completion rates; expenditures per child; other school resources; climate; curriculum tracks; science/math courses | CCD**; Curriculum Catalogs*

*New domain added or expanded for CDS-II and CDS-III.

**CCD=Links to Common Core of Data, which replaced school administrator interview in CDS-I.

PCG is the Primary Caregiver Child Interview.

OCG is a Secondary Caregiver Child Interview.

Create Extract File

Researchers interested in extracting all or a portion of the CDS can do so through a data extraction application called the Data Center located on the PSID Web site. The Data Center allows the user to download both PSID core data and CDS data directly to the user's computer. A detailed view is provided on the PSID Web site. A series of tutorials has been developed to facilitate use of the CDS and PSID data. Two of the tutorials that have been developed are specific to CDS. Tutorial #4 includes information about creating a customized subset from a selection of CDS data modules. Tutorials #5A and #5B provide instruction for creating customized files and conducting intergenerational analysis using both PSID and CDS individuals, with applications in the areas of home ownership and health outcomes.

The first step in extracting data is to access the following PSID Web site.

On the left-hand side of this page is a direct link to the Data Center. On the right-hand side is a link labeled "CDS Data and Documentation". First-time users should follow the "CDS Data and Documentation" link. The page that opens contains links to several pages that are helpful for learning more about the Child Development Supplement. Click on "CDS Documentation" to locate user's guides for Waves I and II as well as information about weights, file structure, and selected individual and summary variables. The link "Questionnaires" opens a page to access questionnaires for each wave of the study. Other links on this page connect the user to tutorials on the CDS, FAQs, and information on CDS project directors.

To access the CDS Data Center, select the "Data Center" link on the page described above. This will bring you to the Data Center's main page, titled "Welcome to the Data Center!"

Step 1: Introduction to the PSID-CDS Data Center

The Data Center is a platform from which users can extract a subset of the core PSID dataset, the CDS dataset, or portions of both. On the main page four options are shown for reviewing available data elements and selecting them for extraction. Here is a summary of those four options:

By File: Users may view available data organized by collection instrument by selecting "By File". This will open a "drill-down interface" that the user can manipulate to view available data categories and also to narrow down to the desired variable or group of variables. See below for more information on "By File" searching and extraction.

By Index: Users interested in viewing available data by topic or by its availability across waves should select "By Index". This selection opens a vertical dropdown list of data topics and a horizontal list of data collection years. For most CDS variables, data is available in only one or two of the collection years (1997 and 2002); however, users interested in incorporating data from the core PSID will benefit from this capacity to compare availability over multiple data collection years. See below for more information on "By Index" searching and extraction.

By Search: This option allows the user to search the codebook for variables using a keyword. The keyword entered is searched for within the question text, variable label, and variable name fields of the codebook; users can choose which fields to search and the type of search. The search will return a list of all references to the keyword located in the datasets and fields specified by the user. See below for more information on "By Search" searching and extraction.

By Cart: Users may select pre-defined "carts" or catalogs of variables that have been created at an earlier session or by another user. In order to select data carts, the user will need to create an account and log in to the Data Center.

By more than one approach: Users can begin with any of these four methods and then switch methods while shopping. The content of the user's cart will not be lost. To view the variables selected, simply click on the data cart.

Step 2: Identifying Relevant Data Elements

As indicated above, the process of searching for the desired group of variables and extracting them to your personal computer begins by selecting one of the four search options. Which method to use is a matter of personal preference and of how much the user already knows about the desired variables. Instructions for each method follow:

By File: From the Data Center main page, click on "By File". This will bring you to a data "tree" that is used for narrowing the complete list of variables available in the PSID and CDS to identify specific data elements. Begin by choosing one of the primary data sources: PSID Family-level, PSID Individual-level, CDS (including time aggregates), or CDS Time Diaries. This example assumes that the user has selected "CDS (including time aggregates)".

Clicking on the "+" sign to the left of this selection will open a list of 18 broad data categories corresponding to the CDS collection instruments plus an entry for demographic variables collected by the PSID. Clicking on the "+" sign to the left of one of these categories will allow the user to select data from either the first (1997) or second (2002) wave of CDS data. Next to each of these options is the number of observations and variables contained in each of these smaller datasets. For the purpose of this example, select "Primary Caregiver Child File" and then "2002". A small window will open listing the 919 variables located in the 2002 Primary Caregiver Child File. In this list the variable name and a short description are provided.

Additional information about each variable can be easily obtained by double-clicking on the variable name. A window will open that displays the variable name and description, question wording, the range of valid values, and the number and percent of individuals responding with each of those values. In addition, the years that the variable is available and the variable name in each of those years are listed. You can print the codebook section for that variable by clicking "Print" at the top of the window.

To add one or more variables to your cart, highlight the variable and click "Add to cart". Multiple variables can be highlighted and added together by holding down the "Shift" key or the "CTRL" key while clicking variable names.

By Index: From the Data Center main page, click on "By Index". The user then selects Individual Data Index, Family Data Index, or CDS Data Index. This opens a data "tree" with a list of topical variable groupings. Clicking on the "+" to the left of each category opens either a list of related variables or additional sub-categories within which variables can be found. To locate the appropriate variables you will need to "drill down", opening each menu until the variable level is available.

Individual variables can be identified from variable categories by a purple box that appears next to the variable description. In addition, there is a long row of boxes corresponding to PSID data collection years. Boxes shaded white indicate the variable is available for that year. Boxes shaded gray indicate information for the variable was not collected for that year. To select a variable for extraction, click the adjacent box, then select "Add to Cart" at the top of the page. Multiple variables may be selected before adding them to the cart.

To obtain codebook information about a particular variable, click on the purple box next to the variable description. This displays the variable name, a short description, the years for which the variable is available, the range of valid values, and frequency distributions.

By Search: Users interested in locating all variables associated with a particular topic or concept can search for these variables using the "By Search" option. From the main Data Center page, click on "By Search". You will be brought to a screen with several options for limiting the search. On the left-hand side of the search box is the Data File Type box. The user must select either PSID family-level, PSID individual-level, CDS (including time diary aggregates), or CDS Time Diaries.

For the purpose of this example, select "CDS (including time diary aggregates)". Making this selection narrows the available options in the next box, "Data Year", to 1997 and 2002, the two collection points for CDS data. If choosing to search the core PSID datasets, the full list of years between 1968 and 2005 will appear in this box. The next section of the search box, "Codebook Part", allows the user to search "Question or Explanation Text", "Variable Label", "Variable Name", or "All of the above". New users should select "All of the above" in order to capture the broadest range of relevant variables. Next, the user chooses the radio button associated with an "Any words", "All words", or "Phrase" search. For this example, select "All words", which returns variables that match every keyword entered. Finally, enter a keyword to be used in the search and click the "Search Codebooks" tab. It is generally better to begin with a general term and then move to more specific terms to ensure that all relevant variables are captured.
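To make the difference between the three search types concrete, here is a minimal Python sketch of how "Any words", "All words", and "Phrase" matching could behave over codebook fields. It is an illustration under stated assumptions, not the Data Center's actual implementation: the mode names, field names, variable names (VAR001 and so on), and codebook entries are all invented, matching is assumed to be case-insensitive substring matching, and each field is assumed to be checked separately.

```python
def matches(text, query, mode):
    """Return True if `text` satisfies `query` under the given mode.

    mode is 'any', 'all', or 'phrase' (hypothetical names for the three
    radio buttons). Matching is case-insensitive substring matching.
    """
    haystack = text.lower()
    words = query.lower().split()
    if mode == "any":
        return any(word in haystack for word in words)
    if mode == "all":
        return all(word in haystack for word in words)
    if mode == "phrase":
        return " ".join(words) in haystack
    raise ValueError(f"unknown mode: {mode}")


def search_codebook(entries, query, mode, fields=("question", "label", "name")):
    """Return the names of entries with at least one field matching the query."""
    return [
        entry["name"]
        for entry in entries
        if any(matches(entry.get(field, ""), query, mode) for field in fields)
    ]


# Invented codebook entries, for illustration only.
ENTRIES = [
    {"name": "VAR001", "label": "HOURS OF SLEEP ON SCHOOL NIGHT",
     "question": "How many hours does the child usually sleep?"},
    {"name": "VAR002", "label": "SCHOOL TYPE",
     "question": "Is the school public or private?"},
    {"name": "VAR003", "label": "READING AT HOME",
     "question": "How often do you read to the child at night?"},
    {"name": "VAR004", "label": "NIGHT SCHOOL ATTENDANCE",
     "question": "Did the caregiver attend classes in the evening?"},
]

any_hits = search_codebook(ENTRIES, "school night", "any")
all_hits = search_codebook(ENTRIES, "school night", "all")
phrase_hits = search_codebook(ENTRIES, "school night", "phrase")
print(any_hits, all_hits, phrase_hits)
```

In this sketch each mode returns a subset of the one before it, so switching from "Any words" to "All words" to "Phrase" progressively narrows the result list.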

Results from your search will appear in a table below the search box. The top of the table reports how many matches were found for the keyword entered. The table of results has six columns. The first three columns report the primary and secondary data files in which each variable is located and the data collection year for the variable. The final three columns contain the variable name, a short label, and a "Select" box that is used to add each variable to your data cart. To add one or more variables check the box to the right of the variable and then click "Add to Cart" at the top of the table, or you can choose to click "Check All" to extract all variables in the table.

To obtain additional information about each of the displayed variables, click on the purple box next to each variable name. The window that opens displays the variable name, a short description, the years for which the variable is available and the corresponding variable name in each of those years, the range of valid values, and the number and percentage of observations responding with each of those values.

By Cart: The "By Cart" option allows users to retrieve lists of variables that they saved in an earlier session (including carts they marked "private") or that another user created and marked "public". To access a data cart that you saved in an earlier session, log in to the Data Center and then click "By Cart" as your search option. A list of carts and the dates they were saved is presented. To access a cart created by another user, enter that user's email address into the space provided. Any carts that the user has made public will be listed.

Step 3: Extracting CDS Variables to your Personal Computer

To view the cart of variables selected and begin the extraction process, click on the bolded words "your cart" on the left side of the screen. A data "tree" presents the selected variables as well as the identifier variables required for each data download. Note that for all downloads two summary variables, a unique family ID and a unique person ID, are included. Also, notice that when downloading data from the 2002 CDS, three identifier variables are listed under the year 2001 (1996 will appear when 1997 variables are downloaded). This is because the identifier and demographic information for the 2002 sample was drawn from 2001 collection information, thus the identifiers are attached to the 2001 PSID dataset. Following these identifiers, you will see the list of variables you selected for extraction.

Three tabs at the top of the page allow the user to delete specific variables in the cart, empty the cart of all variables, or proceed to checkout. After the list is finalized, select "Checkout". The next screen will ask you to log in or register. After registering, new users will need to select the "Data Cart" option, located by placing the cursor on the words "Data Center" at the top of the screen. Subsequent screens provide several options for the download process.

Begin by specifying whether the codebook should be included along with the data download (recommended) and the format of the codebook (HTML, PDF, or XML). Users can also request the codebook only, without data. In the next box, select the data output type. You can choose to receive the data in ASCII format with statements for SAS, SPSS, or Stata programs. Data can also be downloaded as Excel spreadsheets, dBase files, or SAS V9 Transport data files. The "Subsetting criteria" box is available for users to limit the sample of extracted observations by writing filter statements based on downloaded variables. Users can also choose to extract records for all individuals, only for CDS children, or only for CDS Primary Caregivers. Finally, users have the option to receive the data file in compressed format. After making selections for each option, click "Submit".
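The same kind of subsetting can also be applied after download. The Python sketch below shows the general idea, assuming the extract has been saved as comma-separated text (for example, exported from the Excel download). The column names (FAMILY_ID, PERSON_ID, CHILD_AGE, IS_CDS_CHILD) and all values are invented for illustration; real extracts use the variable names shown in the codebook, and this code does not reproduce the syntax of the Data Center's "Subsetting criteria" box.

```python
import csv
import io

# Invented stand-in for a downloaded extract saved as comma-separated text.
SAMPLE_EXTRACT = """FAMILY_ID,PERSON_ID,CHILD_AGE,IS_CDS_CHILD
1001,30,10,1
1001,31,14,1
1002,4,41,0
1003,32,12,1
"""


def load_rows(text):
    """Parse comma-separated text into a list of dicts with integer values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: int(value) for key, value in row.items()} for row in reader]


def subset(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


rows = load_rows(SAMPLE_EXTRACT)

# Keep CDS children only, then restrict to ages 10 through 12.
cds_children = subset(rows, lambda r: r["IS_CDS_CHILD"] == 1)
ages_10_to_12 = subset(cds_children, lambda r: 10 <= r["CHILD_AGE"] <= 12)

for row in ages_10_to_12:
    print(row["FAMILY_ID"], row["PERSON_ID"], row["CHILD_AGE"])
```

Filtering at download time keeps files smaller, while filtering locally makes it easier to revise the criteria later without repeating the extraction.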

On the next page, the Data Center displays links to the codebook sections requested and the available data files. Click on these files to open them and then save them to your personal computer, or right-click on a file and choose "Save Target As..." to save it directly. Links are also available on this page for PSID-recommended downloads of complete versions of the PSID and CDS codebooks and other documentation.
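Because every download carries a family identifier and a person identifier, separately saved extracts can be combined on that pair of keys. The following Python sketch illustrates the idea with an inner join. The column names (FAMILY_ID, PERSON_ID, CDS_SCORE, CORE_VAR) and the records are hypothetical placeholders, not actual PSID or CDS variable names; substitute the identifier variables listed in your own codebook.

```python
# Invented records standing in for a CDS extract and a core PSID extract.
cds_rows = [
    {"FAMILY_ID": 1001, "PERSON_ID": 30, "CDS_SCORE": 104},
    {"FAMILY_ID": 1001, "PERSON_ID": 31, "CDS_SCORE": 98},
    {"FAMILY_ID": 1003, "PERSON_ID": 32, "CDS_SCORE": 111},
]
core_rows = [
    {"FAMILY_ID": 1001, "PERSON_ID": 30, "CORE_VAR": 5},
    {"FAMILY_ID": 1001, "PERSON_ID": 31, "CORE_VAR": 5},
    {"FAMILY_ID": 1002, "PERSON_ID": 4, "CORE_VAR": 2},
]


def link(cds, core, keys=("FAMILY_ID", "PERSON_ID")):
    """Inner-join two lists of records on the given identifier columns."""
    # Index the core records by their identifier tuple for fast lookup.
    core_index = {tuple(row[k] for k in keys): row for row in core}
    linked = []
    for row in cds:
        match = core_index.get(tuple(row[k] for k in keys))
        if match is not None:
            # Combine the columns of both records into one dict.
            linked.append({**match, **row})
    return linked


linked_rows = link(cds_rows, core_rows)
for row in linked_rows:
    print(row)
```

An inner join drops children with no matching core record (person 32 in family 1003 here); keeping unmatched rows instead would be a reasonable alternative when checking for linkage gaps.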

Learn More

Additional Data for Siblings, Parents, Grandparents, and other Family Members

As noted above, many CDS children were in families where two children were selected for the sample. In addition, information has been collected regarding PSID family members since 1968. Therefore, for any specific child, the CDS data set may include detailed information for the "CDS sibling", and the core PSID dataset may include extensive information about parents and grandparents.

A tool on the PSID Web site, the Family Identification and Mapping System (FIMS), can be used to create a file of identification numbers linking respondents to their parents, grandparents, and siblings. With these IDs, users can extract other relevant data from the PSID core files or CDS files for these family members. A tutorial is available to help users who are interested in using this tool. This information can be found on the following Web sites: PSID Data Download Site and PSID Tutorial #6: Learn how to create and analyze intergenerational data using the PSID Data Center and SAS.
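The kind of linkage such an identification file makes possible can be sketched in a few lines of Python. This is only an illustration of the concept: the person-to-parents layout, the ID values, and the function names below are assumptions made for the example and do not describe the actual FIMS output format.

```python
# Hypothetical map: person ID -> (mother ID, father ID).
# None marks a parent who does not appear in the file.
PARENT_MAP = {
    "C1": ("M1", "F1"),
    "C2": ("M1", "F1"),
    "M1": ("GM1", "GF1"),
    "F1": (None, "GF2"),
}


def parents(person_id, parent_map):
    """Return the known parents of a person."""
    return [p for p in parent_map.get(person_id, ()) if p is not None]


def grandparents(person_id, parent_map):
    """Return the known parents of each known parent."""
    return [g for p in parents(person_id, parent_map)
            for g in parents(p, parent_map)]


def siblings(person_id, parent_map):
    """Return other people who share at least one known parent."""
    own = set(parents(person_id, parent_map))
    return sorted(
        other for other in parent_map
        if other != person_id and own & set(parents(other, parent_map))
    )


print(parents("C1", PARENT_MAP))
print(grandparents("C1", PARENT_MAP))
print(siblings("C1", PARENT_MAP))
```

The IDs returned by functions like these could then be used as lookup keys when extracting core PSID or CDS variables for each relative.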

For detailed answers to specific questions regarding PSID, users can email PSID Help.