Resource Guide
Project on Human Development in Chicago Neighborhoods
Longitudinal Cohort Study
The Longitudinal Cohort Study collected three waves of data over a period of seven years from a sample of children, adolescents, young adults, and their primary caregivers. Seven cohorts of respondents were randomly selected to study the changing circumstances of their lives and the personal characteristics that may lead them towards or away from a variety of antisocial behaviors. The age cohorts include birth (0), 3, 6, 9, 12, 15, and 18 years. Data were collected at three points in time: 1994-1997, 1997-1999, and 2000-2001. Numerous measures were administered to respondents to gauge various aspects of human development, including individual differences, as well as family, peer, and school influences.
Sampling
Neighborhood Clusters: The PHDCN Scientific Directors defined neighborhoods spatially, as a collection of people and institutions occupying a subsection of a larger community. The project collapsed 847 census tracts in the city of Chicago to form 343 neighborhood clusters (NCs). The predominant guideline in formation of the NCs was that they should be as ecologically meaningful as possible, composed of geographically contiguous census tracts, and internally homogeneous on key census indicators. The project settled on an ecological unit of about 8,000 people, which is smaller than the 77 established community areas in Chicago (of which the average size is almost 40,000 people), but large enough to approximate local neighborhoods. Geographic boundaries (e.g., railroad tracks, parks, and freeways) and knowledge of Chicago's neighborhoods guided this process.
As part of the longitudinal cohort study, 800-900 participants in each of seven age groups were sampled from households in 80 of the 343 NCs.
PHDCN used a three-stage sampling design. At the first stage, the 343 neighborhoods, containing all residents of Chicago, were cross-classified by two census-derived stratification variables: racial-ethnic mix (seven levels) and socioeconomic status (three levels). A stratified probability sample of 80 neighborhoods was selected for inclusion in the study. Next, block groups were selected at random within each of the sample neighborhoods, and a complete listing of dwelling units was collected for the sampled block groups. Finally, residents were contacted and the household composition was enumerated. Children within 6 months of birth or of ages 3, 6, 9, 12, 15, and 18 were selected for longitudinal study.
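The age-eligibility rule at the final sampling stage can be illustrated with a minimal sketch. It assumes age is expressed in years (negative for a child not yet born) and that the six-month window is inclusive at both ends; the function and constant names are hypothetical and do not come from the PHDCN documentation.

```python
from typing import Optional

# Target ages for the seven cohorts, as listed in the study design.
COHORT_AGES = (0, 3, 6, 9, 12, 15, 18)

# "Within 6 months" of the qualifying birthday, expressed in years.
# Treating the boundary as inclusive is an assumption of this sketch.
WINDOW_YEARS = 0.5


def assign_cohort(age_years: float) -> Optional[int]:
    """Return the cohort whose target age is within six months of age_years.

    Returns None when the age falls outside every cohort window.
    A negative age stands for a pregnancy (child not yet born).
    """
    for target in COHORT_AGES:
        if abs(age_years - target) <= WINDOW_YEARS:
            return target
    return None
```

For example, a child aged 3.4 years would fall in cohort 3, while a child aged 4.5 years would not qualify for any cohort under this rule.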
Response Rates
Screening
A stratified probability sample of 80 neighborhoods was selected from the 343 defined neighborhood clusters. Next, block groups were selected at random within each of the sample neighborhoods. A complete listing of dwelling units was collected for all sampled block groups. Pregnant women, children, and young adults in seven age cohorts (birth, 3, 6, 9, 12, 15, and 18 years) were identified through in-person screening of approximately 40,000 dwelling units within the 80 NCs. The screening response rate was 80 percent.
Wave 1
Children within six months of the birthday that qualified them for the sample were selected for inclusion in the Longitudinal Cohort Study. A total of 8,347 participants were identified through the screening. Of the eligible study participants, 6,228 were interviewed for an overall response rate of 75%.
Cohort | Screened Eligibles | Completes | Response Rate (%) |
---|---|---|---|
0 | 1,666 | 1,269 | 76.2 |
3 | 1,309 | 1,003 | 76.6 |
6 | 1,307 | 980 | 75 |
9 | 1,091 | 828 | 75.9 |
12 | 1,103 | 820 | 74.3 |
15 | 972 | 696 | 71.6 |
18 | 899 | 632 | 70.3 |
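The per-cohort and overall Wave 1 figures can be reproduced from the counts in the table. The sketch below assumes each rate is completes divided by screened eligibles, rounded to one decimal place; the variable and function names are illustrative only.

```python
# Wave 1 counts from the table: cohort -> (screened eligibles, completes).
wave1 = {
    0: (1666, 1269),
    3: (1309, 1003),
    6: (1307, 980),
    9: (1091, 828),
    12: (1103, 820),
    15: (972, 696),
    18: (899, 632),
}


def response_rate(eligible: int, completes: int) -> float:
    """Percentage of eligible participants who completed the interview."""
    return round(100 * completes / eligible, 1)


# Per-cohort rates, then totals across all seven cohorts.
rates = {cohort: response_rate(e, c) for cohort, (e, c) in wave1.items()}
total_eligible = sum(e for e, _ in wave1.values())
total_completes = sum(c for _, c in wave1.values())
overall_rate = response_rate(total_eligible, total_completes)
```

The totals come to 8,347 eligibles and 6,228 completes, and the overall rate of 74.6 percent matches the reported 75% after rounding.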
Wave 2
Of the 6,228 respondents from Wave 1, 16 were deceased by Wave 2, leaving 6,212 eligible for participation in the study. Of the eligible respondents, 5,338 participated in the Wave 2 interview for an overall response rate of 85.93%.
Cohort | Study Participant Rate (%) | Primary Caregivers Rate (%) | Total Subjects (minus deceased) |
---|---|---|---|
0 | 0 | 83.3 | 1257 |
3 | 87.5 | 88.3 | 1003 |
6 | 88 | 88.3 | 979 |
9 | 85.6 | 86.6 | 828 |
12 | 86.2 | 87.2 | 820 |
15 | 82.7 | 85.9 | 694 |
18 | 80.2 | 0 | 631 |
Wave 3
Of the 6,212 eligible participants for Wave 2, nine were deceased by Wave 3, leaving 6,203 eligible participants in the study. Of the eligible respondents, 4,850 participated in the Wave 3 interview for an overall response rate of 78.19%.
Cohort | Study Participant Rate (%) | Primary Caregivers Rate (%) | Total Subjects (minus deceased) |
---|---|---|---|
0 | 76 | 76.6 | 1254 |
3 | 80.5 | 81.3 | 1002 |
6 | 80.2 | 80.6 | 979 |
9 | 77.5 | 79 | 828 |
12 | 74.9 | 79.1 | 820 |
15 | 71.3 | 77 | 691 |
18 | 67.4 | 0 | 629 |
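The eligibility counts and overall response rates for Waves 2 and 3 follow from the Wave 1 completes, the reported deaths, and the interview counts. A short check, assuming the rates are interviews divided by surviving eligibles and rounded to two decimals (all names are illustrative):

```python
# Figures reported in the Wave 1, Wave 2, and Wave 3 summaries.
wave1_completes = 6228
deaths_before_wave2 = 16
wave2_interviews = 5338
deaths_before_wave3 = 9
wave3_interviews = 4850

# Eligible participants are the Wave 1 completes minus cumulative deaths.
eligible_wave2 = wave1_completes - deaths_before_wave2
eligible_wave3 = eligible_wave2 - deaths_before_wave3

rate_wave2 = round(100 * wave2_interviews / eligible_wave2, 2)
rate_wave3 = round(100 * wave3_interviews / eligible_wave3, 2)

# "Total Subjects (minus deceased)" columns, cohorts 0 through 18.
totals_wave2 = [1257, 1003, 979, 828, 820, 694, 631]
totals_wave3 = [1254, 1002, 979, 828, 820, 691, 629]
```

The per-cohort totals in each table sum to the same eligible counts (6,212 and 6,203), so the tables and the summary sentences are mutually consistent.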
Units of Analysis
General Range of Age for Each Cohort at Wave 1-3
Wave (data collection period) | Cohort 00 | Cohort 03 | Cohort 06 | Cohort 09 | Cohort 12 | Cohort 15 | Cohort 18 |
---|---|---|---|---|---|---|---|
Wave 1 (1/1995 - 6/1997) | prenatal-2 | 2-4 | 4-7 | 7-10 | 10-13 | 13-16 | 16-19 |
Wave 2 (2/1997 - 1/2000) | 0-4 | 3-7 | 6-11 | 9-13 | 12-17 | 15-19 | 18-23 |
Wave 3 (1/2000 - 12/2001) | 2-7 | 6-9 | 9-13 | 11-15 | 15-18 | 18-22 | 20-24 |
Instruments and Measures
The protocols used to measure individual and family characteristics for participants in the PHDCN consisted of several types of assessment instruments, including self-report questionnaires, structured interview formats, and educational tests. A list of all instruments that comprised the protocols for Wave 1, Wave 2, and Wave 3 of the PHDCN, along with a brief description and selected citations for each measure, is provided below. The list also indicates the waves in which each instrument was administered. For the most part, instruments administered across multiple waves kept the same name. Where the title of an instrument changed from wave to wave but the instrument measured the same construct, multiple titles are listed (e.g., Conflict Tactics Scale for Parent and Child / Caregiver-Subject Conflict Scale).
Instruments Administered by Wave and Cohort (pdf)
Imputations
At each wave, three primary caregiver-level variables have versions where missing values have been imputed by the Scientific Directors. The three variables pertain to education level (ordinal), salary level (ordinal), and SEI (socioeconomic index, continuous). Imputation calculations are done at the primary caregiver level: 1 record per unique value of FAM_ID in the PHDCN MASTER data file (ICPSR 13580).
The mean values of the three variables over all records with the same FAM_ID are calculated, and those mean values are used in the imputation calculations. Imputations are done only if at least one of the three variables is non-missing; otherwise, the imputed values are also missing. Imputed values for each variable are based on the results of regression models where the other two variables are independent covariates. The principal component of the three variables also has a version that is imputed if any of the three component variables are missing.
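The collapse to one record per FAM_ID can be sketched as follows. This is only an illustration: it assumes subject-level records are dictionaries with missing values stored as None, and that missing values are skipped when averaging (the documentation does not say how they are handled). The function names are hypothetical, and the Wave 1 variable names are used as defaults.

```python
from collections import defaultdict
from statistics import mean


def family_means(records, variables=("EDUC_MAX", "SALARY", "SEIMAX")):
    """Collapse subject-level records to one row per FAM_ID.

    Each variable is averaged over the non-missing values in the family;
    a variable with no observed values in the family stays None.
    """
    grouped = defaultdict(lambda: {v: [] for v in variables})
    for rec in records:
        family = grouped[rec["FAM_ID"]]  # creates the row even if all values are missing
        for v in variables:
            if rec.get(v) is not None:
                family[v].append(rec[v])
    return {
        fam_id: {v: (mean(vals) if vals else None) for v, vals in by_var.items()}
        for fam_id, by_var in grouped.items()
    }


def can_impute(row) -> bool:
    """Imputation proceeds only if at least one of the three variables is observed."""
    return any(value is not None for value in row.values())
```

A family whose three variables are all missing is kept in the output but fails `can_impute`, so its imputed values would remain missing as described above.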
Wave 1 imputations
- The 3 variables for which imputed versions are derived are as follows:
- EDUC_MAX (maximum of EDUC_PC [education level of PC] and EDUC_PR [education level of partner]); 5 levels; grade equivalent for level is used in regression models (range 8-16).
- SALARY (household income); 7 levels; dollar equivalent of level is used in regression models (range 2500-55000).
- SEIMAX (maximum of SEI for PC job and partner job); continuous.
- SESCOMP is the principal component of the 3 variables.
- Note: EDUC_MAX is derived even if PC education level is missing (49 cases) and even if a partner is present and the partner's education level is missing (177 cases). Thus, for these 226 cases, the value of EDUC_MAX itself involves some imputation. The derivation of EDUC_MAX is justified for the following reasons:
- The Spearman correlation of the education levels of PC and partner for the 2,812 cases where both are non-missing is .604, and the mean grade difference (PC-partner) for these cases is .36.
- The mean grade completed for 2,989 PCs with a partner is 11.89 and for 1,226 PCs without a partner is 12.09.
- For the 2,989 PCs with a partner, the 2,812 with non-missing partner education have a mean grade completed of 11.99 and the 177 with missing partner education have a mean grade completed of 10.18.
- For the 49 cases where PC education is missing, the mean grade completed for the partner is 9.39.
- Three regressions are run:
- Dependent: EDUC_MAX; independent: SALARY (0 if missing), variable to indicate if SALARY is missing, SEIMAX (0 if missing), variable to indicate if SEIMAX is missing.
- Dependent: SALARY; independent: EDUC_MAX (0 if missing), variable to indicate if EDUC_MAX is missing, SEIMAX (0 if missing), variable to indicate if SEIMAX is missing.
- Dependent: SEIMAX; independent: EDUC_MAX (0 if missing), variable to indicate if EDUC_MAX is missing, SALARY (0 if missing), variable to indicate if SALARY is missing.
- After each regression, the imputed value of the dependent variable is calculated; it is the predicted value plus a random value from a normal distribution multiplied by the MSE (mean squared error) for the regression. For SEIMAX and SALARY, negative imputed values are set to 0. Continuous imputed values for EDUC_MAX and SALARY are categorized.
- If the value of the original variable (SP-level) is missing, the imputed value is assigned to the imputed version of the variable; otherwise, the value of the imputed version equals the value of the original version.
- The imputed version of EDUC_MAX is IEDUCMAX; variable EDUCMAXI indicates whether the value of IEDUCMAX has been imputed.
- The imputed version of SALARY is ISALARY; variable SALARYI indicates whether the value of ISALARY has been imputed.
- The imputed version of SEIMAX is ISEIMAX; variable SEIMAXI indicates whether the value of ISEIMAX has been imputed.
- A principal components analysis is run using the 3 imputed variables; if SESCOMP is non-missing then ISESCOMP is set equal to SESCOMP; otherwise, it equals the principal component of this new analysis; variable SESCOMPI indicates whether the value of ISESCOMP was calculated using any imputed variables.
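The regression step of the Wave 1 algorithm can be sketched with NumPy. This is not the Scientific Directors' code. It assumes ordinary least squares fitted on the cases where the dependent variable is observed, missing values stored as NaN, and an imputation flag set only where a value is actually filled in. The text says the random normal draw is multiplied by the MSE, and the sketch follows that literally by default; the `use_root_mse` switch is a hypothetical addition for readers who prefer to scale by the residual standard error. The categorization of EDUC_MAX and SALARY and the principal components step are omitted.

```python
import numpy as np


def design_matrix(covariates):
    """Intercept, then for each covariate: value (0 if missing) and a missing flag."""
    cols = [np.ones(len(covariates[0]))]
    for x in covariates:
        missing = np.isnan(x)
        cols.append(np.where(missing, 0.0, x))
        cols.append(missing.astype(float))
    return np.column_stack(cols)


def impute(y, covariates, rng, floor=None, use_root_mse=False):
    """Return (imputed values, imputation flag) for one dependent variable."""
    y = np.asarray(y, dtype=float)
    covariates = [np.asarray(x, dtype=float) for x in covariates]
    X = design_matrix(covariates)
    observed = ~np.isnan(y)

    # Fit OLS on cases where the dependent variable is observed.
    beta, *_ = np.linalg.lstsq(X[observed], y[observed], rcond=None)
    resid = y[observed] - X[observed] @ beta
    dof = max(int(observed.sum()) - X.shape[1], 1)
    mse = float(resid @ resid) / dof

    # Predicted value plus a normal draw scaled by the MSE (per the text).
    scale = np.sqrt(mse) if use_root_mse else mse
    draw = X @ beta + rng.standard_normal(len(y)) * scale
    if floor is not None:
        draw = np.maximum(draw, floor)  # e.g. negative SALARY or SEIMAX set to 0

    # No imputation when the variable and both covariates are all missing.
    no_info = ~observed & np.all([np.isnan(x) for x in covariates], axis=0)
    imputed = np.where(observed, y, draw)
    imputed[no_info] = np.nan
    flag = (~observed & ~no_info).astype(int)
    return imputed, flag


# Made-up illustrative values, not PHDCN data.
nan = np.nan
educ = [12, 16, 8, nan, 12, 14, 10, 16, nan]
salary = [25000, 55000, 2500, 15000, nan, 35000, 15000, 45000, nan]
sei = [40, 70, 20, 30, 45, nan, 35, 60, nan]

isalary, salaryi = impute(salary, [educ, sei], np.random.default_rng(0), floor=0.0)
```

In the example, only the fifth case receives an imputed salary; the last case has no observed information on any of the three variables, so its value stays missing.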
Wave 2 imputations
- The 3 variables for which imputed versions are derived are as follows:
- EDUCMAX2 (maximum of EDUC_PC [education level of PC] and EDUC_PR [education level of partner at Wave 1]); 5 levels; grade equivalent for level is used in regression models (range 8-16).
- SALARY2 (household income); 11 levels; dollar equivalent of level is used in regression models (range 2500-95000).
- SEIMAX2 (maximum of SEI for PC job and partner job); continuous.
- SESCOMP2 is the principal component of the 3 variables.
- Note: partner education level was not measured in Wave 2. At Wave 2 the PC education level was inquired about only if the PC had attended school since Wave 1. For 66 cases it is unknown whether the Wave 2 PC is the same as the Wave 1 PC. Unfortunately, for another 177 cases the Wave 2 PC differs from the Wave 1 PC but PC education level was not inquired about in Wave 2. For 64 cases the PC education level at both waves is missing as is the partner education level (if there is a partner). For these 307 cases, EDUCMAX2 is missing. For the remaining cases EDUCMAX2 is assigned as follows:
- If the Wave 2 PC differs from the Wave 1 PC, EDUCMAX2 is set to the PC education level collected at Wave 2.
- Else if PC has no partner at Wave 2, EDUCMAX2 is set to PC education level (from Wave 2 if collected, otherwise from Wave 1).
- Else if PC has a partner at Wave 2 and partner is same as in Wave 1, EDUCMAX2 is set to the maximum of the PC education level (from Wave 2 if collected, otherwise from Wave 1) and the Wave 1 partner education.
- Else if PC has a partner at Wave 2 and partner differs from Wave 1 partner, EDUCMAX2 is set to the PC education level (from Wave 2 if collected, otherwise from Wave 1).
- The remainder of the Wave 2 imputation algorithm is exactly comparable to the Wave 1 imputation algorithm.
- The imputed version of EDUCMAX2 is IEDUMAX2; variable EDUMAXI2 indicates whether the value of IEDUMAX2 has been imputed.
- The imputed version of SALARY2 is ISALARY2; variable SALARYI2 indicates whether the value of ISALARY2 has been imputed.
- The imputed version of SEIMAX2 is ISEIMAX2; variable SEIMAXI2 indicates whether the value of ISEIMAX2 has been imputed.
- The imputed version of SESCOMP2 is ISESCOMP2; variable SESCOMPI2 indicates whether any component of ISESCOMP2 has been imputed.
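The EDUCMAX2 assignment rules listed above can be written as a single function. This is a hedged sketch: the argument names are hypothetical, missing values are represented as None, and the maximum is assumed to be taken over whichever of the two education values is non-missing (in line with the Wave 1 treatment of EDUC_MAX).

```python
from typing import Optional


def educmax2(
    pc_same_as_wave1: Optional[bool],  # None when it is unknown whether the PC changed
    pc_educ_w1: Optional[int],
    pc_educ_w2: Optional[int],         # None when not asked or not reported at Wave 2
    has_partner_w2: bool,
    partner_same_as_wave1: bool,
    partner_educ_w1: Optional[int],
) -> Optional[int]:
    """Assign EDUCMAX2 following the documented rules; None means missing."""
    if pc_same_as_wave1 is None:
        return None  # unknown whether the Wave 2 PC is the Wave 1 PC
    if not pc_same_as_wave1:
        # New PC: only the Wave 2 report can be used (missing if not collected).
        return pc_educ_w2
    # Same PC: prefer the Wave 2 report, fall back to Wave 1.
    pc_educ = pc_educ_w2 if pc_educ_w2 is not None else pc_educ_w1
    if has_partner_w2 and partner_same_as_wave1:
        known = [e for e in (pc_educ, partner_educ_w1) if e is not None]
        return max(known) if known else None
    # No partner, or a different partner than at Wave 1: PC education only.
    return pc_educ
```

For instance, a continuing PC with grade equivalent 12 and a continuing partner with grade equivalent 16 yields 16, while the same PC with a new partner yields 12.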