Collaborative Psychiatric Epidemiology Surveys

Sample Design

The NIMH-CPES survey data collections were each based on a multi-stage area probability sample conducted in a total of 252 geographic areas or primary sampling units across the United States. The sample was selected using the sampling frames and selection procedures that are common to the University of Michigan Survey Research Center's (SRC) National Sample design. The national area probability samples for the three individual studies include unique features designed to optimize the cost and error properties of the study-specific samples. The general features of each study sample are summarized in Table 1.

Table 1. Key features of the Collaborative Psychiatric Epidemiology Surveys (CPES) sample designs
Studies compared: National Comorbidity Survey Replication (NCS-R); National Survey of American Life (NSAL); National Latino and Asian American Study (NLAAS)
Survey population
  NCS-R: All adults, age 18+ residing in households in the coterminous United States. Exclusions include institutionalized persons, those living on military bases, and non-English speakers.
  NSAL: African-American, Afro-Caribbean, and non-Hispanic white adults, age 18+ residing in households in the coterminous United States. Exclusions include institutionalized persons, those living on military bases, and non-English speakers.
  NLAAS: Latino and Asian-American adults, age 18+ residing in households in the coterminous United States, Alaska, and Hawaii. Exclusions include institutionalized persons and those living on military bases.
Sample frame
  NCS-R: Four-stage national area probability sample.
  NSAL: Four-stage national area probability sample with special supplement for Afro-Caribbean adults.
  NLAAS: Four-stage national area probability sample with special supplements for adults of Puerto Rican, Cuban, Chinese, Filipino and Vietnamese national origin.
Sample size
  NCS-R: 13,054 sample housing units screened for eligible adults. 9,282 completed interviews with eligible respondents.
  NSAL: 26,495 sample housing units screened for eligible adults. 6,199 completed interviews with eligible respondents.
  NLAAS: 27,026 sample housing units screened for eligible adults. 4,649 completed interviews with eligible respondents.
Special features
  NCS-R: Selection of two adult respondents in a subsample of households. Special study of main survey nonresponse.
  NSAL: Two-phase sample design to control survey costs in final stages of data collection.
  NLAAS: Sample linked to NCS-R for statistical comparisons. Selection of two adult respondents in a subsample of households. Two-phase sample design to control survey costs in final stages of data collection.

General features common to each CPES sample design

The selection of a probability sample of respondents for each study's interview required a four-step sampling process -- a primary stage sampling of U.S. Metropolitan Statistical Areas (MSAs) and counties, followed by a second stage sampling of area segments, a third stage sampling of housing units within the selected area segments, and concluding with the random selection of eligible respondents from the sample housing units.
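
Because the four stages are nested, a respondent's overall probability of selection is the product of the conditional selection probabilities at each stage. The expression below states this general property of multi-stage sampling; it is not a formula taken from the CPES documentation, and the symbols are introduced here only for illustration.

```latex
\pi_{\text{respondent}}
  = \pi_{\text{PSU}}
  \times \pi_{\text{segment} \mid \text{PSU}}
  \times \pi_{\text{housing unit} \mid \text{segment}}
  \times \pi_{\text{respondent} \mid \text{housing unit}}
```

For a PSU that is included in the sample with certainty, the first factor equals 1.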

The primary stage units (PSUs) of SRC's National Sample are either MSAs, single counties, or a grouping of geographically contiguous counties with small populations. In each CPES sample design, PSUs are assigned to explicit sampling strata based on MSA/non-MSA status, PSU size, geographic location, and population characteristics. Depending on the CPES study sample design, from 12 to 20 of the primary stage strata contain only a single self-representing (SR) metropolitan PSU. Each SR PSU is included in the sample with certainty in the primary stage of selection. The remaining non-self-representing (NSR) primary stage strata in each design contain more than one PSU. From each of these NSR strata, one PSU is sampled with probability proportionate to its size measured in occupied housing unit counts reported at the most recent census.
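
The selection of one PSU per NSR stratum with probability proportionate to size can be sketched as follows. This is a minimal illustration of a single PPS draw using cumulative measures of size, not the SRC selection software; the function name, PSU labels, and housing unit counts are all hypothetical.

```python
import random


def select_psu_pps(stratum, rng):
    """Select one PSU from a stratum with probability proportionate to size.

    stratum: list of (psu_id, occupied_housing_units) pairs.
    Returns (psu_id, selection_probability).
    """
    total = sum(size for _, size in stratum)
    if total <= 0:
        raise ValueError("stratum must have a positive total measure of size")
    # Draw a point on [0, total) and find the PSU whose cumulative range contains it.
    point = rng.random() * total
    cumulative = 0
    for psu_id, size in stratum:
        cumulative += size
        if point < cumulative:
            return psu_id, size / total
    # Fallback for floating-point edge cases: return the last PSU.
    psu_id, size = stratum[-1]
    return psu_id, size / total


# Hypothetical NSR stratum: labels and housing unit counts are made up.
nsr_stratum = [("PSU-A", 50_000), ("PSU-B", 30_000), ("PSU-C", 20_000)]
# A self-representing stratum holds a single PSU, so its selection probability is 1.
sr_stratum = [("PSU-SR", 900_000)]

rng = random.Random(2024)
print(select_psu_pps(nsr_stratum, rng))
print(select_psu_pps(sr_stratum, rng))
```

The returned selection probability is the quantity that would later enter the weight calculation; for the single-PSU stratum it is 1, matching the certainty selection of SR PSUs described above.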

The designated second-stage sampling units (SSUs) in each CPES sample design are termed area segments. Area segments were formed by linking geographically contiguous census blocks to form units with a minimum number of occupied housing units (typically 50 to 100 based on the needs of the study). Within primary stage units, area segments were stratified at the county level by geographic location and race/ethnicity composition of residents' households. The race/ethnicity stratification of area segments played a particularly important role in the NSAL and NLAAS sample designs where it was used both to improve the sampling precision of the design and as a basis for more cost-effective oversampling in area segments with higher densities of households for targeted race and ethnicity subpopulations. Within each second stage stratum, the actual probability sampling of area segments was performed with probabilities proportionate to census counts of the occupied housing units for the census blocks that comprise the area segment.

The SRC field staff conducted an up-to-date enumeration or 'listing' of all housing units located within the physical boundaries of the selected area segments for each CPES sample design. A third-stage sample of housing units was then selected for screening interviews according to a predetermined sampling rate.

The third stage sampling rate was computed for each selected area segment in the CPES sample design. This rate was then used to select a systematic random sample of actual housing units from the area segment listing. Each sample housing unit was contacted in person by an interviewer. Within each cooperating sample household, the interviewer conducted a short screening interview with a knowledgeable adult to determine if household members met the study eligibility criteria. If the informant reported that one or more eligible adults lived at the sample housing unit address, the interviewer prepared a complete listing of household members and proceeded to randomly select a respondent for the study interview. The random selection of the respondent was performed using a special adaptation of the objective household roster/selection table method developed by Kish (1949).
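
The housing unit and respondent selection steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the systematic sample uses a random start and a fixed interval of 1/rate, and the respondent draw is a simple equal-probability choice that stands in for, and does not reproduce, the Kish selection table method. All names and data are hypothetical.

```python
import random


def systematic_sample(listing, rate, rng):
    """Select a systematic random sample from an ordered listing.

    rate is the sampling rate (0 < rate <= 1); the sampling interval is 1 / rate.
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    interval = 1.0 / rate
    start = rng.random() * interval  # random start in [0, interval)
    selected = []
    k = 0
    while True:
        index = int(start + k * interval)
        if index >= len(listing):
            break
        selected.append(listing[index])
        k += 1
    return selected


def select_respondent(eligible_adults, rng):
    """Equal-probability draw of one eligible adult (a stand-in, not the Kish table method)."""
    return rng.choice(eligible_adults)


# Hypothetical segment listing of 100 housing units sampled at a rate of 1 in 10.
rng = random.Random(11)
segment_listing = [f"HU-{i:03d}" for i in range(100)]
sample_units = systematic_sample(segment_listing, 0.1, rng)
print(sample_units)
print(select_respondent(["adult 1", "adult 2", "adult 3"], rng))
```

Because the interval is at least 1, no housing unit can be selected twice, and every listed unit has the same chance of selection within the segment.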

National Comorbidity Survey Replication (NCS-R) sample design

The survey population for the NCS-R included all U.S. adults aged 18 years and older residing in households located in the coterminous 48 states. Institutionalized persons including individuals in prisons, jails, nursing homes, and long-term medical or dependent care facilities were excluded from the survey population. Military personnel living in civilian housing were eligible for the study but due to security restrictions residents of housing located on a military base or military reservation were excluded. Adults who were not able to conduct the NCS-R interview in English were not eligible for the survey.

The NCS-R was designed to be a cross-sectional replication of the original 1992 National Comorbidity Survey (NCS; Kessler, 1994). To improve the statistical efficiency for cross-time comparison of results from these two surveys, a decision was made early in the NCS-R planning process to maximize the overlap in the primary and secondary stages of the multi-stage sample designs for the two studies. Data from the 2000 U.S. census were not available at the time of the NCS-R sample selection. Therefore, the primary stage design for the NCS-R was carried forward directly from the 1992 NCS multi-stage sample selection with no changes in primary stage strata or PSU definitions and no adjustment to the 1990 census-based measures of size or primary stage selection probabilities. The shared NCS/NCS-R primary stage sample design consisted of a single PSU selection from each of 62 primary stage strata. As shown in Table 2, 16 of these NCS-R PSUs were the largest self-representing MSAs. A total of 31 non-self-representing PSU selections represented the remaining MSAs in the US survey population. The more rural non-MSA counties were represented by 15 non-self-representing PSU selections.

Table 2 provides a summary of primary and secondary stage sample allocation for the NCS-R study.

Table 2. Primary and secondary stage sample allocation for the NCS-R
Domain of primary     Number of NCS-R primary     Number of NCS-R second stage units (SSUs)
stage strata          stage units (PSUs)          Total NCS area segments    New listing needed
Total                 62                          1001                       174
SR MSA                16                          395                        24
NSR MSA               31                          402                        48
NSR Non-MSA           15                          204                        102

National Survey of American Life (NSAL) sample design

The NSAL survey populations included all US adults in the three target groups who were age 18 and older and resided in households located in the coterminous 48 states. The African-American survey population included only Black adults who did not identify ancestral ties in the Caribbean. The Afro-Caribbean survey population was limited to Black adults who self-identified as being of Caribbean ancestry. The White survey population included all Caucasian adults except persons of self-reported Hispanic ancestry. Institutionalized persons including individuals in prisons, jails, nursing homes, and long-term medical or dependent care facilities were excluded from the study population. Military personnel living in civilian housing were eligible for the study but residents of housing located on a military base or military reservation were excluded. The NSAL survey populations were restricted to adults who were able to complete the interview in English.

The NSAL multi-stage sample design combines a 'core' national area probability sample of households with a special supplemental sample of households in areas of higher Afro-Caribbean residential density. The NSAL Core national sample is designed to be optimal for a national study of the African-American survey population. The design of the NSAL Core sample closely resembles that used for the 1979-80 National Survey of Black Americans (NSBA) (Hess, 1985; Jackson, 1991). The NSAL Supplement design served solely to augment the sample size from the Afro-Caribbean survey population in a cost and statistically efficient manner and did not contribute to the representative samples of the NSAL's African-American and White survey populations. The NSAL national area probability sample was selected independently of the sample for the NCS-R and the NLAAS although the three designs share many common features such as PSU and area segment definitions and sample selection methods.

The Survey Research Center (SRC) 1990 National Sample of US households (Heeringa et al., 1994) was the starting point for NSAL sample selection. To adapt the sample to be optimal for a national study of the African-American survey population for NSAL, some modification to the primary stage of the basic 1990 SRC National Sample design was needed. The definitions of the primary sampling units in the primary stage frame for the SRC National Sample remained unchanged, but measures of size used in the PPS selection of PSUs were changed from 1990 census counts of total occupied households to African-American occupied households. Some reorganization (combining, splitting) of 1990 'A' National Sample strata (Heeringa and Redmond, 1994) was also required to transform the design from one that was optimal for surveys of all US households to one that emphasized precision for samples of African Americans.

As shown in Table 3, the NSAL Core primary stage design includes 64 PSU selections. The eight largest self-representing MSA PSUs in the 'A' partition of the SRC National Sample remained self-representing (SR) selections in the NSAL primary stage sample. An additional 13 MSA PSUs were designated as self-representing PSUs for the NSAL on the basis of the size of their African-American population, bringing the total number of NSAL SR PSUs to 21. The NSAL primary stage design also includes 43 NSR selections: 14 PSUs selected from strata representing the MSA and non-MSA areas of the census Northeast, Midwest and West regions, and 29 PSUs selected from MSA and non-MSA strata representing the census South, the region that includes almost 50% of the US African-American population. The primary stage sample allocation for the urban and rural areas of the census South region was deliberately increased to improve sample precision for national estimates derived from the African-American sample. The PPS selection of the 43 NSAL NSR PSUs used a probability sampling method that maximized the overlap of the NSAL primary stage sample with the 1990 SRC National Sample 'A' partition selection for the design stratum. The objective in maximizing this overlap was, where possible, to take advantage of experienced, trained SRC staff in the National Sample primary stage sample locations.

Table 3. Primary and secondary stage sample allocation for the NSAL Core sample
Domain of primary     Number of NSAL primary      Number of NSAL second
stage strata          stage units (PSUs)          stage units (SSUs)
Total                 64                          456
SR MSA                21                          198
NSR MSA               27                          162
NSR Non-MSA           16                          96

The NSAL sample of African-Americans was identified exclusively from the screening of the sample of housing units selected from the 456 NSAL Core area segment listings.

The NSAL sample of Afro-Caribbean households was identified through samples selected from two overlapping area probability sample frames. The first sample source for Afro-Caribbean respondents was the screening of households in the nationally representative NSAL Core sample. As described above, all sample housing units in this national probability sample were contacted and a screening interview was conducted with each eligible, cooperating household. In total, 266 adult Afro-Caribbeans were successfully interviewed in the NSAL Core national sample, well short of the original NSAL target sample size of 1,600 Afro-Caribbeans. It was therefore necessary to supplement the NSAL Core sample to reach that target.

Construction of the NSAL Caribbean Supplement sample began with the selection of a stratified sample of eight supplemental PSUs. From these eight PSUs, 86 area segments were selected from the set of qualifying census block groups within the PSUs. To qualify for the Caribbean Supplement, a block group population needed to be at least 10% Afro-Caribbean (based on the 1990 census estimates). Once the primary and secondary stage sampling units were selected, field staff visited each area segment to list housing units.
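
The qualification rule for the Caribbean Supplement amounts to a density threshold on each block group. The sketch below assumes the density is computed as the Afro-Caribbean share of the block group's total population; the documentation does not say whether persons or households were counted, and the function name and block group figures are hypothetical.

```python
def qualifies_for_caribbean_supplement(afro_caribbean_pop, total_pop, threshold=0.10):
    """Return True if the block group's Afro-Caribbean share meets the threshold."""
    if total_pop <= 0:
        return False
    return afro_caribbean_pop / total_pop >= threshold


# Hypothetical block groups: (id, Afro-Caribbean population, total population).
block_groups = [("BG-1", 120, 1000), ("BG-2", 40, 800), ("BG-3", 100, 1000)]
qualifying = [
    bg_id
    for bg_id, afro_caribbean, total in block_groups
    if qualifies_for_caribbean_supplement(afro_caribbean, total)
]
print(qualifying)
```

Only block groups that pass this filter would enter the frame from which the supplemental area segments are drawn.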

Table 4. Primary and secondary stage sample allocation for the NSAL Caribbean Supplement
Domain of primary     Number of NSAL primary      Number of NSAL second
stage strata          stage units (PSUs)          stage units (SSUs)
Total                 8                           86
SR MSA                5                           66
NSR MSA               3                           20

The NSAL White sample was a stratified, disproportionate sampling of non-Hispanic white adults in the US household population. Although in the strictest sense it may be described as a nationally representative sample of White adults, it is not optimal for descriptive analysis of the U.S. White adult population. Instead, the NSAL White sample was designed to be optimal for comparative descriptive and multivariate analyses in which residential, environmental and socioeconomic characteristics are carefully controlled in the black/white statistical contrasts. As described above, the NSAL sample of White adults was identified by screening the national probability sample of housing units selected for the NSAL Core. The original completed interview target for the NSAL White sample was set at n = 1,800. Later in the study period, a decision was made to reduce this target to n = 1,000 White adult interviews based on survey costs and updated analysis objectives for the NSAL project. By the nature of its equal probability national sampling of all US households, the NSAL Core screening for eligible African-American and Afro-Caribbean households was projected to identify far more eligible White households than required to meet the sample size target. Therefore, subsampling of eligible White adults at the screening stage was employed to bring the sample of interviews with this group in line with the study targets.

National Latino and Asian American Study (NLAAS) sample design

The survey populations for the NLAAS study included all Latino and Asian American adults who resided in households in the US states and Washington, DC. Latinos were divided into four strata of interest: Mexican, Puerto Rican, Cuban, and all other Latinos. The Asian American survey population was also stratified based on eligible adults' ancestry or national origin: Chinese, Filipino, Vietnamese, and all other Asians. This stratification of the NLAAS survey populations relied on self-reports by household members at the time of the household screening. In cases where a member of the survey population reported belonging to more than one Latino or Asian American target population, the following order of priority was used to assign individuals to a single group for the purpose of the stratified sample selection:

  1. Vietnamese;
  2. Cuban;
  3. Filipino;
  4. Puerto Rican;
  5. Chinese;
  6. Mexican;
  7. other Asian; and
  8. other Latino.
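
The priority rule above can be expressed as a lookup over an ordered list. This is an illustrative sketch only: the function name and the representation of screening self-reports as a list of group labels are assumptions, while the ordering itself is taken from the list above.

```python
# Priority order from the list above, highest priority first.
PRIORITY_ORDER = [
    "Vietnamese",
    "Cuban",
    "Filipino",
    "Puerto Rican",
    "Chinese",
    "Mexican",
    "other Asian",
    "other Latino",
]


def assign_stratum(reported_groups):
    """Return the single highest-priority group among those a person reported.

    Returns None if none of the reported groups is an NLAAS target group.
    """
    reported = set(reported_groups)
    for group in PRIORITY_ORDER:
        if group in reported:
            return group
    return None


# A person reporting both Mexican and Cuban ancestry is assigned to the Cuban stratum.
print(assign_stratum(["Mexican", "Cuban"]))
```

The ordering places the less prevalent national origin groups first, so a person who reports more than one group is counted toward the group that is harder to find in screening.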

Institutionalized persons including individuals in prisons, jails, nursing homes, and long-term medical or dependent care facilities were excluded from the study populations. Military personnel living in civilian housing were eligible for the study, but due to security restrictions residents of housing located on a military base or military reservation were excluded.

The NLAAS is based on a stratified probability sample design that includes multiple area probability sample components:

  • An NLAAS Core sampling of PSUs, area segments, and housing units that is designed to be nationally representative of all US populations including Latinos and Asians

  • The NLAAS High Density (HD) supplemental samples, targeted oversamples of geographic areas with greater than 5% residential density for individual national origin groups of interest in the NLAAS

The NLAAS Core sample is designed to provide a nationally representative sample of Latinos and Asian Americans without regard to geographic residential patterns. The price for the national representation under the NLAAS Core sample design was a high per unit cost of data collection for eligible respondents. This high cost per interviewed case was due to the fact that many area segments in the Core sample had very low density of the populations of interest in NLAAS and there was a need to screen large numbers of households to identify the targeted samples of Latinos and Asians. Even for the more prevalent and widely distributed Mexican or Chinese ancestry groups, it was very costly to screen a general national area probability sample to identify and interview a large nationally representative sample of eligible adults. Survey costs would have been prohibitively high if this method alone had been used to obtain desired numbers of sample observations of less prevalent national origin groups (such as Puerto Ricans, Cubans, Filipinos, and Vietnamese).

To maximize the statistical efficiency of comparisons between the NLAAS survey populations and the larger US adult population, the primary and secondary stages of the NLAAS Core national sample design were completely integrated with the National Comorbidity Survey Replication (NCS-R) national sample design. The NLAAS Core and NCS-R designs shared the same 62 primary areas representing the MSA and non-MSA strata for the 48 coterminous United States (see Tables 2 and 5). Since full representation of Asian ancestry populations was critical to the NLAAS, the Honolulu HI MSA was added to the primary stage sample as a metropolitan self-representing PSU, bringing the total number of NLAAS National Sample PSUs to 63. The second stage of the NLAAS national sample design component was also fully integrated with the second stage of the NCS-R national sample. The two designs did not share exactly the same area segments and housing unit listings; however, each selected NLAAS Core area segment was paired with an NCS-R area segment, and the paired segments from the two samples were physically adjacent to one another, maximizing the 'geographical/ecological correlation' of the two samples (Kish, 1987). The decision to introduce geographic 'overlap' with the NCS-R to the NLAAS Core national sample was based on statistical aims for the NLAAS. A primary aim of the NLAAS was to enable comparisons of mental health characteristics both among the NLAAS survey populations of Latinos and Asians and with the larger US population. Full geographic linkage of the NLAAS national sample area segments to the NCS-R maximized the geographic and socio-economic correlation of the two samples. Since both the NCS-R and the NLAAS Core were designed to be nationally representative, this 'correlation of designs' produced no major inefficiencies for stand-alone analysis of the NLAAS survey data but significantly reduced the variance of statistical analyses designed to contrast the populations from the two studies.


Table 5. Primary and secondary stage sample allocation for the NLAAS Core National Sample
Multi-stage sample design units                                 Primary stage sample domain
                                                                Total   SR      NSR
NLAAS Core primary stage sample units (PSUs)
  Core PSUs in original NCS-R national sample (plus Honolulu)   63      17      46
  Core PSUs not fielded due to near zero expected interviews    25      1       24
  Core PSUs fielded in NLAAS household screening                38      16      22
NLAAS Core second stage sample units (SSUs)
  Core SSUs matched to NCS-R national sample (plus Honolulu)    474     204     270
  Core SSUs not fielded due to near zero expected interviews    157     13      144
  Core SSUs fielded in NLAAS household screening                317     191     126
    Not high density for special HD oversample populations      263     137     126
    High-density Puerto Rican                                   17      17      0
    High-density Cuban                                          4       4       0
    High-density Chinese                                        15      15      0
    High-density Filipino                                       14      14      0
    High-density Vietnamese                                     4       4       0

As with the NSAL study, the screening and interviewing process of the NLAAS was also conducted through a two-phase method:

Table 6. Primary and secondary stage sample allocation for the NLAAS high-density (HD) samples
NLAAS HD supplemental samples:
Multi-stage sample design units         Primary stage sample domain
                                        Total   SR      NSR
NLAAS HD primary stage sample units (PSUs)
  High-density Puerto Rican PSUs        20      12      8
  High-density Cuban PSUs               9       7       2
  High-density Chinese PSUs             17      13      4
  High-density Filipino PSUs            18      10      8
  High-density Vietnamese PSUs          18      12      6
NLAAS-HD second stage sample units (SSUs)
  High-density Puerto Rican SSUs        51      34      17
  High-density Cuban SSUs               70      66      4
  High-density Chinese SSUs             46      34      12
  High-density Filipino SSUs            51      32      19
  High-density Vietnamese SSUs          60      43