
Weighting

National Institute of Mental Health (NIMH)
Collaborative Psychiatric Epidemiology Survey Program (CPES) Data Set.
Integrated Weights and Sampling Error Codes for Design-based Analysis

Steven G. Heeringa, Patricia Berglund
Statistical Design Group, Survey Research Center, University of Michigan

June 4, 2007

I. Introduction

Under contract to the National Institute of Mental Health (NIMH), the Survey Research Center (SRC) has developed an integrated database for the Collaborative Psychiatric Epidemiology Surveys (CPES): the National Comorbidity Survey-Replication (NCS-R), the National Survey of American Life (NSAL), and the National Latino and Asian American Study (NLAAS). Heeringa et al. (2004) describe the sample designs and sample outcomes for the three CPES surveys. A general description of the survey methodology for the CPES surveys can be found in Pennell et al. (2004).

This technical report outlines the method for integrating the design-based analysis weights and variance estimation codes for these three studies to permit analysts to approach analysis of the combined dataset as though it were a single, nationally-representative study.

The method of integrating the analysis of these three major survey programs was based on an adaptation of a multiple frame approach to estimation and inference for population characteristics (Hartley, 1962, 1974). There are several features and advantages to the method that are worth noting:

  1. It was built on all of the study-specific weight development efforts conducted to date (Kessler et al., 2004; Heeringa et al., 2004; Heeringa et al., 2006).

  2. It integrated overlapping representation of domains of the CPES survey population in a way that was mathematically transparent and easily understood by analysts of the combined data set. Given the large investments in study-specific weight development, this approach minimized the chance for conceptual or computational errors.

  3. It was centered on the assumption that, conditional on the sample domain (e.g., block groups with 10-29.9% African American population) and the race/ethnicity of the respondent (e.g., Mexican-American), each study's sample representation based on the revised weight is proportional to the number of cases it "contributes" to the geographic domain x race/ethnicity cell.

II. CPES Survey Population

The CPES survey population was defined by the union of the survey populations for the three component studies. This included adults age 18 and older, living in households in the 48 coterminous United States (NCS-R, NSAL). The survey population for the Latino and Asian ancestry groups extended to the State of Hawaii as well.

III. Race/Ancestry Populations

CPES analysts are free to define respondent groupings for analysis; however, for purposes of weight development twelve specific race/ancestry groupings were initially specified. These groupings are listed in Table 1. Due to the small number of persons of Other ancestry interviewed in the NCS-R, those individuals were combined with the White race category for purposes of the CPES weight computation.

Table 1: Race/Ancestry Groupings Required For CPES Weight Development
CPES Race/Ancestry Population Group                     Survey Populations
Vietnamese                                              NCS-R, NLAAS
Filipino                                                NCS-R, NLAAS
Chinese                                                 NCS-R, NLAAS
All Other Asian *                                       NCS-R, NLAAS
Cuban                                                   NCS-R, NLAAS, NSAL
Puerto Rican                                            NCS-R, NLAAS, NSAL
Mexican                                                 NCS-R, NLAAS
All Other Hispanic *                                    NCS-R, NLAAS, NSAL
Afro-Caribbean (non-Hispanic)                           NCS-R, NSAL
African-American (non-Hispanic)                         NCS-R, NSAL
White                                                   NCS-R, NSAL
All Other (Pacific Islander, Native American, etc.)     NCS-R
* Based on NLAAS screening criteria

The breakdown of the full population into these 12 race/ancestry populations was a direct result of the specific eligibility and oversampling provisions of the NSAL and the NLAAS study designs. As shown in Table 1, NCS-R provided nearly universal coverage of all 12 race/ancestry groups. NSAL and NLAAS provided in-depth coverage of specific populations and, with the exception of Afro-Caribbeans from Spanish-language countries in the Caribbean (e.g., Cuba, Dominican Republic), the oversampling in these two studies did not overlap.

These 12 population groupings form the first dimension of a two-dimensional array that was used to apportion/adjust study-specific weights to create a new weight variable for integrated CPES analyses. These "population" groupings were defined at the respondent level. If individual respondents had multiple race/ancestry, they were assigned to a single category according to the priority order in the NLAAS and NSAL respondent classification rules (e.g. Afro-Caribbean taking preference over African-American, Vietnamese over Chinese). If ancestry for NCS-R cases could not be explicitly established at the level of detail required to map them into the NSAL or NLAAS population categories, they were stochastically assigned to a category based on the prevalence of each population in the Census Block Group in which the respondent's household was located.
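The stochastic assignment described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name, the prevalence figures, and the random seed are hypothetical, and the report does not document the exact procedure that was used.

```python
import random

def assign_ancestry(prevalence, rng):
    """Draw one race/ancestry category with probability proportional to
    its prevalence in the respondent's Census Block Group."""
    categories = list(prevalence)
    weights = [prevalence[c] for c in categories]
    return rng.choices(categories, weights=weights, k=1)[0]

# Made-up block-group prevalences, for illustration only
block_group = {
    "Mexican": 0.70,
    "Puerto Rican": 0.20,
    "Cuban": 0.10,
    "All Other Hispanic": 0.0,
}

rng = random.Random(2007)  # fixed seed so the draws are reproducible
draws = [assign_ancestry(block_group, rng) for _ in range(1000)]
```

A category with zero prevalence in the block group is never drawn, and more prevalent categories are drawn more often.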

IV. Sample Frame Geographic Domains

The second dimension of the CPES weight computation array was defined based on the geographic domain of the U.S. national sample frame with which individual area segments for the three component samples were associated (see Heeringa et al., 2004). The "domain" groupings were assigned at the area segment level. All respondents from the same segment, regardless of population, were assigned to the same domain. Table 2 defines the 11 domain categories that were used to classify area segments and thereby assign each CPES respondent to a geographic domain.

Table 2. Sample Frame Geographic Domains Required for CPES Weight Development
CPES Domain   Domain Definition
1             Census Block Group >5% Cuban Population
2             Census Block Group >5% Vietnamese Population
3             Census Block Group >5% Filipino Population
4             Census Block Group >5% Puerto Rican Population
5             Census Block Group >5% Chinese Population
6             Census Block Group >10% Afro-Caribbean (non-Hispanic) (Restricted to NY, NJ, FL, CT, MA, RI and DC)
7             Census Block Group 60-100% African-American
8             Census Block Group 30-59.9% African-American
9             Census Block Group 10-29.9% African-American
10            Census Block Group 0-9.9% African-American
11            Hawaii (NLAAS only)

All segment assignments to geographic domains were performed using Census 2000 data for Block Groups. Like the population assignments for mixed ancestry respondents, area segment domain assignments based on this 11-category classification were not always unique. For example, a Census Block Group might have contained a population that was >5% Vietnamese and also >5% Chinese. In cases where a Census Block Group qualified for more than one domain, the corresponding area segment was assigned to the lowest numbered category (e.g., the high density Vietnamese domain in this example).
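The tie-breaking rule amounts to taking the minimum domain number among those for which a Block Group qualifies. A minimal Python sketch follows; the function name is hypothetical.

```python
def assign_domain(qualifying_domains):
    """Return the lowest-numbered CPES domain (Table 2) among the domains
    for which a Census Block Group qualifies."""
    if not qualifying_domains:
        raise ValueError("a block group must qualify for at least one domain")
    return min(qualifying_domains)

# A block group that is >5% Vietnamese (domain 2) and >5% Chinese (domain 5)
example_domain = assign_domain({2, 5})
```

Here `example_domain` is 2, the high density Vietnamese domain.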

V. Integrated weight for the pooled CPES data set

Case-specific population weights had been developed for each CPES component survey (Kessler et al., 2004; Heeringa, 2004; Heeringa et al., 2005). Each project had carefully developed and refined its weight vector to enable robust probability sampling inference ("design-based") to its chosen survey population. NCS-R was unique among the three component studies in that it required two final analysis weights--one weight for the full sample of cases who participated in the Part 1 interview and a second for the subsample of cases that also completed Part 2 of the NCS-R. Consequently, the CPES combined data set also has two analysis weights--the first for analysis of common data items and the second for analysis of survey items that NCS-R only administered to Part 2 respondents.

The integrated weight development began with the existing final population weights for the NCS-R, NSAL, and NLAAS. The integrated weight development then proceeded according to the following steps:

Step 1. Each NLAAS, NSAL, and NCS-R case was assigned to a race/ancestry category based on the categories and priority order provided in Table 1 (see Section III).

Step 2. Each NLAAS, NSAL, and NCS-R area segment was assigned to a geographic domain based on the definitions and priority order shown in Table 2 (see Section IV). Each NLAAS, NSAL, and NCS-R respondent was assigned to a geographic domain based on its area segment classification.

Step 3. The final population weight values for the three data sets were obtained from the NLAAS, NSAL, and NCS-R investigators. Since the final NCS-R and NSAL weights had been "centered" or "normalized" (mean weight=1.0), they were restored to the original U.S. population scaling based on weighted totals from the March 2002 demographic supplement of the Current Population Survey (CPS).
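As a rough illustration of Step 3, the Python sketch below rescales a set of normalized weights so that they sum to a population total. It assumes a single overall total; the function name and all numbers are made up, and the actual restoration may have been carried out within demographic groups.

```python
def restore_population_scale(normalized_weights, population_total):
    """Multiply normalized weights (mean 1.0) by a constant so that they
    sum to the given population total."""
    factor = population_total / sum(normalized_weights)
    return [w * factor for w in normalized_weights]

normalized = [0.5, 1.0, 1.5, 1.0]   # mean weight = 1.0
restored = restore_population_scale(normalized, 200_000)
```

The rescaling changes the sum of the weights but leaves the ratio between any two weights unchanged.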

Step 4. Notation: each case in the CPES pooled data set was indexed as follows:

Table 3. Subscript notation for weight integration expressions
Index Subscript   Values     Representing
i                 1,...,n    Individual sample case subscript
j                 1,2,3      Study index: 1=NCS-R, 2=NSAL, 3=NLAAS
k                 1-11       Population index (Table 1), collapsing White, Other
l                 1-11       Domain index (Table 2)

Step 5. From the pooled data set, Excel spreadsheets were used to compute the sums of nominal cases for each study by race/ancestry population by geographic domain cell, $n_{+jkl}$. These counts were then aggregated across the three studies to produce CPES pooled case counts for each population x domain cell:

$$n_{++kl} = \sum_{j=1}^{3} n_{+jkl}$$
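The cell counts in Step 5 are simple tallies. A small Python sketch with made-up case records is shown below; the variable names are hypothetical.

```python
from collections import Counter

# (study j, population k, domain l) for each pooled case; made-up records
cases = [
    ("NCS-R", "Mexican", 10), ("NLAAS", "Mexican", 10), ("NLAAS", "Mexican", 10),
    ("NCS-R", "Cuban", 1), ("NSAL", "Cuban", 1), ("NLAAS", "Cuban", 1),
]

# Nominal cases per study x population x domain cell
n_jkl = Counter(cases)

# Pooled cases per population x domain cell, summed over the three studies
n_kl = Counter((k, l) for _, k, l in cases)
```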

Step 6. The March 2002 CPS data enabled estimation of post-stratification control totals for each race/ancestry group, k=1,...,11; however, it did not provide the geographic detail needed to allocate the population total to the l=1,...,11 geographic domains. For this purpose, the weighted population distribution from the CPES study with the most robust estimates of geographic distribution was used. NLAAS was chosen as the basis for allocating the Asian and Hispanic populations to the 11 sample geographic domains. NSAL weighted sample distributions were used to apportion the African-American and Afro-Caribbean populations to the geographic domains. White and Other population totals were allocated to geographic domains based on the empirical distribution of weights in the NCS-R.

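$$W_{kl} = W_{k}^{CPS} \times \frac{\sum_{i} w_{ij^{*}kl}}{\sum_{l=1}^{11} \sum_{i} w_{ij^{*}kl}}$$

where:

$W_{kl}$ = the CPES control total for race/ethnicity population k and domain l,

$w_{ijkl}$ = the original study-specific weight for case i, study j,

$j^{*}$ = the CPES study chosen to estimate the domain allocation for population k, and

$W_{k}^{CPS}$ = the March 2002 CPS population estimate for race/ethnicity category k.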

Table 4 provides the final population controls for the race/ethnicity x domains cells of the CPES weight computation array.
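The Step 6 allocation can be sketched in Python as a proportional split of one CPS population total across domains. The function name and the weighted sums below are made up for illustration and are not CPES values.

```python
def allocate_control_totals(weight_sum_by_domain, cps_total):
    """Split a CPS population total across geographic domains in proportion
    to the reference study's weighted distribution over those domains."""
    grand_total = sum(weight_sum_by_domain.values())
    return {
        domain: cps_total * weight_sum / grand_total
        for domain, weight_sum in weight_sum_by_domain.items()
    }

# Made-up weighted sums by domain for one population in its reference study
reference_sums = {1: 600.0, 10: 300.0, 11: 100.0}
controls = allocate_control_totals(reference_sums, 50_000)
```

The domain control totals for a population always add back to the CPS total for that population.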

Step 7. The original population weights from each study were post-stratified to the common race/ethnicity x domain population control totals derived from the March 2002 CPS (see Step 6 and Table 4).

$$w^{*}_{ijkl} = w_{ijkl} \times \frac{W_{kl}}{\sum_{i} w_{ijkl}}$$

where the sum is taken over the cases of study j in population k and domain l, and:

$w^{*}_{ijkl}$ = the study-specific weight adjusted to 2002 CPS population totals,

$w_{ijkl}$ = the original study-specific population weight for case i, study j,

$W_{kl}$ = the 2002 CPS estimate for race/ethnicity population k allocated to domain l.

Since the original study-specific weights for the major populations of interest had already included some form of population-based control, this rescaling to a common post-stratification standard did not require major adjustments.

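Step 8. Since in Step 7 the individual study-specific weights were controlled to exact counts for each race/ancestry x geographic domain cell, the remaining step involved rescaling the study-specific weights to reflect the proportion of nominal cases that each study contributed to the cell in the pooled data set.

$$w^{CPES}_{i} = w^{*}_{ijkl} \times \frac{n_{+jkl}}{n_{++kl}}$$

where:

$w^{CPES}_{i}$ = the CPES population weight for case i;

$w^{*}_{ijkl}$ = the standardized population weight for case i, study j (assigned to population k and domain l);

$n_{+jkl}$, $n_{++kl}$ = the nominal case counts for study j and for the pooled data set, respectively, in the population k x domain l cell.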

Conditional on the assigned population (k) and domain classification (l), this rescaling provided a "proportionate to sample size" contribution from each study. It linearly rescaled the weights for each individual study. It did not alter the distribution of the study-specific population weights except to reduce the study-specific mean by a factor of $n_{+jkl}/n_{++kl}$ and the variance of the study weights (not relvariance) by a factor of $(n_{+jkl}/n_{++kl})^{2}$.
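Steps 7 and 8 can be sketched together for a single population x domain cell. The Python below is an illustration under stated assumptions: the function name is hypothetical and the weights and control total are made up.

```python
def integrate_cell(weights_by_study, control_total):
    """Apply Steps 7 and 8 within one population x domain cell.

    weights_by_study maps a study name to the original weights of its
    cases in the cell; control_total is the cell's control total."""
    n_pooled = sum(len(ws) for ws in weights_by_study.values())
    integrated = {}
    for study, ws in weights_by_study.items():
        poststrat_factor = control_total / sum(ws)   # Step 7
        case_share = len(ws) / n_pooled              # Step 8
        integrated[study] = [w * poststrat_factor * case_share for w in ws]
    return integrated

# Made-up cell with 2 NCS-R cases and 6 NLAAS cases
cell = {
    "NCS-R": [900.0, 1100.0],
    "NLAAS": [150.0, 250.0, 200.0, 200.0, 100.0, 300.0],
}
result = integrate_cell(cell, 12_000)
```

In this example the NCS-R cases carry 2/8 of the control total and the NLAAS cases carry 6/8, so the pooled weights again sum to the cell control total.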

Table 4: Standardized Population Control Totals for CPES Weights Based on March 2002 Current Population Survey (Part 1 of 2)
                                     Sample Frame Geographic Domain
Race/Ancestry          CUBAN       VIET      FILIP  PUERTO RICAN    CHINESE  AFRO-CARIB
Population Group         >5%        >5%        >5%           >5%        >5%  (see text)
VIETNAMESE              2480     383349      38422         41222      55101           0
FILIPINO                9950      94392     289714         10579      25166           0
CHINESE                48732      54578     240247        132608     541553           0
OTHER ASIAN            35654      19422     221116          6934     247125           0
CUBAN                 103587       3643       5000         10334       5000        6447
PUERTO RICAN           48478      19246      39505       1009335      19457       78449
MEXICAN                 3750     287255      70701        509666      69584           0
OTHER HISPANIC        202542     110127      24934       1109915     164478      111795
AFRO-CARIBBEAN          3687       2500       1250        126246          0      577382
AFRIC-AMERICAN        102567      68475     242531       1571404      27500       62549
WHITE AND OTHER      1086796     739518    1210972       2026304    2191741        1250
Total                1648223    1782505    2384392       6554547    3346705      837872

Table 4: Standardized Population Control Totals for CPES Weights Based on March 2002 Current Population Survey (Part 2 of 2)
                                     Sample Frame Geographic Domain
Race/Ancestry      AFRI-AMER  AFRI-AMER  AFRI-AMER  AFRI-AMER
Population Group     60-100%   30-59.9%   10-29.9%     0-9.9%     HAWAII      Total
VIETNAMESE             24555       4348     179945     437448       3403    1170273
FILIPINO               11442      27032     389766     674184     421617    1953842
CHINESE                10720      50153     255197    1039660     223468    2596916
OTHER ASIAN            44172      53381     592307    1749290     360977    3330378
CUBAN                  16347      11412     149585     805628          0    1116983
PUERTO RICAN           68727       3229     374734     536439      37466    2235065
MEXICAN               282346      67944    2942722   11529509          0   15763477
OTHER HISPANIC        165520     203503     759917    2592100       2945    5447776
AFRO-CARIBBEAN        288119     220638     198343      12119          0    1430284
AFRIC-AMERICAN      10596921    5110387    3545866     721516          0   22049716
WHITE AND OTHER      1943090    6258727   18867372  118404822          0  152730592
Total               13451959   12010754   28255754  138502715    1049876  209825302

VI. Final Weights and Special Considerations for Weighted Analysis

VI.A Final Weights

As noted in Section V above, n=9282 NCS-R survey respondents completed Part 1 of the two-part CIDI-based interview; however, only a subsample of n=5692 NCS-R respondents went on to complete the more in-depth Part 2 questionnaire modules. All n=6082 NSAL and n=4649 NLAAS respondents completed the full interview schedule--the equivalent of NCS-R Parts 1 and 2. To account for the split schedule of NCS-R questionnaire administration, two CPES pooled analysis weights were computed using the calculation sequence described in steps (5) through (8) in Section V above. The first weight, termed the "Part 1" weight, is labeled CPESWTSH in the merged CPES data set. It is the population weight that should be used for analysis involving variables that are included in Part 1 of the NCS-R. The second weight, termed the "Part 2" weight, is stored in the merged data set as the variable, CPESWTLG. It is the population weight that should be used when the CPES analysis includes variables that NCS-R only asked of the Part 2 subsample of respondents.

Table 5 provides a summary of selected distributional statistics for the final CPES Part 1 and Part 2 analysis weights.

Table 5: Distributions of CPES Final Part 1 and Part 2 Analysis Weights
                                     CPES Final Analysis Weight
Race/Ethnicity                  CPESWTSH                            CPESWTLG
Population Group        n    Mean  Minimum  Maximum         n    Mean  Minimum  Maximum
VIETNAMESE            527    2221      760    11898       526    2225      281    11898
FILIPINO              525    3722      726    15307       520    3757      726    15307
CHINESE               619    4195      754    15820       613    4236      934    15820
OTHER ASIAN           613    5433      587    20950       519    6417      732    27255
CUBAN                 625    1787      267    20949       610    6416      731    27225
PUERTO RICAN          654    3417      128    10846       620    3604       75    16372
MEXICAN              1442   10931      781    49382      1214   12984      373    89022
OTHER HISPANIC        899    6060      533    44888       820    6644      591    40582
AFRO-CARIBBEAN       1492     959      111    20796      1476     969      162    21040
AFRIC-AMERICAN       4746    4646      728    18594      4249    5189      978    36257
WHITE AND OTHER      7871   19362     1250   131331      5256   28800     1250   195000
Total Sample        20013   10468      111   131331     16423   12693       75   195000

VI.B Special Considerations in Weighted Analysis of the CPES Data

The weights have been designed to enable analysts to compute unbiased or nearly unbiased estimates of population statistics and relationships (e.g. bivariate associations, regression relationships) for the larger CPES survey population of U.S. residents. Contemporary statistical software systems such as SAS, Stata, SPSS, and SUDAAN all provide the capability to conduct weighted analysis of the CPES survey data. CPES data analysts are encouraged to consult the user guides and help support for their chosen software package to learn the syntax and program specific features for conducting weighted analysis. The following paragraphs provide guidance on weighted analysis that is specific to the CPES data set.

VI.B.1 Part 1 or Part 2 Weight?

CPES analysts should consult the data documentation to determine if variables of interest in their analysis were obtained in Part 1 or Part 2 of the NCS-R interview. If the analysis includes only Part 1 variables, the CPESWTSH analysis weight should be used. It will include the full sample of NCS-R cases and provide the greatest precision for sample estimates of population characteristics or relationships. If the analysis includes one or more variables that NCS-R collected only in Part 2, the appropriate weight for population estimation is CPESWTLG.
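The weight selection rule can be written as a one-line check. In the Python sketch below, only the weight names CPESWTSH and CPESWTLG come from the data set; the function name and the analysis variable names are made up for illustration, and the actual list of Part 2 variables must be taken from the data documentation.

```python
PART1_WEIGHT = "CPESWTSH"
PART2_WEIGHT = "CPESWTLG"

def choose_weight(analysis_vars, part2_only_vars):
    """Return the Part 2 weight if any analysis variable was collected
    only in Part 2 of the NCS-R interview, else the Part 1 weight."""
    if set(analysis_vars) & set(part2_only_vars):
        return PART2_WEIGHT
    return PART1_WEIGHT

# Hypothetical variable names, not actual CPES variable names
part2_only = {"example_part2_item"}
weight_a = choose_weight(["age", "sex"], part2_only)
weight_b = choose_weight(["age", "example_part2_item"], part2_only)
```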

In the calculation of the Part 1 and Part 2 weights, the absolute contributions from NSAL and NLAAS to the pooled weight calculation remained unchanged--only NCS-R required changes to the nominal case counts and initial rescaling steps. However, due to the reduced NCS-R sample size for the Part 2 variables, the relative contributions of the NSAL and NLAAS to any given race/ethnicity x sample domain weighting cell did change. Therefore the final CPES Part 1 and Part 2 analysis weights (Step 8 above) differ for NSAL and NLAAS cases just as they do for the NCS-R cases.

VI.B.2 Subsetting the CPES data by study

Occasionally, analysts may choose to extract CPES data for only one or two of the three component data sets. The CPES analysis weights will support this type of analysis; however, analysts should recognize that the weights for this special CPES subset may not sum to the population control total for that population. For example, consider an analysis which only used Afro-Caribbean data from the NLAAS and NSAL. Since a small number of Afro-Caribbeans interviewed in the NCS-R would be excluded from this analysis, the sum of weights for the combined NSAL and NLAAS cases would no longer match the CPES population control total for the Afro-Caribbean race/ethnicity population. A principle of weighted analysis of data is that population estimates and sampling errors (except for estimates of totals) should be invariant to any linear scaling of the weights (multiplication or division by a constant). Under the procedures used to compute the CPES Part 1 and Part 2 weights, this assumption of linear scaling applies when the data for one or two studies are used independently or are compared.
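The invariance principle is easy to verify numerically. The Python sketch below, using made-up values and a hypothetical function name, shows that a weighted mean is unchanged when every weight is divided by the same constant.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of weight x value divided by the sum of weights."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)

y = [1, 0, 1, 1]                          # made-up indicator variable
w = [1200.0, 800.0, 500.0, 1500.0]        # made-up population weights
w_scaled = [wi / 1000.0 for wi in w]      # same weights on a different scale

mean_original = weighted_mean(y, w)
mean_scaled = weighted_mean(y, w_scaled)
```

Both calls return the same estimate, whereas an estimated total (the sum of weight x value) would change with the scale of the weights.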

VI.B.3 Subsetting the CPES data based on characteristics of respondents

In general, CPES analysts can apply the analysis weights for subpopulation analysis (e.g., estimation for women of Mexican-American ancestry). Provided all qualifying cases in the CPES data are included in the subpopulation analysis, the estimates would be unbiased and the sum of the CPES weights would be an unbiased estimator of the 2002 population count for that subset of the larger U.S. population. Experience has shown that due to the sheer numbers of observations and richness of the variable set, data sets such as the CPES generate interest in rare populations or populations for which the original samples were not optimal (e.g. women of Mexican-American ancestry living in the West Census Region and covered by a regional health maintenance organization (HMO) program). CPES analysts who have concerns about the appropriateness of the CPES for subpopulation analysis they are proposing to conduct are encouraged to consult a survey statistician.

VI.B.4 Item Missing Data for Analysis Variables

The original NCS-R, NSAL, and NLAAS analysis weights included adjustments for survey nonresponse. Through the process used to create the combined analysis weights, these adjustments for differential nonresponse are preserved in the CPES Part 1 and Part 2 weights. However, the CPES weights do not include adjustments for item missing data in the CPES data set. With a few special exceptions, most statistical software packages employ "case-wise" deletion as the means to address the problem of missing values for the variables. That is, any case with a missing value on one or more analysis variables (e.g., in fitting a multivariate logistic regression model) is dropped from the analysis. If the amount of such case-wise deletion is substantial, the unbiasedness of the weighted estimation may be compromised. Analysts are encouraged to use standard data checking techniques to establish the patterns of missing data in their analysis variables and assess the extent and impact of software-driven case-wise deletion on the integrity of their analysis. If the variables of interest have high rates of item missing data, analysts may wish to consult a survey statistician about remediation approaches such as stochastic imputation (Little and Rubin, 2002).
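One simple check of the extent of case-wise deletion is to count complete cases and the share of the weighted population they represent. The Python sketch below assumes records held as dictionaries with missing values coded as None; the function name, variable names, and data are made up for illustration.

```python
def casewise_deletion_summary(records, analysis_vars, weight_var):
    """Count cases that survive case-wise deletion and the weighted share
    of the sample that would be dropped."""
    complete = [
        r for r in records
        if all(r.get(v) is not None for v in analysis_vars)
    ]
    total_weight = sum(r[weight_var] for r in records)
    kept_weight = sum(r[weight_var] for r in complete)
    return {
        "n_total": len(records),
        "n_complete": len(complete),
        "weighted_share_dropped": 1 - kept_weight / total_weight,
    }

# Made-up records; "x1" and "x2" are hypothetical analysis variables
records = [
    {"x1": 1, "x2": 0, "CPESWTLG": 1000.0},
    {"x1": None, "x2": 1, "CPESWTLG": 2000.0},
    {"x1": 0, "x2": 1, "CPESWTLG": 1000.0},
]
summary = casewise_deletion_summary(records, ["x1", "x2"], "CPESWTLG")
```

In this example one of three cases is dropped, but it represents half of the weighted total, which would warrant a closer look at the missing data pattern.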

VII. Procedures for Sampling Error Estimation in Design-based Analysis of the CPES Data.

The CPES data set is the product of the merger of three probability samples of the U.S. population and therefore shares the primary stage sample stratification and clustering features of the component sample designs. The NCS-R, NSAL and NLAAS sample designs were very similar in their basic structure to the multi-stage designs used for major survey programs such as the U.S. Health Interview Survey (HIS), the National Survey of Family Growth (NSFG) or other national scientific surveys. The survey literature refers to these samples as complex designs, a loosely-used term meant to denote the fact that the sample incorporates special design features such as stratification, clustering and differential selection probabilities (i.e., weighting) that analysts must consider in computing sampling errors for sample estimates of descriptive statistics and model parameters. Standard programs in statistical analysis software packages assume simple random sampling (SRS) or independence of observations in computing standard errors for sample estimates. In general, the SRS assumption results in underestimation of variances of survey estimates of descriptive statistics and model parameters. Confidence intervals based on computed variances that assume independence of observations will be biased (generally too narrow) and design-based inferences will be affected accordingly. Likewise, test statistics (t, χ², F) computed in complex survey data analysis using standard programs will tend to be biased upward and overstate the significance of tests of effects.

This section focuses on sampling error estimation and construction of confidence intervals for survey estimates of descriptive statistics such as means, proportions, ratios, and coefficients for linear and logistic regression models.

VII.A Sampling Error Computation Methods and Programs

Over the past 50 years, advances in survey sampling theory have guided the development of a number of methods for correctly estimating variances from complex sample data sets. Sampling error programs that implement these complex sample variance estimation methods are available to CPES data analysts. The two most common approaches (Rust, 1985) to the estimation of sampling error for complex sample data are through the use of a Taylor Series linearization of the estimator (and corresponding approximation to its variance) or through the use of resampling variance estimation procedures such as Balanced Repeated Replication (BRR) or Jackknife Repeated Replication (JRR).

VII.B Taylor Series linearization method:

When survey data are collected using a complex sample design with unequal size clusters, most statistics of interest will not be simple linear functions of the observed data. The linearization approach applies Taylor's method to derive an approximate form of the estimator that is linear in statistics for which variances and covariances can be directly and easily estimated. Stata Release 8 and 9, SAS V8.2/V9.0, SUDAAN Version 9, and the most recent releases of SPSS are commercially available statistical software packages that include procedures that apply the Taylor Series method to sampling error estimation and inference for complex sample data.
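To make the linearization idea concrete, the Python sketch below estimates the variance of a weighted mean for a stratified, clustered sample. It is a textbook illustration under stated assumptions (with-replacement selection of clusters within strata, at least two clusters per stratum) and is not the code used by any of the packages named above; the function name and data are made up.

```python
from collections import defaultdict

def linearized_variance_of_mean(y, w, stratum, cluster):
    """Taylor Series (linearization) variance estimate for a weighted mean,
    using between-cluster variation within strata."""
    total_w = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Linearized value of each case, summed to cluster totals
    cluster_totals = defaultdict(float)
    for yi, wi, h, c in zip(y, w, stratum, cluster):
        cluster_totals[(h, c)] += wi * (yi - ybar) / total_w

    by_stratum = defaultdict(list)
    for (h, _), z in cluster_totals.items():
        by_stratum[h].append(z)

    variance = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)                      # clusters in the stratum (>= 2)
        zbar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in zs)
    return ybar, variance

# Made-up data: 2 strata, 2 clusters per stratum, equal weights
y = [1, 1, 0, 0, 1, 0, 1, 0]
w = [1.0] * 8
stratum = [1, 1, 1, 1, 2, 2, 2, 2]
cluster = [1, 1, 2, 2, 1, 1, 2, 2]
estimate, variance = linearized_variance_of_mean(y, w, stratum, cluster)
```

All of the estimated variance in this example comes from stratum 1, where the two clusters differ; the two clusters in stratum 2 have identical weighted totals and contribute nothing.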

Stata (StataCorp, 2005) is a more recent commercial entry to the available software for analysis of complex sample survey data and has a growing body of research users. Stata includes special versions of its standard analysis routines that are designed for the analysis of complex sample survey data. Special survey analysis programs are available for descriptive estimation of means (SVY MEAN), ratios (SVY RATIO), proportions (SVY TAB), and population totals (SVY TOTAL). Stata programs for multivariate analysis of survey data include linear regression (SVY REGRESS), logistic regression (SVY LOGIT) and probit regression (SVY PROBIT). Stata program offerings for survey data analysts are constantly being expanded. Information on the Stata analysis software system can be found on the Web at: http://www.stata.com.

Programs in SAS Version 9 (SAS, 2003; http://www.sas.com/) also use the Taylor Series method to estimate variances of means (PROC Surveymeans), proportions and cross-tabular analysis (PROC SurveyFreq), linear regression (PROC SurveyReg), and logistic regression (PROC SurveyLogistic).

SUDAAN (RTI, 2004) is a commercially available software system developed and marketed by the Research Triangle Institute of Research Triangle Park, North Carolina (USA). SUDAAN was developed as a stand-alone software system with capabilities for the more important methods for descriptive and multivariate analysis of survey data, including: estimation and inference for means, proportions, and rates (PROC DESCRIPT and PROC RATIO); contingency table analysis (PROC CROSSTAB); linear regression (PROC REGRESS); logistic regression (PROC LOGISTIC); log-linear models (PROC CATAN); and survival analysis (PROC SURVIVAL). SUDAAN V9.0 and earlier versions were designed to read directly from ASCII and SAS system data sets. The latest versions of SUDAAN permit procedures to be called directly from the SAS system. Information on SUDAAN is available at the following Web site address: http://www.rti.org/.

SPSS Version 14.0 (http://www.spss.com/) users can obtain the SPSS Complex Samples module which supports Taylor Series linearization estimation of sampling errors for descriptive statistics (CSDESCRIPTIVES), cross-tabulated data (CSTABULATE), general linear models (CSGLM), and logistic regression (CSLOGISTIC).

VII.C Resampling methods:

BRR, JRR, and the bootstrap comprise a second class of nonparametric methods for conducting estimation and inference from complex sample data. As suggested by the generic label for this class of methods, BRR, JRR, and the bootstrap utilize replicated subsampling of the sample database to develop sampling variance estimates for linear and nonlinear statistics. WesVar PC (Westat, Inc., 2000) is a software system for personal computers that employs replicated variance estimation methods to conduct the more common types of statistical analysis of complex sample survey data. WesVar PC was developed by Westat, Inc. and is distributed along with documentation to researchers at Westat's Web site: http://www.westat.com/wesvarpc/ . WesVar PC includes a Windows-based application generator that enables the analyst to select the form of data input (SAS data file, SPSS for Windows data base, ASCII data set) and the computation method (BRR or JRR methods). Analysis programs contained in WesVar PC provide the capability for basic descriptive (means, proportions, totals, cross tabulations) and regression (linear, logistic) analysis of complex sample survey data. WesVar also provides the best facility for estimating quantiles of continuous variables (e.g., 95%-tile of a cognitive test score) from survey data. WesVar Complex Samples 4.0 is the latest version of WesVar. Researchers who wish to analyze the CPES data using WesVar PC should choose the BRR or JRR (JK2) replication option.

STATA V9 has introduced the option to use JRR or BRR calculation methods as an alternative to the Taylor Series method for all of its svy command options. SUDAAN V9.0 also allows the analysts to select the JRR method for computing sampling variances of survey estimates.

IVEWare is another software option for the JRR estimation of sampling errors for survey statistics. IVEWare has been developed by the Survey Methodology Program of the Survey Research Center and is available free of charge to users at: http://www.isr.umich.edu/src/smp/ive/ . IVEWare is based on SAS Macros and requires SAS Version 6.12 or higher. The system includes programs for multiple imputation of item missing data as well as programs for variance estimation in descriptive (means, proportions) and multivariate (regression, logistic regression, survival analysis) analysis of complex sample survey data.

These new and updated software packages include an expanded set of user-friendly, well-documented analysis procedures. Difficulties with sample design specification, data preparation, and data input in the earlier generations of survey analysis software created a barrier to use by analysts who were not survey design specialists. The new software enables the user to input data and output results in a variety of common formats, and the latest versions accommodate direct input of data files from the major analysis software systems.

VII.D Sampling Error Computation Models

Regardless of whether the linearization method or a resampling approach is used, estimation of variances for complex sample survey estimates requires the specification of a sampling error computation model. CPES data analysts who are interested in performing sampling error computations should be aware that the estimation programs identified in the preceding section assume a specific sampling error computation model and will require special sampling error codes. Individual records in the analysis data set must be assigned sampling error codes that identify to the programs the complex structure of the sample (stratification, clustering) and are compatible with the computation algorithms of the various programs. To facilitate the computation of sampling error for statistics based on CPES data, design-specific sampling error codes will be routinely included in all versions of the data set. Although minor recoding may be required to conform to the input requirements of the individual programs, the sampling error codes that are provided should enable analysts to conduct either Taylor Series or Replicated estimation of sampling errors for survey statistics. In programs that use the Taylor Series Linearization method, the sampling error codes (SESTRAT and SECLUSTR) will typically be input as keyword statements (SAS V9.1, SUDAAN V9.0) or as global settings (Stata V9) along with the analysis weight and will be used directly in the computational algorithms. Programs that permit BRR or JRR computations will require the user supplied sampling error codes to construct "replicate weights" that are required for these approaches to variance estimation.

Two sampling error code variables are defined for each case based on the sample design stratum and primary stage unit (PSU) cluster in which the sample respondent resided: Sampling Error Stratum Code (SESTRAT) and Sampling Error Cluster Code (SECLUSTR). The CPES SESTRAT codes were derived directly from a concatenation of the existing sampling error stratum codes for the NCS-R, NSAL and NLAAS sample designs. A total of 180 sampling error strata were defined. These were allocated to the individual contributing samples according to the coding scheme shown in Table 6.

Table 6. CPES Sampling Error Strata

CPES Component Sample    CPES Sampling Error Strata
NCS-R                    1-42
NSAL                     43-111
NLAAS                    112-180

All original sampling error strata definitions for the NCS-R and NLAAS were preserved unchanged in the mapping to the CPES sampling error stratum code. In general, the assignment of NSAL cases to CPES sampling error strata also followed the original NSAL coding. The single exception involved an NSAL sampling error stratum that included multiple clusters. This stratum was divided into several pseudo-strata, each with a pair of combined clusters. This minor change enables CPES analysts to use any of the sampling error calculation methods (Taylor, BRR or JRR) without having to perform additional recoding of the sampling error variables.

Likewise, with one exception, the values of SECLUSTR for CPES sampling error strata are identical to those in the original NCS-R, NSAL and NLAAS data sets. The exception was the cluster numbering for the one NSAL sampling error stratum with multiple clusters. Clusters in this stratum were randomly grouped into pairs and assigned to pseudo-strata as described in the preceding paragraph. The result is that the CPES SECLUSTR code takes a value of either 1 or 2 and exactly two sampling error clusters are assigned to each sampling error stratum.
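The random grouping of clusters into pairs can be sketched in Python as follows. This is only one reading of the procedure described above: it assumes an even number of clusters, treats each random pair as one pseudo-stratum, and numbers the two members 1 and 2. The function name, the starting stratum code and the seed are hypothetical and do not reproduce the assignment actually used for the NSAL stratum.

```python
import random


def pair_clusters(cluster_ids, first_stratum_code, seed=0):
    """Randomly group clusters into pairs; each pair becomes a pseudo-stratum.

    Returns {cluster_id: (sestrat, seclustr)} with seclustr equal to 1 or 2.
    """
    if len(cluster_ids) % 2 != 0:
        raise ValueError("an even number of clusters is required")
    shuffled = list(cluster_ids)
    random.Random(seed).shuffle(shuffled)  # reproducible random order
    codes = {}
    for pos, cid in enumerate(shuffled):
        # Consecutive positions (0,1), (2,3), ... form the pairs.
        codes[cid] = (first_stratum_code + pos // 2, pos % 2 + 1)
    return codes


# Hypothetical example: six clusters become three pseudo-strata coded 50-52.
codes = pair_clusters(["a", "b", "c", "d", "e", "f"], first_stratum_code=50)
```

Whatever the random order, every pseudo-stratum produced this way contains exactly one cluster coded 1 and one coded 2, which is the structure that paired BRR and JRR computations expect.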

VII.E Syntax for CPES Design-based Variance Estimation Using STATA and SAS

The following two sections provide a short overview of the general syntax and command file structure for computing sampling errors using STATA and SAS programs that have been designed for the analysis of complex sample survey data. Analysts are referred to the user guides and the on-line help facilities of these two software systems for documentation of the individual programs.

VII.E.1 Stata command syntax

As described above, CPES data analysts who are familiar with the STATA software system can utilize STATA's "svy" commands for the analysis of complex sample survey data. STATA Version 9 syntax for some of the more commonly used analysis programs is illustrated below (shown for the Part 2 weight option):

.svyset seclustr [pweight=cpeswtlg], strata(sestrat)

This statement defines the sample design variables for the duration of the analysis session. SVY commands issued after this statement will automatically incorporate these design specifications.

To conduct analyses, the following STATA commands and syntax are used (please refer to STATA V9 Reference Manual for specific command syntax and output options):

.svy, vce(linearized): mean vars

[estimates, standard errors, design effects for means]

.svy, vce(linearized): tab v1 v2

[estimates, standard errors for proportions of single variable categories, or crosstabulations of two variables with tests of independence]

.svy, vce(linearized): regress dep x1 ...

[simple linear regression model for a continuous dependent variable]

.svy, vce(linearized): logit dep x1...

[simple logistic regression model for a binary dependent variable]

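To estimate single statistics for subpopulations of the survey population in STATA, the over() option is used with the descriptive estimation commands (illustrated for mean):

.svy, vce(linearized): mean vars, over(var)

where var is a categorical variable that defines the subpopulations for which separate estimates are desired (e.g. gender). The tab command and the regression commands do not accept over(); for these, the subpop() option of the svy prefix restricts estimation to the subpopulation identified by a 0/1 indicator variable ind:

.svy, vce(linearized) subpop(ind): tab v1 v2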

VII.E.2 SAS Version 9 Command Syntax

SAS Version 9 includes four programs for the analysis of complex sample survey data: PROCS Surveymeans, SurveyFreq, SurveyReg and SurveyLogistic. The general syntax for specifying the CPES design structure in the SAS system is as follows:

PROC SurveyXXXX data=libname.filename;
STRATUM SESTRAT;
CLUSTER SECLUSTR;
WEIGHT CPESWTLG;
program specific statements here;
RUN;

Users are referred to the SAS/STAT(R) 9.1 User's Guide (SAS, 2004) for documentation on program specific statements, keywords and options.

VIII. Weights for Study pairs

Final weights were also developed for analyzing pairs of the CPES studies (NCS-R and NSAL, NCS-R and NLAAS, and NSAL and NLAAS) and for Part I and Part II sub-samples (only for pairs that include NCS-R respondents). This will allow for generating population estimates by analyzing data from study pairs only. Table 7 below summarizes the paired weights and key descriptive statistics for the weight distributions.

Table 7: Descriptive Statistics for Paired Study Weights

Study Pair        Sample   Variable Name   Sample Size   Mean Weight   Standard Deviation of Weights   Sum of Weights
NCS-R and NSAL    Short    NCNSWTSH        15364         13588.6116    13251.1943                      208775428
NCS-R and NSAL    Long     NCNSWTLG        11774         17731.9026    26960.0142                      208775421
NCS-R and NLAAS   Short    NCNLWTSH        13931         15061.7548    11814.4696                      209825306
NCS-R and NLAAS   Long     NCNLWTLG        10341         20290.6198    26553.902                       209825299
NSAL and NLAAS    ---a     NSNLWT          10731         19553.1922    63077.1289                      209825306

a. Short and long sub-samples do not apply, as the NCS-R sample is not included.

Weights for analysis of CPES study pairs are based on the final CPES 3-study weights and were developed according to the following steps:

Step 1. Each NLAAS, NSAL and NCS-R case was assigned to a race/ancestry category based on the categories and priority order provided in Table 1 (see Section III).

Step 2. Each NLAAS, NSAL and NCS-R area segment was assigned to a geographic domain based on the definitions and priority order shown in Table 2 (see Section IV). Each NLAAS, NSAL and NCS-R respondent was assigned to a geographic domain based on its area segment classification.

Step 3. For each pair of studies, race x domain cell counts were obtained. Because certain race groups were not over-sampled in specific pairs (Asians in NCS-R and NSAL), some of the CPES race x domain cells had no cases or only a small number of cases. Such small cell counts could affect the robustness of the post-stratification adjustments and are usually dealt with by collapsing cells. Collapsing was mainly done over similar race groups (e.g., Vietnamese, Filipino, Chinese and other Asian groups) within a domain. If the cell count was still small (mainly <10) after race group collapsing, similar domains were then collapsed (examples of collapsed domains include Census Block Group > 5% Cuban and Census Block Group > 5% Puerto Rican). The same collapsed groups were used for the long and the short sub-samples (whenever applicable). Collapsed groups for each pair of studies are shown in Tables 8-12.
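The first stage of the collapsing in Step 3 (merging similar race groups within a domain) can be sketched in Python. This is a simplified, hypothetical illustration: the grouping of races into families, the threshold of 10 and the rule that one small cell causes the whole family to be merged within that domain are assumptions made for the example. The second stage, collapsing similar domains when a merged cell is still small, is not shown.

```python
def collapse_race_cells(cell_counts, race_family, min_count=10):
    """Map each (race, domain) cell to a collapsed cell label.

    cell_counts: {(race, domain): unweighted count}
    race_family: {race: broader group used when collapsing}
    """
    # Identify (family, domain) combinations containing at least one small cell.
    needs_collapse = set()
    for (race, domain), n in cell_counts.items():
        if n < min_count:
            needs_collapse.add((race_family[race], domain))

    mapping = {}
    for (race, domain) in cell_counts:
        family = race_family[race]
        if (family, domain) in needs_collapse:
            mapping[(race, domain)] = (family, domain)  # merged with its family
        else:
            mapping[(race, domain)] = (race, domain)    # kept as its own cell
    return mapping


# Hypothetical counts: the Vietnamese cell in domain D1 is too small.
counts = {
    ("Vietnamese", "D1"): 3,
    ("Chinese", "D1"): 20,
    ("Mexican", "D1"): 50,
    ("Vietnamese", "D2"): 30,
    ("Chinese", "D2"): 40,
}
family = {"Vietnamese": "Asian", "Chinese": "Asian", "Mexican": "Mexican"}
collapsed = collapse_race_cells(counts, family)
```

Here the two Asian cells in D1 are merged into a single cell, while all cells in D2 are left as they are.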

Step 4. CPS 2002 totals were calculated for each collapsed group. Weighted counts using the final CPES weight (the short-form weight for the short sub-sample and the long-form weight for the long sub-sample) were also generated for each collapsed group.

Step 5. A post-stratification adjustment factor (CPS 2002 total divided by the weighted count using the final CPES weight) was calculated and applied to the final CPES weight to generate the paired weights. All respondents belonging to the same collapsed race x domain cell received the same factor.
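Steps 4 and 5 amount to a standard post-stratification adjustment, which can be sketched in Python as follows. The record layout, the cell labels and the control totals are hypothetical toy values, not CPES or CPS figures.

```python
def poststratify(records, control_totals):
    """Apply a post-stratification factor within each collapsed cell.

    records: list of dicts with keys 'cell' and 'weight' (final CPES weight).
    control_totals: {cell: population control total, e.g. from CPS 2002}.
    Returns (factors by cell, list of adjusted weights in record order).
    """
    # Step 4: weighted count per cell using the input weight.
    weighted = {}
    for r in records:
        weighted[r["cell"]] = weighted.get(r["cell"], 0.0) + r["weight"]

    # Step 5: factor = control total / weighted count, applied to every case in the cell.
    factors = {cell: control_totals[cell] / total for cell, total in weighted.items()}
    adjusted = [r["weight"] * factors[r["cell"]] for r in records]
    return factors, adjusted


# Hypothetical toy data: cell A is under-represented, cell B already matches.
toy_records = [
    {"cell": "A", "weight": 100.0},
    {"cell": "A", "weight": 300.0},
    {"cell": "B", "weight": 250.0},
]
toy_controls = {"A": 800.0, "B": 250.0}
factors, paired_weights = poststratify(toy_records, toy_controls)
```

After the adjustment, the paired weights in each cell sum to that cell's control total, so a factor close to 1 indicates that the input weights already reproduced the control total for the cell.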

Table 8: NCS-R and NSAL collapsed cells for the short sample, post-stratification adjustment factor and mean weights
Collapsed groups for NCSR & NSAL short   CPS 2002   Un-weighted Count   Count using CPES weights   Post-stratification adjustment factor   Mean weight (ncnswtsh)
Asians over all domains80419441899593918.3823425495076.142857
Cuban, PR and other Hisp over all domains875941349220576624.2569736924182.239837
Mexicans over all domains1576347757460519092.60471150510543.39547
Africocarib and AA in Cuban blocks106254351062531.0000094113035.8
Whites in Cuban blocks1086796266118321.77629806923532
Afrocarib and AA in Vit blocks7097530709760.9999859112365.866667
Whites in Vit blocks739518376915101.06942488218689.45946
Afrocarib and AA in Flip blocks243781552437801.0000041024432.363636
Whites in Flip blocks1210972659448091.28171090714535.52308
Afrocarib in PR blocks126246571262451.0000079212214.824561
AA in PR blocks157140423015714120.9999949096832.226087
Whites in PR blocks202630413020144611.00587899215495.85385
Afrocarib and AA in Chinese blocks27500222750011250
Whites in Chinese blocks219174112326667270.8218842821680.70732
Afrocarib in Afrocarib blocks5773829335773860.999993072618.8488746
AA in Afrocarib blocks62549126254915212.416667
Whites in Afrocarib and AA 60-100194434026719443391.0000005147282.168539
Afrocarib in AA 60-1002881192562881161.0000104121125.453125
AA in AA 60-100105969212343105969051.000001514522.793427
Afrocarib in AA 30-59.92206381722206420.9999818711282.802326
AA in AA 30-59.95110387112251104000.9999974564554.723708
Whites in AA 30-59.9625872763862586831.000007039809.847962
Afrocarib in AA 10-29.9198343481983440.9999949584132.166667
AA in AA 10-29.9354586670335458431.0000064865043.8734
Whites in AA 10-29.9 188673721216188672351.00000726115515.81826
Afrocarib and AA in AA 0-9.97336352207336490.9999809173334.768182
Whites in AA 0-9.911840482253691184058240.99999153822053.60849

Table 9: NCS-R and NSAL collapsed cells for the long sample, post-stratification adjustment factor and mean weights
Collapsed groups for NCSR & NSAL long   CPS 2002   Un-weighted Count   Count using CPES weights   Post-stratification adjustment factor   Mean weight (ncnswtlg)
Asians over all domains 80419448346143217.428232115559.421687
Cuban, PR and other Hisp over all domains875941336415240275.7475444994186.887363
Mexican over all domains1576347734643124013.65538292912463.58671
Afrocarib and AA in Cuban blocks 1062542510625414250.16
Whites in Cuban blocks 10867961810553361.02981041158629.77778
Afrocarib and AA in Vit blocks 70975217097513379.761905
Whites in Vit blocks 739518237395270.9999878332153.34783
Afrocarib and AA in Flip blocks 243781472437820.9999958985186.851064
Whites in Flip blocks 12109724812109790.9999942225228.72917
Afrocarib in PR blocks 126246551262451.0000079212295.363636
AA in PR blocks 157140420215714170.9999917277779.292079
Whites in PR blocks 20263049020263140.99999506522514.6
Afrocarib and AA in Chinese blocks 27500142750011964.285714
Whites in Chinese blocks 21917417121917470.99999726230869.67606
Afrocarib in Afrocarib blocks 5773829335773860.999993072618.8488746
AA in Afrocarib blocks 62549126254915212.416667
Whites in Afrocarib and AA 60-100 194434022119095081.0182413488640.307692
Afrocarib in AA 60-100 2881192532881161.0000104121138.798419
AA in AA 60-100 105969212137105968811.0000037754958.765091
Afrocarib in AA 30-59.9 2206381702206420.9999818711297.894118
AA in AA 30-59.9 5110387102551103970.9999980434985.753171
Whites in AA 30-59.9 625872750958523641.06943570211497.76817
Afrocarib in AA 10-29.9 198343461983450.9999899174311.847826
AA in AA 10-29.9 354586662135458501.0000045125709.903382
Whites in AA 10-29.9 18867372891203005710.9294010522784.0303
Afrocarib and AA in AA 0-9.9 7336351647336360.9999986374473.390244
Whites in AA 0-9.9 11840482233851160883231.01995462534294.92555

Table 10: NCS-R and NLAAS collapsed cells for the short sample, post-stratification adjustment factor and mean weights
Collapsed groups for NCSR & NLAAS short   CPS 2002   Un-weighted Count   Count using CPES weights   Post-stratification adjustment factor   Mean weight (ncnlwtsh)
Asians in Cuban blocks9681618968031.0001342935377.944444
Cubans in Cuban blocks103587701035880.9999903461479.828571
PR in Cuban blocks48478264847811864.538462
Mexican and other His in Cuban blocks206292462016881.0228273374384.521739
Afrocarib and AA in Cuban blocks10625419535521.9841275772818.526316
Whites in Cuban blocks1086796255883001.84734999223532
Vit in Vit blocks38334920838334911843.024038
Flip in Vit blocks94392199439214968
Chinese and Other Asians in Vit blocks74000187400014111.111111
Cuban, PR and Mexican in Vit blocks310144353101391.0000161228861.114286
Other Hispanics in Vit blocks1101271311012718471.307692
Afrocarib and AA in Vit blocks7097521489661.4494751462331.714286
Whites in Vit blocks739518376915101.06942488218689.45946
Vit in Flip Blocks38422173842212260.117647
Flip in Flip Blocks2897148328971413490.53012
Chinese in Flip Blocks2402476824024713533.044118
other Asians in Flip blocks2211163922111615669.641026
Cuban and PR in Flip blocks44505134450513423.461538
Mexican and other Hisp in Flip block9563513956341.0000104577356.461538
Africocarib and AA in Flip blocks243781251104492.2071815954417.96
Whites in Flip blocks1210972517413141.63354799714535.56863
Vit and Flip in PR blocks51801145180113700.071429
Chinese and other Asians in PR blocks139542221395411.0000071666342.772727
Cuban and PR in PR blocks10196692649804251.0400275393713.731061
Mexican in PR blocks509666445096611.0000098111583.20455
other Hisp in PR blocks110991521510749171.0325587934999.613953
Afrocarib and AA in PR blocks1697650714204404.037793745921.690141
Whites in PR block202630411017045451.1887653315495.86364
Vit in Chinese blocks55101195510112900.052632
Flip in Chinese blocks2516610251651.0000397382516.5
Chinese in Chinese blocks54155311954155314550.865546
other Asians in Chinese block247125492471260.9999959535043.387755
Cuban, PR and Mexican in Chinese blocks94041239404114088.73913
Other Hispanics in Chinese blocks164478241644790.999993926853.291667
Afrocarib and AA in Chinese blocks27500222750011250
Whites in Chinese blocks219174112326667270.8218842821680.70732
Asians in Africocarib and AA 60-100 blocks90889169088915680.5625
Cuban and PR in Africocarib and AA 60-10016997028618072.7500121352207.392857
Mexican in Africocarib and AA 60-100282346272823510.99998229210457.44444
Other His in Africocarib and AA 60-100277315191014482.7335679365339.368421
Whites in Africocarib and AA 60-1001944340916647402.9249631437304.835165
Asians in AA 30-59.9134914241349131.0000074125621.375
Cubans, PR, and Mexicans in AA 30-59.98258531777101.0627332392506.774194
Other His in AA 30-59.9203503301526271.3333355175087.566667
Whites in AA 30-59.9625872730229625492.1126155219809.764901
Vit in AA 10-29.91799457117994512534.43662
Flip in AA 10-29.93897669338976614191.032258
Chinese in AA 10-29.92551976225519714116.080645
Other Asian in AA 10-29.9592307885923051.0000033776730.738636
Cubans in AA 10-29.9149585811477611.0123442591824.209877
PR in AA 10-29.9374734943669271.0212767123903.478723
Mexicans in AA 10-29.9294272221629427250.99999898113623.72685
Other Hisp in AA 10-29.9759917997375691.0302995387450.191919
Whites in AA 10-29.918867372925143522041.31459753515515.89622
Vit in AA 0-9.943744818843744812326.851064
Flip in AA 0-9.96741841796741831.0000014833766.385475
Chinese in AA 0-9.91039660259103966014014.131274
other asian in AA 0-9.9174929032017492881.0000011435466.525
Cubans in AA 0-9.98056284358056290.9999987591852.02069
PR in AA 0-9.95364391565364381.0000018643438.705128
Mexican in AA 0-9.9115295091083115295230.99999878610645.91228
Other His in AA 0-9.9259210034025769351.0058848987579.220588
Africocarib and AA in Africocarib and AA blocks21333840107247266154.5135556844409.155784
Whites AA 0-9.911840482253161172370051.00996116422053.61268
Vit in Hawaiian blocks34032340311701.5
Flip in Hawaiian blocks42161712642161713346.166667
Chinese in Hawaiian blocks223468692234690.9999955253238.681159
other Asian in Hawaiian blocks3609778436097714297.345238
Cuban in Hawaiian blocks00000
PR in Hawaiian blocks37466113746613406
Mexican in Hawaiian blocks00000
Other Hisp in Hawaiian blocks29451294512945
Africocarib in Hawaiian blocks00000
AA in Hawaiian blocks00000
Whites in Hawaiian blocks00000

Table 11: NCS-R and NLAAS collapsed cells for the long sample, post-stratification adjustment factor and mean weights
Collapsed groups for NCSR & NLAAS long   CPS 2002   Un-weighted Count   Count using CPES weights   Post-stratification adjustment factor   Mean weight (ncnlwtlg)
Asians in Cuban blocks9681614968021.0001446256914.428571
Cubans in Cuban blocks1035876310358711644.238095
PR in Cuban blocks48478234847812107.73913
Mexican and other His in Cuban blocks206292422012291.0251603894791.166667
Afrocarib and AA in Cuban blocks1062549375832.8271825034175.888889
Whites in Cuban blocks1086796179949581.09230339458526.94118
Vit in Vit blocks38334920838334911843.024038
Flip in Vit blocks94392199439214968
Chinese and Other Asians in Vit blocks74000137400015692.307692
Cuban, PR and Mexican in Vit blocks310144303101421.00000644910338.06667
Other Hispanics in Vit blocks11012710110127111012.7
Afrocarib and AA in Vit blocks7097512385401.841593153211.666667
Whites in Vit blocks739518237395270.9999878332153.34783
Vit in Flip Blocks38422173842212260.117647
Flip in Flip Blocks2897148328971413490.53012
Chinese in Flip Blocks2402476824024713533.044118
Other Asians in Flip blocks221116342211170.9999954786503.441176
Cuban and PR in Flip blocks44505124450513708.75
Mexican and other Hisp in Flip block9563510956341.0000104579563.4
Africocarib and AA in Flip blocks24378117868582.8066614475109.294118
Whites in Flip blocks1210972348577801.41175126525228.82353
Vit and Flip in PR blocks51801145180113700.071429
Chinese and other Asians in PR blocks1395422013954216977.1
Cuban and PR in PR blocks10196692539786321.0419330253868.110672
Mexican in PR blocks509666415096651.00000196212430.85366
other Hisp in PR blocks110991520910739451.0334933355138.492823
Afrocarib and AA in PR blocks1697650412531436.706288546174.219512
Whites in PR block20263047015760271.28570386222514.67143
Vit in Chinese blocks55101195510112900.052632
Flip in Chinese blocks2516610251651.0000397382516.5
Chinese in Chinese blocks54155311854155314589.432203
Other Asians in Chinese block247125402471241.0000040476178.1
Cuban, PR and Mexican in Chinese blocks9404120890411.0561539074452.05
Other Hispanics in Chinese blocks1644782116447817832.285714
Afrocarib and AA in Chinese blocks27500142750011964.285714
Whites in Chinese blocks21917417121917470.99999726230869.67606
Asians in Africocarib and AA 60-100 blocks90889159088916059.266667
Cuban and PR in Africocarib and AA 60-10016997025598872.838178572395.48
Mexican in Africocarib and AA 60-10028234624282346111764.41667
Other His in Africocarib and AA 60-10027731516945832.9319750915911.4375
Africocarib and AA in Africocarib and AA blocks2133384062430916906.9003813454954.63141
Whites in Africocarib and AA 60-1001944340453903244.9813488288673.866667
Asians in AA 30-59.91349141813491417495.222222
Cuban, PR and Mexicans in AA 30-59.98258522771721.070142023507.818182
other Hisp in AA 30-59.9203503241436501.4166585455985.416667
Whites in AA 30-59.9625872717319890943.14652148211497.65318
Vit in AA 10-29.91799457017994512570.642857
Flip in AA 10-29.93897669138976614283.142857
Chinese in AA 10-29.92551976025519714253.283333
Other Asian in AA 10-29.95923078159230717312.432099
Cubans in AA 10-29.9149585801477391.0124950081846.7375
PR in AA 10-29.9374734923667611.0217389533986.532609
Mexicans in AA 10-29.9294272219429427171.00000169915168.64433
Other Hisp in AA 10-29.9759917907354041.0333326998171.155556
Whites in AA 10-29.918867372600136705031.38015199622784.17167
Vit in AA 0-9.943744818843744812326.851064
Flip in AA 0-9.96741841766741831.0000014833830.585227
Chinese in AA 0-9.91039660256103966014061.171875
Other Asian in AA 0-9.9174929026517492861.0000022876601.079245
Cubans in AA 0-9.98056284328056290.9999987591864.881944
PR in AA 0-9.95364391425364381.0000018643777.732394
Mexican in AA 0-9.911529509896115295110.99999982712867.75781
Other His in AA 0-9.9259210029825748181.0067119318640.328859
Whites AA 0-9.911840482233321142550791.03631998734290.2398
Vit in Hawaiian blocks34032340311701.5
Flip in Hawaiian blocks42161712642161713346.166667
Chinese in Hawaiian blocks223468692234690.9999955253238.681159
other Asian in Hawaiian blocks3609778436097714297.345238
Cuban in Hawaiian blocks00000
PR in Hawaiian blocks37466113746613406
Mexican in Hawaiian blocks00000
Other Hisp in Hawaiian blocks29451294512945
Africocarib in Hawaiian blocks00000
AA in Hawaiian blocks00000
Whites in Hawaiian blocks00000

Table 12: NSAL and NLAAS collapsed cells, post-stratification adjustment factor and mean weights
Collapsed groups for NSAL & NLAAS   CPS 2002   Un-weighted Count   Count using CPES weights   Post-stratification adjustment factor   Mean weight (nsnlwt)
Asian in Cuban blocks9681614764291.2667442995459.214286
Cubans in Cuban blocks10358757843491.2280762071479.807018
PR in Cuban blocks4847818335621.4444312021864.555556
Mexican and other His in Cuban blocks206292331519081.3580061624603.272727
Afrocarib and AA in Cuban blocks10625416527012.0161666763293.8125
Whites in Cuban, PR and Asian blocks72553313553694313.5122927415341.22857
Vit in Vit blocks3833492073815061.0048308551843.024155
Flip in Vit blocks94392199439214968
Chinese and Other Asians in Vit blocks7400013601271.2307282924625.153846
Cuban, PR and Mexican in Vit blocks310144171439602.1543762168468.235294
Other Hispanics in Vit blocks11012710847131.300001188471.3
Afrocarib and AA in Vit, Flip, and Chinese blocks342256391553412.2032560623983.102564
Vit in Flip Blocks38422173842212260.117647
Flip in Flip Blocks289714802792421.0375015223490.525
Chinese in Flip Blocks240247672367141.0149251843533.044776
other Asians in Flip blocks221116321814291.2187467275669.65625
Cuban, PR, Mexican and other Hisp in Flip blocks140140171212871.1554412267134.529412
Vit and Flip in PR blocks5180113482751.0730398763713.461538
Chinese and other Asians in PR blocks139542161254751.1121099827842.1875
Cuban and PR in PR blocks10196692499261721.1009499323719.566265
Mexican in PR blocks509666343938331.2941170511583.32353
other Hisp in PR blocks110991521010499201.0571424494999.619048
Afrocarib in PR blocks12624643952371.3255982442214.813953
AA in PR blocks157140417311819811.3294663796832.260116
Vit in Chinese blocks55101195510112900.052632
Flip in Chinese blocks251669226491.1111307342516.555556
Chinese in Chinese blocks5415531175324511.0170945314550.863248
other Asians in Chinese block247125341714741.4411805875043.352941
Cuban, PR, Mexican, and other Hisp in Chinese blocks258519251443681.7906946145774.72
Asians in Africocarib and AA 60-100 blocks9088913643851.4116486764952.692308
Cuban and PR in Africocarib blocks8489644848970.9999882211929.477273
Mexican and other Hisp in Africocarib blocks111795711117960.9999910551574.591549
Africocarib in Africocarib blocks5773829335773860.999993072618.8488746
AA in Africocarib blocks62549126254915212.416667
Cuban in AA 60-1001634711149841.0909636951362.181818
PR in AA 60-1006872723585451.1739174992545.434783
Mexican in AA 60-100282346141464021.9285665510457.28571
Other His in AA 60-100165520251334841.2399988015339.36
Afrocarib in AA 60-1002881192462768611.0406630041125.45122
AA in AA 60-10010596921184083219161.2733751464522.780435
Asians in AA 30-59.913491415766871.7592812345112.466667
Cubans, PR, and Mexicans in AA 30-59.98258517234863.5163501661381.529412
Other His in AA 30-59.9203503281424511.428582465087.535714
Afrocarib in AA 30-59.92206381682155111.0237899691282.803571
AA in AA 30-59.9511038790841357081.2356740374554.744493
Vit in AA 10-29.9179945681723421.044115772534.441176
Flip in AA 10-29.9389766893730021.0449434594191.033708
Chinese in AA 10-29.9255197592428491.0508464114116.084746
Other Asian in AA 10-29.9592307744980771.1891876166730.77027
Cubans in AA 10-29.9149585801459371.0249970881824.2125
PR in AA 10-29.9374734833239891.1566256883903.481928
Mexicans in AA 10-29.9294272216021797941.35000004613623.7125
Other Hisp in AA 10-29.9759917846258141.2142857147450.166667
Afrocarib in AA 10-29.9198343391611551.2307592074132.179487
AA in AA 10-29.9354586651826127291.3571503215043.878378
Vit in AA 0-9.94374481854304671.0162172712326.848649
Flip in AA 0-9.96741841726478191.0406980963766.389535
Chinese in AA 0-9.910396602499995181.0401613584014.128514
Other Asian in AA 0-9.9174929023312737021.3733903225466.532189
Cubans in AA 0-9.98056284257871081.0235291731852.018824
PR in AA 0-9.95364391174023291.3333341623438.709402
Mexican in AA 0-9.91152950963567601461.70551183410645.89921
Other His in AA 0-9.9259210023818038591.4369748417579.239496
Afrocarib and AA in AA 0-9.9733635732434053.0140506563334.315068
Whites in Africocarib and AA blocks1454752618561025958314.1794516411985.49416
Vit in Hawaiian blocks34032340311701.5
Flip in Hawaiian blocks42161712642161713346.166667
Chinese in Hawaiian blocks223468692234690.9999955253238.681159
Other Asian in Hawaiian blocks3609778436097714297.345238
Cuban in Hawaiian blocks00000
PR in Hawaiian blocks37466113746613406
Mexican in Hawaiian blocks00000
Other Hisp in Hawaiian blocks29451294512945
Africocarib in Hawaiian blocks00000
AA in Hawaiian blocks00000
Whites in Hawaiian blocks00000

IX. CPES Weights Chart

To better understand weights in CPES, please consult our CPES Weights Chart and take a look at our FAQs on weights and complex design.

X. References and Readings

Alegria M, Takeuchi D, Canino G, Duan N, Shrout P, Meng X, Vega W, Zane N, Vila D, Woo M, Vera M, Guarnaccia P, Aguilar-Gaxiola S, Sue S, Escobar J, Lin K, Gong F. Considering context, space and culture: the National Latino and Asian American Study. IJMPR 2004; 13(4): 208-20.

Alegria M, Vila D, Woo M, Canino G, Takeuchi D, Vera M, Febo V, Guarnaccia P, Aguilar-Gaxiola S, Shrout P. Cultural relevance and equivalence in the NLAAS instrument: integrating etic and emic in the development of cross-cultural measures for a psychiatric epidemiology and services study of Latinos. IJMPR 2004; 13(4): 270-88.

Alegria, M., Takeuchi, D.T., Canino, G., Duan, N., Shrout, P.E., Vega, W., Zane, N., Guarnaccia, P., Aguilar-Gaxiola, Vera, M., Sue, S., Escobar, J., Lin, Keh-Ming, Jang, M. and Gong, F. (2004). "Considering Context, Space and Culture: The National Latino and Asian American Study". International Journal of Methods in Psychiatric Research, Vol. 13, No. 4, pp. 208-220.

American Association for Public Opinion Research. Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys. American Association for Public Opinion Research (AAPOR) standard: http://www.aapor.org , 2004.

Atrostic BK, Bates N, Burt G, Silberstein A. Non-response in US government household surveys: consistent measures, recent trends, and new insights. Journal of Official Statistics 2001; 17: 209-26.

Blaise(r) Survey Processing System: Version 4.5. Statistics Netherlands, 2000-2004.

Bogen K. The effect of questionnaire length on response rates - a review of the literature. Proceedings of the Section on Survey Research Methods, American Statistical Association 1996: 1020-5.

Bradburn NM. Respondent burden. Proceedings of the Section on Survey Research Methods. American Statistical Association 1978: 35-40.

Cannell C, Marquis K, Laurent A. A summary of studies. Vital Health Stat 1977; 2: 69.

Chambers, R.L. and Skinner C.J. (editors). (2003). Analysis of Survey Data. John Wiley and Sons, New York.

Cheung GQ, Liu Y. Displaying Chinese characters in Blaise. Proceedings of the Eighth International Blaise Users Conference 2003, Copenhagen, Denmark.

Cochran WG. Sampling Techniques, 3 edn. New York: John Wiley & Sons, 1977.

Cochran, W.G. (1977). Sampling Techniques. New York: John Wiley & Sons.

Corbin J, Morse JM. The unstructured interactive interview: issues of reciprocity and risks when dealing with sensitive topics. Qualitative Inquiry 2003; 9(3): 335-54.

de Leeuw E, de Heer W. Trends in household survey nonresponse: a longitudinal and international comparison. In R Groves, DA Dillman, JL Eltinge and RJA Little (eds) Survey Non-response. New York: Wiley, 2002, pp. 41-54.

Groeneveld R. Using non-Latin alphabets in Blaise. Proceedings of the Eighth International Blaise Users Conference 2003. Copenhagen, Denmark.

Groves RM, Couper MP. Non-response in Household Interview Surveys. New York: Wiley, 1998.

Groves RM, Fowler FJ, Couper MP, Lepkowski J, Singer E, Tourangeau R. Survey Methodology. New York: Wiley, 2004.

Groves RM. Survey Errors and Survey Costs. New York: John Wiley & Sons, 1989.

Guenzel PJ, Berckmans TR, Cannell CF. General Interviewing Techniques: A Self-Instructional Workbook for Telephone and Personal Interviewer Training. Ann Arbor, Michigan: Institute for Social Research, Survey Research Center, 1983.

Hansen MH, Hurwitz WN, Madow WG. Sample Survey Methods and Theory, Volumes I and II. New York: John Wiley & Sons, 1953.

Hansen MH, Hurwitz WN. The problem of nonresponse in surveys. Journal of the American Statistical Association 1946; 41: 517-29.

Hartley, H.O. (1962). "Multiple frame surveys." Proceedings of the Social Science Section of the American Statistical Association Meeting, Minneapolis, Minnesota.

Hartley, H.O. (1974). "Multiple Frame Methodology and Selected Applications." Sankhya, Series C, 3, pp. 99-118.

Heeringa SG, Connor J, Darrah D. The 1980 SRC/NORC National Sample. Ann Arbor: Survey Methodology Program, Survey Research Center, University of Michigan, 1984.

Heeringa SG, Connor J, Redmond G. The 1990 SRC National Sample. Ann Arbor: Survey Methodology Program, Survey Research Center, University of Michigan, 1994.

Heeringa SG, Groves RM. Responsive Design for Household Surveys. Ann Arbor: Survey Methodology Program, Institute for Social Research, University of Michigan, 2004.

Heeringa SG, Liu J. Complex sample design effects and inference for mental health survey data. IJMPR 1997; 7(1): 56-65.

Heeringa SG, Wagner J, Torres M, Duan N, Adams T, Berglund P. Sample designs and sampling methods for the Collaborative Psychiatric Epidemiology Studies (CPES) IJMPR 2004; 13(4): 221-40.

Heeringa, S. (2006). Technical Sample Design Documentation: National Survey of American Life (NSAL). Technical Report. Statistical Design Group, Institute for Social Research, University of Michigan, Ann Arbor.

Heeringa, S. (2004). Technical Sample Design Documentation: 2002-2003 National Latino and Asian American Study (NLAAS). Technical Report. Statistical Design Group, Institute for Social Research, University of Michigan, Ann Arbor.

Heeringa, S., Wagner, J., Torres, M., Duan, N., Adams, T., Berglund, P. (2004). "Sample designs and sampling methods for the Collaborative Psychiatric Epidemiology Studies (CPES)", International Journal of Methods in Psychiatric Research, Vol. 13, No. 4, pp. 221-240.

Heeringa, S.G. and Liu, J. (1997). "Complex sample design effects and inference for mental health survey data." International Journal of Methods in Psychiatric Research, Volume 7, Number 1, 56-65.

Henderson AS, Jorm AF. Do mental health surveys disturb? Psychol Med 1990; 20: 721-4.

Hess I. Sampling for social research surveys: 1947-1980. Ann Arbor: Institute for Social Research, University of Michigan, 1985.

Jackson J. Life in Black America. London: Sage Publications, 1991.

Jackson JS, Torres M, Caldwell CH, Neighbors HW, Nesse R, Taylor RJ, Trierweiler, SJ, Williams DR. The National Survey of American Life: a study of racial, ethnic and cultural influences on mental disorders and mental health. IJMPR 2004; 13(4): 196-207.

Jorm AF, Henderson AS, Scott R, MacKinnon AJ, Korten AE, Christensen H. Do mental health surveys disturb? Further evidence. Psychol Med 1994; 24: 233-7.

Kalton, G. (1977), "Practical methods for estimating survey sampling errors," Bulletin of the International Statistical Institute, Vol 47, 3, pp. 495-514.

Kessler R, Berglund P, Chiu WT, Demler O, Heeringa S, Hiripi E, Jin R, Pennell B, Walters E, Zaslavsky A, Zheng H. The US National Comorbidity Survey Replication (NCS-R): an overview of design and field procedures. IJMPR 2004; 13(2): 69-92.

Kessler R. The National Comorbidity Survey of the United States. International Review of Psychiatry 1994; 6: 365-76.

Kessler RC, Üstün TB. The World Mental Health (WMH) survey initiative version of the World Health Organization Composite International Diagnostic Interview (CIDI). IJMPR 2004; 13(2): 93-121.

Kessler RC, Wittchen H-U, Abelson J, Zhao S. Methodological issues in assessing psychiatric disorders with self-reports. In AA Stone, JS Turkkan, CA Bachrach, JB Jobe, HS Kurtzman and VS Cain (eds) The Science of Self-Report: Implications for Research and Practice. Mahwah NJ: Lawrence Erlbaum Associates, 2000; 229-55.

Kessler, R., Berglund, P., Chiu, W.T., Demler, O., Heeringa, S., Hiripi, E., Jin, R., Pennell, B., Walters, E., Zaslavsky, A., Zheng, H. (2004). "The U.S. National Comorbidity Survey Replication (NCS-R): an overview of design and field procedures." International Journal of Methods in Psychiatric Research, Vol. 13, No. 2, pp. 69-92.

Kish L. A procedure for the objective selection of the respondent within the household. Journal of the American Statistical Association 1949; 44: 380-7.

Kish L. Statistical Design for Research. New York: John Wiley & Sons, 1987.

Kish L. Survey Sampling. New York: John Wiley & Sons, 1965.

Kish, L. (1965), Survey Sampling. New York: John Wiley & Sons, Inc.

Lessler JT, Kalsbeek WD. Nonsampling Errors in Surveys. New York: John Wiley & Sons, 1992.

Little, R.J.A. and Rubin, D.B. (2003). Statistical Analysis with Missing Data, 2nd Edition, John Wiley and Sons, New York.

Pennell, B.P., Bowers, A., Carr, D., Chardoul, S., Cheung, G-Q., Dinkelmann, K., Gebler, N., Hansen, S.E., Pennell, S., Torres, M. (2004). "The Development and Implementation of the National Comorbidity Survey Replication, the National Survey of American Life, and the National Latino and Asian American Survey," International Journal of Methods in Psychiatric Research, Vol. 13, No. 4, pp. 241-269.

Rao, J.N.K. & Wu, C.F.J. (1988), "Resampling inference with complex sample data," Journal of the American Statistical Association, 83, pp. 231-239.

Research Triangle Institute (2003). SUDAAN User's Manual, Release 9.0. Research Triangle Park, NC: Research Triangle Institute.

Rust, K. (1985). "Variance estimation for complex estimators in sample surveys," Journal of Official Statistics, Vol. 1, No. 4.

SAS Institute, Inc. (2003). SAS/STAT(R) User's Guide, Version 9, Cary, NC: SAS Institute, Inc.

Sharp LM, Frankel J. Respondent burden: a test of some common assumptions. Public Opinion Quarterly 1983; 47: 36-53.

Singer E, Groves RM, Corning AD. Differential incentives: beliefs about practices, perceptions of equity, and effects on survey participation. Public Opinion Quarterly 1999; 63: 251-60.

Singer E, Van Hoewyk J, Gebler N, Raghunathan T, McGonagle K. The effect of incentives in interviewer-mediated surveys. Journal of Official Statistics 1999; 15: 217-30.

Skinner, C.J., Holt, D., & Smith, T.M.F. (1989). Analysis of Complex Surveys. New York: John Wiley & Sons.

STATA Corp. (2004). STATA Statistical Software: Release 9.0. College Station, TX: STATA Corporation.

Turnbull JE, McLeod JD, Callahan JM, Kessler RC. Who should ask? Ethical interviewing in psychiatric epidemiology studies. American Journal of Orthopsychiatry 1988; 58(2): 228-39.

Westat, Inc. (2000). WesVar 4.0 User's Guide. Rockville, MD: Westat, Inc.

Wittchen H-U. Reliability and validity studies of the WHO Composite International Diagnostic Interview (CIDI): a critical review. J Psychiatr Res 1994; 28(1): 57-84.

Wolter, K.M. (1985). Introduction to Variance Estimation. New York: Springer-Verlag.

World Health Organization. Composite International Diagnostic Interview, Version 1.0. Geneva, Switzerland: World Health Organization, 1990.

World Health Organization. Composite International Diagnostic Interview, Version 2.1. Geneva, Switzerland: World Health Organization, 1997.