VI. Final Weights and Special Considerations for Weighted Analysis
VI.A Final Weights
As noted in Section V above, n=9282 NCS-R survey respondents completed Part 1 of the two-part CIDI-based interview; however, only a subsample of n=5692 NCS-R respondents went on to complete the more in-depth Part 2 questionnaire modules. All n=6082 NSAL and n=4649 NLAAS respondents completed the full interview schedule--the equivalent of NCS-R Parts 1 and 2. To account for the split schedule of NCS-R questionnaire administration, two CPES pooled analysis weights were computed using the calculation sequence described in steps (5) through (8) in Section V above. The first weight, termed the "Part 1" weight, is labeled CPESWTSH in the merged CPES data set. It is the population weight that should be used for analyses involving only variables that are included in Part 1 of the NCS-R. The second weight, termed the "Part 2" weight, is stored in the merged data set as the variable CPESWTLG. It is the population weight that should be used when the CPES analysis includes variables that NCS-R asked only of the Part 2 subsample of respondents.
Table 5 provides a summary of selected distributional statistics for the final CPES Part 1 and Part 2 analysis weights.
|Table 5: Distributions of CPES Final Part 1 and Part 2 Analysis Weights|
|CPES Final Analysis Weight|
|WHITE AND OTHER||7871||19362||1250||131331||5256||28800||1250||195000|
VI.B Special Considerations in Weighted Analysis of the CPES Data
The weights have been designed to enable analysts to compute unbiased or nearly unbiased estimates of population statistics and relationships (e.g., bivariate associations, regression relationships) for the larger CPES survey population of U.S. residents. Contemporary statistical software systems such as SAS, Stata, SPSS, and SUDAAN all provide the capability to conduct weighted analysis of the CPES survey data. CPES data analysts are encouraged to consult the user guides and help resources for their chosen software package to learn the syntax and program-specific features for conducting weighted analysis. The following paragraphs provide guidance on weighted analysis that is specific to the CPES data set.
VI.B.1 Part 1 or Part 2 Weight?
CPES analysts should consult the data documentation to determine whether the variables of interest in their analysis were obtained in Part 1 or Part 2 of the NCS-R interview. If the analysis includes only Part 1 variables, the CPESWTSH analysis weight should be used. It includes the full sample of NCS-R cases and provides the greatest precision for sample estimates of population characteristics or relationships. If the analysis includes one or more variables that NCS-R collected only in Part 2, the appropriate weight for population estimation is CPESWTLG.
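The selection rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the variable names, and the set of Part 2 variables are hypothetical placeholders, and the actual Part 1 or Part 2 status of each variable must be taken from the CPES data documentation.

```python
def choose_cpes_weight(analysis_vars, part2_vars):
    """Return the name of the CPES weight variable suited to an analysis.

    analysis_vars: names of the variables used in the analysis.
    part2_vars:    names of variables that NCS-R collected only in Part 2
                   (assumed to be compiled by the analyst from the
                   data documentation).
    """
    uses_part2 = any(v in part2_vars for v in analysis_vars)
    # One or more Part 2 variables -> Part 2 weight; otherwise Part 1 weight.
    return "CPESWTLG" if uses_part2 else "CPESWTSH"


# Hypothetical variable names, for illustration only.
PART2_ONLY = {"example_part2_item"}

print(choose_cpes_weight(["example_part1_item"], PART2_ONLY))
print(choose_cpes_weight(["example_part1_item", "example_part2_item"], PART2_ONLY))
```

The first call uses only a Part 1 variable and so selects CPESWTSH; the second includes a Part 2 variable and so selects CPESWTLG.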
In the calculation of the Part 1 and Part 2 weights, the absolute contributions from NSAL and NLAAS to the pooled weight calculation remained unchanged--only NCS-R required changes to the nominal case counts and initial rescaling steps. However, due to the reduced NCS-R sample size for the Part 2 variables, the relative contributions of the NSAL and NLAAS to any given race/ethnicity x sample domain weighting cell did change. Therefore, the final CPES Part 1 and Part 2 analysis weights (Step 8 above) differ for NSAL and NLAAS cases, just as they do for the NCS-R cases.
VI.B.2 Subsetting the CPES data by study
Occasionally, analysts may choose to extract CPES data for only one or two of the three component data sets. The CPES analysis weights will support this type of analysis; however, analysts should recognize that the sum of weights for this special CPES subset may not equal the population control total for that population. For example, consider an analysis that uses only Afro-Caribbean data from the NLAAS and NSAL. Since a small number of Afro-Caribbeans interviewed in the NCS-R would be excluded from this analysis, the sum of weights for the combined NSAL and NLAAS cases would no longer match the CPES population control total for the Afro-Caribbean race/ethnicity population. A principle of weighted analysis of data is that population estimates and sampling errors (except for estimates of totals) should be invariant to any linear scaling of the weights (multiplication or division by a constant). Under the procedures used to compute the CPES Part 1 and Part 2 weights, this invariance to linear scaling applies when the data for one or two studies are used independently or are compared.
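The scaling principle can be checked numerically. The following minimal Python sketch uses made-up outcome values and weights (not CPES data) to show that a weighted mean is unchanged when every weight is divided by a constant, while a weighted total changes by that same constant.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_total(values, weights):
    """Weighted total: sum(w * y)."""
    return sum(v * w for v, w in zip(values, weights))


# Made-up illustration data: a 0/1 outcome and arbitrary weights.
y = [1, 0, 1, 1, 0]
w = [1250.0, 5000.0, 20000.0, 7500.0, 10000.0]
w_scaled = [wi / 1000.0 for wi in w]  # linear rescaling by a constant

# Means (and other ratio-type estimates) agree under rescaling.
print(math.isclose(weighted_mean(y, w), weighted_mean(y, w_scaled)))

# Totals do not: they differ by exactly the scaling constant.
print(math.isclose(weighted_total(y, w), 1000.0 * weighted_total(y, w_scaled)))
```

Both checks print True. The sketch covers point estimates only; sampling errors depend on the sample design and are not computed here.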
VI.B.3 Subsetting the CPES data based on characteristics of respondents
In general, CPES analysts can apply the analysis weights for subpopulation analysis (e.g., estimation for women of Mexican-American ancestry). Provided all qualifying cases in the CPES data are included in the subpopulation analysis, the estimates would be unbiased and the sum of the CPES weights would be an unbiased estimator of the 2002 population count for that subset of the larger U.S. population. Experience has shown that, due to the sheer number of observations and the richness of the variable set, data sets such as the CPES generate interest in rare populations or populations for which the original samples were not optimal (e.g., women of Mexican-American ancestry living in the West Census Region and covered by a regional health maintenance organization (HMO) program). CPES analysts who have concerns about the appropriateness of the CPES for the subpopulation analysis they are proposing to conduct are encouraged to consult a survey statistician.
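As a rough illustration of subpopulation estimation, the Python sketch below computes the unweighted case count, the sum of weights (the estimated population count), and a weighted proportion for a subset of records. The records, the field names other than CPESWTSH, and the weight values are all invented for the example; they are not drawn from the CPES data set.

```python
def subpop_estimates(records, in_subpop, outcome_key, weight_key="CPESWTSH"):
    """Return (n cases, estimated population count, weighted proportion).

    records:     list of dicts, one per respondent.
    in_subpop:   function returning True for cases in the subpopulation.
    outcome_key: name of a 0/1 outcome field (hypothetical here).
    """
    subset = [r for r in records if in_subpop(r)]
    pop_count = sum(r[weight_key] for r in subset)
    if pop_count == 0:
        raise ValueError("no weighted cases in the subpopulation")
    proportion = sum(r[weight_key] * r[outcome_key] for r in subset) / pop_count
    return len(subset), pop_count, proportion


# Invented records; "sex", "ancestry", and "outcome" are hypothetical fields.
records = [
    {"sex": "F", "ancestry": "Mexican", "outcome": 1, "CPESWTSH": 4000.0},
    {"sex": "F", "ancestry": "Mexican", "outcome": 0, "CPESWTSH": 6000.0},
    {"sex": "M", "ancestry": "Mexican", "outcome": 1, "CPESWTSH": 5000.0},
    {"sex": "F", "ancestry": "Other", "outcome": 0, "CPESWTSH": 20000.0},
]

n, pop, prop = subpop_estimates(
    records,
    lambda r: r["sex"] == "F" and r["ancestry"] == "Mexican",
    "outcome",
)
print(n, pop, prop)
```

A very small unweighted count n for the selected subset is one practical signal that the subpopulation may be too rare for the analysis, which is the situation in which consulting a survey statistician is advised.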
VI.B.4 Item Missing Data for Analysis Variables
The original NCS-R, NSAL, and NLAAS analysis weights included adjustments for survey nonresponse. Through the process used to create the combined analysis weights, these adjustments for differential nonresponse are preserved in the CPES Part 1 and Part 2 weights. However, the CPES weights do not include adjustments for item missing data in the CPES data set. With a few special exceptions, most statistical software packages employ "case-wise" deletion as the means to address the problem of missing values for the analysis variables. That is, any case with a missing value on one or more of the variables in an analysis (e.g., in fitting a multivariate logistic regression model) will be dropped from that analysis. If the amount of such case-wise deletion is substantial, the unbiasedness of the weighted estimation may be compromised. Analysts are encouraged to use standard data checking techniques to establish the patterns of missing data in their analysis variables and to assess the extent and impact of software-driven case-wise deletion on the integrity of their analysis. If the variables of interest have high rates of item missing data, analysts may wish to consult a survey statistician about remediation approaches such as stochastic imputation (Little and Rubin, 2002).
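One simple way to gauge the extent of case-wise deletion before fitting a model is sketched below in Python. It reports the item missing rate for each analysis variable, the number of cases that would be dropped, and the share of the total weight those cases carry. The records, the variable names x and y, and the use of None to mark a missing value are assumptions made for the illustration; the coding of missing values in the actual data should be taken from the data documentation.

```python
def casewise_deletion_report(records, analysis_vars, weight_key="CPESWTLG"):
    """Summarize how much data case-wise deletion would remove."""
    n = len(records)
    # Item missing rate for each analysis variable.
    missing_rate = {
        v: sum(r.get(v) is None for r in records) / n for v in analysis_vars
    }
    # Cases with no missing value on any analysis variable.
    complete = [
        r for r in records if all(r.get(v) is not None for v in analysis_vars)
    ]
    total_w = sum(r[weight_key] for r in records)
    kept_w = sum(r[weight_key] for r in complete)
    return {
        "missing_rate": missing_rate,
        "cases_dropped": n - len(complete),
        "weighted_share_dropped": 1.0 - kept_w / total_w,
    }


# Invented records; None marks a missing item in this sketch.
records = [
    {"x": 1, "y": 0, "CPESWTLG": 1000.0},
    {"x": None, "y": 1, "CPESWTLG": 2000.0},
    {"x": 0, "y": 1, "CPESWTLG": 3000.0},
    {"x": 1, "y": None, "CPESWTLG": 4000.0},
]

report = casewise_deletion_report(records, ["x", "y"])
print(report)
```

In this invented example each variable is missing for one case in four, yet half of the cases and more than half of the total weight would be lost, which shows how modest item missing rates can compound across variables.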