Collaborative Psychiatric Epidemiology Surveys

# How should I handle strata with no subpopulation (Asian or Latino) members when analyzing only the NLAAS Asian or Latino groups with Stata?

In FAQ #67 we discussed "How should I detect and handle the single PSU in a stratum for NLAAS or CPES Latino groups with Stata?" This FAQ explains how to handle strata that contain no subpopulation (Asian or Latino) members when analyzing the NLAAS Asian or Latino groups, and how to perform an unconditional subpopulation analysis based on the full NLAAS data set.
• Step 1: Create a copy of the race/ethnicity variable, and set values for the groups not in your subpopulation equal to one of the ethnicity values for groups in your subpopulation.
• Step 2: As FAQ #76 suggested, to perform the analysis only for Latinos (or any other ancestry or demographic subpopulation of interest), first create an indicator variable that has a value of "1" for all eligible cases you wish to include in your analysis and a value of "0" for all other NLAAS cases.
• Step 3: Use the Stata subpop() option to restrict your analysis to the chosen subpopulation of cases.
Here we provide a case study, along with two common problems that analysts usually need to deal with:

Suppose we would like to test whether there is a relationship between race/ancestry and gender for the NLAAS Latino groups. We can generate the cross-tabulation of the RANCEST and SEX variables and compute the chi-square statistic for the NLAAS Latino groups. However, 12 strata contain no Latino members, and without the recoding in Step 1 the chi-square statistic will not be computed.

• Step 1: Create a new RANCEST2 variable, recoding the values of all Asian groups (RANCEST = 1, 2, 3, or 4) to the value of the Cuban group (RANCEST = 5).
• Step 2: Create an indicator variable LATINO that has a value of "1" for all Latino groups and a value of "0" for all other NLAAS cases.
• Step 3: Use the Stata subpop(LATINO) option with the corresponding cluster, strata, and weight variables.

. svyset SECLUSTR [pweight=NLAASWGT], strata(SESTRAT)

pweight: NLAASWGT
VCE: linearized
Single unit: missing
Strata 1: SESTRAT
SU 1: SECLUSTR
FPC 1: <zero>

. generate RANCEST2 = 0

. replace RANCEST2 = 5 if RANCEST<=5

. replace RANCEST2 = 6 if RANCEST==6

. replace RANCEST2 = 7 if RANCEST==7

. replace RANCEST2 = 8 if RANCEST==8

. generate LATINO = 0

. replace LATINO = 1 if RANCEST>=5

. svy, subpop (LATINO): tab RANCEST2 SEX
(running tabulate on estimation sample)

Number of strata   =        57                  Number of obs      =      3956
Number of PSUs     =       114                  Population size    =  27942479
Subpop. no. of obs =      2554
Subpop. size       =  21654900
Design df          =        57

----------------------------------
          |          Sex
 RANCEST2 |   MALE  FEMALE   Total
----------+-----------------------
        5 |  .0243   .0219   .0463
        6 |  .0489   .0516   .1005
        7 |  .3052   .2611   .5663
        8 |  .1366   .1503   .2869
          |
    Total |   .515    .485       1
----------------------------------
Key:  cell proportions

Pearson:
Uncorrected   chi2(3)         =   13.3482
Design-based  F(2.23, 126.94) =    4.3779     P = 0.0117

Note: 12 strata omitted because they contain no subpopulation members.
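
Steps 1 to 3 can also be written more compactly. The following is a minimal sketch rather than the method used for the output above. It assumes that the Latino groups are coded 5 through 8 in RANCEST, as the tables in this FAQ suggest. Unlike the session above, recode leaves any missing RANCEST values missing in RANCEST2, and inrange() sets LATINO to 0 for them.

```stata
* Step 1: copy RANCEST, folding the Asian codes (1-4) into the Cuban code (5)
recode RANCEST (1/4 = 5), generate(RANCEST2)

* Step 2: flag Latino cases (assumed codes 5-8); inrange() returns 0 if RANCEST is missing
generate byte LATINO = inrange(RANCEST, 5, 8)

* Step 3: declare the full-sample design, then restrict the analysis with subpop()
svyset SECLUSTR [pweight=NLAASWGT], strata(SESTRAT)
svy, subpop(LATINO): tabulate RANCEST2 SEX
```

Because the recoded Asian cases all have LATINO equal to 0, they contribute to the variance estimation through the design but not to the cell proportions.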

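Below is the output of the analysis if we skip Step 1.

You can see that the chi-square statistic was not computed. With the original RANCEST variable, the four Asian categories have zero marginals within the Latino subpopulation, and 12 strata contained no subpopulation (Latino) members.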

. svyset SECLUSTR [pweight=NLAASWGT], strata(SESTRAT)

pweight: NLAASWGT
VCE: linearized
Single unit: missing
Strata 1: SESTRAT
SU 1: SECLUSTR
FPC 1: <zero>

. generate LATINO = 0

. replace LATINO = 1 if RANCEST>=5

. svy, subpop (LATINO): tab RANCEST SEX
(running tabulate on estimation sample)

Number of strata   =        57                  Number of obs      =      3956
Number of PSUs     =       114                  Population size    =  27942479
Subpop. no. of obs =      2554
Subpop. size       =  21654900
Design df          =        57

----------------------------------
Race/Ance |          Sex
stry      |   MALE  FEMALE   Total
----------+-----------------------
 VIETNAME |      0       0       0
 FILIPINO |      0       0       0
  CHINESE |      0       0       0
 ALL OTHE |      0       0       0
    CUBAN |  .0243   .0219   .0463
 PUERTO R |  .0489   .0516   .1005
  MEXICAN |  .3052   .2611   .5663
 ALL OTHE |  .1366   .1503   .2869
          |
    Total |   .515    .485       1
----------------------------------
Key:  cell proportions

Table contains a zero in the marginals.
Statistics cannot be computed.

Note: 12 strata omitted because they contain no subpopulation members.
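
To see which strata the note refers to, you can count subpopulation members within each stratum. This is a sketch only: it assumes the LATINO indicator from Step 2 already exists, and n_latino is a hypothetical variable name introduced here for illustration.

```stata
* Count Latino respondents in each stratum (LATINO is the 0/1 indicator from Step 2)
egen n_latino = total(LATINO), by(SESTRAT)

* Show the strata that contain no Latino respondents
tabulate SESTRAT if n_latino == 0
```

The strata listed should correspond to those Stata reports as omitted, which gives a quick check that the indicator variable was built as intended.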

Below is the output if you analyze only the NLAAS Latino groups, with the corresponding cluster, strata, and weight variables, after dropping all Asian groups from the data set.

You will see that the distributions across race/ancestry and gender groups, and the chi-square statistics computed using the Latino-specific weight variable (NLSWTLAT), differ from those in the case study shown earlier. These differences arise because the two approaches use different weights: the overall NLAAS weight (NLAASWGT) adjusts the sample to a different population than the Latino-specific weight does.

We recommend that you NEVER simply delete cases that are not in a particular subpopulation. After you create your indicator variable (see Step 2), always use the subpop() option rather than dropping cases or using if qualifiers.

. drop if NLSWTLAT==.
(2095 observations deleted)

. svyset SECLUSTR [pweight=NLSWTLAT], strata(SESTRAT)

pweight: NLSWTLAT
VCE: linearized
Single unit: centered
Strata 1: SESTRAT
SU 1: SECLUSTR
FPC 1: <zero>

. svy: tab RANCEST SEX
(running tabulate on estimation sample)

Number of strata   =        57                  Number of obs      =      2554
Number of PSUs     =       110                  Population size    =  21654900
Design df          =        53

----------------------------------
Race/Ance |          Sex
stry      |   MALE  FEMALE   Total
----------+-----------------------
    CUBAN |  .0238   .0224   .0463
 PUERTO R |  .0517   .0487   .1005
  MEXICAN |  .2917   .2746   .5663
 ALL OTHE |  .1478   .1391   .2869
          |
    Total |   .515    .485       1
----------------------------------
Key:  cell proportions

Pearson:
Uncorrected   chi2(3)         =    0.0000
Design-based  F(2.30, 122.11) =    0.0000     P = 1.0000
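
In contrast to the drop-based run above, the recommended approach keeps every NLAAS case in memory. If you prefer not to create a separate indicator variable, subpop() also accepts an if expression. The sketch below assumes that you start again from the full NLAAS file, that RANCEST2 has been created as in Step 1, and that the Latino groups are the non-missing RANCEST values of 5 or higher.

```stata
* Full-sample design: no cases are dropped
svyset SECLUSTR [pweight=NLAASWGT], strata(SESTRAT)

* Restrict the analysis inside subpop(); !missing() keeps missing RANCEST values out
svy, subpop(if RANCEST >= 5 & !missing(RANCEST)): tabulate RANCEST2 SEX
```

The explicit !missing() check matters because Stata treats missing numeric values as larger than any number, so RANCEST >= 5 alone would also select cases with a missing RANCEST.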

CPES Team