Los Angeles Family and Neighborhood Survey (L.A.FANS): Sampling/Sample Weights

QUESTION: If I am running simple cross-sectional regressions for all children ages 12-19 in Wave 2, should I use XWGT_RSCSIB_LAF or XWGT_RSCSIB_LAC? In the same regressions, I need to account for the clustering of individuals. I was originally using the cluster option with the 1990 census tract indicator. Is this correct? Or, if I use weights, do I still need to use the cluster option?

RESPONSE:

Whether you use “XWGT_RSCSIB_LAF” or “XWGT_RSCSIB_LAC” depends on whether you want your results to represent all children ages 12-19 in (a) just the 65 L.A.FANS tracts or (b) all of L.A. County. If it’s (a), use the former weight; if it’s (b), use the latter.

Yes, you should account for clustering of individuals using the original 1990 census tract indicator, which is the basis for sample selection.

Yes, you should still use the cluster option when you use the weights.
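The two answers above reflect a general point: the weights change the point estimates, while clustering changes the standard errors, so one does not substitute for the other. As a rough illustration, here is a minimal Python sketch of a weighted regression on one predictor with standard errors clustered on tract. The function name, its arguments, and the test data are hypothetical. It applies only a simple G/(G-1) small-sample factor, so its standard errors will differ slightly from those reported by Stata's `regress y x [pweight=w], vce(cluster tract)`.

```python
import math
from collections import defaultdict


def weighted_ols_cluster(y, x, w, cluster):
    """Weighted least squares of y on an intercept and one regressor x,
    with cluster-robust (sandwich) standard errors.

    y, x, w, and cluster are equal-length sequences; cluster holds, for
    example, the 1990 census tract of each child. Returns
    ([intercept, slope], [se_intercept, se_slope]).
    """
    # "Bread": inverse of X'WX for X = [1, x].
    a = sum(w)
    b = sum(wi * xi for wi, xi in zip(w, x))
    d = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]

    # Point estimates from the weighted normal equations: the weights act here.
    c0 = sum(wi * yi for wi, yi in zip(w, y))
    c1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    beta = [inv[0][0] * c0 + inv[0][1] * c1,
            inv[1][0] * c0 + inv[1][1] * c1]

    # Weighted scores are summed within each cluster first, which allows
    # residuals to be correlated among children in the same tract.
    scores = defaultdict(lambda: [0.0, 0.0])
    for yi, xi, wi, g in zip(y, x, w, cluster):
        resid = yi - beta[0] - beta[1] * xi
        scores[g][0] += wi * resid
        scores[g][1] += wi * resid * xi

    # Sandwich: V = sum over clusters of (inv s_g)(inv s_g)'; keep the diagonal.
    var = [0.0, 0.0]
    for s in scores.values():
        u0 = inv[0][0] * s[0] + inv[0][1] * s[1]
        u1 = inv[1][0] * s[0] + inv[1][1] * s[1]
        var[0] += u0 * u0
        var[1] += u1 * u1
    n_clusters = len(scores)
    scale = n_clusters / (n_clusters - 1)  # simple finite-cluster correction
    ses = [math.sqrt(scale * v) for v in var]
    return beta, ses
```

With real data you would pass the outcome, a regressor, the chosen XWGT_RSCSIB weight, and the tract indicator; in practice, Stata's built-in estimators are the better tool, and this sketch is only meant to show where each ingredient enters.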


QUESTION: I am working on an analysis using the Wave II adult module (adult2) and have a question about the weights for all of these respondents.
I am using the L.A. County weight, XWGT_ADULT_LAC. According to page 45 of the Wave II codebook, this weight covers RSAs, PCGs, and RSCs/SIBs aged 18+. When I merged this variable from the weight file (indivwgts2) with the Wave II adult module (adult2), matching on hhid2 and pid2, I discovered that the individuals with weights do not align with the respondents in the adult module. Specifically:
  • 455 RSC and SIB 18+ do NOT have weights
  • 20 respondents with weights are not in the Wave II adult module at all
  • 1864 have a weight and a complete Wave II adult module
Did I select the appropriate weight file? Is there an error?

RESPONSE:

The user’s guide seems to be a bit misleading on this point; it’s something we didn’t catch. For RSAs and PCGs, use XWGT_ADULT_LAC as their weight; for the aged-up RSCs and SIBs, use their XWGT_RSCSIB_LAC value instead. Note that an adult respondent may have a weight even if they did not complete the adult module, provided they completed another module such as the PCG or Health Measures module.
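The rule above can be written as a small lookup. This is a minimal Python sketch; the `role` field and its codes ('RSA', 'PCG', 'RSC', 'SIB') are hypothetical stand-ins for however your merged file records each respondent's type, and the weight values are assumed to have already been merged from indivwgts2 by hhid2 and pid2.

```python
def wave2_adult_lac_weight(person):
    """Return the L.A. County weight to use for a Wave 2 adult respondent.

    `person` is a dict holding a hypothetical 'role' code plus the weight
    variables merged from indivwgts2.
    """
    role = person["role"]
    if role in ("RSA", "PCG"):
        # Adults sampled as RSAs or PCGs use the adult weight.
        return person.get("XWGT_ADULT_LAC")
    if role in ("RSC", "SIB"):
        # Aged-up children (18+ at Wave 2) keep their child-sample weight.
        return person.get("XWGT_RSCSIB_LAC")
    raise ValueError(f"unrecognized respondent role: {role!r}")


# Made-up example records and weight values.
respondents = [
    {"role": "RSA", "XWGT_ADULT_LAC": 1250.0},
    {"role": "SIB", "XWGT_RSCSIB_LAC": 830.5},
]
weights = [wave2_adult_lac_weight(p) for p in respondents]
```

A respondent who is both an RSA and the PCG gets the same weight whichever of the two codes is recorded.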


QUESTION: Could you clarify the differences between the sample weights for the RSC and PCG (WGTRSC and WGTPCG) and the Child and Adult sample weights (WGTKID and WGTADLT)? Is there any sample weight for a combined analysis using both the RSC and the PCG?

RESPONSE:

The sample weights are explained in the LAFANS codebook (DRU-2400/2, the PDF file) starting on page 41. Please read the sample weights section for specific details on the LAFANS sampling weights.

The weights are designed for different analysis samples. If you are working only with RSCs, use the WGTRSC weight. If you are using a sample that pools RSCs and SIBs, use the WGTKID weight. If you are using a sample of PCGs only (this includes PCG-only respondents and those RSAs who were also selected as the PCG), use the WGTPCG weight. If you are using a sample of all adult respondents (all RSAs and all PCGs, remembering that some respondents are both an RSA and a PCG), use the WGTADLT weight.
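The same correspondence between analysis sample and weight can be sketched as a lookup keyed on the set of respondent types in the sample. This is a hypothetical Python helper, not part of the L.A.FANS release; the RSA-only entry (WGTRSA) follows the codebook guidance that RSA-only analyses use wgtrsa, and any combination not listed raises an error rather than guessing.

```python
# Hypothetical mapping from the respondent types in an analysis sample to the
# L.A.FANS weight variable. An RSA who was also selected as the PCG should be
# coded "PCG" when building a PCG-only sample.
WEIGHT_FOR_SAMPLE = {
    frozenset({"RSC"}): "WGTRSC",
    frozenset({"RSC", "SIB"}): "WGTKID",
    frozenset({"PCG"}): "WGTPCG",
    frozenset({"RSA"}): "WGTRSA",
    frozenset({"RSA", "PCG"}): "WGTADLT",
}


def choose_weight(roles):
    """Return the weight variable for a sample made up of the given roles."""
    key = frozenset(roles)
    if key not in WEIGHT_FOR_SAMPLE:
        raise ValueError(f"no weight described for a sample of {sorted(key)}")
    return WEIGHT_FOR_SAMPLE[key]
```

For example, `choose_weight({"RSC", "SIB"})` returns `"WGTKID"`, while a sample of SIBs alone raises an error because no weight is described for it.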

There is no separate weight for a combined RSC and PCG analysis; the weights for either the RSC or the PCG are what you need. This is a classic issue in sampling and demography. If your analysis is based on a sample of mothers (or PCGs, to be more precise) and that is the group you want to make inferences about, you should start with the PCG sample, add in information on RSCs, and use the PCG weights. If, on the other hand, your analysis is based on children and that is the group you want to make inferences about, you should start with the RSC (or the RSC/SIB) sample, add in information from the PCG for each child, and use the child weights. Before starting any analysis, I strongly recommend thinking carefully about who your sample should be.

My guess is that you want to make inferences about children. In that case, as I said above, your analysis should be based on the child sample and you should use child weights.

For children, you have two choices in LAFANS, as you know. You can use only the RSCs, who are a random sample of one child per household; in that case, use the RSC weights. Alternatively, you can use both RSCs and SIBs, in which case we have provided a “child weight” that corrects for the sampling of children. If you use both RSCs and SIBs, however, you should also use a statistical procedure that accounts for the fact that you have up to two children per household.


QUESTION: I’d like to know more about the nonresponse adjustment used in L.A.FANS. I’ve been reading the documentation, and all I could find was a paragraph in the L.A.FANS Codebook (page 46) saying that you used a raking procedure on all the two-way cross-classifications of age, gender, and race/ethnicity.
Is there more information on these procedures? Where do these three variables come from? Were they completed by interviewers during the screener? Are they self-reports by the respondents, or observations made by the interviewers without asking the respondents?
During the screener, the interviewers are asked to complete these data themselves. Since you have these interviewer observations for the 7,638 households screened, I wanted to know whether these age/sex/race variables were the ones used in the nonresponse adjustment. Or did you use the information provided by the respondent on the roster, or on the adult questionnaire? Since you need information on both respondents and nonrespondents to do a nonresponse adjustment, I thought you were using the screener data. Am I wrong?

RESPONSE:

There is no other documentation about the L.A.FANS weighting procedures, other than what’s provided in the Codebook.

Just to elaborate on the write-up there: the goal of the weighting procedure was to make the marginals of the survey’s two-way tables (formed from the three variables age, sex, and race/ethnicity) match the corresponding marginal distributions from the 2000 Census for Los Angeles County as a whole. Raking was necessary to get all of these marginals to match at the same time. The weighting procedure thus incorporates an implicit adjustment for nonresponse: because the weighted survey totals are scaled to the Census totals, groups that responded at lower rates end up with larger weights.
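Raking (iterative proportional fitting) can be sketched in a few lines: cycle through the margins, rescaling the weights so that each weighted margin matches its control total, until no further adjustment is needed. This is a generic, minimal Python illustration, not the code used to build the L.A.FANS weights; the function name, the dictionary layout, and the example categories and totals are hypothetical. Each margin is given as a function that maps a respondent to a cell, so a two-way table such as age by sex is simply a margin whose cells are (age, sex) pairs.

```python
from collections import defaultdict


def rake(units, weights, margins, max_iter=100, tol=1e-10):
    """Iterative proportional fitting (raking) of survey weights.

    units   : list of dicts of respondent characteristics
    weights : starting weights, one per unit
    margins : list of (cell_of, targets) pairs; cell_of maps a unit to a
              cell label and targets maps each label to its control total
    Returns new weights whose weighted totals match every margin (to within
    tol, if the procedure converges within max_iter sweeps).
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for cell_of, targets in margins:
            # Current weighted total in each cell of this margin.
            totals = defaultdict(float)
            for unit, wi in zip(units, w):
                totals[cell_of(unit)] += wi
            # Scale every unit so that this margin matches its control totals.
            for i, unit in enumerate(units):
                factor = targets[cell_of(unit)] / totals[cell_of(unit)]
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
                w[i] *= factor
        if largest_adjustment < tol:
            break
    return w


# Made-up example: four respondents, one per sex-by-age cell.
units = [{"age": a, "sex": s} for s in ("M", "F") for a in ("young", "old")]
margins = [
    (lambda u: u["sex"], {"M": 30.0, "F": 70.0}),
    (lambda u: u["age"], {"young": 40.0, "old": 60.0}),
]
raked = rake(units, [1.0] * 4, margins)
# raked is approximately [12, 18, 28, 42]
```

For two-way margins like those described above, each `cell_of` would return a pair, for example `lambda u: (u["age"], u["sex"])`, with the targets taken from the corresponding Census cross-tabulation.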

The three variables (age, sex, and race/ethnicity) were collected through interviews with L.A.FANS respondents; in other words, respondents reported on their own characteristics, and parents (typically mothers) reported on their children’s characteristics. The matching population figures were collected in the 2000 Census and reported in the summary files as cross-tabulations.

In constructing the weights, we used self-reported (or parent-reported, in the case of children) information on sex, age, and race/ethnicity from the adult (or parent) questionnaire. In cases where self-reported information was missing, we filled it in with information from the Roster. (Note that item non-response levels on race/ethnicity in the questionnaires were quite low.)

We also had interviewer observations of sampled respondents’ race/ethnicity, but these were not used for weighting. In fact, we have several different measures of race/ethnicity (roster, self-report, self-reported “best” race if multiple races were reported, interviewer observation, etc.).

Information from the screener was not used in constructing any of the weights. Recall that the screener respondent was not necessarily one of the sampled respondents. For non-response cases, we don’t have any further information on household composition to try to figure out the type of respondent and for many screener refusals and incompletes we didn’t have any interviewer observation information (because the interviewers did not complete it).


QUESTION: Low-SES neighborhoods were oversampled. Does this need to be taken into account in an analysis, or did you correct for it in some way?

RESPONSE:

Please read the section of the L.A.FANS codebook on sampling weights. The sampling weights take care of both the oversample by neighborhood poverty level and the oversample of households with children. It may also help to read The Design of a Multilevel Survey of Children, Families, and Communities.
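To see why the weights matter, consider a deliberately simple, hypothetical design in which households in poor tracts make up 20 percent of the population but half of the sample. In this minimal Python sketch, weighting each sampled household by the inverse of its selection probability recovers the population share, while the unweighted mean is pulled toward the oversampled stratum. The real L.A.FANS weights also include further adjustments (for example, for nonresponse and to match Census totals) that this toy example leaves out.

```python
def weighted_mean(values, weights):
    """Ratio (Hajek) estimator of a population mean."""
    total_w = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_w


# Hypothetical design: 200 households in poor tracts, 800 elsewhere;
# 100 sampled from each stratum, so poor households are oversampled.
strata = {"poor": (200, 100), "nonpoor": (800, 100)}  # (population N_h, sample n_h)
values, weights = [], []
for name, (N_h, n_h) in strata.items():
    w_h = N_h / n_h  # inverse of the selection probability n_h / N_h
    for _ in range(n_h):
        values.append(1.0 if name == "poor" else 0.0)  # 1 = lives in a poor tract
        weights.append(w_h)

unweighted = sum(values) / len(values)      # 0.5, distorted by the oversample
weighted = weighted_mean(values, weights)   # 0.2, the true population share
```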


QUESTION: I have been using Level 2 LAFANS data, and we have some concerns about design effects and generalizability to the larger population. Although I’ve looked through the codebooks, I don’t see any variables or weights that could be used to account for design effects or to weight the data to match the characteristics of the larger L.A. population. Are these options available?

RESPONSE:

If you look at the L.A.FANS Codebook produced by RAND on pp. 41-46 you will find a description of the sample weights that account for the sampling design of the survey. If you are using special estimation procedures, such as “svy” commands in Stata, you can use the weights in conjunction with the stratifying and clustering variables (also described in the manual) to obtain corrected standard errors.


QUESTION: Is there a variable that incorporates the stratification in the sample design? Was any clustering used in the sample design, and if so, which variable contains the sample cluster?
I am using Stata for my analysis and want to ensure that my standard errors are correct. I am therefore using the sample weight variable (in my case, the sample weight for the RSC) and the stratification variable (POVCAT) to adjust my analysis. However, I am wondering which variable incorporates the clustering in the sample design.

RESPONSE:

The POVCAT variable in the MODSTAT1 and ROSTHH1 files contains the sampling strata in the sample design. The POVCAT variable is also described in Chapter 4 of the main LAFANS codebook under the “Community Characteristics” subsection.

The sample design is described in detail in:

Sastry, Narayan, Bonnie Ghosh-Dastidar, John Adams, and Anne R. Pebley
The Design of a Multilevel Survey of Children, Families, and Communities: The Los Angeles Family and Neighborhood Survey (2003)

The short answer to your question is that there were two oversamples: (1) we oversampled in poor and very poor strata, and (2) we oversampled households with children. But please see the publication listed above for a more complete description.

If you’re using Stata for your analysis, you might want to use the “svy” set of routines, which would involve specifying the following sample design parameters:

strata: POVCAT
psu: TRACT
pweight: appropriate individual or household weight
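For intuition about what specifying strata, PSUs, and weights buys you, here is a minimal Python sketch of the Taylor-linearization standard error of a weighted mean under a stratified, clustered design, the kind of calculation that design-based routines such as Stata's svy commands perform. It is an illustration, not Stata's implementation: it assumes PSUs were sampled with replacement within strata, ignores finite population corrections, and the function name and example data are hypothetical.

```python
import math
from collections import defaultdict


def svy_mean(y, w, strata, psu):
    """Weighted mean with a Taylor-linearized standard error for a
    stratified, clustered design (with-replacement PSU approximation,
    no finite population correction)."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized contribution of each observation, summed within its PSU.
    psu_totals = defaultdict(float)
    for yi, wi, h, j in zip(y, w, strata, psu):
        psu_totals[(h, j)] += wi * (yi - mean) / total_w
    # Group the PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)
    # Between-PSU variability within each stratum, summed over strata.
    var = 0.0
    for h, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has fewer than 2 PSUs")
        zbar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in zs)
    return mean, math.sqrt(var)


# Made-up data: one stratum, four tracts, two respondents per tract, with
# outcomes that are identical within each tract.
y = [1, 1, 0, 0, 1, 1, 0, 0]
w = [1.0] * 8
strata = ["A"] * 8
tracts = ["t1", "t1", "t2", "t2", "t3", "t3", "t4", "t4"]
mean, se_tract = svy_mean(y, w, strata, tracts)
_, se_person = svy_mean(y, w, strata, list(range(8)))
# mean is 0.5; se_tract (about 0.289) exceeds se_person (about 0.189)
```

Passing each respondent's own identifier as `psu` treats every person as a separate cluster, which amounts to a stratum-only correction; when outcomes are similar within tracts, that standard error comes out smaller than the one that clusters on TRACT.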

One thing to note is that if you are using all children (i.e., both RSCs and SIBs) in your analysis, you will have an additional level of clustering. A similar situation occurs if you are using all adults (i.e., both RSAs and PCGs). There are a number of ways to account for this additional level of clustering; you should probably consult a statistician for specific guidance that best applies to your situation.

Note that to use TRACT, you must have either restricted version 1 data which has the pseudo tract variable TRACTX or the restricted version 2 data that has the actual census tract in TRACTH90.


QUESTION: I want to use the LAFANS data to calculate the number of native and immigrant adults in Los Angeles County. I have identified citizens and immigrants in the data but am not sure how to use the weights to compute total population numbers.

RESPONSE:

LAFANS excludes all households that speak neither English nor Spanish, so weighted tabulations of natives vs. immigrants from LAFANS will not give you an estimate of the native vs. immigrant distribution in L.A. County. For example, you would be missing many Asian and Russian immigrants.

If you were focusing just on natives and immigrants from English- or Spanish-speaking households, you probably could use weighted tabs from the LAFANS and apply them to census numbers based on English- and Spanish-speaking households only.
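As a rough sketch of that calculation, the weighted LAFANS share can be multiplied by a Census count for the same universe. The function below is hypothetical Python, and `census_total` must be supplied by the user from Census tables restricted to adults in English- or Spanish-speaking households; applying the share to all of L.A. County would reintroduce the coverage problem described above.

```python
def scale_share_to_census(is_immigrant, weights, census_total):
    """Return (immigrant, native) counts implied by a weighted LAFANS share.

    is_immigrant : booleans, one per adult respondent
    weights      : the matching adult sampling weights
    census_total : Census count for the same universe (user-supplied)
    """
    immigrant_weight = sum(w for flag, w in zip(is_immigrant, weights) if flag)
    share = immigrant_weight / sum(weights)
    return share * census_total, (1 - share) * census_total


# Made-up inputs: a weighted immigrant share of 3/8 applied to a total of 8000.
immigrants, natives = scale_share_to_census([True, False, True, False], [1, 1, 2, 4], 8000)
# immigrants == 3000.0, natives == 5000.0
```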


QUESTION: I am working with just the RSAs and want to apply the post-stratification sample weights.
My first question is: Do I use the wgtrsa variable? To apply the sample weights in Stata, I also need to specify the PSU (primary sampling unit) and strata.
My second question is: Do you know which variables correspond to these in the public use data? I would imagine some variable corresponding to census tract or pseudo census tract would be the PSU, but I can’t find either in the public use data.

RESPONSE:

I’d suggest rereading the section of the codebook on weights. As the codebook says, you use the wgtrsa weight if you are using the RSA-only sample and wgtadlt if you are using all adults. The weights all account for the strata oversampling, the household selection probabilities by tract, and the tract-specific rates of oversampling of households with children and of household nonresponse. There are at least two ways to apply the sample weights in Stata: you can use the svy procedures, or you can use the weight= option with each procedure (e.g., logit). With the weight= option, you do not need to specify the strata or PSUs to get correctly weighted estimates, since the selection probabilities they imply are already built into the weights.

If you want to use the svy procedures and specify both the PSUs and the strata, you have to use the restricted version 1 data and not the public use data. Remember that the public use data do not identify which respondents are in the same census tract (PSU); that’s the reason the data are not restricted. The public use data do, however, provide information on which stratum a household is in. This variable is povcat.

So basically you have two options: (1) use the public use data with the weights and the weight= option, correcting the standard errors for stratum only (using povcat), or (2) use the version 1 restricted data with the svy commands, fully correcting the standard errors for both PSU and stratum.


QUESTION: The codebook for Wave 1 indicates that there is no reliable household weight, so it is not being released: “household weights are not being released because we were unable to construct satisfactory control totals for use in adjusting the weights to match the 2000 Census” (p. 41).
I did find a household weight (WGTHH) in the private data, however. Is this a good measure that I can use to compare to Census data? Can you tell me why the household weight wasn’t constructed for public use, or just a bit more detail behind the statement on page 41 in the codebook?

RESPONSE:

Providing household weights significantly increases the probability of deductive identification of respondents. Therefore, we are not able to release this information in the public use data.

We believe that the household weight in the restricted data is a good weight, and you should feel comfortable using it. Given the risk of deductive disclosure, and the fact that the LAFANS data were designed to be used as individual-level rather than household-level data, we originally decided not to release a household weight at all. Subsequently, however, we found that users of the restricted data were interested in having it, so we have released it in the restricted data. As with other variables in the restricted data, you must comply with the security restrictions you agreed to when using this variable.