mental health services,
substance abuse treatment,
Date of Collection:
Unit of Observation:
The civilian, noninstitutionalized population of the
United States aged 12 and older, including residents of
noninstitutional group quarters such as college dormitories, group
homes, shelters, rooming houses, and civilians dwelling on military
installations.
Data Collection Notes:
Data were collected and prepared for release by
Research Triangle Institute, Research Triangle Park, NC.
Prior to the 2002 survey, this series was titled National Household Survey on
Drug Abuse.
Although the design of the 2003 survey is similar to the design of the 1999 through 2001 surveys, important methodological changes introduced in 2002 affect the 2003 estimates. Since 2002, each NSDUH respondent has been given an incentive payment of $30. This change resulted in an improvement in the survey response rate. In addition, in 2002 new population data from the 2000 decennial Census became available for use in NSDUH sample weighting procedures. Therefore, the data from 2002 and later should not be compared with data collected in 2001 or earlier to assess changes over time.
For selected variables, statistical imputation was performed
following logical inference to replace missing responses. These
variables are identified in the codebook as "...LOGICALLY ASSIGNED"
for the logical procedure, or by the designation "IMPUTATION-REVISED"
in the variable label when the statistical procedure was also
performed. The names of statistically imputed variables begin with the
letters "IR." For each imputation-revised variable, a corresponding
imputation indicator variable indicates whether a case's value on the
variable resulted from an interview response or was imputed. Missing
values for some demographic variables were imputed by the unweighted
hot-deck technique used in previous surveys. Beginning in 1999,
imputation of missing values for most variables was accomplished
using predictive mean neighborhoods (PMN), a new procedure developed
specifically for this survey. Both the hot-deck and PMN imputation
procedures are described in the codebook.
To protect the privacy of
respondents, all variables that could be used to identify individuals
have been encrypted or collapsed in the public use file. To further
ensure respondent confidentiality, the data producer used data
substitution and deletion of state identifiers and a subsample of
records in the creation of the public use file.
Published estimates may not be exactly reproducible from the variables
in the public use file due to the disclosure protection procedures
that were implemented.
The data definition and dictionary files
for Stata are designed to be compatible with StataSE, Version 8. This
is a large data file requiring that approximately 250 megabytes of
Random Access Memory be allocated to Stata. Operations within Stata,
including conversion of the ASCII data to Stata format, are likely to
be slow. Analysts may wish to download subsets of data from the
SAMHDA Data Analysis System (DAS) for use with Stata.
Since 1999, the survey sample has employed a 50-State design with an independent, multistage area probability sample for each of the 50 States and the District of Columbia.
A coordinated five-year sample
design was developed for 1999 through 2003. Although there is no
overlap with the 1998 sample, the design facilitates overlap in the
first-stage units (area segments) between each two successive years in
the five-year design. This design increases the precision of estimates
in year-to-year trend analysis. The sample is stratified on multiple
levels, beginning with states. Eight states are considered large
sample states and contribute approximately 3,600 respondents per
state. The remaining states are sampled to yield 900 respondents per
state. The second level of stratification divides states into Field
Interviewer (FI) Regions. The third level of stratification divides FI
regions into area segments consisting of adjacent Census blocks.
These area segments were used as the primary sampling units. Dwelling
units in area segments were listed in a standardized order and were
selected by systematic sampling. Field interviewers visited each
sample address to determine dwelling unit eligibility, to list all
eligible persons at the address, and to conduct interviews. Each
respondent who completed a full interview was given a $30 cash
payment. Persons were selected from the address roster using a
handheld computer. To improve the precision of estimates, the sample
allocation process targeted five age groups: 12-17, 18-25, 26-34,
35-49, and 50 and older. The size measures used in selecting the area
segments were coordinated with the dwelling unit and person selection
process so that a nearly self-weighting sample could be achieved in
each of the five age groups. The sample design included approximately
equal numbers of persons in the 12-17, 18-25, and 26 and older age
groups. The achieved sample for the 2003 NSDUH was 67,784 persons. The
public use file contains 55,230 records due to a subsampling step used
in the disclosure protection procedures. Minimum item response
requirements were defined for cases to be retained for weighting and
further analysis (i.e., "usable" cases). These requirements, as well
as full sampling methodology, are detailed in the codebook.
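The sample size figures quoted above can be cross-checked with a little arithmetic. The sketch below assumes that the District of Columbia is counted among the "remaining states" sampled at roughly 900 respondents; that reading is an assumption, not something the description states explicitly.

```python
# Figures taken from the sample description above.
LARGE_STATES = 8          # large sample states
LARGE_TARGET = 3600       # approximate respondents per large state
SMALL_TARGET = 900        # approximate respondents per remaining state
ACHIEVED_SAMPLE = 67784   # achieved 2003 NSDUH sample
PUBLIC_USE_RECORDS = 55230

# Assumption: 50 states plus the District of Columbia = 51 sampling units.
TOTAL_UNITS = 51
small_units = TOTAL_UNITS - LARGE_STATES

# Approximate design target implied by the per-state figures.
target_total = LARGE_STATES * LARGE_TARGET + small_units * SMALL_TARGET

# Share of the achieved sample retained in the public use file.
retained_share = PUBLIC_USE_RECORDS / ACHIEVED_SAMPLE

print(target_total)               # 67500
print(round(retained_share, 2))   # 0.81
```

Under that assumption the implied target of 67,500 is close to the achieved sample of 67,784, and the public use file retains about 81 percent of the achieved sample after the disclosure protection subsampling.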
Due to unequal selection probabilities at multiple stages of sample selection and various adjustments, such as those for nonresponse
and poststratification, the 2003 NSDUH sample design is not
self-weighting. Analysts are advised to use the final sample weight
when attempting to use the 2003 NSDUH data to draw inferences about
the target population or any subdomains of the target population. All
estimates published in SAMHSA reports (such as the Results from the
2003 NSDUH) are weighted using the final analysis weight for the full
sample (ANALWT). For the public use file, the corresponding final sample weight
is denoted as ANALWT_C, with the "C" denoting confidentiality
protection. This sample weight is the number of persons in the target
population that each record on the file represents. Note that the
sum of ANALWT_C, over all records on the data file, represents an
estimate of the total number of people in the target population.
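As a minimal sketch of how the final sample weight enters an estimate, the example below computes a weighted population total and a weighted proportion. Only the name ANALWT_C comes from the description above; the `used_substance` flag, the record values, and the function names are hypothetical and chosen purely for illustration.

```python
# Hypothetical miniature "file": four records with made-up weights and a
# made-up 0/1 analysis flag. Only the variable name ANALWT_C is real.
records = [
    {"ANALWT_C": 1200.0, "used_substance": 1},
    {"ANALWT_C": 800.0, "used_substance": 0},
    {"ANALWT_C": 1500.0, "used_substance": 0},
    {"ANALWT_C": 500.0, "used_substance": 1},
]


def weighted_total(rows, weight="ANALWT_C"):
    """Sum of weights: an estimate of the target population size."""
    return sum(r[weight] for r in rows)


def weighted_proportion(rows, flag, weight="ANALWT_C"):
    """Weighted share of the population for which `flag` equals 1."""
    hits = sum(r[weight] for r in rows if r[flag] == 1)
    return hits / weighted_total(rows, weight)


population_estimate = weighted_total(records)
prevalence = weighted_proportion(records, "used_substance")
unweighted = sum(r["used_substance"] for r in records) / len(records)
```

In this toy data the unweighted share is 0.5 while the weighted share is 0.425, which illustrates why unweighted tabulations of a sample that is not self-weighting can mislead.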
Mode of Data Collection:
audio computer-assisted self interview (ACASI),
computer-assisted personal interview (CAPI)
The study yielded a weighted screening response rate
of 91 percent and a weighted interview response rate for the Computer
Assisted Interview (CAI) of 77 percent.
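One common convention, not stated in this description, is to summarize the two stages with an overall response rate equal to the product of the screening and interview rates. The sketch below applies that convention to the figures above; the resulting number is an illustration and is not a published NSDUH statistic.

```python
# Weighted response rates quoted above, as proportions.
screening_rate = 0.91
interview_rate = 0.77

# Assumption: overall rate = screening rate x interview rate.
overall_rate = screening_rate * interview_rate

print(f"{overall_rate:.1%}")  # roughly 70 percent
```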
Extent of Processing: ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of
disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major
statistical software formats as well as standard codebooks to accompany the data. In addition to
these procedures, ICPSR performed the following processing steps for this data collection:
Performed consistency checks.
Created online analysis version with question text.
Checked for undocumented or out-of-range codes.
Restrictions: Users are reminded by the United States Department of
Health and Human Services that these data are to be used solely for
statistical analysis and reporting of aggregated information and not for
the investigation of specific individuals or treatment facilities.