mental health services,
substance abuse treatment,
Date of Collection:
Unit of Observation:
The civilian, noninstitutionalized population of the
United States aged 12 and older, including residents of
noninstitutional group quarters such as college dormitories, group
homes, shelters, rooming houses, and civilians dwelling on military
Data Collection Notes:
Data were collected and prepared for release by
Research Triangle Institute, Research Triangle Park, North Carolina.
Prior to the 2002 survey, this series was titled National
Household Surveys on Drug Abuse.
Although the design of the 2005 survey is similar to the design of the 1999 through 2001 surveys, important methodological changes introduced in 2002 affect the 2005 estimates. Since 2002, each NSDUH respondent has been given an incentive payment of $30, which improved the survey response rate. In addition, new population data from the 2000 decennial Census became available in 2002 for use in NSDUH sample weighting procedures. Therefore, data from 2002 and later should not be compared with data collected in 2001 or earlier to assess changes over time.
For selected variables, statistical
imputation was performed following logical inference to replace
missing responses. These variables are identified in the codebook as
"...LOGICALLY ASSIGNED" for the logical procedure, or by the
designation "IMPUTATION-REVISED" in the variable label when the
statistical procedure was also performed. The names of statistically
imputed variables begin with the letters "IR". For each
imputation-revised variable, a corresponding imputation indicator
variable indicates whether a case's value on the variable resulted
from an interview response or was imputed. Missing values for some
demographic variables were imputed by the unweighted hot-deck
technique used in previous surveys. Beginning in 1999, imputation of
missing values for most variables was accomplished using
predictive mean neighborhoods (PMN), a new procedure developed
specifically for this survey. Both the hot-deck and PMN imputation
procedures are described in the codebook.
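As a rough illustration of the unweighted hot-deck idea mentioned above, the sketch below fills each missing value with the value of a randomly chosen donor from the same imputation class and records whether the value was reported or imputed. The class variable, field names, indicator naming, and toy records are hypothetical. The actual NSDUH hot-deck and PMN procedures are those documented in the codebook and are not reproduced here.

```python
import random

def hot_deck_impute(records, field, class_field, seed=0):
    """Unweighted random hot-deck within imputation classes.

    Returns new records with `field` filled in where a donor exists and
    an indicator `<field>_imputed` (hypothetical name) marking imputed cases.
    """
    rng = random.Random(seed)
    # Collect donor values (non-missing responses) for each imputation class.
    donors = {}
    for rec in records:
        if rec[field] is not None:
            donors.setdefault(rec[class_field], []).append(rec[field])

    completed = []
    for rec in records:
        new = dict(rec)
        new[field + "_imputed"] = False
        if rec[field] is None:
            pool = donors.get(rec[class_field])
            if pool:  # the value stays missing if the class has no donors
                new[field] = rng.choice(pool)
                new[field + "_imputed"] = True
        completed.append(new)
    return completed

# Toy data: age group serves as the (hypothetical) imputation class.
toy = [
    {"age_group": "12-17", "income_cat": 2},
    {"age_group": "12-17", "income_cat": None},
    {"age_group": "18-25", "income_cat": 4},
    {"age_group": "18-25", "income_cat": None},
]
filled = hot_deck_impute(toy, "income_cat", "age_group")
```

The indicator plays the same role as the imputation indicator variables described above: it lets an analyst separate interview responses from imputed values.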
To protect the privacy
of respondents, all variables that could be used to identify
individuals have been encrypted or collapsed in the public use
file. To further ensure respondent confidentiality, the data producer
used data substitution, deleted state identifiers, and retained only a
subsample of records in creating the public use file.
Previously published estimates may not be exactly reproducible from
the variables in the public use file due to the disclosure protection
procedures that were implemented.
The data definition and
dictionary files for Stata are designed to be compatible with StataSE,
Version 8. This is a large data file requiring that approximately 250
megabytes of Random Access Memory be allocated to Stata. Operations
within Stata, including conversion of the ASCII data to Stata format,
are likely to be slow. Analysts may wish to download subsets of data
from the SAMHDA Data Analysis System (DAS) for use with Stata.
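For analysts who work from the ASCII file directly, one way to limit memory use is to stream the file and keep only the columns needed. The sketch below assumes a fixed-width layout. The column positions and the names CASEID and AGEGRP are hypothetical placeholders; the real locations must be taken from the setup and dictionary files.

```python
import io

# Hypothetical layout: variable name -> (start, end) zero-based slice positions.
# Real column locations come from the setup/dictionary files, not from here.
LAYOUT = {"CASEID": (0, 5), "AGEGRP": (5, 6), "ANALWT_C": (6, 16)}

def read_subset(lines, layout, keep):
    """Yield one dict per record containing only the `keep` columns."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        yield {name: line[layout[name][0]:layout[name][1]].strip()
               for name in keep}

# A two-record stand-in for the ASCII data file.
sample = io.StringIO("000013 1234.5678\n000021  987.6543\n")
rows = list(read_subset(sample, LAYOUT, ["AGEGRP", "ANALWT_C"]))
```

Because `read_subset` is a generator, records can be processed one at a time instead of being held in memory all at once.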
Since 1999, the survey sample has employed a 50-State design with an independent, multistage area probability sample for each of the 50 States and the District of Columbia.
The 2005 NSDUH is the
first survey in a coordinated five-year sample design. Although there
is no overlap with the 1999-2004 samples, the coordinated design for
2005 through 2009 facilitated a 50 percent overlap in second-stage
units (area segments [see below]) between each two successive years
from 2005 through 2009. This design was intended to increase the precision of
estimates in year-to-year trend analyses because of the expected
positive correlation resulting from the overlapping sample between
successive survey years. The 2005 design allows for computation of
estimates by state in all 50 states plus the District of Columbia.
States may therefore be viewed as the first level of stratification as
well as a reporting variable. Eight states, referred to as the large
sample states, had a sample designed to yield 3,600 respondents per
state for the 2005 survey. This sample size was considered adequate to
support direct state estimates. The remaining 43 states (which include
the District of Columbia) had a sample designed to yield 900 respondents
per state in the 2005 survey. In these 43 states, adequate data were
available to support reliable state estimates based on small area
estimation (SAE) methodology.
Within each state, sampling strata called state sampling (SS) regions
were formed. Based on a composite size measure, states were partitioned
geographically into roughly equal-sized regions. In other words,
regions were formed such that each area yielded, in expectation,
roughly the same number of interviews during each data collection
period. The eight large sample states were divided into 48 SS regions
each. The remaining states were divided into 12 SS regions each.
Therefore, the partitioning of the United States resulted in the
formation of a total of 900 SS regions. Unlike the 1999 through 2004
surveys, the 2005 through 2009 NSDUHs used Census tracts as the
first-stage sampling units. The first stage of selection began with the
construction of an area sample frame that contained one record for each
Census tract in the United States. If necessary, Census tracts were
aggregated within SS regions until each tract had, at a minimum, 150
dwelling units in urban areas and 100 dwelling units in rural areas.
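The aggregation rule above can be sketched as a simple greedy merge. The thresholds (150 urban, 100 rural) come from the text; the merging order (consecutive tracts in a list) and the handling of an undersized remainder are illustrative assumptions, since the actual procedure is described only in the codebook.

```python
MIN_DWELLING_UNITS = {"urban": 150, "rural": 100}  # thresholds from the text

def aggregate_tracts(tracts, area_type):
    """Greedily merge consecutive tracts until each group meets the minimum.

    `tracts` is a list of (tract_id, dwelling_units) within one SS region.
    Merging neighbours in list order is an illustrative assumption.
    """
    minimum = MIN_DWELLING_UNITS[area_type]
    groups, current, total = [], [], 0
    for tract_id, units in tracts:
        current.append(tract_id)
        total += units
        if total >= minimum:
            groups.append((current, total))
            current, total = [], 0
    if current:
        if groups:
            # Fold an undersized remainder into the last completed group.
            ids, units = groups.pop()
            groups.append((ids + current, units + total))
        else:
            # The whole region falls short of the minimum; keep it as one unit.
            groups.append((current, total))
    return groups

# Hypothetical tract counts for one urban SS region.
region = [("T1", 400), ("T2", 60), ("T3", 70), ("T4", 90), ("T5", 30)]
psus = aggregate_tracts(region, "urban")
```

In this toy region, T1 stands alone and the four small tracts are combined into one unit of 250 dwelling units.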
These Census tracts served as the primary sampling units (PSUs) for the
coordinated five-year sample. One area segment (one or more Census blocks) was selected within each sampled Census tract. In advance of the survey period, specially
trained listers visited each area segment and listed all addresses
for housing units and eligible group quarters units in a prescribed
order. Systematic sampling was used to select the allocated sample of
addresses from each segment. Each respondent who completed a full
interview was given a $30 cash payment as a token of appreciation for
his or her time. To improve the precision of the estimates, the sample
allocation process targeted five age groups: 12 to 17 years, 18 to 25
years, 26 to 34 years, 35 to 49 years, and 50 years or older. The size
measures used in selecting the area segments were coordinated with the
dwelling unit and person selection process so that a nearly
self-weighting sample could be achieved in each of the five age groups.
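The systematic selection of addresses within a listed segment, mentioned above, can be sketched generically as a random start followed by a fixed interval through the ordered list. The address list, sample size, and seed below are hypothetical, and this is a textbook version of the technique rather than the survey's production procedure.

```python
import random

def systematic_sample(units, n, seed=None):
    """Select n units from an ordered list using a random start and fixed interval."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and len(units)")
    rng = random.Random(seed)
    interval = len(units) / n         # sampling interval (may be fractional)
    start = rng.random() * interval   # random start in [0, interval)
    last = len(units) - 1
    # Step through the list at the fixed interval; clamp guards float rounding.
    return [units[min(int(start + i * interval), last)] for i in range(n)]

# Hypothetical segment listing of 120 addresses in the prescribed order.
addresses = [f"addr-{i:03d}" for i in range(120)]
chosen = systematic_sample(addresses, 10, seed=1)
```

Because the listing order is preserved, the selected addresses are spread evenly across the segment.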
The achieved sample size for the 2005 survey was 68,308 persons. The
public use file contains 55,905 records due to a subsampling step used
in the disclosure protection procedures. A key step in the data
processing procedures established the minimum item response
requirements in order for cases to be retained for weighting and
further analysis (i.e., "usable" cases). These requirements, as well
as full sampling methodology, are detailed in the codebook.
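The counts quoted in the sample description can be cross-checked with a few lines of arithmetic. The design target of respondents is derived here from the per-state figures and is not a number stated in the text; the retained share is likewise a derived quantity.

```python
LARGE_STATES, OTHER_STATES = 8, 43   # the 43 include the District of Columbia

# SS regions: 48 per large sample state, 12 per remaining state.
ss_regions = LARGE_STATES * 48 + OTHER_STATES * 12

# Designed yield: 3,600 per large sample state, 900 per remaining state.
target_respondents = LARGE_STATES * 3600 + OTHER_STATES * 900

achieved, public_use = 68308, 55905   # achieved sample and public use records
retained_share = public_use / achieved

print(ss_regions)                 # 900
print(target_respondents)         # 67500
print(f"{retained_share:.1%}")    # share of achieved cases in the public use file
```

The 900 SS regions match the total given in the text, and the achieved sample of 68,308 is slightly above the derived design target.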
Due to unequal selection probabilities at multiple stages of sample selection and various adjustments, such as those for nonresponse
and poststratification, the 2005 NSDUH sample design is not
self-weighting. Analysts are advised to use the final sample weight
when attempting to use the 2005 NSDUH data to draw inferences about
the target population or any subdomains of the target population.
All estimates published in SAMHSA reports (such as the results from
the 2005 NSDUH) are weighted using the final analysis weight for the
full sample (ANALWT). For the public use file, the corresponding final sample
weight is denoted as ANALWT_C, with the "C" denoting confidentiality
protection. This sample weight is the number of persons in the target
population that each record on the file represents. Note that the
sum of ANALWT_C, over all records on the data file, represents an
estimate of the total number of people in the target population.
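A minimal sketch of how the weight enters point estimates follows. ANALWT_C is the weight named above; the indicator `used_past_month` and the four toy records are hypothetical. Only point estimates are shown; standard errors require the design information described in the codebook, which this sketch does not handle.

```python
def weighted_total(records, weight="ANALWT_C"):
    """Sum of weights: an estimate of the number of persons represented."""
    return sum(r[weight] for r in records)

def weighted_proportion(records, indicator, weight="ANALWT_C"):
    """Weighted share of the population with indicator == 1."""
    total = weighted_total(records, weight)
    hits = sum(r[weight] * r[indicator] for r in records)
    return hits / total

# Toy records with made-up weights and a hypothetical 0/1 indicator.
toy = [
    {"ANALWT_C": 1500.0, "used_past_month": 1},
    {"ANALWT_C": 2500.0, "used_past_month": 0},
    {"ANALWT_C": 4000.0, "used_past_month": 1},
    {"ANALWT_C": 2000.0, "used_past_month": 0},
]
population = weighted_total(toy)                          # 10000.0
weighted = weighted_proportion(toy, "used_past_month")    # 0.55
unweighted = sum(r["used_past_month"] for r in toy) / len(toy)  # 0.5
```

The gap between the weighted and unweighted proportions in the toy data shows why ignoring the weight can bias estimates for a design that is not self-weighting.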
Mode of Data Collection:
audio computer-assisted self interview (ACASI),
computer-assisted personal interview (CAPI)
Strategies for ensuring high rates of participation
resulted in a weighted screening response rate of 91 percent and a
weighted interview response rate for the computer-assisted interview (CAI) of 76 percent. (Note that
these response rates reflect the original sample, not the subsampled
data file referenced in this document.)
Extent of Processing: ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of
disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major
statistical software formats as well as standard codebooks to accompany the data. In addition to
these procedures, ICPSR performed the following processing steps for this data collection:
Performed consistency checks.
Created online analysis version with question text.
Checked for undocumented or out-of-range codes.
Restrictions: Users are reminded by the United States Department of
Health and Human Services that these data are to be used solely for
statistical analysis and reporting of aggregated information and not for
the investigation of specific individuals or treatment facilities.