mental health services,
substance abuse treatment,
Date of Collection:
Unit of Observation:
The civilian, noninstitutionalized population of the
United States aged 12 and older, including residents of
noninstitutional group quarters such as college dormitories, group
homes, shelters, rooming houses, and civilians dwelling on military
Data Collection Notes:
Users are advised to review the errata file prior
to conducting any analyses.
Data were collected and prepared for
release by Research Triangle Institute, Research Triangle Park,
The National Household Survey on Drug Abuse survey administration and sample design changed with the implementation of the 1999 survey. Since 1999, the survey sample has employed a 50-State design with an independent, multistage area probability sample for each of the 50 States and the District of Columbia. Therefore, estimates produced from the 1999, 2000, and 2001 surveys are not comparable to those produced from the 1998 and earlier surveys.
For selected variables, statistical
imputation was performed following logical inference to replace
missing responses. These variables are identified in the codebook as
"...LOGICALLY ASSIGNED" for the logical procedure, or by the
designation "IMPUTATION-REVISED" in the variable label when the
statistical procedure was also performed. The names of statistically
imputed variables begin with the letters "IR". For each
imputation-revised variable there is a corresponding imputation
indicator variable that indicates whether a case's value on the
variable resulted from an interview response or was imputed. Missing
values for some demographic variables were imputed by the unweighted
hot-deck technique used in previous NHSDAs. Beginning in 1999,
imputation of missing values for many other variables was accomplished
using predictive mean neighborhoods (PMN), a new procedure developed
specifically for the NHSDA. Both the hot-deck and PMN imputation
procedures are described in the codebook.
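As a rough illustration of the hot-deck idea and of an imputation indicator, the following minimal Python sketch fills each missing value with the most recent reported value in a file that is assumed to be sorted so that neighboring records are similar. This is a generic sequential hot deck, not the NHSDA procedure, which is documented in the codebook; the function name, the sample values, and the 0/1 coding of the indicator are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def sequential_hot_deck(
    values: Sequence[Optional[int]],
) -> Tuple[List[Optional[int]], List[int]]:
    """Fill each missing value (None) with the most recent reported value.

    Assumes the records are already sorted so that neighbors are similar.
    Returns the filled values and a 0/1 indicator (1 = imputed).
    """
    filled: List[Optional[int]] = []
    imputed_flag: List[int] = []
    donor: Optional[int] = None  # last reported value seen so far
    for v in values:
        if v is None and donor is not None:
            filled.append(donor)      # borrow the donor's value
            imputed_flag.append(1)
        else:
            filled.append(v)          # reported value, or still missing if no donor yet
            imputed_flag.append(0)
            if v is not None:
                donor = v
    return filled, imputed_flag


# Hypothetical, already-sorted responses; None marks a missing response.
reported = [3, None, 2, 2, None, None, 1]
ir_values, indicator = sequential_hot_deck(reported)
```

In this sketch, `ir_values` plays the role of an imputation-revised variable and `indicator` plays the role of its companion indicator variable, which lets an analyst separate interview responses from imputed values.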
To protect the privacy
of respondents, all variables that could be used to identify
individuals have been encrypted or collapsed in the public use
file. To further ensure respondent confidentiality, the data producer
used data substitution, removed state identifiers, and dropped a
subsample of records in the creation of the public use file.
Previously published estimates may not be exactly reproducible from
the variables in the public use file due to the disclosure protection
procedures that were implemented.
The data definition and
dictionary files for Stata are designed to be compatible with Stata/SE,
Version 8. This is a large data file requiring that approximately 250
megabytes of Random Access Memory be allocated to Stata. Operations
within Stata, including conversion of the ASCII data to Stata format,
are likely to be slow. Analysts may wish to download subsets of data
from the SAMHDA Data Analysis System (DAS) for use with Stata.
Since 1999, a multistage area probability sample has been used for each of the 50 states and the District of Columbia. A coordinated five-year sample
design was developed for 1999 through 2003. Although there is no
overlap with the 1998 sample, the design facilitates overlap in the
first-stage units (area segments) between each two successive years in
the five-year design. This design increases the precision of estimates
in year-to-year trend analysis. The sample is stratified on multiple
levels, beginning with states. Eight states are considered large
sample states and contribute approximately 3,600 respondents per
state. The remaining states are sampled to yield 900 respondents per
state. The second level of stratification divides states into Field
Interviewer (FI) Regions. The third level of stratification divides
FI regions into area segments consisting of adjacent Census
blocks. These area segments were used as the primary sampling
units. Dwelling units in area segments were listed in a standardized
order and were selected by systematic sampling. Field interviewers
visited each sample address to determine dwelling unit eligibility, to
list all eligible persons at the address, and to conduct
interviews. Persons were selected from the address roster using a
handheld computer. To improve the precision of estimates, the sample
allocation process targeted five age groups: 12-17, 18-25, 26-34,
35-49, and 50 and older. The size measures used in selecting the area
segments were coordinated with the dwelling unit and person selection
process so that a nearly self-weighting sample could be achieved in
each of the five age groups. The sample design included approximately
equal numbers of persons in the 12-17, 18-25, and 26 and older age
groups. The 2001 file also includes a boosted sample for New York City
and the surrounding area to provide greater precision in analysis of
the effects of the events of September 11, 2001. The achieved sample
for the 2001 NHSDA was 68,929 persons. The public use file has 55,561
records due to a subsampling step used in the disclosure protection
procedures. Minimum item response requirements were defined for cases
to be retained for weighting and further analysis (i.e., "usable"
cases). These requirements, as well as full sampling methodology, are
detailed in the codebook.
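The systematic selection of dwelling units from an ordered listing can be sketched in Python as follows. This is a minimal, generic version of systematic sampling (random start, fixed skip interval), not the survey's actual selection program; the function name, the segment size of 40, and the sample size of 8 are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(units: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """Select n units from an ordered list using a random start and a fixed interval."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and the number of listed units")
    interval = len(units) / n          # skip interval (may be fractional)
    start = rng.random() * interval    # random start in [0, interval)
    last = len(units) - 1
    # Take the unit at each successive multiple of the interval;
    # min() guards against floating-point rounding at the upper end.
    return [units[min(int(start + i * interval), last)] for i in range(n)]


# Hypothetical segment with 40 listed dwelling units, of which 8 are selected.
listing = [f"DU{i:03d}" for i in range(1, 41)]
selected = systematic_sample(listing, 8, random.Random(2001))
```

Because the listing is in a standardized order, a fixed interval spreads the selected units evenly across the segment, and every listed unit has the same chance of selection.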
Due to unequal selection probabilities at multiple stages of sample selection and various adjustments, such as those for nonresponse and
poststratification, the 2001 NHSDA sample is not self-weighting.
Analysts are advised to use the sample weight when attempting to use
the NHSDA data to draw inferences about the target population or any
subdomain of the target population. All estimates published in SAMHSA
reports (such as the Results from the 2001 NHSDA: Volumes I, II, and
III) are weighted using the final analysis weight for the full sample (ANALWT). For the public use file, the corresponding final sample weight is
denoted as ANALWT_C, with the "C" denoting confidentiality protection.
This sample weight is the number of persons in the target population
that each record on the file represents. Note that the sum of
ANALWT_C, over all records on the data file, represents an estimate of
the total number of people in the target population.
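The role of the weight can be shown with a small Python sketch. Only the variable name ANALWT_C comes from the text above; the four records, their weight values, and the `used_service` indicator are made up for illustration.

```python
from typing import Dict, List

# Hypothetical records: weights and the 0/1 indicator are invented for illustration.
records: List[Dict[str, float]] = [
    {"ANALWT_C": 1500.0, "used_service": 1},
    {"ANALWT_C": 3200.0, "used_service": 0},
    {"ANALWT_C": 800.0, "used_service": 1},
    {"ANALWT_C": 4500.0, "used_service": 0},
]

# Sum of the weights estimates the size of the target population.
population_total = sum(r["ANALWT_C"] for r in records)

# Weighted count and proportion of persons with the characteristic.
weighted_count = sum(r["ANALWT_C"] * r["used_service"] for r in records)
weighted_proportion = weighted_count / population_total

# The unweighted proportion ignores unequal selection probabilities.
unweighted_proportion = sum(r["used_service"] for r in records) / len(records)
```

With these made-up values the weighted proportion is 0.23 while the unweighted proportion is 0.50, which shows how ignoring the weight can distort an estimate when the sample is not self-weighting. The sketch covers point estimates only.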
Mode of Data Collection:
audio computer-assisted self interview (ACASI),
computer-assisted personal interview (CAPI)
The study yielded a weighted screening response rate
of 92 percent and a weighted interview response rate for the Computer
Assisted Interview (CAI) of 73 percent.
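The two rates can be combined into a single overall figure. The sketch below assumes the common convention of multiplying the screening rate by the interview rate; the source does not state an overall rate, so the result is an illustration rather than a published figure.

```python
screening_rate = 0.92  # weighted screening response rate (from the text)
interview_rate = 0.73  # weighted interview response rate (from the text)

# Assumed convention: overall rate = screening rate x interview rate.
overall_rate = screening_rate * interview_rate
print(f"Overall weighted response rate: {overall_rate:.1%}")  # prints 67.2%
```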
Extent of Processing: ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of
disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major
statistical software formats as well as standard codebooks to accompany the data. In addition to
these procedures, ICPSR performed the following processing steps for this data collection:
Performed consistency checks.
Created online analysis version with question text.
Checked for undocumented or out-of-range codes.
Restrictions: Users are reminded by the United States Department of
Health and Human Services that these data are to be used solely for
statistical analysis and reporting of aggregated information and not
for the investigation of specific individuals or treatment