
National Household Survey on Drug Abuse, 2001 (ICPSR 3580)

Alternate Title:  NHSDA 2001

Principal Investigator(s): United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Office of Applied Studies


The National Household Survey on Drug Abuse (NHSDA) series measures the prevalence and correlates of drug use in the United States. The surveys are designed to provide quarterly, as well as annual, estimates. Information is provided on the use of illicit drugs, alcohol, and tobacco among members of United States households aged 12 and older. Questions include age at first use as well as lifetime, annual, and past-month usage for the following drug classes: marijuana, cocaine (and crack), hallucinogens, heroin, inhalants, alcohol, tobacco, and nonmedical use of prescription drugs, including pain relievers, tranquilizers, stimulants, and sedatives. The survey covers substance abuse treatment history and perceived need for treatment, and includes questions from the Diagnostic and Statistical Manual (DSM) of Mental Disorders that allow diagnostic criteria to be applied. Respondents are also asked about personal and family income sources and amounts, health care access and coverage, illegal activities and arrest record, problems resulting from the use of drugs, and needle-sharing. Questions introduced in previous NHSDA administrations were retained in the 2001 survey, including questions asked only of respondents aged 12 to 17. These "youth experiences" items covered a variety of topics, such as neighborhood environment, illegal activities, gang involvement, drug use by friends, social support, extracurricular activities, exposure to substance abuse prevention and education programs, and perceived adult attitudes toward drug use and activities such as school work. Also retained were questions on mental health and access to care, perceived risk of using drugs, perceived availability of drugs, driving behavior and personal behavior, and cigar smoking. Questions on the tobacco brand used most often were introduced with the 1999 survey and have been retained through the 2001 survey. 
Demographic data include gender, race, age, ethnicity, marital status, educational level, job status, veteran status, and current household composition. Questions on the purchase of marijuana were added in 2001.

Series: National Survey on Drug Use and Health (NSDUH) Series

Total size of all files: 1734.7 MB

Study Description


United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Office of Applied Studies. National Household Survey on Drug Abuse, 2001. ICPSR03580-v4. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2013-06-25.


This study was funded by:

  • United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Office of Applied Studies (283-98-9008)

Scope of Study

Subject Terms:   addiction, alcohol, alcohol abuse, alcohol consumption, amphetamines, barbiturates, cocaine, controlled drugs, drinking behavior, drug abuse, drug dependence, drug treatment, drug use, drugs, hallucinogens, heroin, households, inhalants, marijuana, mental health, mental health services, methamphetamine, prescription drugs, sedatives, smoking, stimulants, substance abuse, substance abuse treatment, tranquilizers

Geographic Coverage:   United States

Time Period:  

Date of Collection:  

Unit of Observation:   individual

Universe:   The civilian, noninstitutionalized population of the United States aged 12 and older, including residents of noninstitutional group quarters such as college dormitories, group homes, shelters, rooming houses, and civilians dwelling on military installations.

Data Types:   survey data

Data Collection Notes:

Users are advised to review the errata file prior to conducting any analyses.

Data were collected and prepared for release by Research Triangle Institute, Research Triangle Park, NC.

The survey administration and sample design of the National Household Survey on Drug Abuse changed with the implementation of the 1999 survey. Since 1999, the survey has used a 50-state design with an independent, multistage area probability sample for each of the 50 states and the District of Columbia. Therefore, estimates produced from the 1999, 2000, and 2001 surveys are not comparable to those produced from the 1998 and earlier surveys.

For selected variables, statistical imputation was performed following logical inference to replace missing responses. These variables are identified in the codebook as "...LOGICALLY ASSIGNED" for the logical procedure, or by the designation "IMPUTATION-REVISED" in the variable label when the statistical procedure was also performed. The names of statistically imputed variables begin with the letters "IR". For each imputation-revised variable there is a corresponding imputation indicator variable that indicates whether a case's value on the variable resulted from an interview response or was imputed. Missing values for some demographic variables were imputed by the unweighted hot-deck technique used in previous NHSDAs. Beginning in 1999, imputation of missing values for many other variables was accomplished using predictive mean neighborhoods (PMN), a new procedure developed specifically for the NHSDA. Both the hot-deck and PMN imputation procedures are described in the codebook.
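The imputation indicators make it possible to check how much of an estimate rests on assigned or imputed values. The sketch below tallies, for one imputation-revised variable, how many records carry an interview response versus some other source. The indicator name `IIXXX` and the code `1` for "interview response" are hypothetical placeholders; the actual variable names and indicator codes must be taken from the codebook.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical placeholder: the indicator code assumed to mean "value came
# from the interview response". Check the codebook for the real codes.
RESPONSE_CODE = 1

def imputation_summary(records: Iterable[Mapping[str, int]], indicator: str) -> dict:
    """Count records by whether `indicator` marks an interview response."""
    counts = Counter(
        "response" if rec[indicator] == RESPONSE_CODE else "assigned_or_imputed"
        for rec in records
    )
    total = sum(counts.values())
    return {
        "response": counts["response"],
        "assigned_or_imputed": counts["assigned_or_imputed"],
        "share_assigned_or_imputed": (
            counts["assigned_or_imputed"] / total if total else 0.0
        ),
    }

# Toy records (invented); "IIXXX" is a hypothetical indicator name.
toy = [{"IIXXX": 1}, {"IIXXX": 1}, {"IIXXX": 3}, {"IIXXX": 1}]
summary = imputation_summary(toy, "IIXXX")
print(summary)
```

A high share of non-response values for a variable is a signal to read the codebook's description of the hot-deck or PMN procedure before relying on estimates for small subgroups.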

To protect the privacy of respondents, all variables that could be used to identify individuals have been encrypted or collapsed in the public use file. To further ensure respondent confidentiality, the data producer applied data substitution, deleted state identifiers, and released only a subsample of records in the public use file.

Previously published estimates may not be exactly reproducible from the variables in the public use file due to the disclosure protection procedures that were implemented.

The data definition and dictionary files for Stata are designed to be compatible with StataSE, Version 8. The data file is large and requires that approximately 250 megabytes of random access memory be allocated to Stata. Operations within Stata, including conversion of the ASCII data to Stata format, are likely to be slow. Analysts may wish to download subsets of data from the SAMHDA Data Analysis System (DAS) for use with Stata.
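Besides extracting subsets through the DAS, memory use can be limited by reading only the needed columns from the ASCII data file rather than loading every variable. A minimal sketch, assuming the file is fixed-width; the column positions in `LAYOUT` and the name `CASEID` are hypothetical placeholders and must be replaced with the start and end columns given in the codebook or Stata dictionary.

```python
import io
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical layout: variable name -> (first column, last column),
# 1-based and inclusive. Replace with the positions from the codebook.
LAYOUT: Dict[str, Tuple[int, int]] = {
    "CASEID": (1, 5),
    "ANALWT_C": (6, 15),
}

def read_columns(lines: Iterable[str],
                 layout: Dict[str, Tuple[int, int]]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record holding only the requested fields."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        yield {name: line[start - 1:end].strip()
               for name, (start, end) in layout.items()}

# In-memory stand-in for the data file; the trailing letters represent
# columns we do not keep.
sample = io.StringIO("00001   1234.50XXXX\n00002    987.25YYYY\n")
rows = list(read_columns(sample, LAYOUT))
print(rows)
```

Because records are processed one line at a time, only the selected fields of each record are retained, which is the point of the approach for a file of this size.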


Sample:   A multistage area probability sample for each of the 50 states and the District of Columbia has been used since 1999. A coordinated five-year sample design was developed for 1999 through 2003. Although it does not overlap with the 1998 sample, the design facilitates overlap in the first-stage units (area segments) between successive years of the five-year design, which increases the precision of estimates in year-to-year trend analysis. The sample is stratified on multiple levels, beginning with states. Eight states are designated large-sample states and contribute approximately 3,600 respondents each; the remaining states are sampled to yield 900 respondents per state. The second level of stratification divides states into Field Interviewer (FI) regions. The third level divides FI regions into area segments consisting of adjacent Census blocks; these area segments served as the primary sampling units. Dwelling units in each area segment were listed in a standardized order and selected by systematic sampling. Field interviewers visited each sampled address to determine dwelling unit eligibility, list all eligible persons at the address, and conduct interviews. Persons were selected from the address roster using a handheld computer. To improve the precision of estimates, the sample allocation targeted five age groups: 12-17, 18-25, 26-34, 35-49, and 50 and older. The size measures used in selecting area segments were coordinated with the dwelling unit and person selection process so that a nearly self-weighting sample could be achieved within each of the five age groups. The sample design included approximately equal numbers of persons in the 12-17, 18-25, and 26 and older age groups. The 2001 file also includes a boosted sample for New York City and the surrounding area to provide greater precision in analyzing the effects of the events of September 11, 2001.
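The state allocation implies a rough overall design target. The arithmetic below assumes that the District of Columbia is counted among the 900-respondent units, so that 43 units (42 states plus DC) receive the smaller allocation; that assumption is not stated in this description.

```python
# Allocation figures stated in the study description.
LARGE_STATES = 8
LARGE_STATE_TARGET = 3_600
OTHER_TARGET = 900

# Assumption: 50 states + DC = 51 units, and every unit that is not a
# large-sample state (including DC) gets the 900-respondent target.
TOTAL_UNITS = 51
OTHER_UNITS = TOTAL_UNITS - LARGE_STATES

def design_target() -> int:
    """Approximate number of respondents the allocation is designed to yield."""
    return LARGE_STATES * LARGE_STATE_TARGET + OTHER_UNITS * OTHER_TARGET

print(design_target())  # 67500
```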
The achieved sample for the 2001 NHSDA was 68,929 persons. The public use file contains 55,561 records because of a subsampling step used in the disclosure protection procedures. Minimum item response requirements were defined for cases to be retained for weighting and further analysis (i.e., "usable" cases). These requirements, as well as the full sampling methodology, are detailed in the codebook.
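Systematic sampling, used to select dwelling units from each segment's ordered listing, takes every k-th unit after a random start. The sketch below is a generic textbook version with a possibly fractional interval k = N/n; it is not the data producer's selection code, and the listing passed in is a hypothetical example.

```python
import math
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

def systematic_sample(units: Sequence[T], n: int,
                      rng: Optional[random.Random] = None) -> List[T]:
    """Select n units from an ordered listing using a random start and fixed interval."""
    N = len(units)
    if not 0 < n <= N:
        raise ValueError("n must satisfy 0 < n <= len(units)")
    rng = rng or random.Random()
    k = N / n                   # sampling interval (may be fractional)
    start = rng.random() * k    # random start in [0, k)
    # Selection points are start + j*k; the clamp guards against float
    # rounding pushing the last point to index N.
    return [units[min(math.floor(start + j * k), N - 1)] for j in range(n)]

# Usage: select 10 of 100 listed dwelling units (hypothetical listing).
listing = list(range(100))
chosen = systematic_sample(listing, 10, random.Random(2001))
print(chosen)
```

Because the listing follows a standardized order, a systematic sample spreads selections evenly through the segment, which acts like implicit stratification on that order.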

Weight:   Because of unequal selection probabilities at multiple stages of sample selection and various adjustments, such as those for nonresponse and poststratification, the 2001 NHSDA sample is not self-weighting. Analysts are advised to use the sample weight when drawing inferences from the NHSDA data about the target population or any subdomain of it. All estimates published in SAMHSA reports (such as the Results from the 2001 NHSDA: Volumes I, II, and III) are weighted using the final analysis weight for the full sample (ANALWT). In the public use file, the corresponding final sample weight is ANALWT_C, where the "C" denotes confidentiality protection. Each record's value of ANALWT_C is the number of persons in the target population that the record represents, so the sum of ANALWT_C over all records on the data file is an estimate of the total number of people in the target population.
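Applying the weight amounts to replacing counts with sums of ANALWT_C. The sketch below computes an estimated population total, a weighted count of persons with some characteristic, and their ratio (a weighted prevalence). The records, their weights, and the flag name `past_month_use` are invented toy values, not NHSDA data.

```python
from typing import Iterable, Mapping, Tuple

def weighted_prevalence(records: Iterable[Mapping[str, float]],
                        flag: str,
                        weight: str = "ANALWT_C") -> Tuple[float, float, float]:
    """Return (estimated population total, weighted count with flag == 1, prevalence)."""
    total = 0.0
    with_flag = 0.0
    for rec in records:
        w = rec[weight]
        total += w
        if rec[flag] == 1:
            with_flag += w
    if total == 0:
        raise ValueError("weights sum to zero")
    return total, with_flag, with_flag / total

# Toy records (invented); "past_month_use" is a hypothetical 0/1 flag.
toy = [
    {"ANALWT_C": 1000.0, "past_month_use": 1},
    {"ANALWT_C": 3000.0, "past_month_use": 0},
    {"ANALWT_C": 2000.0, "past_month_use": 1},
    {"ANALWT_C": 4000.0, "past_month_use": 0},
]
result = weighted_prevalence(toy, "past_month_use")
print(result)  # (10000.0, 3000.0, 0.3)
```

An unweighted count of the same toy records would give 2 of 4, or 50 percent, which illustrates why unweighted tabulations can misstate population figures when the sample is not self-weighting.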

Mode of Data Collection:   audio computer-assisted self interview (ACASI), computer-assisted personal interview (CAPI)

Response Rates:   The study yielded a weighted screening response rate of 92 percent and a weighted interview response rate for the Computer Assisted Interview (CAI) of 73 percent.
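The two rates apply at successive stages (screening households, then interviewing selected persons), so one common summary of overall response is their product. The combined figure below is a derived illustration under that convention, not a rate reported in this description.

```python
def overall_response_rate(screening: float, interview: float) -> float:
    """Combine stage-wise response rates by multiplication (a common convention)."""
    return screening * interview

rate = overall_response_rate(0.92, 0.73)
print(f"{rate:.1%}")  # 67.2%
```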

Extent of Processing:  ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection:

  • Performed consistency checks.
  • Created online analysis version with question text.
  • Checked for undocumented or out-of-range codes.

Restrictions: Users are reminded by the United States Department of Health and Human Services that these data are to be used solely for statistical analysis and reporting of aggregated information and not for the investigation of specific individuals or treatment facilities.


Original ICPSR Release:   2003-02-06

Version History:

  • 2013-06-25 Released Methodological Resources documentation and updated xml file to include variable groupings.
  • 2006-12-07 On 2006-05-18, the data producer resupplied the data file and codebook documentation: 20 variables were modified, 11 variables were dropped, and 10 new variables were added. Some of these changes corrected data errors, but most were made for consistency with the 2004 NSDUH study. The most important change is that two study design variables (VEREP and VESTR) were revised for consistency with the 2004 study, which collapsed the strata in order to maximize the number of people in each replicate.
  • 2004-03-25 The data producer resupplied data and documentation: 38 variables were added, the INCOME variable was dropped, and the following variables had data and/or documentation changes: OTDGNDLA, II2HALRC, IIHALFY, II2HALFY, IIHALFM, II2HALFM, CIGCRAVE, CIGCRAGP, YSPED, YSCHL, CIGINET, CIGFRND, CIGVEND, CIGMAIL, CIGCKOUT, CIGCLERK, CIGSMKT, CIGDSTO, CIGCONV, CIGINDIV, YTHBGHT, IILSDRC, II2LSDRC, IIPCPRC, II2PCPRC, IIECSRC, II2ECSRC, IIMTHRC, II2MTHRC, IIMTHFY, II2MTHFY, NRCH17_2, IISLTYFU, MTHYR, MTHMON, DRIVALC, DRIVDRG, and DRIVALD.
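VESTR and VEREP are the design variables for variance estimation: a stratum code and a replicate (sampling unit) code within stratum, as the names and the 2006 note suggest; confirm their exact definitions in the codebook. A minimal Taylor-linearization sketch for a weighted mean follows, assuming with-replacement sampling of replicates within strata. It is a generic textbook estimator, not SAMHSA's production method, and skipping strata with a single replicate is only one of several possible conventions.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

def weighted_mean_se(y: Sequence[float], w: Sequence[float],
                     strata: Sequence[Hashable],
                     reps: Sequence[Hashable]) -> Tuple[float, float]:
    """Weighted mean and its Taylor-linearized standard error (stratum/replicate design)."""
    if not (len(y) == len(w) == len(strata) == len(reps)):
        raise ValueError("all inputs must have the same length")
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Sum the linearized values w*(y - mean)/W within each (stratum, replicate).
    unit_totals: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    for yi, wi, h, r in zip(y, w, strata, reps):
        unit_totals[(h, r)] += wi * (yi - mean) / total_w
    # Group the replicate totals by stratum.
    by_stratum: Dict[Hashable, List[float]] = defaultdict(list)
    for (h, _), z in unit_totals.items():
        by_stratum[h].append(z)
    variance = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            continue  # assumption: single-replicate strata are skipped
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return mean, math.sqrt(variance)

# Toy data (invented): one stratum (VESTR-like) with two replicates (VEREP-like).
est, se = weighted_mean_se(y=[1, 1, 0, 0], w=[1, 1, 1, 1],
                           strata=[1, 1, 1, 1], reps=[1, 1, 2, 2])
print(est, se)  # 0.5 0.5
```

In practice the weights passed in would be ANALWT_C, and a survey package that supports stratified, clustered designs is the safer choice for published analyses.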
