NCVS Learning Guide: Files
The National Crime Victimization Survey is administered to persons age 12 or older from a nationally representative sample of approximately 90,000 households, made up of approximately 160,000 individuals, per year. Each household might include a different number of individuals, and each person in the household might experience a different number of victimizations within that timeframe. To make analyses more efficient, the data are released as three separate files: (1) a household file, which includes information about the characteristics of the households; (2) a person file, which includes personal and demographic information about the respondents; and (3) an incident file, which includes information about the victimizations the respondents reported. Identification numbers are included in each of the files so that information can be linked across files. (There is also an address file that includes only administrative information for the household, such as the year and quarter of interview, panel identification, and household ID. This information is repeated in the three main files as appropriate.)
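As a rough illustration of how identification numbers let you link the three files, the sketch below joins toy records in plain Python. The key names `household_id` and `person_id`, and all of the values, are hypothetical placeholders rather than actual NCVS variables; consult the codebook for the real identification variables.

```python
# Toy records standing in for the household, person, and incident files.
# ASSUMPTION: "household_id" and "person_id" are made-up key names.
households = [
    {"household_id": "H1", "housing": "apartment"},
    {"household_id": "H2", "housing": "single-family"},
]
persons = [
    {"household_id": "H1", "person_id": "P1", "age": 34},
    {"household_id": "H1", "person_id": "P2", "age": 15},
    {"household_id": "H2", "person_id": "P1", "age": 61},
]
incidents = [
    {"household_id": "H1", "person_id": "P2", "crime_code": 13},
    {"household_id": "H1", "person_id": "P2", "crime_code": 21},
]

# Index households and persons by their identifiers for direct lookup.
household_by_id = {h["household_id"]: h for h in households}
person_by_id = {(p["household_id"], p["person_id"]): p for p in persons}

# Attach person and household attributes to each incident record.
linked = []
for inc in incidents:
    person = person_by_id[(inc["household_id"], inc["person_id"])]
    household = household_by_id[inc["household_id"]]
    linked.append({**inc, "age": person["age"], "housing": household["housing"]})

# Count incidents per person, keeping persons who reported none.
incident_counts = {key: 0 for key in person_by_id}
for inc in incidents:
    incident_counts[(inc["household_id"], inc["person_id"])] += 1
```

Note that a person is identified here by the household ID and person ID together, because one household can contain several people and one person can report several incidents.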
Using the NCVS effectively requires knowledge of the types of research questions that could be answered with each data file. Calculating victimization rates requires data from the incident file and either the household or person file. For some research questions, the choice of files might be unclear. To help you think about file choice, some example research questions and their corresponding files are listed below.
| Example Questions | Incidents file | Persons file | Households file |
|---|---|---|---|
| Are apartment dwellers more likely to be victimized than those who live in single-family homes? | X | X | |
| Are all crimes more likely to take place after dark or are some more common during the day? | X | | |
| Are married people more likely than unmarried people to report crimes to the police? | X | X | |
| Are single person households more likely to be victimized than households with multiple residents? | X | X | |
| What are the most common settings in which sexual assaults take place (parks, alleys, private residences, etc.)? | X | | |
Datasets
Figure 1 and Table 1 in the Criminal Victimization, 2015 report (pdf) describe victimization rates for violent and property crimes in 2015. To calculate those rates, you will use three data files within ICPSR Study 36448, which you can download from the NCVS 2015 study home page.

You can download the datasets by clicking on the links for SPSS, Stata, or R, depending on which software you will use for the exercises in this learning guide. Save and rename the following files:
- Download and save DS2: Household Record-Type File. Rename the file “households2015”.
- Download and save DS3: Person Record-Type File. Rename the file “persons2015”.
- Download and save DS4: Incident Record-Type File. Rename the file “incidents2015”.
Preexisting Variables
In addition to the variables that you will create, this exercise makes use of three variables from the incidents file. These variables are described below with variable name, followed by the variable label in parentheses.
- V4022 (IN WHAT CITY, TOWN, VILLAGE) describes the location of the crime. In keeping with BJS practice, you will exclude any crimes that occur outside the U.S.
- V4529 (TOC CODE (NEW, NCVS)) is a list of the types of crimes experienced. Codes 1 through 20 represent violent crimes and 31 to 59 are property crimes. Codes 21, 22, and 23 include purse snatching and pick-pocketing. For these data, BJS does not technically consider 21-23 to be either violent crimes (since they do not involve the use of threat or force) or property crimes (which are defined here as committed against a household).
- SERIES_WEIGHT (VICTIMIZATION WEIGHT ADJUSTED FOR SERIES CRIMES (2015 Q1)) is an adjusted victimization weight variable that accounts for the number of occurrences in series crimes. Datasets in the NCVS store the original counts as collected from each respondent in the sample. In order to generalize those counts to the U.S. as a whole, you will need to “weight” the data, adjusting the influence of each case so that statistics you produce are representative of the entire U.S. population.
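The three incident variables above can be combined as in the following minimal sketch, which drops incidents outside the U.S., groups V4529 codes into the ranges described above, and sums SERIES_WEIGHT within each group. The records are invented toy data, the category label "personal theft" is an illustrative name for codes 21-23, and the V4022 value used for "outside the U.S." is an assumption that should be checked against the codebook.

```python
# ASSUMPTION: the V4022 code meaning "outside the U.S."; verify in the codebook.
OUTSIDE_US = 1


def classify_crime(toc_code):
    """Map a V4529 type-of-crime code to a broad category."""
    if 1 <= toc_code <= 20:
        return "violent"
    if 21 <= toc_code <= 23:
        return "personal theft"  # purse snatching and pick-pocketing
    if 31 <= toc_code <= 59:
        return "property"
    return "unclassified"


# Invented toy incident records; real weights come from the incidents file.
incidents = [
    {"V4022": 3, "V4529": 13, "SERIES_WEIGHT": 2000.0},
    {"V4022": 1, "V4529": 5, "SERIES_WEIGHT": 1800.0},  # excluded below
    {"V4022": 2, "V4529": 40, "SERIES_WEIGHT": 2500.0},
    {"V4022": 3, "V4529": 22, "SERIES_WEIGHT": 1500.0},
    {"V4022": 4, "V4529": 7, "SERIES_WEIGHT": 1000.0},
]

# Keep only incidents that did not occur outside the U.S.
kept = [rec for rec in incidents if rec["V4022"] != OUTSIDE_US]

# Sum the series-adjusted weights within each crime category.
weighted_totals = {}
for rec in kept:
    category = classify_crime(rec["V4529"])
    weighted_totals[category] = weighted_totals.get(category, 0.0) + rec["SERIES_WEIGHT"]
```

Each weighted total estimates the number of victimizations of that type in the population, rather than the number of incident records in the sample.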
You will need two variables from the persons file for this exercise.
- WGTPERCY (ADJUSTED PERSON WEIGHT – COLLECTION YEAR) is the weight variable for sampled persons (see the SERIES_WEIGHT description above).
- V3001 (PERSON RECORD TYPE) indicates that a case describes a person. The code “3” is used for every case in the persons data file. You will use this to simply identify the weighted number of persons for a denominator.
- Note on NCVS variable naming conventions: 2000-level variables describe household and head-of-household characteristics, 3000-level variables describe person characteristics, and 4000-level variables describe incident characteristics.
If you follow the typed syntax guide, you will also make use of two similar variables from the households file.
- WGTHHCY (ADJUSTED HOUSEHOLD WEIGHT – COLLECTION YEAR) is the weight variable for sampled households (see the SERIES_WEIGHT description above).
- V2001 (HOUSEHOLD RECORD TYPE) indicates that a case describes a household. The code “2” is used for every case in the households data file. You will use this to simply identify the weighted number of households for a denominator.
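To show how the weights and record-type variables fit together, here is a small sketch of a rate calculation: weighted victimizations in the numerator and the weighted number of persons or households in the denominator. All numbers are invented, the helper names are hypothetical, and expressing the result per 1,000 is assumed here to match the convention in BJS reports.

```python
def weighted_count(records, weight_key):
    """Sum a weight variable across records to estimate a population count."""
    return sum(rec[weight_key] for rec in records)


def rate_per_1000(numerator, denominator):
    """Express a weighted count as a rate per 1,000 units."""
    return 1000 * numerator / denominator


# Invented toy data; real values come from the three downloaded files.
violent_incidents = [{"SERIES_WEIGHT": 2000.0}, {"SERIES_WEIGHT": 1000.0}]
property_incidents = [{"SERIES_WEIGHT": 3000.0}, {"SERIES_WEIGHT": 2000.0}]
persons = [
    {"V3001": 3, "WGTPERCY": 80000.0},
    {"V3001": 3, "WGTPERCY": 70000.0},
]
households = [
    {"V2001": 2, "WGTHHCY": 30000.0},
    {"V2001": 2, "WGTHHCY": 20000.0},
]

# Violent crimes are measured against persons.
violent_rate = rate_per_1000(
    weighted_count(violent_incidents, "SERIES_WEIGHT"),
    weighted_count(persons, "WGTPERCY"),
)

# Property crimes are measured against households.
property_rate = rate_per_1000(
    weighted_count(property_incidents, "SERIES_WEIGHT"),
    weighted_count(households, "WGTHHCY"),
)
```

Because V3001 and V2001 are constant within their files, summing the weight over every record gives the same denominator as a weighted frequency of those variables in a statistics package.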
Sample weights are a critical methodological tool in survey research. The calculations that go into sample weights are typically done behind the scenes. As an end user of the dataset, you don’t often need to know much about how exactly the weights are calculated. It’s still helpful to understand what sample weights are and, in broad terms, how they are implemented in the National Crime Victimization Survey (NCVS).
Because survey samples are drawn at random, even a tiny fraction of a large population can tell us about the whole. The foundation of a sample weight is a probability weight, which is the inverse of the chance that a case was selected for the survey. To look at it another way, the probability weight counts how many individuals in the population are represented by a single case in the survey. Researchers multiply results from the sampled data by probability weights to see how those numbers look in the full population.
So if the adult U.S. population is 250,000,000 and 1,000 adults are surveyed via a simple random sample, each of those cases has an equal probability weight of 250,000 (250,000,000 / 1,000). When that survey finds a total of 320 cats owned in the sample, using the probability weight alone, the population estimate from the sample would be 80,000,000 cats nationwide (320 × 250,000).
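The arithmetic in the cat example can be written out as a few lines of Python. The variable names are illustrative only; the figures are the ones given in the text.

```python
# Figures from the cat-ownership example.
population = 250_000_000   # adults in the population
sample_size = 1_000        # adults in the simple random sample
cats_in_sample = 320       # total cats reported by sampled adults

# In a simple random sample every case has the same selection probability,
# so every case gets the same weight: population / sample size.
probability_weight = population // sample_size

# Scale the sample total up to the population.
estimated_cats = cats_in_sample * probability_weight

print(probability_weight)  # 250000
print(estimated_cats)      # 80000000
```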
In practice, however, data from surveys like the NCVS do not result from a simple random sample where every case had an equal chance of being selected. Cases in the NCVS are selected in a more complicated survey design, and each case does not have an equal chance of selection from the population. Very briefly, these are the most significant factors in the NCVS sample weights that you will use in this exercise:
- A weighting control factor adjusts for any changes in the field or in how the survey design was actually implemented.
- Weights for households and for people are adjusted to try to represent those households and people that did not respond to the interviews.
- Because we know demographic features of the U.S. population with some precision, adjustments are made to the weights to bring the sample in line with characteristics like age, sex, and race.
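To give a feel for how adjustments like those listed above change a weight, the sketch below applies two generic textbook techniques to toy data: a weighting-class nonresponse adjustment and a simple post-stratification to known group totals. This is not the actual NCVS weighting procedure; the function names, field names, groups, and totals are all hypothetical.

```python
def adjust_for_nonresponse(cases):
    """Spread nonrespondents' base weight across the respondents."""
    total = sum(c["base_weight"] for c in cases)
    responding = sum(c["base_weight"] for c in cases if c["responded"])
    factor = total / responding
    return [
        {**c, "adjusted_weight": c["base_weight"] * factor}
        for c in cases
        if c["responded"]
    ]


def poststratify(cases, known_totals):
    """Rescale weights so each group's sum matches a known population total."""
    group_sums = {}
    for c in cases:
        group_sums[c["group"]] = group_sums.get(c["group"], 0.0) + c["adjusted_weight"]
    return [
        {**c, "final_weight": c["adjusted_weight"] * known_totals[c["group"]] / group_sums[c["group"]]}
        for c in cases
    ]


# Invented sample: four selected cases, one of which did not respond.
sample = [
    {"group": "12-17", "base_weight": 100.0, "responded": True},
    {"group": "18+", "base_weight": 100.0, "responded": True},
    {"group": "18+", "base_weight": 100.0, "responded": True},
    {"group": "18+", "base_weight": 100.0, "responded": False},
]

# Hypothetical known population totals for each group.
known_totals = {"12-17": 150.0, "18+": 300.0}

respondents = adjust_for_nonresponse(sample)
weighted = poststratify(respondents, known_totals)
```

After both steps, the final weights of the respondents in each group add up to that group's known total, which is the sense in which the sample is brought "in line" with population characteristics.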
The goal of sample weight calculation is to maximize generalizability: the degree to which sample results can be projected to the population. Unweighted analyses have their place, but they do not provide the most representative results. When you weight the data appropriately, your results can provide the best available estimates of victimization in the U.S. population.
Want to read more detail on the weights used in the NCVS? Further information on weights and sampling error can be found in our NCVS resource guide. Details on the broader sample design of the NCVS are available in a report from the Bureau of Justice Statistics (pdf).