This note provides an overview of replication and its roles in science. Because you will perform a replication in the exercise at the end of this guide, you may find this background information valuable. This note is not required to complete the exercises.

One of the goals of social science is to provide information that will help us understand society and influence policy. Evidence from a single study is not usually considered sufficient for accepting a scientific knowledge claim. Results gain credibility when other scientists arrive at similar conclusions in later studies.

Replication and transparency go hand in hand. Transparency refers to clarity and openness about the data and methods used in the analysis. It is what makes replication possible. In its narrowest sense, a replication of a study follows the original study very closely. Scientists apply the same method to the same data, sometimes going as far as using the same software code. If successful, the replication produces the same results as the original study. A broader sense of replication includes taking the methods used in the original study and applying them to different data. Alternatively, a replication might test the original hypothesis with improvements to the method.

In each case, the replication is successful when its results match those in the original study. The scientific process of replication serves important purposes:

  • Successful replication can increase confidence in the original findings. Scientific consensus accrues as new studies replicate a finding again and again.
  • Failures to replicate may reveal flaws in the methods used by the original study. There could be problems in how data were handled, how analyses were conducted, or how results were presented.
  • Failures to replicate could also point to differences in context that turn out to be important. For example, we might find out that a pattern of results holds among the U.S. population, but not for European samples.
  • The process of replication can be a learning experience. Following the method used by earlier researchers requires careful attention to detail. Scientists can become familiar with a source of data, and students can learn the finer points of conducting research.

This guide is geared toward the last point about learning. The National Crime Victimization Survey (NCVS) is complex. An excellent way to begin using the data is to replicate official figures published by the Bureau of Justice Statistics. After completing the exercise in this guide, you can have confidence in your statistical work when your replicated results match the original reports.

Want to hear more about replication? Check out this National Science Foundation interview with Brian Nosek of the Center for Open Science.

The National Crime Victimization Survey (NCVS) is a self-report survey in which interviewed persons are asked about the number and characteristics of victimizations they experienced. The NCVS provides annual level and change estimates on criminal victimization in the U.S. Several features of the NCVS make it appropriate for this type of analysis:

  1. The U.S. Census Bureau has been collecting these data on behalf of the Bureau of Justice Statistics since 1973.
  2. The NCVS uses either direct face-to-face or telephone interviews with respondents. Survey respondents provide information about themselves and whether they experienced a victimization.
  3. The NCVS collects data from all persons ages 12 or older from a nationally representative sample of U.S. households. This means that the results can be used to estimate the levels of victimization for persons in households across the entire country.
  4. The NCVS was designed to complement official police statistics, such as the FBI’s Uniform Crime Reporting (UCR) Program. Unlike police statistics or other official sources, the NCVS captures criminal acts that may never come to the attention of the police or other law enforcement agencies, or the “dark figure of crime.”
  5. The NCVS asks respondents about their experiences with victimization in the past six months. This relatively short recall period increases the validity of the information.

In addition to tracking trends, the NCVS is meant to provide detailed information about the circumstances of crime, including characteristics of both the perpetrator(s) and the victim(s). These data can be used to estimate levels of violent and property crime victimization, and the NCVS can also be used to study such topics as the relationships between victims and offenders, the victim’s decision to report the crime to the police, and the consequences of crime on the victim’s physical and psychological well-being. The study design also makes it possible to examine crimes that affect everyone in the household as well as all crimes experienced by an individual.

Want to learn more about the NCVS? See our National Crime Victimization Survey Resource Guide. For detailed information, see the codebook documentation provided with each data release.

The purpose of calculating rates is to make “raw” counts comparable across units of analysis. Counting how many criminal acts of a given type occur in an area is important, but in order to make decisions about the incidence of crime in that area, we need more information. Say, for example, that we know that there were 10 assaults in community A and 20 in community B this year. Does that mean community B is more dangerous? That is, did community B’s residents have a higher chance of being assaulted this year? To answer that question correctly, we need to think about how many people there are in each community. What if community A is a small town and community B is a large city? That should affect our interpretation of the number of assaults.

Rates help us put absolute numbers in proper context, enabling comparisons between different geographic areas, among subgroups within the population, or across time. To calculate rates, we need to know how many events occurred but also how many individuals were “at risk” of experiencing the event. Continuing the example above, let’s say that these two communities, A and B, have the following populations and assault counts:

              Community A   Community B
Population         25,000       100,000
Assaults               10            20

Which of the two communities has a higher rate of assaults? To figure that out, we calculate the assault rate by dividing the number of assaults by the number of people in the community. Because crimes are rare, it is a common practice to multiply the result by 1,000 and report the rates per 1,000 people for ease of interpretation.

Following this practice, the rate per 1,000 people for community A would be:

  (10 / 25,000) × 1,000 = 0.4 assaults per 1,000

Community B’s rate per 1,000 people would be:

  (20 / 100,000) × 1,000 = 0.2 assaults per 1,000

Therefore, although community B had more total assaults this year, the risk of being assaulted in community A was twice as high (0.4 versus 0.2 per 1,000).
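The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the rate calculation described above, using the community A and B figures from the example; the function name `rate_per_1000` and the dictionary layout are choices made for this sketch, not part of any NCVS tool.

```python
def rate_per_1000(events: int, population: int) -> float:
    """Return the number of events per 1,000 people at risk."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * 1000


# Figures from the community A / community B example in the text.
communities = {
    "A": {"population": 25_000, "assaults": 10},
    "B": {"population": 100_000, "assaults": 20},
}

# Compute the assault rate for each community.
rates = {
    name: rate_per_1000(c["assaults"], c["population"])
    for name, c in communities.items()
}

for name, rate in rates.items():
    print(f"Community {name}: {rate:.1f} assaults per 1,000 people")

# Ratio of the two rates: how much higher the risk is in A than in B.
ratio = rates["A"] / rates["B"]
print(f"Rate in A is {ratio:.1f} times the rate in B")
```

Dividing by the population at risk before comparing is what reverses the impression given by the raw counts: community B has more assaults, but community A has the higher rate.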

Just as in this example, calculating crime rates from the NCVS requires knowing not only how many crimes occurred in a given year but also the number of persons at risk for victimization. However, when national estimates are derived from a sample, as with the NCVS, caution must be used when comparing one estimate to another estimate or when comparing estimates over time. Although one estimate may be larger than another, estimates based on a sample have some degree of sampling error. When the sampling error around each estimate is taken into account, estimates that appear different may not be statistically different. For more information on sampling error and computing standard errors with the NCVS, see the User’s Guide to the National Crime Victimization Survey (NCVS) Generalized Variance Functions.
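To make the point about sampling error concrete, here is a minimal sketch of a common textbook check: a z-test for the difference between two estimates. It rests on several assumptions that do not come from this guide. The two rates and their standard errors are made-up numbers for illustration, not NCVS figures; the two estimates are treated as independent; and the 1.96 cutoff corresponds to a two-sided test at the 95% confidence level. For actual NCVS work, the standard errors should come from the procedures in the user’s guide cited above.

```python
import math


def compare_estimates(est1, se1, est2, se2, z_crit=1.96):
    """Z-test for the difference between two independent estimates.

    Returns (is_significant, z). Assumes the estimates are independent,
    so the variance of the difference is the sum of the two variances.
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = (est1 - est2) / se_diff
    return abs(z) > z_crit, z


# HYPOTHETICAL values for illustration only (not NCVS results):
# victimization rates per 1,000 persons and their standard errors.
rate_year1, se_year1 = 21.0, 1.1
rate_year2, se_year2 = 18.5, 1.0

significant, z = compare_estimates(rate_year1, se_year1, rate_year2, se_year2)
print(f"z = {z:.2f}, statistically different at 95%: {significant}")
```

With these illustrative inputs, the first rate is visibly larger than the second, yet the test statistic falls short of the cutoff, which is the situation the paragraph above warns about.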