Survey of Prison Inmates (SPI) Resource Guide

Survey of Prison Inmates (SPI) (Formerly Survey of Inmates in State and Federal Correctional Facilities (SISFCF))

Conducted by the Bureau of Justice Statistics, this survey is part of a series of data gathering efforts undertaken to assist policymakers in assessing and remedying deficiencies in the nation's correctional institutions. Its primary objective is to produce national statistics of the state and sentenced federal prison populations across a variety of domains. The survey gathered information on demographic, socioeconomic, and criminal history characteristics of prisoners. Also obtained were details of prisoners' military service, current offense and sentence, incident characteristics, and firearm possession and sources. Other information includes age at time of interview, ethnicity, education, lifetime drug use and alcohol use and treatment, mental and physical health and treatment, and pre-arrest employment and income. Data on characteristics of victims, prison programs and services, and rule violations are provided as well.

With the 2016 administration, the survey was renamed the Survey of Prison Inmates. 

Using the Resource Guide

NACJD, a part of the Inter-University Consortium for Political and Social Research (ICPSR) at the University of Michigan, designed this Resource Guide for users to learn about the Survey of Prison Inmates data and connect to related resources. 

Below, you'll find information about the data, file structure, estimation procedures, and accuracy of estimates. You'll also find links to external resources and information about accessing the data through NACJD. 

About the Data

Before 2016, the Survey of Inmates in State and Federal Correctional Facilities was comprised of two distinct surveys. Both surveys used the same data collection instrument, and data files resulting from the combination of the two have the same variables and record layout. The Survey of Inmates in State Correctional Facilities (SISCF) was conducted for the Bureau of Justice Statistics (BJS) by the Bureau of the Census. The Survey of Inmates in Federal Correctional Facilities (SIFCF) was also conducted for the BJS and the Federal Bureau of Prisons (BOP) by the Bureau of the Census. These surveys provide nationally representative data on State prison inmates and sentenced Federal inmates held in Federally-owned and -operated facilities. Through personal interviews from June through October of the survey year, inmates in both State and Federal prisons provided information about their current offense and sentence, criminal history, family background and personal characteristics, prior drug and alcohol use and treatment programs, gun possession and use, and prison activities, programs and services. Surveys of State prison inmates have been conducted in 1974, 1979, 1986, 1991, 1997 and 2004. Sentenced Federal prison inmates were first interviewed in the 1991 survey. Beginning in the year 1997, data collected for the State and Federal surveys were combined into one file. 

With the 2016 implementation, the survey was renamed the Survey of Prison Inmates. BJS conducted the Survey of Prison Inmates (SPI), a national, wide-ranging survey of prisoners age 18 or older who were incarcerated in state or federal correctional facilities within the United States during 2016. SPI provides national statistics on prisoner characteristics across a variety of domains, such as current offense and sentence, incident characteristics, firearm possession and sources, criminal history, demographic and socioeconomic characteristics, family background, drug and alcohol use and treatment, mental and physical health and treatment, and facility programs and rules violations. SPI can also be used to track changes in these characteristics over time, describe special populations of prisoners, and identify policy-relevant changes in the state and federal prison populations. RTI International served as BJS's data collection agent for the 2016 SPI under a cooperative agreement (Award no. 2011-MU-MU-K070). From January through October 2016, data were collected through face-to-face interviews with prisoners using computer-assisted personal interviewing (CAPI). In a CAPI interview, interviewers read questions aloud and enter responses directly into a laptop computer, allowing skip patterns and other routing criteria to be implemented automatically.

Sampling Frame Changes 

There are differences between how privately operated correctional facilities were classified in the 2004 (and prior) Survey of Inmates in State and Federal Correctional Facilities (SISFCF) and the 2016 SPI sampling frames. In the 2016 SPI, private confinement correctional facilities that were operated exclusively for the Federal Bureau of Prisons (BOP) were assigned to the federal sampling frame (15). All other privately operated correctional facilities were assigned to the state sampling frame. In the 2004 SISFCF, private correctional facilities that were holding exclusively for the BOP (19), including confinement and community-based facilities, were classified as out of scope for the study, which means those types of facilities were excluded from both the SISFCF federal and state frames. 

Furthermore, the 2012 Census of State and Federal Adult Correctional Facilities, which was used as the basis for the 2016 SPI frame, did not collect information needed to determine the proportion of inmates held for various authorities in other correctional facilities, including other privately operated correctional facilities. Therefore, for private facilities that did not exclusively hold federal prisoners for the BOP, it was not possible to determine whether 50% or more of the one-day count (ODC) of prisoners was held for the BOP, and thereby would be included on the federal frame, or was held for states, and thereby would be included on the state frame. In the 2004 SISFCF, it was possible to determine if facilities met the 50% threshold because the 2000 Census of State and Federal Adult Correctional Facilities, which was used as the basis for the 2004 SISFCF sampling frame, collected that information. Since that information was available for the 2004 SISFCF, it was possible to differentiate between 1) private facilities with an ODC that was more than 50% of federal prisoners held for the BOP (7), which were excluded from the SISFCF frame altogether, and 2) private facilities with an ODC that was more than 50% of prisoners held for states, which were included on the SISFCF state sampling frame. Given the limited information for the 2016 SPI though, both types of facilities were included on the state frame. 

Users should be aware of these differences related to the classification of privately operated correctional facilities on the federal sampling frames between the 2004 (and prior) SISFCF and the 2016 SPI, and proceed with caution when making particular comparisons between studies. For example, analyses conducted by BJS revealed that the change in the percentage of non-U.S. citizens in federal prisons between 2004 (16.2%) and 2016 (24.9%) was not significantly different due to a relatively large standard error (5.35%) for the 2016 estimate. Upon further investigation, it was discovered that clustering of non-U.S. citizens within a few privately operated correctional facilities that were holding inmates exclusively for the BOP in 2016 led to a relatively large intracluster correlation, which increased the variance of the 2016 estimate. In addition, as previously explained, the 2004 SISFCF excluded privately operated correctional facilities holding inmates exclusively for the BOP. This may have contributed to an underestimate of non-U.S. citizens in federal prisons from the 2004 SISFCF, especially when compared to an estimate based on data that the BOP reported to BJS through its Federal Justice Statistics Reporting Program in 2004 (27.5%). The 2016 SPI estimate of non-U.S. citizens in federal prisons is fairly consistent with an estimate based on data that the BOP reported to BJS through its Federal Justice Statistics Program in 2016 (21.4%).

Further information about the methodology for SPI, SISCF, and SIFCF can be found within the PDF codebooks downloadable from each study homepage on the NACJD website. 

File Structure

The 2016 SPI data consist of three datasets. Dataset 1 contains public-use data (state and federal combined), dataset 2 contains public-use state data, and dataset 3 contains restricted-use data. 

The 2004 SISFCF data consist of two datasets. Dataset 1 contains federal data, while dataset 2 contains state data. 

The 1997 SISFCF data consist of two raw data files, a machine-readable codebook, data collection instruments, and SPSS and SAS data definition statements. Dataset 1, Numeric data, includes the majority of the responses to the questionnaire items. Dataset 2, Alphanumeric data, includes all of the literal responses including the "Other-specify" question responses.

Data prior to 1997 includes only one dataset, with SAS Control Cards and/or Data Definition Statements available where needed.  

Estimation Procedures

The estimation procedures for the 2016 SPI involved weighting the responses from the interviewed prisoners to produce national estimates, including separate estimates for state and federal prisoners, and some subnational estimates (for the self-representing states of Texas and California) with some calculable degree of sampling error. A series of adjustment factors was applied to the design-based weights of each interviewed prisoner. Weights for state and federal prisoners were calculated separately. Bias could result if the non-respondents were different from the respondents (nonresponse bias) or if the sampling population (the frame) did not accurately represent the target population (coverage bias). To compensate for these two possibilities, nonresponse and post-stratification adjustments were made.

The estimation procedures for the SISCF and SIFCF in 2004 and prior involved weighting the responses from the sampled, interviewed inmates to produce estimates with some calculable degree of sampling error. A series of adjustment factors were applied to the basic weight of each interviewed inmate. Weights for Federal and State inmates were calculated separately.

  1. Basic weight (BW).

    The initial weight, or basic weight, for each sampled inmate is the inverse of the probability of selection. This weight changes every year; for the basic weight for a specific year, see the codebook corresponding to that year.

  2. Drug Subsampling Factor (DSSF)

    The Drug Subsampling Factor was calculated for the SIFCF only. It compensates for the subsampling of drug offenders, in which only a third of those originally selected were retained: the weights of drug offenders were multiplied by 3 and those of nondrug offenders by 1.

  3. Weighting Control Factor (WCF)

    In some prisons, the sampling rate for a facility was adjusted because the actual number of persons in a prison on the sampling date was different from the expected number from earlier Census of State and Federal Correctional Facilities reports or lists from the BOP. When the actual number was less than 80% or more than 120% of the expected number, the weighting control factor was applied to account for adjusting the inmate sampling rate. The weighting control factor is equal to the number of inmates in a facility on the interview date divided by the number expected for that facility. If the actual number was within 20% of the expected number, the weighting control factor was 1.
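The rule for the weighting control factor can be sketched in a few lines of Python. This is a minimal illustration of the description above, not the code used to produce the survey weights. The function and argument names are hypothetical.

```python
def weighting_control_factor(actual_count: int, expected_count: int) -> float:
    """Illustrative WCF: actual/expected when the actual count falls
    outside 80%-120% of the expected count, otherwise 1."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    ratio = actual_count / expected_count
    # Apply the ratio only when the facility count was off by more than 20%.
    if ratio < 0.8 or ratio > 1.2:
        return ratio
    return 1.0
```

For example, a facility expected to hold 100 inmates that held 150 on the interview date would receive a factor of 1.5, while one that held 110 would receive a factor of 1.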

  4. Duplication Control Factor (DCF)

    Several of the very smallest prisons have a total inmate population that is smaller than the number to be sampled in each facility in a particular stratum. For example, if a sample prison contained 15 inmates in a stratum in which 55 were expected to be interviewed, there would be a shortage of inmates. The DCF is used to adjust for the workload shortfall in such prisons. It is equal to the expected number of sample inmates in each facility in a stratum divided by the number of inmates in the prison on the date of the sample. In most prisons, the calculated DCF is less than one because the prison had more total inmates than the expected number in the sample for that stratum; in this case the DCF is set to 1.
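As a rough sketch of the duplication control factor described above, the calculation can be written as follows. The function and argument names are hypothetical, and the code is only an illustration of the stated rule.

```python
def duplication_control_factor(expected_sample: int, inmates_in_prison: int) -> float:
    """Illustrative DCF: expected sample size divided by the number of
    inmates in the prison, with values below 1 set to 1."""
    if inmates_in_prison <= 0:
        raise ValueError("inmates_in_prison must be positive")
    ratio = expected_sample / inmates_in_prison
    # Most prisons hold more inmates than the expected sample, so the
    # calculated ratio is below 1 and the factor is set to 1.
    return max(ratio, 1.0)
```

Using the example in the text, a prison with 15 inmates in a stratum where 55 interviews were expected would have a factor of 55/15, or about 3.67.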

  5. Noninterview Factor (NIF)

    This factor was applied to adjust the weights to account for noninterviewed inmates. The NIF was calculated as follows:

    • Basic demographic data on noninterviewed inmates were obtained by interviewers from prison records after they completed interviewing in a facility for the SISCF or from BOP for SIFCF.
    • Inmate records, including noninterviewed inmates, were separated by gender, stratum, race (Black, nonBlack), and age.
    • If there were fewer than 30 unweighted cases in a cell, it was collapsed with those in the nearest age category.
    • For each cell, the adjusted weights were summed separately for interviewed inmates (I) and for noninterviewed inmates (N).
    • A noninterview adjustment factor was calculated for each cell as the sum of the adjusted weights for both interviewed and noninterviewed inmates divided by the adjusted weights for the interviewed, or NIF=(I + N)/I.
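
The cell-level calculation NIF = (I + N)/I can be sketched as below. This is an illustrative outline only. The record layout and cell labels are hypothetical, and the step that collapses cells with fewer than 30 unweighted cases is assumed to have been done before the function is called.

```python
from collections import defaultdict

def noninterview_factors(records):
    """Compute an illustrative NIF per cell.

    records: iterable of (cell, adjusted_weight, was_interviewed) tuples,
    where cell is any hashable label for a gender/stratum/race/age group.
    """
    interviewed = defaultdict(float)      # I: weight sum, interviewed inmates
    noninterviewed = defaultdict(float)   # N: weight sum, noninterviewed inmates
    for cell, weight, was_interviewed in records:
        if was_interviewed:
            interviewed[cell] += weight
        else:
            noninterviewed[cell] += weight

    factors = {}
    for cell in set(interviewed) | set(noninterviewed):
        if interviewed[cell] == 0:
            # A cell with no interviewed inmates would need to be collapsed first.
            raise ValueError(f"cell {cell!r} has no interviewed inmates")
        factors[cell] = (interviewed[cell] + noninterviewed[cell]) / interviewed[cell]
    return factors

# Hypothetical records: two cells, one with a noninterviewed inmate.
example = [
    ("M-18to24", 10.0, True),
    ("M-18to24", 10.0, True),
    ("M-18to24", 5.0, False),
    ("F-25to34", 8.0, True),
]
nif = noninterview_factors(example)
```

In this made-up example the first cell receives a factor of (20 + 5)/20 = 1.25 and the second, with no noninterviews, a factor of 1.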

  6. Offense Category Ratio Adjustment Factor (OCRAF)

    The OCRAF was used to adjust the weighted sample to reflect varying interview rates among inmates in different offense categories. The OCRAF was computed separately for males and females for different offense categories for State and Federal inmates. It was calculated as the weighted count of interviewed and noninterviewed inmates through application of the DCF divided by the weighted count for each stratum through application of the NIF.

  7. Control Count Ratio Adjustment Factor (CCRAF)

    The CCRAF adjusts the weighted interviews to stratum-level counts as of a specific date, which varies by year. For the date used in a given collection year, see the codebook corresponding to that year. For the SISCF, these counts were from the National Prisoners Statistics series (NPS-1A). For the SIFCF, the BOP provided counts of sentenced Federal prisoners as of a specific date (see the codebook).

    Thus the final weight (FW) is the product of the basic weight and all the adjustment factors.

    For the SISCF:

    FW = BW × WCF × DCF × NIF × OCRAF × CCRAF

    For the SIFCF:

    FW = BW × DSSF × WCF × DCF × NIF × OCRAF × CCRAF


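Reading the final weight as the product of the basic weight and the adjustment factors listed above, the calculation can be sketched as follows. The function name and keyword arguments are hypothetical. The Drug Subsampling Factor defaults to 1 so that the same function covers the SISCF, where that factor was not used.

```python
from math import prod

def final_weight(bw, wcf=1.0, dcf=1.0, nif=1.0, ocraf=1.0, ccraf=1.0, dssf=1.0):
    """Illustrative final weight: basic weight times each adjustment factor.

    dssf applies to the SIFCF only; leave it at 1.0 for the SISCF.
    """
    return prod([bw, dssf, wcf, dcf, nif, ocraf, ccraf])

# Hypothetical inmate records, not values from any codebook.
state_fw = final_weight(100.0, wcf=1.5, nif=1.25)
federal_fw = final_weight(50.0, dssf=3.0)
```

The actual factor values for a given year come from the weighting described in that year's codebook.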
Accuracy of Estimates

Since the SPI, SISCF and SIFCF estimates come from a sample, they may differ from figures from a complete census using the same questionnaire, instructions, and enumerators. A sample survey has two possible types of errors: sampling and nonsampling. The accuracy of an estimate depends on both types of errors, but the full extent of the nonsampling error is unknown. Consequently, one should be particularly careful when interpreting results based on a relatively small number of cases or small differences between estimates. The standard errors for SISCF and SIFCF estimates primarily indicate the magnitude of sampling error. They also partially measure the effect of some nonsampling errors in responses and enumeration, but do not measure systematic biases in the data. (Bias is the average over all possible samples of the differences between the sample estimates and the desired value.)

  1. Nonsampling variability

    There are several sources of nonsampling errors, including the following:

    • Inability to obtain information about all cases in the sample
    • Definition difficulties
    • Differences in the interpretation of questions
    • Respondents' inability or unwillingness to provide correct information
    • Respondents' inability to recall information
    • Errors made in data collection such as in recording or coding the data
    • Errors made in processing the data
    • Errors made in estimating values for missing data
    • Failure to represent all units within the sample

  2. Nonresponse

    Nonresponse in the SISCF and SIFCF resulted from failing to obtain cooperation from sampled prisons (first stage nonresponse) or failing to obtain completed interviews with sampled inmates (second stage nonresponse). In the weighting of the sample, the NIF adjusted the weights for second stage nonresponse. The NIF was calculated based on gender, race, age and stratum. However, biases exist in the estimates to the extent that noninterviewed inmates have different characteristics from those of interviewed inmates in the same age-gender-race-stratum group. Total nonresponse for each survey includes both first and second stage nonresponse.

  3. Comparability of data

    Data obtained from the SPI, SISCF and SIFCF are not entirely comparable with data from other sources. This is due to differences in interviewer training and experience and in differing survey processes. This is an example of nonsampling variability not reflected in the standard errors. Caution should be used when comparing results from different sources.

    Note on results based upon a small number of cases or small differences in estimates: When summary measures (such as medians and percent distributions) are computed on a base smaller than 5,000 for the SISCF and 1,000 for SIFCF, they probably do not reveal useful information because of the large standard errors involved. In addition, nonsampling errors may result in small differences which may appear to be borderline significant, but are not really different.

  4. Sampling variability

    Sampling variability is the variation that occurs by chance because a sample, rather than the entire population, was surveyed. Standard errors are primarily measures of sampling variability, although they may include some nonsampling error. The sample estimate and its standard error enable one to construct a confidence interval, a range that would include the average result for all possible samples with a known probability. A particular confidence interval may or may not contain the average estimate derived from all possible samples. However, one can say with specified confidence that the interval includes the average estimate calculated from all possible samples. Standard errors may also be used to perform hypothesis testing.
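As an illustration, a confidence interval can be built from an estimate and its standard error as sketched below. The sketch assumes a normal approximation, with 1.96 as the multiplier for a 95% interval. The function name is hypothetical.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Return (lower, upper) bounds: estimate plus or minus z standard errors.

    z=1.96 corresponds to a 95% interval under a normal approximation.
    """
    margin = z * standard_error
    return (estimate - margin, estimate + margin)

# The 2016 estimate of non-U.S. citizens in federal prisons cited earlier
# in this guide: 24.9% with a standard error of 5.35%.
lower, upper = confidence_interval(24.9, 5.35)
```

With these inputs the interval runs from roughly 14.4% to 35.4%, which shows how wide an interval a standard error of that size implies.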

  5. Generalized variance estimates

    A number of approximations are required to derive, at a moderate cost, standard errors applicable to estimates from these two surveys. Instead of providing an individual standard error for each estimate, two parameters, a and b, are provided to calculate standard errors for each type of characteristic. For more information, please see the codebook specific to your data.

    Variances were calculated using Vplex, a Bureau of the Census software package designed to calculate variances for data derived from multistage complex sample designs. Variances were calculated for the total sample and for gender, marital status and race/ethnicity subgroups (male or female, and black, nonblack, or Hispanic, and married or single). Variables for which variances were estimated included criminal justice status, prior sentence to incarceration, prior sentence to probation, current offense (murder or manslaughter, sexual offense, assault, robbery, other violent, drug offense), marital status, ever used marijuana, ever used cocaine or crack, alcohol use, armed during crime, HIV status, military service, one or more victims, education, age (not used for SIFCF), monthly income prior to arrest, whether physically or sexually abused, family member ever in prison, employment status at arrest, sentenced status, whether under the influence of drugs at time of arrest, whether under the influence of alcohol at time of arrest, whether maximum sentence was less or more than 5 years, whether had a disability, whether had children, whether received help for a mental or emotional problem, who lived with growing up. These variances were calculated for the general form
    σ² = ax² + bx

    The variances were then transformed logarithmically and plotted in a regression in several iterations, excluding outliers until a best fit was obtained. Hence, the a values are the intercept and the b values the slope of the line.
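A generalized standard error can be sketched as follows, under the assumption that the general form is read as variance = a·x² + b·x, with x the estimate and the standard error its square root. That reading, the function name, and the parameter values are assumptions for illustration. The formula and the a and b parameters in the codebook for a given year take precedence.

```python
import math

def gvf_standard_error(x, a, b):
    """Illustrative standard error from generalized variance parameters,
    assuming variance = a*x**2 + b*x for an estimate x."""
    variance = a * x**2 + b * x
    if variance < 0:
        raise ValueError("parameters give a negative variance for this estimate")
    return math.sqrt(variance)

# Made-up parameters and estimate, not taken from any codebook.
se = gvf_standard_error(100, a=0.01, b=3.0)
```

With these made-up inputs the variance is 400 and the standard error is 20.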

    Tests may be performed at various levels of significance. A significance level is the probability of concluding that the characteristics are different when, in fact, they are the same. To conclude that two parameters are different at the .05 level of significance, for example, the absolute value of the estimated difference between characteristics must be greater than or equal to 1.96 times the standard error of the difference.
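The test described above can be sketched as below. The sketch assumes the two estimates come from independent samples, so that the standard error of the difference is the square root of the sum of the squared standard errors. That assumption and the function name are illustrative and are not taken from the codebooks.

```python
import math

def differs_at_05(est1, se1, est2, se2, z=1.96):
    """Return True when |est1 - est2| is at least z times the standard
    error of the difference (independent samples assumed)."""
    se_diff = math.sqrt(se1**2 + se2**2)
    return abs(est1 - est2) >= z * se_diff

# Estimates of non-U.S. citizens in federal prisons cited earlier in this
# guide: 16.2% (2004) and 24.9% (2016, standard error 5.35%). The 2004
# standard error of 1.0 is hypothetical; the guide does not report it.
significant = differs_at_05(16.2, 1.0, 24.9, 5.35)
```

Here the difference of 8.7 percentage points is smaller than 1.96 times the standard error of the difference, so the function returns False. That is in line with the statement earlier in this guide that the change was not significantly different.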

    More detailed information on standard errors can be found in the codebook.

  6. Specific variance estimates

    Standard error estimates for specific variables can be derived using software packages developed to generate standard errors for data obtained from a complex sample survey design. Variables have been added to the data files to be used in running such software packages. A description of these variables can be found within the PDF codebooks and user guides, downloadable from each study homepage on the NACJD website. 

Data Access

Studies from this series include both public-use and restricted-use data. Public-use data files are available to download directly from each study homepage on the NACJD website. The SPI series page on the NACJD website includes a list of available data.

Users interested in obtaining restricted-use data must complete a NACJD Restricted Data Use Agreement, specify the reasons for the request, and obtain IRB approval or notice of exemption for their research. Please visit the ResearchDataGov website to download the appropriate Restricted Data Use Agreement and apply for data access. Once approved, SPI data may be accessed from a requester secure site via ICPSR's secure download procedures.

Additional Resources




  • For questions about finding or accessing SPI data, help with interpreting codebooks, or to report errors found in the data, please contact NACJD user support at 
  • For questions regarding SPI methodology, questionnaires, or BJS-produced publications, please contact BJS support at