Projects Analyzing Existing Data
Submission Guidelines
Some projects use secondary data and do not involve original data collection. These projects obtain data from one or more existing sources. The source data may already be available to the public, or restrictions may preclude depositing them at NACJD. In these situations, the source data may not need to be deposited; however, the existing data should be clearly identified, and the steps taken to acquire them should be described so that others can replicate the process. This description is known as a “data road map.”
If the existing data are already available from NACJD, please specify the ICPSR study number and the data version. Any new or derived variables created from those data should be submitted along with documentation (code, syntax, or setup files) showing how the variables were created.
Refer to the Guidelines for Depositing NIJ and OJJDP Data at NACJD document for more information.
When existing data used for analysis are not publicly available, researchers are encouraged to deposit the data with the permission of the original data producer. This is especially important if the existing data were linked to data collected under the award to produce project findings, so that both the existing and the newly collected data are needed to replicate the analysis. If the original data producer does not permit their data to be archived, the investigator should contact their assigned NIJ or OJJDP Grant Manager.
Existing and Linked Data
Existing data may be linked to original data collected by research projects. The guiding questions for what to deposit are:
- Are the existing data already publicly available?
- How easily can the original data be linked to the existing data?
- How easily can derived variables be recreated?
If a linkage is straightforward, users can easily obtain the existing data from their source and link them to the original data submitted by the researcher. In this case, the project report(s) should clearly identify the source of the existing data, and the code or setup file used to link the data should also be provided.
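When the files share a unique identifier, the linkage code provided with a deposit can be quite short. The sketch below is a hypothetical Python example, not an NACJD requirement: the field names `case_id`, `county_fips`, and `unemployment_rate` are assumptions for illustration, and the join key is assumed to be unique in the existing data.

```python
import csv
import io


def link_on_key(original_rows, existing_rows, key):
    """Attach fields from the existing source to each original row by a shared key.

    Hypothetical example: `key` is assumed to uniquely identify rows in the existing data.
    """
    lookup = {}
    for row in existing_rows:
        if row[key] in lookup:
            raise ValueError(f"key {row[key]!r} is not unique in the existing data")
        lookup[row[key]] = row

    linked = []
    for row in original_rows:
        merged = dict(row)  # every original row is kept, matched or not
        match = lookup.get(row[key])
        if match is not None:
            for field, value in match.items():
                if field != key:
                    merged[field] = value
        linked.append(merged)
    return linked


# Usage with small in-memory CSV files (hypothetical field names).
original = list(csv.DictReader(io.StringIO("case_id,county_fips\n1,01001\n2,01003\n")))
existing = list(csv.DictReader(io.StringIO("county_fips,unemployment_rate\n01001,3.1\n01003,2.8\n")))
linked = link_on_key(original, existing, "county_fips")
for row in linked:
    print(row)
```

Raising an error on a duplicate key is a deliberate choice: if the key turns out not to be unique, the linkage is no longer straightforward and falls under the situations described next.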
If the linkage is not straightforward, then the researcher is providing a useful service by depositing the linked data. Examples that fit this situation include:
- The link must be made by judgment, using a combination of nonunique variables (e.g., an individual's age, sex, and race, and the date of the incident); and
- The link requires an understanding of local geographic factors (e.g., neighborhoods or block levels, especially over a range of years when boundaries shift).
In such cases, the usefulness of providing the linked data to others outweighs the redundancy of storing the same data twice in the archive.
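When no unique identifier exists, the linkage code can at least make the judgment calls visible. The hypothetical Python sketch below matches records on a combination of nonunique fields (assumed here to be `age`, `sex`, `race`, and `incident_date`) and sets aside any record with zero or several candidates for manual review instead of guessing. The field names and the exact-match rule are illustrative assumptions; a real project may need looser rules.

```python
from collections import defaultdict

# Hypothetical combination of nonunique variables used for matching.
MATCH_FIELDS = ("age", "sex", "race", "incident_date")


def composite_key(row):
    return tuple(row[field] for field in MATCH_FIELDS)


def link_on_nonunique(original_rows, existing_rows):
    """Return (linked, needs_review).

    A record is linked only when exactly one existing record shares its composite key;
    otherwise it is set aside, with its candidate count, for a documented manual decision.
    """
    candidates = defaultdict(list)
    for row in existing_rows:
        candidates[composite_key(row)].append(row)

    linked, needs_review = [], []
    for row in original_rows:
        matches = candidates.get(composite_key(row), [])
        if len(matches) == 1:
            # Original fields take precedence if a field name appears in both rows.
            linked.append({**matches[0], **row})
        else:
            needs_review.append((row, len(matches)))
    return linked, needs_review


# Usage (hypothetical records): case A2 has two equally plausible matches.
original = [
    {"case_id": "A1", "age": "34", "sex": "F", "race": "W", "incident_date": "2019-03-02"},
    {"case_id": "A2", "age": "22", "sex": "M", "race": "B", "incident_date": "2019-04-10"},
]
existing = [
    {"arrest_id": "X9", "age": "34", "sex": "F", "race": "W", "incident_date": "2019-03-02"},
    {"arrest_id": "X7", "age": "22", "sex": "M", "race": "B", "incident_date": "2019-04-10"},
    {"arrest_id": "X8", "age": "22", "sex": "M", "race": "B", "incident_date": "2019-04-10"},
]
linked, needs_review = link_on_nonunique(original, existing)
```

Because the ambiguous cases are resolved by hand, the script alone cannot reproduce the final linked file, which is one reason depositing the linked data is useful here.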
When the existing data are from the U.S. Census Bureau, the linked data should be deposited. Because Census files are large and contain many variables, deciding which variables to use and at what geographic level to extract subsets can be time-consuming, and depositing the linked data spares other users that effort.
After linking data, researchers often compute new variables. All useful derived variables should be deposited, especially those used in analyses reported in publications. Depending on the factors above, the derived variables may be deposited in a data file that includes both the original data and the existing data from another source, or deposited with the original data alone. The code or setup file used to link the files and create the derived variables should be provided.
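Providing the code alongside the derived variables documents exactly how each one was computed. The hypothetical Python sketch below derives a rate from a count in the original data and a population figure from the existing source, then writes original, existing, and derived fields to a single CSV file. The names `tract`, `incidents`, `population`, and `incident_rate_per_1000` are assumptions for illustration, and CSV is only one possible deposit format.

```python
import csv
import io


def add_incident_rate(linked_rows):
    """Derive incident_rate_per_1000 = incidents / population * 1000 for each linked row.

    Hypothetical derived variable; rows with a missing or zero population get an empty value.
    """
    out = []
    for row in linked_rows:
        derived = dict(row)
        population = float(row["population"]) if row.get("population") else 0.0
        derived["incident_rate_per_1000"] = (
            round(int(row["incidents"]) / population * 1000, 2) if population > 0 else ""
        )
        out.append(derived)
    return out


def write_deposit_file(rows, handle):
    """Write all fields, including derived ones, to one CSV file with a header row."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Usage (hypothetical linked rows; the second tract lacks a population figure).
linked = [
    {"tract": "0101", "incidents": "12", "population": "4000"},
    {"tract": "0102", "incidents": "3", "population": ""},
]
rows = add_incident_rate(linked)
buffer = io.StringIO()
write_deposit_file(rows, buffer)
print(buffer.getvalue())
```

Leaving the rate empty when the population is missing, rather than substituting a value, keeps the derivation rule simple to state in the accompanying documentation.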