National Archive of Criminal Justice Data

This dataset is maintained and distributed by the National Archive of Criminal Justice Data (NACJD), the criminal justice archive within ICPSR. NACJD is primarily sponsored by three agencies within the U.S. Department of Justice: the Bureau of Justice Statistics, the National Institute of Justice, and the Office of Juvenile Justice and Delinquency Prevention.

Examining the Structure, Organization, and Processes of the International Market for Stolen Data, 2007-2012 (ICPSR 35002)

Principal Investigator(s): Holt, Thomas, Michigan State University; Smirnova, Olga, East Carolina University

Summary:

These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed.

This study was designed to understand the economic and social structure of the online market for stolen data. The data provide information on the costs of various forms of personal information and cybercrime services, the payment systems used, the social organization and structure of the market, and interactions among buyers, sellers, and forum operators. The PIs used these data to assess the economy of stolen data markets, the social organization of participants, and the payment methods and services used.

The study used a sample of approximately 1,900 threads drawn from 13 web forums, 10 of which used Russian as their primary language and three of which used English. These forums were hosted around the world and served as online advertising spaces where individuals bought and sold a range of products. The forum content was downloaded, and the Russian-language content was translated into English, to create a purposive convenience sample of threads from each forum.

The collection contains one SPSS data file (ICPSR Submission Economic File SPSS.sav) with 39 variables and 13,735 cases, and one Access data file (Social Network Analysis File Revised 04-11-14.mdb) with a total of 16 data tables and 199 variables.

Qualitative data used to examine the associations and working relationships among participants at the micro and macro levels are not available at this time.

Access Notes

  • One or more files in this data collection have special restrictions; consult the restrictions note to learn more. You can apply online for access to the restricted-use data. A login is required to apply.

    Access to these data is restricted. Users interested in obtaining these data must complete a Restricted Data Use Agreement, specify the reason for the request, and obtain IRB approval or notice of exemption for their research.

    Any public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.

Dataset(s)

No downloadable data files are available.

Study Description

Citation

Holt, Thomas, and Olga Smirnova. Examining the Structure, Organization, and Processes of the International Market for Stolen Data, 2007-2012. ICPSR35002-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2017-06-15. https://doi.org/10.3886/ICPSR35002.v1

Persistent URL: https://doi.org/10.3886/ICPSR35002.v1

Funding

This study was funded by:

  • United States Department of Justice. Office of Justice Programs. National Institute of Justice (2010-IJ-CX-1676)

Scope of Study

Subject Terms:    computer use, cyber crime, electronic commerce, fraud, identity theft, Internet, stolen property

Smallest Geographic Unit:    Country

Geographic Coverage:    Australia, Canada, Europe, Russia, United Kingdom, United States

Time Period:   

  • 2007-2012

Date of Collection:   

  • 2011-2012

Unit of Observation:    Individual

Universe:    Individuals who participated in 13 online markets operating through web forums around the world from 2007 to 2012.

Data Type(s):    observational data

Data Collection Notes:

The documentation provided by the PIs does not make clear how many threads were used in the analysis. Three different numbers were provided (1,889; 1,893; and 1,899). Because of this discrepancy, ICPSR uses the term "approximately 1,900" when referencing the number of threads.

Methodology

Study Purpose:    The purpose of this study was to assess the economy and market processes of the online market for stolen data, as well as its organization and network structure.

Study Design:   

This study used a sample of approximately 1,900 threads from 13 web forums where criminals and hackers buy, sell, and trade stolen financial and personal information. Eight of the sampled sites were publicly accessible, meaning that anyone in the general public could access the entire site. The remaining five sites required users to register an account before they could view the sub-forums related to data sales. To capture forum content across all sites, the researchers created usernames on each forum but did not interact with other registered participants, to reduce the potential for contamination.

Researchers utilized both qualitative and quantitative methods to address the research questions of this study. Specifically, quantitative analysis techniques were used to examine the economy of the market, while qualitative grounded theory analyses were used to explore the social organization of the market. Finally, quantitative social network analyses were used to assess the relationships between participants in these markets.

For the Economic Data Coding (ICPSR Submission Economic File SPSS.sav), content analysis techniques were applied to classify the products, resources, and materials sold or sought in these forums. The content of each advertisement was coded according to the detail it provided. Each item was coded individually: an advertisement in which an individual sold both credit card numbers and PayPal accounts was coded as one instance of each activity. Any subsequent advertisements or updates within a thread were also coded as new cases, to capture variation in pricing and products over time.
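The item-level coding rule described above can be illustrated with a minimal Python sketch. The record layouts (`Advertisement`, `Case`), the function `code_advertisements`, and the example products and prices are hypothetical; they do not reproduce the variables in ICPSR Submission Economic File SPSS.sav. The sketch only shows how one advertisement offering two products yields two cases, and how a later update in the same thread yields a further case.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record layouts for illustration only; the actual variable
# names in "ICPSR Submission Economic File SPSS.sav" are not reproduced here.
@dataclass
class Advertisement:
    thread_id: int
    post_index: int                 # 0 = original ad, 1+ = later ads/updates in the thread
    items: List[Tuple[str, float]]  # (product, price) pairs offered in this post


@dataclass
class Case:
    thread_id: int
    post_index: int
    product: str
    price: float


def code_advertisements(ads: List[Advertisement]) -> List[Case]:
    """Expand each advertisement into one case per item offered.

    Every post (original ad or later update) contributes its own cases,
    so changes in price or product within a thread remain visible.
    """
    cases = []
    for ad in ads:
        for product, price in ad.items:
            cases.append(Case(ad.thread_id, ad.post_index, product, price))
    return cases


if __name__ == "__main__":
    # Hypothetical thread: one ad selling two products, then a price update.
    thread = [
        Advertisement(1, 0, [("credit card numbers", 10.0), ("PayPal accounts", 25.0)]),
        Advertisement(1, 1, [("credit card numbers", 8.0)]),
    ]
    for case in code_advertisements(thread):
        print(case)
```

Keeping one case per item, rather than one per advertisement, means that price comparisons across products do not depend on how sellers happened to bundle their offers.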

The Social Network Analysis (Social Network Analysis File Revised 04-11-14.mdb) allowed researchers to visualize and quantify information about large networks and complex relationships. Network analysis supported both the visualization of users' communications and the extraction of network connectivity. The exchanges between individuals in these forums made it possible to identify network structures within and across all the forums. This type of analysis helped identify global patterns in the otherwise hidden networks of data market participants and the connectivity between them. In addition, social network analysis enabled researchers to consider connections between participants based on their roles both in the forum and in any sales or exchanges noted in a thread.

When social network analysis techniques were applied to the forum data, individual posters became network vertices, and their forum interactions established the connections between them. The username of each poster, regardless of the forum in which it appeared, served as the basis for assessing connections between individuals and for tracing the flow of information from one agent, or vertex, to another. This allowed researchers to build a set of arcs, or directed connections, between hackers.
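A minimal Python sketch of the arc-building step, assuming each post can be reduced to its forum, its author, and the usernames it is directed to. The `Post` layout, the functions `build_arcs` and `degrees`, the example usernames, and the choice to drop self-directed posts are illustrative assumptions, not the investigators' procedure. Vertices are keyed by username alone, so a username that appears on several forums becomes a single vertex, as described above.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical post representation: (forum, author, usernames the post is directed to).
Post = Tuple[str, str, List[str]]


def build_arcs(posts: Iterable[Post]) -> Counter:
    """Map each directed arc (sender, receiver) to the number of posts along it.

    The forum name is ignored when forming vertices, so the same username
    on different forums is treated as one vertex.
    """
    arcs = Counter()
    for _forum, author, targets in posts:
        for target in targets:
            if target != author:  # assumption: self-directed posts are dropped
                arcs[(author, target)] += 1
    return arcs


def degrees(arcs: Counter) -> Tuple[Counter, Counter]:
    """Compute weighted out-degree and in-degree for every vertex."""
    out_deg, in_deg = Counter(), Counter()
    for (sender, receiver), weight in arcs.items():
        out_deg[sender] += weight
        in_deg[receiver] += weight
    return out_deg, in_deg


if __name__ == "__main__":
    # Hypothetical posts from two forums sharing the username "buyer1".
    posts = [
        ("forum_a", "seller1", ["buyer1"]),
        ("forum_a", "buyer1", ["seller1"]),
        ("forum_b", "buyer1", ["seller1", "seller2"]),
    ]
    arcs = build_arcs(posts)
    out_deg, in_deg = degrees(arcs)
    print(arcs[("buyer1", "seller1")])  # 2: one arc per post, pooled across forums
    print(out_deg["buyer1"], in_deg["seller1"])
```

Weighting arcs by post count, rather than recording only whether a connection exists, is one possible design choice; a binary adjacency would discard how often two participants interacted.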

Sample:    A convenience sample of approximately 1,900 threads was collected from 13 public and private forums, in both Russian and English, engaged in the sale of stolen data. The sample of forums was developed via a snowball sampling procedure similar to those used in traditional offline qualitative fieldwork. This approach was valuable because there was no way to determine the total number of stolen data markets operating around the world at any point in time. The sample therefore began with three English-language forums identified through Google.com using terms common in stolen data markets, including "carding dump purchase sale cvv." One of these sites was a sub-forum of a larger Russian-language forum. After the content of threads from these sites was explored, three Russian-language forums were identified via web links provided by forum users. Six additional forums were identified through the same process, for a total of 10 Russian-language sites and three English-language forums.

Time Method:    Cross-sectional

Weight:    None

Mode of Data Collection:    record abstracts

Description of Variables:   

The Economic Data File (ICPSR Submission Economic File SPSS.sav; 39 variables, n=13,735) includes information about the forum, such as the language used and the hosting location, and about the poster, such as whether they list an e-mail address. For items being sold, variables record what is being sold and at what price, whether there is a minimum purchase, and whether a bulk discount is offered. The file also records which nations are being harmed and which payment types the poster seeks. Additional variables indicate whether the poster provides customer service, samples, or free replacements, and whether the forum is a ripping forum.

The tables in the Social Network Analysis data file (Social Network Analysis File Revised 04-11-14.mdb; 16 tables, 199 variables) include the name of the forum, the username, the level or role of the user in the forum, the date the user joined that forum as a member, and the date the user started or posted in the thread. Other variables record the number of posts the user has made, any funding the user has, what the post is about, and whether the post pertains to buying or selling. The tables also include information about the user who started the thread, a list of the usernames within the thread in sequential order, a thread number, and a list of the usernames to which the post is directed.

Response Rates:    Not applicable.

Presence of Common Scales:    none

Version(s)

Original ICPSR Release:   2017-06-15
