Smallest Geographic Unit:
Date of Collection:
Unit of Observation:
Individuals who engaged in 13 on-line markets operating through web forums across the world from 2007-2012.
Data Collection Notes:
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed.
Qualitative data used to examine the associations and working relationships present between participants at the micro and macro levels are not available at this time.
The documentation provided by the PI does not make clear how many threads were used in the analysis: three different numbers were given (1,889; 1,893; and 1,899). Because of this issue, ICPSR uses the term "approximately 1,900" when referencing the number of threads.
The purpose of this study was to assess the economy and market processes of the market for stolen data operating on-line, as well as its organization and network structure.
This study utilized a sample of approximately 1,900 threads from 13 web forums where criminals and hackers buy, sell, and trade stolen financial and personal information. Eight of the sites sampled were publicly accessible, in that the entire site could be accessed by anyone in the general public. The five remaining sites required that an individual create a registered user account within the site in order to access the content of the sub-forums related to data sales. In order to capture the forum content across all sites, the researchers created usernames for each forum but did not interact with other registered participants to reduce the potential for contamination.
Researchers utilized both qualitative and quantitative methods to address the research questions of this study. Specifically, quantitative analysis techniques were used to examine the economy of the market, while qualitative grounded theory analyses were used to explore the social organization of the market. Finally, quantitative social network analyses were used to assess the relationships between participants in these markets.
For the Economic Data Coding (ICPSR Submission Economic File SPSS.sav), content analysis techniques were applied to classify the various products, resources, and materials sold or sought out in these forums. The content of each ad was coded based on the detail provided. Each item was coded individually, such that an advertisement in which an individual was selling credit card numbers as well as PayPal accounts was coded as a single instance of each activity. In addition, any further advertisements or updates in a thread were coded as new cases to capture variations in pricing and products over time.
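The coding rule described above can be illustrated with a minimal Python sketch. This is not the investigators' procedure or software; the function name `code_thread`, the record layout, and the field names (`thread_id`, `post_index`, `product`, `price`) are hypothetical and do not correspond to variable names in the SPSS file.

```python
def code_thread(thread_id, posts):
    """Turn the posts of one thread into coded cases.

    Each product in an ad becomes its own case, and every later ad or
    update in the same thread yields new cases rather than overwriting
    earlier ones. All field names here are hypothetical.
    """
    cases = []
    for post_index, post in enumerate(posts):
        for product, price in post["items"]:
            cases.append({
                "thread_id": thread_id,
                "post_index": post_index,
                "product": product,
                "price": price,
            })
    return cases


# Invented example: one ad offering two products, then a price update.
example_posts = [
    {"items": [("credit card numbers", 10.0), ("PayPal accounts", 5.0)]},
    {"items": [("credit card numbers", 8.0)]},
]
cases = code_thread(1, example_posts)
```

In this invented example the first ad contributes two cases and the update contributes a third, so the price change for the same product is preserved as a separate observation.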
The Social Network Analysis (Social Network Analysis File Revised 04-11-14.mdb) allowed researchers to visualize and quantify the information on large networks and complex relationships. Network analysis allowed both the visualization of users' communications and the extraction of network connectivity. The exchanges between individuals in these forums allowed for the identification of network structures within and across all the forums. This type of analysis allowed for the identification of global patterns in the otherwise hidden networks of data market participants, and of connectivity between participants. In addition, social network analysis enabled researchers to consider connections between participants based on their role both in the forum and in the course of any sales or exchanges noted in a thread.
When social network analysis techniques were applied to the forum data, individual posters became network vertices, and their forum interactions established the connections between them. The username for each poster, regardless of the forum in which it appeared, served as the basis for assessing connections between individuals and for considering the flow of information from one agent, or vertex, to another. This allowed researchers to build a set of arcs, or connections between hackers.
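As a rough illustration of the vertex-and-arc construction described above, the following Python sketch builds a directed network from post records. It assumes, hypothetically, that each record holds a forum name, the poster's username, and the usernames the post is directed to; the sample data, the function name `build_arcs`, and the record layout are invented and are not taken from the .mdb file.

```python
from collections import Counter

# Hypothetical post records: (forum, poster, usernames the post is directed to).
posts = [
    ("forum_a", "alice", ["bob"]),
    ("forum_a", "bob", ["alice", "carol"]),
    ("forum_b", "alice", ["carol"]),
    ("forum_b", "carol", []),
]


def build_arcs(posts):
    """Return the vertex set and a Counter of directed arcs (sender, recipient).

    Usernames are matched across forums, so the same username appearing
    in two forums is treated as a single vertex.
    """
    vertices = set()
    arcs = Counter()
    for _forum, poster, recipients in posts:
        vertices.add(poster)
        for recipient in recipients:
            vertices.add(recipient)
            if recipient != poster:  # ignore self-directed posts
                arcs[(poster, recipient)] += 1
    return vertices, arcs


vertices, arcs = build_arcs(posts)

# Number of directed messages sent by each user (weighted out-degree).
out_degree = Counter(sender for sender, _recipient in arcs.elements())
```

Matching on username alone follows the description above; it would merge distinct people who happen to share a username, a limitation of this simplified sketch.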
A convenience sample of approximately 1,900 threads from 13 public and private forums engaged in the sale of stolen data, in both Russian and English, was collected. The sample of forums was developed via a snowball sampling procedure similar to those used in traditional qualitative field work in the real world. Such a tactic was valuable as there was no immediate way to document the total number of stolen data markets operating around the world at any point in time. Thus, this sample began with the identification of three English language forums through Google.com using common terms in stolen data markets, including "carding dump purchase sale cvv." One of these sites was a sub-forum of a larger Russian language forum. After exploring the content of threads from these sites, three Russian language forums were identified via web links provided by forum users. Six additional forums were identified using the same processes to create a total of 10 Russian language sites and three English language forums.
Mode of Data Collection:
Description of Variables:
The Economic Data File (ICPSR Submission Economic File SPSS.sav; 39 variables, n=13,735) includes information related to the forum, such as language used and hosting location, as well as information related to the poster, including whether they list an e-mail address. For items being sold, there are variables pertaining to what is being sold and at what price, whether there is a minimum purchase, and whether there is a bulk discount. There is also information detailing which nations are being harmed and what payment type the poster seeks. There are also variables regarding whether the poster provides customer service, whether the poster provides samples, whether the poster provides free replacements, and whether the forum is a ripping forum.
The tables in the Social Network Analysis data file (Social Network Analysis File Revised 04-11-14.mdb; 16 tables, 199 variables) include the name of the forum, a username, the level or role of the user in the forum, the date the user joined that particular forum as a member, and the date the user started the thread or posted in the thread. There are also variables regarding the number of posts the user has made, any funding the user has, what the post is about, and whether the post pertains to buying or selling. There is also information related to the user who started the thread, a list of the usernames within the thread in sequential order, a thread number, and a list of the usernames to which the post is being directed.
Presence of Common Scales: