#MeToo Tweet IDs, October 15-28, 2017 (ICPSR 37447)

Name: #MeToo Tweet IDs, October 15-28, 2017
Published: 2019-11-14
License: https://www.icpsr.umich.edu/web/ICPSR/studies/37447/terms

Version Date: Nov 14, 2019

Principal Investigator(s):
Ryan J. Gallagher, Northeastern University (Boston, Mass.); Elizabeth Stowell, Northeastern University (Boston, Mass.); Andrea G. Parker, Northeastern University (Boston, Mass.); Brooke Foucault Welles, Northeastern University (Boston, Mass.)

https://doi.org/10.3886/ICPSR37447.v1

Version V1

Alternate Title

#MeToo 2017

Summary

This collection of tweet IDs pertains to the first two weeks of the #MeToo hashtag campaign in October 2017. During this time period there were over 1.5 million tweets with the #MeToo hashtag. Tweets containing the hashtag #MeToo were collected retroactively from a full historical Twitter Firehose (100%) collection, and reply threads in response to those tweets were separately collected from Twitter. According to Twitter Terms of Service, full tweet objects cannot be disseminated, but the tweet IDs can be rehydrated through Twitter's public GET statuses/lookup API endpoint.
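
As a rough illustration of the rehydration step, the sketch below groups tweet IDs into batches and formats each batch for a lookup request. It assumes the endpoint's limit of 100 IDs per request and a comma-separated `id` query parameter; the function names `batch_ids` and `id_parameter` are hypothetical, and no network call is made.

```python
from typing import Iterable, Iterator, List

# Assumption: GET statuses/lookup accepts at most 100 IDs per request.
MAX_IDS_PER_REQUEST = 100


def batch_ids(tweet_ids: Iterable[str],
              size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` tweet IDs."""
    batch: List[str] = []
    for tweet_id in tweet_ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def id_parameter(batch: List[str]) -> str:
    """Join one batch into a comma-separated string for the `id` parameter."""
    return ",".join(batch)
```

Tweets that have been deleted or taken down since collection would simply be absent from the response, so the number of rehydrated tweets can be smaller than the number of IDs submitted.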

The available data for this study exist in one zipped folder containing 28 files. There are 14 .csv files, one for each day from October 15th to October 28th, each listing one tweet ID per line. Each of these files contains only a single column of data (tweet_id). There were on average 109,237 tweets per day during this two-week period, ranging from 16,074 to 528,143 tweets per day. Tweets must have been public and not deleted or taken down at the time of collection in order to appear in this dataset.
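
A minimal sketch of reading one of the daily tweet files is shown below. It assumes the file may or may not begin with a `tweet_id` header row, and it keeps IDs as strings so that they are not altered by tools that store large numbers as floats. The function name `read_tweet_ids` and the file name in the usage comment are hypothetical; the actual file names are given in the collection's README.

```python
import csv
from typing import Iterable, List


def read_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Return tweet IDs from a one-column CSV, skipping a header row if present."""
    ids: List[str] = []
    for row in csv.reader(lines):
        if not row:
            continue  # ignore blank lines
        value = row[0].strip()
        if not value or value == "tweet_id":
            continue  # ignore empty cells and the header, if any
        ids.append(value)
    return ids


# Usage with a hypothetical file name:
# with open("metoo_2017-10-15.csv", newline="") as handle:
#     ids = read_tweet_ids(handle)
```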

The other 14 .csv files correspond to the reply threads for each day in response to tweets containing the hashtag #MeToo. Each line gives, as comma-separated values, the tweet ID of a reply in a thread of replies to a #MeToo tweet (tweet_id) and the tweet ID of the tweet immediately preceding that reply in the thread (in_reply_to_tweet_id). There were on average 21,072 replies to tweets per day during this period, with a range of 2,388 to 110,789 replies per day.
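
Because each row links a reply to the tweet immediately before it, a thread can be rebuilt by following those links upward. The sketch below is one way to do that, assuming the two columns appear in the order (tweet_id, in_reply_to_tweet_id) and that a header row may be present; the function names `load_parents` and `thread_path` are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def load_parents(lines: Iterable[str]) -> Dict[str, str]:
    """Map each reply's tweet ID to the ID of the tweet it replies to."""
    parents: Dict[str, str] = {}
    for row in csv.reader(lines):
        if len(row) < 2 or row[0].strip() == "tweet_id":
            continue  # skip blank lines and a header row, if any
        parents[row[0].strip()] = row[1].strip()
    return parents


def thread_path(reply_id: str, parents: Dict[str, str]) -> List[str]:
    """Follow in_reply_to links upward; the last element is the thread's root."""
    path = [reply_id]
    seen = {reply_id}
    while path[-1] in parents:
        parent = parents[path[-1]]
        if parent in seen:  # defensive guard against malformed cycles
            break
        path.append(parent)
        seen.add(parent)
    return path
```

Under these assumptions, the root of each path would be an ID that never appears in the tweet_id column of the reply files, which is presumably a #MeToo tweet from the daily tweet files.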

Citation

Gallagher, Ryan J., Stowell, Elizabeth, Parker, Andrea G., and Foucault Welles, Brooke. #MeToo Tweet IDs, October 15-28, 2017. Inter-university Consortium for Political and Social Research [distributor], 2019-11-14. https://doi.org/10.3886/ICPSR37447.v1

Subject Terms

hashtags sexual assault social media social support tweets Twitter

Geographic Coverage

Global

Smallest Geographic Unit

None

Restrictions

This data collection may not be used for any purpose other than statistical reporting and analysis. Use of these data to learn the identity of any person or establishment is prohibited. To protect respondent privacy, all data files in this collection are restricted from general dissemination. To obtain these restricted files, researchers must agree to the terms and conditions of a Restricted Data Use Agreement.

Distributor(s)

Inter-university Consortium for Political and Social Research

Time Period(s)

2017-10-15 -- 2017-10-28

Date of Collection

2018-02 -- 2019-11

Data Collection Notes

Tweets were originally collected in February 2018 through Sysomos, a social media analytics company. The tweets were collected manually by date and time through a user interface. The tweets were then rehydrated in September 2018 through the public Twitter API. In February 2019, a 24-hour period of missing data was identified in the original collection of #MeToo tweets. The tweets from this period of missing data were purchased directly from Twitter in June 2019. Reply threads were collected from Twitter in November 2019 for the original collection of tweets, and in June 2019 for the purchased collection of missing tweets. Tweets were only collected if they were publicly available and had not been deleted or taken down. If a tweet was deleted or taken down before it could be collected at any point in this process, then it is not in this dataset.
Replies were collected in response to #MeToo tweets. Replies were collected iteratively so that entire reply threads in response to #MeToo tweets could be collected. Replies were only collected if they came within 2 days of the original tweets and did not extend beyond the upper window of the study, October 28th, 2017. As with the #MeToo tweets, tweets must have been public and not deleted or taken down at the time of collection in order to appear in this dataset.
Between the original collection of the #MeToo data in February 2018 and the rehydration of the tweet IDs in September 2018, approximately 78% of tweets were still present. Tweets were not available if they had been deleted or taken down by Twitter. The Principal Investigators are unable to provide an estimate of the attrition rate of replies to the #MeToo tweets.
The tweet IDs in this collection pertain to the first two weeks of the #MeToo hashtag campaign. The Principal Investigators used these data to algorithmically identify individuals who have disclosed experiences of sexual violence. These disclosures make up over 51.7% of authored #MeToo tweets (i.e., not retweets) during this period, and 15.1% of all #MeToo tweets (including retweets) during this period.
A README text document accompanies the .csv data files. Used in conjunction with the DocNow hydrator, it allows the tweet IDs that are still available to be rehydrated into JSON (JavaScript Object Notation). The README file provides the command lines needed to prepare the #MeToo data for rehydration using the hydrator.

Study Purpose

This data collection focuses on the first two weeks of the #MeToo campaign in order to study the direct, public disclosures of sexual violence on Twitter and the social support structures that emerge around such disclosures.

Sample

Tweets were collected from a full (100%) Twitter Firehose collection if they contained the hashtag #MeToo or retweeted a tweet containing the hashtag #MeToo. Those tweets were later rehydrated via their tweet IDs through Twitter's public GET statuses/lookup API endpoint. Tweets must have been public and not deleted or taken down at the time of collection in order to appear in this dataset.

Replies were collected in response to #MeToo tweets. Replies were collected iteratively so that entire reply threads in response to #MeToo tweets could be collected. Replies were only collected if they came within 2 days of the original tweets and did not extend beyond the upper window of the study, October 28th, 2017. As with the #MeToo tweets, tweets must have been public and not deleted or taken down at the time of collection in order to appear in this dataset.
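
The two conditions on replies can be expressed as a simple predicate. The sketch below is only an interpretation: it assumes timestamps are in UTC, that "within 2 days" means at most 48 hours after the tweet being compared against, and that the study window closes at the end of October 28th, 2017. None of these details are specified in the study description, and the name `reply_in_scope` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumptions: UTC timestamps, a 48-hour reply window, and a study window
# that closes at the end of October 28th, 2017.
REPLY_WINDOW = timedelta(days=2)
STUDY_END = datetime(2017, 10, 29, tzinfo=timezone.utc)  # exclusive upper bound


def reply_in_scope(original_time: datetime, reply_time: datetime) -> bool:
    """Return True if a reply would satisfy both collection conditions."""
    elapsed = reply_time - original_time
    within_window = timedelta(0) <= elapsed <= REPLY_WINDOW
    before_study_end = reply_time < STUDY_END
    return within_window and before_study_end
```

One consequence of the second condition, under these assumptions, is that threads started late in the study period are truncated sooner than threads started early on.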

Time Method

Longitudinal: Cohort / Event-based

Universe

Tweets from Twitter that contained, quoted, or retweeted a tweet containing the hashtag #MeToo, and replies (not necessarily containing #MeToo) threaded in response to those tweets.

Unit(s) of Observation

Data Type(s)

event/transaction data

Description of Variables

The following presents a list of the number of tweets and replies for each of the 14 days during the initial #MeToo movement of October 2017.

Sunday, October 15th: 24,265 tweets / 4,896 replies
Monday, October 16th: 528,143 tweets / 110,789 replies
Tuesday, October 17th: 414,188 tweets / 79,715 replies
Wednesday, October 18th: 186,381 tweets / 39,421 replies
Thursday, October 19th: 108,574 tweets / 18,535 replies
Friday, October 20th: 58,344 tweets / 9,118 replies
Saturday, October 21st: 34,448 tweets / 5,296 replies
Sunday, October 22nd: 36,243 tweets / 5,923 replies
Monday, October 23rd: 26,912 tweets / 3,882 replies
Tuesday, October 24th: 28,989 tweets / 4,112 replies
Wednesday, October 25th: 27,451 tweets / 3,992 replies
Thursday, October 26th: 19,846 tweets / 3,437 replies
Friday, October 27th: 19,464 tweets / 3,505 replies
Saturday, October 28th: 16,074 tweets / 2,388 replies
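
The daily averages and ranges quoted in the Summary can be checked against this list. The short sketch below transcribes the counts and recomputes the totals, rounded daily means, and extremes; the function name `summarize` is illustrative only.

```python
# Daily counts transcribed from the list above (October 15th through 28th).
tweets = [24265, 528143, 414188, 186381, 108574, 58344, 34448,
          36243, 26912, 28989, 27451, 19846, 19464, 16074]
replies = [4896, 110789, 79715, 39421, 18535, 9118, 5296,
           5923, 3882, 4112, 3992, 3437, 3505, 2388]


def summarize(counts):
    """Return (total, rounded daily mean, minimum, maximum) for a list of counts."""
    total = sum(counts)
    return total, round(total / len(counts)), min(counts), max(counts)


print(summarize(tweets))   # (1529322, 109237, 16074, 528143)
print(summarize(replies))  # (295009, 21072, 2388, 110789)
```

The tweet total of 1,529,322 agrees with the Summary's figure of over 1.5 million tweets, and both means match the stated averages of 109,237 tweets and 21,072 replies per day.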

Original Release Date

2019-11-14

Weight

None

Notes

These data are freely available to data users at ICPSR member institutions. The curation and dissemination of this study are provided by the institutional members of ICPSR.
One or more files in this data collection have special restrictions. Restricted data files are not available for direct download from the website.
