#MeToo Tweet IDs, October 15-28, 2017 (ICPSR 37447)

Version Date: Nov 14, 2019

Principal Investigator(s):
Ryan J. Gallagher, Northeastern University (Boston, Mass.); Elizabeth Stowell, Northeastern University (Boston, Mass.); Andrea G. Parker, Northeastern University (Boston, Mass.); Brooke Foucault Welles, Northeastern University (Boston, Mass.)

https://doi.org/10.3886/ICPSR37447.v1

Version V1

#MeToo 2017

This collection of tweet IDs pertains to the first two weeks of the #MeToo hashtag campaign in October 2017. During this time period there were over 1.5 million tweets with the #MeToo hashtag. Tweets containing the hashtag #MeToo were collected retroactively from a full historical Twitter Firehose (100%) collection, and reply threads in response to those tweets were separately collected from Twitter. According to Twitter Terms of Service, full tweet objects cannot be disseminated, but the tweet IDs can be rehydrated through Twitter's public GET statuses/lookup API endpoint.
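
As a rough illustration of the rehydration step, the sketch below groups tweet IDs into batches and builds the query parameters for one lookup request. It assumes the documented behaviour of the v1.1 GET statuses/lookup endpoint (a comma-separated `id` parameter with at most 100 IDs per request); the function names are hypothetical, no network call or authentication is shown, and the endpoint's availability may have changed since this collection was published.

```python
from typing import Dict, Iterable, Iterator, List

# Assumption: GET statuses/lookup (v1.1) accepts at most 100 IDs per request.
BATCH_SIZE = 100


def batch_ids(tweet_ids: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` tweet IDs."""
    batch: List[str] = []
    for tweet_id in tweet_ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def lookup_params(batch: List[str]) -> Dict[str, str]:
    """Build the query parameters for one lookup request (IDs joined by commas)."""
    return {"id": ",".join(batch)}
```

Tweets that have been deleted or made private since collection are simply absent from the response, so the number of rehydrated tweets is expected to be smaller than the number of IDs submitted.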

The available data for this study exist in one zipped folder containing 28 files. Fourteen .csv files, one for each day from October 15th to October 28th, list the tweet IDs of #MeToo tweets, one tweet ID per line. Each of these files contains a single column of data (tweet_id). There were on average 109,237 tweets per day during this two-week period, ranging from 16,074 to 528,143 tweets per day. Tweets must have been public and not deleted or taken down at the time of collection in order to appear in this dataset.
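
A minimal sketch of reading one of the daily files is shown below. The function name is hypothetical, and whether the files begin with a `tweet_id` header row is an assumption, so the code tolerates both cases. IDs are kept as strings because some tools (spreadsheets, JavaScript) lose precision on 64-bit integers.

```python
import csv
from pathlib import Path
from typing import List


def read_tweet_ids(csv_path: Path) -> List[str]:
    """Return the tweet IDs in one daily file, kept as strings."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        ids = [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]
    # Assumption: if a header row exists, it is literally "tweet_id".
    if ids and ids[0] == "tweet_id":
        ids = ids[1:]
    return ids
```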

The other 14 .csv files correspond to the reply threads for each day in response to tweets containing the hashtag #MeToo. Each line gives, as comma-separated values, the tweet ID of a reply in a thread of replies to a #MeToo tweet (tweet_id) and the tweet ID of the tweet immediately preceding that reply in the thread (in_reply_to_tweet_id). There were on average 21,072 replies per day during this period, with a range of 2,388 to 110,789 replies per day.
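
Because each reply row points only to its immediate parent, recovering the tweet at the top of a thread means following the in_reply_to_tweet_id links upward. The sketch below illustrates one way to do this; the function names are hypothetical, and it assumes the (tweet_id, in_reply_to_tweet_id) pairs have already been read from a reply file.

```python
from typing import Dict, Iterable, Tuple


def build_parent_map(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each reply's tweet_id to its in_reply_to_tweet_id."""
    return {tweet_id: parent_id for tweet_id, parent_id in rows}


def thread_root(tweet_id: str, parents: Dict[str, str]) -> str:
    """Follow parent links until reaching a tweet with no recorded parent."""
    seen = set()
    current = tweet_id
    # The `seen` set guards against an unexpected cycle in malformed data.
    while current in parents and current not in seen:
        seen.add(current)
        current = parents[current]
    return current
```

A tweet that never appears in the tweet_id column of a reply file is treated as a root; under the collection design described above, that root would normally be a #MeToo tweet from one of the daily files.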

Gallagher, Ryan J., Stowell, Elizabeth, Parker, Andrea G., and Foucault Welles, Brooke. #MeToo Tweet IDs, October 15-28, 2017. Inter-university Consortium for Political and Social Research [distributor], 2019-11-14. https://doi.org/10.3886/ICPSR37447.v1

This data collection may not be used for any purpose other than statistical reporting and analysis. Use of these data to learn the identity of any person or establishment is prohibited. To protect respondent privacy, all data files in this collection are restricted from general dissemination. To obtain these restricted files, researchers must agree to the terms and conditions of a Restricted Data Use Agreement.

Inter-university Consortium for Political and Social Research

Time Period: 2017-10-15 -- 2017-10-28
Date of Collection: 2018-02 -- 2019-11
  1. Tweets were originally collected in February 2018 through Sysomos, a social media analytics company. The tweets were collected manually by date and time through a user interface. The tweets were then rehydrated in September 2018 through the public Twitter API. In February 2019, a 24-hour period of missing data was identified in the original collection of #MeToo tweets. The tweets from this period of missing data were purchased directly from Twitter in June 2019. Reply threads were collected from Twitter in November 2019 for the original collection of tweets, and June 2019 for the purchased collection of missing tweets. Tweets were only collected if they were publicly available and had not been deleted or taken down. If a tweet was deleted or taken down before it could be collected at any point in this process, then it is not in this dataset.

  2. Replies were collected in response to #MeToo tweets. Replies were collected iteratively so that entire reply threads in response to #MeToo tweets could be collected. Replies were only collected if they came within 2 days of the original tweets and did not extend beyond the upper window of the study, October 28th, 2017. As with the #MeToo tweets, tweets must have been public and not deleted or taken down at the time of collection in order to appear in this dataset.

  3. Between the original collection of the #MeToo data in February 2018 and the rehydration of the tweet IDs in September 2019, approximately 78% of tweets were still present. Tweets were not available if they were deleted or taken down by Twitter. The Principal Investigators are unable to provide an estimate on the attrition rate of replies to the #MeToo tweets.

  4. The tweet IDs in this collection pertain to the first two weeks of the #MeToo hashtag campaign. The Principal Investigators used this data to algorithmically identify individuals who have disclosed experiences of sexual violence. These disclosures make up over 51.7% of authored #MeToo tweets (i.e. not retweets) during this period, and 15.1% of all #MeToo tweets (including retweets) during this period.

  5. A README text document accompanies the .csv data files. Used together with the DocNow Hydrator, it allows the available tweet IDs to be rehydrated into JSON (JavaScript Object Notation). The README file provides the command lines needed to prepare the #MeToo data for rehydration with the Hydrator.
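
The README's own command lines are not reproduced here. As an alternative sketch of the same preparation step, the code below merges several .csv files into a single text file with one tweet ID per line, which is the input format the DocNow Hydrator reads. The function name and file paths are hypothetical, and the handling of a possible header row is an assumption.

```python
import csv
from pathlib import Path
from typing import Iterable


def write_id_list(csv_paths: Iterable[Path], out_path: Path) -> int:
    """Write the first-column tweet IDs of each CSV to out_path, one per line.

    Returns the number of IDs written.
    """
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        for csv_path in csv_paths:
            with csv_path.open(newline="", encoding="utf-8") as handle:
                for row in csv.reader(handle):
                    if not row:
                        continue
                    value = row[0].strip()
                    # Non-numeric values (e.g. a "tweet_id" header) are skipped.
                    if value.isdigit():
                        out.write(value + "\n")
                        count += 1
    return count
```

Since tweet_id is the first column in both the daily tweet files and the reply files, the same function can be pointed at either set.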

This data collection focuses on the first two weeks of the #MeToo campaign in order to study the direct, public disclosures of sexual violence on Twitter and the social support structures that emerge around such disclosures.

Tweets were collected from a full (100%) Twitter Firehose collection if they contained the hashtag #MeToo or retweeted a tweet containing the hashtag #MeToo. Those tweets were later rehydrated via their tweet IDs through Twitter's public GET statuses/lookup API endpoint. Tweets must have been public and not deleted or taken down at the time of collection in order to appear in this dataset.

Replies were collected in response to #MeToo tweets. Replies were collected iteratively so that entire reply threads in response to #MeToo tweets could be collected. Replies were only collected if they came within 2 days of the original tweets and did not extend beyond the upper window of the study, October 28th, 2017. As with the #MeToo tweets, tweets must have been public and not deleted or taken down at the time of collection in order to appear in this dataset.
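
The reply window can be approximated from the IDs alone, since tweet IDs of this era are Snowflake IDs that encode a millisecond timestamp in their upper bits. The sketch below is an illustration under stated assumptions, not the Principal Investigators' procedure: it takes "original tweet" to mean the tweet the reply is being compared against, and it assumes the study window closes at the end of October 28th, 2017 in UTC (the time zone is not specified in the source). The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Snowflake epoch used by Twitter, in milliseconds since the Unix epoch.
TWITTER_EPOCH_MS = 1288834974657

# Assumption: the study window ends at the end of October 28th, 2017 (UTC).
STUDY_END = datetime(2017, 10, 29, tzinfo=timezone.utc)
REPLY_WINDOW = timedelta(days=2)


def snowflake_time(tweet_id: int) -> datetime:
    """Approximate creation time encoded in a Snowflake tweet ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def reply_in_window(reply_id: int, original_id: int) -> bool:
    """True if the reply falls within 2 days of the original and inside the study window."""
    reply_at = snowflake_time(reply_id)
    original_at = snowflake_time(original_id)
    within_two_days = original_at <= reply_at <= original_at + REPLY_WINDOW
    return within_two_days and reply_at < STUDY_END
```

Rehydrated tweet objects carry an explicit creation timestamp, which should be preferred over this estimate whenever it is available.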

Longitudinal: Cohort / Event-based

Tweets from Twitter that contained, quoted, or retweeted a tweet containing the hashtag #MeToo, and replies (not necessarily containing #MeToo) threaded in response to those tweets.

tweet

The following presents a list of the number of tweets and replies for each of the 14 days during the initial #MeToo movement of October 2017.

  • Sunday, October 15th: 24,265 tweets / 4,896 replies
  • Monday, October 16th: 528,143 tweets / 110,789 replies
  • Tuesday, October 17th: 414,188 tweets / 79,715 replies
  • Wednesday, October 18th: 186,381 tweets / 39,421 replies
  • Thursday, October 19th: 108,574 tweets / 18,535 replies
  • Friday, October 20th: 58,344 tweets / 9,118 replies
  • Saturday, October 21st: 34,448 tweets / 5,296 replies
  • Sunday, October 22nd: 36,243 tweets / 5,923 replies
  • Monday, October 23rd: 26,912 tweets / 3,882 replies
  • Tuesday, October 24th: 28,989 tweets / 4,112 replies
  • Wednesday, October 25th: 27,451 tweets / 3,992 replies
  • Thursday, October 26th: 19,846 tweets / 3,437 replies
  • Friday, October 27th: 19,464 tweets / 3,505 replies
  • Saturday, October 28th: 16,074 tweets / 2,388 replies
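
The summary figures quoted in the description (daily averages and ranges) can be recomputed from the list above. The short sketch below does so with the counts transcribed from that list; the variable names are illustrative.

```python
# Daily (tweets, replies) counts transcribed from the list above.
daily_counts = {
    "2017-10-15": (24265, 4896),
    "2017-10-16": (528143, 110789),
    "2017-10-17": (414188, 79715),
    "2017-10-18": (186381, 39421),
    "2017-10-19": (108574, 18535),
    "2017-10-20": (58344, 9118),
    "2017-10-21": (34448, 5296),
    "2017-10-22": (36243, 5923),
    "2017-10-23": (26912, 3882),
    "2017-10-24": (28989, 4112),
    "2017-10-25": (27451, 3992),
    "2017-10-26": (19846, 3437),
    "2017-10-27": (19464, 3505),
    "2017-10-28": (16074, 2388),
}

tweets = [t for t, _ in daily_counts.values()]
replies = [r for _, r in daily_counts.values()]

total_tweets = sum(tweets)
total_replies = sum(replies)
mean_tweets = total_tweets / len(tweets)
mean_replies = total_replies / len(replies)

print(f"tweets:  total={total_tweets:,} mean={mean_tweets:,.0f} "
      f"min={min(tweets):,} max={max(tweets):,}")
print(f"replies: total={total_replies:,} mean={mean_replies:,.0f} "
      f"min={min(replies):,} max={max(replies):,}")
```

The totals are consistent with the description's figure of over 1.5 million #MeToo tweets and with the stated daily averages of 109,237 tweets and 21,072 replies.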

2019-11-14
