Congressional Record for 104th-109th Congresses: Text and Phrase Counts (ICPSR 33501)

Principal Investigator(s): Gentzkow, Matthew, University of Chicago, and National Bureau of Economic Research; Shapiro, Jesse, University of Chicago, and National Bureau of Economic Research

Access Notes

  • These data are available only to users at ICPSR member institutions.

Dataset(s)

DS0:  Study-Level Files
DS1:  Original 1995 (110,991 KB)
DS2:  Original 1996 (77,211 KB)
DS3:  Original 1997 (73,892 KB)
DS4:  Original 1998 (77,703 KB)
DS5:  Original 1999 (85,092 KB)
DS6:  Original 2000 (75,167 KB)
DS7:  Original 2001 (74,576 KB)
DS8:  Original 2002 (65,196 KB)
DS9:  Original 2003 (87,678 KB)
DS10:  Original 2004 (70,624 KB)
DS11:  Original 2005 (83,767 KB)
DS12:  Original 2006 (65,135 KB)
DS13:  Speeches (362,498 KB)
DS14:  Counts by Date (325,885 KB)
DS15:  Counts by Party (111,156 KB)
DS16:  Counts by Speaker (310,639 KB)
DS17:  Metadata: Speaker
DS18:  Metadata: Speech (16,633 KB)

Study Description

Citation

Gentzkow, Matthew, and Jesse Shapiro. Congressional Record for 104th-109th Congresses: Text and Phrase Counts. ICPSR33501-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2012-12-14. doi:10.3886/ICPSR33501.v1

Persistent URL: http://dx.doi.org/10.3886/ICPSR33501.v1

Funding

This data collection was funded by:

  • National Science Foundation (SES-0617658 and SES-0922342)

Scope of Study

Summary:   This qualitative data collection contains original and processed text from the United States Congressional Record for the 104th-109th Congresses. The Congressional Record includes text from both chambers, the United States House of Representatives and the United States Senate. For each Congress, the archive includes the original tagged text files, parsed files that separate the text into individual speeches, speaker metadata that can be linked to the parsed files, and counts of two-word phrases (bigrams) by speaker, party, and date.
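
Illustration:   The sketch below shows, under simplifying assumptions, what counting bigrams by speaker, party, and date can look like. It is not the investigators' processing code: the record fields (speaker, party, date, text), the toy speeches, and the lowercase whitespace tokenization are all hypothetical, and the actual parsing rules are described in the ICPSR User Guide.

```python
from collections import Counter, defaultdict


def bigrams(text):
    """Return adjacent word pairs from a lowercased, whitespace-split text."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def count_bigrams_by(speeches, key):
    """Tally bigrams per group; `key` names the grouping field."""
    counts = defaultdict(Counter)
    for speech in speeches:
        counts[speech[key]].update(bigrams(speech["text"]))
    return counts


# Hypothetical toy records, not drawn from the Congressional Record.
speeches = [
    {"speaker": "A", "party": "X", "date": "1995-01-04",
     "text": "tax relief for working families"},
    {"speaker": "B", "party": "Y", "date": "1995-01-04",
     "text": "tax relief for families"},
    {"speaker": "A", "party": "X", "date": "1995-01-05",
     "text": "working families need tax relief"},
]

by_speaker = count_bigrams_by(speeches, "speaker")
by_party = count_bigrams_by(speeches, "party")
by_date = count_bigrams_by(speeches, "date")

# Most frequent bigrams for one (hypothetical) party.
print(by_party["X"].most_common(2))
```

The same grouping function serves all three breakdowns, which mirrors how the collection offers separate count files by date, party, and speaker.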

Subject Terms:   government, legislative bodies, political speeches, public officials, United States Congress

Smallest Geographic Unit:   United States

Geographic Coverage:   United States

Time Period:   1995-2006

Date of Collection:   2007-06 to 2011-11

Unit of Observation:   parsed bigrams of congressional records, speaker and speech metadata

Universe:   Full text of the published Congressional Record for both chambers of the 104th-109th Congresses of the United States.

Data Types:   administrative records data, aggregate data, machine-readable text, program source code

Data Collection Notes:

Please see the ICPSR User Guide for information about what each part of the data collection contains.

Please note that the files for this data collection are extremely large. Users should exercise discretion when downloading files.

Methodology

Study Design:   Please refer to the Original P.I. Documentation in the ICPSR User Guide.

Sample:   Please refer to the Original P.I. Documentation in the ICPSR User Guide for more information about sampling.

Data Source:

Congressional Records obtained from the Government Printing Office

Version(s)

Original ICPSR Release:  2012-12-14
