Measures of Effective Teaching: 5 - Observation Score Calibration and Validation, 2011 (ICPSR 37090)

Version Date: Sep 19, 2018

Principal Investigator(s):
Bill and Melinda Gates Foundation

https://doi.org/10.3886/ICPSR37090.v1

Version V1

MET 5 - Observation Score Calibration and Validation, 2011

The Measures of Effective Teaching Project (MET)

The MET project is based on two premises: First, a teacher's evaluation should depend to a significant extent on his/her students' achievement gains; second, any additional components of the evaluation (e.g., classroom observations) should be valid predictors of student achievement gain.

Student achievement was measured in two ways -- through existing state assessments, designed to assess student progress on the state curriculum for accountability purposes, and supplemental assessments, designed to assess higher-order conceptual understanding. The supplemental assessments used were Stanford 9 Open-Ended Reading Assessment in grades 4 through 8, Balanced Assessment in Mathematics (BAM) in grades 4 through 8, and the ACT QualityCore series for Algebra I, English 9, and Biology.

Panoramic digital videos of classroom sessions were recorded for participating teachers and students; teachers submitted commentary on their lessons (e.g., specifying the learning objective), and trained raters then scored each lesson using the following five classroom observation protocols:

  • Classroom Assessment Scoring System (CLASS), developed by Robert Pianta, University of Virginia
  • Framework for Teaching, developed by Charlotte Danielson
  • Mathematical Quality of Instruction (MQI), developed by Heather Hill, Harvard University, and Deborah Loewenberg Ball, University of Michigan
  • Protocol for Language Arts Teaching Observations (PLATO), developed by Pam Grossman, Stanford University
  • Quality Science Teaching (QST) Instrument, developed by Raymond Pecheone, Stanford University

A subset of the videos is also being scored using an observational protocol developed by the National Board for Professional Teaching Standards (NBPTS) and using the UTeach Observational Protocol (UTOP), developed by the UTeach Preparation Program.

Close to 3,000 teacher volunteers from across the following six, predominantly urban, school districts participated in the MET project: Charlotte-Mecklenburg Schools, Dallas Independent School District, Denver Public Schools, Hillsborough County Public Schools, Memphis City Schools, and the New York City Department of Education. Participants teach math and English language arts (ELA) in grades 4-8, Algebra I, grade 9 English, and high school biology.

The Observation Score Calibration and Validation File

The Observation Score Calibration and Validation file enables psychometric research on rater error. The MET Project may be the largest application of instruments designed to measure teacher effectiveness from classroom observations ever conducted. More than eight hundred raters were trained to score over fifteen thousand videos recorded by teachers in the MET Project. The result is a database of more than 2.4 million scored items from five observation instruments:

  1. Framework for Teaching (FFT)
  2. Classroom Assessment Scoring System (CLASS)
  3. Mathematical Quality of Instruction (MQI)
  4. Protocol for Language Arts Teaching Observations (PLATO)
  5. Quality of Science Teaching (QST)

This data file contains all scores assigned by raters, including scores used to evaluate raters during the scoring process. Each row in the file is one score assigned by a rater to a segment of a video under one of the five instruments evaluated by the MET Project.
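To illustrate this row-level structure, the sketch below builds a handful of hypothetical score records and tallies scored items by instrument. All field names (rater_id, video_id, segment, instrument, score) are assumptions for illustration only, not the file's actual variable names.

```python
from collections import Counter

# Hypothetical score records: each row is one score assigned by one rater
# to one segment of one video under one observation instrument.
# Field names are illustrative, not the actual MET variable names.
rows = [
    {"rater_id": "R001", "video_id": "V10", "segment": 1, "instrument": "FFT",   "score": 3},
    {"rater_id": "R001", "video_id": "V10", "segment": 2, "instrument": "FFT",   "score": 2},
    {"rater_id": "R002", "video_id": "V10", "segment": 1, "instrument": "FFT",   "score": 3},
    {"rater_id": "R003", "video_id": "V11", "segment": 1, "instrument": "CLASS", "score": 5},
]

# Count scored items per instrument, as one might when profiling the file.
items_per_instrument = Counter(row["instrument"] for row in rows)
print(items_per_instrument)  # Counter({'FFT': 3, 'CLASS': 1})
```

The same pattern scales to the full file: grouping rows by rater, video, or instrument is the starting point for most consistency analyses on these data.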

MET observation scores were assigned remotely using a web application, supervised by ETS and Teachscape (with the exception of the UTOP instrument, which was managed by the National Math and Science Initiative [NMSI]), that displayed excerpts of videos and prompted raters for scores. Raters were trained on videos that had been "master scored" with "true" scores. At the beginning of every scoring session, raters were assigned pre-scored "calibration" videos to ensure that instruments were applied consistently. Even after they were approved for scoring, raters were occasionally given "validation" videos to confirm that their scores remained consistent with expectations. Please see the research paper Gathering Feedback for Teaching: Combining High-Quality Observations with Student Surveys and Achievement Gains on the MET Project Web site, as well as Section 6.3, "Classroom Videos and Video Scoring Processes," of the User Guide for complete details.

The Observation Score Calibration and Validation file is provided for research on questions like the consistency of scoring across raters. For example, these data show how often raters failed validation tests and needed to be re-trained on each item used in the MET Project. Users who want to combine observation scores based on videos with other types of MET data should use the observation scores found in the Core [ICPSR 34414] or Basic [ICPSR 34346] data files.

Also included in this release is a Scoring Design Memorandum, written in 2011 by MET researchers at ETS and Teachscape to MET Project leadership, which describes the procedures for creating observation scores for MET videos.

Bill and Melinda Gates Foundation. Measures of Effective Teaching: 5 - Observation Score Calibration and Validation, 2011. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2018-09-19. https://doi.org/10.3886/ICPSR37090.v1


The Measures of Effective Teaching Longitudinal Database (MET LDB) is restricted from general dissemination; a Confidential Data Use Agreement must be established prior to access. Researchers interested in gaining access to the data can submit their applications via ICPSR's online Restricted Contracting System accessible via the "Access Restricted Data" tab on the ICPSR study homepage.

Applicants will be required to:

  • Submit IRB approval/exemption documentation;
  • Scan and email the completed Confidential Data Use Agreement, signed by the Principal Investigator and an Institutional Representative;
  • Pay an annual access fee of $350 per user, renewed yearly for continued data access.

Please visit the MET LDB Web site for more information.

Inter-university Consortium for Political and Social Research
2011

Participating academic institutions include Dartmouth College, Harvard University, Stanford University, University of Chicago, University of Michigan, University of Virginia, and University of Washington. Participating non-profit organizations include Educational Testing Service, RAND Corporation, and the New Teacher Center. Participating education consultants include Cambridge Education, Teachscape, and Westat. The National Board for Professional Teaching Standards and Teach For America supported the project and have encouraged their members to participate. The American Federation of Teachers and the National Education Association were involved in discussions about the MET project and supported the research.

For additional information about The Measures of Effective Teaching (MET) Project, please visit the ICPSR MET LDB Web site, as well as the MET Project Data Web site hosted by the Bill and Melinda Gates Foundation.


Two MET project partners, ETS and Teachscape, jointly managed the recruitment and training of raters and the scoring of lessons. The one exception was the UTOP instrument, which was managed by the National Math and Science Initiative [NMSI].

Instrument developers set minimum expectations for the education level and teaching experience of raters. All raters held a bachelor's degree, and a majority (about 70 percent across most instruments) held higher degrees. While some raters were currently enrolled in teacher preparation programs, the majority (more than 75 percent) had six or more years of teaching experience.

Depending on the instrument, rater training required between 17 and 25 hours to complete. Training for the four instruments (other than UTOP) was conducted via online, self-directed modules. Raters for UTOP were trained using a combination of in-person and online sessions.

At the end of their training, raters were required to rate a number of pre-scored videos and achieve a minimum level of agreement with the expert scores. Raters who failed certification after one attempt were directed to review the training material. Those who failed after a second attempt were deemed ineligible to score for the MET project. The pass rate for raters averaged 77 percent across instruments and ranged from 56 percent (MQI) to 83 percent (FFT).

The MET project also monitored rater accuracy on an ongoing basis. At the start of each shift, raters had to pass a calibration exercise, scoring a smaller set of pre-scored videos. Raters who failed to pass calibration after two tries (about 9 percent per day) were not permitted to score videos that day. Throughout their work with the project, raters received additional training and guidance from their "scoring leader" - an expert scorer responsible for managing and supervising a group of raters. In addition, pre-scored videos were interspersed within the unscored videos assigned to each rater (although raters were not told which videos were pre-scored and which were unscored). Scoring leaders were provided reports on the rates of agreement their raters were able to achieve with those videos. Scoring leaders were asked to work with raters who frequently submitted discrepant scores on the pre-scored videos. Double scoring, in which the same lesson was scored by two raters, also served as a form of quality control.
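The certification, calibration, and validation checks described above all reduce to the same comparison: a rater's scores on pre-scored videos are matched against the master scores, and the rater is flagged when agreement falls below a minimum. A minimal sketch of that comparison follows, using exact agreement and an assumed 70 percent threshold; both the agreement definition and the threshold are illustrative choices, not the MET project's actual criteria.

```python
def exact_agreement(rater_scores, master_scores):
    """Fraction of items on which the rater matched the master score exactly."""
    if len(rater_scores) != len(master_scores):
        raise ValueError("score lists must align item by item")
    matches = sum(r == m for r, m in zip(rater_scores, master_scores))
    return matches / len(master_scores)

def passes_calibration(rater_scores, master_scores, threshold=0.70):
    """True if the rater meets the (assumed) minimum agreement threshold."""
    return exact_agreement(rater_scores, master_scores) >= threshold

# A rater matching 3 of 4 master scores agrees at 0.75 and would pass
# under this illustrative threshold.
print(passes_calibration([3, 2, 4, 1], [3, 2, 4, 2]))  # True
```

In practice, observation protocols sometimes credit adjacent scores (within one point) rather than exact matches; that variant would only change the comparison inside `exact_agreement`.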

ETS recruited observers using a range of online methods:

  1. Postings on the ETS Web site
  2. Postings on educational professional Web sites (e.g., National Council of Teachers of Mathematics, National Council of Teachers of English)
  3. Emails to ETS scorers, such as those scoring Advanced Placement exams
  4. Postings on Facebook

The vast majority of applicants came from the postings on the ETS Web site, followed by the professional Web sites.

Videos were scored by 902 current and former teachers from across the United States who were trained by MET researchers.

Observation score
event/transaction data



2019-04-11 The collection is being updated to include frequencies in the ICPSR Public-Use Codebook.

2018-09-19 ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection:

  • Performed consistency checks.
  • Created variable labels and/or value labels.
  • Checked for undocumented or out-of-range codes.

Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.

  • One or more files in this data collection have special restrictions. Restricted data files are not available for direct download from the website; click on the Restricted Data button to learn more.

  • The citation of this study may have changed due to the new version control system that has been implemented.