If not for a beastly combination of food poisoning and stomach flu, your editor would have commented late Tuesday about the final report on teacher evaluations released by the Bill & Melinda Gates Foundation’s Measures of Effective Teaching Project. Dropout Nation has already written about the implications of some of the initiative’s findings. But the spin on the results being pitched by Gates, along with the seeming unwillingness of reformers to ask hard questions about what the study really shows, made it necessary for your editor to note some key points.
Classroom observations are useless in evaluating teacher performance: The relationship between classroom observations and the likelihood of a teacher performing well in improving student performance is, at most, four-hundredths of a standard deviation (depending on the role of observations in each of the four models developed by Gates), versus five-tenths of a standard deviation for Value-Added student test score growth data. It is clear that even the most-effective classroom observations are hardly useful in evaluating this most-important and non-observable aspect of teacher quality. This shouldn’t be surprising. As MET noted last year, even the most-rigorous classroom evaluations are accurate in assessing teacher performance by only one-fifth of a standard deviation in reading and less than half of a standard deviation in math, lower than the seven-tenths of a standard deviation for Value-Added data.
It would be too costly and counterproductive to make classroom observations better: While the Gates Foundation argues that observations may work better if two or more observers are involved (especially a school leader or otherwise independent observer not working in the school), the MET report doesn’t fully break out how much more reliable this approach would be compared to either having one person observe a teacher multiple times or just relying on test score data. [The fact that MET doesn't fully define what "reliable" is in their view is also a problem.] In any case, the approach Gates thinks would work best (including training observers for 17 hours and likely having to dismiss a quarter of them for being too biased in their observations to do the job right) could cost far more money than the average district or state would be willing to bear. The conclusion remains that classroom observations are not ready for prime time.
No matter how you mix it, it’s better to go with Value-Added, student surveys, or both: As Dropout Nation noted last year, the accuracy of classroom observations is so low that they bring down the accuracy of the overall performance review, even in a multiple measures approach in which value-added data and student surveys account for the overwhelming majority of the evaluation (72.9 percent and 17.2 percent in one case). This point is raised again in the latest group of models floated by Gates in its final MET study. Only one model matches the level of accuracy Value-Added has on its own, and that’s because observations account for only two percent of the data in the model. The usefulness of the next model, one of the three Gates prefers because observations account for a quarter of the data used (while Value-Added accounts for half), declines by nine-hundredths of a standard deviation based on Dropout Nation’s analysis of the MET report’s data; in another model, in which observations, Value-Added and student surveys account for one-third each, the loss of accuracy is nearly two-tenths of a standard deviation.
Yet the Gates Foundation insists on pushing a “multiple measures approach” that is useless to teachers, school leaders, families, and children alike: Certainly one understands why it is doing so. After all, Gates is looking to mollify the National Education Association and the American Federation of Teachers, which have, for the most part, unsuccessfully stood against efforts to overhaul teacher evaluations, as well as to deal with those teachers who are fearful of any change in how their performance is assessed. Multiple measures offers that faint possibility of whittling down opposition (even as reaction from traditionalists so far suggests the opposite). Yet in touting a multiple measures approach that data from its own research does not support in any meaningful way whatsoever, the Gates Foundation is making a deliberate error; its embrace of multiple measures dismisses strong data that disproves its position. This last MET report should be considered more of a white paper whose conclusions have only a semblance of a basis in otherwise useful data.
By touting multiple measures as the best approach to teacher evaluations, the Gates Foundation and reformers are doing a disservice to teachers, school leaders, families, and children alike. For teachers, multiple measures means inaccurate and, ultimately, unfair assessment of their performance because they are not getting the highest-quality data from the two data sources (Value-Added and student surveys along the lines of the Tripod system developed by Cambridge Education and Harvard researcher Ronald Ferguson) that provide the most-comprehensive information on both non-observable and observable aspects of teacher quality. For school leaders, continuing classroom observations when there is no need for them equals wasted man-hours and financial resources that could be better used for improving student achievement. Families are not being assured that the teachers who serve their children are getting the most-comprehensive feedback on their work, which is needed to help good and great teachers become better (as well as to weed out laggards from classrooms). And for children, who likely have a better sense of which teachers are more helpful than others, multiple measures means the loss of their much-needed feedback.
The Gates Foundation could and should have done better. The good news is that the data from its MET project has made clear the need to move away from the traditional model of teacher performance management.