If you want to get a good sense of the impact of the Obama Administration’s Race to the Top initiative on teacher evaluations, take a look at the report released last week by Maryland’s Department of Education on the implementation of the state’s new performance management system. Thanks to the effort, we are now gaining some insights on the need to improve the quality of teaching in our schools, and at the same time, getting an important lesson on why student test score growth data should be the dominant feature in evaluating how teachers improve student achievement.

For one thing, the Old Line State’s evaluation system confirms some conclusions that researchers have reached about the potential teachers have for improving their performance in boosting student achievement. As state education officials concluded in their research, improvements in the effectiveness of the average teacher plateau between the third and fifth years on the job. In short, a teacher is unlikely to improve their performance after the first few years on the job. For example, 47 percent of teachers with three to five years of experience were ranked highly effective, compared to 50 percent of teachers with 21 to 25 years of experience. The data bears out the conclusion reached by Dan Goldhaber and Michael Hansen five years ago that the average teacher was no better at their job after 25 years than after four.

This result alone once again serves as a reminder of why granting teachers near-lifetime employment (in the form of tenure) makes no sense. Without a lot of hard work, a low-performing teacher will not get any better over time. If anything, tenure gives laggard teachers no incentive to leave the profession. Based on data showing that 3.6 percent of newly-hired teachers without tenure were ineffective compared to 2.2 percent of veteran instructors, just one-third of laggard teachers will either leave (or be shown the door) before attaining near-lifetime employment.

Just as importantly, it also raises the question of whether professional development programs can really make a difference in improving teacher performance. Perhaps professional development can improve the performance of teachers during the first two or three years on the job, but it will likely be ineffective beyond that point. This reality may be the underlying reason why eight out of every 10 teacher professional development programs in 50 districts surveyed by TNTP failed to help teachers get better at improving student achievement.

Maryland’s latest teacher evaluation report also confirms another conclusion reached by education researchers a while ago: that poor and minority children are less likely to get high-quality teachers than their middle-class peers. Just 23.8 percent of the 10,739 teachers serving children in the highest-poverty schools were rated highly effective, versus 58 percent of the 7,660 teachers working in low-poverty schools. This resembles results from research such as a 2010 report from the Center for Analysis of Longitudinal Data in Education Research, and the Center for American Progress’ analysis of Massachusetts’ and Louisiana’s teacher evaluation ratings.

Even more important is that the state’s evaluations show that black and Latino children are more likely to be denied high-quality teaching than their peers. Just 15.1 percent of the 7,217 teachers serving minority children in high-minority, high-poverty schools were highly effective, versus 58 percent of the 5,724 teachers serving kids in low-minority, low-poverty schools. As in the rest of the nation, to be black and poor (or Latino and poor) in Maryland is to be educationally abused and neglected.

Certainly the insights from this data may be helpful in transforming public education, both in Maryland and in the rest of the nation. Yet the Old Line State’s report also raises some important questions about the usefulness of the multiple measures approach to evaluations that many reformers (and traditionalists) have embraced.

The first question comes courtesy of one of the more curious data points in the report: that Baltimore’s traditional district rated more than 35 percent of its teachers as highly effective. Considering that the district’s average eighth-grade scale score on the National Assessment of Educational Progress barely budged between 2009 and 2015 (and that 49 percent of eighth-graders read Below Basic on this year’s exam, three percentage points higher than six years ago), there’s no way that so many of the district’s teachers are high performing.

But Baltimore City isn’t the only district that gave out so many top rankings to its teachers. Fifteen Maryland districts ranked more than 35 percent of teachers as highly effective; 12 of them gave 50 percent or more of their teachers the highest evaluation rank. This includes Howard County in the Baltimore suburbs (where 86 percent of teachers were ranked as highly effective), and tiny Somerset County on the state’s Eastern Shore (where more than 52 percent of teachers were considered top performers). Meanwhile, Montgomery County, which has long had the (undeserved) reputation as one of the nation’s best-performing districts, ranked not one of its teachers as highly effective; Prince George’s County (in which Dropout Nation is located) ranked fewer than five percent of its teachers as top-performing.

On one hand, it may not be so shocking that not one Montgomery County teacher was rated highly effective. As Dropout Nation has revealed over the last year — and as demonstrated by news earlier this year that three out of every four high schoolers failed the district’s Algebra 1 final exams — the district’s reputation for academic achievement has always been more illusion than reality. This is true for many suburban districts, whose generally high raw test scores mask the low quality of teacher performance. One of the benefits of Value-Added Analysis is that its focus on test score growth instead of raw scores allows researchers, school leaders, and policymakers to figure out how much of student achievement is attributable to the work of adults in schools and what is the mere consequence of the kids just coming from wealthier households than many of their poor and minority peers.

But the fact that Baltimore City, one of the nation’s worst-performing districts, ranks more than a third of its teachers as highly effective makes you wonder what is going on. As it turns out, the differences in levels of teachers rated highly effective may have much to do with the leeway given to districts and affiliates of the National Education Association and American Federation of Teachers (the bargaining agents for teachers) in structuring the 50 percent of evaluations tied to student learning objectives.

As part of both its successful bid during the initial round of Race to the Top and its waiver from the accountability provisions of the No Child Left Behind Act, Maryland politicians agreed to require evidence of student growth to account for 50 percent of evaluations. State test score growth, in particular, would account for 25 percent based on how the state structured the evaluations. Subjective observations, long ago proven to be ineffective in measuring teacher performance, account for the remainder of the evaluation. Sounds good. The problem? It depends on what is allowed to be included in the Student Learning Objectives portion of the evaluation.

While districts are required to use state test score growth data as one component, they can also add other measures of student achievement. In fact, a district can add three or more SLOs into the student performance portion of the evaluation. As a result, the share of the evaluation based on state test score data may actually be lower than 25 percent. The more measures added, the less fine-tuned the evaluation will be in measuring how teachers improve student achievement. It is therefore quite likely that districts that use three or more “measures” of student achievement are allowing teachers to appear better than they really are. This isn’t helpful to teachers, school leaders, families, or most important of all, our children.
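
To make the dilution concrete, here is a rough sketch of the arithmetic. It assumes, purely for illustration, that every measure in the student growth half of the evaluation is weighted equally; the state's report does not say districts do this, and the function name `state_test_share` is made up for this example.

```python
def state_test_share(num_measures: int, growth_portion: float = 0.50) -> float:
    """Share of the whole evaluation carried by state test score growth.

    Assumption (hypothetical, not from the state's report): the student
    growth portion is split evenly across all of its measures, and state
    test score growth counts as exactly one of them.
    """
    if num_measures < 1:
        raise ValueError("an evaluation needs at least one growth measure")
    return growth_portion / num_measures


if __name__ == "__main__":
    # Two measures reproduces the 25 percent figure; every added measure
    # pushes the state test share lower.
    for n in range(2, 6):
        print(f"{n} growth measures: state tests = {state_test_share(n):.1%}")
```

Under that equal-weighting assumption, a district using two measures lands on the 25 percent figure, while a district using four cuts the state test share to 12.5 percent of the overall evaluation.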

As your editor noted two years ago, the Bill & Melinda Gates Foundation demonstrated in its Measures of Effective Teaching research that student test score growth is a far better identifier of how teachers improve student success than subjective observations and other multiple measures. Oddly enough, Maryland’s education department demonstrated this point in its report. The more student test score growth data is included in an evaluation, the better it identifies (and rewards) high-quality teachers. If student growth accounted for 80 percent of evaluations, the scores for the average high-quality teacher increased by 3.36 points, while those of a laggard declined by 6.06 points. On the other hand, if observations (or evidence of professional practice, as the state calls it) made up 80 percent of evaluations, the score for the average laggard teacher would increase by 6.01 points while that for a high-quality counterpart would decline by 3.36 points.
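
The mechanics behind that reweighting can be sketched as a simple weighted average. The two teachers and their component scores below are invented for illustration and are not taken from Maryland's report; `composite_score` is likewise a hypothetical helper, not the state's actual formula.

```python
def composite_score(growth: float, practice: float, growth_weight: float) -> float:
    """Weighted average of a student growth score and a professional
    practice (observation) score, both on the same scale."""
    if not 0.0 <= growth_weight <= 1.0:
        raise ValueError("growth_weight must be between 0 and 1")
    return growth_weight * growth + (1.0 - growth_weight) * practice


# Hypothetical teachers: one strong on student growth, one weak on it.
teachers = {
    "high-quality": {"growth": 90.0, "practice": 80.0},
    "laggard": {"growth": 40.0, "practice": 60.0},
}

if __name__ == "__main__":
    for label, scores in teachers.items():
        even = composite_score(scores["growth"], scores["practice"], 0.5)
        growth_heavy = composite_score(scores["growth"], scores["practice"], 0.8)
        print(f"{label}: 50/50 = {even:.1f}, 80/20 = {growth_heavy:.1f}, "
              f"change = {growth_heavy - even:+.1f}")
```

With these made-up inputs, shifting the weight toward student growth raises the score of the teacher whose students are gaining and lowers the score of the one whose students are not, which is the same direction of movement the state's report describes.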

It is clear from the report that Maryland needs to limit what districts can allow to be included in the student learning objectives portion of its teacher evaluations. One key step: Require high-quality formative assessments aligned with Common Core reading and math standards to be used in evaluations. An even better move would be to only allow test score growth data from Maryland’s battery of assessments to be used as evidence of improving student achievement. But given the stranglehold of NEA’s and AFT’s affiliates (as well as districts) on politics in Annapolis, this is unlikely to happen unless Gov. Larry Hogan (who hasn’t proven to be useful in advancing systemic reform on any front) pushes the state’s board of education to do so.

The consequences of using multiple measures aren’t limited to Maryland alone. Other states that revamped teacher evaluations as part of Race to the Top and the No Child waiver gambit have embraced similarly watered-down approaches to evaluating teacher performance. Given the Obama Administration’s partial reversal of support for using standardized testing for accountability (including in evaluating teachers), you can expect student test score data to be even less of a factor in teacher evaluations. That means a setback in a key aspect of transforming American public education. As Maryland’s latest report demonstrates, this would be terrible for children, especially given what the data, watered-down as it may be, is spotlighting.