In a landmark victory, New York governor Andrew Cuomo and the state’s teachers’ unions have reached agreement on the structure of a new evaluation policy for public school teachers. In the new system, 40 percent of each teacher’s evaluation will be quantitative, based on his students’ performance on standardized tests, while the remaining 60 percent will be qualitative, deriving from subjective measures of his effectiveness, including at least one unannounced observation by the principal. The new evaluations will be hard to buck. Only 13 percent of teachers receiving a bad rating on their first evaluation can appeal, and only in cases where the teachers’ union claims that the principal harassed the teacher during the evaluation. Teachers can’t appeal their second bad rating: two bad ratings will serve as sufficient evidence that the teacher is ineffective and will warrant potential termination, putting the burden on the teacher to prove that he is competent rather than on the school district to prove otherwise.
The fight to develop meaningful evaluations for public school teachers has been long and difficult, but necessary. Under most current evaluation systems, principals observe teachers in the classroom a few times at most during the school year. The teacher is notified of these visits in advance, and they usually last less than a complete class period. The teacher’s rating is then based largely on the principal’s observation. Neither principals nor teachers take the evaluations seriously, in part because the results have no meaningful influence on the teacher’s compensation or job security. It shouldn’t be surprising, then, that in many school districts it’s common for 98 percent or more of teachers to be rated “satisfactory” or better. Such rubber-stamp evaluations are bad for students stuck with ineffective teachers. And they don’t help teachers, either, at least not those interested in meaningful feedback about their performance.
Teachers’ unions kept meaningful evaluations off the table for many years. Their anxieties about assessment weren’t completely unwarranted; both quantitative and qualitative measures of teacher performance contain flaws. But that’s precisely why both types of assessments should be used, as they will be in New York: the strength of each measure helps mitigate weaknesses in the other.
Teachers are wary, for instance, of seeing their performance assessment based on students’ standardized test scores, because math and reading exams are blunt instruments that tell us about only part of what teachers do in the classroom. Further, as statistical tools, such assessments are by definition influenced by random error. Some bad teachers will incorrectly be identified as effective, and some average and good teachers will get lower ratings than they deserve. That’s why assessment of student test scores should be supplemented with in-person observations of teachers’ performance. Such observations will allow principals to identify cases in which great teachers posted low test scores because the margin of random error worked against them or because of some other mitigating factor. Principals will also be able to identify teachers who make contributions to the school unrelated to their own students’ test scores, such as mentoring younger teachers.
But we can’t rely entirely on principal assessments of a teacher’s performance. Before collective bargaining gave teachers their political influence, administrators commonly abused their power and dismissed teachers for various reasons, including their political beliefs and race—and even for getting pregnant. Of course, the law now protects teachers from many of these abuses. But teachers worry that if given too much power, administrators might feel emboldened to remove them for personal reasons or if they speak out about problems in the school.
This is where test scores come in. Despite their imperfections, they are a good supplement to principal assessments because they provide an objective measure of teacher performance. The computer cares only about whether students in a teacher’s classroom appear to be making academic gains, not whether the teacher is a whistleblower or has had a personal conflict with an administrator. Teachers whose students are making academic progress needn’t worry about receiving an unfair rating. This alone gives teachers a reason to embrace test-score analysis eventually: it will protect them from inappropriate actions by administrators. A principal looking to fire an effective teacher for personal reasons can’t do so unless the test scores of that teacher’s students happen not to represent his classroom performance accurately. That situation is possible but unlikely; research shows that in most cases, principal observations and test-score analysis will come to the same conclusions.
In developing a new system capable of identifying the best and worst teachers, the Empire State has joined a growing number of school systems around the country. Still more states and school districts should move in New York’s direction. If implemented properly and fairly, these new evaluation systems will benefit not only kids but teachers as well.