For Education Policy, Not Much Value in 'Value-Added' Evaluations

On September 21, Los Angeles teacher Rigoberto Ruelas killed himself, jumping 100 feet from a bridge into the Angeles National Forest below. His suicide came about a month after the Los Angeles Times posted ratings of over 6,000 L.A. teachers on its website (8/14/10). Ruelas was poorly ranked, pegged as "less effective than average overall." But he was much beloved, and his sense of self was deeply tied up with the work he did in South L.A.'s poor and largely Latino Miramonte Elementary School.

The L.A. Times rating was certainly devastating. It may have also been wrong.

A legion of critics charge that the "value-added" measures used by the Times and other such rating systems, which rate teachers based on the growth in individual students' test scores, are imprecise and inconsistent. But politicians eager to sell silver-bullet school reform have largely ignored the adverse research findings. The Obama administration's Race to the Top grant program has pushed states to tie teacher evaluation to student performance on standardized tests, and in 2010, former D.C. Public Schools chancellor Michelle Rhee used the measures to fire 165 teachers (Washington Post, 7/24/10).

The New York Times--alongside other outlets, including the Wall Street Journal, New York Post, Daily News and local news channel NY1--has followed suit, filing a freedom of information request demanding that the New York City Department of Education (DOE) release such data on city teachers. The Teacher Data Reports cover 12,000 New York City English and math teachers in grades four through eight.

While it had previously pledged to fight such efforts, the DOE announced in October 2010 that it would make the data available. The United Federation of Teachers sued to block the release, and in November, media organizations joined the lawsuit on the side of the DOE. The case is still in limbo after the union appealed a judge's January decision in favor of the city and the news outlets.

"Obtaining this data advances our ability to inform readers about the quality of teaching and the way schools measure quality," Times spokesperson Eileen Murphy told Extra!, "both of which are central to the current debate over the future of public education."

The L.A. Times took a similar position, framing its effort as one that would bring more information to a complicated debate. On the paper's website, a list of Frequently Asked Questions acknowledged problems with the data's reliability, but insisted that "parents and the public have a right to judge it for themselves."

New York Judge Cynthia Kern echoed that message, writing that "the public's interest in disclosure of the information outweighs the privacy interest of the teachers," and that "there is no requirement that data be reliable for it to be disclosed."

The New York Times has not said whether it plans to publish the data in a searchable database, as the L.A. Times did. According to Gotham Schools (10/20/10), a news website covering city schools, the DOE provided district-level data without individual teachers' names attached when the Times made a similar request in 2008.

Gotham Schools disagrees with how the L.A. Times handled its city's data. "We did make a FOIL [Freedom of Information Law] request, but we're not participating in the lawsuit," Gotham Schools editor Elizabeth Green told Extra!. Nor will the website put teachers' scores in a database.

"If we say, 'Here are a lot of the problems with this number, but here's the number,'" says Green, "you're still making an argument about the quality of that teacher by attaching it to that number."

In a recent New York Times Magazine article (1/26/11), New York Times executive editor Bill Keller described how reporters from his paper and other media outlets carefully vetted diplomatic cables provided by WikiLeaks, supplying "context, nuance and skepticism" to each. Keller wrote that the "cables called for context and analysis" and that the very release of some could put lives in danger. They decided to release only a small number of the cables, and provided detailed reporting to make sense of them.

If the New York Times follows the L.A. Times on value-added measures, however, it will be employing a very different approach to the large data trove.

"We are not going to discuss our plans for the data," says Murphy. "Having said that, we do believe we have a history of fairly covering this topic."

Indeed, what's most surprising about the New York Times' lawsuit is that it contradicts the paper's solid reporting on the data's unreliability.

Sam Dillon, who regularly reports on education policy for the Times, wrote (9/01/10) that the "federal Department of Education's own research arm warned in a study that value-added estimates 'are subject to considerable degree of random error,'" and a National Academies expert panel wrote a letter to Education Secretary Arne Duncan expressing "significant concern" that Race to the Top put "too much emphasis on measures of growth in student achievement that have not yet been adequately studied for the purposes of evaluating teachers and principals."

Dillon quoted Stanford professor Edward Haertel, a co-author of an August 2010 Economic Policy Institute study criticizing value-added measures, saying the system was "unstable." University of Wisconsin-Madison professor Douglas Harris described how taking different student characteristics into account can produce different outcomes. Dillon detailed more problems: students changing classes mid-year and thus being associated with the wrong teacher; the impossible-to-discern influence of a given teacher or tutor, since they teach overlapping skills; a "ceiling effect" limiting the measure's sensitivity to gains amongst the highest-performing students.

Sharon Otterman (12/27/10), who covers New York schools for the Times, reported that "the rankings are based on an algorithm that few other than statisticians can understand, and on tests that the state has said were too narrow and predictable." She also pointed out that "a promising correlation for groups of teachers on the average may be of little help to the individual teacher, who faces, at least for the near future, a notable chance of being misjudged by the ranking system."

Otterman quoted a Brooklyn elementary school principal as saying, "Some of my best teachers have the absolute worst scores." She cited a July 2010 U.S. Department of Education study that found a teacher would probably be misrated 35 percent of the time with one year of data taken into account, and 25 percent of the time with three years. With 10 years of data, the error rate still stood at a stubborn 12 percent.

Making things all the more complicated, Otterman pointed out, is the fact that standardized tests are adjusted from time to time, making it difficult to compare one year's test scores with the next--and New York's were just toughened.

A November 10 news article on Ruelas' death and the ensuing controversy demonstrates the importance of expertise in education reporting. The writer, Ian Lovett, does not regularly report on education, and he quoted just one expert: a scholar from the conservative Hoover Institution.

While the New York Times education reporters who follow the beat week in and week out seem skeptical toward education reform flavors of the month, the Times opinion pages seem, like the paper's legal team, to largely discount this reporting.

One unsigned editorial (3/18/10) stated that "evaluation systems could have an enormous effect on the quality of the profession and the quality of education" and lamented that "right now most states lack the capacity to perform sophisticated, data-driven studies and evaluations."

Another (3/31/10) celebrated the early Race to the Top winners--Delaware and Tennessee--for aggressively linking testing data to teacher evaluation. The editorial failed to mention researchers' criticism and implied that self-interested teachers' unions were the only opponents of progress. "The politically powerful teachers' unions," it said, "reacted fiercely and predictably to this provision."

The editorials also fail to discuss other proposals for revamping a teacher evaluation system that pretty much everyone agrees is broken. By not articulating the alternatives proposed by many researchers and teachers, the writers set up a false choice between value-added measures and the dysfunctional status quo.

Defenders of value-added measures always insist that they should be just one of many tools used to evaluate a teacher. Indeed, many critics acknowledge the measures can be useful for pointing out a teacher who may require closer scrutiny or support. News reporters at the New York Times have done a good job describing a complicated issue and putting value-added measures in proper context.

If editors decide to grade individual teachers, however, politics and sales will trump good reporting. That would be unsatisfactory, but, unfortunately, not below average.

Daniel Denvir is a journalist in Philadelphia. His work has appeared in Salon, AlterNet, the Guardian and on NPR. He regularly covers education for the Philadelphia Public Schools Notebook.

Our work is licensed under Creative Commons (CC BY-NC-ND 3.0). Feel free to republish and share widely.
