Yet another rebuttal to the Arizona State Testing Study
Fellow psychometrician Greg C. sent a "scoop" my way: Hoover Institution fellows Margaret E. Raymond and Eric A. Hanushek have conducted a detailed analysis of student performance, and what they found directly contradicts the Arizona State Testing Study. While the ASU study concluded, in essence, that testing impedes learning, Raymond and Hanushek found that stronger state accountability led to greater gains in NAEP math scores. Their results were accompanied by direct criticism of the ASU study:
Raymond and Hanushek also evaluated [the ASU study] and found serious flaws in Amrein and Berliner’s research. “The findings are astonishing,” Raymond and Hanushek said. “Once correct statistical techniques are applied to the data they used, the results are opposite to nearly every one of their conclusions.”
Raymond and Hanushek’s analysis showed that test scores actually improved at a faster rate than in no-accountability states in almost all of the states where Amrein and Berliner claimed to find decreases. In New Mexico, Oklahoma, and West Virginia, where Amrein and Berliner found decreases, Raymond and Hanushek found that high-stakes testing was introduced too early to make a valid before and after comparison.
The fatal flaw of Amrein and Berliner’s methods, assert Raymond and Hanushek, is their point of comparison. “If one wants to assess the effect of high-stakes testing, the obvious comparison is between states that adopted accountability systems and those that did not. Amrein and Berliner’s decision to compare the gains in high-stakes states with the national average violates a most basic principle of social-science research.”
The published study is not yet available, but unlike the ASU study, it was submitted to blind peer review. I wonder whether it will be featured in the NYT, as the ASU study was. I'm not holding my breath.
Update:
I'm no longer on Bill Evers's mailing list (I need to rectify that by signing up with my new email address), but Bas Braams, who is, sent me a .pdf file of the Raymond & Hanushek study. Bas commented that R&H "do a fine job on Amrein Berliner", but questioned whether their statistically significant results are in fact meaningful (a topic that I've discussed previously).
R&H note, among other things, that the ASU study was not blind-reviewed, but was instead shown to scholars at other education schools (who were most likely sympathetic) with full disclosure. This alone could account for the fact that no one commented on the study's unorthodox method of comparing high-stakes states to the national average, rather than to states that did not implement high-stakes testing. When R&H compare the high-stakes states' NAEP scores to those from no-accountability states, the direction of the score advantages reverses from that reported in the ASU study, which makes this a very important rebuttal to that study.
However, while the R&H results show that NAEP scores increased significantly more in high-stakes states than in low- or no-stakes ones, the differences are not that large in a practical sense (they're in the single digits, percentage-wise). So is that educationally significant? I don't know. This does show that the ASU conclusions may be completely wrong, but it doesn't seem like proof that high-stakes accountability rules are vital to education reform. As a pro-testing person, I'm cautiously optimistic.
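For readers who want to see how a result can be statistically significant without being large, here is a minimal sketch in Python. The numbers are entirely hypothetical and are not taken from the R&H or ASU studies; the function name `compare_gains` and its parameters are my own invention for illustration. It computes a simple z-statistic for the difference between two group means alongside Cohen's d, a common standardized effect size.

```python
import math


def compare_gains(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Compare two groups' mean score gains (hypothetical helper).

    Returns the raw difference, a z-statistic for that difference,
    and Cohen's d (difference divided by the pooled standard deviation).
    """
    diff = mean_a - mean_b

    # Standard error of the difference between two independent means.
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = diff / se

    # Pooled standard deviation, used to express the difference in SD units.
    pooled_sd = math.sqrt(
        ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    )
    cohens_d = diff / pooled_sd

    return {"diff": diff, "z": z, "cohens_d": cohens_d}


# Hypothetical inputs, NOT data from either study: a 1-point advantage
# on a scale with a standard deviation of 10, with 5,000 students per group.
result = compare_gains(
    mean_a=1.0, mean_b=0.0, sd_a=10.0, sd_b=10.0, n_a=5000, n_b=5000
)
print(result)
```

With these made-up inputs the z-statistic comes out at about 5, which is far past conventional significance thresholds, while Cohen's d is only about 0.1, which most rules of thumb would call small. That is the gap between "significant" and "meaningful" that Bas was pointing at, although whether the actual R&H differences fall into that category is a judgment call this sketch cannot settle.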
Update #2: The link above is the abridged version. Here's the unabridged version at Education Next. Thanks, Bas.