September 08, 2003

Tilting the SAT playing field

There's a fascinating article over at Education Next entitled, "Disabling the SAT." It's long, but well worth your time.

Those of you who are regular readers know that I have long been critical of some issues surrounding accommodated testing, most especially the decisions by testing companies to discontinue the flagging of tests that are given under extended time limits. Those of you who wish to review my comments on this can click here, and, especially, here.

Done? Good. Back to the EN article. Attorney Miriam Kurtzig Freedman has answered my prayers with a long and thoughtful look at the recent ETS/College Board decisions surrounding accommodated testing, and the lack of public controversy surrounding the concessions on the part of testing companies is as surprising to her as it was to me:

In 1999, after taking the Graduate Management Admission Test (GMAT), the standardized exam required of applicants to business schools, Mark Breimhorst sued the test’s maker, the powerful Educational Testing Service (ETS). Breimhorst was born without hands and thus had been given more time to complete the admissions exam. His lawsuit contested ETS’s practice of informing schools when students take one of its tests under specialized conditions, effectively placing an asterisk or, in testing parlance, a “flag” next to their scores. For unexplained reasons, instead of weathering a trial, ETS settled the case and agreed to stop flagging GMAT scores.

Once the GMAT was no longer flagged, disability rights activists went after the SAT, claiming that these flags "stigmatize" disabled test takers. As I posted previously, a flag is a "stigma" only if we are willing to concede that admissions officers choose to discriminate against handicapped applicants, which is hard to believe in this day and age. Removing the flag, on the other hand, is a surreal denial that the test was given under non-standard conditions in the first place. At the heart of the matter is whether choosing to interpret a non-standard test differently from a standard test is "stigmatizing," or sound psychometric practice.

Accommodations like extended time, [disability rights advocates] believe, are necessary to equalize the testing experience for disabled and nondisabled students and thus make the scores of disabled students [on the SAT] more valid.

The problem is, those scores haven't actually been validated, not in a statistical sense. Equalizing the testing situation in this sense doesn't automatically translate into equalization of the predictive validity of the two types of scores. There's a great deal of controversy over whether extending testing time actually "levels the playing field" for disabled students, as well as whether accommodations can benefit non-disabled students who obtain fraudulent diagnoses. If the former is false, there's no reason to offer accommodations; if the latter is true, the potential for fraud is great. Neither of these, though, has any bearing on whether admissions officers should know whether a test was non-standard.

Should a test be flagged if it's given in Braille for a visually-disabled test taker? Not necessarily, because a credible argument can be made that the same skills are being measured. Reading comprehension is reading comprehension, whether you read with your eyes or your fingers. But because admissions tests are almost always speeded, changing the time limit of the tests will almost certainly change the nature of what's being tested, so extended-time tests should be flagged. The only question that needs to be asked when considering the removal of these flags is, "Do non-standard tests predict college GPA as well as standard tests?" If they do, there's no reason to flag. If they don't, it's fraudulent not to flag, all the talk about "stigma" notwithstanding.
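For readers who want to see what that question looks like as a calculation, here is a minimal sketch, not anything taken from the article or from ETS. It assumes you have test scores and later college GPAs for two groups (standard and extended-time administrations), computes the score-to-GPA correlation in each group, and compares the two with Fisher's z transformation, a standard test for the difference between two independent correlations. The function names and the example numbers are hypothetical and invented purely for illustration.

```python
import math


def pearson_r(scores, gpas):
    """Pearson correlation between test scores and later college GPA."""
    n = len(scores)
    if n != len(gpas) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_s = sum(scores) / n
    mean_g = sum(gpas) / n
    cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(scores, gpas))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_g = sum((g - mean_g) ** 2 for g in gpas)
    return cov / math.sqrt(var_s * var_g)


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations.

    r1, n1: correlation and sample size for one group (e.g. standard timing)
    r2, n2: correlation and sample size for the other (e.g. extended time)
    """
    z1 = math.atanh(r1)  # Fisher transformation of each correlation
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / standard_error


# Hypothetical values, made up for illustration only (NOT real SAT data).
r_standard, n_standard = 0.50, 1000
r_extended, n_extended = 0.35, 200
z = fisher_z_difference(r_standard, n_standard, r_extended, n_extended)
print(f"z = {z:.2f}")
```

By the usual convention, a z beyond roughly plus or minus 1.96 would suggest the two correlations differ at the 5% level. A small z, on the other hand, does not by itself show that the two kinds of scores are comparable; it may only mean the extended-time sample was too small to detect a difference.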

As Ms. Freedman notes,

The College Board website tells aspiring matriculants, “Your scores show colleges how ready you are to handle the work at their institutions and how your verbal and math skills compare with those of other applicants.”

Therefore, no scores should be used unless they have been validated in this respect. Regardless, the College Board appointed an expert panel to discuss the removal of the flags, and ultimately they did remove them (with their main competitor, ACT, following suit only weeks later).

How many test-takers does this affect? Only 2% of SAT-takers request special accommodations - but almost all of them request additional time, and for some, such as those who claim learning disabilities, this is the only accommodation they request. As for learning disabilities (LDs), these diagnoses have increased 300% since 1976 - and while the percentage of SAT-takers has increased only 18% since 1987, the percent of those requesting accommodations has increased by more than 300%. (Don't miss the accompanying graphs, by the way, which suggest that this increase in the requests for accommodations has been accompanied by an increase in test scores for disabled test-takers.)

When the number of people in any situation who claim to be disabled increases at a rate this much faster than the increase in the general population, either something is happening to create disabilities, or a new way of diagnosing disabilities is present - or some people are not being honest about their disability status. Removing the "stigma" of the flag for extra time (which is the accommodation that LD test-takers will virtually always request) will do nothing to stem any tide of fraudulent claims.

Ms. Freedman goes one step further than (correctly) criticizing the College Board's puzzling decision to stop flagging on the SAT. She suggests three alternative routes that could have been taken:

Untimed SATs for all. If, as the College Board asserts, time doesn’t affect validity, administer the SAT untimed for everyone!

As she notes, this would still change the nature of the test - but at least it would change it for everyone, so that everyone's scores could be lumped together for the validity studies. However, despite the claims of the disability advocates that they simply want disabled test-takers to be on equal footing with everyone else, I have the feeling that few of these advocates would go for this truly leveling option, as opposed to one that gives preferential treatment to disabled test-takers.

Let the students decide. If time does affect validity and standardized norms, as we have been led to believe since the precursor of the SAT began in 1926, then the Board can avoid the allegation of discrimination by allowing all students to choose whether they want extended time.

No. Given the ease with which some students have obtained fake LD diagnoses, this is essentially what ETS and the College Board are already doing. Also, this doesn't remove the need for flagging or the problem of differential test validity. Besides, I'm cynical enough to believe that some test-taker will be willing to sue on the grounds that being forced to choose is in and of itself stigmatizing, if the flags are still in place.

Defend the SAT. The College Board could void the settlement. If actually sued, it could defend the SAT in court...a court would most likely defer to educational experts, uphold standards supported by evidence of the SAT’s validity, reliability, and technical underpinnings, and find flagging not to be unlawful discrimination.

This is, not surprisingly, the option that I believe the College Board should choose, not least because this is the only option that conforms to the Standards for Educational and Psychological Testing set forth by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME):

When there is credible evidence of score comparability across regular and modified administrations, no flag should be attached to the score. When such evidence is lacking, specific information about the nature of the modification should be provided, if permitted by law, to assist test users properly to interpret and act on test scores.

The proof of "credible evidence" should have been the lynchpin of the expert panel's decision to remove the SAT flags, but Ms. Freedman's summary of their conclusions makes it devastatingly clear that this decision was not guided by research, nor by sound psychometric theory:

The bottom line is that the panel majority had no research directly comparing changes in the performance of nondisabled and disabled students when both are given extended time. Moreover, the existing research, given the limitations outlined above, hardly establishes that many nondisabled students would not benefit from having extra time. This is not to criticize the research, since the researchers themselves acknowledge these limitations. The problem is with the panel majority’s drawing of firm conclusions based on inconclusive evidence...

The two psychometricians on the panel certainly recognized this. In a joint statement accompanying their individual reports to the panel, when they asked themselves the key question—whether scores from the SAT taken with and without extended time are comparable—one answered “no” while the other said “not sure.”

What's more, the panel quibbled over the definition of "credible evidence," and focused on the fact that the research did not conclusively show that the two forms of testing were different, when in fact the decision to remove the flags should have been made only if the evidence conclusively showed that there was no difference in predictive validity. I was previously unaware of the exact breakdown of the vote, but this outline makes it quite clear that the decision was primarily based on the urging of the three panelists "with experience and training in the special-education and learning-disabilities arena." Funny, I don't remember seeing anything about that in the Standards for Testing guidelines.

The conclusion?

With the decision to end flagging, most students will take the test in three hours, some in four and a half hours, others in five hours (for a shorter version of the test), without any reporting of these differences. Does that pass the test of common sense?

No. As noted in the article, speed counts in the real world, admissions officers would prefer to have the flags for making decisions, and the potential for disability-claim abuse will only increase. The playing field is now more unequal; students who receive accommodated tests, even if not truly disabled, will most likely have an advantage over students who do not. It's hard not to conclude that this is what the disability rights advocates and people "with experience and training in the special-education and learning-disabilities arena" have wanted all along.

Posted by kswygert at September 8, 2003 08:00 PM