October 21, 2003

TIME's take on the new SATs

The new issue of Time has the goods on the "dramatic" changes on the New SAT. At first, I feared that the editors had gone with the "gloom-n-doom" approach, because the fact that some students' scores could be "hurt" is mentioned in the subtitle, and the article begins with:

Three hours of misery are apparently not enough. Now the makers of the SAT want to shape what kids learn throughout four years of high school. True, students have always had to brush up on vocabulary and take practice tests before the SAT, but now the College Entrance Examination Board, which owns the test, is developing the "New SAT," an exhaustive revision largely intended to mold the U.S. secondary-school system to its liking.

Please tell me that this isn't setting the tone of the entire article. Why, only one paragraph in, and the College Board has already taken over the entire K-12 system.

The College Board wants schools to produce better writers, so the New SAT will require an essay. The board thinks grammar is important, so the new test will ask students to fix poorly deployed gerunds and such. To encourage earlier advanced-math instruction, the New SAT will go beyond basic algebra and geometry for the first time to include Algebra II class material (remember negative exponents—q⁻³, for instance?). The board, a powerful group of 4,300 educational institutions—including most of America's leading universities—has undertaken an unprecedented effort to push local school districts to alter their curriculums accordingly.

I take it all of Time's editors and writers send their kids to schools where writing skills and grammar are unimportant, and no math beyond Algebra II is considered useful?

...Instead of the venerable math and verbal sections, the test will have three segments that will be more familiar to Americans: the three Rs, reading, writing and arithmetic. (Hence a perfect score will go from 1600 to 2400.) At first blush, the changes seem healthy enough. But inevitably, some students will do better, and some worse, on the new test.

Why is that newsworthy? Is it because the assumption here is that the only fair test is one on which all subgroups perform equally well? Do the writers at Time really believe that is possible? Or am I just being hypersensitive, thanks to the spate of uninformed and unsupported claims of test "bias" recently?

Girls tend to outperform boys on writing exams, so their overall scores could benefit from the addition of the new writing section. Boys usually score higher on the math section, but the new exam will contain fewer of the abstract-reasoning items at which they often excel. The elimination of analogies may exacerbate the black-white SAT score gap, since the gap is somewhat smaller on the analogy section than on the test as a whole, according to Jay Rosner, executive director of the Princeton Review Foundation.

Oh, jeez. Here we go again. Does this mean that even Time writers can't see the vast flaws in Mr. Rosner's reasoning, or wonder whether there might be a conflict of interest here? If Mr. Rosner can scare more kids into taking prep courses, allegedly because he has inside information about "biases," shouldn't a journalist wonder aloud whether that sort of information is good for business?

More broadly, students who attend failing schools could suffer as the SAT morphs from a test of general-reasoning abilities into a test of what kids learn in school. "There's a danger that making it too curriculum-dependent will actually increase overall score gaps for some minority groups," says Rebecca Zwick, a former chair of the College Board's own SAT Committee. "Because we have such huge disparities in the quality of schooling in the country, kids who go to crummy schools may be disadvantaged."

Kudos to Time for interviewing an actual psychometrician who makes a very good point. Those who go to crummy schools will be disadvantaged if the test becomes too curriculum-specific. But are the universities concerned about this? Or are they too fed up with the influx of diploma-holders who need remedial math and English courses to care? Perhaps the only students they want are the ones who can show that they've learned something in high school.

As Time itself points out, this sort of change in the test can drive the curricula that schools use. Unlike Time, I don't immediately assume that this is a bad thing.

The world of standardized testing has its own language and history; an entire branch of science, psychometrics, is devoted to test design and analysis. But the tiny discipline touches most Americans' lives at some point...

Hee hee hee. You just don't know how much I appreciate that phrase, "tiny discipline." Not only is it funny, it's true - there aren't many of us in the world (and some are very thankful for that, I'm sure).

Insta-experts from the media and from antitesting groups often repeat fallacies: blacks do better in college than their SAT scores predict (actually, for reasons that aren't well understood, blacks tend to do worse in college than matched groups of whites with the same scores); how well you do on the SAT will determine how well you do in life (SAT scores have little power to predict earnings).

Hooray. Finally, an article willing to call a fallacy a fallacy, although there are plenty more they could have listed here (and certainly the names Rosner and Freedle should have been mentioned as well).

Six months ago, TIME asked the College Board if we could sort out some of these conundrums by following the development of the New SAT from inside. To our surprise, board president Gaston Caperton III agreed...

...the production of New SAT questions has entailed some expected debates—Is this item too hard? Is that one biased against women?—I saw something quite unexpected as well: Caperton is changing the very nature and purpose of the SAT.

At his insistence, the goal of influencing school curriculums has become the overriding preoccupation of the new test's developers...To be sure, Caperton believes the notion (actually, he's staking his career on it) that the SAT can both improve high schools and still remain useful to colleges as a predictor. But the first goal is a political aim; the second, a psychometric one. And Caperton has surrounded the New SAT with dozens of educators who aren't schooled in psychometrics.

Reeeallly. Now that's interesting, and good reporting.

For decades, the SAT was, at its heart, an aptitude test; now it's becoming more like its competitor, the ACT, the nation's biggest achievement test. What's the difference? Achievement tests gauge mastery of subject matter; your U.S. history final was an achievement test...

Aptitude tests are harder to define. Many people seem to think of aptitude exams in general—and the old (or current) SAT in particular—as IQ tests...

If IQ tests try to probe innate abilities, and if achievement tests rate classroom learning, aptitude tests assay something in between—developed abilities. Developed abilities are those nurtured through schoolwork, reading, doing crosswords, soaking up the arts, debating politics, whatever. These aren't inborn traits but honed competencies.

Whereas early psychometricians, many of them racist, propagated what Lemann calls the dipstick theory—the idea that a test score is like a mark on a dipstick showing the raw amount of intelligence in your mental oil tank—the field outgrew that simplistic notion at least a generation ago.

Um, more than that, unless a generation is fifty years or so. But hey, at least they're not still calling psychometricians racists; many reporters haven't made it that far. And their explanation of the distinction between aptitude and achievement is very useful for readers.

...the more you challenge yourself intellectually, the more you condition your brain; your academic achievements are less impressive if you don't have the conditioning to build upon them. As the SAT becomes more an assessment of one's achievements, it will less sensitively gauge these underlying skills...

...what happens when you move away from trying to assess aptitude? Consider the reading section of the New SAT. In May, the College Board's Reading Development Committee decided that SAT item writers should feel free to use literary terminology in their questions for the reading section. Words that one would typically use only in a literature class—simile, personification—had always been avoided on the SAT...No more...

The use of technical language will also increase in math. For instance, in the past, an SAT item might have stipulated that group A has 10 members and group B has 10 + 5x members, where x = 3...on the New SAT, the question might read, "What is the union of sets A and B?"...The answer is still 35, but you must know the jargon to get it.

This is a fine point, and I'm glad the Time writers took the path of questioning what could happen with the shift towards more technical language and jargon, and away from pure symbols, instead of immediately assuming that this change is unnecessary, or unfair.
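Incidentally, the arithmetic in Time's example only comes out to 35 if the two groups share no members. A minimal sketch in Python, using made-up member labels (the names here are purely illustrative), shows the count under that assumption:

```python
# Assumption: groups A and B are disjoint (no shared members).
# The member labels are hypothetical; only the group sizes come from the example.
x = 3

group_a = {f"a{i}" for i in range(10)}          # 10 members
group_b = {f"b{i}" for i in range(10 + 5 * x)}  # 10 + 5x = 25 members

# Set union keeps each distinct member once.
union_size = len(group_a | group_b)
print(union_size)  # 35
```

If the groups overlapped, the union would be smaller than 35, which is exactly the kind of detail the jargon assumes a student already knows.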

I put that question to David Lohman, a University of Iowa psychology professor who has studied the differences between achievement and aptitude tests. In a paper that will be published in the forthcoming book Rethinking the SAT, Lohman analyzed test scores for 6,300 11th-graders who in 2000 took two very different tests, the Iowa Tests of Educational Development (ITED) and the Cognitive Abilities Test (CogAT)...

The ITED is your basic achievement test: it assesses how well kids have learned such class exercises as setting up science experiments, reading social studies passages, and spelling. The CogAT, by contrast, is a test that measures verbal, quantitative and figural reasoning abilities, irrespective of any one curriculum...

When he compared ITED and CogAT scores by race, Lohman found something surprising to those outside his field: the gap between white and minority students was smaller on the reasoning test than on the achievement test...

Others have replicated such findings by comparing achievement and reasoning tests in earlier grades; one theory as to why minorities often score higher on the latter is that they attend poor schools that leave their potential untapped. "Indeed," writes Lohman in Rethinking the SAT, "the problem with the current version of the SAT"—which continues to show a racial score gap—"may not be that it is an aptitude test, but that it is not enough of an aptitude test."

Very interesting. Time also points out that it has been schools such as the University of California that have demanded this change, rather than the College Board. So the new SAT, which may be more likely to measure the "higher-order thinking skills" that educrats so favor, may also exacerbate the score gap, and it may do so by measuring achievements that students at poor schools are less likely to attain, regardless of their innate ability.

If the subject-based SAT IIs are doing just as good a job of predicting performance as the SAT at some schools, that suggests that those higher-order skills may be what admissions tests should be measuring - even if the subgroup differences widen. But some claim that isn't the case. What does it mean if a combination of high school GPA, SAT, and the SAT II is the best predictor? What do we conclude about what better prepares a student for college? Is reasoning, as measured by something like the CogAT, really enough?

The paradoxes, thorny decisions, and Catch-22s (now there's some literary jargon for you) here are really something. The new SAT reading passages will come from actual works of literature (although I assume they will be bowdlerized to some extent), meaning that those passages won't be deathly dry and dull anymore - but this also means that students who have already seen those passages may very well have an advantage. It's not that it's bad to encourage kids to read those books, but Time is correct in surmising that this introduces noise into the measurement of reading comprehension.

The more performance assessments a test contains, the more noise the measurement contains, and the new SAT essay will be no different:

As Lemann writes of the early rationale for the SAT, "Tests that require a student to write essays ... are highly susceptible to the subjective judgment of the grader and to the mood of the taker on the day of the test, so they have low reliability."

Reliability is a measure of a test's precision from one administration to the next—a gauge of how much noise, or measurement error, it has eliminated. The standard error of measurement for a typical SAT is about 30 points for the math section and 30 for the verbal...Thirty points in either direction is a pretty big swing, but scores on the writing section will be even less reliable: field trials of the New SAT estimate a standard error of measurement of 41 points. That means a kid who gets a 670 may "really" be in the élite reaches of the 700s—or in the more average environs of the low 600s.

Grading essays is not an easy task. Setting standards so that raters can be trained to rate essays efficiently and accurately is very difficult. The College Board has its work cut out for it. Combining the essay score with the multiple-choice score mitigates the noise problem somewhat, but not entirely.
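For a sense of what those standard errors mean for a single score, here is a rough sketch in Python. It treats measurement error as roughly normal, so a band of plus or minus one standard error of measurement (SEM) covers about 68% of cases. It also uses the classical relation SEM = SD * sqrt(1 - reliability) to back out an implied reliability. The function names are my own, and the standard deviation of 100 is an illustrative assumption, not a College Board figure; only the SEMs of 30 and 41 and the score of 670 come from the article.

```python
def score_band(observed, sem, n_sems=1, floor=200, ceiling=800):
    """Return (low, high) around an observed score, clipped to the 200-800 section scale."""
    low = max(floor, observed - n_sems * sem)
    high = min(ceiling, observed + n_sems * sem)
    return low, high


def implied_reliability(sem, sd):
    """Classical test theory: SEM = SD * sqrt(1 - reliability), solved for reliability."""
    return 1 - (sem / sd) ** 2


ASSUMED_SD = 100  # hypothetical section standard deviation, for illustration only

writing_band = score_band(670, 41)  # SEM from the New SAT field trials cited by Time
math_band = score_band(670, 30)     # SEM Time cites for the math section

print(writing_band)  # (629, 711)
print(math_band)     # (640, 700)

# Larger SEM at the same SD means lower reliability.
print(round(implied_reliability(41, ASSUMED_SD), 2))  # about 0.83
print(round(implied_reliability(30, ASSUMED_SD), 2))  # about 0.91
```

The one-SEM band for a 670 on the writing section runs from 629 to 711, which matches Time's "low 600s" to "700s" description. Widening the band to two SEMs (n_sems=2) would cover roughly 95% of cases under the same normality assumption.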

Today Atkinson and Caperton have launched another great social experiment with the SAT. This time, the idea is that the test's rigorous new curricular demands will lift all boats—that all schools will improve because they want their students to do well on the test. Schools have long tried to prepare kids for the SAT, but education experts scorned the practice of openly teaching to the test. Now it's the mission of the College Board that every school should teach to the SAT. "I would say that the most important aspect of this test is sending a real message back to kids on how to prepare for college," says Atkinson. It's not clear what happens to students in schools that won't hear, or can't afford to heed, his message.

Not a bad ending. And not a bad article; in fact, it was much better than I thought it was going to be, given the tone of the introductory grafs. An important question has been raised. Are those educators who so often devalue the "basic skills" and "lower-order thinking" that the old/current SAT allegedly measures going to support this more difficult test, even if the score gaps widen? Certainly, the gaps should be a wakeup call to struggling schools, but will those schools claim that they cannot "afford to heed" the message?

I don't see how they can afford not to.

Posted by kswygert at October 21, 2003 02:02 PM