Devoted Reader John L. sent me this tale of "testing goofiness in Oregon." The state's 10th-graders are turfing on the math exam, and the test design might be to blame:
State officials are racing to answer why: Was the test too hard, or did schools fail to teach this class to write clear, mathematically sound answers to elaborate math problems?
Last year, half the state's sophomores passed the problem-solving test. So far this year, 82 percent have failed. Another 20,000 sophomores will take a different problem-solving test starting Monday through mid-May...
The state has given the problem-solving test since the early 1990s, part of a decision to go beyond multiple-choice questions when measuring math skills. Students choose one of three multistep math problems, then write an answer that typically runs a page or two. They must show how they solved the problem and how they checked their work; communication counts as much as the right answer...
Every version of the test gives students a choice of a probability question, a geometry question and an algebra question. This year's winter test gave students the chance to prove themselves figuring the odds in a dice game, the dimensions of a hand-made quilt or the speed and mileage of a daughter and her slow-driving dad. By comparing results from this winter's test with results from a year earlier, state officials have determined there wasn't one particularly difficult question on this year's test, they say. All three questions tripped up more students than last year.
Thanks to this, some in Oregon have become highly critical of performance assessment items:
Rob Kremer, a longtime critic of Oregon's test system who ran unsuccessfully for state schools superintendent in 2002, said the wild swing in results proves that the state-developed test is unreliable.
"Faddish assessments such as Oregon's math problem-solving tests are not suited for use as large-scale, high-stakes tests," he said.
I don't know if I'd call problem-solving tests in math "faddish," and such items are not automatically unsuitable for high-stakes testing. When the state's employers claim they need more citizens with solid problem-solving skills, they're right, and one way to test those skills is with this type of item.
But such items are more difficult to develop properly, and they may very well test a narrow area of the domain, making it hard to generalize the results to the overall math construct. What's more, that one item counts the same as the multiple-choice exam, so if none of the three options are appealing, an examinee is at a real disadvantage. There's research to suggest that examinees, when given a choice of topics, don't always do a good job of knowing what they're good at.
My reader wanted to know how the following could be possible:
It's fairly easy for test makers to create a new multiple-choice test that is as difficult as the previous year's test, said Edward Haertel, a Stanford professor who is past president of the National Council on Measurement in Education. But when creating tests that require long answers, it is harder to match the difficulty level from year to year...
Haertel said a statistical adjustment, such as the one Oregon testing officials are considering, may be the best step for the state to take.
Although I don't know for sure what Haertel is suggesting, one possibility is to assume the distribution of examinees this year is similar to last year's, and essentially shift the score scale up to match. That's similar to what is done on large-scale standardized tests like the LSAT, which is why a certain number right out of 101 items can translate to a different scaled score from form to form. Obviously, though, it may be unsafe to assume the student ability distribution is the same from year to year; if the quality of teaching declined dramatically, it won't be.
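To make that first idea concrete, here is a minimal sketch of one simple version of it, usually called linear equating: rescale this year's scores so their mean and spread match last year's. The score lists and the function name `linear_equate` are invented for illustration, and nothing here is claimed to be what Haertel or Oregon officials actually have in mind.

```python
from statistics import mean, pstdev


def linear_equate(new_scores, ref_scores):
    """Rescale new_scores so their mean and SD match those of ref_scores.

    Assumption: the two groups of examinees have the same underlying
    ability distribution, so any difference in scores is due to the test.
    """
    m_new, s_new = mean(new_scores), pstdev(new_scores)
    m_ref, s_ref = mean(ref_scores), pstdev(ref_scores)
    if s_new == 0:
        # No spread to rescale: fall back to a pure shift of the mean.
        return [x - m_new + m_ref for x in new_scores]
    slope = s_ref / s_new
    return [m_ref + slope * (x - m_new) for x in new_scores]


# Hypothetical raw scores on a 1-6 rubric (not real Oregon data).
last_year = [2, 3, 4, 4, 5, 6]
this_year = [1, 2, 3, 3, 4, 5]  # same shape, one point lower

adjusted = linear_equate(this_year, last_year)
print(adjusted)
```

In this toy case the adjustment simply moves every score up by one point. If this year's students really were weaker, the same arithmetic would hide that fact, which is exactly the risk described above.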
A second possibility is to "borrow information": examine the historical correlation between the MCQs and the performance-assessment items, and use that to adjust scores. If, in the past, students who did really well on the MCQs also did well on problem-solving, then you'd expect the same to be true now. If it's not, the PA score can be adjusted. However, MCQs and PA items often do not correlate highly (if they did, they could be measuring the same thing, and both types might not be needed).
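A rough sketch of how that borrowing might work: fit a straight line to past (MCQ, PA) score pairs, use it to predict what this year's average PA score should have been given this year's MCQ scores, and shift the PA scores by the gap. All of the numbers and the names `fit_line` and `adjust_pa` are hypothetical, and a real adjustment would have to reckon with the weak correlation just mentioned.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def adjust_pa(hist_mcq, hist_pa, cur_mcq, cur_pa):
    """Shift current PA scores so their mean matches the mean predicted
    from current MCQ scores using the historical MCQ-to-PA relationship."""
    slope, intercept = fit_line(hist_mcq, hist_pa)
    expected_mean = mean([intercept + slope * m for m in cur_mcq])
    shift = expected_mean - mean(cur_pa)
    return [p + shift for p in cur_pa], shift


# Hypothetical data: MCQ raw scores and PA rubric scores.
hist_mcq, hist_pa = [10, 20, 30, 40], [2, 3, 4, 5]
cur_mcq, cur_pa = [10, 20, 30, 40], [1, 2, 3, 4]

adjusted_pa, shift = adjust_pa(hist_mcq, hist_pa, cur_mcq, cur_pa)
print(shift, adjusted_pa)
```

Here the students did as well on the MCQs as the historical group but scored a point lower on the PA item, so the sketch attributes that point to the test and adds it back.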
A third option at this point is to re-weight the test sections, giving more weight to the more reliable part, the MCQs. And then there's the "scorched-earth" option:
The U.S. Department of Education would have to approve any move by the state to cancel the results, which would spare schools the consequences of the poor scores, said Ron Tomalis, counselor to the U.S. secretary of education.
Posted by kswygert at April 26, 2004 03:35 PM