The Education Wonks link to an NYT article summarizing serious problems with standardized test items:
Beware the perils of ambiguity. It is a mantra that is increasingly pertinent to tests in mathematics and science. The two fields might seem immune from imprecision. But in mathematics, for example, today's tests assess more than a student's ability to do "naked computation," as Cathy Seeley, president of the National Council of Teachers of Mathematics, puts it. In many places, calculators have rendered meaningless the testing of basic computational tasks. Instead, more questions test students' comprehension in real-world contexts. A triangle is a corner garden bed. A rectangular object intersected by a line is a juice box, with a straw. A sloped line on a graph represents a year's worth of payments to the power company. With these scenarios come variables, and mathematicians and scientists from British Columbia to Boston spend much time picking apart the questions, particularly in online discussion groups. If students are asked how many seeds can be planted in the surface area of a triangular garden, do you put seeds in the corners where there isn't room for plants to take root? What about relevant considerations like seasonality of utility bills or position of the planets? Multiple-choice questions, with no place to show your work and thinking, make such realities more vexing.
These realities should vex everyone who thinks that a lengthy word problem is always more suitable than a simple computation. Word problems certainly have more face validity, and their champions claim they engage students in a way that straightforward computational items do not. But there's a lot more wiggle room in talking about a triangular flower bed than in talking about a triangle.
Field testing, of the type described below, is crucial:
Once questions are written, they are typically reviewed by multiple groups that include test writers, teachers, editors, statisticians and content specialists. And then most developers test the questions on real students in real exam settings. In field testing, statisticians may discover that most top-scoring students selected answer "d" when answer "c" was deemed correct. What made "d" so appealing to the advanced students? Could a flaw in the question have led them to arrive at an equally correct answer? In most cases, the incongruity is a red flag, prompting developers to discard the question.
Any organization that goes live without field testing, especially one that uses innovative items, is asking for trouble. But even field testing doesn't catch everything.
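For the curious, here is a minimal sketch of the kind of red-flag check the article describes: look at the top-scoring students on a field-tested item and see whether their most popular answer matches the keyed answer. This is not any test developer's actual procedure. The function name `flag_item`, the 27% cutoff for the "top" group, and the sample data are all assumptions made up for illustration, and real item analysis would typically also look at discrimination statistics.

```python
from collections import Counter


def flag_item(responses, total_scores, key, top_fraction=0.27):
    """Flag an item when top scorers' most common answer is not the key.

    responses    -- each student's chosen option for this item (hypothetical data)
    total_scores -- each student's total test score, in the same order
    key          -- the option that was deemed correct
    top_fraction -- share of students treated as "top-scoring" (an assumption)

    Returns (flagged, modal_option, counts_among_top_scorers).
    """
    if not responses or len(responses) != len(total_scores):
        raise ValueError("responses and total_scores must be non-empty and equal length")

    # Size of the top group; always include at least one student.
    n_top = max(1, round(len(responses) * top_fraction))

    # Rank students by total score, highest first, and keep the top group's choices.
    ranked = sorted(zip(total_scores, responses), key=lambda pair: pair[0], reverse=True)
    top_choices = [choice for _, choice in ranked[:n_top]]

    counts = Counter(top_choices)
    modal_option = counts.most_common(1)[0][0]
    return modal_option != key, modal_option, dict(counts)


# Made-up field-test data: ten students, keyed answer "c".
scores = [95, 90, 88, 70, 65, 60, 55, 50, 45, 40]
answers = ["d", "d", "c", "c", "c", "b", "c", "a", "c", "c"]

flagged, modal, counts = flag_item(answers, scores, key="c")
print(flagged, modal, counts)
```

In this toy data set, two of the three highest scorers chose "d", so the item gets flagged for review even though "c" is the most common answer overall. A flag like this doesn't say the item is bad, only that someone should look at why "d" appealed to the strongest students.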
I disagree, though, that the situation is always better when multiple-choice items are removed. Yes, poorly written multiple-choice items can confuse students if there is more than one right answer, or no right answer. But all answers must be scored, and it's quite possible that the time and expense needed to create a scoring rubric that accounts for all possible answers on an open-ended item are more than what's needed to create and field-test a decent-sized pool of MCQs.
There's a nice list of pros and cons for various item types here. Note that for every type except multiple-choice, scoring becomes more time-consuming, more challenging, or both.
If you're really interested in writing some good multiple-choice items, you can't do better than this set of guidelines, "Constructing Written Test Questions for the Basic and Clinical Sciences," by Case and Swanson. The booklet is tailored to medical science items, but the techniques could be easily adapted to other fields.
Posted by kswygert at April 26, 2005 04:54 PM