May 28, 2003

The problem of intuitive test theory

I only have time for one post today, so I'm going to make it count. Here's a link to a newly posted article on intuitive test theory, written by two eminent psychometricians. The first author is Robert Mislevy, who's a professor of Measurement, Statistics, and Evaluation at the University of Maryland. He's also a giant in my field and an all-around nice, brilliant, funny guy (here's a link to various doodles he drew during boring meetings - "The worse the meeting, the better the drawing", in his words). His co-author is Henry Braun, a well-known researcher who was formerly VP for research management at ETS, and is now a distinguished presidential appointee (let me hasten to add, however, that the paper does not represent ETS's position or policies).

This paper is really, really, really neat, and I want to know what you non-psychometricians out there think of it (and so does Bob Mislevy, for that matter). The thesis of the paper is that there are "phenomenological primitives," or p-prims, in psychometrics just as there are in physics (from which the term is borrowed). P-prims are:

...primitive notions in the sense that they "stand without significant explanatory substructure or explanation" (diSessa, 1983, p. 15). Familiar examples [in physics] are "Heavy objects fall faster than light objects", "things bounce because they are 'springy'", and "Continuing force is needed for continuing motion."

Most anyone who is not a physicist is going to have a world-view that is in some way shaped by p-prims - these are intuitive, primitive notions, drawn from direct observation, that can serve as an explanation for functioning in the world without actually being correct. Physicists know these p-prims are not correct, of course, but most of the rest of us don't, because we don't need to. We're not the ones building particle accelerators or "shooting rockets to the moon," so it doesn't matter if we can't explain much about space, time, and force without resorting to broad, inaccurate generalities. We know how to throw a ball to a dog; we don't have to know why the ball flies or why it stops. We aren't the engineers.

So let's move on to the field of testing. As some of you might remember, I was inspired a while back by this Steven Den Beste article, and subsequently wrote about how psychometricians are the engineers of educational testing. We're the ones who have to know how things actually work in testing, and we have to build tests according to empirical models, not sociological or educational theories about how tests should work. Problem is, a vast set of testing p-prims has developed - the "intuitive test theory" in the paper's title - and much of the current debate about testing is being informed not by empirical test theory, but by the intuitive theory. Actually, "inform" might be the wrong word, because these p-prims, which are explicitly stated in the paper, are often a hindrance to intelligent discussion and decision-making in the education world:

...first I will list a number of beliefs about testing that my colleagues and I come upon time and again in discussions of tests in everyday conversations. We will return to them presently.

• A test measures what it says at the top of the page.
• A test is a test is a test.
• Any two tests that measure the same thing can be made interchangeable, with a little equating magic.
• A score is a score is a score.
• You score a test by adding up scores for items.
• 93% is an A, 85% is a B, 78% is a C, and 70% is passing.
• Multiple-choice questions only measure recall.
• It's easy to write test items.
• You can tell if an item is good by looking at it.
• You can tell if a test is good by looking at it.
• Technology will solve testing problems by making it possible to get voluminous amounts of data.

The remainder of the paper then goes on to address these p-prims, in a manner that's meaty enough for psychometricians yet readable enough for the layperson. I think the entire subject is absolutely fascinating. And I agree wholeheartedly with his conclusion about how this makes life frustrating for psychometricians:

One [aspect of the job] isn't fun, but the other is. The one that isn't fun is trying to critique or implement policies and programs that have been put together on the basis of intuitive test theory. This kind of project requires a lot of telling people that what they want to do won't work, and to do it right is harder or takes longer or isn't as accurate as they want.

Amen. What I've been routinely bashing as testing "myths" are what Mislevy and Braun have more elegantly defined in this paper, and in writing this, they've done a great service to psychometricians and other educators alike.

In addition, I received an email from Bob Mislevy, in which he mentioned the connection between this topic and my comment about psychometricians and engineers. He also mentioned the astounding lack of solid test theory knowledge in some people who should know better, including the hapless Mr. Freedle, who had the, um, "imaginative" solution for correcting racial bias on the SAT:

...it is amazing how many people in the education business, including even assessment policy, don't really understand the concepts underlying psychometrics--even when they know a lot of the words and use some of the formulas...The other [connection] is the more recent entry about Roy Freedle's paper. Despite his many years at ETS, and knowing a great deal about language and substance of language assessment, Mr. Freedle remained remarkably robust against those basic ideas of the inferential machinery of assessment. What he says does indeed make for good copy in the popular press, unfortunately. Hooks in so neatly to some strong and popular p-prims...

Yes, those people whose comments match up with the more popular myths are the ones who get the publicity; those of us who try to patiently defuse these myths are often ignored. The refusal of some to correct their intuitive test ideas is understandable - after all, who wants a lecture on physics every time they play catch with Oscar? - but when the intuitive test theories are driving educational reform, or state standards, it's incredibly frustrating to watch.

"Remained remarkably robust against those basic ideas" - heh. Hee hee hee. Okay, so you'd have to be a psychometrician to really see the humor in that. Still, the joke is greatly appreciated on this end.

Anyway, I want to see plenty of reader discussion about this. Those of you who are educational professionals and reformers, those of you who are parents, those of you who have way too much time on your hands (you know who you are) - send me some emails. Download this paper. Suggest some other p-prims (such as the one I keep bashing, which is "Group mean differences indicate test bias," although I'm not sure that's so much a myth as a willful misrepresentation for ideological purposes). Let me know what you think.

Posted by kswygert at May 28, 2003 04:26 PM