April 30, 2003

Scientists, engineers, and psychometricians

First, go read Steven Den Beste's post on engineers and the application of scientific thinking, and be sure to read the Jane Galt article that he references as well. I read both of them this morning, and they got me thinking. As a result, I have this theory that might explain why testing is so unpopular among certain educators.

My theory begins with Jane's observation that not only do liberal arts majors tend to avoid scientific thought, they also tend to be hostile to those who would disprove their "theories" with logical analysis and facts:

I've been an English major. And the unfortunate tendency for those who are verbally fluent and spend four years arguing their opinion through footnotes and elegant phrasing rather than data, is to believe that a nice turn of phrase is as important as hard data. It informs the glib politics of many in the academy who often seem to think that the amusing bon mots of a Doonesbury cartoon constitute serious policy thought. And the reaction I get when explaining, say, rent control -- that somehow I'm just being mean, and that if I wanted to, I could make it so that imposing rent control improved the housing stock rather than destroying it.

Which is not to diminish the importance of literature and art. It's vital. But it's dangerous that our humanities students are so alienated from the scientific way of thought that they can't evaluate science on its own terms.

Jane's observation is two-pronged. First, many people with degrees in the liberal arts and humanities - and I include those with degrees in education in this category - place little or no importance on formulating and evaluating hypotheses, and there is little emphasis on developing a scientific mindset. I've noticed that many of those who espouse the wackier educational theories rarely follow the steps of the scientific method that Jane outlines; that is, they rarely state the underlying premises for their theories, they don't entertain other theories, they don't examine any contradictions between their theories and the premises, and they have a tendency to ignore any possible disconfirming evidence.

Second, these types of academics not only disavow the scientific method, but are openly hostile to those who utilize it. As Jane puts it, her students think she's being "mean" when she confronts their questionable economic theories with solid facts.

So. How does this relate to what Den Beste posted, and to my theory about the hostility against testing? Den Beste notes that:

Despite what [Jane] says, it's engineers and not scientists who are building the modern world. And if anything, this makes her point even more forcefully. But that's because the engineering sensibility is in a sense even more extreme than the science sensibility.

Scientists can still sometimes afford the luxury of ideological self-delusion -- it's happened many times...No engineer can... If we don't produce results more or less on time, we're automatically failures. And if we do produce results, our work is instantly tested in the cold brutality of real-world use and market acceptance. If the product doesn't work, or doesn't actually solve customer problems adequately, no amount of handwaving and nifty turns-of-phrase in the documentation will change that, or prevent it from being discovered. It will be discovered sooner, rather than later. And when it is, the product (and us) will be a failure...

Engineers cannot afford any kind of delusions; it costs too damned much.

It was clear to me, after reading this, that standardized tests are the products of educational assessment, and psychometricians are the engineers. If our tests fail, the publicity fallout is scathing and immediate. Long-lasting, too; people were still talking about NCS Pearson's scoring errors, which erroneously held back almost 8000 students, more than two years after it happened.

We have to be scientists. We have to care more about pragmatism than idealism. We can't just dream up workable mathematical item response models, or posit glib ideological explanations for why test-score gaps occur. We can't ignore the real-world validity tests that our assessments undergo, and we refuse to ignore certain relationships among test scores and educational achievements just because our conclusions are not "politically correct." We demand that educators not only look at the data, but take it seriously, and we don't allow feelings and ideologies to be "proof" that our data are not correct.

And that's where we run head-on into the idealistic educators, who have no interest in the scientific method, and outright hostility towards seeing their pet theories disproven.

One example of the schism between those who dream and those who produce in the world of educational reform is the current fad for performance assessments (or portfolios). Those who tout these exams as an educational cure-all often have a mystical and unrealistic concept of them. They envision these exams as non-standardized, low-test-anxiety, touchy-feely, unbiased, multi-dimensional measures of "higher-level thinking" that don't require a lot of time to grade, yet are also perfectly reliable, perfectly valid, and inexpensive. These dreamers don't want to hear us when we tell them that these assessments require a great deal of funding to develop, lengthy amounts of time to administer and grade, and many controls in place to avoid rater bias.

Rater training is difficult work, and ratings must be done blind to avoid bias based on unrelated student qualities (such as race). Even with superb training, raters often disagree with one another or with the scoring rules, and the reliability of the scores is driven downward. The more qualified the rater, and the more training the rater receives, the more money they are paid.
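To make the rater-disagreement point concrete, here is a minimal sketch of one common way to quantify agreement between two raters: Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The function name and the essay scores below are made up for illustration; they are not data from any real assessment, and an operational scoring program would likely use more elaborate models than this.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters scoring the same papers."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must score the same non-empty set of papers.")

    n = len(ratings_a)

    # Proportion of papers on which the two raters gave the identical score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's own score distribution.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores (1-4 rubric) that two trained raters gave to eight essays.
rater_a = [3, 2, 4, 4, 1, 3, 2, 4]
rater_b = [3, 2, 3, 4, 1, 2, 2, 4]

print(f"kappa = {cohens_kappa(rater_a, rater_b):.3f}")
```

In this made-up example the raters agree on six of eight essays (75 percent), yet kappa comes out noticeably lower once chance agreement is removed, which is the kind of gap that pulls score reliability down.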

Even if raters were perfect and cheap, developing a broad performance assessment is an extremely difficult task. If it's meant to measure something different from the multiple-choice exams, then what do we correlate the scores with to see what the test does measure? What type of items should be used? How do we quickly score open-ended items? How valid are short-answer items? What's the impact on certain subgroups if we suddenly switch item types? Do we move from one kind of test anxiety to another? And how are we supposed to combat test anxiety when certain activists keep insisting that our assessments are racially biased? Switching from an objective (multiple-choice) exam to a more subjective one increases the possibility of test bias. What if the test-score gap increases with these new assessments?

These are but a few of the many issues that we, the engineers, insist on addressing before a test goes live. Unfortunately, the idealists and politicians rarely support our realistic and pragmatic approach. In addition to the hostility and charges of racism that test developers often face, few educators and politicians bother to learn the methods required for developing and validating exams.

Evidence of this is the fact that testing companies are now given very little lead time to develop exams. This quote, from an AccessAtlanta article now buried in their archives (here's my post on it), bears repeating:

Stuart Kahl, president of Measured Progress, said the testing industry has started to expand to meet the demand...Kahl said the biggest challenge facing the industry is time. He said the companies need to create individualized tests based on each state's curriculum. The companies used to have years to develop tests, he said, but that's changed. "Instead of taking three or four years to develop an assessment system, you've got three or four months," Kahl said.

So, our educators value reform and assessment - yet we're expected to produce good tests in 1/12 the time that we had available before? Yet another sign that the scientific method, and the psychometric application of it, is not truly valued within the educational community.

Update: The two best snappy replies that I've gotten to this posting are:

(1) "Math and science are cold and hard and mean. And male and white. Wrong answers kill puppies." - Joanne Jacobs, who must be completely fed up with the anti-science crowd.
(2) "Imagine your children being graded by ice-skating judges ...'nuf said." - Tom C., who probably suspects the French are somehow involved with this whole performance assessment conspiracy...It wouldn't surprise me.

Thanks for the input, folks.

Posted by kswygert at April 30, 2003 10:09 AM