Welcome!
If you're here from the Fox News Blog of the Day website, thanks for visiting. This blog has been up and running for about four months and I'm greatly enjoying the reader response. I hear from a lot of people who have been looking for a psychometrician to bounce ideas off of - parents who want to talk about their kids' testing experiences; the kids themselves who think the tests are too hard, too easy, or just a pain in the rear; scientists in other fields who want me to check their statistics (you know who you are). Please feel free to email me if you have a question or comment - I'm out of the office quite a bit in June but I'll try to reply to every email.
Note: I had to post this early, since I'll be out of the country when my blog is planned to appear on the Fox website. If it isn't there when you read this, check again after noon on 5/29/02.
The Godless Capitalist strikes again
Major dude Godless Capitalist fact-checked Mark Goldblatt at JWR and sent me the results. Charitable as always, GC prefaced his email to me by saying, "I'm surprised you didn't mention [Goldblatt's error]", which is a nice way of saying, "Hey, take a closer look at the data next time before you praise someone's work." A good thing to remember.
However, I'm going to play Devil's Advocate again and suggest that Mark's main point was correct, although some of his statements were incorrect. Here's the relevant part of Mark's argument:
"Survey after survey has shown only a slight correlation between SAT scores and college grade-point averages or graduation rates.[...] The predictive factor of the SAT is very strong at the extreme ends of the scoring spectrum. So, for example, a student who scores a combined 750 is far likelier to flunk out of Dartmouth or Stanford — or, for that matter, any accredited liberal arts college — than a student who scores a combined 1500. Now imagine a school that attracted equal numbers of 1500 and 750 scorers. Do you think there would be a noticeable correlation between SAT results and grades? Or between SAT results and graduation rates?[...]In the real world, of course, there is no such school — since elite universities utilize standardized tests to screen out most applicants who score below, say, 1300[...] It's this very screening process, however, that undermines the SAT's ability to predict grades and graduation rates since it ensures a relative homogeneity among students at any given college. Once the pool of students is narrowed to those who scored between, say, 1100 and 1300, then variables such as home environment, discipline, and maturity — which the SAT cannot measure — tend to override the statistically minor deviation between, say, a 1130 student and a 1170 student."
Now, take another look at the numbers that GC cites. Those numbers are indeed higher than Mark seems to be giving the SAT credit for. It's entirely possible that Mark didn't take a look at the College Board data, or that he did, but didn't consider a correlation of (for example) 0.52 to be high, when in fact it's huge (from a social science perspective). So GC was correct to call him on that.
However, I really think Mark was trying to make a different point, which is that these correlations are not the same from school to school, and that the more restricted the range of SAT scores for admitted students, the smaller the correlation. After all, the correlations listed by the College Board are weighted averages across schools, and no measure of variability is given. Mark's point was that some critics of the SAT may think they've proven their point by referring to this school or that school that shows a low correlation between SAT and GPA, when the cause might be the restricted range of SAT scores. However, he was remiss in not getting the facts straight from the horse's mouth and reporting the College Board numbers, which also refute the critics.
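For the curious, here is a small simulation of the restriction-of-range effect Mark describes. It is only a sketch under made-up assumptions: the "true" SAT-GPA correlation is set to the 0.52 figure mentioned above, and the SAT scale (mean 1000, standard deviation 200) and the 1100-1300 admission band are hypothetical values chosen for illustration, not College Board data.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(rho=0.52, n=50_000, seed=0):
    """Draw (SAT, GPA) pairs whose population correlation is rho.

    The SAT scale here (mean 1000, SD 200) is an assumption for illustration.
    GPA is left in standardized units, since only the correlation matters.
    """
    rng = random.Random(seed)
    sat, gpa = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        sat.append(1000 + 200 * z1)
        gpa.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return sat, gpa


sat, gpa = simulate()

# Correlation across the full applicant pool.
r_all = pearson(sat, gpa)

# Correlation among "admitted" students only (hypothetical 1100-1300 band).
band = [(s, g) for s, g in zip(sat, gpa) if 1100 <= s <= 1300]
r_band = pearson([s for s, _ in band], [g for _, g in band])

print(f"full range:       r = {r_all:.2f}")
print(f"1100-1300 only:   r = {r_band:.2f}")
```

The relationship between SAT and GPA is identical in both calculations; the only thing that changes is how much of the SAT range is allowed into the sample. The narrower the band, the smaller the observed correlation.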
A Carolina scandal not related to the Confederate flag....
More news reports here, and here, and here, about the security violation scandal in South Carolina. This scandal interests me because I grew up in South Carolina, although I graduated before the current PACT tests and the emphasis on accountability.
Three teachers have had their licenses suspended (though not revoked) since the tests were implemented in 1998, and ten schools have been or are now under investigation by SLED (State Law Enforcement Division). This security violation is a misdemeanor that can carry a 90-day jail sentence or a $1000.00 fine; failure to report any security violation is also a crime, and at least one school considers their violation to be "human error" that was reported just to be on the "safe side" (of the iron jail bars, apparently).
The third article that I've linked to is the oldest one, but it's the most comprehensive and it lists several cases of cheating nationwide. The Devil's Advocate in me just has to notice that when students cheat on classroom tests, we discipline the students and keep the tests, but when teachers and administrators assist the students in cheating on statewide tests, some rush to declare this as evidence that the tests should be abolished. Interesting, isn't it? After all, the fact that some will always try to get around the rules could be used as support for abolishing just about any rule, according to this logic. To their credit, though, the school principals state emphatically that the pressure posed by the tests does not justify any "lapses in judgement" on the part of the test administrators.
The Fresno Bee has yet another article about the testing backlash in California (this article also mentions the story of Ms. Robinson's suicide linked below). The old conundrum of high-stakes vs. no-stakes comes up again here - no one wants to penalize kids for their performance on tests designed to assess schools, but the result is that there's less incentive for kids to perform well on those tests, and less incentive for teachers to teach that material. If the claim that California has "no coherent system for accountability" is true, change is definitely necessary, although I have my doubts about the capability of AB2347 to produce the desired coherent testing system.
Too much testing in California's schools?
Another sympathetic article about over-testing of students in the San Francisco Gate online. The article does a good job of describing the "gantlet" of tests that school students may experience, although some of the tests that eat up so much time are not mandatory (such as the AP exams). A spokesman for Fair Test is once again quoted, this time with a silly comment about measuring kids and fattening cattle. California may be "ahead of the curve", and much too enthusiastic about bringing on the tests, but California's educators have not claimed (I hope) that the tests in and of themselves make kids smarter, as his comment suggests.
Opinions about the new emphasis on accountability
The Atlanta Journal-Constitution has gathered a range of opinions about testing and accountability in the Georgia schools. I think the articles are pretty balanced, although the unintended consequences of high-stakes testing are over-dramatized as usual.
For example, I was dismayed to read that Betty Robinson, a principal at Simonton (Georgia) Elementary School, committed suicide in April, but I was also surprised by the decision of the AJC to publish an opinion by a schoolteacher that suggests Ms. Robinson took her life because of high-stakes testing. I don't begrudge the teacher in question his grief and his concern. I do, however, question the judgment of the AJC editors, who apparently decided that the claim that Ms. Robinson committed suicide "at least in part because of the stressful demands placed on her to improve her school's test scores" is an appropriate assessment of the impact of standardized testing. There are plenty of valid criticisms to be made of the high-stakes testing programs, some of which are badly-planned and too-hastily implemented, but the story of Ms. Robinson is too profound to be reduced to just another reason to oppose standardized tests. What's more, it leaves anyone who defends standardized testing open to the accusation of being callous if they decide Ms. Robinson's death should not be taken into account when assessing the testing programs.
Protesting the MCAS
Education News pointed me towards an online article about a May 16th protest against the Massachusetts Comprehensive Assessment System (MCAS). And what was said in protest?
" There is no more important lesson in our country, " [City Councilor Ken] Reeves told the assembled students, parents and teachers, " that if you don’t like something that is going on, you should be able say something about it. "
Can't argue with him there....
" We think that you are wrong, " Reeves continued, speaking to MCAS supporters and Ed Reform planners, who were noticeably absent at the gathering. " As Fredrick Douglas said, ‘If there is no struggle, there is no progress.’ Well, here’s the struggle, let’s see some progress. "
Were Ed Reform planners invited? Who knew about this protest? How was it publicized? And Reeves' logic, while not completely unsound, is a little incomplete. No struggle may mean no progress, true, but the existence of struggle does not imply progress. Tearing down an existing system is not progress unless a better one is proposed to take its place. Do the protestors have an alternative in mind?
In 1999, the Cambridge School Committee approved a policy that guaranteed no reprisals would be made against students who opted not to take the MCAS test, and directed schools to place boycotters in instructional settings during test-taking times. The committee also guaranteed there would be no reprisals against teachers or staff who speak out against the test.
Wonder if those "instructional settings" are real classes, or about as instructional as "study hall" was in my high school?
The fight is really about money, of course:
Cambridge receives almost $9.5 million in education aid annually from the state, as part of the Education Reform law of 1993. The law created uniform standards for all school districts, as well as the Massachusetts Comprehensive System exam — a test backers said would ensure very high school graduate in the state received a quality education. But with just one year to go before they must pass MCAS to graduate, 35 percent of high school juniors in Cambridge have yet to pass the test.
Here's an invitation - if you're out there reading this and you're anti-MCAS, let me know what the alternative is. I'm not saying there isn't one, and for heaven's sakes, I'm not necessarily in favor of high school exit exams. I'm just saying that so much of this anti-testing movement seems to be entirely reflexive and based on nothing more extensive than, "We don't like these tests."
Juan Gato & his Bucket O'Rants discovered a nice little hissy fit going on in Seattle. Seems that a group of principals are upset because they can no longer ensure a diverse student body - meaning, they can't use race as a means of student placement - thanks to a circuit court's recent ruling. This reply by a Seattle Times editorialist neatly skewers their arrogance and bluntly reminds them that, when it comes to choosing schools, race doesn't matter to parents as much as the edu-crats wish it would.
The "Don't they get it?" award
New York City school districts have been relying on Kaplan to "help" students get through standardized tests, according to Newsday. This is one of the few times that I feel the anti-testing crowd has a point and should be making their voices heard. The decision to use a test prep company undermines the reasons for implementing K-12 testing programs, and I'd be angry too if my tax dollars went to fund that.
K-12 Testing in the news
Here's a quick roundup of some local news about testing in various states.
Santa Rosa County, FL students a standout (Pensacola News Journal)
Nevada's standardized testing is undergoing changes (Reno Gazette-Journal)
A similar report on Pennsylvania schools (Pittsburgh Live)
FCAT scores are released (News-Press.com, SW Florida)
How the Hoosier students are doing (and how much it costs) (Indianapolis Star)
Big trouble in little Greenville on South Carolina's PACT (Greenville Online)
Uh-oh - more trouble in Texas (News 8 Austin)
SAT redesign and California's desperation
Mark Goldblatt reports on the changing SAT in the Jewish World Review. CNN had their version of the same story back in March. Mark's version is much more interesting and informative, because he goes into detail about criticisms of the test and alleged reasons for the change, whereas CNN glosses over the controversy and presents only one side of the story by including a tired, obligatory quote from FairTest at the end.
The JWR article mentions the three charges most commonly leveled against the current SAT, and I'd like to supplement the response.
"1) The exam is culturally biased against minorities...[Mark] except that Asian students consistently outscore white students."
Once again, those who don't agree with the test results feel free to use the psychometrically-precise term "bias" to mean whatever they want it to mean (I'm speaking here of the criticisms, not Mark's reply). Let me explain this one more time - a group mean difference on a test is not proof that the test is biased. It is only proof that one group, for whatever reason, is performing worse on average than other groups. See my post on 5/20 for a discussion of reasons.
"2) The availability of SAT preparatory courses skews the scores of students from families affluent enough to afford them...[Mark] except that, according to the latest evidence, the average gains of students who take a prep course are 6-12 points on the verbal section and 13-26 points on the math section — out of a possible 800 — and that comparable gains can be achieved simply by taking the test twice."
I wish I had the citation for that evidence, because it does agree with the data that the testing organizations collect, rather than the data the test preparation companies trumpet. I'm a capitalist, so I don't begrudge the test prep companies their desire to make a buck. However, I'm in awe of those who have the nerve to take overinflated test prep score-gain claims at face value and then present that as "legitimate" evidence that the tests, and not the test prep processes, are somehow biased.
" 3) The exam measures only the ability to take the exam and doesn't accurately forecast future success in higher education...[Mark] Now in a limited sense, this is true. Survey after survey has shown only a slight correlation between SAT scores and college grade-point averages or graduation rates. There is, however, a vast logical leap from acknowledging that the SAT does not predict grades or graduation rates to concluding that it predicts nothing except the ability to take the exam."
Thank you, Mark Goldblatt, for directly stating what so many test critics willingly miss, which is that just because a test is not perfect does not mean it is not useful. We have a saying in psychometrics - "All models are wrong, but some are useful", which translates here to, "All tests are imperfect, but some are useful". Mark even goes on to explain the restriction-of-range phenomenon with correlations (although he doesn't use that exact term), which is another thing I rarely see in all the testing debate in the media. Great job.
Criterion-referenced tests now a requirement?
The NYTimes columnist Richard Rothstein informs us that a group of Democratic senators are "insisting that the new federal education law be interpreted as requiring the use of criteria-referenced tests." Those of you who have visited my site before know that I have posted a few times explaining the difference between norm-referenced and criterion-referenced tests. I won't explain it again here, because Mr. Rothstein does a good job of it.
I can't say I'm as impressed with the rest of what he says, though. He's basically stating an argument for not using criterion-referenced tests because of their supposed "flawed proficiency definitions and limited ability to detect progress." However, much of what he has to say that is critical of criterion-referenced tests has nothing to do with the inherent design of such tests. Yes, administrators can fudge the categories afterwards (these are the complaints I'm hearing from parents in Texas). Yes, categorizing students is a much less precise way of assessing them than giving percentile rankings. Yes, if test norms are held constant from year to year, a norm-referenced scale can help in assessing student progress (this is why we can compare current SAT performance to previous performance, at least back until the scale was tweaked).
However, the only requirement with criterion-referenced tests is that the test-takers are judged by how well they perform against a set of standards. The categories can be set as broadly or as finely as the administrators wish. He gives an example that supposedly defies common sense: " [Suppose] 'proficient' scores are achieved by 40 percent of third and fifth graders alike but by only 30 percent of fourth graders. If the criteria were valid, such results would suggest (implausibly) that fourth-grade teachers were awful but fifth-grade teachers terrific." It could also suggest that the fourth-grade standards are set too high, either erroneously or just because fourth grade is a hard grade. It could suggest that the fourth-grade teachers are using a method that is not as effective. Nothing in that scenario is implausible, much less in defiance of "common sense".
I can understand his wish to remove any disadvantages or negative ramifications suffered by kids who improve from year to year but never manage to make it into the "proficient" category. The problems, though, are with the way categories are set and the ramifications of placing kids in categories, not the criterion-based nature of the test itself.
The Duchess of Dumb
Jill Stewart gets things off her chest in today's New Times (Los Angeles). Ms. Stewart isn't too fond of state assemblywoman Jackie Goldberg, and after reading today's article I can understand why. Her commentary on Jackie's attitudes toward testing is priceless, and I have to quote it at length:
"Goldberg is livid about the sustained gains in literacy and student achievement, reported all over California, following a return two years ago to stricter academics. She despises the traditional teaching of reading and math, enforcement of higher academic expectations, tests that compare student achievement at similar schools to each other, cash awards made to the most improved bad schools, and many other great reforms. The plain fact is, Goldberg cannot accept that these big academic improvements, shown by test scores at many of the worst schools, are disproving her own unbending beliefs about what causes student failure. You see, Goldberg is a former L.A. principal who put student "self-esteem" far ahead of academics"
Ah, one of those. I'm constantly bemused by the people who feel that self-esteem must be directly and automatically imparted, rather than earned through performance or mastery of skills, and that any form of challenge, such as a difficult standardized test, is inherently damaging to self-esteem. Such thinking runs counter to a great deal of psychological and educational theory, not to mention good old-fashioned horse-sense.
"Goldberg's bills would give teacher unions power in choosing curricula, ban crucial literacy tests in second grade and put teachers in charge of creating the tests that measure teachers."
I'm waiting for the day that someone suggests allowing students to be in charge of creating the tests that measure students. Hey, it's based on the same logic.
"Ineffective teachers, horrible teachers, good teachers in bad situations, burned out teachers who mean well -- all are equally protected from negative consequences. Indeed, the California Teacher's Association and other teacher lobbying groups made sure that we, the public, can never find out how any particular teacher is doing in addressing the illiteracy or other academic troubles of his or her students. Few people realize that test results by classroom are not public information in California. "
Indeed, I didn't realize that. I don't know how many states do release test results at the classroom level, though; that's something I'd have to research. Ms. Stewart doesn't specify which California tests she's referring to here, but I assume she means the STAR.
The Confidence Man gets off easy (no pun intended)
And what was the Con Man amusing himself with while I was slogging through the Texas testing compost heap? He was reading about pornographic pictures being displayed during standardized tests, that's what he was doing. Some guys really know how to live.
More news from the Lone Star front line...
Man, the concerned parents from Texas are coming out of the woodwork! I'm getting a real education on the TAAS and the difficulties parents are having in dealing with those test scores. My most recent email comes from Susan Sarhady, the president of the Plano Parental Rights Council. The website is a great repository of TAAS-related articles, news about effective teaching strategies, and plans for giving parents more say in the educational process.
Susan's email is related to my previous post about the extreme fluctuation in the TAAS cut score. According to her sources, the TAAS administrators claim that the test shouldn't fluctuate very much from admin to admin, but somehow they ended up with a test so much more difficult that the cut scores moved by a huge amount. These cut scores were not released until after the test was administered and scored, which leaves parents with no way to tell if the test was more difficult, or if (as they suspect) the cut scores were tweaked so as to preserve the illusion that Texas schoolchildren are doing great on the TAAS, as described in this press release.
Also, the new test that replaces the TAAS is called the Texas Assessment of Knowledge and Skills (TAKS), which may or may not better measure the Texas Essential Knowledge and Skills (TEKS) standards. The transition from the TAAS to the TAKS is critically examined in this report (requires Adobe Acrobat) by the Texas Public Policy Foundation. The report is 65 pages, and after giving it a thorough reading, my head is spinning from the TAAS - TAKS - TEKS discussion. I'll just list a few highlights.
First off, one ostensible reason for the change is that the TAAS is no longer challenging enough to measure the higher levels of academic achievement in schools, which every parent who's emailed me thinks is absolute hooey. Second, the stakes are now higher - social promotion in Texas is going to end, and federal funding will be tied to progress on the state assessments. The new test, the TAKS, doesn't seem to be based on any new assessment objectives and promises to be similar (depressingly so, according to this report) to the TAAS. The Texas Education Agency's claim that the TAKS will be more rigorous is met with skepticism. For some constructs, TAAS scores and NAEP scores rose, but on others, NAEP scores declined even when TAAS scores rose. The difference between norm-referenced and criterion-referenced tests is discussed so as to make it clear that the TAAS items are criterion-referenced (mastery depends on knowledge of standards) but the passing standard and scores are norm-referenced (test takers are compared to one another). The new TAKS tests will seemingly continue to measure students at the TAAS level, which appears to be below grade level as compared to the national standards. Performance on the TAAS may be more related to familiarity with the test format than to knowledge of the skills. The test development procedure currently in place does not give enough weight to the TEKS so as to produce a meaningful assessment of the standards. And so on.
The authors' closing plea? "Nothing is more important than getting TAKS tests right..[...]..No less than the future of Texas is at stake."
A lively bunch
If you're a concerned parent with a computer, some time to kill, and an educational issue you're dying to discuss, get on over to the bulletin board at the Education News website. The talk on the board ranges from testing issues to zero tolerance war stories to teacher qualifications to just ranting about kids spouting spectacular examples of disinformation, courtesy of the public school system. It's entertaining, informative, and never boring.
Homeschooled bee finalists are all the buzz
Yet another story about the superior spelling skills of homeschooled kids.
Rational ranting
R. Lee Wright goes on a rampage in the Rational Review about the current state of public schooling. I wouldn't want to be the principal who tries to justify, to R. Lee, the decision to ignore the North Carolina standards and to make countless "exceptions" for the related standardized tests.
The flip side
John Leo has a column in Townhall today about the flip side of Michigan's anti-Jewish (by way of being pro-minority) affirmative action - Vanderbilt is aggressively recruiting Jewish students, ostensibly for their higher SAT scores and rigorous study habits. Are "positive" stereotypes as dangerous to justice as negative ones? John thinks so. The part of the story that startled me was this comment: "The College Board has fueled the new market in religious identity groups by asking college-bound test-takers to list their faith." I can only hope that such a question is optional on the SAT. Even if the question was included solely for demographic research on the part of the College Board, it's inevitable that the inclusion of such information can be used to affect admissions.
What? A new education blog?
Oh, yes, right after I just said that I thought my posts would taper off, I discovered (via lovable Aussie bulldog Tim Blair) a new education-related blog by Jeff Sackman - The Confidence Man. His byword? "Reliable information". Oh yeah? Let's see your KR-20 value. (Sorry, psychometric geek joke there.)
Anyway, he found some stories that I missed, so I'm going to poach links off him and provide some additional commentary on a couple of noteworthy stories (not that you shouldn't go read his entire site immediately).
1. I missed the Kohn editorial in last Friday's USA Today that bewails the tragedy that poor and minority students are more likely to suffer the "sterile" and "stigmatizing" environment of summer school, that "summer prison" (gee, no hyperbole there) that forces kids to learn the educational basics in order to improve their test scores. If, as Kohn says, "Research overwhelmingly demonstrates that the worst thing you can do for struggling students is hold them back a year", then what option other than summer school is there? I notice that he doesn't mention any overwhelming amounts of research to support his claim that tougher school standards reduce the quality of education, or that kids who haven't yet gotten the basic skills down pat could somehow benefit from the removal of standardized tests. Yes, that's his suggested solution to the summer school problem. Funny how those little multiple-choice items get blamed for everything.
2. Up to now, I've avoided posting about the recent federal appeals court ruling that the U. of Michigan law school can give preferential treatment to minorities (link to story at NYTimes here, free registration required), mainly because it didn't seem to be a testing issue: the law school doesn't object to using the LSAT, it just feels it should be able to disregard it in favor of "diversity". My postings tend to focus on the validity of test score use, but the fact is that if you want an incoming law class to be determined by race rather than LSAT scores, that doesn't mean the test is being used or interpreted in an invalid fashion. However, I should have at least provided a link to the story, because it IS a big story in terms of affirmative action and the ongoing "diversity" debate.
I have to comment here on Jeff's posting,
"Also enormous is 11 points [out of 180] on the LSAT [that Michigan decided race was worth when determining admissions]. The test is a fair judge of one's ability to apply comprehension and critical thinking skills. I've looked at every single LSAT ever published, and there's nothing in there that I can see that unfairly favors one group over another."
Now I realize that this IS a test validity issue. Jeff is the first non-psychometrician I've seen who's written anything like this, and I'm glad he did, for two reasons. First of all, the theory that current large-scale standardized tests contain questions that are biased against any subgroup (based on race or sex) is the biggest urban legend that anti-testing agitators circulate. I know from experience that these tests go through an enormous amount of pre-testing and review, and every test question is examined carefully for signs of differential item functioning, or DIF. I'm not going to explain DIF in great detail here, but the gist of it is that psychometricians consider a question to be biased when members of different subgroups who are performing at the same level score differently on that item. If, during pre-testing, high-scoring people in Group A have a lot of trouble with a question that high-scoring people in Group B do not, that item does not go on the test. Period.
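Since I'm not explaining DIF in detail, here is a rough sketch of the idea in code. It uses the Mantel-Haenszel common odds ratio, which is one widely used DIF statistic; I'm not claiming it is the exact procedure any particular testing program follows. All of the counts are invented for illustration. Test takers are first matched into strata by total score, and only then are the two groups compared on the single item.

```python
def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio for one item across matched ability strata.

    Each stratum is a tuple (a, b, c, d):
      a = Group B correct,  b = Group B incorrect,
      c = Group A correct,  d = Group A incorrect.
    A value near 1.0 means matched test takers from both groups do about
    equally well on the item; a value well above 1.0 means the item
    favors Group B even after matching on overall performance.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


# Hypothetical item 1: within every score band, both groups answer correctly
# at the same rate (40%, 70%, 90%). Group B simply has more high scorers.
fair_item = [
    (40, 60, 80, 120),   # low scorers
    (70, 30, 70, 30),    # middle scorers
    (180, 20, 45, 5),    # high scorers
]

# Hypothetical item 2: same as above, except high-scoring Group A members
# get the item right only 60% of the time, versus 90% for Group B.
flagged_item = [
    (40, 60, 80, 120),
    (70, 30, 70, 30),
    (180, 20, 30, 20),
]

alpha_fair = mantel_haenszel_odds_ratio(fair_item)
alpha_flagged = mantel_haenszel_odds_ratio(flagged_item)

# Overall percent correct on item 1, ignoring the matching.
b_overall = sum(s[0] for s in fair_item) / sum(s[0] + s[1] for s in fair_item)
a_overall = sum(s[2] for s in fair_item) / sum(s[2] + s[3] for s in fair_item)

print(f"item 1 odds ratio: {alpha_fair:.2f}")
print(f"item 2 odds ratio: {alpha_flagged:.2f}")
print(f"item 1 overall % correct: Group B {b_overall:.1%}, Group A {a_overall:.1%}")
```

Notice that on the first item Group B still has a higher overall percent correct than Group A, yet the matched comparison shows no differential functioning at all. A raw group difference and a biased item are two separate things.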
However, a great deal of controversy stems from the fact that anti-testing activists consider overall group differences to be evidence of bias, when this is not the case. A group mean difference in test scores that favors Group B over Group A is not proof that the test is biased against Group A. More likely, it means that, on average, Group A hasn't yet learned the skills necessary for the test, or that Group A is not taking the test as seriously as Group B, or that Group A has not benefited from enough test preparation, or that Group A suffers from stereotype threat (see Claude Steele's work). Many people who are knowledgeable in educational testing see affirmative action as necessary to counteract racism and negative stereotyping, but on the other hand, many educators and psychometricians view Michigan's affirmative action as biased against Caucasians and detrimental to African-Americans who are admitted after displaying lower levels of competency on the analytical skills measured by the LSAT.
This is related to my second point, a more emotional one for me, which is that those who spread the rumor that such tests are biased really get my goat. The underlying assumption is that test developers and psychometricians are all Caucasian (false) men (extremely false) who deliberately include biased test questions (utterly false) in order to keep minorities from advancing (utter rubbish).
I can sympathize with the desire to give an extra push to students who may not test well, due to stereotype threat or lack of preparation. I draw the line, however, at the suggestion that the LSAT is so inherently biased that certain candidates should be awarded 11 LSAT points just because of skin color. There is no evidence to suggest that this is a valid interpretation of LSAT scores.
More on the "abysmal" history scores...
Joanne Jacobs has more to say than I did about the need for better teaching of history in today's schools (scroll down to "End of teaching history" to read). I especially like it that she made the point that the sudden craze of "multicultural" education can leave kids without a common base of American historical knowledge.
Summertime and the livin' is easy....
I figure now that school's out, the testing news will be pretty mild until August or so. I'll be surfing the web as always, but I may be posting more older stuff that is of interest, rather than breaking news. No matter what, I'll try to keep it interesting (as interesting as standardized testing can be, that is).
News from the Lone Star state
Alert reader Dudley Crawford sent me the link for this article, ostensibly about the lowering of standards on the TAAS. However, it seems that the reason students now have to answer fewer questions correctly in order to pass is that the test is harder, not that the standards have been lowered.
I sometimes encounter potential test-takers for the standardized test that my company produces, and they often ask, "So, do you set the curve [their term, not ours] for the test before or after you give it?" The answer is "Both", meaning that we do our best to assemble tests that are all of the same difficulty level, but we adjust the conversion of number-right to final score afterwards to take into account any fluctuations of test difficulty. Therefore, if we were to give a test this year that was more difficult than last year's test, we would adjust the conversion so that you could get the same score with fewer items correct.
That much said, it's not a good idea to just toss off a more difficult test and assume you can correct it afterwards, because the impact on the test taker is different. I hate to use the phrase "It's not fair" here, because I hear that bogus, unsupported complaint so often in regard to testing, but it really isn't fair to give tests so different in difficulty that students can answer 10 fewer right out of 56 in order to pass. Yes, the process of equating is standard for these kinds of tests, but 10 points is a huge difference on a 56-item test. A more difficult test can be more nerve-wracking to the test taker, especially if they're expecting a test similar to those they've seen in the past. The test I work on has almost twice as many items, and our number-right conversions don't fluctuate by more than three or four points. A difference of over 17% of the items is an indication that the test is not well-assembled - perhaps there were not enough low-difficulty items in the item pool. In that situation, I would do some investigation to make sure that the more difficult test is in fact measuring the same thing as the previous tests.
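To make the equating adjustment concrete, here is a minimal sketch of one textbook approach, linear equating, which lines up the means and standard deviations of two test forms. This is not necessarily the method used for the TAAS or for any operational test. The score lists, the passing score of 40, and the function name `linear_equate` are all hypothetical, and the sketch assumes the two groups of test takers are comparable in ability.

```python
from statistics import mean, pstdev

def linear_equate(raw_new, new_form_scores, old_form_scores):
    """Map a number-right score on the new form onto the old form's scale
    by matching the two forms' means and standard deviations."""
    mu_new, sd_new = mean(new_form_scores), pstdev(new_form_scores)
    mu_old, sd_old = mean(old_form_scores), pstdev(old_form_scores)
    return mu_old + (sd_old / sd_new) * (raw_new - mu_new)

# Hypothetical number-right scores on a 56-item test (illustration only).
old_form = [30, 35, 40, 45, 50]   # last year's, easier form
new_form = [20, 25, 30, 35, 40]   # this year's, harder form

old_cut = 40  # hypothetical passing score on the old form

# Lowest number-right score on the new form that equates to the old cut.
new_cut = next(
    x for x in range(0, 57)
    if linear_equate(x, new_form, old_form) >= old_cut
)

print(new_cut)                   # 30: ten fewer items correct now passes
print(round(10 / 56 * 100, 1))   # 17.9: a 10-item shift as a percent of 56 items
```

In this made-up example the harder form's scores run ten points lower across the board, so the passing score drops by ten items while the standard stays the same. The last line is just the arithmetic behind the "over 17%" figure.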
Also surprising - the TAAS is being phased out next year in order to make way for a new test. First I've heard of it (although I don't have any psychometric informants in Texas right now).
A failure to teach
North Carolina recently released results from the end-of-grade tests for third through eighth graders, and the results are shocking. Selected quotes:
"...it is easy to see that numbers of blacks, Hispanics, and low-income students are failing to learn basic skills"
"Twenty-three districts in North Carolina taught less than 50 percent of their young black males in grades three through eight to read at a basic level."
The problem isn't just within North Carolina:
"while overall scores have increased in reading and mathematics, the differences in scores for black and white students in virtually every NAEP subject area and for every age group are greater than they were in the late 1980s. Perhaps even more disturbing, these gaps seem to be getting wider each year."
Unsurprisingly, one of the suggested solutions to this problem is a set of multicultural lesson plans that stresses the importance of understanding the "differences and many aspects of multicultural education", although there are no studies listed in this report to suggest that such lessons would improve things. Meanwhile, the schools that have succeeded in raising the scores of minority students stress "solid academics, strong community support, and spending priorities", but mention of these schools, and these concepts, is buried near the very end of the article. Politics as usual, I'm afraid.
Sizing up test scores
I just discovered Education Next, which is part of the Hoover Institution and contains lots of nifty articles that are models of clarity, patience, and non-ideological discussion. I particularly enjoyed "Sizing up test scores", by Dale Ballou. This article takes a calm, rational look at the use of test scores for accountability, and provides a good definition of psychometrics and item response theory to boot. I don't think I know Dr. Ballou (I'm terrible with names), but after reading this article I plan to drop him a line. Oh, and Don McAdams' "Enemy of the Good" gives a thorough explanation of the imperfection of standardized tests without once lapsing into hyperbole. Great stuff.
Blooooooooooger...
...is moving verrrry slow lately. I couldn't log in for most of yesterday and today (luckily, it's a slow news day in the educational testing world). More posting to come, I promise.
A view from the "10th Circle of Hell"
Amanda Bowen, a 10th-grader in a Texas high school, airs her views today about standardized testing (especially the TAAS) and the failure of testing to motivate students. Happy as I am to see someone reporting from "the front lines", it was a shame to see her fall into the trap, common to many standardized testing critics, of both damning and relying on standardized test scores. I'll explain this trap with three quotes from the article, in order:
"The [TAAS] is bureaucratically equivalent to any other sort of standardized test in any state, but when compared in difficulty to standardized tests in my home state of New York, it falls laughably short of the mark." So far so good - sounds like she's getting ready to argue that the TAAS isn't difficult enough to be a useful test.
"The solution [to student shortcomings]? First and foremost, abolish standardized testing. The results have nothing at all to do with an accurate measure of student intelligence, anyway." Oh really? Would you like to provide some data to support that argument? Didn't think so.
"All reports show that those students who are home schooled are more interested in learning. They score consistently higher on the SAT. They are well adjusted and contribute to society. All this without BigBro looking over their shoulders." This was presented as her argument for getting rid of school boards. I thought we were supposed to abolish standardized testing, Ms. Bowen. The SAT is a standardized test. Do you mean to say that the SAT is actually "an accurate measure of student intelligence"?
This trap is a common one that I see from standardized testing critics. On the one hand, they blithely advocate eliminating standardized tests, without providing supportive data or suggesting alternatives that would provide accountability. On the other hand, they don't hesitate to present standardized test scores to support their favorite educational reforms.
This trap can be avoided fairly easily. While I don't agree with Ms. Bowen that standardized testing should be eliminated, if that's what she believes, she could certainly make her argument for that more consistent (and more powerful) by doing some research on the web to back her claim, and then by not relying on standardized test scores in order to support a related argument. That much said, kudos to Ms. Bowen for writing the article and entering the debate about standardized testing. I'll be emailing her today, and if she's interested perhaps she'll agree to post a line or two on this site.
And, heeeeere's Billy!
Today's a slow newsday, testing-wise, so I'm stealing Sine Qua Non's suggested list of topics for Bill Clinton's talk show:
Children, and the Republicans Who Starve Them
Republicans: Throwing the Elderly On the Street For Fun and Profit
Sexual Non-Relations
A Day as President, Blow by Blow
Reading “Is” Fundamental, Or Should That Be “Are” Fundamental?
Maintaining Your Political Viability At All Costs
Newts and Other Reptiles
Why I Would Handle 9/11 Better Than Bush
The Spin Cycle
Sycophants and Lackeys: Repulsive or Required?
Interns: Cheap Labor or Perquisites of Power?
Pardon Me
Why Winning With 43% is a Mandate, But Winning With 49% Ain’t
Go read the whole thing, in which he slices and dices Richard Cohen of the Washington Post as well. It's great.
Don't know much about history
The Washington Post reports the latest National Assessment of Educational Progress (NAEP) scores in history, and they ain't pretty. The scores of high school seniors are described as "abysmal", a word that doesn't leave much room for alternate interpretation. One fascinating point is that, "the NAEP results showed no significant difference in test performance between students whose teachers reported adhering closely to content standards, and those who had none to follow." The reason for this is not immediately apparent. The content standards may not be as rigorous as the NAEP standards. The teachers may not be teaching the standard material very effectively. Either way, I'd like to see follow-up on this topic.
More on single-sex schooling
Andrew Sullivan leads off with a Washington Post article on Bush's decision to encourage single-sex public schools. NOW and the ACLU are complaining, of course; their fear of "separate-but-equal" schooling is overriding their capability to acknowledge that, unlike in the past, these schools will not be segregated for racist reasons, but instead to provide the best possible education and discipline for both sexes. They also don't seem to notice that schools that have already experimented with single-sex classes report great results, but then again, I wouldn't expect NOW to rejoice at improved test scores for boys. More power to Bush for encouraging this.
Wish I could think of something witty to accompany this article, but all I can say is, I would never have thought to examine the relationship of that variable with IQ scores.
Deep thinker Stephen den Beste mentions taking the SAT a few years back, in a discussion on the legality of minority-only scholarships.
Another good article from the Education Policy Analysis Archives - this one displays the high test scores and academic achievements of home-schooled children. The authors note that because this is not a controlled study, the results don't demonstrate anything other than that home-schooled kids are doing well in their educational environments.
As if standardized tests weren't stressful enough...
Now the schools want to know whether a student thinks it's right to beat up someone or to make fun of another student, and if they are comfortable talking to teachers of a different color. These are questions from a survey given alongside standardized tests in Washington State. The original form of the survey required the students to put their names and identification numbers on these surveys. The ACLU, not surprisingly, has become involved.
"I'm here! I'm stupid! Get used to it!"
Mark Goldblatt proudly identifies himself as one of The Stupid Ones and asks, where's the profanity on the SAT?
Gym class is less stressful now, too
John Miller of the National Review reports that the Department of Education has cleared the way for more single-sex schools, as long as the plans for such schools are couched in the language of affirmative action, rather than parental choice. Administrators at Thurgood Marshall Elementary School in Seattle have already experimented with separate classes for boys and girls, because so many of their boys were being suspended for behavioral issues. They've discovered that not only did the boys' behavior problems decrease, their test scores increased. The girls, who were doing just fine on tests to begin with, continued to do so. All of the classes at this school are now single-sex, but under the current civil rights laws, once the problems (suspensions) have been corrected, the school may have to return to coed classrooms, which may have helped cause the problems in the first place.
A protest in writing
The Wall Street Journal runs a regular "Zero-Tolerance" watch on its Best of the Web page that reports the idiocies of such policies in public schools. Recently, they noted that a school in Greensboro, NC, tried to suspend a student for writing a sassy answer to a standardized test writing prompt. Long story short - the student did not understand the meaning of the prompt, and wrote an essay criticizing the choice of prompt. The administrators are bound by a code of ethics that states "school employees may not look at test questions or answers before or after exams." On the other hand, NC's "Department of Public Instruction said principals should take action against students who protest or boycott state tests." So what's a principal to do? Here's my take on it - the school's decisions to read an essay response instead of shipping the test off to the central office, and to have a sheriff's deputy sit in on the suspension meeting, do indeed fall into the category of "zero-tolerance" nonsense. A critical test answer doesn't count as a protest that merits "action".
The SAT comes full circle
Heather MacDonald (one of my favorite journalists) has a fantastic article about the SAT in today's City Journal. Kudos to her for pointing out that the SAT, as an aptitude test, was originally viewed as the meritocratic alternative to content-based tests, designed to give students from the public schools a chance to compete with the private school crowd. Ms. MacDonald does not shy away from blunt discussion of controversial topics, as can be seen by her closing paragraph, below.
"Expect the race industry to resurrect the same arguments against content testing as were used in the 1940s, but without proposing aptitude tests in its place. There is no reason to think that the test score gap will go away with a different test, since the explanation for it lies largely in a culture that devalues academic achievement. So after spending millions on developing a new test, the education profession will be left with its old options: shooting the messenger by blaming the test for differential academic outcomes, or finally telling the truth about the cultural changes needed to overcome lagging academic achievement. The sky will fall before the latter option comes to pass, so get ready for another decade of covert racial preferences and explicit excuse-making around the new SAT."
The plot thickens....
More on the Massachusetts test controversy. Go here to read a critique of the Massachusetts Comprehensive Assessment System (MCAS). This was posted today on the Education Policy Analysis Archives, which is a peer-reviewed scholarly journal on the internet. This critique was posted on a mailing list to which I belong. I will offer one quick caveat before I start - well, two caveats, which are that I gave it only the quickest reading, since I'm short on time today, and I wasn't able to double-check the sources that were cited, so I'm going to have to take Walt Haney's interpretation of the data at face value.
That much said, I think this is a very informative and interesting article, and I hope that people other than psychometricians will find it and read it. Dr. Haney makes a very common-sense (yet data-supported) argument that giving large cash incentives to schools on the basis of one year's worth of average test score changes is statistically unsound and unfair.
Also, I stated in an earlier post (scroll down for it) that the MCAS is criterion-referenced (test-takers are compared to a standard), and I got that from the official MCAS website. However, according to Dr. Haney, the test was developed in a norm-referenced fashion (test-takers are compared to each other), because items that were likely to have been answered correctly by a majority of test-takers during pilot tests were excluded during operational test construction. This process of test development is par for the course for norm-referenced tests such as the SAT, on which items that everyone gets right or everyone gets wrong would be useless for ranking the test-takers. Dr. Haney argues that the norm-referenced quality of the MCAS would result in a test that is not a very good or useful indicator of school quality, yet that is what it is being used for. Given that the norm- or criterion-referenced quality is usually a cut-and-dried aspect of a test, it's also not a good sign that there exists any ambiguity about it.
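The item-screening step described above can be sketched in a few lines. This is a minimal illustration of the general norm-referenced practice of dropping items that nearly everyone gets right or nearly everyone gets wrong, not a reconstruction of how the MCAS was actually assembled. The pilot data, the 0.2 and 0.8 cutoffs, and the function names are hypothetical.

```python
def item_p_values(responses):
    """Proportion of examinees answering each item correctly.

    `responses` is a list with one entry per examinee; each entry is a
    list of 0/1 item scores in the same item order.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [
        sum(person[i] for person in responses) / n_examinees
        for i in range(n_items)
    ]

def keep_items(p_values, low=0.2, high=0.8):
    """Indices of items whose difficulty falls inside the (assumed) band."""
    return [i for i, p in enumerate(p_values) if low <= p <= high]

# Hypothetical pilot data: 5 examinees by 4 items (1 = correct).
pilot = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]

p = item_p_values(pilot)
print(p)              # [1.0, 0.6, 0.4, 0.0]
print(keep_items(p))  # [1, 2]
```

Item 0 (everyone right) and item 3 (everyone wrong) say nothing about how test-takers rank against each other, so a norm-referenced test drops them. A criterion-referenced test might well keep an item that everyone answers correctly, if it covers a skill the standard requires.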
The author's main point is that test scores should never be used in isolation to rate schools (when cash or other incentives are on the line) or to make high-stakes judgments about test-takers, and I agree completely (as does the American Educational Research Association).
Two more provocative points in the essay:
The author cites recent research that allegedly shows how "low-tech" writing tests - those requiring children to write in longhand - may underestimate the writing skills of children used to writing on the computer. In my work, I am heavily involved in the development of computerized tests and diagnostic software, and if paper tests do indeed underestimate computer-literate kids, that finding is crucial, and that's research I'd like to follow up on.
Finally, the author takes a potshot at the "widespread" errors in the testing industry. While the errors have been hashed and re-hashed in the media, I've yet to see evidence that this industry makes enough serious errors so as to warrant concern about accuracy. Certainly, I don't think there are more errors in this field than in any other research or statistically-oriented field. There are plenty of reasons not to use one test score, in isolation, to make a high-stakes decision, but I don't think that the quality of the testing organizations is a factor that needs to be weighted heavily when considering how to best use scores.
Check out the test scores for yourself
Follow the links to see your state's test scores online (some sites require Adobe Acrobat):
Alabama
Alaska
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
Florida
Georgia
Illinois
Hawai'i
Idaho
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
West Virginia
Wisconsin
Wyoming
Email me if you know of a site that has more information than the ones I've provided. Thanks.
News from the Heartland
The Heartland Institute's School Reform News, that is. They have a good article this month on preserving the standards that references the "Reality Check 2002" data I linked to previously. I haven't seen it mentioned in any of the mainstream media that, "Teachers express slightly more concern with standards-based reform, but 'are largely untroubled by testing's impact in their own classroom,' ", have you?
Hit and run
Debra J. Saunders gives more evidence about liberal media bias, and makes a passing reference to testing, in the provocative and informative Jewish World Review. Glad to see that I'm not the only one who noticed that the media almost never runs stories about the support for testing among teachers and parents.
Girls do just fine on math tests
Another bit of good news - the "common knowledge" that boys do better in math may be more myth than reality. The Contrarian describes a study in which researchers from my alma mater examined the scores of 14,000 students in North Carolina (which has a very extensive statewide assessment system in place). I'll just quote the main paragraph:
"From this broad sample they found that girls scored higher, on average, in math than boys until about age 11, and girls achieved higher reasoning scores at ages 11 to 13. It turns out, however, that things are pretty even overall. By the end of high school, boys held an edge of 1.5 percent over girls, a figure of scant significance that surprised Leahey and Guo, who were expecting big differences."
Homeschooling and Feminism
The incisive Wendy McElroy has a nifty article today on whether a feminist can homeschool her child. She cites a 1999 survey in which one of the most popular reasons for choosing to homeschool children was "dropping test scores". I'd be interested in knowing if, as the critics claim, there are parents who pull their children out of schools to avoid standardized testing altogether. I'm betting that's a much less prevalent concern than the media leads us to believe.
I stand corrected...
Hey, I just got my first email that said, "You screwed up". I'm glad to know there's someone out there reading this blog that closely.
My correspondent (who shall remain anonymous until he gives me permission to use his name) helpfully pointed out to me that I should have referred to the MCAS test as "standards-based" rather than "standardized", because the test is, in his words, "completely subjective, zero to many correct answers, subjectively scored by humans, scored against an arbitrary 'proficient' level deemed as the goal for performance 10 years from now". Good point, although it needs clarification, so now is a good time to discuss some definitions (taken from Educational Measurement and Testing by Wiersma & Jurs).
Standardized test: A test given under standard conditions so that the basis for interpreting the score extends beyond the site of examination and the group of examinees.
Norm-referenced test: A test on which the score is compared to those of other individuals and groups who took the same test. These tests tend to be more general and comprehensive, and scores are reported as position within a normative group.
Criterion-referenced test: A test that has an absolute interpretation that is referenced to a body of learning. These tests tend to be shorter and more focused on subskills than broad categories. Scores are percentage-correct, number-right, or some other mastery indicator. These are also called standards-based or mastery-based tests.
Most of the large-scale admissions tests, such as the SAT, GRE, etc., are norm-based standardized tests - the score is not interpreted as a mastery of certain subskills but is instead used to compare the test-taker to others on the same levels. This is why you will never see every test-taker scoring above the 50th percentile on these tests - even if everyone who took the SAT tomorrow made between a 700 and an 800, those students scoring below, say, a 725 would still be in one of the lower percentiles. On these tests, where the test-taker is in relation to the others is more important than the absolute score.
On the other hand, I believe that most statewide assessments are criterion-referenced, in which the important thing is to demonstrate that a test-taker has mastered a certain set of skills. On these tests, every test-taker can indeed make a high score, and what matters most is how high the absolute score is.
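The percentile point about norm-referenced tests can be illustrated with a quick calculation. This is a minimal sketch with a made-up group of eleven test-takers. It defines percentile rank as the percentage of the group scoring strictly below a given score, which is one common definition (operational testing programs may use a slightly different one). The function name `percentile_rank` is hypothetical.

```python
def percentile_rank(score, all_scores):
    """Percentage of test-takers who scored strictly below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)

# Hypothetical group in which everyone scores between 700 and 800.
scores = list(range(700, 801, 10))  # 700, 710, ..., 800 (11 test-takers)

print(round(percentile_rank(725, scores), 1))  # 27.3
print(round(percentile_rank(800, scores), 1))  # 90.9
```

Every score in this group is very high in absolute terms, yet a 725 still lands near the bottom of the distribution. On a criterion-referenced test, by contrast, all eleven could be reported as having met the standard.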
So, I downloaded the information about the MCAS from the Massachusetts Dept. of Education website. The MCAS is criterion-referenced, and the items measure what subskills students have learned in the areas of English, Math, Science, and History/Social Science. However, the test is still a standardized test, in that students take the same test, under the same conditions. What's more, the criticisms of the test that I addressed in my previous post (scroll down for it) were not about the standards, although I think that establishing standards is one of the most difficult and controversial parts of developing such tests. The criticisms seemed to be mainly about the standardization part - the claims are that teachers felt they had to teach the test material rather than what they wanted to teach, and that such tests did not engage students in learning. Those points were what I was debating, and those have nothing to do with the type of standardized test that it was. What's more, the fact that the MCAS has open-ended items that are scored by human raters does not affect the standardized nature of the test. Certainly, adding open-ended items and human raters can reduce the reliability of such scores, but the test itself remains standardized.
I'm grateful my correspondent pointed out to me the need to better define my terms. I stand by my earlier comments, though.
Harvard's scare session on standardized testing
I missed this when it first happened, back in March of this year. Frontline has come up with yet another anti-standardized testing documentary (a previous one purportedly reveals the "Secrets of the SAT" ). The new one criticizes the K-12 MCAS test used in Massachusetts, Virginia, and California.
I marvel at such broad criticisms as:
"Testing doesn't have the right effect on students" - Well, that's news to all those teachers who think that testing is a perfectly acceptable means by which to see if students actually learned anything.
"Those standardized tests don't really test anything" - This statement is so vague and meaningless that it is automatically ignored by anyone with even the most rudimentary knowledge of educational assessment.
"Students' poor performance may be caused by 'test anxiety,' suggesting a 'better test would factor in classroom work, student portfolios and teacher evaluations'" - All of which are extremely subjective, and what happens when a student encounters a teacher who doesn't like him or his work? For kids who like to run with scissors, or who don't play well with others, an objective assessment is a savior. And if teachers don't have time to teach the basic skills measured by these tests now, how will they have time to do in-depth portfolio reviews and evaluations in the future? By watering down such portfolios until they're meaningless, or by dramatically reducing the grading scale, I bet.
"If you're against standardized tests, then you should come up with a new plan" - Finally, a sensible comment.
I'm sure Frontline means well, but no testing critic has as yet come up with an alternative to standardized testing that doesn't result in more work for these over-burdened teachers and doesn't make educational assessment even more subjective than it already is. I've had it with testing critics who can't even be bothered to present a well-thought-out alternative. As an example, here's one paper on what those teachers who are seemingly unable to prepare students for standardized tests will be required to do when grading portfolios. Portfolio assessment can indeed be an ideal way to assess a complex holistic skill such as writing, but people blather on about implementing it with no hint as to how time-consuming such assessments are and how incredibly difficult it is to score them reliably. You think kids are stressed by multiple-choice tests now? Wait until they can't even get a straight answer from a teacher about why their collection of writing samples gets a poor grade.
[Update: For a different viewpoint on the effects of that very same Massachusetts test, check out what Diane Ravitch, a visiting fellow of the Hoover Institution, has to say. Another research fellow at the Hoover Institution makes the case for standardized testing in a brief and elegant manner. If you haven't visited the Hoover Institution website, go there now, because they have a great collection of education-related essays. I particularly like Caroline Hoxby's "Conversion of a Standardized Test Skeptic." Bravo. ]