There's a long, and interesting, article in today's NYT on the use of the FCAT for grade promotion in Florida. Blogger Nick has taken notice of the story as well, but his archives don't work, so you'll have to scroll down.
The story begins with the description of Derek, a "good student" who, like 23% of all young Floridians, failed the reading portion of the third-grade FCAT. Derek was determined to be promoted to fourth grade, and so attended a four-week summer reading camp financed by the state. The camp doesn't sound like much fun, and it wasn't. To get out of third grade, a camper had to score above the 51st percentile on the Stanford 9 exam. Only 15% of the camp-bound youngsters managed this, which suggests either that the camp is extremely ineffective at teaching Florida's youngsters or that their reading problems are more entrenched than anyone anticipated.
The end result is that the third-grade retention rate is going to be four or five times what it was a year ago. Derek is one of those who is going to be held back, because he scored at the 50th percentile - and here's where the controversy begins:
Derek missed by one question, scoring at the 50th percentile. His principal, Louise Brown, says he deserves to be promoted. "Derek's a late bloomer, just coming into his own — not everyone reads on the same time scale," Ms. Brown said. By scoring at the 50th percentile, Derek is reading better than half the nation's third graders. But according to the new state rules on retention, championed by Gov. Jeb Bush, the principal and the teacher have almost no say in promotion.
The standard error of measurement on the Stanford 9, developed by Harcourt Assessment, is 3.2 points, meaning Derek's score may reflect a reading ability above the 51st percentile...
As Nick points out, this also means that Derek could be reading below the 50th percentile. The reporter is correct to mention the SEM here, because that's a measure of variability for an individual student's scores, but we don't know how many percentile points 3.2 points translates to, and it seems a bit skeevy to base an argument on the SEM without pointing out that it cuts both ways.
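To make the "cuts both ways" point concrete, here is a rough sketch. Only the 3.2-point SEM comes from the article; the scale scores for Derek and for the cutscore are made-up placeholders, and treating the true score as normally distributed around the observed score is a simplifying assumption of mine, not anything Harcourt or the reporter states.

```python
from statistics import NormalDist

SEM = 3.2  # standard error of measurement reported in the article


def score_band(observed, sem=SEM):
    """Return the band one SEM below and one SEM above an observed score."""
    return (observed - sem, observed + sem)


def prob_true_at_or_above(observed, cut, sem=SEM):
    """Chance the true score is at or above the cut, under the
    simplifying assumption that it is normal around the observed score."""
    return 1.0 - NormalDist(mu=observed, sigma=sem).cdf(cut)


# Hypothetical scale scores: NOT from the article.
observed_score = 600.0
cut_score = 601.0

low, high = score_band(observed_score)
p_above = prob_true_at_or_above(observed_score, cut_score)
print(f"band: {low:.1f} to {high:.1f}")
print(f"P(true score at or above cut): {p_above:.2f}")
```

Under these assumptions the band extends just as far below the observed score as above it, and a student who lands under the cutscore has a less-than-even chance that his true score is over it.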
Derek's principal's argument that "not everyone reads on the same time scale" is actually an argument against promotion, not for it. The point that supporters of the FCAT are trying to make is that fourth grade is not for everyone of the same age; it is for everyone who can read at the fourth-grade level. It's possible - perhaps likely - that Derek is not one of those kids. Thus, fourth grade might not be where he is supposed to be right now, because he's not on the same "time scale" as everyone else.
For a reporter who's unafraid to mention the standard error of measurement, Michael Winerip seems awfully shy about describing the distribution of those who flunked by a wider margin than Derek did. Did most of Florida's flunkers hover around the 50%ile mark? Or was Derek chosen because he was the closest to the cutscore?
We do read that "hundreds" scored within the standard deviation for the passing score on the Stanford 9. The relationship between the standard deviation and the SEM for a test, in case you're wondering, is
SEM = s * (1-r)**(1/2),
which is keyboard notation for the standard deviation (s) times the square root of one minus the reliability of the test (r).
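The formula above can be sketched in a few lines of Python, along with its inverse, which is what we'd need to get from the reported SEM back to a standard deviation. The 3.2 is from the article; the reliability values are purely illustrative guesses of mine, since the article doesn't give one.

```python
import math


def sem_from_sd(sd, reliability):
    """SEM = s * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)


def sd_from_sem(sem, reliability):
    """Invert the formula: s = SEM / sqrt(1 - r)."""
    return sem / math.sqrt(1.0 - reliability)


REPORTED_SEM = 3.2  # from the article

# Hypothetical reliabilities: the article does not report one.
for r in (0.84, 0.90, 0.95):
    sd = sd_from_sem(REPORTED_SEM, r)
    print(f"r = {r:.2f} -> implied standard deviation = {sd:.1f}")
```

The more reliable the test, the larger the standard deviation implied by the same SEM, which is why the band can't be pinned down without knowing r.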
Without the reliability of the test, we can't really tell what the standard deviation is, so we're still a little bit in the dark about how wide the band is. If the number is in the high hundreds, it's not surprising that 71 were close to the cutpoint, because we'd expect most kids to be massed up around the middle of the bell-shaped score distribution. If it's in the low hundreds, that might be a different story. Tens of thousands of Florida's third-graders had the chance to take the Stanford 9 for promotion; depending on the number and the distribution, for "hundreds" to be within one standard deviation is expected.
My guess is that the distribution of the FCAT flunkers on the Stanford 9 was exactly as expected - most everyone was in the middle to the lower end of the curve, which is more likely to be positively skewed than bell-shaped. But by choosing to tell the story of one kid who is close to the 85th percentile of the flunkers, and close to the grade promotion cutscore, the reporter invites readers to imagine that most of Florida's students fit this description.
In Florida's push to get every child reading by third grade, politicians have ignored the scientific studies on retention, which overwhelmingly conclude that students held back suffer academically, dropping out at a higher rate.
Why doesn't the reporter cite any studies here? I'm not trying to be mean; I'm just not aware of this "overwhelming" evidence. I mean, if kids who get held back tend to drop out at a higher rate, that isn't proof that holding kids back causes them to drop out later. Instead, it could simply mean that the same factors that keep kids from achieving early on - lack of intelligence or concentration; emotional disturbances; undiagnosed learning disorders - keep them from achieving later on.
Principal Brown, in fact, contradicts the reporter's statement with her own:
Ms. Brown is not against testing. Her school has an A rating from the state, based largely on strong test scores. But she says she does not believe that tests should replace human judgment and says that just a couple of her third graders should be retained.
"A child will not read any better whether he's sitting in a third-grade or fourth-grade classroom," Ms. Brown said.
If I'm reading this correctly, Ms. Brown is saying that promoting a kid to fourth grade instead of retaining him won't help matters. Her statement is an argument against promoting poor readers, not for promoting them. It also seems to be a mighty pessimistic assessment of both third- and fourth-grade reading classrooms, doesn't it?
Those of you who read my comments section will notice that many of my readers have recently made logical statements about why third-graders should not be tested under stakes as high as this. I tend to agree with them. However, given that we currently have a culture (at least in Florida) in which third-graders are being held to these standards, it behooves us to examine the data accurately and see what it's telling us.
We can argue all day about whether to promote the kids who flunked, but I'd rather argue about why they flunked. What are the schools not doing that they should be doing? Are the test standards inconsistent with the classroom curriculum? Are kids of this age more likely to have incapacitating test anxiety, or are they perhaps unable to grasp the implications of not trying their best? This article could have addressed these questions, but instead it gave us one sob story, one partial-sob story, incomplete data for our conclusions, uncited "overwhelming" research, contradictory statements, and complaints about summer schooling.
Most profoundly, I find it astonishing that the article, which is about the reading portion of the FCAT, highlights the fact that many more third-graders will be held back this year, but doesn't invite its readers to wonder what reading skills the test might be measuring that teachers didn't catch in the past.
Posted by kswygert at July 23, 2003 03:36 PM