Howdy, folks! Starting on April 6, 2008, I will be putting up a "testing roundup" post every weekend at Joanne Jacobs' site. Swing on by and put in your two cents!
The first post is here.
UPDATE: And by the way, I have to point out this article in the NYTimes about psychometricians, which many of you probably saw two years ago. I just love that it came out on - yes - the day before my wedding day. Which is why I missed it at the time.
Is this good news or bad news for Indiana? You be the judge:
More than half of Indiana's seventh-graders passed the state's mandatory science exam, administered to that age group en masse for the first time last fall. Fifty-two percent of the state's 80,863 seventh-grade students passed the science assessment, a new section in the annual Indiana Statewide Testing for Educational Progress-Plus exam. That left 47 percent below the benchmark, said Mary Tiede Wilhelmus, spokeswoman for the Indiana Department of Education; 1 percent of the results were unscoreable. "We certainly know we need to do better, but this is a starting point," said Tiede Wilhelmus. Two areas that gave kids trouble, she said, were sections on the nature of science and technology -- which included scientific investigation -- and one focusing on the physical universe.
Indianapolis schools fared much worse, with only 18% passing the exam. A sample exam is here. It's pretty open-ended. I had to guess on the first one - I remembered the orbits as being circular rather than elliptical, and that is one of the correct answers.
One suburban Philly teacher leads tai chi classes to help prepare kids for the state standardized exams:
A Centennial teacher is utilizing the ancient martial art of Tai Chi Chuan to help his students get ready for the state's standardized mathematics and reading tests. “I want you to concentrate. Think about what you are doing. Breathe in and breathe out,” Joseph Pisacano, a fifth-grade teacher at Everett A. McDonald Elementary School in Warminster, said Thursday morning as he and the students made smooth, circular motions with their arms and hands.
The class went through the Tai Chi relaxation exercises as calm, soothing music played quietly in the background.
“Think about how wonderful you are going to do on the test and how easy it's going to be. Think about all that you have learned this year. Take all of that, and use it,” the teacher said as the class wrapped up the exercise session by sitting in a meditative position on the floor.
As long as he's making sure to actually teach the material along with some relaxation exercises, I'm all for it. This approach is worlds better than the chicken-little squawks you see from some educators who are so stressed out about testing that they end up stressing out students as well.
The idea of standardized testing in college is still gaining attention - most of it negative:
A parade of college presidents will appear before a federal higher-education commission meeting in Boston tomorrow, and early signs suggest it will be a lively, even contentious scene. Texas businessman Charles Miller, chairman of the commission appointed by Education Secretary Margaret Spellings, has made waves by suggesting that some kind of standardized testing would help measure whether college students are taught well. There is no formal proposal yet, and Miller has stressed in press interviews that there would be no single test for every school. Still, the idea has alarmed many educators. Susan Hockfield of MIT, who had lunch with the Globe editorial board last week, didn't mince words when asked about the testing notion. "I think it's a terrible idea," said Hockfield, who is scheduled to testify tomorrow. "Higher education needs help, but what is really broken is K-12 education. We need more high school graduates who can understand and do math."
In other words, there's a problem, but it can't be fixed in college. I agree with Dr. Hockfield, but I wonder if she knows how strong the opposition can be to "fixing" anything in the K-12 system:
At School Without Walls and two other high schools where I am a guest teacher -- Wilson High School in the District and Bethesda-Chevy Chase High School in lower Montgomery County -- I have never given a test. I respect my students too much to demean them with exercises in fake knowledge. Tests represent fear-based learning, the opposite of learning based on desire. Frightened and fretting with pre-test jitters, students stuff their minds with information they disgorge on exam sheets and sweat out the results. I know of no meaningful evidence that acing tests has anything to do with students' character development or whether their natural instincts for idealism or altruism are nurtured.
I have large amounts of evidence that tests promote the opposite: character defects.
Good luck convincing this teacher that basic reading and math skills can and should be assessed. He's much more concerned about "idealism" and "altruism." How far do those get one at MIT, I wonder? As Betsy's Page notes, this teacher teaches a class on peace. Most of his students love him, it seems - and wouldn't you, if testing were banned? - but not all:
At Bethesda-Chevy Chase, Peace Studies is taught by Colman McCarthy, a former Washington Post reporter and founder and president of the Center for Teaching Peace. Though the course is taught at seven other Montgomery County high schools, some say B-CC's is perhaps the most personal and ideological of the offerings because McCarthy makes no effort to disguise his opposition to war, violence and animal testing. Saraf and Avishek Panth, also 17, acknowledge that with the exception of one lecture they sat in on this month, most of what they know about the course has come from friends and acquaintances who have taken the class. But, they said, those discussions, coupled with research they have done on McCarthy's background, have convinced them that their school should not continue to offer Peace Studies unless significant changes are made. This is not an ideological debate, they said. Rather, what bothers them the most is that McCarthy offers students only one perspective.
Of course he does. This is a crusade for him. One that doesn't involve anything as nasty and dehumanizing as testing (and that's even funnier than "high comedy," according to one Devoted Reader who forwarded the link). After reading his WaPo diatribe about testing, it's hard to believe that he actually is "welcoming of conservative dissention" on any topic.
When the stakes are high, do the results make sense? Jay Greene, Marcus Winters, and Greg Forster address the issue:
Several objections have been raised against using standardized testing for accountability purposes. Most concerns about high-stakes testing revolve around the adverse incentives created by the tests. Some have worried that pressures to produce gains in test scores have led to poor test designs or questionable revisions in test designs that exaggerate student achievement... Others have written that instead of teaching generally useful skills, teachers are teaching skills that are unique only to a particular test... Still others have directly questioned the integrity of those administering and scoring the high-stakes tests, suggesting that cheating has produced much of the claimed rise in student achievement on such exams... Most of these criticisms fail to withstand scrutiny... This study differs from other analyses in that it focuses on the comparison of school-level results on high-stakes tests and commercially designed low-stakes tests. By focusing on school-level results we are comparing test results from the same or similar students, reducing the danger that population differences may hinder the comparison. Examining school-level results also allows for a more precise correlation of the different kinds of test results than is possible by looking only at state-level results, which provide fewer observations for analysis...
The conclusion? That, within a school, correlations between high- and low-stakes tests tend to be large and positive:
The finding that high- and low-stakes tests produce very similar score level results tells us that the stakes of the tests do not distort information about the general level at which students are performing. If high-stakes testing is only being used to assure that students can perform at certain academic levels, then the results of those high-stakes tests appear to be reliable policy tools. The generally strong correlations between score levels on high- and low-stakes tests in all the school systems we examined suggest that teaching to the test, cheating, or other manipulations are not causing high-stakes tests to produce results that look very different from tests where there are no incentives for distortion.
Maine is now testing its third-graders regularly, and teachers are having to walk a fine line of test prep and emotional support:
They want the students to take the tests seriously, because they're used by the state and federal government to measure whether schools are teaching students math and reading effectively. But they don't want to pressure young students new to the game of high-stakes testing. "We tell them to just do their best," said Hall-Dale Elementary School third-grade teacher Maureen Mathews. "You want them to know it's important. But you don't want to make them nervous."
Love the photo of one neophyte examinee, who appears to be a direct descendant of Dame Judi Dench.
Critics are howling about ten-buck-an-hour temps grading FCAT essays:
Critics were fuming Friday after learning the FCAT -- the standardized test that will leave a permanent mark on the academic future of thousands of Florida students -- will be graded by $10-an-hour temporary workers who are required only to have a week's training and a bachelor's degree. "It's just incredible to me that after all of the pressure that is placed on me to maintain my teaching credentials, the countless hours spent in workshops, and then they turn around and hand these tests off to be scored by a bunch of temps," said David Worrell, president of the Leon Classroom Teachers Association. "It's just insulting."
The DOE says that many of the workers have teaching experience and are very familiar with the exams. Keeping permanent full-time graders would certainly up the costs of the exam. Others say that criticizing the temps misses the point:
Of the many legitimate concerns raised about the FCAT over the years, this is the least of them. Temps are used to grade other important exams, such as the ACT college entrance exam, and the FCAT graders will have bachelor's degrees and training. Those grading essays will handle questions that relate to their college degree. Essays are to be graded twice, and possibly a third time. The real problem is that, under Gov. Jeb Bush, the FCAT has been wielded like a cudgel in the evaluation of school quality - and now, teacher quality.
In other words, they're fine with how the essays are graded; they're just not happy with how the FCAT is used as a whole.
Hey, look, it's a foolproof method for making college educations cheaper!
Oh, wait, that's not what they meant. Never mind. I'll just point out that the author is against college-level standardized tests, in part because "achievements in math and science will speak for themselves." Couldn't that theory be used as an argument to test everyone who doesn't take those kinds of courses in college?
For more than 20 years, FairTest, a small nonprofit group headquartered on the second floor of an old house here, has been the No. 1 critic of America's big testing companies and their standardized tests. In 1987, when FairTest began publishing its list of colleges that did not require applicants to submit SAT's, there were 51; today there are 730, including Holy Cross, Bowdoin, Bates, Mount Holyoke and Muhlenberg. ...for all FairTest's impact, its days may be numbered. Never before has standardized testing so dominated American public education, thanks to the 2002 federal No Child Left Behind Law. Every child from grade 3 to high school must now take state tests. And the Bush administration is considering extending those tests to colleges.
"With N.C.L.B., a lot of people feel the debate is over," said Monty Neill, director of FairTest, officially the National Center for Fair and Open Testing. "The attitude seems to be, 'Testing is so pervasive, what's the point?' " Support from foundations has virtually dried up and individual donations have not made up the difference. "Our board has seriously discussed whether to fold the operation," Mr. Neill said.
I find this a pretty revealing comment. There's always a need for testing to be scrutinized, for tests to be evaluated, and for the public to be informed. But I've always sensed that FairTest's commentary was anti-any-testing, not pro-good-testing. Now that testing is so pervasive, it's not helpful to bash tests rather than inform the public. ETS's president seems to agree:
Kurt Landgraf, the president of the testing service, which administers the SAT, wrote in an e-mail message: "Perhaps if they had been more attuned to the public's support for using tests to help teachers teach and students learn, then they might have had wider support."
Further along in the article, I don't quite get the point of NYT reporter Michael Winerip listing this as though it's a bombshell:
In a recent newsletter, FairTest printed an analysis of SAT results, using, and crediting, College Board research showing the direct correlation between family income and SAT scores. For every extra $10,000 a family earns, children's combined math and verbal scores go up 12 to 31 points. So children whose parents earn $50,000 score better on average (a combined 996 SAT) than students from families who earn $40,000 (967) but worse than students from families who earn $60,000 (1014).
Okay, who alive today doesn't know that kids with more money tend to have more educational advantages? It makes sense to me that kids from wealthier homes do better on all educational indices; if they didn't do better on the SAT, parents would question the efficacy of private schools and tutoring. Why this is being mentioned here as though it's surprising knowledge - or a valid test criticism - is beyond me.
At the same time, correlation doesn't equal causation. Just because A and B are correlated, that doesn't mean A causes B. B might cause A, or some C could be causing both to happen. Smarter parents might make more money, and their kids get both the nature and nurture benefits. We are trying to close the gap by offering all students better opportunities, but a test that doesn't reflect when kids know more material, either by virtue of schooling or parental largesse, is a pretty useless test.
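For anyone who wants to see the "some C could be causing both" case in action, here's a minimal sketch in Python. Everything in it is invented for illustration: the hidden factor, the noise levels, and the labels on A and B are assumptions, not anything taken from the College Board data. By construction A never influences B, yet the two come out clearly correlated, and the correlation goes away once C's contribution is removed.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n=5000, seed=0):
    """Draw a hidden factor C that drives both A and B; A never touches B."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]   # hypothetical common cause
    a = [ci + rng.gauss(0, 1) for ci in c]    # stand-in for income (made-up scale)
    b = [ci + rng.gauss(0, 1) for ci in c]    # stand-in for a test score (made-up scale)
    return c, a, b

c, a, b = simulate()
r_ab = pearson(a, b)

# Strip C's contribution out of each variable; only independent noise is left.
a_resid = [ai - ci for ai, ci in zip(a, c)]
b_resid = [bi - ci for bi, ci in zip(b, c)]
r_resid = pearson(a_resid, b_resid)

print(f"corr(A, B)                  = {r_ab:.2f}")
print(f"corr(A, B) with C removed   = {r_resid:.2f}")
```

In this toy setup the raw correlation lands near 0.5, because half of each variable's variance comes from C. That says nothing about which story is true for real SAT scores; it only shows that a sizable correlation is compatible with no direct causal link at all.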
Jay Mathews offers a spirited defense of a process much maligned in the education world - "teaching to the test":
When we say "teaching to the test," we should acknowledge that we are usually not talking about those drill fests. Rather, we often use the phrase to refer to any course that prepares students for one of the annual state assessment exams required under the No Child Left Behind Act. For reasons that escape me, we never say a teacher is "teaching to the test" if she's using a test she wrote herself. We share the teacher's view that what she is doing is helping her students learn the material, not ace the test. But if she is preparing the class for an exam written by some outsider, the thinking goes, then she must be forced to adhere to someone else's views on teaching and thus is likely to present the material too quickly, too thinly, too prescriptively, too joylessly -- add your own favorite unattractive adverb. ... Conversations about this would go more smoothly if we didn't have such distorted views of what teaching to the test means. We might instead turn the discussion to what methods of instruction work best or how much time our children should spend studying.
The more-pay-for-higher-scores plan in Florida has passed the Board of Education approval stage:
The Florida Board of Education unanimously approved a plan Tuesday that will give some teachers bonuses based solely on their students' performance on standardized tests. As early as next year, the plan will award the top 10 percent of teachers in each school district a 5 percent bonus based on learning gains shown on the Florida Comprehensive Assessment Test.
If districts want to reward more teachers, they can. But there may not be state funding for it, officials warn. The plan also will require the state to create exams or other assessments in every subject not covered by the FCAT.
Whew. Possibly no funding and definitely more testing? Gee, wonder if this plan is causing any controversy. (Obviously, I'm being sarcastic here.)
Florida Education Commissioner John Winn wants to get serious - about tying teacher bonuses to FCAT scores:
In Miami-Dade and Broward counties, teachers could earn bonuses ranging from $1,710 to $4,150 per year on salaries that range from $34,200 to $83,070. The policy would add another set of consequences to Florida's high-stakes accountability system, which already determines school grades, high-school graduation and whether students can progress from third to fourth grade.
Winn said the Effective Compensation plan, which he dubbed E-Comp, would encourage teacher improvement, reward excellence and bolster recruitment and retention.
The critics, they disagree:
E-Comp would replace existing bonus systems in Miami-Dade and Broward, both of which Winn said were unacceptable. In Miami-Dade, a 5 percent bonus is given to all teachers at 28 schools that have the largest gains in FCAT reading and math scores. UTD President Karen Aronowitz said that system is fairest because many teachers contribute to a student's success. "When we send firefighters to a fire, do we pay them differently based on who handles the hose?" she asked.
No, but we do tend to get rid of those who don't pick up their end of the hose, especially if the end result is a house burnt down to the ground. If the public were secure in the knowledge that bad teachers would not only not share in school-wide compensation, but would get dismissed to boot, I'm not sure that merit pay for especially-good teachers would even be an issue.
The new GRE grows ever closer:
According to the ETS Web site, the changes will better gauge students' preparation for graduate school by measuring general academic skills with more precision than in the past. A single 30-minute verbal section will be changed to two 40-minute sections. Sections on analogies and antonyms will be removed, while new sentence equivalence questions will be introduced and critical reading sections will be expanded. Quantitative reasoning -- lengthened from one 45-minute section to two 40-minute sections -- will include less geometry and more data interpretation and word problems. The test will be graded on a scale of 120-179, as opposed to the current 200-800 scale.
Some students resent the newer, longer length:
...Sam Penziner '07, who is also planning to attend graduate school, said he thinks the longer exam will measure test-taking stamina rather than skill. "Making the test longer emphasizes factors like endurance and stress that affect performance," Penziner said.
And graduate school doesn't require endurance and good stress-coping strategies?
University of Arkansas at Fayetteville’s education researchers doubt that standardized test scores are the best indicators of school district performance:
The statistical analysis, which will be presented today to the Arkansas Association of Educational Administrators, concluded that Arkansas students are achieving slightly better than the nation as a whole. And several Arkansas districts not typically recognized for their academic excellence top the scale of high performers... “The School Performance Index in Arkansas” takes into account student demographics and the levels of affluence and education in a community. It predicts what student achievement in a school or district should be on the basis of those factors and then compares those projected achievement levels to actual standardized test results.
So, if I understand this, they're predicting how well a school should do based on various demographic and SES levels, and then comparing those predictions to the real test scores. They're concluding that raw test scores shouldn't be used to compare schools, but instead should be adjusted to show how well the school is doing given all these predictor variables, so that schools with students who are predicted to do poorly should not be considered bad schools if they produce mediocre test scores. I'm not sure I agree with that conclusion.
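To make the predicted-versus-actual idea concrete, here's a rough sketch in Python. It is not the researchers' actual index: I'm assuming a single predictor (percent of students on reduced-price meals) and an ordinary least-squares line, and the five schools and their numbers are entirely hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical schools: (percent on reduced-price meals, average test score)
schools = {
    "A": (10, 82.0),
    "B": (30, 70.0),
    "C": (50, 66.0),
    "D": (70, 48.0),
    "E": (90, 44.0),
}

xs = [poverty for poverty, _ in schools.values()]
ys = [score for _, score in schools.values()]
slope, intercept = fit_line(xs, ys)

# "Adjusted" performance = actual score minus the score the demographics predict.
residuals = {
    name: score - (intercept + slope * poverty)
    for name, (poverty, score) in schools.items()
}

raw_ranking = sorted(schools, key=lambda name: schools[name][1], reverse=True)
adjusted_ranking = sorted(residuals, key=residuals.get, reverse=True)
print("ranked by raw score:     ", raw_ranking)
print("ranked by adjusted score:", adjusted_ranking)
```

With these made-up numbers, school A has the best raw score but school C beats its prediction by the widest margin, and school E outperforms expectations even though fewer than half its points are on the board. That is exactly the property that gives me pause: a school can look good on the adjusted index while its students still score poorly.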
One interesting side finding:
The school analysis, which Greene said could be further refined by the state, showed that school performance on the Iowa Test is “partially” affected by the level of household income, educational attainment of adults, and the percentage of married families in a district. The scores are “substantially affected” by the percentages of black students and students who qualify for reduced meal prices at a school. In contrast, the study concluded that school performance is not affected by the size of a school or a district or by the amount of money spent in a district.
Gee, wonder why the headline for this article wasn't "Schools don't need more money to perform better"?
A UMass-Amherst researcher claims the Connecticut Academic Performance Test is extremely predictive of college success - even more so than the SAT:
Stephen Coelen, a researcher from the University of Massachusetts at Amherst, tracked 32,653 members of the Class of 1998, comparing how well they did as sophomores on CAPT to how many applied to, enrolled in and did well in college. On every measure, he found the higher the CAPT score, the more students were likely to go to college, avoid remedial courses in college, get higher grade point averages in college and graduate. When matched against SATs, the College Board exam students take to predict college success, Coelen said both exams helped explain student success in college. Of the two, Coelen said CAPT "was always correct. SAT was not always correct."
Interesting. The "going to college" part could, I think, be affected by the possibility that those who score high on CAPT in 10th grade spend the next couple of years being groomed by teachers for college. If their CAPT scores affect their high school class placement or treatment in any way, then it wouldn't be surprising that CAPT would correlate with college attendance - it would be one of the predictors of it.
Interesting also to see that the CAPT apparently has a high positive correlation with college grades, but given the outcry we hear these days about grade inflation, one wonders if this is really a positive thing about the exam.
Commissioner of Higher Education Valerie Lewis said the study proves the value of CAPT in predicting college success and should be recognized by college admission staffs as a valuable piece of information when they admit students. Lewis also found it startling that 10 percent of students who score very high on CAPT never show up in college. That means some talent is going untapped and underdeveloped...
I find 10 percent startlingly low. We're not being told what "very high" means, nor do we know the shape of the distribution. The study was composed of around 32,000 kids; if "very high" means the top 5% of scorers, we're talking about fewer than 200 smart kids from Connecticut in that graduating year who passed on college. I would think that family issues, financial issues, health issues, and lifestyle issues that preclude college would affect at least 10 percent, maybe more. These are people, not automatons, and if they were that smart, they may well have decided that they wanted to do something other than pay thousands of dollars a year for additional education.
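Here's the back-of-the-envelope arithmetic behind that headcount, as a small Python sketch. The cohort size and the 10 percent figure come from the article; the cutoffs for "very high" are my assumptions, since the article never defines the term.

```python
COHORT = 32_653          # students tracked in the study (from the article)
NO_COLLEGE_RATE = 0.10   # share of "very high" scorers who never show up in college

def high_scorers_skipping_college(top_share):
    """Estimated headcount, ASSUMING 'very high' means the top `top_share` of scorers."""
    high_scorers = COHORT * top_share
    return round(high_scorers * NO_COLLEGE_RATE)

# Try a few hypothetical definitions of "very high".
estimates = {share: high_scorers_skipping_college(share) for share in (0.01, 0.05, 0.10)}
for share, count in estimates.items():
    print(f"top {share:.0%} of scorers -> about {count} students skipping college")
```

Under the top-5% assumption this works out to about 163 students statewide, and even a generous top-10% cutoff gives only a few hundred.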
Third- through eighth-graders in New York must now take state standardized exams, and for three of those grades, the tests will determine promotion:
Today, grades three, four and five had a multiple choice test. Tomorrow the same grades will listen to stories, and then write about what they've heard. On Thursday, 4th graders only will be asked to read a passage and then write about it. This year marks the end of citywide tests and the start of statewide testing for all students in grades three through eight because of the federal law 'No Child Left Behind.' Also this year, state standards are higher.
You can look through the PowerPoint presentation here that gives an overview of the exams. Core curricula in English and Math are also available. A sample exam for Grade 3 looks pretty simple.
More testing might be on the way in Florida - and that could be a good thing:
Florida high school students may someday have to take end-of-grade tests in history, literature, biology and other key subjects -- possibly in addition to the FCAT. Members of a state task force on high school reform are suggesting the tests as a way to make sure students are really learning what the state says they are supposed to learn... New York and Texas already use similar tests, and some Florida school districts have adopted them, too. "We're looking for something that's going to help students achieve at a higher rate, not looking to multiply the number of tests out there," said state education Commissioner John Winn. "But an end-of-course test is a way to have some consistency in proficiency level."
Chett at ReformK12 just pointed me to a nifty online list of psychometric resources - The Statistics Resources for the Center for Research, Evaluation, and Program Development, Boston Graduate School of Psychoanalysis. I'd recommend the books on IRT, except that I haven't read them - I'm old school and rely on the classic Hambleton & Swaminathan IRT bible. I actually rely on a lot of "bibles" that aren't listed here, like Educational Measurement and Psychometric Theory. A tad ironic that I mainly read these old books, since I'm the book review editor for a measurement journal, but oh well.
I also would be remiss if I did not mention the IRT software packages BILOG and *cough*my-advisor-created-this*cough* MULTILOG. Even I can't claim they're easy to use - unless you're fluent in FORTRAN, and a frightening percentage of psychometricians are - but I see them get a lot of use in testing organizations.
The tweetles, trills, blats, waa-waas, and crashes of Florida's band students may soon be measurable via a standardized exam:
It won't resonate as loudly as the FCAT, but Florida schoolchildren could soon face a new test of how well they're learning music. The test is being developed by two statewide groups of music teachers who see it as a way of reinforcing the importance of music in a well-rounded education and measuring how well it is being taught around Florida. "Music educators are accountable every time their students step up on the stage, but they felt they needed a more formal way (to measure their learning)," said Timothy Brophy, a University of Florida assistant professor of music education, in a telephone interview from Gainesville...
The first phase of the music tests could begin in fourth and eighth grades as soon as 2007 if all goes as planned. It will consist of a paper-and-pencil test in which pupils respond to a series of questions recorded on a compact disc, including musical passages.
Brophy said the next two phases of the testing program are expected to include actual musical performances that would be recorded and compared with a standard for the appropriate grade level.
Tests for music students in other parts of the country should be modified to conform to local standards and tastes. For example, band geeks in South Dakota should be tested on their awareness of just how seductive the saxophone can be.
Testing marches onwards, as 23 states expand their testing programs:
Forty-eight states and the District of Columbia will give standards-based tests in reading and mathematics in grades 3-8 and at least once in high school this school year, as required by the nearly 4-year-old federal law, according to a survey by the Editorial Projects in Education Research Center. The holdouts are Iowa and Nebraska. Districts in Iowa give the Iowa Tests of Basic Skills, a national test not designed to measure state or local content standards, while districts in Nebraska craft their own tests, except for a state writing exam.
In devising the new tests, most states have defied predictions and chosen to go beyond multiple-choice items, by including questions that ask students to construct their own responses.
Hoo boy. Could be good, could be very problematic (and expensive) to score. The entire article is worth a read.
Opinion Journal's Best of the Web wonders if there's been an interesting spin on test scores in Michigan. Here's the bad news:
Michigan African-American fourth- and eighth-graders scored much worse in reading and math than African-American students in the United States as a whole, according to national test results released Wednesday. The 2005 National Assessment of Educational Progress results show significant improvement in math. But the test results show little improvement in reading scores overall since 1992.
Now, the good news:
The good news is that the gap between black and white students' scores in math and reading within Michigan has decreased.
BotW assumes that the gap could not have closed without white students performing worse, and this article (interestingly) does not mention white student performance. But it does say that black students have improved in math and stayed level in reading. The "reverse Lake Wobegon" effect might be happening for reading, but not necessarily for math. If the black students have improved in math while the white students stayed even, and stayed level in reading while the white students declined slightly, we'd see this pattern.
I couldn't find change information by ethnic group and state in the 2005 report, but if you manage to see that anywhere, let me know.
The GRE, which made waves in 1993 by transforming into a computer-adaptive exam, is being revamped again:
Although the test will still include sections on verbal reasoning, quantitative reasoning and analytical writing, every section is being revised, and the test lengthened to about four hours, from two and a half hours. About 500,000 students, 20 percent to 25 percent of them foreigners, take the general G.R.E. each year. E.T.S., which administers the test, also offers subject-matter tests in such fields as biology, mathematics and physics, but those tests, taken by far fewer students, are not being changed.To enhance security, every question on the new exams will be used only once, and the test will start at different times in different time zones, so students who have finished cannot pass on questions to those in different zones...
As of next year, the test will no longer be "computer adaptive," with test-takers getting questions tailored to their performance on previous questions, so that each gets challenging questions that provide a clear picture of what they can do. Instead, every student taking the test on a particular day will get the same questions, and those questions will not be reused.
Computer-adaptive tests, as ETS and others have discovered, require item reuse and enormous item pools to prevent any one item from being exposed too often. Good GRE items are neither cheap nor easy to come by, and this change addresses security questions and helps to ensure the relationship between the items and the construct by reverting to the one-use-per-item model. Of course, removing the adaptive algorithm also involves lengthening the exam, because the range in item difficulties for each form once again must be wide enough to adequately assess the geniuses and those who should probably strike graduate school off their "to-do" lists.
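For the curious, here's a minimal Python sketch of why a one-size-fits-all form has to be longer than an adaptive one. It makes several simplifying assumptions that are mine, not ETS's: a Rasch (one-parameter) IRT model, a fixed form whose item difficulties are spread evenly from -3 to +3, and an idealized adaptive test in which every item is pitched exactly at the examinee's ability (real adaptive tests only approximate this). Under the Rasch model, an item contributes information p(1 - p) at a given ability, which peaks at 0.25 when difficulty matches ability.

```python
import math

def item_information(theta, difficulty):
    """Fisher information of one Rasch (1PL) item for an examinee at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
    return p * (1.0 - p)

def fixed_form(n_items, low=-3.0, high=3.0):
    """Difficulties spread evenly over [low, high], as on a single shared form."""
    step = (high - low) / (n_items - 1)
    return [low + i * step for i in range(n_items)]

def form_information(theta, difficulties):
    """Total information a set of items provides about an examinee at theta."""
    return sum(item_information(theta, b) for b in difficulties)

def items_needed_fixed(theta, target_info, max_items=500):
    """Smallest evenly spread form whose information at theta reaches target_info."""
    for n in range(2, max_items + 1):
        if form_information(theta, fixed_form(n)) >= target_info:
            return n
    return None

# Idealized adaptive test: 20 items, each perfectly targeted (0.25 information apiece).
adaptive_items = 20
target = adaptive_items * item_information(0.0, 0.0)

results = {theta: items_needed_fixed(theta, target) for theta in (0.0, 2.5)}
for theta, n in results.items():
    print(f"ability {theta:+.1f}: fixed form needs {n} items to match {adaptive_items} adaptive items")
```

The fixed form needs noticeably more items than the adaptive one for an average examinee, and more still for someone near the top of the scale, because so many of its items are far too easy or far too hard to tell us much about that person.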
Update: Ah, the joys of knowing so many experts. I think this post was up for about two seconds when another psychometrician emailed me to remind me about the CAT version of the ASVAB, the Armed Services Vocational Aptitude Battery. That actually came out before the GRE-CAT, in the late 1980's, and I certainly should have remembered that, considering the truly phenomenal and exhaustive primer that exists on the subject, Computerized Adaptive Testing: From Inquiry to Operation. I can't recommend that book highly enough for anyone who wants to learn more about developing, testing, and implementing a computer-adaptive test.
The advent of computer-based testing is worrisome to would-be med students:
As if pre-meds did not already have enough to worry about, recent changes to the MCAT have some pre-med students worried about more than just mastering its content. The Association of American Medical Colleges announced recently in a press release that it will convert the MCAT to a computer-based format within the next two years, a move that will force both students and test-prep companies to sharpen their strategies for tackling the test, rather than their pencils.
The paper format of the test will be administered through 2006, though trial versions of the computer-based test will be given at the August 2006 testing date.
There are many advantages to computer-based tests; three of the biggest ones are the shorter test length (I'm assuming here the new MCAT is adaptive as well as computerized), more testing opportunities, and shorter score report turnaround time. Based on my moderate experience with CBTs, I think that some of the fears of future examinees will turn out to be unfounded:
"With the computer-based test you can't underline passages and put notes next to text, so it's hard to map out the progression of a passage. You can't keep track of it when you have to keep switching from a computer to your notes," said Patrick Wiita, a fourth-year microbiology, immunology and molecular genetics student who has taken the MCAT.
Underlining text onscreen is a fairly simple feature to add, if it isn't already included. And it's also possible to allow a typing area on-screen for notes or comments, for those who don't feel comfortable using scratch paper.
There's also concern about computer glitches, Wiita said. "Who knows what errors could occur in programming. It's the same reason they haven't switched to online voting. If there's an error, there's no paper trail," Wiita said.
Why on earth would you want a paper trail when you can capture every keystroke? Certainly, big programming errors can occur - but paper tests can vanish in transit just as easily. If anything, the software that processes computerized testing errors tends to provide more information to the testing companies than paper errors do. There will be exact records of when screens go down, where examinees were when the error occurred - and most CBT providers that I've seen have little problem restarting a test on the correct screen after an error occurs. Any testing company thinking about CBT administrations correctly places a lot of focus on error prevention and recovery.
Mustafa said new testing formats create a lot of work for test-prep companies, especially research into how students feel about the test. "We did a survey on 4,000 (students) to see how they're feeling, what they think. Eighty-two percent said they would do worse on a computer-based test," Mustafa said.
I've yet to see any research showing that examinees in fact do worse as a whole on computer-based tests. Some subgroups, in fact, do better in certain construct areas. Regardless, this is an easy enough question to answer; matched examinee groups can be compared with respect to P&P and CBT scores.
Of course, examinees may experience a feeling of doing worse if they're switching from a P&P test to a computer-adaptive test (or CAT), since the items will be tailored to their ability level and they'll see more of what they consider to be hard items. But the CAT scoring scheme takes item difficulty into account, so that two examinees could miss the same number of items but end up with different scores, based on the difficulty of the items that were answered correctly.
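For anyone who wants to see how that can happen, here is a minimal sketch in Python. It assumes a simple Rasch (one-parameter) model and a brute-force grid search for the ability estimate; that is my simplification for illustration, not the actual scoring algorithm of the GRE or any other operational CAT. The item difficulties and both examinees are invented.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct answer under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses):
    """Grid-search maximum-likelihood ability estimate.

    responses: list of (item_difficulty, answered_correctly) pairs.
    """
    grid = [g / 100.0 for g in range(-400, 401)]  # theta from -4.00 to 4.00

    def log_lik(theta):
        total = 0.0
        for b, correct in responses:
            p = rasch_p(theta, b)
            total += math.log(p if correct else 1.0 - p)
        return total

    return max(grid, key=log_lik)

# Two hypothetical examinees: each saw six items and missed two.
# The first was routed to easier items, the second to harder ones.
easy_route = [(-2.0, True), (-1.5, True), (-1.0, True),
              (-0.5, True), (0.0, False), (0.5, False)]
hard_route = [(0.0, True), (0.5, True), (1.0, True),
              (1.5, True), (2.0, False), (2.5, False)]

theta_easy = estimate_ability(easy_route)
theta_hard = estimate_ability(hard_route)
print(f"easier items: {theta_easy:.2f}   harder items: {theta_hard:.2f}")
```

Both examinees got four of six right, but the one who earned those four on harder items ends up with the higher ability estimate. In this toy setup the second set of difficulties is just the first set shifted up by two units, so the estimate shifts up by about two units as well.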
CATs have been operational in a high-stakes, large-scale environment since 1993 (when ETS pioneered the GRE-CAT), so there's quite a bit of theoretical and operational research out there to guide the development and refinement of the MCAT-CAT. Good luck to them.
An op-ed in the New York Sun and the folks at The Daily Howler focus on a recent article praising Wake County (NC) for raising test scores. The real picture, it seems, is not what the NYT made it out to be.
Here's what the NYT printed:
Over the last decade, black and Hispanic students here in Wake County have made such dramatic strides in standardized reading and math tests that it has caught the attention of education experts around the country. The main reason for the students' dramatic improvement, say officials and parents in the county, which includes Raleigh and its sprawling suburbs, is that the district has made a concerted effort to integrate the schools economically.
Here's what the Daily Howler concluded, after a spot of online digging:
Wow! Times readers felt a familiar glow; 80 percent of Wake County black kids scored at grade level on last spring’s tests! But here’s what Finder didn’t tell you—across the state of North Carolina, 77 percent of all black kids scored at grade level on those same tests! That’s right; the Times devoted this front-page story to a three-point difference in passing rates—a three-point difference in passing rates on tests almost everyone passes!...WHY YOU’RE BEING PLAYED THIS WAY: Finder’s piece has an obvious sub-text. Wake County is busing to achieve economic integration—and this is producing big score gains.
For ourselves, we would favor such a program as long as the voters were willing. But these Wake Country test scores provide little evidence of big pay-offs in minority achievement if you enact such a program. Yes, Wake has shown good score gains (most likely on easier tests)—but so have schools all over the state! How can Wake’s program account for gains which are happening in all the state’s districts?
Yes, the gains are occurring all over the state...But apparently, Finder didn’t want you to know that. As good pseudo-liberals have endlessly done, he just wanted you feeling real good about a type of program he favors. As good pseudo-liberals have shamelessly done, he wanted you thinking something bogus and cruel: When it comes to the education of poor black children, success is right there for the taking...
And here's what the op-ed writer concluded:
Intrigued by the story's claim that the percentage of Raleigh's students achieving proficiency had risen dramatically over the past several years, my research assistant, Mark Linnen, took it upon himself to check out the data available on the North Carolina Web site. Over the past 10 years, the percentage proficient or better in grades 3-8 in Raleigh (Wade County) had in fact risen by 13% in math and 12% in reading between 1995 and 2005. That seemed to confirm the bragging of local officials - until it was discovered that, statewide, proficiency rates were up by 21% in math and 19% in reading - gains that outstripped those in Raleigh by over 50%. Nor did the proficiency rates of Raleigh's black and Hispanic students climb any faster than the statewide average for these groups. In fact, the gains were somewhat smaller. Not that proficiency rates in North Carolina mean much. The state has some of the worst state standards in the country. Last spring, my Education Next co-editor, Rick Hess and I gave North Carolina's proficiency standards one of the worst marks in the country - a D minus. (By comparison, South Carolina got an A.) So low were the standards that 85% of all North Carolina eighth graders was said to be proficient in reading, despite the fact that only 29% of the state's eighth graders was found proficient on the National Assessment of Educational Progress, the nation's report card.
One can only suspect that the allegedly astronomical gains in North Carolina - in both Raleigh and elsewhere - were simply a function of a dumbed-down scoring system.
Does the NYT even realize that all the NC scores are online, and that readers can do the math for themselves?
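Here's roughly what that math looks like, as a minimal Python sketch. The only inputs are the four gain figures quoted in the op-ed above; I'm assuming they are percentage-point changes, though the ratio comes out the same either way. The variable and function names are mine.

```python
# Proficiency gains from 1995 to 2005, as quoted in the op-ed above.
gains = {
    "math": {"raleigh": 13, "statewide": 21},
    "reading": {"raleigh": 12, "statewide": 19},
}

def relative_excess(statewide, local):
    """How much larger the statewide gain is than the local gain, in percent."""
    return (statewide - local) / local * 100

excess = {
    subject: relative_excess(g["statewide"], g["raleigh"])
    for subject, g in gains.items()
}

for subject, pct in excess.items():
    print(f"{subject}: statewide gain exceeds Raleigh's by about {pct:.0f}%")
```

The statewide gain comes out roughly 62% larger in math and 58% larger in reading, which squares with the op-ed's "over 50%."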
Parapundit has a few things to say on the topic as well.
How well would you do on the FCAT? Come Wednesday, you can find out:
State officials will put an old edition of the Florida Comprehensive Assessment Test online so people can see what it's like to be in 10th grade. If you don't know much about history, don't know much biology, you might still do well. The test measures ability in reading and math. The questions were from the 2004 exam, and most likely won't be seen on any test again any time soon.
State education officials said Friday the test will be on the department's Web site. People can download or print the test and take it like students would, then get the answer key to see what they got right and wrong.
The Lodi Unified school district (CA) is unhappy that membership in the gifted-student program Gifted and Talented Education (GATE) is down, and doesn't seem to reflect the ethnic diversity of the local population. So, of course, they've suggested tinkering with the admission formulas:
Trustees recently approved changing the requirements of the Gifted and Talented Education program to allow more opportunity for minority students to join its ranks. Students are now eligible to receive extra points on top of their standardized test scores for limiting factors present in their lives. So, for example, a child who has a learning disability, is an English language learner or comes from an impoverished or culturally diverse background will receive special consideration.
"We've been seeing lately our (GATE) enrollment is down," Lodi Unified Superintendent Bill Huyett said Monday. "That's an indicator that it needs to be opened up a little"...
Despite past attempts to diversify its gifted programs, the district found GATE enrollment did not reflect the makeup of the district.
Imagine that. Schools tend to find that academic grades, test scores, and other measures of academic achievement are not randomly dispersed among ethnic group members. Seen in that light, the fact that GATE participation doesn't mirror the "diversity" of the district is a validation of the GATE admissions criteria. Unfortunately, "diversity" trumps all, so these educators would like to muddy the waters by including some very fuzzy criteria.
Who's going to define what a learning disability is? What criteria will be used? How severe does the disability need to be? Why would the district expect that a student with any sort of learning disability would be able to handle the GATE curriculum? How long can a student be living in the US and still be considered an English language learner? How are we defining an impoverished background? And why would any of these factors, which could be considered a disability, be lumped in together with being from a "culturally diverse" background, which is not? Does this mean Asian and Indian children who are already off the top of the testing charts will be even more likely to be accepted? Or will some cultures be considered more diverse than others?
I agree that the codes are in conflict; it's ridiculous to define a program as being for kids who are on the top end of the intelligence scale and then also demand that the kids in that group be diverse in any sort of multicultural way. They will be who they are, and if the district is truly unhappy to find out that any particular group seems underrepresented, the solution is to investigate why that might be, not to fudge the numbers afterward with some ill-defined and highly unreliable admission criteria.
What really frosts my shorts about this whole scenario is that the programs are for the benefit of the students enrolled in them. Therefore, the adults involved should be committed to developing admission criteria that guarantee, as much as possible, that any particular student is ready for more challenges, will benefit by more challenges, and will complete the program with the increased self-esteem that comes from tackling, and overcoming, tough educational programs. The current criteria address this by requiring that students perform above a certain percentile on the Raven Progressive Matrices or the CAT-6 exam.
Instead, the adults involved are dithering over the fact that the "diversity" is not what it should be, and have suggested criteria that may in fact be negatively related to the ability of any particular student to do this work or to benefit from it. Does that sound like a plan that's in the students' best interests to you?
The California Association for the Gifted, or CAG, defines gifted students as:
...a child enrolled in a public elementary or secondary school of this state who is identified as possessing demonstrated or potential abilities that give evidence of high performance capability...
"Abilities that give evidence of high performance capability." Period. There is no research supporting the notion that overcoming learning disabilities, or learning English late in life, or coming from an impoverished background, are positively related to high academic ability. Therefore, lowering the academic standards for these students makes no sense. If a child shows that he or she is capable of doing the work, these factors should not be used to exclude them. I see no reason why they should be used to include them, either.
For schoolchildren in NYC, their number 2 pencils will now last twice as long:
To the relief of thousands of pencil-biting children and their parents, state and local education officials have reached an agreement that means that New York City's third, fifth and seventh graders will have to take only one round of standardized tests this school year. A conflict between the state and the city had raised the likelihood that children in those grades would have to take four tests over the course of the year - two reading and two mathematics exams - with one set of results for the city and the other set for the state.
Under the terms of the agreement reached yesterday, students will take only the two state tests.
As one educator was quoted as saying, "Common sense prevails."
New Hampshire wonders which came first - the extracurricular activities, or the high scores?
A questionnaire given to 10th grade students taking part in this past spring's New Hampshire Education Improvement and Assessment Program testing also shows that students who work limited hours while attending classes also have a firmer grasp on subjects...This year's 10th graders were offered a questionnaire that sheds lights on how activity outside the classroom might impact learning...a statewide analysis of results showed that student who took part in five or more extracurricular activities (i.e. sports, band, theater and more) had the highest mean scale score on this years test.
Students who took part in five or more such activities — of which Laconia had 9 percent responding — had a statewide mean scale score of 269 out of 300 points in reading and 270 out of 300 in math...The questionnaire showed that students who took part in no extracurricular activities...[had mean scores that] place [those] students in a category that identifies them as having a "basic" knowledge of those subjects.
The polling data showed in general that the more extracurricular activities a student took part in, the higher they scored....One difficult thing to determine is whether students are performing better because they are involved in such activities or whether students who are more committed to school are more likely to engage themselves in endeavors outside the classroom...
Yes, indeedy, it is a difficult thing to determine. It's not a surprise that students who work fewer hours, read more outside of class, and take part in more outside activities have higher test scores, but it isn't simple to disentangle these and draw firm conclusions about what causes what.
For example, students could be urged to work less outside of school, which would leave them more time for reading and other activities. But students from low-income families might be forced to work many hours, and it might be the genetics, family dynamics, home environment, and/or lack of parental education that drive the low test scores more than the time spent behind the counter at Burger King. For such a student, the familial incentive to read (or join the marching band) might be nil, while the time spent as a cashier might be helping them learn valuable job and cognitive skills as well as keeping them from unproductive extracurricular activities.
But there was one survey result that suggests a pretty clear cause-and-effect relationship (if I understand the tortured phrasing correctly):
Students who responded as receiving regular homework assignments also showed to perform considerably better on the math portion of the test than those who did not.
A long and informative article about the rise of technological inventions to meet the needs of disabled examinees. There's also a nice discussion of the pitfalls inherent when technological modifications could cause a test to measure something other than its intended construct.
Do cheesy songs help raise SAT scores? (Free registration required.)
Renee Mazer is trying to help high school students get into good colleges — by teaching them silly songs and cheesy poems. Mazer is the creator of "Not Too Scary Vocabulary!: For the SAT and Other Standardized Tests and Success in Life," a boxed set of CDs (or audio tapes) aimed at beefing up students' semantic skills. Using playful mnemonic devices and slang-studded stories, the discs teach hundreds of words that often appear on the Scholastic Aptitude Test. By presenting the words in a manner that's easy to absorb and remember, Mazer said, she can help raise students' scores on the verbal section of the SAT. And that can help turn a hapless Ivy League reject into an ebullient Harvard freshman.
What's Mazer's secret? She's never dull.
I'll say, considering that one of the poems talks about emotions involved with "getting to first base."
If you feed them, they will improve:
A pilot program intended to ensure every child has the chance to eat a free and nutritious breakfast is being offered this year to students at Morris Elementary School in Rialto. The program is based on research that shows proper nutrition can positively influence academic achievement, Rialto Unified School District officials said. The free breakfast program paid for by reimbursable state and federal funds is being offered initially at Morris, but if successful, will be expanded to other school sites, said Syeda Jafri, district spokeswoman.
"We offered free breakfast during standardized testing at our schools during last year," said Sharon Flores, director of nutrition services for the district. "What we found was that the free breakfast reduced tardies and that we had less students in the nurses' offices with stomach aches."
This part, though, makes me think teachers will be driven nuts:
"The food is served in the classroom right when class starts so it's a family-type environment," Flores said.
Which means teachers will have to monitor table manners and clean up all that jelly off the desks afterwards.
A thoughtful and quirky look at the testing craze, from a young (class of '98) reporter:
...I just took last year’s standardized English Language-Arts Test for the 11th grade - which anyone can sample at the state Department of Education’s Web site and which the state uses to monitor its districts’ progress - and while I scored well (a 95 percent, thank you very much), there were only 38 questions, nine of which were devoted to understanding a car rental agreement and the instructions of a food processor. Only one great literary work was examined - “Young Goodman Brown” by Nathaniel Hawthorne - and there were scant vocabulary questions, which, maybe, is a good thing....I can say that my foray into the world of standardized testing didn’t exactly fill me with the overwhelming confidence that the results of these tests will mean anything significant. Sure, the state will use them to decide which school districts are doing their job and how to parcel out an increasingly limited chunk of resources, but is it more than just a numbers game?
In speaking with the district’s curriculum director, Elizabeth Chapin-Pinotti, about the county’s California High School Exit Exam and Standardized Testing and Reporting results, which were released last week, it occurred to me that the above question doesn’t simply have a ‘yes’ or ‘no’ answer.
Chapin-Pinotti said that California schools tend to get a bad rap on the national stage when the test results come out, because what’s not factored in is that the state employs some of the strictest standards in the country. “The state set the bar very high and we’re not backing down,” Chapin-Pinotti said.
Reporter Raheem Hosseini goes on to discuss the Connecticut lawsuit before wondering why we care so much:
So what is it about tests we all love so much (and don’t kid yourself, we’re obsessed with them)? I think it has something to do with the simplicity of being able to quantify an amorphous concept such as intelligence. Color in a few bubbles, feed your sheet into the Scantron machine and find out how smart you are. And the strange thing is that even after high school and college and after having taken hundreds of scholastic tests through roughly 20 years of school, we’re not done with them. There are employee evaluations and credit assessments and loan applications and - even weirder - the tests we actually choose to take in our spare time: crossword puzzles, jumbles, online IQ and personality tests. It never stops.
But is it really so bad for people to crave a little disposable evaluation? Maybe what they’re really craving is intellectual stimulation. Heck, even Tommy Lee is back in school.
The Chicago public schools are abandoning the Iowa Test of Basic Skills, or ITBS, in favor of a revised Illinois Standards Achievement Tests, or ISAT:
It's welcome news for both Chicago teachers and students who had to spend hours of preparation for the tests and lost precious class time for other lessons. Now, the public schools will be able to focus on one high-stakes test, the revamped Illinois Standards Achievement Tests, which has become an important measure of Chicago schools' performance under the No Child Left Behind law. "It's fabulous," Nobel School Principal Mirna Diaz Ortiz says. "I'm very happy that we're going to be measured by one test and not have to take two tests"...In addition to the ISAT, there will be another new test, but it will not put the same burden on students and teachers that the Iowa test did. It will not be used to determine promotion. The Stanford Learning First measures students' strengths and weaknesses in reading. It will provide a diagnostic tool for teachers to see where their students need help. It will be offered three times a year through 40-minute exams and will cover the same kind of material that is in the ISATs. It won't require extensive preparation.
The Illinois Standards Achievement Test (ISAT) measures individual student achievement relative to the Illinois Learning Standards. The ITBS, on the other hand, gives results in relation to an Iowa norm, or a national norm. This means that promotional judgments will be made for scores based on comparisons within Illinois, not for the nation as a whole. This sounds consistent with NCLB regulations.
Interactive ISAT samples can be found here. I'd like to be objective, but I note that the eighth-grade Reading sample is a reading passage written by James Thurber, one of my all-time favorite authors. I started reading him about that time, too. So I'm predisposed to like this exam.
This Chicago Tribune article has more info:
The Iowas, used in Chicago since 1972, will be replaced with three short reading assessments that officials believe will prove more valuable in gauging the progress of individual students. Called Stanford Learning First, the new 40-minute exams will be given in October, January and May and will test the same kind of material covered by the new ISATs...Officials said they still expect to use the historic data as a basis for comparison by creating a new formula that can equate the results of the old Iowas with the new Stanfords.
This can work, if the new Stanford exams have the same content as the old Iowa exams. Seems like this is a new, low-stakes way to test reading, with the scores being equated back to the high-stakes ITBS.
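The article doesn't say what the district's formula will look like. One textbook approach is linear equating, which maps a score on the new test to the old test's scale by matching means and standard deviations. Here is a minimal Python sketch of that idea; it is my illustration, not the district's method, and every score in it is invented.

```python
import statistics

def linear_equate(score, new_scores, old_scores):
    """Map a new-test score onto the old test's scale by matching mean and SD."""
    z = (score - statistics.mean(new_scores)) / statistics.pstdev(new_scores)
    return statistics.mean(old_scores) + z * statistics.pstdev(old_scores)

# Invented scores for one group of students who took both tests.
new_test = [12, 15, 18, 20, 25]
old_test = [180, 190, 200, 205, 225]

# The average score on the new test should land on the old test's average.
equated_mean = linear_equate(statistics.mean(new_test), new_test, old_test)
equated_low = linear_equate(12, new_test, old_test)
equated_high = linear_equate(25, new_test, old_test)
print(equated_mean, equated_low, equated_high)
```

A conversion like this is only as good as the assumption behind it: if the two tests measure different content, matching the score distributions doesn't make the scores mean the same thing.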
The test change also will trigger changes in the district's controversial retention policy, the details of which will be announced in October. The get-tough retention policy was created in 1997 when Daley declared an end to social promotion and started requiring students to meet minimum test standards in reading and math. The policy has been softened over the years. Now the district only considers reading scores and bars schools from retaining students twice in the same grade regardless of how low they score on the tests. Last year, about a third of the 24,000 students required to attend summer school because of low Iowa scores had to repeat a grade.
The article also mentions that the ISAT-based promotion decisions will be based on the ISAT portion that is scored in comparison to national norms. Interesting.
If any of you have experience with the ISATs, or have information that hasn't made it into the Chicago papers, let me know.
One of the testing criticisms I often see is the claim that standardized tests can't tell you everything about a student's achievement:
Naysayers to the testing format say it reflects not what a student has grasped conceptually about a subject, but how well they take tests. These critics, most often teachers, point out that each student processes learning differently - some are better able to express verbally or in essay form their depth of knowledge on a subject..."Think of the driving test," said San Luis Obispo High School's English department chair, Ivan Simon. "If you just looked at how well someone answered the written part of the driver's test, then you'd assume the skill of the driver was represented by only that score. But that person wouldn't necessarily be a good driver."
It is true that the performance-based exam of driving skills tells you much more than the written exam, because the skill being measured is wholly performance. However, one could argue that someone who knows how to turn the key and step on the gas is not a good driver if they cannot read road signs or haven't memorized any of the rules of the road. Someone who passes the written exam is not necessarily a good driver, but we can argue that someone who flunks the written exam is necessarily a bad driver. Both components of the exam are important. The ability to understand signs could be folded into the performance assessment (and often is a part of it), but the reason for the written exam parallels the reason for many standardized multiple-choice assessments - it's a way to very quickly sample a broad domain and make a cheap, reliable assessment in order to flag those who just aren't getting it.
Many standardized exams are, in fact, minimum competency exams, and the best precision is not in separating the brilliant from the good, but the terrible from everybody else. Simon's criticism that multiple-choice exams often don't tell the whole picture is correct. However, the critics gloss over the fact that these exams can tell you quite a bit about how well students have mastered basic skills.
And, as one principal points out, the basic skills are important:
"We provide teachers with examples of multiple choice questions, but it's not the sole focus of the curriculum," said [Will Jones, principal of San Luis Obispo High School]. "The state releases sample questions, roughly 20 percent of the test, and if teachers want to use those questions they can, and they do, just like they do for advance placement tests. The misperception is that we spend all our time just teaching to the test and somehow the STAR exam is consuming education, but the truth is there are all kinds of assessments and we prepare kids for all of those as well." Jones said he believes good students will always excel regardless of test format. "If a student is capable of writing a good essay and answering short questions, then they will have success on the test," said Jones.
This is something that few testing critics are willing to admit. From what I can tell, the evidence that there are hordes of little geniuses out there who routinely flunk exams yet learn brilliantly through non-traditional methods is anecdotal, at best. The testing critics are right when they say that there are skills we aren't measuring with the one-size-fits-all exams, but I say they're missing the boat when they insist that students who can't master basic material are somehow ready for advanced performance assessments.
Devoted Reader Lori M sent a provocative column my way:
Tens of thousands of parents of schoolchildren and hundreds of thousands of other taxpayers learned from media reports last week that "the majority of Texas school districts and campuses in 2005 earned the rating of 'Academically Acceptable.'" Most of the moms, dads and school-tax payers breathed a sigh of relief and shrugged off the "bad" news that a small percentage of districts and campuses "received the lowest rating of 'Academically Unacceptable'"...To "earn" a rating of "Academically Acceptable" on the 2005 Texas Assessment of Knowledge and Skills test, the students in a school district or at an individual campus had to achieve ...
In reading and English language arts, a passing rate of 50 percent.
In writing ... 50 percent.
In social studies ... 50 percent.
In math ... 35 percent.
In science ... 25 percent. In case the educational horror of those numbers didn't sink in ...
If half of the youngsters in a district or on a campus failed tests in reading, writing and social studies ... and 65 percent failed arithmetic ... and 75 percent failed science — the Texas education establishment deemed that district/campus "Academically Acceptable"!
Lori comments: "I guess I'm part of that old-fashioned school of thought that believes that a passing grade consisted of mastering the majority (at least 65%) of the material. Looks like it's sufficient to learn 25-50%!"
What's happening here? A disconnect between how we, the educational consumers, think of "passing," and how Texas is ranking schools, which is with a minimum-competency standard. "Acceptable" here is not defined in the same way that we'd consider "acceptable" to be in an academic course.
The author isn't exaggerating when he quotes the low percentages above; those come straight out of the state's 2005 Accountability Manual. You can skip right to this table for the good stuff. Yes, it's true that this year, a school for which 26% of the students meet the standard in Science is acceptable in that content area. This is also the first year the Science standard was set at what the advisory panel actually recommended, as opposed to one (2004) or two (2003) SEMs below it.
To interpret these numbers, you really have to have some idea of what the standards are, so that you know if a school in which only 26% meet those standards is a travesty, or just plain mediocre. You also have to realize that while the state set those standards very low, that doesn't necessarily mean most schools are squeaking in just over the bar. The 10th grade Science results, for instance, show that 54% of the overall student body in Texas met the standard. The raw score conversion table for that exam shows that a raw score of 34 out of 55 converts to the lowest possible passing score, and according to this document, that means a student would have to answer 62% of the items correctly to meet the standard. That's pretty much in line with what most people think of as a passing score.
I don't mean to suggest that parents don't have a right to wonder why the standards for Acceptable schools aren't set higher. And, given that we don't know how difficult the science items are, for example, we don't know how meaningful that 62%-correct standard is. But it might help in this debate to be sure to separate the standard for the exam from the standard for the schools.
Joanne Jacobs points out that our students may not be dumb - they just might not care:
You could conclude from these exams that American high-schoolers are ill-taught and ill-prepared for the competitive global economy. But what if you look at these tests like a capitalist rather than an educator? Nothing is at stake for kids when they take the international exams and the NAEP. Students don't even learn how they scored. And that probably affects their performance. American teenagers, in other words, may not be stupid. It could be that when they have nothing to gain (or lose), they're lazy... The dubiousness of these test results becomes clear when you compare them to the results of tests that actually do matter for teenagers: high-school exit exams and college boards...
Alexander Russo, for one, is suspicious of such a neat-and-easy conclusion:
if things are better now in secondary education than they were before, shouldn't kids today still outscore kids from 30 years ago? They were unmotivated to perform on the NAEP then. They're unmotivated now. They know more now, according to Starr. But the scores aren't much different. And what about elementary school NAEP scores, which are on the rise? If motivation is all, then shouldn't they stay flat?
Now I'm no economist or behavior expert, but it seems to me that if high school kids were actually learning more in school than they had before, the NAEP scores would show at least part of that change.
I'll play Devil's Advocate - could it be possible that kids know more, but care even less? After all, we're constantly told that kids are over-tested and are sick of exams, and perhaps there's truth to that. Could it b