Software for Automating Essay Grading Put to the Test

Although Scantron answer sheets have been part and parcel of test-taking since at least the 1960s, there were no serious attempts to design a software tool for grading student essays until recently. No doubt many overwhelmed teachers would have welcomed the wide adoption of such tools at any time, but the need for them became more acute in 2005, when essay-writing became a mandatory part of the redesigned SAT. Now, with many states reworking their academic programs to conform with the Common Core Curriculum, kids in all grades will be submitting more written assignments than ever, and with shrinking education budgets leading to staffing cuts, computer-aided grading would greatly reduce the burden on the instructors who remain.

With that in mind, two professors from the College of Education at the University of Akron, Ohio, Morgan and Mark Shermis, decided to put several essay-grading software packages available on the market to a rigorous test by having them grade 16,000 essays that had previously been assigned grades by teachers. The results, announced during this year’s National Council on Measurement in Education meeting held in Vancouver, Canada, showed that at least some of the programs produced marks very similar to the ones given by humans.

Grading software from nine manufacturers, which together cover 97 per cent of the US market, was used in the test. To calibrate itself, each system looked for correlations between the human-assigned scores and factors associated with good essays, such as strong vocabulary and good grammar. After training, the software marked another set of essays without access to the human-given grades.
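To make the calibrate-then-score idea concrete, here is a minimal sketch in Python. It is not how any of the tested products work; it assumes a single crude feature (the number of distinct words, standing in for "strong vocabulary") and a straight-line fit to the human scores. The function names and the toy essays are invented for illustration.

```python
import re


def distinct_words(essay):
    """Crude vocabulary proxy (an assumption): count of distinct lowercase words."""
    return len(set(re.findall(r"[a-z']+", essay.lower())))


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(training):
    """Learn the feature-to-score relationship from (essay, human_score) pairs."""
    xs = [distinct_words(text) for text, _ in training]
    ys = [score for _, score in training]
    return fit_line(xs, ys)


def machine_score(model, essay):
    """Score an unseen essay using only its text, never a human grade."""
    slope, intercept = model
    return slope * distinct_words(essay) + intercept


# Invented toy data: essays already graded by a human.
training = [
    ("The dog ran.", 1),
    ("The quick brown dog ran far away.", 3),
    ("A remarkably quick brown dog sprinted across the wide green meadow.", 5),
]

model = calibrate(training)
held_out = "The small cat slept on a warm stone wall."
print(machine_score(model, held_out))
```

A real system would combine many features and be judged on how closely its marks for the held-out essays agree with the human ones.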

According to Morgan Shermis, the grades assigned by the computer programs were statistically identical to those given by human teachers, which proves that such software has progressed a great deal since development first began. When he heard of the Akron team’s findings, Les Perelman, who teaches writing at the Massachusetts Institute of Technology, said he wasn’t surprised, but not because he considers himself a great supporter of computer grading. On the contrary, as far as he’s concerned, these kinds of programs are “reinventing the wheel,” replicating technology that is already on the market and installed on nearly every computer in the country and the world: Microsoft Word.

The ubiquitous word processing program is “a much better product than anything that’s going to be developed by this competition,” he says. Its grammar checker is fairly sophisticated, but can be fooled. For instance, if a student types, “The car was parked by the side of the road,” Word suggests, “(The) side of the road parked the car.”

Perelman worries that the bid to develop machine readers will, in the end, train humans to read more like machines. “It will get good agreement (between humans and machines) but not necessarily good writing.”

Still, with the Common Core’s wide adoption scheduled for 2014, and with it an imminent increase in the number of writing assignments students will have to complete and teachers will have to grade, many feel they cannot afford to turn up their noses at a tool that will help them cope. Jeff Pence, who teaches English at the Dean Rusk Middle School in Canton, Georgia, and who uses essay-scoring software to grade the papers of his 120 students, admits that while he is not blind to the tools’ shortcomings, neither is he unaware of the shortcomings of overwhelmed human graders. So far this year, with the aid of the program, he has been able to collect and grade 25 written assignments from each of his students, sometimes returning them the next day, whereas hand-grading even a single batch would previously have taken him nearly two weeks.

“I know, as does every teacher out there, that on that 63rd essay, I am nowhere near as consistent, accurate or thorough as I was on the first three.”