When your kid comes home from school with an inscrutable red mark on his or her test results (what does 153 even mean? Is it out of 200? 154?), your first question is probably “what was the average score?” Your kid doesn’t have to be a genius, but it’s nice to know that he or she is at least average. That is why the self-serving but seemingly legitimate findings of a study conducted by Gradescope, a rubric-based online assessment tool, are so unnerving. Bad math scores have long been understood through the lens of bad math.
“There is no average student,” Gradescope’s Liz Carlson declares on the company’s blog. Gradescope’s team analyzed grading data from the final exam of a 1,500-student computer science course. The exam consisted of seven questions and 26 subquestions. A perfect score was (somehow?) 80, and the average score among the students was 46. Yet the researchers found that only one of the students scored within the average 20 percent band on all seven questions.
Fewer than 1 in 25 students scored within the average range on five or more questions. Nearly 25 percent of the students did not obtain average marks on a single question.
Now, Gradescope’s study is not published in a peer-reviewed journal and the researchers have substantial conflicts of interest. The results indicate, for instance, that a more individualized and detailed approach to grading is necessary—and it just so happens that’s precisely what Gradescope is selling. Nonetheless, the findings do echo prior research that suggests uniform standards like tests are outdated, and that truly average students (and people in general) probably do not exist.
Even more confusing, among the handful of students who obtained overall scores within the 20 percent band around the average (that is, between 41.4 and 50.6, or 10 percent either side of the mean of 46), no fewer than 14 did not have scores that fell within the average 20 percent on any of the seven questions. In other words, their overall grades were average, but their performances on individual test questions were not. The findings require follow-up and peer review, but broadly suggest that grading on a curve, and indeed conventional grading as we know it, fails to capture the strengths and weaknesses of students.
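To see how an average total can hide wholly non-average answers, here is a minimal Python sketch. It assumes the "average 20 percent" is a band running 10 percent either side of the mean, which is what the 41.4 to 50.6 range implies for a mean of 46, and that the same rule is applied per question. The function names and the per-question numbers are hypothetical illustrations, not Gradescope's data or method.

```python
def average_band(mean, width=0.20):
    """Return (low, high) for a band of total relative width `width` centered on `mean`."""
    half = width / 2
    return mean * (1 - half), mean * (1 + half)


def in_band(score, mean, width=0.20):
    """True if `score` falls inside the average band around `mean`."""
    low, high = average_band(mean, width)
    return low <= score <= high


def count_average_questions(scores, question_means, width=0.20):
    """Count the questions on which a student's score lands in that question's band."""
    return sum(in_band(s, m, width) for s, m in zip(scores, question_means))


# Hypothetical per-question data (assumption, not from the study).
question_means = [6, 8, 5, 7, 6, 8, 6]  # class means per question, summing to 46
student_scores = [9, 4, 8, 3, 9, 4, 9]  # one student's scores, also summing to 46

low, high = average_band(sum(question_means))
print(f"overall band: {low:.1f} to {high:.1f}")
print("overall in band:", in_band(sum(student_scores), sum(question_means)))
print("questions in band:", count_average_questions(student_scores, question_means))
```

In this made-up case the student's total of 46 sits squarely in the overall band, yet none of the seven question scores falls within its own band: high marks on some questions cancel out low marks on others.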
“We found the discrepancy among average-scoring students could be over 40 percent — a truly significant difference in exactly what each student learned,” Carlson writes. “We looked at two students who both earned 51.5 out of 80 points on the exam. Despite earning an identical score, they had 67 rubric item discrepancies between them, or nearly 44 percent of all rubric items.”
“They essentially understood only half the same material.”
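The rubric comparison Carlson describes can be sketched in a few lines of Python. This assumes each student's graded exam is recorded as a list of true/false flags, one per rubric item, and that a "discrepancy" is any item on which the two students differ. Both the representation and the tiny data set are hypothetical. (Taken at face value, 67 discrepancies at nearly 44 percent would imply roughly 150 rubric items in total, but that is an inference from the quoted figures, not a number given in the quote.)

```python
def rubric_discrepancy(items_a, items_b):
    """Return (count, fraction) of rubric items on which two students differ."""
    if len(items_a) != len(items_b):
        raise ValueError("both students must be graded on the same rubric items")
    if not items_a:
        raise ValueError("rubric must contain at least one item")
    differing = sum(a != b for a, b in zip(items_a, items_b))
    return differing, differing / len(items_a)


# Hypothetical rubric flags (assumption): True means the student satisfied the item.
student_a = [True, True, False, True, False, True, True, False]
student_b = [True, False, True, True, False, False, True, True]

count, fraction = rubric_discrepancy(student_a, student_b)
print("items satisfied:", sum(student_a), "vs", sum(student_b))
print(f"discrepancies: {count} of {len(student_a)} ({fraction:.0%})")
```

Here both hypothetical students satisfy the same number of rubric items, so equal weighting would give them the same score, yet they differ on half of the items.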