Multiple Choice and Testing Machines: A History

Why multiple choice? It’s a question that’s plagued me for a long time, particularly as someone who grew up with one foot in the American and one foot in the British education system. (The former involved a lot of multiple choice testing; the latter, almost none.)

Where and when did multiple choice assessment originate? Who decided it was a good measurement of learning? How did multiple choice come to look this way? Like, why are there only four or five options in the typical multiple choice test? Why not three? Why not thirty?

How did multiple choice questions become the predominant means by which American schoolchildren are tested? And most importantly perhaps for my work: what is the relationship between multiple choice tests and technology?

The Origins of the Multiple Choice Test

“One cannot understand the history of education in the United States during the twentieth century unless one realizes that Edward L. Thorndike won and John Dewey lost” – Ellen Condliffe Lagemann

Frederick J. Kelly is often credited as the “father” of the multiple choice test, although it’s worth noting that Edward Thorndike – the “father” of educational psychology – had also developed his theory about animals’ learning in part by giving them multiple options to solve a problem or situation and assessing their responses.

Choose the best answer from among a number of options. This is the legacy education and education technology must still address. This is the paradigm within which we still operate.

From Anya Kamenetz’s wonderful new book The Test:

The multiple-choice question was an important technique for simplifying and mass-producing tests. Frederick Kelly completed his doctoral thesis in 1914 at Kansas State Teacher's College. He recognized that different teachers tend to give different judgments of student work. And Kelly saw this as a big problem in education. He proposed eliminating this variation through the use of standard tests with predetermined answers. His Kansas Silent Reading Test was a timed reading test that could be given to groups of students all at the same time, without requiring them to write a single sentence, and graded as easily as scanning one's eyes down a page.

As digital humanities scholar Cathy Davidson writes in her book Now You See It, "To make the tests both objective as measures and efficient administratively, Kelly insisted that questions had to be devised that admitted no ambiguity whatsoever. There had to be wholly right or wholly wrong answers, with no variable interpretations. The format will be familiar to any reader.... Here are the roots of today's standards-based education reform, solidly preparing youth for the machine age."

From Frederick Kelly’s article in the February 1916 edition of The Journal of Educational Psychology:

Standardized Testing and the Great War

Why multiple choice?

As Kelly argued, it’s more “objective.” It takes the power of judgment out of the hands of individual (likely female) teachers. Multiple choice enables standardization. It means tests can be graded quickly and can be administered “at scale,” an incredibly important feature at a time when enrollment in public education in the US was expanding rapidly. Moreover, multiple choice assessment promised an education system that would be more efficient. And in conjunction with twentieth century futurism, it was a nod towards an education system that could become more automated.

When the US military undertook its massive effort to assess recruits for the First World War, it needed a system that would do just that: assess in a standardized way, efficiently, and at scale. No doubt, that assessment process was nothing short of remarkable: between 1917 and 1918, some 1.7 million men were examined via standardized testing designed (ostensibly) to identify who might be suitable for officer training and who would be best for the trenches. But that process was highly flawed by design, often confirming the racist expectations, for example, of what African-American recruits could do.

The US public school system opted to replicate that very process. It opted to do so by design and by machine…

WWI was the catalyst for assessment – and for education technology – as we know it today.

The Testing Machine

How do you test millions of people? By machine, of course. The early twentieth century saw the development of several testing machines and testing technologies.

“Mark Sense”:

First filed in 1937 and updated a few years later by Reynold Johnson: a patent for a “scoring apparatus.”

Johnson was a high school physics teacher, who in the early 1930s experimented with using machines to grade his students’ work. He designed a machine that would detect pencil marks on a piece of paper and then compare them to an answer key.
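The core idea, detecting which answer was marked and comparing it to a key, can be sketched in a few lines of modern code. This is only an illustration of the logic, not a description of how Johnson's electrical apparatus actually worked; the function name `score_sheet`, the letter labels, and the sample data are all hypothetical.

```python
from typing import Optional, Sequence


def score_sheet(marks: Sequence[Optional[str]], answer_key: Sequence[str]) -> int:
    """Count the questions where the detected mark matches the answer key.

    `marks` holds one detected choice per question, or None when no mark
    was found. The names and data layout are assumptions for illustration.
    """
    if len(marks) != len(answer_key):
        raise ValueError("answer sheet and key must cover the same questions")
    # A question scores a point only when the marked choice equals the key.
    return sum(1 for mark, correct in zip(marks, answer_key) if mark == correct)


# Hypothetical five-question sheet: one wrong answer, one left blank.
answer_key = ["B", "D", "A", "C", "E"]
student_marks = ["B", "D", "C", None, "E"]
print(score_sheet(student_marks, answer_key))  # prints 3
```

Note what the sketch cannot do: it has no way to give partial credit or to weigh an interpretation. An answer either matches the key or it does not.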

That technology provided the basis for IBM’s 805 Test Scoring Machine, launched commercially in 1937. The Educational Testing Service (ETS) initially used this technology as well.

From an IBM brochure, “Scoring Examinations the Electrical Way”:

Speed. Accuracy. Efficiency. Cost-saving. Labor-saving.

“Optical Mark Recognition”:

University of Iowa education professor Everett F. Lindquist had a different idea – technologically, at least. Rather than sense pencil marks’ lead electronically, his system – patented in 1955 – identified the marks optically.

Optical mark recognition is the technology used by the company Scantron (founded in 1972), whose name has become synonymous with the paper forms used for multiple choice testing.

From Lindquist’s patent application:

"By employing methods and apparatus in accordance with the teaching of the present invention, it is possible to perform the desired scoring, converting, analyzing and reporting operations in a matter of days, even hours, as compared to weeks. In other words, it is unnecessary to have a staff of from fifty to one hundred persons. … Furthermore, the capabilities of methods and apparatus according to the present invention are such that many more converting, analyzing and reporting operations can be performed upon the raw score data without sacrificing, to any appreciable extent, the speed of completing the desired reports. Because relatively few operators are needed to perform the desired operations, the problem of periodic large staff recruitment is effectively eliminated."

Lindquist’s work on standardized testing eventually evolved into the Iowa Tests of Basic Skills and the ACT. Lindquist also helped develop the GED.

All multiple choice standardized tests. All gradable via machine.

What Gets “Hard-Coded”

Five choices. Both of these patents seem to suggest that five choices is the optimal number their machines were designed to process. Lindquist's patent application says "it is apparent that certain tests can involve as many as five possible choices for a particular test question. The equipment could be readily designed, if desired, to provide for any larger number of answers per question." But it wasn't. We're typically offered four or five choices.

I would love to be able to say more about “why five.” Is it that multiples of five calculate neatly? Is it that multiples of five fit neatly onto a piece of paper? Or is it that five times a couple of hundred questions times a couple of thousand students hit some sort of threshold for early twentieth century mechanics or computation? I still don’t know…
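To put a rough number on that last guess, here is the multiplication worked out. The figures are placeholders: "a couple of hundred" is read as 200 and "a couple of thousand" as 2,000, which are assumptions for illustration, not values taken from either patent.

```python
# Illustrative values only; the essay gives no exact figures.
choices_per_question = 5
questions_per_test = 200   # "a couple of hundred questions" (assumed)
students = 2000            # "a couple of thousand students" (assumed)

# Every choice on every question for every student is one position
# on the page that a machine would have to check for a mark.
mark_positions = choices_per_question * questions_per_test * students
print(f"{mark_positions:,}")  # prints 2,000,000
```

Two million positions to sense is a lot of work for a machine, though whether that figure bumped against any real mechanical limit is exactly the part I can't answer.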

But what I do know: multiple choice – four or five choices – has become hard-coded into our educational technologies and our educational practices and our educational policies. We could assess differently. But even new technologies, developed one hundred years after Frederick Kelly’s work, tend to re-inscribe the multiple choice test.

Audrey Watters

Credits

Hack Education

The History of the Future of Education Technology