Two cheers for multiple-choice tests

The oldest geyser in Yellowstone National Park is:

a. Steamboat Geyser
b. Old Faithful
c. Castle Geyser
d. Daisy Geyser

We’ve all answered hundreds if not thousands of these multiple-choice questions over the years. We answer them to get our drivers’ licenses, to get into good colleges and grad schools and professional schools. They’re ubiquitous, yet everyone hates them. Educators dismiss them as simplistic, the enemy of complex learning. Students think they’re unfair. And learning experts say they plain don’t work.

To be clear, learning experts are questioning the value of these tests as learning tools. Perhaps the easy-to-grade exams are a necessary evil for assessments such as drivers’ license tests and law school admission. But psychological scientists who study memory and learning say that they can’t be justified on a basic cognitive level as learning tools: Years of research have shown that multiple-choice questions fail to trigger the memory retrieval that’s known to solidify new learning. All students have to do for these tests is recognize the right answer, and simple recognition does not facilitate learning. Only digging through memory does that.

At least that’s what critics of multiple-choice tests have been arguing for years. But now some new research is challenging that entrenched view. A team of UCLA scientists (graduate student Jeri Little, now a postdoctoral fellow at Washington University in St. Louis, along with Elizabeth Bjork, Robert Bjork, and research assistant Genna Angelo) decided to take another look at the much-maligned multiple-choice test. They wanted to see if at least some kinds of questions, if well constructed, might indeed trigger the crucial retrieval process and thus promote memory and learning.

To test this, they asked students to read short essays on two topics: Yellowstone National Park and the planet Saturn. The students then took different kinds of practice tests, each covering either Yellowstone or Saturn. Some answered multiple-choice questions like the one above, while others got the same questions as simple recall questions with no answer choices: What is the oldest geyser in Yellowstone? The students had plenty of time to search their memories while completing these practice tests.

Then, after a delay, they all took the “final exam”—another recall test, to see what if anything they had learned. But here’s the key to the experiment: All of the students got the questions they had been tested on earlier, but they also got new questions that were closely related to the ones they had practiced. For example: What’s the tallest geyser in Yellowstone? They also answered control questions, drawn from the essay they had not been tested on. This was the crucial comparison: Did students do better (or worse) on practice questions—and also on the related questions—than they did on the control questions?

The findings were provocative. Both types of practice tests improved performance on the final exam—not surprisingly. But practicing on the multiple-choice test enhanced learning more than practicing on a recall test. What’s more—and this is the most striking finding—practicing on recall tests actually impaired learning of the related material, while practicing on the multiple-choice test slightly enhanced recall of these related but novel items. In other words, the learning fostered by the multiple-choice tests was broader, including even material that had not been tested.

So it appears that multiple-choice practice does in fact trigger the memory retrieval process, and in that way enhances learning. But how and why? The UCLA scientists believe that it has everything to do with the way the questions and answers are constructed. As they describe in a forthcoming issue of the journal Psychological Science, the questions they used in the experiment had “competitive alternative answers.” That is, the wrong answers were plausible enough that the students had to think about why the correct answer was correct and why the wrong answers were wrong. In coming to the (correct) conclusion that the oldest geyser in Yellowstone is Castle Geyser, for example, they might think something like this: “Well, Old Faithful is most familiar, but that doesn’t mean it’s the oldest. And I think I recall that Steamboat is the tallest, not the oldest.” And so forth. It’s this cognitive process, and the memory search that accompanies it, that leads to learning. This is important as a practical matter too, since final exams often use questions that are different from, but related to, practice questions.

So is this vindication for multiple-choice tests, after years in testing purgatory? Well, yes, at least for “well constructed” practice tests. But proper construction of questions and answers is not easy, the scientists note. Including wildly implausible answers to the oldest geyser question (the Empire State Building, say) may make students laugh, but it doesn’t make them think. It takes work to come up with answers that are plausible yet fair. In that sense, the scientists concede, it may be true that multiple-choice tests are more often than not bad tests, but that may have more to do with the test writers, and with human nature, than with the test itself.

Wray Herbert’s book, On Second Thought, is available in paperback. Excerpts from his two blogs—“Full Frontal Psychology” and “We’re Only Human”—appear regularly in The Huffington Post and in Scientific American Mind.


This is a really interesting question to me. Great article. Glad to have heard about it on Twitter.

Right now, we are engaged in a study of how best to help elementary school students build and retain vocabulary.

Key points that we are finding are that word study needs to:
– be spaced across many weeks, not crammed into one week
– involve multiple types of encounters with words: hearing them, saying them, writing them, playing with them, using them in context, breaking them down phonetically and morphologically

Another aspect of our approach is the merger of two design disciplines: instructional design and game design. If you look closely at how our multiple choice activities work, you’ll notice a few details that we think greatly increase learning and retention.

1. When a student gets an answer right on the vocabulary multiple-choice questions, it stays on the screen for a while and the word is said in a context-rich sentence. Example:

2. For phonetic exercises, we have intelligent feedback. If the student does not select the right answer on a multiple-choice initial-sound exercise (phonological skills), our response combines feedback with repetition, which should help with learning. Example:

Your research seems to assume that multiple-choice questions are not interactive and don’t involve audio; we are way beyond that. We have lots of other little details that we think increase engagement and learning. Would you be interested in collaborating on this?

I just posted an article on a related topic on my blog on the site:
