Presidential Column

The Case of the Invisible Experimenter(s)

Douglas L. Medin

When our story left off last month, we had considered three cases of classic field studies: (1) the Festinger, Riecken, and Schachter (1956) “When Prophecy Fails” study of cognitive dissonance, where Festinger et al. joined a group that was expecting the world to end in a flood on December 21, 1954; (2) Richard LaPiere’s (1934) observations on the mismatch between motel, hotel, and restaurant owners’ stated attitudes toward potential Chinese travelers and their behaviors toward actual ones; and (3) the “Robbers Cave” studies of a summer camp for boys, where Sherif et al. (1961), disguised as camp staff, created a competitive environment that led to group conflict and then implemented measures to restore intergroup cooperation. In each case, the researchers faced the challenge of being present enough to make relevant observations but absent enough that their observations did not affect the actual results.

Perhaps Festinger et al. were in the most delicate situation. As participant-observers they needed to be well accepted and integrated into group activities, but they also needed to avoid acts of commitment, proselytizing, or anything that would affect the course of the movement, which began after the automatic writing of Marion Keech (i.e., writing not under her “control”) that revealed the coming catastrophe and the possibility of escaping to another planet on a Guardian flying saucer. As Festinger et al. faithfully recount, maintaining their neutrality was sometimes far from easy. For example, one night in late November 1954, Mrs. Keech more or less demanded that one of the researchers lead the meeting. To accommodate her request, his solution was “to suggest that the group meditate silently and wait for inspiration.” In the ensuing silence, a member of the group acted as a medium for the first time, perhaps something that would not have otherwise occurred. Another time the authors were informed by one of their hired observers of an upcoming meeting that was otherwise not advertised. When the authors requested permission to meet with Mrs. Keech on the days when the meeting would be held, she treated their anticipation of the unannounced meeting as having supernatural origin.

Did the presence of the observers/researchers affect the key data? Recall that, initially, the group did little by way of proselytizing, because Mrs. Keech believed that “those who are ready will be sent.” The appearance of the three authors and three hired observers (one of whom joined the group on December 25, 1954, after the prophecy’s failure) may have served to reinforce this belief. On the other hand, the studied neutrality of the researchers and hired observers may have dampened the enthusiasm of other group members. Overall, the effect of the observers was likely a minor factor in the dramatic increase in proselytizing after the prophesied date had come and gone. Festinger et al.’s (1956) careful, balanced discussion of observer effects strikes me as a model of good science.

Richard LaPiere’s (1934) field observations were based on his travels with a Chinese couple. He notes that he tried to “factor” himself out: “Whenever possible I let my Chinese friends negotiate for accommodations (while I concerned myself with the car or luggage) or sent them into a restaurant ahead of me.” But LaPiere also tabulated his judgments about the quality of the reception the couple received. According to his Table 2, he was present to observe the initial reception of the couple 84 percent of the time (with the rest of his judgments being based on how they were treated after he joined them). (The Table 2 data are themselves intriguing. LaPiere judged that the couple received a “normal” reception 14 percent of the time; a hesitant, awkward reception 5 percent of the time; a reception better than LaPiere himself would have received 39 percent of the time; and a reception characterized by “heightened curiosity” on 44 percent of occasions. Some of my Native American friends comment on the “heightened curiosity” they receive in restaurants, and it is not a reaction that they appreciate.) Still, the fact that the pattern of response was essentially the same whether or not LaPiere was present suggests that his presence as an observer was, at most, a minor factor.

In the Sherif et al. (1961) “Robbers Cave” studies, it seems a bit creepy to me that the researchers/observers were the camp staff — maybe creepy enough to serve as a premise for a movie in which things go wrong. And they did. In the process of tracking down the details of the Robbers Cave experiment, I learned that this was actually the third in a series of studies. The first, conducted in 1949 (Sherif & Sherif, 1953), included an attempt to restore intergroup harmony by introducing a “common enemy.” In this case, the common enemy was a softball team from a neighboring town that was handily defeated by an “all-star” camp team drawn from both groups (the Red Devils and the Bull Dogs). Although the effort appeared to be successful, Sherif and Sherif noted that this device could lead to larger conflicts in the longer run, and they did not employ it again. They also admitted that, despite their efforts, the in-groups tended to persist.

But the key study in this series was the second one. Conducted in 1953, this study was halted midway through its execution. Sherif et al. (1961) offer only a cryptic reason: “During the period of intergroup relations, the study was terminated as an experiment owing to various difficulties and unfavorable conditions, including errors of judgment in the direction of the experiment.” This statement is curious, to say the least. Cherry (1995) determined that this note refers to the fact that the boys in the study figured out that they were being manipulated and then staged a mutiny. Specifically, the boys realized that the source of a frustration (manipulation) was the camp administration (the researchers), not the other group.

Earlier, Billig (1976) had argued that the summer camp studies should be conceptualized as involving three groups, not two — two groups of boys and the researchers/camp staff. According to Cherry, Billig was unaware of the reason that the second study was aborted, but nonetheless Billig’s argument is logical and compelling. The staff/researcher group had the power and authority, and it would be a conceptual mistake to assume the experimenters were invisible and not part of the context in which the study took place. Although the second study underlines this point, all three studies involved three groups, not two, according to Billig.

Now let’s turn to laboratory studies. Here we have visible experimenters (with the exception of studies involving confederates). But problems may arise if we treat them as if they were invisible when writing our papers. One of my all-time favorite psychology works is Robert Rosenthal’s (1966) book on experimenter effects in research (see also Rosenthal, 1994; Rosenthal & Jacobson, 1968). He documented the many ways in which experimenter expectancy can affect the results of studies, and he systematically analyzed various ways of trying to overcome researcher bias.

Concern with researcher effects shouldn’t be the sort of thing that would go out of style. Nonetheless, when I went to the Google Books Ngram Viewer and entered terms like “experimenter expectancy” or “experimenter bias,” the results revealed a peak in the early 1970s followed by decades of decline (see the figure).*

In preparing this column, I did a little looking into who has paid attention to Rosenthal’s warnings over the past few decades and was struck by how often the research in question came from what one might call “borderline” areas of psychology. But this was evident to Rosenthal very early on. He wrote:

Clearly we have a double standard of required degree of control. Those behavioral data found hard to believe are checked and controlled far more carefully than those behavioral data found easier to believe. … Obviously the solution is not to make it easier to ‘demonstrate’ unlikely events such as clairvoyance, rod-divining, talking animals, or muscle reading. What is called for is the setting of equally strict standards of control against expectancy effects in the more prosaic, perhaps more important, everyday bread-and-butter areas of behavioral research. (Rosenthal, 1966, p. 379)

Even when the experimenter does not bias the results in the direction of expectations, he or she may still have a substantial impact on the results. Here’s a (true) story. In our studies of biological expertise involving Native American and European American hunters and fishermen, we had to decide whether the ethnicity of the interviewer should match that of the interviewee. Although matching seems a sensible thing to do, any cultural differences we observed would then be confounded with experimenter differences.

At one point when we (postdoctoral fellow Norbert Ross and I) were teaching a research course at the College of the Menominee Nation, we started working with a Native American student who had grown up hunting and fishing with her father and his friends, not far from where we were doing our studies. By appearance, she was not obviously Native American, and Norbert and I got the bright idea that we could both hold the experimenter constant and match the ethnicity of researcher and participant. We hired her as a research assistant (RA), asking her to say nothing about her ethnicity when interviewing European American experts but to let the Menominee experts know (in one way or another) that she was a member of the Brotherton tribe. We also thought her obvious biological expertise would be a big asset.

But our clever strategy proved to be a disaster. The next time Norbert or I ran into our experts (as we frequently did), they had nothing but complaints about being interviewed by our multicultural RA. A little probing revealed that the issue was less about gender than the feeling that they had been “demoted” when we substituted an undergraduate RA for more senior researchers. Although our RA was blind to any hypotheses we might have been entertaining, her presence clearly affected the tenor of the interviews.

Here’s what I think the take-home lesson is: Examining the consequences of visibility may be more promising than hoping for invisibility. The idea of invisibility may sound good, but it may be akin to trying to paint a picture without adopting a perspective.

*If you’re a stickler for details, you may think the decline reflects a shift away from “experimenter,” not “expectancy” or “bias.” But “researcher expectancy” appears rarely and, though “researcher bias” does not show a decline, its baseline is so low that adding its frequency to “experimenter bias” does not reverse the overall decline. It’s hard to escape the conclusion that experimenter effects have gone out of style.

