In a famous psychological study performed 20 years ago, researchers gave two groups of participants a trivia test. Beforehand, they asked members of the first group to imagine what their daily lives would be like if they were a “professor”; they asked a second group to imagine their lives as a “soccer hooligan.” Amazingly, the first group scored 13 percent higher on the test.
“That’s a huge effect,” says Haas doctoral candidate Michael O’Donnell, who notes that the study also had a huge effect on the field of psychology. Cited more than 800 times to date, the study helped establish the concept of behavioral priming, the theory that people’s behavior can be influenced by subtle suggestions. “For psychologists, it’s seductive to think that we can address these weighty issues through simple manipulations,” says O’Donnell, PhD 2019.
Seductive maybe, but in this particular case, it turns out not to be true. Working with Haas Prof. Leif Nelson, O’Donnell recently ran an exhaustive experiment to try to replicate the “professor priming” study. Their effort, one of the first to answer a call from the Association for Psychological Science for systematic replication studies, involved 23 labs and more than 4,400 participants around the world.
Depending on the lab, the “professor” group scored anywhere from 4.99 percent higher to 4.24 percent lower on the test. On average, they scored just 0.14 percent better, a difference that was not statistically significant. “It was 1/100th the size of the original effect,” O’Donnell says. “That’s pretty strong evidence there was no effect.”
Crisis of conscience
Over the past eight years, the field of social psychology has been undergoing something of a crisis of conscience, sparked by the work of Nelson and other skeptics who began calling into question research results that seemed too good to be true. It has grown into a full-blown movement, which Nelson and co-authors Joseph Simmons and Uri Simonsohn of Wharton call “Psychology’s Renaissance” in a new paper in the Annual Review of Psychology. Researchers are systematically revisiting bedrock findings, examining new research under new standards, and establishing new methodologies for how research is conducted and validated.
Nelson says he started to suspect something was amiss in social psychology about a decade ago, during weekly meetings he organized with colleagues to discuss scientific journals. “I had noticed that more and more of the comments seemed to boil down to something like, ‘this just does not sound possible…’ A challenge of plausibility is a challenge of truth, and I started to feel as though there wasn’t enough truth in what we were reading.”
In 2011, Nelson joined with Simmons and Simonsohn to publish “False-Positive Psychology,” which identified and pinned the blame on certain practices they dubbed “P-hacking.” The term refers to the P-value, a statistic researchers use to judge whether a result is statistically significant. A P-value below 0.05 (5 percent) is the conventional threshold: it means that if there were no real effect, results at least as extreme as those observed would turn up less than 5 percent of the time by chance alone. In analyzing papers, Nelson noticed that many of them reported P-values just a hair under that limit, suggesting that researchers were slicing and dicing their data to squeak in under the wire with publishers. For example, a wildly outlying data point might be written off as a glitch and tossed, one of many analytic judgment calls the authors described as “researcher degrees of freedom.”
“Basically, P-hacking references a set of often well-intentioned behaviors that lead to the selective reporting of results,” Nelson says. “If scientists report selectively, then they will tend to select only those results that will look most positive.”
These were practices that everybody knew about but rarely confessed. Haas Prof. Don Moore compares the 2011 paper to the child in the story “The Emperor’s New Clothes.” “Nelson’s paper changed the world of social science research,” he says. “After he had the courage to speak the truth, people couldn’t ignore it any more. Everyone knew that everyone else knew, and the emperor had been exposed as naked.”
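To see why selective reporting matters, consider a minimal simulation sketch (our own illustration, not code from Nelson, Simmons, and Simonsohn). It assumes there is no true effect at all, that each study measures a few independent outcomes, and that the population standard deviation is known so a simple two-sample z-test applies; the function names are hypothetical. A study “succeeds” if any outcome reaches P < 0.05, mimicking a researcher who reports only the result that looks best.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def two_sided_p(group_a, group_b, sigma=1.0):
    """Two-sample z-test P-value, assuming a known population SD (a simplification)."""
    se = sigma * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def false_positive_rate(n_outcomes, n_per_group=20, n_studies=4000, alpha=0.05, seed=1):
    """Share of no-effect studies that end up reporting a 'significant' result.

    Both groups are always drawn from the same N(0, 1), so every hit is a
    false positive. Outcomes are simulated as independent (an assumption;
    real measures are usually correlated, which shrinks the inflation).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        for _ in range(n_outcomes):
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            if two_sided_p(a, b) < alpha:
                hits += 1  # report this outcome and stop looking
                break
    return hits / n_studies


honest = false_positive_rate(n_outcomes=1)  # one pre-specified outcome
hacked = false_positive_rate(n_outcomes=3)  # best of three outcomes
print(f"one pre-specified outcome: {honest:.3f}")
print(f"best of three outcomes:    {hacked:.3f}")
```

With a single pre-specified outcome, the false-positive rate should land near the nominal 5 percent. Reporting the best of three pushes it toward 1 − 0.95³, or about 14 percent, even though every individual test is computed correctly.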
The replication challenge
Taking up the challenge, other researchers attempted to reproduce the findings of a number of suspect studies and in many cases were unable to do so. Most famously, Simmons and Simonsohn called into question a small 2010 study co-authored by Dana Carney and Andy Yap, then of Columbia University, and Amy Cuddy, then of Harvard, which found that holding “power poses” could increase risk taking, raise testosterone levels, and lower cortisol levels. The study had a sample of just 42 people, but power posing became a pop-culture phenomenon after Cuddy gave a TED talk that garnered 64 million views.
Lead author Carney, now an associate professor at Haas, joined the skeptics, serving as a reviewer on failed replication attempts and, as evidence mounted, publicly disavowing the findings. In 2016, she posted a statement detailing the problems in the original work, including several points where P-hacking had occurred.
Not surprisingly, the battle over validity in this and other cases got heated. The Association for Psychological Science effort aims to cool the vitriol through comprehensive Registered Replication Reports (RRRs). For each RRR, the original author participates in approving the study protocol, which is registered in advance; the study is then run by dozens of labs, and all of their results are peer-reviewed.
“An operational definition of scientific truth is that if you follow an identical procedure you expect to observe the same result,” says Nelson. “Running replications helps the field keep track of that goal.”
The replication movement now extends far beyond psychology to a wide range of disciplines. In a recent Nature article, “How to Make Replication the Norm,” for example, Haas Prof. Paul Gertler made recommendations for lowering barriers to replication studies in economics research.
For the “professor priming” study, O’Donnell worked with the original author, Dutch psychologist Ap Dijksterhuis, who accepted the results. “Science is supposed to be constantly updating our views and testing things, but in practice it doesn’t always work like that,” says O’Donnell. “I hope something like this will contribute to the process of updating, and encourage the kind of skepticism we are supposed to have.”
Nelson and his co-authors strike a hopeful note in “Psychology’s Renaissance,” concluding that the new scientific practices have dramatically improved their field. They argue for even more transparency, including having researchers pre-register their hypotheses and subject their methods to peer review before running their experiments, something Moore has also been outspoken about. That way, once the data come in, psychologists can be more confident in their validity.