They say that hindsight is 20-20, and perhaps nowhere is that more true than in academic research.
“We’ve all had the experience of standing up to present a novel set of findings, often building on years of work, and having someone in the audience blurt out ‘But we knew this already!,’” says Prof. Stefano DellaVigna, a behavioral economist with joint appointments in the Department of Economics and Berkeley Haas. “But in most of these cases, someone would have said the same thing had we found the opposite result. We’re all 20-20, after the fact.”
DellaVigna has a cure for this type of academic Monday morning quarterbacking: a prediction platform to capture the conventional wisdom before studies are run.
Along with colleagues Devin Pope of the University of Chicago’s Booth School of Business and Eva Vivalt of the Research School of Economics at Australian National University, he’s launched a beta website that will allow researchers, PhD students, and even members of the general public to review proposed research projects and make predictions on the outcome.
Making research more transparent
Their proposal, laid out in a new article in Science’s Policy Forum, is part of a wave of efforts to improve the rigor and credibility of social science research. These reforms were sparked by the replication crisis (the failure to reproduce the results of many published studies) and include mass efforts to replicate studies as well as platforms for pre-registering research designs and hypotheses.
“We thought there was something important to be gained by having a record of what people believed before the results were known, and social scientists have never done that in a systematic way,” says DellaVigna, who co-directs the Berkeley Initiative for Behavioral Economics and Finance. “This will not only help us better identify results that are truly surprising, but will also help improve experimental design and the accuracy of forecasts.”
Identifying truly surprising results
Because science builds on itself, people interpret new results based on what they already know. One advantage of the prediction platform is that it would help identify truly surprising results, including null findings, which rarely get published because they typically aren’t seen as significant, the researchers argue.
“The collection of advance forecasts of research results could combat this bias by making null results more interesting, as they may indicate a departure from accepted wisdom,” Vivalt wrote in an article on the proposal in The Conversation.
A research prediction platform will also help gauge how accurate experts actually are in certain areas. For example, DellaVigna and Pope gathered predictions from academic experts on 18 different experiments to determine the effectiveness of “nudges” versus monetary incentives in motivating workers to do an online task. They found that the experts were fairly accurate, that highly cited faculty did no better than other faculty, and that PhD students did the best.
Understanding where there is a general consensus can also help researchers design better research questions, to get at less-well-understood phenomena, the authors point out. Collecting a critical mass of predictions will also open up a new potential research area on whether people update their beliefs after new results are known.
Making a prediction on the platform would require a simple 5-to-15-minute survey, DellaVigna says. The forecasts would be distributed to the researcher after data are gathered, and the study results would be sent to the forecasters at the end of the study.
Berkeley Haas Prof. Don Moore, who has been a leader in advocating for more transparent, rigorous research methods and training the next generation of researchers, says the prediction platform “could bring powerful and constructive change to the way we think about research results. One of its great strengths is that it capitalizes on the wisdom of the crowd, potentially tapping the collective knowledge of a field to help establish a scientific consensus on which new research results can build.”
Note: The “Classified” series spotlights some of the powerful lessons faculty are teaching in Haas classrooms.
As a young researcher, Kristin Donnelly was captivated by the work of social psychologists who published striking insights on human behavior, such as a finding that people walked more slowly after being exposed to the words gray, Florida, and Bingo. That was one of many surprising studies that had crossed into mainstream pop culture, thanks to books like Malcolm Gladwell’s Blink, but there was a problem: No one could reproduce them.
“It was a sad, dark time to enter the field,” says Donnelly, who is now a Berkeley Haas PhD student in behavioral marketing. “I was pursuing similar ideas to people who had these incredible studies, but I couldn’t get any significant results. I became very disillusioned with myself as a researcher.”
Psychology has been rocked by a full-blown replication crisis over the past few years, set off in part by a 2011 paper co-written by Haas Prof. Leif Nelson. It revealed how the publish-or-perish culture, which rewards novel findings but not attempts to replicate others’ work, led researchers to exploit gray areas of data analysis procedures to make their findings appear more significant.
Now Nelson, along with Prof. Don Moore, is working to train a new generation of up-and-comers in methodologies that many see as key to a rebirth of the field. This semester, they’re leading Donnelly and 22 other doctoral students from various branches of psychology in what may be a first for a PhD seminar: a mass replication of studies around one psychological theory, to see how well they hold up.
“We aren’t doing this because we want to take down the literature or attack the original authors. We want to understand the truth,” says Prof. Don Moore, an expert on judgement and decision-making who holds the Lorraine Tyson Mitchell Chair in Leadership and Communication. “There are many forces at work in the scientific publication process that don’t necessarily ensure that what gets published is also true. And for scientists, truth is at the top of the things we ought to care about.”
Examining the psychology of scarcity
The theory they’re examining is the “psychology of scarcity,” or the idea that being poor or having fewer resources actually impairs thinking. Moore and Nelson chose it not because of an inherent flaw, but because it’s relatively new (defined by a 2012 paper), high profile, and relevant to the students’ interests. Each student was randomly assigned a published study, and, after reaching out to the original researchers for background details, is attempting to replicate it. Results will be combined in a group paper.
“At Berkeley, we’re at the epicenter of this new methodological and statistical scrutiny, and as a young researcher I want to do good work that will replicate,” says Stephen Baum, also a PhD student in behavioral marketing at Haas. “Most people were willing to take things at face value before 2011. Things have changed, and we all have to do better.”
Moore and Nelson are leaders in the growing open science movement, which advocates for protocols to make research more transparent. Nelson, along with Joseph Simmons and Uri Simonsohn of Wharton, coined the term “P-hacking” in 2011 to describe widespread practices that had been within researchers’ discretion: removing data outliers, selectively reporting data while “file drawering” other results, or stopping data collection when a threshold was reached. These practices, they argued, made it all too tempting to manipulate data in pursuit of a P-value less than 0.05. A P-value below 0.05 means that, if there were no real effect, results at least as extreme as those observed would turn up by pure chance less than 5% of the time. It’s the standard for demonstrating statistical significance and the threshold for getting published.
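To see why stopping data collection once a threshold is reached matters, consider a minimal simulation sketch. It is an illustration only, not the analysis from the 2011 paper: the function names, the sample sizes, the five "looks" at the data, and the simplifying assumption of a known standard deviation of 1 are all hypothetical choices made for this example. Every simulated study compares two groups drawn from the same distribution, so any "significant" result is a false positive.

```python
import math
import random


def two_sided_p(mean_diff, n_per_group):
    """Two-sided z-test p-value for a difference in group means.

    Assumes both groups have a known standard deviation of 1
    (a simplification chosen for this sketch).
    """
    z = mean_diff / math.sqrt(2.0 / n_per_group)
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rates(n_studies=4000, max_n=50, step=10, alpha=0.05, seed=0):
    """Return (fixed_rate, peeking_rate) when there is no true effect.

    fixed:   test once, after all max_n participants per group are in.
    peeking: test after every `step` participants per group and declare
             success the first time p < alpha.
    max_n should be a multiple of step so the last look uses all the data.
    """
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_studies):
        sum_a = sum_b = 0.0
        significant_at_some_look = False
        p = 1.0
        for n in range(1, max_n + 1):
            # Both groups come from the same distribution: no real effect.
            sum_a += rng.gauss(0.0, 1.0)
            sum_b += rng.gauss(0.0, 1.0)
            if n % step == 0:
                p = two_sided_p(sum_a / n - sum_b / n, n)
                if p < alpha:
                    # A peeking researcher would stop and report here.
                    significant_at_some_look = True
        # p now holds the result of the final look, at n == max_n.
        if p < alpha:
            fixed_hits += 1
        if significant_at_some_look:
            peeking_hits += 1
    return fixed_hits / n_studies, peeking_hits / n_studies


if __name__ == "__main__":
    fixed, peeking = false_positive_rates()
    print(f"single planned test:    {fixed:.3f}")
    print(f"stop at first p < 0.05: {peeking:.3f}")
```

Under these assumptions, the single planned test should produce false positives at close to the nominal 5% rate, while the stop-when-significant strategy produces them noticeably more often, even though nothing about the data was fabricated.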
Building confidence through pre-registration
At a recent session of their PhD seminar, Moore and Nelson led a discussion of one of the key ways to combat P-hacking: pre-registering research studies. It sounds arcane, but it’s simply the grown-up equivalent of what grade-school teachers require students to do before starting on their science fair project: Write out a detailed plan, including the questions to be answered, hypothesis, and study design, with key variables to be observed.
“How many of you are working with faculty who pre-register all their studies?” asks Nelson, a consumer psychologist in the Haas Marketing Group and the Ewald T. Grether Professor in Business Administration and Marketing. Less than half the class raises their hands.
Nelson and Moore estimate that only about 20% of psychology studies are now pre-registered, but they believe it will soon become a baseline requirement for getting published—as it has become in medical research. Although there’s no real enforcement body, the largest pre-registration portal, run by Brian Nosek of the Center for Open Science, creates permanent timestamps on all submissions so they can’t be changed later. Nelson co-founded his own site, AsPredicted, which now gets about 40 pre-registration submissions per day. It’s patrolled by a fraud-detecting robot named Larry that dings researchers for potential cheats like submitting multiple variations of the same study.
“Without pre-registration, statistics are usually, if not always, misleading,” Moore tells students. “They aren’t entirely worthless, but they’re worth less.”
Gold Okafor, a first-year PhD student studying social and personality psychology, says she plans to pre-register all her future studies. Though it requires a bit more work up front, it may save time in the end. “I think if you don’t use some of these methods, you could be called out and have your work questioned,” she says.
Students are also learning techniques such as P-curving, which is a way to determine the strength of a study’s results and whether data manipulation may have occurred. They’re also learning from guest lectures from other open science leaders, including Economics Prof. Ted Miguel and UC Davis Psychology Prof. Simine Vazire, who edits several journals.
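The intuition behind P-curving can be sketched in a few lines. This is a simplified illustration, not the published P-curve procedure: the function names, the effect size of 0.5, the 50 participants per group, and the known standard deviation of 1 are assumptions made for the example. The idea is to look only at statistically significant results and ask how their P-values are distributed. When there is no effect, significant P-values are spread evenly between 0 and 0.05, so about half fall below 0.025. When there is a real effect, they bunch up near zero.

```python
import math
import random


def p_from_z(z):
    """Two-sided p-value for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def share_below_half_alpha(effect_size, n_per_group=50, n_studies=20000,
                           alpha=0.05, seed=1):
    """Among significant results, return the share with p < alpha / 2.

    Each simulated study compares two groups with a known SD of 1, so its
    z statistic is drawn directly from Normal(effect_size / SE, 1), where
    SE = sqrt(2 / n_per_group). This shortcut is an assumption of the sketch.
    """
    rng = random.Random(seed)
    shift = effect_size / math.sqrt(2.0 / n_per_group)
    p_values = (p_from_z(rng.gauss(shift, 1.0)) for _ in range(n_studies))
    significant = [p for p in p_values if p < alpha]
    if not significant:
        raise ValueError("no significant results to examine")
    return sum(p < alpha / 2 for p in significant) / len(significant)


if __name__ == "__main__":
    print(f"no true effect:   {share_below_half_alpha(0.0):.2f}")
    print(f"true effect 0.5:  {share_below_half_alpha(0.5):.2f}")
```

A set of published results whose significant P-values instead pile up just under 0.05 looks like neither pattern, which is the kind of signal that can raise questions about how the data were analyzed.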
The bedrock of the scientific method
Then there’s reproducibility, one of the bedrocks of the scientific method and the heart of the course. The Association for Psychological Science now promotes systematic replications, where multiple researchers around the world all re-create the same study. (PhD student Michael O’Donnell, who is assisting Nelson and Moore in teaching the course, recently led one such effort that cast doubt on a study finding that people who were asked to imagine themselves as a “professor” scored higher on a trivia quiz than those who imagined themselves as a “soccer hooligan.”)
Baum, the marketing student, will be replicating a psychology of scarcity study that was published in the flagship journal Psychological Science. The researchers asked people to recall a time when they felt uncertain about their economic prospects, and then write about how much pain they were experiencing in their body at that moment. The finding was that those people reported feeling more pain than those in a control group prompted to recall a time when they felt certain about their economic prospects.
“If it replicates, I will be surprised, but I’ve been wrong before,” Baum says.
No matter what the results, the replications will offer important new insights into the psychology of scarcity—important to understand in a society plagued by growing inequality, Moore says. Beyond the one theory, the fact that the course has the highest enrollment of any PhD seminar he’s ever taught gives Moore great hope for the future.
“The stakes are high,” he says. “The most courageous leaders in the open science revolution have been young people—it’s the doctoral students and junior faculty members who have led the way. The next generation will be holding themselves, and each other, to higher standards.”
Donnelly is a case in point. “This whole movement has made me a better researcher. I’ve changed what questions I ask, I changed how I ask them, and I changed how I work,” she says. “It’s a brave new world, and we may be able to lay the foundation of a new science that will build on itself.”
In a famous psychological study performed 20 years ago, researchers gave two groups of participants a trivia test. Beforehand, they asked members of the first group to imagine what their daily lives would be like if they were a “professor”; they asked a second group to imagine their lives as a “soccer hooligan.” Amazingly, the first group scored 13 percent higher on the test.
“That’s a huge effect,” says Haas doctoral candidate Michael O’Donnell, who says the study also had a huge effect on the field of psychology. Cited more than 800 times to date, the study helped establish the concept of behavioral priming, the theory that people’s behavior can be influenced by subtle suggestions. “For psychologists, it’s seductive to think that we can address these weighty issues through simple manipulations,” says O’Donnell, PhD 2019.
Seductive maybe, but in this particular case, it turns out not to be true. Working with Haas Prof. Leif Nelson, O’Donnell recently ran an exhaustive experiment to try and replicate the “professor priming” study. Their effort—one of the first to answer a call from the Association for Psychological Science for systematic replication studies—involved 23 labs and more than 4,400 participants around the world.
Depending on the lab, the “professor” group scored anywhere from 4.99 percent higher to 4.24 percent lower on the test. On average, they scored just 0.14 percent better, essentially a statistically insignificant result. “It was 1/100th the size of the original effect,” O’Donnell says. “That’s pretty strong evidence there was no effect.”
Crisis of conscience
Over the past eight years, the field of social psychology has been undergoing something of a crisis of conscience, sparked by the work of Nelson and other skeptics who began calling into question research results that seemed too good to be true. It’s grown into a full-blown movement, which Nelson and co-authors Joseph Simmons and Uri Simonsohn of Wharton refer to as “Psychology’s Renaissance” in a new paper in Annual Review of Psychology. Researchers are systematically revisiting bedrock findings, examining new research under new standards, and establishing new methodologies for how research is conducted and validated.
Nelson says he started to suspect something was amiss in social psychology about a decade ago, during weekly meetings he organized with colleagues to discuss scientific journals. “I had noticed that more and more of the comments seemed to boil down to something like, ‘this just does not sound possible…’ A challenge of plausibility is a challenge of truth, and I started to feel as though there wasn’t enough truth in what we were reading.”
In 2011, Nelson joined with Simmons and Simonsohn to publish “False-Positive Psychology,” which identified and pinned the blame on certain practices they dubbed “P-hacking.” The term refers to the P-value, a calculation that researchers use to judge whether a result is statistically significant. A P-value of less than 5 percent is the gold standard, meaning that if there were no real effect, results at least as extreme as those observed would occur by pure chance less than 5 percent of the time. In analyzing papers, Nelson noticed that many of them had P-values just a hair under that limit, implying that researchers were slicing and dicing their data and squeaking in under the wire with publishers. For example, a wildly outlying data point might be seen as a glitch and tossed, a practice accepted as one of the “researcher degrees of freedom.”
“Basically, P-hacking references a set of often well-intentioned behaviors that lead to the selective reporting of results,” Nelson says. “If scientists report selectively, then they will tend to select only those results that will look most positive.”
These were practices that everybody knew about but rarely confessed. Haas Prof. Don Moore compares the 2011 paper to the child in the story “The Emperor’s New Clothes.” “Nelson’s paper changed the world of social science research,” he says. “After he had the courage to speak the truth, people couldn’t ignore it any more. Everyone knew that everyone else knew, and the emperor had been exposed as naked.”
The replication challenge
Taking up the challenge, other researchers attempted to reproduce the findings of a number of suspect studies, and in many cases, were unable to do so. Most famously, Simmons and Simonsohn called into question a small 2010 study co-authored by Dana Carney and Andy Yap, then of Columbia University, and Amy Cuddy, then of Harvard, which found that holding “power poses” could increase risk taking, increase testosterone levels, and lower cortisol levels. The study had involved a sample size of just 42 people, but power posing had become a pop-culture phenomenon after Cuddy created a TED talk that garnered 64 million views.
Lead author Carney—now an associate professor at Haas—joined the skeptics, serving as a reviewer on failed replication attempts, and as evidence mounted, publicly disavowing the findings. In 2016, she posted a statement detailing the problems in the original work, including several points where P-hacking had occurred.
Not surprisingly, the battle over validity in this and other cases got heated. The effort by the Association for Psychological Science aims to cool down the vitriol through comprehensive Registered Replication Reports (RRRs). For each RRR, the original author participates in approving the protocol for the study, which is registered beforehand and then performed by dozens of labs, all of which have their results peer-reviewed.
“An operational definition of scientific truth is that if you follow an identical procedure you expect to observe the same result,” says Nelson. “Running replications helps the field keep track of that goal.”
The replication movement now extends far beyond psychology, across a wide range of disciplines. For example, in a recent Nature article, “How to Make Replication the Norm,” Haas Prof. Paul Gertler made recommendations on how to lower barriers for replication studies in economics research.
For the “professor priming” study, O’Donnell worked with the original author, Dutch psychologist Ap Dijksterhuis, who accepted the results. “Science is supposed to be constantly updating our views and testing things, but in practice it doesn’t always work like that,” says O’Donnell. “I hope something like this will contribute to the process of updating, and encourage the kind of skepticism we are supposed to have.”
Nelson and his co-authors strike a hopeful note in “Psychology’s Renaissance,” concluding that the new scientific practices have dramatically improved their field. They argue for even more transparency, including that researchers pre-register their hypotheses and subject their methods to peer review before they run their experiments, something that Moore has also been outspoken about. That way, once the data come in, psychologists can be confident about their validity.