How We’re Duped by Data

Haas faculty lead movement to restore faith in science.

In 2011, Prof. Leif Nelson and two statistics-whiz colleagues from Wharton published an experiment in which participants became 18 months younger after listening to The Beatles’ “When I’m Sixty-Four.” A control group that listened to a bland instrumental piece failed to achieve the same rejuvenating effects. Impossible, right? But that was the point.

The experiment was an academic farce that questioned the status quo of psychology research. By using widely accepted methodology to reach an obviously absurd conclusion, Nelson and his colleagues aimed to show how loose research practices made it all too easy for investigators to prove whatever they set out to prove—even that a pop song can cause reverse aging.

Nelson, who studies judgment and consumer preferences as the Ewald T. Grether Professor of Business Administration and Marketing, had no idea that his eight-page paper, “False Positive Psychology,” would help upend the psychology field, topple famous studies, and send waves throughout the social sciences and beyond. “We wrote it because we cared about the topic, but if you had asked us at the time, we would have said this will simply not be published, full stop,” Nelson says.

Instead, the paper would play a central role in a burgeoning research transparency movement that has found passionate advocates among the faculty at Haas and across the Berkeley campus. The movement is focused on rooting out biases, testing important studies via replication, and—on rare occasions—exposing outright fraud. The aim is to restore the credibility of research in an age when society is swimming in data but public trust in science has eroded. It’s a natural fit for Berkeley, where methodological rigor and values of openness are cultural bedrocks.

“Berkeley has become one of the world’s hubs of activity for issues relating to improving research practices, at Haas and in economics and data science,” says Brian Nosek, a movement leader and co-founder of the Center for Open Science, which hosts the largest platform for research collaboration. “There are few places in the world where there are so many people working on these issues across multiple disciplines.”

At Haas, Nelson, Prof. Don Moore, and others are helping to build tools and infrastructure for new methodologies and training the next generation of researchers and managers in rigorous data analysis practices. Newer faculty, including Asst. Profs. Juliana Schroeder and Ellen Evers, are applying the fresh standards to their own work.

Moore, the Lorraine Tyson Mitchell Chair in Leadership Communication, says he’s optimistic, even thrilled, at the future of science. “Our field is developing tools that are propagating through other fields, into sociology, political science, economics, and medicine,” he says. “They offer the real chance that scientific quality will increase across the board.”

The Perfect Storm

Publish or perish has long been the name of the game in academia, and studies with significant or surprising results are far more likely to be published than research with null results—a phenomenon known as publication bias. What Nelson and his Wharton colleagues, Joseph Simmons and Uri Simonsohn, illustrated in their 2011 paper was that a wide range of generally accepted practices in experimental psychology were allowing researchers’ biases to creep in, often unconsciously.

To make papers publishable, researchers generally need to show—using a measure called the p-value—that there is less than a 5% chance they would see results at least as extreme if no real effect existed. Nelson and his colleagues coined the term “p-hacking” to describe how “researcher degrees of freedom” allow the p-value to be manipulated. Experimenters were removing outlying data points, tweaking variables in pursuit of statistical significance, and reporting sexy results while leaving others in the proverbial “file drawer.” Too often, they were falling prey to their own confirmation bias—interpreting new evidence as confirmation of their existing beliefs—and finding effects where there were none.

P-hacking is a problem, Nelson and his colleagues wrote, because “false positives waste resources: They inspire investment in fruitless research programs and can lead to ineffective policy changes,” concluding that “a field known for publishing false positives risks losing its credibility.”
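
To see how quickly the arithmetic goes wrong, here is a minimal simulation sketch (not drawn from the paper; the two “researcher degrees of freedom” it models, checking a second outcome measure and topping up the sample, are assumptions chosen for simplicity). It runs thousands of two-group experiments in which the true effect is exactly zero. With a single planned test, about 5% of runs come out “significant,” as designed; once the analyst may pick the better of two outcomes and add subjects when the first test falls short, the false-positive rate climbs well above that threshold.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_study(n=20, p_hack=False):
    """Simulate one two-group study in which the true effect is zero.

    Returns True if the analysis reports p < .05 (a false positive).
    With p_hack=True, the 'researcher' exploits two illustrative degrees
    of freedom: testing a second, correlated outcome measure and, if
    nothing is significant, adding 10 subjects per group and re-testing.
    """
    a = rng.normal(size=n)
    b = rng.normal(size=n)
    # A second outcome measure, correlated with the first (r ~ 0.5).
    a2 = 0.5 * a + rng.normal(scale=np.sqrt(0.75), size=n)
    b2 = 0.5 * b + rng.normal(scale=np.sqrt(0.75), size=n)

    if not p_hack:
        # The preregistered analysis: one test, decided in advance.
        return stats.ttest_ind(a, b).pvalue < 0.05

    # Degree of freedom 1: report whichever outcome "works".
    if min(stats.ttest_ind(a, b).pvalue,
           stats.ttest_ind(a2, b2).pvalue) < 0.05:
        return True
    # Degree of freedom 2: not significant yet, so collect more data.
    a = np.concatenate([a, rng.normal(size=10)])
    b = np.concatenate([b, rng.normal(size=10)])
    return stats.ttest_ind(a, b).pvalue < 0.05

sims = 10_000
honest = sum(one_study() for _ in range(sims)) / sims
hacked = sum(one_study(p_hack=True) for _ in range(sims)) / sims
print(f"False-positive rate, preregistered analysis: {honest:.1%}")  # roughly 5%
print(f"False-positive rate, flexible analysis:      {hacked:.1%}")  # well above 5%
```

Even these two modest liberties, applied to data that contain no effect at all, are enough to more than double the advertised 5% error rate, which is why preregistering the analysis plan matters.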

Debates over the manipulation of statistics were nothing new—author Mark Twain famously popularized the line about “lies, damned lies, and statistics.” Psychologist Robert Rosenthal defined the “file-drawer problem” back in 1979. More than a decade later, Berkeley economics Prof. Brad DeLong asked, “Are All Economic Hypotheses False?”

“Everybody knew there were problems decades ago. Some people pointed them out. But there were no solutions put into place, and no one started a social movement,” says Prof. David Levine, the Eugene E. and Catherine M. Trefethen Chair in Business Administration, who may have been the first to try an antidote to p-hacking in economics in 1991. As editor of the journal Industrial Relations, he asked two economists warring over the effects of the minimum wage to detail their analysis plans in advance of receiving data. Only one agreed.

But Nelson’s paper came during a time of growing pop-culture interest in surprising psychological insights. It was a crystallizing moment that Moore compared to the fable “The Emperor’s New Clothes,” because just about every researcher had done some p-hacking—including the paper’s authors—without realizing how much it mattered.

“After [Nelson] had the courage to speak the truth, people couldn’t ignore it anymore. Everyone knew that everyone else knew, and the emperor had been exposed as naked,” Moore says.

Not only did the paper offer practical solutions, such as preregistering detailed research plans to avoid the temptations of flexible data analysis, but it coincided with other events to create a perfect storm. That same year, the prestigious Journal of Personality and Social Psychology published a study that used accepted statistical practices to “prove” that people could predict the future through extrasensory perception, prompting outrage and public criticism.

Meanwhile, on the other side of the Atlantic, Asst. Prof. Evers, then a graduate student, had blown the whistle on a superstar social psychologist who was fabricating data (see “Fighting Fraud”). Soon after, two other prominent U.S. psychology researchers were exposed for fake data. Such blatantly unethical behavior was in an entirely different category than p-hacking, but it underscored the need to bring more transparency to the research and publishing process.

At the same time, researchers had begun noticing that an increasing number of psychology experiments weren’t holding up on the second attempt. Nosek had launched the Reproducibility Project, which enlisted 270 researchers to repeat 100 studies published in the most prominent psychology journals. The results—suggesting that as many as two-thirds of the study findings could not be reproduced—helped trigger a full-scale replication crisis. (A second Reproducibility Project is now targeting 50 cancer biology studies.) In 2012, Nosek launched the Open Science Framework to allow researchers to record, share, and report their work. Nelson, Simmons, and Simonsohn started the AsPredicted website to allow researchers a streamlined way to preregister studies.

Beyond Psychology

The open science movement extends beyond psychology to the social and life sciences. Berkeley economics Prof. Edward Miguel created the field-spanning Berkeley Initiative for Transparency in the Social Sciences (BITSS), which has helped develop new publishing standards now adopted by 1,000 journals. Prof. Paul Gertler wrote in Nature last year about how to make replication the norm in economics.

And Prof. Stefano DellaVigna, a behavioral economist who recently launched a prediction platform to further open science, says cross-disciplinary conversations around methodology are making everyone’s research stronger. He points to efforts by faculty like Assoc. Prof. Dana Carney, who recently invited behavioral economists to a psychology meeting she organized.

“Economics started worrying about some questions of methodology much earlier, requiring things like posting data. But in some ways psychology has leapfrogged ahead,” he says. “Economics has plenty of things still to sort out.”

Crusaders for Open Science

Although everyone agrees on the need for objective, independently verifiable research results, the last few years have involved some bitter academic fights, and there is still plenty of disagreement on how to ensure the best results—including within the walls of Haas. Nelson, Moore, and other faculty in the Marketing and Management of Organizations groups host a weekly journal club that attracts researchers from far and wide to critique new papers and methods.

Nelson and Moore are also involved in large-scale attempts to replicate prior studies and are leaders in training the next generation of researchers. Last spring, they created a PhD seminar in which 18 students tried—and mostly failed—to replicate experiments involving the psychology of scarcity, which claims that poverty affects cognition and decision-making. Michael O’Donnell, a doctoral student of Nelson’s who served as a graduate student instructor, had previously led a 23-lab effort to replicate a study indicating that people scored higher on a trivia test after imagining themselves as a professor. (The original results did not hold up.)

“The stakes are high,” Moore says. “The next generation is holding themselves, and each other, to higher standards.”

Asst. Prof. Schroeder, who studies social cognition, says she’s significantly changed how she works (see p. 30). “Berkeley has really influenced my own research in terms of the quality of the methods,” she says.

All of this is why, despite all the retractions and failures to replicate, leaders in the research transparency movement remain enthusiastic about the future. After all, self-correction is at the heart of the scientific process, and one study has never been enough to prove anything. Theories are proposed, reinforced, and torn down—only to have stronger ones replace them. It’s usually just a bit more incremental.

“Human behavior continues to be an incredibly interesting and important topic to study, and the scientific method continues to be the best way to study it,” says Nelson. “Even if the field took years or decades to improve its practices, the questions will always be worth asking and the answers will just keep getting better.”