Magazine Fall 2019

Table of Contents

How We’re Duped by Data

Haas faculty lead movement to restore faith in science.

Featured Researcher

Leif Nelson

Professor in Business Administration & Marketing, Marketing Group

Featured Researcher

Don A. Moore

Professor, Management of Organizations

Featured Researcher

Ellen Evers

Assistant Professor, Marketing

Featured Researcher

Juliana Schroeder

Associate Professor, Management of Organizations

LAURA COUNTS

In 2011, Prof. Leif Nelson and two statistics-whiz colleagues from Wharton published an experiment in which participants became 18 months younger after listening to The Beatles’ “When I’m Sixty-Four.” A control group that listened to a bland instrumental piece failed to achieve the same rejuvenating effects. Impossible, right? But that was the point.

The experiment was an academic farce that questioned the status quo of psychology research. By using widely accepted methodology to reach an obviously absurd conclusion, Nelson and his colleagues aimed to show how loose research practices made it all too easy for investigators to prove whatever they set out to prove—even that a pop song can cause reverse aging.

Nelson, who studies judgment and consumer preferences as the Ewald T. Grether Professor of Business Administration and Marketing, had no idea that his eight-page paper, “False Positive Psychology,” would help upend the psychology field, topple famous studies, and send waves throughout the social sciences and beyond. “We wrote it because we cared about the topic, but if you had asked us at the time, we would have said this will simply not be published, full stop,” Nelson says.

Instead, the paper would play a central role in a burgeoning research transparency movement that has found passionate advocates among the faculty at Haas and across the Berkeley campus. The movement is focused on rooting out biases, testing important studies via replication, and—on rare occasions—exposing outright fraud. The aim is to restore the credibility of research in an age when society is swimming in data but public trust in science has eroded. It’s a natural fit for Berkeley, where methodological rigor and values of openness are cultural bedrocks.

“Berkeley has become one of the world’s hubs of activity for issues relating to improving research practices, at Haas and in economics and data science,” says Brian Nosek, a movement leader and co-founder of the Center for Open Science, which hosts the largest platform for research collaboration. “There are few places in the world where there are so many people working on these issues across multiple disciplines.”

At Haas, Nelson, Prof. Don Moore, and others are helping to build tools and infrastructure for new methodologies and training the next generation of researchers and managers in rigorous data analysis practices. Newer faculty, including Asst. Profs. Juliana Schroeder and Ellen Evers, are applying the fresh standards to their own work.

Moore, the Lorraine Tyson Mitchell Chair in Leadership Communication, says he’s optimistic, even thrilled, at the future of science. “Our field is developing tools that are propagating through other fields, into sociology, political science, economics, and medicine,” he says. “They offer the real chance that scientific quality will increase across the board.”

The Perfect Storm

Publish or perish has long been the name of the game in academia, and studies with significant or surprising results are far more likely to be published than research with null results—a phenomenon known as publication bias. What Nelson and his Wharton colleagues, Joseph Simmons and Uri Simonsohn, illustrated in their 2011 paper was that a wide range of generally accepted practices in experimental psychology were allowing researchers’ biases to creep in, often unconsciously.

To make papers publishable, researchers generally need to show—using a measure called the p-value—a less-than 5% likelihood that their results are due to random chance. Nelson and his colleagues coined the term “p-hacking” to describe how “researcher degrees of freedom” allow the p-value to be manipulated. Experimenters were removing outlying data points, tweaking variables in pursuit of statistical significance, and reporting sexy results while leaving others in the proverbial “file drawer.” Too often, they were falling prey to their own confirmation bias—interpreting new evidence as confirmation of their beliefs—and finding effects where there were none.

P-hacking is a problem, Nelson and his colleagues wrote, because “false positives waste resources: They inspire investment in fruitless research programs and can lead to ineffective policy changes,” concluding that “a field known for publishing false positives risks losing its credibility.”

Debates over the manipulation of statistics were nothing new—author Mark Twain famously warned of “lies, damned lies, and statistics.” Psychologist Robert Rosenthal defined the “file-drawer problem” back in 1979. Ten years later, Berkeley economics Prof. Brad DeLong asked, “Are All Economic Hypotheses False?”

“Everybody knew there were problems decades ago. Some people pointed them out. But there were no solutions put into place, and no one started a social movement,” says Prof. David Levine, the Eugene E. and Catherine M. Trefethen Chair in Business Administration, who may have been the first to try an antidote to p-hacking in economics in 1991. As editor of the journal Industrial Relations, he asked two economists warring over the effects of the minimum wage to detail their analysis plans in advance of receiving data. Only one agreed.

But Nelson’s paper came during a time of growing pop-culture interest in surprising psychological insights. It was a crystallization moment that Moore compared to the fable “The Emperor’s New Clothes,” because just about every researcher had done some p-hacking— including the papers’ authors—without realizing how much it mattered.

“After [Nelson] had the courage to speak the truth, people couldn’t ignore it anymore. Everyone knew that everyone else knew, and the emperor had been exposed as naked,” Moore says.

Not only did the paper offer practical solutions, such as preregistering detailed research plans to avoid the temptations of flexible data analysis, but it coincided with other events to create a perfect storm. That same year, the prestigious Journal of Personality and Social Psychology published a study that used accepted statistical practices to “prove” that people could predict the future through extrasensory perception, prompting outrage and public criticism.

“Everybody knew there were problems decades ago. Some people pointed them out. But there were no solutions put into place, and no one started a social movement.”

Meanwhile, on the other side of the Atlantic, Asst. Prof. Evers, then a graduate student, had blown the whistle on a superstar social psychologist who was fabricating data (see “Fighting Fraud”). Soon after, two other prominent U.S. psychology researchers were exposed for fake data. Such blatantly unethical behavior was in an entirely different category than p-hacking, but it underscored the need to bring more transparency to the research and publishing process.

At the same time, researchers had begun noticing that an increasing number of psychology experiments weren’t holding up on the second attempt. Nosek had launched the Reproducibility Project, which enlisted 270 researchers to repeat 100 studies published in the most prominent psychology journals. The results—suggesting that as many as two-thirds of the study findings could not be reproduced—helped trigger a full-scale replication crisis. (A second Reproducibility Project is now targeting 50 cancer biology studies.) In 2012, Nosek launched the Open Science Framework to allow researchers to record, share, and report their work. Nelson, Simmons, and Simonsohn started the AsPredicted website to allow researchers a streamlined way to preregister studies.

Beyond Psychology

The open science movement extends beyond psychology to the social and life sciences. Berkeley Economics Prof. Edward Miguel created the field-spanning Berkeley Initiative for Transparency in the Social Sciences (BITSS), which has helped develop new publishing standards now adopted by 1,000 journals. Prof. Paul Gertler wrote in Nature last year about how to make replication the norm in economics.

And Prof. Stefano DellaVigna, a behavioral economist who recently launched a prediction platform to further open science, says cross-disciplinary conversations around methodology are making everyone’s research stronger. He points to efforts by faculty like Assoc. Prof. Dana Carney, who recently invited behavioral economists to a psychology meeting she organized.

“Economics started worrying about some questions of methodology much earlier, requiring things like posting data. But in some ways psychology has leapfrogged ahead,” he says. “Economics has plenty of things still to sort out.”

Crusaders for Open Science

Although everyone agrees on the need for objective, independently verifiable research results, the last few years have involved some bitter academic fights, and there is still plenty of disagreement on how to ensure the best results—including within the walls of Haas. Nelson, Moore, and other faculty in the Marketing and Management of Organizations groups host a weekly journal club that attracts researchers from far and wide to critique new papers and methods.

Nelson and Moore are also involved in large- scale attempts to replicate prior studies and are leaders in training the next generation of researchers. Last spring, they created a PhD seminar in which 18 students tried—and mostly failed—to replicate experiments involving the psychology of scarcity, which claims that poverty affects cognition and decision-making. Michael O’Donnell, a doctoral student of Nelson’s who served as a graduate student instructor, had previously led a 23-lab effort to replicate a study indicating that people performed higher on a trivia test after imagining themselves as a professor. (The original results did not hold up.)

“The stakes are high,” Moore says. “The next generation is holding themselves, and each other, to higher standards.”

Asst. Prof. Schroeder, who studies social cognition, says she’s significantly changed how she works (see p. 30). “Berkeley has really influenced my own research in terms of the quality of the methods,” she says.

All of this is why, despite all the retractions and failures to replicate, leaders in the research transparency movement remain enthusiastic about the future. After all, self-correction is at the heart of the scientific process, and one study has never been enough to prove anything. Theories are proposed, reinforced, and torn down—only to have stronger ones replace them. It’s usually just a bit more incremental.

“Human behavior continues to be an incredibly interesting and important topic to study, and the scientific method continues to be the best way to study it,” says Nelson. “Even if the field took years or decades to improve its practices, the questions will always be worth asking and the answers will just keep getting better.”

Whether analyzing the results of an ad campaign, social media push, or customer survey, it’s tempting to select only those metrics that show a desired outcome. “People are going to try to present the most exciting conclusion,” says Prof. Leif Nelson. “Leaders need to look past that and play the role of the wet blanket.” Prof. Don Moore advises learning how conclusions were reached. “Ask how evidence was generated. Did their hypothesis precede data collection?” he says. Both suggest these questions when someone presents you with data they claim proves something.

1. How did you decide how much data you would collect?

2. Did any other analyses yield different results?

3. Did you measure any other variables worth discussing?

4. Did these results surprise you or were they expected at the outset?

A 2017 paper published in Psychological Science by Asst. Prof. Juliana Schroeder found that people are less likely to dehumanize a person with opposing political views when they hear them speak, rather than reading what they say.

Coming just after the polarizing 2016 election, several top-tier media outlets reported on the study. The Washington Post declared, “Science shows why it’s important to speak—not write—to people who disagree with you.”

So how did Schroeder and her co-authors ensure their findings were as scientific and impartial as possible? Their procedures were different than what they would have done even a couple of years prior.

1. They preregistered their predictions and analysis plan on the Open Science Framework (OSF) before collecting data.

2. They conducted four experiments with sample sizes of several hundred people each, explaining when they planned
to stop collecting data and describing which participants were excluded from analysis and why.

3. They reported all measured variables and experimental conditions.

4. After finding in the first two experiments that people viewed those with opposing political views as more thoughtful, competent, and rational on audio or video than in written text, they conducted a preregistered replication, but with a larger sample size and using people’s views on the presidential candidates, on the weekend before the 2016 election.

5. They reported statistical effect sizes and 95% confidence intervals for all analyses. They confirmed their main hypothesis with <0.1% chance the results were random and directed readers to interpret weaker, though still significant, effects with caution.

6. They reported variations on coding their data and identified any analyses that weren’t preregistered as exploratory.

7. They posted all materials, including surveys, data, and transcripts (within the bounds of privacy rules) on the OSF.

Data concerns prevalent in academia—succumbing to biases, capitalizing on chance, and flexible analysis—readily occur in business. Two MBAs describe how lessons learned at Haas inform their work.

Kim Ayers, MBA 18
Associate, McKinsey

“When my team and I are quickly moving forward with a hypothesis, I try to take a moment for divergent thinking, to look at the facts and data in front of us to evaluate if we are falling prey to confirmation bias. What might we be missing? What other explanatory factors might be at play that are not captured in our data? What data would I need in order to confirm that an alternative hypothesis is not true?”

Sally Darby, MBA 17
UX Researcher, Lyft

“I regularly collect and interpret quantitative and qualitative data to understand customer behavior and to improve products and services. Challenging my assumptions and establishing hypotheses before conducting a study—things I learned at Haas—help me avoid confirmation bias, which is critical to identify true areas of opportunity.”

As an undergraduate at The Netherlands’ Tilburg University, Asst. Prof. Ellen Evers landed a research assistant position to prominent social psychologist Diederik Stapel—known for studies showing that trash-filled environments bring out people’s racist tendencies or that carnivores are more selfish than vegetarians.

“He ran amazing experiments that turned out perfectly every time, and somehow all his students’ work turned out perfectly as well,” says Evers, who continued working for Stapel as a master’s student.

Over time, she noticed anomalies in his work and became convinced that his data—and data he routinely collected for his students—was counterfeit. Joining with a fellow student and a junior faculty member, Evers painstakingly collected enough evidence to report Stapel, who was then dean of the school, to their department head.

It was a bold, career-risking move, but the university immediately suspended Stapel and launched an investigation. Stapel confessed to decades of fraud and dozens of his articles were retracted. Some of his former students, whose PhDs were based on fake data, lost professorships or had to complete new dissertations. The case stoked calls for research transparency, including open posting of data. It also shaped Evers into a researcher who never takes anything at face value.

When Asst. Prof. Ellen Evers asked students to design the most “successful” Facebook ad, two groups claimed victory: one with the most likes, another the most shares. Evers had proved a point: If you don’t clearly define a question up front, it’s easy to be self-serving interpreting data.

That is one key problem solved by preregistration, when researchers post their hypotheses and plans for data collection and analysis before beginning a study, limiting the possibility of flexibly analyzing data in search of the most interesting results.

Preregistration was first required by the FDA. “The drug companies would run 20 trials, hide 19, and publish the one they liked,” says Prof. David Levine. “Preregistration took care of the file-drawer problem.”

Prof. Don Moore says the same applies in business. “Without preregistration, statistics are usually, if not always, misleading,” he says. “To discover the truth, commit to a plan before you have any data in hand.”

Posted in:

Research

Topics:

Economics Marketing

How We’re Duped by Data

The Perfect Storm

Beyond Psychology

Crusaders for Open Science

DATA 101 FOR LEADERS: AVOID CHERRY-PICKING

HOW ONE RESEARCHER MAKES HER WORK MORE TRANSPARENT

FENDING OFF CONFIRMATION BIAS AT WORK

FIGHTING FRAUD

IN ANALYZING DATA, STICK TO A PLAN

How We’re Duped by Data

The Perfect Storm

Beyond Psychology

Crusaders for Open Science

DATA 101 FOR LEADERS: AVOID CHERRY-PICKING

HOW ONE RESEARCHER MAKES HER WORK MORE TRANSPARENT

FENDING OFF CONFIRMATION BIAS AT WORK

FIGHTING FRAUD

IN ANALYZING DATA, STICK TO A PLAN

More from this Issue