This article is published jointly with Vanderbilt University’s Owen Graduate School of Management.
When a high-profile disaster occurs—from the BP Deepwater Horizon spill to Pacific Gas & Electric’s San Bruno pipeline explosion—the public scramble for answers and accountability begins. Oftentimes, among the teams of investigators called in from law enforcement and government agencies, you’ll find organizational behavior experts Berkeley Haas Prof. Emeritus Karlene Roberts or Vanderbilt Owen’s Prof. Rangaraj Ramanujam.
That’s because about three decades ago, researchers at Berkeley pioneered a new way to understand man-made disasters, looking beyond human error and technical glitches to the organizational causes of catastrophes. Roberts was one of these trailblazers—her early aha moments came on Navy ships, where she observed a culture and systems that allowed for risky, technical work while minimizing errors.
A new field was born: the study of high reliability, and the practices that “highly reliable organizations” use to avoid disasters before they start. Researchers went on to apply this new lens to the study of nuclear power plants, commercial aviation, utilities, the health care system, and other industries.
Roberts, chair of UC Berkeley’s Center for Catastrophic Risk Management, and Ramanujam, a leading researcher in the field who specializes in health care systems, decided it was time to take stock of the past 30 years of research. The new book they co-edited, “Organizing for Reliability: A Guide for Research and Practice,” was released last month.
We sat down with Roberts and Ramanujam to get a better understanding of the field, the special qualities of highly reliable organizations, and the work that has taken them to the aftermath of disasters around the world.
Q: First off, what is a “highly reliable organization”?
KR: The original definition was an organization that operates in technically demanding conditions and prevents errors that lead to catastrophic consequences. My former Berkeley Haas dean, Ray Miles, once said to me, “High reliability is nothing but good management.” And I’d agree with that in part, but it’s good management in a particular direction. It’s not the same as saying “Our production was much higher this week.” It’s saying, “Our production levels are fine and we did it in a very safe manner.”
RR: Originally, there were two main features that made an organization highly reliable: it had an extended track record of avoiding errors and adverse outcomes, and it accomplished this despite operating in environments that were extremely challenging, where you’d have expected to see far more errors and adverse outcomes.
Q: How did this field get started?
KR: Accident research, as it was in those years, was mostly about slips, trips and falls. Those are things individuals do, and they’re usually linked to technological issues, such as stairs that weren’t built well, rather than any organizational process. This approach is different from that: if you see a really good thing in the organization going on consistently, then you have to look deep below the surface to see how that happens. You need to look at the individual embedded in the organization. How are pilots able to consistently land on aircraft carriers and rarely crash? That’s where I started off. You have to look into the culture, the decision-making, the communication, the training.
“If you see a really good thing in the organization going on consistently, then you have to look deep below the surface to see how that happens.” —Karlene Roberts
RR: There had been prior work which did look at an organizational approach to accidents, but without a doubt Karlene was one of the pioneers in drawing attention to the exceptional ability of some organizations to remain so highly reliable over such a long period of time.
Expanding the definition of Highly Reliable Organizations
Q: How has this definition changed over time?
RR: It continues to evolve and expand. The original focus was on reliability-attaining organizations, whereas it’s pretty clear right now most organizations are seeking reliability without always attaining it. The definition has also expanded beyond preventing major accidents to include recovering effectively from accidents and shocks. More and more industries face a shrinking public tolerance for error and are coming to terms with reliability as an imperative.
Q: Could you give a couple of examples of the most successful HROs, and some that are not?
KR: Commercial aviation, as an industry, has been successful. There have been accidents, yes, but you couldn’t sell airline tickets if planes kept dropping out of the sky. Ranga can speak to the healthcare industry—I think there’s minimal success there.
RR: I would say that healthcare as a whole cannot be characterized as a highly reliable organization, but there are pockets or islands of high reliability for sure. For example, anesthesiology has been ahead of the curve when it comes to minimizing harm from preventable errors. I study patient safety and medical errors, so people sometimes ask me, when they have a family member who is in the hospital for a serious procedure, “What should we do to make sure they are safe?” My answer is, “If you’re very concerned about the risks from a complicated procedure, that’s probably the part that is much better managed from a reliability viewpoint. Be much more alert to the seemingly simple parts of post-op care, such as medication administration and hand-washing compliance.” An especially frequent kind of medical error is the failure of the system to provide the right dose of the right drug to the right patient at the right time. The point here is you could have an organization with some parts that are highly reliable and some that are not. Reliability is highly local.
Q: That doesn’t give me a lot of confidence in the healthcare system. What sets these organizations apart from other organizations? What do they have in common?
RR: Actually, healthcare has been one industry that has been especially receptive to new ideas for enhancing reliability, and therefore, patient safety. In fact, the edited volume has an entire chapter about reliability in healthcare. So, the situation in healthcare in the U.S. is much more encouraging now. As for your question about what is different or distinctive about highly reliable organizations, I’d first of all note they have an explicit organizational or team-level commitment to outcomes of reliability, such as safety. The second is they put a lot of emphasis on training and deliberately cultivating practices within teams so that they are continuously aware of the situation around them and very alert to the possibility of risk.
Mindful organizing
Q: One concept that’s highlighted in the book is “mindful organizing.” What is it?
RR: The idea of mindful organizing originated in the work that Karlene did with Karl Weick aboard aircraft carriers, in what is now a classic paper in the field called “Collective Mind in Organizations: Heedful Interrelating on Flight Decks.” It’s a very highly cited and widely admired piece about what enables nuclear aircraft carrier operations to be highly reliable. In it, Karl and Karlene pointed to the quality of interactions among team members as an important part of the answer. Later on, Karl Weick and his colleague Kathy Sutcliffe (she wrote a chapter in our volume) continued to formally study and refine the idea of mindful organizing. In essence, they have identified five distinct collective practices that constitute mindful organizing: a preoccupation with failure; a reluctance to simplify interpretations; sensitivity to operations, which means a good awareness of who knows what, in real time; commitment to resilience; and deference to expertise, rather than to authority. Karlene’s paper on “heedful interrelating” has a memorable example about deference to expertise.
Deference to expertise over authority
KR: I was just getting used to being aboard ships at the time, so this struck me. On aircraft carriers, planes land very rapidly. Unlike in commercial aviation where planes slow down to land, these guys speed up, because they’re going to get caught by the arresting gear. This pilot was coming in full steam ahead and suddenly the landing was called off. He pulled up and got out. Well, there was a kid on the deck who waved him off because he found a tool left on the deck. This was the lowest-level guy on the deck who called off the landing. What happened next is the air boss, who controls the aviation tower, shouts out over a loudspeaker to get this kid up to the tower quickly. I was thinking, “I’m going to get to witness this kid being drummed out of the Navy.” The 18-year-old shows up in the tower shaking, and I’m trying to be a fly on the wall watching the whole thing. The air boss congratulated him and told him he did a wonderful job, and he rewarded all the kid’s buddies on the deck as well. If that tool had gotten sucked up into an engine, it was going to cause a pretty dramatic accident. The engine could have been destroyed, the plane could have crashed. If the enemy had been near, they would have taken advantage of it.
It was very powerful. I saw that stuff over and over and over again in the time I spent on Navy ships. And until I understood how the organization really worked I was surprised by it. So that is one of the fundamentals that feed into the mindful organizing features: deference to the person who knows what is going on.
The importance of citizenship behaviors
Q: Wow, really interesting. Ranga, you’ve also talked about how the quality of social interactions affects the reliability of the outcomes.
RR: Yes, it takes lots of things for operations to be reliable. Clearly, technology matters a lot, design matters a lot, procedures matter a lot. But the reality is, even the best-designed technology, when put into operation, can produce situations which cannot be anticipated. Therefore, you depend on people to respond collectively rather than as individuals. If all everyone did was follow and comply with the rules, lots of things would go wrong. As systems become more technologically complex, people must consistently go beyond the call of duty. That’s what we call citizenship behaviors, and people are much more likely to do it when they are part of a team because of team cohesion or motivation.
I did a study with a collaborator on patient safety in hospitals and the concept of silence—specifically the silence of nurses. Oftentimes frontline providers observe something that they think is unsafe or potentially harmful, but they choose not to speak up because they think they have low status. And their failure to speak up can lead to a very bad outcome. Silence is not passive. It is a voluntary choice. We found that nurses were more likely to choose to remain silent if they felt their manager was procedurally unfair. In this sense, a safe culture is also a fair culture.
KR: That’s why if I were to list the top predictors of reliability, I would put communication right at the very top of that kind of list. Open communication is extremely important.
RR: It’s communication in two ways. One is the voluntary speaking up of every individual in the system, and the other is communication within and among teams. Number two I would say is respectful interactions. I know it sounds very soft or fluffy, but I do really think that the extent to which people are respectful of one another in their interactions goes to the heart of the organization. And third is a clear commitment to reliability. It sounds very obvious, but I think most organizations or leaders take reliability for granted and think of it as something that folks at the lower levels of the operation do.
“I know it sounds very soft or fluffy, but I do really think that the extent to which people are respectful of one another in their interactions goes to the heart of the organization.” —Rangaraj Ramanujam
Why do we still have so many catastrophes?
Q: If we know these things about creating reliability, and organizations are spending a lot of money on it, why do we still have so many catastrophes?
KR: A simple answer is that people don’t implement all the things we tell them they should implement. I get called in on a lot of accidents, and all I have to do is look at it and say, “There it goes again.”
Q: What’s the “it” there?
KR: Well, it’s a lot of things and the same things over and over. Take PG&E in the San Bruno explosion, for example. There’s a long list: they didn’t have a good understanding of where their pipes were or what condition they were in. Within the company, there was a lack of coordination and thought given to the issue. Then, if you look a little broader, they had never thought about or coordinated with the California Highway Patrol. It was rush hour when the accident happened, and they couldn’t get the cops and firetrucks up there quickly.
RR: Another thing is that the scale of operations and the technology are getting more and more complex. It’s quite possible that, as the Red Queen says in Through the Looking Glass, you need to keep running just to stay in the same place. The growth and scale of technology are outpacing organizations’ ability to adopt these practices.
And one more important thing is a reality of life in corporate America today, where any effort toward reliability happens in the context of escalating pressures for profits and speed. Even organizations that are aware of high-reliability principles may be subordinating reliability to profits and speed.
Increasing complexity
Q: Has technology made us safer?
KR: Yes and no.
RR: Exactly. Ultimately, we need a socio-technical system. The social and technical have to work in tandem.
KR: Very frequently they don’t. We have people right now looking at the Oroville Dam. The government report was written by engineers and no one else. But the problem on the face of that dam wasn’t just caused by engineers failing to take weaknesses into consideration. The question is, why didn’t they take those things into consideration? I think it’s pretty clear that somehow the incentives weren’t there to do it. Now, they’re focusing on a large number of other dams where the problem may be the same. I’m glad I don’t live at the bottom of any of those dams.
RR: That’s true. I can think of several examples where technology has made things safer. One statistic that I think doesn’t get as much respect as it should is the decrease in car fatalities. There are certain models of cars, like Volvo, for example, where fatalities have been near zero over one- or two-year periods. The problem is, even if technology is getting safer, that hasn’t stopped people from trying riskier and riskier things. As Karlene said, the challenge is ensuring that the social organization keeps pace with the technological advances.
KR: It never does.
Beyond highly reliable organizations
Q: You said earlier you wouldn’t necessarily apply most of these principles to a regular organization that is not high risk. But can other organizations benefit from becoming more reliable?
RR: Earlier, Karlene started by talking about her former dean who told her that high reliability is really just good management. I think that some of that is true, and some of the practices that HRO researchers brought to the surface are really practices of good communication, good coordination, situational awareness, and responding to surprises. If you think about it in those general terms, you can see how those practices could enhance not just outcomes like reliability, but also outcomes such as innovation, speed, and flexibility. There is some new research that is applying it in that way.
KR: I’d add a caveat that I don’t worry about this stuff in a mom-and-pop grocery store. Frustration or lack of communication is not likely to kill anybody there. We’re talking about organizations where reliability of outcome is really important. It’s an extraordinarily expensive thing for an organization to do, and you wouldn’t put the money into it unless you were dealing with very complex organizations.