Remarks by Christopher A. Hart, Vice Chairman
Thank you for inviting the National Transportation Safety Board to speak at this observance of the 51st anniversary of the loss of the USS Thresher. It is an honor and a privilege to be here and to be asked for our views about improving safety.
More than twenty years ago, Professor Karlene Roberts, from the University of California at Berkeley, and Professor Karl Weick, from the University of Michigan, began developing the concept of High Reliability Organizations (HROs), based upon the high-hazard operations that are conducted very safely aboard nuclear submarines and aircraft carriers. Companies in many industries have learned much from the Navy and from these HRO principles, and have used those principles to implement procedures that improve safety.
Hence, it is very humbling to be asked by the Navy about best practices that are in use in the civilian aviation world for improving safety.
An important characteristic of a robust safety culture is a quest for continuous improvement, and the Navy has demonstrated a strong commitment to continuous improvement in many ways. For example, the Navy deserves kudos for implementing SubSafe, a very successful program that reflects the Navy's recognition that the problems that resulted in the loss of the Thresher went beyond the Thresher itself and called for a broader, more systemic response. SubSafe also reflects the Navy's desire to honor the 129 lives lost on the Thresher. The Navy also deserves kudos because, despite being a source of improvement for many civilian industries, it is still reaching out to those industries and elsewhere to find ever better ways to keep improving.
The Navy's quest for continuous improvement reflects its awareness that safety is not a destination, but a never-ending journey.
It is certainly my pleasure to help with this quest for continuous improvement by offering some lessons learned in the commercial aviation industry. In order to put my remarks in context, however, let me briefly describe what the NTSB does.
The NTSB is a federal agency that Congress created, in its present form, forty years ago to improve transportation safety by investigating accidents in all modes of transportation, determining the probable cause of those accidents, and making recommendations to prevent their recurrence. Our primary product is recommendations, and our focus is safety. We are not a regulator, and we cannot require anyone to do what we recommend. We must keep our finger on the pulse of economic reality, but we do not conduct quantitative cost-benefit studies. Our role is to recommend what should be done in an ideal world in which the only consideration is safety. Despite our inability to require action, more than 80% of our recommendations are responded to favorably. That is a tribute to the high quality of the NTSB's staff, who conduct world-class accident investigations and whose analysis of those investigations produces excellent recommendations.
Now to the issue the Navy has asked us to address today - continuous improvement.
Many other industries are also struggling with how to continuously improve safety, including nuclear power, chemical manufacturing, petroleum exploration and refining, banking and financial services, healthcare, and public utilities. A common characteristic of these industries is that they all involve complex systems of subsystems that are coupled together and that must work together successfully in order for the entire system to work. Because the subsystems are coupled, a change in one subsystem may affect one or more other subsystems.
The challenge for these complex systems is "System Think," i.e., understanding how a change in one subsystem will affect other coupled subsystems within the system.
The commercial aviation industry is pursuing System Think through CAST, the Commercial Aviation Safety Team. CAST achieves System Think by bringing together all of the key industry elements, including the airlines, the manufacturers, the pilots, the air traffic controllers, and the regulator (i.e., everyone who has a "dog in the fight"), to work collaboratively. Together, these parties identify potential safety issues; prioritize those issues, because they will identify more issues than they have resources to address; develop strategies to address the prioritized issues; and then evaluate whether the strategies are working and whether they are producing any unintended consequences.
The result has been a major win-win. The CAST process reduced the fatal accident rate by more than 80 percent in its first ten years, and, contrary to the conventional wisdom that safety and productivity are mutually exclusive, it also improved productivity at the same time. This amazing reduction came after the accident rate, having declined for decades, had stopped declining and had been "stuck on a plateau" for several years.
The moral of this collaboration success story is very simple: Anyone who is involved in a problem should be involved in developing the solution.
Of possible relevance to submarine safety, the commercial aviation community has also demonstrated System Think, through collaboration, at the aircraft manufacturer level. Early in the design process, manufacturers bring in pilots, their end-users, to help ensure that the final product will be user-friendly. Manufacturers also bring in expertise from the maintenance community to ensure that their aircraft, which generally must last several decades in order to be economically viable, are maintenance-friendly. Finally, because the autopilot does the flying in airliners most of the time, the manufacturers bring in air traffic control expertise to ensure that their airplanes can easily do what air traffic controllers are likely to ask of them.
This collaborative process reflects the manufacturers' recognition that the airplane is a complex system in which hardware, software, and liveware are coupled subsystems, and that complex coupled systems demand solutions not only for their hardware, software, and liveware separately, but also for all of these subsystems working together as a whole.
Recent accidents have demonstrated some of the challenges that are created by the complexity of these hardware/software/liveware systems, and they have also demonstrated that the industry is still struggling to address these challenges. One recent example is Air France Flight 447, which crashed into the Atlantic Ocean in 2009 while en route from Rio de Janeiro to Paris.
This accident occurred at night, while the airplane was flying in Instrument Meteorological Conditions, in or near thunderstorms, in turbulence, in large quantities of supercooled water, and in "coffin corner," i.e., with very little airspeed margin above or below its cruise airspeed. In these circumstances, the pitot tubes (the tubes that protrude from the airplane to measure airspeed based upon the dynamic pressure of the air striking them) became blocked as the supercooled water turned into ice.
Once the pitot tubes froze, the aircraft computers no longer had airspeed information. The loss of airspeed information caused the loss of, among other crucial systems, the autopilot, the automatic throttle, and the protections against exceeding a safe angle of attack. Numerous error messages were displayed to the pilots, and for unknown reasons, one pilot pulled back the control stick, commanding nose-up, causing the airplane to enter an aerodynamic stall. The pilots were not able to recover from the stall, and the aircraft ultimately crashed into the ocean.
This accident demonstrated problems in all three subsystems: hardware, software, and liveware.
One of the major hardware problems was that the three independent pitot-static systems, which were independent for redundancy, were not actually redundant because they were all taken out by a common cause: icing. This problem had been encountered before, and a retrofit of more robust pitot tube heaters was underway, but it was not an emergency retrofit because other pilots had recovered successfully from this problem, and the accident airplane itself was soon due to be retrofitted.
The software problems included the fact that the automation stopped functioning abruptly upon losing the airspeed information, rather than degrading gradually, and the fact that the error message displays provided no cause-and-effect information; i.e., they did not reveal that the loss of airspeed information was the reason several other systems had failed. In that startled moment, the pilots were apparently unable to determine what had caused so many error messages to appear so suddenly.
The liveware problems began with the fact that the pilots were startled: they were flying along on autopilot, all was well, and then suddenly many error messages appeared and they immediately had to take over and fly the airplane manually. They did not understand the system well enough to figure out what was happening, and they had never before experienced a loss of airspeed information in cruise, even in training. Last but not least, manual flight at cruise altitude is illegal in most parts of the world where they flew, so the pilots had never before flown the airplane manually at cruise altitude, even in training.
Automation is a double-edged sword: it has a longstanding record of significantly improving safety and efficiency, but the continuing challenge is how to ensure that if something goes wrong, the system fails safely instead of catastrophically.
Flight 447 is but one of many accidents that demonstrate that the aviation community is still struggling with hardware/software/liveware system issues.
Nonetheless, in response to the Navy's request for safety improvement suggestions, I would like to offer three suggestions. I offer them, however, with great humility, because the civilian aviation community has certainly learned more from the military about how to improve safety than the military has learned from us.
First, I would suggest that the military continue to exchange notes with the civilian community about how to improve the robustness of software. There are SAE committees and MilSpecs that already address these issues, and they have made great progress, but they have a long way to go, and their challenges are many and increasing.
Second, I would suggest that the military continue looking to civilian industries to learn how they address issues involving complex systems of hardware, software, and liveware. I have mentioned two such processes in the aviation industry, one at the industry level and the other at the aircraft manufacturer level, and it would be worth asking how other complex industries address these challenges. Finding SAE committees and MilSpecs that address these system-level issues, however, will not be as easy as it is for software robustness.
Last, but certainly not least, these complex systems will never be perfect, so it will be very important to have a robust process for collecting and analyzing data that provides quick feedback about processes in complex systems that are not working as intended. The aviation industry uses two sources for this information: automatic sources, such as aircraft flight data recorders, and human sources, such as non-punitive reporting programs that allow pilots and others to report, without fear of punishment, problems that they encounter in the system. Both sources are usually necessary because the automatic sources normally reveal only what happened, whereas the human sources are needed to find out why it happened.
In closing, I would again like to pay tribute to the Navy for seeking continuous safety improvement and, by so doing, honoring the lives lost on the Thresher.
Thanks again for inviting the NTSB to assist with this effort, and we hope the Navy will continue to feel free to call upon us to assist in its quest for continuous safety improvement.