Thank you for inviting the National Transportation Safety Board to speak at
this observance of the 51st Anniversary of the loss of the USS
Thresher. It is an honor and a privilege to be here and to be asked for our
views about improving safety.
More than twenty years ago, Professor Karlene Roberts, from the University of
California at Berkeley, and Professor Karl Weick, from the University of
Michigan, began developing concepts of High Reliability Organizations based upon
the high-hazard operations that are conducted very safely in nuclear submarines
and aircraft carriers. Companies in many industries have learned much from the
Navy and from these HRO principles, and have used those principles to implement
procedures that improve safety.
Hence, it is very humbling to be asked by the Navy about best practices that
are in use in the civilian aviation world for improving safety.
An important characteristic of a robust safety culture is a quest for
continuous improvement. The Navy has demonstrated a strong desire for continuous
improvement in many ways. For example, the Navy deserves kudos for implementing
SubSafe, which has been a very successful program that reflects the Navy's
recognition that the problems that resulted in the loss of the Thresher went
beyond the Thresher itself and reflected a need for a broader and more systemic
response. SubSafe also reflects the Navy's desire to honor the 129 lives lost on
the Thresher. In addition, the Navy also deserves kudos because, despite being
a source of improvement for many civilian industries, it is still reaching out
to those industries and elsewhere to find ever better ways to continue
improving.
The Navy's quest for continuous improvement reflects its awareness that
safety is not a destination, but a never-ending journey.
It is certainly my pleasure to help with this quest for continuous
improvement by offering some lessons learned in the commercial aviation
industry. In order to put my remarks in context, however, let me briefly
describe what the NTSB does.
The NTSB is a federal agency that Congress created, in its present form,
forty years ago to improve transportation safety by investigating accidents in
all modes of transportation, determining the probable cause of those accidents,
and making recommendations to prevent recurrences. Our primary product is
recommendations, and our focus is safety. We are not a regulator and we cannot
require anyone to do what we recommend. We must have our finger on the pulse of
economic reality, but we do not do quantitative cost-benefit studies. Our role
is to recommend what should be done in an ideal safety world, in which the only
consideration is safety. Despite our inability to require action, however, it is
a tribute to the high quality of the NTSB's staff that, because they conduct
such world-class accident investigations, and because their analysis of those
investigations yields such excellent recommendations, more than 80% of our
recommendations receive a favorable response.
Now to the issue the Navy has asked us to address today - continuous safety
improvement.
Many other industries are also struggling with how to continuously improve
safety, including nuclear power, chemical manufacturing, petroleum exploration
and refining, banks and the financial industries, healthcare, and public
utilities. A common characteristic of these industries is that they all involve
complex systems of subsystems that are coupled together and that must work
together successfully in order for the entire system to work. Because the
subsystems are coupled, a change in one subsystem may affect one or more other
subsystems.
The challenge for these complex systems is "System Think," i.e.,
understanding how a change in one subsystem will affect other coupled subsystems
within the system.
The commercial aviation industry is pursuing System Think through CAST, the
Commercial Aviation Safety Team. CAST achieves System Think by bringing all of
the key industry elements - including the airlines, the manufacturers, the
pilots, the air traffic controllers, and the regulator, i.e., everyone who has a
"dog in the fight" - together to work collaboratively. These industry elements
work collaboratively to identify potential safety issues; prioritize those
issues - because they will identify more issues than they have resources to
address; develop strategies to address the prioritized issues; and then evaluate
whether the strategies are working, and whether they are producing any
unintended consequences.
The result has been a major win-win in that the CAST process resulted in a
reduction of the fatal accident rate by more than 80 percent in its first ten
years; and contrary to conventional wisdom that safety and productivity are
mutually exclusive, it also improved productivity at the same time. This amazing
accident rate reduction came from a rate that, after declining for decades, had
stopped declining and had been "stuck on a plateau" for several years.
The moral of this collaboration success story is very simple: Anyone who is
involved in a problem should be involved in developing the solution.
Of possible relevance to submarine safety, the commercial aviation community has
also demonstrated System Think, using collaboration, at the aircraft manufacturer
level. Early in the design process, manufacturers bring in pilots - their
end-users - to help assure that their ultimate product will be friendly to the
end-users. Manufacturers also bring in expertise from the maintenance community
in order to assure that their aircraft, which generally must last several
decades in order to be economically viable, are maintenance-friendly. Finally,
because the autopilot does the flying in airliners most of the time, the
manufacturers bring in air traffic control expertise in order to assure that
their airplanes can easily do what the air traffic controllers are likely to ask
them to do.
This collaborative process reflects the manufacturers' recognition that the
airplane is a complex system that involves hardware, software, and liveware as
coupled subsystems, and complex coupled systems demand system solutions not only
for their hardware, software, and liveware separately, but also for all of these
subsystems together as part of a complex system.
Recent accidents have demonstrated some of the challenges that are created by
the complexity of these hardware/software/liveware systems, and they have also
demonstrated that the industry is still struggling to address these challenges.
One recent example is Air France Flight 447, which crashed into the Atlantic
Ocean in 2009, while en route from Rio de Janeiro to Paris.
This accident occurred at night, when the airplane was in Instrument
Meteorological Conditions, in or near thunderstorms and in turbulence, in large
quantities of supercooled water, and flying in "coffin corner," i.e., with very
little plus-or-minus airspeed margin around its cruise airspeed. In these
circumstances, the pitot tubes - the tubes that protrude from the airplane to
measure the airspeed based upon the dynamic pressure of the air that is hitting
the tubes - froze over as the supercooled water turned to ice.
Once the pitot tubes froze, the aircraft computers no longer had airspeed
information. The loss of airspeed information caused the loss of, among other
crucial systems, the autopilot, the automatic throttle, and the protections
against exceeding a safe angle of attack. Numerous error messages were displayed
to the pilots, and for unknown reasons, one pilot pulled back the control stick,
commanding nose-up, causing the airplane to enter an aerodynamic stall. The
pilots were not able to recover from the stall, and the aircraft ultimately
crashed into the ocean.
This accident demonstrated problems in all three subsystems -- hardware,
software, and liveware.
One of the major hardware problems was that the three independent
pitot-static systems - independent for redundancy - were not actually redundant
because they were all taken out by a common cause: icing. This problem had been
encountered before, and a retrofit of more robust pitot tube heaters was
underway, but it was not an emergency retrofit because other pilots had
recovered successfully from this problem, and the accident airplane itself was
soon due to be retrofitted.
The software problems included that the automation stopped functioning
immediately upon losing the airspeed information, rather than transitioning
gradually; and that the error message displays revealed no cause-and-effect
information, i.e., that the loss of airspeed information was the reason for the
failure of several other systems. The pilots were apparently unable to
determine, in the startled moment, exactly what caused so many error messages to
appear.
The liveware problem started with the fact that the pilots were startled -
they were flying along on autopilot and all was well, when suddenly many error
messages appeared, and they immediately had to take over and fly the airplane
manually. They did not understand the system well enough to be able to figure
out what was happening, and they had never before experienced a loss of airspeed
event in cruise, even in training. Last but not least, manual flight at cruise
altitude is illegal in most parts of the world where they fly, and hence the
pilots had never before flown the airplane manually at cruise altitude, even in
training.
Automation is a double-edged sword: It has a longstanding record of
significantly improving safety and efficiency, but the continuing challenge is
how to assure that if something goes wrong, the system fails in a safe way
instead of catastrophically.
Flight 447 is but one of many accidents that demonstrate that the aviation
community is still struggling with hardware/software/liveware system issues.
Nonetheless, in response to the Navy's request for safety improvement
suggestions, I would like to offer three suggestions. I offer them, however,
with great humility, because the civilian aviation community has certainly
learned more from the military about how to improve safety than the military has
learned from us.
First, I would suggest that the military continue to exchange notes with the
civilian community about how to improve the robustness of software. There are
SAE committees and MilSpecs that already address these issues, and they have
made great progress, but they have a long way to go, and their challenges are
many and increasing.
Second, I would suggest that the military continue looking to civilian
industries about how they address issues of complex systems of hardware,
software, and liveware. I mentioned two such processes in the aviation industry,
one at the industry level and another at the aircraft manufacturer level; it is
worth asking how other complex industries address these challenges. Finding SAE
committees
and MilSpecs that address these issues, however, will not be as easy as with my
first suggestion.
Last, but certainly not least, these complex systems will never be perfect,
so it will be very important to have a robust process for collecting and
analyzing data that will provide quick feedback about processes in complex
systems that are not working as intended. The aviation industry uses two sources
for this information - automatic sources, such as aircraft flight data
recorders; and human sources, such as non-punitive reporting programs that allow
pilots and others to report, without fear of punishment, problems that they
encounter in the system. Both sources are usually necessary because the
automatic sources normally reveal only what happened; the human sources are
needed to find out more about why it happened.
In closing, I would again offer tribute to the Navy for seeking continuous
safety improvement, and by so doing, honoring the lives lost on the Thresher.
Thanks again for inviting the NTSB to assist with this effort, and we hope
the Navy will continue to feel free to call upon us to assist in its quest for
continuous safety improvement.