In a three-part series of blogs for the hub, Norman MacLeod explores how systems behave and how the actions of humans and organisations increase risk. He argues that, to measure safety, we need to understand the creation of risk.
In this first blog, Norman looks at the problems of measuring safety, using an example from aviation to illustrate his points.
In the final paragraph of his seminal paper, 'Evaluating the Quality of Medical Care' (first published in 1966 and reprinted in 2005), Donabedian suggests that instead of asking "What is wrong: and how can we make it better?" we should, more often, ask "What goes on here?" The author identifies three areas of enquiry: process, outcomes and structure. He also recognises that care episodes are not discrete: instead, they form chains of events involving multiple actors. The issues raised in the paper apply equally to the problem of measuring safety.
Vincent, Burnett and Carthey offer a definition of patient safety as:
"The avoidance, prevention and amelioration of adverse outcomes or injuries stemming from the process of healthcare."
The authors also suggest that quality deals with the intended results of the healthcare system whereas safety looks at the ways the system can fail to function. Leveson, though, observes that, in engineering, reliability is not the same as safety, and we could substitute quality for reliability: a high-quality service is not necessarily a safe one.
Safety has been described as a "dynamic non-event" (Weick) in that it is "an ongoing condition in which problems are momentarily under control […]". Implicit in this position is that the absence of failure does not mean that an entity is safe. Another view is that safety is the "freedom from [a level of] risk which is not tolerable". These approaches shift the focus from outcomes to the domain of structure and how it shapes processes. This suggests that measures of safety should address the issue of ‘control’ in the workplace. We particularly want to understand the distribution of risk and how it becomes ‘intolerable.’
Understanding ‘What goes on here?’
A patient entering the healthcare system experiences episodes of care, each of which is intended to remediate the patient’s condition in some way. Although care is highly proceduralised, the inherent variability between patients requires treatment to be adaptive because, in short, no two patients are the same. Equally, the condition of the healthcare worker introduces variability. As a result, there are multiple pathways that can lead to the same safe outcome. The range of different ways an episode can unfold can be described as ‘buffering’: the system has the capacity to cope with variability and still function as intended. Unfortunately, each variation in the delivery of a specific episode carries with it a degree of risk, which is often not apparent unless something goes wrong.
Occasionally, activity will exceed the system’s buffering capacity. We can hypothesise a point where a process transitions from safe to unsafe: the resources available to restore the process to a safe state have been exhausted. We are particularly interested in how systems behave in these boundary states. Finally, we want to know how a system fails. Is the outcome inconsequential, recoverable but with additional intervention, or catastrophic? A system’s response to failure can be described as its tolerance. These concepts are illustrated using output from an aircraft’s digital flight data recorder (DFDR):
Figure 1: Li L. CityU, Hong Kong. Personal communication.
The graph depicts an aircraft during the final approach. Approaching the runway, the pilot lifts the nose to stop the rate of descent. Power is reduced, the aircraft settles on the runway and the nose is lowered again. This change in attitude is recorded in flight data as the pitch angle. The graph shows the pitch angle of 300 aircraft during the final mile of the approach to touchdown and then shows the aircraft on the runway and slowing down. The dark blue band shows the central 50% of data points, those closest to the planned approach path, with the outer, lighter bands showing 20% either side (some data is lost in the processing). All these approaches were successful and the data shows the range of solutions to the problem of attitude control on final approach: the buffering.
Airline safety management systems are required to track parameters that are out of tolerance, and the chart shows the pitch angle that would trigger a Flight Data Monitoring (FDM) alert. We can see the gap between ‘normal’ and what would trigger a safety alert. Put another way, it shows how close the system is operating to a safety trigger without anyone knowing it. The graph reveals the ‘what goes on here’ that would normally be invisible.
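For readers who want to try this kind of picture on their own data, here is a minimal sketch of how the bands and the gap to an alert threshold might be computed. It is not the method used to produce Figure 1. It assumes that the central 50% band runs from the 25th to the 75th percentile and that the outer bands extend to the 5th and 95th percentiles. The flight data is synthetic, and the function names and the `FDM_TRIGGER_DEG` value are hypothetical, chosen only for illustration.

```python
import random
from statistics import quantiles


def band_edges(values):
    """Return the 5th, 25th, 75th and 95th percentiles of a list of numbers."""
    # quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%.
    cuts = quantiles(values, n=20, method="inclusive")
    return {"p5": cuts[0], "p25": cuts[4], "p75": cuts[14], "p95": cuts[18]}


def bands_along_approach(flights):
    """Compute band edges at each sample point across all flights.

    `flights` is a list of equal-length pitch traces (degrees), one per
    flight, sampled at the same distances from touchdown.
    """
    return [band_edges(list(column)) for column in zip(*flights)]


def margin_to_trigger(values, trigger):
    """Gap between the upper edge of the outer band and the alert threshold."""
    return trigger - band_edges(values)["p95"]


# Synthetic data only: 300 made-up flights with 5 sample points each.
random.seed(1)
typical_pitch = [2.5, 3.0, 4.0, 5.0, 1.0]  # hypothetical pitch angles in degrees
flights = [[random.gauss(mean, 0.6) for mean in typical_pitch] for _ in range(300)]

FDM_TRIGGER_DEG = 8.0  # hypothetical alert threshold, not a real operator's value

for point, edges in enumerate(bands_along_approach(flights)):
    print(point, {name: round(value, 2) for name, value in edges.items()})

# Margin at the sample point with the highest typical pitch (index 3).
pitch_at_peak = [trace[3] for trace in flights]
print("margin to trigger:", round(margin_to_trigger(pitch_at_peak, FDM_TRIGGER_DEG), 2))
```

The width of the bands is one way of putting a number on buffering, and the margin is the gap between ‘normal’ and the alert. Neither figure says anything about what happens once the threshold is crossed, which is the question of tolerance.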
The red line on the graph is the data for a specific flight that did result in an investigation. The outcome was a ‘hard landing’. Hard landings can lead to a mandatory maintenance inspection (lost productivity while the aircraft is being checked), damage to the aircraft structure or even a collapsed undercarriage. These are all outcomes that could arise from the same initial problem. The result, in this case benign, illustrates the tolerance in the system.
To measure safety, we first need to understand performance variability (buffering), behaviour at the boundaries (opportunities to recover) and tolerance (how failure propagates). If measures of outcome are not useful indicators of safety, the first problem we face is that safety reflects performance in a space that is not easily open to inspection. We therefore need to look for surrogates that can reliably stand in for direct measures of safety. In part 2 of this blog, I will look at how error may offer insight into a system’s behaviour.
I would love to hear your feedback on this blog and how you 'measure safety'. Please add your comments below (you will need to be a hub member and signed into the hub to comment).
- Donabedian A. Evaluating the Quality of Medical Care. The Milbank Quarterly 2005; 83 (4):691-729.
- Vincent C, Burnett S, Carthey J. The Measurement and Monitoring of Safety. The Health Foundation Spotlight, 2013.
- Leveson N. Engineering a Safer World. MIT Press. 2011. DOI: https://doi.org/10.7551/mitpress/8179.001.0001
- Weick KE. Organizational culture as a source of high reliability. California Management Review 1987; 29 (2):112-128.
- Li L. CityU, Hong Kong. Personal communication.
About the Author
Norman MacLeod served for 20 years in the RAF involved in the design and delivery of training in a variety of situations. He stumbled across 'CRM' in 1988 while investigating leadership in military transport aircraft crews. From 1994, he worked around the world as a consultant in the field of CRM in commercial aviation, latterly employed as the Human Factors Manager for a blue chip airline in Hong Kong. Now semi-retired, he is one of the Patient Safety Partners at James Cook Hospital in Middlesbrough.