Climbing aboard the helicopter for a training flight one evening was probably the first time that I thought about the difference between probability and impact as components of risk. For whatever reason, I remember looking at the tail rotor gearbox that particular evening and thinking, “What if that fails? There aren’t a lot of options.”
Tail rotor failures in helicopters are not good. The tail rotor is what provides the counterbalance for the torque generated by the main rotor that generates all of the lift. It’s what keeps the fuselage of the helicopter from spinning in the opposite direction of the main rotor in flight. If the tail rotor fails in some way, not only are you no longer in controlled flight (because the fuselage wants to spin around in circles), but the emergency procedures (EP’s) are pretty drastic and probably won’t help much.
So I found myself thinking, “Why in the world would I (or anyone) climb on board when something so catastrophic could happen?” And then the answer hit me, “because it probably won’t fail.” That is, the impact of the event is very bad, but the probability of it happening is low. This particular possibility represented the extremes — very high impact and generally low probability.
But there are possibilities in between also. For example, what if the nose gear gets stuck and won’t come back down when I want to land. While not desirable, it’s certainly not as bad as the tail rotor failing. I could come back and land/hover and keep the nose gear off the deck while someone comes out and tried to pull it down. Or they could put something underneath the nose of the helicopter (like a stack of wooden pallets) and set it down on that. While not a high likelihood of occurrence, a stuck nose gear happens more often than a tail rotor failure, so let’s call it a medium probability for the sake of argument.
While the impact of the stuck-nose-gear-event is much less than that of the tail rotor failure, the potential impact is not trivial because recovery from it requires extra people on the ground that are placed in harm’s way. So maybe this is a medium impact event.
Similarly, what if the main gear box overheats or has other problems? Or other systems have abnormalities, problems or failures? What are the probabilities and impacts of each of these?
There are multiple pieces to the puzzle and each piece needs to be considered in terms of both impact and likelihood. Even as commercial airline passengers,
If we based our decision to fly purely on an analysis of the impact of an adverse event (crash), few people would ever fly.
We do board the plane, though, because we know or believe that the probability of that particular plane crashing is low. So, in making our decision to fly, we consider two components of risk: the probability of a mishap and the impact of a mishap.
We have the same kind of thing in managing risk for IT and Information Management services. We have many interconnected and complex systems and each has components with various probabilities of failure and resulting impacts from failure.
What if I’m working in a healthcare environment and users store Protected Health Information (PHI) on a cloud service with a shared user name and password and PHI leaks out into the wild? This might have risk components of: Probability — High. Impact — Medium to High, depending on size of leak. What about laptop theft or loss containing PHI? Same for USB thumb drives. What is the probability? What is the impact? What about malware infestation of workstations on your network because of lack of configuration standards on BYOD devices? What is the likelihood? What is the impact?
It’s possible that our server or data center could be hit with an asteroid. The impact would be very high. Maybe there are multiple asteroids that hit our failover servers and redundant data centers also ! That would surely shut us down. But does that mean that we should divert our limited business funds to put our server in an underground bunker? Probably not — because the likelihood of server failure due to asteroid impact is very low.
As with flying, when we analyze risks in IT and Information Management operations, we have to dissect and review the events in terms of their respective impacts and probabilities. Only when we have these two components, impact vs probability, can we start to do meaningful analysis and planning.
What are events do you plan for that have Low Probabilities but High Impacts? What about High Probabilities but Low Impacts?