
Developing Your Scan in Information Risk Management

Cockpit Scan

Learning to scan is one of the most important skills any pilot develops. In flying, a scan is the act of keeping your eyes moving in a methodical, systematic, and smooth way so that you take in information as efficiently as possible while leaving yourself mental bandwidth to process that information.

While flying the aircraft, as you look out across the horizon, your scan might start near the left of the horizon and sweep across to the right. Then, approaching the right side, you drop your eyes down a little and scan back from right to left, picking up some instrument readings along the way. As you approach the left side, you might drop your eyes a little again, change direction, and scan left to right once more. You might repeat this one more time and then finally return to where you started. Then do it again.

The exact pattern doesn’t matter, but having a pattern and method does.

One possible cockpit scan

At first, this is all easier said than done.  When you’re learning to fly, so much is uncertain.  You really don’t know much.  You want to lower your uncertainty.  There is a real tendency to want to know everything about everything.  But you’ll never know everything about everything.  Slowly, a budding pilot begins to learn that.

When I was learning to fly, while mistakenly trying to know every detail about every flight parameter, I would ironically (and unhelpfully) end up with what I now call my “instrument of the day”. I would get so focused on one thing, say the engine power setting, that I would disregard almost everything else. (This is why there are instructor pilots.) On the next flight, the instrument might be the altimeter, and I’d be thinking, “By God, I’m going to hold 3,000 feet no matter what! I’m going to own this altitude!” The fact that I wasn’t watching anything else, and that we may have been slowing to stall speed, was lost on me because I was so intent on complete knowledge of that one indicator or flight parameter. (This is why there are instructor pilots.)

To develop an effective scan, you slowly learn that you can’t know everything.  You learn that you have to work with partial information.

You have to live with uncertainty in order to fly.  

By accepting partial information about many things and then slowly integrating that information through repetitive, ongoing scans, you gain what you need to fly competently and safely. Conversely, if you focus solely on one or two parameters and really ‘know’ them, you’ve given up the ability to know something about the other things you need to fly. In my example above with the engine power setting, I could ‘know’ the heck out of that power setting, but it told me nothing about my airspeed, altitude, turn rate, etc. It didn’t even tell me whether I was right side up. While we can’t know everything about everything, we do want to know a little bit about a lot of things.

“Scan” even becomes a noun.  “Develop your scan.”  “Get back to your scan.” “Don’t let your scan break down.”

Your flying becomes better as your scan becomes better.

After a while, that scan becomes second nature. As a pilot gains experience, the trick then becomes keeping your scan moving even when something interesting is happening, e.g., some indicator is starting to look abnormal (and you might be starting to get a little nervous). You still want to keep your scan moving and not fixate on any one parameter:

Just because something bad is happening in one place does not mean that something bad (or worse) is not happening somewhere else.

The pilot’s scan objectives are:

  • Continually scan
  • Don’t get overly distracted when anomalies/potential problems appear — keep your scan moving 
  • Don’t fixate on any one parameter — force yourself to work with partial information on that parameter so that you have bandwidth to collect & integrate information from other parameters and other resources
  • If disrupting your scan is unavoidable, return to your scan as soon as possible

Maintaining Your Scan In Information Risk Management

This applies to Information Risk Management as well. We want to continually review our information system health, status, and indicators. If an indicator starts to appear abnormal, we want to take note but continue our scan. Again, just because an indicator appears abnormal in one place doesn’t preclude there being a problem, possibly a bigger one, somewhere else.

An Information Risk Management ‘scan’ can be similar to a cockpit scan

A great example is the recent rise in two-pronged information attacks against industry and government. Increasingly, sophisticated hackers are using an initial attack as a diversion and then launching a secondary attack while a company’s resources are distracted by the first. A recent example of this approach is the Dutch bank ING Group, whose online services were disrupted by hackers who then followed up with a phishing attack on ING banking customers.

This one-two punch is also an approach that we have seen terrorists use over the years where an initial bomb explosion is followed by a second bomb explosion in an attempt to target first-responders.

(As an aside, we know Boston Marathon-related phishing emails were received within minutes of news of the explosions. I don’t know whether these were automated or manually launched phishing attacks, but either way, someone was waiting for disasters or other big news events to exploit.)

I believe that we will continue to see more of these combination attacks.  Further, it is likely that not just one, but rather multiple, incidents will serve as distractions while the real damage is being done elsewhere.

To address this, we must continue to develop and hone our scan skills.  We must:

  • Develop the maturity and confidence to operate with partial information
  • Practice our scan, our methodical and continual monitoring, so that it becomes second nature to us
  • Have the presence of mind and resilience to return to our scan if disrupted
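
As a rough sketch of what such a scan might look like in practice, here is a short Python example. The indicator names, check functions, and thresholds are all hypothetical placeholders, not a prescription; the point is simply that every pass visits every indicator, noting anomalies without stopping to fixate on any one of them.

```python
import time

# Hypothetical indicator checks; in a real environment these would query
# your own monitoring sources (SIEM, log aggregator, endpoint alerts, etc.).
def failed_logins_last_5m():
    return 12      # placeholder value

def outbound_gb_last_5m():
    return 0.8     # placeholder value

def open_endpoint_alerts():
    return 0       # placeholder value

INDICATORS = [
    # (name, check function, alert threshold)
    ("failed_logins",    failed_logins_last_5m, 50),
    ("outbound_traffic", outbound_gb_last_5m,   5.0),
    ("endpoint_alerts",  open_endpoint_alerts,  1),
]

def run_scan():
    """One full pass over every indicator; anomalies are noted, not fixated on."""
    anomalies = []
    for name, check, threshold in INDICATORS:
        value = check()
        if value >= threshold:
            anomalies.append((name, value))
        # Keep the scan moving to the next indicator regardless.
    return anomalies

if __name__ == "__main__":
    for _ in range(3):                            # in practice this loop runs continuously
        for name, value in run_scan():
            print(f"anomaly: {name} = {value}")   # note it, hand it off, then return to the scan
        time.sleep(1)                             # pause between passes
```

The specifics will differ in every environment; what carries over from the cockpit is the habit of completing the full pass every time.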

 

Do you regularly review your risk posture? What techniques do you deploy in your scan? What are your indicators of a successful scan?

 

Impact vs Probability

Climbing aboard the helicopter for a training flight one evening was probably the first time that I thought about the difference between probability and impact as components of risk.   For whatever reason, I remember looking at the tail rotor gearbox that particular evening and thinking, “What if that fails? There aren’t a lot of options.”

Tail rotor failures in helicopters are not good. The tail rotor provides the counterbalance to the torque produced by the main rotor, which generates all of the lift. It’s what keeps the fuselage of the helicopter from spinning in the opposite direction of the main rotor in flight. If the tail rotor fails in some way, not only are you no longer in controlled flight (because the fuselage wants to spin around in circles), but the emergency procedures (EPs) are pretty drastic and probably won’t help much.

tail rotor gearbox

So I found myself thinking, “Why in the world would I (or anyone) climb on board when something so catastrophic could happen?”  And then the answer hit me, “because it probably won’t fail.”  That is, the impact of the event is very bad, but the probability of it happening is low. This particular possibility represented the extremes — very high impact and generally low probability.

nose gear

But there are possibilities in between also. For example, what if the nose gear gets stuck and won’t come back down when I want to land? While not desirable, it’s certainly not as bad as the tail rotor failing. I could come back and hover, keeping the nose gear off the deck while someone comes out and tries to pull it down. Or they could put something underneath the nose of the helicopter (like a stack of wooden pallets) and set it down on that. While not highly likely, a stuck nose gear happens more often than a tail rotor failure, so let’s call it a medium probability for the sake of argument.

While the impact of the stuck-nose-gear event is much less than that of a tail rotor failure, the potential impact is not trivial because recovery requires extra people on the ground who are placed in harm’s way. So maybe this is a medium impact event.

multiple components/multiple systems

Similarly, what if the main gear box overheats or has other problems? Or other systems have abnormalities, problems or failures? What are the probabilities and impacts of each of these?

There are multiple pieces to the puzzle and each piece needs to be considered in terms of both impact and likelihood.  Even as commercial airline passengers,

If we based our decision to fly purely on an analysis of the impact of an adverse event (crash), few people would ever fly.

We do board the plane, though, because we know or believe that the probability of that particular plane crashing is low. So, in making our decision to fly, we consider two components of risk: the probability of a mishap and the impact of a mishap.

We have the same kind of thing in managing risk for IT and Information Management services.  We have many interconnected and complex systems and each has components with various probabilities of failure and resulting impacts from failure.

multiple components and multiple systems in IT & Information Management systems as well

What if I’m working in a healthcare environment and users store Protected Health Information (PHI) on a cloud service with a shared user name and password, and PHI leaks out into the wild? This might have risk components of: Probability — High. Impact — Medium to High, depending on the size of the leak. What about the theft or loss of a laptop containing PHI? The same for USB thumb drives. What is the probability? What is the impact? What about a malware infestation of workstations on your network because of a lack of configuration standards on BYOD devices? What is the likelihood? What is the impact?

It’s possible that our server or data center could be hit by an asteroid. The impact would be very high. Maybe multiple asteroids hit our failover servers and redundant data centers too! That would surely shut us down. But does that mean we should divert our limited business funds to put our server in an underground bunker? Probably not — because the likelihood of server failure due to asteroid impact is very low.

As with flying, when we analyze risks in IT and Information Management operations, we have to dissect and review the events in terms of their respective impacts and probabilities. Only when we have these two components, impact vs probability, can we start to do meaningful analysis and planning.
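
As an illustration of how those two components can be combined, here is a minimal Python sketch of a risk register scored on both dimensions. The scenarios echo the examples above, but the 1-to-5 probability and impact scores are hypothetical placeholders you would replace with your own estimates.

```python
# Minimal risk-register sketch: rank scenarios by probability x impact.
# All scores are illustrative placeholders on a 1 (low) to 5 (high) scale.
risks = [
    # (scenario, probability, impact)
    ("PHI on a cloud service with shared credentials",       4, 4),
    ("Lost or stolen laptop containing PHI",                 3, 4),
    ("Lost or stolen USB thumb drive containing PHI",        3, 3),
    ("Malware via unmanaged BYOD devices",                    4, 3),
    ("Asteroid strike on primary and failover data centers",  1, 5),
]

# Neither component alone tells the story; rank by their product.
for scenario, prob, impact in sorted(risks, key=lambda r: r[1] * r[2], reverse=True):
    print(f"score {prob * impact:>2}  (P={prob}, I={impact})  {scenario}")
```

A simple product like this is only a starting point, but it forces both questions, how likely and how bad, to be asked about every scenario before any planning decisions are made.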

 

What events do you plan for that have Low Probabilities but High Impacts? What about High Probabilities but Low Impacts?