Tag Archives: risk

Horseshoe Irony & Uncertainty

As we manage the online existence of our enterprises, we have discussed that we must rid ourselves of the illusion of total control, certainty, and predictability.  Ironically, however, we often must move forward as if we’re certain or near-certain of the outcome.  No shortage of irony, contradiction, and paradox here.

While playing horseshoes over Memorial Day Weekend with neighbors, I had a minor epiphany regarding uncertainty.  I realized that once I tossed my horseshoe and it landed in the other pit (a shallow three-sided box made up of sand, dirt, some tree roots, and the pin itself),

I had no idea where that horseshoe was going to bounce.

I had some control over the initial toss — I could usually get it in the box — but once it hit the ground, I had zero certainty about which direction it was heading. It might bury in the sand, bounce high on the dirt, or hit a tree root and fly off to parts unknown.

Unpredictable bounce

Because I couldn’t control that bounce, my best chance of winning was to land the horseshoe in the immediate vicinity of the pin as often as possible, creating repeated opportunities for that arbitrary bounce to carry it onto the pin.

The more often I placed the horseshoe near the pin on that first bounce, the more likely it was to end up on, near, or around the pin for points or even the highly coveted ringer.

Here’s the irony.  One of the best ways to land the shoe near the pin as often as possible is to aim for the pin every time.  That is, toss the horseshoe like you expect to get a ringer with every throw.
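To see why, consider a toy Monte Carlo sketch in Python of this argument: the throw has some error around the aim point, the bounce adds error I can’t control, and a shoe scores if it ends up close enough to the pin. The error spreads, scoring radius, and one-dimensional pit are all made-up assumptions, purely for illustration.

import random

PIN = 0.0            # pin position along one axis (toy 1-D model)
SCORE_RADIUS = 0.5   # a shoe counts if it ends up this close to the pin

def toss(aim, toss_error=1.0, bounce_error=1.5):
    # One throw: aim point plus throwing error, then an uncontrollable bounce.
    landing = random.gauss(aim, toss_error)      # where the shoe first lands
    return random.gauss(landing, bounce_error)   # where the bounce leaves it

def score_rate(aim, trials=100_000):
    hits = sum(abs(toss(aim) - PIN) <= SCORE_RADIUS for _ in range(trials))
    return hits / trials

for aim_offset in (0.0, 0.5, 1.0, 2.0):
    print(f"aiming {aim_offset} off the pin: score rate {score_rate(aim_offset):.3f}")

Run it and the best score rate comes from an aim offset of zero: no single throw is predictable, but aiming dead at the pin every time maximizes the chance that the random bounce goes your way.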

So, we’re simultaneously admitting to ourselves that we can’t control the outcome while proceeding as if we can control the outcome.  

This logical incongruity is precisely the sort of thing that can make addressing uncertainty and managing risk so challenging.

Similar things happen with the management of our information systems.  We want to earnestly move forward with objectives like 100% accurate inventory and 100% of devices on a configuration management plan.  Most of us know that’s not going to happen in practice with our limited resources.  However,

    • by choosing good management objectives (historically known as ‘controls’) 
    • executing earnestly towards those objectives while
    • thoughtfully managing resources,

we increase the chances of things going our way even while living in the middle of a world that we ultimately can’t control.

Those enterprises that do this well not only increase survivability, but also gain competitive advantage, because they use resources where they are most needed and don’t waste them where they add no value.

Ringer

So, yes, I’m trying to get a ringer with every throw, even though I know any single throw is unlikely to produce one.  By shooting for the ringer every time, I increase the opportunity for that arbitrary bounce to go in a way that’s helpful to me.

Much like horseshoes, in Information Risk Management I can’t control what is going to happen on every single IT asset, be it workstation, server, or router, but I can do things to increase the chances that things move in a helpful way.

The opportunity for growth before us, then, is to have the self-awareness to genuinely move towards a goal, knowing that it is unlikely we will fully reach it, and to be ready to adjust when and if we don’t.

How do you create opportunities to take advantage of uncertainty in your organization?

Inverting Sun Tzu – Know Yourself 1st

While Sun Tzu implores us in The Art of War to “know your enemy, know yourself” in order to win 100 battles, in information risk management for small and medium-sized businesses we need to invert that priority: “Know yourself, and then know your enemy.”

Sun Tzu

Actually, for those of us in resource-constrained organizations trying to protect ourselves and manage our information risk, we need to add a middle piece to that phrase: “Know your environment.”  Knowing ourselves, though, is the foundation.  So it looks something like this:

Fundamentals — know yourself

The trick, though, is that knowing ourselves is not so easy.  In this time of BYOD and indeterminate interconnectivity, with unknown devices in unknown configurations with unknown operating parameters entering the business every day, self-knowledge is a major challenge for small and medium-sized businesses all by itself.  Just getting a device inventory is difficult.  And that’s really just counting.  So if counting is hard, we know we’ve got a challenge.
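Even for the counting, a crude script beats nothing. Here is a minimal sketch in Python of reconciling a hand-maintained asset list against device MACs observed on the network (say, exported from DHCP logs); all names and data are hypothetical.

# Toy inventory reconciliation: compare what we think we own against
# what is actually showing up on the network. Data is illustrative.

known_assets = {
    "00:11:22:33:44:55": "front-desk workstation",
    "66:77:88:99:aa:bb": "file server",
}

observed_macs = {
    "00:11:22:33:44:55",   # known workstation
    "de:ad:be:ef:00:01",   # who is this? BYOD? a guest? an intruder?
}

unknown = observed_macs - known_assets.keys()
missing = known_assets.keys() - observed_macs

for mac in sorted(unknown):
    print(f"UNKNOWN device on network: {mac}")
for mac in sorted(missing):
    print(f"Inventoried asset not seen: {mac} ({known_assets[mac]})")

Everything in the UNKNOWN list is the knowing-ourselves problem in miniature: a device doing business on our network that we cannot yet describe.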

Knowing ourselves does not enable us to predict or control the future, but it does allow us to make better decisions when unpredictable things happen

As we work towards mastery of knowing ourselves, we then begin to endeavor to better understand the environment in which we work.  How rough is the online neighborhood in which we’re doing business? (pretty rough).  Who are we connected to? Who’s connected to us? Who might be trying to connect to us?

BYOD increases the complexity of knowing yourself …

The “Know your enemy” part is important, intriguing, and even sexy, but we can’t get there without first knowing ourselves and the environment in which we work and defend ourselves.


Do you know your company’s business objectives, its assets, its capabilities, its vulnerabilities?  What techniques do you use to know yourself, your business? Do you use a risk register to do this? Informal focus groups? Something else?

Frog Boiling and Shifting Baselines

There is an apocryphal anecdote that a frog can be boiled to death if it is placed in a pan of room-temperature water that is then slowly heated to boiling. The idea is that the frog keeps acclimating to each new temperature, which is only slightly warmer than the last, so it is never in distress and never tries to jump out. A less PETA-evoking metaphor is the “shifting baseline,” a term originally used to describe a problem in measuring rates of decline in fish populations. There, research results were skewed by choosing a starting point that did not reflect declines that had already occurred, leaving an order of magnitude of change unaccounted for.

We can also see this sort of effect in risk management activities where risks that were initially identified with a particular impact and likelihood (and acceptability) seem to slide to a new place over time on the risk heat map. This can be a problem because often the impact and the probability did not change and there’s no logical reason for the level of risk acceptability to have changed. I believe this slide can happen for a number of reasons:

  • we get accustomed to having the risk; it seems more familiar and (illogically) less uncertain the longer we are around it
  • we tell ourselves that because the risk event didn’t happen yesterday or the day before, then it probably won’t happen today either. This is flawed thinking! (In aviation we called this complacency).
  • we get overloaded, fatigued, or distracted and our diligence erodes
  • external, financial, and/or political pressures create an (often insidious) change in how we determine what risk is acceptable and what is not. That is, criteria for impact and/or probability are changed without recognition of that change.

Challenger Disaster

Challenger plume after explosion

In 1986, the Space Shuttle Challenger exploded 73 seconds after liftoff. The Rogers Commission, established to investigate the mishap, produced a report and panel debriefs that included testimony by the iconic Richard Feynman. Chapter 6 of the report, entitled “An Accident Rooted in History,” opens with, “The Space Shuttle’s Solid Rocket Booster problem began with the faulty design of its joint and increased as both NASA and contractor management first failed to recognize it as a problem, then failed to fix it and finally treated it as an acceptable flight risk.” (italics added) Let’s recap that:

  • initially failed to formally recognize it as a problem (but later did)
  • failed to fix it
  • the problem finally morphed into an acceptable flight risk

When further testing confirmed O-ring issues, instead of fixing the problem, “the reaction by both NASA and Thiokol (the contractor) was to increase the amount of damage considered ‘acceptable’.” That is, in effect, they changed their criteria for risk.  Implicitly, and by extension, loss of life and vehicle was now more acceptable.

For risk to become acceptable, the criteria for impact and/or probability would have had to change — even if implicitly

Physicist Richard Feynman, on the Commission, observed:

“a kind of Russian roulette. … (The Shuttle) flies (with O-ring erosion) and nothing happens. Then it is suggested, therefore, that the risk is no longer so high for the next flights. We can lower our standards a little bit because we got away with it last time. … You got away with it, but it shouldn’t be done over and over again like that.”

This is the sort of frog boiling or baseline shifting that we talked about earlier. Though there were many contributing factors, I believe that one of them was that they simply got familiar with and comfortable with the risk. In a way, the risk was ‘old news’.

Frog Boiling in Information Risk Management

This kind of frog boiling or baseline shifting can happen with Information Risk Management as well. As we become inundated with tasks, complexity, and knowledge of new threats and vulnerabilities, it can be tempting to reduce the importance of risk issues that were established earlier. That is, we can have a tendency to put a higher priority on the issue(s) that we have been most recently dealing with. If we are not careful, the perceptions of earlier risks can actually change.  Gregory Berns discusses perception change in his book, Iconoclast.

It’s one thing to consciously re-evaluate our tolerance for different types of risk because we have new information and new things to consider, but it is another to let our tolerance for risk slide because of fatigue, ‘information overload’, political pressure, or distraction.

More than once I’ve picked up an older Information Risk Register & Heat Map and reminded myself that there were unmitigated risks that were still there. Yes, since that time, I had new risks to deal with, but that didn’t mean that the original ones went away.

One way to counter this creeping perception change is to revisit our earlier Risk Registers and Heat Maps:

      • Look at them again.
      • Did the original issues really go away?
      • Did their impacts or likelihoods really lessen?

It may well be that with new knowledge of threats, vulnerabilities, and capabilities, that we have to revisit and adjust (usually increase) our tolerance for risk given our limited resources. That is fine and appropriate. However, this needs to be a conscious effort. Changing the importance of a risk issue, whether it be its impact or probability, because we are tired, overwhelmed, or because the issue is ‘old news’ doesn’t help us. We need to be methodical and consistent with our approach. Otherwise, we are just fooling ourselves.
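One way to keep the approach methodical is to keep register snapshots and diff them. Below is a toy Python sketch (hypothetical field names and data, not from any particular tool) that flags entries whose scores drifted between snapshots without a recorded justification.

# Toy sketch: flag risk-register entries whose scores drifted between
# snapshots without a documented reason. Fields and data are illustrative.

old_register = {
    "unencrypted laptops": {"impact": 4, "probability": 3},
    "shared cloud credentials": {"impact": 4, "probability": 4},
}

new_register = {
    "unencrypted laptops": {"impact": 4, "probability": 3},
    # Same exposure, quietly downgraded, and no one wrote down why.
    "shared cloud credentials": {"impact": 3, "probability": 3, "justification": None},
}

for risk, then in old_register.items():
    now = new_register.get(risk)
    if now is None:
        print(f"DROPPED without explanation: {risk}")
    elif (now["impact"], now["probability"]) != (then["impact"], then["probability"]):
        if not now.get("justification"):
            print(f"UNEXPLAINED drift on {risk}: "
                  f"{then['impact']}x{then['probability']} -> "
                  f"{now['impact']}x{now['probability']}")

If a score changed and the justification field is empty, the register is telling us the frog-boiling story about ourselves.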

Have you had to change your risk tolerance in your organization?  Have you had to change your criteria for impact and probability? What were the driving factors for re-assessing how you measure risk impact?

Federal court rules SMB — not bank — liable for loss from online theft

Even though the Uniform Commercial Code places the risk of loss for unauthorized transfers with banks, a federal court in Missouri ruled last month against an SMB and in favor of the much larger bank, primarily because the SMB did not implement fraud prevention controls offered by the bank.  The result was a $440,000 loss for the SMB.  Here’s the nutshell version:

  1. SMB has business account with bank
  2. Bank offers security (fraud prevention) controls for SMB
  3. SMB declines to implement controls (twice)
  4. SMB computer hacked & SMB’s credentials used to transfer money from its bank to Cyprus bank
  5. SMB sues bank for loss stemming from stolen funds
  6. Federal court rules against SMB and with bank
  7. SMB out $440,000 plus legal expenses

If this indeed sets a precedent, it further increases SMB business risk.

Some lessons learned:

  • If your bank offers recommended security services or tools, use them (unless you can show that this directly and materially negatively impacts your business)
  • Use Positive Pay, where a list of authorized checks is provided to the bank via a separate channel (i.e. the bank has to cross-check against that list prior to paying checks/requests presented to it); a toy sketch of this cross-check appears below
  • Use a dedicated computer for banking transactions
  • Use two-factor authentication where possible
  • If not using Positive Pay or similar service, establish criteria with your bank for when they should alert you that a check or transfer request seems unusual
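To make the Positive Pay idea concrete, here is a toy Python sketch of the cross-check; the data, field layout, and matching rule are illustrative assumptions, and real implementations live inside the bank’s systems.

# Toy Positive Pay cross-check: pay only items that match the issued list
# the business sent via a separate channel. Data is illustrative.

issued_checks = {
    # check number -> (payee, amount)
    1001: ("Office Supply Co", 412.50),
    1002: ("Landlord LLC", 2200.00),
}

presented_checks = [
    (1001, "Office Supply Co", 412.50),   # matches: pay
    (1002, "Landlord LLC", 9200.00),      # amount altered: flag
    (1003, "Unknown Payee", 440000.00),   # never issued: flag
]

for number, payee, amount in presented_checks:
    if issued_checks.get(number) == (payee, amount):
        print(f"PAY    check {number}: {payee} ${amount:,.2f}")
    else:
        print(f"REVIEW check {number}: {payee} ${amount:,.2f} (no match on issued list)")

The point is the separate channel: stolen online-banking credentials alone can’t add an item to the issued list.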

 More here in this Dark Reading story.

Communicate risk in a single page (the first time)

One of the most challenging aspects of work for an IT or information management professional is communicating risk.  If you are in a resource-constrained business, e.g. a small or medium-sized business (SMB), that hasn’t analyzed information risk before, consider communicating it the first time in a single page.

The reason for a single-page communication is that risk can be complicated and obscure, and IT technologies, concepts, and vocabulary can be complicated and obscure too; the combination goes well beyond mystifying for an audience not familiar with either (which is most people).

A few years ago I was in a position to try to communicate information risk to a number of highly educated, highly accomplished, and high-performing professionals with strong opinions (doctors).  I had only a tiny sliver of their time and attention for my pitch on the information risk in their work environment.  If I had tried some sort of multi-page analysis and long presentation, I would have been able to hear the ‘clunk’ as their eyes rolled back in their heads.

Clearly, there was no lack of intellectual capacity for this group, but there was a lack of available bandwidth for this topic and I had to optimize the small amount that I could get.

After several iterations and some informal trials (which largely consisted of me pitching the current iteration of my information risk presentation while walking with a doc in the hall on the way to the operating room), I came up with my single page approach.  It consists of three components:

  • (truncated) risk register 
  • simple heat map
  • 2 – 3 mitigation tasks or objectives

Communicate risk in a single page

I put the attention-getting colorful heat map in the upper left hand corner, the risk register in the upper right, and a proposed simple mitigation plan at the bottom of the page.

This ended up being pretty successful.  I actually managed to engage them for 5 – 10 minutes (which is a relatively large amount of time for them) and get them thinking about information risk in their environment.

To communicate risk in a single page, I am choosing to leave information out.  This goes against our nature of wanting to be very detailed, comprehensive, and thorough in everything that we do.  However, that level of detail will actually impede communication.  And I need to communicate risk.  By leaving information out, I actually increase the communication that occurs.
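If you keep the working register in a spreadsheet or script, the leaving-out step can be as simple as scoring and truncating. A minimal Python sketch, assuming a toy register and made-up 1–5 scales:

# Toy sketch: truncate a working risk register to the top entries for a
# single-page briefing. The 1-5 scales and entries are illustrative.

register = [
    # (risk, impact 1-5, probability 1-5)
    ("PHI on shared-credential cloud service", 5, 4),
    ("Lost or stolen laptop with PHI", 4, 4),
    ("Unmanaged BYOD device brings in malware", 3, 4),
    ("Asteroid strike on server room", 5, 1),
    ("Printer toner shortage", 1, 3),
]

def score(entry):
    _, impact, probability = entry
    return impact * probability

TOP_N = 3  # a one-pager has room for only a handful of rows

for risk, impact, probability in sorted(register, key=score, reverse=True)[:TOP_N]:
    print(f"{risk:42s} I={impact} P={probability} score={impact * probability}")

Everything below the cut stays in the working register; it isn’t gone, it just isn’t on the page.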

Also, notice in the Proposed Mitigation section, I am not proposing to solve everything in the register.  I am proposing to solve things that are important and feasible in a given time frame (three months in this case).

In three, six, or nine months, we can come back with a new presentation that includes results from the proposed mitigation in this presentation.

Notice that I put “Sensitive” in a couple of places on the document to try to remind people that we don’t want to share our weak spots with the world.

If at some point, your company leadership or other stakeholders want more detail, that’s fine.  If they ask for it, they are much more likely to be able and willing to consume it.

To communicate risk, start simple.  If they want more, you’ll be ready, with your working risk register as the source.  I’m willing to bet, though, that most will be happy with a single page.


Have you presented information risk to your constituents before? What techniques did you use?  How did it go?

Impact vs Probability

Climbing aboard the helicopter for a training flight one evening was probably the first time that I thought about the difference between probability and impact as components of risk.   For whatever reason, I remember looking at the tail rotor gearbox that particular evening and thinking, “What if that fails? There aren’t a lot of options.”

Tail rotor failures in helicopters are not good.  The tail rotor counterbalances the torque of the main rotor, which generates all of the lift.  It’s what keeps the fuselage of the helicopter from spinning in the opposite direction of the main rotor in flight.  If the tail rotor fails in some way, not only are you no longer in controlled flight (because the fuselage wants to spin around in circles), but the emergency procedures (EPs) are pretty drastic and probably won’t help much.

tail rotor gearbox

So I found myself thinking, “Why in the world would I (or anyone) climb on board when something so catastrophic could happen?”  And then the answer hit me, “because it probably won’t fail.”  That is, the impact of the event is very bad, but the probability of it happening is low. This particular possibility represented the extremes — very high impact and generally low probability.

nose gear

But there are possibilities in between also.  For example, what if the nose gear gets stuck and won’t come back down when I want to land?  While not desirable, it’s certainly not as bad as the tail rotor failing.  I could come back, hover with the nose gear off the deck while someone comes out and tries to pull it down, or have them put something underneath the nose of the helicopter (like a stack of wooden pallets) and set it down on that.  While not a high likelihood of occurrence, a stuck nose gear happens more often than a tail rotor failure, so let’s call it a medium probability for the sake of argument.

While the impact of the stuck-nose-gear event is much less than that of a tail rotor failure, the potential impact is not trivial, because recovery from it requires extra people on the ground who are placed in harm’s way.  So maybe this is a medium impact event.

multiple components/multiple systems

Similarly, what if the main gear box overheats or has other problems? Or other systems have abnormalities, problems or failures? What are the probabilities and impacts of each of these?

There are multiple pieces to the puzzle and each piece needs to be considered in terms of both impact and likelihood.  Even as commercial airline passengers,

If we based our decision to fly purely on an analysis of the impact of an adverse event (crash), few people would ever fly.

We do board the plane, though,  because we know or believe that the probability of that particular plane crashing is low.  So, in making our decision to fly, we consider two components of risk: the probability of a mishap and the impact of a mishap.

We have the same kind of thing in managing risk for IT and Information Management services.  We have many interconnected and complex systems and each has components with various probabilities of failure and resulting impacts from failure.

multiple components and multiple systems in IT & Information Management systems as well

What if I’m working in a healthcare environment, users store Protected Health Information (PHI) on a cloud service with a shared user name and password, and PHI leaks out into the wild?  This might have risk components of: Probability — High.  Impact — Medium to High, depending on the size of the leak.  What about theft or loss of a laptop containing PHI?  The same for USB thumb drives.  What is the probability?  What is the impact?  What about malware infestation of workstations on your network because of a lack of configuration standards on BYOD devices?  What is the likelihood?  What is the impact?

It’s possible that our server or data center could be hit by an asteroid.  The impact would be very high.  Maybe multiple asteroids hit our failover servers and redundant data centers too!  That would surely shut us down.  But does that mean we should divert our limited business funds to put our server in an underground bunker?  Probably not — because the likelihood of server failure due to asteroid impact is very low.

As with flying, when we analyze risks in IT and Information Management operations, we have to dissect and review the events in terms of their respective impacts and probabilities.  Only when we have these two components, impact and probability, can we start to do meaningful analysis and planning.
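As a minimal illustration of holding both components at once, here is a toy Python sketch that places events on a probability-vs-impact grid; the 1–3 scales and the placements are made-up assumptions.

# Toy sketch: place events on a probability-vs-impact grid.
# The 1-3 scales and the placements are illustrative.

events = {
    "tail rotor failure":  {"probability": 1, "impact": 3},
    "stuck nose gear":     {"probability": 2, "impact": 2},
    "asteroid strike":     {"probability": 1, "impact": 3},
    "PHI on shared cloud": {"probability": 3, "impact": 3},
}

# rows = impact (high at top), columns = probability (low to high)
grid = {(p, i): [] for p in (1, 2, 3) for i in (1, 2, 3)}
for name, e in events.items():
    grid[(e["probability"], e["impact"])].append(name)

for impact in (3, 2, 1):
    cells = [", ".join(grid[(p, impact)]) or "-" for p in (1, 2, 3)]
    print(f"impact {impact} | " + " | ".join(f"{c:38s}" for c in cells))
print("            probability 1 -> 3")

The busy high-probability, high-impact corner is where planning attention goes first; the asteroid sits in a corner we consciously accept.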


What events do you plan for that have Low Probabilities but High Impacts?  What about High Probabilities but Low Impacts?

What Floyd the Barber knew about information risk management

The Mayberry Model

Watch the till and lock the door at night. If you were opening a small business 30 years ago, your major security concerns were probably to keep an eye on the till (cash register) during the day and to lock the door at night.  It reminds me a little bit of the Andy Griffith Show, which ran in the 1960s and was set in the small fictional town of Mayberry, North Carolina.  Mayberry enterprises included Floyd’s Barbershop, Emmett’s Fix-It Shop, and Foley’s Grocery.


Floyd didn’t need a risk management program, much less an IT risk management program, to run his business.  It was pretty easy to remember — watch the till and lock the door.  He could also easily describe and assign those tasks to someone else if he wasn’t available.  Further, it was fairly easy to watch the till: money was physical — paper or metal — and it was transferred to or from the cash drawer. He knew everyone who came into his shop.  Same for Emmett and his Fix-It Shop.  Plus they had the added bonus of a pleasant bell ring whenever the cash drawer opened.  This leads us to the MISRMP (Mayberry Information Security & Risk Management Plan).


Mayberry Information Security and Risk Management Plan:

  • Watch the till 
  • Lock the door at night
  • Make sure the cash register bell is working

Today’s model

Fast forward to a small business today, however, and we have a different story.  Today, in our online stores selling products, services, or information, there is no physical till and probably little to no physical money.  There are online banks, credit cards, and PayPal accounts and we really don’t know where our money is.  We just hope we can get it when we need it.

There are not actual hands in the till nor warm bodies standing near the till when the cash drawer is opened. There is no soft bell ring to let us know the cash drawer just opened.  We don’t know the people in the store and they don’t go away when the front door is locked.  Our customers shop 24/7.

Further, instead of a till with a cash drawer, our businesses rely on very complex and interconnected equipment and systems — workstations, servers, routers, and cloud services — and we don’t have the time to stop and understand how all of this works because we’re busy running a business.  Floyd’s only piece of financial equipment was the cash register (and Emmett could fix that if it broke).

This new way of doing business has happened pretty fast. It is not possible to manage and control all the pieces that make up our financial transactions.  We also have a lot more financial transactions.  While the Internet has brought many more customers to our door, it has also brought many more criminals to our door.  Making the situation even more challenging, we largely don’t have the tools in place to manage our information risks.

Floyd the Barber

What Floyd knew (and we don’t): 

  • who his customers were (knew them by face and name)
  • what their intentions were (wanted to purchase a haircut or shave, or to steal from the till)
  • where his money was (in the till, in the bank, or in his pocket while being transferred from the shop to the bank)
  • when business transactions occurred (9:00 – 5:00, closed for lunch and on Sundays)
  • what was happening in his store after hours (nothing)

That is to say, Floyd had much less business uncertainty to contend with than we do today.  He could handle most of his uncertainty by watching the till and locking the door at night. Our small and medium-sized businesses today, though, are much more complex, have much higher levels of uncertainty, and need to be risk-managed to allow us to operate and grow.

As Floyd managed his security and risk to operate a successful business, so must we — ours is just more complicated.

What are the 3 biggest IT & Information Management risks that you see affecting your business?


Companies in the long tail & information risk

I contend that at least half of the companies in the US and other industrialized countries are critically overexposed to IT & Information Management risk, and that this population of highly vulnerable companies consists primarily of small and medium-sized companies, aka SMEs (Small and Medium-sized Enterprises).

The problem is that the techniques and approaches in the fairly fledgling field of IT risk management are usually developed from, or apply to, very large companies that differ significantly in scale from SMEs.

Often the IT risk management techniques envisioned for large companies don’t scale down to SMEs, where analytical data, staffing, and operational bandwidth are all in short supply.  Also, because of their smaller size, impacts such as total dollar loss from adverse information events (hacking, malware, fraud, etc.) are usually lower than those at large companies, and each individual compromise, breach, or disclosure is less newsworthy.  However, there are a great many small and medium-sized companies.

It turns out that company sizes in industrialized countries follow a Zipf distribution, in which a few very large companies coexist with a great many much smaller companies.  This is similar to the distribution Chris Anderson popularized in his 2004 Wired magazine article The Long Tail.  For example, Anderson describes the record industry historically focusing on the revenue generated from hits (few in number but large in revenue) and missing the fact that the many non-hit songs generated substantial revenue in aggregate.  Similarly, there are a few really big companies and a lot of smaller companies.  This high number of smaller companies (like the number of non-hit songs) is the part known as the long tail.  And this is the part suffering the overexposure to information risk, for lack of tools, methods, and approaches shared between companies.
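To see how much weight the tail carries in aggregate, here is a toy Python sketch of a Zipf-like distribution of company sizes; the exponent, company count, and scale are made-up assumptions, purely for illustration.

# Toy sketch: Zipf-like company sizes, where the rank-r company employs
# roughly C / r people. All numbers are illustrative assumptions.

N = 100_000          # number of companies
C = 1_000_000        # size of the rank-1 (largest) company

sizes = [C / rank for rank in range(1, N + 1)]
total = sum(sizes)

TOP = 100            # call the biggest 100 firms "the head"
head = sum(sizes[:TOP])

print(f"head ({TOP} firms):     {head / total:.0%} of employment")
print(f"tail ({N - TOP} firms): {1 - head / total:.0%} of employment")

With these toy numbers, the 100 biggest firms hold a bit under half of total employment and the other 99,900 firms hold the rest, which is roughly the half-and-half split argued below.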


The challenge is that many of the information risk management techniques and processes used by the relatively few very big companies don’t work well for smaller companies.  This is due largely, but not entirely, to the resource constraints of smaller companies.  Staff in smaller companies frequently wear multiple hats and are eyeball-deep in sales, innovation, marketing, and infrastructure development; management of risk is often well down the priority list.

As a whole, we end up with part of the population, the few large companies, with reasonable IT risk management capabilities and the other part, the medium and small companies, with poor IT risk management capabilities.

For the sake of argument, say that half the working population is in the few very large companies and the other half is in the many small and medium-sized companies.  Oversimplifying a bit, this means that half of the working population is in companies able to manage risk and the other half is in companies that can’t.

What can be done to enhance the capability of the half that currently can’t manage information risk effectively (or at all)?  What can we do to provide small and medium-sized companies with risk management tools that are pragmatic and implementable?  We need techniques, mechanisms, and shared lessons learned from performing risk management in small and medium-sized companies.

Do you work in a small to medium sized company? How do you address IT risk management? What other reasons do you see for lack of IT risk management in medium and small sized companies?