Borrowing from search to characterize network risk

Most frequently occurring port is in the outer ring, 2nd most is the next ring in, …

Borrowing some ideas from document search techniques, data from the Shodan database can be used to characterize networks at a glance. In the last post, I used Shodan data for public IP spaces associated with different organizations and Wordle to create a quick and dirty word cloud visualization of exposure by port/service for that organization.

The word cloud idea works pretty well in communicating at a glance the top two or three ports/services most frequently seen for a given area of study (IP space). I wanted to extend this a bit and compare organizations by a linear rank of the most frequently occurring services on each organization’s network: capturing both the most frequently occurring ports/services and their rank, and then using those criteria to compare different organizations (IP spaces).

Vector space model

I also wanted to experiment with visualizing this in a way that would give at a glance something of a ‘signature’.  Sooooo, here’s the idea: document search often uses a vector space model where documents are broken down into vectors. Each document’s vector has an element for every word that occurs in that document. The weight given to each word (or term or element) can be computed in a number of different ways, but one of the most popular is the frequency with which that word occurs in that document (and sometimes across all of the documents combined).

A similar idea was used here, except that I used frequency with which ports/services appeared in an organization instead of words in a document. I looked at the top 5 ports/services that appeared.  I also experimented with the top 10 ports/services, but that got a little busy on the graphic and it also seemed that as I moved further down the ordered port list — 8th most frequent, 9th most frequent, etc — that these additional ports were adding less and less to the characterization of the network. Could be wrong, but it just seemed that way at the time.

I went through 12 organizations and collected the top 5 ports/services in each. Organizations varied between approximately 10,000 and 50,000 IP addresses. To have a common basis for comparison, I used the union of the ports returned across all of the organizations’ Top 5 lists as the shared port list.
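A sketch of this vector construction, with invented organizations and port observations (the real data came from Shodan; nothing below is the actual study data):

```python
from collections import Counter

# Hypothetical per-organization port observations, one entry per banner seen.
org_ports = {
    "org_a": [80, 80, 80, 443, 443, 22, 21, 23, 23, 23],
    "org_b": [22, 22, 22, 80, 80, 3389, 5900, 443],
}

TOP_N = 5

def top_ports(observations, n=TOP_N):
    """Return the n most frequent ports, most frequent first."""
    return [port for port, _ in Counter(observations).most_common(n)]

# Shared basis for comparison: the union of every organization's Top 5 ports.
basis = sorted({p for obs in org_ports.values() for p in top_ports(obs)})

def rank_vector(observations, basis):
    """Rank of each basis port for one org (1 = most frequent, 0 = absent)."""
    ranked = top_ports(observations)
    return [ranked.index(p) + 1 if p in ranked else 0 for p in basis]

vectors = {org: rank_vector(obs, basis) for org, obs in org_ports.items()}
```

Each organization ends up as an equal-length vector over the shared port list, which is what makes the later distance comparisons possible.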

Visualizing port rank ‘signatures’

A polar plot was created where each radial represents each port/service.  The rings of the plot represent the rank of that port — most frequently occurring, 2nd most frequently occurring, …, 5th most frequently occurring. I used a polar plot because I wanted something that might generate easily recognizable shapes or patterns. Another plot could have been used, but this one grabbed my eye the most.
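A minimal sketch of such a plot, assuming matplotlib is available (the ports and ranks here are invented, not from the study):

```python
import math

try:
    import matplotlib
    matplotlib.use("Agg")  # render off-screen, no display needed
    import matplotlib.pyplot as plt
    HAVE_MPL = True
except ImportError:
    HAVE_MPL = False

# Invented example: one radial per basis port, rank 1 = most frequent.
ports = [21, 22, 23, 80, 443]
ranks = [5, 4, 2, 1, 3]

n = len(ports)
angles = [2 * math.pi * i / n for i in range(n)]
# Invert rank so the most frequent port lands on the outer ring.
radii = [max(ranks) - r + 1 for r in ranks]

if HAVE_MPL:
    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    ax.plot(angles + angles[:1], radii + radii[:1])  # close the shape
    ax.set_xticks(angles)
    ax.set_xticklabels([str(p) for p in ports])
    fig.savefig("port_signature.png")
```

Closing the line back to its first point is what gives each organization a recognizable polygon-like ‘signature’ shape.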

Finally, to really get geeky, to measure similarity in some form, I computed the Euclidean distance between each possible vector pair. The two closest organizations of the 12 analyzed (that is, the most similar port vectors) are:

 


2 of the most similar organizations by Euclidean distance — ports 21, 23, & 443 show up with the same rank & port 80 shows up with a rank difference of only 1. This makes them close.  (Euclidean distance of ~2.5)
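The distance computation behind these comparisons is short; a minimal sketch with invented rank vectors (not the actual pair above):

```python
import math

def euclidean(u, v):
    """Straight-line distance between two equal-length rank vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Invented vectors: three ports share a rank, one differs by 1, one by 2.
org_a = [1, 2, 3, 4, 5]
org_b = [1, 2, 3, 5, 3]

print(euclidean(org_a, org_b))  # sqrt(5), about 2.24
```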

Two of the furthest away of the 12 studied (least similar port vectors) are these:

 


While port 80 aligns between the two (has the same rank) and port 22 is close in rank between the two, there is no alignment between ports 23, 3389, or 5900. This non-alignment in port rank creates more distance between the two. (Euclidean distance of ~9.8)

Finally, this last one is somewhere in the middle of the pack:

 


A distance chosen from the middle of the sorted distances (the median). Euclidean distance is ~8.7. Because this median value is much closer to the most dissimilar pair than to the most similar, it seems to indicate a high degree of dissimilarity across the set studied (I think).

Overall, I liked the plots. I also liked the polar approach. I was hoping that I would see a little more of a ‘shape feel’, but I only studied 12 organizations.  I’d like to add more organizations to the study and see if additional patterns emerge. I also tried other distance measures (Hamming, cosine, Jaccard, Chebyshev, cityblock, etc.) because they were readily available and easy to use with the scipy library that I was using, but none offered a noticeable uptick in utility over the plain Euclidean measure.
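Those alternative metrics are each one keyword argument away in scipy; a sketch with invented rank vectors (scipy and numpy assumed installed):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Invented rank vectors, one row per organization.
X = np.array([
    [1, 2, 3, 0, 0],
    [1, 2, 0, 3, 0],
    [0, 0, 1, 2, 3],
])

# pdist returns the condensed pairwise distances; squareform expands
# them into the full symmetric matrix for easy inspection.
for metric in ("euclidean", "cityblock", "chebyshev", "cosine"):
    D = squareform(pdist(X, metric=metric))
    print(metric, D.round(2), sep="\n")
```

Swapping the metric string is all it takes to rerun the whole pairwise comparison, which is why trying the alternatives was cheap.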

Cool questions from this to pursue might be:

1. For similar patterns between 2 or more organizations, can history of network development be inferred? Was a key person at both organizations at some point? Did one org copy another org?

2. Could the ranked port exposure lend itself to approximating risk for a combined/multipronged cyber attack?

Again, if you’re doing similar work on network/IP space characterization and want to share, please contact me at ChuckBenson at this website’s domain for email.

Poor Man’s Industrial Control System Risk Visualization

The market is exploding with a variety of visualization tools to assist with ‘big data’ analysis in general and security and risk awareness analysis efforts in particular. Who the winner or winners are in this arena is far from settled, and it can be difficult to figure out where to start. While we analyze these different products and services and try some of our own approaches, it is good to keep in mind that there can also be some simple initial value-add in working with quick and easy, nontraditional (at least in this context) visualization tools.

Even simple data visualization can be helpful

I’ve been working with some Shodan data for the past year or so. Shodan, created by John Matherly, is a service that scans several ports/services related to Industrial Control Systems (ICS) and, increasingly, Internet of Things sorts of devices and systems. The service records the results of these scans and puts them in a web-accessible database. The results are available online or via a variety of export formats, including CSV, JSON, and XML (though XML is deprecated). In his new site format, Matherly also makes some visualizations of his own available. For example, here’s one ranking results for a particular subset of IP ranges that I was analyzing:


One of the builtin Shodan visualizations — Top operating systems

Initially, I wanted to do some work with the text in the banners that Shodan returns, but I found that there was some even simpler stuff that I could do with port counts (number of times a particular port shows up in a subset of IP addresses) to start. For example, I downloaded the results from a Shodan scan, counted the occurrences for each port, ran a quick script to create a file of repeated ‘words’ (actually port numbers), and then dropped that into a text box on Wordle.
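That counting step only takes a few lines; a sketch using an invented, heavily simplified record shape (real Shodan exports carry many more fields per banner):

```python
from collections import Counter

# Hypothetical shape of a Shodan export: one record per banner,
# each with at least an IP and a port.
records = [
    {"ip_str": "192.0.2.1", "port": 80},
    {"ip_str": "192.0.2.2", "port": 80},
    {"ip_str": "192.0.2.3", "port": 5900},
]

counts = Counter(rec["port"] for rec in records)

# Wordle sizes words by how often they repeat, so emit each port
# number once per occurrence, e.g. "80 80 5900".
wordle_text = " ".join(str(port) for port, n in counts.items() for _ in range(n))
print(wordle_text)
```

The resulting string is exactly what gets pasted into Wordle’s text box.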

Inexpensive (free) data visualization tools

Wordle is probably the most popular web-based way of creating a word cloud. You just paste your text in here (repeated ports in our case):


Just cut & paste ports into Wordle

Click create and you’ve got a word cloud based on the number of ports/services in your IP range of interest. Sure you could look at this in a tabular report, but to me, there’s something about this that facilitates increased reflection regarding the exposure of the IP space that I am interested in analyzing.

 


VNC much? Who says telnet is out of style?

[For some technical trivia, I did this by downloading the Shodan results into a JSON file, using Python to import, parse, and upload to a MySQL database, and then running queries from there. Also, Wordle uses Java, so it didn’t play well with Chrome and I switched to Safari for Wordle.]
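A minimal stand-in for that load-then-query pipeline, using the stdlib sqlite3 module in place of MySQL so it stays self-contained (the table and column names are my assumptions, not Shodan’s schema):

```python
import json
import sqlite3

# Invented raw export: a JSON array of banner records.
raw = '[{"ip_str": "192.0.2.1", "port": 23}, {"ip_str": "192.0.2.2", "port": 23}]'
records = json.loads(raw)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE banners (ip TEXT, port INTEGER)")
conn.executemany(
    "INSERT INTO banners VALUES (?, ?)",
    [(r["ip_str"], r["port"]) for r in records],
)

# The port counts for the word cloud fall straight out of a GROUP BY.
rows = conn.execute(
    "SELECT port, COUNT(*) FROM banners GROUP BY port ORDER BY COUNT(*) DESC"
).fetchall()
print(rows)
```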

In addition to quickly eyeball-analyzing an IP space of interest, this approach can also make for interesting comparisons between related IP spaces. Below are two word clouds for organizations that have very similar missions and staff makeup. You would expect (I did, anyway) their relative port counts and word clouds to be fairly similar. As the results below show, however, they may be very different.


Organization 1’s most frequently found ports/services


Organization 2’s most frequent ports/services — same mission and similar staffing as Org 1, but network (IP space) has some significant differences

Next steps are to explore a couple of other visualization ideas of using port counts to characterize IP spaces and then back to the banner text analysis. Hopefully, I’ll have a post on that up soon.

If you’re doing related work, I would be interested in hearing about what you’re exploring.

Shodan creator opens up tools and services to higher ed


Cisco/Beecham

The Shodan database and web site, famous for identifying and cataloging the Internet for Industrial Control Systems and Internet of Things devices and systems, is now providing free tools to educational institutions. Shodan creator John Matherly says that “by making the information about what is on their [universities] network more accessible they will start fixing/ discussing some of the systemic issues.”

The .edu package includes over 100 export credits (for large data/report exports), access to the new Shodan maps feature which correlates results with geographical maps, and the Small Business API plan which provides programmatic access to the data (vs web access or exports).

It has been acknowledged that higher ed faces unique and substantial risks due in part to intellectual property derived from research and Personally Identifiable Information (PII) issues surrounding students, faculty, and staff. In fact, a recent report states that US higher education institutions are at higher risk of security breach than retail or healthcare. The FBI has documented multiple attack avenues on universities in their white paper, Higher Education and National Security: The Targeting of Sensitive, Proprietary and Classified Information on Campuses of Higher Education.

The openness and sharing and knowledge propagation mindset of universities can be a significant component of the risk that they face.

Data breaches at universities have clear financial and reputation impacts to the organization. Reputation damage at universities not only affects the ability to attract students, it also likely affects the ability of universities to recruit and retain high producing, highly visible faculty.

This realm of risk of Industrial Control Systems combined with Internet of Things is a rapidly growing and little understood sector of exposure for universities. In addition to research data and intellectual property, PII data from students, faculty, and staff, and PHI data if the university has a medical facility, universities can also be like small to medium sized cities. These ‘cities’ might provide electric, gas, and water services, run their own HVAC systems, fire alarm systems, building access systems and other ICS/IoT kinds of systems. As in other organizations, these can provide substantial points of attack for malicious actors.

Use of tools such as Shodan to identify, analyze, prioritize, and develop mitigation plans is important for any higher education organization. Even if the resources are not immediately available to mitigate identified risk, at least university leadership knows it is there and has the opportunity to weigh that risk along with all of the other risks that universities face. We can rest assured that bad guys, whatever their respective motivations, are looking at exposure and attack avenues at higher education institutions — higher ed institutions might as well have the same information as the bad guys.

Managing the risk of everything else (and there’s about to be more of everything else)

see me, feel me, touch me, heal me

As organizations, whether it be companies, government, or education, when we talk about managing information risk, it tends to be about desktops and laptops, web and application servers, and mobile devices like tablets and smartphones. Often, it’s challenging enough to set aside time to talk about even those. However, there is new rapidly emerging risk that generally hasn’t made it to the discussion yet. It’s the everything else part.

The problem is that the everything else might become the biggest part.

 

Everything else

This everything else includes networked devices and systems that are generally not workstations, servers, and smart phones. It includes things like networked video cameras, HVAC and other building control, wearable computing like Google Glass, personal medical devices like glucose monitors and pacemakers, home/business security and energy management, and others. The popular term for these has become Internet of Things (IoT) with some portions also sometimes referred to as Industrial Control Systems (ICS).

There are a couple of reasons for this lack of awareness. One is simply the relative newness of this sort of networked computing. It just hasn’t been around that long in large numbers (but it is growing fast). Another reason is that it is hard to define. It doesn’t fit well with historical descriptions of technology devices and systems. These devices and systems have attributes and issues that are unlike what we are used to.

Gotta name it to manage it

So what do we call this ‘everything else’ and how do we wrap our heads around it to assess the risk it brings to our organizations? As mentioned, devices/systems in this group of everything else can have some unique attributes and issues. In addition to using the unsatisfying approach of defining these systems/devices by what they are not (workstations, application & infrastructure servers, and phones/tablets), here are some of the attributes of these devices and systems:

  • difficult to patch/update software (& more likely, many or most will never be patched)
  • inexpensive — there can be little barrier to entry to putting these devices/systems on our networks, e.g. easy-setup network cameras for $50 at your local drugstore
  • large variety/variability — many different types of devices from many different manufacturers with many different versions, another long tail
  • greater mystery to hardware/software provenance (where did they come from? how many different people/companies participated in the manufacture? who are they?)
  • large numbers of devices — because they’re inexpensive, it’s easy to deploy a lot of them; difficult or impossible to feasibly count, much less inventory
  • identity — devices might not have the traditional notion of identity, such as having a device ‘owner’
  • little precedent — not much in the way of helpful existing risk management models, and few policies or guidelines for use
  • everywhere — out-ubiquitizes (you can quote me on that) the PC’s famed Bill Gatesian ubiquity
  • most are not hidden behind corporate or other firewalls (see Shodan)
  • environmental sensing & interacting (Tommy, can you hear me?)
  • comprise a growing fraction of Industrial Control and Critical Infrastructure systems

So, after all that, I’m still kind of stuck with ‘everything else’ as a description at this point. But, clearly, that description won’t last long. Another option, though it might have a slightly creepy quality, could be the phrase ‘human operator independent’ devices and systems. (But the acronym ‘HOI’ sounds a bit like Oy! and that could be fun.)

I’m open to ideas here. Managing the risks associated with these devices and systems will continue to be elusive if it’s hard to even talk about them. If you’ve got ideas about language for this space, I’m all ears.

 

Managed risk is not the only risk


Just because we choose to measure these doesn’t mean that we choose what can go wrong

While we might do careful reflection on what risks to track and manage in our information systems, it is important for us to remember that just because we’ve chosen to measure or track certain risks, that doesn’t mean that the other unmeasured and unmanaged risks have gone away. They’re still there — we’ve just made a choice to look elsewhere.

In an aircraft, a choice has been made on what systems are monitored and then what aspects of those systems are monitored. In the aircraft that I flew in the Marine Corps some years ago, a CH-53 helicopter, the systems monitored were often gearboxes, hydraulic systems, and engines. Then, for each of those systems, particular attributes were monitored — temperatures and pressures in gearboxes, pressures in hydraulic systems, and turbine speeds and temperatures in engines.

Good to know

These were all good things to know. They were reasonable choices of things to be aware of. I liked knowing that the pressure in the hydraulics system was not too high and not too low. I liked knowing that the engine temperature might be getting a little high and maybe I was climbing too fast or a load might have been heavier than I had calculated. Turbine speed in the ball park of where it’s supposed to be? Good to know too.

These were all good things to monitor, but they were not the only places that things could go wrong. And you couldn’t monitor everything — otherwise you’d spend all of your time monitoring systems when your primary job was to fly. (And there wouldn’t be enough room in the cockpit to have an indicator for everything!)

Subset of risks

So a selection of things to measure and monitor is made, and certain sensors and indicators are built into the aircraft. Of all of the things that can go wrong, the aircraft cockpit only gave me a subset of things to monitor. But again, this is by design, because the job was to get people and stuff from Point A to Point B. The job wasn’t to attempt to identify, measure, monitor, and mitigate every conceivable risk. It’s just not possible to do both.

Much like a movie director chooses what the viewer will see and not see in a movie scene, in our information systems we choose what risks we will be most aware of by adding them to our risk management plan. It is important for us to keep in mind, though, that the other risks with all of their individual and cumulative probabilities and impacts have not gone away. They’re still there — we’ve just chosen not to look at them.

Does trust scale?

In this age where scale is king, where government-sanctioned pension defaults occur, where executive compensation and line-worker pay disparities continue to grow, and where some will shed trust for a few moments of attention, what does trust mean to us? Is there a limit to how large a business can grow and still be trusted, both internally (employee to business) and externally (business to customer)?

Many, if not most, of our information systems rely on trust. Prime examples are banking systems, healthcare systems, and Industrial Control Systems (ICS). We expect banking and healthcare systems to have technical protections in place to keep our information from ‘getting out’. We expect that the people who operate these systems won’t reveal our data or the secrets and mechanisms that protect them.

Similarly, critical infrastructure ICS, such as power generation and distribution systems, must deliver essential services to the public, government, and businesses. To prevent misuse, whether through ignorance or malicious intent, they must do so without revealing to all how it is done. Again, we expect there to be sufficient protective technologies in place and trusted people who, in turn, protect these systems.

The problem is that I’m not sure that trust scales at the same rate as other aspects of the business.

British anthropologist Robin Dunbar’s research suggests that the maximum number of stable relationships a person can maintain is in the ballpark of 150. Beyond that number, the ability to recognize faces, trust others in the organization, and other attributes of a stable group begin to roll off.

Exacerbating this numerical analysis are the recent phenomena mentioned above of pension defaults, unprecedented compensation disparities, and selling trust for attention. We don’t trust our employers like we used to. That idealized 1950’s corporate loyalty image is simply not there.

No data centers for trust

So as critical information systems such as healthcare, banking, and ICS seek to scale to optimize efficiency for profit margins, while their systems require trust and that required trust doesn’t scale with them, what does that mean?

It means there is a gap. There are no data centers for trust amongst people. The popular business model implies that trust scales as the business scales, but trust doesn’t scale that way, and then we’re surprised when things go awry.

I think it’s reasonable to assert that in an environment of diminishing trust in business and corporations (society today), the likelihood goes up of one or more constituents violating that trust and possibly disclosing data or the secrets of the mechanisms that protect that data.

Can we fix it?

I don’t think so. It’s a pleasant thought and it’s tidy math, but it’s just that — pleasant and tidy and not real. However, the next best thing is to recognize and acknowledge this. Recognize and plan for the fact that the average trust level across 100 large businesses is probably measurably less than the average trust level across 100 small businesses.

With globalization and mingling of nationalities in a single business entity, there is talk of misplaced loyalties as a source of “insider threat” or other trust leakage or violation. That may be, but I don’t know that it’s worse than the changes in perception of loyalty in any one country stemming from changes in trust perception over the past couple of decades.

So what do we do — Resilience

It gets back to resilience. If we scale beyond a certain point, we’re going to incur more risk — so plan for it. Set aside resources to respond to data breach costs, reputation damage, and other unpleasantness. Or plan to stop scaling fairly early on. Businesses that choose this route are probably fairly atypical, but not unheard of.

We can’t control what happens to us, but we can plan for a little more arbitrariness and a few more surprises. This doesn’t mean the check is in the mail, but it increases the likelihood that our business can make it to another day.

A trash can, a credit card, & a trip to the computer store

“A trash can, credit card, and a trip to the computer store” is how Bruce Schneier recently described the software update process (patch management) for networked consumer devices, aka Internet of Things devices. This category of devices already includes home/small business routers and cable modems and is quickly growing to include home energy management devices, home health devices and systems, and a plethora of automation devices and systems.

I believe he is spot on. There may be a few people who consistently download, reprogram, and reconfigure their devices but I would estimate that it’s well under 1%.

The problem of software updates/patch management for Internet of Things devices, both consumer and enterprise, is a significant issue on its own. The bigger issue, though, is that we largely tend to think we’re going to manage these updates in a traditional way, such as Microsoft’s famous Patch Tuesday. That simply won’t happen, given the raw number of Internet of Things devices and the variability of device types.

The work before us then is twofold: 1) develop automated patch management solutions that can detect outdated software and update/patch it for at least a subset of the devices on the network, and 2) find a way to formally acknowledge and document the risk of the larger group of devices that remain forever unpatched.
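The two halves fit together: anything the automated check cannot cover falls into the documented-risk bucket. A minimal sketch, with an invented device inventory and an invented known-latest firmware table (none of this reflects a real product):

```python
# Hypothetical known-latest firmware versions, keyed by device model.
latest = {"acme-cam": (2, 1, 0), "acme-thermostat": (1, 4, 2)}

# Hypothetical device inventory, e.g. gathered from network scans.
devices = [
    {"id": "cam-01", "model": "acme-cam", "firmware": (2, 1, 0)},
    {"id": "cam-02", "model": "acme-cam", "firmware": (1, 9, 3)},
    {"id": "therm-7", "model": "unknown-widget", "firmware": (0, 1, 0)},
]

outdated, unmanaged = [], []
for dev in devices:
    if dev["model"] not in latest:
        unmanaged.append(dev["id"])   # option 2: acknowledge & document this risk
    elif dev["firmware"] < latest[dev["model"]]:
        outdated.append(dev["id"])    # option 1: candidate for automated patching

print("outdated:", outdated)
print("unmanaged:", unmanaged)
```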

Option 1 has a cost. Option 2 has a cost. I think it will turn out that wrapping our heads around Option 2, the risk, will prove to be more difficult than creating some automated patching solutions.

Use Heartbleed response to help profile your vendor relationships


Heartbleed vulnerability announced on 4/7/14

To paraphrase R.E.M., whether Heartbleed is the end of the world as we know it (an 11 on a scale of 10) or we feel fine (or at least not much different), how our vendors respond or don’t respond gives us the opportunity to learn a little more about our relationship with them.

I’ve only seen one unsolicited vendor response that proactively addressed the Heartbleed discovery. In effect, the email said that they (the vendor) knew there was a newly identified vulnerability, they analyzed the risk for their particular product, took action on their analysis, and communicated the effort to their customers. This was great. But it was only one vendor.

Other vendors responded to questions that I had, but I had to reach out to them. And from some vendors, it has been crickets (whether there was an explicit Heartbleed vulnerability in their product/service or not).

Ostensibly, when we purchase a vendor’s product or service, we partner with them. They provide a critical asset or service and often an ongoing maintenance contract along with that product/service. The picture that we typically have in our heads is that we are partners; that we’re in it together. Generally, that’s also how the vendor wants us to feel about it.

What does it mean then, if we have little or no communication from our ‘partner’ when a major vulnerability such as Heartbleed is announced? Where this is the case, the partner concept breaks down. And if it breaks down here, where else might it break down?

Because of this, we can use the Heartbleed event to provide a mechanism to revisit how we view our vendor relationships. A simple table that documents vendor response to Heartbleed could give us broader and deeper perspective into understanding our vendor relationships.

Profiling vendors by Heartbleed response

For this example, because of their quick communication that did not require me to reach out, I might send a thank-you email to Vendor A to further tighten that relationship. Vendor C and Vendor Z are in the same ballpark, but I might want to follow up on the delay.  I’ll definitely be keeping Vendor B’s complete lack of response in mind the next time the sales guy calls.
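A table like this can be kept in a few lines of code; a sketch with invented vendors and dates (only the Heartbleed disclosure date of 2014-04-07 is real):

```python
from datetime import date

disclosure = date(2014, 4, 7)  # Heartbleed public disclosure

# Invented response records: did they contact us first, and when?
responses = {
    "Vendor A": {"proactive": True,  "responded": date(2014, 4, 8)},
    "Vendor B": {"proactive": False, "responded": None},
    "Vendor C": {"proactive": False, "responded": date(2014, 4, 15)},
}

lines = []
for vendor, r in sorted(responses.items()):
    if r["responded"] is None:
        note = "no response"
    else:
        days = (r["responded"] - disclosure).days
        note = f"{'proactive' if r['proactive'] else 'on request'}, day {days}"
    lines.append(f"{vendor}: {note}")

print("\n".join(lines))
```

Sorting and printing the table each time a major vulnerability lands gives a running profile of which ‘partners’ actually act like partners.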

Again, some vendor responses might be great. However, I think vendor and partner relationships aren’t as tight as we may like to tell ourselves, and we can use vendors’ responses to Heartbleed as an opportunity to reflect on that.

 

[Heartbleed image/logo: Creative Commons]

Some Heartbleed vendor notifications from SANS