Tag Archives: risk

Minuteness and power — learning to perceive IoT


for those about to do physics, we salute you

Perceiving, working with, and managing risk around the Internet of Things (IoT) requires a new way of thinking. The vast and growing numbers of sensing-networked-computing devices, the great variability of types of those devices, the ambient, almost invisible nature of many of the devices, and the emergent and unpredictable behavior from networked effects put us into a whole new world. I don’t believe that, at present, we have the mindset or tools to even broadly perceive this cultural and technological change, much less manage risk around it. In many ways, we don’t know where to start. The good news, though, is that there is historical precedent for changing how we perceive the world around us.

Physics rock stars of the 2nd millennium

While a lesser-known name, one of the top three rock star physicists of the past half millennium or so is James Clerk Maxwell, sharing the stage with Isaac Newton and Albert Einstein. While known primarily for his contributions to electromagnetic theory, he made contributions in several other areas as well, including governors, color photography, and the kinetic theory of gases. It was this latter work on the distribution of particle velocities in a gas that helped evolve the notion of entropy and led to bold new ways of perceiving the world around us. And he did all of this in his short 48 years of life.

In his book, Maxwell’s Demon: Why Warmth Disperses and Time Passes, author and physics professor Hans Christian von Baeyer describes Maxwell as particularly adept at marrying theory with what was observable in the lab or elsewhere in the natural world. As Maxwell worked to come to terms with the work that Joule, Rumford, and Clausius had done toward the development of the 2nd law of thermodynamics, he realized that he was entering a world that he could no longer directly observe. This was a world of unseeable atoms and molecules that could not be counted and whose individual behavior could not be known. As von Baeyer describes,

“What troubled him was the fact that molecules were too small to be seen, too numerous to be accounted for individually, and too quick to be captured … he no longer had the control that he had become accustomed to …”

In short, he had to abandon his expectations of certainty and find a new way to understand this realm.

To me, this has some similarities to what we’re dealing with with IoT:

  • molecules were too small to be seen; so are many IoT devices
  • molecules were too numerous to be accounted for individually; so are IoT devices
  • molecules were too quick to be captured; individual IoT devices change state too often to track

The increasingly frequent invisibility of IoT devices, the large numbers of devices, the potentially nondeterministic behavior of individual devices, and network effects create a high probability of emergent and completely unpredictable effects and outcomes.

“… building up in secret the forms of visible things …”

In Maxwell’s own words, he says,

“I have been carried … into the sanctuary of minuteness and of power, where molecules obey the laws of their existence, clash together in fierce collision, or grapple in yet more fierce embrace, building up in secret the forms of visible things.”

Of this, in particular, I believe that the phrase,

“… molecules … clash together in fierce collision or grapple in yet more fierce embrace … building up in secret the forms of visible things …”

speaks to emergent, unpredictable, and possibly even unknowable things. I’m not saying this is all bad in its own right, but rather that we’ll have to look at it, perceive it, and attempt to manage it differently than we have in the past.

Minuteness and power

In none of this am I trying to suggest that the Internet of Things is exactly like gas molecules in a balloon or that all of the properties and formulas of molecules in a gas (or wherever) apply to all systems of IoT devices. However, the change in thinking and perception of the world around us that is required might be similar. The numbers and variability of IoT devices are huge and rapidly growing, the behavior of any particular device is uncertain and possibly not knowable, and network effects will contribute to the unpredictability of systems as a whole. I’m hoping that in the onrush of excitement about IoT, as well as the unseen subtlety and nuance of IoT, we’ll acknowledge and respect the effects of that minuteness and power and work to adjust our perceptions and approaches to managing risk.

Who’s building your IoT data pipeline?


IoT data pipelines require labor too

There is a lot of excitement around IoT systems for companies, institutions, and governments that sense, aggregate, and publish useful, previously unknown information via dashboards, visualizations, and reporting. In particular, there has been much focus on the IoT endpoints that sense energy and environmental quantities and qualities in a multitude of ways. Similarly, everybody and their brother has a dashboard to sell you. And while visualization tools are not yet as numerous, they too are growing in number and availability. What is easy to miss, though, in IoT sensor and reporting deployments is the pipeline of data collection, aggregation, processing, and reporting/visualization. How does the data get from all of those sensors, through the processes along the way, to the point where it can be reported or visualized? This part can be more difficult than it initially appears.

Getting from the many Point A’s to Point B

While ‘big data’, analysis, and visualization have been around for a few years and continue to evolve, the new possibilities brought on by IoT sensing devices are much more recent news. Getting an individual’s IoT data from the point of sensing to a dashboard or similar reporting tool is generally not an issue. This is because data is coming from only one (or a few) sensing points and will (in theory) only be used by that individual. However, for companies, institutions, and governments that seek to leverage IoT sensing and aggregating systems to bring about increased operating effectiveness and ultimately cost savings, this is not a trivial task. Reliably and continuously collecting data from hundreds, thousands, or more points is not a slam dunk.

Companies and governments typically have pre-existing network and computing infrastructure systems. For these new IoT systems to work, the many sensing devices (IoT endpoints) need to be installed by competent and trusted professionals. Further, that data needs to be collected and aggregated by a system or device that knows how to talk to the endpoints. After that (possibly before), the data needs to be processed to handle sensing anomalies across the array of sensors and from there, the creation of operational/system health data and indicators is highly desirable so that the new IoT system can be monitored and maintained. Finally, data analysis and massaging is completed so that the data can be published in a dashboard, report, or visualization. The question is, who does this work? Who connects these dots and maintains that connection?
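The stages described above — collect and aggregate, process for sensing anomalies, derive operational/system health indicators, then publish — can be sketched in miniature. This is only an illustrative sketch; the sensor names, readings, and thresholds below are invented, not from any real deployment:

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Reading:
    sensor_id: str
    value: float

def collect(raw):
    """Aggregation point: gather raw (id, value) pairs from many endpoints."""
    return [Reading(sid, val) for sid, val in raw]

def clean(readings, low, high):
    """Handle sensing anomalies: drop out-of-range values."""
    return [r for r in readings if low <= r.value <= high]

def health(readings, expected_sensors):
    """Operational health indicators so the IoT system itself can be monitored."""
    reporting = {r.sensor_id for r in readings}
    return {"reporting": len(reporting), "silent": sorted(expected_sensors - reporting)}

def publish(readings):
    """Massage cleaned data into something a dashboard or report can show."""
    values = [r.value for r in readings]
    return {"count": len(values), "median": median(values)}

# Hypothetical temperature endpoints; -999.0 stands in for a sensor fault.
raw = [("t1", 21.5), ("t2", 22.0), ("t3", -999.0)]
readings = clean(collect(raw), low=-40.0, high=85.0)
status = health(readings, {"t1", "t2", "t3"})   # flags "t3" as silent
report = publish(readings)                       # count and median of good readings
```

Each of these small functions is a place where someone has to do real work: connecting to endpoints, deciding what counts as an anomaly, and defining health indicators for the specific devices deployed.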


who supplies the labor to build the pipeline?

The supplier of the IoT endpoint devices won’t be the one to build this data pipeline. Similarly, the provider of the visualization/reporting technology won’t build the pipeline. It will probably default to the company or government that is purchasing the new IoT technology to build and maintain that data pipeline. In turn, to meet the additional demand, this means that the labor will need to be contracted out or diverted from internal resources, both of which incur costs, whether direct or opportunity costs.

Patching IoT endpoint devices – Surprise! it probably won’t get done

Implementing large numbers of IoT devices and maintaining the health of the same also forces a choice between:

  1. patching the devices, or
  2. accepting the risk of unpatched devices, or
  3. some hybrid of the two.

Unless the IoT endpoint vendor supplies the ability to automatically patch endpoint devices in a way that works on your network, those devices probably won’t get patched.  From a risk management point of view, probably the best approach is to identify the highest risk endpoint devices and try to keep those patched. But I suspect even that will be difficult to accomplish.

Also, as endpoint devices become increasingly complicated and gain richer feature sets to remain competitive, they have increased ability to do more damage to your network and assets or those of others on the Internet. Any one of the above options increases cost to the organization, and yet that cost typically goes unseen.

Labor leaks

Anticipating labor requirements along the IoT data pipeline is critical for IoT system success. However, this part is often not seen and leaks away. We tend to get caught up in the IoT devices at the beginning of the data pipeline and the fancy dashboards at the end and forget about the hard work of building a quality pipeline in the middle. In our defense, this is where the bulk of the marketing and sales efforts are — at each end of the pipeline. This effort of building a reliable, secure, and continuous data pipeline is a part of the socket concept that I mentioned in an earlier post, Systems in the Seam – Shortcomings in IoT systems implementation.

With rapidly evolving new sensing technologies and new ways to integrate and represent data, we are in an exciting time and have the potential to be more productive, profitable, safe, and efficient. However, it’s not all magical — the need for labor has not gone away. The requirement to connect the dots, the devices and points along the pipeline, is still there. If we don’t capture this component, our ROI from IoT systems investments will never be what we hoped it would be.

 

Developing an IoT vendor strategy

The vendor count for IoT systems that a company or organization manages will only increase in the coming months and years, possibly substantially. Some of this will be from traditional systems like HVAC that have been in the space longer than most and are maturing and extending their IoT development and deployment. New growth in an organization’s vendor count will be from vendors with brand new products and service lines made possible by IoT innovation and expansion. Many of the benefits of IoT will come from products and services from different vendors that interact and exchange information with each other, such as an IoT implementation leveraging the cloud. Regardless of the source, the number of IoT vendors that an organization has will grow.

This increased IoT system vendor count is not a bad thing in its own right. However, a somewhat insidious effect is that the number of relationships to be managed (or not managed) will grow even faster than the increasing vendor count itself.


number of relationships grows increasingly faster than the number of nodes
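The arithmetic behind that caption is the handshake formula: among n nodes there are n(n-1)/2 possible pairwise relationships, so relationships grow quadratically while nodes grow only linearly. A quick sketch:

```python
def relationship_count(n_nodes: int) -> int:
    """Possible pairwise relationships among n nodes: n choose 2."""
    return n_nodes * (n_nodes - 1) // 2

# Doubling the node (vendor) count roughly quadruples the relationships:
pairs = {n: relationship_count(n) for n in (5, 10, 20, 40)}
# 5 -> 10, 10 -> 45, 20 -> 190, 40 -> 780
```

In practice not every vendor talks to every other vendor, so this is an upper bound, but it shows why the relationship count outruns the vendor count.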

Relationships have friction

Every relationship has friction or loss from an idealized state. Nature has plenty of examples — pressure loss in a pipe, channel capacity in information theory, marriage, and heat engine efficiency established nearly 200 years ago by Sadi Carnot. Carl von Clausewitz famously established the concept of friction in war in his book On War, in which he sometimes evokes the image of two wrestlers in a relationship.

Relationships between business customers and their vendors have friction too — from day-to-day relationship management overhead, such as communication planning and contract management, to more challenging aspects, such as expectation alignment/misalignment and resource allocation problems.


there’s a limit to how much work can get done between any two points

Friction in a business customer-vendor relationship (unavoidable to some degree) means less information gets communicated than expected, similar to Shannon’s observations on information exchange. And similar to limits expressed with Carnot’s engine efficiency, less work gets done in practice than in the idealized state. Particularly for the former, a reduction in expected information exchange, by definition, increases uncertainty. Further, friction in a network of relationships can manifest itself in yet even more uncertainty.  Less work gets done than is expected and the state of things is unclear.

With a growing network of nodes (IoT vendors in this case), the even faster growing number of relationships, and the friction that naturally exists between them, our business environments are becoming increasingly complex and accompanied with increased uncertainty. Vendor management and its associated risk, in the traditional sense, have left the building.

Sans organizational IoT strategy, IoT vendors will naturally optimize for themselves

While a strategy around IoT deployment and IoT vendor management can be difficult to devise and establish given the complexity and relative newness of the phenomenon, we have to acknowledge that vendors/providers will naturally optimize for themselves if we don’t have an IoT implementation strategy for our organizations.

This is not an easy thing. We really don’t know what is going to happen next in IoT innovation, so how do we establish strategy? Also, the strategy might cost something in terms of technical framework and staffing — and that is particularly hard to sell internally. However, without some form of an IoT system implementation strategy, each individual provider will offer a product or service line implementation that’s best for them. They won’t be optimizing for the greater good of our organization. This is not evil; it’s natural in our market economy — but we as business consumers need to be aware of this.

Similar to the concept of building a socket in the last post, in establishing a policy or framework for IoT vendor relationships, some IoT vendor considerations might include:

  • Are there standard frameworks that can be deployed to support requirements from multiple different IoT vendors? For example, does every vendor need their own dedicated, staffed, and managed database? If individual vendors demand dedicated support frameworks/infrastructure, are they willing to pay for it or otherwise subsidize it?
  • Does your vendor offer a VM (virtual machine) image that works in your data center or with your cloud provider? Do they offer a service that helps integrate their VM image into your data center or cloud environment?
  • Are there protocols that can be leveraged across multiple different vendors? Does the vendor under consideration participate in open-source protocols? For example, for managing trust, Trusted Computing Group has extended some of their efforts in an open source trust platform to the IoT space.
  • Does the vendor provide a mechanism to help you manage them for performance?  If so, the vendor acknowledges the additional complexity that managing many IoT systems brings and offers to help you review and manage performance.

While an IoT framework or policy at this stage is almost guaranteed to be imperfect, incomplete, and ephemeral, the cost of not having one puts your organization at every IoT system provider’s whim.  And that cost is probably much higher.

Systems in the seam — shortcomings in IoT system implementation


Coming apart at the seams

One of the greatest areas of risk related to the Internet of Things (IoT) in an organization, corporation, or institution comes not necessarily from the IoT systems themselves, but rather the implementation of the IoT systems. A seam forms between the delivery of the system by the vendor/provider and the use of that system by the customer.  Seams, in themselves, are not bad. In fact, they’re essential for complex systems. They connect and integrate different parts of a system to work towards a cohesive whole.  However, how we choose to approach and manage these seams makes a difference.

Managing the seam

Seams are where interesting things happen. College baseball changed its ball seams this year from raised to flat to drive more hits and home runs and, sure enough, balls are traveling an average of 20 feet farther. There are seam routes in football where the receiver tries to exploit the gap between defenders. And anyone who’s ever sat in the window seat by the wing of an airplane can attest that there are many more seams than they would probably care to see. Finally, of course, seams can also be where things come apart.


More seams than we would probably care to acknowledge

Vendor relationships and vendor management have always been important for firms and institutions. However, the invasive nature of IoT systems makes vendor management particularly important to successful IoT system implementation and subsequent operation. Yet the work and staffing required to manage those customer-vendor relationships, and to provide the oversight needed to operate safe and effective systems, often gets obscured by the promises and shininess of the new technology.

IoT systems are different from traditional deployments of workstations, laptops, and servers. By their very nature, IoT systems have the ability to sense, record, transmit, and/or interact with the environments in which we live and work. Further complicating deployment and support, these systems may well be invisible to us, and organizational IT might not even know the systems exist, much less be able to provide central IT support.

Firms and institutions purchase IoT devices and systems en masse to address some need in their operation. These IoT systems might be related to environmental control and energy efficiency, safety of staff and the public (fire, security, other), biometric authentication systems, surveillance systems and others. Because of this, IoT devices can be brought into an organization’s physical and cyber space by the hundreds or thousands or more. When such systems and devices are partially or improperly configured, there can be significant consequences to the organization. Similarly, a lack of planning of long-term support, whether local or via maintenance contract with the vendor or both, can also have significant implications.

Cost of building a socket

In most organizations, implementing a third-party solution, whether hardware, software, SaaS, or hybrid, requires a supporting infrastructure for that solution. I call this supporting structure a socket. The customer organization must create a socket that allows the new vendor solution to interface with appropriate parts of the customer’s existing infrastructure. Taking the time and resources to plan, build, and maintain this socket is integral to the operational success of the new system. It also provides the opportunity to manage some of the risk that the new system introduces to the organization.


Building a socket to support vendor IoT systems

Know yourself

One of the worst-case scenarios for an organization is believing that an IoT system is managed when it is actually not. At this point in the evolution of IoT deployments, I suspect that this scenario is more the rule than the exception. Given the scale and speed of IoT innovation and growth and the lack of precedent for managing this sort of risk, the famed Sun Tzu guidance to know yourself can be elusive. The IoT phenomenon will change how we seek to know and characterize our organizations as a part of the risk management process. A good place to start knowing ourselves is planning, building, and managing that seam where the interesting things happen.

FTC IoT guideline describes complexity, nuance of IoT

FTC IoT development guidelines http://1.usa.gov/1LeGOpX


The Federal Trade Commission (FTC) has issued a guideline to companies developing Internet of Things (IoT) products and services. The guideline addresses security, privacy, encryption, authentication, permission control, testing, default settings, patch/software update planning, customer communication and education, and others.

IoT irony

The irony is that the comprehensiveness of the document, the things to plan for and look out for when developing IoT devices and systems, is the same thing that makes me think that the preponderance of device manufacturers will never do most of the things suggested, at least not in the near term. Big companies with established brands (e.g. Microsoft, Cisco, Intel, and others) will have the motivation (and capacity) to participate in most of these recommendations. However, the bulk of the companies, and likely the bulk of the total IoT device/system marketplace entries, will be from the long tail of companies and businesses.

These are the smaller companies and startups that are just trying to get into the game. They won’t have an established brand across a large consumer base. This can also be read as ‘they don’t have as much to lose.’ Their risk and resource allocation picture does not include an established brand that needs to be protected; they don’t have a brand yet. Most of these startups and small companies will view their better play to be:

  • throw our cool idea out there
  • get something on the market
  • if we get a toehold & start to establish some brand, then we’ll start to worry about being more comprehensive with the FTC suggestions

Change

Again, to be clear, I am appreciative of the FTC guideline for manufacturers and developers of Internet of Things devices. It’s a needed document and is thoughtful, well-written, and thorough. However, the same document can’t help but illustrate all of the variables and complexities of networked computing regarding privacy and security concerns — the same privacy and security concerns that most companies will have insufficient resources and motivation to address.

We’re in for a change. It’s way more complicated than just ‘bad or good’. Where we help protect and manage risk for our organizations, we’re going to have to change how we approach things in our risk management and security efforts. No one else is going to do it for us.

Side effect of IoT growth – more attack platforms


Rapid growth brings many good things, but also drives how we manage risk. [Image: theconnectivist.com http://bit.ly/1owv1dp]

The rapid growth of the Internet of Things (IoT) phenomenon, along with its corresponding rapid growth in device count, has been the talk of the town over the past year or so. While IoT promises many good things, more conversation is being directed toward the risk brought about by the Internet of Things. Often this takes the form of warnings that someone will hack your web cams, steal your FitBit health information, hijack your routers and printers, or monkey with your thermostat remotely. While these are all important risks and concerns, I think that the bigger IoT risk has more to do with the sheer numbers of devices.

IoT devices as attack enablers

In all of the hoopla and coolness and excitement of the Internet of Things, we can sometimes forget the subtle but fundamental fact that these are all networked computing devices, many with well-known and well-understood operating systems. So, for a moment, forget the cool thing that the IoT device does in its local environment (capture video, audio, biometric authentication information, health information, temperature, humidity, refrigerator status, air composition, etc.) and just remember that they are networked computing devices — many of them with substantial computing resources.

What this means is that IoT devices are not just targets themselves, but can also act as attack enablers or attack platforms. This can occur via direct hack or by unwitting participation in a botnet.


Baku-Tbilisi-Ceyhan (BTC) pipeline near the eastern Turkish city of Erzincan on Aug. 7, 2008.

From this recent analysis of a 2008 Turkish pipeline hack and sabotage:

“As investigators followed the trail of the failed alarm system, they found the hackers’ point of entry was an unexpected one: the surveillance cameras themselves.

The cameras’ communication software had vulnerabilities the hackers used to gain entry and move deep into the internal network, according to the people briefed on the matter.

Once inside, the attackers found a computer running on a Windows operating system that was in charge of the alarm-management network, and placed a malicious program on it. That gave them the ability to sneak back in whenever they wanted.”

So, the networked computing presence of the cameras themselves was used as a stepping stone (aka attack point) into the larger network. Some weakness in the operating system (OS) of the camera devices provided a point of entry (‘vector’ in geek speak) into the pipeline’s operational network.

Big numbers

So, if we look at the growth in the number of IoT devices and consider them, for now, only as networked computing devices capable of being compromised, that’s a lot of new stepping stones for attacks.

These growing numbers of devices can enable & assist attacks by:

1) providing many more attack platforms, which …
2) provides more opportunities for indirection in attack, which …
3) makes attribution more difficult

Let’s get transitive – Kauffman’s buttons

At the risk of being a little tangential, all this reminds me of another network phenomenon, dealing with botnets, that I believe occurs. It is one that is exacerbated by the rapid increase in networked computing nodes, e.g. from IoT growth, and has to do with how quickly the character of a network can change under fairly simple conditions.

I’ve always been intrigued with this ‘toy problem’ that Stuart Kauffman describes in his book, At Home in the Universe. He says to imagine that you have a bunch of buttons on the floor and some pieces of thread. You arbitrarily pick two buttons and then connect them with a piece of thread, a button at each end. Then you arbitrarily pick two more buttons and connect those two. (The original buttons are not excluded; they are still contenders. ) Keep doing this. While doing so, create a graph and plot the thread to number of buttons ratio on the X axis and the size of the largest cluster on the Y axis.


Not too much happens at first. Early on, the largest button cluster stays pretty small. Then, at a certain point, the size of the largest cluster leaps. Logically, it’s not surprising. You can see how it happens. However, I still find myself staring at that big jump. That’s a real phase change for at least one aspect of that button network.


Quite a leap — https://keychests.com/media/bigdisk/pdf/16096.pdf
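Kauffman’s toy problem is easy to reproduce in simulation. A minimal sketch, where the button and thread counts are arbitrary choices and the largest cluster is tracked with a simple union-find structure:

```python
import random

def largest_cluster_history(n_buttons=400, n_threads=600, seed=42):
    """Kauffman's buttons-and-threads toy problem.

    Buttons are nodes; each thread joins two buttons chosen at random.
    Returns a list of (threads/buttons ratio, largest cluster size)
    after each thread is added.
    """
    rng = random.Random(seed)
    parent = list(range(n_buttons))
    size = [1] * n_buttons

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    largest, history = 1, []
    for t in range(1, n_threads + 1):
        a, b = rng.randrange(n_buttons), rng.randrange(n_buttons)
        ra, rb = find(a), find(b)
        if ra != rb:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra          # merge the two clusters
            size[ra] += size[rb]
            largest = max(largest, size[ra])
        history.append((t / n_buttons, largest))
    return history

history = largest_cluster_history()
# The largest cluster stays small early on, then leaps once the
# threads-to-buttons ratio passes roughly 0.5 — the phase change
# Kauffman describes.
```

Plotting `history` reproduces the jump in the figure: a long flat stretch followed by a sharp climb to a cluster containing most of the buttons.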

 

I think a similar thing happens with some botnets, particularly P2P botnets, as they grow in size. We can make the reasonable assumption that some botnet sizes are more effective than others at carrying out their varied nefarious tasks, e.g. 1,000 bots are probably better than 10. While individual bots in botnets do not connect to all of the other bots on the network, they do connect to many.

IoT growth => More buttons

In this environment, I think Kauffman’s toy problem still applies. Namely, at some point, the largest cluster size grows very rapidly. Maybe not with the near-vertical drama of Kauffman’s problem where everything can be connected, but still with a significant acceleration in growth of the largest cluster once a critical point is reached. And if the largest cluster size suddenly meets or exceeds that putative optimal botnet size, well then, we’ve got ourselves an effective botnet.

So if the rapid growth in IoT provides many more buttons, then there are also many more potential botnet participants for the network. And the fact that these botnets can fairly suddenly (and seemingly arbitrarily) reach their optimal effectiveness adds another air of uncertainty and unpredictability to the whole thing.

Not gloom & doom, but evolving risk picture

The sky is not falling, and the Internet of Things holds much promise, but its advent and rapid growth will change some of the math on the Internet. More botnets will come online, and they will do so in unpredictable ways. I’m not saying the end is near, but rather that the way we look at risk will have to change.

Attacks on internet of things top security predictions for 2015


Attacks on the Internet of Things top the list of Symantec’s 2015 Security Predictions. The post and infographic say that there will be a particular focus on smart home automation. Interestingly, the blog post references what is likely the Shodan database, referring to it as a “search engine that allows people to do an online search for Internet-enabled devices,” but does not mention it by name. While attacks on IoT devices/systems, or attacks via IoT devices/systems, are certainly not the only risk, they are further evidence that the attack surface provided by the rapid growth of IoT/ICS devices and systems is a burgeoning risk sector.

The report also highlights attacks on mobile devices, continuing ransomware attacks, and DDoS attacks.

Excavating Shodan Data


A shovel at a time

The Shodan data source can be a good way to begin to profile your organization’s exposure created by Industrial Control Systems (ICS) and Internet of Things (IoT) devices and systems. Public IP addresses have already been scanned for responses to known ports and services, and those responses have been stored in a searchable, web-accessible database — no muss, no fuss. The challenge is that there is A LOT of data to go through, and determining what’s useful and what’s not is nontrivial.

Data returned from Shodan queries are results from ‘banner grabs’: responses from devices and systems that are usually in place to assist with installing and managing the device/system. Fortunately or unfortunately, these banners can contain a lot of information, which is helpful for tech support, users, and operators managing devices and systems. However, the same banner data that devices and systems reveal about themselves to good guys is also revealed to bad guys.

What are we looking for?

So what data are we looking for? What would be helpful in determining some of my exposure? There are some obvious things that I might want to know about my organization. For example, are there web cams reporting themselves on my organization’s public address space? Are there rogue routers with known vulnerabilities installed? Industrial control or ‘SCADA’ systems advertising themselves? Systems advertising file, data, or control access?

The Shodan site itself provides easy starting points for these by listing and ranking popular search terms on its Explore page. (Again, this data is available to both good guys and bad guys.) However, there are so many new products and systems and associated protocols for Industrial Control Systems and the Internet of Things that we don’t know what they all are. In fact, they are so numerous and growing so quickly that we can’t know what they all are.

So how do we know what to look for in the Shodan data about our own spaces?

Excavation

My initial approach to this problem is to do what I call excavating Shodan data. I aggregate as much of the Shodan data as I can about my organization’s public address space. Importantly, I also research the data of peer organizations and include that in the aggregate as well. The reason for this is that there probably are some devices and systems that show up in peer organizations that will eventually also show up in mine.

Next, using some techniques from online document search, I tokenize all of the banners. That is, I chop up all of the words or strings into single words or 'tokens.' This yields roughly 1.5 million tokens for my current data set. The next step is to compute the frequency of each token, sort in descending order, and display some number of those discovered words/tokens. For example, I might ask for the 10 most frequently occurring tokens in my data set:

devices1st10

Top 10 most frequently occurring words/tokens — no big surprises — lots of web stuff
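The tokenize-count-sort step can be sketched in a few lines of Python; the sample banners below are hypothetical stand-ins for real Shodan results:

```python
from collections import Counter

# Hypothetical banner strings; a real run would use the banner text of
# each Shodan result for your organization's IP space.
banners = [
    "HTTP/1.1 200 OK Server: Apache Content-Type: text/html",
    "HTTP/1.1 401 Unauthorized WWW-Authenticate: Basic Password required",
    "220 FTP Server ready Password:",
]

# Tokenize: chop each banner into single words ('tokens').
tokens = [tok for banner in banners for tok in banner.split()]

# Compute the frequency of each token, then show the most frequent.
freq = Counter(tokens)
for token, count in freq.most_common(10):
    print(token, count)
```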

I’ll eyeball those and then write those to a stoplist so that they don’t occur in the next run. Then I’ll look at the next 10 most frequently occurring. After doing that a few times, I’ll dig deeper, taking bigger chunks, and ask for the 100 most frequently occurring. And then maybe the next 1000 most frequently occurring.

This is the excavation part, gradually skimming the most frequently occurring off the top to see what’s ‘underneath’. Some of the results are surprising.
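The skim-and-stoplist loop might be sketched like this (the token stream is hypothetical; a real run feeds in the full banner token list):

```python
from collections import Counter

def excavate(tokens, stoplist, n):
    """Return the n most frequent tokens not already on the stoplist."""
    freq = Counter(t for t in tokens if t not in stoplist)
    return [tok for tok, _ in freq.most_common(n)]

# Hypothetical token stream standing in for the full data set.
tokens = ["HTTP/1.1", "Server", "HTTP/1.1", "Password", "Server", "telnet"]
stoplist = set()

# First pass: skim the most frequent tokens off the top...
layer1 = excavate(tokens, stoplist, 2)
stoplist.update(layer1)

# ...then look at what's 'underneath'.
layer2 = excavate(tokens, stoplist, 2)
print(layer1, layer2)
```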

‘Password’ frequency in top 0.02% of banner words

Just glancing at the top 10, not much is surprising — a lot of web header stuff. Looking at the top 100 most frequently occurring banner tokens, we see more web stuff, NetBIOS revealing itself, some days of the week and months, and others. We also see our first example of third-party web interface software with Virata-EmWeb. (Third-party web interface software is interesting because a vulnerability here can cross into multiple different types of devices and systems.) Slicing off another layer and going 100 deeper, we find the token 'Password' at approximately the 250th most frequently occurring point. Since I'm going through roughly 1.5 million tokens, that puts 'Password' in the top 0.02% or so of all tokens. That's sort of interesting.

But as I dig deeper, say the top 1500 or so, I start to see Lantronix, a networked device controller, showing up. I see another third party web interface, GoAhead-Webs. Blackboard often indicates Point-of-Sale devices such as card swipers on vending machines. So even looking at only the top 0.1% of the tokens, some interesting things are showing up.

LantronixGoAheadBB

Digging deeper — Even in the top 0.1% of tokens, interesting things start to show up

New devices & systems showing up

But what about the newer, less frequently occurring banner words (tokens) showing up in the list? Excavating like this can clearly get tedious, so what's another approach for discovering interesting, diagnostic, maybe slightly alarming words in banners on our networks? In a subsequent post, I'll explain my next approach, which I've named 'cerealboxing,' based on an observation and concept of Steve Ocepek's regarding our human tendency to automatically read, analyze, and/or ingest information in our environment, even if passively.

Poor Man’s Risk Visualization II

Categorizing and clumping (aggregating) simple exposure data from the Shodan database can help communicate risks that might otherwise be missed.  Even with the loss of some accuracy (or maybe because of that loss), grouping data into larger buckets can help communicate risk/exposure. For example, a couple of posts ago in Poor Man’s Industrial Control System Visualization, Shodan data was used to do a quick visual analysis of which ports and services are open on publicly available IP addresses for different organizations. Wordle was used to generate word clouds and show relative frequency of occurrence, where 'words' were actually port/service numbers.

Trading-off some accuracy for comprehension

This works well for you or for colleagues who are fairly familiar with port numbers, the services they represent, and what their relative frequencies might imply. However, we're often trying to communicate these ideas to business people and/or senior management. Raw port numbers aren't going to mean much to them. One way to address this is to pre-categorize the port numbers/services so that some of them clump together.
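A minimal sketch of this clumping, with a hypothetical port-to-category mapping (your categories will differ):

```python
from collections import Counter

# Hypothetical mapping; pick buckets that best communicate your situation.
CATEGORIES = {
    80: "web", 443: "web", 8080: "web",
    21: "file transfer",
    22: "remote admin", 23: "remote admin",
    3389: "remote admin", 5900: "remote admin",
}

# Ports seen on an organization's public IP space (hypothetical sample).
ports_seen = [80, 80, 443, 23, 21, 3389, 80, 5900]

# Clump raw port numbers into descriptive categories for the word cloud.
category_counts = Counter(CATEGORIES.get(p, "other") for p in ports_seen)
print(category_counts.most_common())
```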

Yes, there is a loss of some accuracy with this approach — whenever we generalize or categorize, there is a loss of information.  However, when domain-specific information is difficult or impossible to communicate to someone who does not work in that domain (with some interesting parallels to the notion of channel capacity), it's worth the accuracy loss so that something useful gets communicated. Similar to the earlier post of port/service numbers only, one organization has this 'port number cloud':

org1portnum

A fair amount of helpful quick-glance detail consumable by the IT or security professional, but not much help to the non-IT professional

Again, this might have some utility to an IT or security professional, but not much to anyone else. However, by aggregating some of the ports returned into categories and using descriptive words instead, something more understandable by business colleagues and/or management can be rendered:

org1word

For communicating risk/exposure, this is a little more readable & understandable to a broader audience, especially business colleagues & senior management

How you categorize is up to you. I'll list my criteria below for these examples. It's important not to get too caught up in the nuance of the categorization. There are a million ways to categorize, and many ports/services serve a combination of functions. You get to make the cut on these categories to best illustrate the message you are trying to get across. As long as you can show how you went about it, you're okay.

portcat

One way to categorize ports — choose a method that best helps you communicate your situation

The port number and ‘categorized’ clouds for a smaller organization with less variety are below.

org2portnum

A port number ‘cloud’ for a different (and smaller) organization with less variety in port/service types

org2word

The same port/service categorization as used above, but for the smaller organization, yields a very different looking word cloud

One challenge with the clearer approach is that your business colleagues or senior management might leap to a conclusion that you don't want them to. Be prepared with the course of action that you have in mind. You might need to explain, for example, that though there are many web servers in your organization, your bigger concern is the exposure of telnet and ftp access, default passwords, or all of the above.

This descriptive language categorization approach can be a useful way to demonstrate port/service exposure in your organization, but it does not obviate the need for a mitigation plan.

Borrowing from search to characterize network risk

Most frequently occurring port is in outer ring, 2nd most is next ring in, …

Borrowing some ideas from document search techniques, data from the Shodan database can be used to characterize networks at a glance. In the last post, I used Shodan data for public IP spaces associated with different organizations and Wordle to create a quick and dirty word cloud visualization of exposure by port/service for that organization.

The word cloud idea works pretty well at communicating, at a glance, the top two or three ports/services most frequently seen for a given area of study (IP space).  I wanted to extend this a bit and compare organizations by a linear rank of the most frequently occurring services seen on each organization's network — that is, to capture both the most frequently occurring ports/services and the rank among them, and then use those criteria to compare different organizations (IP spaces).

Vector space model

I also wanted to experiment with visualizing this in a way that would give, at a glance, something of a 'signature.'  Sooooo, here's the idea: document search often uses a vector space model where documents are broken down into vectors.  Each vector is a list representing the words that occur in that document.  The weight given to each word (or term or element) in the vector can be computed in a number of different ways, but one of the most popular is the frequency with which that word occurs in that document (and sometimes with which it occurs in all of the documents combined).
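Applied to ports instead of words, the term-frequency vector idea might be sketched like this (the organizations and port observations are hypothetical):

```python
from collections import Counter

# Hypothetical open-port observations per organization.
org_ports = {
    "org_a": [80, 80, 443, 22, 80, 21],
    "org_b": [23, 23, 80, 3389, 23],
}

# The shared 'vocabulary' is the union of ports across all organizations,
# just as document vectors are built over a corpus vocabulary.
vocab = sorted({p for ports in org_ports.values() for p in ports})

def port_vector(ports):
    """Weight each element by the frequency of that port, analogous to
    term frequency in a document vector."""
    counts = Counter(ports)
    return [counts.get(p, 0) for p in vocab]

vectors = {org: port_vector(ports) for org, ports in org_ports.items()}
print(vocab)
print(vectors)
```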

A similar idea was used here, except that I used the frequency with which ports/services appeared in an organization instead of words in a document. I looked at the top 5 ports/services that appeared.  I also experimented with the top 10 ports/services, but that got a little busy on the graphic, and it seemed that as I moved further down the ordered port list — 8th most frequent, 9th most frequent, etc. — the additional ports added less and less to the characterization of the network. Could be wrong, but it just seemed that way at the time.

I went through 12 organizations and collected the top 5 ports/services in each. Organizations varied between approximately 10,000 and 50,000 IP addresses. To have a common basis for comparing organizations, I built the comparison list from the union of all of the organizations' top-5 ports.

Visualizing port rank ‘signatures’

A polar plot was created where each radial represents each port/service.  The rings of the plot represent the rank of that port — most frequently occurring, 2nd most frequently occurring, …, 5th most frequently occurring. I used a polar plot because I wanted something that might generate easily recognizable shapes or patterns. Another plot could have been used, but this one grabbed my eye the most.
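A sketch of such a plot, assuming hypothetical top-5 ranks (rank 1 = most frequently occurring); matplotlib is one way to render it, though the angles and radii themselves are just simple arithmetic:

```python
import math

# Hypothetical top-5 port ranks for one organization.
port_rank = {80: 1, 443: 2, 22: 3, 21: 4, 23: 5}

# One radial per port; the ring (radius) is that port's rank.
ports = sorted(port_rank)
theta = [2 * math.pi * i / len(ports) for i in range(len(ports))]
radius = [port_rank[p] for p in ports]

try:
    import matplotlib
    matplotlib.use("Agg")  # render off-screen
    import matplotlib.pyplot as plt

    ax = plt.subplot(projection="polar")
    ax.plot(theta + theta[:1], radius + radius[:1])  # close the shape
    ax.set_xticks(theta)
    ax.set_xticklabels([str(p) for p in ports])
    plt.savefig("port_signature.png")
except ImportError:
    pass  # without matplotlib, the angles/radii above are still computed
```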

Finally, to really get geeky, I measured similarity by computing the Euclidean distance between each possible vector pair. The two closest organizations of the 12 analyzed (that is, the most similar port vectors) are:

mostsimilar

2 of the most similar organizations by Euclidean distance — ports 21, 23, & 443 show up with the same rank & port 80 shows up with a rank difference of only 1. This makes them close.  (Euclidean distance of ~2.5)

Two of the furthest away of the 12 studied (the least similar port vectors) are these:

leastsimilar

While port 80 aligns between the two (has the same rank) and port 22 is close in rank between the two, there is no alignment between ports 23, 3389, or 5900. This non-alignment, non-similar port rank, creates more distance between the two. (Euclidean distance of ~9.8)

Finally, this last one is somewhere in the middle (median) of the pack:

midsimilar

A distance chosen from the middle of the sorted distances (the median). Euclidean distance is ~8.7. Because this median value is much closer to the most dissimilar pair, it seems to indicate a high degree of dissimilarity across the set studied (I think).

Overall, I liked the plots. I also liked the polar approach. I was hoping that I would see a little more of a 'shape feel,' but I only studied 12 organizations.  I'd like to add more organizations to the study and see if additional patterns emerge. I also tried other distance measures (Hamming, cosine, Jaccard, Chebyshev, cityblock, etc.) because they were readily available and easy to use with the scipy library that I was using, but none offered a noticeable uptick in utility over the plain Euclidean measure.
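For the distance computation itself, a minimal sketch over two hypothetical rank vectors (a large filler rank stands in for ports outside an organization's top 5):

```python
import math

# Hypothetical rank vectors over a shared port list (rank 1 = most frequent;
# 6 is a filler rank for ports not in that organization's top 5).
org_a = [1, 2, 3, 4, 5, 6]
org_b = [1, 2, 6, 6, 6, 3]

# Plain Euclidean distance between the two port-rank vectors.
euclidean = math.dist(org_a, org_b)
print(round(euclidean, 2))

# scipy offers alternative measures, for example:
try:
    from scipy.spatial.distance import chebyshev, cityblock
    print(cityblock(org_a, org_b), chebyshev(org_a, org_b))
except ImportError:
    pass  # scipy not installed
```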

Cool questions from this to pursue might be:

1. For similar patterns between 2 or more organizations, can history of network development be inferred? Was a key person at both organizations at some point? Did one org copy another org?

2. Could the ranked port exposure lend itself to approximating risk for a combined/multipronged cyber attack?

Again, if you’re doing similar work on network/IP space characterization and want to share, please contact me at ChuckBenson at this website’s domain for email.