Tag Archives: internet of things

Testimony before US-China Economic & Security Review Commission re IoT & 5G

Earlier this month, at the invitation of the US-China Economic and Security Review Commission, I submitted written testimony and subsequently testified at the China, the United States, and Next Generation Connectivity hearing regarding IoT Systems risk mitigation for institutions and cities as well as considerations regarding 5G deployments on the same.

A copy of the written testimony is here. A transcript of the oral testimony will be available in the next weeks.

The testimony discussed potential benefits of IoT Systems for US government, cities, universities, other institutions, and companies. It also discussed risks to those same entities from IoT Systems implementations. The risks discussed include:

  • Supply chain risks
  • Poor selection, procurement, implementation, and management of IoT Systems
  • Lack of institutional governance and lack of awareness of social-technical issues in IoT Systems deployments

Prior to the testimony, I was asked how the US government could help. I suggested these four areas in the testimony:

  • Standardized provenance vetting and reporting for IoT device components
  • Support for increased US labor force training in Operational Technology (OT) skill sets
  • Support for development of institutional and city IoT governance frameworks
  • Support for data ethnography and socio-technical research and application in context of IoT Systems

The testimony also included comments on supply chain risks:

Provenance of multiple components across thousands, millions (or more) devices is challenging

As well as aspects previously discussed in this blog such as the ability of an institution or city to manage their IoT systems:

where the wild things are [reference to Sendak]

Full written testimony is here:

https://www.uscc.gov/sites/default/files/Chuck%20Benson_Written%20Testimony.pdf

Looking for a quick definition of IoT?

Defining IoT (image Wikipedia)

Defining IoT
(image Wikipedia)

 

Tired of looking for the right words when trying to impress your boss, friends, or potential future spouse with a description of the Internet of Things, aka IoT ? Well look no further !

The 10 word version

Here’s a 10 word version. An IoT device is one that:

1. Computes
2. Is networked
3. Interacts with the environment in some way

The 20 word version

And once you’ve impressed them with this knowledge that just rolled off your tongue, feel free take it further with the 20 word version!

1. Computes
2. Is networked
3. Interacts with the environment with the intention of collecting sensory data and/or manipulating the local environment

For example:

  • A FitBit device computes, is networked, & interacts with the environment (ie you)
  • An industrial  SmartGrid meter computes, is networked, & interacts with the environment (collects power data)
  • A residential Nest meter computes, is networked, and interacts with the environment (collects temperature data)
  • Chicago’s Array of Things devices compute, are networked, and interact with the environment (collect many environmental data points)
  • Blood glucose monitors compute, are networked, and interact with the environment (ie you)
  • and much much more !!

And then, while impressing those around you, you can bring it on home with the definition of an IoT System. An IoT system:

1. Is a set of IoT devices that
2. Communicate with each other and/or communicate with
3. A central server that aggregates data and/or provides control data

Congratulations on your assured future personal, social, and professional successes now that a handy definition of IoT is at your disposal!

Minuteness and power — learning to perceive IoT

physicsrockstars3.003

for those about to do physics, we salute you

Perceiving, working with, and managing risk around the Internet of Things (IoT) requires a new way of thinking. The vast and growing numbers of sensing-networked-computing devices, the great variability of types of those devices, the ambient-almost-invisible nature of many of the devices, and the emergent and unpredictable behavior from networked effects puts us into a whole new world. I don’t believe that, at present, we have the mindset or tools to even broadly perceive this cultural and technological change, much less manage risk around it.  In many ways, we don’t know where to start. The good news, though, is that there is historical precedent with changing how we perceive the world around us.

Physics rock stars of the 2nd millennium

While a lesser known name, one of the top 3 rock star physicists of the past half millennium or so is James Clerk Maxwell, sharing the stage with Isaac Newton and Albert Einstein. While known primarily for his contributions to electromagnetic theory, he made contributions in several other areas as well to include governorscolor photography, the kinetic theory of gases.  It was this latter work on the distribution of particle velocities in a gas that helped evolve the notion of entropy and led to bold new ways of perceiving the world around us. And he did all of this in his short 48 years of life.

In his book, Maxwell’s Demon: Why Warmth Disperses and Time Passes, author and physics professor Van Baeyer describes Maxwell as particularly adept at marrying theory with what was observable in the lab or elsewhere in the natural world. As Maxwell worked to come to terms with the work that Joule, Rumford, and Clausius had done toward the development of the 2nd law of thermodynamics, he realized that he was entering a world that he could no longer directly observe. This was a world of unseeable atoms and molecules that could not be counted and whose individual behavior could not be known. As Von Baeyer describes,

“What troubled him was the fact that molecules were too small to be seen, too numerous to be accounted for individually, and too quick to be captured … he no longer had the control that he had become accustomed to …”

In short, he had to abandon his expectations of certainty and find a new way to understand this realm.

To me, this has some similarities to what we’re dealing with with IoT:

  • molecules were many IoT devices are too small to be seen
  • [molecules were] IoT devices are too numerous to be accounted for individually
  • [molecules were] too quick to be captured individual IoT devices change state too often to track

The increasingly frequent lack of visibility of IoT devices, the large numbers of devices, the potentially nondeterministic behavior of individual devices, and network effects create the high probability of emergent and completely unpredictable effects and outcomes.

“… building up in secret the forms of visible things …”

In Maxwell’s own words, he says,

“I have been carried … into the sanctuary of minuteness and of power, where molecules obey the laws of their existence, clash together in fierce collision, or grapple in yet more fierce embrace, building up in secret the forms of visible things.”

Of this, in particular, I believe that the phrase,

“… molecules … clash together in fierce collision or grapple in yet more fierce embrace … building up in secret the forms of visible things …”

speaks to emergent, unpredictable, and possibly even unknowable things.  And I’m not at all saying this is all bad on its own but rather that we’ll have to look at it, perceive it, and attempt to manage it differently than we have in the past.

Minuteness and power

In none of this am I trying to suggest that the Internet of Things is exactly like gas molecules in a balloon or that all of the properties and formulas of molecules in a gas (or wherever) apply to all systems of IoT devices. However, the change in thinking and perception of the world around us that is required might be similar. The numbers and variability of IoT devices are huge and rapidly growing, the behavior of any particular device is uncertain and possibly not knowable, and network effects will contribute to that unpredictability of systems as a whole. I’m hoping that in the onrush of excitement about IoT, as well as the unseen subtlety and nuance of IoT, that we’ll acknowledge and respect the effects of that minuteness and power and work to adjust our perceptions and approaches to managing risk.

Who’s building your IoT data pipeline?

data pipelines require labor too

IoT data pipelines require labor too

There is a lot of excitement around IoT systems for companies, institutions, and governments that sense, aggregate, and publish useful, previously unknown information via dashboards, visualizations, and reporting.  In particular, there has been much focus on the IoT endpoints that sense energy and environmental quantities and qualities in a multitude of ways. Similarly, everybody and their brother has a dashboard to sell you. And while visualization tools are not yet as numerous, they too are growing in number and availability. What is easy to miss, though, in IoT sensor and reporting deployments is that pipeline of data collection, aggregation, processing, and reporting/visualization. How does the data get from all of those sensors and processes along the way so that it can be reported or visualized? This part can be more difficult than it initially appears.

Getting from the many Point A’s to Point B

While ‘big data’, analysis, and visualization have been around for a few years and continue to evolve, the new possibilities brought on by IoT sensing devices have been most recent news.  Getting an individual’s IoT data from the point of sensing to a dashboard or similar reporting tool is generally not an issue. This is because data is coming from only one (or a few) sensing points and will (in theory) only be used by that individual. However, for companies, institutions, and governments that seek to leverage IoT sensing and aggregating systems to bring about increased operating effectiveness and ultimately cost savings, this is not a trivial task. Reliably and continuously collecting data from hundreds, thousands, or more points, is not a slam dunk.

Companies and governments typically have pre-existing network and computing infrastructure systems. For these new IoT systems to work, the many sensing devices (IoT endpoints) need to be installed by competent and trusted professionals. Further, that data needs to be collected and aggregated by a system or device that knows how to talk to the endpoints. After that (possibly before), the data needs to be processed to handle sensing anomalies across the array of sensors and from there, the creation of operational/system health data and indicators is highly desirable so that the new IoT system can be monitored and maintained. Finally, data analysis and massaging is completed so that the data can be published in a dashboard, report, or visualization. The question is, who does this work? Who connects these dots and maintains that connection?

who supplies the labor to build the pipeline?

who supplies the labor to build the pipeline?

The supplier of the IoT endpoint devices won’t be the ones to build this data pipeline. Similarly, the provider for the visualization/reporting technology won’t build the pipeline. It probably will default to the company or government that is purchasing the new IoT technology to build and maintain that data pipeline. In turn, to meet the additional demand on effort, this means that the labor will need to be contracted out or diverted from internal resources, both of which incur costs, whether direct cost or in opportunity cost.

Patching IoT endpoint devices – Surprise! it probably won’t get done

Additional effects of implementing large numbers of IoT devices and maintaining the health of the same include:

  1. the requirement to patch the devices, or
  2. accept the risk of unpatched devices, or
  3. some hybrid of the two.

Unless the IoT endpoint vendor supplies the ability to automatically patch endpoint devices in a way that works on your network, those devices probably won’t get patched.  From a risk management point of view, probably the best approach is to identify the highest risk endpoint devices and try to keep those patched. But I suspect even that will be difficult to accomplish.

Also, as endpoint devices become increasingly complicated and have richer feature sets to remain competitive, they have increased ability to do more damage to your network and assets or those of others on the Internet. Any one of the above options increase cost to the organization and yet that cost typically goes unseen.

Labor leaks

Anticipating labor requirements along the IoT data pipeline is critical for IoT system success. However, this part is often not seen and leaks away. We tend to get caught up in the IoT devices at the beginning of the data pipeline and the fancy dashboards at the end and forget about the hard work of building a quality pipeline in the middle. In our defense, this is where the bulk of the marketing and sales efforts are — at each end of the pipeline. This effort of building a reliable, secure, and continuous data pipeline is a part of the socket concept that I mentioned in an earlier post, Systems in the Seam – Shortcomings in IoT systems implementation.

With rapidly evolving new sensing technologies and new ways to integrate and represent data, we are in an exciting time and have the potential to be more productive, profitable, safe, and efficient. However, it’s not all magical — the need for labor has not gone away. The requirement to connect the dots, the devices and points along the pipeline, is still there. If we don’t capture this component, our ROI from IoT systems investments will never be what we hoped it would be.

 

Creating initial IoT risk categories

With the onslaught of new IoT systems and devices as well as existing old school IoT-ish systems such as HVAC, we all know that this is risk that needs to be assessed and managed. However, some of it is so new that we don’t really know where to start. We don’t have a broadly understood language to discuss, categorize, and classify new IoT systems. There has to be at least some rough categorization if there is going to be any attempt to manage risk brought to our organizations by evolving and rapidly growing numbers of IoT systems and devices.

Whence it came — using provenance to create IoT risk buckets

One approach to an initial categorization to IoT systems and devices can be to identify where those IoT systems are coming from. How do they show up in your offices, buildings, and corporate/institutional spaces? How did they get there?

Some IoT just 'walks on' to corporate and institutional spaces while other IoT systems are purchased via different mechanisms

Some IoT just ‘walks on’ to corporate and institutional spaces while other IoT systems are purchased via different mechanisms

While trying to categorize IoT systems and devices by function, features, behavior, etc is a logical approach, it can be difficult to do in practice because so many new and varied IoT systems, devices, and applications are constantly showing up on our networks. Categorization and classification with this more traditional method can be a moving target, at least for now. Also, this approach can lead us down a rabbit hole looking for a perfect (and complex) taxonomy that would take a long time to develop, would likely be poorly understood, and probably largely not agreed upon. It reminds me of some older large websites that might have had perfect, library-like taxonomies but where 90% of the pages were never accessed.  The website’s taxonomy might have been awesome, but no one cared.  In fact, that perfect taxonomy might have actually diminished usability.

To begin the process of managing IoT risk now, we need some categorization now so that we have some buckets of risk to work with. This doesn’t mean that efforts to develop other classification schemes should be abandoned, but rather that categorizing IoT risk by source, aka the-way-it-got-here, is an approach that we can work with now.

Size matters

The manner in which businesses or institutions purchase IoT devices and systems typically varies with size of the organization. Smaller organizations will likely have fewer purchasing mechanisms than larger organizations. For example a small company might write a check or use a company credit card to purchase a simple IoT-based security system. Where as a larger organization might have a person or purchasing department that handles many purchases, uses purchase orders and invoicing, and probably has some purchasing policies and criteria and is used to purchase an HVAC system, for example.  And yet an even bigger organization or government might have a central planning office that makes recommendations for new buildings or large scale building or community asset modifications. Larger organizations probably have all of these.

Regardless of how many purchasing options an organization has, using purchasing options to create categorized buckets of risk for IoT devices and systems could be a helpful way to go.

Walk-On-IoT — bring it, wear it, or fly it to work

One thing that all organizations have in common is that they all have “walk-on-IoT”. By the walk-on-IoT category, I mean IoT systems purchased or acquired by an individual on their own and that they then bring to work. Whether it is FitBit devices, drones, robotics, consumer networked video cameras, or others, these are devices that a person can purchase at BestBuy, Target, Amazon, or even their local drugstore and bring directly to their corporate or institutional work place.

Other potential source-based IoT risk categories

Some other potential source-based IoT risk categorizations might be:

  • IoT devices/systems purchased with a company credit card
  • IoT devices/systems purchased via a company’s central purchasing/contracting group
  • IoT devices/systems recommended in the course of planning for major building modifications or new buildings (eg, in the case of large businesses and cities)
Identifying where an IoT system or device came from as a basis for initial IoT risk categorization

Identifying where an IoT system or device came from as a basis for initial IoT risk categorization

With these categories, we can start to ask some high level risk questions within each category. For example, is it even possible to feasibly manage this risk? If this risk is unmitigated, what is the impact? For example, can I really manage FitBit devices that walk on to my network? Probably not easily. More importantly, do I really care if FitBit users use their devices on my networks?  Maybe not.

Conversely, because of privacy issues and other concerns, I might indeed care about how an enterprise-wide biometric building access system is selected, installed, commissioned, and supported over its lifetime. Furthermore, this risk probably is manageable with thoughtful institutional safeguards.

Source-based categories can lend themselves to unique risk mitigation approaches

Finally, sourced-based IoT risk categorization can also provide some natural mitigation approaches. For example, purchases via central purchasing can provide the opportunity to see a purchase request (prior to actual purchase) and then provide guidance on the selection of the IoT system, help identify resources for secure implementation, as well as help develop long term support plans for the IoT system. While less involved, IoT purchases via corporate credit card have records amenable for review so that an organization can get an estimation of number and variety of types of IoT devices and systems arriving in the enterprise. This can help with ongoing mitigation and support services planning.

Source-based categories for IoT system risk analysis and management for the enterprise can be a place to start. It is not the end-all by any stretch. As more IoT systems and devices enter our enterprises, we will learn more about their short and long term effects as well as emergent effects between IoT systems. From this we can continue to evolve categories and approaches, but if we need a place to start now, source-based risk categories are not a bad idea.

Socializing Internet of Things risk

IoTRisk-g

adding risk from IoT doesn’t mean the existing risk to an organization conveniently disappeared …

There is a lot of conversation regarding security, privacy, safety and other issues regarding the ongoing proliferation of the Internet of Things (IoT). While IoT promises many helpful and useful things, concern about how it might (and will) be misused are valid. However, there are more than a couple of challenges to addressing this new source of risk to an organization.

Lions and Tigers and Bears

It’s easy for anyone to call out things that could happen with the IoT growth. Medical devices can be hacked , SmartMeters can be compromised and steal privacy information, the utility grid is widening its attack surface, drone video is intercepted and hacked , and countless others . Long live fear, uncertainty, and doubt, right?  While highlighting examples of IoT issues is important, the larger and more difficult thing for an organization to do is to communicate risk around IoT in a way that allows it to be managed.

Communicating IoT risk in an organization

Within an organization that already manages risk in some form, communicating and socializing the idea of IoT risk can be a challenge. There are at least two broad components to that challenge:

  • IoT defies traditional classification/categorization and is still little understood. It’s hard for people to wrap their heads around it
  • the other risks that the organization faces are still there. They haven’t gone away and IoT risk only adds to that

In order to begin to manage IoT risk, management must have some vocabulary for it. IoT is still new, its effects largely unknown and likely emergent, and precedents and analogies are few. We need to surface some language and concepts for it so that it can be discussed.

Another significant aspect of communicating IoT risk issues is that the other risks that an organization already faces — safety, liability, financial loss, reputation damage, technology challenges, business competition, and many more have not gone away. These risks are still there. We are asking senior management to make room in their list of existing risks that they are wrestling with to add yet more risk.  And possibly substantially more risk. Nobody wants to hear this.

Because of this, how we communicate these security, privacy, and risk issues is important. We are competing for a small slice of available cognitive bandwidth, so we must use this opportunity to communicate as well as we can.

Lather, Rinse, Repeat

If you either want to or are tasked with communicating IoT risk in your organization, I would suggest starting here:

  • find out what other risk the organization is already working with. Is there an annual report? Is there someone in the know in your network?
  • identify places where IoT is already in your organization or where you expect it
  • use the language of managing existing risk in your organization to begin to talk about IoT risk. If you have existing IoT risk examples, describe them in traditional risk language for your organization
  • repeat

A key to this communication is to get some IoT risk concepts out early. Give management some language to use to reflect on IoT risk and to discuss with their peers. It’s also important not to be heavy-handed in the approach. Yes, IoT risk is important, the impacts potentially very high, and the opportunities for abuse many, but the other existing risks that an organization faces haven’t gone away and they still must be managed too.

Attacks on internet of things top security predictions for 2015

iotattacks

Attacks on Internet of Things tops list of Symantec’s 2015 Security Predictions. The post and infographic say that there will be a particular focus on smart home automation. Interestingly, the blog post references what is likely the Shodan database, referring to it as a “search engine that allows people to do an online search for Internet-enabled devices,” but does not mention it by name. While attacks on IoT devices/systems or attacks via IoT devices/systems is certainly not the only risk, it is further evidence that the attack surface provided by the rapid growth of IoT/ICS devices and systems is a burgeoning risk sector.

The report also highlights attacks on mobile devices, continuing ransomware attacks, and DDOS attacks.

Cerealboxing Shodan data

luckycharmsIn 2010, Steve Ocepek did a presentation at  DefCon where he introduced an idea that he called ‘cerealboxing’.  In it, he made a distinction between visibility and visualization. He suggested that visualization uses more of our ability to reason and visibility is more peripheral and taps into our human cognition.  He references Spivey and Dale in their paper Continuous Dynamics in Real-Time Cognition in saying:

“Real-time cognition is best described not as a sequence of logical operations performed on discrete symbols but as a continuously changing pattern of neuronal activity.”

Thinking on the back burner

Steve’s work involved building an Arduino-device that provides an indication of the source country of spawned web sessions while doing normal web browsing.  The idea was that as you do your typical browsing work, the device, via numbers and colors of illuminated LEDs would give an indication of how many web sessions were spawned on any particular page and where those sessions sourced from.  I built the device myself, ran it, and it was enlightening (no pun intended).

Using Steve’s device, while focused on something else — my web browsing, I had an indication out of the corner of my eye that I processed somewhat separately from my core task of browsing.  Without even trying or ‘thinking’, I was aware when a page lit up with many LED’s and many colors (indicating many sessions from many different countries).  I also became aware when I was seeing many web pages, regardless of my activity, that came from Brazil, for example.

Cerealbox

Steve named this secondary activity ‘cerealboxing’ as when you mindlessly read a cereal box at breakfast.  From one of his presentation slides:

  • Name came from our tendency to read/interpret anything in front of us
  • Kind of a “background” technology, something that we see peripherally
  • Pattern detection lets us see variances without digging too deep
  • Just enough info to let us know when it’s time to dig deeper

Back to excavating Shodan data

As I mentioned in my last post, Shodan data offers a great way to characterize some of the risk on your networks.  The challenge is that there is a lot of data.

One of the things that I want to know is what kinds of devices are showing up on my networks? What are some indicators? What words from ‘banner grabs’ indicate web cams, Industrial Control Systems, research systems, environmental control systems, biometrics systems, and others on my networks?  I started with millions of tokens.  How could I possibly find out interesting or relevant ‘tokens’ or key words in all of these?

To approach this, I borrowed the cerealboxing idea and wrote a script that continuously displays this data on a window (or two) on my computer. And then just let it run while I’m doing other things. It may sound odd, but I found myself occasionally glancing over and catching an interesting word or token that I probably would not have seen otherwise.

cerealboxunordered

unordered tokens

So, in a nutshell, I approached it this way:

  • tokenize all of the banners in the study
  • I studied banners from my organization as well as peer organizations
  • do some token reduction with stoplists & regular expressions, eg 1 & 2 character tokens, known printers, frequent network banner tokens like ‘HTTP’, days of the week, months, info on SSH variants, control characters that made the output look weird, etc
  • scroll a running list of these in the background or on a separate machine/screen

I also experimented with sorting by length of the tokens to see if that was more readable:

ordered5char

sorted by order — this section showing tokens (words) of 5 characters in length

In the course of doing this, I update a list of related tokens.  For example, some tokens related to networked cameras:

partiallist_networkcamera

And some related to audio and videoconferencing:

partiallist_telecom_videoconf

This evolving list of tokens will help me identify related device and system types on my networks as I periodically update the sample.

This is a fair amount of work to get this data, but once the process is identified and scripts written, it’s not so bad. Besides, with over 50 billion networked computing devices online in the next five years, what are you gonna do?

Excavating Shodan Data

excavator

A shovel at a time

The Shodan data source can be a good way to begin to profile your organization’s exposure created by Industrial Control Systems (ICS) and Internet of Things (IoT) devices and systems. Public IP addresses have already been scanned for responses to known ports and services and those responses have been stored in a searchable web accessible database — no muss, no fuss. The challenge is that there is A LOT of data to go through and determining what’s useful and what’s not useful is nontrivial.

Data returned from Shodan queries are results from ‘banner grabs’ from systems and devices. ‘Banner grabs’ are responses from devices and systems that are usually in place to assist with installing and managing the device/system. Fortunately or unfortunately, these banners can contain a lot of information. These banners can be helpful for tech support, users, and operators for managing devices and systems. However, that same banner data that devices and systems reveal about themselves to good guys is also revealed to bad guys.

What are we looking for?

So what data are we looking for? What would be helpful in determining some of my exposure? There are some obvious things that I might want to know about my organization. For example, are there web cams reporting themselves on my organization’s public address space? Are there rogue routers with known vulnerabilities installed? Industrial control or ‘SCADA’ systems advertising themselves? Systems advertising file, data, or control access?

The Shodan site itself provides easy starting points for these by listing and ranking popular search terms in it’s Explore page. (Again, this data is available to both good guys and bad guys). However, there are so many new products and systems and associated protocols for Industrial Control Systems and Internet of Things that we don’t know what they all are. In fact, they are so numerous and growing that we can’t know what they all are.

So how do we know what to look for in the Shodan data about our own spaces?

Excavation

My initial approach to this problem is to do what I call excavating Shodan data. I aggregate as much of the Shodan data as I can about my organization’s public address space. Importantly, I also research the data of peer organizations and include that in the aggregate as well. The reason for this is that there probably are some devices and systems that show up in peer organizations that will eventually also show up in mine.

Next, using some techniques from online document search, I tokenize all of the banners. That is, I chop up all of the words or strings into single words or ‘tokens.’ This results in hundreds of thousands of tokens for my current data set (roughly 1.5 million tokens). The next step is to compute the frequency of each, then sort in descending order, and finally display some number of those discovered words/tokens. For example, I might say show me the 10 most frequently occurring tokens in my data set:

devices1st10

Top 10 most frequently occurring words/tokens — no big surprises — lots of web stuff

I’ll eyeball those and then write those to a stoplist so that they don’t occur in the next run. Then I’ll look at the next 10 most frequently occurring. After doing that a few times, I’ll dig deeper, taking bigger chunks, and ask for the 100 most frequently occurring. And then maybe the next 1000 most frequently occurring.

This is the excavation part, gradually skimming the most frequently occurring off the top to see what’s ‘underneath’. Some of the results are surprising.

‘Password’ frequency in top 0.02% of banner words

Just glancing at the top 10, not much is surprising — a lot of web header stuff. Taking a look at the top 100 most frequently occurring banner tokens, we see more web stuff, NetBIOS revealing itself, some days of the week and months, and other. We also see our first example of third party web interface software with Virata-EmWeb. (Third party web interface software is interesting because a vulnerability here can cross into multiple different types of devices and systems.) Slicing off another layer and going deeper by 100, we find the token ‘Password’ at approximately the 250th most frequently occurring point. Since I’m going through 1.5 million words (tokens), that means that ‘Password’ frequency is in the top 0.02% or so of all tokens. That’s sort of interesting.

But as I dig deeper, say the top 1500 or so, I start to see Lantronix, a networked device controller, showing up. I see another third party web interface, GoAhead-Webs. Blackboard often indicates Point-of-Sale devices such as card swipers on vending machines. So even looking at only the top 0.1% of the tokens, some interesting things are showing up.

LantronixGoAheadBB

Digging deeper — Even in the top 0.1% of tokens, interesting things start to show up

New devices & systems showing up

But what about the newer, less frequently occurring, banner words (tokens) showing up in the list? Excavating like this can clearly get tedious, so what’s another approach for discovery of interesting, diagnostic, maybe slightly alarming words in banners on our networks? In a subsequent post, I’ll explain my next approach that I’ve named ‘cerealboxing’, based on an observation and concept of Steve Ocepek’s regarding our human tendency to automatically read, analyze, and/or ingest information in our environment, even if passively.

Poor Man’s Industrial Control System Risk Visualization

The market is exploding with a variety of visualization tools to assist with ‘big data’ analysis in general and security and risk awareness analysis efforts in particular. Who the winner is or winners are in this arena is far from settled and it can be difficult to figure out where to start. While we analyze these different products and services and try some of our own approaches, it is good to keep in mind that there can also be some simple initial value-add in working with quick and easy, nontraditional (at least in this context), visualization

Even simple data visualization can be helpful

I’ve been working with some Shodan data for the past year or so. Shodan, created by John Matherly, is a service that scans several ports/services related to Industrial Control Systems (ICS) and, increasingly, Internet of Things sorts of devices and systems. The service records the results of these scans and puts them in a web accessible database. The results are available online or via a variety of export formats to include csv, json, and xml (though xml is deprecated). In his new site format, Matherly also makes some visualizations of his own available. For example, here’s one depicting ranked services for a particular subset of IP ranges that I was analyzing:

Builtin Shodan visualization -- Top operating systems in scan

One of the builtin Shodan visualizations — Top operating systems

Initially, I wanted to do some work with the text in the banners that Shodan returns, but I found that there was some even simpler stuff that I could do with port counts (number of times a particular port shows up in a subset of IP addresses) to start. For example, I downloaded the results from a Shodan scan, counted the occurrences for each port, ran a quick script to create a file of repeated ‘words’ (actually port numbers), and then dropped that into a text box on Wordle.

Inexpensive (free) data visualization tools

Wordle is probably the most popular web-based way of creating a word cloud. You just paste your text in here (repeated ports in our case):

Just cut & paste ports

Just cut & paste ports into Wordle

Click create and you’ve got a word cloud based on the number of ports/services in your IP range of interest. Sure you could look at this in a tabular report, but to me, there’s something about this that facilitates increased reflection regarding the exposure of the IP space that I am interested in analyzing.

 

org3portwordle

VNC much? Who says telnet is out of style ?

[For some technical trivia, I did this by downloading the Shodan results into a json file, used python to import, parse, and upload to a MySQL database, and then ran queries from there. Also, Wordle uses Java so it didn’t play well with Chrome and I switched to Safari for Wordle.]

In addition to quickly eyeball-analyzing an IP space of interest, it can also make for interesting comparisons between related IP spaces. Below are two word clouds for organizations that have very similar missions and staff make up. You would, I did anyway, expect their relative ports counts and word clouds to be fairly similar. As the results below show, however, they may be very different.

org1portwordle

Organization 1’s most frequently found ports/services

org2portwordle

Organization 2’s most frequent ports/services — same mission and similar staffing as Org 1, but network (IP space) has some significant differences

Next steps are to explore a couple of other visualization ideas of using port counts to characterize IP spaces and then back to the banner text analysis. Hopefully, I’ll have a post on that up soon.

If you’re doing related work, I would be interested in hearing about what you’re exploring.