The market is exploding with visualization tools to assist with ‘big data’ analysis in general and with security and risk awareness analysis in particular. Who the winner or winners in this arena will be is far from settled, and it can be difficult to figure out where to start. While we analyze these different products and services and try some of our own approaches, it is good to keep in mind that there can also be some simple initial value-add in working with quick, easy, and nontraditional (at least in this context) visualization.
Even simple data visualization can be helpful
I’ve been working with some Shodan data for the past year or so. Shodan, created by John Matherly, is a service that scans a number of ports and services associated with Industrial Control Systems (ICS) and, increasingly, with Internet of Things (IoT) devices and systems. The service records the results of these scans and stores them in a web-accessible database. The results are available online or through several export formats, including CSV, JSON, and XML (though XML is deprecated). With the site's new format, Matherly also makes some of his own visualizations available. For example, here’s one depicting ranked services for a particular subset of IP ranges that I was analyzing:
Initially, I wanted to work with the text in the banners that Shodan returns, but I found that I could start with something even simpler: port counts (the number of times a particular port shows up in a subset of IP addresses). For example, I downloaded the results from a Shodan scan, counted the occurrences of each port, ran a quick script to create a file of repeated ‘words’ (actually port numbers), and then pasted that into a text box on Wordle.
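The counting and ‘repeated words’ steps can be sketched in a few lines of Python. This is a minimal illustration rather than the original script: it assumes the port numbers have already been pulled out of the Shodan results into a plain list (one entry per IP/port observation), and the helper names `port_counts` and `repeated_words` are hypothetical.

```python
from collections import Counter


def port_counts(ports):
    """Count how many times each port number appears in the results."""
    return Counter(ports)


def repeated_words(counts):
    """Build word-cloud input: each port number repeated once per
    occurrence, space-separated, most frequent ports first."""
    words = []
    for port, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        words.extend([str(port)] * n)
    return " ".join(words)


if __name__ == "__main__":
    # Hypothetical sample: one entry per (IP, port) observation.
    observed_ports = [80, 443, 80, 502, 22, 80, 443]
    print(repeated_words(port_counts(observed_ports)))
```

Redirecting the output to a file (for example `python ports_to_words.py > ports.txt`, with a hypothetical script name) gives text that can be pasted straight into the word-cloud tool. Sorting the most frequent ports first is only for readability; since word clouds size words by frequency, the order of the pasted text shouldn't affect the result.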
Inexpensive (free) data visualization tools
Wordle is probably the most popular web-based tool for creating a word cloud. You just paste your text (repeated port numbers, in our case) into the text box:
Click Create and you’ve got a word cloud based on the number of ports/services in your IP range of interest. Sure, you could look at this in a tabular report, but to me there’s something about this view that encourages more reflection on the exposure of the IP space I am analyzing.
[For some technical trivia: I did this by downloading the Shodan results into a JSON file, using Python to import and parse them and load them into a MySQL database, and then running queries from there. Also, Wordle uses Java, so it didn’t play well with Chrome, and I switched to Safari for Wordle.]
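A rough sketch of the parse-and-load step follows. It swaps in Python's built-in `sqlite3` module for MySQL so it runs without a database server, and it assumes the export is newline-delimited JSON with one banner per line carrying `ip_str` and `port` keys (worth checking against your own export). The table name `banners`, the function names, and the sample records are made up for illustration.

```python
import json
import sqlite3


def load_banners(lines, conn):
    """Parse newline-delimited JSON banner records and insert (ip, port) rows.

    Assumes each record has 'ip_str' and 'port' keys; blank lines are skipped.
    Returns the number of rows inserted.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS banners (ip TEXT, port INTEGER)")
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        rows.append((record["ip_str"], int(record["port"])))
    conn.executemany("INSERT INTO banners (ip, port) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def query_port_counts(conn):
    """Return (port, count) pairs, most common first.

    COUNT(DISTINCT ip) counts each IP at most once per port, so duplicate
    banners for the same IP/port pair don't inflate the totals.
    """
    return conn.execute(
        "SELECT port, COUNT(DISTINCT ip) AS n FROM banners "
        "GROUP BY port ORDER BY n DESC, port"
    ).fetchall()


if __name__ == "__main__":
    # Hypothetical records using documentation-range addresses.
    sample = [
        '{"ip_str": "192.0.2.10", "port": 80}',
        '{"ip_str": "192.0.2.10", "port": 443}',
        '{"ip_str": "192.0.2.11", "port": 80}',
        '{"ip_str": "192.0.2.12", "port": 502}',
    ]
    conn = sqlite3.connect(":memory:")
    load_banners(sample, conn)
    print(query_port_counts(conn))
```

With a real export you would pass an open file instead of the sample list, e.g. `load_banners(open("results.json"), conn)`; if the download is gzip-compressed, `gzip.open(path, "rt")` yields text lines the same way. A MySQL driver would use its own placeholder style (often `%s`) instead of `?`.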
In addition to quick eyeball analysis of an IP space of interest, this approach also makes for interesting comparisons between related IP spaces. Below are two word clouds for organizations with very similar missions and staff makeup. You might expect (I did, anyway) their relative port counts and word clouds to be fairly similar. As the results below show, however, they can be very different.
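Eyeballing two word clouds works, but the same comparison can also be put into a number. The sketch below is one possible approach, not the method behind the clouds here: it treats each organization's port counts as a vector and computes cosine similarity (close to 1.0 when the port mixes are proportionally alike), then lists the ports whose share of each organization's total differs most. The function names and sample counts are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two port-count Counters: 1.0 means identical
    proportions, 0.0 means the two IP spaces share no ports."""
    ports = set(a) | set(b)
    dot = sum(a[p] * b[p] for p in ports)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty IP space has nothing to compare
    return dot / (norm_a * norm_b)


def biggest_differences(a, b, top=5):
    """Return (port, share_a - share_b) pairs for the ports whose share of
    each organization's total differs most, largest absolute gap first."""
    total_a = sum(a.values()) or 1
    total_b = sum(b.values()) or 1
    diffs = {p: a[p] / total_a - b[p] / total_b for p in set(a) | set(b)}
    return sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top]


if __name__ == "__main__":
    # Made-up counts for two hypothetical organizations.
    org_a = Counter({80: 120, 443: 100, 22: 10})
    org_b = Counter({80: 15, 443: 20, 23: 60, 502: 5})
    print(f"cosine similarity: {cosine_similarity(org_a, org_b):.2f}")
    for port, gap in biggest_differences(org_a, org_b):
        print(f"port {port}: {gap:+.1%}")
```

Comparing shares rather than raw counts keeps a large IP space from dominating a small one, which matters when the two organizations expose very different numbers of hosts.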
My next steps are to explore a couple of other ideas for using port counts to characterize IP spaces visually, and then to get back to the banner text analysis. Hopefully, I’ll have a post on that up soon.
If you’re doing related work, I would be interested in hearing about what you’re exploring.