December 2015 - Long Tail Risk

IoT & the Rule of 72

There are many different estimates of the growth rate of the Internet of Things (IoT): projections of the number of connected devices, of market capitalization, of growth in the semiconductor counts supporting those devices, and many others. Because the numbers of devices and systems are so large, and because the projections concern things we typically don’t understand well, it’s hard to get a feel for what is actually increasing so rapidly. What is this thing that is growing? How fast is it growing? If we can’t roughly understand the magnitudes involved, we can’t discuss, plan, assess, or begin to mitigate risk to our organizations and institutions involving these systems.

Going old school

Summa de arithmetica – Wikipedia http://bit.ly/1MHOuxO

One way to improve our ballpark understanding of this rate of growth is the old-school Rule of 72. Introduced by Pacioli in Summa de Arithmetica, the Rule of 72 has been around for over half a millennium as a mental shortcut for estimating how long it takes a value experiencing exponential growth to double. It works for quantities whose change is described as a percentage per period of time. The classic example is compound interest on a loan or investment. Because we are used to seeing these kinds of measures in financial, economic, and political systems, we will see them in IoT conversations as well.

To apply the Rule of 72, you take the rate of growth for a period expressed as a percentage and then divide that into the number 72. The result is the number of time periods, typically expressed in years, that it takes for the doubling to occur.

For example, if you buy a house that increases in value by 6% per year, the time to double the value is:

72 / 6 = 12

or 12 years to double. So a $400,000 house purchased today that appreciates by 6% per year will see a value of around $800,000 in 12 years.

(72 is a convenient estimate that facilitates mental division by values such as 2, 3, 4, 6, 8, 12, etc. A more accurate value, though harder to work with mentally, is closer to 69. This stems from the natural log of 2, ln(2), which is 0.69314… For our purposes, we’ll stick with 72.)
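As a quick check on the approximation, here is a minimal Python sketch that compares the Rule of 72 with the exact doubling time for once-per-period compounding. The function names are illustrative and not taken from any library.

```python
import math

def rule_of_72(rate_percent):
    """Approximate number of periods needed to double at rate_percent per period."""
    return 72 / rate_percent

def exact_doubling_time(rate_percent):
    """Exact number of periods to double when growth compounds once per period."""
    return math.log(2) / math.log(1 + rate_percent / 100)

# The house example from the text: 6% appreciation per year.
print(rule_of_72(6))                      # 12.0
print(round(exact_doubling_time(6), 2))   # 11.9

# Value of a $400,000 house after 12 years of 6% annual appreciation.
print(round(400_000 * 1.06 ** 12))
```

At 6% the shortcut and the exact figure differ by only about a tenth of a year, which is why the rule is good enough for mental arithmetic at moderate rates.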

Making IoT growth estimates more understandable

As we all try to get our heads around IoT, what it is, and how fast it is growing, we are bombarded by a variety of estimates and figures. We know these numbers seem big, but we’re not really sure how to use these figures or compare them to something else. Being able to quickly compute how long it takes for something to double in quantity can have more meaning for us than trying to interpret growth expressed as a percentage.

In his book The Grapes of Math, Alex Bellos does a great job of describing where the Rule of 72 comes from and how it works. Further, he reminds us that economic, financial, political, and other growth measures that describe sales, profits, stock prices, GDP, population, inflation, and more are often stated as percentage growth per year. Because of our familiarity with communicating this way, we can expect at least some IoT growth projections to be stated this way as well.

Gartner Press Release http://www.gartner.com/newsroom/id/2905717

Gartner’s installed IoT base estimate from late 2014 suggests exponential growth: 25% growth from 2013 to 2014, 30% growth from 2014 to 2015, and what looks like almost 40% annual growth from 2015 to 2020. If this is the case, then we can estimate 72 / 40 = 1.8 years to double. So, if we started with the almost 5 billion devices indicated in the 2015 column, we’d have 10 billion in about 22 months (1.8 x 12 months), sometime in 2017.

Analysis of IoT growth on semiconductor industry – http://pwc.to/1kwDuNc

This Gartner/PricewaterhouseCoopers analysis shows a CAGR (compound annual growth rate) for sensors and actuators of approximately 10%. Applying the Rule of 72, we can expect the number of sensors and actuators deployed in the world around us to double in roughly 72 / 10 = 7.2 years, less than two presidential terms. What will twice the number of sensors and actuators around us look like?

According to this IDC report, the IoT market will see 19% growth, for a market size doubling in a little under 4 years (72 / 19 = 3.8). The biggest growth area was the automotive sector at 40% CAGR, for a market doubling in under 2 years.

Lots more connections … http://bit.ly/1msfrjG

This Business Insider report suggests 45% year-over-year growth, from 2 billion in 2014 to 9 billion in 2018, for a connection count that doubles in 72 / 45 = 1.6 years, a little over a year and a half.

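And finally, ON World predicts 250% growth in wireless light bulbs. Treated as an annual rate, that gives 72 / 250 = 0.29 years, or a doubling roughly every 3.5 months. At rates this high the Rule of 72 is only a rough guide and understates the actual doubling time.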
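The figures in this section can be cross-checked with a short Python sketch. It derives the growth rate implied by the Business Insider endpoints and then lists the Rule of 72 estimate next to the exact doubling time for each quoted rate. The function names are illustrative, and treating every quoted rate as an annual compounding rate is an assumption.

```python
import math

def cagr_percent(start_value, end_value, periods):
    """Compound annual growth rate, in percent, implied by two endpoints."""
    return ((end_value / start_value) ** (1 / periods) - 1) * 100

def rule_of_72(rate_percent):
    """Approximate doubling time in periods."""
    return 72 / rate_percent

def exact_doubling_time(rate_percent):
    """Exact doubling time in periods for once-per-period compounding."""
    return math.log(2) / math.log(1 + rate_percent / 100)

# Business Insider figures quoted above: 2 billion (2014) to 9 billion (2018).
implied_rate = cagr_percent(2e9, 9e9, 2018 - 2014)
print(f"implied CAGR: {implied_rate:.1f}%")

# Growth rates quoted in this section, assumed to be percent per year.
for rate in (10, 19, 40, 45, 250):
    print(f"{rate:>4}%  rule of 72: {rule_of_72(rate):5.2f} yr  "
          f"exact: {exact_doubling_time(rate):5.2f} yr")
```

The two columns agree closely at 10% and 19%, drift apart at 40% to 45%, and differ by roughly a factor of two at 250%, so the shortcut is best read as an order-of-magnitude aid for the larger rates.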

Limitations

It’s important to note that we don’t know what IoT growth will actually look like over several years. The initial data from the first few years seem to suggest that the growth will be exponential rather than, say, linear. Also, the context where the Rule of 72 was initially applied, compounding money, is recursive: money grows because there is money to act on (and time). IoT growth will come from something else. At least for now, it’s not obvious that IoT growth is or will be recursive*. We don’t know that many IoT deployments this year will cause even more deployments next year, that next year’s increased deployments will cause an even larger incremental increase the following year, and so on.

*[One frightening possibility, of course, is the Skynet scenario from Terminator, where conscious machines build conscious machines and recursion is in full play …]

If, however, IoT growth roughly mimics or correlates with compounding growth (for whatever reason), then we can use the Rule of 72 to quickly estimate magnitudes and time scales and add some context to our conversations. The more context we have around the phenomenon of IoT, the better our chances of managing the risk to our organizations that comes from its proliferation.
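To make the distinction between compounding and linear growth concrete, here is a small Python sketch. The starting value of 5 billion and the 40% rate echo the Gartner-based figures quoted earlier, but the two series are purely illustrative and are not forecasts.

```python
def compound_series(start, rate_percent, years):
    """Each year grows by rate_percent of the previous year's value."""
    return [start * (1 + rate_percent / 100) ** y for y in range(years + 1)]

def linear_series(start, rate_percent, years):
    """Each year adds the same fixed amount: rate_percent of the starting value."""
    step = start * rate_percent / 100
    return [start + step * y for y in range(years + 1)]

start_billions = 5.0   # illustrative starting point, in billions of devices
compound = compound_series(start_billions, 40, 5)
linear = linear_series(start_billions, 40, 5)

for year, (c, l) in enumerate(zip(compound, linear)):
    print(f"year {year}: compounding {c:6.2f}  linear {l:6.2f}")
```

Under compounding the doubling time stays constant, so the Rule of 72 applies at every point in the series. Under linear growth each doubling takes longer than the last, and the rule stops being useful after the first one.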

Power laws & power plants – tackling IoT systems risk classification

Do aspects of Shodan data, that is, data about Internet of Things (IoT) devices and systems, demonstrate ‘long tail’ qualities? Data showing these qualities are sometimes also described as having a ‘Zipf distribution’, following a power law, or behaving according to the Pareto principle. If there is in fact a recurring relationship or curve across aspects of IoT data, it might offer some insight into how to categorize or classify aspects of IoT systems. For managing risk around IoT system implementations, the ability to classify and categorize these systems is currently sorely lacking. Potentially, such a relationship could also offer predictive capabilities regarding elements of the Internet of Things phenomenon.

To take an initial swing at it, I narrowed the question down to:

Do the frequencies of occurrences of particular ports (services) in an organization, or other Shodan data set, behave in a repeatable way?

Long tails & power laws

The concept of long tail behavior was popularized in Chris Anderson’s 2004 Wired article, and it has entered the popular vernacular in the years since. Anderson observed that many systems or sets of data have a lot of a few types of things and then a rapidly dropping number of other types of things, but there are a lot of those other types. He used the example of record sales: there are relatively few mega-hit songs but a lot of non-hit songs, and record companies were learning how to capitalize on this observation. This is the long tail.

George Zipf

Another example is the early ‘long tail’ work attributed to George Zipf, who analyzed word frequency distributions in texts. He found that if you:

count how often each word appears in a text
rank the words so that the word with the highest count gets the highest rank (i.e. #1) and so on down from there
plot the results in a graph

then you find a curve showing that a few words show up a lot.

For example, the words ‘the’, ‘be’, and ‘to’ show up a lot (1st, 2nd, & 3rd in a ranked list), while words like ‘teeth’, ‘shell’, or ‘neck’ show up around 1000 places down the list. After the first few spots in the ranked list, the frequencies of other ranked words fall off quickly, but there are a lot of those ‘other’ words. Further, this curve is a power law that looks a bit like y = 1/x. Variations include multiplying 1/x by some constant and raising x to some exponent. (For Zipf relationships, this exponent is often close to 1.)
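The count-and-rank procedure described above can be sketched in a few lines of Python. The tokenizing rule (lowercase runs of letters and apostrophes) and the sample sentence are simplifying assumptions made for illustration.

```python
import re
from collections import Counter

def rank_words(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]

sample = "the cat and the dog and the bird"
for rank, word, count in rank_words(sample):
    # An ideal Zipf curve predicts count ~ (count at rank 1) / rank.
    print(rank, word, count)
```

On a toy sentence the counts are too small to show a curve. Run on a long text, the same ranking typically produces the steep drop followed by a long tail.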

Other Zipf relationships are found in studies of city populations and website references.

Ranking city population sizes also follows Zipf-like relationships (the loglog plot is fairly linear)

Power law relationships in IoT data?

John Matherly, founder of Shodan, has been collecting data on IoT-type devices for years. He scans publicly accessible IP addresses on particular ports associated with Internet of Things or industrial control systems, including things like power plants, video cameras, HVAC systems, and others.

I have a particular interest in how IoT data shows up in higher education IP address spaces, so I analyzed large subsets of data from some of those institutions. To do this, I queried for data from each organization’s publicly facing IP space and exported it in JSON format. (Shodan also offers an XML version, but it is deprecated.) From the downloaded data, I used Python scripts to clean the data a bit, count how often each port occurred, and then rank the ports within each organization. Finally, I used the Python module matplotlib to plot the results.

This is similar to the word frequency analysis approach above where, for a set of data:

Count the number of occurrences of each port (service)
Rank the ports so that the port (service) that occurs most frequently gets the highest rank
Plot the results
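The count and rank steps above might look something like the following Python sketch. It is not the original script. It assumes the export is newline-delimited JSON with one record per line, each carrying a numeric "port" field; the sample records and their addresses are made up for illustration.

```python
import json
from collections import Counter

def count_ports(json_lines):
    """Count occurrences of each port in an iterable of JSON record strings."""
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        port = record.get("port")  # assumed field name
        if port is not None:
            counts[port] += 1
    return counts

def rank_ports(counts):
    """Return (rank, port, count) tuples, most frequent port first."""
    return [
        (rank, port, n)
        for rank, (port, n) in enumerate(counts.most_common(), start=1)
    ]

# Made-up sample records using documentation-only addresses.
sample_lines = [
    '{"ip_str": "192.0.2.1", "port": 80}',
    '{"ip_str": "192.0.2.2", "port": 80}',
    '{"ip_str": "192.0.2.3", "port": 22}',
    '{"ip_str": "192.0.2.4", "port": 80}',
    '{"ip_str": "192.0.2.5", "port": 22}',
    '{"ip_str": "192.0.2.6", "port": 443}',
]
for rank, port, n in rank_ports(count_ports(sample_lines)):
    print(rank, port, n)
```

Because `count_ports` accepts any iterable of lines, an open file handle for a downloaded export can be passed to it directly.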

Like word frequency data in Zipf studies, a plot of frequency of occurrence of each port vs rank of each port’s frequency yields a curve that drops off so fast that it is hard to discern nuanced information. However, the fact that it drops off so fast lets us know something at a glance that is similar to Zipf data: a very few ports occur most often and a lot of ports have only a few occurrences.

4 universities and 1 (organizationally) arbitrary & large set of IP addresses on normal (non-log) plot

What gets more interesting visually is to plot that same data on a log log scale. This spreads the curve out so that its shape is easier to see.

Zipf-like data can follow the relationship y = 1/x almost exactly for much of the range. (This is part of why word frequency, city population data, etc. are so intriguing.) So when plotted on log log axes, much of the line looks almost straight, with a slope of about -1.
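One way to put a number on "almost straight" is to fit a line to the data in log log space. The sketch below uses an ordinary least-squares fit written with the standard library only. The helper names are illustrative, and the plotting helper assumes matplotlib is installed. An ideal y = C/x curve gives a fitted slope of -1.

```python
import math

def loglog_slope(ranks, freqs):
    """Least-squares slope of log10(freq) against log10(rank)."""
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def plot_loglog(ranks, freqs, label):
    """Plot frequency vs rank on log log axes (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the fit works without it
    plt.loglog(ranks, freqs, marker=".", label=label)
    plt.xlabel("rank")
    plt.ylabel("frequency")
    plt.legend()
    plt.show()

# Synthetic, ideal Zipf data: frequency = 1000 / rank.
ranks = list(range(1, 101))
freqs = [1000 / r for r in ranks]
print(round(loglog_slope(ranks, freqs), 3))  # -1.0
```

A single fitted slope summarizes a curve well only when the log log plot is close to linear. For a curve that bulges, fitting separate slopes over the head and the tail would be more informative.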

A log log plot of university IoT data doesn’t yield a straight line, but sort of a bulging out line. If you were standing on the graph way out to the right and up and looking toward the origin, it would appear convex. So this isn’t Zipf in the traditional sense — the log log plot is not linear.

However, the curves do look similar to one another. University1 looks roughly like University2, University2 like University3, University3 like University4, and so on. The curve roughly retains its shape regardless of the school, though the school sizes are different (or at least the numbers of public IP addresses are different).

4 universities and 1 (organizationally) arbitrary & large set of IP addresses on log log plot
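Since the data sets differ mainly in size, one way to compare curve shapes is to convert each ranked list of counts into shares of that set's total. This is a sketch of that idea rather than a step from the original analysis, and the counts below are invented.

```python
def normalized_curve(counts):
    """Sort counts from largest to smallest and express each as a share of the total."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    return [c / total for c in ordered]

# Invented port counts for a small and a large address space.
small_set = [20, 50, 30]
large_set = [300, 200, 500]

print(normalized_curve(small_set))
print(normalized_curve(large_set))
```

Two sets with the same shape but different magnitudes produce the same normalized curve, so any remaining differences reflect shape rather than the number of addresses sampled.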

Maybe the organization doesn’t matter?

Also plotted are the results from a search on all of the IP addresses in the 128.0.0.0/8 range (using CIDR notation). This curve, though bigger and slightly smoother, has roughly the same shape as the others. The main thing that separates it from the others appears to be magnitude (number of IP addresses sampled). It appears that there is nothing particularly unique about an organization that drives this curve shape. A similar shape appears even when the set is chosen by numerical range, regardless of organization.
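For readers less familiar with CIDR notation, Python's standard ipaddress module can show how large the 128.0.0.0/8 range is and test whether an address falls inside it. The record layout with an "ip_str" field is an assumption about the exported data, and the sample records are made up.

```python
import ipaddress

NETWORK = ipaddress.ip_network("128.0.0.0/8")

def in_network(ip_str, network=NETWORK):
    """True if the address string falls inside the given network."""
    return ipaddress.ip_address(ip_str) in network

# A /8 fixes the first 8 bits, leaving 24 bits: 2**24 addresses.
print(NETWORK.num_addresses)  # 16777216

# Made-up records; "ip_str" is an assumed field name.
records = [
    {"ip_str": "128.10.20.30", "port": 80},
    {"ip_str": "129.0.0.1", "port": 22},
    {"ip_str": "128.255.255.255", "port": 443},
]
selected = [r for r in records if in_network(r["ip_str"])]
print([r["ip_str"] for r in selected])
```

Selecting by network this way cuts across organizational boundaries, which is the point of the comparison: the set is defined purely by a numerical range.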

It will be interesting to see if, as IoT device count grows, the curve changes shape. Will the set of IoT devices across the globe continue to communicate mostly over the same ports/services as those currently in use, keeping the same shape? Or will new ports/services/enumerations show themselves as IoT device proliferation continues, changing the shape? By analyzing ranking relationships over time and between organizations, this approach could provide some insight into helpful categorizations for risk analysis.
