Understanding data expectations is essential to IoT Systems & Smart City/Campus success

One of the subtle but powerful factors affecting IoT Systems implementation and management success in complex organizations such as smart cities and smart campuses is the change required in becoming a data-centric organization. In most cases, this is not a small transition. The evolution of these cities and institutions has been from a place of relatively limited data – and certainly not ubiquitous data – available across multiple contexts. When an organization begins to shift, or seeks to shift, to an organization where data production, acquisition, consumption/analysis, and management are core to its operation and to its perception of self, subtle but powerful cultural and organizational change is required.

Data generation and/or acquisition is a major component in almost all IoT Systems that may be deployed in support of smart cities or smart campuses. It’s where the money is, so to speak. The challenge is that the expectations of data from the many constituencies and consumers can vary in significant ways and these variances in expectation, in turn, influence perceptions of IoT System, and hence smart city system, success. Further, early IoT System implementations that are viewed as failures in support of a smart city or smart campus not only mean lost investment on those particular systems, but also that these failures will (understandably) make constituents wary of funding or deploying subsequent systems.

Reflecting on and planning for what our expectations of data are in our different constituencies and contexts can go a long way to helping us identify what successful IoT Systems implementations and smart city deployments might look like.

A framework for an organization’s expectations of data

University of Washington researchers Brittany Fiore-Gartland and Gina Neff have proposed a framework for considering those data expectations in the context of health and wellness data that we might borrow from in considering IoT Systems data in smart cities and smart campuses. In their paper “Communication, Mediation, and the Expectations of Data: Data Valences Across Health and Wellness Communities,” Fiore-Gartland/Neff introduce the concept of data valences.

The authors identify six data valences:

  • self-evidence
  • actionability
  • connection
  • transparency
  • ’truthiness’
  • discovery

I’ll briefly describe these valences as I understand them and then suggest how they might be applied to an IoT System/Smart City System such as an energy management or smart grid system.

self-evidence valence

My interpretation of the self-evidence valence is that data is context-free or at least appears that way. The context-free-ness notion conflicts with the popular assumption of interpretation or mediation being required to make data meaningful as the researchers point out. My own opinion is that data does indeed need mediation to be relevant. In my mind, data without mediation devolves to the ‘just because’ answer. (While this can apply in parenting, eg ‘because I said so,’ it is extremely narrow in scope and its effectiveness has a much shorter timeline than I anticipated).

actionability valence

Actionability refers to the expectation that the data does something or drives something. From the context of the data consumer, can that data be used to do something meaningful for that consumer within their context? Fiore-Gartland/Neff give the example of a physician being presented with self-collected patient data. This may well not be “clinically actionable” because the physician has no basis for comparison or reference.

connection valence

This valence identifies data as a ‘site for conversation.’ To me, this one is particularly meaty. Because regardless of all of the other (important) valences, the connection valence draws people to the same table to discuss data for one reason or another. An example given in the paper is that of a home patient contacting their case manager about data being collected as a part of the telemedicine system. The call was not particularly important regarding the telemedicine question, but rather because it provided an opportunity for the case manager and patient to connect and share other information (which might have been written on the margins of a legal pad).

Even if the data-discussion reasons are possibly simple or seem unrelated to ostensible objectives, people are still showing up for whatever reason and in the course of that showing up, other things are shared and communicated. I believe that is a powerful valence. And possibly not easily quantifiable.

transparency valence

The transparency data valence is pretty much what it sounds like. It’s the idea or expectation of real or perceived benefit of data being “accessible, open, sharable, or comparable across multiple contexts.” As the researchers state, “Making data transparent across communities is one set of values or expectations.” The transparency valence also introduces the idea that when there is data transparency, when data is indeed shared across contexts, new questions around ownership, access, and confidentiality present themselves. From my perspective, addressing these new questions/issues is important work that requires some resources – time/effort/maybe dollars – and those have to come from somewhere.

truthiness valence

Stephen Colbert introduced the word truthiness during one of his shows. He uses truthiness, or the Dog Latin version “veritasiness,” to describe something that feels right or just seems right, often regardless of facts or evidence. (Self-referentially, I think the idea of ’truthiness’ itself also seems right – most of us think, “yes, I understand what truthiness is. I don’t know why, but I’m pretty sure I know what it is.”)

So the truthiness data valence has to do with the data quality of feeling right or seeming right.

discovery valence

Per Fiore-Gartland/Neff, the discovery valence “describes how people expect data to be the source or site of discovery of an otherwise obscured phenomenon, issue, relationship, or state.” This is not inconsistent with the popular notion of Big Data — which generally goes something like, ‘because there’s so much data there, there’s got to be something there – patterns, knowledge, etc and we can find it intentionally or accidentally’. I’m not saying that I subscribe to this, but it seems right (see ‘truthiness’ above).

Data valences in an IoT System example – smart grid

Because I’ve spent some time working with and reflecting on the challenges of implementing and managing a long-term institutional energy management system and the associated cultural and organizational challenges needed to be effective, the data valences idea proposed by the authors has made for highly relevant conversation.

So how might the six data valences reveal themselves in an institutional energy management system such as a smart grid system or a part of a smart grid system — themselves IoT Systems? Below is my take on how each of these valences might come into play in this context.

self-evidence valence in energy management data

My initial reaction is that I don’t see this valence playing out particularly well here. This energy management data sourced from thousands of energy sensors across an institution needs to have context and be interpreted to have relevance. Also, the data is too new and unfamiliar and often complex for there to be strong statements of self-evidence.

That said, the topic of climate change and all the misinterpretations and rhetoric therein comes to mind. So maybe the self-evidence valence has applicability here as well. Perhaps conclusions will indeed be drawn from energy data devoid of context.

actionability valence in energy management data

Definitely. Everyone (consumers, vendors, government, others) expects to do something useful here.

connection valence in energy management data

Definitely again. This data provides the site, as the authors say, to come together to problem-solve. And in the course of that problem-solving, a parade of assumptions and expectations comes quickly to the surface. Finance people, energy management people, IT and data people, vendors, and a variety of end-users bring their expectations, assumptions, and desires to these meetings. This data valence is particularly important at this stage of the game regarding energy management systems and likely IoT Systems more generally.

transparency valence in energy management data

Yup. Everybody wants this. Much like youthful dating, this distribution of data interpretation across contexts is exciting, challenging, and fraught with peril for misunderstanding. That said, addressing topics around this valence can bring important issues to the surface (though it’s typically a lot of work).

truthiness in energy management data

I’m not sure about the truthiness valence in institutional energy management data. Similar to the self-evidence valence, I don’t know that we have enough exposure to the data to have a truthiness feel about institutional energy management data. But again, misinterpreting climate change data has become a worldwide sport.

discovery valence in energy management data

Without a doubt. Almost all of us have this expectation of discovery, at least at some point: “Start capturing energy data and we’ll make awesome decisions!!” I do believe that capturing this data will yield useful, actionable (see above) information. However, I think it’s going to be more work than is immediately apparent.

Data valences in IoT Systems

How we, across our multiple constituencies within an institution, perceive various aspects of data has a strong influence on the perceived success of the system that produced the data. This is true for institutional energy management systems and I believe that that is broadly generalizable to IoT Systems of whatever institutional purpose.

Understanding data perceptions across an institution or population base is essential for successful IoT System implementations and hence Smart City or Smart Campus implementations. As I mentioned in the IoT Systems Hamburger Diagram, while not sexy and ‘blingy‘, the capability and capacity of an institution to implement complex IoT Systems in a complex environment is essential to success. Understanding the varied data consumers and their perceptions and needs in a complex organization such as a city or campus is, in turn, a critical component to a successful IoT System implementation.

Organizational-spanning characteristics of IoT Systems

Gaps between institutional organizations implementing & supporting IoT Systems create challenges

One of the unique characteristics of IoT Systems, and one that adds to the complexity of a system’s deployment, is that they tend to span many organizations and entities within the institution. This is particularly true in Higher Education institutions with their city-like aspects, multiple service lines, and wide variety of activities in their buildings and spaces. While traditional enterprise systems, such as e-mail or calendaring, are likely to be owned and operated by one or two institutional organizations, IoT Systems involve many and are deployed in the ‘complex and material manifestations’ that characterize buildings and spaces.

A Higher Education institution example might be a research lab that incorporates an automation and environmental control system that involves the facilities organization, the central IT organization, maybe a local/distributed IT organization, the lead researcher (aka Principal Investigator or PI), her lab team, at least one vendor/contractor and probably several other vendors. Between each of these, a gap forms where system ownership and accountability can fall. Everyone sees their piece, but not much of the others. There’s no one monitoring the greater Gestalt of the IoT System. And that’s where the wild things are.

Traditional enterprise systems tend to fall within the domain of central IT, with use of the system being distributed around the institution. So with traditional enterprise systems, use is distributed but ownership and operation are largely with one organization. IoT Systems, on the other hand, tend to have multiple parties/organizations involved in the implementation and management, but the ownership is unclear.

IoT Systems are systems within systems within systems ...

This lack of ownership can lead to unfortunate assumptions. For example, the end user/researcher in the Higher Ed case is probably thinking, “central IT and the Chief Information Security Officer are ensuring my system is safe and secure.” The central IT group is thinking, “I’ve got no idea what they’re plugging into the network down there … I didn’t even know they bought a new system. Where did that come from?” The facilities people might be thinking, “Okay I’ll install these 100 sensors and 50 actuators around this building and these two computers in the closet that the vendor said I had to install. The research people and central IT people will make sure it’s all configured properly.” No one is seeing the whole picture or managing the whole system to desired outcomes.

This implementation and management of IoT Systems is a part of what is being explored within Internet2’s IoT Systems Risk Management Task Force in support of Internet2’s Smart Campus Initiative.

Technology adoption in other aspects of the building industry

Research has been done in other areas of technology implementation in the building, space, and campus realms that might help shed some light on the multi-organizational challenge that IoT system implementations can bring to institutions.

Research in the Building Information Modeling (BIM) field suggests that buildings have a ‘complex social and material manifestation … [that requires] a shared frame of reference to create.’ Building Information Modeling seeks to codify or digitize the physical aspects of a space or place such that its attributes can be stored, transmitted, exchanged in a way that supports decision-making and analysis.

In their paper, “Organizational Divisions in BIM-Enabled Commercial Construction”, researchers Carrie Dossick and Gina Neff identify competing obligations within supporting/contributing organizations that limit technology adoption.  They point out that BIM-enabled projects are “often tightly coupled technologically, but divided organizationally.” I believe that their observations regarding BIM projects also share common aspects of deploying and managing IoT Systems.

The Dossick/Neff research suggests that mechanical, electrical, plumbing, and fire life safety systems can be as much as 40% of the commercial construction project scope. It is likely that this number will only increase as our buildings and spaces become more alive, aware, and aggregating of information about what goes on in and around these spaces.

For the BIM implementations studied, the research suggests that there are three obligations of the people and groups contributing to the effort and that these can be in conflict with each other.  The obligations are: scope, project, and company.

As I interpret the paper, scope obligation is what a person or group is supposed to accomplish for the effort – what are they tasked to do. That mission is not overarching organization and coordination of the project, but rather specific, often local, tasks that must be done in support of the effort. In the course of that work, participants in the work are naturally “advocating for their particular system.“

Projects bring together temporary teams for a particular purpose. These time boundaries and purpose boundaries create the environment for the project obligation. Specific timelines and milestones can drive the project obligation. This area can be particularly challenging as it can involve negotiations among different providers, the interests of owners, and design requirements.

Finally, obligations to company “emphasize the financial, legal, and logistical requirements” where ownership and management provide the environment and context in which work is done. This also makes sense intuitively as company is whence one’s paycheck comes. Performance today can influence today’s paycheck as well as paychecks down the road.

While not exactly the same, I think there are some parallels with IoT Systems implementation and management.

BIM implementation/adoption -> IoT Systems implementation & management

  • scope -> scope
  • project -> project
  • company -> department/organization

To me, organizational aspects of scope and project are very similar between IoT Systems implementation/management and those observed and analyzed by the research in BIM implementation.

For IoT Systems implementation and management within an institution, internal departments and internal organizations can closely parallel the company that the research addresses. A person’s or group’s directives, performance expectations, and paycheck approval ultimately come from that department or organization, so there will be natural alignment there.

Leadership ‘glue’ has its limits

Finally, the paper also points out the role of leadership in an effort where there is new technology to potentially be adopted or leveraged. While strong leadership is clearly desirable, the research suggests that even good leaders often cannot overcome structural organizational problems with great efficiency or effectiveness. The authors also note that research does not yet understand what “organizational resources to be in place for effective collaboration to occur after new technologies are introduced.”

IoT System ownership for implementation & management

Like BIM and related technology adoption, I believe successful IoT Systems implementations have similar institutional organizational challenges. IoT Systems implementations are themselves a part of a larger system of institution, organizations, and people. IoT System implementation and management success will, I believe, also require learning to work with these multiple organizations that have inherent competing obligations. While other approaches may evolve to solve these organizational challenges, a reasonable place to start in the near term might be to establish organizational-spanning system ownership and accountability.

Internet2 Chief Innovation Office launches IoT Systems Risk Management Task Force

Internet2 has launched a national Task Force to study risk management needs around IoT Systems in Higher Education and research institutions. The Task Force is composed of Higher Education and research IT and Information Management leaders across the country and will explore the areas of IoT Systems selection, procurement, implementation, and management. At the end of 12 months, the IoT Systems Risk Management Task Force will deliver a set of recommendations for 3 – 5 areas of further in-depth work. (And in the interest of full disclosure, I am Chairing the IoT Systems Risk Management Task Force.)

Internet of Things Systems or IoT Systems offer great potential value to higher education, research, government, and corporate institutions. From energy management, to research automation systems, to systems that enhance student, faculty, staff, and public safety, to academic learning systems, IoT Systems offer great promise. However, these systems need to be implemented thoughtfully and thoroughly or the investment value won’t be realized. Further, because of the distributed computing and networking capabilities of IoT devices, poor IoT Systems implementations can even make things worse for institutions, corporations, or governments.

Internet2 Chief Innovation Office

The mission of the Internet2 Chief Innovation Office, led by Florence Hudson, is to work with Internet2 members to define and develop new innovations around the Internet. The Innovation Program has three core working groups.

Internet2’s core offerings are its 100 Gbps network and its NET+ services. Its membership includes 300 Higher Education institutions and over 150 industry, lab, and national agency organizations.

Many IoT systems risk topics

Examples of topics that the Task Force will cover include IoT systems vendor management issues, network segmentation strategies and approaches, cost estimating tools and approaches for IoT systems, potential tool development and/or partnering with organizations that perform Internet-wide scanning for IoT-related systems, and the organizational and cultural issues encountered in transitioning to a data-centric organization.

IoT systems vendor management approaches

Organizations and institutions need to raise the bar with IoT systems vendors regarding what constitutes a successfully delivered product or service. For example, has the vendor delivered documentation showing the final installation architecture? Have default logins & passwords been changed on all devices, and how is this demonstrated? Have all unnecessary services been deactivated on all devices and systems, and how is this demonstrated?

Development of common ‘backends’ for IoT systems

Current vendor approaches for IoT systems (including utility distribution, building automation systems, and many others) require that institutions invest in a separate, proprietary ‘backend’ architecture, consisting of application servers, databases, etc., for each different vendor. This approach does not lend itself to manageability, extensibility, or scalability. In this space, perhaps newer container and container management technologies offer solutions, among other possibilities.

Development of network segmentation/micro-segmentation strategies and approaches for IoT Systems

Network segmentation seems to offer great promise for mitigating risk around IoT Systems implementations. However, without appropriate guidance for IoT network segmentation implementation and operation, institutions can end up with a full portfolio of poorly managed network segments. Exploration and development of institutional network segmentation best practices can serve to lower an organization’s risk profile.

Development of cost estimating tools and approaches for IoT Systems

There is little in the way of precedent for cost models for the rapidly evolving IoT systems space and, as such, planning for IoT Systems and trying to estimate Total Cost of Ownership is difficult and nuanced. Exploration of and development of IoT Systems cost models can be of real value to institutions making planning and resourcing decisions.

Development of risk language & risk categories around IoT systems

Currently it is difficult to discuss new risk brought on by IoT systems with enterprise risk managers because IoT systems themselves are difficult to describe and discuss. Developing and socializing IoT risk language that incorporates existing, familiar institutional risk language would enhance the ability to discuss IoT systems risk at the enterprise level. This Task Force will explore this nuanced space as well.

Analysis tool development and partnering

The Task Force will explore tool development and/or partnerships with organizations that scan the Internet for industrial control systems and IoT systems and publish these results online. Exploring internal tool development of the same is also a possibility. Development of benchmarks and baselines of Internet-scanning results across different industries and market sectors will also be considered.

Organizational cultural barriers to successful implementation of IoT Systems

Changing from a traditional organization to a data centric organization is a non-trivial transition and not addressing these issues can be a barrier to successful implementations of IoT Systems in institutions, organizations, and cities. The Task Force will study this important space as well.

Early Task Force work will also include identifying and enumerating other independent and overlapping risk areas (operational, cyber, cultural, and others). Over the year, Task Force members will participate in phone conferences, listen to subject matter expert presentations, and identify, discuss, and prioritize IoT Systems issues. Finally, recommendations will be made for further focused work on the highest priority areas. If you have questions, comments, or further interest, please contact me at ChuckBenson@longtailrisk.com or the Internet2 Chief Innovation Office at CINO@internet2.edu.

 

[IoT image above: By Wilgengebroed on Flickr – https://www.flickr.com/photos/wilgengebroed/8249565455/, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=32745541]

A potential IoT systems vendor checklist v2.0

In order to maximize the value of an IoT system to an institution or city, how that system is implemented is critical. Without a thoughtful and thorough implementation, the value of the investment will not be met and, possibly, the value can even be negative through the addition of unmitigated risk to the institution or city.

I’ve updated the IoT systems planning considerations list from an earlier post and created a more checklist-like document to use when working with IoT systems vendors. Earlier versions appeared in posts Institutional considerations for managing risk around IoT,  Developing an IoT vendor strategy, and Systems in the seam – shortcomings in IoT systems implementation. Ideally, this document could be used during the contract development and negotiation phases with the vendor.

The intent of the document is to help compute the Total Cost of Ownership for an IoT systems implementation as well as raise expectations of the vendor for a delivered system. In doing so, we can help mitigate some operational risk (suboptimal business decisions) as well as cybersecurity-related risk (bad guys wanting to use our assets in a malicious manner).

I’ve created rough categories of issues falling under operational risk, cybersecurity risk, and those falling in both categories to help provide some additional structure. However, many issues influence each other so it’s not critical to get tied up in the categorization.

A starting point for an IoT systems vendor checklist

pdf version here

Looking for a quick definition of IoT?

Defining IoT (image Wikipedia)

Tired of looking for the right words when trying to impress your boss, friends, or potential future spouse with a description of the Internet of Things, aka IoT? Well, look no further!

The 10 word version

Here’s a 10 word version. An IoT device is one that:

1. Computes
2. Is networked
3. Interacts with the environment in some way

The 20 word version

And once you’ve impressed them with this knowledge that just rolled off your tongue, feel free to take it further with the 20 word version!

1. Computes
2. Is networked
3. Interacts with the environment with the intention of collecting sensory data and/or manipulating the local environment

For example:

  • A FitBit device computes, is networked, & interacts with the environment (ie you)
  • An industrial SmartGrid meter computes, is networked, & interacts with the environment (collects power data)
  • A residential Nest thermostat computes, is networked, and interacts with the environment (collects temperature data)
  • Chicago’s Array of Things devices compute, are networked, and interact with the environment (collect many environmental data points)
  • Blood glucose monitors compute, are networked, and interact with the environment (ie you)
  • and much much more!!

And then, while impressing those around you, you can bring it on home with the definition of an IoT System. An IoT system:

1. Is a set of IoT devices that
2. Communicate with each other and/or communicate with
3. A central server that aggregates data and/or provides control data
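
For readers who think in code, the two definitions above can be sketched as a small data structure. This is only an illustration: the class names, field names, and the server label are my own assumptions, not part of any standard or library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    """A candidate device described by the three criteria above (names are hypothetical)."""
    name: str
    computes: bool
    is_networked: bool
    interacts_with_environment: bool  # senses and/or manipulates its surroundings

    def is_iot_device(self) -> bool:
        # All three criteria must hold for the device to count as IoT.
        return self.computes and self.is_networked and self.interacts_with_environment


@dataclass
class IoTSystem:
    """A set of IoT devices that talk to each other and/or to a central server."""
    devices: List[Device] = field(default_factory=list)
    devices_communicate_with_each_other: bool = False
    central_server: Optional[str] = None  # hypothetical label for the aggregating server

    def is_iot_system(self) -> bool:
        # Needs at least one device, every device must qualify,
        # and there must be some communication path (peer and/or server).
        has_devices = len(self.devices) > 0
        all_qualify = all(d.is_iot_device() for d in self.devices)
        communicates = (
            self.devices_communicate_with_each_other
            or self.central_server is not None
        )
        return has_devices and all_qualify and communicates


# Example usage with made-up devices.
meter = Device("smart meter", computes=True, is_networked=True,
               interacts_with_environment=True)
campus_grid = IoTSystem(devices=[meter], central_server="energy-aggregator")
```

The point of the sketch is simply that the device definition is a three-way AND, while the system definition allows either kind of communication (or both).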

Congratulations on your assured future personal, social, and professional successes now that a handy definition of IoT is at your disposal!

IoT & the Rule of 72

There are many different estimates regarding the growth rate of the Internet of Things (IoT). There are projections of number of connected devices, projections on market capitalization, projections on growth of semiconductor counts supporting those devices, and many others. Because the numbers of devices and systems are so high and these projections are around things that we typically don’t understand well, it’s hard to get a feel for what is actually increasing so rapidly. What is this thing that is growing so rapidly? How fast is it growing? If we can’t roughly understand the magnitudes involved, we can’t discuss, plan, assess, or begin to mitigate risk to our organizations and institutions involving these systems.

Going old school

Summa de arithmetica – Wikipedia http://bit.ly/1MHOuxO

One way to better our ballpark understanding of this rate of growth can be with the old school method of applying the Rule of 72. Introduced by Pacioli in Summa de Arithmetica, the Rule of 72 has been around for over half of a millennium as a mental mechanism to quickly estimate how long it takes a value experiencing exponential growth to double. This works with systems that have parameters that are described by a percentage change over a period of time.  The classic example is interest on a loan or investment that compounds. Because we are used to seeing these kinds of measures in financial, economic,  and political systems, we will see them in IoT conversations also.

To apply the Rule of 72, you take the rate of growth for a period expressed as a percentage and then divide that into the number 72. The result is the number of time periods, typically expressed in years, that it takes for the doubling to occur.

For example, if you buy a house that increases in value by 6% per year, the time to double the value is:

72 / 6 = 12

or 12 years to double. So a $400,000 house purchased today that appreciates by 6% per year will see a value of around $800,000 in 12 years.

(72 is a convenient estimate that facilitates mental division with values such as 2, 3, 4, 6, 8, 12, etc. A more accurate, but less easy to mentally work with, value is closer to 69. This stems from the value for natural log 2, aka ln(2), which is .69314 … For our purposes, we’ll stick with 72.)
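
To see how close the mental shortcut gets, here is a minimal sketch that compares the Rule of 72 with the exact doubling time for the house example. The function names are my own, and the exact formula assumes growth compounds once per period, which is an assumption about how the percentage is quoted.

```python
import math


def rule_of_72_periods(rate_percent: float) -> float:
    """Rule-of-72 estimate of the number of periods needed to double."""
    return 72.0 / rate_percent


def exact_doubling_periods(rate_percent: float) -> float:
    """Exact periods to double, assuming compounding once per period."""
    return math.log(2) / math.log(1 + rate_percent / 100.0)


rate = 6.0  # percent per year, from the house example
print(f"Rule of 72: {rule_of_72_periods(rate):.1f} years")
print(f"Exact:      {exact_doubling_periods(rate):.1f} years")

# Value of the $400,000 house after 12 years of 6% annual appreciation.
print(f"House value after 12 years: ${400_000 * 1.06 ** 12:,.0f}")
```

At 6% the exact answer is roughly 11.9 years, so the Rule of 72 estimate of 12 years is more than good enough for ballpark conversations.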

Making IoT growth estimates more understandable

As we all try to get our heads around IoT, what it is, and how fast it is growing, we are bombarded by a variety of estimates and figures. We know these numbers seem big, but we’re not really sure how to use these figures or compare them to something else. Being able to quickly compute how long it takes for something to double in quantity can have more meaning for us than trying to interpret growth expressed as a percentage.

In his book Grapes of Math, Alex Bellos does a great job of describing where the Rule of 72 comes from and how it works.  Further he reminds us that economic, financial, political, and other growth measures that describe sales, profits, stock prices, GDP, population, inflation, and more are often stated in percentage growth per year.  Because of our familiarity with communicating this way, we can expect at least some IoT growth projections to be stated this way as well.

Gartner Press Release http://www.gartner.com/newsroom/id/2905717

Gartner’s installed IoT base estimate from late 2014 suggests exponential growth — 25% growth from 2013 to 2014, 30% growth from 2014 to 2015, and what looks like almost 40% annual growth from 2015 to 2020.  If this is the case, then we can estimate 72 / 40 = 1.8 years to double. So, if we started with the almost 5 billion devices indicated in the 2015 column, we’d have 10 billion in about 22 months, sometime in 2017 — 1.8 x 12 months.


Analysis of IoT growth on semiconductor industry – http://pwc.to/1kwDuNc

This Gartner/PriceWaterhouseCoopers analysis shows a CAGR for sensors and actuators of approximately 10%. Applying the Rule of 72 for an estimate, we can expect the number of sensors and actuators deployed in the world around us to double in ~72 / 10 = 7.2 years, less than 2 presidential terms. What will twice the number of sensors and actuators around us look like?

According to this IDC report, the IoT market will see 19% growth  for a market size doubling in a little under 4 years (72/19 = 3.8). The biggest growth area was 40% CAGR in the automotive sector for a market doubling in under 2 years.


Lots more connections … http://bit.ly/1msfrjG

This Business Insider report suggests 45% year-over-year growth, from 2 billion connections in 2014 to 9 billion in 2018, for a connection count doubling in 72/45 = 1.6 years, a little over a year and a half.

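And finally, ON World predicts 250% growth in wireless light bulbs, which the Rule of 72 turns into a doubling every ~3.5 months (72/250 ≈ 0.29 years). The rule loses accuracy at rates this high, though: if the 250% figure is annual, exact compounding gives a doubling closer to every 6.6 months.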
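To see how well the shortcut holds across the figures quoted above, the sketch below puts the Rule of 72 estimate next to the exact compounding value for each rate. Treating every quoted figure as an annual compounding rate is an assumption of this sketch, and the labels are shorthand rather than the reports' own titles.

```python
import math

# Growth rates quoted in this post, in percent per year.
# Assumption: each figure is an annual compounding rate.
quoted_rates = {
    "Gartner installed base": 40,
    "Sensors and actuators": 10,
    "IDC IoT market": 19,
    "Business Insider connections": 45,
    "ON World wireless light bulbs": 250,
}

def doubling_times(rate_percent):
    """Return (Rule of 72 estimate, exact value), both in years."""
    estimate = 72.0 / rate_percent
    exact = math.log(2) / math.log(1 + rate_percent / 100.0)
    return estimate, exact

for label, rate in quoted_rates.items():
    estimate, exact = doubling_times(rate)
    print(f"{label:30s} {rate:4d}%  rule of 72: {estimate:5.2f} y  exact: {exact:5.2f} y")
```

The estimate tracks the exact value closely at 10% and drifts as the rate climbs, so the very high growth figures are best read as rough orders of magnitude.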

Limitations

It’s important to note that we don’t know what IoT growth will actually look like over several years. We have some initial data from the first few years that seem to suggest that this growth will be exponential rather than linear, for example. Also, the context where the Rule of 72 was initially applied, money growth (compounding), is a recursive one: money grows because there is money to act on (and time). IoT growth will come from something else. At least for now, it’s not obvious that IoT growth is or will be recursive*. We don’t know that many IoT deployments this year will cause even more deployments next year, that next year’s increased deployments will cause an even higher incremental increase the following year, and so on.

*[One frightening possibility, of course, is the Skynet scenario from Terminator, where conscious machines build conscious machines and recursion is in full play …]

If, however, IoT growth roughly mimics or correlates to compounding growth (for whatever reason), then we can use the Rule of 72 to help us quickly estimate magnitudes and time scales and add some context to our conversations. The more context we have around the phenomenon of IoT, the better our chances of managing the risk to our organizations that comes from its proliferation.

Power laws & power plants – tackling IoT systems risk classification

Do aspects of Shodan data – data about Internet of Things (IoT) devices and systems – demonstrate ‘long tail’ qualities? Data showing these qualities are sometimes also described as having a ‘Zipf distribution’, following a power law, or behaving according to the Pareto principle. If there is in fact a recurring relationship or curve across aspects of IoT data, it might offer some insights into how to categorize or classify aspects of IoT systems. For managing risk around IoT systems implementations, our current ability to classify and categorize these systems is sorely lacking. Potentially, such a relationship could also offer predictive capabilities regarding elements of the Internet of Things phenomenon.

To take an initial swing at it, I narrowed the question down to:

Do the frequencies of occurrences of particular ports (services) in an organization, or other Shodan data set, behave in a repeatable way?

Long tails & power laws

The concept of long tail behavior was popularized in Chris Anderson’s 2004 Wired article and it has entered popular vernacular in the years since. What Anderson articulated was that many systems or sets of data have a lot of a few types of things and then a rapidly dropping number of each of the other types of things, but there are a lot of those other types. Anderson used the example of record sales: there are relatively few mega-hit songs, but there are a lot of non-hit songs, and record companies were learning how to capitalize on this observation. This is the long tail.

George Zipf


Another example is early ‘long tail’ work attributed to George Zipf with his analysis of word distribution frequencies in any particular text. He found that if you:

  1. counted how often each word appeared in a text
  2. ranked each word so that the word with the highest count got the highest rank (i.e. #1) and down from there
  3. plotted the results in a graph

then you find a curve that shows that a few words show up a lot.

 

For example, the words ‘the’, ‘be’, and ‘to’ show up a lot (1st, 2nd, & 3rd in a ranked list) and words like ‘teeth’, ‘shell’, or ‘neck’ show up around 1000 places down the list. From the first few spots in the ranked list, the frequencies of other ranked words fall off quickly, but there are a lot of those ‘other’ words. Further, this curve is a power law which looks a bit like y = 1/x. Variations include multiplying 1/x by something and raising x to some exponent. (For Zipf relationships, this exponent is often close to 1.)
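The count-and-rank steps can be sketched in a few lines of Python. This is only an illustration: the tokenizing regular expression and the toy sentence are assumptions, and a text this small is far too short to show Zipf behavior.

```python
import re
from collections import Counter

def rank_frequencies(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Simple tokenizer: lowercase runs of letters and apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

sample = "the cat and the dog and the bird"  # toy text for illustration only
for rank, word, count in rank_frequencies(sample):
    print(rank, word, count)
```

Run against a long text, the (rank, count) pairs from this function are what would be plotted to look for the 1/x-like curve.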

Yet other Zipf relationships are found in studies of city population data and website references.


Ranking city population sizes also follows Zipf-like relationships (the loglog plot is fairly linear)

Power law relationships in IoT data?

John Matherly, founder of Shodan, has been collecting data on IoT sorts of devices for years. He scans all publicly accessible IP addresses for particular ports for Internet of Things or Industrial Control systems including things like power plants, video cameras, HVAC systems, and others.

I have a particular interest in how IoT data shows up in higher education IP address spaces, so I analyzed large subsets of data in some of those institutions. To do this I queried for data from those publicly facing IP spaces in the organization and exported it to JSON format. (Shodan also offers an XML version, but it is deprecated.) From the downloaded data, I used Python scripts to clean the data a bit, count how often each port occurred, and then rank them by organization. Finally, I used the Python module matplotlib to plot the results.

This is similar to the word frequency analysis approach above where, for a set of data:

  1. Count the number of occurrences of each port (service)
  2. Rank the ports so that the port (service) that occurs most frequently gets the highest rank
  3. Plot the results
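A minimal sketch of the count-and-rank steps follows. It is not the script used for this analysis. It assumes the export holds one JSON record per line with a "port" field, and the sample records and addresses are made up.

```python
import json
from collections import Counter

# Hypothetical stand-in for an exported data file: one JSON object per line.
# These records are invented for illustration.
sample_export = """\
{"ip_str": "192.0.2.10", "port": 80}
{"ip_str": "192.0.2.11", "port": 80}
{"ip_str": "192.0.2.12", "port": 443}
{"ip_str": "192.0.2.13", "port": 80}
{"ip_str": "192.0.2.14", "port": 23}
{"ip_str": "192.0.2.15", "port": 443}
"""

def rank_ports(lines):
    """Count occurrences of each port and return (rank, port, count) tuples."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if "port" in record:
            counts[record["port"]] += 1
    return [(rank, port, count)
            for rank, (port, count) in enumerate(counts.most_common(), start=1)]

ranked = rank_ports(sample_export.splitlines())
for rank, port, count in ranked:
    print(rank, port, count)
```

The rank and count columns are the x and y values for the plots discussed next.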

Like word frequency data in Zipf studies, a plot of frequency of occurrence of each port vs rank of each port’s frequency yields a curve that drops off so fast that it is hard to discern nuanced information. However, the fact that it does drop off so fast lets us know something at a glance that is similar to Zipf data: a very few ports occur most often and a lot of ports have a few occurrences.


4 universities and 1 (organizationally) arbitrary & large set of IP addresses on normal (non-log) plot

What gets more interesting visually is to plot that same data on a log log scale. This kind of brings the curve out to where it’s easier to see.

Zipf-like data can follow the relationship of y = 1/x almost exactly for much of the range. (This is part of why word frequency, city population data, etc. is so intriguing.) So when plotted on log log, much of the line looks almost straight, with a slope of about -1 (the line falls as rank increases).
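One way to put a number on "almost straight" is to fit a least-squares line to log(count) against log(rank). The sketch below does that for idealized data where count is proportional to 1/rank. The function name and the synthetic data are assumptions for illustration, not part of the analysis described here.

```python
import math

def loglog_slope(counts):
    """Least-squares slope of log(count) vs log(rank).

    Expects positive counts sorted from most to least frequent.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

# Idealized Zipf data: count proportional to 1 / rank
ideal = [1000 / rank for rank in range(1, 51)]
print(round(loglog_slope(ideal), 3))
```

For the idealized data the fitted slope comes out at essentially -1. Real ranked data that bulges away from a straight line will not be summarized well by a single slope.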

A log log plot of university IoT data doesn’t yield a straight line, but sort of a bulging out line. If you were standing on the graph way out to the right and up and looking toward the origin, it would appear convex. So this isn’t Zipf in the traditional sense — the log log plot is not linear.

However, they do look similar. University1 looks roughly like University2, University2 like University3, University3 like University4, and so on. The curve roughly retains its shape regardless of the school, though the school sizes are different (or at least the number of public IP addresses is different).


4 universities and 1 (organizationally) arbitrary & large set of IP addresses on log log plot

Maybe the organization doesn’t matter?

Also plotted are the results from a search on all of the IP addresses in the 128.0.0.0/8 range (using CIDR notation).  This curve, though bigger and slightly smoother, has roughly the same shape as the others. The main thing that separates it from the others appears to be magnitude (number of IP addresses sampled). It appears that there is nothing particularly unique about an organization that drives this curve shape — a similar shape appears even if a set based on a numerical range, regardless of organization, is chosen.
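For readers less familiar with CIDR notation, Python's standard ipaddress module shows what a /8 covers. This is a small illustrative sketch; the in_range helper and the test addresses are made up.

```python
import ipaddress

network = ipaddress.ip_network("128.0.0.0/8")

# A /8 fixes the first 8 bits and leaves 24 bits free: 2**24 addresses
print(network.num_addresses)
print(network[0], network[-1])  # first and last address in the block

def in_range(ip_str, net=network):
    """Return True if the address falls inside the given network."""
    return ipaddress.ip_address(ip_str) in net

print(in_range("128.10.20.30"))
print(in_range("129.0.0.1"))
```

A numerical range like this cuts across many organizations, which is what makes it a useful comparison against the per-university sets.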

It will be interesting to see if, as IoT device count grows, the curve changes shape. Will the set of IoT devices across the globe continue to communicate mostly over the same ports/services as those currently in use, keeping the same shape? Or will new ports/services/enumerations show themselves as IoT device proliferation continues, changing the shape?  By analyzing ranking relationships over time and between organizations, this approach could provide some insight into helpful categorizations for risk analysis.

Institutional considerations for managing risk around IoT


sockets for vendor products & services

There are a number of things to think about when planning and deploying an IoT system in your institution. In posts here since last spring, several issues have been touched upon — the idea of sockets and seams in vendor relationships, the rapid growth in vendor relationships to be managed and the resulting costs to your organization, communicating IoT risk, some quick risk visualization techniques based on Shodan data, initial categorization of IoT systems, and others.  The FBI warning on IoT last week is a further reminder of what we’re up against.

There is a lot to chew on and digest in this rapidly changing IoT ecosystem. Below is a partial list of some things to consider when planning and deploying IoT systems and devices in your institution. It’s not a checklist where all work is done when the checking is complete. Rather, it is intended to be a starting list of potential talking points that you can have with your team and your potential IoT vendors.

Some IoT Planning Considerations

  • Does IoT vendor need 1 (or more) data feeds/data sharing from your organization?
    • Are the data feeds well-defined?
    • Do they exist already?
    • If not, who will create & support them?
    • Are there privacy considerations?
  • How many endpoint devices will be installed?
    • Is there a patch plan?
    • Do you do the patching?
    • Who manages the plan, you or the vendor?
  • Does this vendor’s system have dependencies on other systems?
  • How many IoT systems are you already managing?
    • How many endpoints do you already have?
    • Are you anticipating or planning more in the next 18 months?
  • Is there a commissioning plan? Or have IoT vendor deliverable expectations otherwise been stated (contract, memorandum of understanding, letter, other)?
    • Has the vendor changed default logins and passwords? Has the password schema been shared with you?
    • Are non-required ports closed on all your deployed IoT endpoints?
    • Has the vendor port scanned (or similar) all deployed IoT endpoints after installation?
    • Is there a plan (for you or vendor) to periodically spot check configuration of endpoint devices?
  • Has the installed system been documented?
    • Is there (at least) a simple architecture diagram?
      • Server configuration documented?
      • Endpoint IP addresses & ports indicated?
  • Who pays for the vendor’s system requirements (e.g., hardware, supporting software, networking, etc.)?
    • Does local support (staffing/FTE) exist to support the installation? Is it available? Will it remain available?
    • If supporting IoT servers are hosted in a data center, who pays those costs?
      • startup & ongoing costs?
    • Same for cloud — if hosted in cloud, who pays those costs?
      • startup & ongoing costs?
  • What is total operational cost after installation?
    • licensing costs
    • support contract costs
    • hosting requirements costs
    • business resiliency requirements costs
      • eg redundancy, recovery, etc for OS, databases, apps
  • How can the vendor demonstrate contract performance?
    • Okay to ask vendor to help you figure this out
  • Who in your organization will manage the vendor contract for vendor performance?
    • Without person/team to do this, the contract won’t get managed
  • Can vendor maintenance contract offset local IT support shortages?
    • If not, then this might not be the deal you want
  • For remote support, how does vendor safeguard login & account information?
    • Do they have a company policy or Standard Operating Procedure that they can share with you?
  • Is a risk sharing agreement in place between you and the vendor?
    • Who is liable for what?

Typically, with the resources at hand, it will be difficult to get through all of these — maybe even some of these. The important thing, though, is to get through what we can and then be aware of and acknowledge the ones we weren’t able to do. It’s way better to know we’ve come up short given limited resources than to think we’ve covered everything when we’re not even in the ballpark.
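Two of the commissioning items above, closing non-required ports and periodically spot checking endpoint configuration, lend themselves to a small script run against endpoints you are responsible for. The sketch below is one possible shape for it, not a complete scanner. The function names, the allowlist, and the observed ports are hypothetical.

```python
import socket

def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False

def unexpected_ports(observed_open, allowed):
    """Ports found open that are not on the documented allowlist."""
    return sorted(set(observed_open) - set(allowed))

# Hypothetical endpoint record: documented ports vs. ports seen open in a check
allowed_ports = {443}
observed_ports = {23, 80, 443}
print(unexpected_ports(observed_ports, allowed_ports))  # [23, 80]
```

In practice, is_port_open would be called for each port of interest on each documented endpoint, and anything returned by unexpected_ports becomes a talking point with the vendor.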

Talking about IoT

word cloud from 3 business magazine articles on IoT


I was curious if language in articles and blog posts on IoT varied significantly with the type of magazine or blog. So my unscientific quick and dirty research was to use three semi-arbitrarily* chosen articles from three different types of blog or magazine, do word frequency counts in each of these, and then from this do a word cloud where font size varies with frequency of the word count. The three magazine/blog types were: business magazine or blog, industry trade magazine or blog, and vendor magazine or blog.  (*I say ‘semi-arbitrarily’ because I chose them all myself and I’m sure my Googling/searching habits aren’t without some bias).

The first word cloud above was made by piling the words of all three articles together and then doing a word frequency count on the combined verbiage, sorting the counts, and then creating a word cloud. I used Wordle.net to do this and it makes the last three steps pretty easy to do.
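For anyone who would rather script the counting step, here is a minimal Python sketch of piling several texts together and producing a sorted word frequency list. It does not reproduce what Wordle.net does internally. The three sample sentences and the stop word set are invented for illustration.

```python
import re
from collections import Counter

def combined_word_counts(articles, stop_words=frozenset()):
    """Pool the words from all articles and return (word, count) pairs, highest first."""
    counts = Counter()
    for text in articles:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in stop_words)
    return counts.most_common()

# Made-up stand-ins for the text of three articles
articles = [
    "IoT devices collect data",
    "Data from sensors drives IoT value",
    "Security matters for IoT data",
]

for word, count in combined_word_counts(articles, stop_words={"from", "for"})[:3]:
    print(word, count)
```

The sorted counts are what a word cloud tool scales font size against. Passing a fuller stop word list would filter common words that carry little meaning.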

Similarly, I made sorted word frequency word clouds for articles from three industry/trade magazines/blogs and did the same again for vendor magazines/blogs:


word cloud from 3 industry/trade magazines/blogs


word cloud from 3 vendor blogs

Side by side, they look like:

side by side comparison of the same

While there are a number of things that could be done to make the comparison more robust (higher sample count, remove ‘stop words‘, etc), I think even this little snippet of samples shares some interesting results (or at least provides direction/motivation for digging deeper with a larger sample set). Some ‘eyeball’ observations from this sample set:

  • security & privacy more prevalent in trade/industry articles/posts
  • vendor articles/posts seem to hit harder on data
  • trades & vendors heavier on sensors
  • business sample skewed some with a chunk of text from one article dedicated to talking about IoT parking systems

Language use is important because, among other things, it directly affects how we categorize, classify, and discuss risk.

Again, no smoking gun here and there’s plenty of room to make this more robust, but the language used to talk about IoT as it becomes more prevalent might be interesting to keep an eye on.

 

**********
If you’re interested … the three articles/posts from business magazines/blogs were:
Wall Street Journal –
http://www.wsj.com/articles/the-internet-of-things-will-change-everything-1424664603
Forbes –
http://www.forbes.com/sites/jacobmorgan/2014/10/30/everything-you-need-to-know-about-the-internet-of-things/
Business Insider –
http://www.businessinsider.com/how-the-internet-of-things-market-will-grow-2014-10

The three articles/posts from industry/trade magazines and blogs were:
CIO magazine –
http://www.cio.com/article/2923475/innovation/cios-put-the-internet-of-things-in-perspective.html
Dark Reading –
http://www.darkreading.com/partner-perspectives/intel/securing-the-internet-of-things/a/d-id/1318072
EE Times –
http://www.eetimes.com/document.asp?doc_id=1325079

And the three articles/posts from vendor articles/posts were:
Cisco –
http://www.cisco.com/web/solutions/trends/iot/introduction_to_IoT_november.pdf
Microsoft –
http://www.microsoft.com/en-us/server-cloud/internet-of-things.aspx
Atmel –
http://blog.atmel.com/2015/06/26/6-reasons-why-the-iots-true-value-remains-untapped/