Capturing Value from IoT Data with Federated Analytics


In a recent blog posted to CIO.com, Dell’s Global VP, Global CTO for Sales and Distinguished Engineer, Patricia Florissi, Ph.D., outlined how the unique characteristics of the Internet of Things (IoT) create the need for federated analytics, an approach that allows data to be analyzed wherever it resides: at the edge, at the distributed core, and in the cloud. I invite you to read more from Florissi in the blog below, “Capturing Value from IoT Data with Federated Analytics.”

Capturing Value from IoT Data with Federated Analytics

As the Internet of Things comes of age, industry observers warn that we need to get ready for exponential growth in the number of connected devices. Are you ready for a really big number? Predictions say we will have a trillion connected IoT devices in service by the end of the century. Sooner than that, an estimated 20 billion to 30 billion connected devices are expected by 2021.[1]

With all those connected devices at work, we won’t have any shortage of data. The challenge will be to gain value from massive amounts of data generated by sensors and captured by gateways in local data zones scattered around the world. This challenge becomes even harder when you consider three unique characteristics of IoT data: streaming, scale, and system of systems.

Let’s walk through each of these three “Ss” of IoT data, and think about the associated challenges for data analytics.

Streaming

With the IoT, there is a streaming nature to a great deal of the data generated by devices. These devices might be sensors on patient monitors, on airplane engines, or on autonomous vehicles. And they might be on land, on water, or in the air, as with drones. Devices like these generate continuous streams of data that can require continuous real-time analysis over individual subsets of the stream, often referred to as batches.

A Light Detection and Ranging (LiDAR) system, for example, calculates the distance to an object by illuminating the object with pulsed laser light and using a sensor to measure the reflected pulses. A typical LiDAR system takes 1.3 million readings per second, and these readings need to be analyzed on a continuous basis. This is in sharp contrast to discrete data sets that are first collected and, only after the collection completes, analyzed at different points in time. Furthermore, real-time analysis may demand that the data be analyzed locally, because response and reaction time is lost whenever the data has to be transmitted elsewhere first.
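To make this batch-at-a-time pattern concrete, here is a minimal Python sketch. It is illustrative only: the reading iterator, the batch size, and the statistics computed are assumptions, not tied to any particular LiDAR API. The sketch groups an unbounded stream of distance readings into fixed-size batches and analyzes each batch locally, in real time, as soon as it completes.

```python
from statistics import mean

BATCH_SIZE = 1000  # readings per batch; tune to the sensor's data rate


def analyze_batch(batch):
    """Local, real-time analysis over one batch of distance readings."""
    return {
        "count": len(batch),
        "mean_distance_m": mean(batch),
        "min_distance_m": min(batch),  # e.g., a nearest-obstacle check
    }


def process_stream(readings):
    """Consume an unbounded iterator of readings, one batch at a time."""
    batch = []
    for reading in readings:
        batch.append(reading)
        if len(batch) == BATCH_SIZE:
            yield analyze_batch(batch)  # results emitted continuously
            batch = []  # start the next batch; raw readings stay local
```

Because each batch is analyzed where it is produced, only the small per-batch results ever need to travel, which is exactly the property that matters when a sensor emits over a million readings per second.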

Scale

With the IoT, you gain limited insight by analyzing data generated by sensors of a single type, or by sensors located within a small area. Instead, data from a variety of sensors provides different perspectives on a physical environment, such as temperature, air quality, vibration levels of devices, and human occupancy. The real value lies in analyzing the data generated by tens of thousands or even millions of sensors. At this scale, you can improve the accuracy and reliability of the digital representation of the environment. The more data you get, the more informed you are about whatever you are looking at, whether that is the performance of a product, energy usage, transaction analysis, or the flow of traffic on a freeway.

At the same time, the more value you get from sensor data, the more sensors you deploy; and the more sensors you deploy, the more data you get in return, and the more accurate and reliable the analysis becomes. This is a positive feedback loop in which scale continuously drives the need for more scale. Yet for all its benefits, this scale comes with its own challenges. You end up with data spread across a multitude of local gateways, data nodes, and multiple clouds. So how do you analyze that data when it’s all over the place?

System of systems

Functionally related IoT devices tend to form a system in and of themselves. For example, all the sensors from the Global Positioning System (GPS), the Inertial Measurement Unit (IMU) devices, the LiDAR technology, and the cameras used to calculate the exact geo-position of a vehicle together form its localization system. Sensors within the same system typically communicate with each other in order to align toward a common objective, such as triangulating data from several types of sensors to cross-correct and converge on the most accurate location of the vehicle.

The ultimate value of an IoT device, however, is obtained only when different systems work together toward higher-order goals. For example, greater value emerges when the localization system connects with a Heating, Ventilation and Air Conditioning (HVAC) system to adjust the home temperature when the vehicle is a couple of miles from home. Further value emerges when the localization system connects with the home security system to automatically open the garage door as the vehicle approaches the driveway. The systems forming the larger system of systems are heterogeneous in nature, and one system may require that additional analytics be performed by other systems in order to improve its own decision making.

Let’s consider another example. A smart electric grid system may require the HVAC systems in all homes in a given area to perform a one-week demand analysis to better estimate consumption in the area, while each HVAC system may in turn require the household’s calendar system to predict when each family member will be at home during the upcoming week, in order to better estimate the energy required for that household. This creates the need to run analytics across multiple data zones, where each data zone represents one system, while all come together in the connected system of systems, as in the sketch below.
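As a rough illustration of that cascade of requests, here is a hypothetical Python sketch. Every name in it (the occupancy estimate, the kWh-per-occupied-hour factor, the data layout) is an invented assumption for illustration, not an actual smart-grid or HVAC API; the point is that each system answers with a small analytic result while its raw data stays in its own zone.

```python
def occupancy_hours_next_week(calendar_events):
    """Calendar system: predicted hours someone is home next week (stub)."""
    return sum(event["hours_at_home"] for event in calendar_events)


def estimate_weekly_demand_kwh(calendar_events, kwh_per_occupied_hour=1.2):
    """HVAC system: one-week demand estimate, built on calendar analytics."""
    return occupancy_hours_next_week(calendar_events) * kwh_per_occupied_hour


def grid_area_forecast(homes):
    """Grid system: collect per-home estimates. Raw calendar and HVAC data
    never leave each home's data zone; only the small estimate does."""
    return sum(estimate_weekly_demand_kwh(home["calendar"]) for home in homes)


homes = [
    {"calendar": [{"hours_at_home": 60}]},
    {"calendar": [{"hours_at_home": 45}, {"hours_at_home": 20}]},
]
print(grid_area_forecast(homes))  # area-level weekly consumption estimate
```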

Overcoming analytics challenges

On one hand, the combined characteristics of streaming, scale, and system of systems impose severe challenges on any centralized IoT analytics architecture, in which all the data moves to a single location and is then analyzed. On the other hand, in many cases IoT data can’t be moved to a centralized location for analysis anyway, for reasons ranging from regulatory requirements and security concerns to bandwidth constraints and the need for real-time analysis. In other cases, the data could be moved, but it is hard to get to because it is generated in places with poor connectivity.

So, what do we do about this? In this new world, which is now unfolding all around us, we will increasingly take analytics and compute to the data. Under this new approach, data is analyzed locally, and only the results are shared outside of the local data zone. Results from different distributed data zones are collected and fused together for further analysis. This approach is called federated analytics, which I have written about in multiple recent blog posts.
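A minimal Python sketch of that federated pattern follows. The zone names, the readings, and the chosen statistic (a global mean computed from per-zone partial sums) are all illustrative assumptions; the point is that raw data never crosses a zone boundary, only the small partial results do.

```python
def local_analysis(zone_readings):
    """Runs inside a data zone; the raw readings never leave it."""
    return {"count": len(zone_readings), "total": sum(zone_readings)}


def fuse(partial_results):
    """Runs at the fusion point; it sees only per-zone summaries."""
    count = sum(p["count"] for p in partial_results)
    total = sum(p["total"] for p in partial_results)
    return total / count  # global mean, computed without moving raw data


# Illustrative zones spanning the edge-core-cloud continuum.
zones = {
    "edge-gateway-1": [21.0, 22.5, 19.8],
    "core-site-2": [20.1, 23.4],
    "cloud-region-3": [22.2, 21.7, 20.9, 23.0],
}
partials = [local_analysis(readings) for readings in zones.values()]
print(fuse(partials))  # only the partials crossed zone boundaries
```

The same shape generalizes to richer analytics: any computation whose partial results can be combined (counts, histograms, model updates) fits this analyze-locally, fuse-globally pattern.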

The Dell Technologies approach to federated analytics is called World Wide Herd (WWH). WWH creates a global network of data zones that functions as a single virtual computing cluster. WWH orchestrates the execution of distributed and parallel computations on a global scale, across data zones, pushing analytics to where the data resides.

In the context of IoT, a data zone can be as contained as an IoT gateway; an appliance aggregating and analyzing streams of data collected by several gateways; a data center, such as one in a manufacturing facility; or a cloud, private or public. Regardless of the deployment model, the data itself stays in place. This approach enables the analysis of geographically dispersed data without requiring the data to be moved to a single location before analysis. Only the privacy-preserving results of the analysis are shared and made available for further analysis.

WWH answers the questions raised by the three “Ss” of IoT data. Continuous streams of data can be analyzed in place, and in real time, as batches of a stream become available, while the continuous stream of results is shared with other data zones. Distributed data can be analyzed at scale, as analytics is pushed to where the data is, minimizing data movement and conserving bandwidth while leveraging computing capacity scattered across the edge-core-cloud continuum. And the datasets residing within each system of the system of systems can be analyzed right within their own data zones.

All of this opens the door for analysis of ever-growing volumes of IoT data in our connected world — as we head toward the day of a trillion connected devices.


[1] IEEE Spectrum, “Popular Internet of Things Forecast of 50 Billion Devices by 2020 Is Outdated,” August 18, 2016.

About the Author: Jean Marie Martini
