Thinking Beyond the Database: Continuous Intelligence
The speed and volume of modern data demand continuous intelligence to create business impact.
- By Simon Crosby
- August 18, 2020
The use of databases is deeply embedded in our enterprise application architectures. There are scores of database types and open source projects, as well as massively scalable cloud data lakes and databases. However, for a growing class of business-critical applications, what’s in the database is likely to be out of sync with the real world, so insights arrive too late to be useful. This is where continuous intelligence -- the ability to generate contextual, continuous insights from vast amounts of static and dynamic data -- comes to the rescue.
New IoT devices are being connected to the Internet at an astounding rate. International Data Corporation (IDC) estimates that over 41 billion connected IoT devices will create 79.4 zettabytes (ZB) of data in 2025. All these devices have a lot to say. In addition, over 1 billion mobile devices in the hands of consumers continue to generate vast amounts of data. Although you may think of streaming data as media sent from cloud services (such as Netflix and Spotify) to consumers, a huge challenge is emerging in the other direction: massive numbers of smart devices, assets, and IT infrastructure stream data to application owners in the cloud (or a distributed edge cloud) for real-time analysis and response.
Dealing with the Data Deluge: Beyond Bandwidth
Continuous intelligence and 5G networking are fast friends. Few 4G mobile networks are engineered to handle a deluge of data heading into the network, but 5G planning includes this scenario: a key use case for 5G is the fine-grained isolation of customer networks (called slices), which offers a powerful, secure alternative to Wi-Fi and VPNs.
Back to data and databases: obviously a lot of data needs to be stored -- for compliance, for historical records, and to record the state of devices and the environments they are monitoring and controlling. At the same time, there is a sea change in application thinking about how to process streaming data to gain continuous intelligence. Ten years ago, enterprises thought they could store everything and analyze the data later -- the big-data fallacy. It was a reasonable assumption at the time, given the then-current costs of storage, networking, and computing power and the data rates of sources in the environment. Those assumptions proved to be seriously wrong:
- The number of connected sources is growing exponentially
- Bandwidth requirements for sources in the environment are growing fast
- Many applications require insights immediately, so the inherent delay of a store-then-analyze approach won’t suffice
Many legacy devices -- traffic infrastructure, for example -- generate vast amounts of data. The traffic infrastructure for the city of Palo Alto, CA, alone generates more data than Twitter’s Firehose, and the infrastructure for the city of Las Vegas streams more than ten times that. A smart city application that lets delivery vehicles use granular predictions to find the best routes must therefore contend with astounding volumes of data, and the need for real-time processing is obvious: delivering out-of-date predictions is useless. 5G to the rescue!
Simply providing more bandwidth at the edge, although important, doesn’t solve the problem. What’s needed is real-time, data-driven computation in which analysis and responses are triggered by the arrival of real-world updates and run at CPU and memory speeds, without storing data first. We call this an “analyze, act, then store” architecture.
Predictions need to be continuously computed, and at all times the application must have the latest answer -- not something computed as part of a batch run. A ride-share vehicle, for example, needs continuous traffic predictions to choose the best route. From this observation we can conclude that there is a new category of applications -- continuous intelligence applications -- that must always have a timely answer (e.g., within 100 ms), for which store-then-analyze is therefore not an appropriate architecture.
Indeed, applications that depend on generating responses continuously in real time need to analyze, act, then store -- a completely new paradigm for data processing that is now on the rise. The key requirements are bandwidth to get data to compute and the ability to drive computations from data.
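To make the contrast concrete, here is a minimal Java sketch of an event handler organized as analyze, act, then store. It is illustrative only -- the class and method names are hypothetical rather than taken from any particular framework -- but it shows the essential inversion: in-memory analysis and action happen on the hot path, while persistence is deferred.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical handler illustrating "analyze, act, then store". */
public final class ReadingHandler {
  private double average; // in-memory state, updated on every event
  private long count;

  /** Invoked as each reading arrives; no database access on the hot path. */
  public void onReading(final double value) {
    // 1. Analyze: update in-memory state at memory speed.
    count += 1;
    average += (value - average) / count;

    // 2. Act: respond immediately if the insight demands it.
    if (value > 2 * average) {
      alert(value);
    }

    // 3. Store: persist the raw datum asynchronously, off the hot path.
    CompletableFuture.runAsync(() -> archive(value));
  }

  private void alert(double value) { /* e.g., push to subscribers or a GUI */ }

  private void archive(double value) { /* e.g., write to a database or data lake */ }
}
```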
Continuous Intelligence
Continuous intelligence demands stateful, in-memory processing to optimize performance and enable real-time responses. It embraces event streaming and other infrastructure patterns that have emerged recently, but it focuses on the application layer functions needed to develop and operate stateful applications that consume streaming events.
Although modern databases can store streaming data for later analysis, update relational tables, or modify graphs, continuous intelligence drives analysis from the arrival of data -- using an analyze, act, then store architecture that builds and executes a live computational model from streaming data.
This is a big change from the architectural assumptions of the database era and from the cloud-centric “stateless REST API plus stateful database” pattern. Moving state into memory enables faster analysis.
One emerging technology that enables developers to create distributed applications from real-time microprocesses is the Swim web agent, the core building block of SwimOS, an Apache 2.0 licensed platform for continuous intelligence. An application is an automatically created, distributed graph of web agents -- in effect, smart digital twins of data sources -- each of which concurrently cleans and processes streaming data from a single source and analyzes the resulting state changes in the context of static data and other dynamically evolving states.
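For a flavor of the programming model, here is a minimal web agent sketch based on the open source SwimOS Java API; the VehicleAgent class and its speed lane are illustrative names, not part of the platform itself.

```java
import swim.api.SwimLane;
import swim.api.agent.AbstractAgent;
import swim.api.lane.ValueLane;

/** Illustrative web agent: a stateful, in-memory digital twin of one vehicle. */
public class VehicleAgent extends AbstractAgent {
  // A lane holds part of the agent's state and streams every change to linked peers.
  @SwimLane("speed")
  ValueLane<Double> speed = this.<Double>valueLane()
      .didSet((newSpeed, oldSpeed) -> {
        // Runs on every update from the real-world source: analyze and act
        // here, at memory speed, before anything is stored.
        if (newSpeed != null && oldSpeed != null && newSpeed < oldSpeed / 2) {
          // e.g., flag sudden braking to linked agents or a GUI
        }
      });
}
```

Each vehicle on the road gets its own instance of such an agent, created automatically as its data arrives.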
Web agents dynamically link to related web agents. Links build an in-memory graph in which each vertex is a web agent that concurrently and in real time analyzes both its own state and the states of web agents to which it is linked.
Links are made and broken dynamically based on changing real-world relationships between data sources, so the graph is continually in flux. In the physical world, many relationships can be inferred from a geospatial context, but links can also express simpler notions such as containment or even correlation. Web agents can analyze, learn, and predict from their own states and those of linked web agents. Their insights, computed concurrently in the graph, stream in real time to applications and GUIs.
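In code, a link is typically an open downlink from one agent to another agent’s lane. The sketch below, again with illustrative URIs and names, shows an intersection agent subscribing to the state of a nearby vehicle agent using the SwimOS downlink API.

```java
import swim.api.agent.AbstractAgent;
import swim.api.downlink.ValueDownlink;
import swim.structure.Value;

/** Illustrative agent that links to a neighboring agent's state. */
public class IntersectionAgent extends AbstractAgent {
  ValueDownlink<Value> vehicleSpeed;

  @Override
  public void didStart() {
    // Open a link to the "speed" lane of a (hypothetical) vehicle agent.
    // The link streams every state change into this agent; closing it
    // breaks the edge in the graph when the real-world relationship ends.
    vehicleSpeed = downlinkValue()
        .nodeUri("/vehicle/1234")
        .laneUri("speed")
        .didSet((newValue, oldValue) -> {
          // Re-analyze this intersection's state in the context of the update.
        })
        .open();
  }
}
```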
Using a dynamic graph of in-memory web agents that compute in real time on data from their real-world twins yields orders-of-magnitude better performance than database-coupled applications. Web agent-based, object-oriented applications process data on the fly, delivering results before storing raw data. They take the position that the real world (and its dynamic changes) offers the truest view of its own state. By computing at memory speed close to the sources of raw data, they can deliver a new class of applications that are no more than the blink of an eye out of sync with the real world, whether analyzing, learning, or predicting what’s next.
Where to Begin
To get started with continuous intelligence, developers and architects can prototype applications using open source tools: Apache Kafka or Apache Pulsar for event streaming, and the Apache 2.0 licensed SwimOS as the web agent runtime. SwimOS integrates easily with stream-processing tools you may have used on other projects, such as Apache Beam, Apache Flink, or Apache Spark.
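As a concrete starting point, a SwimOS application is bootstrapped by a plane that maps URI patterns to agent types. The sketch below follows the pattern in the SwimOS getting-started materials, reusing the hypothetical VehicleAgent from the earlier sketch; configuration details (such as the Recon file that binds the plane to a network port) are omitted.

```java
import swim.api.SwimRoute;
import swim.api.agent.AgentRoute;
import swim.api.plane.AbstractPlane;
import swim.kernel.Kernel;
import swim.server.ServerLoader;

/** Illustrative plane: routes each /vehicle/:id URI to its own VehicleAgent. */
public class TrafficPlane extends AbstractPlane {
  @SwimRoute("/vehicle/:id")
  AgentRoute<VehicleAgent> vehicleAgent;

  public static void main(String[] args) {
    final Kernel kernel = ServerLoader.loadServer();
    kernel.start(); // begin accepting streaming connections
    kernel.run();   // block until shutdown
  }
}
```

From there, a Kafka or Pulsar consumer can relay each event to the matching agent by node URI, so that every data source drives its own digital twin.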