UPDATED 10:49 EDT / OCTOBER 09 2020

Data firehose: Next generation of streaming technologies goes cloud-native

Real-time streaming data has become a reality for an increasing number of enterprise applications. From the “internet of things” to artificial intelligence, from mobile analytics to a diverse range of big-data challenges, dealing with firehoses of information in real time is rapidly becoming standard fare for enterprise information technology.

Over the years, many technologies have come to the fore to deal with streaming data in various ways. Asynchronous interactions, from traditional message queuing to publish/subscribe to the more modern AsyncAPI standard, have added to the technologists’ toolbelt.

Architectural styles have also kept pace, with the message-oriented, event-driven architecture dating from the 1990s giving way to a revamped, streaming-oriented EDA. Today, the streaming data story is all about massive scale: scaling across dynamic business and technology environments alike, without sacrificing performance, resilience or accuracy.

In other words, streaming data is becoming part of the cloud-native world, since Kubernetes is the only infrastructure technology on the horizon promising the real-time scale that today’s streaming data applications require.

The many paths to cloud-native real-time data streaming

I spoke to 10 vendors with various offerings in the cloud-native streaming marketplace to see how they were approaching this challenging area. I found that many of them had been in the streaming business pre-Kubernetes, and were now leveraging existing technologies and architectural patterns while incorporating them within new cloud-native approaches. Here are some examples.

WSO2 streaming integrator: evolving from complex event processing

WSO2 Inc. has been an open source-based enterprise integration technology vendor for many years. It offered a complex event processing platform in the 2000s that evolved into the Streaming Integrator product.

At the core of Streaming Integrator is WSO2’s open source Siddhi technology, which has evolved from a message-based event-driven engine to the more modern stream-based technology it is today.
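To give a flavor of that stream-based model, here is a minimal sketch of a Siddhi application embedded in Java. The stream definition, query, threshold and stream names are illustrative assumptions, and the package names assume Siddhi 5.x.

```java
import io.siddhi.core.SiddhiAppRuntime;
import io.siddhi.core.SiddhiManager;
import io.siddhi.core.event.Event;
import io.siddhi.core.stream.input.InputHandler;
import io.siddhi.core.stream.output.StreamCallback;

public class SiddhiSketch {
    public static void main(String[] args) throws InterruptedException {
        // A Siddhi app is plain text: define a stream, then a continuous query over it.
        String app =
            "define stream TempStream (deviceId string, temp double); " +
            "@info(name = 'high-temp') " +
            "from TempStream[temp > 75.0] " +
            "select deviceId, temp " +
            "insert into AlertStream;";

        SiddhiManager manager = new SiddhiManager();
        SiddhiAppRuntime runtime = manager.createSiddhiAppRuntime(app);

        // Receive every event the query emits on AlertStream.
        runtime.addCallback("AlertStream", new StreamCallback() {
            @Override
            public void receive(Event[] events) {
                for (Event e : events) {
                    System.out.println("ALERT: " + e);
                }
            }
        });

        InputHandler input = runtime.getInputHandler("TempStream");
        runtime.start();

        input.send(new Object[]{"sensor-1", 80.2});  // triggers an alert
        input.send(new Object[]{"sensor-2", 70.1});  // filtered out

        Thread.sleep(500);
        runtime.shutdown();
        manager.shutdown();
    }
}
```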

WSO2 is now working on merging its flagship API management product with Streaming Integrator, with the goal of making all integration scenarios enabled for application programming interfaces.

Vantiq: event broker for the IoT

The distributed event-driven broker from VANTIQ Inc. is a scalable real-time platform that supports a range of edge computing applications, with a particular sweet spot for the IoT.

The company’s partners and customers have built many different types of data-intensive, real-time applications on the VANTIQ platform, including smart manufacturing, fleet management and environmental monitoring.

The VANTIQ platform is particularly well-suited to AI-centric applications that must process large quantities of data at the edge in real time. The company is also well-positioned to support future AI-based applications as they reach the market.

Lightbend: Bringing reactive programming to the cloud

The Lightbend Inc. story begins with reactive programming, a technical approach for building real-time applications, and the Scala programming language, which underpins its reactive platform.

Today, Lightbend’s reactive programming story is an essential component of the broader cloud-native computing movement, which delivers massive horizontal scale, ideally in real time.

Not only must developers who leverage Lightbend’s technology build microservices following a reactive approach, but the Kubernetes-based cloud-native infrastructure must also handle interactions in a stateless manner.

Managing state information within an inherently stateless infrastructure is a challenge that Lightbend addresses with its open-source Cloudstate offering, combined with its Cloudflow streaming data solution and its more established Akka, Play and Lagom reactive application platform components.
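For a taste of the reactive style those components share, here is a minimal Akka Streams sketch using the Java DSL, assuming Akka 2.6 (where the actor system can serve as the materializer). The stream is backpressured end to end: downstream demand controls how fast upstream elements are emitted.

```java
import akka.Done;
import akka.actor.ActorSystem;
import akka.stream.javadsl.Sink;
import akka.stream.javadsl.Source;

import java.util.concurrent.CompletionStage;

public class ReactiveSketch {
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("reactive-sketch");

        // A bounded source, a transformation and a sink: the whole pipeline
        // honors backpressure, so a slow consumer throttles the producer.
        CompletionStage<Done> done =
                Source.range(1, 1000)
                      .map(n -> n * n)
                      .runWith(Sink.foreach(n -> System.out.println("processed " + n)), system);

        done.thenRun(system::terminate);
    }
}
```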

Solace: cloud-native publish/subscribe

Asynchronous interactions, and more broadly event-driven architectural patterns like publish/subscribe, have been with us for decades. But the request/reply pattern is so ingrained in our thinking that organizations struggle with the more general world of streaming events.

Solace Corp. is driving innovation in publish/subscribe functionality with its event brokers. Solace delivers these brokers in hardware appliance, software, and cloud-based ‘as-a-service’ form factors for broad deployment across the hybrid IT landscape.

Once in place, Solace’s event brokers create an “event mesh,” an event-driven abstraction of the full range of dynamic endpoints, similar to the way a service mesh like Istio (along with the Envoy proxy) does for RESTful microservices endpoints in Kubernetes clusters.

In contrast to the more familiar service meshes, event meshes treat request/reply as a special case of broader event-driven interaction mechanisms like publish/subscribe. The primary difference: publish/subscribe supports real-time behavior at scale better than request/reply.
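To illustrate the pattern itself rather than Solace’s actual API, here is a hypothetical in-memory broker in Java. The publisher hands an event to a topic and moves on; any number of subscribers receive it, with none of the tight coupling of request/reply.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical toy broker: demonstrates publish/subscribe only, not Solace's event mesh.
public class TinyBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    public void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    public void publish(String topic, String event) {
        // The publisher neither knows nor waits for its consumers, unlike request/reply.
        subscribers.getOrDefault(topic, List.of()).forEach(handler -> handler.accept(event));
    }

    public static void main(String[] args) {
        TinyBroker broker = new TinyBroker();
        broker.subscribe("orders/created", e -> System.out.println("billing saw: " + e));
        broker.subscribe("orders/created", e -> System.out.println("shipping saw: " + e));
        broker.publish("orders/created", "{\"orderId\": 42}");
    }
}
```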

Building on Apache Kafka

Much of the innovation in the real-time streaming segment takes place in various open-source projects, the best-known being Apache Kafka.

Kafka is a distributed event streaming platform that has reached a reasonable level of maturity and commensurate enterprise adoption. Nevertheless, Kafka is difficult to set up and manage, and it suffers from certain architectural issues that limit its applicability in the cloud-native data streaming space. In particular, Kafka is rather heavy-handed in how it manages state information.

You might think that real-time streaming technology should be stateless, essentially handling its processing and then moving on via a fire-and-forget approach. In reality, however, Kafka nodes must store some information about the streams they are processing in data partitions in order to keep all the nodes properly coordinated and to provide visibility into the behavior of each node.
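The partition mechanics look roughly like this with the standard Kafka Java producer (the broker address and topic name here are assumptions): records that share a key always land in the same partition, which is how Kafka keeps related state co-located and its nodes coordinated.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class PartitionedProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                // The default partitioner hashes the key, so "device-0" readings
                // always end up in the same partition, preserving per-key order.
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("sensor-readings", "device-" + (i % 3), "reading-" + i);
                RecordMetadata meta = producer.send(record).get();
                System.out.printf("key=%s -> partition=%d offset=%d%n",
                        record.key(), meta.partition(), meta.offset());
            }
        }
    }
}
```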

How Kafka handles such partitions leads to scale and performance issues. Fortunately, a number of vendors have tackled these shortcomings. Here’s the scoop:

Confluent: commercializing enterprise Kafka

Perhaps more than any other vendor, Confluent Inc. is responsible for commercializing the open-source Apache Kafka distributed streaming platform.

At its core, Confluent treats event streams, rather than individual messages, as the fundamental unit, even though messages are the basic building block of most streaming middleware.

Managing Kafka partitions is difficult in practice, so Confluent brings additional technical capabilities and expertise to the table, including the ksqlDB SQL-compatible event streaming database purpose-built for stream processing applications.
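ksqlDB is built on Kafka Streams, the Java API in which a stream is a first-class, declarable unit. Here is a rough sketch of that streams-first model; the topic names, threshold and application ID are assumptions.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class HighTempFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "high-temp-filter");   // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Declare the stream itself as the unit of work, then transform it continuously.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> readings = builder.stream("sensor-readings");
        readings.filter((deviceId, temp) -> Double.parseDouble(temp) > 75.0)
                .to("high-temp-alerts");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

ksqlDB expresses the same kind of continuous query declaratively in SQL rather than in Java.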

Chargify Keen: event streaming with built-in analytics

Keen.io LLC combines event streaming, persistent storage, real-time analytics, and embedded visualizations into a single managed event streaming platform.

Under the covers, Keen leverages Kafka, the Apache Cassandra NoSQL database and the Apache Spark analytics engine, adding a RESTful API and a number of SDKs for different languages. Keen enriches streaming data with relevant metadata and enables customers to stream enriched data to Amazon S3 or any other data store.

In March 2020, subscription billing platform vendor Chargify LLC acquired Keen, leveraging Keen’s streaming data platform to scale its billing platform.

Hazelcast Jet: accelerating Kafka with in-memory computing

The flagship product from Hazelcast Inc. is its in-memory computing platform, which enables exceptionally fast processing by keeping data in memory and distributing it across clusters of machines or virtual machines.

For the Hazelcast Jet stream processing framework, the company positions its in-memory computing platform as a data grid that supports the parallel processing of real-time streaming data. Jet addresses Kafka’s challenges with data partitioning by breaking up streaming data tasks into subtasks and then intelligently distributing them to in-memory processing nodes for optimal performance and low latency.

Jet can also treat Kafka as a data source. When coupled with an idempotent data sink, the Jet stream processing engine can emulate two-phase commit behavior, even when the data sources aren’t ACID-compliant.
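Here is a rough sketch of a Jet pipeline that uses Kafka as its source, assuming Hazelcast Jet 4.x APIs, a local Kafka broker and an illustrative topic name. Jet splits the pipeline into tasks that it distributes across its in-memory cluster.

```java
import java.util.Properties;

import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.kafka.KafkaSources;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;

public class JetKafkaPipeline {
    public static void main(String[] args) {
        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker
        kafkaProps.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaProps.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Build the dataflow: read Kafka records, transform them, log the results.
        Pipeline p = Pipeline.create();
        p.readFrom(KafkaSources.<String, String>kafka(kafkaProps, "sensor-readings"))
         .withoutTimestamps()
         .map(entry -> entry.getKey() + "=" + entry.getValue())
         .writeTo(Sinks.logger());

        // Start an embedded Jet member and run the job on it.
        JetInstance jet = Jet.newJetInstance();
        jet.newJob(p).join();
    }
}
```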

Red Hat: Kafka on OpenShift

The core of the AMQ Streams product from Red Hat Inc. is Kafka on the OpenShift Kubernetes container platform. To support the deployment of Kafka in a cloud-native Kubernetes environment, Red Hat has been driving the open-source Strimzi project that seeks to optimize Kafka on containers.

Red Hat also sponsors the open-source Debezium distributed platform for change data capture. Debezium enables organizations to track changes to data sources, creating an event stream that enables streaming data infrastructure to roll back and replay the various interactions with tracked data sources.

StreamNative: decoupling logic and storage for maximum performance

StreamNative Inc. is the sponsor of the open-source Apache Pulsar, a cloud-native distributed messaging and streaming platform that the StreamNative team originally developed at Yahoo! Inc.

Unlike Kafka, which uses each node for both streaming logic and storage of the data partitions, Pulsar separates these two functions, allowing each to scale independently of the other. As a result, Pulsar works well in cloud environments with abstracted data storage such as Amazon S3.

This decoupled scalability gives the illusion of “infinite” stream storage, with access to unlimited data storage via a single API. The result combines the resilience and exactly-once delivery guarantees of message queuing technologies with the scalable streaming of Kafka.
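For a sense of that single API, here is a minimal sketch with the Pulsar Java client, assuming a broker at a local address and illustrative topic and subscription names.

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class PulsarQuickstart {
    public static void main(String[] args) throws Exception {
        // Assumes a Pulsar broker reachable at this address.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("sensor-readings")
                .create();
        producer.send("temp=21.5".getBytes());

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("sensor-readings")
                .subscriptionName("analytics-sub")
                .subscribe();
        Message<byte[]> msg = consumer.receive();
        System.out.println("Received: " + new String(msg.getData()));
        consumer.acknowledge(msg);

        producer.close();
        consumer.close();
        client.close();
    }
}
```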

For its commercial offering, StreamNative adds hosted clusters in public clouds as well as cloud-managed clusters in customers’ own cloud environments for a Pulsar-as-a-service offering well-suited to enterprises.

Building on cloud native

The final vendor on my list may in part be reacting to the challenges of Kafka, but instead of building on the older project, it built its cloud-native technology from the ground up.

Synadia: cloud-native streaming support for global communications system

Synadia Communications Inc. is the sponsor of the open-source NATS.io project, a publish/subscribe technology offering the scalability, speed and observability that cloud-native technologies require. It also handles security in a fully multitenant, trustless manner, not only aligning with cloud-native principles but moving them forward.
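Core NATS publish/subscribe is deliberately simple. Here is a minimal sketch with the official Java client, assuming a local NATS server and an illustrative subject name; connecting to NGS instead would typically use Synadia’s endpoint and account credentials.

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;

import io.nats.client.Connection;
import io.nats.client.Dispatcher;
import io.nats.client.Nats;

public class NatsPubSub {
    public static void main(String[] args) throws Exception {
        // Assumes a NATS server running locally on the default port.
        Connection nc = Nats.connect("nats://localhost:4222");

        // Asynchronous subscription on a subject.
        Dispatcher dispatcher = nc.createDispatcher(msg ->
                System.out.println("received: " + new String(msg.getData(), StandardCharsets.UTF_8)));
        dispatcher.subscribe("sensors.temperature");

        // Fire-and-forget publish: core NATS does not persist the message.
        nc.publish("sensors.temperature", "21.5".getBytes(StandardCharsets.UTF_8));

        nc.flush(Duration.ofSeconds(1)); // make sure the publish reaches the server
        Thread.sleep(500);               // give the dispatcher a moment to deliver
        nc.close();
    }
}
```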

Leveraging NATS.io and other open-source technologies, Synadia built NGS, a global communications system: essentially a self-organizing, cloud-native network abstraction that provides optimized, secure message interactions among endpoints anywhere on the internet, in any cloud, at the edge or anywhere else.

The current version of NATS is a stateless, fire-and-forget technology. Synadia is addressing this limitation with its upcoming JetStream technology, which combines NATS with cloud-native persistence to compete more effectively with Kafka-based streaming products.

What’s next for real-time streaming data

Of all the complex, intertwined emerging technology trends buffeting today’s enterprises, two stand out as driving innovation in real-time streaming: AI and edge computing.

AI leverages vast quantities of data, the larger and more real-time the better. When those data come from the edge, we have a double whammy driving the scale, resilience and accuracy requirements for real-time data.

Other trends are also driving innovation in streaming data: The 5G buildout and its impact on mobile technology and the “internet of things.” Enterprise demand for real-time observability in the context of cloud-native infrastructure. And of course, all the real-time applications that both consumers and businesses alike demand from the companies they interact with.

In the past, technology has always placed limitations on the volume, immediacy and power of real-time streaming data. With the rise of cloud-native computing, such limitations may dissolve, or at the least, recede into the future. Either way, we’re only going to crank up the firehose of real-time streaming data.

Jason Bloomberg is founder and president of Intellyx, which publishes the Cloud-Native Computing Poster and advises business leaders and technology vendors on their digital transformation strategies. He wrote this article for SiliconANGLE. (* Disclosure: At the time of writing, IBM (parent of Red Hat) is an Intellyx customer, and Solace and VANTIQ are former Intellyx customers. None of the other vendors mentioned in this article is an Intellyx customer.)
