Why Everyone's Talking About Event Streaming

By Chris Latimer, vice president, product management, DataStax

There's a lot of talk about the importance of streaming data and event-driven architectures right now. You might have heard of it, but do you really know why it's so important to a lot of enterprises? Streaming technologies unlock the ability to capture insights and take instant action on data that's flowing into your organization; they're a critical building block for developing applications that can respond in real time to user actions, security threats, or other events. In other words, they're a key part of building great customer experiences and driving revenue.

Here's a quick breakdown of what streaming technologies do, and why they're so important to enterprises.

Organizations have gotten pretty good at creating a relatively complete view of so-called data at rest: the kind of information that's often captured in databases, data warehouses, and even data lakes to be used immediately (in real time) or to fuel applications and analysis later.

Increasingly, data that's driven by activities, actions, and events happening in real time across an organization pours in from mobile devices, retail systems, sensor networks, and telecommunications call-routing systems.

While this data in motion might ultimately get captured in a database or other store, it's extremely valuable while it's still on the move. A bank, for example, might use data in motion to detect fraud in real time and act on it instantly. Retailers can make product recommendations based on a consumer's search or purchase history the instant someone visits a web page or clicks on a particular item.

Consider Overstock, a U.S. online retailer. It must consistently deliver engaging customer experiences and derive revenue from in-the-moment monetization opportunities. In other words, Overstock sought the ability to make lightning-fast decisions based on data arriving in real time (generally, brands have 20 seconds to connect with customers before they move on to another website).

"It's like a self-driving car," says Thor Sigurjonsson, Overstock's head of data engineering. "If you wait for feedback, you're going to drive off the road."

To maximize the value of its data as it's created, instead of waiting hours, days, or even longer to analyze it once it's at rest, Overstock needed a streaming and messaging platform. That platform would enable the company to employ real-time decision-making to deliver personalized experiences and recommend products likely to be well received by customers at the perfect time (really fast, in other words).

Data messaging and streaming are a key part of an event-driven architecture, a software architecture or programming approach built around the capture, communication, processing, and persistence of events: mouse clicks, sensor outputs, and the like.
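To make the idea concrete, here's a minimal sketch of a service publishing a click event to a topic, using the Apache Pulsar Java client. The service URL, topic name, and JSON payload are illustrative assumptions, not details from any system described in this article.

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class ClickPublisher {
    public static void main(String[] args) throws Exception {
        // Connect to a Pulsar broker (the service URL is an assumed local default).
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Create a producer on a hypothetical "clicks" topic.
        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("clicks")
                .create();

        // Each user action becomes an event published into the stream.
        producer.send("{\"user\":\"u123\",\"action\":\"click\",\"item\":\"sku-42\"}");

        producer.close();
        client.close();
    }
}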

Stream processing means acting on a series of data points as they originate from a system that continuously creates events. The ability to query this non-stop stream, spot anomalies, recognize that something important has happened, and act on it quickly and meaningfully is what streaming technology enables.
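In code, that continuous query-and-react loop can be as simple as a consumer that inspects each event as it arrives. The sketch below, again using the Pulsar Java client, assumes a hypothetical "payments" topic and subscription name, and looksSuspicious is a placeholder for a real rules engine or scoring model.

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class FraudWatcher {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Subscribe to a hypothetical stream of payment events.
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("payments")
                .subscriptionName("fraud-watch")
                .subscribe();

        // Runs indefinitely, reacting to each event the moment it arrives.
        while (true) {
            Message<String> msg = consumer.receive();  // blocks until the next event
            if (looksSuspicious(msg.getValue())) {
                System.out.println("Possible fraud: " + msg.getValue());
            }
            consumer.acknowledge(msg);  // tell the broker the event was handled
        }
    }

    // Stand-in for a real scoring model or rules engine.
    private static boolean looksSuspicious(String event) {
        return event.contains("\"amount\":99999");
    }
}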

This is in contrast to batch processing, where an application stores data after ingesting it, processes it, and then stores the processed result or forwards it to another application or tool. Processing might not start until, say, 1,000 data points have been collected. That's too slow for the kind of applications that require reactive engagement at the point of interaction.
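For contrast, here's the batch pattern in miniature: events pile up in a buffer, and nothing happens until the batch is full. The 1,000-event threshold mirrors the example above; the rest is an illustrative sketch.

import java.util.ArrayList;
import java.util.List;

public class BatchProcessor {
    private static final int BATCH_SIZE = 1_000;
    private final List<String> buffer = new ArrayList<>();

    // Events accumulate; no processing happens until the batch is full.
    public void onEvent(String event) {
        buffer.add(event);
        if (buffer.size() >= BATCH_SIZE) {
            process(new ArrayList<>(buffer));  // work starts only now
            buffer.clear();
        }
    }

    private void process(List<String> batch) {
        System.out.println("Processing " + batch.size() + " events after the fact");
    }
}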

It's worth pausing to break that idea down.

Some enterprises have recognized that they need to derive value from their data in motion and have assembled their own event-driven architectures from a variety of technologies, including message-oriented middleware such as Java Message Service (JMS) and message queue (MQ) platforms.

But these platforms were built on a fundamental premise: that the data they processed was transient and should be discarded as soon as each message had been delivered. This essentially throws away a highly valuable asset: data that's identifiable as arriving at a particular point in time. Time-series information is critical for applications that involve asynchronous analysis, like machine learning; data scientists can't build machine learning models without it. A modern streaming system needs to not only pass events along from one service to another, but also store them in a way that retains their value for later use.
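This is where retention pays off. With a platform that stores events, a data scientist can replay the stream from the beginning, timestamps intact, to build a training set. Here's a sketch using Pulsar's Reader interface; the topic name is the same hypothetical one as above.

import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;
import org.apache.pulsar.client.api.Schema;

public class ReplayForTraining {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // A Reader starts from the earliest retained event, so historical
        // data can be replayed later, e.g. to build a machine learning dataset.
        Reader<String> reader = client.newReader(Schema.STRING)
                .topic("payments")
                .startMessageId(MessageId.earliest)
                .create();

        while (reader.hasMessageAvailable()) {
            Message<String> msg = reader.readNext();
            // Each event keeps its broker-assigned publish time.
            System.out.println(msg.getPublishTime() + " " + msg.getValue());
        }

        reader.close();
        client.close();
    }
}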

The system also needs to scale to manage terabytes of data and millions of messages per second. The old MQ systems weren't designed to do either.

As I touched upon above, there are a lot of choices available when it comes to messaging and streaming technology.

They include various open-source projects like RabbitMQ, ActiveMQ, and NATS, along with proprietary solutions such as IBM MQ or Red Hat AMQ. Then there are the two best-known platforms for handling real-time data: Apache Kafka, a very popular technology that has become almost synonymous with streaming, and Apache Pulsar, a newer streaming and message-queuing platform.

Both of these technologies were designed to handle the high throughput and scalability that many data-driven applications require.

Kafka was developed by LinkedIn to facilitate data communication between different services at the job networking company and became an open-source project in 2011. Over the years it's become a standard for many enterprises looking for ways to derive value from real-time data.

Pulsar was developed by Yahoo! to solve messaging and data problems faced by applications like Yahoo! Mail; it became a top-level open-source project in 2018. While still catching up to Kafka in popularity, it offers more features and functionality. And it carries a very important distinction: MQ solutions are solely messaging platforms, and Kafka handles only an organization's streaming needs. Pulsar handles both, making it the only unified platform available.

Pulsar can handle real-time, high-rate use cases like Kafka can, but it's also a more complete, durable, and reliable solution than the older platform. To get both streaming and queuing (an asynchronous communication pattern that enables applications to talk to one another), for example, a Kafka user would need to bolt on something like RabbitMQ. Pulsar, on the other hand, can handle many of the use cases of a traditional queuing system without add-ons.
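In Pulsar, the difference is largely a matter of subscription type: the same topic can feed a streaming consumer and, with a Shared subscription, a pool of queue-style workers that split the load. Here's a sketch, with assumed topic and subscription names.

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.SubscriptionType;

public class WorkQueueConsumer {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // A Shared subscription distributes messages across all attached
        // consumers, giving classic work-queue semantics on the same topic
        // that other subscriptions can read as a stream.
        Consumer<String> worker = client.newConsumer(Schema.STRING)
                .topic("orders")
                .subscriptionName("order-workers")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        while (true) {
            Message<String> msg = worker.receive();
            System.out.println("Worker got: " + msg.getValue());
            worker.acknowledge(msg);
        }
    }
}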

Pulsar carries other advantages over Kafka, including higher throughput, better scalability, and geo-replication, which is particularly important when a data center or cloud region fails. Geo-replication enables an application to publish events to another data center without interruption, preventing the app from going down and shielding end users from the outage. (Here's a more technical comparison of Kafka and Pulsar.)
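Geo-replication in Pulsar is configured per namespace. As a sketch (the tenant, namespace, and cluster names are assumptions, and the exact admin API can vary by Pulsar version), enabling replication across two regions looks roughly like this:

import java.util.Set;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class EnableGeoReplication {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        // Replicate every topic in this namespace across two regions;
        // "retail/checkout", "us-east", and "us-west" are hypothetical names.
        admin.namespaces().setNamespaceReplicationClusters(
                "retail/checkout", Set.of("us-east", "us-west"));

        admin.close();
    }
}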

In Overstock's case, Pulsar was chosen as the retailer's streaming platform. With it, the company built what Sigurjonsson describes as "an integrated layer of data and connected processes governed by a metadata layer supporting deployment and utilization of integrated reusable data across all environments."

In other words, Overstock now has a way to understand and act upon real-time data organization-wide, enabling the company to impress its customers with magically fast, relevant offers and personalized experiences.

As a result, teams can reliably transform data in flight in a way that is easy to use and requires less data engineering. This makes it that much easier to delight their customers and ultimately drive more revenue.

To learn more about DataStax, visit us here.

About Chris Latimer

Chris Latimer is a technology executive whose career spans over twenty years in a variety of roles including enterprise architecture, technical presales, and product management. He is currently Vice President of Product Management at DataStax, where he is focused on building the company's product strategy around cloud messaging and event streaming. Prior to joining DataStax, Chris was a senior product manager at Google, where he focused on APIs and API management in Google Cloud. Chris is based near Boulder, CO, and when not working, he is an avid skier and musician and enjoys the never-ending variety of outdoor activities that Colorado has to offer with his family.
