Use Cases for Streaming Data Integration

The Top 4 Use Cases for Streaming Data Integration: Whiteboard Wednesdays

 

 

Today we are talking about the top four use cases for streaming data integration. If you’re not familiar with streaming data integration, please check out our channel for a deeper dive into the technology. In this 7-minute video, let’s focus on the use cases.

 

Use Case #1 Cloud Adoption – Online Database Migration

The first one is cloud adoption – specifically online database migration. When you have a legacy database and you want to move it to the cloud and modernize your data infrastructure, if it’s a critical database, you don’t want to experience downtime. A streaming data integration solution helps with that. While you’re doing the initial load from the legacy system to the cloud, the Change Data Capture (CDC) feature captures all the new transactions happening in the legacy database as they happen. Once the cloud database is loaded and ready, all the changes that happened in the legacy database can be applied to it. During the migration, your legacy system is open for transactions – you don’t have to pause it.

While the migration is happening, CDC helps you to keep these two databases continuously in-sync by moving the real-time data between the systems. Because the system is open to transactions, there is no business interruption. And if this technology is designed for both validating the delivery and checkpointing the systems, you will also not experience any data loss.

Because this cloud database has production data, is open to transactions, and is continuously updated, you can take your time to test it before you move your users. So you have basically unlimited testing time, which helps you minimize your risks during such a major transition. Once the system is completely in-sync and you have checked it and tested it, you can point your applications to the cloud database and run on it.

This is a single switch-over scenario. But streaming data integration gives you the ability to move the data bi-directionally. You can have both systems open to transactions. Once you test this, you can run some of your users in the cloud and some of your users in the legacy database.

All the changes happening with these users can be moved between databases, synchronized so that they’re constantly in-sync. You can gradually move your users to the cloud database to further minimize your risk. Phased migration is a very popular use case, especially for mission-critical systems that cannot tolerate risk and downtime.

Use Case #2 Hybrid Cloud Architecture

Once you’re in the cloud and you have a hybrid cloud architecture, you need to maintain it. You need to connect it with the rest of your enterprise. It needs to be a natural extension of your data center. Continuous real-time data movement with streaming data integration allows you to have your cloud databases and services as part of your data center.

The important thing is that these workloads in the cloud can be operational workloads because there’s fresh information (i.e., continuously updated information) available. Data from your databases, machine data, log files, other cloud sources, messaging systems, and sensors can move continuously to enable operational workloads.

What do we see in hybrid cloud architectures? Heavy use of cloud analytics solutions. If you want operational reporting or operational intelligence, you want comprehensive data delivered continuously so that you can trust that it’s up-to-date, and gain operational intelligence from your analytics solutions.

You can also connect your data sources with the messaging systems in the cloud to support event distribution for your new apps that you’re running in the cloud so that they are completely part of your data center. If you’re adopting multi-cloud solutions, you can again connect your new cloud systems with existing cloud systems, or send data to multiple cloud destinations.

Use Case #3 Real-Time Modern Applications

A third use case is real-time modern applications. Cloud is a big trend right now, but not everything is necessarily in the cloud. You can have modern applications on-premises. So, if you’re building any real-time app or modern system that needs timely information, you need continuous real-time data pipelines. Streaming data integration enables you to run real-time apps with real-time data.

Use Case #4 Hot Cache

Last, but not least, when you have an in-memory data grid to help with your data retrieval performance, you need to make sure it is continuously up-to-date so that you can rely on that data – it’s something that users can depend on. If the source system is updated, but your cache is not updated, it can create business problems. By continuously moving real-time data using CDC technology, streaming data integration helps you to keep your data grid up-to-date. It can serve as your hot cache to support your business with fresh data.

 

To learn more about streaming data integration use cases, please visit our Solutions section, schedule a demo with a Striim expert, or download the Striim platform to get started.

 


Streaming Data Integration Tutorial: Adding a Kafka Target to a Real-Time Data Pipeline

This is the second post in a two-part blog series discussing how to stream database changes into Kafka. You can read part one here. We will discuss adding a Kafka target to the CDC source from the previous post. The application will ingest database changes (inserts, updates, and deletes) from the PostgreSQL source tables and deliver them to Kafka to continuously update a Kafka topic.

What is Kafka?

Apache Kafka is a popular distributed, fault-tolerant, high-performance messaging system.

Why use Striim with Kafka?

The Striim platform enables you to ingest data into Kafka; process it for different consumers; and analyze, visualize, and distribute it to a broad range of systems on-premises and in the cloud, all with an intuitive UI and a SQL-based language for easy and fast development.

How to add a Kafka Target to a Striim Dataflow

From the Striim Apps page, click on the app that we created in the previous blog post and select Manage Flow.

The MyPostgreSQL CDC app.

This will open your application in the Flow Designer.

The PostgreSQL CDC app data flow.

To write to Kafka, we need to add a Target component to the data flow. Click on the data stream, then on the plus (+) button, and select “Connect next Target component” from the menu.

Connecting a target component to the data flow.

Enter the Target Info

The next step is to specify how to write data to the target. In the New Target ADAPTER drop-down, select Kafka Writer Version 0.11.0, and enter a few connection properties, including the target name, topic, and broker URL.

Configuring the Kafka target.

Data Formatting 

Different Kafka consumers may have different requirements for the data format. When writing to Kafka in Striim, you can choose the data format with the FORMATTER drop-down and optional configuration properties. Striim supports JSON, delimited, XML, Avro, and free-text formats; in this case, we select the JSONFormatter.

Configuring the Kafka target FORMATTER.
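For reference, here is a rough sketch of what this target looks like when expressed in TQL rather than through the UI. The target, stream, topic, and broker values below are placeholders, and exact property names can vary by Striim release, so treat this as an illustration rather than the exact output of the Flow Designer:

  CREATE TARGET PostgresCDCKafkaTarget USING KafkaWriter VERSION '0.11.0' (
    brokerAddress: 'localhost:9092',  -- Kafka broker URL
    Topic: 'postgres_cdc'             -- Kafka topic to write change events to
  )
  FORMAT USING JSONFormatter ()       -- format each change event as JSON
  INPUT FROM PostgresCDCStream;       -- the data stream produced by the CDC source

The FORMAT USING clause corresponds to the FORMATTER selection above; swapping JSONFormatter for another formatter changes the wire format without touching the rest of the target.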

Deploying and Starting the Data Flow

The resulting data flow can now be modified, deployed, and started through the UI. To run the application, it first needs to be deployed: click on the ‘Created’ drop-down and select ‘Deploy App’ to show the Deploy UI.

Deploying the app.

The application can be deployed to all nodes, any one node, or predefined groups in a Striim cluster; the default is the least-used node.

Deployment node selection.

After deployment, the application is ready to start by selecting Start App.

Starting the app.
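The same deploy and start steps can also be issued from the Striim console instead of the UI. A minimal sketch, assuming the application was named PostgresCDCToKafka in the admin namespace (the names are placeholders, and deployment-group options are omitted):

  DEPLOY APPLICATION admin.PostgresCDCToKafka;
  START APPLICATION admin.PostgresCDCToKafka;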

Testing the Data Flow

You can use the PostgreSQL to Kafka sample integration application to insert, delete, and update the PostgreSQL CDC source table. You should then see data flowing in the UI, indicated by the msgs/s count. (Note that the messages are sent quickly, and the count soon returns to 0.)

Testing the streaming data flow.
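For example, a few ordinary SQL statements against the monitored source table are enough to generate change events. The table and column names below are purely illustrative stand-ins for whatever table your CDC source is configured to read:

  -- Each of these statements produces a change event that flows to the Kafka topic.
  INSERT INTO public.orders (order_id, item, qty) VALUES (1001, 'widget', 3);
  UPDATE public.orders SET qty = 5 WHERE order_id = 1001;
  DELETE FROM public.orders WHERE order_id = 1001;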

If you now click on the data stream in the middle and then on the eye icon, you can preview the data flowing between PostgreSQL and Kafka. Here you can see the data, the metadata (these are all updates), and the before values (what the data was before the update).

Previewing the data flowing from PostgreSQL to Kafka.

There are many other sources and targets that Striim supports for streaming data integration. Please request a demo with one of our lead technologists, tailored to your environment.


Streaming Data Integration Tutorial: Using CDC to Stream Database Changes

 

This is the first in a two-part blog post discussing how to use Striim for streaming database changes to Apache Kafka. Striim offers continuous data ingestion from databases and other sources in real time; transformation and enrichment using Streaming SQL; delivery of data to multiple targets in the cloud or on-premises; and visualization of results. In this part, we will use Striim’s low-impact, real-time change data capture (CDC) feature to stream database changes (inserts, updates, and deletes) from an operational database into Striim.

What is Change Data Capture?

Databases maintain change logs that record all changes made to the database contents and metadata. These change logs can be used for database recovery in the event of a crash, and also for replication or integration.

Change data capture change log

With Striim’s log-based CDC, new database transactions – including inserts, updates, and deletes – are read from source databases’ change logs and turned into a stream of events without impacting the database workload. Striim offers CDC for Oracle, SQL Server, HPE NonStop, MySQL, PostgreSQL, MongoDB, and MariaDB.

Why use Striim’s CDC?

Businesses use Striim’s CDC capabilities to feed real-time data to their big data lakes, cloud databases, and enterprise messaging systems, such as Kafka, for timely operational decision making. They also migrate from on-premises databases to cloud environments without downtime and keep cloud-based analytics environments up-to-date with on-premises databases using CDC.

How to use Striim’s CDC?

Striim’s easy-to-use CDC template wizards automate the creation of applications that leverage change data capture to stream events, as they are created, from various source systems to various targets. Apps created with templates may be modified using the Flow Designer or by exporting TQL, editing it, and importing the modified TQL. Striim has templates for many source-target combinations.

In addition, Striim offers pre-built integration applications for bulk loading and CDC from PostgreSQL source databases to target systems including PostgreSQL database, Kafka, and files. You can start these applications in seconds by going to the Applications section of the Striim platform.

Striim pre-built sample integration applications.

In this post, we will show how to use PostgreSQL CDC (the PostgreSQL Reader) with a Striim target, using the wizards to build a custom application instead of the pre-built applications mentioned above. The instructions below assume that you are using the PostgreSQL instance that comes with the Striim platform. If you are using your own PostgreSQL database instance, please review our instructions on how to set up PostgreSQL for CDC.
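To give a rough idea of what that setup involves, log-based CDC generally requires logical replication to be enabled and a user with replication privileges on the source database. The exact values, and any additional steps such as installing a logical decoding plugin, should come from the Striim setup instructions; the statements below are only a typical sketch with placeholder names:

  -- Requires superuser privileges and a PostgreSQL restart to take effect.
  ALTER SYSTEM SET wal_level = 'logical';
  ALTER SYSTEM SET max_replication_slots = 4;  -- at least one slot for the CDC reader
  ALTER SYSTEM SET max_wal_senders = 4;

  -- A dedicated login for the CDC reader (name and password are placeholders).
  CREATE ROLE striim_cdc WITH LOGIN REPLICATION PASSWORD 'change_me';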

Using the CDC Template

To start building the CDC application, in the Striim web UI, go to the Apps page and select Add App > Start with Template. Enter PostgreSQL in the search field to narrow down the sources and select “PostgreSQL Reader to Striim”.

Wizard template selection when creating a new app.

Next enter the name and namespace for your application (the namespace is a way of grouping applications together).

Creating a new application using Striim.

Specifying the Data Source Properties

In the SETUP POSTGRESQL READER step, specify the data source and table properties:

  • the connection URL, username, and password.
  • the tables for which you want to read change data.
Configuring the data source in the wizard.
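The wizard turns these properties into a CDC source component in the data flow. Exported as TQL, it looks roughly like the sketch below; the connection URL, credentials, table list, and stream name are placeholders, and property names may vary slightly by Striim release:

  CREATE SOURCE PostgresCDCSource USING PostgreSQLReader (
    ConnectionURL: 'jdbc:postgresql://localhost:5432/demodb',
    Username: 'striim',
    Password: '********',
    Tables: 'public.orders'      -- the table(s) whose changes should be captured
  )
  OUTPUT TO PostgresCDCStream;   -- change events flow into this stream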

After you complete this step, your application will open in the Flow Designer.

The wizard generates a data flow.

In the Flow Designer, you can add various processors, enrichers, transformers, and targets, as shown below, to complete your pipeline – in some cases with zero coding.

Flow designer enrichers and processors.

Flow designer event transformers and targets.
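A continuous query (CQ) is the typical building block for these mid-pipeline steps. As a minimal sketch, with made-up stream names, fields, and filter condition, a CQ that passes only certain events downstream might look like this:

  CREATE CQ LargeOrdersCQ
  INSERT INTO LargeOrderStream     -- downstream components read from this stream
  SELECT o.orderId, o.amount
  FROM OrderStream o
  WHERE o.amount > 100;            -- keep only events that match the condition

Because CQs are SQL over streams, the same pattern extends to enrichment (joining against a cache) and transformation (reshaping or renaming fields).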

 

In the next blog post, we will discuss how to add a Kafka target to this data pipeline. In the meantime, please feel free to request a demo with one of our lead technologists, tailored to your environment.

 

Real-Time Collection, Enrichment and Analysis of Set-Top Box Data

Competition is stiff. With the onset of Internet protocol TV and “over the top” technology, satellite, telco and cable set-top box providers are scrambling to increase the stickiness of their subscription services. The best way to do this is to provide real-time context marketing for their set-top boxes in order to know the customer’s interests and intentions immediately, and tailor services and offers on-the-fly.

In order to make this happen, these companies need three things:

  • They need to be able to ingest huge volumes of disparate data from a gazillion set-top boxes around the world.
  • They need to be able to – in real time – enrich that data with customer information/behavior and historical trends to assess the customer’s interest in-the-moment.
  • They need to be able to map that enriched data to a set of offers or services while the customer is still present and interested.

The Striim platform helps companies deliver real-time, context marketing applications that address all three phases of interaction and analysis. It collects your real-time set-top box clickstream data and enriches it with a broad range of contextual data sources such as customer history and past behavior, geolocation, mobile device information, sensors, log files, social media, and database transactions.

With Striim’s easy-to-use GUI and SQL-like language, users can rapidly create tailored enterprise-scale, context-driven marketing applications.
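As an illustration of that enrichment step, a continuous query can join the incoming clickstream against an in-memory cache of customer records to add context to each event. The stream, cache, and field names below are hypothetical:

  CREATE CQ EnrichClicksCQ
  INSERT INTO EnrichedClickStream
  SELECT c.customerId, c.channel, c.eventTime,
         cust.segment, cust.region            -- context pulled from the customer cache
  FROM ClickStream c, CustomerCache cust
  WHERE c.customerId = cust.customerId;       -- join the live event to its customer record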

The aggregation of real-time and historical information via the set-top box makes it possible for providers to know who is watching right now, where they are, and what their purchasing patterns look like. With this context, providers can instantly deliver the most relevant and effective advertising or offer while the customer is still “present,” giving the provider the best chance of motivating the customer to take immediate action.

With the Striim platform, users can deliver a streaming analytics application that constantly integrates real-time actions and location with historical data and trends. Once the customer’s intentions are identified, they can easily take action to either promote retention or incentivize additional purchases.

Detecting behavior that would be out-of-the-norm may signal a completely new set of advertising opportunities. For example, if a working Mom is at home watching the Disney Channel, it might indicate she is home with a sick child. With streaming analytics and context marketing, this scenario would be detected immediately, and could trigger a set of ads within the customer’s video stream that provide offers for children’s cold and flu medicine.


Real-World Examples of Real-Time Log File Monitoring

 

 

At its most basic, the goal of log file monitoring is finding things which otherwise would have been missed, such as trends, anomalies, changes, risks, and opportunities. For some firms, log files exist to meet compliance requirements or because software already in use generates them automatically. But for others, analyzing log files – even in real time, as they are created – is incredibly valuable.

In many industries, the speed with which analysis is performed is immaterial. For a personnel-heavy division, for example, looking at employee logs weekly or monthly might provide enough information.

For others, though, the difference between detecting an upsell opportunity while a customer is still on their website, compared to 30 seconds later, could make a difference in what’s purchased. For a smaller subset of applications, real-time monitoring can make the difference between catastrophic failures which could cost millions, and routine maintenance solving the problem.

In general, in fields where the mean time to recover from failure is high and the cost of downtime is high, real-time log file monitoring can prevent costly mistakes and open up otherwise-missed opportunities.

Let’s look at two fields that are rapidly adopting real-time analytics: manufacturing and financial services.

Banking & Financial Services

Real-time analysis of log files presents three major opportunities to financial services firms.

First, it gives them the opportunity to make trades faster. Real-time log file monitoring can find network issues and unwanted latency, ensuring that trades are committed when they’re ordered – not later, when the opportunity for arbitrage has entirely passed.

Second, real-time analysis of customer interactions (with ATMs, electronic banking, or even service representatives) provides the opportunity to increase customer satisfaction and even upsell opportunities by noticing trends in behavior as they happen.

Third, real-time analysis of log files is a tremendous boon to security. In a world reliant on technology to support delicate financial systems, real-time analysis may catch network intruders before they can commit crimes. Legacy analysis would find only traces and lost money.

Manufacturing

For manufacturers, especially heavily automated ones, uptime can be critical. Any time that a factory isn’t running because something has gone wrong, it could be losing money both for the company directly, and for any clients downstream who might rely on it to produce intermediate goods.

In these circumstances, real-time monitoring can alleviate risks. Analyzing logs daily, or even every half-hour, might not catch a malfunctioning machine until it’s too late. Real-time analysis, on the other hand, can detect a failure before it spreads from one machine into the next part of an assembly line.

Real-time analysis can also provide opportunities for manufacturers to streamline operations. In cases where factory equipment is heavily specialized, for example, repair parts can take days or weeks to arrive, all of which is downtime.

Weekly log analysis likely wouldn’t detect parts beginning to wear down until it’s too late. Real-time analysis, on the other hand, allows factory operators to purchase replacement parts preemptively, thereby minimizing or eliminating downtime.

Additionally, real-time log file monitoring in the manufacturing sector can allow companies to keep smaller quantities of inventory or intermediate products on hand. This can help to lower costs and streamline operations.

Ultimately, not every company or business unit will gain tremendous value from real-time analysis. Most, however, will find far more value in under-utilized log files than they expect.

As costs come down and real-time analysis proliferates, it would be prudent for companies to make sure they’re ahead of the curve, or at least tracking it as it evolves.


Real-Time Offers and Sales Monitoring

Today we’re going to take a look at a Striim-based application that makes real-time offers to customers while they’re still in a store by combining beacon, sales, and inventory information.

https://youtu.be/7iI9I3pUHy8

If you’re a retailer, you may have many stores for which you want to track sales. Additionally, you may wish to make offers to your customers while they’re still in the store. To do this, you may offer your customers a phone app, and place beacons around the store that can track customer movement.
 
Using Striim, you then build an application to monitor the beacon data and sales data through change data capture from the original database. You can enrich all of this with the inventory, product, and customer information to give that data context. The enrichment and correlation of multiple streams of information in real time will enable you to send offers to your customers and monitor activity on the dashboard.
 
The logic to send the offers is customizable. But in this case, we’re sending the offer 1) if the customer stays close to a specific product for a certain amount of time, or 2) if they look at the product, leave, and then come back to look at the product again a certain number of times. In either case, you can define the application to only send the offer if you have sufficient inventory.
 
The application itself has a dashboard that allows you to track the sales per store and look at the sales traffic over time. You can see how it varies depending on the time of day. It also tracks the types of items actually being sold and the offers that have been made to customers in real time. In addition to this chart, each offer will result in an alert that is sent to the dashboard. In that alert, you can see the profile of the customer who was sent the offer, which store they were in, and what offer was made.
 
If you drill down to a particular store, then you can see a heat map of where the customers actually are in real time. You can see which aisles are trafficked the most, and how that varies over time. You’ll also see the sales for that particular store over time and what offers were actually being made to customers in that particular store. Crucially, the dashboards also show the impact these real-time offers have on how much customers actually spend.
 
When looking at how this is built, all the processing is done through streaming data flows. We have sources at the top for the beacon information and the sales data, pulled from a database using change data capture. The inventory, product, and customer information is loaded into in-memory caches so it can be joined in real time. If we look at one of the processing flows, you can see the processing is done through a query. This is the query that checks for customers staying in one place for too long, or going back to the same place multiple times. If either of those conditions is met, we might make the customer an offer, but first we need to check that the inventory is sufficient – you can see the twenty percent mark there.
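A simplified sketch of that kind of query is shown below. The stream, cache, field names, and dwell-time threshold are invented for illustration; only the 20% inventory check mirrors the application described above:

  CREATE CQ DwellOfferCQ
  INSERT INTO OfferStream
  SELECT b.customerId, b.productId
  FROM DwellTimeStream b, InventoryCache i
  WHERE b.productId = i.productId
    AND b.dwellSeconds >= 120                    -- stayed near the product long enough
    AND i.unitsOnHand > 0.2 * i.stockCapacity;   -- only offer while inventory is above the 20% mark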
 
The dashboards built on this data flow are fully customizable. Each visualization is powered by a query and configured through simple drag-and-drop, enabling you to map the query to the visualization.
The real-time offers dashboard.
 
If you would like a more in-depth look at this application, please request a demo with one of our lead technologists.