Network Traffic Monitoring with Striim 2

Network Traffic Monitoring with Striim – Part 2

In the first post of this two-part series, we demonstrated how Striim can collect, integrate, process, analyze, and visualize large volumes of streaming data in real time. In this post, we provide more detail about the machine learning pipeline used to forecast network traffic and detect anomalies in Striim.

First, Striim reads raw data from files, then processes and aggregates it. Aggregation dramatically reduces the data volume: in this application, Striim aggregates total network traffic across all IPs per minute, so only one data point per minute is used to train the machine learning models.

The machine learning pipeline first preprocesses the data, then fits the machine learning models for prediction. Finally, it flags an anomaly when the actual value is far from the predicted one.

  • Data Processing: Before fitting data into sophisticated models, it is usually necessary to transform and preprocess it. Striim supports a collection of preprocessing methods, including standardization and normalization to make training less sensitive to the scale of features, power transformations (such as square root and log) to stabilize volatile data, and seasonality decomposition such as STL (seasonal-trend decomposition using Loess) to remove seasonality from time-series data. In this application, we use a log transformation to make the data more stable and improve prediction accuracy.
  • Automated Prediction: Striim first transforms time-series forecasting into a standard regression problem with lagged variables (i.e., autoregression), and can then apply any regression model to it for prediction. Striim offers a plethora of machine learning models, including linear and non-linear regression, SVM, Gaussian process regression, random forest, Bayesian networks, deep learning models, etc. In this application, we use random forest for several reasons:
    • It is efficient enough to retrain continuously over streaming data. Some models, like deep neural networks, are time-consuming and not suitable for online training. 
    • It is an ensemble method, combining many decision trees to obtain better predictive performance than a single model.
    • It achieves good predictive performance compared to other models according to our comprehensive empirical study.
  • Anomaly Detection: Striim detects anomalies based on its predictions. The main idea is that anomalies tend to show a large difference between the actual and predicted values. Striim uses a statistically derived threshold: a point is flagged as an anomaly if the percentage error between the actual and predicted values exceeds that threshold. In this application, Striim assumes percentage errors follow a normal distribution and defines the threshold accordingly, e.g., as the mean + 2 * standard deviation of the percentage errors. (A rough sketch of the whole pipeline follows this list.)
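To make the pipeline concrete, here is a minimal sketch in Python with scikit-learn. This is an illustration only, not Striim's implementation; it assumes the per-minute traffic totals arrive as a plain list and combines the log transform, lagged autoregressive features, a random forest forecast, and the mean + 2 * standard deviation threshold on percentage errors.

```python
# Illustrative sketch only (not Striim's code): forecast the newest per-minute
# traffic total from lagged values and flag it if its percentage error is large.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def forecast_and_detect(series, n_lags=5):
    """series: per-minute traffic totals, oldest first; assumes positive values."""
    y = np.log1p(np.asarray(series, dtype=float))        # log transform stabilizes spikes

    # Autoregression: predict y[t] from the n_lags values before it.
    X = np.array([y[i - n_lags:i] for i in range(n_lags, len(y))])
    targets = y[n_lags:]

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[:-1], targets[:-1])                      # train on all but the newest point

    # Percentage errors on the training portion (back on the original scale)
    # define the statistical threshold: mean + 2 * standard deviation.
    train_pred = np.expm1(model.predict(X[:-1]))
    train_actual = np.expm1(targets[:-1])
    pct_err = np.abs(train_pred - train_actual) / train_actual
    threshold = pct_err.mean() + 2 * pct_err.std()

    predicted = np.expm1(model.predict(X[-1:])[0])       # forecast for the newest minute
    actual = np.expm1(targets[-1])
    is_anomaly = abs(predicted - actual) / actual > threshold
    return predicted, is_anomaly
```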

The models update continuously on the streaming data. You can bound the training data size and control the update frequency. In this application, we use the most recent 200 data points as training data and update the ML models whenever a new data point arrives.
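A hypothetical streaming loop around the sketch above shows how that bounding and update policy might look: keep only the latest 200 points and rerun the forecast/detection step on each new arrival.

```python
# Hypothetical update loop: bounded training window, retrain on every new point.
from collections import deque

window = deque(maxlen=200)                  # only the most recent 200 points are kept

def on_new_point(value):
    window.append(value)
    if len(window) > 20:                    # wait for enough history before training
        predicted, is_anomaly = forecast_and_detect(list(window))
        if is_anomaly:
            print(f"Anomaly: observed {value}, predicted {predicted:.1f}")
```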

For more details, please see our 2019 DEBS paper, A Demonstration of Striim: A Streaming Integration and Intelligence Platform. Striim won the Best Demo Award at the 13th ACM International Conference on Distributed and Event-Based Systems (DEBS).

 

Hazelcast Striim Hot Cache

Introducing Hazelcast Striim Hot Cache

Today, we are thrilled to announce the availability of Hazelcast Striim Hot Cache. This joint solution with Hazelcast’s in-memory data grid uses Striim’s Change Data Capture to solve the cache consistency problem.

With Hazelcast Striim Hot Cache, you can reduce the latency of propagating data from your backend database into your Hazelcast cache to milliseconds. Now you have the flexibility to run multiple applications off a single database, keeping Hazelcast cache refreshes up to date while adhering to low-latency SLAs.

 

Check out this 5-minute Introduction and Demo of Hazelcast Striim Hot Cache:

https://www.youtube.com/watch?v=B1PYcIQmya4

 

Imagine that you have an application that works by retrieving and storing information in a database. To get faster response times, you utilize a Hazelcast in-memory cache for rapid access to data.

However, other applications also make database updates, which leads to inconsistent data in the cache. When this happens, the application suddenly shows out-of-date or invalid information.

Hazelcast Striim Hot Cache solves this by using streaming change data capture to synchronize the cache with the database in real time. This ensures that both the cache and associated application always have the most up-to-date data.

Through CDC, Striim is able to recognize which tables and key values have changed. Striim immediately captures these changes with their table and key, and, using the Hazelcast Striim writer, pushes those changes into the cache.
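Conceptually, each change event carries the table name, the key, the operation type, and the changed values, and the writer applies it to the matching cache map. Here is a simplified sketch of that mapping, using a plain Python dict as a stand-in for a Hazelcast map; the field names are hypothetical, not Striim's event schema.

```python
# Simplified sketch: apply CDC events to a cache. A plain dict stands in for a
# Hazelcast map; the event field names (table, op, key, values) are hypothetical.
caches = {"CUSTOMER": {}, "ORDERS": {}}          # one cached map per source table

def apply_change(event):
    cache = caches[event["table"]]
    if event["op"] in ("INSERT", "UPDATE"):
        cache[event["key"]] = event["values"]    # upsert the changed row
    elif event["op"] == "DELETE":
        cache.pop(event["key"], None)            # remove the row from the cache

apply_change({"table": "CUSTOMER", "op": "UPDATE",
              "key": 1042, "values": {"name": "Acme Corp", "tier": "gold"}})
```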

We make it easy to leverage Striim’s change data capture functionality by providing CDC Wizards. These Wizards help you quickly configure the capture of change data from enterprise databases – including Oracle, MS SQL Server, MySQL and HPE NonStop – and propagate that data to a Hazelcast cache.

You can also use Striim to facilitate the initial load of the cache.

To learn more, please read the full press release, visit the Hazelcast Striim Hot Cache product page, or jump right in and download a fully loaded evaluation copy of Striim for Hazelcast Hot Cache.

Oracle to Azure

Demo: Migrate Oracle Data to Azure in Real Time

We’d like to demonstrate how you can migrate Oracle data to Microsoft Azure SQL Server running in the cloud, in real time, using Striim and change data capture (CDC).

People often have data in lots of on-premise Oracle tables. They want to migrate that Oracle data into Microsoft Azure SQL Server in real time. How do you go about moving data from Oracle to Azure without affecting your production databases?

https://www.youtube.com/watch?v=iglW9aJCUlE

You can’t use SQL queries, because these would typically be queries against a timestamp – table scans that you run over and over again – and that puts a load on the Oracle database. You might also miss important transactions. You need change data capture (CDC), which enables non-intrusive collection of streaming database changes.

Striim provides change data capture as a collector out of the box. This enables real-time collection of change data from Oracle, SQL Server, and MySQL. CDC works because databases write all the operations that occur into transaction logs. Change data capture listens to those transaction logs – instead of using triggers or timestamps – and reads them directly to collect operations. This means that every DML operation – every insert, update, and delete – written to the logs is captured by change data capture and turned into an event by our platform.
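To picture what one of those events looks like, the sketch below models a change record with an operation type plus before and after row images. The field names are illustrative, not Striim's exact event schema.

```python
# Illustrative shape of a change event built from a transaction-log entry.
# Field names are hypothetical, not Striim's exact event schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeEvent:
    table: str                # e.g. "SCOTT.TCUSTOMER"
    operation: str            # "INSERT", "UPDATE", or "DELETE"
    before: Optional[dict]    # row image before the change (None for inserts)
    after: Optional[dict]     # row image after the change (None for deletes)
    commit_time: str          # when the transaction committed on the source

event = ChangeEvent(table="SCOTT.TCUSTOMER", operation="UPDATE",
                    before={"ID": 7, "CITY": "Houston"},
                    after={"ID": 7, "CITY": "Austin"},
                    commit_time="2019-05-01T10:15:03Z")
```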


In this demo, you will see how you can use Striim to perform real-time change data capture from an Oracle database and deliver that data, in real time, into Microsoft Azure SQL Server. We also build a custom monitoring solution for the whole end-to-end data flow. The demo starts at the 1:43 mark.

First, we connect to Microsoft Azure SQL Server. In this instance, we have two tables, TCUSTOMER and TCUSTORD, which we can show are currently completely empty. We then use a data flow built in Striim to capture data from an on-premise Oracle database using change data capture; you can see its configuration properties in the video. After some processing, the data flow delivers the data into Microsoft Azure SQL Server.

To show this, we run some SQL against Oracle. This SQL performs a combination of inserts, updates, and deletes against our two Oracle tables. When we run it, you can see the data immediately in the initial stream. That data stream is then split into multiple processing steps and delivered into Azure SQL Server. If we rerun the query against our Azure tables, you can see that the previously empty tables now contain data. That data was delivered live and will continue to be delivered in a streaming fashion as long as changes are happening in the Oracle database.

In addition to the data movement, we’ve also built a monitoring application complete with dashboard that shows data flowing through the various tables, the types of operations occurring, and the entire end-to-end transaction lag. This shows the difference between when a transaction was committed on the source system, and when it was captured and applied to the target. You can also see some of the most recent transactions.


This monitoring application was built, again, using a data flow within the Striim platform. This data flow takes the original streaming change data from the Oracle database and applies processing, in the form of SQL queries, to generate statistics. In addition to feeding the dashboard, these queries can also act as rules that generate alerts when thresholds are crossed. The dashboard itself is not hard-coded: it is generated using a dashboard builder that connects to the back end, and each visualization is powered by a query against the back-end data. There are lots of visualizations to choose from.

We hope you have enjoyed seeing how to migrate Oracle data into the cloud using Striim via the Oracle to Azure demo. If you would like a more in-depth look at this application, please request a demo with one of our lead technologists.

Real-Time Financial Transaction Monitoring

 

 

Financial Monitoring Application

Building complex financial transaction monitoring applications used to be a time-consuming task. Once you had the business case worked out, you needed to work with a team of analysts, DBAs, and engineers to design the system, source the data, then build, test, and roll out the software. Typically it wouldn't be correct the first time, so rinse and repeat.

Not so with Striim. In this video you will see a financial transaction monitoring application that was built and deployed in four days. The main use case is to spot increases in the rate at which customer transactions are declined, and to alert on that. But a whole host of additional monitoring capabilities were also built into the application. Rising decline rates often indicate issues with the underlying ATM and point-of-sale networks, and they need to be resolved quickly to prevent potential penalties and a drop in customer satisfaction.

The application consists of a real-time streaming dashboard, with multiple drill-downs, coupled with a continuous back-end dataflow that is performing the analytics, driving the dashboard and generating alerts. Streaming data is sourced in real time from a SQL Server database using Change Data Capture (CDC), and used to drive a number of analytics pipelines.

The processing logic is all implemented using in-memory continuous queries written in our easy-to-use SQL-like language, and the entire application was built using our UI and dashboard builder. The initial CDC data collection goes through some initial data preparation and is then fed into parallel processing flows. Each flow analyzes the data in a different way and stores the results of the processing in our built-in results store to facilitate deeper analysis later.
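The core decline-rate check is easy to picture. The sketch below is ordinary Python rather than our SQL-like query language, with hypothetical field names: it computes the decline rate for the latest window of transactions and raises an alert when that rate runs well above the recent baseline.

```python
# Hypothetical decline-rate check: alert when the decline rate in the latest
# window runs well above the average of recent windows.
from collections import deque

recent_rates = deque(maxlen=60)            # decline rate of the last 60 windows

def check_window(transactions):
    """transactions: list of dicts with a 'status' field, for one time window."""
    declined = sum(1 for t in transactions if t["status"] == "DECLINED")
    rate = declined / max(len(transactions), 1)

    if len(recent_rates) >= 10:            # need some history before comparing
        baseline = sum(recent_rates) / len(recent_rates)
        if rate > 2 * baseline:            # decline rate has more than doubled
            print(f"ALERT: decline rate {rate:.1%} vs baseline {baseline:.1%}")

    recent_rates.append(rate)
    return rate
```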

If you want to learn how to build complex monitoring and analytics applications quickly, take 6 minutes to watch this video.

 

ATM Remote Device Monitoring with Predictive Analytics

Watch this video to learn how Striim can monitor ATM components in order to avoid downtime and keep customers happy. This application is a specific example of the remote device monitoring solution developed by Striim, which alerts technicians to possible machine outages and cash shortages. Key ATM components are monitored in real time, including CPU, printer, card reader, temperature, and cash balances. The app also acquires streaming data about current ATM transactions, along with historical data to compare against past behavior. With visibility into these data streams you can predict ATM component failures and service your machines as needed, rather than on a fixed schedule. The dashboard provides three views:

  • a summary that shows an overview of all locations being monitored on a global level
  • a location page that shows machines on a local level
  • an ATM page highlighting activity, issues, and component metric prediction charts – shows data collected in 20-minute windows and provides predictions for the next 10-minute window (a rough sketch of this kind of prediction follows this list)
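As an illustration only (not the app's actual model), one simple way to produce such a prediction is to fit a trend to the last 20 minutes of a component metric and extrapolate it 10 minutes ahead:

```python
# Illustrative only: fit a linear trend to the last 20 per-minute readings of a
# component metric (e.g. ATM temperature) and extrapolate 10 minutes ahead.
import numpy as np

def predict_next(readings, horizon=10):
    minutes = np.arange(len(readings))
    slope, intercept = np.polyfit(minutes, readings, deg=1)   # least-squares trend line
    future = np.arange(len(readings), len(readings) + horizon)
    return slope * future + intercept                         # next `horizon` predicted values

predicted = predict_next([34.1, 34.3, 34.2, 34.6, 34.8, 35.0, 35.1, 35.4, 35.6, 35.9,
                          36.2, 36.4, 36.7, 37.0, 37.2, 37.5, 37.9, 38.2, 38.6, 39.0])
```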

The Striim Platform processes all types of data in innovative ways so users can react to diverse sets of information in massive volumes and maximize the value of big data within their enterprise. The ATM Component Monitor App efficiently acquires and processes streaming data in-memory on commodity hardware, enabling instant alerts and real-time visualizations of your ATM needs. The same approach can be used for other applications, such as monitoring slot machines.

 

Streaming Multi-log Correlation Using the Striim Platform

Learn how the Striim Multi-log micro application correlates interesting events across multiple data streams in real time. This video is a high-level demonstration of the Multi-log app correlating two live data streams, a web server log and an app server log. The Multi-log application can be extended to continuously correlate all structured, semi-structured, and transactional data sources, allowing businesses to watch events and add context as they unfold.

https://youtu.be/NlAn0m_VRzU

Multi-log Correlation Real-time Use Cases

  • VIP activity identification
  • cross log correlation
  • user activity enrichment
  • hack attempts
  • blacklist cross checks
  • large response times
  • zero content check
  • stream enrichment
  • real-time contextual marketing offers
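Several of the use cases above boil down to matching entries from different logs that refer to the same user or session. As a simplified, hypothetical sketch (plain Python, made-up field names), correlation can be pictured as pairing web-server and app-server entries that share a session id and occur within a few seconds of each other, then emitting one enriched event.

```python
# Simplified sketch of cross-log correlation: pair web-server and app-server
# entries by session id when they occur close together in time.
# Field names (session, time, url, error) are hypothetical.
recent_web = {}                           # last web-log entry seen per session id

def on_web_event(event):
    recent_web[event["session"]] = event

def on_app_event(event):
    web = recent_web.get(event["session"])
    if web and abs(event["time"] - web["time"]) <= 5:     # within 5 seconds
        # One correlated, enriched event: which page led to which app-server error.
        return {"session": event["session"], "url": web["url"], "error": event.get("error")}
    return None

on_web_event({"session": "s42", "time": 100.0, "url": "/checkout"})
match = on_app_event({"session": "s42", "time": 101.2, "error": "TimeoutException"})
```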

Big Data and NonStop – Here to Stay and Users Richer for It!


“Big Data is here to stay, and it’s getting bigger. Volume, variety, and velocity are growing at ever increasing rates, and the data is coming from more diverse sources than ever (Internet of Things, sensors, machines, components, etc.). The next generation agile enterprise needs a Big Data analytics solution that allows seamless integration of transactional data with Big Data for a rich, actionable view of every transaction,” so opens the October 21, 2014, CIO Story Big Data Special. Under the banner of “WebAction: Providing Data Driven Apps for Agile Enterprises,” CIO Story names WebAction one of the 20 Most Powerful Big Data Companies. For the NonStop community, this is further evidence that Big Data is indeed here to stay, that WebAction is a well-respected vendor, and that transactional data is an often-overlooked component of today’s Big Data analytics offerings.

But wait, there’s a whole lot more here of interest to the NonStop community. And it’s not “just one more thing” either, to paraphrase Apple’s Steve Jobs, as he was apt to say on big occasions. Big Data is something our lives will become a part of – everything we do and everything we touch will generate an event or action that will then influence all that comes next. Not just for us, but for everyone else around us. Buying that cool sneaker may be the one additional purchase that pushes sales of that type of sneaker past a threshold that, in turn, influences all sneaker vendors. It’s just that simple – with Big Data, every transaction will in turn feed a far richer future experience. And it is all about providing a richer user experience. By richer, try thinking of it as being more meaningful – having to scroll through less meaningless junk and hitting on exactly what you require, immediately!

In an opinion paper featuring WebAction that should be available in time for the upcoming NonStop Technical Boot Camp 2014, I quote WebAction Cofounder Sami Akbay, when he said, “You don’t do data integration to the Internet; you Internet-enable yourself … Big Data will go down the same path; we will not integrate to it, we will become a part of it.” Furthermore, according to Akbay, “It’s pretty straightforward really, consumers want a richer online experience and it is becoming imperative that even the most basic transaction applications running on NonStop can benefit from what WebAction can provide.”

These comments by Akbay first appeared in the article Big Data: analyzing every last piece of information and finding needles in haystacks!, published in the Sep–Oct 2014 issue of the NonStop community publication, The Connection. But what’s really important for the NonStop community to realize is that those with applications on NonStop are already involved in Big Data (by the very nature of the transactions they process), and that one of the 20 Most Powerful Big Data Companies is, as our good fortune has it, well versed in all things NonStop. For those who plan on attending the NonStop Technical Boot Camp, make sure you set aside time to attend a joint HP / WebAction presentation where Sami Akbay will be providing an update – and yes, check out the CIO Story and the article in The Connection, and watch for my first opinion paper on Big Data and WebAction coming soon!

Register to Experience the WebAction Data Center Edition in Houston, TX October 24th

Join WebAction Account Executive Steve Banovic for a lunch and interactive meetup to explore the capabilities of the WebAction Real-time App Platform: Data Center Edition in Houston, TX on October 24th, 2014, at 11:30 AM. Corporations and government agencies claim to own the necessary management tools to maintain SLAs, yet when outages occur, nobody knows what the problem is (or how it was caused). Meet the WebAction team at Del Frisco’s Double Eagle Steak House and let us show you how we leverage your existing tools, creating a predictable view of outages, anomalies, and exposures before they happen.

Register for the Meetup Here

The WebAction Data Center App

With the proliferation of servers, workstations, routers, middleware, applications, and endpoints, it has become an impossible task to manage it all. While these products have grown exponentially, support staff numbers have remained stagnant. This has created silos of reactive personnel attempting to troubleshoot outages after the calls start coming in. Executives proclaim there are plenty of tools to manage the data center, yet 80% of the outages that occur could have been prevented.

WebAction’s Real-time App Platform: Data Center Edition provides proactive analytics to monitor your entire infrastructure in real-time, predicting issues before they occur. The Data Center App allows you to correlate events across network and other log streams, so that you can address failures, preemptively avert problems, and keep your finger on the pulse of your enterprise at all times.

 

CIO Review Adds Striim to Oracle 100 Most Promising Oracle Solution Providers

Striim is excited to be recognized as one of the CIO Review Oracle 100 Most Promising Oracle Solution Providers. Striim integrates with Oracle NoSQL as a robust persistence layer for our Data Driven Apps.

Striim Data Driven Apps and Oracle NoSQL

“The new release of Oracle NoSQL Database adds compelling value for developers and IT, like security and data center monitoring enhancements which are critical for enterprise deployments. For Striim’s data center automation application, we need to have a robust NoSQL back end data store and Oracle NoSQL Database is a natural fit. The new table model and indexing features are also a welcome addition to the latest release, and we plan to take advantage of these features,” mentioned Alok Pareek, Founder and Executive Vice President, Striim.

Download full article here

Real-time Data Driven App Use Case: SF Marathon Business Impact

This weekend I ran across an interesting real-life tale where the WebAction Point-of-Sale Data Driven App could have saved (and made) Whole Foods thousands of dollars in profit. For some businesses, the huge number of participants running through downtown for the annual SF Marathon equated to increased sales. Unfortunately for the Potrero Hill Whole Foods, its proximity to the race course had the opposite effect. Rhode Island St. at 17th Ave. and Mariposa St. was blocked, and the diverted access to the store was not obvious (I found it). The normally bustling parking lot held only five cars at 11:15 AM – fantastic for me, but certainly not for business. Those customers might have come back later, or they went to Safeway.

Data Driven Apps to the Rescue

Maybe the store manager should have noticed the empty store and lack of sales and appointed a team member to direct traffic to the inconspicuous entrance on Mariposa St. to drive people into the store. But likely the manager was otherwise engaged, handling inventory, payroll, and other pressing issues in the back. If he had had the benefit of a Point-of-Sale Data Driven App monitoring retail transactions across all national point-of-sale terminals, he would have received a real-time alert about the dip in sales, allowing him to assess the situation and get a team member into action, saving thousands of dollars in revenue for the day. Another tangible example of harnessing big data streams for real-time action.