In this post, we demonstrate how to use Striim to monitor network traffic, and showcase Striim's ability to collect, integrate, process, analyze, and visualize large volumes of streaming data in real time.
We use UNSW-NB15, a popular dataset for network traffic analysis. It contains about 2 million records of simulated tcpdump data covering both normal and malicious traffic. The application pipeline is as follows:
- Striim first reads network traffic data from the source file.
- Striim then extracts features from the data records, including the IP addresses and port numbers of the source and destination, the transaction timestamp, and the transaction bytes. It also transforms data on the fly, e.g., by converting units from bytes to megabytes.
- Striim aggregates the total bytes of network traffic data for each IP address per minute. It finds hot IPs that have the heaviest network traffic.
- Striim predicts network traffic one step ahead with machine learning models (such as random forest).
- Striim flags an anomaly when the actual network traffic deviates far from the predicted value.
- Finally, Striim writes the results to the target file.
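The extraction and aggregation steps above can be sketched in a few lines of plain Python. This is only an illustration of the logic, not Striim's own query language or API. The `Record` fields, the six-column comma-separated input layout, the decimal megabyte conversion, and the choice to key on the source IP are all assumptions made for the example; the actual UNSW-NB15 files have many more columns.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


class Record(NamedTuple):
    """Hypothetical simplified record; not the real UNSW-NB15 schema."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    timestamp: int  # epoch seconds
    num_bytes: int


def parse_line(line: str) -> Record:
    """Extract the features of interest from one comma-separated line."""
    src_ip, src_port, dst_ip, dst_port, ts, nbytes = line.strip().split(",")
    return Record(src_ip, int(src_port), dst_ip, int(dst_port), int(ts), int(nbytes))


def aggregate_per_minute(records: Iterable[Record]) -> Dict[Tuple[str, int], float]:
    """Sum traffic in megabytes per (source IP, minute bucket)."""
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    for r in records:
        minute = r.timestamp // 60  # one-minute tumbling window
        totals[(r.src_ip, minute)] += r.num_bytes / BYTES_PER_MB
    return dict(totals)


def hot_ips(totals: Dict[Tuple[str, int], float], minute: int, k: int = 3) -> List[Tuple[str, float]]:
    """Return the k IPs with the heaviest traffic in the given minute."""
    in_minute = [(ip, mb) for (ip, m), mb in totals.items() if m == minute]
    return sorted(in_minute, key=lambda pair: pair[1], reverse=True)[:k]
```

Calling `aggregate_per_minute(parse_line(l) for l in lines)` and then `hot_ips(totals, minute)` gives the heaviest senders for one minute. A streaming engine would compute the same result incrementally as windows close rather than over a finished list.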
The models are updated periodically to maintain high prediction performance on continuous data streams. As shown in the image above, Striim detects some peaks and troughs as anomalies (red points). The intuition is that a sudden increase or drop in network traffic is abnormal. The monitoring dashboard is updated in real time, so a newly detected anomaly shows up immediately.
Striim supports a number of preprocessing methods and machine learning models. In our pipeline, we apply a log transformation to stabilize volatile data, fit random forest models to the transformed data, and use a dynamic percentage error threshold to detect anomalies. This combination performed well in our empirical evaluation. It is also efficient enough to evolve continuously with the unbounded streaming data and to support real-time decision making. Some models, like deep neural networks, are time-consuming to retrain and thus not well suited to streaming settings.
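To make the detection logic concrete, here is a minimal Python sketch of the log transform and a dynamic percentage error threshold. Several parts are assumptions rather than Striim's actual implementation. The predictor is a simple mean of recent values standing in for the random forest. "Dynamic" is interpreted as mean plus `k` standard deviations of recent percentage errors, with a floor of `min_threshold`. The error is measured on the log-transformed values. All parameter names and defaults are hypothetical.

```python
import math
from collections import deque
from statistics import mean, pstdev
from typing import Deque, List, Sequence, Tuple


def predict_next(history: Sequence[float], lags: int) -> float:
    """Stand-in one-step-ahead predictor (NOT a random forest):
    the mean of the last `lags` log-transformed values."""
    return mean(list(history)[-lags:])


def detect_anomalies(
    traffic_mb: Sequence[float],
    lags: int = 3,
    window: int = 10,
    k: float = 3.0,
    min_threshold: float = 0.05,
    min_errors: int = 5,
) -> List[Tuple[int, float, float]]:
    """Return (index, value, percentage_error) for each flagged point."""
    history: Deque[float] = deque(maxlen=window)  # recent log-transformed values
    errors: Deque[float] = deque(maxlen=window)   # recent non-anomalous errors
    anomalies: List[Tuple[int, float, float]] = []
    for i, value in enumerate(traffic_mb):
        actual = math.log1p(value)  # log transform to stabilize volatile data
        if len(history) < lags:
            history.append(actual)  # still warming up the predictor
            continue
        predicted = predict_next(history, lags)
        pct_error = abs(actual - predicted) / max(abs(predicted), 1e-9)
        ready = len(errors) >= min_errors
        # The threshold adapts to the recent error distribution.
        threshold = max(min_threshold, mean(errors) + k * pstdev(errors)) if ready else min_threshold
        if ready and pct_error > threshold:
            anomalies.append((i, value, pct_error))
            history.append(predicted)  # keep the spike out of future predictions
        else:
            errors.append(pct_error)
            history.append(actual)
    return anomalies
```

For example, `detect_anomalies([10.0, 11.0] * 6 + [500.0])` flags only the spike at index 12. Because both deques are bounded, the threshold follows recent behavior, loosely mirroring the periodic model updates described earlier.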
For more details, please see our 2019 DEBS paper: "A Demonstration of Striim: A Streaming Integration and Intelligence Platform."
Striim won the Best Demo Award at the 13th ACM International Conference on Distributed and Event-based Systems (DEBS).