Microsoft® Azure® HDInsight® is a fully managed cloud service on Azure for open-source analytics. It lets customers use popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R, and more in the Azure cloud. Azure HDInsight supports a broad range of use cases, including data warehousing, machine learning, and IoT analytics.
With Azure HDInsight, companies gain the capabilities of a powerful, reliable cloud service for big data analytics, with higher developer productivity and lower management costs. To take full advantage of this, companies need to be able to move data into and out of HDInsight in real time.
Striim is a cloud-hosted platform that enables the continuous movement of data, both into HDInsight from a wide variety of data sources and out of HDInsight to a wide variety of data targets.
When Azure customers sign up for HDInsight to run analytics workloads in the cloud on Hadoop, Kafka, or Spark, they need to set up data flows from their on-premises and other cloud-based data sources to their analytics environment on HDInsight.
Striim provides continuous, real-time data integration to HDInsight from enterprise databases (using low-impact change data capture, or CDC), log files, messaging systems, sensors, and Hadoop solutions. It offers a secure, reliable, and scalable service for real-time collection, preparation, and movement of unstructured, semi-structured, and structured data into Kafka, Hadoop, and Spark on Azure HDInsight.
While the data is streaming, Striim enables in-flight processing and enrichment before delivering it to Kafka, HDFS, HBase, Hive, or Spark. HDInsight customers can therefore store the data in the right format, which helps them gain insight from their analytics applications faster.
Striim moves data from on-premises and other cloud-based data stores to Azure HDInsight in real time. It simplifies the move to an agile, modern data architecture by replacing traditional batch processing with non-intrusive streaming data integration. Real-time data flow with in-flight processing drives more operational value from analytics applications on HDInsight. In addition, customers can build machine learning applications in Azure HDInsight using high-velocity data delivered continuously from on-premises sources.
Striim runs directly in the HDInsight cluster and can be deployed as part of the HDInsight services. When an HDInsight user provisions their service, they can add Striim as a recommended add-on service to the same cluster with just a few clicks.
Real-time big data pipelines in the cloud to Hadoop, Kafka, and Spark
Seamless user experience via easy provisioning within the Azure HDInsight cluster
Data distribution back to on-premises systems and users