Pre-Processing of Streaming Data into Hadoop via HDFS and HBase


With the proliferation of Hadoop, the volume of stored data is growing exponentially. As a result, many companies are looking to streaming technologies to offload the burden on their Big Data infrastructures. Real-time data ingestion combined with streaming analytics enables companies to filter and add structure and context to data the instant it is born. This reduces the volume of stored data, and makes the data that is loaded into Hadoop more accessible and actionable.

There are two options for adding stream processing to a Big Data infrastructure: roll your own with open source, or leverage an end-to-end platform. The challenge with open source technologies is that getting the various components to work together in a streaming fashion is costly and time consuming, especially at scale. To make matters worse, that same expertise is required to maintain the applications as requirements change and as open source technologies evolve and fall in and out of favor.

The alternative is an end-to-end streaming integration and intelligence platform such as Striim. Striim enables companies to get the value they were promised from their Big Data implementations.

Striim provides native HDFS read and write capabilities, as well as support for HBase. This allows companies to add streaming analytics to their Big Data architecture without disrupting or compromising their investment in Hadoop.

The platform enables real-time data ingestion from a wide variety of data sources and, via HDFS, flows those data streams into Hadoop. While the data is streaming, the application can pre-process it, filtering out records that do not meet set parameters and enriching the rest with context that makes the data loaded into Hadoop easier to understand and act on.
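To make the filter-and-enrich idea concrete, here is a minimal, platform-neutral sketch in Python. It is not Striim's API or query language; the event fields (`device`, `temp`), the threshold, and the `DEVICE_LOCATIONS` lookup table are all hypothetical, chosen only to illustrate the pattern of dropping records that fail a condition and attaching context to the ones that pass.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical reference data used for enrichment. In a real deployment
# this context might come from a store such as HBase; here it is a dict.
DEVICE_LOCATIONS: Dict[str, str] = {"dev-1": "Berlin", "dev-2": "Austin"}


def preprocess(
    events: Iterable[dict], min_temp: float, lookup: Dict[str, str]
) -> Iterator[dict]:
    """Filter and enrich events one at a time, as a stream would."""
    for event in events:
        # Filter: skip records that do not meet the set parameter.
        if event["temp"] < min_temp:
            continue
        # Enrich: copy the event and add context from the lookup table.
        enriched = dict(event)
        enriched["location"] = lookup.get(event["device"], "unknown")
        yield enriched


# Illustrative input; a real source would be an unbounded stream.
sample = [
    {"device": "dev-1", "temp": 71.5},
    {"device": "dev-2", "temp": 12.0},
    {"device": "dev-9", "temp": 88.0},
]

for record in preprocess(sample, min_temp=50.0, lookup=DEVICE_LOCATIONS):
    print(record)
```

Because `preprocess` is a generator, each record is handled as it arrives rather than after the whole batch is collected, and only the records that survive the filter would go on to be written to storage.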

Conversely, companies can also use the HDFS and HBase adapters to read data out of Hadoop, leveraging stored data to enrich and provide context to streaming data. This processed data can be further refined via predictive analytics, ensuring the maximum usability of streaming data.

The Striim platform offers a quick and easy solution for maximizing the value of your Hadoop implementation through real-time data integration and streaming analytics, without intruding on your Hadoop infrastructure. Learn how pre-processing of streaming data can help you deliver on the promise of your Big Data investment.
