Streaming Cloud Integration: Key Data Considerations for Hybrid and Multicloud Architectures

Hybrid Cloud: Offloading Reporting and Analytics to the Cloud

With a wide range of advanced analytics and artificial intelligence solutions available in the cloud, organizations are fast adopting cloud-based analytics to gain new insights quickly and cost-effectively. In general, cloud analytics solutions can be evaluated in two groups:

  1. Long-term analytics, where data is first delivered to cloud storage or a cloud data warehouse solution. Advanced analytics services use these environments to work with higher-latency data for strategic decision-making, including building new machine learning models.
  2. Real-time analytics, where data is streamed in an event-based fashion into the cloud. Data might be streamed in real time to operational data stores or messaging solutions, such as Amazon Kinesis, Google Cloud Pub/Sub, and Azure Event Hubs, to feed streaming analytics services in the cloud for time-sensitive insights.

The next section looks at the key integration considerations for both of these use cases.

Key Data Considerations for Cloud-Based Analytics

Low-Latency, Even with High Data Volumes: For real-time analytics, the need for a real-time data integration solution is straightforward: fast queries on outdated data cannot provide relevant insights that can influence ongoing operations. Batch ETL is often not fast enough for time-sensitive analytics, especially when the data volumes are large.

Even event-based integration solutions can struggle in high-volume, high-velocity environments to keep the end-to-end latency to seconds. Therefore, it is essential to evaluate the data sources and their volumes, as well as the scalability of the real-time data integration solution, to meet the source data type and latency requirements.
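One way to make the latency requirement testable during an evaluation is to compare the time each event was created at the source with the time it arrived in the cloud target. The sketch below is illustrative only: the function names, the millisecond timestamps, and the choice of a nearest-rank percentile are assumptions, not part of any particular integration product.

```python
import math

def end_to_end_latencies(events):
    """Return delivered_ms - created_ms for each (created_ms, delivered_ms) pair."""
    return [delivered - created for created, delivered in events]

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def meets_latency_target(events, target_ms, pct=95):
    """True if the chosen percentile of end-to-end latency is within the target."""
    return percentile(end_to_end_latencies(events), pct) <= target_ms

# Hypothetical sample: (created at source, delivered to target), in milliseconds.
sample = [(0, 500), (1000, 1800), (2000, 2400), (3000, 9000)]
print(percentile(end_to_end_latencies(sample), 95))
print(meets_latency_target(sample, target_ms=5000))
```

Checking a high percentile rather than the average is a deliberate choice here: a single slow event under load, like the last one in the sample, is exactly what an average would hide.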

Rich Data Sources for Bulk and Real-Time Data Loading: An additional consideration for the long-term analytics use case is that an initial load may be needed when setting up a new cloud analytics solution. Data from existing relevant sources – such as operational databases, Hadoop, and messaging systems – needs to be migrated to the cloud environment. For this purpose, prefer a data integration solution that can move data both in bulk and in real time from a wide range of sources, including machine and IoT data, so that rich analytics can be performed in the cloud.
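One common way to combine bulk and real-time loading, assumed here rather than prescribed by the text, is to take a snapshot of the source at a known log position and then apply only the change events recorded after that position. The sketch below simulates this with in-memory dictionaries; the record layout, the `position` field, and the function names are hypothetical.

```python
def bulk_load(snapshot_rows):
    """Initial load: copy every snapshot row into the target, keyed by id."""
    return {row["id"]: dict(row) for row in snapshot_rows}

def apply_changes(target, changes, snapshot_position):
    """Apply change events newer than the snapshot; return how many were applied."""
    applied = 0
    for change in sorted(changes, key=lambda c: c["position"]):
        if change["position"] <= snapshot_position:
            continue  # already reflected in the snapshot
        if change["op"] == "delete":
            target.pop(change["id"], None)
        else:  # treat insert and update alike, as an upsert
            target[change["id"]] = dict(change["row"])
        applied += 1
    return applied

# Hypothetical source state captured at log position 100.
target = bulk_load([{"id": 1, "status": "new"}, {"id": 2, "status": "paid"}])
changes = [
    {"position": 90, "op": "update", "id": 1, "row": {"id": 1, "status": "stale"}},
    {"position": 101, "op": "update", "id": 1, "row": {"id": 1, "status": "shipped"}},
    {"position": 102, "op": "insert", "id": 3, "row": {"id": 3, "status": "new"}},
    {"position": 103, "op": "delete", "id": 2, "row": None},
]
print(apply_changes(target, changes, snapshot_position=100), target)
```

Skipping events at or before the snapshot position keeps the handoff from bulk to real-time loading from applying the same change twice.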

In-Memory Pre-Processing for a Smart, Streaming Data Architecture: While data latency requirements differ, in both the long-term and real-time analytics use cases data needs to be prepared before it can be used for analytics. Filtering, joining, masking, and enrichment, as well as transformations such as denormalization, are common processing steps applied before data is written in a consumable format to drive insights, regardless of whether the analytics solution runs on-premises or in the cloud.

However, where this preparation step is performed affects the performance and efficiency of the analytics solution. On-disk transformation can introduce significant latency depending on the processing complexity, making the insights less relevant for time-sensitive use cases. For the real-time analytics use case, preparing the data-in-motion using in-memory computing enables users to keep data latency low – very close to real time.
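As a rough illustration of preparing data in motion, the sketch below filters, masks, and enriches events one at a time in memory, without staging them on disk. The event fields, the customer lookup table, and the masking rule are all hypothetical; a real deployment would use whatever its streaming platform provides for these steps.

```python
def mask_email(email):
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"

def prepare(events, customers):
    """Lazily filter, mask, and enrich order events as they stream through."""
    for event in events:
        if event.get("test"):
            continue  # filter: drop test traffic before it reaches the cloud
        # Enrich (an in-memory join) with reference data held in a dict.
        customer = customers.get(event["customer_id"], {})
        yield {
            "order_id": event["order_id"],
            "amount": event["amount"],
            "email": mask_email(event["email"]),  # mask sensitive field
            "region": customer.get("region", "unknown"),  # denormalized column
        }

# Hypothetical reference data and incoming events.
customers = {"c1": {"region": "EMEA"}}
events = [
    {"order_id": 1, "customer_id": "c1", "amount": 40, "email": "alice@example.com"},
    {"order_id": 2, "customer_id": "c2", "amount": 15, "email": "bob@example.com", "test": True},
]
for record in prepare(events, customers):
    print(record)
```

Because `prepare` is a generator, each event is transformed as it arrives and nothing is buffered beyond the small lookup table, which is the property that keeps added latency low.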

The in-memory stream processing approach is crucial not only for working with relevant, up-to-the-second operational data but also for handling the large data volumes in the source systems. For example, if the analytics solution requires IoT sensor data, filtering out data sets that are not applicable to the analysis, or removing redundancy, helps manage the data volumes stored in the cloud environment. Stream processing makes it possible to design a data architecture that moves only the data needed, in the required format.
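The IoT example can be sketched as a small filter that drops sensors the analysis does not need and suppresses readings that barely differ from the last value forwarded for the same sensor. The sensor IDs, the threshold, and the function name are assumptions made for illustration.

```python
def reduce_readings(readings, wanted_sensors, min_change):
    """Yield only in-scope readings that differ enough from the last one sent."""
    last_sent = {}  # most recently forwarded value per sensor
    for sensor_id, value in readings:
        if sensor_id not in wanted_sensors:
            continue  # not applicable to the analysis
        previous = last_sent.get(sensor_id)
        if previous is not None and abs(value - previous) < min_change:
            continue  # redundant: too close to what was already forwarded
        last_sent[sensor_id] = value
        yield sensor_id, value

# Hypothetical stream of (sensor_id, value) readings.
stream = [("t1", 20), ("t1", 20), ("t2", 5), ("t1", 21),
          ("t1", 23), ("x9", 99), ("t2", 5)]
kept = list(reduce_readings(stream, wanted_sensors={"t1", "t2"}, min_change=2))
print(kept)  # [('t1', 20), ('t2', 5), ('t1', 23)]
```

In this sample, three of the seven readings reach the cloud. The reduction achieved in practice depends entirely on how repetitive the source data is and how large a change the analysis considers meaningful.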

Cloud Use and Storage Optimization: In the case of long-term analytics, in-flight data preparation can provide value through a similar smart data architecture. When dealing with large data volumes flowing from a wide range of sources, filtering out what is not needed is a sure way to bring some efficiency and cost savings for cloud storage and analytics. In addition, using transformed, enriched, masked data in the cloud analytics environment reduces the time to prepare the data for analytics, leading to faster time-to-insight.

Using real-time change data capture (CDC) with in-flight data processing, cloud analytics can deliver new operational value faster.