By Irem Radzik, with contributions from Katherine Rincon and Steve Wilkes
McKinsey estimates that the growth in spending on cloud initiatives will continue at more than six times the rate of general IT spending through 2020.1 The benefits are clear: elastic scalability, agility to launch innovative solutions quickly using modern technologies, better cash flow management, and lower expenses with a pay-as-you-go model, just to name a few.
These benefits are realized across a variety of use cases, such as offloading reporting and analytics to the cloud, performing dev/test in the cloud, and building modern applications in the cloud. The benefit achieved from cloud computing is directly correlated with the extent to which a business runs its core operations on modern cloud infrastructure: the more functions and systems that run in the cloud, the greater the gains in agility, innovation, and cost.
Hybrid cloud architectures are becoming increasingly commonplace as organizations move critical applications and systems to the cloud while maintaining some existing systems in-house. Without reliable operational data in the public cloud, businesses cannot use the cloud environment to support their core business functions. Streaming data integration, which continuously collects and moves data between on-premises systems and cloud environments while processing it in-stream, is therefore integral to hybrid cloud success.
The traditional batch ETL approach of loading data to the cloud once, or a few times, a day limits both the possible cloud use cases and the value they deliver. High-latency data cannot support dynamically changing business functions. In a large digital business, core customer-facing operations can generate millions or even billions of events per second that need to be shared with the cloud services supporting operations.
Furthermore, when transactional systems must be synchronized with cloud environments to support business operations, maintaining transactional integrity matters. Batch ETL processing strips the business transaction context from the data and cannot capture the fast-paced changes to business data that occur between extracts. This reduces the ability to run operational workloads in the cloud and limits the depth of insight in analytics use cases.
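To make the transactional-context point concrete, the following is a minimal Python sketch, with invented event and field names, of how a change-data-capture stream can be regrouped into whole source transactions before being applied to a cloud target, rather than applied row by row with the transaction boundaries lost:

```python
from collections import defaultdict

# Hypothetical change events as a CDC reader might emit them: each event
# carries the transaction id assigned by the source database, so downstream
# consumers can preserve transaction boundaries.
events = [
    {"txid": 1, "table": "orders", "op": "INSERT", "row": {"id": 10, "total": 99}},
    {"txid": 1, "table": "items",  "op": "INSERT", "row": {"order_id": 10, "sku": "A"}},
    {"txid": 2, "table": "orders", "op": "UPDATE", "row": {"id": 7, "total": 120}},
]

def apply_transaction(target, txid, ops):
    """Apply all operations of one source transaction as a single unit.

    A real target would wrap this in its own transaction (BEGIN/COMMIT);
    here we simply record the batch to show the grouping."""
    target.append((txid, ops))

def replicate(events, target):
    # Group the interleaved event stream back into whole transactions,
    # then apply each group atomically, in commit order.
    by_tx = defaultdict(list)
    for ev in events:
        by_tx[ev["txid"]].append(ev)
    for txid in sorted(by_tx):
        apply_transaction(target, txid, by_tx[txid])

target = []
replicate(events, target)
# target now holds two units of work, one per source transaction
```

Because the order header and its line items arrive at the target together, a cloud-side consumer never observes a half-applied transaction, which is exactly what row-at-a-time batch loading cannot guarantee.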
As a result, a real-time streaming data integration solution that can maintain transactional context becomes a necessity for connecting on-premises and cloud systems.
Hybrid-cloud, multicloud, or inter-cloud architecture strategies all need a modern, real-time data integration solution that keeps up with the speed of business.
Streaming data integration across cloud-based and in-house systems is difficult to do at scale. The sheer volume of events, combined with requirements for zero data loss and exactly-once processing, increases the challenge of keeping these data stores up to date. Hybrid cloud architectures therefore need a streaming data integration solution that addresses the following additional requirements:
Enterprise-Grade Recovery from Failure: As cloud applications and systems become increasingly business-critical, the streaming data integration infrastructure that supports them needs to be able to handle recovery and failover in an enterprise-grade manner. This requirement includes the ability to handle changes to data structures on the fly, and to replay data streams in the event of failure to ensure zero data loss.
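One common way to meet the replay requirement is offset checkpointing. The Python sketch below (the class name, file layout, and checkpoint format are illustrative assumptions, not any specific product's mechanism) persists the offset of the last fully processed event, so that after a crash the pipeline replays the source stream from that point with no data loss and no duplicates:

```python
import json
import os
import tempfile

class CheckpointedPipeline:
    """Minimal sketch of checkpoint-based recovery for a streaming pipeline."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path

    def _load_offset(self):
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                return json.load(f)["offset"]
        return 0

    def _save_offset(self, offset):
        # Write-then-rename so a crash mid-write cannot corrupt the checkpoint.
        tmp = self.checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"offset": offset}, f)
        os.replace(tmp, self.checkpoint_path)

    def run(self, source, sink, fail_at=None):
        offset = self._load_offset()
        for i, event in enumerate(source):
            if i < offset:
                continue                      # already processed before the failure
            if fail_at is not None and i == fail_at:
                raise RuntimeError("simulated crash")
            sink.append(event)
            self._save_offset(i + 1)

# Demo: crash partway through, then restart and resume from the checkpoint.
ckpt = os.path.join(tempfile.gettempdir(), "demo_pipeline.ckpt")
if os.path.exists(ckpt):
    os.remove(ckpt)

source = ["e0", "e1", "e2", "e3"]
sink = []
pipeline = CheckpointedPipeline(ckpt)
try:
    pipeline.run(source, sink, fail_at=2)     # crash after e0, e1
except RuntimeError:
    pass
pipeline.run(source, sink)                    # resumes at e2: no loss, no duplicates
```

Production systems typically checkpoint batches rather than individual events, and coordinate the checkpoint with the sink write so the two cannot diverge, but the recovery principle is the same.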
Security and Privacy Provisions: With greater regulatory compliance requirements, ensuring data security and privacy is paramount. Hybrid cloud requires a single authentication and authorization scheme to protect all aspects of streaming data integration and cloud migration. Any solution must provide role-based security with fine-grained access, as well as encryption for protected resources (like passwords and keys) and for data going over any network.
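As one illustration of in-flight protection, sensitive fields can be masked with a keyed hash before events ever leave the premises. This Python sketch is a simplified assumption (the key, field names, and token length are invented; a real deployment would also send the stream over TLS and draw keys from a vault or KMS):

```python
import hashlib
import hmac

# Illustrative in-stream protection of sensitive fields: raw values are
# replaced with a keyed, non-reversible token before the event is sent to
# the cloud, so protected data never appears in the cloud-bound payload.

SECRET_KEY = b"rotate-me"                 # hypothetical key, normally vault-managed
PROTECTED_FIELDS = {"ssn", "card_number"}

def mask_event(event):
    masked = {}
    for field, value in event.items():
        if field in PROTECTED_FIELDS:
            digest = hmac.new(SECRET_KEY, str(value).encode(), hashlib.sha256)
            masked[field] = digest.hexdigest()[:16]   # stable, non-reversible token
        else:
            masked[field] = value
    return masked

masked = mask_event({"ssn": "123-45-6789", "amount": 10})
```

Because the keyed hash is deterministic, the same source value always yields the same token, so cloud-side analytics can still join and group on the masked field without ever seeing the raw value.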
Simple Configuration and Management of Streaming Data Pipelines: Configuring and managing a streaming data integration infrastructure can be complex. Users should look for technologies that offer an easy-to-use, zero-install, web-based user interface (UI) and a declarative programming language such as SQL to ensure that a wide variety of team members in the organization are able to set up and maintain the data pipelines. It is also necessary for IT organizations to be able to easily monitor and manage the pipelines themselves without involving expensive technical resources.
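To illustrate why a declarative definition lowers the barrier, the sketch below treats the whole pipeline as data, with a small dict standing in for a SQL statement; the spec format is invented for illustration and is not any vendor's language:

```python
# A declarative pipeline definition: what to keep and which columns to emit
# are configuration, not custom code, so the spec can be written, reviewed,
# and monitored by a wider range of team members.
pipeline_spec = {
    "select": ["order_id", "total"],
    "where": lambda ev: ev["total"] > 100,   # stands in for a SQL WHERE clause
}

def run_pipeline(spec, stream):
    # A tiny interpreter: filter each event, then project the named columns.
    for ev in stream:
        if spec["where"](ev):
            yield {k: ev[k] for k in spec["select"]}

stream = [
    {"order_id": 1, "total": 250, "region": "EU"},
    {"order_id": 2, "total": 40,  "region": "US"},
]
results = list(run_pipeline(pipeline_spec, stream))
# results contains only the high-value order, projected to two columns
```

In a real streaming-SQL product the same intent would be a single continuous query; the point is that the pipeline's logic lives in a declaration the platform interprets, not in bespoke code that only specialists can maintain.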
Beyond SaaS offerings for specific business applications, organizations adopt a hybrid cloud infrastructure for different use cases. Streaming integration plays a crucial role in many of these use cases, but particularly for:
In the following chapters, we will examine each of these five use cases and their data-oriented requirements.