Cloud Data Migration – Moving Legacy Systems to the Cloud using Change Data Capture

Cloud infrastructure and platform services are a popular way for businesses to modernize legacy IT solutions: lifting and shifting them for rehosting, revising them (for example, by extending the code), or rebuilding them as new applications. However, moving an application to run in a VM or container in the cloud, or revising it to make the best use of cloud technologies, is only part of the problem. Migrating the data is an equally large part of the task, and it largely determines how much downtime existing application users will face during the migration. For a large percentage of business applications, the acceptable amount of downtime approaches zero, yet moving large amounts of data and testing the applications in the cloud can take days, weeks, or even months.

When businesses build new applications in the cloud, migrating data from existing, relevant systems is also a crucial step in getting the project going. If the source systems are critical production environments, how the data is extracted and loaded into the new cloud environment plays a big role in keeping those systems performant. Below are several critical data integration considerations for cloud migration, whether the goal is moving existing databases to the cloud or moving relevant data sets for new services and applications.

Key Considerations for Cloud Migration

Real-Time Data Flow Across Heterogeneous Systems to Minimize Downtime: In most cases, when businesses move their data to a new cloud environment, the target database is not the same as the source system. In addition, for operational systems, data migration carries the risk of interrupting business processes and of losing data during the data movement.

The modern approach to data migration combines an initial batch load with real-time change data capture (CDC) technology, so the source system does not need to be paused for the migration. While the initial load runs, the CDC component captures the changes made in the source system, then applies them to the target once the load completes. As a result, the switchover to the target system can happen as soon as the cloud environment has fully caught up with the transactions that occurred in the source system and has been thoroughly tested with live data. Log-based, non-intrusive CDC allows online database migration by enabling source systems to continue to run while the data moves to the target system in real time.
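The load-then-catch-up sequence described above can be sketched in a few lines of Python. This is a minimal, in-memory illustration rather than any product's implementation: the `Change` record, the `lsn` (log sequence number) field, and the `migrate` function are all hypothetical names chosen for this example, and plain dictionaries stand in for the source snapshot and the cloud target.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One captured change (hypothetical shape, for illustration only)."""
    lsn: int                     # position of the change in the source log
    op: str                      # "insert", "update", or "delete"
    key: int
    value: Optional[str] = None


def apply_change(target: dict, change: Change) -> None:
    """Apply one change idempotently, so replaying it is harmless."""
    if change.op == "delete":
        target.pop(change.key, None)
    else:  # inserts and updates are both treated as upserts
        target[change.key] = change.value


def migrate(snapshot: dict, snapshot_lsn: int, log: list[Change]) -> tuple[dict, int]:
    """Bulk-load the snapshot, then replay only the changes logged after it."""
    target = dict(snapshot)                # initial batch load
    applied_lsn = snapshot_lsn
    for change in sorted(log, key=lambda c: c.lsn):
        if change.lsn <= applied_lsn:      # already reflected in the target
            continue
        apply_change(target, change)
        applied_lsn = change.lsn
    return target, applied_lsn


# The source kept running while the snapshot (taken at LSN 10) was loaded.
snapshot = {1: "alice", 2: "bob"}
log = [
    Change(5, "update", 1, "alice"),       # older than the snapshot: skipped
    Change(11, "update", 2, "bobby"),
    Change(12, "insert", 3, "carol"),
    Change(13, "delete", 1),
]
target, caught_up_to = migrate(snapshot, 10, log)
print(target, caught_up_to)  # {2: 'bobby', 3: 'carol'} 13
```

Tracking the last applied position is what lets the switchover wait until the target has caught up, and it also means a change that is delivered twice is simply skipped.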

Transactional Consistency: For OLTP systems that are moved to the cloud, ensuring transactional consistency between the legacy and the new system is an essential requirement, one that a batch extract-and-load approach cannot guarantee. The data integration solution should maintain the transactional integrity of the business events and provide some form of validation that all the data has been moved to the target with no data loss. Some CDC offerings come with mechanisms that track data movement and processing to prevent data loss or duplicates, and that validate that the source and target databases are consistent before the new cloud-based database takes over.
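One simple form of the validation mentioned above is to compare a row count and an order-independent checksum for each table on both sides. The sketch below assumes rows are available as `(key, value)` pairs; `table_fingerprint` and `is_consistent` are hypothetical helper names, not part of any CDC product, and a real check would typically run inside each database rather than in application code.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of (key, value) rows.

    Per-row hashes are summed modulo 2**256, so the result does not
    depend on the order in which the rows are read.
    """
    count = 0
    checksum = 0
    for key, value in rows:
        row_hash = hashlib.sha256(repr((key, value)).encode("utf-8")).digest()
        checksum = (checksum + int.from_bytes(row_hash, "big")) % (1 << 256)
        count += 1
    return count, checksum


def is_consistent(source_rows, target_rows):
    """True when both sides hold the same set of rows."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)


source = [(1, "alice"), (2, "bob"), (3, "carol")]
target_ok = [(3, "carol"), (1, "alice"), (2, "bob")]   # same rows, other order
target_bad = [(1, "alice"), (2, "bobby"), (3, "carol")]

print(is_consistent(source, target_ok))    # True
print(is_consistent(source, target_bad))   # False
```

A matching fingerprint is strong evidence that nothing was lost or duplicated, but it only describes the moment it was computed, so it is most useful once the target has caught up and just before the switchover.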

Combining initial load with real-time change data capture enables a seamless transition to cloud environments without database downtime.

Scalability for Initial Load: The data integration solution should also support high-performance initial loads, since that step can take a long time when data volumes are very large. A solution designed for easy and inexpensive scaling should be preferred, so that the investment delivers long-term value. Solutions with parallelization capabilities might be a better fit for high-volume environments.
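As a rough illustration of what parallelization can mean for the initial load, the sketch below splits a table's key range into chunks and copies the chunks concurrently using Python's standard `concurrent.futures` module. The function names and the in-memory `source` dictionary are assumptions made for this example; in practice each chunk would be a range query against the source database, which is where the concurrency pays off.

```python
from concurrent.futures import ThreadPoolExecutor


def key_ranges(min_key, max_key, chunk_size):
    """Split the inclusive range [min_key, max_key] into chunks."""
    return [
        (lo, min(lo + chunk_size - 1, max_key))
        for lo in range(min_key, max_key + 1, chunk_size)
    ]


def copy_chunk(source, key_range):
    """Read one key range from the source (stand-in for a range query)."""
    lo, hi = key_range
    return {k: v for k, v in source.items() if lo <= k <= hi}


def parallel_initial_load(source, chunk_size, workers=4):
    """Load all rows by copying key-range chunks on a pool of workers."""
    if not source:
        return {}
    ranges = key_ranges(min(source), max(source), chunk_size)
    target = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in pool.map(lambda r: copy_chunk(source, r), ranges):
            target.update(chunk)   # chunks are merged on the calling thread
    return target


source = {key: f"row-{key}" for key in range(1, 11)}
print(key_ranges(1, 10, 4))                        # [(1, 4), (5, 8), (9, 10)]
print(parallel_initial_load(source, 4) == source)  # True
```

Because the chunks cover disjoint key ranges, workers never write the same row, and adding workers or shrinking the chunk size is a straightforward way to scale the load.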

Post-Migration, Continuous Data Integration: After operational systems go into production in the cloud, they need to be connected to relevant in-house or cloud-based business systems to keep their data up to date. Continuous real-time data pipelines are therefore necessary after the migration phase as well. Using a single cloud integration solution throughout minimizes development effort and reduces risk by simplifying the solution architecture.