Moving Data to Amazon Web Services in Real Time

Transcript:

Adopting Amazon Web Services is important to your business, but why are real-time data movement through streaming integration, change data capture, and stream processing necessary parts of this process? You’ve already decided that you want to adopt Amazon Web Services. This could be Amazon RDS, Amazon Aurora, Amazon Redshift, Amazon S3, Amazon Kinesis, Amazon EMR, or any number of other technologies. You may want to migrate existing applications to AWS, scale elastically as necessary, or use the cloud for analytics or machine learning. But running applications in AWS as VMs or containers is only part of the problem. You also need to consider how to move your data to the cloud, keep your applications and analytics always up to date, and make sure the data is in the right format to be valuable. The most important starting point is ensuring you can stream data to the cloud in real time. Batch data movement can cause unpredictable load on cloud targets, and it has high latency, meaning the data is often hours old by the time it arrives.

For many applications, having up-to-the-second information is essential, for example to provide current customer information, deliver accurate business reporting, or support real-time decision making. Streaming data from on-premises systems to Amazon Web Services requires the appropriate data collection technology for each source. For databases, this is change data capture, or CDC, which directly and continuously intercepts database activity and collects all the inserts, updates, and deletes as events, as they happen. Log data requires file tailing, which reads from the end of one or more files, potentially across multiple machines, and streams the latest records as they are written. Other sources, such as IoT devices or third-party SaaS applications, also require specific treatment to ensure their data can be streamed in real time.
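To make the file-tailing pattern concrete, here is a minimal sketch in Python using only the standard library. The file path and polling interval are illustrative assumptions; a production collector such as Striim’s also handles file rotation, multiple files, and multiple machines, none of which is shown here.

```python
import os
import time

def tail(path, poll_interval=1.0):
    """Yield new lines appended to `path`, starting from the current end of file."""
    with open(path, "r") as f:
        f.seek(0, os.SEEK_END)  # skip existing content; stream only new records
        while True:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(poll_interval)  # wait for the writer to append more

# "app.log" is a placeholder path for illustration.
for record in tail("app.log"):
    print(record)  # in practice, forward each record into the streaming pipeline
```

The key design point is seeking to the end of the file first, so only records written after the collector starts are streamed, mirroring how CDC emits only new change events rather than re-reading history.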

Once you have streaming data, the next consideration is what processing is necessary to make that data valuable for your specific AWS destination, and this depends on the use case. For database migration or elastic scalability use cases, the target schema is similar to the source, so moving raw data from on-premises databases to Amazon RDS or Amazon Aurora may be sufficient. The important consideration here is that the source applications typically cannot be stopped, and it takes time to do an initial load. That’s why collecting and delivering database changes during and after the initial load is essential for zero-downtime migrations. For real-time applications sourcing from Amazon Kinesis, or analytics use cases built on Amazon Redshift or Amazon EMR, it may be necessary to perform stream processing before the data is delivered to the cloud. This processing can transform the data structure and enrich it with additional context information while the data is in flight, adding value to the data and optimizing downstream analytics; a brief sketch of this kind of enrichment appears below.

With Striim’s streaming integration platform, we continuously collect data from on-premises or other cloud sources and deliver it to all of your Amazon Web Services endpoints, taking care of initial loads as well as CDC for the continuous application of changes. These data flows can be created rapidly, and monitored and validated continuously, through our intuitive UI. With Striim, your cloud migration, scaling, and analytics are built and iterated on at the speed of your business, ensuring your data is always where you want it, when you want it.
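As an illustration of the in-flight enrichment and delivery pattern described above, here is a minimal sketch using boto3, the AWS SDK for Python. The stream name, region, lookup table, and event fields are all hypothetical; Striim expresses this kind of flow declaratively rather than in hand-written code.

```python
import json
import boto3  # AWS SDK for Python

kinesis = boto3.client("kinesis", region_name="us-east-1")  # region is an assumption

# Hypothetical reference data used to enrich each change event in flight.
CUSTOMER_SEGMENTS = {"C001": "enterprise", "C002": "smb"}

def enrich(event):
    """Add context (here, a customer segment) to a raw change event in flight."""
    event["segment"] = CUSTOMER_SEGMENTS.get(event["customer_id"], "unknown")
    return event

def deliver(event, stream_name="orders-stream"):  # stream name is a placeholder
    """Deliver one enriched event to an Amazon Kinesis data stream."""
    kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event["customer_id"],  # preserves per-customer ordering
    )

# Example: a change event captured via CDC, enriched, then delivered.
deliver(enrich({"customer_id": "C001", "op": "UPDATE", "amount": 42.50}))
```

Because enrichment happens before the record is written, the events landing in Kinesis already carry the context downstream consumers need, which is the point of processing data while it is still in flight.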