Change Data Capture Methods
There are several change data capture methods that have been in-use for decades, each with its own merits depending on the use case. In high-velocity data environments where time-sensitive decisions are made, low-latency, reliable, and scalable CDC-powered data flows are imperative.
The business transactions captured in relational databases are critical to understanding the state of business operations. Traditional batch-based approaches to move data once or several times a day introduce latency and reduce the operational value to the organization. Change Data Capture provides real-time or near real-time movement of data by moving and processing data continuously as new database events occur. Moving the data continuously, throughout the day, also uses network bandwidth more efficiently.
There are several change data capture methods to identify changes that need to be captured and moved. Here are the common methods, how they work, and their advantages as well as shortcomings:
Timestamps – By using existing “LAST_UPDATED” or “DATE_MODIFIED” columns, or by adding one if not available in the application, you can create your own CDC solution at the application level. This approach retrieves only the rows that have been changed since the data was last extracted. There may be issues with the integrity of the data in this method – for instance, if a row in the table has been deleted, there will be no DATE_MODIFIED column for this row and the deletion will not be captured. This approach also requires CPU resources to scan the tables for the changed data and maintenance resources to ensure that the DATE_MODIFIED column is applied reliably across all source tables.
Table Differencing – By comparing the tables to be replicated in the source and target systems by running a diff, this approach only loads the data that is different to enable consistency. Although this works better for managing deleted rows, the CPU resources required to identify the differences is significant and the requirement increases in line with the volume of data. The diff method also introduces latency and cannot be performed in real time.
Triggers – Another method for building CDC at the application level is defining triggers and creating your own change log in shadow tables. Triggers fire before or after INSERT, UPDATE, or DELETE commands (that indicate a change) and are used to create a change log. Operating at the SQL level, some users prefer this approach. However, triggers are required for each table in the source database, and they have greater overhead associated with running triggers on operational tables while the changes are being made. In addition to having a significant impact on the performance of the application, maintaining the triggers as the application change leads to management burden. Many application users do not want to risk the application behavior by introducing triggers to operational tables.
Log-Based Change Data Capture – Databases contain transaction (sometimes called redo) logs that store all database events allowing for the database to be recovered in the event of a crash. With log-based CDC, new database transactions – including inserts, updates, and deletes – are read from source databases’ transaction or redo logs. The changes are captured without making application level changes and without having to scan operational tables, both of which add additional workload and reduce source systems’ performance.
Of all of these change data capture methods, the Striim platform uses log-based CDC to continuously extract and move the changed data. Striim also filters, transforms, aggregates, masks, and enriches change data while it is in-motion, allowing it to be delivered to a variety of targets in the format required with sub-second latency.
Built-in database delivery validation capabilities compare the sources and targets as transactions are replicated. This tracks key performance metrics for data pipelines and validates source and target database consistency for zero data loss.
For more information about change data capture methods, and how log-based change data capture delivers real-time data from transactional systems, please see our next blog post titled, Log-Based Change Data Capture: the Best Method for CDC.
If you would like a demo of to see how low-impact, real-time log-based change data capture works, or to talk to one of our CDC experts, please schedule a demo.