In this post we cover the built-in SQL Server CDC functionality, how to set it up, and some pros and cons of this feature. We also introduce Striim’s high-speed, reliable SQL Server CDC solution.
- Built-in SQL Server CDC
- Tutorial: How to set up the built-in SQL Server CDC functionality
- Striim for High-Speed, Reliable SQL Server CDC
What is Change Data Capture?
Change Data Capture (CDC) is a software process that identifies and tracks changes in a database. There are various CDC methods, each with their pros and cons, but the main benefit of CDC is that it enables near-real-time data movement from databases. Continuous data capture is important for both analytics use cases (where timely data is used to make decisions) and zero-downtime cloud migrations (where operational databases can’t be taken offline for extended periods).
What is SQL Server?
Microsoft SQL Server is a relational database management system (RDMS) developed by Microsoft. SQL Server is mainly used to store and retrieve data. For the purposes of analysis and business intelligence, data is typically replicated to a data warehouse optimized for large analytics queries.
Change Data Capture is an ideal solution for replication of SQL Server data in near real-time. In this post, we’ll share an overview of both SQL Server’s built-in CDC functionality and Striim’s high-speed SQL Server CDC reader.
SQL Server CDC Methods
Built-in SQL Server CDC
SQL Server offers a built-in CDC functionality, as depicted in the diagram below.
The source table contains the data that’s being inserted, updated, and deleted by applications. The SQL Server transaction log keeps a record of the changes. If CDC is enabled, a SQL Server agent reads inserts, updates, and deletes from the transaction log. These changes are added to a separate change table, where query functions are used to capture the changes so they can be delivered to a target data warehouse or other destination via a data integration process like ETL.
What are the benefits and drawbacks of the built-in SQL Server Change Data Capture feature?
- Built-in feature, no external tools needed
- Uses familiar T-SQL language
- Adding a change table adds overhead to the source database
- Querying change tables adds latency
- Change tables need to be cleaned up on a regular basis
- Since a change table is created for each CDC-enabled table, it can be challenging to consolidate changes from multiple tables (and databases)
Tutorial: How to set up the built-in SQL Server CDC functionality
This tutorial describes how to set up change data capture (CDC) in SQL Server using Microsoft’s AdventureWorks sample database to illustrate the steps.
For additional information see SQL Server setup in the documentation.
The following are required to complete this tutorial:
- Existing SQL Server running instance
- SQL Server management tools
- (Optional) AdventureWorks sample database
Set up Database-Level CDC
CDC must be enabled both at the database and table level in SQL Server. Database-level CDC functionality is enabled by setting is_cdc_enabled to 1 in the sys.databases table.
Invoke the following commands to check the current state of database-level CDC functionality:
USE AdventureWorksDW2019 GO SELECT database_id, name, is_cdc_enabled FROM sys.databases GO
Locate your database (e.g., AdventureWorks2019) in the resultset. If is_cdc_enabled is 0 as shown below, then CDC is not currently enabled at the database level:
Invoke the following commands to enable database-level CDC. These commands use SQL Server’s built-in sys.sp_cdc_enabled_dbd stored procedure to perform the necessary updates:
USE AdventureWorksDW2019 GO EXEC sys.sp_cdc_enable_db GO
Re-invoke the verification commands from the start of this section to verify that is_cdc_enabled has is now set to 1:
Set up Table-Level CDC
In addition to setting up database-level CDC, you must also enable CDC on a per-table basis.
This section describes how to set up and verify table-level CDC on the AdventureWorks database’s DimAccount table.
Invoke the following commands to set up CDC on the DimAccount table. Note the source_schema parameter must be assigned the name of the schema to which the table (DimAccount) belongs, in this case dbo:
USE AdventureWorksDW2019 GO EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'DimAccount', @role_name = NULL GO
Verify that CDC is enabled for the DimAccount table by checking that is_tracked_by_cdc is set to 1 for the database in sys.tables.
Invoke the following command to return all tables listed in sys.tables:
USE AdventureWorksDW2019 GO SELECT object_id, SCHEMA_NAME(Schema_id) As [Schema Name], name As [Table Name], is_tracked_by_cdc FROM sys.tables GO
Locate the row for the DimAccount table and verify that is_tracked_by_cdc is set to 1:
This process results in the creation of a system table prefixed with cdc. called cdc.dbo_DimAccount_CT that can be seen in Object Explorer and further verified by querying the table.
Invoke the following command to verify that cdc.Dbo_DimAccount_CT can be queried and that a row is returned:
USE AdventureWorksDW2019 GO SELECT * from cdc.Dbo_DimAccount_CT GO
You have now successfully configured CDC for a database and table.
Striim for High Speed, Reliable SQL Server CDC
Striim is a real-time data integration platform that natively performs change data capture by mining database change logs while managing additional metadata. Striim supports enterprise workloads with:
- high-speed capture (more than 7x faster than SQL Server CDC using change tables — see data below)
- exactly-once processing (E1P) to prevent data loss or duplication
- support for long-running transactions
- migration and replication of data from SQL Server databases to cloud and on-premise data warehouses, data lakes, messaging systems, files, and big data targets
As shown in the diagram above, Striim’s SQL Server CDC Reader doesn’t require the use of a change table. Striim reads the change data directly from the SQL Server transaction logs; performs in-memory transformations, enrichment and analysis; and delivers data to target systems…with millisecond latency.