This post takes you through the intricacies of Change Data Capture (CDC), a feature that is generally available in Microsoft SQL Server and Azure SQL Managed Instance.
The CDC feature is critical in today’s data-driven business environment, where data security is a primary focus. Security is not limited to preventing breaches and hacking; it also covers preserving changed data so that the history of its values is not compromised. In the past, there have been many ways of saving changed data, such as complex queries, triggers, timestamps, and even data auditing, but none of them has been foolproof.
Microsoft was among the first off the mark. SQL Server 2005 offered “after update”, “after insert”, and “after delete” triggers for capturing changes. Though useful, this approach had shortcomings, and it was not until SQL Server 2008 that Change Data Capture arrived as a built-in feature. This CDC technology helped DBAs and developers capture and archive changed data without additional programming.
Overview of Change Data Capture
Change Data Capture records the insert, update, and delete activity applied to a SQL Server table and makes the details of those changes available in an easily consumed relational format. The column information and metadata needed to apply the changes to a target environment are captured for each modified row. These changes are then stored in change tables that mirror the column structure of the tracked source tables. Consumers access the change data through table-valued functions provided for that purpose.
One of the best examples of a consumer targeted by this technology is an ETL (Extract, Transform, Load) application, which incrementally loads change data from SQL Server source tables into a data warehouse or a data mart.
Although the representation of the source tables within the data warehouse must reflect changes made at the source, a technology that continually refreshes a full replica of the source is complex to use. A more appropriate technology provides a steady stream of change data, structured so that consumers can apply it to dissimilar target representations of the data. SQL Server CDC provides exactly this.
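The incremental loading described above can be sketched in a few lines of Python. This is a toy model, not SQL Server's API: the `Change` record, the `apply_changes` function, and the integer `lsn` watermark are all hypothetical names chosen for illustration. The idea is that the consumer remembers the last log position it loaded and applies only the newer changes on each run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    lsn: int              # position of the change in the log (increasing)
    operation: str        # "insert", "update", or "delete"
    key: int              # primary key of the affected row
    row: Optional[dict]   # new column values; None for a delete


def apply_changes(target: dict, changes: list, last_lsn: int) -> int:
    """Apply changes newer than last_lsn to target; return the new watermark."""
    for change in sorted(changes, key=lambda c: c.lsn):
        if change.lsn <= last_lsn:
            continue  # already loaded in an earlier run
        if change.operation == "delete":
            target.pop(change.key, None)
        else:
            # Inserts and updates both become an upsert in the target.
            target[change.key] = dict(change.row)
        last_lsn = change.lsn
    return last_lsn


changes = [
    Change(1, "insert", 10, {"name": "Ada", "city": "London"}),
    Change(2, "insert", 11, {"name": "Bob", "city": "Paris"}),
    Change(3, "update", 10, {"name": "Ada", "city": "Leeds"}),
    Change(4, "delete", 11, None),
]

warehouse = {}
# First run sees only the first two changes.
watermark = apply_changes(warehouse, changes[:2], last_lsn=0)
# Second run receives the full stream but skips what was already loaded.
watermark = apply_changes(warehouse, changes, last_lsn=watermark)
print(warehouse, watermark)
```

After the second run the warehouse holds only row 10 with the updated city, and the watermark is 4. Because the target here is a plain dictionary rather than a copy of the source table, the sketch also shows how a change stream can feed a target whose shape differs from the source.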
Functioning of SQL Server CDC
Change Data Capture tracks changes made to user-created tables. These changes are stored in relational tables that can be queried easily with T-SQL. Whenever CDC is enabled on a database table, a mirrored image of the tracked table is created.
The mirrored table has additional metadata columns that describe the type of change made to each row. Apart from these columns, its structure matches that of the source table. Once CDC is enabled, the SQL DBA can use these new audit tables to track activity on the logged tables.
The SQL Server transaction log is the source of change data for CDC. As inserts, updates, and deletes are applied to the tracked source tables, entries describing them are added to the log. The log is then read, and information about each change is added to the change table associated with the original table.
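To make the log-reading step concrete, here is a minimal simulation in Python. It is only a sketch: the log is a plain list of dictionaries, and `scan_log`, `tracked_tables`, and `change_tables` are hypothetical names. The numeric operation codes are modelled on the `__$operation` values SQL Server documents for its change tables (1 = delete, 2 = insert, 3 = update before image, 4 = update after image). The sketch assumes the log is already ordered by LSN.

```python
# Operation codes modelled on SQL Server's documented __$operation values.
DELETE, INSERT, UPDATE_BEFORE, UPDATE_AFTER = 1, 2, 3, 4


def scan_log(log, tracked_tables, change_tables, last_lsn):
    """Copy log entries for tracked tables into their change tables.

    Returns the highest LSN scanned so the next scan can resume from there.
    """
    for entry in log:
        lsn = entry["lsn"]
        if lsn <= last_lsn:
            continue  # scanned previously
        last_lsn = lsn
        table = entry["table"]
        if table not in tracked_tables:
            continue  # changes to untracked tables are ignored
        rows = change_tables.setdefault(table, [])
        if entry["op"] == "insert":
            rows.append({"lsn": lsn, "operation": INSERT, **entry["after"]})
        elif entry["op"] == "delete":
            rows.append({"lsn": lsn, "operation": DELETE, **entry["before"]})
        elif entry["op"] == "update":
            # An update is recorded as two rows: the old and the new values.
            rows.append({"lsn": lsn, "operation": UPDATE_BEFORE, **entry["before"]})
            rows.append({"lsn": lsn, "operation": UPDATE_AFTER, **entry["after"]})
    return last_lsn


log = [
    {"lsn": 1, "table": "orders", "op": "insert", "after": {"id": 1, "qty": 5}},
    {"lsn": 2, "table": "audit_notes", "op": "insert", "after": {"id": 9, "note": "x"}},
    {"lsn": 3, "table": "orders", "op": "update",
     "before": {"id": 1, "qty": 5}, "after": {"id": 1, "qty": 7}},
    {"lsn": 4, "table": "orders", "op": "delete", "before": {"id": 1, "qty": 7}},
]

change_tables = {}
last_lsn = scan_log(log, {"orders"}, change_tables, last_lsn=0)
for row in change_tables["orders"]:
    print(row)
```

Each change row carries the source columns (`id`, `qty`) plus the metadata columns (`lsn`, `operation`), which is the shape the preceding paragraphs describe. The source table itself is never touched by the scan; only the log is read.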
Types of Change Data Capture
Broadly, there are two types of CDC: log-based and trigger-based. SQL Server's built-in CDC feature is log-based, while trigger-based CDC is an alternative that organizations can implement themselves.
- Log-based CDC: In this form of CDC, the system reads the database's transaction log files to find the changes made in the source system. It then replicates those changes to the target database. The benefit of this method is that it is highly reliable, with no missed changes and minimal impact on the production database system. There is also no need to change the schemas of the production database or add new tables. The downside is that this type of CDC is complex to implement and works only with databases that support it.
- Trigger-based CDC: This type of change data capture relies on database triggers, which fire automatically when an insert, update, or delete occurs on a tracked table and record the change. Triggers lower the cost of extracting the changes. However, they also increase the overhead on the source system, because additional work runs every time a row is modified.
The advantages of trigger-based CDC are that it is easy to implement, changes are captured quickly, detailed logs of all transactions are kept in shadow tables, and some databases support it directly through their SQL API. On the other hand, there are downsides, such as trigger overload and triggers being disabled during certain operations. Further, database performance can drop significantly, because the method requires multiple writes to the database every time a row is changed.
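The trigger-based approach is easy to try out with Python's built-in `sqlite3` module. The sketch below uses SQLite rather than SQL Server, so the trigger syntax differs from T-SQL, and the table and trigger names (`customers`, `customers_changes`, `customers_ai`, and so on) are hypothetical. It shows three triggers writing every change into a shadow table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);

-- Shadow table: the source columns plus change metadata.
CREATE TABLE customers_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    operation TEXT NOT NULL,
    id INTEGER, name TEXT, city TEXT
);

CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
    INSERT INTO customers_changes (operation, id, name, city)
    VALUES ('insert', NEW.id, NEW.name, NEW.city);
END;

CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
    INSERT INTO customers_changes (operation, id, name, city)
    VALUES ('update', NEW.id, NEW.name, NEW.city);
END;

CREATE TRIGGER customers_ad AFTER DELETE ON customers BEGIN
    INSERT INTO customers_changes (operation, id, name, city)
    VALUES ('delete', OLD.id, OLD.name, OLD.city);
END;
""")

# Each statement below causes a second write, performed by a trigger.
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'London')")
conn.execute("UPDATE customers SET city = 'Leeds' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

history = conn.execute(
    "SELECT operation, id, name, city FROM customers_changes ORDER BY change_id"
).fetchall()
for row in history:
    print(row)
```

The source table ends up empty, yet the shadow table still holds the full history of the row. The cost is visible too: every change to `customers` results in an extra insert into `customers_changes`, which is the additional write overhead mentioned above.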
When CDC is enabled, SQL Server creates two jobs. The first, the capture job, populates the database change tables with the changed information. The second, the cleanup job, prunes the change tables by deleting records older than the configurable retention period. Changes tracked by SQL Server CDC can also be loaded from the OLTP source into the OLAP warehouse through ETL or T-SQL processes.
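The cleanup step amounts to deleting everything older than a cutoff. Below is a small Python sketch of that rule only; the `cleanup` function and the `captured_at` field are hypothetical, and the three-day window is chosen to match the 4320-minute default retention that SQL Server documents for CDC.

```python
from datetime import datetime, timedelta


def cleanup(change_rows, now, retention):
    """Return only the change rows captured within the retention window."""
    cutoff = now - retention
    return [row for row in change_rows if row["captured_at"] >= cutoff]


now = datetime(2024, 1, 10, 12, 0)
change_rows = [
    {"id": 1, "captured_at": now - timedelta(days=5)},
    {"id": 2, "captured_at": now - timedelta(days=2)},
    {"id": 3, "captured_at": now - timedelta(hours=1)},
]

# Rows older than three days are dropped; newer rows are kept.
kept = cleanup(change_rows, now, retention=timedelta(days=3))
print([row["id"] for row in kept])
```

One consequence worth noting: a consumer that loads changes less often than the retention period allows would miss the rows removed in between, so the load schedule and the retention setting need to be chosen together.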