Change Data Capture

Change Data Capture (CDC) is a design pattern used to identify and track changes made to data, so that action can be taken in response to each change.

Log Based CDC

Most databases today keep a transaction log that records every change made to the database. The idea behind log-based CDC is to scan that log and extract the changes from it.
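To make the idea concrete, here is a minimal sketch of a log scanner that remembers how far it has read. The log format (one JSON entry per line) and the names Change, scan_log, poll and LOG are assumptions made for illustration; real transaction logs are binary and database-specific.

```python
import json
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Change:
    offset: int   # position of the entry in the log
    op: str       # "insert", "update" or "delete"
    table: str
    key: str
    row: dict     # new row image (empty for deletes)


def scan_log(lines: List[str], start: int = 0) -> Iterator[Change]:
    """Yield every change recorded at or after offset `start`."""
    for offset in range(start, len(lines)):
        entry = json.loads(lines[offset])
        yield Change(offset, entry["op"], entry["table"], entry["key"], entry.get("row", {}))


def poll(lines: List[str], checkpoint: int) -> Tuple[List[Change], int]:
    """Read the changes after the last checkpoint and return the new checkpoint."""
    changes = list(scan_log(lines, checkpoint))
    new_checkpoint = changes[-1].offset + 1 if changes else checkpoint
    return changes, new_checkpoint


# Hypothetical log contents, one JSON entry per line.
LOG = [
    '{"op": "insert", "table": "users", "key": "1", "row": {"name": "Ann"}}',
    '{"op": "update", "table": "users", "key": "1", "row": {"name": "Anna"}}',
    '{"op": "delete", "table": "users", "key": "1"}',
]

if __name__ == "__main__":
    changes, checkpoint = poll(LOG, 0)
    for change in changes:
        print(change.offset, change.op, change.table, change.key, change.row)
```

Storing the checkpoint somewhere durable lets a consumer restart and resume from where it stopped instead of rereading the whole log.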

MySQL

In MySQL, changes can be logged to a file by enabling the binary log (binlog), and any consumer that needs the data can read it from there. In fact, MySQL replicas (slaves) use the same mechanism to replicate data from the source (master).
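As a rough sketch of what a binlog consumer can look like, the snippet below uses the third-party python-mysql-replication package (imported as pymysqlreplication). That package is not part of the setup described here; it is only one of several options. The sketch assumes row-based binary logging is enabled, and the connection settings, the server_id value and the to_change helper are hypothetical placeholders.

```python
from typing import Any, Dict, Iterator

# Hypothetical connection settings; replace with your own.
MYSQL_SETTINGS = {"host": "127.0.0.1", "port": 3306, "user": "repl", "passwd": "secret"}


def to_change(op: str, schema: str, table: str, row: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one binlog row into a plain dict (an illustrative format, not a standard)."""
    if op == "update":
        before, after = row["before_values"], row["after_values"]
    elif op == "insert":
        before, after = None, row["values"]
    else:  # delete
        before, after = row["values"], None
    return {"op": op, "schema": schema, "table": table, "before": before, "after": after}


def stream_changes(settings: Dict[str, Any], server_id: int = 100) -> Iterator[Dict[str, Any]]:
    """Follow the binlog the way a replica would and yield one dict per changed row."""
    # Imported lazily so to_change() can be used without the package installed.
    from pymysqlreplication import BinLogStreamReader
    from pymysqlreplication.row_event import (
        DeleteRowsEvent,
        UpdateRowsEvent,
        WriteRowsEvent,
    )

    ops = {WriteRowsEvent: "insert", UpdateRowsEvent: "update", DeleteRowsEvent: "delete"}
    stream = BinLogStreamReader(
        connection_settings=settings,
        server_id=server_id,       # must not clash with other replicas of this server
        only_events=list(ops),     # ignore everything except row changes
        blocking=True,             # wait for new events instead of stopping at the end
        resume_stream=True,
    )
    try:
        for event in stream:
            op = next(name for cls, name in ops.items() if isinstance(event, cls))
            for row in event.rows:
                yield to_change(op, event.schema, event.table, row)
    finally:
        stream.close()
```

Calling stream_changes(MYSQL_SETTINGS) needs a reachable MySQL server and a user with replication privileges; each yielded dict can then be handed to whatever consumes the changes.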

HBase

In HBase, we can likewise use the write-ahead log (WAL) to capture changes. HBase itself uses a similar WAL-based strategy for cluster-to-cluster replication. We can use this feature to capture changes on the server and relay them to other clients.

Redis

In Redis, we can use the append-only file (AOF) persistence feature to read the changes made to the data and relay them to interested clients.
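The AOF stores write commands in the Redis protocol format, so reading changes out of it comes down to parsing that format. Below is a minimal sketch, assuming a plain command-only AOF (no RDB preamble and no timestamp annotations); parse_aof and SAMPLE are hypothetical names used for illustration.

```python
from typing import Iterator, List


def parse_aof(data: bytes) -> Iterator[List[bytes]]:
    """Yield each command in a RESP-encoded AOF buffer as a list of byte strings."""
    pos = 0
    while pos < len(data):
        # Each command starts with "*<argument count>\r\n".
        end = data.index(b"\r\n", pos)
        if data[pos:pos + 1] != b"*":
            raise ValueError(f"expected '*' at byte {pos}")
        argc = int(data[pos + 1:end])
        pos = end + 2
        args = []
        for _ in range(argc):
            # Each argument is "$<length>\r\n<bytes>\r\n".
            end = data.index(b"\r\n", pos)
            if data[pos:pos + 1] != b"$":
                raise ValueError(f"expected '$' at byte {pos}")
            length = int(data[pos + 1:end])
            start = end + 2
            args.append(data[start:start + length])
            pos = start + length + 2
        yield args


# Hand-written sample in the same format an AOF uses.
SAMPLE = (
    b"*2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n"
    b"*3\r\n$3\r\nSET\r\n$4\r\nuser\r\n$3\r\nAnn\r\n"
    b"*2\r\n$3\r\nDEL\r\n$4\r\nuser\r\n"
)

if __name__ == "__main__":
    for command in parse_aof(SAMPLE):
        print(command)
```

A relay built on this would tail the file, parse newly appended bytes, and forward each command to the interested clients.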

Aerospike

Aerospike offers a similar feature, but unfortunately it is available only in the Enterprise edition.

When can you use CDC?

We had a use case where we wanted to send the changes made to our database to multiple clients through an event bus, and CDC is a good fit for that. We also needed to provide full-text search on top of our data, so we used CDC to feed Elasticsearch as well.
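The fan-out can be sketched as follows. This is not our production setup: EventBus is a hypothetical in-memory stand-in for a real event bus, and SearchIndex is a toy inverted index standing in for Elasticsearch. The shape of the change dicts is also an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

Change = Dict[str, object]          # e.g. {"op": "upsert", "id": "1", "text": "..."}
Handler = Callable[[Change], None]


class EventBus:
    """Minimal in-memory bus: every published change goes to every subscriber."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._handlers.append(handler)

    def publish(self, change: Change) -> None:
        for handler in self._handlers:
            handler(change)


class SearchIndex:
    """Toy inverted index standing in for a real search engine."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}
        self._terms: Dict[str, Set[str]] = defaultdict(set)

    def apply(self, change: Change) -> None:
        doc_id = str(change["id"])
        # Remove the old version first so updates and deletes leave no stale terms.
        for term in self._docs.pop(doc_id, "").lower().split():
            self._terms[term].discard(doc_id)
        if change["op"] == "upsert":
            text = str(change["text"])
            self._docs[doc_id] = text
            for term in text.lower().split():
                self._terms[term].add(doc_id)

    def search(self, term: str) -> Set[str]:
        return set(self._terms.get(term.lower(), set()))


if __name__ == "__main__":
    bus, index, audit = EventBus(), SearchIndex(), []
    bus.subscribe(index.apply)    # consumer 1: keeps the search index in sync
    bus.subscribe(audit.append)   # consumer 2: any other interested client
    bus.publish({"op": "upsert", "id": "1", "text": "red apple"})
    bus.publish({"op": "delete", "id": "1"})
    print(index.search("apple"), len(audit))
```

Because every consumer receives the same stream of changes, adding another client is just one more subscription and the database itself is not touched.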

References

https://en.wikipedia.org/wiki/Change_data_capture

https://github.com/linkedin/databus/wiki/Databus-for-MySQL

https://debezium.io/documentation/reference/1.2/connectors/mysql.html