According to a study by Seagate, only 32% of data available to enterprises is put to work. The remaining 68% is unleveraged. One of the challenges noted is making the different silos of collected data available. Using automation to bring together figures from disparate systems helps leaders make confident and reliable decisions backed by real-time information. This overview discusses how to use Change Data Capture (CDC) to enable real-time analysis.
Table of Contents
- How Does CDC Work in AWS S3
- Prerequisites for Using Amazon S3 as a Target
- CDC and Transaction Order
- Using AWS Database Migration Service (DMS) for CDC
- Best Practices for Using AWS Database Migration Service for CDC
- How Integrate.io Can Help
How Does CDC Work in AWS S3
Amazon Simple Storage Service (S3) is a cloud-based storage service. S3 makes your data available from any location. Because it is a cloud-based service, companies can see improved scaling, availability, security, and performance.
S3 is built on the concept of buckets. Buckets are containers for objects, which are sometimes referred to as files. Buckets contain the files and the metadata about the files. To store information in S3, developers upload files to the appropriate buckets. Developers can set permissions for each bucket.
Integrate.io offers integrations that allow you to connect AWS S3 to other data sources.
Prerequisites for Using Amazon S3 as a Target
AWS includes the Database Migration Service (DMS), which can use Amazon S3 as a target. The following prerequisites must be met before developers can get started:
Location of S3 Bucket
The S3 bucket you are using as the target must reside in the same AWS Region as the DMS replication instance you are using for the migration.
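The region requirement can be checked before any migration work starts. The sketch below is a minimal, hypothetical helper (the function names are ours, not part of any AWS SDK). It assumes you have already looked up the bucket's location, for example with S3's GetBucketLocation call, which reports no location constraint for buckets in us-east-1 and may report the legacy value "EU" for eu-west-1.

```python
from typing import Optional


def normalize_bucket_region(location_constraint: Optional[str]) -> str:
    """Map an S3 location constraint value to a region name."""
    if not location_constraint:
        # S3 reports no constraint for buckets in us-east-1.
        return "us-east-1"
    if location_constraint == "EU":
        # Legacy alias that S3 may return for eu-west-1.
        return "eu-west-1"
    return location_constraint


def same_region(bucket_location: Optional[str], dms_region: str) -> bool:
    """Return True when the bucket and the DMS instance share a region."""
    return normalize_bucket_region(bucket_location) == dms_region
```

For example, `same_region(None, "us-east-1")` is true, while `same_region("eu-west-1", "us-east-1")` is false and signals that the bucket or the replication instance needs to move.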
IAM Role Requirements
Identity and Access Management (IAM) roles are used to assign permissions to accounts to determine access to the system.
Specific IAM requirements include:
- The account used for the migration has an IAM role with write and delete access to the S3 bucket used as the target
- The role must have tagging access so that any objects written to the target can be tagged
- DMS (dms.amazonaws.com) is added to the IAM role as a trusted entity
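The IAM requirements listed above can be expressed as two JSON policy documents: a permissions policy for the bucket and a trust policy for the role. The following is a minimal sketch that only builds the documents as Python dictionaries. The bucket name is a placeholder, the function names are hypothetical, and the `s3:ListBucket` statement is an assumption on our part that DMS also needs to list the bucket; check it against your own security requirements.

```python
import json


def s3_target_policy(bucket_name: str) -> dict:
    """Permissions policy: write, delete, and tag objects in the target bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",         # write access
                    "s3:DeleteObject",      # delete access
                    "s3:PutObjectTagging",  # tagging access
                ],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
            {
                # Assumption: listing the bucket is also required.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
        ],
    }


def dms_trust_policy() -> dict:
    """Trust policy: lets the DMS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "dms.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # "example-cdc-bucket" is a placeholder bucket name.
    print(json.dumps(s3_target_policy("example-cdc-bucket"), indent=2))
    print(json.dumps(dms_trust_policy(), indent=2))
```

The printed JSON can be pasted into the IAM console or supplied to whichever tooling you use to create the role.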
CDC and Transaction Order
Transaction order refers to how the system writes changes to the logs. The two methods are:
CDC Without Transaction Order
By default, AWS DMS does not log changes in transaction order. Instead, it stores all changes in one or more files for each table. DMS creates directories on the S3 target to store the changes coming from the source.
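To see why per-table storage loses the order across tables, consider the toy model below. It is not DMS code; the `Change` record, its fields, and the sample rows are all invented for illustration. It groups a stream of changes by table, the way per-table files would.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Change(NamedTuple):
    commit_seq: int  # position in the source's commit order (hypothetical field)
    table: str
    operation: str   # "I" (insert), "U" (update), or "D" (delete)


def group_by_table(changes: List[Change]) -> Dict[str, List[Change]]:
    """Group changes per table, as if each table had its own set of files."""
    per_table: Dict[str, List[Change]] = defaultdict(list)
    for change in changes:
        per_table[change.table].append(change)
    return dict(per_table)


changes = [
    Change(1, "orders", "I"),
    Change(2, "customers", "U"),
    Change(3, "orders", "D"),
]
by_table = group_by_table(changes)
```

Within `by_table["orders"]` the changes are still in sequence (1, then 3), but a consumer reading the `orders` and `customers` groups separately can no longer tell that the customer update happened between the two order changes unless it re-sorts by a sequence value.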
Capturing Changes With Transaction Order
AWS DMS can be configured to store changes in transaction order. This approach requires configuring settings on the S3 target endpoint. These settings direct DMS to store the changes in .csv files. These files contain all row changes listed in transaction order.
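As a rough sketch of what those endpoint settings might look like, the helper below builds a settings dictionary using the `PreserveTransactions` and `CdcPath` keys from the DMS S3 endpoint settings. The helper name, bucket name, role ARN, and folder path are placeholders we made up. The dictionary is only constructed here; in practice it would be supplied as the S3 settings when the target endpoint is created, and you should confirm that your DMS version supports these keys.

```python
from typing import Any, Dict


def build_ordered_cdc_settings(
    bucket_name: str, role_arn: str, cdc_path: str
) -> Dict[str, Any]:
    """Build S3 target settings that ask DMS to keep transaction order."""
    if not cdc_path:
        # A folder path for the ordered CDC files must be supplied.
        raise ValueError("cdc_path is required when preserving transaction order")
    return {
        "BucketName": bucket_name,
        "ServiceAccessRoleArn": role_arn,
        "DataFormat": "csv",           # changes are written as .csv files
        "PreserveTransactions": True,  # store changes in transaction order
        "CdcPath": cdc_path,           # folder for the ordered CDC files
    }


# Placeholder values for illustration only.
settings = build_ordered_cdc_settings(
    bucket_name="example-cdc-bucket",
    role_arn="arn:aws:iam::123456789012:role/example-dms-s3-role",
    cdc_path="cdc/ordered",
)
```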
Using Integrate.io’s no-code tools, you can quickly build integrations that use several AWS functions such as Amazon Aurora, Amazon RDS, and Amazon Redshift.
Using AWS Database Migration Service (DMS) for CDC
Developers have several items to configure before using DMS. These tasks are:
Convert the Schema
The schema represents the logical configuration of a database. The schema from the source database must be converted to that of the target. This ensures the database configurations match so the information can be updated successfully.
Configure Replication Instance
In AWS DMS, a replication instance hosts one or more replication tasks. The replication instance must have enough storage and processing power to migrate the information. DMS performs most processing in memory. However, larger transactions may require disk space for buffering.
Specify Database Endpoints
The endpoints specify connection information about the data store. The endpoints also specify the data store type and location. At least one endpoint must be on an AWS service. Thus, you can't migrate from one on-premises data store to another on-premises data store.
Create Replication Tasks
The replication tasks migrate the data from the source to the target. You must specify a replication instance the task will use.
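The pieces of a replication task can be sketched as a set of parameters: the replication instance, the source and target endpoints, a migration type, and a table mapping that selects what to migrate. The helper names and ARNs below are hypothetical placeholders. The parameter names follow the DMS CreateReplicationTask API as we understand it, so verify them against the current API reference before use.

```python
import json
from typing import Any, Dict


def select_all_tables(schema: str) -> str:
    """Table mapping (as a JSON string) that includes every table in a schema."""
    mapping = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-all",
                "object-locator": {"schema-name": schema, "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return json.dumps(mapping)


def build_task_params(
    task_id: str, instance_arn: str, source_arn: str, target_arn: str, schema: str
) -> Dict[str, Any]:
    """Collect the parameters for one replication task."""
    return {
        "ReplicationTaskIdentifier": task_id,
        "ReplicationInstanceArn": instance_arn,  # the instance the task runs on
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "MigrationType": "full-load-and-cdc",    # initial load, then ongoing changes
        "TableMappings": select_all_tables(schema),
    }
```

Choosing `full-load-and-cdc` reflects the use case in this overview: copy the existing data once, then keep capturing changes. A task that only captures changes would use `cdc` instead.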
Best Practices for Using AWS Database Migration Service for CDC
Despite the benefits of CDC, it can quickly cause problems if the process isn’t configured properly. Below are a few best practices to follow to minimize issues.
Use Row Filtering When Handling Large Tables
Migrating a large table in a single task can negatively affect performance. To improve performance, use row filtering to break the migration into multiple tasks, each covering a subset of the rows.
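One way to split a large table is to give each task a range of key values. The sketch below divides an inclusive key range into contiguous chunks and builds a table mapping with a source filter for each chunk. The helper names, schema, table, and column are hypothetical, and the filter layout follows the DMS table-mapping JSON format as we understand it.

```python
import json
from typing import List, Tuple


def split_range(low: int, high: int, parts: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [low, high] into contiguous chunks."""
    if parts < 1 or high < low:
        raise ValueError("need parts >= 1 and high >= low")
    total = high - low + 1
    parts = min(parts, total)  # never create empty chunks
    base, extra = divmod(total, parts)
    ranges = []
    start = low
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def filtered_mapping(schema: str, table: str, column: str, start: int, end: int) -> str:
    """Table mapping (JSON string) that selects rows where start <= column <= end."""
    rule = {
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": f"{table}-{start}-{end}",
        "object-locator": {"schema-name": schema, "table-name": table},
        "rule-action": "include",
        "filters": [
            {
                "filter-type": "source",
                "column-name": column,
                "filter-conditions": [
                    {
                        "filter-operator": "between",
                        "start-value": str(start),
                        "end-value": str(end),
                    }
                ],
            }
        ],
    }
    return json.dumps({"rules": [rule]})


# One mapping per task for a hypothetical "sales.orders" table keyed by "order_id".
mappings = [
    filtered_mapping("sales", "orders", "order_id", lo, hi)
    for lo, hi in split_range(1, 1000, 4)
]
```

Each entry in `mappings` would become the table mapping of its own task. Filtering on an indexed key column is the safer choice, since a filter on an unindexed column could itself slow down the source.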
Reduce Load on Source Database
During a full load task, AWS DMS performs a full table scan of each source table. Tasks also run queries against the source to locate changes to apply to the target. These table scans and queries can affect performance on the source database. To minimize the impact, limit the number of tasks or tables in the migration.
Removing Bottlenecks on Target Database
There may be processes running on the target database that compete with the migration. Turn off unnecessary triggers and secondary indexes for the initial load. You can turn them back on for ongoing replication.
Frost & Sullivan's research shows that almost a quarter of IT decision-makers say that automation is one of the top technologies they use to reduce costs and positively affect the bottom line. Automation empowers leaders to assess new market opportunities and make strategic decisions. Change Data Capture (CDC) is a valuable tool in gathering these insights.
How Integrate.io Can Help
The Integrate.io data integration platform enables you to bring together figures from disparate systems to supply key insight into the business. If you'd like to try these integrations firsthand, get in touch with our team and experience the Integrate.io platform for yourself.