Part 2: Data Integration Platforms’ Initial & Resync Time Benchmark


In Part 1 of this database replication resync time benchmark study, we discussed why minimizing database replication resync times is of utmost importance when building mission-critical data products. In this Part 2, we break down the tests we carried out and share the detailed results for each platform.

The six platforms that we benchmarked for their CDC database replication resync times were:

Integrate.io
Fivetran
Hevo
Matillion
Rivery
Estuary

Benchmarks are all about making choices. What kind of data will I use? How much of it? What kind of setup? These choices matter a lot: change the configuration options in the data platforms or the structure of your data, and the fastest platform can become the slowest.

We’ve tried to make these choices in a way that represents a typical data user, so that the results will be useful to the kind of company that uses the data replication platforms mentioned in this benchmark study.

What Data Did We Use For Testing?

Our goal was to measure data replication resync times under realistic conditions. We simulated common customer scenarios for initial data loads using the following two databases.

Small Database:
1. Type: AWS MySQL Aurora
2. Size: 10 million rows, 16 columns
3. Connectivity Method: Direct

Large Database:
1. Type: AWS MySQL Aurora
2. Size: 300 million rows, 7 columns
3. Connectivity Method: Direct

How Did We Carry Out the Comparison?

We benchmarked the six data replication platforms to compare initial database sync and resync times using the two sample databases.

Test Setup and Configuration

Source Database: AWS MySQL Aurora
Target Data Warehouse: Snowflake on AWS
Connectivity Method: Direct for both the small and large databases.
Measurement: Initial sync times were recorded for each platform under identical conditions.
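Assuming each sync time is the span between the earliest and latest timestamps in a platform's sync log (an assumption: each platform formats its logs differently), the calculation reduces to timestamp arithmetic. The sketch below uses Python's standard `datetime` module. The helper names `sync_duration` and `format_duration` are hypothetical, and the timestamps are invented for illustration, not taken from the benchmark logs.

```python
from datetime import datetime, timedelta


def sync_duration(timestamps: list[str]) -> timedelta:
    """Span from the earliest to the latest ISO 8601 timestamp in a sync log.

    Hypothetical helper: assumes every timestamp carries a UTC offset so that
    comparisons are unambiguous (mixing naive and aware values raises TypeError).
    """
    parsed = [datetime.fromisoformat(ts) for ts in timestamps]
    return max(parsed) - min(parsed)


def format_duration(delta: timedelta) -> str:
    """Render a timedelta as 'Xh Ym Zs', truncating fractional seconds."""
    hours, rem = divmod(int(delta.total_seconds()), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h {minutes}m {seconds}s"


# Hypothetical log timestamps, for illustration only (not benchmark data).
events = [
    "2000-01-01T23:30:00+00:00",  # sync started
    "2000-01-01T23:59:10+00:00",  # progress event
    "2000-01-02T00:32:03+00:00",  # sync completed, after midnight
]
print(format_duration(sync_duration(events)))  # prints "1h 2m 3s"
```

Taking `max` and `min` rather than the first and last log lines means the result does not depend on the order in which events were written, and a sync that runs past midnight is still measured correctly.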

Results

Initial Sync Time	Integrate.io	Fivetran	Hevo	Matillion	Rivery	Estuary
Small Database (10M rows)	3 min 44 sec	5 min 29 sec	22 min	5 min	21 min 42 sec	42 min
Large Database (300M rows)	19 min 17 sec	49 min 39 sec	10 h 47 min	39 min	13 h 42 min 53 sec	3 h
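The sync times above are easier to compare once converted to a single unit. The sketch below does that arithmetic in Python and derives two figures the table does not state directly: average throughput in rows per second (using the 10 million and 300 million row counts of the test databases) and each platform's time as a multiple of the fastest. The names `to_seconds` and `summarize` are illustrative only, and rows per second here is simply the row count divided by elapsed time, an average that says nothing about how throughput varied during a sync.

```python
import re

# Row counts of the two test databases.
ROWS = {"small": 10_000_000, "large": 300_000_000}

# Initial sync times as reported in the benchmark results: (small, large).
RESULTS = {
    "Integrate.io": ("3 minutes 44 seconds", "19 minutes 17 seconds"),
    "Fivetran": ("5 min & 29 sec", "49 min & 39 sec"),
    "Hevo": ("22 min", "10 hours & 47 min"),
    "Matillion": ("5 min", "39 min"),
    "Rivery": ("21 min & 42 sec", "13 hours, 42 min & 53 sec"),
    "Estuary": ("42 min", "3 hours"),
}

# Matches "<number> <unit>" pairs; longer unit spellings are tried first.
DURATION_PART = re.compile(r"(\d+)\s*(hours?|h|minutes?|min|m|seconds?|sec|s)\b")
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(text: str) -> int:
    """Convert e.g. '13 hours, 42 min & 53 sec' or '10 h 47 min' to seconds."""
    return sum(
        int(amount) * UNIT_SECONDS[unit[0]]
        for amount, unit in DURATION_PART.findall(text)
    )


def summarize(size: str) -> list[tuple[str, int, float, float]]:
    """(platform, seconds, rows per second, multiple of fastest), fastest first."""
    column = 0 if size == "small" else 1
    seconds = {name: to_seconds(times[column]) for name, times in RESULTS.items()}
    fastest = min(seconds.values())
    rows = [(name, s, ROWS[size] / s, s / fastest) for name, s in seconds.items()]
    return sorted(rows, key=lambda row: row[1])


for size in ("small", "large"):
    print(f"{size} database ({ROWS[size]:,} rows)")
    for name, s, rows_per_sec, multiple in summarize(size):
        print(f"  {name:<13}{s:>7,} s {rows_per_sec:>10,.0f} rows/s {multiple:>5.1f}x")
```

Normalizing to seconds makes the gaps concrete: on the large database, Rivery's 49,373 seconds is about 42.7 times the 1,157 seconds recorded for Integrate.io.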

[Per-platform sync screenshots: Integrate.io, Hevo, Fivetran, Matillion, Rivery, Estuary]

The complete log history used to calculate each sync time is attached below:

Conclusion

Data pipeline platforms have revolutionized the way businesses centralize and leverage their data for downstream applications such as data products, analytics, and machine learning. As data integration becomes more critical, choosing the right platform based on your use cases and requirements is key.

Our recent benchmark comparison shows that Integrate.io delivered the fastest initial sync and resync times when replicating database data to a data warehouse: 3 minutes 44 seconds for the 10-million-row database and 19 minutes 17 seconds for the 300-million-row database. This matters for companies that build business-critical data products on data from their data warehouse and must meet strict SLAs for data uptime.

To discuss your data uptime needs and learn more about the industry's fastest initial sync and resync times, you can schedule a time to speak with one of our Solution Engineers here or get started with a free trial.
