In Part 1 of this database replication resync time benchmark study, we discussed why minimizing database replication resync times is of utmost importance when building mission-critical data products. In Part 2, we break down the tests we carried out and share the detailed results for each platform.

The six platforms that we benchmarked for their CDC database replication resync times were:

  • Integrate.io
  • Fivetran
  • Hevo
  • Matillion
  • Rivery
  • Estuary

Benchmarks are all about making choices: What kind of data will I use? How much? What kind of setup? These choices matter a lot: change the configuration options in the data platforms or the structure of your data, and the fastest platform can become the slowest.

We’ve tried to make these choices in a way that represents a typical data user, so that the results will be useful to the kind of company that uses the data replication platforms mentioned in this benchmark study.

What Data Did We Use For Testing?

Our goal was to measure data replication resync times under realistic conditions. We simulated common customer scenarios for initial data loads using the following two databases.

  1. Small Database:

    1. Type: AWS MySQL Aurora

    2. Size: 10 million rows, 16 columns

    3. Connectivity Method: Direct

  2. Large Database:

    1. Type: AWS MySQL Aurora

    2. Size: 300 million rows, 7 columns

    3. Connectivity Method: Direct

How Did We Carry Out the Comparison?

We benchmarked the six data replication platforms to compare initial database sync and resync times using the two sample databases.

Test Setup and Configuration

  • Source Database: AWS MySQL Aurora

  • Target Data Warehouse: Snowflake on AWS

  • Replication Method: Direct connectivity for both small and large datasets.

  • Measurement: Initial sync times were recorded for each platform under identical conditions.
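As a rough illustration of how a sync time can be derived from log timestamps, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the timestamp format, the function names `sync_duration` and `format_duration`, and the sample timestamps are hypothetical, and each platform writes its logs in its own format.

```python
from datetime import datetime, timedelta

# Hypothetical timestamp format; real platform logs will differ.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def sync_duration(start: str, end: str) -> timedelta:
    """Elapsed time between the first and last log timestamps of a sync."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    finished = datetime.strptime(end, LOG_TIME_FORMAT)
    if finished < started:
        raise ValueError("sync end precedes sync start")
    return finished - started


def format_duration(delta: timedelta) -> str:
    """Render a duration as hours, minutes, and seconds, skipping zero parts."""
    total = int(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours} hr")
    if minutes:
        parts.append(f"{minutes} min")
    if seconds or not parts:
        parts.append(f"{seconds} sec")
    return " ".join(parts)


# Illustrative timestamps only, not taken from any platform's actual logs.
elapsed = sync_duration("2024-01-01 10:00:00", "2024-01-01 10:03:44")
print(format_duration(elapsed))  # 3 min 44 sec
```

Taking the difference between the first and last log entries keeps the measurement independent of whatever progress indicator each platform shows in its UI.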

Results

Initial Sync Times

| Database | Integrate.io | Fivetran | Hevo | Matillion | Rivery | Estuary |
| --- | --- | --- | --- | --- | --- | --- |
| Small Database | 3 min 44 sec | 5 min 29 sec | 22 min | 5 min | 21 min 42 sec | 42 min |
| Large Database | 19 min 17 sec | 49 min 39 sec | 10 hr 47 min | 39 min | 13 hr 42 min 53 sec | 3 hr |
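To make the gaps between platforms easier to compare, the reported times can be converted to seconds and expressed as multiples of the fastest result. The Python sketch below does that arithmetic. It assumes the reported values map to the platforms in the order Integrate.io, Fivetran, Hevo, Matillion, Rivery, Estuary, and the helper names (`to_seconds`, `relative_to_fastest`, `rows_per_second`) are hypothetical, introduced only for this illustration.

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert an hours/minutes/seconds reading to total seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Initial sync times as reported in the results above.
SMALL_DB = {
    "Integrate.io": to_seconds(minutes=3, seconds=44),
    "Fivetran": to_seconds(minutes=5, seconds=29),
    "Hevo": to_seconds(minutes=22),
    "Matillion": to_seconds(minutes=5),
    "Rivery": to_seconds(minutes=21, seconds=42),
    "Estuary": to_seconds(minutes=42),
}
LARGE_DB = {
    "Integrate.io": to_seconds(minutes=19, seconds=17),
    "Fivetran": to_seconds(minutes=49, seconds=39),
    "Hevo": to_seconds(hours=10, minutes=47),
    "Matillion": to_seconds(minutes=39),
    "Rivery": to_seconds(hours=13, minutes=42, seconds=53),
    "Estuary": to_seconds(hours=3),
}


def relative_to_fastest(results: dict[str, int]) -> dict[str, float]:
    """Each platform's time as a multiple of the fastest, fastest first."""
    fastest = min(results.values())
    ranked = sorted(results.items(), key=lambda item: item[1])
    return {name: round(secs / fastest, 1) for name, secs in ranked}


def rows_per_second(rows: int, secs: int) -> float:
    """Average throughput over the whole sync."""
    return rows / secs


for label, rows, results in [
    ("Small", 10_000_000, SMALL_DB),
    ("Large", 300_000_000, LARGE_DB),
]:
    print(label, relative_to_fastest(results))
    fastest_name = min(results, key=results.get)
    print(f"  {fastest_name}: {rows_per_second(rows, results[fastest_name]):,.0f} rows/sec")
```

On the large database, for example, this puts Matillion at about 2.0 times and Fivetran at about 2.6 times the fastest reported time.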

[Per-platform figures, in order: Integrate.io, Hevo, Fivetran, Matillion, Rivery, Estuary]

The complete log history used to calculate each sync time is attached below:

Conclusion  

Data pipeline platforms have revolutionized the way businesses centralize and leverage their data for downstream applications such as data products, analytics, and machine learning. As data integration becomes more critical, choosing the right platform based on your use cases and requirements is key. 

Our benchmark comparison shows that Integrate.io delivered the fastest initial sync and resync times in both the small and large database tests when replicating database data to a data warehouse. This matters for companies that build business-critical data products powered by data from their data warehouse and that have strict SLAs for data uptime.

To discuss your data uptime needs and learn more about the industry's fastest initial sync and resync times, you can schedule a time to speak with one of our Solution Engineers here or get started with a free trial.