Initial Sync Process


During initial sync, Integrate.io counts the number of records in a table and divides them equally into chunks. Each chunk corresponds to a SELECT statement over its own range of primary key values. The chunks are read in parallel, and each one streams its rows continuously to the Avro Stream. The Avro Stream combines these records into batches. When a batch reaches the default maximum size, or the sync timeout elapses, the batch is written to S3 as an Avro file.
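The chunking step can be sketched as follows. This is a minimal illustration, not Integrate.io's implementation. It assumes the key range is split using the table's minimum and maximum primary key values and a hypothetical `rows_per_chunk` setting, and that keys are spread fairly evenly across that range. `Chunk`, `plan_chunks`, and `chunk_query` are hypothetical names. Splitting relies on integer arithmetic over the key, which is why a numeric primary key is needed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: int
    lower: int  # inclusive lower bound of the primary key range
    upper: int  # inclusive upper bound of the primary key range


def plan_chunks(min_pk: int, max_pk: int, row_count: int, rows_per_chunk: int) -> list[Chunk]:
    """Split [min_pk, max_pk] into contiguous, non-overlapping integer ranges.

    Hypothetical sketch: assumes keys are roughly evenly distributed.
    """
    if rows_per_chunk <= 0:
        raise ValueError("rows_per_chunk must be positive")
    if row_count == 0:
        return []
    num_chunks = max(1, math.ceil(row_count / rows_per_chunk))
    span = max_pk - min_pk + 1
    # Never plan more chunks than there are distinct key values.
    num_chunks = min(num_chunks, span)
    step = math.ceil(span / num_chunks)

    chunks = []
    lower = min_pk
    while lower <= max_pk:
        upper = min(lower + step - 1, max_pk)
        chunks.append(Chunk(len(chunks), lower, upper))
        lower = upper + 1
    return chunks


def chunk_query(table: str, pk: str, chunk: Chunk) -> str:
    """Build the range SELECT for one chunk (identifiers are assumed trusted)."""
    return f"SELECT * FROM {table} WHERE {pk} BETWEEN {chunk.lower} AND {chunk.upper}"
```

For example, `plan_chunks(1, 1000, 1000, 250)` yields four ranges (1 to 250, 251 to 500, 501 to 750, 751 to 1000), and each range becomes one SELECT that can be read independently and in parallel.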

Once all records of a chunk have been transferred to S3, the chunk is marked as finished and is not reprocessed if the sync is restarted or interrupted.
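The per-chunk flow and its checkpoint can be sketched like this. It is a simplified illustration under stated assumptions, not the product's code: `sync_chunk`, `upload`, and `finished` are hypothetical names, `finished` stands in for whatever durable store records completed chunks, and batches are flushed only by a size limit and at the end of the chunk (timeout-based flushing is omitted). The key point is that a chunk is recorded as finished only after every one of its batches has been uploaded, so a completed chunk is skipped on restart.

```python
from typing import Callable, Iterable


def sync_chunk(
    chunk_id: int,
    rows: Iterable[object],
    max_batch: int,
    upload: Callable[[list], None],
    finished: set[int],
) -> int:
    """Upload one chunk's rows in batches and return the number of batches sent.

    Hypothetical sketch: `finished` models a persisted set of completed chunk ids.
    """
    if chunk_id in finished:
        return 0  # completed in an earlier run, so skip it on restart
    batch: list = []
    batches_sent = 0
    for row in rows:
        batch.append(row)
        if len(batch) >= max_batch:
            upload(batch)  # e.g. write one Avro file to S3
            batches_sent += 1
            batch = []
    if batch:  # flush the final partial batch
        upload(batch)
        batches_sent += 1
    # Checkpoint only after every batch of the chunk has been uploaded.
    finished.add(chunk_id)
    return batches_sent
```

In this sketch, a chunk interrupted partway through is never added to `finished`, so it is simply run again from the beginning on the next attempt.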

Note that chunking is supported only on tables with integer-type primary keys (Integer, Big Integer, Medium Integer).