Sources - S3

Description

The S3 Connector periodically syncs files from an Amazon S3 bucket to a variety of destinations. It supports multiple file formats and offers comprehensive configuration options to meet users' data replication needs.

Supported Replication

  • Initial Sync
  • Continuous Sync

Authentication Type

IAM Role Authentication

Configuration

General Configurations

  • Start Date: The initial date from which files should be synced. Useful for historical data imports.
  • File Prefix: A string that files must start with to be considered for syncing. Helps in filtering relevant files.
  • File Regex: A regular expression to match file names. Offers precise control over which files are synced.
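
As an illustration of how these fields narrow the set of candidate files, here is a minimal Python sketch using boto3. The bucket, prefix, pattern, and start date are hypothetical placeholders, and the connector's internal implementation may differ.

    import re
    from datetime import datetime, timezone

    import boto3

    # Hypothetical values mirroring the configuration fields above.
    BUCKET = "example-bucket"                                # Bucket Name
    START_DATE = datetime(2024, 1, 1, tzinfo=timezone.utc)   # Start Date
    FILE_PREFIX = "exports/daily_"                           # File Prefix
    FILE_REGEX = re.compile(r".*\.csv(\.gz)?$")              # File Regex

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    candidates = []
    # The prefix is applied server-side by S3; the regex and start date
    # are checked client-side against each returned object.
    for page in paginator.paginate(Bucket=BUCKET, Prefix=FILE_PREFIX):
        for obj in page.get("Contents", []):
            if obj["LastModified"] >= START_DATE and FILE_REGEX.match(obj["Key"]):
                candidates.append(obj["Key"])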

S3 Connection Configuration

  • Bucket Name: The name of the S3 bucket from which files will be synced. This is a key identifier in the AWS ecosystem.
  • Bucket Region: The AWS region the bucket resides in.
  • Role ARN and External ID: These are part of the IAM role setup that permits access to your S3 bucket from an external account. Customers must create a new IAM policy and role to grant this access; a sketch of the setup follows this list. Note that the External ID is unique to each client and cannot be modified.
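
As a rough sketch of the IAM setup described above, the following Python snippet creates a role whose trust policy allows an external account to assume it only when the supplied External ID matches, then attaches a read-only bucket policy. The account ID, External ID, and resource names are placeholders; use the exact values provided during source setup.

    import json

    import boto3

    # Placeholders -- substitute the account ID and External ID shown
    # during source setup. The External ID is unique per client.
    CONNECTOR_ACCOUNT_ID = "123456789012"
    EXTERNAL_ID = "your-unique-external-id"
    BUCKET = "example-bucket"

    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{CONNECTOR_ACCOUNT_ID}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        }],
    }

    read_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
        }],
    }

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName="s3-connector-role",
        AssumeRolePolicyDocument=json.dumps(trust_policy),
    )
    iam.put_role_policy(
        RoleName="s3-connector-role",
        PolicyName="s3-connector-read",
        PolicyDocument=json.dumps(read_policy),
    )
    print("Role ARN:", role["Role"]["Arn"])  # paste into the Role ARN field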

Table Name

  • Table Name: The table name is required and is used to create the destination table for all synced data. The name should follow the conventions and limitations of the destination (such as Redshift or Snowflake) to avoid errors.

Source Schema

The S3 Connector automates schema fetching by sampling files within the S3 bucket. It attempts to infer data types, and users can modify the inferred types.
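
The inference logic itself is internal to the connector, but the general idea can be sketched: sample a bounded number of rows per file and pick the narrowest type that fits every observed value, falling back to string. The helper below is a hypothetical simplification, not the connector's actual algorithm.

    import csv
    import io

    def infer_type(values):
        """Pick the narrowest type that fits all sampled values for a column."""
        def fits(parse):
            return all(parse(v) for v in values if v != "")
        if fits(lambda v: v.lower() in ("true", "false")):
            return "boolean"
        if fits(lambda v: v.lstrip("-").isdigit()):
            return "integer"
        def is_number(v):
            try:
                float(v)
                return True
            except ValueError:
                return False
        if fits(is_number):
            return "number"
        return "string"  # the default when nothing narrower fits

    def infer_schema(csv_text, sample_rows=50):
        """Sample up to sample_rows rows (the connector samples 50 per file)."""
        reader = csv.DictReader(io.StringIO(csv_text))
        rows = [row for _, row in zip(range(sample_rows), reader)]
        return {col: infer_type([r[col] for r in rows])
                for col in reader.fieldnames}

    sample = "id,active,score\n1,true,3.5\n2,false,4\n"
    print(infer_schema(sample))
    # {'id': 'integer', 'active': 'boolean', 'score': 'number'}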

Supported Data Types

  • The connector supports the following data types: string, boolean, number, integer, array, object, bigint, date, and datetime, with string being the default.
  • Note: array and object data types are inserted as blobs (see the example after this list).
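
Since array and object values land in the destination as blobs, a reasonable mental model, and it is only an assumption here, is that nested values are serialized to JSON text before loading:

    import json

    # Assumption: nested values are serialized to JSON text ("blobs")
    # rather than expanded into separate destination columns.
    record = {"id": 7, "tags": ["a", "b"], "meta": {"source": "s3"}}
    row = {k: json.dumps(v) if isinstance(v, (list, dict)) else v
           for k, v in record.items()}
    print(row)  # {'id': 7, 'tags': '["a", "b"]', 'meta': '{"source": "s3"}'}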

Schema Considerations

  • Users can adjust inferred data types but must be cautious of mismatches that could lead to pipeline failures.
  • Schema changes after setup are not recommended, as they may require a full re-sync; for schema modifications, create a new source and pipeline instead.
  • The connector generates a custom primary key (rowNum_filename) for the destination table, which facilitates file syncing and versioning.
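
The exact key format is internal to the connector, but given the name rowNum_filename, a plausible construction is simply the row number joined to the file name; the helper below is illustrative only.

    # Illustrative only: a plausible construction of the rowNum_filename key.
    def primary_key(row_num: int, file_name: str) -> str:
        return f"{row_num}_{file_name}"

    print(primary_key(42, "exports/daily_2024-01-01.csv"))
    # 42_exports/daily_2024-01-01.csv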

Data and Schema Consistency

  • Empty strings in CSV, TSV, and TXT files are treated as null unless the column's data type is specified as string.
  • The connector samples 50 rows from up to 5 files to infer the schema.
  • All files within a source must maintain a consistent schema, and the data must be valid and parsable; failing either requirement may result in pipeline failure.
  • Test the source connection (Test connection) before fetching the schema to identify any authorization issues; a rough sketch of such a check follows this list.
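
A check of this kind can be approximated outside the product with two AWS calls: assume the configured role with the External ID, then attempt a minimal list operation. The ARN, External ID, and bucket below are placeholders.

    import boto3
    from botocore.exceptions import ClientError

    # Placeholders mirroring the connection configuration above.
    ROLE_ARN = "arn:aws:iam::111122223333:role/s3-connector-role"
    EXTERNAL_ID = "your-unique-external-id"
    BUCKET = "example-bucket"

    def test_connection():
        """Assume the role and try a minimal list call, as Test connection would."""
        try:
            creds = boto3.client("sts").assume_role(
                RoleArn=ROLE_ARN,
                RoleSessionName="connection-test",
                ExternalId=EXTERNAL_ID,
            )["Credentials"]
            s3 = boto3.client(
                "s3",
                aws_access_key_id=creds["AccessKeyId"],
                aws_secret_access_key=creds["SecretAccessKey"],
                aws_session_token=creds["SessionToken"],
            )
            s3.list_objects_v2(Bucket=BUCKET, MaxKeys=1)
            return True
        except ClientError as err:
            print("Authorization problem:", err)
            return False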

By adhering to these configurations and considerations, users can effectively set up and maintain the S3 Connector, ensuring smooth and accurate data replication processes.

Collections

  • Only one table per pipeline/source is supported on this connector.

Limitations

  • Only files in the formats below, and their gzip- and bzip2-compressed versions, are supported:
    • CSV
    • TSV
    • TXT
    • JSON
    • JSONL
  • Files that have not been modified since the last run are not synced again, which prevents syncing the same file multiple times (see the sketch after this list).
  • Ensure that the bucket name, file prefix, and file regex are configured correctly to avoid missed files during sync.
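
The modification-time gate mentioned above can be pictured as a simple checkpoint comparison: only objects whose LastModified timestamp is newer than the last successful run are picked up. This is a hypothetical illustration; the connector's bookkeeping may differ.

    from datetime import datetime, timezone

    import boto3

    # Assumed checkpoint recorded after the previous successful run.
    LAST_RUN = datetime(2024, 6, 1, tzinfo=timezone.utc)

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="example-bucket", Prefix="exports/"):
        for obj in page.get("Contents", []):
            if obj["LastModified"] > LAST_RUN:
                print("would sync:", obj["Key"])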