Using components: Snowflake Destination

Use the Snowflake destination component to store the output of a data flow in a Snowflake table. The destination component first stages the data in Amazon S3 and then uses Snowflake's COPY statement to load the data into the table.
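
Conceptually, the load step resembles the COPY statement sketched below. The stage, path, table name, and file format are hypothetical; Integrate.io ETL manages the actual staging location and copy options for you.

    -- Hypothetical sketch of the load step; all names and options are illustrative.
    COPY INTO my_schema.my_table
    FROM @my_s3_stage/dataflow_output/
    FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');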

Connection

Select an existing Snowflake connection or create a new one (for more information, see Allowing Integrate.io ETL access to my Snowflake account).

Destination Properties

  • Target schema - the target table's schema. If empty, the default schema is used.
  • Target table - the name of the target table in your Snowflake database. By default, if the table doesn't exist, it will be created automatically.
  • Automatically create table if it doesn't exist - if unchecked and the table doesn't exist, the job fails.
  • Automatically add missing columns - when checked, the job checks whether each of the mapped columns exists in the table and adds any column that is missing (see the sketch after this list). Key columns can't be automatically added to a table.
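
As a rough illustration, adding a missing column amounts to DDL like the following; the table and column names are hypothetical, and the exact statement Integrate.io ETL issues may differ.

    -- Hypothetical example: add a column that appears in the dataflow
    -- but not yet in the target table.
    ALTER TABLE my_schema.my_table ADD COLUMN discount_rate DOUBLE;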

Operation type

Append (Insert only) - the default behavior. Data is only appended to the target table.

Overwrite (Truncate and insert) - truncates the target table before the data is inserted.

Overwrite (Delete all rows on table and insert) - deletes all rows from the target table before the data flow executes. Use this mode if a TRUNCATE statement can't be executed on the target table due to permissions or other constraints. This operation does not clear the schema.
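
The difference between the two overwrite modes comes down to the statement used to empty the table, roughly as follows (table name hypothetical):

    -- Overwrite (Truncate and insert): fast, but requires the privilege
    -- to truncate the table.
    TRUNCATE TABLE my_schema.my_table;

    -- Overwrite (Delete all rows on table and insert): usable where TRUNCATE
    -- is not permitted; removes rows without touching the table definition.
    DELETE FROM my_schema.my_table;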

Merge with existing data using delete and insert - incoming data is merged with the existing data in the table by deleting rows in the target table whose keys exist in the incoming data and then inserting all of the incoming data into the target table. Requires setting the merge keys correctly in the field mapping. The merge is done in a single transaction (sketched in SQL after the steps):

  1. The dataflow's output is copied into a temporary table with the same schema as the target table.
  2. Rows with keys that exist in the temporary table are deleted from the target table.
  3. All rows in the temporary table are inserted into the target table.
  4. Temporary table is dropped.
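
In Snowflake terms, the sequence looks roughly like the following. The table, stage, and key names are hypothetical, and the actual statements Integrate.io ETL generates may differ (for example, DDL such as CREATE TABLE commits independently of any open transaction in Snowflake).

    -- 1. Stage the dataflow's output in a temporary table with the target's schema.
    CREATE TEMPORARY TABLE tmp_output LIKE my_table;
    COPY INTO tmp_output FROM @my_s3_stage/dataflow_output/;
    -- 2.-3. Delete matching rows and insert the incoming data in one transaction.
    BEGIN;
    DELETE FROM my_table USING tmp_output WHERE my_table.id = tmp_output.id;
    INSERT INTO my_table SELECT * FROM tmp_output;
    COMMIT;
    -- 4. Drop the temporary table (it is also dropped automatically at session end).
    DROP TABLE tmp_output;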

Merge with existing data using update and insert - incoming data is merged with the existing data in the table by updating existing rows and inserting new ones. Requires setting the merge keys correctly in the field mapping. The merge is done in the following manner (sketched in SQL after the steps):

  1. The dataflow's output is copied into a temporary table with the same schema as the target table.
  2. Existing records (by key) in the target table are updated and new records are inserted using the MERGE statement.
  3. Temporary table is dropped.
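
The update-and-insert variant relies on Snowflake's MERGE statement. A minimal sketch, assuming a hypothetical table with key column id and a single data column amount:

    -- Stage the dataflow's output, as in the previous merge variant.
    CREATE TEMPORARY TABLE tmp_output LIKE my_table;
    COPY INTO tmp_output FROM @my_s3_stage/dataflow_output/;
    -- Update matching rows and insert new ones in a single statement.
    MERGE INTO my_table t
    USING tmp_output s
      ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET amount = s.amount
    WHEN NOT MATCHED THEN INSERT (id, amount) VALUES (s.id, s.amount);
    DROP TABLE tmp_output;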

Pre and post action SQL

Pre-action SQL - SQL code to execute before inserting the data into the target table. If a merge operation is selected, the SQL code is executed before the staging table is created.

Post-action SQL - SQL code to execute after inserting the data into the target table. If a merge operation is selected, the SQL code is executed after the staging table is merged into the target table.
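
For example, a pre-action might clear out the slice of data about to be reloaded, and a post-action might record the load in an audit table. Both statements below are hypothetical:

    -- Hypothetical pre-action SQL: remove today's rows before reloading them.
    DELETE FROM my_table WHERE load_date = CURRENT_DATE();

    -- Hypothetical post-action SQL: record that the load completed.
    INSERT INTO load_audit (table_name, loaded_at)
    VALUES ('my_table', CURRENT_TIMESTAMP());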

Advanced options

  • Maximum errors - if this many errors occur in Snowflake while loading data into the table, the job fails.
  • Truncate columns - truncates string values so that they fit the target column's length specification.
  • Load empty as null - inserts empty strings as NULL values.
  • Null string - string fields that match this value are replaced with NULL.
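
Several of these options correspond roughly to Snowflake COPY and file-format options, sketched below; which options Integrate.io ETL actually sets is an assumption here, and how Maximum errors is enforced is not shown.

    -- Hypothetical sketch of how the advanced options could surface in a COPY statement.
    COPY INTO my_table
    FROM @my_s3_stage/dataflow_output/
    FILE_FORMAT = (TYPE = CSV
                   EMPTY_FIELD_AS_NULL = TRUE   -- Load empty as null
                   NULL_IF = ('NULL'))          -- Null string
    TRUNCATECOLUMNS = TRUE;                     -- Truncate columns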

Schema Mapping

Map the dataflow fields to the target table's columns. Columns defined as keys are used as the sort key when Integrate.io ETL creates the table. If a merge operation is used, you must select at least one field as a key; the key fields uniquely identify rows in the table for the merge operation.

The data types in Integrate.io ETL are mapped as follows when the table is created automatically. Note that since Integrate.io ETL doesn't have a notion of maximum string length, the string columns are created with the maximum length allowed in Snowflake.

Integrate.io ETL    Snowflake
String              VARCHAR
Integer             NUMBER
Long                NUMBER
Float               DOUBLE
Double              DOUBLE
DateTime            TIMESTAMP_TZ
Boolean             BOOLEAN
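
Put together, an automatically created table for a dataflow with one field of each type would look roughly like the following. The table and column names are hypothetical; in Snowflake, a VARCHAR with no explicit length defaults to the maximum of 16,777,216 characters.

    CREATE TABLE my_table (
        name       VARCHAR,        -- String; created with Snowflake's maximum length
        quantity   NUMBER,         -- Integer or Long
        price      DOUBLE,         -- Float or Double
        created_at TIMESTAMP_TZ,   -- DateTime
        active     BOOLEAN         -- Boolean
    );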