Using components: Database Source

Use the database source component to read data from a database table or view, or by executing a query.


Select an existing database connection or create a new one (for more information, see Allow ETL access to my database server).

Source Properties

  • Access mode - select table to extract an entire table/view or query to execute a query.
  • Source schema - the source table's schema. If empty, the default schema is used.
  • Source table/view - the table or view name from which the data will be imported.
  • Where clause - optional. Predicates to add to the WHERE clause of the SQL query that is built to read the data from the database. Omit the keyword WHERE itself.
    Good: prod_category = 1 AND prod_color = 'red'
    Bad: WHERE prod_category = 1 AND prod_color = 'red'
  • Query - type in a SQL query. Make sure to name all columns uniquely.
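
The effect of the where clause rule can be illustrated with a minimal sketch. It assumes the query is assembled as a plain SELECT from the schema, table, selected fields and where clause; the component's actual query builder is not documented here, and build_source_query, the products table and the prod_id column are hypothetical names used only for illustration.

```python
def build_source_query(table, fields, schema=None, where_clause=None):
    """Assemble a SELECT statement from table access mode settings (hypothetical helper)."""
    # Qualify the table only when a schema was given; otherwise the
    # database's default schema applies.
    target = f"{schema}.{table}" if schema else table
    query = f"SELECT {', '.join(fields)} FROM {target}"
    # The WHERE keyword is added here, so the clause itself must
    # contain only the predicates.
    if where_clause:
        query += f" WHERE {where_clause}"
    return query


good = build_source_query(
    "products", ["prod_id", "prod_color"],
    where_clause="prod_category = 1 AND prod_color = 'red'",
)
bad = build_source_query(
    "products", ["prod_id", "prod_color"],
    where_clause="WHERE prod_category = 1 AND prod_color = 'red'",
)
print(good)
print(bad)  # the keyword appears twice, which is not valid SQL
```

Under this assumption, including the keyword in the where clause yields "WHERE WHERE ..." in the generated statement, which is why the keyword has to be left out.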

Table access mode parallelization (used with table access mode only)

To parallelize a query, select a key column to split the query by and the maximum number of parallel connections. A preliminary query first gets the minimum and maximum values of the key column. Queries are then issued from multiple connections, each with a where clause that restricts it to one range of key values, so that the ranges together cover every row exactly once. E.g.: pk >= 1 AND pk < 1000, pk >= 1000 AND pk < 2000.

  • Split query by key column - specify the name of a source table column to split the query by, or leave empty to use a single query. It is recommended to use a column whose values are uniformly distributed across its value range (a primary key column is a good choice).
  • Max parallel connections - a positive integer specifying how many tasks to assign to the import process.

Note: Do not increase the number of tasks above what your database can reasonably support.
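
The range splitting described above can be sketched as follows. This is an illustration under stated assumptions, not the component's implementation: it assumes an integer key column, equal-width ranges and half-open bounds, and split_ranges is a hypothetical helper name.

```python
def split_ranges(key, min_value, max_value, max_connections):
    """Return one WHERE predicate per connection, covering min_value..max_value."""
    total = max_value - min_value + 1
    # Never create more ranges than there are possible key values.
    parts = max(1, min(max_connections, total))
    # Ceiling division, so the ranges always reach max_value.
    step = -(-total // parts)
    predicates = []
    lower = min_value
    while lower <= max_value:
        upper = lower + step
        # Half-open range: the upper bound of one range is the lower
        # bound of the next, so no key value is skipped or read twice.
        predicates.append(f"{key} >= {lower} AND {key} < {upper}")
        lower = upper
    return predicates


# min_value and max_value stand in for the result of the preliminary query.
for predicate in split_ranges("pk", 1, 3000, 3):
    print(predicate)
```

With equal-width ranges, a skewed key column puts most rows into a few ranges, which is why a uniformly distributed column is the better choice for the split key.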

Source action

  • None - By default, data is read from the database and transformations are applied immediately.
  • Copy - Copy the data from the database source to intermediate storage before processing the data. This may keep the database connections open for shorter periods of time, but selecting None would usually result in quicker job execution times.

Source Schema

After defining the source table, view or query, select the fields to use in the source.

With table access mode, the fields you select are used to build the query that will be executed to read the data.

With query access mode, select all the fields that are defined in the query and make sure the field names match the column names in the query.
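
Because fields are matched to query columns by name, a query that returns two columns with the same name is ambiguous. The sketch below shows one way to check a query for duplicate column names before using it in query access mode. It uses an in-memory SQLite database purely as a stand-in for the source database; the orders and customers tables and the duplicate_columns helper are hypothetical.

```python
import sqlite3
from collections import Counter


def duplicate_columns(connection, query):
    """Return the column names that occur more than once in the query's result."""
    cursor = connection.execute(query)
    # cursor.description holds one entry per result column; item 0 is its name.
    names = [column[0] for column in cursor.description]
    return sorted(name for name, count in Counter(names).items() if count > 1)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

ambiguous = (
    "SELECT o.id, c.id, c.name "
    "FROM orders o JOIN customers c ON c.id = o.customer_id"
)
aliased = (
    "SELECT o.id AS order_id, c.id AS customer_id, c.name "
    "FROM orders o JOIN customers c ON c.id = o.customer_id"
)

print(duplicate_columns(conn, ambiguous))  # ['id']
print(duplicate_columns(conn, aliased))    # []
```

Aliasing each column with AS, as in the second query, gives every column a unique name that can be selected as a field.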

Define the data type for each field. Use the following table when matching database data types to ETL data types.

| PostgreSQL | MySQL | Microsoft SQL Server | Oracle | Snowflake | ETL |
|---|---|---|---|---|---|
| varchar, char, text, time, interval | varchar, nvarchar, text, time | varchar, nvarchar, text, ntext, time, datetimeoffset | longnvarchar, nchar, nvarchar, longvarchar, char, varchar, clob, nclob | varchar, char, character, string, text | String |
| smallint, int | bit, bool, tinyint, smallint, mediumint, int, integer | tinyint, smallint, int | tinyint, integer, smallint | | Integer |
| bigint | bigint | bigint | bigint | int, integer, bigint, smallint, tinyint, byteint, number(38,0) | Long |
| decimal, real | decimal, float | decimal, numeric, float | float, binary float, real | | Float |
| double precision | double | real | numeric, decimal, binary double | float, float4, float8, double, double precision, real, decimal, numeric | Double |
| timestamp, date | date, datetime, timestamp | datetime, date, datetime2, smalldatetime | date, time, timestamp, timestamptz, timestampltz | date, datetime, timestamp, timestamptz, timestampltz, timestampntz | DateTime |
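
When fields are defined by a script rather than by hand, the mapping can be kept as a lookup. The sketch below transcribes only the PostgreSQL entries from the table above; the etl_type helper and the stripping of length or precision suffixes such as varchar(255) are assumptions made for illustration and are not part of the product.

```python
# PostgreSQL entries of the type mapping, keyed by lowercase type name.
POSTGRESQL_TO_ETL = {
    "varchar": "String", "char": "String", "text": "String",
    "time": "String", "interval": "String",
    "smallint": "Integer", "int": "Integer",
    "bigint": "Long",
    "decimal": "Float", "real": "Float",
    "double precision": "Double",
    "timestamp": "DateTime", "date": "DateTime",
}


def etl_type(database_type):
    """Return the ETL type for a PostgreSQL type name, or None if it is not listed."""
    # Drop a length/precision suffix such as "(255)" and normalise case.
    base = database_type.split("(")[0].strip().lower()
    return POSTGRESQL_TO_ETL.get(base)


print(etl_type("VARCHAR(255)"))      # String
print(etl_type("double precision"))  # Double
print(etl_type("jsonb"))             # None, because the table does not list it
```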

Note: The query is executed in the read-committed transaction isolation level.