Use the database source component to read data from a database table or view, or from the results of a SQL query.
Connection
Select an existing database connection or create a new one (for more information, see Allow Integrate.io ETL access to my database server).
Source Properties
-
Access mode - select table to read an entire table or view, or query to execute a custom SQL query.
-
Source schema - the source table's schema. If empty, the default schema is used.
-
Source table/view - the name of the table or view to read the data from.
-
Where clause - optional. Predicates to add to the WHERE clause of the SQL query that the component builds to read the data from the database. Omit the WHERE keyword itself.
Good | prod_category = 1 AND prod_color = 'red' |
Bad | WHERE prod_category = 1 AND prod_color = 'red' |
-
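Query - the SQL query to execute (query access mode). Make sure every column in the result has a unique name; use column aliases (AS) for expressions and duplicate column names.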
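In table access mode the component builds the SELECT statement itself, which is why the where clause must not repeat the WHERE keyword. The exact query text it generates is not documented here; the sketch below assumes a plain SELECT of the selected fields from schema.table, with identifiers used unquoted. The function build_table_query is hypothetical.

```python
def build_table_query(table, columns, schema="", where=""):
    """Hypothetical sketch of a table-mode SELECT.

    Assumptions: identifiers are used as given (no quoting), and the
    where-clause text is appended after a single WHERE keyword.
    """
    source = f"{schema}.{table}" if schema else table
    query = f"SELECT {', '.join(columns)} FROM {source}"
    if where.strip():
        # The builder adds WHERE itself, so the user-supplied text must omit it.
        query += f" WHERE {where.strip()}"
    return query


print(build_table_query(
    "products",
    ["prod_id", "prod_name"],
    schema="sales",
    where="prod_category = 1 AND prod_color = 'red'",
))
```

With the "Good" predicate this produces SELECT prod_id, prod_name FROM sales.products WHERE prod_category = 1 AND prod_color = 'red'. Passing the "Bad" value instead would produce WHERE WHERE ..., which no database accepts.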
Table access mode parallelization (used with table access mode only)
To parallelize the read, select a key column to split the query by and the maximum number of parallel connections. A preliminary query first retrieves the minimum and maximum values of the key column; queries are then issued over multiple connections, each with a where clause that restricts it to one range of key values, e.g. pk >= 1 AND pk < 1000, pk >= 1000 AND pk < 2000.
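The splitting step can be sketched as follows, assuming an integer key column and equal-width, half-open ranges. The helper names split_ranges and range_predicates are hypothetical, and the component's actual boundary calculation may differ. The point is that consecutive ranges share a boundary (one range's exclusive upper bound is the next range's inclusive lower bound), so no key value is skipped or read twice.

```python
def split_ranges(min_value, max_value, max_connections):
    """Split [min_value, max_value] into at most max_connections
    half-open integer ranges (lo, hi), meaning lo <= key < hi."""
    if max_connections < 1:
        raise ValueError("max_connections must be a positive number")
    span = max_value - min_value + 1       # number of distinct key values
    count = min(max_connections, span)     # never more ranges than values
    step = -(-span // count)               # ceiling division
    ranges = []
    lo = min_value
    while lo <= max_value:
        hi = min(lo + step, max_value + 1)
        ranges.append((lo, hi))
        lo = hi                            # next range starts where this one ends
    return ranges


def range_predicates(column, ranges):
    """Turn ranges into where-clause predicates, one per connection."""
    return [f"{column} >= {lo} AND {column} < {hi}" for lo, hi in ranges]


print(range_predicates("pk", split_ranges(1, 1999, 2)))
```

If the preliminary query returned 1 and 1999, split_ranges(1, 1999, 2) yields [(1, 1001), (1001, 2000)], i.e. the predicates pk >= 1 AND pk < 1001 and pk >= 1001 AND pk < 2000.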
-
Split query by key column - specify the name of a source table column to split the query by, or leave it empty to use a single query. It is recommended to use a column whose values are uniformly distributed across its range (a primary key column is usually a good choice).
-
Max parallel connections - a positive number specifying the maximum number of parallel connections (tasks) used to read the data.
Note: Do not increase the number of tasks above what your database can reasonably support.
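To see why a uniformly distributed key column matters, the sketch below counts how many rows each connection would read if the key range were split into equal-width ranges. This splitting strategy is an assumption for illustration, and the function rows_per_range is hypothetical; it works on an in-memory list of key values rather than a real table.

```python
from collections import Counter


def rows_per_range(keys, max_connections):
    """Count the rows falling into each of max_connections equal-width
    ranges between min(keys) and max(keys) (assumed splitting strategy)."""
    lo, hi = min(keys), max(keys)
    width = (hi - lo) / max_connections or 1   # avoid zero width if all keys are equal
    # The maximum key lands exactly on the upper edge; clamp it into the last range.
    counts = Counter(min(int((k - lo) / width), max_connections - 1) for k in keys)
    return [counts[i] for i in range(max_connections)]


uniform = list(range(1, 1001))
skewed = list(range(1, 991)) + list(range(100001, 100011))
print(rows_per_range(uniform, 4))
print(rows_per_range(skewed, 4))
```

With 1,000 evenly spaced keys, each of 4 connections reads 250 rows. With 990 keys clustered at the low end and 10 outliers near 100,000, the split gives [990, 0, 0, 10]: one connection does almost all the work, so the extra connections add load without speeding up the read.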
Source action
-
None - the default. Data is read from the database and transformations are applied immediately.
-
Copy - Copy the data from the database source to intermediate storage before processing the data. This may keep the database connections open for shorter periods of time, but selecting None would usually result in quicker job execution times.
Source Schema
After defining the source table, view, or query, select the fields to use in the source.
With table access mode, the fields you select are used to build the query that will be executed to read the data.
With query access mode, select all the fields that are defined in the query and make sure to use the same column names as in the query.
Define the data type for each field. Use the following table when matching database data types to Integrate.io ETL data types.
PostgreSQL | MySQL | Microsoft SQL Server | Oracle | Snowflake | Integrate.io ETL |
varchar, char, text, time, interval | varchar, nvarchar, text, time | varchar, nvarchar, text, ntext, time, datetimeoffset | longnvarchar, nchar, nvarchar, longvarchar, char, varchar, clob, nclob | varchar, char, character, string, text | String |
smallint, int | bit, bool, tinyint, smallint, mediumint, int, integer | tinyint, smallint, int | tinyint, integer, smallint | | Integer |
bigint | bigint | bigint | bigint | int, integer, bigint, smallint, tinyint, byteint, number(38,0) | Long |
decimal, real | decimal, float | decimal, numeric, float | float, binary float, real | | Float |
double precision | double | real | numeric, decimal, binary double | float, float4, float8, double, double precision, real, decimal, numeric | Double |
timestamp, date | date, datetime, timestamp | datetime, date, datetime2, smalldatetime | date, time, timestamp, timestamptz, timestampltz | date, datetime, timestamp, timestamptz, timestampltz, timestampntz | DateTime |
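When many columns need typing, the table above can be treated as a lookup. The sketch below transcribes only the PostgreSQL and Snowflake columns (the others follow the same pattern) and normalizes native type names by lowercasing them and, as a fallback, dropping length or precision parameters such as varchar(255). The names ETL_TYPES and etl_type are hypothetical; this is not a product API, and the product's own type detection may behave differently.

```python
import re

# Hypothetical transcription of two columns of the type-mapping table.
ETL_TYPES = {
    "postgresql": {
        "String": {"varchar", "char", "text", "time", "interval"},
        "Integer": {"smallint", "int"},
        "Long": {"bigint"},
        "Float": {"decimal", "real"},
        "Double": {"double precision"},
        "DateTime": {"timestamp", "date"},
    },
    "snowflake": {
        "String": {"varchar", "char", "character", "string", "text"},
        "Long": {"int", "integer", "bigint", "smallint", "tinyint", "byteint", "number(38,0)"},
        "Double": {"float", "float4", "float8", "double", "double precision",
                   "real", "decimal", "numeric"},
        "DateTime": {"date", "datetime", "timestamp", "timestamptz",
                     "timestampltz", "timestampntz"},
    },
}


def etl_type(database, native_type):
    """Return the Integrate.io ETL type for a native column type, or None."""
    mapping = ETL_TYPES[database.lower()]
    name = " ".join(native_type.lower().split())          # normalize case and spacing
    exact = re.sub(r"\s*([(),])\s*", r"\1", name)          # "number(38, 0)" -> "number(38,0)"
    base = re.sub(r"\(.*\)$", "", exact)                   # "varchar(255)" -> "varchar"
    for candidate in (exact, base):                        # try the exact name first
        for etl, natives in mapping.items():
            if candidate in natives:
                return etl
    return None


print(etl_type("PostgreSQL", "VARCHAR(255)"))
print(etl_type("snowflake", "NUMBER(38, 0)"))
```

The exact name is checked before the stripped one so that Snowflake's number(38,0) maps to Long, while a type the table does not list, such as NUMBER(10,2), returns None instead of being guessed.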
Note: The query is executed at the read-committed transaction isolation level.