Use the Salesforce source component to read Salesforce Sales Cloud standard and custom objects using the Bulk API.
Connection
Select an existing Salesforce connection or create a new one (for more information, see Allowing Integrate.io ETL access to my data on Salesforce).
Source Properties
- API Version - the list of Salesforce Bulk API versions we support. Read more about Bulk API 2.0 here.
- Access mode - select object to extract an entire object, or query to execute a SOQL query.
- Source object - the name of the Salesforce object from which the data will be imported.
- Where clause - optional. You can add predicate clauses to the WHERE clause of the SOQL query that is built to read the data from Salesforce. Make sure to skip the keyword WHERE:
  Good | prod_category = 1 AND prod_color = 'red'
  Bad | WHERE prod_category = 1 AND prod_color = 'red'
- Source action - Use bulk query (default) to read available records, or bulk query all to read available and deleted records.
- Query - type in a SOQL query (see the example after this list). Read more about SOQL syntax here.
- Max records - specify the number of records to retrieve per page/API call to prevent timeouts. If left empty, all records are retrieved in a single API call. Only available in Bulk API 2.0.
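For example, when Access mode is set to query, a SOQL query like the one below can be typed into the Query field (the Account object and its fields are used purely for illustration):
SELECT Id, Name, CreatedDate
FROM Account
WHERE CreatedDate > 2023-01-01T00:00:00Z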
Source Schema
After defining the source object, select the fields to use in the source.
The fields you select are used to build the SOQL query that will be executed to read the data.
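For instance, selecting the Id, Email, and SystemModstamp fields of a Contact source object (the object and field names here are just an example) results in a generated query along these lines, with any text entered in the where clause field appended after a WHERE keyword:
SELECT Id, Email, SystemModstamp
FROM Contact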
Loading data incrementally from Salesforce
In order to load data incrementally (changes and additions), the object to synchronize should have the SystemModstamp column. This column is automatically updated whenever a user or an automated process updates a record. Use the following condition in the where clause field with a variable:
SystemModstamp > $lastsysmod
You can use the package's last successful job submission timestamp as the variable's value, as in the example below, or use the ExecuteSqlDatetime function to get the last SystemModstamp stored in your destination database table.
lastsysmod = CASE WHEN (COALESCE($_PACKAGE_LAST_SUCCESSFUL_JOB_SUBMISSION_TIMESTAMP,'')=='') THEN '1900-01-01T00:00:00Z' ELSE $_PACKAGE_LAST_SUCCESSFUL_JOB_SUBMISSION_TIMESTAMP END
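On the very first run there is no previous successful job, so $lastsysmod falls back to 1900-01-01T00:00:00Z and the full history is read. On subsequent runs the condition in the where clause field resolves to something like the following (the timestamp value is purely illustrative), so only records changed since the last successful job are read:
SystemModstamp > 2024-05-01T13:45:00Z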
In order to store additions or changes in your database destination, make sure to mark the id column as a key and change the operation type to "merge".
Enabling PK Chunking
You can enable PK Chunking for large datasets, such as an initial bulk load. When this feature is enabled, Salesforce automatically splits the Bulk API query job into multiple batches. Integrate.io ETL then polls for the progress of each batch and processes them in parallel once all are done. The parallelism depends on the cluster node count (a higher node count can pull more batches in parallel). Note that PK Chunking is disabled in Bulk API 2.0.
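Conceptually, with PK Chunking enabled, Salesforce splits a single query into batches that each cover a range of record Ids, roughly like the sketch below (the object, fields, and Id boundaries are purely illustrative), continuing until the highest Id is covered:
SELECT Id, Name FROM Account WHERE Id >= '001300000000000' AND Id < '00130000000132G'
SELECT Id, Name FROM Account WHERE Id >= '00130000000132G' AND Id < '00130000000264W'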
Limitations
It is currently not possible to join a Salesforce Sandbox environment source connection with a Salesforce Production environment source connection.
This limitation applies to using the Join, Union, and Cross-join components to combine Sandbox with Production source data.
Querying Parent Fields
It is possible to execute a query containing parent fields, for example the following query:
SELECT CreatedDate, Id, Sales_User__r.Id, Sales_User__r.Name
FROM Account
Sales_User__r.Id and Sales_User__r.Name are fields from Account's parent object Sales_User__c (Sales_User__r is the relationship name prefix for the Account object; for more details about the difference between __c and __r, see here).
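As a further illustration, standard relationships are referenced without the __r suffix; for example, the standard Owner relationship on Account (shown here only as an illustration) is queried as:
SELECT Id, Owner.Name
FROM Account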
Please note that when running the query, the schema column might not appear if the returned data is empty.