Using components: Salesforce source

Use the Salesforce source component to read Salesforce sales cloud standard and custom objects using the Bulk API.

Connection

Select an existing Salesforce connection or create a new one (for more information, see Allowing Integrate.io ETL access to my data on Salesforce.)

Source Properties

  • API Version - list of Salesforce Bulk API versions we support. Read more about Bulk API 2.0 here
  • Access mode - select object to extract an entire object or query to execute a SOQL query.
  • Source object - the table name from which the data will be imported.
  • where clause - optional. You can add predicates clauses to the WHERE clause as part of the SQL query that is built in order to get the data from the database. Make sure to skip the keyword WHERE.
    Good prod_category = 1 AND prod_color = 'red'
    Bad WHERE prod_category = 1 AND prod_color = 'red'
  • Source action - Use bulk query (default) to read available records or bulk query all to read available and deleted records.
  • Query - type in a SOQL query. Read more about SOQL syntax here.
  • Max records - Specify the number of records to retrieve per page/API call to prevent timeouts. If left empty, all records will be retrieved in a single API call. Only available in Bulk API 2.0 

thumbnail image

Source Schema

After defining the source object, select the fields to use in the source.

The fields you select are used to build the SOQL query that will be executed to read the data.

Loading data incrementally from Salesforce

In order to load data incrementally (changes and additions) to objects, the object to synchronize should have the systemmodstamp column. This column is automatically updated whenever a user or an automated process updates a record. Use the following condition in the where clause field with a variable:
 
SystemModstamp > $lastsysmod

thumbnail image

You can use the last successful submission timestamp for the package as you can see in the example below as a value for the variable, or use ExecuteSqlDatetime function to get the last SystemModstamp in your destination database table.

lastsysmod = CASE WHEN (COALESCE($_PACKAGE_LAST_SUCCESSFUL_JOB_SUBMISSION_TIMESTAMP,'')=='') THEN '1900-01-01T00:00:00Z' ELSE $_PACKAGE_LAST_SUCCESSFUL_JOB_SUBMISSION_TIMESTAMP END


In order to store additions or changes in your database destination, make sure to mark the id column as key and change the operation type to "merge":

Enabling PK Chunking

You can enable PK Chunking for large datasets such as initial bulk load. By enabling this feature, Salesforce automatically splits the Bulk API Query job into multiple batches. Integrate.io ETL then polls for the progress of each of the batch then process them in parallel once all are done. The parallelism depends on the cluster node count (higher node count can pull more batches in parallel). Note that PK Chunking is disabled in Bulk API 2.0

thumbnail image

Limitations

It is currently not possible to join a Salesforce Sandbox environment source connection with a Salesforce Production environment source connection.

This limitation applies to Join, Union and Cross-join components to join Sandbox with Production source data.

Querying Parent Fields

It is possible to execute query containing parent fields, for example following query: 

SELECT CreatedDate, Id, Sales_User__r.Id, Sales_User__r.Name 
FROM Account

Sales_User__r.Id and Sales_User__r.Name are fields from Account's parent object: Sales__User__c (Sales_User__r is the relationship name prefix for the Account object, for more details about the difference on __c and __r is here)

Please note that when doing the query, the schema column might not pop up if the data returned is empty.