| Variable name | Description |
| --- | --- |
| _ADWORDS_API_MAX_INPUT_SPLITS | Maximum number of concurrent Google AdWords requests. |
| _ADWORDS_API_REQUEST_READ_TIMEOUT | Request timeout (in milliseconds) for Google AdWords source components. |
| _ADWORDS_API_SKIP_BAD_ACCOUNTS | Set to true to have a package complete successfully when an AdWords customer ID is inaccessible (with Google AdWords source components). |
| _BQ_READER_MAX_SHARDS | Set to 1 if Google BigQuery returns the error "Exporting to multiple wildcard URIs...". |
| _BQ_READER_POLL_INTERVAL | Sets the interval in milliseconds between retries when polling data export from Google BigQuery. |
| _BQ_READER_POLL_RETRIES | Controls the number of retries when polling data export from Google BigQuery. |
| _BYTES_PER_REDUCER | Amount of input data, in bytes, allocated to each reduce task. Used to calculate the number of reducers when _DEFAULT_PARALLELISM is 0. |
| _CACHED_BAG_MEMORY_PERCENT | Percentage of the heap allocated for all bags in a map or reduce task. When this amount is filled, data is spilled to disk. A higher value reduces spills to disk but increases the likelihood of running out of heap memory. |
| _COPY_TARGET_PARTITIONS | Controls how many partitions the data is divided into by the copy pre-process action. Setting this variable to 0 forces the process not to merge files. |
| _COPY_TARGET_SIZE | Controls the maximum size per file in a partition for files that are concatenated by the copy pre-process action. |
| _COPY_PARALLELISM | Controls how many processes are used in the copy pre-process action. |
| _DEFAULT_TIMEZONE | Default time zone for date-time datatype fields. |
| _DEFAULT_PARALLELISM | Sets the default number of parallel reduce tasks to use in the package. Generally speaking, the number of reducers depends on the size of your data and its distribution. If your data is relatively big but skewed (for example, when you aggregate by a field and most records fall into one group), adding more reducers will not improve performance. The default value is 0, which means that the number of reducers is calculated from _BYTES_PER_REDUCER (see the first sketch after this table). |
| _FB_ASYNC_REPORT_TIMEOUT | Request timeout in seconds for a Facebook Ads Insights source async report request (per attempt). If this value is exceeded, the attempt fails (default: no timeout). |
| _FACEBOOK_ADS_INSIGHTS_SLEEP | Interval (ms) between retry attempts when trying to get a Facebook Ads Insights report (default: 0). |
| _FS_IGNORE_MISSING_INPUT_EXCEPTIONS | Set to true to have the package complete successfully when no input is found in the source path (with file storage source components). |
| _FS_SFTP_BLOCK_SIZE | Determines the size of the block a task reads from SFTP. Change to a value greater than the default when reading large files and the SFTP server doesn't allow reading a file starting at an offset greater than 0. |
| _FS_SFTP_MAX_RETRIES | Number of retry attempts when trying to find files or directories in SFTP (default: 5). |
| _FS_SFTP_RETRY_SLEEP | Interval (ms) between retry attempts when trying to find files or directories in SFTP (default: 500). |
| _GA_API_MAX_INPUT_SPLITS | Maximum number of concurrent Google Analytics requests. |
| _GA_API_SKIP_BAD_PROFILES | Set to true to have a package complete successfully when a Google Analytics profile ID is inaccessible. |
| _GA_API_REQUEST_MAX_RESULTS | Maximum results per page for Google Analytics source components. |
| _GA_API_REQUEST_READ_TIMEOUT | Request timeout (in milliseconds) for Google Analytics source components. |
| _GA4_API_REQUEST_MAX_RESULTS | Maximum results per page for Google Analytics (GA4) source components. |
| _GA4_API_REQUEST_MAX_ATTEMPTS | Maximum number of attempts when making a request to the GA4 API. |
| _GA4_API_MAX_INPUT_SPLITS | Maximum number of concurrent Google Analytics 4 requests. |
| _HTTP_FOLLOW_REDIRECTS | Set to false if you would like the REST API source or *Curl* functions not to follow redirect status codes. |
| _HTTP_REQUEST_MAX_RETRIES | Number of retries the REST API source or *Curl* functions attempt when receiving response code 429 or 5xx before throwing an exception (see the retry sketch after this table). |
| _HUBSPOT_API_REQUEST_MAX_RETRIES | Number of retries HubSpot API requests attempt when receiving rate-limit-related errors. |
| _JDBC_SPLIT_QUERY_RETRIES | Number of retry attempts to get the minimum and maximum values for the key in database source parallel queries. |
| _JDBC_SPLIT_QUERY_RETRIES_INTERVAL_IN_SEC | Interval (in seconds) to wait between retry attempts to get the minimum and maximum values for the key in database source parallel queries. |
| _INTERMEDIATE_COMPRESSION | Enables compression for intermediate results. Defaults to false. |
| _LINE_RECORD_READER_MAX_LENGTH | Maximum length, in bytes, for lines read from files. Lines longer than this value will be discarded. |
| _MAP_MAX_ATTEMPTS | Number of times to try to execute a map task before failing the job. |
| _MAP_MAX_FAILURES_PERCENT | Controls the maximum percentage of map tasks that are allowed to fail without triggering job failure. The value range is 0-100. |
| _MAP_TASK_TIMEOUT | Number of milliseconds before a task is killed if it doesn't update its status. |
| _MAX_COMBINED_SPLIT_SIZE | Amount of data, in bytes, to be processed by a single task. Smaller files are combined until this size is reached (see the grouping sketch after this table). Larger files are split if they are uncompressed or compressed using Bzip2. |
| _PARQUET_BLOCK_SIZE | Size of a row group being buffered in memory for Apache Parquet. |
| _PARQUET_COMPRESSION | Compression type for Apache Parquet. Available values are: UNCOMPRESSED, GZIP, SNAPPY. |
| _PARQUET_PAGE_SIZE | Page size for Apache Parquet compression. |
| _REDUCER_MAX_ATTEMPTS | Number of times to try to execute a reduce task before failing the job. |
| _REDUCER_MAX_FAILURES_PERCENT | Maximum percentage of reduce tasks that are allowed to fail without triggering job failure. The value range is 0-100. |
| _SALESFORCE_API_REQUEST_MAX_RETRIES | Number of retries Salesforce API requests attempt when receiving rate-limit-related errors. |
| _SHUFFLE_INPUT_BUFFER_PERCENT | Percentage of the maximum heap size allocated to storing map outputs during the shuffle. |
| _SPANNER_BATCH_SIZE | Determines the batch size in the Google Spanner destination. The default batch size is 100. |
| _SQL_COMMAND_TIMEOUT_IN_SEC | Command timeout (in seconds) for SQL statements before failing/retrying. By default, SQL commands do not time out. |
| _SYNC_WAIT_TIME | Time in seconds to wait between staging data for an Amazon Redshift destination and executing COPY on the Redshift cluster. This also applies to the Snowflake destination. |
| _TIMEOUT_IN_SECONDS | Number of seconds after which the job will be stopped. Allows you to automatically stop a job that is taking too long. By default, no timeout is set (the job will run indefinitely). This can also be set at the account level, which applies it to all packages; please reach out if you would like to set it at the account level. |
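
To make the interaction between _DEFAULT_PARALLELISM and _BYTES_PER_REDUCER concrete, here is a minimal sketch of the reducer-count heuristic, assuming the behavior described above: an explicit parallelism wins, and otherwise the byte threshold drives the count. The function name, the 1 GB default, and the ceiling rounding are illustrative assumptions, not the engine's actual code.

```python
import math

def reducer_count(input_bytes: int,
                  default_parallelism: int = 0,
                  bytes_per_reducer: int = 1_000_000_000) -> int:
    """Illustrative heuristic only: an explicit _DEFAULT_PARALLELISM wins;
    otherwise the count is derived from _BYTES_PER_REDUCER (assumed behavior)."""
    if default_parallelism > 0:
        return default_parallelism
    # One reducer per bytes_per_reducer of input, and at least one reducer.
    return max(1, math.ceil(input_bytes / bytes_per_reducer))

# 5 GB of input with the assumed 1 GB threshold -> 5 reducers.
print(reducer_count(5_000_000_000))  # 5
```

This also shows why adding reducers does not help with skewed data: the count only changes how the input is divided, not how many records land in the largest group.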
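
The semantics governed by _HTTP_REQUEST_MAX_RETRIES and _HTTP_FOLLOW_REDIRECTS can be pictured as the loop below: retry only on response code 429 or 5xx, then raise. This is a sketch of the documented behavior, not the product's implementation; the use of the requests library and the fixed one-second backoff are assumptions.

```python
import time
import requests

def get_with_retries(url: str, max_retries: int = 3,
                     follow_redirects: bool = True) -> requests.Response:
    """Sketch of the documented semantics: retry on 429/5xx up to
    max_retries times, then raise; other status codes return immediately."""
    response = None
    for attempt in range(max_retries + 1):
        response = requests.get(url, allow_redirects=follow_redirects)
        if response.status_code != 429 and response.status_code < 500:
            return response  # success or a non-retryable error code
        if attempt < max_retries:
            time.sleep(1.0)  # assumed fixed backoff, for illustration only
    response.raise_for_status()  # raises HTTPError for the final 429/5xx
```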
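
Finally, the grouping effect of _MAX_COMBINED_SPLIT_SIZE can be illustrated with a greedy packing sketch: small inputs are combined into one task until the limit would be exceeded. The function is hypothetical and covers only the combining half of the rule; it ignores the splitting of files larger than the limit.

```python
def combine_small_files(file_sizes: list[int], max_combined: int) -> list[list[int]]:
    """Greedily group files so each task reads at most ~max_combined bytes
    (hypothetical illustration of the _MAX_COMBINED_SPLIT_SIZE rule)."""
    tasks: list[list[int]] = []
    current: list[int] = []
    current_bytes = 0
    for size in sorted(file_sizes):
        if current and current_bytes + size > max_combined:
            tasks.append(current)  # close the full group, start a new one
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        tasks.append(current)
    return tasks

# Sizes in MB against a 128 MB limit -> three tasks: [40, 50], [60], [100].
print(combine_small_files([100, 40, 60, 50], max_combined=128))
```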