Rails Upgrade
Schema Importer
Updated sshj to the latest version
Expression / text box fields
Updated validation errors to be less aggressive (lighter colors, highlighting on the relevant part e.g. alias / expression).
Rest API source
Added text preview on Headers
Join component
Fixed a bug where Step 2 is collapsed by default when opening for the first time.
Package variables
Fixed an issue where the text preview was not shown when there were more than 3 variables.
Package variables (Secrets)
Added text preview to show examples
Component previewer
Now available for large API responses
Database Destination Component
Added a preview of available columns in the destination table
Package Designer
Note component now aligned to the grid
Google Ads
Update to v17 API
Meta Ads
Update to v20 API
Salesforce
System variable _SALESFORCE_API_REQUEST_MAX_RETRIES: max retry for API calls (to address expired/invalid token error)
Facebook Pages
Remove call_to_actions collection due to insufficient permissions
Remove page_groups collection due to deprecation
MySQL Connector
Add support for column renaming replication (without backfilling)
BigQuery Connector
Add support for column addition / renaming replication
Support deleted record syncing on BULK/REST
Set deleted records support as the default
SQLServer Connector
Support DECIMAL data type mapping to destination
Schema importer updates
Potential memory leak updates to avoid importer downtime (Excel, Facebook, Hubspot, JDBC, Azure)
Added a select box with loaded tables/schemas in database components. Users can now select a table/schema from the list instead of typing the name manually
Support ‘$’ character in SFTP and FTPS passwords
IBM Db2
Now supported as a destination as well as a source
BigQuery
We now show a list of tables on the BigQuery dataset
Backend stack upgrades
Fix the issue where test connection and save source would fail when trying to edit source
UI styles rewrite for improved performance
Usage Dashboard
Add option to order pipelines/tables based on total usage
Change Start Date format to allow setting it from the dashboard
Add manual refresh for authentication tokens
MySQL (and other database sources)
Auto Sync new tables
Automatically add newly created tables to the sync
Facebook Pages connector
During source setup, show the user the list of pages available for sync
Google Ads Connector
Upgrade API version
Added a new String Qualifier option: None (with line ending option). With this option, users can set the CRLF line ending while using no string qualifier.
Added a new option for Excel files, Ingest large Excel file. When checked, it enables processing of large files.
Zip file ingestion on File Storage source
Automatically ingest zip file when the file path suffix ends with .zip
Supported record types: delimited values, JSON, Raw, XML
Able to stream password-protected zip files. Leave the password field blank if the file has no password
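As a rough illustration of the zip ingestion behaviour listed above, the Python sketch below detects a .zip suffix and reads each archive member, passing a password only when one is supplied. The function names are hypothetical and the case-insensitive suffix check is an assumption. The standard-library zipfile module is used only for illustration (it can decrypt only the legacy ZIP encryption scheme), so this is not the product's implementation.

```python
import io
import zipfile
from typing import Iterator, Optional, Tuple


def is_zip_path(file_path: str) -> bool:
    # Assumption: the suffix check is case-insensitive.
    return file_path.lower().endswith(".zip")


def read_zip_members(
    payload: bytes, password: Optional[str] = None
) -> Iterator[Tuple[str, bytes]]:
    # A blank or missing password means the archive is read without one.
    pwd = password.encode("utf-8") if password else None
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, keep only files
            yield info.filename, archive.read(info, pwd=pwd)
```

Each yielded pair is a member name and its raw bytes, which could then be parsed as delimited values, JSON, Raw, or XML.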
Viewpoint
API pagination support added
Toast
Add 50% threshold for Monthly Usage notification mailer
Style and usability updates on Settings pages
Usage dashboard
Ability to create reports to filter by specific pipeline/s and table/s
Snowflake
Add override to use TIMESTAMP_NTZ on date fields
Add avro LogicalTypes to support date strings
MySQL
Add override to support date strings from source
Bulk API - Update continuous sync resets on successful run
Add tables to the list of tables not supported by the connector
New data mappings for Redshift/S3 & Snowflake destination
Uses a separate CONFIG_ID
Deprecated fields removed
dateRange option has been added (default set to 7)
BingAds
Update schema and overview docs
Automatically replace the . character with _ in Salesforce source component field aliases
Rails upgrade
Package Variables (Secrets)
We can now store secrets on the package for sensitive variables such as tokens, API keys, passwords, etc. Only the last 4 characters are viewable upon saving the package and these are encrypted in our database. Secrets are only supported on dataflows, and cannot be overwritten on the workflow or schedule level for simplicity.
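The "last 4 characters are viewable" behaviour can be pictured with a small Python sketch. The function name, the mask character, and the handling of very short values are all assumptions made for illustration; encryption at rest is outside the scope of this example.

```python
def mask_secret(value: str, visible: int = 4, mask_char: str = "*") -> str:
    # Assumption: values no longer than `visible` are masked entirely,
    # so a short secret is never displayed in full.
    if len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]
```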
Memory issue (when previewing schema) updates
Oracle connectors
test connection/preview updates
File Storage
Package system var (_FS_IGNORE_MISSING_INPUT_EXCEPTIONS) supports Excel
Reference object in destination component
Database destination
Added an option beside Auto-fill to convert names to lowercase / uppercase letters, which is particularly useful for Snowflake warehouses.
Released Data Observability compare metric feature
Updated frontend libraries
Scheduled Replication
Allow setting an anchor time for 6, 8 & 12hr sync frequencies
Pipeline sync delay monitoring on cron schedule
Updates how CSS styles are written for part of the app
Update OAuth flow
Salesforce V2
Continuous sync runs executing simultaneously
Marketo
Updated pagination
Sources with Query Mode
Implemented pre-loading of Query box
Search Bar
Fixed an issue when the package name has certain special characters
Angular upgrade
We now support ingesting Parquet file(s) from the source component
Null String option for destination component
String fields that match this value will be replaced with NULL. Default value is \N
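A minimal sketch of the Null String replacement, assuming records are represented as Python dictionaries (the function and constant names are hypothetical):

```python
from typing import Any, Dict

DEFAULT_NULL_STRING = r"\N"  # backslash followed by N, the documented default


def apply_null_string(
    record: Dict[str, Any], null_string: str = DEFAULT_NULL_STRING
) -> Dict[str, Any]:
    # Only string fields that match the marker exactly become None (NULL);
    # every other value is passed through unchanged.
    return {
        key: None if isinstance(value, str) and value == null_string else value
        for key, value in record.items()
    }
```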
JDBC upgrade
Facebook
Update to v19
Offstreet API
New pagination support added
HubSpot
Object/fields fetching updates on destination
Introduced an option to enforce casting for BigQuery’s Bag datatype.
This addresses an issue when loading a Bag datatype generated by a function (e.g. TOBAG) in the transformation component.
Scheduled Replication feature
Allow customer to set a fixed sync time for pipelines on 24hrs replication
Documentation available here
Google Analytics
Fix table with extra dimensions
Add rate limit error handling
Fix issue with invalid metric
Fix issues with Behavior Overview & Acquisition Overview reports
Add support for schema rename
Facebook Ads
Update API version from v14 to v19
S3
Allows customers to sync files in S3 buckets for the following formats, as well as their compressed gzip and bzip2 versions:
CSV
TSV
TXT
JSON
Additional formats supported on request
Queued jobs view
We now show the number of queued and running jobs per cluster
Save search query params in URL after opening and closing jobs modal
Pipeline details view - Show usage statistics starting from 0:00 UTC time
Update Rails version
Package Designer
Removed grey dead area when zooming out
Angular update
Excel (.xls) File Storage loader updates
Search Update
Updated to Phrase Search to deliver more relevant search results
Added highlights for search keywords in results
Chargify
Rest API
Source’s fields (Raw response type) updates
Add missing fields fixes on AdsV15
Parcel lab API
Swapi API
City Spark API
Pagination support updated
LinkedIn Connection
Updated LinkedIn oauth gem
TikTok Connection
Updated TikTok oauth gem
HubSpot Destination
Data type parsing updates
Salesforce Source
We now show a schema mismatch prompt on query mode if the ordering of the fields is incorrect, as it causes data integrity issues. We’ve also removed re-ordering, removal, and adding of fields, similar to the file storage source.
Fix the IDOR vulnerability on the hooks feature
Add validation for the ownership of dependent resources
Add text length constraints to TEXT columns for Snowflake destination
GTID improvements
Deprecated isGtidMode override and replaced by replicationMode
Added support for XID event
Added gtid_executed validation
Only GTID of monitored tables will be saved
Fix bug where schema name is not reflected on the source form
Added noColumnQuotes override to not use quotes in column names
PostgreSQL connector
Fixed indexing issue caused by generic columns
Upgrade pg-subscription-stream library which will upgrade pg-copy-streams to latest version
For Redshift destinations, truncate string to Redshift’s max length on string columns
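One way to picture this truncation is sketched below in Python. Redshift's VARCHAR limit is 65535 bytes rather than characters, so the sketch cuts on the UTF-8 byte length and drops any partial multi-byte character at the cut. The function name is hypothetical, and whether the connector truncates exactly this way is an assumption.

```python
REDSHIFT_VARCHAR_MAX_BYTES = 65535


def truncate_utf8(value: str, max_bytes: int = REDSHIFT_VARCHAR_MAX_BYTES) -> str:
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    # errors="ignore" discards a multi-byte character split by the cut,
    # so the result is always valid UTF-8 within the byte budget.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```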
Rails version upgrade
File Mover functionality
Task component in Workflow
Full documentation available here
Queued Job status
Users can now see if the job is actually running or if it’s waiting for cluster resources. Some notes below:
Queued status - Job is waiting for cluster resources and hasn’t started running yet
Running status - Job is actually running
Runtime is now the actual runtime of the job though not retroactive (jobs before the deployment still show queued time + run time)
There are now three timestamps for a Completed job. Created, Started, and Completed
Job Transition from Queued to Stopped State
Addressed an issue where stopping jobs directly from a queued state was not a valid transition. The fix ensures that the job state transition logic behaves as expected
Note component is movable using both white and yellow areas, and it updates its height and width after a text change
Note components are ignored in auto-aligning functionality
Bulk API 2.0 - line endings updates
XML file loader on source component
“End of File” needs to be selected as record delimiter
“Base Record XML Key” must be specified
There is no longer a need to flatten and parse XML data as BAG; we can now call XPath to parse each attribute of the record
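To illustrate the idea of selecting records by a base key and then pulling fields out by path, here is a sketch using Python's standard xml.etree.ElementTree, which supports a limited XPath subset. It is an analogy only: the function name, the sample element names, and the alias-to-path mapping are hypothetical, and the platform's own XPath function is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def extract_records(
    xml_text: str, base_record_key: str, field_paths: Dict[str, str]
) -> List[Dict[str, Optional[str]]]:
    root = ET.fromstring(xml_text)
    # Every element named `base_record_key` is treated as one record;
    # each path is resolved relative to that record element.
    return [
        {alias: record.findtext(path) for alias, path in field_paths.items()}
        for record in root.iter(base_record_key)
    ]
```

A path that matches nothing yields None for that field rather than raising an error.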
Webgains
Process pipeline events in background to avoid unnecessary response wait time
Show detailed error message on test connection
Temporarily remove validation on table due to casing issues
Fix issue when adding multiple columns
Add validation to check if table exists before adding columns
Redshift
Truncate the object string values for specific source connectors
Use unnamed timezone instead of named in MySQL connection validation
Add automatic backoff on long-running fetches for BULK API
Update default timeout for BULK job completion
Fixes on continuous sync state
Fixes on edit source / add re-authenticate button on the dashboard
Show excluded columns in multiple rows
SQL Server
Add proper error handling during the validation
Airtable
Upgrade from API key authentication to OAuth2 (due to API key deprecation by the end of January 2024)
New User Management module for more granular access levels (documentation here)
Validation for archiving packages
Added validation to prevent archival of a package if associated with schedule(s)
Added pending jobs count when calculating number of jobs on a cluster for scheduler load balancing strategy to avoid unbalanced jobs distribution
Ads v14 updates
Ads v15 release
Bulk API 2.0 - output error updates
Bulk API 2.0 - null values updates
Backpressure mechanism for low memory consumption
Support string chunking
Fixes on continuous sync
Fixes for admin console queries
Support custom primary key tables
Snowflake connector
Add validation for future ownership grants on schema
Improve error logging when resync is needed
Google Search Console
YAML definition fixes/allow URI encoding on dependencies
Allow syncing of custom PK tables with chunking
Fix initial sync row count
PostgreSQL
Workspaces Dashboard
Optimized how we compute the packages’ metadata on the Workspaces list dashboard to reduce load time for accounts with a large number of Workspaces
Salesforce Bulk API 2.0 - Error output CSV
Schedule Modal
Fixed an issue where modals are sometimes being closed if there are multiple users navigating on the schedule modals
Enabled Workspaces feature by default for all customers
Salesforce Bulk API 2.0
Error output CSV update
Updated to v14
GA4
Variable name updates
Support for PrivateLink added
Set Parallel Sync as default for pipelines using Snowflake as a destination
Add a check condition for CDC rows schema selection page to remove the error in the logs
Update the default nullable setting for the columns in converters
Add toggle to reset binlog state when resyncing tables through the dashboard
Modify notifications to show if the state has been reset when performing a resync action
SQLServer
Fix for tables without a non-integer primary key
Fixes for the connector to support parallel sync on large databases
Postgres
Salesforce V2 fixes
Refactor Streaming API sync
Mandrill Connector
Set URLs table as unsupported since the endpoint was deprecated
Fix issues for the subaccounts table
Salesforce Connectors
Fix the backpressure mechanism to avoid duplicate records
Secure Tunnel (VPN Tunnel)
Add advanced options to set custom SSH user & custom SSH port
PostgreSQL database upgrade
Angular update to dashboard
Update to coreDNS and Pod disruption budgets for improved cluster reliability
Added 1 hour idle time to sandbox clusters
Component Previewer EU intermediate storage upgrade
Update to host change plugin application
REST API Connector
Added PUT and PATCH HTTP method support
Functionality for deleting BigQuery staging tables
Relationships fields issue updates
New dashboard UI release
Responsive Web Design - application is now usable on mobile phones
Update the AWS role storage locations to support list of values
Change AWS S3 role creation flow for Snowflake
Delete events updates
Updates on queries for column names
GA4 Export
Report option updates
Improve connection error messages
Upgrade version to v14
Automatic multi-account sync
Azure Postgres
Setup guide - Single Server & Flexible Server
Support for timestamp and row version data types
Datetimeoffset data type support
Salesforce connector
Sync data from REST/Bulk/Streaming API through same connector
Deprecate previous Salesforce connectors (Salesforce CDC, Salesforce REST/Bulk)
Mailer alerts for consumption-based pricing model customers
SSL cert migration to Heroku Automated Certificate Management
PostgreSQL upgrade testing, maintenance window on 15th of October (Sunday) from 18:30 to 19:30 Tokyo time
Salesforce Bulk API 2.0 for destination
Supports a more streamlined and optimized workflow (updated KB here). In this version:
The “Batch size” option is disabled, as it’s now automatically handled by Salesforce.
Outputting failed records is not supported yet.
Monthly Mailer
Updated case issue where job wasn’t triggering as expected
EU region production support for high-volume traffic
Turn off GTID by default on new pipelines
Change threshold value to the percentage value on the billing notification email
Changed some of the error messages from ‘info’ to ‘error’ level
Added the trimming of the values to the max length value allowed by Snowflake during the creation of the columns
Fixed the max length on SQL Server being interpreted incorrectly
Snowflake error logging improvements
All Connectors
User can now reauthenticate sources in the edit view
GA4 Export connector
Allows customer to pull raw GA4 data from BigQuery after setting up GA4 BigQuery Export
Netsuite Connector
Added error logging
Increase timeout on requests
Remove the InventoryDetail table
Fix data mismatch issue
JSON datatype is replicated to Redshift as VARCHAR and will be truncated if it exceeds VARCHAR max length
Update pooling query formatting
ETL and IIO Dashboard redesign release
Software patch update for Jetty
Error log summary explanation feature now enabled for all accounts
Temporary staging table updates to avoid race condition
Connections, Packages, Schedules and other lists are now full-width, filling all the available screen space
Package validation on source
Destination - removed case sensitivity on the Auto-Fill button. Fields are now auto-filled as long as the incoming field alias matches, regardless of case.
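A case-insensitive Auto-Fill match could look roughly like the Python sketch below. The function name and the list-based inputs are assumptions for illustration only.

```python
from typing import Dict, Iterable


def auto_fill(
    incoming_aliases: Iterable[str], destination_columns: Iterable[str]
) -> Dict[str, str]:
    # Index destination columns by their lowercase form so that
    # "Id", "ID" and "id" all resolve to the same column.
    by_lower = {column.lower(): column for column in destination_columns}
    return {
        alias: by_lower[alias.lower()]
        for alias in incoming_aliases
        if alias.lower() in by_lower
    }
```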
Bulk API 2.0 for source - to support more streamlined and optimized workflow. In this version, we:
Introduce “Max records” - specify the number of records per page to prevent timeouts
Disable “PK Chunking” option
MongoDB
Merge operation _id updates
Destination - load empty data as null option
Destination’s new system variable: _HUBSPOT_API_REQUEST_MAX_RETRIES
Paginations
Beauhurst
CitySpark
Google Analytics (GA4)
Added documentation on the Connection creation modal
Pipeline Engine Update
Released version auto update
Billing Mailer - new opt-in feature, which allows customers to get notified when they surpass certain percentages of their monthly volume limit.
Bulk selection for tables
You can type comma-separated table names in the input to select the given tables, instead of clicking the checkbox for each item.
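A sketch of how a comma-separated input might be turned into a table selection is shown below. The function name and the decision to report unknown names separately are assumptions, not documented behaviour.

```python
from typing import Iterable, List, Tuple


def parse_table_names(
    raw: str, available: Iterable[str]
) -> Tuple[List[str], List[str]]:
    known = set(available)
    # Split on commas, trim whitespace, and drop empty entries.
    requested = [name.strip() for name in raw.split(",") if name.strip()]
    selected = [name for name in requested if name in known]
    unknown = [name for name in requested if name not in known]
    return selected, unknown
```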
Removed RELOAD from required permissions
Upgraded library for large data volume timeouts
Add backoff retry support on retryable errors
Added new logging metrics for MySQL and S3 write duration
Improved Job Error Debugging With Error Log Summaries
Receive AI-generated summary of job error log to pinpoint reason for failure
Job load balancing option on scheduler
Ability to reuse cluster with the least number of jobs running
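The selection rule can be sketched in a few lines of Python. The ClusterLoad type and its field names are hypothetical, and ties are resolved in favour of the first cluster listed, which is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ClusterLoad:
    name: str
    running_jobs: int


def pick_least_loaded(clusters: Sequence[ClusterLoad]) -> ClusterLoad:
    if not clusters:
        raise ValueError("no clusters available")
    # min() returns the first cluster with the smallest running-job count.
    return min(clusters, key=lambda cluster: cluster.running_jobs)
```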
Google Service Account Json Key validation fixes
Allowing universe_domain in JSON key
Auto-map workspace when importing package JSON
Automatically assigning the package to a workspace based on the name
DB Source Component
Fixed an issue where Table mode values were not removed when switching to Query mode, which caused jobs to fail
Create Major Version
Fixed an issue where the last few changes were removed when creating a major version
Create Connection Forms
Added a modal to refer to our documentation for setup instructions
Heroku Addon V3
Workspace auto-mapping on package import JSON
Mitigations from pentest in re-test
SOC 2 tests and evidence collection
Password limit on Rest API component connection (up to 128 characters in length)
Microsoft Excel
Facebook Marketing API update to v17
Netsuite JDBC upgrades to 8.10.136.0
Survio
Validation server certificate update
Github connector validation fixes
Parsing the latest position when a GTID set contains multiple positions.
Adds tinyint and smallint as datatypes for number-like primary keys.
Version fetching for MariaDB > 11
GA360
Fixed a permission validation bug when the database, warehouse or schema names are the same
Shopify connector
Auto retry on SSLv3 connection error
MySQL connector
GTID support fixes; the configurations net_read_timeout, net_write_timeout and wait_timeout are no longer mandatory for test connection
Show only warning if non-mandatory configurations are not set to recommended values for test connection
Updated password policy
Enforced new password policy on password update / signup (at least 1 lowercase, 1 uppercase, 1 number, and min. 12 characters)
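The stated rules translate directly into a small check. The Python sketch below is illustrative only: the function name is hypothetical and restricting the character classes to ASCII is an assumption.

```python
import re

MIN_PASSWORD_LENGTH = 12


def meets_password_policy(password: str) -> bool:
    # At least 12 characters, with one lowercase letter,
    # one uppercase letter, and one digit.
    return (
        len(password) >= MIN_PASSWORD_LENGTH
        and re.search(r"[a-z]", password) is not None
        and re.search(r"[A-Z]", password) is not None
        and re.search(r"[0-9]", password) is not None
    )
```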
SSO
Updated cases for handling camel-cased emails
Updated Workspaces view with pagination support
AWS marketplace signup and billing updates
RSS feed updates
API key retrieval without password update
Package variables saving updates
Custom objects and properties support
Google Spanner Data Boost
Netsuite
SOAP validation updates
Multi-reference column support
LeadFeeder
TST Vandale
Add PIPELINE_ERRORED event tracking
New metric gauge which calculates the binlog lag based on the latest record received
Deployed SSH Tunnel feature
Updated session expiration to 12 hours
Add new pipeline events to logging feature
OAuth token leakage fix
Add the collection name to the error object to identify the collection to remove from sync temporarily
Release of hooks feature for notifications
Add a validation run before the force-start pipeline action on staff view
Make SSH tunnel as default and move it to first option
Add security response headers
MariaDB
10.6 support
Validation support for BINLOG MONITOR and REPLICA MONITOR grants for 10.5.8 and above
Convert array and object to string
String PK chunking support
Global variables on account level
We now support creating global variables at the account level. They can be modified from Account Settings -> Global Variables
Package version on jobs list
We now show the particular version of the package that ran on the jobs list (and clicking on it views the particular version of the package read-only)
Global variables validation function
We are now validating the value or expression of global variables when saving. Please note that creating a global variable using values from other global variables as reference is currently not supported.
Released schema-importer OOM fixes
EKS upgrades
API
Security updates for bug bounty reports
Complex datatype support - load MAP datatype from source to BQ (as BQ’s JSON datatype)
Error logging with Property ID
Introduced a backoff retry mechanism with a maximum attempts system variable (_GA4_API_REQUEST_MAX_ATTEMPTS, default: 3). This addresses an issue when customers exceed their quota limits.
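A backoff retry with a bounded number of attempts generally follows the pattern sketched below in Python. The exception type, the exponential delay schedule, and the function names are assumptions; only the default of 3 attempts comes from the note above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceededError(Exception):
    """Hypothetical error standing in for a quota or rate-limit response."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except QuotaExceededError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Assumed schedule: double the wait after each failure.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.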
Rate limit workaround
Stability improvement. Customers might need to increase _ADWORDS_API_REQUEST_READ_TIMEOUT to accommodate these changes (especially customers with a high number of properties/records).
Error output - empty file fixes
Column with multiple references - visible through source component’s schema and destination’s column mapping.
Bing Ads
Authentication updates
Add fields (breakdown properties)
Intermittent connection errors will now be auto retried without failing (ECONNRESET, Connection timed out, Timeout acquiring a connection)
Fixed issue on invite when email has capital letters
Create new pipeline event RATE_LIMIT_ERROR for connectors
Migrations, models and constants for SSH tunnel
Salesforce CDC
Fetch source schema in batches, set query batch size to 200, update column fetching, implement renew credentials on backend
Salesforce REST & BULK
Add KnowledgeArticle to tables not supported by BULK API
Show username when first creating the connector, renew and save credentials on the backend
Updated setup on dashboard - separated small scripts to ease customer source setup
Test connection now shows the encountered error (previously only generic message)
Google Sheets
OAuth now requests only the scopes needed by each connector (instead of requesting all previously granted scopes)
Github
Rate-limit issues fix
Dynamic connection on Execute SQL Task component on Workflow. This feature is behind our existing feature flag “Dynamic connection”
Import JSON API - updated variables when creating new packages with dataflow JSON
Sandbox clusters idle time update
Documentation search improvements
Google Cloud Storage
V2 connection
Update to v16
Property name parsing updates
Missing fields fixes on Product Partition Report: campaign.shopping_setting.merchant_id, ad_group_criterion.listing_group.case_value.product_item_id.value
File Storage destination
Dynamic destination directory creation
File Storage Source
Disabled re-ordering of fields which caused data integrity issues
Disabled re-ordering of fields on Query mode which could cause data integrity issues
Inkdesk
Increase timeout when fetching dynamic schema connector catalog
Sentry tracing updates
Added ToggleCase and ReplaceInField transforms in Staff API
Salesforce
Secure Tunnel Creation - prevent the page from reloading when downloading tunnel template
Confirmation modal for Archive and Deactivate
Create subscription proration behavior setting support
HTTP Connector
Change JobId from random number to UUID
Salesforce Marketing Cloud
Incremental sync updates
Allow choosing between REST and Bulk
Move tables LoginAsEvent, ApiEvents and ListViewEvent to forced Full Sync list.
Move table AccountUserTerritory2View to unsupported table list.
Custom certificate updates
Action parallel build run optimization
Role permission updates
Github Connector
Implement new tables: release_asset, branches, commit_files, commit_parents, commit_pull_request, commit_users_emails, deployments, deployment_statuses, issue_assignees, issue_labels, repository_teams, repository_topics, repositories, workflows, workflow_runs, workflow_pull_requests, collaborator_details
Add option to handle nested bookmarks (both on incremental sync and resync)
Fallback to string on Geometry types
New last successful job timestamp added:
_WORKFLOW_PACKAGE_LAST_SUCCESSFUL_JOB_SUBMISSION_TIMESTAMP
Hard Delete operation - allowing records to bypass the recycle bin and immediately become available for deletion
Error output file updates
Support for Reverse ETL destination added
Support for Rest API source. Supports using HubSpot connection for authentication in Rest API source component.
SDK upgrade to 13.0.15.1
Added new fields
Added new dimensions
Support empty rows
Update to AdsV13
New fields on Campaign report
New regions added:
North America
Snowflake GCP - US East (N. Virginia)
Snowflake Azure - South Central US (Texas)
Europe
Snowflake AWS - EU West (Paris)
Snowflake Azure - UK South (London)
Asia Pacific
Snowflake AWS - Asia Pacific (Osaka)
Snowflake Azure - Central India (Pune)
Middle East
Snowflake Azure - UAE North (Dubai)
South America
Snowflake AWS - South America (Sao Paulo)
Klaviyo (a.klaviyo.com) (March)
Released pipeline event optimization
Google sign-in updates
BigQuery support on Data Observability
Secure tunnel updates
Existing tunnels now shows all tunnels (inactive ones included), with active ones at the top of the list
Test tunnel connection button
Link to documentation on secure tunnels
Download bash script from tunnel creation
Select date range for pipeline usage dynamically in the pipeline details page based on the pipeline creation date
Update observability graph
Added type casting support for avro-based destinations
Support more retry error messages
Added rate-limit error message handling
BigQuery destination
Parse years to forcefully have 4 digits.
Update API version to v13
Fix GTID issue
NetSuite
Update collections on yaml definition
Additional types
httpTap
nodeType support
Salesforce Marketing optimizations
Support catalog S3 fetch and update (For dynamic connectors, reuse of catalog files)
Processing Engine
Fix NullPointerException during staging table deletion
Update for accounts fetching
Google Drive
Source Tokens Autofill when creating from “Sources” tab.
Logging feature (Phase 1)
Show pipeline events on pipeline details page (up to 1000 events)
Highlight errored events
Hide full errors on errored pipeline stopped events
Change Salesforce types to align with other snowflake converters
Adobe Analytics
Add source attributes
Create mappings for data types.
Redshift schema and avro converter update
Redshift add support for scale and precision
Salesforce update for column type changed error
Add support for scale and precision for floating value types
Updated catalog replication method
Update catalog.json for Full Sync tables
Update bufferTime to 7 days
Upgrade to v14.0 (singer-io repository)
Add sync_frequency recommendations
UTF16 support
Google Search
Add bufferTime / custom PK feature support
YouTube Analytics
Updated yaml to fetch video details
TikTok Ads
Shopify
Upgrade to v2022-07
Add rate-limit handling
Fix referenced schemas
LinkedIn Ads
Frontend framework migration
Heroku add-on SSO updates
SAML SSO update for an issue during the callback phase of authentication
System upgrade (Ruby version)
TikTok Ads
Launch as Reverse ETL destination
Staging table deletion on job failure
Added destination parallelism (both Bulk and SOAP API).
Added a new field, Thread count, to apply parallelism when loading data to the destination and reduce job runtime.
This is especially helpful for customers running jobs with a high number of records in small batches on a production cluster with a high number of nodes.
Fix discrepancy on error file output
LinkedIn creation/reconnect update
Younium
Auth2AuthProvider - Automatic renew credentials function
Data Observability
Updated alert sync frequency
Updated alert validation issues
Added date range filter
Updated date format from MON-YY to MON-YYYY (Pipeline and Observability)
Clamp precision and scale to 38 and 37 in Redshift
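The clamping rule can be sketched as a small Python helper. The function name is hypothetical, and also capping the scale at the clamped precision is an assumption added here because Redshift requires the scale to be no greater than the precision.

```python
from typing import Tuple

MAX_PRECISION = 38
MAX_SCALE = 37


def clamp_decimal(precision: int, scale: int) -> Tuple[int, int]:
    clamped_precision = min(max(precision, 1), MAX_PRECISION)
    # Scale is limited to 37 and (by assumption) to the clamped precision.
    clamped_scale = min(max(scale, 0), MAX_SCALE, clamped_precision)
    return clamped_precision, clamped_scale
```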
Add support for null insert on array columns
Metafields collection data mismatch
Sendgrid
Updated YAML definitions