Snowflake, the Elastic Data Warehouse in the Cloud, has several exciting features such as Virtual Warehouse, Time Travel, Zero Copy cloning, Secure Views, and Undrop. Through these offerings, Snowflake has re-imagined and re-invented what a data warehouse can do. In this article, we will explore two of these - Time Travel & Zero Copy Cloning - from within Integrate.io.
For more information on Integrate.io's native Snowflake connector, visit our Integration page.
Table of Contents
Time Travel enables accessing historical data at any point within a defined period. By default, the maximum retention period is 1 day (i.e. one 24 hour period), as mentioned in the Snowflake documentation on Understanding & Using Time Travel. With Snowflake Enterprise Edition (and higher), this default can be set to any value up to 90 days.
For a quick demonstration, here is a sample daily recurring data pipeline loading one week’s data incrementally from a Google Analytics connection onto Snowflake.
The table GA_USERS in Snowflake shows the current data (T2 from the image below):
Let’s say at T2 (present-day)we need to Time-Travel to the table at T0 (yesterday). This can simply be done with a single query.
From Integrate.io, lets source data from the Snowflake table (Snowflake source is accessible via the DB source component) on a separate pipeline with the query
select * from GA_USERS at(timestamp => '2020-09-21 18:18:59.769 -0700'::timestamp)
This gives the records with date ending at 20200921.
Another option when using the Time Travel feature is the ‘offset’.
To Time-Travel from T2 to T1(say 60 min prior before T2), we can time offset from the present to source the data. Similar to the pipeline mentioned above, let’s source data from Snowflake (accessible via the DB source component) and run the query:
select * from GA_USERS at(offset => -60*60);
This sources the data from the table 60 mins ago (which had 3 records deleted) and shows the preview as below.
As stated above, the standard retention period is 1 day and is automatically enabled for all Snowflake accounts. The DATA_RETENTION_TIME_IN_DAYS is an Object parameter that can be used by users with the ACCOUNTADMIN role to set or control the default retention for the Snowflake account. More information on this is available here.
The Time Travel feature can be a powerful tool for:
- Picking up and processing from the desired state on the table.
- Processing of historical data to make up for any accidental inclusions or deletions on the data.
Zero Copy Cloning
Dubbed as one of the most powerful Snowflake features, Zero Copy Cloning offers an easy way to clone and test the database without physically copying the data. For more information, refer to the Snowflake documentation.
From Integrate.io, furthering our data processing from before, we can create a clone of the Prod Database.
Using CREATE DATABASE DEV CLONE PROD; as a Pre-action SQL command creates a database clone DEV with the table GA_USERS;
(Note: This requires SYSADMIN privileges to create a Database)
Or we can clone a table with CREATE OR REPLACE TABLE GA_USERS_clone CLONE GA_USERS;
This would be a great way to take a snapshot before doing a truncate and using this to compare incremental loading.
Taking the process a step further, we could combine Time-Travel + Clone on the Integrate.io Snowflake destination to capture a historical snapshot of the table on Test DB (Post-action SQL) while creating a clone on Dev (Pre-action SQL).
This creates multiple copies of the tables without replicating the data itself, while also providing the ability to quickly make the data available to use for multiple user groups, without the additional cost and time spent in actually replicating the data.
Zero copy cloning and Time Travel are truly remarkable features that simplify data warehousing and keep it agile. These features and more can be leveraged from within Integrate.io. To explore more of Snowflake’s native integration with Integrate.io, contact us to book your demo today.