How Do I Work with Datetime Data

Data almost always contains temporal details like data creation timestamps, birthdays, invoice dates and the like. Although there's a standard for timestamp representation (ISO 8601), it still comes in many different formats.

In ETL, date and time can be represented in either:

  • A DateTime field
  • A string field in ISO 8601 format, or in a custom string format
  • A numeric field representing Unix time

Reading Temporal Data into DateTime Fields

Based on the source, ETL can automatically read temporal data into DateTime fields as follows:

  • Database, Amazon Redshift and MongoDB source - their respective data type for date and time (e.g. timestamp, DateTime, etc.) 
  • Files (flat files or JSON) storage source - strings that match the ISO 8601 patterns.
  • Rest API source - ISO 8601.

If the temporal data is in a string format (other than ISO 8601) or is a Unix time,  use the function ToDate to convert it into a DateTime field, or ParseDate to try and parse a string using multiple custom formats (or get a null DateTime in case none of the formats match the input string).

To use literal values, it's best to apply the ToDate function to an ISO 8601 string value. For example: ToDate('2013-10-17T23:59:54.432Z')

Applying Functions to Temporal Data

  You can do a lot with temporal data! Some common usage of DateTime functions:

  • Temporal data in string format can be cast to DateTime data type using the functions ToDate, ParseDate, and ToDateCustomFormats.
  • Datetime expressions can be cast to custom string formats using the function ToString.
  • Datetime expressions can be cast to Unix time using the functions ToUnixTime or ToMilliSeconds.
  • Use AddDuration or SubtractDuration to add or subtract a duration to/from DateTime fields.
  • To compare DateTime expressions (in a filter component or in a CASE WHEN... expression), use *Between (SecondsBetween, MinutesBetween, DaysBetween, etc.) functions value. For example: DaysBetween(field1,field2)>2 passes the records for which field2 is greater than field1 by 2 days or more.
  • Derive a portion of a DateTime format field to an integer field using Get* (GetDay, GetMonth, GetYear, etc.) functions.

Working with Time Zones

Datetime fields in ETL contain time zone information. If your temporal data doesn't contain timezone ("wall-clock" time), it is assigned the default timezone when placed in a DateTime field as defined by the system variable _DEFAULT_TIMEZONE. The variable is set by default to the UTC timezone and can be changed as described in setting variables. You can also assign any timezone to a "wall-clock" time using the ToDate function with the timezone argument. The time zone for DateTime fields can also be switched to another time zone using the function SwitchTimeZone.

Writing Temporal Data Into Destinations

A DateTime field can be mapped directly to these destination fields:

  • Database columns as listed in the Database Destination table
  • Amazon Redshift date or timestamp column
  • MongoDB

Note that if the destination column type in a database table does not include all DateTime components, only a portion of the DateTime will be stored in the destination. For example, when mapping ToDate('2013-10-17T23:59:54.432Z') to a date field, only the date portion will be mapped.

The DateTime field can also be stored in its internal format in Cloud Destination flat file or JSON destination types.