Using components: Window Transformation

Use the Window component to apply window functions to incoming data, similar to window functions in SQL. These functions let you rank or distribute data, provide moving averages, running totals and other useful data. The output of the Window component contains all records and fields from the input data flow with the addition of the calculated window functions.

Data partitioning

Select Treat entire input as a single partition to treat the entire dataset as a single window or Partition input data by fields to determine the partitioning of the incoming dataset to windows. Each partition contains of all the records that have the same values in the partitioning fields.

  1. Select a field in the dropdown list.
  2. Click the + button to add another partitioning field.
  3. Click the X button to remove a field from the partition by list.

Data order

Select Don’t sort input data if you do not want to sort input data or Sort partition by fields to add fields to determine the sorting of the data within a window. This is optional but recommended for many windowing functions (what would "first" mean without order?)

  1. Select a field in the dropdown list.
  2. Specify sort order - ascending or descending. 
  3. Click the + button to add another sorting field.
  4. Click the X button to remove a field from the sort by list.

Window functions
Select the functions to apply to the window and fill in their arguments.

  1. Select function from the drop down (see list below)
  2. Fill in the field and arguments.
  3. Type an alias for the field containing the resulting value.
  4. Click the + button to add another function.
  5. Click the X button to remove a function. 

Functions list

  • Countreturns the number of non-null values in a field within a window range (see range arguments below).
  • Sum - returns the sum of values in a numeric field within a window range. For example, use frame between -∞ and 0 for a running total.
  • Average- returns the average of values in a numeric field within a window range. For example, use frame between -2 and 2 for a moving average on 5 values.
  • Min - returns the minimum value in a field within a window range.
  • Max - returns the maximum value in a field within a window range.
  • First Value - returns the first value in a field within a window range. Make sure the window is sorted.
  • Last Valuereturns the last value in a field within a window range. Make sure the window is sorted.
  • Lead - returns the value from a field in a subsequent record within the window range. The location of the subsequent record (relative to the current record) is defined by the offset argument. The default argument is returned if the lead record can't be found (for example, when the current record is at the end of the window range). Example: use offset of 1 to get the value from a field in the next record.
  • Lag - returns the value from a field in a preceding record within the window range. The location of the preceding record (relative to the current record) is defined by the offset argument. The default argument is returned if the lag record can't be found (for example when the current record is at the beginning of the window range). Example: use offset of 1 to get the value from a field in the previous record.
  • Row number - returns a sequential number for each record within a window range according to the window's ordering.
  • NTILE - distributes the records within a window range into a specified number of buckets set by the argument number of ntiles. For example, NTILE can be used on data partitioned by class and sorted by grades to divide each class to 4 groups of students according to their grades.

Range arguments
Per function, the range arguments further limit the records within the window. The range is relative to the current record and is defined by the number of records preceding the current record and the number of records following the current record. The default values are -∞ ("unbounded preceding" in SQL lingo) and +∞ ("unbounded following"). 0 stands for current record. 
Other examples:

  • "-2" in range before means 2 records preceding the current record are the lower boundary for the function.
  • "5" in range after means 5 records after the current record are the upper boundary for the function.