Use the Cube and Rollup component to group the input dataset by combinations of fields and apply aggregate functions such as Count, Average, Min, and Max.
Cube produces function results for every combination of the selected expressions. Rollup limits the results to the hierarchical combinations of the expressions, following the order in which they are selected.
For example, you may want to calculate total profit in a chain of stores by time period, region and department. Using rollup with fields in that order will yield results for:
- Subtotals for each combination of time period, region and department
- Subtotals for each time period and region
- Subtotals for each time period
- Grand total
Note that rollup returns results only for combinations that follow the hierarchy of fields, so every subtotal starts with time period. There are no results for time period and department without region, or for region or department without time period. Use cube instead of rollup if results for every possible combination are desired.
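The difference between the two grouping modes can be sketched by listing the field combinations each one covers. This is a minimal illustration in Python, not the component's implementation; the function names and the field names `time_period`, `region` and `department` are hypothetical.

```python
from itertools import combinations


def rollup_groupings(fields):
    """Hierarchical prefixes of the field list, longest first, ending with the grand total ()."""
    return [tuple(fields[:n]) for n in range(len(fields), -1, -1)]


def cube_groupings(fields):
    """Every subset of the field list, largest subsets first, ending with the grand total ()."""
    return [combo
            for n in range(len(fields), -1, -1)
            for combo in combinations(fields, n)]


# Hypothetical field names taken from the store-chain example above.
fields = ["time_period", "region", "department"]
rollup = rollup_groupings(fields)
cube = cube_groupings(fields)

for grouping in cube:
    marker = "rollup and cube" if grouping in rollup else "cube only"
    print(grouping, "->", marker)
```

With three fields, rollup covers four groupings and cube covers eight; a pair such as time period and department appears only in the cube list.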
To aggregate records with cube or rollup:
- Add a Cube and Rollup component where required in your package.
- Open the component and name it.
- Under grouping combinations, select cube or rollup grouping and then select the fields on which to perform the summary functions. The results will be returned for combinations of the selected fields.
- Add another cube or rollup aggregation if required.
- Under aggregate functions, select the aggregate function you want to apply as follows:
  - Count - returns the number of non-null values in the field you specify in the field column, according to the groupings.
  - Count Distinct - returns the number of unique values in the field you specify in the field column, according to the groupings.
  - Count All - returns the number of records, according to the groupings.
  - HyperLogLog++ - uses the HyperLogLog++ algorithm to return a cardinality estimate (an approximate number of distinct values) for the field you specify, according to the groupings. The return value data type is long.
  - Average - returns the average for numeric fields you specify in the field column, according to the groupings.
  - Sum - returns the sum for numeric fields you specify in the field column, according to the groupings.
  - Min - returns the minimum value for the field you specify in the field column, according to the groupings.
  - Min By - finds the record with the minimum value in the field you specify in the field column, according to the groupings, and returns that record's value for the projected field.
  - Max - returns the maximum value for the field you specify in the field column, according to the groupings.
  - Max By - finds the record with the maximum value in the field you specify in the field column, according to the groupings, and returns that record's value for the projected field.
  - VAR - returns the statistical variance for all values in the field you specify in the field column, according to the groupings.
  - VARP - returns the statistical variance for the population of all values in the field you specify in the field column, according to the groupings.
  - STDEV - returns the statistical standard deviation for all values in the field you specify in the field column, according to the groupings.
  - STDEVP - returns the statistical standard deviation for the population of all values in the field you specify in the field column, according to the groupings.
- You can use functions in the field column to manipulate field data (see Using functions in components).
- Type an alias for the field that contains the resulting values for the function.
- Add another function if required.
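To make the aggregate functions more concrete, the following Python sketch computes several of them for the records of a single group. It is an illustration only, and it rests on some assumptions that the text does not confirm: that Count Distinct, Sum and Average skip null values as Count does, and that VAR and STDEV are the sample statistics while VARP and STDEVP are the population statistics. The record layout and field names are hypothetical.

```python
import statistics

# Hypothetical records belonging to one group; None stands for a null value.
records = [
    {"department": "toys", "profit": 10.0},
    {"department": "garden", "profit": 30.0},
    {"department": "toys", "profit": None},
    {"department": "books", "profit": 20.0},
]

non_null = [r for r in records if r["profit"] is not None]
values = [r["profit"] for r in non_null]

count = len(values)                # Count: non-null values only
count_all = len(records)           # Count All: every record in the group
count_distinct = len(set(values))  # Count Distinct (assumed to skip nulls)
total = sum(values)                # Sum (assumed to skip nulls)
average = total / count            # Average (assumed to skip nulls)
minimum = min(values)              # Min
maximum = max(values)              # Max

# Min By / Max By: locate the record with the smallest or largest profit,
# then return a different (projected) field from that record.
min_by = min(non_null, key=lambda r: r["profit"])["department"]
max_by = max(non_null, key=lambda r: r["profit"])["department"]

# Assumed mapping: VAR/STDEV = sample statistics, VARP/STDEVP = population statistics.
var = statistics.variance(values)
varp = statistics.pvariance(values)
stdev = statistics.stdev(values)
stdevp = statistics.pstdev(values)

print(count, count_all, count_distinct, total, average, minimum, maximum)
print(min_by, max_by, var, varp, stdev, stdevp)
```

Here Count and Count All differ because one record has a null profit, and Min By and Max By return a department name, not a profit value.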
Impact of null values
Null values are used in columns to represent subtotals in cube and rollup operations. To distinguish these from legitimate null values that already exist in the records, any null values in the records are converted to the value "unknown" before the cube or rollup operation is performed.
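The reason for the conversion can be seen in a small Python sketch of a rollup sum, where `None` plays the role of null. This is an assumption-laden illustration, not the component's code: the function `rollup_sum`, the field names and the exact placeholder string are hypothetical.

```python
from collections import defaultdict

# Placeholder for genuine nulls; the exact literal used by the component is an assumption.
UNKNOWN = "unknown"


def rollup_sum(records, fields, value_field):
    """Sum value_field for each hierarchical prefix of fields.

    In the result keys, None marks a column that has been rolled up into a subtotal.
    """
    totals = defaultdict(float)
    for record in records:
        # Replace genuine nulls first so they cannot be mistaken for subtotal markers.
        keys = [UNKNOWN if record[f] is None else record[f] for f in fields]
        for depth in range(len(fields), -1, -1):
            group = tuple(keys[:depth] + [None] * (len(fields) - depth))
            totals[group] += record[value_field]
    return dict(totals)


# Hypothetical input: the second record has a genuine null department.
records = [
    {"region": "east", "department": "toys", "profit": 10.0},
    {"region": "east", "department": None, "profit": 5.0},
    {"region": "west", "department": "toys", "profit": 7.0},
]

result = rollup_sum(records, ["region", "department"], "profit")
for group, total in result.items():
    print(group, total)
```

Without the replacement, the record with a null department and the subtotal for the east region would share the key `("east", None)` and be indistinguishable in the output.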