How Do I Process a Different Encoding?

While the ETL platform works primarily with UTF-8 encoded data, other character encodings can be processed with the steps shown in this example:

The dataflow uses the functions detailed in the following steps. Replace the encoding and field delimiters with the ones specific to your use case.

  1. Read the data as raw, using the binary data type.
  2. Convert the byte array data in the given encoding to a string type using

    ByteArrayToString(body, 'UTF-16LE').

  3. Split the data from step 2 using STRSPLITTOBAG(body,'\n'), then apply Flatten() to get individual records or lines.

  4. Remove headers as applicable (if the data is from an API) with the filter transformation. The Text matches(regex) option can be useful here.

  5. Split individual lines on the relevant delimiter using CSVSPLIT(line, '\t').

  6. Extract the required fields from the tuple as line.$0, line.$1, line.$2, and so on.
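
To check the logic outside the platform, the steps above can be approximated in plain Python. This is only a sketch under stated assumptions, not the platform's implementation: the function name parse_encoded_bytes, the header_regex parameter, and the sample data are hypothetical, and the input is built in memory rather than read from a raw binary source as in step 1.

```python
import csv
import re


def parse_encoded_bytes(body, encoding="utf-16-le", delimiter="\t",
                        header_regex=r"^id\t"):
    """Hypothetical helper mirroring steps 2-6 of the dataflow."""
    # Step 2: decode the raw bytes into a string using the source encoding.
    text = body.decode(encoding)

    # Step 3: split into individual lines; drop carriage returns and blanks.
    lines = [ln.rstrip("\r") for ln in text.split("\n")]
    lines = [ln for ln in lines if ln]

    # Step 4: filter out header lines that match the (assumed) regex.
    data_lines = [ln for ln in lines if not re.match(header_regex, ln)]

    # Step 5: split each line on the delimiter; csv also handles quoted fields.
    rows = list(csv.reader(data_lines, delimiter=delimiter))

    # Step 6: extract the required fields by position (like line.$0, line.$1, line.$2).
    return [(row[0], row[1], row[2]) for row in rows]


# Sample input: tab-delimited text with a header row, encoded as UTF-16LE.
sample = "id\tname\tcity\n1\tAda\tLondon\n2\tLinus\tHelsinki\n".encode("utf-16-le")
records = parse_encoded_bytes(sample)
print(records)
```

For a different source, change the encoding, delimiter, and header_regex arguments, just as the encoding and delimiter are swapped in the dataflow functions.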