While the Integrate.io ETL platform works primarily with UTF-8 encoded data, other character encodings can be processed with the steps shown in this example:
![thumbnail image]()
This dataflow uses the functions detailed in the following steps. Replace the encoding and the field delimiter with the ones specific to your use case.
1. Read the data as raw, with the binary data type.
2. Convert the byte array in the given encoding to a string using ByteArrayToString(body, 'UTF-16LE').
3. Split the string from step 2 into lines using STRSPLITTOBAG(body, '\n'), then apply Flatten() to get individual records.
4. Remove header lines where applicable (for example, if the data comes from an API) with the Filter transformation. The "text matches (regex)" option can be useful here.
5. Split each line on the relevant delimiter using CSVSPLIT(line, '\t').
6. Extract the required fields from the resulting tuple as line.$0, line.$1, line.$2, and so on.
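
To make the logic of these steps easier to follow outside the platform, here is a minimal Python sketch of the same sequence: decode the bytes, split into lines, drop headers, split on the delimiter, and read fields by position. It is an analogy only and not how Integrate.io implements these functions. The function name `parse_encoded_lines`, its parameters, and the sample data are hypothetical, and dropping empty lines is an assumption added for the sketch.

```python
import csv
import re
from typing import List, Optional


def parse_encoded_lines(
    raw: bytes,
    encoding: str = "utf-16-le",
    delimiter: str = "\t",
    header_regex: Optional[str] = None,
) -> List[List[str]]:
    # Decode the raw bytes with the source encoding
    # (roughly what ByteArrayToString does in the dataflow).
    text = raw.decode(encoding)

    # Split on newlines to get one record per line
    # (roughly STRSPLITTOBAG followed by Flatten).
    # Assumption: carriage returns are stripped and empty lines are dropped.
    lines = [line.rstrip("\r") for line in text.split("\n")]
    lines = [line for line in lines if line]

    # Drop lines that match the header pattern, if one is given
    # (roughly the Filter transformation with a regex match).
    if header_regex is not None:
        pattern = re.compile(header_regex)
        lines = [line for line in lines if not pattern.match(line)]

    # Split each line on the delimiter, honoring CSV quoting
    # (roughly CSVSPLIT).
    return [next(csv.reader([line], delimiter=delimiter)) for line in lines]


# Hypothetical sample: tab-delimited UTF-16LE data with one header line.
sample = "id\tname\tcity\n1\tAda\tLondon\n2\tGrace\tNew York\n".encode("utf-16-le")
records = parse_encoded_lines(sample, header_regex=r"^id\t")

for record in records:
    # Positional access mirrors line.$0, line.$1, line.$2.
    print(record[0], record[1], record[2])
```

To adapt the sketch to another source, change `encoding` and `delimiter` in the same way you would change the arguments to ByteArrayToString and CSVSPLIT in the dataflow.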