While the Integrate.io ETL platform works primarily with UTF-8 encoded data, other character encodings can be processed using the steps shown in this example.
This dataflow uses the functions detailed in the following steps. Replace the encoding and field delimiters with the ones specific to your use case:
- Read the data as raw, with the binary data type.
- Convert the byte array data in the given encoding to a string type using ByteArrayToString(body, 'UTF-16LE').
- Split the string from step 2 using STRSPLITTOBAG(body, '\n') and then apply Flatten() to get individual records (lines).
- Remove headers as applicable (if the data comes from an API) with the Filter transformation. The text matches (regex) option can be useful here.
- Split individual lines on the relevant delimiter using CSVSPLIT(line, '\t').
- Extract the required fields from the tuple as line.$0, line.$1, line.$2, and so on.
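
To check what these steps should produce for a given file, the same logic can be sketched outside the platform in plain Python. This is not Integrate.io code: the function name decode_and_split, its parameters, and the sample data are hypothetical, and the UTF-16LE encoding and tab delimiter are assumptions carried over from the example above.

```python
import csv
import re
from typing import List, Optional


def decode_and_split(
    raw: bytes,
    encoding: str = "utf-16-le",   # assumed encoding, as in the example above
    delimiter: str = "\t",         # assumed field delimiter
    header_pattern: Optional[str] = None,
) -> List[List[str]]:
    """Mirror the dataflow: decode bytes, split into lines, drop headers, split fields."""
    # Convert the byte array to a string in the given encoding.
    text = raw.decode(encoding)
    # Split on newlines and keep one non-empty record per line.
    lines = [ln.rstrip("\r") for ln in text.split("\n") if ln.strip()]
    # Optionally remove header lines that match a regular expression.
    if header_pattern is not None:
        lines = [ln for ln in lines if not re.match(header_pattern, ln)]
    # Split each line on the delimiter; fields are then available by position.
    return list(csv.reader(lines, delimiter=delimiter))


# Hypothetical sample: a tab-delimited file with one header row.
sample = "id\tname\n1\tAda\n2\tGrace\n".encode("utf-16-le")
rows = decode_and_split(sample, header_pattern=r"id\t")
first_id, first_name = rows[0][0], rows[0][1]  # comparable to line.$0, line.$1
```

Comparing the rows from a sketch like this against the dataflow output is one way to confirm that the encoding and delimiter chosen in the platform match the source file.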