While the Integrate.io ETL platform works primarily with UTF-8 encoded data, data in other character encodings can be processed with the steps shown in this example, which reads UTF-16LE, tab-delimited data:
- Read the data in raw format, using the binary data type.
- Convert the byte array, which is in the source encoding, to a string using `ByteArrayToString(body, 'UTF-16LE')`.
- Split the string from step 2 using `STRSPLITTOBAG(body, '\n')`, then apply `Flatten()` to get individual records (lines).
- Remove header lines where applicable (for example, if the data comes from an API) with the Filter transformation. Its text matches (regex) option can be useful here.
- Split each line on the relevant delimiter using `CSVSPLIT(line, '\t')`.
- Extract the required fields from the resulting tuple as `line.$0`, `line.$1`, `line.$2`, and so on.
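To make the order of operations concrete, here is a minimal Python sketch of the same pipeline outside Integrate.io. It is an illustration, not the platform's implementation: the function `parse_utf16le_tsv`, its `header_pattern` default `^id\t`, and the sample column names are hypothetical. Python's `bytes.decode`, `str.split`, `re`, and `csv` stand in for `ByteArrayToString`, `STRSPLITTOBAG`/`Flatten()`, the Filter transformation, and `CSVSPLIT` respectively. The sketch also assumes the input may carry a byte order mark and Windows line endings.

```python
import csv
import re


def parse_utf16le_tsv(raw: bytes, header_pattern: str = r"^id\t") -> list[tuple[str, ...]]:
    """Decode UTF-16LE bytes and return tab-split records, skipping header lines.

    header_pattern is a hypothetical regex for the header row; adjust it to your data.
    """
    # Step 2: decode the raw bytes into a string (like ByteArrayToString(body, 'UTF-16LE')).
    # Assumption: a leading byte order mark, if present, should be dropped.
    text = raw.decode("utf-16-le").lstrip("\ufeff")

    header_re = re.compile(header_pattern)
    records = []
    # Step 3: split into one string per line (like STRSPLITTOBAG(body, '\n') + Flatten()).
    for line in text.split("\n"):
        # Assumption: strip a trailing carriage return left by Windows line endings.
        line = line.rstrip("\r")
        if not line:
            continue  # skip blank lines, e.g. after the final newline
        # Step 4: drop header lines that match the regex (like a Filter with text matches).
        if header_re.match(line):
            continue
        # Step 5: split on the tab delimiter, honouring quotes (similar to CSVSPLIT(line, '\t')).
        fields = next(csv.reader([line], delimiter="\t"))
        records.append(tuple(fields))
    return records


if __name__ == "__main__":
    sample = "id\tname\tcity\n1\tZoë\tMünchen\n2\tAnna\tOslo\n".encode("utf-16-le")
    for row in parse_utf16le_tsv(sample):
        # Step 6: pick fields by position, as with line.$0, line.$1, line.$2.
        print(row[0], row[1], row[2])
```

Decoding before splitting matters: in UTF-16LE a newline is the two bytes `0x0A 0x00`, so splitting the raw bytes on a single `\n` byte would leave a stray zero byte at the start of each following chunk and break the two-byte alignment of every character after it.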