While the Integrate.io ETL platform works primarily with UTF-8 encoded data, other character encodings can be processed with the steps shown in this example:
1. Read the data as raw, using the binary data type.
2. Convert the byte array data from the given encoding to a string type.
3. Split the string from step 2 on the newline character, passing (body,'\n') as the arguments, and then apply Flatten() to get individual records or lines.
4. Remove headers as applicable (for example, if the data comes from an API) with the filter transformation. The "Text matches (regex)" option can be useful here.
5. Split each individual line on the relevant delimiter.
6. Extract the required fields from the resulting tuple as line.$0, line.$1, line.$2, and so on.
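
For readers who want to see the same flow outside the platform, the steps above can be sketched in Python. This is only an illustrative analogue, not Integrate.io expression syntax: the function name decode_and_parse, its parameters, and the sample data are all assumptions made for this example.

```python
import re
from typing import List, Optional


def decode_and_parse(
    raw: bytes,
    encoding: str,
    delimiter: str,
    header_pattern: Optional[str] = None,
) -> List[List[str]]:
    """Hypothetical helper mirroring the steps above; not a platform API."""
    # Convert the raw byte array from the given encoding to a string.
    text = raw.decode(encoding)

    # Split on newlines to get individual records, dropping a trailing
    # carriage return and skipping empty lines.
    lines = [line.rstrip("\r") for line in text.split("\n")]
    lines = [line for line in lines if line]

    # Remove header lines that match the regex, if a pattern was given.
    if header_pattern is not None:
        lines = [line for line in lines if not re.match(header_pattern, line)]

    # Split each line on the delimiter; fields are then available by
    # position, comparable to line.$0, line.$1, line.$2 in the steps above.
    return [line.split(delimiter) for line in lines]


# Sample input (assumed): ISO-8859-1 bytes with a header row.
sample = "id;name\n1;José\n2;Zoë\n".encode("iso-8859-1")
rows = decode_and_parse(sample, "iso-8859-1", ";", header_pattern=r"^id;")
for row in rows:
    print(row[0], row[1])
```

Decoding before splitting matters: for multi-byte encodings such as UTF-16, splitting the raw bytes on a newline byte could cut a character in half.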