There are a few standard structured data formats and discussions galore on which of them is more advantageous. Within, users are able to process JSON and XML data formats with ease, and this article shares an example showing the functions that facilitate processing XML on 

Table of Contents:

  1. Overview and Resources
  2. Setting Up the Data Pipeline
  3. Summary

Overview and Resources

For a demonstration, here is the link for the sample XML file we will be processing

The file has the XML structure shown in the image below:

thumbnail image

The functions XPath and XPathToBag are key to the processing of this data. Let's examine these with a data pipeline.
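To make the difference between the two functions concrete, here is a minimal sketch in Python rather than in the platform's own expression language. The sample XML, its field names, and its values are assumptions made for illustration; the real file may differ. Python's standard `xml.etree.ElementTree` supports only a subset of XPath and evaluates paths relative to the root element, so '/catalog/book' is written as './book' here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: field names and values are assumptions,
# not the contents of the file used in this article.
SAMPLE_XML = """
<catalog>
  <book id="bk101">
    <author>Author One</author>
    <title>First Title</title>
    <price>44.95</price>
  </book>
  <book id="bk102">
    <author>Author Two</author>
    <title>Second Title</title>
    <price>5.95</price>
  </book>
</catalog>
""".strip()

# The parsed root is the <catalog> element itself.
root = ET.fromstring(SAMPLE_XML)

# Many matches: a rough analogue of XPathToBag(data, '/catalog/book').
books = root.findall("./book")

# One value from one record: a rough analogue of applying XPath to a single book.
first_title = books[0].findtext("title")

print(len(books), first_title)
```

The first call returns a collection of nodes, and the second returns a single text value, which mirrors the split between the two pipeline steps described below.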

Setting Up the Data Pipeline

thumbnail image

The following list explains the components of the pipeline in order:

1. XML_Source: The XML file from the link shared above is copied to a cloud storage location and read using the File Storage Source component.

2. XPathToBag: This step calls the XPathToBag function to match the XPath '/catalog/book'. This fetches all the books under <catalog> </catalog> as a Bag datatype. For example, XPathToBag(data,'/catalog/book').

3. Flatten_Books: Uses the Flatten() function to turn the bag of books into individual records, each with the following structure:

            thumbnail image

4. XPath: This step uses the XPath function to retrieve the individual elements of the book structure. Here is a peek into the component with the XPath expressions set up for the above <book> </book> structure:

        thumbnail image

For additional reference on XPath and examples, refer to an XPath evaluator such as

5. Destination: The individual fields processed from the XML are stored in a destination; in this example, it is a BigQuery table.

The following image depicts some example records from the output:

thumbnail image
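The pipeline steps above can also be sketched end to end outside the platform. The Python below is an illustrative analogue, not the platform's implementation: the sample XML and its field names are assumptions, the helper names `select_books`, `book_to_row`, and `to_csv` are hypothetical, and an in-memory CSV stands in for the BigQuery destination.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: field names and values are assumptions.
SAMPLE_XML = """
<catalog>
  <book id="bk101">
    <author>Author One</author>
    <title>First Title</title>
    <price>44.95</price>
  </book>
  <book id="bk102">
    <author>Author Two</author>
    <title>Second Title</title>
    <price>5.95</price>
  </book>
</catalog>
""".strip()

FIELDS = ["id", "author", "title", "price"]


def select_books(xml_text):
    """Step 2 analogue: collect every <book> directly under <catalog>."""
    root = ET.fromstring(xml_text)
    if root.tag != "catalog":
        raise ValueError(f"expected <catalog> root, got <{root.tag}>")
    # ElementTree paths are relative to the root element.
    return root.findall("./book")


def book_to_row(book):
    """Step 4 analogue: pull individual fields out of one <book> node."""
    return {
        "id": book.get("id"),
        "author": book.findtext("author"),
        "title": book.findtext("title"),
        "price": float(book.findtext("price")),
    }


def to_csv(rows):
    """Step 5 stand-in: write the rows as CSV text instead of a table."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Step 3 analogue: iterating over the matches yields one record per book.
rows = [book_to_row(book) for book in select_books(SAMPLE_XML)]
print(to_csv(rows))
```

Keeping selection, field extraction, and output in separate functions mirrors the separate pipeline components, so each stage can be checked on its own.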

Summary

Parsing XML from a file or an API response into a tabular structure is key to enabling data lookups, and blending the result with other datasets can facilitate further analysis.


Several enterprise systems consume and output XML data, and because XML is a trusted format for document-based information transfer, XML-based files and APIs come up often as use cases. Stop by and explore the functionality for processing the structured data formats on For more individualized instruction and information, contact us to book a risk-free demo.