Integrate.io ETL can access data residing on any Hadoop Distributed File System (HDFS). This article explains how to create an HDFS connection in Integrate.io ETL.
You must provide Integrate.io ETL access to the cluster's HDFS. Please consult our support team if the HDFS is behind a firewall.
To create a Hadoop Distributed File System (HDFS) connection in Integrate.io ETL:
- Click the Connections icon (lightning bolt) on the top left menu.
- Click New connection and choose Hadoop Distributed File System (HDFS).
- In the new HDFS connection window, name the connection and enter the connection information:
- User Name - the user name to use when connecting to HDFS (Kerberos authorization is not currently supported).
- NameNode Hostname - the host name of the NameNode server or the logical name of the NameNode in a high availability configuration.
- NameNode Port - the TCP port of the NameNode. Leave empty if the NameNode is in a high availability configuration.
- HttpFS Hostname - the host name of the Hadoop HttpFS gateway node. This host must be reachable from Integrate.io ETL's platform.
- HttpFS Port - the TCP port of the Hadoop HttpFS gateway node (default: 14000).
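Before saving the connection, it can help to confirm that the HttpFS gateway answers with the same values you enter in the form. HttpFS serves the standard WebHDFS REST API, so a simple `LISTSTATUS` request against the gateway is a quick reachability check. The sketch below is illustrative, not part of Integrate.io ETL; the hostname, port, and user name are placeholder assumptions to replace with your own.

```python
from urllib.parse import urlencode

def httpfs_url(host, port, user, path="/"):
    """Build a WebHDFS LISTSTATUS URL served by the HttpFS gateway.

    HttpFS exposes the WebHDFS REST API under /webhdfs/v1, and the
    user.name query parameter carries the HDFS user (no Kerberos).
    """
    query = urlencode({"op": "LISTSTATUS", "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

# Placeholder values -- substitute your gateway host, port, and user:
url = httpfs_url("gateway.example.com", 14000, "hadoop")
print(url)

# To run the live check, fetch the URL, e.g.:
#   from urllib.request import urlopen
#   print(urlopen(url).read())
# A JSON FileStatuses response means the gateway is reachable and the
# user can list the HDFS root.
```

If the request times out, the gateway is likely not reachable from outside your network, which is the firewall case mentioned above.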