Allowing Integrate.io ETL access to my server behind a firewall

Integrate.io ETL can access your database or file server, allowing you to read and write data easily, regardless of their whereabouts - on the cloud, hosted or on premise. There are three ways to do so:

  1. Directly by whitelisting Integrate.io ETL’s IP

  2. Through an SSH tunnel

  3. Through a Reverse SSH tunnel


Direct Connection

Provide Integrate.io ETL access to your server from Integrate.io ETL's network


SSH Tunnel

SSH Tunnel lets you connect Integrate.io ETL to your database or file server through a bastion host or a tunnel server. This connection type is used if you are unable to add an inbound firewall rule to your data warehouse, or your data warehouse IP address is on an internal network (no outside network access).

Creating Integrate.io ETL Connection

SSH Tunnel connection is supported on database connections and SFTP.

  1. On the dashboard Connections , click New Connection and choose for a connection to use.

  2. Choose SSH tunnel connection as Access type and supply your connection and tunnel host details.
    Then Click Create Connection to generate a unique SSH public key for your connection.
    thumbnail image

  3. Copy the SSH Public key by clicking the copy button and prepare your tunnel host for access.
    Once your tunnel host is prepared, click Test connection and Create connection once the connection is successful.
    thumbnail image


Preparing the Tunnel Host

You will need to prepare your host (either bastion host or tunnel server) by creating an integrate.io ETL user and adding the connection's public key to the integrate-io ETL ~/.ssh/authorized_keys file. Here’s how:
  1. Create group integrate.io ETL

    sudo groupadd integrate-io
  2. Create user integrate.io ETL and its home directory:

    sudo useradd -m -g integrate-io integrate-io
  3. Switch to the integrate.io ETL user

    sudo su - integrate-io
  4. Create the .ssh directory and change permission

    mkdir ~/.ssh && chmod 700 ~/.ssh
  5. Create the authorization_keys file and change permission

    touch ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys

    Using your favorite text editor, add your connections’s public key to the authorized_keys.
    thumbnail image

  6. Allow access to your server's host and port from Integrate.io ETL's IP addresses.

Reverse SSH Tunnel

You can also connect to Integrate.io ETL through reverse SSH Tunnel if you are unable to provide direct port access to your instance.

We recommend that you use autossh which starts an instance of SSH client and monitors it, restarting it as necessary should it die or stop passing traffic. In order to allow Integrate.io ETL to connect to your server through an SSH tunnel, you have to complete the following steps:

Prerequisite

Add a public key in your user settings. The public key will be propagated to all Integrate.io ETL servers in up to 30 minutes.

Creating Integrate.io ETL Connection

  1. On the dashboard Connections , click New Connection and choose for a connection to use.

  2. Choose Reverse SSH tunnel connection as Access type and supply your connection’s hostname, username and port

  3. Click Create connection. 

    thumbnail image
  4. It will fail because we haven't created the tunnel yet, but the light blue box will appear and you will be able to retrieve Integrate.io ETL’s tunnel server (Integrate.io ETL server endpoint that includes everything before the colon) and connection port (the number after the colon).

    thumbnail image

Establishing Reverse SSH Connection

  1. If you're running Windows, see here about opening an SSH tunnel. If you're on Linux, Install autossh on either the target server or a server that has access to it. On Ubuntu/Debian for example, you can install using apt-get:

    sudo apt-get install autossh

    For other Linux distributions, follow the instructions here.

  2. Create directories to keep logs and pid files for the connection. For example:

    mkdir -p ~/MyDB/logs ~/MyDB/run
  3. Add Integrate.io ETL's server public key to a known_hosts file. For example:

    ssh-keyscan -p 50683 <Integrate.io ETL server> >> ~/MyDB/known_hosts
  4. You can test the tunnel using SSH. Use the following syntax and insert your information at the placeholders:

    ssh -NR <connection port>:<my server>:<local port> sshtunnel@<Integrate.io ETL server> -g -i <private key file> -p 50683 -o "ExitOnForwardFailure yes" -o ServerAliveInterval=10 -o ServerAliveCountMax=1 -N -v
  5. Run autossh. Use the following syntax and insert your information at the placeholders:

    AUTOSSH_LOGFILE=~/MyDB/logs/tunnel.log AUTOSSH_PIDFILE=~/MyDB/run/autossh.pid autossh -M 0 -f -N -R <connection port>:<my server>:<local port> sshtunnel@<Integrate.io ETL server> -g -i <path to private key> -p 50683 -o "ExitOnForwardFailure yes" -o ServerAliveInterval=10 -o ServerAliveCountMax=1 -o UserKnownHostsFile=<path to known_hosts file>

    For example, if you open the tunnel to a database that listens to port 5432 on host mydbserver, and the connection's assigned host and port at Integrate.io ETL are tunnel.integrate.io ETL.com and 12345.
    Note that the SSH port in Integrate.io ETL's servers is 50683

    AUTOSSH_LOGFILE=~/MyDB/logs/tunnel.log AUTOSSH_PIDFILE=~/MyDB/run/autossh.pid autossh -M 0 -f -N -R 12345:mydbserver:5432 sshtunnel@tunnel.integrate.io ETL.com -g -i ~/.ssh/id_rsa -p 50683 -o "ExitOnForwardFailure yes" -o ServerAliveInterval=10 -o ServerAliveCountMax=1 -o UserKnownHostsFile=~/MyDB/known_hosts
  6. Add crontab record to run autossh automatically to reconnect after reboots. For example:

    @reboot AUTOSSH_LOGFILE=~/MyDB/logs/tunnel.log AUTOSSH_PIDFILE=~/MyDB/run/autossh.pid autossh -M 0 -f -N -R 12345:mydbserver:5432 sshtunnel@tunnel.integrate.io ETL.com -g -i ~/.ssh/id_rsa -p 50683 -o "ExitOnForwardFailure yes" -o ServerAliveInterval=10 -o ServerAliveCountMax=1 -o UserKnownHostsFile=~/MyDB/known_hosts