
Integrate.io & Security

The Complete Guide to Data Security by Integrate.io

Oscar sage
Chapter 5

Working with a Trusted ETL Partner

Extract, Transform, Load (ETL) is a core process that allows you to store data in a secure repository. The process goes like this:

  • Extract: Obtain data from live production systems, such as CMS, ERP, eCommerce and marketing automation platforms.
  • Transform: Integrate and transform the raw data so it’s suitable for storage in a data repository.
  • Load: Send the transformed data to a secure storage location, such as a data warehouse.
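The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular ETL product works: the CSV feed, the `orders` table, and the function names `extract`, `transform`, `load` and `run_pipeline` are all assumptions made for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a production system (for example, an eCommerce order feed).
RAW_ORDERS_CSV = """order_id,customer_email,amount
1001, Alice@Example.com ,19.99
1002,bob@example.com,5.00
"""


def extract(raw_csv):
    """Extract: read rows from the source system's export."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows):
    """Transform: convert types and normalize values so they suit the repository."""
    return [
        (
            int(row["order_id"]),
            row["customer_email"].strip().lower(),
            float(row["amount"]),
        )
        for row in rows
    ]


def load(records, conn):
    """Load: write the transformed records to the storage location."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer_email TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw_csv, conn):
    """Run extract, transform and load in order; return the number of rows loaded."""
    records = transform(extract(raw_csv))
    load(records, conn)
    return len(records)
```

Calling `run_pipeline(RAW_ORDERS_CSV, sqlite3.connect(":memory:"))` loads both sample rows. In a real deployment, each step would talk to a separate system, which is where the data changes state.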

During this process, the data passes through each of the three data states: at rest, in use, and in transit.

Data in transit can be vulnerable, especially when it is moving outside of the on-premise data environment, so it must be encrypted. As most organizations now rely on cloud-based warehouses, this kind of data movement is an inevitable fact of life.
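As a rough illustration of encrypting data in transit, the sketch below uses Python's standard `ssl` module to build a client-side TLS context that verifies the server certificate and refuses protocol versions older than TLS 1.2. The helper names `make_tls_context` and `is_encrypted_endpoint`, the `ENCRYPTED_SCHEMES` set, and the choice of TLS 1.2 as the floor are assumptions for this example, not requirements stated in this guide.

```python
import ssl
from urllib.parse import urlparse

# Assumption for this sketch: URL schemes treated as encrypted transports.
ENCRYPTED_SCHEMES = {"https", "sftp"}


def make_tls_context():
    """Build a client TLS context with certificate and hostname checks enabled."""
    # create_default_context() turns on certificate verification and hostname checking.
    context = ssl.create_default_context()
    # Refuse connections that negotiate anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def is_encrypted_endpoint(url):
    """Return True if the endpoint URL uses one of the encrypted schemes above."""
    return urlparse(url).scheme.lower() in ENCRYPTED_SCHEMES
```

A pipeline could call `is_encrypted_endpoint` on every configured destination before sending anything, and pass the context to whichever client library opens the connection.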

The main approaches to ETL are to build your own solution, install an off-the-shelf ETL tool locally, or use a cloud ETL service.

Pipeline: In-house development
  • What is it? Internal dev team creates a bespoke ETL for your specific needs
  • Pros: Full control and transparency of all aspects of ETL
  • Cons: You need an in-house team to build the solution and provide ongoing support

Pipeline: Local ETL install
  • What is it? Purchase an ETL solution and install it on your on-premise infrastructure
  • Pros: Control over configuration without needing to develop the software from scratch
  • Cons: Difficult to scale up and may not integrate securely with cloud-based warehouses

Pipeline: Cloud ETL
  • What is it? A third-party service manages your ETL needs across the cloud
  • Pros: Simple, no-code integration with cloud and on-premise services, with a trusted partner guaranteeing security
  • Cons: Works best with other cloud services, such as AWS, Salesforce, and cloud-based analytics tools

How do Cloud ETL providers guarantee data security?

Cloud-based ETL provides your data with a single point of egress from the network. Rather than having multiple pipelines connecting each production database to a repository, each production system has a secure connection to the ETL service. The ETL then has a separate connection to the data repository.
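One way to picture the difference is to list the connections in each layout and see which systems reach the repository directly. The sketch below is purely illustrative: the system names, the `etl-service` hub label, and the helper functions are hypothetical.

```python
def direct_links(sources, repository):
    """Direct pipelines: every production system connects straight to the repository."""
    return [(source, repository) for source in sources]


def hub_links(sources, repository, hub="etl-service"):
    """Cloud ETL layout: each source connects to the ETL service, which connects onward."""
    return [(source, hub) for source in sources] + [(hub, repository)]


def writers_to(links, target):
    """Return the set of systems that hold a direct connection into the target."""
    return {src for src, dst in links if dst == target}


sources = ["cms", "erp", "ecommerce"]
direct = writers_to(direct_links(sources, "warehouse"), "warehouse")
via_etl = writers_to(hub_links(sources, "warehouse"), "warehouse")
```

Here `direct` contains all three production systems, while `via_etl` contains only the ETL service, so there is a single connection into the repository to secure and monitor.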

Data makes a pit stop on the ETL servers, where it passes through the transformation layer. With in-house or local solutions, this stage can be vulnerable.

However, a good cloud ETL provider such as Integrate.io takes significant steps to protect data security. These include:

  • Security-first development process: Any reputable ETL vendor will start with security protocols before they even begin developing the service. As a user, you can identify trustworthy products by examining their security details and seeing if the provider has baked security into their product.
  • Physical security: The physical location of your data in transit is a major factor in security. Reputable vendors will guarantee that all servers are safe. Integrate.io runs on AWS infrastructure, which is hosted in Amazon’s SOC-compliant data centers.
  • Working with reputable vendors: ETL is an interactive product by nature, with automatic integration to other services. ETL vendors must carefully vet and monitor all of their partners to ensure that their customers are not exposed to risk. The ETL vendor also stays on top of changing API requirements to ensure that integrations always meet current requirements.
  • Plans for disaster recovery: ETL vendors provide a critical connection between live systems and repositories. Real-time analytics services are dependent on an uninterrupted flow of data between these points. The ETL vendor should have a robust plan for maintaining service in any circumstances.
  • SOC 2 compliance: Routine testing is a requirement of this security standard. Integrate.io, for example, undergoes third-party penetration testing each year. When choosing an ETL vendor, ask to see their SOC 2 report and penetration test results before signing up.