Most enterprises are leveraging vast reserves of data to improve their business insights and decision-making. However, as companies manage larger stores of data and move more and more information from operational databases to data warehouses, they face an ever-mounting threat of data breaches.
To combat these threats, most enterprises implement data governance and data management policies that comply with a range of regulations and standards such as GDPR, SOC 2, HIPAA, and CCPA, as well as the company’s internal data governance rules. For most enterprises, a key part of the data governance process involves the encryption or removal of sensitive data before moving information to a data warehouse.
This is where the concept of “ETLG” (Extract, Transform, Load for Data Governance) comes into play. At Integrate.io, we use the term ETLG to refer to the process of performing the minimum-required, lightweight transformations on your data (for data governance purposes) before loading it into the destination data warehouse. If more complex transformations are required, you can wait to perform them in the warehouse itself. In this respect, the ETLG strategy allows you to satisfy data governance requirements while also allowing you to rapidly ingest data without needing to worry about designing and coding complex transformations beforehand.
In this article, we’ll look at the concept of ETLG, and how it helps businesses satisfy their data governance and compliance rules while achieving the speed and flexibility of rapid data ingestion. But first, we’ll take a look at data governance and data compliance in the context of data integration.
Table of Contents
- Overview of Data Governance and Data Management
- How ETLG Satisfies Data Governance Needs While Achieving Rapid Data Ingestion
- Build an ETLG Strategy with Integrate.io
Overview of Data Governance and Data Management
Data governance and data management are two separate concepts that go hand in hand. Let’s take a look at each one separately:
1) Data Governance
Data governance refers to an organization’s rules, policies, and procedures that ensure the safe and correct usage and storage of information. A data governance policy codifies the data-related rules and requirements an organization will follow, in addition to clarifying the organization’s own internal data security standards.
A data governance policy does not implement any security rules, policies, or procedures. It simply codifies them. In doing so, a data governance policy usually answers the following questions:
- Which employees can access and read specific information?
- Which employees can access and edit or change specific information?
- What rules and processes does your organization adhere to when storing data?
- How long will your organization store different types of data?
- What policies and practices will ensure that stored data is secure?
- How will your organization mitigate the risks associated with storing sensitive information?
It’s important to note that the rules of a data governance policy could require masking, encrypting, or removing sensitive data (such as PII and PHI) before passing information to a data warehouse for BI analysis. This is because industry standards and government regulations such as GDPR, SOC 2, HIPAA, and CCPA may require these security-related, pre-load data transformations.
Since this kind of data encryption policy involves pre-load transformations, implementation needs to occur through an ETL (Extract, Transform, Load) process. An ETL process can encrypt/redact sensitive information immediately after extracting it from the source, and before loading it into a destination data warehouse. In contrast, an ELT process cannot satisfy these pre-load transformation requirements. That’s because all ELT transformations occur after loading the data into the data warehouse. For this reason, when a data governance policy requires pre-load transformations to protect PII/PHI information (which is extremely common), organizations may not be able to implement an ELT workflow, even if such a workflow suits their purposes.
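As a minimal sketch of what such a pre-load transformation can look like, the snippet below pseudonymizes a PII column by replacing it with a salted hash before the rows ever reach the warehouse. The sample rows, field names, and salt are all illustrative assumptions, not part of any specific product:

```python
import hashlib

# Illustrative extracted rows; in a real pipeline these would come
# from the operational (source) database.
rows = [
    {"order_id": 1, "email": "alice@example.com", "amount": 42.50},
    {"order_id": 2, "email": "bob@example.com", "amount": 19.99},
]

SALT = b"replace-with-a-secret-salt"  # assumption: a per-pipeline secret

def pseudonymize(value: str) -> str:
    """Replace a PII value with a salted SHA-256 digest, so records can
    still be joined on the column but the clear text cannot be read back."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()

def transform(row: dict) -> dict:
    # Pre-load transformation: the clear-text email never reaches the warehouse.
    masked = dict(row)
    masked["email"] = pseudonymize(masked["email"])
    return masked

safe_rows = [transform(r) for r in rows]
```

Because the digest is deterministic for a given salt, analysts can still count distinct customers or join tables on the masked column, which is why pseudonymization (rather than outright removal) is often the preferred pre-load transformation.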
2) Data Management
Data management refers to the implementation and execution of the data governance rules, policies, and procedures. The data management process might involve the following tasks:
- Setting up role-based access control that enforces who can access, read, or edit specific information types.
- Configuring all databases and data warehouses so they adhere to the data storage rules laid out in the data governance plan.
- Configuring and continually managing systems, so they follow industry rules, government regulations, and your organization’s internal data security standards.
- Policing and monitoring the safety of stored data and identifying and resolving any safety risks.
- Setting up a master data monitoring system, which allows the data management team to view the status of data throughout the organization.
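To make the first task above concrete, a role-based access control check can be as simple as a mapping from roles to the data classes they may read or edit. This is a hypothetical sketch; the role names and data classes are invented for illustration:

```python
# Minimal RBAC sketch: map each role to the data classes it may read or edit.
ROLE_PERMISSIONS = {
    "analyst":  {"read": {"sales", "marketing"}, "edit": set()},
    "engineer": {"read": {"sales", "marketing", "pii"}, "edit": {"sales"}},
}

def can(role: str, action: str, data_class: str) -> bool:
    """Return True only if the role is explicitly granted the action
    on the data class; unknown roles and actions are denied by default."""
    return data_class in ROLE_PERMISSIONS.get(role, {}).get(action, set())
```

Note the deny-by-default behavior: a role or action that does not appear in the policy gets no access, which mirrors how most governance policies are written.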
Ultimately, data management monitors and carries out the above tasks to ensure that the handling of all data – from the moment the data is created to the moment it is destroyed – adheres to the data governance policy. For example, when it comes to moving data from an operational database to a data warehouse for BI purposes, it’s the data management process that configures, implements, and monitors the data integration workflow according to the governance policy.
In adhering to the data governance policy, some data management processes encrypt or pseudonymize PHI/PII data before loading it into the data warehouse, either via (1) an ETL process alone, or (2) a mix of ETL and ELT that performs lightweight, pre-load transformations (ETL) and defers more complex transformations to the data warehouse (ELT).
How ETLG Satisfies Data Governance Needs While Achieving Rapid Data Ingestion
ETLG (Extract, Transform, Load for Data Governance) allows you to reap the advantages of both pre-load ETL transformations and post-load ELT transformations. Essentially, ETLG empowers your data management processes to satisfy the pre-load PII/PHI encryption rules in your data governance policy – yet still ingest data rapidly, allowing you to benefit from the incredible data ingestion speeds and flexible business logic of an ELT approach to data integration.
In practice, the ETLG workflow might look like this:
- Extract: Pull the data from the source and load it into a staging area.
- Pre-Load Transformations for Security: Perform light transformations on the data to remove or encrypt PII/PHI and other confidential information, and perform simple formatting functions for data governance/management purposes.
- Load: Load the lightly-transformed secure information into the destination.
- Post-Load Transformations (optional): If more complex transformations are desired, use the processing power of the data warehouse to perform them after loading.
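The steps above can be sketched end to end as a toy Python pipeline. An in-memory list stands in for the destination warehouse, and the source rows, field names, and masking choices are all illustrative assumptions:

```python
import hashlib

def extract():
    # Step 1 (assumption): a stand-in for pulling rows from the source system.
    yield {"id": 1, "ssn": "123-45-6789", "raw_notes": "  Needs cleanup  "}

def light_transform(row):
    # Step 2: governance-only, lightweight transformations before load.
    row = dict(row)
    row["ssn"] = hashlib.sha256(row["ssn"].encode("utf-8")).hexdigest()  # mask PII
    row["raw_notes"] = row["raw_notes"].strip()  # simple formatting fix
    return row

def load(rows, warehouse):
    # Step 3: land the lightly transformed rows. Complex joins, aggregations,
    # and business logic run later inside the warehouse itself (step 4).
    warehouse.extend(rows)

warehouse = []  # stand-in for the destination table
load((light_transform(r) for r in extract()), warehouse)
```

The key property is that only steps 1–3 happen before load, and step 2 does nothing beyond what governance requires; everything else is deferred, which is what keeps ingestion fast.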
In a traditional ETL workflow, all transformations must occur before loading. If transformations are numerous and complex, this can significantly delay data ingestion. An ETLG process, by contrast, allows you to quickly perform lightweight pre-load transformations to satisfy data management and data compliance requirements and save the rest of the transformations for later. This offers greater speed and agility when integrating data from a new source into a data warehouse.
With ETLG, you can also save more compute-heavy transformations for later so they occur within the warehouse itself. This offers greater flexibility to change your data integration process and business logic as needed. It also allows you to benefit from the tremendous power and speed of a cloud-based data warehouse for processing those transformations.
Build an ETLG Strategy with Integrate.io
Now that you’ve learned how ETLG can support your data governance and data management requirements while still allowing you to reap the benefits of ELT, you might want to try building an ETLG workflow yourself. One of the easiest and most affordable ways to build an ETLG strategy is to add the ETL-as-Service platform, Integrate.io, to your data integration stack.
Integrate.io is a powerful, easy-to-use platform that allows anyone, regardless of their data engineering skill level, to quickly build sophisticated ETL processes without writing a single line of code. As an essential data management tool, Integrate.io can perform lightweight, high-speed transformations that mask, encrypt, or remove sensitive data (such as PHI and PII) before moving data from one system to another. In this way, Integrate.io can help you adhere to the terms of your data governance policy while keeping pre-load transformations light, fast, and easy for anyone to set up. If and when necessary, you can always perform additional transformations within the destination data warehouse.
If you’d like to try Integrate.io for yourself, contact our team to find out how to get a demo or 14-day trial of the platform.