Let's simplify ETL down to seven criteria that shouldn't be overlooked. Although these requirements seem simple, my purpose is to highlight that they differ depending on whether your company is building operational data pipelines or serving analytical use cases.
Here are seven general requirements:
Extract - Data extraction is the first step of ETL, but how you approach it depends on where the data is headed and what it will be used for; data engineers may extract the same source differently for different purposes.
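One common way purpose shapes extraction is incremental pulls driven by a watermark. A minimal sketch, assuming rows carry an `updated_at` field (the field name and record shape are hypothetical):

```python
from datetime import date

def extract_incremental(rows, last_watermark):
    """Pull only rows changed since the last successful extraction,
    and return the new watermark to persist for the next run."""
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in new_rows),
                        default=last_watermark)
    return new_rows, new_watermark
```

An analytical pipeline might instead take full snapshots; the point is that the extraction strategy follows from the downstream use.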
Cleanse - Cleansing data is part of ongoing data quality. Organizations need reliable, trustworthy data; otherwise few people will consume it, and those who do will not trust what they are interacting with. Clean data also means standardizing fields and flagging anything that does not comply.
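Standardizing fields and flagging non-compliant values can be as simple as a per-record pass. A minimal sketch, with hypothetical field names and rules:

```python
def cleanse(record):
    """Standardize fields and flag values that do not comply."""
    cleaned = dict(record)
    flags = []
    # Standardize: trim whitespace, uppercase country codes
    cleaned["country"] = cleaned.get("country", "").strip().upper()
    if len(cleaned["country"]) != 2:
        # Flag rather than drop, so quality issues stay visible
        flags.append("country: expected a 2-letter code")
    # Standardize: lowercase email addresses
    cleaned["email"] = cleaned.get("email", "").strip().lower()
    if "@" not in cleaned["email"]:
        flags.append("email: missing '@'")
    cleaned["_flags"] = flags
    return cleaned
```

Flagging instead of silently discarding keeps the quality problem measurable, which feeds the Analyze step below.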
Transform - This depends on where the data is being stored. Transformations may require more effort in a data warehouse than in a lakehouse. The same source data may also require different transformations depending on its destination and purpose - operational and analytical needs often differ.
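To make the operational/analytical split concrete, here is a sketch of two transformations over the same source orders - one keeping row-level detail for transactional lookups, one aggregating to a monthly grain for analytics. The record shape is a hypothetical example:

```python
from datetime import date

def to_operational(order):
    """Operational target: keep row-level detail for lookups."""
    return {"order_id": order["order_id"],
            "customer_id": order["customer_id"],
            "amount": order["amount"],
            "order_date": order["order_date"]}

def to_analytical(orders):
    """Analytical target: aggregate amounts to (year, month) grain."""
    totals = {}
    for o in orders:
        key = (o["order_date"].year, o["order_date"].month)
        totals[key] = totals.get(key, 0) + o["amount"]
    return totals
```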
Load - Loading data should be automated, and it requires processes in place to detect and handle loading errors.
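One way to surface loading errors without aborting a whole batch is to route failed rows to a dead-letter list. A minimal sketch, where `insert` stands in for whatever writes a row to the target system:

```python
def load(rows, insert):
    """Load rows one at a time, capturing failures in a dead-letter
    list instead of failing the whole batch."""
    loaded = 0
    dead_letter = []
    for row in rows:
        try:
            insert(row)
            loaded += 1
        except Exception as exc:
            # Keep the row and the reason so failures can be replayed
            dead_letter.append({"row": row, "error": str(exc)})
    return loaded, dead_letter
```

In practice you would batch inserts for throughput, but the principle - never lose a failed row silently - is the same.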
Analyze - It is always important to analyze the success of the ETL processes themselves. Many organizations analyze their pipeline metadata; others provide analytics about business domains. Either way, make sure your organization understands data pipeline usage and success rates.
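Analyzing pipeline metadata can start very small: collect run records and compute a success rate and average duration. A minimal sketch with a hypothetical run-record shape:

```python
def summarize_runs(runs):
    """Compute success rate and average duration from pipeline run metadata."""
    total = len(runs)
    if total == 0:
        return {"runs": 0, "success_rate": 0.0, "avg_duration_secs": 0.0}
    succeeded = sum(1 for r in runs if r["status"] == "success")
    avg_secs = sum(r["duration_secs"] for r in runs) / total
    return {"runs": total,
            "success_rate": succeeded / total,
            "avg_duration_secs": avg_secs}
```

Trending these numbers over time is usually more informative than any single run's result.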
Automate - Automate as many processes as possible to shorten time to delivery and save time across functions.
Right time access - An important part of ETL is ensuring that data is delivered when the business needs it, not just on a fixed schedule.