What is data mesh?
Data mesh comes up often in discussions of how data is managed across the organization. But what does it really mean for your organization’s data management strategy, and how can its framework support your business needs and drive data pipeline success? At a high level, data mesh is about connecting and enabling data management across distributed systems. Instead of relying only on a data warehouse (DW) as a single source of truth within a centralized data architecture, organizations develop strong governance frameworks to manage data access across the organization, irrespective of source. In many ways, this is already the reality for most companies, because important information assets exist across systems anyway. Some companies choose to leverage a data warehouse for all analytical outputs, but as data management becomes more complex, doing so successfully becomes more challenging.
Data Warehouse vs Data Mesh: is it really an either/or decision?
Many businesses feel that it is an either/or decision. Organizations need to balance agility and changing business requirements with access to trusted and governed data to drive relevant and accurate business outcomes. Although a centralized data warehouse will provide strong analytical outputs, once data needs shift, the business requires a way to shift as well. Unfortunately, even today, many data warehouses are not agile enough to meet the demands of a fast-changing environment. This is why shadow IT exists, and how several data visualization companies have made their fortunes: by enabling quick access to, and insight into, data outside traditional DWs. These pressures have led many to adopt a distributed, or decentralized, approach to data pipeline management. Data mesh lets businesses access and analyze disparate data without forcing a DW-only view of the world.
Identifying the better approach for any company requires a detailed analysis of business and technical requirements and priorities, along with a way to maintain strong governance across distributed systems. Allowing access to data as needed lets organizations mitigate the risks that come with limited visibility into operations. This may be why many see data mesh as the best way to manage complex data infrastructures. At the same time, unless this access is coupled with strong data privacy, compliance, and security strategies, the risks may exceed the benefits.
Choosing sides for strong data outcomes
The real debate is not about selecting one approach over the other. Organizations require a way to build the data management infrastructure that best suits all of their various needs. Most need to work with what they have, so choosing one approach and abandoning the other is not an option; they need data pipelines that support both. The considerations below illustrate why, for most organizations, this is not an either/or decision but one that requires a complex set of data pipelines integrating both frameworks.
5 Considerations for managing a hybrid data mesh/data warehouse environment
The following list is by no means comprehensive, but it provides an initial set of considerations for evaluating what data belongs within your DW and how to create the right data pipelines to support a data mesh.
1) Where does your data currently reside and how is it being accessed?
Where the data resides matters less than whether the current setup is working. For instance, if all of your analytical data resides in a DW, are you getting the information you want when you need it, so that you can take action rather than simply consume data? Can you access disparate systems within the organization’s latency requirements?
2) Does shadow IT currently exist or are there groups creating non-sanctioned data sources and data apps?
Flexibility and agility are important, but understanding who has access to which data is just as important. For compliance and security, there needs to be a balance between autonomy and structure. The risks are too high to go without governance frameworks that manage all data across the organization.
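As a rough illustration of what tracking "who has access to which data" could look like, here is a minimal Python sketch of a governance registry. Everything in it is an assumption made for illustration: the `DataSource` class, the `can_access` and `unsanctioned` helpers, and the sample source names and roles are hypothetical and do not come from any particular governance product.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """One entry in a hypothetical governance registry."""
    name: str
    owner: str
    sanctioned: bool  # False marks a source created outside governance
    allowed_roles: set[str] = field(default_factory=set)


def can_access(source: DataSource, role: str) -> bool:
    """Grant access only to sanctioned sources and explicitly allowed roles."""
    return source.sanctioned and role in source.allowed_roles


def unsanctioned(sources: list[DataSource]) -> list[str]:
    """Return the names of sources that exist outside governance (shadow IT)."""
    return [s.name for s in sources if not s.sanctioned]


# Illustrative entries only; none of these names refer to a real system.
registry = [
    DataSource("sales_dw", owner="data-team", sanctioned=True,
               allowed_roles={"analyst", "finance"}),
    DataSource("crm_events", owner="marketing", sanctioned=True,
               allowed_roles={"analyst"}),
    DataSource("regional_spreadsheet", owner="unknown", sanctioned=False),
]

print(can_access(registry[0], "finance"))  # allowed role on a sanctioned source
print(unsanctioned(registry))              # candidates for a shadow IT review
```

The point of the sketch is that the same registry covers warehouse and non-warehouse sources alike, so access rules do not depend on where the data lives.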
3) Who manages current data pipelines and how are they managed?
Although there is usually a data team that takes the lead on how data is managed within the organization, that does not mean the team owns the data. Collaboration is required to ensure that pipelines match business requirements and can be prioritized according to need. One source may need to be accessed multiple times, with different outcomes each time. Aligning business and technical resources, and prioritizing data pipelines so that SLAs are met, is essential for better data pipeline development.
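One simple way to picture SLA-based prioritization is to order pipelines by how quickly their data must be delivered. The sketch below assumes a hypothetical `Pipeline` record with an `sla_minutes` field; the pipeline names, sources, and owners are invented for illustration. Note that two pipelines read from the same source but serve different outcomes.

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    """A hypothetical pipeline record shared by business and data teams."""
    name: str
    source: str
    business_owner: str
    sla_minutes: int  # maximum acceptable delay before data must be delivered


def prioritize(pipelines: list[Pipeline]) -> list[Pipeline]:
    """Order pipelines so the tightest SLA comes first (ties broken by name)."""
    return sorted(pipelines, key=lambda p: (p.sla_minutes, p.name))


# Illustrative data: two pipelines share the orders_db source.
pipelines = [
    Pipeline("monthly_revenue_report", "orders_db", "finance", sla_minutes=1440),
    Pipeline("fraud_alerts", "orders_db", "risk", sla_minutes=5),
    Pipeline("inventory_dashboard", "warehouse_db", "operations", sla_minutes=60),
]

run_order = [p.name for p in prioritize(pipelines)]
print(run_order)
```

Real prioritization would likely weigh more than a single number, such as business impact or dependencies between pipelines, but recording the SLA next to the business owner makes the alignment between teams explicit.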
4) How often are changes made to data pipelines and what percentage of those changes are related to analytics requirements?
Data storage needs are growing faster than ever before, and so are analytics needs. Data warehouses can accommodate small changes but not necessarily large schema changes on a regular basis, at least not with the many dependencies that exist. Data mesh can support more agility but may require a different outlook from an analytics-focused company.
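Answering this question starts with simple arithmetic over a record of pipeline changes. The sketch below assumes a hypothetical change log in which each change is tagged with a reason; the log format, the reason labels, and the `share_by_reason` helper are illustrative assumptions, not part of any specific tool.

```python
from collections import Counter

# Hypothetical change log: (pipeline name, reason for the change).
changes = [
    ("orders_to_dw", "analytics"),
    ("orders_to_dw", "schema"),
    ("crm_sync", "analytics"),
    ("crm_sync", "compliance"),
    ("inventory_feed", "analytics"),
]


def share_by_reason(log):
    """Return the percentage of changes attributed to each reason."""
    counts = Counter(reason for _, reason in log)
    total = sum(counts.values())
    if total == 0:
        return {}  # no changes recorded, so there is nothing to report
    return {reason: 100 * n / total for reason, n in counts.items()}


shares = share_by_reason(changes)
print(shares["analytics"])  # 3 of 5 changes -> 60.0
```

A consistently high share of analytics-driven changes would suggest that analytics requirements are shifting faster than the current pipelines were designed to handle.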
5) What needs to change within your current infrastructure to ensure that decentralized data pipelines also support strong data governance, compliance, and security requirements?
Unfortunately, risk mitigation has become a large part of data management. Luckily, there are established practices that can help mitigate that risk. Many organizations look at their data warehouse as a way to provide governed analytics access widely across the organization. Although this is essential in many cases, other instances require a more decentralized approach.
The trend swings between data centralization and decentralization every few years, and organizations that have managed longer-term analytics environments tend to struggle with the choice between building an enterprise data warehouse and leveraging a data mesh approach. The reality is that both exist, and will always exist, within most companies. Organizations will continue to need their DWs and will become more focused on building a data mesh to ensure better agility and performance over time.