With the ever-increasing volume of data being generated from a highly diverse set of sources, organizations have increasingly directed their focus toward solutions that can help them manage data more efficiently and effectively. Indeed, in the current decade, a robust data infrastructure is key to an organization's success, and timely access to data is what every management team is striving for today.
For many years, data warehouses and data lakes dominated the scene. A data warehouse is essentially a central repository of structured data, while data lakes are an all-purpose storage solution where both structured and unstructured data can be stored. However, these centralized approaches have now started to pose problems of their own.
Table of Contents
Major Components of a Data Mesh Framework
How Data Mesh Differs From a Centralized Data Repository
What is a Data Mesh?
A data mesh is an approach to data architecture that makes it easy to share, access, and manage big data in an ever-expanding environment.
Usually, a data architecture is organized in the form of a stand-alone data warehouse or a data lake which is centrally managed by a single data team.
In contrast, a data mesh helps by simplifying the flow from data source to data consumption. The idea is to have each domain team create, process, manage, and publish relevant data via a platform that can be used across the organization for various purposes.
The aim is to remove the common bottlenecks that organizations face with their data warehouses and data lakes.
A Brief Look At the Evolution of Data Architecture
During the late 1990s, the data architecture consisted entirely of a data warehouse, where the need was to store structured data in relational databases.
With time, as data volumes began to expand, organizations moved towards data lakes to store unstructured data in real time.
Moving forward, a hybrid system evolved, with data lakes and data warehouses combined into a data lakehouse, where teams would process the unstructured data in a data lake and load it into the data warehouse.
Later, with the advent of the cloud, storage capacities expanded, allowing organizations to ingest as much data as possible.
However, all of the above architectures were centrally managed, and this created problems with data integrity. Indeed, the autocratic structure was not viable since the central team had to deal with a number of stakeholders. Also, delivering quality data was problematic since the central team lacked domain-level expertise.
Ownership was yet another issue, as it was difficult to track down the data owner to resolve data-related issues.
And today, we therefore have the data mesh, which is a move toward data decentralization. Such an architecture overcomes the obstacles that result from a single centralized architecture.
It also solves the problem of teams working in silos, which occurs when each team simply reaches out to a central data team without collaborating with others. Such a disconnect hurts agility and hinders collaboration.
A data mesh basically borrows from the distributed architectures that software developers have long used to speed up the delivery and integration of features.
Likewise, with a data mesh, each domain is responsible for managing its own data. Since each domain can have its own unique requirements, the mesh helps by ensuring that each team has access to the relevant data to perform tailored analysis.
One way of understanding this is to imagine a large restaurant with various chefs specializing in certain cuisines where each chef requires a certain set of ingredients.
Now, instead of having one person responsible for buying all of the ingredients, each chef can buy their own ingredients and share them with others if needed. With each chef responsible for his/her own ingredients, the process becomes highly scalable and adaptable.
Imagine a rush hour in which different customers are demanding different types of dishes. If each chef were required to go to the ingredients guy, it would undoubtedly result in a huge mess.
Quite similarly, a data mesh makes each domain responsible for its own data, and doing so ensures that scalability is maintained.
Major Components of a Data Mesh Framework
The four principles of an effective data mesh are data as a product, federated governance, a self-serve data platform, and domain ownership. But before we can go into the details, a word is warranted on the fundamental objectives of a data mesh.
A data mesh needs to achieve high interoperability. As in the restaurant analogy, interoperability simply means that one team's data, or ingredients, can be used by others without any hassle.
Secondly, data should be discoverable: it should be easy for any team to find the relevant data in time.
Also, security is key. With decentralization, it becomes quite difficult to ensure that data is secure. So strict guidelines need to be in place to prevent security breaches.
The goal of a central data repository was to ensure a high quantity of data. But with a data mesh, high quality is what takes center stage.
The data-as-a-product principle is more of a mindset that comes with a data mesh than a technicality. It is this mindset that helps with sharing data across domains.
In a data mesh, each domain creates its own data products that can be used by others. For example, the sales team can create a clean sales dataset that can later be used by a data scientist for some model.
Since a data mesh is a decentralized system, it is crucial that certain standards are followed to keep data consistent across domains. So a data mesh usually involves a central team that outlines certain practices that need to be followed by all domains when publishing their data products.
For instance, this can be the file formats and naming conventions that need to be used when creating a data product.
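As a rough illustration of such standards, the sketch below checks a data product against a naming convention and a list of allowed file formats. The `DataProduct` class, the snake_case rule, and the allowed formats are all hypothetical choices made for this example; a real organization would define its own.

```python
import re
from dataclasses import dataclass

# Hypothetical organization-wide standards (assumptions for illustration only).
ALLOWED_FORMATS = {"parquet", "csv"}
NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z0-9]+)*$")  # snake_case names


@dataclass
class DataProduct:
    domain: str
    name: str
    file_format: str


def check_standards(product: DataProduct) -> list[str]:
    """Return a list of violations; an empty list means the product complies."""
    violations = []
    if not NAME_PATTERN.match(product.name):
        violations.append(f"name '{product.name}' is not snake_case")
    if product.file_format.lower() not in ALLOWED_FORMATS:
        violations.append(f"format '{product.file_format}' is not allowed")
    return violations


good = DataProduct(domain="sales", name="monthly_sales", file_format="parquet")
bad = DataProduct(domain="finance", name="Q1 Report", file_format="xlsx")

print(check_standards(good))  # no violations expected
print(check_standards(bad))   # one violation per broken rule
```

A check like this could run automatically whenever a domain publishes a data product, so the central team defines the rules once and does not have to review each publication by hand.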
The self-serve practice of a data mesh spares each team from constantly going to a central data team to fetch raw data. For instance, the finance team might simply use a financial dataset that another team has already created.
The dataset should, however, have clear details that tell the finance team what each column in the dataset means, along with other information such as the date of creation, usage, etc.
The self-serve system makes it easy for various teams to access relevant data products and get the most value out of the organization's data.
With a domain-ownership approach, a culture of ownership prevails. Indeed, one of the drawbacks of a centralized system was exactly that it was difficult to identify the owner of a certain dataset.
With a data mesh, the data products created by the domains would make it easy for everyone to identify the owner, and communication amongst different teams would be incredibly streamlined, as the owner could be approached directly without any friction.
Each team would bear the responsibility of ensuring the operability of its data products, which would enhance the quality of the data of the entire organization.
Which Technologies are Required for Building a Robust Data Mesh?
Below is a list of technologies that are acting as a catalyst for the adoption of a data mesh.
DataOps is an emerging practice borrowed from the domain of DevOps in software engineering. DataOps mostly involves automation of the data pipeline so as to speed up the data delivery process.
So instead of having a large data team manage pipelines manually, with DataOps, each domain can deploy various tools that help automate jobs and speed up the integration process so as to ensure timely delivery and high-quality data.
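To make the idea of an automated job concrete, here is a minimal sketch of a quality gate that a domain team might run on every pipeline execution instead of inspecting data by hand. The function names, the `required_columns` parameter, and the `max_reject_ratio` threshold are assumptions for illustration and do not come from any particular DataOps tool.

```python
def validate_rows(rows, required_columns):
    """Split rows into valid and rejected, based on missing required values."""
    valid, rejected = [], []
    for row in rows:
        if all(row.get(col) not in (None, "") for col in required_columns):
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected


def run_pipeline(rows, required_columns, max_reject_ratio=0.1):
    """Return the valid rows, or fail fast when too many rows are rejected."""
    valid, rejected = validate_rows(rows, required_columns)
    ratio = len(rejected) / len(rows) if rows else 0.0
    if ratio > max_reject_ratio:
        raise ValueError(f"{ratio:.0%} of rows rejected, above the allowed threshold")
    return valid


raw_sales = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": None},   # missing amount, will be rejected
    {"order_id": 3, "amount": 80.0},
]

# A lenient threshold lets this small sample through with the bad row removed.
clean_sales = run_pipeline(raw_sales, ["order_id", "amount"], max_reject_ratio=0.5)
print(clean_sales)
```

Failing the run when the rejection ratio is too high is one possible design choice: it stops low-quality data from reaching consumers silently, at the cost of occasionally blocking a delivery.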
With the ever-increasing presence of cloud platforms, organizations today need not worry about having physical on-premises servers to store their data.
Rather, services such as Google Cloud can help teams migrate data onto the cloud and transfer the responsibility of maintenance and integration to the cloud solutions vendor.
A data catalog is one of the most crucial elements of a data mesh. A data catalog is similar to a catalog that you may find in a library to get information on a certain book.
The data catalog feature now comes built in with platforms such as Google BigQuery, or Lake Formation in AWS.
A data catalog can help different teams understand the data. For instance, the sales team can create a dataset for customers and also give information about the columns, the date of creation, etc. Such metadata can help, say, a data scientist extract more insights when building a prediction model.
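The kind of metadata described above can be sketched as a small in-memory catalog. This is a toy illustration under stated assumptions, not the API of BigQuery, Lake Formation, or any other product; the `CatalogEntry` and `DataCatalog` classes and their fields are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogEntry:
    name: str
    owner: str                      # team to contact about this dataset
    created: date
    columns: dict[str, str] = field(default_factory=dict)  # column -> description


class DataCatalog:
    """Toy catalog: register datasets and search them by keyword."""

    def __init__(self):
        self._entries = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[CatalogEntry]:
        keyword = keyword.lower()
        return [
            entry for entry in self._entries.values()
            if keyword in entry.name.lower()
            or any(keyword in desc.lower() for desc in entry.columns.values())
        ]


catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="customers",
    owner="sales",
    created=date(2023, 1, 15),
    columns={
        "customer_id": "Unique identifier for each customer",
        "signup_date": "Date the customer first purchased",
    },
))

# A data scientist looking for purchase-related data can find the owner directly.
for entry in catalog.search("purchased"):
    print(entry.name, "owned by", entry.owner)
```

Storing the owner next to the column descriptions is what lets a consumer go straight to the responsible team, which is the ownership benefit discussed earlier.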
Organizations are relying more and more on external general-purpose datasets that can be integrated with internal data to give a deeper understanding of a certain problem. A data marketplace facilitates this since it is itself an online store where different datasets can be purchased.
With a data virtualization platform, an organization can bring all its data sources together in one place, and the different teams can connect to the relevant sources directly for different types of analysis.
With data virtualization, instead of having to replicate a dataset from scratch, one can simply create a view and perform the required analysis as needed.
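The view-instead-of-copy idea can be shown on a small scale with Python's built-in `sqlite3` module. A single in-memory SQLite database stands in here for a full virtualization platform, which would span many sources; the `sales` table and `sales_by_region` view are made-up names for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)

# The view stores only the query, not a second copy of the data.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)

totals = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
print(totals)

# New rows in the source table show up in the view without any reload step.
conn.execute("INSERT INTO sales VALUES ('south', 20.0)")
updated = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
print(updated)
```

Because the view is evaluated at query time, the consuming team always sees current data and the producing team keeps a single source of truth.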
How Data Mesh Differs From a Centralized Data Repository
At this point, it should be somewhat clear what a data mesh is and how it differs from a central data repository.
Basically, a data mesh is a move towards a more democratized system where each domain can manage its own data, whereas a centralized repository is managed by a single team which handles all the access and delivery issues across the entire organization.
Limitations of Centralized Data Repository
As mentioned in the introductory section, a number of problems arise with a centralized system. Firstly, since a central team is managing access, long delays can result as more and more access requests come in from the other teams.
Secondly, in a centralized system, pre-processing a dataset is the responsibility of the central team. However, this requires a lot of domain knowledge which a single team may not have.
Lastly, no one really knows who the actual data owner is in a centralized system. It is just one single team preparing and delivering a dataset.
However, this does not mean that a data mesh is the perfect solution.
A data mesh is only as successful as the dedication of each domain. Since a data mesh transfers the onus of data management to those who understand their data well, a carefree attitude among these so-called experts can put the system in jeopardy.
A data mesh is not just a tool. Rather, it is an approach that involves certain best practices. If they are not followed properly, the mesh can fail to meet its expectations.
Implementing a data mesh can be a long process. To begin, an organization can start by spreading the idea of a data product among some forward-thinking members of the company. These members can form a team to identify the data requirements of each domain.
One of the domains can be selected to work on a data product. At the same time, the existing infrastructure can be modified or a new one built to support this.
Once this is built, the success can be shared with other teams as well. Gradually, as the practice expands, a governance system can be developed to ensure that the data mesh is self-governing and sustainable.
A data mesh is indeed the future of data architecture. Organizations, however, need to rethink the architectures they have in place and then consider whether a change is really needed.
A data mesh isn't for all. If your organization is not that big, or if you are not facing any problems with the central data team, then perhaps you are better off with a centralized system. However, if you are experiencing rapid expansion and having to deal with a lot of different data coming in, then a data mesh might be the way to go.