- Data mart is for a specific company department and normally a subset of an enterprise-wide data warehouse.
- Data marts improve query speed with a smaller, more specialized set of data.
- Data warehouses help make enterprise-wide strategic decisions, data marts are for department level, tactical decisions.
- Data warehouse includes many data sets and takes time to update, data marts handle smaller, faster-changing data sets.
- Data warehouse implementation can take many years, data marts are much smaller in scope and can be implemented in months.
What's the difference between a data mart and a data warehouse? And, are data marts still relevant in today's cloud-first world? Let's dive into the definitions of data marts and data warehouses, the use cases for both, and the role of data marts in today's cloud ecosystem.
Table of Contents
What is a Data Mart?
A data mart is a segment of your data warehouse that is reserved for use in a specific area of business. It breaks the entire dataset into manageable, relevant chunks, such as data related to the finance or marketing department of a business.
Modern businesses capture a huge amount of data - structured and unstructured - every day. Given the volume of data, running queries against the entire dataset can be time-consuming. End-users would, typically, have to write complex queries just to fetch relevant data, before it can be analyzed. By breaking up data into business roles, data marts allow much faster access to relevant information. In turn, they expedite the process of fetching data insights.
To put it another way: If sales want some cheese, marketing wants some turkey, and legal wants some bread, you don't want to bring a sandwich around and have them deconstruct it one-by-one. With data marts, you give each of them what they need.
What is a Data Warehouse?
A data warehouse is your central data repository that has the entire dataset of the business. Controlled access to data within a data warehouse is important to conform to data privacy laws. Moreover, as mentioned earlier, running queries against an entire data warehouse can be complex for end-users.
Data marts segregate data according to business functions to make it easier for end-users to query it. Segregation of data can happen from an existing data warehouse. It is also possible that different business functions create their own data marts. These data marts can be merged to form a data warehouse.
Data Mart vs. Data Warehouse
Let's take a look at the fundamental properties of a data mart vs a data warehouse.
- Size: Under 100GB
- Subject: Single-subject
- Sources: BI tools specific to a business line
- Scope: A single line-of-business or multi-functional department
- Decision: Supports tactical decision making that leverages single department goals and tracking to build a bigger overall picture
- Cost: $10,000 - $100,000 depending upon the scope of the data mart, integrations, and ETL
- Integrations: The integrations necessary for a specific line-of-business
- Size: Over 100GB
- Subject: Multiple-subjects
- Sources: Internal and external resources that make up your businesses data loads
- Scope: Multiple lines-of-business
- Decision: Supports strategic decisions that impact the entire business
- Cost: $100,000 + for in-house data warehouses, but costs are significantly lower when you leverage cloud solutions
- Integrations: All business integrations
Types of Data Marts
There are three primary types of data marts within a data warehouse for typical business use.
- Dependent Data Marts: With dependent data marts, you can pull all of your data from a central data warehouse. There are definitely some benefits to this method (namely data centralization.) The primary purpose of developing data marts within your centralized warehouse is to improve performance (queries are run at the data mart level) and to give KPI tracking capabilities — since you can go in and develop KPI trackers for each data mart.
- Independent Data Marts: With independent data marts, your data marts aren't connected to the centralized data warehouse whatsoever. This may seem contradictory to the purpose of data warehousing (leveraging multiple data streams to create informed decisions) but it can be useful for short-term goals and rapid implementation. Often times, independent data marts are constructed during the process of building more complex dependent data marts — though many choose to skip this step altogether.
- Hybrid Data Marts: Hybrid data marts combine both data warehouse data and data from separate systems (i.e. your tech stack, etc.). Usually, these are leveraged for ad-hoc integrations or situations where you need to utilize the data from disparate sources immediately. Ideally, you want to integrate these disparate sources into your warehouse — circumventing the need for a hybrid system.
The Benefits of Data Marts
There are a number of reasons why you might decide to use a data mart:
- Allows users to access the exact data they need to make informed decisions
- Reduces risks and allows you to monitor and segment data usage
- Can build KPI trackers for each data mart
- Easy to integrate with BI tools
- Gives you a clearer look at data for each specific line-of-business
- Better performance since you can run queries at the data mart level
- Cheaper to build than a data warehouse
- Can framework data for better business-line accessibility
- Departments can control their own data workloads
- And data marts can act as building blocks towards a more robust data warehouse
Data Mart Structure and Data Modeling
Data marts play a critical role in data warehouse design. Depending on the data modeling method (or schema) you use, the way you construct and utilize data marts will differ wildly — which impacts the overall construction of your data warehouse solution.
There are tons of different data modeling methodologies that you can use for your business, but we'll cover the two main models — Bill Inmon's Top-Down and Ralph Kimball's Bottom-Up.
Bill Inmon's Top-Down Approach
Bill Inmon defined a top-down approach for data mart construction. In this model, the data warehouse is constructed first, and it's the primary repository for all business data. Then, data marts are constructed ad-hoc to meet specific business needs. So, the data warehouse is constructed without data marts in mind, which are later added when specific business lines require them.
A Top-Down approach follows this model — data warehouse > ETL > data mart > OLAP cube > interfaces
There are a few key benefits to the Inmon model:
- Data is kept consistent since the data warehouse acts as the central repository for all business data.
- Data integrity is incredible due to the "data warehouse first" approach.
Ralph Kimball's Bottom-Up
Ralph Kimball has a reverse methodology for data modeling. This approach could be considered a bottom-up approach, but most define it as a "star schema" or (with a few tweaks) a "snowflake" schema. With Kimball's approach, data marts are constructed first, and the data warehouse is defined as a "union of all the data marts."
A bottom-up approach follows this model — data mart > ETL > data warehouse > OLAP cube > interfaces
There are plenty of benefits to the Kimball model:
- Data marts aren't separated from the data warehouse leading to easier construction and natural integration with many of the leading BI tools.
- Analytics can access data in the data marts directly without going through the data warehouse first.
- Kimball's model is cheaper to build and has rapid scaling (i.e., you can start small for cheap and scale up slowly to meet business needs.)
Data Marts and ETL
Since your ETL tool is the fuel that drives your data warehouse, understanding the role of ETL and data marts is crucial. Remember, data marts are slices of your data warehouse. If you choose to build a warehouse using Kimball's bottom-up modeling, you'll run ETL loads directly against data marts. So, you'll build your data pipelines from your data marts into your data warehouse.
If you use Inmon's top-down approach, you'll run ETL against the data warehouse into your data marts. So, ETL loads are run directly against your data warehouse and data lakes.
Ideally, you should choose an ETL tool that can support both of these workflows. Integrate.io has easy-to-setup pipelines and rapid integration with all of the leading BI tools. Plus, it works perfectly at both the data mart and data warehouse level of ETL.
Data Mart Structure: A Top-Down or a Bottom-Up Approach?
Like everything, it depends on your needs. Both approaches have their pros and cons. The bottom-up approach is great for quick implementation. You can scale it up as your needs evolve. The top-down approach gives you a solid foundation to build upon. However, it can be expensive to build an on-premise data warehouse.
That's where the cloud comes in, and the balance tilts in favor of the top-down approach. Modern cloud-based data warehouse solutions (e.g., Redshift or BigQuery) allow businesses to have a central data repository without the exorbitant costs of an on-premise warehouse. Most of them have a pay-as-you-go pricing structure, making them easily scalable. And a lot of these data warehouse solutions have complex data modeling that differs widely from the two approaches mentioned above. Redshift and Snowflake, for instance, have their own hyper-complex data modeling that is much more efficient at organizing data for analysis.
With the advent of the cloud, the trend of independent data marts is on the decline. Instead, businesses are increasingly opting for dependent or hybrid data marts. With storage and compute capacity available at scale, it has also become easier to create transient and long-lived data clusters.
Integrate.io Makes Integrations Easier
Data marts can help you segment your data warehouse into slices of hyper-usable blocks that increase performance. If you are using the cloud to build or maintain a data warehouse, data integration is key for good data management. Integrate.io's automated ETL platform gives you a low-code solution to getting data into your data marts. The cloud-based platform has a visual interface and integrates with most of the popular databases and warehouses. Schedule a demo to learn more about how Integrate.io can help make your big data management more efficient.