Ready to get into the Inmon vs Kimball debate? Let's start at the beginning of this big data duel (one of the more fascinating rivalries in the industry, in our humble opinion).
In his article "Turbocharge Your Porsche - Buy An Elephant," Bill Inmon, "the father of data warehousing," criticizes Cloudera for associating Big Data with the data warehouse, two totally unrelated terms according to him. His old rival, Dr. Ralph Kimball, takes the opposing view by presenting a webinar with Cloudera about building a data warehouse with Hadoop.
This marks a new round in the fight between these two academic geezers, a decades-long argument over what a data warehouse is and how it should be implemented. So, let's dive into the Inmon vs Kimball debate.
Table of Contents
- Inmon vs Kimball: Top-down or Bottom-up?
- Inmon vs Kimball: Who's Right?
- Clarifying Inmon vs Kimball: How Integrate.io Can Help
Inmon vs Kimball: Top-Down or Bottom-Up?
Inmon and Kimball published two radically different approaches in the 1990s on how an organization should manage its data for reporting and analysis.
Inmon’s approach, also called top-down, is to have "one version of the truth": a single entity, called the data warehouse, that contains all the information for the entire enterprise in one place. According to him, the data warehouse should operate in a relational format and store all of the organization’s atomic data. Only once the data warehouse is fully designed and in place can you add small data marts, which let different departments query data from the central data warehouse and store it in dimensional form.
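To make the top-down flow concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration only, not Inmon's own design: the table names (customer, product, order_line, sales_mart) and the sample rows are hypothetical. The central warehouse holds atomic, relational data, and a departmental mart is derived from it afterwards.

```python
import sqlite3


def build_top_down(conn):
    """Create a central relational warehouse, then derive one mart from it."""
    # Step 1: the central warehouse stores atomic data in relational tables.
    # All table and column names here are assumptions made for illustration.
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE product (
            product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE order_line (
            order_line_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id),
            product_id INTEGER REFERENCES product(product_id),
            order_date TEXT,
            quantity INTEGER,
            unit_price REAL);
    """)
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Acme", "EMEA"), (2, "Globex", "APAC")],
    )
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [(10, "Widget", "Hardware"), (11, "Gadget", "Hardware")],
    )
    conn.executemany(
        "INSERT INTO order_line VALUES (?, ?, ?, ?, ?, ?)",
        [
            (100, 1, 10, "2024-01-05", 3, 20.0),
            (101, 1, 11, "2024-01-06", 1, 50.0),
            (102, 2, 10, "2024-01-06", 2, 20.0),
        ],
    )
    # Step 2: only after the warehouse exists is a departmental mart built,
    # as a summarized extract of the central tables.
    conn.execute("""
        CREATE TABLE sales_mart AS
        SELECT c.region AS region,
               p.category AS category,
               SUM(o.quantity * o.unit_price) AS revenue
        FROM order_line o
        JOIN customer c ON c.customer_id = o.customer_id
        JOIN product p ON p.product_id = o.product_id
        GROUP BY c.region, p.category
    """)


def mart_rows(conn):
    """Return the mart's rows, ordered by region."""
    return conn.execute(
        "SELECT region, category, revenue FROM sales_mart ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_top_down(connection)
    for row in mart_rows(connection):
        print(row)
```

The point of the sketch is the ordering: the mart never touches source systems directly, so every department's numbers trace back to the same central tables.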
Kimball sees this differently. He suggests that an organization should first build small data marts for each department. The data marts should contain facts and dimensions relevant to the business area and store them in a star or snowflake schema. Kimball says the data warehouse is essentially a union of all the data marts. Accordingly, his version is "bottom-up."
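A bottom-up setup can be sketched the same way, again with Python's sqlite3 module and hypothetical names (dim_date, dim_product, dim_channel, fact_sales, fact_tickets). Two departmental marts are each built as a small star schema, and the "warehouse" is simply what you get when you query across them. The shared date dimension is our assumption about how the marts line up; without some common dimension the union would not be queryable as one.

```python
import sqlite3


def build_bottom_up(conn):
    """Create two star-schema marts that share one date dimension."""
    conn.executescript("""
        -- Dimension shared by both marts (an assumption for this sketch).
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY, iso_date TEXT, month TEXT);

        -- Sales mart: one fact table surrounded by its dimensions.
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity INTEGER,
            revenue REAL);

        -- Support mart: built separately, for a different department.
        CREATE TABLE dim_channel (
            channel_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_tickets (
            date_key INTEGER REFERENCES dim_date(date_key),
            channel_key INTEGER REFERENCES dim_channel(channel_key),
            tickets_opened INTEGER);
    """)
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20240105, "2024-01-05", "2024-01"),
         (20240210, "2024-02-10", "2024-02")],
    )
    conn.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
    conn.execute("INSERT INTO dim_channel VALUES (1, 'Email')")
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(20240105, 1, 3, 60.0), (20240105, 1, 1, 20.0),
         (20240210, 1, 2, 40.0)],
    )
    conn.executemany(
        "INSERT INTO fact_tickets VALUES (?, ?, ?)",
        [(20240105, 1, 4), (20240210, 1, 1)],
    )


def monthly_overview(conn):
    """Combine both marts by month through the shared date dimension."""
    return conn.execute("""
        WITH sales AS (
            SELECT d.month AS month, SUM(f.revenue) AS revenue
            FROM fact_sales f
            JOIN dim_date d ON d.date_key = f.date_key
            GROUP BY d.month),
        tickets AS (
            SELECT d.month AS month, SUM(f.tickets_opened) AS tickets
            FROM fact_tickets f
            JOIN dim_date d ON d.date_key = f.date_key
            GROUP BY d.month)
        SELECT s.month, s.revenue, t.tickets
        FROM sales s
        JOIN tickets t ON t.month = s.month
        ORDER BY s.month
    """).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_bottom_up(connection)
    for row in monthly_overview(connection):
        print(row)
```

Each fact table is aggregated to the month level before the two are joined, so rows from one mart do not multiply rows from the other.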
Related Reading: Data Mart vs. Data Warehouse
Their methodologies have evolved over the years. Inmon’s DW 2.0 version allows room for unstructured data as part of the data warehouse, while Kimball talks about eventually integrating the data marts into one data warehouse. In one of his own presentations, Inmon criticizes Kimball for only now realizing what Inmon’s approach suggested over 20 years ago.
Why does Inmon criticize Cloudera for mixing up data warehouses with Big Data? Because, according to him, a data warehouse is an architecture, while Big Data is a technology. The two terms therefore belong to different categories and, in his view, should not be compared with one another.
Inmon vs Kimball: Who’s Right?
Kimball's "bottom-up" approach of data marts seems to be more popular beyond the walls of academia, since most companies prefer to start with something small that works rather than spec endlessly and run the risk of creating a monster. Sometimes, however, there is already a data warehouse in place. In that case, the warehouse is usually implemented as a relational database that is queried directly and used for online analytical processing (OLAP).
Although Inmon argues that a data warehouse is just an architecture, people use the term on a day-to-day basis to refer to an actual technology (e.g. "Our data warehouse isn’t fresh - the nightly process failed again!"). In that sense, Apache Hadoop could be part of the data warehouse, for example, as cheap data storage, or as part of the data processing performed before analysis.
Ironically, Big Data may fulfill the "top-down" vision that Inmon preaches: a central repository with one version of the truth where structured and unstructured data are stored together. Inmon insists on seeing a data warehouse as being as distant from Big Data as a Porsche is from an elephant. However, the relationship is closer to that between commercial jet planes in general and the huge Airbus A380, an airliner with a vast capacity that can handle today’s busy air travel needs.
Also, if you are going to have one central data warehouse with all the information, it is going to have to handle data that comes in high volume, velocity, and variety. But isn’t that the very definition of Big Data?
In fact, it is. Inmon’s own architecture calls for storing a variety of data as part of the data warehouse. If so, why would Inmon protest so harshly against mentioning Big Data and the data warehouse in the same sentence? Could there be another reason?
Perhaps. Around the time Inmon published his article bashing Cloudera’s so-called mix of Porsches and elephants, Cloudera announced its webinar with Kimball. Could the timing of Inmon’s article and Kimball’s webinar be a coincidence? Or maybe (in the words of a thousand 1990s action films), this time, it’s personal.
Clarifying Inmon vs Kimball: How Integrate.io Can Help
When it comes to the Inmon vs Kimball debate, we at Integrate.io are pretty unbiased. We're just fascinated to see the thoughts that come from these two great Big Data minds.
No matter which side of the Inmon vs Kimball debate you come down on, Integrate.io's cloud-based solution provides the easy-to-understand visual data pipelines and automated data flows that your company needs, across a wide range of sources and destinations. Our powerful on-platform transformation lets you transform, normalize, and clean your data, all while adhering to compliance best practices. With Integrate.io, your business can quickly and easily benefit from all the opportunities big data presents without having to invest in hardware, software, or related personnel.