This is a guest post for Integrate.io written by Bill Inmon, an American computer scientist recognized as the "father of the data warehouse." Inmon wrote the first book and first magazine column about data warehousing, held the first conference about this topic, and was the first person to teach data warehousing classes.
Five things to know about this topic:
- Data warehouses date back to the 1980s. Before these became popular, people used databases for transactional processing.
- Bill Inmon argues that it’s more convenient for data warehouses to reside on a single physical source of data.
- However, a warehouse can exist on multiple physical platforms as long as it follows the principles of the system of record.
- Inmon says that people who think a warehouse has to be a single physical source do not understand data warehousing.
- Integrate.io is the leading low-code data warehouse integration platform for data analysis and decision-making.
The data warehouse has been around since the 1980s. Prior to data warehousing, databases were used primarily for transaction processing. One of the goals of transaction processing was to ensure consistently high performance, and one of the ways users could enhance the performance of a transactional database was to only collect a limited amount of historical data. A typical range for data stored in a transactional database was a month’s worth to a quarter’s worth at most.
So, when the data warehouse came along, it stored from one to five years’ worth of data — significantly more than a standard transactional database could contain.
The question then arose: Does a data warehouse need to reside on a single physical database?
Moving data to a supported data warehouse doesn't have to be a challenge. Integrate.io can perfect the data integration process via ETL, ELT, Reverse ETL, and super-fast Change Data Capture (CDC). Its out-of-the-box low-code/no-code connectors help you move data between locations for more robust data analysis without any of the jargon or technicalities. Try Integrate.io yourself with a 14-day free trial.
How Did Data Warehouses Grow So Large?
Many factors caused data warehouses to grow large. One of those factors was the arrival of text, which was stored in the data warehouse after being passed through textual disambiguation. As long as the data warehouse stored only transaction-based, structured data, its size was tolerable.
But when it started to include text-based data, the size of the data warehouse ballooned.
Should a Data Warehouse Sit on a Single Physical Source?
Certainly, it’s more convenient for many processes if a data warehouse resides on a single physical source of data. But for a variety of reasons, it may make sense for it to reside on more than one. As long as the data warehouse follows the principles of the system of record, there is no problem.
To understand the principles of a system of record, think about a large bank, say Bank of America. The bank has many customers and carries out many activities. It has a large, complex set of databases and storage devices. Now suppose you have an account at Bank of America. Your account balance exists in one and only one place in the bank. If your account balance existed in more than one place, then both you and the bank would have a problem. But the bank follows the principles of the system of record. The bank may have lots of physical databases, but those databases are organized according to these principles.
Integrate.io helps your enterprise move data to a warehouse via out-of-the-box low-code/no-code connectors that require no advanced data engineering or pipeline building. Set up an ETL trial or an ELT trial now!
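The bank example can be made concrete with a small sketch. This is only an illustration under stated assumptions, not how any real bank or warehouse product works: the class name `SystemOfRecord`, the store names `db_east` and `db_west`, and the account keys are all hypothetical. The idea it shows is that data may be spread over several physical stores, yet each item has exactly one authoritative home.

```python
class SystemOfRecord:
    """Tracks which physical store holds the single authoritative copy of each key."""

    def __init__(self, store_names):
        # One in-memory dict stands in for each hypothetical physical database.
        self.stores = {name: {} for name in store_names}
        # Maps each key to the name of the one store that owns it.
        self.location = {}

    def write(self, store_name, key, value):
        owner = self.location.get(key)
        # Refuse to create a second copy of the same key in a different store.
        if owner is not None and owner != store_name:
            raise ValueError(f"{key!r} already has its system of record in {owner!r}")
        self.stores[store_name][key] = value
        self.location[key] = store_name

    def read(self, key):
        # Every read is routed to the single store that owns the key.
        return self.stores[self.location[key]][key]


sor = SystemOfRecord(["db_east", "db_west"])
sor.write("db_east", "account:1001", 250.00)
sor.write("db_west", "account:2002", 75.50)
sor.write("db_east", "account:1001", 300.00)  # updating in the owning store is fine

try:
    sor.write("db_west", "account:1001", 999.00)  # a second home for the same balance
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

In this sketch the two stores together hold the full set of accounts, but a lookup for any one balance always resolves to a single place, which is the property the principle asks for.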
Why Not Split a Data Warehouse into Different Sectors?
There may be very good reasons for splitting a data warehouse into different sectors. One such reason might be the difference in the probability of access to the data. Some data has a high probability of user access, but users might infrequently access other data. Putting the data with a high probability of access in one place and storing the infrequently accessed data in another place is a very good strategy.
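A rough sketch of that strategy follows. It is an illustration only: the function name `split_by_access`, the 10 percent threshold, and the table names and access counts are assumptions made up for the example, not figures from any real warehouse. It estimates each data set's probability of access from observed counts and assigns it to a frequently accessed or an infrequently accessed sector.

```python
def split_by_access(access_counts, hot_threshold=0.10):
    """Split keys into (hot, cold) sets by their share of all observed accesses.

    hot_threshold is an assumed cutoff; a real deployment would tune it.
    """
    total = sum(access_counts.values())
    hot, cold = set(), set()
    for key, count in access_counts.items():
        # Estimated probability that a given access touches this key.
        probability = count / total if total else 0.0
        if probability >= hot_threshold:
            hot.add(key)
        else:
            cold.add(key)
    return hot, cold


# Hypothetical access counts per table over some observation window.
access_counts = {
    "orders_2024": 60,
    "orders_2023": 35,
    "orders_2019": 4,
    "orders_2018": 1,
}

hot, cold = split_by_access(access_counts)
```

Each table lands in exactly one sector, so a split like this stays compatible with the system of record: the data is divided across locations, not duplicated.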
Nothing requires the data in a data warehouse to exist in a single physical location. The only requirement is adherence to the principles of the system of record. And the system of record can be spread over many different physical devices.
People who say a data warehouse has to be a single physical store of data do not understand data warehousing. Sure, it can be more convenient for a warehouse to reside in one physical source. However, a warehouse can exist in multiple physical platforms as long as it follows the principles of the system of record.
Integrate.io is the new leading low-code data warehouse integration platform for data analysis. Move data from sources to a supported destination in minutes with easy-to-use connectors. Schedule a 14-day trial today!
Bill Inmon, the father of the data warehouse, has authored 65 books. Computerworld named him one of the 10 most influential people in the history of computing. Inmon's Castle Rock, Colorado-based company Forest Rim Technology helps companies hear the voice of their customers. See more at www.forestrimtech.com.