5 Key Points
- Bill Inmon argues that data integrity is a critical component of data warehousing.
- Inmon also says data warehouse users need unrestricted access to all kinds of data.
- That data includes textual data, which is often processed through natural language processing.
- For successful data integration, data warehousing shouldn't have any limits.
- Integrate.io is a platform that can improve data integrity and help businesses move textual data to a warehouse without advanced coding or data engineering knowledge.
This is a guest post for Integrate.io written by Bill Inmon, an American computer scientist recognized as the "father of the data warehouse." Inmon wrote the first book and first magazine column about data warehousing, held the first conference about this topic, and was the first person to teach data warehousing classes.
This week, I was part of a panel discussion with several data warehouse and hardware vendors from Silicon Valley. As part of the discussion, one of the vendors said: “What users want is performance. They want their queries answered as fast as possible. They don’t want to wait an hour or more for a query to be processed.”
What the vendor said was true. Sort of. He got part of the story right but not the whole story.
Data Integrity Is a Critical Component of Data Warehousing
But before I go a step further, let me sincerely thank the hardware vendors for making data warehousing possible. If it weren’t for the capacity and speed provided by the hardware vendors and the advances made, we wouldn’t have the data warehouse today. So the statement made by the vendor was certainly correct. Thank you, hardware vendors. I appreciate you. So do the many companies that use warehouses to generate business insights about customer service, marketing, and inventory management.
But the vendor was only looking at one aspect of data warehousing. Another aspect is data integrity. Here’s an example: It doesn’t do you any good to process an Excel spreadsheet quickly. Why? Because anyone can write anything they want in Excel. I can assign myself a salary of $1,000,000 a month on a spreadsheet. And I can process that spreadsheet very quickly. But if the information I am processing is fiction, it doesn’t matter how fast I am processing it. So another essential element of what the user wants is data integrity.
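To make the spreadsheet example concrete, here is a minimal sketch in Python of the kind of integrity check that could run before rows are loaded into a warehouse. Everything in it is an assumption made for illustration: the `SalaryRow` type, the `find_integrity_errors` function, the employee IDs, and the `MAX_MONTHLY_SALARY` bound are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass

# Hypothetical bound for illustration; a real limit would come from HR policy.
MAX_MONTHLY_SALARY = 100_000


@dataclass
class SalaryRow:
    employee_id: str
    monthly_salary: float


def find_integrity_errors(rows, known_employee_ids):
    """Return (row, reason) pairs for rows that fail basic integrity checks."""
    errors = []
    for row in rows:
        if row.employee_id not in known_employee_ids:
            # The row refers to someone the system of record does not know.
            errors.append((row, "unknown employee"))
        elif not 0 < row.monthly_salary <= MAX_MONTHLY_SALARY:
            # The value is outside the range the business considers plausible.
            errors.append((row, "salary out of range"))
    return errors


rows = [
    SalaryRow("E001", 6_500.0),
    SalaryRow("E002", 1_000_000.0),  # the self-assigned salary from the example
    SalaryRow("E999", 5_000.0),      # not present in the system of record
]
errors = find_integrity_errors(rows, known_employee_ids={"E001", "E002"})
for row, reason in errors:
    print(f"rejected {row.employee_id}: {reason}")
```

The point of the sketch is only that a fast load is worth little unless something rejects the fictional rows first. Real integrity rules are usually richer than a range check and a lookup.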
Integrate.io helps companies improve data integrity when moving data to a warehouse. Users can enhance data accuracy and consistency and remove inaccuracies. Chat with our team to learn how this platform performs ETL, reverse ETL, and fast CDC.
Read more: Blending Data in the Data Warehouse
Successful Data Warehousing Requires Unrestricted Access to All Kinds of Data
There is another element that users need from data warehousing. They want unrestricted access to all kinds of data. Currently, data warehouses process almost exclusively transaction-based structured data. Now, there is real value in doing that. But what data warehouses process rarely or not at all is textual data.
It is true that data warehousing has some capabilities for processing textual data. But nearly all of that processing is done under the aegis of NLP (natural language processing). Take a look at NLP, though. It has some inherent flaws that the NLP community is apparently not aware of. Otherwise, NLP would be in widespread use. (It isn’t.)
Now there is an innovative approach toward processing text in a data warehouse called textual ETL or textual disambiguation. The world is discovering, one company at a time, that you can start to process text analytically in a data warehousing context. In doing so, whole new opportunities for the usage of data are opening up. Textual ETL/textual disambiguation has found solutions to the limitations of NLP.
This new, innovative technology opens up the doors to whole new vistas of information processing and expands the universe served by data warehouses. Textual disambiguation provides access to untapped business value.
When the vendor made the statement about data warehouse users wanting speed, he was implying that the user wanted speed for data that came from the world of transaction processing. I would have put it this way: “The user wants speed for queries against ALL sorts of data, not just structured data.”
Moving unstructured textual data to a textual or traditional warehouse requires advanced knowledge of coding and data engineering. Integrate.io helps enterprises move data into a supported warehouse via its native out-of-the-box connectors.
Final Word: Data Warehousing Shouldn’t Have Any Limits
Limiting the data warehouse to only one kind of data is like building a fancy, efficient fire truck for the Kansas residents of Kansas City. If you live in the Missouri portion of Kansas City, you can’t make use of the new and improved fire truck. And that doesn’t make any sense. Just because you live in Missouri doesn’t mean you shouldn’t be able to use the fire truck to put out your fire. What I’m trying to say is: Just because you have a data warehouse doesn’t mean you should restrict yourself to transaction-based data. There is a whole world of opportunity waiting in a place that hasn’t been exploited.
So the vendor making the statement about data warehouse speed should have said: “What users want is fast execution against reliable data of all kinds.”
Now that is what the end user really wants.
Integrate.io integrates textual data with a supported data warehouse. Integrate.io’s philosophy is to simplify data integration and improve data integrity.
Bill Inmon, the father of the data warehouse, has authored 65 books. Computerworld named him one of the ten most influential people in the history of computing. Inmon's Castle Rock, Colorado-based company Forest Rim Technology helps companies hear the voice of their customers. See more at www.forestrimtech.com.