What is Data Integrity?

In short, data integrity is the completeness, accuracy, and consistency of the data a business collects and reviews to gain insights that inform its decision-making. Actionable data can propel a business forward — but only if that data is reliable. The reliability of data affects all aspects of a company, from sales and marketing to operations. Poor data integrity can cause the decision-making process to collapse, taking the success of the business right along with it.

Key Features of Data Integrity

An oversimplified view of data integrity equates it with a dataset that's free of errors. Error-free data is crucial, but it is only a small part of the picture. Think of your collection of data as business knowledge. You want that knowledge to be consistent, relevant, and actionable. If it does not comply with guidelines for data integrity, it is no longer useful.

Analysts define data integrity not only in a business context, but also in a data management context: how the information relates to existing databases and systems. The question for the data analyst is not "does this achieve our data integrity goals for improving sales?" but "does this meet the data integrity parameters of the current systems?"

To make this distinction clearer, consider the traits an analyst looks for when determining data integrity:

Completeness: Datasets are complete when all expected fields are populated. Suppose your CRM tracks customer name, address, email, telephone, amount of last purchase, and lifetime aggregate spend. A customer record that includes only the name and address is not complete.

As a result, feeding information from multiple sources into your CRM may reduce data integrity. Perhaps one source tracks only customer name and purchase amount. That data, although it increases what you know, can reduce overall data integrity when combined with a more complete dataset.
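To make this concrete, here is a minimal sketch of a completeness check in Python. The field names are hypothetical stand-ins for the CRM fields described above, not any particular product's schema:

```python
# A minimal sketch of a completeness check. Field names are hypothetical
# CRM fields used for illustration only.

REQUIRED_FIELDS = [
    "name", "address", "email", "telephone",
    "last_purchase_amount", "lifetime_spend",
]

def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]

records = [
    {"name": "Ada Lovelace", "address": "12 Example St",
     "email": "ada@example.com", "telephone": "555-0101",
     "last_purchase_amount": 120.00, "lifetime_spend": 940.00},
    # A record from a sparser source: only name and purchase amount.
    {"name": "Alan Turing", "last_purchase_amount": 75.00},
]

for r in records:
    gaps = missing_fields(r)
    status = "complete" if not gaps else f"incomplete (missing: {', '.join(gaps)})"
    print(f"{r['name']}: {status}")
```

A check like this, run as records arrive from each source, flags the gaps before the sparser data dilutes the integrity of the combined set.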

Uniqueness: Moving information from a source into a data warehouse can cause the same record to appear twice. Duplicate records are not just a waste of resources. They are another factor that reduces data integrity. Maybe you made a sale to the same person through your website and through one of your sales agents. That customer will appear in your system twice, and when you combine the datasets, you'll want to ensure your CRM keeps just one record of that individual.
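Here is a minimal sketch of how deduplication might look when merging those datasets. Matching on a lowercased email address is an assumption for illustration; real systems often need fuzzier matching on name, address, or phonetic keys:

```python
# A minimal sketch of deduplication during a merge. Matching on a
# normalized email is an assumed strategy, not the only one.

def merge_records(existing: dict, incoming: dict) -> dict:
    """Keep the existing record, filling in any fields it lacks."""
    merged = dict(existing)
    for key, value in incoming.items():
        if not merged.get(key):
            merged[key] = value
    return merged

def deduplicate(records: list[dict]) -> list[dict]:
    by_email: dict[str, dict] = {}
    for r in records:
        key = r["email"].strip().lower()
        by_email[key] = merge_records(by_email[key], r) if key in by_email else r
    return list(by_email.values())

website_sale = {"email": "Ada@example.com", "name": "Ada Lovelace"}
agent_sale = {"email": "ada@example.com", "name": "A. Lovelace",
              "telephone": "555-0101"}
print(deduplicate([website_sale, agent_sale]))  # one merged record, not two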

Accuracy: Technology advances have not eliminated the potential for error in data. In fact, human error remains one of the chief obstacles to data integrity. Mistakes take many forms: a sales agent misspells a customer's name when typing it into the system; a client inverts digits in a phone number when filling out an online form; an analyst accidentally mislabels a spreadsheet, deletes a row, or merges cells so the information is no longer valid.

Getting to accurate information is one goal of data cleansing, an aspect of data transformation that makes data more reliable.
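As an illustration, here is a minimal cleansing step for one of the errors described above: normalizing phone numbers and flagging values that cannot be repaired automatically. The 10-digit rule is an assumption (North American numbers) made for this sketch:

```python
import re

# A minimal sketch of a cleansing step: strip formatting from phone
# numbers and flag values that fail validation for human review.
# Assumes 10-digit North American numbers.

def clean_phone(raw: str) -> str | None:
    """Return the bare 10-digit number, or None if it can't be repaired."""
    digits = re.sub(r"\D", "", raw)
    return digits if len(digits) == 10 else None

for raw in ["(555) 010-1234", "555.010.1234", "55501012"]:
    cleaned = clean_phone(raw)
    print(f"{raw!r} -> {cleaned if cleaned else 'REVIEW: invalid phone'}")
```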

Consistency: Communication is as important for data as it is for anything else in the business. Data consistency requires that data communicates the right information, and the same information, every time. There is no consistency when different sources use the same word to refer to different things. When data from those sources merges into a single list, it has a low level of integrity.

Suppose your field sales team has a data field called "clients," and your website backend also has a "clients" record. Your sales agents always consider a client a client, no matter the number of months since the last sale. The website, by contrast, purges the "clients" list every six months. In this case, the combined record of "clients" is not consistent. The field sales team records are "all clients," while the website records are "current clients."
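One way to resolve this kind of conflict is to apply a single, explicit definition before merging. The sketch below tags every record with a status based on the date of the last sale; the six-month threshold mirrors the website's purge window, and the field names are hypothetical:

```python
from datetime import date, timedelta

# A minimal sketch of enforcing one shared definition of "client" before
# merging. The six-month window and field names are assumptions.

ACTIVE_WINDOW = timedelta(days=182)  # roughly six months

def tag_status(record: dict, today: date) -> dict:
    """Add an explicit status so 'client' means the same thing everywhere."""
    is_current = (today - record["last_sale"]) <= ACTIVE_WINDOW
    return {**record, "status": "current" if is_current else "lapsed"}

field_sales = [{"name": "Ada", "last_sale": date(2022, 1, 10)}]
website = [{"name": "Alan", "last_sale": date(2023, 6, 1)}]

today = date(2023, 7, 1)
merged = [tag_status(r, today) for r in field_sales + website]
print(merged)  # every record now carries an unambiguous status
```

After tagging, "all clients" and "current clients" are no longer conflated: the merged list carries the distinction explicitly.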

Validity: Data analysts want to be able to link a piece of data back to its source. This traceability helps ensure that the data is valid. Knowing a record's lineage lets analysts remain confident in the robustness of the information. It also supports ongoing monitoring of system data flows, since information is constantly moving from one place to another. Data lineage remains traceable even after the information has undergone transformation through the Extract, Transform, Load (ETL) process.
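Here is a minimal sketch of carrying lineage metadata alongside each record through a transform step, so the data stays traceable to its source. The structure shown is an illustration, not a standard lineage format:

```python
from datetime import datetime, timezone

# A minimal sketch of record-level lineage: wrap each record with a trail
# of where it came from and what transformed it. Illustrative only.

def extract(record: dict, source: str) -> dict:
    stamp = datetime.now(timezone.utc).isoformat()
    return {"data": record, "lineage": [{"source": source, "at": stamp}]}

def transform(wrapped: dict, step: str, fn) -> dict:
    wrapped["data"] = fn(wrapped["data"])
    stamp = datetime.now(timezone.utc).isoformat()
    wrapped["lineage"].append({"step": step, "at": stamp})
    return wrapped

row = extract({"name": "  Ada Lovelace "}, source="crm_export.csv")
row = transform(row, "trim_whitespace",
                lambda d: {k: v.strip() for k, v in d.items()})
print(row["lineage"])  # full trail: origin plus every step that touched it
```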

Timeliness: Data may be correct, but if it is not up to date, it detracts from overall data integrity. It may be true that a client purchased an upgrade 13 months ago. But that piece of information is misleading if the record does not also reflect that the client reverted to the lower-priced package six months ago. Timeliness is about staying current: you should have confidence that what you have on file includes what has happened most recently.
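A minimal staleness check might look like the following; the 30-day threshold and field names are assumptions made for this sketch:

```python
from datetime import date

# A minimal sketch of a staleness check: flag records whose last update
# is older than an assumed threshold so they can be re-verified.

STALE_AFTER_DAYS = 30

def is_stale(record: dict, today: date) -> bool:
    return (today - record["last_updated"]).days > STALE_AFTER_DAYS

records = [
    {"name": "Ada", "plan": "premium", "last_updated": date(2023, 6, 28)},
    {"name": "Alan", "plan": "premium", "last_updated": date(2022, 12, 1)},
]

today = date(2023, 7, 1)
for r in records:
    flag = "stale, re-verify" if is_stale(r, today) else "fresh"
    print(f"{r['name']}: {flag}")
```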

Using Xplenty to Ensure Data Integrity

ETL is the most efficient way to move data from many sources into one integrated destination. But this process has to do more than drive information from points A, B, and C to point D. It must also ensure that the refined data's integrity is as robust as possible. The Xplenty platform not only helps data experts build ETL pipelines; its simple, easy-to-use, no-code interface also allows anyone in your organization to refine your data. Contact us today to learn more about our 14-day trial.
