News of the latest massive data breach is always in the headlines. How can you avoid being next on the list? In order to function, businesses of all sizes and industries need to collect personally identifiable information (PII) about their employees and customers—but they also need to take proactive steps to keep this information secure and defend against PII breaches.

PII substitution is an effective tactic to shield your sensitive and confidential data from prying eyes. But what is PII substitution exactly, and how can you best use it to keep your data safe from security threats?


What is Personally Identifiable Information (PII)?

Personally identifiable information (PII) is any data point that can reveal or help infer a unique individual's identity. The different types of PII include the following:

  • Identifiers: first, middle, and last names; home address; phone number and other contact information; age; date of birth and place of birth; mother's maiden name; gender; race or ethnicity; nationality; ID numbers (e.g. Social Security number or passport ID)
  • Work and education: employee or student ID; workplace or school address; years of work or study
  • Biometric data: biometric templates (i.e. digital representations) of an individual's fingerprints, retinal scans, facial recognition data
  • Internet data: browsing history, search history, IP addresses, mobile app activity, geolocation data
  • Financial information: credit card numbers, SSNs, bank account numbers
  • Healthcare data: medical conditions or illnesses; dates of treatment or consultation; medications or other treatments

What is PII Substitution?

PII data can be tremendously valuable to the organizations that collect it, but it's also a significant information security risk. Exposing sensitive information in a data breach can have serious consequences for both the breached organization and the affected individuals.

Many organizations use data masking, also known as "data obfuscation," to improve data security. Data masking is a general term for a variety of techniques that conceal the true contents of PII data. The term "data substitution," or "PII substitution," refers to a subset of data masking techniques in which the actual contents of a dataset are substituted with dummy, placeholder, and/or false data.

Why is PII substitution important? 

Data breaches occur when PII is accessed or disclosed without authorization. Once unprotected PII falls into the wrong hands, it can be exploited for identity theft and other fraud. That's one reason why laws require organizations to take the protection of personal information so seriously.

You may be subject to fines, penalties, and even law enforcement action if you do not comply with data security regulations. These laws and regulations include HIPAA, which applies to healthcare organizations, and the Office of Management and Budget (OMB) Memorandum M-17-12, which applies to federal agencies.

Using PII substitution helps protect sensitive and confidential information throughout the data life cycle. Because data masking techniques such as PII substitution disguise PII at its source, the masked records are far less damaging if they are ever exposed. That makes data masking especially important if you plan to retain data long-term in your data management systems.

4 Ways to Protect Sensitive Data with PII Substitution

There are dozens of techniques to improve data security at your organization, including:

  • Implementing stronger access controls
  • Creating IT security training and education programs
  • Developing incident handling and incident response plans
  • Formally defining your information collection and storage processes
  • Eliminating "shadow IT," i.e. unauthorized computers and mobile devices accessing the organization's network
  • Reinforcing physical security for data storage devices

In this section, however, we'll focus on one of the most prominent lines of defense: data masking with PII substitution. By substituting false data for the true sensitive PII, you can disguise the identities of individuals and keep their privacy intact. Even if a security breach occurs, the substituted data will be useless (or at least significantly less valuable) to the attacker.

1. Simple Substitution

In the simplest form of PII substitution, the true data elements are replaced with dummy or placeholder values, which transforms the information irreversibly. For example, if you're working with 10-digit telephone numbers, you might replace the first six digits with a dummy character such as 0 or X, e.g. (000)-000-5162. After substitution, the number is no longer directly linked to a unique individual, but it still preserves some relevant information (in this example, the last four digits of the phone number).
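The placeholder rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a production masking tool: the function name `mask_phone` and the sample number are hypothetical, and the sketch assumes the digits worth keeping are always the trailing ones.

```python
def mask_phone(phone: str, keep_last: int = 4, placeholder: str = "0") -> str:
    """Replace every digit except the last `keep_last` with `placeholder`.

    Non-digit characters such as parentheses and hyphens are left in place,
    so the masked value keeps the original formatting.
    """
    total_digits = sum(ch.isdigit() for ch in phone)
    digits_seen = 0
    masked = []
    for ch in phone:
        if ch.isdigit():
            digits_seen += 1
            # Keep only the trailing `keep_last` digits; mask everything before them.
            masked.append(ch if digits_seen > total_digits - keep_last else placeholder)
        else:
            masked.append(ch)
    return "".join(masked)


print(mask_phone("(555)-867-5162"))               # (000)-000-5162
print(mask_phone("5558675162", placeholder="X"))  # XXXXXX5162
```

Because the masked digits are simply discarded, nothing in the output allows the original number to be recovered, which is what makes this form of substitution irreversible.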

2. Data Scrambling

Data scrambling is a weaker form of data substitution than simple substitution. Instead of replacing characters, it jumbles ("scrambles") the characters already in a data field. For example, you might scramble the interior characters of a person's name (e.g. "John Smith" becomes "Jhon Stmih"). Because every original character is still present, scrambled values are often easy to reconstruct, so scrambling can be effective in a pinch but is usually insufficient to fully disguise PII data.
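A minimal Python sketch of the name example, assuming names are split on spaces and that each word keeps its first and last letters in place. The helper names `scramble_word` and `scramble_name` are hypothetical; the sketch uses only the standard `random` module.

```python
import random
from typing import Optional


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle a word's interior characters, keeping the first and last in place."""
    if len(word) <= 3:
        return word  # at most one interior character, so nothing to shuffle
    interior = list(word[1:-1])
    rng.shuffle(interior)
    return word[0] + "".join(interior) + word[-1]


def scramble_name(name: str, seed: Optional[int] = None) -> str:
    """Scramble each space-separated part of a name independently."""
    rng = random.Random(seed)
    return " ".join(scramble_word(part, rng) for part in name.split(" "))


print(scramble_name("John Smith"))  # output varies from run to run
```

A shuffle can occasionally leave a word unchanged, and every original letter survives in the output, which illustrates why scrambling offers much weaker protection than replacing values outright.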

3. Adding Noise

Adding noise, also known as "stochastic substitution," is a more advanced PII substitution technique. In this method, each value is shifted by a random amount (drawn, for example, from a Gaussian distribution), so the result stays close to the truth without matching it exactly. For example, to disguise an individual's age, you might add a random number between 1 and 5 to the person's true age or subtract one from it. Stochastic substitution needs to be handled with care so that sophisticated attackers cannot uncover the true values beneath the noise and thus decode the dataset. For instance, if the same record is re-noised independently in several releases, averaging the noisy copies brings an attacker back toward the true value.
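The age example can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `add_noise` is hypothetical, offsets are whole numbers from 1 to 5 with a random sign to mirror the text, and results are clamped at 0. A Gaussian variant would instead draw each offset with `rng.gauss(0, sigma)` for some chosen `sigma`. The second half demonstrates the averaging pitfall described above.

```python
import random
from statistics import mean
from typing import List, Optional


def add_noise(ages: List[int], max_shift: int = 5, seed: Optional[int] = None) -> List[int]:
    """Add or subtract a random whole number between 1 and `max_shift` to each age."""
    rng = random.Random(seed)
    noisy = []
    for age in ages:
        offset = rng.randint(1, max_shift) * rng.choice((-1, 1))
        noisy.append(max(0, age + offset))  # clamp so an age never goes negative
    return noisy


print(add_noise([34, 58, 71]))  # output varies from run to run

# The pitfall: if one true age is re-noised independently many times,
# the noisy copies average back toward the true value.
copies = [add_noise([42])[0] for _ in range(10_000)]
print(round(mean(copies), 1))  # close to 42, not an exact value
```

Using one fixed noisy value per record, rather than fresh noise on every release, is one simple way to avoid handing attackers the repeated samples this averaging attack needs.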

4. Data Encryption

Last but certainly not least, data encryption is a powerful method to make PII useless in the hands of an attacker. Encrypting sensitive information transforms it into a seemingly random string of letters and numbers, which makes it computationally infeasible for anyone without the corresponding decryption key to understand its contents. Strictly speaking, encryption is a data masking technique in its own right rather than a subclass of data substitution. That's because, unlike substituted data, properly encrypted data has no detectable relation to the underlying true data, yet anyone who holds the key can restore the original exactly.
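As a minimal sketch of encrypting a single PII field, the example below uses the Fernet recipe from the third-party `cryptography` package (an assumption: it must be installed with `pip install cryptography`), which combines AES in CBC mode with an HMAC-SHA256 authentication tag. The helper names `encrypt_pii` and `decrypt_pii` and the SSN-style sample value are hypothetical.

```python
# Assumes the third-party "cryptography" package: pip install cryptography
from cryptography.fernet import Fernet, InvalidToken


def encrypt_pii(value: str, key: bytes) -> bytes:
    """Encrypt a PII string; the token is URL-safe base64 text."""
    return Fernet(key).encrypt(value.encode("utf-8"))


def decrypt_pii(token: bytes, key: bytes) -> str:
    """Decrypt a token from encrypt_pii; raises InvalidToken on a wrong key."""
    return Fernet(key).decrypt(token).decode("utf-8")


key = Fernet.generate_key()  # in practice, store keys in a secrets manager, not beside the data
token = encrypt_pii("123-45-6789", key)  # made-up SSN-style value
print(token)                    # differs on every run (random IV plus timestamp)
print(decrypt_pii(token, key))  # 123-45-6789

try:
    decrypt_pii(token, Fernet.generate_key())  # a different key
except InvalidToken:
    print("cannot decrypt without the right key")
```

Since anyone holding the key can decrypt every record, key management matters as much as the choice of cipher.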

How We Can Help with PII Substitution


As we've discussed, PII substitution can be an effective technique for keeping your data and information systems safe from attackers. But how can you perform PII substitution in practice?

One way is to use a powerful, feature-rich ETL and data integration platform. Ours makes it easy for anyone to perform a variety of data transformations, including data masking and data substitution. We also use SSL/TLS encryption, so your data is protected in transit as it flows through the pipelines.

Ready to find out how we can help protect your sensitive and confidential data? Get in touch with our team of data experts today for a chat about your business needs and objectives, or to start your 7-day pilot of the platform.