Data masking, or data obfuscation, is the process of hiding, replacing, or omitting sensitive information in a specific data set.
Data masking is typically used to protect specific categories of data, such as personally identifiable information (PII) or commercially sensitive data, or to comply with regulations such as HIPAA or PCI DSS. It is also widely used to populate test platforms when suitable test data is not available. Masking is commonly applied when migrating test or development environments to the cloud, or when protecting production environments from threats such as data exposure by insiders or outsiders.
Common approaches to data masking include:
- Random substitution: The original value is replaced with a random value (or a random value is appended to it).
- Algorithmic substitution: The original value is replaced with an algorithm-generated value (or such a value is appended to it). This typically allows two-way substitution: whoever holds the algorithm, and any key it uses, can reverse the substitution to recover the original value.
- Shuffle: Values are shuffled among the records of the dataset, usually within the same column, so that real values are detached from the records they belong to.
- Masking: Specific characters are used to hide parts of the data. This is commonly applied to credit card numbers, for example: XXXX XXXX XX65 5432.
- Static masking: A new copy of the data is created with the masked values. Static masking is typically well suited to creating clean, nonproduction environments.
- Dynamic masking: Dynamic masking (sometimes called on-the-fly masking) adds a masking layer between the application and the database. This layer masks the data “on the fly” as the presentation layer retrieves it, while the stored data itself is left unchanged. This makes dynamic masking well suited to protecting production environments: for example, it can hide the full credit card number from customer service representatives while the data remains available for processing.
- Deletion: The value is replaced with a null, or the data is removed entirely.
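The two substitution approaches can be sketched in Python as follows. This is a minimal, illustrative sketch: the function names are hypothetical, random substitution only preserves the character class (digit or letter) of each position, and the "algorithm" in the two-way case is a toy keyed digit shift chosen because it is easy to reverse, not because it is secure.

```python
import hashlib
import secrets
import string
from typing import List


def random_substitute(value: str) -> str:
    """Random substitution: replace each digit/letter with a random one of the same kind.

    The mapping is not recorded, so the original value cannot be recovered.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(secrets.choice(string.digits))
        elif ch.isalpha():
            out.append(secrets.choice(string.ascii_letters))
        else:
            out.append(ch)  # keep separators such as spaces, commas, or dashes
    return "".join(out)


def _key_stream(key: bytes, length: int) -> List[int]:
    # Derive a deterministic sequence of digit offsets (0-9) from the key.
    # Toy construction for illustration only; it is NOT cryptographically sound.
    digest = hashlib.sha256(key).digest()
    return [digest[i % len(digest)] % 10 for i in range(length)]


def algorithmic_substitute(digits: str, key: bytes) -> str:
    """Two-way substitution: shift each ASCII digit by a key-derived offset (mod 10)."""
    offsets = _key_stream(key, len(digits))
    return "".join(str((int(d) + k) % 10) for d, k in zip(digits, offsets))


def reverse_substitute(masked: str, key: bytes) -> str:
    """Undo algorithmic_substitute by subtracting the same offsets (mod 10)."""
    offsets = _key_stream(key, len(masked))
    return "".join(str((int(d) - k) % 10) for d, k in zip(masked, offsets))
```

Because the same key always yields the same offsets, `algorithmic_substitute` is deterministic and `reverse_substitute` restores the original digits. In practice, reversible masking of values such as card numbers is usually done with a vetted format-preserving encryption mode such as FF1 (NIST SP 800-38G) rather than a hand-rolled cipher like this one.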
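Shuffling within a single column might look like the following minimal sketch, which assumes rows are represented as Python dictionaries (a hypothetical representation chosen for brevity). Note that a random permutation can leave some values in their original rows, especially in small datasets, so a real implementation may want to check for that or combine shuffling with another technique.

```python
import random
from typing import Dict, List, Optional

Record = Dict[str, object]


def shuffle_column(rows: List[Record], column: str, seed: Optional[int] = None) -> List[Record]:
    """Shuffle masking: permute one column's values across rows.

    Other columns stay in place, so each row keeps a realistic-looking value
    that usually belongs to some other record. Returns new row dicts and
    leaves the input rows untouched.
    """
    rng = random.Random(seed)  # seed only for reproducible demos
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]
```

The column keeps exactly the same set of values (and therefore the same overall distribution), which is what makes shuffled data useful for testing.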
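Character masking of a card number can be sketched as a small function that hides every digit except the last few while leaving separators in place. The function name and its `visible` and `mask_char` parameters are hypothetical; the example call reproduces the XXXX XXXX XX65 5432 format using a made-up card number.

```python
def mask_digits(value: str, visible: int = 4, mask_char: str = "X") -> str:
    """Character masking: hide all but the last `visible` digits.

    Non-digit characters (spaces, dashes) are kept so the format stays recognizable.
    """
    total = sum(ch.isdigit() for ch in value)
    to_hide = max(total - visible, 0)  # number of leading digits to mask
    out = []
    for ch in value:
        if ch.isdigit() and to_hide > 0:
            out.append(mask_char)
            to_hide -= 1
        else:
            out.append(ch)
    return "".join(out)


print(mask_digits("4929 1234 5665 5432", visible=6))  # XXXX XXXX XX65 5432
```

Counting digits rather than characters means the same function works whether the number is stored with spaces, dashes, or no separators at all.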
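The difference between static and dynamic masking is mainly when the masking is applied. The minimal sketch below assumes an in-memory table of dictionaries and a hypothetical column-to-function policy, and applies the same policy both ways: `static_mask` produces a masked copy once, while `DynamicMaskingView` leaves the stored rows untouched and masks them at read time unless the caller has an unmasked role. The `ssn` rule also illustrates deletion by replacing the value with a null. All names, roles, and sample values here are illustrative assumptions.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
MaskFn = Callable[[object], object]


def mask_card(value: object) -> str:
    # Keep only the last four characters visible.
    s = str(value)
    return "X" * max(len(s) - 4, 0) + s[-4:]


def null_out(_value: object) -> None:
    # Deletion: replace the value with a null.
    return None


# Hypothetical policy: column name -> masking function.
POLICY: Dict[str, MaskFn] = {"card_number": mask_card, "ssn": null_out}


def apply_policy(row: Record, policy: Dict[str, MaskFn]) -> Record:
    # Return a new row with every policy-covered column masked.
    return {col: (policy[col](val) if col in policy else val) for col, val in row.items()}


def static_mask(table: List[Record], policy: Dict[str, MaskFn] = POLICY) -> List[Record]:
    """Static masking: build a separate masked copy, e.g. to load into a test environment."""
    return [apply_policy(row, policy) for row in table]


class DynamicMaskingView:
    """Dynamic masking: the stored rows stay intact and are masked on each read."""

    def __init__(self, table: List[Record], policy: Dict[str, MaskFn] = POLICY,
                 unmasked_roles: Iterable[str] = ("payment_processing",)) -> None:
        self._table = table
        self._policy = policy
        self._unmasked_roles = set(unmasked_roles)

    def read(self, role: str) -> List[Record]:
        if role in self._unmasked_roles:
            return [dict(row) for row in self._table]  # full data for processing
        return [apply_policy(row, self._policy) for row in self._table]
```

In a real deployment the masking layer sits between the application and the database, and some engines (for example, SQL Server with its Dynamic Data Masking feature) provide it natively; the role check here only stands in for that access-control decision.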