As part of the information lifecycle management (ILM) process, data classification is a tool for categorizing data so that an organization can effectively answer the following questions:
- What data types are available?
- Where is certain data located?
- What access levels are implemented?
- What protection level is implemented, and does it adhere to compliance regulations?
A data classification process is recommended when implementing data controls such as data loss prevention (DLP) and encryption. Data classification is also a requirement of certain regulations and standards, such as ISO 27001 and PCI DSS.
Classification Categories
Organizations implement data classification for different reasons, so the parameters and categories used to classify data vary widely.
Some of the commonly used classification categories are:
- Data type (format, structure)
- Jurisdiction (of origin, domiciled) and other legal constraints
- Context
- Ownership
- Contractual or business constraints
- Trust levels and source of origin
- Value, sensitivity, and criticality (to the organization or to a third party)
- Obligation for retention and preservation
The classification categories should match the data controls to be used. For example, when using encryption, data can be classified as “to encrypt” or “not to encrypt.” For DLP, other categories such as “internal use” and “limited sharing” would be required to correctly classify the data.
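The relationship between categories and controls can be sketched in Python. This is a minimal illustration, not a reference implementation: the category values come from the examples above, while the type names, the `required_controls` function, and the control descriptions are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class EncryptionClass(Enum):
    TO_ENCRYPT = "to encrypt"
    NOT_TO_ENCRYPT = "not to encrypt"


class SharingClass(Enum):
    INTERNAL_USE = "internal use"
    LIMITED_SHARING = "limited sharing"


@dataclass(frozen=True)
class Classification:
    """One category per data control the organization uses."""
    encryption: EncryptionClass
    sharing: SharingClass


def required_controls(c: Classification) -> list[str]:
    """Map a classification to control actions (action names are assumptions)."""
    controls = []
    if c.encryption is EncryptionClass.TO_ENCRYPT:
        controls.append("encrypt")
    if c.sharing is SharingClass.INTERNAL_USE:
        controls.append("dlp: block external transfer")
    elif c.sharing is SharingClass.LIMITED_SHARING:
        controls.append("dlp: allow approved recipients only")
    return controls


if __name__ == "__main__":
    payroll = Classification(EncryptionClass.TO_ENCRYPT, SharingClass.INTERNAL_USE)
    print(required_controls(payroll))
```

Keeping one category per control means each control can read exactly the classification it needs and ignore the rest.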
Data labeling usually refers to tagging data with additional information, such as department, location, and creator. One labeling option is classification according to certain criteria, for example top secret, secret, or classified. Classification is therefore usually considered a part of data labeling. Classification can be manual (a task usually assigned to the user who creates the data) or automatic, based on policy rules (according to location, creator, content, and so on).
Related article – Classification of Discovered Sensitive Data