Data Classification Should Not Depend On Users

If your company is manually classifying any data, you've already lost the data security battle. Data security relies on classification, but classification is unreliable today because it depends on users.

Users tag or label files with common values like "public," "internal," "confidential" and "highly confidential," and solutions like data loss prevention, rights management and information protection all rely on some form of tag. This type of classification is very fragile because data is always in motion. What is important today might not be important tomorrow. What is not sensitive today might become sensitive in the future.

To get a sense of this problem, let's walk through a simple data classification workflow with manual user-based classification. An employee creates a generic project template and classifies it as "public." There is no confidential data within the template. Another employee starts to use the template for a client and populates it with customer-specific information. The employee should change the classification to "internal."

Maybe the classification change occurs. Maybe it doesn't. The risk to the company is low at this point, but not zero. As time passes, employees could add more and more data to the file, including login credentials and account numbers. Has the file been reclassified as "confidential"? If there are multiple versions of the file, have all instances of the file been reclassified correctly? There are too many opportunities for classification to fail. The risk to the company is now high.

The weakest link in the classification process is employees. Even diligent employees make mistakes. Many companies implement different security processes for files with "confidential" or "highly confidential" tags, such as not allowing them to be sent via email or stored in the cloud. These processes create additional workflow friction, so employees have little incentive to classify data correctly.

Removing employees and the human element from security is the answer. Instead of relying on employees to follow procedures and evaluate data correctly, companies should consider security solutions based on automated classification. A popular marketing term for these types of solutions is data-centric. As with most marketing terms, companies bend the definition as needed to fit their positioning and solution.

Regardless of what you call it, companies should look for data security solutions that do not require end users to be part of the security process. Authorized users should continue working without knowing security has validated their actions, while the system blocks unauthorized users from accessing secured data.

Security solutions need to focus on the data. Instead of relying on users to update a classification based solely on their perception of what type of data is in a file, you should base security decisions on immutable values such as the data content itself.

Return to our previous example. The employee still copies and pastes login credentials and account numbers from a previously secured, confidential file. This time, though, the security solution recognizes that the data originated in a confidential file and automatically changes the second file's classification to confidential, all without any input from the user.

Even if the employee copies the file or creates a new version via "save as," the resulting file will be classified automatically. Now security is working automatically without any input from users.
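
One way to picture this kind of propagation is content fingerprinting. The sketch below is only an illustration under stated assumptions, not how any particular product works: the FingerprintRegistry class, the line-level hashing and the sample strings are all hypothetical. Each line of a confidential file is hashed, and any other file containing a matching line inherits the more restrictive label.

```python
import hashlib

# Labels ordered from least to most restrictive (the article's example values).
LEVELS = ["public", "internal", "confidential", "highly confidential"]


def fingerprints(text: str) -> set[str]:
    """Hash each non-empty line after normalizing whitespace and case."""
    result = set()
    for line in text.splitlines():
        normalized = " ".join(line.split()).lower()
        if normalized:
            result.add(hashlib.sha256(normalized.encode("utf-8")).hexdigest())
    return result


class FingerprintRegistry:
    """Hypothetical in-memory store mapping content fingerprints to labels."""

    def __init__(self) -> None:
        self._labels: dict[str, str] = {}

    def register(self, text: str, label: str) -> None:
        """Record the content of an already classified file."""
        for fp in fingerprints(text):
            current = self._labels.get(fp, "public")
            # Keep the most restrictive label seen for this content.
            if LEVELS.index(label) > LEVELS.index(current):
                self._labels[fp] = label

    def classify(self, text: str) -> str:
        """Return the most restrictive label of any known content in text."""
        label = "public"
        for fp in fingerprints(text):
            known = self._labels.get(fp, "public")
            if LEVELS.index(known) > LEVELS.index(label):
                label = known
        return label


# Made-up sample data standing in for the confidential file in the example.
registry = FingerprintRegistry()
registry.register("login: svc_account\naccount number: 000-EXAMPLE", "confidential")

template = "Project plan\nMilestones"
pasted = template + "\nAccount Number:  000-EXAMPLE"

print(registry.classify(template))
print(registry.classify(pasted))
```

Because the label is derived from the content, a copy or a "save as" version of the second file would be classified the same way with no action from the employee. Exact line hashing is a deliberate simplification here. It would miss partial or reworded copies.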

In short, to win the data security battle, companies must first classify data correctly. Here are some tips to ensure your data security is successful:

• Remove end users from the security process. Users should not be deciding on data classification.

• Security needs to be transparent to authorized users. If not, they will find workarounds to stay productive.

• Base classification on immutable values such as the content of files. As the content changes, the classification or label needs to change automatically.

• Do not rely on filename or metadata for classification.

• Look for data security that identifies content such as regulated data types and sources. Types include personally identifiable information (PII), Payment Card Industry (PCI) data and protected health information (PHI). Sources can be all data that originates from a SaaS service like Salesforce or Workday or from a centralized file server.

• Ensure classification occurs in real time and not through a nightly rescan of the computer.
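
The content-based tips above can be sketched in a few lines of code. This is a minimal illustration under assumptions, not a production detector: the two patterns (a U.S. Social Security number format for PII and a Luhn-checked card number for PCI), the function names and the mapping to a "confidential" label are all hypothetical choices. PHI is left out because it rarely reduces to a simple pattern.

```python
import re

# Illustrative patterns only; real detectors use many more rules and context.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to filter out digit runs that are not card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_types(text: str) -> set[str]:
    """Return the regulated data types found in the content itself."""
    found = set()
    if SSN_PATTERN.search(text):
        found.add("PII")
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            found.add("PCI")
    return found


def classify(text: str) -> str:
    """Assumed policy: any regulated data type makes the file confidential."""
    return "confidential" if detect_types(text) else "public"


print(classify("Project plan and milestones"))
print(classify("Customer card: 4111 1111 1111 1111"))
```

The filename and metadata never enter the decision. Running classify every time a file is saved, rather than on a nightly schedule, would keep the label in step with the content.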

By adhering to these tips, companies can ensure that their data classification is credible and reliable. Classification decisions are critical to the data security process, and data classification historically has let companies down.

Original Forbes Article