DLP's Classification To Security Gap

Clearly, (555) 123-1234 is a phone number from the United States. Identifying critical data such as personally identifiable information (PII), credit card information (payment card industry or PCI data) or personal health information (PHI) is not a problem. Both users and artificial intelligence/machine learning (AI/ML) can locate names, addresses, account numbers and other personal information. If we can identify the critical data, why doesn't data security prevent data breaches and data loss?

Data loss occurs because traditional security such as data loss prevention (DLP) doesn't secure data on the endpoint. DLP will discover and classify data on the endpoint, but it will not secure it. Instead, DLP relies on blocking sensitive data as it attempts to leave the device.

The reason is that securing data is not transparent: it impacts users and workflows. To block data at the point of exit instead, DLP requires admins to create and maintain an extensive list of rules that identify what is allowed and what is not.

Example rule: If the user is sending PII data via Outlook/Exchange and the recipient's domain is Acme.com, block the action by removing the PII attachment. On its own, the example rule makes sense. But what happens when we introduce three tiers of users: limited, corporate and executive? Each requires a different outcome.

What happens if a user tries to send the file via WeirdAppOffInternet.exe? Or what happens if Acme.com also has a team in Japan with the domain Acme.co.jp? The number of rules required, and the ongoing maintenance needed to keep up with new applications, quickly becomes unmanageable.
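The rule explosion described above can be sketched as a toy policy table. This is a minimal illustration only; the tier names, applications, domains and rule syntax are hypothetical and do not reflect any actual DLP product's policy language:

```python
# Toy sketch of an exit-blocking DLP rule table (illustrative only).
# Every combination of user tier, application and recipient domain
# needs its own explicit rule.
from itertools import product

TIERS = ["limited", "corporate", "executive"]
APPS = ["outlook", "gmail", "slack", "teams"]   # only the apps admins know about
DOMAINS = ["acme.com", "acme.co.jp"]            # every partner domain must be listed

# One explicit rule per (tier, app, domain) combination.
rules = {
    (tier, app, domain): "block" if tier == "limited" else "allow"
    for tier, app, domain in product(TIERS, APPS, DOMAINS)
}

def check(tier: str, app: str, domain: str) -> str:
    # Any combination not covered by an explicit rule falls through,
    # e.g. an unknown executable sending the file somewhere unlisted.
    return rules.get((tier, app, domain), "no rule: data leaves unchecked")

print(len(rules))                                        # 24 rules for just 3 x 4 x 2
print(check("limited", "outlook", "acme.com"))           # block
print(check("corporate", "WeirdAppOffInternet.exe", "acme.com"))
```

Even this tiny example needs 24 rules, and the unknown application still slips through the gaps, which is the maintenance burden the paragraph above describes.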

Why do legacy solutions insist on not securing classified data on the endpoint? Because, until now, every available solution has impacted users and workflows.

Securing data today stops work from happening. For example, when PII data is in a PDF file, DLP encrypts the PDF to protect the PII data. But now, users can't easily view or edit the PDF. Previews and thumbnails stop working. Users must decrypt the file before viewing or editing it. Alternatively, the DLP solution could provide an encrypted PDF viewer (or plug-in). At a minimum, users have to learn a new behavior to view and edit the secured data. More commonly, users hit a speed bump multiple times a day while trying to do their jobs.

DLP has found ways to reduce the speed bump for common file types like PDF and Microsoft Office files. But PII and regulated data are also found in other file formats. Audio, compressed files, images, videos, source code/engineering files, databases and more are commonly used in business today. DLP is a significant speed bump for these and any custom file type. Users have to decrypt the file before using it and re-encrypt it before storing it. We can debate separately the end-to-end security of solutions that decrypt entire files as part of the standard workflow. (Hint: I will argue that any solution that decrypts the whole file is inherently flawed.)

Organizations should evaluate DLP solutions based on two criteria:

  • Data security needs to be transparent to end users and workflows. Solutions need to secure data of any file type and remain compatible with any application. Users will find workarounds to security if it impedes their ability to work.
  • PII and other regulated data sets need to be secured as soon as possible on the endpoint. Securing data on the endpoint protects against accidental loss, malicious insider threats and external threats such as ransomware.

The industry needs to change its approach. DLP solutions must secure the data on the endpoint, not just contain it to the endpoint. This small difference is significant. There are too many inadvertent and malicious ways for data to leave the endpoint, and the ongoing rule-management burden DLP places on organizations is too high.


Original Forbes Article