Implementing data discovery solutions provides an operational foundation for effectively applying and governing compliance with any P&DP (privacy and data protection) law.
From the customer’s perspective: Customers, in the role of data controllers, bear full responsibility for complying with the obligations of P&DP laws. Implementing data discovery solutions and data classification techniques allows customers to specify to the cloud service provider the requirements to be fulfilled, to perform effective periodic audits under the applicable P&DP laws, and to demonstrate their accountability to the competent privacy authorities as those laws require.
From the cloud service provider’s perspective: In the role of data processors, cloud service providers (CSPs) must implement, and be able to demonstrate clearly and objectively that they have implemented, the rules and security measures to be applied when processing personal data on behalf of the controllers. Data discovery solutions and data classification techniques therefore help CSPs comply with the controllers’ P&DP instructions.
Implementing data discovery and data classification techniques is foundational to data loss prevention (DLP), data protection, and compliance with data privacy laws.
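As a rough illustration of what discovery and classification involve, the following Python sketch scans text for a few kinds of personal data and then assigns a classification label. The detector patterns, the three-level scheme (PUBLIC, CONFIDENTIAL, RESTRICTED) and the function names `discover` and `classify` are hypothetical choices for this example; production discovery tools use much richer detectors and context.

```python
import re
from typing import Dict, List

# Hypothetical detectors: data-type label -> regular expression.
# Real discovery tools add context keywords, dictionaries, and ML models.
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD_NUMBER": re.compile(r"\b\d{13,16}\b"),
}

# Hypothetical classification scheme: a higher index means more sensitive.
LEVEL_NAMES = ["PUBLIC", "CONFIDENTIAL", "RESTRICTED"]
SENSITIVITY = {"EMAIL": 1, "US_SSN": 2, "CARD_NUMBER": 2}


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used here to discard digit runs that cannot be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def discover(text: str) -> Dict[str, List[str]]:
    """Return the matches found in text, grouped by data type."""
    findings: Dict[str, List[str]] = {}
    for label, pattern in DETECTORS.items():
        matches = pattern.findall(text)
        if label == "CARD_NUMBER":
            matches = [m for m in matches if luhn_valid(m)]
        if matches:
            findings[label] = matches
    return findings


def classify(findings: Dict[str, List[str]]) -> str:
    """Label the data with the most sensitive level among the types discovered."""
    level = max((SENSITIVITY[label] for label in findings), default=0)
    return LEVEL_NAMES[level]


if __name__ == "__main__":
    record = "Contact alice@example.com, card 4111111111111111"
    found = discover(record)
    print(found)  # {'EMAIL': ['alice@example.com'], 'CARD_NUMBER': ['4111111111111111']}
    print(classify(found))  # RESTRICTED
```

In a pipeline of this kind, the resulting label is what downstream controls such as DLP rules or encryption policies would act on, which is why discovery and classification come first.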
Data Discovery Issues and Challenges
You need to be aware of the following issues about data discovery:
- Poor data quality: Data visualization tools are only as good as the information fed into them. If organizations lack an enterprise-wide data governance policy, they could be relying on inaccurate or incomplete information to create their charts and dashboards.
Having an enterprise-wide data governance policy will help to mitigate the risk of a data breach. This includes defining rules and processes related to dashboard creation, ownership, distribution, and usage; creating restrictions on who can access what data; and ensuring that employees follow the organization’s data usage policies.
- Dashboards: With every dashboard, you must ask: Is the data accurate? Is the analytical method correct? Most importantly, can critical business decisions be based on this information?
Users can modify data and change fields with no audit trail and no way to tell who changed what. This disconnect can lead to inconsistent insight and result in flawed decisions, increased administration costs, and inevitably the creation of multiple versions of the truth.
Security also poses a problem with data discovery tools. IT staff typically have little or no control over these types of solutions, which means they cannot protect sensitive information. This can result in unencrypted data being cached locally and viewed by or shared with unauthorized users.
- Hidden costs: A common data discovery technique is to load all of the data into server RAM to take advantage of memory’s much faster input/output compared with disk.
This technique has been very successful and has spawned a trend toward in-memory analytics for better business intelligence performance. The catch is that in-memory analytic solutions can struggle to maintain performance once the data grows beyond the fixed amount of server RAM. To make in-memory solutions work, companies must either hire someone with the right technical skills and background or purchase prebuilt appliances, and both are unforeseen added costs. An approach integrated into an existing business intelligence platform, which provides a self-managing environment, is a more cost-effective option.
This is especially relevant for companies that experience slow query responses because of large data volumes or a high volume of ad hoc queries.
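The governance controls mentioned under Poor data quality and Dashboards (restricting who can change which data, and keeping a trail of who changed what) can be sketched in a few lines. The following Python example is a minimal, hypothetical illustration: the `GovernedStore` class, its per-user `write_permissions` model and the in-memory `audit_log` list are assumptions made for this sketch, not features of any particular BI or data discovery product. A production audit trail would typically live in append-only, tamper-evident storage and would also record denied attempts.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Set


@dataclass(frozen=True)
class AuditEntry:
    """Immutable record of one field change: who, what, when, old and new value."""
    user: str
    record_id: str
    field_name: str
    old_value: Any
    new_value: Any
    timestamp: datetime


@dataclass
class GovernedStore:
    """Hypothetical store that enforces per-user write rights and audits every change."""
    # Assumed policy model: user name -> names of the fields that user may modify.
    write_permissions: Dict[str, Set[str]]
    records: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # Plain list for illustration; a real audit trail would be append-only storage.
    audit_log: List[AuditEntry] = field(default_factory=list)

    def update(self, user: str, record_id: str, field_name: str, value: Any) -> None:
        """Change one field if the user is allowed to, logging the change first."""
        if field_name not in self.write_permissions.get(user, set()):
            raise PermissionError(f"{user} may not modify {field_name!r}")
        record = self.records.setdefault(record_id, {})
        self.audit_log.append(
            AuditEntry(user, record_id, field_name, record.get(field_name), value,
                       datetime.now(timezone.utc))
        )
        record[field_name] = value

    def history(self, record_id: str, field_name: str) -> List[AuditEntry]:
        """Answer 'who changed what' for one field of one record, oldest first."""
        return [e for e in self.audit_log
                if e.record_id == record_id and e.field_name == field_name]


if __name__ == "__main__":
    store = GovernedStore(write_permissions={"analyst": {"revenue"}})
    store.update("analyst", "q3", "revenue", 1200)
    store.update("analyst", "q3", "revenue", 1250)
    for entry in store.history("q3", "revenue"):
        print(entry.user, entry.old_value, "->", entry.new_value)
    # analyst None -> 1200
    # analyst 1200 -> 1250
    try:
        store.update("intern", "q3", "revenue", 0)
    except PermissionError as err:
        print(err)  # intern may not modify 'revenue'
```

Because every accepted write goes through `update`, `history` can always say who changed a field and what its previous value was, while an unauthorized user gets a `PermissionError` and the record is left unchanged.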