GDPR's Impact on BI (Part 1 in a Series)
What are the six principles that define the GDPR and how will they affect business intelligence and analytics?
On May 25, 2018, the EU implemented the General Data Protection Regulation, generally known as the GDPR. This new set of regulations is data protection with teeth. A breach of the legislation can carry hefty fines of up to 20 million euros ($23 million) or 4 percent of a company's worldwide annual revenue, whichever is higher.
Last year I was approached by a project manager at a major British insurance company to look at the impact of the GDPR on the company's BI systems. My first impression was this should be a straightforward task -- until I discovered that the company had several BI systems consisting of data marts, operational data stores, and staging areas for data feeds to external agents.
With interfaces to over 40 different source systems, this project began to look considerably more challenging. This scenario is not unusual for midsize and large corporations.
The Biggest Potential Impact on BI Systems
The GDPR consists of six principles that define a new set of rules designed to give EU citizens control over personal data that is stored and processed by businesses and to consolidate all previous data protection regulations under a single regulation. The six principles are:
- Data minimization
- Storage limitations (data retention)
- Lawfulness, fairness, and transparency
- Integrity and confidentiality
- Accuracy
- Purpose limitations
Most of the GDPR's provisions focus on the rights of individuals to know what personal data your business holds about them and how and why you will be processing that data. The regulation also places a strong emphasis on security and protecting customers' data, but these are general data safeguards that many companies may have already implemented. The most specific impact for BI is in the areas of data minimization and storage limitation (also known as data retention).
Let's look at these two principles that have the greatest effect on BI systems. In the next article in this series, we'll examine the remaining four principles.
Data Minimization
Data minimization requires organizations to collect only the minimum amount of personal data needed to fulfill a stated purpose.
This goes against the grain for many BI systems. Data analysts are often trained to extract as much data as possible when loading a data warehouse. This reduces the need for successive visits to the data source for that extra piece of data. However, the GDPR dictates that you only hold the personal data required to serve specific business processes.
For example, consider an insurance company that has built a data mart to use for calculating risk and building pricing models. The insurer has a legitimate reason to collect the customer's name, address, email address, and phone number in order to process and manage their insurance policies and claims. The question is, do they need the customer's name, email address, and phone number in the data mart to calculate risk and build pricing models?
The answer is no: the analysts don't need to know the customer's name to build a pricing model. They may, however, require such personal items as postcode or ZIP code to identify risk zones (e.g., for flood risk). In this example, the pricing data mart has a legitimate interest in storing the customer's postcode or ZIP code but not other personal data items (such as name, email address, and phone number).
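The pricing example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (policy_id, claim_count, annual_premium, and so on) and the function minimize_for_pricing are hypothetical, not taken from any real system. The idea is that the load step for the pricing data mart keeps an explicit list of the fields it has a purpose for and drops everything else.

```python
# Hypothetical allowlist of fields the pricing data mart has a stated purpose for.
# Postcode is kept because it identifies risk zones; name, email, and phone are not.
PRICING_MART_COLUMNS = {"policy_id", "postcode", "claim_count", "annual_premium"}


def minimize_for_pricing(source_row: dict) -> dict:
    """Return a copy of the row containing only the allowlisted fields."""
    return {key: value for key, value in source_row.items()
            if key in PRICING_MART_COLUMNS}


# Illustrative source record (made-up values).
source_row = {
    "policy_id": "P-1001",
    "name": "Jane Example",
    "email": "jane@example.com",
    "phone": "555-0100",
    "postcode": "AB1 2CD",
    "claim_count": 2,
    "annual_premium": 480.0,
}

mart_row = minimize_for_pricing(source_row)
print(mart_row)
```

Using an allowlist rather than a list of fields to remove means that a new personal data item added to the source system is excluded from the data mart by default.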
Storage Limitations (Data Retention)
The predecessor to the GDPR (the EU Data Protection Directive, 95/46/EC) required businesses to minimize the retention of personal data such that it was not retained for longer than needed for the purposes for which it was collected. The GDPR, which replaced this directive on May 25, 2018, goes further: to comply with the principles of storage limitation and data minimization, the business's data controllers must ensure that personal data is only stored for a limited time period.
For BI systems, this means that any personal data stored must have a clearly defined retention period, beyond which it should be deleted or its PII data items anonymized.
Today, many BI systems do not have a deletion process because "data is good" -- accumulating data over time enhances such BI processes as data mining and predictive analytics. We have also been encouraged to grab as much data as possible "just in case." Under the GDPR, "just in case" is not a sufficient justification; you must have a clear use case for storing and processing individuals' personal data.
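A retention rule of this kind might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the seven-year retention period, the field names, the use of policy_end_date as the date the retention clock starts, and the choice to anonymize by blanking the PII fields rather than deleting the whole row. The right values depend on the purpose the data was collected for.

```python
from datetime import date, timedelta

RETENTION_DAYS = 7 * 365                 # assumed retention period, illustration only
PII_FIELDS = ("name", "email", "phone")  # hypothetical PII field names


def apply_retention(rows, today):
    """Blank the PII fields of every row whose retention period has expired.

    Rows are returned as new dicts; the input rows are not modified.
    """
    cutoff = today - timedelta(days=RETENTION_DAYS)
    result = []
    for row in rows:
        if row["policy_end_date"] < cutoff:
            # Retention period has passed: keep the analytic fields, drop the PII.
            row = {**row, **{field: None for field in PII_FIELDS if field in row}}
        result.append(row)
    return result


# Illustrative records (made-up values).
rows = [
    {"policy_id": "P-1", "name": "Old Customer", "email": "old@example.com",
     "phone": "555-0101", "postcode": "AB1 2CD", "policy_end_date": date(2015, 6, 30)},
    {"policy_id": "P-2", "name": "Recent Customer", "email": "new@example.com",
     "phone": "555-0102", "postcode": "EF3 4GH", "policy_end_date": date(2023, 6, 30)},
]

cleaned = apply_retention(rows, today=date(2025, 1, 1))
```

In this run the first record has its name, email, and phone blanked because its policy ended before the cutoff, while the second record is left untouched. A job like this would need to run on a schedule against every store that holds the data, including staging areas and extracts.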
Many of these new restrictions can be implemented with the use of views that limit access to personal data items to only those roles that have a legitimate business use. Going forward, more robust methods will have to be implemented to automatically manage the storage and retention of personal data. This could mark the beginning of a new era for BI systems.
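As a rough illustration of the view-based approach, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name customer_policy, the view name pricing_view, and all column names are hypothetical. SQLite has no roles or GRANT statements, so the sketch shows only the view itself; in a real warehouse the assumption is that analyst roles would be granted access to the view and not to the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical base table holding both PII and analytic fields.
conn.execute(
    "CREATE TABLE customer_policy ("
    "policy_id TEXT, name TEXT, email TEXT, phone TEXT, "
    "postcode TEXT, annual_premium REAL)"
)
conn.execute(
    "INSERT INTO customer_policy VALUES (?, ?, ?, ?, ?, ?)",
    ("P-1001", "Jane Example", "jane@example.com", "555-0100", "AB1 2CD", 480.0),
)

# The view exposes only the columns the pricing role has a business use for.
conn.execute(
    "CREATE VIEW pricing_view AS "
    "SELECT policy_id, postcode, annual_premium FROM customer_policy"
)

cursor = conn.execute("SELECT * FROM pricing_view")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

A view of this kind limits what a role can see but does not remove the personal data from storage, which is why it addresses access rather than retention.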
The Other Four Principles
In Part 2 we'll examine the other four principles and their implications for BI systems.
About the Author
Rod Welch is a BI consultant with over 15 years of experience in the BI environment, ranging from agile requirements gathering and dimensional modeling to ETL programming. In addition, he has a keen interest in agile and automated data warehouse development and the move to cloud storage. He is currently contracted to a U.K. insurance company to assess the impact of -- and define the detailed requirements for -- implementing GDPR. You can contact the author via email or via LinkedIn.