Forbidden Fruit: Reconciling Data Temptations and Privacy Rules
Data is seductive. Here are five steps designers can take to make sure they don’t succumb to data’s temptations.
- By Cass Brewer
- March 25, 2016
Data is fascinating. Data is opportunity. It's the ability to see invisible behaviors and unlock powerful patterns. The more we get, the more we want, and data warehouses are especially well designed to fuel those cravings.
There is a wrong kind of data, too, fraught with privacy risks and liability. Like any powerful tool, data warehouses should be built with danger in mind.
In terms of privacy, dangerous data is typically customer data that regulations, industry standards, or internal policies say must be protected or restricted in some way. Financial data, protected health information (PHI), and personally identifiable information (PII) are the most common targets of privacy restrictions. HIPAA, the Gramm-Leach-Bliley Act (GLBA), and other U.S. and international privacy laws define how some information must be restricted. PCI addresses specific types of financial data. Company privacy policies might further define “custom” restrictions that serve legal, market, or operational needs.
Although the boundary between privacy and security may seem fuzzy, compliance wonks tend to distinguish privacy as mostly dealing with what the data is and what a company may do with it. Security, by contrast, largely deals with the mechanics of data accessibility.
This distinction is significant because privacy is too often seen as a non-technical concern, largely expressed in policy documents and customer communications. However, as a constraint on business users' desire for all of the data all of the time, privacy has technical implications for data warehousing planning, technologies, and processes.
Naturally, data warehouses, which are built for massive data consolidation and dissemination, are particularly juicy privacy targets. Even when data warehouses are just twinkles in their sponsors' eyes, designers should be thinking about how to “build in” privacy. Here are five specific things designers can do.
1. Put a privacy watchdog on the stakeholder list: Privacy is nuanced, easy to dismiss, and easy to forget in the thrill of new opportunity. At least one stakeholder should be accountable for understanding and communicating privacy requirements to the data warehouse project manager and design team.
Project managers should avoid quarantining privacy advocates from design discussions, especially those that include users (or their representatives) who will ultimately request analytics. Privacy education conducted early in the project will help prevent user dissatisfaction later. It also allows the design team to recognize and reconcile conflicting business and privacy requirements before the project's execution phase begins.
2. Reflect privacy policies in plans for data assessment, consolidation, and transformation: Sensitive data sneaks into odd places. Data warehouse designers should carefully assess source data for potential privacy issues before ETL begins. In most cases, it's easier (and less costly) to exclude problematic data before it's transferred than to expunge it from transfer files and the data warehouse itself. If sensitive data can't be excluded, for whatever reason, designers can still plan to automate anonymization, either as part of the transformation process or as a middleware process between the warehouse and reporting layer.
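As a rough illustration of step 2, the sketch below shows one way a transformation step might exclude some columns outright and replace others with stable tokens before rows reach the warehouse. The column names, the `transform_row` helper, and the choice of a keyed hash are all assumptions made for this example, not part of any particular ETL tool. Note that keyed hashing is pseudonymization rather than full anonymization: anyone holding the key can recompute the tokens, so the key needs its own protection.

```python
import hashlib
import hmac

# Hypothetical column classifications. In a real project these would come
# from the privacy stakeholder's requirements, not from this sketch.
DROP_COLUMNS = {"ssn", "card_number"}
PSEUDONYMIZE_COLUMNS = {"customer_id", "email"}


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest of the value as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def transform_row(row: dict, key: bytes) -> dict:
    """Drop excluded columns and replace identifiers with stable tokens."""
    cleaned = {}
    for column, value in row.items():
        if column in DROP_COLUMNS:
            continue  # excluded before it is ever loaded
        if column in PSEUDONYMIZE_COLUMNS and value is not None:
            cleaned[column] = pseudonymize(str(value), key)
        else:
            cleaned[column] = value
    return cleaned
```

Because the same input and key always yield the same token, analysts can still join and count by customer without seeing the underlying identifier.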
3. Make sure testing won't violate privacy rules: Developers can be a little myopic when it comes to getting things built, and production (i.e., raw customer) data can be a mighty tempting resource for testing data warehouses and related analytical systems prior to release. Data warehouse planners can anticipate QA processes, however, and include creation of a robust (but sanitized) test-data set in their project requirements.
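One way to avoid production data in testing, as step 3 suggests, is to generate synthetic records that have the right shape but no connection to real customers. The following is a minimal sketch under assumed field names (`customer_id`, `name`, `email`, `balance`); a real test set would mirror the warehouse's actual schema and value distributions.

```python
import random
import string


def synthetic_customer(rng: random.Random, customer_number: int) -> dict:
    """Build one fake customer record containing no real personal data."""
    name = "".join(rng.choices(string.ascii_lowercase, k=8))
    return {
        "customer_id": f"TEST-{customer_number:06d}",
        "name": name.title(),
        # example.com is reserved for documentation and testing.
        "email": f"{name}@example.com",
        "balance": round(rng.uniform(0, 10_000), 2),
    }


def build_test_set(size: int, seed: int = 0) -> list:
    """Return a reproducible list of synthetic customer records."""
    rng = random.Random(seed)  # fixed seed keeps QA runs repeatable
    return [synthetic_customer(rng, n) for n in range(1, size + 1)]
```

Seeding the generator means a failing test can be rerun against exactly the same data, which raw production extracts rarely allow.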
4. Include privacy in training plans: Training can reduce acceptance friction and ramp-up time for new technology processes and tools. Ideally, data warehousing projects should include privacy in training plans for analysts and other business users. If analysts can query sensitive data, training will help them comply with restrictions on data use and distribution. Even if analysts can't access sensitive data, they should be aware of how its absence might influence the accuracy, consistency, and completeness of query results.
5. Plan to confirm privacy is working: There are more questions in heaven and earth than are dreamt in end-users' (initial) philosophies. Thus, as business needs change, the ways people use data will, too. Moreover, post-launch changes to a data warehouse can change its privacy profile. Data warehouse managers should plan for periodic audits of data queries and reports to ensure each is justified and compliant with company policies. As a nice side effect, this effort can also help reduce overhead by highlighting obsolete and unused reporting processes.
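The periodic audit in step 5 could start as something as simple as a pass over a query log. The sketch below assumes a hypothetical log format (dictionaries with `report`, `columns`, `run_date`, and an optional `justification`) and a hypothetical list of sensitive columns; it flags runs that touched sensitive columns without a recorded justification and lists reports that have not run recently.

```python
from datetime import date, timedelta

# Hypothetical set of columns the privacy policy treats as sensitive.
SENSITIVE_COLUMNS = {"ssn", "diagnosis_code"}


def audit(log_entries, today: date, stale_after_days: int = 90):
    """Return (unjustified_entries, stale_report_names) for a query log."""
    unjustified = []
    last_run = {}
    for entry in log_entries:
        touched = SENSITIVE_COLUMNS & set(entry["columns"])
        if touched and not entry.get("justification"):
            unjustified.append(entry)
        # Track the most recent run date seen for each report.
        report = entry["report"]
        if report not in last_run or entry["run_date"] > last_run[report]:
            last_run[report] = entry["run_date"]
    cutoff = today - timedelta(days=stale_after_days)
    stale = sorted(name for name, ran in last_run.items() if ran < cutoff)
    return unjustified, stale
```

A report that has never run will not appear in the log at all, so a fuller audit would also compare the log against the catalog of defined reports.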
A Final Word
Many companies focus their privacy efforts on developing policies, communicating options to customers, and capturing customer preferences. However, security controls alone may not align these policies and preferences with how data is collected, consolidated, and disseminated through reports and queries.
By including privacy in data warehousing projects, and particularly in planning phases, managers can align privacy policies with technologies and processes. Although privacy might contrast with the business users’ desire for all of the data, all of the time, it's better to clarify the need up front than to discombobulate users by changing data availability or analytical processes down the road.
About the Author
Cass Brewer is the editorial and research director for the IT Compliance Institute (ITCi), an independent firm focusing on the intersection of information technology, regulatory compliance, and business governance. A prolific author and presenter, Ms. Brewer is a member of the Association for Computing Machinery (ACM) and the Organizers' Collaborative for the Grassroots Use of Technology.