TDWI Articles

Data Privacy in a Globally Competitive Reality

Protecting consumer privacy is key to securely providing the huge data sets required for innovations in AI analytics.

Once an afterthought, data privacy has come to the forefront of internet policy discussions. Recent consumer data privacy legislation, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), has been well received by consumers, whom these laws empower to control the use of their own data. Consumers can decide whether and when companies may have their data, which gives them peace of mind about what information is out there and how it is being used.

However, data restrictions can slow the development of AI, which requires huge data sets to generate accurate prediction models and improve end products. Because regulations vary across international borders, organizations in some countries have access to larger data sets and thus the potential to pull ahead in the global AI race. This creates a challenge for data scientists, who must find ways to level the playing field while remaining compliant.

At a global level, consumer data privacy regulations fall along a spectrum. On one end, the European Union's GDPR gives individuals extensive control over their personal data and who can access it. Enterprises processing such data must have strict technical and organizational measures in place, such as de-identification or full anonymization, to uphold data protection principles. Processing must rest on one of six lawful bases, and where it relies on consent, the data subject can withdraw that consent at any time. Although strict data management protects consumers' privacy, from an AI point of view it may inadvertently limit access to critical data elements or reduce the size of the data set, which ultimately could affect the ability to create accurate algorithms. Smaller data sets can also significantly slow research progress.
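As a rough illustration of one such technical measure, the Python sketch below pseudonymizes a direct identifier with a keyed hash (HMAC-SHA-256 from the standard library). The same input always yields the same token, so records can still be joined for analysis, but someone without the key cannot recompute tokens from guessed values to re-identify people. The field names, sample record, and hard-coded key are hypothetical placeholders. Note that under the GDPR, pseudonymized data generally remains personal data because whoever holds the key can re-link it; truly anonymized data, by contrast, falls outside its scope.

```python
import hashlib
import hmac

# Hypothetical placeholder key. In practice it would come from a key-management
# system and be stored separately from the pseudonymized data.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA-256), as hex.

    Normalizing first means " Jane.Doe@Example.com" and "jane.doe@example.com"
    map to the same token, so records stay linkable across tables.
    """
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, identifier_fields: set) -> dict:
    """Return a copy of the record with the named identifier fields tokenized."""
    return {
        field: pseudonymize(str(val)) if field in identifier_fields else val
        for field, val in record.items()
    }


if __name__ == "__main__":
    # Hypothetical customer record: email is a direct identifier.
    customer = {"email": "Jane.Doe@example.com", "age": 34, "purchases": 12}
    print(pseudonymize_record(customer, {"email"}))
```

Keeping the key in a separate, access-controlled system is what distinguishes this from simple hashing: an unkeyed hash of an email address can often be reversed by hashing a list of candidate addresses.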

On the other end of the spectrum is China. Because China has the largest population of internet users in the world, organizations there can collect an enormous amount of customer data for use in enterprise AI solutions. With fewer restrictions on who can view and leverage personal data, Chinese data scientists are in many cases able to use the country's massive data sets as a competitive advantage in developing new AI algorithms.

The U.S. falls somewhere in the middle of the spectrum. Rather than a single comprehensive federal privacy law, regulation varies from state to state. The CCPA, which took effect in 2020, gives California residents insight into and control over the personal information businesses collect about them. These consumers can request to see the data businesses are collecting and, if desired, demand that it be deleted or opt out of its sale to third parties. As other states begin to follow suit, data scientists and other tech leaders must accept the reality of limited data.

The U.S. government does not often share data to foster collaboration with enterprises, which may create an advantage for organizations in other countries. Healthcare data, for example, is closely protected, and for good reason. However, if it could be made available to researchers after it is anonymized, perhaps we could discover better treatment and prevention methods for patients. Unfortunately, it sometimes takes a crisis such as COVID-19 to ignite widespread collaboration and make data available to data scientists for scientific discovery and for contact tracing and prevention efforts. Imagine the advances that could be made globally if more anonymized healthcare data were available.
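Deciding whether a data set is anonymized enough to share usually involves a formal check. One common criterion, k-anonymity, requires every combination of quasi-identifiers (attributes such as age and ZIP code that are harmless alone but identifying in combination) to appear in at least k records. The Python sketch below coarsens two hypothetical quasi-identifiers and tests the result; the records, bucketing rules, and threshold are illustrative assumptions, not a regulatory standard.

```python
from collections import Counter

# Hypothetical patient records: "diagnosis" is the sensitive attribute, while
# age and ZIP code are quasi-identifiers that could enable re-identification.
records = [
    {"age": 34, "zip": "94107", "diagnosis": "asthma"},
    {"age": 36, "zip": "94110", "diagnosis": "diabetes"},
    {"age": 38, "zip": "94103", "diagnosis": "asthma"},
    {"age": 52, "zip": "10013", "diagnosis": "hypertension"},
    {"age": 57, "zip": "10011", "diagnosis": "diabetes"},
    {"age": 55, "zip": "10014", "diagnosis": "asthma"},
]


def generalize(record):
    """Coarsen quasi-identifiers into 10-year age bands and 3-digit ZIP prefixes."""
    low = (record["age"] // 10) * 10
    return {
        "age_band": f"{low}-{low + 9}",
        "zip3": record["zip"][:3] + "**",
        "diagnosis": record["diagnosis"],
    }


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in >= k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(c >= k for c in counts.values())


generalized = [generalize(r) for r in records]
qi = ("age_band", "zip3")
print(is_k_anonymous(records, ("age", "zip"), k=3))  # False: every raw row is unique
print(is_k_anonymous(generalized, qi, k=3))          # True: two groups of three
```

k-anonymity alone is not a guarantee: if every record in a group shares the same diagnosis, that diagnosis is still disclosed, which is why it is often paired with further checks such as l-diversity.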

To keep data regulation from inadvertently inhibiting innovation, we might approach data in a way that enhances consumer privacy without hindering the scientific, technological, and academic communities.

Strict data governance and reliable de-identification will become a key foundation for innovative, data-driven solutions. With responsible access to large amounts of data, AI offerings could enable applications that benefit society, especially "data for good" applications in sustainability, health outcomes, and financial well-being. Of course, storing more personal data in more IT systems also demands that cybersecurity receive greater attention at the executive level. Otherwise, we risk embarrassing data breaches that further erode public trust and make people skeptical of the benefits of data sharing.

Trust is key in a digital, data-driven economy, and it is even more critical for the widespread adoption of AI algorithms that derive automated decisions from vast amounts of data. In this context, Singapore's Model AI Governance Framework (the "Model Framework"), originally launched at the World Economic Forum in Davos, is a refreshing contribution that ties together the concerns of data privacy, AI ethics, and governance. By explaining how AI systems work, building good data accountability practices, and creating open and transparent communication, the Model Framework aims to promote public understanding of and trust in such technologies. We expect more organizations to leverage frameworks like this to inform and guide their own data governance and AI implementation strategies.

Another example of responsible data sharing is KDD, one of the largest and longest-running data science conferences, which brings together academic and industry experts. The conference's annual competition, KDD Cup, relies on sponsors to share sanitized data sets with the data science community so that participants can solve problems with machine learning (ML) and AI. This helps teams develop better, more accurate solutions faster while fostering global collaboration. Often, the sponsors walk away with a new solution to an existing problem.

With access to more information, data scientists would have the ability and freedom to develop more sophisticated algorithms, prediction models, and technologies. The more digital our world becomes, the more important it is for us to find a middle ground between privacy and innovation.

About the Author

Michael Zeller is the head of AI strategy and solutions at Temasek. Throughout his career, Dr. Zeller's passion has been to help organizations accelerate insights from data through machine learning and AI. In addition to his experience as an entrepreneur, executive, and advisor to technology-centric organizations, he serves as treasurer on the executive committee of ACM SIGKDD, the premier international organization for data science. You can reach the author via Twitter or LinkedIn.
