How the Trust Gap Is Holding Back Data-Driven Decisions
Why data catalogs just might be the final mile for analytics innovation.
- By Stephanie McReynolds
- October 16, 2018
The most recent results of a long-running global business intelligence survey revealed that 58 percent of companies base at least half of their regular business decisions on gut feel or experience rather than data and information.
Why? They don't trust the data, the analysis, or both. As we are seeing with automated intelligence, trust becomes particularly important when people cannot fully control a technology or scientific innovation. The algorithms and APIs become a black box, difficult to verify. Trust is the elusive final mile of the analytics innovation journey and the key to data-driven decision making.
What's Holding Us Back?
Adopting self-service tooling is easy. Fostering a data-driven culture is hard. For years, data management teams and vendors have innovated with new systems for data analytics. We've built cheaper repositories that could store the proliferation of data, faster processing engines of all flavors that efficiently churn through data to find patterns, and easier-to-use interfaces that engage a wide variety of users in beautiful, creative exploration of data and analysis.
What if the final gap, the issue holding us back from data-driven answers, is not a technology barrier at all? What if it is something very human?
To change the way we make decisions, we have to build trust. Trust in the data itself, trust in the specialist who transformed that data into analysis, and trust in the software that is storing, processing, and delivering insight.
Most current thought leadership about data governance describes the modern technical challenges of data security, quality, and lineage in a highly distributed data analytics architecture. However, solving those technical challenges won't directly increase the number of managers making data-driven decisions. Instead, as data governance becomes more important to developing insights, we need to make a slight but critical shift in focus: to the people and processes that ultimately deliver trusted insights.
No Standard Formula to Follow
A large body of excellent academic research and thinking about data ethics and governance is growing rapidly thanks to the increased focus on automated intelligence. With all of the minds at work in this area, you might think the answer is clear, but the question of trust continues to trip up even the most innovative data companies.
What is the standard formula for trust in data? I wish there were an equation, a simple formula for ensuring trust, but none exists. Trust is a social construct. The foundations, boundaries, and norms for trust evolve out of a community. The role of regulation and what can be done to balance fair use of data, trust, and acceptance does not have a one-size-fits-all answer. Trust must be built.
Before self-service analytics, when data analysis was performed by a small team of specialists, trust was easy. Tribal knowledge passed from specialist to specialist and those one-to-one relationships were a sufficient foundation for trust. Each organization typically had a small group of data analysts or data professionals who were the go-to sources for trusted knowledge.
One customer of ours recently described the situation rather humorously: "I, personally, was the query tool for my team. If they wanted an accurate and governed query, they asked me to run it. My colleague was our business glossary. If someone wanted a detailed definition of what the data represented or what the accurate calculation would be to measure a business result, they asked her. Together, we ensured the accuracy of analysis for the team."
Today, with broad access to data and analytics the norm for most organizations, trust is more challenging. This type of tribal knowledge sharing no longer scales.
How Is Trust in Data Built?
Trust "rests not only on the assumption that one is dependent on the knowledge of others who are more knowledgeable, it also entails a vigilance toward the risk to be misinformed," according to the authors of the essay "Trust in Science and the Science of Trust."
As the scientific community grew, it became unlikely that a scientist would know socially or personally the colleague who worked on a specific project or innovation. Therefore, trust in the context of science and research is often trust in the system of science, in its institutions, rules, and methods. The best model we have for building trust in data is to create something similar to the scientific method and the supporting system around it.
Albert Einstein famously said, "I have no special talents. I am only passionately curious." Trust in science is created through faith that the process of curiosity and discovery is open to anyone, rigorously documented, and made publicly available for further scientific inquiry. This transparency has created a solid foundation for scientific thought and innovation for centuries. Data governance should embrace processes for building similar transparency just as readily as processes for cleaning and securing data for analysis.
Cataloging for Trust
This brings us to why data catalogs have emerged as a new must-have component of a modern analytics architecture. Five years ago data catalogs did not exist, self-service analytics was not yet pervasive, and trust was delivered through networks of tribal knowledge.
In today's self-service analytics world, supporting trust in data and trust in analytics requires a system for easily collecting, organizing, and accessing the documentation that describes analytics discovery. These catalogs need to be more than mere inventories of available data. If our goal is transparency and trust -- the trust needed by managers to change behaviors and make data-driven decisions -- data catalogs must embrace the notion that a data inventory is only a starting point. Inventories are not rich enough to deliver trust.
Data catalogs must capture the people, processes, and data involved in analytics discovery. The individuals asking curious questions and the methods applied during discovery are just as important as the quality, security, and lineage of the underlying data. The social information of who was involved in the analysis needs to be captured alongside the technical details of the analysis. This is the only way we can psychologically transfer trust in tribal knowledge to trust in the transparent scientific method performed during analysis. As managers -- and as humans -- we need this context to be able to trust the insight we receive.
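To make this concrete, here is a minimal sketch, in Python, of the kind of record such a catalog might keep. It is an illustration only, not any vendor's schema; every field and name is hypothetical, chosen to show the social context (who asked the question, who curated the data) sitting alongside the technical lineage.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class CatalogEntry:
    """Hypothetical catalog record pairing social context with technical lineage."""
    dataset: str                    # underlying data asset
    question: str                   # the business question being explored
    analysts: List[str]             # who performed the analysis
    curators: List[str]             # who vetted or certified the data
    method_notes: str               # how the analysis was carried out
    source_lineage: List[str] = field(default_factory=list)  # upstream sources
    certified: bool = False         # endorsed by the data community?
    last_reviewed: Optional[date] = None

# Example entry: the record documents both the people and the method,
# so a manager reading it can see how the insight was produced.
entry = CatalogEntry(
    dataset="quarterly_revenue",
    question="Which regions drove Q3 growth?",
    analysts=["s.lee"],
    curators=["m.ortiz"],
    method_notes="Joined CRM bookings to the finance ledger; refunds excluded.",
    source_lineage=["crm.bookings", "finance.ledger"],
    certified=True,
    last_reviewed=date(2018, 10, 1),
)
```

The lineage and certification fields alone would satisfy a traditional inventory; it is the names and method notes that let trust transfer from tribal knowledge to a documented, transparent record.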
If we resist the pull of defining a data catalog as just a technical inventory, embrace the opportunity to use it as a foundation for driving shared understanding of our analytics inquiry, and acknowledge the human transformation that has to happen to become data-driven managers, the catalog becomes a powerful driver of trust in data and analytics.
Facing the Challenge
After many years of being relegated to managing the plumbing of modern analytics architectures, data governance teams have a bright new future. The tooling is now available to support a new approach to data governance -- one that embraces the already prevalent broad-based, democratic use of data. This new approach recognizes that the community of data users is just as responsible as the data governance team for creating trust in data-driven insights.
If you are a data-driven organization and are not having a broad-based conversation about trust in data, start now. Start broadly within your community of employees, customers, and partners who have access to your data.
In addition, to close the trust gap and build a world defined by data-driven decisions, data usage must be governed not solely by top-down policies but also by community norms and guardrails. We're seeing this transition already under way in organizations that have dedicated time to creating a community of data users focused on broad distribution of tribal knowledge and the transparent communication of how insights are produced.