Adjusting Metadata Management to the New World of BI
Implementing an updated strategy for metadata will be critical as users move toward self-service BI and analytics.
- By David Stodder
- December 7, 2016
"Toto," said Dorothy to her beloved canine companion as they emerged, wide-eyed, into the wonderful land of Oz, "I've a feeling we're not in Kansas anymore." Professionals in charge of enterprise business intelligence (BI) and data warehousing (DW) functions are forgiven if they feel a similar sensation when they look about the rapidly changing BI and data management landscape.
It is a new world that can seem fantastical and even magical, but within it lurk hidden dangers -- BI and data management's versions of lions, tigers, and bears -- that could threaten what has been accomplished with enterprise BI and DW.
The Rise of Self-Service and Agility
After spending years devoted to the development, care, and feeding of enterprise BI and DW, BI/DW professionals are watching users adopt self-service visual analytics tools and spin up "shadow" IT systems to support personalized data interaction. Standard reports and dashboards aren't cutting it for them.
Frustrated by limits on data sources integrated into the data warehouse and by the time it takes to introduce new sources, business units are trying to do more on their own, including setting up data lakes and other kinds of data repositories outside firewalls on public cloud platforms. Governance and data quality challenges that had been at least manageable within controlled enterprise BI and DW environments are becoming increasingly complex in the new, Oz-like world.
Yet there are no ruby slippers that one can tap to return organizations to their former state. Indeed, with ambitions to be more agile and data-driven, most organizations want to move further in the direction of self-service BI and analytics so that users can be more effective with data as part of their decision making.
Business agility is a prized competitive advantage in industries where market conditions are changing as customers develop new preferences, shift their behavior, and make choices among competitors. Agility today depends on an information supply chain that keeps the whole range of decision makers informed about what has happened, what is happening now, and, through predictive analytics, what could happen.
Agility also requires rapid development of new applications, dashboards, and other modes of data interaction and collaboration -- more rapid than is usually the case with traditional development of enterprise BI applications. However, organizations do not want data chaos to be the byproduct of their pursuit of agility.
Metadata: The Crux of the Matter
BI is all about bringing together different pieces of data so you can gain perspective within the context of business decisions you need to make. This makes the collection, management, and availability of metadata -- data (or information) about the data -- one of the most important aspects of any BI ecosystem. BI can't function without good metadata.
Almost all data sources -- including business applications, databases, and text content -- have metadata that enables users or automated programs to learn the structure of the source's specific cache of data, find relevant information, and discover characteristics about the data. BI tools themselves have metadata and often manage their own metadata repositories. Unfortunately, where self-service BI and visual analytics have succeeded most -- outside of IT in business units and departments -- metadata is often not well managed or even collected, so it is difficult, if not impossible, to share. As the enterprise BI and DW paradigm changes, how organizations collect and manage metadata needs to change as well so that collecting and sharing metadata is easier in environments where nontechnical users, not IT, are directing BI.
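To make the idea of an automated program learning a source's structure concrete, here is a minimal sketch in Python. It assumes the source is a SQLite database, chosen only because it ships with Python. The table names and the harvest_table_metadata helper are hypothetical illustrations, not part of any BI product.

```python
import sqlite3


def harvest_table_metadata(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for a SQLite database."""
    metadata = {}
    # sqlite_master is SQLite's built-in catalog of schema objects.
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        metadata[table_name] = [(col[1], col[2]) for col in columns]
    return metadata


# Hypothetical source with two small tables (illustrative names only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

source_metadata = harvest_table_metadata(conn)
for table, columns in source_metadata.items():
    print(table, columns)
```

Other databases expose similar structural metadata through their own system catalogs, so the same pattern would need a different query for each source.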
Data warehousing systems need to collect metadata from sources to integrate the data and provide access to it. Data warehouses have played a critical role in metadata sharing at the enterprise level. Data warehousing expert Ralph Kimball has described metadata as "analogous to the data warehouse encyclopedia," and he and others have called it the DNA of the data warehouse.
Integrated metadata, often held in a specialized metadata repository inside or alongside the data warehouse, is essential to everything from extraction, transformation, and loading (ETL) to executing queries and creating views of data -- you name it. Metadata is vital to using data in the application of business rules, particularly for operational BI systems.
Master data management (MDM) systems also work with metadata to create master references to data and to enable access to data about customers, products, or other objects of interest across multiple sources. A good metadata repository is a Rosetta stone of sorts for data-driven business descriptions of higher-level objects of interest.
Although centralized metadata repositories are the norm, some organizations are implementing distributed or federated metadata management approaches that do not separate the metadata from the data itself, reasoning that the centralized resource becomes a single point of failure, is time-consuming to create and manage, and can become a bottleneck as organizations try to add new data or develop new types of analytics applications. These systems access (or "harvest") metadata as needed by users from sources all along the information supply chain; advanced technologies use semantics to automate the discovery and interpretation of metadata. However, distributed and federated systems have their challenges as well, including how to implement continuous, real-time updating of the metadata.
In any case, as organizations seek to reduce latency and accelerate time to value with BI and analytics, they are actively seeking alternatives that make metadata collection and management faster, more flexible, smarter, and more efficient. Organizations need to focus on finding the right tools because an updated technology strategy for metadata is especially critical as users move in a self-service BI and analytics direction.
Metadata and Governance
Metadata is vital to improving the quality, completeness, and consistency of your data. Assuming that the metadata is accurate (which can sometimes be a big assumption, particularly with spreadsheets, vertical applications, or semi- and unstructured big data sources), administrators can examine the metadata to take an in-depth look at table and column structures and observe how the organization of the data may be changing. This includes lineage information about who is changing the structures and who is transforming the data. The metadata view of the resources is thus important to data stewardship and governance.
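One way an administrator might observe how the organization of the data is changing is to compare two snapshots of structural metadata. The Python sketch below assumes each snapshot is a plain dictionary mapping table names to column names and types. The snapshot format, the change labels, and the diff_schemas helper are all hypothetical.

```python
def diff_schemas(old, new):
    """Compare two {table: {column: type}} snapshots and list structural changes."""
    changes = []
    for table in sorted(set(old) | set(new)):
        if table not in old:
            changes.append(("table_added", table, None))
            continue
        if table not in new:
            changes.append(("table_removed", table, None))
            continue
        old_cols, new_cols = old[table], new[table]
        for column in sorted(set(old_cols) | set(new_cols)):
            if column not in old_cols:
                changes.append(("column_added", table, column))
            elif column not in new_cols:
                changes.append(("column_removed", table, column))
            elif old_cols[column] != new_cols[column]:
                changes.append(("type_changed", table, column))
    return changes


# Illustrative snapshots taken at two points in time.
last_week = {"orders": {"id": "INTEGER", "total": "REAL"}}
today = {
    "orders": {"id": "INTEGER", "total": "TEXT", "currency": "TEXT"},
    "returns": {"id": "INTEGER"},
}

schema_changes = diff_schemas(last_week, today)
for change in schema_changes:
    print(change)
```

A comparison like this shows what changed but not who changed it. Recording the responsible user would require lineage or audit information captured alongside each snapshot.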
In TDWI Best Practices Report: Improving Data Preparation for Business Analytics, published in the third quarter of 2016, we found that research participants' leading data governance objective in implementing self-service data preparation tools was to create business metadata and document definitions.
This indicates the importance of metadata to governance and also brings to light one of the strongest drivers behind acquisition of self-service data preparation tools, which use updated software technology to make it easier for both users and IT to prepare data for BI and analytics.
The Role of Data Catalogs
The Best Practices Report also revealed the importance of data catalogs -- centralized repositories that typically contain metadata. Satisfaction in data preparation is higher among research participants who say their organization is "reliant" on a data catalog. Shared resources such as data catalogs, glossaries, and metadata repositories can help users find quality sources and gain better knowledge about how data in multiple sources may be related.
Newer tools can alleviate some of the time-consuming manual work associated with building catalogs by automating building steps and the procedures for keeping catalogs up to date. The tools can discover metadata from existing data sets to learn details about data, tag data according to higher-level business definitions and rules, and locate and use existing documentation.
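As a rough illustration of tagging data according to higher-level business definitions, the Python sketch below matches column names against a small glossary. The glossary terms, the keyword-matching rule, and the CatalogEntry structure are assumptions made for this example. Commercial catalog tools generally rely on richer techniques such as data profiling and semantics.

```python
from dataclasses import dataclass, field

# Hypothetical business glossary: business term -> keywords to look for in column names.
GLOSSARY = {
    "Customer": ["cust", "customer", "client"],
    "Revenue": ["revenue", "sales", "total"],
}


@dataclass
class CatalogEntry:
    dataset: str
    column: str
    tags: list = field(default_factory=list)


def tag_columns(dataset, columns, glossary=GLOSSARY):
    """Build one catalog entry per column, tagged with every matching glossary term."""
    entries = []
    for column in columns:
        name = column.lower()
        tags = [
            term
            for term, keywords in glossary.items()
            if any(keyword in name for keyword in keywords)
        ]
        entries.append(CatalogEntry(dataset, column, tags))
    return entries


catalog = tag_columns("orders", ["cust_id", "order_total", "ship_date"])
for entry in catalog:
    print(entry.dataset, entry.column, entry.tags)
```

Columns that match no glossary term, such as ship_date here, are left untagged. These are natural candidates for review by a data steward.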
Users, including analysts and developers, can employ tools to examine data lineage to learn how data has been consumed and transformed by others, which is essential knowledge for data stewardship and governance.
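Examining lineage amounts to walking a graph of which data sets were derived from which others. The Python sketch below assumes lineage has already been recorded as a simple mapping from each derived data set to its direct inputs. The data set names and the upstream_sources helper are hypothetical.

```python
# Hypothetical lineage records: derived data set -> data sets it was built from.
LINEAGE = {
    "sales_dashboard": ["sales_summary"],
    "sales_summary": ["orders_clean", "customers"],
    "orders_clean": ["orders_raw"],
}


def upstream_sources(dataset, lineage=LINEAGE):
    """Return every data set that feeds, directly or indirectly, into `dataset`."""
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for parent in lineage.get(current, []):
            if parent not in seen:  # guard against revisiting shared or cyclic inputs
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)


dashboard_sources = upstream_sources("sales_dashboard")
print(dashboard_sources)
```

A steward could use a traversal like this to see which dashboards are affected when a raw source changes, by running the same walk in the opposite direction.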
Adapting to Change
Unlike in the Oz story, the new BI and DW landscape has, as far as we know, no great and wonderful "man behind the curtain" who can magically answer all data requests. Artificial intelligence techniques such as machine learning are helpful for speed and automation, but building and managing metadata repositories such as data catalogs will always require time and careful work.
Organizations should evaluate tools that can automate steps, ease the incorporation of new data sources, and simplify repository revisions to match users' changing metadata requirements for BI and analytics. The key is to acknowledge change rather than attempt to force self-service users to conform to rules and procedures that fit an earlier paradigm.