TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Articles

Creating Self-Service Organizations with Data Catalogs

Data catalogs help you create a truly self-service organization by efficiently providing analysts with the means to find, understand, and trust their data.

By Aaron Kalb
March 28, 2017

In the BI world, "self-service" often means simply giving individual analysts the ability to define their own metrics and dimensions without involving IT. Although those analysts relish the freedom that comes from such disintermediation, it can result in a proliferation of reports and stats that actually make it harder for business users to self-serve because they don't know what they can trust.

For a careful analyst committed to accuracy, IT isn't the biggest bottleneck. The hardest part of the job is finding and understanding relevant and trustworthy data sets. The modern data catalog, powered by machine learning and designed for collaboration, has emerged to overcome these challenges, allowing analysts and business users to work quickly and correctly.

Find the Data

Finding the right data asset in a modern enterprise can be like trying to find a book in a massive library. In the 20th century, libraries used card catalogs to make it easier for book seekers to search by title, author, or category. In the digital age, Amazon and Google eclipsed their predecessors, largely by developing superior catalogs.

Much like Google indexes the Internet, the modern data catalog crawls, parses, and indexes all of an organization's data -- including information in BI tools, wikis, and usage logs -- to enable a single search function over a diverse array of data assets. Raw data elements can be annotated and tagged via both expert curation and machine learning.

For instance, algorithms can train on existing documentation to make educated guesses about the logical meanings of inscrutable field names full of abbreviations and acronyms. With such "translations" in place, natural language search terms can yield useful results (e.g., a search for "daily revenue" can find "dly_rvnu").
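As a minimal sketch of this kind of matching in Python, the example below swaps the trained model for a simple hand-written rule: a field-name token counts as an abbreviation of a search term if it starts with the same letter and its letters appear in the term in the same order. The function names (`is_abbreviation`, `matches`, `search`) are illustrative assumptions, not part of any catalog product.

```python
import re

def is_abbreviation(abbr: str, word: str) -> bool:
    """True if abbr shares word's first letter and is a subsequence of word."""
    abbr, word = abbr.lower(), word.lower()
    if not abbr or not word or abbr[0] != word[0]:
        return False
    remaining = iter(word)
    # `ch in remaining` consumes the iterator, so letters must appear in order.
    return all(ch in remaining for ch in abbr)

def matches(query: str, field_name: str) -> bool:
    """True if each token of field_name abbreviates the query term in the same position."""
    terms = query.split()
    tokens = [t for t in re.split(r"[_\W]+", field_name) if t]
    return len(terms) == len(tokens) and all(
        is_abbreviation(token, term) for token, term in zip(tokens, terms)
    )

def search(query: str, field_names: list[str]) -> list[str]:
    """Return the field names that plausibly abbreviate the query."""
    return [name for name in field_names if matches(query, name)]

print(search("daily revenue", ["dly_rvnu", "cust_id", "rvnu_dly"]))
# prints ['dly_rvnu']
```

A rule like this will produce false matches on real schemas, which is one reason a catalog would learn the mappings from existing documentation rather than rely on letter order alone.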

In addition to identifying all relevant candidates, a modern catalog should -- like Google -- rank them so the most promising are near the top. Ranking by popularity -- a measure capturing recency and frequency of use -- can help data consumers identify the best assets based on the prior behavior of their peers. The result is an easy-to-use single source of reference for all of an organization's data assets with the context necessary to determine which are most applicable to the analytics question at hand.
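One way to capture both recency and frequency in a single number is to let every past access contribute to the score, with older accesses counting for less. The sketch below assumes an exponential decay with a 30-day half-life; the formula, the half-life, and the names `popularity` and `rank` are assumptions for illustration, not how any particular catalog computes popularity.

```python
from datetime import datetime, timedelta

def popularity(access_times, now, half_life_days=30.0):
    """Sum of per-access weights; each weight halves every half_life_days."""
    score = 0.0
    for accessed_at in access_times:
        age_days = (now - accessed_at).total_seconds() / 86400
        score += 0.5 ** (age_days / half_life_days)
    return score

def rank(assets, now, half_life_days=30.0):
    """Order asset names from most to least popular.

    assets maps an asset name to the list of datetimes at which it was used.
    """
    return sorted(
        assets,
        key=lambda name: popularity(assets[name], now, half_life_days),
        reverse=True,
    )

now = datetime(2017, 3, 28)
assets = {
    "old_table": [now - timedelta(days=300)] * 5,   # used often, long ago
    "fresh_table": [now - timedelta(days=1)] * 2,   # used less, but recently
}
print(rank(assets, now))
# prints ['fresh_table', 'old_table']
```

A shorter half-life favors whatever analysts touched this week; a longer one favors assets with a long record of steady use.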

Understand the Data

For analysts, finding the data is only the first step. Understanding data requires rich context such as definitions and information on history and usage.

An analyst needs to understand the shape of the data set, where it came from, whether it is up to date, who else has used it, and how it was used. In aggregate, organic usage patterns show roughly how the data has been used historically. Data catalogs also show who in particular has used the data, helping analysts find the experts who know the data best (which can otherwise be quite challenging in organizations with hundreds of analysts).

Top user lists can also indicate meaning: if everyone listed is in a particular team (such as the finance, risk, or marketing department), that can be a helpful hint. Just as a shopper on Amazon considers factors such as star ratings, price, delivery date, pictures, and other users' purchasing patterns to select the right product, a data consumer should be given a 360-degree view of each data asset in a catalog.

Trust the Data

Finally, analysts face the challenge of determining whether they can trust the data -- whether it is accurate and can yield meaningful insights. Think about trying to find a restaurant offering tasty dishes. A restaurant may have "tasty" written all over its website, but you can't necessarily trust that description. Data has a similar issue -- just because a table or file is named "q3_results_final_final_final," that doesn't necessarily mean it's actually final. If anything, such suffixes should raise suspicion -- presumably the "final_final" version looked conclusive at some point.

Traditional data documentation systems limit contribution permissions to a small, trusted group. The result is more accurate documentation for a few data assets but far less breadth of coverage. This method is also slow, and the documentation often becomes stale. It doesn't suffice for a self-service environment.

Modern data catalogs draw on third-party information to verify whether the data can be trusted. They incorporate active signals such as analyst endorsements (similar to the star ratings on Yelp) and mine passive signals (much like how Google PageRank interprets hyperlinks as votes of confidence). Together these active and passive signals provide good indicators of data's trustworthiness.
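To make the passive-signal idea concrete, here is a simplified PageRank in Python, run over a hypothetical graph in which reports "link" to the tables they query. It is a textbook power iteration under stated assumptions (a damping factor of 0.85, a fixed number of iterations, rank from dead-end nodes spread evenly), not a description of how any catalog or Google itself scores assets. The table and report names are made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank.

    links maps each node to the list of nodes it references.
    Returns a dict of node -> score; scores sum to 1.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank among the nodes it references.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dead end: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical usage graph: two reports query "orders", one queries "orders_tmp".
links = {
    "report_a": ["orders"],
    "report_b": ["orders"],
    "report_c": ["orders_tmp"],
}
scores = pagerank(links)
print(scores["orders"] > scores["orders_tmp"])
# prints True
```

An active signal such as an analyst endorsement could then be combined with a score like this, though how to weight the two is a design decision the sketch leaves open.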

Modern data catalogs also trace a data asset's lineage to allow analysts to determine its origins. If the data set was the basis for the CFO's quarterly earnings presentation, it is probably (hopefully!) trustworthy. Using the wisdom of the crowd, aided by machine learning to make recommendations, modern data catalogs overcome the limitations of traditional documentation systems.
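Lineage can be pictured as a graph walk. The sketch below assumes lineage is stored as a plain dictionary mapping each asset to the assets it was derived from; that representation, the asset names, and the helpers `trace` and `invert` are all hypothetical. Walking the graph as stored answers "where did this come from?", and walking the inverted graph answers "what was built on this?".

```python
from collections import deque

def trace(graph, start):
    """Return every asset reachable from start by following graph edges."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, []):
            if neighbor not in seen:  # also guards against cycles
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def invert(graph):
    """Flip edge direction: derived-from becomes feeds-into."""
    inverted = {}
    for child, parents in graph.items():
        for parent in parents:
            inverted.setdefault(parent, []).append(child)
    return inverted

# Hypothetical lineage: each asset lists what it was derived from.
derived_from = {
    "q3_earnings_deck": ["q3_results"],
    "q3_results": ["ledger", "fx_rates"],
}

# Origins of the earnings deck.
print(sorted(trace(derived_from, "q3_earnings_deck")))
# prints ['fx_rates', 'ledger', 'q3_results']

# Everything built on the ledger.
print(sorted(trace(invert(derived_from), "ledger")))
# prints ['q3_earnings_deck', 'q3_results']
```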

The Insight-Driven Organization

Self-service analytics are critical for any organization striving to be insight driven. In this new paradigm, traditional data documentation is too slow and restrictive. The modern data catalog provides the mechanism for creating a truly self-service organization by efficiently providing analysts with the means to find, understand, and trust data.

About the Author

Aaron Kalb is cofounder and chief data officer at Alation. He has spent his career working at the intersection of humans and technology to help people satisfy their curiosity and make more rational decisions. As CDO, his mandate is to promote data culture and data-driven decision-making within the company and around the world. Prior to Alation, he worked on Siri at Apple in the Advanced Development Group. He holds bachelor’s and master’s degrees in Symbolic Systems from Stanford University.
