TDWI Articles

Breaking Barriers in Conversational BI/AI with a Semantic Layer

Learn how a semantic layer serves as a bridge between natural language queries and actionable insights.

The advent of generative AI sparked a push toward conversational interfaces for business analytics and data insight generation. It seemed a perfect use case: decision-makers who may not be tech-savvy or data-literate could simply “talk to their data,” asking questions in natural language and receiving instant, actionable insights.

Promise and Push-Back

No more trawling through complex reports, deciphering spreadsheets, or waiting for data teams to provide answers. Conversational BI/analytics, powered by AI, can now be a reality and not just a vision of the future.

However, despite its promise, the push for conversational BI has met with adoption inertia. Two major challenges have hindered its potential: the accuracy of the data insights and the speed at which the interface can provide answers.

The accuracy problem can be attributed to the inherent complexity of enterprise data architecture, in which data is fragmented across disparate systems with varying definitions, formats, and contexts. Without a unified structure, even the most advanced AI models risk delivering contextually irrelevant, inconsistent, or inaccurate results.

The speed problem arises because traditional data pipelines are not designed for instantaneous query resolution. Resolving a question against data spread across multiple tables delays responses.

An Emergent Resolution

Large language models (LLMs) like GPT excel at interpreting natural language but lack the domain-specific knowledge of a data set. A semantic layer can resolve this challenge by acting as an intermediary between raw data and the conversational interface. It unifies data into a consistent, context-aware model that is comprehensible to both humans and machines.

Retrieval-augmented generation (RAG) techniques are employed to combine the generative power of LLMs with the retrieval capabilities of structured data systems. To make this integration work effectively, the semantic layer adds value in the following areas.

Clean and Consistent Metadata

A key strength of a semantic layer is its ability to create and manage clear and well-defined metadata for all data assets, such as tables, columns, and metrics.

This metadata includes:

  • Semantic mapping: Business-friendly names and descriptions that map to data fields.
  • Relationships and hierarchies: Clear definitions of how tables and columns are connected.
  • Data types and rules: Information on data formats, allowed ranges, and validation rules.
  • Business logic encapsulation: Predefined metrics and calculations, such as Total Revenue or Year-over-Year Growth, encoded into the semantic layer.

Structured metadata in a semantic layer serves as the backbone for adding relevant context to natural language queries: the layer uses it to map each query to the correct data entities.

For example, for a query like “What was the average order value last year?”, metadata helps the LLM understand that:

  • “Average order value” refers to the metric Total Sales/Total Orders.
  • “Last year” maps to a dynamic date range in the data.
  • The required tables (e.g., orders and sales) are related by a specific foreign key.
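The three mappings above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the SEMANTIC_MODEL dictionary, the table and column names (orders.order_id, sales.amount, orders.order_date), and the helper functions are all hypothetical, and the SQL expression is only one plausible reading of Total Sales/Total Orders.

```python
from datetime import date

# Hypothetical semantic-layer metadata; every name here is an assumption.
SEMANTIC_MODEL = {
    "metrics": {
        "average order value": {
            # One plausible encoding of Total Sales / Total Orders.
            "expression": "SUM(sales.amount) / COUNT(DISTINCT orders.order_id)",
            "tables": ["orders", "sales"],
        },
    },
    # Foreign-key relationship between the two required tables.
    "joins": {
        ("orders", "sales"): "orders.order_id = sales.order_id",
    },
    "time_column": "orders.order_date",
}


def last_year_range(today: date):
    """Resolve 'last year' to a half-open [start, end) date range."""
    return date(today.year - 1, 1, 1), date(today.year, 1, 1)


def build_query(metric_name: str, today: date) -> str:
    """Assemble SQL for a metric over last year using only the metadata."""
    metric = SEMANTIC_MODEL["metrics"][metric_name]
    left, right = metric["tables"]
    join_condition = SEMANTIC_MODEL["joins"][(left, right)]
    start, end = last_year_range(today)
    time_col = SEMANTIC_MODEL["time_column"]
    return (
        f"SELECT {metric['expression']} AS value "
        f"FROM {left} JOIN {right} ON {join_condition} "
        f"WHERE {time_col} >= '{start.isoformat()}' "
        f"AND {time_col} < '{end.isoformat()}'"
    )


print(build_query("average order value", date.today()))
```

Because the date range is computed at query time, the same metadata keeps answering “last year” correctly as the calendar moves on.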

Simplifying Data Complexity

An enterprise data set typically includes hundreds or even thousands of interconnected tables, each representing a portion of the overall business operations. Understanding and navigating these relationships requires expertise in data modeling, which LLMs don’t have.

A semantic layer abstracts this complexity by defining and encoding the relationships between all these tables. The layer predefines the appropriate joins based on business logic and applies them when a query is generated. In addition, it reduces computational overhead by dynamically mapping user queries to the right columns in the right tables.

Similarly, an LLM can select the single most appropriate semantic model from which to generate a response. This improves both the accuracy and the speed of translating natural language questions into queries and answering them.
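One way to picture predefined joins is as a small graph of declared relationships that is searched for the shortest join chain between two tables. The sketch below assumes a hypothetical RELATIONSHIPS map and invented table names, and uses a plain breadth-first search; a real semantic layer would likely apply richer business rules when several paths exist.

```python
from collections import deque

# Hypothetical foreign-key relationships, declared once in the semantic layer.
RELATIONSHIPS = {
    ("orders", "customers"): "orders.customer_id = customers.customer_id",
    ("orders", "order_items"): "orders.order_id = order_items.order_id",
    ("order_items", "products"): "order_items.product_id = products.product_id",
}


def _neighbors(table):
    """Yield (other_table, join_condition) for every declared relationship."""
    for (a, b), condition in RELATIONSHIPS.items():
        if a == table:
            yield b, condition
        elif b == table:
            yield a, condition


def join_path(start, target):
    """Breadth-first search for the shortest chain of joins between two tables."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == target:
            return path
        for other, condition in _neighbors(table):
            if other not in seen:
                seen.add(other)
                queue.append((other, path + [(other, condition)]))
    raise ValueError(f"no join path from {start} to {target}")


def from_clause(start, target):
    """Render the FROM clause that connects start to target."""
    joins = " ".join(f"JOIN {t} ON {c}" for t, c in join_path(start, target))
    return f"FROM {start} {joins}".strip()


print(from_clause("customers", "products"))
```

With the joins resolved this way, the LLM only has to name the tables a question needs; it never has to guess the join conditions.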

Business Metrics (KPIs) Management

Different teams in an organization may calculate the same KPI somewhat differently, leading to discrepancies in reports and conflicting insights. For example, the finance team might define “Net Revenue” as total revenue minus refunds and discounts, while the sales team might exclude discounts.

The semantic layer becomes the common framework for defining, managing, and defending these KPIs. At this layer, all KPI calculations are encoded centrally so that everyone uses the same standardized definition. Because this groundwork is already in place, LLMs can directly use the business calculations defined in the semantic layer, with far less risk that the results will be compromised, inconsistent, or incorrect.
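A central KPI definition might look like the sketch below. It assumes, purely for illustration, that the organization settles on the finance team's version of Net Revenue; the KPI_DEFINITIONS registry and the column names gross_amount, refund_amount, and discount_amount are hypothetical.

```python
# Hypothetical central KPI registry; column names are assumptions.
KPI_DEFINITIONS = {
    "net_revenue": {
        "description": "Total revenue minus refunds and discounts",
        # Each column contributes with a sign of +1 or -1.
        "components": {
            "gross_amount": 1,
            "refund_amount": -1,
            "discount_amount": -1,
        },
    },
}


def kpi_sql(name):
    """Render the one shared SQL expression for a KPI."""
    parts = []
    for column, sign in KPI_DEFINITIONS[name]["components"].items():
        prefix = "-" if sign < 0 else "+"
        parts.append(f"{prefix} SUM({column})")
    # Drop the leading "+ " so the expression starts with the first term.
    return " ".join(parts).lstrip("+ ").strip()


def kpi_value(name, rows):
    """Evaluate the same definition over in-memory rows (a list of dicts)."""
    components = KPI_DEFINITIONS[name]["components"]
    return sum(
        sign * row[column]
        for row in rows
        for column, sign in components.items()
    )


rows = [
    {"gross_amount": 100, "refund_amount": 10, "discount_amount": 5},
    {"gross_amount": 200, "refund_amount": 0, "discount_amount": 20},
]
print(kpi_sql("net_revenue"))
print(kpi_value("net_revenue", rows))
```

Both the generated SQL and the in-memory calculation read from the same registry entry, so a change to the definition reaches every consumer, human or LLM, at once.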

Security and Governance

While conversational analytics makes data easier to access, it also opens a window of risk for a possible breach. Without proper security protocols, AI systems may enable unauthorized access to critical information, and users may receive details they aren’t allowed to see. These systems need robust security behind the scenes to prevent such leaks, and a semantic layer can provide it.

As a centralized hub for defining and enforcing role-based access control policies, the semantic layer limits each user’s access to the data for which that user has predefined authorization.
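As a rough sketch of how such a policy check could sit between the LLM and the database, the example below validates the columns a generated query wants to read against a per-role policy. The roles, columns, and the :user_region placeholder are all hypothetical; real deployments typically also enforce these rules in the database or query engine itself.

```python
# Hypothetical role-based access policies held in the semantic layer.
ROLE_POLICIES = {
    "sales_rep": {
        "columns": {"order_id", "region", "amount"},
        # Row-level filter appended to every query issued for this role.
        "row_filter": "region = :user_region",
    },
    "finance_analyst": {
        "columns": {"order_id", "region", "amount", "margin"},
        "row_filter": None,
    },
}


def authorize(role, requested_columns):
    """Return the row filter for a role, or raise if a column is off-limits."""
    policy = ROLE_POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"unknown role: {role}")
    denied = sorted(set(requested_columns) - policy["columns"])
    if denied:
        raise PermissionError(f"role {role} may not read: {', '.join(denied)}")
    return policy["row_filter"]


print(authorize("sales_rep", ["order_id", "amount"]))
```

The check runs before any SQL is executed, so a question that touches an unauthorized column is rejected rather than answered with partial data.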

Transparency and Explainable AI (XAI)

With conversational analytics becoming more mainstream, there are many regulatory and ethical concerns regarding the authenticity and transparency of the insights it generates. With clear metadata definitions and centralized governance, a semantic layer can make AI-driven queries explainable.

The layer helps log how each part of a natural language question maps to a specific data entity or transformation in the SQL query, creating a transparent lineage. This traceability ensures that the AI system can explain how and why specific data sets, filters, and joins are selected for a particular query.
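A lineage log of this kind can be as simple as a list of phrase-to-entity records kept alongside the generated query. The classes and field names below are hypothetical and only sketch the idea.

```python
from dataclasses import dataclass, field


@dataclass
class LineageEntry:
    phrase: str   # fragment of the user's question
    entity: str   # column, metric, filter, or join it was mapped to
    reason: str   # why the mapping was chosen


@dataclass
class QueryTrace:
    question: str
    entries: list = field(default_factory=list)

    def record(self, phrase, entity, reason):
        """Log one mapping from the question to a data entity."""
        self.entries.append(LineageEntry(phrase, entity, reason))

    def explain(self):
        """Render the lineage as a human-readable explanation."""
        lines = [f"Question: {self.question}"]
        lines += [
            f'- "{e.phrase}" -> {e.entity} ({e.reason})' for e in self.entries
        ]
        return "\n".join(lines)


trace = QueryTrace("What was the average order value last year?")
trace.record("average order value", "metric: Total Sales/Total Orders", "defined metric")
trace.record("last year", "orders.order_date", "time filter")
print(trace.explain())
```

Storing the trace with each answer gives reviewers and auditors a record of which definitions and filters produced it.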

Resolving Context Ambiguity

One of the fundamental challenges in conversational analytics is interpreting context-less values within user queries. For example, in a query like “What were the sales for last quarter in New York?”, values like “last quarter” and “New York” lack explicit context. The system must infer which columns these values correspond to in the database, i.e., “last quarter” maps to a “time” column, and “New York” maps to a “region” or “city” column.

In this case, a semantic layer ensures that every value in a natural language query is correctly associated with the appropriate column using context-rich metadata.
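The sketch below shows one simple way such a lookup could work: the semantic layer keeps an index from known phrases and values to candidate columns, and the question is scanned against it. The index contents and column names are hypothetical, and plain substring matching stands in for whatever matching a production system would use.

```python
# Hypothetical indexes maintained by the semantic layer.
TIME_PHRASES = {
    "last quarter": "orders.order_date",
    "last year": "orders.order_date",
}
VALUE_INDEX = {
    # A value may legitimately live in more than one column.
    "new york": ["stores.city", "stores.state"],
    "boston": ["stores.city"],
}


def resolve_terms(question):
    """Map context-less values in a question to their candidate columns."""
    text = question.lower()
    resolved = {}
    for phrase, column in TIME_PHRASES.items():
        if phrase in text:
            resolved[phrase] = [column]
    for value, columns in VALUE_INDEX.items():
        if value in text:
            resolved[value] = columns
    return resolved


print(resolve_terms("What were the sales for last quarter in New York"))
```

When a value resolves to more than one candidate column, as “New York” does here, the interface can ask the user to clarify instead of guessing.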

Conclusion

The semantic layer serves as a bridge between natural language queries and actionable insights. It does so by unifying disparate data sets and simplifying data complexity. It also holds the promise of enforcing consistent business logic, expanding the use of explainable AI, and ensuring robust security and governance. With these advantages, a semantic layer is well positioned as a key enabler of trust, transparency, and efficiency in conversational analytics.

 
