TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Articles

Data-Architectural Futures

What will future data architectures look like? You can bet that something like data federation will be in the mix. Because both data storage and data processing are -- and will continue to be -- highly distributed, federation is inevitable.

January 10, 2017

What will the data architectures of the future look like? You can bet that something like data federation will be in the mix.

Federation is an old concept that hasn't quite shrugged off its baggage. When it first emerged 15 years ago, vendors -- and even some data management practitioners -- championed it as a replacement for traditional data warehouse architecture.

This was wildly misguided; however, a decade and a half later, similar concepts live on in data virtualization, as well as in products such as Teradata's QueryGrid. IBM, Oracle, and SAP-Sybase not only market data federation technologies but have also introduced federation-like capabilities in their flagship RDBMSs.

Why Data Federation?

Because both data storage and data processing are -- and will continue to be -- highly distributed, something like data federation is inevitable. Data lives in more and different places today than at any other time in human history.

The architectural and platform innovations that were supposed to eliminate this distribution -- the data warehouse and the Hadoop data lake -- have failed. Data still lives in legacy repositories and black box applications, operational data stores and sandboxes, a new generation of streaming repositories, cloud apps and services, and a slew of disparate -- in some cases unknown -- silos.

Data lives in so many places because data use and consumption paradigms have changed, argues Mark Madsen, a research analyst with information management consultancy Third Nature. This is particularly true of how organizations create and use analytics -- the data warehouse was designed for a paradigm in which users passively consumed reports and dashboards. It worked because it centralized both data and access, and its data was extracted from upstream systems and transformed to conform to a predefined schema.

In contrast, the advanced analytics use cases of today are characterized by open-ended exploration and much deeper (usually iterative) data analysis, Madsen says. Unlike the reports and dashboards that were the bread and butter of the data warehouse, exploratory uses have unpredictable data and processing needs.

Data Movement's the Thing

As we retool our organizations for advanced analytics, we're increasingly confronting the problem of accessing data and making it available for analysis. At its core, this is an issue of data movement. The problem is that the physics of moving data at big data scale is incredibly daunting.

According to Madsen and other experts, data movement will be one of the biggest problems going forward. The challenge is not just moving data but minimizing how much must be moved. This requires shifting the data engineering workload -- the preparation and transformation of data -- to the systems on which the data to be moved "lives." Instead of moving a large volume of data en bloc, data is processed in place so only a small subset of data is actually moved.
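To make the contrast between moving data en bloc and processing it in place concrete, here is a minimal sketch. It assumes an in-memory SQLite database standing in for a remote source system; the `shipments` table, its columns, and both function names are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical "source system": an in-memory SQLite database standing in
# for a remote platform that holds raw data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE shipments (region TEXT, weight_kg REAL)")
source.executemany(
    "INSERT INTO shipments VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), ("west", 7.5), ("west", 2.5), ("west", 1.0)],
)


def move_en_bloc(conn):
    """Pull every raw row to the consumer, then aggregate locally."""
    rows = conn.execute("SELECT region, weight_kg FROM shipments").fetchall()
    totals = {}
    for region, weight in rows:
        totals[region] = totals.get(region, 0.0) + weight
    return totals, len(rows)


def process_in_place(conn):
    """Ask the source to aggregate, so only the summary rows are moved."""
    rows = conn.execute(
        "SELECT region, SUM(weight_kg) FROM shipments GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


bulk_totals, bulk_rows_moved = move_en_bloc(source)
pushed_totals, pushed_rows_moved = process_in_place(source)

print(bulk_totals == pushed_totals)        # same answer either way
print(bulk_rows_moved, pushed_rows_moved)  # rows that crossed the "wire"
```

Both approaches produce the same totals, but the second returns one row per region rather than one row per shipment. At real data volumes that difference in rows moved is the whole point.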

Philip Russom, senior director of data management for TDWI Research, tackles this issue in a new TDWI Checklist Report, Evolving Toward the Modern Data Warehouse. Russom sees a more central (if radically transformed) role for the data warehouse than does Madsen, but he likewise zeroes in on data distribution -- and the attendant issue of data movement -- as a challenging problem.

"The trick is integrating big data or data lake platforms and an RDBMS so they work together optimally. For example, an emerging best practice ... is to manage diverse big data in [the Hadoop Distributed File System] but process it and move the results ... to RDBMSs ... that are more conducive to SQL-based analytics," Russom writes.

"This requires new interfaces and interoperability between big data or data lake platforms and RDBMSs, and it requires integration at the semantic layer in which all data -- even multistructured, file-based data in Hadoop or Spark -- looks relational," he argues. "This is the secret sauce that unifies the RDBMS/big data and data lake architecture. It enables distributed queries based on standard SQL that simultaneously access data in the warehouse, HDFS, and elsewhere without preprocessing data to remodel or relocate it."

What Russom describes sounds an awful lot like data federation -- or rather its replacement, data virtualization.
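A toy version of the kind of distributed SQL query Russom describes can be sketched with SQLite's `ATTACH DATABASE`, which lets a single statement join tables that live in separate databases. This is only an analogy: the attached database stands in for a second platform such as a data lake, and the `customers` and `clicks` tables are hypothetical.

```python
import sqlite3

# One connection plays the role of the query layer. The attached database
# is a hypothetical second platform (for example, a data lake).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS lake")

# "Warehouse" data lives in the main database.
conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO main.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)

# "Lake" data lives in the attached database.
conn.execute("CREATE TABLE lake.clicks (customer_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO lake.clicks VALUES (?, ?)",
    [(1, "/home"), (1, "/pricing"), (2, "/home")],
)

# A single standard SQL statement spans both stores; neither table is
# remodeled or copied into the other store first.
result = conn.execute(
    """
    SELECT c.name, COUNT(*) AS clicks
    FROM main.customers AS c
    JOIN lake.clicks AS k ON k.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

print(result)  # [('Acme', 2), ('Globex', 1)]
```

In a real environment the two stores would sit on different platforms, and the semantic layer would have to make file-based data look relational before a join like this could run.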

Third Nature's Madsen expands on this point.

By virtue of the diversity and complexity of analytics workloads, Madsen says, the data warehouse is now just one among several environments for analytics. Analytics sandboxes -- in the form of standalone RDBMS systems, small Hadoop (or Hadoop/Spark) clusters, and data lakes -- are increasingly common. So are other repositories of record, from the data lake itself to streaming repositories to graph database systems to (effectively limitless) cloud storage services.

The modern data warehouse must be able to get data from, and share data with, all of these platforms. "Data movement in the new analytics environment is bidirectional. Think about it. Data lives in a variety of sources. It doesn't just flow from these sources. In some cases, for example, you might want to push new or aggregated data back to those sources. The upshot is that analysts will often initiate data movement from different systems at different times," he points out.

"There is no 'center' in the new environment. Every system in it is a possible source of data and a possible source of queries to other systems for data. Data movement requires a fabric, not a one-way connector or a retrieval mechanism that only works from one location."

Federation by Any Other Name

Russom doesn't call the secret sauce he refers to "federation." Madsen, too, shies away from the term. This is because the core problem they're describing isn't strictly one of federated query -- the raison d'être of data federation and virtualization. The core problem is, instead, least-cost data movement.

Least-cost data movement is a strategy for pushing data transformations and other aspects of data preparation up or down to source or target systems. This is more involved than data federation or virtualization. A good illustration of an approach that substantively tackles this problem is Teradata's QueryGrid.

On the one hand, QueryGrid does something similar to database links in Oracle -- it provides a means to transparently redirect queries to distributed RDBMSs. On the other hand, QueryGrid is a least-cost data movement technology. It's a scheme for transparently shifting data processing to where data lives -- in DBMSs or data sources. More important, QueryGrid is cooperative. It can push processing out to non-Teradata platforms, such as MongoDB, Hadoop, and Spark, as well as to non-Teradata RDBMSs.
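Independent of any particular product, the least-cost idea can be sketched as a choice among candidate plans. The sketch below is not QueryGrid's algorithm. It assumes that the only cost is bytes shipped between systems, and the `Plan` class, function names, and sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    bytes_moved: int


def candidate_plans(left_bytes, right_bytes, result_bytes):
    """List where a two-system join could run and what each choice ships."""
    return [
        # Copy both inputs to the consumer and join there.
        Plan("ship both inputs to the consumer", left_bytes + right_bytes),
        # Ship the right input to the left system, then return the result.
        Plan("run on the left system", right_bytes + result_bytes),
        # Ship the left input to the right system, then return the result.
        Plan("run on the right system", left_bytes + result_bytes),
    ]


def least_cost(plans):
    """Pick the plan that moves the fewest bytes (the assumed cost model)."""
    return min(plans, key=lambda plan: plan.bytes_moved)


plans = candidate_plans(
    left_bytes=500_000_000,  # large table on the left system
    right_bytes=2_000_000,   # small table on the right system
    result_bytes=50_000,     # small joined and aggregated result
)
best = least_cost(plans)
print(best.name, best.bytes_moved)  # run on the left system 2050000
```

With these sizes the cheapest plan sends the small table to where the large one lives and returns only the result. A real optimizer would also weigh factors such as processing capacity and network latency on each system.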

Bill Grenwelge, a technical advisor with FedEx Services, says QueryGrid will permit FedEx to simplify and optimize its data architecture. Instead of an emphasis on moving data (as with ETL), QueryGrid permits FedEx to move just enough data. The difference is critical, Grenwelge says.

"It's going to enable our users to do things they never could before. From our perspective, it's an opportunity to maybe leave the data where it sits. For example, to do these reports, I don't need to pull data from that platform over there. If your data sits in a separate platform, [QueryGrid is a means to] just grab the information you need -- no more, no less. It aggregates and generates [data] in a summary table over there so that I can create a report from it," he explains.

This has additional benefits, too, Grenwelge says. "QueryGrid is going to give us the opportunity to leave data where it needs to be or where it already is and then utilize it from there in a more efficient manner. I can cut down on the replication and slim down some of my databases because I don't need to have my data replicated unless it's a disaster recovery scenario," he says.

True, QueryGrid is a Teradata-centric technology. It's likewise very much a work in progress. Teradata continues to develop it and revamp it, with a QueryGrid 2.0 release that represents a significant improvement over version 1.0. This is particularly true of QueryGrid's support for bidirectional data movement. Teradata-centric or not, QueryGrid is a more cooperative solution than other approaches.

It's likewise a lighter-weight alternative to full-fledged data virtualization and, in its capacity to push data processing workloads up or down to source or target systems, a more pragmatic one. As a technology for both federated query and least-cost data movement, QueryGrid anticipates the data fabric or synthetic data architecture that will knit together the enterprises (or data centers) of the future.
