Reshaping Data Management for a Hybrid, Multicloud World
Three trends that were at the forefront of both the Cloudera Analysts conference and Strata Data.
- By David Stodder
- November 25, 2019
As a longtime participant in tech conferences at the Jacob K. Javits Convention Center on the west side of Manhattan in New York City, I found this September's walk to the O'Reilly Strata Data Conference startling. In the old days, the walk took me through a ramshackle district of taxi cab garages and repair shops, vacant lots, hole-in-the-wall bars, and an assortment of humble workshops and warehouses. Now it takes me through the brand-new, not-quite-finished Hudson Yards, a land of gleaming, towering, mirrored skyscrapers and ultra-modern shops and restaurants.
This grand cityscape outside the Javits Center fit with the changes going on inside. Strata used to be focused on Apache Hadoop, its ecosystem of technologies, and the data science that drove leading-edge big data use cases. You would leave with a headful of code and a pocketful of stuffed yellow elephants -- the mascot of Hadoop. Now, Strata Data's scope encompasses cloud computing, artificial intelligence, and diverse data architectures, with Hadoop among them but hardly mentioned.
Yet, Hadoop is not forgotten. At the Cloudera Analysts conference that preceded Strata Data, Cloudera chief product officer (and Hortonworks co-founder) Arun C. Murthy described four areas where "the Hadoop philosophy" still guides the development of data platforms:
- Disaggregated software stack (of storage, compute, security, governance, and more)
- Extremely large scale, using distributed systems and commodity infrastructure, hardware, and the cloud
- Open source and open data standards
- An evolving ecosystem that can include diverse technologies and enable independent innovation at every layer
The Hadoop philosophy (not to mention Cloudera's and Hortonworks' Hadoop runtime distributions) still lives inside the Cloudera Data Platform (CDP), introduced in June but more fully described and fleshed out with services at Strata. Since the merger with Hortonworks was announced in October 2018, the two companies' customers, the technology industry, and concerned financial investors have all been watching to see how well the combined entity would articulate and execute its strategy. Cloudera has had to adjust as the industry landscape has shifted rapidly to the cloud, where dominant providers such as Amazon, Google, and Microsoft offer their own Hadoop and Apache Spark services.
The comprehensive CDP and its services, including the newly announced cloud-native Cloudera Data Warehouse, have repositioned Cloudera as "the enterprise data cloud company," to use its own description. Cloudera has shifted its center of gravity to the cloud, but with customers still invested in on-premises systems, it is taking a hybrid, multicloud approach that offers unified management across on-premises and cloud-based systems. CDP works with open-source Kubernetes container management and orchestration to enable easier integration and portability. CDP aims to support five secure and governed self-service experiences: flow and streaming, data engineering, operational database, machine learning, and data warehouse.
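Kubernetes's own client APIs make the portability point concrete. The following is a minimal sketch, not Cloudera's tooling: using the official Kubernetes Python client, the same code inspects an on-premises cluster and a cloud cluster simply by switching kubeconfig contexts (the context names here are assumed for illustration).

```python
# A minimal sketch: the same Kubernetes API works across on-prem and
# cloud clusters, which is what makes containerized services portable.
# Context names ("onprem", "aws-east") are hypothetical; substitute
# whatever contexts your kubeconfig actually defines.
from kubernetes import client, config

def list_pods(context_name: str) -> None:
    # Load credentials for the chosen cluster from ~/.kube/config
    config.load_kube_config(context=context_name)
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace="default")
    print(f"[{context_name}] {len(pods.items)} pods in 'default':")
    for pod in pods.items:
        print(f"  {pod.metadata.name}: {pod.status.phase}")

# Identical code path against two very different environments
for ctx in ("onprem", "aws-east"):
    list_pods(ctx)
```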
Cloudera Data Warehouse (CDW) applies containers to enable faster and easier creation of virtual data warehouses. Instead of requiring data engineers to work at a lower level to set up Impala clusters, Cloudera aims to elevate the user experience toward simply declaring a "T-shirt size": that is, preset requests that the system can interpret to adaptively scale, auto-provision, and optimize resources for the workloads. At the Analysts conference, Cloudera described optimization capabilities for "bursting" on-premises workloads, data, metadata, and more to the cloud to make the transition faster and more in tune with the elasticity organizations are seeking from cloud deployments.
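Cloudera did not detail an API for this at the event, so here is a purely hypothetical sketch of the idea in Python: the user declares a size, and the platform translates it into a concrete resource specification. All names, sizes, and mappings below are invented for illustration.

```python
# Hypothetical illustration of "T-shirt size" provisioning: the user
# declares intent; the platform translates it into cluster resources.
# None of these names come from Cloudera's actual API.

SIZES = {
    # size: (executor nodes, vCPUs per node, memory GB per node)
    "S":  (2,  4,  16),
    "M":  (4,  8,  32),
    "L":  (8, 16,  64),
    "XL": (16, 32, 128),
}

def provision_virtual_warehouse(name: str, size: str, autoscale: bool = True) -> dict:
    """Translate a declarative size request into a concrete resource spec."""
    nodes, vcpus, mem_gb = SIZES[size]
    return {
        "name": name,
        "nodes": nodes,
        "vcpus_per_node": vcpus,
        "memory_gb_per_node": mem_gb,
        # With autoscaling on, the platform (not the engineer) decides
        # when to grow or shrink within these bounds.
        "autoscale": {"min_nodes": nodes, "max_nodes": nodes * 2} if autoscale else None,
    }

print(provision_virtual_warehouse("sales-reporting", "M"))
```

The point of the declarative style is that the mapping table belongs to the platform, so it can be retuned (or learned) without users changing how they ask for capacity.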
Three Important Trends
There's more to talk about regarding Cloudera's announcements, but I would like to focus the rest of this article on three trends that were top of mind at both the Cloudera Analysts conference and Strata Data.
Separating computation from data storage. The growing use of containers is important to the trend toward flexible systems that use layered architectures and allow independent selection of data storage services and computation resources, such as the number and type of processing units working in clusters. Over the history of database systems, the pendulum has swung back and forth between tightly coupled computation and storage and more loosely coupled systems. The Hadoop Distributed File System (HDFS), which largely uses direct-attached storage (DAS), has been tightly coupled to avoid the latency that grows when big data must move from storage to computation resources. However, as they shift to cloud data architectures, organizations need greater flexibility to choose which type and how much computation they need to handle a particular workload. They also need flexibility on the data storage side so they can switch to newer technologies and position their data to meet hot, medium, and cold levels of access demand.
Today, faster networks are facilitating looser coupling, where it matters less where the data is stored if it can be moved, replicated, or accessed quickly. Along with faster networks, the other factor driving separation is the use of scalable object storage such as Amazon S3, Google Cloud Storage, and Microsoft Azure Storage, which enables organizations to replicate data across locations more easily. Many organizations are moving their data lakes, for example, to object storage in the cloud.
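The separation shows up in everyday code. In this minimal sketch using the standard boto3 client for Amazon S3, any machine with credentials can fetch the object, so the compute tier, whether a laptop, a Spark cluster, or a fleet of containers, is chosen independently of where the data lives (the bucket and key names are placeholders):

```python
# Minimal sketch of compute/storage separation: any machine with
# credentials can pull this object; the compute tier is chosen
# independently of where the data lives. Bucket/key are placeholders.
import boto3

s3 = boto3.client("s3")  # credentials come from the environment

obj = s3.get_object(Bucket="example-data-lake", Key="events/2019/09/part-0000.csv")
body = obj["Body"].read().decode("utf-8")

# The "compute" here is trivial, but it could just as well be a
# Spark cluster or a fleet of containers reading the same object.
rows = body.splitlines()
print(f"Fetched {len(rows)} rows from object storage")
```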
However, looser coupling across hybrid, multicloud environments can expose data systems to performance and accessibility problems, as well as issues such as excessive redundancy. Data must be managed differently to ensure performance efficiency and quality.
This is likely to drive demand for data orchestration, which "brings speed and agility to big data and AI workloads and reduces costs by eliminating data duplication and enables users to move to newer storage solutions such as object stores," according to Alluxio, a solution provider I met with at Strata Data. "Data orchestration is to data what container orchestration is to containers." Alluxio, based on the research project "Tachyon," is an open-source virtual distributed file system that offers a data abstraction layer and interface between computation frameworks and storage.
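Setting aside Alluxio's actual interfaces, the concept of a data abstraction layer can be sketched generically: computation addresses a single logical namespace, and the orchestration layer decides which physical store and cache tier serves each read. Everything in this sketch, classes and mount paths included, is hypothetical.

```python
# Hypothetical sketch of a data-orchestration layer: one logical
# namespace in front of several physical stores, with a hot cache
# to avoid repeatedly pulling the same bytes over the network.
# This illustrates the concept, not Alluxio's actual implementation.

class OrchestrationLayer:
    def __init__(self):
        self.mounts = {}   # logical prefix -> backing store
        self.cache = {}    # logical path -> cached bytes ("hot" tier)

    def mount(self, prefix: str, store) -> None:
        self.mounts[prefix] = store

    def read(self, path: str) -> bytes:
        if path in self.cache:                # hot: serve from cache
            return self.cache[path]
        for prefix, store in self.mounts.items():
            if path.startswith(prefix):       # cold: fetch from backing store
                data = store.read(path[len(prefix):])
                self.cache[path] = data       # promote to hot tier
                return data
        raise FileNotFoundError(path)

# Compute frameworks see one namespace, regardless of where data lives:
#   fs = OrchestrationLayer()
#   fs.mount("/warehouse/", S3Store("my-bucket"))      # cloud object store
#   fs.mount("/archive/", HdfsStore("namenode:8020"))  # on-prem HDFS
#   fs.read("/warehouse/orders/part-0000.parquet")
```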
As hybrid, multicloud environments grow, we will see other data management and integration solutions introduced to address how organizations can avoid swamping networks with massive lift-and-shift data migration to the cloud. Orchestration can also help organizations adhere to regulations that require them to be highly selective about what data gets migrated to the cloud and for which workloads.
Metadata and data catalogs. Knowledge about how data is defined, its lineage, and how it is related to other data is crucial to getting value from data, whether through visual data exploration, advanced analytics, or AI and machine learning. Virtualized, loosely coupled, and distributed systems, precisely because they are not tightly integrated, especially need access to good, centralized metadata to coordinate data meaning and collaborative understanding across users and applications. Though their offerings differ, solution providers such as Alation, Cambridge Semantics, Collibra, and Waterline Data are gaining prominence by variously providing smart, AI-augmented data catalog development and management, faster data discovery, and more self-service, business-driven examination of data relationships. This is a key area for modern, diverse data architectures.
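To make the idea tangible, here is an illustrative sketch of the kind of knowledge a catalog entry centralizes: definition, lineage, and relationships. The structure and field names are invented, not any vendor's schema.

```python
# Illustrative sketch of a minimal data catalog entry; field names
# are invented for this example, not any vendor's actual schema.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str                      # logical dataset name
    definition: str                # business meaning of the data
    lineage: list = field(default_factory=list)   # upstream sources
    related: list = field(default_factory=list)   # associated datasets

orders = CatalogEntry(
    name="analytics.orders_daily",
    definition="One row per order, aggregated to day grain in UTC.",
    lineage=["raw.orders_stream", "raw.currency_rates"],
    related=["analytics.customers", "analytics.returns_daily"],
)

# A catalog lets users (and AI-driven discovery tools) answer questions
# like "where did this come from?" without reading pipeline code:
print(f"{orders.name} derives from: {', '.join(orders.lineage)}")
```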
Data quality and governance. These two areas are "mom and apple pie" in that they are important to the success of every kind of data management system and BI or analytics application, whether strictly on premises or in a hybrid multicloud environment. Yet, obviously, the latter scenario brings new potential exposures to data inconsistency, redundancy, and poor governance. Organizations need data preparation, governance, and integration solutions that enable them to control what data is migrated where. I met with Trifacta at Strata Data, where the company announced new data quality assessment, remediation, and monitoring solutions. Trifacta and other vendors are using AI techniques such as machine learning to enable data profiling and cleansing for data of higher volume, speed, and variety.
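The profiling these tools automate can be approximated by hand. This minimal pandas sketch computes the basic signals (null rates, cardinality, and simple rule checks) that a quality-monitoring job would track at far larger scale; the sample records are invented.

```python
# Minimal data-profiling sketch with pandas: the basic signals that
# quality tools compute (and then learn thresholds for) at much
# larger scale. The sample records are invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [101, 102, 103, 103, None],
    "country":     ["US", "us", "DE", "DE", "FR"],
    "order_total": [25.0, 310.5, None, 42.0, 18.75],
})

profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "null_rate": df.isna().mean(),   # missing-value exposure
    "distinct": df.nunique(),        # cardinality (spots dupes, casing drift)
})
print(profile)

# Simple rule-based checks a monitoring job might run on every load:
assert df["order_total"].dropna().ge(0).all(), "negative order totals"
duplicates = df["customer_id"].dropna().duplicated().sum()
print(f"{duplicates} duplicate customer_id values")
```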
The Cloud: Reshaping Views of Data
Just as Hudson Yards' mirrored skyscrapers are changing the look and feel of New York City's West Side, cloud platforms are reshaping how organizations need to view data management and architecture. Older ways of preparing, integrating, and governing data are proving inadequate in hybrid, multicloud environments that are expanding in data volume, speed, and variety. Before leaping into the cloud with both feet, organizations need to assess their readiness and evaluate solutions that might better fit the new data landscape.