TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Articles

Sunrise at the Lakehouse: Why the Future Looks Bright for the Data Lake’s Successor

A data lakehouse offers plenty of benefits -- including many that are not immediately obvious -- that mark a turning point in the evolution of data analytics.

By Billy Bosworth
July 11, 2022

After suffering from inflated expectations and well-known management challenges, data lakes have made an impressive comeback in the past few years. In fact, they have evolved so much that the industry is rapidly recognizing their full potential by giving them a new name: lakehouses. The term “lakehouse” connotes that data lakes are now robust enough to be considered on par with data warehouses. Beyond just catching up, lakehouses have hidden benefits that offer distinct advantages over the data warehouse architecture.

For Further Reading:

The Data Lakehouse: Bridging Information Gaps in the Enterprise

Welcome to the Lakehouse

Executive Q&A: Data Lakehouses

The Rise of the Data Lakehouse

Let’s look at how we got here. There are two primary catalysts for the rise of the current generation of lakehouses: application developers and the evolution of cloud storage.

First, consider developers. The world of infrastructure choices has more or less always been shaped by developer preference. Because developers write the line-of-business applications that generate revenue, they mostly get what they want in terms of tools and platforms. To build applications that function properly, developers have to fine-tune applications and databases to meet performance goals and service-level agreements (SLAs).

Once their fine-tuned applications are live, some developers get sensitive about who can touch them. However, the data from those applications is critical for data analysts. The data needs to be copied and moved from the application database to some other location -- and often combined with other data from other systems -- where it can then be analyzed.

This process of copying and moving data is largely captured under the broad heading of extract, transform, and load (ETL). To extract data means that at some level you will need to interact with the database. That’s where the application teams get nervous: they just don’t like other teams touching their databases. There’s too much at stake. Therefore, most of the time they create “data dumps” themselves. These data dumps are just extracted data stored in some common file format. Even if you are not a database person, chances are that at some point you opened a file in Excel that was in comma-separated values (CSV) format -- a common format used by application teams when extracting data from their databases.
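As a rough illustration of such a data dump, the sketch below exports a table to CSV using only Python's standard library. The in-memory SQLite database, the `orders` table, and the helper names `dump_table_to_csv` and `build_demo_db` are assumptions made for this example; the article does not describe any particular database or tooling.

```python
import csv
import io
import sqlite3


def dump_table_to_csv(conn: sqlite3.Connection, table: str) -> str:
    """Return the full contents of `table` as CSV text, header row first."""
    # Table names cannot be bound as SQL parameters, so accept plain identifiers only.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    cursor = conn.execute(f"SELECT * FROM {table}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(col[0] for col in cursor.description)  # column names
    writer.writerows(cursor)  # one CSV line per database row
    return buffer.getvalue()


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database standing in for an application database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "acme", 120.5), (2, "globex", 75.0)],
    )
    return conn


if __name__ == "__main__":
    print(dump_table_to_csv(build_demo_db(), "orders"))
```

In practice the application team would write this text to a file in shared storage rather than print it, but the shape of the output (a header row followed by one line per record) is the same.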

With modern applications, the cumulative size of these files can get large very quickly. Where will they be stored? In the last decade, the first version of data lakes stored such data in on-premises Hadoop clusters. There are many reasons why that approach has largely fallen out of favor, some rooted in the rise of competitive cloud services.

The attractiveness of cloud storage to application developer teams has soared. Examples of cloud storage are S3 on AWS, Azure Data Lake Storage on Azure, and Google Cloud Storage on Google Cloud. The appeal of these storage layers is that they are, for all practical purposes, infinitely scalable, extremely easy to interface with, and available at very low cost. Add the fact that new applications are usually cloud-native and you have an easy, effective, and inexpensive way for developers to store their data dumps.

Hidden Benefits by Design

This brings us to the first hidden benefit of lakehouses: they reduce data copies. Data engineers do not like moving and copying data, because it adds complexity to the environment every time it’s done. There are governance concerns, limitations on batch window times, intricate job dependencies, increased costs for duplicate storage and additional computation resources, questions of which data set is endorsed for the business to use, and so on.

With lakehouses, those challenges are dramatically diminished because the largest, fastest-growing data sets are those being dumped from cloud applications into cloud storage. That yields a significant benefit: you can query the data directly where it lands rather than forcing it through numerous ETL jobs and then sending it to a data warehouse for analysis.
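To make "query the data directly where it lands" concrete, here is a toy sketch that aggregates over dump files in the directory where they were written, without loading them into a second system first. A local temporary directory stands in for cloud storage, and the file names, column names, and the functions `total_by_customer` and `write_demo_dumps` are all hypothetical. A real lakehouse engine would do this at scale over object storage; this only shows the idea.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def total_by_customer(landing_dir: Path) -> dict[str, float]:
    """Aggregate `amount` per `customer` across every CSV dump in `landing_dir`.

    Files are read where they already sit; nothing is copied to a second store.
    """
    totals: dict[str, float] = defaultdict(float)
    for dump in sorted(landing_dir.glob("*.csv")):
        with dump.open(newline="") as handle:
            for row in csv.DictReader(handle):
                totals[row["customer"]] += float(row["amount"])
    return dict(totals)


def write_demo_dumps(landing_dir: Path) -> None:
    """Write two small dump files, as an application team might drop them."""
    (landing_dir / "orders_day1.csv").write_text("customer,amount\nacme,100\nglobex,40\n")
    (landing_dir / "orders_day2.csv").write_text("customer,amount\nacme,25.5\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_demo_dumps(Path(tmp))
        print(total_by_customer(Path(tmp)))
```

The point of the design is that new dump files become queryable as soon as they land in the directory, with no intermediate load job to schedule.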

The second benefit comes from the breadth of the data available for analysis. Data analysts can never get enough data. However, due to the numerous concerns about copying data already noted, only a subset of application data is typically copied into data warehouses for analysis. The data engineering teams work hard to ensure that the subset of data is what the business needs, but what if you didn’t have to worry about that at all?

With a lakehouse, you can point your data analysts to the entire data set without worrying about subsets and extracts. Data analysts really appreciate this approach, and it stops them from trying to “backdoor” the data warehouse teams in search of their own personal copies of data. However, hearing the words “entire data set” may raise a security question. Do you really want all your data available to all users? Of course not, and that is not what is being offered.

On the contrary, having a centralized lakehouse is actually quite advantageous from a security standpoint because lakehouse platforms now provide fine-grained access control. Companies can control who can see what data -- at the table level or even at the column and row level. In fact, eliminating the need to copy data into other systems is a massive security benefit. Because permissions don’t travel with data, as soon as you start copying data into data warehouses (and then create various derivatives within the data warehouse, and extracts outside the data warehouse), the IT team loses the ability to control who can access what data -- or even the ability to see who is accessing what data and when.

Security risks increase dramatically with every data copy. That’s why it makes more security sense to limit the physical locations of the data. By doing so, you are also limiting the number of security controls you have to implement.
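The column- and row-level controls mentioned above can be sketched in a few lines. This is a conceptual illustration only, not how any particular lakehouse platform implements access control: the `AccessPolicy` class, the `apply_policy` function, the sample records, and the "EU analyst" role are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical policy: which columns a role may see, and which rows."""
    allowed_columns: frozenset[str]
    row_filter: Callable[[Row], bool]


def apply_policy(rows: list[Row], policy: AccessPolicy) -> list[Row]:
    """Return only the permitted rows, each trimmed to the permitted columns."""
    return [
        {col: value for col, value in row.items() if col in policy.allowed_columns}
        for row in rows
        if policy.row_filter(row)
    ]


ORDERS: list[Row] = [
    {"id": 1, "region": "EU", "customer": "acme", "card_number": "4111-XXXX"},
    {"id": 2, "region": "US", "customer": "globex", "card_number": "5500-XXXX"},
]

# Assumed role: an EU analyst who may see EU rows but never payment details.
EU_ANALYST = AccessPolicy(
    allowed_columns=frozenset({"id", "region", "customer"}),
    row_filter=lambda row: row["region"] == "EU",
)

if __name__ == "__main__":
    print(apply_policy(ORDERS, EU_ANALYST))
```

Because the policy is applied at read time against a single copy of the data, there is one place to change it and one place to audit it, which is the security argument the section makes.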

Hidden Benefits and Open Standards

Finally, there are two additional hidden benefits available -- provided you design your lakehouse in an “open” way, which is to say you design it using open standards and open architectures.

For the entire history of databases, whenever you wanted to do real analytics, you moved your data into a query engine. Those query engines were generically called databases (and when it came to analytics, data warehouses). For them to work, you had to put your data into their engine.

With open lakehouses, that paradigm changes dramatically. Instead of bringing the data to the engine, you can now bring the engines to the data. Whether it’s a SQL query engine, a Spark engine, or a streaming engine, in an open lakehouse architecture they all have access to your data -- which lives independently of any of them -- via open standards and open formats. This helps teams avoid lock-in with a specific vendor and makes it easy to adopt new, best-of-breed engines on the horizon.
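A small sketch may help show what "bringing the engines to the data" means in practice. Below, two unrelated pieces of code, one batch-style and one streaming-style, read the same file and share nothing except its location and its format. JSON Lines is used here only because Python's standard library can read it; the article does not name a specific format, and the function names and sample records are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator


def batch_engine_total(path: Path) -> float:
    """'Batch' engine: scan the whole file and return the sum of all amounts."""
    with path.open() as handle:
        return sum(json.loads(line)["amount"] for line in handle if line.strip())


def streaming_engine_large_orders(path: Path, threshold: float) -> Iterator[dict]:
    """'Streaming' engine: yield qualifying records one at a time as they are read."""
    with path.open() as handle:
        for line in handle:
            if line.strip():
                record = json.loads(line)
                if record["amount"] >= threshold:
                    yield record


def write_demo_events(path: Path) -> None:
    """Write a small JSON Lines file standing in for data in shared storage."""
    events = [{"id": 1, "amount": 20.0}, {"id": 2, "amount": 250.0}, {"id": 3, "amount": 5.0}]
    path.write_text("".join(json.dumps(event) + "\n" for event in events))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "orders.jsonl"
        write_demo_events(data)
        print(batch_engine_total(data))
        print(list(streaming_engine_large_orders(data, threshold=100)))
```

Neither function owns the data or knows the other exists, so either could be replaced by a different engine without touching the file.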

Avoiding unnecessary copies, giving analysts the full breadth of the data set, avoiding vendor lock-in, and adopting modern data engines are all powerful hidden benefits that a company can derive from a well-designed, open lakehouse. These benefits mark a turning point in the evolution of data analytics in service of growing business value. The lakehouse is where accessible data now lives independently from any particular vendor and in an architecture ready for whatever new cloud services the future may bring.

About the Author

Billy Bosworth has been in the tech industry for over 30 years in roles ranging from engineer to CEO to public company board member. He has served as the CEO of Dremio Corporation, a privately held company in the data analytics market, since February 2020. Prior to joining Dremio, Billy served as the CEO of DataStax, Inc. Billy frequently writes and speaks on topics such as data autonomy, data analytics, and BI, and as a coach at heart, he also speaks broadly on career management and leadership. You can find out more about Billy on Dremio’s website or LinkedIn. You can follow Dremio on Twitter.
