TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Articles

The Growing Need for SQL for Hadoop

Experienced data management professionals are adopting Hadoop, and they want to use their SQL skills and tools with Hadoop data.

By Philip Russom
March 6, 2017

TDWI research shows that data warehouse professionals are increasingly adopting Hadoop as part of a modern multiplatform data warehouse. A TDWI survey that ran in 2012 found that 10 percent of data warehouse programs canvassed had Hadoop in production in their warehouse environments. Note that this is a count of production systems, not prototypes or skunkworks. Subsequent surveys that repeated the same question saw production systems increase to 14 percent in 2014 and to 20 percent in 2016.

SQL is one of many challenges that data warehouse professionals face as they adopt Hadoop. Data warehouse and other data management professionals want to use their SQL skills and tools with Hadoop data. They have valuable experience being productive with SQL, and they have portfolios of tools that access and operate on data via SQL queries and interfaces. The catch is that Hadoop is still weak in relational functions in general, and that includes SQL.

Why Is SQL Critical to Hadoop?

Now I know what you're thinking. Why should anyone need a legacy technology such as SQL on a shiny new platform like Hadoop?

Well, SQL recently passed its thirtieth birthday as a formal standard, and it's as relevant as ever, regardless of platform type.

SQL continues to be the language of data. Other query languages (namely Object Query Language and XQuery) tried to displace SQL and failed. Hive is a popular tool among Hadoop users, but most of those users are application developers. Data management professionals have no trouble learning HiveQL, but they prefer standard SQL, and their tools do too.

SQL is familiar and it works. Many people (both business and technical) have SQL skills and use them daily.

Numerous tools that could be used with a Hadoop-based data lake support ANSI SQL. These include tools for exploration, reporting, analytics, visualization, data integration, and data quality. Many users already have these in their software portfolios, and they wish to use them with a data lake.

Data exploration is a top priority for many Hadoop-based data lake users. Most of them want to explore the lake via ad hoc queries based on ANSI SQL. Without SQL, exploration is harder and slower.
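As a rough illustration of what such an ad hoc exploration query looks like, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in SQL engine, not Hadoop or any SQL-for-Hadoop product. The table name web_events, its columns, and the sample rows are all hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical sample rows standing in for data that would live in a data lake.
rows = [
    ("2017-01-03", "search", 120),
    ("2017-01-03", "checkout", 45),
    ("2017-01-04", "search", 98),
    ("2017-01-04", "checkout", 61),
    ("2017-01-04", "support", 12),
]

# In-memory SQLite database: a stand-in engine for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE web_events (event_date TEXT, event_type TEXT, hits INTEGER)"
)
conn.executemany("INSERT INTO web_events VALUES (?, ?, ?)", rows)

# An ad hoc exploration query written in plain, portable SQL:
# total hits per event type, keeping only the busier types.
AD_HOC_QUERY = """
    SELECT event_type, SUM(hits) AS total_hits
    FROM web_events
    GROUP BY event_type
    HAVING SUM(hits) > 50
    ORDER BY total_hits DESC
"""

result = conn.execute(AD_HOC_QUERY).fetchall()
for event_type, total_hits in result:
    print(event_type, total_hits)
```

The point of the sketch is the query text itself: a user or tool that already speaks standard SQL can pose this kind of question without learning a platform-specific language, provided the platform underneath accepts it.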

SQL on and off Hadoop

One of the hottest debates concerning Hadoop and SQL is about the distinction between:

SQL on Hadoop: SQL executes natively as a process inside Hadoop
SQL off Hadoop: Tools outside Hadoop query Hadoop data

TDWI discusses the two approaches together using the phrase SQL for Hadoop. Our recent data lake survey asked Hadoop users to gauge their need for SQL for Hadoop.

Almost all Hadoop users need some kind of SQL support. A mere 1 percent of survey respondents say they have no need for SQL for Hadoop. This is not surprising -- most of Hadoop's use cases involve practices that tend to be wedded to SQL, namely decision making, exploration, and analytics.

The majority need both SQL on Hadoop and SQL off Hadoop. Among respondents who need only one approach, SQL on Hadoop (21 percent) outpaces SQL off Hadoop (11 percent) by roughly two to one. Still, most Hadoop users surveyed (67 percent) need both.
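The survey percentages quoted above can be tallied in a few lines. This is only a worked check of the arithmetic, assuming the four response categories are mutually exclusive and exhaustive (they do sum to 100). The dictionary keys are labels made up for this sketch, not the survey's own wording.

```python
# Survey shares (percent of respondents) as quoted in the article.
# Key names are hypothetical labels chosen for this example.
shares = {
    "no_need": 1,        # no need for SQL for Hadoop
    "sql_on_only": 21,   # need SQL on Hadoop only
    "sql_off_only": 11,  # need SQL off Hadoop only
    "both": 67,          # need both approaches
}

total = sum(shares.values())

# Respondents who need SQL support of any kind.
need_any_sql = total - shares["no_need"]

# Respondents who need each approach, alone or in combination.
need_on = shares["sql_on_only"] + shares["both"]
need_off = shares["sql_off_only"] + shares["both"]

print(f"total: {total}")
print(f"need any SQL for Hadoop: {need_any_sql}")
print(f"need SQL on Hadoop in some form: {need_on}")
print(f"need SQL off Hadoop in some form: {need_off}")
```

By this tally, 99 percent of respondents need some kind of SQL support, 88 percent need SQL on Hadoop in some form, and 78 percent need SQL off Hadoop in some form, so most implementations have to support both.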

This makes sense. A Hadoop implementation may need to satisfy the requirements of diverse user types, which leads to multiple, diverse approaches to data and tools. Furthermore, a range of data integration and quality tools rely on SQL, and a data lake needs these for data ingestion and data flows. Even traditional data warehouse and reporting environments include multiple SQL-based tools for the same reasons.

Some emerging practices for self-service with Hadoop data, including data exploration, self-service analytics, visualization, and data prep, assume SQL. Without SQL on Hadoop, self-service is limited, which in turn limits the range of user types who can get business value from Hadoop data.

To be fair and complete, let's recall that there are also many viable use cases for Hadoop that do not require SQL or any kind of relational function. For example, advanced analytics takes many forms, and those forms are enabled by technologies for data mining, graph analysis, statistics, associative engines, semantics, and natural language processing.

Further Study

Keep an eye out for my next TDWI Best Practices Report, Data Lakes: Purposes, Practices, Patterns, and Platforms. TDWI will publish it at www.tdwi.org/bpreports on or about April 1, 2017. In the meantime, to hear other views about SQL on Hadoop, replay the 2016 TDWI Webinar, "SQL for Hadoop: When to Use Which Approach," online at www.tdwi.org/webinars.

About the Author

Philip Russom is director of TDWI Research for data management and oversees many of TDWI’s research-oriented publications, services, and events. He is a well-known figure in data warehousing and business intelligence, having published over 600 research reports, magazine articles, opinion columns, speeches, Webinars, and more. Before joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and BI consultant and was a contributing editor with leading IT magazines. Before that, Russom worked in technical and marketing positions for various database vendors. You can reach him at prussom@tdwi.org, @prussom on Twitter, and on LinkedIn at linkedin.com/in/philiprussom.
