TDWI Blog: Data 360

Sparks are Flying in 2015

By David Stodder, TDWI Director of Research for Business Intelligence

We are past the halfway point of 2015. Major League Baseball is celebrating its all-stars in Cincinnati as teams contemplate trades that they hope will make them stronger for the second-half run. Meanwhile, fall sports are starting to stir; National Football League teams open their training camps around the end of the month. Even pumpkin farmers are aware of time passing: to have fully grown pumpkins for Halloween, they need to have their seeds planted by now. While the air is warm and the sun is still high in the sky, it’s a good time to contemplate significant trends in our industry this year.

The top trend on my list is the flourishing of Apache Spark, the open source parallel processing framework (or “engine”) for developing analytic applications and systems that work with big data. If Spark “went supernova in 2014,” as Stephen Swoyer put it in a fine article earlier this year, the energy from that explosion is generating a great deal of industry activity in 2015. And not just among small, newer vendors: IBM, Intel, Microsoft, and other mainstream vendors have already issued major Spark announcements and product releases this year, with more to come. Describing Spark’s potential impact, IBM experts have called Spark “the next Linux.”

As I learned at Strata in February, and even more at the Spark Summit in June, Spark is shaking up the big data realm, which has been dominated by Hadoop, MapReduce, Hive, and Storm technologies. While compatible with these technologies, Spark offers performance and scalability advantages over them, including support for multi-step pipelines that reduce the time spent waiting for each step to complete, and support for in-memory data sharing.

One of Spark’s most important attributes is its unified approach to managing and interacting with a greater diversity of data. The Spark framework can support not only batch processing à la Hadoop but also interactive SQL, real-time processing, machine learning, and stream analytics. At Strata, I met with Matei Zaharia, CTO of Databricks, which was founded by Zaharia and other members of the University of California, Berkeley’s AMPLab team that created Spark and launched it as an Apache project. He did not envision organizations being satisfied with putting all their data into massive Hadoop data lakes; instead, he saw increasing diversity in the data sources that users seek to access, which requires the unified framework and processing layer that Spark provides.

Spark has changed the parameters of the debate about how users of SQL-based business intelligence and visual analytics tools and applications might access big data. With Spark SQL, one of the four primary AMPLab-developed libraries that fit into the Spark framework, organizations could bypass some of the steps that have been necessary to move and transform Hadoop files into data warehouses before the data can be fully analyzed. Application programming interfaces such as SparkR, for R language programming, are broadening the toolkit available for analytics.

Spark is not as mature as Hadoop or the SQL-on-Hadoop offerings in the market. Spark is also not the only “star” in the open source interactive analytic SQL query galaxy; Presto, which is now strongly backed by Teradata, is another interesting distributed SQL query engine to watch. All of these technologies are enabling organizations to do broader and deeper analytics with data and are becoming important parts of emerging, diverse “hybrid” data architectures (pardon a shameless plug: this topic will be covered at our Solution Summit in Scottsdale later this year).

Spark is a major trend in 2015. What are other trends you are seeing? I would be interested to hear your thoughts.

Hyperlinks embedded in this blog post:

Apache Spark: https://spark.apache.org/

Swoyer article: http://tdwi.org/articles/2015/01/06/apache-spark-next-big-thing.aspx

IBM announcement: https://www-03.ibm.com/press/us/en/pressrelease/47107.wss

Intel: https://software.intel.com/sites/campaigns/sparks/IgnitingSparks.php

Microsoft: http://azure.microsoft.com/blog/2015/07/10/interactive-analytics-on-big-data-with-the-release-of-spark-for-azure-hdinsight/

“the next Linux”: https://youtu.be/CrGB_2GJ-fA

Strata: http://strataconf.com/

Spark Summit: https://spark-summit.org/

Databricks: http://www.databricks.com/

AMPLab: https://amplab.cs.berkeley.edu/