The Surprisingly Strong Case for Mainframe-based Analytics
Some of us in data management might be finished with the mainframe, but the mainframe isn't finished with us.
- By Steve Swoyer
- May 4, 2016
Some of us might be finished with the mainframe, but the mainframe isn't finished with us.
Mainframe mainstay IBM Corp. underscored this point when it announced its new z/OS Platform for Apache Spark, which permits mainframe users to run Spark workloads in z/OS, IBM's premier mainframe operating system. The z/OS Platform for Apache Spark is one of several recent analytics-oriented offerings for Big Blue's System z mainframe.
IBM's case for System z's relevance is pretty straightforward: if you already have a mainframe in-house, and you're continuing to invest in that system, why not make it the centerpiece of your business intelligence (BI) and data warehousing (DW) efforts?
This isn't a weak argument. Consider the raft of analytic products or features IBM now offers for System z:
- IBM z/OS Platform for Apache Spark
- IBM DB2 Analytics Accelerator for z/OS
- IBM Smart Analytics Optimizer for DB2 for z/OS
- IBM InfoSphere BigInsights for Linux for System z
- IBM Cognos BI for z/OS
- IBM SPSS for Linux for System z
Some of these run in Linux on System z (zLinux); others run natively in z/OS. That distinction matters: porting and certifying apps that already run in Linux to run on zLinux is one thing; coding or porting apps to run in z/OS is quite another. Kathryn Guarini, vice president of IBM's zSystems and LinuxONE units, says Big Blue isn't trying to create a market for big iron analytics; instead, she claims, IBM is giving its mainframe customers just what they want.
"Across many of the big industries ... the mainframe continues to play a very significant role for transaction processing and data processing. Those [applications] aren't going anywhere. Those [mainframe-based] data sources are maintaining and ... growing organically -- growing based on these new workloads [that are available for] the platform."
To a degree, this view is endorsed by other data management-oriented vendors. For example, data integration (DI) specialist Diyotta Inc. makes mainframe DI a conspicuous part of its marketing message. There's a reason for that, argues Ravindra Punuru, chief strategy officer with Diyotta.
"We have a solution ... for integrating the data from the mainframe [in] EBCDIC [format]," Punuru says. EBCDIC is the character encoding native to IBM mainframes; because it is not ASCII-compatible, mainframe text must be translated before most non-mainframe tools can read it. In most mainframe ETL, where scripted (S)FTP file transfer is still an all-too-common option, EBCDIC is converted to ASCII in a middle-tier staging area, he contends. Diyotta instead moves EBCDIC data en bloc from the mainframe to Hadoop, which it uses as a combined storage and data processing layer, and converts it there.
"We came up with our Hadoop plug-in, which we wrote for a customer. Basically, we heard from several [customers] with this problem, so we wrote custom code that actually does convert the EBCDIC files in a multiple-parallel way in Hadoop to convert to ASCII," Punuru continues.
"[Mainframe data integration] is a market that just won't go away. People keep trying to write it off. If you are a large enough company, chances are [good that] you have a mainframe."
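Diyotta's plug-in itself isn't public, but the basic technique Punuru describes, splitting a file on record boundaries and converting the pieces independently, can be sketched in Python. This is a minimal sketch under stated assumptions: the input holds fixed-length, text-only records (80 bytes here, a hypothetical choice) encoded in EBCDIC code page 037, which is only one of several EBCDIC variants. Real mainframe files often mix in packed-decimal (COMP-3) and binary fields, which have to be decoded field by field from a copybook layout rather than translated character by character.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

EBCDIC_CODEC = "cp037"  # assumed: EBCDIC code page 037 (US/Canada); other code pages exist
RECORD_LENGTH = 80      # hypothetical fixed record length (LRECL)


def split_records(data: bytes, record_length: int, records_per_chunk: int) -> List[bytes]:
    """Split on record boundaries so no record straddles two chunks."""
    if len(data) % record_length:
        raise ValueError("data length is not a multiple of the record length")
    step = record_length * records_per_chunk
    return [data[i:i + step] for i in range(0, len(data), step)]


def convert_chunk(chunk: bytes) -> bytes:
    """Translate one chunk of EBCDIC text to ASCII."""
    text = chunk.decode(EBCDIC_CODEC)
    # Characters with no ASCII equivalent (e.g. the cent sign) become "?"
    return text.encode("ascii", errors="replace")


def ebcdic_to_ascii(data: bytes, record_length: int = RECORD_LENGTH,
                    records_per_chunk: int = 1000, workers: int = 4) -> bytes:
    """Convert chunks concurrently; map() returns results in input order."""
    chunks = split_records(data, record_length, records_per_chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(convert_chunk, chunks))


# Usage: five padded text records, converted two records per chunk
records = "".join(f"CUSTOMER {n:04d}".ljust(RECORD_LENGTH) for n in range(5))
converted = ebcdic_to_ascii(records.encode(EBCDIC_CODEC), records_per_chunk=2)
print(converted[:13].decode("ascii"))  # prints CUSTOMER 0000
```

A thread pool keeps the sketch short and portable. Because decoding is CPU-bound, Python threads won't deliver real parallel speed-up; at mainframe-file scale the same split-and-convert pattern belongs on a distributed engine, which is the role Diyotta assigns to Hadoop.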
Rocket Software Inc. isn't exactly a disinterested observer of the big iron space, either. It markets a range of mainframe-oriented offerings, including its flagship Rocket Data Virtualization product. It says it developed Rocket Data Virtualization to address the very use case -- in situ data access, data processing, and analytics -- for which IBM positions System z.
A key part of Rocket's pitch is the zIIP (System z Integrated Information Processor), a specialty mainframe processor whose capacity does not count toward most IBM software charges. Gregg Willhoit, managing director and general manager with Rocket Software, told BI This Week last summer, "We completely built an engine optimized for [System] z [that] runs completely on zIIP. From IBM's perspective, they said, 'We have to do this.' Ninety percent of [their customers] were doing ETL off of the mainframe or off of Oracle [on the mainframe] or whatever. IBM said to them, 'What you're doing is expensive, the data is no longer current, it's no longer as secure, you [have] multiple degrees of separation -- and it's unnecessary, because we're giving you this [zIIP engine].'
"We use IBM hardware for some significant compression advantages. We can do this inexpensively by taking advantage of some of the technologies [IBM] provide[s] for System z. You can use [Rocket Data Virtualization] as an extract and load tool, but in this case you don't have to run a bunch of complex extract, transform, and load scripts. Rocket Data Virtualization handles the transformations. It virtualizes in-place, in-memory, then loads the data into the database."
A Linux-Centered Mainframe Initiative
IBM's Guarini also heads up LinuxONE, Big Blue's ambitious Linux-only mainframe initiative. IBM kicked off LinuxONE last August, announcing plans to support several open source analytic, data management, and application workloads in Linux on System z, including Spark, Node.js, MongoDB, MariaDB, and PostgreSQL.
IBM has marketed Linux-centric mainframes, such as its z800 and z9 Business Class, in the past. The mainframes it offers under the LinuxONE umbrella are Linux-only beasts, however. These systems are shorn of z/OS and other traditional mainframe operating systems, although they retain IBM's z/VM hypervisor (alongside KVM) for hosting Linux guests.
Guarini says IBM believes LinuxONE will entice net-new customers to consolidate Linux workloads onto zLinux. "We introduced Linux on the [z] platform sixteen years ago and since then we've had tremendous adoption of Linux [on the mainframe]. Today 28 percent of all of the capacity for zSystems in the field is running Linux. What we did [with LinuxONE] is we introduced a new line of servers in a Linux-only environment targeted at net-new opportunities. The idea is that [customers can] leverage the scale-up capabilities -- the high performance, the security, the availability -- of the enterprise [System z] platform ... but exclusively for Linux workloads."
With LinuxONE, then, IBM is going after Linux workloads running on commodity (x64) systems.
"What we've tried here is to design a platform that is optimized for the Linux environment to compete with the scale-up Linux solutions in the marketplace," she explains.
Big data platforms run predominantly on Linux, as do the streaming ingest, data processing, and analytic technologies associated with the Internet of Things (IoT). What's more, the mainframe itself hosts many potential event sources, such as IBM's venerable CICS Transaction Server, whose transactions could be grist for IoT or complex event processing (CEP) analytics.
Bringing Analytics and Data Processing Workloads Back to the Mainframe
IBM has given mainframe shops plenty of other reasons to bring data processing and analytics workloads back home. Consider IBM's DB2 Analytics Accelerator, or DAA, for z/OS, now in version 5.1. It's based on the massively parallel processing (MPP) analytics database technology IBM acquired from the former Netezza Inc.
With its zEnterprise mainframe launch in 2010, IBM introduced a new mainframe-open systems hybrid: the zEnterprise BladeCenter Extension (zBX). zBX, for folks who don't speak zEnterprise-ese, is a technology "sidecar" that, when used in conjunction with Big Blue's Unified Resource Manager, permits IBM's BladeCenter platforms, along with zEnterprise mainframes, to be managed as a single virtualized system.
In other words, shops can manage Windows workloads (including Windows-based BI and DW workloads) alongside their mainframe, Unix, and Linux workloads. The wrinkle is that if a shop is willing to pony up the cash for a zBX, any Windows workloads running in it essentially become the property of the mainframe group (just as Linux workloads running in a zBX, or hosted by zLinux, do).
The message, in a sense, is clear: you can centralize all of your workloads on or near the mainframe. You can manage all of your workloads from the mainframe. Why not shift your BI and DW workloads back to big iron?
This message was strong back in 2012. In light of the z/OS Platform for Apache Spark, LinuxONE, and IBM's InfoSphere BigInsights for Linux for System z, it has become much stronger. System z is credibly positioned as a platform for both in situ and downstream analytic use cases.
The mainframe's in situ use case is simple enough: from IBM's CICS Transaction Server to any of the thousands of third-party and custom apps (some of them decades old) that power critical business processes, the mainframe remains a premier source of time-sensitive business information.
Spark is a good general-purpose analytic platform, flexible enough to accommodate traditional SQL analytics, streaming analytics, machine learning, and other advanced analytic requirements. Spark also exposes RESTful APIs, either via an open source add-on project (the Spark JobServer) or its own still-gestating REST API. (z/OS is itself REST-ready, for the record, thanks to IBM's z/OS Connect software.) With support for Java, Scala, Python, SQL, and other languages, Spark is a polyglot analytic platform, too.
Running Spark in z/OS gives customers a cost-effective means to comb through a CICS transaction trail to flag events, detect anomalies, or identify complex, interrelated events at the point of origin.
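Spark's own APIs aren't shown here. As a hedged, engine-agnostic illustration of one simple way to flag anomalies in a transaction trail, the Python sketch below builds a per-account baseline from historical amounts and flags new transactions that land more than three standard deviations from that account's mean (or that come from an account with no history). The record layout, field names, and threshold are all hypothetical; real CICS transaction records are far richer, and a production job would express the same logic as a distributed Spark aggregation.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Txn(NamedTuple):
    """Hypothetical, simplified transaction record."""
    txn_id: str
    account: str
    amount: float


def build_baselines(history: Iterable[Txn]) -> Dict[str, Tuple[float, float]]:
    """Per-account (mean, population standard deviation) of historical amounts."""
    amounts: Dict[str, List[float]] = defaultdict(list)
    for t in history:
        amounts[t.account].append(t.amount)
    return {acct: (mean(vals), pstdev(vals)) for acct, vals in amounts.items()}


def flag_anomalies(new_txns: Iterable[Txn],
                   baselines: Dict[str, Tuple[float, float]],
                   threshold: float = 3.0) -> List[str]:
    """Return IDs of transactions that look unusual for their account."""
    flagged = []
    for t in new_txns:
        if t.account not in baselines:
            flagged.append(t.txn_id)  # no history for this account: flag for review
            continue
        mu, sigma = baselines[t.account]
        if sigma == 0:
            # Every past amount was identical, so any different amount is unusual
            if t.amount != mu:
                flagged.append(t.txn_id)
        elif abs(t.amount - mu) / sigma > threshold:
            flagged.append(t.txn_id)
    return flagged


history = [Txn(f"h{i}", "A", amt) for i, amt in enumerate([100, 110, 90, 105, 95])]
baselines = build_baselines(history)
new = [Txn("t1", "A", 115), Txn("t2", "A", 125), Txn("t3", "B", 10)]
print(flag_anomalies(new, baselines))  # prints ['t2', 't3']
```

Account A's baseline is a mean of 100 with a standard deviation of about 7.07, so 115 (about 2.1 deviations out) passes while 125 (about 3.5) is flagged. Building the baseline from history, rather than from the batch being scored, keeps a single outlier from inflating the very statistics used to catch it.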
For some customers, argues Kathryn Guarini, the ability to run BI and analytic workloads on System z -- and in z/OS itself -- is a critical requirement. She positions the new z/OS Platform for Apache Spark as both a complement to and extension of Spark running in Linux on the mainframe, which IBM has supported since 2015.
"Earlier, we had announced the ability for Apache Spark to run in the Linux environment as well, but [the z/OS Platform for Spark is a new offering]. It allows access to structured and unstructured data through Spark APIs to data sources on the mainframe, as well as in other systems."
The mainframe's usefulness as a platform for downstream analytics is no less compelling. There's a case to be made for something like traditional ETL (e.g., using Spark to prepare derived data sets for downstream consumers) as well as for in situ access via data virtualization.
"The neat part about having a data virtualization server on zSeries is that you can also support mainframe apps, such as CICS or IMS," Rocket Software's Willhoit notes. "All of the mainframe environments, all of the mainframe data [stores] are supported, they're all consuming clients of our data virtualization engine that can use our SQL interface."
He continues, "Usually when you talk about data virtualization, you're talking about SQL, NoSQL, Web services interfaces, and REST, but the applicable data sources should include mainframe applications such as CICS and IMS, too. You can get to data on z[Series] and off, so if you wanted to join data in Oracle with [data] in CICS ... you could do that."
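Rocket's engine isn't shown here. As a hedged illustration of what "joining data in Oracle with data in CICS" means mechanically, the Python sketch below performs an in-memory hash join of two record sets that stand in for an Oracle table and CICS-managed data. The source names, fields, and values are hypothetical; a real data virtualization engine would also push predicates down to each source, reconcile data types (including EBCDIC text), and stream rows rather than materialize them.

```python
from typing import Dict, Iterable, List


def hash_join(left: Iterable[dict], right: Iterable[dict], key: str) -> List[dict]:
    """Inner join two record sets on `key`, building a hash table on the left input."""
    index: Dict[object, List[dict]] = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in right:
        # Probe the hash table; rows with no match are dropped (inner join)
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical stand-ins for the two sources
oracle_customers = [
    {"cust_id": 1, "name": "Acme Corp"},
    {"cust_id": 2, "name": "Globex"},
]
cics_orders = [
    {"cust_id": 1, "order_id": "A100", "amount": 250.0},
    {"cust_id": 3, "order_id": "A101", "amount": 75.0},
    {"cust_id": 1, "order_id": "A102", "amount": 40.0},
]

joined = hash_join(oracle_customers, cics_orders, "cust_id")
print([r["order_id"] for r in joined])  # prints ['A100', 'A102']
```

Building the hash table on the smaller input (here, the customer list) is the usual choice, since only that side has to fit in memory while the larger side is scanned once.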
Against expectations, IBM's System z mainframe continues to grow, at least within its installed base of large customers. The long-term trend is clear: organizations that have large mainframe investments continue to expand those investments, while companies that are less dependent on big iron continue to migrate off of it.
IBM is losing mainframe customers in absolute numbers. At the same time, its large customers tend to reinvest in ever-bigger mainframe systems. For these customers, it now makes practical, technological, and even economic sense to locate data processing and analytics workloads on their mainframe systems. This wasn't necessarily obvious 15 years ago, but in 2016, it's close to being a no-brainer.