From Legacy to Hybrid Data Platforms: Managing Disparate Data in the Cloud (Part 1 of 2)
How to integrate legacy systems into an adaptive analytics fabric for the hybrid cloud.
- By David P. Mariani
- January 22, 2020
In spite of loud and ubiquitous proclamations that the only way to do business is in the cloud, there are compelling reasons why businesses with legacy systems, some of them decades old, choose to hold on to those systems.
In the 1960s, for example, mainframes became the de facto systems for processing large sets of data. To this day, many large enterprises and financial firms still use these systems for their workhorse reliability. Beyond the mainframe, there are legitimate reasons for any enterprise to continue using certain legacy systems. For example, there may be years and layers of business integrations into the legacy systems that would be difficult to lift and shift to the cloud. The systems might simply work, providing the reliability and five-nines uptime that can be critical to the enterprise's success.
All of this being said, when your enterprise IT teams deploy new systems within your organization, those systems are most likely going to be in the cloud. This, you can imagine, is a nightmare for IT teams, who must bridge the gap between siloed, fragmented data in the legacy systems staying in place and the newer cloud technologies being deployed. Additionally, BI teams and data analysts face the challenge of integrating and analyzing data sets from two or more disparate analytics platforms.
The challenge for the enterprise thus becomes: how do we marry this massive amount of siloed data without re-engineering the entire IT architecture, while at the same time giving the right data to the right people -- fast -- so they can make decisions that drive the business?
Challenges of a Harmonious Hybrid Environment
Hybrid cloud models have thus gained favor in recent years, marrying these old-school systems with newer, more agile successors. According to recent research, the global market for hybrid cloud is expected to grow from $44.60 billion in 2018 to $97.64 billion by 2023.
Across the enterprise there are many pockets of legacy data in different systems and in completely different formats. Data transformation often becomes a challenge, because the legacy data must be reformatted or updated to integrate with newer cloud solutions. Some cloud solutions may even require data to be in their proprietary formats, increasing the difficulty of integrating data from legacy systems and creating new interoperability challenges between the cloud solutions and other databases or business intelligence tools.
Additionally, the manual data engineering required to normalize legacy data with cloud data is an ongoing expense that can cost the company months of time, causing business disruptions and delays for users. Although legacy systems are ingrained in the enterprise's ecosystem, query performance is often abysmally slow; queries on databases with billions of records can take days to return.
Adaptive Analytics Fabric: A Modern Approach for Hybrid Cloud Success
Data virtualization has existed for many years, but only recently have new capabilities come online that enable companies to leverage disparate legacy and modern data across the hybrid cloud, bringing it together for BI teams and the greater business.
Specifically, as part of an overall adaptive analytics fabric (the virtualized data and associated tools to aid analytics speed, accuracy, and ease of use), virtualization empowers companies to treat all their disparate data repositories as a single, unified data source that's extensible to support future technologies. A fabric provides a bridge across data warehouses, data marts, and data lakes, delivering a single view of an organization's data without having to physically integrate, engineer, or re-architect it. This abstraction enables enterprises to instantly surface usable data, no matter where it's actually stored, to produce fast, timely insights.
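To make that abstraction concrete, here is a minimal sketch of the "single, unified data source" idea. The Fabric class, source names, and schema are hypothetical illustrations rather than any vendor's actual API, and two in-memory SQLite databases stand in for a legacy warehouse and a cloud data lake:

```python
# Minimal sketch of data virtualization: one logical query surface over
# several physical stores. Names and schema are illustrative only.
import sqlite3

class Fabric:
    """Registers independent data sources and answers one logical query
    by pushing it down to each source and combining the results."""

    def __init__(self):
        self.sources = {}

    def register(self, name, connection):
        self.sources[name] = connection

    def query(self, sql):
        rows = []
        for name, conn in self.sources.items():
            # Push the same logical query down to each physical store.
            for row in conn.execute(sql):
                rows.append((name, *row))
        return rows

# Stand-in "legacy warehouse": on-prem sales history.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)",
                      [("EMEA", 120.0), ("APAC", 80.0)])

# Stand-in "cloud data lake": newly landed sales events.
lake = sqlite3.connect(":memory:")
lake.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
lake.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("AMER", 200.0), ("EMEA", 15.0)])

fabric = Fabric()
fabric.register("legacy_warehouse", warehouse)
fabric.register("cloud_lake", lake)

# One logical query, answered across both physical systems.
print(fabric.query("SELECT region, SUM(revenue) FROM sales GROUP BY region"))
```

A production fabric would also push joins and aggregations down to each engine, reconcile SQL dialects, and cache results; the point here is simply that consumers see one logical source while the data stays where it lives.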
The ability to merge data from different sources reveals another advantage. Rather than combining data into a single system that forces every data set down to the lowest common denominator of capability, an adaptive analytics fabric lets enterprises keep data in the structures that best fit its use (e.g., in legacy systems). For example, time-series data can be stored one way and relational data another, so each can be analyzed in the specialized format suited to it. Data can live in the format best suited to its utility while the analytics fabric adapts to business or operational analytics users by translating and presenting the data as needed.
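As a rough illustration of that translation layer, the sketch below (again with hypothetical class and field names) keeps a time-series store and a relational store in their native layouts, and each adapter exposes the same uniform record shape on demand, so nothing is physically moved or reformatted:

```python
# Each source keeps its natural layout; thin adapters translate both into
# one common record shape for analysts. Illustrative names only.
from dataclasses import dataclass
from datetime import date

@dataclass
class Record:
    source: str
    key: str
    value: float

class TimeSeriesSource:
    """Native layout: ordered (timestamp, reading) pairs."""
    def __init__(self, readings):
        self.readings = readings
    def as_records(self):
        return [Record("timeseries", ts.isoformat(), v) for ts, v in self.readings]

class RelationalSource:
    """Native layout: rows keyed by customer id."""
    def __init__(self, rows):
        self.rows = rows
    def as_records(self):
        return [Record("relational", r["customer_id"], r["balance"]) for r in self.rows]

sources = [
    TimeSeriesSource([(date(2020, 1, 1), 42.0), (date(2020, 1, 2), 47.5)]),
    RelationalSource([{"customer_id": "C-100", "balance": 310.0}]),
]

# The consumer sees one uniform record shape; the underlying layouts differ.
for src in sources:
    for rec in src.as_records():
        print(rec)
```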
Where legacy systems must still be queried, autonomous data engineering, as part of an adaptive analytics fabric, shortens query times on large data sets from days to hours or minutes. As queries are run against data sets in the analytics fabric, machine learning is applied to determine which data within the larger set is actually needed, bypassing extraneous data altogether during the query process.
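The specific machine learning involved is vendor-specific, but the underlying idea can be sketched with a simple frequency heuristic: watch which query shapes recur, materialize a small summary for the hot ones, and answer later matching queries from that summary instead of rescanning the detail rows. The Accelerator class and fact table below are hypothetical stand-ins, not any product's actual implementation:

```python
# Simplified sketch of aggregate-aware acceleration: repeated query shapes
# get a materialized summary; later queries skip the full scan.
from collections import Counter, defaultdict

FACT_TABLE = [  # stand-in for billions of detail rows
    {"region": "EMEA", "revenue": 120.0},
    {"region": "APAC", "revenue": 80.0},
    {"region": "EMEA", "revenue": 15.0},
]

class Accelerator:
    def __init__(self, threshold=2):
        self.shape_counts = Counter()
        self.aggregates = {}          # query shape -> precomputed summary
        self.threshold = threshold

    def revenue_by(self, dimension):
        shape = ("sum_revenue_by", dimension)
        if shape in self.aggregates:               # answered from the summary
            return self.aggregates[shape]
        self.shape_counts[shape] += 1
        totals = defaultdict(float)
        for row in FACT_TABLE:                     # full scan of detail rows
            totals[row[dimension]] += row["revenue"]
        result = dict(totals)
        if self.shape_counts[shape] >= self.threshold:
            self.aggregates[shape] = result        # materialize for next time
        return result

acc = Accelerator()
acc.revenue_by("region")          # full scan
acc.revenue_by("region")          # full scan, then the summary is materialized
print(acc.revenue_by("region"))   # served from the small aggregate
```

A real system would also need to refresh or invalidate such aggregates as new data arrives; the sketch ignores that entirely.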
Bridging the Gap from Legacy to Hybrid
With an adaptive analytics fabric, companies are able to elegantly overcome the three primary challenges with provisioning data over hybrid cloud networks:
- Providing centralized accessibility to data
- Facilitating interoperability of data
- Maximizing performance of each constituent database
With an adaptive analytics fabric, companies can leverage the power of their legacy investments, surface that data alongside newer, more efficient systems, and treat all of it as a single, unified data source that unlocks insights and drives the bottom line. IT and data engineering teams are spared costly, time-consuming data preparation; BI teams and data scientists save hours a day in query time; and the business gains efficiency by freeing up IT and business analysts' time.
There are additional benefits to adaptive analytics fabrics. In part 2 of this article we will discuss how an adaptive analytics fabric can help with privacy regulation and self-service business intelligence initiatives.
About the Author
Dave Mariani is the founder and chief technology officer of AtScale. Prior to AtScale, Dave ran data and analytics for Yahoo!, where he pioneered the development of Hadoop for analytics and created the world's largest multidimensional analytics platform. He also served as CTO of BlueLithium, where he managed one of the first display advertising networks, delivering 300 million ads per day powered by a multiterabyte behavioral targeting data warehouse. Dave is a big data visionary and serial entrepreneur. You can contact the author on LinkedIn.