Executive Q&A: The Value of Improved Data Management
How should your organization respond to changing data management requirements? Upside spoke to Luke Han, co-founder and CEO of Kyligence, to discuss today's trends and best practices.
- By James E. Powell
- November 22, 2021
Today's surging data volumes are challenging data management and analysis. Add multiple cloud platforms, new data sources, and the need to integrate diverse technologies, and many users are unable to find and access their most valuable data. Upside spoke to Luke Han, co-founder and CEO of Kyligence, to discuss how these trends are driving the evolution of data management and changing data analytics.
Upside: What are some of the challenges making the road ahead for enterprise data management and analysis more uncertain?
Luke Han: There are three main challenges that enterprises face in achieving the maximum benefit from their data. First, the compounding effect of continually adding new data sources, and thus more data, dilutes the value of data under analysis.
Adding demographic data enriches the data set, which is like adding electrolytes to tap water -- it is good and can be done easily. The challenge we face today is that we also have many new sources of transaction data (e.g., from online purchases, business partners, and mobile apps). We suddenly have data for every page visit, every click, and every location. This is like swapping the faucet in your kitchen for a fire hose. In theory you have access to a lot of water, but how much of it will go to waste if you don't have the right tools or technology to process it?
Second, the increasing reliance on data captured or purchased in the cloud raises questions about how to rationalize on-premises data as part of an analytics strategy. For many organizations, data generated on premises cannot leave the confines of the corporate firewall. This complicates the creation of a complete picture of the truth.
Having both on-premises and cloud applications does complicate the analytics process. Cloud data may be scattered across different cloud platforms (such as AWS, Azure, or GCP), and it is very difficult to build a single view of data that spans cloud platforms and on-premises systems because of issues with security, network latency, and data formats.
Finally, the explosive growth of data sources puts more pressure on data engineers to prepare more data for analysis. That means a spike in ETL jobs that represents a lot of staff hours and the consequent expense of creating, maintaining, and manually tweaking and debugging a lot more code.
How have enterprises changed as a result of the complexity of multiple cloud platforms and surging data volumes?
Enterprise data organizations have broadened their view of what constitutes the cloud. The notion of deploying a multicloud environment or data fabric is more mainstream than theoretical because most organizations are already dealing with the headaches of multiple clouds, complicated trust boundaries, and the practical reality of virtually unlimited data available for consideration.
Analytics organizations will have to follow the lead of other parts of IT that are adopting AI for DataOps -- using AI and machine learning (ML) to predict, diagnose, heal, and automate operations. Using analytics to make analytics better? Makes perfect sense!
The next logical step for AI for DataOps in large-scale analytics programs will be a dramatic increase in analytics process automation through the use of robotic process automation (RPA) and low-code platforms. In the past, these types of tools have demoed well but have seemed like a solution looking for a problem. With the massive and accelerating scale of modern analytics, that problem has definitely arrived.
How are IT organizations responding to these new and changing needs, and have they been successful?
Companies are getting serious about metrics and metadata. Enterprises have had no choice but to change in response to the new scale, complexity, and rapid evolution of data environments. Half a decade of digital transformation projects means that virtually every line of business -- and its associated rules and processes -- has generated an order of magnitude more metadata and metrics that must be tracked. This has overwhelmed traditional reporting practices and suggests that performance metrics must be consolidated, tracked, and constantly reviewed and evaluated.
Beyond data volumes, the value and importance of corporate metrics and metadata can hardly be overstated. That is why enterprises have been putting increased emphasis on systematically managing this data -- with varying results. The variety and volume of this data mean that it needs the same type of attention as the source data. The data management aspect of this challenge is fairly well understood, but the semantic layer needed to understand the meaning of these metrics and metadata is a work in progress. Look for AI/ML to be applied here as well.
What are the three best practices an organization should adopt to address these issues?
First, a cardinal rule that has been in place for many years is more critical than ever: minimize data movement and data duplication. At these data volumes and in heterogeneous operating environments, data movement kills your budget, your network, and your efficiency. Organizations must create a virtualized view of their data as a whole, one that takes into account the location and format of each data source. In short, they must connect data, not collect it in a central location.
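To make the "connect, don't collect" idea concrete, here is a minimal sketch in Python. It uses the built-in sqlite3 module purely as a stand-in for what would really be separate cloud and on-premises sources sitting behind a data virtualization or federation layer; the file names, table layout, and figures are illustrative assumptions, not any particular product's API.

```python
# A minimal sketch of "connect, don't collect": query two separate data
# stores in place through one session instead of copying everything into
# a central store. sqlite3 stands in for independent cloud and on-prem
# sources; names and numbers below are illustrative assumptions.
import sqlite3

# Pretend these are two independent sources: one "cloud", one "on-prem".
for name, rows in [("cloud_sales.db", [("2021-10", 120), ("2021-11", 135)]),
                   ("onprem_sales.db", [("2021-10", 80), ("2021-11", 95)])]:
    con = sqlite3.connect(name)
    con.execute("CREATE TABLE IF NOT EXISTS sales (month TEXT, amount INTEGER)")
    con.execute("DELETE FROM sales")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    con.commit()
    con.close()

# Connect the sources rather than collecting them: attach both databases
# to one in-memory session and query across them at request time. No data
# is duplicated into a central repository.
hub = sqlite3.connect(":memory:")
hub.execute("ATTACH DATABASE 'cloud_sales.db' AS cloud")
hub.execute("ATTACH DATABASE 'onprem_sales.db' AS onprem")

query = """
    SELECT month, SUM(amount) AS total FROM (
        SELECT month, amount FROM cloud.sales
        UNION ALL
        SELECT month, amount FROM onprem.sales
    )
    GROUP BY month
    ORDER BY month
"""
for month, total in hub.execute(query):
    print(month, total)
```

In a real environment the same pattern would be carried by a federation or virtualization layer that understands each source's location and format, so analysts query one logical view while the data stays where it lives.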
Second, build and accelerate your ability to use the cloud to your advantage. You've dipped your toes in the cloud pool; it's time to take the plunge. By the time you have developed the skills, the strategy, and the cost model to make cloud work for you, there will be a whole new universe of best practices that are far easier to exploit than manual on-premises processes.
Last, organizations must push the boundaries of intelligent automation. Tools and platforms that do not self-tune and self-manage will become legacy and then vanish. There is simply too little time to spare for manual processes. Complete, immersive, intelligent automation is the end game for data management and analytics. The first stepping stones on that path are before us now.
Some companies implement a hybrid cloud analytics model that reserves some functions to be performed in cloud-hosted environments while others utilize on-premises servers. What's the benefit of this?
For the next decade or so, there will remain compelling reasons for many organizations to have large data centers and an on-premises strategy for analytics, master data management, and R&D. Sensitivities about data security, data gravity, and technical debt dictate that on-premises operations still have a role to play.
As others have observed, many organizations have been bitten financially for doing too much, too soon in the cloud. The economics of cloud are compelling but not yet so clear cut that there is universal adoption, although a tipping point approaches.
AWS has had a significant lead in the public cloud space, but its competitors are catching up quickly. This trend will drive down cloud infrastructure costs. Hybrid cloud strategies have evolved to running workloads both on premises and on more than one public or private cloud platform. On-premises operations will remain an important component, especially for large enterprises, for both cost and security reasons.
About the Author
James E. Powell is the editorial director of TDWI, including research reports, the Business Intelligence Journal, and the Upside newsletter. You can contact him via email.