Executive Summary for the TDWI Best Practices Report: Data Management for Advanced Analytics
- By Philip Russom, Ph.D.
- May 29, 2020
Modern enterprises are expanding their analytics programs to improve their ability to make fact-based decisions, plan for an uncertain future, compete on analytics, and grow customer accounts. These high-value business goals require advanced forms of analytics, which in turn demand use-case-appropriate data integration, data platforms, and other data management (DM). Without the right data in the right format on the right platform, critical and expensive efforts in advanced analytics (AA) have little or no business value.
Addressing this problem is challenging because there are many forms of AA, including statistical analysis, data mining, clustering, graph analytics, neural networks, text mining, natural language processing, artificial intelligence, machine learning, and predictive analytics. Likewise, DM includes many types of databases and other data platforms, plus tools for integration, quality, metadata, event processing, and so on. To sort this out, this report defines data management for advanced analytics (DM for AA), which tailors established and emerging DM best practices and techniques to specific forms of AA, thereby raising the precision, productivity, and business value of analytics.
The secret to successful DM for AA is to match a combination of DM platforms and tools to each specific use case for AA. For example, for analytics approaches that demand massive data volumes (e.g., mining, clustering, statistics), users tend to deploy Hadoop or a cloud-based DBMS for their analytics data. Some analytics tools run best “in database,” which means you must acquire a data platform that supports the form of in-database analytics you need. Real-time analytics requires tools for real-time data ingestion. To succeed with self-service analytics, you need solid business metadata and possibly a data catalog.
Most respondents to this report’s survey (94%) see DM for AA as an opportunity because it increases the usefulness, accuracy, and business value of advanced analytics. The leading benefits of DM for AA fall into four areas: operational improvements, better analytics outcomes, DM upgrades, and real-time data and analytics. The downside is that DM for AA demands more work and expertise from data management professionals, plus a longer list of data platforms and tools to acquire and manage. Potential barriers to successful DM for AA may arise in governance, architecture, skills, and DM infrastructure. Given these compelling benefits, most survey respondents (79%) consider DM for AA extremely important.
Users perform DM for AA with a wide range of data platforms and tools, both on premises and in the cloud. These include data warehouses (81% on premises, 33% cloud), data integration platforms (68% on premises, 32% cloud), data lakes (43% on premises, 29% cloud), and analytics tools (81% on premises, 42% cloud). These tools are currently most prominent on premises, yet they are already well established on cloud platforms. TDWI expects this “cloud gap” to shrink as cloud providers and software vendors raise the maturity of their offerings. Furthermore, survey data suggests that data volumes for AA managed on cloud platforms will quadruple within three years. Other tools important for DM for AA include those for data semantics, data virtualization, self-service data, and real-time integration for real-time analytics.
This report canvasses current and future data management strategies and best practices, then links combinations of these to the leading forms of advanced analytics. The focus is on data management more than analytics. The intention is to help DM and AA professionals and their business counterparts achieve greater success and business impact. Two of the hottest growth areas in AA today are self-service data practices and machine learning, and so this report concludes with detailed discussions of DM requirements for these.
DataStax, Denodo, Hitachi, Matillion, Oracle, SAP, Snowflake, and TIBCO sponsored the research and writing of this report.
About the Author
Philip Russom, Ph.D., is senior director of TDWI Research for data management and is a well-known figure in data warehousing, integration, and quality, having published over 600 research reports, magazine articles, opinion columns, and speeches over a 20-year period. Before joining TDWI in 2005, Russom was an industry analyst covering data management at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and consultant, was a contributing editor with leading IT magazines, and a product manager at database vendors. His Ph.D. is from Yale. You can reach him by email (prussom@tdwi.org), on Twitter (twitter.com/prussom), and on LinkedIn (linkedin.com/in/philiprussom).