TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

Machine Learning that Automates Data Management Tasks and Processes

Machine learning is not just for predictive analytics. It can also be embedded within tools to automate data management development and optimize execution.

By Philip Russom
July 30, 2018

Today, most machine learning (ML) efforts support predictive analytics, especially when the analytics parse vast amounts of diverse big data. This is an important practice, and it will continue to grow and mature.

However, a few cutting-edge vendors and open-source projects are embedding ML-driven intelligence into data management (DM) tools. Embedded within DM tools, ML algorithms and models typically address three broad goals:

- Automation of well-understood but time-consuming development tasks, such as mapping sources to targets, cataloging data, or onboarding new sources
- Optimization of system performance by automatically selecting query optimization strategies, table join approaches, resource management schemes, and data distribution methods (e.g., hot versus cold storage, memory versus disk, or replication across nodes)
- Capacity management via workload-aware autoscaling, spot instance purchasing, and the integration of node types in heterogeneous clusters

Machine learning is highly valuable in these contexts because it increases developer productivity, makes advanced functions accessible to less technical users, and improves system performance with minimal administrator involvement. Given these compelling benefits, TDWI expects that within a few years most DM functions will be automated or optimized via ML and other approaches (e.g., rules engines). Here are a few examples.

Data cataloging. Modern tools can catalog and categorize data automatically via machine learning algorithms and models as well as via old-school business rules and application logic. Cataloging can apply to data sources, datasets, tables, or even individual columns and fields. A single data element can be categorized by its domain, compliance risk, quality level, source, lineage, and so on, as the user organization requires. Cataloging each data element multiple ways enriches user searches and queries of the catalog, and it enables richer cross-category analytics correlations.
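To make multi-dimensional cataloging concrete, here is a minimal Python sketch of the "old-school business rules" variant: each column is tagged along several dimensions (source, domain, compliance risk) by hypothetical name-matching rules, and the catalog can then be searched on any combination of tags. `CatalogEntry`, `RULES`, `catalog`, and `search` are illustrative assumptions, not any vendor's API; an ML-driven tool would learn such classifications rather than hard-code them.

```python
import re
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One cataloged column, tagged along several independent dimensions."""
    table: str
    column: str
    tags: dict = field(default_factory=dict)

# Hypothetical rules: (dimension, column-name pattern, tag value).
# The first matching rule wins for each dimension.
RULES = [
    ("domain", re.compile(r"cust|client", re.I), "customer"),
    ("domain", re.compile(r"prod|sku", re.I), "product"),
    ("domain", re.compile(r"price|amount|revenue", re.I), "financial"),
    ("compliance_risk", re.compile(r"email|phone|ssn|birth", re.I), "high"),
]

def catalog(table, columns):
    """Tag each column of `table` by source, domain, and compliance risk."""
    entries = []
    for col in columns:
        entry = CatalogEntry(table, col, {"source": table})
        for dimension, pattern, value in RULES:
            if dimension not in entry.tags and pattern.search(col):
                entry.tags[dimension] = value
        # Columns that match no risk rule default to low risk.
        entry.tags.setdefault("compliance_risk", "low")
        entries.append(entry)
    return entries

def search(entries, **criteria):
    """Return entries whose tags match every given dimension=value pair."""
    return [e for e in entries
            if all(e.tags.get(k) == v for k, v in criteria.items())]

entries = catalog("crm.customers", ["customer_id", "customer_email", "lifetime_revenue"])
for e in search(entries, domain="customer", compliance_risk="high"):
    print(e.column)  # prints "customer_email"
```

Because every entry carries several tags at once, the same catalog answers "all customer-domain columns," "all high-risk columns," or their intersection without re-scanning the data.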

Data domains. ML algorithms and other tool logic can recognize and catalog data sources and structures that belong to particular domains. This helps users who browse or search the catalog for domains of high interest, such as the customer, product, and financial domains. Advanced algorithms can even detect domains and domain relationships across datasets. ML algorithms can also recognize and catalog data elements that are potentially sensitive in terms of privacy and compliance.
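Sensitive-data recognition usually looks at column values, not just column names. The sketch below assumes a simple pattern-voting approach: a column is labeled with a sensitive type only when a large enough share of its sampled values match that type's pattern. The `PATTERNS` table, the 0.8 `threshold`, and `classify_column` are hypothetical choices for illustration; a trained classifier would replace the regular expressions in an ML-based tool.

```python
import re

# Hypothetical detectors for a few sensitive data types.
PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "us_phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
}

def classify_column(values, threshold=0.8):
    """Return the sensitive type matched by at least `threshold` of the
    non-empty sample values, or None if no detector is confident enough."""
    sample = [str(v).strip() for v in values if v not in (None, "")]
    if not sample:
        return None
    best_type, best_ratio = None, 0.0
    for name, pattern in PATTERNS.items():
        ratio = sum(1 for v in sample if pattern.match(v)) / len(sample)
        if ratio > best_ratio:
            best_type, best_ratio = name, ratio
    return best_type if best_ratio >= threshold else None

print(classify_column(["123-45-6789", "987-65-4321", None]))  # prints "us_ssn"
```

Voting over a sample tolerates a few dirty values while still refusing to flag a column that only occasionally contains something that looks sensitive.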

Data lineage. ML algorithms can parse large volumes of complex data (even data distributed across multiple data platforms) to record data pathways and to cluster data elements and datasets of common origin. With these details, users can quickly gain deep insight into data provenance and perform impact analysis.
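Once data pathways have been recorded, provenance and impact analysis are two traversals of the same directed graph: walk upstream to find where a dataset came from, or downstream to find what a change would affect. This sketch assumes the pathways are already known and passed to a hypothetical `record` method; discovering them by parsing data, which is where ML would contribute, is not shown. The dataset names are invented for illustration.

```python
from collections import defaultdict, deque

class LineageGraph:
    """Directed lineage graph: an edge A -> B means B is derived from A."""

    def __init__(self):
        self.downstream = defaultdict(set)  # dataset -> datasets built from it
        self.upstream = defaultdict(set)    # dataset -> datasets it is built from

    def record(self, source, target):
        """Record one observed data pathway from `source` to `target`."""
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    @staticmethod
    def _reachable(start, edges):
        """Breadth-first search returning every node reachable from `start`."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def provenance(self, dataset):
        """Every dataset that `dataset` ultimately derives from."""
        return self._reachable(dataset, self.upstream)

    def impact(self, dataset):
        """Every dataset affected if `dataset` changes."""
        return self._reachable(dataset, self.downstream)

g = LineageGraph()
g.record("crm.customers", "staging.customers")
g.record("erp.orders", "staging.orders")
g.record("staging.customers", "mart.customer_sales")
g.record("staging.orders", "mart.customer_sales")
g.record("mart.customer_sales", "dashboard.revenue")
print(sorted(g.impact("erp.orders")))
# prints ['dashboard.revenue', 'mart.customer_sales', 'staging.orders']
```

Keeping both edge directions costs a little memory but lets each question be answered with a single forward traversal; the `seen` set also keeps the search finite if the recorded lineage ever contains a cycle.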

Metadata management. With big data, IoT, and other new sources that are notoriously devoid of metadata, a modern DM tool with ML embedded can parse data and deduce credible metadata. The tool can suggest a metadata structure to a data developer for approval or log that structure in a metadata repository without human intervention.
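As a rough illustration of deducing metadata from raw data, the sketch below infers a field-level schema from untyped records, widening a field's type when samples disagree. It uses simple parse attempts rather than a learned model, and the type names, the `WIDEN` table, and both functions are assumptions made for this example. The resulting schema is the kind of structure a tool could propose to a developer for approval.

```python
from datetime import datetime

def infer_type(value):
    """Guess a simple type name for one raw value; None means 'no evidence'."""
    if value is None:
        return None
    text = str(value).strip()
    if not text:
        return None
    for name, parse in (("integer", int), ("float", float)):
        try:
            parse(text)
            return name
        except ValueError:
            pass
    try:
        datetime.fromisoformat(text)
        return "timestamp"
    except ValueError:
        return "string"

# Type pairs that can be reconciled without falling back to "string".
WIDEN = {frozenset({"integer", "float"}): "float"}

def infer_schema(records):
    """Deduce a {field: type} schema from a list of dict records, widening
    a field's type when samples disagree (e.g., integer + float -> float)."""
    schema = {}
    for record in records:
        for name, raw in record.items():
            t = infer_type(raw)
            if t is None:  # empty values say nothing about the type
                continue
            prev = schema.get(name)
            if prev is None or prev == t:
                schema[name] = t
            else:
                schema[name] = WIDEN.get(frozenset({prev, t}), "string")
    return schema

readings = [
    {"id": "1", "temp": "21", "ts": "2024-01-15T10:30:00", "site": "A"},
    {"id": "2", "temp": "21.5", "ts": "2024-01-15T10:31:00", "site": "B"},
    {"id": "3", "temp": "", "ts": "2024-01-15T10:32:00", "site": "C"},
]
print(infer_schema(readings))
# prints {'id': 'integer', 'temp': 'float', 'ts': 'timestamp', 'site': 'string'}
```

Fields whose sampled values are all empty never appear in the inferred schema, so a real tool would need a separate policy for them.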

Data mappings. Time-consuming source-to-target mappings can now be performed by ML models and algorithms. ML's accuracy and breadth increase as it watches successful users map manually. Automated mappings increase the productivity of data developers, data scientists, and data-savvy business users.
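The "learn by watching" behavior can be approximated with two ingredients: a similarity score for proposing mappings, and a memory of mappings users have confirmed. In this sketch, name similarity from Python's standard `difflib.SequenceMatcher` stands in for a trained model; `MappingSuggester`, its 0.6 `threshold`, and the column names are hypothetical.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Compare names case-insensitively, ignoring underscores and spaces."""
    return name.lower().replace("_", "").replace(" ", "")

class MappingSuggester:
    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.learned = {}  # normalized source name -> target a user chose

    def accept(self, source, target):
        """Remember a mapping that a user made or confirmed manually."""
        self.learned[normalize(source)] = target

    def suggest(self, sources, targets):
        """Propose a target for each source: learned mappings take priority,
        otherwise the most similar target name above the threshold."""
        suggestions = {}
        for src in sources:
            key = normalize(src)
            if self.learned.get(key) in targets:
                suggestions[src] = self.learned[key]
                continue
            best, score = None, 0.0
            for tgt in targets:
                s = SequenceMatcher(None, key, normalize(tgt)).ratio()
                if s > score:
                    best, score = tgt, s
            if score >= self.threshold:
                suggestions[src] = best
        return suggestions

suggester = MappingSuggester()
targets = ["customer_name", "customer_email", "postal_code"]
print(suggester.suggest(["cust_email", "zip"], targets))
# prints {'cust_email': 'customer_email'}  ("zip" is too dissimilar to map)
suggester.accept("zip", "postal_code")  # a user maps it manually once
print(suggester.suggest(["cust_email", "zip"], targets))
# prints {'cust_email': 'customer_email', 'zip': 'postal_code'}
```

Unmapped sources are left for a person to handle, and each manual mapping the person makes widens what the tool can suggest next time.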

Data-anomaly detection. ML has the potential to spot and react to data defects, such as outliers, nonstandard data, and various data quality issues. Some tools go beyond detection to remediate data quality issues automatically, based on ML models or encoded business rules.
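For numeric columns, a common statistical baseline for outlier detection is the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD). The sketch below pairs it with one deliberately naive, rule-based remediation policy (replace each flagged value with the median). The 3.5 cutoff follows a widely used convention; the function names and the policy itself are illustrative assumptions, not a description of any particular tool.

```python
from statistics import median

def find_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.
    The median and MAD are not dragged around by a few extreme values
    the way a mean and standard deviation are."""
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values equal the median; treat any deviation as suspect.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

def remediate(values, threshold=3.5):
    """Naive remediation policy: replace each flagged outlier with the median."""
    flagged = set(find_outliers(values, threshold))
    med = median(values)
    return [med if i in flagged else v for i, v in enumerate(values)]

readings = [20.1, 19.8, 20.4, 20.0, 19.9, 200.0, 20.2]
print(find_outliers(readings))  # prints [5]
print(remediate(readings))      # 200.0 is replaced by the median, 20.1
```

In practice, many teams quarantine or flag suspect values for review rather than overwrite them, since an apparent outlier is sometimes a real event.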

Upcoming use cases for the ML automation and optimization of DM. In the near future, catalog-based ML will also contribute to data security, governance, capacity planning, system performance, and guided data exploration.

Editor's Note

This article is excerpted from the final section of the 2018 TDWI Checklist Report The Automation and Optimization of Advanced Analytics Based on Machine Learning. Read the entire report online here.

About the Author

Philip Russom is director of TDWI Research for data management and oversees many of TDWI’s research-oriented publications, services, and events. He is a well-known figure in data warehousing and business intelligence, having published over 600 research reports, magazine articles, opinion columns, speeches, Webinars, and more. Before joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and BI consultant and was a contributing editor with leading IT magazines. Before that, Russom worked in technical and marketing positions for various database vendors. You can reach him at prussom@tdwi.org, @prussom on Twitter, and on LinkedIn at linkedin.com/in/philiprussom.