

Tackling Bias and Explainability in Automated Machine Learning

Automated machine learning can introduce two critical problems: bias and a lack of explainability. Fortunately, vendors are introducing tools to tackle both.

Adoption of automated machine learning -- tools that help data scientists and business analysts (and even business users) automate the construction of machine learning models -- is expected to increase over the next few years because these tools simplify model building. For example, in some of the tools, all the user needs to do is specify the outcome or target variable of interest along with the attributes believed to be predictive. The automated machine learning (autoML) platform picks the best model.
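
For example, with an open source autoML library such as H2O's, the workflow can be roughly this simple. The data file, target name, and predictor columns below are illustrative assumptions, not a prescription for any particular platform:

# A minimal sketch of the autoML workflow described above, using H2O's open
# source AutoML library. The data file and column names are hypothetical.
import h2o
from h2o.automl import H2OAutoML

h2o.init()
train = h2o.import_file("customer_churn.csv")  # assumed training data

# The user specifies only the target and the candidate predictors;
# AutoML trains and ranks many candidate models automatically.
target = "churned"
predictors = [c for c in train.columns if c != target]
train[target] = train[target].asfactor()  # treat the target as categorical

aml = H2OAutoML(max_models=10, seed=1)
aml.train(x=predictors, y=target, training_frame=train)

print(aml.leaderboard)   # candidate models ranked by performance
best_model = aml.leader  # the model the platform picked as "best"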


These tools offer several benefits. First, they can help data scientists become more productive. Second, autoML can help those who are not data scientists (e.g., modern data analysts) build models. At TDWI, we've recommended that organizations that want to use these tools still maintain the skills needed to verify the insights they produce. Regardless of a model builder's skill level, a few areas are critical to address; two of the most important are bias and explainability.

Bias comes in many forms. On the data collection front, for instance:

  • Sample bias occurs when the training data doesn't represent the environment (e.g., the problem space) where the model will be deployed
  • Prejudice bias arises when training data reflects existing stereotypes or prejudices involving attributes such as race, gender, or nationality
  • Exclusion bias occurs when data that is wrongly deemed irrelevant is removed from the training set

On the model front:

  • Measurement bias can be introduced when the data used to train the model differs from the data the model encounters in production
  • Algorithmic bias occurs when the model itself produces systematically unfair outcomes, often because of the data it was trained on

Understanding and mitigating bias is crucial because machine learning models often make decisions that affect our lives -- in medicine, criminal justice, hiring, and finance.
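
As a simple illustration, a quick check of group representation and outcome rates in the training data can surface obvious sample bias or disparate outcomes before any model is built. The file and column names ("gender", "approved") here are hypothetical:

# An illustrative sanity check for imbalance in training data.
# The file and column names ('gender', 'approved') are hypothetical.
import pandas as pd

df = pd.read_csv("training_data.csv")

# Sample bias: does each group's share of the training data roughly match
# the population the model will actually serve?
print(df["gender"].value_counts(normalize=True))

# Disparate outcomes: does the favorable-outcome rate differ sharply by group?
print(df.groupby("gender")["approved"].mean())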

Explainability involves describing the why behind an ML prediction in a way a human can understand. For example, a customer should be able to understand why their loan application was rejected; a doctor should understand why a system made a certain diagnosis. Aside from ethical and transparency considerations, new regulations also require explainability. For instance, Article 22 of the GDPR gives individuals the right not to be subject to, and to contest, decisions based solely on automated processing. That requires that a model used to drive business decisions be understandable -- and that means explainable by those who created the model.

Bias, Explainability, and AutoML

According to TDWI research, if users stick to their plans, autoML adoption will grow significantly over the next few years. That means business analysts and even business users may be using these tools to build models. Those models may be operationalized as part of a business process or used simply to provide insights. Either way, model builders will need to be able to explain the output and understand how biased data can affect it.

At a minimum, users need to understand the risk of bias in their data set, because much of the bias in model building originates with people. Mitigating it doesn't mean simply throwing out variables, which, if done incorrectly, can introduce additional problems. Research into bias and explainability has grown in importance recently, and tools are starting to reach the market to help. For instance, the AI Fairness 360 (AIF360) project, launched by IBM, provides open source bias mitigation algorithms developed by the research community. These algorithms can be applied at the pre-processing stage (adjusting the training data), the in-processing stage (constraining the model as it is trained), and the post-processing stage (adjusting predictions) to identify and treat bias.
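
To make this concrete, here is a minimal sketch of applying one of AIF360's pre-processing algorithms, Reweighing, to a training set. The data frame, column names, and privileged/unprivileged group definitions are illustrative assumptions:

# A minimal sketch using AIF360's Reweighing pre-processing algorithm.
# The input data, column names, and group definitions are hypothetical.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

df = pd.read_csv("loan_applications.csv")  # assumed, numeric-only training data
dataset = BinaryLabelDataset(
    df=df,
    label_names=["approved"],
    protected_attribute_names=["sex"],
    favorable_label=1,
    unfavorable_label=0,
)
privileged = [{"sex": 1}]
unprivileged = [{"sex": 0}]

# Measure bias before mitigation (disparate impact close to 1.0 is fairer)
metric = BinaryLabelDatasetMetric(
    dataset, unprivileged_groups=unprivileged, privileged_groups=privileged
)
print("Disparate impact before:", metric.disparate_impact())

# Pre-processing mitigation: reweight instances to balance outcomes across groups
rw = Reweighing(unprivileged_groups=unprivileged, privileged_groups=privileged)
transformed = rw.fit_transform(dataset)

metric_after = BinaryLabelDatasetMetric(
    transformed, unprivileged_groups=unprivileged, privileged_groups=privileged
)
print("Disparate impact after:", metric_after.disparate_impact())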

Vendors, including SAS, DataRobot, and H2O.ai, are providing features in their tools that help explain model output. One example is a bar chart that ranks each feature's impact, making it easier to tell which features are important in the model. Vendors such as H2O.ai provide several kinds of output that help with explainability and bias: feature importance, Shapley values (e.g., how much a feature value contributed to the prediction), partial dependence plots, and disparate impact analysis. Disparate impact analysis quantitatively measures the adverse treatment of protected classes (e.g., is any class being treated differently by race, age, or gender?).
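
Similar explanations can be produced with open source libraries as well. Here is a minimal sketch using the shap package with a gradient-boosted model; the data set and model choice are illustrative and not any particular vendor's implementation:

# A minimal sketch of model explanation with the open source shap package.
# The data set and model are illustrative, not a vendor's implementation.
import shap
import xgboost
from sklearn.datasets import load_breast_cancer

# Train a simple classifier on a public data set
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target
model = xgboost.XGBClassifier(n_estimators=100).fit(X, y)

# Shapley values: how much each feature value pushed each prediction up or down
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view: a bar chart ranking overall feature impact
shap.summary_plot(shap_values, X, plot_type="bar")

# Local view: why the model scored one individual record the way it did
shap.force_plot(explainer.expected_value, shap_values[0, :], X.iloc[0, :], matplotlib=True)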

With these kinds of features, model builders can examine the analysis and determine whether their model is adversely impacting any group, judge whether the model is fair, and decide what next steps to take with it.

A Final Word

These are tough problems to solve and the work is just beginning. The good news is that vendors and end users alike are becoming aware of machine learning bias and are starting to take biased model output seriously -- and not simply because of legal and compliance issues.

The first step is to become aware of the problem, educated about how to address it, and familiar with both the human and technological approaches that can help mitigate bias.

About the Author

Fern Halper, Ph.D., is well known in the analytics community, having published hundreds of articles, research reports, speeches, webinars, and more on data mining and information technology over the past 20 years. Halper is also co-author of several “Dummies” books on cloud computing, hybrid cloud, and big data. She is VP and senior research director, advanced analytics at TDWI Research, focusing on predictive analytics, social media analysis, text analytics, cloud computing, and “big data” analytics approaches. She has been a partner at industry analyst firm Hurwitz & Associates and a lead analyst for Bell Labs. Her Ph.D. is from Texas A&M University. You can reach her at fhalper@tdwi.org, on Twitter @fhalper, and on LinkedIn at linkedin.com/in/fbhalper.

