TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Articles

The Shortcomings of Predictive Analytics

Data scientist Claudia Perlich explains why we must use machine learning and predictive technologies ethically, responsibly, and mindfully.

By Steve Swoyer
March 8, 2017

Do data scientists need a refresher course in the Hippocratic precept "first, do no harm"?

This is a question that data scientist Claudia Perlich has spent considerable time grappling with.

Predictive Models Have Unintended Side Effects

Perlich, chief data scientist with marketing analytics specialist Dstillery, believes data science and advanced analytics are powerful tools for human good, and she'll be making this case at TDWI's upcoming Accelerate conference, in Boston April 3-5. Accelerate will feature tutorials and presentations by Perlich and other industry luminaries.

In all likelihood, few sessions will be as provocative as Perlich's. Even though she's an unapologetic champion of predictive analytics, Perlich recognizes that machine learning and other technologies can only be forces for good if people use them ethically, responsibly, and mindfully.

"I'm a huge fan of this technology. I love what I do and I've been doing it for almost 20 years. In that time, I've collected a deep understanding of why things don't work, often for very surprising reasons that have nothing to do with classical reasons," she explains. "I'm really interested in when and why things fail. 'Failing' isn't [the right word]. I'm talking about 'unintended side effects' -- [things] you didn't really count on when you decided to build models and put them out there in the wild."

First, Perlich says, we have to recognize that predictive models embody the acknowledged and unacknowledged biases of the people who created them.

"If you use a machine learning system to automatically screen job candidates ... your predictive model may propagate historical biases. If a model makes predictions [based on] what has happened in the past, it is bounded by [the selection criteria of] the past," Perlich says. "All of us who are enthusiastically building these models need to develop a moral sense of responsibility ... about how and when they are put to use."
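The point about models being bounded by past selection criteria can be illustrated with a small sketch. Everything here is an assumption made for illustration: the records are hand-made, the groups "A" and "B" are placeholders, and the "model" is a deliberately naive one that simply memorizes historical hire rates. It is not a description of any system Perlich discusses.

```python
from collections import defaultdict

# Hypothetical past decisions: (group, years_experience, was_hired).
# The numbers are invented; they encode a past preference for group "A".
HISTORY = [
    ("A", 5, 1), ("A", 2, 1), ("A", 6, 1), ("A", 1, 0),
    ("B", 5, 0), ("B", 7, 1), ("B", 6, 0), ("B", 2, 0),
]

def selection_rates(rows):
    """Return the fraction of candidates hired within each group."""
    hired = defaultdict(int)
    total = defaultdict(int)
    for group, _years, was_hired in rows:
        total[group] += 1
        hired[group] += was_hired
    return {group: hired[group] / total[group] for group in total}

def predict(model, group, threshold=0.5):
    """Naive screen: recommend a candidate if their group's past hire rate
    meets the threshold. Experience is ignored, which is the flaw."""
    return model[group] >= threshold

# "Training" here is just memorizing what happened in the past.
model = selection_rates(HISTORY)

# Two candidates with identical experience get different recommendations.
recommend_a = predict(model, "A")
recommend_b = predict(model, "B")
print(model, recommend_a, recommend_b)
```

A real screening model would be far less crude, but the same effect can appear whenever the training labels are past decisions rather than an independent measure of candidate quality.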

Models Give You Exactly What You Ask For

This "moral" sense isn't limited to scrubbing biases out of models. In some cases, a predictive model is optimized for the letter, but not the spirit, of what the modeler wants.

"I have seen the exact analogous effect in advertising ... when we talk about models that predict who will click on ads and we try to select those opportunities with the highest probability [of click-through]. You're trying to find the people most interested in the product -- people who will actually buy the product," she explains.

"This ignores the fact that people tend to accidentally click on ads. A person has eyesight problems; a person has lent their device to their three year old; a person is distracted. If you base your model on all [click-through data], you're going to ... end up with something that is technically correct but doesn't actually do what you want it to do."

Data scientists are responsible not just to the strict letter of a requirement -- e.g., predicting successful job applicants or click-through opportunities -- but also to the spirit of what they're trying to model and measure, she argues.

"The model is doing its job. It will find you a set of opportunities with the highest click-through rate. The applicant recommended by the [candidate-screening] model will be highly likely to succeed. [However,] you are stuck with this incompatibility where you're saying you want one thing and your model is giving you something else entirely," Perlich says.

"The discrepancy between the two objectives will increase as you are more able to do [the one thing] really, really well, [be it identifying] higher click-through rates or successful job applicants."
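The gap between the click-through objective and the purchase objective can be sketched with a toy ranking. The segment names and counts below are hypothetical and chosen only to make the effect visible; they are not data from Dstillery or from the article.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    name: str
    impressions: int
    clicks: int      # includes accidental clicks
    purchases: int   # the outcome the advertiser actually cares about

# Hypothetical, hand-made numbers for illustration only.
SEGMENTS = [
    Segment("shared_tablet_kids_app", 1000, 80, 1),
    Segment("flashlight_app_mobile", 1000, 60, 2),
    Segment("product_review_site", 1000, 20, 12),
    Segment("comparison_shopping", 1000, 15, 10),
]

def rate(numerator: int, denominator: int) -> float:
    """Safe ratio that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0

def rank_by(segments, key):
    """Return segment names sorted from highest to lowest score."""
    return [s.name for s in sorted(segments, key=key, reverse=True)]

# Ranking by the proxy (clicks) versus the real goal (purchases).
by_click = rank_by(SEGMENTS, lambda s: rate(s.clicks, s.impressions))
by_purchase = rank_by(SEGMENTS, lambda s: rate(s.purchases, s.impressions))

print("top by click-through:", by_click[0])
print("top by purchase rate:", by_purchase[0])
```

With these invented numbers, the segment that wins on click-through is the one that finishes last on purchases, so a model that optimizes the proxy perfectly would steer spending toward the least valuable opportunities.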

Designing Better Models

When you're designing predictive models, there are a couple of things to be alert to, Perlich says.

"You should never have any single technical criteria -- you should never focus just on click-through rates, for example. You should never try to do too much with your [individual] models. It's hard to build models that are optimized [for] many things at the same time," she observes.

"If your model is getting too good, it's almost always a problem. There was an example where we built a really good model that predicted breast cancer -- except it didn't. The only thing it had basically learned was ... that people in a [breast cancer] treatment center are more likely to have cancer than people in a breast cancer screening center."
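One rough way to catch the kind of shortcut described in the breast cancer example is to ask how well each feature predicts the label on its own. The sketch below is a crude in-sample check under stated assumptions: the records, the feature names (`site_type`, `size_bucket`), and the 0.85 warning threshold are all hypothetical, and this is not the procedure Perlich's team used.

```python
from collections import Counter, defaultdict

# Hypothetical records: (site_type, size_bucket, has_cancer).
ROWS = [
    ("treatment", "large", 1), ("treatment", "small", 1),
    ("treatment", "large", 1), ("treatment", "small", 1),
    ("screening", "small", 0), ("screening", "large", 0),
    ("screening", "small", 0), ("screening", "large", 1),
    ("screening", "small", 0), ("screening", "large", 0),
]
FEATURES = {"site_type": 0, "size_bucket": 1}  # column index per feature
LABEL_INDEX = 2

def single_feature_accuracy(rows, feature_index):
    """In-sample accuracy of predicting the majority label for each
    distinct value of one feature."""
    labels_by_value = defaultdict(Counter)
    for row in rows:
        labels_by_value[row[feature_index]][row[LABEL_INDEX]] += 1
    correct = sum(c.most_common(1)[0][1] for c in labels_by_value.values())
    return correct / len(rows)

def suspicious_features(rows, features, threshold=0.85):
    """Flag features that nearly determine the label by themselves."""
    scores = {name: single_feature_accuracy(rows, idx)
              for name, idx in features.items()}
    flagged = [name for name, score in scores.items() if score >= threshold]
    return scores, flagged

scores, flagged = suspicious_features(ROWS, FEATURES)
print(scores, flagged)
```

A flagged feature is not proof of a problem, only a prompt to ask whether that information would really be available, and meaningful, at the moment a prediction is needed.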

Perlich sees the zero-sum character of societal debate about data mining and data science as a distraction. "The criticism being brought forward against data mining and data science is, in principle, often correct, but at the same time the antagonism between the critics of data science and its actual practitioners is exaggerated and nonproductive," she points out. "We're being told from a privacy point of view that everything we do is evil. What we need to ... collaborate on are better options to do these things the right way."

Because of its power, predictive technology will be used. It's inevitable. The challenge is to promote ethical and responsible usage.

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at evets@alwaysbedisrupting.com.
