Live, Virtual Data and AI Training

Course Description

07/02/2025 Data Cleaning Demystified: Tools and Techniques for Rapid Results

July 2, 2025

9:00 am - 12:30 pm CT
Half-Day (Morning)

Prerequisite: None

Mark Madsen

President

Third Nature

Mark Madsen works on using data, analytics, and AI to augment human decision-making and evolve organizational systems. Mark is president of Third Nature, where he advises companies on data science and analytics strategy, product design, and data governance. He is also a Fellow emeritus of the Technology & Innovation Office at Teradata.

Mark has spent the past 25 years working in analytics and decision support. He started with AI at the University of Pittsburgh and autonomous robotics at Carnegie Mellon University, and he has won several international awards. He is also involved with emerging technology as a researcher, chairs events, serves on several conference committees, is a member of the TDWI faculty, and is a member of the Data Engineering & Science Council.

As self-service and data lakes spread, more people are encountering a key issue: what do you do when you need to clean data? Whether you are a business analyst, data scientist, or data management professional, this challenge can keep you from finding useful insights in your data. In this course, you will learn principles and practices for preparing data, whether your task is to create a dashboard, do a quick analysis, or prepare data for machine learning.

Every organization says it wants clean, high-quality data, but most are unwilling to pay for the staff and infrastructure needed to provide it. We are often given “self-service” access to data lakes or direct access to source applications. Too often, the data was collected in a format and for a purpose other than analysis, visualization, or modeling. No one promoting self-service mentions that you will have to clean the data first. How do you get it done quickly? What can you use?

You don’t have to be a programmer to do the work, nor does it have to take weeks. Many no-code and low-code tools can help you clean data faster. The key is to understand three things: what kinds of quality problems you have, what your end task requires, and what you need to do to make the data “good enough.”

This class is an introduction to data cleaning concepts, practices, and tools, with emphasis on techniques and methods to get the job done quickly.

You Will Learn

  • Where dirty data comes from, how it got that way, and why it will continue
  • Processes and practices for cleaning your data
  • Overview of the kinds of tools available, from basic to advanced
  • Finding and categorizing data problems
  • Techniques, tips, and tricks to clean up your data
  • Profiling and triage
  • Data types and the problems of text file formats
  • How and when to fill in missing values
  • Dealing with duplicates
  • Fixing strings, dates, and other messy data
  • Standardizing, conforming, and normalizing data
  • What to do with the data after you’re done

Geared To

Anyone who needs to perform data cleanup tasks in order to do the thing they really want to do:

  • Business analysts and self-service users
  • BI and analytics team members
  • Analytics managers
  • Data scientists
  • Data engineers
  • Architects

Half-Day Pricing: $350
