Level: Intermediate to Advanced
Prerequisite: None
As self-service access and data lakes spread, more people run into the same problem: what do you do when the data needs cleaning before you can use it? Whether you are a business analyst, data scientist, or data management professional, dirty data gets in the way of finding useful insights. In this course, you will learn principles and practices for preparing data, whether your task is to create a dashboard, do a quick analysis, or prepare data for machine learning.
Every organization says it wants clean, high-quality data, but most are unwilling to pay for the staff and infrastructure that would provide it. Instead, we are often given “self-service” access to data lakes or direct access to source applications. Too often, the data was collected in a format and for a purpose other than analysis, visualization, or modeling. No one promoting self-service mentions having to clean the data first. How do you get it done quickly? What can you use?
You don’t have to be a programmer to do the work, and it doesn’t have to take weeks. There are many no-code and low-code tools available to help you clean data faster. The key is to understand three things: what kinds of quality problems you have, what your end task requires, and what it takes to make the data “good enough.”
This class is an introduction to data cleaning concepts, practices, and tools, with emphasis on techniques and methods to get the job done quickly.
You Will Learn
- Where dirty data comes from, how it got that way, and why it will continue
- Process and practices for cleaning your data
- Overview of the kinds of tools available, from basic to advanced
- Finding and categorizing data problems
- Techniques, tips, and tricks for cleaning up your data
- Profiling and triage
- Data types and the problems of text file formats
- How and when to fill in missing values
- Dealing with duplicates
- Fixing strings, dates, and other messy data
- Standardizing, conforming, and normalizing data
- What to do with the data after you’re done
Geared To
Anyone who needs to perform data cleanup tasks in order to do the thing they really want to do:
- Business analysts and self-service users
- BI and analytics team members
- Analytics managers
- Data scientists
- Data engineers
- Architects