How to Develop a Data-Literate Workforce
These three fundamental approaches will guide you as you implement data literacy across your enterprise.
- By Sharad Varshney
- August 16, 2021
Data literacy is critical. When business users are data literate, they can collaborate on data projects, innovate, and make better business decisions. Ultimately, when data literacy is widespread, an organization can experience companywide growth.
In this article, we'll cover the steps required to implement -- and maintain -- a successful data literacy program. Core steps include introducing data discovery mechanisms, identifying and standardizing common terms and definitions, classifying company data assets, and making data searchable.
Why Data Literacy Is Mission-Critical
You probably already know the importance of data literacy, but to frame this article, let's position the benefits in a modern data governance setting. The best way to do so is to use an example where the absence of data literacy led to disastrous consequences.
There are many well-known examples of data literacy issues leading to extreme failures. However, one of the most significant occurred at NASA in 1999 and led to the loss of a $125 million Mars probe.
The probe was lost as it approached Mars far lower than planned, because of a mathematical error caused by conflicting units. The navigation team at NASA's Jet Propulsion Laboratory (JPL) worked in metric units, while software from Lockheed Martin Astronautics, the company responsible for designing and building the probe, supplied thruster data in imperial units (pound-force seconds rather than the newton-seconds JPL expected).
Because there were no common terms or definitions in place, the JPL team read the data inaccurately and miscalculated the effect of the thruster firings on the craft's trajectory. The result was catastrophic, but it could have been easily avoided if a system of data literacy had been in place.
Identifying Common Data Literacy Issues
Organizations face three key problems when data literacy is not a priority. They include issues with access, terminology, and standardization.
Access
Problem: When data is not cataloged effectively, data literacy is harder to implement. Organizations have data spread across tens, hundreds, or even thousands of data sources, and that dispersal makes the data hard to find, access, and learn about. Even a user who is motivated to innovate through data analytics will struggle to find the data assets needed.
Solution: Data literacy is greatly improved when users can access data quickly and easily. Further down the line, easy access makes collaborative efforts more successful, too, so teams of data users can derive greater value from data sets.
A data catalog draws in data from all of a company's data sources and presents it on a single platform. Data catalogs make this information searchable.
This combination of accessibility and searchability is key to encouraging regular business users to adopt data analytics techniques and make data-driven decisions.
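To make the idea of a searchable catalog concrete, here is a minimal Python sketch. It assumes a catalog is simply a list of entries that record where each asset lives and what it contains. The `CatalogEntry` fields, the `search_catalog` helper, and the sample sources are all hypothetical, invented for illustration; a real data catalog product would harvest this metadata from the sources automatically.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """One data asset registered in the (hypothetical) catalog."""
    source: str            # system that holds the asset
    name: str              # table or file name as it appears in the source
    description: str = ""  # plain-language summary for business users
    tags: tuple = ()       # free-form keywords


def search_catalog(entries, query):
    """Return entries whose name, description, or tags contain the query."""
    needle = query.lower()
    hits = []
    for entry in entries:
        haystack = " ".join([entry.name, entry.description, *entry.tags]).lower()
        if needle in haystack:
            hits.append(entry)
    return hits


# Illustrative entries from three separate (made-up) source systems.
catalog = [
    CatalogEntry("registrar_db", "STU_ENROLL", "Student enrollment records", ("student",)),
    CatalogEntry("finance_dw", "INV_2021", "Invoices issued in 2021", ("billing",)),
    CatalogEntry("crm", "contacts", "Customer contact details", ("customer",)),
]

for entry in search_catalog(catalog, "student"):
    print(f"{entry.source}.{entry.name}: {entry.description}")
```

The point of the sketch is that the user searches one place with an everyday word and is told which source holds the matching asset, instead of having to know the source system in advance.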
Terminology
Problem: In the absence of data literacy, different users and departments will use different terms to describe the same data point. For example, a user trying to find student names may use the term student name. However, if the information is classified in a table with field name STU Nm, the user won't be able to find it. This also leads to confusion when users do reach the data, because they can't tell what a field represents and may misinterpret it.
Solution: Using the functions of a data catalog, assign the terms you use in your organization to specific data sets to avoid mixed messaging and confusion.
Let's take the same example of student name. You take the different terms used for it, such as STU name, Student Desc, and Student, and attach them to a single term, Student Name, in a business glossary. Whenever end users come across one of the versions of the term, they see it attached to the standard term Student Name.
This method enables you to keep the terms already in use and clarify them using a standard term. Also, any time a new table is added, the standard term information will be available so there should be no repeat of the problem.
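The mapping described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the glossary is an in-memory dictionary, variants are matched after stripping case and separators, and the `BusinessGlossary` class and its method names are hypothetical rather than the API of any particular catalog product.

```python
def normalize(term):
    """Lowercase and drop separators so 'STU_Nm' and 'stu nm' compare equal."""
    return "".join(ch for ch in term.lower() if ch.isalnum())


class BusinessGlossary:
    """Maps the variants already in use to one standard term."""

    def __init__(self):
        self._standard_by_variant = {}

    def register(self, standard_term, variants):
        """Attach each variant (and the standard term itself) to the standard term."""
        self._standard_by_variant[normalize(standard_term)] = standard_term
        for variant in variants:
            self._standard_by_variant[normalize(variant)] = standard_term

    def lookup(self, term):
        """Return the standard term for a variant, or None if it is unknown."""
        return self._standard_by_variant.get(normalize(term))

    def annotate(self, column_names):
        """Label the columns of a newly added table with their standard terms."""
        return {name: self.lookup(name) for name in column_names}


glossary = BusinessGlossary()
glossary.register("Student Name", ["STU name", "STU Nm", "Student Desc", "Student"])

print(glossary.lookup("stu_nm"))  # Student Name
print(glossary.annotate(["STU Nm", "Student Desc", "GPA"]))
```

Note that the existing field names are never renamed. The glossary only adds a label next to them, and a column with no match (here, GPA) comes back as `None`, which flags it as a term that still needs a glossary entry.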
When implementing data literacy, organizations are often hesitant to change terminology across the board. For example, different people in an organization may use different applications, and changing terms across applications is difficult, time-consuming, and expensive. You have to change hundreds of reports and potentially thousands of documents where the terms are being used.
A data catalog automates the process, so whenever users see a term used they also see the standard term. The enterprise doesn't have to change this term over multiple applications.
Slowly, this also builds a culture around the specific terms you authorize. It provides transparency and clarity. In essence, if you can't change the term, then providing information about it is the best mechanism for improving data literacy. What cannot be changed should be endured -- but you should make it easy to do so.
Standardization
Problem: In addition to issues with terminology, there are often discrepancies when explaining definitions. Standardizing these definitions is critical. As we explained earlier, using different standards can lead to serious negative outcomes. Although the case for standardization of terms is strong, the standardization of definitions is an undeniable business priority.
Of course, not every business user relies on standardized terms to land a spacecraft, but they are still important. For example, in healthcare, different hospitals may use different definitions of the length of stay. Sales teams might measure performance KPIs differently from HR. In the manufacturing sector, units of measurement must be consistent for a product to reach the end of a production line complete.
Solution: Data users can create a business glossary where company definitions are standardized, improving data quality. Using a business glossary, you can spell out exactly what length of stay or a performance KPI means -- even the units of measurement required to land a spacecraft on Mars -- so that similar-sounding terms are not confused.
Just as when you standardize terms, using a business glossary to create companywide definitions will ensure that critical business processes run smoothly.
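As a rough illustration of a standardized definition, the Python sketch below pairs a glossary term with one canonical unit and refuses values whose unit it does not recognize. The `Definition` class, the `to_canonical` helper, and the sample term are hypothetical; only the conversion factors (1 foot = 0.3048 meters, 1 inch = 0.0254 meters) are fixed facts.

```python
from dataclasses import dataclass

# Exact conversion factors into meters, the canonical unit chosen here.
TO_METERS = {"m": 1.0, "mm": 0.001, "ft": 0.3048, "in": 0.0254}


@dataclass(frozen=True)
class Definition:
    """A glossary definition that fixes the meaning and unit of a term."""
    term: str
    meaning: str
    canonical_unit: str
    factors: dict  # accepted unit -> multiplier into the canonical unit


# Hypothetical glossary entry for a length measurement.
LENGTH = Definition(
    term="Length",
    meaning="Linear distance, always stored and reported in meters.",
    canonical_unit="m",
    factors=TO_METERS,
)


def to_canonical(definition, value, unit):
    """Convert a value to the definition's canonical unit, or fail loudly."""
    if unit not in definition.factors:
        raise ValueError(
            f"{definition.term}: unit {unit!r} is not defined in the glossary"
        )
    return value * definition.factors[unit]


# A value supplied in feet is converted before anyone uses it.
print(to_canonical(LENGTH, 10.0, "ft"), LENGTH.canonical_unit)
```

The design choice worth noting is that every value must arrive with a declared unit. A number with a missing or unrecognized unit raises an error instead of being silently treated as metric.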
A Bespoke Approach
When you embark on a data governance initiative, it's quite tempting to try to find a standard protocol for data literacy that will enable you to achieve a common outcome. However, because every company has a different objective and access to unique data assets, there is no industry standard.
Instead, you must focus on the three key areas we have discussed in this article -- access, terminology, and standardization -- and adapt your literacy efforts to suit your organization's internal expectations. This way, your implementation addresses the data problems your organization actually has.