When we speak about literacy in everyday usage, we are usually referring to the ability to read and write in a given language, that is, to understand ideas from other people, and to communicate information to other people. From a pedagogical standpoint, literacy refers to a more specific set of concepts, but we'll stick with this basic understanding here. In this blog post we will discuss data literacy.
Increasingly we hear the phrase "data literacy," and to be honest we use it ourselves on occasion. It's not a perfect analogue to linguistic literacy, since most definitions of data literacy include at a minimum understanding, interpreting, and communicating about data, which is likely to entail more than reading and writing. Still, the term resonates as we think about why organizations struggle to make effective use of their data.
Dig a little deeper into most descriptions of data literacy and you'll see that a few more basic requirements emerge: a certain level of numeracy, such as arithmetic; knowledge of basic statistics, such as means and medians; and an understanding of the difference between quantitative and qualitative data.
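To make the point about means and medians concrete, here is a minimal sketch in Python. The salary figures are invented purely for illustration; the only thing the example is meant to show is that a single extreme value can pull the mean far away from what most of the data looks like, while the median stays put.

```python
from statistics import mean, median

# Hypothetical annual salaries for a five-person team (illustrative only).
# One value is much larger than the others.
salaries = [42_000, 45_000, 47_000, 50_000, 250_000]

average = mean(salaries)    # sum of the values divided by how many there are
middle = median(salaries)   # the middle value once the list is sorted

print(f"mean:   {average}")
print(f"median: {middle}")
```

A data literate reader who is told only the mean would want to ask whether the median tells a different story, and here it does: four of the five salaries sit well below the mean.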
Ultimately, however, the reason we want data literate employees is so that we can make decisions informed by data. In order to do that, people must be able to use analytical principles, to apply logical reasoning, to evaluate whether and which data is suitable for a given problem or task, and to explain their work and process.
Baseline data literacy sounds like a combination of knowledge, skills, aptitudes, and methods. Some level of data literacy is needed in our data producers, our data providers, and our data consumers. But just as people fall into these roles based on differing skills and experiences, we should expect the exact forms of their data literacies to vary as well, and plan and act accordingly.
Data literacy needs to manifest at every point in the data management life cycle, since at each stage a different set of people's needs comes to the fore.
We work with too many organizations whose lack of data literacy hinders them from the very first stage of the data life cycle, data acquisition. Too many of our clients do not gather or collect data; they mainly accumulate it. Sometimes data is generated in the course of regular activity, and the challenge at this moment is generally capturing it and storing it. But as more and more data becomes available, it is ever easier to acquire it haphazardly, or without a real understanding of how it could be useful. When we gather or collect other objects, we have standards, patterns, and goals; those should also be present during data collection. Data will not assemble itself into a set suitable for analysis, so the greater the fluctuations during its collection, the more challenging the next steps become.
We would expect data literacy to be present when data moves into the analysis stage, since analysts tend to have strong backgrounds, if not formal training, in data manipulation and statistical analysis. But the prevailing understanding is that analysts spend 70% or more of their time simply preparing data for analysis, whether that's joining disparate data sets, imputing or removing missing values, identifying and resolving outliers, defining variables to assist in analysis, and so on.
How, exactly, do analysts identify outliers or impute missing values? No matter how skilled an analyst may be, if they don't have an understanding of the business context around the data, then their ability to provide useful insight is limited. Moreover, the form in which data is initially stored tends not to be the form that lends itself to real analysis, so some amount of modeling, aggregating, and transforming may be necessary; this looks like a different combination of data literacy factors.
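As a rough illustration of what those preparation steps can look like, here is a minimal Python sketch of two common, generic techniques: flagging outliers with the interquartile range (Tukey's rule) and filling missing values with the median. The function names and the daily order counts are our own assumptions for the example, and these are by no means the only ways to do either job.

```python
from statistics import median, quantiles

def flag_outliers(values, k=1.5):
    """Return the values that fall outside k * IQR of the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical daily order counts; None marks a day with no recorded value.
daily_orders = [12, 15, None, 14, 13, 90, 16]

observed_orders = [v for v in daily_orders if v is not None]
outliers = flag_outliers(observed_orders)
filled_orders = impute_median(daily_orders)

print("flagged as outliers:", outliers)
print("after imputation:   ", filled_orders)
```

The code can flag the day with 90 orders, but it cannot say whether that day was a data entry error or a successful promotion, and it cannot say whether a missing day should be filled in at all or treated as a day the business was closed. Those answers come from business context, not from the statistics.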
Ideally, when data is delivered into the sharing or publishing stage, it will have been cleansed thoroughly and analyzed sufficiently. Key performance indicators and management metrics, perhaps presented visually or even interactively, can then serve as the impetus for change, or reassurances for staying the course.
Even in this ideal situation, data consumers need a set of data skills that allows them to process charts, graphs, and tables, to draw inferences based on the data, and to formulate decisions informed by their understanding. Arriving at the wrong conclusions about data can lead to suboptimal, even disastrous, choices and actions. So the successful pipeline from analysis to presentation, like the pipeline from collection to analysis, needs not only the right amount but also the right application of data literacy.
The final stage in the data management life cycle has to do with the disposition of data, whether that's some form of retention (such as cycling back to the first stage for further use or archiving) or some form of destruction. But decisions at this stage are also deeply tied to data literacy: deciding whether data is suitable for further analysis and thus needs to be retained in some fashion requires analytical and potentially statistical aptitude; being able to calculate how many records fall into various categories (and then confirming they ended up where they were supposed to go) requires mathematical skills; and of course understanding legal or other regulatory regimes that affect these decisions is a prime example of data in context.
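The counting and confirming described above can be sketched in a few lines of Python. Everything specific here is hypothetical: the seven-year retention period, the legal hold flag, the three category names, and the field names are assumptions made for the example, not a statement of any actual policy or regulation.

```python
from collections import Counter

RETENTION_YEARS = 7  # hypothetical policy, for illustration only

def planned_disposition(record, current_year):
    """Decide where a record should go under the hypothetical policy."""
    if record["legal_hold"]:
        return "archive"
    age = current_year - record["created_year"]
    return "reuse" if age < RETENTION_YEARS else "destroy"

def reconcile(records, actual_counts, current_year):
    """Compare planned counts per category with what actually happened.

    Returns {category: (planned, actual)} for every category that differs.
    """
    planned = Counter(planned_disposition(r, current_year) for r in records)
    categories = set(planned) | set(actual_counts)
    return {
        c: (planned[c], actual_counts.get(c, 0))
        for c in categories
        if planned[c] != actual_counts.get(c, 0)
    }

# Hypothetical records and the counts reported after the disposition job ran.
records = [
    {"id": 1, "created_year": 2012, "legal_hold": False},
    {"id": 2, "created_year": 2015, "legal_hold": True},
    {"id": 3, "created_year": 2021, "legal_hold": False},
    {"id": 4, "created_year": 2010, "legal_hold": False},
]
reported = {"destroy": 1, "archive": 1, "reuse": 1}

mismatches = reconcile(records, reported, current_year=2024)
print(mismatches)
```

An empty result would mean every record ended up where it was supposed to go. In this made-up case, two records were slated for destruction but only one was reported destroyed, which is exactly the kind of discrepancy someone needs to be able to notice and chase down.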
So how does an organization ensure that it has the right level of data literacy throughout its data pipelines? Existing employees can upskill through training and study, if available, but if it's managerial data consumers who most need this training, they are the least likely to be able to make time for this. When new employees are hired, data literacy and associated skills can be part of the hiring criteria, but since even the wealthiest corporations report a shortage of skilled data scientists and analysts, your organization may be competing with others with deeper pockets over an already small group of potential hires.
As with our approach to data governance and data intelligence, we recommend that you neither take a one-size-fits-all approach nor attempt to solve your problem all at once.
Data collection, for example, is generally managed or overseen by data stewards, whatever their title. These people can define business terminology and relate it to data their units collect and process. They can develop rules and standards for data quality, particularly around accuracy and integrity, and ensure that incoming data adheres to those standards.
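One way to picture rules and standards for incoming data is as a small set of named checks that each record either passes or fails. The Python sketch below is only that, a picture: the field names, the three rules, and the quantity limit are hypothetical, and real data quality standards would come from the stewards who know the data.

```python
import re

# Hypothetical data quality rules, each a plain-language name plus a check.
RULES = {
    "customer_id is present": lambda r: bool(r.get("customer_id")),
    "email looks well formed": lambda r: re.fullmatch(
        r"[^@\s]+@[^@\s]+\.[^@\s]+", r.get("email") or ""
    ) is not None,
    "quantity is between 1 and 100": lambda r: isinstance(r.get("quantity"), int)
    and 1 <= r["quantity"] <= 100,
}

def failed_rules(record):
    """Return the names of the rules that a record does not satisfy."""
    return [name for name, check in RULES.items() if not check(record)]

# Two made-up incoming records.
good_record = {"customer_id": "C-001", "email": "pat@example.com", "quantity": 3}
bad_record = {"customer_id": "", "email": "not-an-email", "quantity": 450}

print(failed_rules(good_record))
print(failed_rules(bad_record))
```

Giving each rule a readable name matters as much as the check itself, because the name is what a data steward, an analyst, and a data consumer can all discuss without reading the code.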
When data is defined - and those definitions are shared - then architects, modelers, and analysts alike better understand its provenance and its purpose. When data quality is maintained from origination, then those involved in data analysis have to do less cleansing, less manipulating, less guessing, which can free them up to, among other things, annotate their analytical outputs, and experiment with techniques to make their visualizations richer and clearer.
The easier it is to glean insight from data, the more likely data consumers are to act on that insight. Better (more engaging) data products, delivered faster, sourced from trusted data collections, will enable leaders and managers to more fully factor data into their decisions. They can become savvier consumers, leading in turn to their asking productive questions about data, which in turn can lead to changes and continued improvement at every point in the data life cycle.
The Data Cookbook can be a powerful component of your organization's journey to widespread data literacy. At the heart of the Data Cookbook is a business glossary, where functional and technical definitions of data terminology are stored, as well as lineage, collection methodology, and data quality standards. Also at its heart is a knowledge base, consisting of an inventory of data systems and collections, as well as a curated library of data deliverables such as reports and dashboards. Simply through documentation, the Data Cookbook helps demystify data and make it transparent at every point in its life cycle. Furthermore, the Data Cookbook includes a collaboration engine, where consumers can ask questions, perform searches for data in context, and conduct discovery; where engineers and analysts can share the details and accessible summaries of the work at the core of analysis and business intelligence; and where data stewards and subject matter experts convert tacit understandings into a crowdsourced but professionally moderated knowledge base.
Ultimately, the Data Cookbook can help your organization document data in context, providing a springboard for analytics, insight, data-enabled decisions, and other ways to unlock the value stored in your data assets. We hope that you enjoyed this post.
The Data Cookbook can assist an organization in its data governance, data intelligence, data stewardship, and data quality initiatives. IData also has experts who can assist with data governance, reporting, integration, and other technology services on an as-needed basis. Feel free to contact us and let us know how we can assist.
Photo Credit: StockSnap_UOAUVG4AHV_womansmiling_dataliteracy_BP #B1202