One reason we don't get as much out of our data as we could is that we don't trust it. In this blog post we cover why we don't trust our data, and we discuss two common responses: establishing a single source of truth and certifying data products.
Why don't we trust it?
Sometimes the data seems incomplete or inaccurate, or there appears to be some other data quality problem (real or perceived). One real data quality problem that many of our clients struggle with is the length of time between asking a question and receiving an answer. We find that analysts go to great lengths to provide accurate data, but if the final result isn't timely, it is hard to use it effectively.
Perhaps the data does not seem consistent, and so we are loath to rely on it for important decisions. How can I trust this week's numbers when last week's looked so different? Or, when I ask Michael and Julie what I think is the same question and get two different answers, what do I do?
The data does not seem legible to decision makers: key performance indicators appear to tell two opposing stories, key metrics don't reflect our experience on the job, or there is simply too much, or too little, data provided.
Maybe decision makers don't understand the data: they don't know what data terminology means in a given context, they don't know how this particular data point could affect their work, or they don't have the quantitative or logical reasoning skills to interpret subtle variations.
Decision makers think they've been burned before: they relied on data to support a decision in the past, but the outcome went south, or the decision drew too much negative publicity.
Any number of cognitive biases can be in play. We know, for example, that people are very good at poking holes in evidence that contradicts their beliefs, but not nearly as good at recognizing evidence that calls their priors into question. It's easy to fall victim to recency bias, or availability heuristics, or any number of mental shortcuts that don't involve grappling with the full scope of information that might be relevant.
To combat this problem of a general lack of trust in data, many organizations look to establish a single source of truth, generally some kind of reporting data store from which key data can be extracted and presented. The idea is that data that goes into this store has been vetted in some fashion, and so products that are built against it are more reliable, and thus easier to trust.
Well, maybe. It turns out that curating a data warehouse, for example, is time-consuming, and it's increasingly difficult to keep up with demands to funnel data from a growing number of sources into the user layers. And what does it mean to vet or curate that data, anyway? Technical documentation around ETL processes rarely helps consumers: often they can't get to it, and even when they can, it might as well be written in Greek.
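To make that vetting more concrete and more legible, some teams express their quality rules as plain, runnable checks that consumers can read alongside the documentation. Here is a minimal sketch in Python; the field names, rules, and thresholds are our own illustrative assumptions, not a prescription for any particular warehouse or tool.

```python
# Hypothetical example of expressing data quality rules as readable checks.
# Field names, rules, and thresholds are illustrative assumptions.

from datetime import date, timedelta

def check_completeness(rows, required_fields):
    """Return rows missing a value for any required field."""
    return [r for r in rows if any(r.get(f) in (None, "") for f in required_fields)]

def check_freshness(rows, date_field, max_age_days):
    """Return rows whose load date is older than the allowed window."""
    cutoff = date.today() - timedelta(days=max_age_days)
    return [r for r in rows if r[date_field] < cutoff]

# A tiny, made-up extract to run the checks against.
enrollments = [
    {"student_id": "S001", "term": "2024FA", "loaded_on": date(2024, 9, 1)},
    {"student_id": None,   "term": "2024FA", "loaded_on": date(2024, 9, 1)},
]

incomplete = check_completeness(enrollments, ["student_id", "term"])
stale = check_freshness(enrollments, "loaded_on", max_age_days=30)
print(f"{len(incomplete)} incomplete row(s), {len(stale)} stale row(s)")
```

The point is less the code than the legibility: a consumer who can read the rule can judge for themselves what "vetted" means.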
Also, as we suggested above, many of the reasons that people don't trust data stem from too much opacity at too many points in the data life cycle. How exactly does this single source ensure consistency in reported data? What about when the transactional systems where data is captured have data quality issues? How do users know when data has been cleaned for their use? What guarantees do they have when they report a potential problem? For that matter, what about data managed and used by only one business unit: is it worth the trouble to replicate it in another location?
So a single source of truth is probably the right move, but it can't be the only move you make. We have worked with a number of clients and recommended that they develop a path to certifying some of their data products. This certification is designed to confirm the validity and currency of the data, and to declare that the data sources have been vetted, cleansed, or optimized for reporting. But the process also involves getting data consumers more deeply involved in the review and testing stages, to ensure that the presentation is clear and that the context for the data has been explained, or at least described.
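As a rough illustration of what a certification might capture, the sketch below models a certification record for a data product: what was certified, by whom, against which sources, and for how long. The fields and example values are assumptions we made up for illustration; they are not the Data Cookbook's data model or any client's actual schema.

```python
# Hypothetical sketch of a certification record for a data product.
# Fields and statuses are illustrative assumptions, not a real tool's schema.

from dataclasses import dataclass
from datetime import date

@dataclass
class Certification:
    product_name: str        # e.g., a dashboard or a standard report
    data_sources: list       # the vetted upstream sources
    certified_by: str        # the data steward who signed off
    certified_on: date
    expires_on: date         # certification should be revisited, not permanent
    consumer_reviewed: bool  # did data consumers review the presentation?

    def is_current(self, today=None):
        """A certification is only meaningful while it is current."""
        return (today or date.today()) <= self.expires_on

cert = Certification(
    product_name="Weekly Enrollment Dashboard",
    data_sources=["sis_extract", "admissions_feed"],
    certified_by="Registrar's Office data steward",
    certified_on=date(2024, 9, 1),
    expires_on=date(2025, 9, 1),
    consumer_reviewed=True,
)
print(cert.is_current(today=date(2024, 10, 1)))  # True while within the window
```

Note the expiration date: certification works best as a standing commitment to revisit the data product, not a one-time stamp.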
We like other aspects of this process as well. Almost from the moment of data capture, data users, especially subject matter experts and data stewards, are thinking about how the data will be used throughout the organization, and for what purposes. And the same review and certification process can be applied regardless of the immediate data source, the manipulation or reporting tool, or even the intended audience.
This certification helps address the issue of users not understanding the data, whether that's a general sense of the data or a specific instance of it in transit. A formal review tends to ensure that the final presentation is user-friendly and that key metrics are reported consistently. Providing these metrics to managers, along with a description of not only how each metric is calculated but also why it matters to the organization, can go a long way toward building data literacy, and toward making that literacy stick.
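One lightweight way to keep metric definitions consistent and visible is to publish the definition with the number. In the hypothetical sketch below, each metric travels with its calculation and a plain-language note on why it matters; the metric, formula, and values are invented for illustration.

```python
# Hypothetical metric registry: each metric carries its calculation
# and a plain-language note on why it matters to the organization.

METRICS = {
    "retention_rate": {
        "calculation": "students enrolled in consecutive fall terms / first-fall cohort",
        "why_it_matters": "A leading indicator of student success and future revenue.",
        "compute": lambda retained, cohort: retained / cohort,
    },
}

m = METRICS["retention_rate"]
print(f"Retention rate: {m['compute'](850, 1000):.1%}")  # same definition for everyone
print(f"How: {m['calculation']}")
print(f"Why: {m['why_it_matters']}")
```

However it's implemented, the idea is that the calculation and the rationale are never separated from the number they explain.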
This process of certification can be part of your formal data governance framework, and our Data Cookbook can enable it, from defining terminology to documenting report specifications to enumerating quality and business rules that affect how you present data and with whom you share it.
Data has to be used for it to provide value, and it's more likely to be used when it's trusted. No matter what your role or your level of responsibility, you can always ask what you're doing to foster trust in data, and what you can do to knock down barriers to that trust.
Additional data governance and data intelligence resources can be accessed from here.
IData has a solution, the Data Cookbook, that can aid employees and organizations in their data governance and data intelligence efforts. IData also has experts who can assist with data governance, reporting, integration, and other technology services on an as-needed basis. Feel free to contact us and let us know how we can assist.
(Image Credit: StockSnap)