We often point out, in webinars and elsewhere, that if enough people in your organization perceive that you have a data quality problem, then what you have is a data quality problem. Perception is reality.
There are some accepted dimensions for measuring data quality, including accuracy, consistency, completeness, and validity, and you may even have metrics that show good numbers on those dimensions. But another aspect of data quality is timeliness, and that's often what trips up our clients. Either it takes too long to scrub or verify data in its journey up and down the Business Intelligence (BI) stack, or it takes too long to answer questions like "why does this data look funny?" or "are we sure we don't have more (or fewer) of the thing we're counting?"
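To make those dimensions a little more concrete, here is a minimal sketch of how completeness, validity, and timeliness might be scored for a small set of records; the column names, email format rule, and 24-hour timeliness threshold are illustrative assumptions on our part, not a standard.

```python
# Illustrative sketch: scoring three data quality dimensions.
# The field names ("email", "updated_at", "loaded_at") and the 24-hour
# timeliness threshold are assumptions for this example, not a standard.
import re
from datetime import datetime, timedelta

records = [
    {"email": "ana@example.edu", "updated_at": datetime(2024, 5, 1), "loaded_at": datetime(2024, 5, 1, 6)},
    {"email": None,              "updated_at": datetime(2024, 5, 1), "loaded_at": datetime(2024, 5, 3)},
    {"email": "not-an-email",    "updated_at": datetime(2024, 5, 2), "loaded_at": datetime(2024, 5, 2, 2)},
]

# Completeness: share of records where the field is populated at all.
completeness = sum(r["email"] is not None for r in records) / len(records)

# Validity: share of populated values that match the expected format.
populated = [r["email"] for r in records if r["email"] is not None]
validity = sum(bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", e)) for e in populated) / len(populated)

# Timeliness: share of records loaded within 24 hours of their source update.
on_time = sum(r["loaded_at"] - r["updated_at"] <= timedelta(hours=24) for r in records)
timeliness = on_time / len(records)

print(f"completeness={completeness:.0%} validity={validity:.0%} timeliness={timeliness:.0%}")
```

Note that a field can score well on completeness while scoring poorly on validity, which is one reason a single "quality" number rarely tells the whole story.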
Your data quality problem might not even be timeliness; it might just be that too many people have a low opinion of the quality of organizational data. It could be that the root cause of your data quality problem is actually low data literacy, poor documentation and understanding of data assets, outdated BI and analysis tools and products, or some combination of these and other issues. Frankly, what matters is this: if employees don't trust the data they have access to, they won't use it in meaningful ways.
Any organization that traffics in data wants high-quality data, but many think they lack the resources to ensure the desired level of data quality.
Deploying those resources, whatever they are, can be a challenge. Think about common points of engagement for data quality: they might include data on a dashboard that doesn't look right, or a user looking at a record in an application where data is missing or incorrect, or perhaps errors generated via an integration or ETL script.
What happens when a user encounters an example of what looks like low-quality data? You have limited resources available to address any kind of data issue, including data quality problems. So how do you deploy the resources you do have most effectively?
The way we've seen it happen all too often is probably the least effective way. You know where we're headed with this: the squeaky wheel gets the grease, right? The offices or users who complain the loudest get the attention, even if they're mistaken, even if the issue they report is a low priority, even if resolving a real issue would not provide much return on investment.
It may be that you encounter a bit of the boy-who-cried-wolf phenomenon with your repeat offenders. But, if nothing else, it's worth investing in some process for people to report data quality issues, and for that process to be somewhat interactive. If reported bad data never gets fixed, people will stop reporting it, and for our money that's a worse outcome than chasing reports of bad data that turn out not to be all that bad.
Some readers, notably the ones who spend unrewarding hours investigating phantom issues, might disagree. We can hear some of you suggesting that when your users report data quality issues they're really reporting data understanding issues, and that making it easier for them to report problems that don't exist just makes more work for other people.
We hear that concern, although we suspect that there may not be a good way to discourage persistent users. Most users are familiar, at least these days, with a ticketing system for reporting issues to a help desk, or requesting data, or seeking access to additional data sources, etc. One way to engage users more deeply with data quality would be to encourage them to use your existing ticketing system as a way for them to report potential issues, and to track the progress of the investigation and resolution of those issues.
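To ground that suggestion, here is a hypothetical sketch of the information a data quality ticket might capture; the field names and status values are our invention for illustration, not any particular ticketing product's schema.

```python
# Hypothetical sketch of the fields a data quality ticket might capture
# when routed through an existing help desk system. Field names and
# status values are illustrative, not a real ticketing product's API.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DataQualityTicket:
    reporter: str                  # who noticed the issue
    data_asset: str                # dashboard, report, table, or record involved
    description: str               # what "looks funny," in the user's own words
    expected: str                  # what the user thought they should see
    status: str = "reported"       # reported -> investigating -> resolved / no issue found
    opened_at: datetime = field(default_factory=datetime.now)
    resolution_notes: str = ""     # root cause, fix, or explanation given back to the reporter

ticket = DataQualityTicket(
    reporter="registrar_office",
    data_asset="enrollment_dashboard",
    description="Fall headcount looks about 200 students low",
    expected="Headcount should match the census snapshot",
)
```

Capturing what the user expected to see, alongside what they actually saw, is often what separates a genuine data quality issue from a data understanding issue.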
Of course, human-reported data quality problems or issues generally have one key drawback: they're created by humans. Is the data on that dashboard *really* incorrect, or does the person looking at it not know much about the underlying data (or how dashboard filters/slicers work)? Users who find data issues in the course of their transactional work are probably more reliable reporters, but this method of identifying data quality issues isn't systematic, and it's likely to be fairly random.
We mentioned points of engagement earlier, and we often talk about our Data Cookbook solution and data governance in general as a "help desk for your data." The best help desks meet users where they are: sometimes automated instructions or chatbots get the job done, sometimes the request is straightforward and so is the response. But much of the time the person who receives the ticket needs to follow up with additional questions, maybe even a real conversation. We think that's the same approach to take to data quality issues.
We've described a sample resolution process in this space before, but its key aspects involve learning what the barrier is, how serious it is and what its scope is, and what the root causes are. Too often we look either to prove the data is fine or to find a quick way to clean up the bad records.
Users who report problems with data are showing us that they are invested in using data. We want to support and enable that use. And, yes, sometimes all we need is a quick script to alter some offending records (and don't forget to document this work as part of a data quality rule so this issue is easier to avoid or to remediate in the future!). But much of the time these reported issues are emblematic of additional data curation needs. Business terminology needs to be defined, systems of record need to be identified, appropriate sources and uses of data need further elaboration, etc.
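As a rough illustration of that last point, here is a sketch of a one-off fix paired with a reusable, documented rule; the table, column, and validation pattern are invented for the example.

```python
# Sketch: a one-off fix for offending records, paired with a reusable
# data quality rule so the issue can be detected again later. The table
# and column names here are invented for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO contacts (state) VALUES (?)",
    [("MI",), ("mi",), ("Michigan",), ("OH",)],
)

# The quick script: normalize the offending records once.
conn.execute("UPDATE contacts SET state = 'MI' WHERE state IN ('mi', 'Michigan')")

# The documented rule: a check that can be rerun (or scheduled) so the
# same issue is caught early next time instead of rediscovered by a user.
RULE = {
    "name": "contacts.state must be a 2-letter uppercase code",
    "sql": "SELECT COUNT(*) FROM contacts WHERE state NOT GLOB '[A-Z][A-Z]'",
}
violations = conn.execute(RULE["sql"]).fetchone()[0]
print(f"{RULE['name']}: {violations} violation(s)")
```

The fix takes minutes; the documented rule is what keeps the same issue from being rediscovered by a frustrated user six months later.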
A lot of those responsibilities fall on data stewards. But aren't people who come to you with questions about how to understand data, or suspicions that there may be something erroneous or incomplete about data, practicing data stewardship as well? Aren't they volunteering some of their expertise, or at least admitting they are willing to learn more about your organization's data? And wouldn't it be data malpractice not to take them up on their offer?
We hope you found this blog post beneficial. To access other resources (blog posts, videos, and recorded webinars) about data quality, feel free to check out our data quality spotlight page.
IData has a solution, the Data Cookbook, that can aid employees and organizations in their data governance and data intelligence efforts, including data quality. IData also has experts who can assist with data governance, reporting, integration, and other technology services on an as-needed basis. Feel free to contact us and let us know how we can assist.