Those situations can reflect poor data quality. Timeliness is certainly a key dimension of data quality, reports whose elements sum to more than 100% of a published total may reflect duplicate entries or coding errors in a source database, and some users do know their data inside and out. In our observation, though, they tend to be more indicative of poor process and weak communication, which points to a breakdown in, or a fundamental lack of, data governance, data intelligence, and/or data management.
How is bad data identified in the first place? Sure, it can happen because a keen-eyed person looks at an individual record and notices an inaccuracy (that person's name is misspelled, or they no longer live at the address recorded for them), an invalidity (the zip code for Chicago is not 34543!), or an inconsistency (these people are category A in this system but A4 in this other system), but frankly those instances are rare.
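The record-level checks a keen-eyed person does by hand can be sketched in code. This is a minimal Python illustration, not a production rule set: the function names, the lookup table, and the simplification that Chicago zip codes start with 606 are all assumptions made for the example.

```python
# Hypothetical record-level checks; names and rules are illustrative assumptions.

# Simplifying assumption: zip codes for Chicago begin with "606".
CITY_ZIP_PREFIXES = {"Chicago": ("606",)}


def zip_is_valid_for_city(city, zip_code):
    """Validity: the zip code must be 5 digits and match a known prefix for the city."""
    if len(zip_code) != 5 or not zip_code.isdigit():
        return False
    prefixes = CITY_ZIP_PREFIXES.get(city)
    if prefixes is None:
        return True  # no rule recorded for this city, so nothing to contradict
    return zip_code.startswith(prefixes)


def find_category_mismatches(system_a, system_b):
    """Consistency: return the ids whose category differs between two systems."""
    shared_ids = system_a.keys() & system_b.keys()
    return sorted(i for i in shared_ids if system_a[i] != system_b[i])
```

For instance, `zip_is_valid_for_city("Chicago", "34543")` would be rejected, and a person recorded as category A in one system and A4 in another would appear in the mismatch list. Checks like these only catch what someone has already thought to write a rule for.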
Somewhat more common may be the situation where a person knows something about organizational data, maybe because they have a vested interest. Perhaps a person recognizes that the list of all the sales closed last quarter is incomplete. Or I know that this patient was discharged two days ago, so why am I being prompted for a status update when I log into my clinical portal? Or we're trying to close the books on a fiscal period and some invoices that should have been paid weeks ago are still marked as open.
People most commonly grow skeptical about their organization's data quality when they look at it in the aggregate. Numbers change without explanation or warning, possibly in hard-to-believe ways: last week we had 780 applications complete, and this week we only have 770! Figures don't square with expectations: our sales forecast for this quarter is $9 million, but our pipeline only shows $4 million! Or perhaps the most common: I asked for this information last week, but by the time I received it, it was too late to act on.
So this user perception of low-quality data tends to show up in formats that don't lend themselves to quick quality checks, and potential problems tend to be noticed by people who may not otherwise work directly with organizational data at the record level.
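Some of these aggregate surprises can at least be flagged automatically before a person stumbles on them. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that a figure such as "applications complete" is cumulative and so should never go down from one week to the next.

```python
# Hypothetical aggregate-level checks; names and assumptions are illustrative.

def find_unexplained_drops(weekly_counts):
    """Return (previous, current) pairs where a cumulative count went down."""
    return [
        (prev, cur)
        for prev, cur in zip(weekly_counts, weekly_counts[1:])
        if cur < prev
    ]


def parts_exceed_total(parts, published_total):
    """True when the component figures add up to more than the published total."""
    return sum(parts) > published_total
```

Given weekly counts of 780 and then 770, `find_unexplained_drops` would return that pair for follow-up. A flag like this says only that something needs explaining, not whether the cause is bad data, a changed definition, or a process that was never communicated.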
In fact, perceived data quality issues may represent many different problems, and they can develop at many places in the data management lifecycle.
There is good news. If the situations described above can be recognized, steps can be taken to address them. An even bigger issue when it comes to data quality, however, may be that in many cases we don't really know whether our data is high-quality, low-quality, or somewhere in between. And why don't we know?
As a data consumer, I have to evaluate data quality based on limited access to data, often using a heuristic like "is this count of x accurate?" But I typically have very minimal firsthand knowledge of x, I have almost no visibility into the way the count is produced, and my standards for accuracy might be based entirely on unfounded or untested assumptions about x, or the way it's counted, or previous counts of x that have been provided.
Challenges when considering data quality can be grouped into the following three categories.
Our response is: data quality management (DQM) is a data governance initiative. Data governance, when it's successful, is characterized by openness and transparency, collaboration and communication, and by iterative improvements to process and practice. So for data quality to get better, and for data to be trusted, DQM programs will need to leverage the knowledge of people who manage and maintain data: data stewards, subject matter experts, data and business analysts. And DQM efforts will benefit from adopting or mirroring the data governance framework they are part of.
We could write about data quality until the proverbial cows come home, and over the next few months we just might! But in the meantime, here's hoping this post has broadened your horizons as you think about data quality challenges and opportunities in your organization. And if you're interested in some practical steps you can take, we recommend you watch this presentation from our founder and CEO, Brian Parish.
See our Data Governance Resources page for additional resources. Feel free to check out our other data quality resources in this blog post.
IData has a solution, the Data Cookbook, that can aid employees and the institution in their data governance and data quality initiatives. IData also has experts who can assist with data governance, reporting, integration, and other technology services on an as-needed basis. Feel free to contact us and let us know how we can assist.
(image credit: StockSnap_3HGXPSXH2B_manthinking_reframingdataqualityproblems #1021)