The most proactive component of a data quality program is measuring, or assessing, data against quality rules. Routine assessment can uncover hidden gaps or inaccuracies before a user discovers them. Most assessment tasks can be automated and scheduled.
There are 4 steps when assessing data:
- Set a threshold for each quality rule
- Write a query against the data that supports the rule, and run the query
- Communicate detailed assessment outcomes to the data steward or other responsible party
- Communicate broad assessment outcomes to all data users
In Step 2, a technical person writes a query that tests the data against the rule and its threshold, and either schedules the query to run automatically or runs it manually.
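As an illustration of Steps 1 and 2, here is a minimal sketch in Python using the standard-library `sqlite3` module. Everything specific in it is an assumption made for the example: the `students` table, the "email is present" rule, the 5 percent threshold, and the `assess_rule` helper are hypothetical, and a real assessment would run against your own database with your own rules.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class RuleResult:
    rule_name: str
    records_assessed: int
    records_failed: int
    max_failure_pct: float  # Step 1: the threshold set for this rule

    @property
    def failure_pct(self) -> float:
        # Avoid dividing by zero when the table is empty.
        if self.records_assessed == 0:
            return 0.0
        return 100.0 * self.records_failed / self.records_assessed

    @property
    def meets_threshold(self) -> bool:
        return self.failure_pct <= self.max_failure_pct


def assess_rule(conn, rule_name, table, failure_condition, max_failure_pct):
    """Step 2: count all rows and the rows that violate the rule."""
    # `table` and `failure_condition` are SQL fragments written by the
    # analyst, not user input, so they are interpolated directly here.
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    failed = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {failure_condition}"
    ).fetchone()[0]
    return RuleResult(rule_name, total, failed, max_failure_pct)


# Hypothetical sample data: four students, two with no usable email.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO students (email) VALUES (?)",
    [("a@example.edu",), ("b@example.edu",), (None,), ("",)],
)

result = assess_rule(
    conn,
    rule_name="Student email is present",
    table="students",
    failure_condition="email IS NULL OR TRIM(email) = ''",
    max_failure_pct=5.0,
)
print(result.records_assessed, result.records_failed,
      result.failure_pct, result.meets_threshold)
```

A script like this could be run manually or handed to whatever scheduler the organization already uses.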
In Step 3, a data steward or data manager in the business unit should receive detailed results, whether positive or negative. This communication should be made routine, using standard email subject lines and formats so that the data steward can review results quickly. Results include the number of records assessed and the number or percentage that failed. It may also be convenient to provide the actual records that failed (sometimes referred to as “rows”). If so, the data steward should be given a secure place to review the failed data.
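One way to standardize the Step 3 message is to generate it from a template. The sketch below uses Python's standard `email.message.EmailMessage`. The subject-line format, the addresses, and the review location are all hypothetical placeholders, and the message is only built here, not sent.

```python
from email.message import EmailMessage


def build_assessment_email(rule_name, assessed, failed, max_failure_pct,
                           steward_address, sender_address, review_location):
    """Build a consistently formatted result message for the data steward."""
    failure_pct = 100.0 * failed / assessed if assessed else 0.0
    status = "PASS" if failure_pct <= max_failure_pct else "FAIL"

    msg = EmailMessage()
    # A fixed subject pattern lets the steward filter and scan results.
    msg["Subject"] = f"[DQ Assessment] {status}: {rule_name}"
    msg["From"] = sender_address
    msg["To"] = steward_address
    msg.set_content(
        f"Rule: {rule_name}\n"
        f"Records assessed: {assessed}\n"
        f"Records failed: {failed} ({failure_pct:.1f}%)\n"
        f"Threshold (max failure): {max_failure_pct:.1f}%\n"
        f"Result: {status}\n"
        # Link to a secured location instead of attaching failed rows.
        f"Failed records can be reviewed at: {review_location}\n"
    )
    return msg


# Hypothetical values for illustration only.
msg = build_assessment_email(
    rule_name="Student email is present",
    assessed=4,
    failed=2,
    max_failure_pct=5.0,
    steward_address="steward@example.edu",
    sender_address="dq-assessment@example.edu",
    review_location="https://reports.example.edu/dq/student-email",
)
print(msg["Subject"])
print(msg.get_content())
```

Pointing to a secured location rather than attaching the failed rows keeps sensitive data out of email, which is one possible way to meet the security expectation described above.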
Step 4 entails communicating broad assessment outcomes to all data users. A simple notation of Low, Medium, or High quality should be published near a data collection’s name wherever data collections are listed or accessed. The rankings of Low, Medium, and High would be locally defined. Low might signify a significant failure to meet a threshold, Medium that the data meets the threshold most of the time, and High that it never fails the threshold. Data users may opt to use low-quality data if that is the only data collected within a domain. The point is to give data consumers a simple signal that lets them assess the risk of using the data in the strategic decisions they make.
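Because the rankings are locally defined, any code for them is only one possible interpretation. The sketch below assumes a history of pass/fail outcomes from past assessment runs and a hypothetical cutoff of 80 percent for "most of the time". The `rate_quality` function and its default are illustrative, not a standard.

```python
from typing import Sequence


def rate_quality(history: Sequence[bool],
                 medium_min_pass_rate: float = 0.8) -> str:
    """Map past assessment outcomes (True = met threshold) to a rating."""
    if not history:
        raise ValueError("No assessment runs to rate")

    pass_rate = sum(history) / len(history)
    if pass_rate == 1.0:
        return "High"    # never failed the threshold
    if pass_rate >= medium_min_pass_rate:
        return "Medium"  # met the threshold most of the time
    return "Low"         # significant failure to meet the threshold


print(rate_quality([True, True, True, True, True]))
print(rate_quality([True, True, True, True, False]))
print(rate_quality([True, False, False]))
```

The resulting label is what would be published next to the data collection's name. The cutoff can be tuned to match however the organization defines Medium.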
It is important to routinely communicate general information and expectations about quality assessment programs. This signals that an organization cares about data quality and uses a proactive process to monitor quality against quality rules. Data users will understand that the process for better quality data starts with a data quality rule and progresses to assessment. The path to participation in the data quality program begins with quality rules.
Occasionally the assessment process uncovers a surprise in the quality of key data attributes. Data that was formerly thought to be accurate is discovered to be inaccurate. To pre-empt embarrassment of data stewards or consumers, include a sentence or two in the description of the assessment program that reaffirms a commitment to quality over situational embarrassment.
This post covered how to create assessments and communicate the results. A subsequent post will discuss a process to cleanse data that fails to meet thresholds. We hope you found this post useful. Feel free to check out our other data quality resources in our data quality resources blog post.
If you need help implementing data governance or data intelligence, remember that IData provides data governance services. A data governance solution like the Data Cookbook can help in the successful implementation of data governance at a higher education institution or organization, thus improving data quality.
(image credit StockSnap_DBRAYDPLP1_checklist1 #1074)