In an earlier post we covered components of a data quality program and data quality rules. Today we want to discuss data quality assessment or measurement.
The most proactive component of a data quality program is to measure or assess data against quality rules. Routine assessment can uncover hidden gaps or inaccuracies before a user discovers them. Most assessment tasks can be automated and scheduled.
There are 4 steps when assessing data:
- Set a threshold for each quality rule
- Write a query against the data that supports the rule, and run the query
- Communicate detailed assessment outcomes to the data steward or other responsible party
- Communicate broad assessment outcomes to all data users
In Step 1, a data steward or other business data manager sets a threshold. A threshold defines the maximum number of records permitted to fail a quality rule, expressed as a percentage or an absolute count. For example, a data steward for Advancement data defines the quality rule “Graduate forwarding address must be no older than 90 days” and sets the threshold at 5%. This means that if there are 2,000 graduates, the threshold allows up to 100 addresses to fail the rule (i.e. be older than 90 days). If the number of old addresses surpasses 5%, the threshold is broken and the data set has reached an unacceptable level of quality. Thresholds typically allow some data to fail the rule: business processes can often perform well with some degree of imperfect data, and it is not cost-effective to maintain perfect data in all areas. Of course, some thresholds will be 0%, usually for quality rules that support government compliance. A data steward or data manager is the best person to set the threshold because they can best assess the risk posed by poor-quality data.
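To illustrate the arithmetic, here is a minimal Python sketch of a threshold check; the function name and figures are ours for illustration, not a prescribed implementation:

```python
# Minimal sketch of a threshold check (illustrative names and figures only).

def breaches_threshold(total_records: int, failed_records: int, threshold_pct: float) -> bool:
    """Return True if the share of failing records exceeds the allowed threshold."""
    failure_rate = failed_records / total_records
    return failure_rate > threshold_pct

# Example from the text: 2,000 graduates with a 5% threshold allows up to 100 stale addresses.
print(breaches_threshold(total_records=2000, failed_records=120, threshold_pct=0.05))  # True: 6% > 5%
print(breaches_threshold(total_records=2000, failed_records=80, threshold_pct=0.05))   # False: 4% <= 5%
```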
In Step 2, a technical person writes a query that implements the quality rule and compares the results against the threshold, then either schedules the query to run automatically or runs it manually.
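As a sketch of what such a query might look like for the example rule, the snippet below assumes a hypothetical graduate_addresses table with a last_verified_date column; the real table names, columns, and database platform will differ at each institution:

```python
# Sketch of an assessment query for the example rule, using a hypothetical
# graduate_addresses table with a last_verified_date column.
import sqlite3  # stand-in for whatever database platform the institution actually uses

ASSESSMENT_SQL = """
SELECT
    COUNT(*) AS total_records,
    SUM(CASE WHEN last_verified_date < DATE('now', '-90 days') THEN 1 ELSE 0 END) AS failed_records
FROM graduate_addresses;
"""

def run_assessment(connection: sqlite3.Connection, threshold_pct: float = 0.05) -> dict:
    """Run the rule query and report whether the failure rate breaches the threshold."""
    total, failed = connection.execute(ASSESSMENT_SQL).fetchone()
    failure_rate = (failed or 0) / total if total else 0.0
    return {
        "total_records": total,
        "failed_records": failed or 0,
        "failure_rate": failure_rate,
        "threshold_breached": failure_rate > threshold_pct,
    }
```

A query like this can be scheduled with whatever job scheduler the institution already uses, or run manually on demand.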
In Step 3, a data steward or data manager in the business unit should receive detailed results, whether positive or negative. This communication process should be routinized using standard email subject lines and formats so that the data steward can quickly review results. Results include the number of records assessed and the number or percentage that failed to meet the threshold. Where practical, the actual records that failed (sometimes referred to as “rows”) may also be provided. Security measures should provide a secure place for the data steward to review the failed data.
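For illustration, a routinized notification might be assembled along these lines, building on the result dictionary from the earlier sketch; the subject-line format and wording are assumptions, not a prescribed standard:

```python
# Sketch of a standardized results notification (subject-line format and
# wording are illustrative, not a prescribed standard).
from email.message import EmailMessage

def build_results_email(rule_name: str, result: dict, steward_address: str) -> EmailMessage:
    """Compose a routine assessment email the data steward can scan quickly."""
    status = "THRESHOLD BREACHED" if result["threshold_breached"] else "WITHIN THRESHOLD"
    msg = EmailMessage()
    msg["To"] = steward_address
    msg["Subject"] = f"[DQ Assessment] {rule_name}: {status}"
    msg.set_content(
        f"Records assessed: {result['total_records']}\n"
        f"Records failing rule: {result['failed_records']} ({result['failure_rate']:.1%})\n"
        "Failed rows are available in the secure review location, not attached here."
    )
    return msg
```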
Step 4 entails communicating broad assessment outcomes to all data users. A simple notation of Low, Medium, or High quality should be published near a data collection’s name wherever data collections are listed or accessed. The rankings of Low, Medium, and High would be locally defined: Low might signify a significant failure to meet a threshold, Medium might mean the collection meets its thresholds most of the time, and High might mean it never fails them. Data users may opt to use low quality data if that is the only data collected within a domain. The point is to provide a simple communication to data consumers that allows them to assess the risk of using the data for the data-informed strategic decisions they make.
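As one possible way to operationalize a locally defined ranking, the sketch below maps a history of threshold checks to a published label; the cutoffs are illustrative assumptions that each organization would set for itself:

```python
# One possible local definition of the Low/Medium/High labels (cutoffs are assumptions).

def quality_label(assessment_results: list[bool]) -> str:
    """Map a history of threshold checks (True = threshold breached) to a published label."""
    if not assessment_results:
        return "Unassessed"
    breach_rate = sum(assessment_results) / len(assessment_results)
    if breach_rate == 0:
        return "High"      # never fails the threshold
    if breach_rate <= 0.25:
        return "Medium"    # meets the threshold most of the time
    return "Low"           # fails the threshold frequently

# Example: four quarterly assessments with one breach -> Medium
print(quality_label([False, True, False, False]))
```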
It is important to routinely communicate general information and expectations about quality assessment programs. This signals that an organization cares about data quality and uses a proactive process to monitor quality against quality rules. Data users will understand that the process for better quality data starts with a data quality rule and progresses to assessment. The path to participation in the data quality program begins with quality rules.
Occasionally the assessment process uncovers a surprise in the quality of key data attributes. Data that was formerly thought to be accurate is discovered to be inaccurate. To pre-empt embarrassment of data stewards or consumers, include a sentence or two in the description of the assessment program that reaffirms a commitment to quality over situational embarrassment.
This post covered how to create assessments and communicate the results. A subsequent post will discuss a process to cleanse data that fails to meet thresholds. We hope you found this post useful. Feel free to check out our other data quality resources in our data quality resources blog post.
If you need help in implementing data governance or data intelligence, remember that IData provides data governance services. A data governance solution like the Data Cookbook can help in successful implementation of data governance at a higher education institution or organization, thus improving data quality. Feel free to contact us.