Curating Data About Data

A key part of data governance and data intelligence is an easily accessible knowledge base, and a key part of that knowledge base is a data processing (report) catalog. A key piece of the information in the data processing catalog, in turn, is curated data. In a previous blog post we discussed changing your report catalog into a data processing catalog, which includes more than just reports (extracts, for example). In another blog post we covered the uses of a data processing catalog. This blog post covers curation in a data processing catalog.

A main use for a data processing catalog is curation, which includes:

  • additional context
  • clarity
  • transparency
  • policy attributes used for enforcing data policies
  • usage
  • other custom attributes

Some of the basic information about your organization's reports and extracts, such as name and description, can be imported, which is helpful from a search standpoint. We highly recommend building an inventory that brings in all the raw, simple information and then curating the remaining critical pieces. Curation adds information that makes the data processing catalog more beneficial to the organization, and curated data about data requires human intervention.

This additional curated information can include:

  • purpose of the report
  • owner
  • definitions or glossary entries mentioned on the report
  • policy attributes
  • report security classification
  • requirement for data sharing agreement
  • data retention policy
  • PII information included on report
  • any additional information that would be beneficial

You decide what curated information you want to add to your data processing catalog. Determine what will add the most value. You want to have information in your data processing catalog so that if someone is looking at a report, they can understand how it works and what it does. Make sure you have a link within the report so this curated content can be viewed and used.
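As a rough illustration of the split between imported and curated information, a catalog entry could be sketched as follows. This is only a minimal sketch: the `CatalogEntry` class, its field names, and the `missing_curation` helper are hypothetical and are not taken from the Data Cookbook or any other product. It assumes that name and description are imported and that everything else is filled in by a person.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# Assumption: these two fields are imported automatically from the reporting tool.
IMPORTED_FIELDS = {"name", "description"}


@dataclass
class CatalogEntry:
    """Hypothetical data processing catalog entry for a report or extract."""
    # Imported information
    name: str
    description: str = ""
    # Curated information; None means "not curated yet"
    purpose: Optional[str] = None
    owner: Optional[str] = None
    glossary_terms: Optional[List[str]] = None
    security_classification: Optional[str] = None
    requires_data_sharing_agreement: Optional[bool] = None
    retention_policy: Optional[str] = None
    pii_fields: Optional[List[str]] = None  # an empty list means "no PII on the report"


def missing_curation(entry: CatalogEntry) -> List[str]:
    """Return the names of curated fields that nobody has filled in yet."""
    return [
        f.name
        for f in fields(entry)
        if f.name not in IMPORTED_FIELDS and getattr(entry, f.name) is None
    ]


# Usage: an entry that has only been imported still needs every curated field.
entry = CatalogEntry(name="Customer Satisfaction List",
                     description="All customers with their satisfaction rating")
entry.owner = "Customer Success data steward"
print(missing_curation(entry))
```

A list like the one returned by `missing_curation` could help decide which mission-critical reports still need a curator's attention.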

When do you want to curate? Focus curation on the mission-critical processes. Your organization may have hundreds or thousands of reports; you do not need to document all of them, and you do not need to curate them all right away. It is still important to document as many as you can and do your part as a curator. The best time to curate is when you are creating new reports (extracts) or changing existing ones. Also curate data about data when there are questions about the data. When someone says, "I do not understand this," that is the perfect time to do curation. Let it happen naturally. You are just capturing work that is already being done, in a central place where it can be used in the future. And when you build specifications for reports and extracts, you often end up spawning additional data governance-related content. For example, when you are creating an ETL process you are documenting data lineage, discovering and resolving data quality problems, and sometimes finding out that a data system is missing from your inventory.

It is important to have a data request process in place so that you have a point of entry for these types of requests (reports, extracts). This is the concept of just-in-time data governance or data intelligence. The process defines who each request gets routed to (data stewards and subject matter experts), so that you can handle the curation and get the result back to the requestor as a matter of customer service. You want your subject matter experts handling the curation of the requested report.

For example, suppose you get a data request for a report that lists all customers with their satisfaction ratings. Answer these questions:

  • What does that look like?
  • How do we build it?
  • Where do we build it?
  • Who signs off on it?
  • What are the definitions of the items on it?
  • What does it mean to the customer or requestor?

These answers should be captured for later use. And the answers might generate new requests such as the need for new business glossary data definitions. If you just created the report, you may have nothing more than some SQL code underneath it.

To assist with your data processing catalog and your curation, you want to have specification templates, which are forms with preset questions that people can fill in for different types of requests. Templates can cover reports built in a specific reporting tool, ETL processes, integrations, or an API. They can live in a Word document, a ticketing system or a data governance solution like the Data Cookbook. The point is to capture this information as simply as possible, so that the answers to the series of questions become the documentation.
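The idea that answered template questions become the documentation can be sketched in a few lines. This is a hedged illustration only: the `REPORT_TEMPLATE` list, the `build_documentation` and `unanswered` functions, and the sample answers are hypothetical, and a real template would live in whatever tool your organization already uses.

```python
from typing import Dict, List

# Assumption: a report template is simply an ordered list of preset questions.
REPORT_TEMPLATE: List[str] = [
    "What does the report look like?",
    "How do we build it?",
    "Where do we build it?",
    "Who signs off on it?",
    "What are the definitions of the items on it?",
    "What does it mean to the requestor?",
]


def unanswered(questions: List[str], answers: Dict[str, str]) -> List[str]:
    """Return the preset questions that still have no (non-blank) answer."""
    return [q for q in questions if not answers.get(q, "").strip()]


def build_documentation(title: str, questions: List[str],
                        answers: Dict[str, str]) -> str:
    """Turn the answered template into plain-text documentation."""
    lines = [title]
    for q in questions:
        a = answers.get(q, "").strip() or "(unanswered)"
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    return "\n".join(lines)


# Usage with made-up answers for the customer satisfaction request.
answers = {
    "Who signs off on it?": "The data steward for customer data",
    "Where do we build it?": "In the reporting tool",
}
print(build_documentation("Customer Satisfaction List", REPORT_TEMPLATE, answers))
print(unanswered(REPORT_TEMPLATE, answers))
```

Keeping the questions as plain data means a new request type (an ETL process or an API, for instance) only needs a new list of questions, not new code.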

We hope that you found this blog post helpful in getting more use and benefit from your data processing catalog, especially the curated pieces. Additional Data Processing (Report) Catalog Resources can be found in this blog post.

IData has a solution, the Data Cookbook, that can aid the employees and the organization in its data governance, data intelligence, data stewardship and data quality initiatives. IData also has experts that can assist with data governance, reporting, integration and other technology services on an as needed basis. Feel free to contact us and let us know how we can assist.

Photo Credit: StockSnap_CXXONEIULG_WomanWatering_CurateData_BP #B1184

Jim Walery
About the Author

Jim Walery is a marketing professional who has been providing marketing services to technology companies for over 20 years, and specifically to those in higher education since 2010. Jim assists in getting the word out about the community via a variety of channels. Jim is knowledgeable in social media, blogging, collateral creation and website content. He is Inbound Marketing certified by HubSpot. Jim holds a B.A. from the University of California, Irvine and an M.A. from Webster University. Jim can be reached at jwalery[at]
