Many organizations have documented their reports (or at least their most critical ones), ideally in an accessible knowledge base. This is often referred to as a report catalog or a data deliverable catalog. We suggest expanding this information to cover other data processing activities as well. The resulting catalog, which we call a data processing catalog, is a critical piece of data governance and data intelligence. It is more than an inventory of your reports; remember that the goal is to help people. In this blog post we cover what should be in a data processing catalog and what practices keep it useful.
What is, or should be, in your data processing catalog?
- Reports – You probably have some or all of your columnar reports documented. Start with the critical reports or the ones that are being asked for. Do not forget about graphical visualizations (charts), queries, and views. Document whether each report is curated or certified for use. Include in the document the information a person would need when they reach it from the report itself.
- ETL (Extract/Transform/Load) – In particular, document the ETL processes that load a data mart or data warehouse.
- Surveys – Make sure that your data processing catalog includes data extracts to flat files for external surveys or data collections as well as the survey data collection for internal survey analysis.
- Data Integration Processes – Integrations are usually created and then forgotten. Then a change occurs in a database, and updating the integration to match becomes a major effort. It is critical that integrations be documented and placed in the data processing catalog. This includes inbound data integrations such as file import processes and inbound web service API imports. Do not forget to document outbound integrations as well, such as extract files sent to external batch integrations and pushes to external web service APIs.
- Collections – This is our term for anything that is a group of documents. A dashboard is usually a collection, as it is a group of visualizations. An aggregated report, like a fact book at a college, is also a collection. Document how the various components are gathered and put together, and do not forget to link to the documents for those components.
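To make the item types above concrete, here is a minimal sketch of how a catalog entry might be modeled. It is an illustration only, not how any particular product stores its catalog: the names `ActivityType`, `CatalogEntry`, and `missing_components`, and all of the fields, are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class ActivityType(Enum):
    """The kinds of items listed above (hypothetical naming)."""
    REPORT = "report"
    ETL = "etl"
    SURVEY = "survey"
    INTEGRATION = "integration"
    COLLECTION = "collection"


@dataclass
class CatalogEntry:
    """One documented data processing item (all fields are assumptions)."""
    name: str
    activity_type: ActivityType
    description: str = ""
    certified: bool = False            # curated or certified for use?
    direction: Optional[str] = None    # "inbound" or "outbound", integrations only
    component_names: List[str] = field(default_factory=list)  # collections only


def missing_components(entry: CatalogEntry, catalog: Dict[str, CatalogEntry]) -> List[str]:
    """Return the components a collection links to that are not yet documented."""
    return [name for name in entry.component_names if name not in catalog]


# Example: a dashboard (a collection) that links to two visualizations,
# only one of which has been documented so far.
catalog = {
    "Enrollment by Term": CatalogEntry(
        "Enrollment by Term", ActivityType.REPORT, certified=True
    ),
}
dashboard = CatalogEntry(
    "Admissions Dashboard",
    ActivityType.COLLECTION,
    component_names=["Enrollment by Term", "Applications by Region"],
)
undocumented = missing_components(dashboard, catalog)
```

A check like `missing_components` is one way to find collection components that still need their own document before the collection can link to them.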
What are some of the shoulds regarding a data processing catalog?
- You should have a data request process in place. Make this process of asking for a new report or integration or dashboard easy to do. This process will increase trust in your data and improve your organization’s information. This process feeds what should be in the data processing catalog.
- You should have different templates for different types of data requests or activity types. These templates should be easy to use but thorough. And remember to include examples or style guides. You are striving for consistency and completeness of the documentation.
- You should have workflows and people in place for approvals on the items going into the data processing catalog. This includes involving data stewards and subject matter experts (functional, technical, and policy) who can approve and curate the information.
- Beyond the initial request, you should keep improving these documents (we also call them specifications) whenever data processing activities change or questions come up about them.
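The template and approval ideas above can be sketched as a simple completeness check. This is only an illustration under stated assumptions: the field names in `REQUIRED_FIELDS`, the role names in `REQUIRED_APPROVERS`, and the helper functions are all hypothetical, and your organization's templates and approval workflow will differ.

```python
from typing import Dict, Iterable, List

# Hypothetical template: the fields each type of data request must fill in.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "report": ["purpose", "audience", "data_sources", "filters"],
    "integration": ["purpose", "direction", "source_system", "target_system"],
    "dashboard": ["purpose", "audience", "components"],
}

# Hypothetical approver roles: a data steward plus functional, technical,
# and policy subject matter experts.
REQUIRED_APPROVERS = {"data_steward", "functional_sme", "technical_sme", "policy_sme"}


def missing_fields(request_type: str, submitted: Dict[str, str]) -> List[str]:
    """Return the template fields that are absent or blank in a request."""
    required = REQUIRED_FIELDS[request_type]
    return [f for f in required if not str(submitted.get(f, "")).strip()]


def pending_approvals(approved_by: Iterable[str]) -> List[str]:
    """Return the roles that have not yet approved, in alphabetical order."""
    return sorted(REQUIRED_APPROVERS - set(approved_by))


def ready_for_catalog(request_type: str, submitted: Dict[str, str],
                      approved_by: Iterable[str]) -> bool:
    """A request is ready once it is complete and every role has approved."""
    return not missing_fields(request_type, submitted) and not pending_approvals(approved_by)
```

For example, `missing_fields("report", {"purpose": "Track enrollment"})` lists the template fields the requester still needs to complete, which supports the consistency and completeness the templates are meant to provide.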
What are some things to remember regarding a data processing catalog?
A good thing to remember is that a data processing catalog that is used and kept up to date spawns and improves other data governance-related content, such as business glossary definitions, reference data lists, data lineage, and data quality rules. Link documents together when applicable and mention the relationship between them. Linking documents and sharing information with consumers makes these documents more beneficial to the organization. You want to help people. You want them to find information easily. You want to make it easy for them to submit data requests of all types.
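As a rough illustration of why linking matters, the sketch below keeps links between documents in both directions, so that a reader starting from a glossary definition can find every report or ETL process that uses it. The document identifiers and the functions `build_link_index` and `related_documents` are hypothetical and are not taken from any specific tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_link_index(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index each link in both directions so either document finds the other."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for source, target in links:
        index[source].add(target)
        index[target].add(source)
    return index


def related_documents(index: Dict[str, Set[str]], doc_id: str) -> List[str]:
    """Return the documents linked to doc_id, sorted for stable display."""
    return sorted(index.get(doc_id, set()))


# Hypothetical links between catalog entries and other governance content.
links = [
    ("report:enrollment", "glossary:full-time-student"),
    ("report:enrollment", "quality-rule:credit-hours"),
    ("etl:warehouse-load", "glossary:full-time-student"),
]
index = build_link_index(links)
users_of_definition = related_documents(index, "glossary:full-time-student")
```

Storing each link once but indexing it both ways means a question about a definition leads directly to the specifications that depend on it, and the reverse.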
In conclusion, expand your report catalog to include all types of data processing items. Keep adding to this data processing catalog with new requests and keep improving the documents in the catalog so that they can be used by others easily. The Data Cookbook solution by IData has a catalog so that this data processing information can be stored and easily found by users. Additional resources about a data processing or report catalog can be found here. We hope that this blog post was beneficial to you and your organization.
IData has a solution, the Data Cookbook, that can aid the employees and the organization in its data governance, data intelligence, data stewardship and data quality initiatives. IData also has experts that can assist with data governance, reporting, integration and other technology services on an as needed basis. Feel free to contact us and let us know how we can assist.