We recently described the concept and value of a data catalog in this space, and drew a somewhat arbitrary distinction between cataloging data assets (the data elements themselves, and the systems and applications in which they reside) and cataloging data products. One could certainly argue that the distinction is artificial.
We have seen many, many clients with a central BI tool whose artifacts - typically canned reports, but also logical and semantic layers - date back a decade or more. Documentation is virtually nonexistent, and what little exists is often both highly technical and highly inconsistent, since it reflects the efforts of committed individuals rather than the outcomes of a mature process. The sheer volume of data products is often overwhelming, and we think it's helpful to separate assets from products when considering the work ahead.
Users may have some knowledge of data sources, reporting data stores, and even groups of data elements. But the vast majority of users take it on faith that the data products they rely on are still current and usable, or make repeated copies of existing products to incorporate current fiscal periods or product lines, or go off into some satellite realm where a point-in-time data export becomes the data source for new work, even as it ages and grows more and more limited. New users inherit these brittle artifacts and are told simply to use them, and frequently they are in no position to speak to the quality or completeness of these products.
While that situation has continued largely unabated, new visualization tools and in-application analytics have been introduced. Managers want charts and dashboards that give them quick views of up-to-the-minute status, and much of the appeal of new niche data applications is not only that they solve particular problems but that they generate attractive, presentation-ready analytics. These products then begin to circulate in the reporting, BI, and analysis realm alongside legacy products.
These newer tools often support more useful descriptive metadata, explanatory annotation in their output, or both. But if an organization's standard has been to publish with minimal testing and documentation, that standard often carries over to new tools and new users.
One observation we made early on as a company, and one that has guided our work with clients and customers over the last two decades, is that it's not enough to provide access to data. Users need to see and understand data in context, and to make meaningful use of data they need to be confident that it's current, reliable, and consistent (over time and across data systems). Surveys of data usage, even recent ones, indicate that across organizations and industries, people lack confidence in their organization's data: they are suspicious of its quality, they are ignorant of its provenance and lineage, they hear the same terminology used to refer to disparate data elements and concepts, and they receive inconsistent results depending on when - and whom - they ask.
At a more basic level, many users don't know what data their organization collects, whether they have access to it once it's collected, and, of course, what they would do with it if they do. Over the years, we've had to work backwards to this realization, since a lot of the custom work we've provided, and a lot of the training and support we've delivered, hasn't really landed. At many of our clients, the people who come to us tend to be not only data literate, but fairly accomplished in the analysis and manipulation of data; unfortunately, that only goes so far in organizations where a culture of data hasn't really taken hold.
As we have mentioned before, a central inventory of all the data products out there is valuable. But given the number and age of those products, and the technical debt around making and maintaining them, there may not be much useful information readily available when you first populate that inventory. It seems to us, however, that this is one of the best opportunities for a data governance point of engagement.
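To make that concrete, here is a minimal sketch of the kind of record such an inventory might hold. The field names are our own illustration, not a prescribed schema from the Data Cookbook or any particular catalog tool; the point is how sparse a first-pass entry usually is, and how each later engagement can fill in a field.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical inventory record for a single data product.
# Field names are illustrative assumptions, not a standard schema.
@dataclass
class DataProductEntry:
    name: str
    product_type: str                     # e.g., "canned report", "dashboard", "extract"
    source_system: str                    # where the underlying data lives
    owner: Optional[str] = None           # often unknown when first cataloged
    last_verified: Optional[date] = None  # empty until someone reviews the product
    status: str = "uncataloged"           # later: "under review", "certified", "deprecated"
    documentation_url: Optional[str] = None
    known_dependencies: list[str] = field(default_factory=list)

# A first-pass entry: most fields stay empty. Each point of engagement
# (a review request, a terminology question) is a chance to fill one in.
legacy_report = DataProductEntry(
    name="Quarterly Enrollment Summary",
    product_type="canned report",
    source_system="central BI tool",
)
```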
At some point, a user will execute a canned report and find that the results don't seem to make sense. Or a new executive will have questions about a published dashboard, or will request a new data product of some kind. An analyst or report author will request that a new data element be added or changed in your data warehouse, or in one of the logical layers they work with.
Each of these situations is a point of engagement with data, and an opportunity for your data governance framework to be invoked.
| Situation | Engagement Action | Potential Outcome(s) |
| --- | --- | --- |
| Uncertainty about reported data | Request for review/curation | Data product is certified, deprecated, or updated |
| New manager has access to an existing set of dashboards | Flag terminology that differs from the way it was used in their previous department | New or updated business glossary entries; dashboard undergoes recertification |
| Analyst creates a variable in a report or logical layer | Confirm uniqueness and accuracy of the variable | Review other dashboards and reports for suitability; update business rules and related glossary entries; update lineage documentation in the data catalog |
| New data element added to a warehouse table | Review other products generated from this table | Data quality assessment |
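One way to picture how these engagements drive a product's lifecycle is as a small set of status transitions, as in the sketch below. The states and transition rules are our own illustration of the pattern implied by the table above (certified, deprecated, or updated), not a feature of any particular governance tool.

```python
# Illustrative certification lifecycle for a data product.
# States and transitions are assumptions for this sketch, not a standard.
ALLOWED_TRANSITIONS = {
    "uncataloged": {"under review"},
    "under review": {"certified", "deprecated", "updated"},
    "updated": {"under review"},    # any change triggers recertification
    "certified": {"under review"},  # e.g., a terminology question reopens review
    "deprecated": set(),            # terminal state
}

def engage(current_status: str, requested_status: str) -> str:
    """Apply the outcome of a governance engagement, rejecting invalid
    jumps (e.g., certifying a product nobody has reviewed)."""
    if requested_status not in ALLOWED_TRANSITIONS.get(current_status, set()):
        raise ValueError(f"cannot move from {current_status!r} to {requested_status!r}")
    return requested_status

# A user questions a certified dashboard: it goes back under review,
# and the review may certify, update, or deprecate it.
status = "certified"
status = engage(status, "under review")
status = engage(status, "updated")
```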
All of these situations occur regularly: users rely on reports they know nothing about and could not explain; managers make snap judgments based on quick reviews of dashboards, and may misunderstand key metrics; analysts and data scientists come up with a clever improvement, but find no way to propagate it beyond a particular product or moment; the BI stack is constantly changing, and opportunities for upstream and downstream engagement are missed.
We believe your data governance framework, activities, and tools, such as the Data Cookbook, will be markedly more successful, and will show that success sooner and more visibly, if they're oriented around engaging stakeholders when and where they encounter data.
Additional data governance and data intelligence resources can be accessed here.
IData has a solution, the Data Cookbook, that can aid employees and the organization in their data governance and data intelligence efforts. IData also has experts who can assist with data governance, reporting, integration, and other technology services on an as-needed basis. Feel free to contact us and let us know how we can assist.
(Image Credit: StockSnap_940CB5B9CD_ComputerTyping_DataProducts_BP #1206)