It seems like every day we're hearing about data products, or data as a product. The terms have certainly crept into our vocabulary and we have the distinct impression we're using them more often. (Undoubtedly this says more about our proclivity for recency bias than it does about the data discourse.) The barest minimum of research leads us back to this article from 2018, which we've read, seen referenced, and seen linked to many times since then.
Data products arguably belong in your inventory of data assets, but sometimes an arbitrary distinction is useful. We might say on the one hand that data assets are largely composed of the systems, applications, tools, platforms, and technologies that capture, store, move, and transform data, as well as the data housed in or transported through those tools. Data products, on the other hand, are easier to understand as things created by using data assets to derive, combine, enrich, or analyze data.
The most visible data product our clients deal with is a dashboard or visualization, or the data set(s) used to source the visualization. These data sets, which can also be rendered as tables or narrative summaries, should probably be considered part of the eventual data products. And most of these data sets are the result of a query or group of queries executed at a certain time, so let's make sure to include these queries (or other data extraction processes) in our product definition as well.
The point of this distinction is not to create a taxonomy, but to better understand the scope of some of the data challenges we face, and to come up with strategies to address those challenges.
Many of our clients face a growing problem in that they don't actually know enough about their data assets to determine whether they're taking unnecessary risks, whether they're generating or deriving value from these tools, whether they're realizing efficiencies or actually duplicating effort, and so on.
Keeping track of data products, then, raises that challenge by an order of magnitude, since data products can be generated directly within an asset, or by integrating, aggregating, or otherwise enriching assets, in the process creating altogether new data stores.
Clients often come to us with dozens if not hundreds of data assets about which they know a lot less than they'd like to, and frequently they can point to thousands of data products available to their users, across multiple platforms and tools, most of them undocumented and poorly cataloged, if cataloged at all.
- In many cases it's not clear whether the products are in use, or if they're even usable: they could be out of date, they could rely on business or data quality rules no longer in effect, they might utilize a permissions model that's no longer appropriate.
- In other cases it's clear that products are frequently used but in an entirely ungoverned fashion: no one can say where the data in the product originates, how long it's been around, what enrichment or transformation has been applied.
- This sounds like an edge case, but it's far from infrequent in our experience: a business unit is entirely responsible for the purchase and use of a SaaS tool, and for the customized data products generated inside that application, but the user who acquired the tool has left, and yet that user's credentials are still required to generate queries or update outputs!
Cataloging data products is a challenge, no question. Our Data Cookbook solution provides a simple, searchable repository for this kind of catalog, and it accommodates both the most skeletal set of sketchy notes and robust, richly detailed documentation. Even if you don't use our tool, this is an effort worth making. At the very least describe the product beyond its name, identify the data domain responsible for it, and capture whatever additional significant facts you can. (Perhaps in the not too distant future, this will be a job for AI. But, speaking as people deeply experienced with information technology, we can say that the future has a history of quickly receding from the present. So, as ever, maybe don't wait for someone or something else to solve this problem for you?)
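To make that "at the very least" entry concrete, here is a minimal sketch of what a skeletal catalog record and a simple search over it might look like. It is not the Data Cookbook's data model: the class name `DataProductEntry`, its fields, the `search` helper, and the two sample entries are all hypothetical, invented only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductEntry:
    """One catalog record. Every field name here is illustrative."""
    name: str
    description: str   # what the product is, beyond its name
    data_domain: str   # the data domain responsible for the product
    notes: list[str] = field(default_factory=list)  # other significant facts


def search(catalog: list[DataProductEntry], term: str) -> list[DataProductEntry]:
    """Return entries whose name, description, domain, or notes mention term."""
    needle = term.lower()
    return [
        entry for entry in catalog
        if needle in " ".join(
            [entry.name, entry.description, entry.data_domain, *entry.notes]
        ).lower()
    ]


# Made-up sample entries, purely for illustration.
catalog = [
    DataProductEntry(
        name="Enrollment Dashboard",
        description="Weekly headcount by program.",
        data_domain="Registrar",
        notes=["Owner of the source query is unknown"],
    ),
    DataProductEntry(
        name="Gift Summary Extract",
        description="Monthly donor totals sent to finance.",
        data_domain="Advancement",
    ),
]

matches = search(catalog, "registrar")
```

Even a record this thin answers the first questions a user is likely to ask: what is this, and who is responsible for it. The free-form `notes` list leaves room for sketchy notes now and fuller documentation later.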
Cataloging data products is not for the faint of heart, but it's a solid step on your journey to effectively governing data in motion. The next step - or several steps - has to do with curating that catalog. Now, curating or certifying or otherwise marking data products as valid is no small undertaking; we'll admit that right away. In our view, however, over the long term this task is at least as important as creating a data product catalog.
It would not surprise us to learn that 20% of your data products account for 80% of your users' data activity (well, to be honest, we wouldn't be surprised at an even more unbalanced distribution), so achieving certainty about the accuracy, timeliness, and utility of that core set of products has value, no matter where your organization sits in its ability to leverage data. There may be some metadata available to you to help identify how recently a data set has been accessed, how frequently a dashboard has been visited, or how many times an API has been called, all of which could signal the priority and importance of a given data product.
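If you can export that kind of usage metadata, a rough first pass at finding the core set might look like the sketch below. It assumes you already have one access count per product; the function name `core_products`, the 0.8 default, and the sample counts are hypothetical and chosen only for illustration.

```python
def core_products(access_counts: dict[str, int], share: float = 0.8) -> list[str]:
    """Smallest set of most-used products that covers `share` of all accesses."""
    total = sum(access_counts.values())
    if total == 0:
        return []
    # Rank products from most to least accessed.
    ranked = sorted(access_counts.items(), key=lambda item: item[1], reverse=True)
    core: list[str] = []
    running = 0
    for name, count in ranked:
        if running >= share * total:
            break
        core.append(name)
        running += count
    return core


# Made-up access counts, purely for illustration.
counts = {
    "enrollment_dashboard": 600,
    "retention_report": 250,
    "gift_summary": 90,
    "room_usage_extract": 40,
    "legacy_export": 20,
}

priority = core_products(counts)
```

With these invented numbers, two of the five products cover 85% of all accesses, so they would be the first candidates for review and certification. Access counts are only a proxy for importance: a report that is opened rarely but feeds a regulatory filing may still belong at the top of the list.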
Of course, as with all data-related activities, it'll probably be helpful to wear out some (virtual) shoe leather by visiting or contacting stakeholders directly. Not only do we want to better document and make more available our most useful data products, we'd also like to inculcate better habits for the products we're making now or planning to make in the future. What do those better habits include? Here's a starter list, but feel free to add your own items.
- Clarifying the business need for the product, and describing how the product will address that need.
- Understanding the audience(s) for the product, and tailoring its development so that it will be delivered in an appropriate format.
- Providing transparency around the request, the assignment, the prioritization, and the scheduled completion (or other milestones) of the product.
- Making the product discoverable (e.g., in a data catalog).
- Documenting lineage, quality concerns, cleansing/transformation, and of course related business terminology to aid in understanding.
- Where possible, making it easy to enhance or expand or otherwise iterate improvements.
For most organizations, most users will have their most meaningful interaction with data as consumers of one or more of these data products. It behooves any organization that seeks to be data enabled to develop reliable and consistent data products, and to make them available and consumable as widely as is sensible.
IData has a solution, the Data Cookbook, that can aid employees and the organization in their data governance and data intelligence efforts. IData also has experts who can assist with data governance, reporting, integration, and other technology services on an as-needed basis. Feel free to contact us and let us know how we can assist.