So a lot of work would get done to provide a report or other data product that didn't do what the requester wanted it to do. The figures wouldn't seem quite right, or the contents of this product wouldn't match the ones provided by another analyst or from a different office. The other thing we'd observe is that a lot of duplicate work ended up being performed. Because of inconsistent terminology, or the lack of a repeatable process, someone might ask for data that was already available in a canned report, or that could easily be obtained using an existing filter on a dashboard.
So we figured that if you could get agreement on data-related business terms, and if you could further document the data systems those terms were part of (and the steps you would take to extract the appropriate data), you could solve a lot of pretty basic problems. And we also figured, why not bake some data governance and data intelligence into this work at the same time? Somebody is responsible for collecting, managing, and defining that data: why not have them sign off on terminology and the various reports and dashboards where those terms would be used?
We thought that, if we made it easier to fulfill data requests using a common storehouse of terms and products, data consumers would be able to get what they wanted from data providers more quickly, with less friction, and that they would then be able to use that data to support their work. And we thought that once a request had been documented, approved, and catalogued, it would be easier for future requesters to find what they wanted, meaning they would not be submitting new requests for duplicate work. And voila, the Data Cookbook was born.
And, for certain kinds of data requests at some organizations, that's exactly what has happened. For standard requests relating to management metrics, using accepted tools, and delivered in a consistent framework, this process is reliable and efficient, as long as you at least minimally commit to documenting your work, testing your products, and recording user acceptance.
But if you are in an organization where the data delivery process has historically been fairly casual, even instituting this minimal amount of formal oversight can be quite a lift. And even where a reasonably strong process is in place, not all data requests are created equal. Sometimes data requests come pre-filtered, say via an administrative assistant or departmental analyst; sometimes requests get passed to people without the requisite expertise in the interest of expediency; sometimes data requests are really fishing expeditions, no matter how well described they are, which means that fulfillment is not complete until the right kind of fish are provided (or until the net comes up empty enough times). As we've documented in this space more than once, even when requester and provider are on the same page, so to speak, there's no guarantee what will happen when interpretation and utilization are needed.
The lack of a shared vocabulary was not the only reason for poor communication, of course. Data consumers do not always know which system the data they're looking for is housed in, and even a clear request, if directed at the wrong data, won't get the desired results. Say I want to know something about sales: the sales system could include transactions that have not been invoiced or paid yet, whereas the finance system would not count those as sales until the income has been recognized. Somewhere along the way we need to clarify not only what questions I'm trying to answer, but which system is the best source to answer those questions.
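To make the sales example concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Transaction` fields, the sample figures, and the two counting rules are assumptions for illustration only. In particular, treating "paid" as the point at which finance recognizes income is a simplification, and real recognition rules vary by organization.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical record layout, invented for this illustration.
    order_id: str
    amount: float
    invoiced: bool
    paid: bool


# Made-up sample data: one paid order, one invoiced but unpaid, one not yet invoiced.
transactions = [
    Transaction("A-1", 100.0, invoiced=True, paid=True),
    Transaction("A-2", 250.0, invoiced=True, paid=False),
    Transaction("A-3", 75.0, invoiced=False, paid=False),
]


def sales_system_total(txns):
    """Sales-system view (assumed): every booked transaction counts as a sale."""
    return sum(t.amount for t in txns)


def finance_system_total(txns):
    """Finance-system view (assumed): count a sale only once it has been paid,
    used here as a stand-in for recognized income."""
    return sum(t.amount for t in txns if t.paid)


if __name__ == "__main__":
    print("Sales system total:  ", sales_system_total(transactions))
    print("Finance system total:", finance_system_total(transactions))
```

Both functions answer the question "what were our sales?" from the same three records, yet they return different totals. Neither is wrong; they simply encode different definitions, which is why the request needs to name the system as well as the term.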
And the data landscape has not stood still over the past dozen years or so, either. The amount of data produced and collected has grown well beyond what many of us anticipated back then, and simply keeping your head above water, so to speak, requires more effort than it used to. Much of any organization's data is likely to be hosted, or even sequestered, in the cloud, which means more work is needed in order to include that data in any kind of centralized BI or analytics stack, even if that stack is largely virtualized. Many more consumers, internal and external, are now requesting data, and some of those consumers and their requests are quite sophisticated. While self-service analytics may not yet have lived up to the hype, there's no question that more users are much more comfortable connecting to data sources, querying and visualizing, and attempting to draw meaningful conclusions from their work. (Whether their data fluency skills have kept up with their data extraction skills might well be another question.)
Every organization wants to use its data to operate more efficiently and more effectively. To do that, it's absolutely crucial to have a shared understanding of what data means, and of the purposes it will be put to. This effort is central to our recommended data governance best practices.
But if you are only looking at a subset of your organizational data, then there may be additional data that would be useful if only it were tracked and classified. So your data governance effort probably ought to make room for cataloging all of your data in some fashion, and for tracking its flows from system to system, and its data lineage as it makes its way into reporting stores and data products.
By the time your data arrives in a repository for analysis and sharing, it may well have had quite a journey. How do you know that it's accurate and up to date? How confident are you that it is complete and reliable? Data quality sounds like it might be another of the pillars that your data governance structure needs to be built on.
The central pipeline for data delivery at your organization, which is fragile even when well-governed, is undoubtedly supplemented by your growing population of citizen data scientists and the in-application analytics made possible by your SaaS systems. While there's no reason to think these other data products are unreliable, they are often not very transparent. So your data governance program could well expand to include some processes to curate and certify data products even after you have classified data and agreed on terminology.
Ultimately your management dashboards and your key performance metrics are a way to put a more human face on data. You want to describe, understand, and act on data in the language of business, and in a way that is inclusive and accommodating. But the work you do behind the scenes grows ever more complicated and byzantine, so at some point it will behoove you to provide technical details that support functional needs. Your data quality standards and practices, your data-related business terminology, your reporting and analytics deliverables, your integrations and data pipelines--all of these could be audited by regulators, by inquisitive team members, and by the leadership of your organization.
Data governance, it seems to us, involves giving people as clear a path from question to answer as possible. Doing so requires both taking a broad view of data resources and assets and offering end-to-end visibility into your data operations. Our work on the Data Cookbook has expanded outward from a central core over the years, but we believe it still holds true to its original insights and guiding spirit. We hope you enjoyed this blog post.
IData has a solution, the Data Cookbook, that can aid employees and the organization in their data governance and data intelligence efforts. IData also has experts who can assist with data governance, reporting, integration, and other technology services on an as-needed basis. Feel free to contact us and let us know how we can assist.
(Image Credit: StockSnap_XPAJMZLRFT_Biker Path_DG Never Ends_BP #1244)