A Pragmatic Approach to Minding Data Governance Gaps

Summer is almost officially upon us (or winter, if you're in the other hemisphere). School is out in many places, and in others we're sure students (and teachers!) are counting the hours. For some people, things may be a little less hectic now, so we're serving up a slightly longer post that covers more ground than usual. We hope you enjoy it! And it's OK to read it in pieces...

Data as a strategic asset

Organizations that use data want to use it strategically, and there are many, many ways they might do this. Acting strategically requires thinking strategically. Strategic thinking about data includes, at a minimum, some kind of cost/benefit analysis, some estimate of a range of return on investment (ROI), and above all a clear assessment of areas where data might help meet business objectives as well as areas where data might be less useful. Anyone who says "just collect it all, or store it all, or throw it all in the data lake, because we might find a use for it someday" is not someone we'd call a strategic thinker about data. How many of us have closets or basements or garages full of things that might come in handy someday? (Speaking of ways to spend extra time this summer...)

We tend to think that a data strategy is mainly a strategy for using data in support of your business strategy, but we could entertain an argument for something more distinct. If you or your organization can't articulate many ways data could be used to support such a strategy, then perhaps the exercise of developing a written data strategy would be helpful.

It's probably impossible to think strategically about something without first understanding its importance. In the data governance world, we'll occasionally hear someone say, or we'll say it ourselves, that a key starting point involves recognizing data as an asset. 

What does that mean? Some interesting pieces have been written on this topic from an accounting perspective, but our approach here is a bit more casual. You might say it has two components. First, you have to recognize that data is an asset, meaning something that has value and can be used. (A true financial asset would probably have some kind of quantifiable exchange value, and you may not be there, or even be headed in that direction.) The second component of this notion involves treating data as an asset, which means, at a minimum, seeking to leverage data while at the same time trying to preserve or steward it.

Stewarding your data assets

In our view, preserving and stewarding data are practices that come from enabling and guiding use, rather than from securing or locking down. When you preserve wealth, you work to ensure future generations can benefit from it. When you steward forests, you work to make them both long-lasting and useful. Stewarding a forest doesn't mean that no trees can ever be felled. It may mean only a few can be cut down now, or maybe none for a while, but a forest steward's goals would include both keeping that forest healthy and making it valuable for people.

We still see many vendors in the data space talking about how their tools support analytics on governed data, and it's clear from context that by "governed data" they mean determining which users get access to which data sets and data elements. In our view, that is a narrow and often self-defeating vision of data governance.

Of course, governance includes rules and regulations (it does include the word govern, after all). But the point of governance is to help things work better and run more smoothly, right? So effective data governance would be a situation where users are able to find, understand, utilize, and build on the data they need.

The best data stewards with whom we've worked are quite knowledgeable about the data their unit or team collects and manages, and they can provide a laundry list of ways their group uses it. By and large, however, how others in an organization might find value in that data, and what additional purposes they might put it to, lie outside any given data steward's scope. So asking stewards to determine whether to grant access to the data they manage, and under what circumstances, asks them to make decisions they're often not well positioned to make. This might even be somewhat counterproductive in environments where data security is paramount.

A more effective use of data stewards' time and knowledge might be to have them help classify data elements or data sets, and then to develop an organizational policy about which users can see which classifications, under which circumstances, what they can do with the data they're allowed to see, and so on. Such a policy would be grounded in actual knowledge and practice, and if applied consistently would in all likelihood result in greater transparency and more equitable access.
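To make the idea a little more concrete, here is a minimal Python sketch of what such a policy might look like once it is written down. Everything in it is hypothetical: the three classification tiers, the role names, the actions, and the data element names are illustrations, not a recommended scheme. A real policy would also capture the circumstances of access (purpose, approvals, time limits), which this sketch leaves out. The point is structural: stewards contribute what they know by classifying elements, and who may do what follows from one shared policy rather than from case-by-case judgment.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    # Hypothetical tiers a steward might assign to a data element.
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Rule:
    roles: frozenset    # hypothetical roles allowed to see this classification
    actions: frozenset  # what those roles may do with data they can see


# One organizational policy, applied the same way to every data element.
POLICY = {
    Classification.PUBLIC: Rule(frozenset({"employee", "analyst", "steward"}),
                                frozenset({"view", "export", "share"})),
    Classification.INTERNAL: Rule(frozenset({"analyst", "steward"}),
                                  frozenset({"view", "export"})),
    Classification.RESTRICTED: Rule(frozenset({"steward"}),
                                    frozenset({"view"})),
}

# Stewards' contribution: classifying elements, not approving each request.
# Element names are hypothetical examples.
ELEMENTS = {
    "press_release": Classification.PUBLIC,
    "department_budget": Classification.INTERNAL,
    "employee_salary": Classification.RESTRICTED,
}


def is_allowed(role: str, element: str, action: str) -> bool:
    """Return True if the policy lets `role` perform `action` on `element`."""
    rule = POLICY[ELEMENTS[element]]
    return role in rule.roles and action in rule.actions


print(is_allowed("analyst", "department_budget", "export"))  # True
print(is_allowed("analyst", "employee_salary", "view"))      # False
```

Because every decision routes through the same table, anyone can see why a request was granted or denied, which is where the transparency and equity benefits would come from.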

Data literacy requires understanding data in context

This practice of conflating being a steward with being a guard makes for uncertainty, if not chaos, and the situation is further exacerbated by a general lack of data fluency or data literacy across many organizations. People who are not particularly well-equipped to deal with data are unlikely to know much about how data is organized and shared behind the scenes, and they're even less likely to seek to learn more about this situation. Of course, issues with data integrity are often visible, even to casual data consumers, but knowing to whom to turn to learn more, and then perhaps waiting for an investigation to be performed, and then hoping the explanation is delivered in understandable language, is a lot to ask. Unless the issue is obviously mission critical, we suspect that many users won't be willing to invest the time or effort.

It's tempting to frame data literacy as a technical skill — knowing how to use a BI tool, read a pivot table, interpret a confidence interval. And those things matter. But the more durable version of data literacy is probably better understood as a mindset, one that asks a number of related, repeatable questions: Is the source of this data known, and is it reliable? Is the analysis appropriate for the question being asked? Do the conclusions actually follow from the evidence? These aren't data skills in a narrow sense. They are habits of mind that apply equally well to a spreadsheet, a dashboard, a large language model's output, or a headline.

While data literacy might typically be characterized as the ability to understand and act on data, the people whose data literacy is most fully formed also understand that the usefulness of data is affected by how it was collected, by its age, and by where in its lifecycle it is encountered. A number pulled from a system of record is different from the same number after it has passed through two integrations and a transformation layer, even if both land in the same report cell or bar graph. Employees who understand this bring a fundamentally different level of scrutiny to their work than those who treat data as though it simply exists.

And we observe that the failure to build these competencies is not always, or even primarily, a failure of will, training, or resources. It is often a rational response to an illegible environment. If employees have no reliable way to know where a data element came from, what transformations it passed through, or what it actually represents in context, then investing effort in scrutinizing that data yields diminishing returns. You can train someone to ask, "Is this source reliable?" but if the honest answer is, "I have no idea and neither does anyone else," the habit will eventually wither away.

Obscure lineage hampers both literacy and stewardship

The ability to interact successfully with data manifests in various people at various times. By the same token, there are multiple blockers to the successful analysis and strategic use of data. Some of those are obvious: poorly implemented systems and tools, weak or shallow data quality management, incomplete or inaccessible data sets, and many more. (It seems like we've discussed so many in this very blog!)

The systems that move data from its point of origin through integrations, into warehouses, through transformation layers, and finally into reporting are, in most organizations, opaque at best. When something looks wrong in a report, knowing where to start the investigation is itself a specialized skill. The integrations that feed data stores and legacy warehouses are often invisible to the people who consume their outputs, and the lineage of data as it moves deeper into the analytics stack is rarely documented in any accessible way.

This data lineage conundrum is a spot where the efforts of even the strongest stewards can break down, and where even organizations with high data literacy can founder. And it may be even more corrosive than it looks, because its effects are diffuse. No single broken thing announces itself. Instead, there is a low-grade erosion of trust in data, a background hum of uncertainty that makes confident, data-driven work harder than it needs to be, not just for sophisticated analysts but for every employee whose job involves interpreting information.

At this point, a reader might reasonably interject: "But great lineage tools exist! There are platforms that map data as it moves through pipelines, trace a field back to its source, document transformations, flag anomalies." And that reader would be correct. These tools are genuinely valuable for the data engineers and architects who build and maintain those pipelines. For that audience, they represent real progress.

But the employees whose work we're trying to support are not that audience. They are analysts, managers, coordinators, and functional specialists whose relationship with data is transactional and contextual: they need specific data, for specific purposes, in terms they already understand. A lineage tool that maps pipeline dependencies at the system level does not help a business analyst understand why the revenue figure in one report doesn't match the one in another. It doesn't help a department head assess whether last quarter's numbers are actually comparable to the prior year's. The map it provides is not a map they can read.

This isn't a criticism of those tools. It's a recognition that they were built for a different problem. The gap between what technical lineage tooling provides and what the average data consumer actually needs is substantial — and filling it requires something other than more tooling.

Invisible problems are still problems

In many organizations, the data governance or data intelligence function is not positioned to address any of this, not because the people involved lack expertise or commitment, but because the governance model itself has been defined too narrowly to include it.

Data governance, even today, is far too often understood as a matter of access control: who can see which data, under which conditions, with what approval process. The energy goes into classification schemes, security roles, compliance documentation, and the management of data stewards as a kind of gatekeeping layer. These are not trivial concerns. Regulatory requirements are real, and managing access to sensitive data matters. As many organizations ponder sharing proprietary, even confidential, information with large language models, about which confidentiality questions frequently remain unanswered, it becomes even more apparent that these concerns are not trivial.

But a governance model organized around access control runs up against the observation we made earlier: data stewards are experts in their own data (its definitions, its uses within their team, its quirks), while at the same time they are often essentially unaware of what happens to it downstream. Such a model produces a security posture that answers "who can see this?" while leaving "what does this actually mean in practice, and how does it change as it moves through systems?" largely unanswered. It is a governance model that cannot easily see the lineage problem, because the journey that data takes through an organization, and especially the meanings data accumulates and sometimes loses along the way, falls between the jurisdictions it has defined.

Why AI probably won't solve this problem

If users don't trust reported data, they won't consult those reports and dashboards as often as they could, and they will be less likely to draw even basic inferences about what the data might be saying, much less take strategic or creative action based on it. And if users are prevented from looking into data reported elsewhere, or into organizational data that hasn't even made its way into reporting data stores, then their trust in data is undermined even further.

Obviously, raising the level of organizational ability to manage and utilize data will be easier when coupled with strong data stewardship and leaders who talk the talk and walk the walk when it comes to valuing data as a strategic asset. Increased data fluency will likely reinforce the quality of data stewardship and the value of organizational data, so there is the potential for a virtuous circle here.

Organizations whose employees lack key data literacy competencies are likely to be ones where data is not fully valued as an asset, nor utilized to its full extent. The drive to expand these competencies reflects many of the core data governance objectives we hear about, whether that's asking better questions about data (or using data to better answer questions about business), understanding which data are meaningful in a given context, recognizing (and remediating) low-quality or missing data, and so on. 

There's no question that AI tools, both the ones currently available and the ones that will soon be in front of us, can make it easier for employees to interact with data. But any number of tools over the past several decades, from spreadsheets to warehouses to dashboard tools, have promised the same thing. And yet, have all those tools really moved the needle when it comes to developing a data-enabled workforce?

Much attention is now being paid to AI governance and AI literacy, and in a vacuum we'll just say we're in favor of both! Let's further observe that some of the key habits associated with data literacy are undoubtedly useful when it comes to AI: Is the source known, and is it reliable? Is the analysis appropriate for the question or topic? Do the conclusions follow from the evidence? When it comes to data and AI, literate users are both skeptical and open-minded.

If the overarching goal is an organization where employees can find, understand, evaluate, and act on data with genuine confidence, then the governance model has to be broad enough to treat that goal as its own responsibility — not a downstream outcome that will emerge naturally from properly secured and documented data and/or AI assets. It has to treat data stewardship as an enabling function, not a gatekeeping one. And it has to take seriously the question of how data is understood across its full lifecycle, not just at the point of access.

That is a wider aperture than the lens through which most governance programs view data. Widening it is not primarily a technology problem. It's a problem of scope, priority, and organizational will. The good news, we think, is that organizations willing to reframe the question, to move from "how do we control access to our data?" to "how do we make our data genuinely legible to the people who need it?", will find that many of their literacy concerns begin to resolve on their own. And once we frame the task as making data legible, it becomes easier to see how AI might lend a helping hand.

Some parting thoughts

A few questions worth sitting with as you think about where your organization stands:

  • What are your data stewards actually doing?

If their primary activity is fielding access requests and maintaining security classifications, they are likely underutilized — and the collaborative, context-sharing work that builds organizational data literacy isn't getting done. What would it look like to reorient even a portion of their time toward enabling rather than gatekeeping? How might AI support either security operations or steward collaboration, or both?

  • How legible is your data environment to a motivated, curious non-specialist? 

Can someone who wants to understand a discrepancy between two reports actually trace it, without escalating to a data engineer? If not, that's a governance gap worth naming as such, not just a documentation backlog. Given the right prompts, today's AI might well be able to help with tracing lineage and performing basic validation, and motivated users are less likely to give up in discouragement if they're provided with an accessible pathway for further inquiry.

  • Are your governance tools and resources designed to build understanding over time?

There's a difference between a resource that resolves today's confusion and one that leaves the user slightly better equipped for tomorrow's. The latter is harder to build but compounds in value in ways the former never does. AI tools, used well, can genuinely help here by surfacing context, fielding exploratory questions, and, in the best cases, helping users develop better foundational approaches to the data they work with. But they work best in environments where the underlying data is already reasonably legible and well described.

  • What would it mean for your governance program to treat data literacy (and, potentially, AI literacy) as an outcome it owns? 

We have argued elsewhere that data literacy efforts would not go far in an environment without serious data governance, and of course we believe that practicing help desk and just-in-time data governance builds data literacy to some extent, both in the moment and iteratively. Still, this is a different way of thinking about their relationship: data literacy not as a training initiative data governance sponsors, or a metric it reports on, but as a condition it is actively responsible for creating and sustaining. This formulation might surface the real obstacles rather quickly.

These questions do not have clean universal answers: the right answers depend on where an organization is starting from, what its data environment actually looks like, and what it's trying to accomplish. They're also, not coincidentally, the kinds of questions that IData works through with clients. If any of this has landed close to home, we'd be glad to talk.

We hope you found this blog post useful. Also check out our data governance spotlight resources located at https://www.datacookbook.com/spotlights. IData has a solution, the Data Cookbook, that can aid employees and the organization in their data governance, data intelligence, data stewardship, and data quality initiatives. IData also has experts who can assist with data governance, reporting, integration, and other technology services on an as-needed basis. Feel free to contact us and let us know how we can assist.

Image Credit: StockSnap_BXVO5YWHDY_HappyTeam_DGImprovement_BP #1320 

About the Author

Aaron Walker

Aaron joined IData in 2014 after over 20 years in higher education, including more than 15 years providing analytics and decision support services. Aaron’s role at IData includes establishing data governance, training data stewards, and improving business intelligence solutions.
