IData Insights Blog

Governing Data for AI Users: Some Thoughts

Written by Aaron Walker | Apr 10, 2026 5:46:16 AM

There are a number of use cases for Artificial Intelligence (AI) in the data realm, some of which we've discussed in this blog. Advancements in data profiling, along with the ability to parse and unify dozens or hundreds of existing exception reports, could lead to real improvements in data quality, reducing the data debt you pay down every time you work with messy data. AI looks like it might be super helpful in managing data pipelines, lakehouses, virtualization, and the like, perhaps even to the point where agents not only actively correct issues but generate analytical data sets independently. And of course increased processing power, and the ability of large language models (LLMs) to turn natural language conversations and questions into SQL queries, Python scripts, R functions, and so on, might dramatically open up opportunities for citizen data analysis. Each of these use cases involves opening up data sets to AI in some fashion, and there are risks associated with doing so.

Your frontier model, whichever one you're using, is unlikely to write sprawling, inefficient queries that drag down your database, and it's unlikely to extract sensitive or confidential information and then share that information inappropriately, although the stories about what some agents have provided certainly make us want to point out that unlikely and impossible are not the same thing. What is much more likely, and not just because AI is known to hallucinate, is that when asked to produce some analytics, the natural language engine will not have been trained on your business vocabulary, and will not know the vagaries of your data or the peculiarities of your organization's metrics.

So you might kick off a process that takes a very sophisticated look at a lot of data, and that generates an attractive and compelling set of charts or other visual output. And, well, the results might be meaningless, or misleading, or both.

Even the loudest cheerleaders are well aware of these risks, and in discussions of how to address them we hear a couple of recommendations over and over.

The first recommendation is something like AI governance as the newest iteration of data governance. This really has two strands. The first involves making sure you know which data sets you're sharing with AI, and setting up pretty stringent ground rules about what goes in those data sets, what AI is allowed to do to and with data, and so on. The second strand has to do with existing user permissions and behavior. End-users and data consumers are likely accustomed to submitting a data request to a queue, or at least taking it to a business analyst or data scientist. They don't know what's available to them in the organizational data lake, and they don't have many ways to find out. If by some accident these requests brush up against data the requesters don't have access to, or data that is otherwise questionable, enough layers of friction exist to surface questions of access, security, suitability, and so forth. But if these consumers now pose their data requests to an AI chat interface, even the most innocent of questions might result in a suggestion for augmented study that could go off the rails.

The second recommendation boils down to something like: your AI-fueled analytics needs a semantic layer to make sure that when you ask a question in natural language, AI knows which tables and which columns in which data sets it should consult in order to answer that question. We have heard people say, think of AI as a talented and curious intern. It has lots of skills, and is eager to use them and to grow them. But it's inexperienced; it knows only a little about your industry and even less about your organization. So without your mentorship, the AI-as-intern could waste time and money, and it could even cause considerable damage.

Look, we think semantic layers in BI tools are fabulous ideas. When they're thoroughly built out and fully vetted, they illuminate data sets and data elements for analysts, developers, and other key users. A semantic layer does not, however, give your AI intern deep knowledge of your organization's data in its business context. Even a universal semantic layer, which incorporates lineage and transformation across multiple sources, still sits between your analytics tools and your data stores. Calculations can be written out, for example: revenue = sum(order_amount) where status = 'completed'. But what does revenue mean for the business? What's an order amount? What is status in this example: order status? Fulfillment status? What does it mean to be completed?
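To make that gap concrete, here is a minimal sketch in Python (all names hypothetical, not from any particular BI tool) of a semantic-layer metric entry. The SQL expression is the part a typical semantic layer captures; the business_context fields are exactly the steward-supplied knowledge an AI intern would otherwise lack.

```python
# A hypothetical semantic-layer metric definition. The "expression" and
# "filters" are what a semantic layer typically records; "business_context"
# holds the steward-supplied meaning that the formula alone cannot convey.
revenue_metric = {
    "name": "revenue",
    "expression": "SUM(order_amount)",
    "filters": ["status = 'completed'"],
    "business_context": {
        "definition": "Recognized revenue from fulfilled customer orders",
        "order_amount": "Line-item total after discounts, before tax",
        "status": "Order fulfillment status, not payment status",
        "completed": "Shipped and invoiced; excludes returns and cancellations",
    },
}

def render_sql(metric: dict, table: str) -> str:
    """Build the SELECT statement implied by a metric definition."""
    where = " AND ".join(metric["filters"])
    return (
        f"SELECT {metric['expression']} AS {metric['name']} "
        f"FROM {table} WHERE {where}"
    )

print(render_sql(revenue_metric, "orders"))
# SELECT SUM(order_amount) AS revenue FROM orders WHERE status = 'completed'
```

The point of the sketch is that render_sql can be generated mechanically, but nothing in the formula answers the questions above; only the business_context entries, which someone with business knowledge has to write, do that.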

We've been arguing for years that if your vision for data governance starts and ends with security and access, or if it originates from the perceived need to restrict people's access to certain pieces of data, that's a blinkered vision, and it's one that's doomed to limited success. That vision of data governance does not enable more users to treat data as an asset from which they derive value on behalf of your organization. What it does tend to lead to is shadow databases, lingering data quality issues, furtive if not entirely secretive transactions, and much less of the productive collaboration that fuels success.

But too much of what we hear about data governance for AI seems to start and stop with security. We observed a product demonstration recently where the AI tool responded to a prompt along the lines of, "I don't have access to that data." And that was great, as far as it went. If the person writing the prompt didn't need access to that data for security reasons, then that's a win. But what if the issue was that the data in question wasn't really relevant? Or that it hadn't yet been curated into gold medallion status but might still have been interesting in the context of the prompt?

We tend to believe that rigorous data governance that focuses on enablement and guidance can be extended pretty easily to a lot of situations where AI can improve data usage. For a sizable number of users, having access to a semantic layer would assist in enablement and guidance. How is this revenue column in my warehouse computed? Where does this data originate? Are there related elements that would further inform my analysis, or add useful details for the managers with whom I'm going to share this dashboard?
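Those analyst questions can be answered from catalog metadata when it exists. Here is a minimal sketch (all table names, element names, and fields are illustrative, not from any particular catalog product) of an enablement-oriented catalog entry and a lookup that surfaces it.

```python
# A hypothetical data-catalog entry: the metadata an enablement-focused
# governance program would want a semantic layer (or an AI assistant) to
# surface, rather than just a yes/no access decision.
catalog = {
    "warehouse.sales.revenue": {
        "computed_as": "SUM(order_amount) over completed orders",
        "originates_from": ["erp.orders", "erp.order_lines"],
        "related_elements": ["warehouse.sales.refunds", "warehouse.sales.net_revenue"],
        "steward": "Sales Operations",
    },
}

def describe(element: str) -> list[str]:
    """Answer the analyst's questions from catalog metadata."""
    entry = catalog[element]
    return [
        f"Computed as: {entry['computed_as']}",
        f"Originates from: {', '.join(entry['originates_from'])}",
        f"Related elements: {', '.join(entry['related_elements'])}",
        f"Steward to ask: {entry['steward']}",
    ]

for line in describe("warehouse.sales.revenue"):
    print(line)
```

Note that the last field points to a person, not a permission: the steward is the source of the context that no automated layer can invent on its own.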

Conspicuous by its absence in both of these recommendations for responsibly using AI on your data is data stewardship. We conceive of data stewardship as a set of actions, decisions, principles, and so on, largely concerned with the acquisition, storage, use, and understanding of data. Data stewards exercise control over the data their unit or office collects and uses operationally, and we expect them to be able to explain their data elements, terminology, processes, and the like from that perspective. Data stewards make determinations about what data can be shared, with whom, and in what format, and they set standards for what constitutes high-quality data.

Any serious attempt to use AI to boost analytics has to build on whatever framework of data stewardship an organization has in place. If all you ask of data stewards is to say yes or no to requests for access, or to provide minimal links between vague business terminology and data elements, then don't be surprised when your AI intern continues to provide you with irrelevant, unreliable, or even specious data products. But if you have a vibrant community of data stewardship practice, and a collaborative, accessible repository of stewards' knowledge, practices, and data guidelines, and if you include that in the training package you provide AI, then we suspect you'll be in a much better position to understand, act on, and obtain value from your data.

Hope this blog post was of assistance to you and your organization. All our data governance and data intelligence resources (blog posts, videos, and recorded webinars) can be accessed from our data governance resources page. IData has a solution, the Data Cookbook, that can aid employees and the organization in their data governance, data intelligence, data stewardship, and data quality initiatives. IData also has experts who can assist with data governance, reporting, integration, and other technology services on an as-needed basis. Feel free to contact us and let us know how we can assist.

 

Image Credit:  StockSnap_89AZTB8E5H_ComputerScreen_GoverningDataAIUsers_BP #B1315