We speak to current and potential clients regularly about the critical role data stewards play in data governance: developing policies, enforcing standards, guaranteeing data quality and legibility, assisting with the use of data in decision support, and much more. That role becomes almost immediately visible when clients encounter the Data Cookbook, and many of them are rightly concerned about adding to their data stewards' workload.
We have argued for years that having robust data governance artifacts and practices in place actually simplifies things for data stewards. A data catalog integrated with a business glossary helps data consumers understand data products and terminology without repeatedly pestering stewards for refreshers. A shared business glossary builds agreement on what data means in context. A set of data quality standards, ideally accompanied by a vibrant data quality maintenance program, helps instill confidence in data sets, and it reduces the amount of effort--and data steward consultation--required to vet data for analysis. A rigorous data classification scheme makes it easy to apply standards for access and security, and it frees up data steward time now spent reviewing individual requests for access or scrutinizing dashboards to make sure they don't include confidential information.
The challenge is that getting from where organizations are today to where they might be down the road will probably increase the short-term workload on data managers and stewards. Given this fact, our recommendation generally comes down to "don't boil the ocean." Take the work you're already doing--whether it's addressing data integrity, developing management dashboards, creating or revising policy, or explaining the source and definition of data--and make it something you don't have to do repeatedly. Store the outcomes of those conversations and activities in a tool like the Data Cookbook, and train users to go there first when they have questions about data. Eventually you'll reach a point of critical mass, and those repetitive, time-wasting tasks will go away, to be replaced--one hopes--by more productive and useful activities.
We've written in this space many times before about our observation that many of the data problems organizations face actually stem from ineffective, inconsistent, or even nonexistent data governance practices. Disputes about ownership and access, disagreement on the meaning and methodology of data metrics, distrust of data stores and data providers, a lack of confidence in the completeness and accuracy of data sets, insufficient data literacy competencies and analytics maturity, and so on--all of these issues can often be traced back to gaps in data governance coverage. And one continuing reason these gaps don't get filled is that data stewards are just too busy.
We are always looking for ways to help clients be successful, and success in data governance isn't going to happen without successful data stewards. Since we don't live in a remote mountain monastery, we have heard the stories, true and otherwise, of promising new uses for generative artificial intelligence (GenAI). And potential clients frequently ask us whether our products incorporate AI, or whether there's a way to use AI during implementation. (To be fair: even people who live in remote mountain monasteries have probably heard something about AI. They might even be power users!)
We know GenAI can probably help automate some routine tasks and replace or supplement some initial stages of time- or labor-intensive human interaction (think chatbots or help desk tickets), and we know that AI's tremendous processing power can summarize huge amounts of information very quickly. If you're in data, or even data-adjacent, you have to wonder whether, and how, AI can help manage, understand, analyze, and act on the massive quantities of data nearly all organizations now possess.
So naturally we wondered: can GenAI ease the burden on data stewards and help fill these data governance gaps? We went directly to the source and asked one of the large language models that very question. And not to spoil things, but its response seemed pretty typical of what we've encountered so far with GenAI: some insight, some obvious observations, and something of a tendency to veer off topic.
Let's start with the obvious. And let's be clear: just because something is obvious, or basic, doesn't mean the observation isn't useful or accurate.
When it comes to data quality, AI can be a powerful and speedy profiling tool to detect missing values, inconsistencies, outliers, duplicates, discrepancies between data sets, and so on. (We sketch below what even a simple version of that profiling might look like.)
When it comes to compliance and data security, AI can monitor data usage, and it can potentially identify risks and vulnerabilities associated with that usage.
When it comes to data discovery, AI can summarize large and complicated data sets, which can help users decide whether a particular data set is of interest.
We extrapolate from this last assertion that AI can also scan other aspects of data catalogs, and could suggest whether certain dashboards, canned reports, or other analytics outputs might help answer a particular question or set of questions. We suspect this particular aspect would be more valuable to more users than the data set summaries.
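To make the data quality point above a bit more concrete, here is a minimal sketch of the kind of profiling checks an AI-assisted (or plainly scripted) tool might run against a data set: missing values, duplicated rows, and a naive outlier count. The file name and the three-standard-deviation threshold are our own illustrative assumptions, not any particular product's behavior.

```python
import pandas as pd

def profile_dataframe(df: pd.DataFrame) -> dict:
    """Run basic data quality checks: missing values, duplicates, outliers."""
    report = {}

    # Missing values per column
    report["missing_counts"] = df.isna().sum().to_dict()

    # Fully duplicated rows
    report["duplicate_rows"] = int(df.duplicated().sum())

    # Naive outlier check on numeric columns: values more than three
    # standard deviations from the column mean (for illustration only)
    outliers = {}
    for col in df.select_dtypes(include="number").columns:
        series = df[col].dropna()
        if len(series) > 1 and series.std() > 0:
            z_scores = (series - series.mean()).abs() / series.std()
            outliers[col] = int((z_scores > 3).sum())
    report["outlier_counts"] = outliers

    return report

# Hypothetical usage against a placeholder extract file
df = pd.read_csv("enrollment_extract.csv")
print(profile_dataframe(df))
```

The interesting AI question, of course, is not running checks like these but deciding which checks matter for which data sets, and explaining the results in terms a data consumer can act on.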
What did we find insightful?
Our AI partner suggested that AI could "provide clear and concise explanations of technical concepts and data relationships." This is an ongoing and longstanding challenge.
It also suggested that AI could create "comprehensive documentation for data assets, including data dictionaries and lineage reports." Again, we make certain assumptions, but our expectation is that AI can generate this documentation not only from database metadata but also from ETL scripts, integrations, API calls, queries and semantic layers stored in BI tools, and perhaps even reporting outputs.
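We don't know exactly how any given tool assembles that documentation, but the database-metadata piece, at least, is easy to picture. Here is a minimal sketch, assuming a SQLAlchemy-reachable database: it walks the schema and builds a data dictionary skeleton whose descriptions a generative model could then draft for a steward to review. The connection string and the commented-out draft_description helper are placeholders of ours, not real product features.

```python
from sqlalchemy import create_engine, inspect

# Placeholder connection string; substitute your own database
engine = create_engine("postgresql://user:password@localhost/registrar")
inspector = inspect(engine)

data_dictionary = []
for table in inspector.get_table_names():
    for column in inspector.get_columns(table):
        data_dictionary.append({
            "table": table,
            "column": column["name"],
            "type": str(column["type"]),
            "nullable": column["nullable"],
            # A generative model could draft this text for steward review;
            # draft_description() is a hypothetical helper, not a real API.
            "description": "",  # e.g., draft_description(table, column["name"])
        })

for entry in data_dictionary:
    print(entry)
```

Lineage is harder, since it lives in ETL scripts, integrations, and BI semantic layers rather than in a single schema, which is exactly why a steward's review of whatever AI drafts still matters.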
What was not so relevant?
We asked specifically about supporting the work of data stewards. So the observation that GenAI can provide visualizations of data and perhaps even flag trends and patterns isn't especially germane.
Now, we did not specify a definition for data steward, but we're pretty sure the prevailing understanding has to do with people who are responsible for the capture, secure storage, maintenance, sharing, and general business use of data. Data analysis is essential, but it sits downstream, often significantly so, of day-to-day data stewardship. Perhaps if we had given our AI engine a bit more context, the response might have been more focused.
What is the jury still out on?
AI can surely categorize large amounts of data into specific buckets, and perhaps this supports the work of granting access, applying security policies and regulations, and even--at some remove--analytics. A simple example might be recognizing that a data element that is confidential in one application also appears in another application or data set, and should probably be flagged as confidential in (or even excised from) that other location.
A more valuable use, possibly, would be recognizing where data in combination might run afoul of privacy or security regulations, and creating some notification or even taking some independent action. What sort of training would AI require in order to do that?
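We can't answer the training question, but the rule-based version of both ideas is easy to sketch. The catalog entries, classification labels, and "risky combination" policy below are entirely made up for illustration; the point is simply that once classifications live in a catalog, checks like these become mechanical, and an AI layer could suggest or apply them at scale.

```python
# Made-up catalog metadata: {system: {column_name: classification}}
catalogs = {
    "student_system": {"ssn": "confidential", "birth_date": "internal",
                       "zip_code": "public", "gpa": "internal"},
    "alumni_datamart": {"ssn": None, "birth_date": None,
                        "zip_code": "public", "gift_total": "internal"},
}

# Combinations of quasi-identifiers our hypothetical policy treats as risky
RISKY_COMBINATIONS = [{"birth_date", "zip_code"}]

def find_classification_gaps(catalogs):
    """Flag columns marked confidential somewhere but not everywhere they appear."""
    confidential = {col for cols in catalogs.values()
                    for col, label in cols.items() if label == "confidential"}
    gaps = []
    for system, cols in catalogs.items():
        for col in confidential:
            if col in cols and cols[col] != "confidential":
                gaps.append((system, col))
    return gaps

def find_risky_combinations(catalogs):
    """Flag data sets containing a risky combination of quasi-identifiers."""
    hits = []
    for system, cols in catalogs.items():
        for combo in RISKY_COMBINATIONS:
            if combo.issubset(cols):
                hits.append((system, sorted(combo)))
    return hits

print(find_classification_gaps(catalogs))   # [('alumni_datamart', 'ssn')]
print(find_risky_combinations(catalogs))    # the birth_date/zip_code pair, in both systems
```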
Our AI response also contained some cautionary language.
To wit: "AI models rely on high-quality data to produce accurate and reliable results. Poor data quality can lead to inaccurate insights and decisions. [Thus,] [d]ata stewards must ensure that data is clean, accurate, and up-to-date to maximize the benefits of AI." And also: "AI-generated content can perpetuate biases present in the training data. Data stewards must be vigilant in monitoring AI outputs and mitigating bias."
It's hard to dispute those statements, but starting out by saying AI can help ensure data quality and then concluding with a warning not to feed AI poor-quality data strikes us as a little circular.
So where do we find ourselves? Data governance is a nonstarter without a cohort of informed and active data stewards. Data stewards are already heavily burdened with data management responsibilities, and too many of them spend too much time answering the same questions again and again, performing repetitive tasks, or trying to resolve what should be simple issues.
Obviously, we think there's room for AI in data governance, and there will be more room in years to come. Right now, you probably need a number of data governance artifacts in place for AI to provide value to consumers and others with questions about data. One challenge for organizations is collecting or creating those artifacts; another is organizing them consistently; still another is using them to support data-informed decision making and data-enabled employees. Our products and our practices are designed to help clients deal with each of these challenges, and we anticipate a future where AI speeds the pace of development, improves and distributes understanding, and supports widespread adoption of data governance principles and data management best practices.
What does that governed data future look like, and when does it arrive? You could ask AI. Or, you could stand next to your dedicated data stewards, roll up your sleeves alongside them, and set about designing it for yourself.
We hope you found this blog post beneficial. To access other resources (blog posts, videos, and recorded webinars) about data governance, feel free to check out our data governance resources page.
IData has a solution, the Data Cookbook, that can aid employees and organizations in their data governance and data intelligence efforts, including data quality. IData also has experts who can assist with data governance, reporting, integration, and other technology services on an as-needed basis. Feel free to contact us and let us know how we can assist.
(Image Credit: StockSnap_5NLKT00MVB_brain_artificialintelligence_BP #B1278)