Your Data Is Not Yet Ready for AI. Is Your Data Governance?

In recent months we have read a number of articles and blog posts arguing that organizations really need to get their data quality in order prior to exposing data to generative artificial intelligence (AI). In some not-too-distant future, AI could be expected to review and analyze data sets of all sizes, and to generate analytic assessments, insights, visualizations, and other artifacts. Theoretically, organizations will be able to train AI to dig deeper and deeper into their data, and to generate better insights from this data.

Poor data quality, in this scenario, could lead to incomplete AI model training. When data sets are fragmented, when they contain inaccuracies or inconsistencies, or when they are otherwise compromised, you could end up with biased models, weak or even contradictory insights, and unreliable automations. Low-quality data shared with AI for analysis could even result in outputs that pose security and compliance risks, potentially leading to regulatory violations or worse.

In our view, this assertion raises a number of questions. Which is not to say it's an erroneous assertion; it's just that a number of the assumptions on which the argument rests could do with a little further exploration.

Questions to Ask before Getting Too Carried Away

First, what exactly do we expect analytics to do, and how do we see AI fitting into that expectation? Will AI build data models? Will it blend data sets? Even with high-quality data, what leads us to think it will not hallucinate in key moments? We might go back even further and ask what exactly we mean when we say analytics, and what it means for AI to generate, or even simply support, analytics programs or frameworks.

Second, when we say poor data quality, what are we referring to? Do we mean inaccurate data? Incomplete data? Inconsistently formatted data? There exists a set of accepted dimensions of data quality, typically organized around supporting the operating needs met by transactional data. It’s far from clear to us that a data quality initiative organized around these traditional dimensions would lead to data sets appropriate for training AI models, or from which AI could generate useful output.
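To make the point concrete, the traditional dimensions (completeness, uniqueness, consistency, and so on) can be measured mechanically against transactional records. The sketch below profiles a tiny, hypothetical data set against three of them; the field names and scoring choices are illustrative assumptions, not a standard, and note how a data set can score well on one dimension while failing another.

```python
from datetime import date

# Hypothetical transactional records; field names are illustrative assumptions.
records = [
    {"id": 1, "amount": 120.50, "state": "NY", "posted": date(2024, 3, 1)},
    {"id": 2, "amount": None, "state": "ny", "posted": date(2024, 3, 2)},
    {"id": 2, "amount": 75.00, "state": "New York", "posted": None},
]

def profile(rows):
    """Score a few traditional data quality dimensions (0.0 to 1.0)."""
    n = len(rows)
    # Completeness: share of rows with no missing values.
    completeness = sum(all(v is not None for v in r.values()) for r in rows) / n
    # Uniqueness: share of distinct ids among all rows.
    uniqueness = len({r["id"] for r in rows}) / n
    # Consistency (simplified): one canonical spelling for the state value.
    consistent_state = len({r["state"] for r in rows if r["state"]}) <= 1
    return {
        "completeness": completeness,
        "uniqueness": uniqueness,
        "consistent_state_codes": consistent_state,
    }

print(profile(records))
```

Even a check this simple surfaces the gap the paragraph describes: a data set can be "clean" by one transactional dimension and still be unusable for model training because another dimension was never measured at all.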

Third, what is wrong with our existing analytics pipelines and processes that we are looking to AI to improve? Many of our clients and prospective clients already have staff who spend time cleaning and reorganizing data sets in order to generate sophisticated analytics products, and what they report to us is that their leaders are unable (or unwilling?) to engage with these products, much less act on data-related insights. While there may be real potential for AI to support, even replace some or all of this work, how is AI going to bridge what we might think of as the "analytics last mile"?

We intend to return to each of these questions in the months ahead. For the time being, we want to suggest a realistic vision for using AI to support your data management and analytics work today, or in the very near future.

AI and the Analytics Framework 

The promise of analytics is based on the fact that organizations accumulate thousands or even millions of transactional records, resulting in a data set with the potential to provide valuable information for organizational improvement, enabling intelligent decisions, informed strategies, and accurate answers to inquiries. In reality, most of us have not one but many data sets. Each of them is potentially a rich source for mining, but it’s when they’re aggregated that the possibility for truly insightful analysis really emerges.

Extracting insight from this data has long required dedicated reporting tools, a deep knowledge of complicated database structures, the ability to integrate or otherwise join data together in a single repository, and of course, the ability to query or otherwise manipulate data into legible summaries. Many organizations have struggled to acquire and use the requisite tools, and they also struggle to hire and hold on to employees with this knowledge and skill set.

Something is needed to navigate the thicket of disparate data sets, of tools with proprietary features, and to fill in gaps where positions are vacant or skills are a mismatch. Is AI that something? What would it look like for AI to fill this role?

Data Quality: Easier Said Than Done?

If we pull back our lens a bit further, we might ask, if data quality is something we can simply achieve if we put our minds to it, why have so many organizations *not* achieved AI-ready data sets? Or, framed another way, what is it about AI that makes data quality mission-critical now, when it hasn’t been for so many years?

It may sound jarring to hear data quality described as not mission-critical, but given the number of organizations we work with that have not invested the time and effort needed to resolve ongoing data quality issues, we're afraid the evidence sort of speaks for itself.

What organizations do have is a lot of employees who know their data well, and can recognize and come up with workarounds when poor data quality affects their work. Modelers and analysts work together to clean data when it’s moved into data warehouses or other repositories. Data stewards and subject matter experts provide feedback when dashboards or other publications contain data that appears erroneous or anomalous, and nearly all our clients sacrifice speedy data turnaround in favor of accuracy.

Are organizations that look like this missing out on time-sensitive insights? Undoubtedly. But the trade-off is that for business as usual, they can usually be confident of an acceptable amount of accuracy and completeness. Let's say, somehow, you wrangle your organization's data into a level of quality that's good enough for AI to work with: what then? How do you maintain this level? How do you know if your data quality is slipping? What does it look like for AI to understand the scope and dimensions of your data in the same way your employees do?
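Knowing whether quality is slipping implies measuring it on every load, not once. One common pattern is to track a quality score per warehouse load and flag any load that falls below a rolling baseline. The sketch below is a minimal version of that idea; the window size and tolerance are illustrative assumptions you would tune to your own pipelines.

```python
from collections import deque

class QualityMonitor:
    """Flag loads whose quality score slips below a rolling baseline."""

    def __init__(self, window=5, tolerance=0.05):
        self.history = deque(maxlen=window)  # most recent scores
        self.tolerance = tolerance           # allowed dip below baseline

    def record(self, score):
        """Record one load's score; return True if it signals slippage."""
        if self.history:
            baseline = sum(self.history) / len(self.history)
        else:
            baseline = score  # first load sets its own baseline
        self.history.append(score)
        return score < baseline - self.tolerance

monitor = QualityMonitor()
for score in [0.98, 0.97, 0.98, 0.89]:
    if monitor.record(score):
        print(f"quality slipping: {score:.2f}")
```

The point is not this particular formula; it is that "maintaining" quality is an ongoing measurement problem, and the institutional knowledge your employees apply informally has to be made explicit before any system, AI included, can apply it for you.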

How Do We Know What Isn't So?

Your organization may or may not employ data scientists, but it almost certainly employs a number of people who spend at least some of their time analyzing data and sharing their analyses. In the best-case scenario, this work is actionable, meaning leaders and decision-makers draw conclusions from analytics and take actions based on those conclusions. In such organizations, the most pressing issue may be speed or volume: they want more insights delivered more quickly! And maybe AI is exactly the partner needed here.

More common, we suspect, is a situation where for any number of reasons analytics products and other data outputs do not regularly or meaningfully contribute to decisions or action. If one of the reasons for this has to do with a lack of resources, then it’s easy to think of ways AI could supplement, if not entirely fill in the gaps. But if the root cause is cultural, such as a low level of data literacy, or a leadership team that doesn’t trust the data it receives, then the ways AI could contribute probably need to be elaborated more clearly, and the expectations around those contributions might well need to be managed much more carefully.

It All Comes Back to Data Governance (or Does It?)

To recap, then. There is undoubtedly some threshold for data quality below which your everyday operations are compromised. And there is probably some threshold for data quality for reporting and analysis above which any additional cleaning or remediation efforts result in a marginal gain, at best. Most organizations probably find themselves somewhere between these two thresholds: managing data quality issues well enough to keep the lights on, so to speak, but probably spending a significant amount of staff time and energy wrangling data into just barely acceptable condition for some level of analytics and BI.

Data quality is, of course, just one spoke of the data management wheel, while data governance is more like the hub. And, from where we sit, we observe that the failure to fully and consistently govern data over the years continues to haunt organizations.

  • Without some level of curation around data products and assets, it’s difficult for data consumers to make sense of dashboards and analyses, even ones that purport to assess key performance indicators (KPIs) and other important metrics, and even when the accuracy of the data is beyond doubt. This often originates from a lack of vision about analytics beyond "it'll help." A curated data environment won't in and of itself cure analytics woes, but it will help you direct attention and resources more effectively.
  • Without data quality standards, or an active process to report, investigate, and resolve data quality problems, it’s already difficult for organizations to achieve consistency in reported data, or to build trust in data warehouses and BI tools. Having these standards and processes in place doesn't fix errors or changes in direction that occurred in the past, but it does set a foundation for trust in reliable data going forward.
  • Without a shared, agreed-on data terminology in place, it’s difficult for decision-makers to formulate good data requests, it’s difficult for data providers to create products that meet managerial needs, and it's difficult for insights to get discovered or be acted on. A business glossary won't instantly resolve data communication or interpretation issues, but it's a sizable step in the right direction, especially if your business glossary encompasses quality practices related to terminology.

It takes a very positive outlook (and, probably, an unrealistic amount of hope) to think that AI will somehow wipe away all this “governance debt.” Similarly, the idea that AI will provide real value soon, if ever, without significant progress in each of these areas is wishful, maybe even magical thinking. 

At IData, we're here to help. As part of our data governance roadmap service, we help organizations think more clearly about what their analytics objectives are, and whether their expectations, including for the use of AI, are suitable and realistic. When those objectives are agreed on and clearly established, our roadmap helps identify ways in which data quality problems compromise existing analytics efforts, and what needs to be done to make data sets suitable for AI-supported analytics. And our roadmap, in conjunction with the Data Cookbook, helps develop the practices, standards, skills, documentation, and other artifacts that are essential for the modern data-enabled enterprise.  Ultimately, getting your data shipshape for AI means performing data governance in the age of AI.

We hope you found this blog post beneficial. To access other resources (blog posts, videos, and recorded webinars) about data governance and data intelligence, feel free to check out our data governance resources page.

Feel free to contact us and let us know how we can assist.



Aaron Walker
About the Author

Aaron joined IData in 2014 after over 20 years in higher education, including more than 15 years providing analytics and decision support services. Aaron’s role at IData includes establishing data governance, training data stewards, and improving business intelligence solutions.
