Recently, while working with a client, we concluded after much discussion that the right name for the key problems they were facing was "data friction." In the modern enterprise, data needs to flow smoothly so that it gets to the right users at the right time, in a usable format. But there are any number of places where things rub against data, so to speak, preventing its efficient flow and timely use.
The phrase came to mind just recently, but these issues are things we've encountered with many of our clients over the years. Now, you might prefer to call this phenomenon a "data bottleneck," and in some instances we'd even consider "data inertia" an excellent description! No matter what you call it, however, this kind of friction usually isn't good for data. Well, maybe a certain amount is, if applied correctly, since the unfettered flow of data carries risks as well.
What we see is data collected without real thought given to its usability or eventual resting places, or we see missed opportunities to collect data that could have real value. We see users jumping through hoops just to discover which data sets exist before they can request access, and we see unnecessarily elaborate mechanisms for granting this access. We see a patchwork of integrations, too few of them automated, and far too many of them undocumented and fragile. We see reports, analyses, dashboards, and similar data products taking too long to produce and deliver. When delivered, we see that they frequently do not include important data points, or they present the data in ways that discourage understanding.
We have written in this space before about the importance of understanding the data lifecycle, and the necessity of applying different governance practices and principles at different moments. Today we want to revisit this topic, but this time looking at it through this lens of data friction. What causes data friction, and how can it be avoided or treated?
When collecting data, questions about its provenance and relevance are (or should be) at the fore, and real business acumen must be paired with technical decisions. Data collection involves recognizing which data will be useful, how to collect it, what level of scrutiny it requires upon collection, where the best original storage location is, and so on.
Where friction occurs: Data stewards are generally the deciders here, as they have the best knowledge of which data is necessary for their domain to perform their work. However, these data stewards need to take an enterprise perspective, understanding that other offices and users will interact with this data at some point. Too often, one office lacks insight into the data another office collects, and ends up creating a duplicate collection. We also see changes in collection methods when a change in leadership occurs, often because the outgoing manager failed to document why they did things the way they did.
How to avoid or reduce this friction: a data catalog, such as our Data Cookbook, allows you to create a central repository that describes, at a high level, which systems and applications are in use, and what kinds of data each of them utilizes. An organizational data strategy that specifies the basic parameters of data collection and usage also provides clarity in this area.
As data is stored, maintained, and enriched, it tends to travel across systems, and responsibility for that data may shift; in this stage of the lifecycle, ensuring the completeness, integrity, and validity of data is a top priority. Integrations, feeds, and other data loads are usually the key mechanisms here, although it's not uncommon for data to remain at rest in its originating system but for responsibility to change.
Where friction occurs: integrating data between systems is possibly the greatest source of data friction. In many cases, it's an onerous and labor-intensive task. Even with modern integration tools, mapping data from one system to another is challenging (especially when date formats and reference codes differ), and heaven forbid you need to update this mapping! We might also include our old friend data ROT (redundancy, obsolescence, triviality) in this area, as this ROT interferes with the maintenance and accessibility of useful and relevant data.
How to reduce friction: we assume, vendor claims to the contrary notwithstanding, that there is no one-size-fits-all tool to manage your disparate integration needs. All other things being equal, a powerful integration tool that automates, checks for errors, sends notifications, etc., is a great place to start. We recommend documenting data flows, integrations, and data sharing agreements from both the functional and technical sides. It's not simply what gets pushed (or transferred, or copied), but why, and how often, and what to do when you want to make changes, that needs to be widely understood.
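To make the mapping problem concrete, here is a minimal sketch of a documented, self-checking mapping between two systems. Everything in it is an assumption for illustration: the field names, the date format, the status codes, and the `map_record` helper are hypothetical and do not describe any particular integration tool. The point is simply that the "why" and "how often" can live alongside the mapping itself, and that bad dates or unknown reference codes get reported rather than silently passed along.

```python
from datetime import datetime

# Hypothetical mapping specification. It records not only what is moved,
# but why and how often, so the functional and technical sides share one source.
MAPPING = {
    "purpose": "Feed enrollment records to the reporting system",
    "schedule": "nightly",
    "source_date_format": "%m/%d/%Y",
    "status_codes": {"A": "ACTIVE", "I": "INACTIVE"},
}


def map_record(record, mapping=MAPPING):
    """Translate one source record; return (mapped_record, errors)."""
    errors = []
    mapped = {"id": record.get("id")}

    # Normalize the source date format to ISO 8601 for the target system.
    try:
        parsed = datetime.strptime(record["start_date"], mapping["source_date_format"])
        mapped["start_date"] = parsed.date().isoformat()
    except (KeyError, TypeError, ValueError):
        errors.append(f"bad start_date: {record.get('start_date')!r}")

    # Translate reference codes; unknown codes are reported, not guessed.
    code = record.get("status")
    if code in mapping["status_codes"]:
        mapped["status"] = mapping["status_codes"][code]
    else:
        errors.append(f"unknown status code: {code!r}")

    return mapped, errors


if __name__ == "__main__":
    print(map_record({"id": 1, "start_date": "09/01/2023", "status": "A"}))
    print(map_record({"id": 2, "start_date": "2023-09-01", "status": "X"}))
```

In a real integration the error list would presumably feed a notification or a review queue; here it is just returned so the caller can decide what to do.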
In most data lifecycle models, storage/maintenance precedes analysis and sharing/publication, but of course data can be extracted any time for analysis, aggregation, etc. No matter when these operations occur, the outcomes are best when the data in question is reliable and trustworthy. Typically, that means some level of curation--defining, classifying, organizing, or otherwise reviewing and preparing--prior to extraction. In certain circumstances, some downstream certification or validation is recommended, to make sure that manipulations or recharacterizations in the logical, semantic, or presentation layer don't irrevocably alter the meaning or reliability of the data.
Where friction occurs: much of the time, friction occurs because of a failure of communication. Frequently, terminology differs from office to office, or unit to unit. Data requesters and consumers have a different relationship to and understanding of data than data providers and analysts, and so they ask different questions of each other as well as of the data. Friction can also occur if it is difficult or time-consuming to deliver analytics, visualizations, or other data products. Increasingly we see friction introduced into the BI realm as a result of integration failures: valuable data is not available inside the reporting data set, so extraordinary (and almost always slow) measures have to be taken to include that information in the final output.
How to reduce friction: a key feature of the Data Cookbook is a business glossary that is tied tightly to a data processing catalog, which allows your data teams to agree on terminology and to identify which reports, dashboards, integrations, ETL transactions, etc., include data elements represented by those terms. We've seen a lot of ETL documentation over the years, and it's often far too technical to be used by most analysts, let alone consumers, and it tends not to be widely available when it would be most useful (e.g., when creating a management dashboard). Even without the Data Cookbook, robust specifications for data products go a long way towards improving understanding, and consumer-oriented semantic layers and dashboard annotations provide useful data lubrication (?).
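The idea of tying a glossary to a catalog of data products can be illustrated with a very small sketch. This is not how the Data Cookbook is implemented or accessed; the terms, definitions, product names, and helper functions below are all hypothetical, chosen only to show the two questions such a link lets you answer: which products use a given term, and which terms are used somewhere but never defined.

```python
# Hypothetical business glossary: agreed term -> agreed definition.
GLOSSARY = {
    "first-time student": "A student with no prior enrollment at any institution.",
    "full-time": "Enrolled in 12 or more credit hours in the term.",
}

# Hypothetical catalog of data products and the glossary terms each relies on.
PRODUCTS = {
    "Enrollment dashboard": ["first-time student", "full-time"],
    "Retention report": ["first-time student", "retained student"],
    "Nightly enrollment feed": ["full-time"],
}


def products_using(term, products=PRODUCTS):
    """List the data products that include data represented by a term."""
    return sorted(name for name, terms in products.items() if term in terms)


def undefined_terms(glossary=GLOSSARY, products=PRODUCTS):
    """List terms that products rely on but the glossary does not define."""
    used = {term for terms in products.values() for term in terms}
    return sorted(used - set(glossary))


if __name__ == "__main__":
    print(products_using("first-time student"))
    print(undefined_terms())
```

The second function is the one that tends to surface communication failures: a term that appears in a report but has no agreed definition is a likely source of friction between requesters and providers.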
At some point, data must be actively retained, or archived in some fashion, or even disposed of. This stage of the data lifecycle is no less complex than any other, although it does seem to attract the least attention--that is, until something goes wrong, and needed data cannot be found, or data that should be deleted persists.
Where friction occurs: Most organizations are bound by at least one set of data protection regulations, which guide the ultimate disposition of certain data records. But complying with these regulations is not always routine, and without repeatable and well-documented procedures in place, you run the risk of edging out of compliance (at best).
How to reduce friction: For data not explicitly covered by some regime, we recommend a set of internal data policies or guidelines identifying whether data should be retained for continued use, archived in case of future need, or destroyed. And, of course, a definition of each of these outcomes, along with clear instructions for what it means to achieve them, is essential as well. For example, if you periodically freeze data in order to perform point-in-time reporting, is that considered archiving? Another key recommendation is to know which data falls into which category well before the moment of disposition arrives. Classifying data when it is collected and stored, and propagating those classifications for as long as you hold the data, makes it much easier to recognize which bucket data falls in at the end of the lifecycle.
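As a rough sketch of why classifying at collection pays off later, consider a policy table keyed by classification. The classifications, retention periods, and the `disposition` helper are assumptions made up for this example, not recommendations or regulatory guidance. Once every record carries its classification, deciding what to do with it at the end of the lifecycle becomes a lookup rather than an investigation.

```python
from datetime import date

# Hypothetical internal policy: classification -> (retention years, action at expiry).
POLICY = {
    "student_record": (7, "archive"),
    "web_log": (1, "destroy"),
}


def disposition(classification, collected_on, today, policy=POLICY):
    """Return 'retain', 'archive', 'destroy', or 'review' for one data set."""
    if classification not in policy:
        # Unclassified data cannot be disposed of safely; flag it for a person.
        return "review"
    years, action = policy[classification]
    try:
        expires = collected_on.replace(year=collected_on.year + years)
    except ValueError:
        # Collected on Feb 29 and the expiry year is not a leap year.
        expires = collected_on.replace(year=collected_on.year + years, day=28)
    return action if today >= expires else "retain"


if __name__ == "__main__":
    today = date(2022, 1, 1)
    print(disposition("web_log", date(2020, 1, 1), today))
    print(disposition("student_record", date(2020, 1, 1), today))
    print(disposition("unclassified_export", date(2020, 1, 1), today))
```

Note that the unclassified case returns "review" rather than a default action, which mirrors the recommendation above: the costly situation is arriving at disposition without knowing the category.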
In a vacuum, erring on the side of collecting too much data is probably better than collecting too little, and sharing too much is probably better than sharing too little. The disadvantages to data friction almost certainly outweigh the occasional benefit. Still, recognizing that data is a strategic asset, and treating it as such, should mean that you have a plan for it: how and when to acquire, what purposes it will be put toward, how long and where to hold onto it, and what to do with it when you're finished with it.
A process whereby data stakeholders are identified and involved from the beginning, where they can work collaboratively and transparently to access, share, and effectively utilize data, will go a long way towards eliminating or reducing data friction. The right tools, such as our Data Cookbook and our professional data governance services, can surely help.