Everyone knows shadow systems are bad, and yet they still proliferate. Rather than enumerate once again why you should take pains to avoid them, we want to spend some time in this post exploring why they continue to exist, what problems they solve, and how improved data governance (data intelligence) might be a more permanent solution than whatever it is you’re doing now. In this post, when we say “shadow system” we mean a database or data set created and maintained outside of your systems of record, often without the knowledge of data trustees or stewards. Sometimes these are called “shadow databases,” but since that phrase also refers to an exact copy of a production database we will use “shadow system” here.
Shadow systems commonly show up as a spreadsheet, or series of spreadsheets, used to generate lists or track activities, and often the data in these spreadsheets duplicates, or purports to duplicate, data stored in your ERP or other key information systems. Sometimes these shadow systems use a classic database structure, often built in Microsoft Access, that reproduces (or even stores) a subset of institutional data. Other times they’re data sets used for reporting. Occasionally they’re lists written in a word processor or text editor!
Let’s look at some common scenarios where shadow systems emerge.
Scenario One: I’m a casual or occasional user of our organization’s very comprehensive and complicated ERP. Extracting data from an ERP can be a challenge, and it is a time-consuming task, since I have to submit a request for the data. Once I’ve got the list of records I need, it is easier and more convenient to work from my list for future analysis, despite its age or other limitations, than to request a fresh data set. If I use this list to generate mail, maybe I track updates here – again, it’s simpler to make changes to my spreadsheet, a tool I use daily, than to try to navigate changing an address in the official system (assuming I even have access and training to perform this work).
Scenario Two: I’m an administrator or manager who has an idea, but for one reason or another I don’t know whether the data I need even exists, or whom to ask if it does, or how to get access to it. So I may begin my own data-gathering efforts and record that data outside of my institution’s systems of record.
Scenario Three: I’m a data analyst or other kind of power user, and I’m in a hurry, or I’ve got a request from above that our existing toolset doesn’t seem capable of solving. Our business intelligence (BI) tool doesn’t make graphs and charts as easily as this desktop application or piece of open-source software I have experience with. Or the data that comes from our central systems needs to be reformatted, massaged, or otherwise altered prior to use or presentation. Or I need to compare current period statistics to a previous period and it’s a quick shortcut to use the data set I requested last time, rather than go through the proper channels to construct a new one, even if errors have been removed or clean-up has occurred. Expediency is my top need, even though no one else can reconstruct or confirm the numbers I report.
So why are shadow systems problematic? There are several reasons. A major one is security, since they are generally stored unencrypted on a user’s workstation. A second critical failing is data quality: data extracted from a transactional system into a shadow data source (or, worse, entered by hand) will not have been validated, and the longer it sits in the shadow system the more likely it is to be out of date. In some cases, these shadow systems become replacements for systems of record, and users try to keep them up to date. Consistency tends to fade in these situations, and more critical data that ought to be updated in the system of record may not be updated. A third compelling concern is that shadow systems cause operating inefficiencies and lost productivity, because they represent a duplication of effort. Worse, that duplication is often a recipe for inconsistent, inaccurate, or incomplete data, and potentially even conflict (what one of our colleagues called “data brawling”).
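To make the data quality concern concrete, here is a minimal sketch of how a shadow copy can drift from a system of record. Everything in it is an assumption for illustration: the record IDs, names, addresses, and the `find_drift` helper are hypothetical, and a real comparison would read from your actual ERP extract and spreadsheet rather than from inline dictionaries.

```python
from typing import Dict, List

Record = Dict[str, str]


def find_drift(
    system_of_record: Dict[str, Record],
    shadow_copy: Dict[str, Record],
) -> Dict[str, List[str]]:
    """Compare a shadow extract with the system of record, keyed by record ID."""
    report: Dict[str, List[str]] = {
        "missing_from_shadow": [],  # added officially after the extract was taken
        "unknown_to_record": [],    # entered by hand, never validated
        "conflicting": [],          # same ID, different values
    }
    for record_id, official in system_of_record.items():
        local = shadow_copy.get(record_id)
        if local is None:
            report["missing_from_shadow"].append(record_id)
        elif local != official:
            report["conflicting"].append(record_id)
    for record_id in shadow_copy:
        if record_id not in system_of_record:
            report["unknown_to_record"].append(record_id)
    return report


# Hypothetical sample data, invented for this sketch.
system_of_record = {
    "1001": {"name": "A. Rivera", "address": "12 Elm St"},
    "1002": {"name": "B. Chen", "address": "9 Oak Ave"},
    "1003": {"name": "C. Patel", "address": "4 Pine Rd"},
}
shadow_copy = {
    "1001": {"name": "A. Rivera", "address": "12 Elm St"},
    "1002": {"name": "B. Chen", "address": "77 Lake Dr"},    # edited locally only
    "1004": {"name": "D. Okafor", "address": "3 Birch Ln"},  # hand-entered
}

drift = find_drift(system_of_record, shadow_copy)
for category, ids in drift.items():
    print(category, ids)
```

Even in this tiny example, three of the four records involved disagree in some way, and nothing in the spreadsheet itself would tell its owner which version is right.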
So how and why do people resort to shadow systems? In our experience, the primary reason is that users work more efficiently using their shadow systems.
One, they are often easy to use, or at least users have a comfort level with them. Spreadsheets are ubiquitous, and many people have at least a passing familiarity with them. Desktop database tools such as Access and FileMaker have been in existence for decades, and savvy users have figured out that they are customizable and portable. Contrast this with legacy ERP applications and complicated reporting and BI tools, which offer more features and security but can be intimidating to use and difficult to learn.
Two, shadow systems provide us with information right away. Some of this is because they’re easy to use, particularly if we’ve customized them to our personal preferences. Some of this is because they’re on our desktop or even removable storage (!), and we don’t have to connect to a remote system or go through another person to get the information we’re after.
Three, the work is often repeatable. That is, we do the same thing every time, and generally with fewer steps than we’d have to go through using organizationally sanctioned tools. In the case of shadow analytics, I may have developed a data frame (if I’m using a statistics package) or set of custom variables (in Tableau, for example) that I would have to recreate from scratch every time I load new data.
We can see the appeal of shadow systems to casual users and savvy analysts alike. Traditional practice has been to try to stamp these systems out, either via desktop control or simply loudly declared policy, and that practice is often not successful.
As we mentioned in previous posts, the goal of data governance ought to be to help people do their work more effectively, which is the same goal people have when they resort to shadow systems! Once we see how users employ shadow systems, then we can think about how we might enable them to accomplish their work without these insecure, inferior workarounds.
Some thoughts:
- If the issue is a lack of access to necessary data, or a lack of knowledge about what data exists and/or how to get to it, these are exactly the types of situations that functioning data governance would address. Do you have a clear path for a person to request data, or access to data using your existing tools? Do you have an environment where data is vetted and made available widely, or do you have an environment characterized by silos and fiefdoms? This latter model, while it served some purposes in the past, is not representative of modern data stewardship.
- Some shadow systems exist because the systems in the light don’t meet our needs. For example, a shadow system captures information that is otherwise not being captured. Perhaps there are other tools available to do this work, in which case it’s probably a good goal to transition to one of those tools, especially if you already have access to it as an organization. However, if this is not an option, then these shadow systems need to become part of your data system inventory, where they can be subject to regular review and proper oversight. If nothing else, data entry standards, rules for access and sharing, and protocols for backing up data can be applied to these systems the same way they’re applied to other campus systems.
- Advances in self-service reporting within an enterprise reporting tool like Cognos or Business Objects have probably already helped limit the proliferation of aging spreadsheets, but it’s not enough to simply make these tools available with cursory training. Users need local training and support to use them easily. You have many options: provide simple canned reports for casual users; update files with the frequency that users need; develop a cadre of business analysts, report writers, and other data curators whose job includes providing timely and caring assistance for consumers and other less savvy users; and so on.
To sum up: users need to know what data is being collected and where it is stored. If they don’t have access to it, they need to know who is responsible for this data, and how to request access to it. Access must be provided in a variety of ways and formats, recognizing the multiple ways data can be put to use. Acceptable use, presentation, and sharing of data must be decided on if not democratically then at least transparently, and these guidelines need to be communicated widely and clearly.
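The suggestion above to bring unavoidable shadow systems into your data system inventory can be sketched in a few lines. This is only an illustration under stated assumptions: the `InventoryEntry` fields, the 180-day review interval, and the sample entries are all hypothetical, and your own inventory would track whatever attributes your data stewards agree on.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class InventoryEntry:
    """One shadow system recorded in the data system inventory (hypothetical fields)."""
    name: str
    owner: str
    storage: str
    has_backup: bool
    last_reviewed: date


def needs_attention(
    entries: List[InventoryEntry],
    today: date,
    review_interval: timedelta = timedelta(days=180),  # assumed policy, not a standard
) -> List[str]:
    """Return names of entries that lack a backup or are overdue for review."""
    return [
        entry.name
        for entry in entries
        if not entry.has_backup or today - entry.last_reviewed > review_interval
    ]


# Hypothetical entries, invented for this sketch.
inventory = [
    InventoryEntry("Mailing list workbook", "Events office", "shared drive", True, date(2024, 3, 1)),
    InventoryEntry("Outreach tracker (Access)", "Admissions", "workstation", True, date(2023, 9, 1)),
    InventoryEntry("Survey results sheet", "Research office", "removable drive", False, date(2024, 5, 1)),
]

flagged = needs_attention(inventory, today=date(2024, 6, 1))
print(flagged)
```

The point is less the code than the habit: once a shadow system has a named owner, a known location, and a review date, it can be held to the same standards as any other system.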
What we’re describing here is data governance: good, flexible, open processes, managed by and responsive to people, making data available for decision support and organizational management. Shadow systems exist by and large because people can’t get the data they need when they need it, in a form they can use. Better technology, whether it’s the current stack or the tools you’re evaluating, will only do as much to eradicate shadow systems as these other data governance pillars allow. Remember: shadow systems are a symptom, and efforts to wipe them out will not succeed if you don’t resolve the underlying problems that motivated users to create them.
Also feel free to review our other data system inventory resources located at this blog post.
IData has a solution, the Data Cookbook, that can aid employees and the organization in their data governance and data quality initiatives. IData also has experts who can assist with data governance, reporting, integration, and other technology services on an as-needed basis. Feel free to contact us and let us know how we can assist.
Let us know about your thoughts on shadow systems.