On the Trials and Tribulations of Major Data Projects

Clients often come to us for data governance assistance after they have migrated to a major new software solution, or after they have deployed significant upgrades to their BI stack (a new warehouse, an upgraded set of analytics tools, a lake to capture unstructured data, what have you). And although it is never too late to move forward with data governance, or to build out a data catalog, in these situations it's almost always too late to take advantage of all the "stealth" data governance (data intelligence) work you do when planning for and executing a major data or technology project.

If you're planning - or even thinking about - a significant system upgrade, or swapping out one or more key systems, or reconfiguring your analytics tool set, we recommend you keep the following challenges, opportunities, and risks in mind, and that you act accordingly. Even if you have a robust data governance framework in place now, any chance to improve it seems worth reaching for!

Challenges

There are lots of challenges facing any organization looking to migrate data, or replace key applications, and including meaningful data governance can seem like a bridge too far.

As with any large data or technology project or business effort, you can have too many moving parts to keep track of. People are trying to learn a new tool that they're not yet using, plus they continue to process data in the old system, plus they're still signing off on access requests and reviewing data products. So you've already asked them to bulk up on their workload, and now you want to demand even more of their time?

Logistically, you can often get locked into go-live dates that are unrealistic, and you can easily use up a consulting or training budget prior to achieving the level of readiness you'd like. Whenever you're scrambling to meet a deadline, you're likely to make hasty decisions, to cut corners, to say "we'll get to that later" and then never really get to it, and so on.

You will often run into a new but not that different version of the "black box" problem. Most functional leads have a pretty limited understanding of the data architecture of the tools they use now, and so they're not really in a position to provide much input into data mapping decisions, which often must be made months in advance. There's a lot of training in the new tools with sample data, and often there's not enough training in the actual data your team uses. Even if the mapping is simple and the new tool has an excellent interface, the data will look different and it will seem to behave differently in the new system.

New bells and whistles can divert attention from critical foundations. A lot of legacy data applications had basic and sparse reporting capabilities. Mainly they allowed users to generate exports, or to provide minimal responses to prompts. But a lot of contemporary applications have pretty flashy in-application analytics. This is of course a welcome development, especially for your users who can take advantage of this feature and who can generate some selection of data products independently. But many cautions are advised! When users spend time learning new techniques without properly vetting the underlying data, or ensuring that data capture going forward will consistently adhere to quality standards, this effort could well be wasteful and even counter-productive.

The best way for users to gain immediate trust in a new system, no matter how bitterly they complained about the old one, is for critical reports and dashboards to be available from the minute you go live, and for those reports and dashboards to show the same figures as what they used to show, assuming you have done the work to vet those standard products and confirm their accuracy.
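One way to make "the same figures" checkable is to reconcile a handful of headline numbers from the legacy report against the same numbers from the new system before go-live. The sketch below is only an illustration: the metric names, the figures, and the `reconcile` helper are all hypothetical, and it assumes you can export each report's key totals as simple name/value pairs.

```python
from __future__ import annotations

from decimal import Decimal


def reconcile(legacy: dict[str, Decimal],
              new: dict[str, Decimal],
              tolerance: Decimal = Decimal("0")) -> list[tuple[str, Decimal | None, Decimal | None]]:
    """Return (metric, legacy_value, new_value) for every metric that
    is missing on one side or differs by more than the tolerance."""
    mismatches = []
    for metric in sorted(legacy.keys() | new.keys()):
        old_val = legacy.get(metric)
        new_val = new.get(metric)
        if old_val is None or new_val is None or abs(old_val - new_val) > tolerance:
            mismatches.append((metric, old_val, new_val))
    return mismatches


# Hypothetical figures exported from the same report in each system.
legacy_report = {
    "enrolled_students": Decimal("4210"),
    "total_revenue": Decimal("1250000.00"),
}
new_report = {
    "enrolled_students": Decimal("4210"),
    "total_revenue": Decimal("1249100.00"),
}

mismatches = reconcile(legacy_report, new_report)
for metric, old_val, new_val in mismatches:
    print(f"{metric}: legacy={old_val} new={new_val}")
```

An empty result does not prove the new report is right, only that it agrees with the old one, which is why vetting the legacy figures first still matters.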

So what happens? Well, you can find yourself doing duplicate data entry because for whatever reason you can't complete the cutover. You can find yourself scrambling to move data into a queryable structure, which is a situation where things can fall through the cracks, or where you find yourself relying on standard products that don't fully address the way your organization uses data. We've seen clients take several steps backwards on the analytics front, generating massive exports and putting them through laborious manipulation to provide what previously had been routine, if not simple. We've seen clients bring in additional consultants from other firms to focus on aspects of the work that the original consultants didn't consider important or have the right expertise to handle.

Opportunities
There's a lot that can - and will - go wrong. But, at the same time, there are serious opportunities. Maybe not once-in-a-lifetime opportunities, but once-in-a-decade, or at the very least opportunities you're not likely to see again in your career at this organization.

For one thing, you tend to get the undivided attention of key data stakeholders, and at least the divided attention of the people who manage those stakeholders. They may never see this level of collaboration again. Now is the time to have those discussions about policy and standards, about who is capturing what kind of data and where, and how that might affect or be affected by this new tool or set of tools.

As you look for leadership and good examples for data governance after the transition, your star data stewards and analysts are likely to shine brightest. They understand the importance of the work, they identify problems and they test solutions earlier than anyone else, their advanced data literacy capabilities help them work better with (good) consultants, and so on. This is a chance to cultivate their skills and to recruit them as strategic partners, which you'll need when data problems arise (as they inevitably do).

Depending on the scope of your technology project, this will likely be a rare opportunity to examine all your data. Where does it live, how old is it, what is its quality profile, where and how is it reported/analyzed/shared, etc. In many cases, this is tribal knowledge at best: why this set of codes and not another? is this a duplicate data set or some enriched extract? who uses this report, for what, and how frequently? are they happy with it? do they really understand it? can they explain it? does this data product count the same elements as that product, or do they just use the same label for different concepts? You may never again have this collection of knowledgeable people working this closely together for this extended a period. Not doing something to capture this knowledge and make it available strikes us as data malpractice.

Risks
Obviously, you can go over budget, or have a delayed or even failed implementation. That's true of any technology project or business process change. But we see some specific data-related risks with these migrations.

One major risk that far too often becomes a reality is that you fall into the same poor habits and unsustainable workarounds that caused so much frustration with the last system/architecture. Even if you do some major data cleanup, without policies, standards, safeguards, error checking and issue reporting, and visibility into the whole process, you can very easily get back into the same data quality hole. If the manner in which you provide data upon request or build data products is poorly designed, or easy to circumvent, you can very easily find yourself facing the same shadow analytics you fell into before.

A related risk is that you'll fail to realize the opportunity the work presents. This technology project is a chance to get everyone back on the same page with respect to recognizing and valuing data as a shared asset. Even if you address poor data management habits and wipe out those stopgap procedures that became standard, if you "flip the switch" and people continue to have difficulty getting the data they seek, or if the results seem inconsistent and untrustworthy, people will be dissatisfied with the outcome and regretful about the work they'd put in. You can easily find yourself having thrown good money after bad, so to speak.

Data quality issues don't get addressed sustainably. Cleaning up historical data isn't easy--if it were easy, you'd already have done it, right? Generally some level of cleansing is unavoidable, just in order to make all the data you're migrating conform to the requirements of the new system. But identifying and handling data ROT (data that is redundant, obsolete, and/or trivial), especially across your entire organization, is quite an undertaking, and asking people to do it at this juncture also seems like piling on! So, low-hanging fruit gets plucked, and some cosmetic changes get made, but the root causes do not get dealt with.

Anyone who works in or around this field will hear technical debt mentioned frequently, and you've undoubtedly got some in your organization. For example, old integrations that need to be modernized and automated, old servers that need to be upgraded and/or decommissioned, aging architectures around many aspects of your technology stack that have to be updated or replaced, etc. But you've also got data debt: old, often out-of-date records that should be purged or archived, historical grants and privileges to users who aren't around any more, or to roles that need to be re-evaluated; all manner of legacy reporting that at best is just fallow, but at worst is still used to drive operations and decisions despite being unreliable or inapposite. If you don't use a data migration project to address this data debt, when are you going to?
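Some of this data debt can at least be surfaced with very little tooling. As a rough sketch, the snippet below flags access grants held by people who are no longer active, or that have gone unused past a threshold. Everything here is assumed for illustration: the grant records, the `stale_grants` helper, the field names, and the 365-day cutoff. Your own systems will expose this information differently, and the flagged list is a starting point for a steward's review, not a purge list.

```python
from datetime import date


def stale_grants(grants, active_users, as_of, max_idle_days=365):
    """Return (user, resource, reason) for each grant that deserves review."""
    flagged = []
    for grant in grants:
        if grant["user"] not in active_users:
            flagged.append((grant["user"], grant["resource"], "user no longer active"))
        elif (as_of - grant["last_used"]).days > max_idle_days:
            flagged.append((grant["user"], grant["resource"], "unused beyond threshold"))
    return flagged


# Hypothetical grant records and roster.
grants = [
    {"user": "alice", "resource": "finance_reports", "last_used": date(2024, 5, 1)},
    {"user": "bob", "resource": "hr_extract", "last_used": date(2023, 11, 15)},
    {"user": "carol", "resource": "legacy_dashboard", "last_used": date(2022, 1, 10)},
]
active_users = {"alice", "carol"}

flagged = stale_grants(grants, active_users, as_of=date(2024, 6, 1))
for user, resource, reason in flagged:
    print(f"{user} -> {resource}: {reason}")
```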

Useful Artifacts
Unless you're in a pickle and have to do this work yourself, you'll almost always be working with a vendor or consultant to manage this transition. They bring product expertise, training resources, and project coordination--but, they have other clients, and their goals may not be 100% aligned with yours.

These consultants generally do a lot of discovery to figure out the scope of the work they're going to help with, and that discovery manifests in various artifacts: business process analyses, hardware and software profiles, data quality investigation, an inventory of existing and also nonexistent analytics outputs, and of course many, many mapping documents to facilitate the movement of data from the old system to the new ones. One of the most important decisions you'll make has to do with how much data you're going to migrate to the new system: how far back will you go, how broadly will you cast the data net, which reference data will get ported over as is and which ones will be reconfigured.

Most of the time, these artifacts - while digital - end up as the electronic equivalent of paper files: hard to find, poorly indexed (if at all), generally unavailable to answer questions in a timely fashion.

So the question naturally follows: how to make all this research something of enduring value for your organization, not only now but into the future?

You will have to document, at least to some extent, which data gets archived and where the archives are. You will create crosswalks between old data structures and new ones, and you will establish parameters about what new data you'll start to track and what old data you'll no longer record. How will you make this documentation accessible both during and after implementation? Do your implementation partners have a plan to make this business process review an ongoing effort, or do they expect to close the books and walk away when the contract is up?
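A crosswalk is easier to keep alive when it exists as data rather than as a tab in a spreadsheet nobody can find. The sketch below shows one possible shape, assuming a simple code-to-code mapping in which `None` marks a legacy value you have decided to retire. The codes, the `status` field, and the `apply_crosswalk` helper are invented for the example; real mappings are usually messier (one-to-many, conditional, and so on).

```python
# Hypothetical crosswalk: legacy code -> new code.
# None marks a legacy code that will no longer be recorded.
CROSSWALK = {"FT": "FULL_TIME", "PT": "PART_TIME", "TMP": None}


def apply_crosswalk(records, field, crosswalk):
    """Split records into migrated, retired, and unmapped groups."""
    migrated, retired, unmapped = [], [], []
    for record in records:
        code = record[field]
        if code not in crosswalk:
            unmapped.append(record)   # nobody has made a decision about this code yet
        elif crosswalk[code] is None:
            retired.append(record)    # documented decision: archive, do not migrate
        else:
            migrated.append({**record, field: crosswalk[code]})
    return migrated, retired, unmapped


records = [
    {"id": 1, "status": "FT"},
    {"id": 2, "status": "TMP"},
    {"id": 3, "status": "XX"},
]
migrated, retired, unmapped = apply_crosswalk(records, "status", CROSSWALK)
print(len(migrated), len(retired), len(unmapped))
```

Keeping the unmapped group separate is deliberate: those are the values that still need a conversation with a data steward, which is exactly the kind of decision worth recording.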

Making these decisions about porting, storing, transforming, and purging data is the work of data governance, whether or not that phrase has taken hold in your organization (and, if it has, whether it's understood to mean anything beyond security and compliance). The steps involved in building a data governance framework to assign and record responsibility, to enhance accountability and transparency, and to share the work and the outcomes of these data management activities look a lot like gathering people in a room to discuss valid values, data quality standards, whether data products are adequate, and so on.

Yes, much of the work to migrate data into a new system, to stand up that new system, and to shutter the old one will require technical expertise. But all this labor is performed in the service of what the business needs--so don't fall into the old trap that this is somehow IT's problem or IT's responsibility.

The Data Cookbook is an easy-to-use and cost-effective tool that would provide valuable assistance in this work, from defining business terminology to vetting data products to mapping data elements between systems. Whether you're replacing a major piece of software, moving to new BI tools and data sources, modernizing your data transfer pipeline, or simply adding a key new application to the mix, the conversations you have, the decisions you make, the testing you perform, can nearly all be recorded in the Data Cookbook, where they can be stored as long as necessary and made available widely to your users. Moreover, these artifacts can be managed using workflows that associate data elements with business units, and data decisions with data stewards or trustees.

Resources (blog posts, videos, and recorded webinars) related to data governance and major technology projects can be found in this blog post. Additional data governance and data intelligence resources can be accessed from here.

IData has a solution, the Data Cookbook, that can aid the employees and the organization in its data governance and data intelligence efforts. IData also has experts that can assist with data governance, reporting, integration and other technology services on an as needed basis. Feel free to contact us and let us know how we can assist.

(Image Credit: StockSnap_FRU4Y0OBEY_projectteam_trialsmajorprojects_BP #1215)
