Nearly every organization has begun to use some software-as-a-service, where data originates, is processed, and is frequently stored in the cloud (or, as one of our clients likes to say, "on somebody else's computer!"). To save cost and labor, many organizations are moving previously on-premise data to the cloud, replicating it there for easier access and analysis, or setting up some kind of hybrid arrangement.
The cloud offers many advantages: always-on access, automatic backups and failovers, improved performance, economies of scale, and professional expertise that organizations whose core competency is not IT cannot expect to staff for. Moving data, software, and other operations to the cloud also removes a number of burdens: purchasing, maintaining, upgrading, securing, and decommissioning servers; testing software upgrades and applying patches; installing client applications on workstations; to name a few.
But moving operations to the cloud raises some issues, and it often reduces your control over your data (or at least reshapes what that control looks like). Some thoughts:
- It's reasonable to be concerned about data breaches, or a general lack of data security, although your vendor's continued survival as a corporate entity probably depends on rigorous adherence to data security principles. A similarly reasonable concern can be raised about storing personally identifiable, sensitive, or confidential information in a distributed environment.
- Cloud applications are often specialized, and they may allow you to collect data that you couldn't capture or store in your legacy on-premise applications. While this new niche data may support operations in one area, it may hamper reporting or data management tasks in others. There is certainly an increased risk of problems with data consistency and data integrity.
- Cloud storage is cost-effective, and one consequence of an organization acquiring increased data storage space is that the organization is likely to store more data (by purging less often, by replicating data for reporting or extracting, by creating backups more frequently, you name it). No matter where you store it, an increased volume of data generates more opportunities for data quality issues to arise, for access to be mishandled, for data stewards to fall behind.
The cloud also forces organizations to learn new skills, update existing ones, and even replace older tools and methods. For example, many cloud vendors and applications require that data be consumed via API rather than a direct database call. Users may find it simpler or at least faster to learn to use in-application analytics tools rather than porting data to an in-house BI environment. Legacy cron jobs and scripted integrations may need to give way to other tools specifically designed to extract and load data.
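To make the API point concrete, here is a minimal sketch of the pattern most cloud vendors impose: instead of querying a database directly, you pull records page by page from a REST endpoint. The function names and the stand-in page server below are hypothetical; in practice `fetch_page` would wrap an HTTP call made with `urllib`, `requests`, or a vendor SDK.

```python
def fetch_all_records(fetch_page, page_size=100):
    """Collect every record from a paginated API by requesting pages
    until an empty page comes back. fetch_page(page, per_page) stands
    in for the real HTTP call (urllib, requests, or a vendor SDK)."""
    records, page = [], 1
    while True:
        batch = fetch_page(page, page_size)
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records

# Stand-in for a real endpoint: 250 records served 100 at a time.
_DATA = [{"id": i} for i in range(250)]

def fake_fetch_page(page, per_page):
    start = (page - 1) * per_page
    return _DATA[start:start + per_page]

all_records = fetch_all_records(fake_fetch_page)
print(len(all_records))  # 250
```

Even this toy loop shows the skills shift: pagination, rate limits, and authentication tokens replace the single SQL query a legacy integration would have used.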
It's possible to think of cloud data operations as a new paradigm: the cloud represents a new infrastructure, it engenders new vendor relationships (and it may endanger existing ones), it requires new data modeling around a reconfigured architecture, it brings weak or nonexistent data security policies into sharp relief, etc.
But paradigm shifts are disruptive, and they can be painful. Moving applications and storage to the cloud can hurt too (as with any data migration or new software adoption), but the result should be improved operating performance and greater technical reliability and stability; and if the move doesn't necessarily save money, it should at least deliver greater efficiency, letting you do more and/or receive more for the same level of investment.
In our view, one way to make sure your transition to the cloud (or a multi-cloud or hybrid cloud or whatever catchphrase comes next) is standard operating procedure, rather than a massive paradigm shift, is to apply the same data management practices and data governance practices you'd use when migrating data or converting systems or standing up new technology on premise.
We've argued many times that the most successful approach to data governance is not to try to develop a group of policies and an infrastructure to support and enforce them, but rather to identify key best practices around data governance and selectively apply them as you solve existing problems. We sometimes refer to this as "just-in-time data governance."
- Don't know exactly which vendors, tools, technologies, and platforms are in use at your organization? Create an inventory of data assets that tells you about the tools and technologies in use, and who is responsible for them.
- Do requests for data take a long time to fill, and then turn out to have involved duplicated effort? Build up a searchable data catalog to better understand what data products (reports, dashboards, analyses, queryable data sets, etc.) are available, and where to find them.
- Too many competing versions of the data truth? Trouble figuring out why one office calls it "X" and another calls it "Not Quite X"? Purchase a data governance tool like the Data Cookbook to empower your data stewards in the task of creating an institutional business glossary (actually the Data Cookbook will help with many data governance tasks, wink wink).
In fact, pushing some of your data activities to the cloud may be an even stronger argument for just-in-time data governance. Some actions to take:
- Migrating data is a great time to make sure you have a robust business glossary, since the calculations and derivations that produce the needed data output will differ between the legacy application and the new one. Data stewards and subject matter experts need to have business rules documented and data operations defined, or you'll run into the same terminology and usage problems in the cloud that you're likely to have experienced with on-premise tools.
- If you're going to pull down data from the cloud into a system of record, a third-party application, or even just a data store, you'll need data lineage and structural metadata documented to avoid data type mismatches, duplication, and false data cognates, where columns may have similar names across two or more systems but do not actually contain the same information.
- Will cloud data be added to an existing data set for reporting and analysis? Will cloud analytics replace or supplement current analytical products? Changing your data model is always an opportunity to improve and extend your documentation. Bringing additional BI tools into your shed is the perfect moment to realize your data catalog needs updating. In some areas of life, competition is valuable; among overlapping, undocumented data products, it's a drag on productivity and it hampers your organization's ability to make strategic decisions and implement them.
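The business-glossary point above can be sketched in code. This is illustrative only: the term ("active student"), the field names, and both derivation rules are hypothetical, standing in for whatever definitions your legacy and cloud applications actually use.

```python
# One glossary term, two derivations: a legacy status flag vs. a
# cloud application's enrollment count. All names are hypothetical.

def is_active_legacy(record):
    return record["status_flag"] == "A"

def is_active_cloud(record):
    return record["current_enrollments"] > 0

students = [
    {"id": 1, "status_flag": "A", "current_enrollments": 0},
    {"id": 2, "status_flag": "I", "current_enrollments": 2},
    {"id": 3, "status_flag": "A", "current_enrollments": 3},
]

legacy_active = {s["id"] for s in students if is_active_legacy(s)}
cloud_active = {s["id"] for s in students if is_active_cloud(s)}

# The headcounts agree (2 vs 2) while the underlying populations
# differ -- exactly the "competing versions of the truth" that a
# documented business rule prevents.
print(len(legacy_active), len(cloud_active), legacy_active == cloud_active)
```

Note the trap: the two rules report the same total, so a spot check of counts would never reveal that each system considers a different set of students "active."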
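The "false data cognates" mentioned above can often be caught with a simple comparison of structural metadata before any rows move. The schema dictionaries below are made up for illustration; in practice you would pull this metadata from each system's information schema or API.

```python
# A minimal false-cognate check: columns that share a name across two
# systems but disagree on type, plus columns with no counterpart at all.
# The schemas here are illustrative, not from any real system.
legacy_schema = {
    "student_id": "int",
    "status": "char(1)",      # e.g. single-letter flags
    "enroll_date": "date",
}
cloud_schema = {
    "student_id": "string",   # same name, different type
    "status": "varchar(20)",  # same name, different domain
    "enrolled_on": "date",    # same meaning, different name
}

mismatched = sorted(
    col for col in set(legacy_schema) & set(cloud_schema)
    if legacy_schema[col] != cloud_schema[col]
)
only_legacy = sorted(set(legacy_schema) - set(cloud_schema))
only_cloud = sorted(set(cloud_schema) - set(legacy_schema))

print("possible false cognates:", mismatched)
print("unmatched legacy columns:", only_legacy)
print("unmatched cloud columns:", only_cloud)
```

A report like this is a starting point for data stewards, not a verdict: matching names and types still don't guarantee matching meaning, which is where documented lineage comes in.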
There are many factors involved in your decisions to move data to the cloud, including performance/effectiveness, cost savings/efficiencies, and speed/reliability. These factors are all legitimate and we hope they all receive careful, honest consideration. But remember the guiding principle of data governance: your data is an asset. You want to maximize its value, you want to leverage it, you want to put it to use supporting organizational operations. In some situations, simply moving data to the cloud increases its value as an asset. In other situations, that move will need to be carefully planned, rigorously monitored, and steadfastly managed - that is to say, appropriately governed - in order to realize this increased value.
IData has a solution, the Data Cookbook, that can aid employees and organizations in their data governance, data stewardship, and data quality initiatives. IData also has experts who can assist with data governance, reporting, integration, and other technology services on an as-needed basis. Feel free to contact us and let us know how we can assist.