Recently we discussed the differences between a business glossary and a data dictionary, and why you are likely to need both at your organization. In this post we want to delve more deeply into the information you will probably want to capture in your business glossary including glossary elements, definitions and technical information.
Remember that a business glossary is a place to store information about your data, but from a business or end-user perspective. A successful business glossary is going to include business terminology that refers to organizational data, and definitional information about that terminology.
Let's start with names of glossary elements. These could be words, phrases, acronyms, or anything that identifies a business concept for or about which we have data. In higher education we have data about applicants for admission, about student academic history, about classes scheduled and taught, about graduation or transfer information, etc. Most organizations, higher ed or not, have information about employee history and performance, payroll and benefits, accounts payable and receivable, etc. It's also common to store information about alumni activities, donors and donations, facilities, fixed assets, etc.
In some cases the name of the glossary element is obvious because we always call it the same thing. And in some cases it’s obvious because the glossary element name is the same as the label for the data in its main database source. Even in this case, however, our recommendation is to use natural language conventions rather than machine conventions: no underscores, no abbreviations, and so on.
In other cases the name is less obvious, perhaps because more than one office uses the same word or phrase to refer to different things, or because they use different names for what is essentially the same thing. So naming conventions are something you will have to deal with as you design and build your business glossary.
A simple naming and/or organizing convention might involve including the name of the office or business unit that uses the terminology, or if the term is widely used, then the office that is responsible for maintaining the data and managing its usage. Data about people who apply to be admitted as students is managed by your Admissions Office; data about people who apply for employment is probably managed by Human Resources. At a minimum we probably have two different definitions of “Applicant,” so a simple first step might be to identify one as “Admissions Applicant” and another as “Position Applicant” or something similar.
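As a rough illustration of this convention, here is a minimal Python sketch that qualifies a shared term with the office responsible for it. The structure, the function names, and the tuple layout are our own assumptions for illustration, not a feature of any particular glossary tool.

```python
from collections import defaultdict

# Hypothetical glossary entries: (responsible office, qualified name, base term).
ENTRIES = [
    ("Admissions Office", "Admissions Applicant", "Applicant"),
    ("Human Resources", "Position Applicant", "Applicant"),
]


def index_by_base_term(entries):
    """Group qualified glossary names under the base term they share."""
    index = defaultdict(list)
    for office, qualified_name, base_term in entries:
        index[base_term.lower()].append((qualified_name, office))
    return dict(index)


def lookup(index, term):
    """Return every (qualified name, office) pair registered for a term."""
    return index.get(term.lower(), [])


index = index_by_base_term(ENTRIES)
matches = lookup(index, "Applicant")
```

Looking up the ambiguous word "Applicant" returns both qualified names along with the office to contact about each, which is the point of the convention.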
Of course, this practice can go in many different directions, and your glossary can look quite different as a result, which is another argument for associating definitions with an office or unit so everyone can quickly tell which group of people is responsible. If I have questions or suggestions, or if I need access to data, I know where to go with these requests. (This responsibility for defining business glossary elements is one of the many things expected of data stewards, and in our view it’s one of the most important as well!) This naming distinction becomes more crucial as we accumulate more data sources and more tools for analyzing data.
A business glossary works in much the same way as a writer's glossary: it helps us pick out and use the right words/names in our conversations about institutional data. So while a concise definition, such as you might find in a regular dictionary, is valuable, we believe it's also worth including information that answers questions like "why do we care about this data?" and "how does it factor into our work?" We can and often should go even further: Where does the data come from? How does it change over time? For derived values, what are the steps to derive them? For calculations, what is the basis for those calculations? For variations on a theme, what are the key differences between the variations? If a data concept is known by multiple names, what are those synonyms? So in our view there’s room for a fair amount of detail in your business definition.
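To make that list of questions concrete, here is one possible shape for a glossary entry, sketched in Python. The class name, the field names, and the sample values (including the synonym) are hypothetical and chosen only to mirror the questions above; a real glossary would hold whatever your data stewards decide is worth recording.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryEntry:
    """One business glossary element (illustrative field names only)."""
    name: str
    definition: str                 # concise, dictionary-style definition
    purpose: str = ""               # why do we care about this data?
    source: str = ""                # where does the data come from?
    change_notes: str = ""          # how does it change over time?
    derivation_steps: list[str] = field(default_factory=list)
    synonyms: list[str] = field(default_factory=list)

    def answers_to(self, term: str) -> bool:
        """True if the term matches the name or any recorded synonym."""
        wanted = term.strip().lower()
        return any(wanted == known.lower() for known in [self.name, *self.synonyms])


# Sample values are placeholders, not recommended definitions.
entry = GlossaryEntry(
    name="Admissions Applicant",
    definition="A person who has applied to be admitted as a student.",
    purpose="Used when discussing applicant counts with the Admissions Office.",
    synonyms=["Applicant for Admission"],
)
```

Recording synonyms alongside the name means a search for either phrase lands on the same entry, so two offices using different words still find one shared definition.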
Let’s go back to our Human Resources office. If we want to know how desirable our organization is as an employer, we might want to know how many people applied for open positions during the past fiscal year. But wait: what does it mean to apply for a position? Does it mean to fill out an on-line application, even if the person doesn’t supply a resume or cover letter? That person would not be considered a serious candidate, would they? This could depend on the type of position and the posting requirements. We should confirm with HR before embarking on too much analysis. So our definition of “Position Applicant” may include some business rules that must be recognized before we generate any data summaries.
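The sketch below shows how much such a business rule can change the answer. It assumes, purely for illustration, that HR counts a person only once per fiscal year and may or may not require a resume; the class, the function, the sample records, and the fiscal year dates are all hypothetical, and the real rule would need to come from HR.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Application:
    applicant_id: str
    submitted_on: date
    has_resume: bool


def count_position_applicants(applications, fy_start, fy_end, require_resume=True):
    """Count distinct people who applied within the fiscal year.

    Assumed rule: when require_resume is True, applications without a
    resume are ignored.
    """
    qualifying = {
        app.applicant_id
        for app in applications
        if fy_start <= app.submitted_on <= fy_end
        and (app.has_resume or not require_resume)
    }
    return len(qualifying)


# Made-up sample data.
applications = [
    Application("A1", date(2023, 9, 1), True),
    Application("A1", date(2024, 1, 15), True),   # same person, second posting
    Application("A2", date(2023, 10, 10), False), # no resume supplied
    Application("A3", date(2022, 5, 1), True),    # outside the fiscal year
]
fy_start, fy_end = date(2023, 7, 1), date(2024, 6, 30)

strict_count = count_position_applicants(applications, fy_start, fy_end)
loose_count = count_position_applicants(
    applications, fy_start, fy_end, require_resume=False
)
```

With this sample data the two readings of "applied" give different totals, which is exactly why the rule belongs in the definition rather than in an analyst's memory.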
Now, we can trust that our BI analysts know this information, or that they will always check with both the person making the data request and the business glossary. In a perfect world, both are true. But sometimes our HR analyst is on vacation and someone else must fill this request. Sometimes our analyst is brand new! Sometimes our data stewards have updated definitions because business operations have changed. Sometimes we are allowed to produce reports from our transactional systems, and other times we need to use another source. That sounds like a lot of variables to hope to go the right way every time. What if we just include the relevant technical information in our business glossary?
Again, this could be done several ways. The business glossary could contain a link to an element or multiple elements in an on-line data dictionary. Or it could reference a series of existing publications that contain the technical documentation we need. Or our technical definition could be a relevant snippet of code, or the name of a function or procedure. What’s critical is that analysts and other interested parties can find this information quickly and easily, and that it is consistent with our functional information.
While you might want to store all manner of additional attributes, we will just highlight a couple of elements. Frequently, the sharing of information about individuals is governed by privacy regulations or preferences. The scope of these regulations can be elusive, and the way they affect specific database elements is not always clear. Privacy and confidentiality flags are a start, but adding some kind of privacy code to elements in your business glossary doesn’t require much additional effort and may well help to keep your organization’s name out of the wrong kind of headlines!
Other benefits: we often tend to think of data quality management as a set of activities separate from defining data, but consider the kinds of questions about data quality that can be resolved if the issue reporter has access to a robust, up-to-date business glossary. If my glossary tells me about expected formats, potential ranges of values, historical variations, and so on, then what looks erroneous at first glance may well have a reasonable explanation. By the same token, a complete entry for any given data element can also confirm when data quality really is a problem!
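Here is a small sketch of that idea, assuming the glossary records accepted formats (including a historical one) and a value range for each element. The element names, patterns, and limits are invented for the example and do not describe any real system.

```python
import re

# Hypothetical expectations a glossary entry might record per data element.
EXPECTATIONS = {
    # Current format plus an assumed older format kept for historical records.
    "Student ID": {"patterns": [r"\d{9}", r"S\d{7}"]},
    "Term GPA": {"min": 0.0, "max": 4.0},
}


def check_value(element, value, expectations=EXPECTATIONS):
    """Return a list of issues; an empty list means the value fits the glossary."""
    rules = expectations.get(element)
    if rules is None:
        return [f"no expectations recorded for {element!r}"]

    issues = []
    patterns = rules.get("patterns")
    if patterns and not any(re.fullmatch(p, str(value)) for p in patterns):
        issues.append(f"{value!r} matches no documented format for {element!r}")

    low, high = rules.get("min"), rules.get("max")
    if low is not None and value < low:
        issues.append(f"{value!r} is below the documented minimum {low}")
    if high is not None and value > high:
        issues.append(f"{value!r} is above the documented maximum {high}")
    return issues


old_style_id = check_value("Student ID", "S1234567")  # documented older format
bad_id = check_value("Student ID", "12-345")
bad_gpa = check_value("Term GPA", 4.3)
```

The older ID format looks wrong next to current records but passes because the glossary documents it, while the malformed ID and the out-of-range GPA are flagged as genuine problems.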
Thanks for reading this quick tour of the kinds of information we have found valuable in compiling business glossaries. Our product, the Data Cookbook, was designed with these needs in mind, and is at its heart a nimble, scalable business glossary. Hundreds of organizations have found it a useful way to capture, organize, and store information about their data, and we invite you to take a look at it. In future posts we’ll consider some methods of identifying what data needs to be described in your glossary, and some common avenues for getting started with your business glossary, whatever tools you use.