Accessible knowledge is an important part of data governance (also called data intelligence), so in this post and in upcoming ones we are going back to the basics. The topic this time: “What is the difference between a data dictionary and a business glossary? And why do we need both?”
A data dictionary is typically tied to one database or database application, and is organized around database objects such as columns, tables, views, procedures, etc. While a data dictionary may contain some information about how a data field, for example, is used in an application, or even by the business, it will typically be richer in technical information, such as the length or datatype of a field, whether it can be nullable, what constraints or indices use it, etc.
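To make that concrete, here is a minimal sketch of what a single data dictionary entry might hold. The class name, the field names, and the sample table, column, constraint, and index names are all hypothetical, chosen only for illustration; real dictionary tools organize this information in their own ways.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ColumnEntry:
    """One hypothetical data dictionary row describing a single database column."""
    table: str
    column: str
    datatype: str
    length: Optional[int]              # None when length does not apply
    nullable: bool
    constraints: Tuple[str, ...] = ()  # names of constraints that use the column
    indices: Tuple[str, ...] = ()      # names of indices that use the column
    usage_note: str = ""               # optional, usually brief, business context

    def qualified_name(self) -> str:
        """Return the table-qualified column name."""
        return f"{self.table}.{self.column}"


# Illustrative entry; every name below is made up.
entry = ColumnEntry(
    table="STUDENT_TERM",
    column="ENROLLED_CREDITS",
    datatype="NUMBER",
    length=4,
    nullable=False,
    constraints=("CK_CREDITS_NONNEG",),
    indices=("IX_STUDENT_TERM_CREDITS",),
)
print(entry.qualified_name())
```

Notice that almost every attribute is technical. The one free-text field is optional, which is roughly how business context tends to be treated in a dictionary.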
This information is valuable to many parts of the enterprise, and it is needed more often than you might expect. But a data dictionary has three major shortcomings when what we need is to answer questions about data. First, I may be storing a piece of information in multiple tables, and a data dictionary will not tell me which table provides the information I need right now, or whether similar information resides in another table (to say nothing of another system). Second, the data about which I have questions may not exist at the field level, but may instead be derived from multiple fields, or calculated on the fly during reports or lookups. So, if there is no field in my database for "first-time, full-time student," then my data dictionary isn’t going to help me answer the question of where to find these students or how to count them. Third, although this is changing somewhat, data dictionaries tend not to be very accessible to end users and data consumers. So, no matter how comprehensive my data dictionary might be, if I can’t easily get to it when I need it, then it’s of little value to me.
A business glossary, however, is oriented around terminology used by the business and business users, and data needed by them to continue daily operations, to build or assess strategic plans, and to make decisions about actions. A business glossary ought to have some facility for a researcher to find technical information, but it first should answer questions about what the data means, how it is used, who is responsible for it, whether it is classified for security or other reasons, etc. These are questions IT may not know the answers to (and in many cases IT shouldn't be expected to know them), which is why it is incumbent on data stewards and subject matter experts to contribute to the creation and maintenance of a business glossary.
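As a rough illustration, a glossary entry can be sketched as a record that leads with meaning, usage, responsibility, and classification, and only then points to technical locations. Everything here (the `GlossaryTerm` class, its fields, the `find_terms` helper, and the sample values) is an assumption made for this sketch, not a description of any particular product.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GlossaryTerm:
    """A hypothetical business glossary entry, organized around business meaning."""
    name: str
    definition: str                          # what the data means
    usage: str                               # how the business uses it
    steward: str                             # who is responsible for it
    classification: str                      # e.g. security or privacy class
    technical_sources: Tuple[str, ...] = ()  # optional pointers for researchers


def find_terms(terms: List[GlossaryTerm], keyword: str) -> List[GlossaryTerm]:
    """Case-insensitive search over term names and definitions."""
    needle = keyword.lower()
    return [
        t for t in terms
        if needle in t.name.lower() or needle in t.definition.lower()
    ]


# Illustrative content only; the steward, classification, and source are made up.
GLOSSARY = [
    GlossaryTerm(
        name="First-time, full-time student",
        definition="A student enrolled full time who has not previously "
                   "attended college.",
        usage="Cohort counts for retention and graduation reporting.",
        steward="Registrar's Office",
        classification="Internal",
        technical_sources=("STUDENT_TERM.ENROLLED_CREDITS",),
    ),
]

matches = find_terms(GLOSSARY, "full-time")
print([t.name for t in matches])
```

The search works on business language rather than on table or column names, which is the point of the orientation described above.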
Let’s go back to “first-time, full-time student.” Even if our database explicitly identifies these students (e.g. via a user-entered flag or custom field, or perhaps in a data warehouse or other transformed data source), finding this information in a data dictionary isn’t a guarantee that the field or fields in question will provide the data I need. Does “first-time, full-time” mean that the student has never taken a course at the college level before, or just that the student has never enrolled on a full-time basis? What happens if I want to count students for whom the former is true, but our database only flags them when the latter is the case?
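A small sketch shows how much the two readings can differ. The student records, the field names, and the 12-credit full-time threshold are all assumptions made up for this example; your institution's rules and data will differ.

```python
from dataclasses import dataclass

FULL_TIME_CREDITS = 12  # assumed threshold for full-time enrollment


@dataclass(frozen=True)
class Student:
    """Hypothetical student record with only the fields this example needs."""
    student_id: str
    prior_college_credits: int   # credits earned at the college level before this term
    prior_full_time_terms: int   # earlier terms in which the student was full time
    current_credits: int         # credits enrolled this term


def is_full_time(s: Student) -> bool:
    return s.current_credits >= FULL_TIME_CREDITS


def never_attended_college(s: Student) -> bool:
    """Reading 1: full time now, and no college-level coursework before."""
    return is_full_time(s) and s.prior_college_credits == 0


def never_enrolled_full_time(s: Student) -> bool:
    """Reading 2: full time now, and never full time before."""
    return is_full_time(s) and s.prior_full_time_terms == 0


STUDENTS = [
    Student("A", prior_college_credits=0, prior_full_time_terms=0, current_credits=15),
    Student("B", prior_college_credits=6, prior_full_time_terms=0, current_credits=12),
    Student("C", prior_college_credits=30, prior_full_time_terms=2, current_credits=15),
    Student("D", prior_college_credits=0, prior_full_time_terms=0, current_credits=6),
]

count_reading_1 = sum(never_attended_college(s) for s in STUDENTS)
count_reading_2 = sum(never_enrolled_full_time(s) for s in STUDENTS)
print(count_reading_1, count_reading_2)  # prints: 1 2
```

Student B, who earlier took a few courses part time, is counted under the second reading but not the first. A flag in the database built on one reading will quietly give the wrong answer to a question asked under the other, and nothing in a column's datatype or length will reveal that.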
Let’s be clear. There is no tension between a data dictionary on the one hand and a business glossary on the other. In fact, they inform each other: the technical information in a data dictionary helps us clarify the functional information in a business glossary, and the functional information in a business glossary helps us track technical uses of and access to data. Ideally, just like in chemistry, the equation is balanced, and we can see business and technical information side by side. This relationship can help ensure that calculations are performed the same way every time, and that when we ask for a piece of information, both the requester and the provider know exactly what is needed and where to find it or how to derive it.
Many data dictionaries have been customized to be able to include functional information, and that’s a very positive step. However, data dictionaries are vast, and repetitive, and they have serious limitations when it comes to surfacing information for users to find, or when performing searches, and especially when it comes to understanding how data that is described a certain way in a database is utilized when it makes its way into reports or other deliverables. And in our experience dictionary tools still have the look and feel of technical applications – they have not been designed with functional users in mind, and their information display has not been optimized to highlight business knowledge.
IData’s solution, the Data Cookbook, by contrast, begins with a business orientation: data definitions cannot even be created without a name, a description that reflects business usage, and a functional area to identify responsibility; then they can be associated with a whole host of technical information, including a direct link to a data dictionary entry—itself housed inside the Data Cookbook and not a database tool—if appropriate. The Data Cookbook is accessed via web browser, and it is designed to accommodate natural language searching and documentation, so users who need to understand how data operates rather than (or in addition to) where it lives can get answers quickly. The Data Cookbook also relies on data stewards and other data governance personnel to approve definitions for public consumption via a colorful user interface, rather than tasking technical resources with collecting and uploading this information.
Whether you use the Data Cookbook, or another product, or customize your existing data dictionary, your business glossary won’t go anywhere without entries. In future posts we will explore the best ways to populate your business glossary, what specific information you need to include, and how to take advantage of the knowledge your functional and technical users already possess. Other resources on business glossaries can be found in this blog post.
IData has a solution, the Data Cookbook, that can aid an organization and its employees in their data governance (data intelligence) and data quality initiatives. IData also has experts who can assist with data governance, reporting, integration, and other technology services on an as-needed basis. Feel free to contact us and let us know how we can assist.
(image credit StockSnap_QUR0CQODB2_backtobasics_BP #1094)