We are talking about reference information versus transactional information such as list of names, dates, or activities. It is important to note that not all foreign keys in your application from a database standpoint should be considered reference data because many of those are pointed to transactional information,
How can you decide what is reference data and what is not? Here are some general guidelines (not hard fast rules):
Let us look at an employment status code where you might have the following values:
A Employee has current appointment and is on active status
I Employee does not have current appointment, but has not been separated; thus inactive
N Employee is on non-paid leave of absence
P Employee is on paid leave of absence
S Employee has been separated from employment
A new status needs to be added: F – Employee is currently furloughed (getting medical insurance)
This addition will cause a great deal of change: programming changes, report changes, documentation changes, training changes, integration with other data system changes, etc. And this change need to be managed.
Think of reference data from a data governance point of view. Which fields in your various databases would benefit from a level of data governance or approval review? Which ones would have great impact if changed?
Examples of Reference Data (or maybe not)
Obvious Ones
Possible or Thought-Provoking Ones
Probably Not but Could Be
Let us discuss some of the above a little more.
States and provinces for North America are pushing toward 100 items, but they do not change very often so fit into our reference data guidelines. If say California was split into 3 states that would have a big impact in reports and software programs. How would this be handled at your organization?
Regarding country code, there are 193 countries in the United Nations. New countries are formed, and names change. If a new country was formed, how would that change be handled at your organization?
There are a lot of postal codes but do not change often. Our opinion is that postal codes are not reference data but are more like an inventory. Same can be considered for GL Numbers and industry classifications.
Should Boolean values be reference data? Things like true false. But you might have undetermined which is different than undefined. Meaning that no one has even looked at it. What you are talking about here is more like data typing. Most people would not consider true false as reference data.
Hopefully, this post gives you some perspective on, at least how to talk about reference data, from a pragmatic data governance (data intelligence) standpoint, and what it is and what should be managed. If you prefer seeing this information in video format then check out our "What is Reference Data" video. Additional reference data resources (blog posts, videos, recorded webinars, etc.) can be found in this blog post.
IData has a solution, the Data Cookbook, that can aid the employees and the organization in its data governance, data stewardship and data quality initiatives. IData also has experts that can assist with data governance, reporting, integration and other technology services on an as needed basis. Feel free to contact us and let us know how we can assist.
Photo Credit StockSnap_YYJBMNVN6Q_whatreferencedata_blankpage_BP #B1122