What is Reference Data

What is Reference Data

StockSnap_YYJBMNVN6Q_whatreferencedata_blankpage_BPIn this blog post we will cover what is reference data by giving some examples and some general guidelines. The most common use of reference data are code values or lookup values. The dropdown lists from your master screens such as invoice status, employee status, shipment status, customer status, vendor status or student status are great examples of reference data. They are used daily, and reports are selected and sorted by these values. Look at the screens or reports you use every day and write down these lists of values. If any of these lists change what does it affect? We are sure at least training documentation and some reports would change (possibly integration as well). You want to be aware of any changes before they cause major issues.

We are talking about reference information versus transactional information such as list of names, dates, or activities.  It is important to note that not all foreign keys in your application from a database standpoint should be considered reference data because many of those are pointed to transactional information,

How can you decide what is reference data and what is not? Here are some general guidelines (not hard fast rules):

  • Typically, less than 100 values (some exceptions such as country code)
  • Changes very infrequently (remember these are not transactional in nature)
  • Is used as defining attribute for other data objects i.e. changes have big impacts beyond the values themselves
  • Benefits from approval / review before changes

Let us look at an employment status code where you might have the following values:

A         Employee has current appointment and is on active status
I           Employee does not have current appointment, but has not been separated; thus inactive
N         Employee is on non-paid leave of absence
P          Employee is on paid leave of absence
S          Employee has been separated from employment

A new status needs to be added: F – Employee is currently furloughed (getting medical insurance)

This addition will cause a great deal of change: programming changes, report changes, documentation changes, training changes, integration with other data system changes, etc.  And this change need to be managed.

Think of reference data from a data governance point of view. Which fields in your various databases would benefit from a level of data governance or approval review? Which ones would have great impact if changed?

Examples of Reference Data (or maybe not)

Obvious Ones

  • Employee Status (active, full-time, part-time, furlough, inactive, leave, etc.)
  • Gender (has been added to over the past few years)
  • Customer status
  • Vendor status
  • Student status
  • Product status
  • Shipment status
  • Invoice status
  • Purchase order status
  • Work order status

Possible or Thought-Provoking Ones

  • States/Provinces
  • Country Codes

Probably Not but Could Be

  • GL Numbers
  • Postal Codes (over 40,000 in the US)
  • NAICS/SIC/CEEB Codes (industry or school classifications)
  • Sections or Courses (for schools)
  • Boolean (True/False/Undetermined/Undefined)

Let us discuss some of the above a little more.

States and provinces for North America are pushing toward 100 items, but they do not change very often so fit into our reference data guidelines. If say California was split into 3 states that would have a big impact in reports and software programs. How would this be handled at your organization?

Regarding country code, there are 193 countries in the United Nations. New countries are formed, and names change. If a new country was formed, how would that change be handled at your organization?

There are a lot of postal codes but do not change often. Our opinion is that postal codes are not reference data but are more like an inventory. Same can be considered for GL Numbers and industry classifications.

Should Boolean values be reference data? Things like true false. But you might have undetermined which is different than undefined. Meaning that no one has even looked at it. What you are talking about here is more like data typing. Most people would not consider true false as reference data.

Hopefully, this post gives you some perspective on, at least how to talk about reference data, from a pragmatic data governance (data intelligence) standpoint, and what it is and what should be managed.  If you prefer seeing this information in video format then check out our "What is Reference Data" video.  Additional reference data resources (blog posts, videos, recorded webinars, etc.) can be found in this blog post.

IData has a solution, the Data Cookbook, that can aid the employees and the organization in its data governance, data stewardship and data quality initiatives. IData also has experts that can assist with data governance, reporting, integration and other technology services on an as needed basis. Feel free to contact us and let us know how we can assist.

 Contact Us

Photo Credit StockSnap_YYJBMNVN6Q_whatreferencedata_blankpage_BP #B1122

Jim Walery
About the Author

Jim Walery is a marketing professional who has been providing marketing services to technology companies for over 20 years and specifically those in higher education since 2010. Jim assists in getting the word out about the community via a variety of channels. Jim is knowledgeable in social media, blogging, collateral creation and website content. He is Inbound Marketing certified by HubSpot. Jim holds a B.A. from University of California, Irvine and a M.A. from Webster University. Jim can be reached at jwalery[at]idatainc.com.

Subscribe to Email Updates

Recent Posts

Categories