The What and How of Data Lineage

Data at the beginning of the line usually changes by the time it reaches the end of the line, and you need to understand those changes.  All organizations, including higher education institutions, require data that is secure and compliant.  This data needs to be available when and where it is needed.  Keeping data clean becomes harder as the data changes and as the number of end-users, platforms and data sources grows.  This blog post covers what data lineage is, its benefits and how to implement it.

What is Data Lineage?

Data lineage describes data origins, movements, characteristics and quality: where data begins and how it is changed on the way to its final outcome.  The movement of data creates lineage, and that movement is performed by a process.  Examples of processes include Extract, Transform, and Load (ETL), reports, queries, API loads and data entry.  A mapping is a set of lineages.  A lineage connects one or more source data objects to a single target data object; we don't recommend multiple targets in one lineage.  It can be difficult to track the full details of a lineage when only looking at object-to-object mappings, because the overall process may contain additional complexity and logic such as selection criteria, multi-table joins, filters and conditional logic.  To fully understand lineage, you need to look at the mappings AND the overall process's functional and technical details.
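
To make this vocabulary concrete, here is a minimal Python sketch of one way to record lineage, assuming a simple in-memory model. The class names `Lineage` and `Mapping`, the `process` and `logic` fields, and the table and column names in the example are hypothetical illustrations, not the Data Cookbook's data model. Each lineage has one or more sources but exactly one target, in line with the recommendation above. The `logic` field keeps process details (joins, filters) that an object-to-object mapping alone would miss.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Lineage:
    """One or more source data objects feeding a single target data object."""

    sources: tuple[str, ...]      # one or more source data objects
    target: str                   # exactly one target data object
    process: str                  # e.g. "ETL", "report", "query", "API load"
    logic: tuple[str, ...] = ()   # hypothetical field: joins, filters, criteria

    def __post_init__(self) -> None:
        # A lineage without a source has nothing to trace back to.
        if not self.sources:
            raise ValueError("a lineage needs at least one source data object")


@dataclass
class Mapping:
    """A mapping is a set of lineages."""

    name: str
    lineages: set[Lineage] = field(default_factory=set)

    def add(self, lineage: Lineage) -> None:
        self.lineages.add(lineage)

    def targets(self) -> set[str]:
        """Every target data object populated by this mapping."""
        return {lineage.target for lineage in self.lineages}


# Hypothetical example: an ETL job loading a warehouse enrollment table.
enrollment_etl = Mapping("enrollment_etl")
enrollment_etl.add(
    Lineage(
        sources=("sis.student.student_id", "sis.registration.term_code"),
        target="warehouse.enrollment_fact.student_term_key",
        process="ETL",
        logic=(
            "join student to registration on student_id",
            "filter: registration status is active",
        ),
    )
)
```

Because `Lineage` is frozen, instances are hashable and can live in a real `set`. Recording the same lineage twice therefore does not create a duplicate entry.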

Benefits of Data Lineage

  • Understanding - Organizations need to ensure data meets business standards. Data lineage, as part of data governance, provides understanding and validation of how data is used, as well as of the risks that need to be mitigated.
  • Quality - Challenges to data quality increase whenever data is moved, transformed, interpreted or selected by people and processes. The origin of the data and how it is transformed as it moves through the organization need to be documented to show that data quality was maintained through all stages.
  • Compliance - Multiple stakeholders, including employees and outside agencies, need to trust the reported data while responding quickly to regulatory challenges. They need to know how the information got to the report they are viewing.  Tracking data lineage lets them trace reported figures back to their sources, providing evidence that the data on the report is accurate.
  • Analysis - Organizations need to understand how internal departments and users, as well as external data users, share the provided data and how this data changes.

How to Implement Data Lineage Effectively

  1. We recommend implementing a data governance solution, such as the Data Cookbook from IData, in which data lineage can be documented.  Break down where the data resides and how it flows through the various applications; this is not something you want to do manually.
  2. Talk to individuals and find out who is using the data, what it means, when it was captured, when it is used, and why it is stored and/or used.
  3. Document the relationships between data, including how data originates and moves between people, processes, services and products. Data stewards need to build a conceptual picture of this information across internal entities (such as departments within an organization), external agencies and the interactions between internal and external entities.
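
Once lineages are documented, the question "where did this number on the report come from?" becomes a walk upstream through the recorded source-to-target links, not a manual investigation. The Python sketch below assumes each lineage is stored as a hypothetical `(sources, target, process)` tuple. The object names, `build_upstream_index` and `trace_origins` are illustrative only and do not reflect how any particular governance tool stores or queries lineage. The sketch uses a breadth-first search and treats any object with nothing recorded as feeding it as a point of origin.

```python
from collections import defaultdict, deque

# Hypothetical lineages, each recorded as (source objects, target object, process).
LINEAGES = [
    (["admissions_app.applicant.gpa"], "sis.student.entry_gpa", "API load"),
    (["sis.student.entry_gpa", "sis.student.cohort"],
     "warehouse.cohort_fact.avg_entry_gpa", "ETL"),
    (["warehouse.cohort_fact.avg_entry_gpa"],
     "report.retention.avg_entry_gpa", "report"),
]


def build_upstream_index(lineages):
    """Map each target data object to the (source, process) pairs that feed it."""
    upstream = defaultdict(list)
    for sources, target, process in lineages:
        for source in sources:
            upstream[target].append((source, process))
    return upstream


def trace_origins(obj, upstream):
    """Breadth-first walk upstream from obj.

    Returns the set of origin objects (nothing feeds them) and the list of
    (source, target, process) hops traversed along the way.
    """
    origins, hops = set(), []
    seen, queue = {obj}, deque([obj])
    while queue:
        current = queue.popleft()
        feeders = upstream.get(current, [])
        if not feeders:
            origins.add(current)  # no recorded source: a point of origin
        for source, process in feeders:
            hops.append((source, current, process))
            if source not in seen:  # avoid revisiting shared or cyclic inputs
                seen.add(source)
                queue.append(source)
    return origins, hops


origins, hops = trace_origins("report.retention.avg_entry_gpa",
                              build_upstream_index(LINEAGES))
for source, target, process in hops:
    print(f"{source} -> {target} via {process}")
print("origins:", sorted(origins))
```

The `seen` set keeps the walk from looping forever if the recorded lineages ever form a cycle. It also keeps an input that is shared by several targets from being visited more than once.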

Data lineage increases trust in your data and enables faster processes and quicker responses to organizational challenges.  We hope this blog post helped you understand data lineage; feel free to check out our recorded webinar on the subject.  All our data governance and data intelligence resources (blog posts, videos, recorded webinars, etc.) can be accessed from our data governance resources page.  Additional resources regarding data lineage can be found in another of our blog posts.

IData has a solution, the Data Cookbook, that can aid employees and the organization in their data governance, data stewardship and data quality initiatives. IData also has experts who can assist with data governance, reporting, integration and other technology services on an as-needed basis. Feel free to contact us and let us know how we can assist.

Jim Walery
About the Author

Jim Walery is a marketing professional who has been providing marketing services to technology companies for over 20 years, and specifically to those in higher education since 2010. Jim assists in getting the word out about the community via a variety of channels. Jim is knowledgeable in social media, blogging, collateral creation and website content. He is Inbound Marketing certified by HubSpot. Jim holds a B.A. from the University of California, Irvine and an M.A. from Webster University. Jim can be reached at jwalery[at]idatainc.com.