Slide 29
THE PRINCIPLES OF DATA ENRICHMENT
Operations that automatically match, correct, or interpolate data values operate with some "confidence" level, meaning that sometimes they are wrong. At large
volumes, that means hundreds of thousands of matches may be incorrect: not necessarily an issue for the particular application involved, but something for those
implementing enrichment to consider.
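To make the scale of the problem concrete, here is a minimal sketch of the arithmetic. The function name `expected_incorrect` and the figures used (10 million matches at a 97% confidence level) are hypothetical and chosen only for illustration; they are not taken from the slide.

```python
def expected_incorrect(total_matches: int, confidence: float) -> int:
    """Estimate how many automated matches are wrong at a given confidence level."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # The share of matches expected to be wrong is the complement of the confidence.
    return round(total_matches * (1.0 - confidence))


# Hypothetical figures: 10 million automated matches at 97% confidence.
print(expected_incorrect(10_000_000, 0.97))
```

Under these assumed figures, roughly 3% of 10 million matches, or about 300,000 records, would be expected to be wrong, even though the confidence level sounds high.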
By following these three guiding principles, organizations can ensure that they deploy enrichment processes that enhance the business value of integrated data while
minimizing risk and maximizing flexibility as requirements evolve.
The business should drive and manage enrichment definition: Data stewards who understand the incoming data and the intended
use must be the key drivers of what data is enriched, how it is done, and how the enrichment outcomes are tested.
Enriched data must be identifiable and auditable in the target database: Any integration target database should feature
complete lineage metadata: where each data element came from, when it was loaded, and what happened to it along the way. This is even more true for data
added by interpolating from, augmenting, matching, or correcting source data. Analysts must know which data came directly from the source, which was
generated, and the confidence level of the latter.
Data replaced by enrichment must be available alongside the enriched data: Enrichment processes must store modified or
added data in such a way that analysts have access to the "raw" source data. Analysts should be able to independently test enrichment processes and
suggest improvements if needed. If, for whatever reason, enrichment doesn't meet specific analysis needs, analysts should be able to fall back to the
original source data.
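The second and third principles can be sketched as a record layout that keeps the raw value, the enriched value, lineage metadata, and a confidence level side by side. This is only an illustration under stated assumptions: the names `EnrichedField`, `enrich_city`, `value_with_floor`, and the `CITY_CORRECTIONS` table are hypothetical and do not come from any particular tool or from the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class EnrichedField:
    raw: Optional[str]                  # value exactly as it arrived from the source
    value: Optional[str]                # value analysts see by default
    source: str                         # lineage: where the raw value came from
    loaded_at: datetime                 # lineage: when it was loaded
    method: Optional[str] = None        # enrichment step applied; None means untouched
    confidence: Optional[float] = None  # confidence of the enrichment, if any

    @property
    def is_generated(self) -> bool:
        """True when the visible value was produced by enrichment."""
        return self.method is not None

    def value_with_floor(self, min_confidence: float) -> Optional[str]:
        """Return the enriched value only if it is confident enough, else the raw one."""
        if self.is_generated and (self.confidence or 0.0) < min_confidence:
            return self.raw
        return self.value


# Hypothetical correction table; a real process would use a reference dataset.
CITY_CORRECTIONS = {"NYC": ("New York", 0.95), "N.Y.": ("New York", 0.60)}


def enrich_city(raw: str, source: str) -> EnrichedField:
    """Correct a city name if a rule exists, always keeping the raw value."""
    loaded_at = datetime.now(timezone.utc)
    if raw in CITY_CORRECTIONS:
        corrected, confidence = CITY_CORRECTIONS[raw]
        return EnrichedField(raw, corrected, source, loaded_at,
                             method="city_correction", confidence=confidence)
    return EnrichedField(raw, raw, source, loaded_at)


field = enrich_city("N.Y.", source="crm_export")
print(field.value, field.raw, field.is_generated, field.confidence)
print(field.value_with_floor(0.9))  # falls back to the raw value
```

Because the raw value is never overwritten, an analyst can compare it with the enriched value, test the correction rule independently, or ignore low-confidence enrichment for a specific analysis.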
Smart Data Smart Region | www.smartdata.how