Data Lineage made easy with Graph Databases

Data lineage has always been a topic, at least for auditing, and has come back as a key element with regulations such as GDPR. The problem is that with the multiplication of tools, sources, transformations and movements of data, it is getting harder and harder to keep a clear picture of the whole data lineage in a company, and even more complicated to use that information for auditing. This is where graph databases come in to make things easier: data lineage is by nature a graph. It is possible to model every single flow and every single component, down to a column in a database or a dashboard. Add the whole corporate security on top, with its abstraction layers of groups and roles above users, and your graph is ready for analysis. This talk covers why graph databases are a perfect match for data lineage and uses an analytical enterprise platform as an example, tracking a single column from a database table all the way into dashboards and reports, together with the respective security. (Based on the Oracle Property Graph engine PGX, with Cytoscape for visualization and OAC/OBIEE for the data lineage example.)

Gianni Ceresa

May 07, 2019


  1. Only heard about it a few times, for not much $. Did you?
  2. [Diagram: property graph model] A vertex (node) has a vertex ID and vertex properties, and a vertex can have a label. A directed edge has an edge ID, an edge label and edge properties.
  3. A shortcut is like a symbolic link: it makes something accessible with a different path. It is invisible to most users and OOTB auditing tools.
  4. An alias is an alternative way to access data. It is invisible to most users and OOTB auditing tools.
  5. From 45,700 nodes with 105,406 edges to 85 nodes with 218 edges, in seconds. [Diagram labels: Catalog, RPD, Security]
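
The reduction described on the last slide, from a full lineage graph to the small part relevant to one column, can be illustrated with a minimal sketch. It is written in plain Python and does not use PGX or Cytoscape. Every vertex name, label and edge label below (`db.SALES.AMOUNT`, `feeds`, `readable_by`, and so on) is a hypothetical example, not taken from the talk. The idea is only this: store vertices and directed edges with IDs, labels and properties, walk downstream from one column, and keep only what is reachable.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: str                      # vertex ID
    label: str                    # a vertex can have a label
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    eid: int                      # edge ID
    src: str                      # directed: src -> dst
    dst: str
    label: str
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Tiny in-memory property graph (illustrative, not the PGX API)."""

    def __init__(self):
        self.vertices = {}
        self.edges = []
        self._out = {}            # vertex ID -> outgoing edges

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = Vertex(vid, label, props)
        self._out.setdefault(vid, [])

    def add_edge(self, src, dst, label, **props):
        edge = Edge(len(self.edges), src, dst, label, props)
        self.edges.append(edge)
        self._out[src].append(edge)
        return edge

    def downstream(self, start):
        """Breadth-first search: every vertex reachable from `start`."""
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for edge in self._out[current]:
                if edge.dst not in seen:
                    seen.add(edge.dst)
                    queue.append(edge.dst)
        return seen

    def subgraph(self, keep):
        """Copy only the vertices in `keep` and the edges between them."""
        sub = PropertyGraph()
        for vid in keep:
            v = self.vertices[vid]
            sub.add_vertex(v.vid, v.label, **v.props)
        for e in self.edges:
            if e.src in keep and e.dst in keep:
                sub.add_edge(e.src, e.dst, e.label, **e.props)
        return sub


# Hypothetical lineage: column -> model column -> analysis -> dashboard,
# plus a shortcut to the dashboard and the security layer on top.
g = PropertyGraph()
g.add_vertex("db.SALES.AMOUNT", "column")
g.add_vertex("rpd.Revenue", "model_column")
g.add_vertex("analysis.Revenue by Region", "analysis")
g.add_vertex("dashboard.Sales Overview", "dashboard")
g.add_vertex("shortcut.Sales Overview", "shortcut")
g.add_vertex("role.Analyst", "role")
g.add_vertex("user.alice", "user")
g.add_vertex("db.HR.SALARY", "column")          # unrelated lineage
g.add_vertex("analysis.Payroll", "analysis")    # unrelated lineage

g.add_edge("db.SALES.AMOUNT", "rpd.Revenue", "feeds")
g.add_edge("rpd.Revenue", "analysis.Revenue by Region", "used_in")
g.add_edge("analysis.Revenue by Region", "dashboard.Sales Overview", "displayed_in")
g.add_edge("dashboard.Sales Overview", "shortcut.Sales Overview", "exposed_by")
g.add_edge("dashboard.Sales Overview", "role.Analyst", "readable_by")
g.add_edge("role.Analyst", "user.alice", "granted_to")
g.add_edge("db.HR.SALARY", "analysis.Payroll", "used_in")

reach = g.downstream("db.SALES.AMOUNT")
sub = g.subgraph(reach)
print(len(g.vertices), "vertices ->", len(sub.vertices), "vertices")
```

Because shortcuts and security objects are ordinary vertices in this sketch, the same traversal that follows the column into a dashboard also surfaces the shortcut and the users who can reach it. A graph engine would typically express the same reachability question as a path query rather than a hand-written search.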