Graph Database makes Data Lineage easy

Many companies have (re)discovered Data Lineage (DL) because of the GDPR enforcement in May 2018. It is the easiest way to know, and keep control of, “who can access what, how and why” throughout a corporate analytical platform. The challenge of DL is that most data flows are non-linear, often M-to-N relationships, which makes them difficult to analyze quickly with traditional tools. A graph database is an ideal way to store and analyze the metadata collected for DL because of its modular structure of nodes and edges. We will demonstrate an implementation and analysis of DL: generating the graph, loading it into the Oracle Database, and analyzing it via SQL, a notebook (Python/Java) or visually with Cytoscape.
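To make the M-to-N point concrete, here is a minimal sketch in plain Python of how lineage questions become graph traversals. The dataset names and the `EDGES` list are hypothetical illustrations, not metadata from the talk, and the breadth-first search stands in for whatever query a graph database would run.

```python
from collections import deque

# Hypothetical lineage edges: (source, target) means "source feeds target".
# Several sources feed one report and one table feeds several reports (M-to-N).
EDGES = [
    ("crm.customers", "dwh.dim_customer"),
    ("erp.orders", "dwh.fact_sales"),
    ("dwh.dim_customer", "report.sales_by_region"),
    ("dwh.fact_sales", "report.sales_by_region"),
    ("dwh.fact_sales", "report.revenue_forecast"),
]


def build_adjacency(edges, reverse=False):
    """Map each node to the set of its direct successors (or predecessors)."""
    adjacency = {}
    for source, target in edges:
        if reverse:
            source, target = target, source
        adjacency.setdefault(source, set()).add(target)
    return adjacency


def reachable(start, adjacency):
    """Breadth-first search: every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def downstream(node, edges=EDGES):
    """Everything that directly or indirectly consumes this node."""
    return reachable(node, build_adjacency(edges))


def upstream(node, edges=EDGES):
    """Everything this node directly or indirectly depends on."""
    return reachable(node, build_adjacency(edges, reverse=True))
```

Calling `upstream("report.sales_by_region")` answers "where does this report's data come from", while `downstream("erp.orders")` answers "what is affected if this source changes". Both are the same traversal run in opposite directions.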

Gianni Ceresa

May 11, 2018
Transcript

  6. Common Enterprise Information Model: Connecting Data with Self Service Analytic Applications

    Presentation Layer (Simplified Business View): Subject Areas, Security and Roles, Preferences
    Semantic Object Layer (Business Model): Dimensions & Hierarchies, Measures & Calculations, Time Series & Aggregation
    Physical Layer (Map Physical Data): Connections, Schemas
  7. Diagram labels: mapped to, reference, page, contains, Catalog, ACL, member of
  8. (what) (who) If you are interested in how to get the metadata from OAC/OBIEE in practice, have a look at
  10. vertex (node), vertex properties, vertex ID, edge, edge label, edge properties, edge ID, directed edge
  11. BEGIN OPG_APIS.CREATE_PG('sa607', 4, 8, ''); END;

  18. • Doesn’t support loading a graph from the DB!

    • Will support loading from the DB
  20. GraalVM will make this part unnecessary thanks to its polyglot feature

    • Python will have direct access to Java objects and methods
  22. WITH properties AS (
        SELECT DISTINCT k, t, 'Vertex' AS kind FROM sa607vt$
        UNION ALL
        SELECT DISTINCT k, t, 'Edge' AS kind FROM sa607ge$
      )
      ,cfg AS (
        SELECT '.add' || kind || 'Property("' || k || '",PropertyTypeClass.'
               || CASE WHEN t = 1 THEN 'STRING' WHEN t = 5 THEN 'DATE' END
               || ')' AS prop
        FROM properties
      )
      SELECT LISTAGG(prop,'') WITHIN GROUP(ORDER BY prop) FROM cfg;
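The property graph vocabulary on slide 10 (vertex, vertex ID, vertex properties, edge, edge ID, edge label, edge properties, directed edge) can be sketched as a small in-memory data structure. This is only an illustration in plain Python, not the Oracle property graph API; the class names, the sample vertices and the `kind` property are assumptions, and the edge labels "contains" and "member of" are borrowed from slide 7.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vertex_id: int                                   # vertex ID
    properties: dict = field(default_factory=dict)   # vertex properties


@dataclass
class Edge:
    edge_id: int                                     # edge ID
    source_id: int                                   # directed edge: from this vertex ...
    target_id: int                                   # ... to this vertex
    label: str                                       # edge label
    properties: dict = field(default_factory=dict)   # edge properties


class PropertyGraph:
    """Hypothetical in-memory property graph keyed by vertex and edge IDs."""

    def __init__(self):
        self.vertices = {}
        self.edges = {}

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = Vertex(vertex_id, properties)

    def add_edge(self, edge_id, source_id, target_id, label, **properties):
        # An edge may only connect vertices that already exist.
        for endpoint in (source_id, target_id):
            if endpoint not in self.vertices:
                raise KeyError(f"unknown vertex ID: {endpoint}")
        self.edges[edge_id] = Edge(edge_id, source_id, target_id, label, properties)

    def out_neighbours(self, vertex_id, label=None):
        """Vertices reached by outgoing edges, optionally filtered by edge label."""
        return [
            self.vertices[edge.target_id]
            for edge in self.edges.values()
            if edge.source_id == vertex_id and (label is None or edge.label == label)
        ]


# Sample data (hypothetical): a page containing an analysis, a user in a role.
graph = PropertyGraph()
graph.add_vertex(1, name="Sales dashboard", kind="page")
graph.add_vertex(2, name="Revenue analysis", kind="analysis")
graph.add_vertex(3, name="alice", kind="user")
graph.add_vertex(4, name="Finance", kind="role")
graph.add_edge(100, 1, 2, "contains")
graph.add_edge(101, 3, 4, "member of", inherited=False)
```

Because edges are directed, `graph.out_neighbours(1, "contains")` returns the analysis vertex, while `graph.out_neighbours(2)` returns nothing. The direction of an edge is part of the lineage information.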
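The query on slide 22 builds a string of `.addVertexProperty(...)` and `.addEdgeProperty(...)` calls from the distinct property keys (`k`) and type codes (`t`) found in the vertex and edge tables. The sketch below mirrors that logic in plain Python so the steps are easier to follow. The function name and the input format (lists of `(k, t)` pairs) are assumptions. Only the two type codes shown on the slide (1 and 5) are mapped, and Python's default string ordering may differ from the database's `ORDER BY` under some session settings.

```python
# Type codes taken from the slide's CASE expression; other codes are not covered there.
TYPE_NAMES = {1: "STRING", 5: "DATE"}


def property_config(vertex_props, edge_props):
    """Build the concatenated '.add<Kind>Property(...)' string.

    vertex_props and edge_props are iterables of (k, t) pairs, standing in
    for the rows of the vertex and edge tables (hypothetical input format).
    """
    # SELECT DISTINCT ... UNION ALL: de-duplicate within each kind.
    rows = {(k, t, "Vertex") for k, t in vertex_props}
    rows |= {(k, t, "Edge") for k, t in edge_props}

    # An unmapped type code yields an empty type name, as the CASE without
    # ELSE does in the SQL (NULL concatenates as an empty string in Oracle).
    calls = [
        '.add{}Property("{}",PropertyTypeClass.{})'.format(kind, k, TYPE_NAMES.get(t, ""))
        for k, t, kind in rows
    ]

    # LISTAGG(prop, '') WITHIN GROUP (ORDER BY prop): sort, then join with no separator.
    return "".join(sorted(calls))


example = property_config(
    vertex_props=[("name", 1), ("name", 1), ("created", 5)],
    edge_props=[("since", 5)],
)
```

With this sample input, `example` contains one call per distinct property, edge properties first because `.addEdgeProperty` sorts before `.addVertexProperty`.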