Graph Database makes Data Lineage easy

Data Lineage (DL) has been (re)discovered by many companies because of the GDPR enforcement in May 2018. It is the easiest way to know and keep control of “who can access what, how and why” throughout a corporate analytical platform. The challenge of DL is that most data flows are non-linear, often M-to-N relationships, which makes them difficult to analyze easily and quickly with traditional tools. A Graph Database is a natural way to store and analyze the metadata collected for DL because of its modular structure composed of nodes and edges. We will demonstrate an implementation and analysis of DL: generating the graph and loading it into the Oracle Database, then analyzing it via SQL, a notebook (Python/Java) or visually with Cytoscape.

Gianni Ceresa

May 11, 2018

Transcript













  3. Common Enterprise Information Model
    Connecting Data with Self Service Analytic Applications
    Presentation Layer: Simplified Business View
    • Subject Areas
    • Security and Roles
    • Preferences
    Semantic Object Layer: Business Model
    • Dimensions & Hierarchies
    • Measures & Calculations
    • Time Series & Aggregation
    Physical Layer: Map Physical Data
    • Connections
    • Schemas

  4. [Diagram: metadata objects linked by labeled relationships: “mapped to”, “reference”, “contains” and “member of”; other labels on the slide: “page”, “Catalog ACL”]
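
The relationship labels on this slide suggest how a "who can access what" question becomes a graph traversal. The following is a minimal sketch only: the vertex names, the direction of each edge and the `reachable` helper are all hypothetical and are not taken from the deck.

```python
from collections import deque

# Hypothetical directed edges (source, label, target). Only the labels come
# from the slide; every vertex name and edge direction is an assumption.
EDGES = [
    ("alice", "member of", "BI Authors"),
    ("BI Authors", "member of", "BI Consumers"),
    ("BI Consumers", "Catalog ACL", "Sales dashboard"),
    ("Sales dashboard", "contains", "Revenue page"),
    ("Revenue page", "contains", "Revenue analysis"),
    ("Revenue analysis", "reference", "Sales.Revenue"),
    ("Sales.Revenue", "mapped to", "DWH.F_SALES.AMOUNT"),
]


def reachable(start, edges):
    """Breadth-first search returning {vertex: path of edges from start}."""
    adjacency = {}
    for source, label, target in edges:
        adjacency.setdefault(source, []).append((label, target))

    paths = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for label, target in adjacency.get(node, []):
            if target not in paths:  # first visit is a shortest path
                paths[target] = paths[node] + [(node, label, target)]
                queue.append(target)
    del paths[start]  # the start vertex is not part of its own lineage
    return paths


lineage = reachable("alice", EDGES)
for step in lineage["DWH.F_SALES.AMOUNT"]:
    print(" -[{1}]-> ".format(*step), step[0], step[2])
```

Each returned path records the "how" (the chain of groups, catalog objects and mappings) that connects a user to a physical column.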

  5. (what)
    (who)
    If you are interested in how to get the metadata from OAC/OBIEE in practice, have a look at

  6. vertex (node)
    vertex properties
    vertex ID
    edge
    edge label
    edge properties
    edge ID
    directed edge
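
The elements named on this slide (vertex ID and properties, edge ID, label, properties and direction) can be sketched as a small in-memory structure. This is an illustrative model only, not how the Oracle Database stores the graph; the class names and sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    vertex_id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    edge_id: int
    source: int  # vertex ID the directed edge starts from
    target: int  # vertex ID the directed edge points to
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, vertex_id: int, **properties: Any) -> None:
        self.vertices[vertex_id] = Vertex(vertex_id, properties)

    def add_edge(self, edge_id: int, source: int, target: int,
                 label: str, **properties: Any) -> None:
        # Both endpoints must already exist in the graph.
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(edge_id, source, target, label, properties))

    def out_edges(self, vertex_id: int) -> List[Edge]:
        """Edges leaving the given vertex (direction matters)."""
        return [e for e in self.edges if e.source == vertex_id]


# Hypothetical sample data: a dashboard page that contains an analysis.
graph = PropertyGraph()
graph.add_vertex(1, name="Sales dashboard", type="page")
graph.add_vertex(2, name="Revenue analysis", type="analysis")
graph.add_edge(100, 1, 2, "contains", position=1)
```

Because edges are directed, `graph.out_edges(1)` returns the "contains" edge while `graph.out_edges(2)` returns nothing.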

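  7. -- Create the tables backing the property graph 'sa607'.
    -- Arguments: graph name, degree of parallelism, hash partitions, tablespace.
    BEGIN
      OPG_APIS.CREATE_PG('sa607', 4, 8, '');
    END;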

  12. Doesn’t support loading a graph from DB!

    Will support loading from DB

  13. GraalVM will make this part unnecessary thanks to its polyglot feature
    • Python will have direct access to Java objects and methods

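  14. WITH properties AS (
      -- distinct property keys (k) and type codes (t) of the vertices ...
      SELECT DISTINCT k, t, 'Vertex' AS kind
      FROM sa607vt$
      UNION ALL
      -- ... and of the edges of graph sa607
      SELECT DISTINCT k, t, 'Edge' AS kind
      FROM sa607ge$
    )
    ,cfg AS (
      -- one builder call per property; only type codes 1 and 5 are mapped
      SELECT '.add' || kind || 'Property("' || k || '",PropertyTypeClass.'
        || CASE WHEN t = 1 THEN 'STRING' WHEN t = 5 THEN 'DATE' END
        || ')' AS prop
      FROM properties
    )
    -- concatenate all calls into a single string
    SELECT LISTAGG(prop,'') WITHIN GROUP(ORDER BY prop) FROM cfg;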
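
To make the output of this query easier to picture, here is a rough Python equivalent of the string it builds. It is a sketch under assumptions: the sample rows are invented, `builder_calls` is a hypothetical helper, and Python's sort order may differ from the database's `ORDER BY` depending on NLS settings.

```python
# Type codes handled by the query on the slide; any other code yields an
# empty type name, just as the CASE expression without ELSE does in Oracle.
TYPE_NAMES = {1: "STRING", 5: "DATE"}


def builder_calls(rows):
    """Build the chained '.add<Kind>Property(...)' calls.

    rows: iterable of (key, type_code, kind), kind being 'Vertex' or 'Edge'.
    """
    distinct_rows = set(rows)  # mirrors SELECT DISTINCT k, t, kind
    calls = [
        '.add{kind}Property("{key}",PropertyTypeClass.{type_name})'.format(
            kind=kind, key=key, type_name=TYPE_NAMES.get(type_code, "")
        )
        for key, type_code, kind in distinct_rows
    ]
    return "".join(sorted(calls))  # mirrors LISTAGG ... ORDER BY prop


# Invented sample rows, including a duplicate that DISTINCT would drop.
sample_rows = [
    ("name", 1, "Vertex"),
    ("name", 1, "Vertex"),
    ("created", 5, "Edge"),
]
config_snippet = builder_calls(sample_rows)
print(config_snippet)
```

The resulting string is meant to be pasted after a graph config builder expression, so that every property found in the vertex and edge tables is declared with its type.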
