
GDPR Readiness and Data Lineage for Oracle Analytics Cloud

GDPR is a business-critical topic for any corporation working with customer data of European citizens. “Who can access what, how and why” must be tracked and controlled not only in the data storage, but throughout the corporate analytical platform, such as Oracle Analytics Cloud, with its capacity for blending and analyzing data from various heterogeneous sources.
The key challenge is the complexity and non-linear nature of much of the structural information and data lineage, including the data flows, models and security concepts involved. These are most often M-to-N relationships, which makes them difficult to analyze with traditional tools. OAC contains various components that are crucial to the GDPR problem and can easily be engaged to provide a solution for full readiness.
Last but not least, while most companies have invested in securing their data storage, their data usage remains completely open, untracked and untraced. This leaves them unprepared for inquiries on the "right to an explanation", that is, how the personal data is actually used.

Christian Berg

December 05, 2018

Transcript

  1. www.dimensionality.ch @Nephentur freenode | obihackers slide 2 Who am I?

    • Oracle ACE Director Business Analytics
    • Oracle Analytics since 2001 (nQuire + Peregrin acquisitions by Siebel)
    • Speaker at OpenWorld, KScope, User Groups and open-source conferences
    • Part-time blogger on Analytics, BI, DWH, Data Science (http://dimensionality.ch)
    • Full-time IRC (freenode | #obihackers)
    • ODC and OCCC community advocate
    • Trainer for Oracle University since 2006
  2. 500+ Technical Experts Helping Peers Globally

    3 Membership Tiers
    • Oracle ACE Director
    • Oracle ACE
    • Oracle ACE Associate
    bit.ly/OracleACEProgram
    Connect: @oracleace Facebook.com/oracleaces [email protected]
    Nominate yourself or someone you know: acenomination.oracle.com
  3. Oracle Analytics Cloud

    Oracle’s complete suite of Platform Services (PaaS) for unified analytics in the cloud
    • Delivered entirely in the cloud:
      ‣ No infrastructure footprint
      ‣ Immediate scale up or down
      ‣ Simplified licensing
    • Several options to suit your needs:
      ‣ Oracle or customer managed
      ‣ 3 functional editions
  4. Functionalities

    OAC supports every type of analytics workload
    • Classic enterprise BI:
      ‣ Analysis & dashboarding
      ‣ Published reporting
      ‣ Enterprise Performance Management
    • Modern departmental/personal discovery:
      ‣ Extended data mashup & modelling
      ‣ Data preparation, exploration & visualisation
      ‣ Data science & machine learning
  5. Classic Enterprise BI

    • Similar User Experience to OBIEE 12c
      – Centrally maintained & governed
      – Semantic model remains key
    • Interactive Dashboards
      – Ideal for KPI measurement & monitoring
      – Guided navigation paths
    • BI Publisher
      – Highly formatted, burst outputs
    • Action Framework
      – Navigation actions
      – Scheduled agents
  6. Modern Data Discovery

    • Data Preparation
      – Acquire data from multiple connections
      – Apply enrichments to data prior to analysis
      – Define repeatable preparation flows
    • Data Visualisation
      – Create visual insights rapidly
      – Construct narrated storyboards
      – Share findings
    • Machine Learning
      – Build & train ML models
      – Apply model to new data sets
  7. Mobile Options

    • Mobile Web & BI Mobile App
      – All DV projects will auto-render on mobile devices
      – The heritage mobile app supports all OAC content
    • Synopsis Mobile App
      – Automatic Excel/CSV ingestion & analysis
      – Extending to all DV supported sources
    • Day by Day
      – Included within Enterprise Edition
      – Search driven analytics
      – Voice recognition allows you to verbalise questions
      – Embedded learning enables a tailored experience
  8. Two Service Options

    Analytics Cloud
    Autonomous Analytics Cloud
    Services managed by Oracle: Backup & Recovery, Service Monitoring, Patching & Upgrades, Test & Production instances
    Based on Oracle Cloud Infrastructure (OCI)
    Services managed by You:
    Based on Oracle Cloud Infrastructure Classic
    Out of scope … for the moment
  9. GDPR – the stuff you’ve heard 500 times

    General Data Protection Regulation
    – Approved by the EU Parliament in April 2016
    – It is already in place!
    – Enforcement date: 25 May 2018 (fines started from that date)
    Some key points to remember:
    – The same across the European Union
    – “Personal data”: any information relating to a person who can be identified (directly or indirectly)
    – Fines: a lot of money! Up to €20 million or 4% of global revenue (the greater of the two)
    – Data Protection Officer
    – Privacy management
    – Breach & Notification
    – Data subject access requests
    – Data retention
    – Right to be forgotten
    Still haven’t heard of many big lawsuits, right? So everybody did everything correctly and we’re cool, right? Not really, no…
  10. GDPR

    Trying to keep it simple:
    • Know where the data is stored in your company
    • Who has access (can’t allow full DB access anymore)
    Over the last 12-24 months GDPR has been a key topic at conferences … mainly in the database track
    • Which DB stores what?
    • Who has access to the DB?
    ERP/CRM streams also covered the topic, as they are often the entry point where data is gathered
  11. Overconfidence

    The DB stores all data? GDPR compliance is easy! I control my DB, I control security in my DB, I can do auditing on it. Nothing to worry about.
    GDPR compliant
  12. Analytics and GDPR articles of law

    § Article 6 – Lawfulness of processing
    § Article 18 – Right to restriction of processing
    § Article 21 – Right to object
    § Article 22 – Automated individual decision-making, including profiling
  13. Analytics and GDPR issues

    Data multiplication / balkanization
    Often purposeless storage
  14. Analytics and GDPR issues

    The human factor:
    • “I may need it”
    • “IT takes too long”
    • “If I have it I can do what I want”
  15. Analytics and GDPR issues

    Post-fact data modelling / mashups
    Data prep / data enrichment
    Citizen data science capabilities
    Pretty much all the cool toys we always wanted. Thanks, GDPR…
  16. Analytics and GDPR issues

    “Data preparation—also known as data enrichment—isn't new. In fact, I can almost guarantee that every analytics deployment out there has their users doing some kind of data preparation to support their visualizations.” -- Barry Mostert, Oracle
  17. Graph Database – What’s that?

    • vertex (node): vertex ID, vertex properties
    • edge: edge ID, edge label, edge properties
    • directed edge
  18. Graph Database key factors

    • No “model first”
    • No predefined schema needed
    • Completely flexible and extensible
    • Alleviates painful (relational) shortcomings
  19. Graph Examples

    Travelling from a location A to a location B: finding the shortest path between 2 nodes of the graph
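The shortest-path example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the road network `ROADS` and the function name `shortest_path` are made up for this note, and the search is a plain breadth-first search over unweighted, directed edges (fewest hops), not whatever algorithm a given graph product uses.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over directed (source, target) edges.

    Returns the list of nodes on a path with the fewest hops,
    or None when the goal cannot be reached from the start.
    """
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    previous = {start: None}  # node -> node we arrived from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical network: two routes from A to B, one of them shorter.
ROADS = [("A", "C"), ("C", "D"), ("D", "B"), ("A", "E"), ("E", "B")]

if __name__ == "__main__":
    print(shortest_path(ROADS, "A", "B"))
```

The same traversal idea carries over to lineage questions later in the deck: locations become columns, analyses or users, and roads become "mapped to" or "member of" edges.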
  20. Graphs for Auditing – Analytics

    Physical column (Physical layer)
    Logical column (BMM layer)
    mapped to
    Presentation column (Presentation layer)
    reference
    Analysis (Catalog)
    reference
    Dashboard page (Catalog)
    page contains
    Dashboard (Catalog)
    contains
    Application role (Security)
    Catalog ACL
    LDAP Group (LDAP)
    member of
    LDAP User (LDAP)
    member of
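To make the chain on this slide concrete, here is a minimal Python sketch of the audit question "which users can reach a given physical column?". The node types and edge labels are taken from the slide; the concrete node names (for example `CUSTOMERS.EMAIL`, `lemerson`), the direction of each edge and the helper names are assumptions made for illustration only.

```python
# Edges as (source, label, target). Labels follow the slide;
# node names and edge directions are illustrative assumptions.
EDGES = [
    ("logical:Customer.Email", "mapped to", "physical:CUSTOMERS.EMAIL"),
    ("presentation:Customer.Email", "mapped to", "logical:Customer.Email"),
    ("analysis:Churn list", "reference", "presentation:Customer.Email"),
    ("page:Churn overview", "reference", "analysis:Churn list"),
    ("dashboard:Retention", "page contains", "page:Churn overview"),
    ("role:Marketing Analyst", "Catalog ACL", "dashboard:Retention"),
    ("group:marketing", "member of", "role:Marketing Analyst"),
    ("user:lemerson", "member of", "group:marketing"),
    ("user:jdoe", "member of", "group:finance"),  # no path to the column
]


def upstream(edges, target):
    """Return every node from which `target` can be reached."""
    incoming = {}
    for source, _label, dest in edges:
        incoming.setdefault(dest, set()).add(source)

    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for source in incoming.get(node, ()):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


def users_with_access(edges, column):
    """Users that reach `column` through any chain of edges."""
    return sorted(n for n in upstream(edges, column) if n.startswith("user:"))


if __name__ == "__main__":
    print(users_with_access(EDGES, "physical:CUSTOMERS.EMAIL"))
```

Walking the edges backwards from the column answers "who can access what"; walking them forwards from a user would answer the reverse question with the same data.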
  21. Graphs for Auditing – Security

    Systems
    Files
    Data / content
    Functional
    Reports
  22. Graphs for Auditing – Data Lineage

    Property graph for auditing from a GDPR point of view?
    • Data creation / acquisition
    • Data is moved around
    • Data is transformed
    • Data is consumed by users or other processes
    Data lineage is a perfect match for a graph. Data lifecycle steps can be tracked and navigated, node by node, following edges and using properties.
  23. Graphs for Auditing – Database

    A database can be seen as…
    • A set of schemas
    • A schema can have one or many tables
    • A table has columns
    • Various users/schemas can have access to some objects
    • Objects can be used by other objects
      – Synonyms, views etc.
    • Users run queries using objects
    • Users can generate “new” data from the results of queries
    Again – perfect match for a graph database to track data lineage!
  24. Graphs for Auditing – ETL / ELT

    Breaks down into
    1. Source(s)
    2. Target(s)
    3. Transformations
    Perfect match for a graph database to track data lineage!
  25. Data Lineage on Steroids

    Data lineage graph size:
    OBIEE (12.2.1.1.0) Sample Application v607:
    – 45'700 nodes
    – 105'406 edges
    OBIA (BIAPPS 10.2) RPD + Catalog on OAC (no security):
    – 850'393 nodes
    – 1'717'554 edges
  26. Graph in Oracle DB – Creation

    What you need:
    • Oracle Database 12c R2 or newer
    • Extended Data Types (to have varchar of more than 4’000)
    BEGIN OPG_APIS.CREATE_PG('sa607', 4, 8, ''); END;
    GE$ : edges of the graph
    VT$ : vertices of the graph
    GT$ : graph skeleton
    IT$ : text index metadata
    SS$ : graph snapshots
  27. Graph in Oracle DB – Loading

    The graph is loaded by SQL, doing standard “INSERT” into the tables
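As a rough illustration of "loading by plain INSERT", the sketch below uses Python's built-in `sqlite3` module as a stand-in for an Oracle connection. The tables `lineage_vt` and `lineage_ge` are simplified, hypothetical substitutes for the vertex and edge tables mentioned on the previous slide; the real tables carry more columns (typing, labels, timestamps), so treat the column list here as an assumption rather than the actual schema.

```python
import sqlite3


def create_graph_tables(conn):
    """Create simplified stand-ins for the vertex and edge tables."""
    # One row per vertex property: vertex id, property key, property value.
    conn.execute("CREATE TABLE lineage_vt (vid INTEGER, k TEXT, v TEXT)")
    # One row per edge: edge id, source vertex, destination vertex, label.
    conn.execute(
        "CREATE TABLE lineage_ge "
        "(eid INTEGER PRIMARY KEY, svid INTEGER, dvid INTEGER, el TEXT)"
    )


def load_graph(conn, vertex_rows, edge_rows):
    """Load vertices and edges with ordinary INSERT statements."""
    conn.executemany(
        "INSERT INTO lineage_vt (vid, k, v) VALUES (?, ?, ?)", vertex_rows
    )
    conn.executemany(
        "INSERT INTO lineage_ge (eid, svid, dvid, el) VALUES (?, ?, ?, ?)",
        edge_rows,
    )
    conn.commit()


# Hypothetical sample: a logical column mapped to a physical column.
VERTEX_ROWS = [
    (1, "name", "CUSTOMERS.EMAIL"),
    (1, "type", "Physical column"),
    (2, "name", "Customer.Email"),
    (2, "type", "Logical column"),
]
EDGE_ROWS = [(100, 2, 1, "mapped to")]


def demo():
    conn = sqlite3.connect(":memory:")
    create_graph_tables(conn)
    load_graph(conn, VERTEX_ROWS, EDGE_ROWS)
    return conn


if __name__ == "__main__":
    connection = demo()
    print(connection.execute("SELECT * FROM lineage_ge").fetchall())
```

Because the graph is just rows, bulk loads and corrections can reuse whatever SQL tooling is already in place, which is the point the slide makes.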
  28. Graph in Oracle DB – Loading

    • The graph can be loaded by Java / Python using one of the “utility” methods of the OraclePropertyGraphUtils class
    • In Python it can be done by using JPype or the new GraalVM released by Oracle not long ago (warning: Python support in GraalVM is still limited and “fragile”)
    • Example utility methods:
      – OraclePropertyGraphUtils.convertRDBMSTable2OPV
      – OraclePropertyGraphUtils.convertRDBMSTable2OPE
      – OraclePropertyGraphUtils.convertCSV2OPV
      – OraclePropertyGraphUtils.convertCSV2OPE
    • More methods exist to generate 2 files in the OPV/OPE format (flat text files: one for vertices, one for edges)
  29. Graph in Oracle DB – Query limitations

    Classic SQL is limited
    – Analyse edges/vertices properties and labels
    – Counting
    – Find simple connection like (A) –[connected to]-> (B)
    – More complex paths require hierarchical queries as the edges “map” a source vertex to a target vertex
  30. Graph in Oracle DB – Querying

    Support for specialised graph algorithms: https://docs.oracle.com/en/database/oracle/oracle-database/12.2/spgdg/OPG_APIS-reference.html
  31. Graph in Oracle DB – Advantages

    • Standard manipulation of data by SQL can be useful (mass updates)
    • Define SCD2-like management of data with effective date columns to keep the graph smaller (instead of full snapshots all the time)
    • NOT supported by “vanilla graph”
    • Standard backup and restore (remember: it’s “just” tables in the DB … kind of)
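The SCD2-like idea can be sketched as follows: instead of storing a full snapshot of the graph per day, each edge carries a validity interval and a point-in-time view is derived by filtering. The class `VersionedEdge`, its field names and the sample history are hypothetical and only illustrate the principle in Python; in the database this would be effective-date columns on the edge rows.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class VersionedEdge:
    source: str
    target: str
    label: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"


def edges_as_of(edges: List[VersionedEdge], day: date) -> List[VersionedEdge]:
    """Edges that were valid on the given day (valid_to is exclusive)."""
    return [
        e for e in edges
        if e.valid_from <= day and (e.valid_to is None or day < e.valid_to)
    ]


def end_edge(edges: List[VersionedEdge], source: str, target: str,
             label: str, day: date) -> None:
    """Close the current version of an edge instead of deleting it."""
    for e in edges:
        if (e.source, e.target, e.label) == (source, target, label) \
                and e.valid_to is None:
            e.valid_to = day


# Hypothetical history: a user joins a group, then leaves it.
HISTORY = [
    VersionedEdge("user:lemerson", "group:marketing", "member of",
                  date(2018, 1, 1)),
]
end_edge(HISTORY, "user:lemerson", "group:marketing", "member of",
         date(2018, 6, 1))

if __name__ == "__main__":
    print(len(edges_as_of(HISTORY, date(2018, 3, 1))))
    print(len(edges_as_of(HISTORY, date(2018, 7, 1))))
```

Keeping closed edges around is what allows an audit to answer "who had access on a given date" without storing one full graph per day.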
  32. Graph Analysis with Cytoscape

    From 45700 nodes with 105406 edges, to 85 nodes with 218 edges in seconds
    Catalog
    RPD
  33. OAC – Security

    ANY Active Directory, LDAP, DB user/group store etc.
  34. OAC – Security Inheritance

    What groups/application roles is “Leslie Emerson” part of directly or indirectly?
  35. OAC – Security Inheritance

    What groups/application roles is “Leslie Emerson” part of directly or indirectly?
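The inheritance question boils down to a transitive closure over "member of" edges. The Python sketch below separates direct from indirect memberships; the user name comes from the slide, while the group and role names in `MEMBER_OF` and the function name `memberships` are invented for illustration.

```python
# "member of" edges as principal -> list of groups/roles it belongs to.
# All group and role names here are hypothetical.
MEMBER_OF = {
    "Leslie Emerson": ["Sales Group"],
    "Sales Group": ["BI Consumer Role"],
    "BI Consumer Role": ["Authenticated User Role"],
    "Finance Group": ["BI Author Role"],
}


def memberships(member_of, principal):
    """Return (direct, indirect) sets of groups/roles for a principal."""
    direct = set(member_of.get(principal, []))
    seen = set()
    stack = list(direct)
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already handled; also protects against cycles
        seen.add(node)
        stack.extend(member_of.get(node, []))
    seen.discard(principal)  # a cyclic definition must not list the user
    return direct, seen - direct


if __name__ == "__main__":
    direct, indirect = memberships(MEMBER_OF, "Leslie Emerson")
    print(sorted(direct), sorted(indirect))
```

In a graph tool the same answer comes from selecting the user node and following outgoing "member of" edges until nothing new is found.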
  36. Conclusion: Graphs on OAC alleviate GDPR

    The free structure of a graph, representing information as connections between nodes, makes it possible to store any kind of data lineage: from DB, ETL or Analytics systems
    Graph analysis can be performed with multiple languages and tools: visually or by code/script
    Not only for GDPR: graphs can represent any kind of information
  37. GDPR Graphs Analytics

    OAC Cloud Visualization Security Data Science Self-service