Slide 1

Slide 1 text

www.dimensionality.ch @Nephentur freenode | obihackers slide 1 GDPR Readiness and Data Lineage for Oracle Analytics Cloud

Slide 2

Slide 2 text

Who am I?
• Oracle ACE Director Business Analytics
• Oracle Analytics since 2001 (nQuire + Peregrin acquisitions by Siebel)
• Speaker at OpenWorld, KScope, User Groups and open-source conferences
• Part-time blogger on Analytics, BI, DWH, Data Science (http://dimensionality.ch)
• Full-time IRC (freenode | #obihackers)
• ODC and OCCC community advocate
• Trainer for Oracle University since 2006

Slide 3

Slide 3 text

500+ Technical Experts Helping Peers Globally
bit.ly/OracleACEProgram
3 Membership Tiers
• Oracle ACE Director
• Oracle ACE
• Oracle ACE Associate
Nominate yourself or someone you know: acenomination.oracle.com
Connect: @oracleace Facebook.com/oracleaces [email protected]

Slide 4

Slide 4 text

“Thanks” to this guy

Slide 5

Slide 5 text

Oracle Analytics Cloud

Slide 6

Slide 6 text

Oracle Analytics Cloud
Oracle’s complete suite of Platform Services (PaaS) for unified analytics in the cloud
• Delivered entirely in the cloud:
  ‣ No infrastructure footprint
  ‣ Immediate scale up or down
  ‣ Simplified licensing
• Several options to suit your needs:
  ‣ Oracle or customer managed
  ‣ 3 functional editions

Slide 7

Slide 7 text

Functionalities
OAC supports every type of analytics workload
• Classic enterprise BI:
  ‣ Analysis & dashboarding
  ‣ Published reporting
  ‣ Enterprise Performance Management
• Modern departmental/personal discovery:
  ‣ Extended data mashup & modelling
  ‣ Data preparation, exploration & visualisation
  ‣ Data science & machine learning

Slide 8

Slide 8 text

Classic Enterprise BI
• Similar User Experience to OBIEE 12c
  – Centrally maintained & governed
  – Semantic model remains key
• Interactive Dashboards
  – Ideal for KPI measurement & monitoring
  – Guided navigation paths
• BI Publisher
  – Highly formatted, burst outputs
• Action Framework
  – Navigation actions
  – Scheduled agents

Slide 9

Slide 9 text

Modern Data Discovery
• Data Preparation
  – Acquire data from multiple connections
  – Apply data enrichments prior to analysis
  – Define repeatable preparation flows
• Data Visualisation
  – Create visual insights rapidly
  – Construct narrated storyboards
  – Share findings
• Machine Learning
  – Build & train ML models
  – Apply models to new data sets

Slide 10

Slide 10 text

Mobile Options
• Mobile Web & BI Mobile App
  – All DV projects will auto-render on mobile devices
  – The heritage mobile app supports all OAC content
• Synopsis Mobile App
  – Automatic Excel/CSV ingestion & analysis
  – Extending to all DV supported sources
• Day by Day
  – Included within Enterprise Edition
  – Search driven analytics
  – Voice recognition allows you to verbalise questions
  – Embedded learning enables a tailored experience

Slide 11

Slide 11 text

Two Service Options
Autonomous Analytics Cloud
• Services managed by Oracle:
  – Backup & Recovery
  – Service Monitoring
  – Patching & Upgrades
  – Test & Production instances
• Based on Oracle Cloud Infrastructure (OCI)
Analytics Cloud
• Services managed by You
• Based on Oracle Cloud Infrastructure Classic
Out of scope … for the moment

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

GDPR - Yes, this is still a topic

Slide 14

Slide 14 text

GDPR
25 May 2018

Slide 15

Slide 15 text

GDPR – the stuff you’ve heard 500 times
General Data Protection Regulation
– Approved by the EU Parliament in April 2016
– It is already in place!
– Enforcement date: 25 May 2018 (fines started from that date)
Some key points to remember:
– The same across the European Union
– “Personal data”: any information relating to a person who can be identified (directly or indirectly)
– Fines: a lot of money! Up to €20 million or 4% of global annual revenue (whichever is greater)
– Data Protection Officer
– Privacy management
– Breach & Notification
– Data subject access requests
– Data retention
– Right to be forgotten
Still haven’t heard of many big lawsuits, right? So everybody did everything correctly and we’re cool, right? Not really, no…

Slide 16

Slide 16 text

GDPR
Trying to keep it simple:
• Know where the data is stored in your company
• Know who has access (can’t allow full DB access anymore)
Over the last 12-24 months GDPR has been a key topic at conferences … mainly in the database track
• Which DB stores what?
• Who has access to the DB?
ERP/CRM streams also covered the topic, as they are often the entry point where data is gathered

Slide 17

Slide 17 text

Overconfidence
The DB stores all data? GDPR compliance is easy! I control my DB, I control security in my DB, I can do auditing on it. Nothing to worry about.
GDPR compliant

Slide 18

Slide 18 text

Analytics and GDPR articles of law
§ Article 6 – Lawfulness of processing
§ Article 18 – Right to restriction of processing
§ Article 21 – Right to object
§ Article 22 – Automated individual decision-making, including profiling

Slide 19

Slide 19 text

Analytics and GDPR issues
Data multiplication / balkanization
Often purposeless storage

Slide 20

Slide 20 text

Analytics and GDPR issues
The human factor:
• “I may need it”
• “IT takes too long”
• “If I have it I can do what I want”

Slide 21

Slide 21 text

Analytics and GDPR issues
Post-fact data modelling / mashups
Data prep / data enrichment
Citizen data science capabilities
Pretty much all the cool toys we always wanted. Thanks, GDPR…

Slide 22

Slide 22 text

Analytics and GDPR issues
“Data preparation—also known as data enrichment—isn't new. In fact, I can almost guarantee that every analytics deployment out there has their users doing some kind of data preparation to support their visualizations.” -- Barry Mostert, Oracle

Slide 23

Slide 23 text

And then there’s “Analytics”…
GDPR

Slide 24

Slide 24 text

Context set, time to dive into graphs …
* source neo4j

Slide 25

Slide 25 text

THIS is not a “Graph”

Slide 26

Slide 26 text

Graph Database – What’s that?
Diagram labels:
• vertex (node)
• vertex properties
• vertex ID
• edge
• edge label
• edge properties
• edge ID
• directed edge
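The elements named on this slide can be sketched as a tiny in-memory property graph. This is an illustration only: the class and method names (`Vertex`, `Edge`, `PropertyGraph`, `add_vertex`, `add_edge`, `out_edges`) are hypothetical and do not come from any graph product.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    vertex_id: int                                             # vertex ID
    properties: Dict[str, Any] = field(default_factory=dict)   # vertex properties


@dataclass
class Edge:
    edge_id: int                                               # edge ID
    source: int                                                # vertex the directed edge leaves
    target: int                                                # vertex the directed edge points to
    label: str                                                 # edge label
    properties: Dict[str, Any] = field(default_factory=dict)   # edge properties


@dataclass
class PropertyGraph:
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, vertex_id: int, **properties: Any) -> Vertex:
        vertex = Vertex(vertex_id, dict(properties))
        self.vertices[vertex_id] = vertex
        return vertex

    def add_edge(self, edge_id: int, source: int, target: int,
                 label: str, **properties: Any) -> Edge:
        # Both endpoints must already exist; there is no other schema to satisfy.
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        edge = Edge(edge_id, source, target, label, dict(properties))
        self.edges.append(edge)
        return edge

    def out_edges(self, vertex_id: int) -> List[Edge]:
        """Edges leaving the given vertex (direction matters)."""
        return [e for e in self.edges if e.source == vertex_id]
```

Properties are free-form dictionaries, so new attributes can be attached to any vertex or edge without changing a schema.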

Slide 27

Slide 27 text

Graph Database key factors
• No “model first”
• No predefined schema needed
• Completely flexible and extensible
• Alleviates painful (relational) shortcomings

Slide 28

Slide 28 text

Graph Examples
Travelling from a location A to a location B: finding the shortest path between 2 nodes of the graph
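As a minimal sketch of the A-to-B example, a breadth-first search finds a path with the fewest edges in an unweighted graph. The road network and the function name `shortest_path` are made up for illustration; weighted distances would need a different algorithm such as Dijkstra's.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Return one path with the fewest edges from start to goal, or None."""
    previous: Dict[str, Optional[str]] = {start: None}   # node -> node we came from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the "previous" links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical directed road network (adjacency lists).
roads = {
    "A": ["C", "D"],
    "C": ["E"],
    "D": ["B"],
    "E": ["B"],
}

print(shortest_path(roads, "A", "B"))   # ['A', 'D', 'B']
```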

Slide 29

Slide 29 text

Graphs for Auditing – Analytics
Nodes: Physical column (Physical layer), Logical column (BMM layer), Presentation column (Presentation layer), Analysis (Catalog), Dashboard page (Catalog), Dashboard (Catalog), Application role (Security), LDAP Group (LDAP), LDAP User (LDAP)
Edge labels: mapped to, reference, page contains, contains, Catalog ACL, member of

Slide 30

Slide 30 text

Graphs for Auditing – Security Systems Files Data / content Functional Reports

Slide 31

Slide 31 text

Graphs for Auditing – Data Lineage
Property graph for auditing from a GDPR point of view?
• Data creation / acquisition
• Data is moved around
• Data is transformed
• Data is consumed by users or other processes
Data lineage is a perfect match for a graph. Data lifecycle steps can be tracked and navigated, node by node, following edges and using properties.
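Navigating lineage "node by node, following edges" can be sketched as a simple downstream traversal: starting from one piece of data, collect everything that is derived from it. All node names, edge labels and the function `downstream` below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (source, edge label, target): invented lineage for illustration only.
LINEAGE_EDGES: List[Tuple[str, str, str]] = [
    ("crm.customers.email", "read by", "etl.load_customers"),
    ("etl.load_customers", "writes", "dwh.dim_customer.email"),
    ("dwh.dim_customer.email", "mapped to", "rpd.Customer.Email"),
    ("rpd.Customer.Email", "referenced by", "catalog.Churn Analysis"),
    ("dwh.dim_product.name", "mapped to", "rpd.Product.Name"),
]


def downstream(edges: List[Tuple[str, str, str]], start: str) -> Set[str]:
    """Return every node reachable from start by following edge direction."""
    outgoing: Dict[str, List[str]] = defaultdict(list)
    for source, _label, target in edges:
        outgoing[source].append(target)

    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for target in outgoing.get(node, []):
            if target not in seen:        # guards against cycles
                seen.add(target)
                stack.append(target)
    return seen


for node in sorted(downstream(LINEAGE_EDGES, "crm.customers.email")):
    print(node)
```

Reversing the edge direction in the same traversal would answer the opposite question: where did the data behind a given report come from?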

Slide 32

Slide 32 text

Graphs for Auditing – Database
A database can be seen as…
• A set of schemas
• A schema can have one or many tables
• A table has columns
• Various users/schemas can have access to some objects
• Objects can be used by other objects
  – Synonyms, views etc.
• Users run queries using objects
• Users can generate “new” data from the results of queries
Again – perfect match for a graph database to track data lineage!

Slide 33

Slide 33 text

Graphs for Auditing – ETL / ELT
Breaks down into
1. Source(s)
2. Target(s)
3. Transformations
Perfect match for a graph database to track data lineage!

Slide 34

Slide 34 text

Graphs for Auditing – ETL / ELT

Slide 35

Slide 35 text

Data Lineage on Steroids
Data lineage graph size:
OBIEE (12.2.1.1.0) Sample Application v607:
– 45'700 nodes
– 105'406 edges
OBIA (BIAPPS 10.2) RPD + Catalog on OAC (no security):
– 850'393 nodes
– 1'717'554 edges

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

Pushing the Graph to the database

Slide 38

Slide 38 text

Graph in Oracle DB – Creation
What you need:
• Oracle Database 12c R2 or newer
• Extended Data Types (to have varchar of more than 4’000)

    BEGIN
      OPG_APIS.CREATE_PG('sa607', 4, 8, '');
    END;

GE$ : edges of the graph
VT$ : vertices of the graph
GT$ : graph skeleton
IT$ : text index metadata
SS$ : graph snapshots

Slide 39

Slide 39 text

Graph in Oracle DB – Loading
The graph is loaded by SQL, doing standard “INSERT” into the tables
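The "plain INSERT" idea can be sketched with Python's built-in `sqlite3` module standing in for the database. Loud assumption: the tables `demo_vertices` and `demo_edges` and their columns are simplified and invented for this sketch; they are NOT the real column layout of the VT$ / GE$ tables, which should be taken from the Oracle documentation.

```python
import sqlite3

# Simplified, hypothetical tables: NOT the real VT$/GE$ column layout.
SCHEMA = """
CREATE TABLE demo_vertices (vertex_id INTEGER, prop_key TEXT, prop_value TEXT);
CREATE TABLE demo_edges (edge_id INTEGER PRIMARY KEY, source_id INTEGER,
                         target_id INTEGER, label TEXT);
"""


def load_graph(conn, vertex_props, edges):
    """Load a graph with ordinary parameterised INSERT statements."""
    conn.executemany("INSERT INTO demo_vertices VALUES (?, ?, ?)", vertex_props)
    conn.executemany("INSERT INTO demo_edges VALUES (?, ?, ?, ?)", edges)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_graph(
    conn,
    # One row per vertex property, so a vertex with two properties has two rows.
    vertex_props=[
        (1, "name", "Physical column"),
        (1, "layer", "Physical"),
        (2, "name", "Logical column"),
    ],
    edges=[(100, 1, 2, "mapped to")],
)
```

Because the graph lives in ordinary tables, any tool that can issue INSERT statements can load it.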

Slide 40

Slide 40 text

Graph in Oracle DB – Loading
• The graph can be loaded by Java / Python using one of the “utility” methods of the OraclePropertyGraphUtils class
• In Python it can be done by using JPype or the new GraalVM released by Oracle not long ago (warning: Python support in GraalVM is still limited and “fragile”)
• Example utility methods:
  – OraclePropertyGraphUtils.convertRDBMSTable2OPV
  – OraclePropertyGraphUtils.convertRDBMSTable2OPE
  – OraclePropertyGraphUtils.convertCSV2OPV
  – OraclePropertyGraphUtils.convertCSV2OPE
• More methods exist to generate 2 files in the OPV/OPE format (flat text files: one for vertices, one for edges)

Slide 41

Slide 41 text

Graph in Oracle DB – Query limitations
Classic SQL is limited
– Analyse edge/vertex properties and labels
– Counting
– Find simple connections like (A) –[connected to]-> (B)
– More complex paths require hierarchical queries, as the edges “map” a source vertex to a target vertex
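The difference between a simple connection and a longer path can be sketched with SQLite (via Python's `sqlite3`) as a stand-in for the database. The table `demo_edges` and its columns are hypothetical, not the real GE$ layout; in Oracle the recursive part would be written as a hierarchical query (`CONNECT BY`) or a recursive `WITH` clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demo_edges (source_id TEXT, target_id TEXT, label TEXT);
INSERT INTO demo_edges VALUES
    ('A', 'B', 'connected to'),
    ('B', 'C', 'connected to'),
    ('C', 'D', 'connected to'),
    ('D', 'B', 'connected to'),   -- closes a cycle B -> C -> D -> B
    ('X', 'Y', 'connected to');
""")

# Simple connection (A) -[connected to]-> (?): a single plain lookup is enough.
direct = conn.execute(
    "SELECT target_id FROM demo_edges WHERE source_id = ? AND label = ?",
    ("A", "connected to"),
).fetchall()

# Longer paths: the edge table has to be walked recursively.
# UNION (not UNION ALL) drops rows already found, so the cycle terminates.
REACHABLE_SQL = """
WITH RECURSIVE reach(node) AS (
    SELECT target_id FROM demo_edges WHERE source_id = :start
    UNION
    SELECT e.target_id FROM demo_edges e JOIN reach r ON e.source_id = r.node
)
SELECT node FROM reach ORDER BY node
"""


def reachable(conn, start):
    """All nodes reachable from start over one or more edges."""
    return [row[0] for row in conn.execute(REACHABLE_SQL, {"start": start})]


print(direct)                 # [('B',)]
print(reachable(conn, "A"))   # ['B', 'C', 'D']
```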

Slide 42

Slide 42 text

Graph in Oracle DB – Querying

Slide 43

Slide 43 text

Graph in Oracle DB – Querying
Support for specialised graph algorithms: https://docs.oracle.com/en/database/oracle/oracle-database/12.2/spgdg/OPG_APIS-reference.html

Slide 44

Slide 44 text

Graph in Oracle DB – Advantages
• Standard manipulation of data by SQL can be useful (mass updates)
• SCD2-like management of data with effective date columns keeps the graph smaller (instead of full snapshots all the time)
  – NOT supported by a “vanilla graph”
• Standard backup and restore (remember: it’s “just” tables in the DB … kind of)
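The SCD2-like idea can be sketched as edges that carry effective dates: a change closes the old edge and adds a new one, and the graph "as of" any day is a simple filter rather than a full snapshot. The names `VersionedEdge`, `close_edge` and `edges_as_of`, as well as the sample data, are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class VersionedEdge:
    source: str
    target: str
    label: str
    valid_from: date                   # first day the edge is effective
    valid_to: Optional[date] = None    # first day it is no longer effective; None = current


def close_edge(edges: List[VersionedEdge], source: str, target: str,
               label: str, end: date) -> None:
    """End-date the current version of an edge instead of deleting it."""
    for edge in edges:
        if (edge.source, edge.target, edge.label) == (source, target, label) \
                and edge.valid_to is None:
            edge.valid_to = end


def edges_as_of(edges: List[VersionedEdge], day: date) -> List[VersionedEdge]:
    """Edges that were effective on the given day."""
    return [e for e in edges
            if e.valid_from <= day and (e.valid_to is None or day < e.valid_to)]


# Invented history: a user moves from one role to another on 1 June 2018.
history = [VersionedEdge("leslie", "BI Authors", "member of", date(2018, 1, 1))]
close_edge(history, "leslie", "BI Authors", "member of", date(2018, 6, 1))
history.append(VersionedEdge("leslie", "BI Consumers", "member of", date(2018, 6, 1)))

print([e.target for e in edges_as_of(history, date(2018, 3, 1))])   # ['BI Authors']
print([e.target for e in edges_as_of(history, date(2018, 7, 1))])   # ['BI Consumers']
```

Only two rows are stored for the whole history, whereas full snapshots would duplicate every unchanged edge each time.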

Slide 45

Slide 45 text

Connecting all the pieces

Slide 46

Slide 46 text

Graph Analysis with Cytoscape

Slide 47

Slide 47 text

Graph Analysis with Cytoscape

Slide 48

Slide 48 text

Graph Analysis with Cytoscape
From 45700 nodes with 105406 edges, to 85 nodes with 218 edges in seconds
Diagram labels: Catalog, RPD

Slide 49

Slide 49 text

OAC – Catalog Structures and Shortcuts

Slide 50

Slide 50 text

OAC – Catalog Structures and Shortcuts

Slide 51

Slide 51 text

OAC – RPD Aliases

Slide 52

Slide 52 text

OAC – Security

Slide 53

Slide 53 text

OAC – Security
ANY Active Directory, LDAP, DB user/group store etc.

Slide 54

Slide 54 text

OAC – Security

Slide 55

Slide 55 text

OAC – Security

Slide 56

Slide 56 text

OAC – Security

Slide 57

Slide 57 text

OAC – Security

Slide 58

Slide 58 text

OAC – Security Inheritance
Which groups/application roles is “Leslie Emerson” part of, directly or indirectly?
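The question on this slide amounts to following "member of" edges transitively. A minimal sketch, with invented group and role names (only the user name comes from the slide) and a hypothetical function `all_memberships`:

```python
from typing import Dict, List, Set

# "member of" edges: member -> groups / application roles (invented data).
MEMBER_OF: Dict[str, List[str]] = {
    "Leslie Emerson": ["Sales Analysts"],
    "Sales Analysts": ["BI Content Authors"],
    "BI Content Authors": ["BI Consumers"],
    "BI Consumers": ["Authenticated Users"],
    "Finance Team": ["BI Consumers"],
}


def all_memberships(member_of: Dict[str, List[str]], principal: str) -> Dict[str, Set[str]]:
    """Split a principal's memberships into direct and inherited (indirect) ones."""
    direct = set(member_of.get(principal, []))
    seen: Set[str] = set()
    stack = list(direct)
    while stack:
        group = stack.pop()
        if group in seen:              # already expanded; also protects against cycles
            continue
        seen.add(group)
        stack.extend(member_of.get(group, []))
    return {"direct": direct, "indirect": seen - direct}


result = all_memberships(MEMBER_OF, "Leslie Emerson")
print(sorted(result["direct"]))     # ['Sales Analysts']
print(sorted(result["indirect"]))   # ['Authenticated Users', 'BI Consumers', 'BI Content Authors']
```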

Slide 59

Slide 59 text

OAC – Security Inheritance
Which groups/application roles is “Leslie Emerson” part of, directly or indirectly?

Slide 60

Slide 60 text

OAC – Security Inheritance

Slide 61

Slide 61 text

OAC – Data Sets

Slide 62

Slide 62 text

OAC – Data Sets

Slide 63

Slide 63 text

OAC – Data Sets
The Excel mafia is alive and kicking

Slide 64

Slide 64 text

OAC – Data Sets

Slide 65

Slide 65 text

Conclusion: Graphs on OAC alleviate GDPR
The free structure of a graph, which represents information as connections between nodes, makes it possible to store any kind of data lineage: from a DB, ETL or analytics system
Graph analysis can be performed with multiple languages and tools: visually or by code/script
Not only for GDPR: graphs can represent any kind of information

Slide 66

Slide 66 text

GDPR Graphs Analytics OAC Cloud Visualization Security Data Science Self-service