Slide 1

Slide 1 text

[email protected] www.rittmanmead.com @rittmanmead 1 Analysing the Panama Papers with 
 Oracle Big Data Spatial and Graph Robin Moffatt, Rittman Mead speakerdeck.com/rmoff/ OUGN 2017

Slide 2

Slide 2 text

[email protected] www.rittmanmead.com @rittmanmead Robin Moffatt 2 • Head of R&D, Rittman Mead • Previously OBIEE/DW developer at large UK retailer • Previously SQL Server DBA, Business Objects, 
 DB2, COBOL…. • Oracle ACE • Frequent blogger : http://ritt.md/rmoff and http://rmoff.net • Twitter: @rmoff • IRC: rmoff / #obihackers / freenode

Slide 3

Slide 3 text

[email protected] www.rittmanmead.com @rittmanmead Rittman Mead 3 • Oracle Gold Partner with offices in the UK and USA • 70+ staff delivering Oracle BI, DW, Big Data and Advanced Analytics projects • Significant web presence with the Rittman Mead Blog 
 (http://www.rittmanmead.com) • Hadoop R&D lab for “dogfooding” solutions developed for customers

Slide 4

Slide 4 text

[email protected] www.rittmanmead.com @rittmanmead What is a Property Graph and Why is it So Useful? 4 • Graph enables us to answer question that relational would struggle with • You could write recursive or procedural SQL but it would be nasty. It would also be impossible to maintain and repeat at scale. • Graph-based algorithms (e.g. PageRank) enrich an existing dataset and give us additional insights into it

Slide 5

Slide 5 text

[email protected] www.rittmanmead.com @rittmanmead Property Graph Terminology • Node/Vertex - The “What” • Edge - The “How” / Relationship • (Un)Directed - The direction of the relationship • Properties - Nodes or Edges 5

Slide 6

Slide 6 text

[email protected] www.rittmanmead.com @rittmanmead Graph Analysis Uses 6

Slide 7

Slide 7 text

[email protected] www.rittmanmead.com @rittmanmead What are the “Panama Papers”? 7 • Dataset of 11.5 million documents regarding offshore entities, released in 2015 • International Consortium of Investigative Journalists (ICIJ) analysed the raw data and made available a curated set of the data • The New York Times and The Guardian among newspapers that investigated the data in depth There are legitimate uses for offshore companies and trusts. We do not intend to suggest or imply that any persons, companies or other entities included in the ICIJ Offshore Leaks Database have broken the law or otherwise acted improperly. Many people and entities have the same or similar names. https://www.theguardian.com/news/2016/apr/08/fallout-from-panama-papers-revelations-so-far-country-by-country

Slide 8

Slide 8 text

[email protected] www.rittmanmead.com @rittmanmead Oracle Big Data Spatial and Graph 8 • Store the Property Graph definition in HBase or Oracle NoSQL • API to load/modify data • In-memory analytic engine (PGX) loads graph for analysis, and provides built-in algorithm implementations • Also provides RDF and Spatial capabilities

Slide 9

Slide 9 text

[email protected] www.rittmanmead.com @rittmanmead New in Oracle 12.2 - Property Graph support 9 • Store the Property Graph definition in Oracle Database 12.2 - on-premises or cloud

Slide 10

Slide 10 text

[email protected] www.rittmanmead.com @rittmanmead Interacting with Property Graphs 10 • Apache Tinkerpop’s “Gremlin” • Java APIs for programatic access • Nascent Python support (pyopg) • Spark library available • Interactive visualisation and exploration of data with tools like Cytoscape (open source), Tom Sawyer (paid), etc

Slide 11

Slide 11 text

[email protected] www.rittmanmead.com @rittmanmead Notebooks 11 • Interactive code development & execution environment • Notebooks can be shared for others to run and reproduce findings • Apache Zeppelin and Jupyter two popular options • Working with Spatial and Graph: - pyopg/Jupyter - PGX/Zeppelin

Slide 12

Slide 12 text

[email protected] www.rittmanmead.com @rittmanmead Apache Zeppelin 12

Slide 13

Slide 13 text

[email protected] www.rittmanmead.com @rittmanmead Loading the Data 13 • Data can be loaded from various formats: - GraphML Data Format - GraphSON Data Format - GML Data Format - Oracle Flat File Format • Source data was CSV, which needed wrangling to fit a supported input format - Oracle Flat File Format was chosen - Supports highly-parallised loading in BDSG - BDSG now provides CSV->OPE/V Java API

Slide 14

Slide 14 text

[email protected] www.rittmanmead.com @rittmanmead Data Wrangling with R 14 panama_opv_ope.R https://gist.github.com/rmoff/17025830c81e60d6446e34a37273f705

Slide 15

Slide 15 text

[email protected] www.rittmanmead.com @rittmanmead Apache HBase - importFlatFiles() 15

Slide 16

Slide 16 text

[email protected] www.rittmanmead.com @rittmanmead Loading Property Graph in Oracle 12.2 16 • Same process as HBase, but different Java class • Make sure DB is configured max_string_size=extended

Slide 17

Slide 17 text

[email protected] www.rittmanmead.com @rittmanmead Property Graph in Oracle 12.2 17 • Property Graph data is stored in a set of tables,

Slide 18

Slide 18 text

[email protected] www.rittmanmead.com @rittmanmead Inspecting the Property Graph 18

Slide 19

Slide 19 text

[email protected] www.rittmanmead.com @rittmanmead Analysing the Property Graph - Zeppelin 19 • Native rendering support for resultset objects

Slide 20

Slide 20 text

[email protected] www.rittmanmead.com @rittmanmead Analysing the Data - Property Graph Query Language (PGQL) 20 • SQL-like language for querying property graph • Same SELECT .. WHERE clause pattern but with syntax for expressing graph relationships • http://pgql-lang.org/spec/1.0/

Slide 21

Slide 21 text

[email protected] www.rittmanmead.com @rittmanmead Simple PGQL 21

Slide 22

Slide 22 text

[email protected] www.rittmanmead.com @rittmanmead Powerful Predicate Support 22

Slide 23

Slide 23 text

[email protected] www.rittmanmead.com @rittmanmead PGX Built-In Algorithms 23

Slide 24

Slide 24 text

[email protected] www.rittmanmead.com @rittmanmead Analysing the Property Graph - SQL 24

Slide 25

Slide 25 text

[email protected] www.rittmanmead.com @rittmanmead SQL is OK - but PGQL is More Elegant and Powerful 25 with OfficerPR as (select V.vid, pr.pr from panamaPR pr inner join PANAMAVT$ V on pr.NODE = v.vid where v.K = 'Type' 
 and v.V = 'Officer' order by PR desc fetch first 5 rows only) select pr2.pr,v2.k,v2.v from OfficerPR pr2 inner join panamaVT$ v2 on pr2.vid = v2.vid where v2.k in ('Name','Countries'); SQL select n.pr, n.name, n.countries WHERE (n WITH Type =~ 'Officer') ORDER BY n.pr limit 5 PGQL

Slide 26

Slide 26 text

[email protected] www.rittmanmead.com @rittmanmead Exploring the Property Graph with Cytoscape 26

Slide 27

Slide 27 text

[email protected] www.rittmanmead.com @rittmanmead Exploring the Property Graph with Cytoscape 27

Slide 28

Slide 28 text

[email protected] www.rittmanmead.com @rittmanmead Exploring the Property Graph with Cytoscape 28

Slide 29

Slide 29 text

[email protected] www.rittmanmead.com @rittmanmead Layout Algorithms 29 Prefuse Force Directed Layout

Slide 30

Slide 30 text

[email protected] www.rittmanmead.com @rittmanmead Community Detection 30

Slide 31

Slide 31 text

[email protected] www.rittmanmead.com @rittmanmead Community Detection 31

Slide 32

Slide 32 text

[email protected] www.rittmanmead.com @rittmanmead Graph Beats Relational for Exploring Relationships! 32 https://panamapapers.icij.org/20160404-azerbaijan-hidden-wealth.html There are legitimate uses for offshore companies and trusts. We do not intend to suggest or imply that any persons, companies or other entities included in the ICIJ Offshore Leaks Database have broken the law or otherwise acted improperly. Many people and entities have the same or similar names.

Slide 33

Slide 33 text

[email protected] www.rittmanmead.com @rittmanmead Load two vertices 15005001,49522 33

Slide 34

Slide 34 text

[email protected] www.rittmanmead.com @rittmanmead Set up colouring & label 34

Slide 35

Slide 35 text

[email protected] www.rittmanmead.com @rittmanmead 35

Slide 36

Slide 36 text

[email protected] www.rittmanmead.com @rittmanmead expand rosamund 36

Slide 37

Slide 37 text

[email protected] www.rittmanmead.com @rittmanmead 37

Slide 38

Slide 38 text

[email protected] www.rittmanmead.com @rittmanmead 38

Slide 39

Slide 39 text

[email protected] www.rittmanmead.com @rittmanmead 39

Slide 40

Slide 40 text

[email protected] www.rittmanmead.com @rittmanmead 40

Slide 41

Slide 41 text

[email protected] www.rittmanmead.com @rittmanmead 41

Slide 42

Slide 42 text

[email protected] www.rittmanmead.com @rittmanmead expand node 42

Slide 43

Slide 43 text

[email protected] www.rittmanmead.com @rittmanmead 43

Slide 44

Slide 44 text

[email protected] www.rittmanmead.com @rittmanmead 44

Slide 45

Slide 45 text

[email protected] www.rittmanmead.com @rittmanmead 45

Slide 46

Slide 46 text

[email protected] www.rittmanmead.com @rittmanmead 46

Slide 47

Slide 47 text

[email protected] www.rittmanmead.com @rittmanmead 47

Slide 48

Slide 48 text

[email protected] www.rittmanmead.com @rittmanmead 48

Slide 49

Slide 49 text

[email protected] www.rittmanmead.com @rittmanmead 49

Slide 50

Slide 50 text

[email protected] www.rittmanmead.com @rittmanmead EOF 50 email
 [email protected] web
 http://ritt.md/rmoff http://rmoff.net twitter
 @rmoff irc
 rmoff @ #obihackers #EOF speakerdeck.com/rmoff/ https://community.oracle.com/docs/DOC-1006400