Analysing the Panama Papers with Oracle Big Data Spatial and Graph

Oracle Big Data Spatial and Graph enables analysis of datasets that goes beyond the standard relational analytics in common use. Graph technology can identify relationships that might otherwise go unnoticed. Practical uses include product recommendations, social network analysis, and fraud detection.

In this presentation we will see a practical demonstration of using Oracle Big Data Spatial and Graph to load and analyse the "Panama Papers" dataset. Graph algorithms will be used to identify key actors and organisations within the data, and to show patterns of relationships. This practical example will give attendees a clear idea of the tool's functionality and how it could be used within their own organisation.

Robin Moffatt

March 09, 2017

Transcript

  1. [email protected] www.rittmanmead.com @rittmanmead
    Analysing the Panama Papers with Oracle Big Data Spatial and Graph
    Robin Moffatt, Rittman Mead
    speakerdeck.com/rmoff/
    OUGN 2017
  2. Robin Moffatt
    • Head of R&D, Rittman Mead
    • Previously OBIEE/DW developer at large UK retailer
    • Previously SQL Server DBA, Business Objects, DB2, COBOL…
    • Oracle ACE
    • Frequent blogger: http://ritt.md/rmoff and http://rmoff.net
    • Twitter: @rmoff
    • IRC: rmoff / #obihackers / freenode
  3. Rittman Mead
    • Oracle Gold Partner with offices in the UK and USA
    • 70+ staff delivering Oracle BI, DW, Big Data and Advanced Analytics projects
    • Significant web presence with the Rittman Mead Blog (http://www.rittmanmead.com)
    • Hadoop R&D lab for “dogfooding” solutions developed for customers
  4. What is a Property Graph and Why is it So Useful?
    • Graph enables us to answer questions that relational would struggle with
    • You could write recursive or procedural SQL, but it would be nasty. It would also be very hard to maintain and to repeat at scale.
    • Graph-based algorithms (e.g. PageRank) enrich an existing dataset and give us additional insights into it
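To make the PageRank bullet concrete, here is a minimal sketch of the algorithm as a plain power iteration in Python. It is not the PGX implementation used in the talk; the function name, the damping factor of 0.85, the iteration count, and the toy officer/entity edges are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        # Every other node shares its rank equally among the nodes it points to.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical toy data: three officers pointing at two entities.
edges = [
    ("officer_a", "entity_x"),
    ("officer_b", "entity_x"),
    ("officer_c", "entity_x"),
    ("officer_c", "entity_y"),
]
ranks = pagerank(edges)
top = max(ranks, key=ranks.get)
```

The resulting score per node is the kind of derived property that can be written back onto the graph and then queried, which is the sense in which the algorithm "enriches" the dataset.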
  5. Property Graph Terminology
    • Node/Vertex - The “What”
    • Edge - The “How” / Relationship
    • (Un)Directed - Whether the relationship has a direction
    • Properties - Key/value attributes attached to Nodes or Edges
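The terminology above can be sketched as a small in-memory data structure. This is only an illustration in Python, not how Oracle Big Data Spatial and Graph stores a graph; the class names, the `officer_of` label, and the sample property values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A node: the 'what'. Carries arbitrary key/value properties."""
    vid: int
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """An edge: the 'how'. Links two vertices and may carry properties too."""
    source: int
    target: int
    label: str
    directed: bool = True
    properties: dict = field(default_factory=dict)


# Hypothetical sample data, not taken from the real dataset.
vertices = {
    1: Vertex(1, {"Type": "Officer", "Name": "Example Person"}),
    2: Vertex(2, {"Type": "Entity", "Name": "Example Holdings Ltd"}),
}
edges = [Edge(1, 2, "officer_of", properties={"since": 2010})]


def neighbours(vid):
    """Vertex ids reachable from vid, honouring direction where an edge has one."""
    found = []
    for e in edges:
        if e.source == vid:
            found.append(e.target)
        elif e.target == vid and not e.directed:
            found.append(e.source)
    return found
```

With a directed `officer_of` edge, `neighbours(1)` reaches vertex 2 but `neighbours(2)` does not reach vertex 1; marking the edge as undirected would make the relationship traversable both ways.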
  6. What are the “Panama Papers”?
    • Dataset of 11.5 million documents regarding offshore entities, leaked in 2015 and made public in April 2016
    • International Consortium of Investigative Journalists (ICIJ) analysed the raw data and made available a curated set of the data
    • The New York Times and The Guardian were among the newspapers that investigated the data in depth
    There are legitimate uses for offshore companies and trusts. We do not intend to suggest or imply that any persons, companies or other entities included in the ICIJ Offshore Leaks Database have broken the law or otherwise acted improperly. Many people and entities have the same or similar names.
    https://www.theguardian.com/news/2016/apr/08/fallout-from-panama-papers-revelations-so-far-country-by-country
  7. Oracle Big Data Spatial and Graph
    • Store the Property Graph definition in HBase or Oracle NoSQL
    • API to load/modify data
    • In-memory analytic engine (PGX) loads graph for analysis, and provides built-in algorithm implementations
    • Also provides RDF and Spatial capabilities
  8. New in Oracle 12.2 - Property Graph support
    • Store the Property Graph definition in Oracle Database 12.2 - on-premises or cloud
  9. Interacting with Property Graphs
    • Apache TinkerPop’s “Gremlin”
    • Java APIs for programmatic access
    • Nascent Python support (pyopg)
    • Spark library available
    • Interactive visualisation and exploration of data with tools like Cytoscape (open source), Tom Sawyer (paid), etc
  10. Notebooks
    • Interactive code development & execution environment
    • Notebooks can be shared for others to run and reproduce findings
    • Apache Zeppelin and Jupyter are two popular options
    • Working with Spatial and Graph:
      - pyopg/Jupyter
      - PGX/Zeppelin
  11. Loading the Data
    • Data can be loaded from various formats:
      - GraphML Data Format
      - GraphSON Data Format
      - GML Data Format
      - Oracle Flat File Format
    • Source data was CSV, which needed wrangling to fit a supported input format
      - Oracle Flat File Format was chosen
      - Supports highly parallelised loading in BDSG
      - BDSG now provides a Java API to convert CSV to the flat file format (OPV/OPE)
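The CSV wrangling step mentioned above can be sketched as follows. This is a rough Python illustration, not the BDSG Java converter and not the script used for the talk. It assumes the vertex (.opv) layout as I understand it from Oracle's documentation: six comma-separated fields (vertex ID, key name, datatype code, string value, numeric value, date value), with code 1 meaning string and special characters percent-encoded. Check those details against the documentation for your version. The column names in the sample are hypothetical.

```python
import csv
import io

TYPE_STRING = 1  # assumed datatype code for string values

# Assumed escape rules; '%' must be replaced first so it is not double-encoded.
_ESCAPES = [("%", "%25"), ("\t", "%09"), (" ", "%20"), ("\n", "%0A"), (",", "%2C")]


def encode(value):
    """Percent-encode the characters that would break the comma-separated layout."""
    for raw, escaped in _ESCAPES:
        value = value.replace(raw, escaped)
    return value


def csv_to_opv(csv_text, id_column):
    """Emit one flat-file vertex line per non-empty property of each CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        vid = row[id_column]
        for key, value in row.items():
            if key == id_column or not value:
                continue
            # Every value is treated as a string here, so the numeric and
            # date fields at the end of the line stay empty.
            lines.append(f"{vid},{key},{TYPE_STRING},{encode(value)},,")
    return lines


# Hypothetical sample input.
sample = 'node_id,name,countries\n1,Example Holdings Ltd,Panama\n2,"Doe, Jane",\n'
opv = csv_to_opv(sample, "node_id")
```

Each CSV row fans out into one line per property, which is why a wide source file becomes a much longer flat file. Edges would need a similar conversion into the edge (.ope) file, which additionally carries the source vertex, destination vertex, and edge label.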
  12. Loading Property Graph in Oracle 12.2
    • Same process as HBase, but a different Java class
    • Make sure the DB is configured with max_string_size=extended
  13. Analysing the Data - Property Graph Query Language (PGQL)
    • SQL-like language for querying property graphs
    • Same SELECT .. WHERE clause pattern but with syntax for expressing graph relationships
    • http://pgql-lang.org/spec/1.0/
  14. SQL is OK - but PGQL is More Elegant and Powerful

    SQL:

        with OfficerPR as
          (select V.vid, pr.pr
             from panamaPR pr
            inner join PANAMAVT$ V on pr.NODE = v.vid
            where v.K = 'Type'
              and v.V = 'Officer'
            order by PR desc
            fetch first 5 rows only)
        select pr2.pr, v2.k, v2.v
          from OfficerPR pr2
         inner join panamaVT$ v2 on pr2.vid = v2.vid
         where v2.k in ('Name','Countries');

    PGQL:

        select n.pr, n.name, n.countries
        WHERE (n WITH Type =~ 'Officer')
        ORDER BY n.pr
        limit 5
  15. Graph Beats Relational for Exploring Relationships!
    https://panamapapers.icij.org/20160404-azerbaijan-hidden-wealth.html
    There are legitimate uses for offshore companies and trusts. We do not intend to suggest or imply that any persons, companies or other entities included in the ICIJ Offshore Leaks Database have broken the law or otherwise acted improperly. Many people and entities have the same or similar names.
  16. EOF
    • email: [email protected]
    • web: http://ritt.md/rmoff and http://rmoff.net
    • twitter: @rmoff
    • irc: rmoff @ #obihackers
    #EOF
    speakerdeck.com/rmoff/
    https://community.oracle.com/docs/DOC-1006400