Analysing the Panama Papers with Oracle Big Data Spatial and Graph

Oracle Big Data Spatial and Graph enables analysis of datasets that goes beyond the standard relational analytics in common use. Graph technology can identify relationships that might otherwise go unnoticed. It has practical uses in product recommendations, social network analysis, and fraud detection.

In this presentation we will see a practical demonstration of using Oracle Big Data Spatial and Graph to load and analyse the "Panama Papers" dataset. Graph algorithms will be used to identify key actors and organisations within the data and to show patterns of relationships. This practical example will give attendees a clear idea of the tool's functionality and how it could be used within their own organisation.

Robin Moffatt

March 09, 2017

Transcript

  1. [email protected] www.rittmanmead.com @rittmanmead 1
    Analysing the Panama Papers with Oracle Big Data Spatial and Graph
    Robin Moffatt, Rittman Mead
    speakerdeck.com/rmoff/
    OUGN 2017

  2. Robin Moffatt
    • Head of R&D, Rittman Mead
    • Previously OBIEE/DW developer at large UK retailer
    • Previously SQL Server DBA, Business Objects, DB2, COBOL…
    • Oracle ACE
    • Frequent blogger: http://ritt.md/rmoff and http://rmoff.net
    • Twitter: @rmoff
    • IRC: rmoff / #obihackers / freenode

  3. Rittman Mead
    • Oracle Gold Partner with offices in the UK and USA
    • 70+ staff delivering Oracle BI, DW, Big Data and Advanced Analytics projects
    • Significant web presence with the Rittman Mead Blog (http://www.rittmanmead.com)
    • Hadoop R&D lab for “dogfooding” solutions developed for customers

  4. What is a Property Graph and Why is it So Useful?
    • Graph enables us to answer questions that relational would struggle with
    • You could write recursive or procedural SQL but it would be nasty. It would also be impossible to maintain and repeat at scale.
    • Graph-based algorithms (e.g. PageRank) enrich an existing dataset and give us additional insights into it

  5. Property Graph Terminology
    • Node/Vertex
    - The “What”
    • Edge
    - The “How” / Relationship
    • (Un)Directed
    - The direction of the relationship
    • Properties
    - Nodes or Edges
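
The terminology above can be sketched as a small data structure. This is an illustration only, not how BDSG or PGX represent graphs internally; the class names, the `neighbours` helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A node: the "what". Properties are free-form key/value pairs."""
    vid: int
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A relationship: the "how". It may be directed and may carry properties."""
    source: int
    target: int
    label: str
    directed: bool = True
    properties: dict = field(default_factory=dict)


def neighbours(vertex_id, edges):
    """Vertices reachable in one hop, honouring edge direction."""
    found = set()
    for e in edges:
        if e.source == vertex_id:
            found.add(e.target)
        elif e.target == vertex_id and not e.directed:
            found.add(e.source)
    return found


# Hypothetical sample data.
officer = Vertex(1, {"Type": "Officer", "Name": "Example Person"})
entity = Vertex(2, {"Type": "Entity", "Name": "Example Holdings Ltd"})
edges = [Edge(1, 2, "shareholder_of", properties={"since": 2010})]

print(neighbours(1, edges))  # the directed edge can be followed from 1 to 2
print(neighbours(2, edges))  # but not back from 2 to 1
```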

  6. Graph Analysis Uses

  7. What are the “Panama Papers”?
    • Dataset of 11.5 million documents regarding offshore entities, leaked in 2015 and made public in April 2016
    • International Consortium of Investigative Journalists (ICIJ) analysed the raw data and made available a curated set of the data
    • The New York Times and The Guardian among newspapers that investigated the data in depth
    There are legitimate uses for offshore companies and trusts.
    We do not intend to suggest or imply that any persons, companies or other entities included in the ICIJ Offshore Leaks
    Database have broken the law or otherwise acted improperly. Many people and entities have the same or similar names.
    https://www.theguardian.com/news/2016/apr/08/fallout-from-panama-papers-revelations-so-far-country-by-country

  8. Oracle Big Data Spatial and Graph
    • Store the Property Graph definition in HBase or Oracle NoSQL
    • API to load/modify data
    • In-memory analytic engine (PGX) loads graph for analysis, and provides built-in algorithm implementations
    • Also provides RDF and Spatial capabilities

  9. New in Oracle 12.2 - Property Graph support
    • Store the Property Graph definition in Oracle Database 12.2 - on-premises or cloud

  10. Interacting with Property Graphs
    • Apache TinkerPop’s “Gremlin”
    • Java APIs for programmatic access
    • Nascent Python support (pyopg)
    • Spark library available
    • Interactive visualisation and exploration of data with tools like Cytoscape (open source), Tom Sawyer (paid), etc.

  11. Notebooks
    • Interactive code development & execution environment
    • Notebooks can be shared for others to run and reproduce findings
    • Apache Zeppelin and Jupyter are two popular options
    • Working with Spatial and Graph:
    - pyopg/Jupyter
    - PGX/Zeppelin

  12. Apache Zeppelin

  13. Loading the Data
    • Data can be loaded from various formats:
    - GraphML Data Format
    - GraphSON Data Format
    - GML Data Format
    - Oracle Flat File Format
    • Source data was CSV, which needed wrangling to fit a supported input format - Oracle Flat File Format was chosen
    - Supports highly parallelised loading in BDSG
    - BDSG now provides CSV->OPE/V Java API
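
As a rough illustration of the CSV wrangling step, the sketch below turns CSV rows into vertex (.opv) lines. The talk did this in R; this is a Python approximation, not the author's script. It assumes the flat file layout is `vertex_id,key,type,string_value,numeric_value,date_value`, that type code 1 means string, and that special characters are percent-encoded as in the `ESCAPES` table. Those details are recalled from the Oracle documentation and should be checked against it before use. The column names and sample row are hypothetical.

```python
import csv
import io

# Characters assumed to need escaping in the flat file format (check the docs).
ESCAPES = {"%": "%25", ",": "%2C", " ": "%20", "\t": "%09", "\n": "%0A", "\r": "%0D"}


def encode(value):
    """Percent-encode the characters that would otherwise break the layout."""
    return "".join(ESCAPES.get(ch, ch) for ch in value)


def csv_to_opv(csv_text, id_column):
    """Yield one .opv line per (vertex, property) pair, treating every value as a string."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        vid = row[id_column]
        for key, value in row.items():
            if key == id_column or not value:
                continue  # the ID is the first field; empty properties are skipped
            # Assumed layout: vertex_id,key,type,string_value,numeric_value,date_value
            yield f"{vid},{encode(key)},1,{encode(value)},,"


# Hypothetical sample input with a comma inside a quoted field.
sample = 'node_id,name,countries\n101,"Example Holdings, Ltd",Panama\n'
lines = list(csv_to_opv(sample, "node_id"))
for line in lines:
    print(line)
```

Edges (.ope) would follow the same idea with extra leading fields for the edge ID, source, target, and label.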

  14. Data Wrangling with R
    panama_opv_ope.R https://gist.github.com/rmoff/17025830c81e60d6446e34a37273f705

  15. Apache HBase - importFlatFiles()

  16. Loading Property Graph in Oracle 12.2
    • Same process as HBase, but different Java class
    • Make sure the DB is configured with max_string_size=extended

  17. Property Graph in Oracle 12.2
    • Property Graph data is stored in a set of tables,

  18. Inspecting the Property Graph

  19. Analysing the Property Graph - Zeppelin
    • Native rendering support for resultset objects

  20. Analysing the Data - Property Graph Query Language (PGQL)
    • SQL-like language for querying property graphs
    • Same SELECT .. WHERE clause pattern but with syntax for expressing graph relationships
    • http://pgql-lang.org/spec/1.0/

  21. Simple PGQL

  22. Powerful Predicate Support

  23. PGX Built-In Algorithms

  24. Analysing the Property Graph - SQL

  25. SQL is OK - but PGQL is More Elegant and Powerful
    SQL:
      with OfficerPR as
      (select V.vid, pr.pr
      from panamaPR pr
      inner join PANAMAVT$ V
      on pr.NODE = v.vid
      where v.K = 'Type'
      and v.V = 'Officer'
      order by PR desc
      fetch first 5 rows only)
      select pr2.pr,v2.k,v2.v
      from OfficerPR pr2
      inner join panamaVT$ v2
      on pr2.vid = v2.vid
      where v2.k in ('Name','Countries');
    PGQL:
      select n.pr, n.name, n.countries
      WHERE (n WITH Type =~ 'Officer')
      ORDER BY n.pr limit 5
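
The SQL version is longer because each vertex property is stored as its own key/value row, so filtering on one property and returning others needs a self-join. The sketch below reproduces that shape with Python's built-in `sqlite3` module. It is an approximation under stated assumptions: `panama_vt` and `panama_pr` are stand-ins for `PANAMAVT$` and `panamaPR`, only the `vid`/`k`/`v` and `node`/`pr` columns seen in the query are modelled, `LIMIT` replaces Oracle's `fetch first`, an explicit final `ORDER BY` is added so the output is deterministic, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE panama_vt (vid INTEGER, k TEXT, v TEXT);  -- stand-in for PANAMAVT$
    CREATE TABLE panama_pr (node INTEGER, pr REAL);        -- stand-in for panamaPR
""")

# Hypothetical data: one row per vertex property, as in a key/value vertex table.
vertices = [
    (1, "Type", "Officer"), (1, "Name", "Officer A"), (1, "Countries", "Country X"),
    (2, "Type", "Officer"), (2, "Name", "Officer B"), (2, "Countries", "Country Y"),
    (3, "Type", "Entity"), (3, "Name", "Entity C"), (3, "Countries", "Country X"),
]
conn.executemany("INSERT INTO panama_vt VALUES (?, ?, ?)", vertices)
conn.executemany("INSERT INTO panama_pr VALUES (?, ?)", [(1, 0.2), (2, 0.5), (3, 0.9)])

rows = conn.execute("""
    WITH officer_pr AS (
        -- First pass over the vertex table: keep only officers, ranked by PageRank.
        SELECT v.vid, pr.pr
        FROM panama_pr pr
        JOIN panama_vt v ON pr.node = v.vid
        WHERE v.k = 'Type' AND v.v = 'Officer'
        ORDER BY pr.pr DESC
        LIMIT 5
    )
    -- Second pass over the same table: fetch the other properties of those officers.
    SELECT o.pr, v2.k, v2.v
    FROM officer_pr o
    JOIN panama_vt v2 ON o.vid = v2.vid
    WHERE v2.k IN ('Name', 'Countries')
    ORDER BY o.pr DESC, v2.k
""").fetchall()

for row in rows:
    print(row)
```

The entity with the highest score is excluded by the `Type` filter, and each officer comes back as two rows (one per requested property) rather than one row with two columns, which is the awkwardness the PGQL form avoids.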

  26. Exploring the Property Graph with Cytoscape

  27. Exploring the Property Graph with Cytoscape

  28. Exploring the Property Graph with Cytoscape

  29. Layout Algorithms
    Prefuse Force Directed Layout

  30. Community Detection

  31. Community Detection

  32. Graph Beats Relational for Exploring Relationships!
    https://panamapapers.icij.org/20160404-azerbaijan-hidden-wealth.html
    There are legitimate uses for offshore companies and trusts.
    We do not intend to suggest or imply that any persons, companies or other entities included in the ICIJ Offshore Leaks
    Database have broken the law or otherwise acted improperly. Many people and entities have the same or similar names.

  33. Load two vertices 15005001,49522

  34. Set up colouring & label

  36. expand rosamund

  42. expand node

  50. EOF
    email: [email protected]
    web: http://ritt.md/rmoff and http://rmoff.net
    twitter: @rmoff
    irc: rmoff @ #obihackers
    speakerdeck.com/rmoff/
    https://community.oracle.com/docs/DOC-1006400
