When Machine Learning Meets Graph Databases

Gianni Ceresa
November 21, 2019

Machine Learning is everywhere these days (right after AI). It started as a Python and R thing, later joined the Oracle Database, and is now available for Oracle Graph Database as well. Let's go through some examples of how graphs require data preparation to be slightly adapted before running Machine Learning algorithms.

Transcript

  3. bit.ly/OracleACEProgram
    450+ Technical Experts Helping Peers Globally
    Nominate yourself or someone you know: acenomination.oracle.com
    Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

  5. A Property Graph
    Vertex (also called node)
    Edge

  6. Vertex (node): vertex ID, vertex properties; a vertex can have a label
    Directed edge: edge ID, edge label, edge properties
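
The elements on slides 5 and 6 can be pictured as a minimal Java sketch. It is purely illustrative and not how PGX stores graphs: the `PropertyGraph` class, its `Vertex` and `Edge` records, and the `Customer`/`Product` sample data are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal in-memory property graph (not the PGX API):
// vertices and directed edges, each with an ID, a label and properties.
class PropertyGraph {

    // A vertex (node): ID, optional label, key/value properties.
    record Vertex(long id, String label, Map<String, Object> properties) {}

    // A directed edge from srcId to dstId: ID, label, key/value properties.
    record Edge(long id, long srcId, long dstId, String label, Map<String, Object> properties) {}

    private final Map<Long, Vertex> vertices = new HashMap<>();
    private final List<Edge> edges = new ArrayList<>();

    Vertex addVertex(long id, String label, Map<String, Object> properties) {
        Vertex v = new Vertex(id, label, Map.copyOf(properties));
        vertices.put(id, v);
        return v;
    }

    Edge addEdge(long id, long srcId, long dstId, String label, Map<String, Object> properties) {
        // Both endpoints must already exist in the graph.
        if (!vertices.containsKey(srcId) || !vertices.containsKey(dstId)) {
            throw new IllegalArgumentException("unknown endpoint for edge " + id);
        }
        Edge e = new Edge(id, srcId, dstId, label, Map.copyOf(properties));
        edges.add(e);
        return e;
    }

    // Edges leaving the given vertex (direction matters: src -> dst).
    List<Edge> outEdges(long vertexId) {
        List<Edge> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.srcId() == vertexId) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PropertyGraph g = new PropertyGraph();
        g.addVertex(1, "Customer", Map.of("name", "Customer 1"));
        g.addVertex(2, "Product", Map.of("name", "Product 1"));
        g.addEdge(10, 1, 2, "buys", Map.of("quantity", 3));
        System.out.println(g.outEdges(1));
    }
}
```

Because edges are directed, the `buys` edge shows up in the outgoing edges of the customer but not of the product.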

  7. Graph Analytics: PGX, In-memory Analytic Engine
    Graph Data Access Layer API; Blueprints & SolrCloud / Lucene
    Scalable and Persistent Storage: Property Graph Support on Files, Apache HBase, Oracle NoSQL or Oracle DB 12.2+
    Java APIs; Java APIs/JDBC/SQL/PLSQL; REST Web Service
    Python, Perl, PHP, Ruby, Javascript, …
    Cytoscape Plug-in; R Integration (OAAgraph); Spark integration; SQL*Plus, …

  9. From 45,700 nodes with 105,406 edges to 85 nodes with 218 edges in seconds
    (Catalog, RPD)

  10. Money laundering and VAT frauds
    Spain, Italy, John Doe, Company A, Company B, Company C, Company D
    Relationships: Located in (4 times), Buys from (4 times), Owns

  17. How much?
    by Francesco Tisiot (34)

  18. How much?

  19. How much?

  20. How much?
    1’000 more columns of features
    Machine Learning isn’t Machine Guessing

  21. How much?
    (100’000 rows of houses with a price)
    Training
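
The "How much?" example is supervised learning: a model is trained on rows of houses whose price is known, then predicts the price of new houses. A minimal sketch, assuming a single feature (size) fitted by ordinary least squares, whereas the slides talk about 1’000 feature columns; `HousePriceModel` and all numbers are hypothetical.

```java
// Toy "training": fit price = intercept + slope * size by ordinary least squares.
// Sizes and prices are hypothetical; a real model would use many more rows and features.
class HousePriceModel {
    final double intercept;
    final double slope;

    private HousePriceModel(double intercept, double slope) {
        this.intercept = intercept;
        this.slope = slope;
    }

    // Learns intercept and slope from labelled examples (size -> known price).
    static HousePriceModel fit(double[] sizes, double[] prices) {
        int n = sizes.length;
        if (n != prices.length || n < 2) {
            throw new IllegalArgumentException("need at least two labelled rows");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += sizes[i];
            meanY += prices[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            double dx = sizes[i] - meanX;
            sxy += dx * (prices[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("sizes must not all be equal");
        }
        double b = sxy / sxx;
        return new HousePriceModel(meanY - b * meanX, b);
    }

    // Predicts the price of a house that was not in the training data.
    double predict(double size) {
        return intercept + slope * size;
    }

    public static void main(String[] args) {
        double[] sizes = {50, 80, 100, 120};                // hypothetical sizes
        double[] prices = {70000, 100000, 120000, 140000};  // hypothetical known prices
        HousePriceModel m = fit(sizes, prices);
        // prints: slope=1000.0, intercept=20000.0, predicted(90)=110000.0
        System.out.println("slope=" + m.slope + ", intercept=" + m.intercept
                + ", predicted(90)=" + m.predict(90));
    }
}
```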

  25. Customers 1, 2 and 3; Products 1 to 5
    Customer 1 is more similar to Customer 3 than to Customer 2
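
Slide 25 compares customers through the products they buy. The slide does not say which measure is used; a minimal sketch, assuming Jaccard similarity (products bought by both divided by products bought by either) and hypothetical purchase lists, since the actual edges are only in the slide image. `CustomerSimilarity` is an illustrative name.

```java
import java.util.HashSet;
import java.util.Set;

// Jaccard similarity of two customers' product neighbourhoods:
// |products bought by both| / |products bought by either|.
class CustomerSimilarity {

    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // convention chosen here: no purchases, no similarity
        }
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        Set<String> all = new HashSet<>(a);
        all.addAll(b);
        return (double) shared.size() / all.size();
    }

    public static void main(String[] args) {
        // Hypothetical purchases: the slide does not list the actual edges.
        Set<String> customer1 = Set.of("Product 1", "Product 2", "Product 3");
        Set<String> customer2 = Set.of("Product 4", "Product 5");
        Set<String> customer3 = Set.of("Product 1", "Product 2", "Product 4");
        System.out.println("C1 vs C3: " + jaccard(customer1, customer3)); // prints C1 vs C3: 0.5
        System.out.println("C1 vs C2: " + jaccard(customer1, customer2)); // prints C1 vs C2: 0.0
    }
}
```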

  34. Director
    Teacher A, Teacher B

  35. Director
    Teacher A, Teacher B
    Students, Students

  36. Director
    Teacher A, Teacher B
    Students, Students
    Numbered nodes: 7; 6, 8; 1 2 3 4 5; 9 10 11 12 13

  37. (same text as slide 36)

  38. The same graph with numbered nodes only: 7; 6, 8; 1 to 5; 9 to 13

  39. The same graph, with a walk marked from a "start" node and its steps numbered 1 to 7

  40. The same graph and walk
    Walk (nodes): 1 – 6 – 7 – 6 – 3 – 4 – 6 – 2
    Walk length: 8
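
The walk on slide 40 can be generated by repeatedly moving to a neighbour picked at random. A minimal sketch of a uniform random walk; the edge list in `exampleGraph()` is hypothetical, chosen only so that the walk shown on the slide (1 – 6 – 7 – 6 – 3 – 4 – 6 – 2) is valid in it, and the class and method names are illustrative, not PGX's.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Uniform random walk on an undirected graph given as an adjacency list.
class RandomWalk {

    // Returns up to `length` nodes (length - 1 steps), starting from `start`.
    static List<Integer> walk(Map<Integer, List<Integer>> adj, int start, int length, Random rnd) {
        if (length < 1) {
            throw new IllegalArgumentException("length must be at least 1");
        }
        List<Integer> path = new ArrayList<>();
        path.add(start);
        int current = start;
        while (path.size() < length) {
            List<Integer> neighbours = adj.getOrDefault(current, List.of());
            if (neighbours.isEmpty()) {
                break; // isolated node: the walk ends early
            }
            current = neighbours.get(rnd.nextInt(neighbours.size()));
            path.add(current);
        }
        return path;
    }

    // True if every consecutive pair of nodes in the walk is connected by an edge.
    static boolean isValidWalk(Map<Integer, List<Integer>> adj, List<Integer> path) {
        for (int i = 1; i < path.size(); i++) {
            if (!adj.getOrDefault(path.get(i - 1), List.of()).contains(path.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Adds an undirected edge by recording it in both adjacency lists.
    static void addEdge(Map<Integer, List<Integer>> adj, int a, int b) {
        adj.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adj.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Hypothetical edges: 7 linked to 6 and 8, 6 to nodes 1-5, 8 to nodes 9-13, plus 3-4.
    static Map<Integer, List<Integer>> exampleGraph() {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        addEdge(adj, 7, 6);
        addEdge(adj, 7, 8);
        for (int s = 1; s <= 5; s++) addEdge(adj, 6, s);
        for (int s = 9; s <= 13; s++) addEdge(adj, 8, s);
        addEdge(adj, 3, 4);
        return adj;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> adj = exampleGraph();
        System.out.println(isValidWalk(adj, List.of(1, 6, 7, 6, 3, 4, 6, 2))); // prints true
        System.out.println(walk(adj, 1, 8, new Random(42)));
    }
}
```

Seeding `Random` makes a run reproducible; DeepWalk-style methods typically start several such walks from every vertex.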

  42. Example graph

  46. (the details of the Word2vec implementation are out of the scope of this presentation and would take too long to cover)
    n = layer size (by default 200 for DeepWalk in PGX)
    [Diagram: Word2vec neural network between the context word and the target word, with vector components 1, 2, 3, …]
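
DeepWalk reuses Word2vec by treating each random walk as a sentence and each vertex as a word; as originally described, it trains the skip-gram variant on (target, context) pairs taken from a sliding window over the walk. A minimal sketch of that pair-generation step only, not of PGX's implementation; the window size of 1 and the `WalkContextPairs` name are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// DeepWalk-style corpus step: a walk is a "sentence" whose "words" are vertices.
// Each vertex (target) is paired with every vertex at most `window` positions away (context).
class WalkContextPairs {

    record Pair(int target, int context) {}

    static List<Pair> pairs(List<Integer> walk, int window) {
        List<Pair> result = new ArrayList<>();
        for (int i = 0; i < walk.size(); i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(walk.size() - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) {
                    result.add(new Pair(walk.get(i), walk.get(j)));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The walk shown on slide 40, used as one training sentence.
        List<Integer> walk = List.of(1, 6, 7, 6, 3, 4, 6, 2);
        List<Pair> p = pairs(walk, 1);
        System.out.println(p.size()); // prints 14
        System.out.println(p.get(0)); // prints Pair[target=1, context=6]
    }
}
```

These pairs are what the network on the slide is trained on; the learned hidden-layer weights (n = layer size values per vertex) become the vertex vectors.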

  47. DEMO

  49. pgx> var similars = model.computeSimilars("Albert_Einstein", 10)
    pgx> similars.print()
    +-----------------------------------------+
    | dstVertex          | similarity         |
    +-----------------------------------------+
    | Albert_Einstein    | 1.0000001192092896 |
    | Physics            | 0.8664291501045227 |
    | Werner_Heisenberg  | 0.8625140190124512 |
    | Richard_Feynman    | 0.8496938943862915 |
    | List_of_physicists | 0.8415523767471313 |
    | Physicist          | 0.8384397625923157 |
    | Max_Planck         | 0.8370327353477478 |
    | Niels_Bohr         | 0.8340970873832703 |
    | Quantum_mechanics  | 0.8331197500228882 |
    | Special_relativity | 0.8280861973762512 |
    +-----------------------------------------+
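
`computeSimilars` ranks vertices by how close their learned vectors are. A minimal sketch of such a ranking, assuming cosine similarity (a common choice for embeddings; the PGX documentation is the reference for the exact measure it uses); `Similars`, `topK` and the 2-dimensional vectors are hypothetical. As in the PGX output on slide 49, the query vertex ranks first against itself; a value such as 1.0000001192092896 is most likely floating-point rounding.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Top-k most similar vertices by cosine similarity of their embedding vectors.
class Similars {

    record Similar(String vertex, double similarity) {}

    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // convention chosen here for zero vectors
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Ranks every vertex (including the query itself) against the query vertex.
    static List<Similar> topK(Map<String, double[]> vectors, String query, int k) {
        double[] q = vectors.get(query);
        if (q == null) {
            throw new IllegalArgumentException("unknown vertex: " + query);
        }
        List<Similar> ranked = new ArrayList<>();
        for (Map.Entry<String, double[]> e : vectors.entrySet()) {
            ranked.add(new Similar(e.getKey(), cosine(q, e.getValue())));
        }
        ranked.sort(Comparator.comparingDouble(Similar::similarity).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    public static void main(String[] args) {
        // Hypothetical 2-dimensional vectors (DeepWalk in PGX defaults to 200 dimensions).
        Map<String, double[]> vectors = Map.of(
                "A", new double[] {1.0, 0.0},
                "B", new double[] {0.9, 0.1},
                "C", new double[] {0.0, 1.0});
        for (Similar s : topK(vectors, "A", 3)) {
            System.out.println(s.vertex() + " " + s.similarity());
        }
    }
}
```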

  52. At least for now…
    & features