When Machine Learning Meets Graph Databases

Gianni Ceresa
November 21, 2019

Machine Learning is everywhere these days (just after AI). It started as a Python and R thing, later joined the Oracle Database, and is now available for Oracle Graph Database as well. Let's go through some examples of how graphs require slightly adapting data preparation before running Machine Learning algorithms.

Transcript

  3. 450+ Technical Experts Helping Peers Globally (bit.ly/OracleACEProgram). Nominate yourself or someone you know: acenomination.oracle.com

    Copyright © 2017, Oracle and/or its affiliates. All rights reserved.
  5. A Property Graph: vertex (also called node) and edge

  6. Edge: edge ID, edge label, edge properties; directed edge

    Vertex (node): vertex ID, vertex properties; a vertex can have a label
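
As a rough illustration of the property graph model on slides 5 and 6, here is a minimal Python sketch. The class and field names (`Vertex`, `Edge`, `PropertyGraph`, `out_neighbors`) and the sample data are made up for this example; they are not the PGX API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Vertex:
    vertex_id: int
    label: Optional[str] = None  # a vertex can have a label
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    edge_id: int
    source: int       # edges are directed: source -> destination
    destination: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: Dict[int, Edge] = field(default_factory=dict)

    def add_vertex(self, vertex: Vertex) -> None:
        self.vertices[vertex.vertex_id] = vertex

    def add_edge(self, edge: Edge) -> None:
        # Refuse edges whose endpoints are not in the graph yet.
        if edge.source not in self.vertices or edge.destination not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges[edge.edge_id] = edge

    def out_neighbors(self, vertex_id: int, label: Optional[str] = None) -> List[int]:
        # Follow outgoing edges only, optionally filtered by edge label.
        return [
            e.destination
            for e in self.edges.values()
            if e.source == vertex_id and (label is None or e.label == label)
        ]


# Hypothetical sample data, not taken from the slides.
graph = PropertyGraph()
graph.add_vertex(Vertex(1, "person", {"name": "Alice"}))
graph.add_vertex(Vertex(2, "company", {"name": "Acme"}))
graph.add_edge(Edge(10, 1, 2, "owns", {"since": 2015}))

print(graph.out_neighbors(1, "owns"))
```

Because the edge is directed, asking for the outgoing neighbors of vertex 2 returns an empty list.
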
  7. PGX

    Graph Analytics: In-memory Analytic Engine (PGX)
    Graph Data Access Layer API: Blueprints & SolrCloud / Lucene
    Scalable and Persistent Storage: Property Graph Support on Files, Apache HBase, Oracle NoSQL or Oracle DB 12.2+
    Access: REST Web Service (Python, Perl, PHP, Ruby, Javascript, …), Java APIs, Java APIs/JDBC/SQL/PLSQL
    Integrations: Cytoscape Plug-in, R Integration (OAAgraph), Spark integration, SQL*Plus, …
  9. From 45,700 nodes with 105,406 edges to 85 nodes with 218 edges, in seconds

    (Catalog, RPD)
  10. Money laundering and VAT frauds

    Vertices: John Doe, Company A, Company B, Company C, Company D, Spain, Italy. Edge labels: Owns, Located in, Buys from
  17. How much? by Francesco Tisiot (34)

  18. How much?

  19. How much?

  20. How much? 1’000 more columns of features

    Machine Learning isn’t Machine Guessing
  21. How much? (100’000 rows of houses with a price) Training

  25. Customer 1 is more similar to Customer 3 than to Customer 2

    Vertices: Customer 1, Customer 2, Customer 3; Product 1, Product 2, Product 3, Product 4, Product 5
  34. Teacher A Teacher B Director

  35. Teacher A Teacher B Director Students Students

  36. Teacher A, Teacher B, Director, Students, Students

    Nodes numbered 1 to 13
  38. Nodes: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
  39. Nodes: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13

    Walk: start, then steps 1, 2, 3, 4, 5, 6, 7
  40. Nodes: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13

    Walk (nodes): 1 – 6 – 7 – 6 – 3 – 4 – 6 – 2
    Walk length: 8
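
The walk on slide 40 can be imitated with a few lines of Python. This is only a sketch of a uniform random walk, not the PGX implementation. The edge list is an assumption reconstructed from the slides: nodes 6 and 8 are guessed to be the teachers (with students 1 to 5 and 9 to 13), node 7 the director linked to both, plus a link between 3 and 4 that the walk implies. The real example graph may differ.

```python
import random
from typing import Dict, List, Tuple


def build_adjacency(edges: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    # Treat every edge as undirected so the walk can go back and forth.
    adjacency: Dict[int, List[int]] = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def random_walk(adjacency: Dict[int, List[int]], start: int,
                walk_length: int, rng: random.Random) -> List[int]:
    # walk_length counts visited nodes, as on the slide (8 nodes, 7 steps).
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:
            break  # dead end: stop early
        walk.append(rng.choice(neighbors))
    return walk


def is_valid_walk(adjacency: Dict[int, List[int]], walk: List[int]) -> bool:
    # Every consecutive pair of nodes must be connected by an edge.
    return all(b in adjacency.get(a, []) for a, b in zip(walk, walk[1:]))


# Assumed edges (see the note above), not taken verbatim from the deck.
edges = (
    [(6, student) for student in range(1, 6)]
    + [(8, student) for student in range(9, 14)]
    + [(7, 6), (7, 8), (3, 4)]
)
adjacency = build_adjacency(edges)

slide_walk = [1, 6, 7, 6, 3, 4, 6, 2]          # the walk shown on slide 40
walk = random_walk(adjacency, 1, 8, random.Random(42))

print(is_valid_walk(adjacency, slide_walk))
print(walk)
```

Each run with a different seed gives a different walk; what matters for the next step is only that many such walks are collected per starting node.
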
  42. Example graph

  45. (The details of the Word2vec implementation are out of the scope of this presentation and would take too long to cover.)

    n = layer size (by default 200 for DeepWalk in PGX)
    Diagram labels: context word, target word; vector components 1, 2, 3, …
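
DeepWalk treats each walk like a sentence and hands it to Word2vec. The sketch below only shows how (target, context) training pairs could be cut from one walk with a sliding window; it does no training at all. `skipgram_pairs` and `window_size` are hypothetical names for this example, not PGX parameters.

```python
from typing import List, Tuple


def skipgram_pairs(walk: List[int], window_size: int) -> List[Tuple[int, int]]:
    # For every position, pair the node there (target) with each node
    # at most window_size positions away in the same walk (context).
    pairs: List[Tuple[int, int]] = []
    for i, target in enumerate(walk):
        lo = max(0, i - window_size)
        hi = min(len(walk), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, walk[j]))
    return pairs


# The walk from slide 40, used here as a single "sentence".
walk = [1, 6, 7, 6, 3, 4, 6, 2]
pairs = skipgram_pairs(walk, window_size=1)

print(len(pairs))
print(pairs[:3])
```

Node 6 shows up in many pairs because the walk keeps passing through it, which is how frequently visited, well-connected nodes end up with richer training signal.
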
  46. DEMO

  48. pgx> var similars = model.computeSimilars("Albert_Einstein", 10)

    pgx> similars.print()
    +-----------------------------------------+
    | dstVertex          | similarity         |
    +-----------------------------------------+
    | Albert_Einstein    | 1.0000001192092896 |
    | Physics            | 0.8664291501045227 |
    | Werner_Heisenberg  | 0.8625140190124512 |
    | Richard_Feynman    | 0.8496938943862915 |
    | List_of_physicists | 0.8415523767471313 |
    | Physicist          | 0.8384397625923157 |
    | Max_Planck         | 0.8370327353477478 |
    | Niels_Bohr         | 0.8340970873832703 |
    | Quantum_mechanics  | 0.8331197500228882 |
    | Special_relativity | 0.8280861973762512 |
    +-----------------------------------------+
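
The scores above look like cosine similarities between vertex vectors (the 1.0000001 for the vertex itself hints at float rounding). That is an assumption: the slides do not say which measure `computeSimilars` uses. Under that assumption, a top-k lookup could be sketched like this, with made-up 2-dimensional vectors and hypothetical vertex names instead of the model's real ones.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; use 0 here
    return dot / (norm_a * norm_b)


def most_similar(embeddings: Dict[str, List[float]], query: str,
                 k: int) -> List[Tuple[str, float]]:
    # Score every vertex (including the query itself) and keep the top k.
    query_vector = embeddings[query]
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in embeddings.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy vectors invented for the example, not output of any trained model.
embeddings = {
    "vertex_a": [1.0, 0.0],
    "vertex_b": [0.9, 0.1],
    "vertex_c": [0.0, 1.0],
}
top = most_similar(embeddings, "vertex_a", 2)

for name, score in top:
    print(name, round(score, 4))
```

As in the demo output, the query vertex comes back first with a similarity of 1, followed by its closest neighbor.
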
  51. At least for now… & features