Review of Graph Databases

A walkthrough of the graph databases, query languages, and managed solutions available at the time.

Arturas Smorgun

September 05, 2018

Transcript

  1. (review of graph databases) ( )-->( )<--( )

    Graph Recap -> Types of Graph Solutions -> Query Languages -> Graphs in AWS
  2. Review of Graph Databases, Arturas Smorgun, 2018 Domain: encyclopedia

    Vertices:
    • Author
    • Page
    Edges:
    • Page relates to another page
    • Author contributes to a page (authors, edits, reviews)
  3. Graph Algorithms FTW!!

    Breadth First Search
    Depth First Search
    Shortest Path
    Minimum Spanning Tree
    Maximum Flow
    Connectivity
    … …
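As a rough illustration of the first and third algorithms on that list, here is a minimal Python sketch of breadth-first search used to find a shortest path. It assumes the encyclopedia graph from the deck's own Cypher insert example (authors and pages as vertices) and treats edges as undirected; the function names `build_adjacency` and `shortest_path` are made up for this sketch and do not come from any graph database API.

```python
from collections import deque

# Edges copied from the deck's Cypher insert example (author/page names).
EDGES = [
    ("David Smith", "AUTHORED", "City"),
    ("David Smith", "AUTHORED", "Owl"),
    ("David Smith", "AUTHORED", "Some"),
    ("Jack Jones", "EDITED", "Owl"),
    ("Jack Jones", "AUTHORED", "Branch"),
    ("Jack Jones", "REVIEWED", "City"),
    ("James James", "EDITED", "Branch"),
    ("Branch", "RELATED_TO", "Some"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map: vertex -> set of neighbours."""
    adjacency = {}
    for source, _label, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest vertex list, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorted only to make the visiting order deterministic.
        for neighbour in sorted(adjacency[path[-1]]):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    print(shortest_path(graph, "James James", "David Smith"))
```

Because BFS explores vertices in order of hop count, the first path that reaches the goal is a shortest one in an unweighted graph; weighted graphs would need a different algorithm such as Dijkstra's.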
  4. Types of Graph Solutions

    Graph Recap -> Types of Graph Solutions -> Query Languages -> Graphs in AWS
  5. Resource Description Framework (RDF)

    • W3C Recommendation
    • Open Source
    • since 1999
    • Information Exchange
    • Semantic Web
  6. Labeled Property Graph (LPG)

    • Not sure who was first
    • Some open source
    • Some proprietary
    • Emphasis on querying and traversal
  7. RDF vs LPG

    RDF
    • Vertex = Resource
    • Edge = Relationship
    • Qualify relationships: NO
    • Identify relationships: NO
    • Triple: subject, predicate, object
    • Data mobility: YES
    • Verifiability: YES

    LPG
    • Vertex = Node
    • Edge = Relationship
    • Qualify relationships: YES
    • Identify relationships: YES
    • No strictly defined structure
    • Data mobility: Maybe?
    • Verifiability: No?
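The "qualify" and "identify" rows can be made concrete with a small Python sketch. It models plain RDF triples as a set of tuples and an LPG as nodes and relationships that carry their own ids and properties. The `Node` and `Relationship` classes and the `year` property are hypothetical, invented for this illustration; they are not the data model of any particular product, and RDF extensions such as reification are deliberately left out.

```python
from dataclasses import dataclass, field

# RDF view: every fact is a (subject, predicate, object) triple.
triples = {
    ("author_2", "name", "Jack Jones"),
    ("page_2", "name", "Owl"),
    ("author_2", "edited", "page_2"),
}

# Stating the same fact again changes nothing: a plain triple has no
# identity of its own and nowhere to attach extra properties.
triples.add(("author_2", "edited", "page_2"))


@dataclass
class Node:
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    rel_id: str  # relationships are identifiable
    kind: str
    start: str
    end: str
    properties: dict = field(default_factory=dict)  # and qualifiable


# LPG view: two separate EDITED relationships between the same nodes,
# each with its own id and a (hypothetical) "year" property.
author = Node("a2", "AUTHOR", {"name": "Jack Jones"})
page = Node("p2", "PAGE", {"name": "Owl"})
edits = [
    Relationship("r1", "EDITED", author.node_id, page.node_id, {"year": 2017}),
    Relationship("r2", "EDITED", author.node_id, page.node_id, {"year": 2018}),
]

if __name__ == "__main__":
    print(len(triples), "triples;", len(edits), "distinct EDITED relationships")
```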
  8. Graph DB vs Relational DB

    Graph DB
    • Good for recursive loose schema
    • Vertices for entities
    • Edges for relationships
    • Easy to add new entity types
    • Easy to add new relationships
    • Easy to traverse relationships

    Relational DB
    • Good for flat strict schema
    • Tables for entities
    • Tables or FK for relationships
    • Easy to add new entity types
    • Easy to add new relationships
    • Hard to traverse relationships
  9. Graph DB vs Document DB

    Graph DB
    • Loose schema with nested relations
    • Vertices for entities
    • Edges for relationships
    • Easy to add new entity types
    • Easy to add new relationships
    • Easy to traverse relationships

    Document DB
    • Loose schema with duplications
    • Document with all relations
    • Document per use case
    • Easy to add new entity types
    • Hard to add new relationships
    • Hard to traverse relationships
  10. Query Languages

    Graph Recap -> Types of Graph Solutions -> Query Languages -> Graphs in AWS
  11. SPARQL

    • For RDF Graphs
    • W3C Recommendation
    • widely used
    • not XML, yay (though RDF documents can be XML)

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name
    WHERE { ?person foaf:name ?name . }
  12. Cypher

    • For Labeled Property Graphs
    • created by Neo4j
    • open sourced and widely available now
    • “SQL of Graph Databases” (striving to be)

    MATCH (node1:Label1)-[rel]->(node2:Label2)
    WHERE node1.propertyA = {value}
    RETURN node2.propertyA, node2.propertyB
  13. Gremlin

    • For Labeled Property Graphs
    • Apache Project
    • open sourced and used for many different databases and processors
    • very similar to Cypher, but I found tooling and documentation poorer

    g.V().has("name","gremlin").
      out("knows").
      out("knows").
      values("name")
  14. GraphQL

    • NOT a graph DB query language
    • started by Facebook
    • aimed at API communication
    • very useful to query existing data

    {
      hero {
        name
      }
    }
  15. Insert in SPARQL (RDF XML) vs Cypher

    RDF XML

    <?xml version="1.0"?>
    <rdf:RDF xmlns="http://www.w3.org/2002/07/owl"
             xml:base="http://www.w3.org/2002/07/owl"
             xmlns:x="http://www.example.org/"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
             xmlns:owl="http://www.w3.org/2002/07/owl#"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about="author_1">
        <x:name>David Smith</x:name>
        <x:authored rdf:resource="page_1"></x:authored>
        <x:authored rdf:resource="page_2"></x:authored>
        <x:authored rdf:resource="page_4"></x:authored>
      </rdf:Description>
      <rdf:Description rdf:about="author_2">
        <x:name>Jack Jones</x:name>
        <x:authored rdf:resource="page_3"></x:authored>
        <x:edited rdf:resource="page_2"></x:edited>
        <x:reviewed rdf:resource="page_1"></x:reviewed>
      </rdf:Description>
      <rdf:Description rdf:about="author_3">
        <x:name>James James</x:name>
        <x:authored rdf:resource="page_4"></x:authored>
      </rdf:Description>
      <rdf:Description rdf:about="page_1">
        <x:name>City</x:name>
      </rdf:Description>
      <rdf:Description rdf:about="page_2">
        <x:name>Owl</x:name>
      </rdf:Description>
      <rdf:Description rdf:about="page_3">
        <x:name>Branch</x:name>
        <x:related_to rdf:resource="page_4"></x:related_to>
      </rdf:Description>
      <rdf:Description rdf:about="page_4">
        <x:name>Some</x:name>
      </rdf:Description>
    </rdf:RDF>

    Cypher

    CREATE (a1:AUTHOR {name: "David Smith"})
    CREATE (a2:AUTHOR {name: "Jack Jones"})
    CREATE (a3:AUTHOR {name: "James James"})
    CREATE (p1:PAGE {name: "City"})
    CREATE (p2:PAGE {name: "Owl"})
    CREATE (p3:PAGE {name: "Branch"})
    CREATE (p4:PAGE {name: "Some"})
    CREATE (a1)-[:AUTHORED]->(p1)
    CREATE (a1)-[:AUTHORED]->(p2)
    CREATE (a1)-[:AUTHORED]->(p4)
    CREATE (a2)-[:EDITED]->(p2)
    CREATE (a2)-[:AUTHORED]->(p3)
    CREATE (a2)-[:REVIEWED]->(p1)
    CREATE (a3)-[:EDITED]->(p3)
    CREATE (p3)-[:RELATED_TO]->(p4)
  16. Insert in SPARQL (Turtle) vs Cypher

    SPARQL (Turtle)

    PREFIX x: <http://example.com/>
    INSERT DATA {
      <http://example.com/author_1>
        x:name "David Smith" ;
        x:authored <http://example.com/page_1> ;
        x:authored <http://example.com/page_2> ;
        x:authored <http://example.com/page_4> .
      <http://example.com/author_2>
        x:name "Jack Jones" ;
        x:authored <http://example.com/page_3> ;
        x:edited <http://example.com/page_2> ;
        x:reviewed <http://example.com/page_1> .
      <http://example.com/author_3>
        x:name "James James" ;
        x:authored <http://example.com/page_4> .
      <http://example.com/page_1> x:name "City" .
      <http://example.com/page_2> x:name "Owl" .
      <http://example.com/page_3>
        x:name "Branch" ;
        x:related_to <http://example.com/page_4> .
      <http://example.com/page_4> x:name "Some" .
    }

    Cypher

    CREATE (a1:AUTHOR {name: "David Smith"})
    CREATE (a2:AUTHOR {name: "Jack Jones"})
    CREATE (a3:AUTHOR {name: "James James"})
    CREATE (p1:PAGE {name: "City"})
    CREATE (p2:PAGE {name: "Owl"})
    CREATE (p3:PAGE {name: "Branch"})
    CREATE (p4:PAGE {name: "Some"})
    CREATE (a1)-[:AUTHORED]->(p1)
    CREATE (a1)-[:AUTHORED]->(p2)
    CREATE (a1)-[:AUTHORED]->(p4)
    CREATE (a2)-[:EDITED]->(p2)
    CREATE (a2)-[:AUTHORED]->(p3)
    CREATE (a2)-[:REVIEWED]->(p1)
    CREATE (a3)-[:EDITED]->(p3)
    CREATE (p3)-[:RELATED_TO]->(p4)
  17. Select in SPARQL vs Cypher

    SPARQL

    # select all
    PREFIX x: <http://example.com/>
    SELECT DISTINCT ?g
    WHERE { GRAPH ?g { ?s ?p ?o } }

    # select ordered authors
    PREFIX x: <http://example.com/>
    SELECT ?s ?name ?o
    WHERE {
      GRAPH ?g {
        ?s x:authored ?o .
        ?s x:name ?name .
      }
    }
    ORDER BY ?name

    Cypher

    // select all
    MATCH (n1)-[r]->(n2)
    RETURN r, n1, n2

    // select ordered authors
    MATCH (author)-[r:AUTHORED]->(page)
    RETURN author, r, page
    ORDER BY author.name
  18. Update in SPARQL vs Cypher

    SPARQL

    PREFIX x: <http://example.com/>
    DELETE { <http://example.com/author_1> x:name ?o }
    INSERT { <http://example.com/author_1> x:name "John Smith" }
    WHERE  { <http://example.com/author_1> x:name ?o }

    Cypher

    MATCH (a:AUTHOR {name: "John Smith"})
    SET a.name = "David Smith"
    RETURN a
  19. Select all related pages in SPARQL vs Cypher

    SPARQL

    // not sure …

    Cypher

    // update graph
    MATCH (p:PAGE {name: "Some"})
    CREATE (p5:PAGE {name: "Some More"})
    CREATE (p5)-[:RELATED_TO]->(p)

    // select
    MATCH (a:AUTHOR {name: "Jack Jones"})-[:AUTHORED]->(ap:PAGE),
          (ap)-[:RELATED_TO*1..3]-(other:PAGE)
    RETURN a, ap, other
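To show what the variable-length `RELATED_TO*1..3` pattern asks the database to do, here is a hedged Python sketch of the same idea: starting from each page an author wrote, follow `RELATED_TO` edges in either direction for at most three hops. The data mirrors the deck's example after the "Some More" page is added. The helper names `related_pages` and `related_to_authored` are invented for this sketch, and it is a simplification: it returns each distinct page with its hop count rather than one row per matching path, as Cypher would.

```python
from collections import deque

# Data taken from the deck's example, after adding the "Some More" page.
AUTHORED = {"Jack Jones": ["Branch"]}
RELATED_TO = [("Branch", "Some"), ("Some More", "Some")]


def related_pages(start, related_to, max_hops=3):
    """Pages reachable from `start` in 1..max_hops undirected hops.

    Returns a dict mapping page name -> number of hops from `start`.
    """
    neighbours = {}
    for left, right in related_to:
        neighbours.setdefault(left, set()).add(right)
        neighbours.setdefault(right, set()).add(left)

    found = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in sorted(neighbours.get(page, ())):
            if nxt not in seen:
                seen.add(nxt)
                found[nxt] = hops + 1
                queue.append((nxt, hops + 1))
    return found


def related_to_authored(author):
    """For each page the author wrote, list the related pages within 3 hops."""
    return {
        page: related_pages(page, RELATED_TO)
        for page in AUTHORED.get(author, [])
    }


if __name__ == "__main__":
    print(related_to_authored("Jack Jones"))
```

Lowering `max_hops` to 1 would return only "Some", which corresponds to dropping the `*1..3` range from the pattern.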
  20. Ready solutions on AWS

    Graph Recap -> Types of Graph Solutions -> Query Languages -> Graphs in AWS
  21. Amazon Neptune vs Neo4j Enterprise

    Amazon Neptune
    • AWS Native Service
    • Managed
    • LPG & RDF
    • Query languages: SPARQL, Gremlin (can do Cypher)
    • ACID: YES
    • Limitation: 64 TB of data
    • Proprietary and vendor locked (but compatible)
    • No dev environment (use compatible services)

    Neo4j Enterprise
    • AWS Marketplace Solution
    • Hosted (support can be bought)
    • LPG (RDF via community plugin)
    • Query language: Cypher
    • ACID: YES?
    • Limitation: 34.4 billion nodes
    • Open source and graph native
    • Has a dev environment (native or Docker)
  22. HA: Amazon Neptune vs Neo4j Enterprise

    Amazon Neptune
    • Master-Slave
    • Automatic backups to S3
    • Replicated across AZ
    • Failover in 30 seconds

    Neo4j Enterprise
    • Causal Graph
  23. Internals: Amazon Neptune vs Neo4j Enterprise

    Amazon Neptune
    • ¯\_(ツ)_/¯

    Neo4j Enterprise
    • Graph Native
  24. Cost: Amazon Neptune vs Neo4j Enterprise

    Amazon Neptune
    • Pay as you go, no upfront cost
    • $0.30..$5.50 per hour (compute)
    • $0.10 per month (per GB stored)
    • $0.20 per 1 million requests
    • backups, replication, disaster recovery included

    Neo4j Enterprise
    • No upfront cost (check license)
    • pay for compute resource used
    • pay for storage used
    • pay for network traffic
    • commercial support is very expensive (reportedly ~$200k), need devops to maintain
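The Neptune figures above can be combined into a back-of-the-envelope monthly estimate. The sketch below uses only the three rates quoted on the slide (2018 figures, which may have changed since); the 730-hour month, the function name `neptune_monthly_cost`, and the sample workload are assumptions made for illustration, not AWS pricing rules.

```python
# Rates as quoted on the slide (2018); treat them as illustrative only.
STORAGE_RATE_PER_GB_MONTH = 0.10
REQUEST_RATE_PER_MILLION = 0.20

# Assumption: an average month has about 730 hours (8760 / 12).
HOURS_PER_MONTH = 730


def neptune_monthly_cost(hourly_rate, storage_gb, requests, hours=HOURS_PER_MONTH):
    """Estimate a monthly bill as compute + storage + request charges."""
    compute = hourly_rate * hours
    storage = STORAGE_RATE_PER_GB_MONTH * storage_gb
    request_charge = REQUEST_RATE_PER_MILLION * requests / 1_000_000
    return compute + storage + request_charge


if __name__ == "__main__":
    # Hypothetical workload: smallest instance, 100 GB, 50 million requests.
    estimate = neptune_monthly_cost(0.30, storage_gb=100, requests=50_000_000)
    print(f"~${estimate:.2f} per month")
```

For that hypothetical workload the three parts are about $219 of compute, $10 of storage and $10 of requests, so compute dominates even on the cheapest instance in the quoted range.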
  25. (CYPHER)-[:IS]->(GREAT)

    Neo4j has awesome tools to start quickly:
    https://neo4j.com/download/
    https://neo4j.com/sandbox-v2/