Adopting Neo4j @ Enterprise scale

Dmitrijs Vrublevskis

October 14, 2016

Transcript

  1. Adopting Neo4j
    @ Enterprise scale

  2. Dmitry Vrublevsky
    Software developer @
    @FylmTM
    Ambassador @

  3. Agenda
    1. Why graph databases?
    2. Why Neo4j?
    3. Neo4j internals
    4. Use cases

  4. Graphs 101
    Circle - node
    Arrow - relationship

  5. Proof-of-Concept
    Evaluate Neo4j Graph Database
    as replacement to
    existing RDBMS solution.

  6. Small dataset Medium dataset Large dataset

  7. Why graph databases?
    Domain Use case

  8. Telecommunication domain

  9. Use case: Validate network
    5+ is OK

  10. Use case: Validate network
    5+ is OK

  11. Use case: Validate network
    :(
    5+ is OK

  12. Why Neo4j?
    Highly scalable native graph
    database that leverages data
    relationships as
    first-class entities.
    by Neo Technology, Inc.

  13. http://db-engines.com/en/ranking

  14. Features
    Native Processing & Storage
    ACID
    Cypher - Graph Query Language
    REST & Native API
    Optional schema
    Lock Manager
    High-performance cache
    Clustering
    Backups
    Monitoring
    Community Enterprise

  15. First-class
    Everything is an entity
    Entities have properties
    Entities have a type

  16. First-class
    {details: —}
    :LIKES
    :DMITRY
    :HighLoadStrategy
    {works_with: Neo4j}
    {day: 14.10.2016}
    Properties
    Labels
    Type

  17. Neo4j internals
    1. Native storage
    2. Native processing

  18. Native storage
    Specifically designed to
    store and manage graphs.

  19. http://neo4j.com/developer/graph-db-vs-rdbms/

  20. http://neo4j.com/developer/graph-db-vs-rdbms/

  21. http://neo4j.com/developer/graph-db-vs-rdbms/

  22. Native processing
    Efficient way of processing
    graph data since connected
    nodes physically “point” to
    each other
    a.k.a. “index-free adjacency”

  23. $ ls -1 data/databases/graph.db | column -c 100
    index neostore.propertystore.db.index.id
    index.db neostore.propertystore.db.index.keys
    messages.log neostore.propertystore.db.index.keys.id
    neostore neostore.propertystore.db.strings
    neostore.counts.db.a neostore.propertystore.db.strings.id
    neostore.counts.db.b neostore.relationshipgroupstore.db
    neostore.id neostore.relationshipgroupstore.db.id
    neostore.labeltokenstore.db neostore.relationshipstore.db
    neostore.labeltokenstore.db.id neostore.relationshipstore.db.id
    neostore.labeltokenstore.db.names neostore.relationshiptypestore.db
    neostore.labeltokenstore.db.names.id neostore.relationshiptypestore.db.id
    neostore.nodestore.db neostore.relationshiptypestore.db.names
    neostore.nodestore.db.id neostore.relationshiptypestore.db.names.id
    neostore.nodestore.db.labels neostore.schemastore.db
    neostore.nodestore.db.labels.id neostore.schemastore.db.id
    neostore.propertystore.db neostore.transaction.db.0
    neostore.propertystore.db.arrays neostore.transaction.db.1
    neostore.propertystore.db.arrays.id schema
    neostore.propertystore.db.id store_lock
    neostore.propertystore.db.index

  24. $ ls -1 data/databases/graph.db | column -c 100
    index neostore.propertystore.db.index.id
    index.db neostore.propertystore.db.index.keys
    messages.log neostore.propertystore.db.index.keys.id
    neostore neostore.propertystore.db.strings
    neostore.counts.db.a neostore.propertystore.db.strings.id
    neostore.counts.db.b neostore.relationshipgroupstore.db
    neostore.id neostore.relationshipgroupstore.db.id
    neostore.labeltokenstore.db neostore.relationshipstore.db
    neostore.labeltokenstore.db.id neostore.relationshipstore.db.id
    neostore.labeltokenstore.db.names neostore.relationshiptypestore.db
    neostore.labeltokenstore.db.names.id neostore.relationshiptypestore.db.id
    neostore.nodestore.db neostore.relationshiptypestore.db.names
    neostore.nodestore.db.id neostore.relationshiptypestore.db.names.id
    neostore.nodestore.db.labels neostore.schemastore.db
    neostore.nodestore.db.labels.id neostore.schemastore.db.id
    neostore.propertystore.db neostore.transaction.db.0
    neostore.propertystore.db.arrays neostore.transaction.db.1
    neostore.propertystore.db.arrays.id schema
    neostore.propertystore.db.id store_lock
    neostore.propertystore.db.index

  25. Storage layout
    Node (15 bytes)
    in_use
    next_rel_id
    next_prop_id
    labels
    extra
    Relationship (34 bytes)
    directed | in_use
    first_node
    second_node
    rel_type
    first_prev_rel_id
    first_next_rel_id
    second_prev_rel_id
    second_next_rel_id
    next_prop_id
    first_in_chain_markers

  26. Storage layout
    Node (15 bytes)
    next_rel_id
    Relationship (34 bytes)
    first_node
    second_node
    first_prev_rel_id
    first_next_rel_id

  27. Storage math
    Node offset = RecordSize * ID
    Relationship offset = RecordSize * ID
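
The storage math above can be sketched in a few lines. This is a minimal illustration, assuming the record sizes quoted on the "Storage layout" slide; the function name `record_offset` is made up for this example and is not part of any Neo4j API.

```python
# Record sizes as quoted on the "Storage layout" slide.
NODE_RECORD_SIZE = 15          # bytes per node record
RELATIONSHIP_RECORD_SIZE = 34  # bytes per relationship record


def record_offset(record_size: int, record_id: int) -> int:
    """Byte offset of a fixed-size record inside its store file."""
    if record_id < 0:
        raise ValueError("record IDs are non-negative")
    return record_size * record_id


# Because every record has the same size, locating one is a single
# multiplication; no index lookup is involved.
node_offset = record_offset(NODE_RECORD_SIZE, 4)
rel_offset = record_offset(RELATIONSHIP_RECORD_SIZE, 2)
```

With these inputs, node 4 lands at byte 60 and relationship 2 at byte 68, the same values used on the traversal slides.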

  28. Traversal (Node -> Relationship)
    Node (15 bytes)
    next_rel_id=2
    Relationships (34 bytes)
    Record offsets: 0B, 34B, 68B, 102B, 136B, 170B
    2 * 34 = 68

  29. Traversal (Relationship -> Node)
    Relationship (34 bytes)
    first_node=1
    second_node=4
    Nodes (15 bytes)
    Record offsets: 0B, 15B, 30B, 45B, 60B, 75B
    1 * 15 = 15
    4 * 15 = 60
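
The two traversal steps (node to relationship, relationship to node) can be tried out against in-memory "store files". This is a rough sketch, not Neo4j's real on-disk format: the record sizes come from the slides, but the field positions inside each record (`NODE_FMT`, `REL_FMT`) and all function names are assumptions made for illustration.

```python
import struct

NODE_SIZE = 15  # bytes per node record, as on the slide
REL_SIZE = 34   # bytes per relationship record, as on the slide

# Hypothetical, simplified field layout (NOT the real on-disk format);
# the remaining bytes of each record are left as padding.
NODE_FMT = "<BI"   # in_use, next_rel_id
REL_FMT = "<BII"   # in_use, first_node, second_node


def write_record(store, size, record_id, fmt, *fields):
    """Write fields into the record located at size * record_id."""
    struct.pack_into(fmt, store, size * record_id, *fields)


def read_record(store, size, record_id, fmt):
    """Read a record by jumping straight to size * record_id."""
    return struct.unpack_from(fmt, store, size * record_id)


def hop(nodes, rels, node_id):
    """Follow a node's first relationship to the node at its other end."""
    _in_use, rel_id = read_record(nodes, NODE_SIZE, node_id, NODE_FMT)
    _in_use, first_node, second_node = read_record(rels, REL_SIZE, rel_id, REL_FMT)
    return second_node if first_node == node_id else first_node


# Same numbers as on the slides: node 1 points at relationship 2,
# which connects node 1 and node 4.
nodes = bytearray(NODE_SIZE * 6)
rels = bytearray(REL_SIZE * 6)
write_record(nodes, NODE_SIZE, 1, NODE_FMT, 1, 2)
write_record(nodes, NODE_SIZE, 4, NODE_FMT, 1, 2)
write_record(rels, REL_SIZE, 2, REL_FMT, 1, 1, 4)

neighbour = hop(nodes, rels, 1)
```

Each hop costs two offset computations and two fixed-size reads, regardless of how many records the stores hold.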

  30. Native summary
    O(1) traversal hops
    Avoid super nodes!
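
Why super nodes hurt even though each hop is O(1): a node's relationships are chained together (the `first_next_rel_id` / `second_next_rel_id` fields on the storage layout slide), so finding one particular relationship can mean walking the chain. The sketch below is a simplified model with made-up names (`build_chain`, `find_first`); it ignores the relationship group store, which groups relationships of heavily connected nodes by type.

```python
from typing import Dict, Optional, Tuple

# rel_id -> (other_node, rel_type, next_rel_id in the same node's chain)
RelChain = Dict[int, Tuple[int, str, Optional[int]]]


def build_chain(degree: int, rare_type: str = "RARE") -> RelChain:
    """Chain of `degree` relationships where only the last one has rare_type."""
    chain: RelChain = {}
    for i in range(degree):
        next_id = i + 1 if i + 1 < degree else None
        rel_type = rare_type if i == degree - 1 else "COMMON"
        chain[i] = (1000 + i, rel_type, next_id)
    return chain


def find_first(chain: RelChain, first_rel_id: int, rel_type: str) -> Tuple[Optional[int], int]:
    """Walk the chain; return (other_node, number of records read)."""
    records_read = 0
    rel_id: Optional[int] = first_rel_id
    while rel_id is not None:
        other_node, current_type, next_id = chain[rel_id]
        records_read += 1
        if current_type == rel_type:
            return other_node, records_read
        rel_id = next_id
    return None, records_read


small = find_first(build_chain(5), 0, "RARE")
large = find_first(build_chain(100_000), 0, "RARE")
```

In this model the number of records read grows linearly with the node's degree: 5 reads for the small node, 100,000 for the super node.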

  31. Cypher
    Cypher is a declarative
    graph query language that
    allows for expressive and
    efficient querying.
    https://github.com/opencypher/openCypher

  32. Cypher 101
    ASCII art:
    ( ) - node
    --> - relationship
    Keywords:
    MATCH
    CREATE
    WHERE
    RETURN

  33. Cypher example (1)
    MATCH (root)-->(children)
    RETURN *

  34. Cypher example (2)
    MATCH
    (t:Towers)
    -[:CHILDREN]->
    (n:NetworkPiece)
    -[:CHILDREN]->
    (e:Function)
    WHERE NOT
    (t)-[:CHILDREN]->(:CellJCA)
    RETURN t
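
To make the semantics of this query concrete, here is roughly what it asks for, written as plain Python over an in-memory graph. This is only an illustration of the pattern's meaning, not how Neo4j executes Cypher; the sample data and the function name `towers_without_cell` are invented, and every edge is assumed to be a `:CHILDREN` relationship.

```python
from typing import Dict, List, Set

# Invented sample graph: node name -> labels, node name -> child nodes.
labels: Dict[str, Set[str]] = {
    "t1": {"Towers"}, "n1": {"NetworkPiece"}, "f1": {"Function"},
    "t2": {"Towers"}, "n2": {"NetworkPiece"}, "f2": {"Function"},
    "c1": {"CellJCA"},
}
children: Dict[str, List[str]] = {
    "t1": ["n1"], "n1": ["f1"],
    "t2": ["n2", "c1"], "n2": ["f2"],
}


def towers_without_cell(labels, children) -> List[str]:
    """One result row per (t)-[:CHILDREN]->(n)-[:CHILDREN]->(e) match."""
    rows = []
    for t, t_labels in labels.items():
        if "Towers" not in t_labels:
            continue
        # WHERE NOT (t)-[:CHILDREN]->(:CellJCA)
        if any("CellJCA" in labels[c] for c in children.get(t, [])):
            continue
        for n in children.get(t, []):
            if "NetworkPiece" not in labels[n]:
                continue
            for e in children.get(n, []):
                if "Function" in labels[e]:
                    rows.append(t)  # RETURN t
    return rows


result = towers_without_cell(labels, children)
```

Tower `t2` is filtered out because it has a direct `CellJCA` child, so only `t1` is returned.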

  35. Neo4j adoption

  36. Application
    Persistence layer
    Neo4j driver
    Neo4j
    Performance
    Fast
    Slow
    Persistence service

  37. Application
    Persistence layer
    Neo4j driver
    Neo4j
    Performance
    Fast
    Slow
    Persistence service

  38. Use cases
    Measurement average, 98%
    Resource usage ~ same

  39. UC: Sync
    Before: ~90m
    Neo4j: ~35m
                        Count    Per second
    Node count          80.32M   37498
    Relationship count  80.30M   37488
    Properties count    257.78M  120345
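
A quick sanity check on these numbers: dividing each count by its per-second rate should give the sync duration. The snippet below only rearranges the figures from the slide; the variable names are made up for this example.

```python
# (total count, items per second) as shown on the slide
sync = {
    "Node count": (80.32e6, 37498),
    "Relationship count": (80.30e6, 37488),
    "Properties count": (257.78e6, 120345),
}

# duration in minutes implied by each row
implied_minutes = {
    name: total / rate / 60 for name, (total, rate) in sync.items()
}

for name, minutes in implied_minutes.items():
    print(f"{name}: {minutes:.1f} min")
```

All three rows work out to roughly 35.7 minutes, which is consistent with the ~35m figure quoted for Neo4j.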

  40. UC: Single node
    Before: 3ms
    Neo4j: 2ms
    MATCH (n)
    WHERE n.id = {id}
    RETURN n

  41. UC: Subgraph
    Before: 88ms
    Neo4j: 14ms
    MATCH (n)-[r*]->(c)
    WHERE n.id = {id}
    RETURN *
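
The variable-length pattern `-[r*]->` matches every directed path of one or more relationships starting at `n`, with no relationship repeated within a path. A minimal Python sketch of that expansion follows; the adjacency format and the name `paths_from` are assumptions for illustration, not anything Neo4j exposes.

```python
from typing import Dict, FrozenSet, List, Tuple

# node -> list of (relationship id, target node)
Adjacency = Dict[str, List[Tuple[int, str]]]


def paths_from(adj: Adjacency, start: str) -> List[List[str]]:
    """All directed paths of length >= 1 from start, each relationship used at most once per path."""
    results: List[List[str]] = []

    def walk(node: str, used: FrozenSet[int], path: List[str]) -> None:
        for rel_id, target in adj.get(node, []):
            if rel_id in used:
                continue  # a relationship may appear only once in a path
            new_path = path + [target]
            results.append(new_path)  # one result row per matched path
            walk(target, used | {rel_id}, new_path)

    walk(start, frozenset(), [start])
    return results


# Invented sample tree: a -> b -> d, a -> c
adj: Adjacency = {"a": [(0, "b"), (1, "c")], "b": [(2, "d")]}
paths = paths_from(adj, "a")
```

For this small tree the expansion yields three paths (`a-b`, `a-b-d`, `a-c`), so a `count(*)` over the same pattern would report 3.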

  42. UC: By type
    Before: 235ms
    Neo4j: 194ms
    MATCH (t:Tower)
    RETURN t

  43. UC: Count
    Before: 32ms
    Neo4j: 16ms
    MATCH (n)-[r*]->(c)
    WHERE n.id = {id}
    RETURN count(*)

  44. UC: Traversal
    Before: 112ms
    Neo4j: 39ms
    MATCH (n)-[r]->(c)
    WHERE n.id = {id}
    RETURN *
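
For a side-by-side view of the query use cases, the speedup of each one can be computed from the before/after latencies on the slides. The numbers are taken from the deck; the dictionary and variable names are made up for this example.

```python
# use case -> (latency before in ms, latency with Neo4j in ms)
latencies_ms = {
    "Single node": (3, 2),
    "Subgraph": (88, 14),
    "By type": (235, 194),
    "Count": (32, 16),
    "Traversal": (112, 39),
}

# speedup factor = before / after, rounded to two decimals
speedups = {
    name: round(before / after, 2)
    for name, (before, after) in latencies_ms.items()
}

for name, factor in sorted(speedups.items(), key=lambda item: -item[1]):
    print(f"{name}: {factor}x")
```

The subgraph query gains the most (about 6.3x), while fetching all nodes by label gains the least (about 1.2x).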

  45. Future
    • Real graph API for application
    • Rewrite manual traversals to Cypher queries

  46. Deployment
    • Implemented in Java
    • Works everywhere
    • Writes - vertical scaling
    • Reads - horizontal scaling
    • Extensions & Stored procedures
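
The write/read scaling split above implies that an application sends all writes to a single master and spreads reads across the other cluster members. A hypothetical routing sketch follows; `ClusterRouter` and the host names are invented for illustration and are not part of any Neo4j driver.

```python
from itertools import cycle
from typing import List


class ClusterRouter:
    """Route writes to the master, reads round-robin across slaves."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        # Fall back to the master for reads if there are no slaves.
        self._readers = cycle(slaves or [master])

    def route(self, is_write: bool) -> str:
        return self.master if is_write else next(self._readers)


router = ClusterRouter("master:7687", ["slave1:7687", "slave2:7687"])
write_target = router.route(is_write=True)
read_targets = [router.route(is_write=False) for _ in range(3)]
```

Adding more slaves raises read capacity, while write capacity still depends on how large the single master machine is.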

  47. Stability
    • High load on DB
    • Kill Slave/master
    • Rolling upgrade
    • Split-brain
    • Server power-off
