Neo4j Magic Adventures

When the domain of your data is clearly a graph, why shove it into a relational model? Specialized graph databases like Neo4j have demonstrated that it's easier to "think in graphs" while working with your data. But is Neo4j fast enough for use cases where tight performance is needed?

Dmitrijs Vrublevskis

November 10, 2014

Transcript

  1. Neo4j
    Magic Adventures
    Dmitry Vrublevsky @ Neueda
    [email protected]

  2. Briefing
    1. Evaluate Neo4j capabilities
    • Import test dataset
    • Implement and run test cases
    • Measure everything
    2. Compare Neo4j with existing solution

  3. Neo4j
    “Neo4j – the World’s Leading Graph Database”

  4. • Native graph storage
    • Property graph database
    • Schema-less
    • Powerful Query Language
    • Clustering
    • Hot backups

  5. Network graph
    • Looks like a tree
    • Node has unique id
    • Node has type
    • Node can contain structures
    • … or structure sequences
    • … or structure with sequence of structures

  6. • Node (Object)
    • Structure
    • Sequence
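
The model from the last two slides can be sketched in plain Java. This is only an illustration of the three concepts (node, structure, sequence); every class and field name below is hypothetical and not taken from the original project.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; the slides only name the three concepts.

/** A structure: a bag of named values. A value may itself be a Sequence. */
class Structure {
    final Map<String, Object> fields = new LinkedHashMap<>();
}

/** A sequence: an ordered list of structures. */
class Sequence {
    final List<Structure> items = new ArrayList<>();
}

/** A node (object) in the tree-like network graph. */
class GraphNode {
    final String id;    // unique id
    final String type;  // node type
    final List<Structure> structures = new ArrayList<>();
    final List<Sequence> sequences = new ArrayList<>();
    final List<GraphNode> children = new ArrayList<>();

    GraphNode(String id, String type) {
        this.id = id;
        this.type = type;
    }

    /** Counts this node and all descendants; structures and sequences are not counted. */
    int countNodes() {
        int total = 1;
        for (GraphNode child : children) {
            total += child.countNodes();
        }
        return total;
    }
}
```

In the graph database each of these would become a node of its own, connected by relationships, rather than a nested Java object.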

  7. Network graph
    ~ 8.970.000 nodes
    ~ 11.000.000 relationships
    ~ 33.140.000 properties
    ~ 5.7 GB

  8. Environment
    • 8 GB / 4 CPUs
    • 80 GB SSD Disk
    • CentOS 6.5 x64

  9. Day 1 - 7
    • Environment setup… Done
    • Dataset import… Done
    • Documentation read… Done

  10. Mission impossible
    • Take existing data model
    • Take existing test case
    • Make it as fast as possible

  11. Test case
    Read the whole graph under a specified root node with one request.
    Node count in graph - 97254
    (structures & sequences not counted)

  12. Neo4j toolbox

  13. REST API
    • Node & Relationship endpoints
    • Transactional endpoint
    • Traversal endpoint
    • Batch operations endpoint
    • Other

  14. Cypher
    “Cypher is a declarative graph query
    language that allows for expressive
    and efficient querying and updating
    of the graph store.”

  15. • ASCII art
    • Keywords like WHERE and ORDER BY are inspired
    by SQL
    • Focuses on the clarity of expressing what to
    retrieve from a graph
    • Collection semantics have been borrowed from
    languages such as Haskell and Python

  16. MATCH (john {name: 'John'})-[:friend]->(friends)
    MATCH (friends)-[:friend]->(fof)
    RETURN john, fof
    ( … ) - node
    [ … ] - relationship
    ()-[]-() - path

  17. Day 8
    Cypher? REST API?

  18. Solution via Cypher query
    MATCH
    (root:Object)-[r1:CHILDREN*]->(child:Object)
    WHERE
    root.id = {rootNodeId}
    OPTIONAL MATCH
    (child)-[property:PROPERTY]->(child_property)
    RETURN *

  19. #!/bin/bash
    QUERY=bodies/subgraph.json
    curl -i -XPOST \
      -o output.log \
      --data "@$QUERY" \
      -H "Accept: application/json" \
      -H "Content-Type: application/json" \
      http://127.0.0.1:7474/db/data/transaction/commit

  20. Metric           N/A    REST/Default
      Received         ?      1225 MB
      Download speed   ?      1535 KB/s
      Time spent       ?      817 seconds

  21. Observations
    • Response streamed
    • Large response size
    • Long request time

  22. Day 9
    Requirements arrived!

  23. Required total time for the test case: 2 seconds
    Currently we have 817 seconds

  24. [
      {
        "id": "100",
        "graph": { "nodes": [ {"id": "101"} ] }
      },
      {
        "id": "100",
        "graph": { "nodes": [ {"id": "102"} ] }
      }
    ]
    (Diagram: nodes 100, 101 and 102)

  25. Cypher thinks in paths,
    not graphs!
    • Unnecessary data duplication
    • Cypher doesn’t know about our data model
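
The duplication is easy to see on the two-row response from the previous slide: node 100 is sent once per path. A minimal plain-Java sketch, assuming each row is reduced to a list of node ids (a simplification of the real REST payload; the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PathRows {
    /** Total number of node entries across all rows, duplicates included. */
    static int transferredEntries(List<List<String>> rows) {
        int total = 0;
        for (List<String> row : rows) {
            total += row.size();
        }
        return total;
    }

    /** Distinct node ids in first-seen order. */
    static List<String> distinctNodeIds(List<List<String>> rows) {
        Set<String> seen = new LinkedHashSet<>();
        for (List<String> row : rows) {
            seen.addAll(row);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("100", "101"),
            Arrays.asList("100", "102"));
        System.out.println(transferredEntries(rows));  // prints 4
        System.out.println(distinctNodeIds(rows));     // prints [100, 101, 102]
    }
}
```

With two rows the overhead is one extra entry; with long paths and tens of thousands of children the repeated prefixes dominate the response.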

  26. Another solution
    MATCH
    (root:Object)-[r1:CHILDREN*]->(child:Object)
    WHERE
    root.id = {rootNodeId}
    OPTIONAL MATCH
    (child)-[r2:PROPERTY]->(child_property)

    RETURN
    collect(root) + collect(child) + collect(child_property)
    as nodes,
    collect(r2) as relationships

  27. #!/bin/bash
    QUERY=bodies/subgraph_optimised.json
    curl -i -XPOST \
      -o output.log \
      --data "@$QUERY" \
      -H "Accept: application/json" \
      -H "Content-Type: application/json" \
      http://127.0.0.1:7474/db/data/transaction/commit

  28. Metric           REST/Default    REST/Optimized
      Received         1225 MB         85.2 MB
      Download speed   1535 KB/s       1579 KB/s
      Time spent       817 seconds     55 seconds

  29. Conclusion
    • We need more control over the querying & serialisation process!
    • Maybe another serialisation format?
    • Another querying API?

  30. Day 10
    Morning standup
    - “We need to implement our own extension. Can we
    do it?”
    - “Yeah, definitely.”

  31. Unmanaged extension
    • The unmanaged extensions are a way of deploying
    arbitrary JAX-RS code into the Neo4j server.

  32. Plan
    1. Take fast serialisation library
    2. Take Neo4j Java API
    3. Implement our own endpoints
    4. …
    5. Profit!

  33. BSON
    • Obvious choice
    • Brought by MongoDB
    • Fast serialisation
    http://bsonspec.org/
    1. Lightweight
    2. Traversable
    3. Efficient
    (as they say)
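
To make the format concrete, here is a hand-rolled encoder for the smallest interesting case: a document with one string field, laid out as described on bsonspec.org. It is a sketch for illustration only (the class and method names are hypothetical); the talk itself uses a library for this.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class TinyBson {
    /** Encodes a document with a single UTF-8 string field. */
    static byte[] encodeStringField(String name, String value) {
        byte[] key = name.getBytes(StandardCharsets.UTF_8);
        byte[] val = value.getBytes(StandardCharsets.UTF_8);
        // int32 document length + type byte + key + NUL
        // + int32 string length + value + NUL + document terminator
        int total = 4 + 1 + key.length + 1 + 4 + val.length + 1 + 1;
        ByteBuffer buf = ByteBuffer.allocate(total).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(total);
        buf.put((byte) 0x02);          // element type: UTF-8 string
        buf.put(key).put((byte) 0);    // field name as a C string
        buf.putInt(val.length + 1);    // string length includes the trailing NUL
        buf.put(val).put((byte) 0);
        buf.put((byte) 0);             // end of document
        return buf.array();
    }

    public static void main(String[] args) {
        // {"hello": "world"} is the example used on bsonspec.org
        System.out.println(encodeStringField("hello", "world").length);  // prints 22
    }
}
```

The length prefixes are what make the format traversable: a reader can skip a field or a whole nested document without parsing its contents.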

  34. Jackson
    • Jackson is used by Neo4j internally
    • It’s cool
    • Jackson has BSON plugin
    https://github.com/FasterXML/jackson
    https://github.com/michel-kraemer/bson4jackson

  35. // create mapper
    ObjectMapper mapper = new ObjectMapper(new BsonFactory());
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    // serialize data
    mapper.writeValue(baos, pojo);
    ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
    // deserialize data
    mapper.readValue(bais, PojoClass.class);

  36. // create bson factory
    BsonFactory factory = new BsonFactory();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    // serialize data field by field
    JsonGenerator gen = factory.createJsonGenerator(baos);
    gen.writeStartObject();
    gen.writeFieldName("name");
    gen.writeString(bob.getName());
    gen.writeEndObject(); // close the object opened above
    gen.close();
    Streaming!

  37. JAX-RS Streaming
    • Because why not?
    http://docs.oracle.com/javaee/6/api/javax/ws/rs/core/StreamingOutput.html

  38. StreamingOutput stream = new StreamingOutput() {
        @Override
        public void write(OutputStream os) throws IOException {
            // the container calls this with the response body stream
            Writer writer = new BufferedWriter(new OutputStreamWriter(os));
            writer.write("Hello World!");
            writer.flush();
        }
    };
    return Response.ok(stream).build();

  39. Day 11 - 12
    • Extension setup… Done
    • Cypher endpoint… Done
    • Documentation read… Done

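  40. Dependencies
    <dependency>
      <groupId>org.neo4j</groupId>
      <artifactId>neo4j</artifactId>
      <version>${neo4j.version}</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>javax.ws.rs</groupId>
      <artifactId>javax.ws.rs-api</artifactId>
      <version>2.0</version>
      <scope>provided</scope>
    </dependency>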

  41. private final GraphDatabaseService db;
    try (Transaction tx = db.beginTx()) {
        ExecutionEngine engine = new ExecutionEngine(db);
        ExecutionResult result = engine.execute(query);
        Bson.serialize(output, result);
    }

  42. Day 13
    Another day, another experiment

  43. New cypher solution
    MATCH (root:Object)-[:CHILDREN*]->(child:Object)
    WHERE root.id = {rootNodeId}
    RETURN child
    (Properties autoloaded during serialisation!)

  44. Iterable<Relationship> relationships =
        node.getRelationships(Relationships.PROPERTY, Direction.OUTGOING);
    for (Relationship relationship : relationships) {
        Node endNode = relationship.getEndNode();
        if (endNode.hasLabel(Labels.Structure)) {
            ...
        } else if (endNode.hasLabel(Labels.Sequence)) {
            ...
        }
    }

  45. #!/bin/bash
    QUERY=bodies/bson.json
    curl -i -XPOST \
      -o output.log \
      --data "@$QUERY" \
      -H "Content-Type: application/json" \
      http://127.0.0.1:7474/extension/bson/cypher

  46. Metric           REST/Optimized    Bson/Cypher
      Received         85.2 MB           78.3 MB
      Download speed   1579 KB/s         12.3 MB/s
      Time spent       55 seconds        6 seconds

  47. Traverse API
    “The Neo4j Traversal API is a callback based, lazily
    executed way of specifying desired movements
    through a graph in Java.”
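
What "callback based, lazily executed" means can be shown without Neo4j at all. The sketch below is a plain-Java stand-in, not the Neo4j API: a breadth-first iterator over a child map that visits nodes only as the caller pulls them and asks a predicate callback which nodes to return. All names are hypothetical, and the input is assumed to be a tree (no cycles).

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Lazy breadth-first walk over a child map; assumes a tree, so no visited set is kept. */
class LazyBfs implements Iterator<String> {
    private final Map<String, List<String>> children;
    private final Predicate<String> include;  // callback deciding which nodes are returned
    private final Deque<String> queue = new ArrayDeque<>();
    private String next;

    LazyBfs(Map<String, List<String>> children, String root, Predicate<String> include) {
        this.children = children;
        this.include = include;
        queue.add(root);
        advance();
    }

    /** Moves forward until the next included node is found or the queue is empty. */
    private void advance() {
        next = null;
        while (next == null && !queue.isEmpty()) {
            String node = queue.poll();
            queue.addAll(children.getOrDefault(node, Collections.<String>emptyList()));
            if (include.test(node)) {
                next = node;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String result = next;
        advance();
        return result;
    }
}
```

Because nothing is materialised up front, the consumer can serialise each node to the response stream as soon as it is produced.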

  48. ResourceIterable<Node> nodes = db
        .traversalDescription()
        .breadthFirst()
        .relationships(Relationships.CHILDREN, Direction.OUTGOING)
        .evaluator(Evaluators.all())
        .traverse(rootNode)
        .nodes();

  49. #!/bin/bash
    curl -i -XGET \
      -o output.log \
      http://127.0.0.1:7474/extension/bson/traverse

  50. Metric           Bson/Cypher    Bson/Traverse
      Received         78.3 MB        78.3 MB
      Download speed   12.3 MB/s      23.7 MB/s
      Time spent       6 seconds      3 seconds

  51. Day 14
    Can we do better?

  52. “Kryo is a fast and efficient object graph serialization
    framework for Java.”

  53. Output output = new Output(outputStream);
    // Setup
    Kryo kryo = new Kryo();
    kryo.setRegistrationRequired(true);
    kryo.register(HashMap.class);
    kryo.register(String[].class);
    kryo.register(NodeDAO.class);
    kryo.register(RelationshipDAO.class);
    // Serialize
    kryo.writeObject(output, object1);
    kryo.writeObject(output, object2);
    output.close();

  54. #!/bin/bash
    QUERY=bodies/kryo.json
    curl -i -XPOST \
      -o output.log \
      --data "@$QUERY" \
      -H "Content-Type: application/json" \
      http://127.0.0.1:7474/extension/kryo/cypher

  55. Metric           Bson/Cypher    Kryo/Cypher
      Received         78.3 MB        68.5 MB
      Download speed   12.3 MB/s      20.5 MB/s
      Time spent       6 seconds      3 seconds

  56. #!/bin/bash
    curl -i -XGET \
      -o output.log \
      -w "\ntime_connect=%{time_connect}\ntime_start_transfer=%{time_starttransfer}\ntime_total=%{time_total}\n" \
      http://127.0.0.1:7474/extension/kryo/traverse

  57. Metric           Bson/Traverse    Kryo/Traverse
      Received         78.3 MB          68.5 MB
      Download speed   23.7 MB/s        47.2 MB/s
      Time spent       3 seconds        1.4 seconds

  58. Compression?

  59. LZF
    • Optimized for speed
    • Streaming
    https://github.com/ning/compress

  60. https://github.com/ning/jvm-compressor-benchmark/wiki

  61. import com.esotericsoftware.kryo.io.Output;
    // Before
    Output output = new Output(outputStream);
    // After
    Output output = new Output(
    new LZFOutputStream(outputStream)
    );
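
The wrapping pattern itself can be tried with nothing but the JDK. The sketch below swaps LZF for `java.util.zip.GZIPOutputStream` purely to stay dependency-free; GZIP is a different codec with a different speed/ratio trade-off, so this shows the mechanics, not the talk's numbers. The class name and the synthetic payload are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class StreamCompression {
    /** Writes the payload through a compressing wrapper and returns the compressed bytes. */
    static byte[] compress(byte[] payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(sink)) {
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Reads the compressed bytes back through the matching decompressing wrapper. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                sink.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Synthetic, highly repetitive payload standing in for a serialised subgraph. */
    static byte[] samplePayload() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("{\"id\":\"101\",\"type\":\"Object\"}");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

The serialiser never learns that compression is involved; it writes to an `OutputStream` either way, which is why the change on the slide is a one-line edit.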

  62. #!/bin/bash
    curl -i -XGET \
      -o output.log \
      -w "\ntime_connect=%{time_connect}\ntime_start_transfer=%{time_starttransfer}\ntime_total=%{time_total}\n" \
      http://127.0.0.1:7474/extension/kryo/traverse

  63. Metric           Kryo/Traverse    Kryo/Traverse Compressed
      Received         68.5 MB          7.6 MB
      Download speed   47.2 MB/s        4299 KB/s
      Time spent       1.4 seconds      1.7 seconds

  64. Short conclusion
    Compression is useful mostly for large (huge) responses.

  65. Day 15
    Configurations

  66. Important
    • Neo4j makes heavy use of the java.nio package.
    Native I/O will result in memory being allocated
    outside the normal Java heap.
    • Neo4j will require all of the heap memory of the
    JVM plus the memory to be used for memory
    mapping to be available as physical memory.

  67. File buffer cache
    • The file buffer cache is sometimes called low level
    cache or file system cache.
    • It uses the operating system memory mapping
    features when possible.
    • Neo4j uses multiple file buffer caches, one for each
    different storage file.

  68. Store file                          Record size   Contents
      neostore.nodestore.db               15 B          Nodes
      neostore.relationshipstore.db       34 B          Relationships
      neostore.propertystore.db           41 B          Properties for nodes and relationships
      neostore.propertystore.db.strings   128 B         Values of string properties
      neostore.propertystore.db.arrays    128 B         Values of array properties
      Strings and arrays are stored in one or more 120 B chunks, each with 8 B of record overhead.
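
With fixed record sizes, a store file's size is roughly record count times record size. The sketch below applies that to the dataset from the "Network graph" slide (about 8.97 million nodes and 11 million relationships). Properties are left out because the slides do not say how many property records the 33.14 million properties occupy. The class and method names are hypothetical.

```java
class StoreSizeEstimate {
    /** Size in bytes of a fixed-record store holding the given number of records. */
    static long storeBytes(long records, int recordSizeBytes) {
        return records * recordSizeBytes;
    }

    /** Converts bytes to decimal megabytes (1 MB = 1,000,000 bytes). */
    static double toMegabytes(long bytes) {
        return bytes / 1_000_000.0;
    }

    public static void main(String[] args) {
        long nodes = 8_970_000L;          // approximate count from the dataset slide
        long relationships = 11_000_000L; // approximate count from the dataset slide
        System.out.printf("nodestore:         %.0f MB%n",
            toMegabytes(storeBytes(nodes, 15)));
        System.out.printf("relationshipstore: %.0f MB%n",
            toMegabytes(storeBytes(relationships, 34)));
    }
}
```

Estimates like these give a starting point for deciding how much memory to map for each store file.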

  69. # Default values for the low-level graph engine
    neostore.nodestore.db.mapped_memory=25M
    neostore.relationshipstore.db.mapped_memory=50M
    neostore.propertystore.db.mapped_memory=90M
    neostore.propertystore.db.strings.mapped_memory=130M
    neostore.propertystore.db.arrays.mapped_memory=130M
    # Tuned
    neostore.nodestore.db.mapped_memory=150M
    neostore.relationshipstore.db.mapped_memory=400M
    neostore.propertystore.db.mapped_memory=600M
    neostore.propertystore.db.strings.mapped_memory=1450M
    neostore.propertystore.db.arrays.mapped_memory=400M

  70. 150 MB + 400 MB + 600 MB + 1450 MB + 400 MB = 3000 MB
    # Tuned
    neostore.nodestore.db.mapped_memory=150M
    neostore.relationshipstore.db.mapped_memory=400M
    neostore.propertystore.db.mapped_memory=600M
    neostore.propertystore.db.strings.mapped_memory=1450M
    neostore.propertystore.db.arrays.mapped_memory=400M
    Available memory
    • 3 GB - File buffers
    • 3 GB - Java heap
    • 2 GB - OS
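
The budget can be checked mechanically: the mapped-memory settings must add up to the file-buffer share, and file buffers, heap and OS reserve together must fit into physical RAM. A small sketch using the slide's numbers, treating 1 GB as 1000 MB as the slide does (class and method names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MemoryBudget {
    /** The tuned mapped_memory settings from the slide, in megabytes. */
    static Map<String, Integer> tunedSettings() {
        Map<String, Integer> mb = new LinkedHashMap<>();
        mb.put("neostore.nodestore.db", 150);
        mb.put("neostore.relationshipstore.db", 400);
        mb.put("neostore.propertystore.db", 600);
        mb.put("neostore.propertystore.db.strings", 1450);
        mb.put("neostore.propertystore.db.arrays", 400);
        return mb;
    }

    /** Sums the mapped-memory settings. */
    static int totalMappedMb(Map<String, Integer> mappedMemoryMb) {
        int total = 0;
        for (int mb : mappedMemoryMb.values()) {
            total += mb;
        }
        return total;
    }

    /** True when file buffers, heap and OS reserve together fit into physical RAM. */
    static boolean fits(int fileBuffersMb, int heapMb, int osMb, int physicalMb) {
        return fileBuffersMb + heapMb + osMb <= physicalMb;
    }

    public static void main(String[] args) {
        int mapped = totalMappedMb(tunedSettings());
        System.out.println(mapped);                          // prints 3000
        System.out.println(fits(mapped, 3000, 2000, 8000));  // prints true
    }
}
```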

  71. Metric           Kryo/Traverse    Kryo/Traverse Tuned
      Received         68.5 MB          68.5 MB
      Download speed   47.2 MB/s        49.6 MB/s
      Time spent       1.4 seconds      1.37 seconds

  72. Object cache
    • The object cache is sometimes called high level
    cache.
    • It caches the Neo4j data in a form optimized for
    fast traversal.
    • Two different categories:
      • Reference caches
      • High-Performance Cache

  73. Object cache
    • Nodes and relationships are added to the object
    cache as soon as they are accessed (lazily).
    • Reading from this cache may be 5 to 10 times
    faster than reading from the file buffer cache.

  74. Object cache type
    • None
    • Soft (Community default)
    • Weak
    • Strong
    • HPC (Enterprise default)

  75. Day 16
    Upgrade to Enterprise

  76. Neo4j Enterprise
    • Advanced Monitoring
    • Backups
    • Cluster
    • HPC

  77. HPC
    • Assigned a certain maximum amount of space on
    the JVM heap
    • Purge objects whenever it grows bigger than that
    • GC-pauses can be better controlled
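
The idea of a cache with a hard upper bound can be sketched with the JDK's `LinkedHashMap`. This is not how HPC is implemented: it bounds by entry count rather than heap bytes, and least-recently-used eviction is an assumption made for the example. It only illustrates "purge whenever it grows bigger than the limit".

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Size-bounded cache that evicts the least recently used entry; illustrative only. */
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true);  // true = iterate in access order, oldest access first
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insert; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

Because the cache never grows past its limit, the heap it occupies stays predictable, which is what makes GC pauses easier to control than with reference-based caches that are cleared at the collector's discretion.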

  78. Metric           Kryo/Traverse Tuned    Kryo/Traverse HPC
      Received         68.5 MB                68.5 MB
      Download speed   49.6 MB/s              64.7 MB/s
      Time spent       1.37 seconds           1.049 seconds

  79. 817 seconds → 1.049 seconds
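
Put as a ratio, the improvement from the first REST attempt to the HPC run is a factor of roughly 780, against a required factor of about 409 to reach the 2-second target. A small sketch of the arithmetic over the times reported in the slides (class and method names are hypothetical):

```java
class Speedup {
    /** How many times faster the "after" run is compared to the "before" run. */
    static double factor(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        String[] stages = {
            "REST/Optimized", "Bson/Cypher", "Bson/Traverse",
            "Kryo/Traverse", "Kryo/Traverse Tuned", "Kryo/Traverse HPC"
        };
        double[] seconds = {55, 6, 3, 1.4, 1.37, 1.049};
        double baseline = 817;  // REST/Default
        for (int i = 0; i < stages.length; i++) {
            System.out.printf("%-20s %8.1fx%n", stages[i], factor(baseline, seconds[i]));
        }
    }
}
```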

  80. Physical server
    • 240 GB / 32 CPUs
    • CentOS 6.5 x64

  81. Metric           Kryo/Traverse Virtual    Kryo/Traverse Physical
      Received         68.5 MB                  68.5 MB
      Download speed   64.7 MB/s                68.5 MB/s
      Time spent       1.049 seconds            0.991 seconds

  82. Other results
    • Get 30000 (huge) nodes by field: 2.3 seconds
    • Create nodes: >15000 nodes/second
    • ~2500 concurrent requests on virtual hardware
