OrientDB: go fast in a graph world

Relational databases have been the center of the world for many years, although they force you to adhere to a predefined schema. Now you have a choice: a NoSQL database.
OrientDB is a fast, multimodel NoSQL database that can store 220,000 records per second on common hardware. This talk covers some graph theory and the main advantages of using a graph database such as OrientDB.

Andrea Giuliano

February 25, 2016

Transcript

  1. ORIENTDB
    GO FAST IN A GRAPH WORLD
    February 25th, 2016
    @bit_shark
    Andrea Giuliano

  2. Andrea Giuliano
    @bit_shark
    $whoami

  3. ONCE UPON A TIME

  4. ONCE UPON A TIME - 1979
    • first commercially available RDBMS
    • written in assembly
    • runs in 128K of memory
    • no support for transactions
    • support for basic SQL queries and joins

  5. RELATIONAL DATABASES
    • data is presented to the user in the form of rows and columns (a relation)
    • data can be manipulated through relational operators in a tabular form

  6. OVER TIME
    • data started growing in size
    • data became heterogeneous
      • structured, semi-structured, unstructured data
    • the rate at which data is generated increased

  7. BIG DATA

  8. 30 YEARS LATER (2009)
    • NoSQL movement
    • some goals of NoSQL databases:
      • being non-relational
      • simplicity of design
      • simpler horizontal scaling
      • speed up some operations
      • distributed

  9. (SOME) TYPES OF NOSQL DATABASES
    • document
    • key-value
    • object-oriented
    • graph
    • multi-model

  10. DOCUMENT MODEL
    • the document encapsulates data in some standard format: YAML, JSON, XML, BSON
    {
      "id": 45,
      "name": "Andrea",
      "fav_colours": ["blue", "green"],
      "driver_license": {
        "number": "AA123"
      }
    }
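
The document on this slide can be read with any JSON parser. The following is a minimal sketch using Python's standard `json` module; the variable names are illustrative and not part of the talk.

```python
import json

# The document from the slide, stored as a JSON string.
raw = """
{
  "id": 45,
  "name": "Andrea",
  "fav_colours": ["blue", "green"],
  "driver_license": {"number": "AA123"}
}
"""

doc = json.loads(raw)

# Nested values are reached by walking the structure; no join is involved.
licence_number = doc["driver_license"]["number"]
first_colour = doc["fav_colours"][0]
print(licence_number, first_colour)
```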

  11. KEY-VALUE MODEL
    • dictionary in which data is represented as a collection of key-value pairs
    > SET akey "Andrea"
    > GET akey
    "Andrea"
    [diagram: key akey mapped to value Andrea]
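
The SET and GET commands above behave like writes and reads on a dictionary. A minimal sketch, assuming a plain Python dict as the store; `set_key` and `get_key` are hypothetical helper names, and a real key-value database adds persistence and networking on top.

```python
# A plain dict plays the role of the key-value store (illustration only).
store = {}

def set_key(key, value):
    # Equivalent in spirit to: SET key value
    store[key] = value

def get_key(key):
    # Equivalent in spirit to: GET key (returns None when the key is missing)
    return store.get(key)

set_key("akey", "Andrea")
print(get_key("akey"))
```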

  12. OBJECT-ORIENTED MODEL
    • data is represented in the form of objects
    [diagram: class Animal with subclasses Dog and Cat]

  13. GRAPH MODEL
    • data is represented in the form of a graph

  14. MULTIMODEL
    Key-Value
    Document
    Object-oriented
    Graph

  15. RELATIONAL VS NOSQL
    • how data is represented
    • how data is related
      • relational databases have the concept of joins
      • NoSQL databases have multiple concepts
        • aggregation
        • relation (through edges)

  16. ISSUES WITH JOIN
    User
    name    id
    Andrea  45
    John    48
    Steven  53
    Bill    70
    Like
    user_id  food_id
    45       13
    45       49
    70       38
    Food
    id  name
    13  Pasta
    38  Sushi
    49  Kebab
    63  Meat
    SELECT F.name FROM User U, Like L, Food F
    WHERE U.name='Andrea' AND U.id=L.user_id AND L.food_id=F.id;

  17. ISSUES WITH JOIN
    SELECT F.name FROM User U, Like L, Food F
    WHERE U.name='Andrea' AND U.id=L.user_id AND L.food_id=F.id;
    double JOIN per record at runtime

  18. ISSUES WITH JOIN
    • the relationships are computed every time a query is performed
    • time complexity grows with data: O(log n)
    • heavy runtime cost with large datasets
    • index lookup does not help
      • speeds up searches but slows down inserts, updates, deletes
    • imagine this on billions of records
    speakerdeck.com/agiuliano/index-management-in-depth

  19. SUMMING UP JOIN
    • a join operation involves
      • searching for a record in the starting table (User)
      • using the foreign key to look up the intermediate table (Like) through its index
      • traversing the intermediate table to look up the target table (Food) ids
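
The three join steps can be sketched in plain Python over the User, Like and Food tables shown earlier in the deck. This is an illustration under assumptions, not how any particular RDBMS executes the query: the dictionaries are hypothetical stand-ins for database indexes, and a real tree-based index would cost O(log n) per lookup rather than the constant time of a hash lookup.

```python
# Tables from the "ISSUES WITH JOIN" slides, as lists of rows.
users = [("Andrea", 45), ("John", 48), ("Steven", 53), ("Bill", 70)]
likes = [(45, 13), (45, 49), (70, 38)]
foods = [(13, "Pasta"), (38, "Sushi"), (49, "Kebab"), (63, "Meat")]

# Hypothetical in-memory indexes standing in for the database's indexes.
user_id_by_name = {name: uid for name, uid in users}
food_name_by_id = {fid: name for fid, name in foods}
food_ids_by_user = {}
for user_id, food_id in likes:
    food_ids_by_user.setdefault(user_id, []).append(food_id)

def foods_liked_by(name):
    # Step 1: search for the record in the starting table (User).
    user_id = user_id_by_name[name]
    # Step 2: use the foreign key to look up the intermediate table (Like).
    food_ids = food_ids_by_user.get(user_id, [])
    # Step 3: look up each target record (Food) by its id.
    return [food_name_by_id[fid] for fid in food_ids]

print(foods_liked_by("Andrea"))
```

Every call repeats all three lookups, which is the per-query cost the slides describe.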

  20. The more entries you have, the SLOWER your queries are
    www.flickr.com/photos/blacktigersdream/8737830046

  21. SAVING PROJECTIONS

  22. SAVING PROJECTIONS
    advantages
    • data is predetermined
    disadvantages
    • data synchronization
    • solves only reads
    UserLikesFood
    User    user_id  Like   food_id
    Andrea  45       Pasta  13
    Andrea  45       Kebab  49
    Bill    70       Sushi  38
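
A saved projection is a precomputed join result. The sketch below, with hypothetical names such as `build_projection`, builds the UserLikesFood rows from the slide and shows why synchronization is listed as a disadvantage: the saved rows go stale when the source data changes.

```python
# Source data (subset of the User, Like and Food tables used in the deck).
users = {45: "Andrea", 70: "Bill"}
foods = {13: "Pasta", 38: "Sushi", 49: "Kebab"}
likes = [(45, 13), (45, 49), (70, 38)]

def build_projection():
    # Precompute the joined rows once, so that reads need no join.
    return [(users[u], u, foods[f], f) for u, f in likes]

user_likes_food = build_projection()

# A new Like is written to the source table only...
likes.append((70, 13))
# ...so the saved projection is now stale: it still has the old 3 rows.
stale_row_count = len(user_likes_food)

# Keeping it in sync means rebuilding (or updating) it on every write.
user_likes_food = build_projection()
print(stale_row_count, len(user_likes_food))
```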

  23. RELATIONSHIPS IN NOSQL WORLD

  24. RELATIONSHIPS IN DOCUMENTS
    • embed information in documents where you need it
      • data duplication
      • faster access
    {
      "id": 45,
      "name": "Andrea",
      "likes": ["Pasta", "Kebab"]
    }

  25. GRAPHS

  26. GRAPH
    G = (V, E)
    G: graph, V: vertices, E: edges
    [diagram: a graph with one vertex and one edge labelled]

  27. GRAPH
    [diagram: vertex Andrea (name: Andrea, license: A123) connected by the edge drives to vertex BMW (model: X5, doors: 5)]
    EDGES ARE DIRECTED
    VERTICES CAN HAVE PROPERTIES
    EDGES CAN HAVE PROPERTIES
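
The graph on this slide can be modelled with two small classes. This is a minimal sketch and not OrientDB's API; the class names, attribute names and the edge property `since` are hypothetical and only illustrate that both vertices and edges carry properties and that an edge has a direction.

```python
class Vertex:
    def __init__(self, label, properties=None):
        self.label = label
        self.properties = properties or {}

class Edge:
    def __init__(self, label, out_vertex, in_vertex, properties=None):
        self.label = label
        self.out_vertex = out_vertex  # the vertex the edge leaves
        self.in_vertex = in_vertex    # the vertex the edge enters
        self.properties = properties or {}

andrea = Vertex("Andrea", {"name": "Andrea", "license": "A123"})
bmw = Vertex("BMW", {"model": "X5", "doors": 5})
# "since" is an invented edge property, used only as an example.
drives = Edge("drives", andrea, bmw, {"since": 2016})

# G = (V, E)
V = [andrea, bmw]
E = [drives]
print(drives.out_vertex.label, drives.label, drives.in_vertex.label)
```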

  28. GRAPH
    [diagram: Andrea and BMW connected by two edges, drives and owns]
    N-M relationships can be represented using multiple edges

  29. BUILD SMART RELATIONSHIPS
    [diagram: root vertices Customers, Cars and Luxury Cars grouping the vertices Andrea, John, BMW and Ferrari]

  30. BUILD SMART RELATIONSHIPS
    • root vertices can be meta graphs
    • meta graphs add information to make traversal easier and faster

  31. EXAMPLE
    a Car can be enriched with information regarding
    • date of purchase
    • country of manufacture
    www.flickr.com/photos/aigle_dore/5952275132

  32. BUILD SMART RELATIONSHIPS
    [diagram: cars BMW, Ferrari and Maserati linked to the Purchase tree (Year 2016; Months Jan 2016, Feb 2016; Days 01/15/2016, 02/01/2016)]

  33. BUILD SMART RELATIONSHIPS
    [diagram: cars BMW, Ferrari and Maserati linked to the Made tree (Europe; Italy, Germany)]

  34. BUILD SMART RELATIONSHIPS
    [diagram: cars BMW, Ferrari and Maserati linked to the Purchase tree (Year 2016; Months Jan 2016, Feb 2016; Days 01/15/2016, 02/01/2016) and to the Made tree (Europe; Italy, Germany)]

  35. BUILD SMART RELATIONSHIPS
    [diagram: cars BMW, Ferrari and Maserati linked to the Purchase tree (Year 2016; Months Jan 2016, Feb 2016; Days 01/15/2016, 02/01/2016) and to the Made tree (Europe; Italy, Germany)]
    get all the Italian cars sold on 01/15/2016

  36. BUILD SMART RELATIONSHIPS
    [diagram: cars BMW, Ferrari and Maserati linked to the Purchase tree (Year 2016; Months Jan 2016, Feb 2016; Days 01/15/2016, 02/01/2016) and to the Made tree (Europe; Italy, Germany)]
    let’s start from Made

  37. BUILD SMART RELATIONSHIPS
    [diagram: cars BMW, Ferrari and Maserati linked to the Purchase tree (Year 2016; Months Jan 2016, Feb 2016; Days 01/15/2016, 02/01/2016) and to the Made tree (Europe; Italy, Germany)]

  38. BUILD SMART RELATIONSHIPS
    [diagram: cars BMW, Ferrari and Maserati linked to the Purchase tree (Year 2016; Months Jan 2016, Feb 2016; Days 01/15/2016, 02/01/2016) and to the Made tree (Europe; Italy, Germany)]
    found the cars made in Italy
    now filter by date using incoming edges

  39. BUILD SMART RELATIONSHIPS
    [diagram: cars BMW, Ferrari and Maserati linked to the Purchase tree (Year 2016; Months Jan 2016, Feb 2016; Days 01/15/2016, 02/01/2016) and to the Made tree (Europe; Italy, Germany)]

  40. BUILD SMART RELATIONSHIPS
    [diagram: cars BMW, Ferrari and Maserati linked to the Purchase tree (Year 2016; Months Jan 2016, Feb 2016; Days 01/15/2016, 02/01/2016) and to the Made tree (Europe; Italy, Germany)]
    let’s try from Purchase

  41. BUILD SMART RELATIONSHIPS
    [diagram: cars BMW, Ferrari and Maserati linked to the Purchase tree (Year 2016; Months Jan 2016, Feb 2016; Days 01/15/2016, 02/01/2016) and to the Made tree (Europe; Italy, Germany)]

  42. BUILD SMART RELATIONSHIPS
    [diagram: cars BMW, Ferrari and Maserati linked to the Purchase tree (Year 2016; Months Jan 2016, Feb 2016; Days 01/15/2016, 02/01/2016) and to the Made tree (Europe; Italy, Germany)]
    found the cars purchased on 01/15/2016
    now filter by country using incoming edges

  43. BUILD SMART RELATIONSHIPS
    [diagram: cars BMW, Ferrari and Maserati linked to the Purchase tree (Year 2016; Months Jan 2016, Feb 2016; Days 01/15/2016, 02/01/2016) and to the Made tree (Europe; Italy, Germany)]
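
The two strategies walked through in these slides (start from Made, or start from Purchase) can be sketched with plain Python sets. The transcript does not preserve which car is linked to which purchase day, so the `purchased_on` mapping below is an assumption made only for illustration; the function names are hypothetical as well.

```python
# Made tree: country vertex -> cars it points to.
made_in = {"Italy": {"Ferrari", "Maserati"}, "Germany": {"BMW"}}

# Purchase tree: day vertex -> cars it points to.
# ASSUMPTION: these car/day pairs are invented for the example.
purchased_on = {"01/15/2016": {"BMW", "Ferrari"}, "02/01/2016": {"Maserati"}}

def from_made(country, day):
    # Start from Made: reach the cars of the country, then keep only
    # those that also have an incoming edge from the requested day.
    return {car for car in made_in[country] if car in purchased_on[day]}

def from_purchase(country, day):
    # Start from Purchase: reach the cars bought on the day, then keep
    # only those that also have an incoming edge from the country.
    return {car for car in purchased_on[day] if car in made_in[country]}

print(from_made("Italy", "01/15/2016"))
print(from_purchase("Italy", "01/15/2016"))
```

Both entry points return the same cars; which one is cheaper depends on which root vertex leads to the smaller candidate set.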

  44. ORIENTDB

  45. ORIENTDB
    • NoSQL database
    • multimodel
    • high performance (can write 400,000 records/sec*)
    • HTTP REST and JSON API
    • ACID
    *On Intel i7 8 core CPU, 16 GB RAM, SSD RPM, Multi-threads, no indexes (orientdb.com)

  46. 15+ languages
    30+ drivers

  47. INSTALLATION
    orientdb.com/docs/2.1/Tutorial-Installation.html
    $ docker run -d -v … orientdb/orientdb
    $ brew install orientdb

  48. LOGICAL CONCEPTS
    • class
      • type of data model
    • cluster
      • stores groups of records within a class
    [diagram: class Car with clusters USA_car and Italy_car]

  49. VERTICES
    • record identifier (RID)
      • each record has its own self-assigned unique ID
      • composed of 2 parts: #<cluster-id>:<cluster-position>
    • list of properties
    • edge’s RID
      • in
      • out

  50. EDGES
    • record identifier (RID)
      • each record has its own self-assigned unique ID
      • composed of 2 parts: #<cluster-id>:<cluster-position>
    • in
      • RID of the incoming vertex
    • out
      • RID of the outgoing vertex

  51. RELATIONSHIPS
    • does not use JOINs as an RDBMS does
    • physical links: O(1)
    • relationships are managed by storing the edge’s RID in both vertices as “out” and “in”
    • for 1-to-n relationships, collections of RIDs are used
    #13:35 (vertex Andrea)  name: Andrea, license: A123, out: [#14:54]
    #14:54 (edge drives)    out: [#13:35], in: [#15:100]
    #15:100 (vertex BMW)    model: X5, in: [#14:54]
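
Link-based traversal can be illustrated by keying records on their RID and following the stored RIDs directly. This is a sketch of the idea only, not OrientDB's storage engine: the `records` dict and the `out_neighbours` helper are hypothetical, and a hash lookup stands in for the physical link.

```python
# Records keyed by RID, mirroring the three records on the slide.
records = {
    "#13:35": {"name": "Andrea", "license": "A123", "out": ["#14:54"]},
    "#14:54": {"label": "drives", "out": "#13:35", "in": "#15:100"},
    "#15:100": {"model": "X5", "in": ["#14:54"]},
}

def out_neighbours(rid, label):
    """Follow the outgoing edges with the given label from the vertex at rid."""
    result = []
    for edge_rid in records[rid].get("out", []):
        edge = records[edge_rid]                 # direct lookup by RID
        if edge["label"] == label:
            result.append(records[edge["in"]])   # direct lookup again
    return result

cars = out_neighbours("#13:35", "drives")
print(cars[0]["model"])
```

No table is searched and no index is consulted: each hop costs the same regardless of how many records exist.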

  52. TRAVERSE A RELATIONSHIP
    #13:35 (vertex Andrea)  out: [#14:54]
    #14:54 (edge drives)    out: [#13:35], in: [#15:100]
    #15:100 (vertex BMW)    in: [#14:54]

  53. TRAVERSE A RELATIONSHIP
    #13:35 (vertex Andrea)  out: [#14:54]
    #14:54 (edge drives)    out: [#13:35], in: [#15:100]
    #15:100 (vertex BMW)    in: [#14:54]

  54. CREATE A CLASS
    CREATE CLASS Car EXTENDS V
    CREATE CLASS drives EXTENDS E
    [diagram: Car extends V, drives extends E]

  55. ADD PROPERTIES TO A CLASS
    • creating a property involves defining its name and its type
    • it is mandatory in order to define indexes or constraints
    CREATE PROPERTY Car.model String
    Car
    model: String

  56. ADD CONSTRAINTS TO A PROPERTY
    • alter the defined property adding the constraint
    ALTER PROPERTY Car.model MANDATORY TRUE
    Car
    model: String

  57. QUERYING
    SELECT FROM Car WHERE model='X5'
    Car
    rid: #15:6
    model: X5
    SELECT FROM #15:6

  58. QUERYING
    SELECT FROM [#15:6, #15:7]
    Car
    rid: #15:6
    model: X5
    Car
    rid: #15:7
    model: Z4

  59. QUERYING
    SELECT name, OUT("drives").model AS DrivesCar
    FROM #17:0
    name    DrivesCar
    Andrea  ["X5", "Z4"]

  60. QUERYING
    SELECT name, OUT("drives").model AS DrivesCar
    FROM #17:0
    UNWIND DrivesCar
    name    DrivesCar
    Andrea  X5
    Andrea  Z4
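
The effect of UNWIND on the result set can be mimicked in a few lines of Python. This is a sketch of the transformation only, with a hypothetical `unwind` helper, and says nothing about how OrientDB implements it.

```python
# The single row returned by the query without UNWIND.
row = {"name": "Andrea", "DrivesCar": ["X5", "Z4"]}

def unwind(row, field):
    # Emit one copy of the row per element of the list-valued field.
    return [{**row, field: value} for value in row[field]]

rows = unwind(row, "DrivesCar")
for r in rows:
    print(r["name"], r["DrivesCar"])
```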

  61. QUERYING
    TRAVERSE * FROM #17:0 MAXDEPTH 4
    [diagram: Andrea connected by drives edges to BMW and Maserati]

  62. DEPTH FIRST SEARCH
    TRAVERSE * FROM #17:0 STRATEGY DEPTH_FIRST
    [diagram: tree with nodes numbered 1 to 12 in depth-first visit order]
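
Depth-first order can be reproduced with an explicit stack. The tree below is a small hypothetical example, not the 12-node tree from the slide, whose shape is not preserved in the transcript.

```python
# Hypothetical tree as an adjacency list: node -> children, left to right.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "D": [], "E": [], "F": []}

def depth_first(graph, start):
    visited, stack = [], [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.append(node)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(graph[node]))
    return visited

dfs_order = depth_first(tree, "A")
print(dfs_order)
```

Each branch is followed down to its leaves before the traversal backtracks to the next sibling.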

  63. BREADTH FIRST SEARCH
    TRAVERSE * FROM #17:0 STRATEGY BREADTH_FIRST
    [diagram: tree with nodes numbered 1 to 12 in breadth-first visit order]
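
Breadth-first order replaces the stack with a queue, so all vertices at one depth are visited before any vertex at the next depth. As before, the tree is a small hypothetical example rather than the one drawn on the slide.

```python
from collections import deque

# Hypothetical tree as an adjacency list: node -> children, left to right.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "D": [], "E": [], "F": []}

def breadth_first(graph, start):
    visited, queue = [start], deque([start])
    while queue:
        node = queue.popleft()
        for child in graph[node]:
            if child not in visited:
                visited.append(child)
                queue.append(child)
    return visited

bfs_order = breadth_first(tree, "A")
print(bfs_order)
```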

  64. WHEN
    • store inter-connected data
    • query data by relations of arbitrary length
    • continuously evolving data set
    • make it easy to evolve the database
