OrientDB: go fast in a graph world

Relational databases have been the center of the world for many years, although they suffer from a predefined schema you have to adhere to. Now you have a choice: using a NoSQL database.
OrientDB is a NoSQL, multi-model and amazingly fast database: it can store 220,000 records per second on common hardware. This talk will show you some graph theory and the main advantages of using a graph database such as OrientDB.

Andrea Giuliano

February 25, 2016

Transcript

  1. ORIENTDB: GO FAST IN A GRAPH WORLD

    Andrea Giuliano (@bit_shark), February 25th, 2016
  2. ONCE UPON A TIME - 1979

    • first commercially available RDBMS
    • written in assembly
    • runs in 128K of memory
    • no support for transactions
    • support for basic SQL queries and joins
  3. RELATIONAL DATABASES

    • data is presented to the user in the form of rows and columns (a relation)
    • data can be manipulated through relational operators in a tabular form
  4. OVER TIME

    • data started growing in size
    • data became heterogeneous
        • structured, semi-structured, unstructured data
    • the rate at which data is generated increased
  5. 30 YEARS LATER (2009)

    • NoSQL movement
    • some goals of NoSQL databases:
        • being non-relational
        • simplicity of design
        • simpler horizontal scaling
        • speeding up some operations
        • being distributed
  6. (SOME) TYPES OF NOSQL DATABASES

    • document
    • key-value
    • object-oriented
    • graph
    • multi-model
  7. DOCUMENT MODEL

    • the document encapsulates data in some standard format: YAML, JSON, XML, BSON

    {
      "id": 45,
      "name": "Andrea",
      "fav_colours": ["blue", "green"],
      "driver_license": { "number": "AA123" }
    }
  8. KEY-VALUE MODEL

    • dictionary in which data is represented as a collection of key-value pairs

    > SET akey "Andrea"
    > GET akey
    "Andrea"

    (diagram: key akey -> value Andrea)
  9. OBJECT-ORIENTED MODEL

    • data is represented in the form of objects

    (diagram: classes Animal, Dog, Cat)
  10. GRAPH MODEL

    • data is represented in the form of a graph
  11. MULTI MODEL

    Key-Value, Document, Object-oriented, Graph
  12. RELATIONAL VS NOSQL

    • how data is represented
    • how data is related
        • relational databases have the concept of joins
        • NoSQL databases have multiple concepts
            • aggregation
            • relation (through edges)
  13. ISSUES WITH JOIN

    User
    name     id
    Andrea   45
    John     48
    Steven   53
    Bill     70

    Like
    user_id  food_id
    45       13
    45       49
    70       38

    Food
    id   name
    13   Pasta
    38   Sushi
    49   Kebab
    63   Meat

    SELECT F.name
    FROM User U, Like L, Food F
    WHERE U.name='Andrea' AND U.id=L.user_id AND L.food_id=F.id;
  14. ISSUES WITH JOIN

    (same User, Like and Food tables and the same query as slide 13)

    double JOIN per record at runtime
  15. ISSUES WITH JOIN

    • the relationships are computed every time a query is performed
    • time complexity grows with data: O(log n)
    • heavy runtime cost with large datasets
    • index lookup does not help
        • speeds up searches but slows down inserts, updates, deletes
    • imagine this on billions of records

    speakerdeck.com/agiuliano/index-management-in-depth
  16. SUMMING UP JOIN

    • a join operation involves
        • searching for a record in the starting table (User)
        • using the foreign key to look up the intermediate table (Like) through its index
        • traversing the intermediate table to look up the target table (Food) ids
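
The double join from slides 13 to 16 can be reproduced with a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in chosen for illustration (the talk does not name a specific RDBMS), and the table name Like is quoted because LIKE is an SQL keyword.

```python
import sqlite3

# In-memory database holding the three tables shown on slide 13.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Food (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE "Like" (user_id INTEGER, food_id INTEGER);
INSERT INTO User VALUES (45, 'Andrea'), (48, 'John'), (53, 'Steven'), (70, 'Bill');
INSERT INTO Food VALUES (13, 'Pasta'), (38, 'Sushi'), (49, 'Kebab'), (63, 'Meat');
INSERT INTO "Like" VALUES (45, 13), (45, 49), (70, 38);
""")


def foods_liked_by(user_name):
    """Resolve User -> Like -> Food: two joins recomputed on every call."""
    rows = conn.execute(
        """
        SELECT F.name
        FROM User U, "Like" L, Food F
        WHERE U.name = ? AND U.id = L.user_id AND L.food_id = F.id
        ORDER BY F.name
        """,
        (user_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(foods_liked_by("Andrea"))  # ['Kebab', 'Pasta']
```

Nothing about the relationship is stored in a directly followable form: each call has to match User rows to Like rows and Like rows to Food rows again.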
  17. The more entries you have, the more your queries are SLOW

    www.flickr.com/photos/blacktigersdream/8737830046
  18. SAVING PROJECTIONS
  19. SAVING PROJECTIONS

    advantages
    • data is predetermined

    disadvantages
    • data synchronization
    • solves only reads

    UserLikesFood
    User     user_id  Like    food_id
    Andrea   45       Pasta   13
    Andrea   45       Kebab   49
    Bill     70       Sushi   38
  20. RELATIONSHIPS IN NOSQL WORLD
  21. RELATIONSHIPS IN DOCUMENTS

    • embed information in documents where you need it
    • data duplication
    • faster access

    { "id": 45, "name": "Andrea", "likes": ["Pasta", "Kebab"] }
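
The trade-off on slide 21 can be illustrated with a small sketch in plain Python. The `users` list and both helper functions are hypothetical names introduced here, not part of any document database API; Bill's document is derived from the Like table on slide 13.

```python
# Hypothetical in-memory "collection" of user documents with embedded likes.
users = [
    {"id": 45, "name": "Andrea", "likes": ["Pasta", "Kebab"]},
    {"id": 70, "name": "Bill", "likes": ["Sushi"]},
]


def likes_of(name):
    """Faster access: the answer sits inside the document, no join needed."""
    for doc in users:
        if doc["name"] == name:
            return doc["likes"]
    return []


def rename_food(old, new):
    """Data duplication: a rename must touch every document embedding the value.

    Returns the number of documents that had to be rewritten.
    """
    touched = 0
    for doc in users:
        if old in doc["likes"]:
            doc["likes"] = [new if food == old else food for food in doc["likes"]]
            touched += 1
    return touched
```

Reading is a single document lookup, while a change to an embedded value costs one write per document that duplicates it.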
  22. GRAPH

    G = (V, E)
    Graph = (Vertices, Edges)

    (diagram labels: Edge, Vertex, Graph)
  23. GRAPH

    (diagram: vertex Andrea [name: Andrea] -> edge drives [license: A123] -> vertex BMW [model: X5, doors: 5])

    EDGES ARE DIRECTED
    VERTICES CAN HAVE PROPERTIES
    EDGES CAN HAVE PROPERTIES
  24. GRAPH

    (diagram: vertices Andrea and BMW connected by two edges, drives and owns)

    N-M relationships can be represented using multiple edges
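
As a minimal sketch of the property graph described on slides 22 to 24, the structure G = (V, E) can be modelled with two small Python dataclasses. The class and field names are illustrative assumptions, and placing `license` on the drives edge follows the reading of the diagram on slide 45.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str
    properties: dict = field(default_factory=dict)  # vertices can have properties


@dataclass
class Edge:
    label: str
    source: Vertex  # edges are directed: source -> target
    target: Vertex
    properties: dict = field(default_factory=dict)  # edges can have properties


andrea = Vertex("Andrea", {"name": "Andrea"})
bmw = Vertex("BMW", {"model": "X5", "doors": 5})

# Two edges between the same pair of vertices, as on slide 24.
drives = Edge("drives", andrea, bmw, {"license": "A123"})
owns = Edge("owns", andrea, bmw)

V = [andrea, bmw]
E = [drives, owns]


def relations_between(a, b):
    """Labels of all edges going from vertex a to vertex b."""
    return [e.label for e in E if e.source is a and e.target is b]


print(relations_between(andrea, bmw))  # ['drives', 'owns']
```

Because edges are directed, `relations_between(bmw, andrea)` yields an empty list for the same data.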
  25. BUILD SMART RELATIONSHIPS

    (diagram labels: Andrea, Luxury Cars, BMW, Ferrari, Customers, John, Cars)

    Root vertices
  26. BUILD SMART RELATIONSHIPS

    • root vertices can be meta graphs
    • meta graphs add information to make traversal easier and faster
  27. EXAMPLE

    a Car can be enriched with information regarding
    • date of purchase
    • country of manufacture

    www.flickr.com/photos/aigle_dore/5952275132
  28. BUILD SMART RELATIONSHIPS

    (diagram: root vertex Purchase -> Year 2016 -> Month Jan 2016 -> Day 01/15/2016, and Month Feb 2016 -> Day 02/01/2016; cars BMW, Ferrari, Maserati)
  29. BUILD SMART RELATIONSHIPS

    (diagram: root vertex Made -> Europe -> Italy, Germany; cars BMW, Ferrari, Maserati)
  30. BUILD SMART RELATIONSHIPS

    (diagram: the Purchase and Made trees combined around the cars BMW, Ferrari, Maserati)
  31. BUILD SMART RELATIONSHIPS

    (combined Purchase and Made diagram)

    get all the Italian cars sold on 01/15/2016
  32. BUILD SMART RELATIONSHIPS

    (combined Purchase and Made diagram)

    let's start from Made
  33. BUILD SMART RELATIONSHIPS

    (combined Purchase and Made diagram)
  34. BUILD SMART RELATIONSHIPS

    (combined Purchase and Made diagram)

    found the cars made in Italy, now filter by date using incoming edges
  35. BUILD SMART RELATIONSHIPS

    (combined Purchase and Made diagram)
  36. BUILD SMART RELATIONSHIPS

    (combined Purchase and Made diagram)

    let's try from Purchase
  37. BUILD SMART RELATIONSHIPS

    (combined Purchase and Made diagram)
  38. BUILD SMART RELATIONSHIPS

    (combined Purchase and Made diagram)

    found the cars purchased on 01/15/2016, now filter by country using incoming edges
  39. BUILD SMART RELATIONSHIPS

    (combined Purchase and Made diagram)
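
The two strategies walked through on slides 31 to 39 can be sketched in plain Python. The edge list below is an assumption: the transcript does not say which car was purchased on which day, so those links are hypothetical and marked as such; the hierarchy under Purchase and Made is also inferred from the labels.

```python
from collections import defaultdict

# (source, target) pairs. Car/day links are hypothetical illustration data.
edges = [
    ("Made", "Europe"), ("Europe", "Italy"), ("Europe", "Germany"),
    ("Italy", "Ferrari"), ("Italy", "Maserati"), ("Germany", "BMW"),
    ("Purchase", "Year 2016"),
    ("Year 2016", "Jan 2016"), ("Year 2016", "Feb 2016"),
    ("Jan 2016", "01/15/2016"), ("Feb 2016", "02/01/2016"),
    ("01/15/2016", "BMW"), ("01/15/2016", "Ferrari"),  # assumed purchases
    ("02/01/2016", "Maserati"),                        # assumed purchase
]

out_edges = defaultdict(set)  # vertex -> vertices it points to
in_edges = defaultdict(set)   # vertex -> vertices pointing to it
for source, target in edges:
    out_edges[source].add(target)
    in_edges[target].add(source)


def start_from_made(country, day):
    """Find the cars made in `country`, then filter by date via incoming edges."""
    return sorted(car for car in out_edges[country] if day in in_edges[car])


def start_from_purchase(country, day):
    """Find the cars purchased on `day`, then filter by country via incoming edges."""
    return sorted(car for car in out_edges[day] if country in in_edges[car])


print(start_from_made("Italy", "01/15/2016"))      # ['Ferrari']
print(start_from_purchase("Italy", "01/15/2016"))  # ['Ferrari']
```

Both entry points reach the same answer; which one is cheaper depends on how many cars hang off the chosen root branch.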
  40. ORIENTDB

    • NoSQL database
    • multimodel
    • high performance (can write 400,000 records/sec*)
    • HTTP REST and JSON API
    • ACID

    *On Intel i7 8 core CPU, 16 GB RAM, SSD RPM, Multi-threads, no indexes (orientdb.com)
  41. INSTALLATION

    orientdb.com/docs/2.1/Tutorial-Installation.html

    $ docker run -d -v … orientdb/orientdb
    $ brew install orientdb
  42. LOGICAL CONCEPTS

    • class
        • type of data model
    • cluster
        • stores groups of records within a class

    (diagram: class Car with clusters USA_car and Italy_car)
  43. VERTICES

    • record identifier (RID)
        • each record has its own self-assigned unique ID
        • composed of 2 parts: #<cluster-id>:<cluster-position>
    • list of properties
    • edge's RID
        • in
        • out
  44. EDGES

    • record identifier (RID)
        • each record has its own self-assigned unique ID
        • composed of 2 parts: #<cluster-id>:<cluster-position>
    • in
        • RID of the ingoing vertex
    • out
        • RID of the outgoing vertex
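
The two-part RID format `#<cluster-id>:<cluster-position>` from slides 43 and 44 can be handled with a short Python sketch. The `RID` type and `parse_rid` function are hypothetical helpers written for this note, not part of any OrientDB driver, and the sketch only accepts non-negative numbers.

```python
import re
from typing import NamedTuple


class RID(NamedTuple):
    cluster_id: int
    cluster_position: int

    def __str__(self):
        return f"#{self.cluster_id}:{self.cluster_position}"


# Simplified pattern: '#', digits, ':', digits.
_RID_PATTERN = re.compile(r"#(\d+):(\d+)")


def parse_rid(text):
    """Split a textual RID into its cluster id and cluster position."""
    match = _RID_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid RID: {text!r}")
    return RID(int(match.group(1)), int(match.group(2)))


rid = parse_rid("#13:35")
print(rid.cluster_id, rid.cluster_position)  # 13 35
print(str(rid))                              # #13:35
```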
  45. RELATIONSHIPS

    • does not make use of JOINs like RDBMS
    • physical links O(1)
    • relationship managed by storing the edge's RID in both vertices as "out" and "in"
    • for 1-to-n relationships, collections of RIDs are used

    (diagram)
    vertex Andrea  #13:35   out: [#14:54]   name: Andrea
    edge drives    #14:54   out: [#13:35]   in: [#15:100]   license: A123
    vertex BMW     #15:100  in: [#14:54]    model: X5
  46. TRAVERSE A RELATIONSHIP

    (diagram)
    vertex Andrea  #13:35   out: [#14:54]
    edge drives    #14:54   out: [#13:35]   in: [#15:100]
    vertex BMW     #15:100  in: [#14:54]
  47. TRAVERSE A RELATIONSHIP

    (diagram)
    vertex Andrea  #13:35   out: [#14:54]
    edge drives    #14:54   out: [#13:35]   in: [#15:100]
    vertex BMW     #15:100  in: [#14:54]
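
The traversal shown on slides 45 to 47 can be imitated with a plain Python dictionary keyed by RID. This is only an analogy for the idea of following stored links instead of joining: the `records` dictionary and `out_vertices` function are hypothetical and say nothing about how OrientDB lays out records on disk.

```python
# Records keyed by RID, mirroring the diagram: the vertices store the edge's
# RID, and the edge stores the RIDs of the vertices it leaves and enters.
records = {
    "#13:35": {"name": "Andrea", "out": ["#14:54"], "in": []},
    "#14:54": {"label": "drives", "license": "A123",
               "out": ["#13:35"], "in": ["#15:100"]},
    "#15:100": {"model": "X5", "out": [], "in": ["#14:54"]},
}


def out_vertices(vertex_rid, edge_label=None):
    """Follow vertex -> outgoing edge -> target vertex using direct lookups.

    Each hop is one dictionary access, independent of how many records exist.
    """
    targets = []
    for edge_rid in records[vertex_rid]["out"]:
        edge = records[edge_rid]
        if edge_label is None or edge["label"] == edge_label:
            targets.extend(edge["in"])
    return targets


car_rids = out_vertices("#13:35", "drives")
print(car_rids)                                      # ['#15:100']
print([records[rid]["model"] for rid in car_rids])   # ['X5']
```

Contrast this with the join sketch idea from slide 16: no table is searched here, the vertex already holds the address of the next hop.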
  48. CREATE A CLASS

    CREATE CLASS Car EXTENDS V
    CREATE CLASS drives EXTENDS E

    (diagram: Car extends V, drives extends E)
  49. ADD PROPERTIES TO A CLASS

    • creating a property involves defining its name and its type
    • it is mandatory in order to define indexes or constraints

    CREATE PROPERTY Car.model String

    (diagram: Car with model: String)
  50. ADD CONSTRAINTS TO A PROPERTY

    • alter the defined property adding the constraint

    ALTER PROPERTY Car.model MANDATORY TRUE

    (diagram: Car with model: String)
  51. QUERYING

    SELECT FROM Car WHERE model='X5'
    SELECT FROM #15:6

    (result: Car, rid: #15:6, model: X5)
  52. QUERYING

    SELECT FROM [#15:6, #15:7]

    (result: Car, rid: #15:6, model: X5; Car, rid: #15:7, model: Z4)
  53. QUERYING

    SELECT name, OUT("drives").model AS DrivesCar FROM #17:0

    name     DrivesCar
    Andrea   ["X5", "Z4"]
  54. QUERYING

    SELECT name, OUT("drives").model AS DrivesCar FROM #17:0 UNWIND DrivesCar

    name     DrivesCar
    Andrea   X5
    Andrea   Z4
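
The difference between the results on slides 53 and 54 can be sketched in Python: unwinding turns one row holding a list into one row per list element. The `unwind` function below is a hypothetical helper that mimics the result shape shown on the slide; how it treats empty lists (the row is dropped) is an assumption, not documented OrientDB behaviour.

```python
def unwind(rows, field):
    """Emit one copy of each row per element of the list stored in `field`."""
    result = []
    for row in rows:
        values = row[field]
        if not isinstance(values, list):
            result.append(dict(row))  # scalar value: keep the row unchanged
            continue
        for value in values:
            flat = dict(row)
            flat[field] = value
            result.append(flat)
    return result


# Result of the query on slide 53.
rows = [{"name": "Andrea", "DrivesCar": ["X5", "Z4"]}]

for row in unwind(rows, "DrivesCar"):
    print(row["name"], row["DrivesCar"])
# Andrea X5
# Andrea Z4
```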
  55. QUERYING

    TRAVERSE * FROM #17:0 MAXDEPTH 4

    (diagram: Andrea linked by drives edges to BMW and Maserati)
  56. DEPTH FIRST SEARCH

    TRAVERSE * FROM #17:0 STRATEGY DEPTH_FIRST

    (diagram: tree with nodes numbered 1 to 12 in depth-first visit order)
  57. BREADTH FIRST SEARCH

    TRAVERSE * FROM #17:0 STRATEGY BREADTH_FIRST

    (diagram: tree with nodes numbered 1 to 12 in breadth-first visit order)
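
The two traversal strategies from slides 56 and 57, together with a depth limit like the one on slide 55, can be sketched with a single Python function. The `traverse` function, the strategy strings and the sample `tree` are illustrative assumptions; in particular, treating the start node as depth 0 and `max_depth` as an inclusive bound is a choice made for this sketch.

```python
from collections import deque


def traverse(graph, start, strategy="DEPTH_FIRST", max_depth=None):
    """Return the nodes reachable from `start` in visit order.

    The only difference between the strategies is which end of the
    frontier is consumed: the newest entry (stack) or the oldest (queue).
    """
    frontier = deque([(start, 0)])
    seen = set()
    visited = []
    while frontier:
        if strategy == "DEPTH_FIRST":
            node, depth = frontier.pop()
        else:
            node, depth = frontier.popleft()
        if node in seen:
            continue
        seen.add(node)
        visited.append(node)
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand below the depth limit
        children = graph.get(node, [])
        if strategy == "DEPTH_FIRST":
            children = reversed(children)  # so the first child is popped first
        for child in children:
            if child not in seen:
                frontier.append((child, depth + 1))
    return visited


# Hypothetical sample tree: A has children B and C, and so on.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}

print(traverse(tree, "A", "DEPTH_FIRST"))                 # ['A', 'B', 'D', 'E', 'C', 'F']
print(traverse(tree, "A", "BREADTH_FIRST"))               # ['A', 'B', 'C', 'D', 'E', 'F']
print(traverse(tree, "A", "BREADTH_FIRST", max_depth=1))  # ['A', 'B', 'C']
```

Depth-first finishes one branch before starting the next, while breadth-first visits all nodes of a level before going deeper.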
  58. WHEN

    • store inter-connected data
    • query data by relation of arbitrary length
    • continuously evolving data set
        • make it easy to evolve the database