ArangoDB at Rubyshift Munich

Lucas Dohmen
September 11, 2013

Here are my ArangoDB slides from last night's Ruby Usergroup in Munich.

Transcript

  1. Lucas Dohmen
    ‣ ArangoDB Core Team
    ‣ ArangoDB Foxx & Ruby Adapter
    ‣ Student on the master branch
    ‣ hacken.in & nerdkun.de
  2. Why did we start ArangoDB?
    What should an ideal multi-purpose database look like? Is it already out there?
    ‣ Second-generation NoSQL DB
    ‣ Unique feature set
    ‣ Solves some problems of other NoSQL DBs
    ‣ Greenfield project
    ‣ Experienced team that has been building NoSQL DBs for more than 10 years
  3. Main Features
    ‣ Open source and free
    ‣ Multi-model database
    ‣ Convenient querying
    ‣ Extendable through JS & MRuby
    ‣ High performance & space efficiency
    ‣ Easy to use
    ‣ Started in Sep 2011
    ‣ Version 1.0 in Sep 2012
    ‣ Version 1.4 in Aug 2013
    ‣ Multi Database Support
    ‣ Foxx API Framework
    ‣ Master/Slave Replication
  4. Free and Open Source
    ‣ Apache 2 License
    ‣ On GitHub
    ‣ Do what you want with it
    ‣ ... and don't pay a dime!
  5. Key-Value Store
    ‣ Map value data to unique string keys (identifiers)
    ‣ Treat data as opaque (data has no structure)
    ‣ Can implement scaling and partitioning easily thanks to the simple data model
    ‣ Key-value can be seen as a special case of documents. For many applications this is sufficient, but not for all cases.
    ArangoDB
    ‣ Supports key-value documents
    ‣ In the near future it will support special key-value collections
    ‣ The value will not be parsed
    ‣ Sharding capabilities of key-value collections will differ from those of document collections
  6. Document Store
    ‣ Normally based on key-value stores (each document still has a unique key)
    ‣ Allow saving logically similar documents in “collections”
    ‣ Treat data records as attribute-structured documents (data is no longer opaque)
    ‣ Often allow querying and indexing document attributes
    ArangoDB
    ‣ Supports both. A database can contain collections of different types
    ‣ Automatic schema recognition for efficient memory handling
    ‣ Different ways to retrieve data
  7. Graph Store
    ‣ Example: Computer Science Bibliography
    [Figure: property graph. Vertices: (Type: inproceeding, Title: Finite Size Effects), (Type: proceeding, Title: Neural Modeling), (Type: person, Name: Anthony C. C. Coolen), (Type: person, Name: Sánchez-Andrés). Edges: (Label: written), (Label: published, Pages: 99-120), (Label: edited).]
    ArangoDB
    ‣ Supports Property Graphs
    ‣ Vertices and edges are documents
    ‣ Query them using geo-index, full-text, SQL-like queries
    ‣ Edges are directed relations between vertices
    ‣ Custom traversals and built-in graph algorithms
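To make "vertices and edges are documents" concrete, here is a minimal Python sketch of the bibliography example as plain dictionaries. It does not talk to ArangoDB. The document keys (`pubs/...`, `persons/...`) and the edge directions are assumptions, since the slide figure did not survive extraction; only the attribute values come from the slide. `_from` and `_to` are the attributes ArangoDB uses on edge documents.

```python
# Vertices are ordinary documents. The ids used as keys here are hypothetical.
vertices = {
    "pubs/finite-size-effects": {"type": "inproceeding", "title": "Finite Size Effects"},
    "pubs/neural-modeling": {"type": "proceeding", "title": "Neural Modeling"},
    "persons/coolen": {"type": "person", "name": "Anthony C. C. Coolen"},
    "persons/sanchez-andres": {"type": "person", "name": "Sánchez-Andrés"},
}

# Edges are documents too; they carry their own attributes plus _from/_to.
# The directions below are an assumption made for this illustration.
edges = [
    {"_from": "persons/coolen", "_to": "pubs/finite-size-effects", "label": "written"},
    {"_from": "pubs/finite-size-effects", "_to": "pubs/neural-modeling",
     "label": "published", "pages": "99-120"},
    {"_from": "persons/sanchez-andres", "_to": "pubs/neural-modeling", "label": "edited"},
]


def outbound(vertex_id, label=None):
    """Return the vertices reachable over one outgoing edge, optionally filtered by label."""
    return [
        vertices[edge["_to"]]
        for edge in edges
        if edge["_from"] == vertex_id and (label is None or edge["label"] == label)
    ]


written_by_coolen = outbound("persons/coolen", label="written")
```

Because edges are documents, attributes such as `pages` sit directly on the relation and can be filtered like any other document attribute.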
  8. NoSQL Map
    [Figure: map of NoSQL systems between “Analytic Processing DBs” and “Transaction Processing DBs” (managing the evolving state of an IT system). Labels: Complex Queries, Map/Reduce, Graphs, Extensibility, Key/Value, Column-Stores, Documents, Massively Distributed, Structured Data.]
  9. Another NoSQL Map
    [Figure: map of NoSQL systems between “Transaction Processing DBs” (managing the evolving state of an IT system) and “Analytic Processing DBs”. Labels: Map/Reduce, Graphs, Extensibility, Key/Value, Column-Stores, Complex Queries, Documents, Massively Distributed, Structured Data.]
  10. Polyglot Persistence: Speculative Retailer's Web Application
    Polyglot Persistence Example*
    ‣ Reporting: RDBMS
    ‣ User activity log: Cassandra
    ‣ Product Catalog: MongoDB
    ‣ Analytics: Cassandra
    ‣ Shopping Cart: Riak
    ‣ Recommendations: Neo4J
    ‣ Financial Data: RDBMS
    ‣ User Sessions: Redis
    Polyglot Persistence with ArangoDB
    ‣ Reporting: RDBMS
    ‣ User activity log: Cassandra
    ‣ Product Catalog: ArangoDB
    ‣ Analytics: Cassandra
    ‣ Shopping Cart: ArangoDB
    ‣ Recommendations: ArangoDB
    ‣ Financial Data: ArangoDB
    ‣ User Sessions: ArangoDB
    *) Source: Martin Fowler, http://martinfowler.com/articles/nosql-intro.pdf
  11. Convenient querying
    Different scenarios require different access methods:
    ‣ Query a document by its unique id / key:
        GET /_api/document/users/12345
    ‣ Query by providing an example document:
        PUT /_api/simple/by-example
        { "name": "Jan", "age": 38 }
    ‣ Query via AQL:
        FOR user IN users
          FILTER user.active == true
          RETURN { name: user.name }
    ‣ Graph Traversals and JS for your own traversals
    ‣ JS Actions for “intelligent” DB requests
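The first three access methods can be sketched in plain Python against an in-memory stand-in for the `users` collection. This is only an illustration of what each query style selects, not how ArangoDB implements it; the sample documents and the function names are made up for the example.

```python
# Hypothetical in-memory stand-in for a "users" collection: key -> document.
users = {
    "12345": {"name": "Jan", "age": 38, "active": True},
    "12346": {"name": "Eva", "age": 29, "active": False},
    "12347": {"name": "Jan", "age": 51, "active": True},
}


def by_key(collection, key):
    """Lookup by unique key, in the spirit of GET /_api/document/users/12345."""
    return collection.get(key)


def by_example(collection, example):
    """Return documents whose attributes equal every attribute of the example."""
    return [
        doc
        for doc in collection.values()
        if all(attr in doc and doc[attr] == value for attr, value in example.items())
    ]


def active_user_names(collection):
    """Mirror of: FOR user IN users FILTER user.active == true RETURN { name: user.name }"""
    return [{"name": doc["name"]} for doc in collection.values() if doc.get("active") is True]


jan_38 = by_example(users, {"name": "Jan", "age": 38})
```

The trade-off is visible even in this toy: key lookup is a single dictionary access, by-example only expresses equality on attributes, and the AQL-style filter can also reshape the result.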
  12. Why another query language?
    ‣ Initially, we implemented a subset of SQL's SELECT
        ‣ It didn't fit well
    ‣ UNQL addressed some of the problems
        ‣ Looked dead
        ‣ No working implementations
    ‣ XQuery seemed quite powerful
        ‣ A bit too complex for simple queries
    ‣ JSONiq wasn't there when we started
  13. Other Document Stores
    ‣ MongoDB uses JSON/BSON as its “query language”
        ‣ Limited
        ‣ Hard to read & write for more complex queries
        ‣ Complex queries, joins and transactions not possible
    ‣ CouchDB uses Map/Reduce
        ‣ It's not a relational algebra, and therefore hard to generate
        ‣ Not easy to learn
        ‣ Complex queries, joins and transactions not possible
  14. ArangoDB Query Language (AQL)
    ‣ We came up with AQL in mid-2012
    ‣ Declarative language, loosely based on the syntax of XQuery
    ‣ Uses different keywords than SQL, so it's clear that the languages are different
    ‣ Implemented in C and JavaScript
  15. Example for Aggregation
    ‣ Retrieve cities with the number of users:
        FOR u IN users
          COLLECT city = u.city INTO g
          RETURN { "city" : city, "numUsersInCity": LENGTH(g) }
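For readers who do not know AQL yet, the aggregation above corresponds roughly to this plain-Python grouping. It is a sketch of what the query computes, not of how ArangoDB executes it; the sample users are invented, and the groups are sorted by city only to make the output deterministic.

```python
from collections import defaultdict

# Hypothetical sample documents standing in for the "users" collection.
users = [
    {"name": "Jan", "city": "Cologne"},
    {"name": "Eva", "city": "Munich"},
    {"name": "Tom", "city": "Cologne"},
]


def users_per_city(docs):
    """Group documents by city (COLLECT city = u.city INTO g) and count each group."""
    groups = defaultdict(list)
    for u in docs:
        groups[u["city"]].append(u)  # g collects the members of each group
    return [
        {"city": city, "numUsersInCity": len(g)}  # LENGTH(g)
        for city, g in sorted(groups.items())
    ]


result = users_per_city(users)
```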
  16. Example for Graph Query
    ‣ Paths:
        FOR u IN users
          LET userRelations = (
            FOR p IN PATHS( users, relations, "OUTBOUND" )
              FILTER p._from == u._id
              RETURN p
          )
          RETURN { "user" : u, "relations" : userRelations }
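The idea behind the query, collecting for every user the outbound paths that start at that user, can be sketched in plain Python. The sample vertices and edges are invented, and the choices to skip cycles, to ignore zero-length paths and to cap paths at `max_length` are assumptions of this sketch rather than documented behaviour of `PATHS`.

```python
def outbound_paths(edges, start, max_length=3):
    """Enumerate cycle-free outbound paths (lists of vertex ids) starting at `start`."""
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            paths.append(path)  # every extension of the start vertex is a path
        if len(path) - 1 >= max_length:
            continue  # do not extend paths beyond max_length edges
        for edge in edges:
            if edge["_from"] == path[-1] and edge["_to"] not in path:
                stack.append(path + [edge["_to"]])
    return paths


# Hypothetical sample data: three users and two directed relations.
users = [{"_id": "users/anna"}, {"_id": "users/ben"}, {"_id": "users/cleo"}]
relations = [
    {"_from": "users/anna", "_to": "users/ben"},
    {"_from": "users/ben", "_to": "users/cleo"},
]

# One result object per user, shaped like the RETURN clause of the query.
result = [
    {"user": u, "relations": outbound_paths(relations, u["_id"])}
    for u in users
]
```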
  17. Extendable through JS & MRuby
    ‣ Scripting languages enrich ArangoDB
        ‣ Multi Collection Transactions
        ‣ Building small and efficient Apps - Foxx App Framework
        ‣ Graph Traversals
        ‣ Cascading deletes/updates
        ‣ Assign permissions to actions
        ‣ Aggregate data from multiple queries into a single response
        ‣ Carry out data-intensive operations
    ‣ Currently supported
        ‣ JavaScript (Google V8)
        ‣ MRuby (experimental, not fully integrated yet)
  18. Action Server - kind of Application Server
    ‣ ArangoDB can answer arbitrary HTTP requests directly
    ‣ You can write your own JavaScript functions (“actions”) that will be executed server-side
    ‣ Includes a permission system
    ➡ You can use it as a database or as a combined database/app server
  19. APIs - will become more & more important
    ‣ Single Page Web Applications
    ‣ Native Mobile Applications
    ‣ External Developer APIs
  20. ArangoDB Foxx
    ‣ What if you could talk to the database directly?
    ‣ It would only need an API.
    ‣ What if we could define this API in JavaScript?
    ‣ ArangoDB Foxx is streamlined for API creation, not a jack of all trades
    ‣ It is designed for front end developers: Use JavaScript, which you already know (without running into callback hell)
  21. High performance & space efficiency
    ‣ Automatic schema recognition
    ‣ C database core, a C++ communication layer, JS and C++ for additional functionality
    ‣ Performance-critical parts can be moved to C or C++
    ‣ Although ArangoDB has a wide range of functions, such as MVCC, real ACID, schema recognition, etc., it can compete with popular document stores
  22. Space Efficiency
    ‣ Measure the space on disk of different data sets
    ‣ First in the standard config, then with some optimization
    ‣ We measured a bunch of different tasks
  23. Store 50,000 Wiki Articles
    [Bar chart: disk usage (axis 0 MB to 2000 MB) for ArangoDB, CouchDB and MongoDB, normal vs. optimized configuration]
    Source: http://www.arangodb.org/2012/07/08/collection-disk-usage-arangodb
  24. 3,459,421 AOL Search Queries
    [Bar chart: disk usage (axis 0 MB to 3000 MB) for ArangoDB, CouchDB and MongoDB, normal vs. optimized configuration]
    Source: http://www.arangodb.org/2012/07/08/collection-disk-usage-arangodb
  25. Performance: Disclaimer
    ‣ Always take performance tests with a grain of salt
    ‣ Performance depends on many factors, including the specific task at hand
    ‣ This is just to give you a glimpse of the performance
    ‣ Always do your own performance tests (and if you do, report back to us :) )
    ‣ But now: Let's see some numbers
  26. Execution Time: Bulk Insert of 10,000,000 documents
    [Bar chart: execution time for ArangoDB, CouchDB and MongoDB]
    Source: http://www.arangodb.org/2012/09/04/bulk-inserts-mongodb-couchdb-arangodb
  27. Conclusion from Tests
    ‣ ArangoDB is really space efficient
    ‣ ArangoDB is “fast enough”
    ‣ Please test it for your own use case
  28. Easy to use
    ‣ Easy-to-use admin interface
    ‣ Simple Queries for simple queries, AQL for complex queries
    ‣ Simplify your setup: for some use cases, ArangoDB alone on a single server is sufficient (no application server etc.)
    ‣ Need graph queries or key-value storage? You don't need to add another component to the mix.
    ‣ No external dependencies like the JVM: just install ArangoDB
    ‣ HTTP interface: use your load balancer
  29. Join the growing community
    They are working on geo index, full text search and many APIs: Ruby, Python, PHP, Java, D, ...
  30. A call to arms
    ‣ There's no “Mongoid” for ArangoDB yet
    ‣ Let's build one
    ‣ If you're interested, just contact me