ArangoDB – A different approach to NoSQL

This is the talk I gave at FrOSCon 2013 about ArangoDB.

Lucas Dohmen

August 24, 2013

Transcript

  1. Lucas Dohmen

    ‣ ArangoDB Core Team
    ‣ ArangoDB Foxx & Ruby Adapter
    ‣ Student on the master branch
    ‣ hacken.in & nerdkun.de
  2. Why did we start ArangoDB?

    What should an ideal multi-purpose database look like? Is it already out there?

    ArangoDB
    ‣ Unique feature set
    ‣ Solves some problems of other NoSQL DBs
    ‣ Greenfield project
    ‣ Experienced team building NoSQL DBs for more than 10 years
    ‣ Second Generation NoSQL DB
  3. Main Features

    ‣ Open source and free
    ‣ Multi-model database
    ‣ Convenient querying
    ‣ Extendable through JS & MRuby
    ‣ High performance & space efficiency
    ‣ Easy to use
    ‣ Started in Sep 2011
    ‣ Version 1.0 in Sep 2012
    ‣ Version 1.4 in Aug 2013
    ‣ Multi Database Support
    ‣ Foxx API Framework
    ‣ Master/Slave Replication
  4. Free and Open Source

    ‣ ... as in FrOSCon
    ‣ Apache 2 License
    ‣ On GitHub
    ‣ Do what you want with it
    ‣ ... and don't pay a dime! :)
  5. Polyglot Persistence

    Speculative Retailer's Web Application

    Polyglot Persistence Example*
    ‣ Reporting: RDBMS
    ‣ Product Catalog: MongoDB
    ‣ Shopping Cart: Riak
    ‣ User activity log: Cassandra
    ‣ Analytics: Cassandra
    ‣ Recommendations: Neo4J
    ‣ Financial Data: RDBMS
    ‣ User Sessions: Redis

    Polyglot Persistence with ArangoDB
    ‣ Reporting: RDBMS
    ‣ Product Catalog: ArangoDB
    ‣ Shopping Cart: ArangoDB
    ‣ User activity log: Cassandra
    ‣ Analytics: Cassandra
    ‣ Recommendations: ArangoDB
    ‣ Financial Data: ArangoDB
    ‣ User Sessions: ArangoDB

    *) Source: Martin Fowler, http://martinfowler.com/articles/nosql-intro.pdf
  6. Another NoSQL Map

    [Diagram. Labels: Transaction Processing DBs (managing the evolving state of an IT system); Analytic Processing DBs; Map/Reduce; Graphs; Extensibility; Key/Value; Column-Stores; Complex Queries; Documents; Massively Distributed; Structured Data]
  7. Key-Value Store

    ‣ Map value data to unique string keys (identifiers)
    ‣ Treat data as opaque (data has no schema)
    ‣ Can implement scaling and partitioning easily due to the simplistic data model
    ‣ Key-value can be seen as a special case of documents
    ‣ For many applications this is sufficient, but not for all cases

    ArangoDB
    ‣ Supports key-value documents
    ‣ In the near future we will support special key-value collections
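The point that key-value access is a special case of documents can be illustrated with a small in-memory sketch. This is not ArangoDB code: `Collection` and `KeyValueView` are hypothetical names invented for this illustration, and only the `_key` attribute mirrors what ArangoDB documents actually carry.

```python
class Collection:
    """A toy in-memory document collection keyed by unique strings (hypothetical)."""

    def __init__(self):
        self._docs = {}

    def save(self, key: str, doc: dict) -> None:
        # Store a copy so later changes to `doc` do not leak into the store.
        self._docs[key] = {**doc, "_key": key}

    def get(self, key: str) -> dict:
        return self._docs[key]


class KeyValueView:
    """Key-value access expressed as one-attribute documents (hypothetical)."""

    def __init__(self, collection: Collection):
        self._collection = collection

    def put(self, key: str, value) -> None:
        # The opaque value becomes the single "value" attribute of a document.
        self._collection.save(key, {"value": value})

    def get(self, key: str):
        return self._collection.get(key)["value"]


if __name__ == "__main__":
    sessions = Collection()
    kv = KeyValueView(sessions)
    kv.put("session:42", "abc")
    print(kv.get("session:42"))
    print(sessions.get("session:42"))
```

The same stored record can be read either as an opaque value or as a document, which is why a document store can cover the key-value use case without a second system.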
  8. Document Store

    ‣ Normally based on key-value stores (each document still has a unique key)
    ‣ Allow documents with logical similarity to be saved in “databases” or “collections”
    ‣ Treat data records as attribute-structured documents (data is no longer opaque)
    ‣ Often allow querying and indexing document attributes

    ArangoDB
    ‣ Supports documents
    ‣ Automatic schema recognition for efficient memory handling
  9. Graph Store

    ‣ Example: Computer Science Bibliography
      ‣ Vertex: Type: inproceeding, Title: Finite Size Effects
      ‣ Vertex: Type: proceeding, Title: Neural Modeling
      ‣ Vertex: Type: person, Name: Anthony C. C. Coolen
      ‣ Vertex: Type: person, Name: Sánchez-Andrés
      ‣ Edge: Label: written
      ‣ Edge: Label: published, Pages: 99-120
      ‣ Edge: Label: edited

    ArangoDB
    ‣ Supports Property Graphs
    ‣ Vertices and edges are documents
    ‣ Custom traversals and built-in graph algorithms
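A minimal sketch of "vertices and edges are documents", using part of the bibliography example. It runs on plain Python dictionaries, not against ArangoDB. The `bib` collection name, the document ids and the edge directions are assumptions made for this illustration; the slide only shows the labels and attributes.

```python
# Vertices are plain documents; the ids use a hypothetical "bib" collection.
vertices = {
    "bib/coolen": {"type": "person", "name": "Anthony C. C. Coolen"},
    "bib/finite": {"type": "inproceeding", "title": "Finite Size Effects"},
    "bib/neural": {"type": "proceeding", "title": "Neural Modeling"},
}

# Edges are documents too: "_from" and "_to" plus their own attributes.
# Assumption: the edge directions are guessed, the slide shows only labels.
edges = [
    {"_from": "bib/coolen", "_to": "bib/finite", "label": "written"},
    {"_from": "bib/finite", "_to": "bib/neural", "label": "published", "pages": "99-120"},
]


def outbound(start_id, label):
    """Return the vertices reachable from start_id over one edge with the given label."""
    return [
        vertices[e["_to"]]
        for e in edges
        if e["_from"] == start_id and e["label"] == label
    ]


def proceedings_of(person_id):
    """Two-step traversal: person -written-> paper -published-> proceedings."""
    titles = []
    for paper_edge in edges:
        if paper_edge["_from"] == person_id and paper_edge["label"] == "written":
            for proceeding in outbound(paper_edge["_to"], "published"):
                titles.append(proceeding["title"])
    return titles


if __name__ == "__main__":
    print(proceedings_of("bib/coolen"))
```

Because edges carry attributes of their own (here `pages`), the property-graph model needs no separate storage format for them.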
  10. Convenient querying

    Different scenarios require different access methods:
    ‣ Query a document by its unique id / key:
        GET /_api/document/users/12345
    ‣ Query by providing an example document:
        PUT /_api/simple/by-example
        { "name": "Jan", "age": 38 }
    ‣ Query via AQL:
        FOR user IN users
          FILTER user.active == true
          RETURN { name: user.name }
    ‣ Graph Traversals
    ‣ JS Actions
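The two HTTP access methods can be sketched with Python's standard library. The sketch only builds the requests and does not send them. The base URL (a local server on port 8529) and the request body shape for by-example (the example document wrapped together with a collection name) are assumptions; the slide shows only the paths and the example document.

```python
import json
from urllib.request import Request

# Assumption: a local ArangoDB server on its default port.
BASE_URL = "http://localhost:8529"


def document_request(collection: str, key: str) -> Request:
    """Build a GET request that fetches one document by collection and key."""
    return Request(f"{BASE_URL}/_api/document/{collection}/{key}", method="GET")


def by_example_request(collection: str, example: dict) -> Request:
    """Build a PUT request for the simple by-example query.

    Assumption: the body wraps the example document together with the
    collection name; the slide shows only the example document itself.
    """
    body = json.dumps({"collection": collection, "example": example})
    return Request(
        f"{BASE_URL}/_api/simple/by-example",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    get_req = document_request("users", "12345")
    put_req = by_example_request("users", {"name": "Jan", "age": 38})
    print(get_req.get_method(), get_req.full_url)
    print(put_req.get_method(), put_req.full_url, put_req.data.decode("utf-8"))
    # To actually send a request, pass it to urllib.request.urlopen(...)
    # while a server is running.
```

Since the interface is plain HTTP and JSON, any language with an HTTP client can talk to the database without a dedicated driver.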
  11. Why another query language?

    ‣ Initially, we implemented a subset of SQL's SELECT
      ‣ It didn't fit well
    ‣ UNQL addressed some of the problems
      ‣ Looked dead
      ‣ No working implementations
    ‣ XQuery seemed quite powerful
      ‣ A bit too complex for simple queries
    ‣ JSONiq wasn't there when we started
  12. Other Document Stores

    ‣ MongoDB uses JSON/BSON as its “query language”
      ‣ Limited
      ‣ Hard to read & write for more complex queries
    ‣ CouchDB uses Map/Reduce
      ‣ It's not a relational algebra, and therefore hard to generate
      ‣ Not easy to learn
    ‣ More about queries in different NoSQL databases:
      ‣ Query mechanisms for NoSQL databases
      ‣ Jan Steemann
      ‣ 14:00, HS5
  13. ArangoDB Query Language (AQL)

    ‣ We came up with AQL mid-2012
    ‣ Declarative language, loosely based on the syntax of XQuery
    ‣ Uses different keywords than SQL, so it's clear that the languages are different
    ‣ Implemented in C and JavaScript
  14. Example for Aggregation

    ‣ Retrieve cities with the number of users:
        FOR u IN users
          COLLECT city = u.city INTO g
          RETURN { "city" : city, "numUsersInCity": LENGTH(g) }
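What the aggregation query computes can be mirrored in a few lines of Python. This is only an in-memory sketch of the grouping idea, not how AQL is executed; the sample users are hypothetical, and sorting the groups by city is a choice made here to keep the output deterministic.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the "users" collection.
users = [
    {"name": "Jan", "city": "Cologne"},
    {"name": "Lucas", "city": "Cologne"},
    {"name": "Frank", "city": "Berlin"},
]


def users_per_city(docs):
    """Group documents by city and count each group, like COLLECT ... INTO g."""
    groups = defaultdict(list)
    for u in docs:                     # FOR u IN users
        groups[u["city"]].append(u)    # COLLECT city = u.city INTO g
    return [
        {"city": city, "numUsersInCity": len(g)}  # RETURN { ..., LENGTH(g) }
        for city, g in sorted(groups.items())
    ]


if __name__ == "__main__":
    for row in users_per_city(users):
        print(row)
```

Each group `g` holds the full user documents, which is why `LENGTH(g)` yields the number of users per city.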
  15. Example for Graph Query

    ‣ Paths:
        FOR u IN users
          LET userRelations = (
            FOR p IN PATHS( users, relations, "OUTBOUND" )
              FILTER p._from == u._id
              RETURN p
          )
          RETURN { "user" : u, "relations" : userRelations }
  16. Extendable through JS & MRuby

    ‣ Dynamic Languages enrich ArangoDB
      ‣ Multi Collection Transactions
      ‣ Graph Traversals
      ‣ Cascading deletes/updates
      ‣ Aggregate data from multiple queries into a single response
      ‣ Data-intensive operations
    ‣ Actions, Foxx, Application Server
    ‣ Currently supported
      ‣ JavaScript (V8)
      ‣ MRuby (experimental, not fully integrated yet)
  17. Application Server / Action Server

    ‣ ArangoDB can answer arbitrary HTTP requests directly
    ‣ You can write your own JavaScript functions (“actions”) that will be executed server-side
    ‣ Includes a permission system
    ➡ You can use it as a database or as a combined database/application server
  18. ArangoDB Foxx

    ‣ What if we could talk to the database directly?
    ‣ It would only need an API!
    ‣ What if we could define this API in JavaScript?
    ‣ ArangoDB Foxx is streamlined for API creation – not a Jack of all trades
    ‣ It is designed for front end developers: use JavaScript, which you already know
  19. More features

    ‣ Full access to ArangoDB's internal APIs:
      ‣ Simple Queries
      ‣ AQL
      ‣ Traversals
      ‣ ...
    ‣ Automatic generation of interactive documentation
    ‣ Models and Repositories
    ‣ Central repository of Foxx apps for re-use and inspiration
    ‣ Authentication Module
    ‣ ...
    ‣ Check out https://github.com/mchacki/aye_aye
  20. High performance & space efficiency

    ‣ Automatic schema recognition
    ‣ C database core, a C++ communication layer, JS and C++ for additional functionality
    ‣ Performance-critical parts can be ported to C or C++
    ‣ Even though it has a richer feature set, it can compete performance-wise with existing solutions and in some cases even achieves better results
  21. Space Efficiency

    ‣ Measure the space on disk of different data sets
    ‣ First in the standard config, then with some optimization
    ‣ We measured a bunch of different tasks
    ‣ I'll show you two of them here
  22. Store 50,000 Wiki Articles

    [Bar chart: space on disk for ArangoDB, CouchDB and MongoDB; axis from 0 to 2000]
    Source: http://www.arangodb.org/2012/07/08/collection-disk-usage-arangodb
  23. 3,459,421 AOL Search Queries

    [Bar chart: space on disk for ArangoDB, CouchDB and MongoDB; axis from 0 to 3000]
    Source: http://www.arangodb.org/2012/07/08/collection-disk-usage-arangodb
  24. Performance: Disclaimer

    ‣ Always take performance tests with a grain of salt
    ‣ Performance depends on many factors, including the specific task at hand
    ‣ This is just to give you a glimpse of the performance
    ‣ Always do your own performance tests (and if you do, report back to us :) )
    ‣ But now: Let's see some numbers
  25. Execution Time: Bulk Insert of 10,000,000 documents

    [Bar chart: execution time for ArangoDB, CouchDB and MongoDB]
    Source: http://www.arangodb.org/2012/09/04/bulk-inserts-mongodb-couchdb-arangodb
  26. Conclusion from Tests

    ‣ ArangoDB is really space efficient
    ‣ ArangoDB is “fast enough”
    ‣ Please test it for your own use case
  27. Easy to use

    ‣ Easy to use admin interface
    ‣ Simple Queries for simple queries, AQL for complex queries
    ‣ Simplify your setup: ArangoDB only – no Application Server etc. – on a single server is sufficient for some use cases
    ‣ Need graph queries or key-value storage? You don't need to add another component to the mix.
    ‣ No external dependencies like the JVM – just install ArangoDB
    ‣ HTTP interface – use your load balancer
  28. Join our growing community

    .. working on the geo index, the full-text search and many APIs: Ruby, Python, PHP, Java, D, ...
  29. Summary

    ‣ API based on web standards: HTTP, REST, JSON
    ‣ Simple to use
    ‣ Flexible in terms of querying
    ‣ Can be used as a database and an application server