
Barrel: Build a P2P document database

Barrel (https://barrel-db.org) is a modern document-oriented database written in Erlang. It focuses on data locality (put/match the data next to you) and P2P, while striving to stay compatible with the Apache CouchDB API.

Barrel started as a fork of Apache CouchDB, another database written in Erlang, but it quickly became clear that we needed to go further. Building a database in Erlang is indeed challenging. For example, the Erlang VM handles I/O differently from other VMs, and performance is always a trade-off against concurrency and fault tolerance. On the other hand, Erlang, its VM and the OTP framework offer many competitive advantages that help in building a very effective database. So Barrel was rewritten to benefit from them.

This talk will first deconstruct a database and then focus on how we can build one in Erlang, using Barrel as an example. We will see which parts probably need to be in C and which ones really fit well in Erlang… It will also show how easy it is to build a P2P protocol in Erlang, and how that helps make Barrel a true P2P database.

Benoit Chesneau

March 10, 2016

Transcript

  1. BARREL: BUILD A P2P DOCUMENT-ORIENTED DATABASE https://barrel-db.org Erlang Factory San Francisco 2016
  2. (no text)
  3. (no text)
  4. (no text)
  5. VISION AND CONCEPT

  6. DATA IS MOBILE (diagram: local databases, mobile devices, a sensor and a "cloud" database)
  7. PEER TO PEER (P2P): share, discover, replicate
  8. ▸ Local first ▸ Put/match the data next to you ▸ Query locally ▸ Replicate a view of the data you need
  9. WHAT IS BARREL

  10. WHAT IS BARREL ▸ A document database ▸ Documents are JSON with attachments and links ▸ Changes feed for documents and indexes ▸ Replication between any nodes, in both directions ▸ Views (~ map) ▸ HTTP 1.1/2 API
  11. REPLICATED APPS ▸ DATA: not just blobs ▸ Replicated apps ▸ CouchApps, but extended and revisited
  12. DECONSTRUCT

  13. APPEND ONLY & MVCC (diagram: documents Doc1 to Doc7 and B-tree nodes appended to the file; legend: document revision, block, B-tree node, invalid data version)
  14. THE COMPACTION ISSUE ▸ Creates a new file to remove fragmentation ▸ A race between the copy and the addition of new data ▸ Requires at least twice the storage
  15. DOCUMENT STORAGE (diagram: a DB file containing documents and B-tree nodes; an ID-Index maps each ID to its METADATA and a Seq-Index maps each SEQ to its METADATA, both pointing to the indexed document)
  16. HOW DOCUMENTS ARE STORED ▸ 2 indexes (B-tree): by sequence and by ID ▸ 1 index for local documents, without conflict handling ▸ A revision tree is stored in the indexes, pointing to each revision's offset ▸ Each revision is stored separately in the file
  17. VIEWS ▸ Reverse index (map) ▸ Index built using a function ▸ Functions in JavaScript, Erlang, … ▸ Incremental index ▸ Retrieves changes (aka view changes) ▸ Views are gathered in groups (1 DB file per group)
  18. VIEW STORAGE (diagram: a Log-Index maps each DOCID to its entries per view, e.g. View 1: KEY 1, SEQ 1, ADD and KEY 2, SEQ 2, DEL; a Key-Index maps [KEY, DOCID] to [VALUE, DOCREV, SEQ], with [del, DOCREV, SEQ] for removed entries; a SEQ-Index maps [SEQ, KEY] to the same values)
  19. REVISION TREE

  20. BUILT IN ERLANG

  21. CHALLENGES ▸ Writes are slow ▸ Reads should not be blocked by writes ▸ No shared memory ▸ No atomic integer trick ▸ Only actors and message passing ▸ Operations on a doc are atomic
  22. READ/WRITE OPERATIONS (diagram: writers update the DB state, which is shared with readers)
  23. READ/WRITE OPERATIONS ▸ An LRU to cache blocks: https://github.com/barrel-db/erlang-lru ▸ One file process, so operations are limited ▸ DB users are linked to the database process ▸ Optional write buffer to reduce latency ▸ Optional WAL
  24. SPEEDUPS ▸ Store segments of data for compaction ▸ I/O is "relatively" slow in Erlang ▸ Use a "native KV store" as a NIF
  25. INDEX OPERATIONS (diagram: a change reader gets changes from the DB; the changes are sent to and collected by the indexer, which updates the view group; the view group shares its state with readers)
  26. INDEX OPERATIONS ▸ Credit-flow based ▸ The view group keeps the state ▸ A view group is created on demand ▸ It is kept open as long as it has readers ▸ The indexer asks for updates ▸ Read functions (map functions) are processed in parallel
  27. NEW FUNCTIONS ▸ Added 2 features: ▸ MOVE: move doc(s) to another node or database (like copy, but with deletion) ▸ User hook functions (run in the background) using hooks: https://github.com/barrel-db/hooks ▸ Partition on demand ▸ The decision depends on the application's needs
  28. CHANGES HANDLER (diagram: the DB broadcasts changes to a change dispatcher, which delivers them to subscribers)
  29. CHANGES EVENTS ▸ Uses the sequence index ▸ Load balancing of changes ▸ Consumers subscribe to patterns (delete, update, …) ▸ Changes load balancers are created on demand ▸ Allows remote nodes to subscribe to a queue ▸ Based on primer (to be released in March 2016)
  31. HTTP API ▸ Inherited the HTTP API in mochiweb ▸ Small changes to make the server more resilient ▸ chatterbox ▸ WIP in cowboy ▸ yaws?
  32. P2P

  33. P2P ▸ Over HTTP ▸ Replication is the core ▸ Each node can replicate with any other ▸ PUSH/PULL ▸ Chained replication
  34. REPLICATION ▸ Based on the changes feed ▸ Fetches the revisions, and their attachments, not present on the node ▸ Continuous or not ▸ Tries to collect multiple docs at once ▸ Uses hackney: http://github.com/benoitc/hackney ▸ Uses a flow-based pattern instead of a classic pool
  35. REPLICATION OPERATIONS (diagram: a replication worker and a replication proxy between DB SOURCE and DB TARGET, linked by: get changes, fetch docs, push docs, notify changes)
  36. REPLICATION ▸ Replication state is stored on at least one node ▸ Checkpoints ▸ Get the revisions not currently stored on the nodes ("_rev_diffs") ▸ The replication proxy maintains routes ▸ Build replication chains by replicating status
  37. (no text)
  38. HTTPS://BARREL-DB.ORG Barrel HTTP://ENKIM.EU Enki Multimedia
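Slides 13 and 14 describe append-only storage with MVCC and the compaction problem it creates. The following is a minimal Python sketch, not Barrel's implementation: an in-memory `io.BytesIO` stands in for the database file, and a plain dict (a hypothetical stand-in for the B-tree) maps each document id to the offset of its latest record. Updates only append, so a reader holding an older snapshot of the index still sees the old version, and stale records stay in the file as invalid data until compaction copies the live ones into a new file.

```python
import io
import json


class AppendOnlyStore:
    """Toy append-only store: writes always go to the end of the file."""

    def __init__(self):
        self.file = io.BytesIO()  # stands in for the database file
        self.index = {}           # doc id -> offset of its latest record (B-tree stand-in)

    def _append(self, f, doc_id, body):
        offset = f.seek(0, io.SEEK_END)
        f.write(json.dumps({"id": doc_id, "body": body}).encode() + b"\n")
        return offset

    def put(self, doc_id, body):
        # Older records for doc_id stay in the file; they are simply no longer referenced.
        self.index[doc_id] = self._append(self.file, doc_id, body)

    def snapshot(self):
        """MVCC-style read view: later appends never change what it points to.
        (In this toy, compaction invalidates snapshots because offsets change.)"""
        return dict(self.index)

    def get(self, doc_id, snapshot=None):
        offsets = self.index if snapshot is None else snapshot
        self.file.seek(offsets[doc_id])
        return json.loads(self.file.readline())["body"]

    def size(self):
        return len(self.file.getvalue())

    def compact(self):
        """Copy only the live records into a new file, then swap it in."""
        new_file, new_index = io.BytesIO(), {}
        for doc_id in self.index:
            new_index[doc_id] = self._append(new_file, doc_id, self.get(doc_id))
        self.file, self.index = new_file, new_index
```

Because this toy `compact()` runs synchronously, it sidesteps the race between the copy and new writes mentioned on slide 14; a compactor running alongside writers has to catch up on records appended during the copy before swapping files, and needs room for both files while it runs.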
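Slides 15 and 16 describe two B-tree indexes over the same document metadata: one by id and one by update sequence. In this hypothetical Python sketch, dicts stand in for the B-trees and the names `DocInfo` and `DocIndexes` are illustrative only. As in CouchDB, each document keeps a single entry in the by-sequence index (its latest update), which is what turns a changes feed into an ordered scan from a given sequence.

```python
from dataclasses import dataclass


@dataclass
class DocInfo:
    doc_id: str
    rev: str
    seq: int
    deleted: bool = False


class DocIndexes:
    """Toy model of the by-id and by-sequence indexes."""

    def __init__(self):
        self.by_id = {}    # doc_id -> DocInfo (latest metadata)
        self.by_seq = {}   # seq -> DocInfo; each doc appears once, at its latest seq
        self.last_seq = 0

    def update(self, doc_id, rev, deleted=False):
        self.last_seq += 1
        old = self.by_id.get(doc_id)
        if old is not None:
            del self.by_seq[old.seq]  # drop the doc's previous sequence entry
        info = DocInfo(doc_id, rev, self.last_seq, deleted)
        self.by_id[doc_id] = info
        self.by_seq[info.seq] = info
        return info

    def changes(self, since=0):
        """Changes feed: docs updated after `since`, in sequence order."""
        return [self.by_seq[s] for s in sorted(self.by_seq) if s > since]
```

A consumer that remembers the last sequence it saw can call `changes(since=last_seq)` to get only newer updates.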
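Slides 17 and 18 describe incremental map views backed by a log index (per document, the keys it emitted) and a key index ([KEY, DOCID] to value). Below is a minimal Python sketch of that bookkeeping, assuming the indexer receives `(seq, doc_id, doc)` tuples from the changes feed with `None` marking a deletion; `ViewIndex` and its methods are hypothetical names, not Barrel's API. The log index is what lets an update remove the stale rows a document emitted previously.

```python
class ViewIndex:
    """Toy incremental map view.

    log_index: doc_id -> keys the doc emitted last time (used to drop stale rows)
    key_index: (key, doc_id) -> value (the reverse index that readers query)
    """

    def __init__(self, map_fun):
        self.map_fun = map_fun
        self.log_index = {}
        self.key_index = {}
        self.indexed_seq = 0  # last database sequence folded into the view

    def apply_changes(self, changes):
        """changes: iterable of (seq, doc_id, doc); doc is None for a deletion."""
        for seq, doc_id, doc in changes:
            if seq <= self.indexed_seq:
                continue  # already indexed, e.g. a replayed change
            for key in self.log_index.pop(doc_id, []):
                del self.key_index[(key, doc_id)]
            if doc is not None:
                emitted = list(self.map_fun(doc))
                for key, value in emitted:
                    # If a doc emits the same key twice, the last value wins in this toy.
                    self.key_index[(key, doc_id)] = value
                self.log_index[doc_id] = list(dict.fromkeys(k for k, _ in emitted))
            self.indexed_seq = seq

    def query(self, key):
        """Rows for one key, sorted by doc id."""
        return sorted((doc_id, value) for (k, doc_id), value in self.key_index.items() if k == key)
```

Keeping `indexed_seq` is what makes the index incremental: the next run only asks the changes feed for sequences after it.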
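Slide 19 only names the revision tree. Assuming Barrel keeps CouchDB-compatible semantics, conflicting leaf revisions are resolved by a deterministic rule that every replica computes identically: non-deleted leaves beat deleted ones, then the highest generation (the number before the dash) wins, then the greater revision hash. A small Python sketch of that rule, using a hypothetical `RevTree` that records each revision's parent:

```python
def parse_rev(rev):
    """Split "N-hash" into (N, hash)."""
    pos, _, rev_hash = rev.partition("-")
    return int(pos), rev_hash


class RevTree:
    """Toy revision tree: each revision points to its parent (None for a root)."""

    def __init__(self):
        self.parent = {}      # rev -> parent rev
        self.deleted = set()  # revs that are deletion markers (tombstones)

    def add(self, rev, parent=None, deleted=False):
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent {parent}")
        self.parent[rev] = parent
        if deleted:
            self.deleted.add(rev)

    def leaves(self):
        parents = set(self.parent.values())
        return [rev for rev in self.parent if rev not in parents]

    def winner(self):
        """Live leaves first, then highest generation, then greatest hash."""
        def sort_key(rev):
            pos, rev_hash = parse_rev(rev)
            return (rev not in self.deleted, pos, rev_hash)
        return max(self.leaves(), key=sort_key)

    def conflicts(self):
        """Live leaves that lost to the winner."""
        winner = self.winner()
        return sorted(r for r in self.leaves() if r != winner and r not in self.deleted)
```

Since the rule depends only on the tree's contents, two nodes that have replicated the same revisions agree on the winner without talking to each other.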
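Slide 23 mentions an LRU cache for blocks (https://github.com/barrel-db/erlang-lru). The idea, sketched in Python with `collections.OrderedDict` rather than that library's API: blocks are keyed by file offset, a hit moves the block to the most-recently-used end, and inserting past capacity evicts from the least-recently-used end. `BlockCache` and the `read_block` callback are hypothetical; the callback stands in for the file process.

```python
from collections import OrderedDict


class BlockCache:
    """Toy LRU cache for file blocks, keyed by block offset."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block  # callback that actually reads from the file
        self.blocks = OrderedDict()   # oldest (least recently used) first
        self.misses = 0

    def get(self, offset):
        if offset in self.blocks:
            self.blocks.move_to_end(offset)  # mark as most recently used
            return self.blocks[offset]
        self.misses += 1
        block = self.read_block(offset)
        self.blocks[offset] = block
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return block
```

Caching in front of a single file process means repeated reads of hot B-tree nodes do not have to queue behind writes on that process.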
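Slide 26 says indexing is credit-flow based: the indexer asks for updates, so the change reader never sends more changes than the indexer has room for, and map functions are processed in parallel. A single-process Python sketch of that handshake; `ChangeReader`, `Indexer` and the `window` parameter are hypothetical names, and a thread pool stands in for parallel map workers (in CPython, threads give concurrency rather than CPU parallelism for pure-Python map functions).

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class ChangeReader:
    """Producer: hands out changes only against credit granted by the consumer."""

    def __init__(self, changes):
        self.pending = deque(changes)
        self.credit = 0

    def grant(self, n):
        self.credit += n

    def send(self):
        batch = []
        while self.credit > 0 and self.pending:
            batch.append(self.pending.popleft())
            self.credit -= 1
        return batch


class Indexer:
    """Consumer: asks for at most `window` changes at a time, maps them concurrently."""

    def __init__(self, reader, window, map_fun):
        self.reader = reader
        self.window = window
        self.map_fun = map_fun
        self.indexed = []
        self.max_batch = 0

    def run(self):
        with ThreadPoolExecutor(max_workers=self.window) as pool:
            while True:
                # "The indexer asks for updates": top the credit up to the window size.
                self.reader.grant(self.window - self.reader.credit)
                batch = self.reader.send()
                if not batch:
                    return
                self.max_batch = max(self.max_batch, len(batch))
                # pool.map preserves input order, so rows stay in sequence order.
                self.indexed.extend(pool.map(self.map_fun, batch))
```

The point of the credit is back-pressure: a slow indexer simply stops granting credit instead of letting an unbounded mailbox build up.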
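Slides 28 and 29 describe a changes dispatcher where consumers subscribe to event patterns (delete, update, …) and changes are load-balanced. The Python sketch below is an assumption about how such routing could work, not primer's API: subscribers join a named group for an event type, groups are created on demand, every group receives each matching change once, and members of the same group take turns (round-robin). `ChangesDispatcher` and its methods are hypothetical.

```python
from collections import defaultdict
from itertools import cycle


class ChangesDispatcher:
    """Toy dispatcher: routes change events to subscribers by event type,
    load-balancing round-robin among the members of a consumer group."""

    def __init__(self):
        self.groups = defaultdict(list)  # (event_type, group) -> list of callbacks
        self._rr = {}                    # (event_type, group) -> round-robin iterator

    def subscribe(self, event_type, group, callback):
        key = (event_type, group)        # the group is created on first subscription
        self.groups[key].append(callback)
        self._rr[key] = cycle(self.groups[key])  # rebuild the rotation with the new member

    def publish(self, event_type, change):
        for (etype, _group), rotation in self._rr.items():
            if etype == event_type:
                next(rotation)(change)   # exactly one member of each group gets it
```

A group with one member behaves like a plain broadcast subscriber; adding members to the same group spreads the load instead of duplicating deliveries.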
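Slides 34 to 36 outline replication: read the source's changes feed since the last checkpoint, ask the target which revisions it lacks (the slide's `_rev_diffs`; CouchDB's equivalent endpoint is `_revs_diff`), copy only those, collect several docs per batch, and store a checkpoint so the next run resumes where it stopped. A one-shot, in-memory Python sketch under those assumptions; `Node`, `replicate` and the checkpoint dict are hypothetical, revisions are copied without their revision-tree ancestry or attachments, and the HTTP transport (hackney in Barrel) is omitted.

```python
class Node:
    """Toy node: stores revisions and exposes a changes feed."""

    def __init__(self):
        self.revs = {}  # (doc_id, rev) -> body
        self.feed = []  # (seq, doc_id, rev) in write order

    def write(self, doc_id, rev, body):
        if (doc_id, rev) not in self.revs:
            self.revs[(doc_id, rev)] = body
            self.feed.append((len(self.feed) + 1, doc_id, rev))

    def changes(self, since):
        return [change for change in self.feed if change[0] > since]

    def rev_diffs(self, wanted):
        """Return the (doc_id, rev) pairs this node does not store yet."""
        return [key for key in wanted if key not in self.revs]


def replicate(source, target, checkpoints, rep_id, batch_size=2):
    """One-shot replication from source to target, resuming from a checkpoint."""
    since = checkpoints.get(rep_id, 0)
    changes = source.changes(since)
    for start in range(0, len(changes), batch_size):  # collect several docs at once
        batch = changes[start:start + batch_size]
        missing = target.rev_diffs([(doc_id, rev) for _, doc_id, rev in batch])
        for doc_id, rev in missing:
            target.write(doc_id, rev, source.revs[(doc_id, rev)])
        checkpoints[rep_id] = batch[-1][0]  # record progress after each batch
    return checkpoints.get(rep_id, 0)
```

Checkpointing after each batch bounds the work lost if a run is interrupted, and the diff step keeps already-present revisions (for example, ones that arrived through another node in a chain) from being transferred twice.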