Barrel: Build a P2P document database

Barrel (https://barrel-db.org) is a modern document-oriented database written in Erlang. It focuses on data locality (put/match the data next to you) and P2P, while trying to stay compatible with the Apache CouchDB API.

Barrel started as a fork of Apache CouchDB, another database written in Erlang, but it quickly became clear that we needed to go further. Building a database in Erlang is challenging. I/O, for example, is handled differently than in other VMs, and performance is always a trade-off against concurrency and fault tolerance. On the other hand, Erlang, its VM and the OTP framework offer many competitive advantages that can help you build a very effective database, so Barrel has been rewritten to benefit from them.

This talk first deconstructs a database and then focuses on how to build one in Erlang, using Barrel as an example. We will see which parts probably need to be in C and which ones fit well in Erlang. It also shows how easy it is to build a P2P protocol in Erlang, and how that helps make Barrel a true P2P database.

Benoit Chesneau

March 10, 2016

Transcript

  1. BARREL
    BUILD A P2P DOCUMENT ORIENTED DATABASE
    https://barrel-db.org
    Erlang Factory San Francisco 2016

  5. VISION AND CONCEPT

  6. DATA IS MOBILE
    [Diagram labels: local database (x2), mobile (x2), sensor, "cloud" database]

  7. PEER TO PEER (P2P)
    share, discover, replicate

  8. ▸ Local first
    ▸ Put/Match the data next to you
    ▸ Query Locally
    ▸ Replicate a view of the data you need

  9. WHAT IS BARREL

  10. WHAT IS BARREL
    ▸ a document database
    ▸ documents are JSON with attachments and links
    ▸ changes feed for documents and indexes
    ▸ replication between any nodes, in both directions
    ▸ views (~ map)
    ▸ HTTP 1.1/2 API

  11. REPLICATED APPS
    ▸ Data: not just blobs
    ▸ Replicated apps
    ▸ Couchapps, but extended and revisited

  12. DECONSTRUCT

  13. APPEND ONLY & MVCC
    [Diagram labels: Doc1 to Doc7, Btree Node (x2); legend: Document revision, Block, btree node, invalid data, version]

  14. THE COMPACTION ISSUE
    ▸ Creates a new file to remove the fragmentation
    ▸ A race between the copy and the addition of new data
    ▸ Requires at least twice the storage
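
The append-only model and its compaction cost can be illustrated with a toy store. This is only a sketch under stated assumptions: `AppendOnlyStore`, its in-memory `log` list (standing in for the database file) and the `latest` dictionary are hypothetical names, not Barrel or CouchDB code.

```python
class AppendOnlyStore:
    """Toy append-only store: every write appends a new record to a log."""

    def __init__(self):
        self.log = []      # stands in for the database file
        self.latest = {}   # doc id -> offset of the live record

    def put(self, doc_id, body):
        self.log.append((doc_id, body))          # never overwrite in place
        self.latest[doc_id] = len(self.log) - 1

    def get(self, doc_id):
        return self.log[self.latest[doc_id]][1]

    def compact(self):
        """Copy only the live records into a new log, then swap it in."""
        new_log, new_latest = [], {}
        live = sorted(self.latest.items(), key=lambda item: item[1])
        for doc_id, offset in live:
            new_log.append(self.log[offset])
            new_latest[doc_id] = len(new_log) - 1
        self.log, self.latest = new_log, new_latest


store = AppendOnlyStore()
store.put("doc1", {"v": 1})
store.put("doc2", {"v": 1})
store.put("doc1", {"v": 2})   # the first doc1 record is now stale
size_before = len(store.log)
store.compact()
size_after = len(store.log)
```

While `compact` runs, the old and the new log exist side by side, which is where the extra storage goes. A real implementation also has to catch up with writes that arrive during the copy, which is the race mentioned on the slide; the sketch ignores it.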

  15. DOCUMENT STORAGE
    ID-Index:
      ID 1   METADATA 1
      ID 2   METADATA 2
      ID 3   METADATA 3
    Seq-Index:
      SEQ 1  METADATA 1
      SEQ 2  METADATA 2
      SEQ 3  METADATA 3
    [Diagram labels: Btree Node (x2), Doc, Indexed document, DB file]

  16. HOW DOCUMENTS ARE STORED
    ▸ 2 indexes (btree): by sequence, by id
    ▸ 1 index for local documents, without conflict handling
    ▸ The revision tree is stored in the indexes and points to the revision offsets
    ▸ The revision itself is stored separately in the file
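
A minimal sketch of the two-index layout, assuming (as in CouchDB) that a document appears only once in the sequence index, under its latest sequence. `DocStore` and its fields are hypothetical, plain dictionaries replace the btrees, and the revision tree is reduced to a single revision string.

```python
class DocStore:
    def __init__(self):
        self.file = []     # appended revision bodies; the list index is the "offset"
        self.by_id = {}    # doc id -> {"seq": int, "rev": str, "offset": int}
        self.by_seq = {}   # update sequence -> doc id
        self.seq = 0

    def put(self, doc_id, rev, body):
        self.file.append(body)                # the revision is stored separately
        offset = len(self.file) - 1
        old = self.by_id.get(doc_id)
        if old is not None:
            del self.by_seq[old["seq"]]       # assumption: one seq entry per doc
        self.seq += 1
        self.by_id[doc_id] = {"seq": self.seq, "rev": rev, "offset": offset}
        self.by_seq[self.seq] = doc_id

    def get(self, doc_id):
        return self.file[self.by_id[doc_id]["offset"]]

    def changes(self, since=0):
        return [(s, self.by_seq[s]) for s in sorted(self.by_seq) if s > since]


docs = DocStore()
docs.put("a", "1-x", {"n": 1})
docs.put("b", "1-y", {"n": 2})
docs.put("a", "2-z", {"n": 3})
all_changes = docs.changes()
recent_changes = docs.changes(since=2)
```

The id index answers "give me this document", while the sequence index answers "what changed since sequence N", which is what the changes feed, the views and replication all build on.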

  17. VIEWS
    ▸ Reverse index (map)
    ▸ Index using a function
    ▸ Functions in JavaScript, Erlang, ...
    ▸ Incremental index
    ▸ Retrieves changes (aka view changes)
    ▸ Views are gathered in groups (1 db file/group)

  18. VIEW STORAGE
    Log-Index:
      DOCID  View 1  KEY 1  SEQ 1  ADD
                     KEY 2  SEQ 2  DEL
             View 2  KEY 1  SEQ 1  ADD
    Key-Index:
      [KEY 1, DOCID]    [VALUE, DOCREV, SEQ]
      [KEY 2, DOCID]    [del, DOCREV, SEQ]
      [KEY 3, DOCID 2]  [VALUE, DOCREV, SEQ]
    SEQ-Index:
      [SEQ 1, KEY]      [VALUE, DOCREV, SEQ]
      [SEQ 2, KEY 2]    [del, DOCREV, SEQ]
      [SEQ 3, KEY 3]    [VALUE, DOCREV, SEQ]
    [Diagram label: view]
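
The incremental map index can be sketched as follows. Everything here is an assumption made for illustration: `View`, `map_by_tag` and the dictionary-based `key_index` and `log_index` are hypothetical stand-ins for the Key-Index and Log-Index above, and the SEQ-Index is left out.

```python
def map_by_tag(doc):
    """Hypothetical map function: emit one (key, value) pair per tag."""
    for tag in doc.get("tags", []):
        yield tag, doc["title"]


class View:
    def __init__(self, map_fun):
        self.map_fun = map_fun
        self.key_index = {}   # (key, doc id) -> value
        self.log_index = {}   # doc id -> keys emitted by its last indexed revision

    def update(self, doc_id, doc):
        """Index one changed document (doc is None for a deletion)."""
        for key in self.log_index.pop(doc_id, ()):
            del self.key_index[(key, doc_id)]     # drop the stale rows first
        if doc is None:
            return
        rows = list(self.map_fun(doc))
        for key, value in rows:
            self.key_index[(key, doc_id)] = value
        self.log_index[doc_id] = {key for key, _ in rows}

    def query(self, key):
        return sorted((doc_id, value)
                      for (k, doc_id), value in self.key_index.items()
                      if k == key)


view = View(map_by_tag)
view.update("d1", {"title": "A", "tags": ["erlang", "db"]})
view.update("d2", {"title": "B", "tags": ["db"]})
view.update("d1", {"title": "A2", "tags": ["erlang"]})   # d1 drops the "db" tag
db_rows = view.query("db")
erlang_rows = view.query("erlang")
```

The log index is what makes the update incremental: only the rows previously emitted for the changed document are touched, so the rest of the view never needs to be rebuilt.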

  19. REVISION TREE

  20. BUILT IN ERLANG

  21. CHALLENGES
    ▸ Writes are slow
    ▸ Reads should not be blocked by writes
    ▸ No shared memory
    ▸ No atomic integer trick
    ▸ Only actors and message passing
    ▸ Operations on a doc are atomic

  22. READ/WRITE OPERATIONS
    [Diagram labels: DB STATE, READER (x2), writer (x2); arrows: update, share state]
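
One way to keep reads from being blocked by writes is to serialize all writes through a single writer that publishes a new immutable state after each update. The sketch below imitates that with a Python thread and a queue acting as the writer's mailbox; it is an analogy for the actor model, not Barrel's Erlang code, and `Database`, `put` and `snapshot` are hypothetical names.

```python
import queue
import threading
from types import MappingProxyType


class Database:
    def __init__(self):
        self._snapshot = MappingProxyType({})   # read-only state shared with readers
        self._inbox = queue.Queue()             # mailbox of the single writer
        writer = threading.Thread(target=self._write_loop, daemon=True)
        writer.start()

    def _write_loop(self):
        while True:
            doc_id, body, done = self._inbox.get()
            new_state = dict(self._snapshot)    # copy, never mutate in place
            new_state[doc_id] = body
            self._snapshot = MappingProxyType(new_state)   # publish the new state
            done.set()

    def put(self, doc_id, body):
        done = threading.Event()
        self._inbox.put((doc_id, body, done))   # message passing, no lock
        done.wait(timeout=5)

    def snapshot(self):
        return self._snapshot                   # readers never wait for the writer


db = Database()
db.put("a", 1)
old_view = db.snapshot()
db.put("a", 2)
new_view = db.snapshot()
```

A reader that took `old_view` keeps seeing a consistent state even after later writes, which is the MVCC behaviour described earlier in the deck.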

  23. READ/WRITE OPERATIONS
    ▸ LRU to cache blocks: https://github.com/barrel-db/erlang-lru
    ▸ 1 file process, operations are limited
    ▸ DB users are linked to the database process
    ▸ Optional write buffer to reduce the latency
    ▸ Optional WAL
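
A block cache with least-recently-used eviction can be sketched in a few lines. This does not reproduce the erlang-lru API; `BlockCache` and the `read_block` callback are hypothetical names chosen for the example.

```python
from collections import OrderedDict


class BlockCache:
    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block    # hypothetical callback: offset -> bytes
        self.blocks = OrderedDict()     # ordered from least to most recently used

    def get(self, offset):
        if offset in self.blocks:
            self.blocks.move_to_end(offset)      # mark as most recently used
            return self.blocks[offset]
        data = self.read_block(offset)           # cache miss: go to the file
        self.blocks[offset] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict the least recently used
        return data


reads = []


def read_block(offset):
    reads.append(offset)                # record every access to the "file"
    return b"block-%d" % offset


cache = BlockCache(2, read_block)
for offset in (0, 1, 0, 2, 1):
    cache.get(offset)
```

With a capacity of two blocks, the second read of block 0 is served from the cache, while block 1 is evicted when block 2 arrives and has to be read from the file again.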

  24. SPEEDUPS
    ▸ Store segments of data for compaction
    ▸ IO is "relatively" slow in Erlang
    ▸ Use a “native KV store” as a NIF

  25. INDEX OPERATIONS
    [Diagram labels: View Group, READER (x2), change reader, indexer, DB; arrows: update, share state, send/collect changes, get changes]

  26. INDEX OPERATIONS
    ▸ Credit-flow based
    ▸ The view group keeps the state
    ▸ A view group is created on demand
    ▸ It is kept open as long as it has readers
    ▸ The indexer asks for updates
    ▸ Read functions (map functions) are processed in parallel
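
Credit-based flow control means the consumer decides how much work it is willing to receive. The sketch below assumes a hypothetical `ChangeReader` that only sends changes while it holds credit, and an `Indexer` that grants a fixed window of credit each time it asks for updates. Both names and the window size are illustrative, and the map functions run sequentially here.

```python
from collections import deque


class ChangeReader:
    """Producer: hands out changes only while it holds credit."""

    def __init__(self, changes):
        self.pending = deque(changes)
        self.credit = 0

    def grant(self, n):
        self.credit += n

    def send(self):
        batch = []
        while self.credit > 0 and self.pending:
            batch.append(self.pending.popleft())
            self.credit -= 1                 # each change sent costs one credit
        return batch


class Indexer:
    """Consumer: asks for updates by granting credit."""

    def __init__(self, reader, window):
        self.reader = reader
        self.window = window
        self.indexed = []
        self.batch_sizes = []

    def run(self):
        while self.reader.pending:
            self.reader.credit = 0               # start each round from zero credit
            self.reader.grant(self.window)       # ask for at most `window` updates
            batch = self.reader.send()
            self.batch_sizes.append(len(batch))
            self.indexed.extend(batch)           # stand-in for running the map functions


reader = ChangeReader(range(1, 8))
indexer = Indexer(reader, window=3)
indexer.run()
```

Because the producer can never send more than the granted credit, a slow indexer is never flooded with changes it cannot process yet.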

  27. NEW FUNCTIONS
    ▸ Added 2 features:
    ▸ MOVE: move doc(s) to another node or database (like copy but with delet
    ▸ User hook functions (run in background) using hooks: https://github.com/barrel-db/hooks
    ▸ Partition on demand
    ▸ Decision depends on the application needs

  28. CHANGES HANDLER
    [Diagram labels: subscriber, change dispatcher, DB; arrow: broadcast changes]

  29. CHANGES EVENTS
    ▸ Uses the sequence index
    ▸ Changes load balancing
    ▸ Consumers subscribe to patterns (delete, update, …)
    ▸ A changes load balancer is created on demand
    ▸ Allows remote nodes to subscribe to a queue
    ▸ Based on primer (release in March 2016)
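
Pattern-based subscription to changes could look like the sketch below. It is not the primer API: `ChangeDispatcher`, the `type` field and the callbacks are hypothetical, and load balancing and remote subscribers are left out.

```python
class ChangeDispatcher:
    def __init__(self):
        self.subscribers = []   # list of (set of change types, callback)

    def subscribe(self, patterns, callback):
        self.subscribers.append((set(patterns), callback))

    def broadcast(self, change):
        for patterns, callback in self.subscribers:
            if change["type"] in patterns:       # deliver only matching changes
                callback(change)


deleted_ids, all_ids = [], []
dispatcher = ChangeDispatcher()
dispatcher.subscribe({"delete"}, lambda change: deleted_ids.append(change["id"]))
dispatcher.subscribe({"delete", "update"}, lambda change: all_ids.append(change["id"]))

dispatcher.broadcast({"seq": 1, "id": "doc1", "type": "update"})
dispatcher.broadcast({"seq": 2, "id": "doc2", "type": "delete"})
```

Each subscriber only sees the kinds of changes it asked for, so a consumer interested in deletions does not have to filter the whole feed itself.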

  31. HTTP API
    ▸ Inherited the HTTP API built on mochiweb
    ▸ Small changes to make the server more resilient
    ▸ chatterbox
    ▸ WIP in cowboy
    ▸ yaws?

  32. P2P

  33. P2P
    ▸ Over HTTP
    ▸ Replication is the core
    ▸ Each node can replicate with any other node
    ▸ PUSH/PULL
    ▸ Chained replication

  34. REPLICATION
    ▸ Based on the changes feed
    ▸ Fetches the revisions and their attachments not present on the node
    ▸ Continuous or not
    ▸ Tries to collect multiple docs at once
    ▸ Uses hackney: http://github.com/benoitc/hackney
    ▸ Uses a flow-based pattern instead of a classic pool

  35. REPLICATION OPERATIONS
    [Diagram labels: replication worker, replication proxy, DB SOURCE, DB TARGET; arrows: fetch docs, get changes, notify changes, push docs]

  36. REPLICATION
    ▸ Replication state is stored on at least one node
    ▸ Checkpoints
    ▸ Gets the revisions not currently stored on the nodes (“_rev_diffs”)
    ▸ The replication proxy maintains routes
    ▸ Builds replication chains by replicating status
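
The replication steps above (read the changes feed from a checkpoint, ask the target which revisions it is missing, copy only those, then record a new checkpoint) can be sketched with in-memory nodes. `Node`, `missing` and `replicate` are hypothetical names, there is no HTTP, and revision trees and attachments are ignored.

```python
class Node:
    def __init__(self):
        self.revs = {}   # (doc id, rev) -> body
        self.feed = []   # changes feed; the sequence is the position + 1

    def put(self, doc_id, rev, body):
        if (doc_id, rev) not in self.revs:
            self.revs[(doc_id, rev)] = body
            self.feed.append((doc_id, rev))

    def changes(self, since):
        return [(seq, doc_id, rev)
                for seq, (doc_id, rev) in enumerate(self.feed, start=1)
                if seq > since]

    def missing(self, candidates):
        """Revision diff: which of these revisions are not stored here?"""
        return [c for c in candidates if c not in self.revs]


def replicate(source, target, checkpoint=0):
    changes = source.changes(checkpoint)
    wanted = target.missing([(doc_id, rev) for _, doc_id, rev in changes])
    for doc_id, rev in wanted:                    # fetch and push only what is missing
        target.put(doc_id, rev, source.revs[(doc_id, rev)])
    new_checkpoint = changes[-1][0] if changes else checkpoint
    return new_checkpoint, len(wanted)


a, b = Node(), Node()
a.put("d1", "1-a", {"n": 1})
a.put("d2", "1-b", {"n": 2})
b.put("d2", "1-b", {"n": 2})                      # b already has this revision
first = replicate(a, b)
a.put("d1", "2-c", {"n": 3})
second = replicate(a, b, checkpoint=first[0])
third = replicate(a, b, checkpoint=second[0])
```

The checkpoint lets a later run resume from where the previous one stopped, and the revision diff keeps a node from transferring revisions the other side already has.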

  38. Barrel: https://barrel-db.org
    Enki Multimedia: http://enkim.eu