CouchDB 4.0: 1 + 2 = 4!

Joan Touzet
October 21, 2019

CouchDB 4.0 unifies the best features of CouchDB 2.x with the consistent semantics of CouchDB 1.x. Learn more in this presentation, given at the CouchDB Users' Group in Berlin, Germany, in October 2019.

Transcript

  1. Joan Touzet ❦ https://atypical.net/ ❦ wohali

  2. CouchDB & Apache
    Contributor / User (~2008)
    Committer (Feb 2013)
    PMC member (April 2014)
    ASF Member (2015)
    Apache Board of Directors (2019)

  3. [Image only; no transcript text]

  4. • The “original” NoSQL (…but we were provably first!)
    • Document-oriented structure
    • Map-Reduce
    • Streaming changes feeds

  5. [Image only; no transcript text]

  6. • Couch file
    – holds a B-tree
    – 1 file per database or view group (1 design document = 1 view group)
    – Databases: indexed by ID and by sequence number
    – Views: one B-tree, with a separate key space per view in the design doc
    • Replicator – “just a client process”.
    – Source → Target. Multi-master & bidirectional.
    • http layer + authentication
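
To make the "indexed by ID and by sequence number" point concrete, here is a toy Python model. It is only a sketch: plain dictionaries stand in for the on-disk B-trees, and the class and method names (`ToyDatabase`, `save`, `changes`) are invented for illustration, not CouchDB APIs. It assumes that each document appears once in the sequence index, at its latest update.

```python
class ToyDatabase:
    """Toy model of one database file with a by-ID and a by-sequence index."""

    def __init__(self):
        self.update_seq = 0
        self.by_id = {}   # doc ID -> (seq, body)
        self.by_seq = {}  # seq -> doc ID (assumption: latest seq per doc only)

    def save(self, doc_id, body):
        old = self.by_id.get(doc_id)
        if old is not None:
            # The document moves to a new sequence number: drop the old entry.
            del self.by_seq[old[0]]
        self.update_seq += 1
        self.by_id[doc_id] = (self.update_seq, body)
        self.by_seq[self.update_seq] = doc_id
        return self.update_seq

    def changes(self, since=0):
        """Yield (seq, doc_id) for every document changed after `since`."""
        for seq in sorted(self.by_seq):
            if seq > since:
                yield seq, self.by_seq[seq]


db = ToyDatabase()
db.save("bob", {"v": 1})
db.save("alice", {"v": 1})
db.save("bob", {"v": 2})
```

Reading the sequence index in order is what makes a changes feed cheap in this model: a consumer remembers the last sequence it saw and asks only for what came after it.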

  7. CouchDB 1.x, by itself, was a fully consistent database. Unintentionally.
    When replicating with another DB, it was eventually consistent, but with document conflicts.

  8. [Diagram: document “bob” in two databases]
    bob v1 | bob v1
    bob v2a | bob v2b
    bob v2a + v2b | bob v2a + v2b

  9. [Architecture diagram; labels:]
    • Clustered HTTP
    • Clustered CouchDB API layer (Dynamo model)
    • Low-latency, highly parallel remote call (RPC) library
    • Magic “consistent” shard-mapping database
    • “basically” CouchDB 1.x, but with enhancements

  10. CouchDB 2.x has native clustering functionality
    “Internal replication” is optimized for this process
    CouchDB 2.x shards the database for optimization
    CouchDB has no leader election or “global coordinator”!

  11. [Diagram of shards and replicas]
    q = number of shards (default: 8; 4 shown here for a good picture)
    n = number of replicas
    (default: 3)
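
As a rough illustration of what q and n mean, the sketch below splits a hash space into q ranges and assigns each range to n nodes. Everything here is an assumption made for the example: the 32-bit CRC hash, the equal-width ranges, and the round-robin `replica_nodes` placement are not taken from the slides, and CouchDB's real shard map may be built differently.

```python
import zlib

RING = 2 ** 32  # size of the hash space (assumption: a 32-bit hash)


def shard_ranges(q):
    """Split the hash space into q contiguous ranges."""
    step = RING // q
    ranges = [(i * step, (i + 1) * step - 1) for i in range(q)]
    ranges[-1] = (ranges[-1][0], RING - 1)  # last range absorbs any remainder
    return ranges


def shard_for(doc_id, q):
    """Return the index of the range that a document ID hashes into."""
    h = zlib.crc32(doc_id.encode("utf-8"))
    return min(h // (RING // q), q - 1)


def replica_nodes(shard, nodes, n):
    """Hypothetical placement: n consecutive nodes, starting at the shard index."""
    return [nodes[(shard + i) % len(nodes)] for i in range(n)]


q, n = 4, 3
nodes = ["node1", "node2", "node3"]
shard = shard_for("bob", q)
holders = replica_nodes(shard, nodes, n)
```

With q = 4 and n = 3 there are 12 shard copies in total, and every node can compute which range holds "bob" without asking a coordinator.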

  12. [Diagram comparing CouchDB 1.x and CouchDB 2.x; labels: HTTP, 1, 2, 3, Erlang]

  13. [Diagram, 00:00.000] bob v1 | bob v1 | bob v1

  14. [Diagram, 00:01.000] bob v2a | bob v1 | bob v2b

  15. [Diagram, 00:01.001] bob v2a | bob v2b | bob v2b

  16. [Diagram, 00:01.002] bob v2a | bob v2b | bob v2b

  17. [Diagram, 00:01.003] bob v2a | bob v2b | bob v2b

  18. [Diagram, 00:01.004] bob v2a | bob v2b | bob v2b
    Quorum: copies ≥ ⌊n/2⌋ + 1
    copies = 2, n = 3: Quorum OK
    copies = 1, n = 3: Quorum NG

  19. [Diagram, 00:01.009] bob v2a | bob v2b | bob v2b
    copies = 2, n = 3: Quorum OK
    copies = 1, n = 3: Quorum NG
    201 Created
    202 Accepted
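
The quorum arithmetic on these two slides can be checked with a few lines of Python. This is a sketch: the function names are made up, and pairing "Quorum OK" with 201 Created and "Quorum NG" with 202 Accepted is my reading of the slide rather than something it states outright.

```python
def quorum(n):
    """Smallest number of copies that forms a majority of n replicas."""
    return n // 2 + 1


def write_status(copies, n):
    """Assumed mapping: 201 if the write reached a quorum, otherwise 202."""
    return 201 if copies >= quorum(n) else 202


status_v2b = write_status(copies=2, n=3)  # two replicas already hold v2b
status_v2a = write_status(copies=1, n=3)  # only one replica holds v2a
```

Note that a 202 here does not mean the write was lost. The single copy is still stored and still propagates, which is why both revisions appear on every replica on the next slide.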

  20. [Diagram, 00:01.010] bob v2a + v2b | bob v2a + v2b | bob v2a + v2b
    bob v2a “arbitrarily” wins!
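
The winner is "arbitrary" from the application's point of view, but every replica has to pick the same one without talking to the others. The sketch below shows one deterministic rule of that kind: prefer live revisions over deleted ones, then the higher generation, then the larger hash. This is my understanding of how CouchDB ranks conflicting revisions, so treat the exact ordering as an assumption. The revision IDs are hypothetical and do not correspond to v2a and v2b on the slide.

```python
def parse_rev(rev):
    """Split a revision ID such as "2-b7" into (generation, hash)."""
    generation, _, digest = rev.partition("-")
    return int(generation), digest


def pick_winner(leaves):
    """Pick one leaf revision using only data that every replica holds."""
    def rank(leaf):
        # Live revisions beat deleted ones, then higher generation,
        # then the lexicographically larger hash.
        return (not leaf.get("deleted", False),) + parse_rev(leaf["rev"])
    return max(leaves, key=rank)


conflict = [{"rev": "2-a1"}, {"rev": "2-b7"}]  # hypothetical revision IDs
winner = pick_winner(conflict)
```

Because the rule depends only on the revisions themselves, the order in which they arrive does not matter. The losing revision is kept as a conflict rather than discarded, so an application can still inspect and resolve it.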

  21. You bet. But that’s eventual consistency for you.
    Q: What if “Blue” and “Purple” are the same app with 2 consecutive writes?!
    Applications need to design around this:
    • Single application writer per document, or
    • Clearly defined hand-offs between different stages of processing, or
    • Stream-based model (documents never modified), or
    • Database-per-user model

  22. CouchDB 3.0 will be “the best CouchDB 2.x,” adding:
    • Per-document access restrictions
    • Automated shard splitting
    • Automatic view warming
    • Better automatic compaction
    • “Ready for Lucene Search” (without a recompile)
    • Optional highly tunable I/O queue (IOQ2)
    • …plus a long-term support (LTS) strategy
    • …and the same semantics as CouchDB 2.x.

  23. CouchDB 4.0 will have a new storage layer based on FoundationDB.
    • Fully consistent, distributed data store
    • 10 years in the making by a dedicated development team
    • Intended only as the underlying infrastructure for other databases
    • CouchDB implemented as a “Layer” on top of FoundationDB
    – CouchDB 4.0 is a completely stateless application layer for FoundationDB.

  24. • CouchDB FDB Layer implements CouchDB (1.x) semantics and indexes
    • FDB is a consistent MVCC key-value store using Paxos coordination and a transactional authority
    • FDB can be a single instance (on your Raspberry Pi or laptop) or a cluster of hundreds of Linux machines

  25. • Written in C++, using actor-based concurrency (very similar to Erlang)
    • Uses ACID-compliant transactions
    – This allows us to bring back CouchDB 1.x semantics! (And keep our ‘crash-proof’ design.)
    – User-visible transactions may come to a future CouchDB!
    • Imposes some restrictions:
    – 10 MB per transaction
    – 5 seconds per transaction
    – Keys and values have size limits (10 kB and 100 kB respectively)
    • CouchDB documents will be broken up into multiple FoundationDB keys and values
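
A minimal sketch of what "broken up into multiple keys and values" could look like, using the 100k value limit quoted above. The `(doc_id, index)` key layout and the helper names are hypothetical. No FoundationDB client is involved here, and the actual CouchDB layer may encode documents quite differently.

```python
import json

VALUE_LIMIT = 100_000  # bytes per value, the limit quoted on the slide


def split_document(doc_id, doc, chunk_size=VALUE_LIMIT):
    """Serialize a document and cut it into (key, value) pairs under the limit."""
    raw = json.dumps(doc, sort_keys=True).encode("utf-8")
    chunks = [raw[i:i + chunk_size] for i in range(0, len(raw), chunk_size)]
    # Hypothetical key layout: (document ID, chunk index).
    return [((doc_id, index), chunk) for index, chunk in enumerate(chunks)]


def join_document(pairs):
    """Reassemble a document from its chunks, ordered by key."""
    raw = b"".join(chunk for _, chunk in sorted(pairs))
    return json.loads(raw.decode("utf-8"))


doc = {"name": "bob", "blob": "x" * 250_000}
pairs = split_document("bob", doc)
restored = join_document(pairs)
```

Since the chunks of one document share a key prefix, they sort next to each other and could be read back with a single range read. Writing them all in one transaction would still have to fit within the 10 MB and 5 second limits.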

  26. • CouchDB 4.0 will have:
    – CouchDB 1.0 semantics
    – CouchDB 2.0 clustering
    – Plus more new features yet to be announced.

  27. Joan Touzet ❦ https://atypical.net/ ❦ wohali