Apache CouchDB

This presentation gives an overview of the Apache CouchDB project. It explains CouchDB architecture in relation to replication, usage, its UI and the platforms it is available for.

Links for further information and connecting

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/

https://nz.linkedin.com/pub/mike-frampton/20/630/385

https://open-source-systems.blogspot.com/

Mike Frampton

May 21, 2020

Transcript

  1. What Is Apache CouchDB?
    • A document-oriented NoSQL database
    • Open source, Apache 2.0 license
    • Written in Erlang, JavaScript, C, C++
    • Stores documents using JSON
    • Runs as a single node or a cluster
    • Takes an offline-first approach using bi-directional replication
    • Database access via HTTP requests
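
As a rough illustration of "database access via HTTP requests" with JSON documents, the sketch below builds (but does not send) a request that would store one document. The host, the database name `decks`, the document ID and the helper `put_document_request` are assumptions made for this example; 5984 is CouchDB's default port.

```python
import json
import urllib.request

# Assumed address of a local single-node install (5984 is the default port).
BASE_URL = "http://localhost:5984"


def put_document_request(db, doc_id, doc):
    """Build, without sending, a PUT request that stores `doc` as JSON."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical database and document, for illustration only.
req = put_document_request(
    "decks", "couchdb-overview", {"title": "Apache CouchDB", "slides": 10}
)
print(req.get_method(), req.full_url)

# To send it to a running server (admin credentials are normally required):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The request is only constructed here so the example runs without a server; uncommenting the last lines would send it.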
  2. How Does CouchDB Work?
    • It provides ACID support (Atomicity, Consistency, Isolation, Durability)
    • It has a crash-only design
      – No shutdown procedure, just termination
    • CouchDB uses Multi-Version Concurrency Control (MVCC)
    • After an OS crash or power failure
      – Partially flushed updates are simply forgotten on restart, or
      – If the failure hits during a header commit, a surviving copy of the previous identical headers remains
      – Either way, all previously committed data stays coherent
    • Crash-friendly design
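
The MVCC bullet above can be pictured with a toy in-memory model: every write must name the revision it is based on, and a write based on a stale revision is rejected (CouchDB reports this as an HTTP 409 conflict). This is a simplified sketch, not CouchDB's implementation; `ToyStore` and `ConflictError` are hypothetical names, and real revisions are strings rather than the plain integers used here.

```python
class ConflictError(Exception):
    """Raised when a write names a stale revision."""


class ToyStore:
    """In-memory stand-in for a database that checks revisions on write."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision number, body)

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        # The writer must have seen the latest revision, otherwise reject.
        if rev != current_rev:
            raise ConflictError(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)


store = ToyStore()
rev1 = store.put("slide", {"title": "draft"})            # first write, no rev
rev2 = store.put("slide", {"title": "final"}, rev=rev1)  # based on latest rev

try:
    store.put("slide", {"title": "late edit"}, rev=rev1)  # stale revision
    conflict = False
except ConflictError:
    conflict = True

print(rev1, rev2, conflict)
```

Readers never block writers in this scheme: a reader keeps the revision it fetched, and only a writer holding an outdated revision is turned away.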
  3. Cross Platform
    • Available for
      – Linux / Unix
      – FreeBSD
      – Windows
      – macOS
      – Cloud
      – Mobile (iOS / Android, Lite version)
    • Install from binary or source
    • Install via Docker / Snap
    • Install on Kubernetes
  4. CouchDB Replication
    • Synchronises two copies of the same database
    • One source and one target database
    • Source and target can be on the same or different CouchDB instances
    • Can be one-way or bi-directional (master-master)
    • Controlling which documents replicate
      – Local documents are never replicated
      – Filter functions select documents
      – Selector objects: query objects that test each document for replication
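
A replication job is described by a JSON body naming a source and a target, optionally with a selector object. The sketch below only builds such bodies, as they might be posted to the `_replicate` endpoint; the URLs, the `type` field in the selector and both helper functions are assumptions for illustration. Bi-directional replication is modelled as two one-way jobs in opposite directions.

```python
import json


def replication_body(source, target, selector=None, continuous=False):
    """Describe a one-way replication from `source` to `target`."""
    body = {"source": source, "target": target}
    if selector is not None:
        body["selector"] = selector  # only matching documents replicate
    if continuous:
        body["continuous"] = True    # keep replicating as changes arrive
    return body


def bidirectional(a, b, **options):
    """Two one-way jobs, one per direction."""
    return [replication_body(a, b, **options), replication_body(b, a, **options)]


# Hypothetical databases; the selector keeps documents whose "type" is "slide".
jobs = bidirectional(
    "http://localhost:5984/decks",
    "http://replica.example.org:5984/decks",
    selector={"type": "slide"},
)
for job in jobs:
    print(json.dumps(job))
```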
  5. CouchDB Cluster
    • CouchDB can be single node or clustered
    • A cluster is defined by
      – Number of shards, or parts, of a database (q)
      – Number of document copies / replicas (n)
    • Since V3 the default is q=2, n=3
      – Each database (and secondary index) is split into 2 shards
      – With 3 replicas per shard
      – For a total of 6 shard replica files
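
The arithmetic on this slide can be checked in a few lines. The sketch also shows how `q` and `n` might be supplied as query parameters when a database is created; the base URL, the database name and both helper functions are assumptions made for this example.

```python
from urllib.parse import urlencode


def shard_replica_files(q, n):
    """Each of the q shards is stored n times."""
    return q * n


def create_db_url(base_url, db, q=2, n=3):
    """URL for creating a database with explicit shard and replica counts."""
    return f"{base_url}/{db}?{urlencode({'q': q, 'n': n})}"


print(shard_replica_files(2, 3))
print(create_db_url("http://localhost:5984", "decks"))
```

With the defaults this prints 6 and a URL ending in `decks?q=2&n=3`.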
  6. CouchDB Cluster
    • Replicas add failure resistance
    • Some nodes can be offline without everything crashing down
      – n=1: all nodes must be up
      – n=2: any 1 node can be down
      – n=3: any 2 nodes can be down
    • Using default values and a single database
      – q x n = 2 x 3 = 6 shard replicas
      – So at most six nodes can each hold a part of the database
      – This defines the maximum number of nodes for horizontal scaling
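
The failure-tolerance rule listed on this slide follows a simple pattern: with n replicas, n - 1 nodes can be down, and q x n bounds the number of nodes a single database can spread across. A small sketch of that rule, with hypothetical function names:

```python
def tolerated_node_failures(n):
    """Nodes that can be offline while one replica of each shard survives."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n - 1


def max_nodes(q, n):
    """Upper bound on nodes for one database: one shard replica per node."""
    return q * n


for n in (1, 2, 3):
    print(f"n={n}: up to {tolerated_node_failures(n)} node(s) can be down")

print("max nodes with defaults:", max_nodes(q=2, n=3))
```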
  7. CouchDB UI
    • The Fauxton CouchDB UI simplifies access
    • Manage a cluster or a single node
    • Manage CouchDB
      – Databases
      – Active tasks
      – Configuration
      – Replication
      – Users
    • Access documentation
    • Verify the CouchDB install
  8. CouchDB + CAP Theorem
    • The CAP theorem examines
      – Consistency: all database clients see the same data, even with concurrent updates
      – Availability: all database clients are able to access some version of the data
      – Partition tolerance: the database can be split over multiple servers
    • CouchDB provides eventual consistency by balancing partition tolerance and availability
  9. Available Books
    • See “Big Data Made Easy”, Apress, Jan 2015
    • See “Mastering Apache Spark”, Packt, Oct 2015
    • See “Complete Guide to Open Source Big Data Stack”, Apress, Jan 2018
    • Find the author on Amazon
      – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
    • Connect on LinkedIn
      – www.linkedin.com/in/mike-frampton-38563020
  10. Connect
    • Feel free to connect on LinkedIn
      – www.linkedin.com/in/mike-frampton-38563020
    • See my open source blog at
      – open-source-systems.blogspot.com/
    • I am always interested in
      – New technology
      – Opportunities
      – Technology based issues
      – Big data integration