Production CouchDB: Hard-earned knowledge

CouchDB has matured a lot since it landed in 2005. The practices around it, however, have a long way to go. Over the last two years at ShareableInk, I've learned a lot about the good and bad aspects of CouchDB. This talk shares a few of those lessons so that you can use CouchDB with more confidence.

Jason Orendorff

August 11, 2012


Transcript

  1. Who are you, anyway?

    Keith Marcum, @keithmarcum, marcum.keith@gmail, [email protected], freenode/kmarcum. With ShareableInk since 2010. ShareableInk has a relatively large multi-tenant CouchDB (122 databases, an assortment of gigs). Saturday, August 11, 12
  2. Preface

    A lot of these lessons involved bumping up against bad corners of CouchDB and learning workarounds. I’ll also talk about features that started out good and became great as we learned.
  3. The Unfulfilled Promise of Map/Reduce

    The idea is to split the data across N workers, process in parallel, then aggregate. Out of the box, CouchDB only has 1 worker. WTF? You may be I/O bound anyway.
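The split, process, aggregate idea can be sketched in a few lines of Python. This is only an illustration of the general pattern, not how CouchDB's view server works internally; the sample documents and the `type` field are assumptions, and threads are used just to keep the sketch self-contained.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(docs, n_workers):
    """Deal the documents out into n_workers roughly equal chunks."""
    return [docs[i::n_workers] for i in range(n_workers)]


def map_chunk(chunk):
    """Map step for one worker: count documents by their 'type' field."""
    return Counter(doc.get("type", "unknown") for doc in chunk)


def map_reduce(docs, n_workers=4):
    """Split, map each chunk in its own worker, then aggregate the partial counts."""
    chunks = split(docs, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # reduce step: merge the partial results
    return dict(total)


# Hypothetical sample documents, for illustration only.
docs = [{"type": "user"}, {"type": "form"}, {"type": "user"}, {}]
counts = map_reduce(docs, n_workers=2)
```

With a single worker the same code still runs, it just degenerates to one chunk, which is roughly the out-of-the-box situation the slide complains about.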
  4. How did this happen?

    There are actually a couple of things wrong here. Virtualized disks are slow :( CouchDB 1.2 adopted yajl, and picked up multi-threading along the way. Not webscale.
  5. Impossibly easy replication

    Think about your favorite DB replication tool (yes, you have one). How easy is it? CouchDB’s is easier.
  6. Impossibly easy replication

    Continuous replication in 3 clicks. Suddenly your staging environment is always current.
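The same thing can be requested over HTTP by posting to CouchDB's `_replicate` endpoint with `"continuous": true`. The sketch below only builds the request and never sends it; the host names and the `forms` database are made up for illustration.

```python
import json
import urllib.request


def continuous_replication_request(server, source, target):
    """Build (but do not send) a POST to /_replicate for continuous replication."""
    body = {"source": source, "target": target, "continuous": True}
    return urllib.request.Request(
        url=server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical hosts and database name, for illustration only.
req = continuous_replication_request(
    "http://staging.example.com:5984",
    source="http://prod.example.com:5984/forms",
    target="forms",
)
# urllib.request.urlopen(req) would send it to a live server.
```

Here the staging server pulls from production, which matches the "staging is always current" scenario on the slide.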
  7. Horizontal scaling with replication

    It’s easy to imagine nginx in front of N co-replicating couches. Replication isn’t instant. You have entered the realm of eventual consistency.
  8. Compaction is a big deal

    CouchDB is all about getting your data to disk as quickly as possible. Append-only means no locking, but at the cost of disk space. No, really, a lot of disk space.
  9. Compaction is a big deal

    Databases and views can bloat somewhere in the neighborhood of 20x their post-compaction size. If you don’t compact regularly, you’ll eventually have to spend huge amounts of I/O and CPU to fix it. CouchDB 1.2 has compaction daemons built in.
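A rough way to decide when a database is worth compacting is to compare the file size with the live data size, both of which CouchDB 1.2 reports in the database info document (`disk_size` and `data_size`). The sketch below computes that ratio and builds, without sending, the `POST /{db}/_compact` request; the sizes, the `forms` database and the 70% threshold are assumptions for illustration, not figures from the talk.

```python
import urllib.request


def fragmentation_percent(db_info):
    """Share of the database file that is not live data, as a percentage.

    Expects the 'disk_size' and 'data_size' fields from GET /{db}.
    """
    disk_size = db_info["disk_size"]
    if disk_size == 0:
        return 0.0
    return 100.0 * (disk_size - db_info["data_size"]) / disk_size


def compaction_request(server, db):
    """Build (but do not send) the POST that starts database compaction."""
    return urllib.request.Request(
        url=f"{server.rstrip('/')}/{db}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical numbers: a 20 GB file holding 1 GB of live data (the 20x case).
info = {"disk_size": 20 * 1024**3, "data_size": 1 * 1024**3}
THRESHOLD = 70.0  # assumed threshold, not a recommendation from the talk
needs_compaction = fragmentation_percent(info) > THRESHOLD
req = compaction_request("http://localhost:5984", "forms")
```

At 20x bloat the file is 95% dead weight, so almost any threshold would trigger.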
  10. Ch-Ch-Ch _changes

    A stream of deltas as they happen. Presents some interesting possibilities for AOP-style document monitoring. It’s like a Twitter feed for your database, but with less drama.
  11. Ch-Ch-Ch _changes

    We actually have a process monitoring all of our _changes feeds that we creatively named CouchMonitor. CouchMonitor dispatches notifications to other parts of our ecosystem. This is the magic behind continuous replication.
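The talk does not show how CouchMonitor is built, so the following is only a hypothetical sketch of the pattern: read a continuous `_changes` feed (one JSON object per line, blank lines as heartbeats) and hand each change to a list of handlers. The sample feed lines and document ids are invented; a real monitor would read the lines from an open HTTP response instead of a list.

```python
import json


def parse_changes(lines):
    """Yield one dict per change from a continuous _changes feed."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank line: heartbeat
        change = json.loads(line)
        if "last_seq" in change:
            continue  # closing line of a feed that ended, not a change
        yield change


def dispatch(changes, handlers):
    """Call every handler with each change; return the last seq seen."""
    last_seq = None
    for change in changes:
        for handler in handlers:
            handler(change)
        last_seq = change["seq"]
    return last_seq


# Hypothetical feed content, shaped like CouchDB 1.x output.
sample_feed = [
    '{"seq":1,"id":"form-1","changes":[{"rev":"1-abc"}]}',
    "",
    '{"seq":2,"id":"form-2","changes":[{"rev":"1-def"}],"deleted":true}',
    '{"last_seq":2}',
]
seen = []
last = dispatch(parse_changes(sample_feed), [lambda c: seen.append(c["id"])])
```

Keeping track of the last `seq` lets a monitor resume with `since=<seq>` after a restart instead of replaying the whole feed.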
  12. No non-indexed queries

    There comes a point in an RDBMS when a table is too tall to query without knowing what indexes will be used. CouchDB reaches that point almost immediately. Views take longer to build the more documents you have, like indexes do. A simple view on a 300k-document db took 5 minutes to build (ask if you want more details on the benchmark).
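In practice this means every lookup goes through a view URL rather than an ad hoc query. A minimal sketch of building such a URL follows; the `forms` database, the `users` design document and the `by_email` view are hypothetical names.

```python
import json
from urllib.parse import quote, urlencode


def view_url(server, db, design, view, key=None):
    """Build the URL for querying a view, optionally restricted to one key."""
    parts = [
        server.rstrip("/"),
        quote(db, safe=""),
        "_design",
        quote(design, safe=""),
        "_view",
        quote(view, safe=""),
    ]
    url = "/".join(parts)
    if key is not None:
        # View keys are JSON values, so a string key keeps its quotes.
        url += "?" + urlencode({"key": json.dumps(key)})
    return url


# Hypothetical names, for illustration only.
url = view_url("http://localhost:5984", "forms", "users", "by_email",
               key="kim@example.com")
```

The first request against a new or stale view is the one that pays the build cost described on the slide.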
  13. Default views by type

    A gem from CouchRestModel (maybe the only one): users/all. Simple view that gets you back into application code.
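A "users/all" view can be approximated with a design document whose map function emits every document of one type. The exact map function CouchRestModel generates is not shown in the talk, so the field name `type` and the value `User` below are assumptions.

```python
import json


def all_by_type_design_doc(design_name, type_name, type_field="type"):
    """Build a design document with a single 'all' view for one document type."""
    map_fn = (
        "function(doc) {\n"
        f"  if (doc[{json.dumps(type_field)}] === {json.dumps(type_name)}) {{\n"
        "    emit(doc._id, null);\n"
        "  }\n"
        "}"
    )
    return {
        "_id": f"_design/{design_name}",
        "language": "javascript",
        "views": {"all": {"map": map_fn}},
    }


# Assumed type field and value, for illustration only.
ddoc = all_by_type_design_doc("users", "User")
```

Saved to a database, this would be queried at `/{db}/_design/users/_view/all`, and adding `include_docs=true` returns the full documents so the rest of the filtering can happen in application code.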