Production CouchDB: Hard-earned knowledge

Slide 1

Slide 1 text

Production CouchDB: Hard-earned knowledge
Saturday, August 11, 12

Slide 2

Slide 2 text

Who are you, anyway?
Keith Marcum, @keithmarcum, marcum.keith@gmail, freenode/kmarcum
With ShareableInk since 2010
ShareableInk has a relatively large multi-tenant CouchDB (122 databases, an assortment of gigs)

Slide 3

Slide 3 text

Preface
A lot of these lessons involved bumping up against bad corners of CouchDB and learning workarounds
I'll also talk about features that started out good and became great as we learned

Slide 4

Slide 4 text

The Unfulfilled Promise of Map/Reduce
The idea is to split the data across N workers, process in parallel, then aggregate
Out of the box, CouchDB only has 1 worker
WTF?
You may be I/O bound anyway
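
The split / process / aggregate idea on this slide can be sketched in a few lines of Python. This is only an illustration of the shape of map/reduce, not how CouchDB's view server works; the `type` field, the function names, and the worker count are all assumptions made for the example. Threads are used to keep the sketch short, so it shows the structure rather than real CPU parallelism.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(docs):
    """Map step: count documents per 'type' within one chunk."""
    return Counter(doc.get("type", "unknown") for doc in docs)


def map_reduce(docs, workers=4):
    """Split docs across `workers` chunks, map each chunk, then merge."""
    # Split: deal the documents out round-robin, one chunk per worker.
    chunks = [docs[i::workers] for i in range(workers)]
    # Process: run the map step on every chunk concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Aggregate: the reduce step merges the partial counts.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

With `workers=1` the same code still gives the right answer, just without any concurrency, which is roughly the out-of-the-box situation the slide complains about.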

Slide 5

Slide 5 text

How did this happen?
There are actually a couple of things wrong here
Virtualized disks are slow :(
CouchDB 1.2 adopted yajl, and picked up multi-threading along the way
Not webscale

Slide 6

Slide 6 text

Impossibly easy replication
Think about your favorite DB replication tool (yes, you have one)
How easy is it?
CouchDB's is easier

Slide 7

Slide 7 text

Impossibly easy replication
Continuous replication in 3 clicks
Suddenly your staging environment is always current
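
The same thing can be done over HTTP by POSTing a small JSON document to `/_replicate`. The sketch below only builds the request; the host names and database names in the usage comment are hypothetical, and sending it needs a running CouchDB.

```python
import json
from urllib import request


def replication_request(couch_url, source, target):
    """Build (but do not send) a POST to /_replicate for continuous replication."""
    body = {"source": source, "target": target, "continuous": True}
    return request.Request(
        url=couch_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage: keep a staging copy current from production.
# req = replication_request(
#     "http://localhost:5984",
#     "production_db",
#     "http://staging.example.com:5984/production_db",
# )
# request.urlopen(req)  # actually sends it; requires a reachable CouchDB
```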

Slide 8

Slide 8 text

Horizontal scaling with replication
It's easy to imagine nginx in front of N co-replicating couches
Replication isn't instant
You have entered the realm of eventual consistency

Slide 9

Slide 9 text

Compaction is a big deal
CouchDB is all about getting your data to disk as quickly as possible
Append-only means no locking, but at the cost of disk space
No, really, a lot of disk space

Slide 10

Slide 10 text

Compaction is a big deal
Databases and views can bloat somewhere in the neighborhood of 20x their post-compaction size
If you don't compact regularly, you'll eventually have to spend huge amounts of I/O and CPU to fix it
CouchDB 1.2 has compaction daemons built in
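
To put a number on the 20x figure: fragmentation is usually expressed as the share of the file that compaction could reclaim. The sketch below assumes the database info document (`GET /{db}`) reports `disk_size` and `data_size`, as CouchDB 1.2 does; the 70% threshold is an illustrative assumption, not a recommendation from the talk.

```python
def fragmentation_percent(disk_size, data_size):
    """Reclaimable share of the file: (disk_size - data_size) / disk_size."""
    if disk_size <= 0:
        return 0.0
    return 100.0 * (disk_size - data_size) / disk_size


def needs_compaction(disk_size, data_size, threshold=70.0):
    """True when fragmentation meets or exceeds the (assumed) threshold."""
    return fragmentation_percent(disk_size, data_size) >= threshold


# A file bloated to 20x its compacted size is 95% reclaimable space.
bloated = fragmentation_percent(disk_size=20, data_size=1)
```

A manual compaction can be triggered with `POST /{db}/_compact`; the point of the daemon in 1.2 is that a check like the one above runs for you.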

Slide 11

Slide 11 text

Ch-Ch-Ch _changes
A stream of deltas as they happen
Presents some interesting possibilities for AOP-style document monitoring
It's like a twitter feed for your database, but with less drama

Slide 12

Slide 12 text

Ch-Ch-Ch _changes
We actually have a process monitoring all of our _changes feeds that we creatively named CouchMonitor
CouchMonitor dispatches notifications to other parts of our ecosystem
This is the magic behind continuous replication
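
A monitor of this kind can be sketched as a loop over the lines of a `_changes?feed=continuous` response, where each non-empty line is one JSON object. This is not CouchMonitor's actual implementation, which the talk does not show; `dispatch_changes` and the handler list are hypothetical names for the example.

```python
import json


def dispatch_changes(lines, handlers):
    """Parse continuous-feed lines and call every handler for each change.

    Returns the last sequence seen, so a caller can resume with ?since=.
    """
    last_seq = None
    for raw in lines:
        line = raw.strip()
        if not line:
            # Blank lines are heartbeats that keep the connection open.
            continue
        change = json.loads(line)
        if "last_seq" in change:
            # Final line sent when the feed ends (for example on timeout).
            last_seq = change["last_seq"]
            break
        for handler in handlers:
            handler(change)
        last_seq = change["seq"]
    return last_seq
```

In practice `lines` would be the streamed HTTP response body, and each handler would notify one downstream system.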

Slide 13

Slide 13 text

No non-indexed queries
There comes a point in an RDBMS when a table is too tall to query without knowing what indexes will be used
CouchDB reaches that point almost immediately
Views take longer to build the more documents you have, like indexes do
Simple view on a 300k document db took 5 minutes to build (ask if you want more details on the benchmark)

Slide 14

Slide 14 text

Default views by type
A gem from CouchRestModel (maybe the only one)
users/all
Simple view that gets you back into application code
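
A by-type view like `users/all` boils down to a design document whose map function emits every document of one type. The sketch below builds such a design document as a plain dictionary; the `type` field name and the `User` value are assumptions for illustration, and the exact map function CouchRestModel generates may differ.

```python
def by_type_design_doc(design_name, model_name, type_key="type"):
    """Build a design document with an 'all' view for one document type."""
    # The map function is JavaScript, stored as a string in the design doc.
    map_fn = (
        "function(doc) {"
        " if (doc['%s'] == '%s') { emit(doc._id, null); }"
        " }" % (type_key, model_name)
    )
    return {
        "_id": "_design/" + design_name,
        "language": "javascript",
        "views": {"all": {"map": map_fn}},
    }


ddoc = by_type_design_doc("users", "User")
```

Once saved, the view is read with `GET /{db}/_design/users/_view/all?include_docs=true`, and everything after that happens in application code.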

Slide 15

Slide 15 text

To Recap
CouchRestModel
CouchDB