Slide 1

Two Years of MongoDB at Sailthru: Scaling and Design
Ian White (@eonwhite)
MongoNYC 2012, 5/24/12

Slide 2

Sailthru
• Behavioral communication and analytics platform
• Powering relevancy: one-to-one personalization across email, web, and mobile
• Original idea: API-based transactional email
• 3 engineers two years ago, now ~65 employees
• Some clients: Fab, Huffington Post, OpenSky, Patch, Thrillist, Refinery 29, Totsy, Business Insider, Savored, NY Observer, College Humor, Oscar de la Renta, Tippr, NY Post, American Media

Slide 3

Sailthru and MongoDB
• MongoDB has been Sailthru’s production database since mid-2010 (the first prototype was MySQL)
• (I’ve been using MongoDB in production since 2008)

Slide 4

User Profile Data
(Diagram: a central user profile fed by email, onsite, mobile, purchase, geo, and customer-specific data)

Slide 5

Content Tailored To User Interests
(Example Zephyr condition: {if horizon_interest('facebook') >= 2})

Slide 6

User Profile + Content + Template = Message
(Diagram: the user profile, content, and template combine into the message delivered to the end user)
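To make the profile + content + template idea concrete, here is a minimal JavaScript sketch. The profile, content, and template shapes and the buildMessage name are hypothetical, and the interest threshold of 2 simply echoes the horizon_interest('facebook') >= 2 condition from the previous slide; this is not how Zephyr itself is implemented.

```javascript
// Hypothetical sketch: profile + content + template = message.
// The data shapes and buildMessage() are invented for illustration; this is not Zephyr.
function buildMessage(profile, content, template) {
  // Keep only content the user cares about: interest score >= 2, echoing the
  // {if horizon_interest('facebook') >= 2} condition.
  const items = content.filter((item) => (profile.interests[item.tag] || 0) >= 2);
  const itemList = items.map((item) => `- ${item.title}`).join("\n");
  // Replacer functions avoid special "$" patterns in user-supplied values.
  // Each placeholder is replaced once.
  return template
    .replace("{name}", () => profile.name)
    .replace("{items}", () => itemList);
}

const message = buildMessage(
  { name: "Ana", interests: { facebook: 3, golf: 1 } },
  [
    { tag: "facebook", title: "Facebook news roundup" },
    { tag: "golf", title: "Golf tips" },
  ],
  "Hi {name},\n{items}"
);
console.log(message);
```

The same profile with a different template or content pool yields a different message, which is the point of keeping the three inputs separate.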

Slide 7

JSON-Based DSL For Personalization (Zephyr)
    {* Page Format Logic *}
    {page_format = ""}
    {if skin_ads.left || skin_ads.top || skin_ads.right}
      {if skin_ads.left.vars.right_rail || skin_ads.top.vars.right_rail || skin_ads.right.vars.right_rail}
        {page_format = "skinned_piece_with_right_rail"}
      {else}
        {page_format = "skinned_piece"}
        {if skin_ads.left.vars.alignment == "left"}
          {block.header_extension = block.header_extension - (header_content_diff + 20)}
        {else}
          {block.header_extension = block.header_extension - (half_header_content_diff + 10)}
        {/if}
        {if 10 > block.header_extension}
          {block.header_extension = 0}
        {/if}
      {/if}
    {else}
      {if right_rail_ad}
        {page_format = "piece_with_right_rail"}
      {else}
        {page_format = "centered_piece"}
      {/if}
    {/if}
    {followingSlugs = filter(profile.vars.sellerSlugs, lambda sellerSlug: !contains(['bluedot', 'clearance'], sellerSlug) && sellers[sellerSlug])}
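The page-format logic above may be easier to follow as plain JavaScript. This is a hand translation for readability, not Zephyr's implementation; the function names and parameter names are hypothetical, and it assumes Zephyr treats missing fields (such as an absent skin_ads.left) as falsy.

```javascript
// Hand translation of the Zephyr "Page Format Logic" above (hypothetical names).
// Assumes missing fields are falsy, which is how the Zephyr code appears to use them.
function computePageFormat({
  skinAds = {},
  rightRailAd = false,
  headerExtension = 0,
  headerContentDiff = 0,
  halfHeaderContentDiff = 0,
} = {}) {
  const { left, top, right } = skinAds;
  const vars = (ad) => (ad && ad.vars) || {}; // tolerate ads without vars
  let pageFormat = "";
  if (left || top || right) {
    if (vars(left).right_rail || vars(top).right_rail || vars(right).right_rail) {
      pageFormat = "skinned_piece_with_right_rail";
    } else {
      pageFormat = "skinned_piece";
      if (vars(left).alignment === "left") {
        headerExtension -= headerContentDiff + 20;
      } else {
        headerExtension -= halfHeaderContentDiff + 10;
      }
      if (10 > headerExtension) headerExtension = 0; // clamp small extensions to 0
    }
  } else {
    pageFormat = rightRailAd ? "piece_with_right_rail" : "centered_piece";
  }
  return { pageFormat, headerExtension };
}

// The final filter() line: keep followed sellers that are not excluded and still exist.
function followingSlugs(sellerSlugs, sellers) {
  return sellerSlugs.filter(
    (slug) => !["bluedot", "clearance"].includes(slug) && Boolean(sellers[slug])
  );
}
```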

Slide 8

Sailthru Scale
• 200 million user profiles
• 40 million emails sent per day
• 1000 requests per second
• 8 replica sets, 40 nodes
• Billions of documents

Slide 9

Sailthru Architecture
• Critical services: API, link rewriting, onsite tracking/recommendations, email delivery, reporting/user interface
• Uptime is critical: any downtime impacts our customers’ revenue
• Infrastructure split between Amazon EC2 and a colo (Peer1)
• Java, LAMP, Puppet, Scribe, ActiveMQ

Slide 10

Sailthru MongoDB Architecture
• Different replica sets for different purposes (e.g. messages vs. user profiles)
• The largest logical collections are partitioned at the application level
• Made sense for us, as our data is naturally partitioned by customer
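Application-level partitioning can be as simple as a routing function that every data access goes through. A minimal sketch, assuming hypothetical replica set names, a modulo-by-client_id rule, and an override table for pinning a large customer; the slide does not say how Sailthru actually assigns customers to partitions.

```javascript
// Hypothetical application-level partitioning: all data for one customer lives on one
// replica set, so every query for that customer is routed by its client_id.
const PROFILE_REPLICA_SETS = ["profiles1", "profiles2", "profiles3"]; // illustrative names

function replicaSetForClient(clientId, { sets = PROFILE_REPLICA_SETS, overrides = {} } = {}) {
  if (!Number.isInteger(clientId) || clientId < 0) {
    throw new RangeError(`invalid clientId: ${clientId}`);
  }
  // An explicit override lets a very large customer be pinned to its own set.
  if (Object.prototype.hasOwnProperty.call(overrides, clientId)) {
    return overrides[clientId];
  }
  return sets[clientId % sets.length];
}
```

A pure modulo rule reshuffles customers whenever a set is added, which is one reason to lean on an explicit lookup table as the system grows.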

Slide 11

How We Got To Mongo from MySQL
• JSON is the lingua franca
• Migrated one table at a time (very, very carefully) and ran both for a while
• Glad that’s long over with
• Simplified stack

Slide 12

Advantages of MongoDB at Sailthru
• Rapid development
• Makes it easy to store flexible JSON-based customer input (many now use Mongo themselves)
• Good performance
• Encourages a scalable approach
• We know it well

Slide 13

Basic mongod
    mongod --dbpath /path/to/db --logpath /path/to/log/mongodb.log --logappend --fork --rest --replSet main1
• Don’t ever run without replication
• Don’t ever kill -9
• Don’t run without writing to a log
• Run behind a firewall
• Use journaling
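Starting each mongod with --replSet main1 is only half the job: the set still has to be initiated once from the mongo shell with rs.initiate(config). A small helper for building that config document, as a sketch; the host names are hypothetical, and the three-member minimum is a common rule of thumb (so the set can still elect a primary after losing one node), not something the slide specifies.

```javascript
// Build the replica set configuration document passed to rs.initiate().
// The set name must match the --replSet value given to every mongod ("main1" above).
function buildReplSetConfig(setName, hosts, arbiterHost = null) {
  const members = hosts.map((host, i) => ({ _id: i, host }));
  if (arbiterHost) {
    // An arbiter votes in elections but holds no data.
    members.push({ _id: members.length, host: arbiterHost, arbiterOnly: true });
  }
  if (members.length < 3) {
    throw new Error("use at least three voting members so a majority survives one failure");
  }
  return { _id: setName, members };
}
```

In the mongo shell, after pasting the function: rs.initiate(buildReplSetConfig("main1", ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"])).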

Slide 14

SCALING AND OPERATIONS

Slide 15

Do The Simplest Thing That Could Possibly Work
• Simplicity = flexibility = speed = scalability
• Complexity is the enemy
• The simpler it is, the more likely you can scale it

Slide 16

Focus On The Big Wins
• Three collection types represent almost all of our data storage
• So a small win there counts for much more than a big win elsewhere

Slide 17

Monitoring Is Everything
• Users will surprise you
• Systems will surprise you
• Production systems are complex
• Graph everything you can, so you can see when something you did changed the pattern
• Set up alerts for when something’s wrong

Slide 18

Some Things To Monitor
• Lock %, read/write queue size, load average
• Faults/sec: if this starts to creep up, your working set may be about to outgrow RAM
• Number of connections: changes could indicate a driver or network connectivity problem
• Replication lag: usually caused by load on the primary
• dataSize and indexSize
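Two of these numbers are derived rather than read directly. A sketch, assuming input shaped like rs.status() (a members array with name, stateStr, and optimeDate) and two samples of the cumulative page-fault counter that db.serverStatus() reports under extra_info.page_faults on Linux; alert thresholds are left to you.

```javascript
// Replication lag per secondary, in seconds, from a document shaped like rs.status():
// { members: [{ name, stateStr, optimeDate }, ...] }.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no PRIMARY in replica set status");
  const lags = {};
  for (const m of status.members) {
    if (m.stateStr !== "SECONDARY") continue; // skip the primary, arbiters, recovering nodes
    lags[m.name] = (primary.optimeDate - m.optimeDate) / 1000; // Date difference is in ms
  }
  return lags;
}

// Page faults per second from two samples of a cumulative counter
// (e.g. extra_info.page_faults in db.serverStatus()) taken intervalSeconds apart.
function faultsPerSecond(previousCount, currentCount, intervalSeconds) {
  if (intervalSeconds <= 0) throw new RangeError("intervalSeconds must be positive");
  return (currentCount - previousCount) / intervalSeconds;
}
```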

Slide 19

Understand What Is Happening
• Graph everything you can
• MMS is a great tool for diagnosing issues
• But also Graphite / StatsD / Nagios
• And don’t forget the log
• explain() can shed light on pathological queries

Slide 20

Control What Is Happening
• All our MongoDB access goes through one thin wrapper class that we wrote
• If you use someone else’s library or ORM, make sure that, if you had to, you could:
  • set timeouts or enable failfast retry
  • add a $hint to every instance of a query
  • temporarily queue writes elsewhere
  • ensure all writes to a collection are “safe”
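A minimal sketch of such a wrapper. The backend object and its find/insert signatures are hypothetical stand-ins for a real driver, so the sketch only shows where the policies live: per-query-shape hints, per-collection safe writes, and a switch that diverts writes to a queue. Timeouts would hook into the same two methods.

```javascript
// Hypothetical thin wrapper: every MongoDB call goes through here, so policies can be
// changed in one place. `backend` is an assumed interface, not a real driver API:
//   backend.find(collection, query, options) -> array of documents
//   backend.insert(collection, doc, options) -> throws on failure
function queryShape(collection, query) {
  // Same collection + same set of field names = "the same query" for hinting purposes.
  return `${collection}:${Object.keys(query).sort().join(",")}`;
}

class MongoWrapper {
  constructor(backend, { hints = {}, safeCollections = new Set() } = {}) {
    this.backend = backend;
    this.hints = hints; // query shape -> index spec to force with a hint
    this.safeCollections = safeCollections; // writes here must be acknowledged
    this.writeQueue = null; // set to an array to divert writes temporarily
  }

  find(collection, query) {
    const hint = this.hints[queryShape(collection, query)];
    return this.backend.find(collection, query, hint ? { hint } : {});
  }

  insert(collection, doc) {
    if (this.writeQueue) {
      this.writeQueue.push({ collection, doc }); // replay later, e.g. after maintenance
      return;
    }
    this.backend.insert(collection, doc, { safe: this.safeCollections.has(collection) });
  }
}
```

Because call sites only ever see MongoWrapper, forcing a hint or flipping a collection to safe writes is a configuration change rather than a hunt through the codebase.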

Slide 21

Resiliency
• If a write fails, can it be queued somewhere and tried later? (What if the queue fails?)
• What if a queued write keeps failing indefinitely?
• If a read fails, can you time out quickly and try again on a different node? (failfast retry)
• In some cases we might not care, in some
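The questions above map onto two small mechanisms. A sketch, assuming hypothetical callbacks (readFrom, applyWrite) that throw on failure; enforcing the actual timeout is left to those callbacks or the driver. The dead-letter list is one possible answer to "what if a queued write is failing indefinitely?", not necessarily what Sailthru does.

```javascript
// Failfast read: try each node in turn and move on as soon as one fails.
// `readFrom(node)` is a hypothetical callback that throws on error or timeout.
function failfastRead(nodes, readFrom) {
  const errors = [];
  for (const node of nodes) {
    try {
      return readFrom(node);
    } catch (err) {
      errors.push(`${node}: ${err.message}`);
    }
  }
  throw new Error(`all nodes failed: ${errors.join("; ")}`);
}

// Queue for writes that failed. Each drain() retries pending writes once; a write
// that fails maxAttempts times moves to deadLetter instead of retrying forever.
class RetryQueue {
  constructor(maxAttempts = 5) {
    this.maxAttempts = maxAttempts;
    this.pending = [];
    this.deadLetter = [];
  }

  enqueue(write) {
    this.pending.push({ write, attempts: 0 });
  }

  // `applyWrite(write)` is a hypothetical callback that throws on failure.
  drain(applyWrite) {
    const stillPending = [];
    for (const item of this.pending) {
      try {
        applyWrite(item.write);
      } catch (err) {
        item.attempts += 1;
        if (item.attempts >= this.maxAttempts) this.deadLetter.push(item);
        else stillPending.push(item);
      }
    }
    this.pending = stillPending;
  }
}
```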

Slide 22

Amazon EC2 Gotchas
• EBS volumes can go into a degraded state unpredictably (note sdf below: %util 100.10, await 19564.67 ms):
    Device:  rrqm/s wrqm/s   r/s   w/s rsec/s wsec/s avgrq-sz avgqu-sz    await  svctm  %util
    sda1       0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00     0.00   0.00   0.00
    sdb        0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00     0.00   0.00   0.00
    md0        0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00     0.00   0.00   0.00
    sdf        0.00   0.00  0.00  1.50   0.00  16.00    10.67   135.13 19564.67 667.33 100.10
    sdg        0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00     0.00   0.00   0.00
    sdh        0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00     0.00   0.00   0.00
    sdi        0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00     0.00   0.00   0.00
    sdj        0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00     0.00   0.00   0.00
    sdk        0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00     0.00   0.00   0.00
    sdl        0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00     0.00   0.00   0.00
    sdm        0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00     0.00   0.00   0.00
    dm-0       0.00   0.00  0.00  0.00   0.00   0.00     0.00   316.31     0.00   0.00 100.10
• Replica sets are designed to promote a new primary when a node goes down, not when it is merely degraded
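A degraded volume like sdf above stays "up", so replica set failover will not react to it; you have to watch the disks yourself. A sketch of a check over iostat -x output, assuming the column layout shown above (newer sysstat versions split await into r_await and w_await) and illustrative thresholds; degradedDevices is a hypothetical name.

```javascript
// Flag devices that look degraded in `iostat -x` output: nearly saturated (%util) and
// with very high average wait (await, in milliseconds). Thresholds are illustrative.
function degradedDevices(iostatText, { minUtil = 95, minAwaitMs = 1000 } = {}) {
  const rows = iostatText
    .trim()
    .split("\n")
    .map((line) => line.trim().split(/\s+/));
  const header = rows[0];
  const awaitIdx = header.indexOf("await");
  const utilIdx = header.indexOf("%util");
  if (awaitIdx < 0 || utilIdx < 0) {
    throw new Error("expected a header row with await and %util columns");
  }
  return rows
    .slice(1)
    .filter((cols) => Number(cols[utilIdx]) >= minUtil && Number(cols[awaitIdx]) >= minAwaitMs)
    .map((cols) => cols[0]);
}
```

On the sample above this flags only sdf: dm-0 is also at 100% utilization but reports an await of 0.00, so requiring both conditions avoids one noisy signal.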

Slide 23

Amazon EC2 Gotchas
• You must distribute across multiple Availability Zones for redundancy
• But you will sometimes see connectivity problems between AZs
• We have seen this cause replica sets to step down primaries seemingly at random
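A primary stays primary only while it can see a majority of the set, which is why flaky links between AZs can trigger step-downs. When choosing a layout, a quick sanity check is whether a majority survives the loss of any single AZ. A sketch, assuming one vote per member and hypothetical AZ names; survivesAnyAzLoss is an illustrative helper.

```javascript
// Check a candidate layout: after losing any single Availability Zone, do the remaining
// members still form a strict majority (needed to elect or keep a primary)?
// Assumes one vote per member; AZ names are hypothetical.
function survivesAnyAzLoss(memberAzs) {
  const total = memberAzs.length;
  const perAz = new Map();
  for (const az of memberAzs) perAz.set(az, (perAz.get(az) || 0) + 1);
  for (const [az, count] of perAz) {
    if (total - count <= total / 2) return { ok: false, weakestAz: az };
  }
  return { ok: true };
}
```

With only two AZs, one of them always holds at least half the members, so this check always fails; that is one argument for a third AZ (or an arbiter placed elsewhere).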

Slide 24

DESIGN (and a couple tricks)

Slide 25

Develop Your Mental Model of MongoDB
• You don’t need to know all the internals
• But try to gain a working understanding of how MongoDB operates, especially RAM, indexes, and replication

Slide 26

Big-Picture Design Questions
• What is the data I want to store?
• How will I want to use that data later?
• How big will the data get?
• If the answers are “I don’t know yet”, make your best guess and apply YAGNI (“You Aren’t Gonna Need It”)

Slide 27

Specific MongoDB Design Questions
• Embed vs. top-level collection?
• Denormalize (double-store data)?
• How many/which indexes?
• Arrays vs. hashes for embedding?
• Implicit schema (field names and types)

Slide 28

Favor Human-Readable Foreign Keys
• DBRefs are a bit cumbersome; we never use them
• Referencing by MongoId can mean doing extra lookups
• Build human-readable references to save yourself lookups and manual joins
• Just be mindful of the space tradeoffs of readability
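A small illustration of the difference, with hypothetical documents and helper names: an ObjectId-style reference needs a second lookup before it can show anything meaningful, while a readable reference (here client_id plus template name, assumed unique per client) already carries the value. The space tradeoff is that a readable key can be longer than a 12-byte ObjectId, and it is stored in every referencing document.

```javascript
// Hypothetical documents: the same message referencing its template two ways.
const templatesById = {
  "4f9a8c2e1d3b5a0012345678": { client_id: 3245, name: "welcome" },
};
const byObjectId = { _id: "m1", template_id: "4f9a8c2e1d3b5a0012345678" };
const byReadableKey = { _id: "m2", client_id: 3245, template: "welcome" };

// Opaque reference: displaying the template name needs an extra lookup (a manual join).
function templateNameViaLookup(message) {
  return templatesById[message.template_id].name;
}

// Readable reference: the name is already in the document, no lookup needed.
function templateNameDirect(message) {
  return message.template;
}
```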

Slide 29

Embed vs Top-Level Collections?
• A major question of MongoDB schema design
• If you can ask the question at all, you might want to err on the side of embedding
• Don’t embed if the embedded data could get huge
• Don’t feel too bad about denormalizing by embedding AND storing in a top-level collection

Slide 30

Typical Properties of Top-Level Collections
• Independence: they don’t “belong” conceptually to another collection
• Nouns: the building blocks of your system
• Easily referenceable and updatable

Slide 31

Embedding Pros
• Fast retrieval of a document with its related data
• Atomic updates
• “Ownership” of the embedded document is obvious
• Usually maps well to code structures

Slide 32

Embedding Cons
• Embedded data is harder to get at and to run mass queries against
• Does not scale up indefinitely: documents will hit the 16MB limit
• Hard to create references to an embedded object
• Limited ability to sort embedded objects using an index
• Really huge objects are cumbersome and carry deserialization overhead
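How fast an embedded array approaches the 16MB limit is simple arithmetic. A sketch, assuming you measure the per-item size yourself (for example with Object.bsonsize() in the mongo shell) and that items are roughly uniform; the 200-byte, 1 KB, and 100-per-day figures are made up for illustration.

```javascript
// Rough capacity planning for an embedded array under the 16MB BSON document limit.
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16,777,216 bytes

// How many embedded items of `bytesPerItem` fit alongside `baseDocBytes` of other fields.
function maxEmbeddedItems(bytesPerItem, baseDocBytes = 0) {
  return Math.floor((MAX_BSON_BYTES - baseDocBytes) / bytesPerItem);
}

// Days until the document hits the limit if `itemsPerDay` items are appended daily.
function daysUntilLimit(itemsPerDay, bytesPerItem, baseDocBytes = 0) {
  return Math.floor(maxEmbeddedItems(bytesPerItem, baseDocBytes) / itemsPerDay);
}

console.log(maxEmbeddedItems(200, 1024)); // 83880
console.log(daysUntilLimit(100, 200, 1024)); // 838
```

A little over two years of headroom sounds comfortable until one heavy user appends ten times the average, which is why "could get huge" is the deciding question.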

Slide 33

Indexes
• Indexes are a tradeoff: they speed up queries but cost RAM, disk space, and write time
• Keep your indexes as small as you can and maximize the value of the ones you do add
• Only worry about index size for big (or potentially big) collections

Slide 34

Take Advantage of Multiple-Field Indexes
• If you have an index on { client_id: 1, email: 1 }
• Then you also have the { client_id: 1 } index “for free”
• But not { email: 1 }
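The rule behind this slide: a query on equality fields can use a compound index efficiently only when those fields form a leading prefix of the index key. A simplified sketch of that check; it ignores sorts, range operators, and the query optimizer's other options, and prefixCoversQuery is a hypothetical name.

```javascript
// True if the query's equality fields are exactly a leading prefix of the index key
// (field order in the query does not matter; order in the index does).
function prefixCoversQuery(indexKey, queryFields) {
  const indexFields = Object.keys(indexKey); // insertion order = index field order
  const wanted = new Set(queryFields);
  let i = 0;
  while (i < indexFields.length && wanted.has(indexFields[i])) i++;
  return wanted.size > 0 && i === wanted.size;
}
```

With { client_id: 1, email: 1 }, a query on client_id alone passes, a query on email alone does not, and a query on client_id plus some unindexed field only gets partial help from the prefix.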

Slide 35

A Fun Gotcha We Hit (Multiple-Field Indexes)
    > db.test.save( { a: 1, b: ["t1", "t2", "t3"] } );
    > db.test.save( { a: 1, b: ["t4", "t5"] } );
    > db.test.ensureIndex( { a: 1, b: 1 } );
    > db.test.find( { a: 1 } ).explain();
• Pop quiz: what is nscanned (the number of index entries scanned) going to be?

Slide 36

A Fun Gotcha We Hit (Multiple-Field Indexes)
    > db.test.find( { a: 1 } ).explain();
    {
        "cursor" : "BtreeCursor a_1_b_1",
        "nscanned" : 5,
        "nscannedObjects" : 5,
        "n" : 2,
        ...
    }
• Pop quiz answer: nscanned is 5, not 2. Because b is an array, the { a: 1, b: 1 } index holds one entry per element of b, so the two documents produce five index entries with a = 1.
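The result follows from how multikey indexes are built: an index that includes an array field gets one entry per array element. A sketch that models the { a: 1, b: 1 } index for the two documents above and counts the entries a query on a alone must walk; it is a model of the index, not MongoDB code, the helper names are hypothetical, and it ignores edge cases such as empty arrays.

```javascript
// Model a compound index on [fieldA, fieldB] where fieldB may be an array:
// like a multikey index, it stores one entry per element of the array.
function compoundIndexEntries(docs, fieldA, fieldB) {
  const entries = [];
  docs.forEach((doc, docId) => {
    const values = Array.isArray(doc[fieldB]) ? doc[fieldB] : [doc[fieldB]];
    for (const value of values) entries.push({ key: [doc[fieldA], value], docId });
  });
  return entries;
}

// Equality query on the leading field only: every matching entry must be scanned,
// but each document is returned once.
function scanLeadingField(entries, aValue) {
  const scanned = entries.filter((e) => e.key[0] === aValue);
  return { nscanned: scanned.length, n: new Set(scanned.map((e) => e.docId)).size };
}

const docs = [
  { a: 1, b: ["t1", "t2", "t3"] },
  { a: 1, b: ["t4", "t5"] },
];
console.log(scanLeadingField(compoundIndexEntries(docs, "a", "b"), 1)); // { nscanned: 5, n: 2 }
```

With large arrays the gap between nscanned and n grows accordingly, so a compound index whose second field is a big array can make queries on the first field alone much more expensive than they look.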

Slide 37

Use your _id
• Every collection must have an _id, and its index costs you space
• So do something useful with _id

Slide 38

Take Advantage of Fast ^ (Prefix) Regex Queries
• Messages have _ids like: 32423.00000341
• Need all messages in blast 32423:
    db.message.blast.find( { _id: /^32423\./ } );
• (Yeah, I know the \. is ugly. Don’t use a dot if you do this.)
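A sketch of the same trick with a separator that needs no escaping. The "_" separator, the 8-digit zero padding (taken from the example _id), and the helper names are assumptions for illustration; the range form shows the index bounds that an anchored prefix regex effectively turns into.

```javascript
// Hypothetical _id scheme following the slide, but with "_" instead of "." as the
// separator so the prefix regex needs no escaping. The sequence is zero-padded to
// 8 digits so ids within a blast sort in sequence order.
function messageId(blastId, seq) {
  return `${blastId}_${String(seq).padStart(8, "0")}`;
}

// Anchored, case-sensitive prefix regex: the form MongoDB can answer from the _id index.
// Assumes blastId contains only digits (no regex metacharacters).
function blastPrefixRegex(blastId) {
  return new RegExp(`^${blastId}_`);
}

// The same set of ids as an explicit index range: [prefix, prefix with last char + 1).
function blastIdRange(blastId) {
  const prefix = `${blastId}_`;
  const last = prefix.charCodeAt(prefix.length - 1);
  return { $gte: prefix, $lt: prefix.slice(0, -1) + String.fromCharCode(last + 1) };
}

// In the mongo shell, with these functions defined, either form selects blast 32423:
//   db.message.blast.find({ _id: blastPrefixRegex(32423) })
//   db.message.blast.find({ _id: blastIdRange(32423) })
```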

Slide 39

Questions?
Looking for a job? [email protected]
@eonwhite