MongoNYC 2012: Two Years of MongoDB at Sailthru...

mongodb
May 25, 2012

MongoNYC 2012: Two Years of MongoDB at Sailthru: Scaling and Design

MongoNYC 2012: Two Years of MongoDB at Sailthru: Scaling and Design, Ian White, Sailthru. Sailthru is a behavioral communication and analytics platform focusing on email, onsite, and mobile personalization. Sailthru stores terabytes of data for hundreds of millions of end users and billions of messages, all in MongoDB. Since we presented last year, we've had 10x growth and experienced lots of challenges scaling on AWS. We've learned lots of lessons about building and designing for MongoDB at scale, and we'll share them.

Transcript

  1. Sailthru
     • Behavioral communication and analytics platform
     • Powering relevancy: one-to-one personalization across email, web, mobile
     • Original idea: API-based transactional email
     • 3 engineers two years ago, now ~65 employees
     • Some clients: Fab, Huffington Post, OpenSky, Patch, Thrillist, Refinery 29, Totsy, Business Insider, Savored, NY Observer, College Humor, Oscar De La Renta, Tippr, NY Post, American Media
  2. Sailthru and MongoDB
     • MongoDB has been Sailthru’s production database since mid-2010 (first prototype was MySQL)
     • (I’ve been using MongoDB in production since 2008)
  3. JSON-Based DSL For Personalization (Zephyr)

     {* Page Format Logic *}
     {page_format = ""}
     {if skin_ads.left || skin_ads.top || skin_ads.right}
       {if skin_ads.left.vars.right_rail || skin_ads.top.vars.right_rail || skin_ads.right.vars.right_rail}
         {page_format = "skinned_piece_with_right_rail"}
       {else}
         {page_format = "skinned_piece"}
         {if skin_ads.left.vars.alignment == "left"}
           {block.header_extension = block.header_extension - (header_content_diff + 20)}
         {else}
           {block.header_extension = block.header_extension - (half_header_content_diff + 10)}
         {/if}
         {if 10 > block.header_extension}
           {block.header_extension = 0}
         {/if}
       {/if}
     {else}
       {if right_rail_ad}
         {page_format = "piece_with_right_rail"}
       {else}
         {page_format = "centered_piece"}
       {/if}
     {/if}
     {followingSlugs = filter(profile.vars.sellerSlugs, lambda sellerSlug: !contains(['bluedot', 'clearance'], sellerSlug) && sellers[sellerSlug])}
  4. Sailthru Scale
     • 200 million user profiles
     • 40 million emails sent per day
     • 1000 requests per second
     • 8 replica sets, 40 nodes
     • Billions of documents
  5. Sailthru Architecture
     • Critical services: API, link rewriting, onsite tracking/recommendations, email delivery, reporting/user interface
     • Uptime is critical, any downtime impacts our customers’ revenue
     • Infrastructure split between Amazon EC2 and colo (Peer1)
     • Java, LAMP, puppet, scribe, ActiveMQ
  6. Sailthru MongoDB Architecture
     • Different replica sets for different purposes (e.g. messages vs user profiles)
     • Largest logical collections are partitioned at the application level
     • Made sense for us as our data is naturally partitioned by customer
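One way to picture application-level partitioning by customer is a small routing function that turns a logical collection plus a customer id into a replica set and a physical collection name. The sketch below is only an illustration: the replica set names, the `<collection>.<clientId>` naming scheme, and the `routeFor` helper are all assumptions, not Sailthru's actual layout.

```javascript
// Hypothetical mapping of logical collection -> replica set (names are made up).
const REPLICA_SET_FOR = {
  profile: "profiles1",
  message: "messages1",
};

// Build the physical location of one customer's slice of a logical
// collection, e.g. ("profile", 42) -> replica set "profiles1", collection "profile.42".
function routeFor(logicalCollection, clientId) {
  const replicaSet = REPLICA_SET_FOR[logicalCollection];
  if (replicaSet === undefined) {
    throw new Error("unknown logical collection: " + logicalCollection);
  }
  if (!Number.isInteger(clientId) || clientId <= 0) {
    throw new Error("clientId must be a positive integer");
  }
  return { replicaSet, collection: logicalCollection + "." + clientId };
}

const route = routeFor("profile", 42);
console.log(route.replicaSet, route.collection);
```

Keeping this decision in one function means the rest of the application never hard-codes where a customer's data lives.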
  7. How We Got To Mongo from MySQL
     • JSON is the lingua franca
     • Migrated one table at a time (very, very carefully) and ran both for a while
     • Glad that’s long over with
     • Simplified stack
  8. Advantages of MongoDB at Sailthru
     • Rapid development
     • Makes it easy to store flexible JSON-based customer input (many now use Mongo themselves)
     • Good performance
     • Encourages scalable approach
     • We know it well
  9. Basic mongod
     • mongod --dbpath /path/to/db --logpath /path/to/log/mongodb.log --logappend --fork --rest --replSet main1
     • Don’t ever run without replication
     • Don’t ever kill -9
     • Don’t run without writing to a log
     • Run behind a firewall
     • Use journaling
  10. Do The Simplest Thing That Could Possibly Work
      • Simplicity = flexibility = speed = scalability
      • Complexity is the enemy
      • The simpler it is, the more likely you can scale it
  11. Focus On The Big Wins
      • Three collection types represent almost all of our data storage
      • So a small win there counts for much more than a big win elsewhere
  12. Monitoring Is Everything
      • Users will surprise you
      • Systems will surprise you
      • Production systems are complex
      • Graph everything you can, so you can see when something you did changed the pattern
      • Alerts when something’s wrong
  13. Some Things To Monitor
      • lock %, r/w queue size, load average
      • faults/sec: if this starts to creep up, your working set may be close to exceeding available RAM
      • number of connections: could be driver or network connectivity problem
      • replication lag: usually load on primary
      • dataSize and indexSize
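Replication lag can be derived from the replica set status document. The sketch below assumes an object shaped like the output of `rs.status()` (a `members` array whose entries carry `name`, `stateStr`, and `optimeDate`); the `replicationLagSeconds` helper and the sample host names are hypothetical.

```javascript
// Compute replication lag per secondary, in seconds, from an object shaped
// like the output of rs.status().
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null; // no primary right now: alert on that separately
  const lag = {};
  for (const member of status.members) {
    if (member.stateStr !== "SECONDARY") continue;
    // Subtracting two Date objects yields milliseconds.
    lag[member.name] = (primary.optimeDate - member.optimeDate) / 1000;
  }
  return lag;
}

// Made-up sample data for illustration.
const exampleStatus = {
  members: [
    { name: "db1:27017", stateStr: "PRIMARY", optimeDate: new Date("2012-05-25T12:00:10Z") },
    { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2012-05-25T12:00:04Z") },
    { name: "db3:27017", stateStr: "ARBITER" },
  ],
};
console.log(replicationLagSeconds(exampleStatus));
```

Feeding this number into whatever graphing system is already in place makes a slow secondary visible before it becomes a failover problem.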
  14. Understand What Is Happening
      • Graph everything you can
      • MMS is a great tool for diagnosing issues
      • But also Graphite / StatsD / Nagios
      • And don’t forget the log
      • explain() can shed light on pathological
  15. Control What Is Happening
      • All our MongoDB access happens through one thin wrapper class which we wrote
      • If you use someone else’s lib or ORM, make sure (if you had to) you could do stuff like:
          - set timeouts or enable failfast retry
          - add a $hint for all instances of a query
          - queue writes elsewhere temporarily
          - ensure all writes are “safe” for a collection
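The kind of control described above can be sketched as a thin class that every query passes through. This is not Sailthru's wrapper: the `MongoWrapper` class, its method names, and the `backend` object with `find(collection, query, options)` and `insert(collection, doc, options)` are all hypothetical stand-ins for whatever driver is in use.

```javascript
// Thin wrapper sketch. `backend` stands in for a real driver; its method
// signatures are assumptions for illustration, not a real driver API.
class MongoWrapper {
  constructor(backend, options = {}) {
    this.backend = backend;
    this.timeoutMs = options.timeoutMs ?? 500;
    this.hints = new Map();           // query shape -> index hint
    this.safeCollections = new Set(); // collections whose writes must be acknowledged
    this.writesPaused = false;
    this.pendingWrites = [];          // writes parked while paused
  }

  // A query's "shape" is its collection plus its sorted top-level field names.
  static shapeOf(collection, query) {
    return collection + ":" + Object.keys(query).sort().join(",");
  }

  // Force an index hint for every query with the same shape as exampleQuery.
  setHint(collection, exampleQuery, hint) {
    this.hints.set(MongoWrapper.shapeOf(collection, exampleQuery), hint);
  }

  requireSafeWrites(collection) {
    this.safeCollections.add(collection);
  }

  find(collection, query) {
    const options = { timeoutMs: this.timeoutMs };
    const hint = this.hints.get(MongoWrapper.shapeOf(collection, query));
    if (hint !== undefined) options.hint = hint;
    return this.backend.find(collection, query, options);
  }

  insert(collection, doc) {
    if (this.writesPaused) {
      this.pendingWrites.push({ collection, doc });
      return { queued: true };
    }
    const options = { safe: this.safeCollections.has(collection) };
    return this.backend.insert(collection, doc, options);
  }

  pauseWrites() {
    this.writesPaused = true;
  }

  // Replay parked writes in order once the database is healthy again.
  resumeWrites() {
    this.writesPaused = false;
    const parked = this.pendingWrites;
    this.pendingWrites = [];
    return parked.map(({ collection, doc }) => this.insert(collection, doc));
  }
}
```

Because every call goes through `find` and `insert`, a hint or a safe-write policy can be changed in one place during an incident instead of hunting through application code.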
  16. Resiliency
      • If a write fails, can it be queued somewhere and tried later? (What if the queue fails?)
      • What if a queued write is failing indefinitely?
      • If a read fails, can you timeout quickly and try again on a different node? (failfast retry)
      • In some cases we might not care, in some
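These questions can be made concrete with a small sketch: a retry queue that parks failed writes and gives up after a fixed number of attempts, and a read helper that moves to the next node on the first error. Everything here (`RetryQueue`, `readWithFailover`, `writeFn`, `readFn`, the attempt limit) is a hypothetical illustration. It is kept synchronous and in-memory for brevity, so it does not answer the "what if the queue fails?" question, which needs durable storage.

```javascript
// Retry queue sketch for failed writes. `writeFn` is a hypothetical function
// that performs the write and throws on failure.
class RetryQueue {
  constructor(writeFn, maxAttempts = 3) {
    this.writeFn = writeFn;
    this.maxAttempts = maxAttempts;
    this.pending = []; // writes waiting for another attempt
    this.dead = [];    // writes that kept failing: alert on these
  }

  // Try the write now; park it for later if it fails.
  write(doc) {
    try {
      this.writeFn(doc);
      return true;
    } catch (err) {
      this.pending.push({ doc, attempts: 1 });
      return false;
    }
  }

  // Called periodically: retry every parked write once.
  drain() {
    const batch = this.pending;
    this.pending = [];
    for (const item of batch) {
      try {
        this.writeFn(item.doc);
      } catch (err) {
        item.attempts += 1;
        if (item.attempts >= this.maxAttempts) this.dead.push(item);
        else this.pending.push(item);
      }
    }
  }
}

// Failfast read: try each node in turn and move on at the first error.
// `readFn(node)` is assumed to enforce its own short timeout and throw.
function readWithFailover(nodes, readFn) {
  let lastError = new Error("no nodes configured");
  for (const node of nodes) {
    try {
      return readFn(node);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

The `dead` list is the answer to "what if a queued write is failing indefinitely?": at some point the write stops being retried and becomes something to alert on.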
  17. Amazon EC2 Gotchas
      • EBS volumes can go into degraded state unpredictably

      Device:   rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz     await   svctm   %util
      sda1        0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
      sdb         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
      md0         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
      sdf         0.00    0.00  0.00  1.50    0.00   16.00     10.67    135.13  19564.67  667.33  100.10
      sdg         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
      sdh         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
      sdi         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
      sdj         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
      sdk         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
      sdl         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
      sdm         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
      dm-0        0.00    0.00  0.00  0.00    0.00    0.00      0.00    316.31      0.00    0.00  100.10

      • Replica sets are designed to promote masters on downtime, not degraded state
  18. Amazon EC2 Gotchas
      • Must distribute across multiple Availability Zones for redundancy
      • But you will see connectivity problems between AZs sometimes
      • Have seen this cause replica sets to stepDown primaries randomly
  19. Develop Your Mental Model of MongoDB
      • You don’t need to know all the internals
      • But try to gain a working understanding of how MongoDB operates, especially RAM, indexes, and replication
  20. Big-Picture Design Questions
      • What is the data I want to store?
      • How will I want to use that data later?
      • How big will the data get?
      • If the answers are “I don’t know yet”, guess with your best YAGNI
  21. Specific MongoDB Design Questions
      • Embed vs top-level collection?
      • Denormalize (double-store data)?
      • How many/which indexes?
      • Arrays vs hashes for embedding?
      • Implicit schema (field names and types)
  22. Favor Human-Readable Foreign Keys
      • DBRefs are a bit cumbersome, we never use ’em
      • Referencing by MongoId can mean doing extra lookups
      • Build human-readable references to save you doing lookups and manual joins
      • Just be mindful of space tradeoffs for readability
  23. Embed vs Top-Level Collections?
      • Major question of MongoDB schema design
      • If you can ask the question at all, you might want to err on the side of embedding
      • Don’t embed if the embedding could get huge
      • Don’t feel too bad about denormalizing by embedding AND storing in a top-level collection
  24. Typical Properties of Top-Level Collections
      • Independence: They don’t “belong” conceptually to another collection
      • Nouns: the building blocks of your system
      • Easily referenceable and updatable
  25. Embedding Pros
      • Fast retrieval of document with related data
      • Atomic updates
      • “Ownership” of embedded document is obvious
      • Usually maps well to code structures
  26. Embedding Cons
      • Harder to get at, do mass queries
      • Does not size up infinitely, will hit 16MB limit
      • Hard to create references to embedded object
      • Limited ability to indexed-sort the embedded objects
      • Really huge objects are cumbersome and will have deserialization overhead
  27. Indexes
      • Indexes are a tradeoff
      • Keep your indexes as small as you can and maximize the value of the ones you do add
      • Only worry about index size for big (or potentially big) collections
  28. Take Advantage of Multiple-Field Indexes
      • If you have an index on {client_id: 1, email: 1 }
      • Then you also have the {client_id: 1} index “for free”
      • but not { email: 1}
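The prefix rule can be written down as a small check. This is a deliberately simplified model for equality queries only; the `canUseIndexPrefix` helper is hypothetical and ignores sorts, ranges, and everything else the real query planner considers.

```javascript
// Returns true when the queried fields are exactly a leading prefix of the
// compound index's keys (field order inside the query does not matter).
function canUseIndexPrefix(indexKeys, queryFields) {
  const fields = [...new Set(queryFields)];
  if (fields.length === 0 || fields.length > indexKeys.length) return false;
  const prefix = new Set(indexKeys.slice(0, fields.length));
  return fields.every((field) => prefix.has(field));
}

const compoundIndex = ["client_id", "email"];
console.log(canUseIndexPrefix(compoundIndex, ["client_id"]));          // true
console.log(canUseIndexPrefix(compoundIndex, ["email"]));              // false
console.log(canUseIndexPrefix(compoundIndex, ["email", "client_id"])); // true
```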
  29. A Fun Gotcha We Hit (Multiple-Field Indexes)

      > db.test.save( { a: 1, b: ["t1", "t2", "t3"] } );
      > db.test.save( { a: 1, b: ["t4", "t5"] } );
      > db.test.ensureIndex( { a: 1, b: 1 } );
      > db.test.find( { a: 1 } ).explain();

      • Pop quiz: what is nscanned (number of objects scanned) going to be?
  30. A Fun Gotcha We Hit (Multiple-Field Indexes)
      • Pop quiz: what is nscanned (number of objects scanned) going to be?

      > db.test.save( { a: 1, b: ["t1", "t2", "t3"] } );
      > db.test.save( { a: 1, b: ["t4", "t5"] } );
      > db.test.ensureIndex( { a: 1, b: 1 } );
      > db.test.find( { a: 1 } ).explain();
      {
        "cursor" : "BtreeCursor a_1_b_1",
        "nscanned" : 5,
        "nscannedObjects" : 5,
        "n" : 2,
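The count of 5 comes from the array field: an index over an array stores one key per array element, so the two documents produce 3 + 2 = 5 keys under `a: 1`. The sketch below imitates that expansion in plain JavaScript. It is a toy model of the key layout, not how the B-tree is actually stored, and `indexEntries` is a hypothetical helper.

```javascript
// Expand documents into (a, b) index keys the way an index on an array
// field does: one key per array element.
function indexEntries(docs) {
  const entries = [];
  docs.forEach((doc, docIndex) => {
    const bValues = Array.isArray(doc.b) ? doc.b : [doc.b];
    for (const b of bValues) entries.push({ a: doc.a, b, docIndex });
  });
  return entries;
}

const testDocs = [
  { a: 1, b: ["t1", "t2", "t3"] },
  { a: 1, b: ["t4", "t5"] },
];
const matchingKeys = indexEntries(testDocs).filter((entry) => entry.a === 1);
const matchingDocs = new Set(matchingKeys.map((entry) => entry.docIndex));
console.log(matchingKeys.length); // 5 index keys walked
console.log(matchingDocs.size);   // 2 distinct documents returned
```

With large arrays the gap between keys walked and documents returned grows accordingly, which is why a query on `a` alone can be far more expensive than the result count suggests.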
  31. Use your _id
      • You must use an _id for every collection, which will cost you index size
      • So do something useful with _id
  32. Take advantage of fast ^indexes
      • Messages have _ids like: 32423.00000341
      • Need all messages in blast 32423:
      • db.message.blast.find( { _id: /^32423\./ } );
      • (Yeah, I know the \. is ugly. Don’t use a dot if you do this.)
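A sketch of how such ids and prefix queries might be built in application code. The helper names (`messageId`, `blastPrefixQuery`, `escapeRegex`) and the 8-digit zero padding are assumptions read off the single example id above. The point is that the separator is escaped, so the anchored regex matches the literal dot and not any character.

```javascript
// Escape regex metacharacters so the separator is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an _id like "32423.00000341" from a blast id and a sequence number.
function messageId(blastId, sequence) {
  return blastId + "." + String(sequence).padStart(8, "0");
}

// Build a query object matching every message _id in one blast.
// An anchored prefix regex like this is what lets the _id index be used.
function blastPrefixQuery(blastId) {
  return { _id: new RegExp("^" + escapeRegex(blastId + ".")) };
}

const query = blastPrefixQuery(32423);
console.log(query._id.source);
console.log(query._id.test(messageId(32423, 341)));
```

The resulting object has the same shape as the query on the slide and could be passed to `find` as-is.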