Two Years of MongoDB at Sailthru

Presentation by Sailthru CTO Ian White at MongoNYC 2012, covering scaling and design with MongoDB at Sailthru.

sailthru

June 01, 2012

Transcript

  1. Sailthru • Behavioral communication and analytics platform • Powering

    relevancy: one-to-one personalization across email, web, mobile • Original idea: API-based transactional email • 3 engineers two years ago, now ~65 employees • Some Clients: Fab, Huffington Post, OpenSky, Patch, Thrillist, Refinery 29, Totsy, Business Insider, Savored, NY Observer, College Humor, Oscar De La Renta, Tippr, NY Post, American Media, Flavorpill, Codecademy, Ahalife, GroupCommerce, BustedTees, Lifebooker, BET, Newsweek/Daily Beast, turntable.fm
  2. Sailthru and MongoDB • MongoDB has been Sailthru’s

    production database since mid-2010 (first prototype was MySQL) • (I’ve been using MongoDB in production since 2008)
  3. JSON-Based DSL For Personalization (Zephyr)

    {* Page Format Logic *}
    {page_format = ""}
    {if skin_ads.left || skin_ads.top || skin_ads.right}
      {if skin_ads.left.vars.right_rail || skin_ads.top.vars.right_rail || skin_ads.right.vars.right_rail}
        {page_format = "skinned_piece_with_right_rail"}
      {else}
        {page_format = "skinned_piece"}
        {if skin_ads.left.vars.alignment == "left"}
          {block.header_extension = block.header_extension - (header_content_diff + 20)}
        {else}
          {block.header_extension = block.header_extension - (half_header_content_diff + 10)}
        {/if}
        {if 10 > block.header_extension}
          {block.header_extension = 0}
        {/if}
      {/if}
    {else}
      {if right_rail_ad}
        {page_format = "piece_with_right_rail"}
      {else}
        {page_format = "centered_piece"}
      {/if}
    {/if}
    {followingSlugs = filter(profile.vars.sellerSlugs, lambda sellerSlug: ! contains(['bluedot', 'clearance'], sellerSlug) && sellers[sellerSlug])}
  4. Sailthru Scale • 200 million user profiles • 40

    million emails sent per day • 1000 requests per second • 8 replica sets, 40 nodes • Billions of documents
  5. Sailthru Architecture • Critical services: API, link rewriting, onsite

    tracking/recommendations, email delivery, reporting/user interface • Uptime is critical, any downtime impacts our customers’ revenue • Infrastructure split between Amazon EC2 and colo (Peer1) • Java, LAMP, puppet, scribe, ActiveMQ
  6. Sailthru MongoDB Architecture • Different replica sets for different

    purposes (e.g. messages vs user profiles) • Largest logical collections are partitioned at the application level • Made sense for us as our data is naturally partitioned by customer
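
Application-level partitioning by customer can be sketched as a small routing function. This is a minimal illustration only: the function name, the `PARTITIONS` constant, and the naming scheme are hypothetical, and the slides do not say how Sailthru actually maps customers to collections.

```javascript
// Hypothetical routing helper; the partition count and naming are assumptions.
const PARTITIONS = 16; // assumed number of physical collections per logical one

// Map a logical collection plus a numeric customer id to a physical collection.
function partitionedCollection(logicalName, clientId) {
  if (!Number.isInteger(clientId) || clientId < 0) {
    throw new RangeError("clientId must be a non-negative integer");
  }
  const bucket = clientId % PARTITIONS; // same customer always lands in the same bucket
  return `${logicalName}.p${bucket}`;
}

console.log(partitionedCollection("profile", 1234)); // profile.p2
```

Because every query already carries a customer id, the application can pick the physical collection before it talks to the database, which keeps each collection and its indexes smaller.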
  7. How We Got To Mongo from MySQL • JSON is

    the lingua franca • Migrated one table at a time (very, very carefully) and ran both for a while • Glad that’s long over with • Simplified stack
  8. Advantages of MongoDB at Sailthru • Rapid development •

    Makes it easy to store flexible JSON-based customer input (many now use Mongo themselves) • Good performance • Encourages scalable approach • We know it well
  9. Basic mongod • mongod --dbpath /path/to/db --logpath /path/to/log/mongodb.log --logappend

    --fork --rest --replSet main1 • Don’t ever run without replication • Don’t ever kill -9 • Don’t run without writing to a log • Run behind a firewall • Use journaling • Default oplog size seems fine
  10. Do The Simplest Thing That Could Possibly Work • Simplicity

    = flexibility = speed = scalability • Complexity is the enemy • The simpler it is, the more likely you can scale it
  11. Focus On The Big Wins • Three collection types represent

    almost all of our data storage • So a small win there counts for much more than a big win elsewhere
  12. Monitoring Is Everything • Users will surprise you •

    Systems will surprise you • Production systems are complex • Graph everything you can, so you can see when something you did changed the pattern • Alerts when something’s wrong
  13. Some Things To Monitor • lock %, r/w queue size,

    load average • faults/sec: if this starts to creep up, you may be nearing or exceeding your working set • number of connections: could be driver or network connectivity problem • replication lag: usually load on primary • dataSize and indexSize
  14. Understand What Is Happening • Graph everything you can

    • MMS is a great tool for diagnosing issues • But also Graphite / StatsD / Nagios • And don’t forget the log • explain() can shed light on pathological queries
  15. Control What Is Happening • All our MongoDB access happens

    through one thin wrapper class which we wrote • If you use someone else’s lib or ORM, make sure (if you had to) you could do stuff like: set timeouts or enable failfast retry; add a $hint for all instances of a query; queue writes elsewhere temporarily; ensure all writes are “safe” for a collection
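
A minimal sketch of what such a wrapper might look like. The `driver` object and its `find`/`insert` signatures are hypothetical stand-ins, not a real MongoDB driver API, and the class is not Sailthru's actual code; the point is only that one choke point lets you add hints, force acknowledged writes, or buffer writes without touching callers.

```javascript
// Sketch of a thin data-access wrapper around a hypothetical `driver`.
class MongoWrapper {
  constructor(driver) {
    this.driver = driver;
    this.hints = new Map();            // query name -> index hint
    this.safeCollections = new Set();  // collections whose writes must be acknowledged
    this.queueWrites = false;          // when true, writes are buffered, not sent
    this.pending = [];
  }

  setHint(queryName, indexSpec) { this.hints.set(queryName, indexSpec); }
  requireSafe(collection) { this.safeCollections.add(collection); }

  // Every read gets a timeout, plus a hint if one is registered for this query.
  find(collection, queryName, filter, timeoutMs = 1000) {
    const options = { timeoutMs };
    if (this.hints.has(queryName)) options.hint = this.hints.get(queryName);
    return this.driver.find(collection, filter, options);
  }

  insert(collection, doc) {
    const options = { safe: this.safeCollections.has(collection) };
    if (this.queueWrites) {
      this.pending.push({ collection, doc, options });
      return null;
    }
    return this.driver.insert(collection, doc, options);
  }

  // Replay buffered writes once the database is healthy again.
  flush() {
    this.queueWrites = false;
    const queued = this.pending.splice(0);
    for (const w of queued) this.driver.insert(w.collection, w.doc, w.options);
    return queued.length;
  }
}

// Fake driver that records calls, so the sketch runs without a database.
const calls = [];
const fakeDriver = {
  find: (c, f, o) => { calls.push(["find", c, f, o]); return []; },
  insert: (c, d, o) => { calls.push(["insert", c, d, o]); return true; },
};
const db = new MongoWrapper(fakeDriver);
db.setHint("byClientEmail", { client_id: 1, email: 1 });
db.requireSafe("profile");
db.find("profile", "byClientEmail", { client_id: 7, email: "a@example.com" });
db.queueWrites = true;                 // e.g. while a primary is unavailable
db.insert("profile", { client_id: 7 });
const flushed = db.flush();
console.log(calls.length, flushed);
```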
  16. Resiliency • If a write fails, can it be queued

    somewhere and tried later? (What if the queue fails?) • What if a queued write is failing indefinitely? • If a read fails, can you timeout quickly and try again on a different node? (failfast retry) • In some cases we might not care, in some cases lives might depend on it
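
One possible answer to "what if a queued write is failing indefinitely?" is to cap the number of attempts and set the write aside for inspection. The sketch below is an in-memory illustration under that assumption; `RetryQueue`, `maxAttempts`, and the dead-letter list are hypothetical names, and a production queue would need to be durable.

```javascript
// Hypothetical retry queue: `write` is any function that throws on failure.
class RetryQueue {
  constructor(write, maxAttempts = 3) {
    this.write = write;
    this.maxAttempts = maxAttempts;
    this.pending = [];     // writes waiting for another attempt
    this.deadLetter = [];  // writes that exhausted their attempts
  }

  // Try immediately; on failure, park the write for a later drain().
  submit(doc) {
    try {
      this.write(doc);
      return true;
    } catch (err) {
      this.pending.push({ doc, attempts: 1 });
      return false;
    }
  }

  // Retry each pending write once; give up after maxAttempts in total.
  drain() {
    const retry = this.pending.splice(0);
    for (const item of retry) {
      try {
        this.write(item.doc);
      } catch (err) {
        item.attempts += 1;
        const target = item.attempts >= this.maxAttempts ? this.deadLetter : this.pending;
        target.push(item);
      }
    }
  }
}

// Demo with a fake write that always rejects one document.
const stored = [];
const flakyWrite = (doc) => {
  if (doc.poison) throw new Error("write failed");
  stored.push(doc);
};
const queue = new RetryQueue(flakyWrite, 3);
queue.submit({ id: 1 });
queue.submit({ id: 2, poison: true });
queue.drain();
queue.drain();
console.log(stored.length, queue.pending.length, queue.deadLetter.length); // 1 0 1
```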
  17. Amazon EC2 Gotchas • EBS volumes can go into degraded

    state unpredictably
    Device:   rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz     await   svctm   %util
    sda1        0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
    sdb         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
    md0         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
    sdf         0.00    0.00  0.00  1.50    0.00   16.00     10.67    135.13  19564.67  667.33  100.10
    sdg         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
    sdh         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
    sdi         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
    sdj         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
    sdk         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
    sdl         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
    sdm         0.00    0.00  0.00  0.00    0.00    0.00      0.00      0.00      0.00    0.00    0.00
    dm-0        0.00    0.00  0.00  0.00    0.00    0.00      0.00    316.31      0.00    0.00  100.10
    • Replica sets are designed to promote masters on downtime, not degraded state • Thankfully, it doesn’t last too long
  18. Amazon EC2 Gotchas • Must distribute across multiple Availability Zones

    for redundancy • But you will see connectivity problems between AZs sometimes • Have seen this cause replica sets to stepDown primaries randomly
  19. Develop Your Mental Model of MongoDB • You don’t need

    to know all the internals • But try to gain a working understanding of how MongoDB operates, especially RAM, indexes, and replication
  20. Big-Picture Design Questions • What is the data I want

    to store? • How will I want to use that data later? • How big will the data get? • If the answers are “I don’t know yet”, guess with your best YAGNI
  21. Specific MongoDB Design Questions • Embed vs top-level collection? •

    Denormalize (double-store data)? • How many/which indexes? • Arrays vs hashes for embedding? • Implicit schema (field names and types)
  22. Favor Human-Readable Foreign Keys • DBRefs are a bit

    cumbersome, we never use them • Referencing by MongoId can mean doing extra lookups • Build human-readable references to save you doing lookups and manual joins • Just be mindful of space tradeoffs for readability
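
To make the tradeoff concrete, here is a small sketch contrasting an opaque reference with a human-readable one. All document shapes, field names, and values are invented for illustration and are not taken from Sailthru's schema.

```javascript
// Illustrative documents only; field names and values are made up.

// Opaque reference: showing anything about the seller needs a second lookup.
const productOpaque = { _id: 1, title: "Tent", seller_id: "4fc8e1a2b3c4d5e6f7a8b9c0" };

// Human-readable reference: the key itself carries meaning.
const productReadable = { _id: 2, title: "Stove", seller: "acme-outdoors" };

// A URL or log line can be built without touching the sellers collection.
function sellerPath(product) {
  return `/sellers/${product.seller}`;
}

console.log(sellerPath(productReadable)); // /sellers/acme-outdoors

// The space tradeoff: the readable key is stored in every referencing document.
console.log(productReadable.seller.length, productOpaque.seller_id.length);
```

A readable key that is shorter than a 24-character hex id saves space as well as lookups; a long one costs space in every document and every index that includes it.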
  23. Embed vs Top-Level Collections? • Major question of MongoDB schema

    design • If you can ask the question at all, you might want to err on the side of embedding • Don’t embed if the embedding could get huge • Don’t feel too bad about denormalizing by embedding AND storing in a top-level collection
  24. Typical Properties of Top-Level Collections • Independence: They don’t “belong”

    conceptually to another collection • Nouns: the building blocks of your system • Easily referenceable and updatable
  25. Embedding Pros • Fast retrieval of document with related data

    • Atomic updates • “Ownership” of embedded document is obvious • Usually maps well to code structures
  26. Embedding Cons • Harder to get at, do mass queries

    • Does not size up infinitely, will hit 16MB limit • Hard to create references to embedded object • Limited ability to indexed-sort the embedded objects • Really huge objects are cumbersome and will have deserialization overhead
  27. Indexes • Indexes are a tradeoff • Keep your indexes

    as small as you can and maximize the value of the ones you do add • Only worry about index size for big (or potentially big) collections
  28. Take Advantage of Multiple-Field Indexes • If you have an

    index on {client_id: 1, email: 1 } • Then you also have the {client_id: 1} index “for free” • but not { email: 1}
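
The prefix rule can be sketched as a small function. This is a deliberately simplified model, assuming equality matches only; it ignores sorts, ranges, and everything else the real query planner considers, and `usablePrefix` is a hypothetical name.

```javascript
// Return the leading index fields that a query on `queryFields` can use.
// Simplified model: equality matches only, no sort or range handling.
function usablePrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const field of indexFields) {
    if (!wanted.has(field)) break; // a gap ends the usable prefix
    prefix.push(field);
  }
  return prefix;
}

const compound = ["client_id", "email"];
console.log(usablePrefix(compound, ["client_id"]));          // uses client_id
console.log(usablePrefix(compound, ["email"]));              // no usable prefix
console.log(usablePrefix(compound, ["email", "client_id"])); // uses both fields
```

The order of fields in the query does not matter; the order of fields in the index does.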
  29. A Fun Gotcha We Hit (Multiple-Field Indexes)

    > db.test.save( { a: 1, b: ["t1", "t2", "t3"] } );
    > db.test.save( { a: 1, b: ["t4", "t5"] } );
    > db.test.ensureIndex( { a: 1, b: 1 } );
    > db.test.find( { a: 1 } ).explain();
    • Pop quiz: what is nscanned (number of objects scanned) going to be?
  30. A Fun Gotcha We Hit (Multiple-Field Indexes)

    > db.test.save( { a: 1, b: ["t1", "t2", "t3"] } );
    > db.test.save( { a: 1, b: ["t4", "t5"] } );
    > db.test.ensureIndex( { a: 1, b: 1 } );
    > db.test.find( { a: 1 } ).explain();
    {
      "cursor" : "BtreeCursor a_1_b_1",
      "nscanned" : 5,
      "nscannedObjects" : 5,
      "n" : 2,
    • Pop quiz: what is nscanned (number of objects scanned) going to be?
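
Why 5 and not 2? An index over an array field stores one entry per array element, so the two documents above produce 3 + 2 entries under `a: 1`. The following sketch simulates that in plain JavaScript; `multikeyEntries` is a hypothetical helper for illustration, not how MongoDB builds its B-tree.

```javascript
// Build the entries a compound index on (scalarField, arrayField) would hold:
// one entry per element of the array field.
function multikeyEntries(doc, scalarField, arrayField) {
  const raw = doc[arrayField];
  const values = Array.isArray(raw) ? raw : [raw];
  return values.map((v) => ({ key: [doc[scalarField], v], doc }));
}

const docs = [
  { a: 1, b: ["t1", "t2", "t3"] },
  { a: 1, b: ["t4", "t5"] },
];

const index = docs.flatMap((d) => multikeyEntries(d, "a", "b"));

// A query on { a: 1 } walks every index entry whose first key part is 1 ...
const scanned = index.filter((entry) => entry.key[0] === 1);
// ... but only returns each matching document once.
const matched = new Set(scanned.map((entry) => entry.doc));

console.log(scanned.length, matched.size); // 5 2
```

So on a large collection with long arrays, a query on the leading field alone can scan many more index entries than the number of documents it returns.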
  31. Use your _id • You must use an _id for

    every collection, which will cost you index size • So do something useful with _id
  32. Take advantage of fast ^indexes • Messages have _ids like:

    32423.00000341 • Need all messages in blast 32423: • db.message.blast.find( { _id: /^32423\./ } ); • (Yeah, I know the \. is ugly. Don’t use a dot if you do this.)
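
A regex anchored with `^` on a literal prefix is fast because it can be treated as a range over the sorted index keys. The sketch below shows that equivalence on a sorted array of made-up string `_id` values; `prefixBounds` is a hypothetical helper, and a real index would seek to the lower bound rather than filter.

```javascript
// Turn a literal prefix into a half-open range [low, high) over sorted strings.
// Assumes the last character is not the maximum UTF-16 code unit.
function prefixBounds(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const high = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return [prefix, high];
}

// Made-up _id values in the slide's "<blast>.<sequence>" shape, sorted like an index.
const ids = [
  "32422.00000009",
  "32423.00000001",
  "32423.00000341",
  "324230.00000001",
  "32424.00000001",
].sort();

const [low, high] = prefixBounds("32423.");
const byRange = ids.filter((id) => id >= low && id < high);
const byRegex = ids.filter((id) => /^32423\./.test(id));

console.log(byRange);                           // the two ids in blast 32423
console.log(byRegex.length === byRange.length); // true
```

The escaped dot matters here: an unescaped `/^32423./` would also match `324230.00000001`, which belongs to a different blast.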