
mongodb + ex.fm @ MongoPGH 2012

_id, padding factor, and bucketing, oh my! Slides from my talk at MongoPGH (http://www.10gen.com/events/mongodb-pgh), May 15, 2012.

Discussion on Hacker News: http://news.ycombinator.com/item?id=3980091

Lucas Hrabovsky

May 15, 2012

Transcript

  1. _id and indexes
    •  Bad Ideas
      – ObjectId("4fb284…")
      – Big Compound Indexes
      – Long, Variable Width Strings Miss Indexes
    •  Good Ideas
      – Make _id mean something
      – Fixed Width Hashes
      – Use _id as a compound index
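
The "Good Ideas" on this slide can be sketched in plain JavaScript for Node.js. This is a minimal illustration, not ex.fm's actual code: the helper names `fixedWidthHash` and `makeId`, the field order, and the choice of MD5 (picked only because it always yields a 32-character hex string) are all assumptions.

```javascript
const crypto = require("crypto");

// Hypothetical helper: hash a variable-length string (a URL, say) down to a
// fixed-width, 32-character hex digest so every key has the same length.
function fixedWidthHash(value) {
  return crypto.createHash("md5").update(value).digest("hex");
}

// Hypothetical helper: build an _id that doubles as a compound key.
// The hash prefix groups related documents; the zero-padded Unix timestamp
// keeps string order identical to chronological order within a group.
function makeId(siteUrl, unixSeconds) {
  const stamp = String(unixSeconds).padStart(10, "0");
  return `${fixedWidthHash(siteUrl)}-${stamp}`;
}

const older = makeId("http://example.com/", 1337029112);
const newer = makeId("http://example.com/", 1337029114);
console.log(older.length, older < newer);
```

Because every part of the key has a fixed width, the default `_id` index can serve both "all documents for this site" and "newest first" without a second compound index.
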
  2. activity feeds: first attempt
    db.user.feed.find({'username': 'lucas', 'verb': 'love'}).sort({'created': -1})
    {"_id": "201109122304-lucas-dan-c7dede43…", "username": "lucas", "created": 201109122304, "actor": "dan", "verb": "love"}
    Working just fine for 4MM documents, but getting slow…
  3. new version of activity feeds
    db.user.feed.find({'vid': /^lucas-/}).sort({'vid': -1})
    {"_id": "201109122304-lucas-dan-c7dede43…", "uid": "lucas-201109122304", "vid": "lucas-love-201109122304", "actor": "dan"}
    Fast for all 3 use cases!
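
A rough sketch of how the anchored prefix query on this slide could be built safely from a username. The helpers `escapeRegex` and `feedQuery` are hypothetical, and the array stands in for the collection so the example runs without a database. The leading `^` matters: MongoDB can use an index efficiently for a case-sensitive regex only when it is anchored to the start of the string.

```javascript
// Hypothetical helper: escape regex metacharacters so a username such as
// "a.b" cannot change the meaning of the pattern.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the filter and sort for "this user's feed, newest first".
function feedQuery(username) {
  return {
    filter: { vid: new RegExp("^" + escapeRegex(username) + "-") },
    sort: { vid: -1 },
  };
}

// In-memory stand-in for the collection (sample values are made up).
const docs = [
  { vid: "lucas-love-201109122304", actor: "dan" },
  { vid: "lucas-love-201109130915", actor: "dan" },
  { vid: "dan-love-201109122250", actor: "lucas" },
];

const { filter } = feedQuery("lucas");
const matches = docs
  .filter((d) => filter.vid.test(d.vid))
  .sort((a, b) => (a.vid < b.vid ? 1 : a.vid > b.vid ? -1 : 0));
console.log(matches.map((d) => d.vid));
```

With a real driver the same object would be passed as `find(q.filter).sort(q.sort)`. One caveat of any separator-based key: a username that itself contains `-` could collide with another user's prefix.
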
  4. padding factor
    •  Variable document size
    •  Allocate for the latest and fattest
    •  Document moves
    •  Can be very inefficient
    •  More RAM!
    •  Pre-allocate to prevent moves
  5. unbounded embedded lists
    •  Useful for followers, favorites
    •  Good for a few things, bad for lots
    •  Constantly bumping up padding factor
    •  Lots of document moves
  6. a metaphor
    •  You run a coffee shop and can buy only one size of cup. Which size do you buy?
    •  On average, each customer has only one cup
    •  Heavy drinkers have hundreds of cups
    credit: Macintex, macintex.deviantart.com
  7. bucketing!
    •  Split list across multiple documents
    •  Median number of items = bucket size
    •  Pre-allocate
    •  Easy seeking and traversal
    •  Much faster
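
The bucketing idea on this slide can be sketched as follows. The `_id` scheme `<ownerId>-<bucketNumber>`, the field names, and the helper names are assumptions made for illustration; the only part taken from the slide is that the bucket size is the median list length. Pre-allocation is not shown.

```javascript
// Median of the observed list lengths; used as the bucket size.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : Math.round((sorted[mid - 1] + sorted[mid]) / 2);
}

// Split one owner's list into fixed-size bucket documents.
function toBuckets(ownerId, items, bucketSize) {
  if (bucketSize < 1) throw new RangeError("bucketSize must be at least 1");
  const buckets = [];
  for (let start = 0; start < items.length; start += bucketSize) {
    buckets.push({
      _id: `${ownerId}-${buckets.length + 1}`,
      count: Math.min(bucketSize, items.length - start),
      items: items.slice(start, start + bucketSize),
    });
  }
  return buckets;
}

// Seeking: which bucket and offset hold the item at a given list position?
function locate(ownerId, position, bucketSize) {
  return {
    _id: `${ownerId}-${Math.floor(position / bucketSize) + 1}`,
    offset: position % bucketSize,
  };
}

// Most lists are short, one is huge: the median ignores the outlier.
const size = median([1, 2, 3, 5, 400]);
const buckets = toBuckets("site42", [10, 11, 12, 13, 14, 15, 16], size);
const where = locate("site42", 4, size);
console.log(size, buckets.length, where);
```

Because every bucket except the last holds exactly `size` items, finding the nth item is arithmetic on the key rather than a scan, which is one way to read "easy seeking and traversal".
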
  8. hey charts!
    [Chart: storage for site.meta 1, site.songs 1, site.songs 2, and site.meta 2. Legend: allocated and unused; allocated and full of data.]
  9. same charts when using bucketing
    [Chart: storage for site.meta 1, site.songs 1 - 1, site.songs 1 - 2, site.songs 2 - 1 through site.songs 2 - 6, and site.meta 2. Legend: allocated and unused; allocated and full of data.]
  10. doesn’t work for everything…
    •  Picking right bucket size
    •  Defragging
    •  Random insertion
      – Easy for things you don’t much care about the order of
      – More difficult if you’re going to insert and change the order later
  11. micro documents
    db.site.songs.find({_id: /^bfc25de08d964a8a41226c6016dd7753-/}).sort({_id: -1})
    { "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029114", "s" : 18436532 }
    { "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029113", "s" : 18804590 }
    { "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029112", "s" : 18804591 }
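
A sketch of paging through micro documents like the ones on this slide, newest first. It swaps the anchored regex for an equivalent range on `_id`, which is an assumption of this example rather than something the slide shows; the helpers `microDoc`, `pageFilter`, and `findPage` are hypothetical, and `findPage` is an in-memory stand-in for `find(filter).sort({_id: -1}).limit(n)`.

```javascript
// One "micro document" per list entry, keyed "<32-char site hash>-<unix seconds>".
function microDoc(siteHash, unixSeconds, songId) {
  return { _id: `${siteHash}-${unixSeconds}`, s: songId };
}

// Every _id for a site sorts between "<hash>-" and "<hash>." because "." is the
// character right after "-" in ASCII. Passing the last _id already seen as
// beforeId narrows the range to the next (older) page.
function pageFilter(siteHash, beforeId) {
  return { _id: { $gte: `${siteHash}-`, $lt: beforeId || `${siteHash}.` } };
}

// In-memory stand-in for find(filter).sort({_id: -1}).limit(limit).
function findPage(docs, filter, limit) {
  const { $gte, $lt } = filter._id;
  return docs
    .filter((d) => d._id >= $gte && d._id < $lt)
    .sort((a, b) => (a._id < b._id ? 1 : -1))
    .slice(0, limit);
}

const site = "bfc25de08d964a8a41226c6016dd7753";
const songs = [
  microDoc(site, 1337029112, 18804591),
  microDoc(site, 1337029114, 18436532),
  microDoc(site, 1337029113, 18804590),
  microDoc("c".repeat(32), 1337029115, 1), // a different site, must be skipped
];

const page1 = findPage(songs, pageFilter(site), 2);
const page2 = findPage(songs, pageFilter(site, page1[page1.length - 1]._id), 2);
console.log(page1.map((d) => d.s), page2.map((d) => d.s));
```

String order only matches time order while the timestamps have the same number of digits, which holds for 10-digit Unix seconds (roughly 2001 through 2286).
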
  12. paying it back
    •  Bent mongoengine to make this easy
    •  Follow github.com/exfm
    •  Also added tooling to
      – Trace all queries
      – Aggregate tracing by request middleware
      – Raise exceptions when queries miss an index
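
The last bullet, raising an exception when a query misses an index, might look roughly like the guard below. This is not the ex.fm tooling (which the slide says lives in their mongoengine work); `usesCollectionScan` and `assertUsesIndex` are hypothetical names, and the two `explain()` shapes handled here are assumptions about the server version: the 2.x-era `cursor` field and the newer `queryPlanner.winningPlan` tree.

```javascript
// Return true when an explain() result describes a full collection scan.
function usesCollectionScan(explain) {
  // MongoDB 2.x-era output: cursor is "BasicCursor" when no index is used.
  if (typeof explain.cursor === "string") {
    return explain.cursor === "BasicCursor";
  }
  // Newer output: walk the winning plan looking for a COLLSCAN stage.
  const stack = [explain.queryPlanner && explain.queryPlanner.winningPlan];
  while (stack.length > 0) {
    const stage = stack.pop();
    if (!stage) continue;
    if (stage.stage === "COLLSCAN") return true;
    stack.push(stage.inputStage, ...(stage.inputStages || []));
  }
  return false;
}

// Throw so that a missed index fails loudly in development and tests.
function assertUsesIndex(explain, label) {
  if (usesCollectionScan(explain)) {
    throw new Error(`query missed an index: ${label}`);
  }
}

// Passes silently: this explain result reports an index cursor.
assertUsesIndex({ cursor: "BtreeCursor _id_" }, "site.songs by _id prefix");
```

A guard like this is typically enabled only outside production, since running `explain()` on every query adds overhead.
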