
Schema Design at Scale - Eliot Horowitz, 10gen

mongodb
October 05, 2011


MongoBoston 2011

Schema design is a critical step in making sure an application scales well. There are considerations for reads and writes, both with and without sharding. We'll go through a few use cases and examine how different schemas impact performance.


Transcript

  1. Schema
    • Single biggest performance factor
    • More choices than in an RDBMS
    • Embedding, index design, shard keys
    Wednesday, October 5, 2011
  2. Blog Post - Embedded

        {
          _id : "/post/eliot/2011-05-24/1",
          author : "eliot",
          text : "About MongoDB...",
          tags : [ "tech", "databases" ],
          comments : [
            { author : "Fred",
              date : "Sat Apr 25 2010 20:51:03 GMT-0700",
              text : "Best Post Ever!" }
          ]
        }
  3. Blog Post - Not Embedded

        blog.posts
        {
          _id : "/post/eliot/2011-05-24/1",
          author : "eliot",
          text : "About MongoDB...",
          tags : [ "tech", "databases" ]
        }

        blog.comments
        {
          post : "/post/eliot/2011-05-24/1",
          author : "Fred",
          date : "May 24 2011",
          text : "Best Post Ever!"
        }
  4. Embedding
    • Great for read performance
    • One seek to load entire object
    • One roundtrip to database
    • Writes can be slow if adding to objects all the time
    • Should you embed comments?
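The roundtrip argument on this slide can be sketched with a toy in-memory model. Everything below is an assumption made for illustration: `makeCollection`, `loadEmbedded`, and `loadSeparate` are hypothetical helpers, and the counter only stands in for database roundtrips; it does not measure a real MongoDB server.

```javascript
// Toy in-memory "collections"; the counter stands in for database roundtrips.
let roundtrips = 0;

function makeCollection(docs) {
  return {
    find(predicate) {
      roundtrips += 1; // every find() counts as one trip to the "database"
      return docs.filter(predicate);
    },
  };
}

const postId = "/post/eliot/2011-05-24/1";
const comment = { author: "Fred", date: "May 24 2011", text: "Best Post Ever!" };

// Embedded: the comments live inside the post document.
const embeddedPosts = makeCollection([
  { _id: postId, author: "eliot", comments: [comment] },
]);

// Not embedded: comments sit in their own collection and point at the post.
const posts = makeCollection([{ _id: postId, author: "eliot" }]);
const comments = makeCollection([{ post: postId, ...comment }]);

function loadEmbedded(id) {
  const [post] = embeddedPosts.find((d) => d._id === id);
  return post;
}

function loadSeparate(id) {
  const [post] = posts.find((d) => d._id === id);
  const postComments = comments.find((c) => c.post === id);
  return { ...post, comments: postComments };
}

roundtrips = 0;
loadEmbedded(postId);
const embeddedTrips = roundtrips;

roundtrips = 0;
loadSeparate(postId);
const separateTrips = roundtrips;

console.log({ embeddedTrips, separateTrips });
```

In this model the embedded layout needs one lookup and the separate layout needs two, which is the read-side advantage the slide describes. The write-side cost of growing documents is not modeled here.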
  5. Blog Post - Hybrid

        blog.comments
        {
          _id : "/post/eliot/2011-05-24/1---1",
          comments : [
            { author : "Fred", date : "May 24 2011", text : "Best Post Ever!" },
            { author : "Bob", date : "May 24 2011", text : "Awesome" }
          ]
        }
  6. Adding a Comment

        id = "/post/eliot/2011-05-24/1";
        post = db.blog.posts.findOne( { _id : id } );
        // bucket id: post id + "---" + bucket number, 100 comments per bucket
        cid = id + "---" + Math.floor( post.numComments / 100 );
        db.blog.comments.update( { _id : cid } , { $push : { comments : theComment } } );
        db.blog.posts.update( { _id : id } , { $inc : { numComments : 1 } } );
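The bucket id arithmetic from this slide can be isolated into a small function. This is a minimal sketch: `commentBucketId` and `BUCKET_SIZE` are names introduced here for illustration, and the bucket size of 100 is taken from the slide.

```javascript
// Comments per bucket document, as used on the slide.
const BUCKET_SIZE = 100;

// Returns the _id of the bucket document that the next comment should be
// pushed into, given how many comments the post already has.
function commentBucketId(postId, numComments, bucketSize = BUCKET_SIZE) {
  return postId + "---" + Math.floor(numComments / bucketSize);
}

const id = "/post/eliot/2011-05-24/1";
console.log(commentBucketId(id, 0));   // "/post/eliot/2011-05-24/1---0"
console.log(commentBucketId(id, 99));  // "/post/eliot/2011-05-24/1---0"
console.log(commentBucketId(id, 100)); // "/post/eliot/2011-05-24/1---1"
```

One thing the slide does not show: when a post crosses into a new bucket, the bucket document does not exist yet, so the `$push` update presumably has to run as an upsert. That is an assumption about the intended usage, not something stated in the deck.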
  7. Getting All Comments

        // prefix match on _id finds every comment bucket for the post
        cursor = db.blog.comments.find( { _id : /^\/post\/eliot\/2011\-05\-24\/1\-\-\-/ } );
        while ( cursor.hasNext() ) {
            doc = cursor.next();
            for ( var i=0; i<doc.comments.length; i++ )
                printjson( doc.comments[i] );
        }
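Writing the escaped prefix regex by hand is error-prone, so it may help to build it from the post id. The sketch below runs against plain arrays rather than a database, and `escapeRegExp`, `bucketPrefixRegExp`, `bucketNumber`, and `allComments` are hypothetical helper names. It also sorts buckets by their numeric suffix, since the string ids alone would order `---10` before `---2`.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Anchored prefix regex matching every comment bucket of one post.
function bucketPrefixRegExp(postId) {
  return new RegExp("^" + escapeRegExp(postId + "---"));
}

// Numeric bucket suffix, e.g. 10 for ".../1---10".
function bucketNumber(bucketId) {
  return Number(bucketId.slice(bucketId.lastIndexOf("---") + 3));
}

// Collect all comments of a post from its bucket documents, oldest bucket first.
function allComments(bucketDocs, postId) {
  const re = bucketPrefixRegExp(postId);
  return bucketDocs
    .filter((doc) => re.test(doc._id))
    .sort((a, b) => bucketNumber(a._id) - bucketNumber(b._id))
    .flatMap((doc) => doc.comments);
}

const buckets = [
  { _id: "/post/eliot/2011-05-24/1---10", comments: ["c"] },
  { _id: "/post/eliot/2011-05-24/1---0", comments: ["a"] },
  { _id: "/post/eliot/2011-05-24/1---2", comments: ["b"] },
  { _id: "/post/eliot/2011-05-24/10---0", comments: ["other post"] },
];

console.log(allComments(buckets, "/post/eliot/2011-05-24/1")); // [ 'a', 'b', 'c' ]
```

The `---` separator is what keeps post `1` from matching the buckets of post `10`; a bare prefix on the post id would match both.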
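  8. Deleting a Comment

        // $pull removes every element of the comments array that matches;
        // upsert = false, multi = true so that all buckets of the post are checked
        db.blog.comments.update( { _id : /^\/post\/eliot\/2011\-05\-24\/1\-\-\-/ } ,
                                 { $pull : { comments : { author : "eliot" } } } ,
                                 false , true )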
  9. Indexes
    • Index common queries
    • Make sure there aren’t duplicates: (A) and (A,B) aren’t both needed
    • Right-balanced indexes keep working set small
  10. Random Index Access
    Have to keep entire index in RAM
    • email address
    • hash
  11. Right-Balanced Index Access
    Only have to keep small portion in RAM
    • Time Based
    • ObjectId
    • Auto Increment
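The difference between random and right-balanced index access can be illustrated with a rough simulation. This is a toy model under stated assumptions, not how MongoDB's B-tree works internally: the index is a sorted array, a "page" is a run of `PAGE_SIZE` consecutive positions, and `hashKey` is a simple multiplicative hash standing in for any hash-like key.

```javascript
const PAGE_SIZE = 100; // keys per simulated index page (arbitrary choice)

// First position in the sorted array whose key is >= the given key.
function lowerBound(sorted, key) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Insert newKeys into a sorted index and count the distinct pages written to.
function pagesTouched(existingKeys, newKeys) {
  const index = [...existingKeys].sort((a, b) => a - b);
  const pages = new Set();
  for (const key of newKeys) {
    const pos = lowerBound(index, key);
    index.splice(pos, 0, key);
    pages.add(Math.floor(pos / PAGE_SIZE));
  }
  return pages.size;
}

// Stand-in for a hash: scatters consecutive integers over the 32-bit range.
function hashKey(n) {
  return Math.imul(n, 2654435761) >>> 0;
}

const range = (start, count) => Array.from({ length: count }, (_, i) => start + i);
const existing = range(0, 10000);   // keys already in the index
const incoming = range(10000, 1000); // keys being inserted

// Auto-increment style keys: every insert lands at the right edge.
const sequentialPages = pagesTouched(existing, incoming);

// Hash-style keys: inserts land all over the index.
const hashedPages = pagesTouched(existing.map(hashKey), incoming.map(hashKey));

console.log({ sequentialPages, hashedPages });
```

With increasing keys, the 1,000 inserts only touch the handful of pages at the right edge of the index, while the hashed keys touch pages across the whole index. That is the working-set difference the two slides contrast.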
  12. Covered Indexes
    • Keep data sequential in index
    • find( { email : "[email protected]" } , { first : 1 , last : 1 , state : 1 } )
    • index: { email : 1 , first : 1 , last : 1 , state : 1 }
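Whether a query can be answered from the index alone comes down to a field check, sketched below. `isCovered` is a hypothetical helper that only handles top-level fields and inclusion-style projections, and the email value is a placeholder because the address in the transcript is obscured. Note the `_id` handling: a projection returns `_id` by default, so in this model the query only counts as covered when `_id` is excluded explicitly or is part of the index.

```javascript
// Rough check: is every field the query needs present in the index key?
function isCovered(filter, projection, indexKey) {
  const indexed = new Set(Object.keys(indexKey));
  const filterFields = Object.keys(filter);
  // Fields the projection asks for (entries set to 0/false are exclusions).
  const returned = Object.keys(projection).filter((f) => projection[f]);
  // _id comes back by default unless the projection excludes it.
  if (projection._id === undefined) returned.push("_id");
  return [...filterFields, ...returned].every((f) => indexed.has(f));
}

const index = { email: 1, first: 1, last: 1, state: 1 };
const filter = { email: "user@example.com" }; // placeholder address

const asOnSlide = isCovered(filter, { first: 1, last: 1, state: 1 }, index);
const withoutId = isCovered(filter, { first: 1, last: 1, state: 1, _id: 0 }, index);
const extraField = isCovered(filter, { first: 1, zip: 1, _id: 0 }, index);

console.log({ asOnSlide, withoutId, extraField });
// { asOnSlide: false, withoutId: true, extraField: false }
```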
  13. Choosing a Shard Key
    • Shard key determines how data is partitioned
    • Hard to change
    • Most important performance decision
  14. Range Based
    • collection is broken into chunks by range
    • chunks default to 64 MB or 100,000 objects
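Range-based partitioning amounts to looking up which chunk's key range contains a given shard key value. The sketch below assumes a hypothetical chunk table over an email shard key, with `null` standing in for an unbounded lower or upper end; the boundaries and shard names are made up for illustration.

```javascript
// Hypothetical chunk table: each chunk owns the key range [min, max).
// null means "unbounded" at that end.
const chunks = [
  { min: null, max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: null, shard: "shard2" },
];

function inChunk(chunk, key) {
  const aboveMin = chunk.min === null || chunk.min <= key;
  const belowMax = chunk.max === null || key < chunk.max;
  return aboveMin && belowMax;
}

// Find the single chunk responsible for a shard key value.
function findChunk(chunkTable, key) {
  return chunkTable.find((chunk) => inChunk(chunk, key));
}

console.log(findChunk(chunks, "alice@example.com").shard);   // shard0
console.log(findChunk(chunks, "mallory@example.com").shard); // shard1
console.log(findChunk(chunks, "zed@example.com").shard);     // shard2
```

Because the ranges do not overlap and together span the whole key space, every key maps to exactly one chunk.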
  15. Use Case: User Profiles

        { email : "[email protected]" , addresses : [ { state : "NY" } ] }

    • Shard by email
    • Lookup by email hits 1 node
    • Index on { "addresses.state" : 1 }
  16. Use Case: Activity Stream

        { user_id : XXX, event_id : YYY , data : ZZZ }

    • Shard by user_id
    • Looking up an activity stream hits 1 node
    • Writing events is distributed
    • Index on { "event_id" : 1 } for deletes
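Why a stream lookup hits one node while a delete by `event_id` needs its own index can be sketched as a routing decision. The cluster layout below is hypothetical (three shards, numeric `user_id` ranges), and `shardsForQuery` is an illustrative helper, not a MongoDB API.

```javascript
// Hypothetical cluster: user_id ranges [min, max) mapped to shards.
const chunks = [
  { min: 0, max: 1000, shard: "shard0" },
  { min: 1000, max: 2000, shard: "shard1" },
  { min: 2000, max: 3000, shard: "shard2" },
];
const SHARD_KEY = "user_id";

// Which shards must be asked to answer an equality query?
function shardsForQuery(query) {
  if (!(SHARD_KEY in query)) {
    // No shard key in the query: every shard has to be asked.
    return [...new Set(chunks.map((c) => c.shard))];
  }
  const value = query[SHARD_KEY];
  return chunks
    .filter((c) => c.min <= value && value < c.max)
    .map((c) => c.shard);
}

// Reading one user's stream is targeted at a single shard.
const streamLookup = shardsForQuery({ user_id: 1234 });

// Deleting by event_id alone is sent to all shards.
const deleteByEvent = shardsForQuery({ event_id: 42 });

// Writes from many different users spread over the shards.
const writesPerShard = {};
for (let userId = 0; userId < 3000; userId += 250) {
  const [shard] = shardsForQuery({ user_id: userId });
  writesPerShard[shard] = (writesPerShard[shard] || 0) + 1;
}

console.log({ streamLookup, deleteByEvent, writesPerShard });
```

In this model the delete by `event_id` reaches every shard, and each shard then has to find the event locally, which is presumably why the slide adds an index on `event_id`.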
  17. Use Case: Photos

        { photo_id : ???? , data : <binary> }

    What’s the right key?
    • auto increment
    • MD5( data )
    • now() + MD5(data)
    • month() + MD5(data)
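To make the four candidates concrete, the sketch below computes each of them for one piece of data using Node's built-in `crypto` module. The key formats (a `-` separator, a `YYYY-MM` month string, a millisecond timestamp) and the in-memory counter are assumptions for illustration; the slide does not specify them.

```javascript
const crypto = require("crypto");

function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

let counter = 0; // stand-in for an auto-increment sequence

// Build all four candidate photo_id values from the slide for one photo.
function candidateKeys(data, now = new Date()) {
  const hash = md5Hex(data);
  const month = now.toISOString().slice(0, 7); // e.g. "2011-10"
  return {
    autoIncrement: ++counter,              // always increasing
    md5: hash,                             // effectively random
    nowPlusMd5: now.getTime() + "-" + hash, // increasing, hash breaks ties
    monthPlusMd5: month + "-" + hash,       // coarse time prefix, random within a month
  };
}

const keys = candidateKeys("hello", new Date(Date.UTC(2011, 9, 5)));
console.log(keys);
```

One way to read the trade-off, tying it back to the index slides: increasing keys keep index access right-balanced but send every insert to the same end of the key range, a pure hash spreads writes but makes index access random, and a coarse time prefix in front of the hash sits in between.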
  18. Use Case: Logging

        { machine : "app.foo.com" , app : "apache" , when : "2010-12-02:11:33:14" , data : XXX }

    Possible Shard keys
    • { machine : 1 }
    • { when : 1 }
    • { app : 1 }
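Two properties that are useful when comparing these candidates are how many distinct values a field takes and whether its values only ever increase. The sketch below measures both on a made-up sample; the machine names, the sample itself, and the helper names are hypothetical.

```javascript
// Hypothetical sample of log documents, in arrival order.
const sampleLogs = [
  { machine: "app2.foo.com", app: "apache", when: "2010-12-02:11:33:14" },
  { machine: "app1.foo.com", app: "mysql", when: "2010-12-02:11:33:15" },
  { machine: "app3.foo.com", app: "apache", when: "2010-12-02:11:33:16" },
  { machine: "app1.foo.com", app: "apache", when: "2010-12-02:11:33:17" },
];

// Number of distinct values a field takes in the sample.
function cardinality(docs, field) {
  return new Set(docs.map((d) => d[field])).size;
}

// True if the field never decreases from one document to the next.
function isMonotonic(docs, field) {
  return docs.every((d, i) => i === 0 || docs[i - 1][field] <= d[field]);
}

const summary = {};
for (const field of ["machine", "when", "app"]) {
  summary[field] = {
    cardinality: cardinality(sampleLogs, field),
    monotonic: isMonotonic(sampleLogs, field),
  };
}

console.log(summary);
```

As a rough guide, a field with few distinct values (here `app`, and to a lesser degree `machine`) limits how finely the data can be split into chunks, while an always-increasing field (here `when`) directs every new insert to the chunk holding the highest range. Which of these matters more depends on the read pattern, which the slide leaves open.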
  19. Download MongoDB
    http://www.mongodb.org
    and let us know what you think
    @eliothorowitz @mongodb
    10gen is hiring! http://www.10gen.com/jobs