Schema Design for MongoDB

mongodb
April 18, 2012

Kevin Hanson's Schema Design Talk



Transcript

  1. Agenda
     • mongoDB Data Model
     • Parallels with RDBMS
     • Embedding v. Linking
     • Denormalization
     • Managing Arrays
     • Schema Decisions when Sharding
     Wednesday, April 18, 12
  2. mongoDB Data Model: Rich Documents
     {
       title: 'Who Needs Rows?',
       reasons: [
         { name: 'scalability', desc: 'no more joins!' },
         { name: 'human readable', desc: 'ah this is nice...' }
       ],
       model: { relational: false, awesome: true }
     }
  3. Parallels
     RDBMS     MongoDB
     Table     Collection
     Row       Document
     Column    Field
     Index     Index
     Join      Embedding & Linking
     Schema    Object
  4. How Should the Documents Look? What Are We Going to Do with the Data? To embed or to link... That is the question!
  5. 1) Fully Embedded
     {
       'blog-title': 'Commuting to Work',
       'blog-text': [
         'This section is about airplanes',
         'this section is about trains'
       ],
       comments: [
         { author: 'Kevin Hanson', comment: 'dude, what about driving?' },
         { author: 'John Smith', comment: 'this blog is aWful!!11!!!!' }
       ]
     }
  6. 1) Fully Embedded
     Pros
     • Can query the comments or the blog for results
     • Cleanly encapsulated
  7. 1) Fully Embedded
     Pros
     • Can query the comments or the blog for results
     • Cleanly encapsulated
     Cons
     • What if we get too many comments? (16MB mongoDB doc size)
     • What if we want our results to be comments, not blog posts?
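The second con on slide 7 can be illustrated without a database. The sketch below is a minimal in-memory stand-in, not MongoDB API code: the array `posts`, the second post "Working from Home", and both helper functions are hypothetical names introduced only for this example. It shows that matching on an embedded comment selects whole posts, and that getting comments back as results means the application has to unwind the arrays itself.

```javascript
// Hypothetical in-memory stand-in for a collection of fully embedded posts.
const posts = [
  {
    title: "Commuting to Work",
    comments: [
      { author: "Kevin Hanson", comment: "dude, what about driving?" },
      { author: "John Smith", comment: "this blog is aWful!!11!!!!" },
    ],
  },
  {
    // Invented second post, only here so the filter has something to exclude.
    title: "Working from Home",
    comments: [{ author: "Kevin Hanson", comment: "no commute at all" }],
  },
];

// Matching on an embedded field selects whole posts, not single comments.
function postsCommentedOnBy(allPosts, author) {
  return allPosts.filter((p) => p.comments.some((c) => c.author === author));
}

// To get comments as the result, the caller unwinds the arrays by hand.
function commentsBy(allPosts, author) {
  return allPosts.flatMap((p) =>
    p.comments
      .filter((c) => c.author === author)
      .map((c) => ({ post: p.title, ...c }))
  );
}

console.log(postsCommentedOnBy(posts, "John Smith").map((p) => p.title));
console.log(commentsBy(posts, "Kevin Hanson"));
```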
  8. 2) Separating Blog & Comments
     {
       _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
       'comment-ref': ObjectId("4c4ba5c0672c685e5e8aabf4"),
       'blog-title': 'Commuting to Work',
       'blog-text': [
         'This section is about airplanes',
         'this section is about trains'
       ]
     }
     {
       _id: ObjectId("4c4ba5c0672c685e5e8aabf4"),
       'blog-ref': ObjectId("4c4ba5c0672c685e5e8aabf3"),
       comments: [
         { author: 'Kevin Hanson', comment: 'dude, what about driving?' },
         { author: 'John Smith', comment: 'this blog is aWful!!11!!!!' }
       ]
     }
  9. 2) Separating Blog & Comments
     Pros
     • Blog Post Size Stays Constant
     • Can Search Sets of Comments
  10. 2) Separating Blog & Comments
      Pros
      • Blog Post Size Stays Constant
      • Can Search Sets of Comments
      Cons
      • Too Many Comments? (same problem)
      • Managing Document Links
  11. 3) Each Comment Gets Own Doc
      {
        'blog-title': 'Commuting to Work',
        'blog-text': [
          'This section is about airplanes',
          'this section is about trains'
        ]
      }
      { commenter: 'Kevin Hanson', comment: 'dude, what about driving?' }
      { commenter: 'John Smith', comment: 'this blog is aWful!!11!!!!' }
  12. 3) Each Comment Gets Own Doc
      Pros
      • Can Query Individual Comments
      • Never Need to Worry About Doc Size
  13. 3) Each Comment Gets Own Doc
      Pros
      • Can Query Individual Comments
      • Never Need to Worry About Doc Size
      Cons
      • Many Documents
      • Standard Use Cases Become Complicated
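One way to read "Standard Use Cases Become Complicated" is that showing a post together with its comments now takes two lookups stitched together in application code. The sketch below assumes a link field named `postId` on each comment; the slide's documents show no link field at all, so `postId`, the numeric `_id` values, and `loadPostWithComments` are hypothetical and used only for illustration. Plain arrays stand in for the two collections.

```javascript
// Plain arrays standing in for two collections (no database involved).
const blogPosts = [{ _id: 1, title: "Commuting to Work" }];
const comments = [
  // postId is an assumed link field; it does not appear on the slide.
  { _id: 10, postId: 1, commenter: "Kevin Hanson", comment: "dude, what about driving?" },
  { _id: 11, postId: 1, commenter: "John Smith", comment: "this blog is aWful!!11!!!!" },
];

// Rendering one post now needs two lookups plus an application-level join.
function loadPostWithComments(postId) {
  const post = blogPosts.find((p) => p._id === postId); // lookup 1: the post
  if (!post) return null;
  const postComments = comments.filter((c) => c.postId === postId); // lookup 2
  return { ...post, comments: postComments };
}

console.log(loadPostWithComments(1));
```

In exchange, a single comment can be found or modified without touching the post document.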
  14. More on Denormalization
      • Faster Than Normalized
      • More Object Oriented
      • Application Level Implications
  15. More on Denormalization
      • Faster Than Normalized
      • More Object Oriented
      • Application Level Implications
      For example...
      Blog Post 1: Paul enters his name as “Paul”
      Blog Post 2: Paul enters his name as “Paul Pedersen”
      We want to avoid this.
  16. Managing Arrays
      Pushing to an Array Infinitely...
      • Document Will Grow Larger than Pre-Allocated Size
      • Document May Exceed Max Doc Size of 16MB
      Can this be avoided??
      • Yes!
      • A Hybrid of Linking and Embedding
      • Somewhere Between Methods 2 and 3
  17. 2.5) Limiting Array Length
      {
        start: '1',
        end: '3',
        full: 'true',
        comments: [
          { author: 'Kevin Hanson', comment: 'dude, what about driving?' },
          { author: 'Sally Smith', comment: 'this blog is aWful!!11!!!!' },
          { author: 'Kathryn Fong', comment: 'I can comment!' }
        ]
      }
      {
        start: '4',
        end: '6',
        full: 'false',
        comments: [
          { author: 'Professor Man', comment: 'Schema design is intellectually stimulating!' }
        ]
      }
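The bucket layout on slide 17 implies some logic for deciding when to open a new bucket. A minimal sketch of that logic follows, as plain JavaScript over an in-memory array rather than real update operations. `BUCKET_SIZE` and `addComment` are hypothetical names, and the sketch stores `start`, `end`, and `full` as numbers and booleans where the slide uses strings.

```javascript
// Assumed bucket capacity; slide 17 shows three comments per bucket.
const BUCKET_SIZE = 3;

// Appends a comment to the last bucket, opening a new one when it is full.
function addComment(buckets, comment) {
  let last = buckets[buckets.length - 1];
  if (!last || last.full) {
    const start = last ? last.end + 1 : 1;
    last = { start, end: start + BUCKET_SIZE - 1, full: false, comments: [] };
    buckets.push(last);
  }
  last.comments.push(comment);
  last.full = last.comments.length === BUCKET_SIZE;
  return buckets;
}

const buckets = [];
addComment(buckets, { author: "Kevin Hanson", comment: "dude, what about driving?" });
addComment(buckets, { author: "Sally Smith", comment: "this blog is aWful!!11!!!!" });
addComment(buckets, { author: "Kathryn Fong", comment: "I can comment!" });
addComment(buckets, { author: "Professor Man", comment: "Schema design is intellectually stimulating!" });
console.log(JSON.stringify(buckets, null, 2));
```

Each bucket document stays bounded in size, and reading the newest comments touches only the last bucket.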
  18. Schema Decisions When Sharding
      [Diagram: four shards, each a replica set with one Primary and two Secondaries, holding Key Ranges 0..30, 31..60, 61..90, and 91..100. Reads and writes pass through MongoS processes, which are backed by three Config servers.]
  19. Schema Decisions When Sharding
      • Can we intelligently partition data?
      • Will this partitioning create hotspots?
      • Can our partitioning actually improve overall performance?
  20. Schema Decisions When Sharding
      • Can we intelligently partition data?
      • Will this partitioning create hotspots?
      • Can our partitioning actually improve overall performance?
      “Bad” Shard Key Example: Sharding on “date” field and constantly inserting most recent data...
      Possible* Good Shard Key Example... Sharding blog posts on “author” field
      *depends on usage patterns
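The hotspot on slide 20 can be shown with a toy model of range partitioning. This is a deliberate simplification that ignores chunk splitting and balancing, and it is not how MongoDB is configured. The split points, the sample keys, and the functions `shardFor` and `countPerShard` are all hypothetical. Keys that only ever increase, such as dates, all fall into the last range, while author names spread across ranges, provided the authors themselves are spread out (the slide's "depends on usage patterns").

```javascript
// Returns the index of the range holding `key`. `upperBounds` is a sorted
// list of exclusive upper bounds; the last shard is open-ended.
function shardFor(key, upperBounds) {
  const i = upperBounds.findIndex((bound) => key < bound);
  return i === -1 ? upperBounds.length : i;
}

// Counts how many keys land on each shard.
function countPerShard(keys, upperBounds) {
  const counts = new Array(upperBounds.length + 1).fill(0);
  for (const key of keys) counts[shardFor(key, upperBounds)] += 1;
  return counts;
}

// "date" key: split points come from older data, new inserts are all recent.
const dateBounds = ["2012-02-01", "2012-03-01", "2012-04-01"];
const newDates = ["2012-04-16", "2012-04-17", "2012-04-18", "2012-04-18"];
console.log(countPerShard(newDates, dateBounds)); // every insert hits the last shard

// "author" key: inserts from different authors land on different shards.
const authorBounds = ["g", "n", "t"];
const authors = ["alice", "kevin", "paul", "zoe"];
console.log(countPerShard(authors, authorBounds));
```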
  21. @mongodb conferences, appearances, and meetups http://www.10gen.com/events http://bit.ly/mongo> Facebook | Twitter | LinkedIn http://linkd.in/joinmongo More info at http://www.mongodb.org/