Schema Design - Kevin Hanson, 10gen

mongodb
January 26, 2012

MongoDB Los Angeles 2012

One of the challenges that comes with moving to MongoDB is figuring out how best to model your data. While most developers have internalized the rules of thumb for designing schemas for RDBMSs, these rules don't always apply to MongoDB. The simple fact that documents can represent rich, schema-free data structures means that we have a lot of viable alternatives to the standard, normalized, relational model. Not only that, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense.

Transcript

  1. Agenda
     • mongoDB Data Model
     • Parallels with RDBMS
     • Embedding v. Linking
     • Denormalization
     • Managing Arrays
     • Schema Decisions when Sharding
     Monday, January 23, 12
  2. mongoDB Data Model: Rich Documents
     {
       title: 'Who Needs Rows?',
       reasons: [
         { name: 'scalability', desc: 'no more joins!' },
         { name: 'human readable', desc: 'ah this is nice...' }
       ],
       model: { relational: false, awesome: true }
     }
  3. Parallels
     RDBMS    MongoDB
     Table    Collection
     Row      Document
     Column   Field
     Index    Index
     Join     Embedding & Linking
     Schema   Object
  4. How Should the Documents Look? What Are We Going to Do with the Data?
     To embed or to link... That is the question!
  5. 1) Fully Embedded
     {
       'blog-title': 'Commuting to Work',
       'blog-text': [
         'This section is about airplanes',
         'this section is about trains'
       ],
       comments: [
         { author: 'Kevin Hanson', comment: 'dude, what about driving?' },
         { author: 'John Smith', comment: 'this blog is aWful!!11!!!!' }
       ]
     }
  6. 1) Fully Embedded
     Pros
     • Can query the comments or the blog for results
     • Cleanly encapsulated
     Cons
     • What if we get too many comments? (16MB mongoDB doc size)
     • What if we want our results to be comments, not blog posts?
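
The second con can be illustrated without a database. The following is a minimal sketch in plain JavaScript, using an in-memory array as a stand-in for a collection; the second post and the helper names `postsWithCommentBy` and `commentsBy` are hypothetical and not part of the talk. It shows that matching on an embedded comment selects whole posts, so getting only the comments back means flattening the arrays in application code.

```javascript
// Hypothetical in-memory stand-in for a collection of fully embedded posts.
const posts = [
  {
    "blog-title": "Commuting to Work",
    comments: [
      { author: "Kevin Hanson", comment: "dude, what about driving?" },
      { author: "John Smith", comment: "this blog is aWful!!11!!!!" },
    ],
  },
  {
    "blog-title": "Working from Home", // hypothetical second post
    comments: [{ author: "Kevin Hanson", comment: "no commute at all" }],
  },
];

// Matching on an embedded comment field selects whole posts,
// including comments written by other authors.
function postsWithCommentBy(collection, author) {
  return collection.filter((post) =>
    post.comments.some((c) => c.author === author)
  );
}

// To get comments as results, the application has to flatten
// the embedded arrays and filter them itself.
function commentsBy(collection, author) {
  return collection.flatMap((post) =>
    post.comments
      .filter((c) => c.author === author)
      .map((c) => ({ post: post["blog-title"], ...c }))
  );
}

const matchedPosts = postsWithCommentBy(posts, "John Smith");
const kevinComments = commentsBy(posts, "Kevin Hanson");
console.log(matchedPosts.length, kevinComments.length);
```
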
  7. 2) Separating Blog & Comments
     {
       _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
       'comment-ref': ObjectId("4c4ba5c0672c685e5e8aabf4"),
       'blog-title': 'Commuting to Work',
       'blog-text': [
         'This section is about airplanes',
         'this section is about trains'
       ]
     }
     {
       _id: ObjectId("4c4ba5c0672c685e5e8aabf4"),
       'blog-ref': ObjectId("4c4ba5c0672c685e5e8aabf3"),
       comments: [
         { author: 'Kevin Hanson', comment: 'dude, what about driving?' },
         { author: 'John Smith', comment: 'this blog is aWful!!11!!!!' }
       ]
     }
  8. 2) Separating Blog & Comments
     Pros
     • Blog Post Size Stays Constant
     • Can Search Sets of Comments
     Cons
     • Too Many Comments? (same problem)
     • Managing Document Links
  9. 3) Each Comment Gets Own Doc
     {
       'blog-title': 'Commuting to Work',
       'blog-text': [
         'This section is about airplanes',
         'this section is about trains'
       ]
     }
     { commenter: 'Kevin Hanson', comment: 'dude, what about driving?' }
     { commenter: 'John Smith', comment: 'this blog is aWful!!11!!!!' }
  10. 3) Each Comment Gets Own Doc
     Pros
     • Can Query Individual Comments
     • Never Need to Worry About Doc Size
     Cons
     • Many Documents
     • Standard Use Cases Become Complicated
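
To make "Standard Use Cases Become Complicated" concrete, here is a small sketch in plain JavaScript with in-memory arrays standing in for two collections. The slides do not show how a comment document points back to its post, so the `_id` and `postId` fields and the function names are assumptions made for illustration. Rendering one post with its comments now takes two lookups that the application has to stitch together, while querying individual comments becomes a single filter.

```javascript
// Hypothetical stand-ins for a posts collection and a comments collection.
// `_id` and `postId` are assumed link fields, not shown on the slides.
const blogPosts = [{ _id: 1, "blog-title": "Commuting to Work" }];
const comments = [
  { postId: 1, commenter: "Kevin Hanson", comment: "dude, what about driving?" },
  { postId: 1, commenter: "John Smith", comment: "this blog is aWful!!11!!!!" },
];

// The standard use case (show a post with its comments) needs two
// lookups plus an application-side merge.
function loadPostWithComments(postId) {
  const post = blogPosts.find((p) => p._id === postId); // lookup 1
  if (!post) return null;
  const postComments = comments.filter((c) => c.postId === postId); // lookup 2
  return { ...post, comments: postComments };
}

// Querying individual comments, on the other hand, is a single lookup.
function commentsByCommenter(name) {
  return comments.filter((c) => c.commenter === name);
}

const page = loadPostWithComments(1);
const johns = commentsByCommenter("John Smith");
console.log(page.comments.length, johns.length);
```
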
  11. More on Denormalization
     • Faster Than Normalized
     • More Object Oriented
     • Application Level Implications
     For example...
     Blog Post 1: Paul enters his name as “Paul”
     Blog Post 2: Paul enters his name as “Paul Pedersen”
     We want to avoid this.
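
The "Application Level Implications" can be sketched as follows. This is an illustrative example in plain JavaScript, assuming each post carries a copied author name alongside a stable author identifier; the `authorId` and `authorName` fields and both helper functions are hypothetical. Because the name is duplicated into every post, the application is responsible for detecting drift and for updating every copy.

```javascript
// Hypothetical denormalized posts: the author's display name is copied
// into each post. `authorId` and `authorName` are assumed field names.
const denormalizedPosts = [
  { _id: 1, authorId: "paul", authorName: "Paul" },
  { _id: 2, authorId: "paul", authorName: "Paul Pedersen" },
];

// Find authors whose copied name differs between posts.
function inconsistentAuthors(posts) {
  const namesById = new Map();
  for (const p of posts) {
    if (!namesById.has(p.authorId)) namesById.set(p.authorId, new Set());
    namesById.get(p.authorId).add(p.authorName);
  }
  return [...namesById]
    .filter(([, names]) => names.size > 1)
    .map(([id]) => id);
}

// Keeping the copies consistent means touching every post by that author.
function renameAuthor(posts, authorId, newName) {
  let updated = 0;
  for (const p of posts) {
    if (p.authorId === authorId && p.authorName !== newName) {
      p.authorName = newName;
      updated++;
    }
  }
  return updated;
}

const before = inconsistentAuthors(denormalizedPosts);
const touched = renameAuthor(denormalizedPosts, "paul", "Paul Pedersen");
const after = inconsistentAuthors(denormalizedPosts);
console.log(before, touched, after);
```
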
  12. Managing Arrays
     Pushing to an Array Infinitely...
     • Document Will Grow Larger than Pre-Allocated Size
     • Document May Exceed Max Doc Size of 16MB
     Can this be avoided??
     • Yes!
     • A Hybrid of Linking and Embedding
     • Somewhere Between Methods 2 and 3
  13. 2.5) Limiting Array Length
     {
       start: '1', end: '3', full: 'true',
       comments: [
         { author: 'Kevin Hanson', comment: 'dude, what about driving?' },
         { author: 'Sally Smith', comment: 'this blog is aWful!!11!!!!' },
         { author: 'Kathryn Fong', comment: 'I can comment!' }
       ]
     }
     {
       start: '4', end: '6', full: 'false',
       comments: [
         { author: 'Professor Man', comment: 'Schema design is intellectually stimulating!' }
       ]
     }
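
The bucketing idea above can be sketched in a few lines. This is a minimal in-memory illustration in plain JavaScript, not the speaker's implementation: `BUCKET_SIZE` and `addComment` are hypothetical names, and `start`, `end` and `full` are stored as numbers and a boolean rather than the strings shown on the slide. A comment goes into the newest bucket unless that bucket is full, in which case a fresh bucket covering the next range is created first.

```javascript
// Assumed bucket capacity, matching the three-comment buckets on the slide.
const BUCKET_SIZE = 3;

// Append a comment to the newest bucket, opening a new bucket when the
// newest one is full (or when there are no buckets yet).
function addComment(buckets, comment) {
  let bucket = buckets[buckets.length - 1];
  if (!bucket || bucket.full) {
    const start = bucket ? bucket.end + 1 : 1;
    bucket = { start, end: start + BUCKET_SIZE - 1, full: false, comments: [] };
    buckets.push(bucket);
  }
  bucket.comments.push(comment);
  bucket.full = bucket.comments.length >= BUCKET_SIZE;
  return bucket;
}

const buckets = [];
for (const [author, comment] of [
  ["Kevin Hanson", "dude, what about driving?"],
  ["Sally Smith", "this blog is aWful!!11!!!!"],
  ["Kathryn Fong", "I can comment!"],
  ["Professor Man", "Schema design is intellectually stimulating!"],
]) {
  addComment(buckets, { author, comment });
}
console.log(JSON.stringify(buckets, null, 2));
```

Each bucket document stays at a bounded size, so no single document grows without limit. Against a real database the check-and-append would need to be a single atomic update rather than the separate read and write used in this sketch.
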
  14. Schema Decisions When Sharding
     [Diagram: four shards, each a replica set of one Primary and two Secondaries, holding Key Range 0..30, Key Range 31..60, Key Range 61..90 and Key Range 91..100; four MongoS routers handling Read and Write traffic; three Config servers.]
  15. Schema Decisions When Sharding
     • Can we intelligently partition data?
     • Will this partitioning create hotspots?
     • Can our partitioning actually improve overall performance?
     “Bad” Shard Key Example: Sharding on “date” field and constantly inserting most recent data...
     Possible* Good Shard Key Example: Sharding blog posts on “author” field
     *depends on usage patterns
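
The hotspot concern can be shown with a toy simulation. The sketch below is plain JavaScript and models range partitioning only in the simplest way: every boundary value, date, author name and function name is made up for illustration, and it does not reproduce how MongoDB actually splits or balances chunks. Routing a batch of new inserts by a date key sends them all to the last range, while routing by author spreads them out, provided the authors themselves are spread out.

```javascript
// Route a key to the first range whose upper bound is greater than the key;
// keys beyond every bound go to the last shard. Purely illustrative.
function shardFor(key, upperBounds) {
  const i = upperBounds.findIndex((bound) => key < bound);
  return i === -1 ? upperBounds.length : i;
}

// Count how many keys land on each shard.
function countPerShard(keys, upperBounds) {
  const counts = new Array(upperBounds.length + 1).fill(0);
  for (const k of keys) counts[shardFor(k, upperBounds)]++;
  return counts;
}

// Date shard key: boundaries come from older data, and every new insert
// carries a date later than all of them (ISO strings compare in date order).
const dateBounds = ["2012-01-08", "2012-01-15", "2012-01-22"];
const newDates = ["2012-01-23", "2012-01-24", "2012-01-25", "2012-01-26"];
const byDate = countPerShard(newDates, dateBounds);

// Author shard key: new inserts come from authors across the key space.
const authorBounds = ["g", "n", "t"];
const newAuthors = ["alice", "kevin", "paul", "zoe"];
const byAuthor = countPerShard(newAuthors, authorBounds);

console.log("by date:", byDate, "by author:", byAuthor);
```

If a single author produced most of the posts, the author key would concentrate writes as well, which is why the slide marks it as depending on usage patterns.
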
  16. @mongodb
     conferences, appearances, and meetups: http://www.10gen.com/events
     http://bit.ly/mongo> Facebook | Twitter | LinkedIn http://linkd.in/joinmongo
     More info at http://www.mongodb.org/
     We’re Hiring! http://www.10gen.com/jobs