An Evening with MongoDB - San Diego 2012: Schema Design Principles and Practice

mongodb
July 25, 2012

Matt Shopsin, 10gen
One of the challenges that comes with moving to MongoDB is figuring out how best to model your data. While most developers have internalized the rules of thumb for designing schemas for RDBMSs, these rules don't always apply to MongoDB. The simple fact that documents can represent rich, schema-free data structures means that we have a lot of viable alternatives to the standard, normalized, relational model. Not only that, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense. Understandably, this begets good questions:

* Are foreign keys permissible, or is it better to represent one-to-many relations within a single document?
* Are join tables necessary, or is there another technique for building out many-to-many relationships?
* What level of denormalization is appropriate?
* How do my data modeling decisions affect the efficiency of updates and queries?

In this session, we'll answer these questions and more, provide a number of data modeling rules of thumb, and discuss the tradeoffs of various data modeling strategies.

Transcript

  1. Embedded Documents

        { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
          author : "roger",
          date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
          text : "Spirited Away",
          tags : [ "Tezuka", "Manga" ],
          comments : [
            { author : "Fred",
              date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
              text : "Best Movie Ever" }
          ] }
  2. Parallels

        RDBMS     MongoDB
        Table     Collection
        Row       Document
        Column    Field
        Index     Index
        Join      Embedding & Linking
        Schema    Object
  3. Relational

    User
      • Name
      • Email Address
    Category
      • Name
      • Url
    Article
      • Name
      • Slug
      • Publish date
      • Text
    Tag
      • Name
      • Url
    Comment
      • Comment
      • Date
      • Author
  4. MongoDB

    User
      • Name
      • Email Address
    Article
      • Name
      • Slug
      • Publish date
      • Text
      • Author
      Tag[]
        • Value
      Comment[]
        • Comment
        • Date
        • Author
      Category[]
        • Value
  5. 1) Fully Embedded

        {
          "blog-title": "Commuting to Work",
          "blog-text": [
            "This section is about airplanes",
            "this section is about trains"
          ],
          comments: [
            { author: "Kevin Hanson", comment: "dude, what about driving?" },
            { author: "John Smith", comment: "this blog is aWful!!11!!!!" }
          ]
        }
  6. 1) Fully Embedded

    Pros
      • Can query the comments or the blog for results
      • Cleanly encapsulated
    Cons
      • What if we get too many comments? (16MB MongoDB doc size)
      • What if we want our results to be comments, not blog posts?
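The second con can be made concrete with a small in-memory sketch in plain JavaScript. It makes no MongoDB calls; the `posts` array and the `commentsBy` helper are hypothetical names used only for illustration, and the document shape follows the fully embedded example from slide 5. Because comments live inside posts, returning comments as results means unpacking each post in application code.

```javascript
// Plain in-memory stand-in for a collection of fully embedded blog posts.
const posts = [
  {
    "blog-title": "Commuting to Work",
    comments: [
      { author: "Kevin Hanson", comment: "dude, what about driving?" },
      { author: "John Smith", comment: "this blog is aWful!!11!!!!" },
    ],
  },
];

// Hypothetical helper: return comments (not posts) written by one author.
function commentsBy(author, allPosts) {
  const found = [];
  for (const post of allPosts) {
    for (const c of post.comments || []) {
      if (c.author === author) {
        found.push({ post: post["blog-title"], comment: c.comment });
      }
    }
  }
  return found;
}

console.log(commentsBy("Kevin Hanson", posts));
```

The nested loop is the cost being pointed at: the unit of storage is the post, so any comment-centric view has to be derived from it.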
  7. 2) Each Comment Gets Own Doc

        {
          "blog-title": "Commuting to Work",
          "blog-text": [
            "This section is about airplanes",
            "this section is about trains"
          ]
        }
        { commenter: "Kevin Hanson", comment: "dude, what about driving?" }
        { commenter: "John Smith", comment: "this blog is aWful!!11!!!!" }
  8. 2) Each Comment Gets Own Doc

    Pros
      • Can Query Individual Comments
      • Never Need to Worry About Doc Size
    Cons
      • Many Documents
      • Standard Use Cases Become Complicated
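As a rough illustration of "Standard Use Cases Become Complicated", the sketch below rebuilds the ordinary "post plus its comments" view from two separate in-memory arrays. It is plain JavaScript, not MongoDB API; the `_id` and `postId` fields and the `loadPostWithComments` helper are assumptions, since slide 7 does not show how a comment refers to its post.

```javascript
// In-memory stand-ins for two collections. The _id and postId fields are
// assumptions; the slide does not show how a comment points at its post.
const blogPosts = [{ _id: 1, "blog-title": "Commuting to Work" }];
const blogComments = [
  { postId: 1, commenter: "Kevin Hanson", comment: "dude, what about driving?" },
  { postId: 1, commenter: "John Smith", comment: "this blog is aWful!!11!!!!" },
];

// Hypothetical helper: rebuild the "post plus its comments" view.
// With embedding this is one read; with linking it takes two lookups
// that the application has to stitch together itself.
function loadPostWithComments(postId) {
  const post = blogPosts.find((p) => p._id === postId);
  if (!post) return null;
  const comments = blogComments.filter((c) => c.postId === postId);
  return { ...post, comments };
}

console.log(loadPostWithComments(1).comments.length); // 2
```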
  9. Managing Arrays

    Pushing to an Array Infinitely...
      • Document Will Grow Larger than Allocated Space
      • Document May Exceed the Max Doc Size of 16MB
    Can this be avoided?
      • Yes!
      • A Hybrid of Linking and Embedding
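The slide does not spell out what the hybrid looks like. One possible reading, sketched below in plain JavaScript with no MongoDB calls, is to embed comments in "bucket" documents of bounded size that each link back to the post. The cap `MAX_EMBEDDED`, the `postId` and `page` fields, and the `addComment` helper are all assumptions made for illustration, not something the talk prescribes.

```javascript
// Assumed cap on embedded comments per bucket document (illustrative value).
const MAX_EMBEDDED = 3;

// Hypothetical helper: append a comment to the newest bucket for a post,
// starting a new bucket document once the current one is full.
function addComment(buckets, postId, comment) {
  let bucket = buckets.filter((b) => b.postId === postId).pop();
  if (!bucket || bucket.comments.length >= MAX_EMBEDDED) {
    bucket = { postId, page: bucket ? bucket.page + 1 : 0, comments: [] };
    buckets.push(bucket);
  }
  bucket.comments.push(comment);
  return bucket;
}

const commentBuckets = [];
for (let i = 1; i <= 7; i++) {
  addComment(commentBuckets, "post-1", { author: "user" + i, comment: "comment " + i });
}
console.log(commentBuckets.map((b) => b.comments.length)); // [ 3, 3, 1 ]
```

Each bucket stays small (embedding), while the `postId` field ties the buckets to their post (linking), so no single document grows without bound.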
  10. Tags, Geo Coordinates, and Tips

        {
          name: "10gen HQ",
          address: "578 Broadway, 7th Floor",
          city: "New York",
          zip: 10012,
          tags: ["MongoDB", "business"],
          latlong: [40.0, 72.0],
          tips: [
            { user: "kevin", time: "3/15/2012",
              tip: "Make sure to stop by for office hours!" }
          ]
        }
  11. Updating Tips

        // Append one tip to the embedded tips array of the matching place.
        db.places.update(
          { name: "10gen HQ" },
          { $push: { tips: {
              user: "nosh",
              time: "3/15/2012",
              tip: "stop by for office hours on Wednesdays from 4-6"
          } } }
        )
  12. Querying Places

    ★ Creating Indexes
        db.places.ensureIndex({tags: 1})
        db.places.ensureIndex({name: 1})
        db.places.ensureIndex({latlong: "2d"})
    ★ Finding Places
        db.places.find({latlong: {$near: [40, 70]}})
    ★ Regular Expressions
        db.places.find({name: /^typeaheadstring/})
    ★ Using Tags
        db.places.find({tags: "business"})
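To give a feel for why an index on an array field lets a query such as `find({tags: "business"})` avoid scanning every document, here is a conceptual sketch in plain JavaScript. It is not how MongoDB stores indexes internally; it only illustrates the idea of one index entry per array element. The `buildTagIndex` helper and the second sample document are made up for illustration.

```javascript
// Hypothetical helper: build an inverted index from tag value to the
// positions of the documents whose `tags` array contains that value.
function buildTagIndex(docs) {
  const index = new Map();
  docs.forEach((doc, pos) => {
    for (const tag of doc.tags || []) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(pos);
    }
  });
  return index;
}

const places = [
  { name: "10gen HQ", tags: ["MongoDB", "business"] },
  { name: "Coffee Cart", tags: ["coffee"] }, // made-up sample document
];
const tagIndex = buildTagIndex(places);

// Similar in spirit to find({tags: "business"}): one index lookup,
// no scan over every document's array.
const hits = (tagIndex.get("business") || []).map((pos) => places[pos].name);
console.log(hits); // [ '10gen HQ' ]
```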
  13. Users

        user1 = {
          name: "Kevin Hanson",
          "e-mail": "[email protected]",
          "check-ins": [ "4b97e62bf1d8c7152c9ccb74", "5a20e62bf1d8c736ab" ]
        }

    check-ins[] = ObjectId references to the Check-Ins collection
  14. Check-Ins

        checkin = {
          place: "10gen HQ",
          ts: "9/20/2010 10:12:00",
          userId: <object id of user>
        }

    Every Check-In is Two Operations
      • Insert a Check-In Object (check-ins collection)
      • Update ($push) user object with check-in ID (users collection)
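The two operations can be sketched with in-memory arrays standing in for the two collections. This is plain JavaScript, not MongoDB API; the `checkIn` helper, the `nextId` counter, and the `"u1"` / `"c1"` style IDs are hypothetical stand-ins for real inserts, updates, and ObjectIds.

```javascript
let nextId = 1; // stand-in for ObjectId generation

const users = [{ _id: "u1", name: "Kevin Hanson", "check-ins": [] }];
const checkins = [];

// Hypothetical helper performing the two operations from the slide.
function checkIn(userId, place, ts) {
  const user = users.find((u) => u._id === userId);
  if (!user) throw new Error("unknown user: " + userId);

  // 1) Insert a check-in object into the check-ins collection.
  const checkin = { _id: "c" + nextId++, place, ts, userId };
  checkins.push(checkin);

  // 2) $push-style update: append the check-in ID to the user document.
  user["check-ins"].push(checkin._id);
  return checkin._id;
}

checkIn("u1", "10gen HQ", new Date(2010, 8, 20, 10, 12, 0));
console.log(users[0]["check-ins"]); // [ 'c1' ]
```

Because these are two separate writes to two collections, a failure between them would leave the user's list and the check-ins collection out of step, which is part of the tradeoff of linking.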
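  15. Stats w/ MapReduce

        // Emit one count per check-in, keyed by place.
        mapFunc = function() { emit(this.place, 1); }

        // Sum the counts for each place.
        reduceFunc = function(key, values) { return Array.sum(values); }

        // nowminus3hrs is a placeholder for a timestamp three hours ago.
        // The query uses ts, the field name from the check-in document.
        res = db.checkins.mapReduce(mapFunc, reduceFunc, {
          query: { ts: { $gt: nowminus3hrs } },
          out: { inline: 1 }
        })

        // res.results = [{_id: "10gen HQ", value: 17}, …, …]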
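For readers without a server at hand, the same counting logic can be tried against a plain array. The sketch below is an in-memory imitation, not MongoDB's implementation: `runMapReduce`, the sample data, and passing `emit` as an argument (the shell provides it as a global) are all assumptions made so the example runs on its own.

```javascript
// Hypothetical in-memory stand-in for mapReduce over an array of documents.
function runMapReduce(docs, mapFunc, reduceFunc, matches) {
  const groups = new Map();
  // Collect every emitted value under its key.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    // Like the shell, the map function sees the document as `this`.
    if (matches(doc)) mapFunc.call(doc, emit);
  }
  // Reduce each key's values to one result; a lone value is used as-is.
  return Array.from(groups, ([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduceFunc(key, values),
  }));
}

const now = Date.now();
const HOUR = 60 * 60 * 1000;
const sampleCheckins = [ // made-up sample data
  { place: "10gen HQ", ts: now - 1 * HOUR },
  { place: "10gen HQ", ts: now - 2 * HOUR },
  { place: "Coffee Cart", ts: now - 1 * HOUR },
  { place: "10gen HQ", ts: now - 5 * HOUR }, // outside the 3-hour window
];

const stats = runMapReduce(
  sampleCheckins,
  function (emit) { emit(this.place, 1); },
  (key, values) => values.reduce((a, b) => a + b, 0),
  (doc) => doc.ts > now - 3 * HOUR
);
console.log(stats);
```

With this sample data, "10gen HQ" ends up with a count of 2 and "Coffee Cart" with a count of 1, since the five-hour-old check-in is filtered out before the map step.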