Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Schema Design Basics - Spencer Brody, 10gen

mongodb
October 05, 2011

Schema Design Basics - Spencer Brody, 10gen

MongoBoston 2011

One of the challenges that comes with moving to MongoDB is figuring how to best model your data. While most developers have internalized the rules... of thumb for designing schemas for RDBMSs, these rules don't always apply to MongoDB. The simple fact that documents can represent rich, schema-free data structures means that we have a lot of viable alternatives to the standard, normalized, relational model. Not only that, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense.

mongodb

October 05, 2011
Tweet

More Decks by mongodb

Other Decks in Technology

Transcript

  1. Topics Introduction • Working with documents • Evolving a schema

    • Queries and indexes • Rich documents Wednesday, October 5, 2011
  2. Topics Introduction • Working with documents • Evolving a schema

    • Queries and indexes • Rich documents Common patterns • Single table inheritance • One-to-Many & Many-to-Many • Trees • Queues Wednesday, October 5, 2011
  3. Terminology RDBMS MongoDB Table Collection Row(s) JSON  Document Index Index

    Join Embedding  &  Linking Wednesday, October 5, 2011
  4. How can we manipulate this data? • Dynamic Queries •

    Secondary Indexes • Atomic Updates • Map Reduce Wednesday, October 5, 2011
  5. Schema-design Criteria Access Patterns? • Read / Write Ratio •

    Types of updates • Types of queries • Data life-cycle Wednesday, October 5, 2011
  6. Schema-design Criteria Considerations • No Joins • No multi-document transactions

    Access Patterns? • Read / Write Ratio • Types of updates • Types of queries • Data life-cycle Wednesday, October 5, 2011
  7. A simple start: Map the documents to your application. post

     =  {author:  “Orson  Scott  Card”,                date:  new  Date(),                text:  “Ender’s  Game”,                tags:  [“scifi”,  “aliens”]} >  db.post.save(post) Wednesday, October 5, 2011
  8. >  db.posts.find()    {  _id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),        author:

     "Orson  Scott  Card",          date:  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text:  "Ender’s  game",          tags:  [  "scifi",  "aliens"  ]    }     Notes: • _id must be unique, but can be anything you’d like • Default BSON ObjectId if one is not supplied Find the document Wednesday, October 5, 2011
  9. Secondary index on “author”  >  db.posts.ensureIndex({author:  1})  >  db.posts.find({author:  'Orson

     Scott  Card'})          {  _id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),          date:  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          author:  "Orson  Scott  Card",            ...  } Add an index, find via index Wednesday, October 5, 2011
  10. Examine the query plan >  db.blogs.find({author:  'Orson  Scott  Card'}).explain() {

      "cursor"  :  "BtreeCursor  author_1",   "nscanned"  :  1,   "nscannedObjects"  :  1,   "n"  :  1,   "millis"  :  5,   "indexBounds"  :  {     "author"  :  [       [         "Orson  Scott  Card",         "Orson  Scott  Card"       ]     ]   } } Wednesday, October 5, 2011
  11. Multi-key indexes //  Build  an  index  on  the  ‘tags’  array

    >  db.posts.ensureIndex({tags:  1}) Wednesday, October 5, 2011
  12. Multi-key indexes //  Build  an  index  on  the  ‘tags’  array

    >  db.posts.ensureIndex({tags:  1}) //  find  posts  with  a  specific  tag //  (This  will  use  an  index!) >  db.posts.find({tags:  ‘scifi’}) Wednesday, October 5, 2011
  13. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size,

    $exists, $type, .. $lt, $lte, $gt, $gte, $ne Update operators: $set, $inc, $push, $pop, $pull, $pushAll, $pullAll Wednesday, October 5, 2011
  14. Extending the Schema        new_comment  =  {author:  “Spencer”,

                                     date:  new  Date(),                                text:  “great  book”,                                votes:  5}  >  db.posts.update(                      {text:  “Ender’s  Game”  },                        {  ‘$push’:  {comments:  new_comment},                          ‘$inc’:    {comments_count:  1}}) Wednesday, October 5, 2011
  15.      {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),        

     author  :  "Orson  Scott  Card",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Ender’s  Game",        tags  :  [  "scifi",  "aliens"  ],                comments  :  [   {     author  :  "Spencer",     date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",     text  :  "great  book",                  votes  :  5   }        ],        comments_count:  1    } Extending the Schema Wednesday, October 5, 2011
  16. //  create  index  on  nested  documents: >  db.posts.ensureIndex({"comments.author":  1}) >

     db.posts.find({comments.author:”Spencer”}) The ‘dot’ operator Wednesday, October 5, 2011
  17. //  create  index  on  nested  documents: >  db.posts.ensureIndex({"comments.author":  1}) >

     db.posts.find({:“comments.author”:“Spencer”}) //  create  index  comment  votes: >  db.posts.ensureIndex({“comments.votes”:  1}) //  find  all  posts  with  any  comments  with  more  than //  50  votes >  db.posts.find({“comments.votes”:  {$gt:  50}}) The ‘dot’ operator Wednesday, October 5, 2011
  18. Rich Documents {      _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),    

                 line_items  :  [  {  sku:  ‘tt-­‐123’,                                          name:  ‘Tintin  a  la  Lune’  },                                      {  ski:  ‘tt-­‐457’,                                          name:  ‘Coke  en  stock’  }  ],                address  :  {  name:  ‘Brody’,                                street:  ‘123  Main’,                                zip:  10011  },                  payment:  {  cc:  4567,                              exp:  Date(2011,  7,  7)  },        subtotal:  2355 } Wednesday, October 5, 2011
  19. Single Table Inheritance - RDBMS shapes table id type area

    radius d length width 1 circle 3.14 1 2 square 4 2 3 rect 10 5 2 Wednesday, October 5, 2011
  20. Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",

     type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2} Wednesday, October 5, 2011
  21. Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",

     type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2} //  find  shapes  where  radius  >  0   >  db.shapes.find({radius:  {$gt:  0}}) Wednesday, October 5, 2011
  22. Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",

     type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2} //  find  shapes  where  radius  >  0   >  db.shapes.find({radius:  {$gt:  0}}) //  create  sparse  index >  db.shapes.ensureIndex({radius:  1},  {sparse:  true}) Wednesday, October 5, 2011
  23. One to Many - Embedded Array - $slice operator to

    return subset of array - some queries hard e.g find latest comments across all documents - Normalized (2 collections) - most flexible - more queries Wednesday, October 5, 2011
  24.      {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),        

     author  :  "Orson  Scott  Card",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Ender’s  Game",        tags  :  [  "scifi",  "aliens"  ],                comments  :  [      {     author  :  "Spencer",     date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",     text  :  "great  book",                  votes  :  5      }        ],        comments_count:  1    } Wednesday, October 5, 2011
  25.      {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),        

     author  :  "Orson  Scott  Card",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Ender’s  Game",        tags  :  [  "scifi",  "aliens"  ]    }        {        _id:  ObjectId("4e852433ff23f83798728af0"),   book_id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),        author  :  "Spencer",   date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",   text  :  "great  book",          votes  :  5    } Wednesday, October 5, 2011
  26. Referencing vs. Embedding - Embed when the ‘many’ objects always

    appear with their parent. - Reference when you need more flexibility. Wednesday, October 5, 2011
  27. Many - Many Example: - Product can be in many

    categories - Category can have many products Wednesday, October 5, 2011
  28. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Ender’s  Game",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     Many - Many Wednesday, October 5, 2011
  29. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Ender’s  Game",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "scifi",            product_ids:  [  ObjectId("4c4ca23933fb5941681b912e"),                                        ObjectId("4c4ca30433fb5941681b9130"),                                        ObjectId("4c4ca30433fb5941681b913a"]} Many - Many Wednesday, October 5, 2011
  30. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Ender’s  Game",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "scifi",            product_ids:  [  ObjectId("4c4ca23933fb5941681b912e"),                                        ObjectId("4c4ca30433fb5941681b9130"),                                        ObjectId("4c4ca30433fb5941681b913a"]} //  Multi-­‐key  index  on  array  fields >  db.products.ensureIndex({category_ids:  1}); >  db.categories.ensureIndex({product_ids:  1}); Many - Many Wednesday, October 5, 2011
  31. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Ender’s  Game",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "scifi"} Better alternative Wednesday, October 5, 2011
  32. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Ender’s  Game",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "scifi"} //  All  products  for  a  given  category >  db.products.find({category_ids:          ObjectId("4c4ca25433fb5941681b912f")})     Better alternative Wednesday, October 5, 2011
  33. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Ender’s  Game",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "scifi"} //  All  products  for  a  given  category >  db.products.find({category_ids:          ObjectId("4c4ca25433fb5941681b912f")})   //  All  categories  for  a  given  product product    =  db.products.find(_id  :  some_id) >  db.categories.find({_id  :  {$in  :  product.category_ids}})   Better alternative Wednesday, October 5, 2011
  34. Trees Full Tree in Document {  comments:  [    

         {  author:  “Spencer”,  text:  “...”,                replies:  [                                            {author:  “Kyle”,  text:  “...”,                                              replies:  []}                ]}    ] } Pros: Single Document, Performance, Intuitive Cons: Hard to search, Partial Results, 16MB limit     Wednesday, October 5, 2011
  35. Trees Parent Links - Each node is stored as a

    document - Contains the id of the parent Child Links - Each node contains the id’s of the children - Can support graphs (multiple parents / child) Wednesday, October 5, 2011
  36. Array of Ancestors - Store all Ancestors of a node

       {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  } Wednesday, October 5, 2011
  37. Array of Ancestors - Store all Ancestors of a node

       {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  } //find  all  descendants  of  b: >  db.tree2.find({ancestors:  ‘b’}) //find  all  direct  descendants  of  b: >  db.tree2.find({parent:  ‘b’}) Wednesday, October 5, 2011
  38. Array of Ancestors - Store all Ancestors of a node

       {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  } //find  all  descendants  of  b: >  db.tree2.find({ancestors:  ‘b’}) //find  all  direct  descendants  of  b: >  db.tree2.find({parent:  ‘b’}) //find  all  ancestors  of  f: >  ancestors  =  db.tree2.findOne({_id:’f’}).ancestors >  db.tree2.find({_id:  {  $in  :  ancestors}) Wednesday, October 5, 2011
  39. Queue Requirements • See jobs waiting, jobs in progress •

    Ensure that each job is started once and only once      {  inprogress:  false,          priority:  1,          message:  “Rich  documents  FTW!”      ...      } Wednesday, October 5, 2011
  40. Queue Requirements • See jobs waiting, jobs in progress •

    Ensure that each job is started once and only once      {  inprogress:  false,          priority:  1,          message:  “Rich  documents  FTW!”      ...      } //  find  highest  priority  job  and  mark  as  in-­‐progress job  =  db.jobs.findAndModify({                              query:    {inprogress:  false},                              sort:      {priority:  -­‐1),                                update:  {$set:  {inprogress:  true,                                                                started:  new  Date()}}})     Wednesday, October 5, 2011
  41. Summary Schema design is different in MongoDB Basic data design

    principals stay the same Focus on how the apps manipulates data Rapidly evolve schema to meet your requirements Enjoy your new freedom, use it wisely :-) Wednesday, October 5, 2011
  42. @mongodb conferences,  appearances,  and  meetups http://www.10gen.com/events http://bit.ly/mongo>   Facebook  

                     |                  Twitter                  |                  LinkedIn http://linkd.in/joinmongo More info at http://www.mongodb.org/ We’re Hiring ! http://www.10gen.com/jobs Wednesday, October 5, 2011