Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Schema Design Principles and Practice

mongodb
April 24, 2012
600

Schema Design Principles and Practice

MongoDB Stockholm 2012
Schema Design Principles and Practice
Christian Kvalheim, Software Engineer, 10gen

One of the challenges that comes with moving to MongoDB is figuring how to best model your data. While most developers have internalized the rules of thumb for designing schemas for RDBMSs, these rules don't always apply to MongoDB. The simple fact that documents can represent rich, schema-free data structures means that we have a lot of viable alternatives to the standard, normalized, relational model. Not only that, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense.

mongodb

April 24, 2012
Tweet

Transcript

  1. Topics Introduction • Working with documents • Evolving a schema

    • Queries and indexes • Rich documents Tuesday, April 24, 12
  2. Topics Introduction • Working with documents • Evolving a schema

    • Queries and indexes • Rich documents Common patterns • Single table inheritance • One-to-Many & Many-to-Many • Trees • Queues Tuesday, April 24, 12
  3. Terminology RDBMS MongoDB Table Collection Row(s) JSON  Document Index Index

    Join Embedding  &  Linking Tuesday, April 24, 12
  4. Schema-design Criteria How can we manipulate this data? • Dynamic

    Queries • Secondary Indexes • Atomic Updates • Map Reduce • Aggregation framework (2.1 -> ) Tuesday, April 24, 12
  5. Schema-design Criteria Access Patterns? • Read / Write Ratio •

    Types of updates • Types of queries • Data life-cycle Tuesday, April 24, 12
  6. A simple start: Map the documents to your application. post

     =  {author:  “Hergé”,                date:  new  Date(),                text:  “Destination  Moon”,                tags:  [“comic”,  “adventure”]} >  db.post.save(post) Tuesday, April 24, 12
  7. >  db.posts.find()    {  _id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),        author:

     "Hergé",          date:  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text:  "Destination  Moon",          tags:  [  "comic",  "adventure"  ]    }     Notes: • _id must be unique, but can be anything you’d like • Default BSON ObjectId if one is not supplied Find the document Tuesday, April 24, 12
  8. Secondary index on “author”  >  db.posts.ensureIndex({author:  1})  >  db.posts.find({author:  'Hergé'})

             {  _id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),          date:  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          author:  "Hergé",            ...  } Add an index, find via index Tuesday, April 24, 12
  9. Examine the query plan >  db.blogs.find({author:  'Hergé'}).explain() {   "cursor"

     :  "BtreeCursor  author_1",   "nscanned"  :  1,   "nscannedObjects"  :  1,   "n"  :  1,   "millis"  :  5,   "indexBounds"  :  {     "author"  :  [       [         "Hergé",         "Hergé"       ]     ]   } } Tuesday, April 24, 12
  10. Multi-key indexes //  Build  an  index  on  the  ‘tags’  array

    >  db.posts.ensureIndex({tags:  1}) Tuesday, April 24, 12
  11. Multi-key indexes //  Build  an  index  on  the  ‘tags’  array

    >  db.posts.ensureIndex({tags:  1}) //  find  posts  with  a  specific  tag //  (This  will  use  an  index!) >  db.posts.find({tags:  ‘comic’}) Tuesday, April 24, 12
  12. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size,

    $exists, $type, .. $lt, $lte, $gt, $gte, $ne Update operators: $set, $inc, $push, $pop, $pull, $pushAll, $pullAll Tuesday, April 24, 12
  13. Extending the Schema        new_comment  =  {author:  “Christian”,

                                     date:  new  Date(),                                text:  “great  book”,                                votes:  5}  >  db.posts.update(                      {text:  “Destination  Moon”  },                        {  ‘$push’:  {comments:  new_comment},                          ‘$inc’:    {comments_count:  1}}) Tuesday, April 24, 12
  14.      {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),        

     author  :  "Hergé",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Destination  Moon",        tags  :  [  "comic",  "adventure"  ],                comments  :  [   {     author  :  "Christian",     date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",     text  :  "great  book",                  votes  :  5   }        ],        comments_count:  1    } Extending the Schema Tuesday, April 24, 12
  15. //  create  index  on  nested  documents: >  db.posts.ensureIndex({"comments.author":  1}) >

     db.posts.find({comments.author:”Christian”}) The ‘dot’ operator Tuesday, April 24, 12
  16. //  create  index  on  nested  documents: >  db.posts.ensureIndex({"comments.author":  1}) >

     db.posts.find({comments.author:”Christian”}) //  create  index  comment  votes: >  db.posts.ensureIndex({comments.votes:  1}) //  find  all  posts  with  any  comments  with  more  than //  50  votes >  db.posts.ensureIndex({comments.votes:  {$gt:  50}}) The ‘dot’ operator Tuesday, April 24, 12
  17. Rich Documents {      _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),    

                 line_items  :  [  {  sku:  ‘tt-­‐123’,                                          name:  ‘Tintin  a  la  Lune’  },                                      {  ski:  ‘tt-­‐457’,                                          name:  ‘Coke  en  stock’  }  ],                address  :  {  name:  ‘Kvalheim’,                                street:  ‘111  Main’,                                zip:  10010  },                  payment:  {  cc:  4567,                              exp:  Date(2011,  7,  7)  },        subtotal:  2355 } Tuesday, April 24, 12
  18. Single Table Inheritance - RDBMS shapes table id type area

    radius d length width 1 circle 3.14 1 2 square 4 2 3 rect 10 5 2 Tuesday, April 24, 12
  19. Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",

     type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2} Tuesday, April 24, 12
  20. Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",

     type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2} //  find  shapes  where  radius  >  0   >  db.shapes.find({radius:  {$gt:  0}}) Tuesday, April 24, 12
  21. Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",

     type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2} //  find  shapes  where  radius  >  0   >  db.shapes.find({radius:  {$gt:  0}}) //  create  sparse  index >  db.shapes.ensureIndex({radius:  1},  {sparse:  true}) Tuesday, April 24, 12
  22. One to Many - Embedded Array / Array Keys -

    $slice operator to return subset of array - some queries hard e.g find latest comments across all documents Tuesday, April 24, 12
  23. One to Many - Embedded Array / Array Keys -

    $slice operator to return subset of array - some queries hard e.g find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query Tuesday, April 24, 12
  24. One to Many - Embedded Array / Array Keys -

    $slice operator to return subset of array - some queries hard e.g find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query - Normalized (2 collections) - most flexible - more queries Tuesday, April 24, 12
  25. One to Many - patterns - Embedded Array / Array

    Keys - Embedded Array / Array Keys - Embedded tree - Normalized Tuesday, April 24, 12
  26.      {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),        

     author  :  "Hergé",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Destination  Moon",        tags  :  [  "comic",  "adventure"  ],                comments  :  [      {     author  :  "Christian",     date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",     text  :  "great  book",                  votes  :  5      }        ],        comments_count:  1    } Tuesday, April 24, 12
  27.      {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),        

     author  :  "Hergé",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Destination  Moon",        tags  :  [  "comic",  "adventure"  ]    }        {   book_id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),        author  :  "Christian",   date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",   text  :  "great  book",          votes  :  5    }     Tuesday, April 24, 12
  28. Referencing vs. Embedding - Embed when the ‘many’ objects always

    appear with their parent. - Reference when you need more flexibility. Tuesday, April 24, 12
  29. Many - Many Example: - Product can be in many

    categories - Category can have many products Tuesday, April 24, 12
  30. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     Many - Many Tuesday, April 24, 12
  31. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure",            product_ids:  [  ObjectId("4c4ca23933fb5941681b912e"),                                        ObjectId("4c4ca30433fb5941681b9130"),                                        ObjectId("4c4ca30433fb5941681b913a"]} Many - Many Tuesday, April 24, 12
  32. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure",            product_ids:  [  ObjectId("4c4ca23933fb5941681b912e"),                                        ObjectId("4c4ca30433fb5941681b9130"),                                        ObjectId("4c4ca30433fb5941681b913a"]} //  Multi-­‐key  index  on  array  fields >  db.products.ensureIndex({category_ids:  1}); >  db.categories.ensureIndex({product_ids:  1}); Many - Many Tuesday, April 24, 12
  33. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure"} Better alternative Tuesday, April 24, 12
  34. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure"} //  All  products  for  a  given  category >  db.products.find({category_ids:   ObjectId("4c4ca25433fb5941681b912f")})     Alternative Tuesday, April 24, 12
  35. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure"} //  All  products  for  a  given  category >  db.products.find({category_ids:          ObjectId("4c4ca25433fb5941681b912f")})   //  All  categories  for  a  given  product product    =  db.products.find(_id  :  some_id) >  db.categories.find({_id  :  {$in  :  product.category_ids}})   Alternative Tuesday, April 24, 12
  36. Trees Full Tree in Document {  comments:  [    

         {  author:  “Christian”,  text:  “...”,                replies:  [                                            {author:  “Fred”,  text:  “...”,                                              replies:  []}                ]}    ] } Pros: Single Document, Performance, Intuitive Cons: Hard to search, Partial Results, 16MB limit 1.8 >     Tuesday, April 24, 12
  37. Trees Parent Links - Each node is stored as a

    document - Contains the id of the parent Child Links - Each node contains the id’s of the children - Can support graphs (multiple parents / child) Tuesday, April 24, 12
  38. Array of Ancestors - Store all Ancestors of a node

       {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  } Tuesday, April 24, 12
  39. Array of Ancestors - Store all Ancestors of a node

       {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  } //find  all  descendants  of  b: >  db.tree2.find({ancestors:  ‘b’}) //find  all  direct  descendants  of  b: >  db.tree2.find({parent:  ‘b’}) Tuesday, April 24, 12
  40. Array of Ancestors - Store all Ancestors of a node

       {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  } //find  all  descendants  of  b: >  db.tree2.find({ancestors:  ‘b’}) //find  all  direct  descendants  of  b: >  db.tree2.find({parent:  ‘b’}) //find  all  ancestors  of  f: >  ancestors  =  db.tree2.findOne({_id:’f’}).ancestors >  db.tree2.find({_id:  {  $in  :  ancestors}) Tuesday, April 24, 12
  41. Queue Requirements • See jobs waiting, jobs in progress •

    Ensure that each job is started once and only once      {  inprogress:  false,          priority:  1,          message:  “Rich  documents  FTW!”      ...      } Tuesday, April 24, 12
  42. Queue Requirements • See jobs waiting, jobs in progress •

    Ensure that each job is started once and only once      {  inprogress:  false,          priority:  1,          message:  “Rich  documents  FTW!”      ...      } //  find  highest  priority  job  and  mark  as  in-­‐progress job  =  db.jobs.findAndModify({                              query:    {inprogress:  false},                              sort:      {priority:  -­‐1),                                update:  {$set:  {inprogress:  true,                                                                started:  new  Date()}}})     Tuesday, April 24, 12
  43. Summary Schema design is different in MongoDB Basic data design

    principals stay the same Focus on how the apps manipulates data Rapidly evolve schema to meet your requirements Enjoy your new freedom, use it wisely :-) Tuesday, April 24, 12
  44. @mongodb conferences,  appearances,  and  meetups http://www.10gen.com/events http://bit.ly/mongo>   Facebook  

                     |                  Twitter                  |                  LinkedIn http://linkd.in/joinmongo download at mongodb.org We’re Hiring ! [email protected] Tuesday, April 24, 12