Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Schema Design - Kyle Banker, 10gen

mongodb
October 07, 2011

Schema Design - Kyle Banker, 10gen

MongoSF 2011

One of the challenges that comes with moving to MongoDB is figuring how to best model your data. While most developers have internalized the rules of thumb for designing schemas for RDBMSs, these rules don't always apply to MongoDB. The simple fact that documents can represent rich, schema-free data structures means that we have a lot of viable alternatives to the standard, normalized, relational model. Not only that, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense. Understandably, this begets good questions:

Are foreign keys permissible, or is it better to represent one-to-many relations withing a single document? Are join tables necessary, or is there another technique for building out many-to-many relationships? What level of denormalization is appropriate? How do my data modeling decisions affect the efficiency of updates and queries?

In this session, we'll answer these questions and more, provide a number of data modeling rules of thumb, and discuss the tradeoffs of various data modeling strategies.

mongodb

October 07, 2011
Tweet

More Decks by mongodb

Other Decks in Technology

Transcript

  1. Topics Introduction • Working with documents • Evolving a schema

    • Queries and indexes • Rich documents Wednesday, May 25, 2011
  2. Topics Introduction • Working with documents • Evolving a schema

    • Queries and indexes • Rich documents Common patterns • Single table inheritance • One-to-Many & Many-to-Many • Trees • Queues Wednesday, May 25, 2011
  3. Terminology RDBMS MongoDB Table Collection Row(s) JSON  Document Index Index

    Join Embedding  &  Linking Wednesday, May 25, 2011
  4. Schema-design Criteria How can we manipulate this data? • Dynamic

    Queries • Secondary Indexes • Atomic Updates • Map Reduce Wednesday, May 25, 2011
  5. Schema-design Criteria How can we manipulate this data? • Dynamic

    Queries • Secondary Indexes • Atomic Updates • Map Reduce Access Patterns? • Read / Write Ratio • Types of updates • Types of queries • Data life-cycle Wednesday, May 25, 2011
  6. Schema-design Criteria How can we manipulate this data? • Dynamic

    Queries • Secondary Indexes • Atomic Updates • Map Reduce Considerations • No Joins • Document writes are atomic Access Patterns? • Read / Write Ratio • Types of updates • Types of queries • Data life-cycle Wednesday, May 25, 2011
  7. A simple start: Map the documents to your application. post

     =  {author:  “Hergé”,                date:  new  Date(),                text:  “Destination  Moon”,                tags:  [“comic”,  “adventure”]} >  db.post.save(post) Wednesday, May 25, 2011
  8. >  db.posts.find()    {  _id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),        author:

     "Hergé",          date:  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text:  "Destination  Moon",          tags:  [  "comic",  "adventure"  ]    }     Notes: • _id must be unique, but can be anything you’d like • Default BSON ObjectId if one is not supplied Find the document Wednesday, May 25, 2011
  9. Secondary index on “author”  >  db.posts.ensureIndex({author:  1})  >  db.posts.find({author:  'Hergé'})

             {  _id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),          date:  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          author:  "Hergé",            ...  } Add an index, find via index Wednesday, May 25, 2011
  10. Examine the query plan >  db.blogs.find({author:  'Hergé'}).explain() {   "cursor"

     :  "BtreeCursor  author_1",   "nscanned"  :  1,   "nscannedObjects"  :  1,   "n"  :  1,   "millis"  :  5,   "indexBounds"  :  {     "author"  :  [       [         "Hergé",         "Hergé"       ]     ]   } } Wednesday, May 25, 2011
  11. Multi-key indexes //  Build  an  index  on  the  ‘tags’  array

    >  db.posts.ensureIndex({tags:  1}) Wednesday, May 25, 2011
  12. Multi-key indexes //  Build  an  index  on  the  ‘tags’  array

    >  db.posts.ensureIndex({tags:  1}) //  find  posts  with  a  specific  tag //  (This  will  use  an  index!) >  db.posts.find({tags:  ‘comic’}) Wednesday, May 25, 2011
  13. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size,

    $exists, $type, .. $lt, $lte, $gt, $gte, $ne Update operators: $set, $inc, $push, $pop, $pull, $pushAll, $pullAll Wednesday, May 25, 2011
  14. Extending the Schema        new_comment  =  {author:  “Kyle”,

                                     date:  new  Date(),                                text:  “great  book”,                                votes:  5}  >  db.posts.update(                      {text:  “Destination  Moon”  },                        {  ‘$push’:  {comments:  new_comment},                          ‘$inc’:    {comments_count:  1}}) Wednesday, May 25, 2011
  15.      {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),        

     author  :  "Hergé",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Destination  Moon",        tags  :  [  "comic",  "adventure"  ],                comments  :  [   {     author  :  "Kyle",     date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",     text  :  "great  book",                  votes  :  5   }        ],        comments_count:  1    } Extending the Schema Wednesday, May 25, 2011
  16. //  create  index  on  nested  documents: >  db.posts.ensureIndex({"comments.author":  1}) >

     db.posts.find({comments.author:”Kyle”}) The ‘dot’ operator Wednesday, May 25, 2011
  17. //  create  index  on  nested  documents: >  db.posts.ensureIndex({"comments.author":  1}) >

     db.posts.find({comments.author:”Kyle”}) //  create  index  comment  votes: >  db.posts.ensureIndex({comments.votes:  1}) //  find  all  posts  with  any  comments  with  more  than //  50  votes >  db.posts.ensureIndex({comments.votes:  {$gt:  50}}) The ‘dot’ operator Wednesday, May 25, 2011
  18. Rich Documents {      _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),    

                 line_items  :  [  {  sku:  ‘tt-­‐123’,                                          name:  ‘Tintin  a  la  Lune’  },                                      {  ski:  ‘tt-­‐457’,                                          name:  ‘Coke  en  stock’  }  ],                address  :  {  name:  ‘Banker’,                                street:  ‘111  Main’,                                zip:  10010  },                  payment:  {  cc:  4567,                              exp:  Date(2011,  7,  7)  },        subtotal:  2355 } Wednesday, May 25, 2011
  19. Single Table Inheritance - RDBMS shapes table id type area

    radius d length width 1 circle 3.14 1 2 square 4 2 3 rect 10 5 2 Wednesday, May 25, 2011
  20. Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",

     type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2} Wednesday, May 25, 2011
  21. Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",

     type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2} //  find  shapes  where  radius  >  0   >  db.shapes.find({radius:  {$gt:  0}}) Wednesday, May 25, 2011
  22. Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",

     type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2} //  find  shapes  where  radius  >  0   >  db.shapes.find({radius:  {$gt:  0}}) //  create  sparse  index >  db.shapes.ensureIndex({radius:  1},  {sparse:  true}) Wednesday, May 25, 2011
  23. One to Many - Embedded Array / Array Keys -

    $slice operator to return subset of array - some queries hard e.g find latest comments across all documents Wednesday, May 25, 2011
  24. One to Many - Embedded Array / Array Keys -

    $slice operator to return subset of array - some queries hard e.g find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query Wednesday, May 25, 2011
  25. One to Many - Embedded Array / Array Keys -

    $slice operator to return subset of array - some queries hard e.g find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query - Normalized (2 collections) - most flexible - more queries Wednesday, May 25, 2011
  26. One to Many - patterns - Embedded Array / Array

    Keys - Embedded Array / Array Keys - Embedded tree - Normalized Wednesday, May 25, 2011
  27.      {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),        

     author  :  "Hergé",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Destination  Moon",        tags  :  [  "comic",  "adventure"  ],                comments  :  [      {     author  :  "Kyle",     date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",     text  :  "great  book",                  votes  :  5      }        ],        comments_count:  1    } Wednesday, May 25, 2011
  28.      {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),        

     author  :  "Hergé",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Destination  Moon",        tags  :  [  "comic",  "adventure"  ]    }        {   book_id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),        author  :  "Kyle",   date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",   text  :  "great  book",          votes  :  5    }     Wednesday, May 25, 2011
  29. Referencing vs. Embedding - Embed when the ‘many’ objects always

    appear with their parent. - Reference when you need more flexibility. Wednesday, May 25, 2011
  30. Many - Many Example: - Product can be in many

    categories - Category can have many products Wednesday, May 25, 2011
  31. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     Many - Many Wednesday, May 25, 2011
  32. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure",            product_ids:  [  ObjectId("4c4ca23933fb5941681b912e"),                                        ObjectId("4c4ca30433fb5941681b9130"),                                        ObjectId("4c4ca30433fb5941681b913a"]} Many - Many Wednesday, May 25, 2011
  33. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure",            product_ids:  [  ObjectId("4c4ca23933fb5941681b912e"),                                        ObjectId("4c4ca30433fb5941681b9130"),                                        ObjectId("4c4ca30433fb5941681b913a"]} //  Multi-­‐key  index  on  array  fields >  db.products.ensureIndex({category_ids:  1}); >  db.categories.ensureIndex({product_ids:  1}); Many - Many Wednesday, May 25, 2011
  34. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure"} Better alternative Wednesday, May 25, 2011
  35. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure"} //  All  products  for  a  given  category >  db.products.find({category_ids:  ObjectId ("4c4ca25433fb5941681b912f")})     Alternative Wednesday, May 25, 2011
  36. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure"} //  All  products  for  a  given  category >  db.products.find({category_ids:          ObjectId("4c4ca25433fb5941681b912f")})   //  All  categories  for  a  given  product product    =  db.products.find(_id  :  some_id) >  db.categories.find({_id  :  {$in  :  product.category_ids}})   Alternative Wednesday, May 25, 2011
  37. Trees Full Tree in Document {  comments:  [    

         {  author:  “Kyle”,  text:  “...”,                replies:  [                                            {author:  “Fred”,  text:  “...”,                                              replies:  []}                ]}    ] } Pros: Single Document, Performance, Intuitive Cons: Hard to search, Partial Results, 4MB limit     Wednesday, May 25, 2011
  38. Trees Parent Links - Each node is stored as a

    document - Contains the id of the parent Child Links - Each node contains the id’s of the children - Can support graphs (multiple parents / child) Wednesday, May 25, 2011
  39. Array of Ancestors - Store all Ancestors of a node

       {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  } Wednesday, May 25, 2011
  40. Array of Ancestors - Store all Ancestors of a node

       {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  } //find  all  descendants  of  b: >  db.tree2.find({ancestors:  ‘b’}) //find  all  direct  descendants  of  b: >  db.tree2.find({parent:  ‘b’}) Wednesday, May 25, 2011
  41. Array of Ancestors - Store all Ancestors of a node

       {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  } //find  all  descendants  of  b: >  db.tree2.find({ancestors:  ‘b’}) //find  all  direct  descendants  of  b: >  db.tree2.find({parent:  ‘b’}) //find  all  ancestors  of  f: >  ancestors  =  db.tree2.findOne({_id:’f’}).ancestors >  db.tree2.find({_id:  {  $in  :  ancestors}) Wednesday, May 25, 2011
  42. Queue Requirements • See jobs waiting, jobs in progress •

    Ensure that each job is started once and only once      {  inprogress:  false,          priority:  1,          message:  “Rich  documents  FTW!”      ...      } Wednesday, May 25, 2011
  43. Queue Requirements • See jobs waiting, jobs in progress •

    Ensure that each job is started once and only once      {  inprogress:  false,          priority:  1,          message:  “Rich  documents  FTW!”      ...      } //  find  highest  priority  job  and  mark  as  in-­‐progress job  =  db.jobs.findAndModify({                              query:    {inprogress:  false},                              sort:      {priority:  -­‐1),                                update:  {$set:  {inprogress:  true,                                                                started:  new  Date()}}})     Wednesday, May 25, 2011
  44. Summary Schema design is different in MongoDB Basic data design

    principals stay the same Focus on how the apps manipulates data Rapidly evolve schema to meet your requirements Enjoy your new freedom, use it wisely :-) Wednesday, May 25, 2011
  45. @mongodb conferences,  appearances,  and  meetups http://www.10gen.com/events http://bit.ly/mongo>   Facebook  

                     |                  Twitter                  |                  LinkedIn http://linkd.in/joinmongo download at mongodb.org We’re Hiring ! [email protected] Wednesday, May 25, 2011