Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Schema Design - Jared Rosoff, 10gen

mongodb
February 23, 2012

Schema Design - Jared Rosoff, 10gen

One of the challenges that comes with moving to MongoDB is figuring how to best model your data. While most developers have internalized the rules of thumb for designing schemas for RDBMSs, these rules don't always apply to MongoDB. The simple fact that documents can represent rich, schema-free data structures means that we have a lot of viable alternatives to the standard, normalized, relational model. Not only that, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense. Understandably, this begets good questions:

Are foreign keys permissible, or is it better to represent one-to-many relations withing a single document?
Are join tables necessary, or is there another technique for building out many-to-many relationships?
What level of denormalization is appropriate?
How do my data modeling decisions affect the efficiency of updates and queries?

In this session, we'll answer these questions and more, provide a number of data modeling rules of thumb, and discuss the tradeoffs of various data modeling strategies.

mongodb

February 23, 2012
Tweet

More Decks by mongodb

Other Decks in Technology

Transcript

  1. Topics Introduction • Working with documents • Evolving a schema

    • Queries and indexes • Rich documents Common patterns • Single table inheritance • One-to-Many & Many-to-Many • Trees • Queues
  2. Schema-design Criteria How can we manipulate this data? • Dynamic

    Queries • Secondary Indexes • Atomic Updates • Map Reduce
  3. Schema-design Criteria How can we manipulate this data? • Dynamic

    Queries • Secondary Indexes • Atomic Updates • Map Reduce Access Patterns? • Read / Write Ratio • Types of updates • Types of queries • Data life-cycle
  4. Schema-design Criteria How can we manipulate this data? • Dynamic

    Queries • Secondary Indexes • Atomic Updates • Map Reduce Considerations • No Joins • Document writes are atomic Access Patterns? • Read / Write Ratio • Types of updates • Types of queries • Data life-cycle
  5. A simple start: Map the documents to your application. book

     =  {author:  “Hergé”,                date:  new  Date(),                text:  “Destination  Moon”,                tags:  [“comic”,  “adventure”]} >  db.books.save(book)
  6. >  db.books.find()    {  _id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),        author:

     "Hergé",          date:  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text:  "Destination  Moon",          tags:  [  "comic",  "adventure"  ]    }     Notes: • _id must be unique, but can be anything you’d like • Default BSON ObjectId if one is not supplied Find the document
  7. Secondary index on “author”  >  db.books.ensureIndex({author:  1})  >  db.books.find({author:  'Hergé'})

             {  _id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),          date:  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          author:  "Hergé",            ...  } Add an index, find via index
  8. Examine the query plan >  db.books.find({author:  'Hergé'}).explain() {   "cursor"

     :  "BtreeCursor  author_1",   "nscanned"  :  1,   "nscannedObjects"  :  1,   "n"  :  1,   "millis"  :  5,   "indexBounds"  :  {     "author"  :  [       [         "Hergé",         "Hergé"       ]     ]   } }
  9. Multi-key indexes //  Build  an  index  on  the  ‘tags’  array

    >  db.books.ensureIndex({tags:  1}) //  find  posts  with  a  specific  tag //  (This  will  use  an  index!) >  db.books.find({tags:  ‘comic’})
  10. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size,

    $exists, $type, .. $lt, $lte, $gt, $gte, $ne Update operators: $set, $inc, $push, $pop, $pull, $pushAll, $pullAll
  11. Extending the Schema        new_comment  =  {author:  “Kyle”,

                                     date:  new  Date(),                                text:  “great  book”,                                votes:  5}  >  db.books.update(                      {text:  “Destination  Moon”  },                        {  ‘$push’:  {comments:  new_comment},                          ‘$inc’:    {comments_count:  1}})
  12.      {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),        

     author  :  "Hergé",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Destination  Moon",        tags  :  [  "comic",  "adventure"  ],                comments  :  [   {     author  :  "Kyle",     date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",     text  :  "great  book",                  votes  :  5   }        ],        comments_count:  1    } Extending the Schema
  13. //  create  index  on  nested  documents: >  db.books.ensureIndex({"comments.author":  1}) >

     db.books.find({comments.author:”Kyle”}) The ‘dot’ operator
  14. //  create  index  on  nested  documents: >  db.books.ensureIndex({"comments.author":  1}) >

     db.books.find({comments.author:”Kyle”}) //  create  index  comment  votes: >  db.books.ensureIndex({comments.votes:  1}) //  find  all  books  with  any  comments  with  more  than //  50  votes >  db.books.ensureIndex({comments.votes:  {$gt:  50}}) The ‘dot’ operator
  15. Rich Documents {      _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),    

                 line_items  :  [  {  sku:  ‘tt-­‐123’,                                          name:  ‘Tintin  a  la  Lune’  },                                      {  sku:  ‘tt-­‐457’,                                          name:  ‘Coke  en  stock’  }  ],                address  :  {  name:  ‘Banker’,                                street:  ‘111  Main’,                                zip:  10010  },                  payment:  {  cc:  4567,                              exp:  Date(2011,  7,  7)  },        subtotal:  2355 }
  16. Single Table Inheritance - RDBMS shapes table id type area

    radius d length width 1 circle 3.14 1 2 square 4 2 3 rect 10 5 2
  17. Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",

     type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2}
  18. Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",

     type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2} //  find  shapes  where  radius  >  0   >  db.shapes.find({radius:  {$gt:  0}})
  19. Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",

     type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2} //  find  shapes  where  radius  >  0   >  db.shapes.find({radius:  {$gt:  0}}) //  create  sparse  index >  db.shapes.ensureIndex({radius:  1},  {sparse:  true})
  20. One to Many - Embedded Array / Array Keys -

    $slice operator to return subset of array - some queries hard e.g find latest comments across all documents
  21. One to Many - Embedded Array / Array Keys -

    $slice operator to return subset of array - some queries hard e.g find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query
  22. One to Many - Embedded Array / Array Keys -

    $slice operator to return subset of array - some queries hard e.g find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query - Normalized (2 collections) - most flexible - more queries
  23. One to Many - patterns - Embedded Array / Array

    Keys - Embedded Array / Array Keys - Embedded tree - Normalized
  24.      {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),        

     author  :  "Hergé",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Destination  Moon",        tags  :  [  "comic",  "adventure"  ],                comments  :  [      {     author  :  "Kyle",     date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",     text  :  "great  book",                  votes  :  5      }        ],        comments_count:  1    }
  25.      {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),        

     author  :  "Hergé",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Destination  Moon",        tags  :  [  "comic",  "adventure"  ]    }        {   book_id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),        author  :  "Kyle",   date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",   text  :  "great  book",          votes  :  5    }    
  26. Referencing vs. Embedding - Embed when the ‘many’ objects always

    appear with their parent. - Reference when you need more flexibility.
  27. Many - Many Example: - Product can be in many

    categories - Category can have many products
  28. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     Many - Many
  29. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure",            product_ids:  [  ObjectId("4c4ca23933fb5941681b912e"),                                        ObjectId("4c4ca30433fb5941681b9130"),                                        ObjectId("4c4ca30433fb5941681b913a"]} Many - Many
  30. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure",            product_ids:  [  ObjectId("4c4ca23933fb5941681b912e"),                                        ObjectId("4c4ca30433fb5941681b9130"),                                        ObjectId("4c4ca30433fb5941681b913a"]} //  Multi-­‐key  index  on  array  fields >  db.products.ensureIndex({category_ids:  1}); >  db.categories.ensureIndex({product_ids:  1}); Many - Many
  31. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure"} Better alternative
  32. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure"} //  All  products  for  a  given  category >  db.products.find({category_ids:  ObjectId ("4c4ca25433fb5941681b912f")})     Alternative
  33. products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),        

     name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure"} //  All  products  for  a  given  category >  db.products.find({category_ids:          ObjectId("4c4ca25433fb5941681b912f")})   //  All  categories  for  a  given  product product    =  db.products.find(_id  :  some_id) >  db.categories.find({_id  :  {$in  :  product.category_ids}})   Alternative
  34. Trees Full Tree in Document {  comments:  [    

         {  author:  “Kyle”,  text:  “...”,                replies:  [                                            {author:  “Fred”,  text:  “...”,                                              replies:  []}                ]}    ] } Pros: Single Document, Performance, Intuitive Cons: Hard to search, Partial Results, 16MB limit    
  35. Trees Parent Links - Each node is stored as a

    document - Contains the id of the parent Child Links - Each node contains the id’s of the children - Can support graphs (multiple parents / child)
  36. Array of Ancestors - Store all Ancestors of a node

       {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  }
  37. Array of Ancestors - Store all Ancestors of a node

       {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  } //find  all  descendants  of  b: >  db.tree2.find({ancestors:  ‘b’}) //find  all  direct  descendants  of  b: >  db.tree2.find({parent:  ‘b’})
  38. Array of Ancestors - Store all Ancestors of a node

       {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  } //find  all  descendants  of  b: >  db.tree2.find({ancestors:  ‘b’}) //find  all  direct  descendants  of  b: >  db.tree2.find({parent:  ‘b’}) //find  all  ancestors  of  f: >  ancestors  =  db.tree2.findOne({_id:’f’}).ancestors >  db.tree2.find({_id:  {  $in  :  ancestors})
  39. Queue Requirements • See jobs waiting, jobs in progress •

    Ensure that each job is started once and only once      {  inprogress:  false,          priority:  1,          message:  “Rich  documents  FTW!”      ...      }
  40. Queue Requirements • See jobs waiting, jobs in progress •

    Ensure that each job is started once and only once      {  inprogress:  false,          priority:  1,          message:  “Rich  documents  FTW!”      ...      } //  find  highest  priority  job  and  mark  as  in-­‐progress job  =  db.jobs.findAndModify({                              query:    {inprogress:  false},                              sort:      {priority:  -­‐1),                                update:  {$set:  {inprogress:  true,                                                                started:  new  Date()}}})    
  41. Summary Schema design is different in MongoDB Basic data design

    principals stay the same Focus on how the apps manipulates data Rapidly evolve schema to meet your requirements Enjoy your new freedom, use it wisely :-)
  42. @mongodb conferences,  appearances,  and  meetups http://www.10gen.com/events http://bit.ly/mongo>   Facebook  

                     |                  Twitter                  |                  LinkedIn http://linkd.in/joinmongo download at mongodb.org We’re Hiring ! [email protected]