Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MongoSP: MongoDB Schema Design

MongoSP: MongoDB Schema Design

Greg Studer's presentation at Mongo Sao Paulo

mongodb

July 29, 2011
Tweet

More Decks by mongodb

Other Decks in Programming

Transcript

  1. Topics Introduction • Working with documents • Evolving a schema

    • Queries and indexes • Rich documents Common patterns • Single table inheritance • One-to-Many & Many-to-Many • Trees • Queues
  2. Schema-design Criteria How can we manipulate this data? • Dynamic

    Queries • Secondary Indexes • Atomic Updates • Map Reduce Considerations • No Joins • Document writes are atomic Access Patterns? • Read / Write Ratio • Types of updates • Types of queries • Data life-cycle
  3. A simple start: Map the documents to your application. post

    = { author: “Hergé”, date: new Date(), text: “Destination Moon”, tags: [“comic”, “adventure”] } > db.post.save(post)
  4. > db.posts.find() { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Hergé", date: "Sat Jul

    24 2010 19:47:11 GMT-0700 (PDT)", text: "Destination Moon", tags: [ "comic", "adventure" ] } Notes: • _id must be unique, but can be anything you’d like • Default BSON ObjectId if one is not supplied Find the document
  5. Secondary index on “author” > db.posts.ensureIndex({ author : 1 })

    > db.posts.find({ author : 'Hergé' }) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", author: "Hergé", ... } Add an index, find via index
  6. Examine the query plan > db.blogs.find({ author : 'Hergé' }).explain()

    { "cursor" : "BtreeCursor author_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 5, "indexBounds" : { "author" : [ [ "Hergé", "Hergé" ] ] } }
  7. Multi-key indexes // Build an index on the ‘tags’ array

    > db.posts.ensureIndex({ tags : 1 }) // find posts with a specific tag // (This will use an index!) > db.posts.find({ tags : ‘comic’ })
  8. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size,

    $exists, $type, .. $lt, $lte, $gt, $gte, $ne Update operators: $set, $inc, $push, $pop, $pull, $pushAll, $pullAll
  9. Extending the Schema new_comment = { author : “Kyle”, date

    : new Date(), text : “great book”, votes : 5 } > db.posts.update( { text : “Destination Moon” }, { ‘$push’ : {comments: new_comment } , ‘$inc’ : {comments_count: 1 } } )
  10. { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date : "Sat

    Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "Destination Moon", tags : [ "comic", "adventure" ], comments : [ { author : "Kyle", date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)", text : "great book", votes : 5 } ], comments_count: 1 } Extending the Schema
  11. // create index on nested documents: > db.posts.ensureIndex({ "comments.author" :

    1 }) > db.posts.find({ "comments.author" : ”Kyle” }) The ‘dot’ operator
  12. // create index on nested documents: > db.posts.ensureIndex({ "comments.author" :

    1 }) > db.posts.find({ "comments.author" : ”Kyle” }) // create index comment votes: > db.posts.ensureIndex({ "comments.votes" : 1 }) // find all posts with any comments with more than // 50 votes > db.posts.find({ "comments.votes" : { $gt : 50 } }) The ‘dot’ operator
  13. Rich Documents { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), line_items : [ {

    sku: ‘tt-123’, name : ‘Tintin a la Lune’ }, { ski: ‘tt-457’, name : ‘Coke en stock’ } ], address : { name : ‘Banker’, street : ‘111 Main’, zip : 10010 }, payment : { cc : 4567, exp : Date(2011, 7, 7) }, subtotal : 2355 }
  14. Single Table Inheritance - RDBMS shapes_table id type area radiu

    s d length width 1 circle 3.14 1 2 square 4 2 3 rect 10 5 2
  15. Single Table Inheritance - MongoDB > db.shapes.find() { _id :

    "1", type : "circle", area : 3.14, radius : 1} { _id : "2", type : "square", area : 4, d : 2} { _id : "3", type : "rect", area : 10, length : 5, width : 2}
  16. Single Table Inheritance - MongoDB > db.shapes.find() { _id :

    "1", type : "circle", area : 3.14, radius : 1 } { _id : "2", type : "square", area : 4, d : 2 } { _id : "3", type : "rect", area : 10, length : 5, width : 2 } // find shapes where radius > 0 > db.shapes.find({ radius : { $gt : 0 } })
  17. Single Table Inheritance - MongoDB > db.shapes.find() { _id :

    "1", type : "circle", area : 3.14, radius : 1 } { _id : "2", type : "square", area : 4, d : 2 } { _id : "3", type : "rect", area : 10, length : 5, width : 2 } // find shapes where radius > 0 > db.shapes.find({ radius : { $gt : 0 } }) // create sparse index > db.shapes.ensureIndex({ radius : 1 }, { sparse : true })
  18. One to Many - Embedded Array / Array Keys -

    $slice operator to return subset of array - some queries hard e.g. find latest comments across all documents
  19. One to Many - Embedded Array / Array Keys -

    $slice operator to return subset of array - some queries hard e.g. find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query
  20. One to Many - Embedded Array / Array Keys -

    $slice operator to return subset of array - some queries hard e.g. find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query - Normalized (2 collections) - most flexible - more queries
  21. One to Many - patterns - Embedded Array / Array

    Keys - Embedded Array / Array Keys - Embedded tree - Normalized
  22. { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date : "Sat

    Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "Destination Moon", tags : [ "comic", "adventure" ], comments : [ { author : "Kyle", date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)", text : "great book", votes : 5 } ], comments_count: 1 }
  23. { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date : "Sat

    Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "Destination Moon", tags : [ "comic", "adventure" ] } { book_id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Kyle", date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)", text : "great book", votes : 5 }
  24. Referencing vs. Embedding - Embed when the ‘many’ objects always

    appear with their parent. - Reference when you need more flexibility.
  25. Many - Many Example: - Product can be in many

    categories - Category can have many products
  26. products: { _id : ObjectId("4c4ca23933fb5941681b912e"), name : "Destination Moon", category_ids

    : [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”] } Many - Many
  27. products: { _id : ObjectId("4c4ca23933fb5941681b912e"), name : "Destination Moon", category_ids

    : [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”] } categories: { _id : ObjectId("4c4ca25433fb5941681b912f"), name : "adventure", product_ids : [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"] } Many - Many
  28. products: { _id : ObjectId("4c4ca23933fb5941681b912e"), name : "Destination Moon", category_ids

    : [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”] } categories: { _id : ObjectId("4c4ca25433fb5941681b912f"), name : "adventure", product_ids : [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"] } // Multi-key index on array fields > db.products.ensureIndex({ category_ids : 1 }); > db.categories.ensureIndex({ product_ids : 1 }); Many - Many
  29. products: { _id : ObjectId("4c4ca23933fb5941681b912e"), name : "Destination Moon", category_ids

    : [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”] } categories: { _id : ObjectId("4c4ca25433fb5941681b912f"), name : "adventure" } Better alternative
  30. products: { _id : ObjectId("4c4ca23933fb5941681b912e"), name : "Destination Moon", category_ids

    : [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”] } categories: { _id : ObjectId("4c4ca25433fb5941681b912f"), name : "adventure" } // All products for a given category > db.products.find({ category_ids : ObjectId("4c4ca25433fb5941681b912f") }) Alternative
  31. products: { _id : ObjectId("4c4ca23933fb5941681b912e"), name : "Destination Moon", category_ids

    : [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”] } categories: { _id : ObjectId("4c4ca25433fb5941681b912f"), name : "adventure" } // All products for a given category > db.products.find({ category_ids : ObjectId("4c4ca25433fb5941681b912f") }) // All categories for a given product product = db.products.find({ _id : some_id }) > db.categories.find({ _id : { $in : product.category_ids } }) Alternative
  32. Trees Full Tree in Document { comments: [ { author:

    “Kyle”, text: “...”, replies: [ { author: “Fred”, text: “...”, replies: [ ] } ] } ] } Pros: Single Document, Performance, Intuitive Cons: Hard to search, Partial Results, 4MB limit
  33. Trees Parent Links - Each node is stored as a

    document - Contains the id of the parent Child Links - Each node contains the id’s of the children - Can support graphs (multiple parents / child)
  34. Array of Ancestors - Store all Ancestors of a node

    { _id : "a" } { _id : "b", ancestors : [ "a" ], parent : "a" } { _id : "c", ancestors : [ "a", "b" ], parent : "b" } { _id : "d", ancestors : [ "a", "b" ], parent : "b" } { _id : "e", ancestors : [ "a" ], parent : "a" } { _id : "f", ancestors : [ "a", "e" ], parent : "e" }
  35. Array of Ancestors - Store all Ancestors of a node

    { _id : "a" } { _id : "b", ancestors : [ "a" ], parent : "a" } { _id : "c", ancestors : [ "a", "b" ], parent : "b" } { _id : "d", ancestors : [ "a", "b" ], parent : "b" } { _id : "e", ancestors : [ "a" ], parent : "a" } { _id : "f", ancestors : [ "a", "e" ], parent : "e" } //find all descendants of b: > db.tree2.find({ ancestors : ‘b’ }) //find all direct descendants of b: > db.tree2.find({ parent : ‘b’ })
  36. Array of Ancestors - Store all Ancestors of a node

    { _id : "a" } { _id : "b", ancestors : [ "a" ], parent : "a" } { _id : "c", ancestors : [ "a", "b" ], parent : "b" } { _id : "d", ancestors : [ "a", "b" ], parent : "b" } { _id : "e", ancestors : [ "a" ], parent : "a" } { _id : "f", ancestors : [ "a", "e" ], parent : "e" } //find all descendants of b: > db.tree2.find({ ancestors : ‘b’ }) //find all direct descendants of b: > db.tree2.find({ parent : ‘b’ }) //find all ancestors of f:
  37. Queue Requirements - See jobs waiting, jobs in progress -

    Ensure that each job is started once and only once { inprogress : false, priority : 1, message : “Rich documents FTW!”, ... } // find highest priority job and mark as in-progress job = db.jobs.findAndModify({ query : { inprogress : false }, sort : { priority: -1 }, update : { $set : { inprogress : true, started : new Date() } } })
  38. Summary Schema design is different in MongoDB Basic data design

    principles remain the same Focus on how the app manipulates data Rapidly evolve schema to meet your requirements Enjoy your new freedom, use it wisely :-)
  39. @mongodb conferences, appearances, and meetups http://www.10gen.com/events Facebook | Twitter |

    LinkedIn http://bit.ly/mongofb http://linkd.in/joinmongo download at mongodb.org Looking for translators! http://pt.wiki.mongodb.org/display/DOCS/Home
  40. @mongodb conferences, appearances, and meetups http://www.10gen.com/events Facebook | Twitter |

    LinkedIn http://bit.ly/mongofb http://linkd.in/joinmongo download at mongodb.org We’re Hiring ! [email protected]