Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MongoDB Schema Design

MongoDB Schema Design

Presented at MongoDB Sydney. March 2012

Sridhar Nanjundeswaran

March 17, 2012
Tweet

More Decks by Sridhar Nanjundeswaran

Other Decks in Programming

Transcript

  1. Topics Schema Design 101 - Introduction • Basic Modelling &

    Schema evolution Schema Design 102 - Common patterns • Single table inheritance • One-to-Many & Many-to-Many • Trees • Queues Schema design is easy! • Data as Objects in code
  2. Terminology RDBMS MongoDB Table Collection Row(s) JSON Document Index Index

    Join Embedding & Linking Partition Shard Partition Key Shard Key
  3. Design Session Design documents that simply map to your application

    > post = {author: "Hergé", date: ISODate("2011-09-18T09:56:06.298Z"), text: "Destination Moon", tags: ["comic", "adventure"]} > db.posts.save(post)
  4. Find the document > db.posts.find() { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Hergé",

    date: ISODate("2011-09-18T09:56:06.298Z"), text: "Destination Moon", tags: [ "comic", "adventure" ] }
  5. Add and index, find via Index Secondary index for “author”

    // 1 means ascending, -1 means descending > db.posts.ensureIndex({author: 1}) > db.posts.find({author: 'Hergé'}) { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), date: ISODate("2011-09-18T09:56:06.298Z"), author: "Hergé", ... }
  6. Examine the query plan > db.blogs.find({author: "Hergé"}).explain() { "cursor" :

    "BtreeCursor author_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 5, "indexBounds" : { "author" : [ [ "Hergé", "Hergé" ] ] } }
  7. Extending the Schema new_comment = {author: ”Sridhar", date: new Date(),

    text: "great book"} > db.posts.update( {text: "Destination Moon" }, {"$push": {comments: new_comment}, "$inc": {comments_count: 1}} )
  8. Extending the Schema > db.blogs.find({_id: ObjectId("4c4ba5c0672c685e5e8aabf3")}) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),

    author : "Hergé", …, comments : [ { author : ”Sridhar", date : ISODate("2011-09-19T09:56:06.298Z"), text : "great book" }], comments_count: 1 }
  9. Extending the Schema // create index on nested documents: >

    db.posts.ensureIndex({"comments.author": 1}) > db.posts.find({"comments.author":"Kyle"}) // find last 5 posts: > db.posts.find().sort({date:-1}).limit(5) // most commented post:  db.posts.find().sort({comments_count:-1}).limit(1) When sorting, check if you need an index
  10. Summarize • Use the flexibility to iterate • Use richness

    of documents to map your actual objects • Index wisely • Common queries • Do not duplicate indexes (A) and (A,B) are not needed
  11. Single Table Inheritance - RDBMS shapes table id type area

    radius length width 1 circle 3.14 1 2 square 4 2 3 rect 10 5 2
  12. Single Table Inheritance - MongoDB > db.shapes.find() { _id: "1",

    type: "circle",area: 3.14, radius: 1} { _id: "2", type: "square",area: 4, length: 2} { _id: "3", type: "rect", area: 10, length: 5, width: 2} missing values not stored!
  13. Single Table Inheritance - MongoDB > db.shapes.find() { _id: "1",

    type: "circle",area: 3.14, radius: 1} { _id: "2", type: "square",area: 4, length: 2} { _id: "3", type: "rect", area: 10, length: 5, width: 2} // find shapes where radius > 0 > db.shapes.find({radius: {$gt: 0}})
  14. Single Table Inheritance - MongoDB > db.shapes.find() { _id: "1",

    type: "circle",area: 3.14, radius: 1} { _id: "2", type: "square",area: 4, length: 2} { _id: "3", type: "rect", area: 10, length: 5, width: 2} // create index > db.shapes.ensureIndex({radius: 1}, {sparse:true}) index only values present!
  15. One to Many One to Many relationships can specify •

    degree of association between objects • containment • life-cycle
  16. One to Many Embedded Array • $slice operator to return

    subset of comments • some queries harder e.g find latest comments across all blogs blogs: { author : "Hergé", date : ISODate("2011-09-18T09:56:06.298Z"), comments : [ { author : ”Sridhar", date : ISODate("2011-09-19T09:56:06.298Z"), text : "great book" } ]}
  17. One to Many Normalized (2 collections) - most flexible -

    more queries blogs: { _id: 1000, author: "Hergé", date: ISODate("2011-09-18T09:56:06.298Z"), comments: [ {comment : 1)} ]} comments : { _id : 1, blog: 1000, author : "Kyle", date : ISODate("2011-09-19T09:56:06.298Z")} > blog = db.blogs.find({text: "Destination Moon"}); > db.comments.find({blog: blog._id});
  18. Many - Many Example: • Product can be in many

    categories • Category can have many products
  19. Many - Many products: { _id: 10, name: "Destination Moon",

    category_ids: [ 20, 30 ] } categories: { _id: 20, name: "adventure", product_ids: [ 10, 11, 12 ] }
  20. Many - Many products: { _id: 10, name: "Destination Moon",

    category_ids: [ 20, 30 ] } categories: { _id: 20, name: "adventure", product_ids: [ 10, 11, 12 ] } //All categories for a given product > db.categories.find({product_ids: 10})
  21. products: { _id: 10, name: "Destination Moon", category_ids: [ 20,

    30 ] } categories: { _id: 20, name: "adventure"} Alternative
  22. products: { _id: 10, name: "Destination Moon", category_ids: [ 20,

    30 ] } categories: { _id: 20, name: "adventure"} • // All products for a given category • > db.products.find({category_ids: 20)}) Alternative
  23. products: { _id: 10, name: "Destination Moon", category_ids: [ 20,

    30 ] } categories: { _id: 20, name: "adventure"} // All products for a given category > db.products.find({category_ids: 20)}) // All categories for a given product > product = db.products.find(_id : some_id) > db.categories.find({_id : {$in : product.category_ids}}) Alternative
  24. Trees Full Tree in Document { comments: [ { author:

    “Sridhar”, text: “...”, replies: [ {author: “James”, text: “...”, replies: []} ]} ] } • Pros: Single Document, Performance, Intuitive • Cons: Hard to search, Partial Results, 16MB limit •
  25. Array of Ancestors Store all Ancestors of a node {

    _id: "a" } { _id: "b", thread: [ "a" ], replyTo: "a" } { _id: "c", thread: [ "a", "b" ], replyTo: "b" } { _id: "d", thread: [ "a", "b" ], replyTo: "b" } { _id: "e", thread: [ "a" ], replyTo: "a" } { _id: "f", thread: [ "a", "e" ], replyTo: "e" } // find all threads where "b" is in > db.msg_tree.find({thread: "b"}) A B C D E F
  26. Array of Ancestors Store all Ancestors of a node {

    _id: "a" } { _id: "b", thread: [ "a" ], replyTo: "a" } { _id: "c", thread: [ "a", "b" ], replyTo: "b" } { _id: "d", thread: [ "a", "b" ], replyTo: "b" } { _id: "e", thread: [ "a" ], replyTo: "a" } { _id: "f", thread: [ "a", "e" ], replyTo: "e" } // find all threads where "b" is in > db.msg_tree.find({thread: "b"}) // find all direct message "b: replied to > db.msg_tree.find({replyTo: "b"}) A B C D E F
  27. Array of Ancestors Store all Ancestors of a node {

    _id: "a" } { _id: "b", thread: [ "a" ], replyTo: "a" } { _id: "c", thread: [ "a", "b" ], replyTo: "b" } { _id: "d", thread: [ "a", "b" ], replyTo: "b" } { _id: "e", thread: [ "a" ], replyTo: "a" } { _id: "f", thread: [ "a", "e" ], replyTo: "e" } // find all threads where "b" is in > db.msg_tree.find({thread: "b"}) //find all ancestors of f: > threads = db.msg_tree.findOne({_id:"f"}).thread > db.msg_tree.find({_id: { $in : threads}) A B C D E F
  28. Trees as Paths Store hierarchy as a path expression -

    Separate each node by a delimiter, e.g. “/” - Use text search for find parts of a tree { comments: [ { author: "Kyle", text: "initial post", path: "" }, { author: "Jim", text: "jim’s comment", path: "jim" }, { author: "Kyle", text: "Kyle’s reply to Jim", path : "jim/kyle"} ] } // Find the conversations Jim was part of > db.posts.find({path: /^jim/i})
  29. Queue Need to maintain order and state Ensure that updates

    are atomic db.jobs.save( { inprogress: false, priority: 1, ... }); // find highest priority job and mark as in-progress job = db.jobs.findAndModify({ query: {inprogress: false}, sort: {priority: -1}, update: {$set: {inprogress: true, started: new Date()}}, new: true})
  30. Queue Need to maintain order and state Ensure that updates

    are atomic db.jobs.save( { inprogress: false, priority: 1, ... }); // find highest priority job and mark as in-progress job = db.jobs.findAndModify({ query: {inprogress: false}, sort: {priority: -1}, update: {$set: {inprogress: true, started: new Date()}}, new: true})
  31. Use MongoDB with your language 10gen Supported Drivers •Ruby, Python,

    Perl, PHP, Javascript •Java, C/C++, C#, Scala, Erlang, Haskell Object Data Mappers •Morphia – Java •Mongoid, MongoMapper - Ruby Community Drivers •Smalltalk, Clojure, Go, Groovy
  32. Design in code – Java driver // Get a connection

    to the database •DBCollection coll = new Mongo().getDB("blogs"); // Create the Object •Map<String, Object> obj = new HashMap... •obj.add("author", "Hergé"); •obj.add("text", "Destination Moon"); •obj.add("date", new Date()); // Insert the object into MongoDB •coll.insert(new BasicDBObject(obj));
  33. Design in code - Java using ODM • // Use

    Morphia annotations • @Entity • class Blog { • @Id • String author; • @Indexed • Date date; • String text; • }
  34. Design in code - Java using ODM • // Create

    the data store • Datastore ds = new Morphia().createDatastore() • // Create the Object • Blog entry = new Blog("Hergé", New Date(), "Destination Moon") • // Insert object into MongoDB • ds.save(entry);
  35. Summary • Schema design is different in MongoDB • Basic

    data design principles stay the same • Focus on how the application manipulates data • Rapidly evolve schema to meet your requirements • Enjoy your new freedom, use it wisely :-)