Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MongoDB DC 2012: Schema Design by Example

D8fc2580cfaca035f666d9e4ee79a7f7?s=47 mongodb
June 26, 2012
1.7k

MongoDB DC 2012: Schema Design by Example

MongoDB has been designed for versatility, but the techniques you might use to build, say, an analytics engine or a hierarchical data store might not be obvious. In this talk, we'll learn about MongoDB in practice by looking at four hypothetical application designs (based on real-world designs, of course). Topics to be covered include schema design, indexing, transactions (gasp!), trees, what's fast, and what's not. Sprinkled with tips, tricks, shoots, ladders, and trap doors, you're guaranteed to learn something new in this interdisciplinary talk.

D8fc2580cfaca035f666d9e4ee79a7f7?s=128

mongodb

June 26, 2012
Tweet

Transcript

  1. Schema'Design'by'Example MongoDC'2012 Robert'Stam Software'Engineer robert@10gen.com

  2. MongoDB Data Model Blog Posts & Comments Geospatial Check-Ins Ticket

    Sales Final Thoughts • • • • • Agenda
  3. { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Roger", date : ISODate("2012-06-22T18:50:16Z"),

    text : "Spirited Away", tags : [ "Tezuka", "Manga" ], comments : [ { author : "Fred", date : ISODate("2012-06-22T18:56:30Z"), text : "Best Movie Ever" } ] } MongoDB Data Model: Rich Documents
  4. Document properties Based on JSON (self describing) Stored as BSON

    BSON has a few additional data types (e.g. DateTime, Int32/64, ObjectId) Binary format allows for faster processing • • • •
  5. Terminology Comparison RDBMS MongoDB Table Collection Row Document Column Field

    Index Index Join Embedding;&;Linking Schema;Object
  6. User Name Email Address • • Category Name Url •

    • Article Name Slug Publish date Text • • • • Tag Name Url • • Comment Comment Date Author • • • Relational model
  7. User Name Email Address • • Article Name Slug Publish

    date Text Author • • • • • Tag[] Value • Comment[] Comment Date Author • • • Category[] Value • Document model
  8. Blog Posts and Comments

  9. How Should the Documents Look? What Are We Going to

    Do with the Data? To embed or to link... That is the question!
  10. { _id : ObjectId("4fe4c2848e6d3c0afb79bd1f"), blog_title : "Commuting to Work", blog_text

    : [ "This section is about airplanes", "this section is about trains" ], comments : [ { author : "Kevin Hanson", comment : "dude, what about driving?" }, { author : "John Smith", comment : "this blog is aWful!!11!!!!" } ] } 1) Fully Embedded
  11. 1) Fully embedded Pros Can query the comments or the

    blog for results Cleanly encapsulated Cons What if we get too many comments? (16MB MongoDB doc size) What if we want our results to be comments, not blog posts? • • • • • •
  12. { _id : ObjectId("4fe4c2848e6d3c0afb79bd1f") blog_title : "Commuting to Work", blog_text

    : [ "This section is about airplanes", "this section is about trains" ] } { _id : ObjectId("4fe4c31d869457dfc7ded270") blog_id : ObjectId("4fe4c2848e6d3c0afb79bd1f") commenter : "Kevin Hanson", comment : "dude, what about driving?" } { _id : ObjectId("4fe4c326869457dfc7ded271") blog_id : ObjectId("4fe4c2848e6d3c0afb79bd1f") commenter : "John Smith", comment : "this blog is aWful!!11!!!!" } 2) Each Comment Gets Own Doc
  13. 2) Each Comment Gets Own Doc Pros Can Query Individual

    Comments Never Need to Worry About Doc Size • • Cons Many Documents Standard Use Cases Become Complicated • •
  14. Managing Arrays Pushing to an Array Infinitely... Document Will Grow

    Larger than Allocated Space Document May Increase Max Doc Size of 16MB • • Can this be avoided?? Yes! A Hybrid of Linking and Embedding • •
  15. Geospatial Check-Ins

  16. We Need 3 Things Places Check-Ins Users

  17. Places Q: Current location A: Places near location User Generated

    Content Places
  18. var p = { _id : ObjectId("4fe925cee672e691fadee3af"), name : "10gen

    HQ", address : "578 Broadway, 7th Floor", city : "New York", zip : "10012" } > db.places.save(p) Inserting a Place
  19. { _id : ObjectId("4fe925cee672e691fadee3af"), name : "10gen HQ", address :

    "578 Broadway, 7th Floor", city : "New York", zip : "10012", tags : [ "MongoDB", "business" ], location : [72.0, 40.0], // longitude first! tips : [ { user : "kevin", time : ISODate("2012-03-15T22:45:12Z"), tip : "Make sure to stop by for office hours!" } ] } Tags, Geo Coordinates, and Tips
  20. > db.places.update( { name : "10gen HQ" }, { $push

    : { tips : { user : "nosh", time : ISODate("2012-04-15T14:22:05Z"), tip : "stop by for office hours on Wednesdays from 4-6" } }} ) Updating Tips
  21. Querying Places Creating)Indexes db.places.ensureIndex({)tags):)1)}) db.places.ensureIndex({)name):)1)}) db.places.ensureIndex({)location):)"2d")}) Finding)Places db.places.find({)location):){)$near):)[)70,)40)]}}) Regular)Expressions db.places.find({)name):)/^typeaheadstring/)})

    Using)Tags db.places.find({)tags):)"business")}) • ★ ★ ★ • ★ • ★ • ★
  22. User Check-Ins Record User Check-Ins Check-Ins Users Stats Users Stats

  23. > var user1 = { _id : ObjectId("4fe4c55d869457dfc7ded272"), name :

    "Kevin Hanson", email : "kevin@10gen.com", checkIns : [ ObjectId("4fe4c5ae869457dfc7ded273"), ObjectId("4fe4c5b9869457dfc7ded274")] } // checkins [] = references to checkIns Users
  24. Check-Ins > var checkIn = { _id : ObjectId("4fe4c5ae869457dfc7ded273"), place

    : "10gen HQ", ts : ISODate("2010-09-20T10:12:00Z"), user_id : ObjectId("4fe4c55d869457dfc7ded272") } Every&Check+In&is&Two&Operations &Insert&a&Check+In&Object&(check+ins&collection) &Update&($push)&user&object&with&check+in&ID&(users&collection) • •
  25. > db.checkins.find({ place : "10gen HQ" ) > db.checkins.find( {

    place : "10gen HQ" } ).sort({ ts : -1 }).limit(10) > db.checkins.find( { place : "10gen HQ", ts : { $gt : midnight }} ).count() Simple Stats
  26. > var mapFunc = function() { emit(this.place, 1); } >

    var reduceFunc = function(key, values) { return Array.sum(values); } > res = db.checkins.mapReduce( mapFunc, reduceFunc, { query : { timestamp: { $gt : nowminus3hrs }}} ) res = [{ _id : "10gen HQ", value : 17 }, ...] ..."or"try"using"the"new"aggregation"framework! Tyler"Brock"@"4:15PM"in"Melton"Rehearsal"Hall Stats w/ MapReduce
  27. Ticket Sales We want to able to hold seats while

    the purchase is in progress without using traditional transactions Data model SeatingChart collection Seat collection • • • •
  28. // SeatingChart document { _id : { event_id : ObjectId("4fe4f305869457dfc7ded276"),

    row : "A" }, available : { A1 : false, // not available A2 : true, // available .... } } SeatingChart collection
  29. // Seat document { _id : { event_id : ObjectId("4fe4f305869457dfc7ded276"),

    seat_id : "A1" }, status : "Available" } // full set of statuses is: // Available // Held // Authorizing // Purchased Seat collection
  30. Protocol to hold seats Find a block of available seats

    Use FindAndModify to atomically change status of seats from "Available" to "Held" If any are no longer "Available" set all "Held" seats back to "Available" and start over Only now go to SeatingChart and change the available flag to false for every seat in the block • • • •
  31. > var q = { _id : { event_id :

    ObjectId("4fe4f305869457dfc7ded276"), seat_id : "A1" }, status : "Available" } > var u = { { $set : { status : "Held", status_ts : new Date() } } } > db.seats.findAndModify({ query : q, update : u }) Query to change seat to "Held"
  32. // if all seats were changed to "Held" change seatingChart

    > var q = { _id : { event_id : ObjectId("4fe4f305869457dfc7ded276"), row : "A" } } > var u = { $set : { "available.A1" : false, "available.A2" : false, ... } } > db.seatingChart.update(q, u) Query to change available to false
  33. Protocol to purchase seats Change status of each seat from

    "Held" to "Authorizing" If any seats were no longer in "Held" status start over If payment authorization succeeds change status of each seat from "Authorizing" to "Purchased" If payment authorization does not suceed change status of each seat from "Authorizing" to "Held" and request new form of payment • • • •
  34. Compensating Actions What if customer just walks away? Periodically scan

    for seats that have been in "Held" status for longer than some limit (say 10 minutes) and set them back to "Available" • •
  35. Final Thoughts

  36. Data How the App Wants It There are no absolute

    rules about how to design your schema, only guidelines Think about how the application wants the data Not how it is most “normalized” Biggest decision is usually embedding vs linking • • • •
  37. @mongodb http://bit.ly/mongofb Facebook7777|777777Twitter7777|77777LinkedIn http://linkd.in/joinmongo More info at http://www.mongodb.org/ Robert7Stam Software7Engineer,710gen

    robert@10gen.com