Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MongoDB DC 2012: Schema Design by Example

mongodb
June 26, 2012
1.8k

MongoDB DC 2012: Schema Design by Example

MongoDB has been designed for versatility, but the techniques you might use to build, say, an analytics engine or a hierarchical data store might not be obvious. In this talk, we'll learn about MongoDB in practice by looking at four hypothetical application designs (based on real-world designs, of course). Topics to be covered include schema design, indexing, transactions (gasp!), trees, what's fast, and what's not. Sprinkled with tips, tricks, shoots, ladders, and trap doors, you're guaranteed to learn something new in this interdisciplinary talk.

mongodb

June 26, 2012
Tweet

Transcript

  1. MongoDB Data Model Blog Posts & Comments Geospatial Check-Ins Ticket

    Sales Final Thoughts • • • • • Agenda
  2. { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Roger", date : ISODate("2012-06-22T18:50:16Z"),

    text : "Spirited Away", tags : [ "Tezuka", "Manga" ], comments : [ { author : "Fred", date : ISODate("2012-06-22T18:56:30Z"), text : "Best Movie Ever" } ] } MongoDB Data Model: Rich Documents
  3. Document properties Based on JSON (self describing) Stored as BSON

    BSON has a few additional data types (e.g. DateTime, Int32/64, ObjectId) Binary format allows for faster processing • • • •
  4. Terminology Comparison RDBMS MongoDB Table Collection Row Document Column Field

    Index Index Join Embedding;&;Linking Schema;Object
  5. User Name Email Address • • Category Name Url •

    • Article Name Slug Publish date Text • • • • Tag Name Url • • Comment Comment Date Author • • • Relational model
  6. User Name Email Address • • Article Name Slug Publish

    date Text Author • • • • • Tag[] Value • Comment[] Comment Date Author • • • Category[] Value • Document model
  7. How Should the Documents Look? What Are We Going to

    Do with the Data? To embed or to link... That is the question!
  8. { _id : ObjectId("4fe4c2848e6d3c0afb79bd1f"), blog_title : "Commuting to Work", blog_text

    : [ "This section is about airplanes", "this section is about trains" ], comments : [ { author : "Kevin Hanson", comment : "dude, what about driving?" }, { author : "John Smith", comment : "this blog is aWful!!11!!!!" } ] } 1) Fully Embedded
  9. 1) Fully embedded Pros Can query the comments or the

    blog for results Cleanly encapsulated Cons What if we get too many comments? (16MB MongoDB doc size) What if we want our results to be comments, not blog posts? • • • • • •
  10. { _id : ObjectId("4fe4c2848e6d3c0afb79bd1f") blog_title : "Commuting to Work", blog_text

    : [ "This section is about airplanes", "this section is about trains" ] } { _id : ObjectId("4fe4c31d869457dfc7ded270") blog_id : ObjectId("4fe4c2848e6d3c0afb79bd1f") commenter : "Kevin Hanson", comment : "dude, what about driving?" } { _id : ObjectId("4fe4c326869457dfc7ded271") blog_id : ObjectId("4fe4c2848e6d3c0afb79bd1f") commenter : "John Smith", comment : "this blog is aWful!!11!!!!" } 2) Each Comment Gets Own Doc
  11. 2) Each Comment Gets Own Doc Pros Can Query Individual

    Comments Never Need to Worry About Doc Size • • Cons Many Documents Standard Use Cases Become Complicated • •
  12. Managing Arrays Pushing to an Array Infinitely... Document Will Grow

    Larger than Allocated Space Document May Increase Max Doc Size of 16MB • • Can this be avoided?? Yes! A Hybrid of Linking and Embedding • •
  13. var p = { _id : ObjectId("4fe925cee672e691fadee3af"), name : "10gen

    HQ", address : "578 Broadway, 7th Floor", city : "New York", zip : "10012" } > db.places.save(p) Inserting a Place
  14. { _id : ObjectId("4fe925cee672e691fadee3af"), name : "10gen HQ", address :

    "578 Broadway, 7th Floor", city : "New York", zip : "10012", tags : [ "MongoDB", "business" ], location : [72.0, 40.0], // longitude first! tips : [ { user : "kevin", time : ISODate("2012-03-15T22:45:12Z"), tip : "Make sure to stop by for office hours!" } ] } Tags, Geo Coordinates, and Tips
  15. > db.places.update( { name : "10gen HQ" }, { $push

    : { tips : { user : "nosh", time : ISODate("2012-04-15T14:22:05Z"), tip : "stop by for office hours on Wednesdays from 4-6" } }} ) Updating Tips
  16. > var user1 = { _id : ObjectId("4fe4c55d869457dfc7ded272"), name :

    "Kevin Hanson", email : "[email protected]", checkIns : [ ObjectId("4fe4c5ae869457dfc7ded273"), ObjectId("4fe4c5b9869457dfc7ded274")] } // checkins [] = references to checkIns Users
  17. Check-Ins > var checkIn = { _id : ObjectId("4fe4c5ae869457dfc7ded273"), place

    : "10gen HQ", ts : ISODate("2010-09-20T10:12:00Z"), user_id : ObjectId("4fe4c55d869457dfc7ded272") } Every&Check+In&is&Two&Operations &Insert&a&Check+In&Object&(check+ins&collection) &Update&($push)&user&object&with&check+in&ID&(users&collection) • •
  18. > db.checkins.find({ place : "10gen HQ" ) > db.checkins.find( {

    place : "10gen HQ" } ).sort({ ts : -1 }).limit(10) > db.checkins.find( { place : "10gen HQ", ts : { $gt : midnight }} ).count() Simple Stats
  19. > var mapFunc = function() { emit(this.place, 1); } >

    var reduceFunc = function(key, values) { return Array.sum(values); } > res = db.checkins.mapReduce( mapFunc, reduceFunc, { query : { timestamp: { $gt : nowminus3hrs }}} ) res = [{ _id : "10gen HQ", value : 17 }, ...] ..."or"try"using"the"new"aggregation"framework! Tyler"Brock"@"4:15PM"in"Melton"Rehearsal"Hall Stats w/ MapReduce
  20. Ticket Sales We want to able to hold seats while

    the purchase is in progress without using traditional transactions Data model SeatingChart collection Seat collection • • • •
  21. // SeatingChart document { _id : { event_id : ObjectId("4fe4f305869457dfc7ded276"),

    row : "A" }, available : { A1 : false, // not available A2 : true, // available .... } } SeatingChart collection
  22. // Seat document { _id : { event_id : ObjectId("4fe4f305869457dfc7ded276"),

    seat_id : "A1" }, status : "Available" } // full set of statuses is: // Available // Held // Authorizing // Purchased Seat collection
  23. Protocol to hold seats Find a block of available seats

    Use FindAndModify to atomically change status of seats from "Available" to "Held" If any are no longer "Available" set all "Held" seats back to "Available" and start over Only now go to SeatingChart and change the available flag to false for every seat in the block • • • •
  24. > var q = { _id : { event_id :

    ObjectId("4fe4f305869457dfc7ded276"), seat_id : "A1" }, status : "Available" } > var u = { { $set : { status : "Held", status_ts : new Date() } } } > db.seats.findAndModify({ query : q, update : u }) Query to change seat to "Held"
  25. // if all seats were changed to "Held" change seatingChart

    > var q = { _id : { event_id : ObjectId("4fe4f305869457dfc7ded276"), row : "A" } } > var u = { $set : { "available.A1" : false, "available.A2" : false, ... } } > db.seatingChart.update(q, u) Query to change available to false
  26. Protocol to purchase seats Change status of each seat from

    "Held" to "Authorizing" If any seats were no longer in "Held" status start over If payment authorization succeeds change status of each seat from "Authorizing" to "Purchased" If payment authorization does not suceed change status of each seat from "Authorizing" to "Held" and request new form of payment • • • •
  27. Compensating Actions What if customer just walks away? Periodically scan

    for seats that have been in "Held" status for longer than some limit (say 10 minutes) and set them back to "Available" • •
  28. Data How the App Wants It There are no absolute

    rules about how to design your schema, only guidelines Think about how the application wants the data Not how it is most “normalized” Biggest decision is usually embedding vs linking • • • •