Slide 1

Slide 1 text

Schema'Design'by'Example MongoDC'2012 Robert'Stam Software'Engineer [email protected]

Slide 2

Slide 2 text

MongoDB Data Model Blog Posts & Comments Geospatial Check-Ins Ticket Sales Final Thoughts • • • • • Agenda

Slide 3

Slide 3 text

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Roger", date : ISODate("2012-06-22T18:50:16Z"), text : "Spirited Away", tags : [ "Tezuka", "Manga" ], comments : [ { author : "Fred", date : ISODate("2012-06-22T18:56:30Z"), text : "Best Movie Ever" } ] } MongoDB Data Model: Rich Documents

Slide 4

Slide 4 text

Document properties Based on JSON (self describing) Stored as BSON BSON has a few additional data types (e.g. DateTime, Int32/64, ObjectId) Binary format allows for faster processing • • • •

Slide 5

Slide 5 text

Terminology Comparison RDBMS MongoDB Table Collection Row Document Column Field Index Index Join Embedding;&;Linking Schema;Object

Slide 6

Slide 6 text

User Name Email Address • • Category Name Url • • Article Name Slug Publish date Text • • • • Tag Name Url • • Comment Comment Date Author • • • Relational model

Slide 7

Slide 7 text

User Name Email Address • • Article Name Slug Publish date Text Author • • • • • Tag[] Value • Comment[] Comment Date Author • • • Category[] Value • Document model

Slide 8

Slide 8 text

Blog Posts and Comments

Slide 9

Slide 9 text

How Should the Documents Look? What Are We Going to Do with the Data? To embed or to link... That is the question!

Slide 10

Slide 10 text

{ _id : ObjectId("4fe4c2848e6d3c0afb79bd1f"), blog_title : "Commuting to Work", blog_text : [ "This section is about airplanes", "this section is about trains" ], comments : [ { author : "Kevin Hanson", comment : "dude, what about driving?" }, { author : "John Smith", comment : "this blog is aWful!!11!!!!" } ] } 1) Fully Embedded

Slide 11

Slide 11 text

1) Fully embedded Pros Can query the comments or the blog for results Cleanly encapsulated Cons What if we get too many comments? (16MB MongoDB doc size) What if we want our results to be comments, not blog posts? • • • • • •

Slide 12

Slide 12 text

{ _id : ObjectId("4fe4c2848e6d3c0afb79bd1f") blog_title : "Commuting to Work", blog_text : [ "This section is about airplanes", "this section is about trains" ] } { _id : ObjectId("4fe4c31d869457dfc7ded270") blog_id : ObjectId("4fe4c2848e6d3c0afb79bd1f") commenter : "Kevin Hanson", comment : "dude, what about driving?" } { _id : ObjectId("4fe4c326869457dfc7ded271") blog_id : ObjectId("4fe4c2848e6d3c0afb79bd1f") commenter : "John Smith", comment : "this blog is aWful!!11!!!!" } 2) Each Comment Gets Own Doc

Slide 13

Slide 13 text

2) Each Comment Gets Own Doc Pros Can Query Individual Comments Never Need to Worry About Doc Size • • Cons Many Documents Standard Use Cases Become Complicated • •

Slide 14

Slide 14 text

Managing Arrays Pushing to an Array Infinitely... Document Will Grow Larger than Allocated Space Document May Increase Max Doc Size of 16MB • • Can this be avoided?? Yes! A Hybrid of Linking and Embedding • •

Slide 15

Slide 15 text

Geospatial Check-Ins

Slide 16

Slide 16 text

We Need 3 Things Places Check-Ins Users

Slide 17

Slide 17 text

Places Q: Current location A: Places near location User Generated Content Places

Slide 18

Slide 18 text

var p = { _id : ObjectId("4fe925cee672e691fadee3af"), name : "10gen HQ", address : "578 Broadway, 7th Floor", city : "New York", zip : "10012" } > db.places.save(p) Inserting a Place

Slide 19

Slide 19 text

{ _id : ObjectId("4fe925cee672e691fadee3af"), name : "10gen HQ", address : "578 Broadway, 7th Floor", city : "New York", zip : "10012", tags : [ "MongoDB", "business" ], location : [72.0, 40.0], // longitude first! tips : [ { user : "kevin", time : ISODate("2012-03-15T22:45:12Z"), tip : "Make sure to stop by for office hours!" } ] } Tags, Geo Coordinates, and Tips

Slide 20

Slide 20 text

> db.places.update( { name : "10gen HQ" }, { $push : { tips : { user : "nosh", time : ISODate("2012-04-15T14:22:05Z"), tip : "stop by for office hours on Wednesdays from 4-6" } }} ) Updating Tips

Slide 21

Slide 21 text

Querying Places Creating)Indexes db.places.ensureIndex({)tags):)1)}) db.places.ensureIndex({)name):)1)}) db.places.ensureIndex({)location):)"2d")}) Finding)Places db.places.find({)location):){)$near):)[)70,)40)]}}) Regular)Expressions db.places.find({)name):)/^typeaheadstring/)}) Using)Tags db.places.find({)tags):)"business")}) • ★ ★ ★ • ★ • ★ • ★

Slide 22

Slide 22 text

User Check-Ins Record User Check-Ins Check-Ins Users Stats Users Stats

Slide 23

Slide 23 text

> var user1 = { _id : ObjectId("4fe4c55d869457dfc7ded272"), name : "Kevin Hanson", email : "[email protected]", checkIns : [ ObjectId("4fe4c5ae869457dfc7ded273"), ObjectId("4fe4c5b9869457dfc7ded274")] } // checkins [] = references to checkIns Users

Slide 24

Slide 24 text

Check-Ins > var checkIn = { _id : ObjectId("4fe4c5ae869457dfc7ded273"), place : "10gen HQ", ts : ISODate("2010-09-20T10:12:00Z"), user_id : ObjectId("4fe4c55d869457dfc7ded272") } Every&Check+In&is&Two&Operations &Insert&a&Check+In&Object&(check+ins&collection) &Update&($push)&user&object&with&check+in&ID&(users&collection) • •

Slide 25

Slide 25 text

> db.checkins.find({ place : "10gen HQ" ) > db.checkins.find( { place : "10gen HQ" } ).sort({ ts : -1 }).limit(10) > db.checkins.find( { place : "10gen HQ", ts : { $gt : midnight }} ).count() Simple Stats

Slide 26

Slide 26 text

> var mapFunc = function() { emit(this.place, 1); } > var reduceFunc = function(key, values) { return Array.sum(values); } > res = db.checkins.mapReduce( mapFunc, reduceFunc, { query : { timestamp: { $gt : nowminus3hrs }}} ) res = [{ _id : "10gen HQ", value : 17 }, ...] ..."or"try"using"the"new"aggregation"framework! Tyler"Brock"@"4:15PM"in"Melton"Rehearsal"Hall Stats w/ MapReduce

Slide 27

Slide 27 text

Ticket Sales We want to able to hold seats while the purchase is in progress without using traditional transactions Data model SeatingChart collection Seat collection • • • •

Slide 28

Slide 28 text

// SeatingChart document { _id : { event_id : ObjectId("4fe4f305869457dfc7ded276"), row : "A" }, available : { A1 : false, // not available A2 : true, // available .... } } SeatingChart collection

Slide 29

Slide 29 text

// Seat document { _id : { event_id : ObjectId("4fe4f305869457dfc7ded276"), seat_id : "A1" }, status : "Available" } // full set of statuses is: // Available // Held // Authorizing // Purchased Seat collection

Slide 30

Slide 30 text

Protocol to hold seats Find a block of available seats Use FindAndModify to atomically change status of seats from "Available" to "Held" If any are no longer "Available" set all "Held" seats back to "Available" and start over Only now go to SeatingChart and change the available flag to false for every seat in the block • • • •

Slide 31

Slide 31 text

> var q = { _id : { event_id : ObjectId("4fe4f305869457dfc7ded276"), seat_id : "A1" }, status : "Available" } > var u = { { $set : { status : "Held", status_ts : new Date() } } } > db.seats.findAndModify({ query : q, update : u }) Query to change seat to "Held"

Slide 32

Slide 32 text

// if all seats were changed to "Held" change seatingChart > var q = { _id : { event_id : ObjectId("4fe4f305869457dfc7ded276"), row : "A" } } > var u = { $set : { "available.A1" : false, "available.A2" : false, ... } } > db.seatingChart.update(q, u) Query to change available to false

Slide 33

Slide 33 text

Protocol to purchase seats Change status of each seat from "Held" to "Authorizing" If any seats were no longer in "Held" status start over If payment authorization succeeds change status of each seat from "Authorizing" to "Purchased" If payment authorization does not suceed change status of each seat from "Authorizing" to "Held" and request new form of payment • • • •

Slide 34

Slide 34 text

Compensating Actions What if customer just walks away? Periodically scan for seats that have been in "Held" status for longer than some limit (say 10 minutes) and set them back to "Available" • •

Slide 35

Slide 35 text

Final Thoughts

Slide 36

Slide 36 text

Data How the App Wants It There are no absolute rules about how to design your schema, only guidelines Think about how the application wants the data Not how it is most “normalized” Biggest decision is usually embedding vs linking • • • •

Slide 37

Slide 37 text

@mongodb http://bit.ly/mongofb Facebook7777|777777Twitter7777|77777LinkedIn http://linkd.in/joinmongo More info at http://www.mongodb.org/ Robert7Stam Software7Engineer,710gen [email protected]