Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MongoNYC 2012: Schema Design by Example

mongodb
May 25, 2012
510

MongoNYC 2012: Schema Design by Example

MongoNYC 2012: Schema Design by Example, Kyle Banker, 10gen. MongoDB has been designed for versatility, but the techniques you might use to build, say, an analytics engine or a hierarchical data store might not be obvious. In this talk, we'll learn about MongoDB in practice by looking at hypothetical application designs (based on real-world designs, of course). Topics to be covered include schema design, indexing, transactions (gasp!), trees, what's fast, and what's not. Sprinkled with tips, tricks, shoots, ladders, and trap doors, you're guaranteed to learn something new in this interdisciplinary talk.

mongodb

May 25, 2012
Tweet

Transcript

  1. Strategy Start with a normalized model. Use a rich, 'embedded'

    document for simplicity and / or performance.
  2. Two Strategies Store a map of the hierarchy in a

    single document. Store a list of ancestors (what we'll do).
  3. A sample node: { _id: ObjectId("6a5b1476238d3b4dd5000048"), parent_id: ObjectId("9a9fb1476238d3b4dd5000001"), name: "Hard

    bop", description: "Pioneered by ...", slug: "hard-bop", ancestors: [ {name: "Bop", _id: ObjectId("8b87fb1476238d3b4dd5000003"), slug: "bop" }, {name: "Ragtime", _id: ObjectId("9a9fb1476238d3b4dd5000001"), slug: "ragtime"} ] }
  4. Build an ancestor list, and update the category. function build_ancestors(

    _id, parent_id ) { ancestor_list = [] while( parent = db.cat.findOne( {_id: parent_id}, { name: 1, slug: 1, _id: 1 } ) { ancestor_list.unshift( parent ); parent_id = parent.parent_id } db.cat.update({_id: _id}, {"$set" {ancestors: ancestor_list}} ); }
  5. Create a document for 'swing'... category = { parent_id: ragtime_id,

    slug: "swing", name: "Swing", description: "It don't mean a thing if it ain't got that..." };
  6. Then, update the ancestor list for all descendents of 'Bop'.

    query = {'ancestors.id': bop_id}; db.cat.find(query).forEach(function(category) { build_ancestors(category['_id'], category['parent_id']); }
  7. Bank accounts Transferring money is not a transaction. Distributed systems,

    long-running transactions, and contentious environments may require a different transactional strategy (Jim Gray. "The Transaction Concept: Virtues and Limitations").
  8. One document for the seating map. // Orchestra section map.

    { _id: ObjectId("8b87fb1476238d3b4dd5000003"), event_id: 133, section: 'ORCH', seats: { 'A1': 1 'A2': 1 'A3': 0 'A4': 0 'A5': 0 // 'A5' - 'J20' } }
  9. 2. One document for each seat. { _id: { event_id:

    133, seat: 'A1' }, order_id: null, updated: new Date(), state: 'AVAILABLE' }
  10. The protocol: User requests seats A3, A4, and A5. We

    read-modify-write the state of each document. Then we update the seating map.
  11. // Transition state, and roll back on failure. [ 'A3',

    'A4', 'A5' ].forEach(function(n) { q = { _id: { event_id: 133, seat: n } }; u = { $set: { order_id: 333, state: 'CART', updated: new Date() } }; if( ! db.seats.findAndModify({ query: q, update: u }) rollback(); }
  12. // If successful, modify seat map. db.seat_maps.update({ event_id: 133 },

    {$set: { 'seats.A2': 1, 'seats.A3': 1, 'seats.A4': 1 });
  13. Cleanup protocol: Periodic daemon. Runs every minute. Revert if 'CART'

    or 'PENDING_AUTH' and older than 10 minutes. Set to 'AVAILABLE'.
  14. Checkout // Transition state, and roll back on failure. [

    'A3', 'A4', 'A5' ].forEach(function(n) { q = { _id: { event_id: 133, seat: n } }; u = { $set: { order_id: 333, state: 'PENDING_AUTH', updated: new Date() } }; if( ! db.seats.findAndModify({ query: q, update: u }) rollback(); } db.orders.update()
  15. Authorize // Authorize credit card. // If failure, rollback() tickets

    to 'CART' status. // If successful, transition tickets to 'PURCHASED'. // Add tickets to order object. db.orders.update( _id: 333, { $pushAll: { line_items: [ 'A3', 'A4', 'A5' ] });
  16. Summary Use atomic writes and read-modify write semantics. Define a

    transactional procedure using compensation mechanisms. If you're using MongoDB, and you need basic transactional semantics, you have some tools.
  17. User document (stores subscribed feeds) { _id: ObjectId("4e316f236ce9cca7ef17d59d"), username: 'kbanker',

    feeds: [ { _id: ObjectId("4e316f236ce9cca7ef17d54c"), name: "GigaOM" }, { _id: ObjectId("4e316f236ce9cca7ef17d54d"), name: "Salon.com" } { _id: ObjectId("4e316f236ce9cca7ef17d54e"), name: "Google News" } ], latest: Date(2011, 7, 25) }
  18. Adding a feed with an upsert db.feeds.update( { url: 'http://news.google.com/feed.xml/',

    name: 'Google News'}, { $inc: {subscriber_count: 1} }, true )
  19. Subscribing a user to the feed db.users.update( {_id: ObjectId("4e316f236ce9cca7ef17d59d") },

    { $addToSet: { feeds: { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"), name: 'Google News' } } )
  20. Unsubscribing from a feed db.users.update( { _id: ObjectId("4e316f236ce9cca7ef17d59d") }, {

    $pull: { feeds: { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"), name: 'Google News' } } )
  21. Feed entries (populate in background) { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"), feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"),

    title: 'Lorem Ipsum...', body: 'Vestibulum eget dui metus...', reads: 5, date: Date(2011, 7, 27) }
  22. What we need now Populate personal feeds (buckets) Avoid lots

    of expensive queries Record what's been read
  23. But with bucketing, we can be smarter. entries = db.entries.find(

    { date: {$gt: user_latest }, feed_id: { $in: user_feed_ids } ).sort({date: -1})
  24. bucket = { _id: ObjectId("4e3185c26ce9cca7ef17d552"), user_id: ObjectId("4e316f236ce9cca7ef17d59d"), date: Date( 2011,

    7, 27 ) n: 100, entries: [ { _id: ObjectId("4e316b8a6ce9cca7ef17d5ac"), feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"), title: 'Lorem Ipsum...', body: 'Vestibulum eget dui metus...', date: Date(2011, 7, 27), read: false }, { _id: ObjectId("4e316b8a6ce9cca7ef17d5c8"), feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"), title: 'Panda bear waves hello', body: 'A panda bear at the local zoo...', date: Date(2011, 7, 27), read: false } ] }
  25. The bucket is a strategic denormalization 1 query + 1

    seek (if not in memory) = document-level locality
  26. Insert the bucket and set the latest entry db.buckets.insert( bucket

    ) db.users.update( { _id: ObjectId("4e316f236ce9cca7ef17d59d") } { $set: { date: latest_entry_date } } )
  27. Mark a personal feed item as 'read'. db.buckets.update( { user_id:

    ObjectId("4e316f236ce9cca7ef17d59d"), 'entries.id': ObjectId("4e316b8a6ce9cca7ef17d5c8")}, { $set: { 'entries.$.read' : true } )
  28. Increment the read count on the entry... db.entries.update( { _id:

    ObjectId("4e316b8a6ce9cca7ef17d5c8") }, { $inc: { read: 1 } } )
  29. Summary Use rich document structures for caching. Take advantage of

    locality of reference. Think about sharded schema design.