Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Schema Design By Example - Kyle Banker, 10gen

mongodb
November 01, 2011

Schema Design By Example - Kyle Banker, 10gen

MongoChicago 2011

MongoDB has been designed for versatility, but the techniques you might use to build, say, an analytics engine or a hierarchical data store might...... not be obvious. In this talk, we'll learn about MongoDB in practice by looking at four hypothetical application designs (based on real-world designs, of course). Topics to be covered include schema design, indexing, transactions (gasp!), trees, what's fast, and what's not. Sprinkled with tips, tricks, shoots, ladders, and trap doors, you're guaranteed to learn something new in this interdisciplinary talk.

mongodb

November 01, 2011
Tweet

More Decks by mongodb

Other Decks in Technology

Transcript

  1. A sample node: { _id: ObjectId("6a5b1476238d3b4dd5000048"), parent_id: ObjectId("9a9fb1476238d3b4dd5000001"), name: "Hard

    bop", description: "Pioneered by ...", slug: "hard-bop", ancestors: [ {name: "Bop", _id: ObjectId("8b87fb1476238d3b4dd5000003"), :slug => "bop" }, {name: "Ragtime", _id: ObjectId("9a9fb1476238d3b4dd5000001"), slug: "swing"} ] }
  2. Reading and displaying a node: // Display the 'Hard Bop'

    category // with links to its parents. db.categories.find( {slug: 'hard-bop'} );
  3. You'll want this index: // Display all descendents of "Bop"

    db.categories.ensureIndex( {'ancestors.slug': 1 } )
  4. // Build ancestor list for a single category, // and

    then update the category. function build_ancestors( _id, parent_id) { ancestor_list = [] while( parent = db.cat.findOne( {_id: parent_id}, { name: 1, slug: 1, _id: 1 } ) { ancestor_list.unshift( parent ); parent_id = parent.parent_id } db.cat.update({_id: _id}, {"$set" {ancestors: ancestor_list}} ); }
  5. category = { parent_id: ragtime_id, slug: "swing", name: "Swing", description:

    "It don't mean a thing if it ain't got that..." }; swing_id = db.cat.insert(category); build_ancestors(swing_id, ragtime_id);
  6. // Rebuild the ancestor list for // all descendents of

    'Bop' query = {'ancestors.id': bop_id}; db.cat.find(query).forEach(function(category) { build_ancestors(category['_id'], category['parent_id']); }
  7. // Update the main 'Bop' document. doc = db.cat.findOne({_id: bop_id});

    doc['name'] = 'Bebop'; doc = db.cat.update({_id: bop_id}, doc); // Multi-update with positional operator ($) db.cat.update({'ancestors.id': bop_id}, {'$set' => {'ancestors.$': doc}}, false, true)
  8. Bank accounts Transferring money is not a transaction. Distributed systems,

    long-running transactions, and contentious environments may require a different transactional strategy (Jim Gray. "The Transaction Concept: Virtues and Limitations").
  9. One document for the seating map. // Orchestra section map.

    { _id: ObjectId("8b87fb1476238d3b4dd5000003"), event_id: 133, section: 'ORCH', seats: { 'A1': 1 'A2': 1 'A3': 0 'A4': 0 'A5': 0 // 'A5' - 'J20' } }
  10. 2. One document for each seat. { _id: { event_id:

    133, seat: 'A1' }, order_id: null, updated: new Date(), state: 'AVAILABLE' }
  11. The protocol: User requests seats A3, A4, and A5. We

    read-modify-write the state of each document. Then we update the seating map.
  12. // Transition state, and roll back on failure. [ 'A3',

    'A4', 'A5' ].forEach(function(n) { q = { _id: { event_id: 133, seat: n } }; u = { $set: { order_id: 333, state: 'CART', updated: new Date() } }; if( ! db.seats.findAndModify({ query: q, update: u }) rollback(); }
  13. // If successful, modify seat map. db.seat_maps.update({ event_id: 133 },

    {$set: { 'seats.A2': 1, 'seats.A3': 1, 'seats.A4': 1 });
  14. Cleanup protocol: Periodic daemon. Runs every minute. Revert if 'CART'

    or 'PENDING_AUTH' and older than 10 minutes. Set to 'AVAILABLE'.
  15. Checkout // Transition state, and roll back on failure. [

    'A3', 'A4', 'A5' ].forEach(function(n) { q = { _id: { event_id: 133, seat: n } }; u = { $set: { order_id: 333, state: 'PENDING_AUTH', updated: new Date() } }; if( ! db.seats.findAndModify({ query: q, update: u }) rollback(); } db.orders.update()
  16. Authorize // Authorize credit card. // If failure, rollback() tickets

    to 'CART' status. // If successful, transition tickets to 'PURCHASED'. // Add tickets to order object. db.orders.update( _id: 333, { $pushAll: { line_items: [ 'A3', 'A4', 'A5' ] });
  17. Summary Use atomic writes and read-modify write semantics. Define a

    transactional procedure using compensation mechanisms. Not right for everything. But if you're using MongoDB, and you need basic transactional semantics, you have some tools.
  18. Users { _id: ObjectId("4e316f236ce9cca7ef17d59d"), username: 'kbanker', feeds: [ { _id:

    ObjectId("4e316f236ce9cca7ef17d54c"), name: "GigaOM" }, { _id: ObjectId("4e316f236ce9cca7ef17d54d"), name: "Salon.com" } { _id: ObjectId("4e316f236ce9cca7ef17d54e"), name: "Foo News" } ], latest: Date(2011, 7, 25) }
  19. Feeds { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"), url: 'http://news.foo.com/feed.xml/', name: 'Foo News', subscriber_count:

    2, latest: Date(2011, 7, 25) } Index smaller db.feeds.ensureIndex( { url: 1 }, { unique: true } )
  20. Adding a feed subscription // Add this feed to user

    feed list db.users.update( {_id: ObjectId("4e316f236ce9cca7ef17d59d") }, { $addToSet: { feeds: { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"), name: 'Foo News' } } )
  21. Removing a feed subscription db.users.update( { _id: ObjectId("4e316f236ce9cca7ef17d59d") }, {

    $pull: { feeds: { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"), name: 'Foo News' } } )
  22. Entries (populate in background) { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"), feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"), title:

    'Important person to resign', body: 'A person deemed very important has decided...', reads: 5, date: Date(2011, 7, 27) }
  23. What we need now Populate personal feeds (buckets) Avoid lots

    of expensive queries Record what's been read
  24. Without bucketing // Naive query runs every time db.entries.find( feed_id:

    {$in: user_feed_ids} ).sort({date: 1}).limit(25)
  25. With bucketing // A bit smarter: only runs once entries

    = db.entries.find( { date: {$gt: user_latest }, feed_id: { $in: user_feed_ids } ).sort({date: 1})
  26. bucket = { _id: ObjectId("4e3185c26ce9cca7ef17d552"), user_id: ObjectId("4e316f236ce9cca7ef17d59d"), date: Date( 2011,

    7, 27 ) n: 100, entries: [ { _id: ObjectId("4e316b8a6ce9cca7ef17d5ac"), feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"), title: 'Important person to resign', body: 'A person deemed very important has decided...', date: Date(2011, 7, 27), read: false }, { _id: ObjectId("4e316b8a6ce9cca7ef17d5c8"), feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"), title: 'Panda bear waves hello', body: 'A panda bear at the local zoo...', date: Date(2011, 7, 27), read: false } ] }
  27. Viewing a personal feed // Next newest (that's how we

    paginate) db.buckets.find( { user_id: ObjectId("4e316f236ce9cca7ef17d59d"), date: { $lt: previous_reader_date } } ).sort({date: -1}).limit(1)
  28. Summary Use rich document structures for caching. Take advantage of

    locality of reference. Think about sharded schema design.