Slide 1

Slide 1 text

Schema Design by Example Using MongoDB

Slide 2

Slide 2 text

Kyle Banker [email protected]

Slide 3

Slide 3 text

@hwaet

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Three App Designs

Slide 6

Slide 6 text

What we'll cover: Schema Design Principles Hierarchies Ticketing Feed reader

Slide 7

Slide 7 text

Schema Design Principles

Slide 8

Slide 8 text

Documents Documents can be flat and relational, like rows. Or they can be richly structured...

Slide 9

Slide 9 text

Strategy Start with a normalized model. Use a rich, 'embedded' document for simplicity and / or performance.

Slide 10

Slide 10 text

1. Hierarchies

Slide 11

Slide 11 text

Two Strategies Store a map of the hierarchy in a single document. Store a list of ancestors (what we'll do).

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

A sample node: { _id: ObjectId("6a5b1476238d3b4dd5000048"), parent_id: ObjectId("9a9fb1476238d3b4dd5000001"), name: "Hard bop", description: "Pioneered by ...", slug: "hard-bop", ancestors: [ {name: "Bop", _id: ObjectId("8b87fb1476238d3b4dd5000003"), slug: "bop" }, {name: "Ragtime", _id: ObjectId("9a9fb1476238d3b4dd5000001"), slug: "ragtime"} ] }

Slide 17

Slide 17 text

Fetch the 'hard-bop' node with parent links db.categories.find( {slug: 'hard-bop'} );

Slide 18

Slide 18 text

Display all descendents of 'Bop'... db.categories.find( {'ancestors.slug': 'bop' } )

Slide 19

Slide 19 text

You'll want this index: db.categories.ensureIndex( {'ancestors.slug': 1 } )

Slide 20

Slide 20 text

Now, how do we update the structure?

Slide 21

Slide 21 text

First: a helper method.

Slide 22

Slide 22 text

Build an ancestor list, and update the category. function build_ancestors( _id, parent_id ) { ancestor_list = [] while( parent = db.cat.findOne( {_id: parent_id}, { name: 1, slug: 1, _id: 1 } ) { ancestor_list.unshift( parent ); parent_id = parent.parent_id } db.cat.update({_id: _id}, {"$set" {ancestors: ancestor_list}} ); }

Slide 23

Slide 23 text

Now, let's add a new category: Swing.

Slide 24

Slide 24 text

Now, let's add a new category: Swing.

Slide 25

Slide 25 text

Create a document for 'swing'... category = { parent_id: ragtime_id, slug: "swing", name: "Swing", description: "It don't mean a thing if it ain't got that..." };

Slide 26

Slide 26 text

Insert and then build ancestors. swing_id = db.cat.insert(category); build_ancestors(swing_id, ragtime_id);

Slide 27

Slide 27 text

Next, let's place 'Bop' under 'Swing'.

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

First, update the parent id... db.cat.update({_id: bop_id}, {$set: {parent_id: swing_id}});

Slide 31

Slide 31 text

Then, update the ancestor list for all descendents of 'Bop'. query = {'ancestors.id': bop_id}; db.cat.find(query).forEach(function(category) { build_ancestors(category['_id'], category['parent_id']); }

Slide 32

Slide 32 text

Finally, rename 'Bop'...

Slide 33

Slide 33 text

...to 'Bebop'.

Slide 34

Slide 34 text

Use the $ (positional) operator.

Slide 35

Slide 35 text

Update the main 'Bop' document... doc = db.cat.findOne({_id: bop_id}); doc['name'] = 'Bebop'; doc = db.cat.update({_id: bop_id}, doc);

Slide 36

Slide 36 text

...then mult-update in all ancestors with ($) db.cat.update({'ancestors.id': bop_id}, {'$set': {'ancestors.$': doc}}, false, true)

Slide 37

Slide 37 text

Summary Use arrays and multi-key indexes. Update embedded objects with the positional operator.

Slide 38

Slide 38 text

2. Ticketing

Slide 39

Slide 39 text

But don't you need transactions?

Slide 40

Slide 40 text

Transactions without Transactions

Slide 41

Slide 41 text

Bank accounts Transferring money is not a transaction. Distributed systems, long-running transactions, and contentious environments may require a different transactional strategy (Jim Gray. "The Transaction Concept: Virtues and Limitations").

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

Two collections

Slide 45

Slide 45 text

One document for the seating map. // Orchestra section map. { _id: ObjectId("8b87fb1476238d3b4dd5000003"), event_id: 133, section: 'ORCH', seats: { 'A1': 1 'A2': 1 'A3': 0 'A4': 0 'A5': 0 // 'A5' - 'J20' } }

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

2. One document for each seat. { _id: { event_id: 133, seat: 'A1' }, order_id: null, updated: new Date(), state: 'AVAILABLE' }

Slide 48

Slide 48 text

The protocol: User requests seats A3, A4, and A5. We read-modify-write the state of each document. Then we update the seating map.

Slide 49

Slide 49 text

// Transition state, and roll back on failure. [ 'A3', 'A4', 'A5' ].forEach(function(n) { q = { _id: { event_id: 133, seat: n } }; u = { $set: { order_id: 333, state: 'CART', updated: new Date() } }; if( ! db.seats.findAndModify({ query: q, update: u }) rollback(); }

Slide 50

Slide 50 text

// If successful, modify seat map. db.seat_maps.update({ event_id: 133 }, {$set: { 'seats.A2': 1, 'seats.A3': 1, 'seats.A4': 1 });

Slide 51

Slide 51 text

Intermediate state (Compensation mechanism)

Slide 52

Slide 52 text

Cleanup protocol: Periodic daemon. Runs every minute. Revert if 'CART' or 'PENDING_AUTH' and older than 10 minutes. Set to 'AVAILABLE'.

Slide 53

Slide 53 text

Checkout // Transition state, and roll back on failure. [ 'A3', 'A4', 'A5' ].forEach(function(n) { q = { _id: { event_id: 133, seat: n } }; u = { $set: { order_id: 333, state: 'PENDING_AUTH', updated: new Date() } }; if( ! db.seats.findAndModify({ query: q, update: u }) rollback(); } db.orders.update()

Slide 54

Slide 54 text

Authorize // Authorize credit card. // If failure, rollback() tickets to 'CART' status. // If successful, transition tickets to 'PURCHASED'. // Add tickets to order object. db.orders.update( _id: 333, { $pushAll: { line_items: [ 'A3', 'A4', 'A5' ] });

Slide 55

Slide 55 text

Summary Use atomic writes and read-modify write semantics. Define a transactional procedure using compensation mechanisms. If you're using MongoDB, and you need basic transactional semantics, you have some tools.

Slide 56

Slide 56 text

3. Feed reader

Slide 57

Slide 57 text

Four collections Users Feeds Entries Buckets

Slide 58

Slide 58 text

User document (stores subscribed feeds) { _id: ObjectId("4e316f236ce9cca7ef17d59d"), username: 'kbanker', feeds: [ { _id: ObjectId("4e316f236ce9cca7ef17d54c"), name: "GigaOM" }, { _id: ObjectId("4e316f236ce9cca7ef17d54d"), name: "Salon.com" } { _id: ObjectId("4e316f236ce9cca7ef17d54e"), name: "Google News" } ], latest: Date(2011, 7, 25) }

Slide 59

Slide 59 text

Unique index on 'username' db.users.ensureIndex( { username: 1 }, { unique: true } )

Slide 60

Slide 60 text

Feeds { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"), url: 'http://news.google.com/feed.xml/', name: 'Google News', subscriber_count: 2, latest: Date(2011, 7, 25) }

Slide 61

Slide 61 text

Unique index on URL db.feeds.ensureIndex( { url: 1 }, { unique: true } )

Slide 62

Slide 62 text

Adding a feed with an upsert db.feeds.update( { url: 'http://news.google.com/feed.xml/', name: 'Google News'}, { $inc: {subscriber_count: 1} }, true )

Slide 63

Slide 63 text

Subscribing a user to the feed db.users.update( {_id: ObjectId("4e316f236ce9cca7ef17d59d") }, { $addToSet: { feeds: { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"), name: 'Google News' } } )

Slide 64

Slide 64 text

Unsubscribing from a feed db.users.update( { _id: ObjectId("4e316f236ce9cca7ef17d59d") }, { $pull: { feeds: { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"), name: 'Google News' } } )

Slide 65

Slide 65 text

Removing a feed subscription db.feeds.update( {url: 'http://news.google.com/feed.xml/'}, { $inc: {subscriber_count: -1} } )

Slide 66

Slide 66 text

Feed entries (populate in background) { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"), feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"), title: 'Lorem Ipsum...', body: 'Vestibulum eget dui metus...', reads: 5, date: Date(2011, 7, 27) }

Slide 67

Slide 67 text

What we need now Populate personal feeds (buckets) Avoid lots of expensive queries Record what's been read

Slide 68

Slide 68 text

We'll use the bucketing pattern

Slide 69

Slide 69 text

By default, a less-then-optimal query. db.entries.find( feed_id: {$in: user_feed_ids} ).sort({date: -1}).limit(25)

Slide 70

Slide 70 text

But with bucketing, we can be smarter. entries = db.entries.find( { date: {$gt: user_latest }, feed_id: { $in: user_feed_ids } ).sort({date: -1})

Slide 71

Slide 71 text

A compound index on 'date', 'feed_id'. db.entries.ensureIndex( { date: 1, feed_id: 1} )

Slide 72

Slide 72 text

bucket = { _id: ObjectId("4e3185c26ce9cca7ef17d552"), user_id: ObjectId("4e316f236ce9cca7ef17d59d"), date: Date( 2011, 7, 27 ) n: 100, entries: [ { _id: ObjectId("4e316b8a6ce9cca7ef17d5ac"), feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"), title: 'Lorem Ipsum...', body: 'Vestibulum eget dui metus...', date: Date(2011, 7, 27), read: false }, { _id: ObjectId("4e316b8a6ce9cca7ef17d5c8"), feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"), title: 'Panda bear waves hello', body: 'A panda bear at the local zoo...', date: Date(2011, 7, 27), read: false } ] }

Slide 73

Slide 73 text

The bucket is a strategic denormalization 1 query + 1 seek (if not in memory) = document-level locality

Slide 74

Slide 74 text

Insert the bucket and set the latest entry db.buckets.insert( bucket ) db.users.update( { _id: ObjectId("4e316f236ce9cca7ef17d59d") } { $set: { date: latest_entry_date } } )

Slide 75

Slide 75 text

Viewing a personal feed db.buckets.find( { user_id: ObjectId("4e316f236ce9cca7ef17d59d") } ).sort({date: -1}).limit(1)

Slide 76

Slide 76 text

Retrieving the next set of results... db.buckets.find( { user_id: ObjectId("4e316f236ce9cca7ef17d59d"), date: { $lt: previous_reader_date } } ).sort({date: -1}).limit(1)

Slide 77

Slide 77 text

Include this index, of course. db.buckets.ensureIndex( { user_id: 1, date: -1 } )

Slide 78

Slide 78 text

Mark a personal feed item as 'read'. db.buckets.update( { user_id: ObjectId("4e316f236ce9cca7ef17d59d"), 'entries.id': ObjectId("4e316b8a6ce9cca7ef17d5c8")}, { $set: { 'entries.$.read' : true } )

Slide 79

Slide 79 text

Increment the read count on the entry... db.entries.update( { _id: ObjectId("4e316b8a6ce9cca7ef17d5c8") }, { $inc: { read: 1 } } )

Slide 80

Slide 80 text

A note on sharding. The buckets collection is eminently shardable Shard key: { user_id: 1, date: 1 }

Slide 81

Slide 81 text

Summary Use rich document structures for caching. Take advantage of locality of reference. Think about sharded schema design.

Slide 82

Slide 82 text

Thank you