Schema Design - Jared Rosoff, 10gen

Slide 1

Slide 1 text

Jared Rosoff MongoDB Schema Design

Slide 2

Slide 2 text

[email protected] @forjared

Slide 3

Slide 3 text

Topics Introduction • Working with documents • Evolving a schema • Queries and indexes • Rich documents

Slide 4

Slide 4 text

Topics Introduction • Working with documents • Evolving a schema • Queries and indexes • Rich documents Common patterns • Single table inheritance • One-to-Many & Many-to-Many • Trees • Queues

Slide 5

Slide 5 text

Ways to model data: http://www.ﬂickr.com/photos/42304632@N00/493639870/

Slide 6

Slide 6 text

Normalized

Slide 7

Slide 7 text

Denormalized

Slide 8

Slide 8 text

Terminology RDBMS MongoDB Table Collection Row(s) JSON Document Index Index Join Embedding & Linking

Slide 9

Slide 9 text

Schema-design Criteria How can we manipulate this data? • Dynamic Queries • Secondary Indexes • Atomic Updates • Map Reduce

Slide 10

Slide 10 text

Schema-design Criteria How can we manipulate this data? • Dynamic Queries • Secondary Indexes • Atomic Updates • Map Reduce Access Patterns? • Read / Write Ratio • Types of updates • Types of queries • Data life-cycle

Slide 11

Slide 11 text

Schema-design Criteria How can we manipulate this data? • Dynamic Queries • Secondary Indexes • Atomic Updates • Map Reduce Considerations • No Joins • Document writes are atomic Access Patterns? • Read / Write Ratio • Types of updates • Types of queries • Data life-cycle

Slide 12

Slide 12 text

Here’s an example with a book:

Slide 13

Slide 13 text

A simple start: Map the documents to your application. book = {author: “Hergé”, date: new Date(), text: “Destination Moon”, tags: [“comic”, “adventure”]} > db.books.save(book)

Slide 14

Slide 14 text

> db.books.find() { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Hergé", date: "Sat Jul 24 2010 19:47:11 GMT-‐0700 (PDT)", text: "Destination Moon", tags: [ "comic", "adventure" ] } Notes: • _id must be unique, but can be anything you’d like • Default BSON ObjectId if one is not supplied Find the document

Slide 15

Slide 15 text

Secondary index on “author” > db.books.ensureIndex({author: 1}) > db.books.find({author: 'Hergé'}) { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), date: "Sat Jul 24 2010 19:47:11 GMT-‐0700 (PDT)", author: "Hergé", ... } Add an index, ﬁnd via index

Slide 16

Slide 16 text

Examine the query plan > db.books.find({author: 'Hergé'}).explain() { "cursor" : "BtreeCursor author_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 5, "indexBounds" : { "author" : [ [ "Hergé", "Hergé" ] ] } }

Slide 17

Slide 17 text

Multi-key indexes // Build an index on the ‘tags’ array > db.books.ensureIndex({tags: 1})

Slide 18

Slide 18 text

Multi-key indexes // Build an index on the ‘tags’ array > db.books.ensureIndex({tags: 1}) // find posts with a specific tag // (This will use an index!) > db.books.find({tags: ‘comic’})

Slide 19

Slide 19 text

Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne Update operators: $set, $inc, $push, $pop, $pull, $pushAll, $pullAll

Slide 20

Slide 20 text

Extending the Schema new_comment = {author: “Kyle”, date: new Date(), text: “great book”, votes: 5} > db.books.update( {text: “Destination Moon” }, { ‘$push’: {comments: new_comment}, ‘$inc’: {comments_count: 1}})

Slide 21

Slide 21 text

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date : "Sat Jul 24 2010 19:47:11 GMT-‐0700 (PDT)", text : "Destination Moon", tags : [ "comic", "adventure" ], comments : [ { author : "Kyle", date : "Sat Jul 24 2010 20:51:03 GMT-‐0700 (PDT)", text : "great book", votes : 5 } ], comments_count: 1 } Extending the Schema

Slide 22

Slide 22 text

// create index on nested documents: > db.books.ensureIndex({"comments.author": 1}) > db.books.find({comments.author:”Kyle”}) The ‘dot’ operator

Slide 23

Slide 23 text

// create index on nested documents: > db.books.ensureIndex({"comments.author": 1}) > db.books.find({comments.author:”Kyle”}) // create index comment votes: > db.books.ensureIndex({comments.votes: 1}) // find all books with any comments with more than // 50 votes > db.books.ensureIndex({comments.votes: {$gt: 50}}) The ‘dot’ operator

Slide 24

Slide 24 text

Rich Documents

Slide 25

Slide 25 text

Rich Documents

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

Rich Documents

Slide 28

Slide 28 text

Rich Documents { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), line_items : [ { sku: ‘tt-‐123’, name: ‘Tintin a la Lune’ }, { sku: ‘tt-‐457’, name: ‘Coke en stock’ } ], address : { name: ‘Banker’, street: ‘111 Main’, zip: 10010 }, payment: { cc: 4567, exp: Date(2011, 7, 7) }, subtotal: 2355 }

Slide 29

Slide 29 text

Inheritance

Slide 30

Slide 30 text

Single Table Inheritance - RDBMS shapes table id type area radius d length width 1 circle 3.14 1 2 square 4 2 3 rect 10 5 2

Slide 31

Slide 31 text

Slide 32

Slide 32 text

Single Table Inheritance - MongoDB > db.shapes.find() { _id: "1", type: "circle",area: 3.14, radius: 1} { _id: "2", type: "square",area: 4, d: 2} { _id: "3", type: "rect", area: 10, length: 5, width: 2} // find shapes where radius > 0 > db.shapes.find({radius: {$gt: 0}})

Slide 33

Slide 33 text

Slide 34

Slide 34 text

One to Many One to Many relationships can specify

Slide 35

Slide 35 text

One to Many - Embedded Array / Array Keys - $slice operator to return subset of array - some queries hard e.g ﬁnd latest comments across all documents

Slide 36

Slide 36 text

One to Many - Embedded Array / Array Keys - $slice operator to return subset of array - some queries hard e.g ﬁnd latest comments across all documents - Embedded tree - Single document - Natural - Hard to query

Slide 37

Slide 37 text

Slide 38

Slide 38 text

One to Many - patterns - Embedded Array / Array Keys - Embedded Array / Array Keys - Embedded tree - Normalized

Slide 39

Slide 39 text

Slide 40

Slide 40 text

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date : "Sat Jul 24 2010 19:47:11 GMT-‐0700 (PDT)", text : "Destination Moon", tags : [ "comic", "adventure" ] } { book_id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Kyle", date : "Sat Jul 24 2010 20:51:03 GMT-‐0700 (PDT)", text : "great book", votes : 5 }

Slide 41

Slide 41 text

Referencing vs. Embedding - Embed when the ‘many’ objects always appear with their parent. - Reference when you need more ﬂexibility.

Slide 42

Slide 42 text

Many - Many Example: - Product can be in many categories - Category can have many products

Slide 43

Slide 43 text

products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Destination Moon", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} Many - Many

Slide 44

Slide 44 text

products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Destination Moon", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "adventure", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]} Many - Many

Slide 45

Slide 45 text

Slide 46

Slide 46 text

Slide 47

Slide 47 text

products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Destination Moon", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "adventure"} // All products for a given category > db.products.find({category_ids: ObjectId ("4c4ca25433fb5941681b912f")}) Alternative

Slide 48

Slide 48 text

products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Destination Moon", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "adventure"} // All products for a given category > db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")}) // All categories for a given product product = db.products.find(_id : some_id) > db.categories.find({_id : {$in : product.category_ids}}) Alternative

Slide 49

Slide 49 text

Trees Full Tree in Document { comments: [ { author: “Kyle”, text: “...”, replies: [ {author: “Fred”, text: “...”, replies: []} ]} ] } Pros: Single Document, Performance, Intuitive Cons: Hard to search, Partial Results, 16MB limit

Slide 50

Slide 50 text

Trees Parent Links - Each node is stored as a document - Contains the id of the parent Child Links - Each node contains the id’s of the children - Can support graphs (multiple parents / child)

Slide 51

Slide 51 text

Slide 52

Slide 52 text

Array of Ancestors - Store all Ancestors of a node { _id: "a" } { _id: "b", ancestors: [ "a" ], parent: "a" } { _id: "c", ancestors: [ "a", "b" ], parent: "b" } { _id: "d", ancestors: [ "a", "b" ], parent: "b" } { _id: "e", ancestors: [ "a" ], parent: "a" } { _id: "f", ancestors: [ "a", "e" ], parent: "e" } //find all descendants of b: > db.tree2.find({ancestors: ‘b’}) //find all direct descendants of b: > db.tree2.find({parent: ‘b’})

Slide 53

Slide 53 text

Slide 54

Slide 54 text

Queue Requirements • See jobs waiting, jobs in progress • Ensure that each job is started once and only once { inprogress: false, priority: 1, message: “Rich documents FTW!” ... }

Slide 55

Slide 55 text

Queue Requirements • See jobs waiting, jobs in progress • Ensure that each job is started once and only once { inprogress: false, priority: 1, message: “Rich documents FTW!” ... } // find highest priority job and mark as in-‐progress job = db.jobs.findAndModify({ query: {inprogress: false}, sort: {priority: -‐1), update: {$set: {inprogress: true, started: new Date()}}})

Slide 56

Slide 56 text

Summary Schema design is different in MongoDB Basic data design principals stay the same Focus on how the apps manipulates data Rapidly evolve schema to meet your requirements Enjoy your new freedom, use it wisely :-)

Slide 57

Slide 57 text

@mongodb conferences, appearances, and meetups http://www.10gen.com/events http://bit.ly/mongo> Facebook | Twitter | LinkedIn http://linkd.in/joinmongo download at mongodb.org We’re Hiring ! [email protected]