Slide 1

Slide 1 text

Jared Rosoff MongoDB Schema Design

Slide 2

Slide 2 text

[email protected] @forjared

Slide 3

Slide 3 text

Topics Introduction • Working with documents • Evolving a schema • Queries and indexes • Rich documents

Slide 4

Slide 4 text

Topics Introduction • Working with documents • Evolving a schema • Queries and indexes • Rich documents Common patterns • Single table inheritance • One-to-Many & Many-to-Many • Trees • Queues

Slide 5

Slide 5 text

Ways to model data: http://www.flickr.com/photos/42304632@N00/493639870/

Slide 6

Slide 6 text

Normalized

Slide 7

Slide 7 text

Denormalized

Slide 8

Slide 8 text

Terminology RDBMS MongoDB Table Collection Row(s) JSON  Document Index Index Join Embedding  &  Linking

Slide 9

Slide 9 text

Schema-design Criteria How can we manipulate this data? • Dynamic Queries • Secondary Indexes • Atomic Updates • Map Reduce

Slide 10

Slide 10 text

Schema-design Criteria How can we manipulate this data? • Dynamic Queries • Secondary Indexes • Atomic Updates • Map Reduce Access Patterns? • Read / Write Ratio • Types of updates • Types of queries • Data life-cycle

Slide 11

Slide 11 text

Schema-design Criteria How can we manipulate this data? • Dynamic Queries • Secondary Indexes • Atomic Updates • Map Reduce Considerations • No Joins • Document writes are atomic Access Patterns? • Read / Write Ratio • Types of updates • Types of queries • Data life-cycle

Slide 12

Slide 12 text

Here’s an example with a book:

Slide 13

Slide 13 text

A simple start: Map the documents to your application. book  =  {author:  “Hergé”,                date:  new  Date(),                text:  “Destination  Moon”,                tags:  [“comic”,  “adventure”]} >  db.books.save(book)

Slide 14

Slide 14 text

>  db.books.find()    {  _id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),        author:  "Hergé",          date:  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text:  "Destination  Moon",          tags:  [  "comic",  "adventure"  ]    }     Notes: • _id must be unique, but can be anything you’d like • Default BSON ObjectId if one is not supplied Find the document

Slide 15

Slide 15 text

Secondary index on “author”  >  db.books.ensureIndex({author:  1})  >  db.books.find({author:  'Hergé'})          {  _id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),          date:  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          author:  "Hergé",            ...  } Add an index, find via index

Slide 16

Slide 16 text

Examine the query plan >  db.books.find({author:  'Hergé'}).explain() {   "cursor"  :  "BtreeCursor  author_1",   "nscanned"  :  1,   "nscannedObjects"  :  1,   "n"  :  1,   "millis"  :  5,   "indexBounds"  :  {     "author"  :  [       [         "Hergé",         "Hergé"       ]     ]   } }

Slide 17

Slide 17 text

Multi-key indexes //  Build  an  index  on  the  ‘tags’  array >  db.books.ensureIndex({tags:  1})

Slide 18

Slide 18 text

Multi-key indexes //  Build  an  index  on  the  ‘tags’  array >  db.books.ensureIndex({tags:  1}) //  find  posts  with  a  specific  tag //  (This  will  use  an  index!) >  db.books.find({tags:  ‘comic’})

Slide 19

Slide 19 text

Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne Update operators: $set, $inc, $push, $pop, $pull, $pushAll, $pullAll

Slide 20

Slide 20 text

Extending the Schema        new_comment  =  {author:  “Kyle”,                                  date:  new  Date(),                                text:  “great  book”,                                votes:  5}  >  db.books.update(                      {text:  “Destination  Moon”  },                        {  ‘$push’:  {comments:  new_comment},                          ‘$inc’:    {comments_count:  1}})

Slide 21

Slide 21 text

     {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),          author  :  "Hergé",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Destination  Moon",        tags  :  [  "comic",  "adventure"  ],                comments  :  [   {     author  :  "Kyle",     date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",     text  :  "great  book",                  votes  :  5   }        ],        comments_count:  1    } Extending the Schema

Slide 22

Slide 22 text

//  create  index  on  nested  documents: >  db.books.ensureIndex({"comments.author":  1}) >  db.books.find({comments.author:”Kyle”}) The ‘dot’ operator

Slide 23

Slide 23 text

//  create  index  on  nested  documents: >  db.books.ensureIndex({"comments.author":  1}) >  db.books.find({comments.author:”Kyle”}) //  create  index  comment  votes: >  db.books.ensureIndex({comments.votes:  1}) //  find  all  books  with  any  comments  with  more  than //  50  votes >  db.books.ensureIndex({comments.votes:  {$gt:  50}}) The ‘dot’ operator

Slide 24

Slide 24 text

Rich Documents

Slide 25

Slide 25 text

Rich Documents

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

Rich Documents

Slide 28

Slide 28 text

Rich Documents {      _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),                  line_items  :  [  {  sku:  ‘tt-­‐123’,                                          name:  ‘Tintin  a  la  Lune’  },                                      {  sku:  ‘tt-­‐457’,                                          name:  ‘Coke  en  stock’  }  ],                address  :  {  name:  ‘Banker’,                                street:  ‘111  Main’,                                zip:  10010  },                  payment:  {  cc:  4567,                              exp:  Date(2011,  7,  7)  },        subtotal:  2355 }

Slide 29

Slide 29 text

Inheritance

Slide 30

Slide 30 text

Single Table Inheritance - RDBMS shapes table id type area radius d length width 1 circle 3.14 1 2 square 4 2 3 rect 10 5 2

Slide 31

Slide 31 text

Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",  type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2}

Slide 32

Slide 32 text

Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",  type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2} //  find  shapes  where  radius  >  0   >  db.shapes.find({radius:  {$gt:  0}})

Slide 33

Slide 33 text

Single Table Inheritance - MongoDB >  db.shapes.find()  {  _id:  "1",  type:  "circle",area:  3.14,  radius:  1}  {  _id:  "2",  type:  "square",area:  4,  d:  2}  {  _id:  "3",  type:  "rect",    area:  10,  length:  5,  width:  2} //  find  shapes  where  radius  >  0   >  db.shapes.find({radius:  {$gt:  0}}) //  create  sparse  index >  db.shapes.ensureIndex({radius:  1},  {sparse:  true})

Slide 34

Slide 34 text

One to Many One to Many relationships can specify

Slide 35

Slide 35 text

One to Many - Embedded Array / Array Keys - $slice operator to return subset of array - some queries hard e.g find latest comments across all documents

Slide 36

Slide 36 text

One to Many - Embedded Array / Array Keys - $slice operator to return subset of array - some queries hard e.g find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query

Slide 37

Slide 37 text

One to Many - Embedded Array / Array Keys - $slice operator to return subset of array - some queries hard e.g find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query - Normalized (2 collections) - most flexible - more queries

Slide 38

Slide 38 text

One to Many - patterns - Embedded Array / Array Keys - Embedded Array / Array Keys - Embedded tree - Normalized

Slide 39

Slide 39 text

     {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),          author  :  "Hergé",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Destination  Moon",        tags  :  [  "comic",  "adventure"  ],                comments  :  [      {     author  :  "Kyle",     date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",     text  :  "great  book",                  votes  :  5      }        ],        comments_count:  1    }

Slide 40

Slide 40 text

     {  _id  :  ObjectId("4c4ba5c0672c685e5e8aabf3"),          author  :  "Hergé",        date  :  "Sat  Jul  24  2010  19:47:11  GMT-­‐0700  (PDT)",          text  :  "Destination  Moon",        tags  :  [  "comic",  "adventure"  ]    }        {   book_id:  ObjectId("4c4ba5c0672c685e5e8aabf3"),        author  :  "Kyle",   date  :  "Sat  Jul  24  2010  20:51:03  GMT-­‐0700  (PDT)",   text  :  "great  book",          votes  :  5    }    

Slide 41

Slide 41 text

Referencing vs. Embedding - Embed when the ‘many’ objects always appear with their parent. - Reference when you need more flexibility.

Slide 42

Slide 42 text

Many - Many Example: - Product can be in many categories - Category can have many products

Slide 43

Slide 43 text

products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),          name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     Many - Many

Slide 44

Slide 44 text

products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),          name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure",            product_ids:  [  ObjectId("4c4ca23933fb5941681b912e"),                                        ObjectId("4c4ca30433fb5941681b9130"),                                        ObjectId("4c4ca30433fb5941681b913a"]} Many - Many

Slide 45

Slide 45 text

products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),          name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure",            product_ids:  [  ObjectId("4c4ca23933fb5941681b912e"),                                        ObjectId("4c4ca30433fb5941681b9130"),                                        ObjectId("4c4ca30433fb5941681b913a"]} //  Multi-­‐key  index  on  array  fields >  db.products.ensureIndex({category_ids:  1}); >  db.categories.ensureIndex({product_ids:  1}); Many - Many

Slide 46

Slide 46 text

products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),          name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure"} Better alternative

Slide 47

Slide 47 text

products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),          name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure"} //  All  products  for  a  given  category >  db.products.find({category_ids:  ObjectId ("4c4ca25433fb5941681b912f")})     Alternative

Slide 48

Slide 48 text

products:      {  _id:  ObjectId("4c4ca23933fb5941681b912e"),          name:  "Destination  Moon",          category_ids:  [  ObjectId("4c4ca25433fb5941681b912f"),                                          ObjectId("4c4ca25433fb5941681b92af”]}     categories:      {  _id:  ObjectId("4c4ca25433fb5941681b912f"),            name:  "adventure"} //  All  products  for  a  given  category >  db.products.find({category_ids:          ObjectId("4c4ca25433fb5941681b912f")})   //  All  categories  for  a  given  product product    =  db.products.find(_id  :  some_id) >  db.categories.find({_id  :  {$in  :  product.category_ids}})   Alternative

Slide 49

Slide 49 text

Trees Full Tree in Document {  comments:  [          {  author:  “Kyle”,  text:  “...”,                replies:  [                                            {author:  “Fred”,  text:  “...”,                                              replies:  []}                ]}    ] } Pros: Single Document, Performance, Intuitive Cons: Hard to search, Partial Results, 16MB limit    

Slide 50

Slide 50 text

Trees Parent Links - Each node is stored as a document - Contains the id of the parent Child Links - Each node contains the id’s of the children - Can support graphs (multiple parents / child)

Slide 51

Slide 51 text

Array of Ancestors - Store all Ancestors of a node    {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  }

Slide 52

Slide 52 text

Array of Ancestors - Store all Ancestors of a node    {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  } //find  all  descendants  of  b: >  db.tree2.find({ancestors:  ‘b’}) //find  all  direct  descendants  of  b: >  db.tree2.find({parent:  ‘b’})

Slide 53

Slide 53 text

Array of Ancestors - Store all Ancestors of a node    {  _id:  "a"  }    {  _id:  "b",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "c",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "d",  ancestors:  [  "a",  "b"  ],  parent:  "b"  }    {  _id:  "e",  ancestors:  [  "a"  ],  parent:  "a"  }    {  _id:  "f",  ancestors:  [  "a",  "e"  ],  parent:  "e"  } //find  all  descendants  of  b: >  db.tree2.find({ancestors:  ‘b’}) //find  all  direct  descendants  of  b: >  db.tree2.find({parent:  ‘b’}) //find  all  ancestors  of  f: >  ancestors  =  db.tree2.findOne({_id:’f’}).ancestors >  db.tree2.find({_id:  {  $in  :  ancestors})

Slide 54

Slide 54 text

Queue Requirements • See jobs waiting, jobs in progress • Ensure that each job is started once and only once      {  inprogress:  false,          priority:  1,          message:  “Rich  documents  FTW!”      ...      }

Slide 55

Slide 55 text

Queue Requirements • See jobs waiting, jobs in progress • Ensure that each job is started once and only once      {  inprogress:  false,          priority:  1,          message:  “Rich  documents  FTW!”      ...      } //  find  highest  priority  job  and  mark  as  in-­‐progress job  =  db.jobs.findAndModify({                              query:    {inprogress:  false},                              sort:      {priority:  -­‐1),                                update:  {$set:  {inprogress:  true,                                                                started:  new  Date()}}})    

Slide 56

Slide 56 text

Summary Schema design is different in MongoDB Basic data design principals stay the same Focus on how the apps manipulates data Rapidly evolve schema to meet your requirements Enjoy your new freedom, use it wisely :-)

Slide 57

Slide 57 text

@mongodb conferences,  appearances,  and  meetups http://www.10gen.com/events http://bit.ly/mongo>   Facebook                    |                  Twitter                  |                  LinkedIn http://linkd.in/joinmongo download at mongodb.org We’re Hiring ! [email protected]