Slide 1

Slide 1 text

1 1 Online Conference: Deep Dive with MongoDB

Slide 2

Slide 2 text

2 2 Building your first App with MongoDB

Slide 3

Slide 3 text

3 3 •  Quick introduction to MongoDB •  Data modeling in MongoDB: queries, geospatial, updates, and map/reduce •  Using a location-based app as an example •  Examples run in the MongoDB JS shell

Slide 4

Slide 4 text

4 4

Slide 5

Slide 5 text

5 5 MongoDB is a scalable, high-performance, open source, document-oriented database. •  Fast Querying •  In-place updates •  Full Index Support •  Replication /High Availability •  Auto-Sharding •  Aggregation; Map/Reduce •  GridFS

Slide 6

Slide 6 text

6 6 MongoDB is Implemented in C++ •  Windows, Linux, Mac OS-X, Solaris Drivers are available in many languages 10gen supported •  C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala, nodejs •  Multiple community supported drivers

Slide 7

Slide 7 text

7 7
RDBMS     | MongoDB
Table     | Collection
Row(s)    | JSON Document
Index     | Index
Partition | Shard
Join      | Embedding/Linking
Schema    | (implied Schema)

Slide 8

Slide 8 text

8 8 { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Asya", date : ISODate("2012-02-02T11:52:27.442Z"), text : "About MongoDB...", tags : [ "tech", "databases" ], comments : [{ author : "Fred", date : ISODate("2012-02-03T17:22:21.124Z"), text : "Best Post Ever!" }], comment_count : 1 }

Slide 9

Slide 9 text

9 9 • JSON has a powerful but limited set of datatypes –  Mongo extends datatypes with Date, Int types, Id, … • MongoDB stores data in BSON • BSON is a binary representation of JSON –  Optimized for performance and navigational abilities –  Also compression See: bsonspec.org

Slide 10

Slide 10 text

10 10 •  Intrinsic support for fast, iterative development •  Super low latency access to your data •  Very little CPU overhead •  No additional caching layer required •  Built in replication and horizontal scaling support

Slide 11

Slide 11 text

11 11 • Want to build an app where users can check in to a location • Leave notes or comments about that location

Slide 12

Slide 12 text

12 12 "As a user I want to be able to find other locations nearby" •  Need to store locations (Offices, Restaurants, etc) –  name, address, tags –  coordinates –  User generated content e.g. tips / notes

Slide 13

Slide 13 text

13 13 "As a user I want to be able to 'checkin' to a location" Checkins – User should be able to 'check in' to a location – Want to be able to generate statistics: •  Recent checkins •  Popular locations

Slide 14

Slide 14 text

14 14 users user1, user2 loc1, loc2, loc3 locations checkins checkin1, checkin2

Slide 15

Slide 15 text

15 15 > location_1 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012 }

Slide 16

Slide 16 text

16 16 > location_1 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012 } > db.locations.find({name: "Lotus Flower"})

Slide 17

Slide 17 text

17 17 > location_1 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012 } > db.locations.ensureIndex({name: 1}) > db.locations.find({name: "Lotus Flower"})

Slide 18

Slide 18 text

18 18 > location_2 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"] }

Slide 19

Slide 19 text

19 19 > location_2 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"] } > db.locations.ensureIndex({tags: 1})

Slide 20

Slide 20 text

20 20 > location_2 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"] } > db.locations.ensureIndex({tags: 1}) > db.locations.find({tags: "dumplings"})

Slide 21

Slide 21 text

21 21 > location_3 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387] }

Slide 22

Slide 22 text

22 22 > location_3 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387] } > db.locations.ensureIndex({lat_long: "2d"})

Slide 23

Slide 23 text

23 23 > location_3 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387] } > db.locations.ensureIndex({lat_long: "2d"}) > db.locations.find({lat_long: {$near:[52.53, 13.4]}})
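As a rough illustration of what a `$near` query returns, the sketch below orders documents by flat (planar) distance from a point. It is plain JavaScript over an in-memory array, not the MongoDB driver; the `near` and `flatDistance` helpers and the two extra locations are hypothetical and exist only for this example.

```javascript
// Hypothetical in-memory stand-in for the locations collection.
const locations = [
  { name: "Lotus Flower", lat_long: [52.5184, 13.387] },
  { name: "Far Away Cafe", lat_long: [40.0, -74.0] },    // hypothetical document
  { name: "Corner Bakery", lat_long: [52.531, 13.401] }  // hypothetical document
];

// Planar distance between two coordinate pairs (no earth curvature).
function flatDistance(a, b) {
  return Math.hypot(a[0] - b[0], a[1] - b[1]);
}

// Return a copy of docs ordered nearest-first; limit is an arbitrary default.
function near(docs, field, point, limit = 100) {
  return docs
    .slice()
    .sort((d1, d2) => flatDistance(d1[field], point) - flatDistance(d2[field], point))
    .slice(0, limit);
}

const nearest = near(locations, "lat_long", [52.53, 13.4]);
console.log(nearest.map(d => d.name));
```

The real query does this ordering inside the database using the "2d" index, so the application never has to load and sort every location.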

Slide 24

Slide 24 text

24 24 // creating your indexes: > db.locations.ensureIndex({tags: 1}) > db.locations.ensureIndex({name: 1}) > db.locations.ensureIndex({lat_long: "2d"}) // finding places: > db.locations.find({lat_long: {$near:[52.53, 13.4]}}) // with regular expressions: > db.locations.find({name: /^Din/}) // by tag: > db.locations.find({tags: "dumplings"})

Slide 25

Slide 25 text

25 25 Atomic operators: $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit

Slide 26

Slide 26 text

26 26 // initial data load: > db.locations.insert(location_3) // adding a tip with update: > db.locations.update( {name: "Lotus Flower"}, {$push: { tips: { user: "Asya", date: "28/03/2012", tip: "The hairy crab dumplings are awesome!"} }})

Slide 27

Slide 27 text

27 27 > db.locations.findOne() { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387], tips:[{ user: "Asya", date: "28/03/2012", tip: "The hairy crab dumplings are awesome!" }] }

Slide 28

Slide 28 text

28 28 "As a user I want to be able to 'checkin' to a location" Checkins – User should be able to 'check in' to a location – Want to be able to generate statistics: •  Recent checkins •  Popular locations

Slide 29

Slide 29 text

29 29 > user_1 = { _id: "[email protected]", name: "Asya", twitter: "asya999", checkins: [ {location: "Lotus Flower", ts: "28/03/2012"}, {location: "Meridian Hotel", ts: "27/03/2012"} ] } > db.users.ensureIndex({"checkins.location": 1}) > db.users.find({"checkins.location": "Lotus Flower"})

Slide 30

Slide 30 text

30 30 // find all users who've checked in here: > db.users.find({"checkins.location":"Lotus Flower"})

Slide 31

Slide 31 text

31 31 // find all users who've checked in here: > db.users.find({"checkins.location":"Lotus Flower"}) // find the last 10 checkins here? > db.users.find({"checkins.location":"Lotus Flower"}) .sort({"checkins.ts": -1}).limit(10)

Slide 32

Slide 32 text

32 32 // find all users who've checked in here: > db.users.find({"checkins.location":"Lotus Flower"}) // find the last 10 checkins here: - Warning! > db.users.find({"checkins.location":"Lotus Flower"}) .sort({"checkins.ts": -1}).limit(10) Hard to query for last 10
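To see why "last 10 checkins" is awkward with embedded checkins, note that the sort orders whole user documents, not individual checkins. A minimal in-memory sketch, assuming numeric timestamps and hypothetical user ids, shows the extra flattening step the application would have to do itself:

```javascript
// Hypothetical users with embedded checkins; ts is a plain number for brevity.
const users = [
  { _id: "u1", checkins: [
    { location: "Lotus Flower", ts: 5 },
    { location: "Meridian Hotel", ts: 9 }
  ] },
  { _id: "u2", checkins: [
    { location: "Lotus Flower", ts: 7 },
    { location: "Lotus Flower", ts: 2 }
  ] }
];

// Flatten every embedded checkin for one location, then sort and cut.
function lastCheckins(docs, location, n) {
  return docs
    .flatMap(u => u.checkins
      .filter(c => c.location === location)
      .map(c => ({ user: u._id, ts: c.ts })))
    .sort((a, b) => b.ts - a.ts)
    .slice(0, n);
}

const recent = lastCheckins(users, "Lotus Flower", 2);
console.log(recent);
```

Storing checkins in their own collection, as the following slides do, lets the database perform this sort and limit directly.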

Slide 33

Slide 33 text

33 33 > user_2 = { _id: "[email protected]", name: "Asya", twitter: "asya999", } > checkin_1 = { location: location_id, user: user_id, ts: "20/03/2010" } > db.checkins.ensureIndex({user: 1}) > db.checkins.find({user: user_id})

Slide 34

Slide 34 text

34 34 // find all users who've checked in here: > location_id = db.locations.findOne({"name":"Lotus Flower"})._id > u_ids = db.checkins.find({location: location_id}, {_id: 0, user: 1}).map(function(c) { return c.user; }) > users = db.users.find({_id: {$in: u_ids}}) // find the last 10 checkins here: > db.checkins.find({location: location_id}) .sort({ts: -1}).limit(10) // count how many checked in today: > db.checkins.find({location: location_id, ts: {$gt: midnight}} ).count()
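The three-step lookup above is an application-level join. A minimal sketch of the same steps over in-memory arrays follows; the ids, documents, and the `usersWhoCheckedIn` helper are hypothetical, and the arrays stand in for the `locations`, `checkins`, and `users` collections.

```javascript
// Hypothetical in-memory data standing in for the three collections.
const locations = [{ _id: "loc1", name: "Lotus Flower" }];
const users = [{ _id: "u1", name: "Asya" }, { _id: "u2", name: "Fred" }];
const checkins = [
  { location: "loc1", user: "u1", ts: 3 },
  { location: "loc2", user: "u2", ts: 2 },
  { location: "loc1", user: "u1", ts: 1 }
];

function usersWhoCheckedIn(locationName) {
  // Step 1: resolve the location name to its id.
  const loc = locations.find(l => l.name === locationName);
  if (!loc) return [];
  // Step 2: collect the distinct user ids that checked in there.
  const userIds = new Set(
    checkins.filter(c => c.location === loc._id).map(c => c.user)
  );
  // Step 3: fetch the matching users (the $in query).
  return users.filter(u => userIds.has(u._id));
}

const visitors = usersWhoCheckedIn("Lotus Flower");
console.log(visitors.map(u => u.name));
```

Each step corresponds to one round trip to the database, which is the cost of normalizing checkins into their own collection.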

Slide 35

Slide 35 text

35 35 // Find most popular locations > agg = db.checkins.aggregate( {$match: {ts: {$gt: now_minus_3_hrs}}}, {$group: {_id: "$location", numEntries: {$sum: 1}}} ) > agg.result [{"_id": "Lotus Flower", "numEntries" : 17}]

Slide 36

Slide 36 text

36 36 // Find most popular locations > map_func = function() { emit(this.location, 1); } > reduce_func = function(key, values) { return Array.sum(values); } > db.checkins.mapReduce(map_func, reduce_func, {query: {ts: {$gt: now_minus_3_hrs}}, out: "result"}) > db.result.findOne() {"_id": "Lotus Flower", "value" : 17}
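For readers new to map/reduce, here is a small in-memory imitation of the counting job above. It is only a sketch: the `mapReduce` helper is hypothetical, `emit` is passed in as an argument rather than being a global as in the shell, and the sample checkins are made up.

```javascript
// Hypothetical checkins; only the location field matters here.
const checkins = [
  { location: "Lotus Flower", user: "u1" },
  { location: "Meridian Hotel", user: "u1" },
  { location: "Lotus Flower", user: "u2" },
  { location: "Lotus Flower", user: "u3" }
];

// Group emitted values by key, then reduce each group to one value.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc, emit); // "this" is the document
  return Array.from(groups, ([key, values]) => ({
    _id: key,
    // A key with a single value is passed through without calling reduce.
    value: values.length === 1 ? values[0] : reduceFn(key, values)
  }));
}

const mapFunc = function (emit) { emit(this.location, 1); };
const reduceFunc = (key, values) => values.reduce((a, b) => a + b, 0);

const popular = mapReduce(checkins, mapFunc, reduceFunc)
  .sort((a, b) => b.value - a.value);
console.log(popular);
```

The output has the same `{_id, value}` shape as the documents in the `result` collection on the slide.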

Slide 37

Slide 37 text

37 37 Deployment

Slide 38

Slide 38 text

38 38 P •  Single server - need a strong backup plan

Slide 39

Slide 39 text

39 39 •  Single server - need a strong backup plan •  Replica sets - High availability - Automatic failover P P S S

Slide 40

Slide 40 text

40 40 •  Single server - need a strong backup plan •  Replica sets - High availability - Automatic failover •  Sharded - Horizontally scale - Auto balancing P S S P S S P P S S

Slide 41

Slide 41 text

41 41 User Data Management, High Volume Data Feeds, Content Management, Operational Intelligence, E-Commerce

Slide 42

Slide 42 text

42 42

Slide 43

Slide 43 text

43 43 @mongodb conferences, appearances, and meetups http://www.10gen.com/events http://bit.ly/mongofb Facebook | Twitter | LinkedIn http://linkd.in/joinmongo download at mongodb.org support, training, and this talk brought to you by

Slide 44

Slide 44 text

44 44 Schema Design with MongoDB

Slide 45

Slide 45 text

Schema Design with MongoDB Antoine Girbal [email protected] @antoinegirbal

Slide 46

Slide 46 text

So why model data? http://www.flickr.com/photos/42304632@N00/493639870/  

Slide 47

Slide 47 text

Goals Avoid anomalies when inserting, updating or deleting Minimize redesign when extending the schema Avoid bias toward a particular query Make use of all SQL features In MongoDB Similar goals apply but rules are different Denormalization for optimization is an option: most features still exist, contrary to BLOBS Normalization

Slide 48

Slide 48 text

Terminology
RDBMS         | MongoDB
Table         | Collection
Row(s)        | JSON Document
Index         | Index
Join          | Embedding & Linking
Partition     | Shard
Partition Key | Shard Key

Slide 49

Slide 49 text

Equivalent to a Table in SQL Cheap to create (max 24000) Collections don’t have a fixed schema Common for documents in a collection to share a schema Document schema can evolve Consider using multiple related collections tied together by a naming convention: e.g. LogData-2011-02-08 Collections Basics

Slide 50

Slide 50 text

Elements are name/value pairs, equivalent to column value in SQL elements can be nested Rich data types for values JSON for the human eye BSON for all internals 16MB maximum size (many books..) What you see is what is stored Document basics

Slide 51

Slide 51 text

Schema Design - Relational

Slide 52

Slide 52 text

Schema Design - MongoDB

Slide 53

Slide 53 text

Schema Design - MongoDB embedding  

Slide 54

Slide 54 text

Schema Design - MongoDB embedding   linking  

Slide 55

Slide 55 text

Design documents that simply map to your application > post = { author: "Hergé", date: ISODate("2011-09-18T09:56:06.298Z"), text: "Destination Moon", tags: ["comic", "adventure"] } > db.blogs.save(post) Design Session

Slide 56

Slide 56 text

> db.blogs.find() { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Hergé", date: ISODate("2011-09-18T09:56:06.298Z"), text: "Destination Moon", tags: [ "comic", "adventure" ] } Notes: •  ID must be unique, but can be anything you’d like •  MongoDB will generate a default ID if one is not supplied Find the document

Slide 57

Slide 57 text

Secondary index for “author” // 1 means ascending, -1 means descending > db.blogs.ensureIndex( { author: 1 } ) > db.blogs.find( { author: 'Hergé' } ) { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), date: ISODate("2011-09-18T09:56:06.298Z"), author: "Hergé", ... } Add an index, find via index

Slide 58

Slide 58 text

> db.blogs.find( { author: "Hergé" } ).explain() { "cursor" : "BtreeCursor author_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 5, "indexBounds" : { "author" : [ [ "Hergé", "Hergé" ] ] } } Examine the query plan

Slide 60

Slide 60 text

Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte, ... // find posts with any tags > db.blogs.find( { tags: { $exists: true } } ) Query operators

Slide 61

Slide 61 text

Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte, ... // find posts with any tags > db.blogs.find( { tags: { $exists: true } } ) Regular expressions: // posts where author starts with h > db.blogs.find( { author: /^h/i } ) Query operators

Slide 62

Slide 62 text

Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte, ... // find posts with any tags > db.blogs.find( { tags: { $exists: true } } ) Regular expressions: // posts where author starts with h > db.blogs.find( { author: /^h/i } ) Counting: // number of posts written by Hergé > db.blogs.find( { author: "Hergé" } ).count() Query operators

Slide 63

Slide 63 text

> new_comment = { author: "Kyle", date: new Date(), text: "great book" } > db.blogs.update( { text: "Destination Moon" }, { "$push": { comments: new_comment }, "$inc": { comments_count: 1 } } ) Extending the Schema

Slide 64

Slide 64 text

> db.blogs.find( { author: "Hergé"} ) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date : ISODate("2011-09-18T09:56:06.298Z"), text : "Destination Moon", tags : [ "comic", "adventure" ], comments : [ { author : "Kyle", date : ISODate("2011-09-19T09:56:06.298Z"), text : "great book" } ], comments_count: 1 } Extending the Schema
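The effect of the `$push` and `$inc` operators can be imitated on a plain object. This is a simplified sketch, not MongoDB's implementation: the `applyUpdate` helper is hypothetical, handles top-level fields only, and returns a modified copy, whereas the database updates the stored document in place.

```javascript
// Apply a subset of update operators ($push, $inc) to a copy of doc.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // deep copy, JSON values only
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (!Array.isArray(result[field])) result[field] = []; // create if missing
    result[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // missing counts as 0
  }
  return result;
}

const post = { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Kyle", text: "great book" };

const updated = applyUpdate(post, {
  $push: { comments: newComment },
  $inc: { comments_count: 1 }
});
console.log(updated);
```

Note that neither `comments` nor `comments_count` existed before the update; both fields appear on first use, which is what "extending the schema" means here.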

Slide 65

Slide 65 text

// create index on nested documents: > db.blogs.ensureIndex( { "comments.author": 1 } ) > db.blogs.find( { "comments.author": "Kyle" } ) Extending the Schema

Slide 66

Slide 66 text

// create index on nested documents: > db.blogs.ensureIndex( { "comments.author": 1 } ) > db.blogs.find( { "comments.author": "Kyle" } ) // find last 5 posts: > db.blogs.find().sort( { date: -1 } ).limit(5) Extending the Schema

Slide 67

Slide 67 text

// create index on nested documents: > db.blogs.ensureIndex( { "comments.author": 1 } ) > db.blogs.find( { "comments.author": "Kyle" } ) // find last 5 posts: > db.blogs.find().sort( { date: -1 } ).limit(5) // most commented post: > db.blogs.find().sort( { comments_count: -1 } ).limit(1) When sorting, check if you need an index Extending the Schema

Slide 68

Slide 68 text

Patterns: •  Inheritance •  one to one •  one to many •  many to many Common Patterns

Slide 69

Slide 69 text

Inheritance

Slide 70

Slide 70 text

Single Table Inheritance - MongoDB
shapes table
id | type   | area | radius | length | width
1  | circle | 3.14 | 1      |        |
2  | square | 4    |        | 2      |
3  | rect   | 10   |        | 5      | 2

Slide 71

Slide 71 text

> db.shapes.find() { _id: "1", type: "c", area: 3.14, radius: 1} { _id: "2", type: "s", area: 4, length: 2} { _id: "3", type: "r", area: 10, length: 5, width: 2} Single Table Inheritance - MongoDB missing  values  not   stored!  

Slide 72

Slide 72 text

> db.shapes.find() { _id: "1", type: "c", area: 3.14, radius: 1} { _id: "2", type: "s", area: 4, length: 2} { _id: "3", type: "r", area: 10, length: 5, width: 2} // find shapes where radius > 0 > db.shapes.find( { radius: { $gt: 0 } } ) Single Table Inheritance - MongoDB

Slide 73

Slide 73 text

> db.shapes.find() { _id: "1", type: "c", area: 3.14, radius: 1} { _id: "2", type: "s", area: 4, length: 2} { _id: "3", type: "r", area: 10, length: 5, width: 2} // find shapes where radius > 0 > db.shapes.find( { radius: { $gt: 0 } } ) // create index > db.shapes.ensureIndex( { radius: 1 }, { sparse:true } ) Single Table Inheritance - MongoDB index  only  values   present!  

Slide 74

Slide 74 text

One to Many Either:     • Embedded  Array  /  Document:   •  improves  read  speed   •  simplifies  schema     • Normalize:   •  if  list  grows  significantly   •  if  sub  items  are  updated  often   •  if  sub  items  are  more  than  1  level  deep  and  need  updating  

Slide 75

Slide 75 text

One to Many Embedded Array: • $slice operator to return subset of comments • some queries become harder (e.g. find latest comments across all blogs) blogs: { author : "Hergé", date : ISODate("2011-09-18T09:56:06.298Z"), comments : [ { author : "Kyle", date : ISODate("2011-09-19T09:56:06.298Z"), text : "great book" } ] }

Slide 76

Slide 76 text

One to Many Normalized (2 collections) • most flexible • more queries blogs: { _id: 1000, author: "Hergé", date: ISODate("2011-09-18T09:56:06.298Z") } comments : { _id : 1, blogId: 1000, author : "Kyle", date : ISODate("2011-09-19T09:56:06.298Z") } > blog = db.blogs.findOne( { text: "Destination Moon" } ); > db.comments.ensureIndex( { blogId: 1 } ) // important! > db.comments.find( { blogId: blog._id } );

Slide 77

Slide 77 text

Example: •  Product can be in many categories •  Category can have many products Many - Many

Slide 78

Slide 78 text

// Each product lists the IDs of the categories products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] } Many - Many

Slide 79

Slide 79 text

// Each product lists the IDs of the categories products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] } // Each category lists the IDs of the products categories: { _id: 20, name: "adventure", product_ids: [ 10, 11, 12 ] } categories: { _id: 30, name: "movie", product_ids: [ 10 ] } Many - Many

Slide 80

Slide 80 text

// Each product lists the IDs of the categories products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] } // Each category lists the IDs of the products categories: { _id: 20, name: "adventure", product_ids: [ 10, 11, 12 ] } categories: { _id: 30, name: "movie", product_ids: [ 10 ] } Cuts mapping table and 2 indexes, but: •  potential consistency issue •  lists can grow too large Many - Many

Slide 81

Slide 81 text

// Each product lists the IDs of the categories products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] } // Association not stored on the categories categories: { _id: 20, name: "adventure" } Alternative

Slide 82

Slide 82 text

// Each product lists the IDs of the categories products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] } // Association not stored on the categories categories: { _id: 20, name: "adventure" } // All products for a given category > db.products.ensureIndex( { category_ids: 1 } ) // yes! > db.products.find( { category_ids: 20 } ) Alternative
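With the association stored only on the product, both directions of the many-to-many relationship can still be answered. The sketch below works on in-memory arrays; the extra products, the "movie" category id, and both helper functions are hypothetical. Matching a single value against an array field mirrors how `{ category_ids: 20 }` matches any document whose array contains 20.

```javascript
// Hypothetical in-memory stand-ins for the two collections.
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Hypothetical Product A", category_ids: [20] },
  { _id: 12, name: "Hypothetical Product B", category_ids: [40] }
];
const categories = [
  { _id: 20, name: "adventure" },
  { _id: 30, name: "movie" }
];

// All products for a given category: match a value inside the array field.
function productsInCategory(categoryId) {
  return products.filter(p => p.category_ids.includes(categoryId));
}

// All categories for a given product: the equivalent of an $in query.
function categoriesOfProduct(product) {
  return categories.filter(c => product.category_ids.includes(c._id));
}

const adventureIds = productsInCategory(20).map(p => p._id);
const moonCategories = categoriesOfProduct(products[0]).map(c => c.name);
console.log(adventureIds, moonCategories);
```

Because the link lives in one place, there is no second list to keep consistent when a product changes category.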

Slide 83

Slide 83 text

Use cases: •  Trees •  Time Series Common Use Cases

Slide 84

Slide 84 text

Hierarchical information           Trees

Slide 85

Slide 85 text

Full Tree in Document { retweet: [ { who: "Kyle", text: "...", retweet: [ { who: "James", text: "...", retweet: [] } ] } ] } Pros: Single Document, Performance, Intuitive Cons: Hard to search or update, document can easily get too large Trees

Slide 86

Slide 86 text

// Store all Ancestors of a node { _id: "a" } { _id: "b", tree: [ "a" ], retweet: "a" } { _id: "c", tree: [ "a", "b" ], retweet: "b" } { _id: "d", tree: [ "a", "b" ], retweet: "b" } { _id: "e", tree: [ "a" ], retweet: "a" } { _id: "f", tree: [ "a", "e" ], retweet: "e" } // find all direct retweets of "b" > db.tweets.find( { retweet: "b" } ) Array of Ancestors A   B   C   D   E   F  

Slide 87

Slide 87 text

// Store all Ancestors of a node { _id: "a" } { _id: "b", tree: [ "a" ], retweet: "a" } { _id: "c", tree: [ "a", "b" ], retweet: "b" } { _id: "d", tree: [ "a", "b" ], retweet: "b" } { _id: "e", tree: [ "a" ], retweet: "a" } { _id: "f", tree: [ "a", "e" ], retweet: "e" } // find all direct retweets of "b" > db.tweets.find( { retweet: "b" } ) // find all retweets of "e" anywhere in tree > db.tweets.find( { tree: "e" } ) Array of Ancestors A   B   C   D   E   F  

Slide 88

Slide 88 text

// Store all Ancestors of a node { _id: "a" } { _id: "b", tree: [ "a" ], retweet: "a" } { _id: "c", tree: [ "a", "b" ], retweet: "b" } { _id: "d", tree: [ "a", "b" ], retweet: "b" } { _id: "e", tree: [ "a" ], retweet: "a" } { _id: "f", tree: [ "a", "e" ], retweet: "e" } // find all direct retweets of "b" > db.tweets.find( { retweet: "b" } ) // find all retweets of "e" anywhere in tree > db.tweets.find( { tree: "e" } ) // find tweet history of f: > tweets = db.tweets.findOne( { _id: "f" } ).tree > db.tweets.find( { _id: { $in : tweets } } ) Array of Ancestors A   B   C   D   E   F  
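The three ancestor-array queries can be checked against the example tree with plain array operations. This is an in-memory sketch using the documents from the slide; the helper names are hypothetical.

```javascript
// The tree from the slide: each node stores all of its ancestors.
const tweets = [
  { _id: "a" },
  { _id: "b", tree: ["a"], retweet: "a" },
  { _id: "c", tree: ["a", "b"], retweet: "b" },
  { _id: "d", tree: ["a", "b"], retweet: "b" },
  { _id: "e", tree: ["a"], retweet: "a" },
  { _id: "f", tree: ["a", "e"], retweet: "e" }
];

// Direct retweets: the parent pointer equals the given id.
const directRetweets = id => tweets.filter(t => t.retweet === id);

// Retweets anywhere below a node: the ancestor array contains the id.
const allRetweets = id => tweets.filter(t => (t.tree || []).includes(id));

// History of a tweet: fetch every ancestor listed on the node itself.
function history(id) {
  const node = tweets.find(t => t._id === id);
  const ancestors = (node && node.tree) || [];
  return tweets.filter(t => ancestors.includes(t._id));
}

console.log(directRetweets("b").map(t => t._id)); // children of b
console.log(allRetweets("e").map(t => t._id));    // subtree under e
console.log(history("f").map(t => t._id));        // ancestors of f
```

Every question is answered with at most two lookups, at the price of rewriting the `tree` array of each descendant if a node is ever moved.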

Slide 89

Slide 89 text

Store hierarchy as a path expression •  Separate each node by a delimiter, e.g. "," •  Use text search to find parts of a tree •  Search must be left-rooted and use an index { retweets: [ { _id: "a", text: "initial tweet", path: "a" }, { _id: "b", text: "retweet with comment", path: "a,b" }, { _id: "c", text: "reply to retweet", path : "a,b,c"} ] } // Find the conversations "a" started > db.tweets.find( { path: /^a/ } ) // Find the conversations under a branch > db.tweets.find( { path: /^a,b/ } ) Trees as Paths A B C D E F
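One detail worth illustrating: a bare prefix such as `/^a/` also matches paths whose first node merely starts with "a" (for example a node named "ab"). A hedged sketch of a delimiter-aware pattern follows; `subtreePattern` and `escapeRegex` are hypothetical helpers, the sample paths are made up, and whether a given MongoDB version still uses the index efficiently for such a pattern should be confirmed with `explain()`.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Left-rooted pattern that matches the node itself or anything below it,
// requiring the prefix to end at a delimiter or at the end of the path.
function subtreePattern(path) {
  return new RegExp("^" + escapeRegex(path) + "(,|$)");
}

// Hypothetical stored paths, including a node id "ab" that shares a prefix.
const paths = ["a", "a,b", "a,b,c", "ab", "ab,x", "z,a"];

const naive = paths.filter(p => /^a/.test(p));
const underA = paths.filter(p => subtreePattern("a").test(p));
const underAB = paths.filter(p => subtreePattern("a,b").test(p));

console.log(naive);   // includes "ab" and "ab,x"
console.log(underA);  // only the tree rooted at "a"
console.log(underAB); // only the branch a,b
```

If node ids are guaranteed never to share prefixes, the simpler patterns on the slide are sufficient.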

Slide 90

Slide 90 text

•  Records stats by •  Day, Hour, Minute •  Show time series Time Series

Slide 91

Slide 91 text

// Time series buckets, hour and minute sub-docs { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z"), daily: 67, hourly: { 0: 3, 1: 14, 2: 19 ... 23: 72 }, minute: { 0: 0, 1: 4, 2: 6 ... 1439: 0 } } // Add one to the last minute before midnight > db.votes.update( { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z") }, { $inc: { "hourly.23": 1, "minute.1439": 1 } } ) Time Series

Slide 92

Slide 92 text

•  Sequence of key/value pairs •  NOT a hash map •  Optimized to scan quickly BSON Storage ... 0 1 2 3 1439 What is the cost of updating the minute before midnight?

Slide 93

Slide 93 text

•  Can skip sub-documents BSON Storage ...   0   1   59   1439   How could this change the schema? 0   ...   23   ...   1380  
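A back-of-the-envelope model makes the difference concrete. Assume, as a simplification, that locating a key means stepping over every key stored before it at the same level, and that a whole sub-document can be skipped in one step. The function names below are hypothetical and the counts are only as good as that assumption.

```javascript
// Flat layout: 1440 minute keys in one sub-document.
// Reaching minute m means visiting keys 0..m.
function flatCost(minuteOfDay) {
  return minuteOfDay + 1;
}

// Nested layout: 24 hour sub-documents with 60 minute keys each.
// Earlier hours are skipped whole, then minutes are scanned within the hour.
function nestedCost(minuteOfDay) {
  const hour = Math.floor(minuteOfDay / 60);
  const minute = minuteOfDay % 60;
  return (hour + 1) + (minute + 1);
}

const lastMinute = 23 * 60 + 59; // 1439, the minute before midnight
console.log(flatCost(lastMinute), nestedCost(lastMinute));
```

Under this model the worst case drops from 1440 key visits to 24 + 60 = 84, which is the motivation for nesting minutes under hours.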

Slide 94

Slide 94 text

Use more of a Tree structure by nesting! // Time series buckets, each hour a sub-document { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z"), daily: 67, minute: { 0: { 0: 0, 1: 7, ... 59: 2 }, ... 23: { 0: 15, ... 59: 6 } } } // Add one to the last minute before midnight > db.votes.update( { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z") }, { $inc: { "minute.23.59": 1 } } ) Time Series
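In an application the dotted field name would normally be computed from the event time rather than typed by hand. A minimal sketch, assuming UTC bucketing and a hypothetical `voteUpdate` helper, builds the `$inc` document for the nested layout:

```javascript
// Build the update document for one vote at the given time (UTC assumed).
function voteUpdate(date) {
  const hour = date.getUTCHours();     // 0..23, selects the hour sub-document
  const minute = date.getUTCMinutes(); // 0..59, selects the key inside it
  const increments = {};
  increments["minute." + hour + "." + minute] = 1;
  return { $inc: increments };
}

const update = voteUpdate(new Date("2011-12-09T23:59:30.000Z"));
console.log(JSON.stringify(update));
```

The resulting object can be passed as the second argument of `db.votes.update(...)` in place of the literal on the slide.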

Slide 95

Slide 95 text

Document to represent a shopping order: { _id: 1234, ts: ISODate("2011-12-09T00:00:00.000Z"), customerId: 67, total_price: 1050, items: [ { sku: 123, quantity: 2, price: 50, name: "macbook", thumbnail: "macbook.png" }, { sku: 234, quantity: 1, price: 20, name: "iphone", thumbnail: "iphone.png" }, ... ] } The item information is duplicated in every order that references it. Mongo’s flexible schema makes it easy! Duplicate data

Slide 96

Slide 96 text

Pros: •  only 1 query to get all information needed to display the order •  processing on the db is as fast as a BLOB •  can achieve much higher performance Cons: •  more storage used ... cheap enough •  updates are much more complicated ... just consider fields immutable Duplicate data

Slide 97

Slide 97 text

Basic data design principles stay the same ... But MongoDB is more flexible and brings possibilities •  embed or duplicate data to speed up operations, cut down the number of collections and indexes •  watch for documents growing too large •  make sure to use the proper indexes for querying and sorting •  schema should feel natural to your application! Summary

Slide 98

Slide 98 text

@mongodb   conferences,  appearances,  and  meetups   http://www.10gen.com/events   http://bit.ly/mongo_     Facebook                    |                  Twitter                  |                  LinkedIn   http://linkd.in/joinmongo   download at mongodb.org

Slide 99

Slide 99 text

99 99 Replication and Replica Sets

Slide 100

Slide 100 text

100 100 Why Have Replication?

Slide 101

Slide 101 text

101 101 •  High Availability (auto-failover) •  Read Scaling (extra copies to read from) •  Backups –  Online, Delayed Copy (fat finger) –  Point in Time (PiT) backups •  Use (hidden) replica for secondary workload –  Analytics –  Data-processing –  Integration with external systems

Slide 102

Slide 102 text

102 102 Planned –  Hardware upgrade –  O/S or file-system tuning –  Relocation of data to new file-system / storage –  Software upgrade Unplanned –  Hardware failure –  Data center failure –  Region outage –  Human error –  Application corruption

Slide 103

Slide 103 text

103 103 •  A cluster of N servers •  All writes to primary •  Reads can be to primary (default) or a secondary •  Any (one) node can be primary •  Consensus election of primary •  Automatic failover •  Automatic recovery

Slide 104

Slide 104 text

104 104 •  Replica Set is made up of 2 or more nodes Member 1 Member 2 Member 3

Slide 105

Slide 105 text

105 105 •  Election establishes the PRIMARY •  Data replication from PRIMARY to SECONDARY Member 1 Member 2 Primary Member 3

Slide 106

Slide 106 text

106 106 •  PRIMARY may fail •  Automatic election of new PRIMARY if majority exists Member 1 Member 2 DOWN Member 3 negotiate new master

Slide 107

Slide 107 text

107 107 Member 1 Member 2 DOWN Member 3 Primary negotiate new master •  New PRIMARY elected •  Replica Set re-established

Slide 108

Slide 108 text

108 108 •  Automatic recovery Member 1 Member 3 Primary Member 2 Recovering

Slide 109

Slide 109 text

109 109 •  Replica Set re-established Member 1 Member 3 Primary Member 2

Slide 110

Slide 110 text

110 110 Understanding automatic failover

Slide 111

Slide 111 text

111 111 Primary Secondary Secondary As long as a partition can see a majority (>50%) of the cluster, then it will elect a primary.
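The majority rule is a one-line check. The sketch below is only an illustration of the arithmetic, with a hypothetical function name; it is not how the election protocol itself is implemented.

```javascript
// A partition can elect a primary only if it sees a strict majority
// (more than 50%) of the configured members.
function canElectPrimary(visibleMembers, totalMembers) {
  return visibleMembers > totalMembers / 2;
}

console.log(canElectPrimary(2, 3)); // 66% visible
console.log(canElectPrimary(1, 3)); // 33% visible
console.log(canElectPrimary(2, 4)); // exactly 50% visible
console.log(canElectPrimary(3, 5)); // 60% visible
```

Exactly half is not enough, which is why an even split leaves both sides read-only and why odd member counts are usually preferred.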

Slide 112

Slide 112 text

112 112 Primary Failed Node Secondary 66% of cluster visible. Primary is elected

Slide 113

Slide 113 text

113 113 Failed Node 33% of cluster visible. Read only mode. Failed Node Secondary

Slide 114

Slide 114 text

114 114 Primary Secondary Secondary

Slide 115

Slide 115 text

115 115 Primary Secondary Secondary Primary Failed Node Secondary 66% of cluster visible Primary is elected

Slide 116

Slide 116 text

116 116 Secondary 33% of cluster visible Read only mode. Primary Secondary Failed Node Failed Node Secondary

Slide 117

Slide 117 text

117 117 Primary Secondary Secondary Secondary

Slide 118

Slide 118 text

118 118 Primary Secondary Secondary Secondary Failed Node Secondary Failed Node 50% of cluster visible Read only mode. Secondary

Slide 119

Slide 119 text

119 119 Primary Secondary Failed Node Secondary Failed Node 50% of cluster visible Read only mode. Secondary Secondary Secondary

Slide 120

Slide 120 text

120 120 Avoid single points of failure

Slide 121

Slide 121 text

121 121

Slide 122

Slide 122 text

122 122 Primary Secondary Secondary Top of rack switch Rack falls over

Slide 123

Slide 123 text

123 123 Primary Secondary Secondary Loss of internet Building burns down

Slide 124

Slide 124 text

124 124 Primary Secondary Secondary San Francisco Dallas

Slide 125

Slide 125 text

125 125 Primary Secondary Secondary San Francisco Dallas Priority 1 Priority 1 Priority 0 Disaster recovery data center. Will never become primary automatically.
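A replica set configuration for this layout might look like the object below. The set name and host names are hypothetical placeholders; `priority` is the per-member setting the slide refers to, and in the mongo shell an object of this shape is what gets passed to `rs.initiate()`.

```javascript
// Hypothetical three-member configuration: two electable members in
// San Francisco and one priority-0 member in the Dallas DR data center.
const config = {
  _id: "rs0", // hypothetical replica set name
  members: [
    { _id: 0, host: "sf1.example.net:27017", priority: 1 },
    { _id: 1, host: "sf2.example.net:27017", priority: 1 },
    { _id: 2, host: "dallas1.example.net:27017", priority: 0 }
  ]
};

// Members with priority 0 replicate data but are never elected primary.
const electable = config.members
  .filter(m => m.priority > 0)
  .map(m => m.host);

console.log(electable);
```

The Dallas member still holds a full copy of the data, so it can be promoted manually by raising its priority after a disaster.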

Slide 126

Slide 126 text

126 126 Primary Secondary Secondary San Francisco Dallas New York

Slide 127

Slide 127 text

127 127 Fast recovery

Slide 128

Slide 128 text

128 128 Primary Arbiter Secondary Is this a good idea?

Slide 129

Slide 129 text

129 129 Primary Arbiter Secondary 1

Slide 130

Slide 130 text

130 130 Primary Arbiter Secondary Primary Arbiter Secondary 1 2

Slide 131

Slide 131 text

131 131 Primary Arbiter Secondary Primary Arbiter Secondary 1 2 Primary Arbiter Secondary 3 Secondary Full Sync Uh oh. Full Sync is going to use a lot of resources on the primary. So I may have downtime or degraded performance

Slide 132

Slide 132 text

132 132 Primary Secondary 1 Secondary

Slide 133

Slide 133 text

133 133 Primary Secondary Primary Secondary 1 2 Secondary Secondary

Slide 134

Slide 134 text

134 134 Primary Secondary Primary Secondary 1 2 Primary Secondary 3 Secondary Full Sync Sync can happen from secondary, which will not impact traffic on Primary. Secondary Secondary Secondary

Slide 135

Slide 135 text

135 135 •  Avoid single points of failure – Separate racks – Separate data centers •  Avoid long recovery downtime – Use journaling – Use 3+ replicas •  Keep your actives close – Use priority to control where failovers happen

Slide 136

Slide 136 text

136 136 Q&A after this session

Slide 137

Slide 137 text

137 137 Introducing MongoDB into your Organization

Slide 138

Slide 138 text

138 138 Introducing MongoDB into your Organization Edouard Servan-Schreiber, Ph.D. Director for Solution Architecture [email protected] @edouardss

Slide 139

Slide 139 text

139 139 •  You are using, or want to use, MongoDB –  What benefits? –  Potential Use cases –  Steering the adoption of MongoDB •  Why is MongoDB Safe –  Execution –  Operational –  Financial •  Why 10gen? –  People –  Company –  Future

Slide 140

Slide 140 text

140 140 Your First MongoDB Project

Slide 141

Slide 141 text

141 141 Big Data! New Programming models New Hardware Architecture

Slide 142

Slide 142 text

142 142 Horizontally Scalable { author: "roger", date: new Date(), text: "Spirited Away", tags: ["Tezuka", "Manga"] } Document Oriented High Performance -indexes -RAM Application

Slide 143

Slide 143 text

143 143 User Data Management, High Volume Data Feeds, Content Management, Operational Intelligence, Product Data Mgt

Slide 144

Slide 144 text

144 144 •  “NoSQL databases are proving valuable for scaling out cloud and on-premises uses of numerous content types, and document-oriented open-source solutions are emerging as one of the leading choices.”

Slide 145

Slide 145 text

145 145 •  Reassuring the Ops Team •  Reassuring the Business Team •  Start with low stakes – learn to trust •  Grow towards a mission critical use case •  LET US HELP YOU! → [email protected]

Slide 146

Slide 146 text

146 146 Execution

Slide 147

Slide 147 text

147 147

Slide 148

Slide 148 text

148 148 { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "roger", date : "Sat Jul 24 2010 19:47:11", text : "Spirited Away", tags : [ "Tezuka", "Manga" ], comments : [ { author : "Fred", date : "Sat Jul 24 2010 20:51:03", text : "Best Movie Ever" }, { author : "Bill", date : "Sat Jul 24 2010 21:13:23", text : "No Way !!" } ] }

Slide 149

Slide 149 text

Iteration

Slide 150

Slide 150 text

150 150 •  Start   •  Develop   •  Scale  

Slide 151

Slide 151 text

151 151 Operational

Slide 152

Slide 152 text

152 152 •  Elastic capacity •  Data center outages •  Upgrading DB versions •  Upgrade App versions •  Change/Evolve schema/representation

Slide 153

Slide 153 text

153 153 •  Data  Durability     –  Journal   –  Replicated  Writes   •  Data  Consistency   –  Single  Master   –  Shard  to  Scale   •  YOU  are  in  control!  

Slide 154

Slide 154 text

154 154 •  Millions  of  IO  ops/sec   •  Petabytes  of  data   •  Commodity  hardware  –  Virtual  hardware  

Slide 155

Slide 155 text

155 155 Economics

Slide 156

Slide 156 text

156 156 •  Less code •  More productive coding •  Easier to maintain •  Contingency plans for turnover •  Commodity hardware •  No upfront license, pay for value over time •  Cost visibility for growth of usage

Slide 157

Slide 157 text

157 157 Wordnik uses MongoDB as the foundation for its “live” dictionary that stores its entire text corpus – 3.5T of data in 20 billion records
Problem
§  Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
§  Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
§  Initially launched entirely on MySQL but quickly hit performance road blocks
Why MongoDB
§  Migrated 5 billion records in a single day with zero downtime
§  MongoDB powers every website request: 20m API calls per day
§  Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error
Impact
§  Reduced code by 75% compared to MySQL
§  Fetch time cut from 400ms to 60ms
§  Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
§  Significant cost savings and 15% reduction in servers
“Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don’t spend time worrying about the database, we can spend more time writing code for our application.” Tony Tam, Vice President of Engineering and Technical Co-founder

Slide 158

Slide 158 text

158 158 Why 10gen ?

Slide 159

Slide 159 text

159 159
Dwight Merriman – CEO: Founder, CTO DoubleClick
Max Shireson – President: COO MarkLogic, 9 Years at Oracle
Eliot Horowitz – CTO: Co-founder of Shopwiki, DoubleClick
Erik Frieberg – VP Marketing: HP Software, Borland, BEA
Ben Sabrin – VP of Sales: VP of Sales at JBoss, over 9 years of Open Source experience

Slide 160

Slide 160 text

160 160 •  Community  and  Commercial   •  Dedicated  support  staff  across  the  globe   –  NY   –  CA   –  Dublin   –  London   –  Australia  

Slide 161

Slide 161 text

161 161 •  Union  Square  Ventures   •  Sequoia  Capital   •  Flybridge  Capital   •  NEA   •  $80M  raised  overall   •  Most  recent  round:  $42M  in  May…  

Slide 162

Slide 162 text

162 162 What’s in store…

Slide 163

Slide 163 text

163 163 •  Authentication •  Data encryption –  At rest –  In flight •  Full Text Search •  Global Database lock ? •  Monitoring

Slide 164

Slide 164 text

164 164 Version 2.2 (now) •  Database level locking •  Aggregation Framework •  TTL collections •  Geo-aware sharding •  Read Preferences Version 2.4 (Q4 2012) •  Kerberos/LDAP authentication •  Collection level locking •  Full Text Search •  Improved Aggregation Framework

Slide 165

Slide 165 text

165 165 [email protected] Easy to start Easy to develop Easy to scale