Online Conference: Deep Dive with MongoDB

Slide 1

Slide 1 text

1 1 Online Conference: Deep Dive with MongoDB

Slide 2

Slide 2 text

2 2 Building your first App with MongoDB

Slide 3

Slide 3 text

•  Quick introduction to MongoDB
•  Data modeling in MongoDB: queries, geospatial, updates and map reduce
•  Using a location-based app as an example
•  Example works in the MongoDB JS shell

Slide 4

Slide 4 text

4 4

Slide 5

Slide 5 text

MongoDB is a scalable, high-performance, open source, document-oriented database.
•  Fast Querying
•  In-place updates
•  Full Index Support
•  Replication / High Availability
•  Auto-Sharding
•  Aggregation; Map/Reduce
•  GridFS

Slide 6

Slide 6 text

MongoDB is implemented in C++
•  Windows, Linux, Mac OS X, Solaris
Drivers are available in many languages
•  10gen supported: C, C# (.NET), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala, Node.js
•  Multiple community-supported drivers

Slide 7

Slide 7 text

RDBMS        MongoDB
Table        Collection
Row(s)       JSON Document
Index        Index
Partition    Shard
Join         Embedding/Linking
Schema       (implied Schema)

Slide 8

Slide 8 text

8 8 { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Asya", date : ISODate("2012-02-02T11:52:27.442Z"), text : "About MongoDB...", tags : [ "tech", "databases" ], comments : [{ author : "Fred", date : ISODate("2012-02-03T17:22:21.124Z"), text : "Best Post Ever!" }], comment_count : 1 }

Slide 9

Slide 9 text

•  JSON has a powerful but limited set of datatypes
   –  MongoDB extends the datatypes with Date, Int types, Id, …
•  MongoDB stores data in BSON
•  BSON is a binary representation of JSON
   –  Optimized for performance and navigational abilities
   –  Also compression
See: bsonspec.org

Slide 10

Slide 10 text

10 10 •  Intrinsic support for fast, iterative development •  Super low latency access to your data •  Very little CPU overhead •  No additional caching layer required •  Built in replication and horizontal scaling support

Slide 11

Slide 11 text

11 11 • Want to build an app where users can check in to a location • Leave notes or comments about that location

Slide 12

Slide 12 text

12 12 "As a user I want to be able to find other locations nearby" •  Need to store locations (Offices, Restaurants, etc) –  name, address, tags –  coordinates –  User generated content e.g. tips / notes

Slide 13

Slide 13 text

13 13 "As a user I want to be able to 'checkin' to a location" Checkins – User should be able to 'check in' to a location – Want to be able to generate statistics: •  Recent checkins •  Popular locations

Slide 14

Slide 14 text

14 14 users user1, user2 loc1, loc2, loc3 locations checkins checkin1, checkin2

Slide 15

Slide 15 text

15 15 > location_1 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012 }

Slide 16

Slide 16 text

16 16 > location_1 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012 } > db.locations.find({name: "Lotus Flower"})

Slide 17

Slide 17 text

17 17 > location_1 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012 } > db.locations.ensureIndex({name: 1}) > db.locations.find({name: "Lotus Flower"})

Slide 18

Slide 18 text

18 18 > location_2 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"] }

Slide 19

Slide 19 text

19 19 > location_2 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"] } > db.locations.ensureIndex({tags: 1})

Slide 20

Slide 20 text

20 20 > location_2 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"] } > db.locations.ensureIndex({tags: 1}) > db.locations.find({tags: "dumplings"})

Slide 21

Slide 21 text

21 21 > location_3 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387] }

Slide 22

Slide 22 text

22 22 > location_3 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387] } > db.locations.ensureIndex({lat_long: "2d"})

Slide 23

Slide 23 text

23 23 > location_3 = { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387] } > db.locations.ensureIndex({lat_long: "2d"}) > db.locations.find({lat_long: {$near:[52.53, 13.4]}})

Slide 24

Slide 24 text

// creating your indexes:
> db.locations.ensureIndex({tags: 1})
> db.locations.ensureIndex({name: 1})
> db.locations.ensureIndex({lat_long: "2d"})
// finding places:
> db.locations.find({lat_long: {$near:[52.53, 13.4]}})
// with regular expressions:
> db.locations.find({name: /^Din/})
// by tag (the field is "tags", matching the index above):
> db.locations.find({tags: "dumplings"})

Slide 25

Slide 25 text

25 25 Atomic operators: $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit

Slide 26

Slide 26 text

26 26 // initial data load: > db.locations.insert(location_3) // adding a tip with update: > db.locations.update( {name: "Lotus Flower"}, {$push: { tips: { user: "Asya", date: "28/03/2012", tip: "The hairy crab dumplings are awesome!"} }})

Slide 27

Slide 27 text

27 27 > db.locations.findOne() { name: "Lotus Flower", address: "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387], tips:[{ user: "Asya", date: "28/03/2012", tip: "The hairy crab dumplings are awesome!" }] }

Slide 28

Slide 28 text

28 28 "As a user I want to be able to 'checkin' to a location" Checkins – User should be able to 'check in' to a location – Want to be able to generate statistics: •  Recent checkins •  Popular locations

Slide 29

Slide 29 text

> user_1 = {
    _id: "[email protected]",
    name: "Asya",
    twitter: "asya999",
    checkins: [
      {location: "Lotus Flower", ts: "28/03/2012"},
      {location: "Meridian Hotel", ts: "27/03/2012"}
    ]
  }
// dotted field names must be quoted in the shell:
> db.users.ensureIndex({"checkins.location": 1})
> db.users.find({"checkins.location": "Lotus Flower"})

Slide 30

Slide 30 text

// find all users who've checked in here:
> db.users.find({"checkins.location":"Lotus Flower"})

Slide 31

Slide 31 text

// find all users who've checked in here:
> db.users.find({"checkins.location":"Lotus Flower"})
// find the last 10 checkins here?
> db.users.find({"checkins.location":"Lotus Flower"}).sort({"checkins.ts": -1}).limit(10)

Slide 32

Slide 32 text

// find all users who've checked in here:
> db.users.find({"checkins.location":"Lotus Flower"})
// find the last 10 checkins here: - Warning!
> db.users.find({"checkins.location":"Lotus Flower"}).sort({"checkins.ts": -1}).limit(10)
Hard to query for the last 10: this returns up to 10 user documents, each with its whole checkins array, not the 10 most recent checkins.
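To make the warning concrete, here is a minimal in-memory sketch in plain JavaScript (no database involved) of what "the last 10 checkins here" actually requires with embedded checkins: flattening every user's array, filtering, and sorting. The sample users and the helper name `lastCheckinsAt` are assumptions for illustration, and the timestamps are ISO strings rather than the slide's "dd/mm/yyyy" strings so that they compare chronologically.

```javascript
// Sample data shaped like the embedded-checkins user documents (hypothetical values).
const users = [
  { _id: "u1", checkins: [
      { location: "Lotus Flower", ts: "2012-03-28T12:00:00Z" },
      { location: "Meridian Hotel", ts: "2012-03-27T09:00:00Z" } ] },
  { _id: "u2", checkins: [
      { location: "Lotus Flower", ts: "2012-03-29T08:30:00Z" },
      { location: "Lotus Flower", ts: "2012-03-20T18:00:00Z" } ] }
];

// Flatten all embedded checkins, keep those at `location`,
// sort newest first, and return the first `n`.
function lastCheckinsAt(users, location, n) {
  const all = [];
  for (const u of users) {
    for (const c of u.checkins) {
      if (c.location === location) all.push({ user: u._id, ts: c.ts });
    }
  }
  all.sort((a, b) => (a.ts < b.ts ? 1 : a.ts > b.ts ? -1 : 0));
  return all.slice(0, n);
}

const recent = lastCheckinsAt(users, "Lotus Flower", 2);
```

Doing this work in the application for every request is the cost that motivates a separate checkins collection.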

Slide 33

Slide 33 text

33 33 > user_2 = { _id: "[email protected]", name: "Asya", twitter: "asya999", } > checkin_1 = { location: location_id, user: user_id, ts: "20/03/2010" } > db.checkins.ensureIndex({user: 1}) > db.checkins.find({user: user_id})

Slide 34

Slide 34 text

// find all users who've checked in here:
> location_id = db.locations.findOne({name: "Lotus Flower"})._id
> u_ids = db.checkins.find({location: location_id}, {_id: 0, user: 1}).map(function(c) { return c.user })
> users = db.users.find({_id: {$in: u_ids}})
// find the last 10 checkins here:
> db.checkins.find({location: location_id}).sort({ts: -1}).limit(10)
// count how many checked in today:
> db.checkins.find({location: location_id, ts: {$gt: midnight}}).count()

Slide 35

Slide 35 text

35 35 // Find most popular locations > agg = db.checkins.aggregate( {$match: {ts: {$gt: now_minus_3_hrs}}}, {$group: {_id: "$location", numEntries: {$sum: 1}}} ) > agg.result [{"_id": "Lotus Flower", "numEntries" : 17}]
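The checkin queries use `midnight` and `now_minus_3_hrs` without defining them. One way they could be computed in plain JavaScript is sketched below; the helper names `startOfDayUTC` and `hoursBefore` are assumptions, and the day boundary is taken in UTC. Range comparisons such as `$gt` are only chronological if `ts` is stored as a Date value rather than a "dd/mm/yyyy" string.

```javascript
// Midnight (UTC) at the start of the day containing `now`.
function startOfDayUTC(now) {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
}

// A Date exactly `hours` hours before `now`.
function hoursBefore(now, hours) {
  return new Date(now.getTime() - hours * 60 * 60 * 1000);
}

const now = new Date();
const midnight = startOfDayUTC(now);
const now_minus_3_hrs = hoursBefore(now, 3);
```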

Slide 36

Slide 36 text

36 36 // Find most popular locations > map_func = function() { emit(this.location, 1); } > reduce_func = function(key, values) { return Array.sum(values); } > db.checkins.mapReduce(map_func, reduce_func, {query: {ts: {$gt: now_minus_3_hrs}}, out: "result"}) > db.result.findOne() {"_id": "Lotus Flower", "value" : 17}
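The map and reduce steps can be traced without a server. The sketch below is a simplified in-memory emulation in plain JavaScript, not how MongoDB executes mapReduce: the sample `checkins` array and the `emit` implementation are assumptions, and `Array.sum` (a mongo shell helper) is replaced by the standard `reduce`.

```javascript
// In-memory stand-in for db.checkins (hypothetical sample data).
const checkins = [
  { location: "Lotus Flower" },
  { location: "Meridian Hotel" },
  { location: "Lotus Flower" }
];

// emit() collects every value emitted for a key.
const groups = new Map(); // key -> array of emitted values
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

// Same shape as the slide's functions.
const mapFunc = function () { emit(this.location, 1); };
const reduceFunc = function (key, values) {
  return values.reduce((a, b) => a + b, 0);
};

// Map phase: run mapFunc with each document bound to `this`.
for (const doc of checkins) mapFunc.call(doc);

// Reduce phase: one output document per key.
const result = [];
for (const [key, values] of groups) {
  result.push({ _id: key, value: reduceFunc(key, values) });
}
```

The server may call the reduce function several times for one key, feeding earlier outputs back in, so a sum like this one is a safe choice because it gives the same total however the values are grouped.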

Slide 37

Slide 37 text

37 37 Deployment

Slide 38

Slide 38 text

38 38 P •  Single server - need a strong backup plan

Slide 39

Slide 39 text

39 39 •  Single server - need a strong backup plan •  Replica sets - High availability - Automatic failover P P S S

Slide 40

Slide 40 text

40 40 •  Single server - need a strong backup plan •  Replica sets - High availability - Automatic failover •  Sharded - Horizontally scale - Auto balancing P S S P S S P P S S

Slide 41

Slide 41 text

User Data Management
High Volume Data Feeds
Content Management
Operational Intelligence
E-Commerce

Slide 42

Slide 42 text

42 42

Slide 43

Slide 43 text

43 43 @mongodb conferences, appearances, and meetups http://www.10gen.com/events http://bit.ly/mongofb Facebook | Twitter | LinkedIn http://linkd.in/joinmongo download at mongodb.org support, training, and this talk brought to you by

Slide 44

Slide 44 text

44 44 Schema Design with MongoDB

Slide 45

Slide 45 text

Schema Design with MongoDB Antoine Girbal [email protected] @antoinegirbal

Slide 46

Slide 46 text

So why model data? http://www.flickr.com/photos/42304632@N00/493639870/

Slide 47

Slide 47 text

Normalization
Goals
•  Avoid anomalies when inserting, updating or deleting
•  Minimize redesign when extending the schema
•  Avoid bias toward a particular query
•  Make use of all SQL features
In MongoDB
•  Similar goals apply but rules are different
•  Denormalization for optimization is an option: most features still exist, contrary to BLOBS

Slide 48

Slide 48 text

Terminology
RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key

Slide 49

Slide 49 text

Collections Basics
•  Equivalent to a Table in SQL
•  Cheap to create (max 24000)
•  Collections don't have a fixed schema
•  Common for documents in a collection to share a schema
•  Document schema can evolve
•  Consider using multiple related collections tied together by a naming convention, e.g. LogData-2011-02-08

Slide 50

Slide 50 text

Document basics
•  Elements are name/value pairs, equivalent to a column value in SQL
•  Elements can be nested
•  Rich data types for values
•  JSON for the human eye
•  BSON for all internals
•  16MB maximum size (many books..)
•  What you see is what is stored

Slide 51

Slide 51 text

Schema Design - Relational

Slide 52

Slide 52 text

Schema Design - MongoDB

Slide 53

Slide 53 text

Schema Design - MongoDB embedding

Slide 54

Slide 54 text

Schema Design - MongoDB embedding linking

Slide 55

Slide 55 text

Design Session
Design documents that simply map to your application
> post = { author: "Hergé",
           date: ISODate("2011-09-18T09:56:06.298Z"),
           text: "Destination Moon",
           tags: ["comic", "adventure"]
         }
> db.blogs.save(post)

Slide 56

Slide 56 text

> db.blogs.find()! ! { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),! author: "Hergé", ! date: ISODate("2011-09-18T09:56:06.298Z"), ! text: "Destination Moon", ! tags: [ "comic", "adventure" ]! } Notes: •  ID must be unique, but can be anything you’d like •  MongoDB will generate a default ID if one is not supplied Find the document

Slide 57

Slide 57 text

Add an index, find via index
Secondary index for "author"
// 1 means ascending, -1 means descending
> db.blogs.ensureIndex( { author: 1 } )
> db.blogs.find( { author: 'Hergé' } )
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  date: ISODate("2011-09-18T09:56:06.298Z"),
  author: "Hergé",
  ... }

Slide 58

Slide 58 text

> db.blogs.find( { author: "Hergé" } ).explain() { "cursor" : "BtreeCursor author_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 5, "indexBounds" : { "author" : [ [ "Hergé", "Hergé" ] ] } } Examine the query plan

Slide 59

Slide 59 text

Slide 60

Slide 60 text

Query operators
Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte, ...
// find posts with any tags
> db.blogs.find( { tags: { $exists: true } } )

Slide 61

Slide 61 text

Query operators
Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte, ...
// find posts with any tags
> db.blogs.find( { tags: { $exists: true } } )
Regular expressions:
// posts where author starts with h
> db.blogs.find( { author: /^h/i } )

Slide 62

Slide 62 text

Slide 63

Slide 63 text

> new_comment = { author: "Kyle", date: new Date(), text: "great book" } > db.blogs.update( { text: "Destination Moon" }, { "$push": { comments: new_comment }, "$inc": { comments_count: 1 } } ) Extending the Schema

Slide 64

Slide 64 text

> db.blogs.find( { author: "Hergé"} ) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date : ISODate("2011-09-18T09:56:06.298Z"), text : "Destination Moon", tags : [ "comic", "adventure" ], comments : [ { author : "Kyle", date : ISODate("2011-09-19T09:56:06.298Z"), text : "great book" } ], comments_count: 1 } Extending the Schema

Slide 65

Slide 65 text

// create index on nested documents: > db.blogs.ensureIndex( { "comments.author": 1 } ) > db.blogs.find( { "comments.author": "Kyle" } ) Extending the Schema

Slide 66

Slide 66 text

Slide 67

Slide 67 text

// create index on nested documents: > db.blogs.ensureIndex( { "comments.author": 1 } ) > db.blogs.find( { "comments.author": "Kyle" } ) // find last 5 posts: > db.blogs.find().sort( { date: -1 } ).limit(5) // most commented post: > db.blogs.find().sort( { comments_count: -1 } ).limit(1) When sorting, check if you need an index Extending the Schema

Slide 68

Slide 68 text

Patterns: •  Inheritance •  one to one •  one to many •  many to many Common Patterns

Slide 69

Slide 69 text

Inheritance

Slide 70

Slide 70 text

Single Table Inheritance - MongoDB
shapes table
id   type     area   radius   length   width
1    circle   3.14   1
2    square   4               2
3    rect     10              5        2

Slide 71

Slide 71 text

> db.shapes.find() { _id: "1", type: "c", area: 3.14, radius: 1} { _id: "2", type: "s", area: 4, length: 2} { _id: "3", type: "r", area: 10, length: 5, width: 2} Single Table Inheritance - MongoDB missing values not stored!

Slide 72

Slide 72 text

> db.shapes.find() { _id: "1", type: "c", area: 3.14, radius: 1} { _id: "2", type: "s", area: 4, length: 2} { _id: "3", type: "r", area: 10, length: 5, width: 2} // find shapes where radius > 0 > db.shapes.find( { radius: { $gt: 0 } } ) Single Table Inheritance - MongoDB

Slide 73

Slide 73 text

Slide 74

Slide 74 text

One to Many
Either:
•  Embedded Array / Document:
   –  improves read speed
   –  simplifies schema
•  Normalize:
   –  if list grows significantly
   –  if sub items are updated often
   –  if sub items are more than 1 level deep and need updating

Slide 75

Slide 75 text

One to Many
Embedded Array:
•  $slice operator to return subset of comments
•  some queries become harder (e.g. find latest comments across all blogs)
blogs: {
  author : "Hergé",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  comments : [
    {
      author : "Kyle",
      date : ISODate("2011-09-19T09:56:06.298Z"),
      text : "great book"
    }
  ]
}

Slide 76

Slide 76 text

One to Many
Normalized (2 collections)
•  most flexible
•  more queries
blogs: { _id: 1000,
         author: "Hergé",
         date: ISODate("2011-09-18T09:56:06.298Z") }
comments : { _id : 1,
             blogId: 1000,
             author : "Kyle",
             date : ISODate("2011-09-19T09:56:06.298Z") }
// findOne returns a document, so blog._id is available below
> blog = db.blogs.findOne( { text: "Destination Moon" } );
> db.comments.ensureIndex( { blogId: 1 } ) // important!
> db.comments.find( { blogId: blog._id } );

Slide 77

Slide 77 text

Example: •  Product can be in many categories •  Category can have many products Many - Many

Slide 78

Slide 78 text

Many - Many
// Each product lists the IDs of the categories
products:
  { _id: 10, name: "Destination Moon",
    category_ids: [ 20, 30 ] }

Slide 79

Slide 79 text

Many - Many
// Each product lists the IDs of the categories
products:
  { _id: 10, name: "Destination Moon",
    category_ids: [ 20, 30 ] }
// Each category lists the IDs of the products
categories:
  { _id: 20, name: "adventure",
    product_ids: [ 10, 11, 12 ] }
categories:
  { _id: 21, name: "movie",
    product_ids: [ 10 ] }

Slide 80

Slide 80 text

Slide 81

Slide 81 text

Slide 82

Slide 82 text

Alternative
// Each product lists the IDs of the categories
products:
  { _id: 10, name: "Destination Moon",
    category_ids: [ 20, 30 ] }
// Association not stored on the categories
categories:
  { _id: 20,
    name: "adventure" }
// All products for a given category
> db.products.ensureIndex( { category_ids: 1 } ) // yes!
> db.products.find( { category_ids: 20 } )

Slide 83

Slide 83 text

Use cases: •  Trees •  Time Series Common Use Cases

Slide 84

Slide 84 text

Hierarchical information Trees

Slide 85

Slide 85 text

Trees
Full Tree in Document
{ retweet: [
    { who: "Kyle", text: "...",
      retweet: [
        { who: "James", text: "...",
          retweet: [] }
      ] }
  ]
}
Pros: Single Document, Performance, Intuitive
Cons: Hard to search or update, document can easily get too large

Slide 86

Slide 86 text

Slide 87

Slide 87 text

// Store all Ancestors of a node { _id: "a" } { _id: "b", tree: [ "a" ], retweet: "a" } { _id: "c", tree: [ "a", "b" ], retweet: "b" } { _id: "d", tree: [ "a", "b" ], retweet: "b" } { _id: "e", tree: [ "a" ], retweet: "a" } { _id: "f", tree: [ "a", "e" ], retweet: "e" } // find all direct retweets of "b" > db.tweets.find( { retweet: "b" } ) // find all retweets of "e" anywhere in tree > db.tweets.find( { tree: "e" } ) Array of Ancestors A B C D E F
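The `tree` array of a new node is its parent's `tree` with the parent's `_id` appended, so it can be derived when the retweet is created. A minimal sketch in plain JavaScript, assuming a hypothetical helper named `makeRetweet`:

```javascript
// Build a retweet document from its parent:
// ancestors = the parent's ancestors followed by the parent itself.
function makeRetweet(id, parent) {
  const ancestors = (parent.tree || []).concat([parent._id]);
  return { _id: id, tree: ancestors, retweet: parent._id };
}

const a = { _id: "a" };          // root tweet has no ancestors
const b = makeRetweet("b", a);   // { _id: "b", tree: ["a"], retweet: "a" }
const c = makeRetweet("c", b);   // { _id: "c", tree: ["a", "b"], retweet: "b" }
```

Because the full ancestor list is copied at insert time, moving a subtree later would mean rewriting `tree` on every descendant.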

Slide 88

Slide 88 text

Slide 89

Slide 89 text

Trees as Paths
Store hierarchy as a path expression
•  Separate each node by a delimiter, e.g. ","
•  Use text search to find parts of a tree
•  search must be left-rooted and use an index!
{ retweets: [
    { _id: "a", text: "initial tweet",
      path: "a" },
    { _id: "b", text: "retweet with comment",
      path: "a,b" },
    { _id: "c", text: "reply to retweet",
      path: "a,b,c" } ] }
// Find the conversations "a" started
// (case-sensitive: a case-insensitive regex cannot use the index efficiently)
> db.tweets.find( { path: /^a/ } )
// Find the conversations under a branch
> db.tweets.find( { path: /^a,b/ } )
[Diagram: tree of nodes A, B, C, D, E, F]
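A bare prefix pattern also matches sibling ids that share leading characters: a pattern for branch "a,b" would match the path "a,bc" as well. The sketch below, in plain JavaScript, builds a left-rooted pattern that requires the prefix to end at a delimiter or at the end of the path. The helper names `escapeRegex` and `subtreePattern` and the trailing `(,|$)` group are assumptions added for illustration, not part of the slides; whether the server still derives tight index bounds from such a pattern is worth checking with explain().

```javascript
// Escape regex metacharacters so node ids are matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Left-rooted pattern matching `path` itself and everything below it,
// but not siblings whose id merely starts with the same characters.
function subtreePattern(path) {
  return new RegExp("^" + escapeRegex(path) + "(,|$)");
}

const underAB = subtreePattern("a,b");
```

In the shell the resulting object could be used directly as the query value, for example `db.tweets.find({ path: underAB })`.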

Slide 90

Slide 90 text

•  Records stats by •  Day, Hour, Minute •  Show time series Time Series

Slide 91

Slide 91 text

Time Series
// Time series buckets, hour and minute sub-docs
{ _id: "20111209-1231",
  ts: ISODate("2011-12-09T00:00:00.000Z"),
  daily: 67,
  hourly: { 0: 3, 1: 14, 2: 19 ... 23: 72 },
  minute: { 0: 0, 1: 4, 2: 6 ... 1439: 0 } }
// Add one to the last minute before midnight
// (both counters go in a single $inc; a repeated $inc key would overwrite the first)
> db.votes.update(
    { _id: "20111209-1231",
      ts: ISODate("2011-12-09T00:00:00.000Z") },
    { $inc: { "hourly.23": 1, "minute.1439": 1 } })
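The field names passed to `$inc` follow from the event time: the hour of the day, and the minute of the day (hour × 60 + minute). A minimal sketch in plain JavaScript that builds the update document for the flat schema; the helper name `flatIncrement` and the use of UTC are assumptions. Both counters go into one `$inc` document because a JavaScript object literal can hold only one value per key.

```javascript
// Build the update document for one event at `date` (UTC), flat schema:
// hourly.<0-23> and minute.<0-1439>.
function flatIncrement(date) {
  const hour = date.getUTCHours();
  const minuteOfDay = hour * 60 + date.getUTCMinutes();
  const inc = {};
  inc["hourly." + hour] = 1;
  inc["minute." + minuteOfDay] = 1;
  return { $inc: inc };
}

// An event in the last minute before midnight.
const update = flatIncrement(new Date("2011-12-09T23:59:30Z"));
// update is { $inc: { "hourly.23": 1, "minute.1439": 1 } }
```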

Slide 92

Slide 92 text

BSON Storage
•  Sequence of key/value pairs
•  NOT a hash map
•  Optimized to scan quickly
[Diagram: keys 0, 1, 2, 3 ... 1439 stored in sequence]
What is the cost of updating the minute before midnight?

Slide 93

Slide 93 text

•  Can skip sub-documents BSON Storage ... 0 1 59 1439 How could this change the schema? 0 ... 23 ... 1380

Slide 94

Slide 94 text

Time Series
Use more of a Tree structure by nesting!
// Time series buckets, each hour a sub-document
{ _id: "20111209-1231",
  ts: ISODate("2011-12-09T00:00:00.000Z"),
  daily: 67,
  minute: { 0: { 0: 0, 1: 7, ... 59: 2 },
            ...
            23: { 0: 15, ... 59: 6 } } }
// Add one to the last minute before midnight
> db.votes.update(
    { _id: "20111209-1231",
      ts: ISODate("2011-12-09T00:00:00.000Z") },
    { $inc: { "minute.23.59": 1 } })
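The gain from nesting can be estimated with a little arithmetic. The sketch below, in plain JavaScript, builds the nested field path and compares how many keys a sequential scan steps over before reaching the last minute of the day. It uses a deliberately simplified cost model (one step per preceding sibling key, whole sub-documents skipped in one step); the function names are assumptions.

```javascript
// Field path in the nested schema: minute.<hour>.<minute within the hour>.
function nestedMinuteField(date) {
  return "minute." + date.getUTCHours() + "." + date.getUTCMinutes();
}

// Flat schema: every earlier minute-of-day key is stepped over.
function keysSkippedFlat(hour, minute) {
  return hour * 60 + minute;
}

// Nested schema: earlier hour sub-documents are skipped whole,
// then earlier minutes inside the target hour.
function keysSkippedNested(hour, minute) {
  return hour + minute;
}

const field = nestedMinuteField(new Date("2011-12-09T23:59:00Z")); // "minute.23.59"
const flatCost = keysSkippedFlat(23, 59);     // 1439
const nestedCost = keysSkippedNested(23, 59); // 82
```

Under this model the worst-case update drops from 1439 skipped keys to 82.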

Slide 95

Slide 95 text

Duplicate data
Document to represent a shopping order:
{ _id: 1234,
  ts: ISODate("2011-12-09T00:00:00.000Z"),
  customerId: 67,
  total_price: 1050,
  items: [
    { sku: 123, quantity: 2, price: 50, name: "macbook", thumbnail: "macbook.png" },
    { sku: 234, quantity: 1, price: 20, name: "iphone", thumbnail: "iphone.png" },
    ...
  ]
}
The item information is duplicated in every order that references it.
Mongo's flexible schema makes it easy!

Slide 96

Slide 96 text

Duplicate data
Pros:
•  only 1 query to get all information needed to display the order
•  processing on the db is as fast as a BLOB
•  can achieve much higher performance
Cons:
•  more storage used ... cheap enough
•  updates are much more complicated ... just consider fields immutable

Slide 97

Slide 97 text

Summary
•  Basic data design principles stay the same ...
•  But MongoDB is more flexible and brings possibilities
•  Embed or duplicate data to speed up operations, cut down the number of collections and indexes
•  Watch for documents growing too large
•  Make sure to use the proper indexes for querying and sorting
•  Schema should feel natural to your application!

Slide 98

Slide 98 text

@mongodb conferences, appearances, and meetups http://www.10gen.com/events http://bit.ly/mongo_ Facebook | Twitter | LinkedIn http://linkd.in/joinmongo download at mongodb.org

Slide 99

Slide 99 text

99 99 Replication and Replica Sets

Slide 100

Slide 100 text

100 100 Why Have Replication?

Slide 101

Slide 101 text

101 101 •  High Availability (auto-failover) •  Read Scaling (extra copies to read from) •  Backups –  Online, Delayed Copy (fat finger) –  Point in Time (PiT) backups •  Use (hidden) replica for secondary workload –  Analytics –  Data-processing –  Integration with external systems

Slide 102

Slide 102 text

102 102 Planned –  Hardware upgrade –  O/S or file-system tuning –  Relocation of data to new file-system / storage –  Software upgrade Unplanned –  Hardware failure –  Data center failure –  Region outage –  Human error –  Application corruption

Slide 103

Slide 103 text

103 103 •  A cluster of N servers •  All writes to primary •  Reads can be to primary (default) or a secondary •  Any (one) node can be primary •  Consensus election of primary •  Automatic failover •  Automatic recovery

Slide 104

Slide 104 text

104 104 •  Replica Set is made up of 2 or more nodes Member 1 Member 2 Member 3

Slide 105

Slide 105 text

105 105 •  Election establishes the PRIMARY •  Data replication from PRIMARY to SECONDARY Member 1 Member 2 Primary Member 3

Slide 106

Slide 106 text

106 106 •  PRIMARY may fail •  Automatic election of new PRIMARY if majority exists Member 1 Member 2 DOWN Member 3 negotiate new master

Slide 107

Slide 107 text

107 107 Member 1 Member 2 DOWN Member 3 Primary negotiate new master •  New PRIMARY elected •  Replica Set re-established

Slide 108

Slide 108 text

108 108 •  Automatic recovery Member 1 Member 3  Primary Member 2 Recovering

Slide 109

Slide 109 text

109 109 •  Replica Set re-established Member 1 Member 3  Primary Member 2

Slide 110

Slide 110 text

110 110 Understanding automatic failover

Slide 111

Slide 111 text

111 111 Primary Secondary Secondary As long as a partition can see a majority (>50%) of the cluster, then it will elect a primary.
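The majority rule can be written as a one-line check. A minimal sketch in plain JavaScript, assuming every member has one vote; the function name `canElectPrimary` is hypothetical.

```javascript
// A partition can elect a primary only if it sees a strict majority
// (more than 50%) of the members of the set.
function canElectPrimary(visibleMembers, totalMembers) {
  return visibleMembers > totalMembers / 2;
}

const twoOfThree = canElectPrimary(2, 3); // true: 66% visible
const oneOfThree = canElectPrimary(1, 3); // false: 33% visible, read only
const twoOfFour = canElectPrimary(2, 4);  // false: exactly 50% is not a majority
```

The four-member case shows why an even split leaves both halves read only.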

Slide 112

Slide 112 text

112 112 Primary Failed Node Secondary 66% of cluster visible. Primary is elected

Slide 113

Slide 113 text

113 113 Failed Node 33% of cluster visible. Read only mode. Failed Node Secondary

Slide 114

Slide 114 text

114 114 Primary Secondary Secondary

Slide 115

Slide 115 text

115 115 Primary Secondary Secondary Primary Failed Node Secondary 66% of cluster visible Primary is elected

Slide 116

Slide 116 text

116 116 Secondary 33% of cluster visible Read only mode. Primary Secondary Failed Node Failed Node Secondary

Slide 117

Slide 117 text

117 117 Primary Secondary Secondary Secondary

Slide 118

Slide 118 text

118 118 Primary Secondary Secondary Secondary Failed Node Secondary Failed Node 50% of cluster visible Read only mode. Secondary

Slide 119

Slide 119 text

119 119 Primary Secondary Failed Node Secondary Failed Node 50% of cluster visible Read only mode. Secondary Secondary Secondary

Slide 120

Slide 120 text

120 120 Avoid single points of failure

Slide 121

Slide 121 text

121 121

Slide 122

Slide 122 text

122 122 Primary Secondary Secondary Top of rack switch Rack falls over

Slide 123

Slide 123 text

Primary Secondary Secondary Loss of internet Building burns down

Slide 124

Slide 124 text

124 124 Primary Secondary Secondary San Francisco Dallas

Slide 125

Slide 125 text

Primary Secondary Secondary
San Francisco Dallas
Priority 1 Priority 1 Priority 0
Disaster recovery data center. Will never become primary automatically.
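A replica set configuration for this layout might look like the sketch below. The set name, the host names, and the placement of two members in San Francisco and one in Dallas are assumptions for illustration; `priority: 0` is the setting that keeps the Dallas member from being elected primary.

```javascript
// Hypothetical three-member configuration: two electable members in
// San Francisco, one priority-0 disaster recovery member in Dallas.
const config = {
  _id: "rs0",
  members: [
    { _id: 0, host: "sf1.example.net:27017", priority: 1 },
    { _id: 1, host: "sf2.example.net:27017", priority: 1 },
    { _id: 2, host: "dallas1.example.net:27017", priority: 0 }
  ]
};

// In the mongo shell this document would be passed to rs.initiate(config).
// Members that can become primary are those with priority above 0.
const electable = config.members
  .filter(m => m.priority > 0)
  .map(m => m.host);
```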

Slide 126

Slide 126 text

126 126 Primary Secondary Secondary San Francisco Dallas New York

Slide 127

Slide 127 text

127 127 Fast recovery

Slide 128

Slide 128 text

128 128 Primary Arbiter Secondary Is this a good idea?

Slide 129

Slide 129 text

129 129 Primary Arbiter Secondary 1

Slide 130

Slide 130 text

130 130 Primary Arbiter Secondary Primary Arbiter Secondary 1 2

Slide 131

Slide 131 text

131 131 Primary Arbiter Secondary Primary Arbiter Secondary 1 2 Primary Arbiter Secondary 3 Secondary Full Sync Uh oh. Full Sync is going to use a lot of resources on the primary. So I may have downtime or degraded performance

Slide 132

Slide 132 text

132 132 Primary Secondary 1 Secondary

Slide 133

Slide 133 text

133 133 Primary Secondary Primary Secondary 1 2 Secondary Secondary

Slide 134

Slide 134 text

134 134 Primary Secondary Primary Secondary 1 2 Primary Secondary 3 Secondary Full Sync Sync can happen from secondary, which will not impact traffic on Primary. Secondary Secondary Secondary

Slide 135

Slide 135 text

135 135 •  Avoid single points of failure – Separate racks – Separate data centers •  Avoid long recovery downtime – Use journaling – Use 3+ replicas •  Keep your actives close – Use priority to control where failovers happen

Slide 136

Slide 136 text

136 136 Q&A after this session

Slide 137

Slide 137 text

137 137 Introducing MongoDB into your Organization

Slide 138

Slide 138 text

138 138 Introducing MongoDB into your Organization Edouard Servan-Schreiber, Ph.D. Director for Solution Architecture [email protected] @edouardss

Slide 139

Slide 139 text

•  You are using, or want to use, MongoDB
   –  What benefits?
   –  Potential use cases
   –  Steering the adoption of MongoDB
•  Why is MongoDB safe?
   –  Execution
   –  Operational
   –  Financial
•  Why 10gen?
   –  People
   –  Company
   –  Future

Slide 140

Slide 140 text

140 140 Your First MongoDB Project

Slide 141

Slide 141 text

141 141 Big Data! New Programming models New Hardware Architecture

Slide 142

Slide 142 text

Horizontally Scalable
{ author: "roger", date: new Date(), text: "Spirited Away", tags: ["Tezuka", "Manga"] }
Document Oriented
High Performance
-indexes
-RAM
Application

Slide 143

Slide 143 text

User Data Management
High Volume Data Feeds
Content Management
Operational Intelligence
Product Data Mgt

Slide 144

Slide 144 text

•  "NoSQL databases are proving valuable for scaling out cloud and on-premises uses of numerous content types, and document-oriented open-source solutions are emerging as one of the leading choices."

Slide 145

Slide 145 text

•  Reassuring the Ops Team
•  Reassuring the Business Team
•  Start with low stakes – learn to trust
•  Grow towards a mission critical use case
•  LET US HELP YOU! → [email protected]

Slide 146

Slide 146 text

146 146 Execution

Slide 147

Slide 147 text

147 147

Slide 148

Slide 148 text

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "roger",
  date : "Sat Jul 24 2010 19:47:11",
  text : "Spirited Away",
  tags : [ "Tezuka", "Manga" ],
  comments : [
    { author : "Fred",
      date : "Sat Jul 24 2010 20:51:03",
      text : "Best Movie Ever" },
    { author : "Bill",
      date : "Sat Jul 24 2010 21:13:23",
      text : "No Way !!" }
  ]
}

Slide 149

Slide 149 text

Iteration

Slide 150

Slide 150 text

150 150 •  Start •  Develop •  Scale

Slide 151

Slide 151 text

151 151 Operational

Slide 152

Slide 152 text

•  Elastic capacity
•  Data center outages
•  Upgrading DB versions
•  Upgrade App versions
•  Change/Evolve schema/representation

Slide 153

Slide 153 text

153 153 •  Data Durability –  Journal –  Replicated Writes •  Data Consistency –  Single Master –  Shard to Scale •  YOU are in control!

Slide 154

Slide 154 text

154 154 •  Millions of IO ops/sec •  Petabytes of data •  Commodity hardware – Virtual hardware

Slide 155

Slide 155 text

155 155 Economics

Slide 156

Slide 156 text

•  Less code
•  More productive coding
•  Easier to maintain
•  Contingency plans for turnover
•  Commodity hardware
•  No upfront license, pay for value over time
•  Cost visibility for growth of usage

Slide 157

Slide 157 text

Wordnik uses MongoDB as the foundation for its "live" dictionary that stores its entire text corpus – 3.5T of data in 20 billion records
Problem
•  Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
•  Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
•  Initially launched entirely on MySQL but quickly hit performance road blocks
Why MongoDB
•  Migrated 5 billion records in a single day with zero downtime
•  MongoDB powers every website request: 20m API calls per day
•  Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error
Impact
•  Reduced code by 75% compared to MySQL
•  Fetch time cut from 400ms to 60ms
•  Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
•  Significant cost savings and 15% reduction in servers
"Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application."
Tony Tam, Vice President of Engineering and Technical Co-founder

Slide 158

Slide 158 text

158 158 Why 10gen ?

Slide 159

Slide 159 text

Dwight Merriman – CEO: Founder, CTO DoubleClick
Max Shireson – President: COO MarkLogic, 9 Years at Oracle
Eliot Horowitz – CTO: Co-founder of Shopwiki, DoubleClick
Erik Frieberg – VP Marketing: HP Software, Borland, BEA
Ben Sabrin – VP of Sales: VP of Sales at JBoss, over 9 years of Open Source experience

Slide 160

Slide 160 text

•  Community and Commercial
•  Dedicated support staff across the globe
   –  NY
   –  CA
   –  Dublin
   –  London
   –  Australia

Slide 161

Slide 161 text

161 161 •  Union Square Ventures •  Sequoia Capital •  Flybridge Capital •  NEA •  $80M raised overall •  Most recent round: $42M in May…

Slide 162

Slide 162 text

162 162 What’s in store…

Slide 163

Slide 163 text

•  Authentication
•  Data encryption
   –  At rest
   –  In flight
•  Full Text Search
•  Global Database lock ?
•  Monitoring

Slide 164

Slide 164 text

Version 2.2 (now)
•  Database level locking
•  Aggregation Framework
•  TTL collections
•  Geo-aware sharding
•  Read Preferences
Version 2.4 (Q4 2012)
•  Kerberos/LDAP authentication
•  Collection level locking
•  Full Text Search
•  Improved Aggregation Framework

Slide 165

Slide 165 text

165 165 [email protected] Easy to start Easy to develop Easy to scale