Online Conference: Deep Dive with MongoDB

mongodb
July 11, 2012

This four hour online conference will introduce you to some MongoDB basics and get you up to speed with why and how you should choose MongoDB for your next project.

Transcript

  1. •  Quick introduction to MongoDB
    •  Data modeling in MongoDB, queries, geospatial, updates and map reduce
    •  Using a location-based app as an example
    •  Example works in the MongoDB JS shell
  2. 4 4

  3. MongoDB is a scalable, high-performance, open source, document-oriented database.
    •  Fast querying
    •  In-place updates
    •  Full index support
    •  Replication / high availability
    •  Auto-sharding
    •  Aggregation; Map/Reduce
    •  GridFS
  4. MongoDB is implemented in C++
    •  Windows, Linux, Mac OS X, Solaris
    Drivers are available in many languages
    10gen supported
    •  C, C# (.NET), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala, Node.js
    •  Multiple community-supported drivers
  5. RDBMS        MongoDB
    Table        Collection
    Row(s)       JSON Document
    Index        Index
    Partition    Shard
    Join         Embedding/Linking
    Schema       (implied Schema)
  6. { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Asya",
      date : ISODate("2012-02-02T11:52:27.442Z"),
      text : "About MongoDB...",
      tags : [ "tech", "databases" ],
      comments : [{ author : "Fred",
                    date : ISODate("2012-02-03T17:22:21.124Z"),
                    text : "Best Post Ever!" }],
      comment_count : 1 }
  7. •  JSON has a powerful but limited set of datatypes
        –  Mongo extends datatypes with Date, Int types, Id, …
    •  MongoDB stores data in BSON
    •  BSON is a binary representation of JSON
        –  Optimized for performance and navigational abilities
        –  Also compression
    See: bsonspec.org
  8. •  Intrinsic support for fast, iterative development
    •  Super low latency access to your data
    •  Very little CPU overhead
    •  No additional caching layer required
    •  Built-in replication and horizontal scaling support
  9. •  Want to build an app where users can check in to a location
    •  Leave notes or comments about that location
  10. "As a user I want to be able to find other locations nearby"
    •  Need to store locations (offices, restaurants, etc.)
        –  name, address, tags
        –  coordinates
        –  user-generated content, e.g. tips / notes
  11. "As a user I want to be able to 'checkin' to a location"
    Checkins
    –  User should be able to 'check in' to a location
    –  Want to be able to generate statistics:
        •  Recent checkins
        •  Popular locations
  12. 15 15 > location_1 = { name: "Lotus Flower", address:

    "123 University Ave", city: "Palo Alto", post_code: 94012 }
  13. 16 16 > location_1 = { name: "Lotus Flower", address:

    "123 University Ave", city: "Palo Alto", post_code: 94012 } > db.locations.find({name: "Lotus Flower"})
  14. 17 17 > location_1 = { name: "Lotus Flower", address:

    "123 University Ave", city: "Palo Alto", post_code: 94012 } > db.locations.ensureIndex({name: 1}) > db.locations.find({name: "Lotus Flower"})
  15. 18 18 > location_2 = { name: "Lotus Flower", address:

    "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"] }
  16. 19 19 > location_2 = { name: "Lotus Flower", address:

    "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"] } > db.locations.ensureIndex({tags: 1})
  17. 20 20 > location_2 = { name: "Lotus Flower", address:

    "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"] } > db.locations.ensureIndex({tags: 1}) > db.locations.find({tags: "dumplings"})
  18. 21 21 > location_3 = { name: "Lotus Flower", address:

    "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387] }
  19. 22 22 > location_3 = { name: "Lotus Flower", address:

    "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387] } > db.locations.ensureIndex({lat_long: "2d"})
  20. 23 23 > location_3 = { name: "Lotus Flower", address:

    "123 University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387] } > db.locations.ensureIndex({lat_long: "2d"}) > db.locations.find({lat_long: {$near:[52.53, 13.4]}})
  21. // creating your indexes:
    > db.locations.ensureIndex({tags: 1})
    > db.locations.ensureIndex({name: 1})
    > db.locations.ensureIndex({lat_long: "2d"})
    // finding places:
    > db.locations.find({lat_long: {$near: [52.53, 13.4]}})
    // with regular expressions:
    > db.locations.find({name: /^Din/})
    // by tag:
    > db.locations.find({tags: "dumplings"})
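The `$near` query above returns locations ordered from nearest to farthest. A minimal plain-JavaScript sketch of that ordering, assuming flat (planar) distance over a small in-memory array. The `nearest` helper and the sample places other than "Lotus Flower" are hypothetical, and a real `2d` index does not work by sorting every document:

```javascript
// Hypothetical in-memory stand-in for the locations collection.
const sampleLocations = [
  { name: "Lotus Flower", lat_long: [52.5184, 13.387] },
  { name: "Far Away Cafe", lat_long: [48.1372, 11.5756] },
  { name: "Corner Bakery", lat_long: [52.529, 13.401] },
];

// Squared planar distance between two [x, y] pairs.
function dist2(a, b) {
  const dx = a[0] - b[0];
  const dy = a[1] - b[1];
  return dx * dx + dy * dy;
}

// Return up to `limit` documents ordered nearest-first.
function nearest(docs, point, limit) {
  return docs
    .slice() // copy so the input array is not reordered
    .sort((p, q) => dist2(p.lat_long, point) - dist2(q.lat_long, point))
    .slice(0, limit);
}

const nearestNames = nearest(sampleLocations, [52.53, 13.4], 2).map((d) => d.name);
console.log(nearestNames);
```

The point of the index is to avoid exactly this full sort: it lets the server visit only candidates close to the query point.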
  22. 26 26 // initial data load: > db.locations.insert(location_3) // adding

    a tip with update: > db.locations.update( {name: "Lotus Flower"}, {$push: { tips: { user: "Asya", date: "28/03/2012", tip: "The hairy crab dumplings are awesome!"} }})
  23. 27 27 > db.locations.findOne() { name: "Lotus Flower", address: "123

    University Ave", city: "Palo Alto", post_code: 94012, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387], tips:[{ user: "Asya", date: "28/03/2012", tip: "The hairy crab dumplings are awesome!" }] }
  24. 28 28 "As a user I want to be able

    to 'checkin' to a location" Checkins – User should be able to 'check in' to a location – Want to be able to generate statistics: •  Recent checkins •  Popular locations
  25. > user_1 = {
        _id: "[email protected]",
        name: "Asya",
        twitter: "asya999",
        checkins: [
          {location: "Lotus Flower", ts: "28/03/2012"},
          {location: "Meridian Hotel", ts: "27/03/2012"} ] }
    > db.users.ensureIndex({"checkins.location": 1})
    > db.users.find({"checkins.location": "Lotus Flower"})
  26. 30 30 // find all users who've checked in here:

    > db.users.find({"checkins.location":"Lotus Flower"})
  27. 31 31 // find all users who've checked in here:

    > db.users.find({"checkins.location":"Lotus Flower"}) // find the last 10 checkins here? > db.users.find({"checkins.location":"Lotus Flower"}) .sort({"checkins.ts": -1}).limit(10)
  28. // find all users who've checked in here:
    > db.users.find({"checkins.location": "Lotus Flower"})
    // find the last 10 checkins here: warning!
    > db.users.find({"checkins.location": "Lotus Flower"})
        .sort({"checkins.ts": -1}).limit(10)
    Hard to query for the last 10 checkins: this sorts and limits user documents, not individual checkins.
  29. 33 33 > user_2 = { _id: "[email protected]", name: "Asya",

    twitter: "asya999", } > checkin_1 = { location: location_id, user: user_id, ts: "20/03/2010" } > db.checkins.ensureIndex({user: 1}) > db.checkins.find({user: user_id})
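Moving checkins into their own collection addresses the problem with the embedded design: sorting on "checkins.ts" orders whole user documents, so the application would have to flatten the embedded arrays itself to get the latest checkins. A rough plain-JavaScript sketch of that flattening, assuming timestamps are stored as real Date values (the slides' "28/03/2012" strings would not sort chronologically). `lastCheckins` and the sample users are hypothetical:

```javascript
// Hypothetical user documents with embedded checkins.
const sampleUsers = [
  { name: "Asya", checkins: [
    { location: "Lotus Flower", ts: new Date("2012-03-28") },
    { location: "Meridian Hotel", ts: new Date("2012-03-27") },
  ]},
  { name: "Fred", checkins: [
    { location: "Lotus Flower", ts: new Date("2012-03-29") },
    { location: "Lotus Flower", ts: new Date("2012-03-20") },
  ]},
];

// Flatten every embedded checkin for one location, newest first.
function lastCheckins(userDocs, location, limit) {
  const flat = [];
  for (const u of userDocs) {
    for (const c of u.checkins) {
      if (c.location === location) flat.push({ user: u.name, ts: c.ts });
    }
  }
  flat.sort((a, b) => b.ts - a.ts); // Date subtraction compares timestamps
  return flat.slice(0, limit);
}

const recentCheckins = lastCheckins(sampleUsers, "Lotus Flower", 2);
console.log(recentCheckins.map((c) => c.user));
```

With one document per checkin, the same result is a single indexed sort and limit on the server.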
  30. // find all users who've checked in here:
    > location_id = db.locations.findOne({name: "Lotus Flower"})._id
    > u_ids = db.checkins.find({location: location_id}, {_id: 0, user: 1})
                         .map(function(c) { return c.user })
    > users = db.users.find({_id: {$in: u_ids}})
    // find the last 10 checkins here:
    > db.checkins.find({location: location_id})
        .sort({ts: -1}).limit(10)
    // count how many checked in today:
    > db.checkins.find({location: location_id, ts: {$gt: midnight}}).count()
  31. 35 35 // Find most popular locations > agg =

    db.checkins.aggregate( {$match: {ts: {$gt: now_minus_3_hrs}}}, {$group: {_id: "$location", numEntries: {$sum: 1}}} ) > agg.result [{"_id": "Lotus Flower", "numEntries" : 17}]
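The aggregation above filters on `now_minus_3_hrs`, and the earlier count query on `midnight`; the slides define neither. One way to compute them in plain JavaScript, offered as an assumption rather than the presenter's code. Both helper names are hypothetical, they take the current time as a parameter, and midnight is taken in the local time zone:

```javascript
// Start of the local calendar day containing `now`.
function startOfToday(now) {
  const d = new Date(now.getTime());
  d.setHours(0, 0, 0, 0);
  return d;
}

// The instant `hours` hours before `now`.
function hoursAgo(now, hours) {
  return new Date(now.getTime() - hours * 60 * 60 * 1000);
}

// Fixed example time (local): 11 July 2012, 15:30.
const exampleNow = new Date(2012, 6, 11, 15, 30, 0);
const midnight = startOfToday(exampleNow);
const now_minus_3_hrs = hoursAgo(exampleNow, 3);
console.log(midnight, now_minus_3_hrs);
```

These comparisons only work if `ts` is stored as a Date, not as a formatted string.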
  32. 36 36 // Find most popular locations > map_func =

    function() { emit(this.location, 1); } > reduce_func = function(key, values) { return Array.sum(values); } > db.checkins.mapReduce(map_func, reduce_func, {query: {ts: {$gt: now_minus_3_hrs}}, out: "result"}) > db.result.findOne() {"_id": "Lotus Flower", "value" : 17}
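To make the map/reduce flow concrete, here is a small in-memory simulation in plain JavaScript. It is a sketch only: `runMapReduce` and the sample data are hypothetical, `emit` is passed in as an argument instead of being a global as in the shell, and `Array.sum` (a shell helper) is replaced with `reduce`:

```javascript
// Hypothetical checkin documents.
const sampleCheckins = [
  { location: "Lotus Flower" },
  { location: "Meridian Hotel" },
  { location: "Lotus Flower" },
];

// Group emitted values by key, then reduce each group to one value.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc, emit); // `this` is the document
  const out = [];
  for (const [key, values] of groups) {
    // A key with a single emitted value is passed through without reducing.
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    out.push({ _id: key, value: value });
  }
  return out;
}

const mapFunc = function (emit) { emit(this.location, 1); };
const reduceFunc = function (key, values) {
  return values.reduce((a, b) => a + b, 0);
};

const popular = runMapReduce(sampleCheckins, mapFunc, reduceFunc);
console.log(popular);
```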
  33. •  Single server
        -  Need a strong backup plan
    •  Replica sets
        -  High availability
        -  Automatic failover
    [Diagram: nodes labeled P and S]
  34. •  Single server
        -  Need a strong backup plan
    •  Replica sets
        -  High availability
        -  Automatic failover
    •  Sharded
        -  Horizontally scale
        -  Auto balancing
    [Diagram: nodes labeled P and S]
  35. •  User Data Management
    •  High Volume Data Feeds
    •  Content Management
    •  Operational Intelligence
    •  E-Commerce
  36. @mongodb
    conferences, appearances, and meetups: http://www.10gen.com/events
    Facebook: http://bit.ly/mongofb | Twitter | LinkedIn: http://linkd.in/joinmongo
    download at mongodb.org
    support, training, and this talk brought to you by
  37. Normalization
    Goals
    •  Avoid anomalies when inserting, updating or deleting
    •  Minimize redesign when extending the schema
    •  Avoid bias toward a particular query
    •  Make use of all SQL features
    In MongoDB
    •  Similar goals apply but rules are different
    •  Denormalization for optimization is an option: most features still exist, contrary to BLOBS
  38. Terminology
    RDBMS            MongoDB
    Table            Collection
    Row(s)           JSON Document
    Index            Index
    Join             Embedding & Linking
    Partition        Shard
    Partition Key    Shard Key
  39. Collections Basics
    •  Equivalent to a table in SQL
    •  Cheap to create (max 24000)
    •  Collections don't have a fixed schema
    •  Common for documents in a collection to share a schema
    •  Document schema can evolve
    •  Consider using multiple related collections tied together by a naming convention, e.g. LogData-2011-02-08
  40. Document basics
    •  Elements are name/value pairs, equivalent to a column value in SQL
    •  Elements can be nested
    •  Rich data types for values
    •  JSON for the human eye
    •  BSON for all internals
    •  16MB maximum size (many books..)
    •  What you see is what is stored
  41. Design Session
    Design documents that simply map to your application
    > post = { author: "Hergé",
               date: ISODate("2011-09-18T09:56:06.298Z"),
               text: "Destination Moon",
               tags: ["comic", "adventure"] }
    > db.blogs.save(post)
  42. Find the document
    > db.blogs.find()
    { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author: "Hergé",
      date: ISODate("2011-09-18T09:56:06.298Z"),
      text: "Destination Moon",
      tags: [ "comic", "adventure" ] }
    Notes:
    •  ID must be unique, but can be anything you'd like
    •  MongoDB will generate a default ID if one is not supplied
  43. Add an index, find via index
    Secondary index for "author"
    // 1 means ascending, -1 means descending
    > db.blogs.ensureIndex( { author: 1 } )
    > db.blogs.find( { author: 'Hergé' } )
    { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
      date: ISODate("2011-09-18T09:56:06.298Z"),
      author: "Hergé",
      ... }
  44. Examine the query plan
    > db.blogs.find( { author: "Hergé" } ).explain()
    { "cursor" : "BtreeCursor author_1",
      "nscanned" : 1,
      "nscannedObjects" : 1,
      "n" : 1,
      "millis" : 5,
      "indexBounds" : { "author" : [ [ "Hergé", "Hergé" ] ] } }
  46. Query operators
    Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne...
    // find posts with any tags
    > db.blogs.find( { tags: { $exists: true } } )
  47. Query operators
    Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne...
    // find posts with any tags
    > db.blogs.find( { tags: { $exists: true } } )
    Regular expressions:
    // posts where author starts with h
    > db.blogs.find( { author: /^h/i } )
  48. Query operators
    Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne...
    // find posts with any tags
    > db.blogs.find( { tags: { $exists: true } } )
    Regular expressions:
    // posts where author starts with h
    > db.blogs.find( { author: /^h/i } )
    Counting:
    // number of posts written by Hergé
    > db.blogs.find( { author: "Hergé" } ).count()
  49. Extending the Schema
    > new_comment = { author: "Kyle", date: new Date(), text: "great book" }
    > db.blogs.update(
        { text: "Destination Moon" },
        { "$push": { comments: new_comment },
          "$inc": { comments_count: 1 } } )
  50. Extending the Schema
    > db.blogs.find( { author: "Hergé" } )
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Hergé",
      date : ISODate("2011-09-18T09:56:06.298Z"),
      text : "Destination Moon",
      tags : [ "comic", "adventure" ],
      comments : [ { author : "Kyle",
                     date : ISODate("2011-09-19T09:56:06.298Z"),
                     text : "great book" } ],
      comments_count: 1 }
  51. // create index on nested documents: > db.blogs.ensureIndex( { "comments.author":

    1 } ) > db.blogs.find( { "comments.author": "Kyle" } ) Extending the Schema
  52. // create index on nested documents: > db.blogs.ensureIndex( { "comments.author":

    1 } ) > db.blogs.find( { "comments.author": "Kyle" } ) // find last 5 posts: > db.blogs.find().sort( { date: -1 } ).limit(5) Extending the Schema
  53. Extending the Schema
    // create index on nested documents:
    > db.blogs.ensureIndex( { "comments.author": 1 } )
    > db.blogs.find( { "comments.author": "Kyle" } )
    // find last 5 posts:
    > db.blogs.find().sort( { date: -1 } ).limit(5)
    // most commented post:
    > db.blogs.find().sort( { comments_count: -1 } ).limit(1)
    When sorting, check if you need an index
  54. Common Patterns
    Patterns:
    •  Inheritance
    •  one to one
    •  one to many
    •  many to many
  55. Single Table Inheritance - MongoDB
    shapes table
    id   type     area   radius   length   width
    1    circle   3.14   1
    2    square   4               2
    3    rect     10              5        2
  56. Single Table Inheritance - MongoDB
    > db.shapes.find()
    { _id: "1", type: "c", area: 3.14, radius: 1}
    { _id: "2", type: "s", area: 4, length: 2}
    { _id: "3", type: "r", area: 10, length: 5, width: 2}
    missing values not stored!
  57. > db.shapes.find() { _id: "1", type: "c", area: 3.14, radius:

    1} { _id: "2", type: "s", area: 4, length: 2} { _id: "3", type: "r", area: 10, length: 5, width: 2} // find shapes where radius > 0 > db.shapes.find( { radius: { $gt: 0 } } ) Single Table Inheritance - MongoDB
  58. Single Table Inheritance - MongoDB
    > db.shapes.find()
    { _id: "1", type: "c", area: 3.14, radius: 1}
    { _id: "2", type: "s", area: 4, length: 2}
    { _id: "3", type: "r", area: 10, length: 5, width: 2}
    // find shapes where radius > 0
    > db.shapes.find( { radius: { $gt: 0 } } )
    // create index
    > db.shapes.ensureIndex( { radius: 1 }, { sparse: true } )
    index only values present!
  59. One to Many
    Either:
    •  Embedded Array / Document:
        •  improves read speed
        •  simplifies schema
    •  Normalize:
        •  if list grows significantly
        •  if sub items are updated often
        •  if sub items are more than 1 level deep and need updating
  60. One to Many
    Embedded Array:
    •  $slice operator to return subset of comments
    •  some queries become harder (e.g. find latest comments across all blogs)
    blogs: {
      author : "Hergé",
      date : ISODate("2011-09-18T09:56:06.298Z"),
      comments : [
        { author : "Kyle",
          date : ISODate("2011-09-19T09:56:06.298Z"),
          text : "great book" }
      ]
    }
  61. One to Many
    Normalized (2 collections)
    •  most flexible
    •  more queries
    blogs: { _id: 1000,
             author: "Hergé",
             date: ISODate("2011-09-18T09:56:06.298Z") }
    comments: { _id: 1,
                blogId: 1000,
                author: "Kyle",
                date: ISODate("2011-09-19T09:56:06.298Z") }
    > blog = db.blogs.findOne( { text: "Destination Moon" } );
    > db.comments.ensureIndex( { blogId: 1 } ) // important!
    > db.comments.find( { blogId: blog._id } );
  62. Many - Many
    // Each product lists the IDs of the categories
    products:
      { _id: 10, name: "Destination Moon",
        category_ids: [ 20, 30 ] }
  63. Many - Many
    // Each product lists the IDs of the categories
    products:
      { _id: 10, name: "Destination Moon",
        category_ids: [ 20, 30 ] }
    // Each category lists the IDs of the products
    categories:
      { _id: 20, name: "adventure",
        product_ids: [ 10, 11, 12 ] }
    categories:
      { _id: 30, name: "movie",
        product_ids: [ 10 ] }
  64. Many - Many
    // Each product lists the IDs of the categories
    products:
      { _id: 10, name: "Destination Moon",
        category_ids: [ 20, 30 ] }
    // Each category lists the IDs of the products
    categories:
      { _id: 20, name: "adventure",
        product_ids: [ 10, 11, 12 ] }
    categories:
      { _id: 30, name: "movie",
        product_ids: [ 10 ] }
    Cuts mapping table and 2 indexes, but:
    •  potential consistency issue
    •  lists can grow too large
  65. // Each product list the IDs of the categories! products:!

    { _id: 10, name: "Destination Moon",! category_ids: [ 20, 30 ] }! ! // Association not stored on the categories! categories:! { _id: 20, ! name: "adventure"}! ! Alternative
  66. Alternative
    // Each product lists the IDs of the categories
    products:
      { _id: 10, name: "Destination Moon",
        category_ids: [ 20, 30 ] }
    // Association not stored on the categories
    categories:
      { _id: 20,
        name: "adventure" }
    // All products for a given category
    > db.products.ensureIndex( { category_ids: 1 } ) // yes!
    > db.products.find( { category_ids: 20 } )
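The lookup by a single category id works because an index on an array field holds one entry per array element. A conceptual plain-JavaScript sketch of that idea follows; it is not MongoDB's actual B-tree, and `buildMultikeyIndex` and the second product are hypothetical:

```javascript
// Sample products; the second one is made up for illustration.
const sampleProducts = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Explorers on the Moon", category_ids: [20] },
];

// Map each array element to the _ids of the documents containing it.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field]) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const byCategory = buildMultikeyIndex(sampleProducts, "category_ids");
console.log(byCategory.get(20)); // _ids of the products in category 20
```

Because the association lives only on the product, there is a single place to update when a product changes category.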
  67. Trees
    Full Tree in Document
    { retweet: [
      { who: "Kyle", text: "...",
        retweet: [
          { who: "James", text: "...",
            retweet: [] }
        ] }
    ] }
    Pros: Single Document, Performance, Intuitive
    Cons: Hard to search or update, document can easily get too large
  68. // Store all Ancestors of a node { _id: "a"

    } { _id: "b", tree: [ "a" ], retweet: "a" } { _id: "c", tree: [ "a", "b" ], retweet: "b" } { _id: "d", tree: [ "a", "b" ], retweet: "b" } { _id: "e", tree: [ "a" ], retweet: "a" } { _id: "f", tree: [ "a", "e" ], retweet: "e" } // find all direct retweets of "b" > db.tweets.find( { retweet: "b" } ) Array of Ancestors A   B   C   D   E   F  
  69. // Store all Ancestors of a node { _id: "a"

    } { _id: "b", tree: [ "a" ], retweet: "a" } { _id: "c", tree: [ "a", "b" ], retweet: "b" } { _id: "d", tree: [ "a", "b" ], retweet: "b" } { _id: "e", tree: [ "a" ], retweet: "a" } { _id: "f", tree: [ "a", "e" ], retweet: "e" } // find all direct retweets of "b" > db.tweets.find( { retweet: "b" } ) // find all retweets of "e" anywhere in tree > db.tweets.find( { tree: "e" } ) Array of Ancestors A   B   C   D   E   F  
  70. Array of Ancestors
    // Store all ancestors of a node
    { _id: "a" }
    { _id: "b", tree: [ "a" ], retweet: "a" }
    { _id: "c", tree: [ "a", "b" ], retweet: "b" }
    { _id: "d", tree: [ "a", "b" ], retweet: "b" }
    { _id: "e", tree: [ "a" ], retweet: "a" }
    { _id: "f", tree: [ "a", "e" ], retweet: "e" }
    // find all direct retweets of "b"
    > db.tweets.find( { retweet: "b" } )
    // find all retweets of "e" anywhere in tree
    > db.tweets.find( { tree: "e" } )
    // find tweet history of f:
    > tweets = db.tweets.findOne( { _id: "f" } ).tree
    > db.tweets.find( { _id: { $in : tweets } } )
    [Diagram: tree of nodes A, B, C, D, E, F]
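The `tree` arrays have to be built by the application when a retweet is stored. A minimal sketch of how they could be derived from parent pointers, in plain JavaScript; `ancestorsOf`, `descendantsOf` and the `parents` map are hypothetical names, and unlike the slide the root here gets an empty `tree` and a null `retweet`:

```javascript
// Parent pointer for each node of the example tree (null marks the root).
const parents = { a: null, b: "a", c: "b", d: "b", e: "a", f: "e" };

// Walk up the parent pointers, collecting ancestors root-first.
function ancestorsOf(id, parentOf) {
  const chain = [];
  let current = parentOf[id];
  while (current != null) {
    chain.unshift(current);
    current = parentOf[current];
  }
  return chain;
}

// Build documents shaped like the ones on the slide.
const tweetDocs = Object.keys(parents).map((id) => ({
  _id: id,
  tree: ancestorsOf(id, parents),
  retweet: parents[id],
}));

// In-memory equivalent of db.tweets.find({ tree: id }).
function descendantsOf(id, docs) {
  return docs.filter((d) => d.tree.includes(id)).map((d) => d._id);
}

console.log(ancestorsOf("f", parents), descendantsOf("e", tweetDocs));
```

The cost of this pattern is on writes: moving a subtree means rewriting the `tree` array of every descendant.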
  71. Trees as Paths
    Store hierarchy as a path expression
    •  Separate each node by a delimiter, e.g. ","
    •  Use text search to find parts of a tree
    •  Search must be left-rooted and use an index
    { retweets: [
      { _id: "a", text: "initial tweet", path: "a" },
      { _id: "b", text: "retweet with comment", path: "a,b" },
      { _id: "c", text: "reply to retweet", path: "a,b,c" } ] }
    // Find the conversations "a" started
    > db.tweets.find( { path: /^a/ } )
    // Find the conversations under a branch
    > db.tweets.find( { path: /^a,b/ } )
    [Diagram: tree of nodes A, B, C, D, E, F]
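The "left-rooted" requirement exists because an anchored, case-sensitive prefix corresponds to one contiguous range of sorted keys, which is what an index can scan. A plain-JavaScript sketch of that equivalence, assuming a non-empty prefix; `prefixRange` and `findByPrefix` are hypothetical helpers and the sorted array stands in for the index:

```javascript
// Turn a prefix into a half-open key range [lower, upper).
function prefixRange(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { lower: prefix, upper: upper };
}

// Sorted paths, standing in for the index on `path`.
const sortedPaths = ["a", "a,b", "a,b,c", "a,e", "a,e,f", "x"].sort();

// Equivalent of matching /^prefix/ using only range comparisons.
function findByPrefix(paths, prefix) {
  const range = prefixRange(prefix);
  return paths.filter((p) => p >= range.lower && p < range.upper);
}

console.log(findByPrefix(sortedPaths, "a,b"));
```

With multi-character node ids, a prefix such as "a,b" would also match "a,bc", so ending the prefix with the delimiter is one way to restrict the match to descendants.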
  72. Time Series
    // Time series buckets, hour and minute sub-docs
    { _id: "20111209-1231",
      ts: ISODate("2011-12-09T00:00:00.000Z"),
      daily: 67,
      hourly: { 0: 3, 1: 14, 2: 19 ... 23: 72 },
      minute: { 0: 0, 1: 4, 2: 6 ... 1439: 0 } }
    // Add one to the last minute before midnight
    > db.votes.update(
        { _id: "20111209-1231",
          ts: ISODate("2011-12-09T00:00:00.000Z") },
        { $inc: { "hourly.23": 1, "minute.1439": 1 } })
  73. BSON Storage
    •  Sequence of key/value pairs
    •  NOT a hash map
    •  Optimized to scan quickly
    [Diagram: keys 0, 1, 2, 3, ..., 1439 stored in sequence]
    What is the cost of updating the minute before midnight?
  74. BSON Storage
    •  Can skip sub-documents
    [Diagram: keys 0, 1, ..., 59, ..., 1380, ..., 1439 grouped under sub-documents 0, ..., 23]
    How could this change the schema?
  75. Time Series
    Use more of a tree structure by nesting!
    // Time series buckets, each hour a sub-document
    { _id: "20111209-1231",
      ts: ISODate("2011-12-09T00:00:00.000Z"),
      daily: 67,
      minute: { 0: { 0: 0, 1: 7, ... 59: 2 },
                ...
                23: { 0: 15, ... 59: 6 } } }
    // Add one to the last minute before midnight
    > db.votes.update(
        { _id: "20111209-1231",
          ts: ISODate("2011-12-09T00:00:00.000Z") },
        { $inc: { "minute.23.59": 1 } })
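The trade-off between the flat and nested layouts can be put in numbers. The sketch below uses a deliberately simple cost model taken from the BSON Storage slides (one step per key that precedes the target, and a whole sub-document skipped in one step); the function names are hypothetical and real update cost depends on more than this:

```javascript
// Field path for the nested schema, using UTC hour and minute.
function minutePath(date) {
  return "minute." + date.getUTCHours() + "." + date.getUTCMinutes();
}

// Flat layout: minute keys 0..1439 in one sub-document.
function keysSkippedFlat(hour, minute) {
  return hour * 60 + minute;
}

// Nested layout: skip earlier hour sub-documents, then earlier minutes.
function keysSkippedNested(hour, minute) {
  return hour + minute;
}

const lastMinute = new Date("2011-12-09T23:59:30Z");
console.log(minutePath(lastMinute));
console.log(keysSkippedFlat(23, 59), keysSkippedNested(23, 59));
```

Under this model the worst case drops from 1439 skipped keys to 82, which is the motivation for nesting by hour.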
  76. Duplicate data
    Document to represent a shopping order:
    { _id: 1234,
      ts: ISODate("2011-12-09T00:00:00.000Z"),
      customerId: 67,
      total_price: 1050,
      items: [
        { sku: 123, quantity: 2, price: 50, name: "macbook", thumbnail: "macbook.png" },
        { sku: 234, quantity: 1, price: 20, name: "iphone", thumbnail: "iphone.png" },
        ... ] }
    The item information is duplicated in every order that references it.
    Mongo's flexible schema makes it easy!
  77. Duplicate data
    Pros: only 1 query to get all information needed to display the order
    •  processing on the db is as fast as a BLOB
    •  can achieve much higher performance
    Cons:
    •  more storage used ... cheap enough
    •  updates are much more complicated ... just consider fields immutable
  78. Summary
    •  Basic data design principles stay the same ...
    •  But MongoDB is more flexible and brings possibilities
    •  embed or duplicate data to speed up operations, cut down the number of collections and indexes
    •  watch for documents growing too large
    •  make sure to use the proper indexes for querying and sorting
    •  schema should feel natural to your application!
  79. @mongodb
    conferences, appearances, and meetups: http://www.10gen.com/events
    Facebook: http://bit.ly/mongo_ | Twitter | LinkedIn: http://linkd.in/joinmongo
    download at mongodb.org
  80. •  High Availability (auto-failover)
    •  Read Scaling (extra copies to read from)
    •  Backups
        –  Online, Delayed Copy (fat finger)
        –  Point in Time (PiT) backups
    •  Use (hidden) replica for secondary workload
        –  Analytics
        –  Data-processing
        –  Integration with external systems
  81. Planned
    –  Hardware upgrade
    –  O/S or file-system tuning
    –  Relocation of data to new file-system / storage
    –  Software upgrade
    Unplanned
    –  Hardware failure
    –  Data center failure
    –  Region outage
    –  Human error
    –  Application corruption
  82. •  A cluster of N servers
    •  All writes to primary
    •  Reads can be to primary (default) or a secondary
    •  Any (one) node can be primary
    •  Consensus election of primary
    •  Automatic failover
    •  Automatic recovery
  83. 104 104 •  Replica Set is made up of 2

    or more nodes Member 1 Member 2 Member 3
  84. 105 105 •  Election establishes the PRIMARY •  Data replication

    from PRIMARY to SECONDARY Member 1 Member 2 Primary Member 3
  85. 106 106 •  PRIMARY may fail •  Automatic election of

    new PRIMARY if majority exists Member 1 Member 2 DOWN Member 3 negotiate new master
  86. 107 107 Member 1 Member 2 DOWN Member 3 Primary

    negotiate new master •  New PRIMARY elected •  Replica Set re-established
  87. As long as a partition can see a majority (>50%) of the cluster, it will elect a primary.
    [Diagram: Primary, Secondary, Secondary]
  88. 33% of cluster visible: read-only mode.
    [Diagram: Primary, Secondary, Secondary, Secondary, Failed Node, Failed Node]
  89. 50% of cluster visible: read-only mode.
    [Diagram: Primary, Secondary, Secondary, Secondary, Secondary, Secondary, Failed Node, Failed Node]
  90. 50% of cluster visible: read-only mode.
    [Diagram: Primary, Secondary, Secondary, Secondary, Secondary, Failed Node, Failed Node]
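The rule behind these partition scenarios is a strict majority: exactly half is not enough. A one-function sketch in plain JavaScript, assuming every member holds one vote; `canElectPrimary` is a hypothetical name, not a MongoDB API:

```javascript
// A partition can elect a primary only if it sees more than half the members.
function canElectPrimary(visibleMembers, totalMembers) {
  return visibleMembers > totalMembers / 2;
}

console.log(canElectPrimary(2, 3)); // two of three visible
console.log(canElectPrimary(1, 3)); // 33% visible
console.log(canElectPrimary(3, 6)); // 50% visible
```

This is why replica sets are usually sized with an odd number of voting members: an even split leaves both halves read-only.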
  91. [Diagram: Primary and Secondary in San Francisco (Priority 1 each); Secondary in Dallas (Priority 0)]
    Disaster recovery data center. Will never become primary automatically.
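A replica set configuration expressing this layout might look like the object below, which in the mongo shell would be passed to `rs.initiate()`. The set name and host names are hypothetical; `_id`, `members`, `host` and `priority` are the standard configuration fields. The helper is only there to show which members remain eligible:

```javascript
// Hypothetical configuration: two electable members in San Francisco,
// one priority-0 member in the Dallas disaster recovery data center.
const replicaSetConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "sf1.example.net:27017", priority: 1 },
    { _id: 1, host: "sf2.example.net:27017", priority: 1 },
    { _id: 2, host: "dallas1.example.net:27017", priority: 0 },
  ],
};

// Members with priority 0 never become primary automatically.
function electableHosts(config) {
  return config.members.filter((m) => m.priority > 0).map((m) => m.host);
}

console.log(electableHosts(replicaSetConfig));
```

The Dallas member still replicates data and votes in elections; it simply cannot be chosen as primary.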
  92. [Diagram in three steps: a replica set of Primary, Arbiter and Secondary; in step 3 a new Secondary performs a Full Sync]
    Uh oh. Full Sync is going to use a lot of resources on the primary, so I may have downtime or degraded performance.
  93. [Diagram in three steps: a replica set of Primary and Secondaries; in step 3 a new Secondary performs a Full Sync]
    Sync can happen from a secondary, which will not impact traffic on the primary.
  94. •  Avoid single points of failure
        –  Separate racks
        –  Separate data centers
    •  Avoid long recovery downtime
        –  Use journaling
        –  Use 3+ replicas
    •  Keep your actives close
        –  Use priority to control where failovers happen
  95. •  You are using, or want to use, MongoDB
        –  What benefits?
        –  Potential use cases
        –  Steering the adoption of MongoDB
    •  Why is MongoDB safe
        –  Execution
        –  Operational
        –  Financial
    •  Why 10gen?
        –  People
        –  Company
        –  Future
  96. Horizontally Scalable
    { author: "roger", date: new Date(), text: "Spirited Away", tags: ["Tezuka", "Manga"] }
    Document Oriented
    High Performance
    -  indexes
    -  RAM
    Application
  97. •  User Data Management
    •  High Volume Data Feeds
    •  Content Management
    •  Operational Intelligence
    •  Product Data Mgt
  98. "NoSQL databases are proving valuable for scaling out cloud and on-premises uses of numerous content types, and document-oriented open-source solutions are emerging as one of the leading choices."
  99. •  Reassuring the Ops Team
    •  Reassuring the Business Team
    •  Start with low stakes – learn to trust
    •  Grow towards a mission critical use case
    •  LET US HELP YOU! → [email protected]
  100. { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "roger",
      date : "Sat Jul 24 2010 19:47:11",
      text : "Spirited Away",
      tags : [ "Tezuka", "Manga" ],
      comments : [
        { author : "Fred",
          date : "Sat Jul 24 2010 20:51:03",
          text : "Best Movie Ever" },
        { author : "Bill",
          date : "Sat Jul 24 2010 21:13:23",
          text : "No Way !!" }
      ] }
  101. •  Elastic capacity
    •  Data center outages
    •  Upgrading DB versions
    •  Upgrade App versions
    •  Change/Evolve schema/representation
  102. •  Data Durability
        –  Journal
        –  Replicated Writes
    •  Data Consistency
        –  Single Master
        –  Shard to Scale
    •  YOU are in control!
  103. •  Millions of IO ops/sec
    •  Petabytes of data
    •  Commodity hardware – Virtual hardware
  104. •  Less code
    •  More productive coding
    •  Easier to maintain
    •  Contingency plans for turnover
    •  Commodity hardware
    •  No upfront license, pay for value over time
    •  Cost visibility for growth of usage
  105. Wordnik uses MongoDB as the foundation for its "live" dictionary that stores its entire text corpus: 3.5T of data in 20 billion records.
    Problem
    •  Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
    •  Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
    •  Initially launched entirely on MySQL but quickly hit performance road blocks
    Why MongoDB
    •  Migrated 5 billion records in a single day with zero downtime
    •  MongoDB powers every website request: 20m API calls per day
    •  Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error
    Impact
    •  Reduced code by 75% compared to MySQL
    •  Fetch time cut from 400ms to 60ms
    •  Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
    •  Significant cost savings and 15% reduction in servers
    "Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application."
    Tony Tam, Vice President of Engineering and Technical Co-founder
  106. Dwight Merriman – CEO
        Founder, CTO DoubleClick
    Max Shireson – President
        COO MarkLogic, 9 years at Oracle
    Eliot Horowitz – CTO
        Co-founder of Shopwiki, DoubleClick
    Erik Frieberg – VP Marketing
        HP Software, Borland, BEA
    Ben Sabrin – VP of Sales
        VP of Sales at JBoss, over 9 years of Open Source experience
  107. •  Community and Commercial
    •  Dedicated support staff across the globe
        –  NY
        –  CA
        –  Dublin
        –  London
        –  Australia
  108. •  Union Square Ventures
    •  Sequoia Capital
    •  Flybridge Capital
    •  NEA
    •  $80M raised overall
    •  Most recent round: $42M in May…
  109. •  Authentication
    •  Data encryption
        –  At rest
        –  In flight
    •  Full Text Search
    •  Global Database lock ?
    •  Monitoring
  110. Version 2.2 (now)
    •  Database level locking
    •  Aggregation Framework
    •  TTL collections
    •  Geo-aware sharding
    •  Read Preferences
    Version 2.4 (Q4 2012)
    •  Kerberos/LDAP authentication
    •  Collection level locking
    •  Full Text Search
    •  Improved Aggregation Framework