This four-hour online conference will introduce you to some MongoDB basics and get you up to speed on why and how you should choose MongoDB for your next project.
• Quick introduction to MongoDB
• Data modeling in MongoDB: queries, geospatial, updates, and map/reduce
• Using a location-based app as an example
• Examples work in the MongoDB JS shell
MongoDB is a scalable, high-performance, open source, document-oriented database.
• Fast querying
• In-place updates
• Full index support
• Replication / high availability
• Auto-sharding
• Aggregation; map/reduce
• GridFS
MongoDB is implemented in C++
• Runs on Windows, Linux, Mac OS X, Solaris
Drivers are available in many languages
• 10gen supported: C, C# (.NET), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala, Node.js
• Multiple community-supported drivers
• JSON has a powerful but limited set of datatypes
  – MongoDB extends these with Date, integer types, ObjectId, …
• MongoDB stores data in BSON
• BSON is a binary representation of JSON
  – Optimized for performance and navigational abilities
  – Also compression
See: bsonspec.org
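A minimal sketch in the JS shell of a few BSON types that plain JSON lacks; the "events" collection and its field names are hypothetical, not part of the talk:

// Hypothetical collection; field names are illustrative
> db.events.insert({
    name: "launch",                        // string
    at: new Date(),                        // BSON Date
    attendees: NumberInt(42),              // 32-bit integer (shell numbers default to doubles)
    total_views: NumberLong("5000000000"), // 64-bit integer
    ref: ObjectId()                        // 12-byte ObjectId
  })
> db.events.findOne()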
• Intrinsic support for fast, iterative development
• Super low latency access to your data
• Very little CPU overhead
• No additional caching layer required
• Built-in replication and horizontal scaling support
"As a user I want to be able to find other locations nearby"
• Need to store locations (offices, restaurants, etc.)
  – name, address, tags
  – coordinates
  – user-generated content, e.g. tips / notes
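One way such a location could be stored and queried, sketched in the JS shell; the "locations" collection, its field names, and the coordinates are assumptions for illustration. It uses a 2d geospatial index and $near:

> db.locations.insert({
    name: "Café Allegro",
    address: "4214 University Way NE",
    tags: [ "coffee", "wifi" ],
    latlong: [ 47.66, -122.31 ],                      // legacy [x, y] coordinate pair
    tips: [ { user: "kyle", note: "try the mocha" } ] // user-generated content
  })
// geospatial index on the coordinate pair
> db.locations.ensureIndex( { latlong: "2d" } )
// find the 10 closest locations to a point
> db.locations.find( { latlong: { $near: [ 47.65, -122.30 ] } } ).limit(10)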
"As a user I want to be able to 'checkin' to a location"
Checkins
– User should be able to 'check in' to a location
– Want to be able to generate statistics:
  • Recent checkins
  • Popular locations
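A possible shape for this, sketched in the JS shell; the "checkins" collection and its fields are assumptions, not the talk's exact schema:

// one document per checkin
> db.checkins.insert({
    user: "alice",
    location_id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    ts: new Date()
  })
// recent checkins: newest first
> db.checkins.ensureIndex( { ts: -1 } )
> db.checkins.find().sort( { ts: -1 } ).limit(10)
// popular locations: count checkins per location
> db.checkins.group({
    key: { location_id: 1 },
    initial: { count: 0 },
    reduce: function(doc, out) { out.count++; }
  })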
• Single server – need a strong backup plan
• Replica sets – high availability, automatic failover
• Sharded – horizontal scaling, auto-balancing
@mongodb – conferences, appearances, and meetups: http://www.10gen.com/events
Facebook | Twitter | LinkedIn: http://bit.ly/mongofb | http://linkd.in/joinmongo
Download at mongodb.org
Support, training, and this talk brought to you by 10gen
Normalization
Goals:
• Avoid anomalies when inserting, updating, or deleting
• Minimize redesign when extending the schema
• Avoid bias toward a particular query
• Make use of all SQL features
In MongoDB:
• Similar goals apply, but the rules are different
• Denormalization for optimization is an option: most features still exist, contrary to BLOBs
Collections Basics
• Equivalent to a table in SQL
• Cheap to create (max 24000)
• Collections don't have a fixed schema
  – Common for documents in a collection to share a schema
  – Document schema can evolve
• Consider using multiple related collections tied together by a naming convention, e.g. LogData-2011-02-08
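A small sketch of the naming-convention idea in the JS shell; the collection prefix and fields are illustrative:

// collections are created lazily on first insert
> var day = "2011-02-08"
> db[ "LogData-" + day ].insert( { level: "info", msg: "app started", ts: new Date() } )
// list the related collections
> db.getCollectionNames()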
Document Basics
• Elements are name/value pairs, equivalent to a column value in SQL
• Elements can be nested
• Rich data types for values
• JSON for the human eye, BSON for all internals
• 16MB maximum size (enough for many books)
• What you see is what is stored
> db.blogs.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "Hergé",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  text: "Destination Moon",
  tags: [ "comic", "adventure" ]
}
Find the document
Notes:
• _id must be unique, but can be anything you'd like
• MongoDB will generate a default _id if one is not supplied
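For example, a sketch against the blogs collection above:

// no _id supplied: the shell/driver generates an ObjectId
> db.blogs.insert( { author: "Hergé", text: "Explorers on the Moon" } )
// _id can be any unique value you choose
> db.blogs.insert( { _id: "destination-moon", author: "Hergé" } )
// re-inserting the same _id fails with a duplicate key error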
Extending the Schema
// create index on nested documents:
> db.blogs.ensureIndex( { "comments.author": 1 } )
> db.blogs.find( { "comments.author": "Kyle" } )
// find last 5 posts:
> db.blogs.find().sort( { date: -1 } ).limit(5)
// most commented post:
> db.blogs.find().sort( { comments_count: -1 } ).limit(1)
When sorting, check if you need an index
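For the sorts above, possible indexes might look like the sketch below; whether you actually need them depends on your data size and query mix:

> db.blogs.ensureIndex( { date: -1 } )           // supports the "last 5 posts" sort
> db.blogs.ensureIndex( { comments_count: -1 } ) // supports the "most commented" sort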
Alternative
// Each product lists the IDs of its categories
products:
{ _id: 10, name: "Destination Moon",
  category_ids: [ 20, 30 ] }

// Association not stored on the categories
categories:
{ _id: 20,
  name: "adventure" }

// All products for a given category
> db.products.ensureIndex( { category_ids: 1 } )  // yes!
> db.products.find( { category_ids: 20 } )
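The reverse direction works with one extra query; a small sketch:

// all categories of a given product
> var p = db.products.findOne( { _id: 10 } )
> db.categories.find( { _id: { $in: p.category_ids } } )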
Array of Ancestors
// Store all ancestors of a node
{ _id: "a" }
{ _id: "b", tree: [ "a" ], retweet: "a" }
{ _id: "c", tree: [ "a", "b" ], retweet: "b" }
{ _id: "d", tree: [ "a", "b" ], retweet: "b" }
{ _id: "e", tree: [ "a" ], retweet: "a" }
{ _id: "f", tree: [ "a", "e" ], retweet: "e" }
// find all direct retweets of "b"
> db.tweets.find( { retweet: "b" } )
// find all retweets of "e" anywhere in tree
> db.tweets.find( { tree: "e" } )
Trees as Paths
Store hierarchy as a path expression
• Separate each node by a delimiter, e.g. ","
• Use a regular-expression search to find parts of a tree
• The search must be left-rooted (and not case-insensitive) to use an index

{ _id: "a", text: "initial tweet",     path: "a" }
{ _id: "b", text: "retweet with comment", path: "a,b" }
{ _id: "c", text: "reply to retweet",  path: "a,b,c" }

// Find the conversations "a" started
> db.tweets.find( { path: /^a/ } )
// Find the conversations under a branch
> db.tweets.find( { path: /^a,b/ } )
BSON Storage
• Sequence of key/value pairs
• NOT a hash map
• Optimized to scan quickly
[Diagram: a single document holding one counter per minute of the day, keys 0, 1, 2, 3, … 1439]
What is the cost of updating the minute before midnight?
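A sketch of the flat layout the question refers to (field names are assumed: one counter per minute of the day), and why the last minute is the expensive one:

// one key per minute, 0 .. 1439
{ _id: "20111209-1231",
  daily: 67,
  0: 0, 1: 7, 2: 3,
  // ...
  1439: 4 }
// increment the minute before midnight
> db.votes.update( { _id: "20111209-1231" }, { $inc: { "1439": 1 } } )
// BSON is a sequence of key/value pairs, not a hash map: to reach "1439"
// the server scans past the ~1439 keys before it on every such update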
Time Series
Use more of a tree structure by nesting:
// Time series buckets, each hour a sub-document
{ _id: "20111209-1231",
  ts: ISODate("2011-12-09T00:00:00.000Z"),
  daily: 67,
  minute: {
    0:  { 0: 0, 1: 7, ... 59: 2 },
    ...
    23: { 0: 15, ... 59: 6 }
  }
}
// Add one to the last minute before midnight
> db.votes.update(
    { _id: "20111209-1231" },
    { $inc: { "minute.23.59": 1 } } )
Duplicate data
Document to represent a shopping order:
{ _id: 1234,
  ts: ISODate("2011-12-09T00:00:00.000Z"),
  customerId: 67,
  total_price: 1050,
  items: [
    { sku: 123, quantity: 2, price: 50,
      name: "macbook", thumbnail: "macbook.png" },
    { sku: 234, quantity: 1, price: 20,
      name: "iphone", thumbnail: "iphone.png" },
    ...
  ]
}
The item information is duplicated in every order that references it.
MongoDB's flexible schema makes it easy!
Duplicate data
Pros:
• Only 1 query to get all information needed to display the order
• Processing on the db is as fast as a BLOB
• Can achieve much higher performance
Cons:
• More storage used … cheap enough
• Updates are much more complicated … just consider the fields immutable
Summary
• Basic data design principles stay the same …
• But MongoDB is more flexible and brings possibilities:
  – Embed or duplicate data to speed up operations
  – Cut down the number of collections and indexes
  – Watch for documents growing too large
  – Make sure to use the proper indexes for querying and sorting
  – The schema should feel natural to your application!
• High availability (auto-failover)
• Read scaling (extra copies to read from)
• Backups
  – Online, delayed copy (protects against fat-finger mistakes)
  – Point-in-time (PiT) backups
• Use a (hidden) replica for secondary workloads
  – Analytics
  – Data processing
  – Integration with external systems
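A sketch of a replica set configuration with a hidden member for that secondary workload; the hostnames and set name are illustrative:

> rs.initiate({
    _id: "myset",
    members: [
      { _id: 0, host: "db1.example.net:27017" },
      { _id: 1, host: "db2.example.net:27017" },
      // hidden members replicate but never become primary and are
      // invisible to normal clients: point analytics / backups here
      { _id: 2, host: "db3.example.net:27017", priority: 0, hidden: true }
    ]
  })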
Planned
– Hardware upgrade
– O/S or file-system tuning
– Relocation of data to new file-system / storage
– Software upgrade
Unplanned
– Hardware failure
– Data center failure
– Region outage
– Human error
– Application corruption
• A cluster of N servers
• All writes to the primary
• Reads can go to the primary (default) or a secondary
• Any (one) node can be primary
• Consensus election of primary
• Automatic failover
• Automatic recovery
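Directing reads at a secondary, sketched in the JS shell; the checkins collection is borrowed from the earlier example:

// older shells: allow this connection to read from secondaries
> rs.slaveOk()
> db.checkins.find().sort( { ts: -1 } ).limit(10)
// newer shells/drivers: per-query read preference
> db.checkins.find().readPref("secondaryPreferred")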
[Diagram: Primary (priority 1) and Secondary (priority 1) in San Francisco; Secondary (priority 0) in Dallas]
The Dallas member is a disaster recovery data center and will never become primary automatically.
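The priorities above could be set with a reconfiguration like this sketch; member order is assumed to match the diagram:

> var cfg = rs.conf()
> cfg.members[0].priority = 1   // San Francisco, primary-eligible
> cfg.members[1].priority = 1   // San Francisco, primary-eligible
> cfg.members[2].priority = 0   // Dallas DR member: never elected primary
> rs.reconfig(cfg)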
[Diagram, 3 steps: a replica set with a Primary, an Arbiter, and a Secondary; a new Secondary is added and performs a full sync from the Primary]
Uh oh. Full sync is going to use a lot of resources on the primary, so I may have downtime or degraded performance.
[Diagram, 3 steps: a replica set with a Primary and several Secondaries; the new Secondary performs its full sync from an existing Secondary]
Sync can happen from a secondary, which will not impact traffic on the primary.
• Avoid single points of failure
  – Separate racks
  – Separate data centers
• Avoid long recovery downtime
  – Use journaling
  – Use 3+ replicas
• Keep your actives close
  – Use priority to control where failovers happen
• You are using, or want to use, MongoDB
  – What benefits?
  – Potential use cases
  – Steering the adoption of MongoDB
• Why is MongoDB safe?
  – Execution
  – Operational
  – Financial
• Why 10gen?
  – People
  – Company
  – Future
• "NoSQL databases are proving valuable for scaling out cloud and on-premises uses of numerous content types, and document-oriented open-source solutions are emerging as one of the leading choices."
• Reassuring the Ops team
• Reassuring the Business team
• Start with low stakes – learn to trust
• Grow towards a mission-critical use case
• LET US HELP YOU! → [email protected]
• Less code
• More productive coding
• Easier to maintain
• Contingency plans for turnover
• Commodity hardware
• No upfront license, pay for value over time
• Cost visibility for growth of usage
Wordnik uses MongoDB as the foundation for its "live" dictionary that stores its entire text corpus – 3.5T of data in 20 billion records.

Problem
§ Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
§ Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
§ Initially launched entirely on MySQL but quickly hit performance road blocks

Why MongoDB
§ Migrated 5 billion records in a single day with zero downtime
§ MongoDB powers every website request: 20m API calls per day
§ Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error

Impact
§ Reduced code by 75% compared to MySQL
§ Fetch time cut from 400ms to 60ms
§ Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
§ Significant cost savings and 15% reduction in servers

"Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application."
– Tony Tam, Vice President of Engineering and Technical Co-founder
Dwight Merriman – CEO: Founder and CTO of DoubleClick
Max Schireson – President: COO of MarkLogic, 9 years at Oracle
Eliot Horowitz – CTO: Co-founder of ShopWiki, DoubleClick
Erik Frieberg – VP Marketing: HP Software, Borland, BEA
Ben Sabrin – VP of Sales: VP of Sales at JBoss, over 9 years of open source experience