Slide 1

Slide 1 text

MongoDB: Building scalable applications
Ross Lawley, 10gen

Slide 2

Slide 2 text

AGENDA
Origins of mongoDB
> Who is 10gen
> Design goals of mongoDB
Data models in mongoDB
> Flexible Schemas
> Rich queries and atomic operations
> Developer happiness and agility
Scaling mongoDB
> Replication
> Disaster Recovery
> Sharding: scaling horizontally

Slide 3

Slide 3 text

Origins

Slide 4

Slide 4 text

Founded in 2007
> Dwight Merriman, Eliot Horowitz
> DoubleClick, Oracle, MarkLogic, HP
$74M+ in funding
> Flybridge, Sequoia, Union Square, New Enterprise Associates
Worldwide Expanding Team
> 120+ employees
> NY, CA, IE, UK, AUS
Foster community ecosystem
Provide MongoDB management services
Provide commercial services
Set the direction & contribute code to MongoDB

Slide 5

Slide 5 text

Scaling RDBMS is frustrating
Cost of database increases
> Vertical, not horizontal, scaling
> High cost of SAN
[Chart timeline: launch, +30 days, +60 days, +6 months, +1 year]

Slide 6

Slide 6 text

Productivity decreases
> Needed to add new software layers of ORM, Caching, Sharding, Message Queue
> Polymorphic, semi-structured and unstructured data not well supported
[Chart timeline: project start, denormalize data model, stop using joins, custom caching layer, custom sharding]

Slide 7

Slide 7 text

Evolution in computing
Volume of Data
> Trillions of records
> 100s of millions of queries per second
Agile Development
> Iterative
> Continuous deployment
New Hardware Architecture
> Commodity servers
> Cloud Computing

Slide 8

Slide 8 text

MongoDB design goals
JSON Documents
> Rich data models
> Seamlessly map to native programming language
> Flexible for dynamic data
> Better data locality
Simplicity
> Few configuration options
> Does the right thing out of the box
> Easy to deploy and manage
General Purpose DBMS
> Sophisticated secondary indexes
> Dynamic queries
> Sorting
> Rich updates, upserts
> Easy aggregation
Scaling
> Scale linearly
> Increase capacity with no downtime
> Transparent to the application

Slide 9

Slide 9 text

Developing

Slide 10

Slide 10 text

Developers already model to objects
    from django.db import models

    class Post(models.Model):
        author = models.CharField(max_length=250)
        title = models.CharField(max_length=250)
        body = models.TextField()
        date = models.DateTimeField('date')
        tags = models.ManyToManyField('Tag')
        comments = models.ManyToManyField('Comment')

    class Tag(models.Model):
        text = models.CharField(max_length=250)

    class Comment(models.Model):
        author = models.CharField(max_length=250)
        body = models.TextField()
        date = models.DateTimeField('date')

Slide 11

Slide 11 text

In a relational database
post: id, author, title, body, date
post_tags: id, post_id, tag_id
tag: id, text
post_comments: id, post_id, comment_id
comment: id, author, body, date
Relationships: post to tag 0..* (via post_tags), post to comment 0..* (via post_comments)

Slide 12

Slide 12 text

In mongoDB
    {
      _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Ross",
      title : "Building scalable applications",
      body : "About MongoDB...",
      date : ISODate("2012-06-27T14:30:00.000Z"),
      tags : [ "tech", "databases" ],
      comments : [{
        author : "Fred",
        date : ISODate("2012-06-27T14:35:00.000Z"),
        body : "Thanks, I'll look into it"
      }]
    }

Slide 13

Slide 13 text

Data model now fits your brain

Slide 14

Slide 14 text

Overview

Slide 15

Slide 15 text

Where can you use it?
MongoDB is implemented in C++
> Platforms: 32/64 bit Windows, Linux, Mac OS X, FreeBSD, Solaris
Drivers are available in many languages
10gen supported
> C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala, node.js
Community supported
> Clojure, ColdFusion, F#, Go, Groovy, Lua, R ...
http://www.mongodb.org/display/DOCS/Drivers

Slide 16

Slide 16 text

Terminology
RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key

Slide 17

Slide 17 text

Flexible schemas
> p = { author: "Ross",
        date: new Date(),
        text: "About MongoDB...",
        tags: ["tech", "databases"]}
> db.posts.save(p)

Slide 18

Slide 18 text

Flexible schemas
> db.posts.find()
    {
      _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Ross",
      date : ISODate("2012-06-27T14:30:00.000Z"),
      text : "About MongoDB...",
      tags : [ "tech", "databases" ]
    }
Notes: _id is unique, but can be anything you'd like

Slide 19

Slide 19 text

Introducing BSON
JSON has a powerful but limited set of datatypes
> arrays, objects, strings, numbers, booleans and null
BSON is a binary representation of JSON
> Adds extra data types such as Date, Int types, Id, …
> Optimised for performance and navigational abilities
> And compression
MongoDB sends and stores data in BSON
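
To make "binary representation" concrete, here is a minimal sketch of how a flat document could be laid out following the public BSON format: a 4-byte little-endian length, a run of typed elements, and a trailing null byte. The function name `encode_bson` is hypothetical and it handles only strings and 32-bit integers; real drivers ship complete encoders.

```python
import struct

def encode_bson(doc):
    """Encode a flat dict of str -> (str | int) as BSON bytes (sketch only)."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"      # field name as a C string
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # type 0x02: UTF-8 string, prefixed with its byte length
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # type 0x10: 32-bit signed integer, little endian
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError("sketch supports only str and int32 values")
    # total length (4 bytes) + elements + trailing null byte
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

encoded = encode_bson({"hello": "world"})
print(len(encoded))  # 22
```

The length prefixes are what give BSON its "navigational abilities": a reader can skip over a value without parsing it.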

Slide 20

Slide 20 text

Finding data
Conditional Operators
- $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
- $lt, $lte, $gt, $gte
// find posts whose author matches a regular expression
> db.posts.find({author: /^ro*/i })
// find posts with any tags
> db.posts.find({tags: {$exists: true }})
// count posts where "Ross" has commented (dotted field names must be quoted)
> db.posts.find({"comments.author": 'Ross'}).count()
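
The dotted path in the last query reaches into every element of the embedded comments array. A rough in-memory sketch of that matching rule, with hypothetical helper names `get_values` and `matches` and made-up sample posts, might look like this. It covers equality only and is not how the server evaluates queries.

```python
def get_values(doc, path):
    """Return every value reachable at a dotted path, descending into lists."""
    current = [doc]
    for part in path.split("."):
        found = []
        for item in current:
            # an embedded array is searched element by element
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and part in candidate:
                    found.append(candidate[part])
        current = found
    return current

def matches(doc, path, expected):
    """True if any value at the path equals `expected`, or is an array containing it."""
    for value in get_values(doc, path):
        if value == expected or (isinstance(value, list) and expected in value):
            return True
    return False

posts = [
    {"author": "Ross", "tags": ["tech"], "comments": [{"author": "Fred"}]},
    {"author": "Fred", "comments": [{"author": "Ross"}, {"author": "Tim"}]},
    {"author": "Tim"},
]
count = sum(1 for p in posts if matches(p, "comments.author", "Ross"))
print(count)  # 1
```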

Slide 21

Slide 21 text

Indexes
> Create index on any field at any level in a document
> Supports compound indexes
> Can index arrays
// Ensure index (1 ascending, -1 descending)
> db.posts.ensureIndex({author: 1})
> db.posts.findOne({author: 'Ross'})
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Ross", ... }

Slide 22

Slide 22 text

Examine the query plan
> db.posts.find({"author": 'Ross'}).explain()
    {
      "cursor" : "BtreeCursor author_1",
      "nscanned" : 1,
      "nscannedObjects" : 1,
      "n" : 1,
      "millis" : 0,
      "indexBounds" : {
        "author" : [ [ "Ross", "Ross" ] ]
      }
    }
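
Why does the plan report so few scanned entries? The following sketch compares a full scan with a lookup through a sorted list that stands in for the B-tree index on author. The sample data and the function names `full_scan` and `index_scan` are made up for illustration; the scanned counts are only loosely analogous to nscanned.

```python
from bisect import bisect_left, bisect_right

posts = [
    {"_id": 1, "author": "Tim"},
    {"_id": 2, "author": "Ross"},
    {"_id": 3, "author": "Fred"},
    {"_id": 4, "author": "Anna"},
]

def full_scan(docs, author):
    """Without an index every document has to be examined."""
    scanned = 0
    hits = []
    for doc in docs:
        scanned += 1
        if doc["author"] == author:
            hits.append(doc["_id"])
    return hits, scanned

# A sorted list of (key, _id) pairs stands in for the B-tree on `author`.
index = sorted((doc["author"], doc["_id"]) for doc in posts)
keys = [key for key, _ in index]

def index_scan(author):
    """Binary search for the bounds [author, author], then read only that range."""
    lo = bisect_left(keys, author)
    hi = bisect_right(keys, author)
    hits = [doc_id for _, doc_id in index[lo:hi]]
    return hits, hi - lo

print(full_scan(posts, "Ross"))  # ([2], 4)
print(index_scan("Ross"))        # ([2], 1)
```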

Slide 23

Slide 23 text

Atomic Operations
$set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
// Create a comment
> new_comment = { author: "Tim",
                  date: new Date(),
                  text: "Best Post Ever!"}
// Add to post
> db.posts.update({ _id: "..." },
                  {"$push": {comments: new_comment},
                   "$inc": {comments_count: 1} });
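
The effect of combining $push and $inc in one update can be sketched on a plain dictionary. `apply_update` is a hypothetical helper covering only these two operators; it mimics the all-or-nothing result on a single document, not the server's implementation.

```python
import copy

def apply_update(doc, update):
    """Return a new document with $push and $inc applied together (sketch only)."""
    result = copy.deepcopy(doc)   # work on a copy so the change is all-or-nothing
    for field, value in update.get("$push", {}).items():
        result.setdefault(field, []).append(value)     # create the array if missing
    for field, amount in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + amount  # a missing counter starts at 0
    return result

post = {"author": "Ross", "text": "About MongoDB..."}
new_comment = {"author": "Tim", "text": "Best Post Ever!"}
updated = apply_update(post, {
    "$push": {"comments": new_comment},
    "$inc": {"comments_count": 1},
})
print(updated["comments_count"], len(updated["comments"]))  # 1 1
```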

Slide 24

Slide 24 text

Rich documents
    {
      _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Ross",
      date : ISODate("2012-06-27T14:30:00.000Z"),
      text : "About MongoDB...",
      tags : [ "tech", "databases" ],
      comments : [{
        author : "Tim",
        date : ISODate("2012-06-27T14:35:00.000Z"),
        text : "Best Post Ever!"
      }],
      comments_count : 1
    }

Slide 25

Slide 25 text

Geo-spatial support
Geo-spatial queries
> Require a geo index
> Find points near a given point
> Find points within a polygon/sphere
// geospatial index
> db.posts.ensureIndex({"author.location": "2d"})
> db.posts.find({ "author.location" : { $near : [22, 42] }})
    [{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
       author: {location: [22, 43]}, ...}]
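
Conceptually, a $near query on a 2d index returns points ordered by distance from the target, nearest first. A brute-force sketch of that ordering on a flat plane follows. The function `near`, its `limit` parameter and the sample locations are assumptions for illustration, and no index is involved.

```python
import math

def near(points, target, limit=100):
    """Return (name, location) pairs ordered by flat-plane distance to target."""
    def distance(item):
        x, y = item[1]
        return math.hypot(x - target[0], y - target[1])
    return sorted(points.items(), key=distance)[:limit]

authors = {
    "a": [22, 43],
    "b": [30, 50],
    "c": [20, 42],
}
print(near(authors, [22, 42], limit=2))
# [('a', [22, 43]), ('c', [20, 42])]
```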

Slide 26

Slide 26 text

GridFS
Save files in mongoDB
Stream data back to the client
# (Python) Create a new instance of GridFS; db is an existing pymongo database
>>> import gridfs
>>> fs = gridfs.GridFS(db)
# Save file to mongo (open the image in binary mode)
>>> my_image = open('my_image.jpg', 'rb')
>>> file_id = fs.put(my_image)
# Read file
>>> fs.get(file_id).read()

Slide 27

Slide 27 text

Aggregation - coming in 2.2
Describe a chain of operations to apply to your data.
// Count tags
> agg = db.posts.aggregate(
    {$unwind: "$tags"},
    {$group : {_id : "$tags", count : {$sum: 1}}}
  )
> agg.result
    [{"_id": "databases", "count": 1},
     {"_id": "tech", "count": 1}]
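
The two pipeline stages can be mimicked in plain Python to show what each one contributes: the unwind step emits one document per tag, and the group step counts documents per tag. The helper names `unwind` and `group_count` and the sample posts are hypothetical; this is an in-memory sketch, not the server's pipeline.

```python
from collections import Counter

posts = [
    {"title": "Building scalable applications", "tags": ["tech", "databases"]},
    {"title": "Another post", "tags": ["tech"]},
    {"title": "Untagged post"},
]

def unwind(docs, field):
    """Yield one copy of each document per element of its array field."""
    for doc in docs:
        for element in doc.get(field, []):   # documents without the field are dropped
            yield {**doc, field: element}

def group_count(docs, field):
    """Group by a field and count documents per group, like {$sum: 1}."""
    counts = Counter(doc[field] for doc in docs)
    return [{"_id": key, "count": n} for key, n in sorted(counts.items())]

print(group_count(unwind(posts, "tags"), "tags"))
# [{'_id': 'databases', 'count': 1}, {'_id': 'tech', 'count': 2}]
```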

Slide 28

Slide 28 text

Data modelling in mongoDB
> Schemaless
  – Storage relates to how you actually use the data
  – Inherently agile
> Rich query language
  – For ad hoc queries
  – Atomic updates at a document level
> Flexible data models
  – Embed data or link documents depending on usage
  – Warning! Not all models perform at large scale

Slide 29

Slide 29 text

High availability

Slide 30

Slide 30 text

Replication features
> Single master system - Primary always consistent
> Automatic failover if a Primary fails
> Automatic recovery when a node joins the set
> Can be used to scale reads
> Full control over writes using write concerns
> Easy to administer and manage

Slide 31

Slide 31 text

How mongoDB replication works
Replica set is made up of 2 or more nodes
[Diagram: nodes A, B and C]

Slide 32

Slide 32 text

How mongoDB replication works
Election establishes the PRIMARY
Data replication from PRIMARY to SECONDARY
[Diagram: one PRIMARY (P) and two SECONDARY (S) nodes]

Slide 33

Slide 33 text

How mongoDB replication works
PRIMARY may fail
Automatic election of new PRIMARY if majority exists
[Diagram: the PRIMARY is DOWN; the two SECONDARY (S) nodes negotiate a new master]
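
The "if majority exists" condition is simple arithmetic: more than half of the set must still be able to see each other. A minimal sketch, assuming every member has one vote (the function name `can_elect_primary` is hypothetical):

```python
def can_elect_primary(total_members, reachable_members):
    """A new primary can be elected only if a strict majority of the set can vote."""
    return reachable_members > total_members // 2

print(can_elect_primary(3, 2))  # True: 2 of 3 survive, as in the diagram
print(can_elect_primary(2, 1))  # False: 1 of 2 is not a majority
print(can_elect_primary(5, 2))  # False: a minority partition stays read-only
```

This is why a two-node set cannot fail over on its own once one node is lost.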

Slide 34

Slide 34 text

How mongoDB replication works
New PRIMARY elected
Replica set re-established
[Diagram: one SECONDARY (S), the new PRIMARY (P), and the failed node still DOWN]

Slide 35

Slide 35 text

How mongoDB replication works
Automatic recovery
[Diagram: one SECONDARY (S), the PRIMARY (P), and the returning node in RECOVERING state]

Slide 36

Slide 36 text

How mongoDB replication works
Replica set re-established
[Diagram: one PRIMARY (P) and two SECONDARY (S) nodes]

Slide 37

Slide 37 text

Advanced replication features
> Durability via write concerns
  – On a connection, database, collection and query level
  – Tag nodes and direct writes to specific nodes / data centers
> Scaling reads
  – Not applicable for all applications
  – Secondaries can be used for backups, analytics, data processing
> Prioritisation
  – Prefer specific nodes to be primary
  – Ensure certain nodes are never primary

Slide 38

Slide 38 text

Example Setup London Reading Cloud p:10 p:10 p:5 p:0 p:1 Backups / Analytics Server Primary Data Centre

Slide 39

Slide 39 text

Scaling

Slide 40

Slide 40 text

Horizontal scale out
[Diagram: reads and writes are spread across shard1, shard2 and shard3, each made up of three MongoD processes]

Slide 41

Slide 41 text

MongoDB sharding
> Range based
> Automatic partitioning and management
> Convert to sharded system with no downtime
> Fully consistent

Slide 42

Slide 42 text

How mongoDB sharding works
Range keys from -∞ to +∞
Ranges are stored as "chunks"
[Diagram: a single chunk spanning -∞ to +∞]
> db.runCommand({addshard: "shard1"});
> db.runCommand({shardCollection: "mydb.users", key: {age: 1}})

Slide 43

Slide 43 text

How mongoDB sharding works
> db.users.save({age: 40})
[Diagram: the chunk -∞ to +∞ splits into -∞ to 40 and 41 to +∞]

Slide 44

Slide 44 text

How mongoDB sharding works
> db.users.save({age: 40})
> db.users.save({age: 50})
> db.users.save({age: 60})
[Diagram: repeated splits produce the chunks -∞ to 40, 41 to 50, 51 to 60 and 61 to +∞]
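
Range-based chunking can be sketched as a sorted list of split points plus a binary search to find the chunk that owns a key. The class `ChunkMap` and its methods are hypothetical, and two assumptions differ from the diagram: chunks are written as half-open ranges (41 up to but not including 51), and splits are requested explicitly, whereas a real cluster decides when to split on its own.

```python
from bisect import bisect_right

INF = float("inf")

class ChunkMap:
    """Keeps sorted split points; chunk i covers [bounds[i], bounds[i + 1])."""

    def __init__(self):
        self.bounds = [-INF, INF]   # one chunk covering the whole key range

    def split(self, key):
        """Split the chunk containing `key` so that a new chunk starts at `key`."""
        if key not in self.bounds:
            self.bounds.insert(bisect_right(self.bounds, key), key)

    def chunk_for(self, key):
        """Return the (low, high) range of the chunk that owns `key`."""
        i = min(bisect_right(self.bounds, key) - 1, len(self.bounds) - 2)
        return self.bounds[i], self.bounds[i + 1]

chunks = ChunkMap()
for split_point in (41, 51, 61):
    chunks.split(split_point)

print(chunks.chunk_for(40))  # (-inf, 41)
print(chunks.chunk_for(50))  # (41, 51)
print(chunks.chunk_for(60))  # (51, 61)
```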

Slide 45

Slide 45 text

How mongoDB sharding works
> db.users.save({age: 40})
> db.users.save({age: 50})
> db.users.save({age: 60})
[Diagram: the chunks -∞ to 40, 41 to 50, 51 to 60 and 61 to +∞ all live on shard1]

Slide 46

Slide 46 text

How mongoDB sharding works
> db.runCommand({addshard: "shard2"});
> db.runCommand({addshard: "shard3"});
[Diagram: the chunks -∞ to 40, 41 to 50, 51 to 60 and 61 to +∞ are redistributed across shard1, shard2 and shard3]
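
Once new shards join, chunks migrate until the load evens out. The toy balancer below moves one chunk at a time from the fullest shard to the emptiest until chunk counts differ by at most one. The function `balance` and that stopping rule are assumptions for illustration; the real balancer runs in the background with its own thresholds, and which chunk ends up on which shard may differ from the slide.

```python
def balance(shards):
    """Move chunks from the fullest shard to the emptiest until counts differ by <= 1."""
    moves = []
    while True:
        fullest = max(shards, key=lambda name: len(shards[name]))
        emptiest = min(shards, key=lambda name: len(shards[name]))
        if len(shards[fullest]) - len(shards[emptiest]) <= 1:
            return moves
        chunk = shards[fullest].pop()          # take the last chunk in the list
        shards[emptiest].append(chunk)
        moves.append((chunk, fullest, emptiest))

shards = {
    "shard1": [(-float("inf"), 41), (41, 51), (51, 61), (61, float("inf"))],
    "shard2": [],
    "shard3": [],
}
moves = balance(shards)
print({name: len(chunks) for name, chunks in shards.items()})
# {'shard1': 2, 'shard2': 1, 'shard3': 1}
```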

Slide 47

Slide 47 text

Architecture
[Diagram: app servers, mongos routers, Config Servers C1, C2 and C3, and Shards 1 to 4, each a Replica Set with one primary and two secondaries]

Slide 48

Slide 48 text

Use cases

Slide 49

Slide 49 text

There are many use cases
> User Data Management
> High Volume Data Feeds
> Content Management
> Operational Intelligence
> Product Data Management

Slide 50

Slide 50 text

Download & join in http://www.meetup.com/Swiss-MongoDB-User-Group/

Slide 51

Slide 51 text

Ross Lawley http://mongodb.org 10gen [email protected]