Slide 1

Slide 1 text

Sridhar Nanjundeswaran Software Engineer, 10gen © Copyright 2010 10gen Inc.

Slide 2

Slide 2 text

Overview • Non-Relational Databases • MongoDB • Use cases and customers

Slide 3

Slide 3 text

Non-Relational Databases • Why do we need them? • Types of non-relational databases

Slide 4

Slide 4 text

Costs go up

Slide 5

Slide 5 text

Productivity goes down

Slide 6

Slide 6 text

Other issues with traditional RDBMS • Application evolution • Replication for high read loads • Sharding for write throughput

Slide 7

Slide 7 text

Non-Relational Data Models • Data model determines the kinds of items that can be stored and retrieved • What can the system store? • Opaque data, documents? • What kind of queries can you do? • E.g., SQL is based on relational algebra

Slide 8

Slide 8 text

Types of Non-Relational Data Models • Key-value stores • Document stores • Column-oriented databases • Graph databases

Slide 9

Slide 9 text

Consistency Models • Relational databases support transactions • Can only see committed changes • Commits/aborts span multiple changes • Read-only transaction flavors • Read committed, repeatable read, etc • Single vs Multi-Master

Slide 10

Slide 10 text

Single Master • All writes go to a single master and are then replicated • Replication can provide read scalability • Writing becomes a bottleneck • Physical limitations (seek time) • Throughput of a single I/O subsystem

Slide 11

Slide 11 text

Single Master - Sharding • Partition the primary key space via hashing • Set up a duplicate system for each shard • The write-rate limitation now applies to each shard • Joins or aggregation across shards are problematic • Can the data be re-sharded on a live system? • Can shards be re-balanced on a live system?
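The hashing and re-sharding points above can be illustrated with a minimal sketch. It assumes a simple modulo scheme; the hash function and the names hashKey, shardFor, and countMoved are hypothetical and chosen only for illustration, not taken from any particular database.

```javascript
// Illustrative 32-bit string hash (an assumption, not a production hash).
function hashKey(key) {
  let h = 0;
  for (const ch of String(key)) {
    // Math.imul keeps the multiply in 32-bit range; >>> 0 makes it unsigned.
    h = (Math.imul(h, 31) + ch.codePointAt(0)) >>> 0;
  }
  return h;
}

// Map a primary key to one of numShards shards by hashing.
function shardFor(key, numShards) {
  return hashKey(key) % numShards;
}

// Count the keys that land on a different shard after the shard count changes.
function countMoved(keys, oldShards, newShards) {
  return keys.filter(k => shardFor(k, oldShards) !== shardFor(k, newShards)).length;
}

const keys = Array.from({ length: 1000 }, (_, i) => "user" + i);
console.log("user42 lives on shard", shardFor("user42", 4));
console.log(countMoved(keys, 4, 5), "of", keys.length, "keys move when going from 4 to 5 shards");
```

With plain modulo hashing, adding a shard relocates a large share of the keys, which is one reason re-sharding a live system is hard.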

Slide 12

Slide 12 text

Multi-Master • Dynamo-like solutions • Writes can occur to any node • All writes are replicated everywhere • Collisions can occur • Who wins? • A collision resolution strategy is required
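One simple collision resolution strategy is last-write-wins. The sketch below assumes each replica tags its version with a timestamp and a node id; the record shape and the function name resolveLastWriteWins are hypothetical. Real systems may instead use vector clocks or application-level merging.

```javascript
// Pick a single winner among conflicting versions of the same record.
// Each version is assumed to look like { value, timestamp, nodeId }.
function resolveLastWriteWins(versions) {
  return versions.reduce((winner, v) => {
    if (v.timestamp > winner.timestamp) return v;
    // Tie-break on nodeId so that every node picks the same winner.
    if (v.timestamp === winner.timestamp && v.nodeId > winner.nodeId) return v;
    return winner;
  });
}

const conflicting = [
  { value: "draft A", timestamp: 100, nodeId: "n1" },
  { value: "draft B", timestamp: 105, nodeId: "n2" },
  { value: "draft C", timestamp: 105, nodeId: "n3" },
];
console.log(resolveLastWriteWins(conflicting).value); // prints "draft C"
```

Last-write-wins is easy to reason about but silently discards the losing writes, so it only fits data where that loss is acceptable.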

Slide 13

Slide 13 text

No-SQL solutions

Consistency Model \ Data Model | Key-Value | Document | Column-Oriented
Single Master | Membase | MongoDB |
Multi-Master/Dynamo | Riak | CouchDB | Cassandra, HBase, Hypertable

Slide 14

Slide 14 text

• What is MongoDB? • MongoDB's architecture and features • Installing and running MongoDB

Slide 15

Slide 15 text

What is MongoDB • Document Store • Horizontally Scalable • High Performance

Slide 16

Slide 16 text

MongoDB vs Traditional RDBMS databases contain rows server contain tables schema joins

Slide 17

Slide 17 text

Terminology

RDBMS | Mongo
Table, View | Collection
Row(s) | JSON Document
Index | Index
Join | Embedded Document
Partition | Shard
Partition Key | Shard Key

Slide 18

Slide 18 text

MongoDB is a Single-Master System • All writes are to a primary (master) • Failure of the primary is detected, and a new one is elected • Application writes get an error if there is no quorum to elect a new master • Reads can continue
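The quorum rule above can be sketched as a strict-majority check. This is a simplified illustration that assumes every member has one vote; the function name hasQuorum is hypothetical and this is not the server's election code.

```javascript
// A new primary can be elected only if a strict majority of members is reachable.
function hasQuorum(reachableMembers, totalMembers) {
  return reachableMembers > Math.floor(totalMembers / 2);
}

console.log(hasQuorum(2, 3)); // true: 2 of 3 is a majority, a primary can be elected
console.log(hasQuorum(1, 3)); // false: writes fail, reads can continue
console.log(hasQuorum(2, 4)); // false: exactly half is not a majority
```

The last case shows why odd member counts are usually preferred: a fourth member adds no extra failure tolerance over three.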

Slide 19

Slide 19 text

MongoDB Storage Management • Data is kept in memory-mapped files • Files are allocated as needed • Documents in a collection are kept on a list using a geographical addressing scheme • Indexes (B*-trees) point to documents using geographical addresses

Slide 20

Slide 20 text

Release History • First release - February 2009 • v1.0 - August 2009 • v1.2 - December 2009 - Map/Reduce, lots of small things • v1.4 - March 2010 - Concurrency/Geo • v1.6 - August 2010 - Sharding/Replica Sets • v1.8 - March 2011 - Journaling, Covered/Sparse indexes, Geo sphere

Slide 21

Slide 21 text

Documents

Blog Post Document
> p = { author: "sridhar", date: new Date(), title: "Using the C# driver with MongoDB", tags: ["NoSQL", "Mongo", "MongoDB"] }
> db.posts.save(p)

Slide 22

Slide 22 text

Querying

> db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "sridhar", date : "Mon Jul 11 2011 19:47:11 GMT-0700 (PDT)", title : "Using the C# driver with MongoDB", tags : ["NoSQL", "Mongo", "MongoDB"] }

Slide 23

Slide 23 text

Secondary Indexes

Create an index on any field in a document
// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1})
> db.posts.find({author: "sridhar"})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "sridhar", ... }

Slide 24

Slide 24 text

Query Operators • Conditional Operators • $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type • $lt, $lte, $gt, $gte

// find posts with any tags
> db.posts.find( {tags: {$exists: true}} )
// find posts whose author starts with "sri" (case-insensitive regular expression)
> db.posts.find( {author: /^sri/i} )
// count posts by author
> db.posts.find( {author: "sridhar"} ).count()

Slide 25

Slide 25 text

Atomic Operations • $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit

> comment = { author: "fred", date: new Date(), text: "Interesting blog post" }
// the update modifiers must be wrapped in their own document
> db.posts.update( { _id: "..." }, { $push: { comments: comment } } );
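To make the effect of these modifiers concrete, here is a small in-memory sketch of $set, $inc, and $push applied to a plain object. The function applyUpdate and the views field are hypothetical; the sketch only mimics what the resulting document looks like and does not model the server's atomicity or its implementation.

```javascript
// Apply a subset of update modifiers to a copy of a plain document.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // deep copy (plain JSON values only)
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // a missing field starts at 0
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    if (result[field] === undefined) result[field] = []; // a missing field becomes an array
    result[field].push(item);
  }
  return result;
}

const post = { author: "sridhar", views: 1, tags: ["NoSQL"] };
const updated = applyUpdate(post, {
  $inc: { views: 1 },
  $push: { comments: { author: "fred", text: "Interesting blog post" } },
});
console.log(updated.views);           // 2
console.log(updated.comments.length); // 1
```

The point of the real modifiers is that the read-modify-write happens on the server in one step, so two concurrent $inc or $push calls do not overwrite each other.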

Slide 26

Slide 26 text

Nested Documents

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "sridhar",
  date : "Mon Jul 11 2011 19:47:11 GMT-0700 (PDT)",
  title : "Using the C# driver with MongoDB",
  tags : [ "NoSQL", "Mongo", "MongoDB" ],
  comments : [
    { author : "Fred", date : "Mon Jul 11 2011 20:51:03 GMT-0700 (PDT)", text : "Interesting blog post" }
  ] }

Slide 27

Slide 27 text

Indexes

// Index nested documents
> db.posts.ensureIndex( { "comments.author": 1 } )
> db.posts.find( { "comments.author": "Fred" } )
// Index on tags
> db.posts.ensureIndex( { tags: 1 } )
> db.posts.find( { tags: "Mongo" } )
// geospatial index
> db.posts.ensureIndex( { "author.location": "2d" } )
> db.posts.find( { "author.location": { $near: [22, 42] } } )
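An index on an array field such as tags holds one entry per array element, which is why a query for a single tag can use it. The in-memory sketch below illustrates that idea with a Map; buildTagIndex and the sample posts are hypothetical and this is not how the server stores its indexes.

```javascript
// Build a lookup from each tag value to the ids of the posts that contain it.
function buildTagIndex(posts) {
  const index = new Map();
  for (const post of posts) {
    for (const tag of post.tags || []) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(post._id); // one index entry per array element
    }
  }
  return index;
}

const samplePosts = [
  { _id: 1, tags: ["NoSQL", "Mongo"] },
  { _id: 2, tags: ["Mongo", "MongoDB"] },
  { _id: 3 }, // a post with no tags contributes no entries
];
const tagIndex = buildTagIndex(samplePosts);
console.log(tagIndex.get("Mongo")); // [ 1, 2 ]
```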

Slide 28

Slide 28 text

MongoDB – More • Geo-spatial queries • Require a geo index • Find points near a given point • Find points within a polygon/sphere • Built-in Map-Reduce • The caller provides map and reduce functions written in JavaScript
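As an illustration of caller-provided map and reduce functions, the sketch below counts posts per tag in plain JavaScript. The tiny runner runMapReduce is hypothetical and stands in for the server; in the MongoDB shell the map function reads the document from `this` and calls a global emit(key, value), whereas here both are passed explicitly so the example runs on its own.

```javascript
// map: emit (tag, 1) for every tag on a post.
function mapPost(post, emit) {
  for (const tag of post.tags || []) emit(tag, 1);
}

// reduce: sum all the values emitted for one key.
function reduceCounts(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Hypothetical in-memory runner: group emitted values by key, then reduce each group.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const out = {};
  for (const [key, values] of groups) out[key] = reduce(key, values);
  return out;
}

const blogPosts = [
  { author: "sridhar", tags: ["NoSQL", "Mongo", "MongoDB"] },
  { author: "fred", tags: ["Mongo"] },
];
console.log(runMapReduce(blogPosts, mapPost, reduceCounts));
// { NoSQL: 1, Mongo: 2, MongoDB: 1 }
```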

Slide 29

Slide 29 text

Scaling MongoDB • Replication - Read scalability • Master/Slave • Replica Sets • Sharding – Read and write scalability • Collections are sharded • Each shard is served by its own replica set • Shard key ranges are automatically balanced

Slide 30

Slide 30 text

[Diagram: four shards, each served by a replica set of one primary and two secondaries, covering key ranges 0..30, 31..60, 61..90, and 91..100; reads and writes are routed through MongoS processes.]
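Range-based routing of the kind a MongoS performs can be sketched with the key ranges from the diagram. The shard names and the function routeByShardKey are hypothetical; this is an illustration of the idea, not the router's actual code.

```javascript
// Key ranges taken from the diagram; shard names are made up for the example.
const shardRanges = [
  { shard: "shard0", min: 0, max: 30 },
  { shard: "shard1", min: 31, max: 60 },
  { shard: "shard2", min: 61, max: 90 },
  { shard: "shard3", min: 91, max: 100 },
];

// Return the shard whose range contains the given shard key value.
function routeByShardKey(key, ranges) {
  const match = ranges.find(r => key >= r.min && key <= r.max);
  if (!match) throw new RangeError("no shard covers key " + key);
  return match.shard;
}

console.log(routeByShardKey(45, shardRanges)); // prints "shard1"
console.log(routeByShardKey(91, shardRanges)); // prints "shard3"
```

Because routing is by range rather than by hash, rebalancing can be done by moving a boundary between two neighbouring shards instead of relocating keys everywhere.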

Slide 31

Slide 31 text

MongoDB Access • Drivers are available in many languages • 10gen supported • C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala • Community supported • Clojure, ColdFusion, F#, Go, Groovy, Lua, R • http://www.mongodb.org/display/DOCS/Overview+-+Writing+Drivers+and+Tools

Slide 32

Slide 32 text

V2.0 • Pretty soon • Better concurrency • Faster data compaction • Faster map/reduce • TTL collections • Geospatial polygons • Hash shard key • Index 2.0 (smaller+faster)

Slide 33

Slide 33 text

Future – a short list • Full text Search • More concurrency • Online compaction • Internal compression • New aggregation framework Vote: http://jira.mongodb.org

Slide 34

Slide 34 text

MongoDB Availability • Source • https://github.com/mongodb/mongo • Server • License: AGPL • http://www.mongodb.org/downloads • Drivers • License: Apache • http://www.mongodb.org/display/DOCS/Drivers

Slide 35

Slide 35 text

Use cases and customers • Use cases • Case studies

Slide 36

Slide 36 text

Content Management

Slide 37

Slide 37 text

Gaming

Slide 38

Slide 38 text

Analytics

Slide 39

Slide 39 text

try at try.mongodb.org

Slide 40

Slide 40 text

download at mongodb.org • conferences, appearances, and meetups: http://www.10gen.com/events • Facebook: http://bit.ly/mongofb • Twitter: @mongodb • LinkedIn: http://linkd.in/joinmongo • We're hiring! [email protected] • @snanjund