Introduction to MongoDB

Overview
• Non-Relational Databases
• MongoDB
• Use cases and customers

Non-Relational Databases
• Why do we need them?
• Types of non-relational databases

Costs go up

Productivity goes down

Other issues with traditional RDBMS
• Application evolution
• Replication for high read loads
• Sharding for write throughput

Non-Relational Data Models
• Data model determines the kinds of items that can be stored and retrieved
• What can the system store?
• Opaque data, documents?
• What kind of queries can you do?
• E.g., SQL is based on relational algebra

Types of Non-Relational Data Models
• Key-value stores
• Document stores
• Column-oriented databases
• Graph databases

Consistency Models
• Relational databases support transactions
• Can only see committed changes
• Commits/aborts span multiple changes
• Read-only transaction flavors
• Read committed, repeatable read, etc.
• Single vs Multi-Master

Single Master
• All writes go to a single master and are then replicated
• Replication can provide read scalability
• Writing becomes a bottleneck
• Physical limitations (seek time)
• Throughput of a single I/O subsystem

Single Master - Sharding
• Partition the primary key space via hashing
• Set up a duplicate system for each shard
• The write-rate limitation now applies to each shard
• Joins or aggregation across shards are problematic
• Can the data be re-sharded on a live system?
• Can shards be re-balanced on a live system?
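
As a rough illustration of partitioning the primary key space via hashing, the sketch below maps string keys to one of N shards. The FNV-1a hash, the shard counts, and the names `fnv1a`, `shardFor`, and `movedKeys` are assumptions made for this example and do not describe how any particular database assigns shards.

```javascript
// Hypothetical sketch: route a primary key to a shard by hashing it.
// fnv1a, shardFor and movedKeys are illustrative names, not a database API.
function fnv1a(key) {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32 bits
  }
  return hash >>> 0;
}

function shardFor(key, shardCount) {
  return fnv1a(key) % shardCount;
}

// Keys whose shard changes when the number of shards changes.
// This is why re-sharding a live system is a hard question.
function movedKeys(keys, oldCount, newCount) {
  return keys.filter((k) => shardFor(k, oldCount) !== shardFor(k, newCount));
}

const keys = ["user:1", "user:2", "user:3", "user:4", "user:5"];
console.log(keys.map((k) => [k, shardFor(k, 4)]));
console.log("keys that move when going from 4 to 5 shards:", movedKeys(keys, 4, 5));
```

With plain modulo hashing, changing the shard count usually reassigns a large share of the keys, which is one reason the re-sharding question on the slide matters.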

Multi-Master
• Dynamo-like solutions
• Writes can occur to any node
• All writes are replicated everywhere
• Collisions can occur
• Who wins?
• A collision resolution strategy is required
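
One possible collision resolution strategy is "last write wins". The sketch below is only an illustration under that assumption: the `timestamp` and `nodeId` fields and the `resolveCollision` name are hypothetical, and real multi-master systems may instead use version vectors or application-level merging.

```javascript
// Hypothetical sketch of last-write-wins collision resolution.
// Each version records when it was written and which node accepted the write.
function resolveCollision(a, b) {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp > b.timestamp ? a : b;
  }
  // Tie-break on node id so every replica picks the same winner.
  return a.nodeId > b.nodeId ? a : b;
}

const fromNodeA = { value: "draft", timestamp: 100, nodeId: "a" };
const fromNodeB = { value: "published", timestamp: 105, nodeId: "b" };
console.log(resolveCollision(fromNodeA, fromNodeB).value);
```

The important property is that the result does not depend on the order in which a replica sees the two versions, so all replicas converge on the same value.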

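No-SQL solutions
Data Model: Key-Value, Document, Column-Oriented
Consistency Model: Single Master, Multi-Master/Dynamo
Single Master: Membase (Key-Value), MongoDB (Document)
Multi-Master/Dynamo: Riak (Key-Value), CouchDB (Document)
Column-Oriented: Cassandra, HBase, Hypertable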

What is MongoDB?
MongoDB's architecture and features
Installing and running MongoDB

What is MongoDB
• Document Store
• Horizontally Scalable
• High Performance

MongoDB vs Traditional RDBMS
[Diagram comparing the two: server, databases, tables, rows, schema, joins]

Terminology
RDBMS            Mongo
Table, View      Collection
Row(s)           JSON Document
Index            Index
Join             Embedded Document
Partition        Shard
Partition Key    Shard Key

MongoDB is a Single-Master System
• All writes are to a primary (master)
• Failure of the primary is detected, and a new one is elected
• Application writes get an error if there is no quorum to elect a new master
• Reads can continue
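
The quorum condition can be sketched as a strict-majority check. The function name `hasQuorum` and the assumption that every member has exactly one vote are illustrative simplifications.

```javascript
// Hypothetical sketch: can the members that are still reachable elect a new primary?
// Assumes one vote per member and that a strict majority is required.
function hasQuorum(reachableMembers, totalMembers) {
  return reachableMembers > Math.floor(totalMembers / 2);
}

console.log(hasQuorum(2, 3)); // one of three members is down
console.log(hasQuorum(1, 3)); // two of three members are down
console.log(hasQuorum(2, 4)); // an even split is not a majority
```

Under this assumption a three-member set tolerates the loss of one member, while a four-member set split evenly cannot elect a primary on either side.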

MongoDB Storage Management
• Data is kept in memory-mapped files
• Files are allocated as needed
• Documents in a collection are kept on a list using a geographical addressing scheme
• Indexes (B*-trees) point to documents using geographical addresses

Release History
• First release - February 2009
• v1.0 - August 2009
• v1.2 - December 2009 - Map/Reduce, lots of small things
• v1.4 - March 2010 - Concurrency/Geo
• v1.6 - August 2010 - Sharding/Replica Sets
• v1.8 - March 2011 - Journaling, Covered/Sparse indexes, Geo sphere

Documents
Blog Post Document
> p = { author: "sridhar",
        date: new Date(),
        title: "Using the C# driver with MongoDB",
        tags: ["NoSQL", "Mongo", "MongoDB"] }
> db.posts.save(p)

Querying
> db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "sridhar",
  date : "Mon Jul 11 2011 19:47:11 GMT-0700 (PDT)",
  title : "Using the C# driver with MongoDB",
  tags : ["NoSQL", "Mongo", "MongoDB"] }

Secondary Indexes
Create an index on any field in a document
// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1})
> db.posts.find({author: 'sridhar'})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "sridhar", ... }

Query Operators
• Conditional Operators
• $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
• $lt, $lte, $gt, $gte
// find posts with any tags
> db.posts.find( {tags: {$exists: true }} )
// find posts matching a regular expression
> db.posts.find( {author: /^sri*/i } )
// count posts by author
> db.posts.find( {author: 'sridhar'} ).count()
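
To make the meaning of a few of these conditional operators concrete, here is a deliberately simplified in-memory matcher. The helper `matches`, the `operators` table, and the `views` field are assumptions for this sketch. It only looks at top-level scalar fields, whereas the server also handles arrays, nested paths, regular expressions, and many more operators.

```javascript
// Hypothetical, simplified matcher for a handful of conditional operators.
// Not MongoDB code: it only handles top-level fields and the operators listed here.
const operators = {
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $ne: (value, arg) => value !== arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
};

function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const isOperatorObject =
      condition !== null && typeof condition === "object" && !Array.isArray(condition);
    if (!isOperatorObject) {
      return doc[field] === condition; // plain equality, e.g. {author: "sridhar"}
    }
    return Object.entries(condition).every(([op, arg]) => {
      if (op === "$exists") {
        return (field in doc) === arg;
      }
      // For a missing field doc[field] is undefined, so range comparisons are false.
      return operators[op](doc[field], arg);
    });
  });
}

const posts = [
  { author: "sridhar", views: 120, tags: ["NoSQL"] },
  { author: "fred", views: 15 },
];
console.log(posts.filter((p) => matches(p, { views: { $gte: 100 } })));
console.log(posts.filter((p) => matches(p, { tags: { $exists: true } })));
```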

Atomic Operations
• $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
> comment = { author: "Fred", date: new Date(), text: "Interesting blog post" }
> db.posts.update( { _id: "..." }, { $push: { comments: comment } } );
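
The effect of a few of these modifiers on a single document can be sketched in plain JavaScript. `applyUpdate` and the `views` field are hypothetical, made up for this illustration. The sketch shows only what the document looks like afterwards and does not model the atomicity the server provides.

```javascript
// Hypothetical sketch of what a few update modifiers do to one document.
// applyUpdate is an illustrative helper, not a MongoDB API; it mutates and returns doc.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value;
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // a missing field starts from 0
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    if (!(field in doc)) doc[field] = []; // create the array if the field is absent
    doc[field].push(item);
  }
  for (const field of Object.keys(update.$unset || {})) {
    delete doc[field];
  }
  return doc;
}

const post = { title: "Using the C# driver with MongoDB", views: 1 };
applyUpdate(post, {
  $inc: { views: 1 },
  $push: { comments: { author: "Fred", text: "Interesting blog post" } },
});
console.log(post);
```

The point of the modifiers is that the client describes the change and sends it in one request, instead of reading the document, modifying it locally, and writing it back.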

Nested Documents
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "sridhar",
  date : "Mon Jul 11 2011 19:47:11 GMT-0700 (PDT)",
  title : "Using the C# driver with MongoDB",
  tags : [ "NoSQL", "Mongo", "MongoDB" ],
  comments : [
    { author : "Fred",
      date : "Mon Jul 11 2011 20:51:03 GMT-0700 (PDT)",
      text : "Interesting blog post" }
  ] }

Indexes
// Index nested documents
> db.posts.ensureIndex( { "comments.author": 1 } )
> db.posts.find( { "comments.author": "Fred" } )
// Index on tags
> db.posts.ensureIndex( { tags: 1 } )
> db.posts.find( { tags: "Mongo" } )
// geospatial index
> db.posts.ensureIndex( { "author.location": "2d" } )
> db.posts.find( { "author.location": { $near: [22, 42] } } )

MongoDB - More
• Geo-spatial queries
• Require a geo index
• Find points near a given point
• Find points within a polygon/sphere
• Built-in Map-Reduce
• The caller provides map and reduce functions written in JavaScript
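
To show what caller-provided map and reduce functions might look like, the sketch below counts posts per tag. In the mongo shell the map function reads the current document from `this` and calls a global `emit(key, value)`. Here the document and `emit` are passed in explicitly so the example runs on its own. `runMapReduce`, `mapPost`, `reduceCounts`, and the sample data are assumptions made for this illustration.

```javascript
// Hypothetical sketch: map and reduce functions that count posts per tag,
// plus a tiny in-memory driver standing in for the server.
function mapPost(post, emit) {
  (post.tags || []).forEach((tag) => emit(tag, 1));
}

function reduceCounts(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  docs.forEach((doc) =>
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value); // collect every emitted value under its key
    })
  );
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce(key, values);
  }
  return result;
}

const samplePosts = [
  { title: "Using the C# driver with MongoDB", tags: ["NoSQL", "Mongo", "MongoDB"] },
  { title: "Another post", tags: ["Mongo"] },
];
const tagCounts = runMapReduce(samplePosts, mapPost, reduceCounts);
console.log(tagCounts);
```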

Scaling MongoDB
• Replication - Read scalability
• Master/Slave
• Replica Sets
• Sharding - Read and write scalability
• Collections are sharded
• Each shard is served by its own replica set
• Shard key ranges are automatically balanced

[Diagram: four shards, each a replica set with one Primary and two Secondaries, covering Key Range 0..30, Key Range 31..60, Key Range 61..90, and Key Range 91..100; reads and writes go through MongoS processes]
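
Range-based routing with the key ranges listed above (0..30, 31..60, 61..90, 91..100) can be sketched as a simple lookup. The shard names and the `routeToShard` function are hypothetical, and the mapping is hard-coded here whereas a real router would obtain it from the cluster.

```javascript
// Hypothetical sketch of range-based routing using the key ranges from the slide.
// Shard names and routeToShard are illustrative only.
const shardRanges = [
  { shard: "shard0", min: 0, max: 30 },
  { shard: "shard1", min: 31, max: 60 },
  { shard: "shard2", min: 61, max: 90 },
  { shard: "shard3", min: 91, max: 100 },
];

function routeToShard(key) {
  const range = shardRanges.find((r) => key >= r.min && key <= r.max);
  if (!range) {
    throw new RangeError(`no shard covers key ${key}`);
  }
  return range.shard;
}

console.log([5, 31, 60, 95].map((k) => [k, routeToShard(k)]));
```

Because adjacent keys live on the same shard, a range query on the shard key only needs to contact the shards whose ranges overlap it, while balancing amounts to moving range boundaries.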

MongoDB Access
• Drivers are available in many languages
• 10gen supported
• C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
• Community supported
• Clojure, ColdFusion, F#, Go, Groovy, Lua, R
• http://www.mongodb.org/display/DOCS/Overview+-+Writing+Drivers+and+Tools

V2.0
• Pretty soon
• Better concurrency
• Faster data compaction
• Faster map/reduce
• TTL collections
• Geospatial polygons
• Hash shard key
• Index 2.0 (smaller+faster)

Future - a short list
• Full text Search
• More concurrency
• Online compaction
• Internal compression
• New aggregation framework
Vote: http://jira.mongodb.org

MongoDB Availability
• Source
• https://github.com/mongodb/mongo
• Server
• License: AGPL
• http://www.mongodb.org/downloads
• Drivers
• License: Apache
• http://www.mongodb.org/display/DOCS/Drivers

Use cases and customers
• Use cases
• Case studies

Content Management

Gaming

Analytics

@mongodb
© Copyright 2010 10gen Inc.
conferences, appearances, and meetups
http://www.10gen.com/events
http://bit.ly/mongofb
Facebook | Twitter | LinkedIn
http://linkd.in/joinmongo
download at mongodb.org
We're Hiring! [email protected] @snanjund

More Decks by Sridhar Nanjundeswaran
