Introduction to MongoDB

Presented at the LA NoSQL meetup on 7/12/2011

Sridhar Nanjundeswaran

July 13, 2011

Transcript

  1. Non-Relational Databases
     • Why do we need them?
     • Types of non-relational databases
  2. Other issues with traditional RDBMS
     • Application evolution
     • Replication for high read loads
     • Sharding for write throughput
  3. Non-Relational Data Models
     • Data model determines the kinds of items that can be stored and retrieved
     • What can the system store?
     • Opaque data, documents?
     • What kind of queries can you do?
     • E.g., SQL is based on relational algebra
  4. Types of Non-Relational Data Models
     • Key-value stores
     • Document stores
     • Column-oriented databases
     • Graph databases
  5. Consistency Models
     • Relational databases support transactions
     • Can only see committed changes
     • Commits/aborts span multiple changes
     • Read-only transaction flavors
     • Read committed, repeatable read, etc.
     • Single vs. Multi-Master
  6. Single Master
     • All writes go to a single master and are then replicated
     • Replication can provide read scalability
     • Writing becomes a bottleneck
     • Physical limitations (seek time)
     • Throughput of a single I/O subsystem
  7. Single Master - Sharding
     • Partition the primary key space via hashing
     • Set up a duplicate system for each shard
     • The write-rate limitation now applies to each shard
     • Joins or aggregation across shards are problematic
     • Can the data be re-sharded on a live system?
     • Can shards be re-balanced on a live system?
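To make the hashing step on this slide concrete, here is a minimal sketch in plain JavaScript. The hash function, the helper names (`hashKey`, `shardFor`, `partition`, `movedFraction`), and the modulo placement rule are all assumptions for illustration; the slide does not say which scheme any particular system uses. `movedFraction` speaks to the re-sharding question: with naive modulo placement, changing the shard count relocates many keys.

```javascript
// Hypothetical string hash (a djb2-style variant), kept as an unsigned 32-bit value.
function hashKey(key) {
  let h = 5381;
  for (const ch of String(key)) {
    h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0;
  }
  return h;
}

// Assumed placement rule: shard index = hash modulo the number of shards.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

// Group keys by the shard they hash to.
function partition(keys, shardCount) {
  const shards = Array.from({ length: shardCount }, () => []);
  for (const key of keys) {
    shards[shardFor(key, shardCount)].push(key);
  }
  return shards;
}

// Fraction of keys that land on a different shard when the shard count changes.
function movedFraction(keys, before, after) {
  const moved = keys.filter((k) => shardFor(k, before) !== shardFor(k, after));
  return moved.length / keys.length;
}
```

Calling `movedFraction(keys, 4, 5)` on a sample key set gives a feel for how disruptive adding one shard is under this simple rule, which is one reason live re-sharding is listed as an open question.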
  8. Multi-Master
     • Dynamo-like solutions
     • Writes can occur to any node
     • All writes are replicated everywhere
     • Collisions can occur
     • Who wins?
     • A collision resolution strategy is required
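The slide does not name a collision resolution strategy. As one possible illustration, the sketch below uses last-write-wins with a deterministic tie-break. The version shape (`value`, `timestamp`, `nodeId`) and the function names are hypothetical, and real Dynamo-style systems may use other mechanisms such as vector clocks.

```javascript
// Hypothetical version record: { value, timestamp, nodeId }.
// Last-write-wins: the higher timestamp wins; ties fall back to the
// larger nodeId so every replica picks the same winner.
function resolve(a, b) {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp > b.timestamp ? a : b;
  }
  return a.nodeId > b.nodeId ? a : b;
}

// Merge two replicas' key -> version maps into one converged view.
function mergeReplicas(left, right) {
  const merged = { ...left };
  for (const [key, version] of Object.entries(right)) {
    merged[key] = key in merged ? resolve(merged[key], version) : version;
  }
  return merged;
}
```

Because `resolve` gives the same answer regardless of argument order, merging in either direction converges on the same state. The cost is that the losing write is silently discarded.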
  9. No-SQL solutions (by data model and consistency model)
     • Key-value: Membase (single master), Riak (multi-master/Dynamo)
     • Document: MongoDB (single master), CouchDB (multi-master/Dynamo)
     • Column-oriented: Cassandra, HBase, Hypertable
  10. Terminology (RDBMS → Mongo)
      • Table, View → Collection
      • Row(s) → JSON Document
      • Index → Index
      • Join → Embedded Document
      • Partition → Shard
      • Partition Key → Shard Key
  11. MongoDB is a Single-Master System
      • All writes are to a primary (master)
      • Failure of the primary is detected, and a new one is elected
      • Application writes get an error if there is no quorum to elect a new master
      • Reads can continue
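The quorum behavior on this slide can be sketched as a strict-majority check. This is a simplified model under stated assumptions, not MongoDB's actual election protocol: the member shape (`name`, `up`) and the functions `electPrimary`, `write`, and `read` are hypothetical, and a real election also weighs factors such as how up to date each member is.

```javascript
// Hypothetical member record: { name, up }.
// A primary can be elected only when a strict majority of members is reachable.
function electPrimary(members) {
  const reachable = members.filter((m) => m.up);
  if (reachable.length * 2 <= members.length) {
    return null; // no quorum
  }
  return reachable[0].name; // simplification: pick the first reachable member
}

// Writes need a primary, so they fail when no quorum exists.
function write(members, doc) {
  const primary = electPrimary(members);
  if (primary === null) {
    throw new Error("no primary available: write rejected");
  }
  return { acceptedBy: primary, doc };
}

// Reads can still be served by any reachable member.
function read(members) {
  const source = members.find((m) => m.up);
  if (!source) {
    throw new Error("no member reachable");
  }
  return { servedBy: source.name };
}
```

With three members and only one reachable, `write` throws while `read` still succeeds, which matches the last two bullets of the slide.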
  12. MongoDB Storage Management
      • Data is kept in memory-mapped files
      • Files are allocated as needed
      • Documents in a collection are kept on a list using a geographical addressing scheme
      • Indexes (B*-trees) point to documents using geographical addresses
  13. Release History
      • First release - February 2009
      • v1.0 - August 2009
      • v1.2 - December 2009 - Map/Reduce, lots of small things
      • v1.4 - March 2010 - Concurrency/Geo
      • v1.6 - August 2010 - Sharding/Replica Sets
      • v1.8 - March 2011 - Journaling, Covered/Sparse indexes, Geo sphere
  14. Documents
      Blog Post Document
      > p = { author: "sridhar",
              date: new Date(),
              title: "Using the C# driver with MongoDB",
              tags: ["NoSQL", "Mongo", "MongoDB"] }
      > db.posts.save(p)
  15. Querying
      > db.posts.find()
      { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
        author : "sridhar",
        date : "Mon Jul 11 2011 19:47:11 GMT-0700 (PDT)",
        title : "Using the C# driver with MongoDB",
        tags : [ "NoSQL", "Mongo", "MongoDB" ] }
  16. Secondary Indexes
      Create an index on any field in a document
      // 1 means ascending, -1 means descending
      > db.posts.ensureIndex({author: 1})
      > db.posts.find({author: 'sridhar'})
      { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "sridhar", ... }
  17. Query Operators
      • Conditional Operators
      • $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
      • $lt, $lte, $gt, $gte
      // find posts with any tags
      > db.posts.find( {tags: {$exists: true }} )
      // find posts matching a regular expression
      > db.posts.find( {author: /^sri*/i } )
      // count posts by author
      > db.posts.find( {author: 'sridhar'} ).count()
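To show roughly how such operator documents select records, here is a minimal in-memory sketch in plain JavaScript. It is not the server's implementation: `operators`, `matchesField`, and `find` are hypothetical helpers, only a handful of the listed operators are covered, and the `votes` field in the sample data is invented for the example.

```javascript
// A small subset of the operators from the slide, as plain predicates.
const operators = {
  $exists: (value, want) => (value !== undefined) === want,
  $ne: (value, arg) => value !== arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $size: (value, arg) => Array.isArray(value) && value.length === arg,
};

function matchesField(value, condition) {
  // Regular expressions match string fields.
  if (condition instanceof RegExp) {
    return typeof value === "string" && condition.test(value);
  }
  // A plain object is treated as a set of operators that must all hold.
  if (Object.prototype.toString.call(condition) === "[object Object]") {
    return Object.entries(condition).every(([op, arg]) => {
      if (!(op in operators)) throw new Error("unsupported operator: " + op);
      return operators[op](value, arg);
    });
  }
  // A literal matches an equal value, or any element of an array field.
  return Array.isArray(value) ? value.includes(condition) : value === condition;
}

// Return the documents for which every field condition in the query holds.
function find(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, cond]) => matchesField(doc[field], cond))
  );
}

// Sample data; "votes" is an invented field.
const posts = [
  { author: "sridhar", tags: ["NoSQL", "Mongo", "MongoDB"], votes: 3 },
  { author: "Fred", votes: 7 },
];
const tagged = find(posts, { tags: { $exists: true } });
const bySri = find(posts, { author: /^sri/i });
const popular = find(posts, { votes: { $gte: 5 } });
```

Here `tagged` and `bySri` each contain only the first post, and `popular` contains only the second.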
  18. Atomic Operations
      • $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
      > comment = { author: "Fred", date: new Date(), text: "Interesting blog post" }
      > db.posts.update( { _id: "..." }, { $push: {comments: comment} } );
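The effect of a few of these update operators on a single document can be sketched in plain JavaScript. This models only what the resulting document looks like, not the atomicity the server provides. `applyUpdate` is a hypothetical helper, it covers just `$set`, `$unset`, `$inc`, `$push`, and `$pull`, and the `views` field is invented for the example.

```javascript
// Apply a subset of update operators to a copy of the document.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  for (const field of Object.keys(update.$unset || {})) {
    delete result[field];
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // a missing field starts at 0
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    result[field] = [...(result[field] || []), item]; // a missing field becomes an array
  }
  for (const [field, item] of Object.entries(update.$pull || {})) {
    result[field] = (result[field] || []).filter((x) => x !== item);
  }
  return result;
}

// Sample document and update, modeled on the slide's comment push.
const post = { author: "sridhar", tags: ["NoSQL", "Mongo", "MongoDB"] };
const comment = { author: "Fred", text: "Interesting blog post" };
const updated = applyUpdate(post, {
  $push: { comments: comment },
  $inc: { views: 1 }, // "views" is an invented field
});
```

After this call, `updated.comments` holds the one comment and `updated.views` is 1, while `post` itself is left unchanged.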
  19. Nested Documents
      { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
        author : "sridhar",
        date : "Mon Jul 11 2011 19:47:11 GMT-0700 (PDT)",
        title : "Using the C# driver with MongoDB",
        tags : [ "NoSQL", "Mongo", "MongoDB" ],
        comments : [
          { author : "Fred",
            date : "Mon Jul 11 2011 20:51:03 GMT-0700 (PDT)",
            text : "Interesting blog post" } ] }
  20. Indexes
      // Index nested documents
      > db.posts.ensureIndex( { "comments.author": 1 } )
      > db.posts.find( { "comments.author": "Fred" } )
      // Index on tags
      > db.posts.ensureIndex( { tags: 1 } )
      > db.posts.find( { tags: "Mongo" } )
      // geospatial index
      > db.posts.ensureIndex( { "author.location": "2d" } )
      > db.posts.find( { "author.location": { $near : [22,42] } } )
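As a rough picture of what a `$near` query returns, the sketch below orders in-memory documents by flat Euclidean distance from a point. It does a linear scan rather than using an index. The `near` helper, its `limit` parameter, and the flat `location` field are assumptions for illustration only.

```javascript
// Return documents that have a [x, y] pair in `field`, nearest first.
function near(docs, field, [x, y], limit = 100) {
  const distance = (doc) => Math.hypot(doc[field][0] - x, doc[field][1] - y);
  return docs
    .filter((doc) => Array.isArray(doc[field])) // skip documents without a location
    .sort((a, b) => distance(a) - distance(b))
    .slice(0, limit);
}

// Sample data with invented names and coordinates.
const places = [
  { name: "far", location: [0, 0] },
  { name: "close", location: [22, 43] },
  { name: "exact", location: [22, 42] },
  { name: "unlocated" },
];
const nearest = near(places, "location", [22, 42], 2);
```

Here `nearest` holds the "exact" document followed by the "close" one. A geospatial index exists so that this ordering can be produced without scanning every document.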
  21. MongoDB - More
      • Geo-spatial queries
        • Require a geo index
        • Find points near a given point
        • Find points within a polygon/sphere
      • Built-in Map-Reduce
        • The caller provides map and reduce functions written in JavaScript
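The map/reduce contract can be sketched as a small in-memory driver. This is an approximation: in the mongo shell the map function calls a global `emit` and reads the document through `this`, whereas here `emit` is passed in as an argument so the sketch stays self-contained. `mapReduce` and the sample data are hypothetical.

```javascript
// Run map over every document, group emitted values by key, then reduce each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    map.call(doc, emit); // `this` is the current document, as in the shell
  }
  const results = {};
  for (const [key, values] of groups) {
    results[key] = reduce(key, values);
  }
  return results;
}

// Count how many posts carry each tag.
const taggedPosts = [
  { tags: ["NoSQL", "Mongo", "MongoDB"] },
  { tags: ["Mongo"] },
];
const tagCounts = mapReduce(
  taggedPosts,
  function (emit) {
    for (const tag of this.tags) emit(tag, 1);
  },
  function (key, values) {
    return values.reduce((sum, n) => sum + n, 0);
  }
);
```

For this sample, `tagCounts` is `{ NoSQL: 1, Mongo: 2, MongoDB: 1 }`.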
  22. Scaling MongoDB
      • Replication - Read scalability
        • Master/Slave
        • Replica Sets
      • Sharding - Read and write scalability
        • Collections are sharded
        • Each shard is served by its own replica set
        • Shard key ranges are automatically balanced
  23. [Diagram: four shards, each a replica set with one Primary and two Secondaries, covering Key Range 0..30, Key Range 31..60, Key Range 61..90, and Key Range 91..100; reads and writes reach the shards through MongoS processes.]
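The routing role the diagram gives to MongoS can be sketched as a lookup from shard key to the shard that owns its range. The key ranges come from the slide. The shard names and the `route` and `routeRange` helpers are hypothetical, and this is not how MongoS itself is implemented.

```javascript
// Key ranges from the diagram, with invented shard names.
const chunks = [
  { min: 0, max: 30, shard: "shard0" },
  { min: 31, max: 60, shard: "shard1" },
  { min: 61, max: 90, shard: "shard2" },
  { min: 91, max: 100, shard: "shard3" },
];

// A read or write on a single key goes to exactly one shard.
function route(key) {
  const chunk = chunks.find((c) => key >= c.min && key <= c.max);
  if (!chunk) {
    throw new RangeError("no chunk covers key " + key);
  }
  return chunk.shard;
}

// A range query is sent to every shard whose range overlaps [lo, hi].
function routeRange(lo, hi) {
  return chunks.filter((c) => c.max >= lo && c.min <= hi).map((c) => c.shard);
}
```

For example, `route(45)` returns `"shard1"`, and `routeRange(25, 65)` returns `["shard0", "shard1", "shard2"]`, so a query restricted by shard key touches only some of the shards.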
  24. MongoDB Access
      • Drivers are available in many languages
      • 10gen supported
        • C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
      • Community supported
        • Clojure, ColdFusion, F#, Go, Groovy, Lua, R
      • http://www.mongodb.org/display/DOCS/Overview+-+Writing+Drivers+and+Tools
  25. v2.0
      • Pretty soon
      • Better concurrency
      • Faster data compaction
      • Faster map/reduce
      • TTL collections
      • Geospatial polygons
      • Hash shard key
      • Index 2.0 (smaller+faster)
  26. Future - a short list
      • Full text search
      • More concurrency
      • Online compaction
      • Internal compression
      • New aggregation framework
      Vote: http://jira.mongodb.org
  27. MongoDB Availability
      • Source
        • https://github.com/mongodb/mongo
      • Server
        • License: AGPL
        • http://www.mongodb.org/downloads
      • Drivers
        • License: Apache
        • http://www.mongodb.org/display/DOCS/Drivers
  28. @mongodb
      • Conferences, appearances, and meetups: http://www.10gen.com/events
      • http://bit.ly/mongofb
      • Facebook | Twitter | LinkedIn
      • http://linkd.in/joinmongo
      • Download at mongodb.org
      • We’re hiring! [email protected] @snanjund
      © Copyright 2010 10gen Inc.