Introduction to MongoDB

Presented at the LA NoSQL meetup on 7/12/2011

Sridhar Nanjundeswaran

July 13, 2011

Transcript

  1. Sridhar Nanjundeswaran
    Software Engineer, 10gen
    © Copyright 2010 10gen Inc.

  2. Overview
    • Non-Relational Databases
    • MongoDB
    • Use cases and customers

  3. Non-Relational Databases
    • Why do we need them?
    • Types of non-relational databases

  4. Productivity goes down

  5. Other issues with traditional RDBMS
    • Application evolution
    • Replication for high read loads
    • Sharding for write throughput

  6. Non-Relational Data Models
    • Data model determines the kinds of items
    that can be stored and retrieved
    • What can the system store?
    • Opaque data, documents?
    • What kind of queries can you do?
    • E.g. SQL is based on relational algebra

  7. Types of Non-Relational Data Models
    • Key-value stores
    • Document stores
    • Column-oriented databases
    • Graph databases

  8. Consistency Models
    • Relational databases support transactions
    • Can only see committed changes
    • Commits/aborts span multiple changes
    • Read-only transaction flavors
    • Read committed, repeatable read, etc
    • Single vs Multi-Master

  9. Single Master
    • All writes go to a single master and are then
    replicated
    • Replication can provide read scalability
    • Writing becomes a bottleneck
    • Physical limitations (seek time)
    • Throughput of a single I/O subsystem

  10. Single Master - Sharding
    • Partition the primary key space via hashing
    • Set up a duplicate system for each shard
    • The write-rate limitation now applies to each
    shard
    • Joins or aggregation across shards are
    problematic
    • Can the data be re-sharded on a live system?
    • Can shards be re-balanced on a live system?
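
As a rough illustration of partitioning the primary key space via hashing, the sketch below maps keys to shards with a 32-bit FNV-1a hash. The names `fnv1a` and `shardFor` and the sample keys are assumptions made for this example; they are not part of any database API.

```javascript
// Hypothetical helpers for illustration only.
// 32-bit FNV-1a hash over the UTF-16 code units of a string.
function fnv1a(key) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash;
}

// Map a primary key to one of `shardCount` shards.
function shardFor(key, shardCount) {
  return fnv1a(String(key)) % shardCount;
}

const keys = ["user:1", "user:2", "user:3", "user:4"];
console.log(keys.map((k) => [k, shardFor(k, 4)]));
```

With plain modulo placement like this, changing `shardCount` generally reassigns many keys, which is one reason re-sharding a live system is hard.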

  11. Multi-Master
    • Dynamo-like solutions
    • Writes can occur to any node
    • All writes are replicated everywhere
    • Collisions can occur
    • Who wins?
    • A collision resolution strategy is required
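
One possible collision resolution strategy is last-write-wins, sketched below. This is an assumption chosen for brevity, not what any particular product does; other systems use vector clocks or application-level merges instead. The record shape `{ value, timestamp, nodeId }` is hypothetical.

```javascript
// Hypothetical record shape: { value, timestamp, nodeId }.
// Pick the version with the latest timestamp; break ties by node id
// so that every replica reaches the same answer.
function resolveLastWriteWins(versions) {
  return versions.reduce((winner, candidate) => {
    if (candidate.timestamp > winner.timestamp) return candidate;
    if (candidate.timestamp === winner.timestamp && candidate.nodeId > winner.nodeId) {
      return candidate;
    }
    return winner;
  });
}

const conflicting = [
  { value: "draft A", timestamp: 1000, nodeId: "n1" },
  { value: "draft B", timestamp: 1005, nodeId: "n2" },
  { value: "draft C", timestamp: 1005, nodeId: "n3" },
];
console.log(resolveLastWriteWins(conflicting).value); // prints "draft C"
```

Last-write-wins silently discards the losing writes, so it only suits data where that loss is acceptable.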

  12. No-SQL solutions
    Data Model (columns) by Consistency Model (rows)
    Consistency Model      Key-Value   Document   Column-Oriented
    Single Master          Membase     MongoDB
    Multi-Master/Dynamo    Riak        CouchDB    Cassandra, HBase, Hypertable

  13. What is MongoDB?
    MongoDB's architecture and features
    Installing and running
    MongoDB

  14. What is MongoDB
    • Document Store
    • Horizontally Scalable
    • High Performance

  15. MongoDB vs Traditional RDBMS
    [Diagram: a server holds databases, databases contain tables, and tables contain rows; also labeled: schema, joins]

  16. Terminology
    RDBMS            Mongo
    Table, View      Collection
    Row(s)           JSON Document
    Index            Index
    Join             Embedded Document
    Partition        Shard
    Partition Key    Shard Key

  17. MongoDB is a Single-Master System
    • All writes are to a primary (master)
    • Failure of the primary is detected, and a new
    one is elected
    • Application writes get an error if there is no
    quorum to elect a new master
    • Reads can continue
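
The quorum condition can be pictured as a strict-majority check. The sketch below is a simplification under the assumption that every member has exactly one vote; `hasMajority` is a hypothetical helper, not a MongoDB function.

```javascript
// Hypothetical helper: true when the reachable members form a strict majority,
// i.e. enough of the set is visible to elect a new primary.
function hasMajority(reachableMembers, totalMembers) {
  return reachableMembers > Math.floor(totalMembers / 2);
}

console.log(hasMajority(2, 3)); // true: two of three can elect a primary
console.log(hasMajority(1, 3)); // false: writes would fail, reads can continue
console.log(hasMajority(2, 4)); // false: an even split is not a majority
```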

  18. MongoDB Storage Management
    • Data is kept in memory-mapped files
    • Files are allocated as needed
    • Documents in a collection are kept on a list
    using a geographical addressing scheme
    • Indexes (B*-trees) point to documents using
    geographical addresses

  19. Release History
    • First release – February 2009
    • v1.0 - August 2009
    • v1.2 - December 2009 - Map/Reduce, lots of
    small things
    • v1.4 - March 2010 - Concurrency/Geo
    • v1.6 - August 2010 - Sharding/Replica Sets
    • v1.8 - March 2011 - Journaling,
    Covered/Sparse indexes, Geo sphere

  20. Documents
    Blog Post Document
    > p = { author: "sridhar",
    date: new Date(),
    title: "Using the C# driver with MongoDB",
    tags: ["NoSQL", "Mongo", "MongoDB"] }
    > db.posts.save(p)

  21. Querying
    > db.posts.find()
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author : "sridhar",
    date : "Mon Jul 11 2011 19:47:11 GMT-0700
    (PDT)",
    title: "Using the C# driver with MongoDB",
    tags: ["NoSQL", "Mongo", "MongoDB"]}

  22. Secondary Indexes
    Create index on any Field in Document
    // 1 means ascending, -1 means descending
    > db.posts.ensureIndex({author: 1})
    > db.posts.find({author: "sridhar"})
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author : "sridhar",
    ... }

  23. Query Operators
    • Conditional Operators
    • $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
    • $lt, $lte, $gt, $gte
    // find posts with any tags
    > db.posts.find( {tags: {$exists: true }} )
    // find posts matching a regular expression
    > db.posts.find( {author: /^sri*/i } )
    // count posts by author
    > db.posts.find( {author: "sridhar"} ).count()
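
The regular expression on this slide can be checked in plain JavaScript without a database. The sample author names below are made up for illustration. Note that `i*` means "zero or more i", so the pattern accepts any value starting with "sr", case-insensitively.

```javascript
// The pattern from the slide: anchored at the start, case-insensitive.
const pattern = /^sri*/i;

// Hypothetical author values to test against.
const authors = ["sridhar", "Sri", "sr", "SRIII", "srini", "fred", "asri"];

const matches = authors.filter((a) => pattern.test(a));
console.log(matches); // [ 'sridhar', 'Sri', 'sr', 'SRIII', 'srini' ]
```

If the intent is "starts with sri", `/^sri/i` is the tighter pattern.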

  24. Atomic Operations
    • $set, $unset, $inc, $push, $pushAll, $pull,
    $pullAll, $bit
    > comment = { author: "fred",
    date: new Date(),
    text: "Interesting blog post" }
    > db.posts.update( { _id: "..." },
    { $push: {comments: comment} } );
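
To show what a few of these modifiers do to a document, here is a minimal in-memory sketch. `applyUpdate` is a hypothetical helper covering only `$set`, `$inc`, and `$push` on top-level fields; it does not model the server-side atomicity, only the resulting document shape.

```javascript
// Hypothetical in-memory helper; mutates and returns `doc`.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value; // overwrite or create the field
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // a missing field starts at 0
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = []; // create the array on first push
    if (!Array.isArray(doc[field])) throw new TypeError(`${field} is not an array`);
    doc[field].push(item);
  }
  return doc;
}

const blogPost = { title: "Using the C# driver with MongoDB", views: 1 };
applyUpdate(blogPost, {
  $inc: { views: 1 },
  $push: { comments: { author: "fred", text: "Interesting blog post" } },
});
console.log(blogPost.views, blogPost.comments.length); // 2 1
```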

  25. Nested Documents
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author : "sridhar",
    date : "Mon Jul 11 2011 19:47:11 GMT-0700 (PDT)",
    text : "Using the C# driver with MongoDB",
    tags : [ "NoSQL", "Mongo", "MongoDB" ],
    comments : [
    {
    author : "Fred",
    date : "Mon Jul 11 2011 20:51:03 GMT-0700 (PDT)",
    text : "Interesting blog post"
    }
    ]}

  26. Indexes
    // Index nested documents
    > db.posts.ensureIndex( { "comments.author": 1 } )
    > db.posts.find( { "comments.author": "Fred" } )
    // Index on tags
    > db.posts.ensureIndex( { tags: 1 } )
    > db.posts.find( { tags: "Mongo" } )
    // geospatial index
    > db.posts.ensureIndex( { "author.location": "2d" } )
    > db.posts.find( { "author.location": { $near: [22, 42] } } )
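
The dotted field names above reach into embedded documents and arrays. The sketch below is a simplified model of that lookup in plain JavaScript, covering equality matches only; `valuesAtPath` and `matchesPath` are hypothetical helpers, not driver or shell functions.

```javascript
// Collect every value reachable through a dotted path, descending into arrays.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const field of path.split(".")) {
    const next = [];
    for (const node of current) {
      const items = Array.isArray(node) ? node : [node];
      for (const item of items) {
        if (item !== null && typeof item === "object" && field in item) {
          next.push(item[field]);
        }
      }
    }
    current = next;
  }
  // Flatten one level so array-valued fields such as `tags` match per element.
  return current.flat();
}

// True when any value at the path equals `expected`.
function matchesPath(doc, path, expected) {
  return valuesAtPath(doc, path).includes(expected);
}

const post = {
  author: "sridhar",
  tags: ["NoSQL", "Mongo", "MongoDB"],
  comments: [{ author: "Fred", text: "Interesting blog post" }],
};
console.log(matchesPath(post, "comments.author", "Fred")); // true
console.log(matchesPath(post, "tags", "Mongo")); // true
console.log(matchesPath(post, "comments.author", "sridhar")); // false
```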

  27. MongoDB – More
    • Geo-spatial queries
    • Require a geo index
    • Find points near a given point
    • Find points within a polygon/sphere
    • Built-in Map-Reduce
    • The caller provides map and reduce functions
    written in JavaScript
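
The shape of caller-provided map and reduce functions can be tried out in memory. In the sketch below, `runMapReduce` is a hypothetical stand-in for the server, and `emit` is passed to the map function as an argument, whereas in MongoDB `emit` is available globally inside the map function. The sample posts are made up.

```javascript
// Hypothetical in-memory runner: groups emitted values by key, then reduces each group.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the current document
  const results = {};
  for (const [key, values] of groups) {
    // A key with a single value is passed through without calling reduce.
    results[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return results;
}

// Map: emit a count of 1 for every tag on the post.
function mapTags(emit) {
  for (const tag of this.tags || []) emit(tag, 1);
}

// Reduce: sum the counts for one tag.
function sumCounts(key, values) {
  return values.reduce((total, n) => total + n, 0);
}

const posts = [
  { author: "sridhar", tags: ["NoSQL", "Mongo", "MongoDB"] },
  { author: "fred", tags: ["Mongo"] },
];
console.log(runMapReduce(posts, mapTags, sumCounts));
// { NoSQL: 1, Mongo: 2, MongoDB: 1 }
```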

  28. Scaling MongoDB
    • Replication - Read scalability
    • Master/Slave
    • Replica Sets
    • Sharding – Read and write scalability
    • Collections are sharded
    • Each shard is served by its own replica set
    • Shard key ranges are automatically balanced

  29. [Diagram: four shards, each a replica set of one Primary and two Secondaries, holding key ranges 0..30, 31..60, 61..90, and 91..100; reads and writes pass through MongoS routers]
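
The key ranges on this slide can be read as a routing table. The sketch below shows the idea in plain JavaScript, assuming integer shard keys; the shard names and the helpers `routeByKey` and `shardsForRange` are hypothetical and do not describe how MongoS is implemented.

```javascript
// Key ranges taken from the slide; shard names are hypothetical.
const chunks = [
  { min: 0, max: 30, shard: "shard0" },
  { min: 31, max: 60, shard: "shard1" },
  { min: 61, max: 90, shard: "shard2" },
  { min: 91, max: 100, shard: "shard3" },
];

// A write or a single-key read goes to exactly one shard.
function routeByKey(key) {
  const chunk = chunks.find((c) => key >= c.min && key <= c.max);
  if (!chunk) throw new RangeError(`no chunk covers shard key ${key}`);
  return chunk.shard;
}

// A range query only needs the shards whose ranges overlap [low, high].
function shardsForRange(low, high) {
  return chunks.filter((c) => c.max >= low && c.min <= high).map((c) => c.shard);
}

console.log(routeByKey(42)); // shard1
console.log(shardsForRange(25, 65)); // [ 'shard0', 'shard1', 'shard2' ]
```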

  30. MongoDB Access
    • Drivers are available in many languages
    • 10gen supported
    • C, C# (.Net), C++, Erlang, Haskell, Java,
    JavaScript, Perl, PHP, Python, Ruby, Scala
    • Community supported
    • Clojure, ColdFusion, F#, Go, Groovy, Lua, R
    • http://www.mongodb.org/display/DOCS/Overview+-+Writing+Drivers+and+Tools

  31. V2.0
    • Pretty soon
    • Better concurrency
    • Faster data compaction
    • Faster map/reduce
    • TTL collections
    • Geospatial polygons
    • Hash shard key
    • Index 2.0 (smaller+faster)

  32. Future – a short list
    • Full text Search
    • More concurrency
    • Online compaction
    • Internal compression
    • New aggregation framework
    Vote: http://jira.mongodb.org

  33. MongoDB Availability
    • Source
    • https://github.com/mongodb/mongo
    • Server
    • License: AGPL
    • http://www.mongodb.org/downloads
    • Drivers
    • License: Apache
    • http://www.mongodb.org/display/DOCS/Drivers

  34. Use cases and customers
    • Use cases
    • Case studies

  35. Content Management

  36. try at try.mongodb.org

  37. @mongodb
    conferences, appearances, and meetups
    http://www.10gen.com/events
    http://bit.ly/mongofb
    Facebook | Twitter | LinkedIn
    http://linkd.in/joinmongo
    download at mongodb.org
    We’re Hiring!
    [email protected]
    @snanjund
