Building scalable applications with mongoDB

rozza
June 28, 2012

Transcript

  1. AGENDA
    Origins of mongoDB
      > Who is 10gen
      > Design goals of mongoDB
    Data models in mongoDB
      > Flexible Schemas
      > Rich queries and atomic operations
      > Developer happiness and agility
    Scaling mongoDB
      > Replication
      > Disaster Recovery
      > Sharding (scaling horizontally)
  2. Founded in 2007
      > Dwight Merriman, Eliot Horowitz
      > Doubleclick, Oracle, Marklogic, HP
    $74M+ in funding
      > Flybridge, Sequoia, Union Square, New Enterprise Associates
    Worldwide Expanding Team
      > 120+ employees
      > NY, CA, IE, UK, AUS
    Foster community ecosystem
    Provide MongoDB management services
    Provide commercial services
    Set the direction & contribute code to MongoDB
  3. Scaling RDBMS is frustrating
    Cost of database increases
      > Vertical, not horizontal, scaling
      > High cost of SAN
    [Timeline: launch, +30 Days, +60 Days, +6 months, +1 year]
  4. Productivity decreases
      > Needed to add new software layers of ORM, Caching, Sharding, Message Queue
      > Polymorphic, semi-structured and unstructured data not well supported
    [Timeline: Project start, Denormalize data model, Stop using joins, Custom caching layer, Custom sharding]
  5. Evolution in computing
    Volume of Data
      > Trillions of records
      > 100s of millions of queries per second
    Agile Development
      > Iterative
      > Continuous deployment
    New Hardware Architecture
      > Commodity servers
      > Cloud Computing
  6. MongoDB design goals
    JSON Documents
      > Rich data models
      > Seamlessly map to native programming language
      > Flexible for dynamic data
      > Better data locality
    Simplicity
      > Few configuration options
      > Does the right thing out of the box
      > Easy to deploy and manage
    General Purpose DBMS
      > Sophisticated secondary indexes
      > Dynamic queries
      > Sorting
      > Rich updates, upserts
      > Easy aggregation
    Scaling
      > Scale linearly
      > Increase capacity with no downtime
      > Transparent to the application
  7. Developers already model to objects
    from django.db import models

    class Post(models.Model):
        author = models.CharField(max_length=250)
        title = models.CharField(max_length=250)
        body = models.TextField()
        date = models.DateTimeField('date')
        tags = models.ManyToManyField('Tag')
        comments = models.ManyToManyField('Comment')

    class Tag(models.Model):
        text = models.CharField(max_length=250)

    class Comment(models.Model):
        author = models.CharField(max_length=250)
        body = models.TextField()
        date = models.DateTimeField('date')
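  8. In a relational database
    Table          Columns
    post           id, author, title, body, date
    post_tags      id, post_id, tag_id
    tag            id, text
    post_comments  id, post_id, comment_id
    comment        id, author, body, date
    [Diagram: post links to tag and to comment through the two join tables, each relationship marked 0..*]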
  9. In mongoDB
    {
      _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Ross",
      title : "Building scalable applications",
      body : "About MongoDB...",
      date : ISODate("2012-06-27T14:30:00.000Z"),
      tags : [ "tech", "databases" ],
      comments : [{
        author : "Fred",
        date : ISODate("2012-06-27T14:35:00.000Z"),
        body : "Thanks, I'll look into it"
      }]
    }
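To make the contrast between the last two slides concrete, here is a minimal sketch in plain Python (no database or driver involved). The row data, the integer ids and the `assemble` helper are illustrative assumptions, not part of the original deck. It shows the join work needed to rebuild from five relational tables what the document model stores as one structure.

```python
# Rows as they might sit in the five relational tables (illustrative data).
posts = [{"id": 1, "author": "Ross", "title": "Building scalable applications"}]
tags = [{"id": 1, "text": "tech"}, {"id": 2, "text": "databases"}]
post_tags = [{"post_id": 1, "tag_id": 1}, {"post_id": 1, "tag_id": 2}]
comments = [{"id": 1, "author": "Fred", "body": "Thanks, I'll look into it"}]
post_comments = [{"post_id": 1, "comment_id": 1}]


def assemble(post_id):
    """Join the five tables into one embedded document (hypothetical helper)."""
    post = next(p for p in posts if p["id"] == post_id)
    tag_ids = {pt["tag_id"] for pt in post_tags if pt["post_id"] == post_id}
    comment_ids = {
        pc["comment_id"] for pc in post_comments if pc["post_id"] == post_id
    }
    return {
        "author": post["author"],
        "title": post["title"],
        "tags": [t["text"] for t in tags if t["id"] in tag_ids],
        "comments": [
            {"author": c["author"], "body": c["body"]}
            for c in comments
            if c["id"] in comment_ids
        ],
    }


document = assemble(1)
```

With the embedded model, `document` is what gets stored and read back directly, so the application skips the assembly step.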
  10. Where can you use it?
    MongoDB is implemented in C++
      > Platforms: 32/64 bit Windows, Linux, Mac OS X, FreeBSD, Solaris
    Drivers are available in many languages
    10gen supported
      > C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala, node.js
    Community supported
      > Clojure, ColdFusion, F#, Go, Groovy, Lua, R ...
    http://www.mongodb.org/display/DOCS/Drivers
  11. Terminology
    RDBMS          MongoDB
    Table          Collection
    Row(s)         JSON Document
    Index          Index
    Join           Embedding & Linking
    Partition      Shard
    Partition Key  Shard Key
  12. Flexible schemas
    > p = { author: "Ross",
            date: new Date(),
            text: "About MongoDB...",
            tags: ["tech", "databases"]}
    > db.posts.save(p)
  13. Flexible schemas
    > db.posts.find()
    {
      _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Ross",
      date : ISODate("2012-06-27T14:30:00.000Z"),
      text : "About MongoDB...",
      tags : [ "tech", "databases" ]
    }
    Notes: _id is unique, but can be anything you'd like
  14. Introducing BSON
    JSON has a powerful but limited set of data types
      > arrays, objects, strings, numbers, booleans and null
    BSON is a binary representation of JSON
      > Adds extra data types such as Date, Int types, Id, …
      > Optimised for performance and navigational abilities
      > And compression
    MongoDB sends and stores data in BSON
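As an illustration of what "binary representation" means here, the following is a minimal sketch of a BSON encoder that handles only two value types, UTF-8 strings and 32-bit integers. The layout follows the public BSON specification (a little-endian int32 total length, typed elements, a trailing null byte). The function names are hypothetical, and a real driver supports many more types.

```python
import struct


def encode_cstring(text):
    """Field names are stored as null-terminated UTF-8 strings."""
    return text.encode("utf-8") + b"\x00"


def encode_element(name, value):
    """Encode one field: a type byte, the field name, then the value."""
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # Type 0x02: string, prefixed with its byte length (including the null).
        return b"\x02" + encode_cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, int) and not isinstance(value, bool):
        # Type 0x10: 32-bit signed integer, little-endian.
        return b"\x10" + encode_cstring(name) + struct.pack("<i", value)
    raise TypeError("this sketch only supports str and int32 values")


def encode_document(doc):
    """A document is its total length, its elements, and a closing null byte."""
    body = b"".join(encode_element(k, v) for k, v in doc.items()) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


encoded = encode_document({"hello": "world"})
```

The length prefixes are what make the format easy to navigate: a reader can skip over a string value without scanning it for a terminator.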
  15. Finding data
    Conditional Operators
      - $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
      - $lt, $lte, $gt, $gte
    // find posts matching a regular expression
    > db.posts.find({author: /^ro*/i })
    // find posts with any tags
    > db.posts.find({tags: {$exists: true }})
    // count posts where "Ross" has commented (dotted field names must be quoted)
    > db.posts.find({"comments.author": "Ross"}).count()
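To show how such query documents can be read, here is a rough sketch of a matcher over plain Python dicts. It covers only equality, `$exists`, `$in` and the four comparison operators, plus dotted paths that reach into arrays of sub-documents. `get_path` and `matches` are hypothetical helpers written for this example and are not how the server evaluates queries.

```python
OPERATORS = {
    "$gt": lambda value, arg: value > arg,
    "$gte": lambda value, arg: value >= arg,
    "$lt": lambda value, arg: value < arg,
    "$lte": lambda value, arg: value <= arg,
    "$in": lambda value, arg: value in arg,
}


def get_path(doc, path):
    """Collect every value reachable via a dotted path, descending into arrays."""
    values = [doc]
    for key in path.split("."):
        found = []
        for value in values:
            candidates = value if isinstance(value, list) else [value]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    found.append(candidate[key])
        values = found
    return values


def matches(doc, query):
    """Return True if doc satisfies every condition in the query document."""
    for path, condition in query.items():
        values = get_path(doc, path)
        # Array fields match when any single element matches.
        flat = []
        for value in values:
            flat.extend(value if isinstance(value, list) else [value])
        if isinstance(condition, dict):
            for op, arg in condition.items():
                if op == "$exists":
                    if bool(values) != arg:
                        return False
                elif not any(OPERATORS[op](v, arg) for v in flat):
                    return False
        elif condition not in flat:
            return False
    return True


posts = [
    {"author": "Ross", "tags": ["tech", "databases"],
     "comments": [{"author": "Fred"}]},
    {"author": "Tim", "comments": [{"author": "Ross"}]},
]
commented_by_ross = [p["author"] for p in posts
                     if matches(p, {"comments.author": "Ross"})]
with_tags = [p["author"] for p in posts if matches(p, {"tags": {"$exists": True}})]
```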
  16. Indexes
      > Create index on any field at any level in a document
      > Supports compound indexes
      > Can index arrays
    // Ensure index (1 ascending, -1 descending)
    > db.posts.ensureIndex({author: 1})
    > db.posts.findOne({author: 'Ross'})
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Ross", ... }
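A small sketch of why an index helps: the example below compares a lookup through a sorted index with a full collection scan and reports how many entries each one examines. A sorted Python list with `bisect` stands in for the server's B-tree, and the sample data and function names are assumptions made for illustration.

```python
import bisect

posts = [
    {"author": "Tim", "title": "Sharding"},
    {"author": "Ross", "title": "Building scalable applications"},
    {"author": "Fred", "title": "Replication"},
]

# The "index": (key, position) pairs kept in key order, like {author: 1}.
index = sorted((post["author"], pos) for pos, post in enumerate(posts))
keys = [key for key, _ in index]


def find_with_index(author):
    """Binary-search the index, then fetch only the matching documents."""
    low = bisect.bisect_left(keys, author)
    high = bisect.bisect_right(keys, author)
    found = [posts[pos] for _, pos in index[low:high]]
    return found, high - low  # documents, index entries examined


def find_with_scan(author):
    """Without an index every document has to be examined."""
    found = [post for post in posts if post["author"] == author]
    return found, len(posts)


indexed_docs, indexed_scanned = find_with_index("Ross")
scanned_docs, scan_scanned = find_with_scan("Ross")
```

Both calls return the same document, but the indexed lookup examines one entry while the scan examines all three.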
  17. Examine the query plan
    > db.posts.find({"author": 'Ross'}).explain()
    {
      "cursor" : "BtreeCursor author_1",
      "nscanned" : 1,
      "nscannedObjects" : 1,
      "n" : 1,
      "millis" : 0,
      "indexBounds" : {
        "author" : [ [ "Ross", "Ross" ] ]
      }
    }
  18. Atomic Operations
    $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
    // Create a comment
    > new_comment = { author: "Tim", date: new Date(), text: "Best Post Ever!"}
    // Add to post
    > db.posts.update({ _id: "..." },
                      {"$push": {comments: new_comment},
                       "$inc": {comment_count: 1} });
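The effect of an update document like the one above can be sketched on a plain Python dict. This hypothetical `apply_update` helper handles only `$set`, `$inc` and `$push` on top-level fields, and it models only the resulting document: on the server the whole update is applied to a single document atomically, which this single-threaded sketch does not demonstrate.

```python
def apply_update(doc, update):
    """Apply a small subset of update operators to a dict, in place."""
    for field, value in update.get("$set", {}).items():
        doc[field] = value
    for field, amount in update.get("$inc", {}).items():
        # A missing field is treated as 0 before incrementing.
        doc[field] = doc.get(field, 0) + amount
    for field, item in update.get("$push", {}).items():
        # A missing field becomes a new one-element array.
        doc.setdefault(field, []).append(item)
    return doc


post = {"author": "Ross", "text": "About MongoDB..."}
new_comment = {"author": "Tim", "text": "Best Post Ever!"}
apply_update(post, {"$push": {"comments": new_comment},
                    "$inc": {"comment_count": 1}})
```

Because the push and the counter increment travel in one update, readers never observe the comment without the matching count.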
  19. Rich documents
    {
      _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Ross",
      date : ISODate("2012-06-27T14:30:00.000Z"),
      text : "About MongoDB...",
      tags : [ "tech", "databases" ],
      comments : [{
        author : "Tim",
        date : ISODate("2012-06-27T14:35:00.000Z"),
        text : "Best Post Ever!"
      }],
      comment_count : 1
    }
  20. Geo-spatial support
    Geo-spatial queries
      > Require a geo index
      > Find points near a given point
      > Find points within a polygon/sphere
    // geospatial index
    > db.posts.ensureIndex({"author.location": "2d"})
    > db.posts.find({ "author.location" : { $near : [22, 42] }})
    [{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author: {location: [22, 43]}, ...}]
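What a "near" query returns can be sketched without an index: sort the candidate points by flat, planar distance from the query point. The sample data, the `near` function and its `limit` parameter are assumptions made for this example. A real geo index avoids the linear scan that this sketch performs.

```python
import math


def near(docs, point, limit=None):
    """Return docs ordered by planar distance from point, closest first."""
    def distance(doc):
        x, y = doc["location"]
        return math.hypot(x - point[0], y - point[1])

    ordered = sorted(docs, key=distance)
    return ordered if limit is None else ordered[:limit]


places = [
    {"name": "a", "location": [22, 43]},
    {"name": "b", "location": [30, 50]},
    {"name": "c", "location": [22, 42]},
]
closest_first = [p["name"] for p in near(places, [22, 42])]
```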
  21. GridFS
    Save files in mongoDB
    Stream data back to the client
    # (Python) Create a new instance of GridFS; db is an existing pymongo database
    >>> import gridfs
    >>> fs = gridfs.GridFS(db)
    # Save file to mongo (open the image in binary mode)
    >>> my_image = open('my_image.jpg', 'rb')
    >>> file_id = fs.put(my_image)
    # Read file
    >>> fs.get(file_id).read()
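Behind `put` and `get`, GridFS stores a file as a metadata document plus a sequence of chunk documents. The sketch below imitates only that chunking idea on in-memory bytes. The helper names are hypothetical, the `files_id`, `n` and `data` keys mirror the fields GridFS uses for chunks, and the tiny chunk size is chosen purely for illustration.

```python
def split_into_chunks(file_id, data, chunk_size):
    """Cut a byte string into numbered chunk documents."""
    return [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def read_back(chunks):
    """Reassemble the original bytes by concatenating chunks in order of n."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


original = b"abcdefghij"
chunks = split_into_chunks("file-1", original, chunk_size=4)
restored = read_back(chunks)
```

Storing chunks separately is what allows a file to be streamed back piece by piece instead of being loaded whole.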
  22. Aggregation - coming in 2.2
    Describe a chain of operations to apply to your data.
    // Count tags
    > agg = db.posts.aggregate(
        {$unwind: "$tags"},
        {$group : {_id : "$tags", count : {$sum: 1}}}
      )
    > agg.result
    [{"_id": "databases", "count": 1}, {"_id": "tech", "count": 1}]
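The two pipeline stages can be mimicked in plain Python to show what each one does to the data. `unwind` and `group_count` are hypothetical stand-ins for `$unwind` and for `$group` with a `$sum: 1` counter. The output is sorted by key here only to make it deterministic, which is a choice of this sketch and not a property of the pipeline.

```python
from collections import Counter


def unwind(docs, field):
    """Emit one copy of each document per element of its array field."""
    for doc in docs:
        for item in doc.get(field, []):
            yield {**doc, field: item}


def group_count(docs, field):
    """Group documents by a field and count the members of each group."""
    counts = Counter(doc[field] for doc in docs)
    return [{"_id": key, "count": total} for key, total in sorted(counts.items())]


posts = [{"author": "Ross", "tags": ["tech", "databases"]}]
result = group_count(unwind(posts, "tags"), "tags")
```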
  23. Data modelling in mongoDB
    > Schema less
        – Storage relates to how you actually use the data
        – Inherently agile
    > Rich query language
        – For ad hoc queries
        – Atomic updates at a document level
    > Flexible data models
        – Embed data or link documents depending on usage
        – Warning! Not all models perform at large scale
  24. Replication features
    > Single master system - Primary always consistent
    > Automatic failover if a Primary fails
    > Automatic recovery when a node joins the set
    > Can be used to scale reads
    > Full control over writes using write concerns
    > Easy to administer and manage
  25. How mongoDB replication works
    Replica set is made up of 2 or more nodes
    [Diagram: three nodes A, B and C]
  26. How mongoDB replication works
    PRIMARY may fail
    Automatic election of new PRIMARY if majority exists
    [Diagram: the PRIMARY is DOWN; the two secondaries (S) negotiate a new master]
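The "if majority exists" condition can be written down as a one-line check. This is a simplified sketch that assumes every member has exactly one vote, and the function name is hypothetical.

```python
def can_elect_primary(total_members, reachable_members):
    """A new primary can be elected only if a strict majority is reachable."""
    return reachable_members > total_members // 2


three_node_one_down = can_elect_primary(total_members=3, reachable_members=2)
two_node_one_down = can_elect_primary(total_members=2, reachable_members=1)
```

In the three-node set from the slide, the two surviving secondaries form a majority and can elect a new primary. A two-node set that loses one member cannot, which is one reason odd member counts are a common choice.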
  27. Advanced replication features
    > Durability via write concerns
        – On a connection, database, collection and query level
        – Tag nodes and direct writes to specific nodes / data centers
    > Scaling reads
        – Not applicable for all applications
        – Secondaries can be used for backups, analytics, data processing
    > Prioritisation
        – Prefer specific nodes to be primary
        – Ensure certain nodes are never primary
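  28. Example Setup
    [Diagram: replica set members spread across London, Reading and the Cloud, with labels "Primary Data Centre" and "Backups / Analytics Server" and member priorities p:10, p:10, p:5, p:0, p:1]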
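How priorities shape the choice of primary can be sketched as follows. The member names and the mapping of priorities to locations are assumptions, since the diagram's layout is not recoverable from the transcript. Real elections also take into account how up to date each member is, which this sketch ignores.

```python
def preferred_primary(members):
    """Pick the reachable member with the highest non-zero priority."""
    eligible = [m for m in members if m["up"] and m["priority"] > 0]
    # Priority 0 members are never eligible; None means no candidate exists.
    return max(eligible, key=lambda m: m["priority"], default=None)


# Hypothetical members using the priorities shown on the slide.
members = [
    {"name": "london-1", "priority": 10, "up": True},
    {"name": "london-2", "priority": 10, "up": True},
    {"name": "reading-1", "priority": 5, "up": True},
    {"name": "cloud-1", "priority": 1, "up": True},
    {"name": "backup-1", "priority": 0, "up": True},
]
current_choice = preferred_primary(members)
```

A priority of 0 is how a backups or analytics node can stay in the set and receive data without ever becoming primary.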
  29. Horizontal scale out
    [Diagram: shard1, shard2 and shard3, each made up of three MongoD processes, with "read" and "write" labels]
  30. MongoDB sharding
    > Range based
    > Automatic partitioning and management
    > Convert to sharded system with no downtime
    > Fully consistent
  31. How mongoDB sharding works
    Range keys from -∞ to +∞
    Ranges are stored as "chunks"
    [Diagram: a single chunk covering -∞ to +∞]
    > db.runCommand({addshard: "shard1"});
    > db.runCommand({shardCollection: "mydb.users", key: {age: 1}})
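  32. How mongoDB sharding works
    > db.users.save({age: 40})
    > db.users.save({age: 50})
    > db.users.save({age: 60})
    [Diagram: the chunk splits as documents are saved]
      -∞ .. +∞
      -∞ .. 40 | 41 .. +∞
      -∞ .. 40 | 41 .. 50 | 51 .. +∞
      -∞ .. 40 | 41 .. 50 | 51 .. 60 | 61 .. +∞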
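Range-based routing over chunks like these can be sketched with a sorted list of split points and a binary search. The split points follow the slide's ranges, while the chunk-to-shard assignment and the function names are hypothetical. In this sketch each chunk includes its lower bound and excludes its upper bound.

```python
import bisect

# Chunk boundaries on the shard key "age", taken from the slide's ranges.
split_points = [41, 51, 61]
# Hypothetical placement of the four chunks on three shards.
chunk_owner = ["shard1", "shard2", "shard3", "shard1"]


def chunk_index(age):
    """Find which chunk a shard key value falls into."""
    return bisect.bisect_right(split_points, age)


def chunk_range(index):
    """Return the (inclusive lower, exclusive upper) bounds of a chunk."""
    low = split_points[index - 1] if index > 0 else float("-inf")
    high = split_points[index] if index < len(split_points) else float("inf")
    return low, high


def route(age):
    """Return the shard that should receive a document with this key."""
    return chunk_owner[chunk_index(age)]


targets = [route(age) for age in (40, 50, 60)]
```

Because routing depends only on the shard key, the application can keep issuing the same `save` calls while chunks split or move between shards.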
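  33. How mongoDB sharding works
    > db.users.save({age: 40})
    > db.users.save({age: 50})
    > db.users.save({age: 60})
    [Diagram: the same chunk splits, with the chunks shown on shard1]
      -∞ .. +∞
      -∞ .. 40 | 41 .. +∞
      -∞ .. 40 | 41 .. 50 | 51 .. +∞
      -∞ .. 40 | 41 .. 50 | 51 .. 60 | 61 .. +∞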
  34. Architecture
    [Diagram]
      Config Servers: C1, C2, C3
      Two app servers and two mongos routers
      Shard 1 to Shard 4: each shard is a Replica Set of one primary and two secondaries
  35. There are many use cases
    > User Data Management
    > High Volume Data Feeds
    > Content Management
    > Operational Intelligence
    > Product Data Management