mongoDB Brussels - mongoDB an introduction

rozza
February 06, 2012

An introduction to mongoDB and its place in the noSQL world. A quick look at its core features, from how to use it to scaling mongoDB.

Transcript

  1. Key-Value Stores

    • A mapping from a key to a value
    • The store doesn't know anything about the key or value
    • The store doesn't know anything about the insides of the value
    • Operations
      • Set, get, or delete a key-value pair
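The three operations on this slide can be sketched in a few lines of Python. This is only an illustrative in-memory toy, not how any particular store is implemented; the class name `KeyValueStore` and its method names are assumptions made for the example.

```python
class KeyValueStore:
    """Toy in-memory key-value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # The store never inspects the value, it only files it under the key.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed, False otherwise.
        existed = key in self._data
        self._data.pop(key, None)
        return existed


store = KeyValueStore()
store.set("post:1", b'{"author": "Ross"}')
```

Because the value is just bytes to the store, any query on the author would have to fetch the value and parse it on the client side.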
  2. Document Stores

    • The store is a container for documents
    • Documents are made up of named fields
      • Fields may or may not have type definitions
      • e.g. XSDs for XML stores, vs. schema-less JSON stores
    • Can create "secondary indexes"
      • These provide the ability to query on any document field(s)
    • Operations:
      • Insert and delete documents
      • Update fields within documents
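A minimal sketch of the idea, assuming an in-memory store: documents are plain dicts, and a secondary index is a mapping from a field value to the ids of the documents that carry it. `DocumentStore`, `create_index`, and `find` are hypothetical names for this example, and indexed values are assumed to be hashable.

```python
from collections import defaultdict


class DocumentStore:
    """Toy document store with optional per-field secondary indexes."""

    def __init__(self):
        self._docs = {}      # _id -> document
        self._indexes = {}   # field name -> {value -> set of _ids}
        self._next_id = 1

    def create_index(self, field):
        # Build the index from the documents already stored.
        index = defaultdict(set)
        for doc_id, doc in self._docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self._indexes[field] = index

    def insert(self, doc):
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = dict(doc, _id=doc_id)
        # Keep every existing index up to date.
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        return doc_id

    def find(self, field, value):
        if field in self._indexes:
            # Indexed lookup: no scan over all documents.
            ids = self._indexes[field].get(value, set())
            return [self._docs[i] for i in sorted(ids)]
        # Fallback: full scan.
        return [d for d in self._docs.values() if d.get(field) == value]
```

Updates and deletes are left out to keep the sketch short; a real store also has to maintain its indexes on those operations.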
  3. Column-Oriented Stores

    • Like a relational store, but flipped around: all data for a column is kept together
      • An index provides a means to get a column value for a record
    • Operations:
      • Get, insert, and delete records; update fields
      • Streaming column data in and out of Hadoop
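The "flipped around" layout can be illustrated with ordinary Python lists. The record data and the names `columns`, `position`, and `column_value` are made up for this sketch; real column stores add compression and on-disk layout that are not shown here.

```python
# Row layout: one dict per record.
records = [
    {"id": 1, "author": "Ross", "views": 10},
    {"id": 2, "author": "Fred", "views": 25},
    {"id": 3, "author": "Ross", "views": 7},
]

# Column layout: one list per column, aligned by position.
columns = {name: [r[name] for r in records] for name in records[0]}

# An index from record id to position recovers a column value for a record.
position = {rec_id: i for i, rec_id in enumerate(columns["id"])}


def column_value(rec_id, column):
    """Look up a single column value for one record."""
    return columns[column][position[rec_id]]


# Aggregating one column touches only that column's data.
total_views = sum(columns["views"])
```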
  4. Graph Databases

    • Stores vertex-to-vertex edges
    • Operations:
      • Getting and setting edges
      • Sometimes possible to annotate vertices or edges
      • Query languages support finding paths between vertices, subject to various constraints
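As a rough illustration of "setting edges" and "finding paths", here is a small adjacency-set graph with a breadth-first search. It assumes directed, unannotated edges; the `Graph` class and its methods are hypothetical names for this example, not a real graph database API.

```python
from collections import deque


class Graph:
    """Toy directed graph stored as vertex -> set of neighbours."""

    def __init__(self):
        self._edges = {}

    def add_edge(self, src, dst):
        self._edges.setdefault(src, set()).add(dst)
        self._edges.setdefault(dst, set())

    def neighbours(self, vertex):
        return self._edges.get(vertex, set())

    def shortest_path(self, start, goal):
        """Breadth-first search; returns a list of vertices or None."""
        if start not in self._edges or goal not in self._edges:
            return None
        previous = {start: None}
        queue = deque([start])
        while queue:
            vertex = queue.popleft()
            if vertex == goal:
                # Walk the predecessor links back to the start.
                path = []
                while vertex is not None:
                    path.append(vertex)
                    vertex = previous[vertex]
                return path[::-1]
            for nxt in self._edges[vertex]:
                if nxt not in previous:
                    previous[nxt] = vertex
                    queue.append(nxt)
        return None
```

A constraint such as "only follow edges of a given type" would become a filter inside the loop over neighbours.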
  5. What is mongoDB?

    MongoDB is a scalable, high-performance, open source NoSQL database.
    • Document-oriented storage
    • Full Index Support
    • Replication & High Availability
    • Auto-Sharding
    • Querying
    • Fast In-Place Updates
    • Map/Reduce
    • GridFS
  6. • Company behind mongoDB

      – (A)GPL license, own copyrights, engineering team
      – support, consulting, commercial license
    • Management
      – Google/DoubleClick, Oracle, Apple, NetApp
      – Funding: Sequoia, Union Square, Flybridge
      – Offices in NYC, Palo Alto, London, Dublin
      – 90+ employees
  7. History

    • First release – February 2009
    • v1.0 - August 2009
    • v1.2 - December 2009 – MapReduce, ++
    • v1.4 - March 2010 – Concurrency, Geo
    • v1.6 - August 2010 – Sharding, Replica Sets
    • v1.8 - March 2011 – Journaling, Geosphere
    • v2.0 - Sep 2011 – V1 Indexes, Concurrency
    • v2.2 - Soon – Aggregation, Concurrency
  8. MongoDB Access

    Drivers are available in many languages

    10gen supported
    • C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
    Community supported
    • Clojure, ColdFusion, F#, Go, Groovy, Lua, R ...
    • http://www.mongodb.org/display/DOCS/Drivers
  9. Documents: Blog Post Document

    > p = { author: "Ross",
            date: new Date(),
            text: "About MongoDB...",
            tags: ["tech", "databases"]}
    > db.posts.save(p)
  10. Querying

    > db.posts.find()
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Ross",
      date : ISODate("2012-02-02T11:52:27.442Z"),
      text : "About MongoDB...",
      tags : [ "tech", "databases" ] }

    Notes: _id is unique, but can be anything you'd like
  11. Secondary Indexes

    Create index on any Field in Document

    // 1 means ascending, -1 means descending
    > db.posts.ensureIndex({author: 1})
    > db.posts.find({author: 'Ross'})
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Ross", ... }
  12. Query Operators

    Conditional Operators
    - $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
    - $lt, $lte, $gt, $gte

    // find posts with any tags
    > db.posts.find({tags: {$exists: true}})
    // find posts matching a regular expression
    > db.posts.find({author: /^ro*/i })
    // count posts by author
    > db.posts.find({author: 'Ross'}).count()
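To make the operator syntax concrete, here is a small Python matcher that evaluates a handful of these conditions against an in-memory dict. It is a sketch only: it covers `$exists`, `$in`, and the four range operators, it is not the server's matching logic, and the function name `matches` is an assumption for the example.

```python
import operator

# Range operators mapped onto Python comparisons.
COMPARISONS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(doc, query):
    """Return True if doc satisfies every condition in query."""
    for field, condition in query.items():
        present = field in doc
        value = doc.get(field)
        if not isinstance(condition, dict):
            # Plain value: simple equality test.
            if value != condition:
                return False
            continue
        for op, arg in condition.items():
            if op == "$exists":
                ok = present == bool(arg)
            elif op == "$in":
                ok = present and value in arg
            elif op in COMPARISONS:
                ok = present and COMPARISONS[op](value, arg)
            else:
                raise ValueError("unsupported operator: " + op)
            if not ok:
                return False
    return True
```

For example, `matches(post, {"tags": {"$exists": True}})` plays the role of the first shell query on the slide, applied to a single document.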
  13. Examine the query plan

    > db.posts.find({"author": 'Ross'}).explain()
    {
      "cursor" : "BtreeCursor author_1",
      "nscanned" : 1,
      "nscannedObjects" : 1,
      "n" : 1,
      "millis" : 0,
      "indexBounds" : { "author" : [ [ "Ross", "Ross" ] ] }
    }
  14. Atomic Operations

    $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit

    // Create a comment
    > new_comment = { author: "Fred",
                      date: new Date(),
                      text: "Best Post Ever!"}
    // Add to post
    > db.posts.update({ _id: "..." },
                      {"$push": {comments: new_comment},
                       "$inc": {comments_count: 1} });
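The effect of an update document such as the one above can be sketched in Python by applying each modifier to a dict. This only illustrates what `$set`, `$unset`, `$inc`, and `$push` do to a document; it does not reproduce the server's atomicity guarantee, and `apply_update` is a hypothetical helper name.

```python
def apply_update(doc, update):
    """Apply a subset of update modifiers to doc in place and return it."""
    for op, changes in update.items():
        for field, arg in changes.items():
            if op == "$set":
                doc[field] = arg
            elif op == "$unset":
                doc.pop(field, None)
            elif op == "$inc":
                # A missing field is treated as 0 before incrementing.
                doc[field] = doc.get(field, 0) + arg
            elif op == "$push":
                # A missing field becomes a new one-element array.
                doc.setdefault(field, []).append(arg)
            else:
                raise ValueError("unsupported modifier: " + op)
    return doc


post = {"author": "Ross", "text": "About MongoDB..."}
apply_update(post, {
    "$push": {"comments": {"author": "Fred", "text": "Best Post Ever!"}},
    "$inc": {"comments_count": 1},
})
```

Both changes arrive in one update document, which is what lets the comment and its counter stay in step on the server side.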
  15. Nested Documents

    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Ross",
      date : "Thu Feb 02 2012 11:50:01",
      text : "About MongoDB...",
      tags : [ "tech", "databases" ],
      comments : [{ author : "Fred",
                    date : "Fri Feb 03 2012 13:23:11",
                    text : "Best Post Ever!" }],
      comments_count : 1 }
  17. Secondary Indexes

    // Index nested documents
    > db.posts.ensureIndex({"comments.author": 1})
    > db.posts.find({"comments.author": "Fred"})
    // Index on tags (multi-key index)
    > db.posts.ensureIndex({tags: 1})
    > db.posts.find({tags: "tech"})
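The idea behind a multi-key index is that an array field contributes one index entry per element. A minimal Python sketch, assuming hashable array elements and using list positions in place of document ids; `build_multikey_index` is a made-up name for the example.

```python
from collections import defaultdict


def build_multikey_index(docs, field):
    """Map each value of `field` to the positions of the docs containing it."""
    index = defaultdict(list)
    for pos, doc in enumerate(docs):
        if field not in doc:
            continue
        value = doc[field]
        values = value if isinstance(value, list) else [value]
        # dict.fromkeys drops duplicates within one array but keeps order.
        for v in dict.fromkeys(values):
            index[v].append(pos)
    return index


posts = [
    {"author": "Ross", "tags": ["tech", "databases"]},
    {"author": "Fred", "tags": ["tech"]},
    {"author": "Ann"},
]
tag_index = build_multikey_index(posts, "tags")
```

Looking up `tag_index["tech"]` then answers the same question as the `find({tags: "tech"})` query on the slide without scanning every post.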
  18. Geo

    • Geo-spatial queries
    • Require a geo index
    • Find points near a given point
    • Find points within a polygon/sphere

    // geospatial index
    > db.posts.ensureIndex({"author.location": "2d"})
    > db.posts.find({"author.location": {$near: [22, 42]}})
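What a "near" query returns can be illustrated by sorting points by distance from the query point. The sketch below assumes flat-plane (Euclidean) distance and a plain sort; it says nothing about how the geo index is organised internally, and the function name `near` is chosen just for this example.

```python
import math


def near(points, centre, limit=None):
    """Return points ordered by flat (Euclidean) distance from centre."""
    ordered = sorted(
        points,
        key=lambda p: math.hypot(p[0] - centre[0], p[1] - centre[1]),
    )
    return ordered if limit is None else ordered[:limit]


locations = [(0, 0), (22, 41), (23, 45)]
closest_first = near(locations, (22, 42))
```

A geo index exists precisely to avoid this full sort when the collection is large.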
  19. Map Reduce

    The caller provides map and reduce functions written in JavaScript

    // Emit each tag
    > map = "this['tags'].forEach( function(item) {emit(item, 1);} );"
    // Calculate totals
    > reduce = "function(key, values) { var total = 0; var valuesSize = values.length; for (var i=0; i < valuesSize; i++) { total += parseInt(values[i], 10); } return total; };"
  20. Map Reduce

    // run the map reduce
    > db.posts.mapReduce(map, reduce, {"out": { inline : 1}});
    {
      "results" : [
        {"_id" : "databases", "value" : 1},
        {"_id" : "tech", "value" : 1 }
      ],
      "timeMillis" : 1,
      "counts" : { "input" : 1, "emit" : 2, "reduce" : 0, "output" : 2 },
      "ok" : 1
    }
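The same tag count can be walked through in plain Python to show the three phases: map emits (key, value) pairs, the pairs are grouped by key, and reduce folds each group. This is an in-memory sketch, not the server's implementation; `map_reduce`, `emit_tags`, and `sum_values` are names invented for the example. Reduce is skipped for keys with a single emitted value, which is consistent with the `"reduce" : 0` count in the output above.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Run map over every doc, group emitted values by key, then reduce."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    results = {}
    for key, values in grouped.items():
        # A key with one value needs no reduce call.
        results[key] = values[0] if len(values) == 1 else reduce_fn(key, values)
    return results


def emit_tags(doc):
    # Emit each tag with a count of 1.
    for tag in doc.get("tags", []):
        yield tag, 1


def sum_values(key, values):
    # Calculate the total for one key.
    return sum(values)


posts = [{"author": "Ross", "tags": ["tech", "databases"]}]
tag_counts = map_reduce(posts, emit_tags, sum_values)
```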
  21. GridFS

    Save files in mongoDB
    Stream data back to the client

    # (Python) Create a new instance of GridFS
    >>> fs = gridfs.GridFS(db)
    # Save file to mongo
    >>> my_image = open('my_image.jpg', 'rb')
    >>> file_id = fs.put(my_image)
    # Read file
    >>> fs.get(file_id).read()
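GridFS stores a file as a sequence of chunk documents, each carrying a sequence number `n` and a slice of the data. The sketch below shows only that splitting and reassembly on in-memory bytes; the helper names and the tiny chunk size are assumptions for illustration, and the real driver also records file metadata that is not modelled here.

```python
def split_into_chunks(data, chunk_size):
    """Split bytes into ordered chunk dicts of at most chunk_size bytes."""
    return [
        {"n": offset // chunk_size, "data": data[offset:offset + chunk_size]}
        for offset in range(0, len(data), chunk_size)
    ]


def join_chunks(chunks):
    """Reassemble the original bytes, whatever order the chunks arrive in."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


payload = bytes(range(10))
chunks = split_into_chunks(payload, 4)  # chunk size of 4 is for illustration only
```

Because each chunk is a separate document, a reader can stream a large file back piece by piece instead of loading it whole.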
  22. Rich Documents

    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      line_items : [ { sku: 'tt-123', name: 'Coltrane: Impressions' },
                     { sku: 'tt-457', name: 'Davis: Kind of Blue' } ],
      address : { name: 'Banker', street: '111 Main', zip: 10010 },
      payment: { cc: 4567, exp: new Date(2012, 7, 7) },
      subtotal: 2355 }
  23. Scaling MongoDB

    • Replication - Read scalability
      • Replica Sets
    • Sharding – Read and write scalability
      • Collections are sharded
      • Each shard is served by its own replica set
      • Shard key ranges are automatically balanced
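Range-based routing on a shard key can be sketched with a sorted list of split points and a binary search. This simplified example maps each key range to exactly one shard and does no balancing; `RangeRouter`, the split points, and the shard names are all hypothetical.

```python
import bisect


class RangeRouter:
    """Route a shard-key value to the shard that owns its key range."""

    def __init__(self, split_points, shards):
        if list(split_points) != sorted(split_points):
            raise ValueError("split_points must be sorted")
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = list(split_points)
        self.shards = list(shards)

    def shard_for(self, key):
        # Each range includes its lower bound and excludes its upper bound.
        return self.shards[bisect.bisect_right(self.split_points, key)]


router = RangeRouter(["g", "n"], ["shard0", "shard1", "shard2"])
```

With these split points, authors sorting before "g" go to `shard0`, those from "g" up to but not including "n" go to `shard1`, and the rest go to `shard2`. Balancing would amount to moving split points or reassigning ranges as data grows.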
  24. @mongodb

    conferences, appearances, and meetups
    http://www.10gen.com/events
    http://bit.ly/mongofb
    Facebook | Twitter | LinkedIn
    http://linkd.in/joinmongo
    download at mongodb.org
    support, training, and this talk brought to you by