mongoDB Brussels - mongoDB an introduction

rozza
February 06, 2012

An introduction to mongoDB and its place in the noSQL world. A quick look at its core features, from how to use it to scaling mongoDB.

Transcript

  1. Key-Value Stores

    • A mapping from a key to a value
    • The store doesn't know anything about the key or value
    • The store doesn't know anything about the insides of the value
    • Operations
      • Set, get, or delete a key-value pair
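
The set, get and delete operations above can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular product is built; the `KeyValueStore` class and the `user:1` key are made up for the example.

```javascript
// Hypothetical in-memory key-value store; names are illustrative only.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value); // the value is stored as-is and never inspected
  }
  get(key) {
    return this.data.get(key); // undefined when the key is absent
  }
  delete(key) {
    return this.data.delete(key); // true if a pair was removed
  }
}

const store = new KeyValueStore();
// The store sees only an opaque string; it cannot query on "name".
store.set("user:1", JSON.stringify({ name: "Ross" }));
console.log(store.get("user:1"));
store.delete("user:1");
console.log(store.get("user:1"));
```

Because the value is opaque, any lookup other than "by key" has to be done by the application.
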
  2. Document Stores

    • The store is a container for documents
    • Documents are made up of named fields
      • Fields may or may not have type definitions
      • e.g. XSDs for XML stores, vs. schema-less JSON stores
    • Can create "secondary indexes"
      • These provide the ability to query on any document field(s)
    • Operations:
      • Insert and delete documents
      • Update fields within documents
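
To make the idea of a secondary index concrete, here is a rough in-memory sketch. The `Collection` class and its `findBy` method are hypothetical and only borrow the name `ensureIndex` from the mongo shell; real document stores use on-disk B-trees rather than a `Map`.

```javascript
// Hypothetical in-memory document collection with optional per-field indexes.
class Collection {
  constructor() {
    this.docs = new Map();    // _id -> document
    this.indexes = new Map(); // field name -> Map(value -> Set of _id)
    this.nextId = 1;
  }
  addToIndex(index, field, doc) {
    if (!(field in doc)) return; // schema-less: the field may be missing
    if (!index.has(doc[field])) index.set(doc[field], new Set());
    index.get(doc[field]).add(doc._id);
  }
  ensureIndex(field) {
    const index = new Map();
    for (const doc of this.docs.values()) this.addToIndex(index, field, doc);
    this.indexes.set(field, index);
  }
  insert(doc) {
    const stored = { _id: this.nextId++, ...doc };
    this.docs.set(stored._id, stored);
    for (const [field, index] of this.indexes) {
      this.addToIndex(index, field, stored);
    }
    return stored._id;
  }
  findBy(field, value) {
    const index = this.indexes.get(field);
    if (index) {
      // Indexed lookup: jump straight to the matching ids.
      return [...(index.get(value) || [])].map((id) => this.docs.get(id));
    }
    // No index: fall back to scanning every document.
    return [...this.docs.values()].filter((d) => d[field] === value);
  }
}

const posts = new Collection();
posts.insert({ author: "Ross", text: "About MongoDB..." });
posts.insert({ author: "Fred" }); // a different shape is fine
posts.ensureIndex("author");
console.log(posts.findBy("author", "Ross"));
```
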
  3. Column-Oriented Stores

    • Like a relational store, but flipped around: all data for a column is kept together
      • An index provides a means to get a column value for a record
    • Operations:
      • Get, insert, delete records; updating fields
      • Streaming column data in and out of Hadoop
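
The "flipped around" layout can be illustrated with plain arrays. This is only a sketch of the idea, assuming made-up sample records; the helper names `toColumns` and `getRecord` are hypothetical.

```javascript
// Row-oriented layout: each record keeps all of its fields together.
const rows = [
  { id: 1, author: "Ross", views: 10 },
  { id: 2, author: "Fred", views: 25 },
  { id: 3, author: "Ross", views: 5 },
];

// Column-oriented layout: each column keeps all of its values together.
function toColumns(records) {
  const columns = {};
  records.forEach((record, position) => {
    for (const [field, value] of Object.entries(record)) {
      if (!columns[field]) {
        columns[field] = new Array(records.length).fill(null);
      }
      columns[field][position] = value;
    }
  });
  return columns;
}

// The record position acts as the index tying column values to a record.
function getRecord(cols, position) {
  const record = {};
  for (const field of Object.keys(cols)) {
    record[field] = cols[field][position];
  }
  return record;
}

const columns = toColumns(rows);
// Aggregating one column only touches that column's array.
const totalViews = columns.views.reduce((sum, v) => sum + v, 0);
console.log(totalViews, getRecord(columns, 1));
```
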
  4. Graph Databases

    • Stores vertex-to-vertex edges
    • Operations:
      • Getting and setting edges
      • Sometimes possible to annotate vertices or edges
      • Query languages support finding paths between vertices, subject to various constraints
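
As a rough illustration of setting edges and finding a path between vertices, the sketch below uses an adjacency list and a breadth-first search. The `Graph` class and vertex names are invented for the example; real graph databases expose this through their own query languages.

```javascript
// Hypothetical adjacency-list graph with directed edges.
class Graph {
  constructor() {
    this.edges = new Map(); // vertex -> Set of neighbouring vertices
  }
  setEdge(from, to) {
    if (!this.edges.has(from)) this.edges.set(from, new Set());
    this.edges.get(from).add(to);
  }
  getEdges(from) {
    return [...(this.edges.get(from) || [])];
  }
  // Breadth-first search: returns a shortest path by edge count, or null.
  findPath(start, goal) {
    const previous = new Map([[start, null]]);
    const queue = [start];
    while (queue.length > 0) {
      const vertex = queue.shift();
      if (vertex === goal) {
        const path = [];
        for (let v = goal; v !== null; v = previous.get(v)) path.unshift(v);
        return path;
      }
      for (const next of this.getEdges(vertex)) {
        if (!previous.has(next)) {
          previous.set(next, vertex);
          queue.push(next);
        }
      }
    }
    return null;
  }
}

const graph = new Graph();
graph.setEdge("ross", "fred");
graph.setEdge("fred", "anna");
graph.setEdge("ross", "dave");
console.log(graph.findPath("ross", "anna"));
```
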
  5. What is mongoDB?

    MongoDB is a scalable, high-performance, open source NoSQL database.

    • Document-oriented storage
    • Full Index Support
    • Replication & High Availability
    • Auto-Sharding
    • Querying
    • Fast In-Place Updates
    • Map/Reduce
    • GridFS
  6. • Company behind mongoDB
      – (A)GPL license, own copyrights, engineering team
      – support, consulting, commercial license
    • Management
      – Google/DoubleClick, Oracle, Apple, NetApp
      – Funding: Sequoia, Union Square, Flybridge
      – Offices in NYC, Palo Alto, London, Dublin
      – 90+ employees
  7. History

    • First release – February 2009
    • v1.0 – August 2009
    • v1.2 – December 2009 – MapReduce, ++
    • v1.4 – March 2010 – Concurrency, Geo
    • v1.6 – August 2010 – Sharding, Replica Sets
    • v1.8 – March 2011 – Journaling, Geosphere
    • v2.0 – Sep 2011 – V1 Indexes, Concurrency
    • v2.2 – Soon – Aggregation, Concurrency
  8. MongoDB Access

    Drivers are available in many languages

    10gen supported
    • C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala

    Community supported
    • Clojure, ColdFusion, F#, Go, Groovy, Lua, R ...
    • http://www.mongodb.org/display/DOCS/Drivers
  9. Documents

    Blog Post Document

    > p = { author: "Ross",
            date: new Date(),
            text: "About MongoDB...",
            tags: ["tech", "databases"]}
    > db.posts.save(p)
  10. Querying

    > db.posts.find()
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Ross",
      date : ISODate("2012-02-02T11:52:27.442Z"),
      text : "About MongoDB...",
      tags : [ "tech", "databases" ] }

    Notes: _id is unique, but can be anything you'd like
  11. Secondary Indexes

    Create index on any Field in Document

    // 1 means ascending, -1 means descending
    > db.posts.ensureIndex({author: 1})
    > db.posts.find({author: 'Ross'})
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Ross", ... }
  12. Query Operators

    Conditional Operators
    - $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
    - $lt, $lte, $gt, $gte

    // find posts with any tags
    > db.posts.find({tags: {$exists: true}})
    // find posts matching a regular expression
    > db.posts.find({author: /^ro*/i })
    // count posts by author
    > db.posts.find({author: 'Ross'}).count()
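
To show what a few of these conditional operators mean, here is a small matcher over plain objects. It is a sketch only: the `matches` function and `operators` table are hypothetical, cover just a subset of the operators listed, and ignore MongoDB behaviours such as matching inside array fields or dotted field paths.

```javascript
// Hypothetical evaluation of a handful of query operators on one field value.
const operators = {
  $exists: (value, expected) => (value !== undefined) === expected,
  $ne: (value, expected) => value !== expected,
  $in: (value, list) => list.includes(value),
  $nin: (value, list) => !list.includes(value),
  $lt: (value, bound) => value < bound,
  $lte: (value, bound) => value <= bound,
  $gt: (value, bound) => value > bound,
  $gte: (value, bound) => value >= bound,
  $size: (value, n) => Array.isArray(value) && value.length === n,
};

// Returns true when every field condition in the query holds for the document.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition instanceof RegExp) {
      return typeof value === "string" && condition.test(value);
    }
    if (condition !== null && typeof condition === "object") {
      // Operator document such as {$gt: 5, $lte: 7}: all operators must hold.
      return Object.entries(condition).every(([op, arg]) =>
        operators[op](value, arg)
      );
    }
    return value === condition; // plain equality
  });
}

const samplePosts = [
  { author: "Ross", tags: ["tech", "databases"], votes: 3 },
  { author: "Fred", votes: 7 },
];
console.log(samplePosts.filter((p) => matches(p, { tags: { $exists: true } })));
console.log(samplePosts.filter((p) => matches(p, { author: /^ro/i })).length);
```
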
  13. Examine the query plan

    > db.posts.find({"author": 'Ross'}).explain()
    {
      "cursor" : "BtreeCursor author_1",
      "nscanned" : 1,
      "nscannedObjects" : 1,
      "n" : 1,
      "millis" : 0,
      "indexBounds" : {
        "author" : [ [ "Ross", "Ross" ] ]
      }
    }
  14. Atomic Operations

    $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit

    // Create a comment
    > new_comment = { author: "Fred",
                      date: new Date(),
                      text: "Best Post Ever!"}
    // Add to post
    > db.posts.update({ _id: "..." },
                      {"$push": {comments: new_comment},
                       "$inc": {comment_count: 1} });
  15. Nested Documents

    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Ross",
      date : "Thu Feb 02 2012 11:50:01",
      text : "About MongoDB...",
      tags : [ "tech", "databases" ],
      comments : [{
        author : "Fred",
        date : "Fri Feb 03 2012 13:23:11",
        text : "Best Post Ever!"
      }],
      comment_count : 1 }
  17. Secondary Indexes

    // Index nested documents
    > db.posts.ensureIndex({"comments.author": 1})
    > db.posts.find({"comments.author": "Fred"})

    // Index on tags (multi-key index)
    > db.posts.ensureIndex({tags: 1})
    > db.posts.find({tags: "tech"})
  18. Geo

    • Geo-spatial queries
      • Require a geo index
      • Find points near a given point
      • Find points within a polygon/sphere

    // geospatial index
    > db.posts.ensureIndex({"author.location": "2d"})
    > db.posts.find({"author.location": {$near: [22, 42]}})
  19. Map Reduce

    The caller provides map and reduce functions written in JavaScript

    // Emit each tag
    > map = function() {
        this.tags.forEach(function(item) { emit(item, 1); });
      };
    // Calculate totals
    > reduce = function(key, values) {
        var total = 0;
        var valuesSize = values.length;
        for (var i = 0; i < valuesSize; i++) {
          total += parseInt(values[i], 10);
        }
        return total;
      };
  20. Map Reduce

    // run the map reduce
    > db.posts.mapReduce(map, reduce, {"out": {inline : 1}});
    {
      "results" : [
        {"_id" : "databases", "value" : 1},
        {"_id" : "tech", "value" : 1}
      ],
      "timeMillis" : 1,
      "counts" : {
        "input" : 1,
        "emit" : 2,
        "reduce" : 0,
        "output" : 2
      },
      "ok" : 1
    }
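
The flow of emitted values through map and reduce can be modelled in one process. This is a simplified sketch, not MongoDB's implementation: the `mapReduce` helper is hypothetical, and `emit` is passed to the map function as an argument here, whereas in the mongo shell it is a global.

```javascript
// Hypothetical single-process model of the map/reduce flow.
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> list of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // As in MongoDB, `this` is the current document inside map.
  for (const doc of docs) map.call(doc, emit);
  return [...groups.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, values]) => ({
      _id: key,
      // A key with a single value skips reduce ("reduce" : 0 on the slide).
      value: values.length === 1 ? values[0] : reduce(key, values),
    }));
}

// Emit each tag with a count of 1.
const mapTags = function (emit) {
  this.tags.forEach((tag) => emit(tag, 1));
};
// Sum the counts emitted for one tag.
const sumCounts = (key, values) => values.reduce((total, v) => total + v, 0);

const tagged = [
  { author: "Ross", tags: ["tech", "databases"] },
  { author: "Fred", tags: ["tech"] },
];
const tagCounts = mapReduce(tagged, mapTags, sumCounts);
console.log(tagCounts);
```

With two sample posts instead of one, the `tech` key collects two values and so does go through the reduce function.
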
  21. GridFS

    Save files in mongoDB
    Stream data back to the client

    # (Python) Create a new instance of GridFS
    >>> import gridfs
    >>> fs = gridfs.GridFS(db)
    # Save file to mongo (open in binary mode)
    >>> my_image = open('my_image.jpg', 'rb')
    >>> file_id = fs.put(my_image)
    # Read file
    >>> fs.get(file_id).read()
  22. Rich Documents

    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      line_items : [
        { sku: 'tt-123', name: 'Coltrane: Impressions' },
        { sku: 'tt-457', name: 'Davis: Kind of Blue' }
      ],
      address : { name: 'Banker', street: '111 Main', zip: 10010 },
      payment: { cc: 4567, exp: new Date(2012, 7, 7) },
      subtotal: 2355 }
  23. Scaling MongoDB

    • Replication – Read scalability
      • Replica Sets
    • Sharding – Read and write scalability
      • Collections are sharded
      • Each shard is served by its own replica set
      • Shard key ranges are automatically balanced
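
Range-based sharding can be pictured as a lookup table from shard key ranges to shards. The sketch below assumes a numeric shard key and a made-up chunk table; `shardFor`, `shardsForRange` and the shard names are hypothetical, and automatic balancing is not modelled.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of a numeric shard key.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

// Route a single key (a write, or a read by exact shard key) to one shard.
function shardFor(key, table) {
  return table.find((c) => key >= c.min && key < c.max).shard;
}

// A range query on [low, high) only needs the shards whose chunks overlap it.
function shardsForRange(low, high, table) {
  return table.filter((c) => c.min < high && c.max > low).map((c) => c.shard);
}

console.log(shardFor(42, chunks));
console.log(shardsForRange(900, 1100, chunks));
```

Writes spread across shards because different keys land in different chunks, while queries that include the shard key avoid contacting every shard.
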
  24. download at mongodb.org

    conferences, appearances, and meetups
    http://www.10gen.com/events

    Facebook | Twitter | LinkedIn
    http://bit.ly/mongofb | @mongodb | http://linkd.in/joinmongo

    support, training, and this talk brought to you by