Slide 1

Slide 1 text

Ross Lawley - [email protected] @rossc0 Introduction

Slide 2

Slide 2 text

Database Landscape
[Figure: chart plotting depth of functionality against scalability & performance, positioning memcached, key/value stores, and RDBMS]

Slide 3

Slide 3 text

What is NoSQL?
• Key-value stores
• Document stores
• Column-oriented databases
• Graph databases

Slide 4

Slide 4 text

Key-Value Stores
• A mapping from a key to a value
• The store doesn't know anything about the key or value
• The store doesn't know anything about the insides of the value
• Operations
  • Set, get, or delete a key-value pair
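The three operations above can be sketched with an in-memory map. This is only an illustration of the idea, not how memcached or any particular store is built; the class name `KeyValueStore` and the sample key are hypothetical.

```javascript
// Minimal in-memory key-value store (illustrative only).
// Values are treated as opaque: the store never inspects them.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key); // undefined when the key is absent
  }
  delete(key) {
    return this.data.delete(key); // true if something was removed
  }
}

const store = new KeyValueStore();
store.set("session:42", '{"user":"ross"}'); // stored as an opaque string
console.log(store.get("session:42"));
store.delete("session:42");
console.log(store.get("session:42")); // undefined
```

Because the value is just a string to the store, there is no way to ask for "all sessions where user is ross" without fetching and parsing every value.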

Slide 5

Slide 5 text

Document Stores
• The store is a container for documents
  • Documents are made up of named fields
  • Fields may or may not have type definitions
    • e.g. XSDs for XML stores, vs. schema-less JSON stores
• Can create "secondary indexes"
  • These provide the ability to query on any document field(s)
• Operations:
  • Insert and delete documents
  • Update fields within documents
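A rough sketch of why secondary indexes matter, using plain JavaScript objects as documents. The `Collection` class and its method names are hypothetical and cover only insert and lookup; real document stores use B-trees and handle updates and deletes as well.

```javascript
// Toy document collection with optional secondary indexes (hypothetical names).
class Collection {
  constructor() {
    this.docs = new Map();    // _id -> document
    this.indexes = new Map(); // field name -> Map(value -> Set of _id)
    this.nextId = 1;
  }
  ensureIndex(field) {
    const index = new Map();
    for (const doc of this.docs.values()) this.addToIndex(index, field, doc);
    this.indexes.set(field, index);
  }
  addToIndex(index, field, doc) {
    if (!(field in doc)) return;
    if (!index.has(doc[field])) index.set(doc[field], new Set());
    index.get(doc[field]).add(doc._id);
  }
  insert(doc) {
    const stored = { _id: this.nextId++, ...doc };
    this.docs.set(stored._id, stored);
    // Keep every existing index up to date with the new document.
    for (const [field, index] of this.indexes) this.addToIndex(index, field, stored);
    return stored._id;
  }
  findBy(field, value) {
    const index = this.indexes.get(field);
    if (index) {
      // Indexed lookup: jump straight to the matching ids.
      return [...(index.get(value) || [])].map((id) => this.docs.get(id));
    }
    // No index: scan every document.
    return [...this.docs.values()].filter((d) => d[field] === value);
  }
}

const blog = new Collection();
blog.insert({ author: "Ross", text: "About MongoDB..." });
blog.insert({ author: "Fred", text: "Best Post Ever!" });
blog.ensureIndex("author");
console.log(blog.findBy("author", "Ross").length); // 1
```

Unlike the key-value case, the store can see the fields, so it can index them and answer queries on them.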

Slide 6

Slide 6 text

Column-Oriented Stores
• Like a relational store, but flipped around: all data for a column is kept together
  • An index provides a means to get a column value for a record
• Operations:
  • Get, insert, delete records; updating fields
  • Streaming column data in and out of Hadoop
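The "flipped around" layout can be illustrated by pivoting a few records into one array per column. This is a toy sketch that assumes every record has the same fields; the sample data and function names are made up for the example.

```javascript
// The same three records, first row-oriented, then column-oriented.
const rows = [
  { id: 1, author: "Ross", views: 10 },
  { id: 2, author: "Fred", views: 25 },
  { id: 3, author: "Ross", views: 7 },
];

// Pivot: one array per column; position i in every array belongs to record i.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [name, value] of Object.entries(record)) {
      (columns[name] = columns[name] || []).push(value);
    }
  }
  return columns;
}

const columns = toColumns(rows);

// Aggregating one column touches only that column's array.
const totalViews = columns.views.reduce((sum, v) => sum + v, 0);
console.log(totalViews); // 42

// Rebuilding a full record means reading position i from every column.
function getRecord(cols, i) {
  return Object.fromEntries(
    Object.entries(cols).map(([name, values]) => [name, values[i]])
  );
}
console.log(getRecord(columns, 1).author); // Fred
```

The trade-off shows up directly: scans over one column are cheap, while fetching a whole record touches every column.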

Slide 7

Slide 7 text

Graph Databases
• Stores vertex-to-vertex edges
• Operations:
  • Getting and setting edges
  • Sometimes possible to annotate vertices or edges
  • Query languages support finding paths between vertices, subject to various constraints
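A small sketch of those operations: setting and getting annotated edges, and a breadth-first search that finds a path whose edges all satisfy a constraint. The `Graph` class, the vertex names, and the `type` annotation are hypothetical; real graph databases expose this through a query language rather than a callback.

```javascript
// Toy directed graph with edge annotations and a breadth-first path search.
class Graph {
  constructor() {
    this.edges = new Map(); // vertex -> Map(neighbor -> annotation)
  }
  setEdge(from, to, annotation = {}) {
    if (!this.edges.has(from)) this.edges.set(from, new Map());
    this.edges.get(from).set(to, annotation);
  }
  getEdge(from, to) {
    const out = this.edges.get(from);
    return out ? out.get(to) : undefined;
  }
  // Returns a path with the fewest edges whose edges all satisfy `allow`, or null.
  findPath(start, goal, allow = () => true) {
    const previous = new Map([[start, null]]);
    const queue = [start];
    while (queue.length > 0) {
      const vertex = queue.shift();
      if (vertex === goal) {
        const path = [];
        for (let v = goal; v !== null; v = previous.get(v)) path.unshift(v);
        return path;
      }
      for (const [next, annotation] of this.edges.get(vertex) || []) {
        if (!previous.has(next) && allow(annotation)) {
          previous.set(next, vertex);
          queue.push(next);
        }
      }
    }
    return null;
  }
}

const g = new Graph();
g.setEdge("ross", "fred", { type: "follows" });
g.setEdge("fred", "ann", { type: "follows" });
g.setEdge("ross", "ann", { type: "blocks" });
console.log(g.findPath("ross", "ann")); // [ 'ross', 'ann' ]
console.log(g.findPath("ross", "ann", (e) => e.type === "follows")); // [ 'ross', 'fred', 'ann' ]
```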

Slide 8

Slide 8 text

What is mongoDB?
MongoDB is a scalable, high-performance, open source NoSQL database.
• Document-oriented storage
• Full Index Support
• Replication & High Availability
• Auto-Sharding
• Querying
• Fast In-Place Updates
• Map/Reduce
• GridFS

Slide 9

Slide 9 text

• Company behind mongoDB
  – (A)GPL license, own copyrights, engineering team
  – support, consulting, commercial license
• Management
  – Google/DoubleClick, Oracle, Apple, NetApp
  – Funding: Sequoia, Union Square, Flybridge
  – Offices in NYC, Palo Alto, London, Dublin
  – 90+ employees

Slide 10

Slide 10 text

History
• First release – February 2009
• v1.0 – August 2009
• v1.2 – December 2009 – MapReduce, ++
• v1.4 – March 2010 – Concurrency, Geo
• v1.6 – August 2010 – Sharding, Replica Sets
• v1.8 – March 2011 – Journaling, Geosphere
• v2.0 – September 2011 – V1 Indexes, Concurrency
• v2.2 – Soon – Aggregation, Concurrency

Slide 11

Slide 11 text

MongoDB Access
Drivers are available in many languages
10gen supported
• C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
Community supported
• Clojure, ColdFusion, F#, Go, Groovy, Lua, R ...
• http://www.mongodb.org/display/DOCS/Drivers

Slide 12

Slide 12 text

Terminology

RDBMS     MongoDB
Table     Collection
Row(s)    JSON Document
Index     Index
Join      Embedding & Linking
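The last row of the table is the one that most changes how data is modelled. A minimal sketch of the two alternatives to a join, using plain objects; the field name `comment_ids`, the helper `resolveComments`, and the second comment are hypothetical.

```javascript
// Embedding: the comments live inside the post document.
const embeddedPost = {
  _id: 1,
  author: "Ross",
  comments: [{ author: "Fred", text: "Best Post Ever!" }],
};

// Linking: the post stores only ids; the comments are separate documents.
const linkedPost = { _id: 2, author: "Ross", comment_ids: [10, 11] };
const commentsById = new Map([
  [10, { _id: 10, author: "Fred", text: "Best Post Ever!" }],
  [11, { _id: 11, author: "Ann", text: "Agreed." }],
]);

// With linking, the application resolves the references itself.
function resolveComments(post, lookup) {
  return post.comment_ids.map((id) => lookup.get(id));
}

console.log(embeddedPost.comments.map((c) => c.author)); // [ 'Fred' ]
console.log(resolveComments(linkedPost, commentsById).map((c) => c.author)); // [ 'Fred', 'Ann' ]
```

Embedding returns everything in one read; linking costs an extra lookup but keeps the referenced documents independent of the post.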

Slide 13

Slide 13 text

Documents
Blog Post Document
> p = { author: "Ross",
        date: new Date(),
        text: "About MongoDB...",
        tags: ["tech", "databases"] }
> db.posts.save(p)

Slide 14

Slide 14 text

Querying
> db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Ross",
  date : ISODate("2012-02-02T11:52:27.442Z"),
  text : "About MongoDB...",
  tags : [ "tech", "databases" ] }
Notes: _id is unique, but can be anything you'd like

Slide 15

Slide 15 text

Secondary Indexes
Create index on any Field in Document
// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1})
> db.posts.find({author: 'Ross'})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "Ross",
  ... }

Slide 16

Slide 16 text

Query Operators
Conditional Operators
- $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
- $lt, $lte, $gt, $gte
// find posts with any tags
> db.posts.find({tags: {$exists: true }})
// find posts matching a regular expression
> db.posts.find({author: /^ro*/i })
// count posts by author
> db.posts.find({author: 'Ross'}).count()
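To make the semantics of a few of these operators concrete, here is a deliberately simplified matcher over plain objects. It is not MongoDB's implementation: it handles only top-level scalar fields, a subset of the operators, and throws on any operator it does not know. The `votes` field and the `matches` function are hypothetical.

```javascript
// Simplified matcher for a handful of operators; NOT MongoDB's implementation.
const operators = {
  $exists: (value, arg) => (value !== undefined) === arg,
  $ne: (value, arg) => value !== arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
};

function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition instanceof RegExp) {
      return typeof value === "string" && condition.test(value);
    }
    if (condition !== null && typeof condition === "object") {
      // Operator document, e.g. { $gte: 1, $lt: 5 }: every operator must hold.
      return Object.entries(condition).every(([op, arg]) => operators[op](value, arg));
    }
    return value === condition; // plain equality
  });
}

const samplePosts = [
  { author: "Ross", tags: ["tech"], votes: 5 },
  { author: "Fred", votes: 1 },
];
console.log(samplePosts.filter((p) => matches(p, { tags: { $exists: true } })).length); // 1
console.log(samplePosts.filter((p) => matches(p, { votes: { $gte: 1, $lt: 5 } })).length); // 1
console.log(samplePosts.filter((p) => matches(p, { author: /^ro/i })).length); // 1
```

Several operators on the same field combine as a logical AND, which is how range queries such as "at least 1 and less than 5" are written.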

Slide 17

Slide 17 text

Examine the query plan
> db.posts.find({"author": 'Ross'}).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 0,
  "indexBounds" : {
    "author" : [ [ "Ross", "Ross" ] ]
  }
}

Slide 18

Slide 18 text

Atomic Operations
$set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
// Create a comment
> new_comment = { author: "Fred",
                  date: new Date(),
                  text: "Best Post Ever!" }
// Add to post
> db.posts.update({ _id: "..." },
                  { "$push": {comments: new_comment},
                    "$inc": {comment_count: 1} });
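What $set, $inc, and $push do to a document can be sketched on a plain object. This models only the resulting document, not the atomicity the server provides, and it ignores the other operators and dotted field paths; `applyUpdate` is a hypothetical helper.

```javascript
// Simplified $set / $inc / $push applied to a plain object; NOT MongoDB's code.
function applyUpdate(doc, update) {
  const result = { ...doc }; // shallow copy so the input is left untouched
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // a missing field starts at 0
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    result[field] = [...(result[field] || []), item]; // a missing field becomes a new array
  }
  return result;
}

const before = { _id: 1, author: "Ross" };
const after = applyUpdate(before, {
  $push: { comments: { author: "Fred", text: "Best Post Ever!" } },
  $inc: { comment_count: 1 },
});
console.log(after.comments.length, after.comment_count); // 1 1
console.log(before.comments); // undefined
```

The point of sending operators instead of a whole new document is that the change is described relative to the current state, so two clients adding comments at the same time do not overwrite each other.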

Slide 19

Slide 19 text

Nested Documents
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Ross",
  date : "Thu Feb 02 2012 11:50:01",
  text : "About MongoDB...",
  tags : [ "tech", "databases" ],
  comments : [{
    author : "Fred",
    date : "Fri Feb 03 2012 13:23:11",
    text : "Best Post Ever!"
  }],
  comment_count : 1
}

Slide 21

Slide 21 text

Secondary Indexes
// Index nested documents
> db.posts.ensureIndex({"comments.author": 1})
> db.posts.find({"comments.author": "Fred"})
// Index on tags (multi-key index)
> db.posts.ensureIndex({tags: 1})
> db.posts.find({tags: "tech"})
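A multi-key index holds one entry per array element, which is why a query for a single tag can use it. A rough in-memory sketch of that idea, with a hypothetical `buildMultikeyIndex` helper and made-up sample documents:

```javascript
// Sketch of a multikey index: one index entry per array element.
function buildMultikeyIndex(docs, field) {
  const index = new Map(); // element value -> array of _id
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue;
    const keys = Array.isArray(value) ? value : [value];
    for (const key of new Set(keys)) { // de-duplicate repeated elements
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

const taggedPosts = [
  { _id: 1, tags: ["tech", "databases"] },
  { _id: 2, tags: ["tech"] },
  { _id: 3, tags: ["travel"] },
];
const tagIndex = buildMultikeyIndex(taggedPosts, "tags");
console.log(tagIndex.get("tech")); // [ 1, 2 ]
console.log(tagIndex.get("databases")); // [ 1 ]
```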

Slide 22

Slide 22 text

Geo
• Geo-spatial queries
  • Require a geo index
  • Find points near a given point
  • Find points within a polygon/sphere
// geospatial index
> db.posts.ensureIndex({"author.location": "2d"})
> db.posts.find({"author.location": {$near: [22, 42]}})
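The effect of a "near" query, results ordered by distance from a point, can be imitated with a brute-force sort over flat 2D coordinates. This says nothing about how the geo index works internally; the `near` function, its `limit` default, and the sample places are assumptions made for the example.

```javascript
// Brute-force stand-in for a "near" query on flat 2D coordinates.
function near(docs, field, [x, y], limit = 100) {
  const distance = (doc) => Math.hypot(doc[field][0] - x, doc[field][1] - y);
  return docs
    .filter((doc) => Array.isArray(doc[field])) // skip documents without a location
    .map((doc) => ({ doc, d: distance(doc) }))
    .sort((a, b) => a.d - b.d)                  // closest first
    .slice(0, limit)
    .map((entry) => entry.doc);
}

const places = [
  { name: "far", location: [30, 50] },
  { name: "close", location: [22, 43] },
  { name: "exact", location: [22, 42] },
  { name: "nowhere" },
];
console.log(near(places, "location", [22, 42]).map((p) => p.name));
// [ 'exact', 'close', 'far' ]
```

A geo index exists to avoid exactly this full scan and sort.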

Slide 23

Slide 23 text

Map Reduce
The caller provides map and reduce functions written in JavaScript
// Emit each tag
> map = function() {
    this.tags.forEach(function(item) { emit(item, 1); });
  }
// Calculate totals
> reduce = function(key, values) {
    var total = 0;
    var valuesSize = values.length;
    for (var i = 0; i < valuesSize; i++) {
      total += parseInt(values[i], 10);
    }
    return total;
  }

Slide 24

Slide 24 text

Map Reduce
// run the map reduce
> db.posts.mapReduce(map, reduce, {"out": { inline : 1}});
{
  "results" : [
    {"_id" : "databases", "value" : 1},
    {"_id" : "tech", "value" : 1 }
  ],
  "timeMillis" : 1,
  "counts" : {
    "input" : 1,
    "emit" : 2,
    "reduce" : 0,
    "output" : 2
  },
  "ok" : 1
}
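The map, group, reduce flow can be imitated in memory to see where the numbers in the output come from. This is a sketch, not the server's implementation: `runMapReduce` is a hypothetical helper, and `emit` is passed to the map function as an argument here, whereas in the shell it is a global.

```javascript
// In-memory imitation of the map -> group -> reduce flow; NOT the server's implementation.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the document, as in the shell
  return [...groups].map(([key, values]) => ({
    _id: key,
    // A key with a single emitted value skips reduce, matching "reduce" : 0 above.
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const mapTags = function (emit) {
  this.tags.forEach((tag) => emit(tag, 1));
};
const sumValues = (key, values) => values.reduce((total, v) => total + v, 0);

const results = runMapReduce(
  [{ tags: ["tech", "databases"] }, { tags: ["tech"] }],
  mapTags,
  sumValues
);
console.log(results);
// [ { _id: 'tech', value: 2 }, { _id: 'databases', value: 1 } ]
```

With the single post from the earlier slides, each tag is emitted once, so reduce is never called and every value is 1.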

Slide 25

Slide 25 text

GridFS
Save files in mongoDB
Stream data back to the client
# (Python) Create a new instance of GridFS
>>> import gridfs
>>> fs = gridfs.GridFS(db)
# Save file to mongo (open in binary mode)
>>> my_image = open('my_image.jpg', 'rb')
>>> file_id = fs.put(my_image)
# Read file
>>> fs.get(file_id).read()

Slide 26

Slide 26 text

Rich Documents
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  line_items : [
    { sku: 'tt-123', name: 'Coltrane: Impressions' },
    { sku: 'tt-457', name: 'Davis: Kind of Blue' }
  ],
  address : { name: 'Banker', street: '111 Main', zip: 10010 },
  payment: { cc: 4567, exp: new Date(2012, 7, 7) },
  subtotal: 2355
}

Slide 27

Slide 27 text

Scaling MongoDB
• Replication – Read scalability
  • Replica Sets
• Sharding – Read and write scalability
  • Collections are sharded
  • Each shard is served by its own replica set
  • Shard key ranges are automatically balanced
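Range-based routing on a shard key can be sketched as a lookup in a table of key ranges. The ranges, shard names, and the `shardFor` function below are hypothetical, and the sketch assumes lowercase string keys; balancing would amount to changing which shard a range points to.

```javascript
// Sketch of range-based routing on a shard key (hypothetical ranges and shard names).
// Each chunk covers [min, max); together the chunks cover the key space without gaps.
const chunks = [
  { min: "", max: "h", shard: "shardA" },
  { min: "h", max: "q", shard: "shardB" },
  { min: "q", max: "\uffff", shard: "shardC" },
];

function shardFor(key, chunkTable) {
  const chunk = chunkTable.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`no chunk covers key ${key}`);
  return chunk.shard;
}

console.log(shardFor("fred", chunks)); // shardA
console.log(shardFor("mongo", chunks)); // shardB
console.log(shardFor("ross", chunks)); // shardC
```

Because every read and write for a given key goes to exactly one shard, adding shards spreads both kinds of load, which is the difference from replication alone.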

Slide 28

Slide 28 text

@mongodb
conferences, appearances, and meetups
http://www.10gen.com/events
http://bit.ly/mongofb
Facebook | Twitter | LinkedIn
http://linkd.in/joinmongo
download at mongodb.org
support, training, and this talk brought to you by