mongoDB Brussels - mongoDB an introduction

Introduction
Ross Lawley - [email protected] @rossc0

Database Landscape
[Figure: axes "depth of functionality" and "scalability & performance"; labeled points: memcached, key/value, RDBMS]

What is NoSQL?
• Key-value stores
• Document stores
• Column-oriented databases
• Graph databases

Key-Value Stores
• A mapping from a key to a value
• The store doesn't know anything about the key or value
• The store doesn't know anything about the insides of the value
• Operations
  • Set, get, or delete a key-value pair
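
The three operations above can be sketched with a plain JavaScript Map standing in for the store. This is only an illustration: the name `store` and the sample key are made up, and the value is treated as an opaque string the store never inspects.

```javascript
// A Map stands in for the key-value store (illustrative only).
const store = new Map();

// set: the store keeps the value as an opaque blob and never looks inside it
store.set("session:42", JSON.stringify({ user: "ross", cart: [1, 2] }));

// get: lookup is by exact key only; there is no way to query on "user"
const raw = store.get("session:42");
const session = JSON.parse(raw); // interpreting the value is the client's job

// delete: removes the pair
store.delete("session:42");

console.log(session.user, store.has("session:42"));
```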

Document Stores
• The store is a container for documents
• Documents are made up of named fields
  • Fields may or may not have type definitions
  • e.g. XSDs for XML stores, vs. schema-less JSON stores
• Can create "secondary indexes"
  • These provide the ability to query on any document field(s)
• Operations:
  • Insert and delete documents
  • Update fields within documents
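
A rough sketch of what a secondary index buys you, in plain JavaScript rather than any real store's implementation. The helper `buildIndex` and the sample documents are hypothetical; for simplicity this sketch skips documents that lack the indexed field.

```javascript
// Documents are plain objects with named fields; no schema is enforced.
const docs = [
  { _id: 1, author: "Ross", tags: ["tech"] },
  { _id: 2, author: "Fred" },
  { _id: 3, author: "Ross", text: "About MongoDB..." },
];

// Build a secondary index: field value -> list of matching _id values.
function buildIndex(documents, field) {
  const index = new Map();
  for (const doc of documents) {
    if (!(field in doc)) continue; // this sketch skips documents without the field
    const ids = index.get(doc[field]) || [];
    ids.push(doc._id);
    index.set(doc[field], ids);
  }
  return index;
}

// Querying by author is now a lookup instead of a scan over every document.
const byAuthor = buildIndex(docs, "author");
const rossIds = byAuthor.get("Ross");
console.log(rossIds);
```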

Column-Oriented Stores
• Like a relational store, but flipped around: all data for a column is kept together
  • An index provides a means to get a column value for a record
• Operations:
  • Get, insert, delete records; updating fields
  • Streaming column data in and out of Hadoop
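
To make "flipped around" concrete, here is a small sketch contrasting the two layouts. The records and field names are invented, and the array position plays the role of the index that maps a record to its column value.

```javascript
// Row-oriented layout: each record's fields are stored together.
const rows = [
  { id: 1, name: "Ann", age: 31 },
  { id: 2, name: "Bob", age: 45 },
  { id: 3, name: "Cy", age: 27 },
];

// Column-oriented layout: all values of one column are stored together.
const columns = { id: [], name: [], age: [] };
for (const row of rows) {
  for (const key of Object.keys(columns)) columns[key].push(row[key]);
}

// Scanning one column touches only that column's data.
const totalAge = columns.age.reduce((sum, age) => sum + age, 0);

// Reassembling the record at position 1 reads one value from every column.
const second = { id: columns.id[1], name: columns.name[1], age: columns.age[1] };
console.log(totalAge, second);
```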

Graph Databases
• Stores vertex-to-vertex edges
• Operations:
  • Getting and setting edges
  • Sometimes possible to annotate vertices or edges
  • Query languages support finding paths between vertices, subject to various constraints
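
A path query with a constraint can be sketched as a breadth-first search over annotated edges. This is plain JavaScript, not a graph database query language; the edge list, the `label` annotation, and the `findPath` helper are all made up for illustration.

```javascript
// Edges are stored vertex-to-vertex; each edge may carry an annotation.
const edges = [
  { from: "a", to: "b", label: "follows" },
  { from: "b", to: "c", label: "follows" },
  { from: "a", to: "d", label: "blocks" },
  { from: "d", to: "c", label: "follows" },
];

// Breadth-first search for a shortest path that only uses edges with `label`.
function findPath(start, goal, label) {
  const queue = [[start]];
  const seen = new Set([start]);
  while (queue.length > 0) {
    const path = queue.shift();
    const last = path[path.length - 1];
    if (last === goal) return path;
    for (const e of edges) {
      if (e.from === last && e.label === label && !seen.has(e.to)) {
        seen.add(e.to);
        queue.push([...path, e.to]);
      }
    }
  }
  return null; // no path satisfies the constraint
}

const path = findPath("a", "c", "follows");
console.log(path);
```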

What is mongoDB?
MongoDB is a scalable, high-performance, open source NoSQL database.
• Document-oriented storage
• Full Index Support
• Replication & High Availability
• Auto-Sharding
• Querying
• Fast In-Place Updates
• Map/Reduce
• GridFS

• Company behind mongoDB
  – (A)GPL license, own copyrights, engineering team
  – support, consulting, commercial license
• Management
  – Google/DoubleClick, Oracle, Apple, NetApp
  – Funding: Sequoia, Union Square, Flybridge
  – Offices in NYC, Palo Alto, London, Dublin
  – 90+ employees

History
• First release – February 2009
• v1.0 - August 2009
• v1.2 - December 2009 – MapReduce, ++
• v1.4 - March 2010 – Concurrency, Geo
• v1.6 - August 2010 – Sharding, Replica Sets
• v1.8 - March 2011 – Journaling, Geosphere
• v2.0 - Sep 2011 – V1 Indexes, Concurrency
• v2.2 - Soon - Aggregation, Concurrency

MongoDB Access
Drivers are available in many languages
10gen supported
• C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
Community supported
• Clojure, ColdFusion, F#, Go, Groovy, Lua, R ...
• http://www.mongodb.org/display/DOCS/Drivers

Terminology
RDBMS     MongoDB
Table     Collection
Row(s)    JSON Document
Index     Index
Join      Embedding & Linking

Documents
Blog Post Document
> p = { author: "Ross",
        date: new Date(),
        text: "About MongoDB...",
        tags: ["tech", "databases"]}
> db.posts.save(p)

Querying
> db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Ross",
  date : ISODate("2012-02-02T11:52:27.442Z"),
  text : "About MongoDB...",
  tags : [ "tech", "databases" ] }
Notes: _id is unique, but can be anything you'd like

Secondary Indexes
Create index on any Field in Document
// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1})
> db.posts.find({author: 'Ross'})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Ross", ... }

Query Operators
Conditional Operators
- $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
- $lt, $lte, $gt, $gte
// find posts with any tags
> db.posts.find({tags: {$exists: true}})
// find posts matching a regular expression
> db.posts.find({author: /^ro*/i })
// count posts by author
> db.posts.find({author: 'Ross'}).count()
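
How a query document is matched against a post can be sketched in plain JavaScript. This is not MongoDB's matcher: it handles only equality, regular expressions, and the five operators listed in `operators`, and the `votes` field is invented for the example.

```javascript
// Minimal matcher for a handful of conditional operators (sketch only).
const operators = {
  $exists: (value, want) => (value !== undefined) === want,
  $in: (value, list) => list.includes(value),
  $ne: (value, other) => value !== other,
  $gt: (value, bound) => value > bound,
  $lte: (value, bound) => value <= bound,
};

function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond instanceof RegExp) return typeof value === "string" && cond.test(value);
    if (cond !== null && typeof cond === "object") {
      // every operator in the condition must hold
      return Object.entries(cond).every(([op, arg]) => operators[op](value, arg));
    }
    return value === cond; // plain equality
  });
}

const posts = [
  { author: "Ross", tags: ["tech", "databases"], votes: 3 },
  { author: "Fred", votes: 7 },
];

const withTags = posts.filter((p) => matches(p, { tags: { $exists: true } }));
const byRegex = posts.filter((p) => matches(p, { author: /^ro/i }));
const popular = posts.filter((p) => matches(p, { votes: { $gt: 5 } }));
console.log(withTags.length, byRegex.length, popular[0].author);
```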

Examine the query plan
> db.posts.find({"author": 'Ross'}).explain()
{ "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 0,
  "indexBounds" : { "author" : [ [ "Ross", "Ross" ] ] } }

Atomic Operations
$set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
// Create a comment
> new_comment = { author: "Fred",
                  date: new Date(),
                  text: "Best Post Ever!"}
// Add to post
> db.posts.update({ _id: "..." },
                  {"$push": {comments: new_comment},
                   "$inc": {comments_count: 1} });
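
The effect of the update document on a single post can be sketched in plain JavaScript. The `applyUpdate` helper is hypothetical and covers only $set, $inc and $push; it shows what the modifiers do to the document, not how the server guarantees that both changes are applied as one atomic step.

```javascript
// Apply a small subset of update operators to an in-memory document.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value;
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // a missing field starts at 0
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = []; // create the array if absent
    doc[field].push(item);
  }
  return doc;
}

const post = { author: "Ross", text: "About MongoDB..." };
const newComment = { author: "Fred", text: "Best Post Ever!" };

applyUpdate(post, {
  $push: { comments: newComment },
  $inc: { comments_count: 1 },
});
console.log(post.comments.length, post.comments_count);
```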

Nested Documents
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Ross",
  date : "Thu Feb 02 2012 11:50:01",
  text : "About MongoDB...",
  tags : [ "tech", "databases" ],
  comments : [{ author : "Fred",
                date : "Fri Feb 03 2012 13:23:11",
                text : "Best Post Ever!" }],
  comments_count : 1 }

Secondary Indexes
// Index nested documents
> db.posts.ensureIndex({"comments.author": 1})
> db.posts.find({"comments.author": "Fred"})
// Index on tags (multi-key index)
> db.posts.ensureIndex({tags: 1})
> db.posts.find({tags: "tech"})
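
Both index types above come down to the same idea: an array produces one index entry per element. The sketch below illustrates that in plain JavaScript; `valuesAt` and `buildIndex` are hypothetical helpers, not MongoDB internals, and the sample posts are invented.

```javascript
// Collect every value reachable through a dotted path, fanning out over arrays.
function valuesAt(doc, path) {
  let current = [doc];
  for (const part of path.split(".")) {
    const next = [];
    for (const item of current) {
      const value = item == null ? undefined : item[part];
      if (Array.isArray(value)) next.push(...value); // one entry per element
      else if (value !== undefined) next.push(value);
    }
    current = next;
  }
  return current;
}

// Index: key -> list of _id values of documents containing that key.
function buildIndex(documents, path) {
  const index = new Map();
  for (const doc of documents) {
    for (const key of new Set(valuesAt(doc, path))) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

const posts = [
  { _id: 1, tags: ["tech", "databases"], comments: [{ author: "Fred" }] },
  { _id: 2, tags: ["tech"], comments: [{ author: "Ann" }, { author: "Fred" }] },
];

const byTag = buildIndex(posts, "tags");
const byCommenter = buildIndex(posts, "comments.author");
console.log(byTag.get("tech"), byCommenter.get("Fred"));
```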

Geo
• Geo-spatial queries
  • Require a geo index
  • Find points near a given point
  • Find points within a polygon/sphere
// geospatial index
> db.posts.ensureIndex({ "author.location": "2d" })
> db.posts.find({ "author.location": { $near: [22, 42] } })
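
What a "near" query returns can be sketched as a brute-force sort by flat 2d distance. This is only an illustration with invented data and a hypothetical `near` helper; the point of a real geo index is to avoid scanning every document like this.

```javascript
// Flat (2d) distance between two [x, y] points.
function distance(a, b) {
  return Math.hypot(a[0] - b[0], a[1] - b[1]);
}

// Return up to `limit` documents ordered from nearest to farthest from `point`.
function near(documents, point, limit) {
  return documents
    .slice() // copy so the caller's array is not reordered
    .sort((p, q) => distance(p.location, point) - distance(q.location, point))
    .slice(0, limit);
}

const places = [
  { name: "far", location: [80, 10] },
  { name: "close", location: [22, 43] },
  { name: "closer", location: [22, 42.5] },
];

const nearest = near(places, [22, 42], 2).map((p) => p.name);
console.log(nearest);
```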

Map Reduce
The caller provides map and reduce functions written in JavaScript
// Emit each tag
> map = function() {
    this.tags.forEach(function(item) { emit(item, 1); });
  };
// Calculate totals
> reduce = function(key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
      total += parseInt(values[i], 10);
    }
    return total;
  };

Map Reduce
// run the map reduce
> db.posts.mapReduce(map, reduce, {"out": { inline : 1}});
{ "results" : [ {"_id" : "databases", "value" : 1},
                {"_id" : "tech", "value" : 1 } ],
  "timeMillis" : 1,
  "counts" : { "input" : 1, "emit" : 2, "reduce" : 0, "output" : 2 },
  "ok" : 1 }
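
The map, group, reduce flow behind that result can be simulated in plain JavaScript. This is a sketch, not the server's implementation: here `emit` is passed to the map function as an argument (in the shell it is provided for you), and the `mapReduce` helper is hypothetical. Reduce is skipped for keys with a single value, which is consistent with "reduce" : 0 in the counts above.

```javascript
// Plain-JavaScript simulation of the map -> group -> reduce flow.
function mapReduce(documents, map, reduce) {
  const groups = new Map();
  // emit() collects values under their key
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of documents) map.call(doc, emit); // `this` is the document

  const results = [];
  for (const [key, values] of groups) {
    // reduce is only needed when a key has more than one value
    const value = values.length > 1 ? reduce(key, values) : values[0];
    results.push({ _id: key, value });
  }
  return results.sort((a, b) => (a._id < b._id ? -1 : 1));
}

const posts = [{ author: "Ross", tags: ["tech", "databases"] }];

const results = mapReduce(
  posts,
  function (emit) { this.tags.forEach((tag) => emit(tag, 1)); },
  (key, values) => values.reduce((total, v) => total + v, 0)
);
console.log(results);
```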

GridFS
Save files in mongoDB
Stream data back to the client
# (Python) Create a new instance of GridFS; db is an existing pymongo database
>>> import gridfs
>>> fs = gridfs.GridFS(db)
# Save file to mongo (open in binary mode)
>>> my_image = open('my_image.jpg', 'rb')
>>> file_id = fs.put(my_image)
# Read file
>>> fs.get(file_id).read()
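
GridFS stores a file as a sequence of chunk documents that are read back in order. The sketch below shows the splitting and reassembly idea in plain JavaScript on Node.js; `toChunks` is a hypothetical helper, the id "file-1" is made up, and the 8-byte chunk size is far smaller than any real setting so that the example stays readable.

```javascript
// Split a file's bytes into numbered chunk documents (sketch only).
function toChunks(fileId, data, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: fileId, // which file this chunk belongs to
      n, // position of the chunk within the file
      data: data.subarray(offset, offset + chunkSize),
    });
  }
  return chunks;
}

const file = Buffer.from("hello gridfs, this is a small file");
const chunks = toChunks("file-1", file, 8);

// Streaming back: read the chunks in order of n and concatenate them.
const restored = Buffer.concat(
  chunks.sort((a, b) => a.n - b.n).map((c) => c.data)
).toString();
console.log(chunks.length, restored);
```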

Rich Documents
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  line_items : [ { sku: 'tt-123', name: 'Coltrane: Impressions' },
                 { sku: 'tt-457', name: 'Davis: Kind of Blue' } ],
  address : { name: 'Banker', street: '111 Main', zip: 10010 },
  payment: { cc: 4567, exp: new Date(2012, 7, 7) },
  subtotal: 2355 }

Scaling MongoDB
• Replication - Read scalability
  • Replica Sets
• Sharding – Read and write scalability
  • Collections are sharded
  • Each shard is served by its own replica set
  • Shard key ranges are automatically balanced
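
Range-based routing on a shard key can be sketched as a lookup over key ranges. The shard names, split points, and the `route` helper are all invented for illustration; in a real cluster the ranges are created and rebalanced automatically rather than written by hand.

```javascript
// Each entry owns a half-open range [min, max) of shard key values.
const chunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "q", shard: "shard1" },
  { min: "q", max: "\uffff", shard: "shard2" },
];

// Route a document to the shard whose range contains its shard key.
function route(shardKeyValue) {
  const chunk = chunks.find(
    (c) => shardKeyValue >= c.min && shardKeyValue < c.max
  );
  return chunk ? chunk.shard : null;
}

console.log(route("fred"), route("mike"), route("ross"));
```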

@mongodb
conferences, appearances, and meetups: http://www.10gen.com/events
http://bit.ly/mongofb
Facebook | Twitter | LinkedIn
http://linkd.in/joinmongo
download at mongodb.org
support, training, and this talk brought to you by
