Slide 1

Slide 1 text

Building Your First MongoDB Application

Slide 2

Slide 2 text

Ross Lawley • Python Engineer @ 10gen • Web developer since 1999 • Passionate about open source • Agile methodology • email: [email protected] • twitter: RossC0

Slide 3

Slide 3 text

Today's Talk • Quick introduction to NoSQL • Some Background about mongoDB • Using mongoDB, general querying and usage • Building your first app, creating a location-based app • Data modelling in mongoDB, queries, geospatial, updates and map reduce (examples work in mongoDB JS shell)

Slide 4

Slide 4 text

What is NoSQL? • Key / Value • Column • Graph • Document

Slide 5

Slide 5 text

Key-Value Stores • A mapping from a key to a value • The store doesn't know anything about the key or value • The store doesn't know anything about the insides of the value • Operations: • Set, get, or delete a key-value pair
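
The three operations above can be sketched as follows. This is a minimal illustration, assuming an in-memory store backed by a JavaScript Map; the `KeyValueStore` class is hypothetical and is not tied to any particular product.

```javascript
// Hypothetical in-memory key-value store: values are opaque to the store.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key); // undefined when the key is absent
  }
  delete(key) {
    return this.data.delete(key); // true if the key existed
  }
}

const store = new KeyValueStore();
// The application serializes the value; the store never looks inside it.
store.set("user:1", JSON.stringify({ name: "Ross" }));
const raw = store.get("user:1");
const removed = store.delete("user:1");
```

Because the store cannot see inside the value, any filtering on `name` would have to happen in the application after a `get`.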

Slide 6

Slide 6 text

Column-Oriented Stores •Like a relational store, but flipped around: all data for a column is kept together • An index provides a means to get a column value for a record •Operations: • Get, insert, delete records; updating fields • Streaming column data in and out of Hadoop
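
A rough sketch of what "flipped around" means, using plain JavaScript arrays. The records and the `toColumns` helper are made up for illustration; real column stores add compression and on-disk layout that this ignores.

```javascript
// Row-oriented layout: one object per record (illustrative data).
const rows = [
  { id: 1, name: "Ross", city: "London" },
  { id: 2, name: "Fred", city: "Dublin" },
];

// Column-oriented layout: one array per column; the array position
// acts as the index that ties a value back to its record.
function toColumns(records) {
  const columns = {};
  records.forEach((record, i) => {
    for (const [field, value] of Object.entries(record)) {
      if (!columns[field]) columns[field] = [];
      columns[field][i] = value;
    }
  });
  return columns;
}

const columns = toColumns(rows);
const cities = columns.city;        // read one column without touching the others
const secondName = columns.name[1]; // column value for the record at position 1
```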

Slide 7

Slide 7 text

Graph Databases •Stores vertex-to-vertex edges •Operations: • Getting and setting edges • Sometimes possible to annotate vertices or edges • Query languages support finding paths between vertices, subject to various constraints
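
Setting edges and finding a path between vertices can be sketched with an adjacency list and a breadth-first search. The `addEdge` and `findPath` helpers are hypothetical; graph databases expose this through their own query languages.

```javascript
// Hypothetical directed graph stored as vertex -> list of neighbours.
const edges = new Map();

function addEdge(from, to) {
  if (!edges.has(from)) edges.set(from, []);
  edges.get(from).push(to);
}

// Breadth-first search: returns a shortest path as an array of
// vertices, or null when no path exists.
function findPath(start, goal) {
  const previous = new Map([[start, null]]);
  const queue = [start];
  while (queue.length > 0) {
    const vertex = queue.shift();
    if (vertex === goal) {
      const path = [];
      for (let v = goal; v !== null; v = previous.get(v)) path.unshift(v);
      return path;
    }
    for (const next of edges.get(vertex) || []) {
      if (!previous.has(next)) {
        previous.set(next, vertex);
        queue.push(next);
      }
    }
  }
  return null;
}

addEdge("a", "b");
addEdge("b", "c");
addEdge("a", "d");
const path = findPath("a", "c");
const missing = findPath("c", "a"); // edges are directed, so there is no way back
```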

Slide 8

Slide 8 text

Document Stores •The store is a container for documents • Documents are made up of named fields • Fields may or may not have type definitions • e.g. XSDs for XML stores, vs. schema-less JSON stores •Can create "secondary indexes" • These provide the ability to query on any document field(s) •Operations: • Insert and delete documents • Update fields within documents
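
A minimal sketch of a document container with one secondary index, assuming everything lives in memory. The `insert` and `findByAuthor` helpers are hypothetical and only show why an index avoids scanning every document.

```javascript
// Hypothetical document container with one secondary index.
const documents = new Map();   // primary storage: _id -> document
const authorIndex = new Map(); // secondary index: author -> [_id, ...]

function insert(doc) {
  documents.set(doc._id, doc);
  if (!authorIndex.has(doc.author)) authorIndex.set(doc.author, []);
  authorIndex.get(doc.author).push(doc._id);
}

// Look up documents through the index instead of scanning every document.
function findByAuthor(author) {
  return (authorIndex.get(author) || []).map((id) => documents.get(id));
}

insert({ _id: 1, author: "Ross", text: "About MongoDB..." });
insert({ _id: 2, author: "Fred", text: "Best Post Ever!" });
insert({ _id: 3, author: "Ross", tags: ["tech"] }); // fields may differ per document

const rossIds = findByAuthor("Ross").map((doc) => doc._id);
```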

Slide 9

Slide 9 text

What is MongoDB? MongoDB is a scalable, high-performance, open source NoSQL database. • Document-oriented storage • Full Index Support • Querying • Fast In-Place Updates • Replication & High Availability • Auto-Sharding • Map/Reduce • GridFS

Slide 10

Slide 10 text

Database Landscape [Chart: scalability & performance plotted against depth of functionality; points shown: memcached, key/value, RDBMS]

Slide 11

Slide 11 text

History • First release - February 2009 • v1.0 - August 2009 • v1.2 - December 2009 - MapReduce, ++ • v1.4 - March 2010 - Concurrency, Geo • v1.6 - August 2010 - Sharding, Replica Sets • v1.8 - March 2011 - Journaling, Geosphere • v2.0 - Sep 2011 - V1 Indexes, Concurrency • v2.2 - Soon - Aggregation, Concurrency

Slide 12

Slide 12 text

10gen • Company behind mongoDB - (A)GPL license, own copyrights, engineering team - support, consulting, commercial license • Management - Google/DoubleClick, Oracle, Apple, NetApp - Funding: Sequoia, Union Square, Flybridge - Offices in NYC, Palo Alto, London, Dublin - 100+ employees

Slide 13

Slide 13 text

Where can you use it? MongoDB is implemented in C++ • Platforms: 32/64 bit Windows, Linux, Mac OS X, FreeBSD, Solaris Drivers are available in many languages 10gen supported • C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala Community supported • Clojure, ColdFusion, F#, Go, Groovy, Lua, R ... http://www.mongodb.org/display/DOCS/Drivers

Slide 14

Slide 14 text

Why use mongoDB? • Intrinsic support for agile development • Super low latency access to your data • Very little CPU overhead • No additional caching layer required • Built in replication and horizontal scaling support

Slide 15

Slide 15 text

Terminology

RDBMS           MongoDB
Table           Collection
Row(s)          JSON Document
Index           Index
Join            Embedding & Linking
Partition       Shard
Partition Key   Shard Key

Slide 16

Slide 16 text

> p = { author: "Ross", date: new Date(), text: "About MongoDB...", tags: ["tech", "databases"]} > db.posts.save(p) Documents Blog Post Document

Slide 17

Slide 17 text

> db.posts.find() { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Ross", date : ISODate("2012-02-02T11:52:27.442Z"), text : "About MongoDB...", tags : [ "tech", "databases" ] } Querying Notes: _id is unique, but can be anything you'd like

Slide 18

Slide 18 text

Introducing BSON JSON has a powerful but limited set of datatypes • arrays, objects, strings, numbers, booleans and null BSON is a binary representation of JSON • Adds extra datatypes such as Date, Int types, Id, … • Optimized for performance and navigational abilities • And compression MongoDB sends and stores data in BSON
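
To make "binary representation" concrete, here is a rough sketch, assuming Node.js (for `Buffer`), that hand-encodes a flat document of string fields using the layout published at bsonspec.org. `encodeStringDocument` is a hypothetical helper; real drivers handle every BSON type and should be used instead.

```javascript
// Encodes a flat document whose values are all strings (BSON type 0x02).
function encodeStringDocument(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const text = Buffer.from(value, "utf8");
    const length = Buffer.alloc(4);
    length.writeInt32LE(text.length + 1, 0); // byte length including trailing NUL
    parts.push(
      Buffer.from([0x02]),             // element type: UTF-8 string
      Buffer.from(key + "\0", "utf8"), // field name as a C string
      length,
      text,
      Buffer.from([0x00])              // string terminator
    );
  }
  const body = Buffer.concat(parts);
  const size = Buffer.alloc(4);
  size.writeInt32LE(4 + body.length + 1, 0); // total size: header + body + terminator
  return Buffer.concat([size, body, Buffer.from([0x00])]);
}

const bytes = encodeStringDocument({ hello: "world" });
const hex = bytes.toString("hex");
```

The length prefixes are what make the format easy to navigate: a reader can skip over a value without parsing it.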

Slide 19

Slide 19 text

BSON http://bsonspec.org

Slide 20

Slide 20 text

// find posts with any tags > db.posts.find({tags: {$exists: true }}) // find posts matching a regular expression > db.posts.find({author: /^ro*/i }) // count posts by author > db.posts.find({author: 'Ross'}).count() Query Operators Conditional Operators - $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type - $lt, $lte, $gt, $gte
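
For readers without a shell to hand, the three queries above behave roughly like the following filters over an in-memory array. This is only an analogy in plain JavaScript with made-up documents; it says nothing about how the server evaluates queries.

```javascript
// Illustrative documents; only "Ross" comes from the slides.
const posts = [
  { author: "Ross", tags: ["tech", "databases"] },
  { author: "rob" },
  { author: "Fred", tags: [] },
];

// {tags: {$exists: true}}: the field is present, even when it is empty.
const withTags = posts.filter((p) => "tags" in p).map((p) => p.author);

// {author: /^ro*/i}: case-insensitive, anchored at the start; the pattern
// is "r" followed by zero or more "o".
const matching = posts.filter((p) => /^ro*/i.test(p.author)).map((p) => p.author);

// find({author: 'Ross'}).count(): exact, case-sensitive equality.
const rossCount = posts.filter((p) => p.author === "Ross").length;
```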

Slide 21

Slide 21 text

// 1 means ascending, -1 means descending > db.posts.ensureIndex({author: 1}) > db.posts.findOne({author: 'Ross'}) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Ross", ... } Secondary Indexes Create index on any Field in Document

Slide 22

Slide 22 text

// 1 means ascending, -1 means descending > db.posts.ensureIndex({author: 1, ts: -1}) > db.posts.find({author: 'Ross'}).sort({ts: -1}) [{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Ross", ...}, { _id : ObjectId("4f61d325c496820ceba84124"), author: "Ross", ...}] Compound Indexes Create index on multiple fields in a Document
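
The ordering that a compound key such as `{author: 1, ts: -1}` describes can be sketched with a comparator: ascending by author, then descending by timestamp within each author. The data and the numeric `ts` values below are made up for illustration.

```javascript
// Hypothetical posts; `ts` is a plain number to keep the example short.
const posts = [
  { author: "Ross", ts: 1 },
  { author: "Fred", ts: 5 },
  { author: "Ross", ts: 9 },
];

// Compare on the first key, and fall back to the second key only on ties.
function compareKeys(a, b) {
  if (a.author !== b.author) return a.author < b.author ? -1 : 1; // author: 1
  return b.ts - a.ts;                                             // ts: -1
}

const ordered = [...posts].sort(compareKeys).map((p) => [p.author, p.ts]);
```

Since entries for one author sit next to each other already sorted by `ts`, a query on `author` with a sort on `ts` can read them in order.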

Slide 23

Slide 23 text

> db.posts.find({"author": 'Ross'}).explain() { "cursor" : "BtreeCursor author_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "indexBounds" : { "author" : [ [ "Ross", "Ross" ] ] } } Examine the query plan

Slide 24

Slide 24 text

// Create a comment
> new_comment = { author: "Fred", date: new Date(), text: "Best Post Ever!" }
// Add to post
> db.posts.update({ _id: "..." }, { "$push": { comments: new_comment }, "$inc": { comment_count: 1 } });
Atomic Operations
$set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
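
The effect of `$push` and `$inc` on a single document can be sketched in plain JavaScript. `applyUpdate` is a hypothetical helper that mirrors only the resulting document shape; on the server the whole update is applied to the document atomically, which this sketch does not model.

```javascript
// Hypothetical helper covering just two update operators.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = []; // $push creates a missing array
    doc[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // $inc treats a missing field as 0
  }
  return doc;
}

const post = { author: "Ross", text: "About MongoDB..." };
const newComment = { author: "Fred", text: "Best Post Ever!" };
applyUpdate(post, { $push: { comments: newComment }, $inc: { comment_count: 1 } });
```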

Slide 25

Slide 25 text

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Ross", date : "Thu Feb 02 2012 11:50:01", text : "About MongoDB...", tags : [ "tech", "databases" ], comments : [{ author : "Fred", date : "Fri Feb 03 2012 13:23:11", text : "Best Post Ever!" }], comment_count : 1 } Nested Documents

Slide 27

Slide 27 text

// Index nested documents
> db.posts.ensureIndex({"comments.author": 1})
> db.posts.find({"comments.author": "Fred"})
// Index on tags (multi-key index)
> db.posts.ensureIndex({tags: 1})
> db.posts.find({tags: "tech"})
Secondary Indexes

Slide 28

Slide 28 text

Geo • Geo-spatial queries • Require a geo index • Find points near a given point • Find points within a polygon/sphere // geospatial index > db.posts.ensureIndex({"author.location": "2d"}) > db.posts.find({ "author.location" : { $near : [22, 42] }})

Slide 29

Slide 29 text

Map Reduce
The caller provides map and reduce functions written in JavaScript
// Emit each tag
> map = function() {
    this.tags.forEach(function(item) { emit(item, 1); });
  }
// Calculate totals
> reduce = function(key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
      total += parseInt(values[i], 10);
    }
    return total;
  }

Slide 30

Slide 30 text

// run the map reduce
> db.posts.mapReduce(map, reduce, {"out": { inline : 1}});
{ "results" : [ {"_id" : "databases", "value" : 1}, {"_id" : "tech", "value" : 1} ], "timeMillis" : 1, "counts" : { "input" : 1, "emit" : 2, "reduce" : 0, "output" : 2 }, "ok" : 1 }
Map Reduce
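
The data flow behind that result can be sketched without a server. The `mapReduce` function below is a hypothetical in-memory stand-in, not the shell API: it passes `emit` to the map function as an argument (the shell provides it as a global) and, matching the `"reduce" : 0` count above, skips the reduce step for keys that received a single value.

```javascript
// Hypothetical in-memory stand-in for mapReduce with inline output.
function mapReduce(docs, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // Run map once per document, with `this` bound to the document.
  for (const doc of docs) map.call(doc, emit);
  return [...emitted.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, values]) => ({
      _id: key,
      value: values.length === 1 ? values[0] : reduce(key, values),
    }));
}

const posts = [{ author: "Ross", tags: ["tech", "databases"] }];

// Emit each tag with a count of 1.
const map = function (emit) {
  this.tags.forEach((tag) => emit(tag, 1));
};
// Sum the counts emitted for one tag.
const reduce = function (key, values) {
  return values.reduce((total, value) => total + value, 0);
};

const results = mapReduce(posts, map, reduce);
```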

Slide 31

Slide 31 text

// Count tags > agg = db.posts.aggregate( {$unwind: "$tags"}, {$group : {_id : "$tags", count : {$sum: 1}}} ) > agg.result [{"_id": "databases", "count": 1}, {"_id": "tech", "count": 1}] Aggregation - coming in 2.2
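
What the two pipeline stages do can be sketched in plain JavaScript. This is an analogy over an in-memory array with made-up documents, assuming a runtime with `Array.prototype.flatMap`; it is not the aggregation framework itself.

```javascript
// Illustrative documents.
const posts = [
  { author: "Ross", tags: ["tech", "databases"] },
  { author: "Fred", tags: ["tech"] },
];

// $unwind: one output record per array element.
const unwound = posts.flatMap((post) =>
  post.tags.map((tag) => ({ ...post, tags: tag }))
);

// $group with {$sum: 1}: count records per distinct tag.
const counts = new Map();
for (const record of unwound) {
  counts.set(record.tags, (counts.get(record.tags) || 0) + 1);
}
const result = [...counts].map(([tag, count]) => ({ _id: tag, count }));
```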

Slide 32

Slide 32 text

# (Python) Create a new instance of GridFS; db is an existing pymongo database
>>> import gridfs
>>> fs = gridfs.GridFS(db)
# Save file to mongo (open in binary mode)
>>> my_image = open('my_image.jpg', 'rb')
>>> file_id = fs.put(my_image)
# Read file
>>> fs.get(file_id).read()
GridFS
Save files in mongoDB
Stream data back to the client

Slide 33

Slide 33 text

Building Your First MongoDB App • Want to build an app where users can check in to a location • Leave notes or comments about that location Iterative Approach: • Decide requirements • Design documents • Rinse, repeat :-)

Slide 34

Slide 34 text

Requirements "As a user I want to be able to find other locations nearby" • Need to store locations (Offices, Restaurants etc) • name, address, tags • coordinates • user generated content e.g. tips / notes

Slide 35

Slide 35 text

Requirements "As a user I want to be able to 'checkin' to a location" Checkins • User should be able to 'check in' to a location • Want to be able to generate statistics: • Recent checkins • Popular locations

Slide 36

Slide 36 text

> location_1 = { name: "Holiday Inn", address: "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402" } Locations v1

Slide 37

Slide 37 text

> location_1 = { name: "Holiday Inn", address: "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402" } > db.locations.save(location_1) > db.locations.find({name: "Holiday Inn"}) Locations v1

Slide 38

Slide 38 text

> location_2 = { name: "Holiday Inn", address: "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402", tags: ["hotel", "conference", "mongodb"] } Locations v2

Slide 39

Slide 39 text

> location_2 = { name: "Holiday Inn", address: "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402", tags: ["hotel", "conference", "mongodb"] } > db.locations.ensureIndex({tags: 1}) > db.locations.find({tags: "hotel"}) Locations v2

Slide 40

Slide 40 text

> location_3 = { name: "Holiday Inn", address: "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402", tags: ["hotel", "conference", "mongodb"], lat_long: [52.5184, 13.387] } Locations v3

Slide 41

Slide 41 text

> location_3 = { name: "Holiday Inn", address: "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402", tags: ["hotel", "conference", "mongodb"], lat_long: [52.5184, 13.387] } > db.locations.ensureIndex({lat_long: "2d"}) > db.locations.find({lat_long: {$near:[52.53, 13.4]}}) Locations v3
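
Conceptually, `$near` on a "2d" index returns documents ordered by flat (planar) distance from the query point. A rough sketch of that ordering follows; it sorts an in-memory array rather than using an index, and every location other than the Holiday Inn is made up.

```javascript
// Hypothetical locations; only the first coordinates come from the slides.
const locations = [
  { name: "Holiday Inn", lat_long: [52.5184, 13.387] },
  { name: "Far away", lat_long: [48.0, 11.0] },
  { name: "Next door", lat_long: [52.53, 13.41] },
];

// Flat (planar) distance between two coordinate pairs.
function distance([x1, y1], [x2, y2]) {
  return Math.hypot(x1 - x2, y1 - y2);
}

// Sort a copy of the documents by distance from the query point.
function near(docs, point) {
  return [...docs].sort(
    (a, b) => distance(a.lat_long, point) - distance(b.lat_long, point)
  );
}

const names = near(locations, [52.53, 13.4]).map((doc) => doc.name);
```

Scanning and sorting every document is exactly the work a geospatial index exists to avoid, so treat this as an explanation of the result order only.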

Slide 42

Slide 42 text

// creating your indexes:
> db.locations.ensureIndex({name: 1})
> db.locations.ensureIndex({tags: 1})
> db.locations.ensureIndex({lat_long: "2d"})
// with regular expressions:
> db.locations.find({name: /^holid/i})
// by tag:
> db.locations.find({tags: "hotel"})
// finding places:
> db.locations.find({lat_long: {$near:[52.53, 13.4]}})
Finding locations

Slide 43

Slide 43 text

// initial data load:
> db.locations.insert(location_3)
// adding a tip with update:
> db.locations.update( {name: "Holiday Inn"}, {$push: { tips: { user: "Ross", date: ISODate("2012-04-05"), tip: "Check out my replication talk later!"} }})
Inserting locations - adding tips

Slide 44

Slide 44 text

> db.locations.findOne()
{ _id : ObjectId("5c4ba5c0672c685e5e8aabf3"), name: "Holiday Inn", address: "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402", tags: ["hotel", "conference", "mongodb"], lat_long: [52.5184, 13.387], tips:[{ user: "Ross", date: ISODate("2012-04-05T00:00:00Z"), tip: "Check out my replication talk later!" }] }
Tips added

Slide 45

Slide 45 text

Requirements "As a user I want to be able to find other locations nearby" Need to store locations (Offices, Restaurants etc) • name, address, tags • coordinates • user generated content e.g. tips / notes "As a user I want to be able to 'checkin' to a location" Checkins • User should be able to 'check in' to a location • Want to be able to generate statistics: • Recent checkins • Popular locations

Slide 46

Slide 46 text

> user_1 = { _id: "[email protected]", name: "Ross", twitter: "RossC0", checkins: [ {location: "Holiday Inn", ts: "25/04/2012"}, {location: "Munich Airport", ts: "24/04/2012"} ] } Users and Checkins

Slide 47

Slide 47 text

// find all users who've checked in here: > db.users.find({"checkins.location":"Holiday Inn"}) Simple Stats

Slide 48

Slide 48 text

// find all users who've checked in here: > db.users.find({"checkins.location":"Holiday Inn"}) // Can't find the last 10 checkins easily > db.users.find({"checkins.location":"Holiday Inn"}) .sort({"checkins.ts": -1}).limit(10) Schema hard to query for stats. Simple Stats
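
The difficulty is that sorting on `checkins.ts` orders whole user documents, not individual checkins. With the embedded schema, getting the latest checkins for a location means flattening the arrays on the client, roughly as sketched below. The user "Fred", his checkin, and the `lastCheckins` helper are made up for illustration.

```javascript
// Embedded schema: each user carries an array of checkins.
const users = [
  { name: "Ross", checkins: [
    { location: "Holiday Inn", ts: new Date("2012-04-25") },
    { location: "Munich Airport", ts: new Date("2012-04-24") },
  ] },
  { name: "Fred", checkins: [
    { location: "Holiday Inn", ts: new Date("2012-04-26") },
  ] },
];

// Flatten every embedded checkin, keep those for one location,
// then sort newest first and take the first n.
function lastCheckins(allUsers, location, n) {
  return allUsers
    .flatMap((user) => user.checkins.map((c) => ({ user: user.name, ...c })))
    .filter((c) => c.location === location)
    .sort((a, b) => b.ts - a.ts)
    .slice(0, n);
}

const recent = lastCheckins(users, "Holiday Inn", 10).map((c) => c.user);
```

Every matching user document has to be fetched in full before any sorting can happen, which is the cost that a separate checkins collection removes.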

Slide 49

Slide 49 text

> user_2 = { _id: "[email protected]", name: "Ross", twitter: "RossC0" }
> checkin_1 = { location: location_id, user: user_id, ts: ISODate("2012-04-05") }
> db.checkins.find({user: user_id})
User and Checkins v2

Slide 50

Slide 50 text

// find all users who've checked in here: > loc = db.locations.findOne({"name":"Holiday Inn"}) > checkins = db.checkins.find({location: loc._id}) > u_ids = checkins.map(function(c){return c.user}) > users = db.users.find({_id: {$in: u_ids}}) // find the last 10 checkins here: > db.checkins.find({location: loc._id}) .sort({ts: -1}).limit(10) // count how many checked in today: > db.checkins.find({location: loc._id, ts: {$gt: midnight}} ).count() Simple Stats
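
The slides use `midnight` here, and `now_minus_3_hrs` in the statistics examples, without defining them. One possible way to compute such values is sketched below; the helper names `startOfDay` and `hoursBefore` are assumptions, and "midnight" is taken to mean the start of the current day in local time.

```javascript
// Start of the day containing `now`, in local time.
function startOfDay(now) {
  const midnight = new Date(now.getTime());
  midnight.setHours(0, 0, 0, 0);
  return midnight;
}

// A timestamp the given number of hours before `now`.
function hoursBefore(now, hours) {
  return new Date(now.getTime() - hours * 60 * 60 * 1000);
}

const now = new Date();
const midnight = startOfDay(now);
const now_minus_3_hrs = hoursBefore(now, 3);
```

Both results are ordinary `Date` objects, so they can be used directly as the right-hand side of a `$gt` comparison on `ts`.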

Slide 51

Slide 51 text

// Find most popular locations > map_func = function() { emit(this.location, 1); } > reduce_func = function(key, values) { return Array.sum(values); } > db.checkins.mapReduce(map_func, reduce_func, {query: {ts: {$gt: now_minus_3_hrs}}, out: "result"}) > db.result.findOne() {"_id": "Holiday Inn", "value" : 99} Map Reduce

Slide 52

Slide 52 text

// Find most popular locations > agg = db.checkins.aggregate( {$match: {ts: {$gt: now_minus_3_hrs}}}, {$group: {_id: "$location", value: {$sum: 1}}} ) > agg.result [{"_id": "Holiday Inn", "value" : 17}] Aggregation

Slide 53

Slide 53 text

Deployment • Single server - need a strong backup plan [Diagram: a single node labelled P]

Slide 54

Slide 54 text

Deployment • Single server - need a strong backup plan • Replica sets - High availability - Automatic failover [Diagram: nodes labelled P (primary) and S (secondary)]

Slide 55

Slide 55 text

Deployment • Single server - need a strong backup plan • Replica sets - High availability - Automatic failover • Sharded - Horizontally scale - Auto balancing [Diagram: several groups of nodes labelled P (primary) and S (secondary)]

Slide 56

Slide 56 text

MongoDB Use Cases • Archiving • Content Management • Ecommerce • Finance • Gaming • Government • Metadata Storage • News & Media • Online Advertising • Online Collaboration • Real-time stats/analytics • Social Networks • Telecommunications

Slide 58

Slide 58 text

Any Questions?

Slide 59

Slide 59 text

@mongodb
conferences, appearances, and meetups: http://www.10gen.com/events
Facebook | Twitter | LinkedIn: http://bit.ly/mongofb, http://linkd.in/joinmongo
download at mongodb.org
support, training, and this talk brought to you by