Building your first mongoDB application

rozza
April 25, 2012

Presentation from this year's OSDC.de
===========================

MongoDB (from "humongous") is an open source, non-relational, document-oriented database. By trading off a few traditional database features (notably joins and transactions) to achieve much better performance, MongoDB is fast, scalable, and designed for web development. The goal of the MongoDB project is to bridge the gap between key-value stores (which are fast and highly scalable) and traditional RDBMS systems (which provide rich queries and deep functionality). This talk will introduce the features of MongoDB by walking through how one can build a simple location-based application using MongoDB. The talk will cover the basics of MongoDB's document model, query language, map-reduce framework and deployment architecture.

Transcript

  1. Ross Lawley Python Engineer @ 10gen Web developer since 1999

    Passionate about open source Agile methodology email: [email protected] twitter: RossC0
  2. Today's Talk • Quick introduction to NoSQL • Some Background

    about mongoDB • Using mongoDB, general querying and usage • Building your first app, creating a location-based app • Data modelling in mongoDB, queries, geospatial, updates and map reduce (examples work in mongoDB JS shell)
  3. Key-Value Stores •A mapping from a key to a value

    •The store doesn't know anything about the key or value •The store doesn't know anything about the insides of the value •Operations • Set, get, or delete a key-value pair
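
The three operations on this slide can be sketched in a few lines of plain JavaScript. This is only an illustration: the `KeyValueStore` class and the `user:1` key are made up for this example and do not come from the talk. The point it shows is that the store treats the value as an opaque blob.

```javascript
// In-memory key-value store: values are opaque to the store.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key); // undefined when the key is absent
  }
  delete(key) {
    return this.data.delete(key); // true if a pair was removed
  }
}

const store = new KeyValueStore();
store.set("user:1", '{"name": "Ross"}'); // the store never parses this string
```

Because the store cannot look inside the value, any query other than "fetch by key" has to be done by the application.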
  4. Column-Oriented Stores •Like a relational store, but flipped around: all

    data for a column is kept together • An index provides a means to get a column value for a record •Operations: • Get, insert, delete records; updating fields • Streaming column data in and out of Hadoop
  5. Graph Databases •Stores vertex-to-vertex edges •Operations: • Getting and setting

    edges • Sometimes possible to annotate vertices or edges • Query languages support finding paths between vertices, subject to various constraints
  6. Document Stores •The store is a container for documents •

    Documents are made up of named fields • Fields may or may not have type definitions • e.g. XSDs for XML stores, vs. schema-less JSON stores •Can create "secondary indexes" • These provide the ability to query on any document field(s) •Operations: • Insert and delete documents • Update fields within documents
  7. What is MongoDB? MongoDB is a scalable, high-performance, open source

    NoSQL database. •Document-oriented storage •Full Index Support •Querying •Fast In-Place Updates •Replication & High Availability •Auto-Sharding •Map/Reduce •GridFS
  8. History •First release – February 2009 •v1.0 - August 2009

    •v1.2 - December 2009 – MapReduce, ++ •v1.4 - March 2010 – Concurrency, Geo •v1.6 - August 2010 – Sharding, Replica Sets •v1.8 – March 2011 – Journaling, Geosphere •v2.0 - Sep 2011 – V1 Indexes, Concurrency •v2.2 - Soon - Aggregation, Concurrency
  9. • Company behind mongoDB – (A)GPL license, own copyrights, engineering

    team – support, consulting, commercial license • Management – Google/DoubleClick, Oracle, Apple, NetApp – Funding: Sequoia, Union Square, Flybridge – Offices in NYC, Palo Alto, London, Dublin – 100+ employees
  10. Where can you use it? MongoDB is implemented in C++

    • Platforms: 32/64 bit Windows, Linux, Mac OS X, FreeBSD, Solaris Drivers are available in many languages 10gen supported: • C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala Community supported: • Clojure, ColdFusion, F#, Go, Groovy, Lua, R ... http://www.mongodb.org/display/DOCS/Drivers
  11. Why use mongoDB? • Intrinsic support for agile development •

    Super low latency access to your data • Very little CPU overhead • No additional caching layer required • Built in replication and horizontal scaling support
  12. Terminology

    RDBMS          MongoDB
    Table          Collection
    Row(s)         JSON Document
    Index          Index
    Join           Embedding & Linking
    Partition      Shard
    Partition Key  Shard Key
  13. > p = { author: "Ross", date: new Date(), text:

    "About MongoDB...", tags: ["tech", "databases"]} > db.posts.save(p) Documents Blog Post Document
  14. > db.posts.find() { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Ross", date

    : ISODate("2012-02-02T11:52:27.442Z"), text : "About MongoDB...", tags : [ "tech", "databases" ] } Querying Notes: _id is unique, but can be anything you'd like
  15. Introducing BSON JSON has a powerful but limited set of datatypes

    • arrays, objects, strings, numbers and null BSON is a binary representation of JSON • Adds extra datatypes such as Date, Int types, Id, … • Optimized for performance and navigational abilities • And compression MongoDB sends and stores data in BSON
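
To make "binary representation of JSON" concrete, here is a minimal sketch of a BSON encoder in Node.js. It is deliberately tiny and is not how any driver is implemented: the `encodeBson` function is a hypothetical helper written for this example, and it handles only flat documents whose values are strings or 32-bit integers.

```javascript
// Minimal BSON encoder for flat documents holding only strings and 32-bit integers.
function encodeBson(doc) {
  const parts = [];
  for (const [name, value] of Object.entries(doc)) {
    const key = Buffer.from(name + "\0", "utf8"); // element names are C strings
    if (typeof value === "string") {
      const text = Buffer.from(value + "\0", "utf8");
      const length = Buffer.alloc(4);
      length.writeInt32LE(text.length); // byte length, including the trailing NUL
      parts.push(Buffer.from([0x02]), key, length, text); // 0x02 = string element
    } else if (Number.isInteger(value)) {
      const number = Buffer.alloc(4);
      number.writeInt32LE(value); // throws if the value does not fit in 32 bits
      parts.push(Buffer.from([0x10]), key, number); // 0x10 = int32 element
    } else {
      throw new Error("unsupported type for field " + name);
    }
  }
  const body = Buffer.concat(parts);
  const header = Buffer.alloc(4);
  header.writeInt32LE(4 + body.length + 1); // total size: header + elements + terminator
  return Buffer.concat([header, body, Buffer.from([0x00])]);
}

const encoded = encodeBson({ hello: "world" });
```

The length prefixes are what give BSON its "navigational abilities": a reader can skip over a whole string or document without scanning it.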
  16. // find posts with any tags > db.posts.find({tags: {$exists: true

    }}) // find posts matching a regular expression > db.posts.find({author: /^ro*/i }) // count posts by author > db.posts.find({author: 'Ross'}).count() Query Operators Conditional Operators - $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type - $lt, $lte, $gt, $gte
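
As a rough mental model of what these query documents mean, the sketch below evaluates a small subset of the operators against plain JavaScript objects. The `matches` helper and the sample posts are assumptions made for this example; the real server evaluates queries very differently (and can use indexes), so read it as a description of the semantics only.

```javascript
// Hypothetical helper: tests one document against a small subset of operators.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond instanceof RegExp) {
      return typeof value === "string" && cond.test(value);
    }
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      return Object.entries(cond).every(([op, arg]) => {
        switch (op) {
          case "$exists": return (value !== undefined) === arg;
          case "$ne": return value !== arg;
          case "$in": return arg.includes(value);
          case "$gt": return value > arg;
          case "$lt": return value < arg;
          default: throw new Error("unsupported operator: " + op);
        }
      });
    }
    // Plain value: equality, or membership when the field holds an array.
    return Array.isArray(value) ? value.includes(cond) : value === cond;
  });
}

const samplePosts = [
  { author: "Ross", text: "About MongoDB...", tags: ["tech", "databases"] },
  { author: "Fred", text: "Untagged post" },
];

const tagged = samplePosts.filter((p) => matches(p, { tags: { $exists: true } }));
const byRegex = samplePosts.filter((p) => matches(p, { author: /^ro*/i }));
const rossCount = samplePosts.filter((p) => matches(p, { author: "Ross" })).length;
```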
  17. // 1 means ascending, -1 means descending > db.posts.ensureIndex({author: 1})

    > db.posts.findOne({author: 'Ross'}) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Ross", ... } Secondary Indexes Create index on any Field in Document
  18. // 1 means ascending, -1 means descending > db.posts.ensureIndex({author: 1,

    ts: -1}) > db.posts.find({author: 'Ross'}).sort({ts: -1}) [{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Ross", ...}, { _id : ObjectId("4f61d325c496820ceba84124"), author: "Ross", ...}] Compound Indexes Create index on multiple fields in a Document
  19. > db.posts.find({"author": 'Ross'}).explain() { "cursor" : "BtreeCursor author_1", "nscanned" :

    1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "indexBounds" : { "author" : [ [ "Ross", "Ross" ] ] } } Examine the query plan
  20. // Create a comment > new_comment = { author: "Fred",

    date: new Date(), text: "Best Post Ever!"} // Add to post > db.posts.update({ _id: "..." }, {"$push": {comments: new_comment}, "$inc": {comment_count: 1} }); Atomic Operations $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
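
The effect of an update document can be illustrated with a plain JavaScript sketch. The `applyUpdate` helper is hypothetical and covers only `$set`, `$inc` and `$push`; it shows what the modifiers do to a single document, not the atomicity the server provides.

```javascript
// Hypothetical helper: applies $set, $inc and $push to one document in place.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value;
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // missing fields start at 0
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    if (!Array.isArray(doc[field])) doc[field] = []; // create the array on first push
    doc[field].push(item);
  }
  return doc;
}

const post = { author: "Ross", text: "About MongoDB...", tags: ["tech", "databases"] };
const newComment = { author: "Fred", date: new Date(), text: "Best Post Ever!" };

applyUpdate(post, {
  $push: { comments: newComment },
  $inc: { comment_count: 1 },
});
```

Sending the modifiers to the server, rather than reading the document, changing it and writing it back, is what lets the comment and its counter change together.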
  21. { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Ross", date : "Thu

    Feb 02 2012 11:50:01", text : "About MongoDB...", tags : [ "tech", "databases" ], comments : [{ author : "Fred", date : "Fri Feb 03 2012 13:23:11", text : "Best Post Ever!" }], comment_count : 1 } Nested Documents
  23. // Index nested documents > db.posts.ensureIndex({"comments.author": 1}) > db.posts.find({"comments.author": "Fred"})

    // Index on tags (multi-key index) > db.posts.ensureIndex({tags: 1}) > db.posts.find( { tags: "tech" } ) Secondary Indexes
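
A multi-key index can be pictured as a lookup table with one entry per array element. The sketch below builds such a table in memory. The `buildMultiKeyIndex` helper and the numeric `_id` values are assumptions made for this example; MongoDB's own indexes are B-trees, not hash maps.

```javascript
// Build a multi-key index: one entry per array element, pointing at document ids.
function buildMultiKeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const value of values) {
      if (value === undefined) continue; // document lacks the field
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const indexedPosts = [
  { _id: 1, author: "Ross", tags: ["tech", "databases"] },
  { _id: 2, author: "Fred", tags: ["tech"] },
  { _id: 3, author: "Anna" },
];
const tagIndex = buildMultiKeyIndex(indexedPosts, "tags");
const techIds = tagIndex.get("tech"); // ids of posts tagged "tech"
```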
  24. Geo • Geo-spatial queries • Require a geo index •

    Find points near a given point • Find points within a polygon/sphere // geospatial index > db.posts.ensureIndex({"author.location": "2d"}) > db.posts.find({ "author.location" : { $near : [22, 42] }})
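
The idea behind `$near` can be sketched without an index: compute the distance from the query point to every stored point and return the closest first. The `near` function and the sample places are hypothetical, and the flat-plane distance is a simplifying assumption. A real geo index exists precisely to avoid this full scan.

```javascript
// Sort documents by straight-line distance from a point on a flat plane.
function near(docs, field, [x, y], limit = Infinity) {
  const distance = (p) => Math.hypot(p[0] - x, p[1] - y);
  return docs
    .filter((doc) => Array.isArray(doc[field])) // skip documents without coordinates
    .map((doc) => ({ doc, dist: distance(doc[field]) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, limit)
    .map((entry) => entry.doc);
}

const places = [
  { name: "Far", loc: [30, 50] },
  { name: "Close", loc: [22, 43] },
  { name: "Exact", loc: [22, 42] },
  { name: "No location" },
];
const nearest = near(places, "loc", [22, 42], 2);
```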
  25. Map Reduce The caller provides map and reduce functions written

    in JavaScript // Emit each tag > map = function() { this['tags'].forEach( function(item) {emit(item, 1);} ); } // Calculate totals > reduce = function(key, values) { var total = 0; var valuesSize = values.length; for (var i=0; i < valuesSize; i++) { total += parseInt(values[i], 10); } return total; }
  26. // run the map reduce > db.posts.mapReduce(map, reduce, {"out": {

    inline : 1}}); { "results" : [ {"_id" : "databases", "value" : 1}, {"_id" : "tech", "value" : 1 } ], "timeMillis" : 1, "counts" : { "input" : 1, "emit" : 2, "reduce" : 0, "output" : 2 }, "ok" : 1 } Map Reduce
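
What happens between `map` and `reduce` can be sketched with a small in-memory runner. This is an illustration under assumptions: the `mapReduce` function below is hypothetical, and it passes `emit` to the map function as an argument, whereas the server provides it as a global. Keys with a single emitted value skip the reduce call, which matches the `"reduce" : 0` count in the output above.

```javascript
// Hypothetical in-memory runner: group emitted values by key, then reduce each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the current document
  return [...groups.entries()]
    .map(([key, values]) => ({
      _id: key,
      // A key with a single value is returned as is, without calling reduce.
      value: values.length === 1 ? values[0] : reduce(key, values),
    }))
    .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

const tagMap = function (emit) {
  this.tags.forEach((tag) => emit(tag, 1));
};
const tagReduce = (key, values) => values.reduce((total, v) => total + v, 0);

const results = mapReduce(
  [{ author: "Ross", tags: ["tech", "databases"] }],
  tagMap,
  tagReduce
);
```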
  27. // Count tags > agg = db.posts.aggregate( {$unwind: "$tags"}, {$group

    : {_id : "$tags", count : {$sum: 1}}} ) > agg.result [{"_id": "databases", "count": 1}, {"_id": "tech", "count": 1}] Aggregation - coming in 2.2
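
The two pipeline stages used here can be sketched as ordinary array transformations. The `unwind` and `countBy` helpers and the sample posts are assumptions for this example. They mirror the meaning of `$unwind` and of `$group` with `{$sum: 1}`, not the server's implementation.

```javascript
// $unwind: one output document per element of the array field.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((item) => ({ ...doc, [field]: item }))
  );
}

// $group with a {$sum: 1} accumulator: count documents per distinct key.
function countBy(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: key, count }));
}

const blogPosts = [
  { author: "Ross", tags: ["tech", "databases"] },
  { author: "Fred", tags: ["tech"] },
];
const tagCounts = countBy(unwind(blogPosts, "tags"), "tags");
```

Compared with the map-reduce version, the pipeline describes the result declaratively, so no JavaScript functions need to be shipped to the server.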
  28. # (Python) Create a new instance of GridFS >>> fs

    = gridfs.GridFS(db) # Save file to mongo >>> my_image = open('my_image.jpg', 'rb') >>> file_id = fs.put(my_image) # Read file >>> fs.get(file_id).read() GridFS Save files in mongoDB Stream data back to the client
  29. Building Your First MongoDB App • Want to build an

    app where users can check in to a location • Leave notes or comments about that location Iterative Approach: • Decide requirements • Design documents • Rinse, repeat :-)
  30. Requirements "As a user I want to be able to

    find other locations nearby" • Need to store locations (Offices, Restaurants etc) • name, address, tags • coordinates • user generated content e.g. tips / notes
  31. Requirements "As a user I want to be able to

    'checkin' to a location" Checkins • User should be able to 'check in' to a location • Want to be able to generate statistics: • Recent checkins • Popular locations
  32. > location_1 = { name: "Holiday Inn", address: "Engelhardsgasse 12",

    city: "Nürnberg ", post_code: "D-90402" } Locations v1
  33. > location_1 = { name: "Holiday Inn", address: "Engelhardsgasse 12",

    city: "Nürnberg ", post_code: "D-90402" } > db.locations.save(location_1) > db.locations.find({name: "Holiday Inn"}) Locations v1
  34. > location_2 = { name: "Holiday Inn", address: "Engelhardsgasse 12",

    city: "Nürnberg ", post_code: "D-90402", tags: ["hotel", "conference", "mongodb"] } Locations v2
  35. > location_2 = { name: "Holiday Inn", address: "Engelhardsgasse 12",

    city: "Nürnberg ", post_code: "D-90402", tags: ["hotel", "conference", "mongodb"] } > db.locations.ensureIndex({tags: 1}) > db.locations.find({tags: "hotel"}) Locations v2
  36. > location_3 = { name: "Holiday Inn", address: "Engelhardsgasse 12",

    city: "Nürnberg ", post_code: "D-90402", tags: ["hotel", "conference", "mongodb"], lat_long: [52.5184, 13.387] } Locations v3
  37. > location_3 = { name: "Holiday Inn", address: "Engelhardsgasse 12",

    city: "Nürnberg ", post_code: "D-90402", tags: ["hotel", "conference", "mongodb"], lat_long: [52.5184, 13.387] } > db.locations.ensureIndex({lat_long: "2d"}) > db.locations.find({lat_long: {$near:[52.53, 13.4]}}) Locations v3
  38. // creating your indexes: > db.locations.ensureIndex({name: 1}) > db.locations.ensureIndex({tags: 1})

    > db.locations.ensureIndex({lat_long: "2d"}) // with regular expressions: > db.locations.find({name: /^holid/i}) // by tag: > db.locations.find({tags: "hotel"}) // finding places: > db.locations.find({lat_long: {$near:[52.53, 13.4]}}) Finding locations
  39. // initial data load: > db.locations.insert(location_3) // adding a tip

    with update: > db.locations.update( {name: "Holiday Inn"}, {$push: { tips: { user: "Ross", date: ISODate("2012-04-05"), tip: "Check out my replication talk later!"} }}) Inserting locations - adding tips
  40. > db.locations.findOne() { _id : ObjectId("5c4ba5c0672c685e5e8aabf3"), name: "Holiday Inn", address:

    "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402", tags: ["hotel", "conference", "mongodb"], lat_long: [52.5184, 13.387], tips:[{ user: "Ross", date: ISODate("2012-04-05"), tip: "Check out my replication talk later!" }] } Tips added
  41. Requirements "As a user I want to be able to

    find other locations nearby" Need to store locations (Offices, Restaurants etc) • name, address, tags • coordinates • user generated content e.g. tips / notes "As a user I want to be able to 'checkin' to a location" Checkins • User should be able to 'check in' to a location • Want to be able to generate statistics: • Recent checkins • Popular locations
  42. > user_1 = { _id: "[email protected]", name: "Ross", twitter: "RossC0",

    checkins: [ {location: "Holiday Inn", ts: "25/04/2012"}, {location: "Munich Airport", ts: "24/04/2012"} ] } Users and Checkins
  43. // find all users who've checked in here: > db.users.find({"checkins.location":"Holiday

    Inn"}) // Can't find the last 10 checkins easily > db.users.find({"checkins.location":"Holiday Inn"}) .sort({"checkins.ts": -1}).limit(10) Schema hard to query for stats. Simple Stats
  44. > user_2 = { _id: "[email protected]", name: "Ross", twitter: "RossC0",

    } > checkin_1 = { location: location_id, user: user_id, ts: ISODate("2012-04-05") } > db.checkins.find({user: user_id}) User and Checkins v2
  45. // find all users who've checked in here: > loc

    = db.locations.findOne({"name":"Holiday Inn"}) > checkins = db.checkins.find({location: loc._id}) > u_ids = checkins.map(function(c){return c.user}) > users = db.users.find({_id: {$in: u_ids}}) // find the last 10 checkins here: > db.checkins.find({location: loc._id}) .sort({ts: -1}).limit(10) // count how many checked in today: > db.checkins.find({location: loc._id, ts: {$gt: midnight}} ).count() Simple Stats
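
With checkins in their own collection, the "join" is done by the application: look up the location, then follow the stored ids. The sketch below replays the three queries from this slide on in-memory arrays. All ids, names and timestamps are made-up sample data, and "today" is fixed to the date of the talk so the example is repeatable.

```javascript
const locations = [{ _id: "loc1", name: "Holiday Inn" }];
const users = [
  { _id: "u1", name: "Ross" },
  { _id: "u2", name: "Fred" },
  { _id: "u3", name: "Anna" },
];
const checkins = [
  { location: "loc1", user: "u1", ts: new Date("2012-04-25T09:00:00Z") },
  { location: "loc1", user: "u2", ts: new Date("2012-04-24T18:30:00Z") },
  { location: "loc2", user: "u3", ts: new Date("2012-04-25T10:00:00Z") },
];

// Resolve the location first, then follow the stored ids by hand.
const loc = locations.find((l) => l.name === "Holiday Inn");
const here = checkins.filter((c) => c.location === loc._id);

// Users who have checked in here (the $in lookup).
const userIds = new Set(here.map((c) => c.user));
const visitors = users.filter((u) => userIds.has(u._id)).map((u) => u.name);

// Last 10 checkins here, newest first.
const recent = [...here].sort((a, b) => b.ts - a.ts).slice(0, 10);

// Checkins here since midnight (UTC) on the example day.
const midnight = new Date("2012-04-25T00:00:00Z");
const todayCount = here.filter((c) => c.ts > midnight).length;
```

Each checkin is now a first-class document, so sorting, limiting and counting apply to checkins directly instead of to the users that contain them.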
  46. // Find most popular locations > map_func = function() {

    emit(this.location, 1); } > reduce_func = function(key, values) { return Array.sum(values); } > db.checkins.mapReduce(map_func, reduce_func, {query: {ts: {$gt: now_minus_3_hrs}}, out: "result"}) > db.result.findOne() {"_id": "Holiday Inn", "value" : 99} Map Reduce
  47. // Find most popular locations > agg = db.checkins.aggregate( {$match:

    {ts: {$gt: now_minus_3_hrs}}}, {$group: {_id: "$location", value: {$sum: 1}}} ) > agg.result [{"_id": "Holiday Inn", "value" : 17}] Aggregation
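
The popularity query boils down to "filter by time, then count per location". Here is a minimal in-memory sketch of that logic. The `popularLocations` function, the fixed `now` and the sample checkins are assumptions for illustration, and the final sort by count is an addition that the slide's query does not include.

```javascript
// Count checkins per location since `since`, most popular first.
function popularLocations(checkins, since) {
  const counts = new Map();
  for (const c of checkins) {
    if (c.ts > since) { // the $match stage
      counts.set(c.location, (counts.get(c.location) || 0) + 1); // $group with $sum: 1
    }
  }
  return [...counts]
    .map(([location, value]) => ({ _id: location, value }))
    .sort((a, b) => b.value - a.value); // added here: rank by popularity
}

const now = new Date("2012-04-25T12:00:00Z"); // fixed "now" so the example is repeatable
const threeHoursAgo = new Date(now.getTime() - 3 * 60 * 60 * 1000);
const sample = [
  { location: "Holiday Inn", ts: new Date("2012-04-25T11:30:00Z") },
  { location: "Holiday Inn", ts: new Date("2012-04-25T10:15:00Z") },
  { location: "Munich Airport", ts: new Date("2012-04-25T11:00:00Z") },
  { location: "Munich Airport", ts: new Date("2012-04-25T07:00:00Z") },
];
const popular = popularLocations(sample, threeHoursAgo);
```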
  48. Deployment • Single server - need a strong backup plan

    • Replica sets - High availability - Automatic failover P P S S
  49. Deployment • Single server - need a strong backup plan

    • Replica sets - High availability - Automatic failover • Sharded - Horizontally scale - Auto balancing P S S P S S P P S S
  50. MongoDB Use Cases • Archiving • Content Management • Ecommerce

    • Finance • Gaming • Government • Metadata Storage • News & Media • Online Advertising • Online Collaboration • Real-time stats/analytics • Social Networks • Telecommunications
  51. @mongodb conferences, appearances, and meetups http://www.10gen.com/events http://bit.ly/mongofb Facebook | Twitter

    | LinkedIn http://linkd.in/joinmongo download at mongodb.org support, training, and this talk brought to you by