Using NoSQL with Yo' SQL

Using NoSQL with Yo' SQL

Supplementing your app with a slice of MongoDB. Use cases and code examples to get a feel for how to use Mongo in concert with a relational database application.

E837d2996f52008ace055a08fbfff992?s=128

Rich Thornett

May 23, 2012
Tweet

Transcript

  1. Using NoSQL with Yo’ SQL Supplementing your app with a

    slice of MongoDB Rich Thornett Dribbble Thursday, June 9, 2011
  2. Dribbble What are you working on? Show and tell for

    creatives via screenshots Thursday, June 9, 2011
  3. Your Father's Webapp Dribbble is a typical web application: Ruby

    on Rails + Relational Database We <3 PostgreSQL But for certain tasks ... Thursday, June 9, 2011
  4. Alternative Values More flexible data structures Easier horizontal scaling log

    | scale | optimize | aggregate | cache Thursday, June 9, 2011
  5. NoSQL No == Not Only (but sounds a bit stronger,

    no?) • No: Fixed table schemas • No: Joins • Yes: Scale horizontally Examples Memcached, Redis, CouchDB, Cassandra, MongoDB ... Thursday, June 9, 2011
  6. Exploring MongoDB • Persistent data store • Powerful query language

    (closest to RDBMs) • Broad feature set • Great community and documentation Utility belt that fits us? Thursday, June 9, 2011
  7. What is MongoDB? A document-oriented NoSQL database Collections & Documents

    v. Tables & Rows Thursday, June 9, 2011
  8. What's a document? Our old friend JavaScript { _id: ObjectId("4ddfe31db6bc16ab615e573d"),

    description: "This is a BSON document", embedded_doc: { description: "I belong to my parent document" }, tags: ['can', 'haz', 'arrays'] } Documents are BSON (binary encoded JSON) Thursday, June 9, 2011
  9. Embedded Documents Avoid joins for "belongs to" associations { _id:

    ObjectId("4ddfe31db6bc16ab615e573d"), description: "This is a BSON document", embedded_doc: { description: "I belong to my parent document" }, tags: ['can', 'haz', 'arrays'] }) Thursday, June 9, 2011
  10. Arrays { _id: ObjectId("4ddfe31db6bc16ab615e573d"), description: "This is a BSON document",

    embedded_doc: { description: "I belong to my parent document" }, tags: ['can', 'haz', 'arrays'] }) Avoid joins for "tiny relations" thing tags thing_taggings Relational Cruft Thursday, June 9, 2011
  11. Googley “With MongoDB we can ... grow our data set

    horizontally on a cluster of commodity hardware and do distributed (read parallel execution of) queries/updates/inserts/deletes.” --Markus Gattol http://www.markus-gattol.name/ws/mongodb.html Thursday, June 9, 2011
  12. Replica Sets • Read Scaling • Data Redundancy • Automated

    Failover • Maintenance • Disaster Recovery Automate the storing of multiple copies of data Thursday, June 9, 2011
  13. Dude, who sharded? Relax, not you. You Specify a shard

    key for a collection Mongo Partitions the collection across machines Application Blissfully unaware (mostly :) Auto-sharding Thursday, June 9, 2011
  14. CoSQL Caching Analytics Logging Scaling Flexibility MongoDB MIND THE APP

    WEBAPP RDBMS Thursday, June 9, 2011
  15. Ads Let's Mongo! • Orthogonal to primary app • Few

    joins • Integrity not critical Thursday, June 9, 2011
  16. From the Console db.ads.insert({ advertiser_id: 1, type: 'text', url: 'http://dribbbler-on-the-roof.com',

    copy: 'Watch me!', runs: [{ start: new Date(2011, 4, 7), end: new Date(2011, 4, 14) }], created_at: new Date() }) Create a text ad But there are drivers for all major languages Thursday, June 9, 2011
  17. Querying Query by match db.ads.find({advertiser_id: 1}) Paging active ads //

    Page 2 of text ads running this month db.ads.find({ type: 'text', runs: { $elemMatch: { start: {$lte: new Date(2011, 4, 10)}, end: {$gte: new Date(2011, 4, 10)} } } }).sort({created_at: -1}).skip(15).limit(15) Thursday, June 9, 2011
  18. Advanced Queries $gt $lt $gte $lte $all $exists $size $type

    $elemMatch $not $where $mod $ne $in $nin $nor $or http://www.mongodb.org/display/DOCS/Advanced+Queries count | distinct | group Group does not work across shards, use map/reduce instead. Thursday, June 9, 2011
  19. Polymorphism // Banner ad has additional fields db.ads.insert({ advertiser_id: 1,

    type: 'banner', url: 'http://dribbble-me-this.com', copy: 'Buy me!', runs: [], image_file_name: 'ad.png', image_content_type: 'image/png', image_file_size: '33333' }) Easy inheritance. Document has whatever fields it needs. Single | Multiple | Joined table inheritance all present difficulties No DB changes to create new subclasses in Mongo Thursday, June 9, 2011
  20. Logging • Scale and query horizontally • Add fields on

    the fly • Writes: Fast, asynchronous, atomic Thursday, June 9, 2011
  21. Volume Logging • Ad impressions • Screenshot views • Profile

    views Fast, asynchronous writes and sharding FTW! Thursday, June 9, 2011
  22. Real-time Analytics db.trends.update( {date: "2011-04-10 13:00"}, // search criteria {

    $inc: { // increment 'user.simplebits.likes_received': 1, 'country.us.likes_received': 1, 'city.boston.likes_received': 1 } }, true // upsert ) What people and locations are trending this hour? upsert: Update document (if present) or insert it $inc: Increment field by amount (if present) or set to amount Thursday, June 9, 2011
  23. Flex Benefits • Add/nest new fields to measure with ease

    • Atomic upsert with $inc Replaces two-step, transactional find-and-update/create • Live, cached aggregation Thursday, June 9, 2011
  24. Scouting Thursday, June 9, 2011

  25. db.users.insert( { name: 'Dan Cederholm', available: true, skills: ['html', 'css',

    'illustration', 'icon design'] } ) Design a Designer Thursday, June 9, 2011
  26. db.users.ensureIndex({location: '2d'}) db.users.insert( { name: 'Dan Cederholm', // Salem longitude/latitude

    location: [-70.8972222, 42.5194444], available: true, skills: ['html', 'css', 'illustration', 'icon design'] } ) Geospatial Indexing Thursday, June 9, 2011
  27. Search by Location // Find users in the Boston area

    who: // are available for work // have expertise in HTML and icon design db.users.find({ location: {$near: boston, $maxDistance: .7234842}, available: true, skills: {$all: ['html', 'icon design']} }) Within area // $maxDistance: Find users in Boston area (w/in 50 miles) db.users.find({location: {$near: boston, $maxDistance: 0.7234842}}) Within area, matching criteria boston = [-71.0602778, 42.3583333] // long/lat Thursday, June 9, 2011
  28. Search Power Flexible Documents + Rich Query Language + Geospatial

    Indexing Thursday, June 9, 2011
  29. Stats Thursday, June 9, 2011

  30. Unique Views unique = remote_ip address / DAY a.k.a visitors

    per day Thursday, June 9, 2011
  31. Collections Input and output Map Returns 0..N key/value pairs per

    document Reduce Aggregates values per key Aggregate by key => GROUP BY in SQL Map/Reduce http://www.mongodb.org/display/DOCS/MapReduce Thursday, June 9, 2011
  32. Strategy Two-pass map/reduce to calculate unique visitors Pass 1 GROUP

    BY: profile, visitor COUNT: visits per visitor per profile Pass 2 GROUP BY: profile COUNT: visitors Thursday, June 9, 2011
  33. Profile View Data // Profile 1 {profile_id: 1, remote_ip: '127.0.0.1'}

    {profile_id: 1, remote_ip: '127.0.0.1'} {profile_id: 1, remote_ip: '127.0.0.2'} // Profile 2 {profile_id: 2, remote_ip: '127.0.0.4'} {profile_id: 2, remote_ip: '127.0.0.4'} Visits on a given day Thursday, June 9, 2011
  34. Pass 1: Map Function map = function() { var key

    = { profile_id: this.profile_id, remote_ip: this.remote_ip }; emit(key, {count: 1}); } Count visits per remote_ip per profile KEY = profile, remote_ip Thursday, June 9, 2011
  35. Reduce Function reduce = function(key, values) { var count =

    0; values.forEach(function(v) { count += v.count; }); return {count: count}; } Counts (occurrences of key) Thursday, June 9, 2011
  36. Pass 1: Run Map/Reduce db.profile_views.mapReduce(map, reduce, {out: 'profile_views_by_visitor'} ) //

    Results: Unique visitors per profile db.profile_views_by_visitor.find() { "_id": { "profile_id": 1, "remote_ip": "127.0.0.1" }, "value": { "count": 2 } } { "_id": { "profile_id": 1, "remote_ip": "127.0.0.2" }, "value": { "count": 1 } } { "_id": { "profile_id": 2, "remote_ip": "127.0.0.4" }, "value": { "count": 1 } } Count visits per remote_ip per profile Thursday, June 9, 2011
  37. Pass 2: Map/Reduce map = function() { emit(this._id.profile_id, {count: 1});

    } Count visitors per profile KEY = profile_id Thursday, June 9, 2011
  38. Pass 2: Results // Same reduce function as before db.profile_views_by_visitor.mapReduce(map,

    reduce, {out: 'profile_views_unique'} ) // Results db.profile_views_unique.find() { "_id" : 1, "value" : { "count" : 2 } } { "_id" : 2, "value" : { "count" : 1 } } Count visitors per profile Thursday, June 9, 2011
  39. Map/Deduce Large data sets, you get: • Horizontal scaling •

    Parallel processing across cluster Can be clunkier than GROUP BY in SQL. But ... JavaScript functions offers flexibility/power Thursday, June 9, 2011
  40. Activity SELECT * FROM everything; Too many tables to JOIN

    or UNION Thursday, June 9, 2011
  41. Relational solution Denormalized events table as activity log. Column |

    Type | ------------------------+-----------------------------+ id | integer | event_type | character varying(255) | subject_type | character varying(255) | actor_type | character varying(255) | secondary_subject_type | character varying(255) | subject_id | integer | actor_id | integer | secondary_subject_id | integer | recipient_id | integer | secondary_recipient_id | integer | created_at | timestamp without time zone | We use James Golick’s timeline_fu gem for Rails: https://github.com/jamesgolick/timeline_fu Thursday, June 9, 2011
  42. Direction Incoming Activity (recipients) Generated Activity (actors) Thursday, June 9,

    2011
  43. Complications Multiple recipients • Subscribe to comments for a shot

    • Twitter-style @ mentions in comments Confusing names • Generic names make queries and view logic hard to follow N+1 • Each event may require several lookups to get actor, subject, etc Thursday, June 9, 2011
  44. Events in Mongo { event_type: "created", subject_type: "Comment", actor_type: "User",

    subject_id: 999, actor_id: 1, recipients: [], // Multiple recipients secondary_recipient_id: 3, created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)" } Comment on a Screenshot containing an @ mention Screenshot owner and @user should be recipients. Mongo version of our timeline_events table Thursday, June 9, 2011
  45. Mongo Event v.2 { event_type: "created", subject_type: "Comment", actor_type: "User",

    subject_id: 999, actor_id: 1, recipients: [1, 2], recipients: [ {user_id: 2, reason: 'screenshot owner'}, {user_id: 3, reason: 'mention'} ], created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)" } Why is a user a recipient? Thursday, June 9, 2011
  46. Mongo Event v.3 { event_type: "created", subject_type: "Comment", actor_type: "User",

    subject_id: 999, actor_id: 1 user_id: 1, comment_id 999, screenshot_id: 555, recipients: [ {user_id: 2, reason: 'screenshot owner'}, {user_id: 3, reason: 'mention'} ], created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)" } Meaningful names Thursday, June 9, 2011
  47. Mongo Event v.4 { event_type: "created", subject_type: "Comment", user_id: 1,

    comment_id: 999, screenshot_id: 999, user: {id: 1, login: "simplebits", avatar: "dancederholm-peek.png"}, comment: {id: 999, text: "Great shot!”}, screenshot: {id: 555, title: "Shot heard around the world"}, recipients: [ {user_id: 2, reason: 'screenshot owner'}, {user_id: 3, reason: 'mention'} ], created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)" } Denormalize to eliminate N+1s in view Thursday, June 9, 2011
  48. Denormalizing? You're giving up RDBMs benefits to optimize. Optimize your

    optimizations. Document flexibility: Data structures can mirror the view Thursday, June 9, 2011
  49. Caching • Grabs free memory as needed; no configured cache

    size • Relies on OS to reclaim memory (LRU) http://www.mongodb.org/display/DOCS/Caching MongoDB uses memory-mapped files Thursday, June 9, 2011
  50. Replace Redis/Memcached? FREQUENTLY accessed items LIKELY in memory Good enough

    for you? One less moving part. Thursday, June 9, 2011
  51. Cache Namespaces // Clear collection to expire db.ads_cache.remove() 'ad_1' 'ad_2'

    'ad_3' Memcached keys are flat No simple way to expire all Collection can serve as an expirable namespace Thursday, June 9, 2011
  52. Time to Mongo? Versatility? Data structure flexibility worth more than

    joins? Easier horizontal scaling? http://www.mongodb.org log | scale | optimize | aggregate | cache Thursday, June 9, 2011
  53. Cheers! Rich Thornett Dribbble http://dribbble.com @frogandcode Thursday, June 9, 2011