Slide 1

Slide 1 text

Using NoSQL with Yo’ SQL Supplementing your app with a slice of MongoDB Rich Thornett Dribbble Thursday, June 9, 2011

Slide 2

Slide 2 text

Dribbble What are you working on? Show and tell for creatives via screenshots Thursday, June 9, 2011

Slide 3

Slide 3 text

Your Father's Webapp Dribbble is a typical web application: Ruby on Rails + Relational Database We <3 PostgreSQL But for certain tasks ... Thursday, June 9, 2011

Slide 4

Slide 4 text

Alternative Values More flexible data structures Easier horizontal scaling log | scale | optimize | aggregate | cache Thursday, June 9, 2011

Slide 5

Slide 5 text

NoSQL No == Not Only (but sounds a bit stronger, no?) • No: Fixed table schemas • No: Joins • Yes: Scale horizontally Examples Memcached, Redis, CouchDB, Cassandra, MongoDB ... Thursday, June 9, 2011

Slide 6

Slide 6 text

Exploring MongoDB • Persistent data store • Powerful query language (closest to RDBMs) • Broad feature set • Great community and documentation Utility belt that fits us? Thursday, June 9, 2011

Slide 7

Slide 7 text

What is MongoDB? A document-oriented NoSQL database Collections & Documents v. Tables & Rows Thursday, June 9, 2011

Slide 8

Slide 8 text

What's a document? Our old friend JavaScript { _id: ObjectId("4ddfe31db6bc16ab615e573d"), description: "This is a BSON document", embedded_doc: { description: "I belong to my parent document" }, tags: ['can', 'haz', 'arrays'] } Documents are BSON (binary encoded JSON) Thursday, June 9, 2011

Slide 9

Slide 9 text

Embedded Documents Avoid joins for "belongs to" associations { _id: ObjectId("4ddfe31db6bc16ab615e573d"), description: "This is a BSON document", embedded_doc: { description: "I belong to my parent document" }, tags: ['can', 'haz', 'arrays'] }) Thursday, June 9, 2011

Slide 10

Slide 10 text

Arrays { _id: ObjectId("4ddfe31db6bc16ab615e573d"), description: "This is a BSON document", embedded_doc: { description: "I belong to my parent document" }, tags: ['can', 'haz', 'arrays'] }) Avoid joins for "tiny relations" thing tags thing_taggings Relational Cruft Thursday, June 9, 2011

Slide 11

Slide 11 text

Googley “With MongoDB we can ... grow our data set horizontally on a cluster of commodity hardware and do distributed (read parallel execution of) queries/updates/inserts/deletes.” --Markus Gattol http://www.markus-gattol.name/ws/mongodb.html Thursday, June 9, 2011

Slide 12

Slide 12 text

Replica Sets • Read Scaling • Data Redundancy • Automated Failover • Maintenance • Disaster Recovery Automate the storing of multiple copies of data Thursday, June 9, 2011

Slide 13

Slide 13 text

Dude, who sharded? Relax, not you. You Specify a shard key for a collection Mongo Partitions the collection across machines Application Blissfully unaware (mostly :) Auto-sharding Thursday, June 9, 2011

Slide 14

Slide 14 text

CoSQL Caching Analytics Logging Scaling Flexibility MongoDB MIND THE APP WEBAPP RDBMS Thursday, June 9, 2011

Slide 15

Slide 15 text

Ads Let's Mongo! • Orthogonal to primary app • Few joins • Integrity not critical Thursday, June 9, 2011

Slide 16

Slide 16 text

From the Console db.ads.insert({ advertiser_id: 1, type: 'text', url: 'http://dribbbler-on-the-roof.com', copy: 'Watch me!', runs: [{ start: new Date(2011, 4, 7), end: new Date(2011, 4, 14) }], created_at: new Date() }) Create a text ad But there are drivers for all major languages Thursday, June 9, 2011

Slide 17

Slide 17 text

Querying Query by match db.ads.find({advertiser_id: 1}) Paging active ads // Page 2 of text ads running this month db.ads.find({ type: 'text', runs: { $elemMatch: { start: {$lte: new Date(2011, 4, 10)}, end: {$gte: new Date(2011, 4, 10)} } } }).sort({created_at: -1}).skip(15).limit(15) Thursday, June 9, 2011

Slide 18

Slide 18 text

Advanced Queries $gt $lt $gte $lte $all $exists $size $type $elemMatch $not $where $mod $ne $in $nin $nor $or http://www.mongodb.org/display/DOCS/Advanced+Queries count | distinct | group Group does not work across shards, use map/reduce instead. Thursday, June 9, 2011

Slide 19

Slide 19 text

Polymorphism // Banner ad has additional fields db.ads.insert({ advertiser_id: 1, type: 'banner', url: 'http://dribbble-me-this.com', copy: 'Buy me!', runs: [], image_file_name: 'ad.png', image_content_type: 'image/png', image_file_size: '33333' }) Easy inheritance. Document has whatever fields it needs. Single | Multiple | Joined table inheritance all present difficulties No DB changes to create new subclasses in Mongo Thursday, June 9, 2011

Slide 20

Slide 20 text

Logging • Scale and query horizontally • Add fields on the fly • Writes: Fast, asynchronous, atomic Thursday, June 9, 2011

Slide 21

Slide 21 text

Volume Logging • Ad impressions • Screenshot views • Profile views Fast, asynchronous writes and sharding FTW! Thursday, June 9, 2011

Slide 22

Slide 22 text

Real-time Analytics db.trends.update( {date: "2011-04-10 13:00"}, // search criteria { $inc: { // increment 'user.simplebits.likes_received': 1, 'country.us.likes_received': 1, 'city.boston.likes_received': 1 } }, true // upsert ) What people and locations are trending this hour? upsert: Update document (if present) or insert it $inc: Increment field by amount (if present) or set to amount Thursday, June 9, 2011

Slide 23

Slide 23 text

Flex Benefits • Add/nest new fields to measure with ease • Atomic upsert with $inc Replaces two-step, transactional find-and-update/create • Live, cached aggregation Thursday, June 9, 2011

Slide 24

Slide 24 text

Scouting Thursday, June 9, 2011

Slide 25

Slide 25 text

db.users.insert( { name: 'Dan Cederholm', available: true, skills: ['html', 'css', 'illustration', 'icon design'] } ) Design a Designer Thursday, June 9, 2011

Slide 26

Slide 26 text

db.users.ensureIndex({location: '2d'}) db.users.insert( { name: 'Dan Cederholm', // Salem longitude/latitude location: [-70.8972222, 42.5194444], available: true, skills: ['html', 'css', 'illustration', 'icon design'] } ) Geospatial Indexing Thursday, June 9, 2011

Slide 27

Slide 27 text

Search by Location // Find users in the Boston area who: // are available for work // have expertise in HTML and icon design db.users.find({ location: {$near: boston, $maxDistance: .7234842}, available: true, skills: {$all: ['html', 'icon design']} }) Within area // $maxDistance: Find users in Boston area (w/in 50 miles) db.users.find({location: {$near: boston, $maxDistance: 0.7234842}}) Within area, matching criteria boston = [-71.0602778, 42.3583333] // long/lat Thursday, June 9, 2011

Slide 28

Slide 28 text

Search Power Flexible Documents + Rich Query Language + Geospatial Indexing Thursday, June 9, 2011

Slide 29

Slide 29 text

Stats Thursday, June 9, 2011

Slide 30

Slide 30 text

Unique Views unique = remote_ip address / DAY a.k.a visitors per day Thursday, June 9, 2011

Slide 31

Slide 31 text

Collections Input and output Map Returns 0..N key/value pairs per document Reduce Aggregates values per key Aggregate by key => GROUP BY in SQL Map/Reduce http://www.mongodb.org/display/DOCS/MapReduce Thursday, June 9, 2011

Slide 32

Slide 32 text

Strategy Two-pass map/reduce to calculate unique visitors Pass 1 GROUP BY: profile, visitor COUNT: visits per visitor per profile Pass 2 GROUP BY: profile COUNT: visitors Thursday, June 9, 2011

Slide 33

Slide 33 text

Profile View Data // Profile 1 {profile_id: 1, remote_ip: '127.0.0.1'} {profile_id: 1, remote_ip: '127.0.0.1'} {profile_id: 1, remote_ip: '127.0.0.2'} // Profile 2 {profile_id: 2, remote_ip: '127.0.0.4'} {profile_id: 2, remote_ip: '127.0.0.4'} Visits on a given day Thursday, June 9, 2011

Slide 34

Slide 34 text

Pass 1: Map Function map = function() { var key = { profile_id: this.profile_id, remote_ip: this.remote_ip }; emit(key, {count: 1}); } Count visits per remote_ip per profile KEY = profile, remote_ip Thursday, June 9, 2011

Slide 35

Slide 35 text

Reduce Function reduce = function(key, values) { var count = 0; values.forEach(function(v) { count += v.count; }); return {count: count}; } Counts (occurrences of key) Thursday, June 9, 2011

Slide 36

Slide 36 text

Pass 1: Run Map/Reduce db.profile_views.mapReduce(map, reduce, {out: 'profile_views_by_visitor'} ) // Results: Unique visitors per profile db.profile_views_by_visitor.find() { "_id": { "profile_id": 1, "remote_ip": "127.0.0.1" }, "value": { "count": 2 } } { "_id": { "profile_id": 1, "remote_ip": "127.0.0.2" }, "value": { "count": 1 } } { "_id": { "profile_id": 2, "remote_ip": "127.0.0.4" }, "value": { "count": 1 } } Count visits per remote_ip per profile Thursday, June 9, 2011

Slide 37

Slide 37 text

Pass 2: Map/Reduce map = function() { emit(this._id.profile_id, {count: 1}); } Count visitors per profile KEY = profile_id Thursday, June 9, 2011

Slide 38

Slide 38 text

Pass 2: Results // Same reduce function as before db.profile_views_by_visitor.mapReduce(map, reduce, {out: 'profile_views_unique'} ) // Results db.profile_views_unique.find() { "_id" : 1, "value" : { "count" : 2 } } { "_id" : 2, "value" : { "count" : 1 } } Count visitors per profile Thursday, June 9, 2011

Slide 39

Slide 39 text

Map/Deduce Large data sets, you get: • Horizontal scaling • Parallel processing across cluster Can be clunkier than GROUP BY in SQL. But ... JavaScript functions offers flexibility/power Thursday, June 9, 2011

Slide 40

Slide 40 text

Activity SELECT * FROM everything; Too many tables to JOIN or UNION Thursday, June 9, 2011

Slide 41

Slide 41 text

Relational solution Denormalized events table as activity log. Column | Type | ------------------------+-----------------------------+ id | integer | event_type | character varying(255) | subject_type | character varying(255) | actor_type | character varying(255) | secondary_subject_type | character varying(255) | subject_id | integer | actor_id | integer | secondary_subject_id | integer | recipient_id | integer | secondary_recipient_id | integer | created_at | timestamp without time zone | We use James Golick’s timeline_fu gem for Rails: https://github.com/jamesgolick/timeline_fu Thursday, June 9, 2011

Slide 42

Slide 42 text

Direction Incoming Activity (recipients) Generated Activity (actors) Thursday, June 9, 2011

Slide 43

Slide 43 text

Complications Multiple recipients • Subscribe to comments for a shot • Twitter-style @ mentions in comments Confusing names • Generic names make queries and view logic hard to follow N+1 • Each event may require several lookups to get actor, subject, etc Thursday, June 9, 2011

Slide 44

Slide 44 text

Events in Mongo { event_type: "created", subject_type: "Comment", actor_type: "User", subject_id: 999, actor_id: 1, recipients: [], // Multiple recipients secondary_recipient_id: 3, created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)" } Comment on a Screenshot containing an @ mention Screenshot owner and @user should be recipients. Mongo version of our timeline_events table Thursday, June 9, 2011

Slide 45

Slide 45 text

Mongo Event v.2 { event_type: "created", subject_type: "Comment", actor_type: "User", subject_id: 999, actor_id: 1, recipients: [1, 2], recipients: [ {user_id: 2, reason: 'screenshot owner'}, {user_id: 3, reason: 'mention'} ], created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)" } Why is a user a recipient? Thursday, June 9, 2011

Slide 46

Slide 46 text

Mongo Event v.3 { event_type: "created", subject_type: "Comment", actor_type: "User", subject_id: 999, actor_id: 1 user_id: 1, comment_id 999, screenshot_id: 555, recipients: [ {user_id: 2, reason: 'screenshot owner'}, {user_id: 3, reason: 'mention'} ], created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)" } Meaningful names Thursday, June 9, 2011

Slide 47

Slide 47 text

Mongo Event v.4 { event_type: "created", subject_type: "Comment", user_id: 1, comment_id: 999, screenshot_id: 999, user: {id: 1, login: "simplebits", avatar: "dancederholm-peek.png"}, comment: {id: 999, text: "Great shot!”}, screenshot: {id: 555, title: "Shot heard around the world"}, recipients: [ {user_id: 2, reason: 'screenshot owner'}, {user_id: 3, reason: 'mention'} ], created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)" } Denormalize to eliminate N+1s in view Thursday, June 9, 2011

Slide 48

Slide 48 text

Denormalizing? You're giving up RDBMs benefits to optimize. Optimize your optimizations. Document flexibility: Data structures can mirror the view Thursday, June 9, 2011

Slide 49

Slide 49 text

Caching • Grabs free memory as needed; no configured cache size • Relies on OS to reclaim memory (LRU) http://www.mongodb.org/display/DOCS/Caching MongoDB uses memory-mapped files Thursday, June 9, 2011

Slide 50

Slide 50 text

Replace Redis/Memcached? FREQUENTLY accessed items LIKELY in memory Good enough for you? One less moving part. Thursday, June 9, 2011

Slide 51

Slide 51 text

Cache Namespaces // Clear collection to expire db.ads_cache.remove() 'ad_1' 'ad_2' 'ad_3' Memcached keys are flat No simple way to expire all Collection can serve as an expirable namespace Thursday, June 9, 2011

Slide 52

Slide 52 text

Time to Mongo? Versatility? Data structure flexibility worth more than joins? Easier horizontal scaling? http://www.mongodb.org log | scale | optimize | aggregate | cache Thursday, June 9, 2011

Slide 53

Slide 53 text

Cheers! Rich Thornett Dribbble http://dribbble.com @frogandcode Thursday, June 9, 2011