Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Using NoSQL with Yo' SQL

Using NoSQL with Yo' SQL

Supplementing your app with a slice of MongoDB. Use cases and code examples to get a feel for how to use Mongo in concert with a relational database application.

Rich Thornett

May 23, 2012
Tweet

More Decks by Rich Thornett

Other Decks in Programming

Transcript

  1. Using NoSQL with Yo’ SQL
    Supplementing your app with a slice of MongoDB
    Rich Thornett
    Dribbble
    Thursday, June 9, 2011

    View full-size slide

  2. Dribbble
    What are you working on?
    Show and tell for creatives via screenshots
    Thursday, June 9, 2011

    View full-size slide

  3. Your Father's Webapp
    Dribbble is a typical web application:
    Ruby on Rails + Relational Database
    We <3 PostgreSQL
    But for certain tasks ...
    Thursday, June 9, 2011

    View full-size slide

  4. Alternative Values
    More flexible data structures
    Easier horizontal scaling
    log | scale | optimize | aggregate | cache
    Thursday, June 9, 2011

    View full-size slide

  5. NoSQL
    No == Not Only
    (but sounds a bit stronger, no?)
    • No: Fixed table schemas
    • No: Joins
    • Yes: Scale horizontally
    Examples
    Memcached, Redis, CouchDB, Cassandra, MongoDB ...
    Thursday, June 9, 2011

    View full-size slide

  6. Exploring MongoDB
    • Persistent data store
    • Powerful query language (closest to RDBMs)
    • Broad feature set
    • Great community and documentation
    Utility belt that fits us?
    Thursday, June 9, 2011

    View full-size slide

  7. What is MongoDB?
    A document-oriented NoSQL database
    Collections & Documents
    v.
    Tables & Rows
    Thursday, June 9, 2011

    View full-size slide

  8. What's a document?
    Our old friend JavaScript
    {
    _id: ObjectId("4ddfe31db6bc16ab615e573d"),
    description: "This is a BSON document",
    embedded_doc: {
    description: "I belong to my parent document"
    },
    tags: ['can', 'haz', 'arrays']
    }
    Documents are BSON (binary encoded JSON)
    Thursday, June 9, 2011

    View full-size slide

  9. Embedded Documents
    Avoid joins for "belongs to" associations
    {
    _id: ObjectId("4ddfe31db6bc16ab615e573d"),
    description: "This is a BSON document",
    embedded_doc: {
    description: "I belong to my parent document"
    },
    tags: ['can', 'haz', 'arrays']
    })
    Thursday, June 9, 2011

    View full-size slide

  10. Arrays
    {
    _id: ObjectId("4ddfe31db6bc16ab615e573d"),
    description: "This is a BSON document",
    embedded_doc: {
    description: "I belong to my parent document"
    },
    tags: ['can', 'haz', 'arrays']
    })
    Avoid joins for "tiny relations"
    thing tags
    thing_taggings
    Relational Cruft
    Thursday, June 9, 2011

    View full-size slide

  11. Googley
    “With MongoDB we can ... grow our data set horizontally
    on a cluster of commodity hardware and do distributed
    (read parallel execution of) queries/updates/inserts/deletes.”
    --Markus Gattol
    http://www.markus-gattol.name/ws/mongodb.html
    Thursday, June 9, 2011

    View full-size slide

  12. Replica Sets
    • Read Scaling
    • Data Redundancy
    • Automated Failover
    • Maintenance
    • Disaster Recovery
    Automate the storing of multiple copies of data
    Thursday, June 9, 2011

    View full-size slide

  13. Dude, who sharded?
    Relax, not you.
    You
    Specify a shard key for a collection
    Mongo
    Partitions the collection across machines
    Application
    Blissfully unaware (mostly :)
    Auto-sharding
    Thursday, June 9, 2011

    View full-size slide

  14. CoSQL
    Caching
    Analytics
    Logging
    Scaling
    Flexibility
    MongoDB
    MIND THE APP
    WEBAPP
    RDBMS
    Thursday, June 9, 2011

    View full-size slide

  15. Ads
    Let's Mongo!
    • Orthogonal to primary app
    • Few joins
    • Integrity not critical
    Thursday, June 9, 2011

    View full-size slide

  16. From the Console
    db.ads.insert({
    advertiser_id: 1,
    type: 'text',
    url: 'http://dribbbler-on-the-roof.com',
    copy: 'Watch me!',
    runs: [{
    start: new Date(2011, 4, 7),
    end: new Date(2011, 4, 14)
    }],
    created_at: new Date()
    })
    Create a text ad
    But there are drivers for all major languages
    Thursday, June 9, 2011

    View full-size slide

  17. Querying
    Query by match
    db.ads.find({advertiser_id: 1})
    Paging active ads
    // Page 2 of text ads running this month
    db.ads.find({
    type: 'text',
    runs: {
    $elemMatch: {
    start: {$lte: new Date(2011, 4, 10)},
    end: {$gte: new Date(2011, 4, 10)}
    }
    }
    }).sort({created_at: -1}).skip(15).limit(15)
    Thursday, June 9, 2011

    View full-size slide

  18. Advanced Queries
    $gt
    $lt
    $gte
    $lte
    $all
    $exists
    $size
    $type
    $elemMatch
    $not
    $where
    $mod
    $ne
    $in
    $nin
    $nor
    $or
    http://www.mongodb.org/display/DOCS/Advanced+Queries
    count | distinct | group
    Group does not work across shards, use map/reduce instead.
    Thursday, June 9, 2011

    View full-size slide

  19. Polymorphism
    // Banner ad has additional fields
    db.ads.insert({
    advertiser_id: 1,
    type: 'banner',
    url: 'http://dribbble-me-this.com',
    copy: 'Buy me!',
    runs: [],
    image_file_name: 'ad.png',
    image_content_type: 'image/png',
    image_file_size: '33333'
    })
    Easy inheritance. Document has whatever fields it needs.
    Single | Multiple | Joined
    table inheritance all present difficulties
    No DB changes to create new subclasses in Mongo
    Thursday, June 9, 2011

    View full-size slide

  20. Logging
    • Scale and query horizontally
    • Add fields on the fly
    • Writes: Fast, asynchronous, atomic
    Thursday, June 9, 2011

    View full-size slide

  21. Volume Logging
    • Ad impressions
    • Screenshot views
    • Profile views
    Fast, asynchronous writes and sharding FTW!
    Thursday, June 9, 2011

    View full-size slide

  22. Real-time Analytics
    db.trends.update(
    {date: "2011-04-10 13:00"}, // search criteria
    {
    $inc: { // increment
    'user.simplebits.likes_received': 1,
    'country.us.likes_received': 1,
    'city.boston.likes_received': 1
    }
    },
    true // upsert
    )
    What people and locations are trending this hour?
    upsert: Update document (if present) or insert it
    $inc: Increment field by amount (if present) or set to amount
    Thursday, June 9, 2011

    View full-size slide

  23. Flex Benefits
    • Add/nest new fields to measure with ease
    • Atomic upsert with $inc
    Replaces two-step, transactional find-and-update/create
    • Live, cached aggregation
    Thursday, June 9, 2011

    View full-size slide

  24. Scouting
    Thursday, June 9, 2011

    View full-size slide

  25. db.users.insert(
    { name: 'Dan Cederholm',
    available: true,
    skills: ['html', 'css', 'illustration', 'icon design'] }
    )
    Design a Designer
    Thursday, June 9, 2011

    View full-size slide

  26. db.users.ensureIndex({location: '2d'})
    db.users.insert(
    { name: 'Dan Cederholm',
    // Salem longitude/latitude
    location: [-70.8972222, 42.5194444],
    available: true,
    skills: ['html', 'css', 'illustration', 'icon design'] }
    )
    Geospatial Indexing
    Thursday, June 9, 2011

    View full-size slide

  27. Search by Location
    // Find users in the Boston area who:
    // are available for work
    // have expertise in HTML and icon design
    db.users.find({
    location: {$near: boston, $maxDistance: .7234842},
    available: true,
    skills: {$all: ['html', 'icon design']}
    })
    Within area
    // $maxDistance: Find users in Boston area (w/in 50 miles)
    db.users.find({location: {$near: boston, $maxDistance: 0.7234842}})
    Within area, matching criteria
    boston = [-71.0602778, 42.3583333] // long/lat
    Thursday, June 9, 2011

    View full-size slide

  28. Search Power
    Flexible Documents
    +
    Rich Query Language
    +
    Geospatial Indexing
    Thursday, June 9, 2011

    View full-size slide

  29. Stats
    Thursday, June 9, 2011

    View full-size slide

  30. Unique Views
    unique = remote_ip address / DAY
    a.k.a visitors per day
    Thursday, June 9, 2011

    View full-size slide

  31. Collections
    Input and output
    Map
    Returns 0..N key/value pairs per document
    Reduce
    Aggregates values per key
    Aggregate by key => GROUP BY in SQL
    Map/Reduce
    http://www.mongodb.org/display/DOCS/MapReduce
    Thursday, June 9, 2011

    View full-size slide

  32. Strategy
    Two-pass map/reduce to calculate unique visitors
    Pass 1
    GROUP BY: profile, visitor
    COUNT: visits per visitor per profile
    Pass 2
    GROUP BY: profile
    COUNT: visitors
    Thursday, June 9, 2011

    View full-size slide

  33. Profile View Data
    // Profile 1
    {profile_id: 1, remote_ip: '127.0.0.1'}
    {profile_id: 1, remote_ip: '127.0.0.1'}
    {profile_id: 1, remote_ip: '127.0.0.2'}
    // Profile 2
    {profile_id: 2, remote_ip: '127.0.0.4'}
    {profile_id: 2, remote_ip: '127.0.0.4'}
    Visits on a given day
    Thursday, June 9, 2011

    View full-size slide

  34. Pass 1: Map Function
    map = function() {
    var key = {
    profile_id: this.profile_id,
    remote_ip: this.remote_ip
    };
    emit(key, {count: 1});
    }
    Count visits per remote_ip per profile
    KEY = profile, remote_ip
    Thursday, June 9, 2011

    View full-size slide

  35. Reduce Function
    reduce = function(key, values) {
    var count = 0;
    values.forEach(function(v) {
    count += v.count;
    });
    return {count: count};
    }
    Counts
    (occurrences of key)
    Thursday, June 9, 2011

    View full-size slide

  36. Pass 1: Run Map/Reduce
    db.profile_views.mapReduce(map, reduce,
    {out: 'profile_views_by_visitor'}
    )
    // Results: Unique visitors per profile
    db.profile_views_by_visitor.find()
    { "_id": { "profile_id": 1, "remote_ip": "127.0.0.1" }, "value": { "count": 2 } }
    { "_id": { "profile_id": 1, "remote_ip": "127.0.0.2" }, "value": { "count": 1 } }
    { "_id": { "profile_id": 2, "remote_ip": "127.0.0.4" }, "value": { "count": 1 } }
    Count visits per remote_ip per profile
    Thursday, June 9, 2011

    View full-size slide

  37. Pass 2: Map/Reduce
    map = function() {
    emit(this._id.profile_id, {count: 1});
    }
    Count visitors per profile
    KEY = profile_id
    Thursday, June 9, 2011

    View full-size slide

  38. Pass 2: Results
    // Same reduce function as before
    db.profile_views_by_visitor.mapReduce(map, reduce,
    {out: 'profile_views_unique'}
    )
    // Results
    db.profile_views_unique.find()
    { "_id" : 1, "value" : { "count" : 2 } }
    { "_id" : 2, "value" : { "count" : 1 } }
    Count visitors per profile
    Thursday, June 9, 2011

    View full-size slide

  39. Map/Deduce
    Large data sets, you get:
    • Horizontal scaling
    • Parallel processing across cluster
    Can be clunkier than GROUP BY in SQL. But ...
    JavaScript functions offers flexibility/power
    Thursday, June 9, 2011

    View full-size slide

  40. Activity
    SELECT * FROM everything;
    Too many tables to JOIN or UNION
    Thursday, June 9, 2011

    View full-size slide

  41. Relational solution
    Denormalized events table as activity log.
    Column | Type |
    ------------------------+-----------------------------+
    id | integer |
    event_type | character varying(255) |
    subject_type | character varying(255) |
    actor_type | character varying(255) |
    secondary_subject_type | character varying(255) |
    subject_id | integer |
    actor_id | integer |
    secondary_subject_id | integer |
    recipient_id | integer |
    secondary_recipient_id | integer |
    created_at | timestamp without time zone |
    We use James Golick’s timeline_fu gem for Rails:
    https://github.com/jamesgolick/timeline_fu
    Thursday, June 9, 2011

    View full-size slide

  42. Direction
    Incoming Activity
    (recipients)
    Generated Activity
    (actors)
    Thursday, June 9, 2011

    View full-size slide

  43. Complications
    Multiple recipients
    • Subscribe to comments for a shot
    • Twitter-style @ mentions in comments
    Confusing names
    • Generic names make queries and view logic hard to follow
    N+1
    • Each event may require several lookups to get actor, subject, etc
    Thursday, June 9, 2011

    View full-size slide

  44. Events in Mongo
    {
    event_type: "created",
    subject_type: "Comment",
    actor_type: "User",
    subject_id: 999,
    actor_id: 1,
    recipients: [], // Multiple recipients
    secondary_recipient_id: 3,
    created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
    }
    Comment on a Screenshot containing an @ mention
    Screenshot owner and @user should be recipients.
    Mongo version of our timeline_events table
    Thursday, June 9, 2011

    View full-size slide

  45. Mongo Event v.2
    {
    event_type: "created",
    subject_type: "Comment",
    actor_type: "User",
    subject_id: 999,
    actor_id: 1,
    recipients: [1, 2],
    recipients: [
    {user_id: 2, reason: 'screenshot owner'},
    {user_id: 3, reason: 'mention'}
    ],
    created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
    }
    Why is a user a recipient?
    Thursday, June 9, 2011

    View full-size slide

  46. Mongo Event v.3
    {
    event_type: "created",
    subject_type: "Comment",
    actor_type: "User",
    subject_id: 999,
    actor_id: 1
    user_id: 1,
    comment_id 999,
    screenshot_id: 555,
    recipients: [
    {user_id: 2, reason: 'screenshot owner'},
    {user_id: 3, reason: 'mention'}
    ],
    created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
    }
    Meaningful names
    Thursday, June 9, 2011

    View full-size slide

  47. Mongo Event v.4
    {
    event_type: "created",
    subject_type: "Comment",
    user_id: 1,
    comment_id: 999,
    screenshot_id: 999,
    user: {id: 1, login: "simplebits", avatar: "dancederholm-peek.png"},
    comment: {id: 999, text: "Great shot!”},
    screenshot: {id: 555, title: "Shot heard around the world"},
    recipients: [
    {user_id: 2, reason: 'screenshot owner'},
    {user_id: 3, reason: 'mention'}
    ],
    created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
    }
    Denormalize to eliminate N+1s in view
    Thursday, June 9, 2011

    View full-size slide

  48. Denormalizing?
    You're giving up RDBMs benefits to optimize.
    Optimize your optimizations.
    Document flexibility:
    Data structures can mirror the view
    Thursday, June 9, 2011

    View full-size slide

  49. Caching
    • Grabs free memory as needed; no configured cache size
    • Relies on OS to reclaim memory (LRU)
    http://www.mongodb.org/display/DOCS/Caching
    MongoDB uses memory-mapped files
    Thursday, June 9, 2011

    View full-size slide

  50. Replace Redis/Memcached?
    FREQUENTLY accessed items LIKELY in memory
    Good enough for you?
    One less moving part.
    Thursday, June 9, 2011

    View full-size slide

  51. Cache Namespaces
    // Clear collection to expire
    db.ads_cache.remove()
    'ad_1'
    'ad_2'
    'ad_3'
    Memcached keys are flat
    No simple way to expire all
    Collection
    can serve as an expirable namespace
    Thursday, June 9, 2011

    View full-size slide

  52. Time to Mongo?
    Versatility?
    Data structure flexibility worth more than joins?
    Easier horizontal scaling?
    http://www.mongodb.org
    log | scale | optimize | aggregate | cache
    Thursday, June 9, 2011

    View full-size slide

  53. Cheers!
    Rich Thornett
    Dribbble
    http://dribbble.com
    @frogandcode
    Thursday, June 9, 2011

    View full-size slide