NoSQL - An Introduction

Kevin Lawver
September 26, 2011

Given at Refresh Savannah last November. The slides are really spare, but man was this a fun one to give.

Transcript

  1. NoSQL
    Now we know what it’s not... what is it?

  2. What are we running
    from?
    • Relational databases are the de facto
    standard for storing data in a web
    application.
    • A lot of times, that data isn’t really
    relational at all.
    • RDBMSs have lots of rules that can impact
    performance.

  3. Rules? What Rules?
    • Classic relational databases follow the
    ACID rules:
    • Atomicity
    • Consistency
    • Isolation
    • Durability

  4. Atomicity
    • If any part of the update fails, it all fails.
    • Databases have to be able to lock tables
    and rows for operations, which can block
    or delay other incoming requests.
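
A minimal Ruby sketch of the all-or-nothing idea, using an in-memory Hash rather than a real database. The `atomic_update` helper and the account names are hypothetical, made up for illustration; a real RDBMS gets the same effect with transactions and locks.

```ruby
# Hypothetical helper: apply a block of changes to a Hash, or none of them.
def atomic_update(store)
  snapshot = Marshal.load(Marshal.dump(store)) # deep copy to restore on failure
  yield store
  true
rescue StandardError
  store.replace(snapshot) # any failure rolls back every change
  false
end

accounts = { "alice" => 100, "bob" => 50 }

# This transfer fails halfway through, so neither balance changes.
failed = atomic_update(accounts) do |a|
  a["alice"] -= 150
  raise ArgumentError, "insufficient funds" if a["alice"] < 0
  a["bob"] += 150
end
after_failure = accounts.dup

# This one succeeds, so both balances change together.
succeeded = atomic_update(accounts) do |a|
  a["alice"] -= 30
  a["bob"] += 30
end
```

The snapshot here plays the role of the lock: nothing else may touch the data while the update is in flight, which is exactly the cost the slide is pointing at.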

  5. Consistency
    • After a transaction, all copies of the data
    must be consistent with each other (my
    interpretation).
    • Replication across lots of shards is
    expensive especially if there’s locking
    involved.

  6. Isolation
    • Data involved in a transaction must be
    inaccessible to other operations.
    • Remember the thing about locked rows
    and tables?
    • It’s a bummer.

  7. Durability
    • Once a user is notified that a transaction
    has completed, the data must be accessible
    and all integrity constraints have been met.

  8. I come not to bury
    MySQL...
    • Relational databases are great for a lot of
    uses.
    • If your data is actually relational, you need
    transactions and joins, and you have a
    limited number of data types, then an
    RDBMS will work for you.

  9. But...
    • RDBMSs have been
    treated like hammers
    and used for things
    they’re not good at and
    weren’t designed for.
    • Like the web...

  10. Thus were born...
    • Key-Value Stores
    • Wide-Column Stores
    • Document Stores/Databases
    • Graph Databases

  11. All thrown together &
    clumsily dubbed...

  12. NoSQL

  13. Which, despite its
    negative sound,
    supposedly means:
    “Not Only SQL”

  14. Yeah, I don’t believe it
    either...

  15. Key-Value
    Just what it sounds like. You set a Key to a Value and
    can then retrieve it.

  16. Key-Value Benefits
    • Simple
    • High performance (usually) because there
    are no transactions or relations so it’s a
    simple bucket and lookup.
    • Extremely flexible
    • Commonly used as caches in front of
    slower resources (like MySQL - bazinga!)
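
The caching use case can be sketched in a few lines of Ruby. This is an illustration only: the `ReadThroughCache` class is hypothetical, a plain Hash stands in for memcached, and another Hash stands in for the slower database.

```ruby
# Hypothetical read-through cache: a Hash standing in for memcached.
class ReadThroughCache
  attr_reader :misses

  def initialize(&loader)
    @loader = loader # the slow lookup, e.g. a database query
    @data = {}
    @misses = 0
  end

  def get(key)
    return @data[key] if @data.key?(key) # cache hit: no trip to the slow store
    @misses += 1
    @data[key] = @loader.call(key)       # cache miss: load, remember, return
  end
end

slow_db = { 237 => "Song #237" } # stand-in for MySQL
cache = ReadThroughCache.new { |id| slow_db[id] }

first = cache.get(237)  # goes to the slow store
second = cache.get(237) # served from the cache
```

Set a key, get a key: the whole interface is a bucket and a lookup, which is why it stays simple and fast.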

  17. Popular Players
    • memcached - in memory only, extremely
    efficient hashing algorithm allows you to
    scale easily to hundreds of nodes.
    • Redis - persistent, slightly more complex
    than memcached (has support for arrays)
    but still highly performant.
    • Riak - The Rails Machine guys love it. Jesse?

  18. My Uses
    • memcached: Read-through cache for
    Rails with cache-money.
    • redis: persistent cache for results from
    our algorithm, partitioned by version and
    instance.

  19. Wide Column
    • Family of databases modeled on either
    Google’s BigTable or Amazon’s Dynamo.
    • Pick two out of three from the CAP
    theorem in order to get horizontal
    scalability.
    • Data stored by column instead of by row.

  20. CAP?
    • Consistency: All clients always have the
    same view of the data.
    • Availability: Each client can always read
    and write.
    • Partition Tolerance: The system works
    well despite physical network partitions.

  21. Use cases
    • Making sense out of large amounts of data
    where you know your query scenario
    ahead of time.
    • Large = 100s of millions of records.
    • Data-mining log files and other sources of
    similar data.

  22. Big Players
    • HBase
    • Cassandra
    • Hypertable
    • Amazon’s SimpleDB
    • Google’s BigTable (the granddaddy of all of
    them)

  23. Graph Databases
    • Store nodes, edges and properties
    • Think of them as Things, Connections and
    Properties
    • Good for storing properties and
    relationships.
    • Honestly, I don’t fully understand them...
    anyone?
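
One rough way to picture nodes, edges and properties is with plain Ruby structs. This is only a sketch of the data model, not of how any graph database stores or queries data; the `Node` and `Edge` structs, the `neighbors` helper and the sample names are all made up for illustration.

```ruby
# Nodes (Things) and edges (Connections) both carry a Hash of properties.
Node = Struct.new(:name, :properties)
Edge = Struct.new(:from, :to, :type, :properties)

alice = Node.new("Alice", { kind: "person" })
bob   = Node.new("Bob",   { kind: "person" })
song  = Node.new("Song",  { kind: "track" })

edges = [
  Edge.new(alice, bob,  :follows, {}),
  Edge.new(alice, song, :likes,   { rating: 5 }),
  Edge.new(bob,   song, :likes,   { rating: 3 }),
]

# Hypothetical traversal helper: follow outgoing edges of one type.
def neighbors(edges, node, type)
  edges.select { |e| e.from.equal?(node) && e.type == type }.map(&:to)
end

liked_by_alice = neighbors(edges, alice, :likes).map(&:name)
followed_by_alice = neighbors(edges, alice, :follows).map(&:name)
```

The point of the model is that the connection itself is a first-class record with its own properties, rather than a row in a join table.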

  24. The Players
    • Neo4j
    • FlockDB
    • HyperGraphDB

  25. Document Stores
    • Short on relationships, tall on rich data
    types.
    • Big on eventual consistency and flexible
    schemas.
    • Hybrid of traditional RDBMS and Key-Value
    stores.

  26. Use Cases
    • Content Management Systems
    • Applications with rapid partial updates
    • Anything you don’t need joins or
    transactions for that you would normally
    use an RDBMS for.

  27. The Players
    • CouchDB
    • MongoDB
    • Terrastore

  28. MongoDB
    • Support for rich data types: arrays, hashes,
    embedded documents, etc.
    • Support for adding and removing things
    from arrays and embedded documents
    (addToSet, for example).
    • Map/Reduce support and strong indexes
    • Regular expression support in queries
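
To make the array operations concrete, here is an in-memory Ruby sketch of what MongoDB's `$addToSet` and `$pull` update operators do to an array field. The `add_to_set` and `pull` functions below are hypothetical stand-ins that work on a plain Hash; the real operators run on the server against a stored document.

```ruby
# Hypothetical stand-ins for MongoDB's $addToSet and $pull update operators,
# applied to a plain Hash instead of a stored document.
def add_to_set(doc, field, value)
  doc[field] ||= []
  doc[field] << value unless doc[field].include?(value) # no duplicates
  doc
end

def pull(doc, field, value)
  return doc unless doc[field]
  doc[field] = doc[field].reject { |v| v == value } # drop every match
  doc
end

post = { "title" => "NoSQL", "tags" => ["mongodb"] }
add_to_set(post, "tags", "nosql")
add_to_set(post, "tags", "nosql") # second call changes nothing
pull(post, "tags", "mongodb")
```

The appeal is that each call changes one field of one document, with no need to read the whole record, modify it, and write it back.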

  29. Design Considerations
    • Embedded Documents - Use only if
    the embedded document will always be
    selected with the parent.
    • Indexes - MongoDB punishes you much
    earlier for missing indexes than MySQL.
    • Document size - Currently, documents
    are limited to 4MB, which should be large
    enough, but if it’s not...

  30. Real-World MongoDB
    • We use MongoDB heavily at MIS.
    • Statistics application and reporting
    • Top-secret new application
    • Web crawler and indexer
    • CMS

  31. Real-World Example
    Let’s do tags. Everything is taggable now, right?

  32. The MySQL Way

  33. Schema

  34. And to get a “thing’s”
    tags?
    SELECT `tags`.* FROM `tags`
    INNER JOIN `taggings` ON `tags`.id = `taggings`.tag_id
    WHERE ((`taggings`.taggable_id = 237)
    AND (`taggings`.taggable_type = 'Song'))

  35. Yuck!
    That’s a lot of pain for something so simple.
    And I didn’t even show you finding things with tag “x”.
    Or how to set and unset tags on a “thing”.
    Ouch.

  36. The MongoDB Way
    Using MongoMapper and Rails 3

  37. class Post
      include MongoMapper::Document
      key :title, String
      key :body, String
      key :tags, Array
      ensure_index :tags
    end
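
For a feel of what this model buys you, here is a plain-Ruby sketch with Hashes shaped like the documents such a model would store. It does not use MongoMapper or a database; the sample posts and the `tagged_with` helper are made up for illustration.

```ruby
# Posts as plain Hashes shaped like the documents the Post model would store.
posts = [
  { "title" => "Intro to NoSQL", "body" => "...", "tags" => ["nosql", "mongodb"] },
  { "title" => "Tuning MySQL",   "body" => "...", "tags" => ["mysql"] },
]

# Hypothetical helper: posts carrying a given tag. No join table is needed,
# because the tags live inside each document.
def tagged_with(posts, tag)
  posts.select { |post| post["tags"].include?(tag) }
end

mongo_titles = tagged_with(posts, "mongodb").map { |post| post["title"] }
ruby_titles = tagged_with(posts, "ruby").map { |post| post["title"] }
```

Reading a post's tags is just reading one field, and finding posts by tag is a membership test on that field, which is what the index on `tags` is there to speed up.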

  38. Let’s Make This Easy...
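    # Add a tag in memory and, for saved posts, on the stored document too.
    def add_tag(tag)
      tag = Post.clean_tag(tag)
      self.tags << tag
      self.add_to_set(:tags => tag) unless self.new_record?
    end

    # Remove a tag in memory and, for saved posts, from the stored document too.
    def remove_tag(tag)
      tag = Post.clean_tag(tag)
      self.tags.delete(tag)
      self.pull(:tags => tag) unless self.new_record?
    end

    # Normalize one tag: trim, lowercase, spaces to hyphens, drop other characters.
    def self.clean_tag(str)
      str.strip.downcase.gsub(" ", "-").gsub(/[^a-z0-9-]/, "")
    end

    # Split a comma-separated string and normalize each tag.
    def self.clean_tags(str)
      str.split(",").map { |t| self.clean_tag(t) }
    end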
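
Since the live demo is not in the slides, here is a small stand-in for the tag-cleaning part only. These are standalone copies of the two class methods, written as plain functions so they run without MongoMapper or a database; the input strings are made up.

```ruby
# Standalone copies of the slide's class methods, so they run without MongoMapper.
def clean_tag(str)
  str.strip.downcase.gsub(" ", "-").gsub(/[^a-z0-9-]/, "")
end

def clean_tags(str)
  str.split(",").map { |t| clean_tag(t) }
end

single = clean_tag("  Hello World! ")
many = clean_tags("Ruby, Mongo DB,NoSQL!")
```

Here `single` ends up as `"hello-world"` and `many` as `["ruby", "mongo-db", "nosql"]`: whitespace is trimmed, spaces become hyphens, and anything outside lowercase letters, digits and hyphens is dropped.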

  39. Demo Time
    Sorry if you’re looking at this later, but it’s console time!

  40. Why I Love MongoDB
    • Document model fits how I build web apps.
    • For most apps, I don’t need transactions.
    • Eventual consistency is actually OK.
    • Partial updates and arrays make things that
    are a pain in SQL-land absolutely painless.
    • It’s just smart enough without getting in the
    way.

  41. What’s NoSQL, really?
    • The right tool for the job.
    • We’ve got lots of options for storing
    application data.
    • The key is picking the one that solves our
    real problem.
    • And if an RDBMS is the right tool, that’s OK
    too.

  42. Questions?

  43. Further Reading
    • Visual NoSQL: http://blog.nahurst.com/visual-guide-to-nosql-systems
    • MongoDB: http://mongodb.org
    • MongoMapper: http://mongomapper.com/

  44. Thanks!
    • Kevin Lawver
    • @kplawver
    • http://kevinlawver.com
