MySQL, Redis, Mongo, Together in Peace

Ben Regenspan

October 15, 2012

Transcript

  1. ABOUT ME
    • Started at Huffington Post helping scale and build new features, now CTO at Soho Tech Labs
    • Learning to use Keynote
    • Generalist programmer
    • I like the internet
    • I don’t love databases
    • I just want some place to put my data
  2. THIS PRESENTATION
    • Discussing MySQL, Redis, Mongo
    • Starting with the bad parts
    • Progressing to the strengths
    • Debunking the (possibly strawman) argument that there is some magic DB that is suited for everything
    • An example of how the various solutions can play well together
  3. MySQL
    • “Doesn’t Scale”
    • It does, just a lot of work
    • Facebook still using it for most data
    • 60 million+ queries/second
    • Horizontal scaling of big tables: Sharding
    • Facebook has 4000+ shards
    • That is a lot of shards
    • They had to roll their own, as do most people who use MySQL
    • MySQL Cluster auto-shards, but...
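To make "roll their own" sharding concrete, here is a minimal sketch of application-level shard routing. The shard count, the `users_shard_N` table naming, and the hash-modulo scheme are all assumptions for illustration, not how Facebook or any particular site does it.

```python
import zlib

NUM_SHARDS = 4  # hypothetical; real deployments may run thousands of shards


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard number using a hash that is stable across runs."""
    return zlib.crc32(str(user_id).encode("utf-8")) % num_shards


def table_for(user_id: int) -> str:
    """Return the (hypothetical) table name that holds this user's rows."""
    return f"users_shard_{shard_for(user_id)}"
```

The application has to call something like `table_for` before every query, and changing `NUM_SHARDS` later moves most keys to a different shard, which is part of why this is "a lot of work".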
  4. MySQL: A not-at-all-contrived example of an expensive ALTER
    • ALTERs are expensive
    • Very realistic example:
    • We’re the City of New York, we have a database of a little over 8 million people
    • Sometimes we want to track new things about these people
  5. MySQL
    • We store first name and last name already
    • We want to start tracking whether each person is an artisanal pickler
  6. MySQL
    • A naive “home timeline” implementation:
        • User
        • Friendship (~100 per user)
        • Posts (millions+)
  7. MySQL
    • Very Relational!
    • Performs OK when Friendship table is small and Post table is small.
    • Degrades rapidly as tables grow.
    • Twitter does not use this query.
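The query these two slides refer to is not in the transcript. The sketch below is an assumed reconstruction of a naive "home timeline" join over the three tables named above, using Python's built-in `sqlite3` as a stand-in for MySQL. Table names, column names, and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER);
CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER,
                    body TEXT, created_at INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO friendships VALUES (?, ?)",
                 [(1, 2), (1, 3)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)",
                 [(10, 2, "post by bob", 100),
                  (11, 3, "post by carol", 200),
                  (12, 1, "post by alice", 300)])

# Join every post against the reader's friend list at read time.
NAIVE_TIMELINE_SQL = """
SELECT posts.id, posts.body
FROM posts
JOIN friendships ON friendships.friend_id = posts.author_id
WHERE friendships.user_id = ?
ORDER BY posts.created_at DESC
LIMIT 20
"""


def home_timeline(user_id):
    """Return the newest posts written by the given user's friends."""
    return conn.execute(NAIVE_TIMELINE_SQL, (user_id,)).fetchall()
```

With three users this is instant. The cost shows up when the join and sort have to run over millions of posts on every page view.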
  8. MongoDB
    • Non-relational
    • Arbitrary reports harder than via RDBMS
    • Very new
    • Famously uses a global write lock
    • Fewer people out there with experience scaling it
    • Fewer people with experience building on it
    • Many tickets out to improve key deficits
    • Fewer tools, e.g. for migration of old documents
    • Devs familiar with RDBMS need to understand new concepts
  9. Redis
    • In-memory
    • Non-relational
    • Simple types and structures
    • Hashes != RDBMS rows or Mongo documents; they can only contain integers and strings
  10. It’s not all bad.
    • MySQL
        • it’s an RDBMS!
        • it’s been around a while: stable
        • fast, with the right optimizations
    • Mongo
        • schemaless, no ALTER needed
        • automatic failover in replica sets
        • auto-sharding
    • Redis
        • Memcached on steroids
  11. Rebelmouse stack
    • Posts are stored in Mongo
    • Needed properties evolve fast, we don’t want ALTERs or rigid structure
    • Order of posts is stored in Redis, per-site
    • Single Redis instance can serve nearly a million QPS -- gives huge amount of headroom
    • We can query, e.g., the latest post IDs with almost no overhead
    • Very memory-efficient
    • It’s the hybrid!
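The split described on this slide (documents in Mongo, ordering in Redis) can be sketched without either server. In the sketch below a plain dict stands in for a Mongo collection and a per-site Python list stands in for a Redis list. The function names and document fields are hypothetical, and the comments name the Redis commands each step loosely corresponds to.

```python
from collections import defaultdict

posts_by_id = {}                     # stand-in for a Mongo "posts" collection
site_post_order = defaultdict(list)  # stand-in for one Redis list per site


def publish(site, post_id, doc):
    """Store the post document, then record its position for the site."""
    posts_by_id[post_id] = doc                # documents need no fixed schema
    site_post_order[site].insert(0, post_id)  # roughly LPUSH site:posts id


def latest_posts(site, count=10):
    """Read the newest IDs from the ordering store, then fetch the documents."""
    ids = site_post_order[site][:count]       # roughly LRANGE site:posts 0 count-1
    return [posts_by_id[post_id] for post_id in ids]


# Two posts with different sets of fields, no ALTER required.
publish("example-site", "p1", {"title": "First", "source": "twitter"})
publish("example-site", "p2", {"title": "Second", "image_url": "http://example.com/a.png"})
```

The ordering store only ever holds IDs, which is what keeps it small and fast. The document store is only asked for the handful of posts actually shown.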
  12. Rebelmouse stack
    • Redis (continued)
        • Writes are cheap. We can compose the list of posts needed to show a home timeline in advance.
        • Frontends don’t have to query “most recent posts by [list of 1000 users the user is friends with]”
        • Great for storing summary counts and similar -- INCRBY command
        • Remove Memcached. Memcached is great, but unneeded if you’re using Redis.
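Composing the timeline "in advance" is often called fan-out on write. The following is a minimal in-memory sketch of that idea, with Python lists and integers standing in for Redis lists and counters. The follower map, key names, and timeline cap are assumptions for illustration, not details from the Rebelmouse codebase.

```python
from collections import defaultdict

followers = {"alice": ["bob", "carol"]}  # hypothetical follower map
timelines = defaultdict(list)            # stand-in for one Redis list per reader
counters = defaultdict(int)              # stand-in for Redis integer keys
MAX_TIMELINE = 1000                      # hypothetical cap on stored IDs


def fan_out(author, post_id):
    """On write, push the new post ID onto every follower's timeline."""
    for follower in followers.get(author, []):
        timeline = timelines[follower]
        timeline.insert(0, post_id)      # roughly LPUSH timeline:<follower> id
        del timeline[MAX_TIMELINE:]      # roughly LTRIM timeline:<follower> 0 999
    counters[f"posts:{author}"] += 1     # roughly INCRBY posts:<author> 1


def home_timeline(user, count=20):
    """On read, the timeline is already composed; just slice it."""
    return timelines[user][:count]
```

Each post costs one cheap write per follower, and in exchange a page view becomes a single list read instead of a join across friends and posts.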
  13. Rebelmouse stack
    • MySQL
        • Slower-changing data
        • Users table
        • User preferences, etc.
        • Arbitrary SQL queries can be made for reporting, e.g. # of users with connected Twitter accounts
        • Simpler to find developers
        • Code re-use
        • Not sexy, but known quantity
  14. “Other”
    • No DB suits all tasks
    • Postgres is still the gold standard for geo
    • Good spatial indexing
    • Can build on top of e.g. GeoDjango
    • On Casahop.com: get nearby houses, upcoming search within specific geographic area, etc.
  15. Questions?
    • Post to meetup
    • [email protected]
    • rebelmouse.com/benregenspan
    • Thanks to Nike and the rest of the team at Rebelmouse and Soho Tech Labs for building an awesome “hybrid stack” example