Mixing a persistence cocktail

The story of scaling your application, the tactics involved with adding new databases to your application, and overcoming the fear of adding new pieces to your application.


Adam Keys

May 17, 2011


  1. Mixing a persistence cocktail Adam Keys RailsConf 2011 @therealadam http://therealadam.com

  2. Imagine a world

  3. The story of scaling

    Let’s start by taking a whirlwind tour through most of the other talks you’ve heard on scaling in the past.
  4. Grow: Prototype it

    The first chapter in your application’s scaling story is… building it. You’ll probably end up using something with ad-hoc queries. Something like MySQL, PostgreSQL, or MongoDB.
  5. class User < ActiveRecord::Base
       def self.by_email(address)
         where(:email => address)
       end
     end

  6. Grow: Ship it!

    This is important! You have to ship your app, and then do a bunch of business-y and lean/agile stuff to get it to stick before you even need to think about scaling.
  7. cap production deploy

  8. Grow: Ad-hoc queries → indices

    Your application starts to get some attention and your scaling story begins. Some kinds of data start to grow really quickly. You find hotspots, track down the queries that aren’t so fast, and add an index in your database to make each of those queries faster. Rinse, repeat. Most apps don’t go past this point.
  9. add_index :users, :email

  10. Grow: Database indices → specialized storage

    If your app continues to grow, you’ll find yourself extracting indexes into specialized storage. You’ll replace indexed queries (or queries that are difficult to index) with lookups in stores like Memcached, Redis, or maybe something custom.
  11. class User
        after_commit :update_cache

        def update_cache
          # Danger: stores a Marshal'd AR object, which is tricky
          CACHE.set("user:email:#{email}", self)
        end

        def self.by_email(address)
          # Read with the same key format that update_cache writes
          CACHE.get("user:email:#{address}")
        end
      end
  12. Grow: Slow processes → queues and workers

    At some point, your application will accrue more work than you’d really want to do during a transaction/request. That’s when you want to go slightly asynchronous: queue work up and process it out-of-process. Delayed Job is a great way to start; then move up to something like Resque or RabbitMQ.
  13. class User
        after_commit :update_recommendations

        def update_recommendations
          Resque.enqueue(
            User::GenerateRecommendations,
            self.id
          )
        end
      end
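  14. class User
        class GenerateRecommendations
          # Resque needs a queue name; this one is illustrative
          @queue = :recommendations

          # Resque calls perform on the class, not on an instance
          def self.perform(id)
            # ...
          end
        end
      end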
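The queue-and-worker idea can be sketched in-process with nothing but Ruby's standard library. This is a minimal illustration rather than how Resque works internally; `JOBS`, `RESULTS`, `DONE`, `enqueue`, and this standalone `GenerateRecommendations` are all hypothetical names, and a real setup would hand the job to Delayed Job, Resque, or RabbitMQ.

```ruby
require "thread"

# Hypothetical stand-ins for a job queue and a place to collect output.
JOBS = Queue.new
RESULTS = Queue.new

# Hypothetical job class, shaped like the Resque job above.
class GenerateRecommendations
  def self.perform(user_id)
    # Pretend this is the slow part we moved out of the request.
    RESULTS << "recommendations for user #{user_id}"
  end
end

# The "request" only records the job class and its arguments, then returns.
def enqueue(job_class, *args)
  JOBS << [job_class, args]
end

# A worker pulls jobs off the queue until it sees the :stop marker.
worker = Thread.new do
  while (job = JOBS.pop) != :stop
    job_class, args = job
    job_class.perform(*args)
  end
end

enqueue(GenerateRecommendations, 42)
JOBS << :stop
worker.join

# Drain whatever the worker produced.
DONE = Array.new(RESULTS.size) { RESULTS.pop }
puts DONE.first
```

Note that the job receives an id rather than the object itself, as on the slide: the worker may run later, in another process, so it should look the record up again.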

  15. Grow: Relax consistency requirements

    As you start to embrace caches and queues, it’s important to start thinking about the observability of your data. Depending on how and when you read and write data, you can get confusing results. The good news is that lots of distributed databases are explicit about this, so it’s easy to read about. The bad news is that it’s a bit of a brain-bender.
  16. class User
        def update_cache
          CASSANDRA.insert(
            :UserEmails, email, attributes,
            :consistency => ONE
          )
        end

        def self.by_email(email)
          CASSANDRA.get(
            :UserEmails, email,
            :consistency => QUORUM
          )
        end
      end
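One rule of thumb helps with the brain-bending: with N replicas, a read is only certain to overlap the latest successful write when the replicas read (R) plus the replicas written (W) add up to more than N. Here is a minimal sketch of that arithmetic, assuming a replication factor of 3 (the slide doesn't state one); the helper names are hypothetical.

```ruby
# Number of replicas a QUORUM operation must touch: a strict majority.
def quorum(replicas)
  replicas / 2 + 1
end

# True when every read set must intersect every write set (R + W > N).
def read_sees_latest_write?(replicas, written, read)
  read + written > replicas
end

n = 3 # assumed replication factor

# Write at ONE, read at QUORUM, as in the code above.
puts read_sees_latest_write?(n, 1, quorum(n))

# Write at QUORUM, read at QUORUM.
puts read_sees_latest_write?(n, quorum(n), quorum(n))
```

Under that assumption, writing at ONE and reading at QUORUM can still return stale data for a while; it buys cheaper writes at the cost of that window.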
  17. Integrating that hot, new database

    That’s what mixing a persistence cocktail looks like when you’re talking about it over beers. But how does it work out in practice?
  18. Integrate: Client libraries

    After you’ve selected your second database, you need a gem to connect to it. This is usually a pretty simple task; most databases have a widely preferred library. The main caveat is if you’re not on MRI or you’re trying to go non-blocking; in that case, you’ll want to take more care in selecting a library.
  19. Integrate: Configuration

    Once you’ve made your selection, you’ll end up inventing a configuration file for it. Mostly I see people doing something like `database.yml`, but plain-old-Ruby works great too. Make sure it’s easy to configure different environments. Don’t worry about avoiding global references to connections; it hasn’t seemed to hurt us at Gowalla.
  20. $FANTASTIC =
        case Rails.env
        when 'development' then Fantastic.new('localhost')
        when 'test'        then Fantastic::Mock.new
        when 'staging'     then Fantastic.new('staging.myapp.com')
        when 'production'  then Fantastic.new('fantastic-master.myapp.com')
        end
  21. development: localhost
      staging: staging.myapp.com
      production: fantastic-master.myapp.com
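The YAML flavor of this configuration could be wired up roughly as follows. This is a sketch under assumptions: `FantasticClient` is a hypothetical stand-in for the database's gem, `fantastic_for` is an illustrative helper, and the YAML is inlined only so the example is self-contained (normally it would live in a file under `config/`).

```ruby
require "yaml"

# Hypothetical client; stands in for whatever gem your database uses.
FantasticClient = Struct.new(:host)

# Inlined so the sketch runs on its own; normally read from a config file.
CONFIG_YAML = <<~YAML
  development: localhost
  staging: staging.myapp.com
  production: fantastic-master.myapp.com
YAML

# Build a client for the given environment, failing loudly when the
# environment has no entry instead of silently connecting nowhere.
def fantastic_for(env, yaml = CONFIG_YAML)
  hosts = YAML.safe_load(yaml)
  host = hosts.fetch(env) do
    raise ArgumentError, "no fantastic host configured for #{env.inspect}"
  end
  FantasticClient.new(host)
end

puts fantastic_for("staging").host
```

In a Rails app the argument would be `Rails.env`, and the result could be assigned to the `$FANTASTIC` global used in the rest of the talk.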

  22. Integrate: Application code

    Once your connection is configured, it’s time to use it in application code. We’ve been using direct access from a global variable in Gowalla and it hasn’t bitten us. I’d like to see us adopt something like redis-objects, toystore, etc. to do domain modeling against our uses of Memcached, Redis, etc., but it’s definitely not something that is holding us back.
  23. class UsersController
        def show
          email = params[:email]
          @user = $FANTASTIC.get("user:email:#{email}")
          # ...
        end
      end
  24. class UsersController
        def show
          email = params[:email]
          @user = User.by_email(email)
          # ...
        end
      end
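The second controller only works if the model hides the global behind a method. Below is a minimal sketch of what that model side might look like; `FakeFantastic` is a hypothetical in-memory stand-in for the real connection, and `email_key` and `save` are illustrative names rather than anything from the talk.

```ruby
# Hypothetical in-memory stand-in for the $FANTASTIC connection.
class FakeFantastic
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end
end

$FANTASTIC = FakeFantastic.new

class User
  attr_reader :email

  def initialize(email)
    @email = email
  end

  # The key format lives in exactly one place.
  def self.email_key(address)
    "user:email:#{address}"
  end

  # Callers never touch the global or the key format directly.
  def self.by_email(address)
    $FANTASTIC.get(email_key(address))
  end

  def save
    $FANTASTIC.set(self.class.email_key(email), self)
    self
  end
end

User.new("ak@example.com").save
puts User.by_email("ak@example.com").email
```

Keeping the key format and the global inside the model means a later change of key scheme or storage engine touches one class instead of every controller.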
  25. Integrate: Deployment

    Once you’ve got a feature coded up using your new database, it’s time to deploy. Get your ops person to set up the database, figure out exactly which steps you need to roll the new code out in production, and then go for it.
  26. Integrate: Data Migration

    Once you’ve been in production for a while, you’ll end up needing to rejigger how your data is stored. Absent migrations à la ActiveRecord, you’ve got a couple of options. One is read-repair, where you make your application code resilient to different versions of a data structure and only update the stored copy on writes. Another is to version the key you’re storing data with and increment the key version when you change the structure.
  27. # Versioned keys
      class User
        def self.by_email(address)
          # Same key format and version that update_cache writes
          CACHE.get("user:email:#{address}/2")
        end

        def update_cache
          # Danger: stores a Marshal'd AR object,
          # which is tricky
          CACHE.set("user:email:#{email}/2", self)
        end
      end
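To see why bumping the key version is safe, here is a minimal runnable sketch with a plain Hash standing in for the cache. `USER_EMAIL_KEY_VERSION`, `user_email_key`, `cache_user`, and `cached_user` are hypothetical names, and the "old format" shown is invented purely for illustration.

```ruby
# Hypothetical in-memory cache standing in for Memcached or Redis.
CACHE = {}

# Bump this whenever the shape of the cached value changes.
USER_EMAIL_KEY_VERSION = 2

def user_email_key(address, version = USER_EMAIL_KEY_VERSION)
  "user:email:#{address}/#{version}"
end

def cache_user(address, attributes)
  CACHE[user_email_key(address)] = attributes
end

def cached_user(address)
  CACHE[user_email_key(address)]
end

# An entry written by the previous release, under version 1 of the key.
CACHE[user_email_key("ak@example.com", 1)] = "ak@example.com,Adam"

# New code never sees the old shape: the first read is a plain miss.
puts cached_user("ak@example.com").inspect

# After the next write, reads are served in the new shape.
cache_user("ak@example.com", { :name => "Adam" })
puts cached_user("ak@example.com").inspect
```

The old entries are never rewritten; they just stop being read. That makes this approach a better fit for caches with expiry or eviction than for a store that is the only copy of the data.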
  28. # Read-repair
      class User
        # Suppose we added a UserEmailPreference
        # domain object...
        def by_email
          # Same key that update_cache writes
          obj = CACHE.get("user:email:#{email}")
          if obj.has_key?(:receive_friend_notice?)
            UserEmailPreference.from_hash(obj)
          else
            # Migrate on every read
            UserEmailPreference.migrate(obj)
          end
        end

  29.   def update_cache
          # Only update the cache on writes
          CACHE.set(
            "user:email:#{email}",
            email_preference
          )
        end
      end
  30. Overcoming THE FEAR

    So that’s the tactical level. But there’s another level of tactics that’s really important as you’re mixing your persistence cocktail. It’s easy to develop anxiety as you get close to deploying your new database. There’s so much to go wrong, so much uncertainty. You need the tactics that allow you to overcome THE FEAR.
  31. Fear: Training wheels

    First off, give yourself a project with extremely low stakes. New features, or features you’re not sure you always need, work great. The important part is that it’s something with low risk. Low risk means you can push the envelope a bit, which is exactly what adding a new database involves.
  32. Fear: Instrument and log everything

    Once you’ve got your training wheels in place, you need to know how things are working. You need numbers on how often things happen and how much time they take when they do happen. Use Scout, RPM, or log inspection for this. You’ll also want to log things you’re unsure about. Log profusely, and get handy with grep, sed, and awk for digesting those logs.
  33. class InstrumentedRedis
        def initialize(connection)
          self.connection = connection
        end

        # Forward every command to the real connection, timing it as we go
        def method_missing(command, *args, &block)
          ActiveSupport::Notifications.instrument(
            "request.redis",
            :command => command,
            :args => args
          ) do
            connection.send(command, *args, &block)
          end
        end

        attr_accessor :connection
      end
  34. class RedisInstrumenter < ActiveSupport::LogSubscriber
        def request(event)
          return unless logger.debug?

          name = "%s (%.1fms)" % ["Redis", event.duration]
          command = event.payload[:command]
          args = event.payload[:args].inspect
          debug " #{color(name, RED, true)} [ #{command} #{args} ]"
        end
      end
  35. REDIS_CONFIG = config_for('redis.yml')
      connection = Redis.new(
        :host => REDIS_CONFIG["server"]
      )

      REDIS = if Rails.env.development?
        RedisInstrumenter.attach_to :redis
        InstrumentedRedis.new(connection)
      else
        connection
      end
  36. class User
        def recommendations
          recommender.for(self)
        rescue RecommendationError => e
          backtrace = Rails.
            backtrace_cleaner.
            clean(e.backtrace).
            take(5).
            join(' | ')
          logger.warn("Recommendation error: you should look into it")
          logger.warn(e.message)
          logger.warn(backtrace)
        end
      end
  37. Fear: Feature switches, progressive rollout

    Sometimes, adding new stuff doesn’t work out so well. In this case, you want the ability to turn features on and off willy-nilly. Feature toggles make this really easy. Branching in code isn’t the prettiest thing, but it’s a great safety net. The other great thing about toggles is that you can roll out to more and more users, easing a feature in rather than hoping it works the first time.
  38. # Gemfile
      gem 'rollout', '~> 0.3.0'

      # In an initializer
      $feature = Rollout.new(REDIS)

  39. class User
        def recommendations
          if $feature.active?(:user_recommendations, self)
            recommender.for(self)
          else
            []
          end
        end
      end
  40. u = User.ak
      $feature.activate_user(:user_recommendations, u)
      $feature.activate_group(:user_recommendations, :team)
      $feature.activate_percentage(:user_recommendations, 10)
      $feature.activate_group(:user_recommendations, :all)
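The progression above (one user, then a slice of users, then everyone) can be illustrated with a small in-memory switch. This is a sketch of the general idea, not the rollout gem's implementation; `FeatureSwitch` and its internals are hypothetical, and users are represented by bare integer ids to keep it short.

```ruby
require "set"

# Hypothetical, in-memory feature switch for illustration only.
class FeatureSwitch
  def initialize
    @users = Hash.new { |hash, feature| hash[feature] = Set.new }
    @percentages = Hash.new(0)
  end

  def activate_user(feature, user_id)
    @users[feature] << user_id
  end

  def activate_percentage(feature, percentage)
    @percentages[feature] = percentage
  end

  # A user is in if switched on by hand, or if their id falls inside the
  # enabled slice. Using id % 100 keeps the answer stable for each user.
  def active?(feature, user_id)
    @users[feature].include?(user_id) || user_id % 100 < @percentages[feature]
  end
end

feature = FeatureSwitch.new
feature.activate_user(:user_recommendations, 1234)
feature.activate_percentage(:user_recommendations, 10)

puts feature.active?(:user_recommendations, 1234) # switched on by hand
puts feature.active?(:user_recommendations, 7)    # inside the 10% slice
puts feature.active?(:user_recommendations, 55)   # outside the slice
```

Because the slice is derived from the id instead of a random draw, a given user sees the feature consistently from request to request, and raising the percentage only ever adds users.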
  41. Fear: Double writes and dark reads

    When you’re ready to use your new database for critical features, you can ease into it with two techniques. Start off by writing data to your existing database _and_ the new one. Once you’ve got the new writes debugged and working, start doing reads from the new database but discarding the results. Debug, optimize, and then remove the double write so that only the new system is used. Success!
  42. class User
        after_commit :write_to_fantastic

        def write_to_fantastic
          if $feature.active?(:fantastic, self)
            $FANTASTIC.set(self.key, self)
          end
        end
      end
  43. class UsersController
        after_filter :read_fantastic, :only => [:show, :index]

        def read_fantastic
          if $feature.active?(:fantastic_read, @user)
            User.from_fantastic(@user.id)
          end
        end

        # Now watch your performance metrics
        # and see how your throughput and
        # latency change. If all goes well,
        # turn it on for more users and see
        # if you can break it.
      end
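Here is a self-contained sketch of both techniques together, with plain Hashes standing in for the two databases. It goes one step beyond the slides by comparing the dark read against the real result and recording any difference instead of only discarding it; `OLD_DB`, `NEW_DB`, `MISMATCHES`, `save_user`, and `find_user` are all hypothetical names.

```ruby
# Hypothetical in-memory stores standing in for the old and new databases.
OLD_DB = {}
NEW_DB = {}
MISMATCHES = []

# Double write: the old store stays the source of truth.
def save_user(id, attributes)
  OLD_DB[id] = attributes
  begin
    NEW_DB[id] = attributes
  rescue StandardError => e
    # A failure in the new store must never break the real write.
    MISMATCHES << [id, :write_failed, e.message]
  end
end

# Dark read: serve the old value, read the new one only to compare.
def find_user(id)
  current = OLD_DB[id]
  begin
    candidate = NEW_DB[id]
    MISMATCHES << [id, :read_differs] unless candidate == current
  rescue StandardError => e
    MISMATCHES << [id, :read_failed, e.message]
  end
  current
end

save_user(1, { :email => "ak@example.com" })

# A record written before double writes were switched on.
OLD_DB[2] = { :email => "early@example.com" }

find_user(1)
find_user(2)
puts MISMATCHES.inspect
```

The mismatch for the second record shows the gap double writes leave behind: data written before the switch was flipped still needs a backfill before reads can move over.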
  44. Fear: Everyday I iterate

    The most important tool, of course, is iteration. Depending on the scope of your project, it could be weeks or months before the new thing is the thing. Every day, move the ball forward. Every day, make it better. Every day, figure out how to take the next step without shooting yourself in the foot. Every day, deliver business value.
  45. THE FEAR: Training wheels; Feature flippers; Dark reads, double writes; Iterate, a lot; Instrument and log everything

    Integrate: Client libraries; Configuration; Application code; Deployment; Data migration

    Grow: 1. Prototype it; 2. Ship it; 3. Convert ad-hoc queries to indexes; 4. Extract indexes into other systems; 5. Queue it, work it; 6. Go asynchronous; 7. Relax consistency

    So here’s your map. As you can see, it’s all interconnected. That’s the way of things. I think it’s neat.