Fog City Ruby - Caching Talk

bloc
June 14, 2017

These slides are from my caching talk at the Fog City Ruby meetup on Tuesday, June 13th, 2017
Talk summary:
"Site's down. Out of memory errors. Identified the problem, fixed it, wrote a juicy post mortem. Come learn how to NOT implement caching in your Rails app. I'll talk about the various types of caches, forming the proper cache keys, and increasing site performance. Who doesn't love a classic 10x speedup?"

Transcript

  1. Hi, I’m Megan, Tech Lead @ Bloc, a part-time online bootcamp

     for aspiring developers and designers
  2. Caching at Bloc: Post Mortem + Tech Talk • Debugging

    an unknown downtime • Learning how Rails caches work • How Bloc implemented caching • How we fixed it
  3. The Timeline Bloc has two alert mechanisms: Rollbar for exceptions,

     Pingdom for downtime. Wednesday 8:46pm: Rollbar error, single occurrence; no Pingdom alert. Thursday 4:00am: Pingdom alert for various landing pages. Thursday 7:17am: Pingdom alert for bloc.io, intermittent uptime. Thursday 8:00am: consistent downtime for bloc.io. Thursday 8:30am: began investigating on the bus. Thursday 9:20am: adjusted the Redis To Go memory limit from 2GB to 5GB
  4. The Investigation The initial error surfaced by Rollbar: Redis::CommandError: OOM

     command not allowed when used memory > 'maxmemory'. ↳ Search through codebase for familiar code: none ↳ Check Heroku database configuration: there’s a Redis To Go add-on ↳ Memory usage at 2GB, 100%: that must be it. What do we use Redis for? How do we increase/decrease usage? What if we turned it off? Would restarting it do anything?
  5. Initial Conclusions • Not a customer scaling issue • Scales

    linearly with time • Key usage and memory usage are correlated • Memory has climbed before and hit the max, what happened there? fails to reset fails to reset reaches memory maximum no data linear slope
  6. Digging In Further What do we use redis for? •

    Caching ◦ Maybe our expirations aren’t working? • Resque workers (emails, enrollment) ◦ /resque/overview has multiple failed jobs Failed attempts: • Restart redis db: no effect on memory • Clear failed redis jobs: no effect on memory (still a problem to fix though) Success: • Just throw money at the problem! Hot fix alert Now we need to figure out the real fix before June...
  7. What is caching? Save it for later. • Every layer

     of hardware and software • Keys with values that might get expired, kind of like a menu at a restaurant
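The "save it for later" idea above can be sketched as a fetch-style cache in plain Ruby (`SimpleCache` is a hypothetical name for illustration, not Rails' API):

```ruby
# Minimal sketch of a fetch-style cache: return the stored value if the
# key is present; otherwise run the block, store its result, and return it.
class SimpleCache
  def initialize
    @store = {}
  end

  # Mirrors the shape of Rails.cache.fetch(key) { expensive_work }
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

cache = SimpleCache.new
calls = 0
2.times { cache.fetch("menu") { calls += 1; "today's specials" } }
# The block ran only once; the second fetch was a cache hit.
```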
  8. Rails Cache Settings Multiple Types: • Page caching • Action

    caching • Fragment caching ◦ Russian doll caching • Low-level caching • SQL caching Enabled through two variables in the configuration file: config.action_controller.perform_caching = true (does not affect low level caching) config.cache_store = :redis_store
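As a sketch, the two settings above might live in an environment file like this (assuming the redis-store gem supplies the `:redis_store` store; this is illustrative, not Bloc's actual config):

```ruby
# config/environments/production.rb (sketch)
Rails.application.configure do
  # Enables page, action, and fragment caching; low-level
  # Rails.cache calls work regardless of this flag.
  config.action_controller.perform_caching = true

  # Assumes the redis-store / redis-rails gem provides :redis_store.
  config.cache_store = :redis_store
end
```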
  9. Rails Caching: Page Caching Using the actionpack-page_caching gem Code looks

    like: class WeblogController < ActionController::Base caches_page :show, :new end Keys look like: cache:views/www.bloc.io/users/carlton-banks/checkpoints/all Caveats: • Can’t be used for controller actions that have before filters Page URL
  10. Rails Caching: Action Caching Using the actionpack-action_caching gem Code looks

     like: class LandingController < ApplicationController caches_action :about, expires_in: 6.hours, if: -> { guest? } caches_action :mentors, expires_in: 6.hours, if: -> { guest? } end Keys look like: cache:views/www.bloc.io/mentors Page URL
  11. Object cache key Rails Caching: Fragment Caching (Part 1) Natively

    exists in Rails. Can fragment cache a partial or an object instance. Caching an instance Code looks like: / app/views/alum_stories/layouts/_text_photo_layout.html.haml - cache story do .headline = story.headline Key looks like: cache:views/alum_stories/3-20150430203540769345000/44e227db194e26779273184afa632eef md5 hash of view contents story.cache_key
  12. Rails Caching: Fragment Caching (Part 2) Natively exists in Rails.

    Can fragment cache a partial or an object instance. Caching a partial Code looks like: / app/views/layouts/_footer.html.haml - cache do .full-page-footer %h6 Programs Keys look like cache:views/www.bloc.io/users/nicky-banks/9e8e8931e0697aff02864b001ebdc99c md5 hash of view contents Page URL
  13. Rails Caching: Low-Level Caching Natively exists in Rails. Code looks

    like: Rails.cache.fetch('my_unique_cache_key') do Calculator.new.expensive_calculation end Keys look like: cache:my_unique_cache_key The cache key is key: • Unique to the object’s current state • If you expect to search keys at some point, namespacing is great specified key
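The "unique to the object's current state" idea can be sketched as a key builder in plain Ruby (the hash-based record and helper name are hypothetical; Rails derives similar keys from a model's `id` and `updated_at`):

```ruby
# Sketch of a state-sensitive, namespaced cache key: namespace, model
# name, id, and a timestamp that changes whenever the record changes,
# so a stale entry is simply never read again.
def namespaced_cache_key(namespace, record)
  stamp = record[:updated_at].strftime("%Y%m%d%H%M%S%6N")
  "#{namespace}/#{record[:model]}/#{record[:id]}-#{stamp}"
end

story = { model: "alum_stories", id: 3, updated_at: Time.utc(2015, 4, 30, 20, 35, 40) }
namespaced_cache_key("cache:views", story)
# => "cache:views/alum_stories/3-20150430203540000000"
```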
  14. Cache Keys: Expiration For each level of caching, you can

     set an expiration: Rails.cache.fetch(my_unique_cache_key, expires_in: 6.hours) An expiration value gets set on the key, aka its TTL. The database clears that key and its cached value once it expires. > redis-cli info (before): Keys: 137, Expires: 2, Memory Used: 1.77MB, Expired Keys: 10, Evicted Keys: 0. Load a cached page that expires in a minute: Keys: 138, Expires: 3, Memory Used: 1.86MB, Expired Keys: 10, Evicted Keys: 0. Wait a minute: Keys: 137, Expires: 2, Memory Used: 1.77MB, Expired Keys: 11, Evicted Keys: 0.
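The expire-on-read behavior described above can be sketched in plain Ruby (a hypothetical `TTLCache`, not Redis or Rails; the injectable clock exists only to make the example deterministic):

```ruby
# Minimal sketch of TTL-based expiration: each entry stores a value plus
# a deadline, and an expired entry counts as a miss and is recomputed.
class TTLCache
  def initialize(clock: -> { Time.now })
    @store = {}
    @clock = clock
  end

  def fetch(key, expires_in:)
    entry = @store[key]
    return entry[:value] if entry && @clock.call < entry[:expires_at]
    value = yield
    @store[key] = { value: value, expires_at: @clock.call + expires_in }
    value
  end
end

now = Time.at(0)
cache = TTLCache.new(clock: -> { now })
cache.fetch("page", expires_in: 60) { "v1" }  # stored; expires at t = 60
now += 61
cache.fetch("page", expires_in: 60) { "v2" }  # past the TTL: block runs again
```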
  15. Cache Keys: Eviction The redis configuration has an eviction policy

     of keys: noeviction | allkeys-lru | volatile-lru | allkeys-random | volatile-random | volatile-ttl Some keywords: • allkeys may evict any key, regardless of whether an expiration is set • volatile only evicts keys with an expiration set • random selects keys randomly • lru selects the least recently used keys • ttl selects keys with the least time to live
  16. Profiling Our Keys Using redis-audit, I found the distribution of

    a sampling of keys: > ruby redis-audit.rb -h spinyfin.redistogo.com -p 9340 -a mypassword -s 1000 Sampling 1000 keys, 0.2% of total keys, the tool found 16 groups.
  17. The problem is caching The 16 groups found by the

     tool actually narrowed down to 9, where 4 groups accounted for more than 1% (Memory Usage % / Expires %): cache:views/www.bloc.io/users/carlton-banks/checkp... 46.94% / 0.38%; cache:views/checkpoints/1368-201607251822231046450... 24.54% / 0%; cache:https://github.com/will/bloc-jams/commit/aef... 21.34% / 0%; cache:111325,111326,111327,111328,111329,111330,11... 5.55% / 0%; other caches + workers 1.63% / 0%
  18. The problem is caching: expiration + eviction Of all our

    keys, 0.06% have expiration dates, 294 keys of 435,474. Our max memory policy is volatile-lru.
  19. Fixing Key Eviction We use Redis To Go as our

    redis server host. Pro: We don’t have to maintain the server Con: Limited options, just maxmemory
  20. Fixing expiration • Page caching • Action caching (our action

    caching has proper expirations for guest users) • Fragment caching (71%) • Low-level caching (27%) • SQL caching
  21. Fixing the footer cache: 47% Keys like: cache:views/www.bloc.io/users/carlton-banks/9e8e8931e0697aff02864b001ebdc99c Code is: / app/views/layouts/_footer.html.haml - cache do

     Debugging Questions: • How big is this key space? ◦ Dependent on page URL (including user) and content hash • What’s wrong with this key? ◦ Many unique keys for the same content • Do we need to cache this? ◦ It’s not computationally intensive, so no. TODO: delete existing keys, remove caching mechanism in code
  22. Fixing the checkpoint nav cache: 25% Keys like: cache:views/checkpoints/1368-20160725182223104645000/roadmap_sections/95-20160211001250316888000/users/2302534-20160904083834996975000/user/f21882266d6ec6d0f9331f1e16d3c176

     Code is: / app/views/users/checkpoints/_checkpoint_nav.html.haml - cache [@checkpoint, @checkpoint.section, @user, current_user.role] do • How big is this key space? ◦ Dependent on checkpoint, section, user, and role: big. • What’s wrong with this key? ◦ There are a lot of them for similar data • Do we need to cache this? ◦ Might be able to use lower-level caching of position, index, etc. TODO: delete existing keys, set expiration on new keys
  23. Fixing the github commit cache: 21% Keys like: cache:https://github.com/will/Blocly/commit/c0ad999d1bd91361858481e807724410407e Code

    is: commit = Rails.cache.fetch(commit_link) do Github::Commit.new(commit_link) end • How big is this key space? ◦ # uniq commits for all students • What’s wrong with this key? ◦ It’s not namespaced • Do we need to cache this? ◦ Seems like a good idea TODO: namespace key, set expiration on new keys, delete existing keys
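The namespacing TODO above might look something like this sketch (the `github_commit:` prefix and helper name are hypothetical, not Bloc's actual fix):

```ruby
require "digest"

# Sketch: wrap a raw URL key in a namespace so related keys can later be
# found as a group (e.g. scanned by the github_commit:* pattern) and
# expired together. Hashing the URL keeps keys short and uniform in shape.
def github_commit_cache_key(commit_link)
  "github_commit:#{Digest::MD5.hexdigest(commit_link)}"
end

github_commit_cache_key("https://github.com/will/Blocly/commit/c0ad999d")
# => "github_commit:" followed by a 32-character hex digest
```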
  24. Fixing the calendar cache: 6% Keys like: 111325,111326,111327,111328,111329,111330,111331,111332,111333,111334,111335,111336,... Code is:

     key = @appointments.pluck(:id).sort.join(',') @data = Rails.cache.fetch(key) do Appointment.calendar_for(@appointments, @user).export end
  25. Fixing the calendar cache: 6% Keys like: 111325,111326,111327,111328,111329,111330,111331,111332,111333,111334,111335,111336,... Code is:

     key = @appointments.pluck(:id).sort.join(',') @data = Rails.cache.fetch(key) do Appointment.calendar_for(@appointments, @user).export end • How big is this key space? ◦ All combinations of appointment ids • What’s wrong with this key? ◦ Freakin’ nonsensical, no human words • Do we need to cache this? ◦ Sure, but not with this key TODO: use a better key, set expiration, delete existing keys
  26. Another Cache Key Learning One more mistake when implementing our

     improved caching: Keys like: calendar_export:appointments/709028-20170612032235056661000 Code is: most_recent_appointment = @appointments.reorder(:updated_at).last cache_key = most_recent_appointment.cache_key Rails.cache.fetch(cache_key, expires_in: 1.day) do Appointment.calendar_for(@appointments, @user).export end Most recent appointment as the cache key! Smart, right? Wrong. We forgot the user; we needed to add the user’s cache key too: calendar_export:users/2389812-20170612042305184969000:appointments/709028-20170612032235056661000
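The corrected composite key can be sketched in plain Ruby (the record shapes and `part` helper are hypothetical; Rails builds similar `name/id-timestamp` parts via a model's `cache_key`):

```ruby
# The lesson from the slide above: the cache key must cover every input
# that changes the output. Here both the user and the most recent
# appointment contribute a "name/id-timestamp" part.
def part(kind, id, updated_at)
  "#{kind}/#{id}-#{updated_at.strftime('%Y%m%d%H%M%S%6N')}"
end

def calendar_export_key(user, appointment)
  "calendar_export:#{part('users', user[:id], user[:updated_at])}:" \
    "#{part('appointments', appointment[:id], appointment[:updated_at])}"
end

user = { id: 2389812, updated_at: Time.utc(2017, 6, 12, 4, 23, 5) }
appt = { id: 709028,  updated_at: Time.utc(2017, 6, 12, 3, 22, 35) }
calendar_export_key(user, appt)
# => "calendar_export:users/2389812-20170612042305000000:appointments/709028-20170612032235000000"
```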
  27. Takeaways • When determining cache keys: ◦ Consider the keyspace

    ◦ Consider how the keyspace correlates to time, scale ◦ Set an expiration that makes sense according to the volatility of the cached information • Become familiar with your app’s caching mechanism: ◦ Cache implementation, expiration policy, default expiration • Use caching to quickly make massive performance improvements • If using redis, check out redis-cli for cache debugging > redis-cli monitor > redis-cli info > redis-cli get <key> > redis-cli -h HOST -p PORT -a MYPASSWORD
  28. What do the cache values look like? For view caches,

    the full controller response as a string. For object caches, the object encoded in a JSON string.
  29. How do you debug the redis cache? > redis-cli >

    redis-cli -h HOST -p PORT -a MYPASSWORD Most used commands: > info > monitor > ttl <key> > get <key> > expire <key> <seconds>
  30. How does Redis use the LRU algorithm? Redis uses an

     approximated LRU algorithm, sampling a small set of keys on each eviction. In the accompanying figure, the light gray band shows objects that were evicted, the gray band objects that were not evicted, and the green band objects that were added.
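The sampling idea can be sketched in plain Ruby (a toy eviction step, not Redis's implementation; in Redis the sample size is the `maxmemory-samples` setting, 5 by default):

```ruby
# Approximated LRU: instead of tracking a strict global ordering, sample
# a few keys at random and evict the least recently used of the sample.
# `last_used` maps key => last-used timestamp (lower = older).
def evict_one_sampled_lru(last_used, sample_size: 5, rng: Random.new)
  sample = last_used.keys.sample(sample_size, random: rng)
  victim = sample.min_by { |k| last_used[k] }
  last_used.delete(victim)
  victim
end

last_used = { "a" => 1, "b" => 9, "c" => 3, "d" => 7 }
evict_one_sampled_lru(last_used, sample_size: 4)
# With the sample covering all keys, the oldest key "a" is evicted.
```

A bigger sample approximates true LRU more closely at the cost of more work per eviction, which is exactly the trade-off Redis exposes.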
  31. References Rails cache overview Using Redis as an LRU cache

    DHH's key-based cache expiration overview redis-audit