Fog City Ruby - Caching Talk

Caching at Bloc: Post Mortem + Tech Talk

Hi, I’m Megan, Tech Lead @ Bloc, a part-time online bootcamp
for aspiring developers and designers

Caching at Bloc: Post Mortem + Tech Talk
• Debugging an unknown downtime
• Learning how Rails caches work
• How Bloc implemented caching
• How we fixed it

Memory Overuse Incident - November

This is a story all about how...

The Timeline
Bloc has two alert mechanisms: Rollbar for exceptions, Pingdom for downtime.
Wednesday 8:46pm: Rollbar error, single occurrence. No Pingdom alert.
Thursday 4:00am: Pingdom alert for various landing pages
Thursday 7:17am: Pingdom alert for bloc.io, intermittent uptime
Thursday 8:00am: Consistent downtime for bloc.io
Thursday 8:30am: Begin investigation on the bus
Thursday 9:20am: Adjusted Redis To Go memory limit from 2GB to 5GB

So What Happened?!

The Investigation
The initial error surfaced by Rollbar:
    Redis::CommandError: OOM command not allowed when used memory > 'maxmemory'.
↳ Search through codebase for familiar code: none
↳ Check Heroku database configuration: there’s a Redis To Go add-on
↳ Memory usage at 2GB, 100%: that must be it.
What do we use Redis for? How do we increase/decrease usage? What if we turned it off? Will restarting it do anything?

Interesting Graphs
What’s interesting about these?

Initial Conclusions
• Not a customer scaling issue
• Scales linearly with time
• Key usage and memory usage are correlated
• Memory has climbed before and hit the max, what happened there?
[Graph annotations: linear slope; fails to reset (twice); reaches memory maximum; no data]

Digging In Further
What do we use Redis for?
• Caching
  ◦ Maybe our expirations aren’t working?
• Resque workers (emails, enrollment)
  ◦ /resque/overview has multiple failed jobs
Failed attempts:
• Restart Redis db: no effect on memory
• Clear failed Redis jobs: no effect on memory (still a problem to fix though)
Success:
• Just throw money at the problem!
Hot fix alert: now we need to figure out the real fix before June...

Rails Caching: An Overview

What is caching?
Save it for later.
• Exists at every layer of hardware and software
• Keys with values, which may expire
Kind of like a menu at a restaurant.

Rails Cache Settings
Multiple types:
• Page caching
• Action caching
• Fragment caching
  ◦ Russian doll caching
• Low-level caching
• SQL caching
Enabled through two variables in the configuration file:
    config.action_controller.perform_caching = true   (does not affect low-level caching)
    config.cache_store = :redis_store

Rails Caching: Page Caching
Using the actionpack-page_caching gem.
Code looks like:
    class WeblogController < ActionController::Base
      caches_page :show, :new
    end
Keys look like:
    cache:views/www.bloc.io/users/carlton-banks/checkpoints/all   (the page URL)
Caveats:
• Can’t be used for controller actions that have before filters

Rails Caching: Action Caching
Using the actionpack-action_caching gem.
Code looks like:
    class LandingController < ApplicationController
      caches_action :about, expires_in: 6.hours, if: -> { guest? }
      caches_action :mentors, expires_in: 6.hours, if: -> { guest? }
    end
Keys look like:
    cache:views/www.bloc.io/mentors   (the page URL)

Rails Caching: Fragment Caching (Part 1)
Natively exists in Rails. Can fragment cache a partial or an object instance.
Caching an instance. Code looks like:
    / app/views/alum_stories/layouts/_text_photo_layout.html.haml
    - cache story do
      .headline
        = story.headline
Key looks like:
    cache:views/alum_stories/3-20150430203540769345000/44e227db194e26779273184afa632eef
• alum_stories/3-20150430203540769345000 is the object cache key, story.cache_key
• 44e227db194e26779273184afa632eef is the MD5 hash of the view contents

Rails Caching: Fragment Caching (Part 2)
Natively exists in Rails. Can fragment cache a partial or an object instance.
Caching a partial. Code looks like:
    / app/views/layouts/_footer.html.haml
    - cache do
      .full-page-footer
        %h6 Programs
Keys look like:
    cache:views/www.bloc.io/users/nicky-banks/9e8e8931e0697aff02864b001ebdc99c
• www.bloc.io/users/nicky-banks is the page URL
• 9e8e8931e0697aff02864b001ebdc99c is the MD5 hash of the view contents

Rails Caching: Low-Level Caching
Natively exists in Rails.
Code looks like:
    Rails.cache.fetch('my_unique_cache_key') do
      Calculator.new.expensive_calculation
    end
Keys look like:
    cache:my_unique_cache_key   (the specified key)
The cache key is key:
• Unique to the object’s current state
• If you expect to search keys at some point, namespacing is great

Rails Caching: SQL Caching

Cache Keys: Expiration
For each level of caching, you can set an expiration:
    Rails.cache.fetch('my_unique_cache_key', expires_in: 6.hours)
An expiration value, aka TTL, gets set on the key.
Redis deletes the key and its cached value once it expires.
Watching it happen with > redis-cli info:
    Step                                        Keys  Expires  Memory Used  Expired Keys  Evicted Keys
    Initial state                               137   2        1.77MB       10            0
    Load cached page that expires in a minute   138   3        1.86MB       10            0
    Wait a minute                               137   2        1.77MB       11            0
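
The expiration behaviour above can be sketched in plain Ruby. `TtlCache` is a hypothetical stand-in for `Rails.cache` backed by Redis, not the real implementation: it only shows that the block runs on a miss, is skipped on a hit, and runs again once the TTL has passed. Unlike Redis, this sketch does not delete expired keys, it just ignores them on read.

```ruby
# Hypothetical in-memory cache with a per-key TTL, for illustration only.
class TtlCache
  Entry = Struct.new(:value, :expires_at)

  # The clock is injectable so the example can "wait" without sleeping.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @store = {}
  end

  # Run the block on a miss (or after expiry) and remember the result.
  # expires_in is in seconds; nil means the key never expires.
  def fetch(key, expires_in: nil)
    entry = @store[key]
    now = @clock.call
    return entry.value if entry && (entry.expires_at.nil? || entry.expires_at > now)

    value = yield
    @store[key] = Entry.new(value, expires_in && now + expires_in)
    value
  end
end

now = Time.at(0)
cache = TtlCache.new(clock: -> { now })
calls = 0

cache.fetch('landing_page', expires_in: 60) { calls += 1 } # miss: block runs
cache.fetch('landing_page', expires_in: 60) { calls += 1 } # hit: block skipped
now += 61                                                   # "wait a minute"
cache.fetch('landing_page', expires_in: 60) { calls += 1 } # expired: block runs again

puts calls # 2
```

A key stored with `expires_in: nil` in this sketch stays forever, which is the situation the rest of the talk is about.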

Cache Keys: Eviction
The Redis configuration has an eviction policy for keys:
    noeviction | allkeys-lru | volatile-lru | allkeys-random | volatile-random | volatile-ttl
Some keywords:
• allkeys evicts keys regardless of whether an expiration is set
• volatile only evicts keys with an expiration set
• random selects keys randomly
• lru selects the least recently used keys
• ttl selects keys with the least time to live
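
As a rough model of how the policy names combine, the sketch below picks the next key a policy would evict. It is an illustration only: `CachedKey` and `eviction_victim` are hypothetical names, and real Redis approximates LRU by sampling rather than scanning every key.

```ruby
# Illustrative model only; not how Redis is implemented.
CachedKey = Struct.new(:name, :ttl, :last_used_at) # ttl nil = no expiration set

# Return the key the policy would evict next, or nil if nothing is eligible.
def eviction_victim(keys, policy)
  scope, strategy = policy.split('-')
  return nil if scope == 'noeviction'

  # volatile-* policies only consider keys that have an expiration.
  candidates = scope == 'volatile' ? keys.reject { |k| k.ttl.nil? } : keys
  case strategy
  when 'lru'    then candidates.min_by(&:last_used_at)
  when 'ttl'    then candidates.min_by(&:ttl)
  when 'random' then candidates.sample
  end
end

keys = [
  CachedKey.new('cache:footer',   nil,  10),
  CachedKey.new('cache:calendar', nil,  20),
  CachedKey.new('cache:mentors',  3600, 30)
]
no_ttl_keys = keys.select { |k| k.ttl.nil? }

puts eviction_victim(keys, 'allkeys-lru').name  # cache:footer
puts eviction_victim(keys, 'volatile-lru').name # cache:mentors
p eviction_victim(no_ttl_keys, 'volatile-lru')  # nil
```

The last call is the interesting one: under a volatile policy, keys without an expiration are never candidates, so when memory is full Redis has nothing to evict and rejects writes with the OOM error.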

Okay, so that’s how caching works.
What did we do wrong?

Profiling Our Keys
Using redis-audit, I found the distribution of a sampling of keys:
    > ruby redis-audit.rb -h spinyfin.redistogo.com -p 9340 -a mypassword -s 1000
Sampling 1000 keys, 0.2% of total keys, the tool found 16 groups.

The problem is caching
The 16 groups found by the tool actually narrowed down to 9, where 4 groups each accounted for more than 1% of memory:
    Key group                                               Memory Usage  % Expires
    cache:views/www.bloc.io/users/carlton-banks/checkp...   46.94%        .38%
    cache:views/checkpoints/1368-201607251822231046450...   24.54%        0%
    cache:https://github.com/will/bloc-jams/commit/aef...   21.34%        0%
    cache:111325,111326,111327,111328,111329,111330,11...   5.55%         0%
    other caches + workers                                  1.63%         0%

The problem is caching: expiration + eviction
Of all our keys, about 0.07% have expiration dates: 294 keys of 435,474.
Our max memory policy is volatile-lru, which only evicts keys that have an expiration set, so almost nothing could ever be evicted.

Fixing Key Eviction
We use Redis To Go as our Redis server host.
Pro: We don’t have to maintain the server
Con: Limited options, just maxmemory

Fixing expiration
• Page caching
• Action caching (our action caching has proper expirations for guest users)
• Fragment caching (71%)
• Low-level caching (27%)
• SQL caching

Fixing the footer cache: 47%
Keys like:
    cache:views/www.bloc.io/users/carlton-banks/9e8e8931e0697aff02864b001ebdc99c
Code is:
    / app/views/layouts/_footer.html.haml
    - cache do
Debugging questions:
• How big is this key space?
  ◦ Dependent on page URL (including user) and content hash
• What’s wrong with this key?
  ◦ Many unique keys for the same content
• Do we need to cache this?
  ◦ It’s not computationally intensive, so no.
TODO: delete existing keys, remove caching mechanism in code
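
To see why this key space grows, here is a small sketch that builds keys in the shape shown above. The key format is an assumption modeled on the example key (page URL plus MD5 of the template), `footer_cache_key` is a hypothetical helper, and the user slugs are made up for illustration.

```ruby
require 'digest'

# Assumed key shape, modeled on the keys shown above: page URL + MD5 of the template.
def footer_cache_key(page_url, template_source)
  "cache:views/#{page_url}/#{Digest::MD5.hexdigest(template_source)}"
end

template = ".full-page-footer\n  %h6 Programs\n"
# Hypothetical user pages; every one of them renders the same footer.
page_urls = %w[user-a user-b user-c].map { |slug| "www.bloc.io/users/#{slug}" }

keys = page_urls.map { |url| footer_cache_key(url, template) }
digests = keys.map { |key| key.split('/').last }.uniq

puts keys.uniq.size # 3: one key per page URL
puts digests.size   # 1: every key caches identical footer content
```

The number of keys scales with the number of pages visited, while the cached content never changes, which matches the linear growth seen in the memory graphs.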

Fixing the checkpoint nav cache: 25%
Keys like:
    cache:views/checkpoints/1368-20160725182223104645000/roadmap_sections/95-20160211001250316888000/users/2302534-20160904083834996975000/user/f21882266d6ec6d0f9331f1e16d3c176
Code is:
    / app/views/users/checkpoints/_checkpoint_nav.html.haml
    - cache [@checkpoint, @checkpoint.section, @user, current_user.role] do
Debugging questions:
• How big is this key space?
  ◦ Dependent on checkpoint, section, user, and role: big.
• What’s wrong with this key?
  ◦ There are a lot of them for similar data
• Do we need to cache this?
  ◦ Might be able to use lower-level caching of position, index, etc.
TODO: delete existing keys, set expiration on new keys

Fixing the GitHub commit cache: 21%
Keys like:
    cache:https://github.com/will/Blocly/commit/c0ad999d1bd91361858481e807724410407e
Code is:
    commit = Rails.cache.fetch(commit_link) do
      Github::Commit.new(commit_link)
    end
Debugging questions:
• How big is this key space?
  ◦ Number of unique commits for all students
• What’s wrong with this key?
  ◦ It’s not namespaced
• Do we need to cache this?
  ◦ Seems like a good idea
TODO: namespace key, set expiration on new keys, delete existing keys
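
A minimal sketch of the namespacing TODO, in plain Ruby. The `github_commit` prefix, the helper name, and the example URLs are all assumptions for illustration; the talk does not show the namespace Bloc chose.

```ruby
# Hypothetical namespace; not taken from the talk.
GITHUB_COMMIT_NAMESPACE = 'github_commit'

# Prefix the raw commit URL so the key's purpose is visible in the key itself.
def github_commit_cache_key(commit_link)
  "#{GITHUB_COMMIT_NAMESPACE}:#{commit_link}"
end

keys = [
  github_commit_cache_key('https://github.com/example/repo/commit/abc123'),
  github_commit_cache_key('https://github.com/example/repo/commit/def456'),
  'views/www.bloc.io/mentors'
]

# With a namespace, finding every commit key is a simple prefix match.
commit_keys = keys.select { |key| key.start_with?("#{GITHUB_COMMIT_NAMESPACE}:") }
puts commit_keys.size # 2
```

In the Rails code this key would be passed to `Rails.cache.fetch` together with an `expires_in` value; the talk does not say which duration was used.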

Fixing the calendar cache: 6%
Keys like:
    111325,111326,111327,111328,111329,111330,111331,111332,111333,111334,111335,111336,...
Code is:
    key = @appointments.pluck(:id).sort.join(',')
    @data = Rails.cache.fetch(key) do
      Appointment.calendar_for(@appointments, @user).export
    end
Debugging questions:
• How big is this key space?
  ◦ All combinations of appointment IDs
• What’s wrong with this key?
  ◦ Freakin’ nonsensical, no human words
• Do we need to cache this?
  ◦ Sure, but not with this key
TODO: use a better key, set expiration, delete existing keys

Results

Massive Performance Improvements
We applied the correct caching technique to our slow API endpoints.
[Graph annotation: THE GOAL]

Constant Memory Usage: Predictable!

Another Cache Key Learning
One more mistake when implementing our improved caching.
Keys like:
    calendar_export:appointments/709028-20170612032235056661000
Code is:
    most_recent_appointment = @appointments.reorder(:updated_at).last
    cache_key = most_recent_appointment.cache_key
    Rails.cache.fetch(cache_key, expires_in: 1.day) do
      Appointment.calendar_for(@appointments, @user).export
    end
Most recent appointment as the cache key! Smart, right? Wrong.
We forgot the user; we needed to add the user’s cache key too:
    calendar_export:users/2389812-20170612042305184969000:appointments/709028-20170612032235056661000
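
The corrected key can be sketched without Rails. `Record` is a hypothetical stand-in for an ActiveRecord model whose `cache_key` imitates the `table/id-timestamp` shape of the keys above, and `calendar_export_key` is an illustrative helper, not Bloc’s code. The IDs and timestamps are made up.

```ruby
# Minimal stand-in for an ActiveRecord model; cache_key mimics the
# "table/id-timestamp" shape of the keys shown above.
Record = Struct.new(:table, :id, :updated_at) do
  def cache_key
    "#{table}/#{id}-#{updated_at.utc.strftime('%Y%m%d%H%M%S%9N')}"
  end
end

# Include every input the cached value depends on: the user and the
# most recently updated appointment.
def calendar_export_key(user, appointments)
  most_recent = appointments.max_by(&:updated_at)
  "calendar_export:#{user.cache_key}:#{most_recent.cache_key}"
end

appointments = [
  Record.new('appointments', 709_027, Time.utc(2017, 6, 11, 9, 0, 0)),
  Record.new('appointments', 709_028, Time.utc(2017, 6, 12, 3, 22, 35))
]
user_a = Record.new('users', 1, Time.utc(2017, 6, 12, 4, 23, 5))
user_b = Record.new('users', 2, Time.utc(2017, 6, 12, 4, 23, 5))

puts calendar_export_key(user_a, appointments)
puts calendar_export_key(user_a, appointments) == calendar_export_key(user_b, appointments) # false
```

Two users looking at the same appointments now get different keys, so one user’s export is never served to another.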

Takeaways
• When determining cache keys:
  ◦ Consider the keyspace
  ◦ Consider how the keyspace correlates to time and scale
  ◦ Set an expiration that makes sense according to the volatility of the cached information
• Become familiar with your app’s caching mechanism:
  ◦ Cache implementation, expiration policy, default expiration
• Use caching to quickly make massive performance improvements
• If using Redis, check out redis-cli for cache debugging:
    > redis-cli monitor
    > redis-cli info
    > redis-cli get <key>
    > redis-cli -h HOST -p PORT -a MYPASSWORD

The End!
twitter: @meganmarie610
email: megan@bloc.io
And the classic: we’re hiring at Bloc!

Appendix

What do the cache values look like?
For view caches, the full controller response as a string.
For object caches, the object encoded in a JSON string.

How do you debug the Redis cache?
    > redis-cli
    > redis-cli -h HOST -p PORT -a MYPASSWORD
Most used commands:
    > info
    > monitor
    > ttl <key>
    > get <key>
    > expire <key> <seconds>

How does Redis use the LRU algorithm?
Redis uses an approximated LRU algorithm: it samples a small number of keys and evicts the least recently used key among the sample.
[Graph legend: the light gray band is objects that were evicted; the gray band is objects that were not evicted; the green band is objects that were added.]

References
• Rails cache overview
• Using Redis as an LRU cache
• DHH's key-based cache expiration overview
• redis-audit
