
The Little Lawyer

A Semi-Intelligent Scaling Technique for Predictable Load Patterns

Alan Norton

March 28, 2018

Transcript

  1. Rails Apps • PostgreSQL • Heroku • Core Engine • Legacy API Engine • Consumer App • Restaurant App • CX + Ops App
  2. • Mon 17:00 - “Kitchen Open”
      ◦ 17:00 - Send “Kitchen is Open” notifications to users
    • Tue 09:30 - “Kitchen Close”
      ◦ 09:15 - Send “Kitchen Is Closing” notifications to users
      ◦ 09:30 - Send orders to restaurants
    • Tue 15:00 - Last pickup-window ends
    • Tue 17:00 - “Kitchen Open”
  3. # subscriber/lib/clock.rb
    module Clockwork
      # Time#wday is 0 for Sunday, so this covers Sunday through Thursday.
      evening_reservation_day = ->(t) { (0..4).cover? t.wday }

      # Schedule a scale-up and a scale-down in every time zone that has a live city.
      City.real.with_location.distinct.pluck(:time_zone_name).each do |tz|
        every(1.day, 'web_scale_high', at: '16:45', tz: tz, if: evening_reservation_day) do
          HerokuSubscriberScale.new(web: 40, worker: 30).save
        end

        every(1.day, 'web_scale_low', at: '17:15', tz: tz, if: evening_reservation_day) do
          HerokuSubscriberScale.new(web: 15, worker: 10).save
        end
      end
    end
  4. # subscriber/app/models/heroku_scale.rb
    class HerokuScale
      include ActiveModel::Model
      attr_accessor :env, :web, :worker

      def save!
        heroku.formation.update(heroku_application_name, :web, quantity: web) if web
        heroku.formation.update(heroku_application_name, :worker, quantity: worker) if worker
        SlackMessage.new(text: scale_status_message, channel: slack_channel_name).save
        Rails.logger.info(scale_status_message)
        true
      end
      # ...
    end
  5. Request Count / 15m Timeslices, 7-day Comparison
    _source=subscriber-production "heroku router - at=info"
    | timeslice 15m
    | count by _timeslice
    | order by _timeslice
  6. Little’s Law
    Little's Law tells us that the average number of customers in the store, L, is the effective arrival rate, λ, times the average time that a customer spends in the store, W:
    L = λW
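Applied to a web app, L is the number of requests in flight (busy threads), λ is the request arrival rate, and W is the request duration. The sketch below works through that arithmetic in Ruby. The method names and sample numbers are hypothetical; the 16 threads per dyno matches the query on the following slide, and rounding up to whole dynos is an added assumption.

```ruby
# Little's Law: L = lambda * W
#   L      = requests in flight at once (threads needed)
#   lambda = arrival rate, in requests per millisecond
#   W      = time each request spends in the system, in milliseconds

# Hypothetical helper: threads needed to serve `requests` arriving over
# `window_minutes`, each taking `duration_ms` to complete.
def required_threads(requests:, window_minutes:, duration_ms:)
  requests_per_ms = requests.to_f / window_minutes / 60 / 1000
  requests_per_ms * duration_ms
end

# Hypothetical helper: whole dynos needed for that many threads.
# Rounding up is an assumption; the slides do not say how fractions are handled.
def required_dynos(threads, threads_per_dyno: 16)
  (threads / threads_per_dyno).ceil
end

# Made-up sample: 360,000 requests in a 30-minute slice, 500 ms each.
threads = required_threads(requests: 360_000, window_minutes: 30, duration_ms: 500)
dynos = required_dynos(threads)
puts "threads=#{threads.round(2)} dynos=#{dynos}"
```

With these sample numbers the arrival rate is 0.2 requests per millisecond, so about 100 threads are busy on average, which needs 7 dynos at 16 threads each.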
  7. _source=subscriber-production "app web." AND "method="
    | parse "duration=* " as ms
    | timeslice 30m
    | count(_timeslice) as requests_per_30min, max(ms) as max_duration_ms by _timeslice
    | outlier max_duration_ms
    | requests_per_30min / 30 / 60 / 1000 as requests_per_ms
    | 16 as threads_per_dyno
    | (requests_per_ms * (max_duration_ms_mean)) as required_thread_count
    | required_thread_count / threads_per_dyno as required_web_dyno_count
    | fields _timeslice, required_web_dyno_count
    | compare with timeshift 1d 7
    | sort by _timeslice desc
    | limit 48
  8. $webscale
    {
      "UTC": [
        { "duration": "0000-0030", "web": 33, "worker": 13 },
        { "duration": "0030-0100", "web": 35, "worker": 5 },
        { "duration": "0100-0130", "web": 20, "worker": 5 },
        { "duration": "0130-0200", "web": 18, "worker": 5 },
        { "duration": "0200-0230", "web": 16, "worker": 5 },
        { "duration": "0230-0300", "web": 15, "worker": 8 },
        { "duration": "0300-0330", "web": 15, "worker": 14 },
        { "duration": "0330-0400", "web": 15, "worker": 7 },
        { "duration": "0400-0430", "web": 15, "worker": 8 },
        { "duration": "0430-0500", "web": 15, "worker": 6 },
        { "duration": "0500-0530", "web": 15, "worker": 6 },
        { "duration": "0530-1230", "web": 15, "worker": 5 },
        { "duration": "1230-1300", "web": 21, "worker": 5 },
        { "duration": "1300-1330", "web": 25, "worker": 5 },
        { "duration": "1330-1400", "web": 20, "worker": 5 },
        { "duration": "1400-1430", "web": 22, "worker": 5 },
        { "duration": "1430-1500", "web": 18, "worker": 5 },
        { "duration": "1500-1530", "web": 24, "worker": 5 },
        { "duration": "1530-1600", "web": 39, "worker": 5 },
        { "duration": "1600-1630", "web": 49, "worker": 5 },
        { "duration": "1630-1700", "web": 45, "worker": 5 },
        { "duration": "1700-1730", "web": 48, "worker": 7 },
        { "duration": "1730-1800", "web": 26, "worker": 5 },
        { "duration": "1800-1830", "web": 25, "worker": 5 },
        { "duration": "1830-1900", "web": 30, "worker": 8 },
        { "duration": "1900-1930", "web": 32, "worker": 12 },
        { "duration": "1930-2000", "web": 25, "worker": 5 },
        { "duration": "2000-2030", "web": 24, "worker": 5 },
        { "duration": "2030-2100", "web": 41, "worker": 8 },
        { "duration": "2100-2130", "web": 60, "worker": 15 },
        { "duration": "2130-2200", "web": 37, "worker": 5 },
        { "duration": "2200-2230", "web": 32, "worker": 6 },
        { "duration": "2230-2300", "web": 30, "worker": 5 },
        { "duration": "2300-2330", "web": 30, "worker": 5 },
        { "duration": "2330-2359", "web": 30, "worker": 7 }
      ]
    }
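A schedule in this shape can be read with nothing more than the standard library. The following is a minimal sketch, not the deck's implementation: it parses three entries excerpted from the JSON above and finds the window containing a given UTC time. The helper names are hypothetical, and the end of each window is treated as exclusive, which is how the later `active_window` range behaves.

```ruby
require 'json'

# Three entries excerpted from the UTC schedule; the rest are omitted for brevity.
SCHEDULE_JSON = <<~JSON
  { "UTC": [
    { "duration": "0000-0030", "web": 33, "worker": 13 },
    { "duration": "0530-1230", "web": 15, "worker": 5 },
    { "duration": "2100-2130", "web": 60, "worker": 15 }
  ] }
JSON

# Hypothetical helper: "HHMM" -> minutes since midnight.
def minutes_since_midnight(hhmm)
  hhmm[0, 2].to_i * 60 + hhmm[2, 2].to_i
end

# Hypothetical helper: first entry whose half-open window [start, end)
# contains the given time, or nil when no entry matches.
def entry_for(entries, time)
  now = time.hour * 60 + time.min
  entries.find do |entry|
    starts, ends = entry[:duration].split('-').map { |part| minutes_since_midnight(part) }
    (starts...ends).cover?(now)
  end
end

UTC_ENTRIES = JSON.parse(SCHEDULE_JSON, symbolize_names: true)[:UTC]

peak = entry_for(UTC_ENTRIES, Time.utc(2018, 3, 28, 21, 15))
puts "web=#{peak[:web]} worker=#{peak[:worker]}"
```

Because the windows are half-open, a time exactly on a boundary belongs to the window that starts there, not the one that ends there.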
  9. # subscriber/lib/clock.rb
    module Clockwork
      on_the_fives = ["**:00", "**:05", "**:10", "**:15", "**:20", "**:25",
                      "**:30", "**:35", "**:40", "**:45", "**:50", "**:55"]

      if Rails.env.production?
        every(5.minutes, 'heroku_scale_schedule', at: on_the_fives) do
          HerokuScaleSchedule.new.scale_now! if ENV['CLOCK_HEROKU_SCALE_SCHEDULE'] == 'true'
        end
      end
    end
  10. class HerokuScaleSchedule
      attr_reader :as_of, :durations
      delegate :events, to: :timeline

      def initialize(opts = {})
        @as_of = (opts.delete(:as_of) || Time.current).utc
        @durations = JSON.parse(IO.read(Rails.root.join('config', 'heroku_scale_schedule.json'))).deep_symbolize_keys
      end

      def scale_now!
        HerokuScaleJob.set(wait_until: as_of.utc).perform_later(scale)
      end

      # Of all windows active at as_of, take the largest web and worker counts.
      def scale
        @scale ||= {
          web: active_durations.sort_by(&:web).last.web,
          worker: active_durations.sort_by(&:worker).last.worker
        }
      end

      private

      # Windows covering as_of, plus the all-day default as a floor.
      def active_durations
        @active_durations ||= raw_durations.select { |d| d.active_window.cover? as_of } + [LocalScaleDuration.default]
      end

      # One LocalScaleDuration per configured window, anchored to today's date in its time zone.
      def raw_durations
        durations.map do |tz_name, duration_config|
          tz = ActiveSupport::TimeZone.new(tz_name.to_s)
          duration_config.map do |conf|
            LocalScaleDuration.new(conf.merge(tz: tz, today_in_tz: as_of.in_time_zone(tz).to_date))
          end
        end.flatten
      end
    end
  11. class LocalScaleDuration
      include ActiveModel::Model
      attr_accessor :tz, :today_in_tz, :duration
      attr_writer :web, :worker
      validate :duration_must_start_before_it_ends # not shown

      def initialize(attrs)
        super(attrs)
        validate!
      end

      class << self
        def default
          new(
            tz: ActiveSupport::TimeZone['UTC'],
            today_in_tz: Time.now.utc.to_date,
            duration: '0000-2359',
            web: default_web_dynos,
            worker: default_worker_dynos
          )
        end
      end

      def active_window
        @active_window ||= (starts_at...ends_at)
      end

      def starts_at
        @starts_at ||= Tod::TimeOfDay.parse(duration.split('-')[0]).on(today_in_tz, tz).utc
      end

      def ends_at
        @ends_at ||= Tod::TimeOfDay.parse(duration.split('-')[1]).on(today_in_tz, tz).utc
      end
    end
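The selection rule in these two classes is that every window active at the current moment contributes, and the largest web and worker counts win independently. A plain-Ruby sketch of that rule follows, with no Rails or Tod dependency. The `Window` struct, `scale_at`, and the default counts of 15 web and 5 worker dynos are hypothetical, and falling back to the default when a window omits a value is an assumption, since the slides do not show the `web` and `worker` readers. The two sample windows are taken from the America/New_York overrides on the following slide.

```ruby
# A window in minutes since local midnight. web or worker may be nil when a
# window only sets one of them.
Window = Struct.new(:range, :web, :worker)

# Hypothetical baseline, standing in for default_web_dynos / default_worker_dynos.
DEFAULT_WINDOW = Window.new(0...1440, 15, 5)

# Largest web and worker counts among all windows active at minute_of_day.
# The default window is always included, so it acts as a floor.
def scale_at(windows, minute_of_day)
  active = windows.select { |w| w.range.cover?(minute_of_day) } + [DEFAULT_WINDOW]
  {
    web: active.map { |w| w.web || DEFAULT_WINDOW.web }.max,
    worker: active.map { |w| w.worker || DEFAULT_WINDOW.worker }.max
  }
end

NEW_YORK = [
  Window.new((16 * 60 + 45)...(17 * 60 + 15), 60, 10), # "1645-1715"
  Window.new((17 * 60)...(17 * 60 + 10), nil, 50)      # "1700-1710", worker only
].freeze

p scale_at(NEW_YORK, 17 * 60 + 5)  # both windows active: web 60, worker 50
p scale_at(NEW_YORK, 17 * 60 + 12) # only the first window: web 60, worker 10
p scale_at(NEW_YORK, 10 * 60)      # nothing active: the defaults
```

Taking the maximum per process type lets a short worker-only burst overlap a longer web window without either one lowering the other.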
  12. Bonus Time Zone Overrides
    {
      "America/Chicago": [
        { "duration": "0830-0930", "web": 20 },
        { "duration": "1645-1730", "web": 20, "worker": 10 },
        { "duration": "0900-0920", "worker": 20 }
      ],
      "America/Los_Angeles": [
        { "duration": "0900-0930", "web": 30 },
        { "duration": "1645-1730", "web": 40, "worker": 20 },
        { "duration": "0900-0920", "worker": 20 }
      ],
      "America/New_York": [
        { "duration": "0900-0930", "web": 20 },
        { "duration": "1200-1300", "web": 30 },
        { "duration": "1645-1715", "web": 60, "worker": 10 },
        { "duration": "1700-1710", "worker": 50 },
        { "duration": "1715-1830", "web": 40, "worker": 10 },
        { "duration": "0900-0920", "worker": 20 }
      ],
      "Australia/Sydney": [
        { "duration": "0900-0930", "web": 20 },
        { "duration": "1645-1730", "web": 20, "worker": 10 },
        { "duration": "0900-0920", "worker": 20 }
      ],
      "Europe/London": [
        { "duration": "0900-0930", "web": 20 },
        { "duration": "1645-1730", "web": 30, "worker": 15 },
        { "duration": "0900-0920", "worker": 20 }
      ],
      "Europe/Paris": [
        { "duration": "0900-0930", "web": 20 },
        { "duration": "1645-1730", "web": 20, "worker": 10 },
        { "duration": "0900-0920", "worker": 20 }
      ]
    }
  13. Drawbacks? Sure.
    • Daylight Saving Time
    • Config susceptible to outliers / outages
      ◦ No true autoscaling
    • Clunky ENV on/off
      ◦ heroku feature:preboot helps
    • Needs periodic “refresh”
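The Daylight Saving Time drawback is easy to see with a little arithmetic: a schedule keyed by UTC half-hours is derived from traffic that follows local wall-clock time, so the same local event lands in a different UTC slot after a clock change. A small sketch, using fixed UTC offsets for New York (-04:00 on EDT, -05:00 on EST) rather than a time zone database; the `utc_slot` helper and the sample dates are hypothetical.

```ruby
# Hypothetical helper: convert a local wall-clock time at a fixed UTC offset
# into the "HHMM" form used by the schedule's duration strings.
def utc_slot(year, month, day, hour, min, offset)
  Time.new(year, month, day, hour, min, 0, offset).utc.strftime('%H%M')
end

summer = utc_slot(2018, 3, 28, 17, 0, '-04:00') # 17:00 in New York on EDT
winter = utc_slot(2018, 1, 10, 17, 0, '-05:00') # 17:00 in New York on EST

puts "17:00 New York is #{summer} UTC on EDT and #{winter} UTC on EST"
```

A UTC schedule measured while EDT is in effect would therefore scale up an hour early once clocks return to EST, unless the data is refreshed or the window is expressed in local time, as in the time zone overrides.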
  14. _source=subscriber-production "heroku router - at=info"
    | parse "service=*ms" as ms
    | timeslice 30m
    | count(_timeslice) as requests_per_30min, max(ms) as max_duration_ms by _timeslice
    | outlier max_duration_ms
    | requests_per_30min / 30 / 60 / 1000 as requests_per_ms
    | 16 as threads_per_dyno
    | (requests_per_ms * (max_duration_ms_mean)) as required_thread_count
    | required_thread_count / threads_per_dyno as required_web_dyno_count
    | fields _timeslice, required_web_dyno_count
    | compare with timeshift 1d 7
    | sort by _timeslice desc
    | limit 48
  15. What’s next?
    • Database Config
    • Calculate scale from active customer count in TimeZone
    • Automatically react to trends in period response time
      ◦ Or just Perf-M Dynos
    • More & better caching