Knobs, Buttons, and Switches: operating your application at scale

Pilots have the flight deck, Captain Kirk had his bridge, but what do you have for managing failure in your application?

Every app comes under stress, whether from downstream failures, unsustainably high load, or a spike in intensive requests. We'll cover code patterns you can use to change your application's behavior on the fly so that it fails gracefully.

You’ll walk away from this talk with tools you can have on hand to ensure you remain in control even when your application is under stress.

Amy Unger

April 17, 2018

Transcript

  1. Knobs, buttons & switches Operating your application at scale

  2. Got Options?

  3. Application Resilience

  9. 1. Maintenance Mode 2. Read-Only Mode 3. Feature Flags 4. Rate Limits 5. Non-critical work 6. New code paths 7. Circuit Breakers
  10. 1 Maintenance Mode

  11. Environment Variable MAINTENANCE_MODE=ON
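
One way the MAINTENANCE_MODE switch on this slide could be wired up is a small Rack-style middleware that short-circuits every request while the variable is set. This is only a sketch: the class name, the injectable `env_source` argument, the response body, and the `retry-after` value are assumptions made for illustration, not details from the talk.

```ruby
# Hypothetical Rack-style middleware; nothing here is taken from the talk
# beyond the MAINTENANCE_MODE=ON environment variable itself.
class MaintenanceMode
  # env_source defaults to the process environment; a plain Hash can be
  # passed instead, which keeps the sketch easy to exercise.
  def initialize(app, env_source = ENV)
    @app = app
    @env_source = env_source
  end

  def call(env)
    if @env_source["MAINTENANCE_MODE"] == "ON"
      # 503 signals a temporary outage to clients and load balancers.
      [503,
       { "content-type" => "text/plain", "retry-after" => "300" },
       ["Down for maintenance\n"]]
    else
      @app.call(env)
    end
  end
end
```

Because the variable is read on every request, the switch takes effect as soon as the process environment reflects the change; on many platforms that still means a restart.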

  12. 2 Read-only Mode

  13. READONLY_MODE=ON Environment Variable
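
A read-only switch can follow the same shape as maintenance mode, except that only requests that would write are turned away. The sketch below assumes that "write" can be approximated by the HTTP method; the class name, the list of safe methods, and the response are illustrative assumptions rather than anything shown in the deck.

```ruby
# Hypothetical Rack-style middleware for READONLY_MODE=ON.
class ReadOnlyMode
  # Assumption: these methods never modify data in the application.
  SAFE_METHODS = %w[GET HEAD OPTIONS].freeze

  def initialize(app, env_source = ENV)
    @app = app
    @env_source = env_source
  end

  def call(env)
    if read_only? && !SAFE_METHODS.include?(env["REQUEST_METHOD"])
      [503,
       { "content-type" => "text/plain" },
       ["Read-only mode: writes are temporarily disabled\n"]]
    else
      @app.call(env)
    end
  end

  private

  def read_only?
    @env_source["READONLY_MODE"] == "ON"
  end
end
```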

  14. 3 Feature Flags

  15. Feature Flags Granularity • User • Global • Group

  16. ApplicationSetting.get('billing-enabled') Datastore

  17. ApplicationSetting.get('billing-enabled-eu') Datastore
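
The two keys on these slides suggest a global flag plus a narrower per-group flag. A minimal sketch of how they might combine is shown here, assuming an in-memory Hash in place of the datastore. The `set` method, the `default:` argument, and the `Billing.enabled?` helper are hypothetical names introduced for the example.

```ruby
# In-memory stand-in for the datastore-backed ApplicationSetting.
class ApplicationSetting
  @store = {}

  class << self
    def set(key, value)
      @store[key] = value
    end

    # Assumption: a flag that was never written counts as enabled.
    def get(key, default: true)
      @store.fetch(key, default)
    end
  end
end

# Hypothetical helper combining global and group granularity.
module Billing
  def self.enabled?(region: nil)
    # The global flag wins: if it is off, billing is off everywhere.
    return false unless ApplicationSetting.get("billing-enabled")
    return true if region.nil?

    ApplicationSetting.get("billing-enabled-#{region}")
  end
end

ApplicationSetting.set("billing-enabled", true)
ApplicationSetting.set("billing-enabled-eu", false)
Billing.enabled?(region: "eu") # billing paused for the EU group only
```

Per-user flags could follow the same key-suffix idea, at the cost of many more rows to look up.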

  18. 4 Rate Limits

  19. Surge pricing Gray Rate Limits

  20. Preferred traffic Gray Rate Limits

  21. Default & modifiers Gray Rate Limits

  22. ApplicationSetting.set('default-rate', 100) Datastore

  23. > customer.rate_limit_modifier => 1.0 Datastore

  24. > customer.rate_limit => 100 Datastore

  25. > customer.rate_limit_modifier = 2.0 Datastore

  26. > customer.rate_limit => 200 Datastore

  27. ApplicationSetting.set('default-rate', 50) Datastore

  28. > customer.rate_limit => 100 Datastore
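
The console session on these slides is consistent with an effective limit computed as the global default multiplied by a per-customer modifier. A sketch under that assumption follows; the `settings` Hash stands in for the datastore, and the constructor arguments are hypothetical.

```ruby
# Sketch: effective rate limit = global default * per-customer modifier.
class Customer
  attr_accessor :rate_limit_modifier

  # settings is a Hash standing in for the datastore-backed
  # ApplicationSetting from the slides.
  def initialize(settings:, rate_limit_modifier: 1.0)
    @settings = settings
    @rate_limit_modifier = rate_limit_modifier
  end

  def rate_limit
    (@settings.fetch("default-rate") * rate_limit_modifier).round
  end
end

settings = { "default-rate" => 100 }
customer = Customer.new(settings: settings)
customer.rate_limit                  # => 100
customer.rate_limit_modifier = 2.0
customer.rate_limit                  # => 200
settings["default-rate"] = 50        # turn the global knob down
customer.rate_limit                  # => 100
```

Lowering the default throttles everyone proportionally, while a preferred customer keeps a relative advantage through its modifier.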

  29. Cost-based limits Gray Rate Limits

  30. (D)DoS Concerns Gray Rate Limits

  31. 5 Stop non-critical work

  32. > ReportSetting.get("monthly_user_report") => false Datastore

  33. class MonthlyUserReport end

  34. class MonthlyUserReport def run end end

  35. class MonthlyUserReport def run do_something end end

  36. class MonthlyUserReport def run return unless enabled? do_something end def enabled? ReportSetting.get("monthly_user_report") end end
  37. class MonthlyUserReport < Report def run return unless enabled? do_something end def enabled? ReportSetting.get("monthly_user_report") end end
  38. class MonthlyUserReport < Report def build do_something end end

  39. class Report def run return unless enabled? build end def enabled? ReportSetting.get(self.class.name.underscore) end end
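
Laid out in full, the refactoring on the last few slides might look like the sketch below. It runs without Rails, so the setting key is derived with a small plain-Ruby helper instead of ActiveSupport's `String#underscore`. The in-memory `ReportSetting`, its `set` method, the enabled-by-default behavior, and the `setting_key` helper are assumptions added for illustration.

```ruby
# In-memory stand-in for the datastore-backed ReportSetting.
class ReportSetting
  @store = {}

  class << self
    def set(key, value)
      @store[key] = value
    end

    # Assumption: reports run unless explicitly switched off.
    def get(key)
      @store.fetch(key, true)
    end
  end
end

class Report
  # The base class owns the switch; subclasses only implement #build.
  def run
    return unless enabled?

    build
  end

  def enabled?
    ReportSetting.get(setting_key)
  end

  private

  # "MonthlyUserReport" -> "monthly_user_report"
  def setting_key
    self.class.name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end
end

class MonthlyUserReport < Report
  def build
    :report_built # placeholder for the real work (do_something)
  end
end

ReportSetting.set("monthly_user_report", false)
MonthlyUserReport.new.run # => nil, the report is skipped
```

Every new report that inherits from `Report` gets an off switch without any extra code.
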
  40. 6 Known Unknowns

  41. Scientist! https://github.com/github/scientist Datastore

  42. 7 Circuit Breakers

  43. Responsive shut-offs Gray Circuit Breakers

  44. Hard shut-offs Gray Circuit Breakers

  45. > BillingServiceClient.circuit.open => > BillingServiceClient.circuit.close => Caching layer or Datastore
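
The slides distinguish responsive shut-offs (the breaker trips itself) from hard shut-offs (an operator calls `open`). A single-process sketch covering both is shown here. The `Circuit` class, its failure threshold, `OpenError`, and the `call` method are assumptions; only `BillingServiceClient.circuit.open` and `.close` appear in the deck, and the deck keeps the state in a caching layer or datastore rather than in memory.

```ruby
# Hypothetical in-memory circuit breaker.
class Circuit
  class OpenError < StandardError; end

  def initialize(failure_threshold: 3)
    @failure_threshold = failure_threshold
    @failures = 0
    @open = false
  end

  # Hard shut-off: an operator flips the switch by hand.
  def open
    @open = true
  end

  def close
    @open = false
    @failures = 0
  end

  def open?
    @open
  end

  # Responsive shut-off: consecutive failures trip the breaker.
  def call
    raise OpenError, "circuit is open" if open?

    result = yield
    @failures = 0
    result
  rescue OpenError
    raise # a rejected call must not count as a downstream failure
  rescue StandardError
    @failures += 1
    open if @failures >= @failure_threshold
    raise
  end
end

class BillingServiceClient
  def self.circuit
    @circuit ||= Circuit.new
  end
end

BillingServiceClient.circuit.open  # stop calling the billing service
BillingServiceClient.circuit.close # resume
```

Sharing the open/closed state across processes would mean replacing the instance variables with reads and writes to the cache or datastore.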

  46. Consider implementation

  47. • Database • Data caching layer • Environment variables • Code Gray Consider Implementation
  48. Caveats

  49. Visibility Gray Caveats

  50. Got Options?

  51. Does it actually work? Gray Caveats

  52. Knowledge vs. Control Gray Caveats

  53. “I’d still take this deal any day

  54. Thank you! Amy Unger @cdwort

  55. Tomorrow 10:50 So You’ve Got Yourself a Kafka: Event-Powered Rails Services Stella Cotton Room 301-302 Tomorrow 11:40 Postgres 10, Performance, and You Gabe Enslein Room 306-307
  56. Thank you! Amy Unger @cdwort