The Trail to Scale Without Fail: Rails?

Let's be blunt: Most web apps aren’t so computation-heavy and won't hit scaling issues.

What if yours is the exception? Can Rails handle it?

Cue Exhibit A: Cloudinary, which serves billions of image and video requests daily, including on-the-fly edits, QUICKLY, running on Rails since Day 1. Case closed?

Not so fast. Beyond the app itself, we needed creative solutions to ensure that, as traffic rises and falls at the speed of the internet, we handle the load gracefully, and no customer overwhelms the system.

The real question isn't whether Rails is up to the challenge, but rather: Are you?

Ariel Caplan

April 12, 2021

Transcript

  1. GitHub's journey towards microservices and more: 'We actually have our own version of Ruby that we maintain' (The Register, 1 Dec 2020)
  2. NORMALIZED TO REQ/DAY
     - 1.3 billion (2015)
     - 2.6 billion (2016, likely 2x by now)
     - 1 billion (2020, just API)
     - 15 billion (2021)
     *Media requests
  3. SOME OTHER STUFF…
     - Media Library (GUI for media cataloguing and editing)
     - Developer APIs
     - Bulk operations
     - All this ⬆ is 0.3% of total requests handled
     - 99.7% is delivery: transforming/serving media
     - P.S. We also do video!
  4. ABOUT YOUR PRESENTER
     - Hi, I’m Ariel Caplan!
     - Working in Rails since 2015
     - Cloudinary since 2018
     - @amcaplan
     - I ♥ conferences!
  5. “LAYERS. ONIONS HAVE LAYERS. OGRES HAVE LAYERS. ONIONS HAVE LAYERS. YOU GET IT? WE BOTH HAVE LAYERS.” SHREK
  6. CDN
     - (We clearly need one; the question is build vs. buy)
     - Pro: ~95% of traffic isn’t our problem!
     - Pro: Leverage best-in-class service/features
     - Pro: Multi-CDN for reliability
     - Con: Need to play by their rules
     - We write lots of custom rules (f_auto)
     - Every provider has their own invalidation system
     - Need to parse their logfiles to bill our customers
  7. S3GET
     - High-throughput, simple service written in Go
     - Handles 85% of requests it receives
     - Takes up ~10% of the computing resources vs IO
     - Pro: It’s super fast
     - Con: Need to duplicate some Ruby logic in Go
  8. OTHER LAYER ADVANTAGES
     - Layers can scale independently
     - Horizontal slices also exist! (image vs. video)
     - Security (CPU can’t access internet or DB)
  9. “SHARDS OF GLASS CAN CUT AND WOUND OR MAGNIFY A VISION” TERRY TEMPEST WILLIAMS
  10. WHAT IS SHARDING?
      - Vertical partitioning ≠ sharding
      - Each database contains different tables
      - Increases simultaneous writes/reads
      - Supported in Rails 6.0
  11. WHAT IS SHARDING?
      - Horizontal partitioning = sharding
      - 1 table, multiple databases
      - Keeps table size under control
      - Supported in Rails 6.1
  12. SHARDING @ CLOUDINARY
      - 1 main shard for app-wide data
      - Every cloud “lives” 100% on 1 of several shards
      (diagram labels: 1 2 3 4)
  13. SHARDING @ CLOUDINARY
      - 1 main shard for app-wide data
      - Every cloud “lives” 100% on 1 of several shards
      - Code includes thousands of shard references
      - Test environments must be sharded too!

          cloud = find_cloud_from_request_params
          cloud.on_shard do
            # do the work
          end
  14. SHARDING @ CLOUDINARY
      - Pro: Fast is good
      - Pro: Flexibility
      - Con: Error-prone

          cloud = find_cloud_from_request_params
          assets = []
          cloud.on_shard do
            assets = cloud.assets.where(tag: "duck")
          end
          render json: assets
  15. SHARDING @ CLOUDINARY
      - Pro: Fast is good
      - Pro: Flexibility
      - Con: Error-prone

          cloud = find_cloud_from_request_params
          assets = []
          cloud.on_shard do
            assets = cloud.assets.where(tag: "duck") # ActiveRecord::Relation
          end
          render json: assets # Queried outside shard block!
  16. SHARDING @ CLOUDINARY
      - Pro: Fast is good
      - Pro: Flexibility
      - Con: Error-prone

          cloud = find_cloud_from_request_params
          assets = []
          cloud.on_shard do
            assets = cloud.assets.where(tag: "duck").to_a # Load eagerly
          end
          render json: assets
  17. SHARDING @ CLOUDINARY
      - Pro: Fast is good
      - Pro: Flexibility
      - Con: Error-prone

          people = ActiveRecord::Base.connected_to(role: :reading, shard: :shard_one) do
            Person.all
          end
          people #=> preloaded ActiveRecord::Relation from shard
  18. SHARDING @ CLOUDINARY
      - Pro: Fast is good
      - Pro: Flexibility
      - Con: Error-prone

          people = nil
          ActiveRecord::Base.connected_to(role: :reading, shard: :shard_one) do
            people = Person.all
            "some other code"
          end
          people #=> ActiveRecord::Relation is loaded from default shard
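The pitfall in slides 14 to 18 comes from lazy loading: a relation built inside the shard block but first read outside it runs against whichever shard is active at read time. The toy model below reproduces that behavior in plain Ruby with no Rails dependency. `ShardContext` and `LazyQuery` are hypothetical stand-ins invented for this sketch; they are not Cloudinary's `on_shard` or ActiveRecord itself.

```ruby
# Toy model of shard switching plus lazy queries (hypothetical names).
module ShardContext
  DATA = {
    default: [],
    shard_one: ["duck.jpg", "duckling.png"]
  }.freeze

  @current = :default

  class << self
    attr_reader :current

    # Switch the active shard for the duration of the block, then restore it.
    def on_shard(name)
      previous = @current
      @current = name
      yield
    ensure
      @current = previous
    end
  end
end

# Like a lazy relation: building it runs nothing. Rows are read from
# whichever shard is active when #to_a is first called.
class LazyQuery
  def to_a
    @rows ||= ShardContext::DATA.fetch(ShardContext.current).dup
  end
end

# Pitfall: the query leaves the block unloaded, so it runs on :default.
lazy = nil
ShardContext.on_shard(:shard_one) { lazy = LazyQuery.new }
puts "lazy:  #{lazy.to_a.inspect}"

# Fix: force loading while the shard is still selected.
eager = nil
ShardContext.on_shard(:shard_one) { eager = LazyQuery.new.to_a }
puts "eager: #{eager.inspect}"
```

The first call prints an empty result because the read happens after the block has restored the default shard; the second returns the shard's rows because `to_a` ran inside the block.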
  19. REGIONS @ CLOUDINARY
      - 3 regions: US (default), EU, AP
      - Premium customers choose closest to their users
      - Dedicated shards per-region
      - What about the primary DB?
        - Option A: Run 3 completely independent systems
        - Option B: EU and AP will be a little slower
        - Option C: Multi-Primary DB
        - Option D: NrtCache (Near RealTime)
  20. THE BIG FAILURE
      - Following a routine deploy, error rate spiked
      - 15 min “high” error rate
      - The cause: A problematic migration
      - Nothing we could do

          Cloud.find_each do |cloud|
            # update some attributes
            cloud.save!
          end
  21. THE BIG FAILURE
      - Following a routine deploy, error rate spiked
      - 15 min “high” error rate
      - The cause: A problematic migration
      - Nothing we could do
      - The solutions:
        - Review EVERY code change for cloud updates
        - Migrate to new && improved NrtCache
  22. “THERE ARE ALL KINDS OF LOVE IN THIS WORLD, BUT NEVER THE SAME LOVE TWICE.” F. SCOTT FITZGERALD
  23. LOCKING
      - Goals:
        1. Don’t repeat transformations
        2. Never block a job (2x > 0x)
      - Implementation: Best-effort locking system
  24. LOPTR MAY I…
      - Read lock on asset before working on it
        - No writing while the lock is held
      - Write lock on derivation before generating
        - This process can write
        - Exclusive
      - Depends on a well-behaved client
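The read/write rules on this slide can be sketched as an in-memory lock table: read locks are shared, write locks are exclusive, and only the owner releases a write lock. `LockTable`, its method names, and the key format are hypothetical; the slides do not show Loptr's actual interface.

```ruby
# In-memory sketch of shared (read) and exclusive (write) locks.
# LockTable and its method names are hypothetical, not Loptr's API.
class LockTable
  def initialize
    @readers = Hash.new(0) # key => number of read locks held
    @writers = {}          # key => owner holding the write lock
  end

  # Many readers may share a key, as long as nobody is writing it.
  def acquire_read(key)
    return false if @writers.key?(key)
    @readers[key] += 1
    true
  end

  # A write lock is exclusive: no readers and no other writer allowed.
  def acquire_write(key, owner)
    return false if @writers.key?(key) || @readers[key] > 0
    @writers[key] = owner
    true
  end

  def release_read(key)
    @readers[key] -= 1 if @readers[key] > 0
    @readers.delete(key) if @readers[key].zero?
  end

  # Only the owner may release; this relies on a well-behaved client.
  def release_write(key, owner)
    @writers.delete(key) if @writers[key] == owner
  end
end

locks = LockTable.new
p locks.acquire_read("asset:42")                   # => true
p locks.acquire_write("asset:42", "worker-a")      # => false (asset is being read)
p locks.acquire_write("asset:42/w_100", "worker-a") # => true
p locks.acquire_write("asset:42/w_100", "worker-b") # => false (exclusive)
```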
  25. LOPTR CONCERNS
      - Failure to release
        - Timeout
      - Downtime
        - Pretend every lock request succeeded
      - Scaling Loptr
        - Cluster with request targeted by consistent hash
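Two of these ideas lend themselves to a short sketch: routing each lock key to one node of a cluster with a consistent hash, and treating an unreachable lock service as if the lock had been granted. The node names, the replica count, and the `best_effort_lock` helper are assumptions made for illustration; the slides do not describe how Loptr does either.

```ruby
require "digest"

# Consistent-hash ring that maps a lock key to one node of a cluster.
# Node names and the replica count are illustrative assumptions.
class HashRing
  def initialize(nodes, replicas: 100)
    # Place several virtual points per node on the ring to even out load.
    points = nodes.flat_map do |node|
      (0...replicas).map { |i| [point("#{node}:#{i}"), node] }
    end
    @ring = points.sort_by(&:first)
  end

  # Walk clockwise to the first point at or after the key's hash,
  # wrapping around to the start of the ring if there is none.
  def node_for(key)
    h = point(key)
    entry = @ring.bsearch { |pt, _node| pt >= h } || @ring.first
    entry.last
  end

  private

  def point(str)
    Digest::MD5.hexdigest(str)[0, 15].to_i(16)
  end
end

# Best-effort wrapper: the block performs the real lock request. If the
# lock service errors out or times out, behave as if the lock was
# granted so the job is never blocked.
def best_effort_lock(ring, key)
  node = ring.node_for(key)
  yield node
rescue StandardError
  true
end

ring = HashRing.new(%w[loptr-1 loptr-2 loptr-3])
granted = best_effort_lock(ring, "asset:42") { |_node| raise IOError, "node down" }
p granted # => true
```

With a consistent hash, removing a node only remaps the keys that node owned; keys on the surviving nodes keep their assignment.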
  26. LOPTR
      - Pro: Resiliency to traffic surges
      - Con: Unreleased locks can cause timeouts
      - Note: Not 100% reliable (but still net positive)
  27. HOW NOT TO SCALE
      - Limit individual customer impact
      - Would you rather deal with:
        - 1 dissatisfied customer
        - Thousands of dissatisfied customers
      - Rate limits
      - Manage scarcity while eliminating scarcity
      - Fair queueing
      - Background jobs
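As one way to picture the "rate limits" bullet, here is a minimal fixed-window limiter keyed by cloud, so that one busy customer exhausts only its own budget. The class name, the limit, and the window size are assumptions; the slides do not say which algorithm or thresholds Cloudinary uses.

```ruby
# Fixed-window rate limiter keyed by customer ("cloud").
# The class name, limit, and window are illustrative assumptions.
class CloudRateLimiter
  def initialize(limit:, window_seconds:, clock: -> { Time.now.to_f })
    @limit = limit
    @window = window_seconds
    @clock = clock         # injectable so tests can control time
    @counts = Hash.new(0)  # [cloud, window index] => requests seen
  end

  # True if the request fits in the cloud's budget for the current window.
  def allow?(cloud)
    window_index = (@clock.call / @window).floor
    # Drop counters from windows that have already ended.
    @counts.delete_if { |(_, idx), _| idx < window_index }
    bucket = [cloud, window_index]
    return false if @counts[bucket] >= @limit
    @counts[bucket] += 1
    true
  end
end

limiter = CloudRateLimiter.new(limit: 2, window_seconds: 60)
3.times { p limiter.allow?("noisy-cloud") }
p limiter.allow?("quiet-cloud")
```

Provided all four calls land in the same 60-second window, the noisy cloud's third request is refused while the quiet cloud's request still goes through.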
  28. FAIR QUEUE ON CPU
      - Jobs are assigned a number of slots
      - “Queue of queues” mechanism allots slots to clouds
  29-83. FAIR QUEUE ON CPU (animation frames repeating slide 28; from slide 51 the diagram shows the sequence 1 3 2 4 2 3 1 2 3 1, and from slide 76 a trailing 4 is added)
  84. FAIR QUEUE ON CPU LAYER
      - Jobs are assigned a number of slots
      - “Queue of queues” mechanism allots slots to clouds
      - Prefer sync to async requests
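One reading of the "queue of queues" idea is a round-robin over per-cloud job queues, so a cloud that floods the system waits its turn behind everyone else. The sketch below shows only that rotation: slot accounting and the sync-over-async preference are left out, and `FairQueue` with its methods is a hypothetical name rather than Cloudinary's implementation.

```ruby
# Round-robin "queue of queues": an outer queue of clouds, each with its
# own inner queue of jobs. Class and method names are hypothetical.
class FairQueue
  def initialize
    @jobs = Hash.new { |hash, cloud| hash[cloud] = [] } # cloud => pending jobs
    @rotation = []                                      # clouds awaiting a turn
  end

  def push(cloud, job)
    # A cloud joins the rotation only when it goes from idle to busy.
    @rotation << cloud if @jobs[cloud].empty?
    @jobs[cloud] << job
  end

  # Take one job from the cloud at the front, then send that cloud to the
  # back of the rotation if it still has work waiting.
  def pop
    cloud = @rotation.shift
    return nil if cloud.nil?
    job = @jobs[cloud].shift
    if @jobs[cloud].empty?
      @jobs.delete(cloud)
    else
      @rotation << cloud
    end
    [cloud, job]
  end
end

queue = FairQueue.new
5.times { |i| queue.push(1, "big-#{i}") } # cloud 1 floods the queue
queue.push(2, "small-0")
queue.push(3, "small-1")

order = []
while (entry = queue.pop)
  order << entry.first
end
p order # => [1, 2, 3, 1, 1, 1, 1]
```

Clouds 2 and 3 are served after a single job from cloud 1, rather than waiting behind all five.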
  85. BACKGROUND JOBS
      - Anything that can be done out-of-band, should be
      - Examples:
        - CDN invalidations
        - Webhooks
        - Eager transformations
  86. “THE GOOD NEWS ABOUT COMPUTERS IS THAT THEY DO WHAT YOU TELL THEM TO DO. THE BAD NEWS IS THAT THEY DO WHAT YOU TELL THEM TO DO.” TED NELSON
  87. SCALING VIA HUMANS
      - Education
        - Encourage practices like eager generation
      - Relationships
        - Understand customer use cases
        - They inform us about changes in use patterns
      - Look for win-wins!
  88. IT WAS NEVER ABOUT RAILS
      - Most of our traffic never touches Rails
      - The fastest parts of the system aren’t in Ruby
      - The computation-heavy parts are low-level utils or APIs
      - The database scaling is language-independent
  89. RAILS IS GREAT!
      - 2 developers had 4 incredibly productive years
      - Ruby is actually " for creating interfaces
      - Just needed to move a few things out of Ruby
  90. RAILS IS CHALLENGING
      - Upgrading Rails is hard
      - Upgrading monkeypatched Rails is harder
      - Specific to Israel:
        - Difficult to recruit Ruby devs
        - Difficult to recruit devs who want to learn Ruby
  91. THE FUTURE OF RAILS @ CLOUDINARY
      - Here to stay, but…
      - Polyglot microservices
      - Build the app our next employee wants to work on
  92. THANKS!
      Ariel Caplan • @amcaplan • amcaplan.ninja
      Special thanks to advisors/reviewers: Itai Benari, Max Rozenoer, Vladimir Shteinman