-Upload to Cloudinary
-Stored in S3, GCS
-Apply transformations
-Deliver via CDN
1 IMAGE’S JOURNEY
Slide 27
Slide 27 text
-Media Library (GUI for media cataloguing and editing)
-Developer APIs
-Bulk operations
-All this ⬆ is 0.3% of total requests handled
-99.7% is delivery: transforming/serving media
-P.S. We also do video!
SOME OTHER STUFF…
Slide 28
Slide 28 text
ABOUT YOUR PRESENTER
-Hi, I’m Ariel Caplan!
-Working in Rails since 2015
-Cloudinary since 2018
-@amcaplan
-I ♥ conferences!
Slide 29
Slide 29 text
HOW
CLOUDINARY
SCALES
Slide 30
Slide 30 text
HOW CLOUDINARY SCALES
-Layers
-Sharding
-Location
-Deduplicating Work
-Not Scaling
-Human Factors
Slide 31
Slide 31 text
LAYERS
Slide 32
Slide 32 text
“LAYERS. ONIONS HAVE LAYERS.
OGRES HAVE LAYERS. ONIONS
HAVE LAYERS. YOU GET IT? WE
BOTH HAVE LAYERS.”
SHREK
Slide 33
Slide 33 text
REQUEST LIFECYCLE
CDN
S3get
IO
CPU
Slide 34
Slide 34 text
REQUEST LIFECYCLE
CDN: 15B/day
S3get: 1B/day
IO: 150M/day
CPU: 125M/day
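The per-layer volumes above imply how much traffic each layer absorbs before passing requests inward. A quick back-of-the-envelope calculation, assuming each figure is the number of requests that reach that layer (the constant and method names are illustrative, not from the talk):

```ruby
# Requests per day reaching each layer, taken from the slide.
REQUESTS_PER_DAY = {
  "CDN"   => 15_000_000_000,
  "S3get" => 1_000_000_000,
  "IO"    => 150_000_000,
  "CPU"   => 125_000_000
}.freeze

# For each layer, compute the share of its incoming requests that it handles
# itself, i.e. that never reach the next layer down.
def absorbed_shares(volumes)
  volumes.each_cons(2).map do |(layer, incoming), (_next_layer, passed_on)|
    [layer, (1 - passed_on.to_f / incoming).round(3)]
  end.to_h
end

absorbed_shares(REQUESTS_PER_DAY).each do |layer, share|
  puts format("%-6s absorbs %.1f%% of what it receives", layer, share * 100)
end
```

Under that assumption the CDN absorbs roughly 93% of requests, S3get 85%, and IO about 17%, which is in line with the "~95%" and "85%" figures quoted on the CDN and S3get slides.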
Slide 36
Slide 36 text
-We clearly need one (the question is build vs. buy)
-Pro: ~95% of traffic isn’t our problem!
-Pro: Leverage best-in-class service/features
-Pro: Multi-CDN for reliability
-Con: Need to play by their rules
-We write lots of custom rules (f_auto)
-Every provider has their own invalidation system
-Need to parse their logfiles to bill our customers
CDN
Slide 37
Slide 37 text
-High-throughput, simple service written in Go
-Handles 85% of requests it receives
-Uses ~10% of the computing resources that the IO layer does
-Pro: It’s super fast
-Con: Need to duplicate some Ruby logic in Go
S3GET
Slide 38
Slide 38 text
-Layers can scale independently
-Horizontal slices also exist! (image vs. video)
-Security (CPU can’t access internet or DB)
OTHER LAYER ADVANTAGES
Slide 39
Slide 39 text
SHARDING
Slide 40
Slide 40 text
“SHARDS OF GLASS
CAN CUT AND WOUND
OR MAGNIFY A VISION”
TERRY TEMPEST WILLIAMS
Slide 41
Slide 41 text
-Vertical partitioning ≠ sharding
-Each database contains different tables
-Increases simultaneous writes/reads
-Supported in Rails 6.0
WHAT IS SHARDING?
Slide 42
Slide 42 text
-Horizontal partitioning = sharding
-1 table, multiple databases
-Keeps table size under control
-Supported in Rails 6.1
WHAT IS SHARDING?
Slide 43
Slide 43 text
-1 main shard for app-wide data
-Every cloud “lives” 100% on 1 of several shards
SHARDING @ CLOUDINARY
1 2 3 4
Slide 44
Slide 44 text
-1 main shard for app-wide data
-Every cloud “lives” 100% on 1 of several shards
-Code includes thousands of shard references
-Test environments must be sharded too!
SHARDING @ CLOUDINARY
cloud = find_cloud_from_request_params
cloud.on_shard do
  # do the work
end
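Cloudinary's `on_shard` is internal and its implementation isn't shown in the slides. As a rough idea of how a block-scoped shard switch can work, here is a plain-Ruby sketch. `ShardContext`, the `Cloud` struct, and the shard names are all hypothetical stand-ins, and a real version would switch database connections rather than a thread-local symbol.

```ruby
# Hypothetical stand-in for "which shard are queries routed to right now?"
module ShardContext
  DEFAULT_SHARD = :main

  def self.current
    Thread.current[:current_shard] || DEFAULT_SHARD
  end

  # Route everything inside the block to `shard`, then restore the previous
  # shard even if the block raises.
  def self.with(shard)
    previous = Thread.current[:current_shard]
    Thread.current[:current_shard] = shard
    yield
  ensure
    Thread.current[:current_shard] = previous
  end
end

# Each cloud lives entirely on one shard.
Cloud = Struct.new(:name, :shard) do
  def on_shard(&block)
    ShardContext.with(shard, &block)
  end
end

cloud = Cloud.new("demo", :shard_3)
inside = cloud.on_shard { ShardContext.current }
puts inside               # shard_3
puts ShardContext.current # main
```

Restoring the previous shard in `ensure` matters: with thousands of shard references in the code, a block that raises must not leave later queries pointed at the wrong shard.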
Slide 45
Slide 45 text
-Pro: Fast is good
-Pro: Flexibility
-Con: Error-prone
cloud = find_cloud_from_request_params
assets = []
cloud.on_shard do
  assets = cloud.assets.where(tag: "duck")
end
render json: assets
SHARDING @ CLOUDINARY
Slide 46
Slide 46 text
-Pro: Fast is good
-Pro: Flexibility
-Con: Error-prone
cloud = find_cloud_from_request_params
assets = []
cloud.on_shard do
  assets = cloud.assets.where(tag: "duck")
end
render json: assets
SHARDING @ CLOUDINARY
Problem: assets is a lazy ActiveRecord::Relation, so the query runs at render time, outside the shard block!
Slide 47
Slide 47 text
-Pro: Fast is good
-Pro: Flexibility
-Con: Error-prone
cloud = find_cloud_from_request_params
assets = []
cloud.on_shard do
  assets = cloud.assets.where(tag: "duck").to_a
end
render json: assets
SHARDING @ CLOUDINARY
Fix: load eagerly (.to_a) inside the shard block
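The bug on these slides comes from laziness: `where` builds a query but doesn't run it until something reads the results. Below is a plain-Ruby sketch of that effect, with no Rails involved. `LazyQuery`, `with_shard`, and the shard names are hypothetical, and the "query" just reports which shard was active when it finally ran.

```ruby
# Hypothetical shard switch: tracks the active shard in a thread-local.
def with_shard(shard)
  previous = Thread.current[:shard]
  Thread.current[:shard] = shard
  yield
ensure
  Thread.current[:shard] = previous
end

def current_shard
  Thread.current[:shard] || :main
end

# Mimics a lazy relation: nothing runs until #to_a is called.
class LazyQuery
  def to_a
    # Stand-in for hitting the database on whichever shard is active *now*.
    @rows ||= ["rows from #{current_shard}"]
  end
end

lazy = nil
with_shard(:shard_2) { lazy = LazyQuery.new }       # built, not executed
eager = with_shard(:shard_2) { LazyQuery.new.to_a } # executed inside the block

puts lazy.to_a.inspect # ["rows from main"]  <- wrong shard
puts eager.inspect     # ["rows from shard_2"]
```

The lazy object silently reads from the default shard instead of failing, which is what makes this class of mistake easy to miss.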
Slide 48
Slide 48 text
-Pro: Fast is good
-Pro: Flexibility
-Con: Error-prone
SHARDING @ CLOUDINARY
people = ActiveRecord::Base.connected_to(role: :reading, shard: :shard_one) do
  Person.all
end
people #=> preloaded ActiveRecord::Relation from shard
Slide 49
Slide 49 text
-Pro: Fast is good
-Pro: Flexibility
-Con: Error-prone
SHARDING @ CLOUDINARY
people = nil
ActiveRecord::Base.connected_to(role: :reading, shard: :shard_one) do
  people = Person.all
  "some other code"
end
people #=> ActiveRecord::Relation is loaded from default shard
Slide 50
Slide 50 text
LOCATION
Slide 51
Slide 51 text
“LOCATION,
LOCATION,
LOCATION”
EVERY REAL ESTATE AGENT EVER
Slide 52
Slide 52 text
-3 regions: US (default), EU, AP
-Premium customers choose the region closest to their users
-Dedicated shards per region
-What about the primary DB?
-Option A: Run 3 completely independent systems
-Option B: EU and AP will be a little slower
-Option C: Multi-primary DB
-Option D: NrtCache (Near RealTime)
REGIONS @ CLOUDINARY
Slide 53
Slide 53 text
NRTCACHE
US
Slide 54
Slide 54 text
NRTCACHE
US EU
Slide 55
Slide 55 text
NRTCACHE
US EU
Syncer
Slide 56
Slide 56 text
“WHAT COULD
GO WRONG?”
FAMOUS LAST WORDS
Slide 58
Slide 58 text
-Following a routine deploy, error rate spiked
-15 min “high” error rate
-The cause: A problematic migration
-Nothing we could do
THE BIG FAILURE
Cloud.find_each do |cloud|
  # update some attributes
  cloud.save!
end
Slide 59
Slide 59 text
-The solutions:
-Review EVERY code change for cloud updates
-Migrate to new && improved NrtCache
THE BIG FAILURE
Slide 60
Slide 60 text
-Pro: Multi-region is mostly great for customers/users
-Con: Hard to do right
LOCATION
Slide 61
Slide 61 text
DEDUPLICATING
WORK
Slide 62
Slide 62 text
“THERE ARE ALL KINDS OF LOVE
IN THIS WORLD, BUT NEVER
THE SAME LOVE TWICE.”
F. SCOTT FITZGERALD
Slide 63
Slide 63 text
THE PROBLEM
Slide 66
Slide 66 text
-Goals:
1. Don’t repeat transformations
2. Never block a job (doing the work twice beats never doing it: 2x > 0x)
-Implementation: Best-effort locking system
LOCKING
Slide 67
Slide 67 text
LOPTR
Slide 68
Slide 68 text
LOPTR
CDN
S3get
IO
CPU
Slide 69
Slide 69 text
LOPTR
CDN
S3get
IO
CPU
Loptr
Slide 70
Slide 70 text
-Read lock on asset before working on it
-No writing while the lock is held
-Write lock on derivation before generating
-This process can write
-Exclusive
-Depends on a well-behaved client
LOPTR MAY I…
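The read/write rules above can be illustrated with a tiny in-memory lock table. This is only a single-process sketch of the semantics, not Loptr itself (which the slides describe as a separate Scala service). The class name, method names, and key format are assumptions made for the example.

```ruby
# A tiny in-memory read/write lock table (single-process, illustrative only).
class LockTable
  def initialize
    @mutex   = Mutex.new
    @readers = Hash.new { |hash, key| hash[key] = [] } # key => reader owners
    @writers = {}                                       # key => writer owner
  end

  # Shared lock: granted unless someone holds the write lock on `key`.
  def acquire_read(key, owner)
    @mutex.synchronize do
      next false if @writers.key?(key)
      @readers[key] << owner
      true
    end
  end

  # Exclusive lock: granted only if nobody is reading or writing `key`.
  def acquire_write(key, owner)
    @mutex.synchronize do
      next false if @writers.key?(key) || @readers.fetch(key, []).any?
      @writers[key] = owner
      true
    end
  end

  # Relies on a well-behaved client: only the holder should call this.
  def release(key, owner)
    @mutex.synchronize do
      @writers.delete(key) if @writers[key] == owner
      if @readers.key?(key)
        @readers[key].delete(owner)
        @readers.delete(key) if @readers[key].empty?
      end
    end
    nil
  end
end

locks = LockTable.new
locks.acquire_read("asset:duck.jpg", :worker_a)          # => true
locks.acquire_write("derived:duck.jpg/w_100", :worker_a) # => true
locks.acquire_write("derived:duck.jpg/w_100", :worker_b) # => false (already being generated)
locks.acquire_write("asset:duck.jpg", :uploader)         # => false (asset is being read)
```

A worker that gets `false` for the derivation lock knows someone else is already generating it, which is how repeated transformations are avoided.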
Slide 71
Slide 71 text
-In-memory lock table for speed
-Written in Scala for high concurrency
LOPTR IMPLEMENTATION
Slide 72
Slide 72 text
-Failure to release: timeout
-Downtime: pretend every lock request succeeded
-Scaling Loptr: cluster with requests targeted by consistent hash
LOPTR CONCERNS
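Two of these ideas are easy to show in miniature: routing each lock key to a node by consistent hash, and failing open when the lock service is unreachable. The sketch below is an assumption-laden illustration, not Loptr's code. `HashRing`, `try_lock`, the node names, and the callable "clients" are all hypothetical.

```ruby
require "digest"

# Consistent-hash ring: each node gets several points on a circle, and a key
# is served by the first node point at or after the key's own hash.
class HashRing
  def initialize(nodes, replicas: 100)
    @points = nodes.flat_map do |node|
      (0...replicas).map { |i| [hash_of("#{node}:#{i}"), node] }
    end.sort_by(&:first)
  end

  def node_for(key)
    target = hash_of(key)
    # Wrap around to the first point when the key hashes past the last one.
    point = @points.bsearch { |hash, _node| hash >= target } || @points.first
    point.last
  end

  private

  def hash_of(value)
    Digest::MD5.hexdigest(value)[0, 8].to_i(16)
  end
end

# Fail open: if the lock service can't be reached, behave as if the lock was
# granted, because doing the work twice beats not doing it at all.
# `clients` maps a node name to any callable that takes a key.
def try_lock(key, ring, clients)
  clients.fetch(ring.node_for(key)).call(key)
rescue StandardError
  true
end

ring = HashRing.new(%w[loptr-1 loptr-2 loptr-3])
puts ring.node_for("derived:duck.jpg/w_100")
```

With a ring like this, removing one node only remaps the keys that node was serving. The rest keep their existing lock owner.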
Slide 73
Slide 73 text
-Pro: Resiliency to traffic surges
-Con: Unreleased locks can cause timeouts
-Note: Not 100% reliable (but still net positive)
LOPTR
Slide 74
Slide 74 text
NOT
SCALING
Slide 75
Slide 75 text
“WHAT YOU DON’T DO
DETERMINES WHAT YOU
CAN DO.”
TIM FERRISS
Slide 76
Slide 76 text
-Limit individual customer impact
-Would you rather deal with:
-1 dissatisfied customer
-Thousands of dissatisfied customers
-Rate limits
-Manage scarcity while eliminating scarcity
-Fair queueing
-Background jobs
HOW NOT TO SCALE
Slide 77
Slide 77 text
-Heavy API calls have a strict limit
-Locking effectively throttles non-rate-limited calls
RATE LIMITS
Slide 78
Slide 78 text
-Jobs are assigned a number of slots
-“Queue of queues” mechanism allots slots to clouds
FAIR QUEUE ON CPU
Slide 101
Slide 101 text
-Jobs are assigned a number of slots
-“Queue of queues” mechanism allots slots to clouds
FAIR QUEUE ON CPU
Queue diagram: 1 3 2 4 2 3 1 2 3 1
Slide 126
Slide 126 text
-Jobs are assigned a number of slots
-“Queue of queues” mechanism allots slots to clouds
FAIR QUEUE ON CPU
Queue diagram: 1 3 2 4 2 3 1 2 3 1 4
Slide 134
Slide 134 text
FAIR QUEUE ON CPU
CDN
S3get
IO
CPU
Loptr
Slide 135
Slide 135 text
FAIR QUEUE ON CPU
CDN
S3get
IO
CPU
Loptr
Fair Queue
Slide 136
Slide 136 text
-Jobs are assigned a number of slots
-“Queue of queues” mechanism allots slots to clouds
-Prefer sync to async requests
FAIR QUEUE ON CPU LAYER
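The slides don't show how the "queue of queues" is built. One common way to get this kind of fairness is a FIFO per cloud plus a rotation of the clouds that have work waiting, as in the sketch below. `FairQueue` and the cloud names are hypothetical, and for simplicity every job costs exactly one slot here, whereas the talk says jobs are assigned a number of slots.

```ruby
# Round-robin "queue of queues": one FIFO per cloud plus a rotation of the
# clouds that currently have work waiting.
class FairQueue
  def initialize
    @jobs_by_cloud = Hash.new { |hash, cloud| hash[cloud] = [] }
    @rotation = [] # clouds with pending jobs, in the order they get a turn
  end

  def push(cloud, job)
    @rotation << cloud if @jobs_by_cloud[cloud].empty?
    @jobs_by_cloud[cloud] << job
    self
  end

  # Returns [cloud, job] for the next job to run, or nil when idle.
  def pop
    cloud = @rotation.shift
    return nil if cloud.nil?

    job = @jobs_by_cloud[cloud].shift
    if @jobs_by_cloud[cloud].empty?
      @jobs_by_cloud.delete(cloud)
    else
      @rotation << cloud # still has work: back of the line
    end
    [cloud, job]
  end
end

queue = FairQueue.new
5.times { |i| queue.push(:busy_cloud, "b#{i}") }
queue.push(:small_cloud, "s0")

order = []
while (entry = queue.pop)
  order << entry.last
end
p order # ["b0", "s0", "b1", "b2", "b3", "b4"]
```

The small cloud's single job runs second instead of waiting behind all five jobs from the busy cloud, which is the point of limiting individual customer impact.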
Slide 137
Slide 137 text
-“Queue of queues” throttles per-cloud concurrency
-One big monkeypatch
-ActiveRecord::ConnectionAdapters::ConnectionPool
FAIR QUEUE FOR DB ACCESS
Slide 138
Slide 138 text
-Anything that can be done out-of-band, should be
-Examples:
-CDN invalidations
-Webhooks
-Eager transformations
BACKGROUND JOBS
Slide 139
Slide 139 text
HUMAN
FACTORS
Slide 140
Slide 140 text
“THE GOOD NEWS ABOUT
COMPUTERS IS THAT THEY DO
WHAT YOU TELL THEM TO DO.
THE BAD NEWS IS THAT THEY DO
WHAT YOU TELL THEM TO DO.”
TED NELSON
Slide 141
Slide 141 text
-Education
-Encourage practices like eager generation
-Relationships
-Understand customer use cases
-They inform us about changes in use patterns
-Look for win-wins!
SCALING VIA HUMANS
Slide 142
Slide 142 text
CLOUDINARY
ON
RAILS
Slide 143
Slide 143 text
-Most of our traffic never touches Rails
-The fastest parts of the system aren’t in Ruby
-The computation-heavy parts are low-level utils or APIs
-The database scaling is language-independent
IT WAS NEVER ABOUT RAILS
Slide 144
Slide 144 text
-2 developers had 4 incredibly productive years
-Ruby is actually great for creating interfaces
-Just needed to move a few things out of Ruby
RAILS IS GREAT!
Slide 145
Slide 145 text
-Upgrading Rails is hard
-Upgrading monkeypatched Rails is harder
-Specific to Israel:
-Difficult to recruit Ruby devs
-Difficult to recruit devs who want to learn Ruby
RAILS IS CHALLENGING
Slide 146
Slide 146 text
-Here to stay, but…
-Polyglot microservices
-Build the app our next employee wants to work on
THE FUTURE OF RAILS @ CLOUDINARY
Slide 147
Slide 147 text
THANKS!
Ariel Caplan • @amcaplan • amcaplan.ninja
Special thanks to advisors/reviewers:
Itai Benari
Max Rozenoer
Vladimir Shteinman