Running Jobs at Scale
Kir Shatrov
GORUCO 2018, @kirshatrov
Jobs
class ExampleJob < ActiveJob::Base
  def perform
    ...
  end
end
class ExampleJob < ActiveJob::Base
  def perform
    Product.all.find_each do |product|
      product.sync_and_refresh
    end
  end
end
minutes? hours? days?
Long-running jobs
— Deploys and termination
— Abort and re-enqueue
— Progress lost
— Job may never complete
— Capacity and worker starvation
— Cloud ☁
Why is it taking so long?
Because it iterates over a long collection.
What if jobs were interruptible and resumable?
Split the job definition
1. Collection to process ≫ Product.all
2. Work to be done ≫ product.sync_and_refresh
class ExampleJob < ActiveJob::Base
  include Iteration

  def collection
    Product.all
  end

  def each_iteration(product)
    product.sync_and_refresh
  end
end
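How such a split could make a job interruptible and resumable can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the Iteration module from the talk: the module name InterruptibleIteration, the run method, the cursor and interrupted parameters, and the SyncJob class are all hypothetical. The idea is that the runner walks the collection, remembers the position of the last finished item, and hands that position back when it is asked to stop.

```ruby
# Hypothetical sketch of cursor-based iteration. Not the talk's Iteration module.
module InterruptibleIteration
  # Walks `collection`, skipping everything up to and including `cursor`.
  # Returns nil when the collection is exhausted, or the index of the last
  # processed item when `interrupted` reports that the worker must stop.
  def run(cursor: nil, interrupted: -> { false })
    collection.each_with_index do |item, index|
      next if cursor && index <= cursor # work finished in an earlier run
      each_iteration(item)
      return index if interrupted.call  # caller would re-enqueue with this cursor
    end
    nil
  end
end

# Illustrative job: it only records what it has processed.
class SyncJob
  include InterruptibleIteration

  attr_reader :processed

  def initialize(items)
    @items = items
    @processed = []
  end

  def collection
    @items
  end

  def each_iteration(item)
    @processed << item
  end
end

job = SyncJob.new(%w[a b c d e])

# Simulate a deploy that interrupts the worker after the second item.
checks = 0
cursor = job.run(interrupted: -> { (checks += 1) == 2 })
cursor         # => 1
job.processed  # => ["a", "b"]

# A later run resumes after the saved cursor and finishes the collection.
job.run(cursor: cursor) # => nil
job.processed           # => ["a", "b", "c", "d", "e"]
```

In a real job system the cursor would presumably be stored with the re-enqueued job rather than kept in a local variable, and for a database-backed collection it would more likely be a primary key than an array index.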
class WhateverJob < ActiveJob::Base
  include Iteration

  def collection
    Enumerator.new do |enum|
      3.times do |n|
        enum << n
      end
    end
  end

  def each_iteration(n)
    # do something three times!
  end
end
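Because the collection here is an ordinary Ruby Enumerator, resuming it can be approximated with standard Enumerator methods. The sketch below is an assumption about one possible approach, not what the talk's Iteration module does; the helper name resumable and its cursor parameter are hypothetical.

```ruby
# Hypothetical helper: pairs every element with its position and lazily
# skips all positions up to and including `cursor`.
def resumable(enum, cursor: nil)
  start = cursor.nil? ? 0 : cursor + 1
  enum.each_with_index.lazy.drop(start)
end

numbers = Enumerator.new do |yielder|
  3.times { |n| yielder << n }
end

resumable(numbers).to_a            # => [[0, 0], [1, 1], [2, 2]]
resumable(numbers, cursor: 0).to_a # => [[1, 1], [2, 2]]
```

Note that drop still asks the enumerator to produce the skipped elements again; only the per-element work is avoided. For an expensive source, the collection itself would need to accept the cursor and start from there.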
Endless possibilities
— Interrupt and resume at any moment
— Progress tracking
— Parallel computations
— Throttling by default
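Two of these possibilities, progress tracking and throttling, follow from having a hook between iterations. The sketch below is one hypothetical way to picture it; the TrackedRun class, the overloaded health check, and the pause callback are illustrative names and not an API from the talk.

```ruby
# Hypothetical runner that counts finished items and backs off while an
# injected health check reports that a dependency is overloaded.
class TrackedRun
  attr_reader :done, :pauses

  def initialize(items, overloaded: -> { false }, pause: -> { sleep(0.1) })
    @items = items
    @overloaded = overloaded
    @pause = pause
    @done = 0
    @pauses = 0
  end

  # Fraction of the collection processed so far, between 0.0 and 1.0.
  def progress
    return 1.0 if @items.empty?
    @done.fdiv(@items.size)
  end

  def run
    @items.each do |item|
      # Throttling: wait before the next item while the check reports overload.
      while @overloaded.call
        @pauses += 1
        @pause.call
      end
      yield item
      @done += 1
    end
  end
end

readings = [true, true, false] # overloaded twice, then healthy
seen = []
tracked = TrackedRun.new(
  [1, 2, 3, 4],
  overloaded: -> { readings.shift || false },
  pause: -> {} # no real sleeping in this demo
)
tracked.run { |n| seen << n }

tracked.pauses   # => 2
tracked.progress # => 1.0
```

Since the check runs between items rather than inside them, the job author's each_iteration code does not need to know about throttling at all.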
Benefits for the infrastructure
— Keep supporting long-running jobs
— Success on a Cloud runtime
— Make scale invisible to developers
— Opportunities to save money with short-lived Cloud instances