Slide 1

Slide 1 text

Running Jobs at Scale
Kir Shatrov
GORUCO 2018, @kirshatrov

Slide 2

Slide 2 text


Slide 3

Slide 3 text


Slide 4

Slide 4 text


Slide 5

Slide 5 text


Slide 6

Slide 6 text

Jobs

Slide 7

Slide 7 text

class ExampleJob < ActiveJob::Base
  def perform
    ...
  end
end

Slide 8

Slide 8 text

class ExampleJob < ActiveJob::Base
  def perform
    Product.all.find_each do |product|
      product.sync_and_refresh
    end
  end
end

Slide 9

Slide 9 text

class ExampleJob < ActiveJob::Base
  def perform
    Product.all.find_each do |product|
      product.sync_and_refresh
    end
  end
end

minutes? hours? days?

Slide 10

Slide 10 text

Long-running jobs

Slide 11

Slide 11 text

Long-running jobs
- Deploys and termination

Slide 12

Slide 12 text

Long-running jobs
- Deploys and termination
- Abort and re-enqueue
- Progress lost

Slide 13

Slide 13 text


Slide 14

Slide 14 text

Long-running jobs
- Deploys and termination
- Abort and re-enqueue
- Progress lost
- Job may never complete
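
To see why such a job may never complete, here is a minimal simulation. It assumes a worker that is terminated (for example by a deploy) after a fixed number of items, and a job that restarts from the first item on every attempt. The names `run_from_scratch`, `ITEMS`, and `items_before_shutdown` are illustrative and not from the talk.

```ruby
# Hypothetical simulation: a job that always restarts from the beginning
# never finishes if the worker is shut down before the collection ends.
ITEMS = (1..10).to_a

# Processes items from the start; returns true only if the whole
# collection was processed before the simulated shutdown.
def run_from_scratch(items, items_before_shutdown)
  processed = 0
  items.each do |_item|
    return false if processed >= items_before_shutdown # deploy kills the worker
    processed += 1
  end
  true
end

# Re-enqueue the job five times; each attempt loses all earlier progress.
attempts = Array.new(5) { run_from_scratch(ITEMS, 4) }
puts attempts.inspect
```

Every attempt reports `false`: as long as the time between shutdowns is shorter than the time needed for the whole collection, retrying does not help.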

Slide 15

Slide 15 text

Long-running jobs
- Deploys and termination
- Abort and re-enqueue
- Progress lost
- Job may never complete
- Capacity and worker starvation

Slide 16

Slide 16 text

Long-running jobs
- Deploys and termination
- Abort and re-enqueue
- Progress lost
- Job may never complete
- Capacity and worker starvation
- Cloud ☁

Slide 17

Slide 17 text

Why does it take so long? Because it iterates over a large collection.

Slide 18

Slide 18 text

What if jobs were interruptible and resumable?
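
A minimal sketch of what "interruptible and resumable" could mean in practice, assuming the job keeps a cursor (here, the index of the next item) that survives a re-enqueue. `run_resumable` and the shutdown budget of four items per run are hypothetical, chosen only for illustration.

```ruby
# Hypothetical sketch: the job remembers a cursor (index of the next item),
# so each run continues where the previous one stopped.
def run_resumable(items, cursor, items_before_shutdown)
  processed = 0
  while cursor < items.size
    # Interrupted: hand the cursor back so the job can be re-enqueued with it.
    return [:interrupted, cursor] if processed >= items_before_shutdown
    # ... work on items[cursor] would happen here ...
    cursor += 1
    processed += 1
  end
  [:done, cursor]
end

items = (1..10).to_a
cursor = 0
runs = 0
status = :interrupted
while status == :interrupted
  status, cursor = run_resumable(items, cursor, 4)
  runs += 1
end
puts "finished after #{runs} runs, cursor=#{cursor}"
```

With ten items and a worker that survives four items per run, the job finishes on the third run instead of never finishing.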

Slide 19

Slide 19 text

Split the job definition
1. Collection to process
2. Work to be done

Slide 20

Slide 20 text

Split the job definition
1. Collection to process ≫ Product.all
2. Work to be done

Slide 21

Slide 21 text

Split the job definition
1. Collection to process ≫ Product.all
2. Work to be done ≫ product.sync_and_refresh

Slide 22

Slide 22 text

class ExampleJob < ActiveJob::Base
  include Iteration

  def collection
    Product.all
  end

  def each_iteration(product)
    product.sync_and_refresh
  end
end
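
The slide does not show what `Iteration` does internally. As a rough illustration only, a module of this kind could drive `collection` and `each_iteration` as sketched below. `SketchIteration`, `ExampleSketchJob`, the `shutdown:` callable, and the in-memory array standing in for `Product.all` are all assumptions, and the sketch runs in plain Ruby without ActiveJob.

```ruby
# Hypothetical stand-in for the Iteration module on the slide; not the real one.
module SketchIteration
  # Runs each_iteration over collection starting at cursor.
  # Stops early when the shutdown check returns true and reports
  # the cursor to resume from; returns nil when finished.
  def perform(cursor = 0, shutdown: -> { false })
    collection.each_with_index do |item, index|
      next if index < cursor          # skip work already done
      return index if shutdown.call   # interrupted: resume from this index
      each_iteration(item)
    end
    nil
  end
end

class ExampleSketchJob
  include SketchIteration

  attr_reader :synced

  def initialize
    @synced = []
  end

  def collection
    %w[apple banana cherry] # stands in for Product.all
  end

  def each_iteration(product)
    @synced << product # stands in for product.sync_and_refresh
  end
end

job = ExampleSketchJob.new
budget = 2
resume_at = job.perform(0, shutdown: -> { (budget -= 1) < 0 })
job.perform(resume_at) # second run picks up where the first stopped
p job.synced
```

The point of the split is that the job author only describes the collection and the per-item work, while the loop, the interruption check, and the cursor belong to the shared module.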

Slide 23

Slide 23 text

- def perform
- collection
- each_iteration

Slide 24

Slide 24 text

Product.all cursor: 1

Slide 25

Slide 25 text

Product.all cursor: 2

Slide 26

Slide 26 text

Product.all cursor: 3

Slide 27

Slide 27 text

Product.all cursor: 4

Slide 28

Slide 28 text

Product.all cursor: 5

Slide 29

Slide 29 text

Product.all cursor: 450123
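
One way to read the cursor on these slides is as the id of the last processed record. The sketch below assumes that reading, with records ordered by an integer id and an in-memory array in place of a database table. `next_batch` is a hypothetical helper standing in for a query along the lines of "id greater than cursor, ordered by id, limited to n rows".

```ruby
# In-memory stand-in for a products table; ids need not be contiguous.
PRODUCTS = [1, 2, 3, 5, 8, 13].map { |id| { id: id } }

# Hypothetical helper: the next batch of records after the cursor, ordered by id.
def next_batch(records, cursor, limit)
  records.select { |r| r[:id] > cursor }.sort_by { |r| r[:id] }.first(limit)
end

cursor = 0
seen = []
loop do
  batch = next_batch(PRODUCTS, cursor, 2)
  break if batch.empty?
  batch.each do |record|
    seen << record[:id]
    cursor = record[:id] # persisting this value is what makes the job resumable
  end
end
puts "processed #{seen.inspect}, final cursor #{cursor}"
```

Because each batch is fetched relative to the cursor, a restarted job needs only that one value to continue, no matter how many records came before it.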

Slide 30

Slide 30 text

class WhateverJob < ActiveJob::Base
  include Iteration

  def collection
    Enumerator.new do |enum|
      3.times do |n|
        enum << n
      end
    end
  end

  def each_iteration(n)
    # do something three times!
  end
end
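
For an arbitrary `Enumerator` there is no record id to use as a cursor. A minimal sketch of one option, assuming the cursor is simply the count of elements already processed; `resume_from` is a hypothetical helper, and it skips eagerly, which is fine for a sketch but would replay the skipped elements on a large enumerator.

```ruby
numbers = Enumerator.new do |enum|
  3.times { |n| enum << n }
end

# Hypothetical helper: skip the first `cursor` elements, then pair each
# remaining element with the cursor value to store after processing it.
def resume_from(enumerator, cursor)
  enumerator.each_with_index.drop(cursor).map { |item, index| [item, index + 1] }
end

first_run = resume_from(numbers, 0).first(2)  # interrupted after two elements
second_run = resume_from(numbers, 2).to_a     # resumed with the stored cursor
p first_run
p second_run
```

The first run handles elements 0 and 1 and leaves the cursor at 2; the second run handles only element 2.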

Slide 31

Slide 31 text

Endless possibilities
- Interrupt and resume at any moment
- Progress tracking
- Parallel computations
- Throttling by default
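
Progress tracking and throttling follow from the same idea: control returns to shared code between iterations. A minimal sketch under that assumption; `run_with_hooks`, the `throttle:` and `on_progress:` callables, and the load samples are hypothetical and not taken from the talk.

```ruby
# Hypothetical sketch: because control returns to the framework between
# iterations, it can report progress and pause when a throttle says so.
def run_with_hooks(items, throttle:, on_progress:)
  items.each_with_index do |item, index|
    # Throttled: stop and report where to resume later.
    return index if throttle.call
    yield item
    on_progress.call(index + 1, items.size)
  end
  nil
end

load_samples = [0.2, 0.3, 0.9] # pretend database load, one sample per check
progress = []
resume_at = run_with_hooks(
  %w[a b c d],
  throttle: -> { load_samples.shift > 0.8 },
  on_progress: ->(done, total) { progress << "#{done}/#{total}" }
) { |item| item.upcase }

p progress
p resume_at
```

Here the job reports progress after each of the first two items, then backs off at the third check because the simulated load exceeds the threshold, returning the index to resume from.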

Slide 32

Slide 32 text

Benefits for the infrastructure
- Keep supporting long-running jobs
- Success on a cloud runtime
- Make scale invisible to developers
- Opportunities to save money with short-lived instances in the cloud

Slide 33

Slide 33 text

Thank you! @kirshatrov