Running Jobs at Scale

My talk from GORUCO 2018 in New York.

Kir Shatrov

June 16, 2018

Transcript

  1. Running Jobs at Scale
    Kir Shatrov
    GORUCO 2018, @kirshatrov

  6. Jobs

  7. class ExampleJob < ActiveJob::Base
      def perform
        ...
      end
    end

  8. class ExampleJob < ActiveJob::Base
      def perform
        Product.all.find_each do |product|
          product.sync_and_refresh
        end
      end
    end

  9. class ExampleJob < ActiveJob::Base
      def perform
        Product.all.find_each do |product|
          product.sync_and_refresh
        end
      end
    end
    minutes? hours? days?

  10. Long-running jobs

  11. Long-running jobs
    — Deploys and termination

  12. Long-running jobs
    — Deploys and termination
    — Abort and re-enqueue
    — Progress lost

  14. Long-running jobs
    — Deploys and termination
    — Abort and re-enqueue
    — Progress lost
    — Job may never complete

  15. Long-running jobs
    — Deploys and termination
    — Abort and re-enqueue
    — Progress lost
    — Job may never complete
    — Capacity and worker starvation

  16. Long-running jobs
    — Deploys and termination
    — Abort and re-enqueue
    — Progress lost
    — Job may never complete
    — Capacity and worker starvation
    — Cloud
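
The failure modes listed above can be illustrated with a small simulation. This is only a sketch in plain Ruby, with no Rails involved: `MAX_ITEMS_PER_RUN`, `run_from_scratch`, and `run_with_cursor` are hypothetical names, and the fixed per-run budget stands in for a deploy or an instance being terminated.

```ruby
# Simulates a worker that is terminated (e.g. by a deploy) after
# processing a fixed number of items. All names are illustrative.
ITEMS = (1..10).to_a
MAX_ITEMS_PER_RUN = 4 # the worker is killed after this many items

# Non-resumable job: every run starts from the first item again.
# Returns the number of runs needed, or nil if the job never
# finishes within max_runs attempts.
def run_from_scratch(items, max_runs:)
  max_runs.times do |run|
    processed = items.first(MAX_ITEMS_PER_RUN) # progress is lost afterwards
    return run + 1 if processed.size == items.size
  end
  nil
end

# Resumable job: each run continues after the last processed position.
def run_with_cursor(items, max_runs:)
  cursor = 0
  max_runs.times do |run|
    cursor = [cursor + MAX_ITEMS_PER_RUN, items.size].min
    return run + 1 if cursor == items.size
  end
  nil
end

puts run_from_scratch(ITEMS, max_runs: 100).inspect # prints nil
puts run_with_cursor(ITEMS, max_runs: 100).inspect  # prints 3
```

With ten items and a budget of four per run, the job that restarts from scratch never completes, while the one that keeps a cursor finishes on its third run.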

  17. Why is it taking long?
    Because it iterates over a long collection.

  18. What if jobs were interruptible and
    resumable?

  19. Split the job definition
    1. Collection to process
    2. Work to be done

  20. Split the job definition
    1. Collection to process ≫ Product.all
    2. Work to be done

  21. Split the job definition
    1. Collection to process ≫ Product.all
    2. Work to be done ≫ product.sync_and_refresh

  22. class ExampleJob < ActiveJob::Base
      include Iteration
      def collection
        Product.all
      end
      def each_iteration(product)
        product.sync_and_refresh
      end
    end
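
One way a module like `Iteration` could supply `perform` on the job's behalf is sketched below. This is an assumption made for illustration, not the implementation from the talk: `IterationSketch`, `interrupted?`, `QUEUE`, and the `cursor` argument are hypothetical, the collection is a plain array rather than `Product.all`, and re-enqueueing is modeled by pushing onto an in-memory array.

```ruby
# A plain-Ruby sketch of an interruptible, resumable job.
QUEUE = [] # stands in for the real job queue

module IterationSketch
  # Walks the collection starting at `cursor`. Before each item it checks
  # whether the worker was asked to stop; if so, it re-enqueues the job
  # with the current position instead of losing progress.
  def perform(cursor = 0)
    collection.each_with_index do |item, index|
      next if index < cursor # already processed in an earlier run
      if interrupted?
        QUEUE << [self.class, index]
        return :interrupted
      end
      each_iteration(item)
    end
    :completed
  end
end

class ExampleJob
  include IterationSketch

  attr_reader :processed

  def initialize(budget)
    @budget = budget # items allowed before a simulated shutdown
    @processed = []
  end

  def collection
    %w[a b c d e]
  end

  def each_iteration(item)
    @processed << item
    @budget -= 1
  end

  def interrupted?
    @budget <= 0
  end
end

first = ExampleJob.new(2)
first.perform                   # processes "a" and "b", then re-enqueues
job_class, cursor = QUEUE.shift # the saved position travels with the job
second = job_class.new(10)
second.perform(cursor)          # resumes at "c" and runs to completion
```

The job author only writes `collection` and `each_iteration`; the decision about when to stop and where to resume lives in the shared module.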

  23. — def perform
    — collection
    — each_iteration

  24. Product.all
    cursor: 1

  25. Product.all
    cursor: 2

  26. Product.all
    cursor: 3

  27. Product.all
    cursor: 4

  28. Product.all
    cursor: 5

  29. Product.all
    cursor: 450123
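
The cursor slides above suggest that the job remembers how far it has got through `Product.all`. A minimal sketch of that idea follows, assuming the cursor is the primary key of the last processed record (the slides do not spell this out). `Product` here is a plain `Struct`, and `records_after` is a hypothetical helper, not an Active Record API.

```ruby
Product = Struct.new(:id, :title)

PRODUCTS = [
  Product.new(1, "Shirt"),
  Product.new(2, "Mug"),
  Product.new(5, "Poster"), # ids need not be contiguous
  Product.new(8, "Sticker")
].freeze

# Returns an Enumerator of [record, cursor] pairs for records whose id
# is greater than `cursor`, in ascending id order. A nil cursor means
# "start from the beginning".
def records_after(records, cursor)
  Enumerator.new do |yielder|
    records.sort_by(&:id).each do |record|
      next if cursor && record.id <= cursor
      yielder << [record, record.id]
    end
  end
end

# A fresh run starts with no cursor.
records_after(PRODUCTS, nil).map { |record, _cursor| record.title }
# => ["Shirt", "Mug", "Poster", "Sticker"]

# A resumed run passes the cursor saved after processing id 2.
records_after(PRODUCTS, 2).map { |record, _cursor| record.title }
# => ["Poster", "Sticker"]
```

Against a database, the same shape would typically be an ordered query filtered on `id > cursor`, so a resumed job never rescans the rows it has already handled.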

  30. class WhateverJob < ActiveJob::Base
      include Iteration
      def collection
        Enumerator.new do |enum|
          3.times do |n|
            enum << n
          end
        end
      end
      def each_iteration(n)
        # do something three times!
      end
    end
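
Because `collection` only has to return something enumerable, the `Enumerator` above can be exercised outside of a job. The sketch below runs the same block in plain Ruby and shows one possible way to resume it from a saved position by skipping elements that were already seen. `resume_from` is a hypothetical helper, and skipping by index is an assumption that only holds when the enumerator yields elements in a stable order.

```ruby
# The collection from the slide, built outside of a job class.
def collection
  Enumerator.new do |enum|
    3.times do |n|
      enum << n
    end
  end
end

# Pairs each element with its position and drops everything before
# `cursor`, so an interrupted run can pick up where it left off.
def resume_from(enumerator, cursor)
  enumerator.each_with_index.drop(cursor)
end

collection.to_a            # => [0, 1, 2]
resume_from(collection, 1) # => [[1, 1], [2, 2]]
```

Note that `drop` still runs the enumerator from the start and discards the leading elements, which is fine for three items but would be wasteful for a large or expensive source.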

  31. Endless possibilities
    — Interrupt and resume at any moment
    — Progress tracking
    — Parallel computations
    — Throttling by default
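
Two of the possibilities above, progress tracking and throttling, can be sketched once the loop is owned by shared code rather than by each job. Everything here is hypothetical and not taken from the talk: `ThrottledRunner`, its `throttle:` callable, and the `progress` method are illustrative names, and the "overloaded database" is just a local flag.

```ruby
# Hypothetical hooks that become possible once a job is a loop that
# shared code controls.
class ThrottledRunner
  attr_reader :cursor

  def initialize(items, throttle:)
    @items = items
    @throttle = throttle # callable; a truthy result means "back off now"
    @cursor = 0
  end

  # Fraction of the collection processed so far, from 0.0 to 1.0.
  def progress
    return 1.0 if @items.empty?
    @cursor.to_f / @items.size
  end

  # Processes items until done or until the throttle asks to stop.
  # Returns :throttled or :completed; the cursor survives either way.
  def run
    while @cursor < @items.size
      return :throttled if @throttle.call
      yield @items[@cursor]
      @cursor += 1
    end
    :completed
  end
end

db_overloaded = false
runner = ThrottledRunner.new(%w[a b c d], throttle: -> { db_overloaded })
seen = []

runner.run do |item|
  seen << item
  db_overloaded = true if item == "b" # simulate load after two items
end
runner.progress # => 0.5

db_overloaded = false
runner.run { |item| seen << item } # picks up at "c"
```

Because the check happens between iterations, a throttled job stops at a clean boundary and can continue later from the same cursor.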

  32. Benefits for the infrastructure
    — Keep supporting long-running jobs
    — Success for Cloud runtime
    — Make scale invisible to developers
    — Opportunities to save money with short-lived instances in Cloud

  33. Thank you!
    @kirshatrov