A Fair Task Execution Framework

Mourjo Sen
January 12, 2019

How can a multi-tenant business running on shared infrastructure ensure that one big customer does not bottleneck every maintenance task it has to perform on all customers’ data? With Clojure’s concurrency primitives and core.async, the answer turned out to be intuitive and effective.

Transcript

  1. A Fair Task Execution Framework mourjo@helpshift.com

  2. 150 Million+ conversations on Helpshift in the last year

  3-4. Running jobs on customer data
    • Maintenance is unavoidable
      ◦ Feature to release: segment conversations into buckets
      ◦ Every conversation must be part of one bucket
      ◦ Need to add a “default bucket” to all conversations before release
    • Uneven data distribution
  5. Goals
    • Fair
      1. Runtime independent of data distribution
      2. Runtime dependent only on the amount of data
    • Easy to understand and use
  6-7. Strategy 1: Spawn a thread for each customer
    Run each customer on a dedicated thread in a thread pool.
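A minimal sketch of Strategy 1, assuming a plain `java.util.concurrent` fixed thread pool; `run-maintenance-on-customer` and the `{:cust-id :records}` input shape are hypothetical, not taken from the talk. Each customer is a single task, so the job cannot finish before its biggest customer does.

```clojure
(ns fair-exec.strategy-1
  (:import (java.util.concurrent ExecutorService Executors Future)))

(defn run-maintenance-on-customer
  "Hypothetical stand-in for the real job: walks every record of ONE customer
  sequentially, so its cost grows with that customer's data size."
  [{:keys [cust-id records]}]
  ;; Real code would update each record here; this sketch only counts them.
  {:cust-id cust-id :processed (reduce (fn [n _] (inc n)) 0 records)})

(defn strategy-1
  "Runs one task per customer on a fixed-size thread pool and returns the
  results in input order. The job cannot finish before the largest
  customer's task does, however many threads the pool has."
  [customers pool-size]
  (let [^ExecutorService pool (Executors/newFixedThreadPool (int pool-size))
        ;; Clojure fns implement java.util.concurrent.Callable
        tasks (mapv (fn [c] (fn [] (run-maintenance-on-customer c))) customers)]
    (try
      ;; invokeAll blocks until every task has completed
      (mapv (fn [^Future f] (.get f)) (.invokeAll pool tasks))
      (finally
        (.shutdown pool)))))
```

With one customer holding most of the data, most of the pool sits idle while that single task runs, which is why this strategy is easy but not fair.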
  8. Strategy 1: Report Card
    Goal | Result
    Fair | No
    Easy | Yes
    Easy to use, but the runtime is affected by long-running jobs.
  9-10. Strategy 2: Chunk-and-work
    Decouple the size of a customer from the unit of concurrency. Each task is a “small-enough” segment of a customer:
    [{:cust-id "xyz" :month "Jan"} {:cust-id "xyz" :month "Feb"} ...
     {:cust-id "abc" :month "Jan"} {:cust-id "abc" :month "Feb"} ...]
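A minimal sketch of Strategy 2 under the same fixed-pool assumption. Only the `{:cust-id :month}` task shape comes from the slide; the twelve monthly ranges, `run-task`, and the `records-for` lookup are hypothetical.

```clojure
(ns fair-exec.strategy-2
  (:import (java.util.concurrent ExecutorService Executors Future)))

(def months ["Jan" "Feb" "Mar" "Apr" "May" "Jun"
             "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"])

(defn chunk-tasks
  "Turns every customer into one task per month, so the unit of concurrency
  is a (customer, month) pair rather than a whole customer."
  [cust-ids]
  (for [cust-id cust-ids
        month   months]
    {:cust-id cust-id :month month}))

(defn run-task
  "Hypothetical worker: processes one customer's records for one month.
  `records-for` is an assumed lookup of (cust-id, month) -> records."
  [records-for {:keys [cust-id month] :as task}]
  (assoc task :processed (count (records-for cust-id month))))

(defn strategy-2
  "Runs every (customer, month) task on a fixed-size thread pool, so a big
  customer's data is spread over many threads instead of pinning one."
  [records-for cust-ids pool-size]
  (let [^ExecutorService pool (Executors/newFixedThreadPool (int pool-size))
        tasks (mapv (fn [t] (fn [] (run-task records-for t)))
                    (chunk-tasks cust-ids))]
    (try
      (mapv (fn [^Future f] (.get f)) (.invokeAll pool tasks))
      (finally
        (.shutdown pool)))))
```

A big customer is now spread across many threads, but each task is still exactly as large as its month: a dense month remains one long task, and an empty month is a wasted one.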
  11-13. Strategy 2: Chunk-and-work
    • Some time ranges contain more data than others
    • Some time ranges contain too little or no data
  14-16. Strategy 2: Report Card
    Goal   | Result
    Fair   | Mostly
    Easy   | Yes
    Robust | No
    • Uneven data distribution across customers => Handled
    • Uneven division of tasks => Unhandled
  17-19. Strategy 2: How to achieve robustness?
    Decouple data collection from execution.

  20-25. Strategy 3: Iterator and Worker
    Input tasks: [{:cust-id "xyz" :month "Jan"} {:cust-id "xyz" :month "Feb"} ...
                  {:cust-id "abc" :month "Jan"} {:cust-id "abc" :month "Feb"} ...]
    Iterator: dynamically chunk each input task into uniformly sized tasks by asking
      "Give me the next 10K records where id > last-id in this time range"
    On-the-fly chunked tasks: [data-id1, data-id2, ...]
    Worker: perform the task
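A minimal sketch of the iterator's keyset pagination ("the next N records where id > last-id in this time range"), assuming records are maps with hypothetical `:id` and `:ts` keys held in a vector sorted by `:id` instead of a database. The slides use batches of 10K; here `limit` is a parameter so small values are easy to check. `next-batch` and `uniform-chunks` are illustrative names, not the talk's code.

```clojure
(ns fair-exec.keyset)

(defn next-batch
  "Hypothetical in-memory stand-in for the query 'give me the next `limit`
  records where id > last-id in this time range'. `records` must be sorted
  by :id; a real database would answer this from an index on id."
  [records {:keys [from to]} last-id limit]
  (->> records
       (filter (fn [{:keys [id ts]}]
                 (and (> id last-id) (<= from ts) (< ts to))))
       (take limit)
       vec))

(defn uniform-chunks
  "Walks one input task (a time range) with keyset pagination and returns
  its chunks. Each chunk holds exactly `limit` records except possibly the
  last, however much or little data the time range contains."
  [records time-range limit]
  (loop [last-id Long/MIN_VALUE
         chunks  []]
    (let [batch (next-batch records time-range last-id limit)]
      (cond
        (empty? batch)          chunks                ; range exhausted
        (< (count batch) limit) (conj chunks batch)   ; short batch: the last one
        :else                   (recur (:id (peek batch)) (conj chunks batch))))))
```

Because each query resumes right after the last id it has seen, chunk size depends only on `limit`, not on how densely the time range is populated.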
  26-27. Strategy 3: Iterator and Worker
    • Launcher
    • Iterator function
    • Worker function
    • Rendezvous point between iterators and workers
  28-30. Strategy 3: Iterator and Worker
    Iterator:
      (defn iterate-customer
        [output-chan {:keys [customer query profiles-per-task] :as params}]
        ;; query-next-batch and query-limit are defined elsewhere (not on the slide)
        (let [data    (query-next-batch query)
              n       (count data)
              last-id (:id (last data))]
          ;; Hand uniformly sized chunks to the workers via the channel
          (doseq [chunk (partition-all 10000 data)]
            (async/>!! output-chan {:customer customer :chunk chunk}))
          ;; A full batch means more data may remain: fetch the next batch after
          ;; last-id, keeping the rest of the original query (customer, time range)
          (when (= n query-limit)
            (recur output-chan (assoc-in params [:query :id] {:$gt last-id})))))
    Worker:
      (defn run-tasks [{:keys [customer chunk] :as task}]
        (run-maintenance-on customer chunk))
    Launcher:
      (fair-exec-job
        (let [ch (async/chan 1000)]
          {:iterator              (partial iterate-customer ch)
           :input-tasks           customer-xs-or-ch
           :chan                  ch
           :iterator-thread-count 10
           :worker                run-tasks
           :worker-thread-count   20}))
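A hypothetical sketch of what a launcher like `fair-exec-job` could do with the options above; the talk does not show its implementation. To stay dependency-free it swaps core.async for `java.util.concurrent`: a `BlockingQueue` plays the role of the `(async/chan 1000)` rendezvous point, `.put` stands in for `async/>!!`, and `:input-tasks` must be a collection rather than a channel.

```clojure
(ns fair-exec.launcher
  (:import (java.util Collection)
           (java.util.concurrent BlockingQueue ConcurrentLinkedQueue
                                 LinkedBlockingQueue)))

(def ^:private done-marker ::done)   ; poison pill that tells a worker to stop

(defn- start-threads
  "Starts n threads that each run the zero-argument fn f; returns them."
  [n f]
  (vec (repeatedly n #(doto (Thread. ^Runnable f) (.start)))))

(defn fair-exec-job
  "Hypothetical launcher accepting the option keys used on the slide.
  :input-tasks must be a collection here, and :chan must be a BlockingQueue
  standing in for a core.async channel. Iterator threads pull input tasks and
  put chunks on :chan (the rendezvous point); worker threads take chunks off
  it. Blocks until every chunk has been processed."
  [{:keys [iterator input-tasks chan iterator-thread-count
           worker worker-thread-count]}]
  (let [^BlockingQueue ch chan
        pending   (ConcurrentLinkedQueue. ^Collection (vec input-tasks))
        iterators (start-threads iterator-thread-count
                                 (fn []
                                   (loop []
                                     (when-some [task (.poll pending)]
                                       (iterator task)
                                       (recur)))))
        workers   (start-threads worker-thread-count
                                 (fn []
                                   (loop []
                                     (let [chunk (.take ch)]
                                       (when-not (identical? chunk done-marker)
                                         (worker chunk)
                                         (recur))))))]
    ;; Once every iterator has finished, every chunk is already on the queue...
    (run! (fn [^Thread t] (.join t)) iterators)
    ;; ...so one poison pill per worker lets each worker drain the queue, then stop.
    (dotimes [_ worker-thread-count] (.put ch done-marker))
    (run! (fn [^Thread t] (.join t)) workers)))
```

Joining the iterator threads before sending the poison pills guarantees that every chunk sits ahead of the pills in the FIFO queue, so workers finish all real work before stopping. Iterator and worker counts stay independent knobs, as in the slide's launcher. The sketch has no error handling: a task that throws simply kills its thread.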
  31. Strategy 3: Report Card
    Goal   | Result
    Fair   | Yes
    Easy   | Yes
    Robust | Yes
  32-34. Conclusion
    • Concurrency => Parallelism
    • A good concurrent design => Tunable throughput
    • Clojure => Easy to harness the power of concurrency
  35-36. Come say hi! We are hiring @helpshift: https://jobs.lever.co/helpshift
    Thank you! Questions?