A Fair Task Execution Framework


Over 150 million conversations on Helpshift in the last year

Running jobs on customer data
• Maintenance is unavoidable
  ◦ Feature to release: segment conversations into buckets
  ◦ Every conversation must be part of one bucket
  ◦ Need to add a "default bucket" to all conversations before release
• Uneven data distribution

Goals
• Fair
  1. Runtime independent of data distribution
  2. Runtime dependent only on the amount of data
• Easy
  To understand and use

Strategy 1: Spawn a thread for each customer
Run each customer's job on a dedicated thread in a thread pool.

Strategy 1: Report Card
Goal    Result
Fair    No
Easy    Yes
Easy to use, but the overall runtime is affected by long-running jobs.
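
Strategy 1 could be sketched roughly as follows. This is only an illustration under stated assumptions: the `customers` map, `process-customer`, and `run-thread-per-customer` are hypothetical names that do not appear in the deck, and the real maintenance job is replaced by a sleep proportional to the record count.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

;; Hypothetical data: customer id -> number of records to process.
(def customers {"abc" 5, "pqr" 10, "xyz" 200})

(defn process-customer
  "Stand-in for the real maintenance job: sleeps 1 ms per record and
  returns [cust-id n-records]."
  [[cust-id n-records]]
  (Thread/sleep (long n-records))
  [cust-id n-records])

(defn run-thread-per-customer
  "Submits one task per customer to a fixed-size pool, waits for all of
  them, and returns a map of cust-id -> records processed."
  [customers pool-size]
  (let [^ExecutorService pool (Executors/newFixedThreadPool (int pool-size))
        tasks   (mapv (fn [c] (fn [] (process-customer c))) customers)
        futures (.invokeAll pool tasks)]
    (.shutdown pool)
    (into {} (map (fn [^Future f] (.get f))) futures)))

;; Usage: (run-thread-per-customer customers 3)
```

Because each customer runs on a single thread, the whole job takes at least as long as the largest customer ("xyz" here), no matter how many threads the pool has.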

Strategy 2: Chunk-and-work
Decouple the size of customers from the unit of concurrency.
Each task is a "small-enough" segment of a customer:
  [{:cust-id "xyz" :month "Jan"} {:cust-id "xyz" :month "Feb"} ...
   {:cust-id "abc" :month "Jan"} {:cust-id "abc" :month "Feb"} ...]
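
A task list of this shape could be generated as in the sketch below. The function name `customer-month-tasks` and the idea of passing the months in as a plain vector are assumptions made for illustration; the deck only shows the resulting task maps.

```clojure
(defn customer-month-tasks
  "Returns one task map per (customer, month) pair, customers first."
  [cust-ids months]
  (vec (for [cust-id cust-ids
             month   months]
         {:cust-id cust-id :month month})))

(customer-month-tasks ["xyz" "abc"] ["Jan" "Feb"])
;; => [{:cust-id "xyz", :month "Jan"} {:cust-id "xyz", :month "Feb"}
;;     {:cust-id "abc", :month "Jan"} {:cust-id "abc", :month "Feb"}]
```

Each of these maps can then be handed to a thread pool as an independent unit of work, so a large customer is spread over many threads instead of occupying one.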

Strategy 2: Chunk-and-work
• Some time ranges contain more data than others
• Some time ranges contain too little or no data
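
To make the unevenness concrete, the sketch below counts how many records each static task would have to process. The data in `sample-records` and the helper `task-sizes` are hypothetical and exist only for this illustration.

```clojure
;; Hypothetical records: customer "xyz" is busy in Jan and idle in Feb.
(def sample-records
  (concat (repeat 5 {:cust-id "xyz" :month "Jan"})
          (repeat 1 {:cust-id "abc" :month "Jan"})
          (repeat 2 {:cust-id "abc" :month "Feb"})))

(defn task-sizes
  "Maps each static task to the number of records it would process."
  [tasks records]
  (let [counts (frequencies
                 (map #(select-keys % [:cust-id :month]) records))]
    (into {} (map (fn [t] [t (get counts t 0)])) tasks)))

(task-sizes [{:cust-id "xyz" :month "Jan"} {:cust-id "xyz" :month "Feb"}
             {:cust-id "abc" :month "Jan"} {:cust-id "abc" :month "Feb"}]
            sample-records)
;; => {{:cust-id "xyz", :month "Jan"} 5, {:cust-id "xyz", :month "Feb"} 0,
;;     {:cust-id "abc", :month "Jan"} 1, {:cust-id "abc", :month "Feb"} 2}
```

One task carries most of the work while another carries none, which is the uneven division of tasks that a fixed time-range split cannot avoid.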

Strategy 2: Report Card
Goal    Result
Fair    Mostly
Easy    Yes
Robust  No

Uneven data distribution across customers => Handled
Uneven division of tasks => Unhandled

Strategy 2: How to achieve robustness?
Decouple data collection from execution.

Strategy 3: Iterator and Worker
Input tasks:
  [{:cust-id "xyz" :month "Jan"} {:cust-id "xyz" :month "Feb"} ...
   {:cust-id "abc" :month "Jan"} {:cust-id "abc" :month "Feb"} ...]
Iterator: dynamically chunk into uniformly sized tasks
  "Give me the next 10K records where id > last-id in this time range"
On-the-fly chunked tasks:
  [data-id1, data-id2, ...]
Worker: perform the task
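
The "next N records where id > last-id" step is a form of keyset pagination. The sketch below shows the idea against an in-memory vector instead of a database. `records`, `query-after`, and `id-chunks` are hypothetical names, and the sketch assumes ids are positive and unique so that 0 can serve as the starting cursor.

```clojure
;; Hypothetical in-memory "table": 25 records with ids 1..25.
(def records (mapv (fn [i] {:id i}) (range 1 26)))

(defn query-after
  "Stand-in for a database query: up to `limit` records with
  :id > last-id, in ascending :id order."
  [records last-id limit]
  (->> records
       (filter #(> (:id %) last-id))
       (sort-by :id)
       (take limit)
       vec))

(defn id-chunks
  "Lazily yields uniformly sized chunks of record ids until the data
  runs out. Only the final chunk may be smaller than `limit`."
  [records limit]
  (letfn [(step [last-id]
            (lazy-seq
              (let [batch (query-after records last-id limit)]
                (when (seq batch)
                  (cons (mapv :id batch)
                        (step (:id (peek batch))))))))]
    (step 0)))

(map count (id-chunks records 10))
;; => (10 10 5)
```

Chunk size now depends only on the `limit`, not on how the data happens to be distributed across customers or time ranges.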

Strategy 3: Iterator and Worker
• Launcher
• Iterator function
• Worker function
• Rendez-vous point

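Strategy 3: Iterator and Worker

(require '[clojure.core.async :as async])

;; query-next-batch, query-limit, run-maintenance-on, fair-exec-job and
;; customer-xs-or-ch are not shown in the deck and are assumed to be
;; defined elsewhere.

;; Iterator
(defn iterate-customer
  [output-chan {:keys [customer query profiles-per-task] :as params}]
  (let [data    (query-next-batch query)
        num     (count data)
        last-id (:id (last data))]
    ;; Put uniformly sized chunks on the channel shared with the workers.
    (doseq [chunk (partition-all 10000 data)]
      (async/>!! output-chan {:customer customer :chunk chunk}))
    ;; A full batch means there may be more data: fetch the next page,
    ;; keeping the rest of the query and advancing only the id cursor.
    (when (= num query-limit)
      (recur output-chan (assoc-in params [:query :id] {:$gt last-id})))))

;; Worker
(defn run-tasks [{:keys [customer chunk] :as task}]
  (run-maintenance-on customer chunk))

;; Launcher
(fair-exec-job
  (let [ch (async/chan 1000)]
    {:iterator              (partial iterate-customer ch)
     :input-tasks           customer-xs-or-ch
     :chan                  ch
     :iterator-thread-count 10
     :worker                run-tasks
     :worker-thread-count   20}))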
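
The deck does not show the body of `fair-exec-job`. Purely as an illustration of how a launcher of this kind could connect iterators and workers, here is a minimal sketch. Everything in it is an assumption: the name `fair-exec-sketch`, the `:queue-size` option, the `emit!` callback passed to the iterator, and the use of a `java.util.concurrent.LinkedBlockingQueue` in place of the core.async channel so that the example depends only on the JDK. It has no error handling.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue))

(defn fair-exec-sketch
  "Hypothetical launcher. Iterator threads turn input tasks into chunks
  and put them on a bounded queue; worker threads take chunks off the
  queue and call `worker`. `iterator` is called as (iterator emit! task)."
  [{:keys [iterator worker input-tasks iterator-thread-count
           worker-thread-count queue-size]}]
  (let [inputs  (LinkedBlockingQueue. ^java.util.Collection (vec input-tasks))
        chunks  (LinkedBlockingQueue. (int queue-size))
        emit!   (fn [chunk] (.put chunks chunk)) ; blocks while queue is full
        iter-fs (vec (repeatedly iterator-thread-count
                       #(future
                          (loop []
                            (when-some [task (.poll inputs)]
                              (iterator emit! task)
                              (recur))))))
        work-fs (vec (repeatedly worker-thread-count
                       #(future
                          (loop []
                            (let [chunk (.take chunks)]
                              (when-not (= ::done chunk)
                                (worker chunk)
                                (recur)))))))]
    (run! deref iter-fs)              ; wait until every chunk is emitted
    (dotimes [_ worker-thread-count]
      (.put chunks ::done))           ; one stop marker per worker
    (run! deref work-fs)              ; wait until every chunk is processed
    :done))

;; Usage with hypothetical inputs: each task is a customer and a record count.
(def processed (atom 0))

(fair-exec-sketch
  {:input-tasks           [{:customer "abc" :n 25} {:customer "xyz" :n 1000}]
   :iterator              (fn [emit! {:keys [customer n]}]
                            (doseq [chunk (partition-all 100 (range n))]
                              (emit! {:customer customer :chunk chunk})))
   :worker                (fn [{:keys [chunk]}]
                            (swap! processed + (count chunk)))
   :iterator-thread-count 2
   :worker-thread-count   4
   :queue-size            10})

@processed
;; => 1025
```

The bounded queue is the rendez-vous point: iterators block when workers fall behind, and the two thread counts can be tuned independently. Because `future` uses Clojure's agent thread pool, a standalone script may need `(shutdown-agents)` at the end to exit promptly.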

Strategy 3: Report Card
Goal    Result
Fair    Yes
Easy    Yes
Robust  Yes

Conclusion
• Concurrency => Parallelism
• A good concurrent design => Tunable throughput
• Clojure => Easy to harness the power of concurrency

Come say hi! We are hiring @helpshift: https://jobs.lever.co/helpshift

Thank you! Questions?

Deck by Mourjo Sen