Manual load balancing is harder. Tasks, not threads: proposed by J. Reinders as one of the four features of parallel programming solutions. Performance portability: again proposed by J. Reinders, as one of the five desired qualities. Performance portability is hard to achieve on heterogeneous/asymmetric architectures, so task scheduling plays a key role!
a central task pool. The central task pool is locked whenever a worker tries to get a task from it. In task stealing, each worker has its own task pool and mostly gets tasks from it; the victim worker's task pool is locked when another worker tries to steal a task from it. Tasks are scheduled to workers randomly.
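The per-worker-pool scheme can be sketched as follows. This is a minimal illustration, not the scheduler from the talk: `WorkerPool` and `run_work_stealing` are hypothetical names, tasks are plain IDs that spawn no children, and Python threads stand in for workers. Each pool has its own lock; a worker pops from one end of its own deque and, when that is empty, locks a randomly chosen victim's pool and steals from the other end. A central task pool would be the same idea with a single shared deque and a single lock.

```python
import random
import threading
from collections import deque


class WorkerPool:
    """One task pool per worker, guarded by its own lock (hypothetical)."""

    def __init__(self):
        self.tasks = deque()
        self.lock = threading.Lock()


def run_work_stealing(num_workers, tasks, seed=0):
    """Scatter tasks randomly over per-worker pools, then let every worker
    drain its own pool and steal from random victims when it runs dry.
    Returns {worker id: [task ids it executed]}."""
    rng = random.Random(seed)
    pools = [WorkerPool() for _ in range(num_workers)]
    for t in tasks:  # tasks are scheduled to workers randomly
        pools[rng.randrange(num_workers)].tasks.append(t)
    executed = {w: [] for w in range(num_workers)}

    def get_task(wid, local_rng):
        own = pools[wid]
        with own.lock:  # lock only our own pool for a local pop
            if own.tasks:
                return own.tasks.pop()
        victims = [v for v in range(num_workers) if v != wid]
        local_rng.shuffle(victims)
        for v in victims:
            with pools[v].lock:  # lock the victim's pool while stealing
                if pools[v].tasks:
                    return pools[v].tasks.popleft()  # steal from the far end
        return None  # every pool was empty (pools only shrink here)

    def worker(wid):
        local_rng = random.Random(seed + wid + 1)
        while (t := get_task(wid, local_rng)) is not None:
            executed[wid].append(t)  # "execute" the task

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return executed
```

Because no new tasks are created in this sketch, pools only shrink, so a worker that finds every pool empty can safely stop; the locks guarantee each task is taken exactly once.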
twice the speed of c1, c2 and c3. T1, T2, T3 and T4 need 1.5t, 4t, 1t and 1.5t, respectively, on c0; therefore, they need 3t, 8t, 2t and 3t on c1, c2 and c3.
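The example can be checked by brute force. The sketch below, using only the speeds and times stated above, enumerates every mapping of T1 to T4 onto c0 to c3 and reports the smallest makespan (the finish time of the busiest core). The names `SPEEDS`, `TIME_ON_C0`, `makespan` and `best_assignment` are illustrative, not from the source.

```python
from itertools import product

# c0 is twice as fast as c1, c2 and c3 (relative speeds).
SPEEDS = {"c0": 2.0, "c1": 1.0, "c2": 1.0, "c3": 1.0}
# Execution time of each task on c0, in units of t.
TIME_ON_C0 = {"T1": 1.5, "T2": 4.0, "T3": 1.0, "T4": 1.5}


def makespan(assignment):
    """assignment maps task -> core; returns the busiest core's finish time."""
    load = {c: 0.0 for c in SPEEDS}
    for task, core in assignment.items():
        # A slower core stretches the c0 time by speed(c0) / speed(core).
        load[core] += TIME_ON_C0[task] * SPEEDS["c0"] / SPEEDS[core]
    return max(load.values())


def best_assignment():
    """Exhaustively search all 4^4 task-to-core mappings."""
    tasks = list(TIME_ON_C0)
    best = None
    for cores in product(SPEEDS, repeat=len(tasks)):
        candidate = dict(zip(tasks, cores))
        if best is None or makespan(candidate) < makespan(best):
            best = candidate
    return best, makespan(best)
```

The optimum puts T2 on c0 and finishes in 4t, whereas an allocation that places T2 on a slow core (for example T1→c0, T2→c1, T3→c2, T4→c3) needs 8t: the workload, not just the task count, decides the outcome.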
their workloads are required. However, this information is not available until the tasks are completed. The solution also cannot tolerate dynamic changes in the tasks and their workloads, so dynamic adjustment is required to further balance the workloads.
the first limitation: allocate tasks to different c-groups according to their workloads and the computation capacity of the c-groups. Preference-based Task-stealing, to address the second limitation: adjust the workloads dynamically among different c-groups.
function have similar workloads. The percentage of tasks executing the same function among all tasks remains almost the same throughout the execution of a parallel application.
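Because tasks running the same function have similar workloads and a stable share of all tasks, the history of completed tasks can be used to estimate future ones. A minimal sketch, assuming the runtime reports each finished task's function name and measured execution time: `TaskClassHistory`, `record` and `task_classes` are hypothetical names. Each function fi yields a task class TCi(fi, ni, wi), with ni the number of completed tasks and wi their average workload.

```python
from collections import defaultdict


class TaskClassHistory:
    """Per-function execution history (hypothetical helper).

    Produces task classes (fi, ni, wi): ni = completed tasks that ran fi,
    wi = their average measured workload.
    """

    def __init__(self):
        self._count = defaultdict(int)
        self._total = defaultdict(float)

    def record(self, func_name, elapsed):
        # Call once per finished task with its measured execution time.
        self._count[func_name] += 1
        self._total[func_name] += elapsed

    def task_classes(self):
        """Return a list of (fi, ni, wi) tuples, one per function seen."""
        return [(f, n, self._total[f] / n) for f, n in self._count.items()]
```

Averaging is only one possible estimator; a real scheduler might weight recent tasks more heavily.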
, ni, wi) (1 ≤ i ≤ m) in descending order of wi. The overall workload ni * wi is used as the workload of the task class TCi(fi, ni, wi). The near-optimal algorithm is then applied to group the task classes into task clusters that are mapped to c-groups.
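The grouping step can be illustrated with a simple greedy rule. This is explicitly a stand-in, not the near-optimal algorithm used by WATS: after sorting the task classes in descending order of wi, each class (with overall workload ni * wi) goes to the c-group that would finish earliest given its computation capacity. The function name, the `capacities` list and the fastest-first ordering are assumptions for illustration.

```python
def map_classes_to_cgroups(task_classes, capacities):
    """Greedy stand-in for grouping task classes into task clusters.

    task_classes: list of (fi, ni, wi) tuples.
    capacities:   computation capacity of each c-group, fastest first
                  (hypothetical input; ties go to the faster c-group).
    Returns one cluster (a list of function names) per c-group.
    """
    # Sort task classes in descending order of wi.
    ordered = sorted(task_classes, key=lambda tc: tc[2], reverse=True)
    clusters = [[] for _ in capacities]
    load = [0.0] * len(capacities)
    for f, n, w in ordered:
        overall = n * w  # overall workload of the task class
        # Choose the c-group whose normalized finish time stays smallest.
        g = min(range(len(capacities)),
                key=lambda k: (load[k] + overall) / capacities[k])
        clusters[g].append(f)
        load[g] += overall
    return clusters
```

For four classes of ten tasks each with wi = 8, 4, 2 and 1, and two c-groups with capacities 2 and 1, this yields the clusters [f8, f2] and [f4, f1], whose normalized finish times are both 50.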
for its c-group. Then steal tasks from other task pools for its c-group. Then obtain tasks from its local task pool for the next c-group in its preference list, and finally steal tasks from other task pools for the next c-group in its preference list.
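The acquisition order above can be sketched as a single function. The data layout is an assumption, not taken from the source: `pools[w][g]` is worker w's local pool of tasks whose cluster is mapped to c-group g, and `preference[g]` is the preference list for c-group g, starting with g itself. The sketch is single-threaded; a concurrent version would lock the victim's pool while stealing, as in plain task stealing.

```python
import random
from collections import deque


def get_task(worker, cgroup_of, pools, preference, rng=random):
    """Preference-based task acquisition (hypothetical layout).

    For each c-group g in the worker's preference list: first take a task
    from the worker's own pool for g, then try to steal one for g from the
    other workers' pools (in random order). Returns None if nothing is found.
    """
    for g in preference[cgroup_of[worker]]:
        local = pools[worker][g]
        if local:
            return local.pop()  # obtain from the local task pool
        victims = [v for v in range(len(pools)) if v != worker]
        rng.shuffle(victims)
        for v in victims:
            if pools[v][g]:
                return pools[v][g].popleft()  # steal for the same c-group
    return None
```

With this order, a core prefers work meant for its own c-group, even if that work sits in another worker's pool, before it helps another c-group, which is what lets the workloads shift between c-groups only when one of them runs dry.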
PFT: the traditional Parent-First Task-stealing scheduler. RTS: the Random Task-Snatching scheduler (a faster core snatches tasks from a randomly chosen slower core if it cannot steal any task). WATS: our Workload-Aware Task Scheduler.
different workloads (in the proportion 8t, 4t, 2t and t) in each batch. The numbers of tasks with workloads of 8t, 4t, 2t and t are n, n, n and 128-n, respectively. WATS is scalable; RTS is not.
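The batch composition follows directly from the counts above. A small helper (the name `make_batch` is hypothetical) builds the list of task workloads for a given n, in units of t. A batch holds 3n + (128 - n) = 128 + 2n tasks, and its total workload is 8n + 4n + 2n + (128 - n) = (128 + 13n)t, so the heavy tasks dominate the batch as n grows.

```python
def make_batch(n, unit=1):
    """Task workloads in one batch: n tasks each of 8t, 4t and 2t,
    plus 128 - n tasks of t (unit is the value of t)."""
    if not 0 <= n <= 128:
        raise ValueError("n must be between 0 and 128")
    return ([8 * unit] * n + [4 * unit] * n + [2 * unit] * n
            + [unit] * (128 - n))
```

For example, n = 10 gives 148 tasks with a total workload of 258t.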
that can allocate tasks in AMC near-optimally. We have proposed a novel preference-based task-stealing policy that can effectively balance workloads among different groups of cores. We have implemented a task scheduler, WATS, which achieves a performance gain of up to 82.7% over the commonly employed random task-stealing approach.
thanks the Dept. of Computer Science, Univ. of Otago, for hosting and funding his study. This research was partially supported by the Natural Science Foundation of China. Reference: WATS: Workload-Aware Task Scheduling in Asymmetric Multi-core Architectures, Quan Chen, Yawen Chen, Zhiyi Huang, and Minyi Guo, to appear in the Proceedings of IPDPS'12, Shanghai, May 2012.