Heuristics for Resource Matching in Intel's Compute Farm

oshai
July 03, 2013

JSSPP 2013 slides of my thesis presentation:
http://www.cs.huji.ac.il/~feit/parsched/jsspp13/

Transcript

  1. Heuristics for Resource Matching
    in Intel’s Compute Farm
    Master’s Thesis by Ohad Shai
    Supervised by:
    • Dr. Edi Shmueli, IDC Haifa
    • Prof. Dror G. Feitelson – Hebrew University, Israel

  2. Intel Information Technology
    Agenda
    •Background
    •Characteristics of jobs in Intel’s data centers
    •Proposed algorithms
    •Mix-Fit
    •Max-Jobs
    •Simulation results
    •Conclusion

  3. Background
    •Intel owns an Internet-scale distributed compute farm
    •Massive chip-simulation workloads
    •Tens of thousands of servers
    •Dozens of data-centers around the world
    •Thousands of newly incoming jobs every second
    •Capable of running hundreds of thousands of jobs
    simultaneously
    •Netbatch, an in-house application, is used for resource
    management and scheduling

  4. Netbatch components
    [Diagram: a Virtual Pool (VPM) above two Physical Pools (PPM), which
    together manage six machines]

  5. Netbatch components
    [Diagram: a Virtual Pool (VPM) above two Physical Pools (PPM), which
    together manage six machines]

  6. Background
    •Resource matching is done at the PPM level
    •Matching jobs to machines based on cores and memory
    requirements - resources are allocated exclusively
    •Preserving fairness among teams at Intel is also required; that
    part was not covered in our research
    •We show that none of the heuristics used today is optimal
    •We suggest other heuristics to improve flexibility and
    utilization in the pools

  7. The workload
    •Traces from 4 of Intel’s largest physical pools
    •10 – 13 million jobs each
    •One month period (November 2012)

  8. Resource requirements of jobs
    •Most of the jobs require 1 core

  9. Resource requirements of jobs
    •Most of the jobs require 1 core
    •Most of the jobs require less than 5 GB of memory

  10. Resource requirements of jobs
    •Most of the jobs require 1 core
    •Most of the jobs require less than 5 GB of memory
    •Still, there are bursts of higher demand
    •Measured in buckets of 1000 jobs, ordered by arrival

  11. Resource requirements of jobs
    •Most of the jobs require 1 core
    •Most of the jobs require less than 5 GB of memory
    •Still, there are bursts of higher demand
    •Measured in buckets of 1000 jobs, ordered by arrival

  12. Heuristics for Resource Matching
    •Varying resource requirements can cause fragmentation
    •There are various heuristics to reduce fragmentation and
    improve utilization, such as Best-Fit and Worst-Fit
    •Let’s see an example

  13. Example
    • Machines A and B, each having 4 cores and 32 GB of
    memory
    • Assume that 8 jobs arrive at the PPM in the following
    order:
    • 2 jobs of 1 core and 16 GB of memory
    • Followed by 6 jobs of 1 core and 4 GB of memory
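
A minimal Python sketch of this example, assuming that Best-Fit and Worst-Fit are applied to free memory (the feasible machine with the least or the most free memory, respectively) and that ties go to machine A. The names `Machine`, `place`, `best_fit_mem` and `worst_fit_mem` are hypothetical and are not Netbatch code.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    free_cores: int
    free_mem: int  # GB


def best_fit_mem(fits):
    # Best-Fit on memory (assumed): the feasible machine with the LEAST free
    # memory; min() returns the first such machine on ties, i.e. A before B.
    return min(fits, key=lambda m: m.free_mem)


def worst_fit_mem(fits):
    # Worst-Fit on memory (assumed): the feasible machine with the MOST free memory.
    return max(fits, key=lambda m: m.free_mem)


def place(jobs, pick):
    """Place (cores, GB) jobs in arrival order on fresh machines A and B.

    Returns how many jobs were placed; a job that fits nowhere is simply
    counted as not placed (in a real PPM it would wait).
    """
    machines = [Machine("A", 4, 32), Machine("B", 4, 32)]
    placed = 0
    for cores, mem in jobs:
        fits = [m for m in machines
                if m.free_cores >= cores and m.free_mem >= mem]
        if not fits:
            continue
        m = pick(fits)
        m.free_cores -= cores
        m.free_mem -= mem
        placed += 1
    return placed


# 2 jobs of (1 core, 16 GB) followed by 6 jobs of (1 core, 4 GB)
jobs = [(1, 16)] * 2 + [(1, 4)] * 6
results = {"Best-Fit": place(jobs, best_fit_mem),
           "Worst-Fit": place(jobs, worst_fit_mem)}
print(results)  # {'Best-Fit': 6, 'Worst-Fit': 8}
```

Under these assumptions Best-Fit stacks both 16 GB jobs on A, leaving two of A's cores idle with no memory, so only four of the six small jobs fit on B; Worst-Fit spreads the large jobs and places all eight.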

  14. Example
    • Machines A and B, each having 4 cores and 32 GB of
    memory
    • Assume that 8 jobs arrive at the PPM in the following
    order:
    • 2 jobs of 1 core and 16 GB of memory
    • Followed by 6 jobs of 1 core and 4 GB of memory

  15. Another Example
    • The order of arrival is:
    • 3 jobs of 1 core and 8 GB of memory
    • Followed by 1 job of 1 core and 32 GB of memory

  16. Another Example
    • The order of arrival is:
    • 3 jobs of 1 core and 8 GB of memory
    • Followed by 1 job of 1 core and 32 GB of memory

  17. Heuristics for Resource Matching
    •Different heuristics are optimal in different cases
    •At Intel, one-dimensional heuristics are used
    •There are 4 options:
    •Best-Fit/Worst-Fit on cores/memory
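
The four one-dimensional options can be sketched as one parameterized picker, again assuming that "best" means the least free amount of the chosen resource and "worst" the most, with ties broken in machine order. Running it on both examples (machines A and B with 4 cores and 32 GB each) shows the reversal; `make_heuristic` and `count_placed` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Machine:
    free_cores: int
    free_mem: int  # GB


def make_heuristic(resource, mode):
    """Return a picker for one of the four one-dimensional options.

    resource: "cores" or "mem"; mode: "best" (least free) or "worst" (most free).
    """
    attr = "free_cores" if resource == "cores" else "free_mem"
    choose = min if mode == "best" else max
    return lambda fits: choose(fits, key=lambda m: getattr(m, attr))


def count_placed(jobs, pick):
    """Place (cores, GB) jobs in arrival order on fresh machines A and B."""
    machines = [Machine(4, 32), Machine(4, 32)]
    placed = 0
    for cores, mem in jobs:
        fits = [m for m in machines
                if m.free_cores >= cores and m.free_mem >= mem]
        if fits:
            m = pick(fits)
            m.free_cores -= cores
            m.free_mem -= mem
            placed += 1
    return placed


example_1 = [(1, 16)] * 2 + [(1, 4)] * 6  # slide 13
example_2 = [(1, 8)] * 3 + [(1, 32)]      # slide 15

results = {}
for resource, mode in product(["cores", "mem"], ["best", "worst"]):
    pick = make_heuristic(resource, mode)
    results[(mode, resource)] = (count_placed(example_1, pick),
                                 count_placed(example_2, pick))
    print(f"{mode}-fit on {resource}: {results[(mode, resource)]}")
```

Because every job in these examples asks for exactly 1 core, the core-based and memory-based variants reach the same counts: Worst-Fit places 8 of 8 jobs in the first example but only 3 of 4 in the second, while Best-Fit places 6 of 8 and 4 of 4.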

  18. Comparing heuristics
    •The heuristics were compared by buckets
    •Buckets of 1000 consecutive jobs each, in arrival order
    •The jobs in each bucket were allocated synthetically to 500
    empty cores
    •Jobs that were not allocated were ignored
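
A sketch of the bucket experiment under stated assumptions: jobs are (cores, GB) pairs, each bucket is allocated on a fresh pool of 125 hypothetical machines with 4 cores and 32 GB each (500 cores in total), unplaced jobs are dropped, and a heuristic "wins" a bucket when it places the most jobs (ties count for every tied heuristic). The scoring metric and machine shape are assumptions, and the demo jobs are random synthetic data, not the Intel traces.

```python
import random
from dataclasses import dataclass


@dataclass
class Machine:
    free_cores: int
    free_mem: int  # GB


def allocate(bucket, pick, n_machines=125, cores=4, mem=32):
    """Allocate one bucket onto empty machines; return the number placed.

    Defaults give 125 x 4 = 500 cores; the 4-core/32 GB shape is hypothetical.
    Jobs that fit nowhere are ignored, as in the bucket experiment.
    """
    machines = [Machine(cores, mem) for _ in range(n_machines)]
    placed = 0
    for c, m in bucket:
        fits = [x for x in machines if x.free_cores >= c and x.free_mem >= m]
        if fits:
            x = pick(fits)
            x.free_cores -= c
            x.free_mem -= m
            placed += 1
    return placed


def compare(jobs, heuristics, bucket_size=1000):
    """Share of buckets in which each heuristic placed the most jobs."""
    buckets = [jobs[i:i + bucket_size] for i in range(0, len(jobs), bucket_size)]
    wins = {name: 0 for name in heuristics}
    for bucket in buckets:
        placed = {name: allocate(bucket, pick) for name, pick in heuristics.items()}
        top = max(placed.values())
        for name, n in placed.items():
            if n == top:  # ties count as a win for every tied heuristic
                wins[name] += 1
    return {name: w / len(buckets) for name, w in wins.items()}


heuristics = {
    "Best-Fit cores": lambda f: min(f, key=lambda x: x.free_cores),
    "Worst-Fit cores": lambda f: max(f, key=lambda x: x.free_cores),
    "Best-Fit memory": lambda f: min(f, key=lambda x: x.free_mem),
    "Worst-Fit memory": lambda f: max(f, key=lambda x: x.free_mem),
}

# Synthetic jobs for illustration only; NOT the Intel traces.
random.seed(0)
jobs = [(random.choice([1, 1, 1, 2, 4]), random.choice([2, 4, 4, 8, 16]))
        for _ in range(3000)]
print(compare(jobs, heuristics))
```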

  19. Comparing heuristics
    •The heuristics were compared by buckets
    •There is no single heuristic that gets to 100%

  20. Mix-Fit
    •As seen before, one-dimensional heuristics lack information: they
    consider either cores or memory, not both

  21. Mix-Fit
    •As seen before, one-dimensional heuristics lack information: they
    consider either cores or memory, not both
    •Mix-Fit tries to “Best-Fit” in two dimensions
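
The slides do not define Mix-Fit's scoring rule, so the following is only one plausible way to "Best-Fit" in two dimensions, not necessarily the thesis's formula: normalize each machine's leftover cores and memory by its capacity and choose the machine whose leftover vector after placement is shortest (Euclidean norm). `mix_fit` and `count_placed` are hypothetical names; the machines are A and B from the examples.

```python
import math
from dataclasses import dataclass


@dataclass
class Machine:
    free_cores: int
    free_mem: int          # GB
    total_cores: int = 4
    total_mem: int = 32    # GB


def mix_fit(fits, cores, mem):
    """Hypothetical 2-D best fit: pick the machine whose normalized leftover
    (cores, memory) vector after placing the job is shortest."""
    def leftover(m):
        return math.hypot((m.free_cores - cores) / m.total_cores,
                          (m.free_mem - mem) / m.total_mem)
    return min(fits, key=leftover)  # ties go to the first machine (A)


def count_placed(jobs):
    """Place (cores, GB) jobs in arrival order on fresh machines A and B."""
    machines = [Machine(4, 32), Machine(4, 32)]
    placed = 0
    for cores, mem in jobs:
        fits = [m for m in machines
                if m.free_cores >= cores and m.free_mem >= mem]
        if fits:
            m = mix_fit(fits, cores, mem)
            m.free_cores -= cores
            m.free_mem -= mem
            placed += 1
    return placed


example_1 = [(1, 16)] * 2 + [(1, 4)] * 6
example_2 = [(1, 8)] * 3 + [(1, 32)]
print(count_placed(example_1), count_placed(example_2))  # 6 4
```

On the two toy examples this scoring places 6 and 4 jobs, the same as one-dimensional Best-Fit: folding both dimensions into one score does not by itself avoid the trap of the first example.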

  22. Mix-Fit
    •Same bucket experiment
    •Yet, the experiments show that Mix-Fit does not reach 100% either

  23. Max-Jobs
    •Yet, the experiments show that Mix-Fit is not always the best
    heuristic
    •Max-Jobs always picks the best heuristic, i.e. the one whose
    schedule matches the most jobs
    •Max-Jobs uses the heuristics as “black-box” algorithms
    •Each heuristic computes a mapping from jobs to hosts: the
    “schedule”
    [Diagram: Max-Jobs selecting among Best-Fit, Worst-Fit, and Mix-Fit]
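
Treating each heuristic as a black box that returns a schedule makes the meta-heuristic easy to sketch: run every heuristic on a private copy of the machine state, then keep the schedule that places the most jobs. This assumes "best" means "most jobs matched" (as the name Max-Jobs suggests) and breaks ties in favor of the first heuristic listed; only two memory-based heuristics are plugged in for brevity, and all names are hypothetical.

```python
import copy
from dataclasses import dataclass


@dataclass
class Machine:
    free_cores: int
    free_mem: int  # GB


def run_heuristic(pick, jobs, machines):
    """Run one heuristic as a black box on a private copy of the machines.

    Returns its schedule: a dict mapping job index -> machine index.
    """
    machines = copy.deepcopy(machines)
    schedule = {}
    for j, (cores, mem) in enumerate(jobs):
        fits = [i for i, m in enumerate(machines)
                if m.free_cores >= cores and m.free_mem >= mem]
        if fits:
            i = pick(fits, machines)
            machines[i].free_cores -= cores
            machines[i].free_mem -= mem
            schedule[j] = i
    return schedule


def best_fit_mem(fits, machines):
    return min(fits, key=lambda i: machines[i].free_mem)


def worst_fit_mem(fits, machines):
    return max(fits, key=lambda i: machines[i].free_mem)


def max_jobs(heuristics, jobs, machines):
    """Meta-heuristic: compute every heuristic's schedule and keep the one
    that places the most jobs (the first heuristic listed wins ties)."""
    schedules = {name: run_heuristic(pick, jobs, machines)
                 for name, pick in heuristics.items()}
    best = max(schedules, key=lambda name: len(schedules[name]))
    return best, schedules[best]


machines = [Machine(4, 32), Machine(4, 32)]  # A and B, never mutated
heuristics = {"Best-Fit": best_fit_mem, "Worst-Fit": worst_fit_mem}
example_1 = [(1, 16)] * 2 + [(1, 4)] * 6
example_2 = [(1, 8)] * 3 + [(1, 32)]
for jobs in (example_1, example_2):
    name, schedule = max_jobs(heuristics, jobs, machines)
    print(name, len(schedule))
```

On the first example Max-Jobs picks Worst-Fit's schedule (8 jobs) and on the second Best-Fit's (4 jobs), so it matches the better heuristic in each case, at the cost of running every heuristic on each matching pass.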

  24. Simulation results
    •A Java-based event-driven simulator that we developed
    •Open source: https://code.google.com/p/batch-simulator/
    •Same workload that was described earlier:
    •10-13 million jobs
    •1 month
    •4 of Intel’s largest pools
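
The batch-simulator itself is written in Java; the following Python sketch only illustrates how an event-driven loop can measure job wait time and is not derived from that code. Assumptions: events are arrivals and completions kept in a heap ordered by time; after every event the PPM tries to start each waiting job in arrival order with a pluggable heuristic (so a small job may start ahead of a blocked larger one); and every job fits on some empty machine.

```python
import heapq
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    free_cores: int
    free_mem: int  # GB


@dataclass
class Job:
    arrival: float
    runtime: float
    cores: int
    mem: int
    start: Optional[float] = None


def best_fit_mem(fits):
    return min(fits, key=lambda m: m.free_mem)


def simulate(jobs, machines, pick):
    """Minimal event-driven loop; returns the average wait time.

    Assumes every job fits on some empty machine, otherwise it waits forever.
    """
    events, seq = [], 0  # heap of (time, seq, kind, payload); seq breaks ties
    for job in jobs:
        heapq.heappush(events, (job.arrival, seq, "arrive", job))
        seq += 1
    queue = []
    while events:
        now, _, kind, payload = heapq.heappop(events)
        if kind == "arrive":
            queue.append(payload)
        else:  # "finish": release the job's resources
            machine, job = payload
            machine.free_cores += job.cores
            machine.free_mem += job.mem
        # Matching pass: try to start every waiting job, in arrival order.
        waiting = []
        for job in queue:
            fits = [m for m in machines
                    if m.free_cores >= job.cores and m.free_mem >= job.mem]
            if not fits:
                waiting.append(job)
                continue
            m = pick(fits)
            m.free_cores -= job.cores
            m.free_mem -= job.mem
            job.start = now
            heapq.heappush(events, (now + job.runtime, seq, "finish", (m, job)))
            seq += 1
        queue = waiting
    return sum(j.start - j.arrival for j in jobs) / len(jobs)


jobs = [Job(arrival=0, runtime=10, cores=1, mem=8),
        Job(arrival=1, runtime=5, cores=1, mem=4)]
print(simulate(jobs, [Machine(2, 8)], best_fit_mem))  # 4.5
```

In the tiny demo the first job takes all of the machine's memory, so the second waits from t = 1 until t = 10, giving an average wait of (0 + 9) / 2 = 4.5.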

  25. Simulation results
    •Up to 22% reduction in wait time for jobs

  26. Simulation results
    •Up to 22% reduction in number of waiting jobs in PPM

  27. Conclusions
    •In this paper we investigated the problem of resource
    matching in Intel’s compute farm
    •Improvements to matching heuristics were suggested:
    •Current heuristics focus on a single resource, either cores or
    memory → we implemented Mix-Fit
    •Because demands change dynamically, no single algorithm tailored
    to a specific use case is optimal in all cases → we suggest the
    Max-Jobs meta-heuristic
    •Open source simulator:
    https://code.google.com/p/batch-simulator/
