Building Bridges: Cloud Native
and High Performance Computing
Ricardo Rocha, CERN @ahcorporto
Slide 4
“High Performance Computing most generally refers to the practice
of aggregating computing power in a way that delivers much higher
performance than one could get out of a typical desktop computer or
workstation in order to solve large problems in science, engineering,
or business.”
Slide 6
Source: https://www.top500.org/, November 2021
Slide 8
Low Latency
High Throughput
NUMA Awareness
Advanced Scheduling
Software Distribution
Slide 9
“High Throughput Computing is a computing paradigm that
focuses on the efficient execution of a large number of
loosely-coupled tasks. HTC systems are independent, sequential
jobs that can be individually scheduled on many different
computing resources across multiple administrative boundaries.”
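A minimal sketch of the HTC pattern described in the quote, assuming a local thread pool as a stand-in for a real batch system. The function names `analyse_event_file` and `run_htc_batch` are hypothetical and only illustrate that each job is sequential, independent, and can complete in any order.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed


def analyse_event_file(file_index: int) -> int:
    """Hypothetical sequential job; a sum of squares stands in for real work."""
    return sum(n * n for n in range(file_index * 1000))


def run_htc_batch(num_jobs: int, max_workers: int = 4) -> dict[int, int]:
    """Run independent jobs in any order and collect results by job index."""
    results: dict[int, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Jobs share no state and never talk to each other (loosely coupled).
        futures = {pool.submit(analyse_event_file, i): i for i in range(num_jobs)}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    batch = run_htc_batch(8)
    print(sorted(batch))  # job indices, regardless of completion order
```

Because no job depends on another, the same batch could be split across several clusters or sites without changing the result, which is what distinguishes it from a tightly coupled HPC job.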
Slide 12
Low Latency
High Throughput
NUMA Awareness
Advanced Scheduling
Software Distribution
Cross Boundaries
Slide 13
Advanced Scheduling and Batch
Workloads, not just Pods
Queues and Priorities: not all workloads are equal
Fair Sharing: resource usage optimization
Gang Scheduling, Array jobs
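The ideas on this slide can be sketched in a few lines, assuming a toy admission loop rather than any real scheduler. The `Workload` type and `admit` function are hypothetical: a workload groups several pods, admission follows priority order, and a gang is admitted all at once or not at all.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    priority: int      # higher values are considered first
    pods: int          # gang size: all pods must start together
    cpus_per_pod: int


def admit(workloads: list[Workload], free_cpus: int) -> tuple[list[str], list[str]]:
    """Return (admitted, pending) workload names for the given free CPUs."""
    # Negate priority because heapq is a min-heap; the index breaks ties
    # in submission order.
    heap = [(-w.priority, i, w) for i, w in enumerate(workloads)]
    heapq.heapify(heap)
    admitted: list[str] = []
    pending: list[str] = []
    while heap:
        _, _, workload = heapq.heappop(heap)
        needed = workload.pods * workload.cpus_per_pod
        if needed <= free_cpus:
            # Gang scheduling: reserve resources for every pod at once.
            free_cpus -= needed
            admitted.append(workload.name)
        else:
            # Never start a partial gang; the workload waits as a whole.
            pending.append(workload.name)
    return admitted, pending


if __name__ == "__main__":
    queue = [
        Workload("array-job", priority=5, pods=8, cpus_per_pod=1),
        Workload("mpi-sim", priority=10, pods=4, cpus_per_pod=4),
        Workload("small", priority=1, pods=2, cpus_per_pod=1),
    ]
    print(admit(queue, free_cpus=20))
```

In this sketch a lower-priority workload may be admitted when a higher-priority one does not fit (a simple form of backfilling); a stricter policy would stop at the first workload that cannot be admitted.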
Slide 18
SIG Scheduling: Kueue
https://github.com/kubernetes-sigs/kueue
[Diagram: clusters and their queues, with queues for the ATLAS, CMS, ALICE and LHCb experiments]
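As an illustration of fair sharing between per-experiment queues, here is a minimal max-min fairness sketch. It is not Kueue's algorithm or API: the function `max_min_fair_share`, the capacity, and the demand figures are all assumptions made for the example.

```python
from __future__ import annotations


def max_min_fair_share(capacity: int, demands: dict[str, int]) -> dict[str, int]:
    """Split `capacity` CPUs between queues without exceeding any demand."""
    allocation = {queue: 0 for queue in demands}
    unsatisfied = {queue for queue, demand in demands.items() if demand > 0}
    remaining = capacity
    while unsatisfied and remaining > 0:
        # Offer every still-hungry queue an equal slice of what is left.
        share = remaining // len(unsatisfied)
        if share == 0:
            break  # fewer CPUs left than queues; leftovers stay unallocated
        for queue in sorted(unsatisfied):
            grant = min(share, demands[queue] - allocation[queue])
            allocation[queue] += grant
            remaining -= grant
        # Queues that reached their demand release the rest for the others.
        unsatisfied = {q for q in unsatisfied if allocation[q] < demands[q]}
    return allocation


if __name__ == "__main__":
    # Hypothetical CPU demands for the four experiment queues.
    demands = {"ATLAS": 50, "CMS": 40, "ALICE": 10, "LHCb": 10}
    print(max_min_fair_share(100, demands))
```

With these made-up numbers, the small queues get everything they asked for, and the unused part of their equal share is redistributed to the larger queues.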
Slide 19
The Last Mile
Kubernetes Batch WG
https://github.com/kubernetes/community/tree/master/wg-batch
CNCF Batch System Initiative
https://github.com/cncf/tag-runtime/issues/38
CNCF Research User Group
https://community.cncf.io/research-end-user-group/