CONFIDENTIAL designator V0000000
BARE METAL: Spawning and running containers
OCI specification: runc, containerd
Runtime hooks: prestart, poststart, poststop
Hooks can be used to enhance the functionality of a container runtime:
- mount files
- configure cgroups
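As a rough sketch of how such hooks are declared, the snippet below builds the `hooks` section of an OCI `config.json` in Python. The hook path `/usr/local/bin/example-hook` and the helper names `make_hook` and `add_hook` are hypothetical. The stage names and the `path`, `args`, `env`, and `timeout` fields follow the OCI runtime specification, whose later versions deprecate `prestart` in favor of finer-grained stages.

```python
import json

# The three stages named on the slide.
STAGES = ("prestart", "poststart", "poststop")

def make_hook(path, args=None, env=None, timeout=None):
    """Return one hook entry; only `path` is required."""
    hook = {"path": path}
    if args is not None:
        hook["args"] = list(args)
    if env is not None:
        hook["env"] = list(env)
    if timeout is not None:
        hook["timeout"] = int(timeout)
    return hook

def add_hook(config, stage, hook):
    """Append a hook to the given stage of an OCI config dict."""
    if stage not in STAGES:
        raise ValueError(f"unsupported stage: {stage}")
    config.setdefault("hooks", {}).setdefault(stage, []).append(hook)
    return config

config = {"ociVersion": "1.0.2"}
add_hook(
    config,
    "prestart",
    # Hypothetical hook binary, used only for illustration.
    make_hook("/usr/local/bin/example-hook", args=["example-hook", "prestart"], timeout=10),
)
print(json.dumps(config["hooks"], indent=2))
```

The runtime executes each entry at the matching point in the container lifecycle, so a hook binary can mount files or adjust cgroups before the workload starts.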
BARE METAL: Spawning and running containers
OCI specification: runc, containerd
NVIDIA prestart hook: bind mount of
- devices
- binaries
- libraries
The NVIDIA prestart hook configures the container to use GPUs:
- mount files
- configure cgroups
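To illustrate the mechanics, here is a minimal sketch of what a prestart hook sees and how it might plan its bind mounts. It is not the NVIDIA hook: the host paths, the `rootfs` directory name, and the helper names are assumptions for illustration, and the code only computes mount pairs instead of mounting anything. The one grounded part is that the OCI runtime passes the container state as JSON on the hook's stdin.

```python
import io
import json
import posixpath

# Illustrative host paths only; a real hook discovers these from the installed driver.
ASSUMED_HOST_PATHS = ["/dev/nvidiactl", "/dev/nvidia0", "/usr/bin/nvidia-smi"]

def read_state(stream):
    """Parse the container state JSON that the runtime writes to the hook's stdin."""
    state = json.load(stream)
    for key in ("id", "bundle"):
        if key not in state:
            raise ValueError(f"state is missing {key!r}")
    return state

def plan_bind_mounts(state, host_paths, rootfs_dir="rootfs"):
    """Map each host path to the same path under the container rootfs.

    Assumes the rootfs lives at <bundle>/rootfs; the real location comes
    from root.path in the bundle's config.json.
    """
    root = posixpath.join(state["bundle"], rootfs_dir)
    return [(src, posixpath.join(root, src.lstrip("/"))) for src in host_paths]

# Stand-in for sys.stdin so the sketch runs without a container runtime.
sample = io.StringIO(json.dumps(
    {"ociVersion": "1.0.2", "id": "demo", "status": "created", "pid": 4242, "bundle": "/run/demo"}
))
state = read_state(sample)
mounts = plan_bind_mounts(state, ASSUMED_HOST_PATHS)
for src, dst in mounts:
    print(f"{src} -> {dst}")
```

In a real hook the planned pairs would be bind mounted into the container's mount namespace, which the hook can reach through the `pid` field of the state.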
BARE METAL: Summary & Resources
Bare Metal Enablement
- RHEL SELinux Policy for NVIDIA: https://github.com/NVIDIA/dgx-selinux
- How to enable NVIDIA GPUs in containers on bare metal in RHEL 8: https://www.redhat.com/en/blog/how-use-gpus-containers-bare-metal-rhel-8
- Simple Prestart Hook Implementation: https://github.com/zvonkok/oci-decorator
OPENSHIFT: Use a MachineSet to scale the cluster with a GPU node
[Diagram: control plane with three master nodes; CPU workers with three CPU nodes]
OPENSHIFT: Heterogeneous cluster with different compute units
[Diagram: control plane with three master nodes; CPU workers with three CPU nodes; GPU workers with one GPU node]
OPENSHIFT: Every node could have features that are interesting to different pods
[Diagram: a node with an accelerator; example features: GPU, FPGA, NIC, AVX512, kernel 4.18.0-80.1.2, RHEL 7, RHEL 8, RHCOS]
OPENSHIFT: Optimized workloads can be placed on the right node
Node labels: pci-10de=present, pci-8086=present, pci-1924=present, cpuid-AVX512=true, kernel-ver.major=4, os_release.ID=rhel
[Diagram: a node with an accelerator, labeled as above, with six pods placed on it]
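The placement idea can be sketched as a plain label match. The slide abbreviates the label names; the full keys below (with the `feature.node.kubernetes.io/` prefix) are an assumption about how Node Feature Discovery publishes them, and the node names and helper functions are hypothetical.

```python
PREFIX = "feature.node.kubernetes.io/"

# Hypothetical nodes with NFD-style labels (values are always strings).
nodes = {
    "gpu-node": {
        PREFIX + "pci-10de.present": "true",
        PREFIX + "kernel-version.major": "4",
    },
    "cpu-node": {
        PREFIX + "cpu-cpuid.AVX512F": "true",
        PREFIX + "kernel-version.major": "4",
    },
}

def matches(node_labels, node_selector):
    """A nodeSelector matches when every key/value pair is present on the node."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())

def eligible_nodes(nodes, node_selector):
    """Return the names of nodes whose labels satisfy the selector."""
    return sorted(name for name, labels in nodes.items() if matches(labels, node_selector))

# A pod that needs a device with PCI vendor ID 10de.
gpu_selector = {PREFIX + "pci-10de.present": "true"}
# A pod that only cares about the kernel major version.
kernel_selector = {PREFIX + "kernel-version.major": "4"}

print(eligible_nodes(nodes, gpu_selector))
print(eligible_nodes(nodes, kernel_selector))
```

A pod would carry the same key/value pairs in `spec.nodeSelector`, and the scheduler applies this kind of equality match when filtering nodes.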
BARE METAL: Node Feature Discovery
Exposing node features the easy way since 4.2
- Upstream NFD & Operator: https://github.com/kubernetes-sigs/node-feature-discovery
- Downstream NFD & Operator: https://github.com/openshift/node-feature-discovery
- Building Multiarch imageStream with the NFD Operator and OpenShift 4: https://www.openshift.com/blog/building-multiarch-imagestream-with-the-nfd-operator-and-openshift-4
OPENSHIFT ACCELERATOR STACK: Every step has to be validated; it makes no sense to advance until the previous step works
- driver-container: modules, userspace, hook
- driver-container-validation: small workload using the accelerator
OPENSHIFT ACCELERATOR STACK: The device plugin performs health checks and updates the node's capacity
- driver-container: modules, userspace, hook
- driver-container-validation: small workload using the accelerator
- device-plugin: expose the accelerator to the cluster
OPENSHIFT ACCELERATOR STACK: A pod must be able to allocate an extended resource
- driver-container: modules, userspace, hook
- driver-container-validation: small workload using the accelerator
- device-plugin: expose the accelerator to the cluster
- device-plugin-validation: allocate the resource and use it
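A validation pod of this kind could look roughly like the manifest built below. The image name and helper functions are hypothetical, and `nvidia.com/gpu` is assumed as the extended resource name advertised by the device plugin; the `fits` check is a simplified stand-in for what the scheduler does against a node's allocatable resources.

```python
import json

def gpu_pod(name, image, gpus=1, resource="nvidia.com/gpu"):
    """Build a Pod manifest that requests an extended resource via limits."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [{
                "name": name,
                "image": image,
                "resources": {"limits": {resource: gpus}},
            }],
        },
    }

def requested(pod, resource):
    """Sum the limits for one resource across all containers of the pod."""
    return sum(
        int(c.get("resources", {}).get("limits", {}).get(resource, 0))
        for c in pod["spec"]["containers"]
    )

def fits(pod, allocatable, resource="nvidia.com/gpu"):
    """True if the node still has enough of the resource for the pod."""
    return requested(pod, resource) <= int(allocatable.get(resource, 0))

# Hypothetical image; any small workload that exercises the accelerator would do.
pod = gpu_pod("gpu-validation", "example.com/vector-add:latest", gpus=1)
print(json.dumps(pod, indent=2))
print(fits(pod, {"nvidia.com/gpu": "2"}), fits(pod, {"cpu": "8"}))
```

If the pod runs to completion, both the device plugin registration and the resource allocation path have been exercised.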
OPENSHIFT ACCELERATOR STACK: A special resource node-exporter registers with the cluster monitoring stack
- driver-container: modules, userspace, hook
- driver-container-validation: small workload using the accelerator
- device-plugin: expose the accelerator to the cluster
- device-plugin-validation: allocate the resource and use it
- monitoring: set up Prometheus and Grafana, metrics and alerts
OPENSHIFT ACCELERATOR STACK: Discover advanced features of a special resource
- driver-container: modules, userspace, hook
- driver-container-validation: small workload using the accelerator
- device-plugin: expose the accelerator to the cluster
- device-plugin-validation: allocate the resource and use it
- monitoring: set up Prometheus and Grafana, metrics and alerts
- feature-discovery: sidecar container for NFD, fine-grained scheduling
OPENSHIFT ACCELERATOR STACK: The Special Resource Operator (SRO) is a pattern to enable special resources in OpenShift
SPECIAL RESOURCE OPERATOR
- driver-container: modules, userspace, hook
- driver-container-validation: small workload using the accelerator
- device-plugin: expose the accelerator to the cluster
- device-plugin-validation: allocate the resource and use it
- monitoring: set up Prometheus and Grafana, metrics and alerts
- feature-discovery: sidecar container for NFD, fine-grained scheduling
OPENSHIFT ACCELERATOR STACK: The Special Resource Operator (SRO) is a pattern to enable special resources in OpenShift
SPECIAL RESOURCE OPERATOR: driver-container, driver-container-validation, device-plugin, device-plugin-validation, monitoring, feature-discovery
CONFIGURATIONS
- Hard or Soft Partitioning
- Driver Version
- Custom Manifests
BARE METAL: Special Resource Operator
Enable special resources the OpenShift way
- Special Resource Operator: https://bit.ly/31utbkm
- How to use entitled image builds to build DriverContainers with UBI on OpenShift: https://red.ht/2XyOKz6
- Part 1: How to Enable Hardware Accelerators on OpenShift: https://red.ht/2JQuNwB
- Part 2: How to enable Hardware Accelerators on OpenShift, SRO Building Blocks: https://red.ht/34ubzq3
- Simplifying deployments of accelerated AI workloads on Red Hat OpenShift with NVIDIA GPU Operator: https://red.ht/34ubzq3
OPENSHIFT: Use the Cluster Autoscaler for on-demand GPU nodes
[Diagram: control plane with three master nodes; CPU workers with three CPU nodes; GPU workers with three GPU nodes]
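One way to wire this up on OpenShift is a MachineAutoscaler that points at the GPU MachineSet. The sketch below generates such a manifest; the MachineSet name `gpu-workers`, the replica bounds, and the helper name are hypothetical, and the API group versions are assumed from OpenShift 4 and may differ on a given cluster.

```python
import json

def machine_autoscaler(name, machineset, min_replicas, max_replicas,
                       namespace="openshift-machine-api"):
    """Build a MachineAutoscaler manifest targeting one MachineSet."""
    if not 0 <= min_replicas <= max_replicas:
        raise ValueError("need 0 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling.openshift.io/v1beta1",
        "kind": "MachineAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "scaleTargetRef": {
                "apiVersion": "machine.openshift.io/v1beta1",
                "kind": "MachineSet",
                "name": machineset,  # hypothetical GPU MachineSet
            },
        },
    }

autoscaler = machine_autoscaler("gpu-autoscaler", "gpu-workers", 1, 3)
print(json.dumps(autoscaler, indent=2))
```

With such an object in place, pending pods that request GPUs can trigger the creation of additional GPU nodes up to the configured maximum, provided a ClusterAutoscaler resource is also configured.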
OPENSHIFT: Hard Partitioning with Taints and Tolerations
[Diagram: control plane with three master nodes; CPU workers with three CPU nodes; GPU workers with three GPU nodes reserved for GPU-only pods]
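The effect of taints and tolerations can be sketched as a small matching function. The taint key `nvidia.com/gpu` and the function names are assumptions for illustration; the matching rules are a simplified reading of the Kubernetes semantics (key, operator, value, and effect), not the scheduler's actual code.

```python
def tolerates(toleration, taint):
    """True if a single toleration matches a single taint."""
    # An empty or missing effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    # An empty key is only valid with Exists and then matches every taint key.
    if not toleration.get("key"):
        return operator == "Exists"
    if toleration["key"] != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")

def schedulable(pod_tolerations, node_taints):
    """A pod can land on a node only if every blocking taint is tolerated."""
    blocking = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in pod_tolerations) for taint in blocking)

# Hypothetical taint placed on GPU nodes to keep other pods away.
gpu_taint = {"key": "nvidia.com/gpu", "value": "present", "effect": "NoSchedule"}
gpu_pod_tolerations = [{"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}]

print(schedulable(gpu_pod_tolerations, [gpu_taint]))  # GPU pod on a GPU node
print(schedulable([], [gpu_taint]))                   # CPU pod on a GPU node
```

Taints only repel pods; to also keep GPU pods off CPU nodes, the toleration is usually combined with a node selector or node affinity.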
OPENSHIFT: Combining hard and soft partitioning
[Diagram: control plane with three master nodes; CPU workers with three CPU nodes; GPU workers with three GPU nodes reserved for GPU-only pods, split into high-priority and low-priority pods]
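Reading the high-priority and low-priority split as Kubernetes pod priority, which is an interpretation of the diagram, soft partitioning could be sketched with two PriorityClass objects. The class names, values, and pod names are hypothetical, and the ordering function is only a simplified model of how pending pods are considered.

```python
def priority_class(name, value, description=""):
    """Build a PriorityClass manifest."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
        "description": description,
    }

# Hypothetical classes for GPU workloads.
classes = {
    pc["metadata"]["name"]: pc
    for pc in (
        priority_class("gpu-high", 1000, "latency-sensitive GPU pods"),
        priority_class("gpu-low", 100, "best-effort GPU pods"),
    )
}

def scheduling_order(pods, classes):
    """Higher priority first; pods with equal priority keep submission order."""
    return sorted(pods, key=lambda pod: -classes[pod["priorityClassName"]]["value"])

pending = [
    {"name": "batch-training", "priorityClassName": "gpu-low"},
    {"name": "online-serving", "priorityClassName": "gpu-high"},
    {"name": "nightly-job", "priorityClassName": "gpu-low"},
]
print([pod["name"] for pod in scheduling_order(pending, classes)])
```

A pod opts in by setting `spec.priorityClassName`; when GPUs are scarce, higher-priority pods may preempt lower-priority ones.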
OPENSHIFT: Quotas per namespace, or across multiple namespaces, for CPU, memory, and extended resources
[Diagram: control plane with three master nodes; CPU workers with three CPU nodes; GPU workers with three GPU nodes reserved for GPU-only pods, split into high-priority and low-priority pods and limited by quotas]
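A per-namespace quota that covers an extended resource might look like the manifest built below. The namespace, quota name, limits, and helper functions are hypothetical; `nvidia.com/gpu` is assumed as the resource name, and extended resources are quota-controlled through the `requests.` prefix.

```python
import json

GPU_KEY = "requests.nvidia.com/gpu"

def compute_quota(namespace, cpu, memory, gpus, name="compute-quota"):
    """Build a ResourceQuota covering CPU, memory, and an extended resource."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": str(cpu),
                "requests.memory": memory,
                GPU_KEY: str(gpus),
            }
        },
    }

def admits(quota, used_gpus, requested_gpus):
    """True if a new request still fits under the namespace's GPU quota."""
    return used_gpus + requested_gpus <= int(quota["spec"]["hard"][GPU_KEY])

# Hypothetical namespace for a training team.
quota = compute_quota("team-training", cpu=64, memory="256Gi", gpus=4)
print(json.dumps(quota, indent=2))
print(admits(quota, used_gpus=3, requested_gpus=1), admits(quota, used_gpus=3, requested_gpus=2))
```

Quotas spanning several namespaces need a cluster-scoped mechanism; on OpenShift that is typically a ClusterResourceQuota, which is not shown here.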
OPENSHIFT: Clustering nodes with specific roles
[Diagram: control plane with three master nodes; CPU workers with three CPU nodes; two GPU node groups, GPU TRAINING and GPU INFERENCE, each with three GPU nodes reserved for GPU-only pods, split into high-priority and low-priority pods and limited by quotas]
OPENSHIFT: With affinities, pods can be attracted to nodes, repelled from them, or not scheduled at all
[Diagram: control plane with three master nodes hosting infra-only pods; CPU workers with three CPU nodes for CPU-only pods; two GPU node groups, GPU TRAINING and GPU INFERENCE, each with three GPU nodes for GPU-only pods; every worker group is split into high-priority and low-priority pods and limited by quotas]
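Node affinity for role-specific node groups can be sketched as follows. The label key `example.com/gpu-role` and the helper names are hypothetical. The evaluation is a simplified model of required node affinity, where the selector terms are ORed and the expressions within a term are ANDed.

```python
def expr_matches(labels, expr):
    """Evaluate one matchExpression against a node's labels."""
    key, operator, values = expr["key"], expr["operator"], expr.get("values", [])
    if operator == "In":
        return labels.get(key) in values
    if operator == "NotIn":
        return labels.get(key) not in values  # also true when the label is absent
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {operator}")

def node_affinity_matches(labels, affinity):
    """Terms are ORed; expressions inside one term are ANDed."""
    required = affinity["nodeAffinity"]["requiredDuringSchedulingIgnoredDuringExecution"]
    for term in required["nodeSelectorTerms"]:
        exprs = term.get("matchExpressions", [])
        if exprs and all(expr_matches(labels, e) for e in exprs):
            return True
    return False

# Hypothetical affinity: attract the pod to training nodes only.
training_affinity = {
    "nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
                {"matchExpressions": [
                    {"key": "example.com/gpu-role", "operator": "In", "values": ["training"]},
                ]},
            ]
        }
    }
}

print(node_affinity_matches({"example.com/gpu-role": "training"}, training_affinity))
print(node_affinity_matches({"example.com/gpu-role": "inference"}, training_affinity))
```

If no node satisfies a required affinity, the pod stays pending, which is the "not scheduled" case from the slide; the preferred variant only expresses attraction.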
OPENSHIFT: Multus & high-speed interconnects
[Diagram: control plane with three master nodes hosting infra-only pods; CPU workers with three CPU nodes for CPU-only pods; GPU workers with three GPU nodes for GPU-only pods; worker groups are split into high-priority and low-priority pods, with quotas]
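With Multus, a secondary network is described by a NetworkAttachmentDefinition and attached to a pod through an annotation. The sketch below generates both pieces; the network name, the `macvlan` plugin choice, the host interface `ens1f0`, and the helper names are assumptions for illustration, not a recommended interconnect setup.

```python
import json

NETWORKS_ANNOTATION = "k8s.v1.cni.cncf.io/networks"

def network_attachment(name, cni_config, namespace="default"):
    """Build a NetworkAttachmentDefinition; the CNI config is embedded as a JSON string."""
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"config": json.dumps(cni_config)},
    }

def attach(pod, *network_names):
    """Annotate a pod so that Multus adds the named secondary networks."""
    annotations = pod.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[NETWORKS_ANNOTATION] = ",".join(network_names)
    return pod

# Hypothetical secondary network on a fast host interface.
fast_net = network_attachment("fast-net", {
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "ens1f0",  # assumed host interface name
    "ipam": {"type": "dhcp"},
})

pod = attach({"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "worker"}}, "fast-net")
print(json.dumps(fast_net, indent=2))
print(pod["metadata"]["annotations"])
```

The pod keeps its default cluster network interface and gains an additional interface for each listed attachment.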
OPENSHIFT: RDMA over InfiniBand or Ethernet
[Diagram: control plane with three master nodes hosting infra-only pods; CPU workers with three CPU nodes for CPU-only pods; GPU workers with three GPU nodes for GPU-only pods, connected via RDMA; worker groups are split into high-priority and low-priority pods, with quotas]
BARE METAL: Special Resource Operator
Enable special resources the OpenShift way
- Running the NV flowers demo on OpenShift: https://www.youtube.com/watch?v=TFP0oLG-ss8&feature=youtu.be
- Running RAPIDS with GPUs on OpenShift: https://www.youtube.com/watch?v=usV_STdcMHY&feature=youtu.be
- How to use GPUs with OKD 4.5: https://bit.ly/3gA73M5
Future Work: MIG Support, GPUDirect,
Thank you
Red Hat is the world’s leading provider of enterprise open source software solutions. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500.
- linkedin.com/company/red-hat
- youtube.com/user/RedHatVideos
- facebook.com/redhatinc
- twitter.com/RedHat