VSHN – The DevOps Company
• Located in Zürich
• Founded 2014 by ETHZ alumni
• Currently 33 VSHNeers
• First Kubernetes Certified Provider in Switzerland
• Authorized Docker Consulting Partner
performance for universities, research and teaching – further developed in the SCALE-UP project mandated by swissuniversities.
Your benefits
• Your data in Switzerland
• Integrated network and security
• Support for academic use cases
• Simple administration and billing
• Created together with you
Customers
• Universities
• Research institutions
• eLearning centers
• University hospitals
• Spin-offs
Services
• SWITCHengines (IaaS)
• Virtual Private Cloud (VPC)
• SCALE-UP (academic project)
Let's get technical!
• GPU in OpenStack VMs – technical details
• Exposing said GPUs to containers – technical details
• Use cases
• Where are we going with this?
of users at universities of applied science (classroom)
– Specialised training (Bioinformatics)
– Bachelor / Masters projects
• Research
– Across universities
– SDSC
• Enterprise
– Business continuity
– Off-site storage
– Datacenter migrations
Long-term storage of scientific data (LTS Project)
• 5–10+ years, lukewarm storage, S3 interface
• Off-site, regulatory demands, "Vault"
• Initial procurement of 3 PB underway
• Paid service in 2020
• Contact: [email protected]
• Backups etc. – increasing demand from many customers
• GPU – that's why we are here today
Make them known to nova in /etc/nova/nova.conf:

pci_passthrough_whitelist={"vendor_id":"10de"}
pci_alias={"name":"P100","vendor_id":"10de","product_id":"15f8"}
pci_alias={"name":"P100-12GB","vendor_id":"10de","product_id":"15f7"}
pci_alias={"name":"TitanXpVGA","vendor_id":"10de","product_id":"1b02"}
pci_alias={"name":"TitanXpAudio","vendor_id":"10de","product_id":"10ef"}
pci_alias={"name":"T4","vendor_id":"10de","device_type":"type-PF","product_id":"1eb8"}
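The vendor and product IDs used in the whitelist and aliases can be read straight off the PCI bus on the hypervisor. A quick way to list them (a sketch only, assuming a standard Linux host with pciutils installed):

# prints NVIDIA devices with their [vendor:product] IDs, e.g. [10de:15f8] for a P100
lspci -nn | grep -i nvidia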
shouldn't) NVIDIA doesn't want you to run them virtualized.
#459753 adds support for an img_hide_hypervisor_id image property; included in OpenStack Pike and above.
#555861 adds support for a hide_hypervisor_id flavor property; included in OpenStack Rocky and above.

openstack flavor create --private \
  --ram 47104 --disk 30 --vcpus 4 \
  --property pci_passthrough:alias='TitanXpVGA:1,TitanXpAudio:1' \
  --property hide_hypervisor_id=true \
  g1.c04r46-1titanxp
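As a sanity check, one can boot a VM with this flavor and confirm the guest actually sees the card. This is a sketch only; the image and network names are placeholders, not part of the slides:

openstack server create --flavor g1.c04r46-1titanxp \
  --image <your-image> --network <your-network> gpu-test

# inside the VM, after installing the NVIDIA driver:
lspci -nn | grep -i nvidia   # both Titan Xp functions (VGA + audio) should be listed
nvidia-smi                   # should report the GPU and driver version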
GPUs are only meant to be specified in the `limits` section:
– You can specify `limits` without `requests`
– If both are given, `limits` and `requests` must be equal
– You cannot specify `requests` without `limits`
• GPUs cannot be shared across containers and pods
• Each container can only request one or more whole GPUs – no "fractions" of GPUs (see the sketch below)
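In practice the GPU is requested through the extended resource nvidia.com/gpu in the limits section. A minimal pod along the lines of the cuda-vector-add example used later might look like this (the image tag is an assumption taken from the upstream Kubernetes sample, not from the slides):

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: k8s.gcr.io/cuda-vector-add:v0.1   # assumed sample image
    resources:
      limits:
        nvidia.com/gpu: 1   # whole GPUs only; requests may be omitted or must equal limits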
drivers must be pre-installed on the nodes
  – https://github.com/NVIDIA/nvidia-docker
• nvidia-container-runtime must be configured as the default runtime for Docker instead of runc
  – https://github.com/NVIDIA/k8s-device-plugin
• NVIDIA drivers ~= 361.93
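Making nvidia-container-runtime the default runtime is usually done in /etc/docker/daemon.json, followed by a Docker restart. A sketch of the common configuration (the runtime path may differ per distribution):

{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}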
• Does not require nvidia-docker
• Compatible with any CRI
• Installation:
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/stable/daemonset.yaml
More info: https://github.com/GoogleCloudPlatform/container-engine-accelerators
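Whichever device plugin is used, once its DaemonSet is running the GPUs should show up as allocatable resources on the node. A quick check (node name is a placeholder):

kubectl describe node <gpu-node> | grep -i nvidia.com/gpu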
GPU

oc create -f cuda-vector-add.yaml

oc get pods
NAME                                   READY   STATUS      RESTARTS   AGE
cuda-vector-add                        0/1     Completed   0          3s
nvidia-device-plugin-daemonset-s9ngg   1/1     Running     0          9m

oc logs cuda-vector-add
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
SDSC
• Various machine learning / deep learning projects
• GPUs available via https://renkulab.io/ (on containers, implemented by SDSC) or on VMs
FHNW
• Various Master projects
Various test instances
cloud?
• Usage patterns?
• Desired delivery? (VM, container)
• K8s or OpenShift?
• Should we continue building it?
• Will you use it?
– TitanXP: CHF 0.50 / hour
– Tesla T4: CHF 0.75 / hour
– P100: CHF 1.00 / hour
+ cost of the VM (discounts with higher usage)
• Available to VM projects in the ZH region
• OpenShift powered by Appuio and OpenStack – service in 2020
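To put the hourly rates in perspective: a P100 attached around the clock for a 30-day month works out to roughly 720 h × CHF 1.00 ≈ CHF 720, plus the cost of the VM itself and before any volume discounts.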