KEP-3063: Dynamic resource allocation

KEP-3063: Dynamic resource allocation
KEP持ち寄り会 (KEP potluck meetup), 2023/11/27, #kepjp, @bells17

▶ @bells17
▶ Software Engineer @ 3-shake inc.
▶ kubernetes & kubernetes-csi member
▶ Kubernetes Internal Organizer
▶ #kubenews
▶ X (Twitter): @bells17_
▶ GitHub: @bells17


The following article covers this topic far more thoroughly than this talk. @toVersus is the best.
https://zenn.dev/toversus/articles/fe2aa06f133b49
Answer: read this article.

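What is Dynamic resource allocation (DRA)?
▶ One of the alpha APIs added in Kubernetes v1.26
▶ A feature for configuring and allocating various devices (resources) for workloads
▶ Used, for example, to allocate devices such as GPUs and FPGAs to Pods more flexibly
▶ It makes the following possible
+ Access to the same resource from different Pods and containers
+ Allocation of the most suitable resource for a given resource request
+ Resource initialization according to user-specified parameters
https://kubernetes.io/blog/2022/12/15/dynamic-resource-allocation/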

But can't you already use GPUs and the like just fine with a Device Plugin?

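DRA makes the following possible, none of which Device Plugins could do
▶ Access to the same resource from different Pods and containers
▶ Reconfiguring or reprogramming devices such as FPGAs for a workload
▶ Cleaning up the workload-specific device configuration when the workload finishes
▶ Partial allocation of a resource (e.g. dynamically allocating MIG devices of an NVIDIA GPU on request)
▶ Optional allocation of devices
▶ Support for devices attached over a network or fabric
▶ Running per-device initialization
https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md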

DRA Driver (Plugin) implementation examples

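I found the following three DRA plugins
▶ kubernetes-sigs/dra-example-driver: sample implementation
+ A sample that mimics allocating GPUs to containers. It needs no real GPU and runs with just kind, so it is ideal for learning about DRA plugins
▶ NVIDIA/k8s-dra-driver: plugin for NVIDIA GPUs
+ Easy to follow: the repository links to a Google Docs document explaining how it works and to a demo video
+ Incidentally, NVIDIA/gpu-operator is for provisioning the GPUs on a node, so it appears to be a separate thing
▶ intel/intel-resource-drivers-for-kubernetes: plugin for Intel GPUs
+ intel/intel-device-plugins-for-kubernetes contains device plugins for many kinds of devices (FPGA/QAT/SGX/DLB/IAA and so on), so those may be supported here eventually as well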

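dra-example-driver
▶ https://github.com/kubernetes-sigs/dra-example-driver contains a sample Resource Driver
▶ The sample mimics allocating GPUs to containers
▶ Because it is a sample implementation, no GPU is actually allocated
▶ It mainly demonstrates cases such as sharing a single GPU among multiple Pods/containers
▶ It can be tried with just kind, so trying it hands-on first is recommended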

What you can try out with dra-example-driver
https://github.com/kubernetes-sigs/dra-example-driver/blob/main/demo/demo-apps.png

DRA Driver (Plugin) architecture

Component diagram
https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/components.png

Flow up to the point where Pod startup begins
This is the flow for AllocationMode=WaitForFirstConsumer (with Immediate, allocation happens first and scheduling follows).
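The difference between the two allocation modes can be sketched as an ordering of steps. This is only an illustration: the step labels are descriptive text invented for this sketch, not API calls, and the negotiation between the scheduler and the driver is collapsed into a single step.

```go
package main

import "fmt"

// AllocationMode mirrors the two modes named on the slide.
type AllocationMode string

const (
	WaitForFirstConsumer AllocationMode = "WaitForFirstConsumer"
	Immediate            AllocationMode = "Immediate"
)

// startupSteps returns a simplified ordering of the steps for a Pod that
// references a ResourceClaim. The labels are illustrative, not API calls.
func startupSteps(mode AllocationMode) []string {
	schedule := "scheduler selects a node for the Pod"
	allocate := "DRA driver controller allocates the resource"
	prepare := "kubelet asks the kubelet plugin to prepare the resource"
	run := "container runtime starts the containers"

	if mode == Immediate {
		// Allocation happens before any Pod is scheduled.
		return []string{allocate, schedule, prepare, run}
	}
	// WaitForFirstConsumer: allocation is delayed until a Pod needs the
	// claim, so a node is chosen first.
	return []string{schedule, allocate, prepare, run}
}

func main() {
	for _, mode := range []AllocationMode{WaitForFirstConsumer, Immediate} {
		fmt.Println(mode)
		for i, step := range startupSteps(mode) {
			fmt.Printf("  %d. %s\n", i+1, step)
		}
	}
}
```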

For volumes, kubelet's Volume Manager waits for the volume setup to finish.

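Flow until kubelet actually starts the Pod
Incidentally, the resource claim controller, which does not appear in this diagram, is responsible for tasks such as generating ResourceClaims from ResourceClaimTemplates.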

Kubelet Plugin Watcher

Communication between kubelet and the CSI Driver

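Container Device Interface (CDI)
▶ A container runtime specification for supporting third-party devices
▶ With containerd, it becomes available by enabling "enable_cdi"
▶ By default, configuration files that follow the CDI specification are apparently placed under /etc/cdi or /var/run/cdi
▶ The container's startup configuration can apparently be updated according to the contents of those files
▶ The fields that can be updated are described in detail under OCI Edits in the SPEC
https://github.com/cncf-tags/container-device-interface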

$ kubectl exec -it -n gpu-test4 pod0 -- env | grep GPU
GPU_NODE_NAME=dra-example-driver-cluster-worker
GPU_DEVICE_0=GPU-f11773a1-5bfb-e48b-3d98-1beb5baaf08e
GPU_DEVICE_1=GPU-93d37703-997c-c46f-a531-755e3e0dc2ac
GPU_DEVICE_2=GPU-e7b42cb1-4fd8-91b2-bc77-352a0c1f5747
GPU_DEVICE_3=GPU-ee3e4b55-fcda-44b8-0605-64b7a9967744

Looking inside a dra-example-driver container, you can see that environment variables have been set.
→ These apparently represent the device allocation in this demo.
Each "GPU_DEVICE_x" environment variable holds a UUID that the driver generated arbitrarily.

$ docker exec -it d67f749149f5 /bin/bash
root@dra-example-driver-cluster-worker:/# cat /var/run/cdi/k8s.gpu.resource.example.com-gpu_4cae1474-ed97-4edf-876e-fa73c3af279b.yaml
cdiVersion: 0.3.0
containerEdits: {}
devices:
- containerEdits:
    env:
    - GPU_DEVICE_0=GPU-f11773a1-5bfb-e48b-3d98-1beb5baaf08e
  name: GPU-f11773a1-5bfb-e48b-3d98-1beb5baaf08e
- containerEdits:
    env:
    - GPU_DEVICE_1=GPU-93d37703-997c-c46f-a531-755e3e0dc2ac
  name: GPU-93d37703-997c-c46f-a531-755e3e0dc2ac
- containerEdits:
    env:
    - GPU_DEVICE_2=GPU-e7b42cb1-4fd8-91b2-bc77-352a0c1f5747
  name: GPU-e7b42cb1-4fd8-91b2-bc77-352a0c1f5747
- containerEdits:
    env:
    - GPU_DEVICE_3=GPU-ee3e4b55-fcda-44b8-0605-64b7a9967744
  name: GPU-ee3e4b55-fcda-44b8-0605-64b7a9967744
kind: k8s.gpu.resource.example.com/gpu

These environment variables are apparently set by the file under /var/run/cdi.
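A spec file like the one above could be produced along the following lines. This is a minimal sketch, not the dra-example-driver code: the struct types are simplified local stand-ins for the CDI spec (only the fields seen in the file above), buildSpec is a hypothetical helper, and the output is JSON rather than YAML so that only the standard library is needed, on the assumption that the runtime also accepts JSON spec files.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ContainerEdits is a simplified stand-in covering only the env edits
// used in the demo; the real CDI spec defines more fields.
type ContainerEdits struct {
	Env []string `json:"env,omitempty"`
}

// Device pairs a device name with the edits applied when it is requested.
type Device struct {
	Name           string         `json:"name"`
	ContainerEdits ContainerEdits `json:"containerEdits"`
}

// Spec holds the top-level fields seen in the demo's spec file.
type Spec struct {
	Version string   `json:"cdiVersion"`
	Kind    string   `json:"kind"`
	Devices []Device `json:"devices"`
}

// buildSpec (hypothetical helper) creates one device per UUID and gives
// each a GPU_DEVICE_<index> environment variable, mirroring the demo output.
func buildSpec(kind string, uuids []string) Spec {
	spec := Spec{Version: "0.3.0", Kind: kind}
	for i, uuid := range uuids {
		spec.Devices = append(spec.Devices, Device{
			Name: uuid,
			ContainerEdits: ContainerEdits{
				Env: []string{fmt.Sprintf("GPU_DEVICE_%d=%s", i, uuid)},
			},
		})
	}
	return spec
}

func main() {
	spec := buildSpec("k8s.gpu.resource.example.com/gpu", []string{
		"GPU-f11773a1-5bfb-e48b-3d98-1beb5baaf08e",
		"GPU-93d37703-997c-c46f-a531-755e3e0dc2ac",
	})
	out, err := json.MarshalIndent(spec, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```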

func (c *criService) containerSpecOpts(config *runtime.ContainerConfig, imageConfig *imagespec.ImageConfig) ([]oci.SpecOpts, error) {
	var specOpts []oci.SpecOpts
	...
	if c.config.EnableCDI {
		specOpts = append(specOpts, customopts.WithCDI(config.Annotations, config.CDIDevices))
	}
	return specOpts, nil
}

The containerd code shows that enabling enable_cdi adds the device configuration to the OCI container Spec.
https://github.com/containerd/containerd/blob/v1.7.9/pkg/cri/server/container_create_linux.go#L417-L419

// Apply edits to the given OCI Spec. Updates the OCI Spec in place.
// Returns an error if the update fails.
func (e *ContainerEdits) Apply(spec *oci.Spec) error {
	if spec == nil {
		return errors.New("can't edit nil OCI Spec")
	}
	if e == nil || e.ContainerEdits == nil {
		return nil
	}

	specgen := ocigen.NewFromSpec(spec)
	if len(e.Env) > 0 {
		specgen.AddMultipleProcessEnv(e.Env)
	}
	...
	return nil
}

This is where the env entries are finally added.
https://github.com/cncf-tags/container-device-interface/blob/main/pkg/cdi/container-edits.go#L70-L148

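The specification covers a lot of ground and is tedious to understand in full, but reading the containerd code shows that it only makes small modifications to the OCI Runtime Spec, so the mechanism is easier to understand from the code.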
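The "small modification to the OCI Runtime Spec" can be illustrated with a stripped-down model. The Spec and Process types below are local stand-ins, not the real OCI or CDI packages, and applyEnv is a hypothetical helper. Replacing an existing variable with the same key is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Process and Spec are minimal stand-ins for the parts of the OCI runtime
// spec touched by env edits.
type Process struct {
	Env []string
}

type Spec struct {
	Process *Process
}

// applyEnv (hypothetical helper) merges KEY=VALUE entries into the spec in
// place. An entry whose key already exists replaces the old value; this
// override behavior is an assumption of the sketch.
func applyEnv(spec *Spec, env []string) error {
	if spec == nil {
		return errors.New("can't edit nil OCI Spec")
	}
	if spec.Process == nil {
		spec.Process = &Process{}
	}
	for _, kv := range env {
		key, _, _ := strings.Cut(kv, "=")
		replaced := false
		for i, existing := range spec.Process.Env {
			existingKey, _, _ := strings.Cut(existing, "=")
			if existingKey == key {
				spec.Process.Env[i] = kv
				replaced = true
				break
			}
		}
		if !replaced {
			spec.Process.Env = append(spec.Process.Env, kv)
		}
	}
	return nil
}

func main() {
	spec := &Spec{Process: &Process{Env: []string{"PATH=/usr/bin"}}}
	edits := []string{"GPU_DEVICE_0=GPU-f11773a1-5bfb-e48b-3d98-1beb5baaf08e"}
	if err := applyEnv(spec, edits); err != nil {
		panic(err)
	}
	fmt.Println(spec.Process.Env)
}
```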

---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  namespace: gpu-test4
  name: multiple-gpus
spec:
  resourceClassName: gpu.example.com
  parametersRef:
    apiGroup: gpu.resource.example.com
    kind: GpuClaimParameters
    name: multiple-gpus
---
apiVersion: gpu.resource.example.com/v1alpha1
kind: GpuClaimParameters
metadata:
  namespace: gpu-test4
  name: multiple-gpus
spec:
  count: 4

Through the parametersRef of a ResourceClaim or ResourceClass, an arbitrary resource such as a CRD can be used to pass parameters for configuring the resource (device).
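How a driver might follow parametersRef to its parameters object can be sketched with an in-memory lookup. Everything here is hypothetical and only mirrors the shape of the manifests above: a real driver would read the object from the API server, and the default of one device for a claim without parametersRef is an assumption made for this sketch.

```go
package main

import "fmt"

// ParametersRef mirrors the three fields of parametersRef shown above.
type ParametersRef struct {
	APIGroup string
	Kind     string
	Name     string
}

// GpuClaimParameters mirrors the spec of the example CRD.
type GpuClaimParameters struct {
	Count int
}

// objectKey identifies a namespaced parameters object in the fake store.
type objectKey struct {
	Namespace string
	Ref       ParametersRef
}

// parameterStore stands in for the API server in this sketch.
type parameterStore map[objectKey]GpuClaimParameters

// resolveParameters (hypothetical helper) looks up the object that a claim's
// parametersRef points at, in the claim's own namespace.
func resolveParameters(store parameterStore, namespace string, ref *ParametersRef) (GpuClaimParameters, error) {
	if ref == nil {
		// Assumption of this sketch: no parametersRef means one device.
		return GpuClaimParameters{Count: 1}, nil
	}
	params, ok := store[objectKey{Namespace: namespace, Ref: *ref}]
	if !ok {
		return GpuClaimParameters{}, fmt.Errorf("%s %s/%s not found", ref.Kind, namespace, ref.Name)
	}
	return params, nil
}

func main() {
	ref := ParametersRef{APIGroup: "gpu.resource.example.com", Kind: "GpuClaimParameters", Name: "multiple-gpus"}
	store := parameterStore{
		{Namespace: "gpu-test4", Ref: ref}: {Count: 4},
	}
	params, err := resolveParameters(store, "gpu-test4", &ref)
	if err != nil {
		panic(err)
	}
	fmt.Println("devices requested:", params.Count)
}
```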

DRA Driver (Plugin) architecture summary
▶ A DRA Driver consists of the following two components
+ controller: sets up and cleans up devices through "(De)Allocate", given the Node selected for a ResourceClaim and the resource (device) request
+ kubelet plugin: on Pod start/stop, shares device information with the container (runtime) through "Node(Un)PrepareResources" by way of CDI spec files
▶ Resources added or extended along with DRA
+ Pod (resourceClaims/resources fields, etc.)
+ ResourceClass: used for the DRA Driver settings and parameter settings that ResourceClaims use
+ ResourceClaim: the resource (device) counterpart of a PVC
+ ResourceClaimTemplate: template resource for ResourceClaims
+ PodSchedulingContext: resource used for scheduling with WaitForFirstConsumer
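The split between the two components can be sketched as a pair of Go interfaces with an in-memory fake behind them. The method names follow the operations named above, but the signatures are simplified inventions for this sketch (the real driver uses Kubernetes API objects and a gRPC service), and the CDI device name format used by the fake is only illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

// Controller models the cluster-side half: allocation and deallocation.
type Controller interface {
	Allocate(claim, node string) error
	Deallocate(claim string) error
}

// KubeletPlugin models the node-side half, called around Pod start/stop.
// NodePrepareResources returns the CDI device names handed to the runtime.
type KubeletPlugin interface {
	NodePrepareResources(claim string) ([]string, error)
	NodeUnprepareResources(claim string) error
}

// fakeDriver is an in-memory stand-in implementing both halves.
type fakeDriver struct {
	allocated map[string]string // claim -> node
	prepared  map[string]bool   // claim -> prepared on its node
}

func newFakeDriver() *fakeDriver {
	return &fakeDriver{allocated: map[string]string{}, prepared: map[string]bool{}}
}

func (d *fakeDriver) Allocate(claim, node string) error {
	if _, ok := d.allocated[claim]; ok {
		return errors.New("claim already allocated")
	}
	d.allocated[claim] = node
	return nil
}

func (d *fakeDriver) Deallocate(claim string) error {
	if d.prepared[claim] {
		return errors.New("claim is still prepared on a node")
	}
	delete(d.allocated, claim)
	return nil
}

func (d *fakeDriver) NodePrepareResources(claim string) ([]string, error) {
	if _, ok := d.allocated[claim]; !ok {
		return nil, errors.New("claim is not allocated")
	}
	d.prepared[claim] = true
	// Illustrative device name only; a real driver would write a CDI spec file.
	return []string{"example.com/gpu=" + claim}, nil
}

func (d *fakeDriver) NodeUnprepareResources(claim string) error {
	delete(d.prepared, claim)
	return nil
}

// Compile-time checks that the fake satisfies both interfaces.
var (
	_ Controller    = (*fakeDriver)(nil)
	_ KubeletPlugin = (*fakeDriver)(nil)
)

func main() {
	d := newFakeDriver()
	if err := d.Allocate("multiple-gpus", "worker-1"); err != nil {
		panic(err)
	}
	devices, err := d.NodePrepareResources("multiple-gpus")
	if err != nil {
		panic(err)
	}
	fmt.Println("CDI devices handed to the runtime:", devices)
	if err := d.NodeUnprepareResources("multiple-gpus"); err != nil {
		panic(err)
	}
	if err := d.Deallocate("multiple-gpus"); err != nil {
		panic(err)
	}
	fmt.Println("claim released")
}
```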

In short, the DRA mechanism works roughly like the volume plugin + CSI Driver combination.

The implementation details are easiest to understand by following the PR with the initial implementation.
https://github.com/kubernetes/kubernetes/pull/111023

The future of DRA

https://groups.google.com/a/kubernetes.io/g/dev/c/BDtCFfXQbw0?pli=1

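The future of DRA
▶ DRA is apparently planned to integrate with the Cluster Autoscaler as well
▶ The architecture may be reorganized as part of that work
▶ People from various SIGs are likely to join, and development looks set to become very active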

There also appears to be a long backlog of development tasks.
https://github.com/orgs/kubernetes/projects/95/views/1

Summary

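Summary
▶ Dynamic Resource Allocation (DRA) dynamically configures resources (devices) and allocates them to containers, with more flexible configuration than Device Plugins allow
▶ DRA does not make Device Plugins go away; the two are apparently expected to coexist, each used where it is needed
▶ Both the implementation and the design are neither too big nor too small, which made them a satisfying and enjoyable read, and I learned a lot about GPUs along the way
▶ The implementation may change enough to amount to a "DRA v2", which I am looking forward to
▶ There seem to be plenty of chances to contribute, so it is worth watching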

References
▶ https://zenn.dev/toversus/articles/fe2aa06f133b49
▶ https://kubernetes.io/blog/2022/12/15/dynamic-resource-allocation/
▶ https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md
▶ https://github.com/kubernetes-sigs/dra-example-driver/blob/main/demo/demo-apps.png
▶ https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/components.png
▶ https://github.com/cncf-tags/container-device-interface
▶ https://github.com/containerd/containerd/blob/v1.7.9/pkg/cri/server/container_create_linux.go#L417-L419
▶ https://github.com/cncf-tags/container-device-interface/blob/main/pkg/cdi/container-edits.go#L70-L148
▶ https://github.com/kubernetes/kubernetes/pull/111023
▶ https://github.com/orgs/kubernetes/projects/95/views/1
▶ https://github.com/kubernetes/dynamic-resource-allocation
▶ https://www.cncf.io/projects/akri/
▶ https://github.com/kubernetes-sigs/dra-example-driver
▶ https://github.com/NVIDIA/k8s-dra-driver
▶ https://github.com/intel/intel-resource-drivers-for-kubernetes
▶ https://github.com/intel/intel-device-plugins-for-kubernetes
▶ https://docs.google.com/document/d/1BNWqgx_SmZDi-va_V31v3DnuVwYnF2EmN7D-O_fB6Oo/edit#heading=h.bxuci8gx6hna
▶ https://drive.google.com/file/d/1iLg2FEAEilb1dcI27TnB19VYtbcvgKhS/view
▶ https://developer.nvidia.com/blog/nvidia-gpu-operator-simplifying-gpu-management-in-kubernetes/
▶ https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/overview.html
▶ https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/cdi.html
▶ https://intel.github.io/intel-device-plugins-for-kubernetes/README.html
▶ https://github.com/NVIDIA/k8s-device-plugin
▶ https://blogs.nvidia.com/blog/multi-instance-gpus/
▶ https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/
▶ https://groups.google.com/a/kubernetes.io/g/dev/c/BDtCFfXQbw0?pli=1
▶ https://kubernetes.slack.com/archives/C032ZE66A2X/p1700215190429689

Image sources
▶ https://github.com/kubernetes/community/tree/master/icons
▶ https://github.com/kubernetes/kubernetes/tree/master/logo
▶ https://github.com/cncf/artwork/tree/master/projects/kubernetes
▶ https://github.com/kubernetes/kubeadm/tree/main/logos

Thanks / Question?
▶ @bells17
▶ Slide: https://speakerdeck.com/bells17
▶ @bells17_
