Rancher for Beginners Series: Longhorn Vol. 2, Architecture Deep Dive

Wenhan Shi
September 24, 2020

This is the second installment of the Longhorn series. It recaps the meetup held in June and builds on it with an intermediate-level seminar focused on a deep dive into the architecture.

Transcript

  1. Rancher for Beginners Series: Longhorn Vol. 2, Architecture Deep Dive

    © Copyright 2020 Rancher Labs, Inc. All Rights Reserved.
    Rancher for Beginners Series: Longhorn Vol. 2, Architecture Deep Dive
    24 Sep 2020
    Wenhan Shi, Support Engineer
  2. Self-introduction

    • Wenhan Shi (施 文翰, Shi Bunkan)
      • @shi_wenhan
      • wenhan.shi@rancher.com
    • Career
      • Hitachi, Ltd.: maintenance and support of Linux kernel modules
      • Red Hat K.K.: GlusterFS/OpenShift support
      • Canonical Japan K.K.: Ubuntu/OpenStack/Kubernetes support
      • Rancher Labs, Inc.: Support Engineer
  3. Recap of the previous session: what is Longhorn?

    https://rancher.connpass.com/event/175715/

    • Distributed block storage software for cloud-native environments
    • 1.0 GA since 3 June 2020; 1.0.2 as of today
    • CNCF Sandbox project
    • Lightweight, yet highly reliable and easy to use (standalone web UI)
    • Installable from the Rancher catalog, kubectl, or helm
    • Usable on certified Kubernetes distributions
    • Per-volume size expansion, snapshot, backup, and restore
    • Independent of the backend storage; supports ext4 and xfs
  4. Architecture
  5. Architecture

    • Engine (Data Plane)
    • Manager (Control Plane)
  6. Architecture

    [Diagram. Labels: Worker node; Longhorn node (x2); Pod (x2), each with a PVC/PV and a Volume; Engine manager pod containing Engine Process (x2); Replica Manager Pod (x2) containing Replica Process (x4), each on Ext4/xfs; Longhorn Manager (DS) on each of the three nodes]
  7. Longhorn Manager

    [Diagram. Labels: Longhorn Manager (orchestrates all the volumes); Longhorn UI; Longhorn API; Longhorn CSI Plugin; Container Storage Interface API; Kubernetes API Server; Volume (CRD); Kubernetes Cluster; Engine - Replica; Engine - Engine]
  8. Longhorn Manager

    • Deployed in the Longhorn cluster as a DaemonSet
    • Receives operation requests from the Longhorn UI or from Kubernetes CSI
    • When a Longhorn volume is requested:
      • Communicates with the API server
      • Creates a Volume object (CRD)
      • Starts a Longhorn Engine process on the node where the volume is attached
      • Starts Longhorn Replica processes on the nodes scheduled to store the volume's data
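
The request flow on this slide can be sketched in a few lines of Python. This is only an illustration of the order of the steps, not Longhorn's implementation: `VolumeRecord`, `handle_volume_request`, and the "first N nodes" scheduling rule are all hypothetical names and simplifications introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeRecord:
    """Hypothetical stand-in for the Volume custom resource."""
    name: str
    size_bytes: int
    attach_node: str
    replica_nodes: list
    processes: list = field(default_factory=list)


def handle_volume_request(name, size_bytes, attach_node, storage_nodes, replica_count=3):
    """Walk through the steps listed on the slide (illustrative only)."""
    if replica_count > len(storage_nodes):
        raise ValueError("not enough storage nodes for the requested replicas")
    # Step 1: record the volume (the slide says a Volume CRD object is created).
    # Assumption: replicas simply go to the first N storage nodes.
    volume = VolumeRecord(name, size_bytes, attach_node, storage_nodes[:replica_count])
    # Step 2: one Engine process on the node where the volume is attached.
    volume.processes.append(("engine", attach_node))
    # Step 3: one Replica process on each node scheduled to hold the data.
    for node in volume.replica_nodes:
        volume.processes.append(("replica", node))
    return volume


vol = handle_volume_request("pvc-demo", 2 * 1024**3, "cp1", ["worker1", "worker2", "worker3"])
print(vol.processes)
```

The point of the sketch is the asymmetry: exactly one Engine process per volume, placed by where the volume is attached, and several Replica processes, placed by where the data is scheduled.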
  9. Longhorn Engine - Engine mode

    • One Engine process per volume
    • Up to 100,000 Engine processes
    • The Engine connects to the Replicas and implements the volume's data plane
      • On write, the write is performed on all Replicas
      • On read, one healthy Replica is chosen to serve the request
    • Automatic recovery is possible when a Longhorn Engine pod crashes
      • The Engine process is recreated
      • The Replica processes are re-associated
      • However, remounting inside the Pod requires a liveness probe configured against the Longhorn volume
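
A minimal sketch of the write and read rule described on this slide, using in-memory objects. The `Replica` and `Engine` classes here are hypothetical and exist only for illustration; the real Engine talks to Replica processes over TCP and handles many cases this sketch ignores. Skipping unhealthy Replicas on write is an assumption of the sketch.

```python
class Replica:
    """In-memory stand-in for a Replica process (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.blocks = {}

    def write(self, offset, data):
        if not self.healthy:
            raise IOError(f"{self.name} is unavailable")
        self.blocks[offset] = data

    def read(self, offset):
        if not self.healthy:
            raise IOError(f"{self.name} is unavailable")
        return self.blocks[offset]


class Engine:
    """One Engine per volume, fanning I/O out to its Replicas."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, offset, data):
        # A write is applied to every Replica that is currently healthy.
        targets = [r for r in self.replicas if r.healthy]
        if not targets:
            raise IOError("no healthy replica left")
        for replica in targets:
            replica.write(offset, data)

    def read(self, offset):
        # A read is served by a single healthy Replica.
        for replica in self.replicas:
            if replica.healthy:
                return replica.read(offset)
        raise IOError("no healthy replica left")


replicas = [Replica("worker1"), Replica("worker2"), Replica("worker3")]
engine = Engine(replicas)
engine.write(0, b"hello")
replicas[0].healthy = False   # simulate losing one Replica
data = engine.read(0)         # still served by another healthy Replica
print(data)
```

Because every Replica received the write, losing one of them does not affect the read.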
  10. Longhorn Engine - Replica mode

    • Multiple Replica processes per Engine process
    • Connects to a disk and stores the volume's data in a directory
      • The path can be checked in the Longhorn UI
    • Each volume can have multiple Replicas, and each Replica stores the full data
      • Even if only one Replica survives a failure, all data can be recovered
    • Each Replica keeps a "timestamp of the last write", which makes it possible to identify the latest data
  11. Pod with Longhorn Volume - 1

    • Prepare a 4-node environment and use three of the nodes for Longhorn
    • Run all workloads on the cp1 node

        ❯ kubectl get node
        NAME                    STATUS   ROLES                      AGE     VERSION
        longhorn-demo-cp1       Ready    controlplane,etcd,worker   2d19h   v1.18.8
        longhorn-demo-worker1   Ready    worker                     2d19h   v1.18.8
        longhorn-demo-worker2   Ready    worker                     2d19h   v1.18.8
        longhorn-demo-worker3   Ready    worker                     2d19h   v1.18.8

    [Diagram. Labels: cp1 (also a worker) with the Pod; worker1; worker2; worker3]
  12. Pod with Longhorn Volume - 2

    • Specify the StorageClass, and the volume becomes available to the Pod

        ❯ cat pod1.yaml
        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: longhorn-volv-pvc-1
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: longhorn
          resources:
            requests:
              storage: 2Gi
        …

        ❯ kubectl get pod -o wide
        NAME            READY   STATUS    RESTARTS   AGE     IP           NODE                NOMINATED NODE   READINESS GATES
        longhorn-demo   1/1     Running   0          3m28s   10.42.0.24   longhorn-demo-cp1   <none>           <none>
  13. Pod with Longhorn Volume - 2

    • The volume, PV, and PVC are created automatically

        ❯ kubectl get pvc
        NAME                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
        longhorn-volv-pvc-1   Bound    pvc-fa8fb473-01d5-4544-92a1-11b122115905   2Gi        RWO            longhorn       29m

        ❯ kubectl get pv
        NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                         STORAGECLASS   REASON   AGE
        pvc-fa8fb473-01d5-4544-92a1-11b122115905   2Gi        RWO            Delete           Bound    default/longhorn-volv-pvc-1   longhorn                29m

        root@longhorn-demo-cp1:~# lsblk
        NAME   MAJ:MIN   RM   SIZE   RO   TYPE   MOUNTPOINT
        sda    8:0       0    2G     0    disk   /var/lib/kubelet/pods/a6db3565-c598-4fe0-8b75-d8c850be075f/volumes/……
  14. Pod with Longhorn Volume - 3

    • The volume's Replicas are created under /var/lib/longhorn on each Longhorn node
  15. Pod with Longhorn Volume - 4

    • A quick look inside

        root@longhorn-demo-worker1:/var/lib/longhorn# tree
        .
        ├── engine-binaries
        │   └── longhornio-longhorn-engine-v1.0.2
        │       └── longhorn
        ├── longhorn-disk.cfg
        └── replicas
            ├── pvc-47892157-f218-4fed-ac2e-9b1505844bae-1998aacc
            │   ├── revision.counter
            │   ├── volume-head-000.img
            │   ├── volume-head-000.img.meta
            │   └── volume.meta
            └── pvc-fa8fb473-01d5-4544-92a1-11b122115905-9c450d92
                ├── revision.counter
                ├── volume-head-000.img
                ├── volume-head-000.img.meta
                └── volume.meta
  16. Demo - Pod inside Longhorn Namespace

        $ kubectl get node
        NAME               STATUS   ROLES                      AGE    VERSION
        wshi-all-in-one1   Ready    controlplane,etcd,worker   2d4h   v1.17.6
        wshi-longhorn1     Ready    worker                     2d4h   v1.17.6
        wshi-longhorn2     Ready    worker                     2d4h   v1.17.6
        wshi-longhorn3     Ready    worker                     2d4h   v1.17.6

        $ kubectl get pod -n longhorn-system -o wide
        NAME                          READY   STATUS    RESTARTS   AGE    IP           NODE               NOMINATED NODE   READINESS GATES
        …
        instance-manager-e-15bd5bd3   1/1     Running   0          2d4h   10.42.2.6    wshi-longhorn2     <none>           <none>
        instance-manager-e-6bda88b5   1/1     Running   0          2d4h   10.42.1.5    wshi-longhorn1     <none>           <none>
        instance-manager-e-a91a7513   1/1     Running   0          2d4h   10.42.0.14   wshi-all-in-one1   <none>           <none>
        instance-manager-e-bbdfb559   1/1     Running   0          2d4h   10.42.3.12   wshi-longhorn3     <none>           <none>
        instance-manager-r-ba3a2b7e   1/1     Running   0          2d4h   10.42.0.13   wshi-all-in-one1   <none>           <none>
        instance-manager-r-c07dc8f5   1/1     Running   0          2d4h   10.42.3.11   wshi-longhorn3     <none>           <none>
        instance-manager-r-f075b217   1/1     Running   0          2d4h   10.42.2.7    wshi-longhorn2     <none>           <none>
        instance-manager-r-fffb8270   1/1     Running   0          2d4h   10.42.1.6    wshi-longhorn1     <none>           <none>
        …
        longhorn-manager-2zw42        1/1     Running   0          2d4h   10.42.3.2    wshi-longhorn3     <none>           <none>
        longhorn-manager-cjz8b        1/1     Running   0          2d4h   10.42.0.7    wshi-all-in-one1   <none>           <none>
        longhorn-manager-jqn4r        1/1     Running   0          2d4h   10.42.2.3    wshi-longhorn2     <none>           <none>
        longhorn-manager-xb2c7        1/1     Running   0          2d4h   10.42.1.3    wshi-longhorn1     <none>           <none>
  17. Demo - process inside Engine manager Pod

        $ kubectl exec instance-manager-e-a91a7513 -n longhorn-system -- ps aux
        USER  PID    %CPU  %MEM  VSZ      RSS    TTY  STAT  START  TIME  COMMAND
        root  1      0.0   0.0   4520     864    ?    Ss    Jun22  0:04  /tini -- engine-manager --debug daemon --listen 0.0.0.0:8500
        root  7      0.1   0.1   1452792  6648   ?    Sl    Jun22  6:09  longhorn-instance-manager --debug daemon --listen 0.0.0.0:8500
        root  9      0.0   0.1   1304072  5016   ?    Sl    Jun22  0:14  tgtd -f
        root  10     0.0   0.0   4536     776    ?    S     Jun22  0:00  tee /var/log/tgtd.log
        root  2629   0.3   0.5   860052   20276  ?    Sl    Jun22  9:17  /engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn controller pvc-bc36f842-f592-4eea-ae78-cdff01f0b893 --frontend tgt-blockdev --replica tcp://10.42.2.7:10000 --replica tcp://10.42.1.6:10000 --replica tcp://10.42.3.11:10000 --listen 0.0.0.0:10000
        root  2648   0.0   0.0   0        0      ?    Z     Jun22  0:00  [sleep] <defunct>
        root  2688   0.0   0.0   0        0      ?    Z     Jun22  0:00  [sleep] <defunct>
        root  7415   0.3   0.5   933784   20880  ?    Sl    Jun22  9:20  /engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn controller pvc-db9d123e-332c-452d-8c04-d0041d717b3e --frontend tgt-blockdev --replica tcp://10.42.3.11:10015 --replica tcp://10.42.1.6:10015 --replica tcp://10.42.2.7:10015 --listen 0.0.0.0:10001
        root  7425   0.0   0.0   0        0      ?    Z     Jun22  0:00  [sleep] <defunct>
        root  7475   0.0   0.0   0        0      ?    Z     Jun22  0:00  [sleep] <defunct>
        root  14039  0.0   0.0   34404    2872   ?    Rs    06:37  0:00  ps aux
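
To read the `longhorn controller` lines in this output more easily, a small parser can pull out the volume name and the Replica addresses. This is a hedged sketch based only on the command lines shown on the slide; `parse_controller_cmdline` is a hypothetical helper, and it assumes every flag after the volume name takes exactly one value, as is the case in this output.

```python
import shlex


def parse_controller_cmdline(cmdline):
    """Extract volume name, frontend, listen address and replica URLs."""
    tokens = shlex.split(cmdline)
    idx = tokens.index("controller")
    info = {"volume": tokens[idx + 1], "frontend": None, "listen": None, "replicas": []}
    flags = iter(tokens[idx + 2:])
    for flag in flags:
        value = next(flags, None)  # assumption: each flag is followed by one value
        if flag == "--replica":
            info["replicas"].append(value)
        elif flag == "--frontend":
            info["frontend"] = value
        elif flag == "--listen":
            info["listen"] = value
    return info


cmd = (
    "/engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn controller "
    "pvc-bc36f842-f592-4eea-ae78-cdff01f0b893 --frontend tgt-blockdev "
    "--replica tcp://10.42.2.7:10000 --replica tcp://10.42.1.6:10000 "
    "--replica tcp://10.42.3.11:10000 --listen 0.0.0.0:10000"
)
info = parse_controller_cmdline(cmd)
print(info["volume"], info["replicas"])
```

The three Replica addresses match the IPs of the `instance-manager-r-*` pods on the three Longhorn nodes in the previous slide, which shows one Engine process fanning out to one Replica process per node.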
  18. Demo - process inside Replica manager Pod

        $ kubectl exec instance-manager-r-c07dc8f5 -n longhorn-system -- ps aux
        USER  PID    %CPU  %MEM  VSZ      RSS    TTY  STAT  START  TIME   COMMAND
        root  1      0.0   0.0   4520     832    ?    Ss    Jun22  0:06   /tini -- longhorn-instance-manager --debug daemon --listen 0.0.0.0:8500
        root  6      1.0   0.3   1747720  15672  ?    Sl    Jun22  32:14  longhorn-instance-manager --debug daemon --listen 0.0.0.0:8500
        root  2567   0.1   0.4   933784   18392  ?    Sl    Jun22  3:53   /host/var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn replica /host/var/lib/longhorn/replicas/pvc-bc36f842-f592-4eea-ae78-cdff01f0b893-258cb451 --size 3221225472 --listen 0.0.0.0:10000
        root  2574   0.0   0.4   786320   19352  ?    Sl    Jun22  2:30   /host/var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn sync-agent --listen 0.0.0.0:10002 --replica 0.0.0.0:10000 --listen-port-range 10003-10012
        root  7204   0.1   0.7   1228712  30100  ?    Sl    Jun22  3:56   /host/var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn replica /host/var/lib/longhorn/replicas/pvc-db9d123e-332c-452d-8c04-d0041d717b3e-a3f5c04a --size 3221225472 --listen 0.0.0.0:10015
        root  7209   0.0   0.4   786320   19176  ?    Sl    Jun22  2:29   /host/var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn sync-agent --listen 0.0.0.0:10017 --replica 0.0.0.0:10015 --listen-port-range 10018-10027
        root  12632  0.0   0.0   34404    2916   ?    Rs    06:42  0:00   ps aux
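
Two numbers in this output are worth decoding: the `--size` value and the port layout. The sketch below converts the size to GiB and reproduces the port pattern visible on this slide (replica on the base port, sync-agent on base + 2, a ten-port range from base + 3). The pattern is only an observation from this particular output, not a documented rule, and `observed_port_layout` is a hypothetical helper.

```python
GIB = 1024 ** 3


def bytes_to_gib(size_bytes):
    """Convert a byte count to GiB."""
    return size_bytes / GIB


def observed_port_layout(base_port):
    """Port pattern as seen in the ps output above (an observation only)."""
    return {
        "replica": base_port,
        "sync_agent": base_port + 2,
        "sync_range": (base_port + 3, base_port + 12),
    }


size_gib = bytes_to_gib(3221225472)
first = observed_port_layout(10000)
second = observed_port_layout(10015)
print(size_gib)   # 3.0
print(first)
print(second)
```

So both Replicas shown here back 3 GiB volumes, and each Replica process is paired with its own sync-agent process.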
  19. Demo - Kill Engine Manager pod

    https://longhorn.io/docs/1.0.2/high-availability/recover-volume/

    [Diagram: the same architecture diagram as slide 6 (Worker node, two Longhorn nodes, Engine manager pod, Replica Manager Pods, Longhorn Manager (DS)), with a CRASH mark]

    • The Engine pod itself recovers automatically
    • Auto-remount is also possible, but it requires a liveness probe
    • To recover manually, restart the container of the workload Pod
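
As an illustration of the liveness-probe requirement, the sketch below builds a Pod manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. The Pod name `longhorn-demo` and the claim name `longhorn-volv-pvc-1` come from the earlier slides; the container name, the image, the `/data` mount path, the probe command, and the timing values are assumptions made for this example. The idea is that the probe touches the mounted volume, so it fails when the mount goes stale and the kubelet restarts the container.

```python
import json

# Hypothetical Pod manifest; only the Pod and PVC names come from the slides.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "longhorn-demo"},
    "spec": {
        "restartPolicy": "Always",
        "containers": [
            {
                "name": "app",            # assumed container name
                "image": "nginx:stable",  # assumed image
                "volumeMounts": [{"name": "vol", "mountPath": "/data"}],
                # The probe reads the mounted volume, so a stale mount makes it fail.
                "livenessProbe": {
                    "exec": {"command": ["ls", "/data"]},
                    "initialDelaySeconds": 5,
                    "periodSeconds": 5,
                },
            }
        ],
        "volumes": [
            {"name": "vol", "persistentVolumeClaim": {"claimName": "longhorn-volv-pvc-1"}}
        ],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

Without such a probe, the container keeps running against the broken mount until someone restarts it by hand, which is the manual path mentioned on the slide.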
  20. Demo - Kill replica Manager pod

    [Diagram: the same architecture diagram as slide 6 (Worker node, two Longhorn nodes, Engine manager pod, Replica Manager Pods, Longhorn Manager (DS)), with a CRASH mark]

    • The other Replica processes are still alive, so IO is not affected
  21. Which of the Replicas is the latest?

    • Look at revision.counter

        root@longhorn-demo-worker1:/var/lib/longhorn# tree
        .
        ├── engine-binaries
        │   └── longhornio-longhorn-engine-v1.0.2
        │       └── longhorn
        ├── longhorn-disk.cfg
        └── replicas
            ├── pvc-47892157-f218-4fed-ac2e-9b1505844bae-1998aacc
            │   ├── revision.counter
            │   ├── volume-head-000.img
            │   ├── volume-head-000.img.meta
            │   └── volume.meta
            └── pvc-fa8fb473-01d5-4544-92a1-11b122115905-9c450d92
                ├── revision.counter
                ├── volume-head-000.img
                ├── volume-head-000.img.meta
                └── volume.meta
  22. Which of the Replicas is the latest?

    • This counter is used to determine which Replica holds the latest data
    • In the example below, all three are up to date

        worker1   revision.counter = 100   (latest)
        worker2   revision.counter = 100   (latest)
        worker3   revision.counter = 100   (latest)
  23. Which of the Replicas is the latest?

    • When a Replica is judged not to be the latest, IO to that Replica is stopped and the latest data is copied to it from another Replica

        worker1   revision.counter = 100
        worker2   revision.counter = 98
        worker3   revision.counter = 100

    [Diagram labels: latest; latest; latest data]
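
The comparison described on these slides can be sketched as follows. This is a simplified model, not Longhorn's code: `classify_replicas` and `rebuild_plan` are hypothetical helpers, the counters are passed in as plain integers rather than read from the `revision.counter` files, and picking the first up-to-date Replica as the copy source is an arbitrary choice.

```python
def classify_replicas(counters):
    """Split replicas into up-to-date and stale by their revision counter."""
    if not counters:
        raise ValueError("no replicas given")
    newest = max(counters.values())
    latest = sorted(name for name, count in counters.items() if count == newest)
    stale = sorted(name for name, count in counters.items() if count < newest)
    return latest, stale


def rebuild_plan(counters):
    """Map each stale replica to an up-to-date replica to copy from."""
    latest, stale = classify_replicas(counters)
    # Assumption: any up-to-date replica is an equally good source.
    return {name: latest[0] for name in stale}


counters = {"worker1": 100, "worker2": 98, "worker3": 100}
latest, stale = classify_replicas(counters)
plan = rebuild_plan(counters)
print(latest, stale, plan)
```

With the values from the slide, worker1 and worker3 are up to date and worker2 is the one that needs the latest data copied to it.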
  24. Which of the Replicas is the latest?

    • Even if the other Replicas go down, IO continues as long as one Replica is still alive

        worker1   revision.counter = 100
        worker2   revision.counter = 98
        worker3   revision.counter = 95

    [Diagram labels: latest; CRASH; CRASH]
  25. Which of the Replicas is the latest?

    • When a node that went down comes back, the latest data is synced to it from the latest Replica

        worker1   revision.counter = 100
        worker2   revision.counter = 98
        worker3   revision.counter = 95

    [Diagram labels: latest; CRASH; latest data]
  26. The future of revision.counter (tentative)

    • It may be removed for performance reasons
    • https://github.com/longhorn/longhorn/issues/508
  27. The future of revision.counter (tentative)

    • Use the last modified time of volume-head-000.img to determine the latest Replica

        root@longhorn-demo-worker1:/var/lib/longhorn# tree
        .
        ├── engine-binaries
        │   └── longhornio-longhorn-engine-v1.0.2
        │       └── longhorn
        ├── longhorn-disk.cfg
        └── replicas
            ├── pvc-47892157-f218-4fed-ac2e-9b1505844bae-1998aacc
            │   ├── revision.counter
            │   ├── volume-head-000.img
            │   ├── volume-head-000.img.meta
            │   └── volume.meta
            └── pvc-fa8fb473-01d5-4544-92a1-11b122115905-9c450d92
                ├── revision.counter
                ├── volume-head-000.img
                ├── volume-head-000.img.meta
                └── volume.meta
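
The idea of using the last modified time can be illustrated on a single machine with temporary directories. This is only a sketch of the comparison itself: `latest_replica_by_mtime` is a hypothetical helper, the directory names are made up, and in a real cluster the Replicas live on different nodes, so the actual design (see the issue linked above) has to deal with concerns this example ignores.

```python
import os
import tempfile

HEAD_IMAGE = "volume-head-000.img"


def latest_replica_by_mtime(replica_dirs):
    """Return the replica directory whose head image was modified last."""
    return max(
        replica_dirs,
        key=lambda d: os.path.getmtime(os.path.join(d, HEAD_IMAGE)),
    )


with tempfile.TemporaryDirectory() as root:
    dirs = []
    # Create three fake replica directories with fixed modification times.
    for index, mtime in enumerate([1000, 3000, 2000]):
        replica_dir = os.path.join(root, f"replica-{index}")
        os.makedirs(replica_dir)
        head = os.path.join(replica_dir, HEAD_IMAGE)
        open(head, "wb").close()
        os.utime(head, (mtime, mtime))  # set access and modification time
        dirs.append(replica_dir)
    newest_name = os.path.basename(latest_replica_by_mtime(dirs))

print(newest_name)  # replica-1
```

Compared with revision.counter, this avoids maintaining a separate counter file on every write, which is the performance motivation mentioned on the previous slide.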
  28. Information

    • Homepage
      • https://longhorn.io/
    • Documentation
      • https://longhorn.io/docs/1.0.0/
    • Roadmap
      • https://github.com/longhorn/longhorn/wiki/Roadmap
    • Development updates
      • https://github.com/longhorn/longhorn/milestones/
  29. Thank you