Wenhan Shi
September 24, 2020

Getting Started with Rancher series: Longhorn Vol 2. Architecture deep dive

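This is the second session of the Longhorn track in the series. It recaps the meetup held in June and then goes further, as an intermediate-level seminar focused on a deep dive into the architecture.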


Transcript

  1. Getting Started with Rancher series: Longhorn Vol 2. Architecture deep dive

    © Copyright 2020 Rancher Labs, Inc. All Rights Reserved.
    24th Sep 2020
    Wenhan Shi, Support Engineer
  2. Self-introduction

    • 施 文翰 (Wenhan Shi)
    • @shi_wenhan
    • [email protected]
    • Career
      • Hitachi, Ltd.: maintenance support for Linux kernel modules
      • Red Hat K.K.: GlusterFS/OpenShift support
      • Canonical Japan K.K.: Ubuntu/OpenStack/Kubernetes support
      • Rancher Labs, Inc.: Support Engineer
  3. Recap of the previous session: what is Longhorn?

    https://rancher.connpass.com/event/175715/
    • Distributed block storage software for cloud-native environments
    • 1.0 GA since 3rd June 2020; the current release as of today is 1.0.2
    • CNCF Sandbox project
    • Lightweight, yet highly reliable and easy to use (standalone web UI)
    • Can be installed from the Rancher catalog, kubectl, or helm
    • Usable on Certified Kubernetes
    • Per-volume size expansion, snapshot, backup, and restore
    • Does not depend on the backend storage; ext4 and xfs are supported
  4. Architecture

    • Engine (Data Plane)
    • Manager (Control Plane)
  5. Architecture

    [Diagram. Labels: Worker node, Longhorn node (x2), Pod, Engine manager pod, Replica manager pod, Engine process, Replica process, Ext4/xfs, PVC/PV, Volume, Longhorn Manager (DS) on each node]
  6. Longhorn Manager

    [Diagram. Labels: Longhorn Manager (orchestrates all the volumes), Longhorn CSI Plugin, Longhorn API, Longhorn UI, Kubernetes API Server, Volume (CRD), Kubernetes Cluster, Container Storage Interface API, Engine, Replica]
  7. Longhorn Manager

    • Deployed as a DaemonSet in the Longhorn cluster
    • Receives operation requests from the Longhorn UI or from Kubernetes CSI
    • When a Longhorn volume is requested, it:
      • Communicates with the API server
      • Creates a Volume object (CRD)
      • Starts a Longhorn engine process on the node where the volume is attached
      • Starts Longhorn replica processes on the nodes scheduled to store the volume's data
  8. Longhorn Engine - Engine mode

    • There is one engine process per volume
    • Up to 100,000 engine processes
    • The engine connects to the replicas and implements the data plane of the volume
    • On a write, the write is performed on all replicas
    • On a read, one healthy replica is chosen to serve the request
    • If the Longhorn engine pod crashes, it can recover automatically
      • The engine process is re-created
      • The replica processes are re-associated
      • However, remounting inside the workload pod requires a liveness probe that checks the Longhorn volume
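
The write and read rules on this slide can be pictured with a small toy model. This is only an illustrative sketch, not Longhorn's implementation: the `Replica` and `Engine` classes are hypothetical, data lives in an in-memory dict, and skipping unhealthy replicas on write is an assumption made here for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one replica process."""
    name: str
    healthy: bool = True
    blocks: dict = field(default_factory=dict)


class Engine:
    """Toy engine: one per volume, fanning IO out to its replicas."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, offset, data):
        # A write is applied to every replica that is currently healthy
        # (assumption: unhealthy replicas are skipped and rebuilt later).
        targets = [r for r in self.replicas if r.healthy]
        if not targets:
            raise IOError("no healthy replica available")
        for replica in targets:
            replica.blocks[offset] = data

    def read(self, offset):
        # A read is served by a single healthy replica.
        for replica in self.replicas:
            if replica.healthy:
                return replica.blocks.get(offset)
        raise IOError("no healthy replica available")


replicas = [Replica("worker1"), Replica("worker2"), Replica("worker3")]
engine = Engine(replicas)
engine.write(0, b"hello")
replicas[0].healthy = False   # simulate losing one replica
result = engine.read(0)       # still served, by the next healthy replica
```

Because every replica received the write, the read still succeeds after one replica is marked unhealthy.
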
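  9. Longhorn Engine - Replica mode

    • There are multiple replica processes per engine process
    • A replica connects to a disk and stores the volume's data in a directory
      • The path can be checked in the Longhorn UI
    • Each volume can have multiple replicas, and each replica stores the complete data
    • Even if only one replica survives a failure, all of the data can be restored
    • Each replica records the timestamp of its last write, which makes it possible to tell which data is the latest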
  10. Pod with Longhorn Volume - 1

    • Prepare a 4-node environment and use three of the nodes for Longhorn
    • Run all workloads on the cp1 node

        ❯ kubectl get node
        NAME                    STATUS   ROLES                      AGE     VERSION
        longhorn-demo-cp1       Ready    controlplane,etcd,worker   2d19h   v1.18.8
        longhorn-demo-worker1   Ready    worker                     2d19h   v1.18.8
        longhorn-demo-worker2   Ready    worker                     2d19h   v1.18.8
        longhorn-demo-worker3   Ready    worker                     2d19h   v1.18.8

    [Diagram. Labels: cp1 & (worker) with Pod, worker1, worker2, worker3]
  11. Pod with Longhorn Volume - 2

    • Specify the StorageClass, and the volume becomes usable from the pod

        ❯ cat pod1.yaml
        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: longhorn-volv-pvc-1
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: longhorn
          resources:
            requests:
              storage: 2Gi
        …

        ❯ kubectl get pod -o wide
        NAME            READY   STATUS    RESTARTS   AGE     IP           NODE                NOMINATED NODE   READINESS GATES
        longhorn-demo   1/1     Running   0          3m28s   10.42.0.24   longhorn-demo-cp1   <none>           <none>
  12. Pod with Longhorn Volume - 2

    • The volume, PV, and PVC are created automatically

        ❯ kubectl get pvc
        NAME                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
        longhorn-volv-pvc-1   Bound    pvc-fa8fb473-01d5-4544-92a1-11b122115905   2Gi        RWO            longhorn       29m

        ❯ kubectl get pv
        NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                         STORAGECLASS   REASON   AGE
        pvc-fa8fb473-01d5-4544-92a1-11b122115905   2Gi        RWO            Delete           Bound    default/longhorn-volv-pvc-1   longhorn                29m

        root@longhorn-demo-cp1:~# lsblk
        NAME   MAJ:MIN   RM   SIZE   RO   TYPE   MOUNTPOINT
        sda    8:0       0    2G     0    disk   /var/lib/kubelet/pods/a6db3565-c598-4fe0-8b75-d8c850be075f/volumes/……
  13. Pod with Longhorn Volume - 3

    • The volume's replicas are created under /var/lib/longhorn on each Longhorn node
  14. Pod with Longhorn Volume - 4

    • A quick look inside

        root@longhorn-demo-worker1:/var/lib/longhorn# tree
        .
        ├── engine-binaries
        │   └── longhornio-longhorn-engine-v1.0.2
        │       └── longhorn
        ├── longhorn-disk.cfg
        └── replicas
            ├── pvc-47892157-f218-4fed-ac2e-9b1505844bae-1998aacc
            │   ├── revision.counter
            │   ├── volume-head-000.img
            │   ├── volume-head-000.img.meta
            │   └── volume.meta
            └── pvc-fa8fb473-01d5-4544-92a1-11b122115905-9c450d92
                ├── revision.counter
                ├── volume-head-000.img
                ├── volume-head-000.img.meta
                └── volume.meta
  15. Demo - Pod inside Longhorn Namespace

        $ kubectl get node
        NAME               STATUS   ROLES                      AGE    VERSION
        wshi-all-in-one1   Ready    controlplane,etcd,worker   2d4h   v1.17.6
        wshi-longhorn1     Ready    worker                     2d4h   v1.17.6
        wshi-longhorn2     Ready    worker                     2d4h   v1.17.6
        wshi-longhorn3     Ready    worker                     2d4h   v1.17.6

        $ kubectl get pod -n longhorn-system -o wide
        NAME                          READY   STATUS    RESTARTS   AGE    IP           NODE               NOMINATED NODE   READINESS GATES
        …
        instance-manager-e-15bd5bd3   1/1     Running   0          2d4h   10.42.2.6    wshi-longhorn2     <none>           <none>
        instance-manager-e-6bda88b5   1/1     Running   0          2d4h   10.42.1.5    wshi-longhorn1     <none>           <none>
        instance-manager-e-a91a7513   1/1     Running   0          2d4h   10.42.0.14   wshi-all-in-one1   <none>           <none>
        instance-manager-e-bbdfb559   1/1     Running   0          2d4h   10.42.3.12   wshi-longhorn3     <none>           <none>
        instance-manager-r-ba3a2b7e   1/1     Running   0          2d4h   10.42.0.13   wshi-all-in-one1   <none>           <none>
        instance-manager-r-c07dc8f5   1/1     Running   0          2d4h   10.42.3.11   wshi-longhorn3     <none>           <none>
        instance-manager-r-f075b217   1/1     Running   0          2d4h   10.42.2.7    wshi-longhorn2     <none>           <none>
        instance-manager-r-fffb8270   1/1     Running   0          2d4h   10.42.1.6    wshi-longhorn1     <none>           <none>
        …
        longhorn-manager-2zw42        1/1     Running   0          2d4h   10.42.3.2    wshi-longhorn3     <none>           <none>
        longhorn-manager-cjz8b        1/1     Running   0          2d4h   10.42.0.7    wshi-all-in-one1   <none>           <none>
        longhorn-manager-jqn4r        1/1     Running   0          2d4h   10.42.2.3    wshi-longhorn2     <none>           <none>
        longhorn-manager-xb2c7        1/1     Running   0          2d4h   10.42.1.3    wshi-longhorn1     <none>           <none>
  16. Demo - process inside Engine manager Pod

        $ kubectl exec instance-manager-e-a91a7513 -n longhorn-system -- ps aux
        USER   PID    %CPU  %MEM  VSZ      RSS    TTY  STAT  START  TIME  COMMAND
        root   1      0.0   0.0   4520     864    ?    Ss    Jun22  0:04  /tini -- engine-manager --debug daemon --listen 0.0.0.0:8500
        root   7      0.1   0.1   1452792  6648   ?    Sl    Jun22  6:09  longhorn-instance-manager --debug daemon --listen 0.0.0.0:8500
        root   9      0.0   0.1   1304072  5016   ?    Sl    Jun22  0:14  tgtd -f
        root   10     0.0   0.0   4536     776    ?    S     Jun22  0:00  tee /var/log/tgtd.log
        root   2629   0.3   0.5   860052   20276  ?    Sl    Jun22  9:17  /engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn controller pvc-bc36f842-f592-4eea-ae78-cdff01f0b893 --frontend tgt-blockdev --replica tcp://10.42.2.7:10000 --replica tcp://10.42.1.6:10000 --replica tcp://10.42.3.11:10000 --listen 0.0.0.0:10000
        root   2648   0.0   0.0   0        0      ?    Z     Jun22  0:00  [sleep] <defunct>
        root   2688   0.0   0.0   0        0      ?    Z     Jun22  0:00  [sleep] <defunct>
        root   7415   0.3   0.5   933784   20880  ?    Sl    Jun22  9:20  /engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn controller pvc-db9d123e-332c-452d-8c04-d0041d717b3e --frontend tgt-blockdev --replica tcp://10.42.3.11:10015 --replica tcp://10.42.1.6:10015 --replica tcp://10.42.2.7:10015 --listen 0.0.0.0:10001
        root   7425   0.0   0.0   0        0      ?    Z     Jun22  0:00  [sleep] <defunct>
        root   7475   0.0   0.0   0        0      ?    Z     Jun22  0:00  [sleep] <defunct>
        root   14039  0.0   0.0   34404    2872   ?    Rs    06:37  0:00  ps aux
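
To make the engine command line easier to read, the sketch below pulls the volume name and the replica addresses out of one `longhorn controller` line from the output above. The helper `parse_controller` is a hypothetical name introduced here, and the parsing simply assumes the argument layout shown on the slide; it is not a Longhorn tool.

```python
import shlex

# Command line copied from the `ps aux` output of the engine manager pod.
CMDLINE = (
    "/engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn controller "
    "pvc-bc36f842-f592-4eea-ae78-cdff01f0b893 --frontend tgt-blockdev "
    "--replica tcp://10.42.2.7:10000 --replica tcp://10.42.1.6:10000 "
    "--replica tcp://10.42.3.11:10000 --listen 0.0.0.0:10000"
)


def parse_controller(cmdline):
    """Return (volume name, list of replica addresses) for one engine process."""
    tokens = shlex.split(cmdline)
    # The volume name is the argument right after the `controller` subcommand.
    volume = tokens[tokens.index("controller") + 1]
    # Every `--replica` flag is followed by one replica address.
    replicas = [
        tokens[i + 1]
        for i, tok in enumerate(tokens[:-1])
        if tok == "--replica"
    ]
    return volume, replicas


volume, replica_addrs = parse_controller(CMDLINE)
print(volume)
for addr in replica_addrs:
    print("  replica:", addr)
```

The three addresses are the IPs of the `instance-manager-r-*` pods on wshi-longhorn2, wshi-longhorn1, and wshi-longhorn3 in the pod listing of the previous slide, so this single engine process talks to one replica process on each Longhorn node.
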
  17. Demo - process inside Replica manager Pod

        $ kubectl exec instance-manager-r-c07dc8f5 -n longhorn-system -- ps aux
        USER   PID    %CPU  %MEM  VSZ      RSS    TTY  STAT  START  TIME   COMMAND
        root   1      0.0   0.0   4520     832    ?    Ss    Jun22  0:06   /tini -- longhorn-instance-manager --debug daemon --listen 0.0.0.0:8500
        root   6      1.0   0.3   1747720  15672  ?    Sl    Jun22  32:14  longhorn-instance-manager --debug daemon --listen 0.0.0.0:8500
        root   2567   0.1   0.4   933784   18392  ?    Sl    Jun22  3:53   /host/var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn replica /host/var/lib/longhorn/replicas/pvc-bc36f842-f592-4eea-ae78-cdff01f0b893-258cb451 --size 3221225472 --listen 0.0.0.0:10000
        root   2574   0.0   0.4   786320   19352  ?    Sl    Jun22  2:30   /host/var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn sync-agent --listen 0.0.0.0:10002 --replica 0.0.0.0:10000 --listen-port-range 10003-10012
        root   7204   0.1   0.7   1228712  30100  ?    Sl    Jun22  3:56   /host/var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn replica /host/var/lib/longhorn/replicas/pvc-db9d123e-332c-452d-8c04-d0041d717b3e-a3f5c04a --size 3221225472 --listen 0.0.0.0:10015
        root   7209   0.0   0.4   786320   19176  ?    Sl    Jun22  2:29   /host/var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn sync-agent --listen 0.0.0.0:10017 --replica 0.0.0.0:10015 --listen-port-range 10018-10027
        root   12632  0.0   0.0   34404    2916   ?    Rs    06:42  0:00   ps aux
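
As a small worked example of reading the replica command line above, the sketch below extracts the `--size` and `--listen` values from one `longhorn replica` line and converts the size to GiB. It only assumes the flag layout visible on the slide; the variable names are introduced here for illustration.

```python
import shlex

# Command line copied from the `ps aux` output of the replica manager pod.
CMDLINE = (
    "/host/var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.0.0/longhorn "
    "replica /host/var/lib/longhorn/replicas/"
    "pvc-bc36f842-f592-4eea-ae78-cdff01f0b893-258cb451 "
    "--size 3221225472 --listen 0.0.0.0:10000"
)

tokens = shlex.split(CMDLINE)

# The argument after `replica` is the directory that holds the replica's data.
replica_dir = tokens[tokens.index("replica") + 1]

# `--size` is given in bytes; divide by 1024**3 to express it in GiB.
size_bytes = int(tokens[tokens.index("--size") + 1])
size_gib = size_bytes / 1024**3

# `--listen` is host:port; keep only the port.
listen_port = int(tokens[tokens.index("--listen") + 1].rsplit(":", 1)[1])

print(replica_dir)
print(f"{size_gib:g} GiB, listening on port {listen_port}")
```

The size works out to 3 GiB, and port 10000 is the same port that the engine process for this volume passes in its `--replica tcp://10.42.3.11:10000` argument.
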
  18. Demo - Kill Engine Manager pod

    https://longhorn.io/docs/1.0.2/high-availability/recover-volume/
    • The engine pod itself recovers automatically
    • Auto-remount is also possible, but it requires a liveness probe
    • To recover manually, restart the container of the workload pod

    [Diagram. Labels: Worker node, Longhorn node (x2), Pod, Engine manager pod, Replica manager pod, Engine process, Replica process, Ext4/xfs, PVC, Volume, Longhorn Manager (DS) on each node, CRASH]
  19. Demo - Kill Replica Manager pod

    • The other replica processes are still alive, so IO is not affected

    [Diagram. Labels: Worker node, Longhorn node (x2), Pod, Engine manager pod, Replica manager pod, Engine process, Replica process, Ext4/xfs, PVC, Volume, Longhorn Manager (DS) on each node, CRASH]
  20. Which of the replicas is the latest?

    • Look at revision.counter

        root@longhorn-demo-worker1:/var/lib/longhorn# tree
        .
        ├── engine-binaries
        │   └── longhornio-longhorn-engine-v1.0.2
        │       └── longhorn
        ├── longhorn-disk.cfg
        └── replicas
            ├── pvc-47892157-f218-4fed-ac2e-9b1505844bae-1998aacc
            │   ├── revision.counter
            │   ├── volume-head-000.img
            │   ├── volume-head-000.img.meta
            │   └── volume.meta
            └── pvc-fa8fb473-01d5-4544-92a1-11b122115905-9c450d92
                ├── revision.counter
                ├── volume-head-000.img
                ├── volume-head-000.img.meta
                └── volume.meta
  21. Which of the replicas is the latest?

    • This counter is used to decide which replica holds the latest data
    • In the example below, all three replicas are up to date

        worker1   revision.counter = 100   (latest)
        worker2   revision.counter = 100   (latest)
        worker3   revision.counter = 100   (latest)
  22. Which of the replicas is the latest?

    • When a replica is judged not to be the latest, IO to that replica is stopped and the latest data is copied to it from another replica

        worker1   revision.counter = 100   (latest)
        worker2   revision.counter = 98
        worker3   revision.counter = 100   (latest)

    [Diagram label: latest data]
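
The comparison on this slide can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Longhorn's code: the function `classify_replicas` is a hypothetical name, the counters are passed in as plain integers rather than read from `revision.counter` files, and picking the first up-to-date replica as the copy source is an arbitrary choice made for the example.

```python
def classify_replicas(counters):
    """Split replicas into up-to-date and stale ones by revision counter.

    `counters` maps a replica name to its revision counter value.
    The highest counter seen is treated as the latest state.
    """
    latest = max(counters.values())
    up_to_date = sorted(name for name, c in counters.items() if c == latest)
    stale = sorted(name for name, c in counters.items() if c < latest)
    return up_to_date, stale


# Values taken from the slide above.
counters = {"worker1": 100, "worker2": 98, "worker3": 100}
up_to_date, stale = classify_replicas(counters)

for name in stale:
    # IO to a stale replica stops; it is refreshed from an up-to-date one.
    print(f"stop IO to {name}, copy latest data from {up_to_date[0]}")
```

With the values from the slide, worker1 and worker3 are classified as up to date and worker2 as the replica that needs the latest data copied to it.
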
  23. Which of the replicas is the latest?

    • If the other replicas go down, IO continues even when only one replica is still alive

        worker1   revision.counter = 100   (latest)
        worker2   revision.counter = 98    (CRASH)
        worker3   revision.counter = 95    (CRASH)
  24. Which of the replicas is the latest?

    • When a node that went down comes back, the latest data is synced to it from the latest replica

        worker1   revision.counter = 100
        worker2   revision.counter = 98
        worker3   revision.counter = 95

    [Diagram labels: latest, CRASH, latest data]
  25. The future of revision.counter (tentative)

    • It may be removed for performance reasons
    • https://github.com/longhorn/longhorn/issues/508
  26. The future of revision.counter (tentative)

    • Use the Last Modified Time of volume-head-000.img to decide which replica is the latest

        root@longhorn-demo-worker1:/var/lib/longhorn# tree
        .
        ├── engine-binaries
        │   └── longhornio-longhorn-engine-v1.0.2
        │       └── longhorn
        ├── longhorn-disk.cfg
        └── replicas
            ├── pvc-47892157-f218-4fed-ac2e-9b1505844bae-1998aacc
            │   ├── revision.counter
            │   ├── volume-head-000.img
            │   ├── volume-head-000.img.meta
            │   └── volume.meta
            └── pvc-fa8fb473-01d5-4544-92a1-11b122115905-9c450d92
                ├── revision.counter
                ├── volume-head-000.img
                ├── volume-head-000.img.meta
                └── volume.meta
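
The tentative idea on this slide, comparing the last-modified time of each replica's `volume-head-000.img`, could look roughly like the sketch below. It is only an illustration of the comparison: `newest_replica` is a hypothetical helper, the replica directories are created in a temporary folder with made-up names, and the modification times are set artificially with `os.utime`.

```python
import os
import tempfile

HEAD_IMAGE = "volume-head-000.img"


def newest_replica(replica_dirs):
    """Return the replica directory whose head image was modified last."""
    return max(
        replica_dirs,
        key=lambda d: os.stat(os.path.join(d, HEAD_IMAGE)).st_mtime,
    )


with tempfile.TemporaryDirectory() as root:
    dirs = []
    # Three fake replica directories with artificial modification times.
    for i, mtime in enumerate([1000, 3000, 2000]):
        d = os.path.join(root, f"replica-{i}")
        os.makedirs(d)
        path = os.path.join(d, HEAD_IMAGE)
        with open(path, "wb"):
            pass  # an empty file is enough for the comparison
        os.utime(path, (mtime, mtime))
        dirs.append(d)

    newest = os.path.basename(newest_replica(dirs))

print("latest replica:", newest)
```

In this toy setup `replica-1` has the most recent modification time, so it is the one reported as the latest.
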
  27. Information

    • Homepage
      • https://longhorn.io/
    • Documentation
      • https://longhorn.io/docs/1.0.0/
    • Roadmap
      • https://github.com/longhorn/longhorn/wiki/Roadmap
    • Development updates
      • https://github.com/longhorn/longhorn/milestones/