Slide 1

Slide 1 text

Trident Deep Dive
2022 Feb
Translated by: Yoshiki Fujiwara
IDC Frontier Inc. Software Engineer Daiki Hayakawa
NetApp G.K. Japan Cloud Solutions Architect for AWS Yoshiki Fujiwara

Slide 2

Slide 2 text

▶ @bells17 ▶ Software Engineer @ IDC Frontier Inc. ▶ What I usually do: + Development of Kubernetes-related components + Development of Kubernetes as a Service ▶ Kubernetes SIG-Docs Japanese localization reviewer ▶ Kubernetes Internal Organizer ▶ #kubenews ▶ @bells17_

Slide 3

Slide 3 text

#kubenews Streams on YouTube almost every Friday from 22:00 JST. We have technical chats, mainly about Kubernetes/Cloud Native related news.

Slide 4

Slide 4 text

What I will talk about today ▶ Trident Overview ▶ Kubernetes and CSI ▶ Trident Implementation

Slide 5

Slide 5 text

Cautionary Note ▶ The presenter is not a storage expert (especially for iSCSI or NFS) ▶ The Trident version is assumed to be v21.07.2 ▶ Trident is assumed to be used with the following settings: + Trident is used from Kubernetes + Trident is basically installed using the Trident Operator (described later) ▶ This talk only explains my understanding from reading the Trident implementation, so it may differ from the actual behavior

Slide 6

Slide 6 text

Mr. Ohno of NetApp also explains the architecture of Trident in Cloud Native Storage Meetup #1, so please refer to that as well (presented in Japanese) https://youtu.be/2xEUyAzoNmY?t=3583

Slide 7

Slide 7 text

What is Trident?

Slide 8

Slide 8 text

What is Trident? ▶ Trident is an application for taking advantage of various NetApp storage products (e.g. ONTAP, E-Series, Cloud Volumes Service for AWS, etc.) ▶ With Trident, you can operate NetApp storage products from a container environment such as Kubernetes ▶ The following two platforms are currently supported: + Kubernetes ← This talk covers how Trident works on Kubernetes + Docker ▶ Trident is OSS (https://github.com/NetApp/trident) ▶ Trident is one of the applications included in Project "Astra" ▶ Astra consists of the following applications in addition to Trident + Astra Control: Kubernetes cluster management and operations console + Astra Data Store: Kubernetes-native shared file service

Slide 9

Slide 9 text

https://docs.netapp.com/us-en/astra-family/intro-family.html

Slide 10

Slide 10 text

Trident Components

Slide 11

Slide 11 text

Trident Components ▶ Trident (Core): The main body of Trident. It consists of a CSI Driver for working with Kubernetes, a REST API server for tridentctl, and various controllers that manage the state of Trident ▶ tridentctl: The command-line tool for operating Trident ▶ Trident Operator: A Kubernetes Operator for managing Trident installations and upgrades on Kubernetes clusters + It was added in Trident v20.04.0 + Besides the Trident Operator, you can also install Trident with the tridentctl install command

Slide 12

Slide 12 text

Trident installation example

Slide 13

Slide 13 text

Create an ONTAP Storage VM (when using ONTAP)

Slide 14

Slide 14 text

Install Trident Operator
$ wget https://github.com/NetApp/trident/releases/download/v…/trident-installer-….tar.gz
$ tar -xf trident-installer-….tar.gz
$ cd trident-installer
$ helm install trident-operator -n trident helm/trident-operator-….tgz
$ kubectl -n trident get deploy trident-operator
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
trident-operator   …       …            …           …d

Slide 15

Slide 15 text

Create a TridentOrchestrator resource and install Trident
$ kubectl apply -f deploy/crds/tridentorchestrator_cr.yaml
$ kubectl get tridentorchestrator
NAME      AGE
trident   …d
$ kubectl -n trident get deploy trident-csi
NAME          READY   UP-TO-DATE   AVAILABLE   AGE
trident-csi   …       …            …           …d
$ kubectl -n trident get ds trident-csi
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                     AGE
trident-csi   …         …         …       …            …           kubernetes.io/arch=amd64,kubernetes.io/os=linux   …d

Slide 16

Slide 16 text

Create TridentBackendConfig
$ cat <<EOF | kubectl apply -f -
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: ontap-san-default
spec:
  storageDriverName: ontap-san
  managementLIF: …
  dataLIF: …
  svm: trident_svm
  credentials:
    name: backend-tbc-ontap-san-secret
EOF

Slide 17

Slide 17 text

Create StorageClass
$ cp sample-input/storage-class-csi.yaml.templ sample-input/storage-class-basic-csi.yaml
$ vi sample-input/storage-class-basic-csi.yaml
$ kubectl create -f sample-input/storage-class-basic-csi.yaml
$ kubectl get storageclass
NAME                PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
iscsi-xfs           csi.trident.netapp.io   Delete          Immediate           false                  …d
nfs-csi (default)   csi.trident.netapp.io   Delete          Immediate           false                  …d

Slide 18

Slide 18 text

Trident and Kubernetes

Slide 19

Slide 19 text

Trident and Kubernetes ▶ Trident runs as a CSI Driver for managing various NetApp storage on Kubernetes ▶ It also implements multiple Kubernetes Controllers for Trident management ▶ Therefore, before getting into the implementation of Trident, it helps to have an overview of + Kubernetes Operator (Kubernetes Controller) + CSI (driver)

Slide 20

Slide 20 text

What is Kubernetes? ▶ Kubernetes is a container orchestrator ▶ You can build a cluster composed of etcd, a control plane, and worker nodes, run containers on the nodes, and connect the running containers to the network ▶ You declaratively describe the containers and other resources to deploy in manifest files, and Kubernetes reconciles the cluster toward the declared state ▶ It was rebuilt as OSS based on Borg, the container platform that Google operates internally ▶ Kubernetes has been donated to the Cloud Native Computing Foundation (CNCF) and is managed by the community as a CNCF Graduated project

Slide 21

Slide 21 text

https://github.com/kubernetes/website/blob/fb6364da0afd19e8a9515aaae2de9bc74a0a6abd/static/images/docs/components-of-kubernetes.png

Slide 22

Slide 22 text

How Trident extends Kubernetes ▶ Kubernetes Operator + CRD + Kubernetes Controller ▶ CSI Driver

Slide 23

Slide 23 text

https://github.com/NetApp/trident/blob/v21.07.2/operator/controllers/orchestrator/apis/netapp/v1/types.go CRD

Slide 24

Slide 24 text

Kubernetes Controller

Slide 25

Slide 25 text

https://github.com/kubernetes/sample-controller/blob/master/docs/images/client-go-controller-interaction.jpeg How Controller works
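The controller pattern in the diagram can be sketched roughly as follows. This is a minimal illustration in plain Python, not client-go or Trident code: the names `reconcile` and `run_controller`, the dictionaries standing in for the API server and the real world, and the PVC keys are all assumptions made for the example. An informer would normally enqueue object keys on add/update/delete events; here the keys are passed in directly.

```python
import queue


def reconcile(key, desired, actual):
    """Bring the actual state for one key in line with the desired state."""
    if key not in desired:
        actual.pop(key, None)        # object was deleted: clean up
    elif actual.get(key) != desired[key]:
        actual[key] = desired[key]   # create or update


def run_controller(keys, desired, actual):
    """Drain a work queue of object keys, reconciling each one."""
    work = queue.Queue()
    for key in keys:                 # stand-in for informer event handlers
        work.put(key)
    while not work.empty():
        reconcile(work.get(), desired, actual)
    return actual


if __name__ == "__main__":
    desired = {"pvc-a": "10Gi", "pvc-b": "20Gi"}   # declared in manifests
    actual = {"pvc-a": "5Gi", "pvc-c": "1Gi"}      # what currently exists
    print(run_controller(["pvc-a", "pvc-b", "pvc-c"], desired, actual))
```

The handler only receives a key and looks up the current desired state itself, which is why processing the same key twice is harmless.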

Slide 26

Slide 26 text

Click the following links for details on Kubernetes Controller implementations (presented in Japanese) ▶ Things worth knowing when reading the Kubernetes code (Kubernetesのコードリーディングをする上で知っておくと良さそうなこと) ▶ Kubernetes Internal #1

Slide 27

Slide 27 text

Kubernetes and CSI

Slide 28

Slide 28 text

CSI ▶ A common specification for using storage from Container Orchestrators (COs) such as Kubernetes, Mesos, Cloud Foundry, etc. + So it is not a specification only for Kubernetes + For example, HashiCorp Nomad also uses CSI under the hood ▶ The aim is that storage providers write a CSI-compliant driver once and can then serve Kubernetes and other COs ▶ The CSI specification is defined in the spec.md file in the container-storage-interface/spec repository on GitHub

Slide 29

Slide 29 text

Specifications defined by CSI ▶ How a CSI Driver communicates and how it is provided ▶ Features provided by a CSI Driver ▶ The gRPC interfaces (Protocol Buffers) through which a CO uses a CSI Driver

Slide 30

Slide 30 text

How a CSI Driver communicates and is provided ▶ It needs to be provided in container image format (Docker, OCI, etc.) ▶ Communication between the CSI Driver and the CO needs to ... + Use the gRPC protocol + Go through a UNIX domain socket

Slide 31

Slide 31 text

Features provided by CSI ▶ Create / Delete volume ▶ Attach / Detach volume to node ▶ Volume Mount / Unmount ▶ Create / Delete volume Snapshot ▶ etc...

Slide 32

Slide 32 text

gRPC interfaces defined in Protocol Buffers ▶ Controller Plugin ▶ Node Plugin

Slide 33

Slide 33 text

Controller Plugin ▶ A gRPC server that operates as a control plane for its CSI Driver ▶ Implementation of the following gRPC services: + Controller Service + Identity Service ▶ It provides the ability to control the volume and the snapshot ▶ Specifically, it provides the following features: + Create / Delete volume + Attach / Detach volume to node + Create / Delete volume snapshot

Slide 34

Slide 34 text

Node Plugin ▶ A gRPC server that operates on each CO-participating Worker Node ▶ Implementation of the following gRPC services: + Node Service + Identity Service ▶ It provides the feature to operate the volume on each target Worker Node ▶ It mainly provides the following features: + Format volume attached to node + Mount / Unmount Volume

Slide 36

Slide 36 text

Volume Lifecycle
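As a rough sketch, the volume lifecycle can be read as a small state machine driven by the CSI RPCs. The RPC names and the states CREATED, NODE_READY, VOL_READY and PUBLISHED follow the CSI specification's lifecycle with staging; the `NONE` state label and the helper names `step` and `run` are assumptions made for this example.

```python
# (current state, RPC) -> next state, for a driver that supports staging.
TRANSITIONS = {
    ("NONE", "CreateVolume"): "CREATED",
    ("CREATED", "ControllerPublishVolume"): "NODE_READY",
    ("NODE_READY", "NodeStageVolume"): "VOL_READY",
    ("VOL_READY", "NodePublishVolume"): "PUBLISHED",
    ("PUBLISHED", "NodeUnpublishVolume"): "VOL_READY",
    ("VOL_READY", "NodeUnstageVolume"): "NODE_READY",
    ("NODE_READY", "ControllerUnpublishVolume"): "CREATED",
    ("CREATED", "DeleteVolume"): "NONE",
}


def step(state, rpc):
    """Return the next state, or raise if the RPC is not valid in this state."""
    try:
        return TRANSITIONS[(state, rpc)]
    except KeyError:
        raise ValueError(f"{rpc} is not valid in state {state}") from None


def run(rpcs, state="NONE"):
    """Apply a sequence of RPCs starting from the given state."""
    for rpc in rpcs:
        state = step(state, rpc)
    return state


if __name__ == "__main__":
    print(run(["CreateVolume", "ControllerPublishVolume",
               "NodeStageVolume", "NodePublishVolume"]))
```

The first two RPCs belong to the Controller Plugin and the last two to the Node Plugin, which matches the split between the two plugins described earlier.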

Slide 37

Slide 37 text

CSI wrap-up ▶ CSI is a common specification that lets storage providers offer volume plug-ins to COs ▶ Specifically, the following are defined + Operating environment and communication method of a CSI Driver + Container image format + UNIX domain socket / gRPC protocol + RPC interfaces + Controller Plugin + Node Plugin

Slide 38

Slide 38 text

Kubernetes and Volume Plugin

Slide 39

Slide 39 text

There are three kinds of Kubernetes volume plugins ▶ In-Tree Volume + Volume plugins implemented inside the Kubernetes code base + ConfigMap / Secret / EmptyDir etc. fall into this category ▶ FlexVolume + A plugin mechanism created before CSI appeared (Kubernetes v1.8) + It does not seem to be used much because it requires knowledge of the internal implementation of Kubernetes + Deprecated in recent Kubernetes versions ▶ CSI Driver + Available as alpha from Kubernetes v1.9 + Kubernetes provides sidecar applications that work with a CSI Driver, so you can provide a volume plugin without knowledge of the internal implementation of Kubernetes

Slide 40

Slide 40 text

Integration between Kubernetes and CSI Driver

Slide 41

Slide 41 text

Integration by Sidecar application https://github.com/kubernetes/community/blob/d83cd53979d08ac0e0e33704c6aec6b1c3cb7c8d/contributors/design-proposals/storage/container-storage-interface_diagram1.png

Slide 42

Slide 42 text

Processing flow when creating a volume

Slide 43

Slide 43 text

Controller Sidecar ▶ external-provisioner: Create / Delete volume ▶ external-attacher: Attach / Detach volume ▶ external-resizer: Resize the volume ▶ external-snapshotter: Create / Delete snapshot ▶ livenessprobe: HTTP proxy for Liveness Probe
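The division of labour between the controller sidecars can be summarised as a lookup from sidecar to the Kubernetes object it watches and the CSI RPCs it issues. This is a sketch for orientation only: the RPC names come from the CSI specification, while the `SIDECARS` table layout and the `sidecar_for` helper are assumptions made for the example, and livenessprobe is left out because it only probes the driver's health.

```python
# sidecar name -> (Kubernetes object it watches, CSI RPCs it calls)
SIDECARS = {
    "external-provisioner": (
        "PersistentVolumeClaim", ("CreateVolume", "DeleteVolume")),
    "external-attacher": (
        "VolumeAttachment",
        ("ControllerPublishVolume", "ControllerUnpublishVolume")),
    "external-resizer": (
        "PersistentVolumeClaim", ("ControllerExpandVolume",)),
    "external-snapshotter": (
        "VolumeSnapshotContent", ("CreateSnapshot", "DeleteSnapshot")),
}


def sidecar_for(rpc):
    """Return the sidecar that issues the given controller RPC, or None."""
    for name, (_watched, rpcs) in SIDECARS.items():
        if rpc in rpcs:
            return name
    return None


if __name__ == "__main__":
    for name, (watched, rpcs) in SIDECARS.items():
        print(f"{name}: watches {watched}, calls {', '.join(rpcs)}")
```

Node-side RPCs such as NodePublishVolume do not appear here because the kubelet calls them directly rather than through a sidecar.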

Slide 44

Slide 44 text

Node Sidecar ▶ node-driver-registrar: Registers the CSI driver with the kubelet by using the kubelet feature called "Plugin Watcher"

Slide 45

Slide 45 text

Click the following links for a more detailed explanation of Kubernetes and CSI (presented in Japanese) ▶ Introduction to CSI (slides) ▶ Introduction to CSI (session video)

Slide 46

Slide 46 text

Trident implementation

Slide 47

Slide 47 text

Trident Components(Again)

Slide 48

Slide 48 text

Trident Overview

Slide 49

Slide 49 text

Trident Operator

Slide 50

Slide 50 text

Trident Core(setup)

Slide 51

Slide 51 text

Trident Core(PVC Create ~ Volume Mount)

Slide 52

Slide 52 text

Trident Core: Other components ▶ TransactionMonitor: A Controller that manages in-flight tasks such as volume creation, using the TridentTransaction resource ▶ PeriodicallyReconcileNodeAccessOnBackends: A Controller that checks that the appropriate policy settings are in place so that each node can connect to each Trident Backend ▶ k8shelper + Node Controller: A Controller that, when a node is deleted, deletes the corresponding Trident Node and removes that node's information from the policy of each Trident Backend ▶ CRD Controllers + reconcileTMR: Sets up SnapMirror according to the TridentMirrorRelationship resource and updates the status of the TridentMirrorRelationship ▶ etc.

Slide 53

Slide 53 text

CRD operated by Trident (CRD name: Purpose) ▶ TridentOrchestrator: Operated by the Trident Operator to install Trident ▶ TridentBackendConfig: Resource for generating a TridentBackend, which manages information about the storage (such as ONTAP) that Trident connects to ▶ TridentBackend: Operated to manage information about the storage (such as ONTAP) to which Trident connects. You can create it with "TridentBackendConfig" or "tridentctl create backend" ▶ There are other CRDs, but they are unlikely to be used

Slide 54

Slide 54 text

tridentctl/Rest API

Slide 55

Slide 55 text

Conclusion

Slide 56

Slide 56 text

Impressions of reading the implementation ▶ It was distinctive and interesting that a single Trident can use TridentBackends for multiple NetApp storage systems ▶ On the other hand, I got the impression that it would be difficult to start using it because configuring TridentBackend and StorageClass is complicated ▶ Some features and resource definitions (e.g. Snapshot resources) appear to date from a relatively early stage or to be unused now. I got the impression that the implementation could be kept simpler by redefining the features, implementations, and support scope of the application

Slide 57

Slide 57 text

Reference materials ▶ https://youtu.be/2xEUyAzoNmY?t=3583 ▶ https://github.com/NetApp/trident/tree/v21.07.2 ▶ https://netapp-trident.readthedocs.io/en/stable-v21.07/dag/kubernetes/index.html ▶ https://netapp-trident.readthedocs.io/en/stable-v21.07/kubernetes/deploying/operator-deploy.html ▶ https://netapp-trident.readthedocs.io/en/stable-v21.07/kubernetes/operations/tasks/managing-backends/tbc.html ▶ https://netapp-trident.readthedocs.io/en/stable-v21.07/kubernetes/operations/tasks/monitoring.html#trident-autosupport-telemetry ▶ https://hub.docker.com/r/netapp/trident-autosupport ▶ https://netapp-trident.readthedocs.io/en/stable-v21.07/kubernetes/concepts/objects.html ▶ https://library.netapp.com/ecmdocs/ECMLP2372138/html/GUID-3FC8A37A-FFCC-4070-A9F0-1B9B3FB79BF8.html ▶ https://milestone-of-se.nesuke.com/sv-basic/architecture/disk-term/ ▶ https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/7/html/dm_multipath/mpio_overview ▶ https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/7/html/dm_multipath/mpio_description ▶ https://qiita.com/ochiba/items/39dbcda84ec17aefed07 ▶ https://tech-mmmm.blogspot.com/2020/05/iscsi-dm-multipathrheliscsi.html ▶ https://milestone-of-se.nesuke.com/sv-basic/architecture/iscsi-summary/ ▶ https://library.netapp.com/ecmdocs/ECMLP2573234/html/GUID-EC3C367B-79E0-4DBA-8036-22094557357A.html ▶ https://qiita.com/OPySPGcLYpJE0Tc/items/be9daae23b80478b81ff ▶ https://qiita.com/hana_shin/items/cbd428faf92534e25f7b ▶ https://atmarkit.itmedia.co.jp/ait/articles/0807/02/news142.html ▶ https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/8/html/managing_storage_devices/getting-started-with-iscsi_managing-storage-devices ▶ https://docs.netapp.com/ja-jp/ontap/system-admin/command-line-interface-concept.html ▶ https://docs.netapp.com/ja-jp/ontap/volumes/commands-manage-flexvol-volumes-reference.html ▶ http://docs.netapp.com/ontap-9/topic/com.netapp.doc.dot-cm-cmpr-9101/home.html ▶ 
https://docs.netapp.com/us-en/ontap/concepts/snapmirror-cloud-backups-object-store-concept.html ▶ https://netapp-trident.readthedocs.io/en/stable-v21.07/kubernetes/operations/tasks/volumes/topology.html?highlight=supportedtopology

Slide 58

Slide 58 text

Thanks / Question? ▶ @bells17 ▶ Slide: https://speakerdeck.com/bells17 ▶ @bells17_

Slide 59

Slide 59 text

・Points that I was interested in after reading the code ・Points that the panelist was interested in after listening to this session Discussion/Q&A

Slide 60

Slide 60 text

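Related Sessions with Trident in NetApp INSIGHT Japan 2022 Digital (and more...) • Session code: 1689, Introductory. Title: Cloud Know-How Series: NetApp Astra. Speaker: NetApp G.K., Solution Technology Division, SE Department 1, Solutions Engineer, Zhao Mandy • Session code: 1687, Intermediate. Title: What is "Astra Data Store", which accelerates the adoption of cloud-native applications? Speaker: NetApp G.K., Solution Architect Department, Senior Solutions Architect, 大削 緑 • Session code: 1740, Intermediate. Title: Achieving data portability for Kubernetes environments with NetApp Astra. Speaker: Net One Systems Co., Ltd., Business Development Division, Applied Technology Department 1, Cloud Infrastructure Team, 金只 圭司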

Slide 61

Slide 61 text

Meet the Specialists

Slide 62

Slide 62 text

Appendix

Slide 63

Slide 63 text

k8s helper ▶ PVC Controller: It resizes PV(C) resources when a PVC is resized + The CSI driver's sidecar "csi-resizer" should already resize the PV(C), so this seems unnecessary ▶ PV Controller: It deletes the volume associated with a deleted PV + Since the deletion itself is performed on the CSI driver side, this seems to exist for retrying the deletion when deleting the volume did not succeed for some reason ▶ StorageClass Controller: It generates a TridentStorageClass when a k8s StorageClass is created + Also, if a k8s StorageClass older than v1 is created, a v1 k8s StorageClass is generated ▶ Node Controller: When a node is deleted, it deletes the corresponding TridentNode and removes that node's information from the policy of each TridentBackend ▶ reconcileNodes: It compares TridentNodes with k8s Nodes and removes a TridentNode if the k8s Node does not exist ▶ handleFailedPVUpgrades: If there is a transaction whose PV upgrade has not completed, it stops the PV upgrade, deletes the PV that was being created, and creates a PV with the old settings + PV upgrade seems to be a feature operated only with tridentctl, but its usage is unknown + PV upgrade seems to create a new PV that imports the PV being upgraded and then replace the old one

Slide 64

Slide 64 text

CRD Controllers ▶ reconcileBackendConfig: It converts a TridentBackendConfig into a TridentBackend and stores it + When an event occurs on the k8s Secret that holds the credentials of a TridentBackendConfig, it generates a TridentBackendConfig including the data of the k8s Secret and triggers reconcileBackendConfig + If the TridentBackend resource is deleted, it runs reconcileBackendConfig to regenerate the TridentBackend resource from the associated TridentBackendConfig ▶ reconcileTMR: Sets up SnapMirror according to the TridentMirrorRelationship resource and updates the status of the TridentMirrorRelationship ▶ handleTridentSnapshotInfo: It gets the k8s VolumeSnapshot(Content) from the snapshotName stored in the TridentSnapshotInfo resource, and stores the SnapshotHandle (≒ Snapshot ID) in the status of the TridentSnapshotInfo + However, the TridentSnapshotInfo resource does not seem to be operated anywhere else, so it seems unlikely that this reconcile loop runs in the first place

Slide 65

Slide 65 text

Other Controllers ▶ TransactionMonitor: A Controller that manages in-flight tasks such as volume creation, using the TridentTransaction resource ▶ PeriodicallyReconcileNodeAccessOnBackends: A Controller that checks that the appropriate policy settings are in place so that each node can connect to each TridentBackend

Slide 66

Slide 66 text

nodePrep
▶ A feature that automatically installs the packages required for NFS/iSCSI and then starts the services (looks like a beta feature)
▶ Supported Linux distributions are as follows:
  + Ubuntu
  + RHEL/CentOS
▶ Packages to be installed
  + NFS
    + Ubuntu: nfs-common
    + RHEL/CentOS: nfs-utils
  + iSCSI
    + Ubuntu: lsscsi, sg3-utils, scsitools, open-iscsi, multipath-tools
    + RHEL/CentOS: lsscsi, sg3_utils, iscsi-initiator-utils, device-mapper-multipath
▶ Services to be started
  + NFS
    + rpc-statd
  + iSCSI
    + Ubuntu: iscsid, multipathd
    + RHEL/CentOS: iscsid, open-iscsi, multipathd

Slide 67

Slide 67 text

CSI driver mount procedure (NFS) ▶ NodeStageVolume: + If the nodePrep process is enabled, it installs packages etc. + It writes information such as mountOptions, NFS server IP, and NFS path to a file named volumePublishInfo.json + It creates this file in the path for the target volume provided by a Kubernetes CSI sidecar ▶ NodePublishVolume: + It gets the information from volumePublishInfo.json + It creates the mount destination directory + It mounts the NFS volume with the "mount -t nfs" command
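The hand-off between the two RPCs can be sketched as follows. This is an illustration of the flow described above, not Trident's code: the JSON field names, the function names, the example IP address and export path are all assumptions, and the mount command is returned as a list instead of being executed so the sketch can run without root or an NFS server.

```python
import json
import tempfile
from pathlib import Path

INFO_FILE = "volumePublishInfo.json"


def node_stage_volume(staging_dir, server_ip, export_path, mount_options):
    """Record what NodePublishVolume will need later (field names are hypothetical)."""
    info = {
        "nfsServerIp": server_ip,
        "nfsPath": export_path,
        "mountOptions": mount_options,
    }
    path = Path(staging_dir) / INFO_FILE
    path.write_text(json.dumps(info))
    return path


def node_publish_volume(staging_dir, target_dir):
    """Read the staged info, create the target directory, build the mount command."""
    info = json.loads((Path(staging_dir) / INFO_FILE).read_text())
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    cmd = ["mount", "-t", "nfs"]
    if info["mountOptions"]:
        cmd += ["-o", info["mountOptions"]]
    cmd += [f"{info['nfsServerIp']}:{info['nfsPath']}", str(target_dir)]
    return cmd  # a real driver would execute this command


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "staging"
        staging.mkdir()
        node_stage_volume(staging, "192.0.2.10", "/example_pvc", "nfsvers=4.1")
        print(" ".join(node_publish_volume(staging, Path(tmp) / "target")))
```

Persisting the information in a file means the publish step does not depend on any in-memory state from the stage step.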

Slide 68

Slide 68 text

CSI driver mount procedure (iSCSI) ▶ NodeStageVolume: + If the nodePrep process is enabled, it installs packages etc. + It sets up the iSCSI target with the iscsiadm command and logs in + It scans the paths to the particular LUN and waits for all SCSI disk-by-path entries for that LUN to be created + It waits for the multipath device to be created + After that, unless a Raw Block Volume is requested, it formats the device with the specified file system + Available file systems: xfs / ext3 / ext4 ▶ NodePublishVolume: + It mounts the device with the mount command

Slide 69

Slide 69 text

Storage Pool ▶ Trident's Storage Pool is a pool of resources from which PVs are allocated ▶ Storage Pools are categorized as follows + Physical Storage Pool + Virtual Storage Pool ▶ For ONTAP, a Physical Storage Pool is an aggregate + Aggregate = RAID groups bundled together to improve storage performance and scalability ▶ Virtual Storage Pool + One Physical Storage Pool can be split into multiple pools + You can set up multiple Virtual Storage Pools that differ in, for example, IOPS, and assign them according to the requirements + Multiple Physical Storage Pools can be combined into one pool + Physical Storage Pools with different physical placement and network topologies can be combined into one Virtual Storage Pool and scheduled to the appropriate topology
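To make the idea of assigning pools "according to the requirements" concrete, here is a small matching sketch. It is not Trident's selection logic: the `Pool` class, the attribute keys (`performance`, `zone`), the aggregate and pool names, and the exact-match rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    name: str
    physical_pools: list            # e.g. ONTAP aggregates backing this pool
    attributes: dict = field(default_factory=dict)


def matching_pools(pools, requirements):
    """Return the pools whose attributes satisfy every requested key/value."""
    return [
        pool for pool in pools
        if all(pool.attributes.get(k) == v for k, v in requirements.items())
    ]


# One physical pool split into two virtual pools with different service levels,
# plus one virtual pool that spans two physical pools in another zone.
POOLS = [
    Pool("gold-a", ["aggr1"], {"performance": "gold", "zone": "a"}),
    Pool("bronze-a", ["aggr1"], {"performance": "bronze", "zone": "a"}),
    Pool("gold-b", ["aggr2", "aggr3"], {"performance": "gold", "zone": "b"}),
]

if __name__ == "__main__":
    for pool in matching_pools(POOLS, {"performance": "gold"}):
        print(pool.name, pool.physical_pools)
```

A request that names both a performance level and a zone narrows the result to a single pool, which corresponds to scheduling a volume onto the appropriate topology.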

Slide 70

Slide 70 text

TridentMirrorRelationship and SnapMirror ▶ You can use the TridentMirrorRelationship resource to set up SnapMirror for a volume ▶ However, this feature seems to be still under development, and there is no documentation for it ▶ SnapMirror mirrors data to other volumes, roughly like rsync ▶ Basically, only the changes that were written are transferred and appended (incremental forever) ▶ It is also possible to mirror an entire Storage VM (which seems impossible with Trident) ▶ With SnapMirror Cloud, it is also possible to transfer data to S3 (or object storage that supports the S3 protocol)