Slide 1

OCP v4.12 OVN-Kubernetes Deep Dive
Manabu Ori, Red Hat
OpenShift.Run 2023

Slide 2

About me
▸ Name: Manabu Ori (@orimanabu)
▸ Affiliation: Red Hat
▸ Work: Consultant, covering OpenStack, OpenShift, Ansible, and more

Slide 3

Agenda
● What is OVN
● OVSDB
● OpenStack Integration
● OpenShift Integration

Slide 4

What is OVN

Slide 5

What is OVN (Open Virtual Network)?
● A mechanism for building virtual networks that span the OVS instances on multiple hypervisors / k8s nodes
● Started in 2015 as a subproject of OVS (Open vSwitch)
○ First release: 27 Sep 2016 (OVS v2.6)
○ First release of the OpenStack Neutron plugin (networking-ovn): 06 Oct 2016 (Newton)
○ Repository split out from OVS as of OVS v2.11: https://github.com/ovn-org
● Abstracts the overlay network as a logical network
(Diagram: physical network with VM-1/VM-2/VM-A on HV1 and VM-3/VM-4/VM-B on HV2; logical network with Logical Switches for VM-1/VM-2, VM-3/VM-4, and VM-A/VM-B, connected by a Logical Router)

Slide 6

From the keynote by Chris Wright (Red Hat CTO) at the Open vSwitch and OVN 2022 Fall Conference:
https://www.openvswitch.org/support/ovscon2022/slides/Keynote-OVS-OVN-Nov-2022.pdf

Slide 7

OVN features
● Configuration through database operations
● Configuration expressed as Logical Flows
○ Decouples the virtual network from the physical network (OVS)
○ Conceptually much like OpenFlow
■ A pipeline of flow tables; flows with match and action
● Encapsulation between hypervisors / k8s nodes: Geneve or STT
● Distributed L2 and L3 processing
● Native implementations of NAT, DHCP, and load balancing
● L2 / L3 gateways
● Designed to integrate with a CMS (Cloud Management System)
○ OpenStack, Kubernetes, Docker, Mesos, oVirt, ...

OVS vs OVN:
● Scope: OVS is a virtual switch within a single host; OVN is a virtual network spanning multiple hosts
● Configuration: OVS uses OpenFlow + OVSDB; OVN uses Logical Flow + OVSDB

Slide 8

Challenges with Open vSwitch (OVS)
● OVS is extremely powerful, but building an SDN environment with raw OpenFlow is hard
○ "At this point the barrier to adoption is not exactly low; for example, you have to hand-craft the low-level flow logic yourself"
■ Technical document "Overview of OpenFlow", VA Linux Systems Japan
○ "In programming-language terms, it is assembler, or C without a standard library"
■ Mastering TCP/IP: OpenFlow edition, Ohmsha
● OVS is extremely powerful, so...
○ OpenStack's ML2/OVS used to combine OVS, network namespaces, iptables, etc. to implement its various features
○ Exploiting OVS-native functionality should allow more efficient processing
● Hand-crafting OpenFlow separately in every virtualization / container platform is painful
○ OpenStack
○ Kubernetes
○ oVirt, ...


Slide 10

OVN components
● Northbound DB
● Southbound DB
● ovn-northd
● ovn-controller
(Diagram: the Cloud Management System (OpenStack, Kubernetes, etc.) talks to the Northbound DB through networking-ovn / ovn-kubernetes; ovn-northd connects the Northbound DB to the Southbound DB on the management server; each hypervisor / k8s node runs ovn-controller, OVSDB, ovs-vswitchd, and openvswitch.ko, with the OVSDB Management Protocol between the Southbound DB and ovn-controller, and OpenFlow from ovn-controller down to OVS)

Slide 11

OVN components: Northbound DB
● The integration point with the CMS (Cloud Management System)
● Database holding the logical network configuration, i.e. the desired state
○ Logical Port, Logical Switch, Logical Router, ...
(Component diagram repeated from slide 10)

Slide 12

OVN components: Southbound DB
● Database holding the current runtime state
● Mappings between logical ports/switches/routers and physical elements
● The Logical Flow pipeline derived from the runtime state and the logical network
(Component diagram repeated from slide 10)

Slide 13

OVN components: ovn-northd
● Daemon that translates the Northbound DB logical configuration into Southbound DB runtime state
● Generates Logical Flows from the logical network configuration
(Component diagram repeated from slide 10)

Slide 14

OVN components: ovn-controller
● Runs on every hypervisor / k8s node
● Generates physical flows from the Logical Flows
○ e.g. VIF UUID → OpenFlow port
● Injects the physical flows into the OVS on the hypervisor
(Component diagram repeated from slide 10)

Slide 15

OVN components (summary)
● Northbound DB
○ The integration point with the CMS (Cloud Management System)
○ Holds the logical network configuration, i.e. the desired state
■ Logical Port, Logical Switch, Logical Router, ...
● Southbound DB
○ Holds the current runtime state
○ Mappings between logical ports/switches/routers and physical elements
○ The Logical Flow pipeline derived from the runtime state and the logical network
● ovn-northd
○ Daemon that translates the Northbound DB logical configuration into Southbound DB runtime state
○ Generates Logical Flows from the logical network configuration
● ovn-controller
○ Runs on every hypervisor / k8s node
○ Generates physical flows from the Logical Flows
■ e.g. VIF UUID → OpenFlow port
○ Injects the physical flows into the OVS on the hypervisor / k8s node

Slide 16

Example Logical Flows

$ oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-sbctl dump-flows worker-0
Datapath: "worker-0" (c10b6daf-de19-44c1-a310-94a4f776222c)  Pipeline: ingress
  table=0 (ls_in_check_port_sec), priority=100, match=(eth.src[40]), action=(drop;)
  table=0 (ls_in_check_port_sec), priority=100, match=(vlan.present), action=(drop;)
  table=0 (ls_in_check_port_sec), priority=50, match=(1), action=(reg0[15] = check_in_port_sec(); next;)
  table=1 (ls_in_apply_port_sec), priority=50, match=(reg0[15] == 1), action=(drop;)
  table=1 (ls_in_apply_port_sec), priority=0, match=(1), action=(next;)
  table=2 (ls_in_lookup_fdb), priority=0, match=(1), action=(next;)
  table=3 (ls_in_put_fdb), priority=0, match=(1), action=(next;)
  table=4 (ls_in_pre_acl), priority=110, match=(eth.dst == $svc_monitor_mac), action=(next;)
  table=4 (ls_in_pre_acl), priority=110, match=(eth.mcast), action=(next;)
  table=4 (ls_in_pre_acl), priority=110, match=(ip && inport == "stor-worker-0"), action=(next;)
  table=4 (ls_in_pre_acl), priority=110, match=(nd || nd_rs || nd_ra || mldv1 || mldv2 || (udp && udp.src == 546 && udp.dst == 547)), action=(next;)
  table=4 (ls_in_pre_acl), priority=100, match=(ip), action=(reg0[0] = 1; next;)
  table=4 (ls_in_pre_acl), priority=0, match=(1), action=(next;)
  table=25(ls_in_l2_lkup), priority=50, match=(eth.dst == 0a:58:0a:81:02:29), action=(outport = "ovntest_hello-w0"; output;)
  table=25(ls_in_l2_lkup), priority=50, match=(eth.dst == 0a:58:0a:81:02:38), action=(outport = "ovntest_hello-7c5b866886-g5xd8"; output;)
  table=25(ls_in_l2_lkup), priority=50, match=(eth.dst == 6e:6d:05:5b:15:32), action=(outport = "k8s-worker-0"; output;)
  table=25(ls_in_l2_lkup), priority=0, match=(1), action=(outport = get_fdb(eth.dst); next;)
  table=26(ls_in_l2_unknown), priority=50, match=(outport == "none"), action=(drop;)
  table=26(ls_in_l2_unknown), priority=0, match=(1), action=(output;)
Datapath: "worker-0" (c10b6daf-de19-44c1-a310-94a4f776222c)  Pipeline: egress
  table=0 (ls_out_pre_lb), priority=110, match=(eth.mcast), action=(next;)
  table=0 (ls_out_pre_lb), priority=110, match=(eth.src == $svc_monitor_mac), action=(next;)
  table=0 (ls_out_pre_lb), priority=110, match=(ip && outport == "stor-worker-0"), action=(next;)
  table=0 (ls_out_pre_lb), priority=110, match=(nd || nd_rs || nd_ra || mldv1 || mldv2), action=(next;)
  table=0 (ls_out_pre_lb), priority=100, match=(ip), action=(reg0[2] = 1; next;)
  table=0 (ls_out_pre_lb), priority=0, match=(1), action=(next;)
  table=1 (ls_out_pre_acl), priority=110, match=(eth.mcast), action=(next;)
  table=1 (ls_out_pre_acl), priority=110, match=(eth.src == $svc_monitor_mac), action=(next;)
  table=1 (ls_out_pre_acl), priority=110, match=(ip && outport == "stor-worker-0"), action=(next;)
  table=1 (ls_out_pre_acl), priority=110, match=(nd || nd_rs || nd_ra || mldv1 || mldv2 || (udp && udp.src == 546 && udp.dst == 547)), action=(next;)

Slide 17

Logical Flow vs OpenFlow 17 $ oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-sbctl dump-flows worker-0 Datapath: "worker-0" (c10b6daf-de19-44c1-a310-94a4f776222c) Pipeline: ingress table=0 (ls_in_check_port_sec), priority=100 , match=(eth.src[40]), action=(drop;) table=0 (ls_in_check_port_sec), priority=100 , match=(vlan.present), action=(drop;) table=0 (ls_in_check_port_sec), priority=50 , match=(1), action=(reg0[15] = check_in_port_sec(); next;) table=1 (ls_in_apply_port_sec), priority=50 , match=(reg0[15] == 1), action=(drop;) table=1 (ls_in_apply_port_sec), priority=0 , match=(1), action=(next;) table=2 (ls_in_lookup_fdb ), priority=0 , match=(1), action=(next;) table=3 (ls_in_put_fdb ), priority=0 , match=(1), action=(next;) table=4 (ls_in_pre_acl ), priority=110 , match=(eth.dst == $svc_monitor_mac), action=(next;) table=4 (ls_in_pre_acl ), priority=110 , match=(eth.mcast), action=(next;) table=4 (ls_in_pre_acl ), priority=110 , match=(ip && inport == "stor-worker-0"), action=(next;) table=4 (ls_in_pre_acl ), priority=110 , match=(nd || nd_rs || nd_ra || mldv1 || mldv2 || (udp && udp.src == 546 && udp.dst == 547)), action=(next;) table=4 (ls_in_pre_acl ), priority=100 , match=(ip), action=(reg0[0] = 1; next;) table=4 (ls_in_pre_acl ), priority=0 , match=(1), action=(next;) $ oc debug node/worker-0 -- chroot /host ovs-ofctl -O OpenFlow13 dump-flows br-int --no-stats Temporary namespace openshift-debug-t2zd6 is created for debugging node... Starting pod/worker-0-debug ... 
To use host binaries, run `chroot /host` cookie=0xdaf2f9b4, priority=180,vlan_tci=0x0000/0x1000 actions=conjunction(100,2/2) cookie=0xdaf2f9b4, priority=180,conj_id=100,in_port=5,vlan_tci=0x0000/0x1000 actions=set_field:0xe->reg11,set_field:0xd->reg12,set_field:0x10->metadata,set_field:0x1->reg14,set_field:52:54:00:00:13:04->eth_src,resubmit(,8) priority=100,in_port=3 actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38) priority=100,in_port=2 actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38) priority=100,in_port=1 actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38) cookie=0x98f46744, priority=100,in_port=4 actions=set_field:0xb->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x2->reg14,resubmit(,8) cookie=0xf3e2718b, priority=100,in_port=6 actions=set_field:0xc->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x6->reg14,resubmit(,8) cookie=0xf3e4502e, priority=100,in_port=7 actions=set_field:0x10->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x8->reg14,resubmit(,8) cookie=0xbb4a6313, priority=100,in_port=8 actions=set_field:0x12->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x5->reg14,resubmit(,8) cookie=0x5a36182a, priority=100,in_port=10 actions=set_field:0x13->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x7->reg14,resubmit(,8) priority=100,in_port=11 
actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38) priority=100,in_port=12 actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38) cookie=0xda74b9ad, priority=100,in_port=13 actions=set_field:0x5->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x3->reg14,resubmit(,8) Logical Flow OpenFlow

Slide 18

Configuring OVN by hand
● Operating on the OVSDB
○ ovsdb-tool
○ ovsdb-client
● Create a Logical Switch
○ ovn-nbctl ls-add SWITCH_NAME
● Create a Logical Port
○ ovn-nbctl lsp-add SWITCH_NAME PORT_NAME
● Set a MAC address on the Logical Port
○ ovn-nbctl lsp-set-addresses PORT_NAME MAC_ADDRESS
● Bind the Logical Port to a physical port
○ ovs-vsctl add-port BRIDGE INTERFACE -- set Interface INTERFACE external_ids:iface-id=PORT_NAME
↓
● When integrating with OpenStack, Kubernetes, and friends, the Neutron ML2 driver / CNI plugin takes care of all of this for you
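Strung together, the manual workflow above looks roughly like this. This is a sketch, not a recipe: it assumes a reachable OVN northbound database and an existing br-int managed by ovn-controller, the switch/port/interface names are made up, and it uses the current ovn-nbctl command spellings (ls-add, lsp-add, lsp-set-addresses).

```shell
# Create a logical switch "demo-ls" with one logical port "demo-port".
ovn-nbctl ls-add demo-ls
ovn-nbctl lsp-add demo-ls demo-port
ovn-nbctl lsp-set-addresses demo-port "0a:00:00:00:00:01 192.168.100.10"

# Bind the logical port to a local OVS interface: ovn-controller matches
# external_ids:iface-id against the logical port name.
ovs-vsctl add-port br-int demo-vif -- \
    set Interface demo-vif type=internal external_ids:iface-id=demo-port

# The binding should now appear in the Southbound DB.
ovn-sbctl show
```

Once the binding appears, ovn-northd has turned the logical switch into Logical Flows and ovn-controller has compiled them into OpenFlow on br-int.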

Slide 19

OVSDB

Slide 20

OVSDB
● Native JSON-RPC 1.0 support
○ OVSDB Management Protocol (RFC 7047)
● Bidirectional communication
○ Whenever a monitored portion of the database changes, the server tells the client which rows were added or modified (including the new contents) or deleted
● Schema based
● Standard database operations
● Journal-based in-memory datastore
● Atomic transactions

Request:  {"method": <method>, "params": [<parameters>], "id": <nonnull-id>}
Response: {"result": <result>, "error": <error-or-null>, "id": <id>}
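The request/response shapes above can be made concrete with plain JSON construction. A minimal sketch: the helper names are made up, but the member names follow RFC 7047 / JSON-RPC 1.0.

```python
import json

def make_request(method, params, msg_id):
    # An OVSDB JSON-RPC request carries "method", "params", and a non-null "id".
    return json.dumps({"method": method, "params": params, "id": msg_id})

def make_response(result, msg_id, error=None):
    # A response echoes the request "id" and carries "result" plus "error"
    # (null on success).
    return json.dumps({"result": result, "error": error, "id": msg_id})

# A list_dbs request takes no parameters:
req = make_request("list_dbs", [], 0)
# A server might answer with the databases it serves:
resp = make_response(["Open_vSwitch", "OVN_Northbound"], 0)
```

On the wire these strings are simply written to the OVSDB socket (e.g. TCP 6640); the same framing is used for the monitor/update notifications shown later.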

Slide 21

OVSDB
https://twitter.com/Ben_Pfaff/status/453333818653417472

Slide 22

OVSDB - RPC methods, Schema
● RPC methods (client to server)
○ list_dbs
○ get_schema
○ transact
○ cancel
○ monitor
○ monitor_cancel
○ lock
○ steal
○ unlock
○ echo
● Notifications (server to client)
○ update
○ locked
○ stolen
○ echo
● Schema (skeleton)
<database-schema>: {"name": <id>, "tables": {<id>: <table-schema>, ...}, ...}
<table-schema>: {"columns": {<id>: <column-schema>, ...}, ...}
<column-schema>: {"type": <type>, ...}

Slide 23

OVSDB - Open_vSwitch Database

Slide 24

OVSDB - monitoring

Terminal #1:
$ ovsdb-client monitor tcp:127.0.0.1:6640 todo List

Terminal #2:
$ ovsdb-client transact tcp:127.0.0.1:6640 '["todo",{"op":"insert","table":"List","row":{"name":"List1"}}]'
[{"uuid":["uuid","b8654366-1a91-4813-bc5c-24bd23ed8d83"]}]

Terminal #1 (update notification arrives):
row                                  action items name  _version
------------------------------------ ------ ----- ----- ------------------------------------
b8654366-1a91-4813-bc5c-24bd23ed8d83 insert []    List1 e07c4d9e-2544-4c02-b68c-72222b5da82d

Slide 25

OVSDB high availability
● Two clustering modes: Active-Backup and Clustered
○ https://docs.openvswitch.org/en/latest/ref/ovsdb.7/
● Active-Backup
○ Two nodes: one active, one backup
○ The active node behaves the same as a standalone node
○ Clients connect to the active node for reads and writes
○ The backup node connects to the active node and replicates its data
● Clustered
○ Clustering via the Raft distributed-consensus algorithm
○ Odd number of nodes; service continues as long as a majority of the nodes are alive
■ In OpenShift there are 3 masters, so the cluster tolerates the failure of 1 node
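The majority rule behind Clustered mode can be written down in a couple of lines (an illustrative sketch; `quorum` and `tolerated_failures` are made-up names, not an OVSDB API):

```python
def quorum(n):
    # Raft commits an entry (and elects a leader) only once a strict
    # majority of the n servers agree.
    return n // 2 + 1

def tolerated_failures(n):
    # Service continues while at least quorum(n) servers are alive.
    return n - quorum(n)

# With OpenShift's 3 masters: quorum is 2, so exactly 1 failure is tolerated.
# A 5-node cluster would tolerate 2.
```

This is also why clusters use an odd node count: going from 3 to 4 nodes raises the quorum to 3 without increasing the number of tolerated failures.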

Slide 26

Tips
● In Clustered mode, check which pod is the Raft leader/follower:

$ oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
95a5
Name: OVN_Northbound
Cluster ID: 6a7b (6a7b2916-0fae-4208-8dda-aecca921761c)
Server ID: 95a5 (95a5f5fd-1959-4920-8905-d0b5ad99ddc4)
Address: ssl:172.16.13.103:9643
Status: cluster member
Role: leader
Term: 7
Leader: self
Vote: self
Last Election started 79806997 ms ago, reason: leadership_transfer
Last Election won: 79806988 ms ago
Election timer: 10000
Log: [21725, 27838]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 <-4846 <-4110 ->4110
Disconnections: 0
Servers:
    4846 (4846 at ssl:172.16.13.101:9643) next_index=27838 match_index=27837 last msg 844 ms ago
    95a5 (95a5 at ssl:172.16.13.103:9643) (self) next_index=21731 match_index=27837
    4110 (4110 at ssl:172.16.13.102:9643) next_index=27838 match_index=27837 last msg 844 ms ago

Slide 27

OpenStack Integration

Slide 28

Integration with OpenStack
● Neutron ML2 driver: networking-ovn
(Diagram comparing the ML2/OVS and ML2/OVN architectures)

Slide 29

Mapping between Neutron and OVN constructs
NEUTRON → OVN
● router → logical router + gateway_chassis (scheduling)
● network → logical switch + dhcp_options
● port → logical switch port (+ logical router port)
● security group → Port_Group + ACL + Address_Set
● floating ip → NAT (dnat_and_snat entry type)
● load balancer (in Octavia, WIP!) → Load_Balancer

Slide 30

networking-ovn features
● L2
○ ARP responder functionality
● L3
○ Native IPv4/IPv6 routing support in OVN
■ No L3 agent needed
○ Distributed routers
○ Efficient, since traffic does not have to cross network namespaces
● Security groups
○ Uses the kernel conntrack module directly from OVS
○ Same behavior as Neutron's firewall_driver = openvswitch
● DHCP
○ ovn-controller itself implements DHCP
■ No DHCP agent needed
■ No more "swarm of dnsmasq processes" hell
○ Only simple use cases are covered

Slide 31

networking-ovn features (cont.)
● Metadata
○ Current implementation: namespace + haproxy
○ No communication needed between the metadata agent and neutron-server
● Octavia
○ An OVN Octavia driver is under development
○ Removes the need for Amphora VMs
(Diagram: on Chassis 1, VM1-VM4 and localports A/B attach to br-int; haproxy instances run in namespaces nsA/nsB and talk to ovn-metadata-agent over a UNIX socket)

Slide 32

OpenShift Integration

Slide 33

Mapping of OVN components to nodes
(Diagram: Kubernetes / OpenShift drives ovn-kubernetes as the CMS; the management-server components (Northbound DB, ovn-northd, Southbound DB) run on the master nodes; every master and worker node runs ovn-controller, OVSDB, ovs-vswitchd, and openvswitch.ko; the Southbound DB talks to each ovn-controller via the OVSDB Management Protocol, and ovn-controller programs OVS via OpenFlow)

Slide 34

Pod layout (1)
● ovnkube-master
○ DaemonSet running on the master nodes
○ Six containers:
■ northd
● Daemon that propagates NBDB state into the SBDB
■ nbdb
● OVSDB (Northbound DB)
● Raft cluster across the 3 masters
■ sbdb
● OVSDB (Southbound DB)
● Raft cluster across the 3 masters
■ ovnkube-master
● Carves per-node subnets out of the cluster network
● Updates the NBDB when a new pod is created
■ kube-rbac-proxy
● Proxy that applies RBAC auth to requests against the k8s API
■ ovn-dbchecker
● Monitors the state of the OVSDB Raft clusters

Slide 35

Pod layout (2)
● ovnkube-node
○ DaemonSet running on every node
○ Five containers:
■ ovn-controller
● Daemon that generates this node's physical flows from the SBDB
■ ovn-acl-logging
● Produces the ACL audit log
■ kube-rbac-proxy
● Proxy that applies RBAC auth to requests against the k8s API
■ kube-rbac-proxy-ovn-metrics
● Proxy that applies RBAC auth to requests against the k8s API
■ ovnkube-node
● Works with the CNI binary (ovn-k8s-cni-overlay) to perform the setup needed for pods to attach to the network
● Note: OVS itself runs on the host OS as a systemd service unit
○ ovs-* commands can be run directly on the host
○ ovn-* commands must be run after `oc exec`-ing into one of the containers

Slide 36

Commands used
● For day-to-day operations
○ ovn-nbctl
○ ovn-sbctl
○ ovn-trace
● To recall how to pass the private key, certificate, and CA certificate, it is easiest to look at the DaemonSet definition
○ NBDB: TCP 9641
○ SBDB: TCP 9642

$ oc -n openshift-ovn-kubernetes get ds/ovnkube-master -o yaml | grep -A1 OVN_NB_CTL= | sed 's/^ *//'
OVN_NB_CTL="ovn-nbctl -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt \
--db "ssl:172.16.13.101:9641,ssl:172.16.13.102:9641,ssl:172.16.13.103:9641""

Slide 37

Physical layout (OCP v4.12: Shared Gateway Mode)
(Diagram: an OpenShift cluster on 172.16.13.0/24. Each node, e.g. worker-0 (enp1s0 172.16.13.104/24) and master-0 (enp1s0 172.16.13.101/24), has br-ex with 169.254.169.2/29, and br-int carrying the Pod eth0 interfaces, ovn-k8s-mp0 (10.129.2.2/23 on worker-0, 10.130.0.2/23 on master-0), and Geneve tunnel links to the other nodes)

Slide 38

Overlay virtual network (OCP v4.12: Shared Gateway Mode)
(Diagram of the logical topology; legend: logical switch, logical router, logical load balancer.
● Node-local logical switches master-0 and worker-0 with Pod ports, management ports k8s-master-0 (10.130.0.2/23) / k8s-worker-0 (10.129.2.2/23) backed by ovn-k8s-mp0 on br-int, and stor-master-0 / stor-worker-0 ports
● Distributed logical router ovn_cluster_router with rtos-master-0 (10.130.0.1/23), rtos-worker-0 (10.129.2.1/23), and rtoj-ovn_cluster_router (100.64.0.1/29)
● The join logical switch connecting ovn_cluster_router to the per-node gateway routers via jtor-* / rtoj-* ports (rtoj-GR_master-0 100.64.0.4/16, rtoj-GR_worker-0 100.64.0.7/16)
● Gateway routers GR_master-0 / GR_worker-0 with external ports rtoe-GR_master-0 (172.16.13.101/24) / rtoe-GR_worker-0 (172.16.13.104/24)
● External logical switches ext_master-0 / ext_worker-0 (etor-* ports) attached to enp1s0 (br-ex) on the underlay network)

Slide 39

(Reference) Physical layout up to v4.7 (Local Gateway Mode)

Slide 40

(Reference) Overlay virtual network up to v4.7 (Local Gateway Mode)

Slide 41

Overall view 41 stor-worker-1 rtos-worker-0 10.129.2.1/23 0a:58:0a:81:02:01 rtos-worker-1 10.131.0.1/23 0a:58:0a:83:00:01 worker-0 worker-1 stor-worker-0 rtoj-GR_worker-0 100.64.0.7/16 rtoe-GR_worker-0 172.16.13.104/24 jtor-ovn_cluster_router jtor-GR_worker-0 jtor-GR_worker-1 rtoj-GR_worker-1 100.64.0.5/16 rtoe-GR_worker-1 172.16.12.105/24 ovn-k8s-mp0 (br-int) ovn-k8s-mp0 (br-int) worker-1 worker-0 client hello-w0 hello-w1 ovn_cluster_router GR_worker-1 GR_worker-0 join rtoj-ovn_cluster_router 100.64.0.1/16 10.129.2.39 0a:58:0a:81:02:27 10.129.2.41 0a:58:0a:81:02:29 10.131.0.18 0a:58:0a:83:00:12 k8s-worker-0 10.129.2.2/23 k8s-worker-1 10.131.0.2/23

Slide 42

Pod to Pod (same node)

Slide 43

Summary
● On the OVN logical network, pods on the same node attach to that node's logical switch
○ Physically, they attach to the OVS bridge br-int
(Diagram: on worker-0, pods client (10.129.2.39, 0a:58:0a:81:02:27) and hello-w0 (10.129.2.41, 0a:58:0a:81:02:29) sit on the worker-0 logical switch alongside ovn-k8s-mp0 (br-int))

Slide 44

Pod-to-Pod - same node 44 stor-worker-1 rtos-worker-0 10.129.2.1/23 0a:58:0a:81:02:01 rtos-worker-1 10.131.0.1/23 0a:58:0a:83:00:01 worker-0 worker-1 stor-worker-0 rtoj-GR_worker-0 100.64.0.7/16 rtoe-GR_worker-0 172.16.13.104/24 jtor-ovn_cluster_router jtor-GR_worker-0 jtor-GR_worker-1 rtoj-GR_worker-1 100.64.0.5/16 rtoe-GR_worker-1 172.16.12.105/24 ovn-k8s-mp0 (br-int) ovn-k8s-mp0 (br-int) worker-1 worker-0 client hello-w0 hello-w1 ovn_cluster_router GR_worker-1 GR_worker-0 join rtoj-ovn_cluster_router 100.64.0.1/16 10.129.2.39 0a:58:0a:81:02:27 10.129.2.41 0a:58:0a:81:02:29 10.131.0.18 0a:58:0a:83:00:12 k8s-worker-0 10.129.2.2/23 k8s-worker-1 10.131.0.2/23

Slide 45

Pod-to-Pod - same node 45 stor-worker-1 rtos-worker-0 10.129.2.1/23 0a:58:0a:81:02:01 rtos-worker-1 10.131.0.1/23 0a:58:0a:83:00:01 worker-0 worker-1 stor-worker-0 rtoj-GR_worker-0 100.64.0.7/16 rtoe-GR_worker-0 172.16.13.104/24 jtor-ovn_cluster_router jtor-GR_worker-0 jtor-GR_worker-1 rtoj-GR_worker-1 100.64.0.5/16 rtoe-GR_worker-1 172.16.12.105/24 ovn-k8s-mp0 (br-int) ovn-k8s-mp0 (br-int) worker-1 worker-0 client hello-w0 hello-w1 ovn_cluster_router GR_worker-1 GR_worker-0 join rtoj-ovn_cluster_router 100.64.0.1/16 10.129.2.39 0a:58:0a:81:02:27 10.129.2.41 0a:58:0a:81:02:29 10.131.0.18 0a:58:0a:83:00:12 k8s-worker-0 10.129.2.2/23 k8s-worker-1 10.131.0.2/23 oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-trace -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642" --ct new 'inport == "ovntest_client" && eth.src == 0a:58:0a:81:02:27 && ip4.src == 10.129.2.39 && tcp.src == 33333 && eth.dst == 0a:58:0a:81:02:29 && ip4.dst == 10.129.2.41 && tcp.dst == 8080 && ip.ttl == 64 && tcp'

Slide 46

ovn-trace --summary (Pod-to-Pod - same node)

ingress(dp="worker-0", inport="ovntest_client") {
    reg0[15] = check_in_port_sec(); next;
    reg0[0] = 1; next;
    reg0[2] = 1; next;
    ct_lb_mark;
    ct_lb_mark {
        reg0[7] = 1; reg0[9] = 1; next;
        reg0[1] = 1; next;
        ct_commit { ct_mark.blocked = 0; }; next;
        reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); next;
        outport = "ovntest_hello-w0"; output;
        egress(dp="worker-0", inport="ovntest_client", outport="ovntest_hello-w0") {
            reg0[2] = 1; next;
            reg0[0] = 1; next;
            ct_lb_mark;
            ct_lb_mark /* default (use --ct to customize) */ {
                reg0[10] = 1; next;
                reg0[15] = check_out_port_sec(); next;
                output; /* output to "ovntest_hello-w0", type "" */;
            };
        };
    };
};

Annotations: the first block is the ingress datapath of Logical Switch `worker-0`; the nested egress block is its egress datapath.

Slide 47

ovn-trace --detail (Pod-to-Pod - same node) 47 ingress(dp="worker-0", inport="ovntest_client") ----------------------------------------------- 0. ls_in_check_port_sec (northd.c:8327): 1, priority 50, uuid 57ea4ad9 reg0[15] = check_in_port_sec(); next; 4. ls_in_pre_acl (northd.c:5801): ip, priority 100, uuid b95659ee reg0[0] = 1; next; 5. ls_in_pre_lb (northd.c:5971): ip, priority 100, uuid 5c887504 reg0[2] = 1; next; 6. ls_in_pre_stateful (northd.c:5994): reg0[2] == 1, priority 110, uuid 3d49ab72 ct_lb_mark; ct_lb_mark ---------- 7. ls_in_acl_hint (northd.c:6054): ct.new && !ct.est, priority 7, uuid bd5ebb2e reg0[7] = 1; reg0[9] = 1; next; 8. ls_in_acl (northd.c:6668): ip && !ct.est, priority 1, uuid 13bbc321 reg0[1] = 1; next; 15. ls_in_stateful (northd.c:7507): reg0[1] == 1 && reg0[13] == 0, priority 100, uuid 2335ba44 ct_commit { ct_mark.blocked = 0; }; next; 16. ls_in_pre_hairpin (northd.c:7535): ip && ct.trk, priority 100, uuid 2e88b7c6 reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); next; 25. ls_in_l2_lkup (northd.c:8993): eth.dst == 0a:58:0a:81:02:29, priority 50, uuid 96aa0da0 outport = "ovntest_hello-w0"; output; egress(dp="worker-0", inport="ovntest_client", outport="ovntest_hello-w0") -------------------------------------------------------------------------- 0. ls_out_pre_lb (northd.c:5973): ip, priority 100, uuid 73ec1607 reg0[2] = 1; next; 1. ls_out_pre_acl (northd.c:5803): ip, priority 100, uuid cbfe6a43 reg0[0] = 1; next; 2. ls_out_pre_stateful (northd.c:5997): reg0[2] == 1, priority 110, uuid a10d5414 ct_lb_mark; ct_lb_mark /* default (use --ct to customize) */ ------------------------------------------------ 3. ls_out_acl_hint (northd.c:6116): ct.est && ct_mark.blocked == 0, priority 1, uuid 4e73f02a reg0[10] = 1; next; 8. ls_out_check_port_sec (northd.c:5657): 1, priority 0, uuid d9e7bcdf reg0[15] = check_out_port_sec(); next; 9. 
ls_out_apply_port_sec (northd.c:5662): 1, priority 0, uuid 2c3e3cdd output; /* output to "ovntest_hello-w0", type "" */
Annotations: ingress datapath of Logical Switch `worker-0`; egress datapath of Logical Switch `worker-0`.

Slide 48

Pod to Pod (different node)

Slide 49

Summary
● Pod `client` on worker-0 talks to pod `hello-w1` on worker-1
● A packet leaving the source pod `client` goes through the node-local logical switch `worker-0` into the logical router `ovn_cluster_router` that connects the nodes
● Since the destination pod's IP address is in worker-1's subnet, ovn_cluster_router forwards the packet to worker-1's node-local switch
● The packet traverses worker-1's node-local switch and reaches the destination pod (hello-w1)

Slide 50

Pod-to-Pod - different node 50 stor-worker-1 rtos-worker-0 10.129.2.1/23 0a:58:0a:81:02:01 rtos-worker-1 10.131.0.1/23 0a:58:0a:83:00:01 worker-0 worker-1 stor-worker-0 rtoj-GR_worker-0 100.64.0.7/16 rtoe-GR_worker-0 172.16.13.104/24 jtor-ovn_cluster_router jtor-GR_worker-0 jtor-GR_worker-1 rtoj-GR_worker-1 100.64.0.5/16 rtoe-GR_worker-1 172.16.12.105/24 ovn-k8s-mp0 (br-int) ovn-k8s-mp0 (br-int) worker-1 worker-0 client hello-w0 hello-w1 ovn_cluster_router GR_worker-1 GR_worker-0 join rtoj-ovn_cluster_router 100.64.0.1/16 10.129.2.39 0a:58:0a:81:02:27 10.129.2.41 0a:58:0a:81:02:29 10.131.0.18 0a:58:0a:83:00:12 k8s-worker-0 10.129.2.2/23 k8s-worker-1 10.131.0.2/23

Slide 51

Pod-to-Pod - different node 51 stor-worker-1 rtos-worker-0 10.129.2.1/23 0a:58:0a:81:02:01 rtos-worker-1 10.131.0.1/23 0a:58:0a:83:00:01 worker-0 worker-1 stor-worker-0 rtoj-GR_worker-0 100.64.0.7/16 rtoe-GR_worker-0 172.16.13.104/24 jtor-ovn_cluster_router jtor-GR_worker-0 jtor-GR_worker-1 rtoj-GR_worker-1 100.64.0.5/16 rtoe-GR_worker-1 172.16.12.105/24 ovn-k8s-mp0 (br-int) ovn-k8s-mp0 (br-int) worker-1 worker-0 client hello-w0 hello-w1 ovn_cluster_router GR_worker-1 GR_worker-0 join rtoj-ovn_cluster_router 100.64.0.1/16 10.129.2.39 0a:58:0a:81:02:27 10.129.2.41 0a:58:0a:81:02:29 10.131.0.18 0a:58:0a:83:00:12 k8s-worker-0 10.129.2.2/23 k8s-worker-1 10.131.0.2/23 oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-trace -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642" --ct new 'inport == "ovntest_client" && eth.src == 0a:58:0a:81:02:27 && ip4.src == 10.129.2.39 && tcp.src == 33333 && eth.dst == 0a:58:0a:81:02:01 && ip4.dst == 10.131.0.18 && tcp.dst == 8080 && ip.ttl == 64 && tcp'

Slide 52

ovn-trace --summary (Pod-to-Pod - different node)

ingress(dp="worker-0", inport="ovntest_client") {
    reg0[15] = check_in_port_sec(); next;
    reg0[0] = 1; next;
    reg0[2] = 1; next;
    ct_lb_mark;
    ct_lb_mark {
        reg0[7] = 1; reg0[9] = 1; next;
        reg0[1] = 1; next;
        ct_commit { ct_mark.blocked = 0; }; next;
        reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); next;
        outport = "stor-worker-0"; output;
        egress(dp="worker-0", inport="ovntest_client", outport="stor-worker-0") {
            next;
            next;
            reg0[7] = 1; reg0[9] = 1; next;
            reg0[1] = 1; next;
            ct_commit { ct_mark.blocked = 0; }; next;
            reg0[15] = check_out_port_sec(); next;
            output; /* output to "stor-worker-0", type "patch" */;

Annotations: the first block is the ingress datapath of Logical Switch `worker-0`; the nested egress block is its egress datapath.

Slide 53

ovn-trace --summary (Pod-to-Pod - different node)

ingress(dp="ovn_cluster_router", inport="rtos-worker-0") {
    xreg0[0..47] = 0a:58:0a:81:02:01; next;
    reg9[2] = 1; next;
    next;
    reg7 = 0; next;
    ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; reg1 = 10.131.0.1; eth.src = 0a:58:0a:83:00:01; outport = "rtos-worker-1"; flags.loopback = 1; next;
    next;
    reg8[0..15] = 0; next;
    next;
    eth.dst = 0a:58:0a:83:00:12; next;
    outport = "cr-rtos-worker-1"; next;
    output; /* Replacing type "chassisredirect" outport "cr-rtos-worker-1" with distributed port "rtos-worker-1". */;
    egress(dp="ovn_cluster_router", inport="rtos-worker-0", outport="rtos-worker-1") {
        reg9[4] = 0; next;
        output; /* output to "rtos-worker-1", type "patch" */;

Annotations: the first block is the ingress datapath of Logical Router `ovn_cluster_router`; the nested egress block is its egress datapath.

Slide 54

ovn-trace --summary (Pod-to-Pod - different node) 54 ingress(dp="worker-1", inport="stor-worker-1") { reg0[15] = check_in_port_sec(); next; next; next; reg0[7] = 1; reg0[9] = 1; next; reg0[1] = 1; next; ct_commit { ct_mark.blocked = 0; }; next; reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); next; outport = "ovntest_hello-w1"; output; egress(dp="worker-1", inport="stor-worker-1", outport="ovntest_hello-w1") { reg0[2] = 1; next; reg0[0] = 1; next; ct_lb_mark; ct_lb_mark /* default (use --ct to customize) */ { reg0[10] = 1; next; reg0[15] = check_out_port_sec(); next; output; /* output to "ovntest_hello-w1", type "" */; }; }; }; }; }; }; }; };

Slide 55

Pod to ClusterIP (different node)

Slide 56

Summary
● Pod `client` on worker-0 talks to pod `hello-w1` on worker-1 through a ClusterIP
● A packet leaving the source pod `client` enters the node-local logical switch `worker-0`
● On the logical switch worker-0 there is a logical load balancer corresponding to the ClusterIP Service; it picks one of the pods behind the Service
● If the chosen pod is on the same node, the traffic hairpins locally; if it is on a different node, it crosses nodes via the logical router ovn_cluster_router

Slide 57

Pod-to-ClusterIP same/different node 57 stor-worker-1 rtos-worker-0 10.129.2.1/23 0a:58:0a:81:02:01 rtos-worker-1 10.131.0.1/23 0a:58:0a:83:00:01 worker-0 worker-1 stor-worker-0 rtoj-GR_worker-0 100.64.0.7/16 rtoe-GR_worker-0 172.16.13.104/24 jtor-ovn_cluster_router jtor-GR_worker-0 jtor-GR_worker-1 rtoj-GR_worker-1 100.64.0.5/16 rtoe-GR_worker-1 172.16.12.105/24 ovn-k8s-mp0 (br-int) ovn-k8s-mp0 (br-int) worker-1 worker-0 client hello-w0 hello-w1 ovn_cluster_router GR_worker-1 GR_worker-0 join rtoj-ovn_cluster_router 100.64.0.1/16 10.129.2.39 0a:58:0a:81:02:27 10.129.2.41 0a:58:0a:81:02:29 10.131.0.18 0a:58:0a:83:00:12 k8s-worker-0 10.129.2.2/23 k8s-worker-1 10.131.0.2/23 ClusterIP Service 172.30.182.53:80

Slide 58

Pod-to-ClusterIP same/different node 58 stor-worker-1 rtos-worker-0 10.129.2.1/23 0a:58:0a:81:02:01 rtos-worker-1 10.131.0.1/23 0a:58:0a:83:00:01 worker-0 worker-1 stor-worker-0 rtoj-GR_worker-0 100.64.0.7/16 rtoe-GR_worker-0 172.16.13.104/24 jtor-ovn_cluster_router jtor-GR_worker-0 jtor-GR_worker-1 rtoj-GR_worker-1 100.64.0.5/16 rtoe-GR_worker-1 172.16.12.105/24 ovn-k8s-mp0 (br-int) ovn-k8s-mp0 (br-int) worker-1 worker-0 client hello-w0 hello-w1 ovn_cluster_router GR_worker-1 GR_worker-0 join rtoj-ovn_cluster_router 100.64.0.1/16 10.129.2.39 0a:58:0a:81:02:27 10.129.2.41 0a:58:0a:81:02:29 10.131.0.18 0a:58:0a:83:00:12 k8s-worker-0 10.129.2.2/23 k8s-worker-1 10.131.0.2/23 oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-trace -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642" --ct new 'inport == "ovntest_client" && eth.src == 0a:58:0a:81:02:27 && ip4.src == 10.129.2.39 && tcp.src == 33333 && eth.dst == 0a:58:0a:81:02:01 && ip4.dst == 172.30.182.53 && tcp.dst == 80 && ip.ttl == 64 && tcp' ClusterIP Service 172.30.182.53:80

Slide 59

Slide 59 text

Looking at the OpenFlow trace (1) 59

$ oc exec client -- cat /sys/class/net/eth0/iflink
45
$ oc debug node/worker-0 -- chroot /host ovs-vsctl find Interface ifindex=45 2>/dev/null | grep ofport
ofport              : 44
ofport_request      : []

Run on worker-0:
$ sudo ovs-appctl ofproto/trace br-int \
    in_port=44,\
    tcp,\
    dl_src=0a:58:0a:81:02:27,\
    nw_src=10.129.2.39,\
    tcp_src=33333,\
    dl_dst=0a:58:0a:81:02:01,\
    nw_dst=172.30.182.53,\
    tcp_dst=80,\
    nw_ttl=64,\
    dp_hash=2

The client Pod's interface index is 45; the client Pod is attached to port 44 of the OVS bridge br-int.
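The iflink → ifindex → ofport chain above ends by reading the ofport out of the `ovs-vsctl find` output; a minimal sketch of that last parsing step (function name hypothetical):

```python
import re

def parse_ofport(find_output):
    """Pull the ofport number out of `ovs-vsctl find Interface ...` output.

    Matches the `ofport : N` line but not `ofport_request`."""
    m = re.search(r"^ofport\s*:\s*(\d+)", find_output, re.MULTILINE)
    return int(m.group(1)) if m else None

# The output shown on this slide:
sample = """ofport              : 44
ofport_request      : []"""
```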

Slide 60

Slide 60 text

Looking at the OpenFlow trace (2) 60 (run on worker-0)

$ sudo ovs-appctl ofproto/trace br-int in_port=44,tcp,dl_src=0a:58:0a:81:02:27,nw_src=10.129.2.39,tcp_src=33333,dl_dst=0a:58:0a:81:02:01,nw_dst=172.30.182.53,tcp_dst=80,nw_ttl=64,dp_hash=2
Flow: dp_hash=0x2,tcp,in_port=44,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.39,nw_dst=172.30.182.53,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=80,tcp_flags=0

bridge("br-int")
----------------
 0. in_port=44, priority 100, cookie 0xf7f9ca3e
    set_field:0x16->reg13
    set_field:0x7->reg11
    set_field:0x1->reg12
    ...
29. metadata=0xa, priority 0, cookie 0x928911c5
    resubmit(,37)
37. reg15=0xd,metadata=0xa, priority 100, cookie 0x2a5007e1
    set_field:0xa/0xffffff->tun_id
    set_field:0xd->tun_metadata0
    move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30]
     -> NXM_NX_TUN_METADATA0[16..30] is now 0xa
    output:12
     -> output to kernel tunnel
    resubmit(,38)
38. No match.
    drop

Final flow: recirc_id=0xab9fd,dp_hash=0x2,eth,tcp,reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,metadata=0xe,in_port=44,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
Megaflow: recirc_id=0xab9fd,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0x2/0x3,eth,tcp,in_port=44,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.32/29,nw_dst=10.131.0.18,nw_ecn=0,nw_ttl=64,nw_frag=no
Datapath actions: ct(commit,zone=22,mark=0/0x1,nat(src)),set(tunnel(tun_id=0xa,dst=172.16.13.105,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0xa000d}),flags(df|csum|key))),set(eth(src=0a:58:0a:83:00:01,dst=0a:58:0a:83:00:12)),set(ipv4(ttl=63)),4

The packet is sent out port 12 (output:12).
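In the Datapath actions above, tun_id=0xa carries the logical datapath's tunnel key, and the Geneve option (class 0x102, type 0x80) packs the logical ingress port key into bits 16-30 and the logical egress port key into bits 0-15, as described in ovn-architecture(7). A sketch decoding the 0xa000d value seen in the trace:

```python
def decode_geneve_opt(tun_metadata0):
    """Split OVN's 32-bit Geneve option (class 0x0102, type 0x80) into the
    logical ingress port key (bits 16-30) and egress port key (bits 0-15)."""
    ingress = (tun_metadata0 >> 16) & 0x7FFF
    egress = tun_metadata0 & 0xFFFF
    return ingress, egress

# 0xa000d from the trace: ingress port key 0xa, egress port key 0xd.
ingress, egress = decode_geneve_opt(0xA000D)
```

This matches the flow above, where tun_metadata0 is first set to the egress port key (set_field:0xd) and the ingress port key is then moved into bits 16-30.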

Slide 61

Slide 61 text

Looking at the OpenFlow trace (3) 61 (run on worker-0)

$ sudo ovs-ofctl -O OpenFlow13 show br-int
...
 12(ovn-4c2f1b-0): addr:82:9b:d1:10:33:a0
     config:     0
     state:      LIVE
     speed: 0 Mbps now, 0 Mbps max
...
$ sudo ovs-vsctl show
...
6f785367-2ac9-40bc-9728-64db3aeb2e8d
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        ...
        Port ovn-4c2f1b-0
            Interface ovn-4c2f1b-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="172.16.13.105"}
...

The interface name of port 12 is ovn-4c2f1b-0; ovn-4c2f1b-0 is the Geneve tunnel to 172.16.13.105 (worker-1).

Bonus: find out in one shot which Geneve tunnel port 12 is:
$ sudo ovs-vsctl --columns=options find interface type=geneve ofport=12
options             : {csum="true", key=flow, remote_ip="172.16.13.105"}
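The one-shot lookup above returns the `options` column as text; a minimal sketch of pulling the tunnel's remote_ip out of it (function name hypothetical):

```python
import re

def parse_remote_ip(options_text):
    """Extract remote_ip from an ovs-vsctl `options` column such as
    {csum="true", key=flow, remote_ip="172.16.13.105"}."""
    m = re.search(r'remote_ip="([^"]+)"', options_text)
    return m.group(1) if m else None

# The options column shown on this slide:
opts = 'options             : {csum="true", key=flow, remote_ip="172.16.13.105"}'
```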

Slide 62

Slide 62 text

Looking at the OpenFlow trace (4) 62 (run on worker-1)

$ sudo ovs-appctl ofproto/trace br-int \
    in_port=3,\
    tun_id=0xa,\
    tun_metadata0=0xa000d,\
    tcp,\
    reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,metadata=0xe,\
    vlan_tci=0x0000,\
    dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
Flow: tcp,reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,tun_id=0xa,metadata=0xe,in_port=3,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0

bridge("br-int")
----------------
 0. in_port=3, priority 100
    move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23]
     -> OXM_OF_METADATA[0..23] is now 0xa
    ...
64. priority 0
    resubmit(,65)
65. reg15=0x11,metadata=0xc, priority 100, cookie 0xccf44662
    output:23

Final flow: recirc_id=0x124769,eth,tcp,reg0=0x287,reg11=0x3,reg12=0x4,reg13=0x17,reg14=0x1,reg15=0x11,tun_id=0xa,metadata=0xc,in_port=3,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
Megaflow: recirc_id=0x124769,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0/0x1,eth,ip,in_port=3,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.128.0.0/15,nw_dst=10.131.0.18,nw_frag=no
Datapath actions: ct(commit,zone=23,mark=0/0x1,nat(src)),18

For reference, the trace result from worker-0 (the input fields above come from its Final flow):
Final flow: recirc_id=0xab9fd,dp_hash=0x2,eth,tcp,reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,metadata=0xe,in_port=44,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
Megaflow: recirc_id=0xab9fd,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0x2/0x3,eth,tcp,in_port=44,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.32/29,nw_dst=10.131.0.18,nw_ecn=0,nw_ttl=64,nw_frag=no
Datapath actions: ct(commit,zone=22,mark=0/0x1,nat(src)),set(tunnel(tun_id=0xa,dst=172.16.13.105,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0xa000d}),flags(df|csum|key))),set(eth(src=0a:58:0a:83:00:01,dst=0a:58:0a:83:00:12)),set(ipv4(ttl=63)),4

Slide 63

Slide 63 text

Looking at the OpenFlow trace (4) 63 (run on worker-1)

(Same ofproto/trace command and output as the previous slide.) The packet is sent out port 23 (output:23).

Slide 64

Slide 64 text

Looking at the OpenFlow trace (5) 64 (run on worker-1)

$ sudo ovs-vsctl --column=external_ids find interface ofport=23
external_ids        : {attached_mac="0a:58:0a:83:00:12", iface-id=ovntest_hello-w1, iface-id-ver="25e16505-3656-49df-8268-80b20fe065d3", ip_addresses="10.131.0.18/23", ovn-installed="true", ovn-installed-ts="1674478354432", sandbox=f0888b516b6dabb7b8e967672de1c0e731d849a63c6f9798e35ed9c69063d354}

$ sudo ovs-vsctl --columns=ofport find interface type=geneve options:remote_ip=172.16.13.104
ofport              : 3

Port 23 is the hello-w1 Pod; the Geneve tunnel from worker-0 (172.16.13.104) is port 3.

Slide 65

Slide 65 text

Pod-to-NodePort (overview diagram)

Slide 66

Slide 66 text

Pod-to-NodePort (1) 66

(Same topology diagram as before; the client Pod on worker-0 runs `curl master-0:31513`.)

NAME             TYPE       CLUSTER-IP      PORT(S)
hello-nodeport   NodePort   172.30.49.137   80:31513/TCP

oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-trace \
  -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt \
  --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642" \
  --ct new \
  'inport == "ovntest_client" && eth.src == 0a:58:0a:81:02:27 && ip4.src == 10.129.2.39 && tcp.src == 33333 && eth.dst == 0a:58:0a:81:02:01 && ip4.dst == 172.16.13.101 && tcp.dst == 31513 && ip.ttl == 64 && tcp'
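Conceptually, the gateway router's load balancer DNATs NodePort traffic (node IP + node port) to a backend Pod much like the ClusterIP load balancer does. A toy sketch with values from this deck; the 8080 backend port is taken from the earlier ClusterIP traces and is an assumption for this Service:

```python
# Toy NodePort load-balancer table: (node_ip, node_port) -> backends.
# Entry from the slide: hello-nodeport 172.30.49.137 80:31513/TCP on master-0.
NODEPORT_LB = {
    ("172.16.13.101", 31513): [("10.131.0.18", 8080)],  # hello-w1 (assumed target port)
}

def dnat(dst_ip, dst_port):
    """Return the (backend_ip, backend_port) a NodePort flow is DNATted to,
    or None when no load-balancer VIP matches and the packet is routed as-is."""
    backends = NODEPORT_LB.get((dst_ip, dst_port))
    return backends[0] if backends else None
```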

Slide 67

Slide 67 text

Pod-to-NodePort (2) 67

(Topology diagram: logical switches master-0 and worker-1 attached to ovn_cluster_router, with gateway routers GR_master-0 (rtoe 172.16.13.101/24) and GR_worker-1; Pod hello-w1 (10.131.0.18) on worker-1.)

oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-trace \
  -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt \
  --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642" \
  --ct new \
  'inport == "br-ex_master-0" && eth.src == 52:54:00:00:13:04 && ip4.src == 172.16.13.104 && tcp.src == 33333 && eth.dst == 52:54:00:00:13:01 && ip4.dst == 172.16.13.101 && tcp.dst == 31513 && ip.ttl == 64 && tcp'

Slide 68

Slide 68 text

Miscellaneous 68 ● When resolving names against the internal DNS (ns: openshift-dns, svc: dns-default), queries preferentially go to the CoreDNS Pod on the local node ○ https://github.com/openshift/ovn-kubernetes/pull/896 ○ https://docs.google.com/presentation/d/1_5Dh3HTVSpETvVhZszE41REwNPiFFxanhGCAAFg5iPQ/edit#slide=id.g144e6452910_0_37 ● Egress Router is implemented as a CNI plugin ○ https://github.com/openshift/egress-router-cni ● The key to scalability improvements: OVN Interconnect ○ https://www.openvswitch.org/support/ovscon2022/slides/OVN-IC-OVSCON.pdf ○ https://www.openvswitch.org/support/ovscon2019/day1/1501-Multi-tenant%20Inter-DC%20tunneling%20with%20OVN(4).pdf

Slide 69

Slide 69 text

Thank You 69