
OVN-Kubernetes-Introduction-ja-2023-01-27.pdf

orimanabu
January 27, 2023


Presentation slides from OpenShift.Run 2023.

Transcript

  1. About me ▸ Name: Manabu Ori (@orimanabu) ▸ Affiliation: Red Hat ▸ Job: consultant, working on OpenStack, OpenShift, Ansible, and the like
  2. What is OVN (Open Virtual Network)? • A mechanism for building virtual networks that span the OVS instances on multiple hypervisors / k8s nodes • Started in 2015 as a subproject of OVS (Open vSwitch) ◦ First release: 27 Sep 2016 (OVS v2.6) ◦ First release of the OpenStack Neutron plugin (networking-ovn): 06 Oct 2016 (Newton) ◦ Split into its own repository as of OVS v2.11: https://github.com/ovn-org • Abstracts the overlay network as a logical network (Diagram: a physical network of hypervisors HV1/HV2 hosting VM-1..4 and VM-A/B, mapped onto a logical network of Logical Switches joined by a Logical Router)
  3. From the keynote by Chris Wright (Red Hat CTO) at the Open vSwitch and OVN 2022 Fall Conference: https://www.openvswitch.org/support/ovscon2022/slides/Keynote-OVS-OVN-Nov-2022.pdf
  4. OVN features • Configuration through database operations • Configuration expressed as Logical Flows ◦ Decouples the virtual network from the physical network (OVS) ◦ Feels much like OpenFlow ▪ a pipeline of flow tables; each flow has a match and an action • Encapsulation between hypervisors / k8s nodes: Geneve, STT • Distributed L2 and L3 processing • Native implementations of NAT, DHCP, and load balancing • L2 and L3 gateways • Designed to integrate with a CMS (Cloud Management System) ◦ OpenStack, Kubernetes, Docker, Mesos, oVirt, ... Comparison: OVS targets the virtual switch inside a single host and is configured with OpenFlow + OVSDB; OVN targets a virtual network spanning multiple hosts and is configured with Logical Flows + OVSDB
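The "pipeline of flow tables, each flow with a match and an action" model can be sketched as a toy in Python. This is illustrative only, not a real OVN implementation; the table name and entries are loosely borrowed from the Logical Flow dump shown later in this deck:

```python
# Toy model of one flow table: a list of (priority, match, action) entries.
# As in OpenFlow/OVN, the highest-priority entry whose match accepts the
# packet wins; a catch-all match=(1) entry usually sits at low priority.

def lookup(table, packet):
    """Return the action of the highest-priority matching entry."""
    for priority, match, action in sorted(table, key=lambda e: -e[0]):
        if match(packet):
            return action
    return "drop"  # no entry matched at all

# Entries loosely modeled on table 0 (ls_in_check_port_sec) from the deck:
table0 = [
    (100, lambda p: p.get("vlan_present", False), "drop"),  # drop VLAN-tagged frames
    (50,  lambda p: True,                         "next"),  # match=(1): continue pipeline
]

print(lookup(table0, {"vlan_present": True}))  # drop
print(lookup(table0, {}))                      # next
```

Real OVN evaluates such tables one after another ("next" resubmits to the following table), which is what the ovn-trace output later in the deck walks through.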
  5. Challenges with Open vSwitch (OVS) • OVS is extremely powerful, but building an SDN environment with OpenFlow is hard ◦ "At present the barrier to entry is not exactly low; among other things, you have to hand-craft low-level flow logic yourself" ▪ Technical paper "An Overview of OpenFlow", VA Linux Systems Japan ◦ "Compared to programming languages, it is assembler, or C without a standard library" ▪ Mastering TCP/IP: OpenFlow edition, Ohmsha • OVS is extremely powerful, so ◦ OpenStack ML2/OVS used to combine OVS, network namespaces, iptables, etc. to implement its various features ◦ exploiting OVS-native capabilities should make that processing more efficient • Hand-crafting OpenFlow separately in every virtualization / container platform is painful ◦ OpenStack ◦ Kubernetes ◦ oVirt, ...
  6. Challenges with Open vSwitch (OVS) (verbatim repeat of the previous slide)
  7. OVN components • Northbound DB • Southbound DB • ovn-northd • ovn-controller (Diagram: the Cloud Management System (OpenStack, Kubernetes, etc.) drives the Northbound DB through networking-ovn / ovn-kubernetes; on the management server, ovn-northd translates it into the Southbound DB; each hypervisor / k8s node runs ovn-controller, OVSDB, ovs-vswitchd, and openvswitch.ko, connected via the OVSDB Management Protocol and OpenFlow)
  8. OVN components (same component diagram) Northbound DB • the integration point with the CMS (Cloud Management System) • a database holding the logical network configuration, i.e. the desired state ◦ Logical Port, Logical Switch, Logical Router, ...
  9. OVN components (same component diagram) Southbound DB • a database holding the current runtime state • maps logical ports/switches/routers to physical elements • holds the Logical Flow pipeline derived from the runtime state and the logical network
  10. OVN components (same component diagram) ovn-northd • a daemon that translates the Northbound DB logical configuration into Southbound DB runtime state • generates Logical Flows from the logical network configuration
  11. OVN components (same component diagram) ovn-controller • runs on every hypervisor / k8s node • generates physical flows from the Logical Flows ◦ e.g. VIF UUID → OpenFlow port • injects the physical flows into the OVS instance on its hypervisor
  12. OVN components • Northbound DB ◦ the integration point with the CMS (Cloud Management System) ◦ a database holding the logical network configuration, i.e. the desired state ▪ Logical Port, Logical Switch, Logical Router, ... • Southbound DB ◦ a database holding the current runtime state ◦ maps logical ports/switches/routers to physical elements ◦ holds the Logical Flow pipeline derived from the runtime state and the logical network • ovn-northd ◦ a daemon that translates the Northbound DB logical configuration into Southbound DB runtime state ◦ generates Logical Flows from the logical network configuration • ovn-controller ◦ runs on every hypervisor / k8s node ◦ generates physical flows from the Logical Flows ▪ e.g. VIF UUID → OpenFlow port ◦ injects the physical flows into the OVS instance on its hypervisor / k8s node
  13. Example Logical Flows $ oc -n openshift-ovn-kubernetes exec -c northd

    ovnkube-master-zwglk -- ovn-sbctl dump-flows worker-0 Datapath: "worker-0" (c10b6daf-de19-44c1-a310-94a4f776222c) Pipeline: ingress table=0 (ls_in_check_port_sec), priority=100 , match=(eth.src[40]), action=(drop;) table=0 (ls_in_check_port_sec), priority=100 , match=(vlan.present), action=(drop;) table=0 (ls_in_check_port_sec), priority=50 , match=(1), action=(reg0[15] = check_in_port_sec(); next;) table=1 (ls_in_apply_port_sec), priority=50 , match=(reg0[15] == 1), action=(drop;) table=1 (ls_in_apply_port_sec), priority=0 , match=(1), action=(next;) table=2 (ls_in_lookup_fdb ), priority=0 , match=(1), action=(next;) table=3 (ls_in_put_fdb ), priority=0 , match=(1), action=(next;) table=4 (ls_in_pre_acl ), priority=110 , match=(eth.dst == $svc_monitor_mac), action=(next;) table=4 (ls_in_pre_acl ), priority=110 , match=(eth.mcast), action=(next;) table=4 (ls_in_pre_acl ), priority=110 , match=(ip && inport == "stor-worker-0"), action=(next;) table=4 (ls_in_pre_acl ), priority=110 , match=(nd || nd_rs || nd_ra || mldv1 || mldv2 || (udp && udp.src == 546 && udp.dst == 547)), action=(next;) table=4 (ls_in_pre_acl ), priority=100 , match=(ip), action=(reg0[0] = 1; next;) table=4 (ls_in_pre_acl ), priority=0 , match=(1), action=(next;) <snip> table=25(ls_in_l2_lkup ), priority=50 , match=(eth.dst == 0a:58:0a:81:02:29), action=(outport = "ovntest_hello-w0"; output;) table=25(ls_in_l2_lkup ), priority=50 , match=(eth.dst == 0a:58:0a:81:02:38), action=(outport = "ovntest_hello-7c5b866886-g5xd8"; output;) table=25(ls_in_l2_lkup ), priority=50 , match=(eth.dst == 6e:6d:05:5b:15:32), action=(outport = "k8s-worker-0"; output;) table=25(ls_in_l2_lkup ), priority=0 , match=(1), action=(outport = get_fdb(eth.dst); next;) table=26(ls_in_l2_unknown ), priority=50 , match=(outport == "none"), action=(drop;) table=26(ls_in_l2_unknown ), priority=0 , match=(1), action=(output;) Datapath: "worker-0" (c10b6daf-de19-44c1-a310-94a4f776222c) Pipeline: egress table=0 (ls_out_pre_lb ), 
priority=110 , match=(eth.mcast), action=(next;) table=0 (ls_out_pre_lb ), priority=110 , match=(eth.src == $svc_monitor_mac), action=(next;) table=0 (ls_out_pre_lb ), priority=110 , match=(ip && outport == "stor-worker-0"), action=(next;) table=0 (ls_out_pre_lb ), priority=110 , match=(nd || nd_rs || nd_ra || mldv1 || mldv2), action=(next;) table=0 (ls_out_pre_lb ), priority=100 , match=(ip), action=(reg0[2] = 1; next;) table=0 (ls_out_pre_lb ), priority=0 , match=(1), action=(next;) table=1 (ls_out_pre_acl ), priority=110 , match=(eth.mcast), action=(next;) table=1 (ls_out_pre_acl ), priority=110 , match=(eth.src == $svc_monitor_mac), action=(next;) table=1 (ls_out_pre_acl ), priority=110 , match=(ip && outport == "stor-worker-0"), action=(next;) table=1 (ls_out_pre_acl ), priority=110 , match=(nd || nd_rs || nd_ra || mldv1 || mldv2 || (udp && udp.src == 546 && udp.dst == 547)), action=(next;) <snip>
  14. Logical Flow vs OpenFlow $ oc -n openshift-ovn-kubernetes exec

    -c northd ovnkube-master-zwglk -- ovn-sbctl dump-flows worker-0 Datapath: "worker-0" (c10b6daf-de19-44c1-a310-94a4f776222c) Pipeline: ingress table=0 (ls_in_check_port_sec), priority=100 , match=(eth.src[40]), action=(drop;) table=0 (ls_in_check_port_sec), priority=100 , match=(vlan.present), action=(drop;) table=0 (ls_in_check_port_sec), priority=50 , match=(1), action=(reg0[15] = check_in_port_sec(); next;) table=1 (ls_in_apply_port_sec), priority=50 , match=(reg0[15] == 1), action=(drop;) table=1 (ls_in_apply_port_sec), priority=0 , match=(1), action=(next;) table=2 (ls_in_lookup_fdb ), priority=0 , match=(1), action=(next;) table=3 (ls_in_put_fdb ), priority=0 , match=(1), action=(next;) table=4 (ls_in_pre_acl ), priority=110 , match=(eth.dst == $svc_monitor_mac), action=(next;) table=4 (ls_in_pre_acl ), priority=110 , match=(eth.mcast), action=(next;) table=4 (ls_in_pre_acl ), priority=110 , match=(ip && inport == "stor-worker-0"), action=(next;) table=4 (ls_in_pre_acl ), priority=110 , match=(nd || nd_rs || nd_ra || mldv1 || mldv2 || (udp && udp.src == 546 && udp.dst == 547)), action=(next;) table=4 (ls_in_pre_acl ), priority=100 , match=(ip), action=(reg0[0] = 1; next;) table=4 (ls_in_pre_acl ), priority=0 , match=(1), action=(next;) <snip> $ oc debug node/worker-0 -- chroot /host ovs-ofctl -O OpenFlow13 dump-flows br-int --no-stats Temporary namespace openshift-debug-t2zd6 is created for debugging node... Starting pod/worker-0-debug ... 
To use host binaries, run `chroot /host` cookie=0xdaf2f9b4, priority=180,vlan_tci=0x0000/0x1000 actions=conjunction(100,2/2) cookie=0xdaf2f9b4, priority=180,conj_id=100,in_port=5,vlan_tci=0x0000/0x1000 actions=set_field:0xe->reg11,set_field:0xd->reg12,set_field:0x10->metadata,set_field:0x1->reg14,set_field:52:54:00:00:13:04->eth_src,resubmit(,8) priority=100,in_port=3 actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38) priority=100,in_port=2 actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38) priority=100,in_port=1 actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38) cookie=0x98f46744, priority=100,in_port=4 actions=set_field:0xb->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x2->reg14,resubmit(,8) cookie=0xf3e2718b, priority=100,in_port=6 actions=set_field:0xc->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x6->reg14,resubmit(,8) cookie=0xf3e4502e, priority=100,in_port=7 actions=set_field:0x10->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x8->reg14,resubmit(,8) cookie=0xbb4a6313, priority=100,in_port=8 actions=set_field:0x12->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x5->reg14,resubmit(,8) cookie=0x5a36182a, priority=100,in_port=10 actions=set_field:0x13->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x7->reg14,resubmit(,8) priority=100,in_port=11 
actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38) priority=100,in_port=12 actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38) cookie=0xda74b9ad, priority=100,in_port=13 actions=set_field:0x5->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x3->reg14,resubmit(,8) Logical Flow OpenFlow
  15. Configuring OVN by hand • Operating on the OVSDB ◦ ovsdb-tool ◦ ovsdb-client • Create a Logical Switch ◦ ovn-nbctl ls-add SWITCH_NAME • Create a Logical Port ◦ ovn-nbctl lsp-add SWITCH_NAME PORT_NAME • Set a MAC address on the Logical Port ◦ ovn-nbctl lsp-set-addresses PORT_NAME MAC_ADDRESS • Bind the Logical Port to a physical port ◦ ovs-vsctl add-port BRIDGE INTERFACE -- set Interface INTERFACE external_ids:iface-id=PORT_NAME ↓ • When integrating with OpenStack, Kubernetes, etc., the Neutron ML2 driver / CNI plugin takes care of all of this for you
  16. OVSDB • OVSDB ◦ Native JSON-RPC 1.0 support ▪ OVSDB Management Protocol (RFC 7047) ◦ Bidirectional communication ▪ Whenever a monitored portion of the database changes, the server tells the client what rows were added or modified (including the new contents) or deleted. ◦ Schema based ◦ Standard database operations ◦ Journal-based in-memory datastore ◦ Atomic transactions Request: { "id": <string> or <integer>, "method": <string>, "params": [<object>] } Response: { "id": <string> or <integer>, "result": [<object>], "error": <error> }
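The request/response shape above can be exercised without a live OVSDB server by simply building the JSON. A minimal sketch; the "todo" database and "List" table are the toy names used on the monitoring slide, not part of a real OVN schema:

```python
import json

# Build an OVSDB JSON-RPC 1.0 "transact" request as defined in RFC 7047:
# {"id": ..., "method": ..., "params": [...]} where params begins with the
# database name followed by one or more operation objects.
request = {
    "id": 1,
    "method": "transact",
    "params": ["todo", {"op": "insert", "table": "List", "row": {"name": "List1"}}],
}

wire = json.dumps(request)      # what would be sent over the socket
decoded = json.loads(wire)      # what the server would parse back out

print(decoded["method"])        # transact
print(decoded["params"][0])     # todo
```

A successful reply would carry the same "id" plus a "result" array with one object per operation (e.g. the inserted row's UUID, as seen on the monitoring slide).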
  17. OVSDB - RPC methods, Schema • RPC methods ◦ list_dbs ◦ get_schema ◦ transact ◦ cancel ◦ monitor ◦ monitor_cancel ◦ lock ◦ steal ◦ unlock ◦ echo • Notifications (server to client) ◦ update ◦ locked ◦ stolen ◦ echo • Schema <database-schema>: { "name": <id>, "tables": {<id>: <table-schema>, ...}, ... } <table-schema>: { "columns": {<id>: <column-schema>, ...}, ... } <column-schema>: { "type": <type>, ... }
  18. OVSDB - monitoring # Terminal #1: $ ovsdb-client monitor tcp:127.0.0.1:6640 todo List # Terminal #2: $ ovsdb-client transact tcp:127.0.0.1:6640 '["todo",{"op":"insert","table":"List","row":{"name":"List1"}}]' [{"uuid":["uuid","b8654366-1a91-4813-bc5c-24bd23ed8d83"]}] # Terminal #1 then reports the change: row action items name _version ------------------------------------ ------ ----- ----- ------------------------------------ b8654366-1a91-4813-bc5c-24bd23ed8d83 insert [] List1 e07c4d9e-2544-4c02-b68c-72222b5da82d
  19. OVSDB redundancy • Two clustering modes: Active-Backup and Clustered ◦ https://docs.openvswitch.org/en/latest/ref/ovsdb.7/ • Active-Backup ◦ a two-node setup with one active and one backup node ◦ the active node behaves just like a standalone node ◦ clients connect to the active node for reads and writes ◦ the backup node connects to the active node and replicates its data • Clustered ◦ clustering based on the Raft distributed consensus algorithm ◦ an odd number of nodes; service continues as long as more than half of the nodes are alive ▪ in OpenShift there are 3 masters, so the cluster tolerates a single node failure
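The majority rule above fixes the fault tolerance as a function of cluster size, which is worth making explicit:

```python
# Raft keeps serving as long as a strict majority of servers is alive,
# so an n-node cluster tolerates floor((n - 1) / 2) server failures.

def tolerated_failures(n):
    """Number of node failures an n-node Raft cluster survives."""
    return (n - 1) // 2

for n in (1, 3, 5):
    print(n, "nodes ->", tolerated_failures(n), "failures tolerated")
# The 3-master OpenShift case above tolerates exactly 1 failure;
# going to 5 would tolerate 2, while an even count buys nothing extra.
```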
  20. Tips • Checking which Pod is the Raft leader/follower in Clustered mode: $ oc -n openshift-ovn-kubernetes exec -c

    northd ovnkube-master-zwglk -- ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound 95a5 Name: OVN_Northbound Cluster ID: 6a7b (6a7b2916-0fae-4208-8dda-aecca921761c) Server ID: 95a5 (95a5f5fd-1959-4920-8905-d0b5ad99ddc4) Address: ssl:172.16.13.103:9643 Status: cluster member Role: leader Term: 7 Leader: self Vote: self Last Election started 79806997 ms ago, reason: leadership_transfer Last Election won: 79806988 ms ago Election timer: 10000 Log: [21725, 27838] Entries not yet committed: 0 Entries not yet applied: 0 Connections: ->0000 <-4846 <-4110 ->4110 Disconnections: 0 Servers: 4846 (4846 at ssl:172.16.13.101:9643) next_index=27838 match_index=27837 last msg 844 ms ago 95a5 (95a5 at ssl:172.16.13.103:9643) (self) next_index=21731 match_index=27837 4110 (4110 at ssl:172.16.13.102:9643) next_index=27838 match_index=27837 last msg 844 ms ago
  21. Mapping Neutron constructs to OVN NEUTRON → OVN: router → logical router + gateway_chassis (scheduling); network → logical switch + dhcp_options; port → logical switch port (+ logical router port); security group → Port_Group + ACL + Address_Set; floating ip → NAT (dnat_and_snat entry type); load balancer (in Octavia, WIP!) → Load_Balancer
  22. networking-ovn features • L2 ◦ ARP responder functionality • L3 ◦ native IPv4/IPv6 routing support in OVN ▪ no L3 agent needed ◦ distributed routers ◦ efficient, since traffic never has to cross a network namespace • Security Group ◦ uses the kernel conntrack module directly from OVS ◦ same behavior as Neutron's firewall_driver = openvswitch • DHCP ◦ ovn-controller itself implements DHCP ▪ no DHCP agent needed ▪ no more swarm-of-dnsmasq-processes hell ◦ only simple use cases are covered
  23. networking-ovn features • Metadata ◦ the current implementation uses a namespace + haproxy ◦ no communication is needed between the metadata agent and neutron-server • Octavia ◦ an OVN Octavia driver is under development ◦ removes the need for Amphora VMs (Diagram: on Chassis 1, VMs reach per-network haproxy instances running in namespaces nsA/nsB via localports on br-int; ovn-metadata-agent manages them over a UNIX socket)
  24. Mapping OVN components to nodes (Diagram: in Kubernetes / OpenShift, ovn-kubernetes together with the Northbound DB, Southbound DB, and ovn-northd runs on the management servers (master nodes), while every node runs ovn-controller, OVSDB, ovs-vswitchd, and openvswitch.ko; they are connected via the OVSDB Management Protocol and OpenFlow)
  25. Pod layout (1) • ovnkube-master ◦ a DaemonSet running on the master nodes ◦ six containers ▪ northd • daemon that reflects NBDB state into the SBDB ▪ nbdb • OVSDB (Northbound DB) • a Raft cluster across the 3 masters ▪ sbdb • OVSDB (Southbound DB) • a Raft cluster across the 3 masters ▪ ovnkube-master • carves per-node subnets out of the cluster network • updates the NBDB whenever a new Pod appears ▪ kube-rbac-proxy • proxy that applies RBAC auth to requests against the k8s API ▪ ovn-dbchecker • watches the health of the OVSDB Raft clusters
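The "carves per-node subnets out of the cluster network" step can be sketched with the standard ipaddress module. A sketch under stated assumptions: 10.128.0.0/14 with /23 host subnets is the usual OpenShift default (and matches the /23 node subnets such as 10.129.2.0/23 and 10.131.0.0/23 seen in this deck), but the exact CIDRs depend on the cluster's install-config:

```python
import ipaddress

# Split the cluster network into fixed-size per-node subnets and hand one
# to each node as it joins, roughly what ovnkube-master does.
cluster_cidr = ipaddress.ip_network("10.128.0.0/14")   # assumed cluster network
subnets = cluster_cidr.subnets(new_prefix=23)          # lazy stream of /23s

allocations = {}
for node in ("master-0", "worker-0", "worker-1"):
    allocations[node] = next(subnets)                  # first free /23 wins

for node, subnet in allocations.items():
    print(node, subnet)
# master-0 gets 10.128.0.0/23, then 10.128.2.0/23, 10.128.4.0/23, ...
```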
  26. Pod layout (2) • ovnkube-node ◦ a DaemonSet running on every node ◦ five containers ▪ ovn-controller • daemon that generates this node's physical flows from the SBDB ▪ ovn-acl-logging • produces the ACL audit log ▪ kube-rbac-proxy • proxy that applies RBAC auth to requests against the k8s API ▪ kube-rbac-proxy-ovn-metrics • the same kind of RBAC auth proxy, for the OVN metrics endpoint ▪ ovnkube-node • works with the CNI binary (ovn-k8s-cni-overlay) to perform the setup that connects a Pod to the network • Note: OVS itself runs on the host OS as a systemd service unit ◦ ovs-* commands can be run directly on the host ◦ ovn-* commands must be run after oc exec-ing into one of the containers
  27. Commands used • Operational commands ◦ ovn-nbctl ◦ ovn-sbctl ◦ ovn-trace • For the private key, certificate, CA certificate, etc. arguments, the easiest way to recall them is to look them up in the DaemonSet definition ◦ NBDB: TCP 9641 ◦ SBDB: TCP 9642 $ oc -n openshift-ovn-kubernetes get ds/ovnkube-master -o yaml | grep -A1 OVN_NB_CTL= | sed 's/^ *//' OVN_NB_CTL="ovn-nbctl -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt \ --db "ssl:172.16.13.101:9641,ssl:172.16.13.102:9641,ssl:172.16.13.103:9641""
  28. Physical layout (OCP v4.12: Shared Gateway Mode) (Diagram: an OpenShift cluster on the 172.16.13.0/24 network; each node, e.g. worker-0 (enp1s0 172.16.13.104/24) and master-0 (enp1s0 172.16.13.101/24), attaches enp1s0 to br-ex (which also carries 169.254.169.2/29), while br-int carries the Geneve tunnel links to the other nodes, the ovn-k8s-mp0 management port (10.129.2.2/23 on worker-0, 10.130.0.2/23 on master-0), and the Pods' eth0 interfaces)
  29. Overlay virtual network (OCP v4.12: Shared Gateway Mode) (Diagram: per-node logical switches master-0 and worker-0 connect the Pods and the k8s-* management ports (10.130.0.2/23, 10.129.2.2/23) to ovn_cluster_router via stor-*/rtos-* port pairs (router addresses 10.130.0.1/23 and 10.129.2.1/23); ovn_cluster_router attaches through the join switch (rtoj-ovn_cluster_router 100.64.0.1/29, jtor-* ports) to the per-node gateway routers GR_master-0 (rtoj 100.64.0.4/16, rtoe 172.16.13.101/24) and GR_worker-0 (rtoj 100.64.0.7/16, rtoe 172.16.13.104/24), which connect through the ext_* switches and br-ex/enp1s0 to the underlay network. Legend: logical switch, logical router, logical load balancer)
  30. Overall view 41 stor-worker-1 rtos-worker-0 10.129.2.1/23 0a:58:0a:81:02:01 rtos-worker-1 10.131.0.1/23 0a:58:0a:83:00:01

    worker-0 worker-1 stor-worker-0 rtoj-GR_worker-0 100.64.0.7/16 rtoe-GR_worker-0 172.16.13.104/24 jtor-ovn_cluster_router jtor-GR_worker-0 jtor-GR_worker-1 rtoj-GR_worker-1 100.64.0.5/16 rtoe-GR_worker-1 172.16.12.105/24 ovn-k8s-mp0 (br-int) ovn-k8s-mp0 (br-int) worker-1 worker-0 client hello-w0 hello-w1 ovn_cluster_router GR_worker-1 GR_worker-0 join rtoj-ovn_cluster_router 100.64.0.1/16 10.129.2.39 0a:58:0a:81:02:27 10.129.2.41 0a:58:0a:81:02:29 10.131.0.18 0a:58:0a:83:00:12 k8s-worker-0 10.129.2.2/23 k8s-worker-1 10.131.0.2/23
  31. Pod-to-Pod - same node (same topology diagram as the Overall view: Pods client (10.129.2.39) and hello-w0 (10.129.2.41) on worker-0, hello-w1 (10.131.0.18) on worker-1)
  32. Pod-to-Pod - same node (same topology diagram) oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-trace -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642" --ct new 'inport == "ovntest_client" && eth.src == 0a:58:0a:81:02:27 && ip4.src == 10.129.2.39 && tcp.src == 33333 && eth.dst == 0a:58:0a:81:02:29 && ip4.dst == 10.129.2.41 && tcp.dst == 8080 && ip.ttl == 64 && tcp'
  33. ovn-trace --summary (Pod-to-Pod - same node) ingress(dp="worker-0", inport="ovntest_client") {

    reg0[15] = check_in_port_sec(); next; reg0[0] = 1; next; reg0[2] = 1; next; ct_lb_mark; ct_lb_mark { reg0[7] = 1; reg0[9] = 1; next; reg0[1] = 1; next; ct_commit { ct_mark.blocked = 0; }; next; reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); next; outport = "ovntest_hello-w0"; output; egress(dp="worker-0", inport="ovntest_client", outport="ovntest_hello-w0") { reg0[2] = 1; next; reg0[0] = 1; next; ct_lb_mark; ct_lb_mark /* default (use --ct to customize) */ { reg0[10] = 1; next; reg0[15] = check_out_port_sec(); next; output; /* output to "ovntest_hello-w0", type "" */; }; }; }; }; (the ingress datapath of Logical Switch `worker-0`, followed by its egress datapath)
  34. ovn-trace --detail (Pod-to-Pod - same node) ingress(dp="worker-0", inport="ovntest_client") -----------------------------------------------

    0. ls_in_check_port_sec (northd.c:8327): 1, priority 50, uuid 57ea4ad9 reg0[15] = check_in_port_sec(); next; 4. ls_in_pre_acl (northd.c:5801): ip, priority 100, uuid b95659ee reg0[0] = 1; next; 5. ls_in_pre_lb (northd.c:5971): ip, priority 100, uuid 5c887504 reg0[2] = 1; next; 6. ls_in_pre_stateful (northd.c:5994): reg0[2] == 1, priority 110, uuid 3d49ab72 ct_lb_mark; ct_lb_mark ---------- 7. ls_in_acl_hint (northd.c:6054): ct.new && !ct.est, priority 7, uuid bd5ebb2e reg0[7] = 1; reg0[9] = 1; next; 8. ls_in_acl (northd.c:6668): ip && !ct.est, priority 1, uuid 13bbc321 reg0[1] = 1; next; 15. ls_in_stateful (northd.c:7507): reg0[1] == 1 && reg0[13] == 0, priority 100, uuid 2335ba44 ct_commit { ct_mark.blocked = 0; }; next; 16. ls_in_pre_hairpin (northd.c:7535): ip && ct.trk, priority 100, uuid 2e88b7c6 reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); next; 25. ls_in_l2_lkup (northd.c:8993): eth.dst == 0a:58:0a:81:02:29, priority 50, uuid 96aa0da0 outport = "ovntest_hello-w0"; output; egress(dp="worker-0", inport="ovntest_client", outport="ovntest_hello-w0") -------------------------------------------------------------------------- 0. ls_out_pre_lb (northd.c:5973): ip, priority 100, uuid 73ec1607 reg0[2] = 1; next; 1. ls_out_pre_acl (northd.c:5803): ip, priority 100, uuid cbfe6a43 reg0[0] = 1; next; 2. ls_out_pre_stateful (northd.c:5997): reg0[2] == 1, priority 110, uuid a10d5414 ct_lb_mark; ct_lb_mark /* default (use --ct to customize) */ ------------------------------------------------ 3. ls_out_acl_hint (northd.c:6116): ct.est && ct_mark.blocked == 0, priority 1, uuid 4e73f02a reg0[10] = 1; next; 8. ls_out_check_port_sec (northd.c:5657): 1, priority 0, uuid d9e7bcdf reg0[15] = check_out_port_sec(); next; 9. ls_out_apply_port_sec (northd.c:5662): 1, priority 0, uuid 2c3e3cdd output; /* output to "ovntest_hello-w0", type "" */ (the ingress datapath of Logical Switch `worker-0`, followed by its egress datapath)
  35. Summary • Pod `client` on worker-0 talks to Pod `hello-w1` on worker-1 • a packet leaving the source Pod travels through the node-local logical switch `worker-0` and enters the logical router `ovn_cluster_router` that interconnects the nodes • since the destination Pod's IP address belongs to worker-1's subnet, ovn_cluster_router forwards the packet to worker-1's node-local switch • the packet crosses worker-1's node-local switch and reaches the destination Pod (hello-w1)
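The forwarding decision in the summary above boils down to a lookup over the per-node subnets. A sketch using the addresses from the diagrams in this deck (illustrative only; the real router does this via its Logical Flow pipeline):

```python
import ipaddress

# Per-node logical-switch subnets as shown in the topology diagrams.
node_subnets = {
    "worker-0": ipaddress.ip_network("10.129.2.0/23"),
    "worker-1": ipaddress.ip_network("10.131.0.0/23"),
}

def next_hop_switch(dst):
    """Pick the node-local switch whose subnet contains dst,
    i.e. the routing step performed by ovn_cluster_router."""
    dst = ipaddress.ip_address(dst)
    for node, subnet in node_subnets.items():
        if dst in subnet:
            return node
    return None  # not a Pod address on any known node

print(next_hop_switch("10.131.0.18"))  # worker-1 (hello-w1 lives there)
```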
  36. Pod-to-Pod - different node (same topology diagram as the Overall view)
  37. Pod-to-Pod - different node (same topology diagram) oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-trace -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642" --ct new 'inport == "ovntest_client" && eth.src == 0a:58:0a:81:02:27 && ip4.src == 10.129.2.39 && tcp.src == 33333 && eth.dst == 0a:58:0a:81:02:01 && ip4.dst == 10.131.0.18 && tcp.dst == 8080 && ip.ttl == 64 && tcp'
  38. ovn-trace --summary (Pod-to-Pod - different node) ingress(dp="worker-0", inport="ovntest_client") {

    reg0[15] = check_in_port_sec(); next; reg0[0] = 1; next; reg0[2] = 1; next; ct_lb_mark; ct_lb_mark { reg0[7] = 1; reg0[9] = 1; next; reg0[1] = 1; next; ct_commit { ct_mark.blocked = 0; }; next; reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); next; outport = "stor-worker-0"; output; egress(dp="worker-0", inport="ovntest_client", outport="stor-worker-0") { next; next; reg0[7] = 1; reg0[9] = 1; next; reg0[1] = 1; next; ct_commit { ct_mark.blocked = 0; }; next; reg0[15] = check_out_port_sec(); next; output; /* output to "stor-worker-0", type "patch" */; (the ingress datapath of Logical Switch `worker-0`, followed by its egress datapath)
  39. ovn-trace --summary (Pod-to-Pod - different node) ingress(dp="ovn_cluster_router", inport="rtos-worker-0") {

    xreg0[0..47] = 0a:58:0a:81:02:01; next; reg9[2] = 1; next; next; reg7 = 0; next; ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; reg1 = 10.131.0.1; eth.src = 0a:58:0a:83:00:01; outport = "rtos-worker-1"; flags.loopback = 1; next; next; reg8[0..15] = 0; next; next; eth.dst = 0a:58:0a:83:00:12; next; outport = "cr-rtos-worker-1"; next; output; /* Replacing type "chassisredirect" outport "cr-rtos-worker-1" with distributed port "rtos-worker-1". */; egress(dp="ovn_cluster_router", inport="rtos-worker-0", outport="rtos-worker-1") { reg9[4] = 0; next; output; /* output to "rtos-worker-1", type "patch" */; (the ingress datapath of Logical Router `ovn_cluster_router`, followed by its egress datapath)
  40. ovn-trace --summary (Pod-to-Pod - different node) 54 ingress(dp="worker-1", inport="stor-worker-1") {

    reg0[15] = check_in_port_sec(); next; next; next; reg0[7] = 1; reg0[9] = 1; next; reg0[1] = 1; next; ct_commit { ct_mark.blocked = 0; }; next; reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); next; outport = "ovntest_hello-w1"; output; egress(dp="worker-1", inport="stor-worker-1", outport="ovntest_hello-w1") { reg0[2] = 1; next; reg0[0] = 1; next; ct_lb_mark; ct_lb_mark /* default (use --ct to customize) */ { reg0[10] = 1; next; reg0[15] = check_out_port_sec(); next; output; /* output to "ovntest_hello-w1", type "" */; }; }; }; }; }; }; }; };
  41. Summary • Pod `client` on worker-0 talks to Pod `hello-w1` on worker-1 • a packet leaving the source Pod enters the node-local logical switch `worker-0` • on logical switch worker-0 sits a logical load balancer corresponding to the ClusterIP Service, which distributes traffic across the Pods behind the Service • if the chosen Pod is on the same node the traffic hairpins locally; if it is on a different node it crosses nodes via the logical router ovn_cluster_router
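The logical load balancer's job in the summary above is VIP-to-backend selection. A toy sketch using the Service and endpoints from the following slides; the SHA-256-based pick is purely illustrative (the real datapath uses ct_lb/dp_hash and conntrack to keep a flow pinned to its backend):

```python
import hashlib

# ClusterIP Service from the deck: 172.30.182.53:80 in front of the two
# hello Pods (hello-w0 on worker-0, hello-w1 on worker-1).
backends = [("10.129.2.41", 8080), ("10.131.0.18", 8080)]

def select_backend(src_ip, src_port, vip=("172.30.182.53", 80)):
    """Deterministically map a flow onto one backend, so every packet of
    the same connection reaches the same Pod."""
    key = f"{src_ip}:{src_port}->{vip[0]}:{vip[1]}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return backends[digest % len(backends)]

chosen = select_backend("10.129.2.39", 33333)  # the client flow from the trace
print(chosen)  # one of the two backends, stable for this 5-tuple
```

Whichever backend is chosen, the destination is rewritten from the VIP to the Pod IP:port, which is exactly the 172.30.182.53:80 to 10.131.0.18:8080 rewrite visible in the ofproto/trace output later in the deck.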
  42. Pod-to-ClusterIP same/different node (same topology diagram, plus ClusterIP Service 172.30.182.53:80 in front of hello-w0 and hello-w1)
  43. Pod-to-ClusterIP same/different node (same topology diagram, ClusterIP Service 172.30.182.53:80) oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-trace -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642" --ct new 'inport == "ovntest_client" && eth.src == 0a:58:0a:81:02:27 && ip4.src == 10.129.2.39 && tcp.src == 33333 && eth.dst == 0a:58:0a:81:02:01 && ip4.dst == 172.30.182.53 && tcp.dst == 80 && ip.ttl == 64 && tcp'
  44. Looking at the OpenFlow trace (1) $ oc exec client -- cat /sys/class/net/eth0/iflink

    45 $ oc debug node/worker-0 -- chroot /host ovs-vsctl find Interface ifindex=45 2>/dev/null | grep ofport ofport : 44 ofport_request : [] $ sudo ovs-appctl ofproto/trace br-int \ in_port=44,\ tcp,\ dl_src=0a:58:0a:81:02:27,\ nw_src=10.129.2.39,\ tcp_src=33333,\ dl_dst=0a:58:0a:81:02:01,\ nw_dst=172.30.182.53,\ tcp_dst=80,\ nw_ttl=64,\ dp_hash=2 (the client Pod's interface index is 45; the client Pod is attached to port 44 on the OVS bridge br-int; the ofproto/trace is run on worker-0)
  45. OpenFlowのトレースを見てみる (2) 60

    worker-0上で実行:
    $ sudo ovs-appctl ofproto/trace br-int in_port=44,tcp,dl_src=0a:58:0a:81:02:27,nw_src=10.129.2.39,tcp_src=33333,dl_dst=0a:58:0a:81:02:01,nw_dst=172.30.182.53,tcp_dst=80,nw_ttl=64,dp_hash=2

    Flow: dp_hash=0x2,tcp,in_port=44,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.39,nw_dst=172.30.182.53,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=80,tcp_flags=0

    bridge("br-int")
    ----------------
     0. in_port=44, priority 100, cookie 0xf7f9ca3e
        set_field:0x16->reg13
        set_field:0x7->reg11
        set_field:0x1->reg12
        ...
    29. metadata=0xa, priority 0, cookie 0x928911c5
        resubmit(,37)
    37. reg15=0xd,metadata=0xa, priority 100, cookie 0x2a5007e1
        set_field:0xa/0xffffff->tun_id
        set_field:0xd->tun_metadata0
        move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30]
         -> NXM_NX_TUN_METADATA0[16..30] is now 0xa
        output:12
         -> output to kernel tunnel
        resubmit(,38)
    38. No match.
        drop

    Final flow: recirc_id=0xab9fd,dp_hash=0x2,eth,tcp,reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,metadata=0xe,in_port=44,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
    Megaflow: recirc_id=0xab9fd,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0x2/0x3,eth,tcp,in_port=44,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.32/29,nw_dst=10.131.0.18,nw_ecn=0,nw_ttl=64,nw_frag=no
    Datapath actions: ct(commit,zone=22,mark=0/0x1,nat(src)),set(tunnel(tun_id=0xa,dst=172.16.13.105,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0xa000d}),flags(df|csum|key))),set(eth(src=0a:58:0a:83:00:01,dst=0a:58:0a:83:00:12)),set(ipv4(ttl=63)),4

    → 12番ポートから送信
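トレース中の Geneve カプセル化 (tun_id=0xa, geneve オプション 0xa000d) が何を運んでいるかを分解してみます。ovn-architecture(7) によると、OVN の Geneve エンコーディングでは VNI (tun_id) に論理データパスの tunnel key、オプション (class=0x102, type=0x80) の 32bit 値に「論理入力ポート (bit 16-30) | 論理出力ポート (bit 0-15)」が入ります。数値はこのトレースの値です。

```python
def decode_ovn_geneve_opt(value: int) -> tuple[int, int]:
    """OVN の Geneve オプション (tun_metadata0) を
    (論理入力ポート key, 論理出力ポート key) に分解する"""
    ingress = (value >> 16) & 0x7FFF  # bit 16-30: logical ingress port
    egress = value & 0xFFFF           # bit 0-15:  logical egress port
    return ingress, egress

# Datapath actions に出てきた 0xa000d をデコードする
# (tun_id=0xa は論理データパス側の tunnel key)
print(decode_ovn_geneve_opt(0x000A000D))  # → (10, 13)
```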
  46. OpenFlowのトレースを見てみる (3) 61

    worker-0上で実行:
    $ sudo ovs-ofctl -O OpenFlow13 show br-int
    ...
    12(ovn-4c2f1b-0): addr:82:9b:d1:10:33:a0
        config:     0
        state:      LIVE
        speed: 0 Mbps now, 0 Mbps max
    ...
    → 12番ポートのインターフェース名は ovn-4c2f1b-0

    $ sudo ovs-vsctl show
    ...
    6f785367-2ac9-40bc-9728-64db3aeb2e8d
        Bridge br-int
            fail_mode: secure
            datapath_type: system
            ...
            Port ovn-4c2f1b-0
                Interface ovn-4c2f1b-0
                    type: geneve
                    options: {csum="true", key=flow, remote_ip="172.16.13.105"}
            ...
    → ovn-4c2f1b-0 は 172.16.13.105 (worker-1) 行きのGeneveトンネル

    おまけ: 12番ポートがどのGeneveトンネルかを一撃で調べる
    $ sudo ovs-vsctl --columns=options find interface type=geneve ofport=12
    options             : {csum="true", key=flow, remote_ip="172.16.13.105"}
  47. OpenFlowのトレースを見てみる (4) 62

    worker-1上で実行:
    $ sudo ovs-appctl ofproto/trace br-int \
        in_port=3,\
        tun_id=0xa,\
        tun_metadata0=0xa000d,\
        tcp,\
        reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,metadata=0xe,\
        vlan_tci=0x0000,\
        dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0

    Flow: tcp,reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,tun_id=0xa,metadata=0xe,in_port=3,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0

    bridge("br-int")
    ----------------
     0. in_port=3, priority 100
        move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23]
         -> OXM_OF_METADATA[0..23] is now 0xa
    ...
    64. priority 0
        resubmit(,65)
    65. reg15=0x11,metadata=0xc, priority 100, cookie 0xccf44662
        output:23

    Final flow: recirc_id=0x124769,eth,tcp,reg0=0x287,reg11=0x3,reg12=0x4,reg13=0x17,reg14=0x1,reg15=0x11,tun_id=0xa,metadata=0xc,in_port=3,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
    Megaflow: recirc_id=0x124769,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0/0x1,eth,ip,in_port=3,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.128.0.0/15,nw_dst=10.131.0.18,nw_frag=no
    Datapath actions: ct(commit,zone=23,mark=0/0x1,nat(src)),18

    worker-0でのtrace結果:
    Final flow: recirc_id=0xab9fd,dp_hash=0x2,eth,tcp,reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,metadata=0xe,in_port=44,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
    Megaflow: recirc_id=0xab9fd,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0x2/0x3,eth,tcp,in_port=44,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.32/29,nw_dst=10.131.0.18,nw_ecn=0,nw_ttl=64,nw_frag=no
    Datapath actions: ct(commit,zone=22,mark=0/0x1,nat(src)),set(tunnel(tun_id=0xa,dst=172.16.13.105,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0xa000d}),flags(df|csum|key))),set(eth(src=0a:58:0a:83:00:01,dst=0a:58:0a:83:00:12)),set(ipv4(ttl=63)),4
  48. OpenFlowのトレースを見てみる (4) 63

    worker-1上で実行:
    $ sudo ovs-appctl ofproto/trace br-int \
        in_port=3,\
        tun_id=0xa,\
        tun_metadata0=0xa000d,\
        tcp,\
        reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,metadata=0xe,\
        vlan_tci=0x0000,\
        dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0

    Flow: tcp,reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,tun_id=0xa,metadata=0xe,in_port=3,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0

    bridge("br-int")
    ----------------
     0. in_port=3, priority 100
        move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23]
         -> OXM_OF_METADATA[0..23] is now 0xa
    ...
    64. priority 0
        resubmit(,65)
    65. reg15=0x11,metadata=0xc, priority 100, cookie 0xccf44662
        output:23
    → 23番ポートから送信

    Final flow: recirc_id=0x124769,eth,tcp,reg0=0x287,reg11=0x3,reg12=0x4,reg13=0x17,reg14=0x1,reg15=0x11,tun_id=0xa,metadata=0xc,in_port=3,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
    Megaflow: recirc_id=0x124769,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0/0x1,eth,ip,in_port=3,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.128.0.0/15,nw_dst=10.131.0.18,nw_frag=no
    Datapath actions: ct(commit,zone=23,mark=0/0x1,nat(src)),18

    worker-0でのtrace結果:
    Final flow: recirc_id=0xab9fd,dp_hash=0x2,eth,tcp,reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,metadata=0xe,in_port=44,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
    Megaflow: recirc_id=0xab9fd,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0x2/0x3,eth,tcp,in_port=44,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.32/29,nw_dst=10.131.0.18,nw_ecn=0,nw_ttl=64,nw_frag=no
    Datapath actions: ct(commit,zone=22,mark=0/0x1,nat(src)),set(tunnel(tun_id=0xa,dst=172.16.13.105,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0xa000d}),flags(df|csum|key))),set(eth(src=0a:58:0a:83:00:01,dst=0a:58:0a:83:00:12)),set(ipv4(ttl=63)),4
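Megaflow はカーネル datapath にインストールされるワイルドカード付きのキャッシュエントリで、完全一致ではなく「判定に使われたフィールドだけ」をマッチ条件に含みます。worker-1 側の nw_src=10.128.0.0/15 と worker-0 側の nw_src=10.129.2.32/29 が、実際にトレースした送信元 10.129.2.39 を包含していることを標準ライブラリで確かめる小さなスケッチです (値はこのトレース結果のもの)。

```python
import ipaddress

# 2 つの Megaflow 行に現れた送信元アドレスのワイルドカード
megaflow_w1 = ipaddress.ip_network("10.128.0.0/15")   # worker-1 側
megaflow_w0 = ipaddress.ip_network("10.129.2.32/29")  # worker-0 側

# トレースで指定した client Pod の送信元アドレス
src = ipaddress.ip_address("10.129.2.39")

# どちらの Megaflow もこのパケットにヒットする
print(src in megaflow_w1, src in megaflow_w0)  # → True True
```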
  49. OpenFlowのトレースを見てみる (5) 64

    worker-1上で実行:
    $ sudo ovs-vsctl --column=external_ids find interface ofport=23
    external_ids        : {attached_mac="0a:58:0a:83:00:12", iface-id=ovntest_hello-w1, iface-id-ver="25e16505-3656-49df-8268-80b20fe065d3", ip_addresses="10.131.0.18/23", ovn-installed="true", ovn-installed-ts="1674478354432", sandbox=f0888b516b6dabb7b8e967672de1c0e731d849a63c6f9798e35ed9c69063d354}
    → 23番ポートはhello-w1 Pod

    $ sudo ovs-vsctl --columns=ofport find interface type=geneve options:remote_ip=172.16.13.104
    ofport              : 3
    → worker-0 (172.16.13.104) からのGeneveトンネルは3番ポート
  50. Pod-to-NodePort (1) 66

    (図: client Pod から curl master-0:31513 で NodePort にアクセスする際の、worker-0/worker-1 上の論理ネットワーク構成)

    NAME             TYPE       CLUSTER-IP      PORT(S)
    hello-nodeport   NodePort   172.30.49.137   80:31513/TCP

    oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-trace -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642" --ct new 'inport == "ovntest_client" && eth.src == 0a:58:0a:81:02:27 && ip4.src == 10.129.2.39 && tcp.src == 33333 && eth.dst == 0a:58:0a:81:02:01 && ip4.dst == 172.16.13.101 && tcp.dst == 31513 && ip.ttl == 64 && tcp'
  51. Pod-to-NodePort (2) 67

    (図: master-0 の NodePort (172.16.13.101:31513) に着信したトラフィックが、GR_master-0 を経由して worker-1 上の hello-w1 Pod に届くまでの論理ネットワーク構成)

    oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-trace -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642" --ct new 'inport == "br-ex_master-0" && eth.src == 52:54:00:00:13:04 && ip4.src == 172.16.13.104 && tcp.src == 33333 && eth.dst == 52:54:00:00:13:01 && ip4.dst == 172.16.13.101 && tcp.dst == 31513 && ip.ttl == 64 && tcp'
  52. その他 68

    • 内部DNS (ns: openshift-dns, svc: dns-default) への名前解決時は、ローカルノード上のCoreDNS Podに優先して問い合わせをする
      ◦ https://github.com/openshift/ovn-kubernetes/pull/896
      ◦ https://docs.google.com/presentation/d/1_5Dh3HTVSpETvVhZszE41REwNPiFFxanhGCAAFg5iPQ/edit#slide=id.g144e6452910_0_37
    • Egress RouterはCNI Pluginを用いて実装
      ◦ https://github.com/openshift/egress-router-cni
    • スケーラビリティ改善の鍵: OVN Interconnect
      ◦ https://www.openvswitch.org/support/ovscon2022/slides/OVN-IC-OVSCON.pdf
      ◦ https://www.openvswitch.org/support/ovscon2019/day1/1501-Multi-tenant%20Inter-DC%20tunneling%20with%20OVN(4).pdf