OVN-Kubernetes-Introduction-ja-2023-01-27.pdf

orimanabu
January 27, 2023

Presentation slides from OpenShift.Run 2023


Transcript

  1. OCP v4.12
    OVN-Kubernetes
    Deep Dive
    Manabu Ori
    Red Hat
    1
    OpenShift.Run 2023


  2. About Me
     2
     ▸ Name: Manabu Ori (@orimanabu)
     ▸ Affiliation: Red Hat
     ▸ Role: Consultant
     ・ OpenStack, OpenShift, Ansible, and more

  3. Agenda
     3
     ● What is OVN
     ● OVSDB
     ● OpenStack Integration
     ● OpenShift Integration

  4. What is OVN
     4

  5. What is OVN (Open Virtual Network)?
     5
     ● A mechanism for building virtual networks that span the OVS instances on multiple hypervisors / k8s nodes
     ● Started in 2015 as a subproject of OVS (Open vSwitch)
     ○ First release: 27 Sep 2016 (OVS v2.6)
     ○ First release of the OpenStack Neutron plugin (networking-ovn): 06 Oct 2016 (Newton)
     ○ Split into its own repository as of OVS v2.11: https://github.com/ovn-org
     ● Abstracts the overlay network as a logical network
     (Diagram: VMs VM-1/VM-2/VM-A on HV1 and VM-3/VM-4/VM-B on HV2 in the physical network are mapped onto a
     logical network of Logical Switches connected by a Logical Router)

  6. From the keynote by Chris Wright (Red Hat CTO)
     at the Open vSwitch and OVN 2022 Fall Conference

     6
     https://www.openvswitch.org/support/ovscon2022/slides/Keynote-OVS-OVN-Nov-2022.pdf

  7. OVN Features
     7
     ● Configuration through database operations
     ● Configuration expressed as Logical Flows
     ○ Decouples the virtual network from the physical network (OVS)
     ○ Feels much like OpenFlow
     ■ Pipelines of flow tables; each flow has a match and an action
     ● Encapsulation between hypervisors / k8s nodes uses Geneve or STT
     ● Distributed L2 and L3 processing
     ● Native implementations of NAT, DHCP, and load balancing
     ● L2 and L3 gateways
     ● Designed to integrate with other CMSs (Cloud Management Systems)
     ○ OpenStack, Kubernetes, Docker, Mesos, oVirt, ...

                    OVS                                   OVN
     Scope          Virtual switch within a single host   Virtual network spanning multiple hosts
     Configuration  OpenFlow + OVSDB                       Logical Flow + OVSDB

  8. Challenges with Open vSwitch (OVS)
     8
     ● OVS is extremely powerful, but building an SDN environment with raw OpenFlow is hard
     ○ "At present the barrier to entry is fairly high, e.g. you have to hand-craft low-level flow logic yourself"
     ■ Technical document "OpenFlow Overview", VA Linux Systems Japan
     ○ "Compared to a programming language, it is assembler, or C without a standard library"
     ■ Mastering TCP/IP: OpenFlow edition, Ohmsha
     ● OVS is extremely powerful, and therefore
     ○ OpenStack's ML2/OVS used to combine OVS, network namespaces, iptables, etc. to implement its features
     ○ Making use of OVS-native functionality should allow the same processing to be done more efficiently
     ● Having every virtualization / container platform hand-craft its own OpenFlow logic is painful
     ○ OpenStack
     ○ Kubernetes
     ○ oVirt, ...


  10. OVN Components
     10
     ● Northbound DB
     ● Southbound DB
     ● ovn-northd
     ● ovn-controller
     (Diagram: the Cloud Management System (OpenStack, Kubernetes, etc.) drives the Northbound DB through
     networking-ovn / ovn-kubernetes; the Northbound DB, Southbound DB and ovn-northd run on the management
     servers; on each hypervisor / k8s node, ovn-controller talks to the Southbound DB via the OVSDB
     Management Protocol and programs the local OVSDB / ovs-vswitchd / openvswitch.ko via OpenFlow)

  11. OVN Components
     11
     ● Northbound DB
     ● Southbound DB
     ● ovn-northd
     ● ovn-controller
     (Same diagram as slide 10, highlighting the Northbound DB)
     Northbound DB
     ● The integration point with the CMS (Cloud Management System)
     ● Database holding the logical network configuration, i.e. the desired state
     ○ Logical Port, Logical Switch, Logical Router, ...

  12. OVN Components
     12
     ● Northbound DB
     ● Southbound DB
     ● ovn-northd
     ● ovn-controller
     (Same diagram as slide 10, highlighting the Southbound DB)
     Southbound DB
     ● Database holding the current runtime state
     ● Mapping between logical ports/switches/routers and physical elements
     ● Logical Flow pipelines derived from the runtime state and the logical network

  13. OVN Components
     13
     ● Northbound DB
     ● Southbound DB
     ● ovn-northd
     ● ovn-controller
     (Same diagram as slide 10, highlighting ovn-northd)
     ovn-northd
     ● Daemon that translates the logical configuration in the Northbound DB into runtime state in the Southbound DB
     ● Generates Logical Flows from the logical network configuration

  14. OVN Components
     14
     ● Northbound DB
     ● Southbound DB
     ● ovn-northd
     ● ovn-controller
     (Same diagram as slide 10, highlighting ovn-controller)
     ovn-controller
     ● Runs on every hypervisor / k8s node
     ● Generates physical flows from the Logical Flows
     ○ e.g. VIF UUID → OpenFlow port
     ● Injects the physical flows into the OVS on the hypervisor

  15. OVN Components
     15
     ● Northbound DB
     ○ The integration point with the CMS (Cloud Management System)
     ○ Database holding the logical network configuration, i.e. the desired state
     ■ Logical Port, Logical Switch, Logical Router, ...
     ● Southbound DB
     ○ Database holding the current runtime state
     ○ Mapping between logical ports/switches/routers and physical elements
     ○ Logical Flow pipelines derived from the runtime state and the logical network
     ● ovn-northd
     ○ Daemon that translates the logical configuration in the Northbound DB into runtime state in the Southbound DB
     ○ Generates Logical Flows from the logical network configuration
     ● ovn-controller
     ○ Runs on every hypervisor / k8s node
     ○ Generates physical flows from the Logical Flows
     ■ e.g. VIF UUID → OpenFlow port
     ○ Injects the physical flows into the OVS on each hypervisor / k8s node

  16. Example Logical Flows
    16
    $ oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-sbctl dump-flows worker-0
    Datapath: "worker-0" (c10b6daf-de19-44c1-a310-94a4f776222c) Pipeline: ingress
    table=0 (ls_in_check_port_sec), priority=100 , match=(eth.src[40]), action=(drop;)
    table=0 (ls_in_check_port_sec), priority=100 , match=(vlan.present), action=(drop;)
    table=0 (ls_in_check_port_sec), priority=50 , match=(1), action=(reg0[15] = check_in_port_sec(); next;)
    table=1 (ls_in_apply_port_sec), priority=50 , match=(reg0[15] == 1), action=(drop;)
    table=1 (ls_in_apply_port_sec), priority=0 , match=(1), action=(next;)
    table=2 (ls_in_lookup_fdb ), priority=0 , match=(1), action=(next;)
    table=3 (ls_in_put_fdb ), priority=0 , match=(1), action=(next;)
    table=4 (ls_in_pre_acl ), priority=110 , match=(eth.dst == $svc_monitor_mac), action=(next;)
    table=4 (ls_in_pre_acl ), priority=110 , match=(eth.mcast), action=(next;)
    table=4 (ls_in_pre_acl ), priority=110 , match=(ip && inport == "stor-worker-0"), action=(next;)
    table=4 (ls_in_pre_acl ), priority=110 , match=(nd || nd_rs || nd_ra || mldv1 || mldv2 || (udp && udp.src == 546 && udp.dst == 547)),
    action=(next;)
    table=4 (ls_in_pre_acl ), priority=100 , match=(ip), action=(reg0[0] = 1; next;)
    table=4 (ls_in_pre_acl ), priority=0 , match=(1), action=(next;)

    table=25(ls_in_l2_lkup ), priority=50 , match=(eth.dst == 0a:58:0a:81:02:29), action=(outport = "ovntest_hello-w0"; output;)
    table=25(ls_in_l2_lkup ), priority=50 , match=(eth.dst == 0a:58:0a:81:02:38), action=(outport = "ovntest_hello-7c5b866886-g5xd8"; output;)
    table=25(ls_in_l2_lkup ), priority=50 , match=(eth.dst == 6e:6d:05:5b:15:32), action=(outport = "k8s-worker-0"; output;)
    table=25(ls_in_l2_lkup ), priority=0 , match=(1), action=(outport = get_fdb(eth.dst); next;)
    table=26(ls_in_l2_unknown ), priority=50 , match=(outport == "none"), action=(drop;)
    table=26(ls_in_l2_unknown ), priority=0 , match=(1), action=(output;)
    Datapath: "worker-0" (c10b6daf-de19-44c1-a310-94a4f776222c) Pipeline: egress
    table=0 (ls_out_pre_lb ), priority=110 , match=(eth.mcast), action=(next;)
    table=0 (ls_out_pre_lb ), priority=110 , match=(eth.src == $svc_monitor_mac), action=(next;)
    table=0 (ls_out_pre_lb ), priority=110 , match=(ip && outport == "stor-worker-0"), action=(next;)
    table=0 (ls_out_pre_lb ), priority=110 , match=(nd || nd_rs || nd_ra || mldv1 || mldv2), action=(next;)
    table=0 (ls_out_pre_lb ), priority=100 , match=(ip), action=(reg0[2] = 1; next;)
    table=0 (ls_out_pre_lb ), priority=0 , match=(1), action=(next;)
    table=1 (ls_out_pre_acl ), priority=110 , match=(eth.mcast), action=(next;)
    table=1 (ls_out_pre_acl ), priority=110 , match=(eth.src == $svc_monitor_mac), action=(next;)
    table=1 (ls_out_pre_acl ), priority=110 , match=(ip && outport == "stor-worker-0"), action=(next;)
    table=1 (ls_out_pre_acl ), priority=110 , match=(nd || nd_rs || nd_ra || mldv1 || mldv2 || (udp && udp.src == 546 && udp.dst == 547)),
    action=(next;)


  17. Logical Flow vs OpenFlow
    17
    $ oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-sbctl dump-flows worker-0
    Datapath: "worker-0" (c10b6daf-de19-44c1-a310-94a4f776222c) Pipeline: ingress
    table=0 (ls_in_check_port_sec), priority=100 , match=(eth.src[40]), action=(drop;)
    table=0 (ls_in_check_port_sec), priority=100 , match=(vlan.present), action=(drop;)
    table=0 (ls_in_check_port_sec), priority=50 , match=(1), action=(reg0[15] = check_in_port_sec(); next;)
    table=1 (ls_in_apply_port_sec), priority=50 , match=(reg0[15] == 1), action=(drop;)
    table=1 (ls_in_apply_port_sec), priority=0 , match=(1), action=(next;)
    table=2 (ls_in_lookup_fdb ), priority=0 , match=(1), action=(next;)
    table=3 (ls_in_put_fdb ), priority=0 , match=(1), action=(next;)
    table=4 (ls_in_pre_acl ), priority=110 , match=(eth.dst == $svc_monitor_mac), action=(next;)
    table=4 (ls_in_pre_acl ), priority=110 , match=(eth.mcast), action=(next;)
    table=4 (ls_in_pre_acl ), priority=110 , match=(ip && inport == "stor-worker-0"), action=(next;)
    table=4 (ls_in_pre_acl ), priority=110 , match=(nd || nd_rs || nd_ra || mldv1 || mldv2 || (udp && udp.src == 546 && udp.dst == 547)), action=(next;)
    table=4 (ls_in_pre_acl ), priority=100 , match=(ip), action=(reg0[0] = 1; next;)
    table=4 (ls_in_pre_acl ), priority=0 , match=(1), action=(next;)

    $ oc debug node/worker-0 -- chroot /host ovs-ofctl -O OpenFlow13 dump-flows br-int --no-stats
    Temporary namespace openshift-debug-t2zd6 is created for debugging node...
    Starting pod/worker-0-debug ...
    To use host binaries, run `chroot /host`
    cookie=0xdaf2f9b4, priority=180,vlan_tci=0x0000/0x1000 actions=conjunction(100,2/2)
    cookie=0xdaf2f9b4, priority=180,conj_id=100,in_port=5,vlan_tci=0x0000/0x1000
    actions=set_field:0xe->reg11,set_field:0xd->reg12,set_field:0x10->metadata,set_field:0x1->reg14,set_field:52:54:00:00:13:04->eth_src,resubmit(,8)
    priority=100,in_port=3
    actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38)
    priority=100,in_port=2
    actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38)
    priority=100,in_port=1
    actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38)
    cookie=0x98f46744, priority=100,in_port=4 actions=set_field:0xb->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x2->reg14,resubmit(,8)
    cookie=0xf3e2718b, priority=100,in_port=6 actions=set_field:0xc->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x6->reg14,resubmit(,8)
    cookie=0xf3e4502e, priority=100,in_port=7 actions=set_field:0x10->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x8->reg14,resubmit(,8)
    cookie=0xbb4a6313, priority=100,in_port=8 actions=set_field:0x12->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x5->reg14,resubmit(,8)
    cookie=0x5a36182a, priority=100,in_port=10 actions=set_field:0x13->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x7->reg14,resubmit(,8)
    priority=100,in_port=11
    actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38)
    priority=100,in_port=12
    actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,38)
    cookie=0xda74b9ad, priority=100,in_port=13 actions=set_field:0x5->reg13,set_field:0x7->reg11,set_field:0x1->reg12,set_field:0xe->metadata,set_field:0x3->reg14,resubmit(,8)
    Logical Flow
    OpenFlow


  18. Configuring OVN by Hand
     18
     ● OVSDB operations
     ○ ovsdb-tool
     ○ ovsdb-client
     ● Create a Logical Switch
     ○ ovn-nbctl lswitch-add SWITCH_NAME
     ● Create a Logical Port
     ○ ovn-nbctl lport-add SWITCH_NAME PORT_NAME
     ● Set a MAC address on the Logical Port
     ○ ovn-nbctl lport-set-address PORT_NAME MAC_ADDRESS
     ● Bind the Logical Port to a physical port
     ○ ovs-vsctl add-port BRIDGE INTERFACE -- set Interface INTERFACE
     external_ids:iface-id=PORT_NAME

     ● When integrating with OpenStack, Kubernetes, etc., the Neutron ML2 driver / CNI plugin does all of
     this for you (a minimal end-to-end sketch follows below)
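     As a minimal sketch of the sequence above (the switch, port and interface names sw0, sw0-port1 and vnet0
     are made up here; recent ovn-nbctl releases spell these commands ls-add / lsp-add / lsp-set-addresses
     instead of the older lswitch-* / lport-* forms shown on the slide):

     $ ovn-nbctl ls-add sw0                                    # create a Logical Switch
     $ ovn-nbctl lsp-add sw0 sw0-port1                         # create a Logical Switch Port on it
     $ ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.11"
     $ ovs-vsctl add-port br-int vnet0 -- \
         set Interface vnet0 external_ids:iface-id=sw0-port1   # bind the OVS interface to the logical port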

  19. OVSDB
    19


  20. OVSDB
    20
    ● OVSDB
    ○ Native JSON-RPC 1.0 support
    ■ OVSDB Management Protocol (RFC7047)
    ○ Bidirectional communication
    ■ Whenever a monitored portion of the database changes, the
    server tells the client what rows were added or modified
    (including the new contents) or deleted.
    ○ Schema based
    ○ Standard Database Operations
    ○ Journal based in-memory datastore
    ○ Atomic transactions
     Request:
     {
       "id": <nonnull json value>,
       "method": <string>,
       "params": [<json values>]
     }
     Response:
     {
       "id": <same id as the request>,
       "result": [<objects>],
       "error": <error or null>
     }

  21. OVSDB
    21
    https://twitter.com/Ben_Pfaff/status/453333818653417472


  22. OVSDB - RPC methods, Schema
    22
    ● RPC methods
    ○ list_dbs
    ○ get_schema
    ○ transact
    ○ cancel
    ○ monitor
    ○ monitor_cancel
    ○ lock
    ○ steal
    ○ unlock
    ○ echo
    ○ update
    ○ locked
    ○ stolen
    ○ echo
     ● Schema
     <database-schema>:
     {
       "name": <id>,
       "tables": {<id>: <table-schema>, ...},
       ...
     }
     <table-schema>:
     {
       "columns": {<id>: <column-schema>, ...},
       ...
     }
     <column-schema>:
     {
       "type": <type>,
       ...
     }
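     These RPCs can be exercised directly with ovsdb-client; a rough example against the local Open_vSwitch
     database (the unix socket path below is the usual default and may differ on your system):

     $ ovsdb-client list-dbs unix:/var/run/openvswitch/db.sock                  # list_dbs method
     $ ovsdb-client get-schema unix:/var/run/openvswitch/db.sock Open_vSwitch   # get_schema method
     $ ovsdb-client dump unix:/var/run/openvswitch/db.sock Open_vSwitch Bridge  # effectively a select via transact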

  23. OVSDB - Open_vSwitch Database
    23


  24. OVSDB - monitoring
    24
     Terminal #1
     $ ovsdb-client monitor tcp:127.0.0.1:6640 todo List

     Terminal #2
     $ ovsdb-client transact tcp:127.0.0.1:6640 '[todo,{op:insert, table:List, row:{name:List1} }]'
     [{"uuid":["uuid","b8654366-1a91-4813-bc5c-24bd23ed8d83"]}]

     Terminal #1
     $ ovsdb-client monitor tcp:127.0.0.1:6640 todo List
     row                                  action items name  _version
     ------------------------------------ ------ ----- ----- ------------------------------------
     b8654366-1a91-4813-bc5c-24bd23ed8d83 insert []    List1 e07c4d9e-2544-4c02-b68c-72222b5da82d

  25. OVSDB High Availability
     25
     ● Two clustering modes: Active-Backup and Clustered
     ○ https://docs.openvswitch.org/en/latest/ref/ovsdb.7/
     ● Active-Backup
     ○ Two servers, one Active and one Backup
     ○ The Active node behaves just like a standalone node
     ○ Clients connect to the Active node for reads and writes
     ○ The Backup node connects to the Active node and replicates its data
     ● Clustered
     ○ Clustering based on the Raft distributed consensus algorithm
     ○ Deployed with an odd number of nodes; the service stays available as long as a majority of nodes are alive
     ■ In OpenShift there are 3 masters, so the failure of one node can be tolerated

  26. Tips
    26
     ● In Clustered mode, check which pod is the Raft leader / follower
    $ oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- ovn-appctl -t
    /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
    95a5
    Name: OVN_Northbound
    Cluster ID: 6a7b (6a7b2916-0fae-4208-8dda-aecca921761c)
    Server ID: 95a5 (95a5f5fd-1959-4920-8905-d0b5ad99ddc4)
    Address: ssl:172.16.13.103:9643
    Status: cluster member
    Role: leader
    Term: 7
    Leader: self
    Vote: self
    Last Election started 79806997 ms ago, reason: leadership_transfer
    Last Election won: 79806988 ms ago
    Election timer: 10000
    Log: [21725, 27838]
    Entries not yet committed: 0
    Entries not yet applied: 0
    Connections: ->0000 4110
    Disconnections: 0
    Servers:
    4846 (4846 at ssl:172.16.13.101:9643) next_index=27838 match_index=27837 last msg 844 ms ago
    95a5 (95a5 at ssl:172.16.13.103:9643) (self) next_index=21731 match_index=27837
    4110 (4110 at ssl:172.16.13.102:9643) next_index=27838 match_index=27837 last msg 844 ms ago

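     The same check for the Southbound DB should only differ in the control socket and database name
     (pod name as in the example above; verify the socket path in your environment):

     $ oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- \
         ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound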

  27. OpenStack
    Integration
    27


  28. OpenStack Integration
     28
     ● Neutron ML2 driver: networking-ovn
     (Diagram: ML2/OVS architecture vs. ML2/OVN architecture)

  29. Mapping Between Neutron and OVN Constructs
     29
     NEUTRON              OVN
     router               logical router + gateway_chassis (scheduling)
     network              logical switch + dhcp_options
     port                 logical switch port (+ logical router port)
     security group       Port_Group + ACL + Address_Set
     floating ip          NAT (dnat_and_snat entry type)
     (in Octavia, WIP!)   Load_Balancer

  30. Features of networking-ovn
     30
     ● L2
     ○ ARP responder functionality
     ● L3
     ○ Native IPv4/IPv6 routing support in OVN
     ■ No L3 agent required
     ○ Distributed routers
     ○ Efficient, since traffic does not have to cross network namespaces
     ● Security Group
     ○ Uses the kernel conntrack module directly from OVS
     ○ Same behavior as Neutron's firewall_driver = openvswitch
     ● DHCP
     ○ ovn-controller itself provides the DHCP function
     ■ No DHCP agent required
     ■ No more "army of dnsmasq processes" hell
     ○ Only simple use cases are covered

  31. Features of networking-ovn
     31
     ● Metadata
     ○ The current implementation uses a namespace + haproxy
     ○ No communication between the metadata agent and neutron-server is required
     ● Octavia
     ○ An OVN Octavia driver is under development
     ○ Removes the need for Amphora VMs
     (Diagram: on Chassis 1, ovn-metadata-agent talks over a UNIX socket to haproxy instances running in
     namespaces nsA/nsB, which are attached to br-int through localports, alongside VM1-VM4)

  32. OpenShift
    Integration
    32


  33. Mapping of OVN Components to Nodes
     33
     (Diagram: the component diagram from slide 10, with Kubernetes/OpenShift as the CMS and ovn-kubernetes as
     the integration layer; the Northbound DB, Southbound DB and ovn-northd run on the master nodes, while
     ovn-controller, OVSDB, ovs-vswitchd and openvswitch.ko run on the worker nodes, connected via the OVSDB
     Management Protocol and OpenFlow)

  34. Pod Layout (1)
     34
     ● ovnkube-master
     ○ A DaemonSet that runs on the master nodes
     ○ Six containers
     ■ northd
     ● Daemon that propagates the NBDB state into the SBDB
     ■ nbdb
     ● OVSDB (Northbound DB)
     ● Forms a Raft cluster across the 3 masters
     ■ sbdb
     ● OVSDB (Southbound DB)
     ● Forms a Raft cluster across the 3 masters
     ■ ovnkube-master
     ● Carves a subnet out of the cluster network for each host
     ● Updates the NBDB whenever a new pod is created
     ■ kube-rbac-proxy
     ● A proxy that allows performing RBAC auth against the k8s API
     ■ ovn-dbchecker
     ● Monitors the state of the OVSDB Raft clusters

  35. Pod Layout (2)
     35
     ● ovnkube-node
     ○ A DaemonSet that runs on every node
     ○ Five containers
     ■ ovn-controller
     ● Daemon that generates the physical flows for its own node from the SBDB contents
     ■ ovn-acl-logging
     ● Produces the ACL audit log
     ■ kube-rbac-proxy
     ● A proxy that allows performing RBAC auth against the k8s API
     ■ kube-rbac-proxy-ovn-metrics
     ● A proxy that allows performing RBAC auth against the k8s API
     ■ ovnkube-node
     ● Works together with the CNI binary (ovn-k8s-cni-overlay) to perform the setup needed for
     pods to attach to the network
     ● Note: OVS itself runs on the host OS as a systemd service unit
     ○ ovs-* commands can be run directly on the host
     ○ ovn-* commands are run after oc exec-ing into one of the containers
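     To double-check this container layout in your own cluster, standard JSONPath queries against the two
     DaemonSets should be enough (namespace and DaemonSet names as above):

     $ oc -n openshift-ovn-kubernetes get ds ovnkube-master \
         -o jsonpath='{.spec.template.spec.containers[*].name}{"\n"}'
     $ oc -n openshift-ovn-kubernetes get ds ovnkube-node \
         -o jsonpath='{.spec.template.spec.containers[*].name}{"\n"}'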

  36. Commands Used
     36
     ● Commands for day-to-day operations
     ○ ovn-nbctl
     ○ ovn-sbctl
     ○ ovn-trace
     ● For the private key, certificate, CA certificate flags and so on, the easiest reminder is to look at
     the DaemonSet definition (a full invocation example follows below)
     ○ NBDB: TCP 9641
     ○ SBDB: TCP 9642
     $ oc -n openshift-ovn-kubernetes get ds/ovnkube-master -o yaml | grep -A1 OVN_NB_CTL= | sed 's/^ *//'
     OVN_NB_CTL="ovn-nbctl -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt \
     --db "ssl:172.16.13.101:9641,ssl:172.16.13.102:9641,ssl:172.16.13.103:9641""
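     Putting that together, a typical invocation looks roughly like this (the pod name and DB endpoints are
     those of the demo cluster used in these slides; substitute your own):

     $ oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- \
         ovn-nbctl -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt \
         --db "ssl:172.16.13.101:9641,ssl:172.16.13.102:9641,ssl:172.16.13.103:9641" show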

  37. Physical Layout (OCP v4.12: Shared Gateway Mode)
    37
    worker-0
    172.16.13.0/24
     OpenShift cluster
    br-int
    br-ex
    enp1s0
    172.16.13.104/24
    169.254.169.2/29

    Geneve tunnel links
    to the other nodes
    ovn-k8s-mp0
    10.129.2.2/23
    Pod
    eth0
    Pod
    eth0
    master-0
    br-int
    br-ex
    enp1s0
    172.16.13.101/24
    169.254.169.2/29

    Geneve tunnel links
    to the other nodes
    ovn-k8s-mp0
    10.130.0.2/23
    Pod
    eth0
    Pod
    eth0


  38. Overlay Virtual Network (OCP v4.12: Shared Gateway Mode)
    stor-worker-0
    br-ex_worker-0
    br-ex_master-0
    rtos-master-0(10.130.0.1/23)
    rtos-worker-0(10.129.2.1/23)
    k8s-master-0
    (10.130.0.2/23)
    k8s-worker-0
    (10.129.2.2/23)
    stor-master-0
    rtoj-GR_master-0
    (100.64.0.4/16)
    rtoe-GR_master-0
    (172.16.13.101/24)
    etor-GR_master-0
    jtor-ovn_cluster_router
    jtor-GR_worker-0
    rtoj-GR_worker-0
    (100.64.0.7/16)
    rtoe-GR_worker-0
    (172.16.13.104/24)
    etor-GR_worker-0
    ovn-k8s-mp0
    (br-int)
    ovn-k8s-mp0
    (br-int)
    br-int br-int
    worker-0
    master-0
    Pod Pod
    master-0
    Pod Pod
    worker-0
    ovn_cluster_router
    GR_worker-0
    ext_worker-0
    ext_master-0
    GR_master-0
    join
    rtoj-ovn_cluster_router(100.64.0.1/29)
    Underlay Network
    jtor-GR_master-0
    enp1s0
    (br-ex)
    enp1s0
    (br-ex)
     Legend:
     logical switch
     logical router
     logical load balancer

  39. (For reference) Physical layout up to v4.7 (Local Gateway Mode)

  40. (For reference) Overlay virtual network up to v4.7 (Local Gateway Mode)
     40

  41. Overall view
    41
    stor-worker-1
    rtos-worker-0
    10.129.2.1/23
    0a:58:0a:81:02:01
    rtos-worker-1
    10.131.0.1/23
    0a:58:0a:83:00:01
    worker-0 worker-1
    stor-worker-0
    rtoj-GR_worker-0
    100.64.0.7/16
    rtoe-GR_worker-0
    172.16.13.104/24
    jtor-ovn_cluster_router
    jtor-GR_worker-0 jtor-GR_worker-1
    rtoj-GR_worker-1
    100.64.0.5/16
    rtoe-GR_worker-1
    172.16.12.105/24
    ovn-k8s-mp0
    (br-int)
    ovn-k8s-mp0
    (br-int)
    worker-1
    worker-0
    client hello-w0 hello-w1
    ovn_cluster_router
    GR_worker-1
    GR_worker-0
    join
    rtoj-ovn_cluster_router
    100.64.0.1/16
    10.129.2.39
    0a:58:0a:81:02:27
    10.129.2.41
    0a:58:0a:81:02:29
    10.131.0.18
    0a:58:0a:83:00:12
    k8s-worker-0
    10.129.2.2/23
    k8s-worker-1
    10.131.0.2/23


  42. Pod to Pod
    (same node)
    42


  43. Summary
     43
     ● On the OVN logical network, pods on the same node attach to that node's logical switch
     ○ Physically, they attach to the OVS bridge br-int
    worker-0
    ovn-k8s-mp0
    (br-int)
    client hello-w0
    10.129.2.39
    0a:58:0a:81:02:27
    10.129.2.41
    0a:58:0a:81:02:29

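     One way to confirm this mapping, assuming the cert options and pod name from the earlier slides, is to
     list the ports on the node's logical switch; the pod ports (e.g. ovntest_client, ovntest_hello-w0) should
     show up there:

     $ oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- \
         ovn-nbctl -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt \
         --db "ssl:172.16.13.101:9641,ssl:172.16.13.102:9641,ssl:172.16.13.103:9641" \
         lsp-list worker-0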

  44. Pod-to-Pod - same node
    44
    stor-worker-1
    rtos-worker-0
    10.129.2.1/23
    0a:58:0a:81:02:01
    rtos-worker-1
    10.131.0.1/23
    0a:58:0a:83:00:01
    worker-0 worker-1
    stor-worker-0
    rtoj-GR_worker-0
    100.64.0.7/16
    rtoe-GR_worker-0
    172.16.13.104/24
    jtor-ovn_cluster_router
    jtor-GR_worker-0 jtor-GR_worker-1
    rtoj-GR_worker-1
    100.64.0.5/16
    rtoe-GR_worker-1
    172.16.12.105/24
    ovn-k8s-mp0
    (br-int)
    ovn-k8s-mp0
    (br-int)
    worker-1
    worker-0
    client hello-w0 hello-w1
    ovn_cluster_router
    GR_worker-1
    GR_worker-0
    join
    rtoj-ovn_cluster_router
    100.64.0.1/16
    10.129.2.39
    0a:58:0a:81:02:27
    10.129.2.41
    0a:58:0a:81:02:29
    10.131.0.18
    0a:58:0a:83:00:12
    k8s-worker-0
    10.129.2.2/23
    k8s-worker-1
    10.131.0.2/23


  45. Pod-to-Pod - same node
    45
    stor-worker-1
    rtos-worker-0
    10.129.2.1/23
    0a:58:0a:81:02:01
    rtos-worker-1
    10.131.0.1/23
    0a:58:0a:83:00:01
    worker-0 worker-1
    stor-worker-0
    rtoj-GR_worker-0
    100.64.0.7/16
    rtoe-GR_worker-0
    172.16.13.104/24
    jtor-ovn_cluster_router
    jtor-GR_worker-0 jtor-GR_worker-1
    rtoj-GR_worker-1
    100.64.0.5/16
    rtoe-GR_worker-1
    172.16.12.105/24
    ovn-k8s-mp0
    (br-int)
    ovn-k8s-mp0
    (br-int)
    worker-1
    worker-0
    client hello-w0 hello-w1
    ovn_cluster_router
    GR_worker-1
    GR_worker-0
    join
    rtoj-ovn_cluster_router
    100.64.0.1/16
    10.129.2.39
    0a:58:0a:81:02:27
    10.129.2.41
    0a:58:0a:81:02:29
    10.131.0.18
    0a:58:0a:83:00:12
    k8s-worker-0
    10.129.2.2/23
    k8s-worker-1
    10.131.0.2/23
    oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk --
    ovn-trace -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt
    --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642"
    --ct new
    'inport == "ovntest_client"
    && eth.src == 0a:58:0a:81:02:27
    && ip4.src == 10.129.2.39
    && tcp.src == 33333
    && eth.dst == 0a:58:0a:81:02:29
    && ip4.dst == 10.129.2.41
    && tcp.dst == 8080
    && ip.ttl == 64
    && tcp'


  46. ovn-trace --summary (Pod-to-Pod - same node)
    46
    ingress(dp="worker-0", inport="ovntest_client") {
    reg0[15] = check_in_port_sec();
    next;
    reg0[0] = 1;
    next;
    reg0[2] = 1;
    next;
    ct_lb_mark;
    ct_lb_mark {
    reg0[7] = 1;
    reg0[9] = 1;
    next;
    reg0[1] = 1;
    next;
    ct_commit { ct_mark.blocked = 0; };
    next;
    reg0[6] = chk_lb_hairpin();
    reg0[12] = chk_lb_hairpin_reply();
    next;
    outport = "ovntest_hello-w0";
    output;
    egress(dp="worker-0", inport="ovntest_client", outport="ovntest_hello-w0") {
    reg0[2] = 1;
    next;
    reg0[0] = 1;
    next;
    ct_lb_mark;
    ct_lb_mark /* default (use --ct to customize) */ {
    reg0[10] = 1;
    next;
    reg0[15] = check_out_port_sec();
    next;
    output;
    /* output to "ovntest_hello-w0", type "" */;
    };
    };
    };
    };
     Ingress datapath of Logical Switch `worker-0`
     Egress datapath of Logical Switch `worker-0`

  47. ovn-trace --detail (Pod-to-Pod - same node)
    47
    ingress(dp="worker-0", inport="ovntest_client")
    -----------------------------------------------
    0. ls_in_check_port_sec (northd.c:8327): 1, priority 50, uuid 57ea4ad9
    reg0[15] = check_in_port_sec();
    next;
    4. ls_in_pre_acl (northd.c:5801): ip, priority 100, uuid b95659ee
    reg0[0] = 1;
    next;
    5. ls_in_pre_lb (northd.c:5971): ip, priority 100, uuid 5c887504
    reg0[2] = 1;
    next;
    6. ls_in_pre_stateful (northd.c:5994): reg0[2] == 1, priority 110, uuid 3d49ab72
    ct_lb_mark;
    ct_lb_mark
    ----------
    7. ls_in_acl_hint (northd.c:6054): ct.new && !ct.est, priority 7, uuid bd5ebb2e
    reg0[7] = 1;
    reg0[9] = 1;
    next;
    8. ls_in_acl (northd.c:6668): ip && !ct.est, priority 1, uuid 13bbc321
    reg0[1] = 1;
    next;
    15. ls_in_stateful (northd.c:7507): reg0[1] == 1 && reg0[13] == 0, priority 100, uuid 2335ba44
    ct_commit { ct_mark.blocked = 0; };
    next;
    16. ls_in_pre_hairpin (northd.c:7535): ip && ct.trk, priority 100, uuid 2e88b7c6
    reg0[6] = chk_lb_hairpin();
    reg0[12] = chk_lb_hairpin_reply();
    next;
    25. ls_in_l2_lkup (northd.c:8993): eth.dst == 0a:58:0a:81:02:29, priority 50, uuid 96aa0da0
    outport = "ovntest_hello-w0";
    output;
    egress(dp="worker-0", inport="ovntest_client", outport="ovntest_hello-w0")
    --------------------------------------------------------------------------
    0. ls_out_pre_lb (northd.c:5973): ip, priority 100, uuid 73ec1607
    reg0[2] = 1;
    next;
    1. ls_out_pre_acl (northd.c:5803): ip, priority 100, uuid cbfe6a43
    reg0[0] = 1;
    next;
    2. ls_out_pre_stateful (northd.c:5997): reg0[2] == 1, priority 110, uuid a10d5414
    ct_lb_mark;
    ct_lb_mark /* default (use --ct to customize) */
    ------------------------------------------------
    3. ls_out_acl_hint (northd.c:6116): ct.est && ct_mark.blocked == 0, priority 1, uuid 4e73f02a
    reg0[10] = 1;
    next;
    8. ls_out_check_port_sec (northd.c:5657): 1, priority 0, uuid d9e7bcdf
    reg0[15] = check_out_port_sec();
    next;
    9. ls_out_apply_port_sec (northd.c:5662): 1, priority 0, uuid 2c3e3cdd
    output;
    /* output to "ovntest_hello-w0", type "" */
     Ingress datapath of Logical Switch `worker-0`
     Egress datapath of Logical Switch `worker-0`

  48. Pod to Pod
    (different node)
    48


  49. Summary
     49
     ● Traffic goes from pod `client` on worker-0 to pod `hello-w1` on worker-1
     ● A packet leaving the source pod (`client`) traverses the node-local logical switch `worker-0` and
     enters the logical router `ovn_cluster_router` that connects the nodes
     ● Since the destination pod's IP address belongs to worker-1's subnet, ovn_cluster_router forwards the
     packet to worker-1's node-local logical switch
     ● The packet crosses worker-1's node-local logical switch and reaches the destination pod (hello-w1)
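     To see why ovn_cluster_router makes that decision, inspecting its ports and routes helps (same cert
     options and pod name as before; `show` and `lr-route-list` are standard ovn-nbctl commands):

     $ oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- \
         ovn-nbctl -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt \
         --db "ssl:172.16.13.101:9641,ssl:172.16.13.102:9641,ssl:172.16.13.103:9641" \
         show ovn_cluster_router
     $ oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- \
         ovn-nbctl -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt \
         --db "ssl:172.16.13.101:9641,ssl:172.16.13.102:9641,ssl:172.16.13.103:9641" \
         lr-route-list ovn_cluster_router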

  50. Pod-to-Pod - different node
    50
    stor-worker-1
    rtos-worker-0
    10.129.2.1/23
    0a:58:0a:81:02:01
    rtos-worker-1
    10.131.0.1/23
    0a:58:0a:83:00:01
    worker-0 worker-1
    stor-worker-0
    rtoj-GR_worker-0
    100.64.0.7/16
    rtoe-GR_worker-0
    172.16.13.104/24
    jtor-ovn_cluster_router
    jtor-GR_worker-0 jtor-GR_worker-1
    rtoj-GR_worker-1
    100.64.0.5/16
    rtoe-GR_worker-1
    172.16.12.105/24
    ovn-k8s-mp0
    (br-int)
    ovn-k8s-mp0
    (br-int)
    worker-1
    worker-0
    client hello-w0 hello-w1
    ovn_cluster_router
    GR_worker-1
    GR_worker-0
    join
    rtoj-ovn_cluster_router
    100.64.0.1/16
    10.129.2.39
    0a:58:0a:81:02:27
    10.129.2.41
    0a:58:0a:81:02:29
    10.131.0.18
    0a:58:0a:83:00:12
    k8s-worker-0
    10.129.2.2/23
    k8s-worker-1
    10.131.0.2/23


  51. Pod-to-Pod - different node
    51
    stor-worker-1
    rtos-worker-0
    10.129.2.1/23
    0a:58:0a:81:02:01
    rtos-worker-1
    10.131.0.1/23
    0a:58:0a:83:00:01
    worker-0 worker-1
    stor-worker-0
    rtoj-GR_worker-0
    100.64.0.7/16
    rtoe-GR_worker-0
    172.16.13.104/24
    jtor-ovn_cluster_router
    jtor-GR_worker-0 jtor-GR_worker-1
    rtoj-GR_worker-1
    100.64.0.5/16
    rtoe-GR_worker-1
    172.16.12.105/24
    ovn-k8s-mp0
    (br-int)
    ovn-k8s-mp0
    (br-int)
    worker-1
    worker-0
    client hello-w0 hello-w1
    ovn_cluster_router
    GR_worker-1
    GR_worker-0
    join
    rtoj-ovn_cluster_router
    100.64.0.1/16
    10.129.2.39
    0a:58:0a:81:02:27
    10.129.2.41
    0a:58:0a:81:02:29
    10.131.0.18
    0a:58:0a:83:00:12
    k8s-worker-0
    10.129.2.2/23
    k8s-worker-1
    10.131.0.2/23
    oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk --
    ovn-trace -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt
    --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642"
    --ct new
    'inport == "ovntest_client"
    && eth.src == 0a:58:0a:81:02:27
    && ip4.src == 10.129.2.39
    && tcp.src == 33333
    && eth.dst == 0a:58:0a:81:02:01
    && ip4.dst == 10.131.0.18
    && tcp.dst == 8080
    && ip.ttl == 64
    && tcp'


  52. ovn-trace --summary (Pod-to-Pod - different node)
    52
    ingress(dp="worker-0", inport="ovntest_client") {
    reg0[15] = check_in_port_sec();
    next;
    reg0[0] = 1;
    next;
    reg0[2] = 1;
    next;
    ct_lb_mark;
    ct_lb_mark {
    reg0[7] = 1;
    reg0[9] = 1;
    next;
    reg0[1] = 1;
    next;
    ct_commit { ct_mark.blocked = 0; };
    next;
    reg0[6] = chk_lb_hairpin();
    reg0[12] = chk_lb_hairpin_reply();
    next;
    outport = "stor-worker-0";
    output;
    egress(dp="worker-0", inport="ovntest_client", outport="stor-worker-0") {
    next;
    next;
    reg0[7] = 1;
    reg0[9] = 1;
    next;
    reg0[1] = 1;
    next;
    ct_commit { ct_mark.blocked = 0; };
    next;
    reg0[15] = check_out_port_sec();
    next;
    output;
    /* output to "stor-worker-0", type "patch" */;
     Ingress datapath of Logical Switch `worker-0`
     Egress datapath of Logical Switch `worker-0`

  53. ovn-trace --summary (Pod-to-Pod - different node)
    53
    ingress(dp="ovn_cluster_router", inport="rtos-worker-0") {
    xreg0[0..47] = 0a:58:0a:81:02:01;
    next;
    reg9[2] = 1;
    next;
    next;
    reg7 = 0;
    next;
    ip.ttl--;
    reg8[0..15] = 0;
    reg0 = ip4.dst;
    reg1 = 10.131.0.1;
    eth.src = 0a:58:0a:83:00:01;
    outport = "rtos-worker-1";
    flags.loopback = 1;
    next;
    next;
    reg8[0..15] = 0;
    next;
    next;
    eth.dst = 0a:58:0a:83:00:12;
    next;
    outport = "cr-rtos-worker-1";
    next;
    output;
    /* Replacing type "chassisredirect" outport "cr-rtos-worker-1" with distributed port
    "rtos-worker-1". */;
    egress(dp="ovn_cluster_router", inport="rtos-worker-0", outport="rtos-worker-1") {
    reg9[4] = 0;
    next;
    output;
    /* output to "rtos-worker-1", type "patch" */;
     Ingress datapath of Logical Router `ovn_cluster_router`
     Egress datapath of Logical Router `ovn_cluster_router`

  54. ovn-trace --summary (Pod-to-Pod - different node)
    54
    ingress(dp="worker-1", inport="stor-worker-1") {
    reg0[15] = check_in_port_sec();
    next;
    next;
    next;
    reg0[7] = 1;
    reg0[9] = 1;
    next;
    reg0[1] = 1;
    next;
    ct_commit { ct_mark.blocked = 0; };
    next;
    reg0[6] = chk_lb_hairpin();
    reg0[12] = chk_lb_hairpin_reply();
    next;
    outport = "ovntest_hello-w1";
    output;
    egress(dp="worker-1", inport="stor-worker-1", outport="ovntest_hello-w1") {
    reg0[2] = 1;
    next;
    reg0[0] = 1;
    next;
    ct_lb_mark;
    ct_lb_mark /* default (use --ct to customize) */ {
    reg0[10] = 1;
    next;
    reg0[15] = check_out_port_sec();
    next;
    output;
    /* output to "ovntest_hello-w1", type "" */;
    };
    };
    };
    };
    };
    };
    };
    };


  55. Pod to ClusterIP
    (different node)
    55


  56. Summary
     56
     ● Traffic goes from pod `client` on worker-0 to pod `hello-w1` on worker-1, via a ClusterIP Service
     ● A packet leaving the source pod (`client`) enters the node-local logical switch `worker-0`
     ● Logical switch worker-0 carries a logical load balancer that corresponds to the ClusterIP Service, and
     that load balancer picks one of the pods behind the Service
     ● If the chosen pod is on the same node the traffic hairpins back locally; if it is on a different node,
     it crosses nodes via the logical router ovn_cluster_router
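     The logical load balancer that implements the ClusterIP can be seen by listing the load balancers
     attached to the node switch (same cert options and pod name as before; ls-lb-list is a standard
     ovn-nbctl command):

     $ oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk -- \
         ovn-nbctl -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt \
         --db "ssl:172.16.13.101:9641,ssl:172.16.13.102:9641,ssl:172.16.13.103:9641" \
         ls-lb-list worker-0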

  57. Pod-to-ClusterIP same/different node
    57
    stor-worker-1
    rtos-worker-0
    10.129.2.1/23
    0a:58:0a:81:02:01
    rtos-worker-1
    10.131.0.1/23
    0a:58:0a:83:00:01
    worker-0 worker-1
    stor-worker-0
    rtoj-GR_worker-0
    100.64.0.7/16
    rtoe-GR_worker-0
    172.16.13.104/24
    jtor-ovn_cluster_router
    jtor-GR_worker-0 jtor-GR_worker-1
    rtoj-GR_worker-1
    100.64.0.5/16
    rtoe-GR_worker-1
    172.16.12.105/24
    ovn-k8s-mp0
    (br-int)
    ovn-k8s-mp0
    (br-int)
    worker-1
    worker-0
    client hello-w0 hello-w1
    ovn_cluster_router
    GR_worker-1
    GR_worker-0
    join
    rtoj-ovn_cluster_router
    100.64.0.1/16
    10.129.2.39
    0a:58:0a:81:02:27
    10.129.2.41
    0a:58:0a:81:02:29
    10.131.0.18
    0a:58:0a:83:00:12
    k8s-worker-0
    10.129.2.2/23
    k8s-worker-1
    10.131.0.2/23
    ClusterIP Service
    172.30.182.53:80


  58. Pod-to-ClusterIP same/different node
    58
    stor-worker-1
    rtos-worker-0
    10.129.2.1/23
    0a:58:0a:81:02:01
    rtos-worker-1
    10.131.0.1/23
    0a:58:0a:83:00:01
    worker-0 worker-1
    stor-worker-0
    rtoj-GR_worker-0
    100.64.0.7/16
    rtoe-GR_worker-0
    172.16.13.104/24
    jtor-ovn_cluster_router
    jtor-GR_worker-0 jtor-GR_worker-1
    rtoj-GR_worker-1
    100.64.0.5/16
    rtoe-GR_worker-1
    172.16.12.105/24
    ovn-k8s-mp0
    (br-int)
    ovn-k8s-mp0
    (br-int)
    worker-1
    worker-0
    client hello-w0 hello-w1
    ovn_cluster_router
    GR_worker-1
    GR_worker-0
    join
    rtoj-ovn_cluster_router
    100.64.0.1/16
    10.129.2.39
    0a:58:0a:81:02:27
    10.129.2.41
    0a:58:0a:81:02:29
    10.131.0.18
    0a:58:0a:83:00:12
    k8s-worker-0
    10.129.2.2/23
    k8s-worker-1
    10.131.0.2/23
    oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk --
    ovn-trace -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt
    --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642"
    --ct new
    'inport == "ovntest_client"
    && eth.src == 0a:58:0a:81:02:27
    && ip4.src == 10.129.2.39
    && tcp.src == 33333
    && eth.dst == 0a:58:0a:81:02:01
    && ip4.dst == 172.30.182.53
    && tcp.dst == 80
    && ip.ttl == 64
    && tcp'
    ClusterIP Service
    172.30.182.53:80


  59. Looking at the OpenFlow Trace (1)
    59
    $ oc exec client -- cat /sys/class/net/eth0/iflink
    45
    $ oc debug node/worker-0 -- chroot /host ovs-vsctl find Interface ifindex=45 2>/dev/null | grep
    ofport
    ofport : 44
    ofport_request : []
    $ sudo ovs-appctl ofproto/trace br-int \
    in_port=44,\
    tcp,\
    dl_src=0a:58:0a:81:02:27,\
    nw_src=10.129.2.39,\
    tcp_src=33333,\
    dl_dst=0a:58:0a:81:02:01,\
    nw_dst=172.30.182.53,\
    tcp_dst=80,\
    nw_ttl=64,\
    dp_hash=2
     The client pod's interface index (iflink) is 45
     The client pod is attached to port 44 of the OVS bridge br-int
     Run on worker-0

  60. Looking at the OpenFlow Trace (2)
    60
    $ sudo ovs-appctl ofproto/trace br-int
    in_port=44,tcp,dl_src=0a:58:0a:81:02:27,nw_src=10.129.2.39,tcp_src=33333,dl_dst=0a:58:0a:81:02:01,nw_dst=172.30.182.53,tcp_dst=80
    ,nw_ttl=64,dp_hash=2
    Flow:
    dp_hash=0x2,tcp,in_port=44,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.39,nw_dst=172.30.182
    .53,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=80,tcp_flags=0
    bridge("br-int")
    ----------------
    0. in_port=44, priority 100, cookie 0xf7f9ca3e
    set_field:0x16->reg13
    set_field:0x7->reg11
    set_field:0x1->reg12
    ...
    29. metadata=0xa, priority 0, cookie 0x928911c5
    resubmit(,37)
    37. reg15=0xd,metadata=0xa, priority 100, cookie 0x2a5007e1
    set_field:0xa/0xffffff->tun_id
    set_field:0xd->tun_metadata0
    move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30]
    -> NXM_NX_TUN_METADATA0[16..30] is now 0xa
    output:12
    -> output to kernel tunnel
    resubmit(,38)
    38. No match.
    drop
    Final flow:
    recirc_id=0xab9fd,dp_hash=0x2,eth,tcp,reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,metadata=0xe,in_port=44,vlan_
    tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw
    _frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
    Megaflow:
    recirc_id=0xab9fd,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0x2/0x3,eth,tcp,in_port=44,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:8
    1:02:01,nw_src=10.129.2.32/29,nw_dst=10.131.0.18,nw_ecn=0,nw_ttl=64,nw_frag=no
    Datapath actions:
    ct(commit,zone=22,mark=0/0x1,nat(src)),set(tunnel(tun_id=0xa,dst=172.16.13.105,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,l
    en=4,0xa000d}),flags(df|csum|key))),set(eth(src=0a:58:0a:83:00:01,dst=0a:58:0a:83:00:12)),set(ipv4(ttl=63)),4
     The packet is sent out via port 12
     Run on worker-0

  61. Looking at the OpenFlow Trace (3)
    61
    $ sudo ovs-ofctl -O OpenFlow13 show br-int
    ...
    12(ovn-4c2f1b-0): addr:82:9b:d1:10:33:a0
    config: 0
    state: LIVE
    speed: 0 Mbps now, 0 Mbps max
    ...
    $ sudo ovs-vsctl --columns=options find interface type=geneve ofport=12
    options : {csum="true", key=flow, remote_ip="172.16.13.105"}
    $ sudo ovs-vsctl show
    ...
    6f785367-2ac9-40bc-9728-64db3aeb2e8d
    Bridge br-int
    fail_mode: secure
    datapath_type: system
    ...
    Port ovn-4c2f1b-0
    Interface ovn-4c2f1b-0
    type: geneve
    options: {csum="true", key=flow, remote_ip="172.16.13.105"}
    ...
     The interface name of port 12 is ovn-4c2f1b-0
     ovn-4c2f1b-0 is the Geneve tunnel towards 172.16.13.105 (worker-1)
     Bonus: finding out in one shot which Geneve tunnel port 12 is
     Run on worker-0

  62. Looking at the OpenFlow Trace (4)
    62
    $ sudo ovs-appctl ofproto/trace br-int \
    in_port=3,\
    tun_id=0xa,\
     tun_metadata0=0xa000d,\
     tcp,\
     reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,metadata=0xe,\
     vlan_tci=0x0000,\
    dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_fla
    gs=0
    Flow:
    tcp,reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,tun_id=0xa,metadata=0xe,in_port=3,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83
    :00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
    bridge("br-int")
    ----------------
    0. in_port=3, priority 100
    move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23]
    -> OXM_OF_METADATA[0..23] is now 0xa
    ...
    64. priority 0
    resubmit(,65)
    65. reg15=0x11,metadata=0xc, priority 100, cookie 0xccf44662
    output:23
    Final flow:
    recirc_id=0x124769,eth,tcp,reg0=0x287,reg11=0x3,reg12=0x4,reg13=0x17,reg14=0x1,reg15=0x11,tun_id=0xa,metadata=0xc,in_port=3,vlan_tci=0x0000,dl_src=0a:58:0a:81:
    02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
    Megaflow:
    recirc_id=0x124769,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0/0x1,eth,ip,in_port=3,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.128.0.0/15,nw_d
    st=10.131.0.18,nw_frag=no
    Datapath actions: ct(commit,zone=23,mark=0/0x1,nat(src)),18
    Final flow:
    recirc_id=0xab9fd,dp_hash=0x2,eth,tcp,reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,metadata=0xe,in_port=44,vlan_tci=0x0000,dl_src=0a:58:0a:81:
    02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
    Megaflow:
    recirc_id=0xab9fd,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0x2/0x3,eth,tcp,in_port=44,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.32/29,
    nw_dst=10.131.0.18,nw_ecn=0,nw_ttl=64,nw_frag=no
    Datapath actions:
    ct(commit,zone=22,mark=0/0x1,nat(src)),set(tunnel(tun_id=0xa,dst=172.16.13.105,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0xa000d}),flags(df|csum|k
    ey))),set(eth(src=0a:58:0a:83:00:01,dst=0a:58:0a:83:00:12)),set(ipv4(ttl=63)),4
     Trace result from worker-0 (for comparison)
     Run on worker-1

  63. Looking at the OpenFlow Trace (4)
    63
    $ sudo ovs-appctl ofproto/trace br-int \
    in_port=3,\
    tun_id=0xa,\
    tun_metadata0=0xa000d,\
    tcp,\
    reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,metadata=0xe,\
    vlan_tci=0x0000,\
    dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_fla
    gs=0
    Flow:
    tcp,reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,tun_id=0xa,metadata=0xe,in_port=3,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83
    :00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
    bridge("br-int")
    ----------------
    0. in_port=3, priority 100
    move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23]
    -> OXM_OF_METADATA[0..23] is now 0xa
    ...
    64. priority 0
    resubmit(,65)
    65. reg15=0x11,metadata=0xc, priority 100, cookie 0xccf44662
    output:23
    Final flow:
    recirc_id=0x124769,eth,tcp,reg0=0x287,reg11=0x3,reg12=0x4,reg13=0x17,reg14=0x1,reg15=0x11,tun_id=0xa,metadata=0xc,in_port=3,vlan_tci=0x0000,dl_src=0a:58:0a:81:
    02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
    Megaflow:
    recirc_id=0x124769,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0/0x1,eth,ip,in_port=3,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:83:00:12,nw_src=10.128.0.0/15,nw_d
    st=10.131.0.18,nw_frag=no
    Datapath actions: ct(commit,zone=23,mark=0/0x1,nat(src)),18
    Final flow:
    recirc_id=0xab9fd,dp_hash=0x2,eth,tcp,reg0=0x282,reg11=0x7,reg12=0x1,reg13=0x16,reg14=0xc,reg15=0x1,metadata=0xe,in_port=44,vlan_tci=0x0000,dl_src=0a:58:0a:81:
    02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.39,nw_dst=10.131.0.18,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33333,tp_dst=8080,tcp_flags=0
    Megaflow:
    recirc_id=0xab9fd,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0x2/0x3,eth,tcp,in_port=44,dl_src=0a:58:0a:81:02:27,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.32/29,
    nw_dst=10.131.0.18,nw_ecn=0,nw_ttl=64,nw_frag=no
    Datapath actions:
    ct(commit,zone=22,mark=0/0x1,nat(src)),set(tunnel(tun_id=0xa,dst=172.16.13.105,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0xa000d}),flags(df|csum|k
    ey))),set(eth(src=0a:58:0a:83:00:01,dst=0a:58:0a:83:00:12)),set(ipv4(ttl=63)),4
     The packet is sent out via port 23
     Run on worker-1
     Trace result from worker-0 (for comparison)

  64. Looking at the OpenFlow Trace (5)
    64
    $ sudo ovs-vsctl --column=external_ids find interface ofport=23
    external_ids : {attached_mac="0a:58:0a:83:00:12", iface-id=ovntest_hello-w1,
    iface-id-ver="25e16505-3656-49df-8268-80b20fe065d3", ip_addresses="10.131.0.18/23",
    ovn-installed="true", ovn-installed-ts="1674478354432",
    sandbox=f0888b516b6dabb7b8e967672de1c0e731d849a63c6f9798e35ed9c69063d354}
    $ sudo ovs-vsctl --columns=ofport find interface type=geneve options:remote_ip=172.16.13.104
    ofport : 3
     The Geneve tunnel from worker-0 (172.16.13.104) is port 3
     Port 23 is the hello-w1 pod
     Run on worker-1

  65. Pod-to-NodePort (overview)

  66. Pod-to-NodePort (1)
    66
    stor-worker-1
    rtos-worker-0
    10.129.2.1/23
    0a:58:0a:81:02:01
    rtos-worker-1
    10.131.0.1/23
    0a:58:0a:83:00:01
    worker-0 worker-1
    stor-worker-0
    rtoj-GR_worker-0
    100.64.0.7/16
    rtoe-GR_worker-0
    172.16.13.104/24
    jtor-ovn_cluster_router
    jtor-GR_worker-0 jtor-GR_worker-1
    rtoj-GR_worker-1
    100.64.0.5/16
    rtoe-GR_worker-1
    172.16.12.105/24
    ovn-k8s-mp0
    (br-int)
    ovn-k8s-mp0
    (br-int)
    worker-1
    worker-0
    client hello-w0 hello-w1
    ovn_cluster_router
    GR_worker-1
    GR_worker-0
    join
    rtoj-ovn_cluster_router
    100.64.0.1/16
    10.129.2.39
    0a:58:0a:81:02:27
    10.129.2.41
    0a:58:0a:81:02:29
    10.131.0.18
    0a:58:0a:83:00:12
    k8s-worker-0
    10.129.2.2/23
    k8s-worker-1
    10.131.0.2/23
    curl master-0:31513
    NAME TYPE CLUSTER-IP PORT(S)
    hello-nodeport NodePort 172.30.49.137 80:31513/TCP
    oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk --
    ovn-trace -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt
    --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642"
    --ct new
    'inport == "ovntest_client"
    && eth.src == 0a:58:0a:81:02:27
    && ip4.src == 10.129.2.39
    && tcp.src == 33333
    && eth.dst == 0a:58:0a:81:02:01
    && ip4.dst == 172.16.13.101
    && tcp.dst == 31513
    && ip.ttl == 64
    && tcp'


  67. Pod-to-NodePort (2)
    67
    stor-worker-1
    rtos-master-0
    10.130.0.1/23
    0a:58:0a:82:00:01
    rtos-worker-1
    10.131.0.1/23
    0a:58:0a:83:00:01
    master-0 worker-1
    stor-master-0
    rtoj-GR_master-0
    100.64.0.4/16
    rtoe-GR_master-0
    172.16.13.101/24
    jtor-ovn_cluster_router
    jtor-GR_master-0 jtor-GR_worker-1
    rtoj-GR_worker-1
    100.64.0.5/16
    rtoe-GR_worker-1
    172.16.12.105/24
    ovn-k8s-mp0
    (br-int)
    ovn-k8s-mp0
    (br-int)
    worker-1
    master-0
    hello-w1
    ovn_cluster_router
    GR_worker-1
    GR_worker-0
    join
    rtoj-ovn_cluster_router
    100.64.0.1/16
    10.131.0.18
    0a:58:0a:83:00:12
    k8s-master-0
    10.130.0.2/23
    k8s-worker-1
    10.131.0.2/23
    oc -n openshift-ovn-kubernetes exec -c northd ovnkube-master-zwglk --
    ovn-trace -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt
    --db "ssl:172.16.13.101:9642,ssl:172.16.13.102:9642,ssl:172.16.13.103:9642"
    --ct new
    'inport == "br-ex_master-0"
    && eth.src == 52:54:00:00:13:04
    && ip4.src == 172.16.13.104
    && tcp.src == 33333
    && eth.dst == 52:54:00:00:13:01
    && ip4.dst == 172.16.13.101
    && tcp.dst == 31513
    && ip.ttl == 64
    && tcp'


  68. Miscellaneous
     68
     ● When resolving names against the in-cluster DNS (ns: openshift-dns, svc: dns-default), queries are
     preferentially sent to the CoreDNS pod running on the local node
     ○ https://github.com/openshift/ovn-kubernetes/pull/896
     ○ https://docs.google.com/presentation/d/1_5Dh3HTVSpETvVhZszE41REwNPiFFxanhGCAAFg5iPQ/edit#slide=id.g144e6452910_0_37
     ● Egress Router is implemented using a CNI plugin
     ○ https://github.com/openshift/egress-router-cni
     ● The key to further scalability improvements: OVN Interconnect
     ○ https://www.openvswitch.org/support/ovscon2022/slides/OVN-IC-OVSCON.pdf
     ○ https://www.openvswitch.org/support/ovscon2019/day1/1501-Multi-tenant%20Inter-DC%20tunneling%20with%20OVN(4).pdf

  69. Thank You
     69