BGP Support in OpenShift - OVN-Kubernetes Edition

orimanabu
November 25, 2025

This deck covers the CNI plugin OVN-Kubernetes gaining BGP support and using it to advertise Pod network and EgressIP addresses.

Transcript

  1. Introduction
    ▸ Starting with OpenShift v4.19.14 / v4.20.0, the CNI plugin OVN-Kubernetes supports BGP
      ・ Using frr-k8s, the cluster can peer with external BGP routers, advertise Pod network and EgressIP addresses, and receive routes from those routers
      ・ Setting spec.additionalRoutingCapabilities in the Cluster Network Operator's custom resource starts frr-k8s as a DaemonSet in the openshift-frr-k8s namespace
      ・ BGP-related settings are made mainly through two custom resources: FRRConfiguration and RouteAdvertisements
  2. Relationship with MetalLB's BGP mode
    ▸ Since OpenShift v4.17, MetalLB's BGP mode also uses frr-k8s when it advertises the External IP of a type: LoadBalancer Service over BGP
    ▸ With BGP support in OVN-Kubernetes, OVN-Kubernetes and MetalLB now share frr-k8s
      ・ frr-k8s used to be deployed in the metallb-system namespace. With OVN-Kubernetes BGP support, it is now deployed in the openshift-frr-k8s namespace instead
      ・ frr-k8s is now managed by the Cluster Network Operator. When the MetalLB Operator needs frr-k8s, it also asks the Cluster Network Operator to deploy it
      ・ If OVN-Kubernetes BGP support is enabled: frr-k8s is already running, so the MetalLB Operator uses it as is
      ・ If OVN-Kubernetes BGP support is not enabled: the MetalLB Operator asks the Cluster Network Operator to deploy frr-k8s
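The deployment rules above can be summarized as a small decision function. This is a hypothetical model for illustration only. The function, class, and field names are invented here, and the deck does not show the actual Cluster Network Operator or MetalLB Operator logic.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the rules on this slide; not operator source code.
FRR_K8S_NAMESPACE = "openshift-frr-k8s"  # namespace named on the slide


@dataclass
class FrrK8sDeployment:
    namespace: str
    managed_by: str    # operator that owns the frr-k8s DaemonSet
    requested_by: str  # component whose need triggered the deployment


def frr_k8s_deployment(ovnk_bgp_enabled: bool,
                       metallb_needs_frr: bool) -> Optional[FrrK8sDeployment]:
    """Return how frr-k8s ends up running, or None if nothing needs it."""
    if ovnk_bgp_enabled:
        # The Cluster Network Operator already runs frr-k8s;
        # the MetalLB Operator simply reuses it.
        return FrrK8sDeployment(FRR_K8S_NAMESPACE, "cluster-network-operator",
                                "cluster-network-operator")
    if metallb_needs_frr:
        # The MetalLB Operator asks the Cluster Network Operator to deploy it.
        return FrrK8sDeployment(FRR_K8S_NAMESPACE, "cluster-network-operator",
                                "metallb-operator")
    return None


print(frr_k8s_deployment(ovnk_bgp_enabled=False, metallb_needs_frr=True))
```

Either way the DaemonSet ends up in openshift-frr-k8s and is owned by the Cluster Network Operator. Only the trigger differs.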
  3. Components involved in FRR configuration
    [Figure: the resources and workloads involved. openshift-frr-k8s namespace: the frr-k8s DaemonSet, whose Pods run the frr-k8s and frr containers (frr config). openshift-ovn-kubernetes namespace: ovnkube-node. openshift-network-operator namespace: Cluster Network Operator (Network custom resource). metallb-system namespace: MetalLB Operator, speaker, controller. Custom resources: FRRConfiguration, ClusterUserDefinedNetwork, RouteAdvertisements, BGPPeer, BGPAdvertisement, IPAddressPool. Labels in the figure: "Manage" (twice) and "merging FRRConfigurations".]
    ※ FRRConfigurations generated by MetalLB are not covered in this deck.
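  4. BGP support in OVN-Kubernetes
    ▸ What it can do
      ・ Advertise the primary network of Pods/VMs over BGP
        ・ The default Pod network (clusterNetwork)
        ・ A primary network configured with a CUDN (topology: Layer2 or Layer3)
      ・ Pod/VM addresses become directly reachable from outside (no NAPT)
    ▸ Caveats
      ・ Only platform: baremetal environments are supported
      ・ OVN-Kubernetes with BGP cannot advertise node or Service addresses
        ・ Services are still advertised over BGP with MetalLB, as before
      ・ Secondary networks added with Multus are out of scope
      ・ Because the implementation creates a VRF with the same name as the CUDN (e.g. cudn1), it can be hard to tell in the official documentation and in command output whether a name refers to the CUDN or to the VRF. In this deck I have tried to write "VRF cudn1" and so on whenever I mean the VRF, but please keep the CUDN/VRF distinction in mind while reading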
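The scope rules above can be condensed into a single predicate. This is a hypothetical sketch. The `kind`, `role`, `topology`, and `platform` strings are labels chosen here for illustration and are not fields of an OVN-Kubernetes API.

```python
# Hypothetical encoding of the scope rules on this slide.
ADVERTISABLE_CUDN_TOPOLOGIES = {"layer2", "layer3"}


def advertisable_by_ovnk(kind: str, role: str = "primary",
                         topology: str = "", platform: str = "baremetal") -> bool:
    """True if OVN-Kubernetes with BGP can advertise addresses of this kind."""
    if platform != "baremetal":
        return False                      # only platform: baremetal is supported
    if kind == "default-pod-network":     # clusterNetwork
        return True
    if kind == "cudn":
        return (role == "primary"
                and topology.lower() in ADVERTISABLE_CUDN_TOPOLOGIES)
    # Node addresses, Service addresses (use MetalLB instead) and
    # Multus secondary networks are out of scope.
    return False


for args in [("default-pod-network",), ("cudn", "primary", "Layer2"),
             ("cudn", "secondary", "Layer2"), ("service",)]:
    print(args, advertisable_by_ovnk(*args))
```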
  5. Test environment
    [Figure: test topology. r1 (AS 65101) is on net10 (172.18.10.0/24) together with testvm0 (.90). It connects to r2 (AS 65102) over net12 (172.18.12.0/24) and to r3 (AS 65103) over net13 (172.18.13.0/24). The OpenShift nodes cp0..2 and wk0..4 are on net20 (172.18.20.0/24) behind r2, and net30 (172.18.30.0/24) is behind r3.]
    ▸ The OpenShift nodes (cp0..2, wk0..4) sit behind router r2 and reach the internet via r2 → r1. For the VRF Lite test, an additional NIC is connected to r3
      ・ cp[0-2]: 172.18.20.10[0-2]
      ・ wk[0-4]: 172.18.20.11[0-4]
    ▸ r1, r2, and r3 peer over BGP using their loopback addresses
      ・ r1: 172.18.0.1
      ・ r2: 172.18.0.2
      ・ r3: 172.18.0.3
    ▸ 172.18.99.0/24 is the out-of-band management network
    ▸ AS numbers and the other addresses are as shown in the figure
    ▸ r1, r2, and r3 run VyOS 1.5-stream-2025-Q2
    ▸ OpenShift is v4.20.2
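The bracket notation for node addresses expands mechanically. Below is a small sketch using only the standard library. The `expand` helper is hypothetical and handles a single one-digit `[a-b]` range, which is all this slide needs. It pairs each node with its address and checks that every address falls inside net20.

```python
import ipaddress
import re

NET20 = ipaddress.ip_network("172.18.20.0/24")


def expand(pattern: str) -> list[str]:
    """Expand one '[a-b]' digit range, e.g. 'cp[0-2]' -> ['cp0', 'cp1', 'cp2']."""
    m = re.fullmatch(r"(.*)\[(\d)-(\d)\](.*)", pattern)
    if m is None:
        return [pattern]
    head, lo, hi, tail = m.groups()
    return [f"{head}{i}{tail}" for i in range(int(lo), int(hi) + 1)]


# Pair each node name with its address, as listed on this slide.
nodes = dict(zip(expand("cp[0-2]"), expand("172.18.20.10[0-2]")))
nodes.update(zip(expand("wk[0-4]"), expand("172.18.20.11[0-4]")))

for name, addr in nodes.items():
    # Every node address must fall inside net20, behind r2.
    assert ipaddress.ip_address(addr) in NET20, name
    print(f"{name}: {addr}")
```

These eight addresses are the ones r2 later needs as BGP neighbors.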
  6. VyOS configs
    r1:
        set interfaces ethernet eth0 address '172.18.99.101/24'
        set interfaces ethernet eth1 address '172.18.10.2/24'
        set interfaces ethernet eth2 address '172.18.12.1/24'
        set interfaces ethernet eth3 address '172.18.13.1/24'
        set interfaces loopback lo address '172.18.0.1/32'
        set nat source rule 100 outbound-interface name 'eth1'
        set nat source rule 100 source address '0.0.0.0/0'
        set nat source rule 100 translation address 'masquerade'
        set protocols bgp address-family ipv4-unicast
        set protocols bgp neighbor 172.18.0.2 address-family ipv4-unicast default-originate
        set protocols bgp neighbor 172.18.0.2 address-family ipv4-unicast soft-reconfiguration inbound
        set protocols bgp neighbor 172.18.0.2 ebgp-multihop '2'
        set protocols bgp neighbor 172.18.0.2 remote-as '65102'
        set protocols bgp neighbor 172.18.0.2 update-source 'lo'
        set protocols bgp neighbor 172.18.0.3 address-family ipv4-unicast default-originate
        set protocols bgp neighbor 172.18.0.3 address-family ipv4-unicast soft-reconfiguration inbound
        set protocols bgp neighbor 172.18.0.3 ebgp-multihop '2'
        set protocols bgp neighbor 172.18.0.3 remote-as '65103'
        set protocols bgp neighbor 172.18.0.3 update-source 'lo'
        set protocols bgp parameters router-id '172.18.0.1'
        set protocols bgp system-as '65101'
        set protocols ospf area 0.0.0.0 network '172.18.0.1/32'
        set protocols ospf area 0.0.0.0 network '172.18.12.0/24'
        set protocols ospf area 0.0.0.0 network '172.18.13.0/24'
        set protocols ospf interface lo passive
        set protocols ospf parameters router-id '172.18.0.1'
        set protocols static route 0.0.0.0/0 next-hop 172.18.10.1
    r2:
        set interfaces ethernet eth0 address '172.18.99.102/24'
        set interfaces ethernet eth1 address '172.18.12.2/24'
        set interfaces ethernet eth2 address '172.18.20.1/24'
        set interfaces loopback lo address '172.18.0.2/32'
        set protocols bgp address-family ipv4-unicast network 172.18.20.0/24
        set protocols bgp neighbor 172.18.0.1 address-family ipv4-unicast soft-reconfiguration inbound
        set protocols bgp neighbor 172.18.0.1 ebgp-multihop '2'
        set protocols bgp neighbor 172.18.0.1 remote-as '65101'
        set protocols bgp neighbor 172.18.0.1 update-source 'lo'
        set protocols bgp parameters router-id '172.18.0.2'
        set protocols bgp system-as '65102'
        set protocols ospf area 0.0.0.0 network '172.18.0.2/32'
        set protocols ospf area 0.0.0.0 network '172.18.12.0/24'
        set protocols ospf interface lo passive
        set protocols ospf parameters router-id '172.18.0.2'
    r3:
        set interfaces ethernet eth0 address '172.18.99.103/24'
        set interfaces ethernet eth1 address '172.18.13.2/24'
        set interfaces ethernet eth2 address '172.18.30.1/24'
        set interfaces ethernet eth2 vif 2001 address '172.19.21.1/24'
        set interfaces ethernet eth2 vif 2002 address '172.19.22.1/24'
        set interfaces loopback lo address '172.18.0.3/32'
        set protocols bgp address-family ipv4-unicast network 172.18.30.0/24
        set protocols bgp neighbor 172.18.0.1 address-family ipv4-unicast soft-reconfiguration inbound
        set protocols bgp neighbor 172.18.0.1 ebgp-multihop '2'
        set protocols bgp neighbor 172.18.0.1 remote-as '65101'
        set protocols bgp neighbor 172.18.0.1 update-source 'lo'
        set protocols bgp parameters router-id '172.18.0.3'
        set protocols bgp system-as '65103'
        set protocols ospf area 0.0.0.0 network '172.18.0.3/32'
        set protocols ospf area 0.0.0.0 network '172.18.13.0/24'
        set protocols ospf interface lo passive
        set protocols ospf parameters router-id '172.18.0.3'
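Long lists of VyOS `set` commands are easier to audit when they are turned into data. The sketch below collects the single-value options of each BGP neighbor from an excerpt of r1's configuration. The `bgp_neighbors` helper is hypothetical. It relies only on `shlex.split` to tokenize each line and strip VyOS's single quotes.

```python
import shlex
from collections import defaultdict

# Excerpt of r1's configuration from this slide.
R1_CONFIG = """\
set protocols bgp neighbor 172.18.0.2 address-family ipv4-unicast default-originate
set protocols bgp neighbor 172.18.0.2 ebgp-multihop '2'
set protocols bgp neighbor 172.18.0.2 remote-as '65102'
set protocols bgp neighbor 172.18.0.2 update-source 'lo'
set protocols bgp neighbor 172.18.0.3 ebgp-multihop '2'
set protocols bgp neighbor 172.18.0.3 remote-as '65103'
set protocols bgp neighbor 172.18.0.3 update-source 'lo'
set protocols bgp system-as '65101'
"""


def bgp_neighbors(config: str) -> dict:
    """Collect '<option> <value>' pairs under 'set protocols bgp neighbor <addr>'.

    Only single-value options are kept; longer paths such as
    'address-family ipv4-unicast default-originate' are skipped.
    """
    peers = defaultdict(dict)
    for line in config.splitlines():
        words = shlex.split(line)  # also removes the single quotes around values
        if words[:4] == ["set", "protocols", "bgp", "neighbor"] and len(words) == 7:
            addr, option, value = words[4:7]
            peers[addr][option] = value
    return dict(peers)


for addr, opts in bgp_neighbors(R1_CONFIG).items():
    print(addr, "remote-as", opts["remote-as"], "update-source", opts["update-source"])
```

This makes it easy to confirm that each loopback peering has both `ebgp-multihop` and `update-source 'lo'` set.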
  7. BGP neighbor status and BGP tables on each router
    r1
        $ show bgp summary
        ...
        Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
        172.18.0.2      4      65102     11251     11255     1942    0    0 6d13h01m            1        3 N/A
        172.18.0.3      4      65103     11361     11248     1942    0    0 6d13h01m            1        3 N/A
        $ show ip bgp
        ...
           Network          Next Hop            Metric LocPrf Weight Path
        *> 172.18.20.0/24   172.18.0.2               0             0 65102 i
        *> 172.18.30.0/24   172.18.0.3               0             0 65103 i
    r2
        $ show bgp summary
        ...
        Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
        172.18.0.1      4      65101     11255     11250     1940    0    0 6d13h01m            2        3 N/A
        $ show ip bgp
        ...
           Network          Next Hop            Metric LocPrf Weight Path
        *> 0.0.0.0/0        172.18.0.1               0             0 65101 i
        *> 172.18.20.0/24   0.0.0.0                  0         32768 i
        *> 172.18.30.0/24   172.18.0.1                             0 65101 65103 i
    r3
        $ show bgp summary
        ...
        Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
        172.18.0.1      4      65101     11247     11361    12427    0    0 6d13h01m            2        3 N/A
        $ show ip bgp
        ...
           Network          Next Hop            Metric LocPrf Weight Path
        *> 0.0.0.0/0        172.18.0.1               0             0 65101 i
        *> 172.18.20.0/24   172.18.0.1                             0 65101 65102 i
        *> 172.18.30.0/24   0.0.0.0                  0         32768 i
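Reading these tables comes down to three fields per row: the prefix, the next hop, and the AS path. The origin AS is the rightmost ASN in the path. An empty path with next hop 0.0.0.0 means the router originates the prefix itself. The sketch below parses r2's rows from this slide. It assumes one prefix per row, and it assumes every AS path entry is a 16-bit private ASN (64512 to 65534), which holds in this lab but not in general. FRR's `json` output option for `show ip bgp` would be a more robust source.

```python
# r2's rows from this slide (header and status-code lines omitted).
R2_ROWS = """\
*> 0.0.0.0/0        172.18.0.1               0             0 65101 i
*> 172.18.20.0/24   0.0.0.0                  0         32768 i
*> 172.18.30.0/24   172.18.0.1                             0 65101 65103 i
"""

PRIVATE_ASNS = range(64512, 65535)  # 16-bit private ASNs, 64512-65534


def parse_row(line: str) -> tuple[str, str, list[int]]:
    """Split a one-line 'show ip bgp' row into (prefix, next hop, AS path)."""
    fields = line.split()
    prefix, next_hop = fields[1], fields[2]
    path: list[int] = []
    # Walk left from the origin code (last field); the AS path ends where
    # the numeric Metric/Weight columns begin.
    for token in reversed(fields[3:-1]):
        if token.isdigit() and int(token) in PRIVATE_ASNS:
            path.insert(0, int(token))
        else:
            break
    return prefix, next_hop, path


for row in R2_ROWS.splitlines():
    prefix, next_hop, path = parse_row(row)
    origin = f"AS{path[-1]}" if path else "originated locally"
    print(f"{prefix:<16} next hop {next_hop:<11} origin {origin}")
```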
  8. OpenShift environment
    ▸ v4.20.2 UPI on libvirt VMs
      ・ (Note) The BGP feature is supported only on bare metal
    ▸ Additional Operators
      ・ OpenShift Virtualization
      ・ ODF
      ・ Nmstate
    ▸ The Cluster Network Operator configuration was changed as follows:
        $ oc get network.operator cluster -o yaml | yq .spec
        additionalRoutingCapabilities:     # required for BGP
          providers:
          - FRR
        clusterNetwork:
        - cidr: 10.128.0.0/16
          hostPrefix: 24
        defaultNetwork:
          ovnKubernetesConfig:
            egressIPConfig: {}
            gatewayConfig:                 # required for VRF Lite
              ipForwarding: Global
              ipv4: {}
              ipv6: {}
              routingViaHost: true
            genevePort: 6081
            ipsecConfig:
              mode: Disabled
            mtu: 1400
            policyAuditConfig:
              destination: "null"
              maxFileSize: 50
              maxLogFiles: 5
              rateLimit: 20
              syslogFacility: local0
            routeAdvertisements: Enabled   # required for BGP
          type: OVNKubernetes
        deployKubeProxy: false
        disableMultiNetwork: false
        disableNetworkDiagnostics: false
        logLevel: Normal
        managementState: Managed
        observedConfig: null
        operatorLogLevel: Normal
        serviceNetwork:
        - 10.200.0.0/16
        unsupportedConfigOverrides: null
        useMultiNetworkPolicy: false
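The BGP fields above, and optionally the VRF Lite fields, can be applied as a JSON merge patch with `oc patch network.operator cluster --type=merge -p '<json>'`. The sketch below only builds that command string. The field values come from the output above. Whether patching a running cluster this way is the recommended procedure should be checked against the official OpenShift documentation. A merge patch replaces lists wholesale, so `providers` ends up exactly `["FRR"]`.

```python
import json
import shlex


def network_operator_patch(vrf_lite: bool = False) -> dict:
    """Merge patch with the BGP fields (and optionally the VRF Lite ones) shown above."""
    ovnk: dict = {"routeAdvertisements": "Enabled"}                 # required for BGP
    if vrf_lite:
        ovnk["gatewayConfig"] = {"ipForwarding": "Global",         # required for VRF Lite
                                 "routingViaHost": True}
    return {"spec": {
        "additionalRoutingCapabilities": {"providers": ["FRR"]},   # required for BGP
        "defaultNetwork": {"ovnKubernetesConfig": ovnk},
    }}


patch = json.dumps(network_operator_patch(vrf_lite=True), separators=(",", ":"))
print("oc patch network.operator cluster --type=merge -p " + shlex.quote(patch))
```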
  9. OpenShift configuration (1) [BGP Adv: Default Pod NW]
    [Figure: the test topology, with the OpenShift cluster as AS 65801]
    Namespace:
        apiVersion: v1
        kind: Namespace
        metadata:
          name: proj1
          labels:
            use-egressip: "true"
    EgressIP:
        apiVersion: k8s.ovn.org/v1
        kind: EgressIP
        metadata:
          name: myegressip
        spec:
          egressIPs:
          - 172.18.20.201
          - 172.18.20.202
          namespaceSelector:
            matchLabels:
              use-egressip: "true"
    Deployment:
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          labels:
            app: hello
            app.kubernetes.io/component: hello
            app.kubernetes.io/instance: hello
          name: hello
        spec:
          replicas: 1
          selector:
            matchLabels:
              deployment: hello
          template:
            metadata:
              labels:
                deployment: hello
            spec:
              containers:
              - image: quay.io/manabu.ori/hello
                imagePullPolicy: IfNotPresent
                name: hello
              nodeSelector:
                node-role.kubernetes.io/worker-virt: ""
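The EgressIP applies to a namespace because of `namespaceSelector.matchLabels`. A matchLabels selector matches when every listed key/value pair is present on the object. Below is a minimal sketch of that rule. The proj1 labels come from this slide, and proj2 is added only for contrast.

```python
def match_labels(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Kubernetes matchLabels: every key/value pair must be present on the object."""
    return all(labels.get(key) == value for key, value in selector.items())


# proj1's labels come from this slide; proj2 is shown only for contrast.
namespaces = {
    "proj1": {"kubernetes.io/metadata.name": "proj1", "use-egressip": "true"},
    "proj2": {"kubernetes.io/metadata.name": "proj2"},
}
egressip_selector = {"use-egressip": "true"}  # spec.namespaceSelector.matchLabels

selected = [name for name, labels in namespaces.items()
            if match_labels(egressip_selector, labels)]
print(selected)  # ['proj1']

# An empty selector matches everything, which is why a label selector written
# as `nodeSelector: {}` (as in FRRConfiguration and RouteAdvertisements)
# selects every node.
print(match_labels({}, {"kubernetes.io/hostname": "wk3"}))  # True
```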
  11. OpenShift configuration (2) [BGP Adv: Default Pod NW]
    FRRConfiguration:
        apiVersion: frrk8s.metallb.io/v1beta1
        kind: FRRConfiguration
        metadata:
          name: receive-all
          namespace: openshift-frr-k8s
          labels:
            routeAdvertisements: receive-all
        spec:
          nodeSelector: {}                 # peer from all nodes
          bgp:
            routers:
            - asn: 65801                   # own AS number
              neighbors:
              - address: 172.18.20.1       # address and AS number of the peer router
                asn: 65102
                disableMP: true
                toReceive:
                  allowed:
                    mode: all              # import every route the peer advertises
    RouteAdvertisements:
        apiVersion: k8s.ovn.org/v1
        kind: RouteAdvertisements
        metadata:
          name: default
        spec:
          nodeSelector: {}
          advertisements:                  # advertise the Pod network and EgressIPs
          - PodNetwork
          - EgressIP
          networkSelectors:                # advertise the default Pod network
          - networkSelectionType: DefaultNetwork
          frrConfigurationSelector:        # the FRRConfiguration tied to this advertisement
            matchLabels:
              routeAdvertisements: receive-all
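This FRRConfiguration imports every route r2 sends (`mode: all`). To show what a narrower import policy would mean, the function below models filtering received routes against a prefix list with optional `ge`/`le` length bounds, following prefix-list conventions. The `filtered` mode string and the `prefix`/`ge`/`le` keys are modeled on frr-k8s's `toReceive.allowed` settings as I understand them. Treat them as assumptions and check the frr-k8s API reference before writing a real manifest.

```python
import ipaddress


def accepted(route: str, mode: str, prefixes=()) -> bool:
    """Simplified model of a neighbor's toReceive.allowed filter.

    mode "all" accepts every route (what this slide configures). Any other
    mode accepts a route only if it falls under a listed prefix, with optional
    "ge"/"le" bounds on its length, following prefix-list conventions.
    """
    if mode == "all":
        return True
    net = ipaddress.ip_network(route)
    for sel in prefixes:
        base = ipaddress.ip_network(sel["prefix"])
        ge = sel.get("ge", base.prefixlen)
        # Without "le": exact length, unless "ge" is given (then up to the max).
        le = sel.get("le", base.max_prefixlen if "ge" in sel else base.prefixlen)
        if (net.version == base.version and net.subnet_of(base)
                and ge <= net.prefixlen <= le):
            return True
    return False


lab_only = [{"prefix": "172.18.0.0/16", "le": 24}]  # hypothetical filter
for route in ["0.0.0.0/0", "172.18.30.0/24"]:
    print(route, accepted(route, "all"), accepted(route, "filtered", lab_only))
```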
  12. Configuration of the peer router r2 [BGP Adv: Default Pod NW]
    ▸ Peering from r2 to each node (AS 65801) is added
        $ show configuration commands | match bgp
        set protocols bgp address-family ipv4-unicast network 172.18.20.0/24
        set protocols bgp neighbor 172.18.0.1 address-family ipv4-unicast soft-reconfiguration inbound
        set protocols bgp neighbor 172.18.0.1 ebgp-multihop '2'
        set protocols bgp neighbor 172.18.0.1 remote-as '65101'
        set protocols bgp neighbor 172.18.0.1 update-source 'lo'
        set protocols bgp neighbor 172.18.20.100 address-family ipv4-unicast soft-reconfiguration inbound
        set protocols bgp neighbor 172.18.20.100 ebgp-multihop '10'
        set protocols bgp neighbor 172.18.20.100 remote-as '65801'
        set protocols bgp neighbor 172.18.20.101 address-family ipv4-unicast soft-reconfiguration inbound
        set protocols bgp neighbor 172.18.20.101 ebgp-multihop '10'
        set protocols bgp neighbor 172.18.20.101 remote-as '65801'
        set protocols bgp neighbor 172.18.20.102 address-family ipv4-unicast soft-reconfiguration inbound
        set protocols bgp neighbor 172.18.20.102 ebgp-multihop '10'
        set protocols bgp neighbor 172.18.20.102 remote-as '65801'
        set protocols bgp neighbor 172.18.20.110 address-family ipv4-unicast soft-reconfiguration inbound
        set protocols bgp neighbor 172.18.20.110 ebgp-multihop '10'
        set protocols bgp neighbor 172.18.20.110 remote-as '65801'
        set protocols bgp neighbor 172.18.20.111 address-family ipv4-unicast soft-reconfiguration inbound
        set protocols bgp neighbor 172.18.20.111 ebgp-multihop '10'
        set protocols bgp neighbor 172.18.20.111 remote-as '65801'
        set protocols bgp neighbor 172.18.20.112 address-family ipv4-unicast soft-reconfiguration inbound
        set protocols bgp neighbor 172.18.20.112 ebgp-multihop '10'
        set protocols bgp neighbor 172.18.20.112 remote-as '65801'
        set protocols bgp neighbor 172.18.20.113 address-family ipv4-unicast soft-reconfiguration inbound
        set protocols bgp neighbor 172.18.20.113 ebgp-multihop '10'
        set protocols bgp neighbor 172.18.20.113 remote-as '65801'
        set protocols bgp neighbor 172.18.20.114 address-family ipv4-unicast soft-reconfiguration inbound
        set protocols bgp neighbor 172.18.20.114 ebgp-multihop '10'
        set protocols bgp neighbor 172.18.20.114 remote-as '65801'
        set protocols bgp parameters router-id '172.18.0.2'
        set protocols bgp system-as '65102'
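The per-node neighbor lines all follow one pattern, so they can be generated instead of typed by hand. Below is a minimal sketch that reproduces the 24 node-neighbor commands above. The helper name is hypothetical. Adding a node then means adding one address, and the output can be pasted into VyOS configuration mode and committed.

```python
# Node addresses from slide 5 (cp0..2, wk0..4) and the cluster's AS number.
NODE_ADDRESSES = [f"172.18.20.{host}" for host in (100, 101, 102, 110, 111, 112, 113, 114)]
CLUSTER_AS = 65801


def neighbor_commands(address: str, remote_as: int, multihop: int = 10) -> list[str]:
    """VyOS 'set' commands for one node, mirroring r2's configuration above."""
    base = f"set protocols bgp neighbor {address}"
    return [
        f"{base} address-family ipv4-unicast soft-reconfiguration inbound",
        f"{base} ebgp-multihop '{multihop}'",
        f"{base} remote-as '{remote_as}'",
    ]


commands = [cmd for addr in NODE_ADDRESSES for cmd in neighbor_commands(addr, CLUSTER_AS)]
print("\n".join(commands))
```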
  14. BGP neighbor status and BGP tables on each router [BGP Adv: Default Pod NW]
    r1
        Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
        172.18.0.2      4      65102     11251     11255     1942    0    0 6d13h01m            1        3 N/A
        172.18.0.3      4      65103     11361     11248     1942    0    0 6d13h01m            1        3 N/A

           Network          Next Hop            Metric LocPrf Weight Path
        *> 10.128.0.0/24    172.18.0.2                             0 65102 65801 i
        *> 10.128.1.0/24    172.18.0.2                             0 65102 65801 i
        *> 10.128.2.0/24    172.18.0.2                             0 65102 65801 i
        *> 10.128.3.0/24    172.18.0.2                             0 65102 65801 i
        *> 10.128.4.0/24    172.18.0.2                             0 65102 65801 i
        *> 10.128.5.0/24    172.18.0.2                             0 65102 65801 i
        *> 10.128.6.0/24    172.18.0.2                             0 65102 65801 i
        *> 10.128.7.0/24    172.18.0.2                             0 65102 65801 i
        *> 172.18.20.0/24   172.18.0.2               0             0 65102 i
        *> 172.18.20.201/32 172.18.0.2                             0 65102 65801 i
        *> 172.18.20.202/32 172.18.0.2                             0 65102 65801 i
        *> 172.18.30.0/24   172.18.0.3               0             0 65103 i
    r2 (peering status with each node)
        Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
        172.18.0.1      4      65101     11289     11284     1950    0    0 6d13h33m            2       13 N/A
        172.18.20.100   4      65801         4        14     1950    0    0 00:00:35            1       13 N/A
        172.18.20.101   4      65801         4        14     1950    0    0 00:00:35            1       13 N/A
        172.18.20.102   4      65801         4        14     1950    0    0 00:00:35            1       13 N/A
        172.18.20.110   4      65801         4        14     1950    0    0 00:00:35            1       13 N/A
        172.18.20.111   4      65801         4        14     1950    0    0 00:00:35            1       13 N/A
        172.18.20.112   4      65801         4        14     1950    0    0 00:00:35            1       13 N/A
        172.18.20.113   4      65801         4        14     1950    0    0 00:00:35            2       13 N/A
        172.18.20.114   4      65801         4        14     1950    0    0 00:00:35            2       13 N/A

           Network          Next Hop            Metric LocPrf Weight Path
        *> 0.0.0.0/0        172.18.0.1               0             0 65101 i
        *> 10.128.0.0/24    172.18.20.101            0             0 65801 i
        *> 10.128.1.0/24    172.18.20.100            0             0 65801 i
        *> 10.128.2.0/24    172.18.20.102            0             0 65801 i
        *> 10.128.3.0/24    172.18.20.110            0             0 65801 i
        *> 10.128.4.0/24    172.18.20.111            0             0 65801 i
        *> 10.128.5.0/24    172.18.20.112            0             0 65801 i
        *> 10.128.6.0/24    172.18.20.114            0             0 65801 i
        *> 10.128.7.0/24    172.18.20.113            0             0 65801 i
        *> 172.18.20.0/24   0.0.0.0                  0         32768 i
        *> 172.18.20.201/32 172.18.20.113            0             0 65801 i
        *> 172.18.20.202/32 172.18.20.114            0             0 65801 i
        *> 172.18.30.0/24   172.18.0.1                             0 65101 65103 i
    r3
        Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
        172.18.0.1      4      65101     11247     11361    12427    0    0 6d13h01m            2        3 N/A

           Network          Next Hop            Metric LocPrf Weight Path
        *> 0.0.0.0/0        172.18.0.1               0             0 65101 i
        *> 10.128.0.0/24    172.18.0.1                             0 65101 65102 65801 i
        *> 10.128.1.0/24    172.18.0.1                             0 65101 65102 65801 i
        *> 10.128.2.0/24    172.18.0.1                             0 65101 65102 65801 i
        *> 10.128.3.0/24    172.18.0.1                             0 65101 65102 65801 i
        *> 10.128.4.0/24    172.18.0.1                             0 65101 65102 65801 i
        *> 10.128.5.0/24    172.18.0.1                             0 65101 65102 65801 i
        *> 10.128.6.0/24    172.18.0.1                             0 65101 65102 65801 i
        *> 10.128.7.0/24    172.18.0.1                             0 65101 65102 65801 i
        *> 172.18.20.0/24   172.18.0.1                             0 65101 65102 i
        *> 172.18.20.201/32 172.18.0.1                             0 65101 65102 65801 i
        *> 172.18.20.202/32 172.18.0.1                             0 65101 65102 65801 i
        *> 172.18.30.0/24   0.0.0.0                  0         32768 i
    ▸ r2 has established a session with every node
    ▸ Routes r2 received from the nodes:
      ・ Each node advertises the /24 address range assigned to it
      ・ The EgressIPs appear as /32 routes
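The claim that "each node advertises its own /24, plus the EgressIPs it hosts" can be checked mechanically. The sketch below combines data copied from this deck: the node addresses, the `k8s.ovn.org/node-subnets` annotations, and the EgressIP status. It builds the expected next hop for each prefix and compares the result with r2's table on this slide. It runs as is with no cluster access. On a live cluster, the same inputs would come from `oc get node` and `oc get egressip`.

```python
import json

# Node addresses (slide 5), node-subnets annotations (slide 15) and
# EgressIP status (slide 20), copied from this deck.
NODE_IP = {"cp0": "172.18.20.100", "cp1": "172.18.20.101", "cp2": "172.18.20.102",
           "wk0": "172.18.20.110", "wk1": "172.18.20.111", "wk2": "172.18.20.112",
           "wk3": "172.18.20.113", "wk4": "172.18.20.114"}
NODE_SUBNETS = {
    "cp0": '{"default":["10.128.1.0/24"]}', "cp1": '{"default":["10.128.0.0/24"]}',
    "cp2": '{"default":["10.128.2.0/24"]}', "wk0": '{"default":["10.128.3.0/24"]}',
    "wk1": '{"default":["10.128.4.0/24"]}', "wk2": '{"default":["10.128.5.0/24"]}',
    "wk3": '{"default":["10.128.7.0/24"]}', "wk4": '{"default":["10.128.6.0/24"]}',
}
EGRESS_IP_NODE = {"172.18.20.201": "wk3", "172.18.20.202": "wk4"}

# Best next hop per cluster prefix in r2's table on this slide.
R2_NEXT_HOP = {
    "10.128.0.0/24": "172.18.20.101", "10.128.1.0/24": "172.18.20.100",
    "10.128.2.0/24": "172.18.20.102", "10.128.3.0/24": "172.18.20.110",
    "10.128.4.0/24": "172.18.20.111", "10.128.5.0/24": "172.18.20.112",
    "10.128.6.0/24": "172.18.20.114", "10.128.7.0/24": "172.18.20.113",
    "172.18.20.201/32": "172.18.20.113", "172.18.20.202/32": "172.18.20.114",
}

expected = {}
for node, annotation in NODE_SUBNETS.items():
    for subnet in json.loads(annotation)["default"]:
        expected[subnet] = NODE_IP[node]           # node advertises its own block
for egress_ip, node in EGRESS_IP_NODE.items():
    expected[f"{egress_ip}/32"] = NODE_IP[node]    # and the EgressIPs it hosts

mismatches = {prefix: (want, R2_NEXT_HOP.get(prefix))
              for prefix, want in expected.items()
              if R2_NEXT_HOP.get(prefix) != want}
print("mismatches:", mismatches)  # mismatches: {}
```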
  15. Address ranges advertised by each node [BGP Adv: Default Pod NW]
    ▸ The default Pod network and Layer3-topology CUDNs are defined by cidr and hostPrefix
      ・ In the example below: cidr: 10.128.0.0/16, hostPrefix: 24
      ・ The cidr is split into hostPrefix-sized ranges, and one range is assigned to each node
      ・ Each node assigns Pod IP addresses from its own hostPrefix-sized range
      ・ When advertising over BGP, each node advertises its own hostPrefix-sized range
    ▸ For Layer2-topology CUDNs, every node advertises the entire CUDN subnet range
    Address of the default Pod network:
        $ oc get network.operator cluster -oyaml | yq .spec.clusterNetwork
        - cidr: 10.128.0.0/16
          hostPrefix: 24
    Address range assigned to each node:
        $ oc get node -o yaml | yq '.items[] | [{"name":.metadata.name, "subnets":.metadata.annotations."k8s.ovn.org/node-subnets"}]'
        - name: cp0
          subnets: '{"default":["10.128.1.0/24"]}'
        - name: cp1
          subnets: '{"default":["10.128.0.0/24"]}'
        - name: cp2
          subnets: '{"default":["10.128.2.0/24"]}'
        - name: wk0
          subnets: '{"default":["10.128.3.0/24"]}'
        - name: wk1
          subnets: '{"default":["10.128.4.0/24"]}'
        - name: wk2
          subnets: '{"default":["10.128.5.0/24"]}'
        - name: wk3
          subnets: '{"default":["10.128.7.0/24"]}'
        - name: wk4
          subnets: '{"default":["10.128.6.0/24"]}'
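The cidr/hostPrefix split is plain subnet arithmetic. A /16 cut into /24 blocks yields 2^(24-16) = 256 blocks of 256 addresses each. Given a Pod IP, the covering block identifies which node's BGP route carries its traffic. The sketch below uses Python's standard `ipaddress` module with the values from this slide. It does not model which addresses inside a block OVN-Kubernetes reserves for itself.

```python
import ipaddress

cluster_cidr = ipaddress.ip_network("10.128.0.0/16")  # clusterNetwork cidr
host_prefix = 24                                       # clusterNetwork hostPrefix

# The cidr is cut into hostPrefix-sized blocks; each node gets one block.
blocks = list(cluster_cidr.subnets(new_prefix=host_prefix))
print(len(blocks))               # 256 blocks, i.e. room for 256 nodes
print(blocks[0].num_addresses)   # 256 addresses per node block

# Which block (and thus which node's BGP route) covers a given Pod IP?
pod_ip = ipaddress.ip_address("10.128.7.14")
owner = next(block for block in blocks if pod_ip in block)
print(owner)                     # 10.128.7.0/24, the block assigned to wk3
```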
  17. frr running-config [BGP Adv: Default Pod NW]
    ▸ The FRRNodeState custom resource exposes the running-config of frr on each node
        $ oc get frrnodestates.frrk8s.metallb.io wk3 -oyaml | yq .status.runningConfig
        Building configuration...

        Current configuration:
        !
        frr version 8.5.3
        frr defaults traditional
        hostname wk3
        log file /etc/frr/frr.log informational
        log timestamp precision 3
        service integrated-vtysh-config
        !
        router bgp 65801
         no bgp ebgp-requires-policy
         no bgp default ipv4-unicast
         bgp graceful-restart preserve-fw-state
         no bgp network import-check
         neighbor 172.18.20.1 remote-as 65102
         !
         address-family ipv4 unicast
          network 10.128.7.0/24
          network 172.18.20.201/32
          neighbor 172.18.20.1 activate
          neighbor 172.18.20.1 route-map 172.18.20.1-in in
          neighbor 172.18.20.1 route-map 172.18.20.1-out out
         exit-address-family
        exit
        !
        ip prefix-list 172.18.20.1-allowed-ipv4 seq 1 permit 10.128.7.0/24
        ip prefix-list 172.18.20.1-allowed-ipv4 seq 2 permit 172.18.20.201/32
        ip prefix-list 172.18.20.1-inpl-ipv4 seq 1 permit any
        !
        ipv6 prefix-list 172.18.20.1-allowed-ipv6 seq 1 deny any
        ipv6 prefix-list 172.18.20.1-inpl-ipv4 seq 2 permit any
        !
        route-map 172.18.20.1-out permit 1
         match ip address prefix-list 172.18.20.1-allowed-ipv4
        exit
    ▸ neighbor 172.18.20.1 remote-as 65102: peering with the peer router r2
    ▸ network 10.128.7.0/24: advertises the /24 range assigned to this node
    ▸ network 172.18.20.201/32: advertises the EgressIP assigned to this node
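The outbound route-map `172.18.20.1-out` only permits routes that match prefix-list `172.18.20.1-allowed-ipv4`. As a result, wk3 sends exactly its own /24 and its EgressIP /32, which matches PfxSnt 2 in wk3's `show bgp summary`. The sketch below models that evaluation in simplified form: entries without `ge`/`le` match one prefix exactly, the first matching entry wins, and an implicit deny follows the last entry.

```python
import ipaddress

# Outbound prefix-list 172.18.20.1-allowed-ipv4 from wk3's running-config.
ALLOWED_OUT = [("permit", "10.128.7.0/24"), ("permit", "172.18.20.201/32")]


def prefix_list_permits(entries, route: str) -> bool:
    """First-match prefix-list evaluation without ge/le (exact prefix match).

    'any' matches every route; if no entry matches, the implicit deny applies.
    """
    net = ipaddress.ip_network(route)
    for action, prefix in entries:
        if prefix == "any" or ipaddress.ip_network(prefix) == net:
            return action == "permit"
    return False


# The five routes in wk3's BGP table (vtysh output on slide 19).
table = ["0.0.0.0/0", "10.128.7.0/24", "172.18.20.0/24",
         "172.18.20.201/32", "172.18.30.0/24"]
sent = [route for route in table if prefix_list_permits(ALLOWED_OUT, route)]
print(sent)  # ['10.128.7.0/24', '172.18.20.201/32']
```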
  19. Entering wk3's frr with vtysh [BGP Adv: Default Pod NW]
    Routes received from the peer router r2:
        $ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show ip bgp'
        BGP table version is 5, local router ID is 172.18.20.201, vrf id 0
        Default local pref 100, local AS 65801
        Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                       i internal, r RIB-failure, S Stale, R Removed
        Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
        Origin codes:  i - IGP, e - EGP, ? - incomplete
        RPKI validation codes: V valid, I invalid, N Not found

           Network          Next Hop            Metric LocPrf Weight Path
        *> 0.0.0.0/0        172.18.20.1                            0 65102 65101 i
        *> 10.128.7.0/24    0.0.0.0                  0         32768 i
        *> 172.18.20.0/24   172.18.20.1              0             0 65102 i
        *> 172.18.20.201/32 0.0.0.0                  0         32768 i
        *> 172.18.30.0/24   172.18.20.1                            0 65102 65101 65103 i

        Displayed  5 routes and 5 total paths
    Peering status with the peer router r2:
        $ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show bgp summary'

        IPv4 Unicast Summary (VRF default):
        BGP router identifier 172.18.20.201, local AS number 65801 vrf-id 0
        BGP table version 5
        RIB entries 6, using 1152 bytes of memory
        Peers 1, using 725 KiB of memory

        Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
        172.18.20.1     4      65102        29        20        0    0    0 00:15:20            3        2 N/A

        Total number of neighbors 1
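The State/PfxRcd column does double duty. When the session is Established, FRR prints the number of prefixes received. Otherwise it prints the session state. The sketch below maps a neighbor row onto column names and applies that rule. The wk3 row comes from this slide. `DOWN_ROW` is a hypothetical example of a session that is not up. Treating every non-number as "not established" is a simplification, and FRR's `show bgp summary json` would be the more robust source.

```python
COLUMNS = ["neighbor", "version", "remote_as", "msg_rcvd", "msg_sent", "tbl_ver",
           "in_q", "out_q", "up_down", "state_pfx_rcd", "pfx_snt", "desc"]

# wk3's row from this slide, and a hypothetical row for a session that is down.
WK3_ROW = "172.18.20.1     4      65102        29        20        0    0    0 00:15:20            3        2 N/A"
DOWN_ROW = "172.18.20.1     4      65102         0         0        0    0    0    never       Active        0 N/A"


def parse_summary_row(row: str) -> dict:
    """Map one 'show bgp summary' neighbor row onto column names."""
    peer = dict(zip(COLUMNS, row.split()))
    # A number in State/PfxRcd means the session is Established; otherwise
    # FRR shows the session state there (e.g. Active, Idle, Connect).
    peer["established"] = peer["state_pfx_rcd"].isdigit()
    return peer


for row in (WK3_ROW, DOWN_ROW):
    p = parse_summary_row(row)
    print(p["neighbor"], p["state_pfx_rcd"],
          "established" if p["established"] else "not established")
```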
  20. Accessing the Pod IP address directly [BGP Adv: Default Pod NW]
    Pod IP address:
        $ oc -n proj1 get pod -o wide
        NAME                    READY   STATUS    RESTARTS   AGE     IP            NODE   NOMINATED NODE   READINESS GATES
        hello-c84644886-zkwst   1/1     Running   0          9m36s   10.128.7.14   wk3    <none>           <none>
    EgressIP assignment:
        $ oc get egressip myegressip -o yaml | yq '.status'
        items:
        - egressIP: 172.18.20.201
          node: wk3
        - egressIP: 172.18.20.202
          node: wk4
    traceroute to the Pod IP address:
        [test@testvm0 ~]$ traceroute -n 10.128.7.14
        traceroute to 10.128.7.14 (10.128.7.14), 30 hops max, 60 byte packets
         1  172.18.10.2  0.489 ms  0.461 ms  0.445 ms
         2  172.18.12.2  0.753 ms  0.730 ms  0.682 ms
         3  172.18.20.113  1.135 ms  1.115 ms  1.087 ms
         4  10.128.7.14  3.338 ms  3.316 ms  3.293 ms
         5  10.128.7.14  1.643 ms  1.620 ms  1.597 ms
    Direct access to the Pod IP address:
        [test@testvm0 ~]$ curl http://10.128.7.14:8080
        Hello, World!
        Timestamp: 2025/11/11 02:46:50
        Hostname: hello-c84644886-zkwst
        LocalAddress: 10.128.7.14
        Gateway: 10.128.7.1
        Headers:
          Accept: [*/*]
          User-Agent: [curl/7.76.1]
        Host: 10.128.7.14:8080
        RemoteAddress: 172.18.10.90:45816
    ▸ RemoteAddress is testvm0's own address (172.18.10.90), so the request reaches the Pod without NAT
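The traceroute path (r1 → r2 → wk3) follows from a longest-prefix-match lookup in each router's BGP table. The sketch below follows best next hops device by device, using a subset of r1's and r2's tables from slide 14. It skips how r1 resolves r2's loopback 172.18.0.2 (OSPF over net12, which is why hop 2 shows 172.18.12.2). It also ignores everything that happens inside OVN on wk3.

```python
import ipaddress

# prefix -> best next hop, copied (in part) from r1's and r2's tables on slide 14.
TABLES = {
    "r1": {"10.128.6.0/24": "172.18.0.2", "10.128.7.0/24": "172.18.0.2",
           "172.18.20.0/24": "172.18.0.2", "172.18.30.0/24": "172.18.0.3"},
    "r2": {"0.0.0.0/0": "172.18.0.1", "10.128.6.0/24": "172.18.20.114",
           "10.128.7.0/24": "172.18.20.113", "172.18.20.0/24": "0.0.0.0"},
}
# Device that owns each next-hop address (router loopbacks and node IPs).
OWNER = {"172.18.0.1": "r1", "172.18.0.2": "r2", "172.18.0.3": "r3",
         "172.18.20.113": "wk3", "172.18.20.114": "wk4"}


def lookup(table: dict[str, str], dst: str) -> tuple[str, str]:
    """Longest-prefix match: return (matched prefix, next hop)."""
    addr = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(p) for p in table if addr in ipaddress.ip_network(p)]
    if not matches:
        raise LookupError(f"no route to {dst}")
    best = max(matches, key=lambda net: net.prefixlen)
    return str(best), table[str(best)]


def forwarding_path(start: str, dst: str) -> list[str]:
    """Follow best next hops device by device until a device has no table here."""
    path, device = [start], start
    while device in TABLES:
        _, next_hop = lookup(TABLES[device], dst)
        if next_hop == "0.0.0.0":          # locally originated: directly connected
            break
        device = OWNER[next_hop]
        path.append(device)
    return path


print(forwarding_path("r1", "10.128.7.14"))  # ['r1', 'r2', 'wk3']
```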
  21. Routing table on node wk3 [BGP Adv: Default Pod NW]
        [core@wk3 ~]$ ip route show
        default via 172.18.20.1 dev br-ex proto dhcp src 172.18.20.113 metric 48
        10.128.0.0/16 via 10.128.7.1 dev ovn-k8s-mp0
        10.128.7.0/24 dev ovn-k8s-mp0 proto kernel scope link src 10.128.7.2
        10.200.0.0/16 via 169.254.0.4 dev br-ex src 169.254.0.2 mtu 1400
        169.254.0.0/17 dev br-ex proto kernel scope link src 169.254.0.2
        169.254.0.1 dev br-ex src 172.18.20.113
        169.254.0.3 via 10.128.7.1 dev ovn-k8s-mp0
        172.18.20.0/24 dev br-ex proto kernel scope link src 172.18.20.113 metric 48
        172.18.30.0/24 nhid 183 via 172.18.20.1 dev br-ex proto bgp metric 20
        [core@wk3 ~]$ ip -4 -br addr show
        lo               UNKNOWN        127.0.0.1/8
        ovn-k8s-mp0      UNKNOWN        10.128.7.2/24
        br-ex            UNKNOWN        172.18.20.113/24 169.254.0.2/17 172.18.20.201/32
    ▸ 172.18.30.0/24 ... proto bgp: a route received over BGP
    ▸ 172.18.20.201/32 on br-ex: the EgressIP assigned to this node
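Routes that frr installed into the kernel carry `proto bgp`, which makes them easy to pick out. The sketch below reads `ip route show` lines into key/value dicts and keeps only the BGP ones. It works for the lines on this slide. Single-word flags such as `onlink` would break the pairing and are not handled. On the node itself, `ip route show proto bgp` applies the same filter, and `ip -json route show` gives structured output without hand-parsing.

```python
# Part of wk3's `ip route show` output from this slide.
ROUTES = """\
default via 172.18.20.1 dev br-ex proto dhcp src 172.18.20.113 metric 48
10.128.0.0/16 via 10.128.7.1 dev ovn-k8s-mp0
10.128.7.0/24 dev ovn-k8s-mp0 proto kernel scope link src 10.128.7.2
172.18.20.0/24 dev br-ex proto kernel scope link src 172.18.20.113 metric 48
172.18.30.0/24 nhid 183 via 172.18.20.1 dev br-ex proto bgp metric 20
"""


def parse_route(line: str) -> dict[str, str]:
    """Read 'DEST key value key value ...' into a dict (no single-word flags)."""
    words = line.split()
    route = {"dst": words[0]}
    route.update(zip(words[1::2], words[2::2]))
    return route


routes = [parse_route(line) for line in ROUTES.splitlines()]
bgp_routes = [r for r in routes if r.get("proto") == "bgp"]
for r in bgp_routes:
    print(r["dst"], "via", r["via"], "dev", r["dev"])  # 172.18.30.0/24 via 172.18.20.1 dev br-ex
```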
  23. OpenShift configuration (1) [BGP Adv: CUDN]
    Namespace (the label makes a UDN the primary network of the Pods in it):
        apiVersion: v1
        kind: Namespace
        metadata:
          name: proj2
          labels:
            k8s.ovn.org/primary-user-defined-network: ""
    Deployment:
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          labels:
            app: hello
            app.kubernetes.io/component: hello
            app.kubernetes.io/instance: hello
          name: hello
        spec:
          replicas: 1
          selector:
            matchLabels:
              deployment: hello
          template:
            metadata:
              labels:
                deployment: hello
            spec:
              containers:
              - image: quay.io/manabu.ori/hello
                imagePullPolicy: IfNotPresent
                name: hello
              nodeSelector:
                node-role.kubernetes.io/worker-virt: ""
  25. OpenShift configuration (2) [BGP Adv: CUDN]
    ClusterUserDefinedNetwork:
        apiVersion: k8s.ovn.org/v1
        kind: ClusterUserDefinedNetwork
        metadata:
          labels:
            export: "true"
          name: cudn2
        spec:
          namespaceSelector:            # apply to the specified Namespaces
            matchExpressions:
            - key: kubernetes.io/metadata.name
              operator: In
              values:
              - proj2
          network:
            topology: Layer2            # one L2 segment spanning the whole cluster
            layer2:
              role: Primary
              ipam:
                lifecycle: Persistent   # keep IP addresses unchanged when a VM live-migrates
              subnets:
              - "172.22.0.0/16"
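This CUDN selects its namespaces with `matchExpressions` rather than `matchLabels`. Below is a sketch of how a Kubernetes label selector evaluates both forms. All requirements must hold, `In` needs the key to be present with a listed value, and `NotIn` also matches when the key is absent. Kubernetes sets the `kubernetes.io/metadata.name` label on every Namespace automatically, which is what this selector relies on.

```python
def requirement_matches(req: dict, labels: dict[str, str]) -> bool:
    """Evaluate one matchExpressions requirement (Kubernetes label selector)."""
    key, op, values = req["key"], req["operator"], req.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {op}")


def selector_matches(selector: dict, labels: dict[str, str]) -> bool:
    """matchLabels and matchExpressions must all hold (logical AND)."""
    by_label = all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())
    return by_label and all(requirement_matches(r, labels)
                            for r in selector.get("matchExpressions", []))


# spec.namespaceSelector of cudn2 on this slide.
cudn2_selector = {"matchExpressions": [
    {"key": "kubernetes.io/metadata.name", "operator": "In", "values": ["proj2"]}]}
namespaces = {name: {"kubernetes.io/metadata.name": name} for name in ("proj1", "proj2")}

print([n for n, labels in namespaces.items() if selector_matches(cudn2_selector, labels)])
# ['proj2']
```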
  27. OpenShift configuration (3) [BGP Adv: CUDN]

    [Figure: lab topology]

    Note on the RouteAdvertisements manifest (same manifests as the previous slide):
    ・ networkSelectors: advertise the addresses of the CUDNs labeled export: "true"
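Both selectors in the RouteAdvertisements above are plain label matches: networkSelectors picks the CUDNs to advertise, and frrConfigurationSelector picks the FRRConfiguration(s) that define the BGP sessions to advertise over. A minimal Python sketch of matchLabels semantics, assuming exact key/value matching only (matchExpressions is not modeled); the second CUDN, "other", is hypothetical.

```python
# Sketch of Kubernetes matchLabels semantics as used by RouteAdvertisements.
# Only exact key/value matching is modeled; matchExpressions is not.


def match_labels(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True if every selector pair is present in labels; an empty selector matches all."""
    return all(labels.get(key) == value for key, value in selector.items())


# name -> labels; "cudn2" mirrors the manifest above, "other" is hypothetical
cudns = {"cudn2": {"export": "true"}, "other": {}}
frr_configs = {"receive-all": {"routeAdvertisements": "receive-all"}}

# Selectors copied from the RouteAdvertisements spec (adv-cudn1)
network_selector = {"export": "true"}
frr_selector = {"routeAdvertisements": "receive-all"}

selected_networks = sorted(n for n, lbl in cudns.items() if match_labels(network_selector, lbl))
selected_frr = sorted(n for n, lbl in frr_configs.items() if match_labels(frr_selector, lbl))

if __name__ == "__main__":
    print(selected_networks)  # ['cudn2']
    print(selected_frr)       # ['receive-all']
```

Note that the label value is the string "true", which is why the manifests quote it.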
  28. BGP table on each router [BGP Adv: CUDN]

    [Figure: lab topology]

    r1:

         Network          Next Hop            Metric LocPrf Weight Path
      *> 172.18.20.0/24   172.18.0.2               0             0 65102 i
      *> 172.18.30.0/24   172.18.0.3               0             0 65103 i
      *> 172.22.0.0/16    172.18.0.2                             0 65102 65801 i

    r2:

         Network          Next Hop            Metric LocPrf Weight Path
      *> 0.0.0.0/0        172.18.0.1               0             0 65101 i
      *> 172.18.20.0/24   0.0.0.0                  0         32768 i
      *> 172.18.30.0/24   172.18.0.1                             0 65101 65103 i
      *= 172.22.0.0/16    172.18.20.102            0             0 65801 i
      *=                  172.18.20.112            0             0 65801 i
      *=                  172.18.20.101            0             0 65801 i
      *>                  172.18.20.100            0             0 65801 i
      *=                  172.18.20.111            0             0 65801 i
      *=                  172.18.20.114            0             0 65801 i
      *=                  172.18.20.110            0             0 65801 i
      *=                  172.18.20.113            0             0 65801 i

    r3:

         Network          Next Hop            Metric LocPrf Weight Path
      *> 0.0.0.0/0        172.18.0.1               0             0 65101 i
      *> 172.18.20.0/24   172.18.0.1                             0 65101 65102 i
      *> 172.18.30.0/24   0.0.0.0                  0         32768 i
      *> 172.22.0.0/16    172.18.0.1                             0 65101 65102 65801 i

    Every node advertises the entire address range of the L2 CUDN (172.22.0.0/16); r2 keeps all eight paths (*= marks multipath entries, *> the best path).
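To check how many nodes advertise a prefix (eight for 172.22.0.0/16 on r2 above), the table can be scanned mechanically. A rough Python sketch, assuming two-character status codes (*>, *=) as in these tables and that multipath continuation rows leave the Network column blank; it is not a general parser for every `show ip bgp` variant (FRR's `show ip bgp json` is the more robust source).

```python
# Scan FRR-style "show ip bgp" text and list the next hops for one prefix.
# Assumes two-character status codes ("*>", "*=") as in the tables above and
# that multipath continuation rows leave the Network column blank.

R2_TABLE = """\
   Network          Next Hop            Metric LocPrf Weight Path
*> 0.0.0.0/0        172.18.0.1               0             0 65101 i
*> 172.18.20.0/24   0.0.0.0                  0         32768 i
*> 172.18.30.0/24   172.18.0.1                             0 65101 65103 i
*= 172.22.0.0/16    172.18.20.102            0             0 65801 i
*=                  172.18.20.112            0             0 65801 i
*=                  172.18.20.101            0             0 65801 i
*>                  172.18.20.100            0             0 65801 i
*=                  172.18.20.111            0             0 65801 i
*=                  172.18.20.114            0             0 65801 i
*=                  172.18.20.110            0             0 65801 i
*=                  172.18.20.113            0             0 65801 i
"""


def next_hops(table: str, prefix: str) -> list[tuple[str, bool]]:
    """Return (next_hop, is_best) pairs for `prefix`, including multipath rows."""
    result = []
    current = None  # prefix named by the most recent row that had one
    for line in table.splitlines():
        if not line.startswith("*"):
            continue  # header line, or a route without the valid (*) flag
        status, fields = line[:2], line[2:].split()
        if not fields:
            continue
        if "/" in fields[0]:  # row that starts with a prefix
            current, hop = fields[0], fields[1]
        else:  # continuation row: the first field is the next hop
            hop = fields[0]
        if current == prefix:
            result.append((hop, status == "*>"))
    return result


if __name__ == "__main__":
    hops = next_hops(R2_TABLE, "172.22.0.0/16")
    print(len(hops))                        # 8
    print([h for h, best in hops if best])  # ['172.18.20.100']
```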
  29. FRR running-config [BGP Adv: CUDN]

    [Figure: lab topology]

      $ oc get frrnodestates.frrk8s.metallb.io wk3 -oyaml | yq .status.runningConfig
      Building configuration...
      ...
      router bgp 65801
       no bgp ebgp-requires-policy
       no bgp default ipv4-unicast
       bgp graceful-restart preserve-fw-state
       no bgp network import-check
       neighbor 172.18.20.1 remote-as 65102
       !
       address-family ipv4 unicast
        network 172.22.0.0/16
        neighbor 172.18.20.1 activate
        neighbor 172.18.20.1 route-map 172.18.20.1-in in
        neighbor 172.18.20.1 route-map 172.18.20.1-out out
        import vrf cudn2
       exit-address-family
       !
       address-family ipv6 unicast
        import vrf cudn2
       exit-address-family
      exit
      !
      router bgp 65801 vrf cudn2
       no bgp ebgp-requires-policy
       no bgp hard-administrative-reset
       no bgp default ipv4-unicast
       no bgp graceful-restart notification
       bgp graceful-restart preserve-fw-state
       no bgp network import-check
       !
       address-family ipv4 unicast
        import vrf default
       exit-address-family
       !
       address-family ipv6 unicast
        import vrf default
       exit-address-family

    ・ router bgp 65801: configuration of the default VRF
    ・ network 172.22.0.0/16: cudn2's address range is advertised from the default VRF
    ・ import vrf cudn2 (in the default VRF): route leak from the cudn2 VRF into the default VRF
    ・ router bgp 65801 vrf cudn2: a VRF with the same name as the created CUDN is created
    ・ import vrf default (in VRF cudn2): route leak from the default VRF into the cudn2 VRF
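The leak direction in a config like this is easy to misread: `import vrf X` inside a BGP instance pulls routes from X into that instance. A small Python sketch that lists, per VRF, which VRFs it imports from. The config string is an abridged excerpt of the running-config above, and the regular expressions assume FRR's one-statement-per-line formatting.

```python
import re

# Abridged excerpt of the running-config above (only the lines this sketch reads).
RUNNING_CONFIG = """\
router bgp 65801
 address-family ipv4 unicast
  network 172.22.0.0/16
  import vrf cudn2
 exit-address-family
exit
!
router bgp 65801 vrf cudn2
 address-family ipv4 unicast
  import vrf default
 exit-address-family
"""

ROUTER_RE = re.compile(r"^router bgp (\d+)(?: vrf (\S+))?\s*$")
IMPORT_RE = re.compile(r"^\s+import vrf (\S+)\s*$")


def vrf_imports(config: str) -> dict[str, set[str]]:
    """Map each BGP instance's VRF ("default" if none is named) to the VRFs it imports from."""
    imports: dict[str, set[str]] = {}
    vrf = None
    for line in config.splitlines():
        router = ROUTER_RE.match(line)
        if router:
            vrf = router.group(2) or "default"
            imports.setdefault(vrf, set())
            continue
        imported = IMPORT_RE.match(line)
        if imported and vrf is not None:
            imports[vrf].add(imported.group(1))
    return imports


if __name__ == "__main__":
    for vrf, sources in sorted(vrf_imports(RUNNING_CONFIG).items()):
        print(f"VRF {vrf} imports from: {', '.join(sorted(sources))}")
```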
  30. Entering FRR on node wk3 with vtysh [BGP Adv: CUDN]

    [Figure: lab topology]

    BGP routes received from the peer router r2 (default VRF):

      $ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show ip bgp'
      ...
         Network          Next Hop            Metric LocPrf Weight Path
      *> 0.0.0.0/0        172.18.20.1                            0 65102 65101 i
      *> 172.18.20.0/24   172.18.20.1              0             0 65102 i
      *> 172.18.30.0/24   172.18.20.1                            0 65102 65101 65103 i
      *> 172.22.0.0/16    0.0.0.0                  0         32768 i

    BGP routes leaked from the default VRF into VRF cudn2:

      $ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show ip bgp vrf cudn2'
      ...
         Network          Next Hop            Metric LocPrf Weight Path
      *> 0.0.0.0/0        172.18.20.1@0<                         0 65102 65101 i
      *> 172.18.20.0/24   172.18.20.1@0<           0             0 65102 i
      *> 172.18.30.0/24   172.18.20.1@0<                         0 65102 65101 65103 i
         172.22.0.0/16    0.0.0.0@0<               0         32768 i

      $ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show bgp summary'
      ...
      Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
      172.18.20.1     4      65102        15        12        0    0    0 00:07:30            3        1 N/A
  31. Accessing a Pod IP address directly [BGP Adv: CUDN]

    [Figure: lab topology; default Pod network 10.128.0.0/16]

    Traceroute to the Pod IP address:

      [test@testvm0 ~]$ traceroute -n 172.22.0.4
      traceroute to 172.22.0.4 (172.22.0.4), 30 hops max, 60 byte packets
       1  172.18.10.2  0.397 ms  0.368 ms  0.352 ms
       2  172.18.12.2  0.612 ms  0.588 ms  0.568 ms
       3  172.18.20.101  1.658 ms  1.642 ms  1.627 ms
       4  172.22.0.4  5.354 ms  5.338 ms  5.323 ms

    Direct access to the Pod IP address:

      [test@testvm0 ~]$ curl http://172.22.0.4:8080
      Hello, World!
      Timestamp: 2025/11/13 02:04:55
      Hostname: hello-c84644886-x8rgb
      LocalAddress: 10.128.7.246
      Gateway: 172.22.0.1
      Headers:
      Accept: [*/*]
      User-Agent: [curl/7.76.1]
      Host: 172.22.0.4:8080
      RemoteAddress: 172.18.10.90:39232

    IP address of the app Pod:

      $ oc -n proj2 get pod -o wide
      NAME                    READY   STATUS    RESTARTS   AGE   IP             NODE   NOMINATED NODE   READINESS GATES
      hello-c84644886-x8rgb   1/1     Running   0          11m   10.128.7.246   wk3    <none>           <none>

    The Pod's CUDN address (interface ovn-udn1):

      $ oc -n proj2 get pod hello-c84644886-rdcvl -oyaml | yq '.metadata.annotations."k8s.v1.cni.cncf.io/network-status"'
      [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.128.7.246" ], "mac": "0a:58:0a:80:07:f6", "dns": {} },
       { "name": "ovn-kubernetes", "interface": "ovn-udn1", "ips": [ "172.22.0.4" ], "mac": "0a:58:ac:16:00:04", "default": true, "dns": {} }]
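`oc get pod -o wide` shows the eth0 address, but the address that is advertised over BGP and reachable from testvm0 is the entry flagged `"default": true`. A small Python sketch that reads the network-status annotation (the JSON here is copied from the output above) and picks that entry; in practice the input would be the output of the yq command.

```python
import json

# Value of the k8s.v1.cni.cncf.io/network-status annotation from the output above
NETWORK_STATUS = """
[{"name": "ovn-kubernetes", "interface": "eth0", "ips": ["10.128.7.246"],
  "mac": "0a:58:0a:80:07:f6", "dns": {}},
 {"name": "ovn-kubernetes", "interface": "ovn-udn1", "ips": ["172.22.0.4"],
  "mac": "0a:58:ac:16:00:04", "default": true, "dns": {}}]
"""


def primary_ips(network_status: str) -> list[str]:
    """IPs of the entry flagged "default": true, i.e. the Pod's primary network."""
    for entry in json.loads(network_status):
        if entry.get("default", False):
            return entry.get("ips", [])
    return []


def ips_by_interface(network_status: str) -> dict[str, list[str]]:
    """Interface name -> IPs for every entry in the annotation."""
    return {e["interface"]: e.get("ips", []) for e in json.loads(network_status)}


if __name__ == "__main__":
    print(primary_ips(NETWORK_STATUS))  # ['172.22.0.4']
    print(ips_by_interface(NETWORK_STATUS))
```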
  32. Routing table on node wk3

    Interface IP addresses in the default VRF:

      [core@wk3 ~]$ ip -4 -br addr show
      lo               UNKNOWN        127.0.0.1/8
      ovn-k8s-mp0      UNKNOWN        10.128.7.2/24
      br-ex            UNKNOWN        172.18.20.113/24 169.254.0.2/17
      ovn-k8s-mp7      UNKNOWN        172.22.0.2/16

    Interface IP addresses in the cudn2 VRF:

      [core@wk3 ~]$ ip -4 -br addr show vrf cudn2
      ovn-k8s-mp7      UNKNOWN        172.22.0.2/16

    Routing table of the default VRF:

      [core@wk3 ~]$ ip route show
      default via 172.18.20.1 dev br-ex proto dhcp src 172.18.20.113 metric 48
      10.128.0.0/16 via 10.128.7.1 dev ovn-k8s-mp0
      10.128.7.0/24 dev ovn-k8s-mp0 proto kernel scope link src 10.128.7.2
      10.200.0.0/16 via 169.254.0.4 dev br-ex src 169.254.0.2 mtu 1400
      169.254.0.0/17 dev br-ex proto kernel scope link src 169.254.0.2
      169.254.0.1 dev br-ex src 172.18.20.113
      169.254.0.3 via 10.128.7.1 dev ovn-k8s-mp0
      172.18.20.0/24 dev br-ex proto kernel scope link src 172.18.20.113 metric 48
      172.18.30.0/24 nhid 2247 via 172.18.20.1 dev br-ex proto bgp metric 20

    Routing table of the cudn2 VRF:

      [core@wk3 ~]$ ip route show vrf cudn2
      default via 172.18.20.1 dev br-ex mtu 1400
      unreachable default metric 4278198272
      10.200.0.0/16 via 169.254.0.4 dev br-ex mtu 1400
      169.254.0.3 via 172.22.0.1 dev ovn-k8s-mp7
      169.254.0.24 dev ovn-k8s-mp7 mtu 1400
      172.18.30.0/24 nhid 2247 via 172.18.20.1 dev br-ex proto bgp metric 20
      172.22.0.0/16 dev ovn-k8s-mp7 proto kernel scope link src 172.22.0.2

    Routing rules:

      [core@wk3 ~]$ ip rule show
      0:      from all lookup local
      30:     from all fwmark 0x1745ec lookup 7
      1000:   from all lookup [l3mdev-table]
      2000:   from all fwmark 0x1007 lookup 2194
      2000:   from all to 169.254.0.24 lookup 2194
      2000:   from all to 172.22.0.0/16 lookup 2194
      5999:   from all fwmark 0x3f0 lookup main
      32766:  from all lookup main
      32767:  from all lookup default
      [core@wk3 ~]$ ip vrf show
      Name              Table
      -----------------------
      cudn2             2194

    Packets destined for VRF cudn2's addresses (172.22.0.0/16) match the rule "2000: from all to 172.22.0.0/16 lookup 2194" and are forwarded according to table 2194, which ip vrf show identifies as the cudn2 VRF's routing table.
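The rules above are evaluated in ascending priority order, and the first matching rule names the table to consult. A toy Python model of the destination-based part of that lookup, assuming only the rules listed in the code: fwmark rules, the local table and the l3mdev rule are omitted, and the kernel's fall-through to the next rule when a table has no matching route is not modeled. 172.18.30.5 is a hypothetical host on net30.

```python
import ipaddress

# Destination-based rules from "ip rule show" on wk3: (priority, to-prefix, table).
# fwmark rules, the local table and the l3mdev rule are left out, and the kernel's
# fall-through to the next rule when a table has no matching route is not modeled.
RULES = [
    (2000, "169.254.0.24/32", "2194"),
    (2000, "172.22.0.0/16", "2194"),
    (32766, "0.0.0.0/0", "main"),
    (32767, "0.0.0.0/0", "default"),
]


def table_for(dst: str) -> str:
    """Return the table of the first rule (lowest priority value first) matching dst."""
    addr = ipaddress.ip_address(dst)
    for _priority, prefix, table in sorted(RULES, key=lambda rule: rule[0]):
        if addr in ipaddress.ip_network(prefix):
            return table
    raise LookupError(f"no rule matches {dst}")


if __name__ == "__main__":
    print(table_for("172.22.0.4"))   # 2194 (the cudn2 VRF table per "ip vrf show")
    print(table_for("172.18.30.5"))  # main
```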
  33. Review: what it looks like when a UDN is the primary network

    ▸ The virt-launcher Pod has an IP address on the default Pod network
      ・ The Pod IP shown by oc get pod -o wide is the default Pod network address
      ・ The virt-launcher Pod's default gateway is the default Pod network's gateway
      ➔ These addresses are placeholders, not the VM's real addresses (they are used by oc exec, virtctl console, etc.)
    ▸ Inside the VM, the first NIC (the interface holding the default gateway) has the UDN address

      $ oc get vm
      NAME       AGE   STATUS    READY
      fedora-1   19m   Running   True
      $ oc get pod -o wide
      NAME                           READY   STATUS    RESTARTS   AGE   IP             NODE   NOMINATED NODE   READINESS GATES
      virt-launcher-fedora-1-x8h6t   2/2     Running   0          19m   10.128.6.186   wk4    <none>           1/1
      $ oc exec virt-launcher-fedora-1-x8h6t -- ip addr show
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: eth0@if2910: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
          link/ether 0a:58:0a:80:06:ba brd ff:ff:ff:ff:ff:ff link-netnsid 0
          inet 10.128.6.186/24 brd 10.128.6.255 scope global eth0
             valid_lft forever preferred_lft forever
          inet6 fe80::858:aff:fe80:6ba/64 scope link
             valid_lft forever preferred_lft forever
      3: ovn-udn1-nic@if2911: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master k6t-ovn-udn1 state UP group

    The default Pod network IP address is what shows up here (a placeholder).
  34. Review: what it looks like when a UDN is the primary network

    (Same points as the previous slide.)

      $ oc get pod virt-launcher-fedora-1-x8h6t -o yaml | yq '.metadata.annotations."k8s.v1.cni.cncf.io/network-status"'
      [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.128.6.186" ], "mac": "0a:58:0a:80:06:ba", "dns": {} },
       { "name": "ovn-kubernetes", "interface": "ovn-udn1", "ips": [ "172.22.0.3" ], "mac": "0a:58:ac:16:00:03", "default": true, "dns": {} }]

    IP address of the VM's primary network:

      $ oc get vmi fedora-1 -o yaml | yq .status.interfaces
      - infoSource: domain, guest-agent
        interfaceName: enp1s0
        ipAddress: 172.22.0.3
        ipAddresses:
        - 172.22.0.3
        - fe80::858:acff:fe16:3
        linkState: up
        mac: 0a:58:ac:16:00:03
        name: default
        podInterfaceName: ovn-udn1
        queueCount: 4
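  35. Caveat

    ▸ When the primary network is set to a CUDN, the VM's binding mode becomes l2bridge
      ・ Normally it is masquerade

    [Figure: comparison of the default Pod network case and the case where the primary network is set to a CUDN]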
  36. Network configuration of a VM attached to a CUDN (bridge mode)

    [Figure: the Node (br-ex, br-int) connects by veth to the virt-launcher Pod; inside the Pod, ovn-udn1-nic (veth) and tap0 are attached to the bridge k6t-ovn-udn1, tap0 leads to the VM's enp1s0, eth0 is the default Pod network veth, and ovn-udn1 is a dummy interface]

    Interfaces of the virt-launcher Pod:

      $ oc exec virt-launcher-fedora-1-x8h6t -- ip addr show
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: eth0@if2910: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
          link/ether 0a:58:0a:80:06:ba brd ff:ff:ff:ff:ff:ff link-netnsid 0
          inet 10.128.6.186/24 brd 10.128.6.255 scope global eth0
             valid_lft forever preferred_lft forever
          inet6 fe80::858:aff:fe80:6ba/64 scope link
             valid_lft forever preferred_lft forever
      3: ovn-udn1-nic@if2911: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master k6t-ovn-udn1 state UP group default
          link/ether 76:75:e2:57:80:9d brd ff:ff:ff:ff:ff:ff link-netnsid 0
          inet6 fe80::7475:e2ff:fe57:809d/64 scope link
             valid_lft forever preferred_lft forever
      4: k6t-ovn-udn1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default qlen 1000
          link/ether 76:75:e2:57:80:9d brd ff:ff:ff:ff:ff:ff
          inet6 fe80::7475:e2ff:fe57:809d/64 scope link
             valid_lft forever preferred_lft forever
      5: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc mq master k6t-ovn-udn1 state UP group default qlen 1000
          link/ether be:73:4f:06:bb:af brd ff:ff:ff:ff:ff:ff
          inet6 fe80::bc73:4fff:fe06:bbaf/64 scope link
             valid_lft forever preferred_lft forever
      6: ovn-udn1: <BROADCAST,NOARP> mtu 1400 qdisc noop state DOWN group default qlen 1000
          link/ether 0a:58:ac:16:00:03 brd ff:ff:ff:ff:ff:ff
          inet 172.22.0.3/16 brd 172.22.255.255 scope global ovn-udn1
             valid_lft forever preferred_lft forever
          inet6 fe80::858:acff:fe16:3/64 scope link
             valid_lft forever preferred_lft forever
      localhost:~$ oc exec virt-launcher-fedora-9-cbf8r -- ip route show
      10.131.0.0/23 dev eth0 proto kernel scope link src 10.131.1.14

    The dummy interface (ovn-udn1) is not used for traffic. It is DOWN in the first place and exists only as a convenience for propagating the CNI-assigned MAC address to the VM.
  37. Network configuration of a VM attached to a CUDN (bridge mode)

    [Figure: same Node / virt-launcher Pod / VM layout as the previous slide]

    VM interfaces (run after logging in to the VM):

      [fedora@fedora-1 ~]$ ip addr show
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host noprefixroute
             valid_lft forever preferred_lft forever
      2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc mq state UP group default qlen 1000
          link/ether 0a:58:ac:16:00:03 brd ff:ff:ff:ff:ff:ff
          altname enx0a58ac160003
          inet 172.22.0.3/16 brd 172.22.255.255 scope global dynamic noprefixroute enp1s0
             valid_lft 3430sec preferred_lft 3430sec
          inet6 fe80::858:acff:fe16:3/64 scope link noprefixroute
             valid_lft forever preferred_lft forever
      [fedora@fedora-1 ~]$ ip route show
      default via 172.22.0.1 dev enp1s0 proto dhcp src 172.22.0.3 metric 100
      172.22.0.0/16 dev enp1s0 proto kernel scope link src 172.22.0.3 metric 100

    With bridge binding, the VM's interface gets the Pod's (UDN) IP address.
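The MAC addresses in these outputs follow a visible pattern: 0a:58 followed by the IPv4 address in hex, e.g. 0a:58:ac:16:00:03 for 172.22.0.3, which is the MAC carried by the dummy ovn-udn1 interface and the one the VM's enp1s0 ends up with. This appears to be the IPv4 MAC convention OVN-Kubernetes uses; the Python sketch below only reproduces the pattern and checks it against pairs from the outputs, it is not taken from OVN-Kubernetes itself.

```python
import ipaddress


def mac_from_ipv4(ip: str) -> str:
    """0a:58 followed by the four IPv4 octets in hex (the pattern seen in the outputs above)."""
    octets = ipaddress.IPv4Address(ip).packed
    return ":".join(f"{b:02x}" for b in (0x0A, 0x58, *octets))


if __name__ == "__main__":
    # (IP, MAC) pairs copied from the ip addr / network-status outputs above
    for ip, mac in [("172.22.0.3", "0a:58:ac:16:00:03"), ("10.128.6.186", "0a:58:0a:80:06:ba")]:
        print(ip, mac_from_ipv4(ip), mac_from_ipv4(ip) == mac)
```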
  38. Reference: a VM attached to the default Pod network (masquerade mode)

    [Figure: the Node (br-ex, br-int) connects by veth to the virt-launcher Pod's eth0; inside the Pod, tap0 is attached to the bridge k6t-eth0 and leads to the VM's interface]

    Interfaces of the virt-launcher Pod (eth0 holds the Pod IP address):

      localhost:~$ oc exec virt-launcher-fedora-0-8twvt -- ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: eth0@if165: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
          link/ether 0a:58:0a:81:02:84 brd ff:ff:ff:ff:ff:ff link-netnsid 0
          inet 10.129.2.132/23 brd 10.129.3.255 scope global eth0
             valid_lft forever preferred_lft forever
          inet6 fe80::858:aff:fe81:284/64 scope link
             valid_lft forever preferred_lft forever
      3: k6t-eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default qlen 1000
          link/ether 02:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
          inet 10.0.2.1/24 brd 10.0.2.255 scope global k6t-eth0
             valid_lft forever preferred_lft forever
          inet6 fe80::ff:fe00:0/64 scope link
             valid_lft forever preferred_lft forever
      4: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc fq_codel master k6t-eth0 state UP group default qlen 1000
          link/ether 4e:0c:58:54:2b:f8 brd ff:ff:ff:ff:ff:ff
          inet6 fe80::4c0c:58ff:fe54:2bf8/64 scope link
             valid_lft forever preferred_lft forever
      localhost:~$ oc exec virt-launcher-fedora-0-8twvt -- ip r
      default via 10.129.2.1 dev eth0
      10.0.2.0/24 dev k6t-eth0 proto kernel scope link src 10.0.2.1
      10.128.0.0/14 via 10.129.2.1 dev eth0
      10.129.2.0/23 dev eth0 proto kernel scope link src 10.129.2.132
      100.64.0.0/16 via 10.129.2.1 dev eth0
      172.30.0.0/16 via 10.129.2.1 dev eth0
  39. Reference: a VM attached to the default Pod network (masquerade mode)

    [Figure: same Node / virt-launcher Pod / VM layout as the previous slide]

    (The ip a and ip r output inside the virt-launcher Pod is the same as on the previous slide.)

    nftables rules inside the virt-launcher Pod (masquerade):

      worker-0:~$ sudo nsenter -t 2062231 -n nft list table nat
      table ip nat {
          chain prerouting {
              type nat hook prerouting priority dstnat; policy accept;
              iifname "eth0" counter packets 2 bytes 120 jump KUBEVIRT_PREINBOUND
          }
          chain input {
              type nat hook input priority srcnat; policy accept;
          }
          chain output {
              type nat hook output priority dstnat; policy accept;
              ip daddr 127.0.0.1 counter packets 0 bytes 0 dnat to 10.0.2.2
          }
          chain postrouting {
              type nat hook postrouting priority srcnat; policy accept;
              ip saddr 10.0.2.2 counter packets 157 bytes 11804 masquerade
              oifname "k6t-eth0" counter packets 4 bytes 579 jump KUBEVIRT_POSTINBOUND
          }
          chain KUBEVIRT_PREINBOUND {
              counter packets 2 bytes 120 dnat to 10.0.2.2
          }
          chain KUBEVIRT_POSTINBOUND {
              ip saddr 127.0.0.1 counter packets 0 bytes 0 snat to 10.0.2.1
          }
      }
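In effect the two rules that matter here rewrite addresses in opposite directions: traffic arriving on eth0 is DNATed to the VM (10.0.2.2), and traffic from the VM is masqueraded to the Pod's address. A toy Python model of just those two rewrites, using this Pod's addresses; it assumes outbound packets leave via eth0 (so masquerade picks eth0's address), ignores the 127.0.0.1 rules and conntrack's reverse translation, and 172.30.0.10 is a hypothetical Service IP.

```python
# Toy model of the address rewrites performed by the nftables rules above,
# using this virt-launcher Pod's addresses. Assumes outbound packets leave via
# eth0 (so masquerade picks eth0's address) and ignores the loopback rules
# (dnat/snat for 127.0.0.1) and conntrack's reverse translation.
POD_IP = "10.129.2.132"  # eth0 of the virt-launcher Pod
VM_IP = "10.0.2.2"       # the VM's address behind k6t-eth0


def prerouting(src: str, dst: str, iifname: str) -> tuple[str, str]:
    """KUBEVIRT_PREINBOUND: anything arriving on eth0 is DNATed to the VM."""
    if iifname == "eth0":
        dst = VM_IP
    return src, dst


def postrouting(src: str, dst: str) -> tuple[str, str]:
    """ip saddr 10.0.2.2 ... masquerade: the VM's source becomes the Pod's address."""
    if src == VM_IP:
        src = POD_IP
    return src, dst


if __name__ == "__main__":
    print(prerouting("10.131.0.57", POD_IP, "eth0"))  # ('10.131.0.57', '10.0.2.2')
    print(postrouting(VM_IP, "172.30.0.10"))          # ('10.129.2.132', '172.30.0.10')
```

This is why, in masquerade mode, the VM's own address never appears outside the Pod, in contrast to the bridge-mode CUDN case.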
  40. Reference: a VM attached to the default Pod network (masquerade mode)

    [Figure: same Node / virt-launcher Pod / VM layout as the previous slide]

    VM interfaces (run after logging in to the VM):

      localhost: ~$ oc virt ssh fedora-0
      Last login: Thu Oct 10 04:34:07 2024 from 10.131.0.57
      [fedora-0: ~]$ ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host noprefixroute
             valid_lft forever preferred_lft forever
      2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc fq_codel state UP group default qlen 1000
          link/ether 02:b3:f0:00:00:45 brd ff:ff:ff:ff:ff:ff
          altname enp1s0
          inet 10.0.2.2/24 brd 10.0.2.255 scope global dynamic noprefixroute eth0
             valid_lft 86306184sec preferred_lft 86306184sec
          inet6 fe80::b3:f0ff:fe00:45/64 scope link noprefixroute
             valid_lft forever preferred_lft forever
      [fedora-0: ~]$ ip r
      default via 10.0.2.1 dev eth0 proto dhcp src 10.0.2.2 metric 100
      10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.2 metric 100

    The VM's interface gets an IP address different from the Pod's (10.0.2.2 instead of 10.129.2.132).
  41. Accessing a VM IP address directly [BGP Adv: CUDN]

    [Figure: lab topology]

    Traceroute to the VM IP address, then direct access to it:

      [test@testvm0 ~]$ traceroute -n 172.22.0.5
      traceroute to 172.22.0.5 (172.22.0.5), 30 hops max, 60 byte packets
       1  172.18.10.2  1.253 ms  1.279 ms  0.962 ms
       2  172.18.12.2  0.837 ms  0.777 ms  0.733 ms
       3  172.18.20.114  1.960 ms  1.928 ms  1.882 ms
       4  172.22.0.5  4.479 ms  3.967 ms *
      $ ssh -i ~/my_id_rsa -l fedora 172.22.0.5 uptime
       02:15:50 up 5 min,  3 users,  load average: 0.00, 0.05, 0.03

    The VM's cudn2 IP address:

      $ oc -n proj2 get pod virt-launcher-fedora-1-tc8td -o yaml | yq '.metadata.annotations."k8s.v1.cni.cncf.io/network-status"'
      [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.128.6.219" ], "mac": "0a:58:0a:80:06:db", "dns": {} },
       { "name": "ovn-kubernetes", "interface": "ovn-udn1", "ips": [ "172.22.0.5" ], "mac": "0a:58:ac:16:00:05", "default": true, "dns": {} }]
      $ oc -n proj2 get vmi fedora-1 -o yaml | yq .status.interfaces
      - infoSource: domain, guest-agent
        interfaceName: enp1s0
        ipAddress: 172.22.0.5
        ipAddresses:
        - 172.22.0.5
        - fe80::858:acff:fe16:5
        linkState: up
        mac: 0a:58:ac:16:00:05
        name: default
        podInterfaceName: ovn-udn1
        queueCount: 4
  42. VM live migration

    ▸ Run a live migration

    Before:

      NAME                           READY   STATUS    RESTARTS   AGE     IP             NODE   NOMINATED NODE   READINESS GATES
      hello-c84644886-x8rgb          1/1     Running   0          3h48m   10.128.7.246   wk3    <none>           <none>
      virt-launcher-fedora-1-tc8td   2/2     Running   0          3h31m   10.128.6.219   wk4    <none>           1/1

    After:

      NAME                           READY   STATUS      RESTARTS   AGE     IP             NODE   NOMINATED NODE   READINESS GATES
      hello-c84644886-x8rgb          1/1     Running     0          3h49m   10.128.7.246   wk3    <none>           <none>
      virt-launcher-fedora-1-p444t   2/2     Running     0          33s     10.128.7.209   wk3    <none>           1/1
      virt-launcher-fedora-1-tc8td   0/2     Completed   0          3h31m   10.128.6.219   wk4    <none>           1/1

    ▸ Pings from outside the cluster are not interrupted

      [test@testvm0 ~]$ ping -D 172.22.0.5
      PING 172.22.0.5 (172.22.0.5) 56(84) bytes of data.
      [1763012452.383248] 64 bytes from 172.22.0.5: icmp_seq=195 ttl=60 time=1.26 ms
      [1763012453.384375] 64 bytes from 172.22.0.5: icmp_seq=196 ttl=60 time=0.979 ms
      [1763012454.385804] 64 bytes from 172.22.0.5: icmp_seq=197 ttl=60 time=1.27 ms
      [1763012455.388274] 64 bytes from 172.22.0.5: icmp_seq=198 ttl=60 time=2.24 ms
      [1763012456.388445] 64 bytes from 172.22.0.5: icmp_seq=199 ttl=60 time=0.971 ms
      [1763012457.395423] 64 bytes from 172.22.0.5: icmp_seq=200 ttl=60 time=6.72 ms
      [1763012458.397395] 64 bytes from 172.22.0.5: icmp_seq=201 ttl=60 time=6.73 ms
      [1763012459.394317] 64 bytes from 172.22.0.5: icmp_seq=202 ttl=60 time=1.70 ms
      [1763012460.395882] 64 bytes from 172.22.0.5: icmp_seq=203 ttl=60 time=1.35 ms
      [1763012461.397378] 64 bytes from 172.22.0.5: icmp_seq=204 ttl=60 time=1.25 ms
      [1763012462.399342] 64 bytes from 172.22.0.5: icmp_seq=205 ttl=60 time=1.72 ms
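"Not interrupted" can be checked from the `ping -D` output itself: every icmp_seq between the first and last reply should be present, and the gap between reply timestamps should stay near the 1 s ping interval. A small Python sketch over the lines captured above (the regular expression assumes the iputils `ping -D` line format shown here).

```python
import re

# "ping -D" output from testvm0 during the live migration (copied from above)
PING_OUTPUT = """\
[1763012452.383248] 64 bytes from 172.22.0.5: icmp_seq=195 ttl=60 time=1.26 ms
[1763012453.384375] 64 bytes from 172.22.0.5: icmp_seq=196 ttl=60 time=0.979 ms
[1763012454.385804] 64 bytes from 172.22.0.5: icmp_seq=197 ttl=60 time=1.27 ms
[1763012455.388274] 64 bytes from 172.22.0.5: icmp_seq=198 ttl=60 time=2.24 ms
[1763012456.388445] 64 bytes from 172.22.0.5: icmp_seq=199 ttl=60 time=0.971 ms
[1763012457.395423] 64 bytes from 172.22.0.5: icmp_seq=200 ttl=60 time=6.72 ms
[1763012458.397395] 64 bytes from 172.22.0.5: icmp_seq=201 ttl=60 time=6.73 ms
[1763012459.394317] 64 bytes from 172.22.0.5: icmp_seq=202 ttl=60 time=1.70 ms
[1763012460.395882] 64 bytes from 172.22.0.5: icmp_seq=203 ttl=60 time=1.35 ms
[1763012461.397378] 64 bytes from 172.22.0.5: icmp_seq=204 ttl=60 time=1.25 ms
[1763012462.399342] 64 bytes from 172.22.0.5: icmp_seq=205 ttl=60 time=1.72 ms
"""

REPLY = re.compile(r"^\[(?P<ts>[\d.]+)\] .*icmp_seq=(?P<seq>\d+) .*time=(?P<rtt>[\d.]+) ms")


def replies(output: str) -> list[tuple[float, int]]:
    """(timestamp, icmp_seq) for every reply line."""
    matches = (REPLY.match(line) for line in output.splitlines())
    return [(float(m["ts"]), int(m["seq"])) for m in matches if m]


def missing_seqs(output: str) -> list[int]:
    """icmp_seq numbers with no reply between the first and last reply seen."""
    seqs = {seq for _, seq in replies(output)}
    if not seqs:
        return []
    return [s for s in range(min(seqs), max(seqs) + 1) if s not in seqs]


def max_gap(output: str) -> float:
    """Longest interval in seconds between consecutive replies."""
    ts = [t for t, _ in replies(output)]
    return max((b - a for a, b in zip(ts, ts[1:])), default=0.0)


if __name__ == "__main__":
    print(missing_seqs(PING_OUTPUT))  # []
    print(max_gap(PING_OUTPUT))
```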
  43. VM live migration

    ▸ When a Layer2-topology CUDN is the primary network, the IP address of the VM's primary interface stays the same before and after live migration

    Before live migration:

      $ oc -n proj2 get pod virt-launcher-fedora-1-tc8td -o yaml | yq .metadata.annotations
      ...
      k8s.v1.cni.cncf.io/network-status: |-
        [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.128.6.219" ], "mac": "0a:58:0a:80:06:db", "dns": {} },
         { "name": "ovn-kubernetes", "interface": "ovn-udn1", "ips": [ "172.22.0.5" ], "mac": "0a:58:ac:16:00:05", "default": true, "dns": {} }]

    After live migration:

      $ oc -n proj2 get pod virt-launcher-fedora-1-p444t -o yaml | yq .metadata.annotations
      ...
      k8s.v1.cni.cncf.io/network-status: |-
        [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.128.7.209" ], "mac": "0a:58:0a:80:07:d1", "dns": {} },
         { "name": "ovn-kubernetes", "interface": "ovn-udn1", "ips": [ "172.22.0.5" ], "mac": "0a:58:ac:16:00:05", "default": true, "dns": {} }]
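Comparing the two annotations interface by interface makes the point of this slide explicit: eth0 (the default Pod network) changes with the new virt-launcher Pod, while ovn-udn1 keeps both its IP and MAC. A short Python sketch; the annotations are abridged copies of the two outputs above, with the name and dns fields dropped.

```python
import json

# Abridged network-status annotations (name/dns fields dropped) from the
# virt-launcher Pods before (tc8td on wk4) and after (p444t on wk3) the migration.
BEFORE = """[{"interface": "eth0", "ips": ["10.128.6.219"], "mac": "0a:58:0a:80:06:db"},
 {"interface": "ovn-udn1", "ips": ["172.22.0.5"], "mac": "0a:58:ac:16:00:05", "default": true}]"""
AFTER = """[{"interface": "eth0", "ips": ["10.128.7.209"], "mac": "0a:58:0a:80:07:d1"},
 {"interface": "ovn-udn1", "ips": ["172.22.0.5"], "mac": "0a:58:ac:16:00:05", "default": true}]"""


def by_interface(network_status: str) -> dict[str, tuple[tuple[str, ...], str]]:
    """Interface name -> (IPs, MAC)."""
    return {e["interface"]: (tuple(e.get("ips", [])), e.get("mac", ""))
            for e in json.loads(network_status)}


def changed(before: str, after: str) -> dict[str, bool]:
    """Interface name -> True if its IPs or MAC differ between the two annotations."""
    b, a = by_interface(before), by_interface(after)
    return {name: b.get(name) != a.get(name) for name in sorted(b.keys() | a.keys())}


if __name__ == "__main__":
    print(changed(BEFORE, AFTER))  # {'eth0': True, 'ovn-udn1': False}
```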
  44. Advertising CUDN addresses with VRF Lite [BGP Adv: VRF Lite]

    [Figure: VRF Lite topology. In addition to the lab topology, each node (172.19.23.1XX / 172.19.24.1XX, AS 65801) has vlan2001 (172.19.23.0/24) in vrf:cudn3 and vlan2002 (172.19.24.0/24) in vrf:cudn4, both peering with r3 (AS 65103, .1)]

    ▸ Addresses are advertised from a separate VRF per CUDN
    ▸ An interface other than the default NIC is assigned to each VRF
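  45. Glossary

    ▸ VRF = Virtual Routing and Forwarding
      ・ A feature that creates and manages multiple independent virtual routing tables on a single router
      ・ Lets one network device operate as multiple virtual routers
      ・ VLAN virtualizes L2; VRF virtualizes L3
    ▸ full VRF vs VRF Lite
      ・ full VRF
        ・ Using VRFs together with MPLS/MP-BGP to build multi-tenant L3VPNs in large carrier networks is called full VRF (sometimes just "VRF", which is confusing because that is also the name of the feature itself)
      ・ VRF Lite
        ・ A subset of full VRF: multiple VRFs on a single router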
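  46. Glossary

    ▸ VRF in Linux
      ・ VRF resembles a network namespace (netns) but is a different feature; it isolates only a subset of what a netns isolates
        ・ Network namespace
          ・ Isolates the whole set of networking functions: protocol stack, netfilter, and so on
        ・ VRF
          ・ Isolates only the routing tables
      ・ Example:
        ・ Running ip link show in the default namespace: interfaces that belong to another netns are not listed, but interfaces assigned to a VRF are all listed
    ▸ Related commands
      ・ ip vrf show
      ・ ip link show type vrf
      ・ ip link show master <VRF_INTERFACE>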
  47. OpenShift configuration (1) [BGP Adv: VRF Lite]

    [Figure: VRF Lite lab topology]

    Namespace:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: proj3
        labels:
          k8s.ovn.org/primary-user-defined-network: ""
      ---
      apiVersion: v1
      kind: Namespace
      metadata:
        name: proj4
        labels:
          k8s.ovn.org/primary-user-defined-network: ""

    Deployment:

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        labels:
          app: hello
          app.kubernetes.io/component: hello
          app.kubernetes.io/instance: hello
        name: hello
      spec:
        replicas: 1
        selector:
          matchLabels:
            deployment: hello
        template:
          metadata:
            labels:
              deployment: hello
          spec:
            containers:
            - image: quay.io/manabu.ori/hello
              imagePullPolicy: IfNotPresent
              name: hello
            nodeSelector:
              node-role.kubernetes.io/worker-virt: ""
  48. OpenShift configuration (1) [BGP Adv: VRF Lite]

    [Figure: VRF Lite lab topology]

    Note on the Namespace manifests (same manifests as the previous slide):
    ・ k8s.ovn.org/primary-user-defined-network: "": makes a UDN the primary network of the Pods in the Namespace
  49. OpenShift configuration (2) [BGP Adv: VRF Lite]

    [Figure: VRF Lite lab topology]

    ClusterUserDefinedNetwork (cudn3):

      apiVersion: k8s.ovn.org/v1
      kind: ClusterUserDefinedNetwork
      metadata:
        labels:
          export: "true"
        name: cudn3
      spec:
        namespaceSelector:
          matchExpressions:
          - key: kubernetes.io/metadata.name
            operator: In
            values:
            - proj3
        network:
          topology: Layer3
          layer3:
            role: Primary
            subnets:
            - cidr: 172.23.0.0/16
              hostSubnet: 24

    ClusterUserDefinedNetwork (cudn4):

      apiVersion: k8s.ovn.org/v1
      kind: ClusterUserDefinedNetwork
      metadata:
        labels:
          export: "true"
        name: cudn4
      spec:
        namespaceSelector:
          matchExpressions:
          - key: kubernetes.io/metadata.name
            operator: In
            values:
            - proj4
        network:
          topology: Layer2
          layer2:
            role: Primary
            ipam:
              lifecycle: Persistent
            subnets:
            - "172.24.0.0/16"
  50. OpenShift configuration (2) [BGP Adv: VRF Lite]

    [Figure: VRF Lite lab topology]

    Notes on the ClusterUserDefinedNetwork manifests (same manifests as the previous slide):
    ・ cudn3 (Layer3 topology) is bound to proj3
    ・ cudn4 (Layer2 topology) is bound to proj4
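With topology: Layer3, each node gets its own /hostSubnet slice of the CUDN's cidr, whereas the Layer2 cudn4 is a single /16 shared by all nodes. A quick Python sketch of the arithmetic for cudn3 (172.23.0.0/16 with hostSubnet: 24): 2^(24-16) = 256 possible /24s, so at most 256 nodes can get one. Which node receives which /24 is decided by OVN-Kubernetes and is not modeled here; in the BGP tables of this section the eight nodes hold 172.23.0.0/24 through 172.23.7.0/24, not in node order.

```python
import ipaddress


def host_subnets(cidr: str, host_prefix: int, count: int) -> list[str]:
    """The first `count` per-node subnets of length /host_prefix carved out of `cidr`."""
    subnets = ipaddress.ip_network(cidr).subnets(new_prefix=host_prefix)
    return [str(net) for net, _ in zip(subnets, range(count))]


if __name__ == "__main__":
    # cudn3: subnets[].cidr 172.23.0.0/16 with hostSubnet: 24
    print(len(host_subnets("172.23.0.0/16", 24, 1000)))  # 256
    print(host_subnets("172.23.0.0/16", 24, 8))
```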
  51. OpenShift configuration (3) [BGP Adv: VRF Lite]

    [Figure: VRF Lite lab topology]

    FRRConfiguration:

      apiVersion: frrk8s.metallb.io/v1beta1
      kind: FRRConfiguration
      metadata:
        name: vrflite
        namespace: openshift-frr-k8s
        labels:
          routeAdvertisements: vrflite
      spec:
        bgp:
          routers:
          - asn: 65801
            vrf: cudn3
            neighbors:
            - address: 172.19.23.1
              asn: 65103
              disableMP: true
              toReceive:
                allowed:
                  mode: all
          - asn: 65801
            vrf: cudn4
            neighbors:
            - address: 172.19.24.1
              asn: 65103
              disableMP: true
              toReceive:
                allowed:
                  mode: all
  52. OpenShift configuration (3) [BGP Adv: VRF Lite]

    [Figure: VRF Lite lab topology]

    Note on the FRRConfiguration manifest (same manifest as the previous slide):
    ・ A separate VRF per CUDN (one router entry each for vrf: cudn3 and vrf: cudn4)
  53. OpenShift configuration (4) [BGP Adv: VRF Lite]

    [Figure: VRF Lite lab topology]

    RouteAdvertisements:

      apiVersion: k8s.ovn.org/v1
      kind: RouteAdvertisements
      metadata:
        name: advertise-vrf-lite
      spec:
        targetVRF: auto
        advertisements:
        - "PodNetwork"
        nodeSelector: {}
        frrConfigurationSelector:
          matchLabels:
            routeAdvertisements: vrflite
        networkSelectors:
        - networkSelectionType: ClusterUserDefinedNetworks
          clusterUserDefinedNetworkSelector:
            networkSelector:
              matchLabels:
                export: "true"
  54. OpenShift configuration (4) [BGP Adv: VRF Lite]

    [Figure: VRF Lite lab topology]

    Note on the RouteAdvertisements manifest (same manifest as the previous slide):
    ・ targetVRF: which VRF the routes are advertised from (see the next slide)
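  55. The targetVRF setting of RouteAdvertisements

    ▸ targetVRF: selects the VRF from which the Pod network routes are advertised
      ・ Currently only default or auto is supported (the default value is default)
      ・ default
        ・ Leaks the Pod network from the CUDN's VRF into the default VRF and advertises it from the default VRF
        ・ As a result, the network isolation provided by CUDNs no longer holds (from the standpoint of BGP route advertisement)
        ・ Use this when advertising a CUDN over the default NIC
      ・ auto
        ・ Advertises routes from a separate VRF per CUDN (the VRF with the same name as the CUDN)
        ・ CUDN names are length-limited (a VRF interface with the same name is created, and Linux limits the length of interface names)
        ・ Advertises routes over a NIC other than the default NIC
        ・ Use this for a VRF Lite setup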
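The length limit comes from the kernel: an interface name must be shorter than IFNAMSIZ (16) bytes, so at most 15 characters. A small Python sketch of a pre-check for CUDN names used with targetVRF: auto, mirroring the kernel's basic name rules (non-empty, under 16 bytes, not "." or "..", no "/", ":" or whitespace). OVN-Kubernetes may apply its own, stricter validation, which is not modeled; "tenant-a-production" is a hypothetical name.

```python
# Linux rejects interface names of IFNAMSIZ (16) bytes or more, i.e. the usable
# maximum is 15, and names that are "." / ".." or contain "/", ":" or whitespace.
# With targetVRF: auto a VRF device named after the CUDN is created, so the CUDN
# name must at least pass this check. OVN-Kubernetes may apply its own, stricter
# validation; that is not modeled here.
IFNAMSIZ = 16


def valid_ifname(name: str) -> bool:
    """Approximation of the kernel's interface-name validity check."""
    if not name or len(name.encode()) >= IFNAMSIZ or name in (".", ".."):
        return False
    return not any(c in "/:" or c.isspace() for c in name)


if __name__ == "__main__":
    for cudn in ["cudn3", "cudn4", "tenant-a-production"]:
        print(cudn, valid_ifname(cudn))
```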
  56. OpenShift configuration (5) [BGP Adv: VRF Lite]

    [Figure: VRF Lite lab topology]

    NNCP:

      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
        name: enp3s0-wk3
      spec:
        nodeSelector:
          kubernetes.io/hostname: wk3
        desiredState:
          interfaces:
          - name: enp3s0
            type: ethernet
            state: up
            ipv4:
              dhcp: false
              enabled: false
          - name: vlan2001
            type: vlan
            state: up
            controller: cudn3
            ipv4:
              address:
              - ip: 172.19.23.113
                prefix-length: 24
              dhcp: false
              enabled: true
            vlan:
              base-iface: enp3s0
              id: 2001
          - name: vlan2002
            type: vlan
            state: up
            controller: cudn4
            ipv4:
              address:
              - ip: 172.19.24.113
                prefix-length: 24
              dhcp: false
              enabled: true
            vlan:
              base-iface: enp3s0
              id: 2002

    ・ One NNCP manifest is created per node
    ・ The IP addresses (172.19.23.113, 172.19.24.113) are node-specific settings
    ・ vlan2001: VLAN interface for VRF cudn3
    ・ vlan2002: VLAN interface for VRF cudn4
  57. NNCP (NodeNetworkConfigurationPolicy) settings

    ▸ With VRF Lite, each VRF is assigned a node interface (a physical NIC, or a sub-interface such as a VLAN) on which the BGP peering is configured
    ▸ NNCP is the natural tool for configuring additional NICs, but it is poor at node-specific settings (IP addresses and so on)
      ・ The only option is the brute-force one: write one NNCP manifest per node, pinned with nodeSelector...
    ▸ Putting an interface into a given VRF is expressed with NMState's Interface.CONTROLLER API[1]
      ・ Setting spec.desiredState.interfaces[].controller to a VRF name makes NMState place that interface under the VRF
      ・ This assumes the VRF is defined as a NetworkManager connection
      ・ On OpenShift, creating a CUDN automatically creates a VRF and an NM connection with the same name

    Example: the enp3s0-wk3 NNCP from the previous slide (controller: cudn3 on vlan2001).

    [1] https://nmstate.io/devel/api.html#interfacecontroller
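Since NNCP itself has no per-node templating, one way to avoid hand-writing every manifest is to generate them. A minimal Python sketch that emits the wk3 manifest from the previous slide and would do the same for other nodes; the NODES inventory is hypothetical (only wk3 with host octet 113 comes from the slide), and the output is JSON wrapped in a v1 List, which is valid YAML for `oc apply -f -`.

```python
import json

# Hypothetical inventory: node name -> host octet on the VRF Lite VLANs.
# Only wk3 -> 113 comes from the manifest above; fill in the other nodes yourself.
NODES = {"wk3": 113}

# (VLAN id, VRF = CUDN name, /24 prefix) as in the VRF Lite example
VLANS = [(2001, "cudn3", "172.19.23"), (2002, "cudn4", "172.19.24")]


def nncp_for(node: str, octet: int, base_iface: str = "enp3s0") -> dict:
    """Build one NodeNetworkConfigurationPolicy pinned to a single node."""
    interfaces = [{
        "name": base_iface, "type": "ethernet", "state": "up",
        "ipv4": {"dhcp": False, "enabled": False},
    }]
    for vlan_id, vrf, prefix in VLANS:
        interfaces.append({
            "name": f"vlan{vlan_id}", "type": "vlan", "state": "up",
            "controller": vrf,  # NMState places the VLAN interface under this VRF
            "ipv4": {
                "address": [{"ip": f"{prefix}.{octet}", "prefix-length": 24}],
                "dhcp": False, "enabled": True,
            },
            "vlan": {"base-iface": base_iface, "id": vlan_id},
        })
    return {
        "apiVersion": "nmstate.io/v1",
        "kind": "NodeNetworkConfigurationPolicy",
        "metadata": {"name": f"{base_iface}-{node}"},
        "spec": {
            "nodeSelector": {"kubernetes.io/hostname": node},
            "desiredState": {"interfaces": interfaces},
        },
    }


if __name__ == "__main__":
    # A v1 List of JSON objects; JSON is valid YAML, so this can be fed to "oc apply -f -".
    manifests = [nncp_for(node, octet) for node, octet in NODES.items()]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Keeping the node-specific values in one small table is the design choice here; the per-node NNCPs themselves stay as dumb, fully expanded objects, which is what NNCP expects.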
  58. BGP table on each router  BGP Adv: VRF Lite

    [Topology diagram (BGP Adv: VRF Lite)]
    r1 (AS 65101):
       Network          Next Hop        Metric LocPrf Weight Path
    *> 172.18.20.0/24   172.18.0.2           0             0 65102 i
    *> 172.18.30.0/24   172.18.0.3           0             0 65103 i
    *> 172.23.0.0/24    172.18.0.3                         0 65103 65801 i
    *> 172.23.1.0/24    172.18.0.3                         0 65103 65801 i
    *> 172.23.2.0/24    172.18.0.3                         0 65103 65801 i
    *> 172.23.3.0/24    172.18.0.3                         0 65103 65801 i
    *> 172.23.4.0/24    172.18.0.3                         0 65103 65801 i
    *> 172.23.5.0/24    172.18.0.3                         0 65103 65801 i
    *> 172.23.6.0/24    172.18.0.3                         0 65103 65801 i
    *> 172.23.7.0/24    172.18.0.3                         0 65103 65801 i
    *> 172.24.0.0/16    172.18.0.3                         0 65103 65801 i
    r3 (AS 65103):
       Network          Next Hop        Metric LocPrf Weight Path
    *> 0.0.0.0/0        172.18.0.1           0             0 65101 i
    *> 172.18.20.0/24   172.18.0.1                         0 65101 65102 i
    *> 172.18.30.0/24   0.0.0.0              0         32768 i
    *> 172.23.0.0/24    172.19.23.102        0             0 65801 i
    *> 172.23.1.0/24    172.19.23.112        0             0 65801 i
    *> 172.23.2.0/24    172.19.23.113        0             0 65801 i
    *> 172.23.3.0/24    172.19.23.111        0             0 65801 i
    *> 172.23.4.0/24    172.19.23.101        0             0 65801 i
    *> 172.23.5.0/24    172.19.23.100        0             0 65801 i
    *> 172.23.6.0/24    172.19.23.110        0             0 65801 i
    *> 172.23.7.0/24    172.19.23.114        0             0 65801 i
    *= 172.24.0.0/16    172.19.24.111        0             0 65801 i
    *=                  172.19.24.102        0             0 65801 i
    *=                  172.19.24.101        0             0 65801 i
    *=                  172.19.24.110        0             0 65801 i
    *=                  172.19.24.100        0             0 65801 i
    *=                  172.19.24.113        0             0 65801 i
    *=                  172.19.24.114        0             0 65801 i
    *>                  172.19.24.112        0             0 65801 i
    r2 (AS 65102):
       Network          Next Hop        Metric LocPrf Weight Path
    *> 0.0.0.0/0        172.18.0.1           0             0 65101 i
    *> 172.18.20.0/24   0.0.0.0              0         32768 i
    *> 172.18.30.0/24   172.18.0.1                         0 65101 65103 i
    *> 172.23.0.0/24    172.18.0.1                         0 65101 65103 65801 i
    *> 172.23.1.0/24    172.18.0.1                         0 65101 65103 65801 i
    *> 172.23.2.0/24    172.18.0.1                         0 65101 65103 65801 i
    *> 172.23.3.0/24    172.18.0.1                         0 65101 65103 65801 i
    *> 172.23.4.0/24    172.18.0.1                         0 65101 65103 65801 i
    *> 172.23.5.0/24    172.18.0.1                         0 65101 65103 65801 i
    *> 172.23.6.0/24    172.18.0.1                         0 65101 65103 65801 i
    *> 172.23.7.0/24    172.18.0.1                         0 65101 65103 65801 i
    *> 172.24.0.0/16    172.18.0.1                         0 65101 65103 65801 i
    ・ Layer2 CUDN (cudn4): every node advertises the entire address range (/16)
    ・ Layer3 CUDN (cudn3): each node advertises the /24 address range assigned to it
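The two advertisement patterns in r3's table can be reproduced with Python's standard `ipaddress` module. The sketch copies a subset of r3's routes from the table above, groups next hops by prefix, and does a longest-prefix-match lookup. The helper names are hypothetical. A real router applies full BGP best-path selection, and whether the multipath candidates (marked `=`) are all installed depends on its multipath settings.

```python
import ipaddress
from collections import defaultdict

# A subset of the (prefix, next hop) pairs r3 learned from AS 65801,
# copied from r3's BGP table above.
R3_ROUTES = [
    ("172.23.0.0/24", "172.19.23.102"),  # Layer3 CUDN: one /24 per node
    ("172.23.2.0/24", "172.19.23.113"),
    ("172.23.7.0/24", "172.19.23.114"),
    ("172.24.0.0/16", "172.19.24.111"),  # Layer2 CUDN: every node sends the /16
    ("172.24.0.0/16", "172.19.24.112"),
    ("172.24.0.0/16", "172.19.24.113"),
]


def build_table(routes):
    """Group next hops by prefix (multipath candidates share one set)."""
    table = defaultdict(set)
    for prefix, next_hop in routes:
        table[ipaddress.ip_network(prefix)].add(ipaddress.ip_address(next_hop))
    return dict(table)


def lookup(table, destination):
    """Longest-prefix match; returns (prefix, sorted next hops) or (None, [])."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        return None, []
    best = max(matches, key=lambda net: net.prefixlen)
    return best, sorted(table[best])


if __name__ == "__main__":
    table = build_table(R3_ROUTES)
    for dst in ("172.23.2.4", "172.24.0.4"):
        prefix, hops = lookup(table, dst)
        print(dst, "->", prefix, [str(h) for h in hops])
```

A Layer3 address resolves to exactly one node (wk3, 172.19.23.113, for 172.23.2.4). A Layer2 address resolves to the shared /16 with several candidate nodes. This is consistent with the later traceroute to the VM, which enters the cluster via 172.19.24.111 rather than via wk3.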
  60. frr running-config  BGP Adv: VRF Lite

    [Topology diagram (BGP Adv: VRF Lite)]
    $ oc get frrnodestates.frrk8s.metallb.io wk3 -oyaml | yq .status.runningConfig
    Building configuration...
    ...
    router bgp 65801 vrf cudn3
     no bgp ebgp-requires-policy
     no bgp default ipv4-unicast
     bgp graceful-restart preserve-fw-state
     no bgp network import-check
     neighbor 172.19.23.1 remote-as 65103
     !
     address-family ipv4 unicast
      network 172.23.2.0/24
      neighbor 172.19.23.1 activate
      neighbor 172.19.23.1 route-map 172.19.23.1-cudn3-in in
      neighbor 172.19.23.1 route-map 172.19.23.1-cudn3-out out
     exit-address-family
    exit
    !
    router bgp 65801 vrf cudn4
     no bgp ebgp-requires-policy
     no bgp default ipv4-unicast
     bgp graceful-restart preserve-fw-state
     no bgp network import-check
     neighbor 172.19.24.1 remote-as 65103
     !
     address-family ipv4 unicast
      network 172.24.0.0/16
      neighbor 172.19.24.1 activate
      neighbor 172.19.24.1 route-map 172.19.24.1-cudn4-in in
      neighbor 172.19.24.1 route-map 172.19.24.1-cudn4-out out
     exit-address-family
    exit
    !
    ip prefix-list 172.19.23.1-cudn3-inpl-ipv4 seq 1 permit any
    ip prefix-list 172.19.24.1-cudn4-inpl-ipv4 seq 1 permit any
    ip prefix-list 172.19.23.1-cudn3-allowed-ipv4 seq 1 permit 172.23.2.0/24
    ip prefix-list 172.19.24.1-cudn4-allowed-ipv4 seq 1 permit ...
    ・ "router bgp 65801 vrf cudn3": configuration of VRF cudn3; it advertises 172.23.2.0/24 (Layer3, so the /24 this node is responsible for)
    ・ "router bgp 65801 vrf cudn4": configuration of VRF cudn4; it advertises 172.24.0.0/16 (Layer2, so the entire /16)
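The two stanzas differ only in a handful of parameters. The hypothetical renderer below rebuilds their shape from those parameters to make the per-VRF structure explicit. It only mirrors the output shown above, including the observed `<neighbor>-<vrf>-in/out` route-map and prefix-list naming. It is not frr-k8s's actual template, and the function names are illustrative.

```python
def render_vrf_router(asn, vrf, neighbor, remote_as, networks):
    """Render a per-VRF BGP stanza shaped like the running-config above."""
    lines = [
        f"router bgp {asn} vrf {vrf}",
        " no bgp ebgp-requires-policy",
        " no bgp default ipv4-unicast",
        " bgp graceful-restart preserve-fw-state",
        " no bgp network import-check",
        f" neighbor {neighbor} remote-as {remote_as}",
        " !",
        " address-family ipv4 unicast",
    ]
    # Prefixes this node originates in the VRF (a /24 for Layer3, the /16 for Layer2).
    lines += [f"  network {n}" for n in networks]
    lines += [
        f"  neighbor {neighbor} activate",
        f"  neighbor {neighbor} route-map {neighbor}-{vrf}-in in",
        f"  neighbor {neighbor} route-map {neighbor}-{vrf}-out out",
        " exit-address-family",
        "exit",
        "!",
    ]
    return "\n".join(lines)


def render_allowed_prefix_list(vrf, neighbor, networks):
    """Outbound allow-list entries, one sequence number per advertised prefix."""
    return "\n".join(
        f"ip prefix-list {neighbor}-{vrf}-allowed-ipv4 seq {i} permit {n}"
        for i, n in enumerate(networks, start=1)
    )


if __name__ == "__main__":
    print(render_vrf_router(65801, "cudn3", "172.19.23.1", 65103, ["172.23.2.0/24"]))
    print(render_vrf_router(65801, "cudn4", "172.19.24.1", 65103, ["172.24.0.0/16"]))
    print(render_allowed_prefix_list("cudn3", "172.19.23.1", ["172.23.2.0/24"]))
```

Reading the stanzas this way makes the VRF Lite design visible. Each CUDN gets its own BGP instance, its own peer on its own VLAN, and its own filters, and all of them share the local AS 65801.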
  61. Entering frr on node wk3 with vtysh  BGP Adv: VRF Lite

    [Topology diagram (BGP Adv: VRF Lite)]
    $ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show ip bgp'
    Default BGP instance not found
    $ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show ip bgp vrf cudn4'
    BGP table version is 4, local router ID is 172.24.0.2, vrf id 1444
    Default local pref 100, local AS 65801
    Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                   i internal, r RIB-failure, S Stale, R Removed
    Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
    Origin codes:  i - IGP, e - EGP, ? - incomplete
    RPKI validation codes: V valid, I invalid, N Not found

       Network          Next Hop        Metric LocPrf Weight Path
    *> 0.0.0.0/0        172.19.24.1                        0 65103 65101 i
    *> 172.18.20.0/24   172.19.24.1                        0 65103 65101 65102 i
    *> 172.18.30.0/24   172.19.24.1          0             0 65103 i
    *> 172.24.0.0/16    0.0.0.0              0         32768 i

    Displayed 4 routes and 4 total paths
    $ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show ip bgp vrf cudn3'
    BGP table version is 4, local router ID is 172.23.2.2, vrf id 1442
    Default local pref 100, local AS 65801
    Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                   i internal, r RIB-failure, S Stale, R Removed
    Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
    Origin codes:  i - IGP, e - EGP, ? - incomplete
    RPKI validation codes: V valid, I invalid, N Not found

       Network          Next Hop        Metric LocPrf Weight Path
    *> 0.0.0.0/0        172.19.23.1                        0 65103 65101 i
    *> 172.18.20.0/24   172.19.23.1                        0 65103 65101 65102 i
    *> 172.18.30.0/24   172.19.23.1          0             0 65103 i
    *> 172.23.2.0/24    0.0.0.0              0         32768 i

    Displayed 4 routes and 4 total paths
    ・ In each VRF, the first three routes are BGP routes received from the peer router r3
    ・ The last route (next hop 0.0.0.0) is the BGP route advertised from that VRF (cudn4: 172.24.0.0/16, cudn3: 172.23.2.0/24)
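When checking many nodes, it helps to separate locally originated routes from routes learned from the peer. The sketch below parses only the first three columns (status, Network, Next Hop) of the text output. It treats a best path with next hop 0.0.0.0 as locally originated, which matches the tables above. The Metric/LocPrf/Weight columns are ignored because they cannot be told apart reliably once alignment is lost. The helper names are hypothetical. FRR can also print this table as JSON by appending `json` to the command, which is usually a sturdier input for automation.

```python
import ipaddress

STATUS_CHARS = set("sdh*>=irSR")  # status codes from the command's legend

# `show ip bgp vrf cudn4` output from wk3, copied from above.
SAMPLE = """\
BGP table version is 4, local router ID is 172.24.0.2, vrf id 1444
Default local pref 100, local AS 65801
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
   Network          Next Hop        Metric LocPrf Weight Path
*> 0.0.0.0/0        172.19.24.1                        0 65103 65101 i
*> 172.18.20.0/24   172.19.24.1                        0 65103 65101 65102 i
*> 172.18.30.0/24   172.19.24.1          0             0 65103 i
*> 172.24.0.0/16    0.0.0.0              0         32768 i

Displayed 4 routes and 4 total paths
"""


def parse_bgp_rows(text):
    """Return (status, network, next_hop) for each path row of `show ip bgp`.

    Continuation rows of multipath entries (no Network column) inherit
    the network from the previous row.
    """
    rows, current = [], None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or not set(tokens[0]) <= STATUS_CHARS:
            continue  # banner, legend, header or summary line
        status, rest = tokens[0], tokens[1:]
        if rest and "/" in rest[0]:
            current = ipaddress.ip_network(rest[0])
            rest = rest[1:]
        if current is None or not rest:
            continue  # e.g. the wrapped legend line starting with "i"
        try:
            next_hop = ipaddress.ip_address(rest[0])
        except ValueError:
            continue
        rows.append((status, current, next_hop))
    return rows


def split_local_and_learned(rows):
    """Best paths with next hop 0.0.0.0 are local; the others are learned."""
    unspecified = ipaddress.ip_address("0.0.0.0")
    best = [(net, nh) for status, net, nh in rows if ">" in status]
    local = [net for net, nh in best if nh == unspecified]
    learned = [(net, nh) for net, nh in best if nh != unspecified]
    return local, learned


if __name__ == "__main__":
    local, learned = split_local_and_learned(parse_bgp_rows(SAMPLE))
    print("advertised from this VRF:", [str(n) for n in local])
    print("learned:", [(str(n), str(nh)) for n, nh in learned])
```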
  62. Accessing the Pod IP address directly (cudn3)  BGP Adv: VRF Lite

    [Topology diagram (BGP Adv: VRF Lite)]
    traceroute to the Pod IP address:
    [test@testvm0 ~]$ traceroute -n 172.23.2.4
    traceroute to 172.23.2.4 (172.23.2.4), 30 hops max, 60 byte packets
     1  172.18.10.2  0.239 ms  0.199 ms  0.175 ms
     2  172.18.13.2  0.417 ms  0.397 ms  0.376 ms
     3  172.19.23.113  0.775 ms  0.754 ms  0.733 ms
     4  172.23.2.4  6.122 ms  6.060 ms  6.038 ms
    Accessing the Pod IP address directly:
    [test@testvm0 ~]$ curl http://172.23.2.4:8080
    Hello, World!
    Timestamp: 2025/11/14 09:42:09
    Hostname: hello-c84644886-h6jdc
    LocalAddress: 10.128.7.37
    Gateway: 172.23.2.1
    Headers:
      Accept: [*/*]
      User-Agent: [curl/7.76.1]
    Host: 172.23.2.4:8080
    RemoteAddress: 172.18.10.90:45860
    The Pod's cudn3 IP address (172.23.2.4):
    $ oc -n proj3 get pod hello-c84644886-h6jdc -o yaml | yq '.metadata.annotations."k8s.v1.cni.cncf.io/network-status"'
    [{
        "name": "ovn-kubernetes",
        "interface": "eth0",
        "ips": [ "10.128.7.37" ],
        "mac": "0a:58:0a:80:07:25",
        "dns": {}
    },{
        "name": "ovn-kubernetes",
        "interface": "ovn-udn1",
        "ips": [ "172.23.2.4" ],
        "mac": "0a:58:ac:17:02:04",
        "default": true,
        "dns": {}
    }]
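When scripting this test, the address to target is the one in the network-status entry marked `"default": true`. Here that is ovn-udn1, the Pod's primary network (the CUDN), not the eth0 entry. The sketch below uses only the standard `json` module. `primary_network` is a hypothetical helper, and it assumes the annotation value has already been extracted, for example with the yq command above.

```python
import json

# The network-status annotation shown above for hello-c84644886-h6jdc in proj3.
NETWORK_STATUS = """[{"name": "ovn-kubernetes", "interface": "eth0",
  "ips": ["10.128.7.37"], "mac": "0a:58:0a:80:07:25", "dns": {}},
 {"name": "ovn-kubernetes", "interface": "ovn-udn1",
  "ips": ["172.23.2.4"], "mac": "0a:58:ac:17:02:04", "default": true, "dns": {}}]"""


def primary_network(network_status: str):
    """Return (interface, ips) of the entry marked "default": true, or None."""
    for entry in json.loads(network_status):
        if entry.get("default"):
            return entry["interface"], entry.get("ips", [])
    return None


if __name__ == "__main__":
    print(primary_network(NETWORK_STATUS))
```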
  63. Accessing the VM IP address directly (cudn4)  BGP Adv: VRF Lite

    [Topology diagram (BGP Adv: VRF Lite)]
    The VM's cudn4 IP address (172.24.0.4):
    $ oc -n proj4 get pod virt-launcher-fedora-0-nhf5b -oyaml | yq '.metadata.annotations."k8s.v1.cni.cncf.io/network-status"'
    [{
        "name": "openshift-ovn-kubernetes/default",
        "interface": "eth0",
        "ips": [ "10.128.7.40" ],
        "mac": "02:a8:37:d5:96:af",
        "dns": {}
    },{
        "name": "openshift-ovn-kubernetes/default",
        "interface": "ovn-udn1",
        "ips": [ "172.24.0.4" ],
        "mac": "0a:58:ac:18:00:04",
        "default": true,
        "dns": {}
    }]
    traceroute to the VM IP address:
    [test@testvm0 ~]$ traceroute -n 172.24.0.4
    traceroute to 172.24.0.4 (172.24.0.4), 30 hops max, 60 byte packets
     1  172.18.10.2  0.302 ms  0.268 ms *
     2  172.18.13.2  0.422 ms  0.427 ms  0.411 ms
     3  172.19.24.111  0.601 ms  0.584 ms  0.568 ms
     4  172.24.0.4  6.128 ms  6.112 ms  6.094 ms
     5  172.24.0.4  5.779 ms  5.850 ms  5.749 ms
    Accessing the VM IP address directly:
    [test@testvm0 ~]$ ssh -l fedora 172.24.0.4 uptime
     09:37:40 up 1 day, 13 min,  2 users,  load average: 0.00, 0.00, 0.00
    $ oc -n proj4 get vmi fedora-0 -oyaml | yq '.status.interfaces'
    - infoSource: domain, guest-agent
      interfaceName: enp1s0
      ipAddress: 172.24.0.4
      ipAddresses:
        - 172.24.0.4
        - fe80::a8:37ff:fed5:96af
      linkState: up
      mac: 02:a8:37:d5:96:af
      name: default
      podInterfaceName: ovn-udn1
      queueCount: 4
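For a VM, the guest agent reports its addresses in the VMI status. The address to use from outside is the non-link-local one on the interface backed by ovn-udn1. The sketch assumes the status was fetched as JSON, for example with `oc -n proj4 get vmi fedora-0 -o json`, and that `.status.interfaces` was taken from it. `routable_guest_ips` is a hypothetical helper built on the standard `ipaddress` module.

```python
import ipaddress
import json

# `.status.interfaces` of VMI fedora-0 as shown above, in JSON form.
VMI_INTERFACES = json.loads("""[{
  "infoSource": "domain, guest-agent", "interfaceName": "enp1s0",
  "ipAddress": "172.24.0.4",
  "ipAddresses": ["172.24.0.4", "fe80::a8:37ff:fed5:96af"],
  "linkState": "up", "mac": "02:a8:37:d5:96:af", "name": "default",
  "podInterfaceName": "ovn-udn1", "queueCount": 4}]""")


def routable_guest_ips(interfaces, pod_interface="ovn-udn1"):
    """Guest IPs reported for the given pod interface, minus link-local ones."""
    ips = []
    for iface in interfaces:
        if iface.get("podInterfaceName") != pod_interface:
            continue
        for ip in iface.get("ipAddresses", []):
            # Drop fe80::/10 (and 169.254/16) addresses; they are not BGP-advertised.
            if not ipaddress.ip_address(ip).is_link_local:
                ips.append(ip)
    return ips


if __name__ == "__main__":
    print(routable_guest_ips(VMI_INTERFACES))
```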
  64. Routing table of VRF cudn3 on the node  BGP Adv: VRF Lite

    ・ Packets for cudn3 arrive on vlan2001 and are forwarded according to the routing table of VRF cudn3
    ・ With VRF Lite, the default VRF's routing table is not used for CUDN traffic
    ・ Rule 1000 ([l3mdev-table]) sends packets associated with a VRF device to that VRF's table (cudn3: table 2441)
    All interface IP addresses (no VRF filter):
    [core@wk3 ~]$ ip -4 -br addr show
    lo               UNKNOWN        127.0.0.1/8
    ovn-k8s-mp0      UNKNOWN        10.128.7.2/24
    br-ex            UNKNOWN        172.18.20.113/24 169.254.0.2/17
    ovn-k8s-mp8      UNKNOWN        172.23.2.2/24
    ovn-k8s-mp9      UNKNOWN        172.24.0.2/16
    vlan2001@enp3s0  UP             172.19.23.113/24
    vlan2002@enp3s0  UP             172.19.24.113/24
    Interface IP addresses in VRF cudn3:
    [core@wk3 ~]$ ip -4 -br addr show vrf cudn3
    ovn-k8s-mp8      UNKNOWN        172.23.2.2/24
    vlan2001@enp3s0  UP             172.19.23.113/24
    Routing table of VRF cudn3:
    [core@wk3 ~]$ ip route show vrf cudn3
    default nhid 2750 via 172.19.23.1 dev vlan2001 proto bgp metric 20
    unreachable default metric 4278198272
    10.200.0.0/16 via 169.254.0.4 dev br-ex mtu 1400
    169.254.0.3 via 172.23.2.1 dev ovn-k8s-mp8
    169.254.0.26 dev ovn-k8s-mp8 mtu 1400
    172.18.20.0/24 nhid 2750 via 172.19.23.1 dev vlan2001 proto bgp metric 20
    172.18.30.0/24 nhid 2750 via 172.19.23.1 dev vlan2001 proto bgp metric 20
    172.19.23.0/24 dev vlan2001 proto kernel scope link src 172.19.23.113 metric 400
    172.23.0.0/16 via 172.23.2.1 dev ovn-k8s-mp8
    172.23.2.0/24 dev ovn-k8s-mp8 proto kernel scope link src 172.23.2.2
    Routing rules:
    [core@wk3 ~]$ ip rule show
    0:      from all lookup local
    30:     from all fwmark 0x1745ec lookup 7
    1000:   from all lookup [l3mdev-table]
    2000:   from all fwmark 0x1008 lookup 2441
    2000:   from all to 169.254.0.26 lookup 2441
    2000:   from all fwmark 0x1009 lookup 2443
    2000:   from all to 169.254.0.28 lookup 2443
    5999:   from all fwmark 0x3f0 lookup main
    32766:  from all lookup main
    32767:  from all lookup default
    VRF list:
    [core@wk3 ~]$ ip vrf show
    Name              Table
    -----------------------
    cudn3             2441
    cudn4             2443
    Routing table of the default VRF:
    [core@wk3 ~]$ ip route show
    default via 172.18.20.1 dev br-ex proto dhcp src 172.18.20.113 metric 48
    10.128.0.0/16 via 10.128.7.1 dev ovn-k8s-mp0
    10.128.7.0/24 dev ovn-k8s-mp0 proto kernel scope link src 10.128.7.2
    10.200.0.0/16 via 169.254.0.4 dev br-ex src 169.254.0.2 mtu 1400
    169.254.0.0/17 dev br-ex proto kernel scope link src 169.254.0.2
    169.254.0.1 dev br-ex src 172.18.20.113
    169.254.0.3 via 10.128.7.1 dev ovn-k8s-mp0
    172.18.20.0/24 dev br-ex proto kernel scope link src 172.18.20.113 metric 48
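The return path from the Pod to testvm0 (172.18.10.90) can be traced by hand against VRF cudn3's table. The kernel does a longest-prefix match, and among routes with the same prefix the lowest metric wins. The sketch models that on a subset of the routes above; the `Route`/`lookup` names are hypothetical, and it ignores policy rules, nexthop objects and TOS.

The `unreachable default metric 4278198272` entry is the guard that Linux VRF setups commonly add. Without the BGP default, a lookup still ends in the VRF table instead of falling through to later ip rules such as `lookup main`.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    action: str      # "via <gw> dev <ifname>", "dev <ifname>" or "unreachable"
    metric: int = 0


def route(prefix: str, action: str, metric: int = 0) -> Route:
    net = "0.0.0.0/0" if prefix == "default" else prefix
    return Route(ipaddress.ip_network(net), action, metric)


# Subset of `ip route show vrf cudn3` (table 2441) on wk3, copied from above.
VRF_CUDN3 = [
    route("default", "via 172.19.23.1 dev vlan2001", 20),
    route("default", "unreachable", 4278198272),
    route("172.18.20.0/24", "via 172.19.23.1 dev vlan2001", 20),
    route("172.18.30.0/24", "via 172.19.23.1 dev vlan2001", 20),
    route("172.19.23.0/24", "dev vlan2001", 400),
    route("172.23.0.0/16", "via 172.23.2.1 dev ovn-k8s-mp8"),
    route("172.23.2.0/24", "dev ovn-k8s-mp8"),
]


def lookup(table, destination):
    """Longest-prefix match; among equal prefixes the lowest metric wins."""
    addr = ipaddress.ip_address(destination)
    candidates = [r for r in table if addr in r.prefix]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (-r.prefix.prefixlen, r.metric))


if __name__ == "__main__":
    for dst in ("172.18.10.90", "172.23.2.4", "172.23.5.9"):
        print(dst, "->", lookup(VRF_CUDN3, dst).action)
    # Drop the BGP-learned default: the unreachable default is selected instead.
    print("172.18.10.90 without BGP default ->", lookup(VRF_CUDN3[1:], "172.18.10.90").action)
```

Replies to testvm0 therefore leave via vlan2001 toward r3 (172.19.23.1), not via br-ex's DHCP default route in the main table.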
  65. Routing table of VRF cudn4 on the node  BGP Adv: VRF Lite

    ・ Packets for cudn4 arrive on vlan2002 and are forwarded according to the routing table of VRF cudn4
    ・ With VRF Lite, the default VRF's routing table is not used for CUDN traffic
    All interface IP addresses (no VRF filter):
    [core@wk3 ~]$ ip -4 -br addr show
    lo               UNKNOWN        127.0.0.1/8
    ovn-k8s-mp0      UNKNOWN        10.128.7.2/24
    br-ex            UNKNOWN        172.18.20.113/24 169.254.0.2/17
    ovn-k8s-mp8      UNKNOWN        172.23.2.2/24
    ovn-k8s-mp9      UNKNOWN        172.24.0.2/16
    vlan2001@enp3s0  UP             172.19.23.113/24
    vlan2002@enp3s0  UP             172.19.24.113/24
    Interface IP addresses in VRF cudn4:
    [core@wk3 ~]$ ip -4 -br addr show vrf cudn4
    ovn-k8s-mp9      UNKNOWN        172.24.0.2/16
    vlan2002@enp3s0  UP             172.19.24.113/24
    Routing table of VRF cudn4:
    [core@wk3 ~]$ ip route show vrf cudn4
    default nhid 2751 via 172.19.24.1 dev vlan2002 proto bgp metric 20
    unreachable default metric 4278198272
    10.200.0.0/16 via 169.254.0.4 dev br-ex mtu 1400
    169.254.0.3 via 172.24.0.1 dev ovn-k8s-mp9
    169.254.0.28 dev ovn-k8s-mp9 mtu 1400
    172.18.20.0/24 nhid 2751 via 172.19.24.1 dev vlan2002 proto bgp metric 20
    172.18.30.0/24 nhid 2751 via 172.19.24.1 dev vlan2002 proto bgp metric 20
    172.19.24.0/24 dev vlan2002 proto kernel scope link src 172.19.24.113 metric 401
    172.24.0.0/16 dev ovn-k8s-mp9 proto kernel scope link src 172.24.0.2
    Routing rules:
    [core@wk3 ~]$ ip rule show
    0:      from all lookup local
    30:     from all fwmark 0x1745ec lookup 7
    1000:   from all lookup [l3mdev-table]
    2000:   from all fwmark 0x1008 lookup 2441
    2000:   from all to 169.254.0.26 lookup 2441
    2000:   from all fwmark 0x1009 lookup 2443
    2000:   from all to 169.254.0.28 lookup 2443
    5999:   from all fwmark 0x3f0 lookup main
    32766:  from all lookup main
    32767:  from all lookup default
    VRF list:
    [core@wk3 ~]$ ip vrf show
    Name              Table
    -----------------------
    cudn3             2441
    cudn4             2443
    Routing table of the default VRF:
    [core@wk3 ~]$ ip route show
    default via 172.18.20.1 dev br-ex proto dhcp src 172.18.20.113 metric 48
    10.128.0.0/16 via 10.128.7.1 dev ovn-k8s-mp0
    10.128.7.0/24 dev ovn-k8s-mp0 proto kernel scope link src 10.128.7.2
    10.200.0.0/16 via 169.254.0.4 dev br-ex src 169.254.0.2 mtu 1400
    169.254.0.0/17 dev br-ex proto kernel scope link src 169.254.0.2
    169.254.0.1 dev br-ex src 172.18.20.113
    169.254.0.3 via 10.128.7.1 dev ovn-k8s-mp0
    172.18.20.0/24 dev br-ex proto kernel scope link src 172.18.20.113 metric 48
  66. OpenShift Networking Summary

    Capability | OVN-Kubernetes (Default Pod NW) | OVN-Kubernetes (Layer2) | OVN-Kubernetes (Layer3) | OVN-Kubernetes (localnet) | Misc CNI Plugins (bridge, macvlan, etc.) | Notes
    Default NIC | ✅ | ✅ (overlay on br-ex) | ✅ (overlay on br-ex) | ✅ (overlay on br-ex) | ❌ | 1st physical NIC (connected to the Machine Network, configured during installation)
    Additional NIC | ❌ | ❌ | ❌ | ✅ (overlay on OVS bridge created by NNCP) | ✅ | 2nd physical NIC (set up on Day 2 with NNCP)
    Pod primary network | ✅ | ✅ | ✅ | ❌ | ❌ | The first network the Pod connects to; it provides the Pod's default gateway
    Pod secondary network | ❌ | ✅ (❌ for CUDN as of 4.20) | ✅ (❌ for CUDN as of 4.20) | ✅ | ✅ | The second and subsequent networks the Pod connects to
    NAD | ❌ | ✅ | ✅ | ✅ | ✅ | Configured directly in the NAD CR (not via UDN)
    UDN | ❌ | ✅ | ✅ | ✅ (CUDN only) | ❌ | Configured with a UDN CR (which generates a NAD)
    VLAN connectivity | ❌ | ❌ | ❌ | ✅ | ✅ | Can it connect directly to an external VLAN?
    BGP advertisement | ✅ | ✅ (CUDN only) | ✅ (CUDN only) | ❌ | ❌ |
    BGP adv with VRF Lite | ❌ (default VRF only) | ✅ (CUDN only) | ✅ (CUDN only) | ❌ | ❌ | Additional NIC only for external connectivity
  67. References

    ▸ Split FRR - Proposal to move FRR to a stand alone component
      ・ https://github.com/metallb/metallb/blob/main/design/splitfrr-proposal.md
    ▸ Blog: FRR-k8s as a BGP backend for MetalLB
      ・ https://www.redhat.com/ja/blog/frr-k8s-bgp-backend-metallb
    ▸ Slides: Bringing routes to Kubernetes nodes via BGP: introducing frr-k8s (FOSDEM 2024)
      ・ https://archive.fosdem.org/2024/schedule/event/fosdem-2024-1818-bringing-routes-to-kubernetes-nodes-via-bgp-introducing-frr-k8s/
    ▸ Slides: MetalLB and FRR: a match made in heaven (FOSDEM 2023)
      ・ https://archive.fosdem.org/2023/schedule/event/network_metallb_and_frr/
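  68. Merging FRRConfigurations

    ▸ Basic policy: multiple FRRConfigurations are merged so that the configuration only expands (more becomes possible, never less)
      ・ More neighbors
      ・ More prefixes allowed
    ▸ Flow
      ・ Check the FRRConfigurations for contradictory settings
        ・ If there is a conflict, do not merge; keep using the previous FRRConfiguration
        ・ Examples that cause an error:
          ・ Different ASNs configured for the same router in the same VRF
          ・ Different ASNs configured for the same neighbor (same address and port)
          ・ BFD profiles with the same name but different settings
      ・ For each node selected by the label selectors, generate the merged FRR config
        ・ Combine all router configurations
        ・ Within each router configuration, merge all prefixes and neighbors
        ・ For each neighbor, merge all filters
          ・ Filters that allow more routes take precedence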
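The merge policy above can be sketched in a few dozen lines. This is a deliberately simplified, hypothetical data model: `Config`, `Router`, `Neighbor`, `MergeConflict` and `reconcile` are illustrative names, not frr-k8s types. In this model:

- routers are keyed by VRF, and neighbors by (address, port);
- an inbound filter is a set of prefixes, where the string "any" means permit all;
- the three conflict checks listed on the slide are implemented, along with the unions that make the config "only expand".

frr-k8s itself merges many more fields.

```python
from dataclasses import dataclass, field


class MergeConflict(Exception):
    """Two FRRConfiguration-like objects contradict each other."""


@dataclass
class Neighbor:
    asn: int
    allowed_in: set = field(default_factory=set)  # accepted prefixes; "any" = all


@dataclass
class Router:
    asn: int
    prefixes: set = field(default_factory=set)     # prefixes to advertise
    neighbors: dict = field(default_factory=dict)  # (address, port) -> Neighbor


@dataclass
class Config:
    routers: dict = field(default_factory=dict)       # VRF name -> Router
    bfd_profiles: dict = field(default_factory=dict)  # profile name -> settings


def merge(configs):
    """Union all configs so the result can only do more; raise on conflict."""
    merged = Config()
    for cfg in configs:
        for name, profile in cfg.bfd_profiles.items():
            if merged.bfd_profiles.setdefault(name, profile) != profile:
                raise MergeConflict(f"BFD profile {name!r} defined differently")
        for vrf, router in cfg.routers.items():
            target = merged.routers.setdefault(vrf, Router(asn=router.asn))
            if target.asn != router.asn:
                raise MergeConflict(f"VRF {vrf!r}: ASN {target.asn} vs {router.asn}")
            target.prefixes |= router.prefixes
            for key, nb in router.neighbors.items():
                t_nb = target.neighbors.setdefault(key, Neighbor(asn=nb.asn))
                if t_nb.asn != nb.asn:
                    raise MergeConflict(f"neighbor {key}: ASN {t_nb.asn} vs {nb.asn}")
                t_nb.allowed_in |= nb.allowed_in
                if "any" in t_nb.allowed_in:
                    # The broader filter wins: permit-all subsumes specific prefixes.
                    t_nb.allowed_in = {"any"}
    return merged


def reconcile(previous, configs):
    """Merge the new set, keeping the previous result if it conflicts."""
    try:
        return merge(configs)
    except MergeConflict:
        return previous


if __name__ == "__main__":
    peer = ("172.19.23.1", 179)
    ovnk = Config(routers={"cudn3": Router(65801, {"172.23.2.0/24"}, {peer: Neighbor(65103)})})
    user = Config(routers={"cudn3": Router(65801, set(), {peer: Neighbor(65103, {"any"})})})
    print(merge([ovnk, user]))
```

Building fresh `Router`/`Neighbor` objects during the merge leaves the input configurations untouched. As a result, a failed merge can simply fall back to the previous result.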