Slide 1

Slide 1 text

BGP Support in OpenShift: MetalLB + FRR-k8s Edition
2025-11-25, Manabu Ori, v0.9

Slide 2

Slide 2 text

Introduction

Slide 3

Slide 3 text

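Introduction
▸ This document is based on information about MetalLB + FRR-k8s as of November 2025.
▸ Everything was verified by installing the MetalLB Operator on OpenShift v4.20.
  ・ Of the features verified here, Egress Service is a feature of the OVN-Kubernetes CNI plugin.
  ・ Everything other than Egress Service is expected to work the same way in Kubernetes environments that do not use OVN-Kubernetes.
▸ The upstream documentation describes FRR-k8s as "Experimental", but installing the MetalLB Operator on OpenShift v4.17 or later selects the FRR-k8s backend by default.
https://metallb.io/concepts/bgp/#frr-k8s-mode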

Slide 4

Slide 4 text

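MetalLB
▸ A mechanism for using Services of type: LoadBalancer in environments that have no cloud load balancer service.
▸ Two operating modes
  ・ L2 mode:
    ・ Attracts traffic for the External IP by sending GARP.
    ・ A single node handles the Service's External IP.
  ・ BGP mode:
    ・ Advertises the External IP via BGP.
    ・ When multiple nodes advertise it, the peer router can load-balance across them with ECMP.
This document covers BGP mode.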
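As a rough illustration of the ECMP behaviour described above, the sketch below hashes a flow's 5-tuple to pick one of several equal-cost next hops. The function name pick_next_hop and the use of SHA-256 are assumptions made for illustration only; real routers use their own hashing schemes. The next-hop addresses are the wk3 and wk4 node addresses that appear later in this deck.

```python
import hashlib

def pick_next_hop(flow, next_hops):
    """Pick one equal-cost next hop for a flow.

    Hypothetical hash for illustration; not any router's real algorithm.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

# Two nodes advertising the same Service /32 (wk3 and wk4 in this deck).
next_hops = ["172.18.20.113", "172.18.20.114"]

# (source IP, source port, destination IP, destination port, protocol)
flow = ("172.18.10.90", 47212, "172.19.20.181", 80, "tcp")

chosen = pick_next_hop(flow, next_hops)
print(chosen)
```

Because the choice depends only on the flow tuple, every packet of one TCP connection lands on the same node, while different connections can be spread across nodes.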

Slide 5

Slide 5 text

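MetalLB BGP mode
▸ Choose one of three BGP backends
  ・ Native
    ・ MetalLB's original implementation; rarely used today.
  ・ FRR
    ・ Uses FRRouting as the BGP speaker.
    ・ frr runs as a sidecar container in the speaker DaemonSet Pods.
  ・ FRR-K8s
    ・ A newer mechanism that separates frr from the speaker Pod so that components other than MetalLB can also use frr.
    ・ Internally, the MetalLB Operator has two ways of deploying it:
      ・ The MetalLB Operator deploys frr-k8s directly.
      ・ On recent OpenShift versions, it asks OpenShift's Cluster Network Operator to deploy frr-k8s. (This is the deployment method covered in this document.)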

Slide 6

Slide 6 text

Components
[Architecture diagram. Recoverable labels:
  ・ Namespace metallb-system: MetalLB Operator, controller, speaker
  ・ Namespace openshift-frr-k8s: frr-k8s DaemonSet with the frr-k8s and frr containers and the frr config
  ・ Namespace openshift-network-operator: Cluster Network Operator
  ・ Namespace openshift-ovn-kubernetes: ovnkube-node
  ・ Custom resources: MetalLB, Network, IPAddressPool, BGPPeer, BGPAdvertisement, FRRConfiguration
  ・ Arrows: "Manage" (three times), "Merge of FRRConfigurations"
  ・ Legend: Custom Resource, Pod, Namespace, Container, DaemonSet]
※ FRRConfigurations generated by OVN-Kubernetes are not covered in this document.

Slide 7

Slide 7 text

Test environment

Slide 8

Slide 8 text

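Test environment
[Topology diagram: r1 (AS 65101) connects to r2 (AS 65102) over net12 (172.18.12.0/24) and to r3 (AS 65103) over net13 (172.18.13.0/24). net10 (172.18.10.0/24) is on the upstream side of r1 and hosts testvm0 (.90). net20 (172.18.20.0/24) is behind r2 and hosts cp0..2 and wk0..4. net30 (172.18.30.0/24) is behind r3. Legend: Router, NAT, Switch, OpenShift node, VM, Container.]
▸ The OpenShift nodes (cp0..2, wk0..4) sit behind router r2 and reach the internet via r2 → r1. For the VRF Lite tests, an additional NIC is connected to r3.
  ・ cp[0-2]: 172.18.20.10[0-2]
  ・ wk[0-4]: 172.18.20.11[0-4]
▸ wk3 and wk4 carry the worker-virt label.
  ・ In this document, these two nodes establish the BGP peerings.
▸ r1, r2 and r3 peer with each other over their loopback addresses.
  ・ r1: 172.18.0.1
  ・ r2: 172.18.0.2
  ・ r3: 172.18.0.3
▸ 172.18.99.0/24 is the out-of-band management network.
▸ AS numbers and the remaining addresses are as shown in the diagram.
▸ r1, r2 and r3 run VyOS 1.5-stream-2025-Q2.
▸ OpenShift is v4.20.2.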

Slide 9

Slide 9 text

VyOS configs
[Topology diagram: as on Slide 8]

r1:
set interfaces ethernet eth0 address '172.18.99.101/24'
set interfaces ethernet eth1 address '172.18.10.2/24'
set interfaces ethernet eth2 address '172.18.12.1/24'
set interfaces ethernet eth3 address '172.18.13.1/24'
set interfaces loopback lo address '172.18.0.1/32'
set nat source rule 100 outbound-interface name 'eth1'
set nat source rule 100 source address '0.0.0.0/0'
set nat source rule 100 translation address 'masquerade'
set protocols bgp address-family ipv4-unicast
set protocols bgp neighbor 172.18.0.2 address-family ipv4-unicast default-originate
set protocols bgp neighbor 172.18.0.2 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp neighbor 172.18.0.2 ebgp-multihop '2'
set protocols bgp neighbor 172.18.0.2 remote-as '65102'
set protocols bgp neighbor 172.18.0.2 update-source 'lo'
set protocols bgp neighbor 172.18.0.3 address-family ipv4-unicast default-originate
set protocols bgp neighbor 172.18.0.3 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp neighbor 172.18.0.3 ebgp-multihop '2'
set protocols bgp neighbor 172.18.0.3 remote-as '65103'
set protocols bgp neighbor 172.18.0.3 update-source 'lo'
set protocols bgp parameters router-id '172.18.0.1'
set protocols bgp system-as '65101'
set protocols ospf area 0.0.0.0 network '172.18.0.1/32'
set protocols ospf area 0.0.0.0 network '172.18.12.0/24'
set protocols ospf area 0.0.0.0 network '172.18.13.0/24'
set protocols ospf interface lo passive
set protocols ospf parameters router-id '172.18.0.1'
set protocols static route 0.0.0.0/0 next-hop 172.18.10.1

r2:
set interfaces ethernet eth0 address '172.18.99.102/24'
set interfaces ethernet eth1 address '172.18.12.2/24'
set interfaces ethernet eth2 address '172.18.20.1/24'
set interfaces loopback lo address '172.18.0.2/32'
set protocols bgp address-family ipv4-unicast network 172.18.20.0/24
set protocols bgp neighbor 172.18.0.1 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp neighbor 172.18.0.1 ebgp-multihop '2'
set protocols bgp neighbor 172.18.0.1 remote-as '65101'
set protocols bgp neighbor 172.18.0.1 update-source 'lo'
set protocols bgp parameters router-id '172.18.0.2'
set protocols bgp system-as '65102'
set protocols ospf area 0.0.0.0 network '172.18.0.2/32'
set protocols ospf area 0.0.0.0 network '172.18.12.0/24'
set protocols ospf interface lo passive
set protocols ospf parameters router-id '172.18.0.2'

r3:
set interfaces ethernet eth0 address '172.18.99.103/24'
set interfaces ethernet eth1 address '172.18.13.2/24'
set interfaces ethernet eth2 address '172.18.30.1/24'
set interfaces ethernet eth2 vif 2001 address '172.19.21.1/24'
set interfaces ethernet eth2 vif 2002 address '172.19.22.1/24'
set interfaces loopback lo address '172.18.0.3/32'
set protocols bgp address-family ipv4-unicast network 172.18.30.0/24
set protocols bgp neighbor 172.18.0.1 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp neighbor 172.18.0.1 ebgp-multihop '2'
set protocols bgp neighbor 172.18.0.1 remote-as '65101'
set protocols bgp neighbor 172.18.0.1 update-source 'lo'
set protocols bgp parameters router-id '172.18.0.3'
set protocols bgp system-as '65103'
set protocols ospf area 0.0.0.0 network '172.18.0.3/32'
set protocols ospf area 0.0.0.0 network '172.18.13.0/24'
set protocols ospf interface lo passive
set protocols ospf parameters router-id '172.18.0.3'

Slide 10

Slide 10 text

BGP neighbor status and BGP tables on each router
[Topology diagram: as on Slide 8]

r1:
$ show bgp summary
...
Neighbor        V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
172.18.0.2      4  65102    11251    11255    1942    0     0  6d13h01m             1       3  N/A
172.18.0.3      4  65103    11361    11248    1942    0     0  6d13h01m             1       3  N/A
$ show ip bgp
...
   Network            Next Hop        Metric LocPrf Weight Path
*> 172.18.20.0/24     172.18.0.2           0             0 65102 i
*> 172.18.30.0/24     172.18.0.3           0             0 65103 i

r2:
$ show bgp summary
...
Neighbor        V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
172.18.0.1      4  65101    11255    11250    1940    0     0  6d13h01m             2       3  N/A
$ show ip bgp
...
   Network            Next Hop        Metric LocPrf Weight Path
*> 0.0.0.0/0          172.18.0.1           0             0 65101 i
*> 172.18.20.0/24     0.0.0.0              0         32768 i
*> 172.18.30.0/24     172.18.0.1                         0 65101 65103 i

r3:
$ show bgp summary
...
Neighbor        V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
172.18.0.1      4  65101    11247    11361   12427    0     0  6d13h01m             2       3  N/A
$ show ip bgp
...
   Network            Next Hop        Metric LocPrf Weight Path
*> 0.0.0.0/0          172.18.0.1           0             0 65101 i
*> 172.18.20.0/24     172.18.0.1                         0 65101 65102 i
*> 172.18.30.0/24     0.0.0.0              0         32768 i

Slide 11

Slide 11 text

OpenShift environment
▸ v4.20.2, UPI on libvirt VMs
  ・ (Note) The BGP features are supported only on bare metal.
▸ Additional Operators
  ・ MetalLB Operator
  ・ Nmstate
▸ The Cluster Network Operator configuration was changed as follows.

$ oc get network.operator cluster -o yaml | yq .spec
additionalRoutingCapabilities:
  providers:
    - FRR
clusterNetwork:
  - cidr: 10.128.0.0/16
    hostPrefix: 24
defaultNetwork:
  ovnKubernetesConfig:
    egressIPConfig: {}
    gatewayConfig:
      ipForwarding: Global
      ipv4: {}
      ipv6: {}
      routingViaHost: true
    genevePort: 6081
    ipsecConfig:
      mode: Disabled
    mtu: 1400
    policyAuditConfig:
      destination: "null"
      maxFileSize: 50
      maxLogFiles: 5
      rateLimit: 20
      syslogFacility: local0
    routeAdvertisements: Enabled
  type: OVNKubernetes
deployKubeProxy: false
disableMultiNetwork: false
disableNetworkDiagnostics: false
logLevel: Normal
managementState: Managed
observedConfig: null
operatorLogLevel: Normal
serviceNetwork:
  - 10.200.0.0/16
unsupportedConfigOverrides: null
useMultiNetworkPolicy: false

Callouts on the slide, in order:
  ・ Required for BGP (set automatically by the MetalLB Operator)
  ・ Required when configuring VRF
  ・ Required for BGP (set automatically by the MetalLB Operator)

Slide 12

Slide 12 text

MetalLB: advertising a Service's External IP via BGP

Slide 13

Slide 13 text

OpenShift configuration (1)
[Topology diagram: as on Slide 8, plus AS 65801 on the OpenShift nodes and the Service hello-lb-l3]
Section: BGP Adv: MetalLB

Namespace:
apiVersion: v1
kind: Namespace
metadata:
  name: proj1

Service:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: hello
  name: hello-lb-l3
  annotations:
    metallb.io/address-pool: pool-l3
    metallb.io/loadBalancerIPs: 172.19.20.181
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 8080
  selector:
    deployment: hello
  type: LoadBalancer

Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: hello
  name: hello
spec:
  replicas: 1
  selector:
    matchLabels:
      deployment: hello
  template:
    metadata:
      labels:
        deployment: hello
    spec:
      containers:
        - image: quay.io/manabu.ori/hello
          name: hello
      nodeSelector:
        node-role.kubernetes.io/worker-virt: ""

Slide 14

Slide 14 text

OpenShift configuration (2)
[Topology diagram: as on Slide 8, plus AS 65801 on the OpenShift nodes and the Service hello-lb-l3]

IPAddressPool:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  namespace: metallb-system
  name: pool-l3
spec:
  addresses:
    - 172.19.20.181-172.19.20.189
  autoAssign: false

BGPPeer:
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: bgppeer-r2
  namespace: metallb-system
spec:
  myASN: 65801
  peerAddress: 172.18.20.1
  peerASN: 65102
  #ebgpMultiHop: true
  nodeSelectors:
    - matchLabels:
        node-role.kubernetes.io/worker-virt: ""

BGPAdvertisement:
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgpadv1
  namespace: metallb-system
spec:
  ipAddressPools:
    - pool-l3
  peers:
    - bgppeer-r2
  nodeSelectors:
    - matchLabels:
        node-role.kubernetes.io/worker-virt: ""
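Because the pool is defined with autoAssign: false and the Service requests a specific address through an annotation, it can be useful to check that the requested address falls inside the pool range before applying the manifests. The following is a minimal sketch using only Python's standard library; the helper names parse_pool_range, pool_contains and pool_size are hypothetical and not part of MetalLB.

```python
import ipaddress

def parse_pool_range(range_text):
    """Split a 'start-end' range such as the one in pool-l3 into two addresses."""
    start_text, end_text = range_text.split("-")
    start = ipaddress.ip_address(start_text.strip())
    end = ipaddress.ip_address(end_text.strip())
    if end < start:
        raise ValueError(f"range end {end} is before range start {start}")
    return start, end

def pool_contains(range_text, ip_text):
    """Return True if ip_text lies inside the inclusive range."""
    start, end = parse_pool_range(range_text)
    return start <= ipaddress.ip_address(ip_text) <= end

def pool_size(range_text):
    """Number of addresses in the inclusive range."""
    start, end = parse_pool_range(range_text)
    return int(end) - int(start) + 1

POOL_L3 = "172.19.20.181-172.19.20.189"   # from the IPAddressPool above
REQUESTED_IP = "172.19.20.181"            # from the Service annotation

print(pool_contains(POOL_L3, REQUESTED_IP))  # True
print(pool_size(POOL_L3))                    # 9
```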

Slide 15

Slide 15 text

FRRConfiguration generated by MetalLB
▸ MetalLB generates one FRRConfiguration for every node.

$ oc -n openshift-frr-k8s get frrconfiguration
NAME          AGE
metallb-cp0   4h30m
metallb-cp1   4h30m
metallb-cp2   4h30m
metallb-wk0   4h30m
metallb-wk1   4h30m
metallb-wk2   4h30m
metallb-wk3   4h21m
metallb-wk4   4h30m

▸ For nodes that do not advertise MetalLB routes, the FRRConfiguration is empty.

$ oc -n openshift-frr-k8s get frrconfiguration metallb-wk0 -o yaml
apiVersion: frrk8s.metallb.io/v1beta1
kind: FRRConfiguration
metadata:
  ...
spec:
  bgp:
    routers: []
  nodeSelector:
    matchLabels:
      kubernetes.io/hostname: wk0
  raw: {}

Slide 16

Slide 16 text

FRRConfiguration generated by MetalLB
▸ FRRConfiguration of a node that advertises MetalLB routes

$ oc -n openshift-frr-k8s get frrconfiguration metallb-wk3 -o yaml
apiVersion: frrk8s.metallb.io/v1beta1
kind: FRRConfiguration
metadata:
  ...
spec:
  bgp:
    routers:
      - asn: 65801
        neighbors:
          - address: 172.18.20.1
            asn: 65102
            disableMP: false
            dualStackAddressFamily: false
            passwordSecret: {}
            port: 179
            toAdvertise:
              allowed:
                mode: filtered
                prefixes:
                  - 172.19.20.181/32
            toReceive:
              allowed:
                mode: filtered
        prefixes:
          - 172.19.20.181/32
  nodeSelector:
    matchLabels:
      kubernetes.io/hostname: wk3
  raw: {}
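The prefixes in the FRRConfiguration above are host routes: the Service's External IP with a /32 mask. As a small sketch of that relationship (not MetalLB's actual code; the function name host_prefix is hypothetical), the standard ipaddress module can derive the host prefix for either address family:

```python
import ipaddress

def host_prefix(ip_text):
    """Return the single-address prefix for an IP: /32 for IPv4, /128 for IPv6."""
    address = ipaddress.ip_address(ip_text)
    return f"{address}/{address.max_prefixlen}"

# External IP of the Service hello-lb-l3 in this deck.
print(host_prefix("172.19.20.181"))  # 172.19.20.181/32
```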

Slide 17

Slide 17 text

Configuration of the peer router r2
[Topology diagram: as on Slide 8, plus AS 65801 on the OpenShift nodes and the Service hello-lb-l3]
▸ Peer definitions from r2 to each node (172.18.20.113 and 172.18.20.114) were added.

r2:
$ show configuration commands | match bgp
set protocols bgp address-family ipv4-unicast network 172.18.20.0/24
set protocols bgp neighbor 172.18.0.1 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp neighbor 172.18.0.1 ebgp-multihop '2'
set protocols bgp neighbor 172.18.0.1 remote-as '65101'
set protocols bgp neighbor 172.18.0.1 update-source 'lo'
set protocols bgp neighbor 172.18.20.113 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp neighbor 172.18.20.113 remote-as '65801'
set protocols bgp neighbor 172.18.20.114 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp neighbor 172.18.20.114 remote-as '65801'
set protocols bgp parameters router-id '172.18.0.2'
set protocols bgp system-as '65102'

Slide 18

Slide 18 text

BGP neighbor status and BGP tables on each router
[Topology diagram: as on Slide 8, plus AS 65801 on the OpenShift nodes and the Service hello-lb-l3]
▸ The /32 route for the Service (172.19.20.181/32) now appears on every router.

r1:
$ show bgp summary
...
Neighbor        V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
172.18.0.2      4  65102    25711    25705    2017    0     0  02w2d13h             2       4  N/A
172.18.0.3      4  65103    25814    25697    2017    0     0  02w2d13h             1       4  N/A
$ show ip bgp
...
   Network            Next Hop        Metric LocPrf Weight Path
*> 172.18.20.0/24     172.18.0.2           0             0 65102 i
*> 172.18.30.0/24     172.18.0.3           0             0 65103 i
*> 172.19.20.181/32   172.18.0.2                         0 65102 65801 i

r2:
$ show bgp summary
...
Neighbor        V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
172.18.0.1      4  65101    25705    25710    2028    0     0  02w2d13h             2       4  N/A
172.18.20.113   4  65801     1758     1797    2028    0     0  00:28:20             1       4  N/A
172.18.20.114   4  65801     1754     1785    2028    0     0  00:28:20             1       4  N/A
$ show ip bgp
...
   Network            Next Hop        Metric LocPrf Weight Path
*> 0.0.0.0/0          172.18.0.1           0             0 65101 i
*> 172.18.20.0/24     0.0.0.0              0         32768 i
*> 172.18.30.0/24     172.18.0.1                         0 65101 65103 i
*= 172.19.20.181/32   172.18.20.113        0             0 65801 i
*>                    172.18.20.114        0             0 65801 i

r3:
$ show bgp summary
...
Neighbor        V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
172.18.0.1      4  65101    25696    25814   12505    0     0  02w2d13h             3       4  N/A
$ show ip bgp
...
   Network            Next Hop        Metric LocPrf Weight Path
*> 0.0.0.0/0          172.18.0.1           0             0 65101 i
*> 172.18.20.0/24     172.18.0.1                         0 65101 65102 i
*> 172.18.30.0/24     0.0.0.0              0         32768 i
*> 172.19.20.181/32   172.18.0.1                         0 65101 65102 65801 i
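The Path column in the tables above grows by one AS at each eBGP hop. The sketch below reproduces that behaviour for the Service route in a simplified form: each speaker prepends its own AS number when advertising to an external peer, and a receiver rejects paths that already contain its own AS. The function names advertise and accepts are hypothetical and chosen only for this illustration.

```python
def advertise(as_path, local_asn):
    """eBGP advertisement: the sender prepends its own AS number to the path."""
    return [local_asn] + as_path

def accepts(as_path, local_asn):
    """Basic AS-path loop prevention: reject a path that already contains our AS."""
    return local_asn not in as_path

# 172.19.20.181/32 as it propagates through the test network.
seen_by_r2 = advertise([], 65801)          # wk3/wk4 (AS 65801) -> r2
seen_by_r1 = advertise(seen_by_r2, 65102)  # r2 (AS 65102) -> r1
seen_by_r3 = advertise(seen_by_r1, 65101)  # r1 (AS 65101) -> r3

print(seen_by_r2)  # [65801]
print(seen_by_r1)  # [65102, 65801]
print(seen_by_r3)  # [65101, 65102, 65801]
```

The three printed paths correspond to the Path values shown for 172.19.20.181/32 on r2, r1 and r3.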

Slide 19

Slide 19 text

frr running-config
[Topology diagram: as on Slide 8, plus AS 65801 on the OpenShift nodes and the Service hello-lb-l3]
▸ The running-config of frr on each node can be read from the FRRNodeState custom resource.
▸ The neighbor statements are the peering with the peer router r2.
▸ The network statement advertises the Service's External IP.
▸ The inbound prefix-list (deny any) means that routes advertised from outside are not accepted.

$ oc get frrnodestates.frrk8s.metallb.io wk3 -oyaml | yq .status.runningConfig
Building configuration...
...
!
router bgp 65801
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 bgp graceful-restart preserve-fw-state
 no bgp network import-check
 neighbor 172.18.20.1 remote-as 65102
 !
 address-family ipv4 unicast
  network 172.19.20.181/32
  neighbor 172.18.20.1 activate
  neighbor 172.18.20.1 route-map 172.18.20.1-in in
  neighbor 172.18.20.1 route-map 172.18.20.1-out out
 exit-address-family
exit
!
ip prefix-list 172.18.20.1-inpl-ipv4 seq 1 deny any
ip prefix-list 172.18.20.1-allowed-ipv4 seq 1 permit 172.19.20.181/32
!
...
!
route-map 172.18.20.1-out permit 1
 match ip address prefix-list 172.18.20.1-allowed-ipv4
exit
!
...
!
route-map 172.18.20.1-in permit 3
 match ip address prefix-list 172.18.20.1-inpl-ipv4
exit
!
route-map 172.18.20.1-in permit 4
 match ipv6 address prefix-list 172.18.20.1-inpl-ipv4
exit
!
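To make the effect of the two prefix-lists in this running-config concrete, here is a simplified sketch of prefix-list evaluation: entries are checked in order, the first match decides, and an unmatched prefix is denied. It only models exact-prefix entries and "any", which is all that appears in the config above; the function name permitted and the tuple representation are assumptions for illustration, not FRR's implementation.

```python
import ipaddress

def permitted(prefix, prefix_list):
    """Evaluate a prefix against an ordered list of (action, match) entries.

    match is either the string "any" or an exact prefix such as "172.19.20.181/32".
    The first matching entry decides; no match means an implicit deny.
    """
    candidate = ipaddress.ip_network(prefix)
    for action, match in prefix_list:
        if match == "any" or candidate == ipaddress.ip_network(match):
            return action == "permit"
    return False

# Mirrors "172.18.20.1-allowed-ipv4" (outbound) and "172.18.20.1-inpl-ipv4" (inbound).
OUTBOUND = [("permit", "172.19.20.181/32")]
INBOUND = [("deny", "any")]

print(permitted("172.19.20.181/32", OUTBOUND))  # True: the Service /32 is advertised
print(permitted("172.18.30.0/24", OUTBOUND))    # False: nothing else is advertised
print(permitted("0.0.0.0/0", INBOUND))          # False: nothing is accepted from r2
```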

Slide 20

Slide 20 text

Running vtysh against frr on node wk3
[Topology diagram: as on Slide 8, plus AS 65801 on the OpenShift nodes and the Service hello-lb-l3]

$ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show ip bgp'
BGP table version is 1, local router ID is 172.18.20.113, vrf id 0
Default local pref 100, local AS 65801
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
              i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network            Next Hop        Metric LocPrf Weight Path
*> 172.19.20.181/32   0.0.0.0              0         32768 i

Displayed 1 routes and 1 total paths

Peering status with the peer router r2:

$ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show bgp summary'
IPv4 Unicast Summary (VRF default):
BGP router identifier 172.18.20.113, local AS number 65801 vrf-id 0
BGP table version 1
RIB entries 1, using 192 bytes of memory
Peers 1, using 725 KiB of memory

Neighbor        V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
172.18.20.1     4  65102       52       45       0    0     0  00:40:07             0       1  N/A

Total number of neighbors 1

Slide 21

Slide 21 text

Accessing the Service IP address directly
[Topology diagram: as on Slide 8, plus AS 65801 on the OpenShift nodes and the Service hello-lb-l3]

Pod IP address:
$ oc get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE   NOMINATED NODE   READINESS GATES
hello-c84644886-zkwst   1/1     Running   2          9d    10.128.7.14   wk3

Service IP address:
$ oc get svc
NAME          TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)        AGE
hello-lb-l3   LoadBalancer   10.200.182.169   172.19.20.181   80:30224/TCP   141m

tcptraceroute to the Service IP address:
[test@testvm0 ~]$ sudo tcptraceroute -n 172.19.20.181 80
Running: traceroute -T -O info -n -p 80 172.19.20.181
traceroute to 172.19.20.181 (172.19.20.181), 30 hops max, 60 byte packets
 1  172.18.10.2  0.289 ms  0.259 ms  0.246 ms
 2  172.18.12.2  0.691 ms  0.675 ms  0.654 ms
 3  172.19.20.181  1.869 ms  1.851 ms  1.837 ms
 4  * * *
 5  * * *
 6  172.19.20.181  12.712 ms  10.887 ms  12.527 ms
 7  172.19.20.181  12.696 ms  12.423 ms  12.352 ms

Direct access to the Service IP address:
[test@testvm0 ~]$ curl http://172.19.20.181
Hello, World!
Timestamp: 2025/11/21 02:33:12
Hostname: hello-c84644886-zkwst
LocalAddress: 10.128.7.14
Gateway: 10.128.7.1
Headers:
  Accept: [*/*]
  User-Agent: [curl/7.76.1]
Host: 172.19.20.181
RemoteAddress: 100.64.0.8:47212

Slide 22

Slide 22 text

MetalLB + VRF: advertising Service addresses from a separate NIC in a separate VRF
※ Splitting VRFs is a Tech Preview feature in OpenShift v4.20.

Slide 23

Slide 23 text

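MetalLB + VRF
▸ The Service's External IP is advertised via BGP from an interface other than the node's default NIC.
▸ A VRF interface must be created on every node that acts as a BGP speaker.
  ・ On OpenShift this is configured with NodeNetworkConfigurationPolicy, a custom resource of the NMState Operator.
  ・ In environments without the NMState Operator, the VRF can also be created manually.
▸ Using VRFs with MetalLB is Tech Preview in OpenShift v4.20.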

Slide 24

Slide 24 text

OpenShift configuration (1)
[Topology diagram: as on Slide 8, plus vrf1 / vlan1001 (172.19.11.0/24) between the nodes and r3, AS 65801 on the OpenShift nodes, and the Service lbsvc-vrf1]
Section: MetalLB+VRF

Namespace:
apiVersion: v1
kind: Namespace
metadata:
  name: proj1

Service:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: hello
  name: lbsvc-vrf1
  annotations:
    metallb.io/address-pool: pool-vrf1
    metallb.io/loadBalancerIPs: 172.19.20.190
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 8080
  selector:
    deployment: hello
  type: LoadBalancer

Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: hello
  name: hello
spec:
  replicas: 1
  selector:
    matchLabels:
      deployment: hello
  template:
    metadata:
      labels:
        deployment: hello
    spec:
      containers:
        - image: quay.io/manabu.ori/hello
          name: hello
      nodeSelector:
        node-role.kubernetes.io/worker-virt: ""

Slide 25

Slide 25 text

OpenShift configuration (2)
[Topology diagram: as on Slide 8, plus vrf1 / vlan1001 (172.19.11.0/24) between the nodes and r3, AS 65801 on the OpenShift nodes, and the Service lbsvc-vrf1]

IPAddressPool:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  namespace: metallb-system
  name: pool-vrf1
spec:
  addresses:
    - 172.19.20.190-172.19.20.194
  autoAssign: false

BGPPeer:
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: bgppeer-vrf1
  namespace: metallb-system
spec:
  myASN: 65801
  peerAddress: 172.19.11.1
  peerASN: 65103
  vrf: vrf1
  #ebgpMultiHop: true
  nodeSelectors:
    - matchLabels:
        node-role.kubernetes.io/worker-virt: ""

BGPAdvertisement:
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgpadv-vrf1
  namespace: metallb-system
spec:
  ipAddressPools:
    - pool-vrf1
  peers:
    - bgppeer-vrf1
  nodeSelectors:
    - matchLabels:
        node-role.kubernetes.io/worker-virt: ""

Slide 26

Slide 26 text

OpenShift configuration (3)
[Topology diagram: as on Slide 8, plus vrf1 / vlan1001 (172.19.11.0/24) between the nodes and r3, AS 65801 on the OpenShift nodes, and the Service lbsvc-vrf1]

NodeNetworkConfigurationPolicy:
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: enp3s0-wk3
spec:
  nodeSelector:
    kubernetes.io/hostname: wk3
  desiredState:
    interfaces:
      - name: vrf1
        type: vrf
        state: up
        vrf:
          port:
            - vlan1001
          route-table-id: 11
        ipv4:
          dhcp: false
          enabled: false
      - name: vlan1001
        type: vlan
        state: up
        ipv4:
          address:
            - ip: 172.19.11.113
              prefix-length: 24
          dhcp: false
          enabled: true
        vlan:
          base-iface: enp3s0
          id: 1001
      - name: enp3s0
        type: ethernet
        state: up
        ipv4:
          dhcp: false
          enabled: false
    routes:
      config:
        - destination: 0.0.0.0/0
          metric: 150
          next-hop-address: 172.19.11.1
          next-hop-interface: vlan1001
          table-id: 11
    route-rules:
      config:
        - ip-to: 10.200.0.0/16
          priority: 998
          route-table: 254
        - ip-to: 10.128.0.0/16
          priority: 998
          route-table: 254
        - ip-to: 169.254.0.0/17
          priority: 998
          route-table: 254

Slide 27

Slide 27 text

NNCP (NodeNetworkConfigurationPolicy) configuration
[The slide repeats the NodeNetworkConfigurationPolicy manifest enp3s0-wk3 from Slide 26 with callouts.]
▸ NNCP is the natural tool for configuring the additional NIC, but it is not well suited to per-node settings such as IP addresses.
  ・ The only option is the unglamorous one: write one NNCP manifest per node and pin it with nodeSelector.
▸ The VRF definition lists the interfaces that belong to the VRF and its routing table ID (route-table-id).
Callouts on the manifest:
  ・ Node-specific settings
  ・ VRF settings
  ・ Traffic to Pods, Services and the like uses the default VRF (the route-rules entries that send 10.200.0.0/16, 10.128.0.0/16 and 169.254.0.0/17 to route table 254)
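One way to soften the per-node manifest problem described above is to generate the manifests from a small table of node names and addresses. The sketch below does that with Python's standard library and prints JSON, which oc apply also accepts. The function name build_nncp is hypothetical, the policy name for wk4 (enp3s0-wk4) is an assumed extension of the naming used for wk3, and wk4's address is taken from the neighbor list configured on r3 later in this deck.

```python
import json

# Node name -> address on vlan1001.
NODE_VLAN_IPS = {"wk3": "172.19.11.113", "wk4": "172.19.11.114"}

def build_nncp(node_name, vlan_ip):
    """Return one NodeNetworkConfigurationPolicy for a node as a plain dict."""
    interfaces = [
        {"name": "vrf1", "type": "vrf", "state": "up",
         "vrf": {"port": ["vlan1001"], "route-table-id": 11},
         "ipv4": {"dhcp": False, "enabled": False}},
        {"name": "vlan1001", "type": "vlan", "state": "up",
         # The only node-specific value inside desiredState.
         "ipv4": {"address": [{"ip": vlan_ip, "prefix-length": 24}],
                  "dhcp": False, "enabled": True},
         "vlan": {"base-iface": "enp3s0", "id": 1001}},
        {"name": "enp3s0", "type": "ethernet", "state": "up",
         "ipv4": {"dhcp": False, "enabled": False}},
    ]
    routes = {"config": [{"destination": "0.0.0.0/0", "metric": 150,
                          "next-hop-address": "172.19.11.1",
                          "next-hop-interface": "vlan1001", "table-id": 11}]}
    # Pod, Service and link-local ranges are looked up in table 254 (main).
    route_rules = {"config": [
        {"ip-to": cidr, "priority": 998, "route-table": 254}
        for cidr in ("10.200.0.0/16", "10.128.0.0/16", "169.254.0.0/17")
    ]}
    return {
        "apiVersion": "nmstate.io/v1",
        "kind": "NodeNetworkConfigurationPolicy",
        "metadata": {"name": f"enp3s0-{node_name}"},  # assumed naming scheme
        "spec": {
            "nodeSelector": {"kubernetes.io/hostname": node_name},
            "desiredState": {"interfaces": interfaces,
                             "routes": routes,
                             "route-rules": route_rules},
        },
    }

policies = [build_nncp(name, ip) for name, ip in NODE_VLAN_IPS.items()]
for policy in policies:
    print(json.dumps(policy, indent=2))
```

Each printed document can be saved to its own file and applied separately; only metadata.name, the nodeSelector hostname and the vlan1001 address differ between nodes.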

Slide 28

Slide 28 text

FRRConfiguration generated by MetalLB
▸ The second router entry (vrf: vrf1) is the configuration for vrf1.

$ oc -n openshift-frr-k8s get frrconfiguration metallb-wk3 -o yaml
apiVersion: frrk8s.metallb.io/v1beta1
kind: FRRConfiguration
metadata:
  creationTimestamp: "2025-11-21T00:20:43Z"
  generation: 25
  name: metallb-wk3
  namespace: openshift-frr-k8s
  resourceVersion: "27291209"
  uid: 5950248d-a6d8-4ee9-bed0-247e6daf24ef
spec:
  bgp:
    routers:
      - asn: 65801
        neighbors:
          - address: 172.18.20.1
            asn: 65102
            disableMP: false
            dualStackAddressFamily: false
            passwordSecret: {}
            port: 179
            toAdvertise:
              allowed:
                mode: filtered
                prefixes:
                  - 172.19.20.181/32
            toReceive:
              allowed:
                mode: filtered
        prefixes:
          - 172.19.20.181/32
      - asn: 65801
        neighbors:
          - address: 172.19.11.1
            asn: 65103
            disableMP: false
            dualStackAddressFamily: false
            passwordSecret: {}
            port: 179
            toAdvertise:
              allowed:
                mode: filtered
                prefixes:
                  - 172.19.20.190/32
            toReceive:
              allowed:
                mode: filtered
        prefixes:
          - 172.19.20.190/32
        vrf: vrf1
  nodeSelector:
    matchLabels:
      kubernetes.io/hostname: wk3
  raw: {}

Slide 29

Slide 29 text

Configuration of the peer router r3
[Topology diagram: as on Slide 8, plus vrf1 / vlan1001 (172.19.11.0/24) between the nodes and r3, AS 65801 on the OpenShift nodes, and the Service lbsvc-vrf1]
▸ Peer definitions from r3 to each node (172.19.11.113 and 172.19.11.114) were added.

r3:
$ sh conf comm | match bgp
set protocols bgp address-family ipv4-unicast network 172.18.30.0/24
set protocols bgp neighbor 172.18.0.1 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp neighbor 172.18.0.1 ebgp-multihop '2'
set protocols bgp neighbor 172.18.0.1 remote-as '65101'
set protocols bgp neighbor 172.18.0.1 update-source 'lo'
set protocols bgp neighbor 172.19.11.113 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp neighbor 172.19.11.113 remote-as '65801'
set protocols bgp neighbor 172.19.11.114 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp neighbor 172.19.11.114 remote-as '65801'
set protocols bgp parameters router-id '172.18.0.3'
set protocols bgp system-as '65103'
$ sh conf comm | match vif
set interfaces ethernet eth2 vif 1001 address '172.19.11.1/24'

Slide 30

Slide 30 text

BGP neighbor status and BGP tables on each router
[Topology diagram: as on Slide 8, plus vrf1 / vlan1001 (172.19.11.0/24) between the nodes and r3, AS 65801 on the OpenShift nodes, and the Service lbsvc-vrf1]
▸ The /32 route for the Service (172.19.20.190/32) now appears on every router.

r1:
$ show bgp summary
...
Neighbor        V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
172.18.0.2      4  65102    31837    31826    2057    0     0  02w6d18h             1       4  N/A
172.18.0.3      4  65103    31935    31818    2057    0     0  02w6d18h             2       4  N/A
$ show ip bgp
...
   Network            Next Hop        Metric LocPrf Weight Path
*> 172.18.20.0/24     172.18.0.2           0             0 65102 i
*> 172.18.30.0/24     172.18.0.3           0             0 65103 i
*> 172.19.20.190/32   172.18.0.3                         0 65103 65801 i

r2:
$ show bgp summary
...
Neighbor        V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
172.18.0.1      4  65101    31826    31836    2077    0     0  02w6d18h             3       4  N/A
$ show ip bgp
...
   Network            Next Hop        Metric LocPrf Weight Path
*> 0.0.0.0/0          172.18.0.1           0             0 65101 i
*> 172.18.20.0/24     0.0.0.0              0         32768 i
*> 172.18.30.0/24     172.18.0.1                         0 65101 65103 i
*> 172.19.20.190/32   172.18.0.1                         0 65101 65103 65801 i

r3:
$ show bgp summary
...
Neighbor        V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
172.18.0.1      4  65101    31817    31935   12545    0     0  02w6d18h             2       4  N/A
172.19.11.113   4  65801      140      147   12545    0     0  02:15:41             1       4  N/A
172.19.11.114   4  65801      140      147   12545    0     0  02:15:41             1       4  N/A
$ show ip bgp
...
   Network            Next Hop        Metric LocPrf Weight Path
*> 0.0.0.0/0          172.18.0.1           0             0 65101 i
*> 172.18.20.0/24     172.18.0.1                         0 65101 65102 i
*> 172.18.30.0/24     0.0.0.0              0         32768 i
*> 172.19.20.190/32   172.19.11.113        0             0 65801 i
*=                    172.19.11.114        0             0 65801 i

Slide 31

Slide 31 text

frr running-config
[Topology diagram: as on Slide 8, plus vrf1 / vlan1001 (172.19.11.0/24) between the nodes and r3, AS 65801 on the OpenShift nodes, and the Service lbsvc-vrf1]
▸ The "router bgp 65801 vrf vrf1" block is the configuration for vrf1.

$ oc get frrnodestates.frrk8s.metallb.io wk3 -oyaml | yq .status.runningConfig
Building configuration...
...
!
router bgp 65801
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 bgp graceful-restart preserve-fw-state
 no bgp network import-check
 neighbor 172.18.20.1 remote-as 65102
 !
 address-family ipv4 unicast
  network 172.19.20.181/32
  neighbor 172.18.20.1 activate
  neighbor 172.18.20.1 route-map 172.18.20.1-in in
  neighbor 172.18.20.1 route-map 172.18.20.1-out out
 exit-address-family
exit
!
router bgp 65801 vrf vrf1
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 bgp graceful-restart preserve-fw-state
 no bgp network import-check
 neighbor 172.19.11.1 remote-as 65103
 !
 address-family ipv4 unicast
  network 172.19.20.190/32
  neighbor 172.19.11.1 activate
  neighbor 172.19.11.1 route-map 172.19.11.1-vrf1-in in
  neighbor 172.19.11.1 route-map 172.19.11.1-vrf1-out out
 exit-address-family
exit
!
...

Slide 32

Slide 32 text

Running vtysh against frr on node wk3
[Topology diagram: as on Slide 8, plus vrf1 / vlan1001 (172.19.11.0/24) between the nodes and r3, AS 65801 on the OpenShift nodes, and the Service lbsvc-vrf1]

$ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show ip bgp vrf vrf1'
BGP table version is 3, local router ID is 172.19.11.113, vrf id 465
Default local pref 100, local AS 65801
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
              i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network            Next Hop        Metric LocPrf Weight Path
*> 172.19.20.190/32   0.0.0.0              0         32768 i

Displayed 1 routes and 1 total paths

Peering status with the peer router r3:

$ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show bgp vrf vrf1 summary'
IPv4 Unicast Summary (VRF vrf1):
BGP router identifier 172.19.11.113, local AS number 65801 vrf-id 465
BGP table version 3
RIB entries 1, using 192 bytes of memory
Peers 1, using 725 KiB of memory

Neighbor        V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
172.19.11.1     4  65103       32       24       0    0     0  00:15:10             0       1  N/A

Total number of neighbors 1

Slide 33

Slide 33 text

Accessing the Service IP address directly
[Topology diagram: as on Slide 8, plus vrf1 / vlan1001 (172.19.11.0/24) between the nodes and r3, AS 65801 on the OpenShift nodes, and the Service lbsvc-vrf1]

Pod IP address:
$ oc get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE   NOMINATED NODE   READINESS GATES
hello-c84644886-zkwst   1/1     Running   2          14d   10.128.7.14   wk3

Service IP address:
$ oc get svc
NAME         TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)        AGE
lbsvc-vrf1   LoadBalancer   10.200.101.103   172.19.20.190   80:31095/TCP   110m

tcptraceroute to the Service IP address:
[test@testvm0 ~]$ sudo tcptraceroute 172.19.20.190 80
Running: traceroute -T -O info -p 80 172.19.20.190
traceroute to 172.19.20.190 (172.19.20.190), 30 hops max, 60 byte packets
 1  _gateway (172.18.10.2)  0.332 ms  0.274 ms *
 2  172.18.13.2 (172.18.13.2)  0.575 ms * *
 3  172.19.20.190 (172.19.20.190)  0.906 ms * *
 4  * * *
 5  * * *
 6  172.19.20.190 (172.19.20.190)  6.194 ms * *

Direct access to the Service IP address:
[test@testvm0 ~]$ curl http://172.19.20.190
Hello, World!
Timestamp: 2025/11/25 07:20:46
Hostname: hello-c84644886-zkwst
LocalAddress: 10.128.7.14
Gateway: 10.128.7.1
Headers:
  Accept: [*/*]
  User-Agent: [curl/7.76.1]
Host: 172.19.20.190
RemoteAddress: 100.64.0.9:39432

Slide 34

Slide 34 text

Routing tables on node wk3
Section: MetalLB+VRF

Interface IP addresses in the default VRF:
[core@wk3 ~]$ ip -4 -br addr show
lo               UNKNOWN  127.0.0.1/8
ovn-k8s-mp0      UNKNOWN  10.128.7.2/24
br-ex            UNKNOWN  172.18.20.113/24 169.254.0.2/17
vlan1001@enp3s0  UP       172.19.11.113/24

Interface IP addresses in VRF vrf1:
[core@wk3 ~]$ ip -4 -br addr show vrf vrf1
vlan1001@enp3s0  UP       172.19.11.113/24

Routing table of the default VRF:
[core@wk3 ~]$ ip route show
default via 172.18.20.1 dev br-ex proto dhcp src 172.18.20.113 metric 48
10.128.0.0/16 via 10.128.7.1 dev ovn-k8s-mp0
10.128.7.0/24 dev ovn-k8s-mp0 proto kernel scope link src 10.128.7.2
10.200.0.0/16 via 169.254.0.4 dev br-ex src 169.254.0.2 mtu 1400
169.254.0.0/17 dev br-ex proto kernel scope link src 169.254.0.2
169.254.0.1 dev br-ex src 172.18.20.113
169.254.0.3 via 10.128.7.1 dev ovn-k8s-mp0
172.18.20.0/24 dev br-ex proto kernel scope link src 172.18.20.113 metric 48

Routing table of VRF vrf1:
[core@wk3 ~]$ ip route show vrf vrf1
default via 172.19.11.1 dev vlan1001 proto static metric 150
172.19.11.0/24 dev vlan1001 proto kernel scope link src 172.19.11.113 metric 400

Routing rules:
[core@wk3 ~]$ ip rule show
0:      from all lookup local
30:     from all fwmark 0x1745ec lookup 7
998:    from all to 10.128.0.0/16 lookup main proto static
998:    from all to 10.200.0.0/16 lookup main proto static
998:    from all to 169.254.0.0/17 lookup main proto static
1000:   from all lookup [l3mdev-table]
5999:   from all fwmark 0x3f0 lookup main
32766:  from all lookup main
32767:  from all lookup default

VRF:
[core@wk3 ~]$ ip vrf show
Name              Table
-----------------------
vrf1                11
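The interaction between the priority-998 rules and the l3mdev rule in the ip rule output above can be sketched as a small lookup. This is a deliberately simplified model, not the kernel's algorithm: it ignores the local table, the fwmark rules and the default table, and it treats the first matching rule as final, whereas the kernel moves on to the next rule when the selected table has no matching route. The function name select_table is hypothetical.

```python
import ipaddress

# (priority, destination prefix or None, is_l3mdev_rule, table)
RULES = [
    (998, "10.128.0.0/16", False, "main"),
    (998, "10.200.0.0/16", False, "main"),
    (998, "169.254.0.0/17", False, "main"),
    (1000, None, True, None),      # lookup [l3mdev-table]
    (32766, None, False, "main"),
]

def select_table(destination, vrf_table=None):
    """Return the routing table chosen for a destination.

    vrf_table is the table ID of the VRF the packet is associated with
    (11 for vrf1 on wk3), or None for traffic in the default VRF.
    """
    address = ipaddress.ip_address(destination)
    for _priority, prefix, is_l3mdev, table in sorted(RULES, key=lambda rule: rule[0]):
        if prefix is not None and address not in ipaddress.ip_network(prefix):
            continue
        if is_l3mdev:
            if vrf_table is None:
                continue  # the l3mdev rule only applies to VRF-bound traffic
            return vrf_table
        return table
    return None

print(select_table("10.128.7.14", vrf_table=11))   # main: Pod traffic leaves vrf1
print(select_table("172.18.10.90", vrf_table=11))  # 11: stays in the vrf1 table
print(select_table("172.18.10.90"))                # main: default VRF traffic
```

The first result is the reason the NNCP adds the priority-998 rules: traffic that arrives in vrf1 for a Pod or Service address is resolved in the main table before the l3mdev rule can select table 11.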

Slide 35

Slide 35 text

Route asymmetry
[Two topology diagrams: as on Slide 8, plus vrf1 / vlan1001 (172.19.11.0/24), AS 65801 and the Service lbsvc-vrf1, each annotated with one traffic path]
▸ Traffic from outside to the Service passes through vrf1.
▸ Traffic leaving the Pod for the outside passes through the default VRF.

Slide 36

Slide 36 text

Traffic leaving the Pod goes through the default VRF

[Figure: network topology with vrf1 on vlan1001 (172.19.11.0/24) and Service lbsvc-vrf1; tcpdump runs on the peer routers while curl http://172.19.20.190 is executed]
 ・ The forward path goes through r3
 ・ The return path goes through r2

Slide 37

Slide 37 text

Traffic leaving the Pod goes through the default VRF

[Figure: network topology with vrf1 on vlan1001 (172.19.11.0/24) and Service lbsvc-vrf1]

tracepath from the Pod to the IP address of testvm0:
[root@wk3 ~]# nsenter -t $(crictl inspect $(crictl ps | awk '/hello/ {print $1}') | jq -r .info.pid) -n tracepath -n 172.18.10.90
 1?: [LOCALHOST]      pmtu 1400
 1:  172.18.10.90     1.686ms asymm 2
 1:  172.18.10.90     0.675ms asymm 2
 2:  10.128.7.2       1.180ms
 3:  172.18.20.1      1.303ms
 4:  172.18.12.1      1.042ms
 5:  172.18.10.90     1.277ms reached
     Resume: pmtu 1400 hops 5 back 5

Hop 3 is 172.18.20.1 (r2), so the packets leave through the default VRF.

Slide 38

Slide 38 text

MetalLB + VRF + Egress Service

* Egress Service is a Technology Preview feature in OpenShift v4.20

Slide 39

Slide 39 text

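Egress Service
▸ Egress Service is an OVN-Kubernetes feature that addresses the problem that packets leaving a Pod go through the default VRF even when MetalLB uses a separate VRF
▸ For each type: LoadBalancer Service advertised over BGP, create a matching EgressService
 ・ An EgressService specifies:
   ・ The Service it is bound to
   ・ Which VRF (routing table number) the traffic goes through
   ・ Which source address the packets use (the Service's External IP or the node's address)
▸ Egress Service is a Technology Preview feature in OpenShift v4.20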

Slide 40

Slide 40 text

Egress Service
▸ .metadata.name and .metadata.namespace must be identical to those of the LoadBalancer Service it is bound to
▸ .spec.network is set to the routing table ID of the VRF in which the LoadBalancer Service is advertised (11 for vrf1 in this environment)
▸ .spec.sourceIPBy determines the source IP address
 ・ sourceIPBy: LoadBalancerIP ➔ the Service's External IP becomes the source IP address
 ・ sourceIPBy: Network ➔ the node's IP address in the VRF becomes the source IP address

Service:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: hello
  name: lbsvc-vrf1
  annotations:
    metallb.io/address-pool: pool-vrf1
    metallb.io/loadBalancerIPs: 172.19.20.190
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    deployment: hello
  type: LoadBalancer

EgressService:
apiVersion: k8s.ovn.org/v1
kind: EgressService
metadata:
  name: lbsvc-vrf1
spec:
  sourceIPBy: "Network"
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-virt: ""
  network: "11"
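The pairing rules above (same name and namespace as the Service, the VRF's routing table ID in .spec.network) can be sketched as a small helper. This is only an illustration, not part of MetalLB or OVN-Kubernetes: the function name `egress_service_for`, the fallback namespace `default`, and the accepted `sourceIPBy` values (`LoadBalancerIP`, `Network`, as I understand the OVN-Kubernetes API) are assumptions.

```python
def egress_service_for(service, table_id, source_ip_by="Network", node_selector=None):
    """Build an EgressService manifest (as a dict) paired with a LoadBalancer Service.

    Hypothetical helper for illustration only.
    """
    if service.get("spec", {}).get("type") != "LoadBalancer":
        raise ValueError("EgressService must be paired with a type: LoadBalancer Service")
    # Assumed enum values of .spec.sourceIPBy
    if source_ip_by not in ("LoadBalancerIP", "Network"):
        raise ValueError("unsupported sourceIPBy: %r" % source_ip_by)

    meta = service["metadata"]
    spec = {
        "sourceIPBy": source_ip_by,
        # The routing table ID is passed as a string, as in the manifest above
        "network": str(table_id),
    }
    if node_selector:
        spec["nodeSelector"] = {"matchLabels": dict(node_selector)}

    return {
        "apiVersion": "k8s.ovn.org/v1",
        "kind": "EgressService",
        "metadata": {
            # Name and namespace must match the Service
            "name": meta["name"],
            # "default" is an assumed fallback when the Service omits the namespace
            "namespace": meta.get("namespace", "default"),
        },
        "spec": spec,
    }


service = {
    "metadata": {"name": "lbsvc-vrf1"},
    "spec": {"type": "LoadBalancer"},
}
manifest = egress_service_for(
    service,
    table_id=11,
    node_selector={"node-role.kubernetes.io/worker-virt": ""},
)
```

Deriving the EgressService from the Service object keeps the two names from drifting apart, which is the easiest mistake to make when writing both manifests by hand.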

Slide 41

Slide 41 text

With Egress Service...

[Figure: two copies of the network topology (r1, r2, r3, testvm0, and the cluster nodes in AS 65801 attached to vrf1 via vlan1001, 172.19.11.0/24; Service lbsvc-vrf1)]
 ・ Left: traffic from outside to the Service goes through vrf1
 ・ Right: traffic leaving the Pod also goes through vrf1

Slide 42

Slide 42 text

With Egress Service...

[Figure: network topology with vrf1 on vlan1001 (172.19.11.0/24) and Service lbsvc-vrf1]

Traffic leaving the Pod now goes through vrf1. tracepath from the Pod to the IP address of testvm0:
[root@wk3 ~]# nsenter -t $(crictl inspect $(crictl ps | awk '/hello/ {print $1}') | jq -r .info.pid) -n tracepath -n 172.18.10.90
 1?: [LOCALHOST]      pmtu 1400
 1:  172.18.10.90     2.026ms asymm 2
 1:  172.18.10.90     0.693ms asymm 2
 2:  10.128.7.2       1.140ms
 3:  172.19.11.1      1.094ms
 4:  172.18.13.1      1.171ms
 5:  172.18.10.90     1.071ms reached
     Resume: pmtu 1400 hops 5 back 5

Hop 3 is now 172.19.11.1, the default gateway of vrf1, instead of 172.18.20.1.

Slide 43

Slide 43 text

Routing tables on node wk3

Routing rules:
[core@wk3 ~]$ ip rule show
0: from all lookup local
30: from all fwmark 0x1745ec lookup 7
998: from all to 10.128.0.0/16 lookup main proto static
998: from all to 10.200.0.0/16 lookup main proto static
998: from all to 169.254.0.0/17 lookup main proto static
1000: from all lookup [l3mdev-table]
5000: from 10.128.7.14 lookup 11
5000: from 10.200.101.103 lookup 11
5999: from all fwmark 0x3f0 lookup main
32766: from all lookup main
32767: from all lookup default

When the source address is the address of the relevant Pod or Service, the packet matches the priority 5000 rules and is forwarded according to table 11, the routing table of VRF vrf1.

VRF:
[core@wk3 ~]$ ip vrf show
Name              Table
-----------------------
vrf1              11

Interface IP addresses in the default VRF:
[core@wk3 ~]$ ip -4 -br addr show
lo               UNKNOWN        127.0.0.1/8
ovn-k8s-mp0      UNKNOWN        10.128.7.2/24
br-ex            UNKNOWN        172.18.20.113/24 169.254.0.2/17
vlan1001@enp3s0  UP             172.19.11.113/24

Interface IP addresses in VRF vrf1:
[core@wk3 ~]$ ip -4 -br addr show vrf vrf1
vlan1001@enp3s0  UP             172.19.11.113/24

Routing table of the default VRF:
[core@wk3 ~]$ ip route show
default via 172.18.20.1 dev br-ex proto dhcp src 172.18.20.113 metric 48
10.128.0.0/16 via 10.128.7.1 dev ovn-k8s-mp0
10.128.7.0/24 dev ovn-k8s-mp0 proto kernel scope link src 10.128.7.2
10.200.0.0/16 via 169.254.0.4 dev br-ex src 169.254.0.2 mtu 1400
169.254.0.0/17 dev br-ex proto kernel scope link src 169.254.0.2
169.254.0.1 dev br-ex src 172.18.20.113
169.254.0.3 via 10.128.7.1 dev ovn-k8s-mp0
172.18.20.0/24 dev br-ex proto kernel scope link src 172.18.20.113 metric 48

Routing table of VRF vrf1:
[core@wk3 ~]$ ip route show vrf vrf1
default via 172.19.11.1 dev vlan1001 proto static metric 150
172.19.11.0/24 dev vlan1001 proto kernel scope link src 172.19.11.113 metric 400
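To make the rule-then-table lookup concrete, here is a minimal Python sketch of Linux policy routing using a subset of the rules and routes shown above. It is a simplified model and not how the kernel is implemented: fwmark rules, the local table, and the l3mdev rule are left out, and the names `RULES`, `TABLES`, and `lookup` are made up for this example.

```python
import ipaddress

# (priority, "from" prefix, "to" prefix, table); None means "all".
# Subset of `ip rule show` on wk3.
RULES = [
    (998, None, "10.128.0.0/16", "main"),
    (5000, "10.128.7.14/32", None, "11"),
    (5000, "10.200.101.103/32", None, "11"),
    (32766, None, None, "main"),
]

# table -> list of (destination prefix, gateway or None if on-link, device).
# Subset of `ip route show` and `ip route show vrf vrf1` on wk3.
TABLES = {
    "main": [
        ("0.0.0.0/0", "172.18.20.1", "br-ex"),
        ("10.128.0.0/16", "10.128.7.1", "ovn-k8s-mp0"),
        ("172.18.20.0/24", None, "br-ex"),
    ],
    "11": [
        ("0.0.0.0/0", "172.19.11.1", "vlan1001"),
        ("172.19.11.0/24", None, "vlan1001"),
    ],
}


def _matches(addr, prefix):
    """True if prefix is a wildcard (None) or addr falls inside it."""
    return prefix is None or ipaddress.ip_address(addr) in ipaddress.ip_network(prefix)


def lookup(src, dst):
    """Return (table, gateway, device) for a packet, or None if nothing matches."""
    # Rules are evaluated in ascending priority order
    for _prio, from_prefix, to_prefix, table in sorted(RULES, key=lambda r: r[0]):
        if not (_matches(src, from_prefix) and _matches(dst, to_prefix)):
            continue
        candidates = [route for route in TABLES[table] if _matches(dst, route[0])]
        if candidates:
            # Longest prefix match inside the selected table
            best = max(candidates, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)
            return table, best[1], best[2]
        # No route in this table: fall through to the next rule
    return None


pod_to_testvm0 = lookup("10.128.7.14", "172.18.10.90")
other_pod_to_testvm0 = lookup("10.128.7.99", "172.18.10.90")
pod_to_cluster_network = lookup("10.128.7.14", "10.128.3.5")
```

In this model only the Pod with source address 10.128.7.14 is steered to table 11; traffic toward the cluster network still hits the priority 998 rule first and stays in the main table.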

Slide 44

Slide 44 text

With sourceIPBy: LoadBalancerIP

[Figure: network topology with vrf1 on vlan1001 (172.19.11.0/24) and Service lbsvc-vrf1; tcpdump runs on the peer routers while curl http://testvm0 is executed; panels: tcpdump on router r2, tcpdump on router r3]

The source address is the Service's address.

Slide 45

Slide 45 text

With sourceIPBy: LoadBalancerIP

tcpdump on the node:
[root@wk3 /]# tcpdump -nni any port 80 and host 172.18.10.90
...
14:17:55.163503 ad92f172b775705 P IP 10.128.7.14.47726 > 172.18.10.90.80: Flags [S], seq 988630598, win 65280, options [mss 1360,sackOK,TS val 1355935878 ecr 0,nop,wscale 7], length 0
14:17:55.164363 ovn-k8s-mp0 In IP 10.128.7.14.47726 > 172.18.10.90.80: Flags [S], seq 988630598, win 65280, options [mss 1360,sackOK,TS val 1355935878 ecr 0,nop,wscale 7], length 0
14:17:55.164398 vlan1001 Out IP 172.19.20.190.47726 > 172.18.10.90.80: Flags [S], seq 988630598, win 65280, options [mss 1360,sackOK,TS val 1355935878 ecr 0,nop,wscale 7], length 0
14:17:55.164991 enp3s0 In IP13 (invalid)
14:17:55.164994 vlan1001 In IP 172.18.10.90.80 > 172.19.20.190.47726: Flags [S.], seq 1746093189, ack 988630599, win 65160, options [mss 1460,sackOK,TS val 120362348 ecr 1355935878,nop,wscale 7], length 0
14:17:55.165039 ovn-k8s-mp0 Out IP 172.18.10.90.80 > 10.128.7.14.47726: Flags [S.], seq 1746093189, ack 988630599, win 65160, options [mss 1460,sackOK,TS val 120362348 ecr 1355935878,nop,wscale 7], length 0
14:17:55.165732 ad92f172b775705 Out IP 172.18.10.90.80 > 10.128.7.14.47726: Flags [S.], seq 1746093189, ack 988630599, win 65160, options [mss 1460,sackOK,TS val 120362348 ecr 1355935878,nop,wscale 7], length 0
14:17:55.165796 ad92f172b775705 P IP 10.128.7.14.47726 > 172.18.10.90.80: Flags [.], ack 1, win 510, options [nop,nop,TS val 1355935880 ecr 120362348], length 0
14:17:55.165843 ad92f172b775705 P IP 10.128.7.14.47726 > 172.18.10.90.80: Flags [P.], seq 1:77, ack 1, win 510, options [nop,nop,TS val 1355935880 ecr 120362348], length 76: HTTP: GET / HTTP/1.1
14:17:55.166157 ovn-k8s-mp0 In IP 10.128.7.14.47726 > 172.18.10.90.80: Flags [.], ack 1, win 510, options [nop,nop,TS val 1355935880 ecr 120362348], length 0
14:17:55.166173 vlan1001 Out IP 172.19.20.190.47726 > 172.18.10.90.80: Flags [.], ack 1, win 510, options [nop,nop,TS val 1355935880 ecr 120362348], length 0
14:17:55.166232 ovn-k8s-mp0 In IP 10.128.7.14.47726 > 172.18.10.90.80: Flags [P.], seq 1:77, ack 1, win 510, options [nop,nop,TS val 1355935880 ecr 120362348], length 76: HTTP: GET / HTTP/1.1
14:17:55.166247 vlan1001 Out IP 172.19.20.190.47726 > 172.18.10.90.80: Flags [P.], seq 1:77, ack 1, win 510, options [nop,nop,TS val 1355935880 ecr 120362348], length 76: HTTP: GET / HTTP/1.1

The Pod address 10.128.7.14 is seen on the Pod interface and on ovn-k8s-mp0; when the packet leaves through vlan1001, the source address has been rewritten to the Service's External IP 172.19.20.190.
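One way to spot where the source address is rewritten is to extract the interface, direction, and source address from each captured line. The following Python sketch does that for `tcpdump -i any` style lines; the function name `sources_by_interface` and the regular expression are my own, and the sample lines are shortened copies of the first three packets of the capture.

```python
import re

# timestamp, interface, direction, "IP", src.port > dst.port:
LINE_RE = re.compile(
    r"^\S+\s+(?P<iface>\S+)\s+(?P<dir>In|Out|P)\s+IP\s+"
    r"(?P<src>\d+(?:\.\d+){3})\.\d+\s+>\s+(?P<dst>\d+(?:\.\d+){3})\.\d+:"
)


def sources_by_interface(lines):
    """Return (interface, direction, source IP) for every parsable IPv4 line."""
    result = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines such as "IP13 (invalid)" are skipped
            result.append((match["iface"], match["dir"], match["src"]))
    return result


# Shortened copies of captured lines (TCP options omitted)
SAMPLE = [
    "14:17:55.163503 ad92f172b775705 P IP 10.128.7.14.47726 > 172.18.10.90.80: Flags [S], seq 988630598",
    "14:17:55.164363 ovn-k8s-mp0 In IP 10.128.7.14.47726 > 172.18.10.90.80: Flags [S], seq 988630598",
    "14:17:55.164398 vlan1001 Out IP 172.19.20.190.47726 > 172.18.10.90.80: Flags [S], seq 988630598",
    "14:17:55.164991 enp3s0 In IP13 (invalid)",
]

hops = sources_by_interface(SAMPLE)
for iface, direction, src in hops:
    print(iface, direction, src)
```

Reading the result top to bottom, the source stays 10.128.7.14 until the packet is sent out of vlan1001, where it becomes 172.19.20.190.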

Slide 46

Slide 46 text

With sourceIPBy: Network

[Figure: network topology with vrf1 on vlan1001 (172.19.11.0/24) and Service lbsvc-vrf1; tcpdump runs on the peer routers while curl http://testvm0 is executed; panels: tcpdump on router r2, tcpdump on router r3]

The source address is the node's VLAN interface address inside vrf1.

Slide 47

Slide 47 text

Importing routes received from outside with MetalLB

Slide 48

Slide 48 text

OpenShift configuration

[Figure: network topology with r1, r2, r3, testvm0, net30 (172.18.30.0/24), and the cluster nodes in AS 65801; Service hello-lb-l3]

FRRConfiguration that imports every received route:
apiVersion: frrk8s.metallb.io/v1beta1
kind: FRRConfiguration
metadata:
  name: receive-all
  namespace: openshift-frr-k8s
spec:
  bgp:
    routers:
    - asn: 65801
      neighbors:
      - address: 172.18.20.1
        asn: 65102
        toReceive:
          allowed:
            mode: all

Slide 49

Slide 49 text

FRRConfiguration generated by MetalLB

Manually added FRRConfiguration:
$ oc -n openshift-frr-k8s get frrconfiguration receive-all -o yaml | yq .spec
bgp:
  routers:
  - asn: 65801
    neighbors:
    - address: 172.18.20.1
      asn: 65102
      disableMP: false
      dualStackAddressFamily: false
      toReceive:
        allowed:
          mode: all

FRRConfiguration generated by MetalLB:
$ oc -n openshift-frr-k8s get frrconfiguration metallb-wk3 -o yaml | yq .spec
bgp:
  routers:
  - asn: 65801
    neighbors:
    - address: 172.18.20.1
      asn: 65102
      disableMP: false
      dualStackAddressFamily: false
      passwordSecret: {}
      port: 179
      toAdvertise:
        allowed:
          mode: filtered
          prefixes:
          - 172.19.20.181/32
      toReceive:
        allowed:
          mode: filtered
    prefixes:
    - 172.19.20.181/32
nodeSelector:
  matchLabels:
    kubernetes.io/hostname: wk3
raw: {}

These two FRRConfigurations are merged, which allows the node to receive routes from outside.

Slide 50

Slide 50 text

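Merging FRRConfigurations
▸ Basic policy: multiple FRRConfigurations are merged so that the configuration is only ever extended (more becomes possible)
 ・ More neighbors are added
 ・ More prefixes are allowed
▸ Flow
 ・ Check that the FRRConfigurations do not contradict each other
   ・ On a conflict, nothing is merged and the previous FRRConfiguration stays in use
   ・ Examples of errors:
     ・ Different ASNs are configured for the same router in the same VRF
     ・ Different ASNs are configured for the same neighbor (same address and port)
     ・ BFD profiles have the same name but different settings
 ・ For each node matched by the label selector, generate the merged FRR config
   ・ Combine all router settings
   ・ Within each router, merge all prefixes and neighbors
   ・ Within each neighbor, merge all filters
     ・ The filter that accepts more routes takes precedence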
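The "extend only" rule for a single neighbor can be sketched in Python as follows. This is a simplified model and not the frr-k8s implementation: the dictionaries flatten the `allowed` level of the real resource, a missing filter is assumed to mean `filtered` with no prefixes, the port is assumed to default to 179, and the function names are made up.

```python
def merge_allowed(a, b):
    """Merge two filters; the one that accepts more routes wins."""
    a = a or {"mode": "filtered"}  # assumed default when the filter is absent
    b = b or {"mode": "filtered"}
    if a.get("mode") == "all" or b.get("mode") == "all":
        return {"mode": "all"}
    prefixes = sorted(set(a.get("prefixes", [])) | set(b.get("prefixes", [])))
    return {"mode": "filtered", "prefixes": prefixes}


def merge_neighbors(a, b):
    """Merge two entries describing the same neighbor (same address and port)."""
    key_a = (a["address"], a.get("port", 179))
    key_b = (b["address"], b.get("port", 179))
    if key_a != key_b:
        raise ValueError("entries describe different neighbors")
    if a["asn"] != b["asn"]:
        # Conflicting ASNs are an error, so nothing is merged
        raise ValueError("conflicting ASN for neighbor %s" % a["address"])
    return {
        "address": a["address"],
        "port": key_a[1],
        "asn": a["asn"],
        "toAdvertise": merge_allowed(a.get("toAdvertise"), b.get("toAdvertise")),
        "toReceive": merge_allowed(a.get("toReceive"), b.get("toReceive")),
    }


# Simplified versions of the two FRRConfigurations for neighbor 172.18.20.1
receive_all = {"address": "172.18.20.1", "asn": 65102, "toReceive": {"mode": "all"}}
metallb_wk3 = {
    "address": "172.18.20.1",
    "asn": 65102,
    "port": 179,
    "toAdvertise": {"mode": "filtered", "prefixes": ["172.19.20.181/32"]},
    "toReceive": {"mode": "filtered"},
}

merged = merge_neighbors(receive_all, metallb_wk3)
```

With these inputs the merged neighbor still advertises only 172.19.20.181/32 but accepts every received route, because `mode: all` is the wider of the two receive filters.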

Slide 51

Slide 51 text

frr running-config

[Figure: network topology with r1, r2, r3, testvm0, net30 (172.18.30.0/24), and the cluster nodes in AS 65801; Service hello-lb-l3]

The running-config of frr on each node can be read from the FRRNodeState custom resource:
$ oc get frrnodestates.frrk8s.metallb.io wk3 -oyaml | yq .status.runningConfig
Building configuration...
...
!
router bgp 65801
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 bgp graceful-restart preserve-fw-state
 no bgp network import-check
 neighbor 172.18.20.1 remote-as 65102
 !
 address-family ipv4 unicast
  network 172.19.20.181/32
  neighbor 172.18.20.1 activate
  neighbor 172.18.20.1 route-map 172.18.20.1-in in
  neighbor 172.18.20.1 route-map 172.18.20.1-out out
 exit-address-family
exit
!
ip prefix-list 172.18.20.1-inpl-ipv4 seq 1 permit any
ip prefix-list 172.18.20.1-allowed-ipv4 seq 1 permit 172.19.20.181/32
!
...
!
route-map 172.18.20.1-out permit 1
 match ip address prefix-list 172.18.20.1-allowed-ipv4
exit
!
...
!
route-map 172.18.20.1-in permit 3
 match ip address prefix-list 172.18.20.1-inpl-ipv4
exit
!
route-map 172.18.20.1-in permit 4
 match ipv6 address prefix-list 172.18.20.1-inpl-ipv4
exit
!

 ・ neighbor 172.18.20.1 remote-as 65102: peering with the peer router r2
 ・ network 172.19.20.181/32 and prefix-list 172.18.20.1-allowed-ipv4: advertise the Service's External IP
 ・ prefix-list 172.18.20.1-inpl-ipv4 (permit any): accept the routes advertised from outside

Slide 52

Slide 52 text

Running vtysh in frr on node wk3

[Figure: network topology with r1, r2, r3, testvm0, net30 (172.18.30.0/24), and the cluster nodes in AS 65801; Service hello-lb-l3]

Routes received from outside:
$ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show ip bgp'
BGP table version is 9, local router ID is 172.18.20.113, vrf id 0
Default local pref 100, local AS 65801
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
              i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network            Next Hop       Metric LocPrf Weight Path
*> 0.0.0.0/0          172.18.20.1                       0 65102 65101 i
*> 172.18.20.0/24     172.18.20.1         0             0 65102 i
*> 172.18.30.0/24     172.18.20.1                       0 65102 65101 65103 i
*> 172.19.11.0/24     172.18.20.1                       0 65102 65101 65103 i
*> 172.19.20.181/32   0.0.0.0             0         32768 i

Displayed 5 routes and 5 total paths

Peering status with the peer router r2:
$ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show bgp summary'
IPv4 Unicast Summary (VRF default):
BGP router identifier 172.18.20.113, local AS number 65801 vrf-id 0
BGP table version 9
RIB entries 8, using 1536 bytes of memory
Peers 1, using 725 KiB of memory

Neighbor     V  AS     MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
172.18.20.1  4  65102  2257     2241     0       0    0     14:46:16  4             1       N/A

Total number of neighbors 1
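When checking many nodes, the peer row of `show bgp summary` is easy to parse by column position. The sketch below assumes the twelve-column layout shown above (other FRR versions may print different columns) and relies on the State/PfxRcd column holding a number only while the session is established; the function name `parse_bgp_summary_row` is hypothetical.

```python
def parse_bgp_summary_row(row):
    """Parse one neighbor row of `show bgp summary` (column layout assumed)."""
    fields = row.split()
    if len(fields) < 11:
        raise ValueError("unexpected row format: %r" % row)
    state_or_count = fields[9]  # State/PfxRcd column
    sent = fields[10]           # PfxSnt column
    established = state_or_count.isdigit()
    return {
        "neighbor": fields[0],
        "remote_as": int(fields[2]),
        "up_down": fields[8],
        "established": established,
        # A state name such as "Active" appears here when the session is down
        "prefixes_received": int(state_or_count) if established else None,
        "prefixes_sent": int(sent) if sent.isdigit() else None,
    }


# Row copied from the output on wk3
ROW = "172.18.20.1 4 65102 2257 2241 0 0 0 14:46:16 4 1 N/A"
peer = parse_bgp_summary_row(ROW)
```

For the row from wk3 this yields four received prefixes, matching the four routes learned from 172.18.20.1 in the BGP table, and one sent prefix, the Service's External IP.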

Slide 53

Slide 53 text

Routing table on node wk3

[Figure: network topology with r1, r2, r3, testvm0, net30 (172.18.30.0/24), and the cluster nodes in AS 65801; Service hello-lb-l3]

[core@wk3 ~]$ ip route show
default via 172.18.20.1 dev br-ex proto dhcp src 172.18.20.113 metric 48
10.128.0.0/16 via 10.128.7.1 dev ovn-k8s-mp0
10.128.7.0/24 dev ovn-k8s-mp0 proto kernel scope link src 10.128.7.2
10.200.0.0/16 via 169.254.0.4 dev br-ex src 169.254.0.2 mtu 1400
169.254.0.0/17 dev br-ex proto kernel scope link src 169.254.0.2
169.254.0.1 dev br-ex src 172.18.20.113
169.254.0.3 via 10.128.7.1 dev ovn-k8s-mp0
172.18.20.0/24 dev br-ex proto kernel scope link src 172.18.20.113 metric 48
172.18.30.0/24 nhid 890 via 172.18.20.1 dev br-ex proto bgp metric 20
172.19.11.0/24 nhid 890 via 172.18.20.1 dev br-ex proto bgp metric 20

The last two entries (proto bgp) are the routes received over BGP.

Slide 54

Slide 54 text

Appendix

Slide 55

Slide 55 text

References
▸ This deck uses the ShowNet Icons
 ・ https://github.com/interop-tokyo-shownet/shownet-icons
▸ MetalLB
 ・ https://metallb.io/
 ・ https://github.com/metallb/metallb
▸ FRR-k8s
 ・ https://github.com/metallb/frr-k8s
▸ FRRouting
 ・ https://frrouting.org/
 ・ https://github.com/FRRouting/frr

Slide 56

Slide 56 text

References
▸ Split FRR - Proposal to move FRR to a stand alone component
 ・ https://github.com/metallb/metallb/blob/main/design/splitfrr-proposal.md
▸ blog: FRR-k8s as a BGP backend for MetalLB
 ・ https://www.redhat.com/ja/blog/frr-k8s-bgp-backend-metallb
▸ slide: Bringing routes to Kubernetes nodes via BGP: introducing frr-k8s
 ・ https://archive.fosdem.org/2024/schedule/event/fosdem-2024-1818-bringing-routes-to-kubernetes-nodes-via-bgp-introducing-frr-k8s/
▸ slide: MetalLB and FRR: a match made in heaven
 ・ https://archive.fosdem.org/2023/schedule/event/network_metallb_and_frr/

Slide 57

Slide 57 text

MetalLB Operator and OpenShift
▸ Starting with OpenShift v4.19.14, the OVN-Kubernetes CNI plugin supports BGP, and MetalLB and OVN-Kubernetes share frr-k8s
▸ Configuring the custom resource of the Cluster Network Operator (CNO) deploys frr-k8s as a DaemonSet in the openshift-frr-k8s namespace
▸ If CNO has already deployed frr-k8s, MetalLB Operator uses it; otherwise MetalLB Operator configures CNO and has CNO deploy frr-k8s

Slide 58

Slide 58 text

MetalLB Operator and OpenShift
▸ When running on OpenShift, MetalLB Operator sets the backend to frr-k8s and has the Cluster Network Operator deploy frr-k8s
 ・ The platform is treated as OpenShift if the route.openshift.io API group exists

https://github.com/metallb/metallb-operator/blob/4b6d32e74622818ea8ee853e6aab393ac70f0eae/pkg/platform/platform.go#L95-L100

Slide 59

Slide 59 text

MetalLB Operator and OpenShift
▸ When running on OpenShift, MetalLB Operator sets the backend to frr-k8s and has the Cluster Network Operator deploy frr-k8s
 ・ On OpenShift, if the environment variable DEPLOY_FRRK8S_FROM_CNO is true, the BGP backend is set to frr-k8s-external

https://github.com/metallb/metallb-operator/blob/4b6d32e74622818ea8ee853e6aab393ac70f0eae/pkg/params/params.go#L136-L138
https://github.com/metallb/metallb-operator/blob/4b6d32e74622818ea8ee853e6aab393ac70f0eae/pkg/params/params.go#L23-L25

Slide 60

Slide 60 text

MetalLB Operator and OpenShift
▸ When running on OpenShift, MetalLB Operator sets the backend to frr-k8s and has the Cluster Network Operator deploy frr-k8s
 ・ MetalLB Operator sets additionalRoutingCapabilities in CNO's network.operator custom resource ➔ CNO deploys frr-k8s in the openshift-frr-k8s namespace

https://github.com/metallb/metallb-operator/blob/4b6d32e74622818ea8ee853e6aab393ac70f0eae/pkg/openshift/openshift.go#L44-L62
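The decision described on the last three slides can be paraphrased in a few lines of Python. This is a paraphrase for illustration, not a translation of the operator's Go code: the function names, the `configured_backend` parameter, and the exact rule for interpreting the environment variable (case-insensitive comparison with "true") are assumptions.

```python
def is_openshift(api_groups):
    """Treat the cluster as OpenShift when the route.openshift.io API group exists."""
    return "route.openshift.io" in api_groups


def select_bgp_backend(api_groups, env, configured_backend):
    """Pick the BGP backend name.

    configured_backend is whatever backend would be used otherwise
    (hypothetical parameter; the operator's own default is not modeled here).
    """
    # Assumption: the variable counts as set when it equals "true", ignoring case
    from_cno = env.get("DEPLOY_FRRK8S_FROM_CNO", "").strip().lower() == "true"
    if is_openshift(api_groups) and from_cno:
        # frr-k8s is deployed by the Cluster Network Operator, not by MetalLB Operator
        return "frr-k8s-external"
    return configured_backend


openshift_groups = ["apps", "route.openshift.io", "k8s.ovn.org"]
backend = select_bgp_backend(
    openshift_groups, {"DEPLOY_FRRK8S_FROM_CNO": "true"}, configured_backend="frr-k8s"
)
```

Both conditions have to hold: on a plain Kubernetes cluster, or on OpenShift without the environment variable, the sketch falls back to the otherwise configured backend.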

Slide 61

Slide 61 text

Thank you