
LINE’s DCN Meets SRv6 ;)

https://www.nic.ad.jp/iw2019/program/s04/
S4 Latest Trend of Data Center Network Protocol

About LINE's data center network

LINE Developers

November 26, 2019

Transcript

  1. S4 Latest Trend of Data Center Network Protocol
     LINE’s DCN Meets SRv6 ;)
     Internet Week 2019
     Hiroki Shirokura, Software Engineer, LINE Corp. (@slankdev, <[email protected]>)
  2. LINE :: Services and Infrastructure (Problem Statement)
     Production Infrastructure / Development Infrastructure / Exclusive Infrastructure
     Common services (messenger, family services, ...), Fintech services, ...
     • Many fragmented infrastructures :(
       ◦ A lot of work to design and build each network
       ◦ Lack of infrastructure flexibility
     On-premises infrastructure → Integrated Infrastructure
     The integrated infrastructure is described here:
     https://engineering.linecorp.com/en/blog/openstack-summit-vancouver-2018-recap-2-2/
     https://www.janog.gr.jp/meeting/janog43/application/files/7915/4823/1858/janog43-line-kobayashi.pdf
  3. Multi-tenancy :: Which Technology?
     VXLAN
     • Pros: wider device support; many users
     • Cons: complexity for L2 extension; needs an additional protocol for SFC
     SRv6
     • Pros: flexible instructions with SIDs; IP-fabric awareness
     • Cons: less device support; fewer users and examples
     We’ve decided to adopt SRv6 for its innovative design.
  4. SRv6 Data Plane :: Architecture Overview
     [Figure: the data center is one SRv6 domain over a CLOS network of switches and routers;
     Network Nodes (SRv6 nodes), Hypervisors (SRv6 nodes) hosting VMs for Tenant A and Tenant B,
     and NFV functions (FW, IDS, ...).]
     • Transit Node: IPv6 forwarding only, no SRH processing (SRv6-unaware devices are fine here)
     • Hypervisor (HV): from VM → encap; to VM → decap
     • Network Node (NN): gateway to legacy networks, the Internet, and other tenants
  5. Routing to SIDs :: Use eBGP
     [Figure: DC SRv6 domain. Router; Network Node1 and Node2 (both Locator C1::/96, anycast SIDs
     C1::A for Tenant A and C1::B for Tenant B); Hypervisor1 (C2::/96, SID C2::A with VM A1, SID
     C2::B with VM B1); Hypervisor2 (C3::/96, SID C3::A with VM A2, SID C3::B with VM B2); NFV.
     Routes advertised via eBGP.]
     • Create a VRF (l3mdev) for each tenant on NN and HV
     • Assign an IPv6 /96 block (Locator) to each NN and HV
       ◦ Advertise the /96 IPv6 Locator via BGP (Network Nodes advertise an anycast SID)
       ◦ Add a per-tenant identifier to the Locator as the Function
         (LINE uses a specific address from 169.254.0.0/16 per tenant)
     • Use T.Encaps and End.DX4 only (we configure End.DT4 in a pseudo way; please see the appendix :))
     • Important question: is ECMP available on your network?
     A minimal iproute2 sketch of one tenant VRF and its SID follows.
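     To make that concrete, here is a minimal sketch of one tenant VRF with its decap SID on
     Hypervisor2 (Locator C3::/96). The device name, table number, and nh4 address are
     illustrative assumptions, not LINE's production configuration; see appendix-1 and
     appendix-2 for the real shape of these rules.

     [Sketch: VRF and SID for Tenant A on Hypervisor2]
     bash# ip link add vrf1 type vrf table 10          # one VRF (l3mdev) per tenant
     bash# ip link set vrf1 up
     bash# ip route add 169.254.1.1 dev vrf1           # per-tenant identifier from 169.254.0.0/16
     bash# ip route add C3::A/128 encap seg6local \
             action End.DX4 nh4 169.254.1.1 dev vrf1   # decap SID = Locator C3::/96 + Function ::A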
  6. Data Plane :: Same-Tenant Packet Flow
     [Figure: same topology as slide 5.]
     VM A1 (HV1, Tenant A) → VM A2 (HV2, Tenant A):
     • HV1: T.Encaps, dst = C3::A
     • HV2: End.DX4, nexthop in VRF1
     The corresponding routes are sketched below.
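     In route form, the same-tenant flow could be sketched like this (VM A2's address 10.0.0.2
     is an assumed value; the SIDs follow the figure):

     [On Hypervisor1, Tenant A's VRF: encap toward VM A2 on Hypervisor2]
     bash# ip route add 10.0.0.2/32 vrf vrf1 \
             encap seg6 mode encap segs C3::A dev vrf1   # T.Encaps, dst = HV2's Tenant-A SID

     [On Hypervisor2: decap and hand the inner IPv4 packet to Tenant A's VRF]
     bash# ip route add C3::A/128 encap seg6local \
             action End.DX4 nh4 169.254.1.1 dev vrf1     # End.DX4, nexthop resolved in VRF1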
  7. Data Plane :: Different-Tenant Packet Flow
     [Figure: same topology as slide 5.]
     VM A1 (HV1, Tenant A) → VM B2 (HV2, Tenant B):
     • HV1: T.Encaps, dst = C1::A (toward the Network Node)
     • NN1: End.DX4, nexthop in VRF1, then T.Encaps, dst = C3::B
     • HV2: End.DX4, nexthop in VRF2
     A route-level sketch follows.
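     The different-tenant flow needs two encapsulations, which could be sketched as follows
     (VM B2's address 10.0.0.3 and the default-route trick are assumptions for illustration):

     [On Hypervisor1, Tenant A's VRF: non-local traffic goes to the Network Node's anycast SID]
     bash# ip route add default vrf vrf1 \
             encap seg6 mode encap segs C1::A dev vrf1   # T.Encaps, dst = NN Tenant-A SID

     [On Network Node1: decap into Tenant A's VRF; after inter-tenant routing/NFV, re-encap]
     bash# ip route add C1::A/128 encap seg6local action End.DX4 nh4 169.254.1.1 dev vrf1
     bash# ip route add 10.0.0.3/32 vrf vrf2 \
             encap seg6 mode encap segs C3::B dev vrf2   # T.Encaps, dst = HV2's Tenant-B SID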
  8. SRv6 Control Plane in the DC
     • Choices: IS-IS, OSPF, BGP, SDN controller
     • Requirements:
       ◦ Scalable (network nodes, hypervisors, multiple OpenStack clusters)
       ◦ Reachable from other networks
     • Design principles:
       ◦ Follow the design and philosophy of OpenStack
       ◦ Simple; loose coupling between D-plane and C-plane
     LINE uses OpenStack as its private cloud controller, so we adopted an SDN controller.
  9. Control-Plane
     • Manage tenants on Network Nodes and Hypervisors
     • Configure encap/decap rules on Network Nodes and Hypervisors
     [Figure: same topology as slide 5, with a Control-Plane component configuring the Network
     Nodes and Hypervisors; routes advertised via eBGP.]
  10. networking-sr :: Neutron SRv6 Plugin
     (1) ML2 type/mechanism drivers:
       ◦ srv6: new network type driver
       ◦ mech_sr: new mechanism driver
     (2) Two types of Neutron agent:
       ◦ srgw-agent on Network Nodes
       ◦ sr-agent on Hypervisors
     (3) Service plugin (srv6_encap_network) providing a new API to add SRv6 encap rules
     [Figure: the controller node runs neutron-server with the srv6 type driver, the mech_sr
     mechanism driver, and the srv6_encap_network service plugin; each Hypervisor runs the
     sr-agent and each Network Node runs the srgw-agent as ML2 agents.]
     A possible ML2 configuration fragment is sketched below.
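     For illustration, an ML2 configuration fragment enabling these drivers might look like the
     following. The [ml2] option names are standard Neutron settings; whether networking-sr
     registers its drivers under exactly the aliases "srv6" and "mech_sr" in entry points is an
     assumption based on the slide.

     [Sketch: /etc/neutron/plugins/ml2/ml2_conf.ini on the controller node]
     bash# cat >> /etc/neutron/plugins/ml2/ml2_conf.ini <<'EOF'
     [ml2]
     type_drivers = srv6
     tenant_network_types = srv6
     mechanism_drivers = mech_sr
     EOF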
  11. Nova/Neutron Behavior :: VM Creation and NW Configuration
     1. Create network (Neutron)
     2. Create VM (Nova)
     3. VM info passed to nova-compute
     4. Run VM
     5. Create tap
     6. neutron-agent detects the tap
     7. Get/update port info (please check appendix-3 for the information exchanged in this step)
     8. Configure the tap
     9. Create the VRF
     10. Set SRv6 encap/decap rules
  12. Set Encap Rules from the Port Info of Each VM
     [Figure: three hypervisors, each running neutron-agent with tenant VRF1. Hypervisor-1
     hosts VM1/VM2, Hypervisor-2 hosts VM3/VM4, Hypervisor-3 hosts VM5.]
     • On Hypervisor-1 and Hypervisor-2: to VM5 → T.Encaps, SID = HV3::VRF1
     • On Hypervisor-3:
       ◦ to VM1, VM2 → T.Encaps, SID = HV1::VRF1
       ◦ to VM3, VM4 → T.Encaps, SID = HV2::VRF1
     A sketch of the resulting routes follows.
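     The resulting routing state on Hypervisor-3, for example, could be sketched like this (the
     VM addresses are assumed; HV1::VRF1 and HV2::VRF1 are the symbolic SIDs from the figure):

     [Sketch: per-VM encap rules inside VRF1 on Hypervisor-3]
     bash# ip route show vrf vrf1
     10.0.0.1 encap seg6 mode encap segs HV1::VRF1 dev vrf1   # to VM1
     10.0.0.2 encap seg6 mode encap segs HV1::VRF1 dev vrf1   # to VM2
     10.0.0.3 encap seg6 mode encap segs HV2::VRF1 dev vrf1   # to VM3
     10.0.0.4 encap seg6 mode encap segs HV2::VRF1 dev vrf1   # to VM4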
  13. Requirements :: Multiple OpenStack Clusters and Scale
     [Figure: N OpenStack clusters, each with hypervisors holding per-tenant VRFs and VMs, and
     per-cluster networks (Network 1 ... Network N). The Network Node must hold a VRF per
     (cluster, network) pair: cluster 1 vrf 1, cluster 2 vrf 1, cluster 3 vrf 1, and so on.]
  14. Solution :: etcd + Agent Model
     • Share the encap rules for each network tenant via etcd
     • Every Network Node shares the information of all clusters
     [Figure: the per-(cluster, network) VRFs from slide 13, now synchronized by agents talking
     to a shared etcd.]
  15. Notify New Encap/Decap Rules via etcd
     1. nova-compute detects the tap on the Hypervisor
     2. sr-agent gets/updates the port info
     3. Neutron puts the port info into etcd
     4. srgw-agent on the Network Node gets the changes from etcd
     5. srgw-agent creates the VRF and sets the SRv6 encap/decap rules
     The etcd exchange is sketched below.
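     Steps 3 and 4 could be sketched with the etcd v3 CLI as follows. The key layout and JSON
     shape are assumptions for illustration (the fields mirror appendix-3); the real schema is
     internal to networking-sr.

     [Sketch: Neutron publishes port info; the agents watch for changes]
     bash# etcdctl put /networking-sr/ports/<port-id> \
             '{"segment_node_id": "2001::ffc1", "vrf": "vrf1", "vrf_ip": "169.254.1.1"}'
     bash# etcdctl watch --prefix /networking-sr/ports/    # srgw-agent/sr-agent react to updates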
  16. SRv6 DCN with OpenStack :: Current Summary
     • Data plane
       ◦ T.Encaps and End.DX4 only
       ◦ A trick provides a pseudo End.DT4 using End.DX4 plus VRF redirection
       ◦ The underlay is a typical CLOS fabric (BGP unnumbered and anycast)
     • Control plane
       ◦ Developed networking-sr as a Neutron ML2 plugin
       ◦ Scales (multiple network nodes, multiple OpenStack clusters) with etcd
       ◦ Reachable from other networks via the new API
     • Next
       ◦ Performance improvement (see appendix-5)
       ◦ Control-plane mechanism improvement using BGP
  17. Next C-plane Architecture :: BGP Integration ;)
     [Figure: before, an eBGP C-plane (ipv6 ucast) for the underlay plus a Neutron C-plane
     (sr-agent, srgw-agent) for the per-service overlay networks (Service A/B/C); after, the
     eBGP C-plane carries both ipv6 ucast and vpnv4 ucast, and the Neutron C-plane shrinks to a
     lightweight agent.]
     • BGP integration using SRv6-VPNv4
     • With yet another SRv6-VPNv4 implementation ...?
     • We are currently working on this new control plane
     • CAUTION :: THIS IS JUST A PLAN
  18. Previous Overlay C-plane :: Neutron Only
     [Figure: eBGP C-plane (ipv6 ucast) for the underlay; Neutron C-plane (sr-agent, srgw-agent)
     for the overlay networks of Services A/B/C.]
     • Advantages
       ◦ Scalable design using etcd
       ◦ Neutron has full control of network tenants and the like
       ◦ The requirements considered at the start are basically all met!
     • Drawbacks
       ◦ Heavy load on Neutron (c-plane traffic at VM creation, etc.)
       ◦ Network Nodes have too many jobs (managing multiple OpenStack clusters, etc.)
  19. New Overlay C-plane :: BGP + Neutron
     [Figure: eBGP C-plane carrying ipv6 ucast and vpnv4 ucast; Neutron C-plane reduced to a
     lightweight agent for the overlay networks of Services A/B/C.]
     • Advantages
       ◦ Scalable design using BGP
       ◦ Reduced load on Neutron
       ◦ A more commodity design philosophy; the Network Node's job is simplified
     • Drawbacks
       ◦ Few implementations (no OSS at all)
       ◦ As before, few prior examples of this approach
  20. VPNv4 :: Differences in Route Advertisement between MPLS and SRv6
     • MPLS-VPN: only MP_REACH_NLRI, which also carries the VPN label
     • SRv6-VPN: additionally attaches the Prefix-SID attribute and embeds the SID information there
     [Figure: packet dissection of a conventional MPLS-VPN BGP Update next to the new SRv6-VPN
     BGP Update.]
     Note 1: Cisco and Huawei implementations are not guaranteed to look like this; this is just
     how our prototype behaves, based on draft-dawra-idr-srv6-vpn-05. Verification is planned.
     Note 2: we extended the dissector to support the Prefix-SID attribute.
  21. BGP SRv6 VPNv4 + LINE’s DCN :: eBGP vs. iBGP
     • With eBGP:
       ◦ The CLOS fabric must understand the VPNv4 AFI (afi/safi=1/128 is long-established, so no problem)
       ◦ SIDs are advertised with the Prefix-SID path attribute
         ▪ Type 5: SRv6 L3 Service TLV (SRv6 routers must understand this)
         ▪ Operated with transitive bit = 1, so it can pass transparently through an eBGP network;
           however, FRR 4.0 and later has a problem interpreting Prefix-SID (appendix-8)
     • With iBGP:
       ◦ Requires operating two BGP planes: eBGP underlay, iBGP overlay
       ◦ Only the SRv6 routers need to speak VPNv4 (== no changes to the CLOS fabric)
       ◦ Easy to realize, but it fits poorly with LINE's operations
     • What do you think? I feel eBGP is more elegant and simple. Let's be bold and go with eBGP!!
  22. RIB and FIB :: Before (left) / After (right)
     [Figure: before, every node in the DC SRv6 domain (CLOS fabric, Network Node1/2,
     Hypervisor1/2, NFV) holds only an ipv6 ucast RIB/FIB. After, the SRv6 nodes additionally
     hold a vpnv4 ucast RIB/FIB, while the CLOS fabric carries the vpnv4 routes transparently
     thanks to BGP path-attribute transitivity.]
  23. SRv6-VPNv4 Support Status ..?
     FRR: MPLS-VPN CLI (existing):

     router bgp 65001
      bgp router-id 10.255.0.1
      neighbor 2001:1::2 remote-as 65010
      !
      address-family ipv4 unicast
       redistribute connected
       redistribute static
      exit-address-family
      !
      address-family ipv4 vpn
       neighbor 2001:1::2 activate
      exit-address-family
     !
     router bgp 65001 vrf vrf1
      bgp router-id 10.255.0.1
      !
      address-family ipv4 unicast
       redistribute connected
       label vpn export 80
       rd vpn export 65001:1
       rt vpn both 100:1
       export vpn
       import vpn
      exit-address-family

     FRR: SRv6-VPN CLI (plan):

     router bgp 65001
      bgp router-id 10.255.0.1
      neighbor 2001:1::2 remote-as 65010
      !
      address-family ipv4 unicast
       redistribute connected
       redistribute static
      exit-address-family
      !
      address-family ipv4 vpn
       neighbor 2001:1::2 activate
       segment-routing-ipv6
      exit-address-family
     !
     router bgp 65001 vrf vrf1
      bgp router-id 10.255.0.1
      !
      address-family ipv4 unicast
       redistribute connected
       sid vpn export 1:1::
       rd vpn export 65001:1
       rt vpn both 100:1
       export vpn
       import vpn
      exit-address-family
     !
     segment-routing-ipv6
      encapsulation source-address 2001:1::
      locator prefix 2001:1::/64

     References: draft-dawra-idr-srv6-vpn-04/05, draft-ietf-bess-srv6-services-00
     • draft-ietf-idr-bgp-prefix-sid-27 (BGP Prefix-SID)
     • New BGP Prefix-SID sub-types: SRv6 L3 Service TLV, SRv6 L2 Service TLV
     • Published implementations: vendor: IOS-XR, Huawei; OSS: nothing, or FRRouting(?)
  24. SRv6 DCN with OpenStack :: Next Steps
     • Current
       ◦ End.DX4 & T.Encaps only
       ◦ networking-sr :: Neutron ML2 plugin
     • Data plane
       ◦ Performance improvement (see appendix-5)
     • Control plane
       ◦ BGP VPNv4 integration
       ◦ Neutron :: less complication
       ◦ FRR :: standard MP-BGP SRv6 VPNv4
       ◦ We are working on a VPNv4-SRv6 implementation :)
     Let's discuss further here!
     Dept. Verda, NW Development Team, Hiroki Shirokura <[email protected]>, job-apply (position/564)
  25. Appendix-1: Pseudo End.DT4 :: Workaround for a Linux Limitation
     [Figure: SRv6 node with Locator A::/96 and two VRFs; A::1 acts as pseudo End.DT4 for VRF1,
     A::2 for VRF2.]

     [Configure for Pseudo End.DT4, table 1]
     bash# ip link add vrf1 type vrf table 1
     bash# ip route add 169.254.99.10 dev vrf1
     bash# ip route add A::1/128 encap seg6local \
             action End.DX4 nh4 169.254.99.10 dev eth0

     [Configure for Pseudo End.DT4, table 2]
     bash# ip link add vrf2 type vrf table 2
     bash# ip route add 169.254.99.20 dev vrf2
     bash# ip route add A::2/128 encap seg6local \
             action End.DX4 nh4 169.254.99.20 dev eth0
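     For reference, newer kernels (Linux 5.11 and later, well after this deck) added a native
     End.DT4 that takes the VRF table directly, which removes the need for the nh4 trick above.
     A sketch:

     [Native End.DT4 on newer kernels]
     bash# ip route add A::1/128 encap seg6local \
             action End.DT4 vrftable 1 dev vrf1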
  26. Appendix-2: Encap/Decap Rules on a Network Node
     Encap (destination = IPv4 address of the VM; SID list = Locator (HV address) + Function
     (ID of each tenant); sorry, the addresses are dummy values):

     [NetworkNode]# ip route show vrf vrf1
     10.0.0.113 encap seg6 mode encap segs 1 [ 2001::ffc1:108 ] dev vrf5c0594737b87 scope link
     10.0.0.114 encap seg6 mode encap segs 1 [ 2001::ffc1:108 ] dev vrf5c0594737b87 scope link
     10.0.0.115 encap seg6 mode encap segs 1 [ 2001::ffc2:108 ] dev vrf5c0594737b87 scope link

     Decap (SID = Locator (NN address) + Function (tenant identifier); the nh4 addresses are
     IPv4 addresses identifying each tenant, assigned to the VRF interfaces; that is the magic
     that makes End.DX4 look up the right VRF):

     [NetworkNode]# ip -6 route show table local
     local 2001::aaaa:102 encap seg6local action End.DX4 nh4 169.254.1.2 dev vrf2
     local 2001::aaaa:104 encap seg6local action End.DX4 nh4 169.254.1.4 dev vrf4
     local 2001::aaaa:108 encap seg6local action End.DX4 nh4 169.254.1.8 dev vrf8
  27. Appendix-3: How Does sr-agent Get the VRF Info?
     VM configuration: 1. Create network, 2. Create VM, 3. Notify VM info, 4. Run VM, 5. Create tap
     NW configuration: 6. Detect tap, 7. Update/Get port info, 8. Config tap, 9. Create VRF,
     10. Set SRv6 encap/decap rules

     Step 7 carries the VRF info in port/binding:profile:

     {
       "port": {
         "binding:profile": {
           "segment_node_id": "2001::ffc1",  # Locator of the node where the VM with this port runs
           "vrf": "vrf1",                    # VRF IF name for the port: "vrf" + tenant_id + network_id
           "vrf_cidr": "169.254.1.0/24",     # IP CIDR of the VRF for the port
           "vrf_ip": "169.254.1.1"           # IP address of the VRF for the port
         }
       }
     }
  28. Appendix-4: Service Plugin for the New API to Add SRv6 Encap Rules
     srv6_encap_network resource:
     • {network,tenant,project}_id: ID of the network tenant
     • encap_rules: list of SRv6 encap rules
       ◦ destination: address of the specified destination (VIP)
       ◦ nexthop: SID list for the SRH
     • id: ID of the encap rule
     A hypothetical API call is sketched below.
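     A hypothetical request against this extension might look like the following. Only the
     resource and field names come from the slide; the URL path, port, and payload shape are
     assumptions in the style of standard Neutron API extensions.

     [Sketch: add an SRv6 encap rule via the srv6_encap_network API]
     bash# curl -s -X POST http://controller:9696/v2.0/srv6_encap_networks \
             -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
             -d '{"srv6_encap_network": {
                    "network_id": "<network-id>",
                    "encap_rules": [
                      {"destination": "10.0.0.100/32",
                       "nexthop": ["2001::aaaa:102"]}]}}'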
  29. Appendix-4: NFV (LBaaS) and networking-sr with the New API
     1. LBaaS creates the VIP
     2. LBaaS adds an encap rule for the VIP via the srv6_encap_network API
        (tenant_id: tenant the user belongs to; network_id: network the VM connects to;
        encap_rules: destination is the VIP, nexthop is the SID of VRF1 on the Network Node)
     3. Neutron notifies the encap rule
     4. neutron-agent on the compute node sets the SRv6 encap rule:
        VIP encap seg6 mode encap segs <NetworkNode_VRF1_SID>
     The rendered iproute2 rule is sketched below.
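     Rendered as an iproute2 rule on the hypervisor, the result of step 4 could look like this
     (the VIP and SID are assumed values in the notation of appendix-2):

     [Sketch: VIP encap rule installed by neutron-agent in step 4]
     bash# ip route add 10.0.0.100/32 vrf vrf1 \
             encap seg6 mode encap segs 2001::aaaa:102 dev vrf1   # segs = Network Node VRF1 SID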
  30. Appendix-5: Performance Issues & Current Work
     • We have built an automated load-test environment
     • SRv6 forwarding vs. IPv6 forwarding
       ◦ Comparison between software data planes
       ◦ TSO evaluation
       ◦ Performance impact of new features
     • Tests run automatically and generate reports (including figures)
  31. Appendix-6: BGP Prefix-SID Type 5 (SRv6 L3 Service)
     • Extends the sub-types of the long-proposed Prefix-SID attribute:
       ◦ Sub-type 1: Label Index (SR-MPLS)
       ◦ Sub-type 2: IPv6 SID
       ◦ Sub-type 3: Originator SRGB (SR-MPLS)
       ◦ Sub-type 5: SRv6 L3 Service (new)
       ◦ Sub-type 6: SRv6 L2 Service (new)
     • BGP Prefix-SID path attribute: attribute type = 40 (BGP-Prefix-SID)
       ◦ https://tools.ietf.org/html/draft-ietf-idr-bgp-prefix-sid-27
     • Wireshark can already dissect Prefix-SID types 1, 2, and 3
       ◦ We developed a dissector for type 5 (SRv6 L3 Service) separately
       ◦ We plan to propose it upstream
  32. Appendix-7: eBGP Underlay / iBGP Overlay
     • eBGP for the underlay, iBGP between nodes at the same level for the overlay:
       ◦ Unsupported AFIs and the like can still be propagated to distant routers
       ◦ With eBGP, path attributes are transitive but AFIs are not
       ◦ Commonly used with EVPN and similar designs
     • LINE's DC does not do this, because nodes at the same level have different AS numbers:
       ◦ SRv6-VPN uses the existing VPNv4 AFI (afi/safi=1/128), so eBGP is fine in our case
       ◦ No need to manage separate BGP instances for underlay and overlay
  33. Appendix-8: FRR's Prefix-SID Bug
     • We reported a problem in the implementation of the Prefix-SID path attribute (type = 40):
       ◦ Unknown sub-types cannot be skipped (== a transitive path attribute without transitivity)
       ◦ The observed behavior is a message fetch error causing a NOTIFICATION and session close
       ◦ We filed an issue upstream and proposed a fix PR; it was merged, but the bug was still
         not fully fixed... (sorry...)