“Do you have a virtual router?” Discuss how to use virtual routers

JANOG51 Meeting
https://www.janog.gr.jp/meeting/janog51/vr/

Verda Network Development Team, LINE Corporation
Hiroki Shirokura

LINE Developers

January 18, 2023

Transcript

  1. “Do you have a virtual router?” Discuss how to use virtual routers

    LINE Corporation, Hiroki Shirokura
    https://www.janog.gr.jp/meeting/janog51/VR-en/
  2. I’m Hiroki Shirokura from LINE

    • Senior Software Engineer @ Private Cloud
    • Responsibility: SDN, Cloud Networking
      ◦ Design / Implementation / Reliability
    • SRv6, BGP OSS Upstream Developer
      ◦ FRRouting, ExaBGP, etc.
    • https://github.com/slankdev/
    • HN: slankdev
    • Both control plane and data plane
  3. LINE’s Growth Strategy :: Smart Portal == Realising Super App

    Our goal is to create a "Smart Portal" that seamlessly connects people, information, services, companies, and brands to whatever they need, both online and offline, using "LINE" as a gateway. (Reference: our factbook)
  4. 3 SWEs for stable-services

    • system operator
      ◦ customer support
      ◦ maintenance
    • software developer
    • project manager

    1 SWEs for newly-provided-services
    • system architect
    • software architect/developer
    • project manager
  5. What Cloud Networking @ LINE is made of

    [Diagram: VMs, ASBRs, DCI-BBs, the internet, and another region, with components labeled VPC Networking (3,a), VPC Gateway (3,b), NAT (3,a), L4LB (2,a), L7LB (3,a), DNS (3,a), and (1,a)]

    • Platform
      ◦ (1) Commercial Network Box
        ▪ Routing Platform: Clos-Fabric, External Networking
        ▪ Security Platform: Firewall, IDS, etc. for Fintech
      ◦ (2) Baremetal x86 Server (XDP, TC-BPF, DPDK, Linux-kernel)
      ◦ (3) Virtual x86 Server (XDP, TC-BPF, DPDK, Linux-kernel)
    • How we use it
      ◦ (a) Shared Dplane: many tenants use the same cluster at the same time
      ◦ (b) Dedicated Dplane: only one tenant uses its cluster(s)
    • Landscape of our recent interests in operation
      ◦ Issue: Noisy neighbor in (3,a)
      ◦ Challenge: Development Scalability
      ◦ Out of focus (currently):
        ▪ Smart-NIC, Programmable Dataplane
        ▪ Big changes to Hypervisor Networking
  6. Many Network/Software Challenges Are Already Published

    • linedevday/2020/sessions/2076
    • linedevday/2019/sessions/F1-7
    • linedevday/2019/sessions/E1-2
    • janog48/linenfv
    • janog48/linedns
    • janog45/srv6xdp
    • line.connpass/184927
    • line.connpass/184927
    • nvidia/gtc
    • janog43/line
    • wide meeting 2019
    • janog50/multiaz
    • janog49/dcnsrv6
  7. Verda is our private cloud

    [Diagram labels: Region-A, Region-B, Region-C, Internet, CLOS Aggregate NW, Verda (x3), Dedicated Infra (x3)]

    • Verda VMs: 100,000+
    • Verda HVs: 10,000+
    • Dedicated HVs: ~300
    (as of 2023.01.01)
  8. Less than 0.1% of applications consume 50+% of bandwidth @ LINE’s DCN

    Amazingly, 50+% of the bandwidth goes to a single platform.
    • #VIP = 6000+
    • #VIP for the single platform = 4
    [Chart label: 70 Gbps]
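The "less than 0.1%" figure can be checked against the VIP counts on the slide. This is only a back-of-the-envelope sketch: it assumes the percentage is measured in VIPs, and it treats "6000+" as exactly 6000, so the real share is at most the value computed here.

```python
# Share of VIPs that belong to the single heavy platform.
total_vips = 6000      # the slide says "6000+", so this is a lower bound
platform_vips = 4      # VIPs of the single platform

share_percent = platform_vips / total_vips * 100
print(f"{share_percent:.3f}% of VIPs")  # below the 0.1% quoted on the slide
```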
  9. Shared Dataplane’s Noisy Neighbor Basics

    • When we operate L4LB (Baremetal, Shared), we usually don’t see noisy-neighbor effects
    • When we operate L7LB and NAT (Virtual, Shared), we usually do see noisy-neighbor effects
    • It is difficult to judge which is the cause: the “VM Network Interface”, the “Network Function Kind”, or both

    WE WANT TO PROTECT ONE TENANT’S AVAILABILITY FROM ANOTHER TENANT’S RUSH

    2 Big Topics
    • Performance of the NF implementation
      ◦ L4LB -> XDP
      ◦ L7LB -> HAProxy
      ◦ NAT -> Netfilter + ip-rule + lwt-bpf
    • Performance of VM networking
      ◦ We have not started using multi-queue virtio-net in the hypervisor yet
      ◦ We hit a kernel panic when RPS was enabled together with LWT-BPF
  10. Preparing Software Based Network Function

    • Approaches:
      ◦ Approach-1: Dplane development using eBPF, DPDK, etc.
      ◦ Approach-2: Use existing software like linux-kernel, ovs, etc.
    • In both cases, scaling out is necessary during operation
    • It is really difficult to achieve high performance in some cases (NAT, TLS termination, etc.)
    • We just want to scale out the networking instead of scaling up
    • But we always feel that single-box performance matters for the noisy-neighbor problem
    • eBPF vs. DPDK
      ◦ XDP_PASS is great: it hands packets to the kernel stack, so we can focus on the FAST PATH only
      ◦ Our priority: development cost matters more than fine tuning
  11. SDN System Architecture Design Knowledge(2) NAT dplane performance issue and its kernel panic

    • About the distributed NAT routing architecture: linedevday/2020/2076, gihyo/line2021/0002
    • Background
      ◦ Users increased after the 1st release
      ◦ There were 6 Linux servers as the NAT dplane
        ▪ They work active/active, with no session-state sync
        ▪ 8vCPU/8GB-RAM x6 = 48vCPU
        ▪ RPS/RSS are disabled → only 6 vCPUs are working

    [Diagram: Internet, NAT Dplane, core, and Client, shown immediately after release and after users increased]
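The "48 vCPU, but only 6 working" arithmetic above can be sketched as follows. The function name and the one-RX-queue-per-server default are assumptions made for illustration; the slide only states the totals.

```python
def active_rx_cores(servers, vcpus_per_server, rx_queues_per_server=1, rps_enabled=False):
    """Cores across the cluster that actually process received packets.

    Assumption: without RPS/RSS each RX queue is served by a single core,
    and each server has one RX queue (hypothetical default).
    """
    if rps_enabled:
        per_server = vcpus_per_server
    else:
        per_server = min(rx_queues_per_server, vcpus_per_server)
    return servers * per_server


total_vcpus = 6 * 8                       # 8 vCPU x 6 servers
working = active_rx_cores(servers=6, vcpus_per_server=8)
print(total_vcpus, working)
```

Under these assumptions only one core in eight does packet work, which is why enabling RPS looked attractive at this point in the story.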
  12. SDN System Architecture Design Knowledge(2) NAT dplane performance issue and its kernel panic

    • We enabled RPS to use all cores
    • A few days later, strange kernel panics occurred on some servers
    • A few weeks later, all dplane servers went down one by one, due to the same issue
      ◦ There were some hidden weak points (秘孔, "pressure points") that took a server down

    [Diagram: Internet, core, and the RPS-enabled NAT dplane under increased users]
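On Linux, RPS is configured per RX queue by writing a hexadecimal CPU bitmap to `/sys/class/net/<dev>/queues/rx-<n>/rps_cpus`. The sketch below only computes that bitmap for an 8-vCPU server; the interface name `eth0` and queue index `0` are hypothetical, and the talk does not say how the setting was applied in production. Given the kernel panics reported with LWT-BPF, this illustrates the mechanism rather than recommending it.

```python
def rps_cpu_mask(cpu_ids):
    """Build the hex bitmap string expected by rps_cpus.

    Bitmaps wider than 32 bits are written as comma-separated 32-bit
    groups, most significant group first.
    """
    mask = 0
    for cpu in cpu_ids:
        if cpu < 0:
            raise ValueError("CPU ids must be non-negative")
        mask |= 1 << cpu

    words = []
    while True:
        words.append(mask & 0xFFFFFFFF)
        mask >>= 32
        if mask == 0:
            break
    words.reverse()
    return ",".join([format(words[0], "x")] + [format(w, "08x") for w in words[1:]])


# Hypothetical device and queue; the value would be written to this path.
path = "/sys/class/net/eth0/queues/rx-0/rps_cpus"
print(path, rps_cpu_mask(range(8)))
```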
  13. SDN System Architecture Design Knowledge(2) NAT dplane performance issue and its kernel panic

    • We enabled RPS to use all cores
    • A few days later, strange kernel panics occurred on some servers
    • A few weeks later, all dplane servers went down one by one, due to the same issue
      ◦ There were some hidden weak points (秘孔, "pressure points") that took a server down

    [Diagram: Internet, core, and the RPS-enabled NAT dplane under increased users]

    Kernel Panic! It was HELL...
  14. SDN System Architecture Design Knowledge(2) NAT dplane performance issue and its kernel panic

    • Then, we disabled RPS again
    • And we scaled out the dplane nodes x3 (6 servers → 18 servers); now it’s 68 servers
    • Lessons learned
      ◦ (1) If your environment is not a mainstream configuration, be careful with tuning (LWT-BPF, etc.)
      ◦ (2) Scaling out is the right choice
      ◦ (3) Almost all user workloads were HTTPS/HTTP, so it was easy to maintain
      ◦ (4) Operation rehearsal
      ◦ (5) Performance lab

    On bare metal this would take time, but because the nodes are virtual, we completed the scale-out operation in 3 days.

    [Diagram: Internet, core, and the scaled-out NAT dplane under increased users]
  15. RollingUpdate

    • RollingUpdate Controller Capability
      ◦ policy: what kind of action to take for each “Endpoint”
    • Endpoint Capability
      ◦ maintenance mode: stop route advertisement, etc.
  16. RollingUpdate

    kind: RollingUpdate metadata: name: update-20221015 spec: policy: machine-recreate preAction: maint-mode target: kind: RoutingGateway name: Gateway params: {...}

    (1) create (2) watch
  17. RollingUpdate

    kind: RollingUpdate metadata: name: update-20221015 spec: policy: machine-recreate preAction: maint-mode target: kind: RoutingGateway name: Gateway params: {...}

    status: childTargets: - kind: RoutingEndpoint name: Endpoint1 state: NOT_STARTED - kind: RoutingEndpoint name: Endpoint2 state: NOT_STARTED - kind: RoutingEndpoint name: Endpoint3 state: NOT_STARTED

    (3) status set
  18. RollingUpdate

    kind: RollingUpdate metadata: name: update-20221015 spec: policy: machine-recreate preAction: maint-mode target: kind: RoutingGateway name: Gateway params: {...}

    status: childTargets: - kind: RoutingEndpoint name: Endpoint1 state: NOT_STARTED - kind: RoutingEndpoint name: Endpoint2 state: NOT_STARTED - kind: RoutingEndpoint name: Endpoint3 state: NOT_STARTED

    (4) Update Endpoint1 as maint-mode

    [Diagram label: Maintenance Mode]
  19. RollingUpdate

    kind: RollingUpdate metadata: name: update-20221015 spec: policy: machine-recreate preAction: maint-mode target: kind: RoutingGateway name: Gateway params: {...}

    status: childTargets: - kind: RoutingEndpoint name: Endpoint1 state: NOT_STARTED - kind: RoutingEndpoint name: Endpoint2 state: NOT_STARTED - kind: RoutingEndpoint name: Endpoint3 state: NOT_STARTED

    (5) Delete child NfvMachine

    [Diagram labels: Maintenance Mode, Delete]
  20. RollingUpdate

    kind: RollingUpdate metadata: name: update-20221015 spec: policy: machine-recreate preAction: maint-mode target: kind: RoutingGateway name: Gateway params: {...}

    status: childTargets: - kind: RoutingEndpoint name: Endpoint1 state: WAIT - kind: RoutingEndpoint name: Endpoint2 state: NOT_STARTED - kind: RoutingEndpoint name: Endpoint3 state: NOT_STARTED

    (6) wait boot-up (7) Reconcile to create NfvMachine for Endpoint

    [Diagram labels: RE Ctrlr, Maintenance Mode, Creating]
  21. RollingUpdate

    kind: RollingUpdate metadata: name: update-20221015 spec: policy: machine-recreate preAction: maint-mode target: kind: RoutingGateway name: Gateway params: {...}

    status: childTargets: - kind: RoutingEndpoint name: Endpoint1 state: WAIT - kind: RoutingEndpoint name: Endpoint2 state: NOT_STARTED - kind: RoutingEndpoint name: Endpoint3 state: NOT_STARTED

    [Diagram labels: RE Ctrlr, Maintenance Mode, Created]
  22. RollingUpdate

    kind: RollingUpdate metadata: name: update-20221015 spec: policy: machine-recreate preAction: maint-mode target: kind: RoutingGateway name: Gateway params: {...}

    status: childTargets: - kind: RoutingEndpoint name: Endpoint1 state: FINISHED - kind: RoutingEndpoint name: Endpoint2 state: NOT_STARTED - kind: RoutingEndpoint name: Endpoint3 state: NOT_STARTED

    (8) disable maintenance mode

    [Diagram label: Maintenance Mode]
  23. RollingUpdate

    kind: RollingUpdate metadata: name: update-20221015 spec: policy: machine-recreate preAction: maint-mode target: kind: RoutingGateway name: Gateway params: {...}

    status: childTargets: - kind: RoutingEndpoint name: Endpoint1 state: WAIT - kind: RoutingEndpoint name: Endpoint2 state: NOT_STARTED - kind: RoutingEndpoint name: Endpoint3 state: NOT_STARTED

    The same procedure is repeated for the remaining endpoints. Done.
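Steps (4) to (8) of the RollingUpdate walkthrough can be sketched as a small in-memory simulation. This is not the actual controller: the `Endpoint` class, its fields, and the log messages are hypothetical, and the real system reconciles Kubernetes-style resources asynchronously rather than running a loop. Only the state names (NOT_STARTED, WAIT, FINISHED) and the step order come from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    name: str
    state: str = "NOT_STARTED"        # NOT_STARTED -> WAIT -> FINISHED
    maintenance: bool = False
    machine: Optional[str] = "old"    # hypothetical handle for the child NfvMachine


def rolling_update(endpoints: List[Endpoint]) -> List[str]:
    """Update endpoints one at a time and return a log of the steps taken."""
    log = []
    for ep in endpoints:
        ep.maintenance = True                      # (4) enter maintenance mode
        log.append(f"{ep.name}: maint-mode on")
        ep.machine = None                          # (5) delete the child NfvMachine
        log.append(f"{ep.name}: machine deleted")
        ep.state = "WAIT"                          # (6) wait for boot-up
        ep.machine = f"{ep.name}-new"              # (7) stand-in for the reconcile that recreates it
        log.append(f"{ep.name}: machine recreated")
        ep.maintenance = False                     # (8) leave maintenance mode
        ep.state = "FINISHED"
        log.append(f"{ep.name}: maint-mode off")
        # Before moving on, no endpoint may still be in maintenance mode.
        assert not any(e.maintenance for e in endpoints)
    return log


endpoints = [Endpoint(f"Endpoint{i}") for i in (1, 2, 3)]
for line in rolling_update(endpoints):
    print(line)
print([ep.state for ep in endpoints])
```

Processing one endpoint at a time keeps the other endpoints advertising routes while one is out of service, which is the point of entering maintenance mode before deleting the machine.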
  24. Growing Journey of RollingUpdate

    • 2021.09 initial release
      ◦ machine-recreate
    • 2022.05 policy abstraction
      ◦ container-refresh
      ◦ loader-container-refresh
    • 2022.08 no-maint-mode
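  25. Conclusion

    • In-house software and eBPF are just one approach for us
      ◦ What is necessary is a system architecture that allows us to scale the performance of the whole system
    • For topics that are hard to discuss here, let’s talk at the networking reception or in the lobby!
      ◦ Please come visit the LINE booth!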