Slide 1

Slide 1 text

“Do you have a virtual router?” Discuss how to use virtual routers
LINE Corporation
Hiroki Shirokura
https://www.janog.gr.jp/meeting/janog51/VR-en/

Slide 2

Slide 2 text

I’m Hiroki Shirokura from LINE
• Senior Software Engineer @ Private Cloud
• Responsibility: SDN, Cloud Networking
• Design / Implementation / Reliability
• SRv6, BGP OSS Upstream Developer
• FRRouting, ExaBGP, etc.
• https://github.com/slankdev/
• HN: slankdev
I work on both the control plane and the data plane.

Slide 3

Slide 3 text

LINE’s Growth Strategy :: Smart Portal == Realising Super App
Our goal is to create a "Smart Portal" that seamlessly connects people, information, services, companies, and brands to whatever they need, both online and offline, using "LINE" as a gateway.
(Reference: our factbook)

Slide 4

Slide 4 text

3 SWEs for stable services
● system operator
  ○ customer support
  ○ maintenance
● software developer
● project manager
1 SWE for newly provided services
● system architect
● software architect/developer
● project manager

Slide 5

Slide 5 text

What is constructing Cloud Networking @ LINE
[Diagram labels: (3,a) VPC Networking, VM, ASBR, ASBR, DCI-BB, DCI-BB, internet, Other Region, VPC Gateway (3,b), NAT (3,a), L4LB (2,a), L7LB (3,a), DNS (3,a), (1,a)]
● Platform
  ○ (1) Commercial Network Box
    ■ Routing Platform: Clos-Fabric, External Networking
    ■ Security Platform: Firewall, IDS, etc. for Fintech
  ○ (2) Baremetal x86 Server (XDP, TC-BPF, DPDK, Linux-kernel)
  ○ (3) Virtual x86 Server (XDP, TC-BPF, DPDK, Linux-kernel)
● How we use it
  ○ (a) Shared Dplane: many tenants use the same cluster at the same time
  ○ (b) Dedicated Dplane: only one tenant uses its cluster(s)
● Our Recent Interest Landscape in Operation
  ○ Issue: Noisy neighbor in (3,a)
  ○ Challenge: Development Scalability
  ○ Out-of-Focus (currently):
    ■ Smart-NIC, Programmable Dataplane
    ■ Big changes to Hypervisor Networking
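The (platform, usage) labels on this slide can be read as a small classification table. The sketch below is only an illustration of that reading: the enum and function names are made up for this example, and the mapping covers only the functions whose label is unambiguous in the diagram text.

```python
from enum import Enum


class Platform(Enum):
    COMMERCIAL_BOX = 1   # (1) Commercial Network Box
    BAREMETAL_X86 = 2    # (2) Baremetal x86 Server
    VIRTUAL_X86 = 3      # (3) Virtual x86 Server


class Dplane(Enum):
    SHARED = "a"         # (a) many tenants share one cluster
    DEDICATED = "b"      # (b) one tenant per cluster


# Labels as they appear next to each function in the slide's diagram.
NETWORK_FUNCTIONS = {
    "VPC Gateway": (Platform.VIRTUAL_X86, Dplane.DEDICATED),
    "NAT": (Platform.VIRTUAL_X86, Dplane.SHARED),
    "L4LB": (Platform.BAREMETAL_X86, Dplane.SHARED),
    "L7LB": (Platform.VIRTUAL_X86, Dplane.SHARED),
    "DNS": (Platform.VIRTUAL_X86, Dplane.SHARED),
}


def in_noisy_neighbor_class(name: str) -> bool:
    """True if the function sits in class (3,a), where the slide reports the issue."""
    return NETWORK_FUNCTIONS[name] == (Platform.VIRTUAL_X86, Dplane.SHARED)


exposed = sorted(nf for nf in NETWORK_FUNCTIONS if in_noisy_neighbor_class(nf))
print(exposed)  # ['DNS', 'L7LB', 'NAT']
```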

Slide 6

Slide 6 text

Many Network/Software Challenges Are Already Published
linedevday/2020/sessions/2076
linedevday/2019/sessions/F1-7
linedevday/2019/sessions/E1-2
janog48/linenfv
janog48/linedns
janog45/srv6xdp
line.connpass/184927
line.connpass/184927
nvidia/gtc
janog43/line
wide meeting 2019
janog50/multiaz
janog49/dcnsrv6

Slide 7

Slide 7 text

Verda is our private cloud
[Diagram labels: Region-A, Region-B, Region-C, Internet, CLOS, Aggregate NW, Verda (x3), Dedicated Infra (x3)]
Verda VMs: 100,000+
Verda HVs: 10,000+
Dedicated HVs: ~300
(2023.01.01)

Slide 8

Slide 8 text

When we met the “new year greeting rush” from all of you

Slide 9

Slide 9 text

0.1-% Application consumes 50+% Bandwidth
[Graph label: 70 Gbps]
Amazingly, 50+% of bandwidth is for a single platform.
0.1-% consumes 50+% bandwidth @ LINE’s DCN
#VIP = 6000+
#VIP for single platform = 4
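The "0.1-%" figure is consistent with the VIP counts on this slide, assuming it refers to the platform's share of all VIPs. A quick check, treating "6000+" as exactly 6000 (an assumption made only for this arithmetic):

```python
# Figures quoted on the slide; "6000+" is rounded down to 6000 (assumption).
TOTAL_VIPS = 6000
PLATFORM_VIPS = 4


def vip_share_percent(platform_vips: int, total_vips: int) -> float:
    """Return the platform's share of all VIPs, in percent."""
    return 100.0 * platform_vips / total_vips


share = vip_share_percent(PLATFORM_VIPS, TOTAL_VIPS)
print(f"{share:.3f}% of VIPs")  # 0.067% of VIPs, i.e. under 0.1%
```

Because the real total is above 6000, the true share is slightly smaller than this value.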

Slide 10

Slide 10 text

Shared Dataplane’s Noisy Neighbor
Basics
• When we operate L4LB (Baremetal, Shared), we usually don’t face noisy-neighbor effects
• When we operate L7LB, NAT (Virtual, Shared), we usually face noisy-neighbor effects
• It is difficult to judge which is the cause: “VM Network Interface”, “Network Function Kind”, or both.
WE WANT TO PROTECT ONE TENANT’S AVAILABILITY FROM ANOTHER TENANT’S RUSH
2 Big Topics
• Performance of NF’s Implementation
  • L4LB -> XDP
  • L7LB -> HAProxy
  • NAT -> Netfilter + ip-rule + lwt-bpf
• Performance of VM networking
  • We have not yet started using multi-queue virtio-net in the hypervisor.
  • We faced a kernel panic when RPS was enabled with LWT-BPF

Slide 11

Slide 11 text

Preparing Software Based Network Function
• Approaches:
  • Approach-1: Dplane development using eBPF, DPDK, etc.
  • Approach-2: Use existing software like linux-kernel, ovs, etc.
• In both cases, scale-out is necessary during operation
  • It is really difficult to achieve high performance in some cases (nat, tls-termination, etc.)
  • But we just want to scale out the networking, instead of scaling up
  • But we always feel that single-box performance is important against noisy neighbors
• eBPF vs. DPDK
  • XDP_PASS is amazing because it lets us focus only on the FAST PATH
  • Our priority: development cost matters more than fine tuning

Slide 12

Slide 12 text

SDN System Architecture Design Knowledge (2)
NAT dplane performance issue and its kernel panic
● About Distributed NAT routing architecture: linedevday/2020/2076, gihyo/line2021/0002
● Background
  ○ Increasing users after 1st release
  ○ There were 6 Linux servers as NAT dplane
    ■ They were working as act/act, with no session state sync
    ■ 8vCPU/8GB-RAM x6 = 48vCPU
    ■ RPS/RSS were disabled → Only 6vCPU were working
[Diagram labels: Client, NAT Dplane, core, Internet; panels: Immediately after release, Increased users]
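The vCPU figures above work out as follows. This is a small arithmetic sketch that assumes, as the slide implies, that with RPS/RSS disabled exactly one vCPU per server handles packets:

```python
SERVERS = 6
VCPUS_PER_SERVER = 8
# Assumption taken from the slide's "Only 6vCPU are working":
# with RPS/RSS disabled, one vCPU per server does the packet processing.
ACTIVE_VCPUS_PER_SERVER = 1

total_vcpus = SERVERS * VCPUS_PER_SERVER
active_vcpus = SERVERS * ACTIVE_VCPUS_PER_SERVER
usable_fraction = active_vcpus / total_vcpus

print(total_vcpus, active_vcpus, f"{usable_fraction:.1%}")  # 48 6 12.5%
```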

Slide 13

Slide 13 text

SDN System Architecture Design Knowledge (2)
NAT dplane performance issue and its kernel panic
● We enabled RPS to use all cores
● A few days later… weird kernel panics occurred on some servers
● A few weeks later… all dplane servers went down one by one, due to the same issue…
  ○ There are some pressure points (秘孔) that bring the server down...
[Diagram labels: Internet, RPS enabled, core, Increased users]
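On Linux, RPS is configured per receive queue by writing a hexadecimal CPU bitmap to `/sys/class/net/<dev>/queues/rx-<n>/rps_cpus`. The sketch below only builds that bitmap and the path; it does not write anything. The device name `eth0` and the helper names are assumptions for illustration, and the talk does not say which exact settings were used.

```python
from typing import Iterable


def rps_cpu_mask(cpus: Iterable[int]) -> str:
    """Build the hex CPU bitmap for rps_cpus (comma-separated 32-bit words)."""
    bitmap = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU ids must be non-negative")
        bitmap |= 1 << cpu
    words = []
    while True:
        words.append(bitmap & 0xFFFFFFFF)  # lowest 32 bits first
        bitmap >>= 32
        if bitmap == 0:
            break
    words.reverse()  # the kernel expects the most significant word first
    return ",".join(f"{word:08x}" for word in words)


def rps_path(dev: str, rx_queue: int) -> str:
    """Return the sysfs file that holds the RPS mask for one receive queue."""
    return f"/sys/class/net/{dev}/queues/rx-{rx_queue}/rps_cpus"


# Example: spread one queue's packets over all 8 vCPUs of a server.
print(rps_path("eth0", 0), "<-", rps_cpu_mask(range(8)))
```

As the slides describe, enabling this together with LWT-BPF led to kernel panics in this environment, so treat it as a tuning knob to validate in a lab first.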

Slide 14

Slide 14 text

SDN System Architecture Design Knowledge (2)
NAT dplane performance issue and its kernel panic
● We enabled RPS to use all cores
● A few days later… weird kernel panics occurred on some servers
● A few weeks later… all dplane servers went down one by one, due to the same issue…
  ○ There are some pressure points (秘孔) that bring the server down...
[Diagram labels: Internet, RPS enabled, core, Increased users]
Kernel Panic! It was HELL...

Slide 15

Slide 15 text

SDN System Architecture Design Knowledge (2)
NAT dplane performance issue and its kernel panic
● Then, we disabled RPS again
● And we scaled out the dplane nodes x3 (6 servers → 18 servers); now it’s 68 servers
● Lessons learned
  ○ (1) If your environment isn’t the majority case, be careful with tuning (LWT-BPF, etc.)
  ○ (2) Scale-out is right
  ○ (3) Almost all user workloads were HTTPS/HTTP, so it was easy to maintain
  ○ (4) Operation Rehearsal
  ○ (5) Performance lab
With baremetal it would take time, but because it’s virtual, we completed the scaling operation in 3 days
[Diagram labels: Scaled out, Internet, core, Increased users]

Slide 16

Slide 16 text

RollingUpdate
● RollingUpdate Controller Capability
  ○ policy: what kind of action to take for each “Endpoint”
● Endpoint Capability
  ○ maintenance mode: stop routing advertisement, etc.

Slide 17

Slide 17 text

RollingUpdate
kind: RollingUpdate
metadata:
  name: update-20221015
spec:
  policy: machine-recreate
  preAction: maint-mode
  target:
    kind: RoutingGateway
    name: Gateway
  params: {...}
(1) create
(2) watch

Slide 18

Slide 18 text

RollingUpdate
kind: RollingUpdate
metadata:
  name: update-20221015
spec:
  policy: machine-recreate
  preAction: maint-mode
  target:
    kind: RoutingGateway
    name: Gateway
  params: {...}
status:
  childTargets:
  - kind: RoutingEndpoint
    name: Endpoint1
    state: NOT_STARTED
  - kind: RoutingEndpoint
    name: Endpoint2
    state: NOT_STARTED
  - kind: RoutingEndpoint
    name: Endpoint3
    state: NOT_STARTED
(3) status set

Slide 19

Slide 19 text

RollingUpdate
kind: RollingUpdate
metadata:
  name: update-20221015
spec:
  policy: machine-recreate
  preAction: maint-mode
  target:
    kind: RoutingGateway
    name: Gateway
  params: {...}
status:
  childTargets:
  - kind: RoutingEndpoint
    name: Endpoint1
    state: NOT_STARTED
  - kind: RoutingEndpoint
    name: Endpoint2
    state: NOT_STARTED
  - kind: RoutingEndpoint
    name: Endpoint3
    state: NOT_STARTED
(4) Update Endpoint1 as maint-mode
[Diagram label: Maintenance Mode]

Slide 20

Slide 20 text

RollingUpdate
kind: RollingUpdate
metadata:
  name: update-20221015
spec:
  policy: machine-recreate
  preAction: maint-mode
  target:
    kind: RoutingGateway
    name: Gateway
  params: {...}
status:
  childTargets:
  - kind: RoutingEndpoint
    name: Endpoint1
    state: NOT_STARTED
  - kind: RoutingEndpoint
    name: Endpoint2
    state: NOT_STARTED
  - kind: RoutingEndpoint
    name: Endpoint3
    state: NOT_STARTED
(5) Delete child NfvMachine
[Diagram labels: Maintenance Mode, Delete]

Slide 21

Slide 21 text

RollingUpdate
kind: RollingUpdate
metadata:
  name: update-20221015
spec:
  policy: machine-recreate
  preAction: maint-mode
  target:
    kind: RoutingGateway
    name: Gateway
  params: {...}
status:
  childTargets:
  - kind: RoutingEndpoint
    name: Endpoint1
    state: WAIT
  - kind: RoutingEndpoint
    name: Endpoint2
    state: NOT_STARTED
  - kind: RoutingEndpoint
    name: Endpoint3
    state: NOT_STARTED
(6) wait boot-up
(7) Reconcile to create NfvMachine for Endpoint
[Diagram labels: RE Ctrlr, Maintenance Mode, Creating]

Slide 22

Slide 22 text

RollingUpdate
kind: RollingUpdate
metadata:
  name: update-20221015
spec:
  policy: machine-recreate
  preAction: maint-mode
  target:
    kind: RoutingGateway
    name: Gateway
  params: {...}
status:
  childTargets:
  - kind: RoutingEndpoint
    name: Endpoint1
    state: WAIT
  - kind: RoutingEndpoint
    name: Endpoint2
    state: NOT_STARTED
  - kind: RoutingEndpoint
    name: Endpoint3
    state: NOT_STARTED
[Diagram labels: RE Ctrlr, Maintenance Mode, Created]

Slide 23

Slide 23 text

RollingUpdate
kind: RollingUpdate
metadata:
  name: update-20221015
spec:
  policy: machine-recreate
  preAction: maint-mode
  target:
    kind: RoutingGateway
    name: Gateway
  params: {...}
status:
  childTargets:
  - kind: RoutingEndpoint
    name: Endpoint1
    state: FINISHED
  - kind: RoutingEndpoint
    name: Endpoint2
    state: NOT_STARTED
  - kind: RoutingEndpoint
    name: Endpoint3
    state: NOT_STARTED
(8) disable maintenance mode
[Diagram label: Maintenance Mode]

Slide 24

Slide 24 text

RollingUpdate
kind: RollingUpdate
metadata:
  name: update-20221015
spec:
  policy: machine-recreate
  preAction: maint-mode
  target:
    kind: RoutingGateway
    name: Gateway
  params: {...}
status:
  childTargets:
  - kind: RoutingEndpoint
    name: Endpoint1
    state: WAIT
  - kind: RoutingEndpoint
    name: Endpoint2
    state: NOT_STARTED
  - kind: RoutingEndpoint
    name: Endpoint3
    state: NOT_STARTED
same procedures
Done
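Steps (4) to (8) of the walkthrough can be condensed into a small loop that handles one endpoint at a time. This is a rough sketch, not the actual controller: `RecordingOps` and its methods are hypothetical stand-ins that only log calls, only the `machine-recreate` flow is modelled, and the order of "disable maintenance mode" versus setting `FINISHED` is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ChildTarget:
    kind: str
    name: str
    state: str = "NOT_STARTED"


@dataclass
class RollingUpdate:
    name: str
    policy: str
    pre_action: str
    child_targets: List[ChildTarget] = field(default_factory=list)


class RecordingOps:
    """Hypothetical stand-in for the endpoint and machine APIs; it only logs calls."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []

    def set_maintenance(self, endpoint: str, enabled: bool) -> None:
        self.log.append((endpoint, "maint-on" if enabled else "maint-off"))

    def delete_machine(self, endpoint: str) -> None:
        self.log.append((endpoint, "delete-machine"))

    def wait_until_booted(self, endpoint: str) -> None:
        self.log.append((endpoint, "booted"))


def run_rolling_update(update: RollingUpdate, ops: RecordingOps) -> None:
    """Process child targets strictly one at a time, in list order."""
    use_maint = update.pre_action == "maint-mode"
    for child in update.child_targets:
        if use_maint:
            ops.set_maintenance(child.name, True)    # step (4)
        ops.delete_machine(child.name)               # step (5)
        child.state = "WAIT"                         # step (6)
        ops.wait_until_booted(child.name)            # step (7) is done by another controller
        if use_maint:
            ops.set_maintenance(child.name, False)   # step (8)
        child.state = "FINISHED"


update = RollingUpdate(
    name="update-20221015",
    policy="machine-recreate",
    pre_action="maint-mode",
    child_targets=[ChildTarget("RoutingEndpoint", f"Endpoint{i}") for i in (1, 2, 3)],
)
ops = RecordingOps()
run_rolling_update(update, ops)
print([child.state for child in update.child_targets])
```

Handling endpoints sequentially means at most one endpoint is out of service at any time, which is the property the maintenance-mode step is meant to protect.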

Slide 25

Slide 25 text

Growing Journey of RollingUpdate
● 2021.09 initial release
  ○ machine-recreate
● 2022.05 policy abstraction
  ○ container-refresh
  ○ loader-container-refresh
● 2022.08 no-maint-mode
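One way to picture "policy abstraction" is a registry that maps a policy name to the per-endpoint steps it performs, with maintenance mode as an optional wrapper. The sketch below is purely illustrative: the policy names come from the slide, but the step descriptions, the `register_policy` decorator, and the `plan` function are hypothetical, since the talk does not describe what each policy does internally.

```python
from typing import Callable, Dict, List

PolicyFn = Callable[[str], List[str]]
POLICIES: Dict[str, PolicyFn] = {}


def register_policy(name: str) -> Callable[[PolicyFn], PolicyFn]:
    """Register a function under a policy name so new policies plug in without core changes."""
    def decorator(fn: PolicyFn) -> PolicyFn:
        POLICIES[name] = fn
        return fn
    return decorator


@register_policy("machine-recreate")
def machine_recreate(endpoint: str) -> List[str]:
    return [f"delete machine of {endpoint}", f"wait for new machine of {endpoint}"]


@register_policy("container-refresh")
def container_refresh(endpoint: str) -> List[str]:
    # Hypothetical placeholder step; the real behaviour is not given in the slides.
    return [f"refresh containers on {endpoint}"]


@register_policy("loader-container-refresh")
def loader_container_refresh(endpoint: str) -> List[str]:
    # Hypothetical placeholder step; the real behaviour is not given in the slides.
    return [f"refresh loader container on {endpoint}"]


def plan(policy: str, endpoint: str, maint_mode: bool = True) -> List[str]:
    """Return the ordered steps for one endpoint, optionally wrapped in maintenance mode."""
    steps = POLICIES[policy](endpoint)
    if maint_mode:
        steps = [f"enable maint-mode on {endpoint}", *steps, f"disable maint-mode on {endpoint}"]
    return steps


print(plan("machine-recreate", "Endpoint1"))
print(plan("container-refresh", "Endpoint1", maint_mode=False))
```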

Slide 26

Slide 26 text

Conclusion
● For in-house software, eBPF is just one approach for us
  ○ A system architecture that allows us to scale overall performance is necessary
● For topics that are hard to discuss here, let’s talk at the reception or in the lobby!
  ○ Please visit the LINE booth!