Software engineering that supports LINE-original LBaaS

Yutaro Hayakawa
LINE Network Development Team Infrastructure Engineer
https://linedevday.linecorp.com/jp/2019/sessions/F1-7


LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay Software Engineering That Supports LINE-Original LBaaS > Yutaro Hayakawa > LINE Network Development Team Infrastructure Engineer
  2. Speaker: Yutaro Hayakawa > Joined the LINE Network Development Team as a new graduate this year > Works on development and operation of load balancers
  3. A Private Cloud Service for LINE Developers: Verda > OpenStack-Based Private Cloud > Since 2016
  4. Scale of Verda > 1,400 Hypervisors > Virtual Machines: 20,000+ last year, 35,000 now
  5. A Private Cloud Service for LINE Developers: Verda [figure: IaaS (Compute, Networking, Storage), Managed Services (K8S, Kafka, Redis, MySQL, …), PaaS (Function Platform, …), Load Balancer]
  6. Scales the LINE Applications: Verda Load Balancer as a Service (LBaaS) [figure: Services A, B, and C, each a group of VMs behind a virtually dedicated load balancer on a shared load balancer cluster]
  7. Two Types of Load Balancing for Different Requirements > Layer 7 Load Balancer (L7LB), a.k.a. Reverse Proxy > Layer 4 Load Balancer (L4LB)
  8. Users of the Verda LBaaS > Messenger, Family Services (LINE BLOG, LINE Clova) > Text Messages, Videos, Icon Images, Ads, Etc…
  9. Not Just "Operating" It, We Are "Developing" an LBaaS > L7 Load Balancing Service: Prepare Certificates, Optimize Resource Allocation > L4 Load Balancing Service: Efficient and Fast Data Plane, Completely Developed From Scratch > API Server and Orchestration Systems: Automation Friendly
  10. Why? Fundamental Problems: Operational Cost, Availability, Scalability
  11. In the Past: From the Beginning of LINE to 2016 > L4/L7 LB HW Appliances, 1 + 1 Active-Standby [figure: user traffic to Services A, B, and C, each a group of servers behind the appliance pair]
  12. Painful for Both Operators and Users: Operational Cost > Takes about 1–2 days to register backends > Cannot meet rapidly increasing demand > CLI-based manual operation
  13. Scalability and Availability Issue: the Session Table Exhaustion Problem
     > Session Table
     • Remembers Client → Backend Server mappings per TCP connection
     (e.g. Client 1 → Server A, Client 2 → Server D, Client 3 → Server A, … Client N → Server X)
     > Doesn't scale with large user traffic
     > A DoS attack causes a big outage
     • e.g. a TCP SYN flood attack
  14. Rethink the Load Balancing > Fully Automated > Reduce the Failure Domain > Scales With Large Traffic
  15. Ride on the Shoulders of "Tech Giants": Research and Development [timeline, 2010 to 2019: Microsoft Ananta[1], MS Research Duet[3], Facebook Talk at SRECon[4], Cloudflare Blog Post[2], Google Maglev[5]; "We Were Here" at 2016]
  16. A New Architecture: Multi-Tier N + 1 Load Balancer Cluster [figure: Router Tier → L4LB Tier → L7LB Tier → Servers]
  17. A New Architecture: Multi-Tier N + 1 Load Balancer Cluster > N + 1 Active-Active
  18. A New Architecture: Multi-Tier N + 1 Load Balancer Cluster > Stateless: No Session Table
  19. A New Architecture: Multi-Tier N + 1 Load Balancer Cluster > Software (L4LB and L7LB Tiers)
  20. Software Load Balancer > Runs on Commodity Server Hardware > 10 Times Cheaper Than Appliances (Per HTTPS Request) > Operated Like a Server
  21. Controller Design
     > Ordinary Python Web Application
     • Provides an API to interact with load balancer clusters
     • Fully automated
     > User Interface
     • GUI, CLI, or use the API directly
     > Common authentication with OpenStack
     > Revision management
     $ openstack verda lb create …
     $ openstack verda lb list
  22. L7 Load Balancer Tier Design
     > Use k8s for Resource Scheduling
     • 2 clusters for each Verda region (Active-Active)
     • k8s Node == Verda VM
     • Bind VIPs to Deployments
     [figure: VIP1–VIP6 spread across Node1/Node2 in Cluster1 and Cluster2 of Verda Region A]
  26. L4 Load Balancer Tier Design
     > Non-Orchestrated Physical Servers
     • Due to special network and performance requirements
     • VIP settings are replicated among multiple nodes
     • Data plane developed fully from scratch
     [figure: each VIP replicated across L4LB Node1–3 within each cluster; two clusters in Verda Region A]
  28. Why Did We Need To Build the L4 Load Balancer From Scratch?
     > The common problem of software-based load balancers is performance
     > The performance objective for a single L4LB instance was 7 Mpps
     > Difficult to achieve with existing load balancer software
  29. A Fast L4 Load Balancer in 500 LoC With XDP (eXpress Data Path)
     > "Fast path" of the Linux network stack
     • Hooks packets in the NIC driver, before the protocol stack
     > Lets you write the packet processing in very simple C code
     > Statically verifies the "safety" of the code
     [figure: C code is compiled and attached to the XDP hook in the NIC driver; packets are processed there before reaching the kernel protocol stack or user applications]
  30. How To "Keep" the Performance? Continuous Performance Testing
     > Performance is a "value" of the service
     > We need to continuously make sure we can maintain it
     > Like CI/CD
  31. How To Make the Performance Tests Reproducible? Fully Automated Performance Tests [figure: a developer's PR on GitHub triggers Drone (CI/CD), which runs a load test with a traffic generator in a unified test environment and reports the result]
  33. The Case of On-Demand Feature Implementation (My first task)

  34. Problem of the Stateless L4LB: Drawbacks of the Stateless Approach > Cannot do a "graceful" shutdown > Difficult to fail over
  35. Problem of the Stateless L4LB: Drawbacks of the Stateless Approach
     > Hash(5-tuple) → hash value → destination backend
     • 5-tuple: 1. Source IP 2. Destination IP 3. Source Port 4. Destination Port 5. Protocol Number
     [table: Hash Value → Destination: 0 → Backend A, 1 → Backend C, 2 → Backend B, 3 → Backend D, …]
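The stateless forwarding decision above can be sketched in Python (a minimal illustration only; the field names and the toy backend table are assumptions, and the production data plane is written in C on XDP, not Python):

```python
import hashlib

# Toy backend table indexed by hash bucket (illustrative, not production data).
BACKENDS = ["Backend A", "Backend C", "Backend B", "Backend D"]

def five_tuple_hash(src_ip, dst_ip, src_port, dst_port, proto):
    """Hash the TCP/IP 5-tuple into a bucket index."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % len(BACKENDS)

def pick_backend(src_ip, dst_ip, src_port, dst_port, proto):
    # Stateless: no session table. The same 5-tuple always hashes to the
    # same bucket, so every L4LB node independently agrees on the backend.
    return BACKENDS[five_tuple_hash(src_ip, dst_ip, src_port, dst_port, proto)]

# Any node computes the same answer for the same connection.
b1 = pick_backend("203.0.113.7", "198.51.100.1", 49152, 443, 6)
b2 = pick_backend("203.0.113.7", "198.51.100.1", 49152, 443, 6)
assert b1 == b2
```

Because the decision is a pure function of the packet, no per-connection state has to be stored or synchronized between the N + 1 active nodes.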
  36. Problem of the Stateless L4LB: Cannot Do a "Graceful" Shutdown & Difficult To Fail Over
     [tables: before, 0 → Server A, 1 → Server C, 2 → Server B, 3 → Server D; after Server A is removed, 0 → Server D, 1 → Server C, 2 → Server B, 3 → Server B, so existing connections are remapped to different servers]
  37. Consistent Hashing > A special type of hash function that reduces the possibility of connection disruption when the hash table is updated > We use Maglev hashing [5]
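A compact sketch of the Maglev table-population algorithm from [5]: each backend gets a preference permutation over the (prime-sized) table, and slots are filled round-robin. The table size and hash construction here are illustrative choices, not the production parameters:

```python
import hashlib

def _h(name: str, seed: str) -> int:
    """Deterministic 64-bit hash of a backend name, keyed by seed."""
    return int.from_bytes(hashlib.sha256(f"{seed}:{name}".encode()).digest()[:8], "big")

def maglev_table(backends, m=13):
    """Build a Maglev lookup table of prime size m (toy-sized here)."""
    offsets = {b: _h(b, "offset") % m for b in backends}
    skips = {b: _h(b, "skip") % (m - 1) + 1 for b in backends}  # coprime with prime m
    nexts = {b: 0 for b in backends}
    table = [None] * m
    filled = 0
    while filled < m:
        for b in backends:
            # Walk b's preference permutation until a free slot is found.
            while True:
                slot = (offsets[b] + nexts[b] * skips[b]) % m
                nexts[b] += 1
                if table[slot] is None:
                    table[slot] = b
                    filled += 1
                    break
            if filled == m:
                break
    return table

t1 = maglev_table(["A", "B", "C"])
t2 = maglev_table(["A", "C"])  # backend B removed
# `same` measures how many slots kept their backend across the change;
# Maglev aims to keep this high while staying evenly balanced.
same = sum(1 for a, b in zip(t1, t2) if a == b)
```

The round-robin fill keeps the table nearly evenly balanced across backends, which is why a connection's bucket rarely moves when an unrelated backend is added or removed.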
  38. Use Cases Where Consistent Hashing Is Not Enough
     > Media Platform
     • Connections suddenly disrupted during file uploads, video playback, …
     > Ads Platform
     • Ad impressions missed due to connection resets …
     > The problem was a blocker when the Media Platform migrated to Verda
  39. Session Caching
     > Look up the session cache (Client → Server mapping) first
     • Hit: forward to the cached server
     • Miss: look up the hash table (Hash → Server mapping) and update the session cache
     [tables: Session Cache: Client1 → Server A, Client2 → Server C, Client3 → Server B, Client4 → Server D, …; Hash Table: 0 → Server A, 1 → Server C, 2 → Server B, 3 → Server D, …]
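The lookup flow above can be sketched as follows (a toy model; the class name and data structures are assumptions for illustration, not the production C/XDP implementation):

```python
class SessionCachingLB:
    """Check the client->server session cache first; on a miss, fall back
    to the stateless hash table and record the mapping for next time."""

    def __init__(self, hash_table):
        self.hash_table = hash_table   # bucket -> server (Maglev-style table)
        self.session_cache = {}        # 5-tuple -> server

    def forward(self, conn):
        server = self.session_cache.get(conn)      # 1. session cache lookup
        if server is None:                         # miss:
            bucket = hash(conn) % len(self.hash_table)
            server = self.hash_table[bucket]       # 2. hash table lookup
            self.session_cache[conn] = server      # 3. update session cache
        return server

lb = SessionCachingLB(["Server A", "Server C", "Server B", "Server D"])
conn = ("203.0.113.7", "198.51.100.1", 49152, 443, 6)
first = lb.forward(conn)
# Even if the hash table is reshuffled (e.g. a backend removed),
# the cached connection sticks to its original server.
lb.hash_table = ["Server D", "Server C", "Server B", "Server B"]
assert lb.forward(conn) == first
```

This is exactly why the session table exhaustion problem returns: the cache grows with the number of live connections, so unbounded (or attacker-generated) connections can exhaust it.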
  40. Session Table Exhaustion Problem Again!

  41. Stateless vs Stateful
     > Stateless Hashing
     • Pros: Simple and scalable
     • Cons: Difficult to fail over
     > Stateful Session Caching
     • Pros: Easy to fail over
     • Cons: Vulnerable to SYN floods
  42. Solution: Hybrid Approach
     > Detect the SYN flood on the L4LB
     > Fall back to stateless mode for a while when a SYN flood is detected
     [figure: SYN/s over time crossing a threshold: "SYN Flood!!!"]
  43. Achievement > Took about 3 months to implement > Already deployed to production > The Media Platform team successfully migrated to Verda
  44. Future Work

  45. Integrating With Other Verda Services > Integration with the Managed Kubernetes Service > Use our LBaaS as an "Ingress" or "type: LoadBalancer"
  46. Adapting Our Load Balancers to a New Network Architecture > Native SRv6 support > An SRv6 load balancer (?)
  47. (Again) Ride on the Shoulders of "Tech Giants": Research and Development [timeline, 2012 to 2019: MS Research Duet[3], Facebook Talk at SRECon[4], Google Maglev[5], Fastly Faild[6], Facebook Katran[7], GitHub GLB[8] (NEW!); "We Are Here" at 2018]
  48. References
     [1] Patel, Parveen, et al. "Ananta: Cloud Scale Load Balancing." ACM SIGCOMM Computer Communication Review, Vol. 43, No. 4. ACM, 2013.
     [2] https://blog.cloudflare.com/cloudflares-architecture-eliminating-single-p/
     [3] Gandhi, Rohan, et al. "Duet: Cloud Scale Load Balancing with Hardware and Software." ACM SIGCOMM Computer Communication Review, Vol. 44, No. 4. ACM, 2014.
     [4] https://www.usenix.org/conference/srecon15/program/presentation/shuff
     [5] Eisenbud, Daniel E., et al. "Maglev: A Fast and Reliable Software Network Load Balancer." 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). 2016.
     [6] Araújo, João Taveira, et al. "Balancing on the Edge: Transport Affinity Without Network State." 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). 2018.
     [7] https://github.com/facebookincubator/katran
     [8] https://github.com/github/glb-director
  49. Thank You!