Software engineering that supports LINE-original LBaaS

Yutaro Hayakawa
LINE Network Development Team Infrastructure Engineer
https://linedevday.linecorp.com/jp/2019/sessions/F1-7


LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay Software Engineering That Supports LINE-Original LBaaS > Yutaro Hayakawa > LINE Network Development Team Infrastructure Engineer
  2. Speaker: Yutaro Hayakawa > Joined the LINE Network Development Team as a new graduate this year > Works on development and operation of load balancers
  3. A Private Cloud Service for LINE Developers: Verda > OpenStack-Based Private Cloud > Since 2016
  4. Scale of Verda > 1,400 Hypervisors > Virtual Machines: 20,000+ last year, 35,000 now
  5. A Private Cloud Service for LINE Developers: Verda [figure: IaaS (Compute, Networking, Storage), Managed Services (K8S, Kafka, Redis, MySQL, …), PaaS (Function Platform, …), Load Balancer]
  6. Scales the LINE Applications: Verda Load Balancer as a Service (LBaaS) [figure: Services A, B, and C, each a group of VMs behind a virtually dedicated load balancer on a shared load balancer cluster]
  7. Two Types of Load Balancing for Different Requirements > Layer 7 Load Balancer (L7LB), a.k.a. Reverse Proxy > Layer 4 Load Balancer (L4LB)
  8. Users of the Verda LBaaS > Messenger, Family Services (LINE BLOG, LINE Clova) > Text Messages, Videos, Icon Images, Ads, Etc…
  9. Not Just "Operating" It, We Are "Developing" an LBaaS > L7 Load Balancing Service: Prepare Certificates, Optimize Resource Allocation > L4 Load Balancing Service: Efficient and Fast Data Plane, Completely Developed From Scratch > API Server and Orchestration Systems: Automation Friendly
  10. Why? Fundamental Problems: Operational Cost, Availability, Scalability
  11. In the Past: From the Beginning of LINE to 2016 > L4/L7 LB HW Appliances, 1 + 1 Active-Standby [figure: user traffic to Services A, B, and C, each a group of servers behind the appliance pair]
  12. Painful for Both Operators and Users: Operational Cost > Takes about 1–2 days to register backends > Cannot meet rapidly increasing demand > CLI-based manual operation
  13. Scalability and Availability Issue: the Session Table Exhaustion Problem
     > Session Table
     • Remembers Client → Backend Server mappings per TCP connection
     (e.g. Client 1 → Server A, Client 2 → Server D, Client 3 → Server A, … Client N → Server X)
     > Doesn't scale with large user traffic
     > A DoS attack causes a big outage
     • e.g. a TCP SYN flood attack
  14. Rethink the Load Balancing > Fully Automated > Reduce the Failure Domain > Scales With Large Traffic
  15. Ride on the Shoulders of "Tech Giants": Research and Development [timeline, 2010 to 2019: Microsoft Ananta[1], MS Research Duet[3], Facebook Talk at SRECon[4], Cloudflare Blog Post[2], Google Maglev[5]; "We Were Here" at 2016]
  16. A New Architecture: Multi-Tier N + 1 Load Balancer Cluster [figure: Router Tier → L4LB Tier → L7LB Tier → Servers]
  17. A New Architecture: Multi-Tier N + 1 Load Balancer Cluster > N + 1 Active-Active
  18. A New Architecture: Multi-Tier N + 1 Load Balancer Cluster > Stateless: No Session Table
  19. A New Architecture: Multi-Tier N + 1 Load Balancer Cluster > Software (L4LB and L7LB Tiers)
  20. Software Load Balancer > Runs on Commodity Server Hardware > 10 Times Cheaper Than Appliances (Per HTTPS Request) > Operated Like a Server
  21. Controller Design
     > Ordinary Python Web Application
     • Provides an API to interact with load balancer clusters
     • Fully automated
     > User Interface
     • GUI, CLI, or use the API directly
     > Common authentication with OpenStack
     > Revision management
     $ openstack verda lb create …
     $ openstack verda lb list
  22. L7 Load Balancer Tier Design
     > Use k8s for Resource Scheduling
     • 2 clusters for each Verda region (Active-Active)
     • k8s Node == Verda VM
     • Bind VIPs to Deployments
     [figure: VIP1–VIP6 spread across Node1/Node2 in Cluster1 and Cluster2 of Verda Region A]
  26. L4 Load Balancer Tier Design
     > Non-Orchestrated Physical Servers
     • Due to special network and performance requirements
     • VIP settings are replicated among multiple nodes
     • Data plane developed fully from scratch
     [figure: each VIP replicated across L4LB Node1–3 within each cluster; two clusters in Verda Region A]
  28. Why Did We Need To Build the L4 Load Balancer From Scratch?
     > The common problem of software-based load balancers is performance
     > The performance objective for a single L4LB instance was 7 Mpps
     > Difficult to achieve with existing load balancer software
  29. A Fast L4 Load Balancer in 500 LoC With XDP (eXpress Data Path)
     > "Fast path" of the Linux network stack
     • Hooks packets in the NIC driver, before the protocol stack
     > Lets you write the packet processing in very simple C code
     > Statically verifies the "safety" of the code
     [figure: C code is compiled and attached to the XDP hook in the NIC driver; packets are processed there before reaching the kernel protocol stack or user applications]
  30. How To "Keep" the Performance? Continuous Performance Testing
     > Performance is a "value" of the service
     > We need to continuously make sure we can maintain it
     > Like CI/CD
  31. How To Make the Performance Tests Reproducible? Fully Automated Performance Tests [figure: a developer's PR on GitHub triggers Drone (CI/CD), which runs a load test with a traffic generator in a unified test environment and reports the result]
  33. The Case of On-Demand Feature Implementation (My first task)

  34. Problem of the Stateless L4LB: Drawbacks of the Stateless Approach > Cannot do a "graceful" shutdown > Difficult to fail over
  35. Problem of the Stateless L4LB: Drawbacks of the Stateless Approach
     > Hash(5-tuple) → hash value → destination backend
     • 5-tuple: 1. Source IP 2. Destination IP 3. Source Port 4. Destination Port 5. Protocol Number
     [table: Hash Value → Destination: 0 → Backend A, 1 → Backend C, 2 → Backend B, 3 → Backend D, …]
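The stateless forwarding decision above can be sketched in Python (a minimal illustration only; the field names and the toy backend table are assumptions, and the production data plane is written in C on XDP, not Python):

```python
import hashlib

# Toy backend table indexed by hash bucket (illustrative, not production data).
BACKENDS = ["Backend A", "Backend C", "Backend B", "Backend D"]

def five_tuple_hash(src_ip, dst_ip, src_port, dst_port, proto):
    """Hash the TCP/IP 5-tuple into a bucket index."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % len(BACKENDS)

def pick_backend(src_ip, dst_ip, src_port, dst_port, proto):
    # Stateless: no session table. The same 5-tuple always hashes to the
    # same bucket, so every L4LB node independently agrees on the backend.
    return BACKENDS[five_tuple_hash(src_ip, dst_ip, src_port, dst_port, proto)]

# Any node computes the same answer for the same connection.
b1 = pick_backend("203.0.113.7", "198.51.100.1", 49152, 443, 6)
b2 = pick_backend("203.0.113.7", "198.51.100.1", 49152, 443, 6)
assert b1 == b2
```

Because the decision is a pure function of the packet, no per-connection state has to be stored or synchronized between the N + 1 active nodes.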
  36. Problem of the Stateless L4LB: Cannot Do a "Graceful" Shutdown & Difficult To Fail Over
     [tables: before, 0 → Server A, 1 → Server C, 2 → Server B, 3 → Server D; after Server A is removed, 0 → Server D, 1 → Server C, 2 → Server B, 3 → Server B, so existing connections are remapped to different servers]
  37. Consistent Hashing > A special type of hash function that reduces the possibility of connection disruption when the hash table is updated > We use Maglev hashing [5]
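A compact sketch of the Maglev table-population algorithm from [5]: each backend gets a preference permutation over the (prime-sized) table, and slots are filled round-robin. The table size and hash construction here are illustrative choices, not the production parameters:

```python
import hashlib

def _h(name: str, seed: str) -> int:
    """Deterministic 64-bit hash of a backend name, keyed by seed."""
    return int.from_bytes(hashlib.sha256(f"{seed}:{name}".encode()).digest()[:8], "big")

def maglev_table(backends, m=13):
    """Build a Maglev lookup table of prime size m (toy-sized here)."""
    offsets = {b: _h(b, "offset") % m for b in backends}
    skips = {b: _h(b, "skip") % (m - 1) + 1 for b in backends}  # coprime with prime m
    nexts = {b: 0 for b in backends}
    table = [None] * m
    filled = 0
    while filled < m:
        for b in backends:
            # Walk b's preference permutation until a free slot is found.
            while True:
                slot = (offsets[b] + nexts[b] * skips[b]) % m
                nexts[b] += 1
                if table[slot] is None:
                    table[slot] = b
                    filled += 1
                    break
            if filled == m:
                break
    return table

t1 = maglev_table(["A", "B", "C"])
t2 = maglev_table(["A", "C"])  # backend B removed
# `same` measures how many slots kept their backend across the change;
# Maglev aims to keep this high while staying evenly balanced.
same = sum(1 for a, b in zip(t1, t2) if a == b)
```

The round-robin fill keeps the table nearly evenly balanced across backends, which is why a connection's bucket rarely moves when an unrelated backend is added or removed.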
  38. Use Cases Where Consistent Hashing Is Not Enough
     > Media Platform
     • Connections suddenly disrupted during file uploads, video playback, …
     > Ads Platform
     • Ad impressions missed due to connection resets …
     > The problem was a blocker when the Media Platform migrated to Verda
  39. Session Caching
     > Look up the session cache (Client → Server mapping) first
     • Hit: forward to the cached server
     • Miss: look up the hash table (Hash → Server mapping) and update the session cache
     [tables: Session Cache: Client1 → Server A, Client2 → Server C, Client3 → Server B, Client4 → Server D, …; Hash Table: 0 → Server A, 1 → Server C, 2 → Server B, 3 → Server D, …]
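The lookup flow above can be sketched as follows (a toy model; the class name and data structures are assumptions for illustration, not the production C/XDP implementation):

```python
class SessionCachingLB:
    """Check the client->server session cache first; on a miss, fall back
    to the stateless hash table and record the mapping for next time."""

    def __init__(self, hash_table):
        self.hash_table = hash_table   # bucket -> server (Maglev-style table)
        self.session_cache = {}        # 5-tuple -> server

    def forward(self, conn):
        server = self.session_cache.get(conn)      # 1. session cache lookup
        if server is None:                         # miss:
            bucket = hash(conn) % len(self.hash_table)
            server = self.hash_table[bucket]       # 2. hash table lookup
            self.session_cache[conn] = server      # 3. update session cache
        return server

lb = SessionCachingLB(["Server A", "Server C", "Server B", "Server D"])
conn = ("203.0.113.7", "198.51.100.1", 49152, 443, 6)
first = lb.forward(conn)
# Even if the hash table is reshuffled (e.g. a backend removed),
# the cached connection sticks to its original server.
lb.hash_table = ["Server D", "Server C", "Server B", "Server B"]
assert lb.forward(conn) == first
```

This is exactly why the session table exhaustion problem returns: the cache grows with the number of live connections, so unbounded (or attacker-generated) connections can exhaust it.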
  40. Session Table Exhaustion Problem Again!

  41. Stateless vs Stateful
     > Stateless Hashing
     • Pros: Simple and scalable
     • Cons: Difficult to fail over
     > Stateful Session Caching
     • Pros: Easy to fail over
     • Cons: Vulnerable to SYN floods
  42. Solution: Hybrid Approach
     > Detect the SYN flood on the L4LB
     > Fall back to stateless mode for a while when a SYN flood is detected
     [figure: SYN/s over time crossing a threshold: "SYN Flood!!!"]
  43. Achievement > Took about 3 months to implement > Already deployed to production > The Media Platform team successfully migrated to Verda
  44. Future Work

  45. Integrating With Other Verda Services > Integration with the Managed Kubernetes Service > Use our LBaaS as an "Ingress" or "type: LoadBalancer"
  46. Adapting Our Load Balancers to a New Network Architecture > Native SRv6 support > An SRv6 load balancer (?)
  47. (Again) Ride on the Shoulders of "Tech Giants": Research and Development [timeline, 2012 to 2019: MS Research Duet[3], Facebook Talk at SRECon[4], Google Maglev[5], Fastly Faild[6], Facebook Katran[7], GitHub GLB[8] (NEW!); "We Are Here" at 2018]
  48. References
     [1] Patel, Parveen, et al. "Ananta: Cloud Scale Load Balancing." ACM SIGCOMM Computer Communication Review, Vol. 43, No. 4. ACM, 2013.
     [2] https://blog.cloudflare.com/cloudflares-architecture-eliminating-single-p/
     [3] Gandhi, Rohan, et al. "Duet: Cloud Scale Load Balancing with Hardware and Software." ACM SIGCOMM Computer Communication Review, Vol. 44, No. 4. ACM, 2014.
     [4] https://www.usenix.org/conference/srecon15/program/presentation/shuff
     [5] Eisenbud, Daniel E., et al. "Maglev: A Fast and Reliable Software Network Load Balancer." 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). 2016.
     [6] Araújo, João Taveira, et al. "Balancing on the Edge: Transport Affinity Without Network State." 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). 2018.
     [7] https://github.com/facebookincubator/katran
     [8] https://github.com/github/glb-director
  49. Thank You!