Service L4 Load Balancing Service
> Efficient and Fast Data Plane
> Completely Developed From Scratch API Server and Orchestration Systems
> Automation Friendly
Not Just “Operating” on It, We Are “Developing” an LBaaS
the LINE to 2016
[Figure: user traffic for Service C passes through L4/L7 LB hardware appliances in a 1 + 1 active-standby pair, which distribute it to the servers]
Table
  • Remember Client → Backend Server Mappings per TCP Connection
> Doesn’t Scale With Large User Traffic
> DoS Attack Causes Big Outage
  • TCP SYN Flood Attack

Session Table
Source      Destination
Client 1    Server A
Client 2    Server D
Client 3    Server A
…           …
Client N    Server X
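The scaling problem can be illustrated with a minimal sketch of a stateful balancer. This is not the appliance's actual logic: the class name, the round-robin choice of backend, and the addresses are assumptions made for illustration. It only shows that one table entry is created per connection attempt, so a flood of spoofed SYNs grows the table without bound.

```python
import itertools


class StatefulBalancer:
    """Toy stateful L4 balancer: one session-table entry per connection."""

    def __init__(self, backends):
        self._next_backend = itertools.cycle(backends)  # assumed policy: round robin
        self.sessions = {}  # 5-tuple -> backend

    def on_packet(self, five_tuple):
        backend = self.sessions.get(five_tuple)
        if backend is None:
            # First packet of a connection (e.g. a SYN): remember the mapping.
            backend = next(self._next_backend)
            self.sessions[five_tuple] = backend
        return backend


lb = StatefulBalancer(["Server A", "Server D"])

# One legitimate client connection (addresses are documentation ranges).
legit = ("10.0.0.1", "192.0.2.10", 40000, 443, 6)
first_choice = lb.on_packet(legit)

# Simulated SYN flood: every spoofed source/port pair costs one table entry.
for i in range(10_000):
    spoofed = (f"198.51.100.{i % 256}", "192.0.2.10", 1024 + i, 443, 6)
    lb.on_packet(spoofed)

print(len(lb.sessions), "session entries after the flood")
```

The legitimate connection keeps its mapping, but memory use is now driven by the attacker rather than by real traffic.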
To Interact With Load Balancer Clusters
  • Fully Automated
> User Interface
  • GUI, CLI, or Use API Directly
> Common Authentication With OpenStack
> Revision Management

$ openstack verda lb create …
$ openstack verda lb list
> The Common Problem of Software-Based Load Balancers Is Performance
> Performance Objective for a Single L4LB Instance Was 7 Mpps
> Difficult To Achieve With Existing Load Balancer Software
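To give a feel for what 7 Mpps means, the short calculation below converts the target rate into an average time budget per packet. The `cores` parameter is an assumption added for illustration; the slides do not say how many cores an instance uses.

```python
def per_packet_budget_ns(target_pps, cores=1):
    """Average time available per packet, in nanoseconds.

    `cores` is a hypothetical knob: if the load is spread evenly over
    several cores, each core gets proportionally more time per packet.
    """
    return cores * 1e9 / target_pps


budget = per_packet_budget_ns(7_000_000)
print(f"{budget:.1f} ns per packet")  # 142.9 ns per packet
```

Under this reading, all parsing, lookup and forwarding work for a packet has to fit in roughly 143 ns per core-equivalent, which leaves little room for a general-purpose network stack.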
XDP (eXpress Data Path)
> “Fast Path” of Linux Network Stack
  • Hook Packets in NIC Driver
> Able To Write the Packet Processing in Very Simple C Code
> Statically Verifies the “Safety” of the Code
[Figure: C code is compiled and attached as an XDP program at the physical NIC driver in the kernel; packets reach XDP before the kernel protocol stack and the user-space app]
[Figure: Packet → Hash(5-tuple) → hash table lookup → Server A, Server B, Server C or Server D]

Hash(5-tuple) inputs:
1. Source IP
2. Destination IP
3. Source Port
4. Destination Port
5. Protocol Number

Hash Value   Destination
0            Backend A
1            Backend C
2            Backend B
3            Backend D
…            …
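The lookup described on this slide can be sketched as follows. CRC32 and the modulo step are stand-ins chosen for illustration; the slides do not say which hash function or table size the real data plane uses, and the function names are hypothetical.

```python
import zlib

# Hash value -> destination, in the order shown on the slide.
HASH_TABLE = ["Backend A", "Backend C", "Backend B", "Backend D"]


def five_tuple_hash(src_ip, dst_ip, src_port, dst_port, proto):
    """Deterministic hash over the 5-tuple (CRC32 is an assumed stand-in)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key)


def pick_backend(five_tuple, table=HASH_TABLE):
    """Stateless selection: no per-connection memory is needed."""
    return table[five_tuple_hash(*five_tuple) % len(table)]


flow = ("10.0.0.1", "192.0.2.10", 40000, 443, 6)  # 6 = TCP
print(pick_backend(flow))
```

Because the result depends only on the packet header, every packet of a connection reaches the same backend, and any load balancer instance holding the same table makes the same decision.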
[Figure: Server A, Server B, Server C and Server D behind the load balancer; the hash table changes from the first table to the second]

Hash Value   Destination
0            Server A
1            Server C
2            Server B
3            Server D
…            …

Hash Value   Destination
0            Server D
1            Server C
2            Server B
3            Server B
…            …

Cannot Do “Graceful” Shutdown & Difficult To Failover
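A small simulation makes the consequence concrete. It uses the two four-slot tables from this slide; the CRC32 hash, the helper names and the synthetic flows are assumptions for illustration only.

```python
import zlib

BEFORE = ["Server A", "Server C", "Server B", "Server D"]
AFTER = ["Server D", "Server C", "Server B", "Server B"]


def slot_of(five_tuple, n_slots):
    """Map a 5-tuple to a hash-table slot (CRC32 is an assumed stand-in)."""
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % n_slots


def moved_flows(flows, before, after):
    """Return the established flows whose destination changes with the table."""
    return [f for f in flows
            if before[slot_of(f, len(before))] != after[slot_of(f, len(after))]]


# 1000 synthetic established TCP flows.
flows = [(f"10.0.0.{i % 250 + 1}", "192.0.2.10", 20000 + i, 443, 6)
         for i in range(1000)]

moved = moved_flows(flows, BEFORE, AFTER)
print(f"{len(moved)} of {len(flows)} established flows now reach a different server")
```

Every flow in slot 0 or slot 3 is redirected mid-connection. The new server has no TCP state for it, so the connection is reset; this includes flows that were on Server D, which is still in service.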
Platform
  • Connection Suddenly Disrupted During File Upload, Playing Video …
> Ads Platform
  • Miss the Ads Impression Due to the Connection Reset …
> The Problem Was a Blocker When Media Platform Migrated to Verda
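Cache
[Figure: a packet is first looked up in the session cache; on a hit the cached server is used, on a miss the hash table is consulted]

Session Cache (Client → Server Mapping)
Source     Destination
Client1    Server A
Client2    Server C
Client3    Server B
Client4    Server D
…          …

Hash Table (Hash → Server Mapping)
Hash Value   Destination
0            Server A
1            Server C
2            Server B
3            Server D
…            …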
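The hit/miss flow in this figure can be sketched as below. The LRU eviction policy, the cache size, the CRC32 hash and all names are assumptions for illustration; the slides only show that a session cache sits in front of the hash table.

```python
import zlib
from collections import OrderedDict


def slot_of(five_tuple, n_slots):
    """Map a 5-tuple to a hash-table slot (CRC32 is an assumed stand-in)."""
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % n_slots


class CachedHashBalancer:
    """Hash table for new flows, bounded session cache for known flows."""

    def __init__(self, table, cache_size=1024):
        self.table = list(table)
        self.cache = OrderedDict()  # 5-tuple -> server, oldest first
        self.cache_size = cache_size

    def pick(self, five_tuple):
        if five_tuple in self.cache:            # hit: keep the existing mapping
            self.cache.move_to_end(five_tuple)
            return self.cache[five_tuple]
        server = self.table[slot_of(five_tuple, len(self.table))]  # miss
        self.cache[five_tuple] = server
        if len(self.cache) > self.cache_size:   # assumed policy: evict the LRU entry
            self.cache.popitem(last=False)
        return server


lb = CachedHashBalancer(["Server A", "Server C", "Server B", "Server D"])
flow = ("10.0.0.1", "192.0.2.10", 40000, 443, 6)
before = lb.pick(flow)

# The hash table is updated while the connection is still open.
lb.table = ["Server D", "Server C", "Server B", "Server B"]
print(before, "->", lb.pick(flow))  # the same server twice: the cache hit wins
```

Since the cache is bounded, a flood of new flows can only evict entries, not exhaust memory; a flow whose entry has been evicted falls back to the hash table, which is the stateless behaviour described earlier.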
Development
[Figure: timeline of software load balancer work; axis labels: 2012, 2018 (NEW!), 2019 (We Are Here)]
  • MS Research Duet [3]
  • Facebook Talk in SREcon [4]
  • Google Maglev [5]
  • Fastly Faild [6]
  • Facebook Katran [7]
  • GitHub GLB [8]
balancing." ACM SIGCOMM Computer Communication Review. Vol. 43. No. 4. ACM, 2013.
[2] https://blog.cloudflare.com/cloudflares-architecture-eliminating-single-p/
[3] Gandhi, Rohan, et al. "Duet: Cloud scale load balancing with hardware and software." ACM SIGCOMM Computer Communication Review. Vol. 44. No. 4. ACM, 2014.
[4] https://www.usenix.org/conference/srecon15/program/presentation/shuff
[5] Eisenbud, Daniel E., et al. "Maglev: A fast and reliable software network load balancer." 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). 2016.
[6] Araújo, João Taveira, et al. "Balancing on the edge: Transport affinity without network state." 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). 2018.
[7] https://github.com/facebookincubator/katran
[8] https://github.com/github/glb-director