Slide 1

Slide 1 text

A story of migration from Docker Swarm to Kubernetes
Shigeyuki Fujishima, LINE Fukuoka
CLOUDNATIVE DAYS FUKUOKA 2019

Slide 2

Slide 2 text

Introduction - I am
● LINE 2015.04 - 2017.08
● LINE Fukuoka 2017.09 - Present
● Experience
  ○ Developed and operated an in-house deployment/monitoring system
  ○ DevOps/SRE-ish role
● Current
  ○ A member of the development team for the private cloud "Verda" in the LINE group

Slide 3

Slide 3 text

Introduction - “Verda” is
● The name of the private cloud service in LINE
● A set of helpful services for LINE developers
● Many kinds of as-a-service offerings
  ○ EC2-like VM / bare metal service
  ○ Object storage / CDN
  ○ Databases (MySQL, Elasticsearch, Redis)
  ○ Load balancer
  ○ Kubernetes
  ○ Heroku-ish service
  ○ Functions (a.k.a. serverless)
  ○ And more

Slide 4

Slide 4 text

Project Overview - Back in the day
● From the developers’ view
  ○ Simple load balancer architecture
    ■ Developers had “toil” managing the configurations themselves
      ● e.g., URL routing
      ● Updating server certificates
      ● Running their own layer 7 for their own services

Slide 5

Slide 5 text

Project Overview - Back in the day
● From the L4 LB’s view
  ○ Huge number of TCP connections
  ○ Shared resource
  ○ Active-Standby
  ○ Hardware load balancer

Slide 6

Slide 6 text

Project Overview - Back in the day
● From the L4 LB’s view
  ○ Huge number of TCP connections

Slide 7

Slide 7 text

Project Overview - Back in the day
● From the L4 LB’s view
  ○ Huge number of TCP connections
    ■ Stateful sessions
    ■ Session table shortage
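The session table shortage can be illustrated with a minimal sketch. It assumes a stateful load balancer that keeps one table entry per flow in a fixed-size table; the `SessionTable` class, its capacity, and the addresses are hypothetical and do not describe the actual hardware.

```python
from typing import Dict, Tuple

# (src ip, src port, dst ip, dst port, protocol); a hypothetical flow key
FlowKey = Tuple[str, int, str, int, str]


class SessionTable:
    """Toy model of a fixed-size, per-connection session table."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: Dict[FlowKey, str] = {}

    def lookup_or_insert(self, flow: FlowKey, backend: str) -> bool:
        """Return True if the flow is tracked, False if the table is full."""
        if flow in self.entries:
            return True  # existing session keeps working
        if len(self.entries) >= self.capacity:
            return False  # shortage: the new connection cannot be tracked
        self.entries[flow] = backend
        return True


table = SessionTable(capacity=2)
flows = [("198.51.100.%d" % i, 40000, "192.0.2.1", 443, "tcp") for i in range(3)]
results = [table.lookup_or_insert(flow, "backend-1") for flow in flows]
print(results)  # [True, True, False]
```

With many long-lived connections the table stays full, so new clients are the ones that fail.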

Slide 8

Slide 8 text

Project Overview - Back in the day
● From the L4 LB’s view
  ○ Shared resource
    ■ Noisy neighbor
    ■ Cascading failure

Slide 9

Slide 9 text

Project Overview - Back in the day
● From the L4 LB’s view
  ○ Active-Standby
    ■ 2N: always double the hardware
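The cost of 2N can be shown with a small worked calculation. The comparison with an N+k layout (all nodes active plus a few spares) is an assumption added for contrast, and the load and capacity figures are made up.

```python
import math


def nodes_active_standby(peak_load: float, node_capacity: float) -> int:
    """2N: every active node is paired with an idle standby."""
    active = math.ceil(peak_load / node_capacity)
    return 2 * active


def nodes_n_plus_k(peak_load: float, node_capacity: float, spares: int = 1) -> int:
    """N+k: all nodes serve traffic, with only `spares` extra nodes for failures."""
    active = math.ceil(peak_load / node_capacity)
    return active + spares


# Hypothetical numbers: 100 units of peak load, 10 units per node.
print(nodes_active_standby(100, 10))  # 20
print(nodes_n_plus_k(100, 10))        # 11
```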

Slide 10

Slide 10 text

Project Overview - Back in the day
● From the L4 LB’s view
  ○ Hardware load balancer
    ■ Getting outdated

Slide 11

Slide 11 text

Project Overview - Problems to be solved
● Scalability
  ○ Easy to scale out / in
● Flexibility
  ○ Easy to update / upgrade
● Isolation
  ○ Limit the failure domain

Slide 12

Slide 12 text

Project Overview - Concept
(Diagram: Before / After)

Slide 13

Slide 13 text

Project Story - Phase 1. Docker Swarm - Overview
● Started in 2016
● Software-based load balancer
● Containerized with Docker
  ○ Linux namespaces / cgroups
● Orchestration by Docker Swarm
  ○ Standalone mode (not swarm mode)
  ○ Low cost of learning and development
● Packet processing by XDP on L4
  ○ Reference: ソフトウェアでのパケット処理あれこれ〜何故我々はロードバランサを自作するに至ったのか〜 (in English: "Notes on software packet processing: why we ended up building our own load balancer")

Slide 14

Slide 14 text

Project Story - Phase 1. Docker Swarm - Overview

Slide 15

Slide 15 text

Project Story - Phase 1. Docker Swarm - Result
● Docker / Docker Swarm gave us
  ○ Scalability
    ■ Container technology
  ○ Flexibility
    ■ Good APIs
  ○ Isolation
    ■ Linux namespaces and cgroups
...Is everything going well?

Slide 16

Slide 16 text

Project Story - Phase 1. New problems
● Docker Swarm
  ○ Auto-scaling of containers
    ■ No support out of the box
  ○ Docker integrated with Kubernetes
    ■ A Docker Captain said “Swarm is alive and well.” but…
● Implementation
  ○ 1 VIP per container
  ○ Resource efficiency issue again

Slide 17

Slide 17 text

Project Story - Problems to be solved in the next phase
● Enable auto-scaling
● Migration from Docker Swarm to Kubernetes
● Better accommodation
  ○ Put many more VIPs on a single server

Slide 18

Slide 18 text

Project Story - Phase 2. Kubernetes - #1
● Put as many VIPs as possible in a single container
  ○ Configured like virtual hosts
● Noisy neighbor?
  ○ Deploy many Pods on low-cost machines
  ○ Incoming traffic is expected to be balanced
(Diagram: Before / After)
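The virtual host idea can be sketched as one proxy process that holds the configuration for many VIPs and picks one by destination address, much as an HTTP server picks a virtual host by the `Host` header. `VipConfig`, `MultiVipProxy`, and the addresses are hypothetical names for illustration, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VipConfig:
    vip: str
    backends: List[str]


class MultiVipProxy:
    """One container serving many VIPs, selected by destination address."""

    def __init__(self) -> None:
        self._by_vip: Dict[str, VipConfig] = {}

    def add_vip(self, config: VipConfig) -> None:
        self._by_vip[config.vip] = config

    def route(self, dst_ip: str, client_port: int) -> Optional[str]:
        """Pick a backend for the VIP; None if this proxy does not host it."""
        config = self._by_vip.get(dst_ip)
        if config is None or not config.backends:
            return None
        # Simple modulo spread; a real proxy would use a proper balancing policy.
        return config.backends[client_port % len(config.backends)]


proxy = MultiVipProxy()
proxy.add_vip(VipConfig("192.0.2.10", ["app-a-1", "app-a-2"]))
proxy.add_vip(VipConfig("192.0.2.20", ["app-b-1"]))
print(proxy.route("192.0.2.10", 40001))  # app-a-2
print(proxy.route("192.0.2.20", 40001))  # app-b-1
```

Compared with one VIP per container, the per-container overhead is paid once for all hosted VIPs.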

Slide 19

Slide 19 text

Project Story - Phase 2. Kubernetes - #2
● Auto-scale
  ○ In progress...
● Difficulties
  ○ Handling unpredicted situations
  ○ Lightning-fast scaling
  ○ “Graceful” shutdown
    ■ Graceful…?
      ● Keep connections alive even when the Pod is going down
      ● Share communication resources such as sockets…?
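One common reading of “graceful” is connection draining: stop accepting new connections, then wait for in-flight ones to finish before exiting. The sketch below assumes that reading; `DrainingServer` is a hypothetical helper, and it does not cover the socket-sharing idea raised on the slide. In Kubernetes, `drain` would typically run in a SIGTERM handler, within the Pod's termination grace period.

```python
import threading


class DrainingServer:
    """Tracks in-flight connections so shutdown can wait for them."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = 0
        self._draining = False
        self._idle = threading.Event()
        self._idle.set()  # no connections yet, so the server starts idle

    def try_accept(self) -> bool:
        """Register a new connection; refused once draining has started."""
        with self._lock:
            if self._draining:
                return False
            self._active += 1
            self._idle.clear()
            return True

    def finish(self) -> None:
        """Mark one connection as closed."""
        with self._lock:
            self._active -= 1
            if self._active == 0:
                self._idle.set()

    def drain(self, timeout: float) -> bool:
        """Stop accepting, then wait up to `timeout` seconds for connections to end."""
        with self._lock:
            self._draining = True
        return self._idle.wait(timeout)


server = DrainingServer()
accepted = server.try_accept()
# Simulate the client closing its connection shortly after shutdown begins.
threading.Timer(0.05, server.finish).start()
drained = server.drain(timeout=2.0)
accepted_after = server.try_accept()
print(accepted, drained, accepted_after)
```

If `drain` returns False, the timeout expired with connections still open, and the caller has to decide whether to cut them.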

Slide 20

Slide 20 text

Project Story - Phase 2. Kubernetes - #3
● Re-configure the network
● NAPT-less networking using Calico

Slide 21

Slide 21 text

Project Story - Phase 2. Kubernetes - Calico
● One of the official CNI plugins
  ○ Used by
    ■ Yahoo! Japan
    ■ Google Cloud Platform
● “Pure” L3 network
  ○ No overlay
  ○ BGP-based IP routing
● Pods are directly reachable from outside the cluster
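Why BGP-advertised routes make Pods directly reachable can be sketched as a routing table lookup. The sketch assumes each node advertises a block of Pod addresses and an upstream router picks the longest matching prefix; the prefixes, the /26 block size, and the node names are illustrative assumptions, not the actual network layout.

```python
import ipaddress
from typing import Dict, Optional, Tuple


def next_hop(routes: Dict[str, str], dst: str) -> Optional[str]:
    """Longest-prefix match over routes learned via BGP (prefix -> next hop)."""
    addr = ipaddress.ip_address(dst)
    best: Optional[Tuple[ipaddress.IPv4Network, str]] = None
    for prefix, hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None


# Hypothetical routes as seen by a router outside the cluster.
routes = {
    "10.244.1.0/26": "node-a",
    "10.244.1.64/26": "node-b",
    "10.244.0.0/16": "cluster-edge",
}
print(next_hop(routes, "10.244.1.70"))  # node-b
print(next_hop(routes, "10.244.1.5"))   # node-a
print(next_hop(routes, "192.0.2.1"))    # None
```

Because the Pod address itself is routable, no overlay encapsulation or address translation is needed on the path.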

Slide 22

Slide 22 text

Project Story - Phase 2. Kubernetes - Present
● (Solved) Better accommodation
  ○ Virtual host approach
● (In progress) Enable auto-scaling
  ○ Need more time...
● (Solved) Migration from Docker Swarm to Kubernetes
  ○ Direct connectivity to Pods with Calico
● New challenges
  ○ Intelligent resource scheduling
  ○ Better communication between L4 and L7
  ○ Graceful upstream draining
  ○ and more...

Slide 23

Slide 23 text

Wrap Up
● From legacy to containers
  ○ Scalability
  ○ Flexibility
  ○ Isolation
● From Docker Swarm to Kubernetes
  ○ Cloud-native networking with Calico
  ○ Possibility of advanced features
    ■ Auto-scaling
    ■ Intelligent resource scheduling

Slide 24

Slide 24 text

Side Story: How we implement L7 networking on k8s
● Pods are not visible from outside the cluster
  ○ Because nothing outside the cluster knows the Pod IP addresses.

Slide 25

Slide 25 text

Side Story: How we implement L7 networking on k8s
● Calico can advertise the Pod IP addresses via BGP.

Slide 26

Slide 26 text

Side Story: How we implement L7 networking on k8s

Slide 27

Slide 27 text

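Appendix - Publications
● LINE's Infrastructure Platform: How It Scales Massive Services and Maintains Low Operational Cost
● 自作ロードバランサー開発 (in English: "Developing our own load balancer")
● ソフトウェアでのパケット処理あれこれ〜何故我々はロードバランサを自作するに至ったのか〜 (in English: "Notes on software packet processing: why we ended up building our own load balancer")
● LINE Engineerを支えるCaaS基盤の今とこれから (in English: "The present and future of the CaaS platform supporting LINE engineers")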

Slide 28

Slide 28 text

Ask LINERs!

Slide 29

Slide 29 text

Ask LINERs!

Slide 30

Slide 30 text

Thank you
Shigeyuki Fujishima, LINE Fukuoka
CLOUDNATIVE DAYS FUKUOKA 2019