Slide 1

Slide 1 text

LINE TODAY - A Video Content Platform Serving Over Ten Million Active Users on a Microservice Architecture
Libra Huang - LINE TODAY
10/18/2022

Slide 2

Slide 2 text

Agenda
• LINE TODAY and its architecture
• How mini services and K8S change our development and operation
• Lessons learnt

Slide 3

Slide 3 text

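LINE TODAY Stays With You Today
• Personalized messages
• 8:00 Finance / Weather
• 12:30 International / Domestic / Entertainment
• Lifestyle content
• 15:00 Topics / Polls / Movies
• Official account
• 18:00 TODAY看世界 (TODAY Sees the World)
• 20:00 Baseball / NBA / Concerts
• News content
• Live streaming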

Slide 4

Slide 4 text

Taiwan, Thailand, Hong Kong

Slide 5

Slide 5 text

LINE TODAY Taiwan 18M MAU

Slide 6

Slide 6 text

LINE TODAY Architecture (diagram)
Components: CDN, Object Storage, In-Memory Cache, Backend Server, Frontend Server, Cache Server, Mini Service, Feeding Service, Internal CMS, External CMS, Vue.js, Third Party API, Content Provider, Internal Editor, External Editor, Report Service, Data Warehouse, ML, Data Analysis

Slide 7

Slide 7 text

LINE TODAY Architecture (diagram; same component labels as Slide 6)
Added labels: CDN 96%, 4%

Slide 8

Slide 8 text

LINE TODAY Architecture (diagram; same component labels as Slide 6)

Slide 9

Slide 9 text

LINE TODAY Architecture (diagram; same component labels as Slide 6)

Slide 10

Slide 10 text

LINE TODAY Architecture (diagram; same component labels as Slide 6)

Slide 11

Slide 11 text

LINE TODAY Architecture (diagram; same component labels as Slide 6)

Slide 12

Slide 12 text

LINE TODAY Architecture (diagram; same component labels as Slide 6)

Slide 13

Slide 13 text

Agenda
• LINE TODAY and its architecture
• How mini services and K8S change our development and operation
  • Refactor to mini services
• Lessons learnt

Slide 14

Slide 14 text

How to improve development and deployment efficiency?
Problem
- A small change requires rebuilding and redeploying the entire system
Solutions
- Break down coarse-grained deployments into functionally cohesive mini services
- Move to Kubernetes
Diagram labels: Module (repeated), deployment (e.g. war file), OCI image (x3)

Slide 15

Slide 15 text

Migrate to mini services and Kubernetes
Diagram labels:
• Web Server (VM), API Server (VM)
• Kubernetes, Ingress Controller, Frontend Server, Cache Server
• Mini Services: Article Service, Subscription Service, Interaction Service
• Observability: logs, metrics, tracing
• CD

Slide 16

Slide 16 text

Agenda
• LINE TODAY and its architecture
• How mini services and K8S change our development and operation
  • Refactor to mini services
  • Refine CI/CD
• Lessons learnt

Slide 17

Slide 17 text

Build and test what was changed
Components in the diagram: service1, service2, libA, libB
• Changing service1 => rebuild service1
• Changing libA => rebuild libA, service1, service2
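The rule on this slide can be sketched as a walk over a dependency graph: rebuild whatever changed, plus everything that depends on it. The sketch below is illustrative only and is not the team's build tooling. The `DEPENDS_ON` map and the function name are hypothetical; that service1 and service2 depend on libA follows from the slide, while leaving libB without dependents is an assumption.

```python
from collections import deque

# Hypothetical dependency map: each component lists the libraries it uses.
# service1/service2 -> libA follows from the slide; libB's users are unknown,
# so it is listed with no dependents (an assumption).
DEPENDS_ON = {
    "service1": ["libA"],
    "service2": ["libA"],
    "libA": [],
    "libB": [],
}


def affected_components(changed, depends_on):
    """Return the changed components plus everything that transitively depends on them."""
    # Invert the map: library -> components that use it directly.
    dependents = {name: [] for name in depends_on}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(component)

    to_rebuild = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, []):
            if user not in to_rebuild:
                to_rebuild.add(user)
                queue.append(user)
    return sorted(to_rebuild)


print(affected_components(["service1"], DEPENDS_ON))  # ['service1']
print(affected_components(["libA"], DEPENDS_ON))      # ['libA', 'service1', 'service2']
```

In a CI pipeline, the `changed` list would typically come from the files touched by a commit; how that mapping is done is outside this sketch.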

Slide 18

Slide 18 text

Manage deployment via GitOps
Diagram labels: service(s) version(s), Deploy Process, update manifest, git repo, ArgoCD + Kustomize
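As a rough illustration of the "update manifest" step, the sketch below sets new image tags in a Kustomize-style `images` list (entries with `name` and `newTag`). It assumes the kustomization file has already been parsed into a Python dict; the service names, versions, and the `set_image_tags` helper are hypothetical. Committing the result to the git repo, and having Argo CD sync the cluster to it, is not shown.

```python
import copy


def set_image_tags(kustomization, versions):
    """Return a copy of a kustomization dict with `newTag` set for each service in `versions`."""
    updated = copy.deepcopy(kustomization)
    images = updated.setdefault("images", [])
    by_name = {entry["name"]: entry for entry in images}
    for service, version in versions.items():
        if service in by_name:
            by_name[service]["newTag"] = version
        else:
            # Service not listed yet: add a new image entry for it.
            images.append({"name": service, "newTag": version})
    return updated


# Hypothetical contents of a kustomization.yaml, already parsed into a dict.
kustomization = {
    "resources": ["deployment.yaml"],
    "images": [
        {"name": "article-service", "newTag": "1.4.0"},
        {"name": "subscription-service", "newTag": "2.1.3"},
    ],
}

updated = set_image_tags(kustomization, {"article-service": "1.5.0"})
print(updated["images"][0])  # {'name': 'article-service', 'newTag': '1.5.0'}
```

Returning a copy rather than mutating the input keeps the old and new manifests available side by side, which makes the change easy to diff before it is committed.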

Slide 19

Slide 19 text

Agenda
• LINE TODAY and its architecture
• How mini services and K8S change our development and operation
  • Refactor to mini services
  • Refine CI/CD
  • Integrate with observability
• Lessons learnt

Slide 20

Slide 20 text

Observability improves operation efficiencies
Diagram labels:
• Web Server (VM), API Server (VM)
• Kubernetes, Ingress Controller, Frontend Server, Cache Server
• Mini Services: Article Mini Service, Subscription Mini Service, Interaction Mini Service
• Observability: logs, metrics, tracing
• CD

Slide 21

Slide 21 text

Observability - service RPS / latency metrics

Slide 22

Slide 22 text

Observability - metrics week-over-week

Slide 23

Slide 23 text

Observability - abnormal spikes
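The slides do not say how the week-over-week comparison or the spike detection is computed. One simple possibility, shown here purely as an assumption, is to compare each sample with the sample from the same time one week earlier and flag large relative increases. The function names, the 50% threshold, and the sample values are all made up for illustration.

```python
def week_over_week_change(current, last_week):
    """Relative change versus the same time window one week earlier."""
    if last_week == 0:
        raise ValueError("last_week must be non-zero")
    return (current - last_week) / last_week


def find_spikes(current_series, last_week_series, threshold=0.5):
    """Return indexes where the value exceeds last week's by more than `threshold` (0.5 = +50%)."""
    return [
        i
        for i, (now, before) in enumerate(zip(current_series, last_week_series))
        if before > 0 and week_over_week_change(now, before) > threshold
    ]


# Made-up requests-per-second samples, for illustration only.
last_week = [100, 110, 120, 115]
this_week = [105, 112, 240, 118]
print(find_spikes(this_week, last_week))  # [2]
```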

Slide 24

Slide 24 text

Troubleshooting - 1. receive alert from Slack

Slide 25

Slide 25 text

Troubleshooting - 2. link to alert panel and access logs

Slide 26

Slide 26 text

Troubleshooting - 3. open trace viewer

Slide 27

Slide 27 text

Troubleshooting via observability
Flow: Alerts => Metrics => Logs => Traces => Logs
1. Alerts: received alerts from Slack
2. Metrics: check error source and time period
3. Logs: inspect access logs
4. Traces: open trace viewer
5. Logs: jump to service logs of the trace
Links between Metrics, Traces, and Logs: Exemplars, Split view with labels, Metric queries, Span metrics processor, Trace to logs, Followed Trace ID

Slide 28

Slide 28 text

Agenda
• LINE TODAY and its architecture
• How mini services and K8S change our development and operation
  • Refactor to mini services
  • Refine CI/CD
  • Integrate with observability
  • Leverage K8S CronJob
• Lessons learnt

Slide 29

Slide 29 text

Run simple periodic tasks (Java) on Kubernetes
• Requirements
  • run at a specific time / interval
  • simple
  • concurrency control
  • running history and logs
  • monitor
  • easy to run in local and test env
• Options
  • Spring @Scheduled
  • Quartz
  • Spring Cloud Data Flow
  • Airflow
  • K8S CronJob

Slide 30

Slide 30 text

Use K8S CronJob to run periodic tasks
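To make the choice concrete, here is a minimal sketch of a `batch/v1` CronJob manifest, built as a Python dict and printed as JSON (which `kubectl apply -f` also accepts). It is not the manifest used by LINE TODAY: the job name, image, schedule, and arguments are hypothetical. The field names (`schedule`, `concurrencyPolicy`, `successfulJobsHistoryLimit`, `failedJobsHistoryLimit`) are standard CronJob fields and correspond to the scheduling, concurrency control, and running-history requirements.

```python
import json


def cronjob_manifest(name, image, schedule, args):
    """Build a batch/v1 CronJob manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,             # cron expression: when to run
            "concurrencyPolicy": "Forbid",    # skip a run while the previous one is still running
            "successfulJobsHistoryLimit": 3,  # finished Jobs kept as running history
            "failedJobsHistoryLimit": 3,
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {"name": name, "image": image, "args": args}
                            ],
                        }
                    }
                }
            },
        },
    }


# Hypothetical task: name, image, and arguments are placeholders.
manifest = cronjob_manifest(
    name="report-task",
    image="registry.example.com/report-task:1.0.0",
    schedule="0 8 * * *",  # every day at 08:00
    args=["--job=daily-report"],
)
print(json.dumps(manifest, indent=2))
```

Because the task is just a container with arguments, the same image can be started directly in a local or test environment, which matches the "easy to run in local and test env" requirement.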

Slide 31

Slide 31 text

K8S CronJob metrics

Slide 32

Slide 32 text

Agenda
• LINE TODAY and its architecture
• How mini services and K8S change our development and operation
• Lessons learnt

Slide 33

Slide 33 text

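Issue - intermittent errors during rolling update
Diagram labels: K8S API Server; Worker node (kubelet, kube-proxy, Pod); Worker node (kube-proxy)
1a. delete pod
1b. remove pod from service endpoint
Because 1a and 1b proceed in parallel, traffic can still be routed to a pod that is already shutting down.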

Slide 34

Slide 34 text

Solution - graceful shutdown
Pod termination sequence: Pre-stop hook, then SIGTERM to the main container process (container shutdown), then SIGKILL (container killed, if still running)
Configured in: deployment manifest, Spring Boot application.yaml
• Existing requests allowed to complete
• No new requests permitted
Diagram labels: K8S API Server; Worker node (kubelet, kube-proxy, Pod); Worker node (kube-proxy); delete pod; remove pod from service endpoint
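The two guarantees on this slide (in-flight work completes, new work is refused) can be modelled with a small request tracker. This is a toy model in Python, not Spring Boot code and not the team's configuration; the `GracefulServer` class and its method names are hypothetical. In a real process, `shutdown` would be triggered by a SIGTERM handler, and the grace period would have to fit within the pod's termination grace period before SIGKILL arrives.

```python
import threading


class GracefulServer:
    """Toy request tracker: refuses new work once shutdown starts, waits for in-flight work."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._shutting_down = False

    def try_begin_request(self):
        """Return True if the request is accepted, False once shutdown has begun."""
        with self._cond:
            if self._shutting_down:
                return False
            self._in_flight += 1
            return True

    def end_request(self):
        """Mark one accepted request as finished."""
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def shutdown(self, grace_seconds):
        """Stop accepting requests; return True if all in-flight requests finished in time."""
        with self._cond:
            self._shutting_down = True
            return self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=grace_seconds
            )


server = GracefulServer()
accepted = server.try_begin_request()        # one request is in flight
threading.Timer(0.1, server.end_request).start()  # it finishes 0.1 s later

drained = server.shutdown(grace_seconds=2.0)
print(accepted, drained)                     # True True
print(server.try_begin_request())            # False
```

If the grace period expires first, `shutdown` returns False, which corresponds to the case where the container is still running when SIGKILL is sent.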

Slide 35

Slide 35 text

Issue - unpredictable request spikes
• pod removed from endpoint at 30ish seconds
• fewer pods available to serve requests
• requests queue up
• pod restarted at 60ish seconds
• downward spiral

Slide 36

Slide 36 text

Options to handle request spikes - it depends
• Overprovision
  • $$$$
• Auto scaling
  • pod - 20+ seconds
  • node - ~5 minutes
  • serverless (lambda) - seconds
• Protection via ingress controller / API gateway
  • circuit breaker - 503
  • rate-limit - 429
• Improve design
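As an illustration of the rate-limit option, the sketch below implements a token bucket, one common way to decide when to answer with 429. The slides do not say which algorithm or product the ingress controller or gateway uses, so this is an assumption; the class, the `handle` helper, and the numbers are hypothetical. Time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        """Consume one token if available; `now` is the current time in seconds."""
        # Refill according to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def handle(bucket, now):
    """Return the HTTP status a gateway might send: 200 if admitted, 429 if rate-limited."""
    return 200 if bucket.allow(now) else 429


bucket = TokenBucket(rate=2, capacity=2)
statuses = [handle(bucket, t) for t in (0.0, 0.0, 0.0, 0.5, 0.5)]
print(statuses)  # [200, 200, 429, 200, 429]
```

Rejecting excess requests early with 429 keeps the queue in front of the pods short, which is the opposite of the queue build-up described for the request-spike issue.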

Slide 37

Slide 37 text

Issue - job killed without error log

Slide 38

Slide 38 text

Root cause: Linux kernel memory leak
Diagram label: reboot

Slide 39

Slide 39 text

Summary
• LINE TODAY and its architecture
• Mini services and K8S help dev / ops efficiency for large systems
  • Refactor to mini services
  • Refine CI/CD
  • Integrate with observability
  • Leverage K8S CronJob
• Build in-depth DevOps and K8S skills

Slide 40

Slide 40 text

Thank you