LINE TODAY - A Video and Content Platform Serving Tens of Millions
of Active Users on a Microservice Architecture
Libra Huang - LINE TODAY
10/18/2022
Slide 2
Slide 2 text
Agenda
• LINE TODAY and its architecture
• How mini services and K8S change our development and operations
• Lessons learnt
Slide 3
Slide 3 text
LINE TODAY Stays With You Today
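Personalized messages
8:00
Finance / Weather
12:30
International / Domestic / Entertainment
Lifestyle content
15:00
Topics / Polls / Movies
Official accounts
18:00
TODAY看世界 (TODAY Sees the World)
20:00
Baseball / NBA / Concerts
News content
Live streaming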
Slide 4
Slide 4 text
Taiwan Thailand Hong Kong
Slide 5
Slide 5 text
LINE TODAY Taiwan
18M MAU
Slide 6
Slide 6 text
LINE TODAY Architecture
CDN
Object Storage
In-Mem Cache
Backend Server
Frontend Server
Cache Server
Mini Service
Feeding Service
Internal CMS
External CMS
Vue.js
Third Party API
Content Provider
Internal Editor
External Editor
Report Service
Data Warehouse
ML Data Analysis
Slide 7
Slide 7 text
LINE TODAY Architecture
[Diagram as on Slide 6, annotated: CDN 96%; 4%]
Slide 13
Slide 13 text
Agenda
• LINE TODAY and its architecture
• How mini services and K8S change our development and operations
  • Refactor to mini services
• Lessons learnt
Slide 14
Slide 14 text
How to improve development and deployment efficiency?
Problem
A small change requires rebuilding and redeploying the entire system
Solutions
- Break down coarse-grained deployments into functionally cohesive mini services
- Move to Kubernetes
[Diagram: many modules in one deployment (e.g. war file) versus groups of modules in three OCI images]
Slide 15
Slide 15 text
Migrate to mini services and Kubernetes
Article Service
Subscription Service
Interaction Service
Frontend Server
Cache Server
Ingress Controller
Kubernetes
Web Server (VM) API Server (VM)
Observability
logs
metrics
tracing
Mini Services
CD
Slide 16
Slide 16 text
Agenda
• LINE TODAY and its architecture
• How mini services and K8S change our development and operations
  • Refactor to mini services
  • Refine CI/CD
• Lessons learnt
Slide 17
Slide 17 text
Build and test what was changed
service1 service2
libA
Changing service1
=> rebuild service1
service1 service2
libA
Changing libA => rebuild libA, service1, service2
libB libB
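The rule on this slide (rebuild only what a change can affect) amounts to walking the dependency graph in reverse. A minimal sketch, assuming a hypothetical in-memory map from each module to the modules it depends on; the class and method names are illustrative and not the team's actual build tooling:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: given "module -> modules it depends on",
// compute everything that must be rebuilt when one module changes.
final class AffectedModules {

    static Set<String> affectedBy(String changed, Map<String, List<String>> dependsOn) {
        Set<String> affected = new TreeSet<>();   // sorted for stable output
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changed);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (!affected.add(current)) {
                continue;                          // already visited
            }
            // Any module that depends on `current` is affected too.
            for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
                if (entry.getValue().contains(current)) {
                    queue.add(entry.getKey());
                }
            }
        }
        return affected;
    }
}
```

With service1 and service2 both depending on libA, a change to service1 yields only service1, while a change to libA yields libA, service1 and service2, matching the two cases on the slide.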
Slide 18
Slide 18 text
Manage deployment via GitOps
git repo
Deploy Process
service(s) version(s)
update manifest
ArgoCD + Kustomize
Slide 19
Slide 19 text
Agenda
• LINE TODAY and its architecture
• How mini services and K8S change our development and operations
  • Refactor to mini services
  • Refine CI/CD
  • Integrate with observability
• Lessons learnt
Slide 20
Slide 20 text
Observability improves operational efficiency
Article Mini Service
Subscription Mini Service
Interaction Mini Service
Frontend Server
Cache Server
Ingress Controller
Kubernetes
Web Server (VM) API Server (VM)
Observability
logs
metrics
tracing
Mini Services
CD
Slide 21
Slide 21 text
Observability - service RPS / latency metrics
Slide 22
Slide 22 text
Observability - metrics week-over-week
Slide 23
Slide 23 text
Observability - abnormal spikes
Slide 24
Slide 24 text
Troubleshooting - 1. receive alert from Slack
Slide 25
Slide 25 text
Troubleshooting - 2. link to alert panel and access logs
Slide 26
Slide 26 text
Troubleshooting - 3. open trace viewer
Slide 27
Slide 27 text
Troubleshooting via observability
Alerts -> Metrics -> Logs -> Traces -> Logs
1. Alerts: received alerts from Slack
2. Metrics: check error source and time period
3. Logs: inspect access logs
4. Traces: open trace viewer
5. Logs: jump to service logs of the trace
Links between signals: Exemplars, Split view with labels, Metric queries, Span metrics processor, Trace to logs, Followed Trace ID
[Diagram: Metrics, Traces and Logs linked to one another]
Slide 28
Slide 28 text
Agenda
• LINE TODAY and its architecture
• How mini services and K8S change our development and operations
  • Refactor to mini services
  • Refine CI/CD
  • Integrate with observability
  • Leverage K8S CronJob
• Lessons learnt
Slide 29
Slide 29 text
Run periodic simple tasks (Java) on Kubernetes
• Requirements
  • run at a specific time / interval
  • simple
  • concurrency control
  • running history and logs
  • monitoring
  • easy to run in local and test env
• Options
  • Spring @Scheduled
  • Quartz
  • Spring Cloud Data Flow
  • Airflow
  • K8S CronJob
Slide 30
Slide 30 text
Use K8S CronJob to run periodic tasks
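With a CronJob, scheduling, concurrency control and run history live in the manifest (the `schedule`, `concurrencyPolicy`, `successfulJobsHistoryLimit` and `failedJobsHistoryLimit` fields), so the Java side can stay a plain process that runs one task and exits. A minimal sketch under that assumption; `TaskRunner` and the task names are hypothetical and not taken from the talk:

```java
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical task runner: one process runs one named task and exits.
// The same entry point can be started locally or from a CronJob container.
final class TaskRunner {

    /** Runs the named task and returns a process exit code (0 = success). */
    static int run(String taskName, Map<String, Callable<Void>> tasks) {
        Callable<Void> task = tasks.get(taskName);
        if (task == null) {
            System.err.println("Unknown task: " + taskName);
            return 2;
        }
        try {
            task.call();
            return 0;
        } catch (Exception e) {
            // A non-zero exit code lets Kubernetes record the run as failed.
            System.err.println("Task " + taskName + " failed: " + e);
            return 1;
        }
    }
}
```

A `main` method would pass the result to `System.exit`, with the task name taken from the container arguments.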
Slide 31
Slide 31 text
K8S CronJob metrics
Slide 32
Slide 32 text
Agenda
• LINE TODAY and its architecture
• How mini services and K8S change our development and operations
• Lessons learnt
Slide 33
Slide 33 text
Issue - intermittent errors during rolling update
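K8S API Server
Worker node: kube-proxy, kubelet, Pod
Worker node: kube-proxy
1a. delete pod
1b. remove pod from service endpoint
(1a and 1b run in parallel, so a pod can still receive requests after it has started shutting down.)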
Slide 34
Slide 34 text
Solution - graceful shutdown
Pod termination timeline: Pre-stop hook -> SIGTERM -> Main container process shuts down -> SIGKILL (container killed, if still running)
Configured in: deployment manifest, spring boot application.yaml
• Existing requests allowed to complete
• No new requests permitted
K8S API Server
Worker node: kube-proxy, kubelet, Pod
Worker node: kube-proxy
delete pod
remove pod from service endpoint
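The manifest and application.yaml shown on this slide were images, so their actual values are not recoverable here. In Spring Boot this behaviour is typically enabled with `server.shutdown: graceful` and `spring.lifecycle.timeout-per-shutdown-phase`, combined with a `lifecycle.preStop` hook and `terminationGracePeriodSeconds` in the pod spec. The two bullets (existing requests complete, no new requests permitted) can be illustrated with a plain `ExecutorService`; this is a simplified sketch, not Spring Boot's implementation, and `GracefulWorkerPool` is a hypothetical name:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a server worker pool.
final class GracefulWorkerPool {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final AtomicInteger completed = new AtomicInteger();

    /** Returns false if shutdown has begun and the request was refused. */
    boolean submit(Runnable request) {
        try {
            pool.execute(() -> {
                request.run();
                completed.incrementAndGet();
            });
            return true;
        } catch (RejectedExecutionException e) {
            return false;                       // no new requests permitted
        }
    }

    /** Called on SIGTERM: stop accepting work, then wait for in-flight requests. */
    boolean shutdownGracefully(long timeout, TimeUnit unit) {
        pool.shutdown();                        // in-flight requests keep running
        try {
            if (pool.awaitTermination(timeout, unit)) {
                return true;                    // everything drained in time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow();                     // grace period exhausted
        return false;
    }

    int completedCount() {
        return completed.get();
    }
}
```

The timeout plays the role of the grace period: once it expires, remaining work is interrupted, much as SIGKILL ends a container that is still running.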
Slide 35
Slide 35 text
Issue - unpredictable request spikes
• pod removed from endpoint at 30ish seconds
• fewer pods available to serve requests
• requests queue up
• pod restarted at 60ish seconds
• downward spiral
Slide 36
Slide 36 text
Options to handle request spikes - it depends
• Overprovision
  • $$$$
• Auto scaling
  • pod - 20+ seconds
  • node - ~5 minutes
  • serverless (lambda) - seconds
• Protection via ingress controller / api gateway
  • circuit breaker - 503
  • rate-limit - 429
• Improve design
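The rate-limit option can be illustrated with a token bucket that admits a request while tokens remain and answers 429 otherwise. The slide does not say which algorithm the ingress controller or API gateway uses, so the token bucket, the `TokenBucket` class and the injected clock are all assumptions made for this sketch:

```java
import java.util.function.LongSupplier;

// Hypothetical token-bucket limiter with an injectable nanosecond clock.
final class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity;                 // start with a full bucket
        this.lastRefill = nanoClock.getAsLong();
    }

    /** Returns the HTTP status to send: 200 if admitted, 429 if rate-limited. */
    synchronized int admit() {
        long now = nanoClock.getAsLong();
        double elapsedSeconds = (now - lastRefill) / 1_000_000_000.0;
        // Refill in proportion to elapsed time, never above capacity.
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return 200;
        }
        return 429;
    }
}
```

In production the clock would be `System::nanoTime`. The capacity bounds how large a burst is let through, and the refill rate bounds the sustained request rate.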
Slide 37
Slide 37 text
Issue - job killed without error log
Slide 38
Slide 38 text
Root cause: Linux kernel memory leak
reboot
Slide 39
Slide 39 text
Summary
• LINE TODAY and its architecture
• Mini services and K8S help dev / ops efficiency for large systems
  • Refactor to mini services
  • Refine CI/CD
  • Integrate with observability
  • Leverage K8S CronJob
• Build in-depth DevOps and K8S skills