LINE TODAY - A Video Content Platform Serving Over Ten Million Active Users on a Microservices Architecture

LINE Developers Taiwan

October 18, 2022

Transcript

  1. LINE TODAY - A Video Content Platform Serving Over Ten Million Active Users on a Microservices Architecture
    Libra Huang - LINE TODAY
    10/18/2022

  2. Agenda
    • LINE TODAY and its architecture
    • How mini services and K8S change our development and operations
    • Lessons learnt

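  3. LINE TODAY Stays With You Today
    Personalized messages
    8:00 Finance / weather
    12:30 International / domestic / entertainment
    Lifestyle content
    15:00 Topics / polls / movies
    Official account
    18:00 TODAY看世界 (TODAY Sees the World)
    20:00 Baseball / NBA / concerts
    News content
    Live streaming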

  4. Taiwan Thailand Hong Kong

  5. LINE TODAY Taiwan
    18M MAU

  6. LINE TODAY Architecture
    Diagram components: CDN, Object Storage, In-Memory Cache, Backend Server, Frontend Server, Cache Server, Mini Service, Feeding Service, Internal CMS, External CMS, Vue.js, Third Party API, Content Provider, Internal Editor, External Editor, Report Service, Data Warehouse, ML Data Analysis

  7. LINE TODAY Architecture
    Same diagram and component labels as slide 6, annotated with two percentages: 96% (next to the CDN) and 4%.

  8. LINE TODAY Architecture (diagram repeated; same component labels as slide 6)

  9. LINE TODAY Architecture (diagram repeated; same component labels as slide 6)

  10. LINE TODAY Architecture (diagram repeated; same component labels as slide 6)

  11. LINE TODAY Architecture (diagram repeated; same component labels as slide 6)

  12. LINE TODAY Architecture (diagram repeated; same component labels as slide 6)

  13. Agenda
    • LINE TODAY and its architecture
    • How mini services and K8S change our development and operations
      • Refactor to mini services
    • Lessons learnt

  14. How to improve development and deployment efficiency?
    Problem
    - A small change requires rebuilding and redeploying the entire system
    Solutions
    - Break down coarse-grained deployments into functionally cohesive mini services
    - Move to Kubernetes
    Diagram: modules packaged as a single deployment (e.g. a war file) versus modules split across three OCI images

  15. Migrate to mini services and Kubernetes
    Diagram labels: Web Server (VM), API Server (VM), Frontend Server, Cache Server, Kubernetes, Ingress Controller, Mini Services (Article Service, Subscription Service, Interaction Service), Observability (logs, metrics, tracing), CD

  16. Agenda
    • LINE TODAY and its architecture
    • How mini services and K8S change our development and operations
      • Refactor to mini services
      • Refine CI/CD
    • Lessons learnt

  17. Build and test what was changed
    Diagram (two panels, each showing service1, service2, libA and libB):
    - Changing service1 => rebuild service1
    - Changing libA => rebuild libA, service1, service2
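
The rule on this slide can be sketched as a walk over reverse dependencies. This is a minimal illustration, not the team's build tooling: the `DEPENDS_ON` map and the `affected_targets` helper are hypothetical, the service1/service2 to libA edges follow the slide, and the libB edge is an assumption because the transcript does not show how libB is connected.

```python
from collections import deque

# Hypothetical dependency map: target -> direct dependencies.
# service1/service2 -> libA follows the slide; the libB edge is an assumption.
DEPENDS_ON = {
    "service1": {"libA"},
    "service2": {"libA", "libB"},
    "libA": set(),
    "libB": set(),
}


def affected_targets(changed, depends_on):
    """Return the changed targets plus everything that depends on them, directly or transitively."""
    # Invert the edges so we can walk from a library to its dependents.
    dependents = {name: set() for name in depends_on}
    for target, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(target)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return sorted(affected)


print(affected_targets(["service1"], DEPENDS_ON))  # ['service1']
print(affected_targets(["libA"], DEPENDS_ON))      # ['libA', 'service1', 'service2']
```

Only the targets returned by the walk need to be rebuilt and retested; everything else can reuse earlier build results.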

  18. Manage deployment via GitOps
    Diagram: the deploy process updates the service version(s) in the manifest stored in a git repo; ArgoCD + Kustomize deploy from that repo
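
The "update manifest" step can be pictured as a small edit to the `images` list of a kustomization, which is then committed to the git repo. The sketch below is an assumption about what such a step might look like, not the pipeline shown in the talk: the kustomization is modelled as a Python dict to avoid a YAML dependency, and the service names, tags and the `bump_image_tag` helper are made up.

```python
import copy
import json

# Hypothetical kustomization content, shown as a Python dict instead of YAML
# so the sketch needs no third-party parser. Names and tags are made up.
kustomization = {
    "resources": ["deployment.yaml", "service.yaml"],
    "images": [
        {"name": "article-service", "newTag": "1.4.0"},
        {"name": "subscription-service", "newTag": "2.0.1"},
    ],
}


def bump_image_tag(kustomization, image_name, new_tag):
    """Return a copy of the kustomization with one image pinned to new_tag."""
    updated = copy.deepcopy(kustomization)
    for image in updated.setdefault("images", []):
        if image["name"] == image_name:
            image["newTag"] = new_tag
            break
    else:
        # Image not listed yet: add an override entry for it.
        updated["images"].append({"name": image_name, "newTag": new_tag})
    return updated


released = bump_image_tag(kustomization, "article-service", "1.5.0")
print(json.dumps(released["images"], indent=2))
```

Because the desired version lives in git, a rollback is a revert of that commit rather than a manual change on the cluster.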

  19. Agenda
    • LINE TODAY and its architecture
    • How mini services and K8S change our development and operations
      • Refactor to mini services
      • Refine CI/CD
      • Integrate with observability
    • Lessons learnt

  20. Observability improves operational efficiency
    Diagram labels: Web Server (VM), API Server (VM), Frontend Server, Cache Server, Kubernetes, Ingress Controller, Mini Services (Article Mini Service, Subscription Mini Service, Interaction Mini Service), Observability (logs, metrics, tracing), CD

  21. Observability - service RPS / latency metrics

  22. Observability - metrics week-over-week

  23. Observability - abnormal spikes
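
The dashboards for slides 22 and 23 are not in the transcript. As a rough illustration of the idea only, a week-over-week comparison and a naive spike check might look like the sketch below; the function names, the sample RPS values and the 50% threshold are all assumptions.

```python
def week_over_week_change(current, week_ago):
    """Relative change of a metric versus the same time one week earlier."""
    if week_ago == 0:
        return None  # No baseline to compare against.
    return (current - week_ago) / week_ago


def is_abnormal_spike(current, week_ago, threshold=0.5):
    """Flag the sample when it exceeds last week's value by more than `threshold`."""
    change = week_over_week_change(current, week_ago)
    return change is not None and change > threshold


print(week_over_week_change(1800, 1200))  # 0.5
print(is_abnormal_spike(1800, 1200))      # False
print(is_abnormal_spike(3000, 1200))      # True
```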

  24. Troubleshooting - 1. Receive alert from Slack

  25. Troubleshooting - 2. Link to alert panel and access logs

  26. Troubleshooting - 3. Open trace viewer

  27. Troubleshooting via observability
    Flow: Alerts => Metrics => Logs => Traces => Logs
    1. Alerts: receive alerts from Slack
    2. Metrics: check error source and time period
    3. Logs: inspect access logs
    4. Traces: open trace viewer
    5. Logs: jump to service logs of the trace
    Links between Metrics, Traces and Logs shown in the diagram: exemplars, split view with labels, metric queries, span metrics processor, trace to logs, followed trace ID
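
The last step, jumping from a trace to the service logs that belong to it, depends on every log line carrying the trace ID. A minimal sketch of that lookup follows; the record layout, field names and sample values are hypothetical and not taken from the talk.

```python
# Hypothetical structured log records; field names and values are assumptions.
LOGS = [
    {"service": "article", "trace_id": "a1b2", "level": "INFO", "msg": "GET /articles/42"},
    {"service": "subscription", "trace_id": "c3d4", "level": "INFO", "msg": "list subscriptions"},
    {"service": "interaction", "trace_id": "a1b2", "level": "ERROR", "msg": "timeout calling store"},
]


def logs_for_trace(logs, trace_id):
    """Return every log record emitted while handling the given trace."""
    return [record for record in logs if record.get("trace_id") == trace_id]


def failing_services(logs, trace_id):
    """Names of services that logged an ERROR within the trace."""
    return sorted(
        {r["service"] for r in logs_for_trace(logs, trace_id) if r["level"] == "ERROR"}
    )


for record in logs_for_trace(LOGS, "a1b2"):
    print(record["service"], record["level"], record["msg"])
print(failing_services(LOGS, "a1b2"))  # ['interaction']
```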

  28. Agenda
    • LINE TODAY and its architecture
    • How mini services and K8S change our development and operations
      • Refactor to mini services
      • Refine CI/CD
      • Integrate with observability
      • Leverage K8S CronJob
    • Lessons learnt

  29. Run periodic simple tasks (Java) on Kubernetes
    • Requirements
      • run at a specific time / interval
      • simple
      • concurrency control
      • run history and logs
      • monitoring
      • easy to run in local and test environments
    • Options
      • Spring @Scheduled
      • Quartz
      • Spring Cloud Data Flow
      • Airflow
      • K8S CronJob

  30. Use K8S CronJob to run periodic tasks
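
The manifest shown on this slide is not in the transcript. As a sketch of how the requirements from slide 29 could map onto CronJob fields, the helper below builds a `batch/v1` CronJob as a Python dict and prints it as JSON, which `kubectl apply -f` also accepts. The job name, image, schedule and limits are placeholders, and `cron_job_manifest` is a hypothetical helper.

```python
import json


def cron_job_manifest(name, schedule, image, args):
    """Build a batch/v1 CronJob manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,             # cron expression: specific time / interval
            "concurrencyPolicy": "Forbid",    # skip a run while the previous one is still active
            "successfulJobsHistoryLimit": 3,  # keep a short run history
            "failedJobsHistoryLimit": 3,
            "jobTemplate": {
                "spec": {
                    "backoffLimit": 1,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {"name": name, "image": image, "args": args}
                            ],
                        }
                    },
                }
            },
        },
    }


# Placeholder values: the image and arguments are made up for illustration.
manifest = cron_job_manifest(
    name="daily-report",
    schedule="0 3 * * *",
    image="registry.example.com/report-task:1.0.0",
    args=["--date", "yesterday"],
)
print(json.dumps(manifest, indent=2))
```

Since the task is an ordinary container, the same image can be run locally or in a test environment with different arguments.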

  31. K8S CronJob metrics

  32. Agenda
    • LINE TODAY and its architecture
    • How mini services and K8S change our development and operations
    • Lessons learnt

  33. Issue - intermittent errors during rolling update
    Diagram: K8S API Server and two worker nodes (kubelet, kube-proxy, Pod)
    1a. delete pod
    1b. remove pod from service endpoint
    Steps 1a and 1b run in parallel, so requests can still be routed to a pod that is already shutting down.

  34. Solution - graceful shutdown
    Pod termination timeline: pre-stop hook => SIGTERM to the main container process => container shutdown => SIGKILL (if still running) => container killed
    Configured in: deployment manifest, Spring Boot application.yaml
    • Existing requests allowed to complete
    • No new requests permitted
    Diagram: K8S API Server and two worker nodes (kubelet, kube-proxy, Pod); delete pod; remove pod from service endpoint
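
The two bullets above describe drain behaviour: after SIGTERM the process stops admitting requests and waits for in-flight ones. The configuration on the slide is not in the transcript (Spring Boot exposes this behaviour through `server.shutdown: graceful`, which may be what was shown). The sketch below illustrates the same idea in a language-neutral way; `GracefulServer` and its methods are hypothetical, and a real service would call `begin_shutdown` from its SIGTERM handler.

```python
import threading


class GracefulServer:
    """Tracks in-flight requests and refuses new ones once shutdown starts."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0
        self._shutting_down = False
        self._drained = threading.Event()
        self._drained.set()  # Nothing is in flight yet.

    def try_begin_request(self):
        """Return True if the request was admitted, False if shutdown has begun."""
        with self._lock:
            if self._shutting_down:
                return False
            self._in_flight += 1
            self._drained.clear()
            return True

    def end_request(self):
        """Mark one admitted request as finished."""
        with self._lock:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._drained.set()

    def begin_shutdown(self):
        """What a SIGTERM handler would call: stop admitting new requests."""
        with self._lock:
            self._shutting_down = True

    def wait_until_drained(self, timeout):
        """Block until in-flight requests finish; False means the grace period ran out."""
        return self._drained.wait(timeout)


server = GracefulServer()
print(server.try_begin_request())       # True: admitted before shutdown
server.begin_shutdown()                 # SIGTERM arrives
print(server.try_begin_request())       # False: no new requests permitted
print(server.wait_until_drained(0.01))  # False: one request still in flight
server.end_request()
print(server.wait_until_drained(0.01))  # True: safe to exit
```

The pre-stop hook in the timeline gives the endpoint removal time to propagate before SIGTERM is sent, so fewer requests reach a pod that has already stopped admitting them.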

  35. Issue - unpredictable request spikes
    • pod removed from endpoint at around 30 seconds
    • fewer pods available to serve requests
    • requests queue up
    • pod restarted at around 60 seconds
    • downward spiral

  36. Options to handle request spikes - it depends
    • Overprovision
      • $$$$
    • Auto scaling
      • pod - 20+ seconds
      • node - ~5 minutes
      • serverless (lambda) - seconds
    • Protection via ingress controller / API gateway
      • circuit breaker - 503
      • rate limit - 429
    • Improve design
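
The protection option can be illustrated with a token bucket (429 when a caller exceeds its budget) in front of a simplified circuit breaker (503 while the backend is considered unhealthy). This is a sketch of the two techniques, not the ingress controller or gateway configuration used by the team; the class names, thresholds and the 502 returned for a failed backend call are assumptions, and the breaker has no half-open state.

```python
class TokenBucket:
    """Rate limiter: allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated_at = now

    def allow(self, now):
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures and stays open for `reset_after` seconds."""

    def __init__(self, max_failures, reset_after):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def is_open(self, now):
        if self.opened_at is None:
            return False
        if now - self.opened_at >= self.reset_after:
            # Cool-down elapsed: close again and let traffic through.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def record(self, success, now):
        if success:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = now


def handle(now, limiter, breaker, backend):
    """Return the HTTP status a gateway-like front door would send."""
    if breaker.is_open(now):
        return 503  # Fail fast instead of queueing on a struggling backend.
    if not limiter.allow(now):
        return 429  # Caller exceeded its rate budget.
    ok = backend()
    breaker.record(ok, now)
    return 200 if ok else 502  # 502 for a failed backend call is an assumption.


limiter = TokenBucket(rate=1, capacity=2)
breaker = CircuitBreaker(max_failures=2, reset_after=10)
print([handle(0.0, limiter, breaker, lambda: True) for _ in range(3)])  # [200, 200, 429]
```

Both responses shed load early, which keeps requests from queueing up on pods that are already saturated.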

  37. Issue - job killed without error log

  38. Root cause: Linux kernel memory leak
    reboot

  39. Summary
    • LINE TODAY and its architecture
    • Mini services and K8S help dev / ops efficiency for large systems
      • Refactor to mini services
      • Refine CI/CD
      • Integrate with observability
      • Leverage K8S CronJob
    • Build in-depth DevOps and K8S skills

  40. Thank you
