Sidecar Authentication For Reliable Microservices

Slides from a talk given at CloudNative Days Tokyo 2021.
https://event.cloudnativedays.jp/cndt2021/talks/1285

LINE Developers

November 04, 2021

Transcript

  1. Sidecar Authentication for
    Reliable Microservices
    > Ran Xu / Verda Platform Development Infrastructure Engineer

  2. About Me
    ‣ Ran Xu (littledriver)
    ‣ Joined LINE in 2019/09
    ‣ Infrastructure Engineer of Managed Kubernetes Service

  3. Agenda
    - Introduction of Verda (LINE Private Cloud Platform)
    - Outage Case Study: Verda Identity Service
    - Analysis of Identity Outage Case
    - Improvement Plan of Identity Service

  4. Verda — LINE Private Cloud Platform

  5. Verda Family
    User Interface: WebUI, API
    Backend Components:
    - IaaS: Identity, VM, Bare metal, DNS, Load Balancer, Object Storage, Block Storage, CDN
    - Managed Service: Managed Database, Managed Kafka, Managed Kubernetes
    - PaaS: Platform Service, Function Service

  6. Outage Case Study of Identity Service

  7. Outage Case Study Of Identity Service
    Impact Workaround Root Cause
    Issue

  8. Identity Service (Keystone) Is Down Due
    to Malicious Client Requests

  9. OUTAGE-2021-03
    Impact
    - Infra services inside the Verda platform that rely on the Keystone service are down
    Workaround
    - Restart Keystone and the Apache server
    Issue
    - The Keystone service cannot handle any request normally and appears to be stuck
    Root Cause
    - ?

  10. Expected Keystone Request Handling Flow
    [Diagram] HTTP Request (GET /v3/auth) → Apache Web Server (pre-fork WSGI server) → Keystone Service
    - Apache mod_wsgi runs three Keystone WSGI-Server processes
    - Each Keystone WSGI-Server loads and starts a Keystone WSGI-APP (webob)

  11. Actual Keystone Request Handling Flow
    [Diagram: same request flow as slide 10]
    Response: 'HTTP/1.1 500 Internal Server Error
    Date: Sun, 23 May 2021 14:21:12 GMT
    Server: Apache
    Cache-Control: private, max-age=0, must-revalidate
    Content-Length: 527
    Connection: close
    Content-Type: text/html; charset=iso-8859-1\r\n\r\nbr/>"-//IETF//DTD HTML 2.0//EN">\n\n500 Internal Server Error………
    [Error]hit timeout: 20.0556459427 sec passed

  12. Actual Keystone Request Handling Flow
    [Diagram: same request flow as slide 10]
    ERROR keystone.common.wsgi [req-63ca9bf4-139c-4d76-9db4-3b1c26c7749d - - - - -] request data read error
    ERROR keystone.common.wsgi Traceback (most recent call last):
    ERROR keystone.common.wsgi File "/usr/lib/python2.7/site-packages/keystone/common/wsgi.py", line 434, in __call__
    ERROR keystone.common.wsgi response = self.process_request(request)
    ERROR keystone.common.wsgi File "/usr/lib/python2.7/site-packages/keystone/middleware/core.py", line 89, in process_request
    ERROR keystone.common.wsgi params_json = request.body
    ERROR keystone.common.wsgi File "/usr/lib/python2.7/site-packages/webob/request.py", line 701, in _body__get
    ERROR keystone.common.wsgi self.make_body_seekable() # we need this to have content_length
    ERROR keystone.common.wsgi File "/usr/lib/python2.7/site-packages/webob/request.py", line 943, in make_body_seekable
    ERROR keystone.common.wsgi self.copy_body()
    ERROR keystone.common.wsgi File "/usr/lib/python2.7/site-packages/webob/request.py", line 966, in copy_body
    ERROR keystone.common.wsgi self.body = self.body_file.read(self.content_length)
    ERROR keystone.common.wsgi File "/usr/lib/python2.7/site-packages/webob/request.py", line 1542, in readinto
    ERROR keystone.common.wsgi data = self.file.read(sz0)
    ERROR keystone.common.wsgi IOError: request data read error
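
The traceback above ends with an IOError raised while the request body is being read (`self.body_file.read(self.content_length)`). As a minimal sketch, and not Keystone's or webob's actual code, the stdlib-only WSGI middleware below shows one way an application could turn a failed body read into a 400 response for that single request. The names `safe_body_middleware` and `echo_length_app` are hypothetical.

```python
import io


def echo_length_app(environ, start_response):
    """Hypothetical downstream app: reports how many body bytes it received."""
    body = environ["wsgi.input"].read()
    payload = str(len(body)).encode()
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(payload)))])
    return [payload]


def safe_body_middleware(app):
    """Read the request body up front; answer 400 if the client stream fails."""
    def wrapper(environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            body = environ["wsgi.input"].read(length)
        except (IOError, ValueError):
            # Broken or truncated client body: fail this request only.
            msg = b"request data read error"
            start_response("400 Bad Request", [("Content-Type", "text/plain"),
                                               ("Content-Length", str(len(msg)))])
            return [msg]
        # Hand the wrapped app a fully buffered, seekable copy of the body.
        environ["wsgi.input"] = io.BytesIO(body)
        return app(environ, start_response)
    return wrapper
```

Wrapping the application this way keeps the error local to the offending request; whether that would have prevented the outage described here is not something the slides state.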

  13. Actual Keystone Request Handling Flow
    [Diagram: same request flow as slide 10]
    mod_wsgi (pid=73664): Unable to get bucket brigade for request.
    mod_wsgi (pid=73664): Unable to get bucket brigade for request.
    mod_wsgi (pid=73664): Unable to get bucket brigade for request.
    mod_wsgi (pid=72590): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-public'.
    mod_wsgi (pid=72586): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-public'.
    IOError: failed to write data
    IOError: failed to write data

  14. Expected Keystone Request Handling Flow
    [Diagram: same request flow as slide 10]

  15. OUTAGE-2021-03
    Impact
    - Infra services inside the Verda platform that rely on the Keystone service are down
    Workaround
    - Restart Keystone and the Apache server
    Issue
    - The Keystone service cannot handle token issue and token validation requests normally
    Root Cause
    - The Keystone WSGI application went down due to malicious requests and was not re-executed by the Keystone WSGI server

  16. Keystone Service Is Down Because
    Requests Exceed Service Processing
    Capacity

  17. OUTAGE-2021-05
    Impact
    - Infra services inside the Verda platform that rely on the Keystone service are down
    Workaround
    - ?
    Issue
    - The Keystone service cannot handle any request normally and appears to be stuck
    Root Cause
    - ?

  18. OUTAGE-2021-05: Troubleshooting
    Impact
    - Infra services inside the Verda platform that rely on the Keystone service are down
    Workaround
    - ?
    Issue
    - The Keystone service cannot handle token issue and token validation requests normally
    Root Cause
    - ?

  19. OUTAGE-2021-05: Troubleshooting
    - 2.5M req / 15 min = 2,777 req/s arrived at the 3 Keystone nodes
    - Authentication API latency: 33 ms/req
    - Max Keystone QPS = 33 (req/s) * 12 (processes/node) * 3 (nodes) = 1,188 req/s
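
The capacity comparison on this slide can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers shown on the slide. Note that a strict 33 ms per request would give roughly 30 req/s per process, so the 33 req/s per-process figure is taken from the slide as given rather than derived.

```python
requests_in_window = 2_500_000      # requests observed in the window
window_seconds = 15 * 60            # 15 minutes

per_process_qps = 33                # req/s per process, as stated on the slide
processes_per_node = 12
nodes = 3

incoming_qps = requests_in_window / window_seconds
max_qps = per_process_qps * processes_per_node * nodes
overload_factor = incoming_qps / max_qps

print(f"incoming: {incoming_qps:,.0f} req/s")
print(f"capacity: {max_qps:,} req/s")
print(f"overload: {overload_factor:.2f}x")
```

With these inputs the incoming rate is a little over twice the computed capacity, which is consistent with the service appearing stuck.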

  20. OUTAGE-2021-05: Troubleshooting
    {"duration":"33.910ms","level":"info","method":"GET","msg":"access-log","path":"/v3/auth/tokens","requestID":"3fd49e24-18f2-4a61-8520-8ff42f5797c9","User-Agent":"vks-gw-api","status":200,"time":"9999-99-99T16:51:37.681Z"}
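
An access log in this shape makes it straightforward to see which client dominates the load. The following sketch assumes one JSON object per line with the `path` and `User-Agent` fields shown above, and counts requests to the token endpoint per client. The function name `count_token_requests`, the sample lines, and the `example-client` user agent are illustrative only.

```python
import json
from collections import Counter


def count_token_requests(lines):
    """Count /v3/auth/tokens requests per User-Agent in JSON-lines access logs."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if entry.get("path") == "/v3/auth/tokens":
            counts[entry.get("User-Agent", "unknown")] += 1
    return counts


# Made-up sample lines in the same shape as the log entry on the slide.
sample = [
    '{"method":"GET","path":"/v3/auth/tokens","User-Agent":"vks-gw-api","status":200}',
    '{"method":"GET","path":"/v3/auth/tokens","User-Agent":"vks-gw-api","status":200}',
    '{"method":"POST","path":"/v3/auth/tokens","User-Agent":"example-client","status":201}',
    '{"method":"GET","path":"/v3/services","User-Agent":"example-client","status":200}',
]
print(count_token_requests(sample).most_common())
```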

  21. OUTAGE-2021-05
    Impact
    - Infra services inside the Verda platform that rely on the Keystone service are down
    Workaround
    - ?
    Issue
    - The Keystone service cannot handle token issue and token validation requests normally
    Root Cause
    - A large number of token validation requests from Verda Managed Kubernetes exceeded the handling capacity of the Keystone service

  22. OUTAGE-2021-05
    Apache Web Server
    Apache Web Server
    New Keystone Service
    Old Keystone Service

  23. Analysis of Identity Outage Case

  24. Analysis Of Identity Outage Case
    Architecture Dependence

  25. Architecture — Keystone Request Distribution


    - Almost all requests that come to the Keystone service are related to authentication (token issue / token validation)
    - Within each 15-minute window, around 99.95% of the load on the Keystone service is token validation; only 0.05% of requests are token issue

  26. Architecture — Keystone Request Distribution


    Apache Web Server
    Keystone Service
    Token Validation
    Token Issue
    Service Discovery
    Others

  27. Architecture — Keystone Patch Distribution
    Past 3 Years
    - Custom Patches: 15
    - Authentication Patch: 0
    - Upstream Patches: 3

  28. Architecture — Token Issue & Token Validation


    - Almost all requests that come to the Keystone service are related to authentication (token issue / token validation)
    - Within each 15-minute window, around 99.95% of the load on the Keystone service is token validation; only 0.05% of requests are token issue

  29. Architecture — Tech Direction


    - Decouple authentication functionality from the original Keystone
    - Within each 15-minute window, around 99.95% of the load on the Keystone service is token validation; only 0.05% of requests are token issue

  30. Dependence — Verda Infra Service ↔ Keystone Service


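    [Diagram] Verda infra services (VKS, VFS, VBS, VOS) all call the Keystone service behind the Apache Web Server
    Keystone service functions: Token Validation, Token Issue, Service, Others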

  31. Dependence — Tech Direction


    [Diagram] Same layout as the previous slide, with a separate Token Validation box attached to each of VKS, VFS, VBS and VOS
    Keystone service functions behind the Apache Web Server: Token Validation, Token Issue, Service, Others

  32. Improvement Plan of Identity Service

  33. Improvement Plan Of Identity Service
    Decouple
    - Decouple Authentication From Keystone
    Stateless Token Validation
    - Implement Stateless Token Validation On The Edge Side

  34. Decouple Authentication From Keystone


    [Diagram] Load Balancer → Nginx-Ingress Controller → Ingress-Keystone / Ingress-Nova → Service → Deployment Workload
    Deployment Workload: Apache Web Server → Apache mod_wsgi (Keystone WSGI-Server) → Keystone WSGI-APP (webob)

  35. [Diagram] Nginx-Ingress Controller with two ingresses, each routed through a Service to its own Deployment Workload (Apache Web Server, WSGI Server, WSGI APP)
    - Ingress-Keystone-Auth: GET /v3/auth/tokens
    - Ingress-Keystone-Others
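
The split shown on this slide sends authentication traffic (`/v3/auth/tokens`) to a dedicated Keystone deployment and everything else to a second one. On the slide this is expressed as two ingresses behind the Nginx-Ingress Controller; the Python sketch below only illustrates the routing decision itself. The backend names are taken from the ingress labels on the slide, and `pick_backend` is a hypothetical helper.

```python
AUTH_PATH = "/v3/auth/tokens"


def pick_backend(request_target: str) -> str:
    """Return the name of the ingress that should receive this request."""
    path = request_target.split("?", 1)[0]  # ignore any query string
    if path == AUTH_PATH or path.startswith(AUTH_PATH + "/"):
        return "Ingress-Keystone-Auth"
    return "Ingress-Keystone-Others"


for target in ("/v3/auth/tokens", "/v3/auth/tokens?nocatalog", "/v3/projects"):
    print(target, "->", pick_backend(target))
```

Isolating the hot path this way means a flood of token requests can saturate only the authentication deployment, while the remaining Keystone APIs keep their own capacity.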

  36. Improvement Plan Of Identity Service
    Decouple
    - Decouple Authentication From Keystone
    Stateless Token Validation
    - Implement Stateless Token Validation On The Edge Side


  37. Token Issue


    Fernet Token Version
    Initialization Vector
    Current Timestamp
    Cipher Text
    HMAC
    Token Payload:
    Version / UserID / Method / Project ID / Expiry Time / Audit ID
    Fernet key

  38. Token Validation


    Fernet key
    Fernet key
    HMAC
    Token Payload:
    Version / UserID / Method / Project ID /
    Expiry Time / Audit ID
    Validate
    InValidate
    GET 'https://verda-stage-dev-keystone.linecorp-dev.com:5000/v3/auth/tokens' \
    --header 'Content-Type: application/json' \
    --header 'X-Auth-Token: administrative user> ' \
    --header 'X-Subject-Token: ' \
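
Because a Fernet token carries its own HMAC, anyone holding the Fernet key can check a token locally instead of calling the endpoint above. The sketch below follows the token layout in the public Fernet specification (Version, Timestamp, IV, Ciphertext, HMAC) and uses only the Python standard library. It verifies the signature and the token age but deliberately skips AES decryption, which the standard library does not provide, so the expiry time inside Keystone's encrypted payload is not inspected. `make_demo_token`, `verify_token` and `ttl_seconds` are hypothetical names, and the demo token carries placeholder bytes rather than a real encrypted payload.

```python
import base64
import hashlib
import hmac
import struct
import time

VERSION = 0x80   # the only version defined by the Fernet spec
HMAC_LEN = 32    # SHA-256 digest size


def _signing_key(fernet_key: bytes) -> bytes:
    """A Fernet key is 32 bytes: 16 for signing, then 16 for AES encryption."""
    raw = base64.urlsafe_b64decode(fernet_key)
    if len(raw) != 32:
        raise ValueError("Fernet key must decode to 32 bytes")
    return raw[:16]


def make_demo_token(fernet_key: bytes, ciphertext: bytes, issued_at: int) -> bytes:
    """Assemble Version | Timestamp | IV | Ciphertext | HMAC (demo only).

    The ciphertext is a caller-supplied placeholder, not real AES output.
    """
    iv = bytes(16)  # fixed IV for the demo; real tokens use a random IV
    body = bytes([VERSION]) + struct.pack(">Q", issued_at) + iv + ciphertext
    mac = hmac.new(_signing_key(fernet_key), body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(body + mac)


def verify_token(fernet_key: bytes, token: bytes, ttl_seconds: int, now=None) -> dict:
    """Check version, HMAC and age; return the still-encrypted fields."""
    data = base64.urlsafe_b64decode(token)
    if len(data) < 1 + 8 + 16 + HMAC_LEN or data[0] != VERSION:
        raise ValueError("malformed token")
    body, mac = data[:-HMAC_LEN], data[-HMAC_LEN:]
    expected = hmac.new(_signing_key(fernet_key), body, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("invalid signature")
    (issued_at,) = struct.unpack(">Q", body[1:9])
    current = int(time.time()) if now is None else now
    if current - issued_at > ttl_seconds:
        raise ValueError("token expired")
    return {"issued_at": issued_at, "iv": body[9:25], "ciphertext": body[25:]}
```

A production validator would also decrypt the ciphertext with the second half of the key and check the expiry time stored in the payload; a maintained Fernet implementation is the safer choice for that.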

  39. Stateless Token Validation On Edge


    - Store the Fernet key safely and share it with clients
    - Find a way to inject token validation logic without intruding on client code

  40. Stateless Token Validation On Edge


    Deployment Workload
    Nova
    Verda-Common-Proxy

  41. Stateless Token Validation On Edge


    API Schema
    Audit Log
    Metrics
    ACL
    Token Validation

  42. Token Validation On Edge
    Keystone-Token Issue
    Token
    Verda-Common-Proxy
    Nova
    Neutron
    VKS
    Fernet Key
    Validation
    - decrypt
    - expiration check
    Verda-Common-Proxy
    keystone
    Token Issue
    - payload
    - Sign & encrypt
    Keystone-Original Ingress
    Verda-Common-Proxy
    keystone
    POST /v3/auth/tokens
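
The slide places token validation (decrypt, expiration check) in Verda-Common-Proxy, in front of services such as Nova, while token issue (POST /v3/auth/tokens) still goes to Keystone. As a rough sketch of that idea, and not the actual proxy implementation, the WSGI middleware below rejects requests without a valid `X-Auth-Token` before they reach the wrapped service. The `validate` callable stands in for local Fernet decryption plus the expiration check; every name in the block is hypothetical.

```python
def token_validation_middleware(app, validate):
    """Wrap a WSGI app so requests are checked locally before reaching it.

    `validate` is any callable taking the token string and returning True
    when the token is acceptable (a placeholder for decrypt + expiry check).
    """
    def wrapper(environ, start_response):
        token = environ.get("HTTP_X_AUTH_TOKEN", "")
        if not token or not validate(token):
            body = b"Unauthorized"
            start_response("401 Unauthorized", [("Content-Type", "text/plain"),
                                                ("Content-Length", str(len(body)))])
            return [body]
        # Token accepted locally: no round trip to the identity service.
        return app(environ, start_response)
    return wrapper


def backend_app(environ, start_response):
    """Hypothetical backend service sitting behind the sidecar."""
    body = b"ok"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Because the check is a wrapper around the service rather than code inside it, the backend itself needs no changes, which matches the goal of injecting validation without intruding on client code.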

  43. Thank You
