Sidecar Authentication For Reliable Microservices

Slides from a talk given at CloudNative Days Tokyo 2021.
https://event.cloudnativedays.jp/cndt2021/talks/1285

LINE Developers

November 04, 2021

Transcript

  1. Sidecar Authentication for Reliable Microservices
    Ran Xu / Verda Platform Development, Infrastructure Engineer
  2. About Me
    ‣ Ran Xu (littledriver)
    ‣ Joined LINE in 2019/09
    ‣ Infrastructure Engineer of Managed Kubernetes Service
  3. Agenda
    - Introduction of Verda (LINE Private Cloud Platform)
    - Outage Case Study: Verda Identity Service
    - Analysis of Identity Outage Case
    - Improvement Plan of Identity Service
  4. Verda — LINE Private Cloud Platform

  5. Verda Family
    User Interface: WebUI, API
    Backend Components:
      IaaS: Identity, VM, Bare metal, DNS, Load Balancer, Object Storage, Block Storage, CDN
      Managed Service: Managed Database, Managed Kafka, Managed Kubernetes
      PaaS: Platform Service, Function Service
  6. Outage Case Study of Identity Service

  7. Outage Case Study of Identity Service
    Impact, Workaround, Root Cause, Issue
  8. Identity Service (Keystone) Is Down Due to Malicious Client Requests

  9. OUTAGE-2021-03
    Impact - Infra services inside the Verda platform that rely on the Keystone service are down
    Workaround - Restart Keystone and the Apache server
    Issue - The Keystone service cannot handle any request normally and appears to be stuck
    Root Cause - ?
  10. Expected Keystone Request Handling Flow
    [Diagram: an HTTP request (GET /v3/auth) reaches the Keystone service. The Apache web server acts as a pre-fork WSGI server: each Apache mod_wsgi process (Keystone WSGI server) loads and starts a Keystone WSGI app built on webob.]
  11. Actual Keystone Request Handling Flow
    [Diagram: an HTTP request (GET /v3/auth) passes through the Apache pre-fork WSGI server (mod_wsgi) to the Keystone WSGI app (webob).]
    Response:
      HTTP/1.1 500 Internal Server Error
      Date: Sun, 23 May 2021 14:21:12 GMT
      Server: Apache
      Cache-Control: private, max-age=0, must-revalidate
      Content-Length: 527
      Connection: close
      Content-Type: text/html; charset=iso-8859-1

      <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
      <html><head>
      <title>500 Internal Server Error</title>………</html>
    [Error]hit timeout: 20.0556459427 sec passed
  12. Actual Keystone Request Handling Flow
    [Diagram: an HTTP request (GET /v3/auth) passes through the Apache pre-fork WSGI server (mod_wsgi) to the Keystone WSGI app (webob).]
    ERROR keystone.common.wsgi [req-63ca9bf4-139c-4d76-9db4-3b1c26c7749d - - - - -] request data read error
    ERROR keystone.common.wsgi Traceback (most recent call last):
    ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/common/wsgi.py", line 434, in __call__
    ERROR keystone.common.wsgi     response = self.process_request(request)
    ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/middleware/core.py", line 89, in process_request
    ERROR keystone.common.wsgi     params_json = request.body
    ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/webob/request.py", line 701, in _body__get
    ERROR keystone.common.wsgi     self.make_body_seekable() # we need this to have content_length
    ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/webob/request.py", line 943, in make_body_seekable
    ERROR keystone.common.wsgi     self.copy_body()
    ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/webob/request.py", line 966, in copy_body
    ERROR keystone.common.wsgi     self.body = self.body_file.read(self.content_length)
    ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/webob/request.py", line 1542, in readinto
    ERROR keystone.common.wsgi     data = self.file.read(sz0)
    ERROR keystone.common.wsgi IOError: request data read error
  13. Actual Keystone Request Handling Flow
    [Diagram: an HTTP request (GET /v3/auth) passes through the Apache pre-fork WSGI server (mod_wsgi) to the Keystone WSGI app (webob).]
    mod_wsgi (pid=73664): Unable to get bucket brigade for request.
    mod_wsgi (pid=73664): Unable to get bucket brigade for request.
    mod_wsgi (pid=73664): Unable to get bucket brigade for request.
    mod_wsgi (pid=72590): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-public'.
    mod_wsgi (pid=72586): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-public'.
    IOError: failed to write data
    IOError: failed to write data
  14. Expected Keystone Request Handling Flow
    [Diagram: an HTTP request (GET /v3/auth) passes through the Apache pre-fork WSGI server (mod_wsgi) to the Keystone WSGI apps (webob); each of the three Keystone WSGI apps is marked with ⌛.]
  15. OUTAGE-2021-03
    Impact - Infra services inside the Verda platform that rely on the Keystone service are down
    Workaround - Restart Keystone and the Apache server
    Issue - The Keystone service cannot handle token issue and token validation requests normally
    Root Cause - The Keystone WSGI application goes down due to malicious requests, and the Keystone WSGI server does not restart it.
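The traceback on slide 12 ends in an IOError raised while the request body is being read. The following is a minimal sketch of one generic way to contain that kind of failure in a WSGI stack: a middleware that reads the body up front and answers with a 4xx status instead of letting the exception propagate. It is not the fix described in the talk; the class name `SafeBodyMiddleware` and the `max_body` cap are hypothetical.

```python
import io


class SafeBodyMiddleware:
    """Read the request body eagerly and reject requests whose body cannot be read.

    Hypothetical illustration only: not Keystone code and not the talk's fix.
    """

    def __init__(self, app, max_body=1024 * 1024):
        self.app = app
        self.max_body = max_body  # assumed size cap, not taken from the slides

    def __call__(self, environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            return self._reject(start_response, "400 Bad Request")
        if length < 0:
            return self._reject(start_response, "400 Bad Request")
        if length > self.max_body:
            return self._reject(start_response, "413 Payload Too Large")
        try:
            body = environ["wsgi.input"].read(length) if length else b""
        except OSError:  # IOError is an alias of OSError on Python 3
            return self._reject(start_response, "400 Bad Request")
        if len(body) != length:
            # The client announced more bytes than it actually sent.
            return self._reject(start_response, "400 Bad Request")
        # Hand the wrapped app a fresh, seekable copy of the body.
        environ["wsgi.input"] = io.BytesIO(body)
        return self.app(environ, start_response)

    @staticmethod
    def _reject(start_response, status):
        payload = status.encode("ascii")
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(payload)))])
        return [payload]
```

Wrapping an application as `SafeBodyMiddleware(app)` keeps a truncated or aborted upload from surfacing as an unhandled exception inside the application itself.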
  16. Keystone Service Is Down Because Requests Exceed Service Processing Capacity

  17. OUTAGE-2021-05
    Impact - Infra services inside the Verda platform that rely on the Keystone service are down
    Workaround - ?
    Issue - The Keystone service cannot handle any request normally and appears to be stuck
    Root Cause - ?
  18. OUTAGE-2021-05: Troubleshooting
    Impact - Infra services inside the Verda platform that rely on the Keystone service are down
    Workaround - ?
    Issue - The Keystone service cannot handle token issue and token validation requests normally
    Root Cause - ?
  19. OUTAGE-2021-05: Troubleshooting
    - 2.5M req / 15 mins ≈ 2,777 req/s came to the 3 Keystone nodes
    - Authentication API latency: 33 ms/req
    - Max Keystone QPS = 33 (req/s) * 12 (processes/node) * 3 (nodes) = 1,188 req/s
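The capacity arithmetic on this slide can be reproduced as a short worked example. All figures are taken from the slide as given. Note that a 33 ms latency handled serially corresponds to roughly 30 req/s per process, so the slide's 33 req/s is a slightly generous per-process estimate. The `nodes_needed` figure is an extra derived here, not a number from the talk.

```python
import math

# Figures quoted on the slide
requests_in_window = 2_500_000      # requests observed in one window
window_seconds = 15 * 60            # 15-minute window
per_process_qps = 33                # req/s per Keystone process (slide's figure)
processes_per_node = 12
nodes = 3

incoming_qps = requests_in_window / window_seconds
max_qps = per_process_qps * processes_per_node * nodes
overload_ratio = incoming_qps / max_qps

# Derived here: nodes required to absorb the same load at the same per-node capacity
nodes_needed = math.ceil(incoming_qps / (per_process_qps * processes_per_node))

print(f"incoming: {incoming_qps:.0f} req/s, capacity: {max_qps} req/s")
print(f"overload: {overload_ratio:.2f}x, nodes needed: {nodes_needed}")
```

Under these assumptions the incoming load is a little over twice what three nodes can serve, which is consistent with the service appearing stuck.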
  20. OUTAGE-2021-05: Troubleshooting
    {"duration":"33.910ms","level":"info","method":"GET","msg":"access-log","path":"/v3/auth/tokens","requestID":"3fd49e24-18f2-4a61-8520-8ff42f5797c9","User-Agent":"vks-gw-api","status":200,"time":"9999-99-99T16:51:37.681Z"}
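Access-log entries like the one above make it possible to attribute load to a caller. The sketch below counts requests per `User-Agent` for a given path, assuming one JSON object per line with the field names shown on the slide. The function name and the sample lines are hypothetical; only the `vks-gw-api` agent and the `/v3/auth/tokens` path come from the slide.

```python
import json
from collections import Counter


def count_by_user_agent(lines, path="/v3/auth/tokens"):
    """Count access-log entries per User-Agent for one request path."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        if entry.get("path") == path:
            counts[entry.get("User-Agent", "unknown")] += 1
    return counts


# Hypothetical sample input in the slide's log format
sample = [
    '{"method":"GET","path":"/v3/auth/tokens","User-Agent":"vks-gw-api","status":200}',
    '{"method":"GET","path":"/v3/auth/tokens","User-Agent":"vks-gw-api","status":200}',
    '{"method":"GET","path":"/v3/auth/tokens","User-Agent":"other-client","status":200}',
    '{"method":"GET","path":"/v3/projects","User-Agent":"other-client","status":200}',
    'not json',
]
top_agents = count_by_user_agent(sample).most_common()
```

Sorting the counter with `most_common()` puts the heaviest caller first, which is the kind of evidence that points at a single client as the source of a spike.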

  21. OUTAGE-2021-05
    Impact - Infra services inside the Verda platform that rely on the Keystone service are down
    Workaround - ?
    Issue - The Keystone service cannot handle token issue and token validation requests normally
    Root Cause - A large number of token validation requests from Verda Managed Kubernetes exceeded the handling capacity of the Keystone service
  22. OUTAGE-2021-05
    [Diagram: Apache Web Server (New Keystone Service); Apache Web Server (Old Keystone Service)]
  23. Analysis of Identity Outage Case

  24. Analysis of Identity Outage Case
    Architecture, Dependence
  25. Architecture: Keystone Request Distribution
    - Almost all requests that come to the Keystone service are related to authentication (token issue / token validation), within each 15-minute time window
    - Around 99.95% of the load on the Keystone service is token validation and only 0.05% of requests are token issue, within each 15-minute time window
  26. Architecture: Keystone Request Distribution
    [Diagram: Apache Web Server in front of the Keystone Service, which provides Token Validation, Token Issue, Service Discovery, and Others]
  27. Architecture: Keystone Patch Distribution (Past 3 Years)
    Custom Patches: 15
    Authentication Patch: 0
    Upstream Patches: 3
  28. Architecture: Token Issue & Token Validation
    - Almost all requests that come to the Keystone service are related to authentication (token issue / token validation), within each 15-minute time window
    - Around 99.95% of the load on the Keystone service is token validation and only 0.05% of requests are token issue, within each 15-minute time window
  29. Architecture: Tech Direction
    - Decouple authentication functionality from the original Keystone
    - Around 99.95% of the load on the Keystone service is token validation and only 0.05% of requests are token issue, within each 15-minute time window
  30. Dependence: Verda Infra Service ↔ Keystone Service
    [Diagram: VKS, VFS, VBS, and VOS depend on Keystone (Apache Web Server; Token Validation, Token Issue, Service, Others)]
  31. Dependence: Tech Direction
    [Diagram: Keystone (Apache Web Server; Token Validation, Token Issue, Service, Others) alongside VKS, VFS, VBS, and VOS, each with its own Token Validation component]
  32. Improvement Plan of Identity Service

  33. Improvement Plan of Identity Service
    - Decouple: Decouple Authentication From Keystone
    - Stateless Token Validation: Implement Stateless Token Validation on the Edge Side
  34. Decouple Authentication From Keystone
    [Diagram components: Load Balancer; Nginx-Ingress Controller; Ingress-Keystone; Ingress-Nova; Service; Deployment workload running Apache Web Server, Keystone WSGI-Server (Apache mod_wsgi), and Keystone WSGI-APP (webob)]
  35. [Diagram components: Nginx-Ingress Controller with Ingress-Keystone-Auth and Ingress-Keystone-Others; Service; two Deployment workloads, each running Apache Web Server, WSGI Server, and WSGI APP; request GET /v3/auth/tokens]
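The split into Ingress-Keystone-Auth and Ingress-Keystone-Others can be pictured as a path-based routing rule. The sketch below is only an illustration of that idea: the function and the backend labels `keystone-auth` and `keystone-others` are hypothetical, and the real routing is done by the Nginx ingress controller, whose configuration is not shown in the slides.

```python
AUTH_PREFIX = "/v3/auth/tokens"


def pick_backend(path):
    """Return the backend label for a request path (hypothetical labels)."""
    normalized = path.rstrip("/") or "/"
    if normalized == AUTH_PREFIX or normalized.startswith(AUTH_PREFIX + "/"):
        # Token issue and token validation go to the dedicated auth workload.
        return "keystone-auth"
    # Everything else stays on the general Keystone workload.
    return "keystone-others"
```

Isolating the authentication path this way means a flood of token requests can only saturate the auth workload, while the remaining Keystone APIs keep their own capacity.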
  36. Improvement Plan of Identity Service
    - Decouple: Decouple Authentication From Keystone
    - Stateless Token Validation: Implement Stateless Token Validation on the Edge Side
  37. Token Issue
    Fernet Token fields: Version, Current Timestamp, Initialization Vector, Cipher Text, HMAC
    Token Payload: Version / UserID / Method / Project ID / Expiry Time / Audit ID
    Fernet key
  38. Token Validation
    [Diagram components: Fernet key; HMAC; Token Payload: Version / UserID / Method / Project ID / Expiry Time / Audit ID; outcomes: Validate / Invalidate]
    GET 'https://verda-stage-dev-keystone.linecorp-dev.com:5000/v3/auth/tokens' \
      --header 'Content-Type: application/json' \
      --header 'X-Auth-Token: <A valid authentication token for an administrative user>' \
      --header 'X-Subject-Token: <the token that you want to validate>'
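Because a Fernet token carries its own HMAC, part of the validation can run locally with nothing but the key. The sketch below follows the Fernet specification's frame layout (1-byte version 0x80, 8-byte big-endian timestamp, 16-byte IV, ciphertext, 32-byte HMAC-SHA256 keyed with the first half of the key) and uses only the standard library. It checks integrity and the age of the issue timestamp; it does not decrypt. The Expiry Time listed on the slide lives inside the encrypted payload, so a complete validator would also need AES-128-CBC decryption from a third-party library. The function name and the `max_age` parameter are hypothetical.

```python
import base64
import hashlib
import hmac
import struct
import time

FERNET_VERSION = 0x80
HEADER_AND_MAC = 1 + 8 + 16 + 32  # version + timestamp + IV + HMAC


def _b64decode(data: bytes) -> bytes:
    # Re-add base64 padding in case it was stripped from the token.
    return base64.urlsafe_b64decode(data + b"=" * (-len(data) % 4))


def verify_fernet_frame(token: bytes, key: bytes, max_age: int, now=None) -> dict:
    """Verify the HMAC and issue-time age of a Fernet token without decrypting it."""
    raw_key = _b64decode(key)
    if len(raw_key) != 32:
        raise ValueError("Fernet key must decode to 32 bytes")
    signing_key = raw_key[:16]  # the second half is the AES encryption key

    raw = _b64decode(token)
    if len(raw) < HEADER_AND_MAC + 16 or (len(raw) - HEADER_AND_MAC) % 16:
        raise ValueError("malformed token")
    if raw[0] != FERNET_VERSION:
        raise ValueError("malformed token")

    body, mac = raw[:-32], raw[-32:]
    expected = hmac.new(signing_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("bad signature")

    (issued_at,) = struct.unpack(">Q", raw[1:9])
    now = int(time.time()) if now is None else now
    if issued_at + max_age < now:
        raise ValueError("token too old")
    return {"issued_at": issued_at, "iv": raw[9:25], "ciphertext": raw[25:-32]}
```

Checking the HMAC before anything else means a forged or corrupted token is rejected without any decryption work and without a round trip to Keystone.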
  39. Stateless Token Validation On Edge
    - Store the Fernet key safely and share it with clients
    - Find a good way to inject token validation logic without intruding on client code
  40. Stateless Token Validation On Edge
    [Diagram: a Deployment workload running Nova together with Verda-Common-Proxy]
  41. Stateless Token Validation On Edge
    API Schema, Audit Log, Metrics, ACL, Token Validation
  42. Token Validation On Edge
    [Diagram components]
    - Keystone (Token Issue): Keystone Original Ingress, Verda-Common-Proxy, keystone; POST /v3/auth/tokens; Issue - payload - Sign & encrypt
    - Nova / Neutron / VKS: Token, Verda-Common-Proxy, Fernet Key; Validation - decrypt - expiration check
  43. Thank You