Sidecar Authentication For Reliable Microservices

Slides from a talk given at CloudNative Days Tokyo 2021.
https://event.cloudnativedays.jp/cndt2021/talks/1285

LINE Developers

November 04, 2021

Transcript

  1. Sidecar Authentication for Reliable Microservices
     Ran Xu / Verda Platform Development Infrastructure Engineer
  2. About Me
     ‣ Ran Xu (littledriver)
     ‣ Joined LINE in 2019/09
     ‣ Infrastructure Engineer of Managed Kubernetes Service
  3. Agenda
     - Introduction of Verda (LINE Private Cloud Platform)
     - Outage Case Study: Verda Identity Service
     - Analysis of Identity Outage Case
     - Improvement Plan of Identity Service
  4. Verda Family (diagram)
     User Interface: WebUI, API
     Backend Components: Identity
     IaaS: VM, Bare metal, DNS, Load Balancer, Object Storage, Block Storage, CDN
     Managed Service: Managed Database, Managed Kafka, Managed Kubernetes
     PaaS: Platform Service, Function Service
  5. OUTAGE-2021-03
     Impact - Infra services inside the Verda platform that rely on the Keystone service are down
     Workaround - Restart Keystone and the Apache server
     Issue - The Keystone service cannot handle any request normally and appears stuck
     Root Cause - ?
  6. Expected Keystone Request Handling Flow
     Diagram labels: HTTP Request (GET /v3/auth); Keystone Service; Apache Web Server (pre-fork WSGI server); Keystone WSGI-Server on Apache mod_wsgi (x3); Keystone WSGI-APP on webob (x3); Load & Start App
  7. Actual Keystone Request Handling Flow
     Diagram labels: HTTP Request (GET /v3/auth); Keystone Service; Apache Web Server (pre-fork WSGI server); Keystone WSGI-Server on Apache mod_wsgi (x3); Keystone WSGI-APP on webob (x3); Load & Start App
     Response:
       HTTP/1.1 500 Internal Server Error
       Date: Sun, 23 May 2021 14:21:12 GMT
       Server: Apache
       Cache-Control: private, max-age=0, must-revalidate
       Content-Length: 527
       Connection: close
       Content-Type: text/html; charset=iso-8859-1
       <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
       <html><head>
       <title>500 Internal Server Error</title>………</html>
     [Error] hit timeout: 20.0556459427 sec passed
  8. Actual Keystone Request Handling Flow
     Diagram labels: HTTP Request (GET /v3/auth); Keystone Service; Apache Web Server (pre-fork WSGI server); Keystone WSGI-Server on Apache mod_wsgi (x3); Keystone WSGI-APP on webob (x3); Load & Start App
     Keystone log:
       ERROR keystone.common.wsgi [req-63ca9bf4-139c-4d76-9db4-3b1c26c7749d - - - - -] request data read error
       ERROR keystone.common.wsgi Traceback (most recent call last):
       ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/common/wsgi.py", line 434, in __call__
       ERROR keystone.common.wsgi     response = self.process_request(request)
       ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/middleware/core.py", line 89, in process_request
       ERROR keystone.common.wsgi     params_json = request.body
       ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/webob/request.py", line 701, in _body__get
       ERROR keystone.common.wsgi     self.make_body_seekable() # we need this to have content_length
       ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/webob/request.py", line 943, in make_body_seekable
       ERROR keystone.common.wsgi     self.copy_body()
       ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/webob/request.py", line 966, in copy_body
       ERROR keystone.common.wsgi     self.body = self.body_file.read(self.content_length)
       ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/webob/request.py", line 1542, in readinto
       ERROR keystone.common.wsgi     data = self.file.read(sz0)
       ERROR keystone.common.wsgi IOError: request data read error
  9. Actual Keystone Request Handling Flow
     Diagram labels: HTTP Request (GET /v3/auth); Keystone Service; Apache Web Server (pre-fork WSGI server); Keystone WSGI-Server on Apache mod_wsgi (x3); Keystone WSGI-APP on webob (x3); Load & Start App
     Apache log:
       mod_wsgi (pid=73664): Unable to get bucket brigade for request.
       mod_wsgi (pid=73664): Unable to get bucket brigade for request.
       mod_wsgi (pid=73664): Unable to get bucket brigade for request.
       mod_wsgi (pid=72590): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-public'.
       mod_wsgi (pid=72586): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-public'.
       IOError: failed to write data
       IOError: failed to write data
  10. Expected Keystone Request Handling Flow
      Diagram labels: HTTP Request (GET /v3/auth); Keystone Service; Apache Web Server (pre-fork WSGI server); Keystone WSGI-Server on Apache mod_wsgi (x3); Keystone WSGI-APP on webob (x3); Load & Start App; ⌛ ⌛ ⌛
  11. OUTAGE-2021-03
      Impact - Infra services inside the Verda platform that rely on the Keystone service are down
      Workaround - Restart Keystone and the Apache server
      Issue - The Keystone service cannot handle token issue and token validation requests normally
      Root Cause - The Keystone WSGI application went down due to malicious requests, and the Keystone WSGI server did not restart it
  12. OUTAGE-2021-05
      Impact - Infra services inside the Verda platform that rely on the Keystone service are down
      Workaround - ?
      Issue - The Keystone service cannot handle any request normally and appears stuck
      Root Cause - ?
  13. OUTAGE-2021-05: Troubleshooting
      Impact - Infra services inside the Verda platform that rely on the Keystone service are down
      Workaround - ?
      Issue - The Keystone service cannot handle token issue and token validation requests normally
      Root Cause - ?
  14. OUTAGE-2021-05: Troubleshooting
      - 2.5M req / 15 mins = 2,777 req/s arrived at the 3 Keystone nodes
      - Authentication API latency: 33 ms/req
      - Max Keystone QPS = 33 (req/s) * 12 (processes/node) * 3 (nodes) = 1,188 req/s
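The capacity comparison on this slide can be reproduced with a few lines of arithmetic. This is only a sketch using the slide's own figures; the variable names are illustrative. A strict 33 ms per request would give closer to 30 req/s per process, so the sketch simply takes the slide's 33 req/s as given.

```python
# Figures taken from the slide; variable names are illustrative only.
requests_in_window = 2_500_000   # requests that arrived during the window
window_seconds = 15 * 60         # 15-minute window

per_process_qps = 33             # req/s per Keystone process, as stated on the slide
processes_per_node = 12
nodes = 3

incoming_qps = requests_in_window / window_seconds
max_qps = per_process_qps * processes_per_node * nodes
overload_ratio = incoming_qps / max_qps

print(f"incoming: {incoming_qps:.0f} req/s")
print(f"capacity: {max_qps} req/s")
print(f"overload: {overload_ratio:.2f}x")
```

Under these figures the incoming rate is a little over twice what the three nodes can serve, which is consistent with the service appearing stuck.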
  15. OUTAGE-2021-05
      Impact - Infra services inside the Verda platform that rely on the Keystone service are down
      Workaround - ?
      Issue - The Keystone service cannot handle token issue and token validation requests normally
      Root Cause - A large number of token validation requests from Verda Managed Kubernetes exceeded the capacity of the Keystone service
  16. Architecture: Keystone Request Distribution
      - Within each 15-minute window, almost all requests that reach the Keystone service are related to authentication (token issue / token validation)
      - Within each 15-minute window, around 99.95% of the load on the Keystone service is token validation; only 0.05% of requests are token issue
  17. Architecture: Keystone Request Distribution
      Diagram labels: Apache Web Server; Keystone Service; Token Validation; Token Issue; Service Discovery; Others
  18. Architecture: Token Issue & Token Validation
      - Within each 15-minute window, almost all requests that reach the Keystone service are related to authentication (token issue / token validation)
      - Within each 15-minute window, around 99.95% of the load on the Keystone service is token validation; only 0.05% of requests are token issue
  19. Architecture: Tech Direction
      - Decouple authentication functionality from the original Keystone
      - Within each 15-minute window, around 99.95% of the load on the Keystone service is token validation; only 0.05% of requests are token issue
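To see why this split motivates moving validation away from Keystone, the rough sketch below applies the 99.95% / 0.05% shares to the 2.5M-request window from the troubleshooting slide. Combining those two figures is an assumption made here for illustration; the slides do not state that the shares were measured on that exact window.

```python
# Assumption: apply the slide's request shares to the 2.5M-request window.
total_requests = 2_500_000
window_seconds = 15 * 60

validation_share = 0.9995   # token validation share, from the slide
issue_share = 0.0005        # token issue share, from the slide

validation_qps = total_requests * validation_share / window_seconds
issue_qps = total_requests * issue_share / window_seconds

print(f"token validation: {validation_qps:.1f} req/s")
print(f"token issue:      {issue_qps:.2f} req/s")
```

If validation no longer reached Keystone, only the token issue traffic (on the order of one or two requests per second under these assumptions) would remain on the Keystone nodes.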
  20. Dependence: Verda Infra Service ↔ Keystone Service
      Diagram labels: Apache Web Server; Token Validation; Token Issue; Service; Others; VKS; VFS; VBS; VOS
  21. Dependence: Tech Direction
      Diagram labels: Apache Web Server; Token Validation; Token Issue; Service; Others; VKS, VFS, VBS, VOS, each with its own Token Validation
  22. Improvement Plan Of Identity Service
      - Decouple Authentication From Keystone: Decouple
      - Stateless Token Validation: Implement Stateless Token Validation In Edge Side
  23. Decouple Authentication From Keystone
      Diagram labels: Load Balancer; Nginx-Ingress Controller; Ingress-Keystone; Ingress-Nova; Service; Deployment Workload: Apache Web Server, Keystone WSGI-Server (Apache mod_wsgi), Keystone WSGI-APP (webob)
  24. Diagram labels: Nginx-Ingress Controller; Ingress-Keystone-Auth (GET /v3/auth/tokens); Ingress-Keystone-Others; Service; two Deployment Workloads, each with Apache Web Server, WSGI Server, WSGI APP
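The routing decision implied by this slide can be sketched as a small function. The real setup uses Nginx Ingress rules, not Python; this is only an illustration of the path-based split. The backend names `keystone-auth` and `keystone-others` are hypothetical stand-ins for the two deployments.

```python
from urllib.parse import urlsplit

AUTH_PATH = "/v3/auth/tokens"  # path shown on the slide


def pick_backend(request_target: str) -> str:
    """Return a hypothetical backend name for a request target.

    Authentication traffic goes to a dedicated deployment so that it
    cannot starve, or be starved by, the remaining Keystone APIs.
    """
    path = urlsplit(request_target).path  # drop any query string
    if path == AUTH_PATH or path.startswith(AUTH_PATH + "/"):
        return "keystone-auth"
    return "keystone-others"


print(pick_backend("/v3/auth/tokens"))   # keystone-auth
print(pick_backend("/v3/projects"))      # keystone-others
```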
  25. Improvement Plan Of Identity Service
      - Decouple Authentication From Keystone: Decouple
      - Stateless Token Validation: Implement Stateless Token Validation In Edge Side
  26. Token Issue
      Fernet Token: Version / Initialization Vector / Current Timestamp / Cipher Text / HMAC
      Token Payload: Version / UserID / Method / Project ID / Expiry Time / Audit ID
      Fernet key
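The token fields listed on this slide correspond to the byte layout in the Fernet specification, where the on-the-wire order is version, timestamp, IV, ciphertext, HMAC. The sketch below splits a base64url token into those fields using only the standard library. The sample token is built from dummy bytes, so it is not a real Keystone token, and `split_fernet_token` is an illustrative helper name.

```python
import base64
import struct
from typing import NamedTuple


class FernetFields(NamedTuple):
    version: int       # 1 byte, 0x80 in the Fernet spec
    timestamp: int     # 8 bytes, big-endian seconds since the epoch
    iv: bytes          # 16 bytes
    ciphertext: bytes  # encrypted payload, a multiple of 16 bytes
    hmac: bytes        # 32 bytes, HMAC-SHA256 over everything before it


def split_fernet_token(token: str) -> FernetFields:
    # Restore base64 padding in case it was stripped from the token.
    raw = base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))
    if len(raw) < 1 + 8 + 16 + 32:
        raise ValueError("token too short to be a Fernet token")
    (timestamp,) = struct.unpack(">Q", raw[1:9])
    return FernetFields(raw[0], timestamp, raw[9:25], raw[25:-32], raw[-32:])


# Dummy token: correct layout, placeholder zero bytes for IV, ciphertext, HMAC.
sample_raw = bytes([0x80]) + struct.pack(">Q", 1636000000) + bytes(16) + bytes(32) + bytes(32)
sample_token = base64.urlsafe_b64encode(sample_raw).decode("ascii")

fields = split_fernet_token(sample_token)
print(hex(fields.version), fields.timestamp, len(fields.ciphertext))
```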
  27. Token Validation
      Diagram labels: Fernet key; HMAC; Token Payload: Version / UserID / Method / Project ID / Expiry Time / Audit ID; outcomes: Validate / Invalidate
      Request:
        curl --request GET 'https://verda-stage-dev-keystone.linecorp-dev.com:5000/v3/auth/tokens' \
          --header 'Content-Type: application/json' \
          --header 'X-Auth-Token: <a valid authentication token for an administrative user>' \
          --header 'X-Subject-Token: <the token that you want to validate>'
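Because a Fernet token carries its own HMAC and timestamp, the integrity and age checks can run anywhere the key is available, without a round trip to Keystone. The sketch below shows those two checks with the standard library only. It is a minimal illustration under stated assumptions, not Keystone's implementation: decrypting the payload (AES-CBC, which needs a third-party library) is omitted, expiry is approximated by a TTL on the envelope timestamp instead of the payload's Expiry Time, and `make_envelope` exists only to build demo input.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional

FERNET_VERSION = 0x80
MIN_LENGTH = 1 + 8 + 16 + 32  # version + timestamp + IV + HMAC


def _signing_key(key_b64: bytes) -> bytes:
    key = base64.urlsafe_b64decode(key_b64)
    if len(key) != 32:
        raise ValueError("a Fernet key must decode to 32 bytes")
    return key[:16]  # first half signs, second half encrypts


def make_envelope(key_b64: bytes, issued_at: int, ciphertext: bytes) -> str:
    """Demo helper: sign placeholder ciphertext (no real encryption)."""
    body = bytes([FERNET_VERSION]) + struct.pack(">Q", issued_at) + bytes(16) + ciphertext
    mac = hmac.new(_signing_key(key_b64), body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(body + mac).decode("ascii")


def verify_envelope(token: str, key_b64: bytes, ttl_seconds: int,
                    now: Optional[int] = None) -> bool:
    """Check version, HMAC and age of a token; payload stays encrypted."""
    try:
        raw = base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))
    except ValueError:
        return False
    if len(raw) < MIN_LENGTH or raw[0] != FERNET_VERSION:
        return False
    body, mac = raw[:-32], raw[-32:]
    expected = hmac.new(_signing_key(key_b64), body, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return False
    (issued_at,) = struct.unpack(">Q", raw[1:9])
    current = int(time.time()) if now is None else now
    return issued_at + ttl_seconds >= current


demo_key = base64.urlsafe_b64encode(bytes(range(32)))   # demo key only
other_key = base64.urlsafe_b64encode(bytes(32))
token = make_envelope(demo_key, issued_at=1_000, ciphertext=bytes(16))

print(verify_envelope(token, demo_key, ttl_seconds=3600, now=2_000))    # True
print(verify_envelope(token, demo_key, ttl_seconds=3600, now=10_000))   # False (too old)
print(verify_envelope(token, other_key, ttl_seconds=3600, now=2_000))   # False (wrong key)
```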
  28. Stateless Token Validation On Edge
      - Store the Fernet key safely and share it with clients
      - Find a good way to inject token validation logic without intruding on client code
  29. Token Validation On Edge
      Diagram labels: Nova, Neutron, VKS, each with a Verda Common Proxy; Fernet Key; Token
      Validation (Verda Common Proxy) - decrypt - expiration check
      Token Issue (Keystone-Token Issue) - payload - sign & encrypt; POST /v3/auth/tokens goes through the Verda Common Proxy to keystone
      Keystone-Original; Ingress
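One way to read this slide is that the sidecar proxy answers token validation requests itself and passes everything else, including token issue, on to Keystone. The sketch below models only that decision. It is an interpretation of the diagram, not the actual Verda Common Proxy; the function name, the return values, and the injected `validate_locally` callback are all hypothetical.

```python
from typing import Callable, Mapping

TOKENS_PATH = "/v3/auth/tokens"


def decide(method: str, path: str, headers: Mapping[str, str],
           validate_locally: Callable[[str], bool]) -> str:
    """Return 'accept', 'reject' or 'forward-to-keystone' for one request.

    Header lookup is exact-match here; a real proxy would need to treat
    header names case-insensitively.
    """
    if method == "GET" and path == TOKENS_PATH:
        subject_token = headers.get("X-Subject-Token")
        if subject_token is None:
            return "reject"
        # Validation happens next to the client, using the shared Fernet key.
        return "accept" if validate_locally(subject_token) else "reject"
    # Token issue (POST /v3/auth/tokens) and all other APIs still reach Keystone.
    return "forward-to-keystone"


def is_good(token: str) -> bool:
    """Stub validator for the demo; a real one would check the Fernet token."""
    return token == "good-token"


print(decide("GET", TOKENS_PATH, {"X-Subject-Token": "good-token"}, is_good))
print(decide("POST", TOKENS_PATH, {}, is_good))
```

Keeping the validator as an injected callback means the client service's own code does not change, which matches the goal of adding validation without intruding on client code.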