Roll-your-own API Management Platform with NGINX and Lua

Sean Cribbs
September 24, 2015

We recently replaced a proprietary API management solution with an in-house implementation built on NGINX and Lua, which let us reach a continuous delivery practice within a handful of months. Learn about our development process and the overall architecture, which allowed us to write a minimal amount of code and get native-code performance while still permitting interactive coding, and how we leveraged other open source tools such as Vagrant, Ansible, and OpenStack to build an automation-rich delivery pipeline. We will also take an in-depth look at our capacity management approach, which differs from the rate limiting concept prevalent in the API community.

Transcript

  1. Roll-your-own API Management Platform with NGINX and Lua. Sean Cribbs, Comcast Cable, @seancribbs

  2. About me

  3. Background

  4. None
  5. Consumer

  6. Internal Consumer

  7. Partner Internal Consumer

  8. API Management

  9. API Management access control capacity management

  10. CodeBig 1 API Consumer

  11. CodeBig 1 API Consumer CDN • traffic shaping • caching

  12. CodeBig 1 API Consumer CDN • traffic shaping • caching • access control • rate limiting Vendor APIM

  13. CodeBig 1 API Consumer CDN • traffic shaping • caching • access control • rate limiting Vendor APIM Internet Comcast LB • DNS RR • VIP

  14. CodeBig 1 API Consumer CDN • traffic shaping • caching • access control • rate limiting Vendor APIM • DMZ intermediary • path-host mapping iAuth Internet Comcast LB • DNS RR • VIP

  15. CodeBig 1 API Consumer CDN • traffic shaping • caching • access control • rate limiting Vendor APIM • DMZ intermediary • path-host mapping iAuth Internet Comcast Origin APIs LB • DNS RR • VIP

  16. Challenges

  17. Challenges visibility

  18. Challenges visibility responsibility

  19. Challenges visibility responsibility scope

  20. Challenges visibility responsibility scope latency

  21. Challenges visibility responsibility scope latency security

  22. CodeBig 2

  23. CodeBig 2 simplify architecture

  24. CodeBig 2 simplify architecture increase visibility

  25. CodeBig 2 simplify architecture increase visibility use open-source tools

  26. Architecture

  27. custom  logic HTTP Proxy

  28. Lua

  29. -- Validates the OAuth signature
      -- @return Const.HTTP_UNAUTHORIZED if either the key or signature is invalid
      -- this method is internal and should not be called directly
      function _M.validate_signature(self)
          local headers = self.req.get_oauth_params()
          local key = headers[Const.OAUTH_CONSUMER_KEY]
          local keyconf = self.conf.keys[key]
          if keyconf == nil then
              return {
                  code = Const.HTTP_UNAUTHORIZED,
                  error = Const.ERROR_INVALID_CONSUMER_KEY
              }
          end
          local sig = get_hmac_signature(self.req, keyconf.secret)
          if sig ~= headers[Const.OAUTH_SIGNATURE] then
              return {
                  code = Const.HTTP_UNAUTHORIZED,
                  error = Const.ERROR_INVALID_SIGNATURE
              }
          end
      end

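The control flow on slide 29 (look up the consumer key, recompute the signature, compare) can be sketched in Python. This is a sketch under assumptions, not the talk's Lua code: the function names, the error strings, and the choice to pass in a ready-made signature base string are all hypothetical, and the constant-time comparison via `hmac.compare_digest` is swapped in here in place of the slide's plain `~=`.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

HTTP_UNAUTHORIZED = 401  # numeric value of the HTTP 401 status


def hmac_sha1_signature(base_string, consumer_secret, token_secret=""):
    """OAuth 1.0 HMAC-SHA1: the key is the two percent-encoded secrets joined by '&'."""
    key = quote(consumer_secret, safe="") + "&" + quote(token_secret, safe="")
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def validate_signature(keys, consumer_key, base_string, presented_signature):
    """Return None on success, or an error dict shaped like the slide's return value."""
    keyconf = keys.get(consumer_key)
    if keyconf is None:
        # Error strings are placeholders, not the talk's constants.
        return {"code": HTTP_UNAUTHORIZED, "error": "invalid consumer key"}
    expected = hmac_sha1_signature(base_string, keyconf["secret"])
    # Compare as bytes in constant time to avoid leaking how many characters matched.
    if not hmac.compare_digest(expected.encode(), presented_signature.encode()):
        return {"code": HTTP_UNAUTHORIZED, "error": "invalid signature"}
    return None


keys = {"mykey": {"secret": "mysecret"}}
good = hmac_sha1_signature("POST&example&params", "mysecret")
print(validate_signature(keys, "mykey", "POST&example&params", good))
print(validate_signature(keys, "mykey", "POST&example&params", "tampered"))
```

Building the signature base string from the request (method, URL, sorted parameters) is left out on purpose; the slide hides it behind `get_hmac_signature` as well.
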
  30. Lua < 3K LoC!!

  31. Intra-Datacenter VIP

  32. Intra-Datacenter VIP haproxy haproxy …

  33. Intra-Datacenter VIP … haproxy haproxy …

  34. Intra-Datacenter VIP Origin APIs … haproxy haproxy …

  35. DC3 VIP DC2 VIP DC1 VIP Cross-Datacenter

  36. DC3 VIP DC2 VIP DC1 VIP Cross-Datacenter

      dc1-vip.       A      10.1.0.1
      dc2-vip.       A      10.2.0.1
      dc3-vip.       A      10.3.0.1

  37. DC3 VIP DC2 VIP DC1 VIP Cross-Datacenter vod vod acct acct

      dc1-vip.       A      10.1.0.1
      dc2-vip.       A      10.2.0.1
      dc3-vip.       A      10.3.0.1

  38. DC3 VIP DC2 VIP DC1 VIP Cross-Datacenter vod vod acct acct

      dc1-vip.       A      10.1.0.1
      dc2-vip.       A      10.2.0.1
      dc3-vip.       A      10.3.0.1
      vod-dc1.       CNAME  dc1-vip.
      vod-dc2.       CNAME  dc2-vip.

  39. DC3 VIP DC2 VIP DC1 VIP Cross-Datacenter vod vod acct acct

      dc1-vip.       A      10.1.0.1
      dc2-vip.       A      10.2.0.1
      dc3-vip.       A      10.3.0.1
      vod-dc1.       CNAME  dc1-vip.
      vod-dc2.       CNAME  dc2-vip.
      vod-dc1-fo.    CNAME  dc1-vip.
      vod-dc1-fo.    CNAME  dc2-vip.

  40. DC3 VIP DC2 VIP DC1 VIP Cross-Datacenter vod vod acct acct

      dc1-vip.       A      10.1.0.1
      dc2-vip.       A      10.2.0.1
      dc3-vip.       A      10.3.0.1
      vod-dc1.       CNAME  dc1-vip.
      vod-dc2.       CNAME  dc2-vip.
      vod-dc1-fo.    CNAME  dc1-vip.
      vod-dc1-fo.    CNAME  dc2-vip.
      vod.           CNAME  vod-dc1.
      vod.           CNAME  vod-dc2.

  41. Capacity Management

  42. N = XR

  43. N = XR, where N = # concurrent requests

  44. N = XR, where N = # concurrent requests, X = transaction rate

  45. N = XR, where N = # concurrent requests, X = transaction rate, R = response time

  46. N = XR, where N = # concurrent requests, X = transaction rate, R = response time (Little’s Law)

  47. client origin APIM, X = 2 req/s, R = 1 s: N = XR = 2 req/s × 1 s = 2 req

  48. client origin APIM, X = 2 req/s, R = 10 s: N = XR = 2 req/s × 10 s = 20 req

  49. client origin APIM, X = 2 req/s, R = 10 s: N = XR = 2 req/s × 10 s = 20 req

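The arithmetic on slides 47 to 49 is small enough to check in a few lines of Python. The function names are hypothetical. The second helper just rearranges Little's Law to X = N / R, which is one way to read why a cap on concurrent requests behaves differently from a fixed rate limit: the admitted rate shrinks as the origin gets slower.

```python
def concurrent_requests(rate_per_s, response_time_s):
    """Little's Law: N = X * R."""
    return rate_per_s * response_time_s


def max_rate(concurrency_limit, response_time_s):
    """Little's Law rearranged: X = N / R."""
    return concurrency_limit / response_time_s


# Same arrival rate, two origin response times, as on the slides.
print(concurrent_requests(2, 1))    # 2 requests in flight
print(concurrent_requests(2, 10))   # 20 requests in flight

# With a fixed cap of 20 in-flight requests, a slower origin admits fewer requests per second.
print(max_rate(20, 1), max_rate(20, 10))
```
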
  50. Concurrent Request Limiting

      lua_shared_dict counts 50M;
      access_by_lua   …   +1
      log_by_lua      …   -1

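Slide 50 only outlines the mechanism: a shared counter that goes up in the access phase and down in the log phase. A toy Python model of that bookkeeping, assuming one counter per consumer key and a fixed limit, might look like the following. The class and method names are hypothetical. A lock-protected dict stands in for the `lua_shared_dict` zone, which in NGINX is shared across worker processes rather than threads.

```python
import threading


class ConcurrencyLimiter:
    """Toy stand-in for per-key in-flight counters (hypothetical, not the talk's code)."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = {}  # key -> number of requests currently in flight
        self._lock = threading.Lock()

    def on_access(self, key):
        """Access phase: admit and count the request, or reject it at the limit."""
        with self._lock:
            current = self.counts.get(key, 0)
            if current >= self.limit:
                return False
            self.counts[key] = current + 1
            return True

    def on_log(self, key):
        """Log phase: the request has finished, so release its slot."""
        with self._lock:
            if self.counts.get(key, 0) > 0:
                self.counts[key] -= 1


limiter = ConcurrencyLimiter(limit=2)
print([limiter.on_access("mykey") for _ in range(3)])  # third call is rejected
limiter.on_log("mykey")                                # one request completes
print(limiter.on_access("mykey"))                      # a slot is free again
```

In this model a rejected request is never counted, so the caller must not call `on_log` for it. A real proxy has to handle the same case, because the log phase also runs for requests it turned away.
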
  51. Testing

  52. function TestOAuth1:test_reject_request_when_timestamp_expired()
          -- default timestamp is 01/01/2014 UTC so it's definitely stale
          local conf = Conf:new({validate_timestamp = true, ttl = 300})
          local header = Header:new()
          local req = Req:new({oauth_params = header})
          local oauth = OAuth1:new(conf, req)
          local res = oauth:authorize()
          assertEquals(res.code, Const.HTTP_UNAUTHORIZED)
          assertEquals(res.error, Const.ERROR_EXPIRED_TIMESTAMP)
      end

      lu = LuaUnit.new()
      lu:setOutputType("tap")
      os.exit(lu:runSuite())

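The test on slide 52 exercises a timestamp check with `ttl = 300`, but the check itself is not shown. One plausible shape for it, sketched in Python, is below. The function name is hypothetical, and treating timestamps too far in either direction as expired is an assumption. The `now` parameter exists so a test can pin the clock.

```python
import time


def timestamp_expired(oauth_timestamp, ttl, now=None):
    """True when the timestamp is more than ttl seconds away from the current time."""
    current = time.time() if now is None else now
    return abs(current - int(oauth_timestamp)) > ttl


STALE = 1388534400  # 2014-01-01T00:00:00Z, the default mentioned in the slide's comment

print(timestamp_expired(STALE, ttl=300))                   # checked against the real clock
print(timestamp_expired(STALE, ttl=300, now=STALE + 120))  # clock pinned inside the window
```
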
  53. ngx.log(ngx.ERR, "oops")

  54. function get_oauth_params_from_auth_header(env)
          env = env or ngx
          local auth_hdrs = env.req.get_headers()["Authorization"]
          ...

  55. function TestRequest:test_retrieve_oauth_params_from_header()
          local header = [[Oauth realm="example.com", ]]
              .. [[oauth_consumer_key="mykey",]]
              .. [[oauth_version="1.0"]]
          local ngx = StubNgx:new({ Authorization = header })
          local res = get_oauth_params_from_auth_header(ngx)
          assertEquals("1.0", res.oauth_version)
          ...

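Slides 54 and 55 show the testing trick: the function takes its environment as an argument, so a unit test can hand it a stub in place of the real `ngx` object. The same pattern in Python, as a sketch under assumptions: the `StubNgx` class below is a hypothetical stand-in with a simplified interface, and the regular-expression parsing is illustrative only, since the slide elides the real parsing.

```python
import re
from urllib.parse import unquote


class StubNgx:
    """Hypothetical stand-in for the request environment, used only in tests."""

    def __init__(self, headers):
        self._headers = dict(headers)

    def get_headers(self):
        return dict(self._headers)


# Matches key="value" pairs such as oauth_version="1.0".
_PARAM = re.compile(r'(\w+)="([^"]*)"')


def get_oauth_params_from_auth_header(env):
    """Parse the key="value" pairs of an OAuth Authorization header into a dict."""
    header = env.get_headers().get("Authorization", "")
    scheme, _, rest = header.partition(" ")
    if scheme.lower() != "oauth":
        return {}
    return {name: unquote(value) for name, value in _PARAM.findall(rest)}


header = 'Oauth realm="example.com", oauth_consumer_key="mykey",oauth_version="1.0"'
params = get_oauth_params_from_auth_header(StubNgx({"Authorization": header}))
print(params["oauth_version"], params["oauth_consumer_key"])
```

Because the function never touches a global request object, the test needs no running server.
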
  56. None
  57. None
  58. test harness

  59. test harness

  60. test harness

  61. test harness

  62. test harness nginx-oauth1.conf

  63. test harness nginx-oauth1.conf

  64. test harness nginx-oauth1.conf

  65. def spec_using_valid_oauth_credentials(self, harness):
          auth = OAuth1("mykey", "mysecret")
          body_data = "{'function': 'tick'}"
          harness.reset_data()
          response = requests.post(root_url, data=body_data, auth=auth,
                                   headers={'Content-Type': 'application/json'})
          assert response.status_code == 200
          assert "Authorization" in harness.forwarded_headers
          assert "Oauth" in harness.forwarded_headers["Authorization"]
          assert harness.forwarded_body == body_data

  66. Deployment

  67. None
  68. Configs in VCS Playbooks in VCS

  69. Configs in VCS config templates API configs vault (keys) Playbooks in VCS

  70. Configs in VCS config templates API configs vault (keys) Playbooks in VCS

  71. Configs in VCS config templates API configs vault (keys) Playbooks in VCS vip.conf vhost.lua vhost.conf vhost.conf vhost.lua nginx.conf

  72. Configs in VCS config templates API configs vault (keys) Playbooks in VCS vip.conf vhost.lua vhost.conf vhost.conf vhost.lua nginx.conf

  73. Configs in VCS config templates API configs vault (keys) ssh Playbooks in VCS vip.conf vhost.lua vhost.conf vhost.conf vhost.lua nginx.conf

  74. Configs in VCS config templates API configs vault (keys) ssh Playbooks in VCS vip.conf vhost.lua vhost.conf vhost.conf vhost.lua nginx.conf

  75. Results

  76. Performance

  77. switch Performance

  78. switch mean 99th Performance

  79. switch ~10x mean 99th Performance

  80. Stability

  81. switch Stability

  82. Impact

      index=codebig host=*.cimops.net source="/var/log/nginx/access.log"
      | eval d = request_time - upstream_response_time
      | timechart span=1m perc99(d) max(d)

  83. Impact seconds

      index=codebig host=*.cimops.net source="/var/log/nginx/access.log"
      | eval d = request_time - upstream_response_time
      | timechart span=1m perc99(d) max(d)

  84. Impact seconds 99th

      index=codebig host=*.cimops.net source="/var/log/nginx/access.log"
      | eval d = request_time - upstream_response_time
      | timechart span=1m perc99(d) max(d)

  85. Impact seconds 99th max

      index=codebig host=*.cimops.net source="/var/log/nginx/access.log"
      | eval d = request_time - upstream_response_time
      | timechart span=1m perc99(d) max(d)

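The query on slides 82 to 85 measures the time the proxy itself adds: total request time minus the time spent waiting on the upstream, summarised per minute as a 99th percentile and a maximum. The same calculation can be sketched in Python over already-parsed log records. The record layout and function names are hypothetical, and the nearest-rank percentile is an assumption that may not match how Splunk computes `perc99` on large data sets.

```python
import math
from collections import defaultdict


def nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def overhead_by_minute(records):
    """records: iterable of (epoch_seconds, request_time, upstream_response_time)."""
    buckets = defaultdict(list)
    for ts, request_time, upstream_time in records:
        minute = int(ts // 60) * 60            # start of the 1-minute span
        buckets[minute].append(request_time - upstream_time)
    return {
        minute: {"perc99": nearest_rank(deltas, 99), "max": max(deltas)}
        for minute, deltas in sorted(buckets.items())
    }


sample = [(0, 0.5, 0.25), (30, 1.0, 0.5), (61, 2.0, 1.0)]
for minute, stats in overhead_by_minute(sample).items():
    print(minute, stats)
```
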
  86. Successes

  87. Successes great performance improvements

  88. Successes great performance improvements 68 endpoints in August

  89. Successes great performance improvements 68 endpoints in August ~150 endpoints in September

  90. Challenges

  91. Challenges 3rd-party Lua ecosystem

  92. Challenges 3rd-party Lua ecosystem deployment playbook churn

  93. Challenges 3rd-party Lua ecosystem deployment playbook churn configuration file size

  94. Challenges 3rd-party Lua ecosystem deployment playbook churn configuration file size kernel tuning

  95. Challenges 3rd-party Lua ecosystem deployment playbook churn configuration file size kernel tuning owning availability

  96. Conclusion

  97. Conclusion NGINX + Lua for HTTP middleware

  98. Conclusion NGINX + Lua for HTTP middleware Automated test and deployment pipeline

  99. Conclusion NGINX + Lua for HTTP middleware Automated test and deployment pipeline Concurrent request limiting

  100. Conclusion NGINX + Lua for HTTP middleware Automated test and deployment pipeline Concurrent request limiting Operational flexibility

  101. Thanks