Elastic scaling in a (micro)service oriented architecture

Splitting an application into multiple independent services can be a good way to keep it scalable and to preserve stability and developer productivity in larger, growing teams. But splitting the codebase, creating APIs and deploying the code on some servers is not enough: your services also need to know where and how other services can be reached. Classic approaches, such as hardcoding the location of every service in every other service or routing everything through a central load balancer, can quickly lead to scalability and maintainability problems. In this talk I'll show how we at ResearchGate tackled this challenge. With the help of tools like Consul and HAProxy, we created a setup that lets us boot and shut down services quickly. This ensures that all servers are utilized optimally and that we can react to load spikes quickly and automatically.

Bastian Hofmann

May 22, 2016

Transcript

  1. Elastic Scaling in a (Micro)service oriented Architecture @BastianHofmann

  3. Microservices

  5. Service Oriented Architecture

  6. Monolith

  7. Monolith / Microservices (diagram from http://blog.philipphauer.de/microservices-nutshell-pros-cons/)

  8. Benefits

  9. Problems

  10. Problems

  11. Challenges

  12. Performance

  13. Latency

  14. Stability

  15. Reliability

  16. Transparency

  17. Learning Curves

  18. Code Reuse

  19. Maintenance

  20. Elastic Scaling?

  21. How can we solve them?

  22. A lot of this is also useful for monoliths

  25. Questions? Ask

  26. http://speakerdeck.com/u/bastianhofmann

  27. https://www.flickr.com/photos/npobre/2601582256/

  28. Deployment

  29. How to get the services on our servers?

  30. Diverse technology stacks

  31. The same for every service

  32. One-Click Deployment

  34. Automation

  35. Build/Test/Release pipeline

  37. https://www.flickr.com/photos/40987321@N02/5580348753/

  38. Base boxes

  39. Services installed in a sandbox

  40. https://www.docker.com/

  41. https://twitter.com/mfdii/status/697532387240996864

  42. Availability

  43. Zero-Downtime Deployments

  44. (Diagram: four servers)

  45. Stability

  46. Canary environments

  47. (Diagram: four servers)

  48. Fast rollbacks

  49. •Ansible •Capistrano •SaltStack •Custom •…
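Fast rollbacks and zero-downtime switches are often implemented (Capistrano does it this way) by giving every release its own directory and pointing a "current" symlink at the live one. The following is a minimal PHP sketch of that idea, assuming a releases/<id> plus current layout; the function names are hypothetical and not the API of any tool listed above.

```php
<?php
// Assumed layout: <base>/releases/<id>/ holds each deployed release and
// <base>/current is a symlink to the live one. Deploying and rolling back
// both just re-point that symlink.

function activateRelease(string $baseDir, string $releaseId): void
{
    $target = $baseDir . '/releases/' . $releaseId;
    if (!is_dir($target)) {
        throw new RuntimeException("Release $releaseId does not exist");
    }

    $current = $baseDir . '/current';
    $tmpLink = $current . '.tmp-' . getmypid();

    // Build the new link under a temporary name first ...
    @unlink($tmpLink);
    if (!symlink($target, $tmpLink)) {
        throw new RuntimeException("Could not create $tmpLink");
    }

    // ... then rename() it over "current". On POSIX filesystems rename(2)
    // replaces the old link atomically, so no request sees a missing path.
    if (!rename($tmpLink, $current)) {
        @unlink($tmpLink);
        throw new RuntimeException("Could not activate release $releaseId");
    }
}

function currentRelease(string $baseDir): ?string
{
    $link = $baseDir . '/current';
    return is_link($link) ? basename(readlink($link)) : null;
}
```

A rollback is then simply activateRelease($base, $previousId); the old release directory is still on disk, so nothing has to be rebuilt.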

  50. Running the service

  51. How do I stop and start a service and ensure it keeps running?
  52. Diverse technology stacks

  53. The same for every service

  54. •Supervisord •Upstart •s6 •runit •Monit •Circus •restartd •…

  55. Releases

  56. How to synchronize changes over services?

  57. API Versioning

  58. GET /v23/foo/abr
      Host: myservice.local

  59. GET /foo/abr
      Host: myservice.local
      X-Version: 23

  60. GET /foo/abr?version=23
      Host: myservice.local

  61. GET /foo/abr
      Host: myservice.local
      Accept: application/vnd.company.v23+json
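The four request styles above differ only in where the version number travels. As an illustration, here is a hypothetical PHP helper that resolves the version from any of them. The path prefix, the X-Version header, the version query parameter and the vnd.company media type come from the slides; the precedence order and the default version are assumptions.

```php
<?php
// Hypothetical helper: returns [version, path without version prefix].
function resolveApiVersion(string $path, array $headers, array $query, int $default = 1): array
{
    // Header names are case-insensitive in HTTP.
    $headers = array_change_key_case($headers, CASE_LOWER);

    // 1. Path prefix: /v23/foo/abr -> version 23, path /foo/abr
    if (preg_match('#^/v(\d+)(/.*)?$#', $path, $m)) {
        return [(int) $m[1], $m[2] ?? '/'];
    }
    // 2. Custom header: X-Version: 23
    if (isset($headers['x-version']) && ctype_digit((string) $headers['x-version'])) {
        return [(int) $headers['x-version'], $path];
    }
    // 3. Query parameter: ?version=23
    if (isset($query['version']) && ctype_digit((string) $query['version'])) {
        return [(int) $query['version'], $path];
    }
    // 4. Vendor media type: Accept: application/vnd.company.v23+json
    if (isset($headers['accept'])
        && preg_match('#application/vnd\.company\.v(\d+)\+json#', $headers['accept'], $m)) {
        return [(int) $m[1], $path];
    }
    return [$default, $path];
}
```

In practice a service would pick one style; supporting all four here only shows that the routing logic is the same once the version has been extracted.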

  62. No backwards compatibility breaks

  63. Feature Flags

  64. public function hasAccess() { return featureFlag()->isActive( FeatureFlag::TEST_ONE ); }

  67. Shared database

  68. Headers

  69. GET /foo/abr
      Host: myservice.local
      X-Flag-NewFeature: 1
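The two slides above suggest two sources for a flag's state: a store shared by all services and a per-request header. A minimal sketch of how they might be combined, assuming header overrides win over stored defaults; the FeatureFlags class here is hypothetical and is not the featureFlag() helper from the earlier slide.

```php
<?php
// Hypothetical FeatureFlags class. $stored stands in for flags kept in a
// shared database; an X-Flag-<Name> request header (as on the slide) lets
// the calling service pass its decision on, so every service in the call
// chain sees the same flag state.
final class FeatureFlags
{
    /** @var array<string,bool> */
    private array $stored;
    /** @var array<string,string> */
    private array $requestHeaders;

    public function __construct(array $stored, array $requestHeaders = [])
    {
        $this->stored = $stored;
        $this->requestHeaders = $requestHeaders;
    }

    public function isActive(string $flag): bool
    {
        // A header set by the upstream caller takes precedence over the stored default.
        foreach ($this->requestHeaders as $name => $value) {
            if (strcasecmp((string) $name, 'X-Flag-' . $flag) === 0) {
                return (string) $value === '1';
            }
        }
        return $this->stored[$flag] ?? false;
    }

    /** Headers to attach to outgoing calls so downstream services agree. */
    public function outgoingHeaders(array $flags): array
    {
        $headers = [];
        foreach ($flags as $flag) {
            $headers['X-Flag-' . $flag] = $this->isActive($flag) ? '1' : '0';
        }
        return $headers;
    }
}
```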

  70. Configuration Management

  71. How do I synchronize configuration over services?

  72. { "db_user": "user", "db_pw": "pw", "serviceA": "serviceA.local:8018" }

  73. Config file on disk

  74. Duplication

  75. Inconsistencies

  76. Consul https://www.consul.io/

  77. •Consul •ZooKeeper •etcd •…

  78. (Diagram: a cluster of three Consul servers, plus a Consul agent running on every application server)
  79. https://github.com/sensiolabs/consul-php-sdk

  80. Key/Value Store

  81. $kv->put('test/foo/bar', 'bazinga'); $kv->get('test/foo/bar', ['raw' => true]); $kv->delete('test/foo/bar');
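The SDK calls above wrap Consul's HTTP key/value API. For illustration, a sketch of reading a key without the SDK: GET /v1/kv/<key> returns a JSON array of entries whose Value field is base64-encoded. The agent address (127.0.0.1:8500 is Consul's default) is an assumption about the local setup, and the function names are hypothetical.

```php
<?php
// Decode the JSON body of GET /v1/kv/<key> into [key => plain value].
function decodeConsulKv(string $json): array
{
    $entries = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $result = [];
    foreach ($entries as $entry) {
        $value = $entry['Value'] ?? null;
        // Value is null for keys that exist but hold no data.
        $result[$entry['Key']] = $value === null ? null : base64_decode($value, true);
    }
    return $result;
}

// Fetch a single key from the local agent (assumed address).
function consulKvGet(string $key, string $agent = 'http://127.0.0.1:8500'): ?string
{
    $json = @file_get_contents($agent . '/v1/kv/' . $key);
    if ($json === false) {
        return null; // key missing (404) or agent unreachable
    }
    return decodeConsulKv($json)[$key] ?? null;
}
```

Because every server runs a local agent, the client always talks to localhost and never needs to know where the Consul servers are.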

  82. Credentials

  83. $kv->put('test/db/pw', 'secret_pw');

  84. https://www.vaultproject.io/

  85. Cycling of credentials

  86. Service Discovery

  87. How does one service know where another service is?

  88. Hostname + Port

  89. (Diagram: two servers running Service A, Service B and two Service C instances)

  90. Configuration

  91. $config = [
          'serviceA' => ['192.168.0.1:8001', '192.168.0.2:8001'],
          'serviceB' => ['192.168.0.1:8002'],
          'serviceC' => ['192.168.0.2:8003'],
      ];
  92. Consul https://www.consul.io/

  93. Load balancing?

  94. Round robin in the client

  95. $config = [
          'serviceA' => ['192.168.0.1:8001', '192.168.0.2:8001'],
          'serviceB' => ['192.168.0.1:8002'],
          'serviceC' => ['192.168.0.2:8003'],
      ];
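Round robin in the client means each caller walks through a service's node list itself instead of going through a central load balancer. A minimal sketch over the $config layout above; the RoundRobin class is hypothetical.

```php
<?php
// Minimal client-side round robin over a [service => [host:port, ...]] map.
// The position counter lives in memory, so with PHP-FPM each worker
// process rotates independently.
final class RoundRobin
{
    private array $config;
    /** @var array<string,int> next index per service */
    private array $next = [];

    public function __construct(array $config)
    {
        $this->config = $config;
    }

    public function pick(string $service): string
    {
        $nodes = array_values($this->config[$service] ?? []);
        if ($nodes === []) {
            throw new RuntimeException("No nodes configured for $service");
        }
        $i = ($this->next[$service] ?? 0) % count($nodes);
        $this->next[$service] = $i + 1;
        return $nodes[$i];
    }
}
```

The weakness the next slide raises is visible here: pick() happily returns a node that is down, because the static list carries no health information.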
  96. Service/Server down?

  97. $config = [
          'serviceA' => ['192.168.0.1:8001', '192.168.0.2:8001'],
          'serviceB' => ['192.168.0.1:8002'],
          'serviceC' => ['192.168.0.2:8003'],
      ];
  98. Health checks

  99. GET /health HTTP/1.1
      Host: serviceA.local

      HTTP/1.1 200 OK
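A health check endpoint lets a caller skip nodes that are down. As a sketch, the node list could be filtered by probing GET /health and keeping only nodes that answer 200 OK. The one-second timeout and the function names are assumptions; the checker is injectable so it can be swapped out in tests.

```php
<?php
// Probe http://<host:port>/health and report whether it answered 200 OK.
function httpHealthCheck(string $hostPort): bool
{
    $context = stream_context_create(['http' => ['timeout' => 1.0, 'ignore_errors' => true]]);
    $body = @file_get_contents("http://$hostPort/health", false, $context);
    // $http_response_header is populated by PHP's HTTP stream wrapper.
    return $body !== false
        && isset($http_response_header[0])
        && preg_match('#^HTTP/\S+\s+200\b#', $http_response_header[0]) === 1;
}

// Keep only the nodes that pass the check.
function healthyNodes(array $nodes, ?callable $check = null): array
{
    $check = $check ?? 'httpHealthCheck';
    return array_values(array_filter($nodes, $check));
}
```

Probing on every request would add latency, which is one reason to move these checks out of the client and into Consul, as the following slides do.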

  100. Consul for Service Discovery

  101. Consul https://www.consul.io/

  102. (Diagram: a cluster of three Consul servers, plus a Consul agent running on every application server)
  103. (Diagram: Service A registers with the Consul agent on its server, which runs Service A's health check)

  104. Consul API

  105. DNS

  106. admin@hashicorp: dig web-frontend.service.consul. ANY

       ; <<>> DiG 9.8.3-P1 <<>> web-frontend.service.consul. ANY
       ;; global options: +cmd
       ;; Got answer:
       ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29981
       ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

       ;; QUESTION SECTION:
       ;web-frontend.service.consul. IN ANY

       ;; ANSWER SECTION:
       web-frontend.service.consul. 0 IN A 10.0.3.83
       web-frontend.service.consul. 0 IN A 10.0.1.109
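The A records above return only addresses; Consul's DNS interface also serves SRV records, which include the port. A sketch using PHP's built-in dns_get_record(), assuming the system resolver forwards the .consul domain to Consul (which listens for DNS on port 8600 by default). The helper names are hypothetical.

```php
<?php
// Turn SRV records (as returned by dns_get_record) into host:port strings.
function srvToEndpoints(array $records): array
{
    $endpoints = [];
    foreach ($records as $r) {
        if (($r['type'] ?? '') === 'SRV') {
            // Strip a trailing root dot if the resolver left one.
            $endpoints[] = rtrim($r['target'], '.') . ':' . $r['port'];
        }
    }
    return $endpoints;
}

// Look up all healthy instances of a service via Consul DNS.
function discover(string $service): array
{
    $records = @dns_get_record("$service.service.consul", DNS_SRV);
    return $records === false ? [] : srvToEndpoints($records);
}
```

Since Consul only answers with instances whose health checks pass, the client-side filtering shown earlier is no longer needed.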
  107. Monitoring

  108. How are my services behaving?

  109. Central Log Management

  110. Elasticsearch, Kibana, Logstash

  111. (Diagram: each webserver writes logs that a local logstash ships via AMQP to a central Logstash, which stores them in Elasticsearch)
  113. Tracing IDs

  114. (Diagram: the web server creates a unique trace_id for each user request and passes it along with every call to the HTTP services; every service writes the trace_id into its log entries)
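A sketch of the trace ID flow described above: the entry point creates an ID, every service reuses the one it receives, passes it on to the services it calls, and includes it in every log line. The header name X-Trace-Id and the log line format are assumptions; the slides do not specify them.

```php
<?php
// Reuse an incoming trace ID, or create one if this is the entry point.
function traceIdFromRequest(array $headers): string
{
    foreach ($headers as $name => $value) {
        if (strcasecmp((string) $name, 'X-Trace-Id') === 0
            && preg_match('/^[0-9a-f]{16,32}$/', (string) $value) === 1) {
            return (string) $value; // created upstream
        }
    }
    return bin2hex(random_bytes(8)); // new 16-hex-character ID
}

// Every log line carries the trace ID so all entries of a request can be found.
function logLine(string $traceId, string $message): string
{
    return sprintf('%s trace_id=%s %s', date('c'), $traceId, $message);
}

// Headers to add to every outgoing call to another service.
function outgoingHeaders(string $traceId): array
{
    return ['X-Trace-Id' => $traceId];
}
```

With central log management in place, searching Kibana for trace_id=<id> then returns the log lines of one user request across all services.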
  115. https://www.loggly.com/

  116. https://getsentry.com/

  117. Measure everything

  118. Server metrics

  119. Application metrics

  120. StatsD + Graphite

  121. (Diagram: webservers send metrics as UDP messages to statsd, which forwards the aggregated values to Graphite)
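StatsD metrics are plain-text lines of the form name:value|type, sent as fire-and-forget UDP datagrams, so instrumenting a request adds almost no latency. A minimal client sketch; the host and port (8125 is StatsD's conventional default) and the function names are assumptions.

```php
<?php
// Format one StatsD line, e.g. "api.requests:1|c" (counter) or
// "api.latency:320|ms" (timer); an optional sample rate adds "|@<rate>".
function statsdLine(string $name, int|float $value, string $type, float $sampleRate = 1.0): string
{
    $line = sprintf('%s:%s|%s', $name, $value, $type);
    if ($sampleRate < 1.0) {
        $line .= '|@' . $sampleRate;
    }
    return $line;
}

// Send it over UDP: no connection setup and no reply, so an unreachable
// StatsD never blocks or fails the request being measured.
function statsdSend(string $line, string $host = '127.0.0.1', int $port = 8125): void
{
    $socket = @stream_socket_client("udp://$host:$port", $errno, $errstr);
    if ($socket !== false) {
        @fwrite($socket, $line);
        fclose($socket);
    }
}
```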
  122. https://www.librato.com

  123. http://www.soasta.com/

  124. Profiling

  125. XHProf

  128. https://tideways.io/

  129. https://blackfire.io/

  130. newrelic.com

  131. Handling failures

  132. What do I do when something breaks?

  133. Errors happen

  134. Detecting regressions

  135. Server outages

  136. Database overloads

  137. Bugs

  138. Service A → Service B: 200 OK

  139. Service A → Service B: 5xx

  140. Service A → Service B: Timeout

  141. Circuit Breakers

  142. Service A → Service B: 200 OK
       Circuit Breaker status: closed, error rate: 0

  143. Service A → Service B: error
       Circuit Breaker status: -> open, error rate: > threshold

  144. Service A, Service B
       Circuit Breaker status: -> open, error rate: > threshold

  145. Service A → Service B: error
       Circuit Breaker status: -> open, error rate: > threshold
       Test if still failing

  146. Service A → Service B: 200 OK
       Circuit Breaker status: -> closed, error rate: 0
       Test if still failing

  147. https://github.com/Netflix/Hystrix

  148. https://github.com/odesk/phystrix
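The state changes on the slides above can be sketched as a small class: closed until the error rate passes a threshold, then open (calls fail immediately without reaching Service B), then, after a cool-down, a single trial call tests whether Service B is still failing. This is an illustrative sketch, not the Hystrix or Phystrix API; the threshold values, the cumulative error count (real implementations usually use a sliding window) and the injectable clock are assumptions.

```php
<?php
final class CircuitBreaker
{
    private string $state = 'closed';
    private int $calls = 0;
    private int $failures = 0;
    private float $openedAt = 0.0;
    private $clock; // callable returning seconds; untyped since callable can't be a property type

    public function __construct(
        private float $errorRateThreshold = 0.5,
        private int $minCalls = 10,
        private float $retryAfterSeconds = 5.0,
        ?callable $clock = null
    ) {
        $this->clock = $clock ?? fn (): float => microtime(true);
    }

    public function state(): string
    {
        return $this->state;
    }

    /** Runs $call, or throws immediately while the circuit is open. */
    public function call(callable $call)
    {
        if ($this->state === 'open') {
            if (($this->clock)() - $this->openedAt < $this->retryAfterSeconds) {
                throw new RuntimeException('Circuit open: not calling the service');
            }
            $this->state = 'half-open'; // cool-down over: let one trial call through
        }
        try {
            $result = $call();
        } catch (Throwable $e) {
            $this->recordFailure();
            throw $e;
        }
        $this->recordSuccess();
        return $result;
    }

    private function recordSuccess(): void
    {
        if ($this->state === 'half-open') {
            // Trial call succeeded: close the circuit and start counting afresh.
            $this->state = 'closed';
            $this->calls = 0;
            $this->failures = 0;
            return;
        }
        $this->calls++;
    }

    private function recordFailure(): void
    {
        $this->calls++;
        $this->failures++;
        $tooManyErrors = $this->calls >= $this->minCalls
            && $this->failures / $this->calls > $this->errorRateThreshold;
        if ($this->state === 'half-open' || $tooManyErrors) {
            $this->state = 'open';
            $this->openedAt = ($this->clock)();
        }
    }
}
```

Failing fast while the circuit is open protects both sides: Service A does not tie up workers waiting for timeouts, and Service B gets time to recover instead of being hammered with retries.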

  149. Gracefully handling exceptions

  150. Component-based frontend

  158. Degrading Functionality

  159. (Diagram: a Profile page built from components: Publications with three Publication components, AboutMe, LeftColumn, Image, Menu, Institution)

  160. (Same diagram, with an EXCEPTION in the Institution component)
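With a component-based frontend, degrading functionality can mean rendering each component in isolation so that an exception in one (the Institution box above) removes only that box instead of failing the whole profile page. A sketch of that idea; the Component interface and the function names are hypothetical.

```php
<?php
interface Component
{
    public function render(): string;
}

// Render one component; on any exception, report it and leave the slot empty.
function renderSafely(Component $component, ?callable $onError = null): string
{
    try {
        return $component->render();
    } catch (Throwable $e) {
        if ($onError !== null) {
            $onError($component, $e); // e.g. log it together with the trace ID
        }
        return '';
    }
}

// Render a page from its components; a broken component only drops itself.
function renderPage(array $components): string
{
    return implode("\n", array_map(fn (Component $c) => renderSafely($c), $components));
}
```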
  161. Test it

  162. http://techblog.netflix.com/2014/09/introducing-chaos-engineering.html

  163. Scalability

  164. How do I handle traffic spikes?

  165. Elasticity

  166. Service A → Service B: 200 OK, through a Circuit Breaker

  167. (Diagram: Service A calls Service B and Service C, each through its own Circuit Breaker)

  168. Throttling

  169. (Diagram: Service A calls Service B and Service C, each through its own Circuit Breaker)
       Only allow xx% of calls
  171. Priority

  172. (Diagram: Service A calls Service B and Service C, each through its own Circuit Breaker)
       100% of calls / 10% of calls

  173. (Diagram: Service A calls Service B and Service C, each through its own Circuit Breaker)
       100% of calls / wait until everything is ok
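Throttling with priorities can be reduced to one decision per call: each priority class gets a share of calls that may pass, so important callers keep 100% while less important ones are cut to 10%, or to 0% ("wait until everything is ok"). A sketch under those assumptions; the Throttle class, the priority names and the injectable random source are hypothetical.

```php
<?php
final class Throttle
{
    /** @var array<string,float> share of calls allowed per priority, 0.0 to 1.0 */
    private array $allowedShare;
    private $random; // callable returning a float in [0, 1)

    public function __construct(array $allowedShare, ?callable $random = null)
    {
        $this->allowedShare = $allowedShare;
        $this->random = $random ?? fn (): float => mt_rand() / (mt_getrandmax() + 1);
    }

    // Share 1.0 always passes, 0.0 never passes; unknown priorities are blocked.
    public function allow(string $priority): bool
    {
        $share = $this->allowedShare[$priority] ?? 0.0;
        return ($this->random)() < $share;
    }
}
```

In a setup like the one on the slides, the shares would be lowered when a service's circuit breaker reports trouble and raised again once it recovers.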
  174. Elasticity

  175. (Diagram: Service A calls Service B and Service C, each through its own Circuit Breaker; an additional Service B instance is started)
  176. Complete Solutions

  178. https://mesosphere.github.io/marathon/

  179. https://www.flickr.com/photos/darkdwarf/19701555974/

  180. http://twitter.com/BastianHofmann http://lanyrd.com/people/BastianHofmann http://speakerdeck.com/u/bastianhofmann mail@bastianhofmann.de