Elastic scaling in a (micro)service oriented architecture

Splitting an application up into multiple independent services can be a good way to keep it scalable and to maintain stability and developer productivity in larger, growing teams. But just splitting the codebase, creating APIs and deploying the code on some servers is not enough: your services also need to know where and how other services can be reached. Classical approaches like hardcoding everything in every service or having a central load balancer can quickly lead to problems with scalability and maintainability. In this talk I'll show how we at ResearchGate tackled this challenge. With the help of tools like Consul and HAProxy we created a setup that allows us to quickly boot and shut down services. This ensures that all servers are utilized optimally and that we can react to load spikes quickly and automatically.

Bastian Hofmann

May 22, 2016

Transcript

  1. Elastic Scaling in a
    (Micro)service
    oriented
    Architecture
    @BastianHofmann

  3. Microservices

  5. Service Oriented
    Architecture

  6. Monolith

  7. http://blog.philipphauer.de/microservices-nutshell-pros-cons/
    Monolith Microservices

  8. Benefits

  9. Problems

  10. Problems

  11. Challenges

  12. Performance

  13. Latency

  14. Stability

  15. Reliability

  16. Transparency

  17. Learning Curves

  18. Code Reuse

  19. Maintenance

  20. Elastic Scaling?

  21. How can we solve
    them

  22. A lot of this is also
    useful for monoliths

  25. Questions? Ask

  26. http://speakerdeck.com/u/bastianhofmann

  27. https://www.flickr.com/photos/npobre/2601582256/

  28. Deployment

  29. How to get the
    services on our
    servers?

  30. Diverse technology
    stacks

  31. The same for every
    service

  32. One Click
    Deployment

  34. Automation

  35. Build/Test/Release
    pipeline

  37. https://www.flickr.com/photos/[email protected]/5580348753/

  38. Base boxes

  39. Services installed in
    a sandbox

  40. https://www.docker.com/

  41. https://twitter.com/mfdii/status/697532387240996864

  42. Availability

  43. Zero Downtime
    Deployments

  44. Server
    Server Server
    Server

  45. Stability

  46. Canary
    environments

  47. Server
    Server Server
    Server

  48. Fast rollbacks

  49. •Ansible
    •Capistrano
    •Saltstack
    •Custom
    •….

  50. Running the service

  51. How do I stop and
    start a service and
    ensure it keeps
    running?

  52. Diverse technology
    stacks

  53. The same for every
    service

  54. •Supervisord
    •Upstart
    •S6
    •Runit
    •Monit
    •Circus
    •Restartd
    •…

  55. Releases

  56. How to synchronize
    changes over
    services?

  57. API Versioning

  58. GET /v23/foo/abr
    Host: myservice.local

  59. GET /foo/abr
    Host: myservice.local
    X-Version: 23

  60. GET /foo/abr?version=23
    Host: myservice.local

  61. GET /foo/abr
    Host: myservice.local
    Accept: application/vnd.company.v23+json

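The four request shapes on slides 58 to 61 carry the same version number in different places: path prefix, custom header, query parameter, or `Accept` media type. A minimal sketch of how a service might read it back, assuming a hypothetical helper `extract_version` and a plain dict of headers with the exact spelling used on the slides (real HTTP headers are case-insensitive):

```python
import re
from urllib.parse import urlsplit, parse_qs


def extract_version(path, headers):
    """Return the API version of a request as an int, or None if absent.

    Checks, in order: /v23/... path prefix, X-Version header,
    ?version=23 query parameter, Accept: application/vnd.company.v23+json.
    """
    parts = urlsplit(path)

    # 1. Version as the first path segment, e.g. /v23/foo/abr
    match = re.match(r"^/v(\d+)(/|$)", parts.path)
    if match:
        return int(match.group(1))

    # 2. Version in a custom header
    if headers.get("X-Version", "").isdigit():
        return int(headers["X-Version"])

    # 3. Version as a query parameter
    query = parse_qs(parts.query)
    if query.get("version", [""])[0].isdigit():
        return int(query["version"][0])

    # 4. Version embedded in the Accept media type
    match = re.search(r"application/vnd\.company\.v(\d+)\+json",
                      headers.get("Accept", ""))
    if match:
        return int(match.group(1))

    return None
```

Whichever variant is chosen, using the same one for every service keeps client code uniform; the order of the checks above is arbitrary.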

  62. No backwards
    compatibility
    breaks

  63. Feature Flags

  64. public function hasAccess() {
        return featureFlag()->isActive(
            FeatureFlag::TEST_ONE
        );
    }

  67. Shared database

  68. Headers

  69. GET /foo/abr
    Host: myservice.local
    X-Flag-NewFeature: 1

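Slides 67 to 69 name two ways to keep feature flags in sync across services: a shared database, or headers that travel with the request. A minimal sketch of the header variant, assuming the `X-Flag-` prefix from slide 69; the function names are hypothetical, and sending `0` for an inactive flag is an assumption, since the slide only shows `1`:

```python
FLAG_PREFIX = "X-Flag-"


def flags_from_headers(headers):
    """Collect feature flags from X-Flag-* request headers.

    A flag counts as active when its header value is "1".
    """
    return {
        name[len(FLAG_PREFIX):]: value == "1"
        for name, value in headers.items()
        if name.startswith(FLAG_PREFIX)
    }


def headers_for_downstream(flags):
    """Turn the flag state back into headers for calls to other services."""
    return {
        FLAG_PREFIX + name: "1" if active else "0"
        for name, active in flags.items()
    }
```

Forwarding the headers on every outgoing call means a flag decided at the edge is seen consistently by all services that take part in the request.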

  70. Configuration
    Management

  71. How do I
    synchronize
    configuration over
    services?

  72. {
        "db_user": "user",
        "db_pw": "pw",
        "serviceA": "serviceA.local:8018"
    }

  73. Config file on disk

  74. Duplication

  75. Inconsistencies

  76. Consul
    https://www.consul.io/

  77. •Consul
    •Zookeeper
    •etcd
    •…

  78. [Diagram: three Consul servers, plus application servers that each run a Consul agent]

  79. https://github.com/sensiolabs/consul-php-sdk

  80. Key/Value Store

  81. $kv->put('test/foo/bar', 'bazinga');
    $kv->get('test/foo/bar', ['raw' => true]);
    $kv->delete('test/foo/bar');

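The SDK calls on slide 81 wrap Consul's HTTP key/value endpoint. Without the `raw` option, `GET /v1/kv/<key>` answers with a JSON list of entries whose `Value` field is base64 encoded. A minimal sketch of decoding such a body, assuming a hypothetical helper `decode_kv_response` and leaving the HTTP call itself out:

```python
import base64
import json


def decode_kv_response(body):
    """Decode the JSON body of a Consul `GET /v1/kv/<key>` response.

    Consul returns a list of entries whose "Value" field is base64 encoded
    (or null for an empty value); this returns a {key: text} dict.
    """
    entries = json.loads(body)
    return {
        entry["Key"]: base64.b64decode(entry["Value"]).decode("utf-8")
        if entry["Value"] is not None else ""
        for entry in entries
    }
```

With `raw` set, as in the slide's `get` call, Consul returns the stored bytes directly and no decoding step is needed.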

  82. Credentials

  83. $kv->put('test/db/pw', 'secret_pw');

  84. https://www.vaultproject.io/

  85. Cycling of
    credentials

  86. Service Discovery

  87. How does one
    service know where
    another service is?

  88. Hostname + Port

  89. [Diagram: two servers running Service A, Service B and two instances of Service C]

  90. Configuration

  91. $config = [
        'serviceA' => [
            '192.168.0.1:8001',
            '192.168.0.2:8001',
        ],
        'serviceB' => [
            '192.168.0.1:8002',
        ],
        'serviceC' => [
            '192.168.0.2:8003',
        ]
    ];

  92. Consul
    https://www.consul.io/

  93. Load balancing?

  94. Round robin in the
    client

  95. $config = [
        'serviceA' => [
            '192.168.0.1:8001',
            '192.168.0.2:8001',
        ],
        'serviceB' => [
            '192.168.0.1:8002',
        ],
        'serviceC' => [
            '192.168.0.2:8003',
        ]
    ];

  96. Service/Server
    down?

  97. $config = [
        'serviceA' => [
            '192.168.0.1:8001',
            '192.168.0.2:8001',
        ],
        'serviceB' => [
            '192.168.0.1:8002',
        ],
        'serviceC' => [
            '192.168.0.2:8003',
        ]
    ];

  98. Health checks

  99. GET /health HTTP/1.1
    Host: serviceA.local
    HTTP/1.1 200 OK

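Slides 93 to 99 combine two ideas: the client picks instances from the configured list in round-robin order, and a health check decides whether an instance may be used at all. A minimal sketch under those assumptions; `RoundRobinClient` and the `is_healthy` callable are hypothetical names, and a real check could issue the `GET /health` request from slide 99 and look for a 200 response:

```python
import itertools


class RoundRobinClient:
    """Client-side round robin over the instances of one service."""

    def __init__(self, instances, is_healthy):
        self._instances = list(instances)
        self._cycle = itertools.cycle(self._instances)
        self._is_healthy = is_healthy  # callable: address -> bool

    def next_instance(self):
        # Try each instance at most once per call, skipping unhealthy ones.
        for _ in range(len(self._instances)):
            candidate = next(self._cycle)
            if self._is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy instance available")
```

For example, `RoundRobinClient(['192.168.0.1:8001', '192.168.0.2:8001'], check)` alternates between the two `serviceA` addresses while both pass `check`, and keeps returning the remaining one when the other fails.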

  100. Consul for Service
    Discovery

  101. Consul
    https://www.consul.io/

  102. [Diagram: three Consul servers, plus application servers that each run a Consul agent]

  103. [Diagram: Service A and the Consul agent on the same server, linked by registration and health check]

  104. Consul API

  105. DNS

  106. [email protected]: dig web-frontend.service.consul. ANY
    ; <<>> DiG 9.8.3-P1 <<>> web-frontend.service.consul. ANY
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<
    ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
    ;; QUESTION SECTION:
    ;web-frontend.service.consul. IN ANY
    ;; ANSWER SECTION:
    web-frontend.service.consul. 0 IN A 10.0.3.83
    web-frontend.service.consul. 0 IN A 10.0.1.109

  107. Monitoring

  108. How are my
    services behaving?

  109. Central Log
    Management

  110. Elasticsearch
    Kibana
    Logstash

  111. [Diagram: three webservers, each writing a log read by its own logstash instance; the logstash instances are connected via AMQP to elasticsearch]

  113. Tracing IDs

  114. [Diagram: the web server creates a unique trace_id for each user request and passes it on to the HTTP services it calls; the web server and every service log it]

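The tracing idea on slide 114 has three parts: create a unique trace_id where the user request enters the system, hand it to every service that is called, and include it in every log line. A minimal sketch, assuming a hypothetical `X-Trace-Id` header (the slides do not name one) and Python's standard `logging` module:

```python
import logging
import uuid

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name, not from the slides


def ensure_trace_id(headers):
    """Reuse the caller's trace_id, or create one at the edge of the system."""
    return headers.get(TRACE_HEADER) or uuid.uuid4().hex


def downstream_headers(trace_id):
    """Headers to attach to every call made while handling this request."""
    return {TRACE_HEADER: trace_id}


def trace_logger(name, trace_id):
    """Logger that makes trace_id available to every record as %(trace_id)s."""
    return logging.LoggerAdapter(logging.getLogger(name), {"trace_id": trace_id})
```

With a log format that contains `%(trace_id)s`, all lines belonging to one user request can then be found in central log management by searching for a single value.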

  115. https://www.loggly.com/

  116. https://getsentry.com/

  117. Measure everything

  118. Server metrics

  119. Application metrics

  120. StatsD + Graphite

  121. [Diagram: statsd instances on three webservers send UDP messages to a further statsd, which passes aggregated data to graphite]

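Application metrics reach StatsD as small plain-text UDP datagrams of the form `<name>:<value>|<type>`. A minimal sketch of emitting one, assuming a StatsD daemon on localhost at its conventional port 8125; the function names and metric names are made up for illustration:

```python
import socket


def format_metric(name, value, metric_type):
    """Build one StatsD line: <name>:<value>|<type> (c = counter, ms = timer, g = gauge)."""
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget: send the metric as a single UDP datagram.

    Host and port are assumptions; 8125 is the conventional StatsD port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

Because UDP needs no connection and no reply, a call such as `send_metric(format_metric("web.requests", 1, "c"))` adds very little latency to the request being measured, at the price of possibly losing individual datagrams.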

  122. https://www.librato.com

  123. http://www.soasta.com/

  124. Profiling

  125. XHProf

  128. https://tideways.io/

  129. https://blackfire.io/

  130. newrelic.com

  131. Handling failures

  132. What do I do when
    something breaks?

  133. Errors happen

  134. Detecting regressions

  135. Server outages

  136. Database
    overloads

  137. Bugs

  138. Service A Service B
    200 OK

  139. Service A Service B
    5xx

  140. Service A Service B
    Timeout

  141. Circuit Breakers

  142. Service A Service B
    200 OK
    Circuit
    Breaker
    Status: closed
    Error rate: 0

  143. Service A Service B
    Error
    Circuit
    Breaker
    Status: -> open
    Error rate:
    > threshold

  144. Service A Service B
    Circuit
    Breaker
    Status: -> open
    Error rate:
    > threshold

  145. Service A Service B
    Error
    Circuit
    Breaker
    Status: -> open
    Error rate:
    > threshold
    Test if still failing

  146. Service A Service B
    200 OK
    Circuit
    Breaker
    Status: -> closed
    Error rate: 0
    Test if still failing

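Slides 142 to 146 walk through the life cycle of a circuit breaker: closed while calls succeed, open once errors pass a threshold, and closed again after a test call succeeds. A minimal sketch of that state machine; it is not how Hystrix or Phystrix implement it. The class and parameter names are hypothetical, and it counts consecutive errors as a simplification of the error rate on the slides:

```python
import time


class CircuitBreaker:
    """Closed -> open when errors reach a threshold; after a pause, one test call."""

    def __init__(self, threshold=3, retry_after=30.0, clock=time.monotonic):
        self.threshold = threshold      # consecutive errors before opening
        self.retry_after = retry_after  # seconds to wait before a test call
        self.clock = clock              # injectable time source
        self.errors = 0
        self.opened_at = None           # None means the circuit is closed

    @property
    def status(self):
        return "closed" if self.opened_at is None else "open"

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.retry_after:
                # Open: fail fast without touching the failing service.
                raise RuntimeError("circuit open")
            # Pause is over: let this call through to test if it still fails.
        try:
            result = func()
        except Exception:
            self.errors += 1
            if self.errors >= self.threshold:
                self.opened_at = self.clock()
            raise
        # Success: reset the error count and close the circuit.
        self.errors = 0
        self.opened_at = None
        return result
```

Failing fast while the circuit is open is what protects Service A: it no longer spends time waiting for timeouts from a Service B that is known to be in trouble, and Service B gets room to recover.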

  147. https://github.com/Netflix/Hystrix

  148. https://github.com/odesk/phystrix

  149. Gracefully handling
    exceptions

  150. Component based
    frontend

  158. Degrading
    Functionality

  159. Profile Publications Publication
    Publication
    Publication
    AboutMe
    LeftColumn Image
    Menu
    Institution

  160. Profile Publications Publication
    Publication
    Publication
    AboutMe
    LeftColumn Image
    Menu
    EXCEPTION
    Institution

  161. Test it

  162. http://techblog.netflix.com/2014/09/introducing-chaos-engineering.html

  163. Scalability

  164. How do I handle
    traffic spikes?

  165. Elasticity

  166. Service A Service B
    200 OK
    Circuit
    Breaker

  167. Service A Service B
    Circuit
    Breaker
    Service C
    Circuit
    Breaker

  168. Throttling

  169. Service A Service B
    Circuit
    Breaker
    Service C
    Circuit
    Breaker
    Only allow xx% of calls

  171. Priority

  172. Service A Service B
    Circuit
    Breaker
    Service C
    Circuit
    Breaker
    100% of calls
    10% of calls

  173. Service A Service B
    Circuit
    Breaker
    Service C
    Circuit
    Breaker
    100% of calls
    wait until everything is ok

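Slides 168 to 173 describe throttling by priority: under load, important calls keep 100% of their traffic while less important ones are cut to a smaller share, such as 10%. A minimal sketch of that decision, assuming hypothetical priority labels `high` and `low` and a random draw per call; the slides do not say how the share is enforced:

```python
import random


class Throttle:
    """Let only a configured share of calls through, per priority class."""

    def __init__(self, allowed_share, rng=random.random):
        # e.g. {"high": 1.0, "low": 0.1} for "100% of calls" / "10% of calls"
        self.allowed_share = allowed_share
        self.rng = rng  # returns a float in [0, 1); injectable for tests

    def allow(self, priority):
        """Return True if a call of this priority may go through."""
        return self.rng() < self.allowed_share[priority]
```

Raising the shares back to 1.0 once the downstream service has recovered corresponds to the "wait until everything is ok" step on slide 173.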

  174. Elasticity

  175. Service A Service B
    Circuit
    Breaker
    Service C
    Circuit
    Breaker
    Service B

  176. Complete Solutions

  178. https://mesosphere.github.io/marathon/

  179. https://www.flickr.com/photos/darkdwarf/19701555974/

  180. http://twitter.com/BastianHofmann
    http://lanyrd.com/people/BastianHofmann
    http://speakerdeck.com/u/bastianhofmann
    [email protected]
