Elastic scaling in a (micro)service oriented architecture

Splitting an application into multiple independent services can be a good way to keep it scalable and to preserve stability and developer productivity in larger, growing teams. But splitting the codebase, creating APIs and deploying the code on some servers is not enough: your services also need to know where and how other services can be reached. Classical approaches, such as hardcoding everything in every service or routing all traffic through a central load balancer, can quickly lead to problems with scalability and maintainability. In this talk I'll show how we at ResearchGate tackled this challenge. With the help of tools like Consul and haproxy we created a setup that allows us to quickly boot and shut down services. This ensures that all servers are utilized optimally and that we can react to load spikes quickly and automatically.

Bastian Hofmann

February 18, 2016

Transcript

  1. Elastic Scaling in a
    (Micro)service
    oriented
    Architecture
    @BastianHofmann

  2. Microservices

  3. Service Oriented
    Architecture

  4. http://blog.philipphauer.de/microservices-nutshell-pros-cons/
    Monolith Microservices

  5. Microservice

  6. Cloud Solutions

  7. Using the cloud is
    not always possible

  8. … or even desirable

  9. Doing it yourself
    creates challenges

  10. Transparency

  11. Learning Curves

  12. Elastic Scaling?

  13. How can we solve
    them?

  14. A lot of this is also
    useful for monoliths

  15. •A big monolith
    •Multiple small to medium sized services
    •Lots of shared libraries
    •Tools and utilities
    •Hadoop jobs
    •Flink jobs
    •Server Provisioning

  16. •PHP
    •JavaScript
    •Java
    •Scala
    •Bash
    •Python
    •Puppet
    •Ruby
    •Go

  17. •Nginx
    •PHP-FPM
    •Glassfish
    •Jetty
    •Dropwizard
    •haproxy
    •PostgreSQL
    •MongoDB
    •Memcached
    •Infinispan
    •Solr
    •Zookeeper
    •Elasticsearch
    •Logstash
    •Kibana
    •Graphite
    •StatsD
    •RabbitMQ
    •Hortonworks Data Platform
    •HBase
    •Hive
    •Consul
    •Vault
    •CheckMK
    •Azkaban
    •ActiveMQ
    •Apache HTTPD
    •Docker
    •Kafka

  18. Several hundred
    servers

  19. Questions? Ask

  20. http://speakerdeck.com/u/bastianhofmann

  21. https://www.flickr.com/photos/npobre/2601582256/

  22. How to get the
    services on our
    servers?

  23. Diverse technology
    stacks

  24. The same for every
    service

  25. One Click
    Deployment

  26. Build/Test/Release
    pipeline

  27. https://www.flickr.com/photos/40987321@N02/5580348753/

  28. Services installed in
    a sandbox

  29. https://www.docker.com/

  30. https://twitter.com/mfdii/status/697532387240996864

  31. Docker & PHP - development and
    deployment
    Szymon Skórczyński
    Thursday 16:00

  32. Availability

  33. Zero Downtime
    Deployments

  34. Server
    Server Server
    Server

  35. Canary
    environments

  36. Server
    Server Server
    Server

  37. Fast rollbacks

  38. •Ansible
    •Capistrano
    •Saltstack
    •Custom
    •…

  39. Running the service

  40. How do I stop and
    start a service and
    ensure it keeps
    running?

  41. Diverse technology
    stacks

  42. The same for every
    service

  43. •Supervisord
    •Upstart
    •S6
    •Runit
    •Monit
    •Circus
    •Restartd
    •…

  44. How to synchronize
    changes over
    services?

  45. API Versioning

  46. GET /v23/foo/abr
    Host: myservice.local

  47. GET /foo/abr
    Host: myservice.local
    X-Version: 23

  48. GET /foo/abr?version=23
    Host: myservice.local

  49. GET /foo/abr
    Host: myservice.local
    Accept: application/vnd.company.v23+json
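
The four versioning styles on slides 46 to 49 (path prefix, custom header, query parameter, vendor media type) could be resolved on the server side roughly as follows. This is a minimal sketch: the function name `resolveApiVersion`, the lookup order and the default version are assumptions made for illustration, not something the slides prescribe.

```php
<?php
declare(strict_types=1);

/**
 * Resolve the requested API version from a request.
 * $headers is expected with lower-cased header names (an assumption of this sketch).
 */
function resolveApiVersion(string $path, array $query, array $headers, int $default = 1): int
{
    // 1. Path prefix: GET /v23/foo/abr
    if (preg_match('#^/v(\d+)/#', $path, $m) === 1) {
        return (int) $m[1];
    }
    // 2. Custom header: X-Version: 23
    if (isset($headers['x-version']) && ctype_digit((string) $headers['x-version'])) {
        return (int) $headers['x-version'];
    }
    // 3. Query parameter: GET /foo/abr?version=23
    if (isset($query['version']) && ctype_digit((string) $query['version'])) {
        return (int) $query['version'];
    }
    // 4. Vendor media type: Accept: application/vnd.company.v23+json
    if (isset($headers['accept'])
        && preg_match('#application/vnd\.company\.v(\d+)\+json#', (string) $headers['accept'], $m) === 1) {
        return (int) $m[1];
    }
    // No version given: fall back to the default.
    return $default;
}
```

A service would normally pick one of these styles and stick to it; supporting all four here only shows that they carry the same information.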

  50. Feature Flags

  51. public function hasAccess() {
        return featureFlag()->isActive(
            FeatureFlag::TEST_ONE
        );
    }

  52. Shared database

  53. GET /foo/abr
    Host: myservice.local
    X-Flag-NewFeature: 1
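
One possible reading of slides 50 to 53 is that a flag's state lives in shared storage and that a calling service can pass its decision on with an `X-Flag-…` header. The sketch below follows that reading. The class `FeatureFlags`, its constructor arguments and the rule that the header wins over the stored state are assumptions for illustration only.

```php
<?php
declare(strict_types=1);

final class FeatureFlags
{
    /** @var array<string, bool> flag states, e.g. loaded from a shared database */
    private $stored;
    /** @var array<string, string> request headers with lower-cased names */
    private $headers;

    public function __construct(array $stored, array $headers = [])
    {
        $this->stored = $stored;
        $this->headers = array_change_key_case($headers, CASE_LOWER);
    }

    public function isActive(string $flag): bool
    {
        // A per-request header such as "X-Flag-NewFeature: 1" overrides the
        // stored state, so every service in the call chain sees the same decision.
        $header = 'x-flag-' . strtolower($flag);
        if (isset($this->headers[$header])) {
            return $this->headers[$header] === '1';
        }
        // Unknown flags count as inactive.
        return $this->stored[$flag] ?? false;
    }
}

$flags = new FeatureFlags(
    ['NewFeature' => false, 'Other' => true],
    ['X-Flag-NewFeature' => '1']
);
```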

  54. Configuration
    Management

  55. How do I
    synchronize
    configuration over
    services?

  56. {
        "db_user": "user",
        "db_pw": "pw",
        "serviceA": "serviceA.local:8018"
    }

  57. Config file on disk

  58. Inconsistencies

  59. Consul
    https://www.consul.io/

  60. •Consul
    •Zookeeper
    •etcd
    •…

  61. [Diagram: a cluster of three Consul servers; each application server runs a local Consul agent]

  62. https://github.com/sensiolabs/consul-php-sdk

  63. Key/Value Store

  64. $kv->put('test/foo/bar', 'bazinga');
    $kv->get('test/foo/bar', ['raw' => true]);
    $kv->delete('test/foo/bar');
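
Without the `raw` option, Consul's HTTP API answers a `GET /v1/kv/<key>` request with a JSON list of entries whose `Value` field is base64-encoded. The sketch below decodes such a body into a plain key/value map. The helper name `decodeConsulKvResponse` and the sample body are made up for illustration, and no network call is made.

```php
<?php
declare(strict_types=1);

/**
 * Turn the JSON body of a Consul KV read into a map of key => decoded value.
 */
function decodeConsulKvResponse(string $json): array
{
    $entries = json_decode($json, true);
    if (!is_array($entries)) {
        throw new InvalidArgumentException('Unexpected KV response body');
    }
    $values = [];
    foreach ($entries as $entry) {
        // A key stored without a value has a null "Value" field.
        $values[$entry['Key']] = isset($entry['Value']) ? base64_decode($entry['Value']) : null;
    }
    return $values;
}

// Sample body shaped like a KV response; the content itself is invented.
$body = json_encode([
    ['Key' => 'test/foo/bar', 'Flags' => 0, 'Value' => base64_encode('bazinga')],
]);
$config = decodeConsulKvResponse($body);
```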

  65. $kv->put('test/db/pw', 'secret_pw');

  66. https://www.vaultproject.io/

  67. Cycling of
    credentials

  68. Service Discovery

  69. How does one
    service know where
    another service is?

  70. Hostname + Port

  71. Server
    Service A
    Server
    Service B
    Service C Service C

  72. Configuration

  73. $config = [
    'serviceA' => [
    '192.168.0.1:8001',
    '192.168.0.2:8001',
    ],
    'serviceB' => [
    '192.168.0.1:8002',
    ],
    'serviceC' => [
    '192.168.0.2:8003',
    ]
    ];

  74. Consul
    https://www.consul.io/

  75. Load balancing?

  76. Round robin in the
    client

  77. $config = [
    'serviceA' => [
    '192.168.0.1:8001',
    '192.168.0.2:8001',
    ],
    'serviceB' => [
    '192.168.0.1:8002',
    ],
    'serviceC' => [
    '192.168.0.2:8003',
    ]
    ];
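
Round robin in the client over such a list might look like the sketch below. The class name `RoundRobin` is hypothetical. Note that in a share-nothing PHP setup the counter only lives for one request or worker process, so a random start offset may be needed in practice.

```php
<?php
declare(strict_types=1);

final class RoundRobin
{
    /** @var string[] */
    private $endpoints;
    /** @var int index of the endpoint handed out next */
    private $next = 0;

    public function __construct(array $endpoints)
    {
        if ($endpoints === []) {
            throw new InvalidArgumentException('At least one endpoint is required');
        }
        $this->endpoints = array_values($endpoints);
    }

    /** Return the next endpoint and advance, wrapping around at the end of the list. */
    public function pick(): string
    {
        $endpoint = $this->endpoints[$this->next];
        $this->next = ($this->next + 1) % count($this->endpoints);
        return $endpoint;
    }
}

$serviceA = new RoundRobin(['192.168.0.1:8001', '192.168.0.2:8001']);
```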

  78. Service/Server
    down?

  79. $config = [
    'serviceA' => [
    '192.168.0.1:8001',
    '192.168.0.2:8001',
    ],
    'serviceB' => [
    '192.168.0.1:8002',
    ],
    'serviceC' => [
    '192.168.0.2:8003',
    ]
    ];

  80. Health checks

  81. GET /health HTTP/1.1
    Host: serviceA.local
    HTTP/1.1 200 OK
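
A client that keeps its own endpoint list could use the health check to drop instances that are down. In the sketch below the probe is injected as a callable, so the filtering logic stays independent of how the `GET /health` request is actually sent. The function names and the stubbed status lines are assumptions for illustration.

```php
<?php
declare(strict_types=1);

/** True when a status line such as "HTTP/1.1 200 OK" reports a 2xx status. */
function isHealthyStatusLine(string $statusLine): bool
{
    return preg_match('#^HTTP/\d(?:\.\d)?\s+2\d\d\b#', $statusLine) === 1;
}

/** Keep only the endpoints for which $probe(endpoint) returns true. */
function healthyEndpoints(array $endpoints, callable $probe): array
{
    return array_values(array_filter($endpoints, $probe));
}

// Stubbed results standing in for real GET /health calls.
$statusLines = [
    '192.168.0.1:8001' => 'HTTP/1.1 200 OK',
    '192.168.0.2:8001' => 'HTTP/1.1 503 Service Unavailable',
];
$alive = healthyEndpoints(
    array_keys($statusLines),
    function (string $endpoint) use ($statusLines): bool {
        return isHealthyStatusLine($statusLines[$endpoint]);
    }
);
```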

  82. Central load
    balancer

  83. Server
    Service A
    Server
    Service B
    Service C Service C
    Load balancer

  84. Scalability?

  85. Consul
    https://www.consul.io/

  86. [Diagram: a cluster of three Consul servers; each application server runs a local Consul agent]

  87. Consul for Service
    Discovery

  88. Consul
    Agent
    Server
    Service A
    Registration
    Health check

  89. admin@hashicorp: dig web-frontend.service.consul. ANY
    ; <<>> DiG 9.8.3-P1 <<>> web-frontend.service.consul. ANY
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29981
    ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
    ;; QUESTION SECTION:
    ;web-frontend.service.consul. IN ANY
    ;; ANSWER SECTION:
    web-frontend.service.consul. 0 IN A 10.0.3.83
    web-frontend.service.consul. 0 IN A 10.0.1.109
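
Besides DNS, Consul exposes the same information over HTTP: `GET /v1/health/service/<name>?passing` returns the instances whose health checks pass, each with a `Node` and a `Service` object. The sketch below turns such a body into `host:port` strings. The helper name, the sample body and the port 8080 are made up for illustration, and no request is sent.

```php
<?php
declare(strict_types=1);

/**
 * Extract "address:port" endpoints from the JSON body of a Consul
 * health/service response.
 */
function endpointsFromHealthResponse(string $json): array
{
    $entries = json_decode($json, true);
    if (!is_array($entries)) {
        throw new InvalidArgumentException('Unexpected health response body');
    }
    $endpoints = [];
    foreach ($entries as $entry) {
        // Service.Address is empty when the service registered without its own
        // address; in that case the node's address applies.
        $address = ($entry['Service']['Address'] ?? '') !== ''
            ? $entry['Service']['Address']
            : $entry['Node']['Address'];
        $endpoints[] = $address . ':' . $entry['Service']['Port'];
    }
    return $endpoints;
}

// Sample body using the two addresses from the dig output; the port is invented.
$body = json_encode([
    ['Node' => ['Address' => '10.0.3.83'], 'Service' => ['Address' => '', 'Port' => 8080]],
    ['Node' => ['Address' => '10.0.1.109'], 'Service' => ['Address' => '10.0.1.109', 'Port' => 8080]],
]);
$webFrontend = endpointsFromHealthResponse($body);
```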

  90. Consul-Template
    https://github.com/hashicorp/consul-template

  91. Server
    Service A
    Server
    Service B
    Service C Service C
    Load balancer
    Consul
    Template
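
Consul-Template itself is driven by Go templates. The PHP sketch below only illustrates the idea behind it: whenever the list of healthy instances changes, the load balancer configuration is regenerated from that list. The function name `renderHaproxyBackend` and the server naming scheme are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Render an haproxy backend section for one service from its current endpoints.
 */
function renderHaproxyBackend(string $service, array $endpoints): string
{
    $lines = ["backend {$service}", '    balance roundrobin'];
    foreach (array_values($endpoints) as $i => $endpoint) {
        // "check" makes haproxy run its own health checks against the server.
        $lines[] = sprintf('    server %s_%d %s check', $service, $i, $endpoint);
    }
    return implode("\n", $lines) . "\n";
}

$backend = renderHaproxyBackend('serviceA', ['192.168.0.1:8001', '192.168.0.2:8001']);
```

In a real setup the rendered file would be written to disk and followed by an haproxy reload, which is the part Consul-Template automates.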

  92. Single Point of
    Failure

  93. Server
    Service A
    Server
    Service B
    Service C Service C
    Load balancer
    Consul
    Template Load balancer
    Consul
    Template

  94. How are my
    services behaving?

  95. Central Log
    Management

  96. Elasticsearch

    Kibana
    Logstash

  97. Logstash
    elasticsearch
    webserver webserver webserver
    AMQP
    log log log
    logstash logstash logstash

  98. [Diagram: the web server creates a unique trace_id for each user request and passes it on to every HTTP service called; each component writes its log entries with that trace_id]

  99. X-Trace-Id: bbr8ehb984tbab894
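
Propagating the trace id could be as small as the sketch below: reuse the incoming `X-Trace-Id` if there is one, otherwise create a new id at the first hop, and attach it to every outgoing call and log line. The function names and the 16-character hex format are assumptions; the slides do not specify how the id is generated.

```php
<?php
declare(strict_types=1);

/** Return the caller's trace id, or create a new one at the first hop. */
function traceIdFor(array $incomingHeaders): string
{
    $headers = array_change_key_case($incomingHeaders, CASE_LOWER);
    // Reuse the id so all log entries of one user request can be correlated.
    if (!empty($headers['x-trace-id'])) {
        return (string) $headers['x-trace-id'];
    }
    // No id yet: this service is the entry point of the request.
    return bin2hex(random_bytes(8));
}

/** Headers to add to every request sent to another service. */
function outgoingTraceHeaders(string $traceId): array
{
    return ['X-Trace-Id' => $traceId];
}

$traceId = traceIdFor(['X-Trace-Id' => 'bbr8ehb984tbab894']);
```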

  100. https://www.loggly.com/

  101. https://getsentry.com/

  102. Measure everything

  103. Server metrics

  104. Application metrics

  105. StatsD + Graphite

  106. webserver webserver webserver
    statsd statsd
    statsd
    graphite
    aggregated
    UDP message
    statsd
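
StatsD metrics are short text datagrams of the form `name:value|type`, sent over UDP. The sketch below formats a counter and a timing and sends them fire-and-forget. The metric names are hypothetical, and host and port assume a StatsD daemon on its usual default of 127.0.0.1:8125.

```php
<?php
declare(strict_types=1);

/** Counter datagram, e.g. "profile.views:1|c". */
function statsdCounter(string $metric, int $delta = 1): string
{
    return sprintf('%s:%d|c', $metric, $delta);
}

/** Timing datagram in milliseconds, e.g. "profile.render:42|ms". */
function statsdTiming(string $metric, int $milliseconds): string
{
    return sprintf('%s:%d|ms', $metric, $milliseconds);
}

/** Send one datagram over UDP; failures are ignored so metrics never break a request. */
function sendStatsd(string $datagram, string $host = '127.0.0.1', int $port = 8125): void
{
    $socket = @fsockopen('udp://' . $host, $port, $errno, $errstr, 0.1);
    if ($socket !== false) {
        fwrite($socket, $datagram);
        fclose($socket);
    }
}

$counter = statsdCounter('profile.views');
$timing = statsdTiming('profile.render', 42);
```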

  107. https://www.librato.com

  108. http://www.soasta.com/

  109. Use it in production
    for a subset of
    requests

  110. newrelic.com

  111. https://tideways.io/

  112. https://blackfire.io/

  113. Make it accessible

  114. Handling failures

  115. What do I do when
    something breaks?

  116. Errors happen

  117. Detecting regressions

  118. Server outages

  119. Database
    overloads

  120. Service A Service B
    200 OK

  121. Service A Service B
    5xx

  122. Service A Service B
    Timeout

  123. Circuit Breakers

  124. Service A Service B
    200 OK
    Circuit
    Breaker
    Status: closed
    Error rate: 0

  125. Service A Service B
    Error
    Circuit
    Breaker
    Status: -> open
    Error rate:
    > threshold

  126. Service A Service B
    Circuit
    Breaker
    Status: -> open
    Error rate:
    > threshold

  127. Service A Service B
    Error
    Circuit
    Breaker
    Status: -> open
    Error rate:
    > threshold
    Test if still failing

  128. Service A Service B
    200 OK
    Circuit
    Breaker
    Status: -> closed
    Error rate: 0
    Test if still failing
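
The state changes on slides 124 to 128 can be sketched as a small class. This is not how Hystrix or Phystrix implement it: the sketch counts consecutive failures instead of tracking an error rate, and the class name, constructor parameters and injectable clock are assumptions made to keep the example short and testable.

```php
<?php
declare(strict_types=1);

final class CircuitBreaker
{
    private $failures = 0;
    /** @var float|null time the breaker opened; null means closed */
    private $openedAt = null;
    private $threshold;
    private $retryAfter;
    private $clock;

    public function __construct(int $threshold, float $retryAfterSeconds, ?callable $clock = null)
    {
        $this->threshold = $threshold;
        $this->retryAfter = $retryAfterSeconds;
        $this->clock = $clock ?? function (): float { return microtime(true); };
    }

    /** Run $operation unless the breaker is open; use $fallback on failure or while open. */
    public function call(callable $operation, callable $fallback)
    {
        if ($this->openedAt !== null && ($this->clock)() - $this->openedAt < $this->retryAfter) {
            return $fallback(); // open: spare the failing service
        }
        try {
            // Closed, or the retry interval has passed: test if still failing.
            $result = $operation();
        } catch (Throwable $e) {
            $this->failures++;
            if ($this->openedAt !== null || $this->failures >= $this->threshold) {
                $this->openedAt = ($this->clock)(); // open, or stay open after a failed test
            }
            return $fallback();
        }
        $this->failures = 0;
        $this->openedAt = null; // success closes the breaker
        return $result;
    }

    public function isOpen(): bool
    {
        return $this->openedAt !== null;
    }
}
```

Injecting the clock keeps the retry interval testable without sleeping; production code would simply leave the third argument out.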

  129. https://github.com/Netflix/Hystrix

  130. https://github.com/odesk/phystrix

  131. Phystrix does not
    scale well

  132. Gracefully handling
    exceptions

  133. Component based
    frontend

  134. Degrading
    Functionality

  135. Profile Publications Publication
    Publication
    Publication
    AboutMe
    LeftColumn Image
    Menu
    Institution

  136. Profile Publications Publication
    Publication
    Publication
    AboutMe
    LeftColumn Image
    Menu
    EXCEPTION
    Institution
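
Degrading functionality in a component based frontend might be sketched like this: each component renders independently, and an exception in one of them only removes that component from the page. The function `renderComponents` and the flat list of components are simplifications; the slide shows a nested component tree.

```php
<?php
declare(strict_types=1);

/**
 * Render every component; a failing component is logged and skipped so the
 * rest of the page still renders.
 *
 * @param array<string, callable> $components component name => render callable
 */
function renderComponents(array $components): string
{
    $html = '';
    foreach ($components as $name => $render) {
        try {
            $html .= $render();
        } catch (Throwable $e) {
            error_log(sprintf('Component %s failed: %s', $name, $e->getMessage()));
        }
    }
    return $html;
}

$page = renderComponents([
    'AboutMe' => function (): string { return '<div>AboutMe</div>'; },
    'Institution' => function (): string { throw new RuntimeException('service timeout'); },
    'Menu' => function (): string { return '<div>Menu</div>'; },
]);
```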

  137. http://techblog.netflix.com/2014/09/introducing-chaos-engineering.html

  138. How do I handle
    traffic spikes?

  139. Service A Service B
    200 OK
    Circuit
    Breaker

  140. Service A Service B
    Circuit
    Breaker
    Service C
    Circuit
    Breaker

  141. Service A Service B
    Circuit
    Breaker
    Service C
    Circuit
    Breaker
    Only allow xx% of calls

  142. Service A Service B
    Circuit
    Breaker
    Service C
    Circuit
    Breaker
    100% of calls
    10% of calls

  143. Service A Service B
    Circuit
    Breaker
    Service C
    Circuit
    Breaker
    100% of calls
    wait until everything is ok
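
Letting only a percentage of calls through, as on slides 141 to 143, could be sketched as a random gate in front of each call. The class `CallThrottle` and its injectable random source are assumptions, and how the percentage is raised again once everything is ok is left out.

```php
<?php
declare(strict_types=1);

final class CallThrottle
{
    /** @var int share of calls to let through, 0..100 */
    private $allowedPercent;
    /** @var callable returns an int between 1 and 100 */
    private $random;

    public function __construct(int $allowedPercent, ?callable $random = null)
    {
        $this->allowedPercent = max(0, min(100, $allowedPercent));
        // Injectable so the gate can be tested deterministically.
        $this->random = $random ?? function (): int { return random_int(1, 100); };
    }

    /** True when this call may go through to the downstream service. */
    public function allows(): bool
    {
        return ($this->random)() <= $this->allowedPercent;
    }
}

// 100% of calls to Service B, 10% of calls to Service C, as on slide 142.
$toServiceB = new CallThrottle(100);
$toServiceC = new CallThrottle(10);
```

Calls that are not allowed would take the same fallback path as calls rejected by an open circuit breaker.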

  144. Service A Service B
    Circuit
    Breaker
    Service C
    Circuit
    Breaker
    Service B

  145. Development
    Environment

  146. How do I ensure a
    productive dev
    environment?

  147. Diverse technology
    stacks

  148. Diverse
    environments

  149. https://www.docker.com/

  150. Central
    DEV
    Production
    Near
    Production
    Nightly
    DEV

  151. Large scale
    refactorings

  152. https://qafoo.com/talks/15_08_froscon_monorepos.pdf

  153. Global Code
    Search

  154. https://github.com/etsy/hound

  155. Complete Solutions

  156. https://mesosphere.github.io/marathon/

  157. Kubernetes at the Home Office
    Billie Thompson
    Friday 11:30

  158. https://www.flickr.com/photos/darkdwarf/19701555974/

  159. https://joind.in/talk/38117

  160. http://twitter.com/BastianHofmann
    http://lanyrd.com/people/BastianHofmann
    http://speakerdeck.com/u/bastianhofmann
    [email protected]
