Elastic scaling in a (micro)service oriented architecture

Splitting an application into multiple independent services can be a good way to keep it scalable and to preserve stability and developer productivity in larger, growing teams. But splitting the codebase, creating APIs and deploying the code on some servers is not enough: your services also need to know where and how other services are accessible. Classical approaches, like hardcoding everything in every service or having a central load balancer, can quickly lead to problems with scalability and maintainability. In this talk I’ll show how we at ResearchGate tackled this challenge. With the help of tools like Consul and haproxy we created a setup that allows us to quickly boot and shut down services. This ensures that all servers are utilized optimally and that load spikes can be reacted to quickly and automatically.

Bastian Hofmann

January 28, 2017

Transcript

  1. Elastic Scaling in a (Micro)service oriented Architecture @BastianHofmann

  3. Microservices

  5. Service Oriented Architecture

  6. Monolith

  7. http://blog.philipphauer.de/microservices-nutshell-pros-cons/ Monolith Microservices

  8. Benefits

  9. Stricter separation of concerns

  10. Diverse technology stacks

  11. Things that you don’t want to do in language X

  12. Problems

  13. Problems

  14. Challenges

  15. Performance

  16. Latency

  17. Stability

  18. Reliability

  19. Transparency

  20. Monitoring

  21. Learning Curves

  22. Code Reuse

  23. Maintenance

  24. How to solve these

  25. How to elastically scale

  26. Agenda • Deploying • Running • Releasing • Configuring • Discovering • Scaling

  27. A lot of this is also useful for monoliths

  30. 11 million users

  31. 193 countries

  32. ~1800 requests/s

  33. lots of data

  34. >100 million publications

  35. ~ 140 components

  36. ~ 400 repositories

  37. haproxy node memcache postgresql mongodb solr infinispan hbase mongodb solr

    community services
  38. + async events, stream and batch processing

  39. https://www.flickr.com/photos/npobre/2601582256/

  40. Deployment

  41. How to get the services on our servers?

  42. Diverse technology stacks

  43. The same for every service

  44. One Click Deployment

  45. •Ansible •Capistrano •Saltstack •Custom •….

  47. Automation

  48. Build/Test/Release pipeline

  50. Availability

  51. Zero Downtime Deployments

  52. Server Server Server Server

  53. Server Server Server Server

  54. Server Server Server Server

  55. Server Server Server Server

  56. Server Server Server Server

  57. Server Server Server Server

  58. Stability

  59. Canary environments

  60. Server Server Server Server

  61. Server Server Server Server Test with low amount of traffic
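One way to picture "test with low amount of traffic" is a router that sends a small, fixed share of requests to the canary servers and the rest to the stable ones. The sketch below is only an illustration of that idea: the function name `pick_pool`, the pool labels and the 5% share are assumptions, not details from the talk.

```python
import random


def pick_pool(canary_share, rng=random.random):
    """Choose which server pool handles the next request.

    canary_share is the fraction of traffic (0.0 to 1.0) that should
    reach the canary servers. rng is injectable so the choice can be
    made deterministic in tests.
    """
    return "canary" if rng() < canary_share else "stable"


if __name__ == "__main__":
    # Route 1000 simulated requests with a 5% canary share (hypothetical value).
    hits = sum(pick_pool(0.05) == "canary" for _ in range(1000))
    print(f"{hits} of 1000 requests went to the canary pool")
```

In practice the split would usually be configured in the load balancer rather than in application code; the point is only that the canary share stays small enough to limit the impact of a bad release.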

  62. Fast deployments

  63. Fast rollbacks

  64. https://www.flickr.com/photos/40987321@N02/5580348753/

  65. Different libraries, packages, web servers, configurations, versions

  66. Provisioned Base boxes

  67. Services installed in a sandbox

  68. https://www.docker.com/

  69. Running the service

  70. How do I stop and start a service and ensure

    it keeps running?
  71. Diverse technology stacks

  72. The same for every service

  73. •Supervisord •Upstart •S6 •Runit •Monit •Circus •Restartd •Docker •…

  74. https://www.docker.com/

  75. docker run my-service

  76. Releases

  77. How to synchronize changes over services?

  78. APIs

  79. API Versioning

  80. GET /v23/foo/abr
    Host: myservice.local

  81. GET /foo/abr
    Host: myservice.local
    X-Version: 23

  82. GET /foo/abr?version=23
    Host: myservice.local

  83. GET /foo/abr
    Host: myservice.local
    Accept: application/vnd.company.v23+json
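The four request variants above carry the same version number in different places: the path, a custom header, a query parameter, or the media type in the `Accept` header. As a rough sketch of how a service might read any of them, the hypothetical helper `extract_version` below checks each location in turn. The function name, the lookup order and the regular expressions are assumptions for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Patterns modelled on the slide examples; both are assumptions.
_PATH_RE = re.compile(r"^/v(\d+)(/.*)$")
_ACCEPT_RE = re.compile(r"application/vnd\.[\w.-]+?\.v(\d+)\+json")


def extract_version(target, headers):
    """Return (version, path) for a request target and a dict of headers.

    The version is None when the request does not name one.
    """
    parts = urlsplit(target)
    path = parts.path

    # 1. Version as a path prefix: /v23/foo/abr
    match = _PATH_RE.match(path)
    if match:
        return int(match.group(1)), match.group(2)

    # 2. Version in a custom header: X-Version: 23
    if "X-Version" in headers:
        return int(headers["X-Version"]), path

    # 3. Version as a query parameter: ?version=23
    query = parse_qs(parts.query)
    if "version" in query:
        return int(query["version"][0]), path

    # 4. Version in the media type of the Accept header
    match = _ACCEPT_RE.search(headers.get("Accept", ""))
    if match:
        return int(match.group(1)), path

    return None, path
```

A real service would normally pick one of these conventions and stick to it; supporting all four at once is shown here only to compare them side by side.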

  84. No backwards compatibility breaks

  85. Feature Flags

  86. public function hasAccess()
    {
        return featureFlag()->isActive(FeatureFlag::TEST_ONE);
    }
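The same feature-flag check can be sketched in Python. The names below mirror the PHP slide (`FeatureFlag.TEST_ONE`, `is_active`, `has_access`), but the `FeatureFlags` class and its in-memory set of active flags are assumptions; the talk does not show where the flag state is stored.

```python
from enum import Enum


class FeatureFlag(Enum):
    TEST_ONE = "test_one"


class FeatureFlags:
    """Holds the set of currently active flags (in memory, for illustration)."""

    def __init__(self, active=()):
        self._active = set(active)

    def is_active(self, flag):
        # Unknown or inactive flags are treated as switched off.
        return flag in self._active


def has_access(flags):
    """Grant access only while the TEST_ONE flag is switched on."""
    return flags.is_active(FeatureFlag.TEST_ONE)
```

Because the new code path is guarded by the flag, it can be deployed switched off and enabled later without a coordinated release across services.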

  89. Configuration Management

  90. How do I synchronize configuration over services?

  91. { "db_user": "user", "db_pw": "pw", "serviceA": "serviceA.local:8018" }

  92. Config file on disk

  93. Duplication

  94. Inconsistencies

  95. Consul https://www.consul.io/

  96. •Consul •Zookeeper •etcd •…

  97. [Diagram: a cluster of three Consul servers, with a Consul agent running on each application server]
  98. Key/Value Store

  99. $kv->put('test/foo/bar', 'bazinga');
    $kv->get('test/foo/bar', ['raw' => true]);
    $kv->delete('test/foo/bar');
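The three key/value calls above map onto Consul's HTTP API under `/v1/kv/`. The sketch below builds the equivalent requests with the Python standard library. It assumes a local agent on `http://localhost:8500`, and the helper name `kv_request` is made up for this example. Nothing is sent until a request is passed to `urlopen`.

```python
import urllib.request

CONSUL = "http://localhost:8500"  # assumption: local agent, default HTTP port


def kv_request(method, key, value=None, raw=False):
    """Build (but do not send) a request against Consul's KV endpoint."""
    url = f"{CONSUL}/v1/kv/{key}"
    if raw:
        # ?raw returns the stored value itself instead of JSON metadata.
        url += "?raw"
    data = value.encode("utf-8") if value is not None else None
    return urllib.request.Request(url, data=data, method=method)


put = kv_request("PUT", "test/foo/bar", "bazinga")
get = kv_request("GET", "test/foo/bar", raw=True)
delete = kv_request("DELETE", "test/foo/bar")

# Sending a request needs a reachable agent, for example:
# with urllib.request.urlopen(get) as response:
#     print(response.read().decode("utf-8"))
```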

  100. Credentials

  101. $kv->put('test/db/pw', 'secret_pw');

  102. https://www.vaultproject.io/

  103. Cycling of credentials

  104. Service Discovery

  105. How does one service know where another service is?

  106. Hostname/IP:Port

  107. Server Service A Server Service B Service C Service C

  108. Configuration

  109. $config = [
        'serviceA' => [
            '192.168.0.1:8001',
            '192.168.0.2:8001',
        ],
        'serviceB' => [
            '192.168.0.1:8002',
        ],
        'serviceC' => [
            '192.168.0.2:8003',
        ],
    ];
  110. Consul https://www.consul.io/

  111. Load balancing?

  112. Round robin in the client

  113. $config = [
        'serviceA' => [
            '192.168.0.1:8001',
            '192.168.0.2:8001',
        ],
        'serviceB' => [
            '192.168.0.1:8002',
        ],
        'serviceC' => [
            '192.168.0.2:8003',
        ],
    ];
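Round robin in the client can be sketched as one rotating iterator per service over the configured addresses. The addresses below are the ones from the slide; the `RoundRobin` class and its method name are assumptions for illustration.

```python
import itertools

config = {
    "serviceA": ["192.168.0.1:8001", "192.168.0.2:8001"],
    "serviceB": ["192.168.0.1:8002"],
    "serviceC": ["192.168.0.2:8003"],
}


class RoundRobin:
    """Hands out the addresses of each service in turn."""

    def __init__(self, config):
        # One endlessly repeating iterator per service name.
        self._cycles = {
            name: itertools.cycle(addresses)
            for name, addresses in config.items()
        }

    def next_address(self, service):
        return next(self._cycles[service])


if __name__ == "__main__":
    balancer = RoundRobin(config)
    for _ in range(3):
        print(balancer.next_address("serviceA"))
```

This version knows nothing about servers that are down, which is the gap the following slides address with health checks.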
  114. Service/Server down?

  115. $config = [
        'serviceA' => [
            '192.168.0.1:8001',
            '192.168.0.2:8001',
        ],
        'serviceB' => [
            '192.168.0.1:8002',
        ],
        'serviceC' => [
            '192.168.0.2:8003',
        ],
    ];
  116. Health checks

  117. GET /health HTTP/1.1
    Host: serviceA.local

    HTTP/1.1 200 OK
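A client-side health check along these lines could probe `GET /health` on each configured address and keep only the instances that answer with `200 OK`. The sketch below assumes plain HTTP and a one-second timeout; the function names `http_probe` and `healthy_instances` are made up for this example.

```python
import urllib.error
import urllib.request


def http_probe(address, timeout=1.0):
    """Return True if GET /health on the address answers with status 200."""
    try:
        with urllib.request.urlopen(f"http://{address}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or a non-2xx response.
        return False


def healthy_instances(addresses, probe=http_probe):
    """Filter a list of host:port strings down to the healthy ones.

    probe is injectable so the filter can be tested without a network.
    """
    return [address for address in addresses if probe(address)]
```

Probing on every request would add latency, so a real client would more likely cache the result for a short time or leave the checking to a dedicated agent.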

  118. Central load balancer

  119. HAproxy http://www.haproxy.org/

  120. Server Service A Server Service B Service C Service C

    Load balancer
  121. Scalability?

  122. Load balancer

  123. Load balancer

  124. Elasticity?

  125. Load balancer

  126. Consul https://www.consul.io/

  127. [Diagram: a cluster of three Consul servers, with a Consul agent running on each application server]
  128. Consul for Service Discovery

  129. Consul Agent Server Service A Registration Health check

  131. Load balance directly in the client

  132. Consul API

  133. $ curl http://localhost:8500/v1/catalog/service/refind-service
    [
      {
        "ServicePort": 10780,
        "ServiceAddress": "",
        "ServiceTags": [
          "env:rg_dev",
          "protocol:http"
        ],
        "ServiceName": "refind-service",
        "ServiceID": "refind-service",
        "Address": "172.20.4.61",
        "Node": "refind-1.ipbl.rgoffice.net"
      },
      {
        "ServicePort": 10780,
        "ServiceAddress": "",
        "ServiceTags": [
          "env:rg_dev",
          "protocol:http"
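To load balance in the client, the catalog response has to be turned into a list of `host:port` endpoints. The sketch below does that for a response shaped like the one above. When `ServiceAddress` is empty, the node's `Address` is used instead, which is how Consul itself treats an empty service address. The function name is an assumption, and the second sample entry is hypothetical, added only to show the non-empty `ServiceAddress` case.

```python
import json


def endpoints_from_catalog(payload):
    """Turn a /v1/catalog/service/<name> JSON response into host:port strings."""
    endpoints = []
    for entry in json.loads(payload):
        # An empty ServiceAddress means "use the address of the node".
        host = entry["ServiceAddress"] or entry["Address"]
        endpoints.append(f"{host}:{entry['ServicePort']}")
    return endpoints


sample = json.dumps([
    # First entry: fields taken from the slide.
    {"ServicePort": 10780, "ServiceAddress": "", "Address": "172.20.4.61"},
    # Second entry: hypothetical, with an explicit service address.
    {"ServicePort": 10780, "ServiceAddress": "10.0.0.5", "Address": "172.20.4.99"},
])

if __name__ == "__main__":
    print(endpoints_from_catalog(sample))
```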
  134. DNS

  135. $ dig -p 8600 @localhost refind-service.service.rgoffice.consul. ANY
    ; <<>> DiG 9.9.5-3ubuntu0.11-Ubuntu <<>> -p 8600 @localhost refind-service.service.rgoffice.consul. ANY
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19315
    ;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
    ;; WARNING: recursion requested but not available
    ;; QUESTION SECTION:
    ;refind-service.service.rgoffice.consul. IN ANY
    ;; ANSWER SECTION:
    refind-service.service.rgoffice.consul. 0 IN A 172.20.4.61
    refind-service.service.rgoffice.consul. 0 IN A 172.20.4.58
  137. Metrics

  138. Flexible routing options

  139. Circuit breakers
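A circuit breaker stops calling a failing dependency for a while instead of letting every request wait for the same error. The sketch below shows the pattern in its simplest form. It is not a description of how any particular proxy implements it, and the class names, the failure threshold and the reset period are all assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds the circuit stays open
        self._clock = clock             # injectable for tests
        self._failures = 0
        self._opened_at = None

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, call skipped")
            # Reset period elapsed: allow one trial call (half-open state).
            # A single further failure reopens the circuit.
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_profile, user_id)` with a hypothetical `fetch_profile`, and treat `CircuitOpenError` as a signal to fall back to a default response.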

  140. Tracing

  142. Server Service A Server Service B Service C Service C

    Linkerd Consul
  143. Single Point of Failure

  144. Server Service A Server Service B Service C Service C

    Linkerd Consul Linkerd Consul
  145. Scaling?

  146. Cloud Solutions

  148. Using the cloud is not always possible

  149. … or even desirable

  150. https://mesosphere.github.io/marathon/

  152. Very Powerful

  153. Learning curve

  154. Kubernetes Cluster

  155. Image • A docker image built from a Dockerfile that

    contains everything a service needs to run
  156. Container • A container runs a docker image. • Only 1

    process can run inside of a container
  157. Pod • A group of 1 or more containers •

    Same port space • Ports are not accessible from outside of the pod
  158. Replica Set • Defines and manages how many instances of

    a pod should run
  159. Deployment • Manages updates and rollbacks of replica sets

  160. Service • Makes a port of a pod accessible to

    other pods
  161. Ingress • Makes a service accessible to the outside of

    Kubernetes
  162. Node • A physical server • Containers get distributed automatically

  163. ConfigMaps & Secrets • Configuration that can be mounted inside

    of a container
  164. Volumes • Volumes can be mounted into a container to

    access a ConfigMap, Secret or a folder on the host
  165. Namespaces • Dedicated environment to deploy services in

  166. Example

  167. PHP-FPM NGINX LINKERD STATSD MEM CACHED MONGO ROUTER PHP Application

    POD
  168. PHP-FPM NGINX LINKERD STATSD MEM CACHED MONGO ROUTER PHP Application

    POD ReplicaSet: 2 instances PHP-FPM NGINX LINKERD STATSD MEM CACHED MONGO ROUTER PHP Application POD
  169. PHP-FPM NGINX LINKERD STATSD MEM CACHED MONGO ROUTER ReplicaSet: 2

    instances PHP-FPM NGINX LINKERD STATSD MEM CACHED MONGO ROUTER CONFIG WEB :80 PHP Application POD PHP Application POD
  170. PHP-FPM NGINX LINKERD STATSD MEM CACHED MONGO ROUTER ReplicaSet: 2

    instances PHP-FPM NGINX LINKERD STATSD MEM CACHED MONGO ROUTER CONFIG WEB :80 https://php-app.k8s.foo.com:443/ PHP Application POD PHP Application POD
  171. FROM node:7
    WORKDIR /opt/appmiral
    ADD . /opt/appmiral
    RUN apt-get install -y curl git && \
        npm install bower@latest -g && npm install grunt@latest -g && \
        npm install && bower install --allow-root && grunt build
    EXPOSE 9012
    CMD node /opt/appmiral/dist/server.js
  172. docker build -t appmiral .
    docker run appmiral

  173. apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: appmiral
    spec:
      replicas: 2
      template:
        spec:
          containers:
          - name: appmiral
            image: your-registry/researchgate/appmiral
            resources:
              requests:
                cpu: 1
                memory: 200Mi
            env:
            - name: NODE_ENV
              value: "production"
            ports:
            - containerPort: 9012
            livenessProbe:
              httpGet:
                path: /health
                port: 9012
  174. - name: appmiral
      image: your-registry/researchgate/appmiral
      resources:
        requests:
          cpu: 1
          memory: 200Mi
      env:
      - name: NODE_ENV
        value: "production"
      ports:
      - containerPort: 9012
      livenessProbe:
        httpGet:
          path: /health
          port: 9012
  175. kind: Service
    apiVersion: v1
    metadata:
      name: appmiral
    spec:
      ports:
      - name: http
        port: 9012
        targetPort: 9012
        protocol: TCP
      selector:
        app: appmiral
  176. apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: appmiral-ing
    spec:
      rules:
      - host: appmiral.kluster-01.rgoffice.net
        http:
          paths:
          - path: /
            backend:
              serviceName: appmiral
              servicePort: 9012
  177. kubectl create -f k8s_appmiral.yaml

  178. Rolling Deployments

  179. kubectl

  180. REST API

  182. Helm The package manager for Kubernetes https://helm.sh/

  184. Service Discovery

  185. Service Virtual IP address

  186. Environment Variables

  187. APPMIRAL_SERVICE_HOST=10.0.162.149
    APPMIRAL_SERVICE_PORT=80
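Kubernetes injects these variables into a pod's containers using the service name in upper case, with dashes replaced by underscores, followed by `_SERVICE_HOST` and `_SERVICE_PORT`. A minimal sketch of reading them from application code follows; the helper name `service_address` is an assumption.

```python
import os


def service_address(service_name, environ=os.environ):
    """Look up the virtual IP and port of a Kubernetes service.

    Raises KeyError if the variables for the service are not set.
    """
    prefix = service_name.upper().replace("-", "_")
    host = environ[f"{prefix}_SERVICE_HOST"]
    port = int(environ[f"{prefix}_SERVICE_PORT"])
    return host, port


if __name__ == "__main__":
    # Values copied from the slide, passed explicitly instead of via the real environment.
    env = {"APPMIRAL_SERVICE_HOST": "10.0.162.149", "APPMIRAL_SERVICE_PORT": "80"}
    print(service_address("appmiral", env))
```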

  188. DNS

  189. $ nslookup appmiral
    Server:    10.0.0.10
    Address 1: 10.0.0.10

    Name:      appmiral
    Address 1: 10.0.162.149
  190. LinkerD in Kubernetes

  191. PHP-FPM NGINX LINKERD STATSD MEM CACHED MONGO ROUTER PHP Application

    POD
  193. Manual Scaling

  194. kubectl scale --replicas=3 deployment/my-app

  195. AutoScaling

  197. https://kubernetes.io/docs/user-guide/horizontal-pod-autoscaling/
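The Kubernetes documentation describes the core rule of the horizontal pod autoscaler as scaling the replica count by the ratio of the observed metric to its target and rounding up. The sketch below shows only that arithmetic. The real controller also applies a tolerance band and the configured minimum and maximum replica counts, which are left out here, and the function name is an assumption.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric):
    """desired = ceil(current_replicas * current_metric / target_metric)"""
    return math.ceil(current_replicas * current_metric / target_metric)


if __name__ == "__main__":
    # 2 pods at 90% average CPU with a 60% target (hypothetical numbers).
    print(desired_replicas(2, 90, 60))
```

With 2 pods at 90% average CPU and a 60% target, the rule asks for 3 pods; with 4 pods at 30% it asks for 2.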

  198. https://www.flickr.com/photos/darkdwarf/19701555974/

  199. Expect the unexpected: How to handle errors gracefully Saturday,

    9:00 am, Track B
  200. http://speakerdeck.com/u/bastianhofmann

  201. http://twitter.com/BastianHofmann http://lanyrd.com/people/BastianHofmann http://speakerdeck.com/u/bastianhofmann mail@bastianhofmann.de