Elastic scaling in a (micro)service oriented architecture

Elastic Scaling in a (Micro)service oriented Architecture @BastianHofmann

Microservices

Service Oriented Architecture

Monolith

Monolith vs. Microservices (diagram source: http://blog.philipphauer.de/microservices-nutshell-pros-cons/)

Benefits

Stricter separation of concerns

Diverse technology stacks

Problems

Challenges

Performance

Latency

Stability

Reliability

Transparency

Monitoring

Learning Curves

Code Reuse

Maintenance

Elastic Scaling?

Cloud Solutions

Using the cloud is not always possible

… or even desirable

Only solve some of the problems

How can we solve them

Agenda

•Deploying
•Running
•Releasing
•Configuring
•Discovering
•Monitoring
•Error Handling
•Scaling
•Developing

A lot of this is also useful for monoliths

http://speakerdeck.com/u/bastianhofmann

https://www.flickr.com/photos/npobre/2601582256/

Deployment

How to get the services on our servers?

The same for every service

One Click Deployment

•Ansible
•Capistrano
•Saltstack
•Custom
•…

Automation

Build/Test/Release pipeline

Availability

Zero Downtime Deployments

Server Server Server Server

Stability

Canary environments

Server Server Server Server

Server Server Server Server
Test with a low amount of traffic
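The canary idea can be sketched as weighted routing: a small, configurable share of requests goes to the server running the new release, the rest to the stable servers. This is a minimal Python sketch, not the speaker's setup; the server names, `pick_server`, and `canary_hits` are hypothetical.

```python
import random

# Hypothetical pools: three servers on the stable release, one canary server.
STABLE = ["server1", "server2", "server3"]
CANARY = ["server4"]


def pick_server(canary_share: float, rng: random.Random) -> str:
    """Route roughly `canary_share` (0.0 to 1.0) of requests to the canary pool."""
    pool = CANARY if rng.random() < canary_share else STABLE
    return rng.choice(pool)


def canary_hits(requests: int, canary_share: float, seed: int = 0) -> int:
    """Count how many of `requests` simulated requests land on the canary."""
    rng = random.Random(seed)
    return sum(pick_server(canary_share, rng) in CANARY for _ in range(requests))
```

Raising `canary_share` step by step, while watching error rates, turns the canary into a gradual rollout; setting it back to 0 is the fast rollback.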

Fast deployments

Fast rollbacks

https://www.flickr.com/photos/40987321@N02/5580348753/

Different libraries, packages, web servers, configurations, versions

Base boxes

Services installed in a sandbox

https://www.docker.com/

https://twitter.com/mfdii/status/697532387240996864

Running the service

How do I stop and start a service and ensure it keeps running?

The same for every service

•Supervisord
•Upstart
•S6
•Runit
•Monit
•Circus
•Restartd
•…
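What all of these tools have in common is a supervision loop: start the process, wait for it to exit, and restart it if it died. The Python sketch below shows only that core idea. `keep_running` is a hypothetical helper, and real supervisors add logging, backoff, signal handling, and much more.

```python
import subprocess
import sys
import time
from typing import List


def keep_running(command: List[str], max_restarts: int, delay: float = 1.0) -> int:
    """Run `command` and restart it whenever it exits with a non-zero code.

    Returns the number of restarts performed. Stops when the command exits
    cleanly (code 0) or after `max_restarts` restarts.
    """
    restarts = 0
    while True:
        exit_code = subprocess.call(command)
        if exit_code == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(delay)  # short pause so a crash loop does not spin the CPU


if __name__ == "__main__":
    # Example: a command that always fails is restarted twice, then given up on.
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(keep_running(failing, max_restarts=2, delay=0.1))
```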

Releases

How to synchronize changes over services?

API Versioning

GET /v23/foo/abr
Host: myservice.local

GET /foo/abr
Host: myservice.local
X-Version: 23

GET /foo/abr?version=23
Host: myservice.local

GET /foo/abr
Host: myservice.local
Accept: application/vnd.company.v23+json
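On the server side, the four request styles above can all be resolved to one version number. The following Python sketch assumes a plain dict of headers with exact-case names; `requested_version` and the order in which the sources are checked are illustrative choices, not part of the talk.

```python
import re
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit

PATH_RE = re.compile(r"^/v(\d+)/")
ACCEPT_RE = re.compile(r"application/vnd\.company\.v(\d+)\+json")


def requested_version(target: str, headers: Mapping[str, str]) -> Optional[int]:
    """Return the API version a request asks for, or None if it names none."""
    parts = urlsplit(target)

    match = PATH_RE.match(parts.path)  # GET /v23/foo/abr
    if match:
        return int(match.group(1))

    if "X-Version" in headers:  # X-Version: 23
        return int(headers["X-Version"])

    query = parse_qs(parts.query)  # GET /foo/abr?version=23
    if "version" in query:
        return int(query["version"][0])

    # Accept: application/vnd.company.v23+json
    match = ACCEPT_RE.search(headers.get("Accept", ""))
    if match:
        return int(match.group(1))

    return None
```

In practice a service would pick one of these styles and apply it consistently; supporting all four is only done here to show them side by side.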

No backwards compatibility breaks

Feature Flags

public function hasAccess()
{
    return featureFlag()->isActive(FeatureFlag::TEST_ONE);
}

Every service has its own implementation

Shared database

Headers

GET /foo/abr
Host: myservice.local
X-Flag-NewFeature: 1
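One possible reading of the header approach: the service that decides a flag forwards it on every downstream call, so all services see the same flag state without sharing a database. The Python sketch below assumes that a header value of `1` means active and that a missing header falls back to a local default; `flag_headers` and `is_active` are hypothetical names.

```python
from typing import Dict, Iterable, Mapping

HEADER_PREFIX = "X-Flag-"


def flag_headers(active_flags: Iterable[str]) -> Dict[str, str]:
    """Build the headers the calling service adds to a downstream request."""
    return {HEADER_PREFIX + flag: "1" for flag in active_flags}


def is_active(flag: str, headers: Mapping[str, str],
              defaults: Mapping[str, bool]) -> bool:
    """Check a flag in the receiving service.

    A forwarded header such as `X-Flag-NewFeature: 1` wins; otherwise the
    service falls back to its own default for that flag.
    """
    value = headers.get(HEADER_PREFIX + flag)
    if value is not None:
        return value == "1"
    return defaults.get(flag, False)
```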

Configuration Management

How do I synchronize configuration over services?

{
    "db_user": "user",
    "db_pw": "pw",
    "serviceA": "serviceA.local:8018"
}

Config file on disk

Duplication

Inconsistencies

Consul https://www.consul.io/

•Consul
•Zookeeper
•etcd
•…

[Diagram: a cluster of three Consul servers, plus a Consul agent running on every server]

Key/Value Store

$kv->put('test/foo/bar', 'bazinga');
$kv->get('test/foo/bar', ['raw' => true]);
$kv->delete('test/foo/bar');
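Under the hood, a key/value client like this talks to Consul's HTTP API, where keys live under `/v1/kv/`. The Python sketch below only builds the three requests and does not send them. It assumes a local agent on Consul's default HTTP port 8500; `kv_request` is a hypothetical helper, not the PHP client shown on the slide.

```python
import urllib.request
from typing import Optional

CONSUL = "http://127.0.0.1:8500"  # assumption: local Consul agent, default port


def kv_request(method: str, key: str, value: Optional[bytes] = None,
               raw: bool = False) -> urllib.request.Request:
    """Build a request against Consul's KV endpoint for `key`.

    `raw=True` asks Consul to return just the stored value instead of the
    JSON envelope with a base64-encoded value.
    """
    url = f"{CONSUL}/v1/kv/{key}" + ("?raw=true" if raw else "")
    return urllib.request.Request(url, data=value, method=method)


put_req = kv_request("PUT", "test/foo/bar", b"bazinga")
get_req = kv_request("GET", "test/foo/bar", raw=True)
delete_req = kv_request("DELETE", "test/foo/bar")

# To execute one of them against a running agent:
# with urllib.request.urlopen(get_req) as response:
#     print(response.read())
```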

Credentials

$kv->put('test/db/pw', 'secret_pw');

https://www.vaultproject.io/

Cycling of credentials

Service Discovery

How does one service know where another service is?

Hostname + Port

Server Service A Server Service B Service C Service C

Configuration

$config = [
    'serviceA' => [
        '192.168.0.1:8001',
        '192.168.0.2:8001',
    ],
    'serviceB' => [
        '192.168.0.1:8002',
    ],
    'serviceC' => [
        '192.168.0.2:8003',
    ],
];

Load balancing?

Round robin in the client
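With a static address list per service, client-side round robin needs nothing more than a cycling iterator per service. This is a minimal Python sketch using the addresses from the configuration slide; `RoundRobinClient` is a hypothetical name.

```python
import itertools
from typing import Dict, List

CONFIG: Dict[str, List[str]] = {
    "serviceA": ["192.168.0.1:8001", "192.168.0.2:8001"],
    "serviceB": ["192.168.0.1:8002"],
    "serviceC": ["192.168.0.2:8003"],
}


class RoundRobinClient:
    """Hands out the configured addresses of a service in turn."""

    def __init__(self, config: Dict[str, List[str]]) -> None:
        # One endless iterator per service, each starting at the first address.
        self._cycles = {name: itertools.cycle(addrs) for name, addrs in config.items()}

    def next_address(self, service: str) -> str:
        return next(self._cycles[service])


client = RoundRobinClient(CONFIG)
first_three = [client.next_address("serviceA") for _ in range(3)]
```

This spreads load evenly but knows nothing about instances that are down, which is the problem the following slides address.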


Service/Server down?


Health checks

GET /health HTTP/1.1
Host: serviceA.local

HTTP/1.1 200 OK
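A client or load balancer can use such an endpoint to drop unhealthy instances before it picks one. The Python sketch below assumes every instance answers `GET /health` with status 200 when it is healthy; `http_health_probe` and `healthy_instances` are hypothetical helpers, and the probe is injectable so the filtering can be exercised without a network.

```python
import http.client
import urllib.request
from typing import Callable, Iterable, List


def http_health_probe(address: str, timeout: float = 1.0) -> bool:
    """Return True if GET http://<address>/health answers with status 200."""
    try:
        with urllib.request.urlopen(f"http://{address}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, 4xx/5xx, or a broken reply.
        return False


def healthy_instances(addresses: Iterable[str],
                      probe: Callable[[str], bool] = http_health_probe) -> List[str]:
    """Keep only the addresses whose health check currently passes."""
    return [address for address in addresses if probe(address)]
```

Probing on every request would add latency, so a real client would more likely run the checks periodically in the background and cache the result.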

Central load balancer

Load balancer

Scalability?

Load balancer

Elasticity?

[Diagram: a cluster of three Consul servers, plus a Consul agent running on every server]

Consul for Service Discovery

Consul Agent Server Service A Registration Health check

Consul API

admin@hashicorp: dig web-frontend.service.consul. ANY

; <<>> DiG 9.8.3-P1 <<>> web-frontend.service.consul. ANY
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29981
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;web-frontend.service.consul. IN ANY

;; ANSWER SECTION:
web-frontend.service.consul. 0 IN A 10.0.3.83
web-frontend.service.consul. 0 IN A 10.0.1.109

Consul-Template https://github.com/hashicorp/consul-template

Load balancer Consul Template

Single Point of Failure

Load balancer Consul Template Load balancer Consul Template

Load balance directly in the client

https://github.com/eBay/fabio

Monitoring

How are my services behaving?

Central Log Management

Elasticsearch, Logstash, Kibana

[Diagram: three webservers, each with a log file and a local logstash; the logs are shipped via AMQP to a central Logstash and stored in Elasticsearch]

Tracing IDs

[Diagram: for each user request the web server creates a unique trace_id and passes it to every HTTP service it calls; the web server and all services write the trace_id to their logs]

X-Trace-Id: bbr8ehb984tbab894
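Propagating the trace ID takes three small steps in every service: reuse the incoming ID (or create one at the edge), forward it on outgoing calls, and include it in every log line. This Python sketch uses the `X-Trace-Id` header from the slide; the function names and the log format are assumptions.

```python
import uuid
from typing import Dict, Mapping

TRACE_HEADER = "X-Trace-Id"


def ensure_trace_id(incoming_headers: Mapping[str, str]) -> str:
    """Reuse the caller's trace ID, or create a new one for a fresh user request."""
    return incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex


def outgoing_headers(trace_id: str) -> Dict[str, str]:
    """Headers to attach to every call this service makes to another service."""
    return {TRACE_HEADER: trace_id}


def log_line(trace_id: str, service: str, message: str) -> str:
    """Format a log entry so central log search can group entries by trace_id."""
    return f"trace_id={trace_id} service={service} {message}"
```

Searching the central log store for one `trace_id` then returns the log entries of every service that took part in that request.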

https://www.loggly.com/

https://getsentry.com/

Measure everything

Server metrics

Application metrics

StatsD + Graphite

[Diagram: each webserver sends UDP messages to a local statsd; the statsd instances forward aggregated data to a central statsd and on to Graphite]
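A StatsD message is a single line of text in the form `name:value|type`, sent as one UDP datagram, which is why emitting metrics is cheap enough to "measure everything". This Python sketch assumes a StatsD daemon on its default port 8125 on localhost; the metric names and helper functions are hypothetical.

```python
import socket


def format_metric(name: str, value: int, metric_type: str) -> bytes:
    """Encode one metric in the StatsD line format.

    Common types: "c" (counter), "ms" (timer, milliseconds), "g" (gauge).
    """
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire and forget: UDP needs no connection and never waits for the receiver."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


request_counter = format_metric("serviceA.requests", 1, "c")
response_timer = format_metric("serviceA.response_time", 320, "ms")

# send_metric(request_counter)  # uncomment when a StatsD daemon is running
```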

https://www.librato.com

http://www.soasta.com/

http://www.monitor.us/

Profiling

newrelic.com

Handling failures

What do I do when something breaks?

Errors happen

Detecting regressions

Server outages

Database overloads

Service A -> Service B: 200 OK

Service A -> Service B: 5xx

Service A -> Service B: Timeout

Circuit Breakers

Service A -> Service B: 200 OK
Circuit Breaker status: closed; error rate: 0

Service A -> Service B: Error
Circuit Breaker status: -> open; error rate: > threshold

Service A -> Service B: calls are no longer passed through
Circuit Breaker status: open; error rate: > threshold

Service A -> Service B: Error (test if still failing)
Circuit Breaker status: -> open; error rate: > threshold

Service A -> Service B: 200 OK (test if still failing)
Circuit Breaker status: -> closed; error rate: 0
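The state transitions above fit into a small class. This Python sketch simplifies the slides in one respect: it counts consecutive failures instead of tracking an error rate. The class name, the threshold, and the retry interval are all assumptions; libraries such as Hystrix implement far more than this.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the remote service while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, retry_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.retry_after = retry_after  # seconds to wait before a test call
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.retry_after:
            return "half-open"  # time to test if the service is still failing
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("remote call skipped, circuit is open")
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or reopen after a failed test
            raise
        # Success: reset the counter and close the circuit.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, Service A fails fast instead of waiting for timeouts, which also gives Service B room to recover.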

https://github.com/Netflix/Hystrix

https://github.com/odesk/phystrix

Gracefully handling exceptions

Component-based frontend

Degrading Functionality

[Component tree: Profile, Publications, Publication (x3), AboutMe, LeftColumn, Image, Menu, Institution]

[Same component tree; one component throws an EXCEPTION]

[Same component tree, rendered without AboutMe: Profile, Publications, Publication (x3), LeftColumn, Image, Menu, Institution]
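In code, degrading functionality means catching the failure at the component boundary and rendering the page without that component. This Python sketch assumes each component is a function that returns its markup; `render_page` and the component functions are hypothetical.

```python
from typing import Callable, Dict, List


def render_page(components: Dict[str, Callable[[], str]]) -> str:
    """Render every component; leave out the ones that raise."""
    parts: List[str] = []
    for name, render in components.items():
        try:
            parts.append(render())
        except Exception as exc:
            # Degrade instead of failing the whole page. A real system would
            # log this (with the trace ID) so the failure stays visible.
            print(f"component {name} failed: {exc!r}")
    return "\n".join(parts)


def about_me() -> str:
    raise RuntimeError("service behind AboutMe timed out")


page = render_page({
    "Menu": lambda: "<menu>",
    "AboutMe": about_me,
    "Institution": lambda: "<institution>",
})
```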

Test it

http://techblog.netflix.com/2014/09/introducing-chaos-engineering.html

Scalability

How do I handle traffic spikes?

Elasticity

Service A Service B 200 OK Circuit Breaker

Service A Service B Circuit Breaker Service C Circuit Breaker
Timeouts

Throttling

Only allow xx% of calls

Priority

100% of calls 10% of calls
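Priority-based throttling can be sketched as a per-priority share of calls that is let through. In this Python sketch the priority names and the 100% / 10% split mirror the slide, while `ALLOWED_SHARE` and `allow_call` are hypothetical names.

```python
import random

# Share of calls let through per priority while the service is under pressure.
ALLOWED_SHARE = {"high": 1.0, "low": 0.10}


def allow_call(priority: str, rng: random.Random) -> bool:
    """Randomly admit a call with the probability configured for its priority."""
    return rng.random() < ALLOWED_SHARE[priority]


rng = random.Random(0)  # seeded only to make the example reproducible
high_allowed = sum(allow_call("high", rng) for _ in range(1000))
low_allowed = sum(allow_call("low", rng) for _ in range(10000))
```

Rejected low-priority calls should fail fast, so the caller can degrade gracefully instead of queuing up behind an overloaded service.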

Elasticity

Service B

Development Environment

How do I ensure a productive dev environment?

Diverse environments

Good documentation

Common setups

https://www.docker.com/

Global Code Search

https://github.com/etsy/hound

Large scale refactorings

Monorepo

https://qafoo.com/talks/15_08_froscon_monorepos.pdf

Complete Solutions

https://mesosphere.github.io/marathon/

https://www.flickr.com/photos/darkdwarf/19701555974/

http://twitter.com/BastianHofmann
http://lanyrd.com/people/BastianHofmann
http://speakerdeck.com/u/bastianhofmann
