Slide 1

Slide 1 text

Resilience in Practice: Spring Cloud Netflix

Slide 2

Slide 2 text

Speaker Benjamin Wilms codecentric @MrBWilms github.com/MrBW

Slide 3

Slide 3 text

Agenda: Resilience, Ribbon, Spring Cloud, Timeout, Bulkheads, Circuit Breaker, Simian Army, Hystrix, Fallback, Configuration, Dynamic Configuration, Chaos Monkey, Archaius, Metrics Stream, Dashboard, Turbine, RabbitMQ, Spring Boot Admin, ZipKin

Slide 4

Slide 4 text

Patterns of Resilience

Slide 5

Slide 5 text

Patterns of Resilience Spring Cloud & Spring Cloud Netflix

Slide 6

Slide 6 text

Patterns by Spring Cloud Netflix

Category    Pattern          Provided by
Core        Isolation        Hystrix
Core        Bulkhead         Hystrix
Detect      Circuit Breaker  Hystrix
Detect      Timeout          Hystrix
Detect      Monitoring       Hystrix, Eureka
Detect      Health Check     Hystrix, Eureka
Detect      Fail Fast        Hystrix
Recover     Retry            Hystrix
Recover     Failover         Hystrix, Eureka, Ribbon
Mitigate    Fallback         Hystrix
Mitigate    Fail Silent      Hystrix
Mitigate    Share Load       Ribbon
Complement  Redundancy       Ribbon, Eureka

Slide 7

Slide 7 text

Hystrix Introduction

Slide 8

Slide 8 text

Hystrix Introduction

Slide 9

Slide 9 text

Hystrix Introduction

Slide 10

Slide 10 text

Semaphore vs. Threads
Threads: timeout handling, isolation from the caller, fallbacks, circuit breaker
Semaphore: no isolation from the caller, fallbacks, circuit breaker (counting)
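The difference between the two isolation strategies can be sketched in plain Java, without Hystrix on the classpath. This is only an illustration of the idea under simplifying assumptions: the class name IsolationSketch and both method names are hypothetical and are not part of the Hystrix API. Thread isolation hands the call to a separate pool, so the caller can give up after a timeout; semaphore isolation runs the call on the caller's own thread and only limits how many calls run at once.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper, not Hystrix code: contrasts the two isolation styles.
final class IsolationSketch {

    // Thread isolation: the call runs on a pool thread, the caller waits at most timeoutMs.
    static <T> T runInThread(ExecutorService pool, Callable<T> call,
                             long timeoutMs, Supplier<T> fallback) {
        Future<T> future;
        try {
            future = pool.submit(call);
        } catch (RejectedExecutionException e) {
            return fallback.get();              // pool rejected the task
        }
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                // interrupt the worker, free the caller
            return fallback.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get();              // the call itself failed
        }
    }

    // Semaphore isolation: the call runs on the caller's thread, so a slow call
    // still blocks the caller; only the number of concurrent calls is bounded.
    static <T> T runWithSemaphore(Semaphore permits, Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();              // too many concurrent calls
        }
        try {
            return call.get();
        } catch (RuntimeException e) {
            return fallback.get();
        } finally {
            permits.release();
        }
    }
}
```

In this sketch only runInThread can cut off a slow call, which mirrors the "timeout handling" and "isolation from the caller" points listed for threads.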

Slide 11

Slide 11 text

Hystrix Overhead: 10% > 3 ms overhead; 1% > 9 ms overhead; 10+ billion requests

Slide 12

Slide 12 text

Transport my Package

Slide 13

Slide 13 text

Transport my Package Flow

Slide 14

Slide 14 text

Transport my Package Flow

Slide 15

Slide 15 text

Transport my Package Flow

Slide 16

Slide 16 text

Transport my Package Hystrix

Slide 17

Slide 17 text

Service Discovery Netflix Eureka

Slide 18

Slide 18 text

Service Discovery Netflix Eureka (Scaling)

Slide 19

Slide 19 text

Service Discovery - Eureka

Application            AMIs     Availability Zones  Status
SPRING-BOOT-ADMIN      n/a (1)  (1)                 UP (1) - 99c094703dba:spring-boot-admin (http://172.19.0.16:8080/info)
TRANSPORT-API-GATEWAY  n/a (1)  (1)                 UP (1) - 7a7cd3e3d526:transport-api-gateway (http://172.19.0.15:8080/info)
ZIPKIN-SERVICE         n/a (1)  (1)                 UP (1) - 44ea5b30aa79:zipkin-service:9411 (http://172.19.0.14:9411/info)

General Info
Name                  Value
total-avail-memory    329mb
environment           test
num-of-cpus           6
current-memory-usage  113mb (34%)
server-uptime         00:17
registered-replicas   http://localhost:8761/eureka/
unavailable-replicas  http://localhost:8761/eureka/,
available-replicas

Instance Info
Name    Value
ipAddr  172.19.0.2
status  UP

Slide 20

Slide 20 text

Spring Boot Admin
Spring Boot applications

Application / URL                                            Version    Info  Status
ADDRESS-SERVICE                                              undefined  2     UP
BOOKING-SERVICE                                              undefined  2     UP
CONFIGSERVER (49f0e0c8) http://172.19.0.4:8888                                UP
CONNOTE-SERVICE                                              undefined  2     UP
CUSTOMER-SERVICE                                             undefined  2     UP
HYSTRIX-TURBINE-DASHBOARD (03dc7b82) http://172.19.0.13:8080                  UP
SPRING-BOOT-ADMIN (58cf2315) http://172.19.0.16:8080                          UP
TRANSPORT-API-GATEWAY (3904cfe9) http://172.19.0.15:8080                      UP
ZIPKIN-SERVICE (23ce6947) http://172.19.0.14:9411                             UP

Reference Guide: https://codecentric.github.io/spring-boot-admin/1.5.3
Sources: https://github.com/codecentric/spring-boot-admin
Code licensed under Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0)

Slide 21

Slide 21 text

Ribbon Client-Side Load Balancing
collects all service instances from Eureka
calls the remote service by its logical name (Eureka)
calls the remote service instances round-robin
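The round-robin selection described above can be sketched in a few lines of plain Java. This is a simplified illustration, not Ribbon's implementation: the class RoundRobinSketch is hypothetical, and the instance list stands in for whatever Eureka would return for a logical service name.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of round-robin selection over discovered instances.
final class RoundRobinSketch {
    private final AtomicInteger next = new AtomicInteger();

    // Returns the next instance in turn; the counter is shared across calls.
    String choose(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalStateException("no instances registered");
        }
        // floorMod keeps the index valid even after the counter overflows
        int index = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(index);
    }
}
```

With two registered instances, consecutive calls to choose alternate between them, which is why scaling a service to two containers spreads the load without any change on the caller's side.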

Slide 22

Slide 22 text

Ribbon Spring RestTemplate integration
...
@LoadBalanced
@Bean
RestTemplate restTemplate() {
    return new RestTemplate();
}
...

Slide 23

Slide 23 text

Ribbon Spring RestTemplate integration
...
restTemplate.getForObject("http://connote-service/rest/connote/create", ConnoteDTO.class);
...

Slide 24

Slide 24 text

Transport Service Booking Request
create-booking.sh (demo-scripts)

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   730    0   411  100   319    131    102  0:00:03  0:00:03 --:--:--   131
{
  "fallback": false,
  "connoteDTO": {
    "fallback": false,
    "connote": 4994510
  },
  "customerDTO": {
    "customerId": 1,
    "customerName": "Meier"
  },
  "service-response-status": [
    { "serviceName": "booking-service", "status": "OK" },
    { "serviceName": "connote-service", "status": "OK" },
    { "serviceName": "address-service > sender", "status": "OK" },
    { "serviceName": "address-service > receiver", "status": "OK" },

Slide 25

Slide 25 text

Simian Army

Slide 26

Slide 26 text

Simian Army - Chaos Monkey

Slide 27

Slide 27 text

AOP Chaos Monkey @EnableChaosMonkey

Slide 28

Slide 28 text

AOP Chaos Monkey Timeout

Slide 29

Slide 29 text

AOP Chaos Monkey Exception

Slide 30

Slide 30 text

AOP Chaos Monkey Nothing happens...

Slide 31

Slide 31 text

AOP - Chaos Monkey @EnableChaosMonkey

@EnableChaosMonkey
@SpringBootApplication
public class ConnoteServiceApplication {

    public static void main(String[] args) {
        SpringApplication.run(ConnoteServiceApplication.class, args);
    }
}

Slide 32

Slide 32 text

AOP - Chaos Monkey @Around

chaos.monkey.active=(true | false)

@Around("execution(* de.codecentric.resilient..*.*Service.*(..))")
public Object proceedAround(ProceedingJoinPoint pjp) throws Throwable {
    ...
    return pjp.proceed();
}
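What happens around the intercepted service call can be sketched without Spring AOP. The sketch below is an assumption about the general idea, not the code from the repository: ChaosMonkeySketch, its constructor parameters, and the three-way "dice" are hypothetical. It covers the three outcomes shown on the earlier slides (timeout, exception, nothing happens) and the chaos.monkey.active switch.

```java
import java.util.function.IntSupplier;
import java.util.function.Supplier;

// Hypothetical stand-in for an @Around advice: wraps a call and may disturb it.
final class ChaosMonkeySketch {
    private final boolean active;     // mirrors chaos.monkey.active
    private final IntSupplier dice;   // assumed to return 0, 1 or 2
    private final long delayMs;       // latency injected for the "timeout" case

    ChaosMonkeySketch(boolean active, IntSupplier dice, long delayMs) {
        this.active = active;
        this.dice = dice;
        this.delayMs = delayMs;
    }

    // Equivalent of proceedAround: maybe attack first, then proceed to the target.
    <T> T around(Supplier<T> target) {
        if (active) {
            switch (dice.getAsInt()) {
                case 0:
                    sleep(delayMs);   // slow the call down so the caller's timeout fires
                    break;
                case 1:
                    throw new IllegalStateException("Chaos Monkey: generated exception");
                default:
                    break;            // nothing happens
            }
        }
        return target.get();
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Passing the dice in as a parameter keeps the sketch deterministic in tests; a real setup could use something like `() -> new java.util.Random().nextInt(3)`.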

Slide 33

Slide 33 text

AOP - Chaos Monkey GitHub github.com/MrBW/spring-aop-chaos-monkey Todo: extract static execution, make it flexible!

Slide 34

Slide 34 text

Configuration @ Runtime Spring Cloud Config Hystrix & Archaius

Slide 35

Slide 35 text

Configuration @ Runtime: Archaius
dynamic & typed
thread safe
polling framework
callback mechanism
dynamic configuration
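The combination of polling, typed access, thread safety, and callbacks can be illustrated with a small stand-alone class. This is a sketch of the concept only and does not use or reproduce the Archaius API: DynamicConfigSketch and its methods are hypothetical names, and a real setup would call poll from a scheduler rather than by hand.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

// Hypothetical sketch: a polled, typed, thread-safe property store with callbacks.
final class DynamicConfigSketch {
    // Readers always see one complete, immutable snapshot.
    private volatile Map<String, String> values = Collections.emptyMap();
    private final List<Runnable> callbacks = new CopyOnWriteArrayList<>();

    void addCallback(Runnable callback) {
        callbacks.add(callback);
    }

    // One polling cycle: read the source and notify listeners only on change.
    void poll(Supplier<Map<String, String>> source) {
        Map<String, String> snapshot =
                Collections.unmodifiableMap(new HashMap<>(source.get()));
        if (!snapshot.equals(values)) {
            values = snapshot;
            callbacks.forEach(Runnable::run);
        }
    }

    // Typed accessors with defaults for missing keys.
    boolean getBoolean(String key, boolean defaultValue) {
        String raw = values.get(key);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw);
    }

    int getInt(String key, int defaultValue) {
        String raw = values.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw);
    }
}
```

After a poll picks up a changed value for a key such as chaos.monkey.active, the next getBoolean call returns the new value without a restart.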

Slide 36

Slide 36 text

Configuration @ Runtime: Resources
AWS DynamoDB
Apache ZooKeeper
CoreOS etcd
Classpath resources
File URIs

Slide 37

Slide 37 text

Spring Cloud Config

Slide 38

Slide 38 text

Spring Cloud Config

Slide 39

Slide 39 text

Spring Cloud Config <> Archaius
Spring Configuration Proxy

@Configuration
public class DefautConfiguration {

    @RefreshScope
    @Bean
    public AbstractConfiguration archaiusConfiguration() throws Exception {
        LOGGER.info("Enable Archaius Configuration");
        ConcurrentMapConfiguration concurrentMapConfiguration = new ConcurrentMapConfiguration();
        return concurrentMapConfiguration;
    }
}

Slide 40

Slide 40 text

Spring Cloud Config File structure Naming convention: { }-{ }.properties

Slide 41

Slide 41 text

Spring Cloud Config
http://localhost:8888/address-service/default

{
  "label": null,
  "name": "address-service",
  "profiles": [ "default" ],
  "propertySources": [
    {
      "name": "file:/resilient-transport-config/demo-props/application.properties",
      "source": {
        "chaos.monkey.active": "false",
        "hystrix.command.BookingServiceClient.execution.isolation.thread.timeoutInMilliseconds": "1000",
        "hystrix.command.ConnoteServiceClient.execution.isolation.thread.timeoutInMilliseconds": "1000",
        "second.service.call": "false"
      }
    },
    {
      "name": "file:/resilient-transport-config/address-service.properties",
      "source": {
        "logging.level.": "ERROR"
      }
    },
    {

Slide 42

Slide 42 text

Activate Chaos Monkey
/config-server-props/demo-props/application.properties

File: application.properties (modified)
# Props changed by demo
chaos.monkey.active=true
second.service.call=false
hystrix.command.BookingServiceClient.execution.isolation.thread.timeoutInMilliseconds=1000
hystrix.command.ConnoteServiceClient.execution.isolation.thread.timeoutInMilliseconds=1000
chaos.monkey.level=6

Slide 43

Slide 43 text

Transport Service Booking Request
create-booking.sh (demo-scripts)

  },
  "customerDTO": {
    "customerId": 1,
    "customerName": "Meier"
  },
  "service-response-status": [
    { "serviceName": "booking-service", "status": "OK" },
    { "serviceName": "connote-service", "status": "OK" },
    { "serviceName": "address-service > sender", "status": "OK" },
    { "serviceName": "address-service > receiver", "status": "OK" },
    { "serviceName": "customer-service", "status": "OK" }
  ]
}
somni:scripts benjaminwilms$

Slide 44

Slide 44 text

What can we do?
Caching
Messaging
Stubbed Fallback - static defaults
Scaling
docker-compose up -d --scale booking-service=2 --scale connote-service=2 --scale customer-service=2 --scale address-service=2

Slide 45

Slide 45 text

Fallback scaling
demo-compose

somni:~ benjaminwilms$ demo-compose
WARNING: The scale command is deprecated. Use the up command with the --scale flag instead.
Starting resilienttransportservice_connote-service_1 ... done
Creating resilienttransportservice_connote-service_2 ...
Creating resilienttransportservice_connote-service_2 ... done
Starting resilienttransportservice_booking-service_1 ... done
Creating resilienttransportservice_booking-service_2 ...
Creating resilienttransportservice_booking-service_2 ... done
Starting resilienttransportservice_customer-service_1 ... done
Creating resilienttransportservice_customer-service_2 ...
Creating resilienttransportservice_customer-service_2 ... done
Starting resilienttransportservice_address-service_1 ... done
Creating resilienttransportservice_address-service_2 ...
Creating resilienttransportservice_address-service_2 ... done
somni:~ benjaminwilms$

Slide 46

Slide 46 text

Service Discovery - Eureka

Application                AMIs     Availability Zones  Status
HYSTRIX-TURBINE-DASHBOARD  n/a (1)  (1)                 UP (1) - 0ff7f4bc8e70:hystrix-turbine-dashboard:8080 (http://172.19.0.13:8080/info)
SPRING-BOOT-ADMIN          n/a (1)  (1)                 UP (1) - 99c094703dba:spring-boot-admin (http://172.19.0.16:8080/info)
TRANSPORT-API-GATEWAY      n/a (1)  (1)                 UP (1) - 7a7cd3e3d526:transport-api-gateway (http://172.19.0.15:8080/info)
ZIPKIN-SERVICE             n/a (1)  (1)                 UP (1) - 44ea5b30aa79:zipkin-service:9411 (http://172.19.0.14:9411/info)

General Info
Name                  Value
total-avail-memory    329mb
environment           test
num-of-cpus           6
current-memory-usage  123mb (37%)
server-uptime         00:19
registered-replicas   http://localhost:8761/eureka/
unavailable-replicas  http://localhost:8761/eureka/,
available-replicas

Instance Info
Name    Value
ipAddr  172.19.0.2
status  UP

Slide 47

Slide 47 text

Feature toggle: activate "second service call"

public class AddressCommand extends HystrixCommand<AddressResponseDTO> {
    ...
    @Override
    protected AddressResponseDTO getFallback() {
        if (secondTry) {
            // Second try: run the command once more, this time with the flag set to false
            AddressCommand addressCommand = new AddressCommand(addressDTO, restTemplate, false);
            return addressCommand.execute();
        } else {
            // Static fallback response, marked as such for the caller
            AddressResponseDTO addressResponseDTO = new AddressResponseDTO();
            addressResponseDTO.setFallback(true);
            ...
            return addressResponseDTO;
        }
    }
}
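The control flow of this fallback can be traced in a stand-alone sketch that needs neither Hystrix nor the DTO classes. It is a simplified model under assumptions: SecondTrySketch and callWithSecondTry are hypothetical names, a thrown RuntimeException stands in for any failed or timed-out command, and the string "fallback" stands in for the stubbed response.

```java
import java.util.function.Supplier;

// Hypothetical model of "retry once, then fall back to a static default".
final class SecondTrySketch {

    static String callWithSecondTry(Supplier<String> remoteCall, boolean secondTry) {
        try {
            return remoteCall.get();
        } catch (RuntimeException firstFailure) {
            if (secondTry) {
                // Retry with the flag cleared, so a second failure cannot loop
                return callWithSecondTry(remoteCall, false);
            }
            return "fallback";   // stubbed static default
        }
    }
}
```

With several instances behind a round-robin load balancer, the second attempt has a chance of reaching a different, healthy instance, which is what makes the toggle useful after scaling.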

Slide 48

Slide 48 text

Feature toggle: activate "second service call"
second.service.call=true

File: application.properties (modified)
# Props changed by demo
chaos.monkey.active=true
second.service.call=true
hystrix.command.BookingServiceClient.execution.isolation.thread.timeoutInMilliseconds=1000
hystrix.command.ConnoteServiceClient.execution.isolation.thread.timeoutInMilliseconds=1000
chaos.monkey.level=6

Slide 49

Slide 49 text

Transport Service Booking Request
create-booking.sh (demo-scripts)

  },
  "customerDTO": {
    "customerId": 1,
    "customerName": "Meier"
  },
  "service-response-status": [
    { "serviceName": "booking-service", "status": "OK" },
    { "serviceName": "connote-service", "status": "OK" },
    { "serviceName": "address-service > sender", "status": "OK" },
    { "serviceName": "address-service > receiver", "status": "OK" },
    { "serviceName": "customer-service", "status": "OK" }
  ]
}
somni:scripts benjaminwilms$

Slide 50

Slide 50 text

Hystrix Command Connote Timeout - Hystrix defaults

Slide 51

Slide 51 text

Hystrix Command change Connote Timeout

Slide 52

Slide 52 text

Hystrix timeout - Connote Service
...ConnoteServiceClient...timeoutInMilliseconds=1200
...ConnoteServiceClient...timeoutInMilliseconds=500

File: application.properties
# Props changed by demo
chaos.monkey.active=true
second.service.call=true
hystrix.command.BookingServiceClient.execution.isolation.thread.timeoutInMilliseconds=1200
hystrix.command.ConnoteServiceClient.execution.isolation.thread.timeoutInMilliseconds=500
chaos.monkey.level=6

Slide 53

Slide 53 text

Transport Service Booking Request (second service call)
create-booking.sh (demo-scripts)

  },
  "customerDTO": {
    "customerId": 1,
    "customerName": "Meier"
  },
  "service-response-status": [
    { "serviceName": "booking-service", "status": "OK" },
    { "serviceName": "connote-service", "status": "OK" },
    { "serviceName": "address-service > sender", "status": "OK" },
    { "serviceName": "address-service > receiver", "status": "OK" },
    { "serviceName": "customer-service", "status": "OK" }
  ]
}
somni:scripts benjaminwilms$

Slide 54

Slide 54 text

Hystrix Monitoring & Metrics
via REST endpoint
via JMX
via log file
via Elastic, Splunk, and the like
via messaging

Slide 55

Slide 55 text

REST Endpoint
Access via HystrixCommandMetrics:
HystrixCommandMetrics.getInstance(ConnoteRESTCommand.CONNOTE_KEY)
Result:
{
  averageExecutionTime: 0,
  commandGroupKey: "ConnoteRESTCommandGroupKey",
  commandKey: "ConnoteCommand",
  concurrentExecutionCount: 0,
  errorCount: 1,
  healthCounts: "HealthCounts[1 / 1 : 100%]",
  totalRequests: 1
}
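The healthCounts value reads as errors / total requests : error percentage. How such a figure can feed a circuit-breaker decision is sketched below in plain Java. This is an illustration, not the Hystrix implementation: HealthCountsSketch is a hypothetical class, and both thresholds are passed in as parameters rather than read from any Hystrix configuration.

```java
// Hypothetical sketch: error percentage and a volume-gated trip decision.
final class HealthCountsSketch {
    final long totalRequests;
    final long errorCount;

    HealthCountsSketch(long totalRequests, long errorCount) {
        this.totalRequests = totalRequests;
        this.errorCount = errorCount;
    }

    // Whole-number percentage of failed requests; 0 when nothing was measured.
    int errorPercentage() {
        return totalRequests == 0 ? 0 : (int) (errorCount * 100 / totalRequests);
    }

    // Trip only when there is enough traffic AND the error rate is high enough,
    // so a single failed request does not open the circuit on its own.
    boolean shouldOpenCircuit(int requestVolumeThreshold, int errorThresholdPercentage) {
        return totalRequests >= requestVolumeThreshold
                && errorPercentage() >= errorThresholdPercentage;
    }

    @Override
    public String toString() {
        return "HealthCounts[" + errorCount + " / " + totalRequests
                + " : " + errorPercentage() + "%]";
    }
}
```

For the result shown above (1 error out of 1 request), the sketch reports 100%, yet with a volume threshold of, say, 20 requests it would keep the circuit closed.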

Slide 56

Slide 56 text

Hystrix Dashboard
Spring Boot Admin
Spring Boot applications

Application / URL                                            Version    Info  Status
ADDRESS-SERVICE                                              undefined  2     UP
BOOKING-SERVICE                                              undefined  2     UP
CONFIGSERVER (49f0e0c8) http://172.19.0.4:8888                                UP
CONNOTE-SERVICE                                              undefined  2     UP
CUSTOMER-SERVICE                                             undefined  2     UP
HYSTRIX-TURBINE-DASHBOARD (03dc7b82) http://172.19.0.13:8080                  UP
SPRING-BOOT-ADMIN (58cf2315) http://172.19.0.16:8080                          UP
TRANSPORT-API-GATEWAY (3904cfe9) http://172.19.0.15:8080                      UP
ZIPKIN-SERVICE (23ce6947) http://172.19.0.14:9411                             UP

Slide 57

Slide 57 text

Hystrix Metrics via RabbitMQ
Metrics are published via RabbitMQ
Turbine aggregates them as a consumer
The Hystrix Dashboard processes the Turbine stream

Slide 58

Slide 58 text

Hystrix Metrics via RabbitMQ

Slide 59

Slide 59 text

Transport Service Booking Request (second service call) simulate-booking.sh (demo-scripts) Runs until > 08:16:29 somni:scripts benjaminwilms$

Slide 60

Slide 60 text

Hystrix Dashboard & Turbine Stream
Hystrix Stream: http://localhost:8989/

Circuit
Sort: Error then Volume | Alphabetical | Volume | Error | Mean | Median | 90 | 99 | 99.5
Legend: Success | Short-Circuited | Bad Request | Timeout | Rejected | Failure | Error %

book...ConnoteServiceClient: 55.0 % | 5 0 0 3 0 0 | Hosts 2 | Median 258ms, Mean 323ms, 90th 764ms, 99th 1007ms, 99.5th 1007ms | Host: 0.5/s, Cluster: 0.9/s | Circuit Closed
tran...AddressServiceClient: 25.0 % | 4 0 3 15 0 0 | Hosts 1 | Median 16ms, Mean 186ms, 90th 1014ms, 99th 1024ms, 99.5th 1096ms | Host: 2.4/s, Cluster: 2.4/s | Circuit Closed
tran...ustomerServiceClient: 46.0 % | 5 0 1 6 0 0 | Hosts 1 | Median 17ms, Mean 316ms, 90th 1016ms, 99th 2005ms, 99.5th 2005ms | Host: 1.3/s, Cluster: 1.3/s | Circuit Closed
tran...BookingServiceClient: 28.0 % | 2 0 0 5 0 0 | Hosts 1 | Median 32ms, Mean 468ms, 90th 1230ms, 99th 2213ms, 99.5th 2213ms | Host: 0.7/s, Cluster: 0.7/s | Circuit Closed

Thread Pools
Sort: Alphabetical | Volume

AddressServiceClientGroup: Active 1, Max Active 4, Queued 0, Executions 24, Pool Size 10, Queue Size 5 | Host: 2.4/s, Cluster: 2.4/s
CustomerServiceClientGroup: Active 3, Max Active 2, Queued 0, Executions 13, Pool Size 10, Queue Size 5 | Host: 1.3/s, Cluster: 1.3/s
ConnoteServiceClientGroup: Active 3, Max Active 4, Queued 0, Executions 8, Pool Size 18, Queue Size 2 | Host: 0.8/s, Cluster: 1.6/s
BookingServiceClientGroup: Active 2, Max Active 2, Queued 0, Executions 5, Pool Size 10, Queue Size 5 | Host: 0.5/s, Cluster: 0.5/s

Slide 61

Slide 61 text

Distributed Tracing Spring Cloud Sleuth & ZipKin

Slide 62

Slide 62 text

Distributed Tracing Spring Cloud Sleuth & ZipKin

Slide 63

Slide 63 text

Distributed Tracing Spring Cloud Sleuth & ZipKin

Slide 64

Slide 64 text

Distributed Tracing Spring Cloud Sleuth & ZipKin

Slide 65

Slide 65 text

Distributed Tracing Spring Cloud Sleuth & ZipKin

Slide 66

Slide 66 text

Distributed Tracing Spring Cloud Sleuth & ZipKin

Slide 67

Slide 67 text

Distributed Tracing Spring Cloud Sleuth & ZipKin

Slide 68

Slide 68 text

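Distributed Tracing Spring Cloud Sleuth & ZipKin
...
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-stream</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-stream-binder-rabbit</artifactId>
</dependency>
...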

Slide 69

Slide 69 text

Distributed Tracing Spring Cloud Sleuth & ZipKin
Supported by Sleuth: Hystrix, RxJava, RestTemplate, Feign, Messaging with Spring Integration, Zuul, ...

Slide 70

Slide 70 text

Distributed Tracing Spring Cloud Sleuth & ZipKin
Start time: 08-31-2017 10:10
End time: 09-01-2017 10:10
Analyze Dependencies
transport-api-gateway, booking-service, customer-service, address-service, connote-service

Slide 71

Slide 71 text

Distributed Tracing More... http://opentracing.io/

Slide 72

Slide 72 text

Finished github.com/MrBW/resilient-transport-service