Slide 1

Slide 1 text

@saturnism @kubernetesio @googlecloud Writing a Kubernetes Autoscaler Kubernetes API - In Depth

Slide 2

Slide 2 text

SpringOne2GX, Washington, DC
Writing a Kubernetes Autoscaler
Ray Tsang, @saturnism
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/

Slide 3

Slide 3 text

Ray Tsang, Developer Advocate
@saturnism | +RayTsang

Slide 4

Slide 4 text

Ray Tsang
Developer, Architect, Traveler, Photographer
flickr.com/saturnism

Slide 5

Slide 5 text

What are we scaling today?
A highly scalable distributed in-memory cache

Slide 6

Slide 6 text

Infinispan
• Distributed in-memory cache
• Key-value NoSQL store
• Wonderful elastic property

Slide 7

Slide 7 text

[Diagram: a three-node cache cluster (Node 1, Node 2, Node 3) holding entries A, B, and C.]

Slide 8

Slide 8 text

N-Copies for Redundancy
[Diagram: Node 1, Node 2, and Node 3 each run a cache; entries A, B, and C are each stored twice across the nodes.]

Slide 9

Slide 9 text

[Diagram: the same three-node cluster, with two copies each of A, B, and C.]

Slide 10

Slide 10 text

[Diagram: Node 2 is gone; Node 1 and Node 3 between them still hold two copies each of A, B, and C.]

Slide 11

Slide 11 text

More Capacity!
[Diagram: Node 4 joins Node 1, Node 2, and Node 3; entries A, B, and C are still held as two copies each.]

Slide 12

Slide 12 text

Rebalanced!
[Diagram: Node 1 holds A and B, Node 2 holds B and C, Node 3 holds A, and Node 4 holds C.]
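
The node diagrams above show each entry kept on N nodes and moved around when a node joins or leaves. As a rough illustration of how such placement can work, here is a minimal Java sketch using rendezvous (highest-random-weight) hashing. This is not Infinispan's actual algorithm, which has its own consistent-hash implementation; the class name ReplicaPlacement and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; Infinispan does not expose this class.
final class ReplicaPlacement {

    // Scores a (key, node) pair. Every caller computes the same score for the same pair.
    static int score(String key, String node) {
        return (key + "@" + node).hashCode();
    }

    // Returns the `copies` nodes with the highest score for this key.
    static List<String> owners(String key, List<String> nodes, int copies) {
        List<String> ranked = new ArrayList<>(nodes);
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(score(key, b), score(key, a)); // higher score first
            return byScore != 0 ? byScore : a.compareTo(b);              // deterministic tie-break
        });
        return new ArrayList<>(ranked.subList(0, Math.min(copies, ranked.size())));
    }
}
```

With this scheme, adding a node only moves the entries for which the new node ranks among the top N; every other entry keeps its owners, which is the "rebalance" effect the slides illustrate.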

Slide 13

Slide 13 text

Wouldn’t it be nice to… Autoscale this?

Slide 14

Slide 14 text

Horizontal Scaling
Vertical Scaling

Slide 15

Slide 15 text

Easy Stuff
• Stateless workload
• Scale out
• System metrics

Slide 16

Slide 16 text

Difficult Stuff
• Stateful workload
• Scale in
• Custom metrics

Slide 17

Slide 17 text

Today - Difficult Stuff!
Autoscale Infinispan to a target # of entries per node, both out and in!

Slide 18

Slide 18 text

Keep in Mind
I’ve never written an autoscaler before!

Slide 19

Slide 19 text

Oh, By The Way
Running in Kubernetes!

Slide 20

Slide 20 text

Enter Kubernetes
Greek for “Helmsman”; also the root of the word “Governor”
• Container orchestrator
• Runs containers
• Supports multiple cloud and bare-metal environments
• Inspired and informed by Google’s experiences and internal systems
• Open source, written in Go
Manage applications, not machines

Slide 21

Slide 21 text

Developer View

Slide 22

Slide 22 text

Developer View: What just happened?
[Diagram labels: Config file, Container Image, kubectl, Kubernetes Master, Scheduler, Kubelet (x4), web browsers.]

Slide 23

Slide 23 text

Kubernetes Exposes a REST API
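
To make the REST API concrete, here is a minimal Java sketch that builds the URL of a replication controller and prepares a GET request for it using only the JDK. The path follows the Kubernetes v1 API layout; the API server address, namespace, controller name, and bearer-token authentication used when calling it are assumptions for illustration, and the class name KubernetesApiSketch is hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of any Kubernetes client library.
final class KubernetesApiSketch {

    // Builds the v1 API URI for one replication controller in a namespace.
    static URI replicationControllerUri(String apiServer, String namespace, String name) {
        return URI.create(apiServer + "/api/v1/namespaces/" + namespace
                + "/replicationcontrollers/" + name);
    }

    // Prepares (but does not send) a GET request. Nothing goes over the wire
    // until the caller connects or reads the response.
    static HttpURLConnection prepareGet(URI uri, String bearerToken) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        // Assumption: the cluster accepts a bearer token in the Authorization header.
        conn.setRequestProperty("Authorization", "Bearer " + bearerToken);
        return conn;
    }
}
```

The response is a JSON document describing the controller, including its desired replica count, which is the field an autoscaler ultimately has to change.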

Slide 24

Slide 24 text

Let’s See It!

Slide 25

Slide 25 text

Keep in Mind
I’ve never written an autoscaler before!

Slide 26

Slide 26 text

Autoscaling 101
What I learned

Slide 27

Slide 27 text

Autoscaling Pattern
[Diagram: each Node runs Your App plus a Metrics Collector that reads its JMX metrics and sends them to the Metrics Server API, backed by a Metrics Store. The Autoscaler reads the aggregated metrics and its Scaling Config to compute the desired # of nodes. The Actuator then instantiates a new instance (Your App plus Metrics Collector) from the Image / Template.]

Slide 28

Slide 28 text

Autoscaler Configuration (Scaling Config)
Metric:              CPU, Memory, QPS, etc.
Target utilization:  80% or 100 QPS
Image / Template:    To start a new instance
Min # of instances:  No less than this many
Max # of instances:  No more than this many

Slide 29

Slide 29 text

Push Metrics (one Metric)
Metric Name:          CPU, Memory, QPS, etc.
Utilization / Value:  80% or 100 QPS
Instance Name:        node-1431
Cluster Name:         Infinispan Cluster
Timestamp:            12:00:34.123
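
The pushed metric above maps naturally onto a small value type. The following Java sketch mirrors the five fields from the slide and adds a tiny in-memory store keyed by cluster name. The names MetricSample and InMemoryMetricsStore are hypothetical, and the store is only a stand-in for a real metrics store.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// One pushed data point, mirroring the fields listed on the slide.
final class MetricSample {
    final String metricName;
    final double value;
    final String instanceName;
    final String clusterName;
    final Instant timestamp;

    MetricSample(String metricName, double value, String instanceName,
                 String clusterName, Instant timestamp) {
        this.metricName = Objects.requireNonNull(metricName);
        this.value = value;
        this.instanceName = Objects.requireNonNull(instanceName);
        this.clusterName = Objects.requireNonNull(clusterName);
        this.timestamp = Objects.requireNonNull(timestamp);
    }
}

// Hypothetical in-memory store: samples grouped by cluster, in arrival order.
final class InMemoryMetricsStore {
    private final Map<String, List<MetricSample>> byCluster = new ConcurrentHashMap<>();

    // Records one sample; safe to call from several collector threads.
    void push(MetricSample sample) {
        byCluster.computeIfAbsent(sample.clusterName, k -> new CopyOnWriteArrayList<>())
                 .add(sample);
    }

    // Returns a copy of the samples for a cluster, or an empty list if none were pushed.
    List<MetricSample> samplesFor(String clusterName) {
        List<MetricSample> samples = byCluster.get(clusterName);
        return samples == null ? Collections.<MetricSample>emptyList() : new ArrayList<>(samples);
    }
}
```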

Slide 30

Slide 30 text

Example Metrics
Instance Name  Metric Name      Metric Value  Timestamp
node-1         numberOfEntries  0             23:01:00
node-2         numberOfEntries  0             23:01:00
node-1         numberOfEntries  100           23:01:10
node-2         numberOfEntries  100           23:01:10
node-1         numberOfEntries  150           23:01:20
node-2         numberOfEntries  150           23:01:20
node-1         numberOfEntries  200           23:01:30
node-2         numberOfEntries  200           23:01:30

Slide 32

Slide 32 text

Example Metrics
Instance Name  Metric Name      Metric Value  Timestamp
node-1         numberOfEntries  0             23:01:00
node-2         numberOfEntries  0             23:01:00
node-1         numberOfEntries  100           23:01:10
node-2         numberOfEntries  100           23:01:10
node-1         numberOfEntries  150           23:01:20
node-2         numberOfEntries  150           23:01:20
node-1         numberOfEntries  200           23:01:30
node-2         numberOfEntries  200           23:01:30

Instance Name  Average Metric Value
node-1         150
node-2         150
Sum            300
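
The per-instance average of 150 matches the three most recent samples (100, 150, 200) rather than all four, so the Java sketch below assumes the autoscaler averages over a recent time window and then sums the per-instance averages. That window, the use of "seconds after 23:01:00" as the timestamp, and the names TimedSample and MetricAggregation are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One row of the example table; `second` counts seconds after 23:01:00.
final class TimedSample {
    final String instance;
    final int second;
    final double value;

    TimedSample(String instance, int second, double value) {
        this.instance = instance;
        this.second = second;
        this.value = value;
    }
}

final class MetricAggregation {

    // Averages each instance's samples taken at or after `fromSecond`.
    static Map<String, Double> averagePerInstance(List<TimedSample> samples, int fromSecond) {
        Map<String, double[]> totals = new LinkedHashMap<>(); // instance -> {sum, count}
        for (TimedSample s : samples) {
            if (s.second < fromSecond) {
                continue; // outside the window
            }
            double[] t = totals.computeIfAbsent(s.instance, k -> new double[2]);
            t[0] += s.value;
            t[1] += 1;
        }
        Map<String, Double> averages = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : totals.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    // Adds up the per-instance averages to get the cluster-wide figure.
    static double sum(Map<String, Double> averages) {
        double total = 0;
        for (double v : averages.values()) {
            total += v;
        }
        return total;
    }
}
```

Fed the table above with a window starting at 23:01:10 (fromSecond = 10), this yields 150 for each node and a sum of 300, the same figures as on the slide.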

Slide 33

Slide 33 text

Autoscaler Algorithm
target # instances = ceil( sum(utilization) / target utilization )

Slide 34

Slide 34 text

Autoscaler Algorithm
target utilization = 100
target # instances = ceil( sum(utilization) / target utilization ) = ceil( 300 / 100 ) = 3
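
The formula translates almost directly into code. A minimal Java sketch follows; the class name AutoscalerMath and the argument check are additions for illustration.

```java
// Hypothetical helper implementing the slide's formula.
final class AutoscalerMath {

    // target # instances = ceil( sum(utilization) / target utilization )
    static int targetInstances(double sumOfUtilization, double targetUtilization) {
        if (targetUtilization <= 0) {
            throw new IllegalArgumentException("target utilization must be positive");
        }
        return (int) Math.ceil(sumOfUtilization / targetUtilization);
    }
}
```

For the example above, targetInstances(300, 100) returns 3. A sum of 301 would return 4, because the result is rounded up so that no instance ends above the target.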

Slide 35

Slide 35 text

Autoscaler Algorithm
min ≤ target # instances ≤ max
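
Keeping the target inside the configured bounds is a simple clamp. A minimal Java sketch, treating both bounds as inclusive ("no less than" / "no more than"); the class name ReplicaBounds is hypothetical.

```java
// Hypothetical helper that applies the min/max limits from the scaling config.
final class ReplicaBounds {

    // Returns `target` limited to the inclusive range [min, max].
    static int clamp(int target, int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        return Math.max(min, Math.min(max, target));
    }
}
```

With min = 1 and max = 10, a computed target of 0 becomes 1 and a computed target of 25 becomes 10.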

Slide 36

Slide 36 text

• It’s expensive to scale out
• Delay the scaling in
• Watch out for noise
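
One way to act on this advice is to scale out immediately but scale in only once the lower target has persisted for a hold period, so that short dips are treated as noise. The Java sketch below is an assumption about how that could be done, not the talk's implementation; the class name ScaleInDelay and the hold period are hypothetical. Time is passed in explicitly to keep the logic easy to test.

```java
// Hypothetical policy: scale out at once, scale in only after a sustained lower target.
final class ScaleInDelay {
    private final long holdMillis;
    private long belowSince = -1; // when the target first dropped below current, or -1

    ScaleInDelay(long holdMillis) {
        this.holdMillis = holdMillis;
    }

    // Returns the replica count to apply right now.
    int decide(int current, int target, long nowMillis) {
        if (target >= current) {
            belowSince = -1;   // scaling out (or holding) resets the timer
            return target;
        }
        if (belowSince < 0) {
            belowSince = nowMillis; // first time the lower target was observed
        }
        if (nowMillis - belowSince >= holdMillis) {
            belowSince = -1;
            return target;     // the lower target persisted long enough
        }
        return current;        // ignore short dips
    }
}
```

With a 60-second hold, a target that drops from 5 to 3 is ignored until it has stayed below 5 for a full minute; any reading at or above the current count in between restarts the wait.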

Slide 37

Slide 37 text

Let’s write an autoscaler!
The real stuff

Slide 38

Slide 38 text

Let’s see some code

Slide 39

Slide 39 text

I used Spring Boot and Groovy

    @SpringBootApplication
    @ComponentScan("org.a8r")
    @EnableAutoConfiguration
    @EnableAsync
    @EnableScheduling
    class Application {
        static void main(String[] args) {
            SpringApplication.run(Application, args)
        }
    }

Slide 40

Slide 40 text

Metrics Store - Used a Tree Cache

Slide 41

Slide 41 text

Microservices - Yay!

    @RestController
    @RequestMapping("/a8r/autoscaler")
    @Slf4j
    class AutoscalerService {
    }

Slide 42

Slide 42 text

    class AutoscalerDefintion {
        @NotEmpty String replicationControllerId
        @NotEmpty String metricName
        @NotNull Double threshold
        @NotNull Integer duration
        Integer minReplicas = 1
        @NotNull Integer maxReplicas
    }

    @RestController
    @RequestMapping("/a8r/autoscaler")
    @Slf4j
    class AutoscalerService {
        ...
    }

Slide 43

Slide 43 text

I Love @Slf4j!

    @Slf4j
    class AutoscalerService {
        …
        log.info "Updated autoscaler for $fqn, {}", definition
        …
    }

Slide 44

Slide 44 text

Wake up periodically

    @Scheduled(fixedRateString = "\${autoscaler.wakeupInterval}")
    void autoscale() {
        autoscalerCache.root.children.each {
            scale(it.data as AutoscalerDefintion)
        }
    }

    @Async
    void scale(AutoscalerDefintion definition) {
        …
    }
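
For comparison, the same wake-up loop can be sketched without Spring, using the JDK's ScheduledExecutorService. This is a plain-Java stand-in for the @Scheduled method, not the talk's code; the class name PeriodicAutoscaler is hypothetical and the interval is supplied by the caller.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical plain-JDK equivalent of a fixed-rate scheduled method.
final class PeriodicAutoscaler {

    // Runs `autoscale` immediately and then every `intervalMillis` milliseconds.
    static ScheduledExecutorService start(Runnable autoscale, long intervalMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(() -> {
            try {
                autoscale.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so swallow it
                // here; a real autoscaler would log it.
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

The caller keeps the returned executor and calls shutdownNow() on it to stop the loop.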

Slide 45

Slide 45 text

Autoscaler Algorithm

Slide 46

Slide 46 text

Talking to Kubernetes API

Slide 47

Slide 47 text

SSLSocketFactory with CA Certificate

    @Bean
    SSLSocketFactory sslSocketFactory(
            @Value("#{environment.KUBERNETES_CA_CERT_FILE}") String caCertFile) throws Exception {
        ...
    }

Slide 48

Slide 48 text

RestTemplate with Headers and SSLSocketFactory

    restTemplate.setRequestFactory(new SimpleClientHttpRequestFactory() {
        protected void prepareConnection(HttpURLConnection conn, String httpMethod) throws IOException {
            // Forward the Authorization header, if one is configured
            if (headers["Authorization"]) {
                conn.setRequestProperty("Authorization", headers.getFirst("Authorization"))
            }
            // Use the custom SSLSocketFactory on HTTPS connections
            if (sslSocketFactory && conn instanceof HttpsURLConnection) {
                ((HttpsURLConnection) conn).setSSLSocketFactory(sslSocketFactory)
            }
            super.prepareConnection(conn, httpMethod)
        }
    })

Slide 49

Slide 49 text

Certificate Authority… Ugh!
Let’s see the code

Slide 50

Slide 50 text

JMXTrans to the rescue!
http://github.com/jmxtrans

Slide 51

Slide 51 text

JMX → Output Writer

    {
      "servers" : [ {
        "url": "service:jmx:http-remoting-jmx://${infinispan.host}:${infinispan.port}",
        "queries" : [ {
          "obj": "jboss.infinispan:type=Cache,component=Statistics,name=\"namedCache(dist_sync)\",*",
          "attr": ["numberOfEntries"],
          "outputWriters": [ {
            "@class": "com.googlecode.jmxtrans.model.output.A8RWriter",
            "metricName": "custom.cloudmonitoring.googleapis.com/infinispan/namedCache/numberOfEntries"
          } ]
        } ]
      } ]
    }

Slide 52

Slide 52 text

Live Demo!
Demo time

Slide 53

Slide 53 text

Try Kubernetes Today!

Slide 54

Slide 54 text

In the Roadmap - Native Autoscaling in Kubernetes
            Nodes                 Pods
Horizontal  # of nodes            # of pods
Vertical    resources for a node  resources for a pod

Slide 55

Slide 55 text

Horizontal Pod Autoscaling
https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/autoscaling.md
• Autoscaler is a first-class citizen
• Autoscaler Controller and Resource
• Using Pod CPU / Memory Utilization

Slide 56

Slide 56 text

Vertical Pod Autoscaling
https://github.com/kubernetes/kubernetes/issues/10782
“vertical auto-sizer sets the compute resource limits and request for pods which do not have them set, and periodically adjust them based on demand signals”

Slide 57

Slide 57 text

Horizontal Node Autoscaling
https://github.com/kubernetes/kubernetes/issues/11748
• Pretty much done on Google Compute Engine!
• Using Node CPU / Memory Utilization

Slide 58

Slide 58 text

Try out Google Container Engine
https://cloud.google.com/container-engine/

Slide 59

Slide 59 text

Thanks!
http://kubernetes.io
http://bit.ly/1QLg5E1
Images by Connie Zhou