Slide 1

Slide 1 text

Kubernetes at GitHub
Jesse Newland (@jnewland), Principal Site Reliability Engineer

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

4 years ago

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Substrate

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Substrate

Slide 10

Slide 10 text

Substrate

Slide 11

Slide 11 text

Substrate

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

Slide 17

Slide 17 text

20% of services run on Kubernetes

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

GitHub dot com, the website

Slide 21

Slide 21 text

$ kubectl get ns github-production
NAME                STATUS   AGE
github-production   Active   168d

Slide 22

Slide 22 text

$ kubectl get ns
NAME                STATUS   AGE
github-production   Active   168d
kube-system         Active   169d

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

Diagram: Clusters A, B and C, each running 3x kube-apiserver plus a pool of kube-node hosts (node counts labeled 45x, 67x and 37x). Capacity labels: 1460 CPUs / 5.7 TB RAM, 1540 CPUs / 5.4 TB RAM, 1580 CPUs / 6.9 TB RAM.

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

$ kubectl -n github-production get deployment
NAME                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
unicorn                 190       190       190          190         168d
unicorn-api             164       164       164          164         168d
consul-service-router   2         2         2            2           168d

Slide 33

Slide 33 text

unicorn
kind: Deployment
metadata:
  name: unicorn
  labels:
    service: unicorn
    role: production
spec:
  replicas: 190
Containers: nginx → unicorn (requests via unix socket); unicorn → failbot (exceptions)
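The slide shows only the metadata and replica count, plus a diagram of three containers sharing one pod. A minimal sketch of how such a pod could be wired, assuming the unix socket lives in a shared `emptyDir` volume; the image names, the `sockets` volume name and the `/var/run/unicorn` path are hypothetical, and the manifest uses the current `apps/v1` API rather than whatever version was in use at the time.

```yaml
# Sketch only: images, volume name and socket path are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: unicorn
  labels:
    service: unicorn
    role: production
spec:
  replicas: 190
  selector:
    matchLabels:
      service: unicorn
      role: production
  template:
    metadata:
      labels:
        service: unicorn
        role: production
    spec:
      volumes:
        # Scratch directory shared by nginx and unicorn; holds the unix socket.
        - name: sockets
          emptyDir: {}
      containers:
        # nginx accepts HTTP and (via its own config, not shown) proxies to the socket.
        - name: nginx
          image: registry.example.com/nginx:latest        # hypothetical image
          ports:
            - containerPort: 80
          volumeMounts:
            - name: sockets
              mountPath: /var/run/unicorn
        # unicorn listens on a socket inside the shared directory.
        - name: unicorn
          image: registry.example.com/github:COMMIT_SHA   # hypothetical; tagged with the commit SHA
          volumeMounts:
            - name: sockets
              mountPath: /var/run/unicorn
        # failbot receives exceptions from unicorn (transport not shown on the slide).
        - name: failbot
          image: registry.example.com/failbot:latest      # hypothetical image
```

Because all three containers share the pod's lifecycle, they are scheduled, scaled and replaced together, which is what makes the `replicas: 190` field apply to the whole nginx/unicorn/failbot unit.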

Slide 34

Slide 34 text


Slide 35

Slide 35 text


Slide 36

Slide 36 text

unicorn-api
kind: Deployment
metadata:
  name: unicorn-api
  labels:
    service: unicorn-api
    role: production
spec:
  replicas: 164
Containers: nginx → unicorn (requests via unix socket); unicorn → failbot (exceptions)

Slide 37

Slide 37 text

consul-service-router
github-production Namespace:
  kind: Deployment
  metadata:
    name: unicorn
  kind: Deployment
  metadata:
    name: consul-service-router   (runs haproxy)
  kind: Service
  metadata:
    name: consul-service-router
Metal services: mysql, gpgverify, search, hookshot, spokes, memcached
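A sketch of the in-cluster Service that unicorn pods could use to reach the haproxy-based router, assuming the router pods carry a `service: consul-service-router` label (following the labeling convention shown for unicorn). The port numbers are hypothetical.

```yaml
# Sketch only: label and ports are assumptions, not taken from the slides.
apiVersion: v1
kind: Service
metadata:
  name: consul-service-router
  namespace: github-production
spec:
  # Send traffic to pods of the consul-service-router Deployment (haproxy).
  selector:
    service: consul-service-router
  ports:
    - name: haproxy
      port: 8080         # hypothetical Service port
      targetPort: 8080   # hypothetical haproxy listen port
```

With default cluster DNS, unicorn could address this as `consul-service-router` (or `consul-service-router.github-production.svc.cluster.local`), leaving haproxy to route onward to mysql, memcached and the other metal services.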

Slide 38

Slide 38 text


Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Diagram: the same resources exist in each of Cluster A, Cluster B and Cluster C:
  kind: Namespace
  metadata:
    name: github-production
  kind: Service
  metadata:
    name: unicorn
  spec:
    type: NodePort
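A sketch of the per-cluster resources as full manifests. Pinning `nodePort` to a fixed value is one plausible way to make the same port answer on every node of every cluster, so a load balancer outside Kubernetes can target any of them; the port numbers here are hypothetical.

```yaml
# Sketch only: applied identically to Cluster A, B and C.
apiVersion: v1
kind: Namespace
metadata:
  name: github-production
---
apiVersion: v1
kind: Service
metadata:
  name: unicorn
  namespace: github-production
spec:
  type: NodePort
  selector:
    service: unicorn
  ports:
    - name: http
      port: 80
      targetPort: 80
      # Hypothetical; must fall inside the cluster's NodePort range
      # (30000-32767 by default).
      nodePort: 30080
```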

Slide 41

Slide 41 text

Tools to support operations
• kube-testlib
  • Continuously running suite of conformance tests
• kube-health-proxy
  • Adjust weight of incoming traffic, disable entire clusters at load balancer level
• kube-namespace-defaults
  • Creates default resources in each new namespace, configures imagePullSecrets
• kube-pod-patrol
  • Detects and deletes stuck pods, sets NodeConditions if a node has repeated trouble starting pods
• node-problem-healer
  • Detects NodeConditions, heals them by rebooting nodes
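One way a tool like kube-namespace-defaults could configure imagePullSecrets is by attaching a pull secret to the namespace's `default` ServiceAccount: Kubernetes then adds that secret to every pod that runs under the account. A sketch, assuming a registry Secret named `registry-credentials` already exists in the namespace; both that name and `example-namespace` are hypothetical.

```yaml
# Sketch only: secret and namespace names are hypothetical.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: example-namespace
# Pods using this ServiceAccount inherit these pull secrets automatically.
imagePullSecrets:
  - name: registry-credentials
```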

Slide 42

Slide 42 text

A platform for builders

Slide 43

Slide 43 text

A platform for builders

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

No content

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

GitHub Flow

Slide 49

Slide 49 text

Conventions

Slide 50

Slide 50 text

$ docker build -t $service:$sha1 -f ./Dockerfile .

Slide 51

Slide 51 text

$ docker build -t $service:$sha1 -f ./Dockerfile .
$ kubectl create ns $service-$environment

Slide 52

Slide 52 text

$ docker build -t $service:$sha1 -f ./Dockerfile .
$ kubectl create ns $service-$environment
$ deploy -Rf ./config/kubernetes/$environment | \

Slide 53

Slide 53 text

$ docker build -t $service:$sha1 -f ./Dockerfile .
$ kubectl create ns $service-$environment
$ deploy -Rf ./config/kubernetes/$environment | \
    kubectl -n $service-$environment apply -f -

Slide 54

Slide 54 text

Create a branch

Slide 55

Slide 55 text

Add some commits

Slide 56

Slide 56 text

Open a pull request

Slide 57

Slide 57 text

Containers built on push, tagged with commit

Slide 58

Slide 58 text

Iterate and review

Slide 59

Slide 59 text

No content

Slide 60

Slide 60 text

# config/kubernetes/review-lab
# updates image field value to $service:$sha1
# injects a Secret
# injects an Ingress
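A sketch of the kind of Ingress the review-lab config might inject, so each branch gets its own hostname. The branch name `my-branch`, the `review-lab.example.com` domain and the backend port are hypothetical; depending on the ingress controller, an `ingressClassName` may also be required.

```yaml
# Sketch only: host, namespace suffix and port are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: unicorn
  namespace: review-lab-my-branch
spec:
  rules:
    # One hostname per branch routes to that branch's unicorn Service.
    - host: my-branch.review-lab.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: unicorn
                port:
                  number: 80
```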

Slide 61

Slide 61 text

$ kubectl create ns review-lab-$branch
$ kubectl apply -n review-lab-$branch -f -

Slide 62

Slide 62 text

Deploy

Slide 63

Slide 63 text

No content

Slide 64

Slide 64 text

No content

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

Steady state
kind: Service
metadata:
  name: unicorn
spec:
  selector:
    service: unicorn
kind: Pod
metadata:
  name: unicorn
  labels:
    service: unicorn
    role: production

Slide 67

Slide 67 text

Canary deploy
kind: Service
metadata:
  name: unicorn
spec:
  selector:
    service: unicorn
kind: Pod
metadata:
  name: unicorn
  labels:
    service: unicorn
    role: production
kind: Pod
metadata:
  name: unicorn-canary
  labels:
    service: unicorn
    role: canary
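Because the Service selects only on `service: unicorn`, it matches both `role: production` and `role: canary` pods, so canary pods start taking traffic as soon as they are ready. A sketch of a canary Deployment, assuming the production Deployment's selector also includes `role: production` so the two Deployments never claim each other's pods; the replica count and image tag are hypothetical.

```yaml
# Sketch only: replica count and image are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: unicorn-canary
  labels:
    service: unicorn
    role: canary
spec:
  replicas: 2
  selector:
    matchLabels:
      service: unicorn
      role: canary
  template:
    metadata:
      labels:
        service: unicorn   # matched by the unicorn Service selector
        role: canary       # keeps these pods separate from the production Deployment
    spec:
      containers:
        - name: unicorn
          image: registry.example.com/github:CANARY_SHA   # hypothetical tag for the commit under test
```

If traffic is spread evenly across ready pods, the canary share is roughly its fraction of the pod count: 2 canary pods alongside 190 production pods would see about 2/192, or roughly 1%, of requests.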

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

$ kubectl apply \
    --namespace github-production \
    -Rf config/kubernetes/production

Slide 70

Slide 70 text

No content

Slide 71

Slide 71 text

All of the other services deployed to our Kubernetes clusters can now use this canary workflow

Slide 72

Slide 72 text

Adopting Kubernetes as a standard platform has made it easier for GitHub SREs to build features that apply to all services, not just github/github

Slide 73

Slide 73 text

We're encouraging the decomposition of the monolith by providing a first-class experience for newer, smaller services

Slide 74

Slide 74 text

2018

Slide 75

Slide 75 text

No content

Slide 76

Slide 76 text

State

Slide 77

Slide 77 text

State

Slide 78

Slide 78 text

Distributed systems often use replication to provide fault tolerance, and can therefore tolerate node failures. However, data gravity is preferred for reducing replication traffic and cold startup latencies.
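The slide does not name a mechanism, but one Kubernetes feature aimed at this trade-off is local persistent volumes: the volume is tied to the node whose disk holds the data, and the scheduler places the consuming pod on that node. A sketch, where the class name, PV name, node name, path and capacity are all hypothetical.

```yaml
# Sketch only: names, path, node and capacity are hypothetical.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
# Delay binding until a pod is scheduled, so scheduling can account for
# which node already holds the data.
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  # Pin the volume to the node whose disk backs it.
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - example-node-1
```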

Slide 79

Slide 79 text

No content

Slide 80

Slide 80 text

No content

Slide 81

Slide 81 text

Changing our OSS habits

Slide 82

Slide 82 text

Slide 83

Slide 83 text

@jnewland
 