Kubernetes at GitHub

Jesse Newland
December 08, 2017

An overview of the on-premises Kubernetes deployments that power 20% of GitHub's production services, and a review of the challenges GitHub faced and overcame during its Kubernetes journey.

Presented at KubeCon in Austin. Slides with presenter notes are available here:

https://schd.ws/hosted_files/kccncna17/44/kubernetes-at-github.pdf

Transcript

  1. Diagram: three clusters (Cluster A, Cluster B, Cluster C), each with 3x kube-apiserver and a pool of kube-node machines.
    Node-count labels, in extracted order: 45x, 67x, 37x, 67x, 67x.
    Capacity labels: 1460 CPUs / 5.7 TB RAM; 1540 CPUs / 5.4 TB RAM; 1580 CPUs / 6.9 TB RAM.
  2. $ kubectl -n github-production get deployment
    NAME                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    unicorn                 190       190       190          190         168d
    unicorn-api             164       164       164          164         168d
    consul-service-router   2         2         2            2           168d
  3. unicorn
    kind: Deployment
    metadata:
      name: unicorn
      labels:
        service: unicorn
        role: production
    spec:
      replicas: 190
    Diagram labels: nginx, unicorn, failbot; "requests via unix socket"; "exceptions"
  6. unicorn-api
    kind: Deployment
    metadata:
      name: unicorn-api
      labels:
        service: unicorn-api
        role: production
    spec:
      replicas: 164
    Diagram labels: nginx, unicorn, failbot; "requests via unix socket"; "exceptions"
  7. consul-service-router
    github-production Namespace:
        kind: Deployment
        metadata:
          name: unicorn
        kind: Deployment
        metadata:
          name: consul-service-router
        kind: Service
        metadata:
          name: consul-service-router
    Container labels: haproxy, unicorn
    Metal services: mysql, gpgverify, search, hookshot, spokes, memcached
  9. Diagram: ☁ and three clusters (Cluster A, Cluster B, Cluster C), each showing the same two resources:
        kind: Namespace
        metadata:
          name: github-production
        kind: Service
        metadata:
          name: unicorn
        spec:
          type: NodePort
  10. Tools to support operations
    • kube-testlib
        • Continuously running suite of conformance tests
    • kube-health-proxy
        • Adjust weight of incoming traffic, disable entire clusters at load balancer level
    • kube-namespace-defaults
        • Creates default resources in each new namespace, configures imagePullSecrets
    • kube-pod-patrol
        • Detects and deletes stuck pods, sets NodeConditions if a node has repeated trouble starting pods
    • node-problem-healer
        • Detects NodeConditions, heals them by rebooting nodes
  12. $ docker build -t $service:$sha1 -f ./Dockerfile .
    $ kubectl create ns $service-$environment
    $ deploy -Rf ./config/kubernetes/$environment | \
        kubectl -n $service-$environment apply -f -
  13. Steady state
    kind: Service
    metadata:
      name: unicorn
    spec:
      selector:
        service: unicorn

    kind: Pod
    metadata:
      name: unicorn
      labels:
        service: unicorn
        role: production
    Container label: unicorn
  14. Canary deploy
    kind: Service
    metadata:
      name: unicorn
    spec:
      selector:
        service: unicorn

    kind: Pod
    metadata:
      name: unicorn
      labels:
        service: unicorn
        role: production
    Container label: unicorn

    kind: Pod
    metadata:
      name: unicorn-canary
      labels:
        service: unicorn
        role: canary
    Container label: unicorn
  15. Adopting Kubernetes as a standard platform has made it easier for GitHub SREs to build features that apply to all services, not just github/github
  16. We're encouraging the decomposition of the monolith by providing a first-class experience for newer, smaller services
  17. Distributed systems often use replication to provide fault tolerance, and can therefore tolerate node failures. However, data gravity is preferred for reducing replication traffic and cold startup latencies.
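
The canary pattern on slides 13 and 14 relies on the Service selecting only on the `service` label, so production and canary pods both receive traffic. The following is a minimal sketch of how those three objects might look as complete manifests. The `apiVersion` values, the `namespace` placement, the container image names, and the port numbers are assumptions added for illustration; the slides show only `kind`, `metadata`, the labels, and the selector.

```yaml
# Sketch only. Images and ports below are hypothetical placeholders,
# not values taken from the slides.
apiVersion: v1
kind: Service
metadata:
  name: unicorn
  namespace: github-production   # assumed from slide 2
spec:
  selector:
    service: unicorn             # no "role" key, so both pods below match
  ports:
    - port: 80                   # hypothetical
      targetPort: 8080           # hypothetical
---
apiVersion: v1
kind: Pod
metadata:
  name: unicorn
  namespace: github-production
  labels:
    service: unicorn
    role: production
spec:
  containers:
    - name: unicorn
      image: registry.example.com/unicorn:current     # hypothetical image
      ports:
        - containerPort: 8080                         # hypothetical
---
apiVersion: v1
kind: Pod
metadata:
  name: unicorn-canary
  namespace: github-production
  labels:
    service: unicorn
    role: canary                 # differs only in role; still selected by the Service
spec:
  containers:
    - name: unicorn
      image: registry.example.com/unicorn:candidate   # hypothetical image
      ports:
        - containerPort: 8080                         # hypothetical
```

Under this sketch, the `role` label distinguishes the two pods for operators (for example, `kubectl get pods -l role=canary`) without affecting routing. If requests are spread roughly evenly across ready endpoints, the share of traffic reaching the canary depends on how many canary pods run relative to production pods.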