Kubernetes at GitHub

Jesse Newland
December 08, 2017

An overview of the on-premises Kubernetes deployments that power 20% of GitHub's production services, and a review of the challenges GitHub faced and overcame during its Kubernetes journey.

Presented at KubeCon in Austin. Slides with presenter notes are available here:

https://schd.ws/hosted_files/kccncna17/44/kubernetes-at-github.pdf

Transcript

  1. Kubernetes at GitHub. Jesse Newland (@jnewland), Principal Site Reliability Engineer

  2. None
  3. 4 years ago

  4. None
  5. None
  6. None
  7. Substrate

  8. None
  9. Substrate

  10. Substrate

  11. Substrate

  12. None
  13. None
  14. None
  15. None
  16. 20% of services run on Kubernetes

  17. None
  18. None
  19. GitHub dot com, the website

  20. $ kubectl get ns github-production
      NAME                STATUS   AGE
      github-production   Active   168d
  21. $ kubectl get ns
      NAME                STATUS   AGE
      github-production   Active   168d
      kube-system         Active   169d
  22. None
  23. None
  24. Diagram: Cluster A, Cluster B, Cluster C
      Each cluster: kube-apiserver 3x plus a pool of kube-node machines
      kube-node counts shown: 37x, 45x, 67x
      Capacity shown: 1460 CPUs / 5.7 TB RAM; 1540 CPUs / 5.4 TB RAM; 1580 CPUs / 6.9 TB RAM
  25. None
  26. None
  27. None
  28. None
  29. None
  30. None
  31. $ kubectl -n github-production get deployment
      NAME                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
      unicorn                 190       190       190          190         168d
      unicorn-api             164       164       164          164         168d
      consul-service-router   2         2         2            2           168d
  32. unicorn
      kind: Deployment
      metadata:
        name: unicorn
        labels:
          service: unicorn
          role: production
      spec:
        replicas: 190
      Containers shown: nginx, unicorn, failbot
      Diagram labels: requests via unix socket; exceptions
  33. unicorn
      kind: Deployment
      metadata:
        name: unicorn
        labels:
          service: unicorn
          role: production
      spec:
        replicas: 190
      Containers shown: nginx, unicorn, failbot
      Diagram labels: requests via unix socket; exceptions
  34. unicorn
      kind: Deployment
      metadata:
        name: unicorn
        labels:
          service: unicorn
          role: production
      spec:
        replicas: 190
      Containers shown: nginx, unicorn, failbot
      Diagram labels: requests via unix socket; exceptions
  35. unicorn-api
      kind: Deployment
      metadata:
        name: unicorn-api
        labels:
          service: unicorn-api
          role: production
      spec:
        replicas: 164
      Containers shown: nginx, unicorn, failbot
      Diagram labels: requests via unix socket; exceptions
  36. consul-service-router
      Metal services: mysql, gpgverify, search, hookshot, spokes, memcached
      github-production Namespace:
        kind: Deployment
        metadata:
          name: unicorn
        ---
        kind: Deployment
        metadata:
          name: consul-service-router
        ---
        kind: Service
        metadata:
          name: consul-service-router
      Containers shown: haproxy, unicorn
  37. consul-service-router
      Metal services: mysql, gpgverify, search, hookshot, spokes, memcached
      github-production Namespace:
        kind: Deployment
        metadata:
          name: unicorn
        ---
        kind: Deployment
        metadata:
          name: consul-service-router
        ---
        kind: Service
        metadata:
          name: consul-service-router
      Containers shown: haproxy, unicorn
  38. None
  39. ☁
      Cluster A
        kind: Namespace
        metadata:
          name: github-production
        ---
        kind: Service
        metadata:
          name: unicorn
        spec:
          type: NodePort
      Cluster B
        kind: Namespace
        metadata:
          name: github-production
        ---
        kind: Service
        metadata:
          name: unicorn
        spec:
          type: NodePort
      Cluster C
        kind: Namespace
        metadata:
          name: github-production
        ---
        kind: Service
        metadata:
          name: unicorn
        spec:
          type: NodePort
  40. Tools to support operations
      • kube-testlib
        • Continuously running suite of conformance tests
      • kube-health-proxy
        • Adjust weight of incoming traffic, disable entire clusters at load balancer level
      • kube-namespace-defaults
        • Creates default resources in each new namespace, configures imagePullSecrets
      • kube-pod-patrol
        • Detects and deletes stuck pods, sets NodeConditions if a node has repeated trouble starting pods
      • node-problem-healer
        • Detects NodeConditions, heals them by rebooting nodes
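
The "detects stuck pods" step of kube-pod-patrol on slide 40 could be sketched roughly as below. This is only an illustration: the talk does not show the tool's code, so the function name `find_stuck_pods`, the ten-minute threshold, and the simplified pod dictionaries are all assumptions. Only the `Pending` phase name comes from Kubernetes itself.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: the talk does not say how long a pod must be
# stuck before kube-pod-patrol acts on it.
STUCK_AFTER = timedelta(minutes=10)


def find_stuck_pods(pods, now, stuck_after=STUCK_AFTER):
    """Return the names of pods that have been Pending longer than stuck_after.

    `pods` is a list of plain dicts with "name", "phase", and "created"
    (a timezone-aware datetime). This shape is assumed for the sketch and
    is not the Kubernetes API object.
    """
    return [
        pod["name"]
        for pod in pods
        if pod["phase"] == "Pending" and now - pod["created"] > stuck_after
    ]


now = datetime(2017, 12, 8, 12, 0, tzinfo=timezone.utc)
pods = [
    {"name": "unicorn-1", "phase": "Running", "created": now - timedelta(hours=3)},
    {"name": "unicorn-2", "phase": "Pending", "created": now - timedelta(minutes=30)},
    {"name": "unicorn-3", "phase": "Pending", "created": now - timedelta(minutes=2)},
]
print(find_stuck_pods(pods, now))
```

A real controller would read pods from the API server and would also need the deletion and NodeCondition steps the slide mentions. Those are left out here because the talk gives no detail on them.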
  41. A platform for builders

  42. A platform for builders

  43. None
  44. None
  45. None
  46. None
  47. GitHub Flow

  48. Conventions

  49. $ docker build -t $service:$sha1 -f ./Dockerfile .

  50. $ docker build -t $service:$sha1 -f ./Dockerfile .
      $ kubectl create ns $service-$environment
  51. $ docker build -t $service:$sha1 -f ./Dockerfile .
      $ kubectl create ns $service-$environment
      $ deploy -Rf ./config/kubernetes/$environment | \
  52. $ docker build -t $service:$sha1 -f ./Dockerfile .
      $ kubectl create ns $service-$environment
      $ deploy -Rf ./config/kubernetes/$environment | \
          kubectl -n $service-$environment apply -f -
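
The naming conventions on slides 49-52 boil down to three string templates. A minimal sketch, assuming nothing beyond what the slides show; the function names are made up for illustration, and the `deploy` tool itself is not modeled:

```python
def image_ref(service, sha1):
    """Image name and tag, following the $service:$sha1 convention."""
    return f"{service}:{sha1}"


def namespace(service, environment):
    """Namespace name, following the $service-$environment convention."""
    return f"{service}-{environment}"


def config_dir(environment):
    """Directory of manifests for one environment, as on slide 51."""
    return f"./config/kubernetes/{environment}"


# Example values are placeholders, not taken from the talk.
print(image_ref("unicorn", "abc1234"))
print(namespace("github", "production"))
print(config_dir("production"))
```

With `service="github"` and `environment="production"`, `namespace` yields `github-production`, the namespace name shown on slide 20.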
  53. Create a branch

  54. Add some commits

  55. Open a pull request

  56. Containers built on push, tagged with commit

  57. Iterate and review

  58. None
  59. # config/kubernetes/review-lab
      # updates image field value to $service:$sha1
      # injects a Secret
      # injects an Ingress
  60. $ kubectl create ns review-lab-$branch
      $ kubectl apply -n review-lab-$branch -f -
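
The "updates image field value to $service:$sha1" step on slide 59 might look something like the sketch below. The talk does not show how GitHub's tooling does this, so `set_image` is a hypothetical helper. It assumes a Deployment-shaped dictionary and image names with no registry host or port in front of them.

```python
import copy


def set_image(manifest, service, sha1):
    """Return a copy of a Deployment manifest with the service's image retagged.

    Only containers whose image repository equals `service` are changed, so
    sidecar containers keep their own images. Assumes images are written as
    "name" or "name:tag" with no registry host:port prefix.
    """
    updated = copy.deepcopy(manifest)
    for container in updated["spec"]["template"]["spec"]["containers"]:
        repository = container["image"].split(":", 1)[0]
        if repository == service:
            container["image"] = f"{service}:{sha1}"
    return updated


# Placeholder manifest: container names follow slide 32, image values are invented.
deployment = {
    "kind": "Deployment",
    "metadata": {"name": "unicorn"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "nginx", "image": "nginx:stable"},
                    {"name": "unicorn", "image": "unicorn:placeholder"},
                ]
            }
        }
    },
}

retagged = set_image(deployment, "unicorn", "abc1234")
for container in retagged["spec"]["template"]["spec"]["containers"]:
    print(container["name"], container["image"])
```

Returning a copy rather than mutating in place keeps the checked-in manifest untouched. That is a design choice of this sketch, not something the slides state.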
  61. Deploy

  62. None
  63. None
  64. None
  65. Steady state
      kind: Service
      metadata:
        name: unicorn
      spec:
        selector:
          service: unicorn
      ---
      kind: Pod
      metadata:
        name: unicorn
        labels:
          service: unicorn
          role: production
      Pod contents shown: unicorn
  66. Canary deploy
      kind: Service
      metadata:
        name: unicorn
      spec:
        selector:
          service: unicorn
      ---
      kind: Pod
      metadata:
        name: unicorn
        labels:
          service: unicorn
          role: production
      Pod contents shown: unicorn
      ---
      kind: Pod
      metadata:
        name: unicorn-canary
        labels:
          service: unicorn
          role: canary
      Pod contents shown: unicorn
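
The canary on slide 66 works because the Service selects on `service: unicorn` only, so it matches production and canary pods alike. The sketch below mimics Kubernetes equality-based label matching in plain Python. `selector_matches` is an illustrative helper, not a Kubernetes API.

```python
def selector_matches(selector, labels):
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


service_selector = {"service": "unicorn"}

production_pod = {"service": "unicorn", "role": "production"}
canary_pod = {"service": "unicorn", "role": "canary"}

# Both pods carry service=unicorn, so the Service routes to both.
print(selector_matches(service_selector, production_pod))
print(selector_matches(service_selector, canary_pod))

# A narrower selector that also pins the role would exclude the canary.
production_only = {"service": "unicorn", "role": "production"}
print(selector_matches(production_only, canary_pod))
```

Because the `role` label is not part of the Service selector, it stays free to tell canary pods apart from production pods for other purposes, such as targeting a deploy.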
  67. None
  68. $ kubectl apply \
        --namespace github-production \
        -Rf config/kubernetes/production

  69. None
  70. All of the other services deployed to our Kubernetes clusters can now use this canary workflow

  71. Adopting Kubernetes as a standard platform has made it easier for GitHub SREs to build features that apply to all services, not just github/github

  72. We're encouraging the decomposition of the monolith by providing a first-class experience for newer, smaller services

  73. 2018

  74. None
  75. State

  76. State

  77. Distributed systems often use replication to provide fault tolerance, and can therefore tolerate node failures. However, data gravity is preferred for reducing replication traffic and cold startup latencies.
  78. None
  79. None
  80. Changing our OSS habits

  81. @jnewland
 
 jnewland@github.com