Kubernetes for 14 months

Presentation given at DockerGrunn #10

Small note: don't use kube-lego; use cert-manager instead.

Joshua Peper

February 27, 2019

Transcript

  1. Why Kubernetes
     - Hipster tech
     - Verifai started from scratch
     - Experience in Docker
     - Whole dev setup in Docker
     - Need to be able to hyper scale
  2. Why all at once
     - No infra to replace
     - Starting phase
     - Downtime is not a disaster
     - Should not be able to go down by design
     - Help from OSSO
  3. Current stack (running services)
     - Django Backend
     - Celery workers
     - RabbitMQ
     - Redis
     - Galera / MariaDB
     - Public Docker registry
     - Website
     - Pootle
     - Elasticsearch (ELK)
     - Kibana
     - APM
     - Review env
     - Public docs
     - Sentry
     - PostgreSQL
     - Redis
     - Minio
     - Backup systems
  4. Fix nodes that use storage, for example:
     - Static files (NGINX)
     - Database servers (Galera / MariaDB)
     - Object storage (Minio (S3 compatible))
     - Blob storage (CEPH)
  5. Fix nodes that use storage
     - Problem: pinning means downtime (if that node is down)
     - Minio has options to replicate itself
  6. Monitor your jobs
     - We have a few large jobs
     - Packaging all the data required to train neural networks
     - Lots of IO
     - Lots of DB queries
     - A lot of calculations
     - 100k loops, and memory leaks
     - It literally has taken down our entire cluster a dozen times
  7. But it couldn’t go down? Yes, because of Services:
     - Spreads load to all servers
     - Uses all cores in the cluster
     - Using all DB servers
     - Writing so fast to the disk
     - Cluster down
  8. Pod anti-affinity
     - By default, Kubernetes schedules to the node with the most spare resources
     - All your backend pods can end up running on one node
     - Pods get redistributed when a node goes down
     - Takes 2-5 seconds (for our backend)
     - Anti-affinity also prevents the all-pods-on-one-node issue
  9. Python / Django, TensorFlow / AI, Kubernetes / DevOps, Swift for iOS, Kotlin for Android, C++, PHP / Laravel, Hypervisors / BGP / DevOps, Swift for iOS, Kotlin for Android
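
Slides 4 and 5 talk about fixing (pinning) storage-bound workloads to specific nodes. The slides do not show how this was configured, so the following is only a minimal sketch of one common way to do it: a `nodeSelector` on the well-known `kubernetes.io/hostname` node label. The function names, the `static-files` name, the `nginx:stable` image and the `node-1` hostname are all hypothetical.

```python
import json


def pinned_pod_spec(image: str, node_name: str) -> dict:
    """Pod spec that may only be scheduled on the node with this hostname."""
    return {
        # kubernetes.io/hostname is a standard label set on every node.
        "nodeSelector": {"kubernetes.io/hostname": node_name},
        "containers": [{"name": "app", "image": image}],
    }


def pinned_deployment(name: str, image: str, node_name: str) -> dict:
    """Single-replica Deployment whose pod is pinned to one node."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": pinned_pod_spec(image, node_name),
            },
        },
    }


# Hypothetical values, for illustration only.
manifest = pinned_deployment("static-files", "nginx:stable", "node-1")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied like any other manifest. As slide 5 points out, the trade-off is that the pod cannot be rescheduled elsewhere while `node-1` is down.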
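
Slides 6 and 7 describe a large packaging job that used every core in the cluster and took it down. The talk does not say how this was mitigated; as an assumption, one standard guard is to give the job explicit CPU and memory limits and a deadline. The sketch below builds such a `batch/v1` Job manifest. The function name, the image, the command and all numbers are hypothetical.

```python
import json


def bounded_job(name, image, command, cpu="2", memory="4Gi",
                deadline_seconds=3600):
    """Job manifest with CPU/memory limits and a hard time limit."""
    resources = {
        # Requests equal to limits, so the scheduler reserves what the job may use.
        "requests": {"cpu": cpu, "memory": memory},
        "limits": {"cpu": cpu, "memory": memory},
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed run automatically
            "activeDeadlineSeconds": deadline_seconds,  # terminate a runaway job
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": name,
                        "image": image,
                        "command": list(command),
                        "resources": resources,
                    }],
                }
            },
        },
    }


# Hypothetical values, for illustration only.
job = bounded_job("package-training-data",
                  "registry.example.com/packager:latest",
                  ["python", "package.py"])
print(json.dumps(job, indent=2))
```

These limits only cap CPU and memory for the job's container. The database and disk pressure mentioned on slide 7 are not bounded by them and would need separate handling.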
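
Slide 8 names pod anti-affinity as the way to keep all backend pods from landing on one node. The slide does not show the actual configuration, so this is a sketch of what such a rule can look like, assuming the backend pods carry a hypothetical `app: backend` label. It builds the `affinity` section of a pod spec, either as a hard rule or as a soft preference.

```python
import json


def anti_affinity(match_labels: dict, required: bool = True) -> dict:
    """Build a pod spec 'affinity' section that spreads matching pods over nodes."""
    term = {
        "labelSelector": {"matchLabels": dict(match_labels)},
        # One pod per distinct value of this node label, i.e. one per node.
        "topologyKey": "kubernetes.io/hostname",
    }
    if required:
        # Hard rule: a pod stays Pending rather than share a node with a match.
        rules = {"requiredDuringSchedulingIgnoredDuringExecution": [term]}
    else:
        # Soft rule: the scheduler tries to spread pods but may still co-locate them.
        rules = {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": 100, "podAffinityTerm": term}
            ]
        }
    return {"podAntiAffinity": rules}


# Hypothetical label, for illustration only.
affinity = anti_affinity({"app": "backend"}, required=False)
print(json.dumps(affinity, indent=2))
```

The result goes under `spec.template.spec.affinity` of the backend Deployment. The hard variant needs at least as many schedulable nodes as replicas; the soft variant is the safer choice on a small cluster.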