Lessons learned from 1 year of Kubernetes in production

Amanpreet Singh

July 30, 2017

Transcript

  1. Lessons learned from 1 year of Kubernetes in production at Crowdfire

    30 Jul 2017. Amanpreet Singh, Software Engineer, Crowdfire.
  2. What the stack looks like

    50+ microservices written in Java, Node.js, Python, Go. Mostly stateless (except the Chat service). Heavy use of Amazon SQS to decouple some parts. Data stores: DynamoDB, MongoDB, Elasticsearch, Aerospike, MySQL, Postgres, Redis, S3.
  3. Before Kubernetes

    State of things before Kubernetes came along: Google App Engine ---> AWS Monolith ---> Microservices. Lots of services == lots of AWS Elastic Beanstalk environments. Slow, less-repeatable deploys. Low self-healing ability. Underutilization.
  4. Before Kubernetes

    Why it made sense to move to Kubernetes: our architecture fits pretty well in the Kubernetes world. Containers are good for packaging == repeatability++. Unified pool of resources & bin-packing == utilization++. Quick container restarts + rescheduling == self-healing++.
  5. How we migrated

    Have 12-factor apps! Containerize all the things! To move everything at once or one-by-one?
  6. How we migrated

    Lots of benefits from moving the initial few services, not so much after that. Move a relatively less important service first, to deal with the unknowns. Move a complex service: if that works, everything else would work too. Supporting services, that hardly do anything now.
  7. Service discovery (internal DNS) pitfalls

    k8s does service discovery via pre-populated env vars or internal DNS. Service IPs don't change unless we delete and recreate the service. Use internal DNS only when we need the pod IPs directly (in DBs, for example). Protip: create a service of type ExternalName, an easy way to set an alias that can be resolved via KubeDNS.
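The two discovery mechanisms on this slide can be sketched from the application side. This is a minimal illustration, not the speaker's code: the service name `chat-service`, the `default` namespace and the `cluster.local` cluster domain are assumptions, and resolving pod IPs over DNS assumes a headless service and only works from inside a cluster.

```python
import os
import socket


def service_env_prefix(service_name: str) -> str:
    """Kubernetes upper-cases the service name and turns dashes into underscores."""
    return service_name.upper().replace("-", "_")


def service_address_from_env(service_name: str, environ=os.environ):
    """Return (host, port) from the pre-populated env vars, or None if absent."""
    prefix = service_env_prefix(service_name)
    host = environ.get(f"{prefix}_SERVICE_HOST")
    port = environ.get(f"{prefix}_SERVICE_PORT")
    if host is None or port is None:
        return None
    return host, int(port)


def service_dns_name(service_name: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name (namespace and domain are assumed defaults)."""
    return f"{service_name}.{namespace}.svc.{cluster_domain}"


def pod_ips(service_name: str, namespace: str = "default") -> list[str]:
    """Resolve a headless service to its pod IPs; needs in-cluster DNS to work."""
    infos = socket.getaddrinfo(
        service_dns_name(service_name, namespace), None, proto=socket.IPPROTO_TCP
    )
    return sorted({info[4][0] for info in infos})
```

An app would typically try `service_address_from_env("chat-service")` first and only fall back to `pod_ips(...)` when it really needs the individual pod addresses.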
  8. Are your apps Kubernetes-ready?

    Running an app in k8s doesn't magically make it awesome. Make sure our apps have good healthchecks: k8s won't deploy bad code if you have failing healthchecks! Gracefully handle shutdown.
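A healthcheck endpoint plus graceful shutdown could look roughly like the sketch below. The `/healthz` path, port 8080 and the 5-second drain window are assumptions chosen for illustration; a real service would also check its own dependencies before answering 200.

```python
import signal
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once SIGTERM arrives; the health endpoint then starts failing on purpose.
shutting_down = threading.Event()


def health_response(path: str, is_shutting_down: bool) -> tuple[int, bytes]:
    """Decide the status code and body for a request (hypothetical /healthz path)."""
    if path != "/healthz":
        return 404, b"not found"
    if is_shutting_down:
        return 503, b"shutting down"
    return 200, b"ok"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = health_response(self.path, shutting_down.is_set())
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def handle_sigterm(signum, frame):
    # Kubernetes sends SIGTERM before SIGKILL; start draining instead of dying.
    shutting_down.set()


def serve(port: int = 8080, drain_seconds: float = 5.0) -> None:
    """Serve until SIGTERM, then keep answering for drain_seconds before exiting."""
    signal.signal(signal.SIGTERM, handle_sigterm)
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    server.timeout = 0.5  # lets the loop re-check the shutdown flag regularly
    deadline = None
    while deadline is None or time.monotonic() < deadline:
        server.handle_request()
        if deadline is None and shutting_down.is_set():
            deadline = time.monotonic() + drain_seconds
    server.server_close()
```

Calling `serve()` from the main thread starts the loop. During the drain window the endpoint answers 503, which gives the cluster time to stop routing traffic to the pod before the process exits.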
  9. Resource constraints

    Constrain all the things! Keep all those leaky apps from wreaking havoc. Choose an appropriate QoS class based on service type/priority: Guaranteed, Burstable or Best-Effort.
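How the three QoS classes follow from requests and limits can be sketched as a small classifier over a pod's container specs. This is a simplified illustration: it compares quantities as plain strings (so `"500m"` and `"0.5"` would count as different), and the dict layout mirrors the `resources` section of a container spec.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers: list[dict]) -> str:
    """Classify a pod from its containers' resource requests and limits."""
    any_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        if requests or limits:
            any_set = True
        for name in RESOURCES:
            limit = limits.get(name)
            # When only a limit is given, Kubernetes defaults the request to it.
            request = requests.get(name, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

For example, a container with only a CPU request lands in `Burstable`, while one with identical CPU and memory requests and limits lands in `Guaranteed`.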
  10. Surviving Failures

    Have enough pods to survive multiple node/pod failures. Did you know? AWS provides a CMAAS (Chaos Monkey As a Service). It's called "running in US-EAST-1". Have at least one more node than required, since a new node takes a while to come up.
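One way to reason about "enough pods" is a worst-case check: assume the nodes that fail are the ones hosting the most replicas and see whether the minimum still holds. The function and its parameters are hypothetical names for illustration only.

```python
def survives_node_failures(pods_per_node: list[int], failed_nodes: int,
                           min_pods: int) -> bool:
    """Worst case: the nodes that fail are the ones hosting the most pods."""
    surviving = max(len(pods_per_node) - failed_nodes, 0)
    # Sorting ascending and keeping the first entries drops the fullest nodes.
    remaining = sorted(pods_per_node)[:surviving]
    return sum(remaining) >= min_pods
```

With replicas spread as `[2, 2, 2]`, losing any two nodes still leaves two pods, whereas `[4, 1, 1]` drops to two pods after losing a single node.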
  11. Logging & Monitoring

    Since containers and logs are ephemeral, ship logs quickly! K8S creates symlinks to the actual docker logs, with useful info in the filenames: POD-NAME_NAMESPACE_CONTAINER-NAME_CONTAINER-ID.log. Be sure to monitor pod restarts! Check if it was OOM Killed, App Error or Healthcheck failure. To run logging & monitoring agents, use DaemonSets: https://github.com/ApsOps/filebeat-kubernetes
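The metadata in those log filenames can be pulled out with a small parser, sketched below. It assumes pod and namespace names contain no underscores and that the container ID is lowercase hex. As a precaution it accepts either `_` (as written on the slide) or `-` before the container ID; that leniency is an assumption of this sketch, and the sample filenames are made up.

```python
import re
from typing import Optional

# pod _ namespace _ container (separator) container-id .log
LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_"
    r"(?P<container>.+)[-_](?P<container_id>[0-9a-f]+)\.log$"
)


def parse_log_filename(filename: str) -> Optional[dict]:
    """Return pod, namespace, container and container_id, or None if no match."""
    match = LOG_NAME.match(filename)
    return match.groupdict() if match else None
```

A log shipper could attach these fields to every line it forwards, so the logs stay searchable after the pod is gone.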
  12. Stateful applications

    K8S has StatefulSets (previously PetSets) which are pretty awesome. This can get tricky though: attaching EBS volumes to nodes may not always work as quickly as we expect it to. Members coming and going are generally costly operations for most of the data stores. Bottom line: we don't have to go all in with k8s. Evaluate your use-cases for persistent workloads, and have enough replicas.
  13. Kubernetes Alpha resources

    Be careful when using k8s alpha resources: they're alpha for a reason. CronJobs (prev. ScheduledJobs) had lots of missed schedules for us.
  14. Sticky sessions

    Since k8s services are L3/L4 based, they can't see the headers. k8s has sessionAffinity, but it can't see the actual client IP. Solution that just works: ELB w/ ProxyProtocol enabled --> intermediary nginx --> websocket app.
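To show what the PROXY protocol contributes in that chain, here is a rough sketch that parses a version 1 header line (the human-readable form) and then pins a client IP to a backend by hashing. The hashing step is an assumption for illustration, not a description of how the intermediary nginx was configured, and the addresses and backend names are made up.

```python
import hashlib


def parse_proxy_v1(header: bytes) -> dict:
    """Parse one PROXY protocol v1 line: PROXY TCP4 src dst srcport dstport CRLF."""
    if not header.endswith(b"\r\n"):
        raise ValueError("header must end with CRLF")
    parts = header[:-2].decode("ascii").split(" ")
    if parts[0] != "PROXY":
        raise ValueError("not a PROXY protocol v1 header")
    if len(parts) >= 2 and parts[1] == "UNKNOWN":
        return {"protocol": "UNKNOWN"}
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed PROXY protocol v1 header")
    return {
        "protocol": parts[1],
        "src_ip": parts[2],   # the real client address
        "dst_ip": parts[3],
        "src_port": int(parts[4]),
        "dst_port": int(parts[5]),
    }


def pick_backend(client_ip: str, backends: list[str]) -> str:
    """Map a client IP to the same backend every time (simple hash-based pinning)."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

Because the header carries the original source address, the layer behind the load balancer can make its stickiness decision on the real client IP rather than on the load balancer's address.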
  15. After Kubernetes

    Ease of managing lots of services. Deploys are super fast. Much better resource utilization. Self-healing services.