

Our fantastic journey through the Cloud Native World.

From our k8s meetup on 30 July 2019

https://www.meetup.com/dreamIT-Hamburg/events/263190198/

We would like to tell you about the result of our journey through the world of Kubernetes and our successful launch at the beginning of 2019.
We look back on our path to Kubernetes adoption and describe the hurdles we had to overcome to run our applications on Kubernetes in a stable and scalable way. We introduce the tools we use from the CNCF ecosystem, show how we use them, and share some insights, lessons learned, and experiences we have had with these tools.

dreamIT

July 30, 2019



Transcript

1. Our fantastic journey. Lessons learned: our adoption of Kubernetes, cloud native tools and thinking. André Veelken, DevOps Engineer @ DreamIT
2. Overview (1/3)
  • The beginning
  • The challenge of running a Java monolith on Kubernetes
  • Changing the mindset
  • Our favourite k8s distro: kops
  • The big switch
  • Elasticsearch on Kubernetes
  • Fluentd & Fluent Bit
3. Overview (2/3)
  • Helm: organizing the YAML mess
  • Security
  • Monitoring: Prometheus with prometheus-operator and kube-prometheus
  • Infrastructure testing
  • Authentication & authorization
  • GitLab CI/CD
4. The beginning
  • Three Kubernetes clusters at the start: dev, CI/CD, prod; version 1.7
  • First microservice went live mid-2017, second and third in December 2017
  • "kubectl apply -f" deployments
  • No RBAC, manual kubeconfig handling
  • OS: Debian; Fluentd, Traefik ingress, Sysdig monitoring
  • Hard to convince the business and to get developer time for the cloud/k8s project
5. Running a Java monolith on k8s
  • State is hard
  • Payara, Hazelcast ("hasslecast")
  • Migrating a classic three-tier architecture
  • Pods take a long time to warm up and become ready (see the probe sketch below)
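A minimal sketch of the kind of knob this comes down to: a readiness probe with a generous initial delay, so the monolith is not put behind its Service before it has warmed up. Image name, health path and timings are illustrative, not our actual manifest.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: monolith                                  # hypothetical name
spec:
  containers:
    - name: monolith
      image: registry.example.com/monolith:1.0.0  # placeholder image
      ports:
        - containerPort: 8080
      readinessProbe:
        httpGet:
          path: /health                           # hypothetical health endpoint
          port: 8080
        initialDelaySeconds: 120                  # warm-up takes minutes, not seconds
        periodSeconds: 10
        failureThreshold: 6
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 300                  # keep liveness patient during slow startup
        periodSeconds: 20
```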
6. Changing the mindset
  • Pods are mortal and short-lived -> more disruption (see the PodDisruptionBudget sketch after this list)
  • Cattle, not pets
  • More flexibility for devs
  • Several months of prod deployments to both the old environment and k8s -> gain trust
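One concrete piece of that mindset, sketched below: a PodDisruptionBudget that tells Kubernetes how much voluntary disruption an application can tolerate during node drains and rolling maintenance. Names and numbers are illustrative; the API version matches k8s 1.12-era clusters.

```yaml
apiVersion: policy/v1beta1           # PDB API version for clusters of that generation
kind: PodDisruptionBudget
metadata:
  name: example-service-pdb          # hypothetical name
spec:
  minAvailable: 2                    # keep at least two pods running during voluntary disruptions
  selector:
    matchLabels:
      app: example-service
```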
7. Kops — our k8s distro of choice
  • Well tested, rock solid
  • Ships rather old k8s versions
  • No problems with k8s updates, even to 1.12 with etcd3
  • One cluster (about a year old) died last year
  • Container Linux with CLUO (the Container Linux Update Operator)
  • Spin up a test cluster via CI (see the job sketch below)
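A rough sketch of a CI job that spins up a throwaway test cluster with kops. The state store, zone, cluster name and surrounding pipeline wiring are assumptions; only the kops commands themselves are standard usage.

```yaml
# GitLab CI job sketch: create, validate and tear down a disposable kops cluster
create-test-cluster:
  stage: test
  script:
    - export KOPS_STATE_STORE=s3://example-kops-state       # placeholder state store bucket
    - export NAME=test-$CI_PIPELINE_ID.k8s.local             # gossip-based cluster name per pipeline
    - kops create cluster --name "$NAME" --zones eu-central-1a --node-count 2 --yes
    - kops validate cluster --name "$NAME"
    # run infrastructure tests against the fresh cluster here, then tear it down
    - kops delete cluster --name "$NAME" --yes
```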
8. The big switch — moving to the cloud
  • Went smoothly in general
  • Problems with secrets and Helm (chicken-or-egg problem)
  • Prometheus and Elasticsearch collapsed several times under load
9. Elasticsearch on Kubernetes
  • We use the chart from the helm/stable repo
  • Recommended!
  • Almost no reason to run an ES cluster outside of k8s
  • Exception: logs were lost when the cluster crashed
10. Fluentd & Fluent Bit
  • Fluentd in use since the beginning
  • Need for encrypted log shipping
  • Complex filter chain (see the sketch below)
  • Lots of trial and error
  • Poor documentation
  • Fluent Bit evaluated a short time ago
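As a rough illustration of what such a filter chain looks like: tail container logs, enrich them with Kubernetes metadata, drop noise, and ship them over TLS. The plugin names (tail, kubernetes_metadata, grep, elasticsearch) are real Fluentd plugins, but the paths, patterns, hostnames and the ConfigMap layout are illustrative, not our production config.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config               # hypothetical name
  namespace: logging
data:
  fluent.conf: |
    # Tail container logs on the node
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.pos
      tag kubernetes.*
      <parse>
        @type json
      </parse>
    </source>

    # Enrich records with pod and namespace metadata
    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>

    # Drop noisy health-check lines
    <filter kubernetes.**>
      @type grep
      <exclude>
        key log
        pattern /healthz/
      </exclude>
    </filter>

    # Ship to Elasticsearch over TLS ("encrypted logs")
    <match kubernetes.**>
      @type elasticsearch
      host elasticsearch.logging.svc   # placeholder endpoint
      port 9200
      scheme https
      logstash_format true
    </match>
```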
11. Helm
  • We use v2.13.1 at the moment
  • Complex deployments
  • Rollback feature & installation of previous versions are broken
  • Deployments end up in broken states
  • No real alternatives; kustomize, for example, does something different
  • Hopes for Helm 3
12. Security
  • RBAC
  • No root user in containers (see the securityContext sketch below)
  • No Kubernetes dashboard / web UI
  • AWS audit trail: CloudTrail with alerting
  • Clair container vulnerability scanning in CI
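A minimal sketch of how "no root in containers" can be enforced at the pod level. The image, names and UID are placeholders; only the securityContext fields are the point.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service                  # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
    spec:
      securityContext:
        runAsNonRoot: true               # refuse to start containers running as UID 0
        runAsUser: 10001                 # illustrative non-root UID
      containers:
        - name: app
          image: registry.example.com/example-service:1.0.0   # placeholder image
          securityContext:
            allowPrivilegeEscalation: false
```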
13. Monitoring: Prometheus with prometheus-operator & kube-prometheus
  • The CoreOS Prometheus operator runs Prometheus, Grafana and Alertmanager (a ServiceMonitor sketch follows below)
  • Initially installed with Helm — painful
  • Better solution: jsonnet
  • Rolled out via CI
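For context, a minimal ServiceMonitor sketch, the custom resource the Prometheus operator uses to discover scrape targets. The names, labels and namespaces are illustrative, not our actual configuration.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-service              # hypothetical name
  namespace: monitoring
  labels:
    release: prometheus              # must match the operator's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: example-service           # selects the Service that exposes the metrics port
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: metrics                  # named port on the Service
      interval: 30s
      path: /metrics
```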
14. Authentication & authorization
  • At first: manual management of kubeconfig files
  • Then: Dex (OpenID Connect provider), kuberos, OAuth proxy -> RBAC authorization; authentication through GitLab
  • Now: Heptio Gangway, Keycloak -> RBAC authorization; authentication through GitLab (see the binding sketch below)
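A minimal sketch of the RBAC side: binding a group claim coming from the OIDC provider to a role. The binding name, group name and prefix are placeholders (the actual prefix depends on the API server's OIDC flags).

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: developers-view              # hypothetical binding name
subjects:
  - kind: Group
    name: oidc:developers            # group claim issued by Keycloak/Dex (placeholder)
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view                         # built-in read-only ClusterRole
  apiGroup: rbac.authorization.k8s.io
```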
15. GitLab
  • Single sign-on for our Karma services, e.g. Kibana, Grafana and Gangway
  • Most rollouts go through Helm-based CI pipelines into different clusters (see the job sketch below)
  • Docker builds
  • Test cluster setup
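A rough sketch of what such a pipeline can look like. Stage names, the chart path and the release name are assumptions; the predefined GitLab CI variables and the Helm v2 flags are standard usage, not our exact jobs.

```yaml
# .gitlab-ci.yml (sketch)
stages:
  - build
  - deploy

build-image:
  stage: build
  script:
    # Build and push the service image; registry variables are GitLab CI defaults
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

deploy-dev:
  stage: deploy
  environment: dev
  script:
    # Roll out the chart into the dev cluster (Helm v2 style)
    - >
      helm upgrade --install example-service ./chart
      --namespace example
      --set image.tag="$CI_COMMIT_SHORT_SHA"
      --wait
```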
16. Cluster networking: CNI
  • Weave in the beginning
  • Then: canal (Calico and Flannel combined)
  • No use of Calico features for a long time
  • Special case: the GCE cluster uses kubenet; a switch to Weave (a real CNI) is planned, see https://github.com/kubernetes/kops/issues/2087
17. DNS inside and outside of clusters
  • Inside clusters:
    • kube-dns for a long time
    • 5-second responses from time to time
    • Switched to CoreDNS
    • Better performance and the autopath plugin
  • Outside clusters:
    • external-dns by Zalando -> Route53, CloudFlare
    • Just annotate your deployment (see the sketch below)
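A minimal sketch of the "just annotate" part: external-dns watches Services (and Ingresses) for this annotation and creates the matching records in Route53 or CloudFlare. The service name and hostname are placeholders.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-service                                           # hypothetical name
  annotations:
    external-dns.alpha.kubernetes.io/hostname: app.example.com    # placeholder DNS record
spec:
  type: LoadBalancer
  selector:
    app: example-service
  ports:
    - port: 80
      targetPort: 8080
```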
18. Ingress & SSL
  • Traefik in the beginning
  • Switched to nginx-ingress before the big migration
  • Rock solid, better performance than Traefik
  • cert-manager by Jetstack
  • Let's Encrypt usage (see the sketch below)
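A sketch of the cert-manager + nginx-ingress combination: an Ingress annotated so cert-manager obtains a Let's Encrypt certificate for it. The issuer name, host and API versions are assumptions (older cert-manager releases used the certmanager.k8s.io annotation prefix instead).

```yaml
apiVersion: extensions/v1beta1               # Ingress API of that k8s generation
kind: Ingress
metadata:
  name: example-service                      # hypothetical name
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod   # assumed ClusterIssuer name
spec:
  tls:
    - hosts:
        - app.example.com                    # placeholder host
      secretName: app-example-com-tls        # cert-manager writes the certificate here
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: example-service
              servicePort: 80
```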
19. Conclusion (1/2)
  • Better scaling: horizontal pod autoscaler (HPA) and aws-autoscaler (more nodes); see the HPA sketch below
  • Self-healing, but tuning apps and load testing took a lot of effort
  • Far better operations even though many 12-factor rules are neglected
  • It's always good to have a failover cluster in place
  • Databases are located outside of Kubernetes
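A minimal HPA sketch for the pod side of that scaling (the node autoscaler then adds machines when pending pods no longer fit). Names and thresholds are illustrative.

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: example-service                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-service
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70     # scale out when average CPU usage passes 70%
```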
20. Conclusion (2/2)
  • No network problems anymore
  • Just scale with customer traffic
  • Six full-time DevOps engineers and one dev team needed for the project
  • Engaging with the community helps a lot
  • Steep learning curve with new tools
21. The future
  • Pod security policies
  • Network policies (see the sketch below)
  • Anomaly detection in pods
  • More microservices
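For the network policies item, a minimal sketch of the usual starting point: a default-deny ingress policy per namespace, which requires a CNI that enforces policies (e.g. the Calico part of canal). Purely illustrative; the namespace is a placeholder.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: example                 # placeholder namespace
spec:
  podSelector: {}                    # selects all pods in the namespace
  policyTypes:
    - Ingress                        # no ingress rules listed -> all inbound traffic is denied
```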
22. Sources
  • Pictures from https://commons.wikimedia.org
  • kops: https://github.com/kubernetes/kops
  • Fluentd: https://www.fluentd.org/
  • Elasticsearch chart: https://github.com/helm/charts/tree/master/stable/elasticsearch