Slide 1

Kubernetes, Ingress and Traefik Usage at CERN
Ricardo Rocha, CERN Cloud Team

Slide 2

About
Computing engineer in the CERN cloud team
Focusing on containers, Kubernetes and networking; accelerators and ML
Previous work in storage and the WLCG (Worldwide LHC Computing Grid)
@ahcorporto / [email protected]

Slide 3

Fundamental Science
Founded in 1954
What is 96% of the universe made of?
Why isn't there anti-matter in the universe?
What was the state of matter just after the Big Bang?

Slide 4

No content

Slide 5

No content

Slide 6

No content

Slide 7

No content

Slide 8

No content

Slide 9

No content

Slide 10

No content

Slide 11

No content

Slide 12

No content

Slide 13

~70 PB/year
700 000 Cores
~400 000 Jobs
~30 GiB/s
200+ Sites

Slide 14

Computing at CERN: increased numbers, increased automation (1970s, 2007)

Slide 15

Computing at CERN: increased numbers, increased automation (1970s, 2007)

Slide 16

Computing at CERN: increased numbers, increased automation (1970s, 2007)

Slide 17

Computing at CERN: increased numbers, increased automation (1970s, 2007)

Slide 18

                 Physical Infrastructure   Cloud API / Virtualization   Containers
Provisioning     Days or Weeks             Minutes                      Seconds
Deployment       Minutes or Hours          Minutes or Hours             Seconds
Update           Minutes or Hours          Minutes or Hours             Seconds
Utilization      Poor                      Good                         Very Good
Maintenance      Highly Intrusive          Potentially Less Intrusive   Less Intrusive

Slide 19

Simplified Infrastructure: Monitoring, Lifecycle, Alarms
Simplified Deployment: Uniform API, Replication, Load Balancing
Periodic Load Spikes: International Conferences, Reprocessing Campaigns

Slide 20

Use Cases

Slide 21

ATLAS Event Filter
40 million particle interactions / second
1 PB/sec in, < 10 GB/sec out
Typically split into hardware and software filters (this might change too)
~3000 multi-core nodes, ~30,000 applications to supervise
Critical system: sustained failure means data loss
Can it be improved for Run 4?
Study in 2017 (Mattia Cadeddu, Giuseppe Avolio), Kubernetes 1.5.x
A new evaluation phase to be tried this year

Slide 22

How to efficiently distribute experiment software?
> 200 sites in our computing grid, ~400 000 concurrent jobs
Frequent software releases, 100s of GBs
CernVM-FS (cvmfs): a read-only, hierarchical filesystem
In production for several years, battle tested, solved problem
Now with containers? Can they carry all required software?

Slide 23

Docker images of ~10 GB+, poorly layered, frequently updated
Clusters of 100s of nodes
Can we have lazy image distribution? And file-level granularity? And caches?
containerd Remote Snapshotter: https://bit.ly/3bdkLmh

Slide 24

Deep Learning for Fast Simulation
Simulation is one of our major computing workloads, x100 soon as described earlier
Can we easily distribute it to reduce training time?
Sofia Vallecorsa, CERN OpenLab
Konstantinos Samaras-Tsakiris

Slide 25

ATLAS Production System
Running a Grid site is not trivial. We have > 200 of them
Multiple components for storage and compute
Lots of history in the software
Can a Kubernetes endpoint be a Grid site?
Fernando Barreiro Megino, Fahui-Lin, Mandy Yang (ATLAS Distributed Computing)

Slide 26

Test cluster with 2000 cores
1st attempt to ramp up: K8s master running on a medium VM, master killed (OOM) on Saturday
Good: initial results show error rates comparable to any other site
Improvements: defaults on the scheduler were causing inefficiencies
Pack vs Spread, Affinity, Predicates, Weights, a Custom Scheduler?
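
One way to nudge the default spread behaviour toward packing, mentioned above under "Pack vs Spread" and "Affinity", is per-workload pod affinity rather than a custom scheduler. The sketch below is illustrative only: the Deployment name, labels and payload image are placeholders, not the actual ATLAS job setup.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grid-jobs                  # placeholder name
spec:
  replicas: 100
  selector:
    matchLabels:
      app: grid-job
  template:
    metadata:
      labels:
        app: grid-job
    spec:
      affinity:
        podAffinity:
          # prefer nodes already running pods of this workload,
          # i.e. pack rather than spread
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: grid-job
              topologyKey: kubernetes.io/hostname
      containers:
      - name: payload
        image: busybox             # placeholder payload
        command: ["sleep", "3600"]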

Slide 27

No content

Slide 28

No content

Slide 29

Cluster as a Service
API / CLI to create, update, resize, delete clusters
Ability to pass labels for custom clusters (specific features, versions, components)

openstack coe cluster template list
| kubernetes-1.17.9-2 |
| kubernetes-1.18.6-3 |

openstack coe cluster create --cluster-template kubernetes-1.18.6-3 \
  --node-count 10 --flavor m2.large \
  --labels nvidia_gpu_enabled=true \
  mytestcluster

openstack coe nodegroup create \
  --label availability_zone=cern-geneva-a --node-count 3 ...

Slide 30

Common Stack
Fedora CoreOS as the base, an immutable OS
containerd / runc as the container runtime, relying on CRI
Kubernetes as the container orchestrator
Fluentd for log collection and aggregation
Prometheus for monitoring and metric collection

Slide 31

Ingress: Traefik
Traefik has been our default ingress controller from day one
Great integration, healthy community and feedback
Covered all our initial use cases

Slide 32

Traefik and Ingress
[Architecture diagram: a sync component watches Ingress objects and watches/sets the cluster nodes, selecting the role=ingress nodes and updating DNS (network DB) so myservice.cern.ch resolves to them]

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myservice-ingress
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: myservice.cern.ch
    http:
      paths:
      - path: /
        backend:
          serviceName: myservice
          servicePort: 8080
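
The controller side of the diagram can be sketched as a Traefik 1.x DaemonSet pinned to the role=ingress nodes with host networking, so the DNS alias can point straight at those nodes. This is a minimal, assumed manifest: the namespace, image tag, service account (RBAC omitted) and flags are placeholders, not the exact CERN deployment.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: traefik-ingress-controller
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      # only land on the nodes the DNS sync publishes for myservice.cern.ch
      nodeSelector:
        role: ingress
      hostNetwork: true
      serviceAccountName: traefik-ingress-controller
      containers:
      - name: traefik
        image: traefik:v1.7
        args:
        - --kubernetes
        - --defaultentrypoints=http,https
        - --entrypoints=Name:http Address::80
        - --entrypoints=Name:https Address::443 TLS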

Slide 33

Ingress: Traefik and Simple HTTP
The simple Ingress definition covers most of our use cases
In most cases SSL termination is good enough

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myservice-ingress-tls
  annotations:
    kubernetes.io/ingress.class: traefik
    traefik.ingress.kubernetes.io/frontend-entry-points: https
spec:
  rules:
  - host: myservice.cern.ch
    http:
      paths:
      - path: /
        backend:
          serviceName: myservice
          servicePort: 8080
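
When a host needs its own certificate rather than the controller's default one, the standard Ingress tls section can reference a pre-created secret, extending the spec of the manifest above. This is a sketch assuming a kubernetes.io/tls secret named myservice-tls already exists in the namespace; it is not shown in the slides.

spec:
  tls:
  - hosts:
    - myservice.cern.ch
    secretName: myservice-tls      # pre-created TLS secret (assumed name)
  rules:
  - host: myservice.cern.ch
    http:
      paths:
      - path: /
        backend:
          serviceName: myservice
          servicePort: 8080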

Slide 34

Ingress: Traefik and ACME / Let's Encrypt
Easy, popular solution
The DNS challenge is not yet an option: no API available to update TXT records
We rely on the HTTP-01 challenge
This requires a firewall opening to get a certificate, which is not ideal
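
For reference, the HTTP-01 setup boils down to an acme section in the Traefik 1.x configuration. The sketch below wraps it in a ConfigMap; the email address, storage path and entry point layout are assumptions for illustration, not our production values.

apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-conf
  namespace: kube-system
data:
  traefik.toml: |
    defaultEntryPoints = ["http", "https"]

    [entryPoints]
      [entryPoints.http]
        address = ":80"
      [entryPoints.https]
        address = ":443"
        [entryPoints.https.tls]

    [acme]
      email = "certificates@example.ch"   # placeholder contact
      storage = "/acme/acme.json"
      entryPoint = "https"
      [acme.httpChallenge]
        # HTTP-01: port 80 must be reachable from outside,
        # hence the firewall opening mentioned above
        entryPoint = "http"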

Slide 35

Ingress: Traefik and Client Certificate
Some services require info from the client certificate
Annotations start being larger than the core Ingress resource definition

annotations:
  kubernetes.io/ingress.class: traefik
  traefik.ingress.kubernetes.io/frontend-entry-points: https
  traefik.ingress.kubernetes.io/pass-client-tls-cert: |
    pem: true
    infos:
      notafter: true
      notbefore: true
      sans: true
      subject:
        country: true
        province: true
        locality: true
        organization: true
        commonname: true
        serialnumber: true

Slide 36

Ingress: Other Requirements
SSL Passthrough
Exposing TCP ports
HTTP header based redirection
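
These gaps are part of the reason for moving to Traefik 2.x (next slide), where raw TCP routing and SSL passthrough are handled by the IngressRouteTCP custom resource. A minimal sketch, with the entry point and service names as placeholders:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: myservice-tcp
spec:
  entryPoints:
  - mytcp                        # a TCP entry point defined in the Traefik 2.x static configuration
  routes:
  - match: HostSNI(`*`)
    services:
    - name: myservice
      port: 8080
  tls:
    passthrough: true            # hand the TLS stream to the backend untouched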

Slide 37

Conclusion + Next Steps
Traefik has been very stable in our deployments
Most used ingress controller by far: almost 400 clusters using it at CERN
We need to move Traefik to 2.0 (yes, we are still on 1.x)
Integrate Ingress with our external LBs: using a VIP, no DNS
Monitor developments of the new Service APIs: https://github.com/kubernetes-sigs/service-apis

Slide 38

Webinars https://clouddocs.web.cern.ch/containers/training.html#webinars

Slide 39

Questions?