Slide 1

Slide 1 text

Infrastructure at CERN Scale Ricardo Rocha - CERN Cloud Team @ahcorporto [email protected]

Slide 2

Slide 2 text

Founded in 1954. Fundamental Science: What is 96% of the universe made of? Why isn’t there anti-matter in the universe? What was the state of matter just after the Big Bang?

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

~70 PB/year 700 000 Cores ~400 000 Jobs ~30 GiB/s 200+ Sites

Slide 13

Slide 13 text

Computing at CERN Increased numbers, increased automation 1970s 2007

Slide 14

Slide 14 text

Computing at CERN Increased numbers, increased automation 1970 2007

Slide 15

Slide 15 text

Computing at CERN Increased numbers, increased automation 1970 2007

Slide 16

Slide 16 text

Computing at CERN Increased numbers, increased automation 1970 2007

Slide 17

Slide 17 text

CERN IT Today 200+ people Storage, Computing and Monitoring, Databases, Network, DC, … Batch Systems and Core Physics Services But also campus services: hotel, bike service, wifi, … Common for teams to work in 2-week sprints, even for operations Rota system per team ServiceNow for end user support tickets

Slide 18

Slide 18 text

Physical Infrastructure: Provisioning Days or Weeks | Deployment Minutes or Hours | Update Minutes or Hours | Utilization Poor | Maintenance Highly Intrusive

Slide 19

Slide 19 text

Physical Infrastructure: Provisioning Days or Weeks | Deployment Minutes or Hours | Update Minutes or Hours | Utilization Poor | Maintenance Highly Intrusive
Cloud API / Virtualization: Provisioning Minutes | Deployment Minutes or Hours | Update Minutes or Hours | Utilization Good | Maintenance Potentially Less Intrusive

Slide 20

Slide 20 text

OpenStack Private Cloud 3 Separate Regions (Main, Batch, Point 8) Scalability, Rolling Upgrades Regions split in multiple Cells Often matching hardware deliveries Different configurations and capabilities Single hypervisor type (KVM, used to have Hyper-V as well) (Diagram: Main region with Nova and Neutron control plane spanning Cells 1..N: Compute, GPU Compute, Network)

Slide 21

Slide 21 text

OpenStack Private Cloud 3 Separate Regions (Main, Batch, Point 8) Scalability, Rolling Upgrades Regions split in multiple Cells Often matching hardware deliveries Different configurations and capabilities Single hypervisor type (KVM, used to have Hyper-V as well) (Diagram: Main region with Nova and Neutron control plane spanning Cells 1..N: Compute, GPU Compute, Network)

Slide 22

Slide 22 text

OpenStack Private Cloud 3 Separate Regions (Main, Batch, Point 8) Scalability, Rolling Upgrades Regions split in multiple Cells Often matching hardware deliveries Different configurations and capabilities Single hypervisor type (KVM, used to have Hyper-V as well) (Diagram: Main region with Nova and Neutron control plane spanning Cells 1..N: Compute, GPU Compute, Network)
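
Cells are an internal sharding mechanism; API users see a single endpoint per region. A minimal sketch (not the CERN tooling) of talking to one region with openstacksdk follows; the cloud name, region, flavor, image and network names are hypothetical placeholders.

    import openstack

    # Credentials and region selection normally come from clouds.yaml or
    # OS_* environment variables; "cern-cloud" and "main" are placeholders.
    conn = openstack.connect(cloud="cern-cloud", region_name="main")

    # Hypervisors across all cells are visible through the one regional API.
    for hv in conn.compute.hypervisors():
        print(hv.name, hv.status)

    # Boot a VM; flavor, image and network names below are made up.
    server = conn.compute.create_server(
        name="demo-vm",
        flavor_id=conn.compute.find_flavor("m2.medium").id,
        image_id=conn.compute.find_image("cc7-base").id,
        networks=[{"uuid": conn.network.find_network("demo-segment").id}],
    )
    conn.compute.wait_for_server(server)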

Slide 23

Slide 23 text

Networking Flat, segmented network (broadcast domains) Hierarchy of Primary (hypervisors) and Secondary (VMs) services (Diagram: per-cell nodes hosting VMs; Primary Service S513-V-IP123 137.1XX.43.0/24 for hypervisors, Secondary Service S513-V-VM908 188.1XX.191.0/24 for virtual machines)

Slide 24

Slide 24 text

OpenStack Private Cloud Automate everything! Puppet-based deployment of all components Including the control plane running on VMs The same is true for most CERN services Workflows for all sorts of tasks Onboarding new users, project creation, quota updates, special capabilities Overcommit, Pre-emptible instances, Backfilling workloads
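
As an illustration of what such a workflow automates, here is a minimal, hypothetical onboarding sketch using openstacksdk's cloud layer: create a project and apply quota updates. Names and numbers are made up; the actual CERN workflows are internal tooling, not this script.

    import openstack

    conn = openstack.connect(cloud="cern-cloud")  # placeholder cloud name

    # Project creation step of the onboarding workflow.
    project = conn.create_project(
        name="demo-analysis",
        description="Hypothetical analysis project",
        domain_id="default",
    )

    # Quota update step; values are arbitrary examples (RAM is in MiB).
    conn.set_compute_quotas(project.id, cores=200, ram=512 * 1024, instances=50)
    conn.set_volume_quotas(project.id, gigabytes=10000, volumes=100)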

Slide 25

Slide 25 text

Physical Infrastructure: Provisioning Days or Weeks | Deployment Minutes or Hours | Update Minutes or Hours | Utilization Poor | Maintenance Highly Intrusive
Cloud API / Virtualization: Provisioning Minutes | Deployment Minutes or Hours | Update Minutes or Hours | Utilization Good | Maintenance Potentially Less Intrusive
Containers: Provisioning Seconds | Deployment Seconds | Update Seconds | Utilization Very Good | Maintenance Less Intrusive

Slide 26

Slide 26 text

Kubernetes Lingua franca of the cloud Managed services offered by all major public clouds Multiple options for on-premise or self-managed deployments Common declarative API for basic infrastructure: compute, storage, networking Healthy ecosystem of tools offering extended functionality

Slide 27

Slide 27 text

Kubernetes Lingua franca of the cloud Managed services offered by all major public clouds Multiple options for on-premise or self-managed deployments Common declarative API for basic infrastructure: compute, storage, networking Healthy ecosystem of tools offering extended functionality
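
A small illustration of that common declarative API, using the official Python client: the same Deployment object is accepted by any conformant cluster, managed or self-hosted, with only the kubeconfig context changing. The manifest below is a generic example, not taken from the slides.

    from kubernetes import client, config

    # Picks up the current kubeconfig context (managed or on-premise cluster).
    config.load_kube_config()

    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="web"),
        spec=client.V1DeploymentSpec(
            replicas=3,
            selector=client.V1LabelSelector(match_labels={"app": "web"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "web"}),
                spec=client.V1PodSpec(
                    containers=[client.V1Container(name="web", image="nginx:1.25")]
                ),
            ),
        ),
    )

    client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)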

Slide 28

Slide 28 text

GitOps for Automation We were already doing similar things with Puppet Git as the source of truth for configuration data Allowing multiple choices of deployment models 1 ⇢ 1: Currently the most popular: one application, one cluster 1 ⇢ *: One application, multiple clusters (HA, Blast Radius, Rolling Upgrades) * ⇢ *: Separation of roles, improve resource usage (Diagram: git push to a Meta Chart repository, FluxCD git pull, HelmRelease CRD applied by the Helm Operator)
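
For illustration, this is roughly what a HelmRelease object reconciled by the Helm Operator looks like, created here directly through the Kubernetes API so the sketch is runnable; in the GitOps model it would live in git and FluxCD would apply it. Chart, repository and values are hypothetical.

    from kubernetes import client, config

    config.load_kube_config()

    # HelmRelease as understood by FluxCD's (v1) Helm Operator; contents are made up.
    helm_release = {
        "apiVersion": "helm.fluxcd.io/v1",
        "kind": "HelmRelease",
        "metadata": {"name": "demo-ingress", "namespace": "kube-system"},
        "spec": {
            "releaseName": "demo-ingress",
            "chart": {
                "repository": "https://charts.example.org",  # placeholder repository
                "name": "ingress-nginx",
                "version": "1.0.0",
            },
            "values": {"controller": {"replicaCount": 2}},
        },
    }

    client.CustomObjectsApi().create_namespaced_custom_object(
        group="helm.fluxcd.io",
        version="v1",
        namespace="kube-system",
        plural="helmreleases",
        body=helm_release,
    )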

Slide 29

Slide 29 text

Kubernetes More than just infrastructure management Potential to ease scaling out data analysis on-demand Challenge: Re-processing the Higgs analysis in under 10 min Processing a dataset of ~70 TB of data split in ~25000 files Timings: Cluster Creation 5 min | Image Pre-Pull 4 min | Data Stage-In 4 min | Process 90 sec
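
One way to picture the fan-out (an assumption, not the actual analysis code): chunk the ~25000 input files and submit one Kubernetes Job per chunk. Image name, chunk size, file paths and namespace are placeholders.

    from kubernetes import client, config

    config.load_kube_config()
    batch = client.BatchV1Api()

    # Hypothetical list of the ~25000 input files.
    files = [f"/data/higgs/file_{i}.root" for i in range(25000)]
    chunk_size = 50  # ~500 Jobs; tune against cluster size and per-file cost

    for idx in range(0, len(files), chunk_size):
        chunk = files[idx:idx + chunk_size]
        job = client.V1Job(
            metadata=client.V1ObjectMeta(name=f"higgs-reco-{idx // chunk_size}"),
            spec=client.V1JobSpec(
                backoff_limit=3,
                template=client.V1PodTemplateSpec(
                    spec=client.V1PodSpec(
                        restart_policy="Never",
                        containers=[
                            client.V1Container(
                                name="analysis",
                                image="example/higgs-analysis:latest",  # placeholder image
                                args=chunk,  # one argument per input file
                            )
                        ],
                    )
                ),
            ),
        )
        batch.create_namespaced_job(namespace="default", body=job)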

Slide 30

Slide 30 text

(Diagram: OpenStack Magnum cluster fanning the 70 TB dataset out to 25000 Kubernetes Jobs, with results aggregation and interactive visualization)
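
Provisioning the Magnum-managed cluster itself could look roughly like the sketch below, assuming the coe_cluster helpers of openstacksdk's cloud layer are available; the template name and node count are placeholders.

    import openstack

    conn = openstack.connect(cloud="cern-cloud")  # placeholder cloud name

    # Create a Kubernetes cluster from an existing Magnum cluster template.
    # Template name and sizing are made up; the build itself is asynchronous.
    template = conn.get_coe_cluster_template("kubernetes-template")
    cluster = conn.create_coe_cluster(
        name="higgs-reco",
        cluster_template_id=template.id,
        node_count=100,
    )
    print(cluster)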

Slide 31

Slide 31 text

(Diagram: Cluster on GKE, max 25000 cores, single region with 3 zones, fanning the 70 TB dataset out to 25000 Kubernetes Jobs, with results aggregation and interactive visualization)

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

Monitoring From ~40k machines More than 3TB/day compressed Modular architecture Decoupled producers / consumers Built-in stream processing Multiple backends with different SLAs Credit: Diogo Lima Nicolau
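
A minimal sketch of the decoupled producer/consumer idea, assuming a Kafka-style transport (the slide does not name the actual broker); topic, broker address and consumer group are placeholders.

    import json
    from kafka import KafkaConsumer, KafkaProducer

    BROKERS = ["monit-broker:9092"]  # placeholder address

    # Producer side: machines push metrics without knowing who consumes them.
    producer = KafkaProducer(
        bootstrap_servers=BROKERS,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("metrics", {"host": "node-001", "metric": "load1", "value": 0.42})
    producer.flush()

    # Consumer side: each backend (storage, dashboards, alarming) subscribes
    # independently with its own consumer group.
    consumer = KafkaConsumer(
        "metrics",
        bootstrap_servers=BROKERS,
        group_id="storage-backend",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    for record in consumer:
        print(record.value)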

Slide 34

Slide 34 text

Credit: Diogo Lima Nicolau

Slide 35

Slide 35 text

Service Availability Historical View ● Availability per service ● Outages integration Credit: Diogo Lima Nicolau

Slide 36

Slide 36 text

Alarming Local (on the machine) ● Simple Threshold / Actuators On dashboards ● Grafana alert engine External ● Alarm source Integrated with ticketing system ● ServiceNow Credit: Diogo Lima Nicolau

Slide 37

Slide 37 text

Challenges Do more with similar resources High Luminosity Large Hadron Collider x7 collisions per second, x10 Data and Computing Machine Learning Considered for fast simulation, detector triggers, anomaly detection, … Accommodate accelerators and scale this new type of workload GPUs, TPUs, IPUs, FPGAs, ...
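
Accommodating accelerators in Kubernetes goes through extended resources; a minimal sketch requesting one NVIDIA GPU via the standard nvidia.com/gpu resource name follows, with a placeholder training image.

    from kubernetes import client, config

    config.load_kube_config()

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="gpu-training"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="train",
                    image="example/ml-training:latest",  # placeholder image
                    resources=client.V1ResourceRequirements(
                        limits={"nvidia.com/gpu": "1"}  # extended resource request
                    ),
                )
            ],
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)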

Slide 38

Slide 38 text

Questions? @ahcorporto [email protected] http://visits.cern/

Slide 39

Slide 39 text

(Diagram: ML workflow: 1. Build and validate the model in a user notebook, 2. Train at scale on distributed compute, 3. Persistent storage for feedback, 4. Serving)