Slide 1

Where to Run Your Code: Real and Virtual Platforms
David Bindel, 17 Oct 2016

Slide 2

What is a computer system?

Slide 3

What is a computer system?
Typical laptop:
• A few cores and a few GB of memory
• Disk or SSD
• Battery pack (battery life trumps cores!)
• Standard peripherals: monitor, keyboard, mouse
• External connectivity: NIC, HDMI, USB

Slide 4

What is a computer system?
Totient is a typical small cluster:
• Several rack-mounted servers (12 cores, 32 GB RAM)
• Xeon Phi accelerators
• Fast Ethernet interconnect
• Local and network-mounted file systems

Slide 5

What is a computer system?
A typical modern supercomputer is:
• High-end commodity nodes
• A custom network

Slide 6

What is a computer system?
Cloud infrastructure:
• Rack-mounted servers in data centers
• Storage is typically disaggregated
• Used to run VMs: the "machine" and the HW are separate!

Slide 7

What is a computer system?
Hardware resources?
• I/O peripherals (irrelevant today)
• Compute: CPU cores, accelerators
• Memory: cache and RAM
• Storage: disk, SSD, etc.
• Network
How can you share these resources? (A quick inventory sketch follows below.)
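
A minimal Python sketch of such an inventory (my own illustration, not from the slides; Linux/Unix only, since it uses the POSIX sysconf and statvfs interfaces):

    import os

    # Logical cores visible to this process (may be fewer than the machine
    # actually has, e.g. inside a restricted VM or container).
    print("cores:", os.cpu_count())

    # Total physical memory via POSIX sysconf (Linux/macOS, not Windows).
    page = os.sysconf("SC_PAGE_SIZE")
    npages = os.sysconf("SC_PHYS_PAGES")
    print("RAM (GB): %.1f" % (page * npages / 2**30))

    # Free space on the filesystem holding the current directory.
    st = os.statvfs(".")
    print("disk free (GB): %.1f" % (st.f_bavail * st.f_frsize / 2**30))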

Slide 8

What is a computer system?
The OS and associated utilities? The operating system provides:
• Uniform programming abstractions for HW resources
• Scheduling of resources to processes
• Mechanisms for:
  • Accessing hardware resources
  • Communicating with other processes
  • Isolation between processes
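
To make the last two mechanisms concrete: in the minimal Python sketch below (variable names are mine), a child process gets a private copy of the parent's memory, so its changes are invisible to the parent unless sent back over an explicit IPC channel such as a pipe.

    from multiprocessing import Pipe, Process

    STATE = "parent value"

    def child(conn):
        global STATE
        STATE = "child value"  # changes only the child's private copy
        conn.send(STATE)       # explicit IPC: send it back over a pipe
        conn.close()

    if __name__ == "__main__":
        here, there = Pipe()
        p = Process(target=child, args=(there,))
        p.start()
        print("via pipe:", here.recv())  # "child value"
        p.join()
        print("in parent:", STATE)       # still "parent value": isolated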

Slide 9

What is a computer system?
• Hardware?
• An OS abstraction of hardware?
• A higher-level platform for running programs?

Slide 10

Virtualization

Slide 11

All problems in computer science can be solved by another level of indirection. — David Wheeler

Slide 12

From physical to logical
The OS already abstracts the hardware:
• I/O peripherals: "Print to X"
• Compute: hyperthreading, multitasking
• Memory: virtual memory (sketch below)
• Storage: Dropbox, NFS, etc.
• Network: virtual NICs
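
Virtual memory is the canonical example: a program sees a flat address space, and the OS maps pages to physical frames behind its back. A minimal Python sketch (the lazy-allocation comment describes Linux behavior):

    import mmap

    # Request 16 pages of anonymous virtual memory. On Linux, no physical
    # frame is allocated until a page is first touched; the page tables
    # map virtual to physical addresses on demand.
    buf = mmap.mmap(-1, 16 * 4096)
    buf[0] = 42   # first touch faults in a physical page
    print(buf[0])
    buf.close()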

Slide 13

Another level of indirection

Slide 14

Another level of indirection
• OS: shares HW resources between processes
• Hypervisor: shares HW resources between virtual machines
  • Each VM has an independent OS, utilities, and libraries
  • Sharing HW across VMs improves utilization
  • Separating the VM from the HW improves portability
Sharing hardware across VMs is key to the Amazon, Azure, and Google clouds.

Slide 15

The Virtual Machine: CPU + memory
• Sharing across processes within the same OS is old:
  • OS-supported pre-emptive multitasking
  • Virtual memory abstractions with HW support (page tables, TLB)
• Sharing HW between systems is newer:
  • Today: CPU virtualization with near-zero overhead (see the flag check below)
  • Backed by extended virtual memory support (DMA remapping, extended page tables)
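
On Linux you can check whether your CPU advertises these extensions by scanning the feature flags in /proc/cpuinfo. A minimal sketch (vmx is Intel VT-x, svm is AMD-V, ept is Intel's extended page tables; exact flag placement varies across kernel versions):

    # Linux-only: /proc/cpuinfo lists the CPU feature flags.
    flags = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith(("flags", "vmx flags")):
                flags.update(line.split(":", 1)[1].split())

    print("VT-x (vmx):", "vmx" in flags)
    print("AMD-V (svm):", "svm" in flags)
    print("extended page tables (ept):", "ept" in flags)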

Slide 16

The Virtual Machine: Storage
• Network-attached storage has been around for a long time
• Issue: the "I/O blender"
  • Disks are good at big sequential reads
  • Lots of independent access streams cause thrashing (toy benchmark below)
• SSD-enabled machines are increasingly common
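
A toy Python benchmark of the blender effect (file name and sizes are mine): read the same pages of a file in order and then in a shuffled order. On a freshly written (cached) file the gap is modest, but on a cold spinning disk the shuffled pattern, which is what a disk shared by many VMs effectively sees, is dramatically slower.

    import os, random, time

    PAGE, N = 4096, 4096  # 16 MB scratch file
    with open("blender.tmp", "wb") as f:
        f.write(os.urandom(PAGE * N))

    def read_pages(order):
        t0 = time.perf_counter()
        with open("blender.tmp", "rb") as f:
            for i in order:
                f.seek(i * PAGE)
                f.read(PAGE)
        return time.perf_counter() - t0

    print("sequential: %.4f s" % read_pages(range(N)))
    print("random:     %.4f s" % read_pages(random.sample(range(N), N)))
    os.remove("blender.tmp")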

Slide 17

The Virtual Machine: Network
• Hard to get full-speed access via a VM!
• Issue: how do you share peripherals that use direct memory access?
• Hardware support is improving (e.g. SR-IOV standards)
• Still a potential pain point (especially for networking)

Slide 18

The Virtual Machine: Accelerators?
I don't understand how these would be virtualized! But I know people are doing it.

Slide 19

Hypervisor options
• Type 1 (bare metal) vs. type 2 (runs a guest OS atop a host OS)
• Not always a clear distinction (e.g. KVM is somewhere in between?)
• You may have used a type 2 hypervisor (e.g. Parallels, VirtualBox)
• Common large-scale choices:
  • KVM (used by the Google cloud)
  • Xen (used by the Amazon cloud)
  • Hyper-V (used by Azure)
  • VMware (used in many commercial clouds)

Slide 20

Performance implications: the good
VMs perform well enough for many commercial workloads:
• Hypervisor CPU overheads are pretty low (absent sharing)
• May now be within a few percent on LINPACK-style loads
  • VMware agrees with this
• Virtual memory (a mature technology) is being extended appropriately

Slide 21

Performance implications: the bad
Virtualization does have performance impacts:
• Contention between VMs has nontrivial overheads
• Untuned VMs may miss important memory features
• Mismatched scheduling of VMs can slow multi-CPU runs
• I/O virtualization is still costly
It probably does not make sense to do big PDE solves on VMs yet.

Slide 22

Performance implications
VM performance is a fast-moving target:
• VMs are important for isolation and utilization
• They are important for the economics of rented infrastructure
• Economic importance drives a lot:
  • A big topic of academic systems research
  • Lots of industry and open-source R&D (HW and SW)
Scientific HPC will ultimately benefit even if it is not the driver.

Slide 23

VM performance punchline
• VM computing in clouds will not give "bare metal" performance
• If I get 32 cores + 256 GB RAM, do I care about a 10% penalty?
• Try it before you knock it
  • Much depends on the workload
  • And remember: performance comparisons are hard! (see the timing sketch below)
• And the picture will change next year anyhow
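
One reason comparisons are hard: on shared infrastructure a single timing tells you little. A minimal Python sketch of the kind of harness to use (the workload is just a stand-in): time several repetitions and report the spread, not one number.

    import statistics, time

    def bench(fn, reps=20):
        """Time fn() repeatedly; the spread matters as much as the center."""
        times = []
        for _ in range(reps):
            t0 = time.perf_counter()
            fn()
            times.append(time.perf_counter() - t0)
        return min(times), statistics.median(times), max(times)

    work = lambda: sum(i * i for i in range(200_000))  # stand-in workload
    lo, med, hi = bench(work)
    print("min %.4f s  median %.4f s  max %.4f s" % (lo, med, hi))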

Slide 24

Containers

Slide 25

Why virtualize? A scientific SW perspective
A not-atypical coding day:
1. Build my code (four different languages, countless libraries)
2. It doesn't work; install the missing library
3. The library requires a different version of another dependency
4. Install the new version, breaking a different package
5. Swear, take a coffee break
6. Go to 1

Slide 26

Why virtualize? Application isolation
• Desiderata: codes operate independently on the same HW
  • Isolated HW use: memory spaces, processes, etc. (the OS handles this)
  • Isolated SW use: dependencies, dynamic libraries, etc. (the OS shrugs)
• Many different tools for isolation:
  • Virtual machine: strong isolation, but a heavyweight solution
  • Python virtualenv, conda environments, environment modules: language-level, only partial isolation (sketch below)
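
For example, Python's built-in venv module gives each project its own package tree while still sharing the kernel, compilers, and system libraries, which is exactly the "partial isolation" caveat. A sketch (Unix paths assumed; on Windows the scripts live under Scripts\, and numpy is just an example package):

    import subprocess, venv

    # Create an isolated environment with its own site-packages and pip.
    venv.create("myenv", with_pip=True)

    # Installs land in myenv, leaving the system Python untouched.
    subprocess.run(["myenv/bin/pip", "install", "numpy"], check=True)

    # Run code against the isolated interpreter.
    subprocess.run(["myenv/bin/python", "-c",
                    "import numpy; print(numpy.__version__)"], check=True)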

Slide 27

Why virtualize? Application portability
• Desiderata: code developed on my laptop runs elsewhere
  • Even if "elsewhere" prefers a different Linux distribution
• What about automatic configuration (autoconf, CMake)?
  • Great at finding some libraries that satisfy dependencies
  • Maintenance woes: how do I fix a bug on a system I can't reproduce?
• Solution: package the code and all its dependencies in a VM?
  • But what about performance? And image size?

Slide 28

Containers
• Instead of virtualizing all the HW, virtualize the OS
• A container image includes library dependencies, config files, etc.
• A running container has its own (see the namespace sketch below):
  • Root filesystem (no sharing of libraries across containers)
  • Process space, inter-process communication, TCP sockets, ...
• Can run on a VM or on bare metal
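
On Linux, this per-container view is built from kernel namespaces. In the minimal sketch below (Linux-only), each namespace a process belongs to appears as a symlink under /proc/self/ns, and two processes share a namespace exactly when the link targets match; run it on the host and inside a container to see the pid, net, and mnt entries differ.

    import os

    # List the namespaces this process belongs to.
    for ns in sorted(os.listdir("/proc/self/ns")):
        print(ns, "->", os.readlink("/proc/self/ns/" + ns))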

Slide 29

Container landscape
• Docker dominates
• rkt is an up-and-coming alternative
• Several others exist (see this comparison)
• Multiple efforts on containers for HPC:
  • Shifter: Docker-like user-defined images for HPC systems
  • Singularity: a competing system

Slide 30

Containers vs. VMs?
• VMs: different operating systems on the same hardware
  • What if I want Windows and Linux together on one machine?
  • This is a good reason for running VMs locally, too!
• VMs: strong isolation between jobs sharing hardware (security)
  • The operating system is supposed to isolate jobs
  • But what about a shared OS and one malicious user with a root kit?
  • The hypervisor provides a smaller attack surface
• Containers: one OS, weaker isolation, but lower overhead

Slide 31

XaaS and the cloud

Slide 32

IaaS: Infrastructure as a Service
• Low-level computing for rent:
  • Computers (VMs or bare metal)
  • Network (you pay for bandwidth)
  • Storage (virtual disks, storage buckets, databases)
• The focus of our discussion so far

Slide 33

PaaS: Platform as a Service
• Programmable environments one step above raw machines
• Example: Wakari and other Python notebook hosts

Slide 34

SaaS: Software as a Service
• Relatively fixed software package (usually with a web interface)
• Example: Gmail

Slide 35

The big three for XaaS
• Amazon Web Services (AWS): first mover, GPUs
• Google Cloud Platform (GCP): better prices?
• Microsoft Azure: the only one with InfiniBand instances

Slide 36

The many others: HPC IaaS
• Red Cloud: Cornell-local
• Nimbix
• Sabalcore
• Penguin-on-Demand

Slide 37

The many others: HPC PaaS/SaaS
• Rescale: turn-key HPC and simulations
• Penguin-on-Demand: bare-metal IaaS or PaaS
• MathWorks cloud: one-stop shopping for parallel MATLAB cores
• Cycle Computing: PaaS on clouds (e.g. Google, Amazon, Azure)
• SimScale: simulation from your browser
• TotalCAE: turn-key private or public cloud FEA/CFD
• CPU 24/7: CAE as a Service

Slide 38

Choosing a platform

Slide 39

Questions to ask
• What type of workload do I have?
  • Big memory but modest core count?
  • Embarrassingly parallel?
  • GPU-friendly?
• How much data? Data transfer is not always free!
• How will I interact with the system? SSH alone? GUIs? Web?
• What about licensed software?

Slide 40

Standard options beyond the laptop
• Local clusters and servers (Totient, several others around Cornell)
• Public cloud VMs (Amazon, Google, Azure)
  • We will discuss these over the next few weeks
  • Can pay money or write a proposal for credits
• Public cloud bare metal (Nimbix, Sabalcore, PoD)
  • Good if bare-metal parallel performance is an issue
  • Might want to compare to CAC offerings
• Supercomputers (XSEDE, DOE): discussed in the spring