
Where to Run Your Code

You have a week until the deadline, and simulations that would take two weeks to run on your laptop. Or you have a problem you could solve, if only you had enough memory on your desktop. Where should you run your computation? Should you apply for time on a supercomputer, look for a local cluster, or dive into the buzzword-laden world of VMs, Docker, and XaaS? This week, we give a panoramic overview of servers, clusters, clouds, and supercomputers, discussing some performance realities and myths associated with each. We also discuss the problem of how to set up a consistent environment across many platforms, and describe the concept of virtualization: of environments, languages, hardware, and operating systems.

(Presented to the Cornell Scientific Software Club on 10/17: cornell-ssw.github.io)

David Bindel

October 17, 2016

Transcript

  1. What is a computer system? Typical laptop:
     • A few cores and a few GB memory
     • Disk or SSD
     • Battery pack (battery life trumps cores!)
     • Standard peripherals: monitor, keyboard, mouse
     • External connectivity: NIC, HDMI, USB
  2. What is a computer system? Totient is a typical small cluster:
     • Several rack-mounted servers (12 cores, 32 GB RAM)
     • Xeon Phi accelerators
     • Fast Ethernet interconnect
     • Local and network-mounted file systems
  3. What is a computer system? A typical modern supercomputer:
     • High-end commodity nodes
     • Custom network
  4. What is a computer system? Cloud infrastructure:
     • Rack-mounted servers in data centers
     • Storage is typically disaggregated
     • Used to run VMs: the "machine" and the HW are separate!
  5. What is a computer system? Hardware resources?
     • I/O peripherals (irrelevant today)
     • Compute: CPU cores, accelerators
     • Memory: Cache and RAM
     • Storage: Disk, SSD, etc.
     • Network
     How can you share these resources?
  6. What is a computer system? OS and associated utilities? The operating system provides:
     • Uniform programming abstractions for HW resources
     • Scheduling of resources to processes
     • Mechanisms for
       • Accessing hardware resources
       • Communicating with other processes
     • Isolation between processes
  7. What is a computer system?
     • Hardware?
     • OS abstraction of hardware?
     • A higher-level platform for running programs?
  8. All problems in computer science can be solved by another level of indirection. — David Wheeler
  9. From physical to logical. The OS already abstracts the hardware:
     • I/O peripherals: "Print to X"
     • Compute: Hyperthreading, multitasking
     • Memory: Virtual memory
     • Storage: Dropbox, NFS, etc.
     • Network: Virtual NICs
  10. Another level of indirection
      • OS: Share HW resources between processes
      • Hypervisor: Share HW resources between virtual machines
      • Each VM has an independent OS, utilities, libraries
      • Sharing HW across VMs improves utilization
      • Separating VM from HW improves portability
      Sharing hardware across VMs is key to the Amazon, Azure, and Google clouds.
  11. The Virtual Machine: CPU + memory
      • Sharing across processes with same OS is old
        • OS-supported pre-emptive multi-tasking
        • Virtual memory abstractions with HW support (page tables, TLB)
      • Sharing HW between systems is newer
        • Today: CPU virtualization with near zero overhead
        • Backed by extended virtual memory support (DMA remapping, extended page tables)
  12. The Virtual Machine: Storage
      • Network-attached storage has been around for a long time
      • Issue: The I/O blender
        • Disks are good at big sequential reads
        • Lots of independent accesses cause thrashing (see the sketch below)
      • SSD-enabled machines are increasingly common
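      A minimal sketch of the "I/O blender" effect: read the same blocks once in order and once in shuffled order. The file name and sizes are illustrative; on a spinning disk with a cold cache the shuffled pattern is far slower, though the OS page cache can hide the gap for files that fit in RAM.

          import os
          import random
          import time

          FNAME = "blender_test.bin"   # hypothetical scratch file
          BLOCK = 4096                 # bytes per read
          NBLOCKS = 25_000             # ~100 MB total

          # Create the test file.
          with open(FNAME, "wb") as f:
              f.write(os.urandom(BLOCK) * NBLOCKS)

          def read_blocks(offsets):
              """Read one BLOCK at each byte offset; return elapsed seconds."""
              start = time.perf_counter()
              with open(FNAME, "rb") as f:
                  for off in offsets:
                      f.seek(off)
                      f.read(BLOCK)
              return time.perf_counter() - start

          sequential = [i * BLOCK for i in range(NBLOCKS)]
          scattered = sequential[:]
          random.shuffle(scattered)    # same blocks, "blended" order

          print(f"sequential: {read_blocks(sequential):.2f} s")
          print(f"scattered:  {read_blocks(scattered):.2f} s")
          os.remove(FNAME)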
  13. The Virtual Machine: Network
      • Hard to get full-speed access via VM!
      • Issue: Sharing peripherals with direct memory access?
      • Hardware support is improving (e.g. SR-IOV standards)
      • Still a potential pain point (esp. for networking)
  14. The Virtual Machine: Accelerators? I don't understand how these would be virtualized! But I know people are doing it.
  15. Hypervisor options
      • Type 1 (bare metal) vs. Type 2 (run guest OS atop host OS)
        • Not always a clear distinction (e.g. KVM somewhere between?)
        • You may have used Type 2 (e.g. Parallels, VirtualBox, etc.)
      • Common large-scale choices:
        • KVM (used by Google cloud)
        • Xen (used by Amazon cloud)
        • Hyper-V (used by Azure)
        • VMware (used in many commercial clouds)
  16. Performance implications: the good
      VMs perform well enough for many commercial workloads:
      • Hypervisor CPU overheads are pretty low (absent sharing)
      • May now be within a few percent on LINPACK-style loads
      • VMware agrees with this
      • Virtual memory (a mature tech) is being extended appropriately
  17. Performance implications: the bad
      Virtualization does have performance impacts:
      • Contention between VMs has nontrivial overheads
      • Untuned VMs may miss important memory features
      • Mismatched scheduling of VMs can slow multi-CPU runs
      • I/O virtualization is still costly
      It probably does not make sense to do big PDE solves on VMs yet.
  18. Performance implications
      VM performance is a fast-moving target:
      • VMs are important for isolation and utilization
      • Important for economics of rented infrastructure
      • Economic importance drives a lot
        • Big topic of academic systems research
        • Lots of industry and open source R&D (HW and SW)
      Scientific HPC will ultimately benefit even if it is not the driver.
  19. VM performance punchline
      • VM computing in clouds will not give "bare metal" performance
      • If I get 32 cores + 256 GB RAM, do I care about a 10% penalty?
      • Try it before you knock it (see the sketch below)
      • Much depends on the workload
      • And remember: performance comparisons are hard!
      • And the picture will change next year anyhow
  20. Why virtualize? A scientific SW perspective
      A not-atypical coding day:
      1. Build my code (four different languages, countless libraries)
      2. Doesn't work; install missing library
      3. Library requires a different version of another dependency
      4. Install new version, breaking a different package
      5. Swear, take a coffee break
      6. Go to 1
  21. Why virtualize? Application isolation
      • Desiderata: Codes operate independently on same HW
        • Isolated HW use: memory spaces, processes, etc (OS handles)
        • Isolated SW use: dependencies, dynamic libraries, etc (OS shrugs)
      • Many different tools for isolation
        • Virtual machine: strong isolation, heavyweight solution
        • Python virtualenv, conda environments, environment modules: language level, only partial isolation (see the sketch below)
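      A minimal sketch of language-level isolation using the standard-library venv module (the script equivalent of "python -m venv"). The environment name and the numpy install are illustrative; conda environments play a similar role.

          import subprocess
          import venv

          # Create an isolated environment with its own pip.
          env_dir = "myenv"            # hypothetical environment name
          venv.create(env_dir, with_pip=True)

          # Installs land in myenv/, not in the system Python.
          pip = f"{env_dir}/bin/pip"   # Scripts\pip.exe on Windows
          subprocess.run([pip, "install", "numpy"], check=True)

      Note what this does not isolate: compilers, system shared libraries, and anything outside the Python package world still come from the host, which is why the slide calls it partial isolation.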
  22. Why virtualize? Application portability
      • Desiderata: Code developed on my laptop runs elsewhere
        • Even if "elsewhere" prefers a different Linux distribution
      • What about automatic configuration (autoconf, CMake)?
        • Great at finding some libraries that satisfy dependencies
        • Maintenance woes: bug on a system I can't reproduce? (see the sketch below)
      • Solution: Package code and all dependencies in a VM?
        • But what about performance? And image size?
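      A small first step toward debugging "a bug on a system I can't reproduce": snapshot the interpreter and installed packages on both machines and diff the output (similar in spirit to "pip freeze"). Assumes Python 3.8+ for importlib.metadata.

          import platform
          from importlib import metadata

          # Record the interpreter and platform ...
          print("python", platform.python_version(), "on", platform.platform())

          # ... and every installed distribution, pinned to its version.
          for dist in sorted(metadata.distributions(),
                             key=lambda d: d.metadata["Name"].lower()):
              print(f'{dist.metadata["Name"]}=={dist.version}')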
  23. Containers
      • Instead of virtualizing all HW, virtualize the OS
      • Container image includes library dependencies, config files, etc
      • A running container has its own:
        • Root filesystem (no sharing libraries across containers)
        • Process space, inter-process communication, TCP sockets, ...
      • Can run on VM or on bare metal (see the sketch below)
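      A minimal sketch of the container workflow, driving the Docker CLI from Python. The image pins the OS, libraries, and interpreter; only the current directory is shared with the host. Assumes Docker is installed; "simulate.py" is a hypothetical script name.

          import os
          import subprocess

          subprocess.run(
              ["docker", "run", "--rm",          # remove container on exit
               "-v", f"{os.getcwd()}:/work",     # mount host dir into container
               "-w", "/work",                    # start in the mounted dir
               "python:3.10",                    # public image: OS + interpreter
               "python", "simulate.py"],         # hypothetical entry point
              check=True,
          )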
  24. Container landscape
      • Docker dominates
      • rkt is an up-and-coming alternative
      • Several others (see this comparison)
      • Multiple efforts on containers for HPC
        • Shifter: Docker-like user-defined images for HPC systems
        • Singularity: Competing system
  25. Containers vs VMs?
      • VMs: Different operating systems on same hardware
        • What if I want Windows and Linux together on one machine?
        • This is a good reason for running VMs locally, too!
      • VMs: Strong isolation between jobs sharing hardware (security)
        • Operating system is supposed to isolate jobs
        • But what about shared OS, one malicious user with a rootkit?
        • Hypervisor provides smaller attack surface
      • Containers: one OS, weaker isolation, but lower overhead
  26. IaaS: Infrastructure as a Service
      • Low-level computing for rent
        • Computers (VMs or bare metal)
        • Network (you pay for bandwidth)
        • Storage (virtual disks, storage buckets, databases)
      • The focus of our discussion so far
  27. PaaS: Platform as a Service
      • Programmable environments one step above raw machines
      • Example: Wakari and other Python notebook hosts
  28. SaaS: Software as a Service
      • Relatively fixed software package (usually with web interface)
      • Example: Gmail
  29. The big three for XaaS
      • Amazon Web Services (AWS): first mover, GPUs
      • Google Cloud Platform (GCP): better prices?
      • Microsoft Azure: only one with InfiniBand instances
  30. The many others: HPC IaaS
      • RedCloud: Cornell local
      • Nimbix
      • Sabalcore
      • Penguin-on-Demand
  31. The many others: HPC PaaS/SaaS
      • Rescale: Turn-key HPC and simulations
      • Penguin-on-Demand: Bare-metal IaaS or PaaS
      • MathWorks cloud: One-stop shopping for parallel MATLAB cores
      • Cycle Computing: PaaS on clouds (e.g. Google, Amazon, Azure)
      • SimScale: Simulation from your browser
      • TotalCAE: Turn-key private or public cloud FEA/CFD
      • CPU 24/7: CAE as a Service
  32. Questions to ask
      • What type of workload do I have?
        • Big memory but modest core count?
        • Embarrassingly parallel? (see the sketch below)
        • GPU friendly?
      • How much data? Data transfer is not always free!
      • How will I interact with the system? SSH alone? GUIs? Web?
      • What about licensed software?
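      A minimal sketch of an embarrassingly parallel workload: fully independent runs with no communication, so throughput scales with core count. The run_one body is a hypothetical stand-in for one simulation.

          import time
          from multiprocessing import Pool

          def run_one(seed):
              """Stand-in for one independent simulation."""
              total = 0.0
              for i in range(5_000_000):
                  total += (seed * i) % 7
              return total

          if __name__ == "__main__":
              start = time.perf_counter()
              with Pool() as pool:             # one worker per core by default
                  results = pool.map(run_one, range(32))
              print(f"{len(results)} runs in {time.perf_counter() - start:.1f} s")

      Workloads like this are the easiest fit for cloud VMs, where a modest per-core virtualization penalty matters far less than how many cores you can rent.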
  33. Standard options beyond the laptop
      • Local clusters and servers (totient, several others around Cornell)
      • Public cloud VMs (Amazon, Google, Azure)
        • Will discuss over next few weeks
        • Can pay money or write proposal for credits
      • Public cloud bare metal (Nimbix, Sabalcore, PoD)
        • Good if bare-metal parallel performance is an issue
        • Might want to compare to CAC offerings
      • Supercomputer (XSEDE, DOE): discuss in the spring