Virtual Machine Warm-up Blows Hot and Cold

ICOOOLPS Workshop

July 18, 2016

Transcript

  1. VM Warmup Blows Hot and Cold

    Edd Barrett, Carl Friedrich Bolz, Rebecca Killick (Lancaster), Vincent Knight (Cardiff), Sarah Mount, Laurence Tratt. Software Development Team, 2016-07-18. http://soft-dev.org/
  2-3. The Warmup Experiment

    Measure warmup of modern language implementations. Hypothesis: small, deterministic programs exhibit classical warmup behaviour.
  4-6. Method 1: Which benchmarks?

    The language benchmark games are (unusually) perfect for us.
    • We removed any control-flow-graph (CFG) non-determinism.
    • We added checksums to all benchmarks.
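The slides do not say how the checksums are used. Below is a minimal sketch. It assumes that each benchmark returns a value and that this value is compared against a known expected checksum after every in-process iteration. Checking the result means the VM cannot skip computing it, and a mismatch flags a benchmark that silently does different work. The workload `sum_of_squares` and the helper `checked_iteration` are hypothetical stand-ins, not the benchmark-game programs.

```python
from typing import Callable


class ChecksumError(Exception):
    """Raised when an in-process iteration yields an unexpected result."""


def sum_of_squares(n: int) -> int:
    # Hypothetical stand-in workload: deterministic, and its control flow
    # depends only on n (no CFG non-determinism).
    total = 0
    for i in range(n):
        total += i * i
    return total


def checked_iteration(bench: Callable[[int], int], param: int, expected: int) -> int:
    """Run one in-process iteration and verify its checksum."""
    result = bench(param)
    if result != expected:
        raise ChecksumError(f"checksum {result} != expected {expected}")
    return result


# The sum of i*i for i in 0..99 is 99*100*199/6 = 328350.
print(checked_iteration(sum_of_squares, 100, 328350))
```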
  7-8. Method 2: How long to run?

    • 2000 in-process iterations
    • 10 process executions
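The two-level design can be sketched as below: many timed in-process iterations inside each of several fresh process executions. This is an illustration, not Krun. It assumes a tiny Python workload and simply starts a new interpreter for each process execution via `subprocess`, whereas Krun also reboots the machine before each one. `CHILD_PROGRAM` and `run_experiment` are hypothetical names.

```python
import json
import subprocess
import sys

# Program run inside each fresh process: it times `iters` in-process
# iterations of a tiny stand-in workload and prints the timings as JSON.
CHILD_PROGRAM = r"""
import json, sys, time

def workload():
    return sum(i * i for i in range(1000))

iters = int(sys.argv[1])
times = []
for _ in range(iters):
    start = time.perf_counter()
    workload()
    times.append(time.perf_counter() - start)
print(json.dumps(times))
"""


def run_experiment(process_execs: int = 10, in_process_iters: int = 2000) -> list[list[float]]:
    """Return one list of per-iteration wall-clock times per process execution."""
    results = []
    for _ in range(process_execs):
        # Krun reboots between process executions; this sketch only
        # starts a fresh interpreter, so VM state is not shared.
        out = subprocess.run(
            [sys.executable, "-c", CHILD_PROGRAM, str(in_process_iters)],
            capture_output=True, text=True, check=True,
        )
        results.append(json.loads(out.stdout))
    return results
```

With its defaults, `run_experiment()` returns 10 lists of 2000 timings each, one list per process execution.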
  9. Method 3: VMs

    • Graal-0.13
    • HHVM-3.12.0
    • JRuby/Truffle (git #f82ac771)
    • Hotspot-8u72b15
    • LuaJIT-2.0.4
    • PyPy-4.0.1
    • V8-4.9.385.21
    • GCC-4.9.3
    Note: the same GCC (4.9.3) was used for all compilation.
  10-11. Method 4: Machines

    • Linux1: Linux-Debian8/i7-4790K, 24GiB RAM
    • Linux2: Linux-Debian8/i7-4790, 32GiB RAM
    • OpenBSD: OpenBSD-5.8/i7-4790, 32GiB RAM
    • Turbo boost and hyper-threading disabled
    • SSH blocked from non-local machines
    • Daemons disabled (cron, smtpd)
  12-13. Method 5: Krun

    Benchmark runner: tries to control as many confounding variables as possible, e.g.:
    • Minimises I/O
    • Sets fixed heap and stack ulimits
    • Drops privileges to a ‘clean’ user account
    • Automatically reboots the system before each process execution
    • Checks dmesg for changes after each process execution
    • Checks the system is at (roughly) the same temperature for each process execution
    • Enforces kernel settings (tickless mode, CPU governors, ...)
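Two of Krun's checks can be sketched as plain functions. This is an assumption-laden illustration, not Krun's code. `new_dmesg_lines` diffs two snapshots of the kernel log, however they were captured. `wait_for_temperature` polls a caller-supplied `read_temp` function until the reading is within a tolerance of a baseline. The slides do not say how Krun actually reads sensors, and the tolerance and timeout values here are hypothetical.

```python
import time
from typing import Callable


def new_dmesg_lines(before: str, after: str) -> list[str]:
    """Return kernel-log lines that appeared between two snapshots.

    Assumes the log is append-only; if the ring buffer wrapped (the old
    prefix no longer matches), fall back to every line not seen before.
    """
    before_lines = before.splitlines()
    after_lines = after.splitlines()
    if after_lines[:len(before_lines)] == before_lines:
        return after_lines[len(before_lines):]
    seen = set(before_lines)
    return [line for line in after_lines if line not in seen]


def wait_for_temperature(read_temp: Callable[[], float], baseline: float,
                         tolerance: float = 3.0, poll_secs: float = 1.0,
                         timeout_secs: float = 600.0) -> bool:
    """Poll read_temp() until it is within tolerance of baseline.

    Returns True once the reading is close enough, or False on timeout.
    read_temp, tolerance and timeout are hypothetical parameters.
    """
    deadline = time.monotonic() + timeout_secs
    while time.monotonic() < deadline:
        if abs(read_temp() - baseline) <= tolerance:
            return True
        time.sleep(poll_secs)
    return False
```

A runner would treat any non-empty result from `new_dmesg_lines` as a reason to flag, or discard, that process execution.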
  14. Classical Warmup

    [Plot: time (s) vs in-process iteration (0-2000), 0.232-0.884 s; Richards, Graal, Linux1/i7-4790K, process execution #3; inset zooms into iterations 0-9.]
  15. Classical Warmup

    [Plot: time (s) vs in-process iteration (0-2000), 1.137-1.176 s; Fasta, V8, Linux2/i7-4790, process execution #1.]
  16. Classical Warmup

    [Plot: time (s) vs in-process iteration (0-2000), 0.466-0.480 s; Spectral Norm, PyPy, Linux1/i7-4790K, process execution #7.]
  17. Classical Warmup (different machines)

    [Plots: time (s) vs in-process iteration (0-2000). (a) Fasta, V8, Linux1/i7-4790K, process execution #1, 1.021-1.055 s; (b) Fasta, V8, Linux2/i7-4790, process execution #1, 1.137-1.176 s.]
  18. Slowdown

    [Plot: time (s) vs in-process iteration (0-2000), 0.562-0.567 s; Fannkuch Redux, LuaJIT, OpenBSD/i7-4790, process execution #10.]
  19. Slowdown

    [Plot: time (s) vs in-process iteration (0-2000), 0.266-0.298 s; Richards, Hotspot, Linux2/i7-4790, process execution #2.]
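The slides do not state how runs were classified. One rough heuristic, which is hypothetical and not the method used in the talk, compares two medians. It takes the median time of the first `window` in-process iterations and the median of the last `window`. If the run ends clearly faster, that suggests warmup; if it ends clearly slower, that suggests slowdown. The window size and the 1% tolerance are arbitrary assumptions.

```python
from statistics import median


def classify(times: list[float], window: int = 50, rel_tol: float = 0.01) -> str:
    """Crudely label one process execution as 'warmup', 'slowdown' or 'flat'.

    Hypothetical heuristic: compares the median of the first `window`
    in-process iterations with the median of the last `window`.
    """
    if len(times) < 2 * window:
        raise ValueError("need at least 2 * window in-process iterations")
    early = median(times[:window])
    late = median(times[-window:])
    if early > late * (1 + rel_tol):
        return "warmup"    # got faster: the classical expectation
    if late > early * (1 + rel_tol):
        return "slowdown"  # got slower over time
    return "flat"          # no clear change at this tolerance
```

This heuristic has clear limits. A warmup confined to the first handful of iterations barely moves a median over 50 iterations, so it would be reported as flat unless `window` is reduced. The heuristic also cannot recognise cycles or never-ending phase changes.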
  20. Cycles

    [Plot: time (s) vs in-process iteration (0-2000), 0.301-0.347 s; Fannkuch Redux, Hotspot, Linux1/i7-4790K, process execution #1.]
  21. Cycles

    [Plot: time (s) vs in-process iteration (0-2000), 0.358-0.405 s; Fannkuch Redux, Hotspot, OpenBSD/i7-4790, process execution #4; inset zooms into iterations 250-600 (0.359-0.386 s).]
  22. Cycles

    [Plot: time (s) vs in-process iteration (0-2000), 0.504-0.556 s; Binary Trees, PyPy, Linux2/i7-4790, process execution #1; inset zooms into iterations 200-240 (0.506-0.515 s).]
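Cyclic behaviour like that in the Cycles slides can be quantified with autocorrelation. This is a generic signal-processing technique that the slides do not describe. The sketch below assumes that the steady-state part of one process execution is passed in as `xs`. It first skips the initial decay of the autocorrelation function, waiting until the function drops to zero or below. It then reports the lag with the highest remaining autocorrelation, provided that value is at least `min_r`. The threshold `min_r = 0.3` is an arbitrary choice.

```python
from statistics import mean
from typing import Optional


def autocorrelation(xs: list[float], lag: int) -> float:
    """Sample autocorrelation of xs at the given lag (biased estimator)."""
    m = mean(xs)
    var = sum((x - m) ** 2 for x in xs)
    if var == 0:
        return 0.0  # a constant series has no cycles
    cov = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    return cov / var


def dominant_period(xs: list[float], max_lag: int, min_r: float = 0.3) -> Optional[int]:
    """Estimate the cycle length (in iterations) of a steady-state series.

    Returns None if, after the autocorrelation first drops to zero or
    below, no lag up to max_lag reaches an autocorrelation of min_r.
    """
    if max_lag >= len(xs):
        raise ValueError("max_lag must be smaller than the series length")
    acf = [autocorrelation(xs, lag) for lag in range(max_lag + 1)]
    # Skip the initial decay: neighbouring iterations are naturally correlated.
    lag = 1
    while lag <= max_lag and acf[lag] > 0:
        lag += 1
    best = None
    for k in range(lag, max_lag + 1):
        if acf[k] >= min_r and (best is None or acf[k] > acf[best]):
            best = k
    return best
```

For example, `dominant_period([0.0, 1.0, 2.0, 3.0] * 50, max_lag=20)` returns 4.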
  23. Never-ending Phase Changes

    [Plot: time (s) vs in-process iteration (0-2000), 0.350-0.355 s; Fasta, LuaJIT, OpenBSD/i7-4790, process execution #5.]
  24. Inconsistent Process-executions (same machine)

    [Plots: time (s) vs in-process iteration (0-2000), both 3.605-3.681 s. (a) Fasta, PyPy, Linux2/i7-4790, process execution #3; (b) Fasta, PyPy, Linux2/i7-4790, process execution #4.]
  25. Inconsistent Process-executions (different machines)

    [Plots: time (s) vs in-process iteration (0-2000). (a) Binary Trees, C, Linux2/i7-4790, process execution #1, 0.963-1.039 s; (b) Binary Trees, C, OpenBSD/i7-4790, process execution #1, 3.242-3.380 s.]
    Note: the "bouncing ball" pattern is Linux-specific.
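One simple way to flag inconsistent process executions is to compare their steady-state levels. In this hypothetical sketch, the median of the last `window` iterations stands for each process execution's steady state. The runs are reported as inconsistent if the gap between the largest and smallest medians exceeds `rel_tol` of the smallest. The window and tolerance are assumptions. Comparing runs across machines would also mix in ordinary hardware and OS differences.

```python
from statistics import median


def steady_state_medians(runs: list[list[float]], window: int = 100) -> list[float]:
    """Median of the last `window` in-process iterations of each process execution."""
    return [median(run[-window:]) for run in runs]


def inconsistent(runs: list[list[float]], window: int = 100, rel_tol: float = 0.01) -> bool:
    """True if steady-state levels differ by more than rel_tol, relative to the fastest."""
    meds = steady_state_medians(runs, window)
    return (max(meds) - min(meds)) / min(meds) > rel_tol
```

For example, `inconsistent([[1.0] * 200, [1.1] * 200])` is True, because the two steady states differ by 10%.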
  26-27. Summary

    Classical warmup occurs for only:
    • 50% of process executions
    • 25% of (VM, benchmark) pairs
    • 0% of benchmarks across all VMs, machines and process executions
  28-29. Open Questions

    • How can we measure anything any more?
    • For how long has this been going on?
    • Is this really the fault of the VMs?
  30-32. Outlier Detection

    [Plots: time (s) vs in-process iteration (0-2000), both 0.466-0.480 s; Spectral Norm, PyPy, Linux1/i7-4790K, process executions #1 and #2, with measurement outliers marked.]
    Outliers are points outside 5σ of a rolling average. They are categorised as unique outliers (0.05%), common outliers (0.40%) and recurring outliers.
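The outlier rule on the slides (points outside 5σ of a rolling average) can be sketched as follows. Several details are assumptions: the trailing window of 200 iterations, the use of the population standard deviation, and the O(n·window) recomputation. The sketch also assumes one reading of the categories: "unique" outliers are iterations flagged in exactly one process execution, and "common" outliers are iterations flagged in more than one. The "recurring" category is not modelled.

```python
from collections import Counter
from statistics import mean, pstdev


def rolling_outliers(times: list[float], window: int = 200, k: float = 5.0) -> list[int]:
    """Indices whose time lies more than k standard deviations from the
    mean of the preceding `window` in-process iterations."""
    flagged = []
    for i in range(window, len(times)):
        prev = times[i - window:i]
        m, s = mean(prev), pstdev(prev)
        if abs(times[i] - m) > k * s:
            flagged.append(i)
    return flagged


def split_unique_common(per_process: list[list[int]]) -> tuple[set[int], set[int]]:
    """Split flagged iteration indices across process executions into
    (flagged in exactly one execution, flagged in more than one)."""
    counts = Counter(i for flagged in per_process for i in set(flagged))
    unique = {i for i, c in counts.items() if c == 1}
    common = {i for i, c in counts.items() if c > 1}
    return unique, common
```

An iteration that spikes in several process executions at the same position is a hint that the VM itself is doing something there, such as GC or compilation, rather than the measurement being noisy.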
  33-34. Future Work

    • The ‘obvious’ (control more variables; more benchmarks; more VMs; etc.)
    • Can we work out why we see what we see? e.g. is that spike at x = 78 actually {GC, compilation, ...}?
  35. Full Results

    https://archive.org/download/softdev_warmup_experiment_artefacts/v0.2/
    • all_graphs.pdf: all plots in one huge PDF.
    • warmup_results*.json.bz2: raw results.
    (Note: newer results are available.)
  36. References

    • E. Barrett, C. F. Bolz, R. Killick, V. Knight, S. Mount and L. Tratt. VM Warmup Blows Hot and Cold.
    • T. Kalibera and R. Jones. Rigorous Benchmarking in Reasonable Time.
    • C. Seaton. Specialising Dynamic Techniques for Implementing the Ruby Programming Language (Chapter 4).
    • T. Kalibera and R. Jones. Quantifying performance changes with effect size confidence intervals.
  37. Thanks for listening

    [Plots: time (s) vs in-process iteration (0-2000). (a) Fannkuch Redux, Hotspot, Linux1/i7-4790K, process execution #1, 0.301-0.347 s; (b) Fasta, LuaJIT, OpenBSD/i7-4790, process execution #5, 0.350-0.355 s.]