
Numerical computing on the web: benchmarking for the future

David
September 12, 2018


Recent advances in execution environments for JavaScript and WebAssembly that run on a broad range of devices, from workstations and mobile phones to IoT devices, provide new opportunities for portable and web-based numerical computing. Indeed, numerous numerical libraries and applications are emerging on the web, including TensorFlow.js, JSMapReduce, and the NGL Protein Viewer. This paper evaluates the current performance of numerical computing on the web, including both JavaScript and WebAssembly, over a wide range of devices from workstations to IoT devices. We developed a new benchmarking approach, which allowed us to perform centralized benchmarking, including benchmarking on mobile and IoT devices. Using this approach we performed four performance studies using the Ostrich benchmark suite, a collection of numerical programs representing the numerical dwarf categories identified by Colella. We studied the performance evolution of JavaScript, the relative performance of WebAssembly, the performance of server-side Node.js, and a comprehensive performance showdown for a wide range of devices.


Transcript

  1. Numerical Computing on the Web: Benchmarking for the Future. Sable Research Group, McGill University. David Herrera, Hanfeng Chen, Eric Lavoie, Laurie Hendren. [email protected]. November 7, 2018.
  2. Why numerical performance on the Web? Web-enabled devices are everywhere. Various compute-intensive applications are coming up, such as TensorFlow.js (source: https://js.tensorflow.org/) and the NGL Protein Viewer (source: http://nglviewer.org), alongside the recent addition of WebAssembly to the web world.
  5. Background. The DLS 2014 performance study (Khan et al. 2014) led to two further projects: Wu-wei, a benchmarking approach to easily run, manage, and reproduce experiments; and a numerical performance study of JavaScript and WebAssembly across different web-enabled environments and devices.
  6. A benchmarking approach: Wu-wei. Wu-wei provides solutions to the following problems:
     - Combinatorial explosion of possible configurations.
     - Restrictions on how the different experimental dimensions interact.
     - Reproducibility of experiments and handling of dependencies.
     - Benchmarking in remote environments such as mobile and IoT.
  10. Wu-wei: Overview. Figure: Wu-wei artifacts.

      | Process        | Definition                                     | Examples                                             |
      |----------------|------------------------------------------------|------------------------------------------------------|
      | Benchmark      | Algorithm or problem definition                | Back-propagation                                     |
      | Implementation | Concrete realization in a programming language | C, JavaScript                                        |
      | Compiler       | Static transformer to executable               | Browserify, Emcc, Gcc                                |
      | Environment    | Execution environment                          | Chrome v63.0.3239.84, Node.js v8.0.0, Native         |
      | Platform       | Hardware + OS                                  | iPhone X + iOS 11.1, MacBook Pro 2012 + OS X 10.11.6 |
      | Experiment     | User-supplied experiment configuration         | "input-size": "small"                                |
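To make the experiment artifact concrete, here is a minimal sketch of what a user-supplied experiment configuration could look like. Only "input-size": "small" appears on the slide; every other key and value below is an illustrative assumption, not Wu-wei's actual schema.

```javascript
// Hypothetical Wu-wei experiment configuration (sketch only).
// Only "input-size" comes from the slide; the remaining keys are assumptions.
const experiment = {
  "benchmark": "backprop",   // one of the 12 Ostrich benchmarks
  "implementation": "js",    // or "c"
  "compiler": "browserify",  // or "emcc", "gcc"
  "environment": "chrome",   // or "firefox", "node", "native"
  "platform": "mbp2018",     // or "ubuntu-deer"
  "input-size": "small",
  "iterations": 30           // 30 runs, following Georges et al. 2007
};
```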
  11. Wu-wei: Measuring browser remote execution. Figure: Wu-wei remote browser execution (a sequence diagram between the Wu-wei server, Firebase, and the device; the set-up messages carry the server's {IP, PORT}).
      1. The environment spawns a local website and sends the run information to Firebase.
      2. The Wu-wei device captures the request from Firebase.
      3. The device spawns the specified browser in the remote environment.
      4. The environment captures the result and returns control to Wu-wei.
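A minimal device-side sketch of steps 2 and 3 above, assuming a Firebase Realtime Database and a request record holding the server's {IP, PORT}; the database paths, key names, and library choices are illustrative assumptions, not Wu-wei's actual code.

```javascript
// Device-side sketch of the remote-execution handshake (steps 2 and 3).
// All paths and key names here are assumptions for illustration.
const { exec } = require('child_process');
const firebase = require('firebase/app');
require('firebase/database');

firebase.initializeApp({ databaseURL: 'https://example.firebaseio.com' }); // placeholder

// Step 2: watch Firebase for a new benchmark request aimed at this device.
firebase.database().ref('requests/this-device').on('value', (snapshot) => {
  const req = snapshot.val();
  if (!req) return;
  // Step 3: spawn the requested browser, pointed at the server hosting the benchmark page.
  exec(`${req.browser} http://${req.ip}:${req.port}`);
  // Step 4 happens on the server: the benchmark page posts its timings back,
  // and control returns to Wu-wei.
});
```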
  15. Conclusions: Wu-wei. Wu-wei is a benchmarking toolkit that allows you to easily organize, run, and query benchmarking experiments across a variety of experimental dimensions. We have extended Wu-wei to perform centralized benchmarking across remote environments, including mobile and some IoT devices. Repository: https://github.com/Sable/wu-wei-benchmarking-toolkit
  16. Performance Studies. End goal: to study the numerical performance of JavaScript and WebAssembly across a variety of browsers and devices, including desktops, tablets, smartphones, and single-board computers.
  17. Methodology - Choice of Benchmarks. Ostrich Benchmark Set (Khan et al. 2014): a numerical benchmark set containing 12 benchmarks representing the 7 dwarf numerical categories identified by Phillip Colella in 2004. Each benchmark contains a C implementation and an equivalent JavaScript implementation. WebAssembly is obtained by compiling the C benchmarks with Emscripten (Alon Zakai 2011) version 1.37.22 using optimization flag -O3.
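As a concrete illustration, the WebAssembly build of one benchmark would look roughly like the following. The file names are placeholders, and while -O3 and -s WASM=1 are real Emscripten options, the exact invocation used in the study is an assumption.

```bash
# Sketch: compile one C benchmark from the Ostrich suite to WebAssembly.
# File names are placeholders; -O3 and -s WASM=1 are standard emcc options.
emcc -O3 -s WASM=1 backprop.c -o backprop.js   # emits backprop.js (loader) + backprop.wasm
```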
  20. Methodology - Execution & Timing. The experiments measure the start-up time typical of one benchmark run. At each benchmark run, the browser/environment is re-spawned so as to avoid cached optimizations by the JavaScript engines. Each benchmark was run a total of 30 times in each environment, as suggested by Georges et al. 2007. Comparisons are made using relative speedups versus a common baseline.
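A minimal sketch of that measurement loop, assuming each benchmark prints its elapsed time; the command line and output format are illustrative assumptions.

```javascript
// Sketch of the measurement loop: each of the 30 runs re-spawns a fresh
// process so no warmed-up JIT state survives between runs.
const { execSync } = require('child_process');

const RUNS = 30; // per Georges et al. 2007
const times = [];
for (let i = 0; i < RUNS; i++) {
  // A fresh Node.js process per run; a browser experiment re-spawns the browser instead.
  const out = execSync('node backprop.js --input-size small').toString();
  times.push(parseFloat(out)); // assumes the benchmark prints elapsed seconds
}
const mean = times.reduce((a, b) => a + b, 0) / RUNS;
console.log(`mean over ${RUNS} fresh runs: ${mean.toFixed(3)} s`);
```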
  24. RQ1: Old versus new JavaScript. Question: How has JavaScript evolved in the context of numerical computing since the study in 2014? [Figure: JavaScript speedup relative to native C on the MacBook Pro 2018, per benchmark (backprop, bfs, crc, fft, hmm, lavamd, lud, nqueens, nw, pagerank, spmv, srad, geomean), for Chromium38-js, Firefox39-js, Chrome63-js, and Firefox57-js.] Experimental setup: Platforms: ubuntu-deer and mbp2018. Environments: 2014: Firefox 39 & Chromium 38; 2018: Firefox 57 & Chrome 63. Comparison: speedups versus the C baseline.
  26. RQ1: Old versus new JavaScript. Figure: geometric means for the performance of old versus new engines against native C. Insights: JavaScript performance remains within a factor of 2 of native code. Chrome performance has decreased since 2014 on both platforms. Firefox performance has increased since 2014, but it remains within 2x.
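For reference, a geometric mean of per-benchmark speedups like those reported in the figure can be computed as below; the sample values are made up for illustration.

```javascript
// Geometric mean of per-benchmark speedups relative to the C baseline
// (speedup = tC / tJS per benchmark; sample values are made up).
const speedups = [0.55, 0.62, 0.71, 0.48];
const geomean = Math.exp(
  speedups.reduce((acc, s) => acc + Math.log(s), 0) / speedups.length
);
console.log(geomean.toFixed(2)); // one figure of merit per engine/platform pair
```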
  29. RQ1: Old versus new JavaScript. Figure: evolution of numeric performance since 2014 for the Chrome and Firefox browsers. Insights: The performance of numeric code has not improved much since 2014. Firefox browser performance has been progressively increasing since 2014. Chromium's performance has decreased, due to two factors: a new compiler infrastructure introduced around Chromium 59, and a change in the optimization objective of JavaScript engines to favor real-world website loads.
  30. RQ2: Performance of WebAssembly. Question: Does targeting WebAssembly offer performance benefits in the context of numerical computing? Background: WebAssembly is a new virtual ISA for the web, supported by all major browser vendors, that embeds into the JavaScript run-time. Future features include SIMD and multi-threading. Emscripten allows efficient translation from LLVM IR to WebAssembly (Alon Zakai 2011).
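A minimal sketch of how such a module embeds into the JavaScript run-time, using the standard WebAssembly JavaScript API; the file name and the exported run() entry point are placeholders, and a real Emscripten module would also supply an import object through its generated loader.

```javascript
// Sketch: load and time a compiled benchmark in the browser.
// 'backprop.wasm' and exports.run() are placeholders; imports are elided.
WebAssembly.instantiateStreaming(fetch('backprop.wasm'), { /* imports */ })
  .then(({ instance }) => {
    const t0 = performance.now();
    instance.exports.run(); // hypothetical exported entry point
    console.log(`${(performance.now() - t0).toFixed(1)} ms`);
  });
```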
  31. RQ2: Performance of WebAssembly versus C. Figure: geometric means for the performance of WebAssembly versus native C. Insights: The overall range of WebAssembly speedups against native code varied from 0.5x to 0.8x. Firefox 57 wasm achieved the best performance across the three platforms; however, no platform was able to outperform the native C code.
  32. RQ2: WebAssembly performance versus JavaScript. Figure: geometric means for the performance of WebAssembly against JavaScript. Insights: The overall range of WebAssembly speedups against JavaScript varied from 1.3x to 2.7x. The largest speedups were achieved by the mbp2018 Safari engine and the Microsoft Edge engine; however, this is mostly due to the low JavaScript performance of both engines.
  33. RQ3: NodeJS - Server-side performance. Figure: geometric means of Node.js performance versus C. Insights: Node.js WebAssembly performance ranged between 0.5x and 0.9x against native C. No significant difference was found between Node.js and the browser sharing the same engine. Numeric benchmarks can help identify performance issues in new versions of engines.
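For illustration, the same WebAssembly benchmark can be run server-side under Node.js (v8+), which exposes the same WebAssembly API as the browser; the file name and run() export are placeholders.

```javascript
// Sketch: run a compiled benchmark under Node.js.
// 'backprop.wasm' and exports.run() are placeholders; imports are elided.
const fs = require('fs');

const bytes = fs.readFileSync('backprop.wasm');
WebAssembly.instantiate(bytes, { /* imports */ })
  .then(({ instance }) => {
    console.time('backprop');
    instance.exports.run(); // hypothetical exported entry point
    console.timeEnd('backprop');
  });
```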
  34. RQ4: Device Showdown. Figure: device performance across environments, using the native C Raspberry Pi implementation as the baseline.
  37. Conclusions - Performance Studies. JavaScript numerical performance remains within a factor of 2 of native C code. WebAssembly numerical performance ranges from 0.58x to 0.84x of native C performance. WebAssembly performed significantly better than JavaScript on all platforms. The best-performing browser was Firefox, for both JavaScript and WebAssembly. The performance of the mobile phones is very impressive.
  38. Future Work. Explore other experimental parameters, such as:
      - Comparing cold-start vs. warm-start.
      - Comparing speedups for different input sizes, including the small and large inputs available in the Ostrich benchmark set.
      - Comparing the effect of memory and cache on each system.
      Further extend our benchmarks to include more machine learning benchmarks.
      Links to our work:
      Wu-wei: https://github.com/Sable/wu-wei-benchmarking-toolkit
      Ostrich Benchmarks: https://github.com/Sable/ostrich-suite
      Experiment Data: https://github.com/Sable/dls18-ostrich