
Numerical computing on the web: benchmarking for the future

David
September 12, 2018


Recent advances in execution environments for JavaScript and WebAssembly that run on a broad range of devices, from workstations and mobile phones to IoT devices, provide new opportunities for portable and web-based numerical computing. Indeed, numerous numerical libraries and applications are emerging on the web, including TensorFlow.js, JSMapReduce, and the NGL Protein Viewer. This paper evaluates the current performance of numerical computing on the web, including both JavaScript and WebAssembly, over a wide range of devices from workstations to IoT devices. We developed a new benchmarking approach, which allowed us to perform centralized benchmarking, including benchmarking on mobile and IoT devices. Using this approach we performed four performance studies using the Ostrich benchmark suite, a collection of numerical programs representing the numerical dwarf categories identified by Colella. We studied the performance evolution of JavaScript, the relative performance of WebAssembly, the performance of server-side Node.js, and a comprehensive performance showdown for a wide range of devices.


Transcript

  1. Numerical Computing on the Web: Benchmarking for the Future. Sable Research Group, McGill University. David Herrera, Hanfeng Chen, Eric Lavoie, Laurie Hendren. [email protected]. November 7, 2018.
  2. Why numerical performance on the Web? Web-enabled devices are everywhere. Various compute-intensive applications are coming up, such as TensorFlow.js (source: https://js.tensorflow.org/) and the NGL Protein Viewer (source: http://nglviewer.org), alongside the recent addition of WebAssembly to the web world.
  5. Background. The DLS 2014 performance study (Khan et al. 2014) led to two further projects: Wu-wei, a benchmarking approach to easily run, manage, and reproduce experiments; and a numerical performance study of JavaScript and WebAssembly across different web-enabled environments and devices.
  6. A benchmarking approach: Wu-wei. Wu-wei provides solutions to the following problems:
     - Combinatorial explosion of possible configurations.
     - Restrictions on how the different experimental dimensions interact.
     - Reproducibility of experiments and handling of dependencies.
     - Benchmarking in remote environments such as mobile and IoT.
  10. Wu-wei: Overview. Figure: Wu-wei artifacts.

      | Process        | Definition                                     | Examples                                             |
      |----------------|------------------------------------------------|------------------------------------------------------|
      | Benchmark      | Algorithm or problem definition                | Back-propagation                                     |
      | Implementation | Concrete realization in a programming language | C, JavaScript                                        |
      | Compiler       | Static transformer to executable               | Browserify, Emcc, Gcc                                |
      | Environment    | Execution environment                          | Chrome v63.0.3239.84, Node.js v8.0.0, Native         |
      | Platform       | Hardware + OS                                  | iPhone X + iOS 11.1, MacBook Pro 2012 + OS X 10.11.6 |
      | Experiment     | User-supplied experiment configuration         | "input-size": "small"                                |
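To make the experiment artifact concrete, here is a minimal sketch of what a user-supplied experiment configuration could look like. Only "input-size": "small" appears on the slide; every other key and value below is an illustrative assumption, not Wu-wei's actual schema.

```javascript
// Hypothetical Wu-wei experiment configuration (sketch only).
// Only "input-size" comes from the slide; the remaining keys are assumptions.
const experiment = {
  "benchmark": "backprop",   // one of the 12 Ostrich benchmarks
  "implementation": "js",    // or "c"
  "compiler": "browserify",  // or "emcc", "gcc"
  "environment": "chrome",   // or "firefox", "node", "native"
  "platform": "mbp2018",     // or "ubuntu-deer"
  "input-size": "small",
  "iterations": 30           // 30 runs, following Georges et al. 2007
};
```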
  11. Wu-wei: Measuring browser remote execution. Figure: Wu-wei remote browser execution (a sequence diagram between the Wu-wei server, Firebase, and the device; the set-up messages carry the server's {IP, PORT}).
      1. The environment spawns a local website and sends the run information to Firebase.
      2. The Wu-wei device captures the request from Firebase.
      3. The device spawns the specified browser in the remote environment.
      4. The environment captures the result and returns control to Wu-wei.
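A minimal device-side sketch of steps 2 and 3 above, assuming a Firebase Realtime Database and a request record holding the server's {IP, PORT}; the database paths, key names, and library choices are illustrative assumptions, not Wu-wei's actual code.

```javascript
// Device-side sketch of the remote-execution handshake (steps 2 and 3).
// All paths and key names here are assumptions for illustration.
const { exec } = require('child_process');
const firebase = require('firebase/app');
require('firebase/database');

firebase.initializeApp({ databaseURL: 'https://example.firebaseio.com' }); // placeholder

// Step 2: watch Firebase for a new benchmark request aimed at this device.
firebase.database().ref('requests/this-device').on('value', (snapshot) => {
  const req = snapshot.val();
  if (!req) return;
  // Step 3: spawn the requested browser, pointed at the server hosting the benchmark page.
  exec(`${req.browser} http://${req.ip}:${req.port}`);
  // Step 4 happens on the server: the benchmark page posts its timings back,
  // and control returns to Wu-wei.
});
```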
  15. Conclusions: Wu-wei. Wu-wei is a benchmarking toolkit that allows you to easily organize, run, and query benchmarking experiments across a variety of experimental dimensions. We have extended Wu-wei to perform centralized benchmarking across remote environments, including mobile and some IoT devices. Repository: https://github.com/Sable/wu-wei-benchmarking-toolkit
  16. Performance Studies. End goal: to study the numerical performance of JavaScript and WebAssembly across a variety of browsers and devices, including desktops, tablets, smartphones, and single-board computers.
  17. Methodology - Choice of Benchmarks. Ostrich Benchmark Set (Khan et al. 2014): a numerical benchmark set containing 12 benchmarks representing the 7 dwarf numerical categories identified by Phillip Colella in 2004. Each benchmark contains a C implementation and an equivalent JavaScript implementation. WebAssembly is obtained by compiling the C benchmarks with Emscripten (Alon Zakai 2011) version 1.37.22 using optimization flag -O3.
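As a concrete illustration, the WebAssembly build of one benchmark would look roughly like the following. The file names are placeholders, and while -O3 and -s WASM=1 are real Emscripten options, the exact invocation used in the study is an assumption.

```bash
# Sketch: compile one C benchmark from the Ostrich suite to WebAssembly.
# File names are placeholders; -O3 and -s WASM=1 are standard emcc options.
emcc -O3 -s WASM=1 backprop.c -o backprop.js   # emits backprop.js (loader) + backprop.wasm
```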
  20. Methodology - Execution & Timing. The experiments measure the start-up time typical of one benchmark run. At each benchmark run, the browser/environment is re-spawned so as to avoid cached optimizations by the JavaScript engines. Each benchmark was run a total of 30 times in each environment, as suggested by Georges et al. 2007. Comparisons are made using relative speedups versus a common baseline.
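A minimal sketch of that measurement loop, assuming each benchmark prints its elapsed time; the command line and output format are illustrative assumptions.

```javascript
// Sketch of the measurement loop: each of the 30 runs re-spawns a fresh
// process so no warmed-up JIT state survives between runs.
const { execSync } = require('child_process');

const RUNS = 30; // per Georges et al. 2007
const times = [];
for (let i = 0; i < RUNS; i++) {
  // A fresh Node.js process per run; a browser experiment re-spawns the browser instead.
  const out = execSync('node backprop.js --input-size small').toString();
  times.push(parseFloat(out)); // assumes the benchmark prints elapsed seconds
}
const mean = times.reduce((a, b) => a + b, 0) / RUNS;
console.log(`mean over ${RUNS} fresh runs: ${mean.toFixed(3)} s`);
```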
  24. RQ1: Old versus new JavaScript. Question: How has JavaScript evolved in the context of numerical computing since the study in 2014? [Figure: JavaScript speedup relative to native C on the MacBook Pro 2018, per benchmark (backprop, bfs, crc, fft, hmm, lavamd, lud, nqueens, nw, pagerank, spmv, srad, geomean), for Chromium38-js, Firefox39-js, Chrome63-js, and Firefox57-js.] Experimental setup: Platforms: ubuntu-deer and mbp2018. Environments: 2014: Firefox 39 & Chromium 38; 2018: Firefox 57 & Chrome 63. Comparison: speedups versus the C baseline.
  26. RQ1: Old versus new JavaScript. Figure: geometric means for the performance of old versus new engines against native C. Insights: JavaScript performance remains within a factor of 2 of native code. Chrome performance has decreased since 2014 on both platforms. Firefox performance has increased since 2014, but it remains within 2x.
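For reference, a geometric mean of per-benchmark speedups like those reported in the figure can be computed as below; the sample values are made up for illustration.

```javascript
// Geometric mean of per-benchmark speedups relative to the C baseline
// (speedup = tC / tJS per benchmark; sample values are made up).
const speedups = [0.55, 0.62, 0.71, 0.48];
const geomean = Math.exp(
  speedups.reduce((acc, s) => acc + Math.log(s), 0) / speedups.length
);
console.log(geomean.toFixed(2)); // one figure of merit per engine/platform pair
```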
  29. RQ1: Old versus new JavaScript. Figure: evolution of numeric performance since 2014 for the Chrome and Firefox browsers. Insights: The performance of numeric code has not improved much since 2014. Firefox browser performance has been progressively increasing since 2014. Chromium's performance has decreased, due to two factors: a new compiler infrastructure introduced around Chromium 59, and a change in the optimization objective of JavaScript engines to favor real-world website loads.
  30. RQ2: Performance of WebAssembly. Question: Does targeting WebAssembly offer performance benefits in the context of numerical computing? Background: WebAssembly is a new virtual ISA for the web, supported by all major browser vendors, that embeds into the JavaScript run-time. Future features include SIMD and multi-threading. Emscripten allows efficient translation from LLVM IR to WebAssembly (Alon Zakai 2011).
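A minimal sketch of how such a module embeds into the JavaScript run-time, using the standard WebAssembly JavaScript API; the file name and the exported run() entry point are placeholders, and a real Emscripten module would also supply an import object through its generated loader.

```javascript
// Sketch: load and time a compiled benchmark in the browser.
// 'backprop.wasm' and exports.run() are placeholders; imports are elided.
WebAssembly.instantiateStreaming(fetch('backprop.wasm'), { /* imports */ })
  .then(({ instance }) => {
    const t0 = performance.now();
    instance.exports.run(); // hypothetical exported entry point
    console.log(`${(performance.now() - t0).toFixed(1)} ms`);
  });
```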
  31. RQ2: Performance of WebAssembly versus C. Figure: geometric means for the performance of WebAssembly versus native C. Insights: The overall range of WebAssembly speedups against native code varied from 0.5x to 0.8x. Firefox 57 wasm achieved the best performance across the three platforms; however, no platform was able to outperform the native C code.
  32. RQ2: WebAssembly performance versus JavaScript. Figure: geometric means for the performance of WebAssembly against JavaScript. Insights: The overall range of WebAssembly speedups against JavaScript varied from 1.3x to 2.7x. The largest speedups were achieved by the mbp2018 Safari engine and the Microsoft Edge engine; however, this is mostly due to the low JavaScript performance of both engines.
  33. RQ3: NodeJS - Server-side performance. Figure: geometric means of Node.js performance versus C. Insights: Node.js WebAssembly performance ranged between 0.5x and 0.9x against native C. No significant difference was found between Node.js and the browser sharing the same engine. Numeric benchmarks can help identify performance issues in new versions of engines.
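For illustration, the same WebAssembly benchmark can be run server-side under Node.js (v8+), which exposes the same WebAssembly API as the browser; the file name and run() export are placeholders.

```javascript
// Sketch: run a compiled benchmark under Node.js.
// 'backprop.wasm' and exports.run() are placeholders; imports are elided.
const fs = require('fs');

const bytes = fs.readFileSync('backprop.wasm');
WebAssembly.instantiate(bytes, { /* imports */ })
  .then(({ instance }) => {
    console.time('backprop');
    instance.exports.run(); // hypothetical exported entry point
    console.timeEnd('backprop');
  });
```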
  34. RQ4: Device Showdown. Figure: device performance across environments, using the native C Raspberry Pi implementation as the baseline.
  37. Conclusions - Performance Studies. JavaScript numerical performance remains within a factor of 2 of native C code. WebAssembly numerical performance ranges from 0.58x to 0.84x of native C performance. WebAssembly performed significantly better than JavaScript on all platforms. The best-performing browser was Firefox, for both JavaScript and WebAssembly. The performance of the mobile phones is very impressive.
  38. Future Work. Explore other experimental parameters, such as:
      - Comparing cold-start vs. warm-start.
      - Comparing speedups for different input sizes, including the small and large inputs available in the Ostrich benchmark set.
      - Comparing the effect of memory and cache on each system.
      Further extend our benchmarks to include more machine learning benchmarks.
      Links to our work:
      Wu-wei: https://github.com/Sable/wu-wei-benchmarking-toolkit
      Ostrich Benchmarks: https://github.com/Sable/ostrich-suite
      Experiment Data: https://github.com/Sable/dls18-ostrich