Your benchmark may not guide real application performance

by Tetsuharu Ohzeki

Slide 1

Slide 1 text

Your benchmark may not guide real application performance Tetsuharu OHZEKI

Slide 2

Slide 2 text

Performance is always a hot topic • We love fast software! • Low latency, high throughput, power efficiency, fast response… • Would you like to use slow software? • Have you ever seen people who say “having many features is more important than speed” actually love slow software? • No!

Slide 3

Slide 3 text

Performance is important for • User Experience • One of the fundamental values of software • Marketing value • A newcomer sometimes beats other products on performance • In 2008, Google Chrome beat other browsers this way

Slide 4

Slide 4 text

Make software performant • Random optimization does not contribute to the actual user experience • “Don’t guess, measure” is always right • We can use benchmarks to measure our software

Slide 5

Slide 5 text

Benchmark • Score software performance as a quantitative value • i.e. normalize software performance with a benchmark • Reproducibility is important • Protect our application from performance regressions • We use benchmarks to evaluate our application's performance

Slide 6

Slide 6 text

Questions • Does your benchmark really make sense? • Does your benchmark actually score a real application scenario?

Slide 7

Slide 7 text

Goals • Show that benchmarking with real scenarios is an important principle for making your application faster • Introduce (pitfall) case studies related to benchmarks

Slide 8

Slide 8 text

Outline 1. Introduction 2. What should we focus on? ⬅ 3. JS cost is difficult 4. Critical path may be hidden 5. How to improve performance? 6. Conclusion

Slide 9

Slide 9 text

Working on a video-streaming service… • In this case, the key to performance is when the video starts playing • What is a meaningful metric? • First Meaningful Paint/First Contentful Paint is nice • Time to Interactive? • Is it actually meaningful for this service? • Think about the bad case where the page is responsive but the video has not started

Slide 10

Slide 10 text

General metrics may not suit special cases • General metrics are useful for measuring the performance of general web pages • e.g. Startup time • But general metrics cannot capture application-specific performance • Measure real scenarios for your application • What is your application doing? • What is its purpose?
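
For the video-streaming case, an application-specific metric could be "time from opening the page until the video is actually playing". The following is a minimal sketch of how such a scenario could be timed. The class name `ScenarioTimer`, the mark names, and the injected clock are all assumptions made for illustration and are not from the talk.

```typescript
// A clock returns a timestamp in milliseconds. It is injected so the sketch
// can run anywhere; in a browser, () => performance.now() is a typical choice.
type Clock = () => number;

// Records named points in one user scenario and reports the time between them.
class ScenarioTimer {
  private readonly marks = new Map<string, number>();
  private readonly now: Clock;

  constructor(now: Clock) {
    this.now = now;
  }

  // Remember when a named point was reached. Only the first hit is kept,
  // so a video that pauses and resumes does not overwrite the first start.
  mark(name: string): void {
    if (!this.marks.has(name)) {
      this.marks.set(name, this.now());
    }
  }

  // Elapsed milliseconds between two marks, or undefined if either is missing.
  measure(from: string, to: string): number | undefined {
    const start = this.marks.get(from);
    const end = this.marks.get(to);
    if (start === undefined || end === undefined) return undefined;
    return end - start;
  }
}

// Hypothetical wiring in a browser (not executed here):
//   const timer = new ScenarioTimer(() => performance.now());
//   timer.mark("page-open");
//   video.addEventListener("playing", () => timer.mark("video-playing"));
//   ...later: timer.measure("page-open", "video-playing")
```

A page can score well on a general metric while `measure("page-open", "video-playing")` stays large or undefined, which is exactly the bad case described on the previous slide.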

Slide 11

Slide 11 text

Lesson • Performance metrics are not simple • General purpose • Application-specific • We should think about which performance metrics are most suitable • Not only Lighthouse! • We should focus on the actual scenarios our application will perform

Slide 12

Slide 12 text

Outline 1. Introduction 2. What should we focus on? 3. JS cost is difficult ⬅ 4. Critical path may be hidden 5. How to improve performance 6. Conclusion

Slide 13

Slide 13 text

When you optimize your code… • I’d like to optimize my slow code! • But the running time per operation is pretty small… (e.g. ~0.1 ms/op) • I cannot find a difference! • I have a nice idea. Run this code 10000 times • The result will be stretched! Easy to compare! • …Wait! Is this really a good approach?

Slide 14

Slide 14 text

JSVM has multiple tiers • A JSVM has multiple tiers to optimize user code • e.g. JavaScriptCore has 4 tiers (LLInt, Baseline, DFG, FTL) • The JIT compiler changes the optimization level speculatively based on how much your code runs • Hot paths (executed frequently) are heavily optimized • Cold paths (executed rarely) are less optimized

Slide 15

Slide 15 text

A hot loop may not be what your application actually does • A typical micro-benchmark executes many iterations to stabilize results • But many iterations cause functions to be compiled with heavy optimizations by the highest JIT tier • If your actual workload runs only a few times, many iterations lead to results different from what you would see in practice • Let’s see how execution time changes in some cases from JetStream2
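
One way to see this effect in your own code is to record the time of every iteration separately instead of one total for the whole loop. The sketch below does that and then compares the first few calls with the rest. The function names, the injected clock, and the `coldCount` split are assumptions for illustration; where a real engine switches tiers depends on the engine and the code.

```typescript
// Times every call separately instead of reporting one total for the loop.
// `now` returns milliseconds; in a browser, () => performance.now() is typical.
function timeEachIteration(
  work: () => void,
  iterations: number,
  now: () => number,
): number[] {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = now();
    work();
    samples.push(now() - start);
  }
  return samples;
}

// Median of a non-empty list of samples.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Splits the samples into the first `coldCount` calls and the remaining calls.
// Requires 0 < coldCount < samples.length so that both groups are non-empty.
function summarize(
  samples: number[],
  coldCount: number,
): { cold: number; warm: number } {
  return {
    cold: median(samples.slice(0, coldCount)),
    warm: median(samples.slice(coldCount)),
  };
}
```

If the real workload calls the function only a handful of times, the `cold` number is the one closer to what users experience, even though a 10000-iteration total is dominated by `warm`.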

Slide 16

Slide 16 text

Plot: JetStream2/prepack-wtb execution time of each iteration (iteration count changed to 100). Y axis: running time (ms); X axis: iteration count (1 to 100). Series: WebKit r252841, Chrome Canary 80.0.3976.0, Firefox 72.0a1 (20191128214853). Annotations: Lower Tier, Highest Tier.

Slide 17

Slide 17 text

Plot: JetStream2/Air execution time of each iteration (iteration=120). Y axis: running time (ms); X axis: iteration count (1 to 120). Series: WebKit r252841, Chrome Canary 80.0.3976.0, Firefox 72.0a1 (20191128214853). Annotations: Lower Tier, Highest Tier.

Slide 18

Slide 18 text

Lesson • A JSVM changes optimization levels based on execution counts • The workload may change your benchmark score • Take care to profile on the actual workload as far as possible • An invalid assumption misleads your optimization strategy • Misled this way, your application might go wrong…

Slide 19

Slide 19 text

Outline 1. Introduction 2. What should we focus on? 3. JS cost is difficult 4. Critical path may be hidden ⬅ 5. How to improve performance 6. Conclusion

Slide 20

Slide 20 text

I tried to improve the page load time… • Added the ‘defer’ attribute to <script> to improve the overall page init speed • But it did not improve the first meaningful paint. Why? cite: https://docs.google.com/presentation/d/1MXlFGqFQFJByv8k6Ege0pt0GwJQqbjoh7GdIYia9UQg/

Slide 21

Slide 21 text

Before / After • Sub-resource loading improved • But there was no improvement on the critical path cite: https://docs.google.com/presentation/d/1MXlFGqFQFJByv8k6Ege0pt0GwJQqbjoh7GdIYia9UQg/

Slide 22

Slide 22 text

Why? • The critical path depends on a “bootstrap” script which starts working on DOMContentLoaded • script[defer] does not change this behavior • This “bootstrap” script is small and executes quickly • So the profiler does not readily show it as a “bottleneck” cite: https://docs.google.com/presentation/d/1MXlFGqFQFJByv8k6Ege0pt0GwJQqbjoh7GdIYia9UQg/
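
The reasoning can be illustrated with a toy dependency model: a step finishes at the latest finish time of its dependencies plus its own duration, so speeding up a step that is not on the path to first meaningful paint leaves that metric unchanged. This is only a sketch under invented assumptions. The task names, the durations, and the dependency graph are made up and do not come from the cited case.

```typescript
// One step in page startup: how long it takes and what must finish first.
interface Task {
  duration: number;     // milliseconds (made-up numbers)
  dependsOn: string[];  // names of tasks that must finish before this one
}

// Finish time of `name`, assuming every task starts as soon as all of its
// dependencies have finished. The graph is assumed to be acyclic.
function finishTime(
  tasks: Record<string, Task>,
  name: string,
  memo: Map<string, number> = new Map<string, number>(),
): number {
  const cached = memo.get(name);
  if (cached !== undefined) return cached;
  const task = tasks[name];
  if (task === undefined) throw new Error(`unknown task: ${name}`);
  let start = 0;
  for (const dep of task.dependsOn) {
    start = Math.max(start, finishTime(tasks, dep, memo));
  }
  const finish = start + task.duration;
  memo.set(name, finish);
  return finish;
}

// Hypothetical page: the small bootstrap step runs after parsing
// (standing in for DOMContentLoaded) and triggers the first meaningful paint.
const before: Record<string, Task> = {
  parseHtml: { duration: 300, dependsOn: [] },
  subResources: { duration: 900, dependsOn: [] },
  bootstrap: { duration: 50, dependsOn: ["parseHtml"] },
  firstMeaningfulPaint: { duration: 200, dependsOn: ["bootstrap"] },
  loadEvent: { duration: 0, dependsOn: ["subResources", "firstMeaningfulPaint"] },
};

// Same page after an optimization that only speeds up sub-resource loading.
const after: Record<string, Task> = {
  ...before,
  subResources: { duration: 500, dependsOn: [] },
};
```

In this model `loadEvent` drops from 900 to 550 while `firstMeaningfulPaint` stays at 550 in both versions. The 50 ms bootstrap step would look negligible in a profile sorted by self time, yet everything the user waits for hangs off it.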

Slide 23

Slide 23 text

(Unfortunately) Finding bottlenecks is hard • Using several tools together is better for cross-cutting bottleneck analysis • But be careful, a profiler sometimes shows unrelated values • It often requires domain-specific knowledge • How does your application work? • Is it a real bottleneck? • Performance Tracing for Tasks • Causal Profiling [Curtsinger+, SOSP ‘15] (Virtual Speedup) cite: https://docs.google.com/presentation/d/1MXlFGqFQFJByv8k6Ege0pt0GwJQqbjoh7GdIYia9UQg/

Slide 24

Slide 24 text

Benchmark Site for Networking • Firefox scores slower than Chrome on the same devices • https://bugzilla.mozilla.org/show_bug.cgi?id=1556022 • https://bugzilla.mozilla.org/show_bug.cgi?id=1570313 • Does this simply mean that “Firefox's network stack is slow”? • We tend to think so. Really?

Slide 25

Slide 25 text

What did this benchmark measure in Firefox? https://twitter.com/hsivonen/status/1179763669535805441

Slide 26

Slide 26 text

What did this benchmark measure in Firefox? • This benchmark caused many conversions from UTF-8 to UTF-16 • The site uses XMLHttpRequest, but its responseType is text for the download test • Why not use “.responseType=arraybuffer”? • In the worst case, it wastes 59% of the overall processing time in the paint phase • A fancy animation caused a performance issue that is not related to networking!
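
As a rough illustration of the `responseType` suggestion, a download test that only needs the byte count could request the body as an `ArrayBuffer`, so the response is exposed as raw bytes rather than as a decoded string. This is a hedged sketch, not the benchmark site's code: `XhrLike`, `downloadByteLength`, and the callback shape are hypothetical names, and the local interface only mirrors the few `XMLHttpRequest` members used here. Error handling is omitted.

```typescript
// The small part of XMLHttpRequest this sketch relies on. It is declared
// locally so the example does not depend on browser type definitions.
interface XhrLike {
  responseType: string;
  response: unknown;
  open(method: string, url: string): void;
  send(): void;
  addEventListener(type: "load", listener: () => void): void;
}

// Starts a GET request and reports how many bytes arrived.
// "arraybuffer" asks for the body as raw bytes instead of text.
function downloadByteLength(
  xhr: XhrLike,
  url: string,
  onDone: (byteLength: number) => void,
): void {
  xhr.open("GET", url);
  xhr.responseType = "arraybuffer"; // set after open() and before send()
  xhr.addEventListener("load", () => {
    const body = xhr.response;
    // Report 0 if the body is not an ArrayBuffer (for example, on failure).
    onDone(body instanceof ArrayBuffer ? body.byteLength : 0);
  });
  xhr.send();
}

// Hypothetical browser usage (not executed here):
//   downloadByteLength(new XMLHttpRequest(), "/payload.bin", (n) => console.log(n));
```

Whether this removes the conversion cost seen in Firefox would still need to be confirmed by profiling the real page.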

Slide 27

Slide 27 text

Slide 28

Slide 28 text

Lesson • The critical path is important, but it might be hidden • The profiler might not show it • There may be problems that you cannot control • To actually improve your application, insight into its specific behavior is most important • Break down bottlenecks with various tools & knowledge

Slide 29

Slide 29 text

Outline 1. Introduction 2. What should we focus on? 3. JS cost is difficult 4. Critical path may be hidden 5. How to improve performance ⬅ 6. Conclusion

Slide 30

Slide 30 text

Use benchmarks to keep your app fast • “The way to make a program faster is to never let it get slower” • https://webkit.org/performance/ • Let’s benchmark your application continuously and plot the results per commit

Slide 31

Slide 31 text

Use benchmarks to keep your app fast • Focus on the long-term trend • Each score may vary a bit randomly due to outside factors • Other OS services, other guests on the hypervisor, and more • Reproducible infrastructure is important so tests can be rerun
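
Focusing on the trend rather than on single scores could look like the sketch below, which compares the median of the most recent per-commit scores with the median of the scores just before them. The function names, the window size, and the tolerance are assumptions chosen for illustration, and scores are assumed to be "lower is better" (for example, milliseconds).

```typescript
// Median of a non-empty list of scores.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Returns true when the latest `windowSize` scores are slower than the
// `windowSize` scores before them by more than `tolerance` (0.05 = 5%).
// Using medians keeps one noisy run from raising an alarm.
function hasRegressed(
  scores: number[],
  windowSize: number,
  tolerance: number,
): boolean {
  if (windowSize < 1 || scores.length < windowSize * 2) {
    return false; // not enough history to judge a trend
  }
  const recent = scores.slice(-windowSize);
  const baseline = scores.slice(-2 * windowSize, -windowSize);
  return median(recent) > median(baseline) * (1 + tolerance);
}
```

With a window of 5 and a tolerance of 5%, a single slow run caused by another guest on the hypervisor is ignored, while five consistently slower commits are flagged.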

Slide 32

Slide 32 text

Outline 1. Introduction 2. What should we focus on? 3. JS cost is difficult 4. Critical path may be hidden 5. How to improve performance 6. Conclusion ⬅

Slide 33

Slide 33 text

Conclusions • Real scenarios guide where you should improve performance • Analyze perf issues deeply with tools & your app-specific knowledge • CI is good for maintaining performance through iteration cycles • First step: benchmark your application based on your story