
Your benchmark may not guide real application performance


JSConfJP 2019

Tetsuharu Ohzeki

December 01, 2019

Transcript

  1. Your benchmark may not guide
    real application performance
    Tetsuharu OHZEKI


  2. Performance is always a hot topic
    • We love fast software!

    • Low latency, high throughput, power efficiency, fast response…

    • Would you like to use slow software?

    • Have you ever seen people who say “many features are
    more important than speed” actually enjoy slow software?

    • No!


  3. Performance is important for
    • User experience

    • One of the fundamental values of software
    • Marketing value

    • Newcomers sometimes beat established products on performance

    • In 2008, Google Chrome outperformed the other browsers


  4. Make software performant
    • Random optimization does not contribute to the actual user experience

    • “Don’t guess, measure” is always right

    • We can use benchmarks to measure our software


  5. Benchmark
    • Scores software performance as a quantitative value
    • i.e. normalizes software performance by a benchmark

    • Reproducibility is important
    • Keeps our application fast by guarding against regressions

    • We use benchmarks to evaluate our application’s performance


  6. Questions
    • Does your benchmark really make sense?

    • Does your benchmark actually score a real application scenario?


  7. Goals
    • Show that benchmarking with real scenarios is
    an important principle for making your application
    faster
    • Introduce (pitfall) case studies related to
    benchmarks


  8. Outline
    1. Introduction

    2. What should we focus on? ⬅

    3. JS cost is difficult

    4. Critical path may be hidden

    5. How to improve performance?

    6. Conclusion


  9. Working on a video-streaming service…
    • In this case, the performance key is when the video starts to play

    • What is a meaningful metric?

    • First Meaningful Paint / First Contentful Paint is nice

    • Time to Interactive?

    • Is it actually meaningful for this service?
    • Think about the bad case where the page is responsive but the video has not
    started


  10. General metrics may not suit special cases
    • General metrics are useful for measuring the performance of general web pages

    • e.g. startup time

    • But general metrics cannot capture application-specific performance

    • Measure real scenarios for your application
    • What is your application doing?

    • What is its purpose?
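    A hedged sketch of how such an application-specific metric could be
    recorded with the User Timing API (the mark names and the
    "time-to-first-frame" metric are illustrative, not from the talk;
    Node's perf_hooks mirrors the browser API, so the sketch is runnable
    outside a browser — in a real page you would place the marks at the
    actual events):

    ```javascript
    // Record an application-specific metric ("time until video playback
    // starts") with the User Timing API instead of a general-purpose one.
    const { performance } = require("node:perf_hooks");

    performance.mark("player-requested");   // e.g. the user pressed "play"
    // ... application work happens here (fetch manifest, buffer, decode) ...
    performance.mark("video-first-frame");  // e.g. the first frame rendered

    // performance.measure() returns the PerformanceMeasure entry in
    // modern runtimes (Node >= 16 and current browsers).
    const metric = performance.measure(
      "time-to-first-frame",
      "player-requested",
      "video-first-frame"
    );
    console.log(`time-to-first-frame: ${metric.duration.toFixed(2)} ms`);
    ```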


  11. Lesson
    • Performance metrics are not simple

    • General purpose

    • Application specific

    • We should think about which performance metric is most suitable
    • Not only Lighthouse!

    • We should focus on the actual scenarios our application will run


  12. Outline
    1. Introduction

    2. What should we focus on?

    3. JS cost is difficult ⬅
    4. Critical path may be hidden

    5. How to improve performance

    6. Conclusion


  13. When you optimize your code…
    • I’d like to optimize my slow code!

    • But the running time per op is pretty small… (e.g. ~0.1 ms/op)

    • I cannot see a difference!

    • I have a nice idea: run this code 10,000 times

    • The difference will be stretched! Easy to compare!

    • …Wait! Is this really a good approach?
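    The naive approach above looks roughly like this (a minimal sketch;
    `work()` is a made-up stand-in for whatever ~0.1 ms/op code is being
    optimized):

    ```javascript
    // Naive micro-benchmark: repeat a tiny operation many times and report
    // the average. The pitfall (next slides): 10,000 iterations push the
    // function into the highest JIT tier, which a real workload running it
    // only a few times would never reach.
    const { performance } = require("node:perf_hooks");

    function work() {
      // Placeholder for the small operation under test.
      let sum = 0;
      for (let i = 0; i < 1000; i++) sum += Math.sqrt(i);
      return sum;
    }

    const ITERATIONS = 10000;
    const start = performance.now();
    for (let i = 0; i < ITERATIONS; i++) work();
    const avgMs = (performance.now() - start) / ITERATIONS;
    console.log(`average: ${avgMs.toFixed(4)} ms/op`);
    ```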



  14. JSVMs have multiple tiers
    • A JSVM has multiple tiers for optimizing user code

    • e.g. JavaScriptCore has 4 tiers (LLInt, Baseline, DFG, FTL)

    • The JIT compiler speculatively changes the optimization level based on how
    often your code runs

    • Hot paths (executed frequently) get heavily optimized

    • Cold paths (executed rarely) stay less optimized


  15. A hot loop may not be
    what your application actually does
    • A typical micro-benchmark executes many iterations to stabilize results

    • But many iterations cause functions to be compiled with heavy
    optimizations by the highest JIT tier

    • If your actual workload runs only a few times, many iterations lead to
    results different from what you expected

    • Let’s see how execution time changes per iteration for some cases from JetStream2
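    One way to observe this warm-up effect yourself is to record each
    iteration's time separately instead of only a total (a minimal sketch;
    `workload()` is a made-up stand-in for the code under test):

    ```javascript
    // Record per-iteration running times. In a tiered JSVM the first
    // iterations typically run slower (interpreter / baseline JIT) and
    // later ones faster (optimizing JIT), so a single average hides which
    // tier your real workload would actually run in.
    const { performance } = require("node:perf_hooks");

    function workload() {
      // Stand-in for the code under test.
      let s = "";
      for (let i = 0; i < 2000; i++) s += i % 10;
      return s.length;
    }

    const times = [];
    for (let i = 0; i < 100; i++) {
      const t0 = performance.now();
      workload();
      times.push(performance.now() - t0);
    }

    // Compare early (cold) vs late (warm) iterations instead of averaging.
    const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
    console.log("first 5 iterations:", mean(times.slice(0, 5)).toFixed(3), "ms");
    console.log("last 5 iterations:", mean(times.slice(-5)).toFixed(3), "ms");
    ```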


  16. Plot of JetStream2/prepack-wtb execution times per iteration
    (iteration count changed to 100)
    [Chart: Running Time (ms), 150–600, vs. Iteration Count 1–100;
    compares WebKit r252841, Chrome Canary 80.0.3976.0, and
    Firefox 72.0a1 (20191128214853); times drop as the code moves from
    lower tiers to the highest tier]


  17. Plot of JetStream2/Air execution times per iteration
    (iteration count = 120)
    [Chart: Running Time (ms), 0–90, vs. Iteration Count 1–120;
    compares WebKit r252841, Chrome Canary 80.0.3976.0, and
    Firefox 72.0a1 (20191128214853); times drop as the code moves from
    lower tiers to the highest tier]


  18. Lesson
    • A JSVM changes optimization levels based on execution counts

    • The workload may change your benchmark score

    • Be careful to profile the actual workload as much as possible

    • Invalid assumptions mislead your optimization strategy
    • Misled like this, your application might go wrong…


  19. Outline
    1. Introduction

    2. What should we focus on?

    3. JS cost is difficult

    4. Critical path may be hidden ⬅
    5. How to improve performance

    6. Conclusion


  20. I tried to improve the page load time…
    • Added the ‘defer’ attribute to script elements to improve the overall page init speed

    • But it did not improve the First Meaningful Paint. Why?
    cite: https://docs.google.com/presentation/d/1MXlFGqFQFJByv8k6Ege0pt0GwJQqbjoh7GdIYia9UQg/


  21. Before / After
    • Managed to improve sub-resource loading
    • But no improvement on the critical path
    cite: https://docs.google.com/presentation/d/1MXlFGqFQFJByv8k6Ege0pt0GwJQqbjoh7GdIYia9UQg/


  22. Why?
    • The critical path depends on a “bootstrap” script which starts working on
    DOMContentLoaded

    • script[defer] does not change this behavior
    • This “bootstrap” script is small and executes quickly

    • So the profiler does not easily surface it as a “bottleneck”
    cite: https://docs.google.com/presentation/d/1MXlFGqFQFJByv8k6Ege0pt0GwJQqbjoh7GdIYia9UQg/


  23. (Unfortunately) Finding bottlenecks is hard
    • Using several tools is better for cross-cutting bottleneck analysis

    • But be careful: profilers sometimes show unrelated values

    • It often requires domain-specific knowledge

    • How does your application work?

    • Is it a real bottleneck?

    • Performance tracing for tasks

    • Causal Profiling [Curtsinger+, SOSP ’15] (virtual speedup)

    cite: https://docs.google.com/presentation/d/1MXlFGqFQFJByv8k6Ege0pt0GwJQqbjoh7GdIYia9UQg/


  24. Benchmark site for networking
    • Firefox scores slower than Chrome on the same devices

    • https://bugzilla.mozilla.org/show_bug.cgi?id=1556022

    • https://bugzilla.mozilla.org/show_bug.cgi?id=1570313

    • Does this simply mean that “the Firefox network stack is slow”?

    • We tend to think so. Really?


  25. What did this benchmark measure in Firefox?
    https://twitter.com/hsivonen/status/1179763669535805441


  26. • This benchmark caused many translations from UTF-8 to UTF-16
    • This site uses XMLHttpRequest, but its responseType is text for the
    download test

    • Why not use “.responseType = 'arraybuffer'”?

    • In the worst case, this wasted 59% of the overall processing time in the paint
    phase
    • A fancy animation caused a performance issue that is not related to
    networking!
    What did this benchmark measure in Firefox?
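    To see why responseType matters here: with 'text', the whole response
    body must be decoded from UTF-8 bytes into JavaScript's UTF-16 strings,
    while 'arraybuffer' hands back the raw bytes untouched. A minimal sketch
    of that hidden decoding work, made explicit with TextDecoder (the 1 MiB
    payload is invented; runnable in Node, which has TextDecoder built in):

    ```javascript
    // responseType = "text" implies a UTF-8 -> UTF-16 conversion of the
    // whole body; responseType = "arraybuffer" skips it entirely.
    const payload = new Uint8Array(1024 * 1024).fill(0x61); // 1 MiB of "a"

    // What "arraybuffer" gives you: the raw bytes, no decoding performed.
    const rawBytes = payload.buffer;

    // What "text" implies: decoding every byte into a UTF-16 string.
    const text = new TextDecoder("utf-8").decode(payload);

    console.log("bytes:", rawBytes.byteLength, "decoded chars:", text.length);
    ```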



  28. Lesson
    • The critical path is important, but it might be hidden
    • The profiler might not show it

    • There may be problems which you cannot control

    • To actually improve your application, insight into your application-specific
    behavior is most important
    • Break down bottlenecks with various tools & knowledge


  29. Outline
    1. Introduction

    2. What should we focus on?

    3. JS cost is difficult

    4. Critical path may be hidden

    5. How to improve performance ⬅
    6. Conclusion


  30. Use benchmarks to keep your app fast
    • “The way to make a program faster is to never let it get slower”
    • https://webkit.org/performance/
    • Benchmark your application continuously, and plot the results per
    commit


  31. Use benchmarks to keep your app fast
    • Focus on the long-term trend
    • Each individual score may vary a bit randomly due to external factors

    • Other OS services, other guests on the hypervisor, and more

    • Reproducible infrastructure is important for re-running tests
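    A minimal sketch of focusing on the trend rather than on individual
    noisy scores, using a simple moving average (the per-commit scores
    below are invented for illustration):

    ```javascript
    // Per-commit benchmark scores are noisy; a moving average exposes the
    // long-term trend, so a single outlier commit does not raise a false
    // alarm while a sustained regression still shows up.
    function movingAverage(scores, window) {
      const out = [];
      for (let i = 0; i + window <= scores.length; i++) {
        const slice = scores.slice(i, i + window);
        out.push(slice.reduce((a, b) => a + b, 0) / window);
      }
      return out;
    }

    // Invented scores (ms per run): noisy, with a real regression at the end.
    const scores = [100, 98, 103, 99, 101, 100, 120, 121, 119, 122];
    const trend = movingAverage(scores, 4);
    console.log(trend.map((v) => v.toFixed(1)).join(", "));
    ```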


  32. Outline
    1. Introduction

    2. What should we focus on?

    3. JS cost is difficult

    4. Critical path may be hidden

    5. How to improve performance

    6. Conclusion ⬅


  33. Conclusions
    • Real scenarios guide where you should improve performance

    • Analyze perf issues deeply with tools & your app-specific knowledge

    • CI is nice for keeping performance through iteration cycles

    • First step: benchmark your application based on your own story
