
Your benchmark may not guide real application performance


JSConfJP 2019

Tetsuharu Ohzeki

December 01, 2019

Transcript

  1. Your benchmark may not guide
    real application performance
    Tetsuharu OHZEKI


  2. Performance is always a hot topic
    • We love fast software!

    • Low latency, high throughput, power efficiency, fast response…

    • Would you like to use slow software?

    • Have you ever seen people who say “many features are
    more important than speed” actually enjoy slow software?

    • No!


  3. Performance is important for
    • User experience

    • One of the fundamental values of software
    • Marketing value

    • Newcomers sometimes beat established products on performance

    • In 2008, Google Chrome outperformed the other browsers


  4. Make software performant
    • Random optimization does not contribute to the actual user experience

    • “Don’t guess, measure” is always right

    • We can use benchmarks to measure our software


  5. Benchmark
    • Scores software performance as a quantitative value
    • i.e. normalizes software performance by a benchmark

    • Reproducibility is important
    • Keeps our application fast by guarding against regressions

    • We use benchmarks to evaluate our application’s performance


  6. Questions
    • Does your benchmark really make sense?

    • Does your benchmark actually score a real application scenario?


  7. Goals
    • Show that benchmarking with real scenarios is
    an important principle for making your application
    faster
    • Introduce (pitfall) case studies related to
    benchmarks


  8. Outline
    1. Introduction

    2. What should we focus on? ⬅

    3. JS cost is difficult

    4. Critical path may be hidden

    5. How to improve performance?

    6. Conclusion


  9. Working on a video-streaming service…
    • In this case, the performance key is when the video starts to play

    • What is a meaningful metric?

    • First Meaningful Paint / First Contentful Paint is nice

    • Time to Interactive?

    • Is it actually meaningful for this service?
    • Think about the bad case where the page is responsive but the video has not
    started


  10. General metrics may not suit special cases
    • General metrics are useful for measuring the performance of general web pages

    • e.g. startup time

    • But general metrics cannot capture application-specific performance

    • Measure real scenarios for your application
    • What is your application doing?

    • What is its purpose?
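    A hedged sketch of how such an application-specific metric could be
    recorded with the User Timing API (the mark names and the
    "time-to-first-frame" metric are illustrative, not from the talk;
    Node's perf_hooks mirrors the browser API, so the sketch is runnable
    outside a browser — in a real page you would place the marks at the
    actual events):

    ```javascript
    // Record an application-specific metric ("time until video playback
    // starts") with the User Timing API instead of a general-purpose one.
    const { performance } = require("node:perf_hooks");

    performance.mark("player-requested");   // e.g. the user pressed "play"
    // ... application work happens here (fetch manifest, buffer, decode) ...
    performance.mark("video-first-frame");  // e.g. the first frame rendered

    // performance.measure() returns the PerformanceMeasure entry in
    // modern runtimes (Node >= 16 and current browsers).
    const metric = performance.measure(
      "time-to-first-frame",
      "player-requested",
      "video-first-frame"
    );
    console.log(`time-to-first-frame: ${metric.duration.toFixed(2)} ms`);
    ```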


  11. Lesson
    • Performance metrics are not simple

    • General purpose

    • Application specific

    • We should think about which performance metric is most suitable
    • Not only Lighthouse!

    • We should focus on the actual scenarios our application will run


  12. Outline
    1. Introduction

    2. What should we focus on?

    3. JS cost is difficult ⬅
    4. Critical path may be hidden

    5. How to improve performance

    6. Conclusion


  13. When you optimize your code…
    • I’d like to optimize my slow code!

    • But the running time per op is pretty small… (e.g. ~0.1 ms/op)

    • I cannot see a difference!

    • I have a nice idea: run this code 10,000 times

    • The difference will be stretched! Easy to compare!

    • …Wait! Is this really a good approach?
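    The naive approach above looks roughly like this (a minimal sketch;
    `work()` is a made-up stand-in for whatever ~0.1 ms/op code is being
    optimized):

    ```javascript
    // Naive micro-benchmark: repeat a tiny operation many times and report
    // the average. The pitfall (next slides): 10,000 iterations push the
    // function into the highest JIT tier, which a real workload running it
    // only a few times would never reach.
    const { performance } = require("node:perf_hooks");

    function work() {
      // Placeholder for the small operation under test.
      let sum = 0;
      for (let i = 0; i < 1000; i++) sum += Math.sqrt(i);
      return sum;
    }

    const ITERATIONS = 10000;
    const start = performance.now();
    for (let i = 0; i < ITERATIONS; i++) work();
    const avgMs = (performance.now() - start) / ITERATIONS;
    console.log(`average: ${avgMs.toFixed(4)} ms/op`);
    ```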



  14. JSVMs have multiple tiers
    • A JSVM has multiple tiers for optimizing user code

    • e.g. JavaScriptCore has 4 tiers (LLInt, Baseline, DFG, FTL)

    • The JIT compiler speculatively changes the optimization level based on how
    often your code runs

    • Hot paths (executed frequently) get heavily optimized

    • Cold paths (executed rarely) stay less optimized


  15. A hot loop may not be
    what your application actually does
    • A typical micro-benchmark executes many iterations to stabilize results

    • But many iterations cause functions to be compiled with heavy
    optimizations by the highest JIT tier

    • If your actual workload runs only a few times, many iterations lead to
    results different from what you expected

    • Let’s see how execution time changes per iteration for some cases from JetStream2
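    One way to observe this warm-up effect yourself is to record each
    iteration's time separately instead of only a total (a minimal sketch;
    `workload()` is a made-up stand-in for the code under test):

    ```javascript
    // Record per-iteration running times. In a tiered JSVM the first
    // iterations typically run slower (interpreter / baseline JIT) and
    // later ones faster (optimizing JIT), so a single average hides which
    // tier your real workload would actually run in.
    const { performance } = require("node:perf_hooks");

    function workload() {
      // Stand-in for the code under test.
      let s = "";
      for (let i = 0; i < 2000; i++) s += i % 10;
      return s.length;
    }

    const times = [];
    for (let i = 0; i < 100; i++) {
      const t0 = performance.now();
      workload();
      times.push(performance.now() - t0);
    }

    // Compare early (cold) vs late (warm) iterations instead of averaging.
    const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
    console.log("first 5 iterations:", mean(times.slice(0, 5)).toFixed(3), "ms");
    console.log("last 5 iterations:", mean(times.slice(-5)).toFixed(3), "ms");
    ```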


  16. Plot of JetStream2/prepack-wtb execution times per iteration
    (iteration count changed to 100)
    [Chart: Running Time (ms), 150–600, vs. Iteration Count 1–100;
    compares WebKit r252841, Chrome Canary 80.0.3976.0, and
    Firefox 72.0a1 (20191128214853); times drop as the code moves from
    lower tiers to the highest tier]


  17. Plot of JetStream2/Air execution times per iteration
    (iteration count = 120)
    [Chart: Running Time (ms), 0–90, vs. Iteration Count 1–120;
    compares WebKit r252841, Chrome Canary 80.0.3976.0, and
    Firefox 72.0a1 (20191128214853); times drop as the code moves from
    lower tiers to the highest tier]


  18. Lesson
    • A JSVM changes optimization levels based on execution counts

    • The workload may change your benchmark score

    • Be careful to profile the actual workload as much as possible

    • Invalid assumptions mislead your optimization strategy
    • Misled like this, your application might go wrong…


  19. Outline
    1. Introduction

    2. What should we focus on?

    3. JS cost is difficult

    4. Critical path may be hidden ⬅
    5. How to improve performance

    6. Conclusion


  20. I tried to improve the page load time…
    • Added the ‘defer’ attribute to script elements to improve the overall page init speed

    • But it did not improve the First Meaningful Paint. Why?
    cite: https://docs.google.com/presentation/d/1MXlFGqFQFJByv8k6Ege0pt0GwJQqbjoh7GdIYia9UQg/


  21. Before / After
    • Managed to improve sub-resource loading
    • But no improvement on the critical path
    cite: https://docs.google.com/presentation/d/1MXlFGqFQFJByv8k6Ege0pt0GwJQqbjoh7GdIYia9UQg/


  22. Why?
    • The critical path depends on a “bootstrap” script which starts working on
    DOMContentLoaded

    • script[defer] does not change this behavior
    • This “bootstrap” script is small and executes quickly

    • So the profiler does not easily surface it as a “bottleneck”
    cite: https://docs.google.com/presentation/d/1MXlFGqFQFJByv8k6Ege0pt0GwJQqbjoh7GdIYia9UQg/


  23. (Unfortunately) Finding bottlenecks is hard
    • Using several tools is better for cross-cutting bottleneck analysis

    • But be careful: profilers sometimes show unrelated values

    • It often requires domain-specific knowledge

    • How does your application work?

    • Is it a real bottleneck?

    • Performance tracing for tasks

    • Causal Profiling [Curtsinger+, SOSP ’15] (virtual speedup)

    cite: https://docs.google.com/presentation/d/1MXlFGqFQFJByv8k6Ege0pt0GwJQqbjoh7GdIYia9UQg/


  24. Benchmark site for networking
    • Firefox scores slower than Chrome on the same devices

    • https://bugzilla.mozilla.org/show_bug.cgi?id=1556022

    • https://bugzilla.mozilla.org/show_bug.cgi?id=1570313

    • Does this simply mean that “the Firefox network stack is slow”?

    • We tend to think so. Really?


  25. What did this benchmark measure in Firefox?
    https://twitter.com/hsivonen/status/1179763669535805441


  26. • This benchmark caused many translations from UTF-8 to UTF-16
    • This site uses XMLHttpRequest, but its responseType is text for the
    download test

    • Why not use “.responseType = 'arraybuffer'”?

    • In the worst case, this wasted 59% of the overall processing time in the paint
    phase
    • A fancy animation caused a performance issue that is not related to
    networking!
    What did this benchmark measure in Firefox?
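    To see why responseType matters here: with 'text', the whole response
    body must be decoded from UTF-8 bytes into JavaScript's UTF-16 strings,
    while 'arraybuffer' hands back the raw bytes untouched. A minimal sketch
    of that hidden decoding work, made explicit with TextDecoder (the 1 MiB
    payload is invented; runnable in Node, which has TextDecoder built in):

    ```javascript
    // responseType = "text" implies a UTF-8 -> UTF-16 conversion of the
    // whole body; responseType = "arraybuffer" skips it entirely.
    const payload = new Uint8Array(1024 * 1024).fill(0x61); // 1 MiB of "a"

    // What "arraybuffer" gives you: the raw bytes, no decoding performed.
    const rawBytes = payload.buffer;

    // What "text" implies: decoding every byte into a UTF-16 string.
    const text = new TextDecoder("utf-8").decode(payload);

    console.log("bytes:", rawBytes.byteLength, "decoded chars:", text.length);
    ```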



  28. Lesson
    • The critical path is important, but it might be hidden
    • The profiler might not show it

    • There may be problems which you cannot control

    • To actually improve your application, insight into your application-specific
    behavior is most important
    • Break down bottlenecks with various tools & knowledge


  29. Outline
    1. Introduction

    2. What should we focus on?

    3. JS cost is difficult

    4. Critical path may be hidden

    5. How to improve performance ⬅
    6. Conclusion


  30. Use benchmarks to keep your app fast
    • “The way to make a program faster is to never let it get slower”
    • https://webkit.org/performance/
    • Benchmark your application continuously, and plot the results per
    commit


  31. Use benchmarks to keep your app fast
    • Focus on the long-term trend
    • Each individual score may vary a bit randomly due to external factors

    • Other OS services, other guests on the hypervisor, and more

    • Reproducible infrastructure is important for re-running tests
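    A minimal sketch of focusing on the trend rather than on individual
    noisy scores, using a simple moving average (the per-commit scores
    below are invented for illustration):

    ```javascript
    // Per-commit benchmark scores are noisy; a moving average exposes the
    // long-term trend, so a single outlier commit does not raise a false
    // alarm while a sustained regression still shows up.
    function movingAverage(scores, window) {
      const out = [];
      for (let i = 0; i + window <= scores.length; i++) {
        const slice = scores.slice(i, i + window);
        out.push(slice.reduce((a, b) => a + b, 0) / window);
      }
      return out;
    }

    // Invented scores (ms per run): noisy, with a real regression at the end.
    const scores = [100, 98, 103, 99, 101, 100, 120, 121, 119, 122];
    const trend = movingAverage(scores, 4);
    console.log(trend.map((v) => v.toFixed(1)).join(", "));
    ```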


  32. Outline
    1. Introduction

    2. What should we focus on?

    3. JS cost is difficult

    4. Critical path may be hidden

    5. How to improve performance

    6. Conclusion ⬅


  33. Conclusions
    • Real scenarios guide where you should improve performance

    • Analyze perf issues deeply with tools & your app-specific knowledge

    • CI is nice for keeping performance through iteration cycles

    • First step: benchmark your application based on your own story
