different environments

How small performance regressions can we reliably find?

Study setup:
- 4 open-source projects in Java and Go
- Study executed on AWS, Azure, and Google
- Baseline: bare-metal server in SoftLayer / Bluemix
results than just running once per release
- Use a randomized execution order
- Experiment with different statistical methods
- False positives and false negatives are both real problems
- Often needs 20+ repetitions to achieve sufficient reliability
in the Chaos - A Study of Performance Variation and Predictability in Public IaaS Clouds. ACM Transactions on Internet Technology, 16(3), pp. 15:1–15:23.

Christoph Laaber, Philipp Leitner (2018). An Evaluation of Open-Source Software Microbenchmark Suites for Continuous Performance Assessment. In Proceedings of the 15th International Conference on Mining Software Repositories.

Tomas Kalibera, Richard Jones (2013). Rigorous Benchmarking in Reasonable Time. In Proceedings of the 2013 International Symposium on Memory Management.

Ali Abedi, Tim Brecht (2017). Conducting Repeatable Experiments in Highly Variable Cloud Computing Environments. In Proceedings of the 8th ACM/SPEC International Conference on Performance Engineering.