
Performance Testing in a Public Cloud - How bad is it really?

xLeitix
October 29, 2018

Presentation from the Ericsson Metrics Day 2018.

Transcript

  1. “Software performance testing is (…) a testing practice performed to determine (…) responsiveness and stability under a workload.”
  2. Problem? Performance testing is slow. RxJava: a JMH benchmark suite that takes multiple days to run. Mid-size startup in the US: a single load test run takes about 2 weeks to complete.
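
The deck shows no benchmark code, but for readers unfamiliar with JMH, a single microbenchmark in such a suite looks roughly like the hypothetical sketch below; the class name and workload are invented and are not taken from the RxJava suite.

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

// Hypothetical JMH microbenchmark; real suites such as RxJava's contain
// hundreds of methods like this, which is why a full run takes days.
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 10)
@Measurement(iterations = 20)
@Fork(1)
public class StringConcatBenchmark {

    private String[] words = {"performance", "testing", "in", "the", "cloud"};

    @Benchmark
    public String concatWithBuilder() {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            sb.append(w);
        }
        return sb.toString(); // returning the result prevents dead-code elimination
    }
}
```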
  3. Research question: how small a performance regression can we reliably find? We ran 19 software performance tests in different environments. Study setup: 4 open-source projects in Java and Go; executed on AWS, Azure, and Google Cloud; baseline: a bare-metal server in SoftLayer / Bluemix.
  4. Error modes: false positives (the test falsely indicates a regression) and false negatives (the test misses an actual regression).
  5. Testing strategies: “once per release”, or instance-based testing (ibs), versus “controlled experiment”, or trial-based testing (tbs).
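
The slide only names the two strategies, so the following is a hedged, simulated sketch of how measurements might be collected under each one; the provisioning and benchmarking stubs, the version labels, and the simulated latencies are all invented for illustration.

```java
import java.util.Random;

// Hypothetical illustration of the two testing strategies; every name and the
// simulated latencies below are invented, not taken from the study.
public class TestingStrategies {

    static final Random RNG = new Random(42);

    // Stand-in for provisioning a fresh cloud instance.
    static String provisionInstance() {
        return "instance-" + RNG.nextInt(100_000);
    }

    // Stand-in for running the benchmark suite of one version on one instance.
    static double[] runBenchmarks(String instance, String version) {
        double[] latencies = new double[30];
        for (int i = 0; i < latencies.length; i++) {
            latencies[i] = 100 + RNG.nextGaussian() * 5; // fake latencies in ms
        }
        return latencies;
    }

    public static void main(String[] args) {
        // "Once per release" / instance-based (ibs): each version is measured on
        // whatever instance happens to be provisioned at its release time, so
        // version differences and instance differences are confounded.
        double[] oldIbs = runBenchmarks(provisionInstance(), "v1.0");
        double[] newIbs = runBenchmarks(provisionInstance(), "v1.1");

        // "Controlled experiment" / trial-based (tbs): in each trial both versions
        // run back to back on the same instance, so each comparison is
        // within-instance and cross-instance noise largely cancels out.
        int trials = 5;
        for (int trial = 0; trial < trials; trial++) {
            String shared = provisionInstance();
            double[] oldTbs = runBenchmarks(shared, "v1.0");
            double[] newTbs = runBenchmarks(shared, "v1.1");
        }
    }
}
```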
  6. False positives: how many repetitions are needed to get <5% false positives? [Figure: false-positive rates of the Wilcoxon test and the confidence-interval test, each with ibs and tbs, over varying numbers of trials and instances, for the projects Log4j2, RxJava, bleve, and etcd.]
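
As a rough illustration of how such a test flags a regression, the sketch below applies the Mann-Whitney U test (the unpaired Wilcoxon rank-sum test) from Apache Commons Math to two invented samples of execution times; whether the study used exactly this variant and this library is an assumption.

```java
import org.apache.commons.math3.stat.inference.MannWhitneyUTest;

// Sketch of flagging a regression with a rank-based test; the measurement
// arrays are invented for illustration.
public class RegressionCheck {

    public static void main(String[] args) {
        // Execution times (ms) of the same benchmark for the old and new version.
        // Under ibs these would come from different cloud instances, under tbs
        // from the same instance(s).
        double[] oldVersion = {101.2, 99.8, 100.5, 102.1, 98.9, 100.0, 101.7, 99.5};
        double[] newVersion = {103.4, 104.1, 102.8, 105.0, 103.9, 104.6, 102.2, 103.1};

        // Two-sided Mann-Whitney U test (a.k.a. Wilcoxon rank-sum test).
        double pValue = new MannWhitneyUTest().mannWhitneyUTest(oldVersion, newVersion);

        // With noisy cloud measurements, a "significant" p-value can easily be a
        // false positive, which is why the slide asks how many repetitions are
        // needed to push the false-positive rate below 5%.
        System.out.printf("p-value = %.4f -> %s%n", pValue,
                pValue < 0.05 ? "possible regression" : "no significant difference");
    }
}
```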
  7. False negatives: what is the smallest slowdown we can find >95% of the time? [Figure: minimal detectable slowdown [%] of the Wilcoxon test and the confidence-interval test with tbs, for 1, 5, 10, and 20 instances and varying numbers of benchmarks.]
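
The confidence-interval test is not spelled out on the slide either; one common formulation, shown as an assumption-laden sketch below, bootstraps a 95% confidence interval for the ratio of mean execution times and reports a regression only if the whole interval lies above 1.0. The measurement arrays are again invented.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch of a bootstrap confidence-interval test; the exact variant used in
// the study is not shown on the slide, so this percentile-bootstrap
// formulation is an assumption.
public class ConfidenceIntervalCheck {

    static final Random RNG = new Random(7);

    // Mean of one bootstrap resample (sampling with replacement).
    static double meanOfResample(double[] data) {
        double sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[RNG.nextInt(data.length)];
        }
        return sum / data.length;
    }

    public static void main(String[] args) {
        double[] oldVersion = {101.2, 99.8, 100.5, 102.1, 98.9, 100.0, 101.7, 99.5};
        double[] newVersion = {103.4, 104.1, 102.8, 105.0, 103.9, 104.6, 102.2, 103.1};

        // Bootstrap the ratio mean(new) / mean(old).
        int iterations = 10_000;
        double[] ratios = new double[iterations];
        for (int i = 0; i < iterations; i++) {
            ratios[i] = meanOfResample(newVersion) / meanOfResample(oldVersion);
        }
        Arrays.sort(ratios);
        double lower = ratios[(int) (iterations * 0.025)];
        double upper = ratios[(int) (iterations * 0.975)];

        // If the 95% interval lies entirely above 1.0, the new version is slower.
        // With few repetitions the interval is wide, which is exactly what makes
        // small slowdowns hard to detect (false negatives).
        System.out.printf("95%% CI for slowdown ratio: [%.3f, %.3f]%n", lower, upper);
        System.out.println(lower > 1.0 ? "regression detected" : "no regression detected");
    }
}
```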
  8. Suggestions: the “controlled experiment” style provides much more reliable results than just running once per release. Use a randomized execution order. Experiment with different statistical methods. False positives and false negatives are both real problems; achieving sufficient reliability often needs 20+ repetitions.
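
The randomized-execution-order suggestion can be as simple as shuffling the full list of (version, benchmark, repetition) jobs before running them; the sketch below is a hypothetical illustration, not the study's actual harness.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of randomized execution order; the job structure
// and the runBenchmark stand-in are invented.
public class RandomizedOrder {

    record Job(String version, String benchmark) {}

    // Stand-in for actually executing one benchmark of one version.
    static void runBenchmark(Job job) {
        System.out.println("running " + job.benchmark() + " on " + job.version());
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        for (String version : List.of("v1.0", "v1.1")) {
            for (String benchmark : List.of("parseLog", "serialize", "query")) {
                for (int repetition = 0; repetition < 20; repetition++) {
                    jobs.add(new Job(version, benchmark));
                }
            }
        }
        // Shuffling interleaves versions and repetitions, so slow drifts in the
        // cloud environment hit both versions roughly equally instead of biasing
        // whichever version happened to run last.
        Collections.shuffle(jobs);
        jobs.forEach(RandomizedOrder::runBenchmark);
    }
}
```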
  9. Key takeaways. [Figure repeated from slide 7: minimal detectable slowdown [%] of the Wilcoxon test and the confidence-interval test with tbs, for 1, 5, 10, and 20 instances.]
  10. Further Reading:
      Philipp Leitner, Jürgen Cito (2016). Patterns in the Chaos - A Study of Performance Variation and Predictability in Public IaaS Clouds. ACM Transactions on Internet Technology, 16(3), pp. 15:1–15:23.
      Christoph Laaber, Philipp Leitner (2018). An Evaluation of Open-Source Software Microbenchmark Suites for Continuous Performance Assessment. In Proceedings of the 15th International Conference on Mining Software Repositories (MSR).
      Tomas Kalibera, Richard Jones (2013). Rigorous benchmarking in reasonable time. In Proceedings of the 2013 International Symposium on Memory Management (ISMM).
      Ali Abedi, Tim Brecht (2017). Conducting Repeatable Experiments in Highly Variable Cloud Computing Environments. In Proceedings of the 8th ACM/SPEC International Conference on Performance Engineering (ICPE).