
Performance Testing in a Public Cloud - How bad is it really?

xLeitix
October 29, 2018


Presentation from the Ericsson Metrics Day 2018.


Transcript

  1. Performance Testing in a Public Cloud: How bad is it really? Dr. Philipp Leitner, Chalmers, philipp.leitner@chalmers.se, @xLeitix
  2. On Software Performance (and performance testing)

  3. (image-only slide, no text)

  4. Managing Software Performance: Analytical (queuing theory etc.), Experimental (performance testing), Observational (monitoring)
  5. “Software performance testing is (…) a testing practice performed to determine (…) responsiveness and stability under a workload.”
  6. Problem? Performance testing is slow. RxJava has a JMH benchmark suite that takes multiple days to run; at a mid-size startup in the US, a single load-test run takes about two weeks to complete.
  7. Is cloudifying performance tests a way forward?

  8. Source: https://arxiv.org/abs/1411.2429

  9. Study results from cloudifying performance tests

  10. Research question: How small a performance regression can we reliably find? Study setup: 19 software performance tests run in different environments; 4 open-source projects in Java and Go; executed on AWS, Azure, and Google; baseline: a bare-metal server in Softlayer / Bluemix.
  11. Error modes: false positives (the test falsely indicates a regression) and false negatives (the test misses an actual regression).
  12. Variability of Software Benchmark Results

  13. Testing strategies: “Once per release”, or instance-based testing (ibs); “Controlled experiment”, or trial-based testing (tbs).
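The two strategies differ in how measurements are spread over cloud instances. As a rough illustration only (not the study's actual tooling; the function and instance names below are invented for this sketch), instance-based testing gives each version its own fresh instances, while trial-based testing runs both versions repeatedly on the same instances:

```python
def instance_based_plan(versions, n_instances):
    """Instance-based ("once per release"): each version is measured on its
    own set of cloud instances; no instance ever runs both versions."""
    return [(v, f"instance-{v}-{i}")
            for v in versions
            for i in range(n_instances)]

def trial_based_plan(versions, n_instances, n_trials):
    """Trial-based ("controlled experiment"): every instance runs repeated
    trials of *both* versions, so per-instance noise affects both equally."""
    return [(v, f"instance-{i}", t)
            for i in range(n_instances)
            for t in range(n_trials)
            for v in versions]
```

With the trial-based plan, a comparison between versions can be made within each instance, which is exactly what makes it more robust to noisy cloud hardware.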
  14. False Positives: How many repetitions are needed to get <5% false positives? (Figure: Wilcoxon and confidence-interval tests, each with ibs and tbs, plotted over numbers of trials and instances for Log4j2, RxJava, bleve, and etcd.)
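A false-positive rate can be estimated with an A/A test: compare two samples drawn from the *same* version and count how often the statistical test claims a regression. As a self-contained sketch (the study presumably used a statistics package; `rank_sum_p` is a name invented here), a two-sided Wilcoxon rank-sum test via the normal approximation looks like this:

```python
import math
import random

def rank_sum_p(xs, ys):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) p-value using the
    normal approximation; reasonable for ~20+ measurements per group."""
    combined = sorted((v, g) for g, vals in ((0, xs), (1, ys)) for v in vals)
    n = len(combined)
    rank_of = [0.0] * n
    i = 0
    while i < n:                      # assign average ranks to ties
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg = (i + j + 1) / 2         # 1-based ranks i+1 .. j, averaged
        for k in range(i, j):
            rank_of[k] = avg
        i = j
    r1 = sum(rank_of[k] for k in range(n) if combined[k][1] == 0)
    n1, n2 = len(xs), len(ys)
    u = r1 - n1 * (n1 + 1) / 2        # Mann-Whitney U for group 1
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# A/A demo: both samples come from the same "version"
random.seed(1)
a = [random.gauss(100, 5) for _ in range(20)]
b = [random.gauss(100, 5) for _ in range(20)]
p_same = rank_sum_p(a, b)   # usually a large p: no regression indicated
```

Repeating the A/A comparison many times and counting how often p < 0.05 gives the empirical false-positive rate; the slide's point is that with few repetitions in a noisy cloud, this rate can be well above the nominal 5%.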
  15. False Negatives: What is the smallest slowdown we can find >95% of the time? (Figure: minimal detectable slowdown [%] for the Wilcoxon and confidence-interval tests with tbs, for 1, 5, 10, and 20 instances.)
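The confidence-interval style of test can be sketched in plain Python: bootstrap a CI for the mean runtime of each version and flag a regression only when the new version's interval lies entirely above the old one's. This is an illustrative stand-in, not the study's exact procedure; `ci_overlap_regression` and the fixed default seed are choices made for this sketch:

```python
import random
import statistics

def bootstrap_ci(sample, level=0.95, reps=1000, rng=None):
    """Percentile bootstrap confidence interval for the mean.
    A fixed default seed keeps the sketch reproducible."""
    rng = rng or random.Random(42)
    means = sorted(statistics.fmean(rng.choices(sample, k=len(sample)))
                   for _ in range(reps))
    lo_idx = int(reps * (1 - level) / 2)
    return means[lo_idx], means[reps - 1 - lo_idx]

def ci_overlap_regression(old, new, level=0.95):
    """Flag a slowdown only when the CI of the new version's mean
    lies entirely above the CI of the old version's mean."""
    _, hi_old = bootstrap_ci(old, level)
    lo_new, _ = bootstrap_ci(new, level)
    return lo_new > hi_old
```

A small detectable slowdown requires narrow intervals, which is why the slides report that many repetitions (instances and trials) are needed before modest regressions become visible.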
  16. Suggestions: The “controlled experiment” style provides much more reliable results than just running once per release. Use a randomized execution order, and experiment with different statistical methods. False positives and false negatives are both real problems; often 20+ repetitions are needed to achieve sufficient reliability.
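The randomized execution order suggested above takes only a few lines: if runs are shuffled, slow drifts in cloud performance hit all versions roughly equally instead of systematically biasing whichever version happened to run last. A hypothetical sketch (`randomized_schedule` is not a name from the talk):

```python
import random

def randomized_schedule(benchmarks, versions, repetitions, seed=None):
    """Build the full cross product of (benchmark, version, repetition)
    runs, then shuffle it so no version is confounded with time-of-day
    or instance-warmup effects. A seed makes the order reproducible."""
    runs = [(b, v, r)
            for b in benchmarks
            for v in versions
            for r in range(repetitions)]
    random.Random(seed).shuffle(runs)
    return runs
```

Passing a seed lets an experiment be re-run in exactly the same (still randomized) order, which helps when debugging a surprising result.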
  17. Key Takeaways

  18. Key Takeaways

  19. Key Takeaways (figure: minimal detectable slowdown [%] for the Wilcoxon and confidence-interval tests with tbs, as on slide 15)
  20. Further Reading:
    Philipp Leitner, Jürgen Cito (2016). Patterns in the Chaos - A Study of Performance Variation and Predictability in Public IaaS Clouds. ACM Transactions on Internet Technology, 16(3), pp. 15:1–15:23.
    Christoph Laaber, Philipp Leitner (2018). An Evaluation of Open-Source Software Microbenchmark Suites for Continuous Performance Assessment. In Proceedings of the 15th International Conference on Mining Software Repositories (MSR).
    Tomas Kalibera, Richard Jones (2013). Rigorous Benchmarking in Reasonable Time. In Proceedings of the 2013 International Symposium on Memory Management (ISMM).
    Ali Abedi, Tim Brecht (2017). Conducting Repeatable Experiments in Highly Variable Cloud Computing Environments. In Proceedings of the 8th ACM/SPEC International Conference on Performance Engineering (ICPE).