different environments

How small performance regressions can we reliably find?

Study setup:
- 4 open-source projects in Java and Go
- Study executed on AWS, Azure, and Google
- Baseline: bare-metal server in SoftLayer / Bluemix
results than just running once per release
- Use a randomized execution order
- Experiment with different statistical methods
- False positives and false negatives are both real problems
- Often needs 20+ repetitions to achieve sufficient reliability
in the Chaos - A Study of Performance Variation and Predictability in Public IaaS Clouds. ACM Transactions on Internet Technology, 16(3), pp. 15:1–15:23.

Christoph Laaber, Philipp Leitner (2018). An Evaluation of Open-Source Software Microbenchmark Suites for Continuous Performance Assessment. In Proceedings of the 15th International Conference on Mining Software Repositories.

Tomas Kalibera, Richard Jones (2013). Rigorous Benchmarking in Reasonable Time. In Proceedings of the 2013 International Symposium on Memory Management.

Ali Abedi, Tim Brecht (2017). Conducting Repeatable Experiments in Highly Variable Cloud Computing Environments. In Proceedings of the 8th ACM/SPEC International Conference on Performance Engineering.