How to run a stable benchmark

Working on optimizations is more complex than it looks at first. Every optimization must be measured to check that, in practice, it actually makes the application faster. The problem: stable benchmark results are very hard to obtain.

The stability of a benchmark (performance measurement) is essential to compare two versions of the code and compute the difference (faster or slower?). An unstable benchmark is useless: when comparing performance, it risks giving a false result, which can lead to bad decisions.
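To illustrate why stability matters, here is a minimal sketch that compares two sets of timings of the same benchmark. The helper names `summarize` and `compare` are hypothetical, and the significance rule (the difference of the means must exceed `k` times the combined standard deviation) is a simple heuristic chosen for this example, not the method used by "perf".

```python
import math
import statistics


def summarize(samples):
    """Return (mean, standard deviation) of a list of timings in seconds."""
    return statistics.mean(samples), statistics.stdev(samples)


def compare(ref, new, k=2.0):
    """Compare timings of a reference version and a new version.

    Hypothetical heuristic (not perf's algorithm): the difference of the
    means is only reported as significant if it is larger than k times
    the combined standard deviation of both runs.
    """
    ref_mean, ref_std = summarize(ref)
    new_mean, new_std = summarize(new)
    change = (new_mean - ref_mean) / ref_mean  # negative means "faster"
    noise = k * math.sqrt(ref_std ** 2 + new_std ** 2)
    significant = abs(new_mean - ref_mean) > noise
    return change, significant


# Stable runs: a 10% speedup stands out clearly from the noise.
print(compare([1.00, 1.01, 0.99, 1.00], [0.90, 0.91, 0.89, 0.90]))
# Unstable runs: the same 10% difference in the means is lost in the noise.
print(compare([0.7, 1.3, 0.8, 1.2], [0.6, 1.2, 0.7, 1.1]))
```

Both comparisons show the same 10% difference between the means, but only the stable runs allow a trustworthy conclusion; with the unstable runs, declaring "10% faster" would be exactly the kind of false result that leads to bad decisions.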

I'm going to show you the Python project "perf", which helps to run benchmarks but also to analyze them: compute the mean and the standard deviation over multiple runs, render a histogram to visualize the probability distribution, compare multiple results, run a benchmark again to collect more samples, etc.
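The kind of analysis listed above can be approximated with the standard library alone. The sketch below does not use perf's API: it relies only on `timeit.repeat` and the `statistics` module, and the helpers `collect_samples` and `text_histogram` (and their defaults) are hypothetical names chosen for illustration. It collects one sample per run, computes the mean and standard deviation, and prints a crude text histogram.

```python
import statistics
import timeit


def collect_samples(stmt, runs=20, number=1_000):
    """Run the benchmark `runs` times; return the time per loop (seconds) of each run."""
    totals = timeit.repeat(stmt, repeat=runs, number=number)
    return [t / number for t in totals]


def text_histogram(samples, bins=5, width=40):
    """Render a crude text histogram of the samples, one line per bin."""
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / bins or 1.0  # avoid a zero step when all samples are equal
    counts = [0] * bins
    for s in samples:
        index = min(int((s - lo) / step), bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    top = max(counts)
    lines = []
    for i, count in enumerate(counts):
        bar = "#" * round(width * count / top)
        lines.append(f"{lo + i * step:.3e} s: {bar} ({count})")
    return lines


if __name__ == "__main__":
    samples = collect_samples("sorted(range(1000))")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # needs at least two samples
    print(f"Mean +- std dev: {mean:.3e} s +- {stdev:.3e} s")
    print("\n".join(text_histogram(samples)))
```

Collecting many samples instead of a single timing is what makes the standard deviation and the histogram meaningful: a wide or multi-modal histogram is a visible sign that the benchmark is not stable.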

The use case is to measure small, isolated optimizations in CPython and make sure that they don't introduce performance regressions.

Victor Stinner

February 05, 2017