Working on optimizations is more complex than it looks at first glance. Any optimization must be measured to make sure that, in practice, it really speeds up the application. Problem: it is very hard to obtain stable benchmark results.
The stability of a benchmark (performance measurement) is essential to compare two versions of the code and compute the difference (faster or slower?). An unstable benchmark is useless: it risks giving a false result when comparing performance, which can lead to bad decisions.
I'm going to show you the Python project "perf", which helps to run benchmarks, but also to analyze them: compute the mean and the standard deviation over multiple runs, render a histogram to visualize the probability distribution, compare multiple results, re-run a benchmark to collect more samples, etc.
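To give an idea of what a benchmark looks like with perf, here is a minimal sketch assuming the Runner.bench_func() API; the workload and the names are illustrative, not taken from a real benchmark:

    import perf

    def busy_loop(loops):
        # Hypothetical workload: sum integers in a pure-Python loop.
        total = 0
        for i in range(loops):
            total += i
        return total

    runner = perf.Runner()
    # The Runner spawns several worker processes, repeats the call in each
    # of them, and prints the aggregated result (mean and standard deviation).
    runner.bench_func('busy_loop', busy_loop, 100000)

If I recall the interface correctly, the Runner also adds command-line options to such a script, for example to dump the raw samples to a JSON file for later analysis.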
The use case is to measure small, isolated optimizations on CPython and make sure that they don't introduce performance regressions.
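For example, to check a micro-optimization, one can save the results of the reference run and of the patched run to JSON, then compare them. A hedged sketch, assuming perf's Benchmark.load() API and illustrative file names (the accessor for the raw timings has changed name across versions):

    import statistics

    import perf

    # 'before.json' and 'after.json' are illustrative file names for the
    # results of the reference run and of the patched run.
    before = perf.Benchmark.load('before.json')
    after = perf.Benchmark.load('after.json')

    for label, bench in (('before', before), ('after', after)):
        # get_samples() returns the raw timings; newer versions of the
        # module call them "values" (get_values()).
        samples = bench.get_samples()
        print('%s: mean %.6f sec, std dev %.6f sec'
              % (label, statistics.mean(samples), statistics.stdev(samples)))

With the mean and the standard deviation of both runs, it becomes possible to say whether the difference is significant or just noise.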