Slide 12
Two axes of correctness
• Experimental Design
- Benchmarks, input sizes, data sets
- VM parameters, hardware platform
• Evaluation Methodology
- Which numbers to report (avg, geomean, best, worst)
- Non-determinism included or excluded (compilation, GC times, etc.)
- Statistically rigorous methodologies [1]
[1] A. Georges, D. Buytaert, L. Eeckhout. Statistically Rigorous Java Performance Evaluation. In OOPSLA 2007.
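As a rough illustration of the evaluation-methodology bullets, the sketch below computes the four summary numbers named on the slide (avg, geomean, best, worst) and a 95% confidence interval for the mean over repeated runs. It is a minimal sketch, not the procedure from [1]: the function names `summarize` and `confidence_interval_95` and the sample timings are hypothetical, and the interval assumes the measurements are independent and roughly normally distributed.

```python
import math
import statistics

# Two-sided 95% Student t critical values for 1..29 degrees of freedom
# (standard table values, rounded to three decimals).
T_95 = [
    12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
    2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
    2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045,
]


def summarize(times):
    """Return the summary numbers listed on the slide for one benchmark."""
    return {
        "avg": statistics.fmean(times),
        "geomean": statistics.geometric_mean(times),
        "best": min(times),
        "worst": max(times),
    }


def confidence_interval_95(times):
    """Return (low, high) bounds of a 95% confidence interval for the mean.

    Assumption: runs are independent and approximately normal. Uses the
    t distribution for fewer than 30 samples and the normal approximation
    otherwise.
    """
    n = len(times)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.fmean(times)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(times) / math.sqrt(n)
    if n >= 30:
        crit = statistics.NormalDist().inv_cdf(0.975)
    else:
        crit = T_95[n - 2]  # degrees of freedom = n - 1, list is 0-indexed
    return mean - crit * sem, mean + crit * sem


if __name__ == "__main__":
    # Hypothetical execution times in seconds, for illustration only.
    runs = [1.0, 2.0, 4.0]
    print(summarize(runs))
    print(confidence_interval_95(runs))
```

Reporting the interval next to the mean makes it visible when two configurations overlap and the measured difference may be noise rather than a real effect.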