Normalness
- Representative of a typical app
- No occult tuning

High quality
- Model best app dev practices
- Model best performance practices

Test framework, not infrastructure
- Results not dominated by database
- Aim to be CPU-bound

- Is there noise in the results from things outside our control?
- Are we reporting useful metrics?
- Does this help us make a decision?
- Is it answering a question we actually care about?
- Is this close to real-world?
- Is this representative of the way applications will be run?

realism
relevance
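
One way to put a number on the noise question is to repeat the same benchmark several times and look at the spread between runs. The sketch below is only an illustration: the function names, the sample throughput figures, and the 5% threshold are all assumptions, not part of any existing harness.

```python
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (unitless spread)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("mean is zero; spread relative to the mean is undefined")
    return statistics.stdev(samples) / mean


def is_noisy(samples, threshold=0.05):
    """Flag a result set whose run-to-run spread exceeds the threshold.

    The 5% default is an arbitrary, hypothetical cut-off.
    """
    return coefficient_of_variation(samples) > threshold


# Hypothetical requests/sec from five repeated runs of one test.
runs = [10_120.0, 10_340.0, 9_980.0, 10_210.0, 10_050.0]
cv = coefficient_of_variation(runs)
print(f"CV = {cv:.4f}, noisy = {is_noisy(runs)}")
```

A high spread does not say where the noise comes from, only that a single run should not be used to make a decision.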