assume that the reference forecast is simply a climatological forecast (i.e. in the case of quintile-based categories, equal probabilities of 0.2 are assigned to each category). We also assume that we are interested in skill metrics that take into account the uncertainty in the forecast, i.e. metrics that actually consider the distribution of the discrete probability values assigned to each category. A further constraint is that the categorical (quintile-based) MLOS predictand is ordinal (i.e. the categories are naturally ordered, e.g. from 'well below normal' to 'well above normal') and thus the magnitude of the potential forecast error is important.

Another option (which will NOT be discussed here) is to ignore the forecast's uncertainty by only considering the category to which the model assigns the highest probability. In this case, skill scores based on the calculation of hit rates, such as the Heidke Skill Score or the Peirce Skill Score, can be used: their extensive description can be found in the IRI's 'Descriptions of the IRI climate forecast verification scores' document, available at URL: http://iri.columbia.edu/wp-content/uploads/2013/07/scoredescriptions.pdf

Given all the assumptions and constraints above (probabilistic forecasts, reference forecast is climatology, uncertainty in the forecast must be taken into account in the calculation of the skill score, predictand is an ordinal categorical variable), the metric that I recommend for comparing the performance of the classification models is the Ranked Probability Skill Score (RPSS).

The Ranked Probability Skill Score (RPSS)

As usual, the RPSS is based on scaling one metric, namely the Ranked Probability Score (RPS) of the actual forecast, against the RPS calculated for a reference forecast (here again the climatological forecast). The RPS is essentially an extension of the Brier Score to multiple (more than 2) category events.
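The scaling of the forecast RPS against the reference RPS can be sketched as follows; this is a minimal illustration assuming the common convention RPSS = 1 - RPS_forecast / RPS_reference (the function name `rpss` is mine, not from the document):

```python
def rpss(rps_forecast, rps_reference):
    """Ranked Probability Skill Score under the convention
    RPSS = 1 - RPS_forecast / RPS_reference.

    1.0 means a perfect forecast, 0.0 means no improvement over the
    reference (here, climatology), and negative values mean the
    forecast performs worse than the reference.
    """
    return 1.0 - rps_forecast / rps_reference
```

For example, a forecast whose RPS equals that of climatology scores exactly 0, while halving the climatological RPS yields an RPSS of 0.5.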
Let J be the number of categories (here J = 5) and therefore also the number of probabilities included in each forecast. The forecast vector consists of the forecast probabilities, which sum to 1; e.g., for a five-category forecast: y1 = 0.05, y2 = 0.1, y3 = 0.2, y4 = 0.25, y5 = 0.4. The observation vector also has 5 components, with the component of the observed category set to 1 and the other components set to 0; i.e., in the case of an observed MLOS anomaly above the 80th percentile (the 'well above normal' category), the observation vector is y1 = 0, y2 = 0, y3 = 0, y4 = 0, y5 = 1.
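Using the forecast and observation vectors above, the RPS can be computed as the sum of squared differences between the cumulative forecast and cumulative observation vectors. The sketch below uses the plain-sum convention (some definitions additionally divide by J - 1); the function and variable names are mine, not from the document:

```python
import numpy as np

def rps(forecast, observed):
    """Ranked Probability Score: sum of squared differences between the
    cumulative forecast probabilities and the cumulative observation
    vector (plain-sum convention; some texts divide by J - 1)."""
    cum_f = np.cumsum(forecast)
    cum_o = np.cumsum(observed)
    return float(np.sum((cum_f - cum_o) ** 2))

# Example vectors from the text: five-category forecast, observed
# outcome in the 'well above normal' category.
y_forecast = [0.05, 0.10, 0.20, 0.25, 0.40]
y_observed = [0, 0, 0, 0, 1]

rps_forecast = rps(y_forecast, y_observed)   # ~0.5075

# Climatological reference forecast: 0.2 in each quintile category.
rps_clim = rps([0.2] * 5, y_observed)        # ~1.2

# RPSS scales the forecast RPS against the climatological RPS.
rpss = 1.0 - rps_forecast / rps_clim
```

Note that because the cumulative sums are compared, forecast probability placed in categories far from the observed one is penalized more heavily, which is exactly why the RPS suits an ordinal predictand.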