The development of an experimental seasonal MLOS forecasting scheme for the Pacific Islands
Nicolas Fauchereau 1,2, Scott Stephens 1, Nigel Goodhue 1, Rob Bell 1, Doug Ramsay 1
[email protected]
1 NIWA Ltd., Auckland, New Zealand
2 Oceanography Dept., University of Cape Town, Cape Town, South Africa
June 21, 2013
1/19
Contents
1 Introduction
2 Data processing
  Mean Level of the Sea anomalies (MLOS)
  Predictor sets
  Indices
  SST EOFs
3 Methods
  Regression
  Classification
4 Results
5 Conclusion and recommendations
Set out in the “White Paper”
- High impact from sea-level extremes
- Value in developing an “extreme sea-level calendar”
- Extreme tides + NTR (MLOS + “high frequency”)
Goal
Compared to the existing PEAC scheme:
- Extend coverage to non-US-affiliated islands
- Frequency: every month for the coming 3 months (Island Climate Update)
- Performance of the model, type of forecast (probabilistic?)
Provide recommendations:
- Data processing, predictand
- Choice of the set of predictors
- Statistical methods for prediction
- Operational implementation
Implementation
For 3 islands in the Pacific (presenting a wide range of variability):
- “Hindcast”: forecast for months T+1 to T+3 using information at T0 (e.g. May for June-August)
- Different predictors
- Different methods (state-of-the-art machine learning)
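The hindcast set-up described above (information at T0, forecast for T+1 to T+3) can be sketched as follows. This is only an illustration: the function name `hindcast_pairs`, the use of a 3-month mean as the seasonal predictand, and the toy values are assumptions, not the authors' implementation.

```python
def hindcast_pairs(predictor, mlos, lead_months=3):
    """Pair the predictor at month T0 with the mean MLOS anomaly over T+1..T+lead_months.

    Both inputs are equal-length monthly lists; None marks a missing value.
    """
    pairs = []
    for t0 in range(len(mlos) - lead_months):
        target = mlos[t0 + 1 : t0 + 1 + lead_months]
        if predictor[t0] is None or any(v is None for v in target):
            continue  # skip seasons with missing predictor or predictand data
        pairs.append((predictor[t0], sum(target) / lead_months))
    return pairs


# Hypothetical toy data: 8 months of a predictor index and of MLOS anomalies
index = [0.5, 0.7, None, 1.0, 1.2, 0.9, 0.4, 0.1]
mlos = [0.01, 0.02, 0.03, 0.06, None, 0.05, 0.04, 0.03]
pairs = hindcast_pairs(index, mlos)
```

Each pair is one training case: only information available at T0 is used on the predictor side, which mirrors the operational constraint of the forecast.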
Guam
- Coordinates (144.7833 E., 13.4500 N.)
- 1948-03-10 to 2008-12-31
- Proportion of days missing: 12 %
Kiribati, Tarawa
- Coordinates (172.9300 E., 1.3625 N.)
- 1974-05-03 to 2012-07-30
- Proportion of days missing: 8 %
Cook Islands, Rarotonga
- Coordinates (200.2147 E., 21.2048 S.)
- 1977-04-24 to 2011-08-31
- Proportion of days missing: 2 %
Choice of the predictor set is dictated by:
- Relevance: need to reflect plausible physical relationships between the ocean-climate system and sea level.
- Operational constraints: must be available in near real time (within the first 5 days of Month 1 for forecast season Month 1 - Month 3).
Indices of SST and atmospheric variables, monthly time-scale:
- NINO indices (1+2, 3.4, 3, 4): from CPC
- Southern Oscillation Index (SOI): calculated by NIWA, data from BoM
- El Niño Modoki Index (EMI): calculated from the ERSST dataset
- Seasonal cycle: first 3 harmonics of the MLOS climatology
- Regional SST anomalies
- ...
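As an illustration of the seasonal-cycle predictor, the sketch below smooths a 12-month climatology by keeping its mean and first 3 harmonics. The function name `seasonal_cycle` and the synthetic climatology are assumptions made for the example; the slides do not say how the harmonics were actually computed.

```python
import math


def seasonal_cycle(climatology, n_harmonics=3):
    """Smooth a monthly climatology by keeping its mean plus the first n harmonics."""
    n = len(climatology)
    mean = sum(climatology) / n
    smooth = [mean] * n
    for k in range(1, n_harmonics + 1):
        # Fourier coefficients of harmonic k; for evenly spaced months and
        # k < n/2 these equal the least-squares fit of that harmonic
        a = 2.0 / n * sum(x * math.cos(2 * math.pi * k * m / n)
                          for m, x in enumerate(climatology))
        b = 2.0 / n * sum(x * math.sin(2 * math.pi * k * m / n)
                          for m, x in enumerate(climatology))
        for m in range(n):
            angle = 2 * math.pi * k * m / n
            smooth[m] += a * math.cos(angle) + b * math.sin(angle)
    return smooth


# Hypothetical climatology: mean + annual harmonic + a 4th harmonic ("noise")
clim = [0.02 + 0.05 * math.cos(2 * math.pi * m / 12)
        + 0.03 * math.cos(2 * math.pi * 4 * m / 12) for m in range(12)]
smooth = seasonal_cycle(clim)  # the 4th harmonic is filtered out
```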
1 Logistic Regression
- Binomial or multinomial (categorical) response variable
- Models the probability that an observation belongs to each class
2 Support Vector Machines (SVM)
- Optimal hyperplane (2 classes) or set of hyperplanes (k classes)
- Kernel trick: map data to a higher-dimensional space to deal with non-linearly separable classes
- The Radial Basis Function is a widely used kernel
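To make the first method concrete, here is a minimal binomial logistic regression fitted by gradient descent, written in plain Python so that it is self-contained. It is a sketch under stated assumptions: the single predictor, the data values, the learning rate and the function names are all hypothetical, and in practice a library implementation (for example scikit-learn's `LogisticRegression`, or `SVC` with `kernel="rbf"` for the SVM) would normally be used.

```python
import math


def fit_logistic(x, y, lr=0.5, n_iter=5000):
    """Fit p(y=1|x) = 1 / (1 + exp(-(w*x + b))) by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        grad_w = grad_b = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(w * xi + b)))
            grad_w += (p - yi) * xi / n
            grad_b += (p - yi) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_proba(x, w, b):
    """Probability that an observation with predictor value x belongs to class 1."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


# Hypothetical data: a predictor index and whether sea level was "above normal" (1) or not (0)
x = [-1.5, -1.0, -0.5, -0.2, 0.1, 0.3, 0.8, 1.2, 1.6]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1]
w, b = fit_logistic(x, y)
p_high = predict_proba(1.5, w, b)   # probability of "above normal" for a high index value
p_low = predict_proba(-1.5, w, b)   # probability of "above normal" for a low index value
```

The output is a class probability rather than a hard label, which is what makes this family of models suitable for probabilistic forecasts.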
The methods referred to above are tested in turn, using successively the Indices and the SST EOFs sets as predictors
- Applied to Guam, Kiribati and the Cook Islands
- “Best” model selected using objective measures (e.g. R-squared) + cross-validation + expert judgment
- Results presented in detail for Guam only
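The combination of R-squared and cross-validation mentioned above can be illustrated with a leave-one-out scheme. This is a sketch only: the slides do not state which cross-validation scheme was used, and the simple one-predictor linear model, function names and data below are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def r_squared(obs, pred):
    """Coefficient of determination of predictions against observations."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def loo_predictions(x, y):
    """Predict each case from a model fitted on all the other cases (leave-one-out)."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


# Hypothetical predictor index and seasonal MLOS anomalies
x = [-1.2, -0.8, -0.3, 0.0, 0.4, 0.9, 1.3, 1.8]
y = [-0.06, -0.05, -0.01, 0.01, 0.01, 0.05, 0.06, 0.10]

slope, intercept = fit_line(x, y)
r2_fit = r_squared(y, [slope * xi + intercept for xi in x])  # in-sample skill
r2_cv = r_squared(y, loo_predictions(x, y))                  # cross-validated skill
```

The cross-validated value is never higher than the in-sample one for a least-squares fit, which is why it is the more honest measure when comparing candidate models.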
Guam
Notes on the Guam time-series
- 12 % of missing values
- Large gap October 1997 - January 1999, 26 consecutive seasons missing
- Trend from about 2002
[Figure: Guam time-series, 1979-2004, showing the original time-series, the quadratic fit, and the time-series minus the quadratic fit]
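The figure shows the time-series with a quadratic fit removed. A minimal sketch of such a detrending step, tolerant of missing values, is given below. It assumes an ordinary least-squares quadratic in time solved through the normal equations; the function names and the synthetic series are hypothetical, and the authors' exact procedure is not given in the slides.

```python
def solve(a, b):
    """Solve the small linear system a @ coef = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def quadratic_detrend(t, y):
    """Fit y ~ c0 + c1*t + c2*t**2 on non-missing points; return (coefs, residuals)."""
    pts = [(ti, yi) for ti, yi in zip(t, y) if yi is not None]
    # Normal equations: sums of t**(i+j) on the left, sums of y * t**i on the right
    a = [[sum(ti ** (i + j) for ti, _ in pts) for j in range(3)] for i in range(3)]
    b = [sum(yi * ti ** i for ti, yi in pts) for i in range(3)]
    c = solve(a, b)
    resid = [None if yi is None else yi - (c[0] + c[1] * ti + c[2] * ti ** 2)
             for ti, yi in zip(t, y)]
    return c, resid


# Hypothetical series: time in years since the start, one missing value (None)
t = list(range(10))
y = [0.01 + 0.002 * ti + 0.001 * ti ** 2 for ti in t]
y[4] = None
coefs, resid = quadratic_detrend(t, y)
```

Expressing time relative to the start (or centre) of the record, rather than as calendar years, keeps the normal equations well conditioned.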
Recommendations
- For regression (continuous): MARS with SST EOFs
- For classification (categorical): SVM with SST EOFs
- How to deal with the (non-linear) trend? Here we used a damped linear term, but this is a somewhat ad hoc solution
- Include the Pacific Decadal Oscillation
- Consider quantile-based categories for classification
- Ensemble techniques (Random Forests, bagging, boosting) for classification?
- Hybrid predictor set? EOF on enhanced indices set
- Length of the time-series (30 years is really the minimum)
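One possible reading of the "quantile-based categories" recommendation is to use terciles, so that each class holds roughly a third of the record. The sketch below assumes terciles and the labels "below", "normal" and "above"; neither choice is stated in the slides, and the anomaly values are hypothetical.

```python
import statistics


def tercile_categories(values):
    """Label each value 'below', 'normal' or 'above' using the sample terciles as thresholds."""
    lower, upper = statistics.quantiles(values, n=3)  # the two tercile cut points
    labels = []
    for v in values:
        if v < lower:
            labels.append("below")
        elif v > upper:
            labels.append("above")
        else:
            labels.append("normal")
    return labels


# Hypothetical seasonal MLOS anomalies
anomalies = [0.04, -0.12, 0.00, 0.11, -0.05, 0.07, -0.08, 0.01, -0.02]
labels = tercile_categories(anomalies)
```

Because the thresholds come from the record itself, the classes are balanced by construction, which tends to help classifiers that struggle with rare categories.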