03 Nov 2022
Status: this preprint is currently under review for the journal HESS.

Machine Learning and Committee Models for Improving ECMWF Subseasonal to Seasonal (S2S) Precipitation Forecast

Mohamed Elneel Elshaikh Eltayeb Elbasheer1, Gerald Augusto Corzo1, Dimitri Solomatine1, and Emmanouil Varouchakis2
  • 1Hydroinformatics department, IHE Delft Institute for Water Education, Delft, Netherlands
  • 2Technical University of Crete, Crete, Greece
  • These authors contributed equally to this work.

Abstract. The European Centre for Medium-Range Weather Forecasts (ECMWF) provides subseasonal to seasonal (S2S) precipitation forecasts, which extend from two weeks to two months ahead. However, S2S precipitation forecasting is still not sufficiently accurate, and a number of research efforts and competitions have sought to study how machine learning (ML) can be used to improve forecast performance. This research explores the use of ML techniques to improve the ECMWF S2S precipitation forecast, following the AI competition guidelines proposed by the S2S project and the World Meteorological Organisation (WMO). A baseline analysis of the ECMWF S2S precipitation hindcasts (2000–2019) targeting three categories (above normal, near normal and below normal) was performed using the ranked probability skill score (RPSS) and the receiver operating characteristic (ROC) curve. A regional analysis of time series was done to group similar (correlated) hydrometeorological time series variables, and three regions were finally selected based on their spatial and temporal correlations. The methodology first replicated the performance of the available ECMWF forecast data and used it as a reference for the experiments (baseline analysis). Two approaches were followed to build categorical classification correction models: (1) using ML and (2) using a committee model. The aim of both was to correct the categorical classifications (above normal, near normal and below normal) of the ECMWF S2S precipitation forecast. In the first approach, the ensemble mean was used as the input, and five ML techniques were trained and compared: k-nearest neighbours (k-NN), logistic regression (LR), artificial neural network multilayer perceptron (ANN-MLP), random forest (RF) and long short-term memory (LSTM).
Here, we have proposed a gridded spatial and temporal correlation analysis (autocorrelation, cross-correlation and semivariogram) for the input variable selection, allowing us to explore neighbours' time series and their lags as inputs. These results provided the final data sets that were used for the training and validation of the machine learning models. Total precipitation (tp) and two-metre temperature (t2m) time series with a resolution of 1.5 by 1.5 degrees were the main variables used; both were taken from the global ECMWF S2S real-time forecasts, the ECMWF S2S reforecasts/hindcasts and observation data from the National Oceanic and Atmospheric Administration (Climate Prediction Centre, CPC). The forecasting skill of the ML models was compared against reference forecasts (ECMWF S2S precipitation hindcasts and climatology) using RPSS, and the results from the first approach showed that LR and MLP were the best ML models in terms of RPSS values. In addition, a positive RPSS value with respect to climatology was obtained using MLP. LSTM models performed quite similarly to MLP but had slightly lower scores overall. In the second approach, a committee model (CM) was used: instead of using a single ECMWF hindcast input (the ensemble mean), the problem is divided among many ANN-MLP models (one trained independently on each ensemble member), which are later combined in a smart ensemble model (trained with LR). The cross-validation and testing of the CMs showed positive RPSS values with respect to climatology, which can be interpreted as an improvement over the ECMWF forecast in the three climatological regions. In conclusion, the individual ML models yield very little improvement, if any, but with a CM the RPSS values are all better than those of the reference forecast. This study was done only on random samples over three global regions; a more comprehensive study should be performed to explore the whole range of possibilities.
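For readers unfamiliar with the skill score used in the abstract, the following is a minimal sketch of how RPSS can be computed for three ordered categories (below, near and above normal) against a climatological reference that assigns a probability of 1/3 to each category. The function names `rps` and `rpss`, the array layout and the toy numbers are assumptions made for illustration; they are not taken from the preprint or from the S2S AI challenge code, which may aggregate scores differently (for example, over grid cells and lead times).

```python
import numpy as np

def rps(probs, obs_cat):
    """Ranked probability score for each sample.

    probs   : (n, K) forecast probabilities over K ordered categories
    obs_cat : (n,) integer index of the observed category
    """
    probs = np.asarray(probs, dtype=float)
    n, k = probs.shape
    one_hot = np.zeros((n, k))
    one_hot[np.arange(n), obs_cat] = 1.0
    # Squared distance between cumulative forecast and cumulative observation
    cum_diff = np.cumsum(probs, axis=1) - np.cumsum(one_hot, axis=1)
    return np.sum(cum_diff ** 2, axis=1)

def rpss(probs, ref_probs, obs_cat):
    """Skill relative to a reference: 1 is perfect, 0 equals the reference."""
    return 1.0 - rps(probs, obs_cat).mean() / rps(ref_probs, obs_cat).mean()

# Toy example (made-up numbers): 0 = below, 1 = near, 2 = above normal
obs = np.array([0, 2, 1])
forecast = np.array([[0.6, 0.3, 0.1],
                     [0.1, 0.3, 0.6],
                     [0.2, 0.6, 0.2]])
clim = np.full((3, 3), 1.0 / 3.0)  # climatology: equal tercile probabilities

skill = rpss(forecast, clim, obs)
print(f"RPSS vs climatology: {skill:.3f}")
```

A positive value means the forecast beats the reference, which is the sense in which the abstract reports "positive RPSS values with respect to climatology". Some definitions divide RPS by K - 1; that factor cancels in the ratio and does not change RPSS.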


Status: open (until 29 Dec 2022)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on hess-2022-348', Anonymous Referee #1, 30 Nov 2022


Data sets

S2S AI challenge template. Aaron Spring, Andrew Robertson, Florian Pinault, Frederic Vitart, Roc Roskar, and Tasko Olevski



Total article views: 364 (including HTML, PDF, and XML)
  • HTML: 304
  • PDF: 56
  • XML: 4
  • Total: 364
  • BibTeX: 3
  • EndNote: 2
Views and downloads (calculated since 03 Nov 2022)
Latest update: 30 Nov 2022
Short summary
In this research, we explored the use of machine learning (ML) to improve the ECMWF S2S ensemble precipitation forecast. Different approaches were used as exploratory experiments to see which one better improves the ensemble probabilistic forecast. We found that the committee model (CM) concept is a promising approach that can be further studied and evaluated using different combinations of state-of-the-art ML techniques.
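The committee idea can be illustrated with a small stacking sketch: one classifier is trained per ensemble member, and a second-stage model combines their predicted category probabilities. This is a simplified stand-in under stated assumptions, not the authors' implementation: the preprint uses ANN-MLP member models with an LR combiner, whereas the sketch below uses a plain softmax (multinomial logistic) regression for both stages to stay dependency-free. All function names, the synthetic data and the hyperparameters are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(X, y, n_classes=3, lr=0.3, epochs=500):
    """Multinomial logistic regression fitted by batch gradient descent."""
    n, d = X.shape
    W, b = np.zeros((d, n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        G = (softmax(X @ W + b) - Y) / n  # cross-entropy gradient w.r.t. logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b

def predict_proba(model, X):
    W, b = model
    return softmax(X @ W + b)

def stack(members, member_X):
    """Concatenate each member model's class probabilities column-wise."""
    return np.hstack([predict_proba(m, X) for m, X in zip(members, member_X)])

def fit_committee(member_X, y):
    """member_X: list of (n, d) input arrays, one per ensemble member."""
    members = [fit_softmax(X, y) for X in member_X]
    combiner = fit_softmax(stack(members, member_X), y)
    return members, combiner

def committee_proba(committee, member_X):
    members, combiner = committee
    return predict_proba(combiner, stack(members, member_X))

# Synthetic demo (assumed data): 4 noisy "ensemble members" of one signal
rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)
y = np.digitize(signal, np.quantile(signal, [1 / 3, 2 / 3]))  # terciles 0, 1, 2
member_X = [(signal + rng.normal(scale=0.5, size=n)).reshape(-1, 1)
            for _ in range(4)]

committee = fit_committee(member_X, y)
probs = committee_proba(committee, member_X)  # (n, 3) category probabilities
```

In this sketch the combiner is fitted on in-sample member predictions for brevity; a more careful setup would feed it out-of-fold predictions so that the second stage does not inherit the members' overfitting.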