Machine Learning and Committee Models for Improving ECMWF Subseasonal to Seasonal (S2S) Precipitation Forecast

Elbasheer, Mohamed Elneel Elshaikh Eltayeb; Corzo, Gerald Augusto; Solomatine, Dimitri; Varouchakis, Emmanouil

doi:10.5194/hess-2023-98

Preprints

24 Apr 2023

Status: this preprint was under review for the journal HESS but the revision was not accepted.

Abstract. The European Centre for Medium-Range Weather Forecasts (ECMWF) provides subseasonal to seasonal (S2S) precipitation forecasts, which extend from two weeks to two months ahead. However, the skill of S2S precipitation forecasts remains low, and a good deal of research and several competitions have examined how machine learning (ML) can be used to improve forecast performance. This research explores the use of ML techniques to improve the ECMWF S2S precipitation forecast, following the AI competition guidelines proposed by the S2S project and the World Meteorological Organisation (WMO). A baseline analysis of the ECMWF S2S precipitation hindcasts (2000–2019) targeting three categories (above normal, near normal and below normal) was performed using the ranked probability skill score (RPSS) and the receiver operating characteristic (ROC) curve. A regional time series analysis was done to group similar (correlated) hydrometeorological time series variables, and three regions were finally selected based on their spatial and temporal correlations. The methodology first replicated the performance of the available ECMWF forecast data and used it as a reference for the experiments (baseline analysis). Two approaches were followed to build categorical classification correction models: (1) using ML and (2) using a committee model. The aim of both was to correct the categorical classifications (above normal, near normal and below normal) of the ECMWF S2S precipitation forecast. In the first approach, the ensemble mean was used as the input, and five ML techniques were trained and compared: k-nearest neighbours (k-NN), logistic regression (LR), artificial neural network multilayer perceptron (ANN-MLP), random forest (RF) and long short-term memory (LSTM).
Here, we have proposed a gridded spatial and temporal correlation analysis (autocorrelation, cross-correlation and semivariogram) for the input variable selection, allowing us to explore neighbours’ time series and their lags as inputs. These results provided the final data sets that were used for the training and validation of the machine learning models. Total precipitation (tp) and two-metre temperature (t2m) time series with a resolution of 1.5 by 1.5 degrees were the main variables used; both were taken from the global ECMWF S2S real-time forecasts, the ECMWF S2S reforecasts/hindcasts and observation data from the National Oceanic and Atmospheric Administration (Climate Prediction Centre, CPC). The forecasting skill of the ML models was compared against reference forecasts (ECMWF S2S precipitation hindcasts and climatology) using RPSS, and the results from the first approach showed that LR and MLP were the best ML models in terms of RPSS values. In addition, a positive RPSS value with respect to climatology was obtained using MLP. The LSTM models performed similarly to MLP but had slightly lower scores overall. In the second approach, a committee model (CM) was used: instead of using one ECMWF hindcast (the ensemble mean), the problem is divided into many ANN-MLP models (one trained independently on each ensemble member) that are later combined in a smart ensemble model (trained with LR). The cross-validation and testing of the CMs showed positive RPSS values with respect to climatology, which can be interpreted as an improvement over the ECMWF forecast in the three climatological regions. In conclusion, the ML models yield very little improvement, if any, but with a CM the RPSS values are all better than the reference forecast. This study was done only on random samples over three global regions; a more comprehensive study should be performed to explore the whole range of possibilities.

Received: 09 Apr 2023 – Discussion started: 24 Apr 2023

Competing interests: At least one of the (co-)authors is a member of the editorial board of Hydrology and Earth System Sciences.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1:
'Comment on hess-2023-98', Anonymous Referee #1, 03 Sep 2023

Subseasonal to seasonal (S2S) precipitation forecasts can provide valuable information for hydrological modelling and water resources management. This paper is concentrated on the calibration of raw S2S forecasts by using machine learning and committee models. While the performance of raw forecasts may not be satisfactory, this paper shows that the performance of calibrated forecasts is improved.

Reading through the paper, it is observed that some technical issues are not clearly presented. Therefore, there are five comments for further improvements of the paper.

Firstly, lead time plays an important part in predictive performance (White et al., 2017). Specifically, forecast skill tends to decrease as lead time increases. In Figures 10 and 11, attention is paid to “autocorrelation at different lags”, while the performance at different lead times is missing. The authors may want to illustrate clearly how raw forecasts are correlated with observations by using spatial plots. Examples can be found at the NOAA website (https://repository.library.noaa.gov/view/noaa/22608).

Secondly, seasonality plays an important part in predictive performance (Schepen et al., 2020; Huang et al., 2022). That is, forecast skill can vary by month. Are the forecasts calibrated month by month, or are forecasts across different months pooled in the analysis? Details must be presented.

Thirdly, climatological forecasts play an important part in benchmarking predictive performance (Schepen et al., 2020; Huang et al., 2022). Specifically, climatological forecasts are forecasts based solely upon the climatological statistics for a region rather than the dynamical implications of the current conditions, and they are often used as a baseline for evaluating the performance of weather and climate forecasts (https://glossary.ametsoc.org/wiki/Climatological_forecast). However, it seems that climatological forecasts are not illustrated in the paper.

Fourthly, some figures in this paper are not of high quality (e.g., Figures 2, 10, 11, 13 and 14). The spatial resolution is quite low, and there are large blank margins.

Fifthly, some figures in this paper serve only as conceptual illustrations (e.g., Figures 3, 4, 5, 6, 7, 8 and 9). They are adapted from other publications but are not directly related to the model development or the results and findings of the paper. Such illustrative figures can either be deleted or moved to the supporting information.

References:
White, C.J., Carlsen, H., Robertson, A.W., Klein, R.J., Lazo, J.K., Kumar, A., Vitart, F., Coughlan de Perez, E., Ray, A.J., Murray, V. and Bharwani, S., 2017. Potential applications of subseasonal‐to‐seasonal (S2S) predictions. Meteorological applications, 24(3), pp.315-325.
Schepen, A., Everingham, Y. and Wang, Q.J., 2020. An improved workflow for calibration and downscaling of GCM climate forecasts for agricultural applications–a case study on prediction of sugarcane yield in Australia. Agricultural and Forest Meteorology, 291, p.107991.
Huang, Z., Zhao, T., Xu, W., Cai, H., Wang, J., Zhang, Y., Liu, Z., Tian, Y., Yan, D. and Chen, X., 2022. A seven-parameter Bernoulli-Gamma-Gaussian model to calibrate subseasonal to seasonal precipitation forecasts. Journal of Hydrology, 610, p.127896.

Citation: https://doi.org/10.5194/hess-2023-98-RC1
- AC1: 'Reply on RC1', Mohamed Elbasheer, 16 Oct 2023
  
  We are delighted to receive these important and constructive comments. We believe they will improve the quality of this paper, and we are very thankful for the time you spent reading it.
  Firstly, lead time plays an important part in predictive performance (White et al., 2017). Specifically, forecast skill tends to decrease as lead time increases. In Figures 10 and 11, attention is paid to “autocorrelation at different lags”, while the performance at different lead times is missing. The authors may want to illustrate clearly how raw forecasts are correlated with observations by using spatial plots. Examples can be found at the NOAA website (https://repository.library.noaa.gov/view/noaa/22608).
  The focus of this paper is to illustrate the performance of different ML techniques at specific lead times (weeks 3-4 and weeks 5-6). Illustrating how raw forecasts are correlated with observations by using spatial plots is important. We had already done this step, and we agree that showing the results gives more insight to the reader. The corresponding plots were added to the manuscript. The link below shows the high-quality spatial plots:
  https://drive.google.com/file/d/1UmhUKv48DxydvXVwT3Nz7ab8SygKvrQD/view?usp=drive_link
  Secondly, seasonality plays an important part in predictive performance (Schepen et al., 2020; Huang et al., 2022). That is, forecast skill can vary by month. Are the forecasts calibrated month by month, or are forecasts across different months pooled in the analysis? Details must be presented.
  Thank you for this important information and for the reference. The setup of the data does not allow for month-by-month calibration: because the precipitation is aggregated bi-weekly, too few samples are available per month to train the ML model. As a result, the months are pooled together to train the model using the hindcast data from 2000–2019.
  Thirdly, climatological forecasts play an important part in benchmarking predictive performance (Schepen et al., 2020; Huang et al., 2022). Specifically, climatological forecasts are forecasts based solely upon the climatological statistics for a region rather than the dynamical implications of the current conditions, and they are often used as a baseline for evaluating the performance of weather and climate forecasts (https://glossary.ametsoc.org/wiki/Climatological_forecast). However, it seems that climatological forecasts are not illustrated in the paper.
  Here, the three predicted probabilities (above normal, near normal, and below normal) are compared with a reference climatological value of 1/3.
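To make the 1/3 reference concrete, the following is a minimal sketch of how an RPSS could be computed for three ordered categories. It is an illustration only, not the authors' code: the function names `rps` and `rpss` and the sample probabilities are hypothetical, and the score is assumed to follow the common definition RPSS = 1 - mean(RPS_forecast) / mean(RPS_reference).

```python
def rps(probs, observed_category):
    """Ranked probability score of one forecast over ordered categories.

    probs: probabilities for (below normal, near normal, above normal).
    observed_category: index (0, 1 or 2) of the category that occurred.
    """
    cum_forecast = 0.0
    cum_observed = 0.0
    score = 0.0
    for k, p in enumerate(probs):
        cum_forecast += p
        cum_observed += 1.0 if k == observed_category else 0.0
        score += (cum_forecast - cum_observed) ** 2
    return score


def rpss(forecasts, references, observations):
    """Skill of `forecasts` relative to `references` (1 = perfect, 0 = no gain)."""
    n = len(observations)
    mean_rps_fc = sum(rps(p, o) for p, o in zip(forecasts, observations)) / n
    mean_rps_ref = sum(rps(p, o) for p, o in zip(references, observations)) / n
    return 1.0 - mean_rps_fc / mean_rps_ref


# Hypothetical corrected forecasts for three bi-weekly targets.
forecasts = [(0.2, 0.3, 0.5), (0.6, 0.3, 0.1), (0.2, 0.6, 0.2)]
observations = [2, 0, 1]

# Climatological reference: every category is equally likely.
climatology = [(1 / 3, 1 / 3, 1 / 3)] * len(observations)

skill_vs_climatology = rpss(forecasts, climatology, observations)
print(f"RPSS with respect to climatology: {skill_vs_climatology:.3f}")
```

Passing the raw ECMWF category probabilities as `references` instead of `climatology` would give the RPSS with respect to the ECMWF baseline in the same way.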
  Fourthly, some figures in this paper are not of high quality (e.g., Figures 2, 10, 11, 13 and 14). The spatial resolution is quite low, and there are large blank margins.
  After reviewing how the plots are imported into the manuscript, we found that the final resolution was low. We modified the importing procedure and are now able to provide very high-quality plots. The link below shows the new high-quality figures (we also did this for all the figures in the manuscript):
  https://drive.google.com/file/d/1PrUhnpOcro_5a2XFdltPr9vhxPnL1xYM/view?usp=drive_link
  Fifthly, some figures in this paper serve only as conceptual illustrations (e.g., Figures 3, 4, 5, 6, 7, 8 and 9). They are adapted from other publications but are not directly related to the model development or the results and findings of the paper. Such illustrative figures can either be deleted or moved to the supporting information.
  Some, but not all, of these figures are purely illustrative. We will keep only the figures that are directly related to the model development; the illustrative figures will be moved to the appendix.
  
  Citation: https://doi.org/10.5194/hess-2023-98-AC1
RC2:
'Comment on hess-2023-98', Anonymous Referee #2, 05 Sep 2023

Signs of improvement can be detected in the resubmitted paper. However, the paper still has several weaknesses in terms of writing and presentation. My recommendation is that lots of work is needed before the paper can be suitable for publication in Hydrology and Earth System Sciences.

1. The latest progress in S2S forecast correction especially correction by AI is still not adequately addressed. We can’t see the necessity of the study (not only for the competition) and the novelty of the work.

2. The methodology section is not well constructed, please rewrite.

3. In Section 3.3, all the results are only roughly presented. The authors are recommended to use numbers to describe the results. For example, in 3.3.1 the CM does show a ‘substantial increase’ and ‘very good’ performance compared with raw S2S, but how large (in percent) is the improvement?

4. A comparison between MLP and CM-MLP is needed to show the superiority of the CM model.

5. It would be great if the spatial results were presented in the manuscript.

Citation: https://doi.org/10.5194/hess-2023-98-RC2
- AC2: 'Reply on RC2', Mohamed Elbasheer, 16 Oct 2023
  Thank you so much for re-reading the manuscript; we highly appreciate it. We are delighted that there are signs of improvement over our last submission, and we thank you for the new comments.
  Signs of improvement can be detected in the resubmitted paper. However, the paper still has several weaknesses in terms of writing and presentation. My recommendation is that lots of work is needed before the paper can be suitable for publication in Hydrology and Earth System Sciences.
  The latest progress in S2S forecast correction especially correction by AI is still not adequately addressed. We can’t see the necessity of the study (not only for the competition) and the novelty of the work.
  
  We see our work as a small contribution toward a reliable S2S precipitation forecast, which would have a large positive societal impact on food security, agriculture and risk mitigation. The novelty of this work lies in the proposed use of a CM, which is built on the separate training of the S2S ensemble members.
  We believe this paper contains three important messages that, in our opinion, deserve to be shared:
  The committee model results showed substantial improvements in forecasting skill in comparison with the models in the first approach, where the ensemble mean is used as input: an average increase of +68.38% (+0.16 RPSS value with respect to ECMWF) and +111.07% (+0.173 RPSS value with respect to climatology) for cross-validation across the four models. Here, the MLP model results were compared with the CM results because the MLP model showed the best performance among the first-approach models. These substantial improvements encouraged us to test the committee model further and to consider disseminating the results.
  
  The committee model was tested using the 2019 dataset to check for overfitting and to assess its performance on unseen data. Compared with the cross-validation results from the ANN-MLP model, the testing results of the CM continued to show substantial improvement, with an overall average increase of +4.96% (+0.0012 RPSS value with respect to ECMWF) and +53.21% (+0.05 RPSS value with respect to climatology). These are only preliminary/experimental results, and there is considerable room for improvement.
  
  The third message, or rather question, is whether a machine learning model trained using only the ECMWF hindcasts can improve the forecasting skill of the real-time ECMWF forecast, given that the real-time forecast has 51 ensemble members. In our case, the first 11 ensemble members were used as input for the CM.
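The committee idea described in this reply (one model per ensemble member, combined by a logistic-regression stage) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the data are synthetic, only three hypothetical members are used instead of 11, a softmax regression stands in for each ANN-MLP member model, and the combiner is trained on in-sample member outputs for brevity (a careful stacking setup would use out-of-fold predictions). All function and variable names are hypothetical.

```python
import math
import random


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, x):
    """Class probabilities for feature vector x (a bias term is appended)."""
    xb = list(x) + [1.0]
    return softmax([sum(w * v for w, v in zip(row, xb)) for row in weights])


def fit_softmax(X, y, n_classes=3, epochs=200, lr=0.1):
    """Multinomial logistic regression fitted by stochastic gradient descent."""
    n_features = len(X[0]) + 1
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = predict_proba(weights, x)
            xb = list(x) + [1.0]
            for k in range(n_classes):
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_features):
                    weights[k][j] -= lr * grad * xb[j]
    return weights


random.seed(0)
N_MEMBERS = 3   # hypothetical; the reply mentions 11 hindcast members
N_SAMPLES = 150
SPLIT = 100     # first 100 samples for training, the rest for testing

# Synthetic "truth" and its category (0 = below, 1 = near, 2 = above normal),
# using approximate terciles of a standard normal distribution.
truth = [random.gauss(0.0, 1.0) for _ in range(N_SAMPLES)]
labels = [0 if t < -0.43 else (2 if t > 0.43 else 1) for t in truth]

# Each member is the truth plus independent noise, with one feature per sample.
members = [[[t + random.gauss(0.0, 1.0)] for t in truth] for _ in range(N_MEMBERS)]

# Stage 1: train one model per ensemble member, independently.
member_models = [fit_softmax(m[:SPLIT], labels[:SPLIT]) for m in members]


def stack_features(i):
    """Concatenate the category probabilities of all member models for sample i."""
    features = []
    for model, member in zip(member_models, members):
        features.extend(predict_proba(model, member[i]))
    return features


# Stage 2: the combiner learns how to weight the member outputs.
combiner = fit_softmax([stack_features(i) for i in range(SPLIT)], labels[:SPLIT])
committee_probs = [predict_proba(combiner, stack_features(i))
                   for i in range(SPLIT, N_SAMPLES)]
print(committee_probs[0])
```

The design point is that the combiner sees each member's category probabilities separately, so it can weight members differently instead of collapsing them into an ensemble mean before any learning takes place.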
  
  The methodology section is not well constructed, please rewrite.
  
  The methodology section was reviewed, and the text was updated.
  In Section 3.3, all the results are only roughly presented. The authors are recommended to use numbers to describe the results. For example, in 3.3.1 the CM does show a ‘substantial increase’ and ‘very good’ performance compared with raw S2S, but how large (in percent) is the improvement?
  
  Thank you for this valuable comment. We reviewed all the results and now use numbers when comparing them.
  A comparison between MLP and CM-MLP is needed to show the superiority of the CM model.
  
  The comparison was done (see comment 3) and the text was updated.
  
  It would be great if the spatial results were presented in the manuscript.
  
  For now, we can only provide the average RPSS values per region. To provide the spatial results, we would need to modify and rerun the code, which would consume a lot of computational resources. We highly appreciate this comment.
  
  Citation: https://doi.org/10.5194/hess-2023-98-AC2
RC3:
'Comment on hess-2023-98', Anonymous Referee #3, 05 Sep 2023

Overall comment:
This paper builds categorical classification correction models to correct the ECMWF S2S precipitation forecast by adopting machine learning and committee models. The result may be meaningful and practical to management decisions regarding food security, agriculture, and risk mitigation.
However, the methodology is routine and the work is not innovative enough, although it considers spatiotemporal information and machine learning algorithms to construct the S2S correction model. In addition, there is still a lot of room for improvement, especially since it is not a scientific paper.
As such, I don’t recommend the publication of this manuscript in Hydrology and Earth System Sciences in its present form.
Major comments:
(1) This paper adopts the traditional way to post-process the S2S precipitation forecast by machine learning algorithm, and the machine learning algorithm is not innovative. Besides, the committee model is essentially the same as the stacking integration algorithm, and the stacking integration algorithm may be better than the committee model. On the other hand, S2S was considered a difficult time range for weather forecasting, being both too long for much memory of the atmospheric initial conditions and too short for SST anomalies to be felt sufficiently strongly, making it difficult to beat persistence. Therefore, it is not enough to only consider early precipitation and temperature as model inputs, but also to consider the Madden-Julian Oscillation (MJO), ocean conditions, soil moisture, and so on.
(2) The abstract should be brief and logical. However, it is redundant; it is recommended to introduce only the core content to make it more logical. For example, there is no need to spend 13 lines introducing the first approach, which weakens the connection between the two approaches.
(3) In the introduction, it is recommended to remove the methods used in the top three submissions for the competition (lines 83-88), which are not related to the content of the introduction. In addition, it is recommended to explore the state-of-the-art in using machine learning techniques for improving S2S precipitation forecasts and then introduce the innovative aspects of the proposed method compared to existing machine learning methods.
(4) In the methodology, the ECMWF extended-range forecast provided 100 ensemble members for real-time forecasting, so the corresponding content should be modified (lines 237, 465)
(5) In the methodology, the structure of the separate MLP models was the same as the structure of the MLP models used for models (1), (2), (3) and (4). It's not clear what the models represent. Therefore, it is recommended to add Table 1 to the appropriate location in the methodology.
(6) In the results and discussion, the content of section 3.2 can be transferred to section 2.3.
(7) In the results and discussion, the authors compare the performance of different ML methods with respect to ECMWF and climatology using the RPSS metric, but I don't understand what data climatology represents.
(8) In the results and discussion, the cross-correlation results showed a very low correlation between the total precipitation and the two-metre temperature in most of the areas for different lag times, so why did the authors choose temperature as the input for the models?
(9) In the results and discussion, the authors only briefly described the results. I don’t see any further in-depth analysis. For example, models 2, 3, and 4 adopt different spatial information; what conclusions can be drawn from the differences in their results? Unfortunately, no such conclusions appear in the results and discussion. It is recommended that some of the conclusions be added to the results and discussion.
(10) Consistent with the opinions of previous reviewers, conclusions are those of a technical report, with no attempt at explaining what it brings to the state of the art.
Others:
(1) Line 52, there are 12 numerical weather prediction (NWP) centres that contribute to the S2S project database.
(2) Line 106, “k-NN” should be “K-NN”.
(3) Line 214, “Figure (24)” does not exist.
(4) Lines 262-263, “𝑅𝑃𝑆𝐸_{E𝐶𝑀𝑊𝐹}_{𝑏𝑒𝑛𝑐}_ℎ_𝑚𝑎𝑟” is inconsistent with the formula in the paper.

Citation: https://doi.org/10.5194/hess-2023-98-RC3
- AC3: 'Reply on RC3', Mohamed Elbasheer, 16 Oct 2023
  
  Anonymous Referee #3:
  Thank you so much for your valuable comments, we appreciate the time you spent on this manuscript.
  Major comments:
  (1) This paper adopts the traditional way to post-process the S2S precipitation forecast by machine learning algorithm, and the machine learning algorithm is not innovative. Besides, the committee model is essentially the same as the stacking integration algorithm, and the stacking integration algorithm may be better than the committee model. On the other hand, S2S was considered a difficult time range for weather forecasting, being both too long for much memory of the atmospheric initial conditions and too short for SST anomalies to be felt sufficiently strongly, making it difficult to beat persistence. Therefore, it is not enough to only consider early precipitation and temperature as model inputs, but also to consider the Madden-Julian Oscillation (MJO), ocean conditions, soil moisture, and so on.
  What we are trying to contribute here is not an innovative model but a small contribution that, based on our work, showed good results: the separate training of the ensemble members using the committee model technique, which is in essence the same as the stacking integration algorithm. The committee model technique is not new or innovative; however, adopting it in this case has shown substantial improvement in forecasting skill, which led to our main recommendation of training the ensemble members separately. We thought it would be good to disseminate these findings as a contribution toward the overall objective of realizing a reliable S2S precipitation forecast.
  The committee model results showed substantial improvements in forecasting skill in comparison with the models in the first approach, where the ensemble mean is used as input: an average increase of +68.38% (+0.16 RPSS value with respect to ECMWF) and +111.07% (+0.173 RPSS value with respect to climatology) for cross-validation across the four models. Here, the MLP model results were compared with the CM results because the MLP model showed the best performance among the first-approach models. These substantial improvements encouraged us to test the committee model further and to consider disseminating the results.
  The committee model was tested using the 2019 dataset to check for overfitting and to assess its performance on unseen data. Compared with the cross-validation results from the ANN-MLP model, the testing results of the CM continued to show substantial improvement, with an overall average increase of +4.96% (+0.0012 RPSS value with respect to ECMWF) and +53.21% (+0.05 RPSS value with respect to climatology). These are only preliminary/experimental results, and there is considerable room for improvement.
  We did not add more variables because the main analysis focuses on comparing different model setups and their performance rather than on finding the best ML model for S2S correction. Based on the results shown, we think that adding more relevant sources of predictability, such as the MJO and soil moisture, and adopting the CM will yield better results.
  (2) The abstract should be brief and logical. However, it is redundant; it is recommended to introduce only the core content to make it more logical. For example, there is no need to spend 13 lines introducing the first approach, which weakens the connection between the two approaches.
  Good point; we modified the abstract accordingly.
  Abstract. The European Centre for Medium-Range Weather Forecasts (ECMWF) provides subseasonal to seasonal (S2S) precipitation forecasts, which extend from two weeks to two months ahead. However, the skill of S2S precipitation forecasts remains low, and a good deal of research and several competitions have examined how machine learning (ML) can be used to improve forecast performance. This research explores the use of ML techniques to improve the ECMWF S2S precipitation forecast, following the AI competition guidelines proposed by the S2S project and the World Meteorological Organisation (WMO). A baseline analysis of the ECMWF S2S precipitation hindcasts (2000–2019) targeting three categories (above normal, near normal and below normal) was performed using the ranked probability skill score (RPSS) and the receiver operating characteristic (ROC) curve over three globally selected regions. The assessment regions were selected through a regional analysis that grouped similar (correlated) hydrometeorological time series variables based on their spatial and temporal correlations. Here, we have proposed a gridded spatial and temporal correlation analysis (autocorrelation, cross-correlation and semivariogram), allowing us to explore neighbours’ time series and their lags. ECMWF ensemble bi-weekly hindcasts/forecasts and National Oceanic and Atmospheric Administration (Climate Prediction Centre, CPC) bi-weekly observations with a resolution of 1.5 by 1.5 degrees were used; total precipitation (tp) and two-metre temperature (t2m) were the main variables in our analysis. The spatio-temporal correlation analysis was also used to prepare four input datasets for the ML-based correction models. Two approaches were followed to build ML-based categorical classification correction models: (1) using different ML algorithms and (2) using the committee model technique. The aim of both was to correct the categorical classifications (above normal, near normal and below normal) of the ECMWF S2S precipitation forecast. The forecasting skills of both approaches were compared against reference forecasts (ECMWF S2S precipitation hindcasts and climatology) using RPSS. In the first approach, the ensemble mean was used as the input, and five ML techniques were trained and compared: k-nearest neighbours (K-NN), logistic regression (LR), artificial neural network multilayer perceptron (ANN-MLP), random forest (RF) and long short-term memory (LSTM). In the second approach, the committee model (CM) technique was used, in which, instead of using one ECMWF hindcast (the ensemble mean), the problem is divided into many ANN-MLP models (one trained independently on each ensemble member) that are later combined in an ensemble model (trained with LR). The cross-validation results from the first approach showed that the ANN-MLP was the best model, with overall average RPSS values of +0.1135 and -0.014 with respect to the ECMWF baseline and climatology respectively; the highest RPSS value with respect to climatology was +0.0057, in region two using the third input dataset. The cross-validation results for the second approach show substantial improvements in forecasting skill, with overall average RPSS values of +0.27 and +0.16 with respect to the ECMWF baseline and climatology respectively. The testing results of the CM also showed an improvement in forecasting skill, with overall average RPSS values of +0.115 and +0.04 with respect to the ECMWF baseline and climatology respectively. Comparing the two approaches, the second approach, which adopts the committee model technique, has shown very interesting results, and it is expected that adding more relevant sources of predictability, such as the Madden–Julian oscillation (MJO) and soil moisture, and adopting the CM will yield better results. We emphasize that this study was done only on random samples over three global regions, and a more comprehensive study should be performed to explore the whole range of possibilities.
  (3) In the introduction, it is recommended to remove the methods used in the top three submissions for the competition (lines 83-88), which are not related to the content of the introduction. In addition, it is recommended to explore the state-of-the-art in using machine learning techniques for improving S2S precipitation forecasts and then introduce the innovative aspects of the proposed method compared to existing machine learning methods.
  Thank you for the comment; we appreciate it. The elaboration was removed.
  (4) In the methodology, the ECMWF extended-range forecast provided 100 ensemble members for real-time forecasting, so the corresponding content should be modified (lines 237, 465)
  Thank you for the comment. This is one of the major improvements recently introduced (end of June 2023) to the extended-range ensemble forecasts: the number of ensemble members was increased from 51 to 101, and they are now run daily instead of twice weekly. Our work was based on the previous setup of 51 ensemble members. This does not affect the content, since our analysis was mostly based on the ECMWF ensemble hindcasts (2000–2019).
  (5) In the methodology, the structure of the separate MLP models was the same as the structure of the MLP models used for models (1), (2), (3) and (4). It's not clear what the models represent. Therefore, it is recommended to add Table 1 to the appropriate location in the methodology.
  That is true. Table 1 is added to the appropriate location in the methodology.
  (6) In the results and discussion, the content of section 3.2 can be transferred to section 2.3.
  Done. The content of section 3.2 is transferred to section 2.3.
  (7) In the results and discussion, the authors compare the performance of different ML methods with respect to ECMWF and climatology using the RPSS metric, but I don't understand what data climatology represents.
  Here, the three predicted probabilities (above normal, near normal, and below normal) are compared with a reference climatological value of 1/3.
  (8) In the results and discussion, the cross-correlation results showed a very low correlation between the total precipitation and the two-metre temperature in most of the areas for different lag times, so why did the authors choose temperature as the input for the models?
  Yes, that is true: the cross-correlation values were lower for the different lag times. As a result, we used only one two-metre temperature (t2m) value, the one corresponding to the same time step as the precipitation. The regional analysis shows that the t2m correlation values range between 0.4 and 1 for the first two regions and are lower than 0.4 for the third region. Based on these results, and aiming for a model that generalizes over the three regions, we decided to add t2m as an input for all the models.
  (9) In the results and discussion, the authors only briefly described the results. I don’t see any further in-depth analysis. For example, models 2, 3, and 4 adopt different spatial information, what conclusions can be drawn from the differences in their results. Unfortunately, similar conclusions have not been seen in the results and discussion. It is recommended that some of the conclusions be added to the results and discussion.
  Well noted, we elaborated more and some of the conclusions were added to the results and discussion section.
  (10) Consistent with the opinions of previous reviewers, conclusions are those of a technical report, with no attempt at explaining what it brings to the state of the art.
  Understood, we elaborated more on explaining what it brings to the state of the art.
  Others:
  (1) Line 52, there are 12 numerical weather prediction (NWP) centres that contribute to the S2S project database.
  The text is modified.
  (2) Line 106, “k-NN” should be “K-NN”.
  The text is modified.
  (3) Line 214, “Figure (24)” does not exist.
  The text is modified. It is replaced with Figure 5.
  (4) Lines 262-263, “$\mathrm{RPS}_{\mathrm{ECMWF\ benchmark}}$” is inconsistent with the formula in the paper.
  The text is modified. It is replaced with $\mathrm{RPS}_{\mathrm{Reference\ model}}$.
  
  Citation: https://doi.org/10.5194/hess-2023-98-AC3

Status: closed

RC1:
'Comment on hess-2023-98', Anonymous Referee #1, 03 Sep 2023

Subseasonal to seasonal (S2S) precipitation forecasts can provide valuable information for hydrological modelling and water resources management. This paper is concentrated on the calibration of raw S2S forecasts by using machine learning and committee models. While the performance of raw forecasts may not be satisfactory, this paper shows that the performance of calibrated forecasts is improved.

Reading through the paper, it is observed that some technical issues are not clearly presented. Therefore, there are five comments for further improvements of the paper.

Firstly, lead time plays an important part in predictive performance (White et al., 2017). Specifically, forecast skill tends to decrease as lead time prolongs. In Figure 10 and 11, the attention is paid to “autocorrelation at different lags”. Meanwhile, the performance at different lead times is missing. The authors may want to clearly illustrate how raw forecasts are correlated with observations by using spatial plots. Examples can be found at the NOAA website (https://repository.library.noaa.gov/view/noaa/22608).

Secondly, seasonality plays an important part in predictive performance (Schepen et al., 2020; Huang et al., 2021). That is, forecast skill can vary by month. Are the forecasts calibrated month by month? Or forecasts across different months are pooled in the analysis? Details must be presented.

Thirdly, climatological forecasts play an important part in benchmarking the predictive performance (Schepen et al., 2020; Huang et al., 2021). Specifically, climatological forecasts refer to some forecasts based solely upon the climatological statistics for a region rather than the dynamical implications of the current conditions and they are often used as a baseline for evaluating the performance of weather and climate forecasts (https://glossary.ametsoc.org/wiki/Climatological_forecast). Meanwhile, it seems that climatological forecasts are not illustrated in the paper.

Fourthly, some figures in this paper are not of high quality (e.g., Figure 2, 10, 11, 13 and 14). The spatial resolution is quite low. Also, there are large margins that are blank.

Fifthly, some figures in this paper are for the sake of conceptual illustration (e.g., Figure 3, 4, 5, 6, 7, 8 and 9). They are adapted from other publications but are not directly related to model developments and results/findings of the paper. Such illustrative figures can either be deleted or moved to the supporting information.

References:
White, C.J., Carlsen, H., Robertson, A.W., Klein, R.J., Lazo, J.K., Kumar, A., Vitart, F., Coughlan de Perez, E., Ray, A.J., Murray, V. and Bharwani, S., 2017. Potential applications of subseasonal‐to‐seasonal (S2S) predictions. Meteorological applications, 24(3), pp.315-325.
Schepen, A., Everingham, Y. and Wang, Q.J., 2020. An improved workflow for calibration and downscaling of GCM climate forecasts for agricultural applications–a case study on prediction of sugarcane yield in Australia. Agricultural and Forest Meteorology, 291, p.107991.
Huang, Z., Zhao, T., Xu, W., Cai, H., Wang, J., Zhang, Y., Liu, Z., Tian, Y., Yan, D. and Chen, X., 2022. A seven-parameter Bernoulli-Gamma-Gaussian model to calibrate subseasonal to seasonal precipitation forecasts. Journal of Hydrology, 610, p.127896.

Citation: https://doi.org/10.5194/hess-2023-98-RC1
- AC1: 'Reply on RC1', Mohamed Elbasheer, 16 Oct 2023
  
  We are delighted to receive these important and constructive comments. We believe they will definitely improve the quality of this paper, and we are very thankful for the time you spent reading it.
  Subseasonal to seasonal (S2S) precipitation forecasts can provide valuable information for hydrological modelling and water resources management. This paper is concentrated on the calibration of raw S2S forecasts by using machine learning and committee models. While the performance of raw forecasts may not be satisfactory, this paper shows that the performance of calibrated forecasts is improved.
  Reading through the paper, it is observed that some technical issues are not clearly presented. Therefore, there are five comments for further improvements of the paper.
  Firstly, lead time plays an important part in predictive performance (White et al., 2017). Specifically, forecast skill tends to decrease as lead time prolongs. In Figure 10 and 11, the attention is paid to “autocorrelation at different lags”. Meanwhile, the performance at different lead times is missing. The authors may want to clearly illustrate how raw forecasts are correlated with observations by using spatial plots. Examples can be found at the NOAA website (https://repository.library.noaa.gov/view/noaa/22608).
  The focus of this paper is now to illustrate the performance of different ML techniques at specific lead times (weeks 3 & 4 and weeks 5 & 6). Illustrating how raw forecasts are correlated with observations using spatial plots is important; we had already done this step, and we agree that showing the results gives more insight to the reader. The corresponding plots were added to the manuscript. The link below shows the high-quality spatial plots:
  https://drive.google.com/file/d/1UmhUKv48DxydvXVwT3Nz7ab8SygKvrQD/view?usp=drive_link
  Secondly, seasonality plays an important part in predictive performance (Schepen et al., 2020; Huang et al., 2021). That is, forecast skill can vary by month. Are the forecasts calibrated month by month? Or forecasts across different months are pooled in the analysis? Details must be presented.
  Thank you for this important information and for the references. The setup of the data does not allow for month-by-month calibration: the data per month are too limited for training the ML model (the precipitation is aggregated bi-weekly). As a result, the months are pooled together to train the model using the hindcast data from 2000-2019.
  Thirdly, climatological forecasts play an important part in benchmarking the predictive performance (Schepen et al., 2020; Huang et al., 2021). Specifically, climatological forecasts refer to some forecasts based solely upon the climatological statistics for a region rather than the dynamical implications of the current conditions and they are often used as a baseline for evaluating the performance of weather and climate forecasts (https://glossary.ametsoc.org/wiki/Climatological_forecast). Meanwhile, it seems that climatological forecasts are not illustrated in the paper.
  Here, the three predicted probabilities (above normal, near normal, and below normal) are compared with a reference climatological value of 1/3.
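To make the climatological reference concrete, the following is a minimal sketch of how an RPSS against a fixed 1/3 climatology could be computed for three ordered categories. It is an illustration only, not the authors' code: the function names `rps` and `rpss`, the toy forecasts and the category encoding (0 = below, 1 = near, 2 = above normal) are assumptions, and the skill score is taken here as one minus the ratio of mean RPS values.

```python
from itertools import accumulate

def rps(probs, observed_category):
    """Ranked probability score of one forecast over ordered categories."""
    obs = [1.0 if k == observed_category else 0.0 for k in range(len(probs))]
    cum_forecast = accumulate(probs)   # cumulative forecast probabilities
    cum_observed = accumulate(obs)     # cumulative observation (step function)
    return sum((f - o) ** 2 for f, o in zip(cum_forecast, cum_observed))

def rpss(forecasts, observations, reference):
    """Skill relative to a fixed reference forecast: 1 - mean(RPS) / mean(RPS_ref)."""
    rps_fc = sum(rps(p, o) for p, o in zip(forecasts, observations)) / len(forecasts)
    rps_ref = sum(rps(reference, o) for o in observations) / len(observations)
    return 1.0 - rps_fc / rps_ref

# Climatological reference: each tercile category is equally likely.
CLIMATOLOGY = [1 / 3, 1 / 3, 1 / 3]

# Toy (hypothetical) probabilities for below / near / above normal.
forecasts = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]]
observations = [0, 1, 2]

skill = rpss(forecasts, observations, CLIMATOLOGY)
no_skill = rpss([CLIMATOLOGY] * 3, observations, CLIMATOLOGY)
```

A positive value means the forecast beats the 1/3 reference, zero means it is no better than climatology, and a negative value means it is worse.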
  Fourthly, some figures in this paper are not of high quality (e.g., Figure 2, 10, 11, 13 and 14). The spatial resolution is quite low. Also, there are large margins that are blank.
  After reviewing how the plots are imported into the manuscript, we found that the final resolution was low. We modified the importing procedure and were able to provide very high-quality plots. The link below shows the new high-quality figures (we also did this for all the figures in the manuscript):
  https://drive.google.com/file/d/1PrUhnpOcro_5a2XFdltPr9vhxPnL1xYM/view?usp=drive_link
  Fifthly, some figures in this paper are for the sake of conceptual illustration (e.g., Figure 3, 4, 5, 6, 7, 8 and 9). They are adapted from other publications but are not directly related to model developments and results/findings of the paper. Such illustrative figures can either be deleted or moved to the supporting information.
  Not all of these figures are purely illustrative, but some of them are. We will keep only the figures that are directly related to the model development; the illustrative figures will be moved to the appendix.
  
  Citation: https://doi.org/10.5194/hess-2023-98-AC1
RC2:
'Comment on hess-2023-98', Anonymous Referee #2, 05 Sep 2023

Signs of improvement can be detected in the resubmitted paper. However, the paper still has several weaknesses in terms of writing and presentation. My recommendation is that lots of work is needed before the paper can be suitable for publication in Hydrology and Earth System Sciences.

1. The latest progress in S2S forecast correction especially correction by AI is still not adequately addressed. We can’t see the necessity of the study (not only for the competition) and the novelty of the work.

2. The methodology section is not well constructed, please rewrite.

3. In 3.3, all the results are only roughly presented. The authors are recommended to use numbers to describe the results. For example, in 3.3.1 CM does show ‘substantial increase’, ‘very good’ performance compared with raw S2S, but how much (percent) of the improvement is?

4. A comparison between MLP and CM-MLP is needed to show the superiority of the CM model.

5. It would be great if the spatial results are present in the manuscript.

Citation: https://doi.org/10.5194/hess-2023-98-RC2
- AC2: 'Reply on RC2', Mohamed Elbasheer, 16 Oct 2023
  Thank you so much for re-reading the manuscript; we highly appreciate it. We are delighted that there are signs of improvement since our last submission, and thank you for the new comments.
  Signs of improvement can be detected in the resubmitted paper. However, the paper still has several weaknesses in terms of writing and presentation. My recommendation is that lots of work is needed before the paper can be suitable for publication in Hydrology and Earth System Sciences.
  The latest progress in S2S forecast correction especially correction by AI is still not adequately addressed. We can’t see the necessity of the study (not only for the competition) and the novelty of the work.
  
  We value our work as a small contribution toward a reliable S2S precipitation forecast, which would have a large positive societal impact on food security, agriculture, and risk mitigation. The novelty of this work lies in proposing the use of a CM, which mainly promotes the separate training of the S2S ensemble members.
  We believe this paper contains three important messages that, in our opinion, deserve to be distributed and shared:
  The committee model results showed substantial improvements in the forecasting skills in comparison with the models in the first approach, where the ensemble mean is used as input: an average increase of +68.38% (+0.16 RPSS value with respect to ECMWF) and +111.07% (+0.173 RPSS value with respect to climatology) for cross-validation across the four models. Here, the MLP model results were compared with the CM results, because the MLP model showed the best performance among the first-approach models. These substantial improvements have encouraged us to test the committee model further and to think about disseminating the results.
  
  The committee model is tested using the 2019 dataset to check for overfitting and to assess the performance of the model on an unseen dataset. Compared with the cross-validation results from the ANN-MLP model, the testing results of the CM kept showing substantial improvement, with an overall average increase of +4.96% (+0.0012 RPSS value with respect to ECMWF) and +53.21% (+0.05 RPSS value with respect to climatology). As a reminder, these are only preliminary/experimental results and there is considerable room for improvement.
  
  The third message/question is whether a machine learning model that is trained using only the ECMWF hindcast would be able to improve the forecasting skills of the real-time ECMWF forecast, given that the number of ensemble members is 51 for the real-time forecast. In our case, the first 11 ensemble members were used as input for the CM.
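The committee idea described in these three messages (one ANN-MLP per ensemble member, combined by a logistic regression) can be sketched as below. This is a hedged illustration on synthetic data, assuming scikit-learn is available; it is not the authors' implementation. The array shapes, hidden-layer size, use of out-of-fold probabilities and the helper names `fit_committee` and `predict_committee` are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_members, n_samples, n_features = 11, 300, 2  # 11 members; features could be tp and t2m

# Synthetic stand-in for hindcast inputs: X[m] holds the inputs of ensemble member m.
y = rng.integers(0, 3, size=n_samples)  # 0 = below, 1 = near, 2 = above normal
X = rng.normal(size=(n_members, n_samples, n_features)) + 0.5 * y[None, :, None]

def fit_committee(X, y):
    """Train one MLP per ensemble member, then a logistic-regression combiner."""
    members, meta_inputs = [], []
    for X_member in X:
        mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=300, random_state=0)
        # Out-of-fold probabilities, so the combiner is not fitted on
        # predictions the member made for its own training samples.
        meta_inputs.append(
            cross_val_predict(mlp, X_member, y, cv=3, method="predict_proba")
        )
        members.append(mlp.fit(X_member, y))  # refit on all data for later use
    combiner = LogisticRegression(max_iter=1000).fit(np.hstack(meta_inputs), y)
    return members, combiner

def predict_committee(members, combiner, X):
    """Stack the per-member class probabilities and combine them."""
    stacked = np.hstack([m.predict_proba(Xm) for m, Xm in zip(members, X)])
    return combiner.predict_proba(stacked)

members, combiner = fit_committee(X, y)
probs = predict_committee(members, combiner, X)  # shape (n_samples, 3)
```

The combiner outputs one probability per category, which is the form needed for an RPSS evaluation against ECMWF or climatology. Evaluating on held-out years, as the reply does with the 2019 dataset, would be the natural next step.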
  
  The methodology section is not well constructed, please rewrite.
  
  The methodology section is reviewed, and the text is updated.
  In 3.3, all the results are only roughly presented. The authors are recommended to use numbers to describe the results. For example, in 3.3.1 CM does show ‘substantial increase’, ‘very good’ performance compared with raw S2S, but how much (percent) of the improvement is?
  
  Thank you for this valuable comment. We reviewed all the results, and numbers are now used when comparing them.
  A comparison between MLP and CM-MLP is needed to show the superiority of the CM model.
  
  The comparison is done (considering comment no 3) and the text is updated.
  
  It would be great if the spatial results are present in the manuscript.
  
  For now, we can only provide the average RPSS values per region. To provide the spatial results, we would need to modify and rerun the codes, which would consume a lot of computational resources. We highly appreciate this comment.
  
  Citation: https://doi.org/10.5194/hess-2023-98-AC2
RC3:
'Comment on hess-2023-98', Anonymous Referee #3, 05 Sep 2023

Overall comment:
This paper builds categorical classification correction models to correct the ECMWF S2S precipitation forecast by adopting machine learning and committee models. The result may be meaningful and practical to management decisions regarding food security, agriculture, and risk mitigation.
However, the methodology is routine and the work is not innovative enough, although it considers spatiotemporal information and machine learning algorithms to construct the S2S correction model. In addition, there is still a lot of room for improvement, especially since it is not a scientific paper.
As such, I don’t recommend the publication of this manuscript in Hydrology and Earth System Sciences in its present form.
Major comments:
(1) This paper adopts the traditional way to post-process the S2S precipitation forecast by machine learning algorithm, and the machine learning algorithm is not innovative. Besides, the committee model is essentially the same as the stacking integration algorithm, and the stacking integration algorithm may be better than the committee model. On the other hand, S2S was considered a difficult time range for weather forecasting, being both too long for much memory of the atmospheric initial conditions and too short for SST anomalies to be felt sufficiently strongly, making it difficult to beat persistence. Therefore, it is not enough to only consider early precipitation and temperature as model inputs, but also to consider the Madden-Julian Oscillation (MJO), ocean conditions, soil moisture, and so on.
(2) The abstract should have been brief and logical. However, the abstract is redundant, it is recommended to only introduce the core content to make it more logical. For example, there is no need to spend 13 lines introducing the first approach, which reduces the correlation between the two approaches.
(3) In the introduction, it is recommended to remove the methods used in the top three submissions for the competition (lines 83-88), which are not related to the content of the introduction. In addition, it is recommended to explore the state-of-the-art in using machine learning techniques for improving S2S precipitation forecasts and then introduce the innovative aspects of the proposed method compared to existing machine learning methods.
(4) In the methodology, the ECMWF extended-range forecast provided 100 ensemble members for real-time forecasting, so the corresponding content should be modified (lines 237, 465)
(5) In the methodology, the structure of the separate MLP models was the same as the structure of the MLP models used for models (1), (2), (3) and (4). It's not clear what the models represent. Therefore, it is recommended to add Table 1 to the appropriate location in the methodology.
(6) In the results and discussion, the content of section 3.2 can be transferred to section 2.3.
(7) In the results and discussion, the authors compare the performance of different ML methods on ECMWF and climatology using the CRSS metric, but I don't understand what data climatology represents.
(8) In the results and discussion, the cross-correlation results showed a very low correlation between the total precipitation and the two-metre temperature in most of the areas for different lag times, so why did the authors choose temperature as the input for the models?
(9) In the results and discussion, the authors only briefly described the results. I don’t see any further in-depth analysis. For example, models 2, 3, and 4 adopt different spatial information, what conclusions can be drawn from the differences in their results. Unfortunately, similar conclusions have not been seen in the results and discussion. It is recommended that some of the conclusions be added to the results and discussion.
(10) Consistent with the opinions of previous reviewers, conclusions are those of a technical report, with no attempt at explaining what it brings to the state of the art.
Others:
(1) Line 52, there are 12 numerical weather prediction (NWP) centres that contribute to the S2S project database.
(2) Line 106, “k-NN” should be “K-NN”.
(3) Line 214, “Figure (24)” does not exist.
(4) Lines 262-263, “$\mathrm{RPS}_{\mathrm{ECMWF\ benchmark}}$” is inconsistent with the formula in the paper.

Citation: https://doi.org/10.5194/hess-2023-98-RC3
- AC3: 'Reply on RC3', Mohamed Elbasheer, 16 Oct 2023
  
  Anonymous Referee #3:
  Thank you so much for your valuable comments; we appreciate the time you spent on this manuscript.
  Major comments:
  (1) This paper adopts the traditional way to post-process the S2S precipitation forecast by machine learning algorithm, and the machine learning algorithm is not innovative. Besides, the committee model is essentially the same as the stacking integration algorithm, and the stacking integration algorithm may be better than the committee model. On the other hand, S2S was considered a difficult time range for weather forecasting, being both too long for much memory of the atmospheric initial conditions and too short for SST anomalies to be felt sufficiently strongly, making it difficult to beat persistence. Therefore, it is not enough to only consider early precipitation and temperature as model inputs, but also to consider the Madden-Julian Oscillation (MJO), ocean conditions, soil moisture, and so on.
  What we are trying to contribute here is not an innovative model but a small contribution that, based on our work, showed good results: the separate training of the ensemble members using the committee model technique, which is in essence the same as the stacking integration algorithm. The committee model technique is neither new nor innovative; however, adopting it in this case showed a substantial improvement in the forecasting skills, which led to our main recommendation of training the ensemble members separately. We thought it would be good to disseminate these findings as a contribution toward the overall objective of realizing a reliable S2S precipitation forecast.
  The committee model results showed substantial improvements in the forecasting skills in comparison with the models in the first approach, where the ensemble mean is used as input: an average increase of +68.38% (+0.16 RPSS value with respect to ECMWF) and +111.07% (+0.173 RPSS value with respect to climatology) for cross-validation across the four models. Here, the MLP model results were compared with the CM results, because the MLP model showed the best performance among the first-approach models. These substantial improvements have encouraged us to test the committee model further and to think about disseminating the results.
  The committee model is tested using the 2019 dataset to check for overfitting and to assess the performance of the model on an unseen dataset. Compared with the cross-validation results from the ANN-MLP model, the testing results of the CM kept showing substantial improvement, with an overall average increase of +4.96% (+0.0012 RPSS value with respect to ECMWF) and +53.21% (+0.05 RPSS value with respect to climatology). As a reminder, these are only preliminary/experimental results and there is considerable room for improvement.
  We didn’t add more variables because the main analysis focuses on comparing different model setups and their performance rather than on finding the best ML model for S2S correction. Based on the results shown, we think that adding more relevant sources of predictability, such as the MJO and soil moisture, and adopting the CM will yield better results.
  (2) The abstract should have been brief and logical. However, the abstract is redundant, it is recommended to only introduce the core content to make it more logical. For example, there is no need to spend 13 lines introducing the first approach, which reduces the correlation between the two approaches.
  Good point, we modified the abstract accordingly.
  Abstract. The European Centre for Medium-Range Weather Forecasts (ECMWF) provides subseasonal to seasonal (S2S) precipitation forecasts; S2S forecasts extend from two weeks to two months ahead; however, the accuracy of S2S precipitation forecasting is still underdeveloped, and a lot of research and competitions have been proposed to study how machine learning (ML) can be used to improve forecast performance. This research explores the use of machine learning techniques to improve the ECMWF S2S precipitation forecast, following the AI competition guidelines proposed by the S2S project and the World Meteorological Organisation (WMO). A baseline analysis of the ECMWF S2S precipitation hindcasts (2000–2019) targeting three categories (above normal, near normal and below normal) was performed using the ranked probability skill score (RPSS) and the receiver operating characteristic curve (ROC) over three globally selected regions. The assessment regions were selected based on a regional analysis that was done to group similar (correlated) hydrometeorological time series variables, and three regions were selected based on their spatial and temporal correlations. Here, we have proposed a gridded spatial and temporal correlation analysis (autocorrelation, cross-correlation and semivariogram), allowing us to explore neighbours’ time series and their lags. ECMWF ensemble bi-weekly hindcasts/forecasts and National Oceanic and Atmospheric Administration (Climate Prediction Centre, CPC) bi-weekly observations with a resolution of 1.5 by 1.5 degrees were used, the total precipitation (tp) and the two-metre temperature (t2m) were the main variables used in our analysis. The spatio-temporal correlation analysis was also used to prepare four input datasets for the ML-based correction models. Two approaches were followed to build ML-based categorical classification correction models: (1) using different ML algorithms and (2) using the committee model technique. 
The aim of both was to correct the categorical classifications (above normal, near normal and below normal) of the ECMWF S2S precipitation forecast. The forecasting skills of both model approaches were compared against a reference model (ECMWF S2S precipitation hindcasts and climatology) using RPSS. In the first approach, the ensemble mean was used as the input, and five ML techniques were trained and compared: k-nearest neighbours (K-NN), logistic regression (LR), artificial neural network multilayer perceptron (ANN-MLP), random forest (RF) and long–short-term memory (LSTM). In the second approach, the committee model technique (CM) was used, in which, instead of using one ECMWF hindcast (ensemble mean), the problem is divided into many ANN-MLP models (train each ensemble member independently) that are later combined in an ensemble model (trained with LR). The cross-validation results from the first approach showed that the ANN-MLP was the best model with an overall average RPSS value of +0.1135 and -0.014 with respect to the ECMWF baseline and climatology respectively, highlighting that the highest RPSS value with respect to climatology is +0.0057 in region two using the third input dataset. The cross-validation results following the second approach show substantial improvements in the forecasting skills as the overall average RPSS values are +0.27 and +0.16 with respect to the ECMWF baseline and climatology respectively. The testing results of the CM also showed an improvement in the forecasting skills with an overall average RPSS value of +0.115 and +0.04 with respect to the ECMWF baseline and climatology respectively. Comparing the two approaches, the second approach which adopts the use of the committee model technique, has shown very interesting results, and it is expected that by adding more relevant sources of predictability such as the Madden–Julian oscillation (MJO) and soil moisture and by adopting the CM, better results will be obtained. 
We emphasize that this study was done only on random samples over three global regions; a more comprehensive study should be performed to explore the whole range of possibilities.
  (3) In the introduction, it is recommended to remove the methods used in the top three submissions for the competition (lines 83-88), which are not related to the content of the introduction. In addition, it is recommended to explore the state-of-the-art in using machine learning techniques for improving S2S precipitation forecasts and then introduce the innovative aspects of the proposed method compared to existing machine learning methods.
  Thank you for the comment; we appreciate it. The elaboration is removed.
  (4) In the methodology, the ECMWF extended-range forecast provided 100 ensemble members for real-time forecasting, so the corresponding content should be modified (lines 237, 465)
  Thank you for the comment. This is one of the major improvements recently introduced (end of June 2023) to the extended-range ensemble forecasts: the number of ensemble members has been increased from 51 to 101, and they are now run daily instead of twice weekly. Our work was based on the previous setup of 51 ensemble members. This does not affect the content, since our analysis was based mostly on the ECMWF ensemble hindcasts (2000-2019).
  (5) In the methodology, the structure of the separate MLP models was the same as the structure of the MLP models used for models (1), (2), (3) and (4). It's not clear what the models represent. Therefore, it is recommended to add Table 1 to the appropriate location in the methodology.
  That is true. Table 1 is added to the appropriate location in the methodology.
  (6) In the results and discussion, the content of section 3.2 can be transferred to section 2.3.
  Done. The content of section 3.2 is transferred to section 2.3.
  (7) In the results and discussion, the authors compare the performance of different ML methods on ECMWF and climatology using the CRSS metric, but I don't understand what data climatology represents.
  Here, the three predicted probabilities (above normal, near normal, and below normal) are compared with a reference climatological value of 1/3.
  (8) In the results and discussion, the cross-correlation results showed a very low correlation between the total precipitation and the two-metre temperature in most of the areas for different lag times, so why did the authors choose temperature as the input for the models?
  Yes, that is true: the cross-correlation values were lower for the different lag times. As a result, we used only one two-metre temperature (t2m) value, the one corresponding to the same time step as the precipitation. The regional analysis shows that the t2m correlation values range between 0.4 and 1 for the first two regions and are lower than 0.4 for the third region. Based on these results, and aiming for a model that generalizes over the three regions, we decided to add t2m as an input for all the models.
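The lag comparison discussed in this answer can be illustrated with a small lagged cross-correlation sketch. It is only an assumption of how such a check might look at a single grid point: the function names and the toy series are hypothetical, and the Pearson coefficient is used as the correlation measure.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def lagged_cross_correlation(tp, t2m, lag):
    """Correlate tp[t] with t2m[t - lag] for a non-negative lag (in time steps)."""
    if lag == 0:
        return pearson(tp, t2m)
    return pearson(tp[lag:], t2m[:-lag])

# Toy bi-weekly series at one grid point (hypothetical values).
t2m = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
tp = [2.0 * v + 1.0 for v in t2m]  # perfectly linear in t2m at lag 0

r_lag0 = lagged_cross_correlation(tp, t2m, 0)
r_lag1 = lagged_cross_correlation(tp, t2m, 1)
```

Comparing the lag-0 value with the values at longer lags, per grid cell and region, is one way to justify using only the t2m value at the same time step as the precipitation.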
  (9) In the results and discussion, the authors only briefly described the results. I don’t see any further in-depth analysis. For example, models 2, 3, and 4 adopt different spatial information, what conclusions can be drawn from the differences in their results. Unfortunately, similar conclusions have not been seen in the results and discussion. It is recommended that some of the conclusions be added to the results and discussion.
  Well noted, we elaborated more and some of the conclusions were added to the results and discussion section.
  (10) Consistent with the opinions of previous reviewers, conclusions are those of a technical report, with no attempt at explaining what it brings to the state of the art.
  Understood, we elaborated more on explaining what it brings to the state of the art.
  Others:
  (1) Line 52, there are 12 numerical weather prediction (NWP) centres that contribute to the S2S project database.
  The text is modified.
  (2) Line 106, “k-NN” should be “K-NN”.
  The text is modified.
  (3) Line 214, “Figure (24)” does not exist.
  The text is modified. It is replaced with Figure 5.
  (4) Lines 262-263, “$\mathrm{RPS}_{\mathrm{ECMWF\ benchmark}}$” is inconsistent with the formula in the paper.
  The text is modified. It is replaced with $\mathrm{RPS}_{\mathrm{Reference\ model}}$.
  
  Citation: https://doi.org/10.5194/hess-2023-98-AC3

Data sets

S2S AI challenge template A. Spring, A. Robertson, F. Pinault, F. Vitart, and R. Roskar https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template

Model code and software

The main codes developed and used to conduct this research M. E. Elbasheer and G. C. Perez https://zenodo.org/badge/latestdoi/511522205

Viewed

Total article views: 2,123 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,603	436	84	2,123	99	124

Views and downloads (calculated since 24 Apr 2023)

Month	HTML	PDF	XML	Total
Apr 2023	191	24	4	219
May 2023	134	23	2	159
Jun 2023	14	9	1	24
Jul 2023	31	14	1	46
Aug 2023	27	18	1	46
Sep 2023	132	28	19	179
Oct 2023	51	14	8	73
Nov 2023	18	8	0	26
Dec 2023	16	12	0	28
Jan 2024	19	10	0	29
Feb 2024	18	10	2	30
Mar 2024	25	14	2	41
Apr 2024	30	9	6	45
May 2024	29	9	3	41
Jun 2024	39	3	1	43
Jul 2024	16	5	0	21
Aug 2024	16	2	0	18
Sep 2024	35	3	0	38
Oct 2024	16	3	0	19
Nov 2024	8	5	1	14
Dec 2024	13	9	0	22
Jan 2025	18	5	2	25
Feb 2025	19	2	1	22
Mar 2025	16	7	5	28
Apr 2025	11	7	0	18
May 2025	23	9	0	32
Jun 2025	27	17	0	44
Jul 2025	25	15	3	43
Aug 2025	73	21	2	96
Sep 2025	335	19	1	355
Oct 2025	26	24	2	52
Nov 2025	31	43	3	77
Dec 2025	67	29	3	99
Jan 2026	48	10	4	62
Feb 2026	6	1	2	9

Viewed (geographical distribution)

Total article views: 2,056 (including HTML, PDF, and XML), of which 2,056 have a defined geography and 0 are of unknown origin.

Latest update: 06 Feb 2026

Short summary

In this research, we explored the use of machine learning (ML) to improve the S2S ensemble precipitation forecast. Different approaches were used as exploratory experiments to see which one better improves the ensemble probabilistic forecast. We conclude that the committee model (CM) concept is a promising approach that can be further studied and evaluated using different combinations of state-of-the-art ML techniques.