Machine Learning and Committee Models for Improving ECMWF Subseasonal to Seasonal (S2S) Precipitation Forecast

Elbasheer, Mohamed Elneel Elshaikh Eltayeb; Corzo, Gerald Augusto; Solomatine, Dimitri; Varouchakis, Emmanouil

doi:https://doi.org/10.5194/hess-2022-348

03 Nov 2022

Status: this discussion paper is a preprint. It has been under review for the journal Hydrology and Earth System Sciences (HESS). The manuscript was not accepted for further review after discussion.

Abstract. The European Centre for Medium-Range Weather Forecasts (ECMWF) provides subseasonal to seasonal (S2S) precipitation forecasts, which extend from two weeks to two months ahead. However, the accuracy of S2S precipitation forecasts remains limited, and considerable research and several competitions have been devoted to studying how machine learning (ML) can be used to improve forecast performance. This research explores the use of ML techniques to improve the ECMWF S2S precipitation forecast, following the AI competition guidelines proposed by the S2S project and the World Meteorological Organisation (WMO). A baseline analysis of the ECMWF S2S precipitation hindcasts (2000–2019) targeting three categories (above normal, near normal and below normal) was performed using the ranked probability skill score (RPSS) and the receiver operating characteristic (ROC) curve. A regional analysis of time series was done to group similar (correlated) hydrometeorological time series variables, and three regions were finally selected based on their spatial and temporal correlations. The methodology first replicated the performance of the available ECMWF forecast data and used it as a reference for the experiments (baseline analysis). Two approaches were followed to build categorical classification correction models: (1) using ML and (2) using a committee model. The aim of both was to correct the categorical classifications (above normal, near normal and below normal) of the ECMWF S2S precipitation forecast. In the first approach, the ensemble mean was used as the input, and five ML techniques were trained and compared: k-nearest neighbours (k-NN), logistic regression (LR), artificial neural network multilayer perceptron (ANN-MLP), random forest (RF) and long short-term memory (LSTM).
Here, we propose a gridded spatial and temporal correlation analysis (autocorrelation, cross-correlation and semivariogram) for input variable selection, which allows us to explore neighbours’ time series and their lags as inputs. The results of this analysis provided the final data sets used for training and validating the ML models. Total precipitation (tp) and two-metre temperature (t2m) time series at a resolution of 1.5 by 1.5 degrees were the main variables used; both were taken from the global ECMWF S2S real-time forecasts, the ECMWF S2S reforecasts/hindcasts and observation data from the National Oceanic and Atmospheric Administration (Climate Prediction Centre, CPC). The forecasting skill of the ML models was compared against reference models (ECMWF S2S precipitation hindcasts and climatology) using RPSS. The results from the first approach showed that LR and MLP were the best ML models in terms of RPSS values, and a positive RPSS value with respect to climatology was obtained using MLP. LSTM models performed similarly to MLP but had slightly lower scores overall. In the second approach, a committee model (CM) was used: instead of using a single ECMWF hindcast (the ensemble mean), the problem is divided among many ANN-MLP models (one trained independently on each ensemble member), which are later combined in a smart ensemble model (trained with LR). The cross-validation and testing of the CMs showed positive RPSS values with respect to climatology, which can be interpreted as an improvement over the ECMWF forecast in the three climatological regions. In conclusion, the individual ML models yield very little improvement, if any, but with a CM the RPSS values are all better than the reference forecast. This study was done only on random samples over three global regions; a more comprehensive study should be performed to explore the whole range of possibilities.
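
For readers unfamiliar with the score used throughout the abstract, the following is a minimal Python sketch of the RPSS for three ordered categories (below, near and above normal), measured against a climatological reference that assigns probability 1/3 to each category. It is an illustration only, not the authors' or the competition's code; the function names `rps` and `rpss` and the example probabilities are hypothetical, and averaging conventions (over time, grid cells or regions) vary between studies.

```python
from typing import Sequence


def rps(probs: Sequence[float], observed_category: int) -> float:
    """Ranked probability score of one forecast.

    probs holds one probability per category, ordered from lowest to
    highest category; observed_category is the index that occurred.
    """
    cum_forecast = 0.0
    cum_observed = 0.0
    score = 0.0
    for k, p in enumerate(probs):
        cum_forecast += p
        if k == observed_category:
            cum_observed = 1.0  # cumulative observation is 1 from here on
        score += (cum_forecast - cum_observed) ** 2
    return score


def rpss(
    forecasts: Sequence[Sequence[float]],
    observations: Sequence[int],
    reference: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Skill of the forecasts relative to a fixed reference forecast."""
    n = len(observations)
    rps_forecast = sum(rps(f, o) for f, o in zip(forecasts, observations)) / n
    rps_reference = sum(rps(reference, o) for o in observations) / n
    return 1.0 - rps_forecast / rps_reference


# Hypothetical example: two forecasts, observed categories 0 and 2.
example_forecasts = [(0.6, 0.3, 0.1), (0.2, 0.3, 0.5)]
example_observations = [0, 2]
example_skill = rpss(example_forecasts, example_observations)
```

A positive value means the forecast outperforms the reference, zero means it matches the reference, and a negative value means it is worse.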

Received: 05 Oct 2022 – Discussion started: 03 Nov 2022

Competing interests: At least one of the (co-)authors is a member of the editorial board of Hydrology and Earth System Sciences.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: closed

RC1: 'Comment on hess-2022-348', Anonymous Referee #1, 30 Nov 2022

This study compared two approaches: (1) five machine learning models and (2) a committee model, to build categorical classification correction models for S2S precipitation forecasts categories, defined as above normal, near normal, and below normal using the 0.67 and 0.33 quantiles. Model inputs were objectively determined by a spatial (semivariance) and temporal (autocorrelation and cross-correlation) correlation analysis, which as a consequence led to four models with different input datasets. Model results were finally assessed in three selected regions. The analysis was generally performed following the AI competition guidelines proposed by the S2S project and the World Meteorological Organisation. I think this paper will make a useful contribution and deserves publication in Hydrology and Earth System Sciences, once the detailed comments given below have been addressed. I recommend a major revision and will be very glad to re-evaluate it.

1. The research question is a hot topic for S2S forecasts. However, in the introduction, the authors only briefly describe the S2S AI challenge. The latest progress in S2S forecasts and in S2S forecast correction, especially correction by AI, is poorly addressed. The authors should clarify why the study is necessary (not only for the competition) and what the novelty of their research is.

2. The paper is written in a loose style. One sentence per paragraph is too casual. For example, Lines 232-233, “The prepared spatial and temporal datasets used for the training and validation of the ML models and CMs are shown in Table 1.”, should be incorporated into the preceding text. Lines 254-260 should be organized into one paragraph. The authors should carefully check the whole paper and avoid similar problems. I suggest the authors revise the paper so that it reads more like a scientific paper.

3. I am not sure whether HESS limits the number of figures. I am concerned that the figures in the paper are redundant; some of them are unnecessary or can be merged. The quality of all figures should also be improved. For example, Figures 6-7 can be reorganized as 2*3 rather than 2*4. The text inside Figures 9-11 is hard to read. The lower subplots in Figure 13 are incomplete. Figures 15 and 16 need not be shown, as Tables 2 and 3 present the same information.

4. Generally, sections 3.4.7 and 3.4.8 are only straightforward descriptions of results. I don’t see any further physical explanations. For example, why is Model 4 not the best model although it incorporates all the relevant inputs? Why do Regions 2 and 3 gain more improvement than Region 1 even though the cross-correlation between tp and t2m is higher in Region 1? I think the authors should try to explain the AI model and the results rather than simply describe them. This is the way to accelerate AI development and to give guidelines for subsequent research.

Specific Comments:

1. Line 221: could you explain why there is a fluctuation when the gamma approaches the sill?

2. Lines 275-279: It is not clear how the 0.67 and 0.33 quantiles are defined for the ECMWF hindcasts. It seems that they are not defined for each member, as there is only one line in Figure 14.

3. It is reasonable to assess model performance in selected regions. According to Figure 12, at least 9 cells were investigated for each region. It would be great if the spatial results were presented in the manuscript.

4. Contents in sections 3.4.1-3.4.6 are more like methods, but the authors put them in the results section. It would be more appropriate to include them in Section 2.5.

5. Line 344: I don’t see the reason why the authors select MLP to construct the committee model. Please add the reasons.

Citation: https://doi.org/10.5194/hess-2022-348-RC1
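
The three categories referred to in this comment can be illustrated with a short Python sketch that derives the 0.33 and 0.67 quantiles from a sample of observations and assigns a value to below normal, near normal or above normal. This is a sketch under assumptions: the linear-interpolation quantile rule, the function names and the synthetic sample are all hypothetical, and the manuscript's actual procedure may differ.

```python
from typing import List, Sequence, Tuple


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of an ascending sequence (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )


def tercile_thresholds(climatology: Sequence[float]) -> Tuple[float, float]:
    """Return the (0.33, 0.67) quantiles of a climatological sample."""
    ordered: List[float] = sorted(climatology)
    return quantile(ordered, 0.33), quantile(ordered, 0.67)


def categorize(value: float, thresholds: Tuple[float, float]) -> int:
    """0 = below normal, 1 = near normal, 2 = above normal."""
    lower, upper = thresholds
    if value < lower:
        return 0
    if value > upper:
        return 2
    return 1


# Synthetic sample standing in for the observations of one target period.
sample = [float(v) for v in range(101)]
thresholds = tercile_thresholds(sample)
categories = [categorize(v, thresholds) for v in (10.0, 50.0, 90.0)]
```

In a setting like the one discussed here, such thresholds would be computed separately for each target period and grid cell, so the same precipitation amount can fall into different categories at different times of the year.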
- AC1: 'Reply on RC1', Mohamed Elbasheer, 03 Feb 2023
  
  We are very thankful for the important comments, which we are sure will enhance the quality of the paper.
  RC1 Comments:
  1. The research question is a hot topic for S2S forecasts. However, in the introduction, the authors only briefly describe the S2S AI challenge. The latest progress in S2S forecasts and in S2S forecast correction, especially correction by AI, is poorly addressed. The authors should clarify why the study is necessary (not only for the competition) and what the novelty of their research is.
  Response:
  Thank you for the comment. The introduction has been thoroughly reviewed and modified accordingly.
  2. The paper is written in a loose style. One sentence per paragraph is too casual. For example, Lines 232-233, “The prepared spatial and temporal datasets used for the training and validation of the ML models and CMs are shown in Table 1.”, should be incorporated into the preceding text. Lines 254-260 should be organized into one paragraph. The authors should carefully check the whole paper and avoid similar problems. I suggest the authors revise the paper so that it reads more like a scientific paper.
  Response:
  Thank you for the comment. We have revised the style, avoiding such single-sentence paragraphs and revisiting the colloquial terms.
  3. I am not sure whether HESS limits the number of figures. I am concerned that the figures in the paper are redundant; some of them are unnecessary or can be merged. The quality of all figures should also be improved. For example, Figures 6-7 can be reorganized as 2*3 rather than 2*4. The text inside Figures 9-11 is hard to read. The lower subplots in Figure 13 are incomplete. Figures 15 and 16 need not be shown, as Tables 2 and 3 present the same information.
  Response:
  Thank you for the comment. We reorganized the figures and removed the redundant ones. Figures 6-7 were reorganized as 2*3 rather than 2*4. Figures 9-11 were re-plotted so that the text inside is bigger and easier to read. The lower subplots in Figure 13 were already present, but the figure is a combination of two plots, so the lower part was shifted to the next page (this problem is solved in Figures 13 and 14). Figures 15 and 16 were deleted, as Tables 2 and 3 contain the same information.
  4. Generally, sections 3.4.7 and 3.4.8 are only straightforward descriptions of results. I don’t see any further physical explanations. For example, why is Model 4 not the best model although it incorporates all the relevant inputs? Why do Regions 2 and 3 gain more improvement than Region 1 even though the cross-correlation between tp and t2m is higher in Region 1? I think the authors should try to explain the AI model and the results rather than simply describe them. This is the way to accelerate AI development and to give guidelines for subsequent research.
  Response:
  Thank you for the comment. We have updated the paper accordingly.
  The manuscript now notes that the good performance of CMs is commonly attributed to the statistical reduction of variance when multiple models are combined.
  We also added a short text to improve the explanation of the RPSS results in Regions 2 and 3, and of why they are better than in Region 1 for the baseline analysis and the ML models despite the high cross-correlation between tp and t2m in Region 1. Because they combine various possible models, committee models capture nonlinearities more easily, which is often quite difficult for a single ML algorithm. Owing to the high number of parameters in a single model, it is more difficult to fit a linear relationship such as the one between the correlated variables.
  We highlighted that the additional inputs in Model 3 and Model 4 did not substantially increase the models’ performance: they either slightly increased or slightly decreased it, owing to the noise these inputs may bring to the ML model (some inputs are not part of the forecast and therefore add noise), especially when these inputs are perturbed members, which makes it difficult for the ML models to learn from them. The paper shows that when the CM was tested, the model with the fewest inputs (Model 1) showed better results.
  Specific Comments:
  1. Line 221: could you explain why there is a fluctuation when the gamma approaches the sill?
  Response:
  Thank you for the comment. The plot provided is an experimental semivariogram, which shows fluctuations at very large distance lags. These fluctuations correspond to very small negative and positive correlations and do not indicate any significant spatial trend. This explanation has been added to the manuscript (section 3.1).
  2. Lines 275-279: It is not clear how the 0.67 and 0.33 quantiles are defined for the ECMWF hindcasts. It seems that they are not defined for each member, as there is only one line in Figure 14.
  Response:
  Thank you for the comment. The 0.67 and 0.33 quantiles were calculated for each week of the year using the biweekly distribution of the CPC observations (2000–2019). This explanation has also been added to the manuscript (section 2.4).
  3. It is reasonable to assess model performance in selected regions. According to Figure 12, at least 9 cells were investigated for each region. It would be great if the spatial results were presented in the manuscript.
  Response:
  Thank you for the comment. The evaluation was based on the average RPSS value over the region (averaging the RPSS was also the approach followed in the AI competition). We agree that it would be great to present the spatial results, but this would make the comparison between the models somewhat harder. In addition, it would take a lot of time and would require us to modify the code and run it all again.
  4. Contents in sections 3.4.1-3.4.6 are more like methods, but the authors put them in the results section. It would be more appropriate to include them in Section 2.5.
  Response:
  Thank you for the comment. The contents of sections 3.4.1-3.4.6 have been moved to section 2.5.
  5. Line 344: I don’t see the reason why the authors select MLP to construct the committee model. Please add the reasons.
  Response:
  The MLP was used because it showed slightly better results when the individual ML models were tested; nevertheless, we recommend a comprehensive study of the committee model using other state-of-the-art DL techniques instead of the MLP. The reason has been added to the manuscript (section 2.5.6).
  
  Citation: https://doi.org/10.5194/hess-2022-348-AC1
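
The experimental semivariogram discussed in specific comment 1 can be sketched in a few lines of Python. The sketch bins all point pairs by separation distance and returns, for each lag bin, half the mean squared difference together with the number of pairs. It is a generic illustration with hypothetical names and toy data, not the authors' implementation. One common reason for irregular values at the largest lags is that few pairs fall into those bins; whether that applies to the manuscript's plot is not stated in the discussion.

```python
import math
from typing import List, Optional, Sequence, Tuple


def experimental_semivariogram(
    coords: Sequence[Sequence[float]],
    values: Sequence[float],
    lag_width: float,
    n_lags: int,
) -> Tuple[List[Optional[float]], List[int]]:
    """Return (gamma, pair_counts) for lag bins [k * lag_width, (k + 1) * lag_width)."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            distance = math.dist(coords[i], coords[j])
            lag_bin = int(distance // lag_width)
            if lag_bin < n_lags:
                sums[lag_bin] += (values[i] - values[j]) ** 2
                counts[lag_bin] += 1
    # Semivariance is half the mean squared difference; None marks empty bins.
    gamma = [s / (2 * c) if c else None for s, c in zip(sums, counts)]
    return gamma, counts


# Toy transect: four points one unit apart with linearly increasing values.
toy_coords = [(0.0,), (1.0,), (2.0,), (3.0,)]
toy_values = [0.0, 1.0, 2.0, 3.0]
toy_gamma, toy_counts = experimental_semivariogram(toy_coords, toy_values, 1.0, 4)
```

In the toy example the pair count drops from three at the shortest non-empty lag to one at the longest, which shows why estimates at large lags rest on less data.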
RC2: 'Comment on hess-2022-348', Anonymous Referee #2, 03 Jan 2023
This paper presents interesting technical work on the topic of improving the ECMWF S2S forecasts with machine learning techniques. However, there is ample evidence that authors have not even tried to turn the technical work into a scientific paper (where the technical work is presented insisting on novel aspects):

The introduction is a half page long, with most paragraphs made of a single sentence, and presents the context (deliverable of a European project) in which the work was made.

No attempt is made at exploring the state-of-the-art in using machine learning techniques for improving hydroclimatic forecasts in general or S2S forecasts in particular.

Similarly, conclusions are those of a technical report, with no attempt at explaining what it brings to the state of the art.

What is more, authors claim to contribute to the WMO Prize Challenge to Improve Subseasonal to Seasonal Predictions Using Artificial Intelligence, the winners of which were announced in February 2022, but they do not benchmark themselves against the challenge winners, nor do they explain why they should not benchmark themselves against the challenge winners (source for the information on the WMO challenge: https://journals.ametsoc.org/view/journals/bams/103/12/BAMS-D-22-0046.1.xml. While that paper was published after the article’s submission, WMO challenge results were already known by then.)

Therefore, even though the technical work is interesting, the lack of engagement with even trying to demonstrate a contribution to the state of the art means this paper cannot be accepted. It is even surprising it was not desk rejected by the editorial team. Asking a reviewer to provide a substantive review in this context is tantamount to asking them to do the authors’ work of stating what the contribution is. Note this is very different from asking reviewers to evaluate the authors’ attempt at situating their contribution: I am usually very happy to do that, and to suggest improvements when needed.

I also note that authors have no excuse there: their funding and affiliations reveal they are no outsiders to the academic world.

For these reasons, I do not believe it is appropriate to provide comments beyond suggesting that the paper should not be considered further until and unless authors do their work and explain how their technical work contributes to the field.
Citation: https://doi.org/10.5194/hess-2022-348-RC2
- AC2: 'Reply on RC2', Mohamed Elbasheer, 03 Feb 2023
  
  Thank you for taking the time to comment on the paper. We have expanded the introduction to better highlight the scientific contribution. We believe that some journals favour shorter introductions that are limited to the main motivation and scientific context of the work; the introduction is not meant to be a literature review, nor to state the scientific contribution of the work. In general, we have a ten-page introduction with a review and a more extensive discussion of the motivation, but we believe that this work is self-descriptive, and therefore we have extended the introduction while keeping it as precise and to the point as possible.
  We have added the suggested citation, and we agree that it is an important reference in the context of the challenge. It is worth mentioning that, at the time of submission, this reference paper was either not yet published or not known to us.
  
  Citation: https://doi.org/10.5194/hess-2022-348-AC2

Status: closed

RC1:
'Comment on hess-2022-348', Anonymous Referee #1, 30 Nov 2022

This study compared two approaches: (1) five machine learning models and (2) a committee model, to build categorical classification correction models for S2S precipitation forecasts categories, defined as above normal, near normal, and below normal using the 0.67 and 0.33 quantiles. Model inputs were objectively determined by a spatial (semivariance) and temporal (autocorrelation and cross-correlation) correlation analysis, which as a consequence led to four models with different input datasets. Model results were finally assessed in three selected regions. The analysis was generally performed following the AI competition guidelines proposed by the S2S project and the World Meteorological Organisation. I think this paper will make a useful contribution and deserves publication in Hydrology and Earth System Sciences, once the detailed comments given below have been addressed. I recommend a major revision and will be very glad to re-evaluate it.

1. The research question is absolutely the hotspot for S2S forecasts. However, in the introduction, the authors only briefly described the S2S AI challenge. The latest progress in S2S forecasts, S2S forecasts correction especially correction by AI were poorly addressed. The authors should clarify why the study is necessary (not only for the competition) and what is the novelty of their research.

2. The paper is written in a loose style. One sentence for a paragraph is too casual. For example, Lines 232-233:” The prepared spatial and temporal datasets used for the training and validation of the ML models and CMs are shown in Table 1.” should be incorporated in the above texts. Lines 254-260 should be organized in one paragraph. The authors should carefully check the whole paper and avoid this similar problem. I suggest the authors improve the paper to more fit like a scientific paper.

3. I am not sure whether there is a limitation for figure numbers in HESS. I am concerned that the figures are too redundant in the paper, and some of them are unnecessary or can be merged. Meanwhile, all the figure quality should be improved. For example, Figures 6-7 can be reorganized as 2*3 rather than 2*4. Figures 9-11 is hard to figure out the texts inside. The lower subplots in Figure 13 are incomplete. Figures 15 and 16 are not necessary to be shown as Table 2 and 3 has depicted the same information.

4. Generally, sections 3.4.7 and 3.4.8 are only straightforward descriptions of results. I don’t see any further physical explanations. For example, why the Model 4 is not the best model although it incorporates all the relevant inputs? Why do Regions 2 and 3 gain more improvements than Region 1 though the cross-correlation is higher between tp and t2m in this region? I think the authors should try to explain the AI model and the results rather than a simple description. This is the way to accelerate AI development and to give guidelines for subsequent research.

Specific Comments:

1. Line 221: could you explain why there is a fluctuation when the gamma approaches the sill?

2. Lines 275-279: It is not clear how the 0.67 and 0.33 quantiles are defined for ECMWF hindcasts. It seems that it is not defined for each member as there is only one line in Figure 14.

3. It is reasonable to assess model performance in selected regions. According to Figure 12, at least 9 cells were investigated for each region. It would be great if the spatial results are present in the manuscript.

4. Contents in sections 3.4.1-3.4.6 are more like methods, but the authors put them in the results section. It would be more appropriate to include them in Section 2.5.

5. Line 344: I don’t see the reason why the authors select MLP to construct the committee model. Please add the reasons.

Citation: https://doi.org/10.5194/hess-2022-348-RC1
- AC1: 'Reply on RC1', Mohamed Elbasheer, 03 Feb 2023
  
  We are very thankful for the important comments that we are sure will enhance the quality of the paper.
  RC1 Comments:
  1. The research question is absolutely the hotspot for S2S forecasts. However, in the introduction, the authors only briefly described the S2S AI challenge. The latest progress in S2S forecasts, S2S forecasts correction especially correction by AI were poorly addressed. The authors should clarify why the study is necessary (not only for the competition) and what is the novelty of their research.
  Response:
  Thank you for the comment, the introduction has been thoroughly reviewed and modified accordingly.
  2. The paper is written in a loose style. One sentence for a paragraph is too casual. For example, Lines 232-233:” The prepared spatial and temporal datasets used for the training and validation of the ML models and CMs are shown in Table 1.” Should be incorporated in the above texts. Lines 254-260 should be organized in one paragraph. The authors should carefully check the whole paper and avoid this similar problem. I suggest the authors improve the paper to more fit like a scientific paper.
  Response:
  Thank you for the comment, we have revised the style avoiding similar sentences and reflecting on the colloquial terms.
  3. I am not sure whether there is a limitation for figure numbers in HESS. I am concerned that the figures are too redundant in the paper, and some of them are unnecessary or can be merged. Meanwhile, all the figure quality should be improved. For example, Figures 6-7 can be reorganized as 2*3 rather than 2*4. Figures 9-11 is hard to figure out the texts inside. The lower subplots in Figure 13 are incomplete. Figures 15 and 16 are not necessary to be shown as Table 2 and 3 has depicted the same information.
  Response:
  Thank you for the comment, we reorganized the figures and removed redundant figures. For this, Figures 6-7 were reorganized as 2*3 rather than 2*4. Figures 9-11 were re-plotted so that the text inside is bigger and easier to read. The Lower subplots in Figure 13 are already there but the figure is a combination of two plots, so the lower part is shifted to the next page (This problem is solved in Figures 13 and 14). Figures 15 and 16 were deleted as the provided Tables 2 and 3 contain the same information.
  4. Generally, sections 3.4.7 and 3.4.8 are only straightforward descriptions of results. I don’t see any further physical explanations. For example, why the Model 4 is not the best model although it incorporates all the relevant inputs? Why do Regions 2 and 3 gain more improvements than Region 1 though the cross-correlation is higher between tp and t2m in this region? I think the authors should try to explain the AI model and the results rather than a simple description. This is the way to accelerate AI development and to give guidelines for subsequent research.
  Response:
  Thank you for the comment, we have updated the paper accordingly.
  Now the good performance of CMs is commonly attributed to the statistical reduction of variance when multiple models are combined.
  We also added a short text to improve the explanation. For the RPSS results in region 2 and region 3, and why they are better than in region 1 for the baseline analysis and the ML models despite the high cross-correlation between the tp and t2m in region 1. The committee models capture easier nonlinearities, since combining various possible models, that in the single ML algorithms is often quite difficult. Due to the high number of parameters in one model, it is the case that it is more difficult to fit a linear relationship, as per the correlated variables.
  We highlighted that the additional inputs in model 3 and model 4 didn’t substantially increase the models’ performance, they either slightly increase or decrease the performance, due to the noise these inputs may bring to the ML model (some inputs are not part of the forecast, and therefore they add noise), especially, when these inputs are perturbed members, which makes it difficult to ML models to learn from it. The paper shows that when the CM was tested, the model with the least inputs (Model 1) showed better results.
  Specific Comments:
  1. Line 221: could you explain why there is a fluctuation when the gamma approaches the sill?
  Response:
  Thank you for the comment. The plot that is provided is an experimental semivariogram, the plot shows fluctuations at very large distance lags. These fluctuations show very small negative and positive correlations and don’t indicate any significant spatial trend. This explanation is added to the manuscript (section 3.1).
  2. Lines 275-279: It is not clear how the 0.67 and 0.33 quantiles are defined for ECMWF hindcasts. It seems that it is not defined for each member as there is only one line in Figure 14.
  Response:
  Thank you for the comment. The 0.67 and 0.33 quantiles were calculated for each week of the year using the biweekly distribution of the (2000 – 2019) CPC observations. This explanation is also added to the manuscript (section 2.4).
  3. It is reasonable to assess model performance in selected regions. According to Figure 12, at least 9 cells were investigated for each region. It would be great if the spatial results are present in the manuscript.
  Response:
  Thank you for the comment. The evaluation was based on the average RPSS value over the region (averaging the RPSS was also followed in the AI competition), We agree that it would be great to present the spatial results, but this would make the comparison between the models a bit harder. In addition, this would take a lot of time and it would lead us to modify the code and run it all again.
  4. Contents in sections 3.4.1-3.4.6 are more like methods, but the authors put them in the results section. It would be more appropriate to include them in Section 2.5.
  Response:
  Thank you for the comment. The contents in sections 3.4.1-3.4.6 were included in section 2.5.
  5. Line 344: I don’t see the reason why the authors select MLP to construct the committee model. Please add the reasons.
  Response:
  The MLP is used because it showed slightly better results when the individual ML models were used, but still, we recommended a comprehensive study of the committee model using other state-of-the-art DL techniques instead of the MLP. The reason is added to the manuscript (section 2.5.6).
  
  Citation: https://doi.org/10.5194/hess-2022-348-AC1
RC2:
'Comment on hess-2022-348', Anonymous Referee #2, 03 Jan 2023
This paper presents interesting technical work on the topic of improving the ECMWF S2S forecasts with machine learning techniques. However, there is ample evidence that authors have not even tried to turn the technical work into a scientific paper (where the technical work is presented insisting on novel aspects):

The introduction is a half page long, with most paragraphs made of a single sentence, and presents the context (deliverable of a European project) in which the work was made.

No attempt is made at exploring the state-of-the-art in using machine learning techniques for improving hydroclimatic forecasts in general or S2S forecasts in particular.

Similarly, conclusions are those of a technical report, with no attempt at explaining what it brings to the state of the art.

What is more, authors claim to contribute to the WMO Prize Challenge to Improve Subseasonal to Seasonal Predictions Using Artificial Intelligence, the winners of which were announced in February 2022, but they do not benchmark themselves against the challenge winners, nor do they explain why they should not benchmark themselves against the challenge winners (source for the information on the WMO challenge: https://journals.ametsoc.org/view/journals/bams/103/12/BAMS-D-22-0046.1.xml. While that paper was published after the article’s submission, WMO challenge results were already known by then.)

Therefore, even though the technical work is interesting, the lack of engagement with even trying to demonstrate a contribution to the state of the art means this paper cannot be accepted. It is even surprising it was not desk rejected by the editorial team. Asking a reviewer to provide a substantive review in this context is tantamount to asking them to do the authors’ work of stating what the contribution is. Note this is very different from asking reviewers to evaluate the authors’ attempt at situating their contribution: I am usually very happy to do that, and to suggest improvements when needed.

I also note that authors have no excuse there: their funding and affiliations reveal they are no outsiders to the academic world.

For these reasons, I do not believe it is appropriate to provide comments beyond suggesting that the paper should not be considered further until and unless authors do their work and explain how their technical work contributes to the field.
Citation: https://doi.org/10.5194/hess-2022-348-RC2
- AC2: 'Reply on RC2', Mohamed Elbasheer, 03 Feb 2023
  
  Thank you for the time taken to comment on the paper. We have expanded the introduction to better highlight the scientific contribution. We believe that some journals call for shorter introductions, limited to providing the main motivation and scientific context of the work; the introduction is not meant to be a literature review, nor to be the scientific contribution of the work. We do have a ten-page introduction with a review and a more extensive discussion of the motivation, but we believe that this work is self-descriptive, and we have therefore extended the introduction while keeping it as precise and to the point as possible.
  We have added the suggested citation, and we agree that it is an important reference in the context of the challenge. It is worth mentioning that, at the time of submission, this reference paper was either not yet published or, to the best of our knowledge, not yet known.
  
  Citation: https://doi.org/10.5194/hess-2022-348-AC2


Data sets

S2S AI challenge template Aaron Spring, Andrew Robertson, Florian Pinault, Frederic Vitart, Roc Roskar, and Tasko Olevski https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template




Short summary

In this research, we explored the use of machine learning (ML) to improve the ECMWF S2S ensemble precipitation forecast. Different approaches were used as exploratory experiments to see which one best improves the ensemble probabilistic forecast. We conclude that the committee model (CM) concept is a promising approach that can be further studied and evaluated using different combinations of state-of-the-art ML techniques.



Cited

2 citations as recorded by crossref.