Signature and sensitivity-based comparison of conceptual and process oriented models, GR4H, MARINE and SMASH, on French Mediterranean flash floods

Haruna, Abubakar; Garambois, Pierre-Andre; Roux, Helene; Javelle, Pierre; Jay-Allemand, Maxime

doi:10.5194/hess-2021-414

Preprints

https://doi.org/10.5194/hess-2021-414

27 Oct 2021

Status: this discussion paper is a preprint. It has been under review for the journal Hydrology and Earth System Sciences (HESS). The manuscript was not accepted for further review after discussion.

Abubakar Haruna, Pierre-Andre Garambois, Helene Roux, Pierre Javelle, and Maxime Jay-Allemand

Abstract. Improving the flood forecasting ability of models is a key issue in hydrology, particularly in Mediterranean catchments that are subject to strong convective events. This contribution compared models of different complexities: the lumped GR4H, the continuous SMASH and the process-oriented MARINE. The objective was to understand how they simulate catchment's hydrological behavior, the differences in terms of their simulated discharge, the soil moisture, and how these can help to improve the relevance of the models. The study was applied to two Mediterranean catchments in the South of France. The methodology involved global sensitivity analysis, investigations of the response surface, calibration and validation, signature comparison at event scale, and comparison of the simulated soil moisture with the outputs of the surface model SIM. The results revealed contrasting and catchment-specific parameter sensitivities to the same efficiency measure, and equifinality issues are highlighted via response surface plots. All models are more sensitive to transfer parameters on the Gardon and to production parameters on the Ardeche. The exchange parameter controlling a non-conservative flow component of GR4H is found to be sensitive. All models had good calibration efficiencies, with MARINE having the highest and GR4H being more robust in validation. At the event scale, indices of discharge showed that the event-based MARINE was better at reproducing the peak and its timing, followed by SMASH, with GR4H performing worst in this respect. SMASH performed relatively better on the volume of water exported, followed by GR4H. Regarding the soil moisture simulated by the three models, and using the outputs of the operational surface model SIM as the benchmark, MARINE emerged as the most accurate in terms of both dynamics and amplitude. GR4H followed closely, while SMASH was the least accurate.
This study paves the way for extended model hypothesis and calibration-regionalization methods testing and intercomparison in the light of multi-sourced signatures in order to assess/discriminate internal model behaviors. It highlights, in particular, the need for future investigations on combinations of vertical and lateral flow components, including groundwater exchanges, in distributed hydrological models along with new optimization methods for optimally exploiting, at the regional scale, multi-source datasets composed of both physiographic data and hydrological signatures.

Received: 03 Aug 2021 – Discussion started: 27 Oct 2021

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1:
'Comment on hess-2021-414', Anonymous Referee #1, 05 Jan 2022

General Comments:

Abubakar et al. present their work on comparing three diverse hydrological models regarding their capability to model flash floods. The comparison of the lumped, conceptual GR4H model, the distributed, conceptual SMASH model and the process-oriented, event-based MARINE model is based on a sensitivity analysis for each model, a performance comparison and a soil moisture comparison of the model states with the modelled SAFRAN-ISBA-MODCOU (SIM) soil moisture predictions.

In my perception, the manuscript is currently lacking a concise story line and presentation of the study goals and relevant outcomes. There is a lot of information the authors try to convey, which makes it hard to grasp the core of the study. A few more tables should help to achieve a clearer presentation of all the very individual characteristics of the compared models and their setups. This will be necessary to understand and conduct a final evaluation of their acquired results. I feel that not all of the information presented is valuable for the goal of the paper, and some could be moved to the appendix to make the manuscript an easier read. While the results are elaborately presented, I am missing some depth in the analysis/discussion and a substantial conclusion. I find this study very specific (case study) with only limited insights into the question they set out to answer (which model brings which benefits/challenges when modelling flash floods?). I even feel the authors are missing a chance to generalize their results to offer some insights or advice on modelling flash floods with one or all of the three used models.

Thus, I advise a thorough and extensive review with several iterations between the co-authors (1st author & supervisors) before resubmitting this work in a format that makes it easier for the reader to identify the goal and relevant outcomes of this study.

Specific Comments:

Structure: The formulated goal of the paper (“the objective was to understand how [the 3 models] simulate catchment’s hydrological behavior, the differences in terms of their simulated discharge, the soil moisture, and how these can help to improve the relevance of the models”) should be matched with the specific analyses that were conducted to answer these questions and with the conclusions that result out of these analyses. These connections should then be formulated very clearly in the paper and parts that don’t contribute to answering the questions should be removed, or the part they play be made very clear.

Introduction: I would recommend adding a paragraph on previous model comparison studies and their challenges in order to better evaluate how comparable the results acquired in this study actually are or at least what the challenges might be. Especially since the authors refer to their work as an intercomparison study (Line 158). Some references to start out with might be:

Refsgaard, J. C., & Knudsen, J. (1996). Operational Validation and Intercomparison of Different Types of Hydrological Models. Water Resources Research, 32(7), 2189–2202. https://doi.org/10.1029/96WR00896

Butts, M. B., Payne, J. T., Kristensen, M., & Madsen, H. (2004). An evaluation of the impact of model structure on hydrological modelling uncertainty for streamflow simulation. Journal of Hydrology, 298(1–4), 242–266. https://doi.org/10.1016/j.jhydrol.2004.03.042

Clark, M. P., Kavetski, D., & Fenicia, F. (2011). Pursuing the method of multiple working hypotheses for hydrological modeling. Water Resources Research, 47(9), 1–16. https://doi.org/10.1029/2010WR009827

Fenicia, F., Kavetski, D., & Savenije, H. H. G. (2011). Elements of a flexible approach for conceptual hydrological modeling: 1. Motivation and theoretical development. Water Resources Research, 47(11), 1–13. https://doi.org/10.1029/2010WR010174

Orth, R., Staudinger, M., Seneviratne, S. I., Seibert, J., & Zappa, M. (2015). Does model performance improve with complexity? A case study with three hydrological models. Journal of Hydrology, 523, 147–159. https://doi.org/10.1016/j.jhydrol.2015.01.044

Models, Tools, and Data:

Regarding the SIM model, please describe why it is feasible to use it as a benchmark for the modelled soil moisture (Is model vs. model really a good idea? Why not satellite data as a benchmark?). Please describe the differences between SIM1 and SIM2 and why you chose SIM1 to initialize MARINE and SIM2 as the benchmark. Why is it okay to compare to the daily SIM product when the models run on hourly time steps?

The authors compare very diverse model structures and I believe it would be helpful to the reader to see a somewhat aggregated version of the main differences of the model setups and their data requirements. E.g. it might be helpful to see the differences in the input data in a table format, as there is a large difference between the input for the lumped GR4H (P, PET, Q) and the distributed, process-based MARINE (landuse, soilmaps etc). The table could also be used to show the main differences of the models themselves (lumped vs process-based etc., delta t and delta x etc.) at a glance. Please include delta t, delta x and also the number of calibrated parameters for all models of the study. This is a long paper and you want to make it as easy as possible for the reader so they don’t lose interest.

Why are delta x and delta t different for MARINE and how do you think this influences your results? Please justify why different delta x and delta t are used for the models.

Please also provide a table for a quick comparison of the 2 catchments. It should include the same information for both catchments. This is currently not the case in the text. E.g. average slope is only given for Gardon. Also “a lot of intense rainfalls” could be backed up by some climate data statistics in the table. At least mention the average precipitation input for each catchment.

Input Data: Is there any information on the quality of the radar observation reanalysis data? Radar data is known to underestimate especially heavy rainfalls of short duration, which are important for flash flood generation. Could this be a problem for the current study? Also, it seems the precipitation input of the SIM model is different from that of the tested models. How could this influence the results, especially as SIM is used as a benchmark for soil moisture comparison? What are the implications?

Sensitivity Analysis:

While I agree the sensitivity analysis is necessary for better understanding the model results later on I currently fail to see how we can learn anything from the comparison of the model parameters here as they are different for each model (are differently implemented in the model structures). You might want to state which parameters are comparable and why you believe they are comparable. Please clarify in the manuscript.

The tables with the sensitivity ranks of the parameters (Table 6 + 7) are currently given without much description or discussion before the section starts. They should be included in the subsections when they are referred to or removed to the Appendix.

If a comparison of the models is the goal, I find it essential to analyze the event-based sensitivity for SMASH and GR4H as well, as was done for MARINE. Otherwise, it’s really hard to attempt a fair comparison here.

Table 3 - Why is the number of classes so much higher for Gardon than for Ardeche?

Line 285 – Why only 5000 runs if the other models had 10000 for their SA? Please justify.

Currently the SA seems to tell us which parameters are sensitive in which model/catchment and we see some expected differences. Due to the small sample we can’t conclude anything general though and thus I currently fail to see the benefit for comparing the models with regard to flash flood modelling. Especially since the comparability of the sensitivities lacks some justification. As the paper is very long, I would advise to move the SA to the Appendix and focus on the actual comparison of the models. Section 4.1.4 may be kept as the main outcome of the SA but doesn’t require all the plots and descriptions in the main text. Unless the authors can clarify the benefit and insights we gain from the results.

Calibration:

The section on the response surface and functioning points is interesting but does not seem to immediately add to the point of the paper. Consider moving to the appendix.

How were the events separated into 2 periods and how is this justified? Why are there only 2 events in one period for Gardon?

Is Table 8 showing the mean from the unmasked calibration and then the STD from the masked calibration? Please clarify. Are we only looking at the masked calibration results for the rest of the paper? Please indicate more precisely.

Figure 11 does not seem to be mentioned or described anywhere even though it clearly indicates that GR4H and SMASH have a problem in modelling the high discharges for Ardeche, which is quite relevant for this study. Please comment on this in the manuscript.

Section 4.4.3: I don’t understand what was done here and what Figure 18 is supposed to tell us. What is the y-axis “change in available storage”? Is this in percent? What does Figure 11 tell us and why is it relevant?

I feel it is a rather large drawback of this study that all models have a rather different calibration routine. What’s the take of the authors on this?

Conclusions:

There are no general conclusions after the description of the results they acquired. Are there cases when SMASH is the better choice or when MARINE is? What do we actually learn from these results that is of relevance to people attempting future work with these models in a flash flood context?

The authors conclude from their results “The difference in the model performances could stem from differences in the levels of complexity of the models, the processes described and the constrains of the models, and thus highlights the need for future improvements in the models and calibration methods.” – 1.) Differences in performance are very much expected when testing three such diverse models, so what did your study contribute? Please describe either HOW they perform differently, WHY they perform differently, or what new insight follows from the fact that they perform differently. 2.) Why does that highlight the need for improvement of the models? Which weaknesses were identified that need to be improved? Should a lumped conceptual model and a distributed process-based model perform identically?

The authors state in line 465f that “MARINE has its efficiency in validation decreased by around 25%, while SMASH and GR4H have a decrease of 5.2% and 4.8% respectively.” The conclusions read rather as if MARINE comes off pretty well. I feel that the results need to be contextualized a little more and general advice be given. What did you learn from your model comparison study that may be of benefit for a reader?
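To make the quoted percentages easier to interpret, here is a minimal sketch of how a decrease in efficiency between calibration and validation could be computed. It assumes the figures are relative decreases; the manuscript may define them differently. The function name and the example efficiencies are hypothetical and are not taken from the paper.

```python
def relative_decrease(calibration_score, validation_score):
    """Relative loss of efficiency from calibration to validation, in percent."""
    if calibration_score == 0:
        raise ValueError("calibration score must be non-zero")
    return 100.0 * (calibration_score - validation_score) / calibration_score


# Hypothetical efficiencies, chosen only to illustrate the arithmetic
example = relative_decrease(0.80, 0.60)
print(f"{example:.1f} %")
```

Reporting the calibration and validation scores alongside the percentage would let a reader see whether a large decrease still leaves an acceptable validation efficiency.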

There are a lot of unnecessary relative terms in this paper. Try to be more precise! (e.g. line 405 “somehow similar conclusions” or line 440 “relative robustness”)

Minor/Technical Comments:

There were a few terms that felt unfamiliar to me, e.g. “flow operator” (especially used in 2.1). I would advise using terms more common in the literature, such as process description or algorithm. Also, I wouldn’t use “numerical experiments” for a normal methodology consisting of calibration-validation and model comparison.

Line 9f - If the catchment names are used it should be mentioned that these are catchment names somehow. Otherwise, it’s a little confusing.

Line 29 – potentially use “internal states and fluxes”

Line 41 – another suitable citation in that context would be Bouaziz et al. (2021)

Line 85 – potentially use “as well as” instead of “and next”

Line 91 – please add reference where in the paper this is analyzed

Line 122 – is there a reference for the “Michel calibration algorithm” you can add?

Line 125 – it looks as if the authors did not describe which warm-up period they are using to avoid the effects on the results they mentioned

Line 164 – is detailed after – please refer to the location of the paper where this is detailed

Line 166 – MARINE is an event-based, physically based, […]; add based

Line 184 - is detailed after – please refer to the location of the paper where this is detailed

Line 259 – KS test: abbreviation is only defined in appendix, so please do so here as well

Line 268 – Section B should be Appendix B

Line 292 – as shown after – please refer to the location of the paper where this is detailed

Line 294 – “dividing the data into two”: two what? Please specify by adding time periods or similar.

Line 295 – The sentence is hard to read. Maybe change to something along the lines of: “A time series of 13 years at hourly time step is considered and divided into two sub-periods of 7 years each for calibration and validation. Period 1 is defined from …”

Line 296 – Why are Period 1 and 2 overlapping by a year?

Line 300 – does this mean you concatenated the single events to form a consecutive time period of events which you then split for calibration? If so, this is not yet clear from your text. Please clarify.

Line 304f – “These experiments are designed to compare 3 models at flash flood modelling […]” should appear much more prominently and earlier in this paper!

Line 310 – which aggregation are you comparing? The catchment-size aggregation? Please clarify. Also, this should already be mentioned around line 210, so that questions about the different resolutions and their comparability don’t arise.

Line 315 – how do the events compare relating to their number of peaks, gradients in limbs and precipitation patterns? It’s not specified in table 5. Please specify!

Line 337 – gives the delay in hours? Please specify.

Line 338 - why is this more rigorous in terms of safety? Please elaborate.

Line 421 – “a few parameters are stuck at the bound”. For SMASH it seems to be most parameters that are stuck at the bound. What does this tell us?

Line 483 – For which time period are the signatures calculated? Only for the event time period? Please specify.

Figure 2 – I find the map could use a scale and north arrow to really allow for the term map.

Figure 3 – Maybe add a line for your divide between behavioral and non-behavioral runs at NSE=0.7 in the 1st row of plots

Figure 14 – the black cross is missing in the legend. Why does the header for CR look different?
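The behavioral/non-behavioral divide at NSE = 0.7 mentioned in the Figure 3 comment can be illustrated with a minimal sketch. It uses the standard Nash-Sutcliffe efficiency; the function names, the toy discharge values and the choice of an inclusive threshold (NSE >= 0.7) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / variance


def split_behavioral(observed, runs, threshold=0.7):
    """Score each simulated series and flag runs at or above the threshold."""
    scores = np.array([nse(observed, run) for run in runs])
    return scores, scores >= threshold


# Toy data (hypothetical): one run close to the observations, one equal to their mean
observed = np.array([1.0, 3.0, 8.0, 5.0, 2.0])
runs = [observed * 1.05, np.full(observed.size, observed.mean())]
scores, behavioral = split_behavioral(observed, runs)
print(scores, behavioral)
```

A run that simply reproduces the observed mean scores NSE = 0 and falls in the non-behavioral group, which is why a horizontal line at the threshold makes the split easy to read on a scatter plot.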

Citation: https://doi.org/10.5194/hess-2021-414-RC1
- AC1: 'Reply on RC1', Abubakar Haruna, 04 Feb 2022
  
  Dear Reviewer,
  We thank you very much for the detailed and constructive comments. Please find attached our reply.
  Best regards.
  
  Citation: https://doi.org/10.5194/hess-2021-414-AC1
RC2:
'Comment on hess-2021-414', Anonymous Referee #2, 12 Feb 2022

Review of the manuscript by Haruna Abubakar et al.

General comments:

The study explores the parameter sensitivity and event-based performance of three different models with respect to French Mediterranean flash floods. The authors did a lot of work, resulting in 18 figures, 11 tables and abundant appendix material, which, however, makes the main point neither condensed nor concise. Overall, the novelty of this study is not strong and has not been articulated.

Here are some major comments:

For distributed models, why use different spatial resolutions for MARINE (0.5 km) and SMASH (1 km)? Is it fair in terms of inter-model comparison? I suggest adding a table describing the resolutions of the data and models, in order to avoid confusion.

I also wonder how you account for the influence of hydrograph shape (single-peak or multi-peak, thin or flat hydrograph) on model performance.

How robust is it to set SIM2 as the benchmark for evaluating model performance on soil moisture?

What is the point of the soil moisture comparison? Is it possible that the peak flow is related to better soil moisture simulation? It would be better if the results sections were related to each other.

Throughout the text, it would be better to use the same order when describing the results of the three models (e.g. always in the order GR4H, MARINE and SMASH). Please also put figures and tables in the correct position. I would also welcome more analysis and discussion, and fewer repeated figures or tables.

Specific comments:

P8. L212. 1. Could you please describe the Mediterranean climate of these two catchments in more detail to support the selection of flood events? For example, is there snowmelt in spring that causes flood events with a return period higher than 2 years?

2. What are the differences and similarities between these two catchments? Why did you choose these two catchments? Is there any influence on model performance, given that they do not share a similar catchment shape?

P12. L295-302. Did you use GR4H and SMASH to simulate the long-term hourly runoff process and the event-based MARINE model to simulate selected events? If so, the performance evaluation is presented at the event scale. How do you account for the advantage of the event-based model MARINE?

P19. L406-407. This is an interesting finding. Is it because floods in these two catchments mainly depend on transfer or production components? Could it be related to certain catchment attributes?

Figure 2. A longitude and latitude grid is needed. It would be better if you could show the Mediterranean climate regions in this figure.

Figure 12. Please also include P in the legend.

Table 3. Is represents number of soil classes? Why does Cp differ so much between these two catchments?

Table 4. Is Gardon in the first row correct? Should it be replaced by range?

Table 5. 1. Please harmonize the flood event names for the two catchments.

2. As you introduced before, 1) heavy rainfall always appears in autumn (line 27) and 2) intense rainfalls occur in autumn and winter (line 223) under Mediterranean climate. So please give a brief explanation of why spring events can be selected under this kind of climate.

Citation: https://doi.org/10.5194/hess-2021-414-RC2
- AC3:
  'Reply on RC2', Abubakar Haruna, 03 Mar 2022
  Thank you for reviewing our work and for the constructive comments that will help us revise and improve this article. Some elements of answer to your main questions are provided here:
  
  A better presentation of the details of each model will clarify the reading, for example with respect to the models’ resolutions, which are on the order of the rainfall forcing grid resolution.
  
  Regarding simulated states and the comparison with those of the SIM model, as answered to reviewer 1, this choice is motivated by the fact that SIM is a well validated surface model with a rich description of soil atmosphere processes, applied on a wide spatio-temporal domain which ensures good data availability. Note that SIM runs at relatively fine time steps but we only used daily quantities for the sake of simplicity. Moreover, SIM1 is traditionally used to initialize MARINE, hence the choice is made to use SIM2 as benchmark, whose parameterization is more complex (e.g. more soil levels, which is not usable for MARINE initialization). Note that using satellite moisture data is a work on its own, already studied with MARINE in Eeckman (2020), and is left for further work. These points will be clarified and better discussed.
  
  Studying the relation between hydrograph shape and model performance is an interesting topic; this could be used to refine the analysis based on the presented results.
  
  We will be happy to revise our paper following your helpful comments and also to provide a detailed response letter.
  
  Citation: https://doi.org/10.5194/hess-2021-414-AC3
RC3:
'Comment on hess-2021-414', Anonymous Referee #3, 12 Feb 2022

The manuscript is well written and organized apart from the introduction. However, the following comments are provided to improve the manuscript.

Specific comments:

1. The text in Figures 1, 6, 12 and 15 should be more readable.

2. Too many figures and tables. Please keep ten figures and ten tables in the main manuscript and move the others to the supplementary material.

3. Figures and tables should be placed just after the paragraph in which they are first mentioned.

4. I don’t agree with the authors on Appendix B. Sensitivity analysis and uncertainty analysis are not the same thing.

5. I suggest performing an uncertainty analysis to present the results in a robust form.

6. Tables in the Appendix are not easily readable.

7. Lines 46 to 55 can be summarized in a table.

8. The novelty is not clear from the introduction.

9. The last paragraph of the Introduction does not seem to be required.

10. The objective is not at all clear from the introduction.

Citation: https://doi.org/10.5194/hess-2021-414-RC3
- AC2: 'Reply on RC3', Abubakar Haruna, 03 Mar 2022
  
  Thank you for your positive evaluation of our work and for the comments provided in order to improve it. We will use those to improve the manuscript and in particular the introduction as requested.
  
  Citation: https://doi.org/10.5194/hess-2021-414-AC2

Status: closed

RC1:
'Comment on hess-2021-414', Anonymous Referee #1, 05 Jan 2022

General Comments:

Abubakar et al. are presenting their work on comparing three divers hydrological models regarding their capability to model flash floods. The comparison of the lumped, conceptual GR4H model, the distributed, conceptual SMASH model and the process-oriented, event-based MARINE model is based on a sensitivity analysis for each model, a performance comparison and a soil moisture comparison of the model states with the modelled SAFRAN-ISBA-MODCOU (SIM) soil moisture predictions.

In my perception, the manuscript is currently lacking a concise story line and presentation of the study goals and relevant outcomes. There is a lot of information the authors try to convey which makes it hard to grasp the core of the study. A few more tables should help to achieve a clearer presentation of all the very individual characteristics of the compared models and their setups. This will be necessary to understand and conduct a final evaluation of their acquired results. I feel that not all information presented are valuable for the goal of the paper and could be moved to the appendix to make the manuscript an easier read. While the results are elaborately presented, I am missing some depth in the analysis/discussion and a substantial conclusion. I find this study very specific (case study) with only limited insights into the question they set out to answer (which model brings which benefits/challenges when modelling flash floods?). I even feel the authors are missing a chance to generalize their results to offer some insights or advice on modelling flash floods with one or all of the three used models.

Thus, I advise on a thorough and extensive review with several iterations between the co-authors (1^st author & supervisors) before resubmitting this work in a format that makes it easier for the reader to identify the goal and relevant outcomes of this study.

Specific Comments:

Structure: The formulated goal of the paper (“the objective was to understand how [the 3 models] simulate catchment’s hydrological behavior, the differences in terms of their simulated discharge, the soil moisture, and how these can help to improve the relevance of the models”) should be matched with the specific analyses that were conducted to answer these questions and with the conclusions that result out of these analyses. These connections should then be formulated very clearly in the paper and parts that don’t contribute to answering the questions should be removed, or the part they play be made very clear.

Introduction: I would recommend adding a paragraph on previous model comparison studies and their challenges in order to better evaluate how comparable the results acquired in this study actually are or at least what the challenges might be. Especially since the authors refer to their work as an intercomparison study (Line 158). Some references to start out with might be:

Refsgaard, J. C., & Knudsen, J. (1996). Operational Validation and Intercomparison of Different Types of Hydrological Models. Water Resources Research, 32(7), 2189–2202. https://doi.org/10.1029/96WR00896

Butts, M. B., Payne, J. T., Kristensen, M., & Madsen, H. (2004). An evaluation of the impact of model structure on hydrological modelling uncertainty for streamflow simulation. Journal of Hydrology, 298(1–4), 242–266. https://doi.org/10.1016/j.jhydrol.2004.03.042

Clark, M. P., Kavetski, D., & Fenicia, F. (2011). Pursuing the method of multiple working hypotheses for hydrological modeling. Water Resources Research, 47(9), 1–16. https://doi.org/10.1029/2010WR009827

Fenicia, F., Kavetski, D., & Savenije, H. H. G. (2011). Elements of a flexible approach for conceptual hydrological modeling: 1. Motivation and theoretical development. Water Resources Research, 47(11), 1–13. https://doi.org/10.1029/2010WR010174

Orth, R., Staudinger, M., Seneviratne, S. I., Seibert, J., & Zappa, M. (2015). Does model performance improve with complexity? A case study with three hydrological models. Journal of Hydrology, 523, 147–159. https://doi.org/10.1016/j.jhydrol.2015.01.044

Models, Tools, and Data:

Regarding the SIM model please describe why it is feasible to use this as a benchmark for the modelled soil moisture (Is model VS model really a good idea? Why not satellite data as benchmark?) Please describe the differences between SIM1 and SIM2 and why you chose SIM 1 to initialize MARINE and SIM2 as the benchmark. Why is it okay to compare to the daily SIM product when the models run on hourly time steps?

The authors compare very diverse model structures and I believe it would be helpful to the reader to see a somewhat aggregated version of the main differences of the model setups and their data requirements. E.g. it might be helpful to see the differences in the input data in a table format. As there is a large difference between the input for the lumped GR4H (P, PET, Q) and the distributed, process-based MARINE (landuse, soilmaps etc). The table could also be used to show the main differences of the models itself (lumped vs process-based etc., delta t and delta x etc.) in one glance. Please include delta t, delta x and also the number of calibrated parameters for all models of the study. This is a long paper and you want to make it as easy as possible for the reader so they don’t lose interest.

Why is delta x and delta t different for MARINE and how do you think this influences your results? Please justify why different delta x and delta t are used for the models.

Please also provide a table for a quick comparison of the 2 catchments. It should include the same information for both catchments. This is currently not the case in the text. E.g. average slope is only given for Gardon. Also “a lot of intense rainfalls” could be backed up by some climate data statistics in the table. At least mention the average precipitation input for each catchment.

Input Data: Is there any information on the quality of the radar observation reanalysis data? Radar data is known to underestimate especially heavy rainfalls of short duration which are important for flash flood generation. Could this be a problem for the current study? Also it seems the precipitation input of the SIM model is different to the tested models. How could this influence the results, especially as it is used as a benchmark for soil moisture comparison. What are the implications?

Sensitivity Analysis:

While I agree that the sensitivity analysis is necessary for better understanding the model results later on, I currently fail to see what we can learn from comparing the model parameters here, as they differ for each model (they are implemented differently in the model structures). You might want to state which parameters are comparable and why you believe they are comparable. Please clarify in the manuscript.

The tables with the sensitivity ranks of the parameters (Tables 6 and 7) are currently given without much description or discussion before the section starts. They should be included in the subsections where they are referred to, or moved to the Appendix.

If a comparison of the models is the goal, I find it essential to analyze the event-based sensitivity for SMASH and GR4H as well, as was done for MARINE. Otherwise, it’s really hard to attempt a fair comparison here.

Table 3 - Why is the number of classes so much higher for Gardon than for Ardeche?

Line 285 – Why only 5000 runs if the other models had 10000 for their SA? Please justify.

Currently the SA seems to tell us which parameters are sensitive in which model/catchment and we see some expected differences. Due to the small sample we can’t conclude anything general though and thus I currently fail to see the benefit for comparing the models with regard to flash flood modelling. Especially since the comparability of the sensitivities lacks some justification. As the paper is very long, I would advise to move the SA to the Appendix and focus on the actual comparison of the models. Section 4.1.4 may be kept as the main outcome of the SA but doesn’t require all the plots and descriptions in the main text. Unless the authors can clarify the benefit and insights we gain from the results.

Calibration:

The section on the response surface and functioning points is interesting but does not seem to immediately add to the point of the paper. Consider moving to the appendix.

How were the events separated into 2 periods and how is this justified? Why are there only 2 events in one of the periods for the Gardon?

Is Table 8 showing the mean from the unmasked calibration and then the STD from the masked calibration? Please clarify. Are we only looking at the masked calibration results for the rest of the paper? Please indicate more precisely.

Figure 11 does not seem to be mentioned or described anywhere even though it clearly indicates that GR4H and SMASH have a problem in modelling the high discharges for Ardeche, which is quite relevant for this study. Please comment on this in the manuscript.

Section 4.4.3 I don’t understand what was done here and what Figure 18 is supposed to tell us. What is the y-axis “change in available storage”? Is this in percent? What does Figure 11 tell us and why is it relevant?

I feel it is a rather large drawback of this study that all models have a rather different calibration routine. What’s the take of the authors on this?

Conclusions:

There are no general conclusions after the description of the results they acquired. Are there cases when SMASH is the better choice or when MARINE is? What do we actually learn from these results that is of relevance to people attempting future work with these models in a flash flood context?

The authors conclude from their results “The difference in the model performances could stem from differences in the levels of complexity of the models, the processes described and the constrains of the models, and thus highlights the need for future improvements in the models and calibration methods.” – 1.) It is very much expected that performance differs when testing three such diverse models, so what did your study contribute? Please describe either HOW they perform differently, WHY they perform differently, or what new insight follows from the fact that they perform differently. 2.) Why does that highlight the need for improvement of the models? Which weaknesses were identified that need to be improved? Should a lumped conceptual and a distributed process-based model perform identically?

The authors state in line 465f that “MARINE has its efficiency in validation decreased by around 25%, while SMASH and GR4H have a decrease of 5.2% and 4.8% respectively.” The conclusions read rather as if MARINE comes off pretty well. I feel that the results need to be contextualized a little more and general advice be given. What did you learn from your model comparison study that may be of benefit for a reader?
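
To make robustness figures of this kind concrete, the relative loss of efficiency between calibration and validation can be computed as in the minimal sketch below. It assumes that the efficiency measure is the Nash-Sutcliffe efficiency (NSE) and that the percentage is taken relative to the calibration score; the function names are hypothetical and the numbers in the usage note are illustrative, not the authors' values.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared simulation
    error to the squared deviation of the observations from their mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def relative_decrease(cal_score, val_score):
    """Percentage loss of efficiency from calibration to validation,
    expressed relative to the calibration score (an assumed convention)."""
    return 100.0 * (cal_score - val_score) / cal_score
```

For example, a hypothetical calibration NSE of 0.8 that drops to 0.6 in validation corresponds to a relative decrease of about 25%, whereas the same absolute drop from a lower calibration score would give a larger percentage. Reporting both the absolute scores and the relative decrease therefore avoids ambiguity.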

There are a lot of unnecessary relative terms in this paper. Try to be more precise! (e.g. line 405 “somehow similar conclusions” or line 440 “relative robustness”)

Minor/Technical Comments:

A few terms felt unfamiliar to me, e.g. “flow operator” (used especially in 2.1) for a process description, algorithm, etc. I would advise using terms that are more common in the literature, such as process description or algorithm. Also, I wouldn’t use “numerical experiments” for a normal methodology consisting of calibration-validation and model comparison.

Line 9f - If the catchment names are used it should be mentioned that these are catchment names somehow. Otherwise, it’s a little confusing.

Line 29 – potentially use “internal states and fluxes”

Line 41 – another suitable citation in that context would be Bouaziz et al. (2021)

Line 85 – potentially use “as well as” instead of “and next”

Line 91 – please add reference where in the paper this is analyzed

Line 122 – is there a reference for the “Michel calibration algorithm” you can add?

Line 125 – it looks as if the authors did not describe which warm-up period they use to avoid the effects on the results that they mention

Line 164 – “is detailed after”: please refer to the location in the paper where this is detailed

Line 166 – MARINE is an event-based, physically based, […]; add based

Line 184 – “is detailed after”: please refer to the location in the paper where this is detailed

Line 259 – KS test: abbreviation is only defined in appendix, so please do so here as well

Line 268 – Section B should be Appendix B

Line 292 – “as shown after”: please refer to the location in the paper where this is shown

Line 294 – “dividing the data into two”: two what? Please specify by adding time periods or similar.

Line 295 – The sentence is hard to read. Maybe change to something along the lines of: “A time series of 13 years at hourly time step is considered and divided into two sub-periods of 7 years each for calibration and validation. Period 1 is defined from …”

Line 296 – Why are Period 1 and 2 overlapping by a year?

Line 300 – does this mean you concatenated the single events to form a consecutive time period of events which you then split for calibration? If so, this is not yet clear from your text. Please clarify.

Line 304f – “These experiments are designed to compare 3 models at flash flood modelling […]” should appear much more prominently and earlier in this paper!

Line 310 – which aggregation are you comparing? The catchment-size aggregation? Please clarify. Also, this should already be mentioned around line 210, so that questions about the different resolutions and their comparability don’t arise.

Line 315 – how do the events compare relating to their number of peaks, gradients in limbs and precipitation patterns? It’s not specified in table 5. Please specify!

Line 337 – gives the delay in hours? Please specify.

Line 338 - why is this more rigorous in terms of safety? Please elaborate.

Line 421 – “a few parameters are stuck at the bound”. For SMASH it seems to be most parameters that are stuck at the bound. What does this tell us?

Line 483 – For which time period are the signatures calculated? Only for the event time period? Please specify.

Figure 2 – I find the map could use a scale and north arrow to really allow for the term map.

Figure 3 – Maybe add a line marking your divide between behavioral and non-behavioral runs at NSE=0.7 in the first row of plots

Figure 14 – the black cross is missing in the legend. Why does the header for CR look different?
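
The behavioral/non-behavioral divide raised for Figure 3 and the KS test raised for line 259 can be illustrated with a minimal sketch. It assumes a regional-sensitivity-style analysis in which runs with NSE at or above 0.7 count as behavioral, and in which the two-sample Kolmogorov-Smirnov statistic measures how far apart the parameter distributions of the two groups are. The function names and the data layout are hypothetical and are not taken from the manuscript.

```python
def split_behavioral(runs, threshold=0.7):
    """Split Monte Carlo runs into behavioral and non-behavioral groups.

    runs: list of (parameter_value, nse_score) pairs for one parameter.
    Returns two lists of parameter values.
    """
    behavioral = [p for p, score in runs if score >= threshold]
    non_behavioral = [p for p, score in runs if score < threshold]
    return behavioral, non_behavioral


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical cumulative distribution functions."""
    largest_gap = 0.0
    for x in list(sample_a) + list(sample_b):
        cdf_a = sum(1 for v in sample_a if v <= x) / len(sample_a)
        cdf_b = sum(1 for v in sample_b if v <= x) / len(sample_b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

A statistic close to 1 means that behavioral runs occupy a clearly different part of the parameter range, which suggests a sensitive parameter; a value close to 0 suggests the parameter has little influence on whether a run is behavioral. Both groups must be non-empty for the statistic to be defined.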

Citation: https://doi.org/10.5194/hess-2021-414-RC1
- AC1: 'Reply on RC1', Abubakar Haruna, 04 Feb 2022
  
  Dear Reviewer,
  We thank you very much for the detailed and constructive comments. Please find attached our reply.
  Best regards.
  
  Citation: https://doi.org/10.5194/hess-2021-414-AC1
RC2:
'Comment on hess-2021-414', Anonymous Referee #2, 12 Feb 2022

Review of the manuscript by Haruna Abubakar et al.

General comments:

The study explores the parameter sensitivity and event-based performance of three different models for French Mediterranean flash floods. The authors did a lot of work, resulting in 18 figures, 11 tables and abundant appendix material; however, this makes the main point neither condensed nor concise. Overall, the novelty of this study is not strong and has not been articulated.

Here are some major comments:

For the distributed models, why are different spatial resolutions used for MARINE (0.5 km) and SMASH (1 km)? Is this fair in terms of inter-model comparison? I suggest adding a table describing the resolutions of the data and models to avoid confusion.

I also wonder how you account for the influence of hydrograph shape (single-peak or multi-peak, thin or flat hydrograph) on model performance.

How robust is it to set SIM2 as the benchmark for evaluating model performance on soil moisture?

What is the point of the soil moisture comparison? Is it possible that the peak flow performance is related to better soil moisture simulation? It would be better if the results sections could be related to each other.

Throughout the text, it would be better to use the same order when describing the results of the three models (e.g. always GR4H, MARINE, then SMASH). Please also put figures and tables in the correct position. I would also welcome more analysis and discussion, but fewer repeated figures or tables.

Specific comments:

P8. L212. 1. Could you please say more about the Mediterranean climate of these two catchments to support the selection of flood events? For example, is there snowmelt in spring that causes flood events with a return period higher than 2 years?

2. What are the differences and similarities between these two catchments? Why did you choose these two catchments? Is there any influence on model performance, since they do not share a similar catchment shape?

P12. L295-302. Did you use GR4H and SMASH to simulate the long-term hourly runoff process and the event-based MARINE model to simulate selected events? If so, the performance evaluation is presented at the event scale. How do you account for the advantage of the event-based model MARINE?

P19. L406-407. This is an interesting finding. Is it because the floods of these two catchments rely mainly on transfer or production components? Could it be related to certain catchment attributes?

Figure 2. A longitude and latitude grid is needed. It would be better if you could show Mediterranean climate regions in this figure.

Figure 12. Please also include P in the legend.

Table 3. Is represents number of soil classes? Why does Cp differ so much between these two catchments?

Table 4. Is Gardon in the first row correct? Should it be replaced by range?

Table 5. 1. Please use uniform flood event names for the two catchments.

2. As you stated before, 1) heavy rainfall always appears in autumn (line 27) and 2) intense rainfalls occur in autumn and winter (line 223) under the Mediterranean climate. So please briefly explain why spring events can be selected under this kind of climate.

Citation: https://doi.org/10.5194/hess-2021-414-RC2
- AC3: 'Reply on RC2', Abubakar Haruna, 03 Mar 2022
  
  Thank you for reviewing our work and for the constructive comments that will help us revise and improve this article. Some elements of answer to your main questions are provided here:
  
  A better presentation of the details of each model will clarify the reading, for example with respect to the models’ resolutions, which are on the order of the rainfall forcing grid resolution.
  
  Regarding simulated states and the comparison with those of the SIM model, as answered to reviewer 1, this choice is motivated by the fact that SIM is a well-validated surface model with a rich description of soil-atmosphere processes, applied on a wide spatio-temporal domain, which ensures good data availability. Note that SIM runs at relatively fine time steps, but we only used daily quantities for the sake of simplicity. Moreover, SIM1 is traditionally used to initialize MARINE, hence the choice to use SIM2 as benchmark, whose parameterization is more complex (e.g. more soil levels, which is not usable for MARINE initialization).
  
  Note that using satellite moisture data is a work on its own, already studied with MARINE in Eeckman (2020), and is left for further work. These points will be clarified and better discussed.
  
  Studying the relation between hydrograph shape and model performance is an interesting topic; this could be used to refine the analysis based on the presented results.
  
  We will be happy to revise our paper following your helpful comments and also to provide a detailed response letter.
  
  Citation: https://doi.org/10.5194/hess-2021-414-AC3
RC3:
'Comment on hess-2021-414', Anonymous Referee #3, 12 Feb 2022

The manuscript is well written and organized apart from the introduction. However, the following comments are provided to improve the manuscript.

Specific comments:

1. The text in Figures 1, 6, 12 and 15 should be more readable.

2. There are too many figures and tables. Keep ten figures and ten tables in the main manuscript and move the others to the supplementary material.

3. Figures and tables should be placed just after the paragraph in which they are first mentioned.

4. I don’t agree with the authors on Appendix B. Sensitivity analysis and uncertainty analysis are not the same thing.

5. I suggest performing an uncertainty analysis to present the results in a robust form.

6. Tables in the Appendix are not easily readable.

7. Lines 46 to 55 can be summarized in a table.

8. The novelty is not clear from the introduction.

9. The last paragraph of the Introduction does not seem to be required.

10. The objective is not at all clear from the introduction.

Citation: https://doi.org/10.5194/hess-2021-414-RC3
- AC2: 'Reply on RC3', Abubakar Haruna, 03 Mar 2022
  
  Thank you for your positive evaluation of our work and for the comments provided in order to improve it. We will use those to improve the manuscript and in particular the introduction as requested.
  
  Citation: https://doi.org/10.5194/hess-2021-414-AC2

Short summary

We compared three hydrological models in a flash flood modelling framework. We first identified the sensitive parameters of each model, then compared their performances in terms of outlet discharge and soil moisture simulation. We found that, because of differences in model complexity and process representation, performance depends on the aspect or measure used. The study highlights and proposes some future investigations and modifications to improve the models.


Total:	0
HTML:	0
PDF:	0
XML:	0