the Creative Commons Attribution 4.0 License.
Leveraging multi-variable observations to reduce and quantify the output uncertainty of a global hydrological model: evaluation of three ensemble-based approaches for the Mississippi River basin
Howlader Mohammad Mehedi Hasan
Kerstin Schulze
Helena Gerdener
Lara Börger
Somayeh Shadkam
Sebastian Ackermann
Seyed-Mohammad Hosseini-Moghari
Hannes Müller Schmied
Andreas Güntner
Jürgen Kusche
Download
- Final revised paper (published on 30 May 2024)
- Supplement to the final revised paper
- Preprint (discussion started on 11 May 2023)
- Supplement to the preprint
Interactive discussion
Status: closed
RC1: 'Comment on hess-2023-18', Anonymous Referee #1, 07 Jun 2023
The study by Doell et al. compares three different strategies to reduce parameter uncertainty for the global hydrological model WaterGap. The methods used are BORG, GLUE, and an ensemble Kalman filter, which the authors apply in a pilot study to the Mississippi basin. How we best estimate global water models is an interesting and relevant question to which the authors contribute. I do like the study and what the authors do and show, but I have some critical comments regarding how the work is currently presented and discussed. I outline my main comments below.
[0] The authors' use of sensitivity analysis is very nice and interesting, but the results are hardly discussed. I would have liked to see more detail on these results. For example, the precipitation multiplier is not selected as important. Interesting, given that this parameter is often very relevant. Is this due to the monthly time step? The authors study a huge domain. How did sensitivity to the parameters vary across this domain? A lot of insights are to be gained from this analysis, but they are not discussed. I think this would be worth including rather than some other parts, as suggested below.
[1] This is a very long paper with a lot of details on the model and the data that, at least to me as a reader, seems excessive and not needed to understand the main story presented. It makes reading the paper a bit tedious because most readers will not run WaterGap and they might not even be interested in the extensive background information on the data (as part of the main story).
For example, lines 500-508 discuss problems with the GRACE data and how others have gone about reducing them. Is this really something I need to know to follow the story? I think text like this can go into the supplemental material without reducing the strength of the story told. On the contrary, it would make it better because I do not have to read through this background information unless I want to.
Lines 466-508 discuss details of the GRACE data and their uncertainties in (excessive) detail. At the same time, the authors spent one sentence on stating that two studies considered streamflow errors of about 10%, while the next sentence states that this is maybe a possible average but the variability is very large. The authors spend over 60 lines discussing GRACE and 6 (ok 7) lines to discuss the other variable they use. I do not understand why the authors do not present a more balanced discussion given that both variables suffer from significant and potentially complex uncertainties.
[2] Starting from the back, i.e. the Outlook section, I wonder what transferrable knowledge the authors contribute that is unrelated to using WaterGap (and potentially the traditional approach to calibrating WaterGap)?
My impression is that most of the conclusions are rather specific to the use of WaterGap. I do not think that this is a problem per se, but it would be good if the authors would be clearer about general outcomes and those specific to WaterGap. One problem in this context is that Discussion and Conclusions are jointly discussed and that this section is 7 pages long. I think these sections can be joined if this part of the paper is short, but here it is very long. A long discussion followed by a very short conclusions and outlook section would make it much easier for the reader. There the authors could also easily separate specific and general conclusions.
[3] The final recommendation to include uncertainty in climate change impact projections related to freshwater is good, but this is already widely done (see below). Can the authors be more specific regarding their recommendation? They could for example discuss this issue much more in the context of global models and the specific implications this has.
Just a few random examples from a quick online search:
https://www.nature.com/articles/s41598-019-41334-7
https://hess.copernicus.org/articles/21/4245/2017/
https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2011WR010602
[4] The connection to existing literature is in places very extensive and in others very brief. All methods used here have been previously assessed widely. Maybe not exactly in this combination, but certainly individually or in combination with other methods. I would therefore have expected that the authors help the reader to start from a more informed level.
For example, the (poor) ability of GLUE to identify the best parameter set has been explored in the past (see link below) and thus this result should be expected. The issue now is rather what relevance this has for the study at hand.
https://backend.orbit.dtu.dk/ws/portalfiles/portal/9729153/MR2007_305.pdf
[5] While I possibly sound rather critical, I think this is an interesting and relevant study. My comments are simply meant to help the authors communicate their work with the readers. Shortening the paper, being clearer about specific and general contributions, and a better connection with existing literature would make it much easier for readers to understand the study and its relevance.
Citation: https://doi.org/10.5194/hess-2023-18-RC1
AC2: 'Reply on RC1', Petra Döll, 28 Nov 2023
We thank you very much for your helpful comments and constructive suggestions for improving the manuscript. Below, each reviewer's comment (indicated by “RC”) is followed by our answer (indicated by “AC”). The proposed new text in the revised manuscript is written in bold.
RC: The study by Doell et al. compares three different strategies to reduce parameter uncertainty for the global hydrological model WaterGap. The methods used are BORG, GLUE, and an ensemble Kalman filter, which the authors apply in a pilot study to the Mississippi basin. How we best estimate global water models is an interesting and relevant question to which the authors contribute. I do like the study and what the authors do and show, but I have some critical comments regarding how the work is currently presented and discussed. I outline my main comments below.
AC: Thank you for the positive feedback.
RC: [0] The authors' use of sensitivity analysis is very nice and interesting, but the results are hardly discussed. I would have liked to see more detail on these results. For example, the precipitation multiplier is not selected as important. Interesting, given that this parameter is often very relevant. Is this due to the monthly time step? The authors study a huge domain. How did sensitivity to the parameters vary across this domain? A lot of insights are to be gained from this analysis, but they are not discussed. I think this would be worth including rather than some other parts, as suggested below.
AC: The results of the sensitivity analyses are discussed in 27 lines of text (lines 581-607), and some information on the differences across the MRB, i.e., among the five sub-basins (CDA units), is provided in the text but mainly in Table 2, whose last column indicates which parameters were selected for which CDA unit based on the CDA unit-specific sensitivity analysis. In the revised version, we will extend the sub-basin-specific discussion of the sensitivity analysis results by referring to the table below, which we will add to the supplement.
Table Sx. The most influential parameters for streamflow, TWSA, snow cover and local lake storage
| CDA Unit | Streamflow | TWSA | Snow cover | Local lake storage |
|---|---|---|---|---|
| I Arkansas | SL-RC, SL-MSM, EP-PTh, SL-MEP, GW-MM | SL-RC, SL-MSM, NA-GM | SN-MT | SW-LD, SW-DC |
| II Missouri | SL-RC, SL-MSM, EP-PTh, SN-MT, NA-SM | SL-RC, SL-MSM, SW-WD, EP-PTh, NA-GM | SN-MT | SW-LD, SW-DC, NA-SM |
| III Upper MRB | SL-RC, SL-MSM, EP-PTh, SN-MT, GW-MM | SL-RC, SL-MSM, SW-WD, SW-DC, EP-PTh | SN-MT | SW-LD, SW-DC |
| IV Ohio | SL-RC, SL-MSM, SW-RRM, EP-PTh, GW-MM | SL-RC, SL-MSM, EP-PTh, GW-DC | SN-MT | SW-LD, SW-DC |
| V Lower MRB | SL-RC, SL-MSM, SW-RRM, EP-PTh, SN-MT | SL-MSM, GW-RFM, NA-GM | SN-MT | SW-LD, SW-DC |
| MRB | SL-RC, SL-MSM, SW-RRM, EP-PTh | SL-RC, SL-MSM, EP-PTh, NA-GM | SN-MT | SW-LD, SW-DC |
Note that although SW-WD was not selected in units I, IV, V and the MRB, we decided to include this parameter for all units because of its effect on groundwater recharge from surface water bodies.
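The per-unit selection logic implied by Table Sx (rank parameters by a sensitivity index for each output variable, keep the top-ranked ones, then take the union across variables) can be sketched as follows; the index values and the top-k cutoff are purely illustrative, not the values used in the study:

```python
# Hypothetical first-order sensitivity indices for one CDA unit;
# the numbers are invented for illustration only.
sensitivity = {
    "streamflow": {"SL-RC": 0.31, "SL-MSM": 0.22, "EP-PTh": 0.15, "GW-MM": 0.08, "SN-MT": 0.03},
    "TWSA": {"SL-RC": 0.28, "SL-MSM": 0.25, "NA-GM": 0.10, "EP-PTh": 0.05, "SN-MT": 0.02},
}

def select(indices, top_k=3):
    """Keep the top_k parameters ranked by sensitivity index."""
    ranked = sorted(indices, key=indices.get, reverse=True)
    return ranked[:top_k]

# Union over output variables gives the calibration parameters of the unit
selected = set()
for var, idx in sensitivity.items():
    selected.update(select(idx))

print(sorted(selected))
```

With these made-up indices, the unit would calibrate the union of the two per-variable top-3 lists, mirroring how one parameter can enter the set via TWSA sensitivity alone.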
Regarding the precipitation multiplier P-PM, we write in line 583: “P-PM was excluded from calibration even though it ranked 1st in the sensitivity analyses in all six basins for almost all four test variables because the precipitation input is perturbed in EnCDA, and an additional multiplier would lead to a double-counting of precipitation uncertainty.” One reason for not including P-PM in POC and GLUE was thus that we wanted to compare all three calibration methods. The other reason was that, unlike in basins such as the Amazon or the Ganges-Brahmaputra, precipitation in the Mississippi River Basin is expected to be rather well represented by the climate data used as input to WaterGAP. The table below shows that the mean annual precipitation in the CDA units that was used to drive WaterGAP does not differ much from the values derived from the high-resolution (4 km) PRISM dataset for the USA. In the revised version, we will add a more extensive explanation of why we did not use P-PM as a calibration parameter, add the table below to the supplement, and reference it in the sensitivity analysis section.
Table Sy. Comparison of mean annual precipitation in the CDA units for the calibration period 2003-2012 between GPCC-WFDEI used to drive WaterGAP and the high-resolution (4 km) PRISM* dataset for the USA [mm/yr]
| CDA Unit | GPCC-WFDEI | PRISM | PRISM/GPCC-WFDEI (potential P-PM) |
|---|---|---|---|
| I Arkansas | 705 | 667 | 0.95 |
| II Missouri | 595 | 622 | 1.04 |
| III Upper MRB | 951 | 878 | 0.92 |
| IV Ohio | 1313 | 1242 | 0.95 |
| V Lower MRB | 1286 | 1254 | 0.97 |
| MRB | 839 | 829 | 0.99 |
* https://climatedataguide.ucar.edu/climate-data/prism-high-resolution-spatial-climate-data-united-states-maxmin-temp-dewpoint
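The last column of Table Sy is simply the ratio of the two precipitation means per CDA unit; a small sketch (note that recomputing the ratio from the rounded means in the table can differ in the last digit from the values reported above, which were presumably derived from unrounded data):

```python
# Mean annual precipitation per CDA unit, 2003-2012 [mm/yr], from Table Sy:
# (GPCC-WFDEI forcing, high-resolution PRISM reference)
precip = {
    "I Arkansas": (705, 667),
    "II Missouri": (595, 622),
    "III Upper MRB": (951, 878),
    "IV Ohio": (1313, 1242),
    "V Lower MRB": (1286, 1254),
    "MRB": (839, 829),
}

# A ratio near 1 suggests the WaterGAP forcing needs little correction,
# which is the argument for omitting the P-PM multiplier in the MRB.
for unit, (gpcc, prism) in precip.items():
    print(f"{unit}: potential P-PM = {prism / gpcc:.2f}")
```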
RC: [1] This is a very long paper with a lot of details on the model and the data that, at least to me as a reader, seems excessive and not needed to understand the main story presented. It makes reading the paper a bit tedious because most readers will not run WaterGap and they might not even be interested in the extensive background information on the data (as part of the main story).
For example, lines 500-508 discuss problems with the GRACE data and how others have gone about reducing them. Is this really something I need to know to follow the story? I think text like this can go into the supplemental material without reducing the strength of the story told. On the contrary, it would make it better because I do not have to read through this background information unless I want to.
Lines 466-508 discuss details of the GRACE data and their uncertainties in (excessive) detail. At the same time, the authors spent one sentence on stating that two studies considered streamflow errors of about 10%, while the next sentence states that this is maybe a possible average but the variability is very large. The authors spend over 60 lines discussing GRACE and 6 (ok 7) lines to discuss the other variable they use. I do not understand why the authors do not present a more balanced discussion given that both variables suffer from significant and potentially complex uncertainties.
AC: Regarding the description of the WaterGAP model in section 3.1, we limited it to the information necessary to understand 1) the meaning and importance of the parameters to be estimated by the multi-variable calibration and 2) the differences between the multi-variable calibration presented in the manuscript and the (very simple) standard calibration of WaterGAP. Thus, we think that it is not beneficial to shorten the model description or move it to the supplement.
Regarding the description of the GRACE TWSA data, we agree with the reviewer that there is excessive detail in the main text. We will move the text on leakage problems (lines 483 to 508) to the supplement. To increase the readability by decreasing the length of the main text, we also plan to move section 2.4 “Comparison of the three calibration approaches” (lines 276-377, including Table 1) to the Appendix.
RC: [2] Starting from the back, i.e. the Outlook section, I wonder what transferrable knowledge the authors contribute that is unrelated to using WaterGap (and potentially the traditional approach to calibrating WaterGap)?
My impression is that most of the conclusions are rather specific to the use of WaterGap. I do not think that this is a problem per se, but it would be good if the authors would be clearer about general outcomes and those specific to WaterGap. One problem in this context is that Discussion and Conclusions are jointly discussed and that this section is 7 pages long. I think these sections can be joined if this part of the paper is short, but here it is very long. A long discussion followed by a very short conclusions and outlook section would make it much easier for the reader. There the authors could also easily separate specific and general conclusions.
AC: We will follow your suggestion to split the “Discussion and Conclusions” section and organize the last part of the manuscript as follows:
5 Discussion (with sections 5.1 to 5.6)
6 Conclusions (which includes, in revised form, what is now 5.7 Outlook).
Also following the advice of the reviewer, in the revised conclusions, we will clearly distinguish general conclusions for global hydrological modeling from WaterGAP-specific conclusions.
RC: [3] The final recommendation to include uncertainty in climate change impact projections related to freshwater is good, but this is already widely done (see below). Can the authors be more specific regarding their recommendation? They could for example discuss this issue much more in the context of global models and the specific implications this has.
Just a few random examples from a quick online search:
https://www.nature.com/articles/s41598-019-41334-7
https://hess.copernicus.org/articles/21/4245/2017/
https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2011WR010602
AC: Thank you for making us rethink our recommendation on including parameter uncertainty in a multi-model ensemble of impact models to project climate change hazards. Focusing on global-scale climate change impact studies, we will replace the last paragraph of the main text
“We recommend including, in future freshwater-related climate change impact studies, a behavioral ensemble of parameter sets as determined by the GLUE approach even though this will require a significant computational effort. This should reduce the underestimation of modeling uncertainty by traditional multi-model studies. As shown in the multi-model/multi-parameter study for the Colorado River basin by Mendoza et al. (2016), parameter sets with a similar performance during the calibration period may provide very different projections of climate change hazards, and the impact of parameter uncertainty is similar to the impact of hydrological model selection. “
by the following:
Climate change impact studies for individual river basins have shown that parameter sets with a similar performance during the calibration period may provide very different projections of climate change hazards, and that the impact of parameter uncertainty can be similar to the impact of climate or hydrological model selection (Mendoza et al., 2016; Her et al., 2019). Therefore, considering parameter uncertainty by running the hydrological model with a number of behavioral parameter sets helps to reduce the underestimation of the uncertainty of potential climate change impacts. However, producing a global-scale ensemble of potential future changes in hydrological variables by combining not only multiple greenhouse gas emissions scenarios, global climate models and global hydrological models (as is currently done in ISIMIP) but also model-specific behavioral parameter sets is currently infeasible. The main reason is that behavioral (or even optimal) parameter sets have not yet been determined for any global hydrological model in a spatially explicit manner at the global scale. In addition, the computational effort for such a multi-model/multi-parameter ensemble is likely prohibitive.
RC: [4] The connection to existing literature is in places very extensive and in others very brief. All methods used here have been previously assessed widely. Maybe not exactly in this combination, but certainly individually or in combination with other methods. I would therefore have expected that the authors help the reader to start from a more informed level.
For example, the (poor) ability of GLUE to identify the best parameter set has been explored in the past (see link below) and thus is what should be expected. The issue now is rather what relevance this has for the study at hand.
https://backend.orbit.dtu.dk/ws/portalfiles/portal/9729153/MR2007_305.pdf
AC: With the help of both reviews, we noticed that we have not clearly described the role of GLUE (with a random ensemble of parameter sets) as compared to the role of POC (in which a search algorithm derives an ensemble of Pareto-optimal parameter sets). In the revised version of the manuscript, we will develop the storyline of the paper differently, stating already in the introduction (in the paragraph in lines 121-133, before the objectives of the study) that optimal parameter sets are best identified using a search algorithm, as used in POC, while GLUE serves, in the face of equifinality, to identify behavioral parameter sets. In our study, GLUE served, for example, to identify which of the Pareto-optimal parameter sets identified with the search algorithm are behavioral given the uncertainty of the streamflow and TWSA observations. To do this, we need to revise parts of the text (including the abstract, the definition of the objective of the manuscript, and the conclusions), and we plan to change the title of the manuscript from “Multi-variable parameter estimation for a global hydrological model: Comparison and evaluation of three ensemble-based calibration methods for the Mississippi River basin” to
Leveraging multi-variable observations to reduce and quantify the output uncertainty of a global hydrological model: Evaluation of three ensemble-based approaches for the Mississippi River basin
In the conclusions, we will add that POC can be used to estimate optimal parameter sets for the large basins of the globe that can then be applied in all global-scale hydrological studies including climate change impact studies. GLUE can be used to determine behavioral parameter sets for those basins, which enables quantifying the uncertainty of the model output.
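The division of labor described above (POC's search algorithm finds optimal sets; GLUE screens an ensemble for behavioral sets) can be illustrated with a minimal GLUE-style screening loop. The toy model, parameter ranges and thresholds below are hypothetical stand-ins, not WaterGAP or the paper's actual uncertainty-derived thresholds:

```python
import numpy as np

rng = np.random.default_rng(0)

def nse(sim, obs):
    """Nash-Sutcliffe efficiency."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Invented "observations": 120 months of streamflow Q and TWSA
t = np.arange(120)
obs_q = 10 + 3 * np.sin(2 * np.pi * t / 12)
obs_tws = 50 + 20 * np.cos(2 * np.pi * t / 12)

def simulate(a, b):
    """Toy two-parameter model standing in for a hydrological model."""
    q = a * 10 + 3 * np.sin(2 * np.pi * t / 12)
    tws = b * 50 + 20 * np.cos(2 * np.pi * t / 12)
    return q, tws

# 1) Monte Carlo sample of the parameter space (GLUE's random ensemble)
params = rng.uniform(0.5, 1.5, size=(5000, 2))

# 2) Score every parameter set on BOTH variables
scores = np.empty((len(params), 2))
for i, (a, b) in enumerate(params):
    q, tws = simulate(a, b)
    scores[i] = nse(q, obs_q), nse(tws, obs_tws)

# 3) Keep only sets exceeding both performance thresholds
#    (in the paper these thresholds come from observation uncertainty;
#     0.7 is an arbitrary placeholder here)
thr_q, thr_tws = 0.7, 0.7
behavioral = params[(scores[:, 0] >= thr_q) & (scores[:, 1] >= thr_tws)]
print(len(behavioral), "behavioral sets out of", len(params))
```

The same screening applied to a Pareto-optimal ensemble from a search algorithm would tell which of those optimal sets are also behavioral, which is how GLUE and POC complement each other in the study.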
RC: [5] While I possibly sound rather critical, I think this is an interesting and relevant study. My comments are simply meant to help the authors communicate their work with the readers. Shortening the paper, being clearer about specific and general contributions, and a better connection with existing literature would make it much easier for readers to understand the study and its relevance.
AC: We will direct our revision in this way.
Citation: https://doi.org/10.5194/hess-2023-18-AC2
RC2: 'Comment on hess-2023-18', Anonymous Referee #2, 23 Oct 2023
Döll et al. exploit three different approaches to identify parameters in a WaterGAP model of the Mississippi River Basin. This should provide insights for the calibration of global hydrological models.
Although as a reviewer I aim to be constructive and to provide concrete recommendations, I have to admit that I found this difficult for the presented study. I hope I can make my points clear and that this provides enough guidance for the authors to search for other directions.
The long, quite unfocused introduction seems to give the goal of this study (line 70), where the complex and long research question already prepares the reader for what is coming. My summary of the goal of the study, if I understood correctly, would be to explore how global hydrological models can be calibrated in order to make better use of available observations.

In my understanding and experience, one of the reasons why GHMs are currently not thoroughly or automatically calibrated is mainly computational demand, besides model complexity (leading to non-uniqueness). The argument of computational demand has, surprisingly, not been taken into account in any way in selecting calibration approaches for this study. Expensive algorithms and approaches, such as Borg-MOEA and EnKF, are explored, already going towards computational limits for the basin explored here. How is this ever going to translate to a global application then?
But besides, there were other reasons to be surprised by the selected methods and approaches. The three methods seem to be presented as calibration strategies, but I would argue they are not. GLUE is presented as an optimization technique, while it is merely a way of evaluating a sample. Therefore, it should not come as a surprise that Borg-MOEA outperforms GLUE; the authors already write themselves that the search algorithm searches in the region of interest, while GLUE is just a sample across the whole parameter space. This conclusion, therefore, could have been drawn without doing all the computations. The same holds true for the EnCDA. An implementation of EnKF is used as a way of calibrating, but EnKF has never been developed to serve as a calibration algorithm. It is useful for real-time applications and for identifying model structural errors, but it never claims convergence towards an optimal parameter set. Therefore, no surprise that results drifted off in the validation period!

Besides these methodological issues, the study is hard to read and follow. Only at page 20 (!) did I feel that I got a more concrete picture of what was done. And even then, it read a lot like a diary. For instance, first I was very surprised at line 520 that multiplication factors for precipitation and net radiation were included as calibration parameters. Then I was not so surprised to find out that the multiplication factor for precipitation came out as most sensitive (l. 582), only to be surprised again to learn that it was still left out of the calibration (l. 585). I know that there is an argument for documenting failures etc., but I don’t think this is helpful at all at this level: just leave this kind of stuff out, don’t bother the reader with it.

Furthermore, there is some kind of strange mixed use of NSE and KGE. The NSE is optimized, but the KGE components are evaluated. Why not directly optimize the KGE then? That would lead to different results compared to the NSE. Figure 3 shows NSEs if I read the axes, but the caption refers to some kind of KGE.
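For reference, the standard definitions of the two metrics the reviewer contrasts, Nash-Sutcliffe efficiency (NSE) and Kling-Gupta efficiency (KGE, with correlation, variability-ratio and bias-ratio components), show why optimizing one is not equivalent to optimizing the other; the five-point series below is invented purely to illustrate the difference:

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    return 1 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(sim, obs):
    """Kling-Gupta efficiency from correlation r, variability ratio
    alpha = sigma_sim/sigma_obs, and bias ratio beta = mu_sim/mu_obs."""
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# A simulation with perfect correlation but a 20% multiplicative bias
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sim = 1.2 * obs
print(f"NSE = {nse(sim, obs):.3f}, KGE = {kge(sim, obs):.3f}")
# NSE ≈ 0.78 while KGE ≈ 0.72: the metrics weigh the same error differently
```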
Finally, there is no conclusions section, just a very extensive “Discussion and conclusion”, which already indicates that too many separate aspects are being tackled in this study. This study aims to serve the GHM community, but the kinds of strategies and questions explored here have already been extensively addressed and investigated with regional-scale models, with the same conclusions as this study. Now the challenge remains how to translate this to models applied over larger areal extents, and this study does not seem to contribute to that.
Overall, the methods seem to be not in line with the goals that this study aims to achieve, and the written presentation requires substantial improvement.
Citation: https://doi.org/10.5194/hess-2023-18-RC2
AC1: 'Reply on RC2', Petra Döll, 28 Nov 2023
We thank you very much for your helpful comments and constructive suggestions for improving the manuscript. Below, each reviewer's comment (indicated by “RC”) is followed by our answer (indicated by “AC”). The proposed new text in the revised manuscript is written in bold.
RC: Döll et al. exploit three different approaches to identify parameters in a WaterGAP model of the Mississippi River Basin. This should provide insights for the calibration of global hydrological models. Although as a reviewer I aim to be constructive and to provide concrete recommendations, I have to admit that I found this difficult for the presented study. I hope I can make my points clear and that this provides enough guidance for the authors to search for other directions.
AC: Thank you for your critical feedback that will help us to better communicate the objectives, results and conclusions of our study.
RC: The long, quite unfocused, introduction seems to give the goal of this study (line 70), where the complex and long research question is already a preparation for the reader on what is coming. My summary of the goal of the study, if I understood correctly, would be to explore how global hydrological models can be calibrated in order to make better use of available observations.
AC: The manuscript does aim at showing how to make (better) use of available observations in global hydrological modeling beyond streamflow but it is not only about calibration in the sense of finding optimal parameter sets but also about estimation of model output uncertainty. In the revised version, we will therefore change the title of the manuscript from “Multi-variable parameter estimation for a global hydrological model: Comparison and evaluation of three ensemble-based calibration methods for the Mississippi River basin” to
Leveraging multi-variable observations to reduce and quantify the output uncertainty of a global hydrological model: Evaluation of three ensemble-based approaches for the Mississippi River basin
The introduction is intended to first provide the broad research question very quickly, after only 17 introductory lines of text (lines 68-72). The broad research question is long because we want to show the challenges of parameter estimation. In the following part of the introduction (only 61 lines), we describe the state of the art regarding the challenge of equifinality, which can be tackled by using observations of more than one output variable, focusing on the two variables streamflow and total water storage anomaly that were used in our study, as well as on different methods for model calibration by parameter estimation. This is followed by the specific objective of the study and the formulation of six specific research questions that are then addressed in the discussion section. So we think that our introduction is focused and well-structured.
In the revised version, we will modify the research objective in lines 134 ff. to fit better to the new title and to focus less on pure comparison of the three approaches but on the broader methodological challenges that are also expressed in the six specific research questions. We will replace
“The objective of this paper is to assess the suitability of the three multi-variable calibration approaches POC, GLUE and EnCDA for identifying ensembles of optimal and behavioral parameter sets of the GHM WaterGAP by model calibration against observations of Q and TWSA, taking into account observation uncertainties. In addition, an approach for taking into account the observation errors for the definition of performance thresholds for behavioral parameter sets is presented. In each calibration approach, model parameters of all WaterGAP grid cells within so-called calibration-data assimilation (CDA) units were uniformly adjusted. Based on calibration exercises either for the whole Mississippi River basin (MRB) as one CDA unit or for its five sub-basins (four upstream basins and one downstream basin) as alternative CDA units, we will answer the following research questions:”
by
The objective of this paper is to analyze how the uncertainty of the output of global hydrological models can be reduced and quantified taking into account observations of multiple output variables and their uncertainties. This paper shows how Q and TWSA observations can be used to obtain ensembles of (Pareto)-optimal and behavioral parameter sets for the GHM WaterGAP and evaluates the suitability and role of the three multi-variable calibration approaches POC, GLUE and EnCDA. It presents a method for defining performance thresholds for behavioral parameter sets based on observation uncertainties and the initial GLUE ensemble. In each approach, model parameters of all WaterGAP grid cells within so-called calibration-data assimilation (CDA) units were uniformly adjusted. Based on parameter estimation either for the whole Mississippi River basin (MRB) as one CDA unit or for its five sub-basins (four upstream basins and one downstream basin) as alternative CDA units, we will answer the following research questions:
RC: In my understanding and experience, one of the reasons why GHMs currently are not thoroughly or automatically calibrated is mainly because of computational demand, besides model complexity (leading to non-uniqueness). The argument of computational demand has, surprisingly, not been taken into account in any way in selecting calibration approaches for this study. Expensive algorithms and approaches, such as Borg-MOEA and EnKF are explored, already going towards computational limits for the basin explored here. How is this ever going to translate to a global application then?
AC: We wanted to explore whether it is possible to benefit from the advantages of EnKF (EnCDA), which, unlike typical calibration of hydrological models, simultaneously adjusts system states (water storages) and model parameters. We hypothesize that this property can be advantageous in situations where a model has structural deficiencies that cannot be “absorbed” via parameter calibration. The goal of the study was to explore whether EnKF can be used to assimilate not only observations of total water storage anomalies, as has already been shown to be feasible and successful at the global scale (e.g., Gerdener et al., 2023), but also streamflow observations (which had not yet been demonstrated), while at the same time adjusting parameters (which also had not yet been demonstrated in this context). We hypothesized that taking into account, in EnKF, both 1) the uncertainties of the climate input and 2) the adjustment of storages would cause the parameter estimates to stabilize towards the end of the calibration period; these parameters could then be used for periods without observation data. Our study has shown that in the current setting, the EnKF approach was less successful in these aspects compared to POC and GLUE, and was thus not found to be applicable for reducing and quantifying the uncertainty of WaterGAP output.
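The joint state-parameter adjustment described above can be illustrated with a toy augmented-state EnKF analysis step; the one-storage “model”, observation operator and all numbers below are invented for illustration and have nothing to do with the actual WaterGAP EnCDA setup:

```python
import numpy as np

rng = np.random.default_rng(42)

# Each ensemble member carries a water storage state s and a model
# parameter theta; assimilating one observation updates both jointly
# via their ensemble (cross-)covariances.
n_ens = 64
theta_true, s_true = 0.7, 100.0   # hypothetical "truth"

# Prior ensemble: uncertain parameter and state
theta = rng.normal(0.5, 0.2, n_ens)
s = rng.normal(90.0, 15.0, n_ens)

# Observation operator: we observe simulated streamflow q = theta * s
obs = theta_true * s_true
obs_err = 5.0
y = obs + rng.normal(0.0, obs_err, n_ens)   # perturbed observations

x = np.vstack([s, theta])                   # augmented state, shape (2, n_ens)
hx = theta * s                              # predicted observation per member

# Kalman gain from ensemble statistics
x_anom = x - x.mean(axis=1, keepdims=True)
hx_anom = hx - hx.mean()
pxy = x_anom @ hx_anom / (n_ens - 1)        # cov(x, Hx), shape (2,)
pyy = hx_anom @ hx_anom / (n_ens - 1) + obs_err ** 2
k = pxy / pyy                               # gain, shape (2,)

x_a = x + np.outer(k, y - hx)               # analysis update
s_a, theta_a = x_a

print("prior theta mean:", theta.mean(), "analysis:", theta_a.mean())
```

Because streamflow co-varies with the parameter across the ensemble, the analysis pulls the parameter toward values consistent with the observation and shrinks its spread, which is the mechanism the EnCDA hypothesis relies on; the drift issue raised by the reviewer arises when such updates do not settle over the calibration period.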
We would like to point out that, compared to an uncalibrated run, EnCDA does show improvements in the present setting. Increasing the numerical efficiency of the framework even with very large state vectors, as could be the case for a global EnCDA, is still under development. After submitting this paper, the run time of the assimilation setup was already strongly improved by avoiding reading from and writing to the hard disk, reducing the run time of a global GRACE assimilation by 75%. Therefore, a global EnCDA would still be time-consuming but not impossible after defining a setup for global EnCDA based on the results of this regional study. It is well known that EnKF performance relies very much on the proper representation of model state and, in this case, parameter correlations, and this in turn depends on ensemble size. Our EnKF may improve in the given setting with larger ensembles, but this is indeed computationally demanding at the global scale.
We think that neither GLUE nor Borg-MOEA (for POC) is too expensive to be applied in global hydrological modeling. With 20,000 ensemble members, the run times for six CDA units in our study were 72 hours for POC and 53 hours for GLUE, and, with 32 ensemble members in the case of EnCDA, 72 hours, in the parallel computing environments described in the manuscript (section 3.4). We are currently setting up a global POC for 712 calibration units (drainage basins) covering the global land area, and based on runs with a small number of calibration units we estimate the total run time for 20,000 ensemble members to be 15-20 days. In times of high-performance computing, the computational demand of global-scale multi-variable parameter estimation is very high but not prohibitive. We will add the information about run times to the revised version of the manuscript.
RC: Besides this, there were other reasons to be surprised by the selected methods and approaches. The three methods seem to be presented as calibration strategies, but I would argue they are not. GLUE is presented as an optimization technique, while it is merely a way of evaluating a sample. Therefore, it should not come as a surprise that Borg-MOEA outperforms GLUE; the authors already write themselves that the search algorithm searches in the region of interest, while GLUE is just a sample across the whole parameter space. This conclusion, therefore, could have been drawn without doing all the computations. The same holds true for the EnCDA. An implementation of EnKF is used as a way of calibrating, but EnKF was never developed to serve as a calibration algorithm. It is useful for real-time applications, it is useful to identify model structural errors, but it never claims convergence towards an optimal parameter set. Therefore, it is no surprise that results drifted off in the validation period!
AC: We would argue that both Borg-MOEA and GLUE are calibration strategies; Borg-MOEA is a technique for identifying (Pareto-) optimal parameter sets, while GLUE is a technique for identifying behavioral parameter sets but can also be used to determine (in a sub-optimal way compared to POC) optimal (i.e., best-behaving) parameter sets. In this way, both are calibration techniques. GLUE approaches were called calibration, for example, in Marmy et al. (2016) and Wu and Jansson (2013). In our study, GLUE served not only to determine behavioral parameter sets among the GLUE ensemble of parameter sets but also to identify which of the Pareto-optimal parameter sets identified with the search algorithm (POC) are behavioral given the uncertainty of the streamflow and TWSA observations.
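To make the dual role of GLUE concrete, the procedure can be sketched with a minimal toy example (a hypothetical one-parameter runoff model, synthetic observations, and an illustrative behavioral threshold of 0.5; this is not the WaterGAP implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

def toy_model(param, forcing):
    """Hypothetical one-parameter runoff model: runoff = param * forcing."""
    return param * forcing

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is perfect, <0 is worse than the obs mean."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

forcing = rng.uniform(1.0, 5.0, size=120)            # e.g. monthly precipitation
obs = 0.6 * forcing + rng.normal(0, 0.2, size=120)   # synthetic "observations"

# GLUE: evaluate a random sample drawn across the whole parameter space ...
params = rng.uniform(0.0, 2.0, size=20_000)
scores = np.array([nse(toy_model(p, forcing), obs) for p in params])

# ... and keep only the behavioral sets above the likelihood threshold.
behavioral = params[scores > 0.5]

# The behavioral ensemble quantifies output uncertainty; the single
# best-behaving member is a (sub-optimal) "calibrated" parameter set.
best = params[np.argmax(scores)]
print(len(behavioral), best)
```

The same evaluation step can be applied to an externally supplied ensemble, which is how GLUE was used here to screen the Pareto-optimal sets from POC for behavioral status.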
With the help of both reviews, we noticed that we have not clearly described the role of GLUE (with a random ensemble of parameter sets) as compared to the role of POC (in which a search algorithm derives an ensemble of Pareto-optimal parameter sets). In the revised manuscript, we will rewrite the storyline and improve the insufficiently clear presentation of the roles of Borg-MOEA and GLUE. To do this, we need to revise parts of the text including the abstract, introduction, definition of the objective of the manuscript, and conclusions. We will develop the storyline of the paper differently, stating already in the introduction (in the paragraph in lines 121-133, before the objectives of the study) that optimal parameter sets are best identified using a search algorithm, as used in POC, while GLUE serves, in the face of equifinality, to identify behavioral parameter sets.
Regarding EnKF, it is true that EnKF was never developed as a calibration algorithm, but in our research we tried to find out whether it can serve to estimate parameters. Beyond this statement, however, we respectfully disagree with the reviewer: EnKF has been demonstrated in various studies to improve the realism of global hydrological model simulations when compared to various observations (beyond calibrated model versions), and this includes our own EnKF implementation at the global scale with WaterGAP (Gerdener et al., 2023). It is one of the standard techniques when multiple data sets, at different spatial scales and with possibly differing temporal or spatial coverage, are to be combined, such as in meteorological or hydrological reanalyses. Various papers (e.g., Wanders et al., 2014, cited in the manuscript) have shown, typically in regional or local settings, that EnKF variants are capable of estimating model parameters along with model states. Therefore, we believe it is perfectly reasonable to ask whether EnKF is able to estimate model parameters at the same time, albeit perhaps not as efficiently as a dedicated calibration approach. Some of the reasons why EnKF has the potential for improved parameter estimation are provided in section 2 of the manuscript and in our response to the previous comment.
We agree that, from a parameter calibration perspective, deriving an optimal parameter set from EnKF seems complicated. POC and GLUE generate constant parameter sets. However, we hypothesize that the updates of the water storages could stabilize the parameters and compensate for model structure deficiencies and climate input uncertainties. EnKF generates a time series of parameter set estimates, which is often misunderstood as generating time-variable parameters. This time series, which in the ideal case converges, may include typical seasonal signals; such signals point towards model errors and are difficult to interpret. In this study, we decided to apply the parameter estimates of the last month of the calibration phase during the validation phase, to be able to compare the different ensemble-based approaches for reducing and quantifying uncertainty, which is the aim of this study. Future studies will investigate how seasonal signals in the parameter estimates can be used to (1) trace back model errors and (2) develop empirical error models, which can include parameterizations depending on the season.
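The state-augmentation idea behind joint state-parameter estimation with EnKF, i.e. appending the parameter to the state vector so that the Kalman update adjusts both, can be sketched as follows. This is a hypothetical linear-reservoir toy with made-up values, not the EnCDA/WaterGAP implementation; the parameter jitter and clipping are common ad hoc devices to keep ensemble spread alive and the parameter plausible:

```python
import numpy as np

rng = np.random.default_rng(0)
n_ens, n_steps = 32, 60
k_true, obs_err = 0.3, 0.5

# Synthetic truth and observations of storage (a stand-in for TWSA/streamflow).
precip = rng.uniform(2.0, 6.0, n_steps)
S_true, truth = 10.0, []
for t in range(n_steps):
    S_true = S_true + precip[t] - k_true * S_true
    truth.append(S_true)
obs = np.array(truth) + rng.normal(0, obs_err, n_steps)

# Augmented ensemble: column 0 = storage state S, column 1 = parameter k.
ens = np.column_stack([rng.normal(10.0, 2.0, n_ens),
                       rng.uniform(0.1, 0.6, n_ens)])

for t in range(n_steps):
    # Forecast: propagate each member's state with its own parameter;
    # small jitter on k prevents premature collapse of parameter spread.
    ens[:, 0] = ens[:, 0] + precip[t] - ens[:, 1] * ens[:, 0]
    ens[:, 1] += rng.normal(0, 0.005, n_ens)

    # Analysis: Kalman update of the *augmented* state from the storage obs.
    # The state-parameter cross-covariance in P is what moves k.
    H = np.array([1.0, 0.0])                    # we observe S only
    P = np.cov(ens.T)                           # 2x2 ensemble covariance
    K = P @ H / (H @ P @ H + obs_err ** 2)      # Kalman gain (2-vector)
    perturbed = obs[t] + rng.normal(0, obs_err, n_ens)
    ens += np.outer(perturbed - ens[:, 0], K)
    ens[:, 1] = np.clip(ens[:, 1], 0.01, 0.95)  # keep k physically plausible

# Final-month parameter estimate, as applied in the validation phase here.
print(ens[:, 1].mean())
```

The final ensemble mean of the parameter column plays the role of the "last month of the calibration phase" estimate described above; the time series of these means is what may show the seasonal signals discussed.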
RC: Besides these methodological issues, the study is hard to read and follow. Only at page 20 (!) did I feel that I got a more concrete picture of what was done. And even then, it read a lot like a diary. For instance, first I was very surprised at line 520 that a multiplication factor for precipitation and net radiation were also included as calibration parameters. Then I was not so surprised to find out that the multiplication factor for precipitation came out as most sensitive (l. 582), only to be surprised again to learn that it was still left out of the calibration (l. 585). I know that there is an argument for documenting failures etc., but I don’t think this is helpful at all at this level: just leave this kind of stuff out, don’t bother the reader with it. Furthermore, there is some kind of strange mixed use of NSE and KGE. The NSE is optimized, but the KGE components are evaluated. Why not directly optimize the KGE then? That would lead to different results compared to the NSE. Figure 3 shows NSEs if I read the axes, but the caption refers to some kind of KGE.
AC: To increase the readability, we plan to decrease the length of the main text. We will move section 2.4 “Comparison of the three calibration approaches” (lines 276-377, including Table 1) to the Appendix. Regarding the description of the GRACE TWSA data, we will move the text on leakage problems (lines 483 to 508) to the supplement.
Regarding the reviewer’s comment on the method descriptions reading like a diary, our goal was to make transparent to the reader the many decisions that need to be taken in parameter estimation. Regarding the process of deciding whether to include the precipitation multiplier P-PM as a calibration parameter, this has nothing to do with documenting a failure but with explicating why it was excluded even though model results are sensitive to it. While we could remove this from the manuscript, reviewer 1 asked for a deeper discussion of the selection of calibration parameters and a more detailed explanation of the exclusion (R1 comment 0).
Regarding NSE vs. KGE: precisely because optimization against KGE and NSE would have led to different results, we think it is a good idea to optimize against one criterion (NSE) and then analyze the performance for another (the KGE components). We will correct the typo (KGE) in the caption of Figure 3; it should read NSE.
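That the two criteria can indeed disagree is easy to demonstrate numerically with the standard NSE and the 2009-formulation KGE on toy series (constructed here for illustration, not taken from the study's data):

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(sim, obs):
    """Kling-Gupta efficiency (Gupta et al., 2009) and its three components."""
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    value = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return value, (r, alpha, beta)

obs = np.array([1.0, 3.0, 6.0, 10.0, 6.0, 3.0, 1.5, 1.0])
sim_a = obs + 1.0                                # correct dynamics, constant offset
sim_b = obs.mean() + (obs - obs.mean()) * 0.7    # correct mean, damped variability

# The two criteria rank these simulations differently:
# NSE prefers sim_b (0.91 vs 0.89), while KGE prefers sim_a (0.75 vs 0.70).
print(nse(sim_a, obs), kge(sim_a, obs)[0])
print(nse(sim_b, obs), kge(sim_b, obs)[0])
```

This is why optimizing NSE while reporting the KGE components, as done in the manuscript, exposes performance aspects (bias and variability errors) that the optimized criterion alone would hide.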
RC: Finally, there is no conclusion section, just a very extensive “Discussion and conclusion”, which is already indicative that too many separate aspects are being tackled in this study. This study aims to serve the GHM community, but the kinds of strategies and questions explored here have already been extensively addressed and investigated with regional-scale models, with the same conclusions as this study. Now the challenge remains how to translate this to models applied over larger areal extents, and this study does not seem to contribute to that.
AC: We will follow your suggestion (and that of the other reviewer) to split the “Discussion and Conclusions” section and organize the last part of the manuscript as follows:
5 Discussion (with sections 5.1 to 5.6)
6 Conclusions (which includes, in revised form, what is now 5.7 Outlook).
In the conclusions, we will be more specific regarding the meaning of our study results for global-scale modeling. For example, we will add that POC can be used to estimate optimal parameter sets for the large basins of the globe that can then be applied in all global-scale hydrological studies, including climate change impact studies. GLUE can be used to determine behavioral parameter sets for those basins, which enables quantifying the uncertainty of the model output. However, due to the excessive computational demand, these many behavioral parameter sets could not be used in climate change impact studies, not even if just WaterGAP were applied, and certainly not in a multi-model ensemble with various global hydrological models. Regarding the consideration of parameter ensembles in global-scale climate change impact studies, we will replace the last paragraph of the main text
“We recommend including, in future freshwater-related climate change impact studies, a behavioral ensemble of parameter sets as determined by the GLUE approach even though this will require a significant computational effort. This should reduce the underestimation of modeling uncertainty by traditional multi-model studies. As shown in the multi-model/multi-parameter study for the Colorado River basin by Mendoza et al. (2016), parameter sets with a similar performance during the calibration period may provide very different projections of climate change hazards, and the impact of parameter uncertainty is similar to the impact of hydrological model selection.”
by the following:
Climate change impact studies for individual river basins have shown that parameter sets with a similar performance during the calibration period may provide very different projections of climate change hazards, and that the impact of parameter uncertainty can be similar to the impact of the selection of the climate or hydrological model (Mendoza et al., 2016; Her et al., 2019). Therefore, consideration of parameter uncertainty by running the hydrological model with a number of behavioral parameter sets helps to reduce the underestimation of the uncertainty of potential climate change impacts. However, producing a global-scale ensemble of potential future changes in hydrological variables by combining not only multiple greenhouse gas emissions scenarios, global climate models, and global hydrological models (as is currently done in ISIMIP) but also model-specific behavioral parameter sets is currently infeasible. The main reason is that behavioral (or even optimal) parameter sets have not yet been determined for any global hydrological model in a spatially explicit manner at the global scale. In addition, the computational effort for such a multi-model/multi-parameter ensemble is likely prohibitive.
RC: Overall, the methods seem to be not in line with the goals that this study aims to achieve, and the written presentation requires substantial improvement.
AC: As described above, we will define the goals more clearly, show how the applied methods serve to achieve them, and improve the presentation.
References
Gerdener, H., Kusche, J., Schulze, K., Döll, P., Klos, A. (2023): The Global Land Water Storage Data Set Release 2 (GLWS2.0) derived via assimilating GRACE and GRACE-FO data into a global hydrological model. J. Geodesy, 97, 73. https://doi.org/10.1007/s00190-023-01763-9.
Marmy et al. (2016): Semi-automated calibration method for modelling of mountain permafrost evolution in Switzerland. The Cryosphere, 10, 2693–2719. https://doi.org/10.5194/tc-10-2693-2016
Wu, S.H., Jansson, P.-E. (2013): Modelling soil temperature and moisture and corresponding seasonality of photosynthesis and transpiration in a boreal spruce ecosystem. Hydrol. Earth Syst. Sci., 17, 735–749. https://doi.org/10.5194/hess-17-735-2013
Citation: https://doi.org/10.5194/hess-2023-18-AC1
-
AC1: 'Reply on RC2', Petra Döll, 28 Nov 2023