Multi-variable parameter estimation for a global hydrological model: Comparison and evaluation of three ensemble-based calibration methods for the Mississippi River basin
Abstract. Global hydrological models enhance our understanding of the Earth system and support the sustainable management of water, food and energy in a globalized world. They integrate process knowledge with a multitude of model input data (e.g., precipitation, land cover and soil properties and location and extent of surface water bodies) that describe the state of the Earth. However, they do not fully utilize observations of model output variables (e.g., streamflow and water storage) to decrease model output uncertainty by, e.g., parameter estimation. For the pilot region Mississippi River basin, we assessed the suitability of three ensemble-based multi-variable calibration approaches for identifying both optimal and behavioral parameter sets for the global hydrological model WaterGAP, utilizing observations of streamflow (Q) and total water storage anomaly (TWSA). The common first steps in all approaches are 1) the definition of spatial units for which calibration parameters are uniformly adjusted (CDA units), combined with the selection of observation data, 2) the identification of potential calibration parameters and their a-priori probability distributions and 3) sensitivity analyses to select the most influential model parameters per CDA unit that will be adjusted by calibration. In the estimation of model output uncertainty, we considered the uncertainties of the Q and TWSA observations. We found that the Pareto-optimal calibration (POC) approach, which utilizes the Borg multi-objective evolutionary search algorithm to find Pareto-optimal parameter sets, is best suited for identifying a single “optimal” parameter set for each CDA unit. This parameter set leads to an improved fit to the monthly time series of both Q and TWSA as compared to the standard WaterGAP variant, which is only calibrated against mean annual Q, and can be used to compute the best estimate of WaterGAP output. 
The Generalized Likelihood Uncertainty Estimation (GLUE) approach is less suitable than POC to identify the optimal parameter set but enables the estimation of model output uncertainties that are due to the equifinality of parameter sets and the observation uncertainty. The potential advantages of the ensemble Kalman filter calibration and data assimilation (EnCDA) approach, in which both parameter sets and water storages are updated, could not be realized, likely due to the high computational burden of this approach. This limited the EnCDA ensemble size to 32, while 20,000 ensemble members could be evaluated in the case of POC and GLUE. Partitioning the whole Mississippi River basin into five CDA units (sub-basins) instead of only one improved model performance during the calibration and validation periods. Very diverse parameter sets were found to lead to similarly good fits to observations, but the range of values of three parameters could be narrowed by calibration. Model structure uncertainties, in particular regarding the operation of man-made reservoirs, the location and extent of small wetlands, and the (lacking) representation of losing river conditions in WaterGAP, are suspected to be the main reasons for the low coverage of the observation uncertainty bands by the GLUE-derived model output uncertainty bands. Model structure uncertainties are also the likely reason for major trade-offs between optimal fit to Q and TWSA. Calibration against GRACE TWSA only, in regions without Q observations, may worsen the Q simulation as compared to the uncalibrated model variant. We plan to add additional remotely-sensed observations in the multi-variable calibration of WaterGAP and suggest considering parameter uncertainty in multi-model ensemble studies of the global freshwater system.
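The distinction the abstract draws between Pareto-optimal and behavioral parameter sets can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each ensemble member has already been scored with two "higher is better" efficiency metrics, one for Q and one for TWSA, and it uses a brute-force non-dominance filter rather than the Borg search algorithm. The function names, the example scores and the thresholds are all hypothetical.

```python
def pareto_front(scores):
    """Return indices of non-dominated members.

    Each score is a (fit_q, fit_twsa) pair where higher is better (assumed).
    A member is dominated if another member is at least as good on both
    objectives and strictly better on at least one.
    """
    front = []
    for i, a in enumerate(scores):
        dominated = any(
            b[0] >= a[0] and b[1] >= a[1] and (b[0] > a[0] or b[1] > a[1])
            for j, b in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


def behavioral(scores, min_q, min_twsa):
    """GLUE-style selection: keep members that meet both thresholds.

    The thresholds are a subjective choice (hypothetical values below).
    """
    return [i for i, (q, t) in enumerate(scores) if q >= min_q and t >= min_twsa]


# Hypothetical (fit to Q, fit to TWSA) scores for five parameter sets.
scores = [(0.80, 0.40), (0.70, 0.60), (0.60, 0.55), (0.50, 0.75), (0.75, 0.35)]

print(pareto_front(scores))          # members that trade off Q against TWSA
print(behavioral(scores, 0.6, 0.5))  # members acceptable on both variables
```

In this toy ensemble the behavioral set contains a member that is dominated on both objectives, which is one way to see why a threshold-based selection describes the spread of acceptable parameter sets rather than pinpointing a single optimum.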
Petra Döll et al.
Status: open (until 09 Jul 2023)
- RC1: 'Comment on hess-2023-18', Anonymous Referee #1, 07 Jun 2023
The study by Doell et al. compares three different strategies to reduce parameter uncertainty for the global hydrological model WaterGAP. The methods used are Borg, GLUE, and an ensemble Kalman filter, which the authors apply in a pilot study to the Mississippi basin. How best to estimate the parameters of global water models is an interesting and relevant question, to which the authors contribute. I do like the study and what the authors do and show, but I have some critical comments regarding how the work is currently presented and discussed. I outline my main comments below.
 The authors' use of sensitivity analysis is very nice and interesting, but the results are hardly discussed. I would have liked to see more detail on these results. For example, the precipitation multiplier is not selected as important. This is interesting, given that this parameter is often very relevant. Is this due to the monthly time step? The authors study a huge domain. How did sensitivity to the parameters vary across this domain? There are many insights to be gained from this analysis, but they are not discussed. I think this would be worth including rather than some other parts as suggested below.
 This is a very long paper with a lot of details on the model and the data that, at least to me as a reader, seem excessive and not needed to understand the main story presented. It makes reading the paper a bit tedious because most readers will not run WaterGAP and they might not even be interested in the extensive background information on the data (as part of the main story).
For example, lines 500-508 discuss problems with the GRACE data and how others have gone about reducing them. Is this really something I need to know to follow the story? I think text like this can go into the supplemental material without reducing the strength of the story told. On the contrary, it would make it better because I do not have to read through this background information unless I want to.
Lines 466-508 discuss details of the GRACE data and their uncertainties in (excessive) detail. At the same time, the authors spend one sentence on stating that two studies considered streamflow errors of about 10%, while the next sentence states that this is maybe a possible average but the variability is very large. The authors spend over 60 lines discussing GRACE and 6 (ok 7) lines discussing the other variable they use. I do not understand why the authors do not present a more balanced discussion given that both variables suffer from significant and potentially complex uncertainties.
 Starting from the back, i.e. the Outlook section, I wonder what transferable knowledge the authors contribute that is unrelated to using WaterGAP (and potentially the traditional approach to calibrating WaterGAP)?
My impression is that most of the conclusions are rather specific to the use of WaterGAP. I do not think that this is a problem per se, but it would be good if the authors would be clearer about general outcomes and those specific to WaterGAP. One problem in this context is that Discussion and Conclusions are combined and that this section is 7 pages long. I think these sections can be joined if this part of the paper is short, but here it is very long. A long discussion followed by a very short conclusions and outlook section would make it much easier for the reader. There the authors could also easily separate specific and general conclusions.
 The final recommendation to include uncertainty in climate change impact projections related to freshwater is good, but this is already widely done (see below). Can the authors be more specific regarding their recommendation? They could for example discuss this issue much more in the context of global models and the specific implications this has.
Just a few random examples from a quick online search:
 The connection to existing literature is in places very extensive and in others very brief. All methods used here have been previously assessed widely. Maybe not exactly in this combination, but certainly individually or in combination with other methods. I would therefore have expected that the authors help the reader to start from a more informed level.
For example, the (poor) ability of GLUE to identify the best parameter set has been explored in the past (see link below) and thus is what should be expected. The issue now is rather what relevance this has for the study at hand.
 While I possibly sound rather critical, I think this is an interesting and relevant study. My comments are simply meant to help the authors communicate their work with the readers. Shortening the paper, being clearer about specific and general contributions, and a better connection with existing literature would make it much easier for readers to understand the study and its relevance.