Simultaneous calibration of hydrological models in geographical space

András Bárdossy, Yingchun Huang, and Thorsten Wagener (https://orcid.org/0000-0003-3881-5849)

1 Institute for Modelling Hydraulic and Environmental Engineering, University of Stuttgart, Stuttgart, Germany
2 Department of Civil Engineering, Queen's School of Engineering, University of Bristol, Bristol, UK

Correspondence: Yingchun Huang (yingchun.huang@iws.uni-stuttgart.de)

Hydrology and Earth System Sciences (Hydrol. Earth Syst. Sci., HESS), 20(7), 2913–2928, ISSN 1607-7938, Copernicus Publications, Göttingen, Germany

DOI: 10.5194/hess-20-2913-2016

Published: 19 July 2016. Further article dates on record: 3 October 2015, 30 October 2015, 27 June 2016, 1 July 2016.

This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/

This article is available from https://hess.copernicus.org/articles/20/2913/2016/hess-20-2913-2016.html

The full text article is available as a PDF file from https://hess.copernicus.org/articles/20/2913/2016/hess-20-2913-2016.pdf

Hydrological models are usually calibrated for selected catchments individually using specific performance criteria. This procedure assumes that the catchments show individual behavior. As a consequence, the transfer of model parameters to other ungauged catchments is problematic. In this paper, the possibility of transferring part of the model parameters was investigated. Three different conceptual hydrological models were considered. The models were restructured by introducing a new parameter η which exclusively controls water balances. This parameter was considered as individual to each catchment. All other parameters, which mainly control the dynamics of the discharge (dynamical parameters), were considered for spatial transfer. Three hydrological models combined with three different performance measures were used in three different numerical experiments to investigate this transferability. The first numerical experiment, involving individual calibration of the models for 15 selected MOPEX catchments, showed that it is difficult to identify which catchments share common dynamical parameters. Parameters of one catchment might be good for another catchment but not vice versa. In the second numerical experiment, a common spatial calibration strategy was used. It was explicitly assumed that the catchments share common dynamical parameters. This strategy leads to parameters which perform well on all catchments. A leave-one-out common calibration showed that in this case a good parameter transfer to ungauged catchments can be achieved. In the third numerical experiment, the common calibration methodology was applied for 96 catchments. Another set of 96 catchments was used to test the transfer of common dynamical parameters. The results show that even a large number of catchments share similar dynamical parameters. The performance is worse than that obtained by individual calibration, but the transfer to ungauged catchments remains possible.
The performance of the common parameters in the second experiment was better than in the third, indicating that the selection of the catchments for common calibration is important.

Introduction

Hydrological models are widely used to describe catchment behavior and, subsequently, for water management, flood forecasting, and other purposes. Hydrological modeling is usually done for catchments with observed precipitation and discharge data. The unknown (and partly not measurable) parameters of a conceptual or, to some extent, physics-based model are adjusted in a calibration procedure to reproduce the measured discharge from the observed weather and catchment properties. Due to the high variability of catchment properties and hydrological behavior, this modeling procedure is usually performed individually for each catchment. Different catchments are often modeled using different models. This great variety of models and catchments makes a generalization of the description of the hydrological processes very challenging. Additionally, even for a selected model applied to a specific catchment, the parameter identification is not unique: a great number of parameter vectors might lead to very similar performance.

Moreover, because model calibration relies on measured discharge, estimating model parameters for ungauged basins is a big challenge. Instead of model calibration, parameters have to be estimated on the basis of other information. A decade of worldwide research efforts has been devoted to runoff prediction in ungauged basins (PUB). The PUB synthesis book takes a comparative approach to learning from similarities between catchments and summarizes a great number of interesting methods that are being used for predicting runoff regimes in ungauged basins. Many attempts have been made to develop catchment classification schemes to identify groups of catchments which behave similarly. However, the task is of great importance. The need for a widely accepted classification system has been discussed in the literature, where it was pointed out that a good classification would help to model the rainfall–runoff process for ungauged catchments.

Comprehensive reviews of regionalization methods for predicting streamflow in ungauged basins are available in the literature. Catchment similarity can be determined by comparing the corresponding discharge series using correlation or copulas. Much of the variability in discharge time series is controlled by the weather patterns. Therefore, it is likely that similarity in discharge is higher for catchments with well correlated weather, which often requires geographical closeness. However, discharge series produced by catchments can be very different under different meteorological conditions. Even the same catchment behaves differently in a dry and in a wet year. Due to the different weather forcing, the above methods would consider the same catchment in one time period as dissimilar to itself in another time period.

One can also define catchment similarity using hydrological models. Catchments are similar if they can be modeled reasonably well by the same model using the same model parameters. Due to observational errors and specific features in the calibration period, the adjustment of the model can be very specific to the observation period, leading to overcalibration. To overcome such limitations, a regional calibration approach has been suggested to identify single parameter sets that perform well for all catchments within the modeled domain. It has been reported that iterative regional calibration indeed reduces the uncertainty of most parameters. Regional calibration can result in better temporal robustness than normal individual calibration, and it provides an effective approach in large-scale hydrological assessments.

The focus of this paper is to investigate if the transformation of precipitation to discharge is possible independently of the weather. For this purpose, the hydrological model parameters are separated into two groups:

parameters describing the water balances which are strongly related to climate; and

parameters describing the dynamics of the runoff triggered by weather.

The second group of parameters is supposed to be weather independent and represent the focus of this paper. To simplify the problem, a single new parameter η was introduced to describe water balance. This parameter is conditional on the other model parameters and adjusts the long-term water balances.

The purpose of this paper is to investigate to what extent the different catchments share a similar dynamical rainfall–runoff behavior and can be modeled using the same model parameters, with the exception of the newly introduced individualized water balance parameter η.

Hydrological models are usually judged according to the degree of reproducing discharge dynamics and water balances. While water balances are mainly driven by weather in terms of precipitation, temperature, radiation, and wind, dynamics are controlled by catchment properties in terms of size, terrain, slopes, soils, etc. Formation of landscapes as a result of long-time climate is a quasi-equilibrium process. The hypothesis of this paper is that this equilibrium is mirrored in a similar dynamic behavior. Thus, a large number of catchments can be modeled by using the same dynamic parameters.

Three simple conceptual hydrological models combined with three different performance measures are used to describe the rainfall–runoff behavior on the daily timescale for a large number of catchments.

Location of the catchments selected for the experiments.

The following three different numerical experiments, including calibration and validation procedures, are carried out for different sets of selected catchments:

The usual catchment-by-catchment calibration is carried out. In order to test if dynamical model parameters are shared, the parameters are directly transferred to all other catchments.

Instead of the traditional catchment-by-catchment calibration, it is assumed that the model parameters are similar for a set of catchments in a close geometrical setting. Thus, a simultaneous calibration of the models is carried out and tested both in a gauged and an ungauged version.

The geographical extent of the catchments used for simultaneous calibration is expanded. A great number of assumed ungauged catchments are used for testing the hypothesis.

The hypothesis is that the rainfall–runoff process can be described using the same dynamical hydrological model parameters for a number of catchments. The very different climatic conditions and water balances of the catchments are considered by the newly introduced specific parameter η controlling the long-term water balance of each catchment individually. The other model parameters control the discharge dynamics on both short and long timescales. These dynamical parameters are supposed to be shared despite the great heterogeneity of the catchments. This procedure simplifies the hydrological model parameter estimation for ungauged catchments, namely the procedure is reduced to the estimation of a single parameter η, which can be related to long-term water balances.

The paper is structured as follows: after the introduction, the investigation area is described. This is followed by a description of the three conceptual hydrological models and the three performance criteria used for calibration and validation. In Sect. 4, the new model parameter η controlling the water balance is introduced. In Sects. 5–7, three numerical experiments are described and the results are presented, starting with the individual calibration of the models and ending with a transfer of the model parameters to randomly selected catchments. The paper concludes with a discussion of the results.

Investigation area and available data

The study area is the eastern United States. Locations of the 196 catchments used in this study are shown in Fig. . The catchments form a subset of the catchments used in the international Model Parameter Estimation Experiment (MOPEX) project. Catchments range in size from 134 to 9889 km² and exhibit aridity indices (ratio of long-term potential evapotranspiration to precipitation) between 0.41 and 3.3, hence representing a heterogeneous data set. Time series data of daily streamflow, precipitation, and temperature for all catchments were provided by the MOPEX project. Catchments within this data set are minimally impacted by human influences. Streamflow information within this data set was originally provided by the United States Geological Survey (USGS) gauges, while precipitation and temperature were supplied by the National Climatic Data Center (NCDC). The MOPEX data set has been used widely for hydrological model comparison studies.

Hydrological models and performance criteria

Three simple conceptual hydrological models were applied in this study. The reason for this is that the great number of calibration and validation experiments could only be performed with relatively simple model structures. It is important to see if the results are similar for different models and performance measures. In a subsequent study, spatially distributed models will be considered.

HYMOD model

The HYMOD model is a conceptual rainfall–runoff model derived from the Probability Distributed Model. The soil moisture accounting module of HYMOD utilizes a Pareto distribution function of storage elements of varying sizes. The storage elements of the catchment are distributed according to a probability density function defined by the maximum soil moisture storage CMAX and the distribution of soil moisture store b. Evaporation from the soil moisture store occurs at the rate of the potential evaporation, estimated using the Hamon approach. After evaporation, the remaining rainfall and snowmelt are used to fill the soil moisture stores. A routing module divides the excess rainfall using a split parameter α, which separates fluxes between two parallel conceptual linear reservoirs meant to simulate the quick and slow flow response of the system (defined by residence times $k_q$ and $k_s$).
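The routing step described above can be illustrated with a minimal sketch. It only covers the split by α and the two parallel linear reservoirs; the daily explicit update, the function name `route_parallel`, and the requirement that residence times are at least one day are assumptions made for this illustration and are not taken from the HYMOD code used in the study.

```python
from typing import List, Sequence, Tuple


def route_parallel(excess: Sequence[float], alpha: float,
                   kq: float, ks: float) -> Tuple[List[float], float]:
    """Route daily excess rainfall through a quick and a slow linear reservoir.

    alpha splits the excess between the quick (alpha) and slow (1 - alpha)
    reservoir; kq and ks are residence times in days (assumed >= 1 here so
    that the explicit daily update never drains more than is stored).
    Returns the simulated daily flows and the water left in storage.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if kq < 1.0 or ks < 1.0:
        raise ValueError("residence times must be at least one day")
    storage_q = 0.0
    storage_s = 0.0
    flows = []
    for x in excess:
        storage_q += alpha * x            # quick-flow share of the excess
        storage_s += (1.0 - alpha) * x    # slow-flow share of the excess
        out_q = storage_q / kq            # linear reservoir: outflow = S / k
        out_s = storage_s / ks
        storage_q -= out_q
        storage_s -= out_s
        flows.append(out_q + out_s)
    return flows, storage_q + storage_s


if __name__ == "__main__":
    # A single made-up pulse of 10 mm excess rainfall followed by dry days.
    q, left = route_parallel([10.0, 0.0, 0.0, 0.0], alpha=0.5, kq=2.0, ks=10.0)
    print(q, left)
```

With a short quick residence time and a long slow one, the pulse produces a sharp early response followed by a long recession, which is the qualitative behavior the two reservoirs are meant to capture.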

HBV model

The HBV model is a conceptual model and was originally developed at the Swedish Meteorological and Hydrological Institute (SMHI). Snow accumulation and melt, actual soil moisture, and runoff generation are calculated using conceptual routines. The snow accumulation and melt is based on the degree-day approach. Actual soil moisture is calculated by considering precipitation and evapotranspiration. Runoff generation is estimated by a nonlinear function of actual soil moisture and precipitation. The dynamics of the different flow components at the subcatchment scale are conceptually represented by two linear reservoirs. The upper reservoir simulates the near surface and interflow in the subsurface layer, while the lower reservoir represents the base flow. They are connected through a linear percolation rate. Finally, there is a transformation function consisting of a triangular weighting function with one free parameter for smoothing the generated flow.
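As an illustration of the degree-day approach mentioned above, the following sketch performs one daily snow step. It is a simplified textbook form, not the routine used in the study: the single threshold temperature for both snowfall and melt, the function name `snow_step`, and the default values of `ddf` and `t_crit` are assumptions chosen only for the example.

```python
from typing import Tuple


def snow_step(snowpack: float, precip: float, temp: float,
              ddf: float = 3.0, t_crit: float = 0.0) -> Tuple[float, float]:
    """One daily degree-day step.

    snowpack: snow water equivalent stored at the start of the day (mm)
    precip:   daily precipitation (mm)
    temp:     daily mean air temperature (deg C)
    ddf:      degree-day factor (mm per deg C per day), hypothetical default
    t_crit:   threshold temperature (deg C), hypothetical default
    Returns (new snowpack, liquid water released to the soil routine).
    """
    if temp <= t_crit:
        # Cold day: precipitation accumulates as snow, nothing is released.
        return snowpack + precip, 0.0
    # Warm day: melt is proportional to the degrees above the threshold,
    # but cannot exceed the snow that is actually stored.
    melt = min(ddf * (temp - t_crit), snowpack)
    return snowpack - melt, precip + melt


if __name__ == "__main__":
    pack = 0.0
    for p, t in [(5.0, -3.0), (8.0, -1.0), (0.0, 2.0), (2.0, 4.0)]:
        pack, water = snow_step(pack, p, t)
        print(pack, water)
```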

Xinanjiang model (XAJ)

The Xinanjiang (XAJ) model was established in the early 1970s in China. This conceptual rainfall–runoff model has been applied to a large number of basins in the humid and semi-humid regions of China. The lumped version of the XAJ model consists of four main components. The evapotranspiration is represented by a three-layer soil moisture module which differentiates upper, lower, and deeper soil layers. Runoff production is calculated based on rainfall and soil storage deficit; a tension water capacity curve is introduced to provide for a nonuniform distribution of tension water capacity throughout the whole catchment. The runoff separation module separates the determined runoff into three parts, namely surface runoff, interflow, and groundwater. The flow routing module transfers the local runoff to the outlet of the basin. In order to account for the precipitation that is contributed from snowmelt, the degree-day snowmelt approach is added to this model. In this study, the model has 16 parameters which can be adjusted using calibration.

Performance criteria

Model calibration depends strongly on the performance criteria used. In order to obtain reasonably general results, three different criteria were selected to evaluate model performance.

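The Nash–Sutcliffe efficiency between the observed and modeled flow, the most frequently used measure, is taken as the first evaluation criterion:

$$O^{(1)}:\quad \mathrm{NS} = 1 - \frac{\sum_{t=1}^{T}\left(Q_o(t) - Q_m(t)\right)^2}{\sum_{t=1}^{T}\left(Q_o(t) - \overline{Q_o}\right)^2}.$$

Here, $Q_o(t)$ is the observed discharge and $Q_m(t)$ is the modeled discharge on a given day $t$, $\overline{Q_o}$ is the mean observed discharge, and $T$ is the number of days. The abbreviation NS is used subsequently for this performance measure.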
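A minimal sketch of how NS can be computed from two discharge series is given below. The function name `nash_sutcliffe` and the example values are illustrative assumptions; only the formula itself follows the definition above.

```python
from typing import Sequence


def nash_sutcliffe(q_obs: Sequence[float], q_mod: Sequence[float]) -> float:
    """Return NS = 1 - sum((Qo - Qm)^2) / sum((Qo - mean(Qo))^2)."""
    if len(q_obs) != len(q_mod) or len(q_obs) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(q_obs) / len(q_obs)
    squared_error = sum((o - m) ** 2 for o, m in zip(q_obs, q_mod))
    squared_dev = sum((o - mean_obs) ** 2 for o in q_obs)
    if squared_dev == 0.0:
        raise ValueError("observed series has zero variance")
    return 1.0 - squared_error / squared_dev


if __name__ == "__main__":
    observed = [1.0, 3.0, 2.0, 5.0, 4.0]   # made-up daily discharges
    modeled = [1.5, 2.5, 2.0, 4.0, 4.5]    # made-up model output
    print(nash_sutcliffe(observed, modeled))
```

A perfect simulation gives NS = 1, while a model that always returns the observed mean gives NS = 0.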

The NS model performance criterion has often been criticized, and several modifications and other criteria have been suggested. One interesting suggestion is to use a performance measure which accounts for the water balances and the correlation of the observed and modeled time series separately. This approach was slightly modified and the following performance criterion was introduced:

$$O^{(2)}:\quad \mathrm{GK} = 1 - \left(\beta\,\frac{\sum_{t=1}^{T}\left(Q_o(t) - Q_m(t)\right)}{\sum_{t=1}^{T} Q_o(t)}\right)^{2} - \left(1 - r(Q_o, Q_m)\right)^{2}.$$

Here, $r(Q_o, Q_m)$ is the correlation coefficient between the observed and modeled time series of discharge, and β is a weight expressing the importance of the water balance. In our study, β = 5 was selected. The reason for selecting this version of the coefficient is that a model should produce good water balances and appropriate discharge dynamics simultaneously. The quadratic form of GK assures that both aspects are considered and that the worse of them is dominant. The abbreviation GK is used subsequently for this performance measure.
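The following sketch shows one way to evaluate GK, assuming the criterion has the form GK = 1 − (β · Σ(Qo − Qm) / ΣQo)² − (1 − r)² with r the Pearson correlation, as read from the definition in the text. The function names `pearson_r` and `gk` are illustrative assumptions.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def gk(q_obs: Sequence[float], q_mod: Sequence[float], beta: float = 5.0) -> float:
    """GK = 1 - (beta * relative volume error)^2 - (1 - r)^2 (assumed form)."""
    if len(q_obs) != len(q_mod) or len(q_obs) == 0:
        raise ValueError("series must be non-empty and of equal length")
    volume_error = sum(o - m for o, m in zip(q_obs, q_mod)) / sum(q_obs)
    r = pearson_r(q_obs, q_mod)
    return 1.0 - (beta * volume_error) ** 2 - (1.0 - r) ** 2


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]            # made-up discharges
    scaled = [0.9 * q for q in observed]       # perfect timing, 10 % too little water
    print(gk(observed, scaled))
```

In the usage example the modeled series is perfectly correlated with the observations but misses 10 % of the volume; with β = 5 this water balance error alone costs about 0.25, which illustrates how strongly the weight penalizes volume errors.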

The Nash–Sutcliffe coefficient of the logarithm of the discharges focuses more on low flow conditions than the traditional NS coefficient:

$$\mathrm{LNS} = 1 - \frac{\sum_{t=1}^{T}\left(\log Q_o(t) - \log Q_m(t)\right)^2}{\sum_{t=1}^{T}\left(\log Q_o(t) - \overline{\log Q_o}\right)^2}.$$

To concentrate equally on high and low flows, a combination of the original NS and the logarithmic NS is used as a third measure:

$$O^{(3)}:\quad \mathrm{NS{+}LNS} = \frac{\mathrm{NS} + \mathrm{LNS}}{2}.$$

The abbreviation NS + LNS is used subsequently for this performance measure.

All three performance criteria are formulated such that a higher value indicates a better model. Furthermore, the best possible value of each criterion is 1.

Method Model parameter to control water balance

Climatic conditions are of central importance for water balances. The relationship of potential to actual evapotranspiration can differ strongly due to water or energy limitations. This suggests that catchments might have similar dynamical behavior but with different water balances. In order to account for this, the model parameters could be separated to form two groups, one group with parameters controlling the water balances and another controlling the discharge dynamics. This separation of existing model parameters is difficult, as they often simultaneously influence both components. Instead of an artificial model-specific separation, a new parameter η was introduced to all three models. This parameter controls the ratio between daily potential and actual evapotranspiration depending on the available water and depends on the long-term water balance only. This parameter η gives

$$E_{ta} = \begin{cases} E_{tp} & \text{if } \dfrac{\mathrm{SM}}{\mathrm{CMAX}} > \eta, \\[2ex] \min\left(\dfrac{\mathrm{SM}}{\eta \cdot \mathrm{CMAX}}\,E_{tp},\ \mathrm{SM}\right) & \text{else.} \end{cases}$$

Here, SM is the actual soil water available for evapotranspiration. CMAX is the maximum possible soil moisture. $E_{tp}$ stands for the potential and $E_{ta}$ for the actual evapotranspiration, respectively.
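The piecewise rule for the actual evapotranspiration can be written as a small function. This is a sketch under stated assumptions: the function name `actual_et` is hypothetical, and the explicit guard for an empty soil store is added here only to avoid a division by zero when η = 0; it is not part of the description in the text.

```python
def actual_et(etp: float, sm: float, cmax: float, eta: float) -> float:
    """Actual evapotranspiration for one day.

    etp:  potential evapotranspiration (mm)
    sm:   soil water available for evapotranspiration (mm)
    cmax: maximum possible soil moisture (mm), must be positive
    eta:  water balance parameter in [0, 1]
    """
    if cmax <= 0.0:
        raise ValueError("cmax must be positive")
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if sm <= 0.0:
        return 0.0                      # assumed guard: nothing to evaporate
    if sm / cmax > eta:
        return etp                      # wet enough: evaporate at potential rate
    # Drier soil: scale the potential rate and never take more than is stored.
    return min(sm / (eta * cmax) * etp, sm)


if __name__ == "__main__":
    # Made-up values: the same soil state under two different eta values.
    print(actual_et(etp=4.0, sm=25.0, cmax=100.0, eta=0.2))
    print(actual_et(etp=4.0, sm=25.0, cmax=100.0, eta=0.5))
```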

The parameter η regulates the water balances in accordance with the dynamical parameters. It can be calculated directly for each parameter vector θ, which is necessary because η is meant to establish correct water balances. Thus, parameter η depends on the catchment and on the parameter vector θ. Here, $f(\eta) = V_i^M(\eta, \theta)$ is a monotonically decreasing function of η. If the model can provide correct long-term water balances, then

$$V_i^M(1, \theta) < V_i^O < V_i^M(0, \theta).$$

As $f(\eta) = V_i^M(\eta, \theta)$ is continuous, there is a unique $\eta(\theta)$ for which

$$V_i^M(\eta(\theta), \theta) = V_i^O.$$

If the inequality above is not fulfilled, then the parameter vector θ is not appropriate for the model.

The parameter η is fitted individually for each θ, and this way a correct water balance is assured for the calibration period.
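Because the water balance function is continuous and monotonically decreasing in η, the matching value of η can be found by bisection on [0, 1]. The sketch below illustrates this idea; bisection is a choice made here for illustration (the text does not state which root-finding method was used), and `solve_eta` as well as the linear stand-in `toy_volume` are hypothetical.

```python
from typing import Callable


def solve_eta(f: Callable[[float], float], target: float,
              tol: float = 1e-8, max_iter: int = 200) -> float:
    """Find eta in [0, 1] with f(eta) = target for a decreasing, continuous f.

    Raises ValueError if the target is not bracketed, i.e. if
    f(1) < target < f(0) does not hold; the parameter vector behind f
    is then considered inappropriate.
    """
    lo, hi = 0.0, 1.0
    if not (f(hi) < target < f(lo)):
        raise ValueError("target not bracketed by f(1) and f(0)")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid        # f is decreasing, so the solution lies to the right
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def toy_volume(eta: float) -> float:
    """Made-up stand-in for the modeled long-term water balance of one catchment."""
    return 600.0 - 300.0 * eta


if __name__ == "__main__":
    print(solve_eta(toy_volume, target=450.0))
```

In a real application, each evaluation of `f` would require a full model run for the calibration period with the dynamical parameters held fixed.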

Experimental design

In this study, the ROPE algorithm was applied for model parameter optimization. This parameter optimization method yields a predetermined number of near-optimal parameter sets that perform very similarly, although the parameter sets themselves are very heterogeneous. In this study, each calibration yielded a convex set of good parameter vectors, from which 10 000 parameter vectors were generated. Three numerical experiments on a large number of catchments were carried out to investigate the transferability of the model parameters under different calibration strategies. For clarity, the procedure and results of these three experiments are presented in the following three sections.

Numerical experiment 1: individual calibration and parameter transfer

The first experiment is designed to test the transferability of the model parameters under the usual individual calibration for each catchment.

As a first step, 15 catchments with reliable data and slightly varying catchment properties in the eastern US were selected. Locations of the selected gauges are marked as the red plus symbols in Fig. . Table lists the basic catchment properties and Table summarizes the meteorological conditions for the selected 15 catchments. The tables show that despite their geographical proximity, these catchments have quite different climate and hydrographic properties.

For the 15 selected catchments, an individual calibration was performed using all three models and all three performance measures. Data series from 1951 to 2000 were split up into five subperiods. This leads to 45 calibrations for each catchment (three models × three performance measures × five subperiods). Each calibration yielded a convex set $G_i$ of good parameters for catchment $i$. A total of 10 000 parameter vectors were generated from each of these sets. (Note that the corresponding parameter η was estimated for each element of the parameter set separately.)

Let $O_i^{(j)}(\theta)$ denote the value of the objective function $j$ for a parameter vector θ in catchment $i$. The best objective function value for each individual catchment is denoted by $O_i^{(j)*}$. The parameter sets display substantial equifinality as all of them perform very similarly. For simplicity, we used the average value of the 10 000 performances to represent the simulation result for each catchment.

The left part of Fig. shows the mean values of the objective function NS for the 10 000 parameter vectors for the calibration period 1971–1980 for the three selected models (denoted as individual calibration). As expected, the model performance varies across catchments. The reasons for this are observation errors both in input and output as well as a possible inability of the model to represent the main hydrological processes reasonably well.

Catchment properties for the selected 15 catchments.

| Streamgauge ID | Streamgauge name | Drainage area (km²) | Shape factor | Field capacity | Average porosity | Base flow index | Snow proportion (%) |
|---|---|---|---|---|---|---|---|
| 01548500 | Pine Creek at Cedar Run, PA | 1564 | 0.14 | 0.32 | 0.42 | 0.44 | 26.6 |
| 01606500 | So. Branch Potomac River near Petersburg, WV | 1663 | 0.15 | 0.31 | 0.28 | 0.45 | 19.5 |
| 01611500 | Cacapon River near Great Cacapon, WV | 1753 | 0.17 | 0.269 | 0.27 | 0.41 | 15.6 |
| 01663500 | Hazel River at Rixeyville, VA | 743 | 0.16 | 0.30 | 0.39 | 0.51 | 12.1 |
| 01664000 | Rappahannock River at Remington, VA | 1606 | 0.11 | 0.294 | 0.40 | 0.50 | 11.8 |
| 01667500 | Rapidan River near Culpeper, VA | 1222 | 0.13 | 0.32 | 0.40 | 0.51 | 10.6 |
| 02016000 | Cowpasture River near Clifton Forge, VA | 1194 | 0.18 | 0.28 | 0.27 | 0.43 | 16.0 |
| 02018000 | Craig Creek at Parr, VA | 852 | 0.24 | 0.27 | 0.30 | 0.44 | 11.3 |
| 02030500 | Slate River near Arvonia, VA | 585 | 0.20 | 0.30 | 0.46 | 0.48 | 8.5 |
| 03114500 | Middle Island Creek at Little, WV | 1186 | 0.14 | 0.36 | 0.27 | 0.21 | 15.6 |
| 03155500 | Hughes River at Cisco, WV | 1171 | 0.14 | 0.36 | 0.27 | 0.22 | 14.9 |
| 03164000 | New River near Galax, VA | 2929 | 0.09 | 0.29 | 0.43 | 0.64 | 13.3 |
| 03173000 | Walker Creek at Bane, VA | 790 | 0.24 | 0.32 | 0.37 | 0.46 | 13.5 |
| 03180500 | Greenbrier River at Durbin, WV | 344 | 0.26 | 0.36 | 0.27 | 0.37 | 25.3 |
| 03186500 | Williams River at Dyer, WV | 332 | 0.33 | 0.36 | 0.28 | 0.36 | 24.3 |

Climate variables of the 15 selected catchments.

| No. | Streamgauge ID | Annual precipitation (mm) | Average temperature (°C) | Annual potential evapotranspiration (mm) | Annual runoff (mm) |
|---|---|---|---|---|---|
| 1 | 01548500 | 951.7 | 7.2 | 727.0 | 495.1 |
| 2 | 01606500 | 948.6 | 10.3 | 716.3 | 378.3 |
| 3 | 01611500 | 905.6 | 10.8 | 800.0 | 310.5 |
| 4 | 01663500 | 1049.9 | 11.7 | 897.2 | 402.6 |
| 5 | 01664000 | 1027.7 | 12.0 | 906.1 | 367.5 |
| 6 | 01667500 | 1087.4 | 12.3 | 915.2 | 380.4 |
| 7 | 02016000 | 1029.5 | 11.0 | 746.0 | 402.9 |
| 8 | 02018000 | 1010.6 | 11.4 | 764.6 | 406.3 |
| 9 | 02030500 | 1075.9 | 13.5 | 918.2 | 350.3 |
| 10 | 03114500 | 1089.7 | 11.4 | 737.4 | 483.9 |
| 11 | 03155500 | 1057.8 | 11.6 | 740.0 | 443.7 |
| 12 | 03164000 | 1247.9 | 10.6 | 807.4 | 593.3 |
| 13 | 03173000 | 958.6 | 11.1 | 762.7 | 371.9 |
| 14 | 03180500 | 1224.2 | 8.3 | 710.9 | 543.2 |
| 15 | 03186500 | 1401.5 | 9.1 | 710.9 | 945.0 |

Performance of the individually calibrated and the common calibrated models using NS as performance criterion.

The ranges of the model parameters are relatively large. As a first step, we checked if the catchments have common parameter vectors. For each pair of catchments $(i, j)$, for the same performance measure and time period, the intersection of the convex hulls of the good parameter sets, $G_i \cap G_j$, is empty, showing that there are no common best parameters. From this result, seemingly none of the catchments are similar.

As a next step, the 10 000 generated best dynamical parameter vectors for a given time period and hydrological model obtained for catchment $i$ were applied to model all other catchments using the same hydrological model and time period. Note that the value of η is not transferred but adjusted to the true long-term water balance. In the numerical experiments, we assume that the long-term discharge volumes are known for all simulations. This assumption highlights the issue of estimating the real water balance in ungauged basins, which will be addressed in the discussion. Figure shows the color-coded matrices for the mean NS performance and GK performance of the three hydrological models using transferred parameters for all 15 catchments for a calibration period (1971–1980).

Color-coded matrices for the mean model performance of the parameter transfer for the selected 15 catchments. The upper panel used NS as performance measure, the lower panel used GK as performance measure.

The performance of the transferred parameter vectors displays a strongly varying picture. While in some cases the catchments seem to share parameter vectors with reasonably good performance, in other cases the transfer led to weak performances. A further surprising fact is that none of the matrices are symmetrical. One can see that some catchments are good donors as their parameters are good for nearly all catchments, while others have parameters which are hardly transferable.
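The notions of good donors, good receivers, and asymmetry can be made concrete with a small sketch operating on a transfer matrix. The orientation (rows as donors, columns as receivers), the function names, and the 3 × 3 matrix of values are assumptions made for illustration; the numbers are invented and are not results from the study.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def donor_scores(m: Matrix) -> List[float]:
    """Mean performance of each catchment's parameters on all other catchments."""
    n = len(m)
    return [sum(m[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]


def receiver_scores(m: Matrix) -> List[float]:
    """Mean performance obtained on each catchment with parameters from the others."""
    n = len(m)
    return [sum(m[i][j] for i in range(n) if i != j) / (n - 1) for j in range(n)]


def max_asymmetry(m: Matrix) -> float:
    """Largest absolute difference between m[i][j] and m[j][i]."""
    n = len(m)
    return max(abs(m[i][j] - m[j][i]) for i in range(n) for j in range(i + 1, n))


# Invented example: entry [i][j] is the mean performance when parameters
# calibrated on catchment i are applied to catchment j.
EXAMPLE = [
    [0.80, 0.60, 0.20],
    [0.70, 0.90, 0.30],
    [0.75, 0.65, 0.85],
]

if __name__ == "__main__":
    print(donor_scores(EXAMPLE))
    print(receiver_scores(EXAMPLE))
    print(max_asymmetry(EXAMPLE))
```

In this invented matrix the third catchment is the best donor but the worst receiver, which is the kind of asymmetric pattern described above.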

The asymmetry of the parameter transfer matrices cannot be explained by catchment properties. Two catchments may appear to share well-performing parameters when calibrated on one of them, yet have no common good parameters when calibrated on the other. Take catchments 1 and 12 with the NS performance as an example. For all three models, parameters calibrated for catchment 1 are not suitable for catchment 12, but parameters of catchment 12 perform reasonably well for catchment 1. From the observation data, we found that catchment 12 was under relatively dry climate conditions during the calibration period. We also found, from the simulated hydrographs, that the parameter sets calibrated on catchment 1 could not adequately capture the dynamic behavior of catchment 12: the low flows were underestimated most of the time and the peak flows were clearly overestimated.

The matrices for NS show different performances for the different models. In general, the HBV model performs best: the average value of the matrix is 0.62 for HBV, 0.55 for HYMOD, and 0.54 for XAJ. Furthermore, the correlations of transferred model performance between different models are all greater than 0.7. From the viewpoint of parameter transferability, the three models thus behave similarly: if a parameter transfer from catchment i to catchment j is reasonable for one model, then it is also reasonable for the other models. The results for the GK performance differ from those for the NS performance. Here, the XAJ model seems to give the most transferable parameters overall. Parameter vectors from other catchments generally fail to perform on catchment 15 across all three models.

The difference of the transferability for these two performance measures could be explained by different focuses; while NS is mainly focusing on the squared difference between the observed and modeled discharge, GK focuses on water balances and good timing, and the combination of NS and LNS is strongly influenced by low flow events. It is interesting to observe that catchment 12 is a very bad receiver for model parameters for NS, while it is an excellent receiver for GK. This means that different events have different influence on the performance. A possible explanation for the asymmetry is the fact that the catchments have different weather forcing in the calibration period. It could be that runoff events which are most important for a performance measure occur in the calibration period frequently in one catchment leading to good transferability, and seldom in the other, causing weak transferability of the parameters from one catchment to another.

The transferability of the model parameters was also tested for an independent validation period between 1991 and 2000. Figure shows the corresponding color-coded results for NS as performance measure. The matrices are similar to those obtained for calibration. Catchment 12 remained a bad receiver but a good donor, indicating that the bad performance is unlikely to be caused by observation errors. Further, for some columns the off-diagonal elements are larger than the diagonal ones which is a sign of a possible overcalibration of models.

To investigate the influence of climate on calibration, the hydrological models calibrated for different time periods using the same model and performance measure were compared. As the different time periods represent different climate conditions, the calibrations led to different parameter sets. As a comparison, the differences in calibrated model parameters using the same model and performance measure for different catchments were compared. As an example, the left part of Fig. shows two calibrated parameters of the HYMOD model for catchment 13 on three different 10-year time periods. The right part of Fig. shows the same parameters obtained by calibration for three different catchments (7, 8 and 13) during the time period 1951–1960. The structural similarity of the two scatterplots suggests that the difference between the different catchments is comparable to the difference between the different time periods. In hydrological modeling, it is usually assumed that model parameters are constant over time, assuming no significant change in climate or other characteristics. The results, however, show the assumption that parameters are the same over space is not completely unrealistic. The figures even suggest that there might be parameter vectors which perform reasonably well for all 15 catchments. As a next step, an experiment to test this assumption was devised.

Color-coded matrices for the mean NS model performance of the parameter transfer for the validation period for the selected 15 catchments.

Scatterplots for two selected HYMOD parameters (CMAX and α) obtained via model calibration using NS as performance measures. Left panel: for catchment 13 (black: 1951–1960, blue: 1971–1980, and red: 1991–2000); Right panel: for catchments 7 (red), 8 (blue), and 13 (black) for 1951–1960.

Numerical experiment 2: simultaneous calibration

For many pairs of catchments, the parameter transfer worked reasonably well. As a next step, we investigated if there are parameters which perform reasonably well for all catchments. As seen in the previous section, none of the catchments share optimal parameters. Therefore, common suboptimal parameters have to be found.

In order to identify parameter vectors which perform simultaneously well for each catchment, the hydrological models were calibrated for all 15 catchments simultaneously. The simultaneous calibration of the model for all catchments is a multi-objective optimization problem. The goal is to find parameter vectors which are almost equally good for all catchments without exception. As the models perform differently for the different catchments due to data quality and catchment particularities, the performance was measured through the loss in performance compared to the usual individual calibration. Thus, the objective function was formulated following the compromise programming method:

$$R^{(j)}(\theta) = \sum_{i=1}^{n}\left(O_i^{(j)*} - O_i^{(j)}(\theta)\right)^{p}.$$

Here, index $i$ indicates the catchment number, $n$ is the number of catchments, and index $j$ indicates the type of the individual performance measure ($O^{(1)}$: NS, $O^{(2)}$: GK, $O^{(3)}$: NS + LNS). The goal is to minimize $R^{(j)}$. Here, $p$ is the so-called balancing factor. The larger the value of $p$, the more the biggest loss in performance contributes to the common performance. In order to obtain parameters which are good for all catchments, a relatively high $p = 4$ was selected for all three performance measures.
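The effect of the balancing factor can be illustrated with a short sketch of the common objective. The function name `common_loss` and the performance values are invented for this example. Note that the sum is taken exactly as in the formula, so for an odd exponent a negative loss (a transferred vector beating the individual optimum) would reduce the total.

```python
from typing import Sequence


def common_loss(best: Sequence[float], perf: Sequence[float], p: int = 4) -> float:
    """Sum over catchments of (individual optimum - performance of theta) ** p."""
    if len(best) != len(perf):
        raise ValueError("one value per catchment is required in both sequences")
    return sum((b - o) ** p for b, o in zip(best, perf))


# Invented values for three catchments.
BEST = [0.80, 0.70, 0.90]            # performance of the individual calibrations
EVEN = [0.70, 0.60, 0.80]            # candidate A: moderate loss everywhere
CONCENTRATED = [0.80, 0.70, 0.65]    # candidate B: no loss twice, one large loss

if __name__ == "__main__":
    for p in (1, 4):
        print(p, common_loss(BEST, EVEN, p), common_loss(BEST, CONCENTRATED, p))
```

With p = 1 candidate B has the smaller total loss, whereas with p = 4 the single large loss dominates and candidate A is preferred. This is why a relatively high exponent favors parameter vectors that are acceptable on every catchment.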

In the same way as for the individual calibration, the ROPE algorithm was used for the simultaneous calibration. The optimized parameter sets H(j) perform well simultaneously on all catchments for each model and time period. The left part of Fig. compares the performance of the individually calibrated and the common calibration for the 15 selected catchments using NS as performance criterion. As expected, the results show that the individual calibrations led to better performances, but the joint parameter vectors perform reasonably well for all catchments.

Mean NS model performance of the calibration, individual parameter transfer, and for the leave-one-out transfer for the selected 15 catchments for the calibration time period 1971–1980. Left panel: HBV, right panel: HYMOD.

Mean NS model performance of the calibration, individual parameter transfer, and for the leave-one-out transfer for the selected 15 catchments for the validation time period 1991–2000. Left panel: HBV, right panel: HYMOD.

As the goal of modeling is not the reconstruction of already observed data, the performances on a different validation period (1991–2000) were also compared. The right part of Fig. shows the mean model performances for the 15 individually calibrated and the common calibrated data sets. The observation that parameter vectors obtained through common calibration may outperform individual on-site calibration may also indicate the weakness of the calibration process for an individual catchment, which should ideally be able to identify the best parameter set.

Runoff hydrographs for catchment 14 obtained using individual and leave-one-out common calibrations of HBV using the GK performance measure.

Runoff hydrographs for catchment 5 obtained using individual and leave-one-out common calibrations of HBV using the NS performance measure.

These results indicate that instead of transferring model parameters from a single catchment, a parameter transfer might perform better if the parameters obtained through common calibration on all other catchments are used. In order to test this kind of parameter transfer, a set of simple leave-one-out calibrations was performed. This means that for a catchment i, the hydrological models were simultaneously calibrated for the remaining 14 catchments. Each time, a different catchment i was left out of the calibration, leading to 15 simultaneous calibrations. These common model parameters were then applied to the catchment which was left out. The performance of the models on these catchments in the calibration period is reasonably good for all catchments. Figure shows the result of HBV and HYMOD using the NS performance measure. It compares the performance of the parameters obtained via individual calibrations (red x mark), parameter transfers from other catchments individually (blue plus), and the transfer of the common parameters obtained by the leave-one-out procedure (green diamond). The performance of the common parameters is weaker than that of the individual calibration but better than many of the transfers of individually calibrated parameters from other catchments.

To test the potential of the transferability of the common parameters, a validation period was used. Figure shows the results for the validation time period 1991–2000. In this case, the common calibration performs very well. For HYMOD, it outperforms the parameter vectors obtained by individual calibration for 6 out of the 15 catchments. For the other catchments, the loss in performance is relatively small. Note that this good performance of the common models was obtained without using any information from the target catchment. The transfer of parameters obtained from individual calibrations on other catchments shows a highly heterogeneous picture, as described in experiment 1. The transferred common calibration is better than most of these performances. Further, note that the results of experiment 1 provide no explanation for why certain transfers work well and others do not. Thus, for the transfer of model parameters to ungauged catchments, common calibration seems to be a reasonable method.
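The leave-one-out procedure described above can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it replaces the ROPE search with an exhaustive search over a precomputed table of candidate parameter vectors, and the table `perf`, the function name, and all numbers are hypothetical.

```python
import numpy as np

def leave_one_out_transfer(perf, o_individual, p=4):
    """Leave-one-out common calibration over a table of candidates.

    perf[k, i]:      performance of candidate parameter vector k on catchment i.
    o_individual[i]: performance from individual calibration of catchment i.
    Returns the index of the chosen common vector for each left-out catchment
    and the performance of that vector on the left-out catchment.
    """
    perf = np.asarray(perf, dtype=float)
    o_individual = np.asarray(o_individual, dtype=float)
    n_catchments = perf.shape[1]
    chosen, transferred = [], []
    for i in range(n_catchments):
        # Donors are all catchments except the one treated as ungauged.
        donors = [c for c in range(n_catchments) if c != i]
        # Compromise-programming objective on the donors only
        # (negative losses clipped at zero, an assumption of this sketch).
        loss = np.clip(o_individual[donors] - perf[:, donors], 0.0, None)
        r = np.sum(loss ** p, axis=1)
        k = int(np.argmin(r))
        chosen.append(k)
        transferred.append(perf[k, i])  # catchment i was never used to pick k
    return chosen, np.array(transferred)

# Hypothetical table: 3 candidate vectors (rows) on 3 catchments (columns).
toy_perf = [[0.8, 0.3, 0.3],
            [0.6, 0.6, 0.6],
            [0.3, 0.3, 0.8]]
toy_individual = [0.8, 0.6, 0.8]
chosen, transferred = leave_one_out_transfer(toy_perf, toy_individual)
```

In this toy table the balanced candidate (row 1) is selected for every left-out catchment, because candidates that are excellent on one donor but poor on another are heavily penalized by p = 4.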

In order to illustrate how model parameters of the leave-one-out common calibration perform in validation, two hydrographs are presented. Figures and show a part of the observed, the modeled, and the common calibration transferred hydrographs for a randomly selected parameter set obtained by individual calibration and leave-one-out common calibration of HBV for catchments 5 and 14. While for catchment 5 the common calibration leads to a hydrograph which is slightly better than that obtained by individual calibration, in the second case for catchment 14 the performance is reversed. However, in both cases the common parameters, which were obtained without using any observations of the catchment, perform surprisingly well.

Numerical experiment 3: extension to other catchments

The results of the previous experiment suggest that even more catchments might share parameters which perform well on all. The 15 catchments used in experiments 1 and 2 are, however, to some extent similar and thus cannot necessarily be considered as representative of a great number of other catchments. Thus, for the third experiment, 192 catchments of the MOPEX data set were considered. Of them, 96 were randomly selected for common calibration (marked as blue circles in Fig. ); the other 96 catchments were used as receivers to test the performance of the common parameters (marked as green triangles in Fig. ). The HBV model using three selected performance measures was considered in this experiment.
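A random split of this kind can be sketched in a few lines. The integer catchment identifiers, the seed, and the function name below are placeholders, not the actual MOPEX gauge numbers or the random selection used in the study.

```python
import numpy as np

def split_donors_receivers(catchment_ids, n_donors, seed=0):
    """Randomly split catchments into donors (used for common calibration)
    and receivers (treated as ungauged and used only for testing)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(catchment_ids))
    donors = np.sort(shuffled[:n_donors])
    receivers = np.sort(shuffled[n_donors:])
    return donors, receivers

# Placeholder identifiers 0..191 standing in for the 192 catchments.
donors, receivers = split_donors_receivers(range(192), n_donors=96)
```

Fixing the seed makes the split reproducible, which matters when individual and common calibrations are run separately and must refer to the same donor set.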

For each of the 192 catchments, an individual model calibration was carried out using 1971–1980 as the calibration period. Common calibration was performed for the selected 96 catchments the same way as in experiment 2, and for the HBV model using all performance measures.

Histograms of the NS model performance of HBV for the 96 selected (donor) catchments. Left panel: calibration period (1971–1980), right panel: validation period (1991–2000).

Histograms of the NS model performance of HBV for the 96 test (ungauged) catchments. Left panel: calibration period (1971–1980), right panel: validation period (1991–2000).

As a first step, the model performances for the individual and common calibration were compared. As expected and already seen in experiment 2, the performance for the common calibration is lower than the individual one for HBV using all performance measures. For example, the mean performance NS over all 96 catchments drops from 0.69 to 0.50. When one applies the models for the validation period 1991–2000, the individually calibrated model mean performance is 0.65, while for the common calibration the mean increases to 0.51. Figure shows the histograms of the performance NS for the calibration and validation periods for the individual and the common calibrations. Results indicate the robustness of the common calibration. The transfer to the 96 assumed ungauged catchments shows very similar performance for the common parameters as for the catchments selected for common calibration. Figure shows the histograms of the performance NS for the individual calibration and the transfer for the assumed ungauged catchments. It can be seen clearly from the histogram that there is very little difference between the performance for the gauged and the ungauged catchments. In 90 % of catchments, the common calibration works reasonably well, even for the ungauged cases. The common parameters describing runoff dynamics of all 192 catchments indicate that there is a high degree of similarity of these catchments.

Comparing the results of the common calibration using the 96 catchments to those obtained using the 15 catchments, one can observe that increasing the number of catchments considered for the common calibration led to a decrease in performance. The common parameter sets calibrated on 15 catchments in reasonable geographic proximity perform better than the parameter sets calibrated on 96 catchments. If one is interested in finding model parameters for a specific ungauged catchment, a common calibration using a more careful selection of the donor set of catchments is likely to lead to good parameter transfers.

The water balances of the 192 catchments differ, leading to different η parameters. Figure shows the distribution of η values for three randomly selected good common parameter sets for the HBV model using NS as a performance measure for the calibration time period. The curves show that, for the same catchment, η takes a different value for each dynamical parameter set. Also, due to the differences in water balance, different catchments require different η values to control actual evapotranspiration. Furthermore, across all 192 catchments, the parameter η follows a very similar tendency for the different dynamical parameter sets. Figure plots the mean η value against the ratio of the long-term actual evapotranspiration to potential evapotranspiration (Eta/Etp) for each catchment. It shows a strong negative correlation (-0.72) between η and Eta/Etp.
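The relation between the mean η and Eta/Etp can be computed as sketched below. The sketch assumes that the correlation reported in the text is a Pearson correlation, which the text does not state, and the small arrays are synthetic values chosen only to show the array shapes; they do not reproduce the study's data or its reported correlation.

```python
import numpy as np

def eta_ratio_correlation(eta, et_ratio):
    """Correlate the mean eta per catchment with its Eta/Etp ratio.

    eta:      array of shape (n_parameter_sets, n_catchments).
    et_ratio: long-term Eta/Etp, one value per catchment.
    Returns the mean eta per catchment and the Pearson correlation.
    """
    eta = np.asarray(eta, dtype=float)
    et_ratio = np.asarray(et_ratio, dtype=float)
    mean_eta = eta.mean(axis=0)  # average over the common parameter sets
    r = np.corrcoef(mean_eta, et_ratio)[0, 1]
    return mean_eta, float(r)

# Synthetic example: 2 parameter sets, 3 catchments (not data from the study).
toy_eta = np.array([[1.0, 0.8, 0.6],
                    [1.2, 1.0, 0.8]])
toy_ratio = np.array([0.4, 0.6, 0.8])
mean_eta, r = eta_ratio_correlation(toy_eta, toy_ratio)
```

Averaging over the parameter sets first reflects the statement that η differs between dynamical parameter sets but follows a similar tendency across catchments.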

Discussion

Robust parameter sets

The three experiments were carried out in a way that a set of parameters (usually represented by 10 000 individual parameter sets) was used. This leads to a considerable fluctuation of the results. Modelers often prefer to use single parameter vectors. If a single parameter vector is desired, then according to , the deepest parameter set (which represents the most central point of the whole set of parameter vectors) is the most likely candidate to be robust. This study also indicates that the deepest parameter set performs slightly better than the mean of the parameter sets considered.
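One way to pick a "deepest" parameter vector from a cloud of good parameter sets is sketched below. It assumes that depth means half-space (Tukey) depth and approximates it with random projections; both the choice of depth function and the approximation are assumptions of this sketch, and the function name and toy points are hypothetical.

```python
import numpy as np

def approx_halfspace_depth(points, n_directions=500, seed=0):
    """Approximate half-space depth of every point within its own cloud.

    For each random direction, a point's one-dimensional depth is the smaller
    of the counts of points at or below and at or above its projection.
    The minimum over directions is an upper bound on the true depth.
    """
    x = np.asarray(points, dtype=float)
    n, dim = x.shape
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(n_directions, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    projections = x @ directions.T  # shape (n_points, n_directions)
    # Rank of each point along each direction (0 = smallest projection).
    ranks = projections.argsort(axis=0).argsort(axis=0)
    one_dim_depth = np.minimum(ranks + 1, n - ranks)
    return one_dim_depth.min(axis=1)

# Toy cloud: one central point surrounded by four outer points.
toy_points = [[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
depth = approx_halfspace_depth(toy_points)
deepest_index = int(np.argmax(depth))
```

For 10 000 parameter sets the projection matrix stays small (points times directions), so this approximation remains cheap, although more directions are needed as the number of model parameters grows.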

Distribution of water balance parameter η for three randomly selected common parameter vectors obtained via HBV using the NS performance measure for 192 selected catchments.

Scatterplots of mean η value and ratio of actual evapotranspiration to potential evapotranspiration for 192 selected catchments.

The discharge coefficient of the catchments selected for the experiments.

Variability and estimation of η

As defined, the water-balance-related parameter η is specific for each catchment and each model parameter vector. Therefore, each individual catchment has a large variation in η for the calibrated 10 000 parameter sets. Also, for the same set of good parameters that match different water balances, different catchments always require very different η values to control actual evapotranspiration. Parameter η is estimated because it controls the water balance and can be estimated at other catchments. The remainder of the parameters (the dynamic ones) are regionally calibrated (all catchments are given the same parameter set). Therefore, only η varies between catchments. As η is specific for each parameter vector, regionalization of η directly is not feasible and η remains different for different parameter vectors after regionalization. In the numerical experiments, in order to estimate water balance parameter η, the long-term discharge volumes were treated as known variables for both gauged and ungauged catchments. For application in practical systems, the long-term discharge volumes have to be estimated for ungauged catchments. This problem is not explicitly treated in this paper. The estimation of parameter η is a limitation of the presented simultaneous calibration approach. Regionalization of long-term discharge volumes is a prerequisite for the application in ungauged basins. For the study area, the discharge coefficients which relate discharge volumes to (known) precipitation show quite a smooth spatial behavior as shown in Fig. . Thus, the regionalization of this parameter does not seem to be an extremely complicated task in this particular region. According to the previous analysis of η, for each common dynamical parameter set, one can have a possible estimator of η for a certain catchment based on the regionalization of discharge coefficients. The potential application of this approach in other regions needs to be investigated in future work.
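Treating the long-term discharge volume as known, η can be found for a given dynamical parameter vector by a one-dimensional root search. The sketch below uses plain bisection and a made-up linear stand-in for the hydrological model; `toy_volume`, the bracket, and all numbers are hypothetical and do not describe how η enters HBV, HYMOD, or any model in the study.

```python
def solve_eta(simulated_volume, target_volume, eta_lo, eta_hi,
              tol=1e-6, max_iter=200):
    """Find eta such that simulated_volume(eta) matches target_volume.

    simulated_volume: callable returning the long-term discharge volume for
                      a fixed dynamical parameter vector and a given eta.
    Assumes the mismatch changes sign between eta_lo and eta_hi.
    """
    f_lo = simulated_volume(eta_lo) - target_volume
    f_hi = simulated_volume(eta_hi) - target_volume
    if f_lo * f_hi > 0:
        raise ValueError("target volume is not bracketed by eta_lo and eta_hi")
    for _ in range(max_iter):
        mid = 0.5 * (eta_lo + eta_hi)
        f_mid = simulated_volume(mid) - target_volume
        if abs(f_mid) < tol or (eta_hi - eta_lo) < tol:
            return mid
        if f_lo * f_mid <= 0:
            eta_hi, f_hi = mid, f_mid
        else:
            eta_lo, f_lo = mid, f_mid
    return 0.5 * (eta_lo + eta_hi)

# Hypothetical stand-in for a model run: discharge falls linearly with eta.
precipitation = 1000.0          # long-term precipitation volume (toy value)
discharge_coefficient = 0.5     # regionalized estimate (toy value)

def toy_volume(eta):
    return precipitation - 400.0 * eta

eta = solve_eta(toy_volume, discharge_coefficient * precipitation, 0.0, 2.0)
```

For an ungauged catchment the target volume would come from a regionalized discharge coefficient multiplied by the known precipitation, as in the last lines of the sketch. Because η is specific to each parameter vector, the search would be repeated for every common dynamical parameter set.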

Prediction in ungauged basins

The results of this study support the general finding of and , where the simultaneous calibration led to weaker model performance than the individual one for both calibration and validation time periods. The loss of model performance in validation is smaller than that in calibration. When applied to ungauged catchments, the simultaneous calibration shows more robustness than the individual one. Simultaneous calibration of models in geographical space offers a good possibility for runoff prediction in ungauged basins. Compared with traditional regionalization methods, only the water balance parameter η has to be estimated, based on the regionalization of discharge coefficients.

The hydrographs show that high flows are often underestimated and that low flows are probably overestimated. This kind of phenomenon has also been detected in previous regional calibration studies . This behavior is mainly due to the uncertainty of model structure and the low spatial and temporal resolutions of both models and input variables .

Conclusions

In this paper, the transfer of the dynamical parameters of hydrological models was investigated. A new model parameter η controlling the actual evapotranspiration was introduced to cope with the clear differences in water balances due to water or energy limitations. Three hydrological models were used in combination with three different performance measures in three numerical experiments on a large number of catchments.

The individual calibration and transfer results indicate that models are often overfitted during calibration. The parameters are sometimes more specific to the calibration time period, and their relation to catchment properties seems to be unclear. This makes parameter transfers or parameter regionalization based on individual calibration difficult. The common spatial calibration strategy, which explicitly assumes that catchments share dynamical parameters, was tested on 15 catchments and on 96 catchments. The common calibration provides an effective way to identify parameter sets which work reasonably well for all catchments within the modeled domain. Testing the parameters on an independent time period shows that common parameters perform comparably well to those obtained using individual calibration. The transfer of the common parameters to ungauged catchments works well. The performance of common parameters on a small number of catchments (15) was better than on a large number of catchments (96) covering a large spatial scale. This indicates that the performance of the common parameters depends strongly on the selection of the catchments used to assess them, and that a reasonable geographic proximity of the catchments might be a good choice for common calibration. The results of the experiments were similar for all three hydrological models, independently of the choice of the performance measure. Note, however, that the common parameters corresponding to the different performance measures differ considerably. Common behavior thus depends on how one evaluates the performance of the models.

The fact that many catchments share common parameters which describe their dynamical behavior does not mean that they have the same dynamical behavior. The model output highly depends on the parameter η which varies from catchment to catchment and also as a function of the other model parameters describing dynamical behavior. Common parameters offer a good possibility for the prediction of ungauged catchments; only the parameter η, which controls the long-term water balances, has to be estimated individually. This, however, can be done using other modeling approaches including regionalization methods.

In this study, all the models were tested on the daily timescale. The results show that many catchments behave similarly, in the sense that the same dynamical parameter sets perform reasonably well for all of them. This means that hydrological behavior on the daily scale is mainly dominated by precipitation characteristics and actual evapotranspiration, and we believe that differences in catchment properties have rather significant effects on smaller temporal scales (e.g., hourly). Results also indicate that the differences in catchment properties cannot be captured well by simple lumped model parameters.

The Supplement related to this article is available online at doi:10.5194/hess-20-2913-2016-supplement.

Acknowledgements

The study of the second author (Yingchun Huang) was supported by China Scholarship Council. The authors gratefully acknowledge two anonymous reviewers for their invaluable and constructive suggestions, and thank Stephen Kwakye for proofreading the manuscript.

Edited by: R. Merz

Reviewed by: R. Arsenault and one anonymous referee

References

Ali, G., Tetzlaff, D., Soulsby, C., McDonnell, J. J., and Capell, R.: A comparison of similarity indices for catchment classification using a cross-regional dataset, Adv. Water Resour., 40, 11–22, 2012.

Andréassian, V., Le Moine, N., Perrin, C., Ramos, M.-H., Oudin, L., Mathevet, T., Lerat, J., and Berthet, L.: All that glitters is not gold: the case of calibrating hydrological models, Hydrol. Process., 26, 2206–2210, 2012.

Archfield, S. A. and Vogel, R. M.: Map correlation method: Selection of a reference streamgage to estimate daily streamflow at ungaged catchments, Water Resour. Res., 46, W10513, doi:10.1029/2009WR008481, 2010.

Bárdossy, A.: Calibration of hydrological model parameters for ungauged catchments, Hydrol. Earth Syst. Sci., 11, 703–710, doi:10.5194/hess-11-703-2007, 2007.

Bárdossy, A. and Singh, S. K.: Robust estimation of hydrological model parameters, Hydrol. Earth Syst. Sci., 12, 1273–1283, doi:10.5194/hess-12-1273-2008, 2008.

Bergström, S. and Forsman, A.: Development of a conceptual deterministic rainfall-runoff model, Nord. Hydrol., 4, 174–190, 1973.

Beven, K. J.: Uniqueness of place and process representations in hydrological modelling, Hydrol. Earth Syst. Sci., 4, 203–213, doi:10.5194/hess-4-203-2000, 2000.

Beven, K. J. and Freer, J.: Equifinality, data assimilation, and data uncertainty estimation in mechanistic modelling of complex environmental systems using the GLUE methodology, J. Hydrol., 249, 11–29, 2001.

Blöschl, G., Sivapalan, M., Wagener, T., Viglione, A., and Savenije, H. E.: Runoff Prediction in Ungauged Basins: Synthesis across Processes, Places and Scales, Cambridge University Press, Cambridge, 2013.

Boyle, D. P., Gupta, H. V., Sorooshian, S., Koren, V., Zhang, Z., and Smith, M.: Toward Improved Streamflow Forecasts: Value of Semidistributed Modeling, Water Resour. Res., 37, 2749–2759, 2001.

Duan, Q., Schaake, J., Andreassian, V., Franks, S., Goteti, G., Gupta, H., Gusev, Y., Habets, F., Hall, A., Hay, L., Hogue, T., Huang, M., Leavesley, G., Liang, X., Nasonova, O., Noilhan, J., Oudin, L., Sorooshian, S., Wagener, T., and Wood, E.: Model Parameter Estimation Experiment (MOPEX): An overview of science strategy and major results from the second and third workshops, J. Hydrol., 320, 3–17, 2006.

Falcone, J. A., Carlisle, D. M., Wolock, D. M., and Meador, M. R.: GAGES: A stream gage database for evaluating natural and altered flow conditions in the conterminous United States: Ecological Archives E091-045, Ecology, 91, 621, 2010.

Fernandez, W., Vogel, R., and Sankarasubramanian, A.: Regional calibration of a watershed model, Hydrolog. Sci. J., 45, 689–707, 2000.

Gaborit, É., Ricard, S., Lachance-Cloutier, S., Anctil, F., Turcotte, R., and Polat, A.: Comparing global and local calibration schemes from a differential split-sample test perspective, Can. J. Earth Sci., 52, 990–999, 2015.

Grigg, D.: The logic of regional systems 1, Ann. Assoc. Am. Geogr., 55, 465–491, 1965.

Gupta, H., Kling, H., Yilmaz, K., and Martinez, G.: Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling, J. Hydrol., 377, 80–91, 2009.

Hrachowitz, M., Savenije, H. H., Blöschl, G., McDonnell, J. J., Sivapalan, M., Pomeroy, J., Arheimer, B., Blume, T., Clark, M. P., Ehret, U., Fenicia, F., Freer, J. E., Gelfan, A., Gupta, H. V., Hughes, D. A., Hut, R., Montanari, A., Pande, S., Tetzlaff, D., Troch, P. A., Uhlenbrook, S., Wagener, T., Winsemius, H., Woods, R. A., Zehe, E., and Cudennec, C.: A decade of Predictions in Ungauged Basins (PUB) – a review, Hydrolog. Sci. J., 58, 1198–1255, 2013.

McDonnell, J. and Woods, R.: On the need for catchment classification, J. Hydrol., 299, 2–3, 2004.

McIntyre, N., Lee, H., Wheater, H., Young, A., and Wagener, T.: Ensemble predictions of runoff in ungauged catchments, Water Resour. Res., 41, W12434, doi:10.1029/2005WR004289, 2005.

Moore, R. J.: The probability-distributed principle and runoff production at point and basin scales, Hydrolog. Sci. J., 30, 273–297, 1985.

Nash, J. and Sutcliffe, J.: River flow forecasting through conceptual models. 1. A discussion of principles, J. Hydrol., 10, 282–290, 1970.

Oudin, L., Kay, A., Andréassian, V., and Perrin, C.: Are seemingly physically similar catchments truly hydrologically similar?, Water Resour. Res., 46, W11558, doi:10.1029/2009WR008887, 2010.

Parajka, J., Blöschl, G., and Merz, R.: Regional calibration of catchment models: Potential for ungauged catchments, Water Resour. Res., 43, W06406, doi:10.1029/2006WR005271, 2007.

Razavi, T. and Coulibaly, P.: Streamflow prediction in ungauged basins: review of regionalization methods, J. Hydrol. Eng., 18, 958–975, 2012.

Ricard, S., Bourdillon, R., Roussel, D., and Turcotte, R.: Global calibration of distributed hydrological models for large-scale applications, J. Hydrol. Eng., 18, 719–721, 2012.

Samaniego, L., Bárdossy, A., and Kumar, R.: Streamflow prediction in ungauged catchments using copula-based dissimilarity measures, Water Resour. Res., 46, W02506, doi:10.1029/2008WR007695, 2010.

Sawicz, K., Wagener, T., Sivapalan, M., Troch, P. A., and Carrillo, G.: Catchment classification: empirical analysis of hydrologic similarity based on catchment function in the eastern USA, Hydrol. Earth Syst. Sci., 15, 2895–2911, doi:10.5194/hess-15-2895-2011, 2011.

Schaefli, B. and Gupta, H.: Do Nash values have value?, Hydrol. Process., 21, 2075–2080, 2007.

Sivakumar, B. and Singh, V. P.: Hydrologic system complexity and nonlinear dynamic concepts for a catchment classification framework, Hydrol. Earth Syst. Sci., 16, 4119–4131, doi:10.5194/hess-16-4119-2012, 2012.

Sivapalan, M.: Prediction in ungauged basins: a grand challenge for theoretical hydrology, Hydrol. Process., 17, 3163–3170, 2003.

Toth, E.: Catchment classification based on characterisation of streamflow and precipitation time series, Hydrol. Earth Syst. Sci., 17, 1149–1159, doi:10.5194/hess-17-1149-2013, 2013.

Wagener, T., Boyle, D. P., Lees, M. J., Wheater, H. S., Gupta, H. V., and Sorooshian, S.: A framework for development and application of hydrological models, Hydrol. Earth Syst. Sci., 5, 13–26, doi:10.5194/hess-5-13-2001, 2001.

Wagener, T., Sivapalan, M., Troch, P., and Woods, R.: Catchment classification and hydrologic similarity, Geogr. Compass, 1, 901–931, 2007.

Zeleny, M.: Multiple Criteria Decision Making, McGraw-Hill, New York, USA, 1981.

Zhao, R. J. and Liu, X.: The Xinanjiang model, in: Computer Models of Watershed Hydrology, Water Resources Publications, Littleton, Colorado, USA, 215–232, 1995.

</app></app-group></back> </article>