Comparison of different multi-objective calibration criteria using a conceptual rainfall-runoff model of flood events

. A conceptual lumped rainfall-runoff ﬂood event model was developed and applied on the Gardon catchment located in Southern France and various single-objective and multi-objective functions were used for its calibration. The model was calibrated on 15 events and validated on 14 oth-ers. The results of both the calibration and validation phases are compared on the basis of their performance with regards to six criteria, three global criteria and three relative criteria representing volume, peakﬂow, and the root mean square error. The ﬁrst type of criteria gives more weight to large events whereas the second considers all events to be of equal weight. The results show that the calibrated parameter values are dependent on the type of criteria used. Signiﬁcant trade-offs are observed between the different objectives: no unique set of parameters is able to satisfy all objectives simultaneously. Instead, the solution to the calibration problem is given by a set of Pareto optimal solutions. From this set of optimal solutions, a balanced aggregated objective function is proposed, as a compromise between up to three objective functions. The single-objective and multi-objective calibration strategies are compared both in terms of parameter variation bounds and simulation quality. The results of this study indicate that two well chosen and non-redundant objective functions are sufﬁcient to calibrate the model and that the use of three objective functions does not necessarily yield different results. The problems of non-uniqueness in model calibration, and the choice of the adequate objective functions for ﬂood event models, emphasise the importance catchment’s hydrology.

bias as an objective function, one may find a set of parameters that provides a very good simulation of volume, but a poor simulation of the hydrograph shape or peak flow, and vice versa. Conventional objective functions such as the root mean square error, the Nash and Sutcliffe (1970) efficiency coefficient, or the index of agreement (Willmott, 1981) tend to emphasize the high flows, and consequently, are oversensitive to extreme values and outliers (Legates and McCabe, 1999). On the opposite, the mean absolute percent error tends to emphasize the low flows (Yu and Yang, 2000).
However, in most real world applications, models are used to reproduce the entire shape of the hydrograph, sometimes even to simulate more than one flow component in the catchment i.e. groundwater and surface water, water and nutrient fluxes etc. In these occurrences, the use of a single objective function may be questionable and it would be advisable to take into account various objective functions by considering the calibration problem in a multi-objective framework (Yapo et al., 1998;Madsen, 2000). After Pareto (1906) introduced the concept of non-inferior solutions in the context of economics, Zadeh (1963) and Stadler (1979) began to apply the notion of Pareto optimiality to the fields of engineering. Over the last three decades tremendous amount of research effort has been expanded on multi-objective decision making leading to the publication of many interesting results in the literature (Changkong and Haimes, 1983;Miettinen, 1999;Ehrgott, 2005). One of the most widely used methods for solving multi-objective optimisation problems is to transform a multi-objective problem into a series of singleobjective problems. When an appropriate set of solutions is obtained by the single-objective optimisations, the solutions can approximate a Pareto front (case of a two-objective function) or Pareto surface (case of three-objective functions). In hydrology, most of the studies related to multi-objective calibration have investigated the use of two-objective functions and few ones have looked into the use of three or more functions (Madsen et al., 2002;Schoops et al., 2005;Parajka et al., 2007). In this study we will use the multi-objective calibration approach with three objective functions as suggested by Madsen (2000) and Madsen et al. (2002). In addition, while most of these studies have dealt with continuous simulations, we will investigate multi-objective calibration issues related to event based modelling.
Hence, the objective of this paper is to develop an event based lumped conceptual rainfall-runoff model and then to compare single-objective and multi-objective calibration approaches. The emphasis is put on the impact of the selected objective functions on the actual hydrograph shapes and simulation errors rather than on the calibration algorithm as several studies have already investigated this issue (Yu and Yang, 2000;Johnsen et al., 2005;Tang et al., 2006). The Gardon catchment located in southern France is used in the applications because of the recurrent flooding problems its inhabitants encounter yearly. The objective functions used in this study can be divided in two broad categories: global and relative. Given the diversity of the flood events to be modelled such an approach was deemed necessary as the first type of criteria gives more weight to large events whereas the second considers all events to be of equal weight. For each category, three different objective functions are considered: volume conservation which is important for gauging problems, peakflow reproduction which is essential for flood applications and the root mean square error as a measure of the global agreement between the simulated and observed curves for high flows. The paper is organised in four sections: i) model presentation; ii) formulation of the calibration criteria; iii) presentation of the study zone, and iv) model performance analysis based on single-objective and multi-objective calibration.

The lumped conceptual rainfall-runoff model
Since the 1960s, lumped, conceptual rainfall-runoff models have been used in hydrology (e.g. Crawford and Linsley, 1966;Cormary and Guilbot, 1969;Duan et al., 1992;Bergström, 1995;Donigan et al., 1995;Havnø et al., 1995). These models consider the catchment as an undivided entity, and use lumped values of input variables and parameters. For the most part (for a review, see Fleming, 1975;Singh, 1995), they have a conceptual structure based on the interaction between storage elements representing the different processes, with mathematical functions to describe the fluxes between the stores.
Hydrol. Earth Syst. Sci., 13, 519-535, 2009 www.hydrol-earth-syst-sci.net/13/519/2009/ The modelling approach followed herein will be lumped and the catchment will be considered as a single entity. A two-reservoir-layer model has been developed to represent the catchment on the basis of the CREC model (Cormary and Guilbot, 1969) and the Diskin and Nazimov (1995) production function (Fig. 1). Evaporation is not represented since the purpose of the model is to simulate individual flood events during which evaportanspiration is negligible. The first layer, denoted "soil-reservoir", represents the upper soil layer and controls surface runoff, infiltration, interflow and percolation. The second layer, denoted "aquifer-reservoir", represents the aquifer where mainly base flow occurs. A unit hydrograph transfer function is used to route flows to the outlet. The output of the model will be a simulated hydrograph which will be compared to the original measured hydrograph to assess model performance. A general description of each procedure is given below.

The production function
A regulating element f separates the precipitation P into surface runoff R, and infiltration I . The soil-reservoir element has one input, the infiltration I , and two outputs, the lateral interflow q, and the vertical flow g which represents the percolation from the upper soil layer to deeper layers. The state variable of the regulating element, denoted f , is determined by the magnitude of the reservoir state variable S according to the Diskin and Nazimov (1995) relationship where f 0 [ms −1 ] is the maximum infiltration capacity (corresponding to S=0), f c [ms −1 ] the minimum infiltration capacity (corresponding to S=S m ), and S m [m] the maximum storage in the soil-reservoir layer. The value of f c characterises the soil's infiltration capacity at saturation, and the term (S/S m ) characterises the relative soil moisture. The two outputs I and R of the regulating element depend on the value of the state variable f and on the value of the input P , at the same instant according to the following equations If P <f then I =P and R=0 If P >f then I =f and R=P − f The two outputs of the soil-reservoir, q and g, are calculated as function of a parameter b (with 0≤b≤1) and the term (S/S m ) It should be noted that the sum q+g=f c S S m is independent of the parameter b and verifies the soil-reservoir output of the Diskin and Nazimov (1995) model. As the storage S approaches the threshold value S m , both the infiltration capacity f and the sum q+g tend to the same value f c .
The aquifer-reservoir has one input, the percolation g, and one output, the base flow B [ms −1 ], which is calculated as function of the aquifer-reservoir level S b using a linear relation where k [s −1 ] is a constant characterizing the recession curve of the aquifer. In order to reduce the number of parameters, the aquifer-reservoir does not have a maximum storage depth. The value of the state variable S b of the aquiferreservoir is obtained using the continuity equation

The transfer function
A transfer function is used to route the rainfall excess to the catchment outlet. A unit hydrograph linear model, based on a Hayami (1951) kernel function, which is an approximation of the diffusive wave equation, was used to simulate the transfer of the sum of (R + q + B) to the outlet (Moussa and Bocquillon, 1996). Let I(t) [m 3 s −1 ] be the input hydrograph where A [m 2 ] is the catchment area. Let O(t) be the routed hydrograph at the outlet with H(t) the Hayami kernel function defined as where w[s] is a time parameter that represents the centre of gravity of the unit hydrograph, z [dimensionless] a form parameter, π=3.1416 and t the time [s]. For low values of z (i.e. z=1, 2 or 5), the unit hydrograph represents both translation and diffusivity (approximation of the diffusive wave equation), while for high values of z (i.e. z=20, 50 or 100), the unit hydrograph tends to represent only a translation equal to w (approximation of the kinematic wave equation).

Model properties and parameters
The input rainfall P is usually given as a function of time in the form of a histogram using a fixed time interval. Consequently, the other variables are also presented as functions of time, and the computations are carried out for the same fixed time interval. The regulating element f and the soilreservoir element are linked by a feedback path transmitting information about the state of the storage element to the regulating element. The regulating element is related to the soilreservoir element by the fact that one of its outputs is the input of the soil-reservoir element. It is also related to the transfer function by the fact that its output R is one input of the transfer function.
Computations start at an instant adopted as zero time t=0, with a known, or an assumed, initial value of the soilreservoir S 0 and the aquifer-reservoir S b0 at the start of that time interval. Initial conditions can considerably influence modelling results. However, our model is more sensitive to the ratio between the initial and maximum storage capacities than to the value of the initial condition. The 5-day antecedent rainfall index is frequently used in hydrological modelling (i.e. Chevallier, 1983;Fedora and Beschta, 1989;Heggen, 2001;Peugeot et al., 2003;Brocca et al., 2008). Although some studies have used wider time spans to calculate this index (Anctil et al., 2004), the 5-day rainfall is considered as a standard. For each flood event, the value of S 0 is defined according to the 5-day antecedent rainfall R 5d corresponding to three classes representing dry, normal and wet soil moisture initial conditions as suggested by the Soil Conservation Service-USDA (1972) method.
If R 5d <10 mm then S 0 = 0.25 S m (10) At the beginning of the rainfall event, the measured discharge Qo(0) at t=0 is the sum of the lateral interflow q and the base flow B. Using Eq. (4) and (5), and substituting S by S 0 , the initial value of the aquifer-reservoir S b0 is calculated as For each time interval, the three state variables f (t) which separates rainfall into surface runoff and infiltration, the level S(t) in the soil-reservoir and the level S b (t) in the aquiferreservoir are calculated from the known values of the variables at the beginning of the time interval and the rainfall input to the model during the interval. The values of the other variables at the end of the computation time interval are derived from the value of the three state variables by using the equations above.
The model needs : i) five parameters for the production function, the minimum value of the infiltration capacity f c , a coefficient "a" that relates the maximum and minimum infiltration capacities i.e. f 0 =af c (with a>1), the maximum level of the soil-reservoir S m , the parameter b of the lateral interflow, and the parameter k of the aquifer-reservoir's recession curve, ii) two parameters for the transfer function, the lag time w and the shape parameter z and iii) two initial conditions S 0 and S b0 calculated as function of the 5-day antecedent rainfall R 5d and the measured discharge Qo(0) at t=0.

Formulation of calibration criteria
The objective of model calibration is to select parameter values so that the model simulates the measured hydrograph as closely as possible. Our aim is to consider multiple objectives that measure different aspects of the hydrological response (Madsen, 2000): i) a good agreement between the average simulated and observed runoff volume (i.e. a good water balance); ii) a good overall agreement of the hydrograph shape; iii) a good agreement of the peak flows.
When a calibration procedure is used, the quality of the final model parameters will depend on the structure of the model, the power of the optimisation algorithm, the quality of the input data, and the estimation criteria or objective functions used in the optimisation procedure (Gan and Biftu, 1996). It is beyond the scope of this paper to address all the above factors. We will focus on the last one only and discuss the definition of objective functions and multi-objective calibration procedures.

The objective functions
The objective functions used in this study include both relative and absolute error measures as suggested by Legates and McCabe (1999). The selected criteria can be divided in two broad categories: "global" which gives more weight to larger flood events and "relative" which considers all events to be of equal weight. For each category, three different objective functions were considered: volume conservation, the root mean square error (RMSE) (which is related by a one-toone relationship to the widely used Nash and Sutcliffe (1970) efficiency measure) and the peakflow prediction: 1. The global volume error V g and the relative volume error V r where N is the total number of flood events used for calibration, i an index representing a flood event (1≤i≤N), and for each flood event i: n i the number of time steps, Lo i the observed runoff depth and Ls i the simulated runoff depth. 2. The global root mean square error RMSE g and the relative root mean square error RMSE r Hydrol. Earth Syst. Sci., 13, 519-535, 2009 www.hydrol-earth-syst-sci.net/13/519/2009/ where n i is the number of time steps in the flood event i, j is an index representing the time step in a flood event i (1≤j≤n i ), Qo ij the observed discharge at time j in the flood event i and Qs ij the simulated discharge at time j on the flood event i.
3. The global peakflow P g and the relative peakflow P r where Qxo i is the observed peak flow of discharge in the flood event i and Qxs i is the simulated peak flow of discharge in the flood event i.
The six objective functions, V g , V r , RMSE g , RMSE r , P g and P r , are positive functions, and the optimum value of the parameters corresponds to the minimum value of each "0". A single-objective calibration procedure was undertaken separately with each of the six criteria. Most model calibration procedures suffer from the same problems, namely the existence of multiple optima and the presence of high interaction or correlation between subsets of fitted model parameters. A coupled manual and automatic calibration procedure, similar to adaptive cluster based methods (Solomatine, 1999) was used. In the first phase, the manual method is used on the basis of expert knowledge of the hydrological model structure and the catchment characteristics (rainfall, discharge, events type). The manual calibration method is based on the results of previous sensitivity analysis, and uses a trial and error method to obtain a set of parameters for each flood event separately (Moussa, 1991;Moussa et al., 2007). The manual calibration is followed by a collective calibration using all the flood events simultaneously (Chahinian et al., 2005(Chahinian et al., , 2006. Through this first phase, the lower and higher limits of each parameter range are obtained. The second phase consists in an automatic calibration procedure based on model simulations using a progressively finer grid in space. The second phase aims at exploring the whole space of parameters, especially around the optimum obtained during the first phase. The initial grid is obtained by subdividing the interval of variation of each parameter into steps, not necessarily regular, on the basis of the results of the first phase. Different local minima can appear; around which, smaller grid steps are defined. This approach enables the exploration of a larger space, but is limited by the difficulties due to non-linearities and thresholds in the model which can produce local minima in the objective functions space. The procedure is repeated iteratively and guides the grid adaptation to focus on more promising regions in the space of parameters. The number of simulations was limited to 50000 runs. In the applications, this procedure was first used for single-objective calibration runs.

Multi-objective calibration procedures
When using multiple objectives, the calibration problem can be stated as follows (Madsen, 2000) min . . , m) are the different objective functions. The optimisation problem is constrained because θ is restricted to the feasible parameter space . The parameter space is usually defined as a hypercube by specifying lower and upper limits on each parameter. These limits are chosen according to physical and mathematical constraints, information about physical characteristics of the system, and from modelling experiences (Kuczera, 1997;Madsen, 2000Madsen et al., 2002). The solution of Eq. (17) will not, in general, be a single unique set of parameters but will consist of the so-called Pareto set of solutions according to different trade-offs between the different objectives (Gupta et al., 2003). The parameter space can be divided into "good" (Pareto optimal) and "bad" solutions, and none of the "good" solutions can be said to be "better" than any of the other "good" solutions (Madsen, 2000). A member of the Pareto set will be better than any other member with respect to some of the objectives, but because of the trade-off between the different objectives it will not be better with respect to other objectives. In practical applications, the entire Pareto set may be too expensive to calculate, and one is only interested in part of the Pareto optimal solutions.
Generally, when dealing with the multi-objective calibration, the problem is usually transformed into a singleobjective optimisation problem by defining a scalar that aggregates the various objective functions (Madsen, 2000;Parakja et al., 2007). However herein, the balanced optimum Pareto criterion was calculated using the simulation results obtained by the single-objective calibration. The procedure was first undertaken for each of the couples within the same function type i.e. (V g and V r ), (RMSE g and RMSE r ) and (P g and P r ). Then, we crossed two "global" criteria i.e. (V g and RMSE g ), (V g and P g ), (RMSE g and P g ), and two "relative" criteria (V r and RMSE r ), (V r and P r ) and (V r and RMSE r ). In the last step, the calibration was carried out using all functions of the same type (V g , RMSE g and P g ) and (V r , RMSE r and P r ).

Catchment description
The Gardon d'Anduze is a 543 km 2 Mediterranean catchment located in Southern France. It has a highly marked topography consisting of high mountain peaks, narrow valleys, steep hillslopes and a herring-bone shaped channel network. The highest point is the Mont Aigoual at 1567 m a.s.l. and the outlet is located at Anduze at 123 m a.s.l. The catchment's soils developed essentially on metamorphic (64% of the catchment area) and granitic terrains. The substrate is made of shale and crystalline rocks overlain by silty clay loams (83% of the catchment area) and sandy loam top soil. The vegetation is dense and composed mainly of beech and chestnut trees, holm oaks and garrigue, conifers, moor, pasture and cultivated lands. These vegetation classes are typical of Mediterranean forests.
Rainfall data for the 1977-1984 period were obtained on paper medium from the "Direction Départementale de l'Equipement du Gard" of the French Ministry of Equipment on seven rain gauge stations. The "Direction Départementale de l'Equipement du Gard" provided also analogue streamflow hydrographs at the outlet at Anduze (Moussa, 1991). The Gardon region is characterized by the highest rainfall intensities recorded in France e.g. a maximum daily rainfall of 608 mm was recorded on the Mont Aigoual during 24 h on 30-31 October 1963. The analysis of long rainfall time series shows that a daily rainfall of 70 mm has a return period of 1 year and daily rainfall of 170 mm has a return period of 100 years. The conjunction of high intensity rainfall, shallow soils and steep slopes produce devastating floods in autumn.
Mean rainfall was calculated as the arithmetic mean of the seven rain gauges. It is true that a mean value of rainfall cannot fully express rainfall heterogeneity. However, in lumped hydrologic models, one cannot distribute rainfall without altering the model structure into a semi-distributed scheme. Herein the number of rain gauge is seven, and the mean precipitation can be calculated using either simple graphical methods or more elaborate techniques based on spline functions or kriging methods. However, Lebel et al. (1987) showed that on this catchment, for a small number of rainfall gauges, the arithmetic mean and the Thiessen models yield comparable results to more complicated methods such as kriging and spline.

Characteristics of the studied flood events
Flood events from the 1977-1984 period were selected based on a continuous rainfall greater than 50 mm. All corresponding hyetographs and hydrographs were digitised at an hourly time step (Moussa, 1991). In total, 29 events were retained: the event durations range between 24 h and 108 h, the total rainfall between 50 mm and 300 mm, the total runoff between 9 and 166 mm, the runoff coefficients between 15% and 67%, and the initial discharge between 3 and 92 m 3 s −1 . Figure 2 shows the relations between the total rainfall, the total runoff depth, runoff coefficient, peakflow, and the initial discharge. No clear correlation can be seen between them i.e. the most important rainfall events in terms of precipitation volume are not necessarily those that have the highest runoff coefficients or peakflow. The initial discharge value which represents the catchment's moisture condition does not seem to yield linear trends either. This finding is typical  of Mediterranean climatic conditions where short duration and high intensity rainfall events are often the cause of the most important runoff events in terms of both runoff depth and peakflow.
Fifteen events, corresponding to the 1977-1979 period, were chosen for calibration and the remaining fourteen, corresponding to the 1980-1984 period, were used for validation (Fig. 2). Both data sets are representative of the various hydrological behaviours observed on the catchment. They cover all climatic seasons and display a large spectrum of rainfall intensity, peakflow and runoff coefficient values.

Single-and mutli-objective calibration and validation results
For the fifteen events of the calibration period, a number of tests were carried out in order to optimise the parameters first using single-objective functions, and then to estimate the Pareto front and analyse the trade-offs between the different objectives when using two-or three-objective approaches.

A priori ranges for the model parameters
We defined a lower and an upper variation bound for each parameter. The hypercube search space shown in Table 1 was used for all the numerical tests: -The soil-reservoir maximum capacity S m represents the storage in the root zone for a soil depth of approximately 1-2 m, and was estimated equal to 365 mm on the Gardon d'Anduze by Moussa (1991) and Moussa et al. (2007). S m 's variation range was set between 0 and 1000 mm.
-"f c " represents the soil's infiltration capacity at natural saturation. As a first approximation, this parameter was considered equal to the hydraulic conductivity at natural saturation using Rawls and Brakenseik's pedotransfer functions (1989 we distinguish two soil classes (Moussa et al., 2007). The first one corresponds to silty clay loam soils with low permeability (estimated f c =2.8×10 −7 ms −1 ) while the second one to sandy loam soils, with higher permeability values (estimated f c =1.4×10 −6 ms −1 ). f c 's variation range was set between 0 and 300×10 −6 ms −1 .
-"a" is an empirical parameter that relates the maximum f 0 and minimum f c infiltration capacities such as f 0 =af c (with a>1). f 0 depends also on soil hydrodynamic properties as stated above for f c . Its variation range was set between 1 and 70.
-"b" is an empirical parameter used in Eq. (4) to separate the infiltration from the soil-reservoir into interflow q and deep percolation g (Fig. 1). "b" can vary between 0 and 1.
-"k" is an empirical parameter characterizing the exponential recession curve of the aquifer. k's variation range was set between 0 and 30×10 −5 s −1 .
-"w" is a time parameter that represents the centre of gravity of the unit hydrograph. For the Gardon d'Anduze, the lag time for the studied flood events ranges generally between 3 and 9 h. The range of variation of w was set between 0 and 72 h.
-"z" is a form parameter of the unit hydrograph. For low values of z, the unit hydrograph represents an approximation of the diffusive wave equation, while for high values, the unit hydrograph tends to represent an approximation of the kinematic wave equation. z ranges between 0.01 and 100 as suggested by Moussa and Bocquillon (1996).
Note that "z" has a feasible range that covers several decades (0.01-100); that is why in the calibration procedure, we used a small interval for lower values of z (z<1) and a larger interval for higher values (z>1). This procedure is equivalent to a Log transformation of parameter z which allows a better distinction between small and very large values. For all remaining parameters, the initial grid is obtained by subdividing the interval of variation of each parameter into regular steps.

Results of the single-objective calibration procedure
The calibrated parameter values for each of the six objective functions V g , V r , RMSE g , RMSE r , P g and P r , are presented in Table 1. Results show that the parameters vary considerably depending on the objective function used.
For the production function, the soil-reservoir maximum capacity S m ranges between 9 and 71 mm; this appears to be a small value if it is supposed to represent the storage in the root zone. f c , which represents the soil's infiltration capacity at natural saturation, varies between 0.29×10 −5 and 2.6×10 −5 ms −1 and compares well with the values estimated when using Rawls and Brakenseik's (1989) pedotransfer function. It is interesting to note that the two empirical parameters a and b have the widest span, probably because these parameters are used to compensate for the errors made on the remaining parameters of the production function. Indeed, even when using conceptual models, modellers tend to be less permissive with parameters that can evoke physical characteristics or be assimilated to such. As a direct consequence, such parameters, even in an un-constrained calibration procedure, will be allowed to vary in a tighter interval.
The transfer function has two parameters w and z. The travel time values of w obtained through the use of V g and V r (w=21-23 h) are clearly overestimated both in comparison with the observation data (the basin response time ranges between 3 and 9 h) and the values obtained through RMSE g , RMSE r , P g and P r (w=3-4 h). This is because the two volume criteria V g and V r are less sensitive to the hydrograph's global shape and consequently to the parameters of the transfer function. The second parameter of the transfer function z ranges between 3 and 17. These values highlight the importance of the diffusive factor.
For most parameters the use of RMSE g or RMSE r yields close results. However, when using P g or P r , the calibrated parameters differ from those obtained with V g , V r , RMSE g or RMSE r . These results highlight the differences between the various criteria: V g or V r describe the mean value of the discharge, RMSE g or RMSE r describe the whole shape of the hydrograph, while P g and P r refer to a single point representing peakflow.
When overlaying the simulated hydrographs for the same event using the calibrated parameters for each objective function, one can note major differences between the shapes of simulated hydrographs. Figure 3 shows the simulated hydrographs for the event of 22 October 1977 using the calibrated parameters obtained from the single-objective calibration (Table 1) for respectively V g , RMSE g and P g . When using the parameters calibrated using the V g -objective function (Fig. 3, normal dotted line), we observe that the simulated hydrograph reproduces the total volume of the hydrograph accurately but fails to represent the global shape of the hydrograph; neither the peakflow amplitude, nor the time of occurrence of peakflow are well reproduced. When using the parameters calibrated using the RMSE g -objective function (Fig. 3, bold dotted line), the global shape of the hydrograph is well represented; as is the total volume and the peakflow is better represented than in the previous case. Finally, when using the parameters calibrated using the P g -objective func-   (Table 1) for respectively V g (normal dotted line), RMSE g (bold dotted line) and P g (dashdot line) objective functions; the solid bold line shows the observed hydrograph. tion (Fig. 3, dashdot line), the amplitude and the time of occurrence of the peakflow are very well simulated, while the total volume and the global shape of the hydrograph are not. We observe that the use of each single-objective function improves the simulation of some characteristics of the shape of the hydrograph, but never the three studied criteria simultaneously.
Also, no unique solution to the single-objective problem is found and an "equifinality" of parameter sets can be seen as stated by Beven and Binely (1992) and Beven (1993) i.e. many different parameter combinations give acceptable solutions. However, it is important to distinguish equifinality as defined by Beven and Binley (1992), and which forms the basis for the generalised likelihood uncertainty estimation (GLUE) procedure, and trade-offs between different objectives in a multi-objective optimisation framework. The former is related to the fact that different parameter sets may give very similar model performance measured with respect to some likelihood measure (or single objective function), whereas the latter is referred to as multi-objective equivalence of parameter sets and defined in terms of the Pareto criterion (see e.g. Madsen (2000) for a discussion on this).

Results of the multi-objective calibration procedure when crossing two objective functions
The results of the multi-objective calibration obtained when crossing two objective functions are shown in Figs. 4, 5 and 6. The dotted plots represent the objective function values during the calibration process; the "*" indicates the Pareto front and the bold point indicates the balanced aggregated objective function. In Fig. 4, the calibration is based on two objective functions, the global volume error V g and the relative  volume error V r (Fig. 4a), the global RMSE g and the relative RMSE r (Fig. 4c) and the global peakflow P g and the relative peakflow P r (Fig. 4e). The estimated Pareto front for the calibration of V g and V r (Fig. 4a) presents a trade-off. A very good calibration of V g (corresponding to V g =0) provides a bad calibration of V r (V r =16.7%), and vice-versa (V g =0.166×10 −4 m 3 for V r =8.7%). The same comment can be made about P g and P r : P g =40.3 m 3 s −1 when P r =18.3% and P g =58.0 m 3 s −1 when P r =13.3%. This result is not surprising as peakflow refers to instantaneous values that are both difficult to determine and simulate. The Pareto front of RMSE g and RMSE r shows a high correlation. This can be explained by the fact  that the majority of flood events have similar duration (48 to 96 h) and consequently the two objective functions presented in Eq. (15) tend to be similar.
The variation of the optimum model parameter sets along the Pareto front is shown in grey in Fig. 4b, d and e for the multi-objective calibration of (V g and V r ), (RMSE g and RMSE r ) and (P g and P r ) respectively. The parameter values are normalised with respect to the upper and lower limits given in Table 1 so that the range of all normalised parameters is between 0 and 1. For the calibration of V g and V r (Fig. 4b), a remarkably large span is observed in the parameter values when moving along the Pareto front. The range is larger than 50% for the main parameters S m , a, b, w and z.
The compromised solution using the Pareto front is shown in bold on Fig. 4b, d and e, and the corresponding values of the calibrated parameters are given in Table 1. The calibrated parameters of the compromised solution of (V g and V r ) in Table 1 are within the interval delimited by the calibrated parameters of V g and V r separately (Table 1). Similar results are observed for (RMSE g and RMSE r ) and for (P g and P r ). Figure 5 shows results when crossing two "global" criteria (V g and RMSE g ; Fig. 5a and b), (V g and P g ; Fig. 5c and d) and (RMSE g and P g ; Fig. 5e and f) and Fig. 6 shows results when crossing two "relative" criteria (V r and RMSE r ; Fig. 6a and b), (V r and P r ; Fig. 6c and d) and (RMSE r and P r ; Fig. 6e   using the global volume objective function V g , the criterion tends to zero because the calibration procedure allows the simulation of the exact total volume; this is not the case for the other criteria. Note also that the volume based objective functions V g and V r are calculated in volume per time step. Consequently, the corresponding objective function (i.e. less than 0.0001 m 3 /time step in Fig. 5) is small in comparison with other objective functions (i.e. the relative volume error ranges between 0.05 and 0.25 in Fig. 6). Again, we observe significant trade-offs for the three cases of Fig. 5 : -V g =0 when RMSE g =65.5 m 3 s −1 and V g =5.0×10 −5 m 3 when RMSE g =31.0 m 3 s −1 (Fig. 5a); -V g =0 when P g =92.4 m 3 s −1 and V g =23.8×10 −5 m 3 when P g =40.3 m 3 s −1 (Fig. 5c); -RMSE g =31.0 m 3 s −1 when P g =61.6 m 3 s −1 and RMSE g =50.9 m 3 s −1 when P g =40.3 m 3 s −1 (Fig. 5e).
Comparable results are obtained with the relative criteria in Fig. 6a, c and e. The variation of the optimum model parameter sets along the Pareto front for the multi-objective calibration of (V g and RMSE g ; Fig. 5b), (V g and P g ; Fig. 5d), (V r and RMSE r ; Fig. 6b), (V r and P r ; Fig. 6d) are similar and sometimes higher than those observed for (RMSE g and P g ; Fig. 5f) and (RMSE r and P r ; Fig. 6f). Figures 4 to 6 indicate that the global volume criteria (V g ) is the most insensitive i.e. volume conservation can be easily respected with a number of parameter combinations. This translates into a somehow flat Pareto front (Figs. 4a, 5a and b). In comparison RMSE and peakflow produce sharper fronts; RMSE seems to be the most restrictive of the tested criteria. The relative criteria seem to yield sharper fronts than the global criteria which seem to be controlled by the extreme events.
In comparison with the single-objective calibration, the use of a multi-objective calibration technique seems to yield tighter variation intervals. These findings are in accordance with those of Engeland et al. (2006). The variation ranges we obtained for the multi-objective calibration are smaller than those reported in Schoops et al. (2005) and Madsen (2000) who had respectively 3 and 2 additional parameters to calibrate. However, the number of free parameters cannot be the only explanation behind the wider variation spans as Gupta et al. (2003) obtained tighter intervals when calibrating the 13 parameters of the SAC-SMA model.

Results of the multi-objective calibration procedure when crossing three objective functions
The same methodology was extended in order to combine three objective functions. Figure 7 shows a volumetric view for the three-objective calibration space, when crossing the three global objective functions (V g , RMSE g and P g ; Fig. 7a), and the three relative objective functions (V r , RMSE r and P r ; Fig. 7c). Results show comparable parameter variation ranges for both (V g , RMSE g and P g ; Fig. 7b) and (V r , RMSE r and P r ; Fig. 7d). Note that Fig. 5a, c and e give the projection on two-objective planes, respectively (V g , RMSE g ), (V g , P g ) and (RMSE g , P g ) of the three global objective functions (V g , RMSE g , P g ) of Fig. 7a. Similarly, Fig. 6a, c and e give the projection on two-objective planes, respectively (V r , RMSE r ), (V r , P r ) and (RMSE r , P r ) of the three relative objective functions (V r , RMSE r , P r ) of Fig. 7c. For the optimal solution (Table 1), the soil-reservoir maximum capacity S m is equal to 172 mm when crossing (V g , RMSE g and P g ) and 225 mm when crossing (V r , RMSE r and P r ). This value is more representative of the storage in the root zone. For both combinations (V g , RMSE g and P g ) and (V r , RMSE r and P r ), the optimal parameter f c ranges between 5.5×10 −5 and 7.4×10 −5 ms −1 , and compares well with pedotransfer functions (Rawls and Brakenseik, 1989). It is interesting to note that the two empirical parameters a and b still have the widest span. The travel time values of w obtained range between 3 and 4 h and correspond to the approximate lag time of the basin during intense flood events, while the dimensionless shape parameter z, ranges between 14 and 28. The parameter variation ranges are comparable to those obtained when using only two functions. The parameters that vary most between the two combinations are b, k and z. Parameters b and k are clearly linked to each other as they both control the "outward" fluxes and can be used to correct possible flow over-estimation by increasing percolation and interflow. Data on water levels both in the unsaturated and saturated zones could have been useful in constraining these parameters further. Unfortunately such data were unavailable for the studied period. A similar approach can be used for other sets of three-, four-, five-or six-objective functions. However, we limit our analysis to the three global objective functions (V g , RMSE g and P g ) and the three relative objective functions (V r , RMSE r and P r ), because the aim of this study is to conduct a sensitivity analysis of the calibrated parameters when moving from single-to two-or three-objective functions, and not to explore the whole space of the objective functions. Also, it is not simple to analyse simultaneously all possible combinations of the objective functions due to their large number, and because three or more objective functions could yield very complex geometries for the Pareto front. Moreover, mathematically, the Pareto front will "collapse dimension" in subspaces if conflicts do not exist between the subsets of the analysed objectives (see Fig. 4c for an example of a dimensional collapse). Herein, the analysis was conducted in a six-objective space, and a single analysis using all single objective functions implicitly yields all the presented results. It was also possible to present results moving from higher to lower dimensional analysis (i.e. from six-, to five-, four-, three-, two-and single-objective functions). However, it is not simple to analyse data in a six-D space and even in three-D space due to the complex shape of functions. Consequently, starting the analysis with single-, than two-and finally three-objective functions allows the analysis of the shape of each objective function separately and then two by two, then three by three, etc.
The methodology presented in this paper has also its limits, especially in hydrologic modelling where non-linearities and thresholds yield non-convex Pareto fronts such as in Fig. 6a and c. As discussed in a number of studies by Changkong and Haimes (1983), Koski (1985), Das and Dennis (1997) and Messac and Mattson (2002), the traditional approaches used in the calculation of the Pareto front have drawbacks on non-convex parts of the Pareto front. This is due to the fact that the weighted-sum method is often implemented as a convex combination of objectives, where the sum of all weights is constant and negative weights are not allowed. Generally, multi-objective optimisation algorithms perform well for bivariate problems, but scale poorly to multiple objectives (Changkong and Haimes, 1983). More recently, new methods developed to generate the Pareto optimal solutions in non-convex regions (i.e. Kim and Weck, 2005) give promising results in the case of two-objective functions, but further theoretical work is still needed to overcome the limitations for multi-objective functions.

Validation and uncertainty analysis
As the "global" and "relative" criteria gave comparable results and interpretation, we choose to validate the results of    Comparison between measured and simulated runoff depths (a) and peakflows (b) using the set of parameters from the multiobjective V r -RMSE r -P r procedure.
the "relative" criteria (V r , RMSE r and P r ) which are less sensitive to extreme events.
To compare the calibration procedures, we computed the value of each objective function (Table 2). It is not surprising that the values of V r , RMSE r and P r are minimal when single-objective calibration are conducted function of each of these criteria : V r =8.7%, RMSE r =26.0 m 3 s −1 and P r =13.4%. The maximum value of V r =19.1% is obtained when using a single-objective calibration minimising P r ; the maximum value of RMSE r =43.2 m 3 s −1 is obtained when using a single-objective calibration minimising V r ; and the maximum value of P r =27.8% is also obtained when using a single-objective calibration minimising V r . Once again the use of a volume based criterion is restrictive while the use of RMSE can yield relatively acceptable results for peakflow and vice versa. The sensitivity of the RMSE criterion to peakflow has already been reported by many authors (Parada et al., 2003;Yapo et al., 1998) and our findings are similar to theirs.
When using two-or three-objective functions, the error values fall within the intervals indicated above. However, it is interesting to note that for all three criteria, the maximum errors values obtained with the multi-objective methods are always lower than those obtained when using a singleobjective calibration in which the given criterion is not considered. This is essentially a consequence of the trade-offs between objective functions. Finally, the combination of (V r , RMSE r and P r ) gives a reasonable compromise between the three criteria with V r =12.6%, RMSE r =33.3 m 3 s −1 45 Figure 9 a. Event of 22-24/10/1977 b. Event of Fig. 9. Example of simulated hydrographs of high intensity (a) and low intensity events (b) using the set of parameters from the multiobjective V r -RMSE r -P r procedure. The full bold line shows the measured hydrograph, and the dotted lines the simulated hydrographs.  Table 3. Means ε V and ε Q of the relative prediction error on runoff depth and peakflow of the calibration and validation events when using the calibrated parameters with the single-objective calibration and the balanced two-objective or three-objective Pareto optimum solution.
Objective function used for calibration Runoff depth ε V Peakflow ε Q and P r =21.4%. These values are quite comparable to those obtained when using just two objective functions i.e. V r , RMSE r . To further compare the performance of the calibration procedures, we computed the relative error on both runoff depth and peakflow: for a given event i, the error on runoff depth and peakflow are defined respectively by ε vi =(Ls i −Lo i )/Lo i and ε Qi =(Qxs i −Qxo i )/Qxo i . Let ε V and ε Q be the mean of ε vi and ε Qi respectively. The quantities ε V and ε Q represent the bias of runoff depth and peakflow predictions. Table 3 illustrates the findings when using either the single or multiobjective method (i.e. two or three objective functions) and shows that ε V and ε Q fall within similar ranges for two-or three-objective functions. It is worthy to note that the greatest errors on runoff depth are obtained through the use of a peakflow criterion while the greatest errors on peakflow are caused by the use of a volume criterion. Apart from two cases, the error on runoff depth is acceptable (<5%) while, not surprisingly, the error on peakflow is far higher (>10%) but less than 27%. In addition, the model seems to have a tendency to overestimate runoff depth and to underestimate peakflow.
The best compromise between both errors is reached by using the parameters obtained through multi-objective calibration with three functions (V r , RMSE r and P r ) but the combination of V r and RMSE r gives once again comparably good results. In this instance the use of two "well" chosen and complementary objective functions seems to be sufficient for runoff simulation. However, the lack of soilmoisture and groundwater data prevents us from extending our results to the other vertical components of the model. Figure 8 compares the measured and simulated runoff depths (Fig. 8a) and the measured and simulated peakflows (Fig. 8b) obtained when using the parameters of the balanced aggregated objective function (V r , RMSE r and P r ) given in Table 1. It can be seen that the model gives similar results both for the calibration and validation periods. An illustration of the equifinality problem is shown in Fig. 9 where hydrographs are simulated using the parameters corresponding to the entire Pareto front of (V r , RMSE r and P r ); the bold line shows the measured hydrograph and the dotted lines show the various model simulations using the multi-objective V r -RMSE r -P r procedure's parameters. The results indicate that the hydrographs are also well simulated for high (Fig. 9a) and low intensity events (Fig. 9b). The figure clearly shows that many different parameter sets, may produce equally good simulations according to the three objective functions.
Hydrol. Earth Syst. Sci., 13, 519-535, 2009 www.hydrol-earth-syst-sci.net/13/519/2009/ 6 Discussion and conclusion A conceptual lumped event based rainfall-runoff model, coupling a production and a transfer function, was developed and applied on the Gardon catchment in southern France. The model has seven free parameters which need to be calibrated. The model was calibrated on 15 events and validated on 14 others. The results of both the calibration and validation phases were compared on the basis of their performance with regards to six objective functions, three global and three relative, representing volume, the root mean square error, and peakflow.
The results showed that the calibrated parameter values were dependent on the type of the objective function used. Trade-offs are observed between the different objectives and no single set of parameter was able to optimise all objectives simultaneously. Thus a set of Pareto optimal solutions and a balanced aggregated objective function were calculated with two and three objective functions.
The comparison of the single and multi-objective calibration results, using two or three functions, illustrates the "non-uniqueness problem" (Beven and Binley, 1992) since many different parameter combinations gave acceptable solutions according to a given objective. However, the volume conservation criterion seemed to be the most permissive whereas the RMSE and the peakflow prediction criteria yielded sharper Pareto fronts. The model was then validated with the set of optimised parameters obtained using the "combined relative" criteria (V r , RMSE r and P r ). The use of a triple objective function does not seem to be justified in our case. Indeed given the impact peakflow values have on the RMSE, there seems to be a redundancy in their use, hence a combination of either (V r and P r ) or (V r and RMSE r ) can yield equally acceptable results.
Differences between measured and simulated hydrographs were assessed by calculating the bias of the simulated runoff depth and peakflow. These errors can be due to the use of non-optimal parameter values but also to errors inherent to the model structure and the meteorological input data. In the model calibration herein, only the error due to the parameter values is minimised. However, the calibration of model parameters can also compensate the other error sources. In our case the best results in terms of bias were obtained through multiple calibration with a volume and an RMSE criterion The choice of an adequate objective function when modelling separate flood events, emphasise the importance of the modeller's intervention for tailoring the model calibration to a specific application. Attempts have already been made to include this knowledge objectively in the model calibration procedure (Boyle et al., 2000). Our results highlight the importance of the modeller's professional judgement as often the criteria values and error estimates are within close bounds and may not be significantly different from a statistical point of view. It is therefore important to plot the hydrographs and assess the graphical differences in the simulated hydrograph's shape.
A sound hydrological knowledge is required to evaluate data and model errors. In most real world application, especially in an operational framework and for real-time predictions, data quality checks could be too time consuming and hence difficult to carry out. Thus a robust calibration procedure becomes even more essential.
Edited by: H. Bormann