Sensitivity analysis and parameter estimation for distributed hydrological modeling: potential of variational methods

. Variational methods are widely used for the analysis and control of computationally intensive spatially distributed systems. In particular, the adjoint state method enables a very efﬁcient calculation of the derivatives of an ob-jective function (response function to be analysed or cost function to be optimised) with respect to model inputs. In this contribution


Introduction
The distributed modelling of catchment hydrology is now recognised as a valuable approach in order to understand, reproduce and predict the behavior of hydrological systems. However, distributed hydrological models remain a simplified and imperfect representation of the physical processes using uncertain observation data for the estimation of the model inputs to be prescribed (parameters, initial condition and rainfall forcing). Analyzing and reducing uncertainty is therefore essential but issues such as sensitivity and uncertainty analysis, parameter estimation and state-updating are challenging given the dimensionality of the system. Although the approach adopted in this paper is not restricted to a specific type of model input, the focus will be on model parameters to be assigned on the basis of indirect observations (i.e. model calibration).
For both sensitivity analysis and parameter estimation, many of the approaches which are now adopted for distributed hydrological models where originally developed for parsimonious hydrological models (e.g. simplified bucket models, models based on the concept of hydrological similarity). These models are very often characterised by a very low dimensionality, a limited computational cost and no sizable constraints on model parameters.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Experiences in the calibration of such models revealed that the corresponding response surface often contains several regions of attraction, discontinuous derivatives and other geometrical properties compromising the use of local methods, especially those using derivative information (Ibbit and O'Donnell, 1971;Johnston and Pilgrim, 1976;Duan et al., 1992).
Therefore, many recent applications and methodological developments for model calibration involve a stochastic exploration of the parameter space using computationally intensive Monte Carlo methods and/or evolutionary algorithms. These techniques enable the global optimization of single or multiple objectives and very often characterise the uncertainty affecting model parameters (Beven and Binley, 1992;Duan et al., 1992;Kuczera and Parent, 1998;Yapo et al., 1997;Vrugt et al., 2003a,b).
Differential sensitivity analysis (McCuen, 1973a,b), which was very often the only approach computationally affordable, is now gradually replaced by assessments carried out in the statistical framework. The Regional Sensitivity Analysis (RSA) of Hornberger and Spear (1981) inspired numerous applications and developments for the analysis of hydrological systems including the contribution of Beven and Binley (1992). The combination with recursive estimation techniques (Vrugt et al., 2002;Wagener et al., 2003) or the extension to multiple objectives (Bastidas et al., 1999) can provide an interesting insight into the behaviour of hydrological models. The use of variance decomposition approaches which are based on unambiguous importance measures (Cukier et al., 1978;Sobol', 1993;Homma and Saltelli, 1996) is now emerging in the hydrological community (Tang et al., 2007a,b;Yatheendradas et al., 2008;Van Werkhoven et al., 2008a). Using these global sensitivity analysis techniques, it is possible to assess how uncertainty in the model outputs can be apportioned to different sources of uncertainty in the model inputs (Saltelli et al., 2000).
When parameter estimation and sensitivity analysis are carried out in the statistical framework, it is necessary to sample the space of uncertain inputs. However, in distributed hydrological models, parameters are discretized according to the spatial discretization of the model state variables. Approaches developed for parsimonious hydrological models are frequently transferred to distributed hydrological models by means of an empirical dimension reduction of the parameter space. For parameter estimation, scalar multipliers are used in order to adjust spatially distributed parameters featuring a variability which is fixed using prior information (Refsgaard, 1997;Madsen, 2003). The same strategy can be adopted for probabilistic sensitivity analysis (Yatheendradas et al., 2008). In some cases, spatially distributed importance measures are estimated for a very coarse grid resolution or few zones of constant value (Hall et al., 2005;Tang et al., 2007a;Van Werkhoven et al., 2008b). Sampling based approaches to sensitivity analysis and parameter estimation enable an exploration of the parameter space (essential for non-linear systems) but can be limited in handling distributed parameter systems (i.e. curse of dimensionality).
In the deterministic framework, both sensitivity analysis and parameter estimation can be addressed using variational methods. The adjoint state method enables an efficient calculation of the derivatives of an objective function with respect to all model inputs. This technique is particularly suited when the dimension of the response function to be analysed or cost function to be optimized is small when compared to the number of inputs to be prescribed (Lions, 1968;Cacuci, 1981a;Le Dimet and Talagrand, 1986). It has contributed to numerous applications related to the analysis and forecasting of meteorological and oceanographic systems (Le Dimet and Talagrand, 1986;Hall and Cacuci, 1983;Navon, 1998;Ghil and Malanotte-Rizzoli, 1991;Bennett, 1992). Early applications of the adjoint state method to hydrological systems have been carried out in groundwater hydrology (Chavent, 1974;Carrera and Neuman, 1986;Sun and Yeh, 1990). The resolution of inverse problems (parameter, state and boundary condition estimation), local sensitivity analysis, where also addressed in this framework in land surface hydrology (Mahfouf, 1991;Callies et al., 1998;Calvet et al., 1998;Bouyssel et al., 1999;Margulis and Entekhabi, 2001;Reichle et al., 2001), vadose zone hydrology (Ngnepieba et al., 2002), river and floodplain hydraulics (Piasecki and Katopodes, 1997;Belanger and Vincent, 2005;Honnorat et al., 2006) and catchment hydrology (White et al., 2003;Seo et al., 2003a).
The previously mentioned applications involve non-linear models and the underlying inverse problems (i.e. parameter estimation and state updating) are ill-posed. For example, equifinality is inherent in the estimation of a distributed hydraulic conductivity in groundwater hydrology or in the estimation of an initial state for the atmosphere in meteorology. The estimation of model inputs require an appropriate combination of prior information (e.g. derived from land cover and soil type) and observations of the model diagnostic variables (e.g. streamflow observations). The variational framework is suitable for the combination of the different sources of information (including statistical information) trough the resolution of a regularized inverse problem.
The use of parsimonious parametrizations (i.e. scalar multipliers or parameter zonation) can be seen as a regularization approach leading to well-posed inverse problems. Using Tikhonov regularization (Tikhononv and Arsenin, 1977), prior values are specified though a penalization term in the objective function. Therefore, deviations from prior values are allowed only when strongly supported by the calibration data. Whatever the strategy adopted for the regularization of the inverse problem, the adjoint state method enable a precise and efficient estimation of the gradient driving efficient optimization algorithms. Adjoint-based sensitivities can also provide an extensive insight on the way the values specified at the different grid elements influence the simulated hydrological response. The assessment is only local (i.e. outcomes Hydrol. Earth Syst. Sci., 13,[503][504][505][506][507][508][509][510][511][512][513][514][515][516][517]2009 www.hydrol-earth-syst-sci.net/13/503/2009/ valid for a specific point of the parameter space) but provide tremendous information which would require a prohibitive computational cost if it is was to be obtained using sampling based approaches. In rainfall-runoff modelling, deterministic sensitivity analysis and gradient-based parameter estimation have been used in the past and the contributions of McCuen (1973a,b) and Gupta and Sorooshian (1985) are directly in line with the current research endeavor. The explicit but piecewise differentiation (i.e. analytic derivatives rather than classical finite difference approximation) carried out by McCuen (1973a) and Gupta and Sorooshian (1985) corresponds to the strategy adopted for the forward mode of algorithmic differentiation (Rall, 1981). By making the computational cost independent from the dimension of the input space the adjoint state method (implemented with the reverse mode of algorithmic differentiation) represents a significant improvement for the analysis and control of spatially distributed hydrological systems. The contributions of White et al. (2003) and Seo et al. (2003a,b) address the use of this approach for parameter estimation and state updating.
The objective of this paper is to demonstrate the potential of variational methods, briefly presented in Sect. 2, for the analysis of distributed rainfall-runoff models. A very simple application case is adopted for this prospective study. Other investigations involving more complex configurations and other model structures were carried out and will be reported in due course. A distributed flash flood model described in Sect. 3 was applied to a small catchment of the Thoré basin. The authors seek to illustrate what can be learned or corroborated using forward and adjoint sensitivity analysis for understanding the mapping between the model parameters and the simulated hydrological response. Using synthetic observations, the ability of efficient adjoint-based optimization to estimate reliable values for the model parameters is investigated. The results of sensitivity analysis and parameter estimation experiments are provided in Sects. 4 and 5 and the paper is concluded by a discussion of the main outcomes.

Variational methods for sensitivity analysis and parameter estimation
Variational methods provide a deterministic framework for the analysis and control of physical systems. The mathematical formalism, based on functional analysis and differential calculus, was largely expanded by related scientific disciplines such as optimal control and optimization. Adopting the vocabulary of optimal control theory, model parameters, initial and boundary conditions are referred as control variables living in a so-called control space. This differential approach enables the exact calculation of the derivatives of a function of the model prognostic variables with respect to all control variables. Using the adjoint state method, the computational cost can be indepen-dent from the dimension of the control space. It is precisely this feature which makes the approach very attractive for the analysis and control of spatially distributed systems.
Although any control variable can be analysed, the focus of this paper is parametric uncertainty. The use of derivatives for local sensitivity analysis and parameter estimation is addressed in Sects. 4 and 5. In this section, the calculation strategy is briefly described using a simplified mathematical formalism and the practical implementation of the approach is discussed.
For a didactic presentation of the approach, let us consider that the behavior of the system between times t 0 and t f is described by a generic model of the form: where x is the state variable of dimension N s , M a nonlinear operator (after space discretization) and α a vector of parameters of dimension N p . When the model parameters are fixed to α=ᾱ,x the corresponding nominal value for the state variable x is obtained by solving the system given by Eq.
(1). In order to analyse or control the behaviour of the system, let us define a generic objective functional: where φ is a nonlinear function of the state variables and model parameters. The objective function J can represent a specific aspect of the system behaviour (i.e. response function) or quantify the misfit between the model diagnostic variables and the available observations (i.e. cost function). The gradient of the functional J (real valued scalar function) with respect to α at the pointᾱ is given by The components of this vector quantify the rate of change of J along the vectors of the standard basis in the parameter space. After the application of a normalization procedure (discussed in Sect. 4), importance measures can be estimated in order to compare the relative influence of the various model parameters on the response of interest. When J is a cost function to be optimized, ∇ α J can drive very efficient gradient-based optimization algorithms (e.g. quasi-newton) for the estimation of model parameters.

Problems underlying the approximation of derivatives
Differential sensitivity analysis and gradient-based parameter estimation both rely on an efficient and accurate evaluation of partial derivatives. The most common technique for the evaluation of the gradient components consists in repeated model evaluations. For example, the first order finite difference approximation for the i-th component is given by where ε refers to a perturbation applied to the nominal value of α i . Using this approach, the model can be considered as a black box and the practical implementation is straightforward. However, the precision and efficiency of this technique are very limited. The accuracy of the approximation crucially depends on choice of the step size ε.
In practice, there is a very difficult compromise to be found. Large values of ε lead to truncation error, small values may give rise to cancellation/round-off error. Moreover, for the approximation of all gradient components (i.e. ∂J /∂α i i=1, . . . , N p ), perturbations should be applied along all vectors of the canonical basis of R N p . Using the first order finite difference approximation given by Eq. 4, N p +1 model evaluations are necessary. This number is of course larger for higher order approximations. Therefore, the overall computational cost is at least linearly related to the dimension of the parameter space.
In order to avoid the use of a perturbation parameter ε, derivatives can be calculated analytically. A very general and comprehensive mathematical formalism for differential sensitivity analysis was proposed by Cacuci (1981a,b). It is based on the concept of Gâteaux derivative, a generalisation of the concept of directional derivative in differential calculus.

Forward sensitivity analysis
The derivative of the objective function J at the pointᾱ in the directionα is given by: wherex refers to variations on the state variable x resulting from perturbations on the parameters α in the directionα. It is important to note thatĴ (ᾱ,α)= ∇ α J,α ( , standing for the scalar product). Given that x is governed by Eq. (1), it is necessary to derive this system in order to estimatex. Therefore,x is solution of the following system: where [∂M/∂x] represents the Jacobian of the model with respect to the state variables and [∂M/∂α] the Jacobian of the model with respect to the model parameters.
The system given by Eq. (6) is the so-called tangent linear model (TLM). For a given perturbationα,x is obtained by the resolution of the TLM. The resulting variation of the objective function (e.g.Ĵ (ᾱ,α)) can be calculated from Eq. (5).
However, in order to obtain all gradient components, the operation should be repeated forα corresponding to the different vectors of the canonical basis of R N p . This means that the resolution of the TLM has to be carried out for each directionα. Therefore, the precision problem (i.e. approximation error) is addressed but the overall computational cost is still dependent on the dimension of the parameter space. This difficulty can be overcome by using the adjoint state method.

Adjoint sensitivity analysis
The linearity ofĴ (ᾱ,α) with respect toα is produced using the introduction of an auxiliary variable p (of dimension N s ). It can be shown (Cacuci, 1981a;Le Dimet and Talagrand, 1986) that if p is governed by the following system the gradient is given by where [ ] T stands for the transpose.
It is important to note thatx andα do not appear in Eqs. (7) and (8). Therefore, once p is known by integration (backward in time) of the system described by Eq. (7), all the components of the gradient ∇ α J needed for sensitivity analysis and parameter estimation can be calculated. In the mathematical optimization framework, the objective is to maximize/minimise the cost function J while x is subject to the constraint given by Eq. 1 (i.e. x should verify the governing equations). In this case, the adjoint variables correspond to the Lagrange multipliers of the constrained optimization problem. The principal difficulty resides in the derivation and transposition of complex operators.

Practical implementation
In principle, forward and adjoint sensitivity analysis can be performed on the continuous or discrete formulation of the model. Different implementation strategies can be adopted depending if the derivation and transposition operations are carried out on the continuous form of the direct model, on its discretized form or directly on the computer code. Algorithmic differentiation (AD) is usually preferred (see Sei andSymes, 1995 or Sirkes andTziperman, 1997 for counterexamples) for reliable and accurate derivatives. The reader is referred to Griewank (2000) for a comprehensive description of AD.
The source code implementing the model and objective functional is a concatenated sequence of instructions. Each statement contains elementary functions which can be easily derived (i.e. local Jacobians). Algorithmic differentiation (AD) is based on a rigorous application of the chain Hydrol. Earth Syst. Sci., 13, 503-517, 2009 www.hydrol-earth-syst-sci.net/13/503/2009/ rule (e.g. product of local Jacobians) in the forward or reverse direction. In order to use this discrete equivalent of forward and adjoint sensitivity analysis methods, the source code of the model should be available. The availability of automatic differentiation engines (see http://www.autodiff.org/) provide a helpful and efficient support for the practical implementation of variational methods. It is important to emphasise that using this implementation strategy, the potential on-off switches characterising the representation of physical processes (i.e. thresholds in model formulation) are simply reported in the TLM and ADM (Zou et al., 1993). The derivative provided by AD is therefore an element of the subgradient.

Flash flood model
A very simple and common model structure was adopted and applied to a small catchment using synthetic observations. An event-based distributed rainfall-runoff model is applied to a small area in the upper part of the Thoré catchment (Tarn department, South West of France).

Model description
The underlying physics of MARINE flash flood model (Estupina-Borrell et al., 2006) is adapted to events for which infiltration excess dominates the generation of the flood. A simplified version of the model is used for this prospective study. Rainfall abstractions are evaluated using the Green Ampt infiltration model and the resulting surface runoff is transferred using the Kinematic wave approximation (KWA). The complex geometry of the catchment is described by a structured grid in which each cell receives water from its upslope neighbors and discharge to a single downslope neighbor (steepest direction). For a one dimensional flow of average velocity u and average depth h, the continuity equation can be expressed as: where r is the rainfall intensity and i the infiltration rate. Using the KWA approximation, which has shown the ability to represent channelized and sheet overland flow (Singh, 2001), the momentum conservation equation reduces to an equilibrium between the bed slope S 0 and the friction slope S f . The Manning equation (uniform flow on each grid cell) is used to relate the flow velocity and the flow depth: where R is the hydraulic radius, n the Manning roughness coefficient and w the elemental flow width. In this simplified version of the model, the flow width is constant (rectangular section) and given the ratio between the width (grid resolution) and the flow depth the hydraulic radius is approximated by the water depth (i.e. R=h). The resulting equation governing the overland flow is given by: In the right hand side of Eq. (11), the infiltration rate i(t) is estimated using the very common Green-Ampt infiltration model. For an homogeneous soil column characterised by its effective hydraulic conductivity K, ψ the soil suction at wetting front, the potential infiltration rate is given by where θ is the relative initial soil moisture (i.e. θ∈[0, 1]), η the soil porosity and I (t) the cumulative infiltration at time t. After surface ponding, the cumulative infiltration at time t+ t can be calculated by the following equation which is solved by Newton's method.
In order to carry out the sensitivity analysis and parameter estimation experiments presented in Sects. 4 and 5, it is necessary to implement the tangent linear and adjoint models for the hydrological model presented in this section.

Computer code differentiation
As emphasised in Sect. 2.4, the best representation of the operator to be derived is the associated computer code. The algorithmic differentiation of the MARINE source code (in Fortran 90) was carried out with the support of an automatic differentiation engine. The TAPENADE automatic differentiation engine (Hascoët and Pascual, 2004), a source-tosource transformation program, was adopted because of its flexibility and efficiency for both forward and reverse modes. Preliminary modifications of the source code were necessary and the code produced by TAPENADE was optimized in order to reduce the memory footprint and the computational time. This leads to a computational time for the adjoint model which is about 6 times higher than the one observed for a single model evaluation.

Case study description
MARINE was applied to a very small catchment area (25 km 2 ) from the upper part of the Thoré basin which was affected by a catastrophic flood event in November 1999. During this event, the observed cumulative rainfall was 135 mm and the maximum intensity around 75 mm/h −1 . Using the observed rainfall forcing (radar data from Météo France) and prior values derived from published tables for the model parameters, synthetic observations are generated. Using the values specified in Table 1 (spatially uniform values for model parameters), the resulting specific discharge is typical for Mediterranean flash flood events. The two sub-catchments contributing to the discharge in Labastide-Rouairoux (outlet situated to the north on Fig. 1) are drained by the Thoré river (eastern part) and the Beson stream (smaller sub-catchment in the western part).

Differential sensitivity analysis
In differential sensitivity analysis, first order importance measures are calculated from the gradient when the response is scalar (e.g. peak discharge). For a vectorial response (e.g. entire flood hydrograph), the Jacobian matrix of the transformation can be evaluated. It can be computed column by column using the forward mode of AD, line by line using the reverse mode. When the dimension of the parameter space is much larger than the dimension of the response to be analysed, the adjoint technique (e.g. reverse mode of AD) is the most efficient calculation method. However, using the rate of change along the vectors of the standard basis (components of the gradient), the parameters cannot be ranked because the nominal values might be characterised by different units and therefore different orders of magnitude. It is possible to normalize the partial derivatives with the associated nominal values for the parameter and response (e.g. ∂J /∂K.K/J ). In this case, the importance measure corresponds to the effect on the response from perturbing the parameter by a fixed fraction of its base value. In order to take the uncertainty underlying the different parameters into account, the associated standard deviations (or variance) can be used for the normalization. The resulting importance measure corresponds to the effect on the response from perturbing the parameter by a fixed fraction of its standard deviation (Helton, 1993). Although the approach is quite appealing, this means that derivatives are used for ranking over the space of uncertain parameters. In this paper, the choice was made to prefer a strictly local analysis in which the assessment concerns the base point where derivatives are evaluated.
In the following paragraphs, forward sensitivity analysis will be carried out to a parametrization of reduced dimensionality, adjoint sensitivity analysis will be used for the fully distributed parametrization. In order to facilitate the interpretation of the results, a spatially lumped rainfall forcing is used for most of the experiments presented in this section.

Numerical experiments with network/hillslopes parametrization
In this paragraph, a classical reduction of the control space is adopted. The drainage network and the hillslopes are distinguished using a threshold on the drained area. It leads to the definition of basis vectors exclusively composed of 0 and 1 on the hillslopes or drainage network. The spatially distributed parameters are expressed in this basis (e.g. rather than the canonical basis). In other words, for each parameter, a scalar multiplier is applied on the hillslopes and another in the drainage network.

Analysis of a scalar response
The relative importance of the scalar multipliers on two aspects of the hydrological response (flood volume and flood peak) is provided by Figs. 2 and 3. The parameters θ, ψ and η appear as a product in the infiltration model (Eq. 12). Therefore, given the adopted normalization procedure, they have exactly the same influence on the model response. If a statistical normalization procedure is adopted, the ranking of those parameters would be completely driven by the associated statistical properties (variance or standard deviation).
It is important to note that for both flood volume and flood peak, all sensitivities are negative. This means that increasing the nominal value for all the parameters reduces the magnitude of the response. In fact, increasing the Green-Ampt model parameters leads to increased infiltration loses and therefore reduces the flood volume and flood peak. Increasing the friction parameters leads to a flatter flood hydrograph and therefore also reduces the flood peak.
Hydrol. Earth Syst. Sci., 13, 503-517, 2009 www.hydrol-earth-syst-sci.net/13/503/2009/ The analysis of Fig. 2 confirms that the flood volume is mainly driven by the infiltration parameters on the hillslopes (hydraulic conductivity K and initial soil moisture θ). The infiltration parameters assigned on the hillslopes also have a significant influence on the flood peak but the effect of modifying the friction parameter in the drainage network is much larger (see Fig. 3).
The results presented above are strictly local and the importance measures are affected by the nominal values assigned to the different parameters. However, given the reduced computational cost of the analysis, it can be carried out at different locations in the parameter space. Additional experiments (not reported in this paper) where conducted along a transect of the parameter space (θ ∈ [0, 1]). The results show that the wetter the soil at the beginning of the event, the faster the decay of the infiltration rate to the hydraulic conductivity, and therefore the greater the relative influence of K when compared to the initial soil moisture θ .

Analysis of the flood hydrograph
In order to analyse of the effect of parameter variations on the complete flood hydrograph, a vectorial response containing the temporal evolution (80 time steps) of the simulated discharge was considered. Given to the ratio between input and output space dimensions (i.e. 6/80), the Jacobian matrix is computed using the tangent mode of TAPENADE (i.e. forward sensitivity analysis).
Each column of the Jacobian matrix represents the variations in discharges resulting from the perturbation of one of the model parameters. After normalization, the physical interpretation of the lines and/or columns of the Jacobian matrix can provide an interesting insight (not reported in the present contribution). However, a very interesting perspective is provided by the singular value decomposition of this Jacobian matrix.
The singular value decomposition (SVD) of an m×n matrix A is a factorization of the form where S is a diagonal matrix containing the singular values of A in the decreasing order while U and V are orthogonal matrices (respectively of dimension m×m and n×n). The set of entries composing the main diagonal of S, denoted σ 1 , σ 2 , . . . , σ min(m, n) , is referred as the singular spectrum of A. The columns of U = {u 1 , u 2 , . . . , u m } and V = {v 1 , v 2 , . . . , v m } are the left and right singular vectors in the input and output spaces of the transformation represented by A. The magnitude of the singular values in S represents the importance of the corresponding singular vectors in the columns of U and V. This factorization is widely used for the analysis of linear ill-posed inverse problems (Hansen, 1998). Its application to non-linear systems can serve many purposes such as sensitivity or identifiability analysis, control variables estimation or perturbations growth analysis in ensemble prediction (Li et al., 2005;Doherty and Randall, 2008;Clément et al., 2004;Marchand et al., 2008;Durbiano, 2001;Buizza and Palmer, 1995). For the adopted network/hillslopes parametrization, the singular spectrum is given by Table 2 and the components of the first 2 singular vectors in the parameter space (right singular vectors) are plotted in Fig. 4. From Table 2, it can be seen that the decay of the singular values is very rapid. Most of the variability (more than 97%) is captured by the first two singular vectors. These vectors exhibit a clear distinction between the production and transfer of runoff. This could have been expected because they represent orthogonal directions in the parameter space.
Given the components of the first singular vector and the magnitude of the singular value associated, the scalar multiplier affecting the friction in the drainage network has an overwhelming influence on the flood hydrograph. The analysis of the second singular vector components indicates a predominance of the hillslopes infiltration parameters and a potential compensation with friction parameters. Using the adjoint state method (reverse mode of AD) the computational cost related to the evaluation of local sensitivities is not related to the dimension of the parameter space. Therefore a similar analysis was carried out without reduction of the control space using scalar multipliers.

Numerical experiments with fully distributed parameters
When no strategy is adopted for dimension reduction, the use of sampling based sensitivity analysis methods is not tractable for distributed parameter systems. In this paragraph the full potential of the adjoint method is exploited in order to analyse the effect on the hydrological response of variations on the value assigned to each element of the computational grid for the model parameters. The analysis is also carried out for a scalar response and for the entire flood hydrograph.

Analysis of a scalar response
For a scalar response, a single integration of the adjoint model yields to sensitivity indices for all parameters at all spatial locations. In this paragraph, only the sensitivity to the flood peak is reported. Within a single river reach, it is obvious that increasing the friction will reduce the maximum discharge (i.e. negative sensitivities at all spatial locations). The situation is more complex when dealing with overland flow over the topography of a catchment. The analysis of Fig. 5 reveals that positive and negative sensitivities are encountered over the catchment. Negative sensitivities are much larger than positive sensitivities in magnitude but there are locations where increasing the friction coefficient lead to a slight increase of the maximum discharge. While all sensitivities have the expected sign (i.e. negative) along the main stream (i.e. Thoré river), some positive sensitivities can counterbalance the overall effect in some concomitant sub-basins (e.g. area drained by the Beson stream and some hillslopes along the Thoré river).
Therefore, when applying a single scalar multiplier for the entire catchment, compensation effects usually occur which are very difficult to identify without such analysis. A simple corroboration can be carried out using multiple model evaluations. As an illustration, increasing the nominal by 10% for all roughness coefficients leads to −4.5% variation on the peak discharge. This variation is −5.9% when only the cells featuring a negative sensitivities are modified and it becomes +1.5% when the same operation is carried out on the cells featuring positive sensitivities.

Analysis of the flood hydrograph
When considering the entire flood hydrograph, the ratio between the input and output space dimensions is now very close to 100 (i.e. 3×2582/80). Therefore, the Jacobian matrix is computed line by line using multiple integrations of the adjoint model.
In order to facilitate the physical interpretation, the SVD is performed on sub-Jacobians. Each sub-Jacobian accounts for a single parameter but for all spatial locations. For a given parameter, the analysis enables an extensive understanding for the influence of the values specified over the entire catchment.
For this specific analysis, the singular vectors in the parameter space can be mapped on the surface of the catchment. They are provided for the parameters n and K by Figs. 6 and 7. The analysis of the previously cited figures enables an extensive insight into the model behavior. The drainage network is emphasized for the friction coefficient (Fig. 6) and flow concentrations areas can be distinguished for the hydraulic conductivity (Fig. 7). Significant interactions between different regions of the catchment were already identified when analysing the variability of the sensitivity to the flood peak (see Fig. 5

). A similar behaviour is
Hydrol. Earth Syst. Sci., 13, 503-517, 2009 www.hydrol-earth-syst-sci.net/13/503/2009/ encountered when analysing the entire flood hydrograph for both friction coefficient and hydraulic conductivity. The leading singular vector mainly corresponds to the Thoré river and drained area for both n and K (Figs. 6a and  7a). However, the interacting regions are already characterised by different signs. For the second singular vectors, the catchment regions and signs are inverted (Figs. 6b and  7b).
The interactions between the two sub-basins can be also analysed in the observation space. For the roughness coefficient n , the components of u 1 and u 2 (singular vectors in the observation space) are plotted together with the outlet discharge (see Fig. 8). The analysis of this figure explains that the slight disruptions of the hydrograph during both the rising limb and the recession are mainly due to the flood wave coming from the Beson stream. For the imposed spatially lumped rainfall, given that the smaller sub-basin (holding v 2 ) is closer to the catchment outlet, the resulting smaller concentration time leads to a quicker response perfectly characterised by u 2 . The correspondence between the singular vectors in the parameter and observation spaces is really informative and meaningful.
In addition, the singular spectrum for all parameters and different forcing conditions (lumped and spatially variable rainfall) was also analysed. The analysis of Fig. 9 reveals that the decay of singular values is faster for the roughness coefficient when compared to the infiltration parameters. Although the influence of friction is very important, less singular vectors are necessary to describe the sub-space producing variability in simulated discharges. This is due to the fact that this subspace is mainly restricted to the drainage network for friction parameters (see Fig. 6).
For a spatially distributed rainfall forcing the decay of singular values appears to be slower and the gap between friction and infiltration parameters is reduced. The fact that more singular vectors are necessary to describe the sub-space producing important variations of the simulated discharges is a sign of increased information content. This increase in information content was expected when comparing the results obtained with uniform rainfall forcing with those obtained for spatially variable precipitations.
It was shown in this section that the derivatives obtained with algorithmic differentiation provide a valuable introspection into the relation between the model parameters and the simulated hydrological response. The availability of accurate adjoint-based sensitivities also enables the use of efficient gradient-based optimization techniques for the estimation of model parameters.

Gradient-based parameter estimation
In this paper, a bound-constrained (inequality constraints) quasi-Newton (BFGS) optimization algorithm (Lemarchal and Panier, 2000) from the MODULOPT library was used. Using the adjoint sensitivities, the algorithm estimates the active set by performing a Wolfe line search along the gradient projection path.

Numerical experiments with network/hillslopes parametrization
Synthetic observations are generated with the parametrization described in the previous section with K net =4 mmh −1 , K hill =2 mmh −1 , n net =0.05, n hill =0.08 and θ=0.5 (uniform over the catchment). The Nash criterion is used to measure the misfit between model simulations and the synthetic observations. As shown in Fig. 10, all control variables are retrieved independently from the initial parameter values (initial point for the optimization routine). The relative importance of the parameters inferred from the local sensitivity analysis results seems to be similar for the Nash efficiency over the bounded parameter space. The more sensitive is the performance measure to a parameter, the greater is the identifiability of this parameter and therefore the faster the iterates convergence to the reference value (e.g. parameters n net and K hill ).
It is important to emphasize that both adjoint (required to the estimation of the gradient) and direct model evaluations (required required for the line search) were reported in Fig. 10. The total number of iterations (less than 50) is very small when compared to the number of model evaluations required by evolutionary algorithms like the very popular Shuffled Complex Evolution (SCE) from Duan et al. (1992). As commented by Kavetski et al. (2006), "5 min of Newton computing often replaces 24 h of SCE search and yield useful additional information".
In the parameter estimation experiments presented in this section, a network/hillslopes parametrization was adopted in order to ensure the identifiability of model parameters. However, the sensitivity analysis results described in Sect. 4.2.2 have shown that the sub-space from the original parameter space driving the simulated discharges is spanned by the  leading singular vectors (i.e. in the parameter space) of the Jacobian matrix. Given that a small number of singular values are dominant, as illustrated in Fig. 9, most of the variability can be captured with very few orthogonal directions in the parameter space. In the following paragraph, the leading singular vectors of the Jacobian matrix are used in order to reduce the dimensionality of the parameter estimation problem.

Numerical experiments with TSVD parametrization
Using the sensitivity analysis outcomes, it is possible to carry out parameter estimation in a reduced basis taking the observations information content into account. The spatially distributed model parameters are therefore expressed in the basis spanned by the Jacobian leading singular vectors.
The derivation was carried out at a specific point of the parameter space. However, singular vectors in the parameter space are mainly determined by the topography of the catchment and the spatial variability of rainfall. Important variations of those orthogonal directions where not encountered when the Jacobian is evaluated at different locations in the parameter space. In order to compute the singular vectors describing the relevant sub-space for parameter estimation, the SVD was performed for the Jacobian calculated with spatially uniform rainfall forcing. A more rigorous approach would require the Jacobian to be computed with the response of the catchment for several rainfall events.
When compared to the parameter estimation experiments carried out in the previous section, a more complex virtual hydrological reality was adopted for the generation of the synthetic flood hydrograph. A relatively meaningful spatial variability was imposed for the different model parameters. The hydraulic conductivity K is decreasing linearly with the distance to the hydrographic network and the roughness coefficient n on the hillslopes is assigned according to a land-use classification derived from a SPOT satellite image. The friction coefficient is constant in the drainage network and the initial soil moisture θ constant over the catchment. The initial condition for the optimization consist in the values specified in Table 1  The calibration problem is then tackled using parametrizations of increasing dimensionality. The simpler parametrization P 1 assigns a single scalar multiplier for each parameter over the entire catchment area (i.e. n K =n n =n θ =1 in Table 3). For P 2 , a scalar multiplier is applied to the hillslopes and another one to the drainage network for the hydraulic conductivity and friction coefficient (i.e. n K =n n =2).
Then, apart from the parameter θ , the number of degrees of freedom is gradually increased by taking as a basis the singular vectors driving X% of the variability for parameters K and n (parametrizations P SVX in the table). As reported during the sensitivity analysis (Sect. 4.2.2), the number of degrees of freedom required for the roughness coefficient is much lower than that obtained for the hydraulic conductivity. The number of degrees of freedom for each parameter, the Nash performance for the estimated parameter set and the inverse of the condition number are given in Table 3. The condition number was calculated with an approximation of the hessian after the last BFGS update (i.e. at the optimum) of the quasi-newton algorithm. It is reminded that the larger the ratio 1/κ(H ), the better is the conditioning of the optimization problem.
From the results shown in Table 3, it seems that using this description of the parameter space the number of control variables can be increased without altering the conditioning of the optimization problem. The previous statement is valid as long as the vectors describing the kernel in the parameter space (the specified degrees of freedom which do not significantly alter the hydrological response) are not introduced in the parametrization. The results obtained with P SV90 show that even with noise-free observations, the use of those directions for the description of the affordable sub-space lead to instability in the inverse problem.
However, it is interesting to note that with respectively 7 and 10 degrees of freedom (i.e. P SV70 and P SV80 ), the % of variability eigenvalues K (uniform rainfall) n (uniform rainfall) θ, η, φ (uniform rainfall) K (radar rainfall) n (radar rainfall) θ, η, φ (radar rainfall) Fig. 9. Singular values spectrum for lumped and spatially variable rainfall. Table 3. Complexity (label and number of degrees of freedom for the parameters K, n and θ ), Nash efficiency (with synthetic noise free observations) and conditioning for the different parametrizations.
n K n n n θ Nash 1/κ(H ) conditioning is even better than the one obtained with parametrization P 2 (5 degrees of freedom). As emphasised by Tonkin and Doherty (2005), the subspace determined from the truncated singular value decomposition of the Jacobian (TSVD) is determined from the observations information content whereas the subspace constructed from a prior parsimony strategy is not. In the previously cited contribution, the Jacobian matrix was approximated using finite differences and used in the linearized equations of the Levenberg-Marquardt method. In order to prevent overfitting and combine the advantages of TSVD and Tikhonov regularizations an hybrid regularization methodology was proposed.
In fact, while the truncation level of the SVD is the only mechanism for preventing over-fitting, the penalization term of the Tikhonov approach is also a way to insert prior information on the parameters. In the experiments carried out in this paper, the hydrological reality is known. The comparison with the estimated parameters have shown that the improvement of performances in terms of Nash efficiency do not necessarily come with parameters closer to the reference values.

Discussion and conclusions
Using a relatively simple application case, it has been shown that the potential of variational methods for distributed catchment scale hydrology should be considered. Although for this particular application many outcomes are limited to evidence retrieval, the adopted approach should be further exploited.
The fact that other techniques might offer a practical advantage (quick and easy implementation) when compared to variational methods cannot be contested. However, the practical implementation of the adjoint state method is largely facilitated by the advent of very efficient automatic differentiation engines such as the one adopted for this study. The use of this approach is at best anticipated in the development stage of hydrological models but remain affordable for existing computer models.
It is important to emphasise that a single integration of the adjoint code, encompassing the forward integration of the direct model and the backward integration of the adjoint model, yields all spatial and temporal sensitivities (Hall and Cacuci, 1983). The key advantage of this technique is that the computational cost is independent from the dimension of the control space. The results provided in this paper show that spatially distributed parameter sensitivities can be obtained for a very modest calculation effort (∼6 times the computing time of a single model run). The analysis of an essential aspect of the simulated hydrograph such the maximum discharge has shown that the influence of the friction coefficient assigned at different spatial locations was characterised by relatively complex sensitivity patterns.
For the analysis of a vectorial response such as the flood hydrograph, the Jacobian of the transformation can be calculated using the adjoint technique. The physical interpretation of the singular vectors in the parameter and observations spaces brings out relevant features of the rainfall-runoff transformation. Furthermore, the analysis of the singular spectrum can be used to apprehend the complexity of an affordable parametrization and compare the information content of different rainfall events. Sensitivity analysis can be motivated by different goals. For understanding the model behaviour for parameter values leading to an acceptable fit to the available observations, the analysis should be carried out with posterior probability distribution functions (PDFs). Using sampling based approaches, this is rarely the case in practice, partly because posterior PDFs are often characterized by dependence which can be difficult to represent and compromise the use of many existing global sensitivity analysis techniques (see Kanso et al., 2006 for one of the few attempts). For this specific setting, local sensitivity analysis at the best estimate might prove very informative. Many of the outcomes, such as the spatially distributed importance measures, are mainly driven by the topography of the catchment and the spatial variability of the rainfall forcing rather than the specific point in the parameter space used for the analysis.
For the estimation of model parameters, even when its is obtained with an empirical dimension reduction (e.g. scalar multipliers), uni-modality or at least limited multi-modality can be achieved for many hydrological models. In this case, gradient-based optimization techniques are indisputably the most efficient (i.e. convergence to single mode or exploration of multi-modality). As underlined by Kavetski et al. (2006) convergence safeguards such as line-searches and trust-regions in modern gradientbased algorithms improve significantly the reliability of the estimates. The accuracy of the gradient can be essential and to this regard, the approach used in this paper avoid the specification of a perturbation parameter (required for finite difference approximation). The derivatives computed with the reverse mode of algorithmic differentiation (i.e. adjoint method) were found exceedingly efficient in driving a bound-constrained quasi-newton optimization algorithms to the reference values used to generate synthetic observations.
Although the number of gradient/model evaluations is already eloquent, the authors acknowledge that a comparison with global non-smooth optimization techniques such as the Shuffle Complex Evolution from Duan et al. (1992) would strengthen the argument of the paper. However, the mathematical representation of hydrological processes in distributed models tend to produce smoother response surfaces. Although the presence of thresholds remains in the model formulations, they usually occur at the grid element level and do not produce discontinuous derivatives for the cost function. In fact, as far a discharge is concerned, the cost functions used for the calibration of model parameters involve an integration of the residuals over time for an integrated hydrological response (i.e. spatio-temporal smoothing).
As advocated by Moore and Doherty (2006a,b), empirical dimension reduction do not allow all the information to be extracted from observation data in order to reduce the predictive model error. Parameters at different locations over the surface of the catchment are not equally constrained by the observations of the hydrological response. The identifiable subspace can described using the truncated singular value decomposition of the Jacobian matrix (TSVD). Doherty and Randall (2008) recently proposed statistics for evaluating parameter identifiability and error reduction using this factorization of the Jacobian matrix.
The experiments carried out with TSVD parametrization show that this technique represents a promising regularization strategy. However, as emphasised by Tonkin and Doherty (2005), it is essential to combine this approach with Tikhonov regularization in order to account for prior information and prevent overfitting. The appropriate calibration paradigm would therefore require a good compromise to be found between flexibility and stability. The objective would be to improve prior values rather than conducting a blind search over an arbitrarily reduced parameter space.