Interactive comment on “ Incorporation of rating curve uncertainty in dynamic identifiability analysis and model structure evaluation ” by S

Abstract. When applying hydrological models, different sources of uncertainty are present and the incorporation of these uncertainties in evaluations of model performance are needed to assess model outcomes correctly. Nevertheless, uncertainty in the discharge observations complicate the model identification, both in terms of model structure and parameterization. In this paper, two different lumped model structures (PDM and NAM) are compared taking into account the uncertainty coming from the rating curve. The derived uncertainty bounds of the observations are used to derive limits of acceptance for the model simulations. The DYNamic Identifiability Approach (DYNIA) is applied to identify structural failure of both models and to evaluate the configuration of their structures. The analysis focuses on different parts of the hydrograph and evaluates the seasonal performance. In general, similar model performance is observed. However, the model structures tend to behave differently in function of the time. Based on the analyses we did, the probability based soil storage representation of the PDM model outperformed the NAM structure. The incorporation of the observation error did not prevent the DYNIA analysis to identify potential model structural deficiencies that are limiting the representation of the seasonal variation.


Introduction
Model structures applied in hydrological rainfall-runoff modeling have traditionally been considered as a fixed representation of the dominant processes of the underlying system resulting in a set of model structures like HBV (Lindstr öm et al., 1997), PDM (Moore, 2007), TOPMODEL (Beven and Kirkby, 1979) amongst others, assuming that different conditions can be captured by optimizing the parameter values.These models proved their capability by their range of applications and are still essential since they allow to interpret the dependencies of its parameters on catchment properties, benefiting model interpretation and regionalization (Fenicia et al., 2011).Nevertheless, the Figures

Back Close
Full "uniqueness of place" (Beven, 2006) results in difficulties in finding the optimal parameter set (i.e.lack of identifiability), indicating potential failures of the model structure amongst other sources of uncertainty.Recently, efforts towards ad hoc identification of the model structure of the system under study and the idea of adding flexibility to the model development process are pursued (Wagener et al., 2001;Fenicia et al., 2011;Clark et al., 2008).This facilitates the testing of model structures as hypotheses of the underlying dynamics (Clark et al., 2011).Nevertheless, identification of the most appropriate model structure(s) in combination with parameterizations of those dynamics is an essential part to ensure model structures are accepted for the right reasons (Kirchner, 2006).
To evaluate the suitability of an individual conceptual model structure by seeking a best parameter set based on a particular objective function (e.g.Nash-Sutcliffe efficiency) is generally insufficient.Both the use of multiple non-commensurable objectives measuring the model's performance during different response modes (Gupta et al., 1998;Boyle et al., 2000) or the use of a combined set of limits of acceptance focusing on different flow characteristics allows to link model performance and model components (Blazkova and Beven, 2009).The use of data different from measurements, such as groundwater level information or isotope data (Fenicia et al., 2008;Winsemius et al., 2009) is preferable, but in many cases not available and the flow time series remains the main basis for model evaluation.Hence, such evaluation criteria are very dependent on the reliability of the flow measurements they use.The inherent uncertainty in these observed flows restricts the ability to discriminate among competing model structures (Clark et al., 2011).Taking into account the uncertainty on the rating curve in the model evaluation is thereby worthwhile investigating.
With regard to rating curve uncertainty, Di Baldassarre and Montanari (2009) distinguish (1) errors of the stage-discharge relation induced by interpolating and extrapolating of river discharge observations, (2) the presence of unsteady flow and (3) the seasonal variation of the roughness, with increasing errors when discharges increase.
To determine the observational error from rating curve interpolation and extrapolation, Figures Blazkova and Beven (2009) and Westerberg et al. (2011a) use a fuzzy regression method introduced by Hojati et al. (2005).Pappenberger et al. (2006) use a twodimensional fuzzy membership function to evaluate the parameter combinations for the rating curve functions resulting in likelihood measures to compute uncertainty bounds in prediction.Krueger et al. (2010) and McMillan et al. (2010) further extended this concept by fitting the rating curve towards a subset of data points and checking consistency of the fit with the remaining points.
The incorporation of the uncertainty of the rating curve in model evaluation has recently been described in literature and most approaches use a time step based method.Beven (2006) proposed the extended Generalized Likelihood Uncertainty Estimation (GLUE) approach as a way to partly avoid the subjectivity of the GLUE uncertainty analysis by translating the uncertainty of the discharge observations in limits of acceptance' (Blazkova and Beven, 2009;Westerberg et al., 2011b;Krueger et al., 2010;Liu et al., 2009).Fuzzy weighting functions (in most cases triangular) are used to assign time step based weights according to the level of performance and the combined weight can be used within the regular GLUE framework to form prediction limits (Beven, 2008).McMillan et al. (2010) on the other hand derive the complete Probability Density Function (PDF) of the measured flow to form a likelihood function used in Bayesian inference parameter search.This results in higher parameter uncertainty and hence wider uncertainty bounds for flow predictions compared to the use of a deterministic rating curve based inference scheme.
The incapability of finding an identifiable set of parameters based on the different sources of uncertainty, also referred as "equifinality", is addressed by the GLUE approach by accepting all parameter sets that are behavioral according to the proposed limits of acceptance (Beven, 2006).Nevertheless, inadequacy of the model structure to represent the dominant processes further contributes to this lack of parameter identifiability.Hence, methods to evaluate and enhance the identifiability of the parameters in more detail are essential for model structure evaluation.Parameter identifiability can be quantified in different ways, but consists in general of an evaluation of the sensitivity of Introduction

Conclusions References
Tables Figures

Back Close
Full the parameter and the dependency towards other parameters (De Pauw et al., 2008).
A model parameter that is highly sensitive towards the model output and which effect is not canceled out by other parameters can be regarded as identifiable.Brun et al. (2001) propose two indices to evaluate the identifiability of the parameters when using large environmental models.By detecting linear and non-linear relations of parameters, Hengl et al. (2007) identify model components causing non-identifiabilities.
Parameter identifiability is highly related to the information content of the data, since the data provides the dynamic conditions to perform the identifiability analysis.Vrugt et al. (2002) anticipate this by identifying informative observations for the identification of the parameters in a sequential optimization algorithm.Temporal analysis to evaluate the information content of the data and to extract signature information is a valuable procedure to identify potential model deficits.de Vos et al. (2010) use temporal clustering to identify periods of hydrological similarity.Reusser and Zehe (2011) propose an approach to relate types of model errors with parameter sensitivity and model component dominance to understand model structural deficits.Reichert and Mieleitner (2009) combine the estimation time dependent model parameters with the degree of bias reduction to identify model deficiency.The DYNamic Identifiability Analysis (DYNIA) developed by Wagener et al. (2003) builds on the GLUE by evaluating the parameter identifiability in a moving window.For each window, an evaluation is done and the distributions of the preferred parameter values are plotted as a function of time.Temporal changes of the parameter probability distribution suggests the compensation of shortcomings in the model structure by these changing optimal parameter values.The approach is highly related to the idea of evaluating the model during different response modes (Wagener et al., 2001), since it reacts on the limitations of model evaluation when aggregating model errors in time, hereby limiting the information on the performance of individual model components active during different periods of time.Parameter dependency is implicitly addressed by the univariate marginal distribution of the parameter by the absence of strong peaks (Wagener et al., 2003).Introduction

Conclusions References
Tables Figures

Back Close
Full In this paper, the DYNIA method is applied using the uncertain measured flow values as source for model structure evaluation.This is combined with the extended GLUE approach by using the uncertain observed values as limits of acceptance for the model evaluation.Two hydrological model structures, NAM (Nielsen and Hansen, 1973) and PDM (Moore, 2007), implemented in the Python programming language (Python, 2012) are evaluated.Furthermore, the evaluation focuses on different response modes and seasonal variation.The paper is structured as follows: first, the used hydrological models are introduced.Secondly, information about the study catchment and the used data is provided.Next, the derivation of the rating curve is introduced and taken as a starting point in the model structure evaluation explained in Sect. 5.The latter consists of the results of the extended GLUE approach, the obtained prediction uncertainty, the DYNIA interpretation and confronted with the posterior parameter distributions when selecting based on subperiod based limits of acceptance.Finally, in Sect.6, the results are discussed and conclusions drawn.

PDM model
The Probability Distributed Model (PDM) is a conceptual rainfall-runoff model which transforms rainfall and evaporation data into flow at the catchment outlet.Figure 1 shows the general layout of the PDM model most commonly used in practice.A more detailed description can be found in Moore (2007).
The model consists of three main components: (1) a probability distributed soil moisture storage component for separation of direct runoff and subsurface runoff, (2) a surface storage component for transforming direct runoff to surface runoff (surface routing), (3) a groundwater storage which receives drainage water from the soil moisture storage component and contributes to baseflow (Moore, 2007).Introduction

Conclusions References
Tables Figures

Back Close
Full The soil moisture reservoirs defined by the probability distribution represent different locations in the catchment with different storage capacity.In any rain event, stores with the smallest storage capacity will be filled first and will start to produce rapid runoff first.Based on the proportion of the catchment with filled stores the area producing fast runoff can also be calculated.As such, the probability-distributed soil moisture storage component is used to define the separation between direct runoff and subsurface runoff.A Pareto or truncated Pareto distribution is mostly invoked for practical applications although the PDM model offers a range of options (Moore, 2007).The Pareto distribution function F (c) and probability density function f (c) used here is given by where C max is the maximum storage capacity in the basin and parameter b controls the degree of spatial variability of storage capacity over the catchment.
The ratio between actual and potential evapotranspiration is defined as and mostly depends linearly (b e = 1) or quadratically (b e = 2) on the soil moisture deficit, (S max − S(t)).
Loss towards the groundwater as recharge is defined by the assumption that the rate of drainage, d i , is linearly dependent on the basin soil moisture content: where k g is the drainage time constant and b g the exponent of the recharge function, here set to 1. S τ is the threshold storage below which there is no drainage and the Introduction

Conclusions References
Tables Figures

Back Close
Full water is immobilized by the soil tension.The routing by the surface storage is represented by a cascade of two linear reservoirs, with equally assumed time constants k f .Subsurface flow is routed by the groundwater storage by a non-linear storage routing function is adopted to effect this transformation.In this case, baseflow is calculated by q b = k b S(t) 3 .By summing the surface runoff and base flow, the total discharge at the catchment outlet is calculated at every time step of the simulation.

NAM model
NAM is the abbreviation of the Danish Nedbo-Afstromings-Model, meaning rainfallrunoff model.Nielsen and Hansen (1973) describe the original model, developed at the Hydrological section of the Institute of Hydrodynamics and Hydaulics Engineering at the Technical University of Denmark.During the last decade, the model is maintained by DHI (Danish Hydraulic Institute) as a part of the MIKE software-suite, but a custom Python implementation was used for this paper.
The NAM model is a rainfall-runoff model that operates by continuously accounting for the moisture content in different and mutually interrelated storages.These storages include: (1) snow storage (not included here), (2) surface storage, (3) lower or root zone storage and (4) groundwater storage (DHI, 2008).The model structure is shown in Fig. 2.
Rainfall contributes to the surface storage when the temperature is above freezing point (freezing is neglected for this paper).When the surface storage compartment is full, the remaining rainfall infiltrates towards the lower zone storages and contributes to the overland flow.Water is also extracted by (potential) evapotranspiration and interflow (hypodermic flow, i.e. horizontal flows in the unsaturated zone).The lower zone storage controls the different subflows, varying linearly with the relative soil moisture content of this lower zone storage.The different processes modeled by NAM are conceptualized by 9 empirical model parameters that need to be calibrated.A short description of each one of the model parameters is presented in Table 2. Introduction

Conclusions References
Tables Figures

Back Close
Full Evapotranspiration occurs at a potential rate from the surface storage.When the moisture content U is less than potential evapotranspiration E p , the remaining fraction of evapotranspiration varies linearly with the lower storage water content (L/L max ) by: The interflow (hypodermic flow), Q IF , is assumed to be proportional to the surface storage, U, and is given as When surface storage is full, excess rainfall P N (effective rainfall after subtracting the interflow), will form overland flow, whereas the remainder is diverted as infiltration into the lower zone and groundwater storage.Overland flow, Q OF , is assumed to be proportional to this saturation excess P N and depends on the soil moisture content in the lower zone storage, given as The amount of water recharging the groundwater storage depends on the soil moisture content in the root zone.The groundwater storage will generate baseflow.The baseflow is assumed to be proportional to the amount of infiltrating water recharging the groundwater storage and depends on the soil moisture content in the lower zone storage.The groundwater recharge is given by

Conclusions References
Tables Figures

Back Close
Full The routing of the interflow uses two linear reservoirs in series with the time constants CK 1 and CK 2 , usually assumed equal (CK 1,2 ).Overland routing is also based on two linear reservoirs, but with a variable time constant depending on an upper limit for linear routing (equation not given, analytical solution used).The baseflow routing is calculated as the outflow from one linear reservoir with time constant CK BF .Total flow is assessed by summing all different subflows.

Study area and data
The Grote Nete catchment, located in the northeast of Belgium served as study area.
It has a temperate climate with an average annual precipitation of 790.3 mm.Rainfall occurs throughout the entire year with more intensive and shorter storms in summer and more frequent, generally less intensive, storms in winter (Rouhani et al., 2007).
The two main tributaries, the Grote Nete and the Grote Laak merge before the Geel-Zammel outlet station.The soils are predominantly composed of sand, sandy loam in the southern and valley areas, and silt, with almost half the area consisting of sandy permeable soils (Rubarenzya et al., 2007) with high hydraulic conductivity (Rouhani et al., 2007).The topology is rather flat with an average slope of 0.3 % and a maximum one of 5 %, and has a shallow phreatic surface with a water table rising close to the surface in winter.Water resources of the Grote Nete catchment have been profoundly influenced by anthropogenic activities.
The model structures are fed with hourly data of rainfall and potential evapotranspiration.Daily potential evapotranspiration data were measured at Ukkel and were assumed to be representative for the Grote Nete catchment.An empirical relationship was used to transpose the data to an hourly time step (Vansteenkiste et al., 2011).The focus and starting point of the analysis are the stage-discharge evaluation points of the Geel-Zammel discharge station, as seen as triangles in Fig. 3.A power law is used to find the relationship between the discharge and the water level: with, Q the discharge, h the water level and a, b, c fitting parameters.
To estimate the uncertain power law, an uncertainty envelope was estimated based on an initial uncertainty of both the discharge derivation and the water level measurements.By varying the 3 parameters of the power law, those realizations fitting in the uncertain envelope of the different measurements were used to derive an overall rating curve uncertainty envelope, similar to Pappenberger et al. (2006) and based on the GLUE (Generalized Likelihood Uncertainty Estimation) methodology (Beven, 2006).A membership function of 1 was given to each rating curve which is within the assumed uncertain boundaries of the measured discharge and water level and zero when outside these boundaries.A parameter set was accepted to fit all measurement points, when all individual membership values are 1.
The same measurement error for all the calibration measurements of the discharge was assumed.Literature reports values between 1.8 % and 8.5 % for discharge measurements and 3 till 14 mm for the water level measurements (Pappenberger et al., 2006).In this study, a discharge error of 5 % and no error for the water level (as no specific information of the observation spot was available and the relative error in the discharge is assumed larger) was assumed to be sufficient for the test-case.More elaborated study would be needed to identify a more reliable value of the uncertainty, since uncertainty in the individual rating curve measurements can be significant for both low and high discharges (Blazkova and Beven, 2009).As opposed to Krueger et al. (2010) and McMillan et al. (2010) all observations were evaluated together, but no realizations of the power-law were able to go through all the observation points uncertainty Introduction

Conclusions References
Tables Figures

Back Close
Full regions resulting in zero membership functions, so a relaxation was needed.A first option is to increase the uncertain membership region around the measurement points until a set of realizations is accepted.A second option, pursued in this study, a relaxation is possible with respect to the number of observation points a realization need to have membership value 1.This choice was driven by the large variability of the observation points between 0.3 m and 0.6 m.By applying this type of relaxation, the chance of a totally incorrect measurement is assumed larger than the error of the individual measurements.When allowed 1 out of 16 membership functions to be zero, the set of behavioral parameter sets can be used to derive uncertainty bounds of the discharge measurements.The resulting uncertainty envelope is shown in Fig. 3.The uncertainty increases towards lower and higher extrapolated areas of the stage-discharge measurement points.Since only membership functions of one and zero are used, every behavioral realization gets the same weight.This assumption is made since the model error was expected to be larger than the measurement error (similar to Krueger et al., 2010).Therefore, it was not expected that the hydrological model realizations would fall into these uncertain measurement bounds for all timesteps and more detailed information about the observation error structure within the bounds would not add significant information to the model structure evaluation.
Optimally, the uncertainty bounds of the flow time series would be calculated by sampling from the posterior distribution of the three parameters of the power law and calculating this ensemble of accepted realizations to retrieve the uncertainty bounds of the discharge time series based on the original water level data.Since different stagedischarge relationships were used during the simulation period (by the operators) and a correction of 0.8 m 3 s −1 due to external discharges, an alternative approach was used as approximation.From the ensemble of realizations, the median (50th percentile) was inferred together with their corresponding 5th and 95th percentiles.For every time step in the flow time series, the measured value was assumed to correspond to the median value and the percentiles of the corresponding median was used to identify the flow time series uncertainty.The resulting uncertainty bounds for 2004 are given in Fig. 4. Introduction

Conclusions References
Tables Figures

Back Close
Full

Methodology
The approach presented here does not use a specific performance criteria or likelihood measure, but rather a specification of the limits of acceptability for all the observations of interest, similar to Liu et al. (2009) and Blazkova and Beven (2009).It avoids the problem of making assumptions about the characteristics of the modeling error needed in Bayesian applications (Beven et al., 2008;Beven and Freer, 2001;Vrugt and Robinson, 2007).Hence, a model prediction will be accepted and considered behavioral if for all observed values the modeled values fall between these specified minimum (Q min ) and maximum (Q max ) limits of acceptance.An approach similar with Westerberg et al. (2011b) and Liu et al. (2009) was applied, i.e. with a score of −1 and 1 when simulated discharge is equal to, respectively the lower or upper limit and linear interpolated values otherwise (Fig. 5, left).When all model realizations are rejected, relaxation of the initial limits can be considered (Blazkova and Beven, 2009;Liu et al., 2009).A first option is the relaxation of the number of observation points that need to satisfy the specified limits.This needs careful checking to avoid that the periods where these models are outside the bounds are those of interest.A second option is a relaxation of the initially set limits of acceptance of the individual observation points, so accepting time steps with scores larger than 1 or smaller than −1 (Liu et al., 2009).
All model realizations that have a sufficient amount of time steps with the model output between the minimum and maximum limits are accepted as behavioral.The definition of prediction percentiles requires a likelihood weight to be specified for every model run (Beven, 2006).To derive these weights for all observations a positive weight could be assigned to model predictions, according to their level of performance at individual time steps.The most straightforward option to translate the score is a triangular Introduction

Conclusions References
Tables Figures
The weights for every time step are then combined by taking the sum of the individual points to derive the single weight associated with the particular model realization (a behavioral parameterization of a certain model structure), analogue to Liu et al. (2009).
Again, other methods to combine the weights of the individual points are possible (e.g.giving periods of low flow and high flow more importance) and worth testing, but this is considered outside the scope of the case presented here.The weights of the different behavioral simulations are subsequently used in the GLUE methodology to derive the prediction limits of the ensemble of model realizations.By applying this work flow, models that produce flow predictions close to the observations will have higher weights.Other conceptualizations about the measurement error can be used to construct these weights differently translating this conceptualization into the model evaluation.

Results
A total of 500 000 model simulations of both model structures were performed and sampling of the parameter combinations was performed with a pseudo-random sampling technique (Sobol and Kucherenko, 2005).Parameter ranges are given in Tables 1  and 2 for PDM and NAM models, respectively.For the PDM model, parameter ranges were based on those proposed by Cabus (2008).Results of the study performed by Vansteenkiste et al. (2011) were used to set up the parameter ranges for the NAM model.
When applying the initial limits of acceptance and requiring that all time steps in a simulation are within the initially derived observed uncertainty boundaries, all simulations are rejected.Relaxation of the boundaries towards both the limits of score and the percentage of time steps the simulation is enveloped was applied.Since over-and under prediction of the simulations was observed at similar degree, both the upper and lower score limit were extended simultaneously to −2 and 2 under the assumption that the derived uncertainty measurement boundaries are too conservative.Moreover, the Introduction

Conclusions References
Tables Figures

Back Close
Full percentage of time the simulations were allowed to be outside these score limits was set to 10 % of the simulation period.As such, the limits of acceptance are relaxed and a total of 477 parameter combinations for the NAM model and 389 parameter combinations for the PDM model were accepted.The aim of relaxing the score boundaries and evaluating different model simulations for the period as a whole is to compare the characteristics of the behavioural model structures and evaluate the general behaviour of the accepted model simulations.
Given the applied relaxations, it is important to understand when the model simulations are trespassing the score boundaries to observe potential systematic failure of the accepted simulations.First, a separation was performed to discriminate different modes of the hydrograph similar to Boyle et al. (2000), Wagener et al. (2001) and Krueger et al. (2010).A segmentation is done between driven (wetting up, positive slope of the hydrograph) and non-driven (draining, negative slope of the hydrograph).A further separation of the non-driven periods in quick and slow is done using a threshold.This threshold was set to the mean flow of the season the period belongs to (as opposed to Wagener et al., 2001, using overall mean flow) in order to better adapt to the seasonal variations.Second, a seasonal segmentation was done to evaluate the seasonal effects.
Figure 6 shows the scores for the calibration as a whole as well as for the driven and non-driven periods.90 % of the time steps are within the −2 to 2 boundaries defined as limits of acceptance.The gray bounds indicate the −1 and 1 boundaries to evaluate the model structures in comparison to the derived observation uncertainty bounds.No unbalanced over-or underestimation of the scores is observed, except for a slight skewness of the scores during the non-driven slow periods.This indicates shortcomings of the model structures in representing the long-term drying up of the catchment.Based on the seasonal scores (Fig. 7), differences are clearer and the long term seasonal limitations of the model appear to predominate the short term representation of the wetting and drying after a rain event.Furthermore, larger differences in Introduction

References
Tables Figures

Back Close
Full the histogram plots of the models indicates the mutual difference between both model structures to be more apparent at the seasonal level.

Methodology
The DYNIA approach, initially developed by Wagener et al. (2003), is essentially a dynamic extension of the Regional Sensitivity Analysis (RSA) (Hornberger and Spear, 1981) and the GLUE approach (Beven and Binley, 1992).The approach uses a set of Monte Carlo simulations based on a uniform prior distribution of the parameters and focuses on the posterior parameter distribution after selecting the best performing parameter sets according to a predefined support measure (an objective function).
The approach improves the amount of information that can be obtained from the observed time series through the use of the moving window.Cullmann and Wriedt (2008) compare the optimized parameter set derived with Gauss Marquardt Levenberg (GML) algorithm on event basis with the identifiable regions of the DYNIA approach concluding that in most cases both coincide.Furthermore, by reorganizing the data according to the state variable (i.e.flow) instead of using the time series as such, a relation between the optimal parameter value and the observed flow is revealed, leading to the suggestion of using state-dependent (transient) model parameters for models in operational conditions.Wriedt and Rode (2006) obtained similar results observing a shift of the confidence range of the parameter controlling the interflow volume with increased discharge.The evolution of the parameter identification with growing window size is also evaluated suggesting for most parameters a constant uncertainty range after one or two years of simulation.Lee et al. (2004)  of the shifts in the posterior distribution of the parameters with soil moisture storage dynamics, improved structures are suggested, however not leading to a significant improvement in terms of representing the outflow hydrograph.The PDM approach is also tested by Tripp and Niemann (2008) and compared with a more physically based soil moisture representation, noticing structural errors in both.They also argue that the stability and identifiability of the model parameter is not a sufficient reason to confirm the assumptions underlying the parameter occurrence, since the most stable and identifiable parameter appears to actually vary in time.Abebe et al. (2010) apply the DYNIA approach on the HBV model and retrieved for three out of five analyzed parameters clearly defined periods with high information content to identify these parameter values.The relation between model parameters and soil moisture state is also observed.They relate changing parameter optima with time not only with structural inadequacy, but also the unsteadiness of the catchment processes (e.g.land-use changes affecting the hydrological response) changing the dominant runoff generation processes, mostly assumed constant during the model development.
The main difference with the GLUE and RSA methods mentioned is the evaluation of the simulations for each simulation time step within a specified time frame (moving window) before and after the time step.For the specified time window, only the best performing parameter sets (e.g. the top 10 %) are selected and their cumulative support is derived based on the corresponding support measure.Comparable to the RSA approach, parameters that are highly sensitive for the current time frame will be conditioned by the support measure and deviate from the initial uniform distribution.The marginal probability distribution of the parameter is represented by the gradient of the obtained cumulative function and therefore an indicator of identifiability of the parameter (Wagener et al., 2003).The results are visualised in a 2-D plot of parameter values in function of time (Fig. 8), where the probability density of the parameter is represented by a gray scale, in which a darker gray represents higher identifiability in time and parameter space.Moreover, the 5 % and 95 % confidence limits of the parameter density function can be calculated and the range is a measure for the ability of the Introduction

Conclusions References
Tables Figures

Back Close
Full data to discriminate the parameter values.Wagener et al. (2003) expressed this in an Information Content (IC) measure as follows: with p 5 % and p 95 % , respectively the lower and upper confidence interval of the obtained marginal parameter disitribution and ∆P the initial parameter range as given in Tables 1 and 2 for PDM and NAM, respectively.The information criterium ranges between 0 and 1, with high values indicating a small confidence interval expressing high identifiability.Changing regions of high identification can reveal model structure deficits.
The support measure here is based on the score evaluation described in Sect.5.1.
For every time step and with a time window of n time steps, the absolute values of scores of the individual time steps between t − n and t + n are aggregated (summed) and only the lowest 10 % selected.Using the scores in the evaluation process has the advantage of allocating the same relative punishment towards different flow magnitudes as opposed to Sum of Squared Errors (SSE) based performance criteria.
The selected time window of the different parameters not only depends on the influential period of the parameter (response time), but also on the quality of the data (Wagener et al., 2003).Since for all parameters the same (uncertain) flow time series is used, a classification between a short (1-5 days), median (5-30 days) and long (30-45 days) window was used for, respectively parameters mainly contributing to overland flow, unsaturated zone and groundwater processes.When the window size is too narrow, the influence of data error could become too influential, whereas too wide window sizes can result in aggregation of different periods of information (Wagener et al., 2003).By adapting the time frame manually within the proposed ranges, a period was taken to optimize the visualization of the potential underlying message.Depending on the window size, the proportion of time steps at the beginning and the end of the time series that is distorted needs to be excluded for the interpretation (Wagener et al., 2003).Introduction

Conclusions References
Tables Figures

Back Close
Full For each parameter of both model structures, a plot is made representing the dynamic identifiability of the parameter.As opposed to Wagener et al. (2003) only the behavioral model simulations after imposing the limits of acceptance are included in the analysis.By only using behavioral simulations, the analysis focuses on combined posterior parameter distributions that represent the dynamics of the system with a certain minimum level (resulting from the relaxed limits of acceptance).

Results of DYNIA for NAM model
Figure 8 shows the identifiability analysis for the threshold value for overland flow (T OF ) of the NAM model.The plot visualizes both the DYNIA results in the parameter-time space as well as the derived IC over time.The range of the y-axis at the parameter side is taken from the original parameter boundaries.The combined analysis allows to identify periods with high identifiability and verify the location of optima in the parameter space during these periods.The IC of the T OF parameter is the highest during summer rain events, where the confidence limits are narrowing towards lower values.The convergence of the parameter value fluctuates during the remaining periods without particular optima, indicating that varying values of the parameter yields similar score values in combination with the remaining parameters.During these summer months the threshold strives towards lower values to make sure enough overland flow is generated and the parameter gets more importance.
The L max parameter, representing the maximum water storage in the lower soil between root zone and groundwater (Fig. 9 in combination with higher baseflow time constants to prevent the overprediction.After the winter months, higher L max values are needed to decrease the flows together with lower baseflow routing.In general, the combination of the small U max reservoir and the single L max reservoir accounting for unsaturated zone is not sufficient to incorporate seasonal variations. Similar analysis of the other parameters (not shown here) of the NAM model shows a shift towards very low values of U max during certain rain events, but this causes at the same time overestimation of the peaks.CQ OF is identifiable during winter events and also for T G seasonal variation is recognizable, but not as distinct as for the other parameters.For T IF identification of the parameter is low throughout the entire calibration period, whereas for CK IF a small shift toward higher values is observable in winter months when the catchment is in wet condition.Differences in the area of identifiability of the CK 1,2 parameter during rising and falling limbs indicates that using the same time constant for overland flow and interflow may will be too simplistic to capture the retention of the basin.

Results of DYNIA for PDM model
For the PDM model, Fig. 10 shows the dynamic analysis of the maximum storage capacity (C max ).Convergence to specific values is much more present along the entire period with the highest information content during periods of heavy rain along the year.In the recession after the winters of 2003 and 2004, some shifting towards higher values is visible, but to a lesser extent than the L max parameter of the NAM model indicating a better representing of the seasonal variation in the catchment.The parameter defining the shape of the pareto distribution, b, representing the spatial variation in the catchment is the second parameter defining the unsaturated zone processes, shown in Fig. 11.During most of the year, parameter b strives to lower values, except of the spring periods, where the parameter gets less identifiable, due to the strong interaction with other parameters.Furthermore, the increase of the parameter value indicates more variation in the catchment in terms of soil storage available.Introduction

Conclusions References
Tables Figures

Back Close
Full Results of other parameters of the PDM model are not shown here but are briefly discussed.The exponent of the evaporation function b e does not show a distinct area of identifiability.The groundwater recharge constant k g is much more identifiable than the baseflow time constant k b showing the importance of the drainage function to capture the seasonal variation of the groundwater.The storage capacity S τ of the drainage function on the other hand is less identifiable whereas the routing of the overland flow (k f ) is identifiable during the entire period, but jumps between two optima that are not directly seasonally related.

Prediction uncertainty
Based on the set of accepted parameter combinations and their corresponding (normalized) weights, the uncertainty of both model structures is derived.Figure 12 gives the uncertainty (90 % prediction uncertainty) for 2004 and compares the observed uncertainty with the modelled prediction uncertainty for 2004.PDM tends to underpredict the peaks during winter months, but captures the dynamic behaviour in the summer months.The variation in June is completely missed by the NAM model.Both models are underestimating the flow peaks in march.Mainly periods where one out of the two models is unable to predict the dynamics are useful to deduce model structural differences.
For the validation period, the prediction uncertainty of 2006 is shown in Fig. 13.Similar differences between the model structures as compared to the calibration period are apparent.PDM better captures the recession periods in July and October and the NAM model predicts in general higher peak discharges.Variation in the prediction of the peaks in the NAM model is also higher.The similarity in the failures of the models in both calibration and validation periods further confirm the conclusions of the identifiability analysis to be independent from the specific calibration period selected.Introduction

Conclusions References
Tables Figures

Back Close
Full

Posterior evaluation of periodically selected parameter combinations
Based on the results above, a second selection of behavioral parameter sets was done on the same set of 500 000 simulations, but now with a selection of the behavioral sets based on the scores during the individual periods, both in terms of seasonal and the driven/non-driven segmentation as opposed to the analysis on the complete calibration period used in Sect.5.1.Limits of acceptance were put for each period separately with score limits of −2.5 and 2.5 and a maximal of 5 % of the time steps threspassing these limits, putting less concern on the individual scores, but more on the percentage of time as compared to the selection on the entire period.Under these conditions, none of the simulations were able to satisfy the driven and non-driven quick limits and the analysis is focussed on the seasonal differences.In Fig. 14 the posterior parameter sets of the NAM model are shown for each season and the non-driven slow.Since the importance of the seasonal variations in the model identification (5.2), the resulting posterior parameter distributions are in correspondence to these findings.Seasonal variation of optimal parameter vaues is mainly visible for parameters L max , CK BF and CK IF .This mainly indicates the insufficiency of the model structure to capture the water retention in the catchment throughout the year based on the unsaturated zone concept of the NAM model.Overland flow parameters, CQ OF and CK 1,2 , are highly identifiable during winter, whereas T OF during summer months.Nevertheless, seasonal differences are visible due to rain events happening during, respectively wet or dry conditions of the catchment.Parameter T IF has no posterior convergence in any of the seasons.Since also the DYNIA approach revealed no specific region of identifiability, the usefulness of the infiltration threshold for this application can be questioned and simplifying the interflow description (leaving out the T IF parameter) can be considered.winter were not accepted in the limits set to the entire period.Higher values of b indicate the higher spatial variation during winter months, whereas the low values during the rest of the year suggest uniformity in the catchment implying that a single storage may be sufficient.Higher C max (more water storage capacity) would help the prediction during winter but tends to predict the rest of the hydrograph wrongly.The main differences with the seasonal variation is noticeable for parameter k g .Again, these high values in summer and spring were not taken into account in the DYNIA approach.These high values decrease the drainage towards the groundwater reservoir.The posterior of the non-driven slow supports the convergence towards winter values.Based on the seasonal selection k b and S τ are not identifiable.

Discussion and conclusions
This paper combines the limits of acceptance approach (Beven, 2008) with the dynamical identifiability approach (Wagener et al., 2003) to evaluate the potential of detecting model structural deficiencies when taking into account the rating curve uncertainty.Using the uncertain rating curve as evaluation limits one does not need to make explicit assumptions about the nature of the modeling errors.When the analysis of the obtained evaluation scores for different subperiods are lacking clear indication of over-and underprediction (Figs. 6 and 7), the added value of the DYNIA application to get insight in the model structural limitations is shown.Comparable information about the parameter time-variation is derived by the subperiod parameter selection (Sect.5.4), but this is based on the knowledge of seasonal defects brought by the DYNIA approach.This, in combination with the ease of use, indicates the advantages of applying the DYNIA approach as generic information source for model structure improvement.
A first difference between the applied models is the soil moisture storage component.NAM is using one upper and lower storage reservoir, whereas PDM uses the probability distribution concept aiming to conceptually introduce the spatial variability.Furthermore, a linear routing of the groundwater is used in the NAM model in contrast to

Conclusions References
Tables Figures

Back Close
Full A restriction in the application of the limits of acceptance approach is the need for relaxation of the initial limits of acceptance to avoid rejection of all model simulations (both in terms of parameterization and structure), as was also needed in Blazkova and Beven (2009) and Liu et al. (2009).However, since the focus is on model improvement, the approach is based on rejection rather than optimization to identify and focus on particular parts of the time series that are not well simulated (Beven, 2008).This "learning by rejecting" is made possible by consecutively relaxing of the limits of acceptance.In Figures

Back Close
Full the presented approach here, 2 major degrees of freedom can be altered: % of time transgressing the limits and weakening of the limits.More focus can be given towards the prediction of the general behavior of the dynamics, by putting rigorous requirements in the % of time and weakening the limits itself or more focus can go to periods of transgressing the initial derived limits by relaxing the % of time while keeping these limits.In the application here, a balanced relaxation of both was used to gain general insight in the behavior of the resulting behavioral simulations.By doing so, the resulting behavioral model simulations used in Sect.5.1 are actually selected based on a time-aggregating performance criterion, whereas in Sect.5.4 the use of applying separate limits on different response modes of the hydrograph is shown giving comparable results as Peters et al. (2003).The use of the aggregated score is useful in the model evaluation, since the DYNIA approach gives the possibility to differentiate the selected simulations in function of time.In model evaluation, the use of multiple non-commensurable evaluation functions focusing on different underlying assumptions is essential (Gupta et al., 1998;Winsemius et al., 2009), but the selection of the most appropriate criteria is not always straightforward.The application of the DYNIA approach can give guidance in the selection of objective functions.For this application example, the use of a total seasonly volume could support the model optimization for practical applications.Besides, by focusing on the behavioral simulation with DYNIA, information is extracted about the reasons of equifinality of these selected (behavioral) parameter sets.Insight is given in how identification (in terms of parameter space and model structures) can be improved, leading to more objective and guided reasoning of defining limits of acceptance.
Based on the variation in optimal parameter sets, both in seasonal variation and on storm level, and the relation between model states and optimal parameter combinations lead to the of introduction of time-variant and stochastic parameters (Beck and Young, 1976;Cullmann and Wriedt, 2008;Lin and Beck, 2007;Reichert and Mieleitner, 2009;Kuczera et al., 2006;Tomassini et al., 2009) and is associated with the Data-Based Mechanistic (DBM) approach using state-dependent parameters to Figures identify non-linear systems (Young et al., 2001).The main argument for introducing stochastic parameter values is the inherent stochasticity of these conceptual models due to spatial and temporal averaging (Kuczera et al., 2006).Nevertheless, the use of time-variant parameters remains mainly useful in terms of model structure evaluation.The idea of allowing parameters to vary in time to gain information about potential model structural improvements goes back to Beck and Young (1976) and the potential of learning from the behavior of time-dependent parameters is higher than from corrections in model states (Reichert and Mieleitner, 2009).Cullmann and Wriedt (2008) argue to incorporate the state-dependent parameter changes in the formulation of conceptual model intended for continuous simulations, adapting to governing processes.
However, the general assumption in conceptual hydrological models is that model parameters are constant in time given that catchment characteristics do not change with time, and if parameter optima changes, then the inference is that there is a missing aspect in the model formulation and thus a model structural error (Abebe et al., 2010).
The evolution towards flexible model structures allows to evaluate multiple hypotheses (Wagener et al., 2001;Clark et al., 2011;Fenicia et al., 2011;Van Hoey et al., 2011) and this contribution (amongst others) demonstrates that incorporating the DY-NIA approach is a straight forward way to discover potential pitfalls to enhance the learning curve about model structure improvement.Looking into model performance in function of time gives guidance towards model optimization and identification.By incorporating the discharge uncertainty, potential periods of wrong measurements are less influencing the model evaluation making it is less biased than by the use of deterministic flow values.Besides, it is shown that the observation uncertainty is not inhibiting the identification of deficiencies.Still, the use of erroneous input forcing (i.e.rainfall and evapotranspiration data) can obscure the effects and accounting for input forcing errors in the structure evaluation can potentially clarify parameter value switches (Kavetski et al., 2006a,b;Vrugt et al., 2008).Introduction

Conclusions References
Tables Figures

Back Close
Full  Full  Full Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Hourly flow data of the basin outlet was used to compare the observed and predicted values of the different model structures.Based on the availability and quality of the data, a calibration period from 2002 until 2005 was used and a validation period from 2006 until 2008.
Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | compare two conceptual model structures, with one of them a probability based model structure.Parameters are either non-identifiable over the entire time series or exhibit time-dependency in their optimal values.Seasonal variations of the optimum parameter values are consistent and much clearer than the variations from dry to wet years.Based on the correlation Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | ), converges towards different parameter values during different periods.Lower values appear during winter months in 2004 and 2005, whereas higher values are obtained during spring months of 2003 and 2004.As stated by Wagener et al. (2003), this typically indicates a failure of the model structure due to the inconsistency in the way the model fits the observed flow.Moreover, parameter CK BF behaves in the opposite direction to compensate for this seasonal variation during 2003 and 2004 (results not shown, but the seasonal parameter switch is also visible in Fig. 14).During winter months lower L max values produce more runoff Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | The posterior of the parameters of the PDM are shown in Fig. 15.The seasonal variation that is visible for the C max parameter (mainly winter and fall) in combination with parameter b is different to the DYNIA results since the higher posterior values during Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |a non-linear routing of PDM.Groundwater recharge is comparable when b g is assumed 1 for the PDM model.The differentiation in 3 subflows in the NAM model, against 2 of the PDM model is partly undone by the use of one time constant for both overland flow and interflow in the NAM model.The limitation to simulate the seasonal dynamics are dominating the peak discharges of the individual rain events, mainly dominated by the soil moisture storage conceptualization.From the results presented here, the probability distribution approach from the PDM model seems to be more suited.This has also consequences on the prediction uncertainty of peak discharges, where the NAM model has broader uncertainty bounds during peaks.Moreover, capturing the seasonal dynamics is in this catchment mainly related to the groundwater representation.The absence of identifiability in the PDM base flow time constant (k b ) and the interplay of the seasonal variation in the NAM base flow time constant (CK BF ) with the soil moisture L max suggests shortcomings for both models, albeit for different reasons.In the PDM model the seasonal variation is mainly captured by the soil moisture variation in combination with the identifiable recharge parameter (k g ), inducing the limited influence of the k b parameter.In the NAM model, a clear compensation of the parameter values suggests that the inability of the soil moisture storage is causing these problems, probably due to the inability to capture the dynamics by only one reservoir.McMillan et al. (2011) reached similar conclusions based on the nonuniqueness of the storage-discharge relationship, suggesting that multiple reservoirs are required, such that seasonal variation is captured by varying proportions of flow from the different reservoirs (cfr.the PDM approach).
Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

Fig. 6 .
Fig. 6.Scores to limits of acceptance of the behavioral model simulations for the entire calibration period as well as different part of the hydrograph for both NAM and PDM model.The histograms were normalized by the number of behavioral simulations and represent the % of timesteps of the defined period.The gray band represents the −1.and 1. boundaries of the measured uncertainty.

Fig. 7 .
Fig. 7. Scores to limits of acceptance of the behavioral model simulations for the different seasons for both NAM and PDM model.The histograms were normalized by the number of behavioral simulations and represent the % of timesteps of the defined period.The gray band represents the −1 and 1 boundaries of the measured uncertainty.

Table 1 .
Overview of the PDM model parameters.

Table 2 .
Overview of the NAM model parameters.