Reducing structural uncertainty in conceptual hydrological modeling in the semi-arid Andes


and Letcher, 2003; Xu and Singh, 2004). The main rationale behind this success lies in the fact that relatively simple structures with low data and computing requirements generally match or outperform far more complex physically-based models (e.g. Michaud and Sorooshian, 1994; Refsgaard and Knudsen, 1996; Kokkonen and Jakeman, 2001). Also, most water management decisions are made at operational scales and have much more to do with catchment-scale administrative considerations than with our understanding of microscale processes. As a result, conceptual models are increasingly used to evaluate the potential impacts of climate change on hydrological systems (e.g. Minville et al., 2008; Ruelland et al., 2012) and freshwater availability (e.g. Milano et al., 2013; Collet et al., 2013).
This modeling strategy, however, is regularly criticized for oversimplifying the physics of catchments and leading to unreliable simulations when conditions shift beyond the range of prior experience. Part of the problem comes from the fact that model structures are usually specified a priori, based on preconceived opinions about how systems work, which in general leads to an excessive dependence on the calibration process.
More than a lack of physical background, this practice reveals a misunderstanding about how such models should be based on physics (Kirchner, 2006; Blöschl and Montanari, 2010). Hydrological systems are not structureless things composed of randomly distributed elements, but rather self-organizing systems characterized by the emergence of macroscale patterns and structures (Dooge, 1986; Sivapalan, 2006; Ehret et al., 2014). As such, the reductionist idea that catchments can be understood by merely aggregating (upscaling) fine-scale mechanistic laws is generally misleading (Anderson, 1972; Dooge, 1997; McDonnell et al., 2007). Self-organization at the catchment scale means that new hydrologic relationships with fewer degrees of freedom have to be envisioned (e.g. McMillan, 2012). Yet, finding simplicity in complexity does not imply that simple models available in the literature can be used as ready-made engineering tools with little or no consideration for the specific features of each catchment (Savenije, 2009). As underlined by Kirchner (2006), it is important to ensure that the "right answers" are obtained for the "right reasons". In the case of poorly-defined systems where physically-oriented interpretations can only be sought a posteriori to check for model realism, this requires placing more emphasis on the uncertainty arising from deficiencies and/or ambiguities in the model structure than is currently done in most hydrological impact studies. Structural uncertainty can be described in terms of inadequacy and non-uniqueness.
Model inadequacy arises from the many simplifying assumptions and epistemic errors made in the selection of which processes to represent and how to represent them. It reflects the extent to which a given model differs from the real system it is intended to represent. In practice, this results in the failure to capture all relevant aspects of the system behavior within a single model structure or parameter set. A common way of addressing this source of uncertainty is to adopt a top-down approach to model-building (Jothityangkoon et al., 2001; Sivapalan et al., 2003), in which different models of increasing complexity are tested to determine the adequate level of process representation. Where fluxes and state variables are made explicit, alternative data sources (other than streamflow) such as groundwater levels (Seibert, 2000; Seibert and McDonnell, 2002), tracer samples (Son and Sivapalan, 2007; Birkel et al., 2010; Capell et al., 2012) or snow measurements (Clark et al., 2006; Parajka and Blöschl, 2008) can also be used to improve the internal consistency of model structures.
Additional criteria can then be introduced in relation to these auxiliary data or to specific aspects of the hydrograph (driven vs. non-driven components, rising limb, recession limbs, etc.). In this perspective, multi-criteria evaluation techniques based on the concept of Pareto-optimality provide an interesting way to both reduce and quantify structural inadequacy (Gupta et al., 1998; Boyle et al., 2000; Efstratiadis and Koutsoyiannis, 2010). A parameter set is said to be Pareto-optimal if it cannot be improved upon without degrading at least one of the objective criteria. In general, meaningful information on the origin of model deficiencies can be derived from the mapping of Pareto-optimal solutions in the space of performance measures (often called the Pareto front) and used to discriminate between several rival structures (Lee et al., 2011). Further, the Pareto set of solutions obtained with a given model is commonly used to generate simulation envelopes (hereafter called "Pareto-envelopes" for brevity's sake) representing the uncertainty associated with structural errors (i.e. model inadequacy). Non-uniqueness refers to the existence of many different model structures (and parameter sets) giving equally acceptable fits to the observed data. Structural inadequacy and the limited (and often uncertain) information content of the available data make it highly unlikely to identify a single, unambiguous representation of how a system works. There may be, for instance, many different possible representations of flow pathways yielding the same integral signal (e.g. streamflow) at the catchment outlet (Schaefli et al., 2011). Non-uniqueness in model identification has also been widely described in terms of equifinality (Beven, 1993, 2006) and may be viewed as a special case of a more general epistemological issue known as the "underdetermination" problem. Over the past decade, these considerations have encouraged a shift in focus toward more flexible modeling tools based
on the concept of multiple working hypotheses (Buytaert and Beven, 2011; Clark et al., 2011). A number of modular frameworks have been proposed, in which model components (i.e. individual hypotheses) can be assembled and connected in many ways to build a variety of alternative structures (i.e. overall hypotheses). Recent examples of such modeling frameworks include the Imperial College Rainfall-Runoff Modeling Toolbox (RRMT) (Wagener et al., 2002), the Framework for Understanding Structural Errors (FUSE) (Clark et al., 2008) and the SUPERFLEX modeling environment (Fenicia et al., 2011). Clark et al. (2011) suggested that multiple-hypothesis frameworks (MHF) represent a valuable alternative to "most practical applications of the top-down approach", which "seldom consider competing process representations of equivalent complexity". Compared to current multimodel strategies, these frameworks also make it possible to better scrutinize the effect of each individual hypothesis (i.e. model component), provided that the model decomposition is sufficiently fine-grained. Finally, Clark et al. (2011) argued that ensembles of competing model structures (both of equal and varying complexity) can also be generated to quantify the structural uncertainty

arising because of system nonidentifiability (i.e. model non-uniqueness). So far, however, this method has mostly been applied to small (<10 km²), well-monitored experimental catchments (e.g. Clark et al., 2008; Smith and Marshall, 2010; Buytaert and Beven, 2011; McMillan et al., 2012; Fenicia et al., 2014), with less attention being given to larger scales of interest (100-400 km²) (e.g. Kavetski and Fenicia, 2011; Coxon et al., 2013) or long time periods. Therefore, the need remains to establish whether MHF can also be used to improve conceptual modeling over multi-decadal periods at operational scales of 1000 km² or more. The potential benefits of combining MHF with Pareto-based optimization schemes also remain largely unexplored in the current literature.
Addressing these issues is of particular importance in the case of arid to semi-arid, mountainous catchments such as those found in the north-central Andes (30° S). The Norte Chico region of Chile, in particular, has been identified as highly vulnerable to climate change impacts in a number of recent reports (IPCC, 2013) and studies (e.g. Souvignet et al., 2010; Young et al., 2010). Yet, very few catchments in this region have been studied intensively enough to provide reliable model simulations, often with no estimation of the surrounding uncertainty (Souvignet, 2007; Ruelland et al., 2011; Vicuña et al., 2012; Hublart et al., 2013). This study is the first step of a larger research project, whose final aim is to assess the capacity to meet current and future irrigation water requirements in a mesoscale catchment of the Norte Chico region. The objective here is to provide a set of reasonable model structures that can be used for the hydrological modeling of the catchment. To achieve this goal, a MHF was developed and combined with a multi-criteria optimization framework using streamflow and satellite-based snow cover data.

2 Study area

General site description
The Claro River Catchment (CRC) is a semi-arid, mountainous catchment located in the northeastern part of the Coquimbo region, in north-central Chile (Fig. 1). It drains an area of approximately 1515 km², characterized by high elevations ranging from 820 m a.s.l. at the basin outlet (Rivadavia) to over 5500 m a.s.l. in the Andes Cordillera.
The topography is dominated by a series of generally north-trending, fault-bounded mountain blocks interspersed with a few steep-sided valleys.
The underlying bedrock consists almost entirely of granitic rocks ranging in age from Pennsylvanian to Oligocene and locally weathered to saprolite. Above 3000 m a.s.l., repeated glaciations and the continuous action of frost and thaw throughout the year have caused an intense shattering of the exposed rocks (Caviedes and Paskoff, 1975), leaving a landscape of bare rock and screes almost devoid of soil.
The valley-fill material consists of mostly unconsolidated Quaternary alluvial sediments mantled by generally thin soils (<1 m) of sandy to sandy-loam texture (CIREN, 2005). Vineyards and orchards cover most of the valley floors and lower hill slopes but account for less than 1 % of the total catchment area (INE, 2009; CIREN, 2011). By contrast, natural vegetation outside the valleys is extremely sparse and composed mainly of subshrubs (e.g. Adesmia echinus) and cushion plants (e.g. Laretia acaulis, Azorella compacta) with very low transpiration rates (Squeo et al., 1993). The Claro River originates from a number of small tributaries flowing either permanently or seasonally in the mountains.

Hydro-climatic data
In order to represent the hydro-climatic variability over the catchment, a 30 year period was chosen according to data availability and quality. Precipitation and temperature data were interpolated based on 12 and 8 stations, respectively (Fig. 1),

using the inverse distance weighted method on a 5 km × 5 km grid. Since very few measurements were available outside the river valleys, elevation effects on the precipitation and temperature distributions were considered using the SRTM digital elevation model (Fig. 1). In a previous study, Ruelland et al. (2014) examined the sensitivity of the GR4j hydrological model to different ways of interpolating climate forcing on this basin. Their results showed that a dataset based on a constant lapse rate of 6.5 °C km⁻¹ for temperature and no elevation effects for precipitation provided slightly better simulations of the discharge over the last 30 years. However, since the current study also seeks to reproduce the seasonal dynamics of snow accumulation and melt, it was decided to rely on a mean monthly orographic gradient estimated from the observed precipitation series (Fig. 1). Potential evapotranspiration (PE) was computed using the following formula proposed by Oudin et al. (2005):

PE = Re / (λρ) × (T + K2) / K1   if T + K2 > 0;   PE = 0 otherwise

where PE is the rate of potential evapotranspiration (mm d⁻¹), Re is the extraterrestrial radiation (MJ m⁻² d⁻¹), λ is the latent heat flux (2.45 MJ kg⁻¹), ρ is the density of water (kg m⁻³) and T is the mean daily air temperature (°C). Oudin et al. (2005) determined the values of K1 and K2 by selecting those that gave the best streamflow simulations when the formula was used to feed hydrological models. In this study, the FAO Penman-Monteith equation for a reference grass was used as a basis to tune K1 and K2 at two different locations within the basin (Rivadavia, Pisco Elqui, Fig. 1) (for more details on the results, see Hublart et al., 2014). Naturalized streamflow time series were estimated using information provided by the Chilean Dirección General de Aguas, mainly streamflow measurements at the gauging station of Rivadavia and historical surface-water diversion data. In addition to streamflow data, remotely-sensed data from the MODerate resolution Imaging Spectroradiometer (MODIS) sensor were used to estimate the seasonal dynamics of snow accumulation and melt processes over a 9 year period (2003-2011). Daily snow cover products retrieved from NASA's Terra

(MOD10A1) and Aqua (MYD10A1) satellites were combined into a single, composite 500 m resolution product to reduce the effect of swath gaps and cloud obscuration. The remaining data voids were subsequently filled using a linear temporal interpolation method.
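As a minimal sketch of the Oudin et al. (2005) formula discussed above, the function below uses the default constants published by Oudin et al. (K1 = 100, K2 = 5 °C); the function name, the default values and the explicit factor of 1000 converting m d⁻¹ to mm d⁻¹ (with ρ taken as 1000 kg m⁻³) are illustrative assumptions, not details taken from the study:

```python
def oudin_pe(Re, T, K1=100.0, K2=5.0):
    """Potential evapotranspiration (mm/d) after Oudin et al. (2005).

    Re: extraterrestrial radiation (MJ m-2 d-1)
    T:  mean daily air temperature (degC)
    K1, K2: tunable constants (Oudin's published defaults: 100 and 5).
    """
    lam = 2.45     # latent heat of vaporization (MJ kg-1)
    rho = 1000.0   # density of water (kg m-3), assumed value
    if T + K2 > 0.0:
        # Re/(lam*rho) is in m/d; the factor 1000 converts to mm/d
        return 1000.0 * Re / (lam * rho) * (T + K2) / K1
    return 0.0
```

In the study, K1 and K2 were re-tuned against the FAO Penman-Monteith reference estimates at Rivadavia and Pisco Elqui rather than kept at these defaults.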
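The Terra/Aqua compositing and gap-filling steps can be sketched as follows for a single pixel's daily time series; the preference order (Terra first, Aqua as fallback) and the handling of edge voids are assumptions about the procedure, which the text does not fully specify:

```python
# values: 1 = snow, 0 = no snow, None = cloud/void
def combine_terra_aqua(terra, aqua):
    """Prefer the Terra (MOD10A1) value; fall back to Aqua (MYD10A1) where Terra is void."""
    return [t if t is not None else a for t, a in zip(terra, aqua)]

def fill_voids_linear(series):
    """Linearly interpolate remaining voids in a daily time series."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for i, v in enumerate(out):
        if v is None:
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is None or right is None:
                continue  # leave voids at the series edges unfilled
            w = (i - left) / (right - left)
            out[i] = out[left] * (1 - w) + out[right] * w
    return out
```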

Precipitation variability
Among the primary factors that control the hydrological functioning of the CRC is the high seasonality of precipitation patterns. Precipitation occurs mainly during the winter months, when the South Pacific High reaches its northernmost position. Most of the annual precipitation falls as snow at high elevations, where it accumulates in seasonal snow packs that are gradually released from October to April. The El Niño Southern Oscillation (ENSO) represents the largest source of climate variability at the interannual timescale (e.g. Rutllant and Fuenzalida, 1991; Montecinos and Aceituno, 2003). Anomalously wet (dry) years in the region are generally associated with warm (cold) El Niño (La Niña) episodes and a simultaneous weakening (strengthening) of the South Pacific High. It is worth noting, however, that some very wet years in the catchment can also coincide with neutral to weak La Niña conditions, as in 1984, while several years of below-normal precipitation may not exhibit clear La Niña characteristics (Verbist et al., 2010; Jourde et al., 2011). These anomalies may be due to other modes of climate variability affecting the Pacific basin on longer timescales. The Interdecadal Pacific Oscillation (IPO), in particular, has been shown to modulate the influence of ENSO-related events according to cycles of between 15 and 30 years (Schulz et al., 2011; Quintana and Aceituno, 2012). Recent shifts in the IPO phase occurred in 1977 and 1998 and may explain the higher frequency of humid years during the 1980s and early 1990s compared to the late 1990s and the 2000s.


Catchment-scale water balance and dominant processes
Notwithstanding this significant climate variability, a rough estimate of the catchment water balance can be given for the period 2003-2011 using the data presented in the previous subsection and additional information available in the literature. Spatially averaged precipitation ranges from a low of 80 mm in 2010 to an estimated high of 190 mm in 2008. Evapotranspiration from non-cultivated areas is sufficiently low to be reasonably neglected at the basin scale (Kalthoff et al., 2006). By contrast, water losses from the cultivated portions of the basin are likely to be around 10 mm yr⁻¹ (Hublart et al., 2013). At high elevations, sublimation plays a much greater role than evapotranspiration. Mean annual sublimation rates over two glaciers located in similar, neighbouring catchments have been estimated to be about 1 mm d⁻¹ (see e.g. MacDonell et al., 2013). Thus, a first estimate of the annual water loss associated with snow sublimation can be made by multiplying, for each day of the period, the proportion of the catchment covered with snow by an average rate of 1 mm d⁻¹. This leads to a mean annual loss of 70 mm between 2003 and 2011. Note that this value is of the same order of magnitude as those obtained by Favier et al. (2009) using the Weather Research and Forecasting regional-scale climate model. Mean annual discharge per unit area varies from a minimum of 20 mm in 2010 to a maximum of 140 mm in 2003. Interestingly, runoff coefficients exceed 100 % during several years of the period (in 2003, 2006, 2007 and 2009), indicating either an underestimation of precipitation at high elevations, as suggested by Favier et al. (2009), or a greater contribution of groundwater to surface flow (Jourde et al., 2011).
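The sublimation estimate described above amounts to a simple accumulation over the daily MODIS-derived snow-covered-area fractions; the sketch below makes the arithmetic explicit (function name and the 365.25 d yr⁻¹ conversion are illustrative choices):

```python
def annual_sublimation_loss(sca_fractions, rate_mm_per_day=1.0):
    """Mean annual sublimation loss (mm) from daily snow-covered-area fractions.

    sca_fractions: daily fraction of the catchment covered with snow (0-1)
    rate_mm_per_day: assumed mean sublimation rate over snow-covered ground (mm/d)
    """
    n_years = len(sca_fractions) / 365.25
    total = sum(f * rate_mm_per_day for f in sca_fractions)
    return total / n_years
```

Applied to the 2003-2011 MODIS record with a 1 mm d⁻¹ rate, this kind of computation yields the ~70 mm yr⁻¹ loss quoted in the text.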
Groundwater movement in the catchment is mainly from the mountain blocks toward the valleys and then northward along the riverbed. In the mountains, groundwater flow and storage are controlled primarily by the presence of secondary permeability in the form of joints and fractures (Souvignet et al., 2006). The unconfined valley-fill aquifers are replenished by mountain-front recharge along the valley margins and by infiltration through the channel bed along the losing river reaches (Jourde et al., 2011).


Multiple-hypothesis modeling framework
In order to evaluate various numerical representations of the catchment functioning, a multiple-hypothesis modeling framework inspired by previous studies in the literature was developed. All the models built within this framework are lumped models run at a daily time step. The modeling process was decomposed into three modules and six model-building decisions. Each module deals with a different aspect of the precipitation-runoff relationship through one or more decisions (Fig. 2): snow accumulation (A) and melt (B), runoff generation (C), redistribution (D) and delay (E) of water fluxes, and natural storage effects (F). Each of these decisions is provided with a set of alternative modeling options, which are named by concatenating the following elements: first, a capital letter from A to F referring to the decision being addressed; then a number from 1 to 3 to distinguish between several competing architectures; and, finally, a lower-case letter from a to c to indicate different parameterizations of the same architecture. Model hypotheses are named by concatenating the names of the six modeling options used to build them (see Table 4). The models designed within this framework share the same overall structure (based on the same series of decisions) but differ in their specific formulations within each decision.
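The naming convention can be sketched as follows; the direct concatenation without separators and the validity check are illustrative assumptions, since the paper does not spell out the exact notation:

```python
def model_name(options):
    """Build a model hypothesis name from one option name per decision.

    options: six option names in decision order A to F, e.g. 'A1', 'B1b', ...
    Direct concatenation is an assumed rendering of the paper's convention.
    """
    # each decision A..F must appear exactly once, in order
    assert [o[0] for o in options] == list("ABCDEF")
    return "".join(options)
```

For instance, `model_name(["A1", "B1b", "C3", "D2", "E2", "F1a"])` yields a single identifier combining one option per decision.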

The model-building decisions can be divided into two broad categories. The first pertains to the production of fluxes from conceptual stores (decisions B, C and F). The second concerns the allocation and transmission of these fluxes using the typical junction elements and lag functions (decisions A, D and E) described by Fenicia et al. (2011). Junction elements can be defined as "zero-state" model components used to combine several fluxes into a single one (option D2) or split a single flux into two or more fluxes (options A1 and D3). Lag functions are used to reflect the travel time (delay) required to convey water from one conceptual store to another, or from one or more conceptual stores to the basin outlet. They usually consist of convolution operators (option E2), although conceptual stores may also serve this purpose. Modeling options in which water fluxes are left unchanged are labelled as "No operation" options in Fig. 2. Water fluxes and state variables are given generic names (from Q1 to Q6 and from S1 to S4, respectively) to ensure a perfect modularity of the framework. Further details on the alternative options provided for each decision are given in the following subsections. Note that some combinations of modeling options were clearly incompatible with one another (options C1 and C2, for instance, cannot work with option D2). As a result, these combinations were removed from the framework.
Another important feature of this modular framework is the systematic smoothing of all model thresholds using infinitely differentiable approximants, as recommended by Kavetski and Kuczera (2007) and Fenicia et al. (2011). The purpose here is twofold: first, to facilitate the calibration process by removing any unnecessary (and potentially detrimental) discontinuities from the gradients of the objective functions; and second, to provide a more realistic description of hydrological processes across the catchment (Moore and Clarke, 1981; Moore, 2007).
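A common smoother of this kind replaces the hard threshold max(0, x) by a softplus-type approximant; the specific form and smoothing width below are one plausible choice in the spirit of Kavetski and Kuczera (2007), not necessarily the exact kernel used in the study:

```python
import math

def smooth_max0(x, m=0.01):
    """Infinitely differentiable approximation of max(0, x).

    m controls the smoothing width; as m -> 0 the hard threshold is recovered.
    This is an overflow-safe form of m * ln(1 + exp(x / m)).
    """
    if x / m > 30.0:
        return x  # exp(x/m) would overflow; softplus ~ x in this regime
    return m * math.log1p(math.exp(x / m))
```

Because the function and all its derivatives are continuous, gradient-based or population-based calibration no longer stalls on the kink at x = 0.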

Snow accumulation and melt (decisions A and B)
Snow accumulation and melt components deal with the representation of snow processes at the catchment scale. All modeling options rely on a single conceptual store to accumulate snow during the winter months and release water during the melt

season. Decision A refers to the partitioning of precipitation into rain, snow or a mixture of rain and snow. Decision B refers to the representation of snowmelt processes. Option A1 is the only hypothesis implemented to evaluate the relative abundance of rain and snow. A logistic distribution is used in this option instead of the usual temperature thresholds to implicitly account for spatial variations in rain/snow partitioning over the catchment. In contrast, three modeling options drawing upon the temperature-index approach (Hock, 2003) are available for the evaluation of snowmelt rates (options B1a, B1b, B1c). Option B1a relies on a constant melt factor, while options B1b and B1c allow for temporal variability in the melt factor to reflect seasonal changes in the energy available for melt. A recent example of option B1c can be found in Clark et al. (2005).
Option B1b has been previously applied by Schreider et al. (1997), but at the grid-cell scale. Finally, it is worth noting that a smoothing kernel proposed by Kavetski and Kuczera (2007) was introduced in the state equation of the snow reservoir to ignore residual snow remaining in the reservoir outside the snowmelt season (see Eq. 1).
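A temperature-index melt rate with a seasonally varying melt factor, in the spirit of options B1b/B1c, can be sketched as below. The sinusoidal form, the melt-factor bounds, the base temperature and the Southern Hemisphere phasing (maximum near day 355) are all illustrative assumptions rather than the study's calibrated parameterization:

```python
import math

def melt_rate(T, day_of_year, mf_min=2.0, mf_max=6.0, T0=0.0):
    """Temperature-index snowmelt (mm/d) with a seasonal melt factor.

    The melt factor oscillates sinusoidally between mf_min (winter) and
    mf_max (summer), with the maximum set near the December solstice for a
    Southern Hemisphere catchment.  All parameter values are illustrative.
    """
    mf = mf_min + 0.5 * (mf_max - mf_min) * (
        1.0 + math.cos(2.0 * math.pi * (day_of_year - 355) / 365.0))
    return mf * max(T - T0, 0.0)  # no melt below the base temperature
```

Option B1a corresponds to holding the melt factor constant instead.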

Runoff generation (decision C)
Runoff generation components determine how much of a rainfall or snowmelt event is available for runoff, lost through evapotranspiration or temporarily stored in soils and surface depressions. Many models rely on a conceptual store to keep track of the catchment moisture status and generate runoff as a function of both current and antecedent precipitation. Here, an assortment of four commonly used methods is available. Option C1 is the only one in which no moisture accounting store is required to estimate the contributing rainfall or snowmelt (see Fig. 2). Actual evapotranspiration then represents the only process involved in the production of runoff from precipitation or snowmelt. The remaining options make use of moisture accounting stores and distribution functions (see Table 1) to estimate the proportion of the basin generating runoff. An important distinction is made between option C2, in which runoff generation occurs only during rainfall or snowmelt events, and option C3, in which leakage from the moisture accounting store remains possible even after rainfall or snowmelt has

ceased. Examples of these two moisture accounting options can be found, respectively, in the HBV (e.g. Seibert and Vis, 2012) and PDM (Moore, 2007) rainfall-runoff models.
Alternative distribution functions are available in the literature, for instance in the GR4j (Perrin et al., 2003) and FLEX (Fenicia et al., 2006) models, but the rationale behind their use remains the same. Actual evapotranspiration is computed from the estimated PE using either a constant coefficient (option C1) or a function of the catchment moisture status (options C2 and C3).
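As an example of the PDM-type distribution function mentioned above (Moore, 2007), the contributing (saturated) fraction of the basin can be derived from a Pareto distribution of store capacities; the clipping behavior below is an implementation convenience, not a detail from the study:

```python
def saturated_fraction(c_star, c_max, b):
    """PDM-type distribution function (Moore, 2007).

    Returns the fraction of the basin whose storage capacity is below the
    critical capacity c_star, for a Pareto distribution of capacities with
    maximum c_max and shape parameter b.
    """
    c = min(max(c_star, 0.0), c_max)  # clip to the physically valid range
    return 1.0 - (1.0 - c / c_max) ** b
```

With b = 1 the capacities are uniformly distributed; larger b concentrates capacity in a small part of the basin, so saturation (and hence runoff) starts earlier during an event.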

Runoff transformation and routing (decisions D to F)
Runoff transformation components account for all the retention and translation processes occurring as water moves through the catchment. In practice, junction elements (decision D) and lag functions (decision E) are typically combined with one or more conceptual stores (decision F) to represent the effects of different flow pathways on the runoff process (both timing and volume). Additional elements in the form of lag functions or conceptual stores can also be used to reflect water routing in the channel network. However, in this study channel routing elements were considered unnecessary at a daily time step. All the modeling options available for decision F consist of two stores. These can be arranged in parallel (options F1a and F1b), in series (options F2a and F2b), or in a combination of both (options F3a and F3b). In each case, one of the stores has a nonlinear behavior while the other reacts linearly. Two types of nonlinear response are provided: one relies on smoothed thresholds and different storage coefficients (options F1b, F2b and F3b), and the other relies on power laws (options F1a, F2a and F3a). Options F1a and F1b are based on the classical parallel transfer function used in many conceptual models, such as the PDM (Moore, 2007) and IHACRES (Jakeman et al., 1993) models, where one store stands for a relatively quick catchment response and the other for a slower response. The structure of options F3a
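The classical parallel arrangement of a quick and a slow store (cf. options F1a/F1b) can be sketched with two linear reservoirs; for simplicity both stores are linear here, whereas in the framework one of the two is nonlinear, and the split fraction and recession coefficients below are illustrative values only:

```python
def route_parallel(inflow, split=0.6, k_quick=0.5, k_slow=0.05):
    """Two conceptual linear stores in parallel (quick/slow responses).

    Each day, a fraction `split` of the inflow feeds the quick store and the
    rest the slow store; each store releases k * storage per day.
    """
    s_quick = s_slow = 0.0
    outflow = []
    for q in inflow:
        s_quick += split * q
        s_slow += (1.0 - split) * q
        q_quick = k_quick * s_quick   # fast recession
        q_slow = k_slow * s_slow      # slow (baseflow-like) recession
        s_quick -= q_quick
        s_slow -= q_slow
        outflow.append(q_quick + q_slow)
    return outflow
```

Because each store is mass-conservative, the routed volume converges to the input volume as the recession tails drain out, which is what makes the element usable for both timing and volume effects.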


Principle
In optimization problems with at least two conflicting objectives, a set of solutions rather than a unique one exists because of the trade-offs between these objectives. A Pareto-optimal solution is achieved when it cannot be improved upon without degrading at least one of its objective criteria. The set of Pareto-optimal solutions for a given model is often called the "Pareto set" and the set of criteria corresponding to this Pareto set is usually referred to as the "Pareto front".
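The definition above translates directly into a non-dominance test; the brute-force sketch below (all criteria to be minimized) is a minimal illustration, not the filtering used inside NSGA-II:

```python
def pareto_front(points):
    """Return the non-dominated points (all criteria to be minimized).

    A point dominates another if it is no worse on every criterion and
    strictly better on at least one.  O(n^2); fine for small sets.
    """
    def dominates(a, b):
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

For example, among the criterion vectors (1, 2), (2, 1), (2, 2) and (3, 3), only the first two are Pareto-optimal: each of the others can be improved on one criterion without degrading the other.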

The NSGA-II algorithm
The Non-dominated Sorting Genetic Algorithm II (NSGA-II) (Deb et al., 2002) was selected to calibrate the models implemented within the multiple-hypothesis framework. This algorithm has been used successfully in a number of recent hydrological studies (see e.g. Khu and Madsen, 2005; Bekele and Nicklow, 2007; De Vos and Rientjes, 2007; Fenicia et al., 2008; Shafii and De Smedt, 2009) and has the advantage of not needing any additional parameter (other than those common to all genetic algorithms, i.e. the population size and the number of generations). Its most distinctive features are the use of a binary tournament selection, a simulated binary crossover and a polynomial mutation operator. For brevity's sake, the detailed instructions of the algorithm and the conditions of its application to rainfall-runoff modeling cannot be discussed further here. Instead, the reader is referred to the aforementioned literature.

Four criteria were chosen to evaluate the models built within the multiple-hypothesis framework. The first three are common to both calibration and validation periods, while the fourth criterion differs between the two.
The first criterion (NSE) is related to the estimation of high flows and draws upon the Nash-Sutcliffe efficiency metric:

NSE = 1 − [ Σ_{d=1}^{N} (Q_obs^d − Q_sim^d)² ] / [ Σ_{d=1}^{N} (Q_obs^d − mean(Q_obs))² ]

where Q_obs^d and Q_sim^d are the observed and simulated discharges for day d, and N is the number of days with available observations.
The second criterion (NSElog) is related to the estimation of low flows and draws upon a modified, log version of the first criterion:

NSElog = 1 − [ Σ_{d=1}^{N} (ln Q_obs^d − ln Q_sim^d)² ] / [ Σ_{d=1}^{N} (ln Q_obs^d − mean(ln Q_obs))² ]

The third criterion quantifies the mean annual volume error (VEM) made in the estimation of the water balance of the catchment:

VEM = (1 / N_years) Σ_{y=1}^{N_years} |V_obs^y − V_sim^y| / V_obs^y

where V_obs^y and V_sim^y are the observed and simulated volumes for year y, and N_years is the number of years of the simulation period.
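The three criteria can be sketched as follows; the small offset guarding against log(0) and the exact normalization of the volume error are assumptions consistent with the definitions above, not formulas copied from the study:

```python
import math

def nse(obs, sim):
    """Nash-Sutcliffe efficiency (emphasizes high flows); 1.0 is a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def nse_log(obs, sim, eps=1e-6):
    """NSE on log-transformed discharges (emphasizes low flows).

    eps avoids log(0) on zero-flow days (an assumed safeguard).
    """
    return nse([math.log(o + eps) for o in obs],
               [math.log(s + eps) for s in sim])

def vem(obs_vols, sim_vols):
    """Mean annual absolute volume error, as a fraction of the observed volume."""
    return sum(abs(o - s) / o for o, s in zip(obs_vols, sim_vols)) / len(obs_vols)
```

Using the log transform downweights flood peaks, so NSE and NSElog together cover both ends of the flow-duration curve, while VEM penalizes systematic bias in the annual water balance.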
The fourth criterion (Crit4) differs between the two simulation periods. In calibration, snow-covered areas (SCA) estimated from the MODIS data were used to evaluate the consistency of snow-accounting modeling options in terms of snow presence or absence in the basin. The objective was to quantify the error made in simulating the seasonal dynamics of snow accumulation, storage and melt processes. Following Parajka and Blöschl (2008), the snow error (SE) was defined as the total number of

days when the snow-accounting store of options B1a, B1b and B1c disagreed with the MODIS data as to whether snow was present in the basin (Fig. 3). The number of days with simulation errors is then divided by the total number of days with available MODIS data to express SE as a percentage.
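The snow error can be sketched as a presence/absence comparison between the simulated snow store and the MODIS record; treating days without MODIS data as skipped, and using a zero-storage threshold for "no snow", are assumptions consistent with the definition above:

```python
def snow_error(sim_swe, modis_snow_present):
    """Snow error (SE, %): share of days on which model and MODIS disagree
    about snow presence in the basin.

    sim_swe: simulated snow store content per day (mm)
    modis_snow_present: True/False per day, or None where MODIS has no data
    """
    disagree = valid = 0
    for swe, obs in zip(sim_swe, modis_snow_present):
        if obs is None:
            continue  # no MODIS observation that day
        valid += 1
        if (swe > 0.0) != obs:
            disagree += 1
    return 100.0 * disagree / valid
```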
In validation, a cumulated volume error was used to replace the snow error criterion, which could not be computed due to a lack of remotely-sensed data over this period.

Model selection, model analysis and ensemble modeling
Finally, a total of 72 model structures were implemented and tested within the multi-objective and multiple-hypothesis frameworks. In addition to their names, and for purposes of simplicity, these 72 model hypotheses are given a number from 1 to 72 corresponding to their order of appearance in the simulation process (see e.g. Sect. 4.1). Model hypotheses can be thought of as points x in the space of performance measures. One possible way to locate these points in space is to consider that each coordinate x_i (i = 1, ..., 4) of x is given by the best performance obtained along the Pareto front of model x with respect to the i-th criterion described in Sect. 3.3.2.
A clustering technique based on the fuzzy c-means algorithm (Bezdek et al., 1983) and the initialization procedure developed by Chiu (1994) was chosen to explore this multi-objective space and identify natural groupings among model hypotheses. To facilitate comparison between calibration and validation, the clustering operations were repeated independently for each period. The whole experiment, from model building to multi-objective optimization and cluster identification, was repeated several times to ensure that the final composition of the clusters remained the same.
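A minimal fuzzy c-means sketch is given below for small point sets in criterion space. It uses a deterministic farthest-point initialization rather than Chiu's (1994) subtractive-clustering procedure, and the fuzziness exponent m = 2 and iteration count are illustrative defaults:

```python
def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100):
    """Minimal fuzzy c-means (Bezdek et al., 1983) for small datasets.

    Returns (centroids, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j (rows sum to 1).
    """
    dim = len(points[0])
    # deterministic farthest-point initialization (stand-in for Chiu, 1994)
    centroids = [list(points[0])]
    while len(centroids) < n_clusters:
        p = max(points, key=lambda q: min(
            sum((a - b) ** 2 for a, b in zip(q, c)) for c in centroids))
        centroids.append(list(p))
    u = [[0.0] * n_clusters for _ in points]
    for _ in range(n_iter):
        # membership update
        for i, p in enumerate(points):
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5 for c in centroids]
            if 0.0 in d:  # point coincides with a centroid: crisp membership
                j0 = d.index(0.0)
                for j in range(n_clusters):
                    u[i][j] = 1.0 if j == j0 else 0.0
            else:
                for j in range(n_clusters):
                    u[i][j] = 1.0 / sum((d[j] / dk) ** (2.0 / (m - 1.0)) for dk in d)
        # centroid update (membership-weighted means)
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(len(points))]
            centroids[j] = [sum(wi * p[k] for wi, p in zip(w, points)) / sum(w)
                            for k in range(dim)]
    return centroids, u
```

Unlike hard k-means, each model hypothesis receives a graded membership in every cluster, which is what allows the ">50 % membership" counts reported in Table 3.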
Once the composition of each cluster was established, it was possible to identify a set of "best-performing" clusters for each simulation period, i.e. a set of clusters with the smallest Euclidean distances to the origin of the objective space. The model

structures of these "best-performing" clusters can be regarded as equally acceptable representations of the system. An important indicator of structural uncertainty is the extent to which the simulation bounds derived from the Pareto sets of these models reproduce the various features of the observed hydrograph. The overall uncertainty envelope should be wide enough to include most of the observed discharge but not so wide that its representation of the various aspects of the hydrograph (rising limb, peak discharge, falling limb, baseflow) becomes meaningless. In general, one seeks to reduce the width of the envelope as much as possible while maximizing the number of observations enclosed within the bounds. In this study, priority was given to keeping the number of outlying observations at its minimum before searching for the combination of models that minimized the envelope area. This was achieved iteratively through the following steps:

1. Start with an initial ensemble composed of the N_max models identified as members of the best-performing clusters in both calibration and validation (i.e. models which fail the validation test are ruled out).
2. From now on, consider only the calibration period. Add up the N_max individual simulation envelopes that can be obtained from the Pareto sets of the N_max models (hereafter referred to as the "Pareto-envelopes").

3. Estimate the maximum number of observations enclosed within the resulting overall envelope, N_obs(N_max), and calculate the area of this envelope, Area(N_max).

4. For k = 1 to N_max:

a. Identify the possible combinations of the N_max models taken N_max − k at a time.

b. For each of these combinations, add up the individual Pareto-envelopes of the N_max − k models and calculate the number of observations enclosed within the bounds of the resulting overall envelope, N_obs(N_max − k). If this number falls below N_obs(N_max), reject the current combination.

c. If all the possible combinations of N_max − k models are rejected, break the loop. The final ensemble of models to consider is the last accepted combination of N_max − k + 1 models.
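The iterative procedure above can be sketched as follows, assuming each model's Pareto-envelope is summarized by lower/upper bound series on the observation dates; the acceptance rule (a smaller combination is kept only if it encloses as many observations as the full ensemble) follows the priority stated in the text.

```python
import numpy as np
from itertools import combinations

def merge_envelopes(envs):
    """envs: list of (lower, upper) bound arrays; returns the overall envelope."""
    lo = np.min([e[0] for e in envs], axis=0)
    hi = np.max([e[1] for e in envs], axis=0)
    return lo, hi

def n_enclosed(lo, hi, obs):
    """Number of observations falling inside the bounds."""
    return int(np.sum((obs >= lo) & (obs <= hi)))

def envelope_area(lo, hi):
    return float(np.sum(hi - lo))

def reduce_ensemble(envs, obs):
    """Greedy reduction: keep shrinking the ensemble while at least one smaller
    combination encloses as many observations as the full ensemble, picking at
    each size the combination with the smallest envelope area."""
    idx = tuple(range(len(envs)))
    lo, hi = merge_envelopes(envs)
    target = n_enclosed(lo, hi, obs)              # step 3: N_obs(N_max)
    best = idx
    for k in range(1, len(envs)):                 # step 4
        accepted = []
        for combo in combinations(idx, len(envs) - k):          # step 4a
            lo, hi = merge_envelopes([envs[i] for i in combo])  # step 4b
            if n_enclosed(lo, hi, obs) >= target:
                accepted.append((envelope_area(lo, hi), combo))
        if not accepted:                          # step 4c: all rejected
            break
        best = min(accepted)[1]
    return list(best)
```

Enumerating all combinations is exponential in the ensemble size, which is tractable here only because N_max is small.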

Cluster analysis
The 72 model hypotheses can be grouped into 5 clusters in calibration and 6 in validation. Table 3 displays the coordinates of the cluster centroids and gives, for each cluster, the number of points with membership values above 50 %. Figure 4 shows the projections of these clusters onto three possible two-dimensional (2-D) subspaces of the objective space (the three other subspaces are omitted for brevity's sake). Each cluster is given a rank (from 1 to 5 or 6) reflecting its distance from the origin of the coordinate system. As is evident from both Fig. 4 and Table 3, most of the best-performing structures can be found in Cluster 1. This is particularly clear in the planes defined by the high-flow (Crit1) and low-flow (Crit2) criteria (Fig. 4), where all clusters tend to line up along a diagonal axis (dashed line). In contrast, a small trade-off between Cluster 1 and Cluster 2 can be observed in calibration in the plane defined by the high-flow (Crit1) and volume error (Crit3) criteria: models from Cluster 2 tend to perform slightly better than those from Cluster 1 with respect to Crit3, and vice versa with respect to Crit1. However, this trade-off disappears in validation. Similar comments can be made about the other 2-D subspaces (not shown here). In the following analysis, Cluster 1 will be considered as the only best-performing cluster. This cluster encompasses 24 members in calibration as against 15 in validation, indicating that several model structures do not pass the validation test (namely models no. 30, 32, 49, 52, 53, 55, 66, 67, 69 and 72, as shown in Table 4).
Several observations can be made regarding the composition of Cluster 1 in both simulation periods. As can be seen from the values listed in Table 4, it is not possible to pick out a single, unambiguous model hypothesis that would perform better than the others with respect to all criteria. On the one hand, there appear to be several equally acceptable structures for each individual criterion. Models no. 22, 46 and 54, for instance, yield very similar values of the high-flow criterion (Crit1), despite huge differences in their modeling options. This illustrates the equifinality of model structures in reproducing one aspect of the system behavior. On the other hand, some structures seem more appropriate for the simulation of high flows or snow dynamics, while others appear to be better at reproducing low flows or estimating the annual water balance of the catchment. This indicates trade-offs between model structures in reproducing several aspects of the system behavior. It is however possible to identify some recurring patterns among the modeling options present in (or absent from) Cluster 1 in both periods. First, option B1c is the most represented snowmelt-accounting hypothesis, despite an increase in the number of alternative options (B1a, B1b) in validation. More strikingly, option C2 is totally absent from Cluster 1 in both periods. Single-flux combinations (C1-D1 and C3-D2) and their splitting counterparts (C1-D3 and C3-D1) tend to be equally well-represented, thus providing evidence of significant equifinality among these conceptual representations. Finally, runoff transformation options based on a threshold-like behavior (F1b, F2b and F3b) account for 75 % of model hypotheses in calibration and over 90 % in validation. In particular, option F3a turns out to be completely absent from Cluster 1 in both periods, while models based on option F2a (no. 49, 55, 67 and 69) fail the validation test. On the contrary, option F2b is particularly well represented.

Pareto analysis
In general, valuable insight can be gained from the mapping of Pareto fronts in the space of performance measures. While a full description of all the Pareto fronts obtained in calibration is not possible here due to space limitations, two emblematic model hypotheses are used to illustrate this point. Figure 5 shows the Pareto-optimal solutions of models no. 49 (A1-B1c-C1-D1-E1-F2a) and 50 (A1-B1c-C1-D1-E1-F2b) plotted in two dimensions for different combinations of two of the four objective functions used in calibration. Note that these two models differ only in their runoff transformation options (F2a vs. F2b), so that the comparison can be made in a controlled way.
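Pareto-optimal solutions such as those plotted in Fig. 5 can be extracted from a sample of criterion vectors with a simple dominance filter. The sketch below assumes all criteria are errors to be minimized, as is the case here.

```python
import numpy as np

def pareto_front(F):
    """Boolean mask of non-dominated rows of F, where each row holds the
    values of the (minimized) performance criteria for one solution."""
    nd = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        # Row i is dominated if some other row is <= everywhere and < somewhere.
        dominated_by = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        if dominated_by.any():
            nd[i] = False
    return nd
```

This O(n^2) filter is sufficient for the sample sizes produced by a multi-objective calibration run; dedicated algorithms (e.g. non-dominated sorting in NSGA-II) do the same job more efficiently within the optimizer itself.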
Trade-offs between the high-flow (Crit1) and low-flow (Crit2) criteria are clearly more important with option F2a (Fig. 5a) than with option F2b (Fig. 5b). This means that option F2a is less efficient in reproducing high and low flows simultaneously, and explains why this option disappears from Cluster 1 in validation. By contrast, the other pairs of criteria (Crit1-Crit3, Crit1-Crit4) displayed in Fig. 5 appear to be less useful in differentiating between the two models. Further insight into the structural strengths and weaknesses of model hypotheses can be obtained by determining how parameter values vary along the Pareto fronts of the models. A large "Pareto range" in some parameters indicates structural deficiencies in the corresponding model components (see e.g. Gupta et al., 1998) or a lower sensitivity of model outputs to those parameters (Engeland et al., 2006). For purposes of clarity, Fig. 6 focuses on eight illustrative structures identified as members of Cluster 1, with parameter values normalized using the lower and upper limits given in Table 2 so that all of them lie between 0 and 1. Different colors are used to indicate the parameter sets associated with the smallest high-flow (in black), low-flow (in red), volume (in blue) and snow (in green) errors. The extent to which these colored solutions converge toward the same parameter values or diverge from each other determines the level of parameter identifiability of each model hypothesis. As regards snow-accounting options, a distinction can be made between snow accumulation parameters (T_S and m_S), whose ranges of variation appear to be large in all cases, and snowmelt parameters (T_M, f_M, r_1, r_2, f_1, f_2), whose levels of identifiability depend on interactions with the other model components. In Fig. 6a, the Pareto range of snowmelt parameters decreases in width when moving from option B1a to B1b with the combination of options C3-D2-E1. Yet changing this combination into C3-D1-E2 has the opposite effect (Fig. 6b): parameter uncertainty now decreases when moving from option B1b to B1a. As regards runoff transformation parameters (α, N_b, K_2, K_3, δ, S_C and K_4), the black and red solutions are closer to each other when options F2b (Fig. 6a, b and c) and F1b (Fig. 6d) are used. By contrast, options F2a (Fig. 6c) and F1a (Fig. 6d) require very different parameter sets to adequately simulate both low and high flows. Again, this suggests that runoff transformation options based on a threshold-like behavior may be more consistent with the observed data than those based on a power-law relationship. It should be noted, however, that relatively large Pareto ranges in some runoff transformation parameters (e.g. K_2 and K_3) may still be required to obtain small volume and snow errors at the same time as high low-flow and high-flow performances (e.g. models no. 44 and 54). Interestingly, the black, red and blue solutions of models no. 49, 50, 53 and 54 also converge towards the same low values of parameter K_C (evapotranspiration coefficient), independently of runoff transformation options.
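The "Pareto ranges" discussed above can be computed by normalizing each Pareto-optimal parameter set with the calibration bounds of Table 2 and taking the spread per parameter; the bounds and parameter sets in this minimal sketch are placeholders.

```python
import numpy as np

def pareto_ranges(pareto_params, lower, upper):
    """pareto_params: (n_solutions, n_params) Pareto-optimal parameter sets.
    Normalize each parameter with its calibration bounds so values lie in
    [0, 1], then return the per-parameter spread along the front (a large
    spread = a poorly identifiable parameter)."""
    norm = (np.asarray(pareto_params) - lower) / (np.asarray(upper) - lower)
    return norm.max(axis=0) - norm.min(axis=0)
```

A spread near 1 means the parameter sweeps its whole feasible range along the front, whereas a spread near 0 means all trade-off solutions agree on its value.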
Drawing any conclusion at this stage about the links between parameter identifiability and model performance might be somewhat hazardous. Other examples (not shown here) show that a model structure may have highly identifiable parameter values in calibration and yet not be suited to the conditions prevailing in validation. Also, a reduction of parameter uncertainty, as is the case with options F2b and F1b, often comes with a greater number of parameters. Finally, a better understanding of the reasons why some models, or modeling options, work better than others is provided by the simulation bounds (or Pareto-envelopes) derived from the Pareto sets of these models. Figure 7 shows the Pareto-envelopes of the SWE internal state obtained with three competing model hypotheses (no. 6, 30 and 54) differing only in their snowmelt-accounting options. Note that only the last two of these models (30, 54) belong to Cluster 1 in calibration (see Table 4). Simulated snow accumulation starts later than expected with all modeling options (B1a, B1b and B1c). As will be further discussed in Sect. 5.2, this is likely to indicate systematic errors in the input precipitation and/or MODIS-based SCA data. On the whole, the envelope widths suggest a reduction in the uncertainty associated with the prediction of seasonal snow dynamics when moving from option B1a to option B1c. This is consistent with the mean annual snow errors reported in Table 4, which are significantly lower with option B1c independently of the other model options. It must be acknowledged, however, that even this option (B1c) fails to capture the seasonal dynamics of snow accumulation and melt during several years of the period. The release of water from the snow-accounting store of model no. 54 continues well after the end of the observed snowmelt season in 2008, 2009, 2010 and 2011. On the contrary, the simulated snowmelt season tends to end sooner than expected with model no. 30 in 2003, 2004, 2005 and 2006. In that case, options B1b and B1c appear to be somewhat complementary.

Snow accumulation and melt
The relatively large Pareto bounds obtained for parameters T_S and m_S with nearly all model hypotheses indicate that mixed conditions of rain and snow are likely to occur across a large range of temperatures. This may be due to the lumped representation of the snow accumulation process and the necessity to implicitly account for spatial variations in rain/snow partitioning across the catchment. Likewise, the relatively high values of parameter K_C (> 0.2) obtained with the green solutions (smallest snow errors) of models no. 50, 53 and 54 (Fig. 6) might indicate a need to compensate for the absence of a sublimation scheme in the available snow modeling options. The sine function used in option B1c appears to be better suited to the estimation of the melt factor than the other options tested in this study (B1a, B1b). The degree-day method implemented in option B1a has a physical basis (Ohmura, 2011). Yet some components of the energy balance of snow-covered areas cannot be fully captured by temperature alone, nor easily reduced to a simple formula (Hock, 2003). In the semi-arid central Andes (29-30° S), small zenith angles and a thin, dry and cloud-free atmosphere during most of the year make incoming shortwave radiation the most important source of seasonal variations in the energy available for melt (see e.g. Aberman et al., 2013). As a result, the seasonal timing of snowmelt is expected to show greater year-to-year stability, which may explain the relative success of option B1c when compared to option B1b.
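The snow-accounting ingredients discussed here can be illustrated with standard textbook forms. The exact equations of options B1a-B1c are given in Table 1 of the paper, so the formulas below (a linear rain/snow partition around threshold T_S over range m_S, a classical degree-day melt, and a sinusoidal annual melt factor oscillating between f_1 and f_2) are only plausible stand-ins; the `peak_doy` parameter is hypothetical.

```python
import numpy as np

def rain_snow_split(P, T, T_S, m_S):
    """Linear partition of precipitation P at temperature T around the
    threshold T_S, over a mixed-phase range of width m_S (m_S > 0)."""
    frac_snow = np.clip((T_S + m_S / 2.0 - T) / m_S, 0.0, 1.0)
    return P * (1.0 - frac_snow), P * frac_snow  # (rain, snow)

def degree_day_melt(T, T_M, f_M):
    """Classical degree-day melt (B1a-like): melt when T exceeds T_M."""
    return f_M * np.maximum(T - T_M, 0.0)

def melt_factor_sine(doy, f1, f2, peak_doy=1):
    """B1c-like seasonal melt factor oscillating between f1 and f2, with its
    maximum at peak_doy (hypothetically mid-summer, day 1 in the Andes)."""
    return f1 + 0.5 * (f2 - f1) * (1.0 + np.cos(2.0 * np.pi * (doy - peak_doy) / 365.25))
```

Tying the melt factor to the day of year rather than to temperature alone is what gives a B1c-like option its year-to-year stability in melt timing.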

Runoff generation
The absence of option C2 from Cluster 1 in both simulation periods suggests that moisture-accounting components may not be essential to the conceptual modeling of this semi-arid Andean catchment. Most of the land cover is, indeed, dominated by barren to sparsely vegetated exposed rocks, boulders and rubble, with poor soil development outside the valleys. This setting may also explain the relatively low values of parameter K_C obtained with the black, red and blue solutions shown in Fig. 6.

Runoff transformation and routing
The high representation of options F2a and F2b in Cluster 1 suggests that the catchment actually behaves as a serial system, and may reveal a better correspondence with its overall physical structure. The overall organization of fluxes in the catchment, from high elevations toward the valleys and then northward to the outlet, can be conceptualized as a series of two hydraulically connected reservoirs: one standing for the mountain blocks (upstream reservoir) and the other for the alluvial valleys (downstream reservoir). Of course, this interpretation needs to be qualified, since other runoff transformation options (F1a, F1b and F3b) have proved to yield equally acceptable simulations despite significant differences in their model structures. The results also provide evidence for an emergent threshold behavior at the catchment scale, which might be related to connectivity levels among the fractured and till-mantled areas of the mountain blocks.
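The serial two-reservoir conceptualization can be sketched as follows. The linear outflow coefficients and the storage threshold S_C (suggested by the threshold-like F*b options) are illustrative choices, not the paper's constitutive equations from Table 1.

```python
import numpy as np

def serial_reservoirs(inflow, K_up, K_down, S_C=0.0):
    """Two hydraulically connected stores in series: an upstream store for the
    mountain blocks feeding a downstream store for the alluvial valleys.
    The downstream store releases water only above a storage threshold S_C,
    mimicking the emergent threshold behavior of the F*b options.
    K_up, K_down: linear outflow coefficients per time step (0 < K <= 1)."""
    S1 = S2 = 0.0
    out = []
    for q_in in inflow:
        S1 += q_in
        q12 = K_up * S1                       # upstream release to the valley store
        S1 -= q12
        S2 += q12
        q_out = K_down * max(S2 - S_C, 0.0)   # threshold-controlled release
        S2 -= q_out
        out.append(q_out)
    return np.array(out)
```

With S_C = 0 the cascade reduces to two linear reservoirs in series; a positive S_C suppresses outflow until the valley store is sufficiently connected, which is one way to express the connectivity interpretation above.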

Representation of structural uncertainties
This section deals with the identification and use of an ensemble of equally acceptable model structures to quantify and represent the uncertainty arising from the system non-identifiability. Figure 8 shows the overall uncertainty envelope obtained with the 8 model structures whose combination minimizes the envelope area in calibration while holding constant the number of outlying observations (see Sect. 3.3). Over 82 % of discharge observations are captured by the envelope in both simulation periods. Interestingly, this number exceeds the best N_par value obtained in calibration with the individual Pareto-envelopes (see Table 4), which shows how necessary it is to consider an ensemble of model structures. In validation, however, a better combination could be identified, since several models of Cluster 1 display significantly higher N_par values (Table 4). On the whole, the comparison of the observed hydrograph with the simulation bounds of the envelope shows a good match of rising limbs and peak discharges in both simulation periods, but a less accurate fit of falling limbs during at least one major (in 1987/1988) and two minor (in 2005/2006 and 2007/2008) events. The slower recession of the observed hydrograph might indicate a delayed contribution of one or more catchment compartments that cannot be described by any of the modeling options available in the multiple-hypothesis framework.

Conclusions

This study provided an opportunity to combine a modular modeling approach with a multi-criteria evaluation scheme to reduce structural uncertainty in the conceptual modeling of a large Andean catchment over a 30-year period. In particular, it demonstrated the benefits of using the concept of Pareto-efficiency to discriminate among several competing model structures. Among the 72 hypotheses tested, the results showed that 58 model hypotheses can be rejected as inappropriate. However, 14 other hypotheses were shown to yield equally acceptable representations of the catchment hydrological functioning in both calibration and validation. Further, the simulation envelopes derived from the Pareto sets of 8 model structures among the 14 best-performing ones were used to represent the minimum structural uncertainty that could be obtained with this modeling framework. The rejection of some hypotheses was closely related to particular types of model components or modeling options.
For instance, option C2, in which runoff generation requires the filling of a moisture-accounting store, can be ruled out from the set of plausible runoff generation representations. It is noteworthy that most rejected hypotheses among the 24 identified in calibration as the best-performing ones have more than 11 free parameters, with only one rejected hypothesis having 9 parameters. Thus, more parsimonious models seem to better withstand changes in climate conditions. The principle of parsimony, however, cannot be used to further discriminate between the remaining best-performing hypotheses. For instance, model no. 54 (12 parameters) performs better than model no. 2 (9 parameters) with respect to the high-flow criterion.
There remain several ways to improve this assessment of structural uncertainty and model suitability. In particular, the concept of Pareto optimality should not be confused with that of equifinality. Of course, both notions agree that it is not possible to identify a single, best solution to the calibration problem and that multiple parameter sets should be retained to give a proper account of model uncertainty. However, the Pareto set of solutions represents the minimum parameter uncertainty that can be achieved when several criteria are considered simultaneously with no a priori preference for one over the others (Gupta et al., 2003). By contrast, two parameter sets are said to be equifinal if they can be regarded as equally acceptable in a statistical sense with respect to one particular criterion (for more details on these differences, see Engeland et al., 2006). From this perspective, the choice of Pareto optimality to characterize model uncertainty can be criticized for leading to the rejection of many behavioural parameter sets (i.e. sets close to, but not part of, the Pareto front) that might have been Pareto-optimal with different performance measures, calibration data or errors in the input data (e.g. Freer et al., 2003; Beven, 2006). One possible way to address this limitation and improve model transposability in time has been suggested by Gharari et al. (2013). The idea is to divide the calibration period into k sub-periods and identify parameter sets (in the whole parameter space) which minimize the distance to the k Pareto fronts of these sub-periods. For a proper assessment of parameter equifinality, however, Bayesian frameworks should be considered (Madsen, 2000; Huisman et al., 2010).
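The Gharari et al. (2013) idea can be sketched as a distance computation: for each candidate parameter set, evaluated in the objective space of every sub-period, accumulate the distance to the nearest point of that sub-period's Pareto front and prefer candidates with the smallest total. The array layout is an assumption made for illustration.

```python
import numpy as np

def distance_to_fronts(crit_by_period, fronts):
    """crit_by_period[k]: (n_candidates, n_crit) criterion values of each
    candidate parameter set in sub-period k; fronts[k]: (m_k, n_crit) array of
    Pareto-optimal points for sub-period k. Returns the summed Euclidean
    distance of each candidate to the k fronts (smaller = more consistently
    near-optimal across sub-periods, hence more transposable in time)."""
    total = np.zeros(len(crit_by_period[0]))
    for crit, front in zip(crit_by_period, fronts):
        d = np.linalg.norm(crit[:, None, :] - front[None, :, :], axis=2)
        total += d.min(axis=1)
    return total
```

Candidates that are Pareto-optimal in one sub-period but far from the fronts of the others are penalized, which is precisely what distinguishes this selection rule from a single-period Pareto filter.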
The use of Pareto-envelopes to quantify structural uncertainty is also questionable in that it fails to account for all discharge observations, as shown in Table 4. While this failure can be partly remedied within a multiple-hypothesis framework (MHF), Fig. 8 shows that the overall uncertainty envelope obtained by merging the Pareto-envelopes of 8 competing model hypotheses still leaves out a significant part of the observations. Indeed, like any other modular framework, the MHF developed in this study suffers from an insufficient coverage of the hypothesis space (Gupta et al., 2012). The parameterization of evapotranspiration, for example, was not considered as an independent model-building decision. Only one formula was applied to calculate potential evapotranspiration, and the possibility to retrieve actual evapotranspiration from downstream water stores was not provided. Likewise, the runoff transformation process was described using only two water stores, of which only one was assumed to have a nonlinear behavior. Future work to improve the conceptual modeling of the Claro River catchment should include the testing of new or refined hypotheses to allow for the use of additional auxiliary data (e.g. groundwater levels). Competing alternatives to the lumped mode used in this study should also be included within the MHF. For example, semi-lumped approaches in which snow accumulation and melt components are applied at the grid-cell level provide an interesting way to improve the use of snow-cover data without excessively increasing computational requirements. In this way, catchment-wide snow-covered areas (SCA) can be simulated and directly compared to MODIS-based data. Daily rainfall and snowmelt amounts are then integrated over all grid cells to be used as catchment-averaged inputs in the subsequent spatially-lumped model components (see e.g. Schreider et al., 1997). This improved MHF should then be applied to other mesoscale catchments to better understand how the specific features of each catchment relate to specific model requirements. Such understanding is of primary importance for the use of conceptual models in climate change impact studies. Finally, our ability to discriminate among the competing model hypotheses was constrained by inevitable errors in the input and output data sets. In particular, the comparison of simulated SWE levels and MODIS-based SCA estimates revealed considerable uncertainty in the estimation of precipitation inputs. Some precipitation events occurring in the early winter are not captured by the gauging network (< 3000 m a.s.l.) used for the interpolation of precipitation across the catchment. These errors add to the systematic volume errors caused by wind, wetting and evaporation losses at the gauge level, leading to an overall underestimation of precipitation, as indicated by the rough estimation of the catchment-scale water balance given in Sect. 2.
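The semi-lumped idea sketched in the text can be illustrated as follows: snow accumulation and melt are run per grid cell, a cell counts as snow-covered when its SWE exceeds a small threshold (the 1 mm value is a hypothetical choice), and the catchment-wide SCA fraction can then be compared directly with MODIS-based estimates while rain and melt are aggregated into lumped inputs.

```python
import numpy as np

def simulated_sca(swe_grid, threshold=1.0):
    """Fraction of grid cells that count as snow-covered, i.e. whose simulated
    SWE exceeds a small threshold (hypothetical, in mm). This catchment-wide
    SCA fraction is the quantity comparable with MODIS-based estimates."""
    return float(np.mean(np.asarray(swe_grid) > threshold))

def lumped_inputs(rain_grid, melt_grid):
    """Integrate daily grid-cell rainfall and snowmelt into the
    catchment-averaged inputs used by the subsequent lumped components."""
    return float(np.mean(rain_grid)), float(np.mean(melt_grid))
```

Only the snow components run on the grid; everything downstream stays lumped, which is what keeps the computational cost close to that of the fully lumped models.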
It was also possible to highlight some errors in the streamflow data. Part of these errors might be associated with uncertainties in the estimation of natural streamflow. Further research is therefore required to better integrate the effect of water abstractions in the hydrological modeling process. From a multiple-hypothesis perspective, the modeling of irrigation water withdrawals should be regarded as a testable model component in its own right.

Their hydraulic conductivity and saturated thickness range from about 10 m d−1 and 40 m respectively in the upper part of the catchment to more than 30 m d−1 and 60 m respectively at the outlet (CAZALAC, 2006), allowing a rapid transfer of water to the hydraulically connected surface streams. Pourrier et al. (2014) studied flow processes and dynamics in the headwaters of the neighbouring Turbio River catchment; yet very little remains currently known about the emergent processes taking place at the catchment scale.


Figure 6. Estimated normalized ranges of the Pareto-optimal sets of eight alternative model structures differing in at least one of their components. The colored lines stand for the best solutions obtained in calibration with respect to the high-flow criterion (in black), the low-flow criterion (in red), the mean annual volume error (in blue) and the snow error (in green).

Table 1.
Constitutive equations of fluxes between the various components of the modeling options described in Fig. 2. Parameter (in italics) significations and units are detailed in Table 2 (P: catchment-averaged daily precipitation; rain: rain fraction of precipitation P; snow: snow fraction of precipitation P; T: catchment-averaged daily temperature; PE: catchment-averaged daily potential evapotranspiration; AE: catchment-averaged daily actual evapotranspiration; S_j, j ∈ [1, 5]: state variables of the conceptual stores; Q_j, j ∈ [1, 5]: water fluxes between the model components).

Table 2.
Parameters used in the various modeling options, with their significations and initial sampling ranges.

Table 3.
Coordinates of the cluster centroids in the four-dimensional (4-D) space of performance measures. The number of models with membership values > 50 % (N_50%) is given for each cluster.