Using expert knowledge to increase realism in environmental system models can dramatically reduce the need for calibration

. Conceptual environmental system models, such as rainfall runoff models, generally rely on calibration for parameter identiﬁcation. Increasing complexity of this type of models for better representation of hydrological process heterogeneity, typically makes parameter identiﬁcation more difﬁcult. Although various, potentially valuable, approaches for better parameter estimation have been developed, strategies to impose general conceptual understanding of how a catchment works into the process of parameter estimation has not been fully explored. In this study we assess the effects of imposing semi-quantitative, relational inequality constraints, based on expert-knowledge, for model development and parameter speciﬁcation, efﬁciently exploiting the complexity of a semi-distributed model formulation. Making use of a topography driven rainfall-runoff modeling (FLEX-TOPO) approach, a catchment was delineated into three functional units, i.e., wetland, hillslope and plateau. Ranging from simple to complex, three model setups, FLEX A , FLEX B and FLEX C were developed based on these functional units, where FLEX A is a lumped representation of the study catchment, and the semi-distributed formulations FLEX B and FLEX C progressively introduce more complexity. In spite of increased complexity, FLEX B and FLEX C allow modelers to compare parameters, as well as states and ﬂuxes, of their different functional units to each other, allowing the formulation of constraints that limit the feasible parameter space. We show that by allowing for more landscape-related process heterogeneity in a model, e.g., FLEX C , the performance increases even without traditional calibration. The additional introduction of relational constraints further improved the performance of these models.


Introduction
Lumped conceptual and distributed physically based models are the two endpoints of the modeling spectrum, ranging from simplicity to complexity, which here is defined as the number of model parameters. Each modeling strategy is characterized by advantages and limitations. In hydrology, physically based models are typically applied under the assumptions that (a) the spatial resolution and the complexity of the model are warranted by the available data, and (b) the catchment response is a mere aggregation of small scale processes. However, these two fundamental assumptions are commonly violated. As a result, the predictive power and hydrological insight achievable via these models is limited (e.g., Beven, 1989Beven, , 2001Grayson et al., 1992;Blöschl, 2001;Pomeroy et al., 2007;Sivapalan, 2006;McDonnell et al., 2007;Hrachowitz et al., 2013b).
In contrast, lumped conceptual models require less data for model parameters estimation. This advantage comes at the expense of considerable limitations. Representing system integrated processes, model structures and parameters are not directly linked to observable quantities. Their estimation therefore strongly relies on calibration. To limit parameter identifiability issues arising from calibration, these models are often oversimplified abstractions of the system, and if inadequately tested they may act as mathematical marionettes (Kirchner, 2006). They may outperform more complex distributed models (e.g., Refsgaard and Knudsen, 1996;Ajami et al., 2004;Reed et al., 2004), but often fail to provide realistic representations of the underlying processes, leading to limited predictive power (e.g., Freer et al., 2003;Seibert, 2003;Kirchner, 2006;Beven, 2006;Kling and Gupta, 2009;Andréassian et al., 2012;Euser et al., 2013;Gharari et al., 2013).
Traditional parameter estimation of conceptual models relies on the availability of calibration data, which, however, are frequently not available for the time period or the spatiotemporal resolution of interest. A wide range of regionalization techniques for model parameters and hydrological signatures have been developed to avoid calibration in such data scarce environments (e.g., Bárdossy, 2007;Yadav et al., 2007;Perrin et al., 2008;Zhang et al., 2008;Kling and Gupta, 2009;Samaniego et al., 2010;Kumar et al., 2010;Wagener and Montanari, 2011;Kapangaziwiri et al., 2012;Viglione et al., 2013). However, it is challenging to identify suitable functional relationships between catchment characteristics and model parameters (e.g., Merz and Blöschl, 2004;Kling and Gupta, 2009), and only recently did Kumar et al. (2010Kumar et al. ( , 2013a show that multi-scale parameter regionalization (MPR) can yield global parameters that perform consistently over different catchment scales. In a further study they successfully transferred parameters obtained by the MPR technique to ungauged catchments in Germany and the USA (Kumar et al., 2013b).
Related to these difficulties with parameter identifiability, the lack of sufficient representation of processes heterogeneity (i.e., complexity) in conceptual models limits the degree of realism of these models. The concept of hydrological response units (HRUs) can be exploited as a strategy for an efficient tradeoff between model simplicity, required for adequate parameter identifiability, and a sufficient degree of process heterogeneity. HRUs are units within a catchment, characterized by a different hydrological function and can be represented by different model structures or parameters. In most cases HRUs are defined based on soil types, land cover and other physical catchment characteristics (e.g., Knudsen et al., 1986;Flügel, 1995;Grayson and Blöschl, 2000;Krcho, 2001;Winter, 2001;Scherrer and Naef, 2003;Wolock et al., 2004;Pomeroy et al., 2007;Scherrer et al., 2007;Schmocker-Fackel et al., 2007;Efstratiadis and Koutsoyiannis, 2008;Lindström et al., 2010;Nalbantis et al., 2011;Kumar et al., 2010).
A wide range of studies also points towards the potential value of using topographical indices, readily available from digital elevation models (DEMs), to account for process heterogeneity (e.g., McGlynn and McDonnell, 2003;Seibert, 2003;McGuire et al., 2005;Hrachowitz et al., 2009;Jencso et al., 2009;Detty and McGuire, 2010;Gascuel-Odoux et al., 2010). Because standard metrics of landscape organization, such as absolute elevation, slope or curvature, as used in the catena concept (Milne, 1935;Park and van de Giesen, 2004), are often not strong enough descriptors to infer hydrological function, alternative concepts have been sought. The development of derived metrics such as the Topographic Wetness Index (Beven and Kirkby, 1979) facilitated an important step forward, being at the core of TOPMODEL (e.g., Beven and Kirkby, 1979;Beven and Freer, 2001a), which has proven to be a valuable approach in specific environmental settings meeting the assumptions of the model. A different descriptor allowing a potentially more generally applicable and hydrologically meaningful landscape classification has recently been suggested by Rennó et al. (2008): the Height Above the Nearest Drainage (HAND). Nobre et al. (2011) showed the hydrological relevance of HAND by investigating long term groundwater behavior and land use.
Explicitly invoking the co-evolution of topography, vegetation and hydrology, Savenije (2010) argued that catchments, as self-organizing systems, need to fulfill the contrasting hydrological functions of efficient drainage and sufficient water storage to allow, in a feedback process, topography and vegetation to develop the way they did. These distinct hydrological functions can be associated with different landscape elements or HRUs as defined by HAND and slope, such that each HRU is represented by a model structure best representing its function in the ecosystem (cf. Savenije, 2010).
While HAND-based landscape classification can potentially show a way forward, it does not solve the problem arising when moving from lumped to HRU-guided, semidistributed model formulations: multiple parallel model structures typically result in an increased number of parameters, which, when not adequately constrained, may increase equifinality and thereby limit predictive uncertainty (e.g., Gupta and Sorooshian, 1983;Beven, 2006;Gupta et al., 2008). To better satisfy the contrasting priorities of model complexity and predictive power, new strategies are sought to more efficiently utilize the modelers' understanding of the system, particularly when for constraining the feasible modeland parameter space is scarce (e.g., Gupta et al., 2008;Wagener and Montanari, 2011;Singh and Bárdossy, 2012;Andréassian et al., 2012;Gharari et al., 2013;Hrachowitz et al., 2013a;Razavi and Tolson, 2013).
In contrast to earlier attempts to constrain models using multiple evaluation criteria or a priori information on catchment properties such as land use or soil type (e.g., Koren et al., 2008), this study tests the utility of a different and so far underexploited type of constraint, based on a priori understanding of the system. The concept of topography-driven conceptual modeling involves the identification of HRUs that operate in parallel. Linked to the technique of regularization (e.g., Tikhonov, 1963;Engl et al., 1996), this opens the possibility to impose semi-quantitative, expert knowledge based, relational constraints of catchment behavior on model parameters, similar to what was suggested by Pokhrel et al. (2008) and Yilmaz et al. (2008). In contrast to those studies, the suggested concept introduces relations between parallel HRUs exclusively based on hydrological reasoning to ensure that similar processes between parallel model structures are represented in an internally consistent way, thereby reducing the feasible parameter space. The advantage of this method is that there is only minimal need to quantify the constraints or the prior parameter distributions (e.g., Koren et al., 2000Koren et al., , 2003Kuzmin et al., 2008;Duan et al., 2006), while allowing for a meaningful and potentially more realistic representation of the system in which each model component is, within certain limits, constrained to do what it is designed to do, rather than allowing it to compensate for data and model structural errors.
The objective of this paper is to test the hypothesis that application of model constraints based on expert knowledge (regarding relations between parameters, fluxes and states) to semi-distributed conceptual models defined by a hydrologically meaningful, topography-based, landscape classification system combined can (1) increase model internal consistency and thus the level of process realism as compared to lumped model setups, (2) increase the predictive power compared to lumped model setups and (3) reduce the need for model calibration.

Study area and data
The outlined methodology will be illustrated and tested with a case study using data of the Wark catchment in the Grand Duchy of Luxembourg. The catchment has an area of 82 km 2 with the catchment outlet located downstream of the town of Ettelbrück at the confluence with the Alzette River (49.85 • N, 6.10 • E, Fig. 1). With an annual mean precipitation of 850 mm yr −1 and an annual mean potential evaporation of 650 mm yr −1 the annual mean runoff is approximately 250 mm yr −1 . The geology in the northern part is dominated by schist while the southern part of the catchment is mostly underlain by sandstone and conglomerate. Hillslopes are generally characterized by forest, while plateaus and valley bottoms are mostly used as crop land and pastures, respectively. Drogue et al. (2002) quantified land use in the catchment as 4.3 % urban areas, 52.7 % agricultural land and 42.9 % forest. In addition they reported that 61 % of catchment is covered by permeable soils while the remainder is characterized by lower permeability substrate. The elevation varies between 195 to 532 m, with a mean value of 380 m. The slope of the catchment varies between 0 and 200 %, with a mean value of 17 % (Gharari et al., 2011).
The hydrological data used in this study include discharge measured at the outlet of the Wark catchment, potential evaporation estimated by the Hamon equation (Hamon, 1961) with temperature data measured at Luxembourg airport (Fenicia et al., 2006); and precipitation measured by a tipping bucket rain gauge located at Reichlange. The temporal resolution used in this study is 3 h.

FLEX-TOPO framework
Realizing the potential of "reading the landscape" in a systems approach (cf. Sivapalan et al., 2003), Savenije (2010) argued that due to the co-evolution of topography, soil and vegetation, all of which define the hydrological function of a given location, an efficient, hydrologically meaningful descriptor of topography together with land use could be used to distinguish different HRUs. HAND, which can be loosely interpreted as the hydraulic head at a given location in a catchment, may be such a descriptor as it potentially allows for meaningful landscape classification (e.g., Rennó et al., 2008). It was argued previously (Gharari et al., 2011) that, in central European landscapes, HAND can efficiently distinguish between wetlands, hillslopes and plateaus. These are landscape elements that may also be assumed to fulfill distinct hydrological functions (HRUs) in the study catchment (Savenije, 2010). Wetlands, located at low elevations above streams, are characterized by shallow ground water tables with limited fluctuations. Due to reduced storage capacity between ground water table and soil surface, potentially exacerbated by the relative importance of the capillary fringe, wetlands tend to be saturated, and thus connected, earlier during a rainfall event than the two other landscape elements with arguably higher storage capacity, thus frequently becoming the dominant source of storm flow during comparably dry periods (e.g., Seibert, 2003;McGlynn et al., 2004;Molénat et al., 2005;Blume et al., 2008;Anderson et al., 2010;Kavetski et al., 2011). The dominant runoff process in wetlands can therefore be assumed to be saturation overland flow. In contrast, forested hillslopes, , (e) the classified landscape units wetland, hillslope and plateau using the combined HAND and slope thresholds of 5 m and 11 %, respectively (from Gharari et al., 2011). landscape elements with steeper slopes than the wetlands or plateaus, require a balance between sufficient storage capacity and efficient drainage to develop and maintain the ecosystem (Savenije, 2010). A dual system combining sufficient water storage in the root zone and efficient lateral drainage through preferential flow networks, controlled by a suite of activation thresholds as frequently observed on hillslopes (e.g., Hewlett, 1961;Beven and Germann, 1982;Sidle et al., 2001;Freer et al., 2002;Weiler et al., 2003;McNamara et al., 2005;Tromp-van Meerveld and McDonnell, 2006a, b;Zehe and Sivapalan, 2009;Spence, 2010) can be seen as the dominant mechanism. Finally, plateaus are landforms with low to moderate slopes and comparably deep ground water tables. In absence of significant topographic gradients and due to the potentially increased unsaturated storage capacity, it can be hypothesized that the primary functions of plateaus are sub-surface storage and groundwater recharge (Savenije, 2010). Although plateaus may experience infiltration excess overland flow in specific locations, the topographical gradients may not be sufficient to generate surface runoff connected to the stream network (Liu et al., 2012). In the FLEX-TOPO approach the proportions of the hydrologically distinct landscape units, i.e., HRUs, in a given catchment need to be determined on the basis of topographical and land cover information. Subsequently suitable model structures and parameterizations (read constitutive functions) will be assigned to the different HRUs (Clark et al., 2009;Fenicia et al., 2011;Hrachowitz et al., 2014). The integrated catchment output, i.e., runoff and evaporative fluxes, can then be obtained by combining the computed proportional outputs from the individual HRUs. Note that the three landscape classes tested for suitability in this study, i.e., wetland, hillslope and plateau together with their assumed dominant runoff process are designed for the Wark catchment and are likely to be different for other environmental settings (e.g., Gao et al., 2014).

Landscape classification
As the objective of FLEX-TOPO is to efficiently extract and use hydrologically relevant information from worldwide readily available topographic data, i.e., DEMs, the Height Above the Nearest Drainage (HAND; Rennó et al., 2008;Nobre et al., 2011;Vannametee et al., 2014) is a potentially powerful metric to classify landscapes into HRUs with distinct hydrological function, as discussed above. Testing a suite of HAND-based classification methods Gharari et al. (2011) found that results best matching observed landscape types could be obtained by using HAND together with the local slope. Based on a probabilistic framework to map the desired HRUs which were then compared with in situ observations they obtained a threshold for HAND and slope of approximately 5 m and 11 % for the Wark catchment. Following that, wetlands were defined to be areas with HAND ≤ 5 m. Areas with HAND > 5 m and local slopes > 11 % were classified as hillslopes, while areas with HAND > 5 m and slope ≤ 11 % were defined as plateaus. The HAND and slope map of the study catchment together with the classified landscape entities (wetland, hillslope and plateau) are presented in Fig. 1. The proportion of the individual HRUs wetland, hillslope, and plateau are 15 %, 45 %, and 40 %, respectively.

Model setup
In this study a lumped conceptual model of the Wark catchment, hereafter referred to as FLEX A , is used as similar lumped conceptual models are frequently used in catchment hydrology (e.g., Merz and Blöschl, 2004;Clark et al., 2008;Perrin et al., 2008;Seibert and Beven, 2009;Fenicia et al., 2014). The above discussed concept of FLEX-TOPO (Savenije, 2010) is thereafter tested with a stepwise increased number of parallel landscape units (FLEX B , FLEX C ), thereby increasing the conceptualized process heterogeneity and thus the model complexity. The core of the three model setups is loosely based on the FLEX model (Fenicia et al., 2006(Fenicia et al., , 2008b.

FLEX A
This model setup represents the catchment in a lumped way. The FLEX A model structure consists of four storage elements representing interception, unsaturated, slow (i.e., groundwater) and fast responding reservoirs (i.e., preferential flow and saturation overland flow). A schematic illustration of FLEX A is shown in Fig. 2a. The water balance equations and constitutive functions used are given in Table 2.

Interception reservoir (S I )
The interception reservoir is characterized by its maximum storage capacity (I max [L]). After precipitation (P [L T −1 ]) enters this reservoir the excess precipitation, hereafter referred to as effective precipitation (P e [L T −1 ]), is distributed between the unsaturated (S U ), slow (S S ) and fast (S F ) reservoirs. Interception (I [L T −1 ]) is then dependent on the potential evaporation (E pot [L T −1 ]) and the amount of water stored in the interception reservoir (S I ).

Unsaturated reservoir (S U )
The unsaturated reservoir is characterized by a parameter that loosely reflects the maximum soil moisture capacity in the root zone (S U,max [L]). Part of the effective precipitation (P e ) enters the unsaturated zone according to the coefficient C r , which here is defined by a power function with exponent β [-], reflecting the spatial heterogeneity of thresholds for activating fast lateral flows from S F . This coefficient C r will be 1 when soil moisture (S U ) is lower than a specific percentage of maximum soil moisture capacity (S U,max ) defined by relative soil moisture at field capacity (F C [-]), meaning that the entire incoming effective precipitation (P e ) at a given time step is stored in the unsaturated reservoir (S U ). The soil moisture reservoir feeds the slow reservoir through matrix percolation (R p [L T −1 ]), expressed as a linear relation of the available moisture in the unsaturated zone (S U ) and the maximum percolation capacity (P Per [L T −1 ]). The reverse process, capillary rise (R C ), feeds the unsaturated reservoir from the saturated zone. Capillary rise (R C [L T −1 ]) has an inverse linear relation with the moisture content in the unsaturated zone and is characterized by the maximum capillary rise capacity (C [L T −1 ]). Soil moisture is depleted by plant transpiration. Transpiration is assumed to be moisture constrained when the soil moisture content is lower than a fraction L p [-] of the maximum unsaturated capacity (S U,max ). When the soil moisture content in the unsaturated reservoir is higher than this fraction (L p ) transpiration is assumed to be equal to the available potential evaporation (E pot − I ).

Splitter and transfer functions
The proportion of effective rainfall which is not stored in the unsaturated zone, i.e., 1 − C r , is further regulated by the partitioning coefficient (D [-]), distributing flows between preferential groundwater recharge (R S [L T −1 ]) to S S and water that is routed to the stream by fast lateral processes from S F (e.g., preferential flow or saturation overland flow, R F ). Both fluxes are lagged by rising linear lag functions with parameters N lagf and N lags , respectively (Fenicia et al., 2006).

Fast reservoir (S F )
The fast reservoir is a linear reservoir characterized by reservoir coefficient S F .

Slow reservoir (S S )
The slow reservoir is a linear reservoir characterized by reservoir coefficient S S .

FLEX B
As discussed above, a range of process studies suggested that wetlands can frequently exhibit storage-discharge dynamics that are decoupled from other parts of a catchment, in particular due to their typically reduced storage capacity and closeness to the stream. FLEX B explicitly distinguishes wetlands from the rest of the catchment, the remainder (i.e., hillslopes and plateaus), which is represented in a lumped way, to account for this difference. The FLEX B model setup therefore consists of two parallel model structures which are connected with a common groundwater reservoir (  what has been suggested by Knudsen et al. (1986). One major difference between the two parallel structures is that capillary rise is assumed to be a relevant process only in the wetland, while it is considered negligible in the remainder of the catchment due to the deeper groundwater. Further, since the wetlands are predominantly ex-filtration zones of potentially low permeability, preferential recharge is considered negligible in wetlands. The areal proportions of wetland and the remainder (i.e., hillslope and plateau) of the catchment are 15 % and 85 %, respectively (Gharari et al., 2011).

FLEX C
This model setup offers a complete representation of the three HRUs in the study catchment: wetland, hillslope and plateau (Fig. 2c). The formulation of the wetland module in FLEX C is identical to the one suggested above for FLEX B . The hillslope HRU is represented by a model structure resembling the FLEX A setup. Plateaus are assumed to be dominated by vertical fluxes, while direct lateral movement in the form of Hortonian overland flow is considered negligible compared to those generated from hillslope and wetland HRUs. Therefore the plateau model structure does not account for these fast fluxes. Analogous to FLEX B , the FLEX C setup is characterized by one single groundwater reservoir linking the three dominant HRUs in this catchment. The individual proportions of wetland, hillslope, and plateau are 15 %, 45 %, and 40 %, respectively (Gharari et al., 2011). The proportions of these HRUs are used to compute the total discharge based on the contribution of each landscape unit. The connection between the parallel structures of FLEX B and FLEX C is through the surface drainage network (the stream network) and through the slow (groundwater) reservoir.

Introducing realism constraints in selecting behavioral parameter sets
With increasing process heterogeneity from FLEX A over FLEX B to FLEX C , the respective model complexities and therefore the number of calibration parameters also increase. This, in the absence of sufficient suitable data to efficiently constrain a model, typically leads to a situation where parameters have increased freedom to compensate for errors in data and model structures, as recently reiterated by Gupta et al. (2008). In this study, two fundamentally different types of model constraints were applied to test their value for reducing equifinality in complex model setups, parameter constraints and process constraints.  Breuer et al. (2003).

Parameter constraints
Inequality conditions between parameters of parallel model units, hereafter referred to as parameter constraints, were imposed before each model evaluation run. These a priori constraints ensure that the individual parameter values for the same process in the parallel units, reflect the modeler's perception of the system. For example, it can be argued that the maximum interception capacity (I max ) of a forested HRU needs to be higher than that of a non-forested one. In the absence of more detailed information this does not only allow overlapping prior distributions but it also avoids the need for quantification of the constraints themselves. In the following, a set of parameter constraints imposed on the different model structure are listed. The applicability of each parameter constraint for every model structure is summarized in Table 3. The subscripts w, h, and p indicate parameters for wetland, hillslope, and plateau, respectively.

Interception
Different land cover proportions of individual landscape units, here wetlands, hillslopes and plateaus, can be used to define the relation between interception thresholds (I max ) of these individual units. The land uses are defined as two general classes for this case study: (1) forested areas, (2) cropland and grassland areas. The maximum interception capacity (I max ) for each landscape entity can be estimated from the proportion of land-use classes and their maximum interception capacities, selected from their respective prior 4846 S. Gharari et al.: Increasing model realism reduce the need for calibration distributions as given in Table 1: (2) I max,p = a p I max,forest + b p I max,crop-grassland . (3) The proportions of forested area are indicated with a w , a h and a p for wetland, hillslope, and plateau and are fixed at 42 %, 60 %, and 29 %, respectively. The proportions of cropland and grassland areas are indicated by b w , b h , and b p for wetland, hillslope, and plateau and are fixed at 58 %, 40 %, and 71 %, respectively. Moreover the parameter sets which are selected for maximum interception capacity of forest are expected to be higher than cropland or grassland:

Lag functions
Preferential recharge (R S ) is routed to the slow reservoir by a lag function. Due to a deeper groundwater table on plateaus it can be assumed that the lag time for (R S ) is longer for plateaus than for hillslopes. It can also be assumed that the lag function used for fast reservoir for hillslopes is longer than for wetlands due to the on average higher distance and therefore longer travel times from hillslopes to the stream.

Soil moisture capacity
Wetlands have shallower groundwater tables than the other two landscape entities in this study. Therefore the unsaturated zone of wetland should have a lower maximum soil moisture capacity (S U,max ) than hillslopes and plateaus. Moreover, as hillslopes in the study catchment are predominantly covered with forest, it can, due to the deeper root zone of forests, be expected that the maximum unsaturated soil moisture capacity (S U,max ) in the root zone of hillslopes is deeper than the other two landscape entities.

Reservoir coefficients
The reservoir coefficient of the wetland fast reservoir (K F ) is assumed to be higher than reservoir coefficient of the hillslope fast reservoir as, once connectivity is established, the flow velocities of saturation overland flow in wetlands are assumed to exceed the integrated flow velocities of preferential flow networks (cf. Anderson et al., 2009). Likewise, the reservoir coefficient of the slow reservoir should be lower than both wetland and hillslope fast reservoirs.
The reservoir constraints can be applied to all models while the other constraints can only be applied to FLEX B and FLEX C .

Process constraints
In contrast to the parameters constraints discussed above, which are set a priori, process constraints are applied a posteriori. Only parameters which generate model flux and state dynamics in agreement with the modeler's perception of these dynamics are retained as feasible. Hence, while with the use of parameter constraints there is no need to run the model for rejected parameter sets, here it is necessary to run the model to evaluate it with respect to the process constraints.
Process constraints are defined for dry and wet periods as well as for peak, high, and low flows. Here wet periods were defined to be the months from November to April, while the dry periods in the study catchment occur between May and October. The thresholds for distinguishing between high and low flow were chosen to be 0.05 and 0.2 mm (3 h) −1 respectively for dry and wet periods. Furthermore, events during which discharge increases with a rate of more than 0.2 mm (3 h) −2 are defined as peak flows. Note that in the following the subscripts peak, high and low indicate peak, high, and low flows. The applicability of each process constraint for every model structure is summarized in Table 3.

Transpiration
Transpiration typically exhibits a clear relationship with the normalized difference vegetation index (NDVI, Szilagyi et al., 1998). Therefore the ratios between NDVI values of different landscape units can serve as constraints on modeled transpiration obtained from the individual parallel model components. A rough estimation of the ratio between transpiration from plateau and hillslope can be derived from LAND-SAT 7 images. For this ratio, seven cloud free images were selected (acquisition dates of 20 April 2000, 6 March 2000, 11 September 2000, 18 February 2001, 6 March 2001, 26 March 2001and 29 August 2001. The ratio of transpiration between hillslope and plateau (R trans ) can be estimated by assuming a linear relation (Szilagyi et al., 1998) with slope of α and intercept zero between transpiration and mean NDVI for each landscape unit (µ NDVI ).
Mean (µ Rtrans ) and standard deviation (σ Rtrans ) of the transpiration ratio (R trans ) can be used to estimate acceptable limits of the transpiration ratios for hillslope and plateau. Therefore the annual transpiration can be confined between two values as follows: Based on the mean (µ Rtrans = 1.2) and standard deviation (σ Rtrans = 0.2) of the seven LANDSAT 7 images used, the following process constraint on transpiration from hillslope (T h ) and plateau (T p ) was imposed: Similar constraints can be imposed between transpiration fluxes from wetland, hillslope or plateau; however, the spatial resolution of LANDSAT 7 data with resolution of 30 meters is coarser than the required 20 m DEM resolution for distinguishing wetlands from other landscape entities (Gharari et al., 2011).

Runoff coefficient
The runoff coefficient is a frequently used catchment signature (e.g., Sawicz et al., 2011;Euser et al., 2013) and can be used as a behavioral constraint (e.g., Duan et al., 2006;Winsemius et al., 2009). In this study the runoff coefficients of dry and wet periods as well as the annual runoff coefficient were used. Parameters that result in modeled runoff coefficients that substantially deviate from the observed ones are therefore discarded. In case of absence of suitable runoff data, the long-term mean annual runoff coefficient can be estimated from the regional Budyko curve using for example the Turc-Pike relationship (Turc, 1954;Pike, 1964;Arora, 2002). However, in this study, the runoff coefficients of each individual year and their respective dry and wet periods were used and determined the mean and standard deviation of the runoff coefficients for these periods. Here, as a conservative assumption, the limits are set to three times the standard deviation around the mean runoff coefficient. Note that the runoff coefficient is the only process constraint that is not related to model structure in this study and can therefore also be applied to the lumped FLEX A setup.

Preferential recharge
The slow reservoir can be recharged by both preferential and matrix percolation from the unsaturated reservoirs. Here, hillslopes and plateaus contribute to the slow reservoir by preferential recharge. It can be assumed that in a realistic model setup the long term contribution volume of preferential recharge ratio between hillslope and plateau should not be unrealistically high or low. For example, it can be assumed unrealistic that the ratio is zero or infinity, meaning that one landscape unit is constantly feeding the slow reservoir while another one is not contributing at all. To avoid such a problem, a loose and very conservative constraint was imposed on the ratio of contribution of the two fluxes.

Fast component discharge
During dry periods, hillslopes and plateaus can exhibit significant soil moisture deficits, limiting the amount of fast runoff generated from these landscape elements. In contrast, due to their reduced storage capacity, wetlands are likely to generate fast flows at lower moisture levels, thus dominating event response during dry periods (cf. Beven and Freer, 2001b;Seibert, 2003;Molénat et al., 2005;Anderson et al., 2010;Birkel et al., 2010). It can thus be assumed that for peak flows during dry periods, the fast component of wetlands (Q f,w,dry,peaks ) contributes more to runoff than the fast component of hillslopes (Q f,h,dry,peaks ). In contrast, high flows during wet periods are predominantly generated by fast reaction from hillslopes (Q f,h,wet ; Q f,h,wet,high ) rather than of wetland (Q f,w,wet ; Q f,w,wet,high ). This process constraint is also applied to FLEX B .

Calibration algorithm and objective functions
Based on uniform prior parameter distributions as well as on the parameter and process constraints the model was calibrated using MOSCEM-UA (Vrugt et al., 2003). However penalizing the objective function(s) based on the number of unsatisfied constraints, can lead to non-smooth objective functions that can cause instabilities in the search algorithm and the generation of invalid results. To resolve this issue, a recently developed stepwise search algorithm was used to find parameter sets that satisfy both parameter and process constraints , and these parameter sets were then used as initial sampling parameter sets for MOSCEM-UA (instead of traditional Latin Hypercube sampling). The models were evaluated on the basis of three different objective functions to emphasize different characteristics of the system response: (i) the Nash-Sutcliffe efficiency of the flows (Nash and Sutcliffe, 1970

Model validation and parameter evaluation
To assess the value of incorporating parameter and process constraints in increasingly complex models a four-step procedure was followed.

Evaluating models with constrained but uncalibrated parameter sets
First, all parameter sets that satisfy all the applied constraints were evaluated for their ability to reproduce the observed hydrograph; these parameter sets are referred to as constrained but uncalibrated parameter sets because they were obtained without any calibration to the observed hydrographs. Using these parameter sets, the mean performance of the three constrained but uncalibrated models FLEX A , FLEX B and FLEX C , was evaluated using the three objective functions (E NS , E NS,log , E NS,FDC ). Note that FLEX A , FLEX B and FLEX C have an increasing number of constraints and so this tests both whether the higher complexity models also result in better model performance and how the predictive uncertainty is affected by increased complexity and model realism. To investigate how well the hydrographs match the observed hydrograph, the simulated 95% uncertainty intervals were generated and uncertainty was estimated as the area contained within the 95% uncertainty intervals.
To further study the effect of constraints on the performance and uncertainty of constrained but uncalibrated parameter sets, three benchmark models were considered in which the aforementioned constraints were not applied. This simply means the models can produce any possible output without any restriction on parameters, fluxes and states. However the percentages of each landscape for model FLEX B and FLEX C remains intact.

Evaluating models with constrained and calibrated parameter sets
In the second step, the three models FLEX A , FLEX B and FLEX C were calibrated while being constrained to the parameter space that satisfies all of the imposed parameter and process constraints. Calibration was performed using a multi-objective strategy (E NS , E NS,log , E NS,FDC ), and the obtained Pareto optimal model parameters are referred to as constrained and calibrated. Uncertainty intervals were evaluated based on the constrained and calibrated Pareto members and the uncertainty was estimated on the basis of the area within the uncertainty bands.
Again, the results were compared to the calibrated but unconstrained benchmarks.

Comparison of model performance and uncertainty for constrained but uncalibrated and constrained and calibrated parameter sets
To assess the added value of incorporating constraints in higher complexity models, the performance and uncertainties of the three models FLEX A , FLEX B and FLEX C were compared for both the constrained but uncalibrated and the constrained and calibrated case during calibration and validation periods.

Comparison of modeled hydrograph components for different model structures
One of the main reasons for imposing constraints on model parameters is to ensure realistic internal dynamics. Comparing different fluxes contributing to the modeled hydrograph can provide insights into the performance of imposed constraints on the model. The effect of imposing behavioral constraints on fast and slow components of the three models structures, FLEX A , FLEX B and FLEX C is compared visually. The fast component of the lumped model, FLEX A , is compared with fast components of FLEX B that are wetland and remainder of catchment and fast components of FLEX C which are wetland and hillslope. This visual comparison is based on normalized average contribution of each component for Pareto optimal parameter sets in every time step.

Evaluating the performance of constrained but uncalibrated parameter sets
The median and the 95 % uncertainty intervals of the performance of modeled hydrographs for constrained but uncalibrated parameter sets is presented in Table 4 for the 2002-2005 calibration and 2006-2009 validation periods together with their (unconstrained) benchmarks. The lumped FLEX A model has only one parameter and one process constraint, i.e., the reservoir coefficient and the runoff coefficient, respectively. Hence, this model is free within the limits of this relatively weak condition, resulting in a wide range of feasible parameter sets, many of which cannot adequately reproduce the system response. As a consequence, the overall performance is poor (E NS,median = 0.18, E NS,log,median = 0.05, E NS,FDC,median = 0.39) ( Table 4, Fig. 3). FLEX B , run with the set of constrained but uncalibrated parameters shows a substantial improvement in overall performance (E NS,median = 0.56, E NS,log,median = 0.33, E NS,FDC,median = 0.87) compared to FLEX A , as FLEX B not only allows for more process heterogeneity but, more importantly, it is conditioned with an increased number of constraints.
The additional process heterogeneity and constraints allowed by FLEX C , results in the highest overall performance for all three objective functions (E NS,median = 0.66, E NS,log,median = 0.36, E NS,FDC,median = 0.93) ( Table 4, Fig. 3).
These results clearly illustrate that the imposed relational constraints force the model and its parameters towards a more realistic behavior, which significantly improves model performance. Additionally, the comparison of result of the three models with their unconstrained benchmarks (Table 4) clearly shows that the incorporation of constraints improves the median performance and 95 % uncertainty intervals of all the models by rejecting parameter sets that violate the constraints and cannot reproduce certain aspects of the response patterns. In addition, the comparison between the unconstrained benchmark models themselves suggests that more complex model structures improve the performance, implying that model structures themselves already contain a considerable degree of information even in absence of any constraints or calibration attempts.
The 95 % uncertainty areas mapped by simulated hydrographs indicate that FLEX C , which might be expected to produce the highest uncertainty interval due to its complexity, is providing a lower uncertainty compared to FLEX B . Although FLEX C cannot outperform FLEX A in terms of a narrower uncertainty interval in the validation period, overall performance of this model is better than FLEX A as discussed earlier (Table 4, Fig. 3).
Flipping calibration and validation gave equivalent results, which are, for brevity, provided in Table S1; Fig. S1 in the Supplement.

Evaluating the performance of constrained and calibrated parameter sets
The comparison of the constrained and calibrated model setups shows that all three models setups can reproduce the hydrograph similarly well (Table 5, Fig. 4). FLEX A exhibits a slightly better calibration performance, based on E NS,log,median , compared to the other two model setups. This can partly be attributed to the lower number of parameters which leads, with the same number of samples, to a more exhaustive sampling of the parameter space and a smoother identification of Pareto optimal solutions. In addition, FLEX A has the lowest number of imposed constraints, i.e., only the runoff coefficient and one parameter constraints, compared to FLEX B and FLEX C . This allows the model more freedom in exploiting the parameter space to produce mathematically good fits between observed and modeled system response in the calibration period. For the validation period, arguably more important for model assessment because it provides independent information on model consistency (cf. Klemeš, 1986;Andréassian et al., 2009;Euser et al., 2013) and predictive uncertainty, the performances of the three model setups exhibit quite different patterns (Table 5). The simplest model, the lumped FLEX A , is characterized by performance deterioration from calibration to validation. In contrast, FLEX B and FLEX C exhibit a performance improvement in the validation period. Although the increase in performance is subjected to the nature of the forcing and observed discharge data in calibration and validation period, and formally no meaningful comparison between Nash-Sutcliffe efficiencies for different periods can be made, these results nevertheless indicate that the more complex model structure together with its constraints performs in a more stable manner outside of the calibration period. When flipping the calibration and validation periods the difference between model performance in calibration and validation is not as strong (Table S2). A possible explanation could be that the observed data quality is not informative enough for calibration (period 2002-2005). Constraints then prevent the model to over-fit and thus enable the models to maintain a more reliable performance outside the calibration Table 4. The median model performances (in brackets their corresponding 95 % uncertainty intervals) and the area spanned by the 95 % uncertainty interval of hydrograph derived from uncalibrated parameter sets which satisfy the complete set of constraints for the three model setups FLEX A , FLEX B and FLEX C , for the three modeling objectives (E NS , E NS,log , E NS,FDC ) in the calibration (2002)(2003)(2004)(2005) and validation (2006)(2007)(2008)(2009)   period. In contrast, if the calibration period is informative (period 2006-2009), constraints may not affect performance outside the calibration period that much. However constraints remain necessary to reduce model uncertainty both during calibration and validation. In addition to formal performance and uncertainty measures, it can be seen visually in Fig. 4 that FLEX C can adequately predict the high flows during a dry period, while FLEX A misses most of the peaks.

Comparison of constrained but uncalibrated and constrained and calibrated models
The following comparison of the performances of FLEX A , FLEX B and FLEX C for constrained but uncalibrated, constrained and calibrated and their unconstrained benchmarks is focused on E NS only, for the reason of brevity (Fig. 5, gray box plots indicate the benchmark models). In Fig. 5a and b the model performances based on the constrained but  uncalibrated parameter sets, that satisfy the full set of constraints, are shown for the calibration and validation periods. As discussed in detail above, although uncalibrated, increasing the number of constraints from FLEX A to FLEX C increases the overall performance of the models while reducing uncertainty ( Fig. 5c and d; note that these are zoom-ins). Further, comparison to the uncalibrated benchmarks, suggests that improving the model structure based on landscape units in itself substantially increases the performance of the model. However additional constraints will eventually reduce the uncertainty and improve the performance ( Fig. 5a and b). Figure 5e compares model performance based on constrained and calibrated parameter sets for the calibration period. When comparing the individual model performances of the constrained and calibrated models during the validation period (Fig. 5f), it can be seen that FLEX A not only shows the strongest performance deterioration compared to the calibration period but also that FLEX A is also the model with the poorest performance in the validation period. This implies that although FLEX C is the most complex model, the realism constraints together with landscape related structure imposed on this model generate the most reliable outputs when used for prediction, i.e., in the validation period. When the calibration and validation periods are switched, the performance of FLEX C remains comparable to above, although in this case FLEX A performs best during validation (see Fig. S5). This strongly underlines that the widely accepted notion of complex models necessarily being subject to higher predictive uncertainty is not generally valid when the feasible parameter space can be constrained based on assumptions of realistic functionality of a catchment. As explained earlier, this also indicates that when the data of the calibration period are not sufficiently informative, imposing constraints will force the model to perform better outside the calibration period. In addition, a second crucial aspect was revealed by comparing constrained but uncalibrated and constrained and calibrated models. It can be seen that constrained but uncalibrated FLEX C , shows significant improvement in performance approaching the performance of the calibrated lumped model, FLEX A . Interestingly, it was found that in validation the constrained but uncalibrated FLEX C can, depending on the performance measure used and the information content of the calibration period (i.e., climatic variability and data quality), reach the performance level of the constrained and calibrated FLEX A (Figs. 5 and S5). This highlights the value of semi-and non-quantitative hydrological expert knowledge for finding suitable model parameter sets for ungauged basins.

Comparison of flow contributions from different model components
The comparison of the fluxes generated from the individual model components in the three model setups helps to assess to which degree the model internal dynamics reflect the modeler's perception of the system and thus to a certain degree the realism of the models. Fast and slow responses of each tested model setup have been visually illustrated in Fig. 6. Predominance of slow responses of all the three models are indicated by green color; predominance of fast responses of FLEX A , fast responses of the remainder of the catchment of FLEX B and fast responses of hillslope of FLEX C are indicated by red color; wetland fast responses of FLEX B and FLEX C are indicated by predominance of blue color.
The colors in Fig. 6 are an illustration using three colors (red, green and blue) for the models' responses based on their weight of contribution to the modeled runoff. As it can be seen in Fig. 6a the fast component of FLEX A is dominant just during peak flows and even the recession shortly after peak flows are accounted for mainly by ground water. Analysis of the individual model components computed by Pareto optimal parameter sets (not shown here for brevity), indicates that some Pareto optimal parameters can generate peak flows by predominant contributions from slow responses while fast reaction tends to be inactive during these events.
In accordance with the perception of the system that wetlands are predominantly responsible for peak flows during dry conditions, Fig. 6b and c show that wetland fast responses in FLEX B and FLEX C control the rapid response during wetting up periods (dry to wet transition), before hillslope fast processes become more important at higher moisture levels. When the system is saturated the hillslope contribution to modeled runoff becomes significantly higher compared to the wetland response. Note that the response of the wetland may not correspond well to individual events, as a consequence of the fact that the corresponding constraint was set for an aggregated period.

Wider implications
The results of this study quite clearly indicate that discretizing the catchment into hydrological response units (HRUs) and incorporating expert knowledge in model development and testing is a potentially powerful strategy for runoff prediction, even where insufficient data for model calibration (e.g., Koren et al., 2003;Duan et al., 2006;Winsemius et al., 2009) or only comparatively unreliable regionalization tools are available (e.g., Wagener and Wheater, 2006;Bárdossy, 2007;Parajka et al., 2007;Oudin et al., 2008;Laaha et al., 2014). It was found that the performance and the predictive power of a comparatively complex uncalibrated conceptual model, based on posterior parameter distributions obtained merely from relational, semi-and non-quantitative realism constraints inferred from expert knowledge, can approach or even be as efficient as the calibration of a lumped conceptual model (Figs. 5 and S3).
Typically it is expected that, if not warranted by data, models with higher complexity suffer from higher predictive uncertainty. As stated by Beven (2001): "More complexity means more parameters, more parameters mean more calibration problems, more calibration problems will often mean more uncertainty in the predictions, particularly outside the range of the calibration data." Thus, more parameters would allow better fits of the hydrograph but would not necessarily imply a better and more robust understanding of catchment behavior or more reliable predictions.
A complex model may include many processes, i.e., hypotheses, which can usually not be rigorously tested with the available data. However, a wide range of previous studies has demonstrated that hydrologically meaningful constraints can help to limit the increased uncertainty caused by incorporating additional processes, i.e., parameters (e.g., Yadav et al., 2007;Zhang et al., 2008;Kapangaziwiri et al., 2012). These studies generally include a large set of catchments and try to relate model parameters to catchment characteristics. Although regional constraints are important, the importance of expert knowledge on the catchment scale, which leads to better understanding of hydrological behavior is highlighted in this study.
In a similar attempt, Pokhrel et al. (2008Pokhrel et al. ( , 2012 demonstrated use of regularization for model parameters and reduction of model parameter space dimensionality by linking model parameters using super-parameters to catchment characteristics. However, no explicit hydrological reasoning is typically applied for such "regularization rules" (e.g., Pokhrel et al., 2012). On the other hand, Kumar et al. (2010Kumar et al. ( , 2013a parameterize and successfully regionalize their models using empirical transfer functions with global parameters, developed from extensive literature study and iterative testing in a large sample of catchments In contrast, the use of relational parameter and process constraints, as presented in this study, is based on semi-quantitative, hydrologically explicit and meaningful reasoning avoiding the need for empirical transfer functions to link catchments characteristics and model parameters. Including prior knowledge for parameters of physically based models for estimating runoff in ungauged basins was quite successfully investigated in the past (e.g., Ott and Uhlenbrook, 2004;Vinogradov et al., 2011;Fang et al., 2013;Semenova et al., 2013). These studies specifically indicate that calibration can be replaced by prior information which is a significant contribution to Predictions in Ungauged Basins (PUB). While physically based models need detailed information of catchment behavior for model parameters, the here proposed semi-distributed conceptual modeling framework, exploiting relational constraints, can be more efficiently setup using the least prior information necessary. In this study, the performances and uncertainties of the three tested model setups for constrained but uncalibrated parameters indicate the potential of the presented FLEX-TOPO framework for Predictions in Ungauged Basins (PUB). Hence, this framework can efficiently use expert knowledge for improving model parameter value selection in complex conceptual hydrological models, not only to increase model performance but also to reduce model predictive uncertainty even in the absence of calibration.
It should be kept in mind that the conclusions of this study remain at this point only valid for the study catchment. To generalize the findings of this study more rigorous tests should be set up (Andréassian et al., 2009) which expand this presented concept for different time series of a catchment and also a larger set of catchments such as in recent work of Gao et al. (2014) and Hrachowitz et al. (2014). Some further challenges remain, including the need to formulate generic constraints for any catchment based on available data in an automated procedure. Likewise it will be necessary to develop a better understanding of model sensitivities to different constraints and of the effectiveness and reliability of individual constraints. It is also emphasized that the constraints introduced in this study are based on the authors' subjective understanding of catchment behavior and can and should be discussed further. However, we would like to stress the notion that reaching an agreement on the relations between parameters and fluxes in different landscape units is potentially much easier than finding the most adequate parameter values together with associated uncertainties for a conceptual model based on field observations or available data on geology or soil types.

Conclusions
This study has tested whether a topography-driven semidistributed formulation of a catchment-scale conceptual model, conditioned by expert knowledge based relational parameter and process constraints, can increase the level of process realism and predictive power while reducing the need for calibration.
It was found that 1. The performance of models, although uncalibrated, improves by accounting for different topography-based hydrological response units, even if this introduces additional complexity.
2. Imposing relational parameter and process constraints improves the performance of uncalibrated models and reduces their uncertainty. This illustrates the potential value of the combined use of higher complexity models and relational constraints for prediction in ungauged basins, where no time series are available for model calibration.
3. Due to the reduced feasible parameter space, the search for behavioral parameter sets focuses on the feasible parameter space only.
4. Imposing constraints prevents the model from overfitting on calibration time series and therefore enables the model to more reliably perform outside the calibration period.
The Supplement related to this article is available online at doi:10.5194/hess-18-4839-2014-supplement.