A simple topography-driven and calibration-free runoff generation module

Reading landscapes and developing calibrationfree runoff generation models that adequately reflect land surface heterogeneities remains the focus of much hydrological research. In this study, we report a novel and simple topography-driven runoff generation parameterization – the HAND-based Storage Capacity curve (HSC), which uses a topographic index (HAND, Height Above the Nearest Drainage) to identify hydrological similarity and the extent of saturated areas in catchments. The HSC can be used as a module in any conceptual rainfall–runoff model. Further, coupling the HSC parameterization with the mass curve technique (MCT) to estimate root zone storage capacity (SuMax), we developed a calibration-free runoff generation module, HSC-MCT. The runoff generation modules of HBV and TOPMODEL were used for comparison purposes. The performance of these two modules (HSC and HSC-MCT) was first checked against the data-rich Bruntland Burn (BB) catchment in Scotland, which has a long time series of fieldmapped saturation area extent. We found that HSC, HBV and TOPMODEL all perform well to reproduce the hydrograph, but the HSC module performs better in reproducing saturated area variation, in terms of correlation coefficient and spatial pattern. The HSC and HSC-MCT modules were subsequently tested for 323 MOPEX catchments in the US, with diverse climate, soil, vegetation and geological characteristics. In comparison with HBV and TOPMODEL, the HSC performs better in both calibration and validation, particularly in the catchments with gentle topography, less forest cover, and arid climate. Despite having no calibrated parameters, the HSC-MCT module performed comparably well with calibrated modules, highlighting the robustness of the HSC parameterization to describe the spatial distribution of the root zone storage capacity and the efficiency of the MCT method to estimate SuMax. This novel and calibration-free runoff generation module helps to improve the prediction in ungauged basins and has great potential to be generalized at the global scale.

Abstract. Reading landscapes and developing calibrationfree runoff generation models that adequately reflect land surface heterogeneities remains the focus of much hydrological research. In this study, we report a novel and simple topography-driven runoff generation parameterization -the HAND-based Storage Capacity curve (HSC), which uses a topographic index (HAND, Height Above the Nearest Drainage) to identify hydrological similarity and the extent of saturated areas in catchments. The HSC can be used as a module in any conceptual rainfall-runoff model. Further, coupling the HSC parameterization with the mass curve technique (MCT) to estimate root zone storage capacity (S uMax ), we developed a calibration-free runoff generation module, HSC-MCT. The runoff generation modules of HBV and TOPMODEL were used for comparison purposes. The performance of these two modules (HSC and HSC-MCT) was first checked against the data-rich Bruntland Burn (BB) catchment in Scotland, which has a long time series of fieldmapped saturation area extent. We found that HSC, HBV and TOPMODEL all perform well to reproduce the hydrograph, but the HSC module performs better in reproducing saturated area variation, in terms of correlation coefficient and spatial pattern. The HSC and HSC-MCT modules were subsequently tested for 323 MOPEX catchments in the US, with diverse climate, soil, vegetation and geological characteristics. In comparison with HBV and TOPMODEL, the HSC performs better in both calibration and validation, particularly in the catchments with gentle topography, less forest cover, and arid climate. Despite having no calibrated parameters, the HSC-MCT module performed comparably well with calibrated modules, highlighting the robustness of the HSC parameterization to describe the spatial distribution of the root zone storage capacity and the efficiency of the MCT method to estimate S uMax . This novel and calibration-free runoff generation module helps to improve the prediction in ungauged basins and has great potential to be generalized at the global scale.
ing a simple, calibration-free, but robust runoff generation module has been, and continues to be, an essential pursuit of hydrological modelers. Although we have made tremendous advances in enhancing our ability in prediction in ungauged basins (PUB) (Sivapalan et al., 2003;Blöschl et al., 2013;Hrachowitz et al., 2013), it is not uncommon that models become increasingly complicated in order to capture the details of hydrological processes shown by empirical studies Sivapalan, 2009;Yu et al., 2014). More detailed process conceptualization normally demands higher data requirements than our standard climatological and hydrological networks can provide, leading to more calibrated parameters and a probable increase in model uncertainty (Sivapalan, 2009).
In detailed distributed hydrological bottom-up models, connectivity emerges from the interplay of topography, soil type, and water table depth. For example, TOPMODEL (Beven and Kirkby, 1979;Beven and Freer, 2001) uses the topographic wetness index (TWI) to distinguish hydrologic similarity; and SHE (Abbott et al., 1986) and tRIBS (Ivanov et al., 2004;Vivoni et al., 2005) use partial differential equations to describe the water movement based on pressure gradients obtained by topography; and the representative elementary watershed (REW) approach divides catchments into a number of REWs to build balance and constitutive equations for hydrological simulation (Reggiani et al., 1999;Zhang and Savenije, 2005;Tian et al., 2008). As the relevant model parameters such as local topographic slope and hydraulic conductivity can, in spite of several unresolved issues for example relating to the differences in the observation and modeling scales (e.g., Beven, 1989;Zehe et al., 2014), be obtained from direct observations, they could in principle be applied without calibration.
Zooming out to the macro-scale, top-down models, in contrast, are based on emergent functional relationships that integrate system-internal heterogeneity (Sivapalan, 2005). These functional relationships require parameters that are ef-fective on the modeling scale and that can largely not be directly determined with small-scale field observations (cf. Beven, 1995) and thus traditionally determined by calibration. However, frequently the number of observed variables for model calibration is, if available at all, limited to time series of streamflow. The absence of more variables to constrain models results in such models being ill-posed inverse problems. Equifinality in parameterization and in the choice of parameters then results in considerable model uncertainty (e.g., Beven, 1993Beven, , 2006. To limit this problem and to also allow predictions in the vast majority of ungauged catchments, it is therefore desirable to find ways to directly infer effective model parameters at the modeling scale from readily available data (Hrachowitz et al., 2013).
The component that is central for establishing connectivity in most top-down models is the soil moisture routine. Briefly, it controls the dynamics of water storage and release in the unsaturated root zone and partitions water into evaporative fluxes, groundwater recharge, and fast lateral storm flow generating runoff (Gao et al., 2018a;Shao et al., 2018), the latter of which is critical from the aspect of connectivity. In the majority of regions, Hortonian overland flow (HOF, i.e., infiltration excess overland flow) is of minor importance (Dunne and Black, 1970;Sklash and Farvolden, 1979;Beven, 2004;Burt and McDonnell, 2015), even in arid regions where often most locally generated HOF is re-infiltrated while flowing on hillslopes (Liu et al., 2012;Zhao et al., 2019) and never reaches the stream channel network. Thus the term saturation excess flow (SEF) can represent, depending on the model and the area of application, different processes, such as saturation overland flow, preferential flow, flow through shallow, highpermeability soil layers, or combinations thereof. The interplay between water volumes that are stored and those that are released laterally to the stream via fast, connected flow paths ("connectivity") is in most top-down models described by functions between water stored in the unsaturated root zone ("soil moisture") and the areal proportion of heterogeneous, local storage thresholds that are exceeded and thus "connected" (Zhao et al., 1980). In other words, in those parts of a catchment where the storage threshold is exceeded it will generate lateral flows, and can alternatively be interpreted as a runoff coefficient (e.g., Ponce and Hawkins, 1996;Perrin and Andreassian, 2001;Fenicia et al., 2007;Bergström and Lindström, 2015). Thus the idea goes back to the variable contributing area concept, assuming that only partial areas of a catchment, where soils are saturated and thus storage thresholds are exceeded, contribute to runoff (Hewlett, 1961;Dunne and Black, 1970;Hewlett and Troendle, 1975). Although originally developed for catchments dominated by saturation overland flow, the extension of the concept to subsurface connectivity, illustrating that surface and subsurface connectivity are "two sides of the same coin" (McDonnell, 2013), proved highly valuable for models such as Xinanjiang (Zhao et al., 1980), HBV (Bergström and Forsman, 1973;Bergström and Lindström, 2015), SCS-CN (Ponce and Hawkins, 1996;Bartlett et al., 2016), FLEX (Fenicia et al., 2008), and GR4J (Perrin and Andreassian et al., 2001).
Among these models, connectivity is formulated in a general form as C R = f (S U (t), S uMax , β), where C R is the runoff coefficient; i.e., the proportion of the catchment generating runoff, S U (t), is the catchment water content in the unsaturated root zone at any time t, S uMax is a parameter representing the total storage capacity in the unsaturated root zone, and β is a shape parameter, representing the spatial distribution of heterogeneous storage capacities in the unsaturated root zone. The parameters of these functions are typically calibrated. In spite of being the core component of soil moisture routines in many top-down models, little effort was previously invested to find ways to determine the parameters at the catchment scale directly from available data. An important step towards understanding and quantifying connectivity patterns directly based on observations was recently achieved by intensive experimental work in the Tenderfoot Creek catchments in Montana, US. In their work, Jencso et al. (2009) were able to show that connectivity of individual hillslopes in their headwater catchments is highly related to their respective upslope accumulated areas. Using this close relationship, Smith et al. (2013) successfully developed a simple top-down model with very limited need for calibration, emphasizing the value of "enforcing field-based limits on model parameters" (Smith et al., 2016). Based on hydrological landscape analysis, the FLEX-Topo model (Savenije, 2010) can dramatically reduce the need for calibration (Gharari et al., 2014), and holds considerable potential for spatial model transferability without the need for parameter re-calibration (Gao et al., 2014a;. In a recent development, several studies suggest that S uMax can be robustly and directly inferred from long-term water balance data by the mass curve technique (MCT) (Gao et al., 2014a;de Boer-Euser et al., 2016;Nijzink et al., 2016). The MCT is an engineering method for reservoir design, in which the reservoir size is estimated as a function of accumulated inflow and human water demand. The MCT treats the root zone as a reservoir, and estimates catchment-scale S uMax from measurable hydrometeorological data, without the need for further calibration. This leaves shape parameter β as the only free calibration parameter for soil moisture routines of that form. Topography is often the dominant driver of water movement caused by prevailing hydraulic gradients. More crucially, topography usually provides an integrating indicator for hydrological behavior, since topography is usually closely related to other landscape elements, such as soil vegetation climate and even geology Savenije, 2010;Rempe and Dietrich, 2014;Gao et al., 2014b;Maxwell and Condon, 2016;Gomes, 2016). The Height Above the Nearest Drainage (HAND; Rennó et al., 2008;Nobre et al., 2011;Gharari et al., 2011), which can be computed from readily available digital elevation models (DEMs), could potentially provide first-order estimates of groundwater depth, as there is some experimental evi-dence that with increasing HAND, groundwater depths similarly increase (e.g., Haria and Shand, 2004;Molenat et al., 2005Molenat et al., , 2008Shand et al., 2005;Condon and Maxwell, 2015;Maxwell and Condon, 2016). HAND can be interpreted as a proxy of the hydraulic head and is thus potentially more hydrologically informative than the topographic elevation above sea level (Nobre et al., 2011). Compared with the TWI in TOPMODEL, HAND is an explicit measure of a physical feature linking terrain to water-related potential energy for local drainage (Nobre et al., 2011). More interestingly, topographic structure emerges as a powerful force determining rooting depth under a given climate or within a biome ( Fig. 1), revealed by a global synthesis of 2200 root observations of >1000 species (Fan et al., 2017). This leads us to think from an ecological perspective to use the topographic information as an indicator of root zone spatial distribution without calibrating the β, and coupling it with the MCT method to estimate the S uMax , to eventually create a calibration-free runoff generation module.
In this study we are therefore going to test the hypotheses that (1) HAND can be linked to the spatial distribution of storage capacities and therefore can be used to develop a new runoff generation module (HAND-based Storage Capacity curve, i.e., HSC); (2) the distribution of storage capacities determined by HAND contains different information than the topographic wetness index; (3) the HSC together with water balance-based estimates of S uMax (MCT method) allow the formulation of calibration-free parameterizations of soil moisture routines in top-down models directly based on observations. All these hypotheses will be tested firstly in a small data-rich experimental catchment (the Bruntland Burn catchment in Scotland), and then apply the model to a wide range of larger MOPEX catchments (Model Parameter Estimation Experiment). This paper is structured as follows. In Sect. 2, we describe two of our proposed modules, i.e., HSC and HSC-MCT, and two benchmark models (HBV, TOPMODEL). This section also includes the description of other modules (i.e., interception, evaporation, and routing) in rainfall-runoff modeling, and the methods for model evaluation, calibration, and validation. Section 3 reviews the empirically based knowledge of the Bruntland Burn catchment in Scotland and the hydrometeorological and topographic datasets of MOPEX catchments in the US for model comparison. Sections 4 and 5 present the model comparison results. Section 6 interprets the relation between rainfall-runoff processes and topography, catchment heterogeneity, and simple models, and the implications and limitations of our proposed modules. The conclusions are briefly reviewed in Sect. 7.

Methods
Based on our perceptual model that saturation excess flow (SEF) is the dominant runoff generation mechanism in most cases, we developed the HAND-based Storage Capacity curve (HSC) module. Subsequently, estimating the parameter of root zone storage capacity (S uMax ) by the MCT method without calibration, the HSC-MCT was developed. In order to assess the performance of our proposed modules, two widely used runoff generation modules, i.e., the HBV power function and TOPMODEL module, were set as benchmarks.
Other modules, i.e., interception, evaporation, and routing, are kept with identical structure and parameterization for the four rainfall-runoff models (HBV, TOPMODEL, HSC, and HSC-MCT, whose names are from their runoff generation modules), to independently diagnose the difference among runoff generation modules (Clark et al., 2008(Clark et al., , 2011.

Two benchmark modules HBV power function
The HBV runoff generation module applies an empirical power function to estimate the nonlinear relationship between the runoff coefficient and soil moisture (Bergström and Forsman, 1973;Bergström and Lindström, 2015). The function is written as where A s (-) represents the contributing area, which equals the runoff coefficient of a certain rainfall event; S u (mm) represents the averaged root zone soil moisture; S uMax (mm) is the averaged root zone storage capacity of the studied catchment; β (-) is the parameter determining the shape of the power function. The prior range of β can be from 0.1 to 5. The S u and A s have a linear relation, while β equals 1. And the shape becomes convex while the β is less than 1, and the shape turns to concave while the β is larger than 1. In most situations, S uMax and β are two free parameters, cannot be directly measured at the catchment scale, and need to be calibrated based on observed rainfall-runoff data.

TOPMODEL module
The TOPMODEL assumes topographic information captures the runoff generation heterogeneity at the catchment scale, and the TWI is used as an index to identify rainfall-runoff similarity (Beven and Kirkby, 1979;Sivapalan et al., 1997). Areas with similar TWI values are regarded as possessing equal runoff generation potential. More specifically, the areas with larger TWI values tend to be saturated first and contribute to SEF; but the areas with lower TWI values need more water to reach saturation and generate runoff. The equations are written as follow: where D i (mm) is the local storage deficit below saturation at a specific location (i); D (mm) is the averaged water deficit of the entire catchment (Eq. 2), which equals (S uMax − S u ), as shown in Eq. (3). I TW_i is the local I TW value. I TW is the averaged TWI of the entire catchment. Equation (2) means in a certain soil moisture deficit condition for the entire catchment (D), the soil moisture deficit of a specific location (D i ), is determined by the catchment topography (I TW and I TW_i ), and the root zone storage capacity (S uMax ). Therefore, the areas with D i less than zero are the saturated areas (A s_i ), equal to the contributing areas. The integration of the A s_i areas (A s ), as presented in Eq. (4), is the runoff contributing area, which equals the runoff coefficient of that rainfall event.
(2)-(4) also allow us to obtain the contributing area (A s ) from the estimated relative soil moisture (S u /S uMax ) and then map it back to the original TWI map, which makes it possible to test the simulated contributing area by field measurement. It is worth mentioning that the TOPMODEL in this study is a simplified version, and not identical to the original one, which combines the saturated and unsaturated soil components.

HSC module
In the HSC module, we assume (1) primarily saturation excess flow as the dominant runoff generation mechanism; (2) the local root zone storage capacity has a positive and linear relationship with HAND, from which we can derive the spatial distribution of the root zone storage capacity; (3) rainfall firstly feeds local soil moisture deficit, and no runoff can be generated before local soil moisture being saturated. Figure 2 shows the perceptual HSC module, in which we simplified the complicated 3-D topography of a real catchment into a 2-D simplified hillslope. And then derive the distribution of root zone storage capacity, based on topographic analysis and the second assumption as mentioned in the preceding paragraph. Figure 3 shows the approach to derive the S u -A s relation, which is detailed as follows.
I. Generate HAND map. The HAND map, which represents the relative vertical distance to the nearest river channel, can be generated from a DEM (Gharari et al., 2011). The stream initiation threshold area is a crucial parameter, determining the perennial river channel network (Montgomery and Dietrich, 1989;Hooshyar et al., 2016), and significantly impacting the HAND values. In this study, the start area was chosen as 40 ha for the BB catchment to maintain a close correspondence to the observed stream network. And for the MOPEX catchments, the stream initiation area threshold is set as 500 grid cells (4.05 km 2 ), which fills in the range of stream initiation thresholds reported by others (e.g., Colombo et al., 2007;Moussa, 2008Moussa, , 2009. HAND maps were then calculated from the elevation of each raster cell above the nearest grid cell flagged as a stream cell following the flow direction (Gharari et al., 2011).
II. Generate normalized HAND distribution curve. Firstly, sort the HAND values of grid cells in ascending order. Secondly, divide the sorted HAND values evenly into n bands (e.g., 20 bands in this study), to make sure each HAND band has a similar area. The averaged HAND value of each band is regarded as the HAND value of that band. Thirdly, normalize the HAND bands, and then plot the normalized HAND distribution curve (Fig. 2b).
III. Distribute S uMax to each HAND band (S uMax_i ). As assumed, the normalized storage capacity of each HAND band (S uMax_i ) increases with HAND value (Fig. 2c). Based on this assumption, the unsaturated root zone storage capacity (S uMax ) can be distributed to each HAND band as S uMax_i (Fig. 3a). It is worth noting that S uMax needs to be calibrated in the HSC module, but free of calibration in the HSC-MCT module.
IV. Derive the S u −A s curve. With the number of s saturated HAND bands (Fig. 3a-c), the soil moisture (S u ) can be obtained by Eq. (5); and saturated area proportion (A s ) can be obtained by Eq. (6).
where S uMax_s is the maximum S uMax_i of all the saturated HAND bands. Subsequently, the A s -S u curve can be derived, and is shown in Fig. 3d.
The SEF mechanism assumes that runoff is only generated from saturation areas; therefore, the proportion of the saturation area is equal to the runoff coefficient of that rainfallrunoff event. Based on the S u -A s curve in Fig. 3d, generated runoff can be calculated from root zone moisture (S u ). The HSC module also allows us to map out the fluctuation of saturated areas by the simulated catchment average soil moisture. For each time step, the module can generate the simulated root zone moisture for the entire basin (S u ). Based on the S u -A s relationship ( Fig. 3d), we can map S u back to the saturated area proportion (A s ) and then visualize it in the original HAND map. Based on this conceptual model, we developed the computer program and created a procedural module. The technical roadmap can be found in Fig. 4.

HSC-MCT module
The S uMax is an essential parameter in various hydrological models (e.g., HBV, Xinanjiang, GR4J), which determines the long-term partitioning of rainfall into infiltration and runoff. Gao et al. (2014a) found that S uMax represents the adaption of ecosystems to local climate. Ecosystems may design their S uMax based on the precipitation pattern and their water demand. The storage is neither too small to be mortal in dry seasons, nor too large to consume excessive energy and nutrients. Based on this assumption, we can estimate the S uMax without calibration, by the MCT method, from climatological and vegetation information. More specifically, the average annual plant water demand in the dry season (S R ) is determined by the water balance and the vegetation phenology, i.e., precipitation, runoff, and seasonal NDVI. Subsequently, based on the annual S R , the Gumbel distribution (Gumbel, 1935), frequently used for estimating hydrological extremes, was used to standardize the frequency of drought occurrence. S R20yr , i.e., the root zone storage capacity required to overcome a drought once in 20 years, is used as the proxy for S uMax due to the assumption of a "cost" minimization strategy of plants as we mentioned above (Milly, 1994), and the fact that S R20yr has the best fit with S uMax . The S R20yr of the MOPEX catchments can be found in the map of Gao et al. (2014a).
Eventually, with the MCT approach to estimate S uMax and the HSC curve to represent the root zone storage capacity   spatial distribution, the HSC-MCT runoff generation module is created, without free parameters. It is worth noting that both the HSC-MCT and HSC modules are based on the HAND-derived S u -A s relation, and their distinction lays in the methods to obtain S uMax . So far, the HBV power function module has two free parameters (S uMax , β), while the TOP-MODEL and the HSC both have one free parameter (S uMax ). Ultimately the HSC-MCT has no free parameter.

Interception, evaporation and routing modules
Except for the runoff generation module in the root zone reservoir (S UR ), we need to consider other processes, including interception (S IR ) before the S UR module, evaporation from the S UR and the response routine (S FR and S SR ) after runoff generation from S UR (Fig. 5). Precipitation is firstly intercepted by vegetation canopies. In this study, the interception was estimated by a threshold parameter (S iMax ), set to 2 mm (Gao et al., 2014a), below which all precipitation will be intercepted and evaporated (Eq. 9) (de Groen and Savenije, 2006). For the S UR reservoir, we can either use the HBV beta-function (Eq. 12), the runoff generation module of TOPMODEL (Eqs. 2-4) or the HSC module (Sect. 2.3) to partition precipitation into generated runoff (R u ) and infiltration. The actual evaporation (E a ) from the soil equals the potential evaporation (E p ), if S u /S uMax is above a threshold (C e ), where S u is the soil moisture and S uMax is the catchment-averaged storage capacity. And E a linearly reduces with S u /S uMax , while S u /S uMax is below C e (Eq. 13). The E p can be calculated by the Hargreaves equation (Hargreaves and Samani, 1985), with maximum and minimum daily temperature as input. The generated runoff (R u ) is fur- Figure 5. Model structure and free parameters, involving four runoff generation models (HBV-type, TOPMODEL, HSC, and HSC-MCT). HBV-type has S uMax and beta two free parameters; TOPMODEL and HSC models have S uMax as one free parameter; and the HSC-MCT model does not have a free parameter. In order to simplify calibration process and make fair comparison, the interception storage capacity (S iMax ) was fixed as 2 mm. ther split into two fluxes, including the flux to the fast response reservoir (R f ) and the flux to the slow response reservoir (R s ), by a splitter (D) (Eq. 14). The delayed time from rainfall peak to the flood peak is estimated by a convolution delay function, with a delay time of T lagF (Eqs. 15, 16). Subsequently, the fluxes into two different response reservoirs (S FR and S SR ) were released by two linear equations between discharge and storage (Eqs. 18, 21), representing the fast response flow and the slow response flow mainly from groundwater reservoir. The two discharges (Q f and Q s ) generated the simulated streamflow (Q m ). The model parameters are shown in Table 1, while the equations are given in Table 2. More detailed description of the model structure can be referred to Gao et al. (2014b. It is worth underlining that the only difference among the benchmark HBV type, TOP-MODEL type, HSC, and HSC-MCT models is their runoff generation modules. Eventually, there are 7 free parameters in HBV model, 6 in TOPMODEL and HSC model, and 5 in the HSC-MCT model.

Model evaluation, calibration, validation, and model comparison
Two objective functions were used to evaluate model performance, since multi-objective evaluation is a more robust approach to quantifying model performance with different criteria than a single one. The Kling-Gupta efficiency (Gupta et al., 2009) (I KGE ) was used as the criterion to evaluate model Maximum interception capacity 2 S uMax (mm) * The root zone storage capacity (10, 1000) β (-) * * The shape of the storage capacity curve (0.01, 5) C e (-) Soil moisture threshold for reduction of evaporation (0.1, 1) D (-) Splitter to fast and slow response reservoirs (0, 1) T lagF (d) Lag time from rainfall to peak flow (0, 10) The slow recession coefficient (20, 400) * S uMax is a parameter in HBV, TOPMODEL, and the HSC model, but the HSC-MCT model does not have S uMax as a free parameter; * * β is a parameter in the HBV model, but not in the TOPMODEL, HSC, and HSC-MCT models. Table 2. The water balance and constitutive equations used in models. (Eq. (12) * is used in the HBV model, but is not used in the TOP-MODEL, HSC, and HSC-MCT models).

Reservoirs
Water balance equations Constitutive equations Interception reservoir Unsaturated reservoir Splitter and Lag function Fast reservoir performance and as an objective function for calibration. The equation is written as where r is the linear correlation coefficient between simulation and observation; α (α = σ m /σ o ) is a measure of relative variability in the simulated and observed values, where σ m is the standard deviation of simulated streamflow, and σ o is the standard deviation of observed streamflow; and ε is the ratio between the average value of simulated and observed data. And the I KGL (I KGE of the logarithmic flows) (Fenicia et al., 2007;Gao et al., 2014b) is used to evaluate the model performance in baseflow simulation. A multi-objective parameter optimization algorithm (MOSCEM-UA) (Vrugt et al., 2003) was applied for the calibration. The parameter sets on the Pareto-frontier of the multi-objective optimization were assumed to be the behavioral parameter sets and can equally represent model performance. The averaged hydrograph obtained by all the behavioral parameter sets were regarded as the simulated result of that catchment for further studies. The number of complexes in MOSCEM-UA were set as the number of parameters (seven for HBV, six for TOPMODEL and the HSC model, and five for the HSC-MCT model), and the number of initial samples was set to 210 and a total number of 50 000 model iterations for all the catchment runs. For each catchment, the first half period of data was used for calibration, and the other half was used to do validation.
In module comparison, we defined three categories: if the difference of I KGE of model A and model B in validation is less than 0.1, model A and B are regarded as "equally well". If the I KGE of model A is larger than model B in validation by 0.1 or more, model A is regarded as outperforming model B. If the I KGE of model A is less than model B in validation by −0.1 or less, model B is regarded as outperforming model A.

The Bruntland Burn catchment
The 3.2 km 2 Bruntland Burn catchment (Fig. 6), located in north-eastern Scotland, was used as a benchmark study to test the model's performance based on a rich data base of hydrological measurements. The Bruntland Burn is a typical upland catchment in northwestern Europe (e.g., Birkel et al., 2010), namely a combination of steep and rolling hillslopes and over-widened valley bottoms due to the glacial legacy of this region. The valley bottom areas are covered by deep (in parts >30 m) glacial drift deposits (e.g., till) containing a large amount of stored water superimposed on a relatively impermeable granitic solid geology (Soulsby et al., 2016). Peat soils developed (>1 m deep) in these valley bottom areas, which remain saturated throughout most of the year with a dominant near-surface runoff generation mechanism delivering runoff quickly via micro-topographical flow pathways connected to the stream network (Soulsby et al., 2015). Brown rankers, peaty rankers and peat soils are responsible for a flashy hydrological regime driven by saturation excess overland flow, while humus iron podzols on the hillslopes do not favor near-surface saturation but rather facilitate groundwater recharge through vertical water movement . Land use is dominated by heather moorland, with smaller areas of rough grazing and forestry on the lower hillslopes. Its annual precipitation is 1059 mm, with the summer months (May-August) generally being the driest (Ali et al., 2014). Snow makes up less than 10 % of annual precipitation and melts rapidly below 500 m. The evapotranspiration is around 400 mm per year and annual discharge around 659 mm. The daily precipitation, potential evaporation, and discharge data range from 1 January 2008 to 30 September 2014. The calibration period is from 1 January 2008 to 31 December 2010, and the data from 1 January 2011 to 30 September 2014 is used as validation.
The LiDAR-derived DEM map with 2 m resolution shows elevation ranging from 250 to 539 m (Fig. 6). There are seven saturation area maps (Fig. 7) (2 May, 2 July, 4 August, 3 September, 1 October, 26 November 2008, and 21 January 2009), measured directly by the "squishy boot" method and field mapping by the global positioning system (GPS), to delineate the boundary of saturation areas connected to the stream network (Birkel et al., 2010;Ali et al., 2014). These saturation area maps revealed a dynamic behavior of expanding and contracting areas connected to the stream network that were used as a benchmark test for the HSC module.

MOPEX catchments
The MOPEX dataset was collected for a hydrological model parameter estimation experiment Schaake et al., 2006), containing 438 catchments in the CONUS (Contiguous United States). The longest time series range from 1948 to 2003. 323 catchments were used in this study (see the name list in SI), with areas between 67 and 10 329 km 2 , and excluding the catchments with data records <30 years, impacted by snowmelt or with extreme arid climate (aridity index E p /P >2). In order to analyze the impacts of catchment characteristics on model performance, excluding hydrometeorology data, we also collected the datasets of topography, depth to rock, soil texture, land use, and stream density (Table 3). These characteristics help us to understand in which catchments the HSC performs better or worse than the benchmark models.

Hydrometeorology
The dataset contains the daily precipitation, daily maximum and minimum air temperature, and daily streamflow. The daily streamflow was used to calibrate the free parameters and validate the models.

Topography
The DEM of the CONUS in 90 m resolution was download from the Earth Explorer of United States Geological Survey (USGS, http://earthexplorer.usgs.gov/, last access: 25 April 2017). The HAND and TWI map can be generated from the DEM. The averaged elevation and HAND are used as two catchment characteristics.

Soil texture
In this study, soil texture is synthetically represented by the K factor, since the K factor is a lumped soil erodibility factor which represents the soil profile reaction to soil detachment (Renard et al., 2011). Generally, the soils (high in clay and sand) have low K values, and soils with high silt content have larger K values. The averaged K factor for each catchment was calculated from soil survey information available from USGS (Wolock, 1997).

Land use
Land use data was obtained from National Land Cover Database (NLCD, Wickham et al., 2014). Forest plays an essential role in hydrological processes (Gao et al., 2018a), es-pecially for the runoff generation (Brooks et al., 2010). Forest area proportion was utilized as an integrated indictor to represent the impact of vegetation cover on hydrological processes.

Stream density
Stream density (km km −2 ) is the total length of all the streams and rivers in a drainage basin divided by the total area of the drainage basin. Stream density data was obtained from Horizon Systems Corporation (http://www. horizon-systems.com/nhdplus/, last access: 25 April 2017).

Topography analysis
The generated HAND map, derived also from the DEM, is shown in Fig. 6, with HAND values ranging from 0 to 234 m. Based on the HAND map, we can derive the S u -A s curve (Fig. 8) by analyzing the HAND map with the method in Sect. 2.3. The TWI map of the BB (Fig. 6) was generated from its DEM. Overall, the TWI map, ranging from −0.4 to 23.4, mainly differentiates the valley bottom areas with the highest TWI values from the steeper slopes. This is proba-bly caused by the fine resolution of the DEM map in 2 m, as previous research found that the sensitivity of TWI to DEM resolution . From the TWI map, the frequency distribution function and the accumulative frequency distribution function can be derived (Fig. 8), with one unit of TWI as the interval.

Model performance
It is found that all three models (HBV, TOPMODEL, and HSC) can perform well in reproducing the observed hydrograph (Fig. 9). The I KGE of the three models are all around 0.66 in calibration, which is largely in line with other studies from the BB (Birkel et al., 2010. And the I KGL are 0.76, 0.72, and 0.74 for HSC, HBV, and TOPMODEL, respectively, in calibration, while in validation, the I KGE of the three models are also around 0.66, while I KGL are 0.75, 0.70, and 0.65 for the three models. Since the measured rainfallrunoff time series only lasts from 2008 to 2014, which is too short to estimate the S R20yr (proxy for S uMax ) by the MCT approach (which needs long-term hydro-meteorological observation data), the HSC-MCT model was not applied to this catchment. Figure 8 shows the calibrated power curve by HBV (averaged beta = 0.98) with the S u -A s curve obtained from the HSC module. We found the two curves are largely comparable, especially while the relative soil moisture is low. This result demonstrates that for the BB catchment with glacial drift deposits and combined terrain of steep and rolling hillslopes and over-widened valley bottoms, the HBV power curve can essentially be derived from the S u -A s curve of the HSC module merely by topographic information without calibration.
The normalized relative soil moisture of the three model simulations are presented in Fig. 9. Their temporal fluctuation patterns are comparable. Nevertheless, the simulated soil moisture by TOPMODEL has larger variation, compared with HBV and HSC (Fig. 9).

Contributing area simulation
The observed saturation area and the simulated contributing area from both TOPMODEL and the HSC are shown in Figs. 7, 9, and 10. We found that, although both modules overestimated the saturated areas, they can capture the temporal variation. For example, the smallest saturated area, both observed and simulated, occurred on 2 July 2008, and the largest saturated areas both occurred on 21 January 2009. Comparing the estimated contributing area of TOPMODEL with the HSC module, we found that the results of the HSC correlate better (R 2 = 0.60, I KGE = −3.0) with the observed saturated areas than TOPMODEL (R 2 = 0.50, I KGE = −3.4) (Fig. 10). For spatial patterns, the HSC contributing area is located close to the river network and reflects the spatial pattern of observed saturated area, while TOPMODEL results are more scattered, probably due to the sensitivity of TWI to  DEM resolution (Fig. 7). The HSC is more discriminating in terms of less frequently giving an unrealistic 100 % saturation and retaining unsaturated upper hillslopes.

Topography analysis of the contiguous US and 323 MOPEX catchments
To delineate the TWI map for the CONUS, the depressions of the DEM were firstly filled with a threshold height of 100 m (recommended by Esri). The TWI map of the CONUS is produced (Fig. S1 in the Supplement). Based on the TWI map of the CONUS, we clipped the TWI maps for the 323 MOPEX catchments with their catchment boundaries. And then the TWI frequency distribution and the accumulated frequency distribution of the 323 MOPEX catchments (Fig. S2), with one unit of TWI as an interval, were derived based on the 323 TWI maps. In Fig. 11, it is shown that the regions with large HAND values are located in Rocky Mountains and Appalachian Mountains, while the Great Plains has smaller HAND values. The Great Basin, especially in the Salt Lake Desert, has small HAND values, illustrating its low elevation above the nearest drainage, despite a high elevation above sea level. From the CONUS HAND map, we clipped the HAND maps for the 323 MOPEX catchments with their catchment boundaries. We then plot their HAND-area curves, following the procedures of I and II in Sect. 2.2. Figure 12a shows the normalized HAND profiles of the 323 catchments.
Based on the HAND profiles and the Step III in Sect. 2.2, we derived the normalized storage capacity distribution for all catchments (Fig. 12b). Subsequently, the root zone moisture and saturated area relationship (A s -S u ) can be plotted by the method in Step IV of Sect. 2.2. Lastly, reversing the curve of A s -S u to the S u -A s relation (Fig. 12c), the latter can be implemented to simulate runoff generation by soil moisture. Figure 12c interestingly shows that in some catchments, there is almost no threshold behavior between rainfall and runoff generation, where the catchments are covered by large areas with low HAND values and limited storage capacity. Therefore, when rainfall occurs, wetlands response quickly and generate runoff without a precipitation-discharge threshold relationship characteristic of areas with higher moisture deficits. This is similar to the idea of FLEX-Topo where the storage capacity is distinguished between wetlands and hillslopes, and on wetlands, with low storage capacity, where runoff response to rainfall is almost instantaneous.

Model performance
Overall, the performance of the two benchmark models, i.e., HBV and TOPMODEL, for the MOPEX data (Fig. 13) is comparable with the previous model comparison experiments, conducted with four rainfall-runoff models and four land surface parameterization schemes Kollat et al., 2012;Ye et al., 2014). The median value of I KGE of the HBV type model is 0.61 for calibration in the 323 catchments (Fig. 13), and averaged I KGE in calibration is 0.62. In validation, the median and averaged values of I KGE are kept the same as calibration. The comparable performance of models in calibration and validation demonstrates the robustness of benchmark models and the parameter optimization algorithm (i.e., MOSCEM-UA). The TOPMODEL improves the median value of I KGE from 0.61 (HBV) to 0.67 in calibration, and from 0.61 (HBV) to 0.67 in validation. But the averaged values of I KGE for TOPMODEL are slightly decreased from 0.62 (HBV) to 0.61 in both calibration and validation. The HSC module, by involving the HAND topographic information without calibrating the β parameter, improves the median value of I KGE to 0.68 for calibration and 0.67 for validation. The averaged values of I KGE in both calibration and validation are also increased to 0.65, comparing with HBV (0.62) and TOPMODEL (0.61). Furthermore, Fig. 13 demonstrates that, comparing with the benchmark HBV and TOPMODEL, not only the median and averaged values were improved by the HSC module, but also the 25th and 75th percentiles and the lower whisker end, all have been improved. The performance gains on baseflow (I KGL ) have been investigated and shown in the Fig. S3. These results indicate the HSC module improved model performance to reproduce hydrograph for both peak flow (I KGE ) and baseflow (I KGL ).
Additionally, for the HSC-MCT model, the median I KGE value is improved from 0.61 (HBV) to 0.65 in calibration, and from 0.61 (HBV) to 0.64 in validation, but is not as well performed as TOPMODEL (0.67 for calibration and validation). For the averaged I KGE values, they were slightly reduced from 0.62 (HBV) and 0.61 (TOPMODEL) to 0.59 for calibration and validation. Although the HSC-MCT did not perform as well as the HSC module, considering there is no free parameter to calibrate, the median I KGE value of 0.64 (HBV is 0.61) and averaged I KGE of 0.59 (TOP-MODEL is 0.61) are quite acceptable. In addition, the 25th and 75th percentiles and the lower whisker end of the HSC-MCT model are all improved compared to the HBV model. Moreover, the largely comparable results between the HSC and the HSC-MCT modules demonstrate the feasibility of the MCT method to obtain the S uMax parameter and the potential for HSC-MCT to be implemented in prediction of ungauged basins. Figure 14 shows the spatial comparisons of the HSC and HSC-MCT models with the two benchmark models. We found that the HSC performs "equally well" as HBV (the difference of I KGE in validation ranges −0.1-0.1) in 88 % catchments, and in the remaining 12 % of the catchments the HSC outperforms HBV (the improvement of I KGE in validation is larger than 0.1). In not a single catchment did the calibrated HBV outperform the HSC. Comparing the HSC model with TOPMODEL, we found in 91 % of the catch-  ments that the two models have approximately equal performance. In 8 % of the catchments, the HSC model outperformed TOPMODEL. Only in 1 % of the catchments (two in the Appalachian Mountains and one in the Rocky Mountains in California), TOPMODEL performed better.
In order to further explore the impact of catchment characteristics on model performance, we used topography (averaged HAND, averaged slope, and averaged elevation), soil (K factor), land cover (forest area proportion), climate (aridity index), stream density, and geology (depth to rock) information to test the impact of catchment features on model performance. Table 4 clearly shows that compared with HBV, the 39 catchments with better performance have lower HAND values (37 m), more gentle slopes (4.0 • ), and smaller forest area (22 %); while the elevation, K factor, aridity index, stream density and depth to rock are almost similar. Also, in the catchments where HSC outperformed TOP-MODEL, the catchments have smaller HAND (27 m), more gentle slopes (3.6 • ), moderate elevation (469 m), less forest proportion (14 %), and more arid climate (aridity index is 1.3). TOPMODEL performs better in only three catchments with larger HAND (193 m), steeper slopes (13.5 • ), higher elevation (740 m), more humid climate (aridity index is 0.8), and larger depth to rock (333 cm). In summary, the HSC showed better performance in catchments with gentle topography and more arid climate.
Without calibration of S uMax , as expected, the performance of the HSC-MCT module slightly deteriorates (Fig. 13). In comparison with HBV, the outperformed percentage reduced from 12 % (HSC) to 4 % (HSC-MCT), the approximately equally well-simulated catchments dropped from 88 % to 79 %, and the inferior performance increased from 0 % to 17 %. Also, in comparison with TOPMODEL, the better performance dropped from 8 % (HSC) to 7 % (HSC-MCT), the approximately equal catchments reduced from 91 % to 72 %, and the inferior performance increased from 1 % to 21 %. The inferiority of the HSC-MCT model is probably caused by the uncertainty of the MCT method for different ecosystems which have different survival strategies and use different return periods to bridge critical drought periods. By using ecosystem dependent return periods, this problem could be reduced (Wang-Erlandsson et al., 2016).
To further explore the reason for the better performance of the HSC approach, we selected the 08171000 catchment in Texas (Fig. 14), in which both the HSC module and the HSC-MCT module outperformed the two benchmark modules to reproduce the observed hydrograph (Fig. S4). The HBV model dramatically underestimated the peak flows, with I KGE as 0.54, while TOPMODEL significantly overestimated the peak flows, with I KGE as 0.30. The HSC-MCT model improved the I KGE to 0.71, and the HSC model further enhanced I KGE to 0.74.
Since the modules of interception, evaporation and routing are identical for the four models, the runoff generation  modules are the key to understand the difference in model performance. Figure S5 shows the HBV β curve and the S u -A s curve of the HSC model, as well as the TWI frequency distribution. We found that with a given S u /S uMax , the HBV β function generates less contributing area than the HSC model, which explains the underestimation of the HBV model. In contrast, TOPMODEL has a sharp and steep accumulated TWI frequency curve. In particular, the region with TWI = 8 accounts for 40 % of the catchment area, and over 95 % of the catchment areas are within the TWI ranging from 6 to 12. This indicates that even with low soil moisture content (S u /S uMax ), the contributing area by TOPMODEL is relatively large, leading to the sharply increased peak flows for all rainfall events.

Rainfall-runoff processes and topography
We applied a novel approach to derive the relationship between soil moisture storage and the saturated area from HAND. The areas with relatively low HAND values are saturated earlier than areas with higher HAND values, due to the larger storage capacity in higher HAND locations. The outperformance of the HSC over the benchmark HBV and TOPMODEL in gentle sloping catchments indicates that the HSC module likely has a higher realism than the calibrated HBV beta-function and the TWI of TOPMODEL in these regions. Very interestingly, Fan et al. (2017) presented an ecological observation in the global scale, and revealed the systematic variation of rooting depth along HAND (Fig. 1, in Fan et al., 2017). Since rooting depth can be translated to root zone storage capacity through combination with soil plant-available water (Wang-Erlandsson et al., 2016). This large sample dataset, from an ecological perspective, provides strong support for the assumption of the HSC model on gentle slopes, i.e., the increase in root zone storage capacity with HAND. More interestingly, on excessively drained uplands, rooting depth does not follow the same pattern, with shallow depth and limited to rain infiltration (Fig. 1, in Fan et al., 2017). This could explain the inferior performance of HSC model to TOPMODEL in three MOPEX catchments with excessively drained uplands (larger HAND, steeper slope, higher elevation, and deeper depth to rock), where Hortonian overland flow is likely the dominant mechanism, and the HSC assumption likely does not work well.  This indicates that comparing with TWI, the HAND is closer to catchment realism distinguishing hydrological similarity in gentle topography catchments. The HSC module assumes SEF as the dominant mechanism. But since in a real catchment different runoff generating processes may act simultaneously in different environments (McDonnell, 2013;Hrachowitz and Clark, 2017). Such SEF dominated catchments, or parts thereof, are typically characterized by a subdued relief and thus gently sloping. In steeper catchments, where the groundwater table is deeper and thus more additional water can be stored in the soil, another conceptual parameterization would be appropriate.
The FLEX-Topo model (Savenije, 2010) also uses HAND as a topographic index to distinguish between landscaperelated runoff processes and has both similarity and differences with the HSC model. The results of the HSC model illustrate that the riparian areas are more prone to be saturated, which is consistent with the concept of the FLEX-Topo model. Another important similarity of the two models is their parallel model structure. In both models it is assumed that the upslope area has larger storage capacity, therefore the upper land generates runoff less and later than the lower land. In other words, in most cases, the local storage is saturated due to the local rainfall, instead of flow from upslope. The most obvious difference between the HSC and the FLEX-Topo is the approach towards discretization of a catchment. The FLEX-Topo model classifies a catchment into various landscapes, e.g., wetlands, hillslopes, and plateau. This discretization method requires threshold values to classify landscapes, i.e., threshold values of HAND and slope, which leads to fixed and time-independent proportions of landscapes. The HSC model does not require landscape classification, which reduced the subjectivity in discretization and restricted the model complexity, as well as simultaneously allowing the fluctuation of contributing areas (termed as wetlands in FLEX-Topo).

Catchment heterogeneity and simple models
Catchments exhibit a wide array of heterogeneity and complexity with spatial and temporal variations of landscape characteristics and climate inputs. For example, the Darcy-Richards equation approach is often consistent with pointscale measurements of matrix flow, but not for preferential flow caused by roots, soil fauna, and even cracks and fissures (Beven and Germann, 1982;Zehe and Fluehler, 2001;Weiler and McDonnell, 2007). As a result, field experimentalists continue to characterize and catalog a variety of runoff processes, and hydrological and land surface modelers are developing more and more complicated models to involve the increasingly detailed processes . However, there is still no compelling evidence to support the outperformance of sophisticated "physically based" models in terms of higher equifinality and uncertainty than the simple lumped or semi-distributed conceptual models in rainfallrunoff simulation (Beven, 1989;Orth et al., 2015).
But evidence is mounting that a catchment is not a random assemblage of different heterogeneous parts (Sivapalan, 2009;Zehe et al., 2013), and conceptualizing heterogeneities does not require complex laws (Chase, 1992;Passalacqua et al., 2015). Parsimonious models (e.g., Perrin et al., 2003), with empirical curve shapes, likely result in good model performance. Parameter identifiability in calibration is one of the reasons. However, the physical rationale of these parsimonious models is still largely unknown, lacking a physical explanation to interpret these empirical curves described by mathematical functions (e.g., Eq. 3 in Perrin et al., 2003).
The benefits of the new HSC module are 2-fold. From a technical point of view, the HSC allows us to make prediction in ungauged basins without calibrating the beta parameter in many conceptual hydrological models. Furthermore, the HSC module, from a scientific point of view, provides us with a new perspective on the linkage between the spatial distribution patterns of root zone storage capacity (longterm ecosystem evolution) with associated runoff generation (event-scale rainfall-runoff generation).
Asking questions of "why" rather than "what" likely leads to more useful insights and a new way forward . The HSC module provides us with a rationale from an ecological perspective to understand the linkage and mechanism between large-sample hillslope ecological observations and the curve of root zone storage capacity distribution (Figs. 1, 2, and 3). Catchment is a geomorphological and even an ecological system whose parts are related to each other probably due to catchment self-organization and evolution (Sivapalan and Blöschl, 2015;Savenije and Hrachowitz, 2017). This encourages the hope that simplified concepts may be found adequate to describe and model the operation of the basin runoff generation process. It is clear that topography, with fractal characteristic (Rodriguez-Iturbe and Rinaldo, 1997), is often the dominant driver of runoff, as well as being a good integrated indicator for vegetation cover (Gao et al., 2014b), rooting depth (Fan et al., 2017), root zone evaporation and transpiration deficits (Maxwell and Condon, 2016), soil properties , and even geology (Rempe and Dietrich, 2014;Gomes, 2016). Therefore, we argue that increasingly detailed topographic information is an excellent integrated indicator allowing modelers to continue systematically represent heterogeneities and simultaneously reduce model complexity. The model structure and parameterization of both HSC and TOPMODEL are simple, but not oversimplified, as they capture likely the most dominant factor controlling runoff generation, i.e., the spatial heterogeneity of storage capacity. Hence, this study also sheds light on the possibility of moving beyond heterogeneity and process complexity , to simplify them into a succinct and a priori curve by taking advantage of catchment self-organization probably caused by co-evolution (Wang and Tang, 2014) or the principle of maximum entropy production (Kleidon and Lorenz, 2004).

Implications and limitation
The calibration-free HSC-MCT runoff generation module enhances our ability to predict runoff in ungauged basins. PUB is probably not a major issue in the developed world, with abundant comprehensive measurements in many places, but for the developing world it requires prediction with sparse data and fragmentary knowledge. Topographic information with high spatial resolution is freely available globally, allowing us to implement the HSC model in globalscale studies. In addition, thanks to the recent development, testing, and validation of remote sensing precipitation and evaporation products at large spatial scales (e.g., Anderson et al., 2011;Hu and Jia, 2015;Duan et al., 2019), the S uMax estimation has become possible without in situ hydro-meteorological measurements (Wang-Erlandsson et al., 2016). These widely accessible datasets make the globalscale implementation of the HSC-MCT module promising.
Although the new modules perform well in the BB and the MOPEX catchments, we do not intend to propose that "a model fits all". The assumption of HSC, to some extent, is supported by large-sample ecological field observation (Fan et al., 2017), but it never means that the A s -S u curve of HSC can perfectly fit the other existing curves (e.g., HBV and TOPMODEL). Unifying all model approaches into one framework is the objective of several pioneer works (e.g., Clark, et al., 2008;Fenicia et al., 2011), but is beyond the scope of this study. Moreover, while estimating the runoff coefficient by the A s -S u relation, rainfall in the early time may cause the increase in S u /S uMax and the runoff coefficient (Moore, 1985;Wang, 2018). Therefore, neglecting this influence factor, HBV (Eq. 1), TOPMODEL (Eqs. 2-4), and HSC (Eqs. 5-6) theoretically underestimate the runoff coefficient, which needs to be further investigated.
Finally, we should not ignore the limitations of the new module, although it has better performance and modeling consistency. (1) The threshold area for the initiating a stream was set as a constant value for the entire CONUS, but the variation of this value in different climate, geology and landscape classes (Montgomery and Dietrich, 1989;Helmlinger et al., 1993;Colombo et al., 2007;Moussa, 2008) needs to be future investigated. (2) The discrepancy between observed and simulated saturation area needs to be further investigated, by utilizing more advanced field measurement and simultaneously refining the model assumption. To our understanding, there are two interpretations. Firstly, the overestimation of the HSC model is possibly because two runoff generation mechanisms -SOF and the SSF occur at the same time. However, the saturated area observed by the "squishy boot" method (Ali et al., 2014), probably only distinguished the areas where SOF occurred. Subsurface stormflow, also contributing to runoff, cannot be observed by the "squishy boot" method. Thus, this mismatch between simulation and observation probably leads to this saturated area overestimation. The second interpretation might be the different definition of "saturation". The observed saturated areas are places where 100 % of soil pore volume is filled by water. But the modeled saturation areas are located where soil moisture is above field capacity, and not necessarily 100 % filled with water, which probably also results in the overestimation of saturated areas. Interestingly, in theory the observed saturated area should be within the simulated contributing area, due to the fact that the saturated soil moisture is always larger than field capacity. From this point of view, the observed saturated area is smaller and within the contributing area simulated by HSC, but TOPMODEL missed this important feature.
(3) Only the runoff generation module is calibration free, but the interception and response routines still rely on calibration. Although we kept the interception and response routine modules the same for the four models, the variation of other calibrated parameters (i.e., S iMax , D, K f , K s , T lagF ) may also influence model performance in both calibration and validation. (4) The computational cost of the HSC is more expensive than HBV, and similar to TOPMODEL, due to the cost of preprocessed topographic analysis. But once the S u -A s curve is completed, the computation cost is quite comparable with HBV.

Summary and conclusions
In this study, we developed a simple and calibration-free hydrological module (HAND-based Storage Capacity curve, HSC) based on a relatively new topographic index (HAND), which is not only an excellent physically based indictor for the hydraulic gradient, but also represents the spatial distribution of root zone storage capacity supported by large-sample ecological observations. Based on HAND spatial distribution pattern, the soil moisture (S u ) -saturated area (A s ) relation for each catchment was derived, which was used to estimate the A s of specific rainfall event based on continuous calculation of S u . Subsequently, based on the S u -A s relation, the HSC module was developed. Then, applying the mass curve technique (MCT) approach, we estimated the root zone storage capacity (S uMax ) from observable hydroclimatological and vegetation data, and coupled it with HSC to create the calibration-free HSC-MCT module. The HBV and TOPMODEL were used as two benchmarks to test the performance of HSC and HSC-MCT on both hydrograph simulation and ability to reproduce the contributing area, which was measured for different hydrometeorological conditions in the Bruntland Burn catchment in Scotland. Subsequently, 323 MOPEX catchments in the US were used as a large-sample hydrological study to further validate the effectiveness of our proposed runoff generation modules.
In the BB exploratory study, we found that HSC, HBV, and TOPMODEL performed comparably well in reproducing the observed hydrograph. Comparing the estimated contributing area of TOPMODEL with the HSC module, we found that the HSC module performed better in reproducing saturated area variation, in terms of the correlation coefficient and spatial patterns. This likely indicates that HAND is maybe a better indicator for distinguishing hydrological similarity than TWI.
For the 323 MOPEX catchments, HSC improved the averaged validation value of I KGE from 0.62 (HBV) and 0.61 (TOPMODEL) to 0.65. In 12 % of the MOPEX catchments, the HSC module outperforms HBV, and in not a single catchment did the calibrated HBV outperform the HSC. Comparing with TOPMODEL, the HSC outperformed in 8 % of the catchments, and in only 1 % of catchments TOPMODEL has a better performance. Interestingly, we found that the HSC module showed better performance in the catchments with gentle topography, less forest cover, and larger aridity index. Not surprisingly, the I KGE of the HSC-MCT model was slightly reduced to 0.59, due to the non-calibrated S uMax , but still comparably well performed as HBV (0.62) and TOP-MODEL (0.61). This illustrates the robustness of both the HSC approach to derive the spatial distribution of the root zone storage capacity (β) and the efficiency of the MCT method to estimate the root zone storage capacity (S uMax ).
Data availability. All the sources of data are mentioned in the content.
Author contributions. HG and HHGS designed research; HG performed research; CB, CS, DT and HG provided data, among which the dynamics of the saturation areas data in the BB was provided by CB, CS, and DT; HG analyzed data; CB was involved in the interpretation of some of the modeling work in the BB; HG, MH, and HHGS wrote the paper; CS and DT extensively edited the paper, and provided substantial comments and constructive suggestions for scientific clarification.
Competing interests. The authors declare that they have no conflict of interest.