Understanding dominant controls on streamflow spatial variability to set up a semi-distributed hydrological model: the case study of the Thur catchment

This study documents the development of a semidistributed hydrological model aimed at reflecting the dominant controls on observed streamflow spatial variability. The process is presented through the case study of the Thur catchment (Switzerland, 1702 km2), an alpine and pre5 alpine catchment where streamflow (measured at 10 subcatchments) has different spatial characteristics in terms of amounts, seasonal patterns, and dominance of baseflow. In order to appraise the dominant controls on streamflow spatial variability and build a model that reflects them, we fol10 low a two-stage approach. In a first stage, we identify the main climatic or landscape properties that control the spatial variability of streamflow signatures. This stage is based on correlation analysis, complemented by expert judgement to identify the most plausible cause–effect relationships. In 15 a second stage, the results of the previous analysis are used to develop a set of model experiments aimed at determining an appropriate model representation of the Thur catchment. These experiments confirm that only a hydrological model that accounts for the heterogeneity of precipitation, 20 snow-related processes, and landscape features such as geology produces hydrographs that have signatures similar to the observed ones. This model provides consistent results in space–time validation, which is promising for predictions in ungauged basins. The presented methodology for model 25 building can be transferred to other case studies, since the data used in this work (meteorological variables, streamflow, morphology, and geology maps) are available in numerous regions around the globe.

alpine catchment where streamflow (measured at 10 subcatchments) has different spatial characteristics in terms of amounts, seasonal patterns, and dominance of baseflow. In order to appraise the dominant controls on streamflow spatial variability and build a model that reflects them, we fol-10 low a two-stage approach. In a first stage, we identify the main climatic or landscape properties that control the spatial variability of streamflow signatures. This stage is based on correlation analysis, complemented by expert judgement to identify the most plausible cause-effect relationships. In a second stage, the results of the previous analysis are used to develop a set of model experiments aimed at determining an appropriate model representation of the Thur catchment. These experiments confirm that only a hydrological model that accounts for the heterogeneity of precipitation, 20 snow-related processes, and landscape features such as geology produces hydrographs that have signatures similar to the observed ones. This model provides consistent results in space-time validation, which is promising for predictions in ungauged basins. The presented methodology for model 25 building can be transferred to other case studies, since the data used in this work (meteorological variables, streamflow, morphology, and geology maps) are available in numerous regions around the globe.
1 Introduction 30 Semi-distributed rainfall-runoff models are widely applied in operation for applications such as flood forecasting (e.g. Ajami et al., 2004) or developing sustainable irrigation practices (e.g. McInerney et al., 2018). The main purpose of these models is to simulate streamflow at a limited number of fixed 35 points along river channels (e.g. Boyle et al., 2001), and for this reason they are characterized by a coarser spatial resolution than fully distributed models, which allow a very detailed representation of the spatial variability of catchment processes. Compared to fully distributed models, they are 40 characterized by lower data and computational requirements, which represents a valuable practical advantage in their operational use.
Similarly to the case of lumped models, the parameters of semi-distributed models are estimated via calibration. 45 Therefore, it is important that the structure of these models is commensurate with the available data, including length, timescale, and spatial distribution (Wooldridge et al., 2001). However, semi-distributed models, even when used for sim-ilar applications such as streamflow predictions, differ significantly in terms of their process representation as well as the number of calibration parameters. For example, Gao et al. (2014) assume topography to be a dominant control on hydrological processes, whereas the SWAT model (Arnold et 5 al., 1998) emphasizes the role of soil properties. These differences raise the question of how to select an appropriate model structure for the data at hand, which requires understanding of the association between model parameters and the climatological and geomorphological characteristics of 10 the catchment.
Understanding the relationship between climate, landscape, and catchment response is a common objective of many research areas in hydrology, including comparative hydrology (e.g. Falkenmark and Chapman, 1989), model regionalization (e.g. Parajka et al., 2005), catchment classification (e.g. Wagener et al., 2007), and prediction in ungauged basins (e.g. Hrachowitz et al., 2013). In the case of streamflow, the attempt to explain its spatial variability is typically accomplished either using statistical approaches, which are 20 designed to regionalize selected characteristics of the hydrograph (streamflow signatures), or through hydrological models that account for relevant spatial information. In particular, statistical approaches such as regression analysis (e.g. Berger and Entekhabi, 2001;Bloomfield et al., 2009) and correla-25 tion analysis (e.g. Trancoso et al., 2017), or machine learning techniques like clustering (e.g. Sawicz et al., 2011;Toth, 2013;Kuentz et al., 2017), are used to group together catchments that present similar characteristics and to extrapolate the signatures where unknown. Such approaches have been 30 useful for quantifying the hydrological variability and identifying its principal drivers. However, they are often not designed to discover causality links and can be affected by multicollinearity, which arises when multiple factors are correlated internally and with the target variable (Kroll and Song, 35 2013).
By incorporating spatial information about meteorological forcings and landscape characteristics, semi-distributed hydrological models have the ability to mimic the mechanisms that influence hydrograph spatial variability. However, 40 identifying the relevant mechanisms is challenging. One possibility is to be as inclusive as possible in accounting for all the catchment properties that are, in principle, important in controlling catchment response. However, this approach leads to models that tend to be data demanding and contain 45 many parameters. For example, Gurtz et al. (1999) considered several landscape characteristics (elevation, land use, etc.) in their application of a semi-distributed model to the Thur catchment (Switzerland), which resulted in a model with hundreds of hydrological response units (HRUs) that 50 were defined a priori based on the complexity of the catchment. The other option is to try to identify the most relevant processes and neglect others in order to control model complexity. For example, Fenicia et al. (2016) compared various model hypotheses to determine an appropriate discretization 55 of the catchment in HRUs and appropriate structures for different HRUs. Antonetti et al. (2016) used a map of dominant runoff processes following Scherrer and Naef (2003) for defining HRUs. However, these approaches require a good experimental understanding of the area, which is not always 60 available.
Convincing model calibration-validation strategies are essential to provide confidence that the model ability to fit observations is a reflection of model realism and not a consequence of calibrating an overparameterized model (e.g. An-65 dréassian et al., 2009). A common approach for the calibration of semi-distributed models is the so-called "sequential" approach, where subcatchments are calibrated sequentially from upstream to downstream (e.g. Verbunt et al., 2006;Feyen et al., 2008;Lerat et al., 2012;De Lavenne et al., 70 2016). Although this approach may provide good fits and therefore has its practical utility where data are available, it does not provide understanding of the causes of streamflow spatial variability and results in models that are not spatially transferable. Moreover, such models are prone to con-75 tain many parameters, as each subcatchment would be represented by its own set of parameters. Alternative calibrationvalidation approaches that enable model validation not only in time but also in space are conceptually preferable, particularly when the modelling is used for process understanding or 80 prediction in ungauged locations (e.g. Wagener et al., 2004;Fenicia et al., 2016).
The objective of this study is to develop a semi-distributed hydrological model with the appropriate level of functional complexity to reproduce streamflow spatial variability in the 85 Thur catchment. For this purpose, we use a two-stage approach, the first one dedicated to an in-depth analysis of the available data and the second one focused on hydrological modelling.
Our specific objectives are to (1) explore the spatial vari-90 ability present in the Swiss Thur catchment regarding landscape characteristics, meteorological forcing, and streamflow signatures; (2) identify the main climate and landscape controls that explain the variability of the hydrological response; (3) based on this analysis, build a set of model experiments 95 aimed to test the relative importance of dominant processes and their effect on the hydrograph; and (4) appraise model assumptions against competing alternatives using a stringent validation strategy.
The paper is organized as follows: Sect. 2 presents the 100 study area and gives information about data availability; Sect. 3 illustrates the methodology; Sect. 4 shows the results; Sect. 5 analyses the results and puts them in perspective, showing what other similar studies have found; Sect. 6, finally, summarizes the main conclusions.

Study area
This study is carried out in the Thur catchment ( Fig. 1), located in the north-east of Switzerland, south-west of Lake Constance. With a total length of 127 km and a catchment area of 1702 km 2 , the Thur is the longest Swiss river, with-5 out any natural or artificial reservoir along its course. The Thur River is very dynamic, with streamflow values that can change by 2 CE1 orders of magnitude within a few hours (Schirmer et al., 2014).
The Thur catchment has been the subject of several stud-10 ies in the past: Gurtz et al. (1999) performed the first modelling study on the entire catchment using a semi-distributed hydrological model; Abbaspour et al. (2007) modelled hydrology and water quality using the SWAT model; Fundel et al. (2013) and Jorg-Hess et al. (2015) focused on low flows 15 and droughts; Jasper et al. (2004) investigated the impact of climate change on the natural water budget. Other modelling studies also include Melsen et al. (2014Melsen et al. ( , 2016, who investigated parameter estimation in data-limited scenarios and parameter transferability across spatial and temporal scales, and 20 Brunner et al. (2019), who studied the spatial dependence of floods. The Thur also includes a small-sized experimental subcatchment (Rietholzbach, called Mosnang in this paper after the name of the gauging station) that was the subject of many field studies at the interface between process un-25 derstanding and hydrological modelling (e.g. Menzel, 1996;Gurtz et al., 2003;Seneviratne et al., 2012;von Freyberg et al., 2014von Freyberg et al., , 2015. The topography of the catchment is presented in Fig. 1b; the elevation ranges between 356 m a.s.l. at the outlet and 2502 m a.s.l. at Mount Säntis. The majority of the catchment lies below 1000 m a.s.l. (75 %) and only 0.6 % is above 2000 m a.s.l. (Gurtz et al., 1999). Based on topogra-5 phy (Fig. 1b), the catchment can be visually subdivided into two distinct regions: the northern part, with low elevation and dominated by hills and flat land, and the southern part, which presents a mountainous landscape.
The land use (Fig. 1c) is dominated by pasture and 10 sparsely vegetated soil (60 %) and forest (25 %); urbanized and cultivated areas are located mainly in the north and cover 7 % and 4 % of the catchment respectively. Most of the catchment is underlain by conglomerates, marl incrustations, and sandstone (Gurtz et al., 1999). For the pur-15 pose of this study, the geological formations have been divided into three classes ( Fig. 1d): "consolidated", covering mainly the mountainous part of the catchment, "unconsolidated", located in the north, and "alluvial", located in the proximity of the river network, mainly in the plateau area; 20 the latter formation constitutes the main source of groundwater in the region (Schirmer et al., 2014). The soil depth (Fig. 1e) is shallower in the mountainous part of the catchment and deeper in the northern part.
Based on the availability of gauging stations (Table 1), the 25 catchment was divided into 10 subcatchments (Fig. 1a), with a total drained area that ranges between 3.2 km 2 (Mosnang) and 1702 km 2 (Andelfingen The raw maps (topography, land use, geology, and soil) are obtained from the Federal Office of Topography (swisstopo). The meteorological data are obtained from the Federal Office of Meteorology and Climatology (MeteoSwiss) CE3 . Precipi-40 tation and temperature are interpolated, as done in Melsen et al. (2016), with the WINMET pre-processing tool (Viviroli et al., 2009) using inverse distance weight (IDW) and detrended IDW respectively; while the first method considers only the horizontal variability (related to the distance from 45 the meteorological stations), the second adds a vertical component to the variability related to the elevation (Garen and Marks, 2001). PET data are then obtained, as done in Gurtz et al. (1999), starting from meteorological and land-use data, using the Penman-Monteith equation (Monteith, 1975), im-50 plemented as part of the PREVAH hydrological model (Viviroli et al., 2009). All these values are calculated at pixel (100 m) scale and then averaged over the subcatchments. All the time series are used at daily resolution in the subsequent analyses, aggregating the available hourly data. This choice 55 was influenced on the one hand by the need to limit the computational demand for the model experiments, for which a coarser temporal resolution is preferable, and on the other hand by the need to represent relevant hydrograph dynamics, for which finer temporal resolution is desirable (e.g. Kavetski 60 et al., 2011). A daily data resolution, although it may obscure subdaily process dynamics, appeared to be a good compromise, and it is a typical choice in distributed model applications at such spatial scales (e.g. Kirchner et al., 2004).

65
The methodology follows a two-stage approach. The first stage aims at determining the climatic and landscape controls on streamflow signatures. The second stage uses this understanding to configure the structure of a semi-distributed model, whose functional suitability is tested through a set of 70 model experiments. Section 3.1 describes the first stage of the analysis, that is, the identification of influence factors on the spatial variability of streamflow signatures. Section 3.2 describes the general structure of the semi-distributed model and the model evaluation approach. The design of the model 75 experiments, which is dependent on the outcomes of the first stage of analyses, is presented directly in the results (Sect. 4.2.2).

80
The purpose of the analysis presented in this section is to understand the influence of climatic conditions and landscape characteristics on streamflow. Climatic conditions are represented by precipitation, potential evaporation, and temperature time series. Landscape characteristics are represented by 85 maps of topography, land use, geology, and soil.
Climatic conditions, landscape characteristics, and streamflow are represented through a set of statistics (listed in Table 2). In the following, statistics calculated based on streamflow data will be called streamflow "signatures", as is often done in the catchment classification literature (e.g. Siva-5 palan, 2006). We will refer to climatic and landscape "indices" for statistics calculated based on climatic data and landscape characteristics. A broad list of signatures and indices is presented in Sect. 3.1.1; Sect. 3.1.2 presents the approach for reducing such a list to a set of meaningful vari-  Streamflow signatures (ζ ) and climatic indices (ψ) were obtained using streamflow, precipitation, PET, and temperature time series. The values were calculated using 24 years of data, between 1 September 1981 and 31 August 2005; we 20 considered 1 September to be the beginning of the hydrological year. The periods with gaps in the data (refer to Sect. 2 for details) were discarded from the analysis of the specific subcatchment. Landscape indices were obtained using the maps described in Sect. 2.  Table 3 in Addor et al., 2017). Here, we adopted their selection: while being originally introduced for a study about large-sample hydrology, 30 we believe that the indices proposed are also able to capture several different aspects of the time series and are therefore also suitable for this regional study. The streamflow signatures considered here are described hereafter, followed by an explanation of their rationale: 35 average daily streamflow ζ Q TS2 = q, where q is the streamflow time series and the overbar represents the average over the observation period; runoff ratio ζ RR = q p , where p is the precipitation time series;

40
streamflow elasticity (ζ EL ) defined as where q and p represent the streamflow and precipitation difference between two consecutive years and med is the median function;

45
slope of the flow duration curve (ζ FDC ) defined as the slope between the log-transformed 33rd and 66th streamflow percentiles; baseflow index ζ BFI = q (b) q , where q (b) represents the baseflow and was calculated using a low-pass filter as 50 illustrated in Ladson et al. (2013) with the equations with q (f ) t representing the quickflow. The settings of the filter were taken according to the findings of Ladson 55 et al. (2013) and, in particular, three filter passes were applied (forward, backward, and forward), the parameter ϑ b was chosen to be equal to 0.925, and a reflection of 30 time steps at the beginning and at the end of the time series was used;

60
mean half streamflow date (ζ HFD ) (Court, 1962), defined as the number of days needed in order to have a cumulated streamflow that reaches 50 % of the total annual streamflow; -5th and 95th percentiles of the streamflow (ζ Q5 and ζ Q95 65 respectively); frequency (ζ HQF ) and mean duration (ζ HQD ) of highflow events: they are defined as the days when the streamflow is bigger than 9 CE5 times the median daily streamflow;

70
frequency (ζ LQF ) and mean duration (ζ LQD ) of low-flow events: they are defined as the days when the streamflow is smaller than 0.2 times the mean daily streamflow.
The frequency of days with zero streamflow, present in Addor et al. (2017), was not considered in this study because 75 there are no ephemeral subcatchments in the study area. This group of streamflow signatures is capable of capturing various characteristics of the hydrograph: ζ Q measures the overall water flows, ζ RR represents the proportion of precipitation that becomes streamflow, ζ EL measures 80 the sensitivity of the streamflow to precipitation variations, with a value greater than 1 indicating an elastic subcatchment (i.e. sensitive to change in precipitation) (Sawicz et al., 2011), ζ FDC measures the variability of the hydrograph, with a steeper flow duration curve indicating a more vari-85 able streamflow, ζ BFI measures the magnitude of the baseflow component of the hydrograph and can be considered a proxy for the relative amount of groundwater flow in the hydrograph, ζ HFD measures the streamflow seasonality, ζ Q5 , ζ LQF , and ζ LQD measure low-flow dynamics, and ζ Q95 , ζ HQF , 90 and ζ HQD measure high-flow dynamics.
Climatology was represented through the following indices (see Addor et al., 2017, Table 2): average daily precipitation ψ P = p; Please note the remarks at the end of the manuscript.
Slope of the flow duration curve ζ BFI (-) Baseflow index ζ HDF (day of the year) Mean half streamflow date ζ Q5 (mm d −1 ) 5th percentile of the streamflow ζ Q95 (mm d −1 ) 95th percentile of the streamflow ζ HQF (d yr −1 ) Frequency of high-flow events ζ HQD (d) Mean duration of high-flow events ζ LQF (d yr −1 ) Frequency of low-flow events ζ LQD (d) Mean duration of low-flow events Climatic indices Frequency of high-precipitation events ψ HPD (d) Mean duration of high-precipitation events ψ HDS (-) Season with most high-precipitation events ψ LPF (d yr −1 ) Frequency of low-precipitation events ψ LPD (d) Mean duration of low-precipitation events ψ LPS (-) Season with most low-precipitation events Subcatchment characteristics Average slope ξ TS s (%) Fraction of the subcatchment with steep areas ξ TA s (%) Fraction of the subcatchment facing south ξ TA n (%) Fraction of the subcatchment facing north ξ TA ew (%) Fraction of the subcatchment facing east or west ξ SM (m) Average soil depth ξ SD (%) Fraction of the subcatchment with deep soil ξ LF (%) Fraction of the subcatchment with forest land use ξ LC (%) Fraction of the subcatchment with crop land use ξ LU (%) Fraction of the subcatchment with urbanized land use ξ LP (%) Fraction of the subcatchment with pasture land use ξ GA (%) Fraction of the subcatchment with alluvial geology ξ GC (%) Fraction of the subcatchment with consolidated geology ξ GU (%) Fraction of the subcatchment with unconsolidated geology average daily PET ψ PET = e pot , where e pot is the potential evapotranspiration time series; aridity index ψ AI = e pot p ; fraction of snow (ψ FS ), defined as the volumetric fraction of precipitation falling as snow (i.e. on days colder 5 than 0 • C); frequency (ψ HPF ) and mean duration (ψ HPD ) of highprecipitation events: they are defined as days when the precipitation is more than 5 times the mean daily precipitation;

10
season (ψ HPS ) with most high-precipitation events (defined as above); frequency (ψ LPF ) and mean duration (ψ LPD ) of dry days: they are defined as days when the precipitation is lower than 1 mm d −1 ; season (ψ LPS ) with most dry days (defined as above).
The seasonality of precipitation used in Addor et al. (2017) 5 was not considered in this study as it relied on fitting a sinusoidal function to the precipitation values, which in our case did not produce reliable results. Nevertheless, these climatological indices are able to comprehensively represent the climatic conditions of the subcatchment, with ψ P represent- 10 ing average water input, ψ PET representing average evaporative demand, ψ AI measuring the dryness of the climate, ψ FS measuring the relative importance of snow, ψ HPF , ψ HPD , and ψ HPS measuring the importance of intense precipitation events, and ψ LPF , ψ LPD , and ψ LPS measuring the importance 15 of dry days. The landscape characteristics were divided into four categories: topography, land use, soil, and geology. In order to quantify the characteristics of each category, a set of indices (ξ ) was defined. It is important to notice that all the 20 areas calculated in this analysis were normalized by the respective subcatchment area (ξ A ) in order to get comparable values between subcatchments of different sizes.
Topography was represented with the following indices, calculated based on the digital elevation model:
Land use was represented with the following characteristics, obtained by reclassifying the land-use map into four categories (from 22 original classes): fraction of the subcatchment with crop land use (ξ LC ); 35 fraction of the subcatchment with pasture land use (ξ LP ); fraction of the subcatchment with forest land use (ξ LF ); fraction of the subcatchment with urbanized land use (ξ LU ).

40
Soil type was represented with the following indices, derived by the soil map: fraction of the subcatchment with deep soil (soil depth greater than 2 m) (ξ SD ); average soil depth (ξ SM ). 45 Geology was represented by the following indices, obtained by reclassifying the original map into three categories (from 22 original classes): fraction of the subcatchment with alluvial geology (ξ GA );

50
fraction of the subcatchment with consolidated geology (ξ GC ); fraction of the subcatchment with unconsolidated geology (ξ GU ).
The reclassification of the land use and of the geology maps 55 consisted in aggregating specific classes into general classes (e.g. combining different types of forests into a unique forest class) with the objective of reducing their number, in order to facilitate subsequent analyses. The consideration of topography, land use, soil, and geology for defining landscape indices was based on their potential influence on hydrological processes and, in turn, on the shape of the hydrograph. These landscape characteristics can all play an important role in controlling hydrological processes: land use can, for example, influence the infiltration of water in the substrate; soil thickness can affect the partitioning between water storage and runoff; vegetation is typically assumed to affect evaporation, and geology can affect groundwater dynamics. Indeed, these characteristics are used 10 by many semi-distributed hydrological models, for example for determining parameter values or for dividing the catchment into areas with a homogenous hydrological response (e.g. Gurtz et al., 1999).

climatic indices, and catchment indices
The sets of statistics presented in Sect. 3.1.1 were designed to be comprehensive. However, they may also be redundant, for example by containing metrics that express similar characteristics of the underlying data. In order to facilitate subsequent 20 correlation analyses between the various sets of statistics, it is important to reduce each set to a short list of meaningful variables. The reduction of each set of streamflow signatures, climatic indices, and landscape indices was achieved through the following steps.

25
-All the statistics that did not show sufficient variability between the subcatchments were eliminated. We were in fact interested in identifying causes of spatial variability in the streamflow dynamics of the subcatchments of the Thur. Therefore, statistics that had a low variability were not of interest in this analysis. The variability was assessed using the coefficient of variation (defined by the ratio between the standard deviation and the average) and statistics with a value lower than 5 % were discarded.

35
-All the catchment indices (e.g. a certain type of land use) that account for a limited part of any subcatchment were discarded. This point was motivated by the expectation that landscape characteristics covering a very small fraction of the subcatchment should not have a 40 strong influence on the streamflow signatures considered. Here, landscape indices accounting for less than 5 % of any subcatchment area were discarded.
-Within each set of streamflow signatures, climatic indices, and catchment indices, we retained only relatively 45 independent metrics, if these are believed to represent the same underlying features of the time series. This step was motivated by the need to remove redundant information within each set. The selection of independent metrics was aided by Spearman's rank score be-50 tween each pair of metrics, which represents (also nonlinear) correlation between variables. Pairs of metrics with high absolute values of Spearman's rank score are potentially redundant. In eliminating potentially redundant variables, we adopted the following criteria.

55
-Among highly correlated metrics, we preferred those depending on single variables (e.g. only precipitation or only streamflow) to those containing multiple variables (e.g. combining precipitation and streamflow or evaporation, such as the runoff ratio 60 or the aridity index), as this may be a problem when looking for correlations between metrics. -With respect to landscape indices, in many cases the high correlation is due to the fact that they are complementary (the areal fractions sum up to unity). In 65 such cases, we kept one index per class (e.g. a single index for geology).
-A high correlation between metrics does not always mean that the metrics represent the same information. Therefore, the final selection of relevant metrics within 70 each set was guided by expert judgement.
Based on this process, we compiled a reduced list of signatures, climatic indices, and landscape indices, which was used in subsequent analyses.

on streamflow and consequences for model development
This analysis aimed to identify climatic and landscape indices that mostly control streamflow signatures. In order to identify causality links between indices (ψ and ξ ) and signa-80 tures (ζ ), we proceed as follows: we calculated the correlation between indices and signatures using Spearman's rank score and identified pairs of variables with high correlation; we scrutinized pairs of variables with high correlations 85 using expert judgement to decide whether a causality link between variables is justified.
The outcome of this process will be used to inform the semidistributed model setup. The expert judgement is a critical step in the elicitation of causality from correlation (e.g. An-90 tonetti and Zappa, 2018), and it is clearly subjective, being dependent on personal experience and subject matter knowledge. Although personal and subjective, expert decisions are based on an attempt to interpret data rather than being a priori defined, which is typically the case in the application of 95 semi-distributed hydrological models.

Semi-distributed model setup and model experiments
We assumed a generic structure for a semi-distributed hydrological model, described in Sect. 3.2.1, where some model 100 structure characteristics are defined a priori and others are to be defined. In order to motivate the open decisions, we proceeded as follows: we used the identified causality links to interpret the dominant processes influencing signature spatial vari-5 ability; we designed model experiments aimed to confirm the hypothesized climatic and landscape controls on streamflow spatial variability.
The overall objective of the model experiments is to prove 10 that only models that incorporate the correct dependencies are able to correctly predict regional streamflow variability.
In order to test this assumption, the model experiments will include cases where the assumed dependencies are not incorporated. Omitting an assumed dependency leads to a struc-15 turally simpler model, which may raise doubt that potential differences in model performance might be due to differences in model complexity. For this reason, the model experiments will include cases where alternative dependencies are incorporated, which do not reduce model complexity. In order to 20 keep the study and presentation tractable, the model experiments will be limited to a few cases, illustrated in Sect. 4.2.1, which we judge relevant for this specific application. Each HRU has a unique parameterization. However, depending on how the inputs are discretized, the same HRU can have 55 different states in different parts of the catchment. Therefore, the same HRU needs its own model representation whenever the spatial variability of states needs to be considered. For example, if the inputs are discretized per subcatchment, the same HRU needs a separate model representation in each 60 subcatchment where it is present. For more details about our model implementation of HRUs, refer to Fig. 4 of Fenicia et al. (2016).
In order to limit the levels of decisions of the semidistributed models, some of the aspects of the distributed 65 models are fixed a priori, and others are left open. In particular, the structure chosen to represent the various HRUs is kept fixed. That is, differences between HRUs will be reflected only through the parameter values.

70
-The definition of HRUs is left open. In particular, we do not a priori specify which approach is used to discretize the landscape.
-The spatial discretization of the model inputs is left open. Hence, we do not decide in advance which spa-75 tial discretization of the inputs is most appropriate.
Only the fixed decision about the HRU model structure is described here, whereas the open decisions are described in the results section (Sect. 4.2.2). The spatial organization of the model structure is represented in Fig. 6, with the 80 equations listed in Appendix A. The structure includes a snow reservoir (WR), with inputs distributed per subcatchments. Snowmelt and rainfall are input to an unsaturated reservoir (UR), which determines the portion of precipitation that produces runoff. This flux is split through a fast 85 reservoir (FR), designed to represent the peaks of the hydrograph, proceeded by a lag function to offset the hydrograph, and a slow reservoir (SR), designed to represent baseflow. This structure was chosen to be parsimonious while general enough to reproduce typical hydrograph behaviour; 90 it was tested in previous applications (e.g. van Esse et al., 2013;Fenicia et al., 2014Fenicia et al., , 2016, demonstrating its suitability for reproducing a wide range of catchment responses. It also resembles popular conceptual hydrological models such as HBV (Lindstrom et al., 1997) and HyMod (Wagener et 95 al., 2001), which have been shown to have wide applicability. The model was built using the SUPERFLEX modelling framework .
where z[y; λ] represents the Box-Cox transformation (Box and Cox, 1964) with parameter λ, which is used to account for heteroscedasticity (stabilize the variance). For λ = 0, The residual error term is assumed to follow a Gaussian distribution with zero mean and variance σ 2 : The error model has, therefore, two parameters (λ and σ 2 ); the first one was fixed to 0 where T represents the length of the time series, f N is the Gaussian probability density function (PDF) and z (q obs |θ E ) is the derivative of z(q obs , θ E ) with respect to q evaluated at the observed data q obs . Specifying Eq. (5) for the case where 30 z(q obs ; θ E ) is defined by Eq. (5), the expression of the likelihood function becomes Equation (8) represents the likelihood function that is then used, together with a uniform prior distribution, to calibrate 35 the parameters of the model as described in Sect. 3.2.3.

Calibration
Parameter calibration is performed with the objective of maximizing their posterior density. According to the Bayes equation, the posterior distribution of model parameters is ex-40 pressed as the product between the prior distribution and the likelihood function; since a uniform prior is used for the parameters, this is equivalent to maximizing the likelihood function in the defined parameter space; the optimization procedure is performed with a multi-start quasi-Newton 45 method  with 20 independent searches. We empirically established that with models of our complexity (about 10 parameters), 20 independent searches provide good confidence that a global optimum will be found.
The evaluation of the model ability to reproduce stream-50 flow is carried out in space-time validation (see also Fenicia et al., 2016). For this purpose, the time domain is divided into two periods of 12 years each (from 1 September 1981 to 1 September 1993 and from 1 September 1993 to 1 September 2005) and the subcatchments are split into two groups 55 (A and B), according to a spatial alternation (subcatchment in group A flows into a subcatchment in group B that flows into one in group A and so on); the subcatchments belonging to group A are Andelfingen, Herisau, Jonschwil, St. Gallen, and Wängi and the ones in group B are Appenzell, Frauen-60 feld, Halden, Mogelsberg, and Mosnang. This method implies a division of the space-time domain into four quadrants, such that the model can be calibrated in one quadrant and validated in the other three. For space-time validation, the model is calibrated using each group of subcatchment 65 and period and validated using the other group of subcatchment and period. That is, the model calibrated using group A and period 1 was validated using group B and period 2, and so on for the other three combinations of subcatchments and groups. The model output in the four space-time validation 70 periods is then combined to calculate model performance using various indicators (see Sect. 3.2.4). Results are presented for space-time validation, which represents the most challenging test of model performance.

Performance assessment 75
Model performance is assessed using the following metrics.
1. Time series metrics, which evaluate the ability to reproduce streamflow time series. The metrics used for this assessment are the following.
-Normalized log likelihood CE6 (F LL ), that is, the 80 logarithm of Eq. (8) normalized by the number of time steps present in the time series. This metric corresponds to the objective function used for model optimization. It can be observed that, since λ is fixed at 0.5 in the Box-Cox transformation, 85 model calibration is equivalent to maximizing the Nash-Sutcliffe efficiency (F NS ) calculated with the square root of the streamflow. F LL is not bounded, but a higher value means a better match between two time series since, in this case, the absolute 90 value of the residual is smaller and, thus, their PDF higher.
-Nash-Sutcliffe efficiency: which is often used in hydrological applications and provides a sense of the general quality of the simulations. F NS is bounded between −∞ and 1, with 5 1 meaning a perfect match.
2. Signature metrics, which determine the ability to reproduce the streamflow signatures (ζ ) selected using the procedure illustrated in Sect. 3.1.2. The agreement between simulated and observed signatures is assessed us-10 ing two metrics: Spearman's rank correlation (r) and the normalized root mean square error: TS3 While r assesses how well the simulated signatures can be described using a monotonic function,

15
F RMSE imposes a more stringent requirement, as it assesses how well the simulated and observed signatures line up on the diagonal line.  Ta-ble 3 together with the coefficient of variation. All the signatures have a coefficient of variability bigger than the thresh-40 old value of 5 %, with the most variable signature being ζ LQF (71 %) and the least variable ζ HQD (6 %). Therefore, none of these signatures was discarded. Figure 2 shows the correlations between the streamflow signatures: the lower triangle contains Spearman's rank cor-45 relation and the upper triangle the p value CE7 associated with the correlations. Based on correlations and on its interpretation, a subset of ζ can be defined as follows.
ζ Q , ζ RR and ζ EL are strongly correlated (r > 0.72). We retained ζ Q and discarded ζ RR and ζ EL because both 50 contain climatic information (precipitation) in their definition.
ζ BFI and ζ FDC are strongly correlated (r = −0.77). We decided to retain ζ BFI as it is of easier interpretation (it is a proxy for the importance of groundwater flow, 55 which is a potentially important process for the subsequent model development).
ζ HFD was kept because it measures the seasonality of the streamflow. Note that ζ HFD is strongly correlated with ζ Q (r = 0.88). However, they reflect differ-60 ent properties of the hydrograph. In particular, ζ HFD can be an useful indicator of the effect of snow-related processes.
ζ Q5 and ζ HQD were retained because they have low correlation (r < 0.71) with the other selected signatures 65 and because the first represents low flows and the second high flows; ζ Q95 , ζ HQF , ζ LQD , and ζ LQF were discarded because they all show correlations with the selected signatures.
In summary, the original set of streamflow signatures was 70 reduced to a set of five meaningful signatures, which will be used in the subsequent analyses: average daily streamflow (ζ Q ), baseflow index (ζ BFI ), half streamflow period (ζ HFD ), 5th percentiles of the streamflow (ζ Q5 ), and duration of high-flow events (ζ HQD ).

75
In terms of climatic indices, Table 4 shows their values together with the coefficient of variation. It can be seen that there are some indices that show very little or no variation at all and, therefore, they could already be excluded from the subsequent correlation analysis; they are ψ HPD (1 %), ψ HPS 80 (0 %), ψ LPF (4 %), ψ LPD (3 %), and ψ LPS (0 %). Figure 3 shows the correlation between the remaining indices. It can be observed that they all have strong internal correlation (r > 0.71). For this reason it was decided to retain only ψ P and ψ FS , as they have lower correlation. The 85 former represents an important term of the water budget, and the latter captures snow dynamics. Table 5 shows the values of the catchment characteristics considered in this study. All of them have a coefficient of  Table 2.  variation larger than the minimum threshold of 5 %. Therefore, none of them was excluded based on this criterion. The second criterion for the pre-exclusion of the catchments characteristics, consisting in removing ξ occupying less than 5 % of the subcatchments, led to the suppression of ξ LC (which 5 occupies 4 % of the subcatchment). Figure 4 shows the correlations between catchment characteristics; in many cases the high correlation is due to the fact that many indices are complementary (e.g. different types of geology). The following ξ were selected (one index 10 per class): ξ A because it is low correlated with the other features; ξ TE and ξ TA s in representation of the topography; ξ LF for the land use; ξ SD representing the soil characteristics; 15 ξ GC for the geology.
In summary, the original set of catchment indices was reduced to a set of six indices.

Selection of controlling factors on streamflow signatures
20 Figure 5 reports the results of Spearman's correlation between climatic indices plus catchment characteristics and streamflow signatures. Panel (a) contains Spearman's rank coefficients and panel (b) shows p values associated with them.

25
The following results can be noted.
-The three statistics average precipitation (ψ P ), fraction of snow (ψ FS ), and average elevation (ξ TE ) correlate strongly with average streamflow (ζ Q ) and seasonality (ζ HFD ) (r > 0.64 and p value < 0.05). This cor-30 relation can be interpreted as follows: subcatchments with high elevation (ξ TE ) tend to have higher precipitation (ψ P ) due to orographic effects, which leads to higher streamflow (ζ Q ). They also tend to have more snow (ψ FS ) due to lower temperatures, which influences 35 the seasonality (ζ HFD ).
-There are then some catchment characteristics that have no correlation (r < 0.45) with the streamflow signatures (catchment area (ξ A ) and land use (ξ LF )) or limited correlation (aspect (ξ TA s ) and deep soil (ξ SD ), with 40 r < 0.64).
-The consolidated geology (ξ GC ) presents a strong correlation (r = −0.87) only with the baseflow index (ζ BFI ); that is not captured by the other indices.
-The streamflow signatures of low and high flows (ζ Q5 45 and ζ HQD ) cannot be explained by any index, with little correlation only with ψ P and ξ TE (r < 0.60) that is not sufficient to reach a p value lower than 0.05.
These results are the premise for designing meaningful model experiments.

Hypotheses for model building
This section interprets the results found in Sect. 4.1.2 and formulates some hypotheses regarding the hydrological functioning of the catchment (Sect. 4.2.1). Section 4.2.2, then, presents the model alternatives designed for testing those hy-55 potheses.

Hypotheses on catchment functioning
The results of the correlation analysis can be interpreted to formulate the following hypotheses regarding the drivers of streamflow variability. 60 1. The precipitation is the first driver of the differences in the water balance of the subcatchments. The effect of topographic variability manifests itself primarily as an influence on precipitation (amount and type). Accounting for variability of precipitation therefore im-65 plicitly reflects such effect of topography on the hydrograph, since some inputs were interpolated taking into account the effect of the elevation (Sect. 2). Other phenomena potentially altering the water balance (e.g. regional groundwater flow) do not have a significant role 70 and should not be considered.
2. Snow-related processes (e.g. amount of snow, timing of snowmelt) control differences in streamflow seasonality between subcatchments. Hence, the model needs to account for snow-related processes and their spatial variability.
3. Geology exerts an important control on the partitioning between quickflow and baseflow. Hence, the model 5 should distinguish the different response behaviours of distinct geological areas.
4. The other catchment characteristics (e.g. soil, vegetation) show little or no correlation CE8 with the streamflow signatures, and therefore they should not be con-10 sidered if the idea is to keep the model as simple as possible.
The streamflow signatures ζ Q5 and ζ HQD , which have been selected as part of the analysis shown in Sect. 4.1.1, do not manifest a strong correlation with any of the indices (r is al-15 ways less than 0.60), meaning that the identification of their potential controls is not obvious with the chosen approach. Hence, we have not been able to build model hypotheses that specifically target those signatures. As a result, we expect that the chosen models will not excel and will perform 20 similarly in reproducing these signatures. The model comparisons used to test the four hypotheses listed above are described in Sect. 4.2.2.

Modelling experiments for testing the hypotheses
Using the model structure described in Sect. 3.2.1, four 25 model configurations were compared by varying the number and the definition of the HRUs, and changing the structure of the HRUs (Fig. 6). The objective of the experiments was to test the hypotheses 1-4 in Sect. 4.2.1 using semi-distributed hydrological models. 30 For all the models, the meteorological inputs (precipitation, PET, temperature) are aggregated at the subcatchment scale. Based on the first hypothesis, we assume that this discretization is sufficient to capture the regional difference in water balance between subcatchments. This hypothesis is 35 tested with model M0, with uniform parameters in the catchment (i.e. a single HRU) and distributed precipitation input. This model does not consider snow processes. We expect that this model will be able to reproduce differences in streamflow averages between subcatchments.

40
The second hypothesis (snow controls seasonality) is tested with model M1. Relative to M0, M1 accounts for snow processes, represented by a simple degree-day snow module (see , with inputs (temperature) distributed per subcatchment. We expect that this model will 45 be able to reproduce differences in streamflow seasonality between subcatchments.
The third hypothesis (geology controls baseflow) is tested with model M2. Relative to M1, M2 considers two HRUs, defined based on geology type. One HRU contains the areas 50 with consolidated geology, while the other HRU contains the rest of the catchment (unconsolidated and alluvial geology together). We expect that M2 will be able to reproduce differences in the baseflow index between subcatchments.
The fourth hypothesis (other catchment characteristics 55 should not be considered if the idea is to keep the model as simple as possible) is exemplified by model M3. M3 is analogous to M2 in terms of complexity, but the HRUs are based on catchment characteristics that did not show correlation with the streamflow signatures. Among those charac-60  Table 2. teristics, we have selected land use and considered an HRU based on forest and crops and the second one that occupies the rest of the catchment. This model is as complex as M2 (therefore it is more complex than M1); hence it has the same dimensions of flexibility to fit the data. However, since the 5 structure of this model does not incorporate the cause-effect relationships derived from the signature analysis, we expect that its predictive performance will be poorer than M2.
The total number of the calibrated parameters depends on the number of HRUs and on the structure used to represent 10 them: it was 8 for M0, 9 in M1, and 13 in M2 and M3, of which 5 parameters are common to all the HRUs (Fig. 6 and Table A1); these parameters are C e that governs the evapotranspiration, t OL rise and t IL rise that control the routing in the river network, k WR that regulates the outflow of the snow reser-15 voir, and S UR max that determines the behaviour of the unsaturated reservoir.

Modelling results
The models presented in Sect. 4.2.2 are evaluated in terms of hydrograph metrics (Sect. 4.3.1) and signature metrics 20 (Sect. 4.3.2). Figure 7a shows the values of the likelihood function (corresponding to the calibration objective function) for the four 25 models in calibration and validation. It can be observed that M0 is, by far, the worst model, with the lowest value  Table 2. of the likelihood function. Regarding the other three models, it can be seen that, during calibration, M1, which has the lowest number of calibration parameters, has the lowest performance, whereas M2 and M3 have higher and similar likelihood values. This behaviour persists in time validation, 5 with M2 and M3 that outperform M1. In space and spacetime validation, however, M3 has the lowest likelihood value of the three, whereas M1 and M2 limit their decrease in performance, ranking respectively second and first in terms of optimal likelihood value. 10 The likelihood function represents an aggregate metric of model performance; in order to get a sense of appreciation of model fit on individual subcatchments, Fig. 7b reports the values of Nash-Sutcliffe efficiency in space-time validation for each of the subcatchments. On average, M2 has the It can also be observed that M2 is generally better than M1, with F NS values that are higher or approximately equal except for subcatchments Andelfingen and Halden, where the F NS is slightly worse (however still higher than 0.80). 30 M3 is clearly better than M1 in Andelfingen, Frauenfeld, and Wängi, and clearly worse in Herisau and Mosnang. In particular, in Mosnang (the smallest basin), M3 reaches the worst performance of all the models on all the subcatchments.

Model performance in terms of hydrograph metrics
Regarding M0, it is interesting to observe that it has the 35 worst performance (among all the subcatchments) in Appenzell, which is the subcatchment that is most affected by snow (ψ FS = 0.21), while it reaches a performance similar to M1 in Frauenfeld and Wängi, which are two subcatchments with almost no snow.  Figure 8 compares the ability of M0 and M1 to capture the signatures representing average streamflow (ζ Q ) and seasonality (ζ HFD ). The analysis is presented for space-time validation and, for ζ HFD , focuses only on the four subcatchments 45 that are most affected by the snow (ψ FS > 0.10), to emphasize the differences between the results of the two models. Each colour represents a different subcatchment and each dot a year; the red dashed line has a 45 • slope and represents where the dots should align in case of perfect simulation re-50 sults. The normalized root mean square error and Spearman's rank score are also reported. It is important to stress that the models have not been calibrated using any of the signatures as an objective function, which therefore represent independent evaluation metrics.

55
It can be observed that M0 represents ζ Q equally well as M1, with almost no difference between the two models (r is 0.95 in both cases, whereas F RMSE is 0.11 for M0 and 0.10 for M1). Focusing on the ability to capture ζ HFD , it can be seen that the points corresponding to M0 all lie in the 60 upper-left part of the plot, meaning that this model underestimates the signature values. With respect to M1, instead, the points are more aligned around the diagonal. This difference in performance is captured by the values of F RMSE (0.13 for M0 and 0.07 for M1) and of r (0.66 for M0 and 65 0.85 for M1). Figure 9 compares the observed and simulated signatures for the other three models (M1, M2, and M3). All of them are equally good in representing ζ Q (F RMSE is 0.10, 0.10, and 0.11, and r is 0.95, 0.96, and 0.95 for M1, M2, and M3 70 respectively) and ζ HFD (F RMSE is 0.07, 0.07, and 0.05 and r is 0.85, 0.84, and 0.87 for M1, M2, and M3 respectively). In all cases the cloud of points appears to be aligned to the di- Figure 6. Spatial organization of the model structure: the catchment is divided into subcatchments (black lines), based on the location of the gauging stations, and HRUs (background colour), based on the catchment characteristics. All the HRUs have the same structure, but each HRU has its own parameterization except for some shared parameters. In the case of a single-HRU model (i.e. M0 and M1), the model maintains the subdivision into subcatchments but loses the subdivision into multiple HRUs. agonal, meaning that the three models are able to capture the values of the signatures each year. Moreover, there is no sensible difference in the various models in representing those signatures.
The performance of all the models decreases for ζ Q5 5 where the models have a similar performance, with F RMSE equal to 0.32, 0.28, and 0.33, and r equal to 0.62, 0.66, and 0.61 for M1, M2, and M3 respectively. The points are still aligned along the diagonal but are quite dispersed, especially if compared with ζ Q and ζ HFD , meaning that the 10 models capture the general tendency but have deficiencies in capturing the inter-annual variability.
In terms of ζ BFI , M2 performs clearly better than the other models. It is the only model that is able to represent this signature, with F RMSE = 0.07, r = 0.83, and the points that 15 align compactly with the diagonal. The other two models have a lower performance (F RMSE equal to 0.11 and 0.10, and r equal to 0.31 and 0.52 for M1 and M3 respectively), with points that are quite dispersed and align almost vertically, implying that the simulated values have a range of variability 20 that is definitely smaller than the observed data. Figure 10 shows the comparison between observed and simulated ζ HQD ; since this signature requires a long time window to be computed, it is not calculated year by year (as done with the other signatures) but as an aggregated value 25 over the 24 years. In terms of performance, M2 still remains the best among the three models, with F RMSE of 0.09 and r of 0.69; in second place comes M1, which outperforms M2 in terms of r (0.77) but has a higher F RMSE (0.19), meaning that M1 has the points that are more aligned but on a line that 30 is farther from the diagonal compared to M2; M3, finally, has a bad performance, with high F RMSE (0.18) and low r (0.48). All the models tend to slightly overestimate the duration of high-flow events with most of the points that lie on the righthand side of the diagonal.

Hypotheses testing
The results of the hydrological model experiments appear to support our general hypothesis that only models that account for the influence factors that affect the streamflow signatures are able to reproduce streamflow spatial variability 40 (see Sect. 4.2.1). This provides confidence that those models are a realistic representation of dominant processes in the catchment.
The implications of the modelling results with respect to the evaluation of the four hypotheses are explained as fol-45 lows. 1. Hypothesis 1: precipitation is the first driver of differences in the water balance. The good performance of model M0 in the representation of the mean annual streamflow (ζ Q ) suggests that accounting for the spatial heterogeneity of the precipitation alone is suffi-5 cient to achieve a good representation of the annual water balance. More complex models, with more HRUs, processes, and parameters, while resulting in an overall improvement of time series metrics, do not result in any improvement in simulating the water balance signa-10 ture ζ Q .
2. Hypothesis 2: snow-related processes control differences in streamflow seasonality. The improvement in the representation of the streamflow seasonality ζ HFD by M1 can be largely attributed to the (spatially vari-15 able) effect of snow accumulation and melting. More complex models (M2 and M3) do not demonstrate an improvement in this signature, indicating that the structural differences between these models do not have an influence on this signature. 20 3. Hypothesis 3: geology controls the partitioning between quickflow and baseflow. The ability of M2 to match the signature ζ BFI , which quantifies the separation between quickflow and baseflow, much better than the other models, supports the hypothesis that geology has 25 a strong control on the partitioning between quickflow and baseflow. M2 is also the model with the average best performance in terms of streamflow metrics. In summary, distributing the inputs in space and accounting for the spatial distribution of snow-related processes are sufficient to get good performance metrics of water balance and seasonality, confirming the fact that only the precipitation rate and the partitioning between rainfall and snow are the first controls on these hydrograph characteristics. However, in order to capture other important characteristics of the hydrograph, described by signatures such as ζ BFI , the 5 discretization of the catchment in HRUs is necessary. This discretization has to be carefully made and a preliminary analysis to understand dominant influence factors on signatures can help in this decision. As shown in Fig. 9, if such discretization uses landscape characteristics that are not 10 strongly correlated with the signatures (e.g. land use), the results are worse than if we choose characteristics that show a strong correlation with signatures (e.g. geology). This means that M2 is capable of capturing the signatures not just because it is more complex than M1, but because it incorporates 15 the causality link between the geology and the streamflow signatures into its structure.

General discussion
Explaining the spatial variability observed in catchment response is a major focus of catchment hydrology and a central 20 theme in classification studies (e.g. McDonnell and Woods, 2004;Wagener et al., 2007). A common approach for interpreting the spatial variability of catchment response is to identify relationships between climatic or catchment characteristics and streamflow signatures. This is typically done 25 through correlation-based analyses (e.g. Lacey and Grayson, 1998;Bloomfield et al., 2009), which however carry the limitations that correlation does not always imply causality and that the presence of multiple correlated variables can obscure process interpretation. 30 In this study, we combine a correlation analysis for identifying the dominant influence factors on streamflow signatures with hydrological modelling by using the interpretation of the first analysis as an inspiration for generating testable model hypotheses. The combination of correlation analysis 35 on streamflow signatures and hydrological modelling is beneficial because on the one hand, the speculations on dominant processes resulting from the correlation analyses can be verified in the modelling process. Specifically, we developed model experiments to test the influence of precipita-40 tion spatial distribution on streamflow average and seasonality and the influence of geology on quickflow vs. baseflow partitioning. On the other hand, model building benefits from the guidance resulting from the preliminary signature analysis. The construction of a distributed model requires several 45 decisions (e.g. Fenicia et al., 2016), including how to "break up" the catchment in a meaningful way, and preliminary signature analysis can motivate some of these decisions. For example, the definition of HRUs based on geology, which was suggested by the signature analysis, resulted in models with 50 better performance than models using HRUs defined on the basis of land use, particularly in the representation of streamflow signatures that reflect the baseflow vs. quickflow partitioning.
Although several modelling decisions were guided by data 55 analysis, it should be noted that alternative decisions would have been similarly consistent with the data. For example, both precipitation and elevation are correlated with average streamflow, and geology, topography, and soil-type characteristics are correlated between each other and with the base-60 flow index (Sect. 4.1.2 and Fig. 5). The correlation of catchment characteristics (e.g. geology, soil, and topography) can be attributed to the fact that they have evolved together in the shaping of the catchment morphology (e.g. mountainous regions have impervious topography with shallower soil and, 65 for these reasons, are less suitable for human activities, influencing land use). The decisions on which variables are chosen to reflect a causality link are not always obvious from correlation analysis alone, and they require expert judgement, which is necessarily subjective. Although subjectivity 70 is difficult to avoid, it is important to be transparent about  the decision taken and the argumentations on which they are based, how weak or strong they may be, so that they can be reappraised and revised if new evidence is acquired.
Although our results in terms of hypotheses 1-4 described in the previous section appear justifiable based on previous 5 work, they are not a priori obvious. In terms of the first hypothesis, although it is known that precipitation has a strong control on the average streamflow, it is less clear whether the spatial variability in the streamflow average can only be attributed to precipitation: some authors, for example, pointed 10 to the role of regional groundwater flow in affecting the water balance (e.g. Bouaziz et al., 2018); GR4J (Perrin et al., 2003), for example, has a parameter that quantifies catchment gains and losses. Our modelling experiments, in particular through M0, have shown that groundwater processes, which 15 potentially alter the water balance, do not influence the mean streamflow spatial variability of the Thur catchment.
In terms of the snow processes, although it is clear that, when there is snow (as in this case), the model needs to have a snow component, it is less obvious (at least just by looking at 20 hydrographs) how many of the differences in the seasonality of the streamflow response between catchments are due to snow. The objective of the comparison between M0 and M1 is not to show that adding a snow component improves the overall performance, but that the differences in seasonality 25 are captured by the model only when the snow component is integrated.
In terms of the effect of geology, Kuentz et al. (2017) made a classification study over more than 40 000 catchments across all Europe (of which almost 2700 are gauged) 30 and found that geology controls the BFI, topography the flashiness index, and, for most of the cases, land use is the second control of them; Bloomfield et al. (2009) used a linear regression model and linked the lithology of the Thames Basin (UK) with the BFI; Lacey and Grayson (1998) noted 35 that geology controls the BFI in two ways, storing the water and impacting the soil formations; Fenicia et al. (2016) compared different model structures and catchment discretization methods in the Attert Basin (Luxemburg) and discovered that the best model was the one that incorporates a spatial repre-40 sentation of the meteorological inputs and of the geology. On the other hand, this general tendency should not be generalized to all places. For example, Mazvimavi et al. (2005) found that geology was not important for the BFI, as in their case study the aquifer was deep and disconnected from the 45 river. Bouaziz et al. (2018) found a strong influence of regional groundwater flow in the Meuse catchment which altered the water balance.
The choice of streamflow signatures is based on the largesample study from Addor et al. (2017), which provides a 50 broad range of signatures typically used in hydrology. Our analyses showed that this selection is rather inclusive, with several strongly correlated signatures (e.g. ζ Q and ζ RR ). For this reason, we eventually used a much smaller selection of the original set of signatures (12 in the original set vs. 5 in 55 the final set). Although hundreds of signatures have been proposed in the literature (e.g. Olden and Poff, 2003), the apparent inclusivity of the set from Addor et al. (2017) provides confidence that the main properties of streamflow are captured in our study. However, it does not guarantee that this 60 set of signatures is sufficient in representing streamflow time series.
One of the main limitations of this work is the restricted number of catchments involved and the limited spatial extension of the study. For this reason, it is difficult to generalize 65 the results to other climatic regions. The subcatchments all belong to the same region, and the landscape and climatic characteristics, while varying substantially within the basin, are still a small sample of the characteristics found elsewhere. Moreover, although the model evaluation uses vali-70 dation in space and time, which is a relatively incisive test, the spatial validation is carried out in a nested setup. The ap-plication of systematic model development strategies to other places and scales, and spatial validation to entirely different regions, is necessary to obtain more generalizable insights.
The small number of subcatchments involved in this study (10) limits the range of viable methods for identifying relationships between landscape and climatic indices and streamflow signatures (Sect. 3.1) to rather simple approaches. In particular, our correlation analysis, although accounting for non-linearity, is limited to monotonic correlations between variables, and it is unable to identify other 10 forms of relationship, including the mutual interaction between various influence factors. The usage of more advanced techniques, including machine learning approaches such as random forest or clustering analyses, is most efficient when larger samples are available and could represent a more suit-15 able choice in these situations.

Conclusion
In this study, we presented the development process of a distributed model where model hypotheses, instead of being made a priori, are informed by preliminary analysis on de-20 termining the dominant climatic and landscape controls on streamflow spatial variability. Besides providing guidance to model development, the proposed approach is useful in the fact that modelling can be used to test specific hypotheses on dominant processes resulting from such preliminary analysis.

25
Our analysis was applied to the Thur catchment, subdivided into 10 subcatchments based on available stream gauging stations. The main findings are summarized in the following points.
1. We found large spatial variability between the subcatch-30 ments of the Thur in terms of various streamflow signatures reflecting multiple temporal scales: yearly, seasonal, and event scale. In terms of climatic characteristics, indices reflecting fraction of snow, precipitation totals, and aridity varied considerably among catchments. 35 Other precipitation characteristics such as season, frequency, and duration of dry and wet days did not vary significantly among catchments. In terms of landscape characteristics, there is large variability of topography (e.g. from upstream mountainous to downstream flat ar-40 eas), geology (with unconsolidated, more permeable, and consolidated, relatively impermeable formations), and soils (with low depths in the mountains and large depths in the floodplains) in all the catchments.
2. Based on correlation analysis and expert judgement, we 45 determined that climatic variables, especially the precipitation average, are the main controls on streamflow average yearly values; the fraction of snow is responsible for streamflow seasonality by delaying the release of winter precipitation to the spring season, and geology 50 controls the baseflow index, with a higher fraction of unconsolidated material determining higher baseflow.
3. The results of the signature analysis were translated into a set of model hypotheses: a model with uniform parameters and distributed precipitation input (M0), the addi-55 tion of a snow component (M1), the subdivision of the catchment into geology-based HRUs (M2), and the alternative subdivision of the catchment using land-usebased HRUs (M3). 4. Using model comparison and a validation approach that 60 considers model performance (also in terms of signatures) in space-time validation, we found that it is necessary to account for the heterogeneity of precipitation, snow-related processes, and landscape features such as geology in order to produce hydrographs that have sig-65 natures similar to the observed ones. In particular, we confirmed that M0, in spite of a generally poor performance, is sufficient to capture signatures of streamflow average, showing that only distributing the meteorological inputs is sufficient to explain regional differences 70 in average streamflow and that other phenomena potentially altering the water balance (e.g. regional groundwater flows) do not play a significant role. M1 improves signatures of streamflow seasonality, showing that snow is the main cause of the variability of the seasonality 75 among the catchments. M2 enables signatures such as the baseflow index to be reproduced, showing that incorporating the geology of the catchment is important for reproducing regional differences in baseflow. Model modifications that are not in line with the results of 80 the signature analysis, such as subdividing the catchment using land-use-based HRUs (M3), despite leading to the same complexity as M2, cause deterioration in model performance in space-time validation. Overall, these results confirm the hypotheses based on the 85 signature analysis and suggest that the causality relationships, explaining the influence of climate and landscape characteristics on streamflow signatures, can be constructively used for distributed model building.
The relatively good performance obtained in space-time val-90 idation suggests that the proposed approach could be used for the prediction of the streamflow in other ungauged locations within the Thur catchment. The method proposed uses data that are commonly available in many gauged catchments (e.g. meteorological data, streamflow measurements, maps of 95 elevation, geology, land use, and soil); therefore, it is easily transferable to other locations.
ibration, Table A2 lists the water-budget equations, and Tables A3 and A4 present the functions and the constitutive functions used. Table A1. Hydrological model parameters with the range of variation used for the definition of the uniform prior distribution. The "component" column indicates the element (reservoir, lag, or network) where the parameter belongs.
Parameter Unit Component Range of variability  Table A2. Water-budget equations (see the model schematic in Fig. 6).   , with smoothing parameter m P = 1.5 • C. (b) This equation is smoothed using a logistic scheme, Eq. (13) in , with smoothing parameter m M = 1.5 • C. (c) The operator * denotes the convolution operator, smoothed according to . Table A4. Constitutive functions.
Data availability. All the interpolated data used for this publication as well as the codes to calculate the metrics (e.g. indices and signatures) are stored in an institutional repository (https://doi.org /10.25678/0001RK, Dal Molin et al., 2020). Due to restrictions, the raw data have to be requested directly by the 5 providers. Meteorological data can be obtained by the Federal Office of Meteorology and Climatology, MeteoSwiss; streamflow data can be obtained from the Federal Office for the Environment (FOEN); the maps used for calculating the catchments characteristics can be obtained by the Federal Office of Topography, Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Linking landscape organisation and hydrological functioning: 20 from hypotheses and observations to concepts, models and understanding (HESS/ESSD inter-journal SI)". It is not associated with a conference.
Acknowledgements. The authors thank the Federal Office of Meteorology and Climatology, MeteoSwiss, for the meteorological 25 data and the Federal Office for the Environment, FOEN, for the streamflow data. The authors thank Conrad Jackisch (editor), Lieke Melsen, Shervan Gharari, and the two anonymous referees for their feedback and their help in improving this paper. MZ and MS dedicate this work to the memory of Joachim Gurtz, who pio-30 neered the work on distributed hydrological modelling in the Thur catchment (Gurtz et al., 1999).
Financial support. This research has been supported by the Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (grant no. 200021_169003). 35 Review statement. This paper was edited by Conrad Jackisch and reviewed by Lieke Melsen, Shervan Gharari, and two anonymous referees.
Remarks from the language copy-editor CE1 It is our house standard to use numerals in a mathematical context.

CE2
Please note the addition of parentheses, to make it clear that this is the abbreviation for the office.

CE3
Please note the parentheses.

CE4
Please note that we use round brackets/parentheses for all additional information such as this.

CE5
Also a mathematical context.

CE6
A hyphen is not necessary between a noun and its adjective.

CE7
Please see the previous comment.

CE8
Note that "little" must be used with mass nouns, so the "s" was removed from correlation.

Remarks from the typesetter TS1
Please note that we can only insert PNG files into the PDF version of the paper. Please note that the high-resolution version of the figures will be available through the HTML view of the paper. The high-resolution files can be downloaded via the "Download" button below the figure. Please note that this does not apply to maps and satellite images.

TS2
Please note that we write physical quantities/variables/parameters in italic font, indices which are not defined in italic font, and abbreviations (from 2 letters) in roman font.

TS3
Please give an explanation of why this needs to be changed. We have to ask the handling editor for approval. Thanks.

TS4
Please give an explanation of why this needs to be changed. We have to ask the handling editor for approval. Thanks.