Complexity and performance of temperature-based snow routines for runoff modelling in mountainous areas in Central Europe

Snow processes are a key component of the water cycle in mountainous areas as well as in many areas of the midand high latitudes of the Earth. The complexity of these processes, coupled with the limited data available on them, has led 10 to the development of different modelling approaches to improve their understanding and support management practices. Physically-based approaches, such as the energy balance method, provide the best representation of these processes but at the expense of high data requirements. Data limitations, in most situations, constrain the use of these methods in favour of more simple approaches. The temperature-index method is the most widely-used modelling approach of the snowpack processes for hydrological modelling, with many variants implemented in different models. However, in many cases, the 15 decisions on the most suitable complexity of these conceptualisations are not adequately assessed for a given model structure, application, or decision-making support tool. In this study, we assessed model structure choices of the HBV model, a popular semi-distributed, bucket-type hydrological model, for its application in mountainous areas in Central Europe. To this end, we reviewed the most widely-used choices to different components of the snow routine in different hydrological models and proposed a series of modifications to the 20 structure of HBV. We constrained the choice of modifications to those that are aligned with HBV’s modelling approach of keeping processes as simple as possible to constrain model complexity. We analysed a total of 64 alternative snow routine structures over 54 catchments using a split-sample test. We found that using (a) an exponential snowmelt function coupled with no refreezing instead of a linear function for both processes and (b) a seasonally-variable degree-day factor instead of a constant one were, overall, the most valuable modifications to the model. Additionally, we found that increasing model 25 complexity does not necessarily lead to improved model performance per se. Instead, we found that a thorough analysis of the different processes included in the model and their optimal degree of realism for a given application is a preferable alternative. While the results may not be transferrable to other modelling purposes or geographical domains, the methodology presented here may be used to assess the suitability of model design choices. https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c © Author(s) 2020. CC BY 4.0 License.

method to simulate snowpack processes. For instance, while some models use the freezing point as reference temperature (Walter et al., 2005), others include a calibrated threshold temperature parameter (Viviroli et al., 2007). Regarding the proportionality constant, some applications define it as time-independent (Valéry et al., 2014), while others establish it as 65 being seasonally-variable (Hottelet et al., 1994). Moreover, connecting with energy budget models, some applications use two proportionality constants: one for temperature and another for net radiation (Kane et al., 1997). Nevertheless, regardless of the preferred approach, the inherent simplifications made in semi-distributed temperature-index models leave out some critical aspects of the snowpack processes that may be significant in some circumstances. For instance, the disregard of lateral transport processes may lead to the development of unreasonable accumulations of snow over long periods of time 70 (i.e., snow towers) in high mountainous areas (Freudiger et al., 2017;Frey and Holzmann, 2015).
The different implementations of the temperature-index method may relate to differences in model philosophy, scope, desired resolution, or available computing power, among others. Nevertheless, these choices are not always adequately tested to ensure that they provide the best possible alternative for a given model design and application (Harpold et al., 2017). For instance, some studies have found that a more realistic representation of hydrological processes does not always 75 translate into improved model performance (Orth et al., 2015). For a long time, limitations in computing power hindered the systematic testing of different alternative model structures over a large number of catchments. In recent years, however, the increase in computing power has made these tests not only feasible but also desirable in order to ensure that model structures are adequate and robust for their intended applications (Günther et al., 2019).
In this study, we evaluated the design choices made in the snow routine of the HBV hydrological model (Bergström, 1995), 80 a typical bucket-type model with a temperature-based snow routine, for its application in mountainous central-European catchments. We implemented and tested a number of modifications to the snow routine of the model based on common practices in similar hydrological models, and investigated whether it is possible to identify model structure alternatives which generally result in improved model performance. With this, we aimed to provide a basis to decide on useful modifications to the model while avoiding adding unnecessary complexity and additional parameters. To ensure that the 85 results are representative we explored different levels of added complexity, from single modifications to combinations of multiple modifications, to a large dataset of catchments covering a wide range of climatological and hydrological conditions of the area of interest.

Materials and Methods
The HBV model is a bucket-type rainfall-runoff model, with a number of boxes (routines) including the main components of 90 the terrestrial phase of the water cycle, i.e., snow routine, soil routine, groundwater (response) routine, and routing function.
In this study, we focus solely on the snow routine of the model. We use the HBV-light software, which follows the general structure of other implementation of the HBV model and includes some additional functionalities such as Monte Carlo runs and a genetic algorithm for automated optimisation (Seibert and Vis, 2012). Henceforth we use the term 'HBV model' when https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c Author(s) 2020. CC BY 4.0 License. referring to our simulations using the HBV-light software. With 'HBV model' we mean the 'standard HBV', i.e., the HBV 95 model with the snow routine as described in Lindström et al. (1997) or Seibert & Vis (2012).

HBV's Snow Routine
The snow routine of the HBV model is based on well-established conceptualisations of the relevant snow processes for hydrological applications. It represents the processes regarding two aspects of snow hydrology: the phase of the precipitation (snow or rain), and the accumulation and melt of the snowpack. 100 Regarding the precipitation phase, HBV uses a threshold temperature parameter, TT [°C], above which all precipitation, P [mm/Δt], is considered to fall as rain, PR [mm/Δt] (Eq. 1). This threshold can be adjusted to account for local conditions.
Below the threshold, all snow is considered to fall as snow, PS [mm/Δt] (Eq. 2). The combined effect of snowfall undercatch and interception of snowfall by the vegetation is represented by a snowfall correction factor, CSF [-].
As previously mentioned, the HBV model uses a simple approach based on the temperature-index method to simulate the evolution of the snowpack. This way, snowmelt, M [mm/Δt], is assumed to be proportional to the air temperature, T [°C], above a predefined threshold temperature, TT [°C], through a proportionality coefficient, also called degree-day factor, C0 [mm/Δt°C] (Eq. 3). The physical motivation of this approach is that the energy available for snowmelt is generally 110 proportional to the air temperature (Ohmura, 2001). The model allows for a certain volume of melted water to remain within the snowpack, given as a fraction of the corresponding snow water equivalent of the snowpack, CWH [-]. Finally, the refreezing of melted water, F [mm/Δt], takes place when the air temperature is below TT, and its magnitude is modulated through an additional proportionality parameter, CF [-] (Eq. 4).
HBV allows for a limited representation of catchment characteristics through the specification of different elevation and vegetation zones. This way, the parameters controlling the different processes included in the snow routine can be modified for individual vegetation zones. The combination of elevation and vegetation zones (also known as Elevation Vegetation Units, EVUs) is the equivalent of the Hydrologic Response Units (HRUs) used in other distribution function models (Flügel, 120 1995). Both precipitation and temperature are corrected for elevation using two parameters for the precipitation and temperature lapse rate. https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c Author(s) 2020. CC BY 4.0 License.

Proposed Modifications to Snow Routine Components
Here we review the individual components of the snow routine structure of the HBV model as well as functions that are directly related to it (e.g. input data correction with elevation) and describe the proposed modifications. Each of these 125 modifications requires one to three parameters (Table 1). When different elevation zones are used, the temperature for each zone is generally computed from some catchment-average value and a lapse rate. In HBV, usually a constant temperature lapse rate is used. Alternatively, if the available data allows, it is also possible to provide an estimation of the daily temperature lapse rate. However, if no data on the altitude dependence of temperature is available, setting a constant value throughout the year might be an oversimplification. Indeed, in an 135 experimental study on several locations across the Alps, Rolland (2002) found that the seasonal variability of the temperature lapse rate follows approximately a sine curve with a minimum at the winter solstice. Following these results, we https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c Author(s) 2020. CC BY 4.0 License.
implemented a seasonally variable computation of the temperature lapse rate using a sine function (Eq. 5). This way, the temperature lapse rate for a given day of the year, Γn [°C/100 m] (where n is the day of the year, a sequential day number starting with day 1 on the 1 st of January), depends on two parameters, namely the annual temperature lapse rate average, 140 Γ0 [°C/100 m], and amplitude, Γi [°C/100 m].
Precipitation lapse rates could not be related to a seasonal or other types of systematic variations as they are strongly dependent on the synoptic meteorological conditions and therefore highly variable. Therefore, we decided to keep the default approach of calibrating the model using an average precipitation lapse rate parameter. 145

Precipitation Phase Partition
The determination of the precipitation phase is crucial as it controls whether water accumulates in the snowpack or contributes directly to recharge and runoff. In the HBV model the distinction between rainfall and snowfall is based on the assumption that precipitation falls either as rain or as snow, depending on a threshold temperature. However, in reality, this transition is less sharp, as there are mixed events with both rain and snow and, depending on other factors such as humidity 150 and atmosphere stratification, the shift from rain to snow can occur at different temperatures (Dai, 2008;Magnusson et al., 2014;Sims and Liu, 2015). Therefore, the single threshold temperature may not adequately represent the snow accumulation, especially in areas or periods with temperatures close to zero degrees Celsius. Different approaches have been suggested to describe the snow fraction of precipitation, S [-], as a function of temperature (Froidurot et al., 2014;Magnusson et al., 2014;Viviroli et al., 2007). In this study, we considered three different conceptualisations of snowfall 155 fraction of precipitation, (Eq. 6 -7): (i) a linear function (Eq. 8), (ii) a sine function (Eq. 9), and (iii) an exponential function (Eq. 10). Both the TA [°C] and MP [°C] parameters control the range of temperatures for mixed precipitation.

Snowmelt Threshold Temperature
In addition to determining the precipitation phase, a temperature threshold is also needed to determine when snowmelt starts.
The most straightforward approach, as used in the HBV model, is to use the same threshold temperature parameter for both 165 snowfall and snowmelt. However, as these two transitions are related to different processes happening at different environmental conditions, a single parameter might not adequately describe both transitions. A more realistic approach would be to consider two separate parameters for these processes: a threshold temperature parameter for precipitation phase differentiation, TP [°C], and another one for snowmelt processes, TM [°C] (Debele et al., 2010). We implemented this modification using one additional parameter. 170

Degree-day factor
The degree-day factor is an empirical factor that relates the rate of snowmelt to air temperature (Ohmura, 2001). In the HBV model a single proportionality coefficient to estimate the magnitude of the snowmelt is used. This coefficient, multiplied by a constant (usually set to 0.05 in HBV), is also used to compute refreezing rates. However, while the degree-day factor is often assumed to be constant over time, there are good reasons to assume temporal variations due to changes such as snow 175 albedo and solar inclination. A simple way to represent this variability is to consider the degree-day factor to be seasonally variable following a sine function defined by a yearly average degree-day factor parameter, C0 [mm/Δt°C], and an amplitude parameter, C0,a [mm/Δt°C], defining the amplitude of the seasonal variation (Eq. 11) (Braun and Renner, 1992). By establishing a seasonally-variable degree-day factor instead of a constant value for this parameter, potential snowmelt rates are smaller during the winter months, while increasing during spring.

Snowmelt and Refreezing
Snowmelt water does not leave the snowpack directly; a certain amount of liquid water may be stored in the snow. This is important as it delays the outflow of water from the snowpack, and besides that, the liquid water can potentially refreeze if temperatures decrease again. In the HBV model, both the storage of liquid water and refreezing processes are considered. 185 However, since the magnitude of refreezing meltwater is generally tiny compared to other fluxes, some models disregard this process entirely as it adds complexity to the model without adding any value to it (Magnusson et al., 2014). Here we follow the approach by Magnusson et al. (2014) which, besides disregarding the refreezing process, describes snowmelt using an exponential function (Eq. 12). This conceptualisation if somewhat more realistic than the one used in HBV but requires the use of an additional parameter to control for the smoothness of the snowmelt transition, MM [°C]. This way, and contrary to the formulation used in HBV, snowmelt occurs even below the freezing point, but at negligible amounts. The impact of increasing temperatures on snowmelt is also higher for this conceptualisation.

Study Domain and Data
We selected two sets of catchments in two different geographical domains to test the proposed modifications to the 195 individual components of the snow routine of the model (Table 2, Figure 1). The first set, composed of Swiss catchments, contains a range of catchments from high altitude, steep catchments in the central Alps to lower catchments in the Pre-Alps and Jura mountains. The second set, composed of Czech catchments, is representative of mountain catchments at lower elevations compared to the Swiss catchments. https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c Author(s) 2020. CC BY 4.0 License.

Switzerland
We selected 22 catchments in Switzerland covering a wide range of elevations and areas in the three main hydrogeographical domains of the country, i.e. the Jura and Swiss Plateau, the Central Alps, and the Southern Alps (Weingartner and Aschwanden, 1989). The choice of catchments was constrained by our intention to avoid catchments with significant karst or glacierised areas, as well as catchments with substantial human influence on runoff. This decision allowed us to 215 observe the signal of snow processes, without including noise or added complexity from other processes, but limited the number of catchments in high altitudes, which are the ones with largest snowmelt contribution to runoff, and therefore those that would potentially benefit the most from an increased realism of the snow routine of the model. The resulting set of https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c Author(s) 2020. CC BY 4.0 License. catchments has mean elevations between 600 and 2800 m a.s.l. with elevation gradients of up to 2000 m and catchment areas between 3 and 500 km 2 ( Figure 2). There is a great variability in the yearly snowmelt contribution to runoff ranging from 220 5 % to 60 % as the catchments range between pluvial to glacio-nival regimes.
We obtained the necessary meteorological data for running the HBV model from the Swiss Federal Office of Meteorology and Climatology (MeteoSwiss). More specifically, we used pre-processed gridded data products to obtain catchment-average precipitation (Frei et al., 2006;Frei and Schär, 1998), and temperature (Frei, 2014). These gridded data products are available from 1961, have a daily temporal resolution, and a spatial resolution of 1.25 degree minutes covering the entire 225 country.
We used both stream runoff and snow water equivalent data for model calibration and validation. We obtained daily stream runoff data from the Swiss Federal Office for the Environment (FOEN, 2017). Regarding snow water equivalent, we used 18 years of gridded daily snow water equivalent data at 1 km 2 resolution derived from a temperature index snow model with integrated three-dimensional sequential assimilation of observed snow data from 338 stations of the snow monitoring 230 networks of MeteoSwiss and the Swiss Institute for Snow and Avalanche Research (SLF) (Griessinger et al., 2016;Magnusson et al., 2014). Finally, we obtained the catchment areas and topography from a digital elevation model with a resolution of 25 m from the Swiss Federal Office of Topography (swisstopo).

Czechia
The second set composed of Czech catchments includes 32 mountain catchments with catchment areas ranging from 3 to 383 235 km 2 (Figure 2). We selected near-natural catchments with no major human influences such as big dams or water transfers.
The resulting catchments are at relatively lower elevations and present lower elevation ranges compared to most of the selected Swiss catchments. Additionally, they are located in the transient zone between oceanic and continental climate with lower mean annual precipitation compared to Swiss catchments. The mean annual snow water equivalent maximum for the period 1980 -2014 ranges from 35 mm to 742 mm depending on catchment elevation, resulting in 13 % to 39 % of the 240 annual runoff coming from spring snowmelt.
We obtained daily precipitation, daily mean air temperature, and daily mean runoff time series from the Czech Hydrometeorological Institute (CHMI). Additionally, we obtained weekly snow water equivalent data from CHMI (measured each Monday at 7 CET). Since no gridded data of precipitation, air temperature, or snow water equivalent are available for Czechia, station data were used for HBV model parametrization. We used stations located within individual 245 catchments when available. When no such station was available, we selected the nearest station representing similar catchment conditions (e.g., stations situated at a similar elevation). Finally, we used a digital elevation model with a vertical resolution of 5 m from the Czech Office for Surveying, Mapping and Cadastre to obtain catchment areas and topography.

Experimental Setup
Even if sub-daily data were available for most variables for the Swiss catchments, we considered that daily data was suitable for this study, as using sub-daily temporal resolutions would have required taking into account the diurnal variability of 255 some of the variables, thus requiring a higher model complexity (Wever et al., 2014). For instance, radiation and temperature fluctuations along the day would require similarly variable degree-day factor values (Hock, 2005). Other factors such as travel times between the sources of snowmelt and the streams would also become significant issues at sub-daily time scales (Magnusson et al., 2015). In order to keep the model simple but able to represent the elevation-dependent snow processes, we used a single vegetation zone per catchment but divided the catchment area into 100 m elevation zones. 260 When evaluating the performance of hydrological models to simulate snow dynamics, this evaluation is sometimes done solely looking at the simulated runoff by the model, as this variable is the main output of hydrological models (Riboust et al., 2019;Watson and Putz, 2014). Nevertheless, this analysis alone is incomplete as the performance of the model to reproduce runoff is the result of the interaction between the different components of the model, also those that are not directly related to snow processes. For this reason, here we evaluated the existing model structure as well as the proposed modifications to it 265 based on their ability to represent (i) the snow water equivalent of the snowpack, and (ii) stream runoff at the catchment outlet.
To evaluate the performance of the different model structures to reproduce the snow water equivalent of the snowpack we used a modified version of the Nash-Sutcliffe efficiency (Nash and Sutcliffe, 1970) where the model performance, RW, is https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c Author(s) 2020. CC BY 4.0 License.
given by the fraction of the sum of quadratic differences between snow water equivalent observations, WO, and simulations, 270 WS, and between observations and the mean observed value, ̅̅̅̅ (Eq. 12).
Due to the substantial differences in snow water equivalent data availability between the two datasets (gridded data in Switzerland vs point data in Czechia), we had to adapt the model calibration and evaluation procedure to each case. We evaluated the model against the mean snow water equivalent value for each elevation zone for the Swiss catchments, and 275 against the measured values at a given elevation for the Czech ones.
Regarding the evaluation of the stream runoff estimation, we deemed that the standard Nash-Sutcliffe efficiency measure was not suitable for our case study as it is skewed towards high flows (Schaefli and Gupta, 2007). Snow processes are dominant both in periods of high flows (e.g. spring flood) and low flows (e.g. winter conditions), which are equally important to estimate correctly. For this reason, we decided to evaluate the estimation of stream runoff by using the natural 280 logarithm of runoff instead (Eq. 13).
Some studies focusing on snow hydrology establish specific calibration periods for each catchment based on, for instance, the snowmelt season (Griessinger et al., 2016). In this study, however, we decided to constrain the calibration and evaluation periods in a consistent and automated manner for all catchments. For this reason, we defined the model calibration and 285 evaluation periods as comprising days with significant snow cover on the catchment (>25% of the catchment covered by snow). We also considered a full week after the occurrence of snowmelt in order to account for runoff delay. We obtained the value of 25% through empirical tests on the number of days with specific snow coverage values and their corresponding snow water equivalent values. We found that below this value the total snow water equivalent in the studied catchments becomes negligible. 290 We calibrated the model for all the catchments in the study using a split-sample approach. We selected this approach because it allowed us to assess both the best possible efficiency for each model alternative (calibration period), and a realistic model application scenario (validation period), helping us to distinguish between real model improvement and overfitting. In our case, the simulation period was limited by the input data with the shortest temporal availability, which in this case was the snow water equivalent data for the Swiss catchments. In total 20 years were available, which we divided into two equally 295 long 9-year periods plus 2 years for model warm-up. We calibrated the model for both periods and cross-validated the simulations for the remaining periods. For the Swiss catchments, we considered the period between 1 st of September 1998 and 31 st August 2016, while for the Czech catchments we considered the period between 1 st November 1996 and 31 st October 2014. The different start dates for simulation periods in the Swiss and Czech catchments correspond to the different timing for the onset of snow conditions in the different areas. Additionally, the different years included in each study domain correspond to data limitations in each area. Since both areas are geographically separated, we considered that it was more important to have the same period length for simulation in both domains rather than using the exact same years, as the meteorological conditions are different in the two study domains anyway.
Overall, we calibrated the model for all possible combinations of the single modifications to individual components of the snow routine of the HBV model described in Section 0 (n = 64), catchments (n = 54), simulation periods (n = 2), and 305 objective functions (n = 2) using a genetic algorithm (Seibert, 2000). Every calibration effort consisted of 3500 model runs with parameter ranges based on previous studies (Seibert, 1999;Vis et al., 2015). We performed 10 independent calibrations for each setup in order to capture the uncertainty of the model. In total we performed around 500 million model simulations.

Results
The large number of catchments and model variations considered in this study make it difficult to grasp the detailed results 310 when looking at the entire dataset. Therefore, we first describe the results for one single catchment to explore the implications of individual model modifications and then we progressively add more elements into the analysis. Additionally, even if we calibrated (and validated) the model for both periods defined in the split-sample test, here we only present the results for the calibration effort in period 1 and corresponding model validation in period 2, as they are representative for the entire analysis. As an example catchment, we selected one of the high altitude, snow-dominated catchments in the set, the 315 Allenbach catchment at Adelboden (CH-204), as it allows us to describe some of the general trends observed across the study domain.
The performance of HBV for this catchment is very high for both objective functions (~0.90). Looking at all the single modifications to the snow routine structure of the HBV model that we evaluated in this study, using a seasonally varying degree-day factor (C0,s) has the most substantial impact on the model performance to represent snow water equivalent and, to 320 a lesser extent, stream runoff ( Figure 3). Apart from these modifications, only the use of an exponential function to define the precipitation partition between rain and snow (ΔPe) produces significant changes in the model performance against both objective functions. In this case, however, this modification impacts the model performance in different ways depending on the objective function, even leading to decreased model performance when calibrating against stream runoff. Finally, if we look at the model, we observe that model uncertainty, as given by the performance ranges obtained when aggregating the 325 different calibration efforts, is small compared to the performance differences between the different model structures.
https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c Author(s) 2020. CC BY 4.0 License. If we look at an example year within the calibration period, we can get a grasp of how the simulated values of snow water equivalent and stream runoff, including the model uncertainty, compared with the observed values ( Figure 3). While 335 capturing the general evolution of the snowpack, the HBV model tends to underestimate the snow water equivalent amounts, except for the spring snowmelt period. The model alternative using a seasonal degree-day factor (C0,s), which, as we have already seen, is the best possible model structure modification for model calibration against snow water equivalent for this catchment, exhibits the same behaviour but is more accurate than the HBV model structure. Regarding the calibration against stream runoff, both model alternatives perform well for low flow periods, but they miss or underestimate some of the 340 peaks. Model uncertainty is comparable for both model alternatives and is not significant when compared to the simulated values.

350
Zooming out to the entire set of catchments we can observe that, for the calibration period, the impact of the different modifications on the model performance is generally more pronounced for RW than for Rln(Q) across all of the considered catchments ( Figure 5). For most catchments, most performance improvements when calibrating against snow water equivalent were achieved by using a seasonally variable degree-day factor (C0,s). Using different thresholds for precipitation phase partition and snowmelt (TP,M) and using an exponential function for precipitation phase partition (ΔPe) also convey 355 significant improvements in some of the catchments. Nevertheless, this last modification performs almost equal to the HBV model when calibrating against stream runoff, and for some catchments even slightly worse. Using an exponential function to define the precipitation partition between rain and snow consistently penalises the model performance when calibrating the HBV model against stream runoff, whereas using an exponential function for snowmelt (Me) is the best alternative when looking at this objective function. Overall, most modifications convey slight improvements in model performance with 360 respect to snow water equivalent for most of the catchments in the dataset. Nevertheless, the modifications on the precipitation phase partition tend to penalise most Czech catchments when calibrating against snow water equivalent. We did https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c Author(s) 2020. CC BY 4.0 License. not observe any significant connection between model performance and catchment characteristics such as mean catchment elevation, catchment area, or yearly snowmelt contribution to runoff. So far, we have seen that while some modifications have a clear and consistent impact on model calibration performance in all catchments, most of them have less pronounced, catchment-dependent impact. It is therefore difficult to decide which modifications are better than others (including the default HBV structure) for most of the catchments. Furthermore, until this point, we have only looked at the calibration efficiencies. To better understand the usefulness of the different modifications 375 in real applications, we need to take into account which of the modifications perform best with respect to the validation period ( Figure 6). As already observed in Figure 5, using a seasonal degree-day factor (C0,s) is the best modification for calibrating the model against snow water equivalent for the vast majority of the catchments. Nevertheless, this modification ranks relatively low for model validation against the same objective function. Looking at stream runoff, using an exponential function for snowmelt simulation while disregarding the refreezing process (Me) is the best-ranking modification for both 380 model calibration and validation while the HBV model structure ranks higher than several of the considered modifications.
Using an exponential function to define the precipitation partition between rain and snow (ΔPe) is the worst alternative for calibrating it against stream runoff. For model calibration, there is a diagonal pattern from the top left to the bottom right ( Figure 6), indicating that different modifications have the same rank in most catchments (notice that the ranking of modifications is different when looking at snow water equivalent with respect to stream runoff). Such a pattern is not present 385 for model validation, suggesting that there is no clear answer to which modifications convey most value to the model. https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c Author(s) 2020. CC BY 4.0 License.

Figure 6 Rank matrices for each of the model simulation scenarios. Top: model calibration (left) and validation (right) against snow water equivalent; bottom: model calibration (left) and validation (right) against logarithmic stream runoff. Each rank matrix shows the rank distribution of each modification to single components of the snow structure of the HBV model for all the
390 catchments included in this study. The modifications are ordered from highest to lowest average ranking (left to right) and include a seasonally-variable temperature lapse rate (Γs), a linear, sinusoidal, and exponential function for the precipitation phase partition (ΔPl, ΔPs, and ΔPe respectively), different thresholds for precipitation phase and snowmelt (TP,M), a seasonally variable degree-day factor (C0,s), and an exponential snowmelt with no refreezing (Me). The HBV model structure is highlighted with a white vertical line.

395
As we have seen, even if some model modifications clearly improve the performance of the model for model calibration, most modifications have at most minimal impact. A reason for this might be that the actual differences compared to the standard HBV model formulation are minimal. For this reason, we next explore whether the same trends persist when increasing the model complexity through incorporating an increasing amount of modifications simultaneously. The number of model alternatives for each number of modifications to the model structure is presented in Table 3. 400 https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c Author(s) 2020. CC BY 4.0 License.   alternatives. The impact is more modest for model validation with a significant percentage of alternative structures performing worse than HBV. Regarding model calibration against stream runoff, the effect of an increasing number of components being modified is minimal but mostly positive. The range of efficiencies is also significantly smaller than when looking at snow water equivalent, which is due to the fact that, for most catchments, the snow routine has a limited weight over the entire HBV model. For model validation we observe a similar trend, but with broader efficiency ranges. Also, the 415 fact that performance ranges vary significantly across the number of components being modified is in part due to the differences in the number of model alternatives for each of them.
https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c Author(s) 2020. CC BY 4.0 License. against stream runoff, all bottom-ranking model structures use an exponential function for precipitation phase partition (ΔPe). 435 Other patterns that could not be clearly observed when only looking at single modifications emerge here as well. Indeed, even if a seasonal degree-day factor performs above average for most cases, this particular modification is included in all bottom-ranking model structures for model validation against snow water equivalent. Also, structures including an exponential function for snowmelt (Me) perform above average for all cases, even in almost all the top-ranking ones.

Discussion 475
From the results obtained in this study we can conclude that it is difficult to improve existing hydrological models and especially those that, like HBV, have proven capable of reproducing the hydrological behaviour of catchments over an extensive range of environmental and geographical conditions (Bergström, 2006). That being said, some of the proposed modifications to the snow routine of the HBV model that we tested in this study showed a generally positive impact on the model performance for simulating both snow water equivalent and stream runoff, albeit to different extents. The most 480 valuable single model modifications for modelling mountainous catchments in Central Europe are the use of an exponential snowmelt function and, to a lesser extent, a seasonally-varying degree-day factor. Another modification, using different thresholds for snowfall and snowmelt instead of a single threshold, produces significant improvements towards snow water equivalent, but does not convey any advantage for simulating stream runoff.
Continuing on the impact of model modifications on the different objective functions we considered in this study, we have 485 seen that there are large differences between them. In general, the impact is more evident when simulating snow water equivalent, as the simulated stream runoff is modulated and smoothened out by the other routines of the model (i.e. soil, groundwater, and routing routines), which partially compensate and mask any modifications made on the snow routine (Clark and Vrugt, 2006). Additionally, some of the modifications that improve the performance of the model to simulate snow water equivalent, such as the use of an exponential function to define the solid and liquid phases of precipitation, 490 clearly penalise the stream runoff simulations.
Unlike most modifications considered in this study, which are simplifications of complex processes, the use of an exponential function to describe precipitation phase partition and the use of a seasonally varying temperature lapse rate are both modifications derived from empirical evidence (Magnusson et al., 2014;Rolland, 2002). Nevertheless, as we have previously discussed, neither of these modifications translate into an improvement of model performance for any of the 495 objective functions. This might be due to the fact that, since the models are based on simplifications and generalisations of the processes that occur in reality, these accurate measurements of the processes do not align well with other simplifications made in the model structure and/or behaviour or the chosen spatio-temporal resolution (Harder and Pomeroy, 2014;Magnusson et al., 2015).
Other modifications are relatively similar to each other, such as the case of using linear and sine functions to describe 500 precipitation phase partition. Both these conceptualisations require only one additional parameter, and perform almost identical. Indeed, precipitation partition between rain and snow is exactly the same for both conceptualisations for most of the transition temperature range. The only divergence is in the tails, which are abrupt for the linear case and smooth for the sine one. Provided that the smooth transition is a more accurate description of the physical process which, in addition, avoids the introduction of discontinuities into the objectives functions, which might complicate model calibration (Kavetski and 505 Kuczera, 2007), and that both modifications include the same degree of complexity and perform nearly identical, we should https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c Author(s) 2020. CC BY 4.0 License.
favour the most accurate description. Nevertheless, some models, including HBV, continue to use the linear conceptualisation with the argument of simplicity.
We observe differences in model performance among the two geographical domains included in this study. Most notably, the modifications on precipitation phase partition penalised the model performance on most Czech catchments for simulating 510 snow water equivalent, while having the opposite impact for Swiss catchments. The Czech catchments have a narrower elevation range compared to the Swiss catchments, in addition to an earlier and shorter snowmelt period. These characteristics may favour the simplification of an abrupt transition between rain and snow, while using gradual transitions between rain and snow favours the more extended melt season and larger elevation ranges of the Swiss catchments. Another factor that may impact the results is the significant differences in model driving and validation data availability for each of 515 the geographical domains (Günther et al., 2019;Meeks et al., 2017). Indeed, while in Czechia there are a limited number of meteorological stations providing temperature, precipitation and snow water equivalent data, the Swiss catchments benefit from distributed data for the different catchment elevations, allowing for more accurate calibration of snow-related parameters.
Nevertheless, even with the large differences in hydrological regime, catchment morphology, and data availability between 520 the two geographical domains, the impact of the different modifications to model performance was in general comparable among them. This is a relevant point because HBV aims to be a general-purpose hydrological model that is applicable to a range of geographical domains and both in areas with data wealth and scarcity. Achieving comparable model performances under these different conditions is an indication of the suitability and strength of the HBV model for such goals.
Regarding model complexity and uncertainty, we found that increasing the complexity, and thus the number of parameters of 525 the model, generally translated into a broader range of model performances, indicating that the uncertainty related to the model structure was also increased. This is a well-known problem of hydrological models, and the focus of many studies (Essery et al., 2013;Strasser et al., 2002). Additionally, we found that, for most cases, the median model performance increase with increasing model complexity was not significant with respect to the performance range. This means that an increasing model complexity does not necessarily translate into better model performance, which is consistent with other 530 studies (Orth et al., 2015). This fact highlights the importance of carefully choosing the degree of complexity of the model based on the objectives to be achieved and the available data (Hock, 2003;Magnusson et al., 2015).
Even if model complexity is not desirable by itself if implemented in a sensible way, it can improve the performance of hydrological models. Indeed, 22 of the 63 model structure alternatives that we tested in this study (all of them conveying an increase of model complexity) convey and increase in model performance with respect to HBV for all cases. Furthermore, 535 out of these 22 alternatives, only two of them consist of a single modification, while most have 3 or 4 modifications.
Nevertheless, all of these alternatives share common traits such as using an exponential snowmelt function with no refreezing. Indeed, almost all model structures that do not have this particular modification perform worse than HBV in at least one simulation scenario. https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c Author(s) 2020. CC BY 4.0 License.
Considering these results, it is reasonable to state that, while the increased realism arising from the interplay among the 540 different modifications in these model structure alternatives play a role in improving the model performance, this is mainly the result of a few dominant modifications. This way, the use of an exponential snowmelt function is the most valuable single modification with a median performance increase of 0.002 with respect to HBV for all simulations (with individual performance increases over 0.1). However, when we combine it with a seasonal degree-day factor, we achieve a median performance increase of 0.008, almost the highest performance increase amongst all model alternatives. Adding further 545 complexity to the model does not convey significant improvements with respect to this model structure. Consequently, if we were to implement any modifications to the model, they would be to substitute the linear snowmelt and refreezing conceptualisation by an exponential snowmelt function, and replacing the constant degree-day factor by a seasonally varying one, in that order.
Finally, it is important to mention that these results are only valid for the selected study areas and cannot be extrapolated to 550 all the different alpine and snow-covered areas around the world. The different processes involved in different settings are likely to favour different conceptualisations of the snow processes. So, while the proposed modifications might be useful to tailor the model to Central-European alpine catchments, we still consider the default model structure of HBV as a very capable general purpose model that can be used in a variety of settings and for different purposes.

Conclusions 555
We evaluated the performance of the snow routine of the HBV model and explored possible modifications to make it more realistic and relevant for hydrological modelling in Alpine areas in Central Europe. We tested a number of modifications to all of the different components of the snow routine over a large number of catchments covering a range of geographical settings, and different data availability conditions on their ability to reproduce both snow water equivalent and stream runoff.
We found that the results differ greatly across the different catchments, objective functions, and simulation type 560 (calibration/validation). However, the results allow drawing the following general conclusions regarding the value of the different model modifications for the performance in mountainous catchments in Central Europe:  The comparatively simple default structure of the HBV model performs well when simulating snow-related processes and their impact to stream runoff in most of the examined catchments.  An increased model complexity does not by itself add any value to the ability of the model to reproduce snow water 565 equivalent and stream runoff. A careful examination of the design choices of the model for the given application, data availability and objective is a significantly better approach.
 An exponential snowmelt function with no refreezing is the single most valuable overall modification to the snow routine structure, followed by a seasonally variable degree-day factor. Adding further complexity to the model does not necessarily translate in significant performance improvements. 570 https://doi.org/10.5194/hess-2020-57 Preprint. Discussion started: 13 February 2020 c Author(s) 2020. CC BY 4.0 License.
These results are not transferrable to other geographical domains or purposes, but the methodology presented here may be used to assess the suitability of model design choices.

Data availability
Meteorological and hydrological data to calibrate the HBV model were obtained from the Swiss Federal Office of Meteorology and Climatology, the Swiss Federal Office for the Environment, and the Czech Hydrometeorological Institute. 575 The HBV model outputs are available from the first author upon request.

Author contribution
JS initiated the study. MGL developed the methodology and performed all analyses. MGL, NG, and MJ prepared the input meteorological and hydrological data used to calibrate the HBV model. MJPV performed all the necessary modifications to the source code of the HBV model. MGL prepared the manuscript with contributions from all co-authors. 580

Competing interests
The authors declare that they have no conflict of interest.

Acknowledgements
This project was partially funded by the Swiss Federal Office for the Environment (FOEN). The contribution of Michal Jenicek was supported by the Czech Science Foundation, project no. GA18-06217Y. 585