Impact of modellers ’ decisions on hydrological a priori predictions

Introduction Conclusions References

Introduction of the crust layer (K sat = 0.6 mm/h according to Simunek et al. (1998)) to 27 allow infiltration excess. K sat was reduced to limit subsurface flow. Modifying the lower 28 boundary condition to better represent the influence of the clay dam. 29 • Assumptions and parameters (3 rd

prediction) 30
The spatial resolution of the model grid (upper slope 10 m, middle slope 5 m and lower 31 slope 1 m resolution) was changed to 5 m in order to reduce the numerical problems 32 caused by the implementation of the surface crust. Soil crust (5 cm) was resolved with 1 vertical increments of 1 cm (before: top 20 cm with 4 cm resolution). This resulted in an 2 18% reduction of the total number of nodes. 3 Only the soil hydraulic data and the infiltration rates were selected. The virtual costs were 4 the second lowest of all modellers. These data were used to parameterize the soil and the 5 soil crust. 6 • Results 7 In the 1 st and 3 rd prediction, Catflow discharge was among the largest of all models (e.g., 8 262 mm/y and 255 mm/y in 2006/2007) mainly due to the large base flow of 20 to 25 m 3 /d. 9 Due to numerical problems Catflow's 2 nd prediction is missing. 10 For the first two years the model performance improved from the 1 st to the 3 rd prediction 11 ( Fig A-1a and A-1b). Neglecting the snowmelt event (SME) discharge was simulated for 12 nearly all years with a similar prediction uncertainty. Q 90 (77 m 3 /d, 3 rd prediction stage) 13 was the second largest of all events and Q 5 was >20 m 3 /d. Therefore, the base flow was 14 very high compared to surface runoff, which was the lowest relative runoff of all models 15 (<1% in the 1 st year to 25% in the 3 rd year). The initial conditions used a higher 16 groundwater table than the observed one and reaches complete water saturation of the 17 whole aquifer early in 2008 (Supplementary Material (E)). • Basic model features 7 Distributed hydrological model that calculates infiltration and unsaturated percolation 8 using Richards equation and the lateral saturated flow using Darcy's law. Infiltration 9 excess is routed to the stream network (Kraft et al., 2008). 10 • Assumptions and parameters (1 st

prediction) 11
Opposite to the other models CMF was developed by the user himself using an irregular 12 grid. The modeller intended to evaluate how CMF performs in a region where the model 13 was not used before. His initial parameterization followed a traditional procedure, e.g. 14 using average soil properties derived from texture data (K sat = 416.7 mm/h, AG Boden 15 (1994)) and not considering soil freezing on K sat . 16 • Assumptions and parameters (2 nd

prediction) 17
Although the field visit showed the importance of surface runoff due to the deep gullies, 18 the soil crust (to reduce infiltration) could not be implemented due to the model structure. 19 However, the modeller expected that water exfiltrates through preferential flow pathways 20 into the gullies even when the topsoil was still unsaturated. This substantially increased 21 surface runoff through the gullies. In order to more realistically represent the newly 22 constructed catchment, the modeller changed the initial state to quasi-dry conditions (pF 23 2.5). K sat was reduced to 60 ± 30 mm/h to diminish the large discharge. Finally, the leaf 1 area index (LAI) was set to 1. 2

• Assumptions and parameters (3 rd prediction) 3
After 2 nd stage prediction the CMF modeller completed his PhD-program and left the 4 modelling group. 5 • Results 6 In the 1 st prediction stage, the modeller reported one of the largest annual discharge (~220 7 mm/y), a peak discharge of 461 m 3 /d, and a Q 5 of 20 m 3 /d. Average soil properties were derived from texture data (K sat = 84 mm/h, Lundmark and 13 Jansson (2009)) since the sandy soil was assumed to be homogeneous. Soil and snow 14 evaporation are considered to be important. 15

• Assumptions and parameters (2 nd prediction) 16
The different development of the vegetation in the western and eastern half of the 17 catchment resulted in a different parameterizations. The modeller assumed dry initial 18 condition and reduced K sat to slow downstream subsurface flow. 19 • Assumptions and parameters (3 rd prediction) 20 After 2 nd stage prediction the CoupModel modeller left the modelling group due to time 21 constraints. 22 • Results 1 The model predicted the lowest annual discharges in the 1 st prediction stage with less than 2 15 mm/y base flow and 60 mm/y of surface run-off. CoupModel was the only model, 3 which included snow melt. However, the discharge during the SME was strongly under-4 predicted. Neglecting the SME resulted therefore in a lower RMSE (Fig. A-3a). 5 Although the discharge was increased to an average of 145 mm/y the RMSE and the NSE 6 do not show a significant change (Fig. A-3a and A-3b). While the peak discharge 7 decreased slightly from 77 m 3 /d to 65 m 3 /d from the 1 st to 2 nd prediction, the slower flow 8 events increased (e.g. Q 25 = 1 m 3 /d in the 1 st prediction and Q 25 = 11 m 3 /d in the 2 nd 9 prediction). Still, the model predicted also 2% of all events with zero discharge. The 10 CoupModel used a too small storage coefficient and predicted too small groundwater table  11 fluctuations (Supplementary Material (D)). The field visit made the user aware of deep gullies and different vegetation cover in the 20 western and eastern half of the catchment. 21 The original model could neither adequately describe the actual evapotranspiration (AET), 22 soil crusting, nor the clay dam effects on the groundwater dynamics. Therefore, the 23 modeller switched to using MIKE SHE, redefined the initial conditions, and made small 1 changes to reduce AET and PET (potential evapotranspiration). The soil hydraulic 2 properties stayed constant. 3

• Assumptions and parameters (3 rd prediction) 4
Soil crust newly parameterized using information which was later published by Gerwin et 5 al.
(2011) and Mazur et al. (2011) and PET adjusted by increasing the vegetation cover. 6 All soil hydraulic data were selected, but none of the others. This resulted in the lowest 7 virtual costs of all modellers. 8 • Results 9 The results calculated by Hill-Vi (1 st prediction) suffered from overestimating interflow 10 and base flow, which were reduced from about 360 mm/y (1 st prediction) to 40 mm/y (2 nd 11 prediction) by modifying PET and adding a soil crust. This improved the predictions 12 significantly decreasing RMSE ( • Basic model features 7 Rainfall infiltrates completely into the soil except when the soil column is entirely 8 saturated. Vertical distribution of soil water within the soil column is not simulated. Lateral 9 and subsurface flows are both calculated one-dimensionally (Chirico et al., 2003). 10

• Assumptions and parameters (1 st prediction) 11
Modeller tried to match the annual water balance with that of other catchments having a 12 similar rainfall regime. 13 Unsaturated zone was not simulated (model limitations). The soil properties were assumed 14 uniform along the of a soil column. Hydraulic parameters were derived from texture data 15 using PTF (K sat = 50 mm/h, Rawls and Brakensiek (1985)). Initial soil moisture and 16 groundwater levels were assumed to be uniformly distributed throughout the catchment. 17

• Assumptions and parameters (2 nd prediction) 18
The initial state was changed from pre-wetted to quasi-dry. K sat was reduced to slow down 19 discharge and the influence of the clay dam was increased. 20 21 • Assumptions and parameters (3 rd prediction) 1 Modeller realised that the model is not suitably structured for an efficient description of the 2 groundwater dynamics, particularly in the initial stage of the emerging groundwater table  3 and the clay dam required a spatial structure of the subsurface compartment. Changing the 4 model structure (introducing soil freezing, implementing a soil crust having a hydraulic 5 conductivity of 3 mm/h, and changing K sat to 100 mm/h) caused substantial costs (time) 6 without a significant reduction of the prediction uncertainty due to the uncertainty in the 7 model parameterisation based on the apparently still insufficient data set. 8 As a result, the modeller tried to obtain the best fit with the best set of the available data 9 thereby using all additional soil data including the actual measurements were selected 10 (average cost compared to other modellers' choices). 11 • Results 12 During the 1 st prediction stage, the slow flow components were too prominent because the 13 model structure allows infiltration until the soil is completely saturated. The high hydraulic 14 conductivity allowed the groundwater to drain rapidly. The changes in parameterization 15 resulted in higher peak flow (287 m 3 /d) and little slow flow (e.g. 8% of all days show zero 16 discharge). However, this did not reduce the RMSE (Fig. A-5a) compared to the 1 st 17 prediction state. Similarly, the NSE was lowered significantly so that the NSE for all years 18 is even below zero for the 2 nd prediction (Fig. A-5b). 19 Making use of the additional soil data increased discharge more than in all other models 20 (+88 mm/y) and also produced the largest peak discharge (1481 m 3  infiltration. This resulted in slower flow events (<75% of all events had a discharge of less 23 than 1 m 3 /d). This was the main cause why the RMSE slightly decreased and the NSE 24 increased between the 2 nd and 3 rd prediction stage (Fig. A-5a). Only the NetThales 25 modeller addressed soil freezing in the 3 rd prediction, which improved the predictions for 26 the first hydrological. However, excluding the SME period still results in a better RMSE 27 (Fig. A-5a). Physically-based SVAT model (Soil Vegetation Atmosphere Transfer) which solves the 10 Richards equation to estimate infiltration and soil-water fluxes, lateral groundwater flow is 11 addressed by concentration time, and surface runoff is calculated by a semi-analytical 12 solution of the Richards equation (Diekkrüger and Arning, 1995;Bormann, 2008 The modeller wanted to minimize the influence of the modeller's choice and did not decide 4 a priori on the dominant process(es) based on his experience from previous studies where 5 the model produced reliable results without prior calibration (Bormann et al., 1999). 6 The modeller used the pedotransfer function according to Rawls and Brakensiek (1985) to 7 derive soil hydraulic parameters. A national soil data-base (Adhoc AG Boden, 2005) was 8 used to estimate bulk density. These assumptions resulted in a mean K sat of 60.8 mm/h 9 while spatially distributed soil parameters were used for the modelling study (25 m grid). 10 The model started with unsaturated conditions but allowed for partial saturation in case of 11 storage based lower boundary condition. Additionally, the modeller implemented the 12 dynamics of the lake at the catchment outlet. 13 • Assumptions and parameters (2 nd prediction) 14 The modeller assumed a shallow soil layer above the dam. After field visit he considered 15 soil crusts as the major cause for the observed soil erosion. Since the modeller visited the 16 catchment in spring, he noticed the fast development of the vegetation. Bormann (2011). 28

• Assumptions and parameters (3 rd prediction) 29
The modeller noticed that the modelled results still varied significantly among the various 30 models. The simulated water balance was consistently wrong, changes in 31 evapotranspiration parameterisation did not seem to be appropriate, and the subsurface 32 storage needed to be better adapted. As a result, the modeller updated the variability in 33 surface K sat and the lower boundary condition of the soil columns once more in order to 34 better describe the infiltration as well as subsurface storage (Bormann, 2011). Additionally, 35 the initial condition was re-defined as dry soil. 36 The modeller chose the data based on its usefulness for complementing the model set-up. 1 This resulted in the 2 nd highest virtual costs. The additional soil physical data were used to 2 confirm the magnitude of the soil hydraulic parameters in the preceding simulations. The 3 data from the infiltration experiments were used to improve the description of the 4 hydraulic properties of the surface layer. The field data were complemented by literature 5 values to parameterise the spatial variability of K sat (Cosby et al., 1984). Finally, the 6 modeller used soil moisture in two steps. First, he evaluated his model with these data and, 7 in a 2 nd step, he used them to calibrate the model by adjusting the lower boundary 8 conditions of the soil columns (Bormann, 2011). This resulted in a hydraulic conductivity 9 of the soil crust of 11.6 mm/h. Although the modeller opted for using the vegetation data, 10 he did not use them after reviewing these data. 11 • Results 12 The 1 st predictions suffered by too much discharge, twice the observed value. More or less 13 all discharge was considered to be base flow. 14 The base flow was then effectively reduced to about 20% of the total discharge, which was 15 the lowest of all models (157 mm/y) in the 2 nd prediction. Since interflow was negligible, 16 about 80% was primarily surface runoff. Although SIMULAT predicted zero discharge for 17 many days, it rose rapidly and produced the largest peak discharge of all models (1433 18 m 3 /d in the 2 nd year). SIMULAT predicted primarily surface runoff (~ 80%) and negligible 19 interflow. However, like all other predictions with peak discharge of >400 m 3 /d, the 20 discharge between Q 100 and Q 95 went down by nearly two orders of magnitude. Due to 21 reduction of the vegetation cover AET was lowered considerably and was the lowest of all 22 models (157, 266 and 260 mm/y for the 1 st , 2 nd , and 3 rd year, respectively). 23 PET simulated for 2006/2007 did not change from the 2nd to the 3rd prediction. 24 SIMULAT predicted in two of the three years the largest discharge (e.g. 291 mm/y in 25 2006/2007) and an average increase of discharge by 21 mm/y. This was a consequence of 26 using the soil data. SIMULAT generated little surface runoff (22%) and mainly subsurface 27 runoff and therefore the opposite of what it simulated in the 2 nd prediction. This resulted in 28 the lowest peak discharge (106 m 3 /d) of all models (Supplementary Material (C)). This is a 29 reduction by >90% compared to the 2 nd prediction 30 The prediction quality for the 1 st year did not improve but for the two following years it did 31 (Fig. A-6a). The prediction quality from the 2 nd to 3 rd prediction shows no significant 32 change. Similar results are also shown by NSE (Fig. A-6b). properties (Arnold et al., 1998). SWAT simulates plant growth and its effects on the water 1 balance based on the EPIC model. 2

• Assumptions and parameters (1 st prediction) 3
Assumptions made in SWAT make it more adapted to simulate mesoscale catchments. 4 Although it cannot be considered a perfect pick for small catchments, applications of 5 SWAT range from hill slope case studies to large basins such as the Mississippi River. The 6 modeller therefore intended to further test its performance for small catchments. He 7 minimized the influence of the modeller's decision and used default values available from 8 the SWAT user manual. The modeller employed PTF (mean K sat = 74.5 mm/h, Rawls and 9 Brakensiek (1985)) for assessing soil hydraulic properties with a cluster analyses to check 10 the soil homogeneity assumption. The dynamics of the lake at the catchment outlet was 11 implemented with a total volume estimated from the DEM. 12

• Assumptions and parameters (2 nd prediction) 13
The user became aware of the deep gullies during the on-site inspection of the catchment. 14 Soil crusting is not included into the model. Ksat stayed constant but he tried to account for 15 a larger surface runoff by allowing reinfiltration from gullies. The modeller removed the 16 warm-up period completely so that the initial state was changed from pre-wetted to quasi-17 dry conditions. Finally, the modeller changed the volume of the already implemented lake 18 to the actual value provided during the first workshop. 19

• Assumptions and parameters (3 rd prediction) 20
The modeller kept the model setup as used for the 2 nd prediction, but corrected some 21 parameter values based on the provided dataset that may affect both physical and plant 22 processes (e.g. the organic carbon content The predictions by SWAT showed too large peak discharge and, as in case of the other 30 models, the snow melt problem. However, especially in the 2 nd and 3 rd year the results 31 were the best of all models as shown by RMSE and NSE (Fig. A-7a and A-7b) despite the 32 fact that assumptions made in the structure of SWAT can be judged inadequate to represent 33 small catchments. 34 Although SWAT changed the initial conditions to quasi-dry conditions in the 2 nd prediction 35 stage, the storage changes where negative in the 2005/2006 (-24 mm/y). Later they became positive. The peak discharge was strongly reduced (from ~900 m 3 /d in the 1 st prediction to 1 <100 m 3 /d in the 2 nd ) but the slow flow components strongly increased, e.g. Q 25 = 9 m 3 /d. 2 This resulted in a strong decrease of RMSE and increase of NSE (Fig. A-7a and A-7b). 3 SWAT had similar discharge components comparing the 3 rd and 2 nd prediction, producing 4 about 50% of surface runoff and base flow. The additional weather data only slightly 5 affected PET (+30 mm/y to 847 mm/y) but increased by +33 mm/y in 2005/2005 and +118 6 mm/y in 2006/2007. This decreased discharge in the average by -101 mm/y, the second 7 lowest peak discharge, with Q 25 < 1 m 3 /d). This reduced the prediction uncertainties for the 8 first two years slightly but increased them for 2007/2008 (Fig. A-7a and A-7b). Wouter Buytaert (University of Bristol, now: Imperial College London) 3

• Basic model features 4
A semi-distributed hydrological model that assigns a combination of storage compartments 5 such as the root zone, unsaturated and saturated zone. Infiltration is determined by the 6 Green-Ampt equation and a time delay function controls flow within a vertical soil 7 column. The lateral subsurface flow is estimated by an exponential transmissivity function. 8 The model does not consider the influence of soil freezing on K sat nor snow (Beven et al., 9 1995). 10

• Assumptions and parameters (1 st prediction) 11
The modeller wanted to minimize the influence of the modeller's choice and used 12 published PTFs for parameterization (K sat = 58 mm/h, Saxton et al. (1986)). 13

• Assumptions and parameters (2 nd prediction) 14
The modeller visited the catchment only in a spring period. The rapid appearance of 15 vegetation would be simulated, but the soil crust could not be introduced into the model. 16 This modeller did not include the clay dam for the 1 st prediction. He added it in the 2 nd 17 stage. The subsurface response was slightly increased by increasing the areal average of 18 the local transmissivities (m 2 /h) at saturation from lnTe = -2.5 to -2. This corresponds with 19 Te = 0.082 to 0.135 m 2 /h. The clay dam delays the subsurface response and the modeller 20 tried to mimic this by changing lnTe. This is a merely intuitive approach since there was 21 no guidance on how to adapt the parameter. Additionally, a surface runoff of 10% of the 22 precipitation were generated by changing K sat at the surface (5 mm/h) and the capillary 23 drive CD (1 mm) (Morel-Seytoux and Khanji, 1974). 24

• Assumptions and parameters (3 rd prediction) 25
The modeller selected only the information that appeared to be most important since the 26 model is limited by its conceptual nature. This made it difficult to integrate additional data. 27 Therefore, the modeller chose the soil hydraulic parameter without the water retention 28 curves and the soil moisture data set. 29 The modeller used these data for calibration: all TDR measured at 10 cm depth and the 30 groundwater levels of observation well L4 were averaged and used as proxies for the 31 storage deficit. He expected that these two observations are inversely related. He then 32 performed a Monte Carlo sensitivity test based on the correlation coefficient as 33 performance measure. The modeller deduced from this that only the amount of water 34 (expressed as a depth), which the soil can hold within the root zone (Sr max ) is a sensitive 35 parameter. Finally, Sr max was chosen to 0.02 m. Subsequently, the initial root zone storage 1 deficit (Sr 0 ) and the initial subsurface flow per unit area (q s0 ) were updated to be 2 compatible with Sr max . 3

• Results 4
Topmodel showed one of the best predictions during the 1 st prediction stage with a RMSE 5 about 50 m 3 /d (Fig. A-8a). The main problem were the large discharges, e.g. the 2 nd largest 6 peak discharge (777 m 3 /d). One reason was the rather low PET of ~570 mm/y. 7 The changes between the 1 st  The modeller relied on the description of the physical processes in his model and thereby 13 minimized the influence of his own decisions. He derived soil hydraulic parameters from 14 the texture data (K sat = 118 mm/h, Adhoc AG Boden (2005)). The model was initialized by 15 several warm-up runs to achieve steady state conditions. The clay dam was not considered. 16

• Assumptions and parameters (2 nd prediction) 17
The on-site inspection of the catchment showed the extent of erosion due to surface runoff 18 and the modeller noticed the different vegetation development in the western and eastern 19 half of the catchment. Therefore, the model was split into a vegetation-free (before 2008) 20 and a vegetation period with a low density grass canopy. To account for the surface runoff, 21 K sat was reduced and a soil crust having with a hydraulic conductivity of 20 mm/h based 22 on NAW (2008) was introduced. 23 The initial state was changed from pre-wetted to quasi-dry and the lake was included. The 1 formerly constant layer thickness was defined according to the Digital Elevation Model 2 (DEM) in order to account for the clay dam. The detailed description of parameterisation 3 of the 2 nd stage prediction of WaSiM-ETH (Richards) can be found in Hölzel et al. (2011). 4

• Assumptions and parameters (3 rd prediction) 5
The modeller emphasized the relevance of the groundwater dynamic by a more realistic 6 representation of the clay dam (Holländer et al., 2009). He replaced the conceptual 1-D by 7 a process-based 2-D groundwater approach (Hölzel et al., 2013). 8 The modeller still relied on the description of the physical processes and therefore, he 9 requested data which allowed improving the model such as K sat derived from slug tests in 10 the field and K sat determined in the laboratory, water retention, and soil moisture data. He 11 used these data to have control on the soil moisture dynamic. Additionally, the data of 12 weather station II were selected. 13 • Results 14 In the first prediction WaSiM-ETH (Richards) overestimated discharge discharge mostly 15 predicting >10 m 3 /d but peak discharge was rather low (140 m 3 /d). 16 The soil crust with a K sat of 200 mm/h did not effectively reduce the infiltration into the 17 soil. This lowered AET significantly (234, 273, and 285 mm/y) and discharge increased 18 (153, 309, and 306 mm/y), the largest of all models. Therefore, peak discharge increased to 19 1028 m 3 /d, the second largest of all models. Also the base flow was larger (15 m 3 /d) than 20 in the 1 st prediction. Surface runoff and interflow were obtained each at 20% and 60% base 21 flow. The modeller identified the 1-D groundwater approach used in the 2 nd prediction to 22 be a major problem because it prevented the implementation of the clay dam. Including it 23 would have reduced the base flow. The prediction quality did not significantly change from 24 the 1 st to 2 nd prediction (Fig. A-9a and A-9b). 25 Using the weather data from the new station for the 3 rd prediction reduced PET by about 30 26 mm/y to 682 mm/y and discharge by -65 mm/y. This affected both, the peak flow (1210 27 m 3 /d) and the base flow (10 m 3 /d). However, the predicted discharge was still exceeded the 28 observed values. The reduction in discharge improved the prediction quality for the first 29 two years but got worse in the third year (Fig. A-9a and A-9b) despite the apparent benefit 30 of additional and more specific data. 31 The contribution of the discharge components to total discharge changed significantly over The WaSiM-ETH (Topmodel) modeller joined the modelling group only in the 2 nd 1 prediction stage. 2

• Assumptions and parameters (2 nd prediction) 3
The modeller attended the 1 st workshop as an observer and subsequently joined the project. 4 Due to the large variation in the reported water budget and runoff, the modeller concluded 5 from the discussions that a completely physically based model for the unsaturated zone is 6 not needed in this case. During the on-site inspection of the catchment the modeller 7 became aware of the deep gullies, noticed the different vegetation development in the 8 western and eastern half of the catchment, and the rapid changes of the soil surface 9 apparently depending on soil texture. 10 The modeller used the initial conditions generated by a one-year pre-run using the 1 st year 11 data despite the fact that the catchment was initially dry. Soil hydraulic parameters were 12 estimated by PTFs according to Wösten and Nemes (2004) and Saxton et al. (1986) and 13 used a set up based on the DEM data. 14 The largest base flow was predicted by WaSiM-ETH (Richards) at least 15 m 3 /d. The peak 24 discharge was 701 m 3 /d. Therefore, the modeller obtained at 60% base flow, 21% surface 25 runoff, and 19% interflow.