Hydrologic-Land Surface Modelling of a Complex System under Precipitation Uncertainty: A Case Study of the Saskatchewan River Basin, Canada

Hydrologic-Land Surface Models (H-LSMs) have been progressively developed to a stage where they represent the dominant hydrological processes for a variety of hydrological regimes and include a range of water management practices, and are increasingly used to simulate water storages and fluxes of large basins under changing environmental conditions across the globe. However, efforts for comprehensive evaluation of the utility of H-LSMs in large, regulated watersheds have been limited. In this study, we evaluated the capability of a Canadian H-LSM, called MESH, in the highly regulated Saskatchewan River Basin (SaskRB), Canada, under the constraint of significant precipitation uncertainty. The SaskRB is a complex system characterized by hydrologically-distinct regions that include the Rocky Mountains, Boreal Forest, and the Prairies. This basin is highly vulnerable to potential climate change and extreme events. A comprehensive analysis of the MESH model performance was carried out in two steps. First, the reliability of multiple precipitation products was evaluated against climate station observations and based on their performance in simulating streamflow across the basin when forcing the MESH model with a default parameterization. Second, a state-of-the-art multi-criteria calibration approach was applied, using various observational information including streamflow, storage and fluxes for calibration and validation. The first analysis shows that the quality of precipitation products had a direct and immediate impact on simulation performance for the basin headwaters but effects were dampened when going downstream. In particular, the Canadian Precipitation Analysis (CaPA) performed the best among the precipitation products in capturing timings and minimizing the magnitude of error against observation, despite a general underestimation of precipitation amount.
The subsequent analyses show that the MESH model was able to capture observed responses of multiple fluxes and storage across the basin using a global multi-station calibration method. Despite poorer performance in some basins, the global parameterization generally achieved better model performance than a default model parameterization. Validation using storage anomaly and evapotranspiration generally showed strong correlation with observations, but revealed potential deficiencies in the simulation of storage anomaly over open water areas.


development, calibration, validation over some benchmark approach, and 3) our work and methods lack novelty.
Regarding the first point, we want to strongly emphasize that our work was conducted in response to the call of the special issue entitled 'Understanding and predicting Earth system and hydrological change in cold regions'. The special issue stated "the urgent need to understand the nature of the changes and to develop the improved modelling tools needed to manage uncertain futures… at multiple scales with a geographic focus on western Canada, including the Saskatchewan and Mackenzie River basins". The objectives of our work are clearly in line with the call of this special issue, and the paper includes not only new insights into the modelling of a large-scale river system in a cold climate, with limited data, but also advances in model capability. In particular, we demonstrated the advances in the diagnosis of an improved Canadian H-LSM (i.e. MESH with the inclusion of irrigation and flow diversion modules) in modelling the highly complex river system in western Canada with consideration of errors in precipitation data and their propagation through the model. In our view, and given the complexity and size of this basin, we presented an approach that is unique geographically and captures a very specialized application of an H-LSM that we have not seen in the literature.
With respect to the second point, we acknowledge that we did not show results regarding improvements of model development, calibration, validation over some benchmark approach. We understand that showing comparison results between the improved model and the original model is important to show the robustness and superiority of the improved model over the original one. However, it is vital to understand that there are no previous equivalent modeling developments for this unique system, and this contributes to the challenge in hydrological simulation of this region. In the introduction section [P4L25-34; P5L1-19], we demonstrated how our model development with the MESH H-LSM is different from the limited studies available. Thus, our study, which includes extensive evaluation and diagnosis of the model deficiencies in a systematic and comprehensive way, should be considered a benchmarking attempt for detailed modeling of this complex basin, which has frankly been elusive or poorly represented in other studies. However, as pointed out by Reviewer 2, we will include the comparison results between the improved model (including water management) and the original MESH (no water management) in the appendix.
For the last point, we highlight in the following the novelty and contribution of our work which includes a comprehensive three-stage evaluation strategy for an H-LSM.
Firstly, as we discussed in the Introduction, H-LSMs are rarely calibrated because of their large number of parameters, the complex surface heterogeneities, and the complicated hydrologic and water management features of most river basins, which are heavily managed. In addition, the computational requirements escalate and multiply when considering the precipitation uncertainties (i.e. driving the H-LSM with multiple precipitation products). While calibration with multiple precipitation products could be possible with a more conceptual model (such as a model that depends only on water balance and runs at a coarser time resolution) without representing any water management modules in a large-scale river basin (e.g. Eum et al., 2014), it is not pragmatic to conduct the same modelling exercise with a process-based H-LSM in a heavily-regulated river basin, such as the SaskRB. We tackled this challenge and offered new insights by presenting a thorough assessment of error characteristics of several candidate precipitation products using both direct and indirect evaluation methods before calibration (first-stage evaluation). We consider this a novel aspect of our work.
Secondly, we note that H-LSM parameter estimation through calibration is still in its infancy. It is well known in the literature that "a priori" parameter values, typically based on classical approaches, are simply not an optimal solution at these scales. While arguably it might be sufficient to calibrate the model using only streamflow observations at basin outlets for smaller basins, it would be problematic to do so for large-scale basins because of the heterogeneities of the sub-basins across the whole basin (Faramarzi et al., 2016). We addressed this issue by presenting a multi-objective multi-station optimization approach using as many streamflow stations as possible (second-stage evaluation). We further evaluated the model performance by validating the spatial model outputs with additional information from the GRACE data set and two eddy-covariance field sites (third-stage evaluation). Constraining the H-LSM with multiple stations across a large-scale river basin and validating its spatial outputs have not been commonly done in previous studies; thus, we consider this a further novel aspect of our work.
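To make the idea of constraining the model with many streamflow stations concrete, the following is a minimal sketch of how per-station scores could be aggregated into a single calibration score. It is illustrative only: the use of the Nash-Sutcliffe efficiency, the equal weighting of stations, and the names `nse` and `multi_station_score` are assumptions made for this sketch and are not taken from the manuscript or from the MESH calibration setup.

```python
from statistics import fmean


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated vs. observed flows (assumed metric)."""
    mean_obs = fmean(obs)
    squared_error = sum((s - o) ** 2 for o, s in zip(obs, sim))
    obs_variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance


def multi_station_score(obs_by_station, sim_by_station):
    """Equal-weight mean of per-station NSE values (hypothetical aggregation)."""
    scores = [nse(obs_by_station[k], sim_by_station[k]) for k in obs_by_station]
    return fmean(scores)


# Hypothetical two-station example with made-up flow values.
observed = {"station_a": [1.0, 2.0, 3.0], "station_b": [2.0, 4.0, 6.0]}
simulated = {"station_a": [1.0, 2.0, 3.0], "station_b": [4.0, 4.0, 4.0]}
score = multi_station_score(observed, simulated)
```

Averaging over stations means that a parameter set cannot score well by fitting only the basin outlet, which is the motivation given above for using many gauges in a heterogeneous basin.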
Also, we think that the length of the manuscript might affect the efficiency of delivering our main contribution to the readers (as pointed out by Reviewer 2). Therefore, we will substantially shorten our manuscript and we will ensure we better highlight the significance and novelty of our work at the end of the Introduction Section, which is shown as follows: This study was conducted to address key questions raised in the special issue entitled "Understanding and Predicting Earth System and Hydrological Change in Cold Regions". The significance of the work is to demonstrate advances in the diagnosis and calibration of an improved large-scale H-LSM (the first reported MESH model development for the entire SaskRB, with representation of water management), including consideration of error propagation from the precipitation inputs, by presenting a three-stage evaluation strategy and a detailed evaluation aimed at improving the understanding of the basin as a whole and creating a test-bed for the simulation of alternative climate, land use and water management futures. Moreover, this work highlights that the current generation of land-surface models simply cannot capture the important hydrological controls in these complex systems.
Given the above response and the revision plan, we hope that Reviewer 1 could re-evaluate our manuscript and appreciate the novelty and contribution of our work.

Specific Comments:
(1) The methodology for choosing the 'best' precipitation product is not defined and is, therefore, not reproducible. Although several evaluation metrics against ground-based observations (precipitation gauge and streamflow) are used, it is unclear how these results were combined and used to rank objectively the various products. As it currently stands, the choice of CaPA as superior to all other products is purely subjective.
The reviewer's point is well-taken. We have assessed the precipitation products by direct and indirect evaluation methods without trying to combine or rank the results based on an overall performance measure. We believe that the methodology for choosing the 'best' precipitation product is an ongoing research topic by itself and developing such an objective methodology is beyond the scope of this study, which is limited by the large computational requirements of using a physics-based model over such a large basin. For example, model calibration for different precipitation products was not feasible, hence a pragmatic approach was taken, using default model parameters for the screening of precipitation products. Therefore, we chose CaPA for subsequent calibration based on the following judgement. While it is not easy to identify the overall best-performing precipitation data set using or ⁄ , it is clearly seen that CaPA consistently outperformed other precipitation products at seasonal and annual scales when using and (see Table 4 in the manuscript). Additionally, CaPA produced the overall highest seasonal and annual across the SaskRB (see the following Table). Therefore, the choice of CaPA is not purely subjective.
In the previous section's analysis, it is not easy to identify the overall best-performing precipitation data set when considering all performance metrics. Combining or ranking the results in a systematic and robust way could be possible; however, developing such a methodology is out of the scope of this study.
We chose the best-performing precipitation data set based on superior performance on multiple performance measures. It is clearly seen that CaPA consistently outperformed other precipitation products at seasonal and annual scales when using and (Table 4). A similar situation is seen when CaPA produced the overall highest seasonal and annual across the SaskRB (Fig. 4). Therefore, CaPA is used in this section for model calibration.
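As an illustration of what a direct evaluation of candidate precipitation products against gauge observations can look like, the sketch below computes two common summary statistics per product. It is a hedged example only: percent bias and Pearson correlation are assumed stand-ins and are not necessarily the performance measures used in the manuscript, and the product names, the gauge values and the function names are hypothetical.

```python
from math import sqrt
from statistics import fmean


def percent_bias(obs, est):
    """Relative total error in percent; negative values mean underestimation."""
    return 100.0 * (sum(est) - sum(obs)) / sum(obs)


def pearson_r(obs, est):
    """Linear correlation between gauge and product series (timing agreement)."""
    mean_obs, mean_est = fmean(obs), fmean(est)
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(obs, est))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_est = sum((e - mean_est) ** 2 for e in est)
    return cov / sqrt(var_obs * var_est)


def screen_products(gauge, products):
    """Return both statistics for every product, without combining or ranking them."""
    return {
        name: {"pbias": percent_bias(gauge, series), "r": pearson_r(gauge, series)}
        for name, series in products.items()
    }


# Made-up monthly totals for one gauge and two hypothetical products.
gauge = [10.0, 20.0, 30.0]
products = {"product_A": [9.0, 18.0, 27.0], "product_B": [30.0, 20.0, 10.0]}
summary = screen_products(gauge, products)
```

In this made-up case one product underestimates the totals but tracks the timing, while the other matches the total but not the timing, which shows why several measures are reported side by side rather than merged into one rank.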
(2) The logic of ranking the accuracy of precipitation products by filtering them through an un-calibrated H-LSM to compare to streamflow observations is flawed. For this approach to be entirely valid, one must accept that errors in the precipitation data are propagated identically through the H-LSM for each product, independently of the chosen model parameters. Given that each precipitation product has different error characteristics (
(3) The authors state that the rationale for choosing the best precipitation product is to derive the best (most accurate) calibrated H-LSM. However, the authors never end up demonstrating that this assumption is correct; this paper merely reinforces intuition, it does not reveal it as fact. Arguably, any number of precipitation products, once incorporated in the calibration process, could result in several parameterizations of MESH with very similar performance. In addition, by only using one precipitation product, the authors are not actually conducting what I would infer is "Hydrologic-Land Surface Modelling . . . under precipitation uncertainty . . .", as stated in the title.

(4) In the first paragraph of the introduction it reads that the motivation behind the paper is predicated on the fact that the deployment and calibration requirements of Hydrologic Land Surface Models (H-LSMs) differ markedly from Land Surface Models (LSMs), Hydrology Models (HMs) and Global Hydrology Models (GHMs). Without a clear definition/description of what constitutes/distinguishes these four types of models, treating H-LSMs as unique seems artificial. I am confident that decades of literature on the calibration of HMs does not need to be tossed because H-LSMs are so uniquely different (i.e. there is no need to start from scratch when discussing how to deploy an H-LSM).
We understand the concern of the reviewer and we think that mentioning different types of models in the Introduction section might create confusion and make the motivation of the paper unclear. Accordingly, we will revise the first paragraph in the Introduction section [P2L1-10], which is shown as follows: During the past few decades, the development of hydrological models (HMs) for large-scale application (~10³-10⁶ km²) has Land Surface Models (LSMs) have expanded in scope and complexity because of emerging water security challenges (Eagleson, 1986; Clark et al., 2015; Döll et al., 2003; Vörösmarty et al., 2000). There are many large-scale HMs, which broadly fall into two categories, namely, Global Hydrological Models (GHMs) and Land Surface Models (LSMs) (Döll et al., 2016; Gudmundsson et al., 2012 (Archfield et al., 2015; Davison et al., 2016). More recently, several LSMs have included irrigation and water management modules (Haddeland et al., 2006; Voisin et al., 2013a, 2013b; Pokhrel et al., 2017), which are well established in Global Hydrological Models (GHMs) (Döll et al., 2003; Archfield et al., 2015; Wada et al., 2017). The integration of these various processes has enabled LSMs to be used in support of a wide range of hydrological applications, in which they are referred to as Hydrologic-Land Surface Models (H-LSMs) (Pietroniro et al., 2007). Although H-LSMs have made steady advances in representing hydrologic processes and incorporating human impacts on the terrestrial water cycle, the investigation of input data uncertainty and parameter estimation through calibration for large-scale basins has been limited and is not common practice with H-LSM models compared to their extensive use by the catchment hydrological modeling community.
(5) Having as an objective the desire to "improve the H-LSM parametrization using a state-of-the-art computationally efficient calibration approach . . ." seems quite trivial. There has been sufficient research with hydrologic modelling (whether that be an HM, GHM or LSM) to indicate that model calibration improves accuracy and is a necessary step. It is also arguable whether a calibration approach that relies on a single objective function constrained only by streamflow is actually state-of-the-art. In addition, the adopted calibration approach does not actually test the effectiveness of parameter transferability, as claimed. The use of independent streamflow gauges to evaluate the calibration does not test for transferability, as no parameters have been spatially transferred; they have in fact been calibrated in place (unless I missed something in the text). What is actually being tested using independent gauges is the spatial robustness of the calibrated model parameters.
We thank the reviewer for raising his/her concerns on the objective of our work and our calibration approach. Regarding the objective, we acknowledge that the current presentation of the objectives did not fully reflect the main goal of our study. After considering both reviewers' comments, we will revise the presentation of our study objectives [P4L1-17] to better reflect our work, which is shown as follows: Regarding parameter transferability, we want to emphasize that the parameters have been spatially transferred in the validation process in two ways. The first way is based on physical similarity of the basins (Patil and Stieglitz, 2015