A framework to regionalize conceptual model parameters for global hydrological modeling

To provide an accurate estimate of global water resources and help to formulate water allocation policies, global hydrological models (GHMs) have been developed. However, it is difficult to obtain parameter values for GHMs, which results in large uncertainty in the estimation of the global water balance components. In this study, a framework is developed for building GHMs based on parameter regionalization of catchment-scale conceptual hydrological models. That is, an appropriate global scale regionalization scheme (GSRS) and conceptual hydrological models are used to simulate runoff at the grid scale globally, and the Network Response Routing (NRF) method is used to converge the grid runoff to catchment streamflow. To achieve this, five regionalization methods (i.e. the global mean method, the spatial proximity method, the physical similarity method, the physical similarity method considering distance, and the regression method) are first tested for four conceptual hydrological models over thousands of medium-sized catchments (2500-50000 km²) around the world to find the appropriate global scale regionalization scheme. The selected GSRS is then used to regionalize conceptual model parameters for global land grids at 0.5° × 0.5° latitude-longitude resolution. The results show that: (1) The spatial proximity method with Inverse Distance Weighting (IDW) and the output average option (SPI-OUT) offers the best regionalization solution, and the greatest gains of the SPI-OUT method are achieved when the mean distance between the donor catchments and the target catchment is no more than 1500 km. (2) A Kling-Gupta efficiency (KGE) value of 0.5 is a good threshold for selecting donor catchments. (3) Four different GHMs established with the framework were able to produce reliable streamflow simulations.
Overall, the proposed framework can be used with any conceptual hydrological model for estimating global water resources, even though uncertainty exists in terms of using different conceptual models.
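
Since the abstract uses a KGE value of 0.5 as the donor threshold, a minimal sketch of how KGE can be computed may help readers follow the discussion. It assumes the original Gupta et al. (2009) formulation, KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2), which may differ from the variant used in the manuscript. The function name `kge` and the plain-list inputs are illustrative choices, not taken from the study.

```python
import math


def kge(sim, obs):
    """Kling-Gupta efficiency, assuming the Gupta et al. (2009) form.

    sim, obs: equal-length sequences of simulated and observed streamflow.
    Returns 1.0 for a perfect match; lower values indicate poorer agreement.
    """
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("sim and obs must have the same length (at least 2)")

    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    # Population standard deviations and covariance.
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n

    if sd_s == 0 or sd_o == 0 or mean_o == 0:
        raise ValueError("KGE is undefined for constant series or zero-mean observations")

    r = cov / (sd_s * sd_o)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mean_s / mean_o    # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Under this sketch, a calibrated catchment would qualify as a donor when `kge(sim, obs) >= 0.5`.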

The "spatial proximity method" may well yield the highest KGE, but it cannot be used at global scale due to the lack of gauges in many regions. As an example, grid cells in the southern Ecuadorian Andes would receive parameters from donor catchments located in the Amazon, which doesn't make any sense. So while this approach may give you the best performance scores, it is not actually a method that should be used at global scale. This should be explicitly mentioned in the abstract and the conclusion. Even if the "mean distance between the donor catchments and the target catchment is no more than 1500 km" the actual difference in climate and landscape could be huge.
Reply: Thanks. This is indeed one of the limitations of the spatial proximity method. Beck et al. (2016, 2020) demonstrated that regionalization approaches provide slightly less (but still substantial) benefit in poorly gauged regions. According to the results shown in Figure 6, the advantage of the SPI-OUT method is reduced with the increased mean distance between donor and ungauged catchments, and it is not an easy task to decide which method outperforms others. However, when the mean distance is less than 1500 km, SPI-OUT generally performs better than or comparable to other regionalization methods in this study. Therefore, we took 1500 km as our threshold to get the optimal results of this method and decrease the influence of the increasing distance.
In addition, the high degree of variability in meteorological and hydrological variables, together with the sparsity of meteorological and hydrological stations, makes it difficult to effectively simulate runoff in poorly gauged regions. The selection of catchment attributes that properly represent catchment similarity is also more difficult in these regions (Merz et al., 2020).
It is true that a lack of gauges, or too few gauges, in many regions is a common problem for all regionalization methods at global scale. This problem is even more pronounced for methods other than spatial proximity, for example the regression method and the physical similarity method, which place no control on the distance to the donor catchments. Compared with other methods, spatial proximity is the only method that tries to choose the catchments nearest to the target catchment. Therefore, given these limitations, using the spatial proximity method may be reasonable, as it performs the best.
Following the reviewer's comment and advice, the above discussion will be added in the revised version.

Reply: Yes, the k value in PCGLOBWB 2.0 is calculated following the drainage theory of Kraijenhoff van de Leur (1958), based on drainage network density and aquifer properties, as in PCGLOBWB 1.0. In addition, PCGLOBWB 2.0 offers the possibility of coupling with MODFLOW to calculate groundwater heads and flow paths. We will add this clarification in the revised manuscript.

References:
Kraijenhoff van de Leur, D. A.: A study of non-steady groundwater flow with special reference to a reservoir coefficient, De Ingenieur, 70, B87-B94, 1958.

... so I'm surprised that this isn't reflected in your mean. Did you apply some sort of mask, or did you cap the values before calculating the mean? This info needs to be in the caption.
Reply: Sorry for the confusion. We did not apply any pretreatment before calculating the mean value of the aridity index. Table 1 shows the descriptors of each catchment. Only 247 catchments belong to the arid climate according to the Köppen-Geiger climate classification, and most of them are in semiarid steppe regions. Therefore, the mean value of the aridity index is not that high. A clearer explanation will be added in the revised manuscript.

References:
Kottek

Reply: Aridity and potential evaporation were derived from the Global Aridity and PET Database (Zomer et al., 2008). Terrain characteristics were derived from the Hydro1K Database (USGS, 1996). Land use was derived from the GlobCover Land Cover Maps (Bichero et al., 2011). The soil index was obtained from the Harmonized World Soil Database (FAO and ISRIC, 2012). All the information above will be added in the revised manuscript.

Section 2.2: Hydro1K is a very outdated dataset. A newer dataset should be used, maybe HydroSHEDS or MERIT.

Reply: Thanks for the suggestion. In recent years, many high-resolution topographical data sets that are potentially helpful in producing more accurate hydrography maps have been released (e.g., HydroSHEDS, MERIT, and ASTER GDEM), which are useful for regional modeling studies. However, the quality of Hydro1K has been confirmed by many previous studies. In the last few years, Hydro1K has been widely used and has become the most commonly used global ancillary dataset for topographic index values. Therefore, given the relatively coarse resolution used in this study (0.5°), the efficiency of Hydro1K, and the large amount of computation required to change the topography data set, we chose to use Hydro1K in this study. The use of updated datasets will be presented as a perspective in the discussion of the revised manuscript.

Reply: Thanks for the suggestion. The World Map of Köppen-Geiger climate classification used in this study is from Kottek et al. (2006). The climate classification of each of the 2277 catchments in Figure 1 will be changed to the 5 major classes in the revised manuscript.

References:
Kottek, M., Grieser, J., Beck, C., Rudolf, B., and Rubel,

Reply: Thanks for the suggestion. The simultaneous regionalization method mentioned by the reviewer, in which parameter regionalization is carried out through simultaneous calibration of transfer function parameters by assuming prior relationships between basin predictors, was proposed to solve the parameter equifinality problem and has been widely used in recent years (Hundecha and Bárdossy, 2004; Samaniego et al., 2010, 2017; Beck et al., 2020). However, the high degree of variability in meteorological and hydrological variables, as well as different simplifications of hydrological processes in hydrological models, make it difficult to effectively select the catchment attributes and the proper transfer functions of parameters (Mizukami et al., 2017). In particular, the high computational cost of performing this regionalization method makes it difficult to use in this study.
In addition, the regionalization methods used in this study are the most commonly used ones, and they are less demanding in terms of data and computation. Although they do have some limitations, these methods have been successfully used in the regionalization of ungauged catchments all over the world. These methods also show reasonable regionalization performance in this study. Moreover, the GHMs built with the proposed framework show reasonable efficiency in global water resource estimation and are comparable to previous GHMs in catchment streamflow simulation (Widén-Nilsson et al., 2007; Beck et al., 2016; Arheimer et al., 2020). Therefore, we would like to conduct a more comprehensive comparative study of all the methods on well-selected, data-rich regions in a future study.
"The regression-based method assumes that a well-behaved relationship exists in the observable catchment characteristics and model parameters" which in reality is almost never the case due to parameter equifinality and therefore this approach rarely works. Hence my suggestion to test the other regionalization approach.
Reply: Thanks for the suggestion. We agree that the regression-based method suffers most from the equifinality problem, since equifinality prevents us from finding the so-called "true parameter values". Therefore, other methods are more favorable. The regression method will be deleted in the revised manuscript. In addition, we will adjust the contents and results in the revised manuscript accordingly.
"The SP method assumes that nearby catchments should have similar behavior for climate and catchment conditions (features) varying uniformly in space." "Nearby" could be several thousand kilometers away so this approach should never be used at global scale.
Reply: Sorry that we failed to explain it clearly enough in the original manuscript. As we explained in the reply to the first comment above, a lack of gauges, or very few gauges, in many regions is a common problem for all regionalization methods at global scale. This problem is even more pronounced for methods other than spatial proximity, for example the regression method and the physical similarity method, which place no control on the distance to the donor catchments. Compared with other methods, spatial proximity is the only method that tries to choose the catchments nearest to the target catchment. Studies have shown that this is likely the most suitable method at global scale since other methods, among other problems, can choose donor catchments at much greater distances than the SP method. Other studies also show that the spatial proximity method chooses donor catchments that are, on average, closer than those chosen by other methods. We will add clarification and discussion of this important issue in the revised manuscript.
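
To make the spatial proximity scheme with IDW and output averaging (SPI-OUT) more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The number of donors, the IDW power of 2, the dictionary layout of a catchment, and the `simulate` callback are all hypothetical. Only the KGE threshold of 0.5 and the 1500 km mean-distance figure come from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def select_donors(target, catchments, n_donors=5, kge_min=0.5):
    """Return the nearest calibrated catchments with KGE >= kge_min.

    Each catchment is a dict with keys "lat", "lon", "kge", "params"
    (a hypothetical layout). The result is a list of (distance_km, catchment).
    """
    eligible = [c for c in catchments if c["kge"] >= kge_min]
    ranked = sorted(
        ((haversine_km(target["lat"], target["lon"], c["lat"], c["lon"]), c)
         for c in eligible),
        key=lambda pair: pair[0],
    )
    return ranked[:n_donors]


def mean_donor_distance(donors):
    """Mean distance (km) between the target and its selected donors."""
    return sum(d for d, _ in donors) / len(donors)


def idw_output_average(donors, simulate, power=2.0):
    """Output averaging: run the model once per donor parameter set,
    then combine the simulated series with inverse-distance weights."""
    weights = [1.0 / max(d, 1e-6) ** power for d, _ in donors]
    total = sum(weights)
    runs = [simulate(c["params"]) for _, c in donors]
    n_steps = len(runs[0])
    return [
        sum(w * run[t] for w, run in zip(weights, runs)) / total
        for t in range(n_steps)
    ]
```

In this sketch, `mean_donor_distance(donors) <= 1500` could serve as a check on whether a target grid cell falls within the range where SPI-OUT showed its greatest gains.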
Another limitation of the study is that lumped catchment attribute and model parameter values are used, despite the often large heterogeneity within catchments. This limitation is addressed in several studies (e.g., Samaniego 2008 and Beck 2020) and should at least be discussed somewhere in the paper.
Reply: Thanks for the suggestion. Lumped models consider the catchment as a whole and thus ignore the within-catchment heterogeneity in landscape and climate (Beck et al., 2020; Samaniego et al., 2010). However, due to the large number of basins and the large amount of computation involved, the use of distributed models in the comparison of regionalization methods and the selection of the global scale regionalization scheme may not be appropriate. Therefore, four conceptual hydrological models frequently used in regionalization studies were selected to compare the performance of regionalization methods and to find the optimal global scale regionalization scheme in this study. We will add this in the discussion of the revised manuscript.

"The results show that the distributions of model efficiency of four hydrological models are similar to each other and indicate that the difference between hydrological models was negligible in the model calibration and validation, which is in line with previous studies (Beck et al., 2016; Vetter et al., 2015; Demirel et al., 2015)." This is definitely not in line with previous studies, which generally found large differences between models and a considerable difference between calibration and validation scores. You actually also obtained substantial differences between calibration and validation scores (figure 2)!

Reply: Sorry that we failed to make it clear enough in the previous version. Here we would like to illustrate that, from the calibration to the validation period, the differences in KGE among hydrological models are similar. The sentence will be changed to "The results show that the distributions of model efficiency of four hydrological models are similar to each other. In particular, the difference of KGE among hydrological models is similar from the calibration to the validation periods." This sentence will be corrected in the revised manuscript.

Figures 3 and 10: This figure is impossible to interpret. https://www.climate-lab-book.ac.uk/2016/why-rainbow-colour-scales-can-be-misleading/

Reply: These two figures will be revised in the revised manuscript.

Figure 5: The figure is a bit difficult to interpret, maybe use vertical instead of diagonal x-axis labels, and apply some coloring to group similar methods together?

Reply: Thanks for the suggestion. The figure will be revised.

Figure 6: There is no information about the number of catchments representing each bar, so a particular bar could be represented by just 1 catchment. I can't think of a solution right now, but this information should not be hidden.
Reply: Thanks for the referee's suggestion. This figure will be redrawn and the information will be added in the revised manuscript.

Reply: Thanks. This will be revised.

Table 6: "many data may not be directly comparable because of different continental boundaries and averaging periods." A solution would be to only use estimates representing the same area. Also, why report values from a study >40 years old (Korzun 1978)? Consider adding GSCD estimates (http://www.gloh2o.org/gscd/).
Reply: Thanks for the suggestion. Using only estimates that represent the same area is the optimal way to compare the global water resources simulated in this study. However, it is hard to realize because of the difficulty in obtaining fully coherent data. We tried to emphasize that the comparison with discharge values from the literature may not be straightforward because of the different continental boundaries and averaging periods used in different studies. Therefore, the high uncertainty that exists in global and continental mean annual discharge simulation should not be ignored, and more efforts should be made to improve the efficiency of global hydrological modeling.
Besides, as one of the initial reports that summarize the water balance for the globe, the results of Korzun et al. (1978) have played important roles in global water management and provided a base for global water balance research (Jones et al., 1979;Döll et al., 2003;Oki et al., 2006;Fasullo et al., 2007). Therefore, taking this report into account is of great use in making this comparison complete.
The GSCD estimates will be added in the comparison.
All the information above will be added in the revised manuscript.