On the selection of precipitation products for the regionalisation of hydrological model parameters

Oscar M. Baez-Villanueva (1,2), Mauricio Zambrano-Bigiarini (3,4), Pablo A. Mendoza (5,6), Ian McNamara (1), Hylke E. Beck (7), Joschka Thurner (1), Alexandra Nauditt (1), Lars Ribbe (1), and Nguyen Xuan Thinh (2)

(1) Institute for Technology and Resources Management in the Tropics and Subtropics (ITT), TH Köln, Cologne, Germany
(2) Faculty of Spatial Planning, TU Dortmund University, Dortmund, Germany
(3) Department of Civil Engineering, Universidad de la Frontera, Temuco, Chile
(4) Center for Climate and Resilience Research, Universidad de Chile, Santiago, Chile
(5) Department of Civil Engineering, Universidad de Chile, Santiago, Chile
(6) Advanced Mining Technology Center (AMTC), Universidad de Chile, Santiago, Chile
(7) GloH2O, Almere, the Netherlands

Correspondence: Mauricio Zambrano-Bigiarini (mauricio.zambrano@ufrontera.cl)

To date, few regionalisation studies have used gridded P products at the daily time scale. Beck et al. (2016) used the Climate Prediction Center unified gauge-based P product (CPC) to provide spatially distributed HBV parameters at the global scale. They selected CPC because it yielded better performance than ERA-Interim during calibration. Rakovec et al. (2016) used the European daily high-resolution gridded dataset (E-OBSv8.0) to force a mesoscale hydrological model over 400 catchments in Europe, providing regionalised model parameters through a multivariate parameter estimation technique. More recently, Beck et al. (2020a) combined MSWEPv2.2 with a novel multiscale parameter regionalisation approach to provide global gridded parameter estimates using daily Q observations from 4,229 catchments. Although these studies have successfully used gridded P products for parameter regionalisation, each of them selected only one product, and thus the effects that the choice of a P dataset can have on regionalisation results remain unknown. This study aims to answer the following questions:
i) to what extent does the choice of gridded P forcing used in calibration affect the relative performance of regionalisation techniques?
ii) how does this relative performance vary across catchments with different hydrological regimes?

Table 1. Summary of selected regionalisation studies that used spatial proximity (SP), feature similarity (FS), parameter regression (PR), or multiscale parameter regionalisation (MPR). This study has been added for completeness.

| Study | Study area | Catchments / evaluation | Approach | Relevant conclusion |
|---|---|---|---|---|
| McIntyre et al. (2005) | United Kingdom | 127 / Leave-one-out cross-validation | SP and FS | The transfer of complete model parameter sets increased the performance of regionalisation. The use of the 10 best model parameter sets provided a more robust representation of flood peaks and generated a better ensemble of the overall flow regime, although flow peaks were underestimated. A comparison against the PR approach showed that FS produced better results. |
| | Austria | 320 / Leave-one-out cross-validation | SP, FS, and PR | All methods performed better than the average of the model parameters of all catchments. Two methods performed the best: FS and an SP kriging approach, where the model parameters were regionalised independently based on their spatial correlation. Local regression methods outperformed the global regression method, highlighting the importance of accounting for regional differences during PR. |
| Oudin et al. (2008) | France | 913 / Leave-one-out cross-validation | SP, FS and PR | SP performed the best, followed closely by FS. The reduced performance of FS was attributed to the lack of soil-related properties used as inputs. To construct the ensemble output using multiple catchments, averaging the Q time series performed better than averaging the model parameters. They concluded that the dense network of catchments favoured the SP method. |
| Samaniego et al. | Germany | 1 / 10 stations within the study area | MPR | The MPR method showed improved results compared to the standard PR when the global parameters were calibrated at a coarser modelling scale and then transferred to a finer one. |
| Bao et al. (2012) | China | 55 / Leave-one-out cross-validation | FS and PR | FS outperformed PR over both humid and arid regions. Moving from humid to arid regions, the degree to which the FS approach outperformed PR increased. |
| Zelelew and Alfredsen (2014) | Southern Norway | 11 / Leave-one-out cross-validation | SP and FS | The ensemble of the 10 most similar catchments outperformed the other approaches (the performance increased when 2-6 catchments were used). They recommended identifying the parameters that influence the model response in order to minimise the model parametric dimensionality. |
| | Southern France | 16 / Leave-one-out cross-validation | SP and FS | FS outperformed SP. They reported only a small decrease of performance from calibration/verification to regionalisation (∼10%) when evaluated during flash flood events. Using an ensemble of 2-4 donor catchments yielded the best regionalisation performance. Using well-modelled catchments does not always produce good performances during regionalisation and parameter sets from low performing catchments can produce higher performances when transferred to ungauged settings. |
| Athira et al. (2016) | Conterminous USA | 8 / Leave-one-out cross-validation | PR | The parameter values using multi-linear regression models were different to those obtained through model calibration, indicating the deficiency of regionalising the parameters directly as a function of catchment attributes. For the one catchment where SP was also tested, PR performed better. |
| Beck et al. (2016) | Global | 674 / 1,113; independent evaluation | FS | The derived global maps of HBV parameter sets conform well with large-scale climate patterns, demonstrating the effect of climate on rainfall-runoff patterns. For 79% of catchments, the averaging of model outputs (from 10 donor catchments) outperformed the use of spatially uniform parameters. P underestimation appeared to be the dominant cause of low calibration scores, particularly for tropical and arid catchments. |
| Rakovec et al. (2016) | Europe | 36 / 400, cross-validation | MPR | The model performed well in simulating daily Q over a wide range of physiographic and climatic conditions, with median KGE's greater than 0.55. This performance reduced in heavily regulated catchments. Further evaluation against complementary datasets showed the best agreement for ET, followed by TWS, and the lowest for SM. |
| Swain and Patra (2017) | India | 32 / Leave-one-out cross-validation | SP, FS, and PR | SP (both kriging and IDW) outperformed PR and FS. The methods were evaluated against a global mean approach, which produced worse results than all tested regionalisation methods. |
| Beck et al. (2020a) | Global | 4,229 / Ten-fold cross-validation | MPR | They incorporated within-catchment variability in climate and landscape, and yielded an improvement in 88% of the catchments (median KGE' improved from 0.19 to 0.46). They found a weak positive correlation between regionalisation performance and catchment humidity. Considerable improvements were obtained for catchments located both near and far from those used for optimisation. Q simulation performance was best in humid regions and worst in arid regions. |
| Neri et al. (2020) | Austria | 209 / Leave-one-out cross-validation | SP and FS | Compared to the results of the independent calibration/verification, the regionalisation performance using the TUWmodel deteriorated less than using the GR6J model. With a high density of gauged stations, both the SP and FS performed similarly well, but the results deteriorated with reduced gauge density (especially for SP). Transferring the parameter sets of more than one single catchment improves the regionalisation performance. |
| This study | Chile | 100 / Leave-one-out cross-validation | SP, FS and PR | FS was the best performing method, followed by SP. The use of merged P products does not necessarily translate into an improved hydrological modelling performance. Strong performance of a P product for calibration and validation does not necessarily translate into strong performance for regionalisation. The performance of regionalisation methods depends on the hydrological regime. |

Study area and selection of catchments
Our study domain is continental Chile (Figure 1), which is bounded to the west by the Pacific Ocean, to the north by Peru, and to the east by Bolivia and Argentina. The territory spans 4300 km of latitudinal extension (17.5°S-56.0°S) and on average 180 km of longitudinal extension (76.0°W-66.0°W), with elevation (Jarvis et al., 2008) ranging from 0 to 6892 m a.s.l. in the Andes Mountains. Figure 1 shows the elevation, land cover (Zhao et al., 2016), Köppen-Geiger climate classification (Beck et al., 2018), and hydrological regimes for the five major macroclimatic zones presented in Zambrano-Bigiarini et al. (2017). A large variety of climates are present across the country, with the macroclimatic zones transitioning from the (hyper)arid and semi-arid climates in the Far North (17.50-26.00°S) and Near North (26.00-32.18°S), through temperate climates in Central Chile (32.18-36.40°S), to more humid and polar climates in the South (36.40-43.70°S) and Far South (43.70-56.00°S). P increases with altitude and latitude (in the southern direction), ranging from almost zero in the Atacama Desert to ∼6000 mm yr⁻¹ in the surroundings of Puerto Cardenas (∼43.2°S). Similar to the P patterns, both the mean annual Q and rainfall-runoff ratio tend to increase from north to south (Alvarez-Garreton et al., 2018; Vásquez et al., 2021).

The El Niño-Southern Oscillation (ENSO) has a large impact on winter P, with negative anomalies during La Niña and positive anomalies during El Niño events (Verbist et al., 2010; Robertson et al., 2014). Although neutral ENSO conditions have prevailed since 2011 (except for a strong El Niño event during 2015), an uninterrupted sequence of dry years with increased temperatures was observed from 2010 to 2018, with annual P deficits of about 25-45% across Chile. This long-term deficit in P volume, also known as the Chilean megadrought (Boisier et al., 2016; Garreaud et al., 2017), has reduced snow cover, river flows, reservoir storage, and groundwater levels across Chile (Garreaud et al., 2017, 2020).

Hydroclimatic indices and characteristics for 516 catchments in continental Chile were acquired from the Catchment Attributes and MEteorology for Large-sample Studies dataset in Chile (CAMELS-CL; Alvarez-Garreton et al., 2018). The dataset includes location, topography, geology, soil types, land cover, hydrological signatures, and human intervention degree, among others. Q data were obtained from the Center for Climate and Resilience Research (CR2; http://www.cr2.cl/datos-de-caudales/, last accessed October 2020) for 1930-2018 because Q data from CAMELS-CL ended in 2016 at the time of conducting this study. We selected the near-natural catchments from the CAMELS-CL database that fulfilled the following criteria:

1. Less than 25% of missing values in the daily Q time series for 1990-2018 (may be non-consecutive).

The drainage areas of the 100 selected catchments range from 35 to 11,137 km², with a median value of 645 km². The selected catchments include 42 nested catchments (i.e., catchments that are contained in a larger catchment). We adjusted the classification of these catchments according to hydrological regime, building on the classifications presented in several national and regional technical reports (e.g., DGA, 1998, 1999, 2004a, b, c, 2006, 2016a, b, 2018), by visually analysing the contribution of solid and liquid P to the mean monthly Q values. The regimes were classified as: i) snow-dominated; ii) nivo-pluvial, i.e., snow-dominated with a rain component; iii) pluvio-nival, i.e., rain-dominated with a snow component; and iv) rain-dominated, as shown in Figure 1d. Conceptual hydrographs for each of these regimes are presented in Figure A1 (Appendix A).

Precipitation products
Four P products were used to investigate how the choice of P forcing affects the performance of regionalisation techniques.
The P products are presented in Table 2, and were selected because previous studies have reported that they agree well with in situ measurements over continental Chile (Boisier et al., 2018; Baez-Villanueva et al., 2018).

The Center for Climate and Resilience Research Meteorological dataset version 2.0 (CR2MET; Boisier et al., 2018) provides daily gridded P estimates over continental Chile at a 5 km spatial resolution for 1979-2018. These estimates are produced by combining rain gauge observations with reanalysis data from ERA5, whereas version 1.0 of this product was produced using ERA-Interim data. As CR2MET was developed specifically for Chile and uses all the Chilean rain gauges (874 across Chile; see Figure S1 in the supplement), it is considered the 'reference' P product for Chile.

The random forest merging procedure (RF-MEP; Baez-Villanueva et al., 2020) combines gridded P products, ground-based measurements, and other spatial covariates to generate P estimates. We applied this methodology to generate a spatially distributed, daily P product for continental Chile, using daily records from 334 rain gauges (obtained from CR2; http://www.cr2.cl/datos-de-precipitacion/), gridded P data from the ERA5 reanalysis (Hersbach et al., 2020) [...]

ERA5 (Hersbach et al., 2020) is a reanalysis product that provides hourly P estimates (as well as other variables) from 1950 to present at a spatial resolution of around 30 km (∼0.28°). There are important improvements in its P estimates compared to its predecessor ERA-Interim, such as improved i) representation of mixed-phase clouds; ii) prognostic variables for rain and snow; iii) parametrisation of microphysics; and iv) representation of tropical variability (Hersbach et al., 2020). Although ERA5 also assimilates NCEP Stage IV P estimates over the conterminous USA, which combine NEXRAD data with in situ measurements, it does not incorporate information from any ground-based P stations over Chile. Hourly ERA5 estimates were aggregated into daily P values taking into account the reporting times of the Chilean rain gauges (08:00-07:59 local time, which represents 11:00-10:59 UTC). Although this product has a relatively low spatial resolution compared to the remaining products, we included it because i) Chile is dominated by large-scale frontal systems and, therefore, coarse-resolution products may perform well even over small catchments; ii) reanalysis products tend to perform well at high latitudes (Beck et al., 2017a); and iii) we consider that its inclusion represents a realistic situation that may exist in many practical applications (i.e., where a catchment size is small relative to P product resolution).
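The aggregation of hourly estimates into gauge-compatible days can be illustrated with a minimal sketch. It assumes that each hourly record is stamped with the UTC start of its interval and that a daily total is labelled with the date on which its 11:00 UTC window opens; neither convention is stated in the text, and the function name `aggregate_to_gauge_days` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: each record is (UTC timestamp marking the START of an hourly
# interval, precipitation depth in mm for that hour).
REPORTING_OFFSET = timedelta(hours=11)  # gauge day opens at 11:00 UTC (08:00 local)

def aggregate_to_gauge_days(hourly):
    """Sum hourly P into 11:00-10:59 UTC windows, keyed by the window's start date."""
    daily = defaultdict(float)
    for timestamp_utc, p_mm in hourly:
        # Shifting by the offset maps the whole 11:00-10:59 window onto one date.
        gauge_day = (timestamp_utc - REPORTING_OFFSET).date()
        daily[gauge_day] += p_mm
    return dict(daily)

# Usage: 24 hourly values of 1 mm starting at 2000-01-01 11:00 UTC
# all fall into a single gauge day.
start = datetime(2000, 1, 1, 11)
hourly = [(start + timedelta(hours=h), 1.0) for h in range(24)]
daily = aggregate_to_gauge_days(hourly)
```

The same shift would apply to 3-hourly data, provided that the 3-hourly intervals align with the 11:00 UTC boundary or are disaggregated first.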
The Multi-Source Weighted-Ensemble Precipitation (MSWEPv2.8; Beck et al., 2017b, 2019) is a 3-hourly P product with a spatial resolution of 0.10°, which takes advantage of the complementary strengths of satellite, reanalysis and ground-based data. MSWEPv2.8 applies daily and monthly corrections to its estimates using data from around 77,000 rain gauge stations globally (628 of these are over Chile, see Figure S1), accounting for their local reporting times. The 3-hourly MSWEPv2.8 estimates were also aggregated into daily P to account for the difference in the reporting times.

Figure 2a shows the spatial distribution of mean annual P for all products over 1990-2018, while Figure 2b shows boxplots of the mean monthly P averaged over catchments located within each macroclimatic zone. All P products show relatively similar patterns of spatial variability across continental Chile; however, there are substantial differences in their total P amounts. In general, P increases from the (hyper-arid) Far North to the South, and decreases again in the Far South. P also increases from the west coast towards the Andes Mountains. ERA5 provides higher P amounts over all five macroclimatic zones, while RF-MEP generally yields the lowest annual P values. Over the Far North, all products show a marked rainy season during December-March due to summer convective P, which differs from the marked seasonality evident over the Near North, Cen- [...]

To gain a deeper understanding of the differences between the four P products, we examined the spatial distribution of median annual values of four Climdex Indices (Karl et al., 1999) for 1990-2018 (Figure 3). First, to account for days without rain (P < 1 mm), we used the consecutive dry days index (CDD; Figure 3a), which retrieves the maximum dry spell length. It is evident that CR2MET yields longer dry spells, mainly across the Far North and Near North regions, while ERA5 has shorter dry spells over these regions, especially over the Andes Mountains. CR2MET, RF-MEP, and MSWEPv2.8 have similar spatial patterns over the Central Chile and South regions, while ERA5 has fewer consecutive dry days over the Andes Mountains. Similarly, ERA5 provides shorter dry spells over the Far South, while CR2MET and RF-MEP present similar patterns. These results are consistent with the consecutive wet days index (CWD; Figure 3b), which assesses the frequency and intermittency of P. ERA5 provides the highest CWD values over the driest regions (Far North and Near North), with medians ranging from 0 to 25 days, followed by MSWEPv2.8 (0 to 15 days). ERA5 also shows higher CWD values over high-elevation areas in Central Chile, while the remaining products show similar spatial patterns to each other. The four products show agreement in the CWD over the South region, with values ranging from 5 to 25 days. Finally, RF-MEP shows the lowest consecutive days with P in the Far South, followed by CR2MET and MSWEPv2.8, while ERA5 shows substantially higher CWD values at latitudes greater than 47°S.

To characterise high P intensities, we used the Rx5day (Figure 3c) and R95pTOT (Figure 3d) indices, which represent the maximum P accumulated over five consecutive days, and the total P above the 95th percentile of the daily P for wet days, respectively. Figure 3c shows that ERA5 and CR2MET generally yield the highest Rx5day values, followed by MSWEPv2.8 and RF-MEP. A similar spatial variability is obtained with R95pTOT (Figure 3d), indicating that there is a greater contribution of P from extreme events in ERA5 over high-elevation areas. These spatial patterns are replicated to some extent by CR2MET, which provides R95pTOT values up to 1200 mm over the Andes Mountains in Central Chile.

Figure 2. a) Spatial distribution of mean annual P for all products over 1990-2018; b) mean monthly P averaged over the catchments located within each macroclimatic zone (see Figure 1d).

Figure 3. Spatial distribution of median annual values of four Climdex indices for 1990-2018: a) consecutive dry days (CDD); b) consecutive wet days (CWD); c) maximum P over five consecutive days (Rx5day); and d) annual P above the 95th percentile of the daily P for wet days (R95pTOT). The dark red horizontal lines represent the limits of each macroclimatic zone.
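The four indices discussed above can be sketched for a single year of daily P as follows. This is an illustrative reading of the definitions given in the text, not the Climdex reference implementation: the percentile is interpolated linearly and is taken from the same series rather than from a fixed base period, both of which are assumptions, and all function names are hypothetical.

```python
WET_DAY_MM = 1.0  # wet-day threshold used in the text (P >= 1 mm)

def longest_run(flags):
    """Length of the longest run of True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

def percentile(values, q):
    """Linear-interpolation percentile (assumed method; other conventions exist)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def climdex_indices(daily_p):
    """CDD, CWD, Rx5day and R95pTOT for one series of daily P (mm, >= 5 values)."""
    wet = [p for p in daily_p if p >= WET_DAY_MM]
    p95 = percentile(wet, 95) if wet else float("inf")
    return {
        "CDD": longest_run(p < WET_DAY_MM for p in daily_p),
        "CWD": longest_run(p >= WET_DAY_MM for p in daily_p),
        "Rx5day": max(sum(daily_p[i:i + 5]) for i in range(len(daily_p) - 4)),
        "R95pTOT": sum(p for p in daily_p if p > p95),
    }

# Usage with a short toy series (mm per day).
series = [0, 0, 0, 5, 10, 20, 0, 2, 0, 0, 0, 0, 40, 3, 0]
indices = climdex_indices(series)
```

For the toy series, the longest dry spell lasts four days, the longest wet spell three days, the wettest five-day window sums to 43 mm, and only the 40 mm day exceeds the wet-day 95th percentile.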

Air temperature and potential evaporation
Maximum and minimum daily air temperature (T) at a spatial resolution of 0.05° were taken from CR2MET. In this product, T is estimated using multivariate regression with the Moderate Resolution Imaging Spectroradiometer (MODIS) land surface temperature (LST) and ERA5 estimates as covariates (Alvarez-Garreton et al., 2018; Boisier et al., 2018). The Hargreaves-Samani equation (Hargreaves and Samani, 1985) was used to obtain daily potential evaporation (PE) from CR2MET maximum and minimum daily T at the same spatial resolution (0.05°).
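For illustration, a minimal sketch of the Hargreaves-Samani calculation is given below. It assumes the commonly used form PE = 0.0023 Ra (Tmean + 17.8) (Tmax - Tmin)^0.5, with the extraterrestrial radiation Ra computed from latitude and day of year following the FAO-56 formulation; the exact variant and radiation source used in the study are not stated, and the function names are hypothetical.

```python
import math

SOLAR_CONSTANT = 0.0820  # MJ m-2 min-1 (FAO-56)

def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1), FAO-56 formulation."""
    phi = math.radians(lat_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    dr = 1.0 + 0.033 * math.cos(angle)        # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)    # solar declination (rad)
    # Clip so that polar day/night does not push acos outside its domain.
    cos_ws = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(cos_ws)                    # sunset hour angle (rad)
    return (24.0 * 60.0 / math.pi) * SOLAR_CONSTANT * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )

def hargreaves_samani_pe(tmax_c, tmin_c, lat_deg, day_of_year):
    """Daily PE (mm day-1) from daily maximum and minimum T (deg C)."""
    ra_mm = 0.408 * extraterrestrial_radiation(lat_deg, day_of_year)  # MJ m-2 -> mm
    tmean = 0.5 * (tmax_c + tmin_c)
    t_range = max(tmax_c - tmin_c, 0.0)       # guard against Tmax < Tmin
    return max(0.0023 * ra_mm * (tmean + 17.8) * math.sqrt(t_range), 0.0)

# Usage: a mid-January day near Santiago (latitude is illustrative).
pe = hargreaves_samani_pe(tmax_c=25.0, tmin_c=10.0, lat_deg=-33.45, day_of_year=15)
```

Because the equation needs only temperature and latitude, it can be evaluated cell by cell on the 0.05° temperature grid.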
The TUWmodel requires daily time series of P, T, and PE as inputs. The parameters used by the TUWmodel to represent the hydrological processes are listed in Table 3, including the ranges selected for model calibration, which were adopted from previous studies (Parajka et al., 2007; Ceola et al., 2015) that calibrated the TUWmodel over a large number of mountainous [...]

Independent catchment calibration and verification
The simulation period used for this study was 1990-2018. For calibration purposes, we used the first ten years as a conservative warm-up period to initialise the model stores, as in Beck et al. (2020a). The calibration period (2000-2014) includes near-normal conditions and the beginning of the Chilean megadrought. The first evaluation period (hereafter, Verification 1, 1990-1999) represents near-normal/wet hydroclimatic conditions, while the second evaluation period (hereafter, Verification 2, 2015-2018) spans the second half of the Chilean megadrought, and was used to test the ability of the hydrological simulations to represent dry conditions. To initialise model stores for the Verification 1 period, we used an 8-year warm-up period due to P product availability. We replicated Figures 2 and 3 for these three periods to analyse the differences between the selected P products (see the supplement, Figures S2-S7).

We used the modified Kling-Gupta efficiency (KGE', Eq. 1; Kling et al., 2012) to calibrate the TUWmodel, which typically provides better hydrograph simulations than other squared-error indices (Kling et al., 2012; Mizukami et al., 2019) and has been used in numerous studies (e.g., Garcia et al., 2017; Beck et al., 2019; Baez-Villanueva et al., 2020; Neri et al., 2020; Széles et al., 2020). The KGE' has three components: the Pearson correlation coefficient (r; Eq. 2); the bias ratio (β; Eq. 3); and the variability ratio (γ; Eq. 4):

$$\mathrm{KGE'} = 1 - \sqrt{(r - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2} \qquad (1)$$

$$r = \frac{\mathrm{cov}(Q_s, Q_o)}{\sigma_s \, \sigma_o} \qquad (2)$$

$$\beta = \frac{\mu_s}{\mu_o} \qquad (3)$$

$$\gamma = \frac{CV_s}{CV_o} = \frac{\sigma_s / \mu_s}{\sigma_o / \mu_o} \qquad (4)$$

where µ is the mean Q, CV is the coefficient of variation, σ represents the standard deviation of Q, cov is the covariance between simulated and observed Q, and the subscripts s and o represent simulated and observed Q, respectively. The KGE' and its components have their optimum value at one, and its optimisation seeks to reproduce the temporal dynamics (measured by r), while preserving the volume and variability of Q, measured by β and γ, respectively (Kling et al., 2012).
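As an illustration of how the metric and its components can be computed, the sketch below follows the definition of Kling et al. (2012), i.e., KGE' = 1 - sqrt((r - 1)^2 + (β - 1)^2 + (γ - 1)^2), with β = µs/µo and γ = CVs/CVo. It assumes complete, equally long series with no missing values; the function name `kge_prime` is hypothetical.

```python
import math

def kge_prime(sim, obs):
    """Modified Kling-Gupta efficiency and its components (r, beta, gamma)."""
    n = len(obs)
    mu_s, mu_o = sum(sim) / n, sum(obs) / n
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)                 # temporal dynamics
    beta = mu_s / mu_o                      # bias ratio (volume)
    gamma = (sd_s / mu_s) / (sd_o / mu_o)   # variability ratio (CV_s / CV_o)
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return kge, r, beta, gamma

# Usage: a perfect simulation, and one that doubles every observed value.
obs = [1.0, 2.0, 4.0, 3.0, 2.0]
kge_perfect, *_ = kge_prime(obs, obs)
kge_biased, r, beta, gamma = kge_prime([2.0 * q for q in obs], obs)
```

Doubling the flows leaves r and γ at one but sets β to two, so the score drops from one to zero purely through the bias term, which shows how the three components separate timing, volume, and variability errors.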
To calibrate the model parameters, we used the hydroPSO global optimisation algorithm (Zambrano-Bigiarini and [...]), which implements a state-of-the-art version of the Particle Swarm Optimisation technique (PSO; Eberhart and Kennedy, 1995; Kennedy and Eberhart, 1995). We used the standard PSO 2011 algorithm (Clerc, 2011a, b), defined as spso2011 in the hydroPSO R package (Zambrano-Bigiarini and [...]). We set the number of particles in the swarm (npart = 80), the maximum number of iterations (maxit = 100), and the relative convergence tolerance (reltol = 1E-10), while the default values were used for all other parameters. Over the last decade, hydroPSO has been successfully used to calibrate numerous hydrological and environmental models (e.g., Brauer et al., 2014a, b; Silal et al., 2015; Bisselink et al., 2016; Kundu et al., 2017; Kearney and Maino, 2018; Abdelaziz et al., 2019; Ollivier et al., 2020; Hann et al., 2021). For more details on the use of the hydroPSO package to calibrate the TUWmodel, readers are referred to Zambrano-Bigiarini and Baez-Villanueva (2020).
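To convey the idea behind the optimiser, the following is a minimal global-best PSO written in plain Python. It is not the SPSO-2011 variant and does not reproduce the hydroPSO interface: the update rule, the coefficient values, the absence of a convergence tolerance, and the names `pso_minimise` and `toy` are all simplifying assumptions. Only the swarm size and iteration count are taken from the text.

```python
import random

def pso_minimise(objective, bounds, npart=80, maxit=100, seed=1):
    """Basic global-best PSO (illustrative; not the SPSO-2011 variant in hydroPSO)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7213, 1.1931, 1.1931   # commonly used inertia/acceleration values
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(npart)]
    vel = [[0.0] * dim for _ in range(npart)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(npart), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(maxit):
        for i in range(npart):
            for d, (lo, hi) in enumerate(bounds):
                # Pull each particle towards its own best and the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # keep in range
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy objective standing in for 1 - KGE'; its minimum (0) lies at (0.3, -1.2).
toy = lambda x: (x[0] - 0.3) ** 2 + (x[1] + 1.2) ** 2
best_x, best_val = pso_minimise(toy, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

In a calibration setting, the objective would run the hydrological model with a candidate parameter set and return 1 - KGE', and the bounds would be the parameter ranges selected for calibration.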

Regionalisation techniques
After obtaining catchment-specific model parameters through independent catchment calibration (Section 3.3), we compared three parameter regionalisation techniques: i) spatial proximity; ii) feature similarity; and iii) parameter regression. We assessed performance through a leave-one-out cross-validation exercise, which consists of leaving out each of the 100 catchments in turn, transferring model parameters, conducting Q simulations and computing performance evaluation metrics.
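The leave-one-out procedure can be summarised as a short loop. The sketch below is schematic: `choose_donors`, `simulate`, and `score` are hypothetical callables standing in for the regionalisation technique, the model run with transferred parameters, and the performance metric, respectively.

```python
def leave_one_out(catchments, choose_donors, simulate, score):
    """Return one regionalisation score per catchment, each treated as ungauged in turn."""
    results = {}
    for target in catchments:
        candidates = [c for c in catchments if c != target]  # hold the target out
        donors = choose_donors(target, candidates)
        simulated_q = simulate(target, donors)               # run with transferred parameters
        results[target] = score(target, simulated_q)
    return results

# Usage with trivial stand-ins: the "score" only checks that the target
# never appears among its own donors.
demo = leave_one_out(
    ["A", "B", "C"],
    choose_donors=lambda target, candidates: candidates[:1],
    simulate=lambda target, donors: donors,
    score=lambda target, sim: target not in sim,
)
```

Keeping donor selection, simulation, and scoring as separate arguments lets the same loop serve all three regionalisation techniques.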

Spatial proximity
The spatial proximity method assumes that climatic and physical characteristics are relatively homogeneous over a region (Oudin et al., 2008). We quantified the spatial proximity between the target pseudo-ungauged catchment and the remaining catchments using the Euclidean distance between catchment centroids, computed with geographic coordinates (i.e., latitude and longitude; Eq. 5). For each pseudo-ungauged catchment, the donor was chosen according to the minimum Euclidean distance, and the full parameter set obtained during the independent calibration of the donor catchment was transferred to the pseudo-ungauged catchment.
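A minimal sketch of the donor selection is shown below, taking the distance between two centroids as sqrt((lat_i - lat_j)^2 + (lon_i - lon_j)^2) in decimal degrees, as the description suggests. The centroid coordinates and the function name `nearest_donor` are hypothetical.

```python
import math

def nearest_donor(target, centroids):
    """Return the ID of the catchment whose centroid is closest to the target's.

    Distances are Euclidean in degrees of latitude/longitude, as described in
    the text (not great-circle distances).
    """
    lat_t, lon_t = centroids[target]

    def distance(cid):
        lat, lon = centroids[cid]
        return math.hypot(lat - lat_t, lon - lon_t)

    return min((c for c in centroids if c != target), key=distance)

# Hypothetical centroids (latitude, longitude) in decimal degrees.
centroids = {"A": (-33.0, -70.5), "B": (-33.4, -70.6), "C": (-38.0, -72.0)}
donor = nearest_donor("A", centroids)
```

Note that a degree of longitude shrinks towards the poles, so distances in degrees weight east-west separation differently from distances in kilometres; the sketch follows the text and makes no such correction.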

Feature similarity
In the feature similarity method, we transferred the calibrated parameter sets from 10 donor catchments to the pseudo-ungauged catchment based on the similarity between climatic and geomorphological features, quantified using the catchment characteristics presented in Table 4. To exclude redundant information, we first performed correlation analyses between catchment descriptors using the Pearson and Spearman rank correlation coefficients (to account for linear and monotonic correlation, respectively), and discarded three descriptors with high correlations (mean elevation, mean annual PE, and SDII; see Appendix B). We also discarded snow cover because it was found to be unreliable, leaving nine catchment features for this method. To assign equal weight to each catchment characteristic, all characteristics were normalised into the range [0, 1] using Eq. 6:

$$Z_f = \frac{x_f - x_{\min}}{x_{\max} - x_{\min}} \qquad (6)$$

where Z_f is the normalised value and x_f is the original value of the characteristic for catchment f, while x_max and x_min are the maximum and minimum values of the characteristic x over all catchments. After normalising all catchment characteristics, we calculated the dissimilarity index S_{i,j} between catchments i and j (Eq. 7), where Z_{i,m} and Z_{j,m} are the normalised values of the m-th catchment characteristic for catchments i and j, respectively, and n is the total number of characteristics.

For each pseudo-ungauged catchment i, the 10 catchments j with the lowest dissimilarity indices (S_{i,j}) were selected as donors (Oudin et al., 2008; Zhang and Chiew, 2009; Zhang et al., 2015; Beck et al., 2016). The full parameter sets obtained during the independent calibrations of each donor catchment were used to run TUWmodel in the pseudo-ungauged catchment, thus producing an ensemble of 10 Q simulations, as in previous studies (McIntyre et al., 2005; Zelelew and Alfredsen, 2014; Beck et al., 2016). The 10 Q time series were then averaged to produce a single Q time series.
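The normalisation and donor ranking can be sketched as follows. The min-max scaling follows the description of Eq. 6, whereas the dissimilarity is assumed here to be the sum of absolute differences between normalised characteristics, which may differ from the exact form of Eq. 7. The catchment characteristics and function names are hypothetical.

```python
def normalise(features):
    """Min-max scale each characteristic to [0, 1] across catchments (Eq. 6).

    Assumes every characteristic takes at least two distinct values.
    """
    ids = list(features)
    n = len(features[ids[0]])
    lo = [min(features[c][m] for c in ids) for m in range(n)]
    hi = [max(features[c][m] for c in ids) for m in range(n)]
    return {c: [(features[c][m] - lo[m]) / (hi[m] - lo[m]) for m in range(n)]
            for c in ids}

def rank_donors(target, z, k=10):
    """IDs of the k catchments least dissimilar to the target.

    Assumption: dissimilarity is the sum of absolute differences between
    normalised characteristics; the source's Eq. 7 may differ in form.
    """
    def dissimilarity(c):
        return sum(abs(a - b) for a, b in zip(z[target], z[c]))

    return sorted((c for c in z if c != target), key=dissimilarity)[:k]

# Hypothetical characteristics: (median elevation in m, aridity index).
features = {"A": (500.0, 0.8), "B": (650.0, 1.0), "C": (2500.0, 3.0), "D": (300.0, 0.6)}
z = normalise(features)
donors = rank_donors("A", z, k=2)
```

Scaling before ranking prevents characteristics with large numerical ranges, such as elevation in metres, from dominating the dissimilarity.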

Parameter regression
The parameter regression technique aims to detect statistical relationships between parameter values and catchment characteristics, and uses these relationships to estimate model parameters for ungauged catchments (Parajka et al., 2005; Oudin et al., 2008; Swain and Patra, 2017). To account for non-linear relationships between model parameters and catchment characteristics, we implemented the random forest machine learning algorithm (RF; Breiman, 2001; Prasad et al., 2006; Biau and Scornet, 2016) provided in the randomForest R package (Liaw and Wiener, 2002). RF uses an ensemble of decision trees between predictand and predictor values (also known as covariates) for regression and supervised classification, and has the capability to deal with high-dimensional feature spaces and small sample sizes (Biau and Scornet, 2016).

For this study, we developed one RF model for each TUWmodel parameter, using all thirteen independent catchment characteristics listed in Table 4 as covariates. Our experimental setup used an ensemble of 2000 regression trees, a minimum of five terminal nodes for each model, and p/3 variables randomly sampled as candidates at each split, where p represents the number of predictors. The trained RF models were then used to predict parameter values in the pseudo-ungauged catchments.

Table 4. Selected climatic and physiographic characteristics to quantify feature similarity between catchments. All variables related to P were computed using the corresponding P product used as an input to the TUWmodel for 1990-2018.

| No. | Characteristic | Data source | Importance |
|---|---|---|---|
| 1 | Mean elevation | CAMELS-CL | Composite indicator that influences a range of processes such as long-term P and T, and hence soil moisture availability. In some environments, it is also related to aridity and snow processes. |
| 2 | Median elevation | SRTMv4.1 | Same as mean elevation but provides a more robust representation of elevation over mountainous catchments. |
| 3 | Catchment area | CAMELS-CL | Related to the degree of aggregation of catchment processes related to scale effects. Additionally, it is an indicator of total catchment storage capacity. |
| 4 | Slope | CAMELS-CL | Related to the response of the catchment, routing, and infiltration processes. |
| 5 | Forest cover | CAMELS-CL | Forested catchments are associated with a trade-off between high water consumption rates and enhanced soil. |
| 6 | Snow cover | CAMELS-CL | Related to the influence of snow processes within the catchment. |
| 7 | Mean annual precipitation | P product | Related to the generation of runoff and P related to orographic gradients (e.g., coastal areas). |
| 8 | Mean annual air temperature | CR2MET | Indicator of snow processes in cold environments. It is also related to aridity, and consequently to the evaporative demand. |
| 9 | Mean annual potential evaporation | Computed from CR2MET | A measure of the atmospheric water demand (especially at the annual temporal scale). |
| 10 | Aridity index | CR2MET and P product | Represents the competition between energy and water availability. |
| 11 | Daily temperature range | CR2MET | Monthly mean difference between daily maximum and minimum T. Related to variations in the diurnal cycle and evaporative demands. |
| 12 | Simple precipitation intensity index | P product | Relation of annual P to the number of wet days (P > 1 mm). Serves as a proxy for seasonality and intensity of P events. |
| 13 | Maximum consecutive 5-day precipitation | P product | Related to extreme P events. |

Influence of nested catchments
To evaluate the influence of nested catchments on the performance of the three regionalisation methods, we repeated each method for every target catchment after excluding from the set of potential donors all catchments considered to be nested in relation to the pseudo-ungauged catchment. Following Neri et al. (2020), we used a cutoff point of 10% of drainage area, meaning that only catchments that cover more than 10% of the area of the parent catchment were considered to be nested.
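The exclusion rule can be sketched as a simple filter. The sketch assumes that nesting is checked in both directions (a candidate lying inside the target, or the target lying inside a candidate), which the text does not state explicitly; the `contains` mapping, the areas, and the function name are hypothetical.

```python
def eligible_donors(target, candidates, area_km2, contains, cutoff=0.10):
    """Drop candidates nested with the target above the area cutoff.

    `contains[parent]` is the set of catchments lying inside `parent`
    (hypothetical input). Assumption: the rule is applied in both directions,
    i.e. whether the candidate lies inside the target or the target inside it.
    """
    def nested(child, parent):
        return (child in contains.get(parent, set())
                and area_km2[child] / area_km2[parent] > cutoff)

    return [c for c in candidates if not (nested(c, target) or nested(target, c))]

# Hypothetical setup: two sub-catchments inside "main" and one unrelated catchment.
area_km2 = {"main": 1000.0, "sub_big": 400.0, "sub_small": 50.0, "other": 700.0}
contains = {"main": {"sub_big", "sub_small"}}
donors = eligible_donors("main", ["sub_big", "sub_small", "other"], area_km2, contains)
```

Here the larger sub-catchment covers 40% of the parent and is excluded, while the smaller one covers 5% and remains a valid donor.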

Influence of donor catchments for feature similarity
To evaluate the influence of the number of donors used in feature similarity, we repeated the process followed in Section 3.4.2 to assess the performance of this regionalisation method when 1, 2, 4, 6, 8, and 10 donor catchments are selected. This analysis evaluates the impact of averaging varying numbers of simulations compared to the results that are based on only the most similar catchment.
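A minimal sketch of this experiment is shown below, assuming that the donor simulations are already ordered from the most to the least similar donor and that all series have equal length. The function names and toy values are hypothetical.

```python
def ensemble_mean(simulations):
    """Average several equally long Q series time step by time step."""
    return [sum(step) / len(simulations) for step in zip(*simulations)]

def ensembles_by_donor_count(ranked_simulations, counts=(1, 2, 4, 6, 8, 10)):
    """Mean Q series built from the k most similar donors, for each k in counts.

    `ranked_simulations` holds one simulated Q series per donor, ordered from
    the most to the least similar donor (hypothetical input).
    """
    return {k: ensemble_mean(ranked_simulations[:k])
            for k in counts if k <= len(ranked_simulations)}

# Usage: four toy donor simulations of three time steps each.
sims = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [2.0, 2.0, 2.0], [6.0, 0.0, 2.0]]
by_k = ensembles_by_donor_count(sims)
```

Each averaged series would then be scored against observed Q, so that performance can be compared across donor counts.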

Despite the substantial variations between P products (see Section 3.1.1), TUWmodel performed well for all P products in the calibration, Verification 1 and Verification 2 periods, with median KGE' values greater than 0.77, 0.71, and 0.62, respectively.
The calibrated model parameters lay well within the selected parameter ranges in the large majority of the cases (see Figure S8 of the supplement). In other words, the selected parameter ranges were wide enough so that calibrated parameter values were not concentrated at their lower or upper limits.
Figure 5 shows the performance of the TUWmodel during calibration, Verification 1 and Verification 2 per hydrological regime (see Figure 1d). The TUWmodel performed best over the pluvio-nival catchments, with median KGE' values above 0.77, 0.76, and 0.69 for calibration, Verification 1 and Verification 2, respectively. During the calibration period, there was no clear second-best regime. For instance, the snow-dominated catchments presented slightly higher median KGE' values but a more pronounced dispersion, while the pluvio-nival and rain-dominated catchments presented lower dispersion but reduced [...] parameter regression (-0.12 to 0.51). In addition to exhibiting a considerably lower overall performance, parameter regression returned a larger spread in KGE' values for all periods.

Performance during regionalisation
The overall performances obtained for feature similarity and spatial proximity are relatively close for different P products over each period (Figure 6). For each regionalisation technique, Figure 7 summarises the spatial distribution of the performance of each P product for the calibration, Verification 1, and Verification 2 periods. The spatial patterns obtained for all regionalisation methods were similar, independent of the P product or the evaluated period, except for parameter regression, which yielded poor results over high-elevation catchments and under dry conditions (Verification 2). These results indicate that spatial proximity and feature similarity present very similar spatial performance patterns, with feature similarity yielding higher KGE' values over the three evaluated periods.

All P products performed better in the Central Chile and South regions than in the Far North, Near North and Far South regions. The low performance of regionalisation in the arid north is very likely due to the convective nature of storms occurring in the highlands of the Chilean Altiplano (elevations above 4000 m a.s.l.), and the low density of Q stations over this area. Despite this generally low performance, RF-MEP was the best performing P product over the Far North region for both spatial proximity (median KGE' of 0.28) and feature similarity (median KGE' of 0.46) in the calibration period, suggesting that merging P products and ground-based observations helps to improve, to some extent, the performance of hydrological modelling across arid regions. Conversely, all products outperformed RF-MEP over the Far South. Figure 7 also highlights that spatial proximity provides the best performance over the Far South, with median KGE' values higher than 0.46, 0.27, 0.30, and 0.35 for CR2MET, RF-MEP, ERA5, and MSWEPv2.8, respectively. The systematically lower performance of feature similarity compared to spatial proximity over the Far South (except for the case of ERA5) could be attributed to: i) the lack of catchment characteristics that represent the hydrological behaviour of this complex area dominated by polar and temperate climates; and ii) the small number of potential donor catchments (eleven for latitudes > 49° S), combined with their varied hydrological regimes. For the southernmost catchments, the highest P intensities occur during March-May, while the lowest P occurs during June-August, which differs from catchments throughout the rest of the country (Alvarez-Garreton et al., 2018, their Figure 9). This may affect the hydrological simulations when model parameters from catchments located at latitudes < 49° S are transferred to these far southern catchments.

Overall performance
For each P product, Figure 8 compares the performances of the three regionalisation techniques with those obtained in the independent calibration and verification periods. The independent calibration of each catchment represents the highest model performance that can be obtained for a specific combination of hydrological model, objective function and catchment (i.e., an absolute benchmark), whereas the two verification periods were used to evaluate the performance of the regionalisation techniques over independent time periods (i.e., as verification benchmarks). There are marked differences in performance according to the P product used to force the TUWmodel, regardless of the regionalisation method and the evaluated period.
For example, ERA5 shows more dispersion in the KGE' values than the other products for feature similarity and spatial proximity, while for parameter regression it tends to perform the best. For all P products and evaluation periods, feature similarity performed the best, followed by spatial proximity and parameter regression, which is consistent with results from multiple studies (e.g., Parajka et al., 2005; Oudin et al., 2008; Bao et al., 2012; Garambois et al., 2015; Neri et al., 2020).
Parameter regression had both the lowest median KGE' values and the largest spread. Comparing the two verification periods, results obtained during the (near-normal/wet) Verification 1 period were close to those obtained during calibration, while those obtained during the (dry) Verification 2 were substantially lower, especially for spatial proximity and parameter regression.
These results are in agreement with the lower panels located below each map in Figure 7, which show the empirical cumulative distribution functions (ECDFs) of the performance of each regionalisation technique during the complete period of analysis. These ECDFs compare the relative performance of each regionalisation method against those obtained from the independent calibration and verification of each catchment (used as benchmarks). As expected, all regionalisation methods presented a lower performance than the independent calibration and verification, with this reduction more pronounced for parameter regression. Figure 9 shows the performance of the regionalisation techniques according to hydrological regime for all P products during the calibration period (and Figures S9 and S10 of the supplement show the same for the two verification periods). Feature similarity provided the best median performance for all hydrological regimes and P products except for snow-dominated catchments, where spatial proximity performed the best for MSWEPv2.8 for calibration and Verification 2. These results demonstrate that there was no single P product that outperformed the others for all regionalisation techniques and hydrological regimes. In other words, the best performing P product depends on the hydrological regime and chosen regionalisation method.
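The ECDFs mentioned above can be computed with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and any KGE' values passed to them would need to come from the simulations themselves.

```python
def ecdf(values):
    """Return the sorted values and their cumulative probabilities i/n."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def fraction_at_or_below(values, threshold):
    """Share of catchments whose score does not exceed `threshold`."""
    return sum(v <= threshold for v in values) / len(values)
```

Plotting the output of `ecdf` for a regionalisation method next to that of the independent calibration shows the loss in performance as a horizontal shift between the two curves.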

Impact of nested catchments
We evaluated the influence of the nested catchments on the regionalisation results. Figure 10 shows the performance of the three regionalisation methods for the subset of 56 nested catchments that share a common area with at least one other catchment (i.e., the 42 nested catchments as well as all corresponding parent catchments). Here, we compare the regionalisation performance using all potential donors (dark colours) with the performance when excluding nested catchments as potential donors (light colours). The order of performance of the regionalisation methods and P products did not vary when the nested catchments were excluded, as feature similarity and CR2MET remained the best performing method and product, respectively.
As expected, the regionalisation technique with the largest reduction in performance when excluding nested catchments was spatial proximity, followed closely by feature similarity. All P products showed a slight performance reduction and increased dispersion for spatial proximity, except for MSWEPv2.8, which showed a slight increase in the KGE' median value. Feature similarity showed a slight reduction in performance when the nested catchments were excluded; however, the median values remained almost the same. The change in performance of parameter regression was negligible after the exclusion of nested catchments because, in the particular case of Chile, excluding only a few catchments had a negligible effect on the non-linear relationships between model parameters and the selected climatic and physiographic characteristics (see Table 4).
4.4 Impact of the number of donors in feature similarity
Figure 11 shows the performance of feature similarity during the calibration and both verification periods when varying the number of donors used to transfer model parameters to ungauged catchments (see Section 3.6). In general, the highest median performance is obtained when using 4 or more donor catchments. However, the application of a t-test demonstrated that the improvement in the KGE' values obtained when increasing to more than one donor was not statistically significant. The results show that the performance varies according to the P product and selected period of analysis. For the calibration period,

Performance of P products
During the independent catchment calibration (2000-2014) and two verification periods (1990-1999 and 2015-2018), good performances were obtained with all P products (see Figure 4). When decomposing the results of the KGE' objective function into its three components (see Appendix C), r exhibited the lowest performance, while β and γ values were generally closer to their optimal values, particularly for calibration and Verification 1. The results obtained with ERA5, which is a reanalysis product, were as good or even better than those obtained with the gauge-corrected products CR2MET, RF-MEP, and MSWEPv2.8 (e.g., see results for the pluvio-nival catchments in Figure 5). This is in agreement with Tarek et al. (2020), who concluded that ERA5 should be considered a high-potential dataset for hydrological modelling in data-scarce regions. The good performance of ERA5 suggests that, for the particular case of Chile, merging P products with ground-based measurements does not necessarily translate into improved hydrological model performance, which may be attributed to: i) the lack of rain gauges in the Andes Mountains; ii) the ability of the rainfall-runoff model to compensate for the P forcing (visible in the performances of the β and γ components; Appendix C); and iii) the fact that P products still have errors in the detection of P events that could impact the representation of the modelled Q dynamics (as suggested by the relatively lower performance of the r component of the KGE').
Furthermore, the similar performances obtained with uncorrected (ERA5) and gauge-corrected (CR2MET, RF-MEP, and MSWEPv2.8) P products, both in wet and dry periods, highlight that there was no single P dataset outperforming the others in all periods. These results demonstrate that the calibration of hydrological model parameters smooths out, to some extent, the spatio-temporal differences between P products (see Figures 2, 3, 6 and 9), which is in agreement with previous studies that have demonstrated that model calibration with each P product improves the performance of Q simulations (e.g., Artan et al., 2007; Stisen and Sandholt, 2010; Bitew et al., 2012; Thiemig et al., 2013). The decomposition of the KGE' into its components also demonstrated the ability of the TUWmodel to compensate for the total volume of P, as the β component was close to the optimum value, particularly for calibration and Verification 1 (see Appendix C), which can be attributed to the improved detection of P events of the merged products (regarding RF-MEP, see Baez-Villanueva et al., 2020). This can also be observed for MSWEPv2.8, as it produced the best performance over snow-dominated catchments under dry conditions (Verification 2).
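The decomposition of the KGE' into r, β, and γ discussed above can be illustrated with a short sketch. It assumes that the KGE' is the modified Kling-Gupta efficiency in its commonly used form, with β as the ratio of simulated to observed means and γ as the ratio of the coefficients of variation; the function name and the return layout are hypothetical.

```python
import math


def kge_prime(sim, obs):
    """Return (KGE', r, beta, gamma) for paired simulated and observed series.

    Assumed definitions: r is the Pearson correlation, beta = mean(sim)/mean(obs),
    gamma = CV(sim)/CV(obs), and
    KGE' = 1 - sqrt((r - 1)^2 + (beta - 1)^2 + (gamma - 1)^2).
    """
    n = len(obs)
    mu_s = sum(sim) / n
    mu_o = sum(obs) / n
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)            # timing and dynamics
    beta = mu_s / mu_o                 # volume bias
    gamma = (sd_s / mu_s) / (sd_o / mu_o)  # variability ratio
    kge = 1.0 - math.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)
    return kge, r, beta, gamma
```

Under these definitions, a simulation that doubles every observed value keeps r and γ at their optimum of 1 while β becomes 2, which lowers the KGE' to 0. This is why a β close to 1 indicates that the total volume is well reproduced even when r is lower.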
Regarding the suitability of P products for parameter regionalisation, RF-MEP provided slightly better results in the Far North for the calibration period using both spatial proximity and feature similarity, suggesting that P products that are merged with ground-based information over arid climates can improve regionalisation performance. The lower performance obtained in regionalisation with ERA5 in the Far North compared to the other P products (median values < 0.18 for feature similarity in all periods) can be attributed to its high P values, which are likely due to the lack of ground-based P stations over Chile in the development of the product. The incorporation of ground-based stations has the potential to: i) compensate for overestimations caused by the evaporation of hydrometeors before they reach the ground (Maggioni and Massari, 2018); and ii) improve event-based detection skills. The latter is evident in CR2MET and MSWEPv2.8, which are both based on ERA5 but included several rain gauges in the Far North, and have a higher performance than ERA5 (see Figures 2, 3, and S1).
Despite the low performance of all P products in the Far North and Near North (median KGE' values < 0.58, see Figure 7), the TUWmodel appears to be flexible enough to compensate, to some extent, for differences between P products. A similar conclusion was obtained by Elsner et al. (2014), who examined differences between four meteorological forcing datasets and their implications in hydrological model calibration in western USA using the Variable Infiltration Capacity model (VIC; Liang et al., 1994). Our results are also in agreement with Bisselink et al. (2016), who concluded that parameter sets obtained during calibration partially compensated the bias of seven P products used to force the fully-distributed LISFLOOD model in four catchments in southern Africa.
An unexpected result from this study is that the spatial resolution of the P products did not play a major role in model performance during calibration, verification and regionalisation; although CR2MET and RF-MEP have a higher spatial resolution (0.05°; ∼25 km²) than MSWEPv2.8 (∼0.10°; ∼100 km²) and ERA5 (∼0.28°; ∼625 km²), all four products performed well during the independent calibration of the hydrological model and the two verification periods. The performance of ERA5 over the 25 smallest catchments during regionalisation (area < 353.1 km²) was similar to that obtained with products with a higher spatial resolution (Figure S11 of the supplement). This can be attributed to the fact that Chile is dominated by large-scale frontal systems, and therefore, coarse-resolution products may perform well over small catchments.
Our results also align with the findings of Maggioni et al. (2013), who concluded that the loss of spatial information associated with coarser resolution (e.g., ERA5) can be compensated through model calibration.

5.2 How does the calibration of TUWmodel compensate for differences in P?
The calibration of TUWmodel was able to compensate, to some extent, for differences in annual and intra-annual P amounts, intermittency, and extremes (see Figures 2 and 3) among the four products. Using the example of the nivo-pluvial catchments, Figure 12 illustrates how TUWmodel parameters compensate for differences between the P forcings used in calibration, while Figure 13 shows the corresponding variations in the mean monthly water balance components. Similar figures for snow-dominated, pluvio-nival, and rain-dominated catchments can be found in the supplement (Figures S12-S17).
In general, the calibrated parameters behave as expected for each hydrological regime. A notable exception is ERA5, which shows low values for the snow correction factor (SCF) in nivo-pluvial and snow-dominated catchments (Figures 12 and S12).
These catchments are primarily located in the arid Near North region (see Figure 2 and Figure S15), where the estimated winter P is substantially lower for CR2MET, RF-MEP, and MSWEPv2.8, and a high SCF corrects this apparent underestimation. The lower P amounts presented in these products may reflect the incorporation of information from rain gauges located in drier, low-lying areas to correct their P estimates (see Figure S1).
ERA5 presented relatively low SCF values over nivo-pluvial catchments compared to the other P products (Figure 13), which is expected because it exhibits the highest P values. Conversely, because RF-MEP has the lowest mean monthly P over the nivo-pluvial catchments, the model adjusts the evaporation, snow water equivalent, and soil moisture components (Figure 13), thus increasing the simulated Q (to match the observed Q). Substantial differences were obtained for LPrat and field capacity (FC), which directly affect evaporation and soil moisture. For example, over the nivo-pluvial catchments, the LPrat and FC values for RF-MEP are similar to those of ERA5, despite RF-MEP having substantially lower P amounts, which in turn is reflected in the reduced soil moisture and evaporation amounts. The differences between LPrat and FC according to P product are even more pronounced for snow-dominated catchments (Figure S12).

Finally, higher values of the nonlinear parameter for runoff production Beta reduce the amount of water that leaves the catchment as runoff (Széles et al., 2020, their Eq. 7). For all hydrological regimes except pluvio-nival, the median Beta parameter is substantially higher for ERA5 than for the other P products. The larger Beta values obtained with ERA5 are expected to attenuate the runoff generation from extreme P events (see Figure 3c-d). Interestingly, the Beta parameter is zero in some pluvio-nival catchments, which means that all liquid P and snowmelt was used to generate runoff (Figure S16). This behaviour was more pronounced with RF-MEP and MSWEPv2.8, which exhibited the lowest P amounts and longer dry spells (Figure 3a) over these catchments. In general, the storage components obtained from each P product (computed as the sum of the two deepest reservoirs of the model; see Széles et al., 2020, their Figure 3) are similar for all four P products.
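The role of Beta can be illustrated with the runoff-generation relation typically used in HBV-type models, in which the fraction of rain plus snowmelt routed to runoff is (SM/FC)^Beta. The sketch below assumes that this standard form applies here; it is not a transcription of Eq. 7 of Széles et al. (2020), and the function name and the clipping of the soil moisture ratio are choices made for the example.

```python
def runoff_fraction(soil_moisture, field_capacity, beta):
    """Fraction of liquid P plus snowmelt that becomes runoff: (SM/FC)**beta.

    Assumed HBV-type formulation; the ratio SM/FC is clipped to [0, 1].
    """
    relative_moisture = min(max(soil_moisture / field_capacity, 0.0), 1.0)
    return relative_moisture ** beta
```

For a soil at half of its field capacity, Beta = 0 gives a fraction of 1 (all input generates runoff, as noted for some pluvio-nival catchments), whereas Beta = 2 gives 0.25, showing how larger Beta values attenuate runoff generation.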

Evaluation of regionalisation techniques
The compensation due to the flexibility of the TUWmodel observed during the independent calibration and verification (see Section 5.2) also influences the regionalisation performance. Feature similarity provided the best performance when the TUWmodel was forced with all P products (Figure 8), while spatial proximity provided similar performance to feature similarity over the Central Chile and South regions, where there is a high density of Q stations. These results are in agreement with Parajka et al. (2005), Oudin et al. (2008) and Neri et al. (2020), who demonstrated that spatial proximity performs well over densely gauged regions.

The inclusion of donor catchments with low model performance introduces a diversity that has the potential to benefit Q prediction in ungauged catchments, as discussed by Oudin et al. (2008). We decided to incorporate these catchments in the regionalisation process because of the diversity of climates and physiographic characteristics across continental Chile (see Figure 1), with the potential downside that this may lead to errors in the transferred model parameters. Additionally, the similarity between the performance of spatial proximity and feature similarity can be partially attributed to the fact that six of the nine selected catchment characteristics are directly or indirectly related to climate, which in Chile is highly related to the geographical locations of the catchments. Parameter regression was the regionalisation method that provided the worst results (Figures 6 and 8); however, Figure 7 shows that this method generated good results over low-elevation areas of the Central Chile and South regions, where there are many potential donor catchments located nearby.
The compensation for P differences obtained through model calibration also affected the relative performance of regionalisation techniques, producing unrealistic parameter sets in some donor catchments. In particular, such compensation may have impacted the spatial transferability of model parameters with the parameter regression method. The main reason for this is that, unlike techniques that transfer the entire parameter sets, the regression process denatures the already uncertain model parameters by applying independent regression procedures using climate and physiographic characteristics (Arsenault and Brissette, 2014). This challenge can be overcome by simultaneously optimising both the model parameters and the regression equations (e.g., Samaniego et al., 2010; Rakovec et al., 2016; Beck et al., 2020a), but such an exercise is outside of the scope of this study.
For both spatial proximity and feature similarity, the best and worst results were obtained for pluvio-nival catchments and rain-dominated catchments, respectively. Figure 9 shows the performances of the three regionalisation techniques according to hydrological regimes (see Figure 1d) for the calibration period. Comparing Figures 5 and 9, it is evident that the snow-dominated catchments performed substantially worse in regionalisation than in the independent calibration during the same period (Figure 5).
On the other hand, the pluvio-nival catchments performed systematically better in the independent calibration and verification as well as in regionalisation. This could be attributed to: i) the ability of the model to reproduce Q in this regime; and ii) the increased likelihood of transferring model parameters from a catchment with the same hydrological regime, as they are grouped closely together and form 40% of the total number of catchments.

Impact of nested catchments
Nested catchments play an important role in the performance of regionalisation methods as they are more likely to have a strong climatological and physiographic similarity to each other. As observed in Figure 10, the regionalisation method that was most impacted by the exclusion of nested catchments was spatial proximity, followed by feature similarity. These results are in agreement with previous studies where the exclusion of nested catchments reduced the performance of regionalisation techniques (Merz and Blöschl, 2004; Oudin et al., 2008; Neri et al., 2020). Feature similarity only presented a slight decrease when the nested catchments were excluded, which can be attributed to the low degree of nestedness (i.e., the number of catchments that are nested in a larger one). As expected, the exclusion of nested catchments had a negligible effect on parameter regression, as the removal of relatively few catchments had a negligible impact on the non-linear relationships between the climatic and physiographic characteristics and the model parameters that were determined using all potential donor catchments.

The reduction of regionalisation performance when the nested catchments were removed was lower than the reduction reported in a case study over Austria (Neri et al., 2020, their Figure 9a), which could be attributed to: i) the degree of nestedness, as the unique geography of Chile limits, to some extent, the number of nested catchments within any larger catchment (only 10 of the 100 selected catchments contained more than three nested catchments); and ii) the percentage of catchments that are nested (42% in this study, compared to 65% in the Austrian case study).

Impact of number of donor catchments
Increasing the number of donor catchments in feature similarity improved the regionalisation performance. This is in agreement with several studies that have demonstrated that using an ensemble of multiple donor catchments improves regionalisation results (McIntyre et al., 2005; Zelelew and Alfredsen, 2014; Garambois et al., 2015; Beck et al., 2016; Neri et al., 2020). Figure 11 shows that there is a slight increase in performance when 4 donors or more are used, independent of the P product and evaluated period. These results are similar to those of Neri et al. (2020), who determined that three donors were optimal for the TUWmodel over Austrian catchments. Feature similarity still outperformed spatial proximity when only one catchment was used to transfer the model parameters to the ungauged catchments, which is in agreement with multiple studies that have shown the ability of this method to produce good regionalisation results (Parajka et al., 2005; Oudin et al., 2008; Bao et al., 2012; Garambois et al., 2015; Neri et al., 2020).

Conclusion
Accurate streamflow predictions in ungauged catchments are critical for water resources management, and their generation is challenged by uncertainties arising from P products. In this paper, we assessed the relative performance of three common regionalisation techniques (spatial proximity, feature similarity, and parameter regression) over 100 near-natural catchments located in the topographically and climatologically diverse Chilean territory. Four P products (CR2MET, RF-MEP, ERA5, and MSWEPv2.8) were used to force the semi-distributed TUWmodel at the daily time scale, using the KGE' as the calibration objective function and metric to assess: i) the impact of selecting different P forcings on the relative performance of regionalisation techniques; and ii) possible connections between regionalisation performance and hydrological regimes. Our key findings are as follows:
1. For the selected P products, the one that provided the best (worst) performance during independent calibration and verification did not necessarily yield the best (worst) results during regionalisation.
2. The P products corrected with daily gauge observations did not necessarily yield the best hydrological model performance. However, we expect that P products with lower performances than the ones used in this study might benefit from such a correction.
3. The spatial resolution of the P products did not noticeably affect model performance during the calibration and verification periods.
4. The TUWmodel was able to compensate, to some extent, for the differences between P products through model calibration by adjusting the model parameters and, therefore, adjusting the water balance components (e.g., snow water equivalent, evaporation, and soil moisture).
5. Feature similarity was the best performing regionalisation technique, regardless of the choice of gridded P product or hydrological regime.
6. Spatial proximity was the second best performing regionalisation method because, in our study area, spatial proximity is a good proxy of climatic similarity for most neighbouring catchments.
7. Parameter regression provided the worst regionalisation performance, reinforcing the importance of transferring complete parameter sets to ungauged catchments.

8. The performance of regionalisation techniques can depend on the hydrological regime. We obtained the best results in pluvio-nival catchments with spatial proximity and feature similarity, while the same techniques provided the worst performance in rain-dominated catchments.
9. The exclusion of (relatively few) nested catchments had a minimal impact on the non-linear relationships between the climatic and physiographic characteristics (i.e., predictors) and model parameters (i.e., predictands), having a negligible effect on parameter regression results.
10. The performance of feature similarity increased when four or more catchments were used as donors; however, the differences in performance were not statistically significant when compared to the results of using only one donor.
The results presented here are valid only for near-natural catchments across continental Chile. Nevertheless, they provide guidance for ongoing and future studies involving the application of gridded P products for regionalising hydrological model parameters in ungauged basins. The feature similarity procedure described here could be used to refine the parameter regionalisation approach adopted for national scale hydrological characterisations in Chile (e.g., Bambach et al., 2018; Lagos et al., 2019). Additionally, further analyses could address: i) the effects that objective functions may have on the simulation of streamflow-derived hydrological signatures (e.g., Pool et al., 2017); ii) other states and fluxes derived from remote sensing data (e.g., Dembélé et al., 2020); iii) the influence of parameter equifinality (mainly for parameter regression), which can be accounted for by simultaneously optimising the model parameters and the regression equations, as described in Beck et al. (2020a); iv) the use of additional model structures, implemented through flexible modelling platforms (e.g., Clark et al., 2008; Knoben et al., 2019); and v) the sensitivity of regionalisation results with respect to modified climate scenarios.

Figure A1. Conceptual illustration of the hydrological regimes used to classify the 100 near-natural catchments used in this study. Each panel plots Q against t. Pluvio-nival: higher peak corresponding to months with maximum P, lower peak due to snowmelt. Nivo-pluvial: lower peak corresponding to months with maximum P, higher peak due to snowmelt. Rain-dominated: peak corresponding to months with maximum P.
Appendix B: Selection of catchment characteristics for feature similarity

To avoid including redundant information when quantifying catchment similarity, we examined the correlations between the catchment characteristics described in Table 4. Figure B1 shows correlation matrices between catchment characteristics using the Pearson correlation (a) and the Spearman rank (b) correlation coefficients. We only present correlations obtained with CR2MET, since very similar results were obtained with the remaining P products. Because the mean and median elevation are highly correlated (values of 1.0 and 0.99 for the Pearson and Spearman correlation coefficients, respectively), we decided to keep the median elevation under the assumption that it is more representative of topographic conditions, given the pronounced elevation gradients in continental Chile. Similarly, mean annual PE was excluded because of its high correlation with mean annual T (0.87 and 0.86 for the Pearson and Spearman correlation coefficients, respectively), notwithstanding that T was used to calculate PE. SDII was also excluded due to its high correlation with rx5day (0.97 for both coefficients). Finally, we excluded the snow cover from CAMELS-CL, as we found it to be unreliable over the snow-dominated catchments selected in our analysis.
Figure B1. Correlation matrices of the catchment characteristics described in Table 4 using CR2MET as the P product for: a) the Pearson correlation, to evaluate linear correlation; and b) the Spearman correlation, to evaluate the monotonic correlation.
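The screening described in this appendix can be sketched as follows. The 0.85 threshold, the function names, and the input layout are hypothetical choices for illustration; the text reports the correlations of the excluded pairs but does not state a fixed cutoff.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient (linear association)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation (monotonic association)."""
    return pearson(ranks(x), ranks(y))


def redundant_pairs(table, threshold=0.85):
    """List characteristic pairs whose |Pearson| or |Spearman| reaches the threshold.

    `table` maps a characteristic name to its values over all catchments.
    The threshold is a hypothetical choice, not a value from the study.
    """
    flagged = []
    for a, b in combinations(table, 2):
        p, s = pearson(table[a], table[b]), spearman(table[a], table[b])
        if max(abs(p), abs(s)) >= threshold:
            flagged.append((a, b, p, s))
    return flagged
```

For each flagged pair, one characteristic would then be dropped on physical grounds, as was done here when keeping the median rather than the mean elevation.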
Appendix C: Performance of the components of the KGE'
Table C1. Quantiles 0.25 and 0.75 of the correlation coefficient (r) of the KGE' over the selected catchments.
[...] helped to improve the quality of the final manuscript. We would also like to thank the HESS editorial team for their support; the Centers for Natural Resources and Development (CNRD) PhD program for their financial support to the main author; the CAMELS-CL dataset (http://camels.cr2.cl/); Camila Álvarez-Garretón for providing an initial dataset of catchments that could be considered as undisturbed for our analysis; Juan Pablo Boisier for providing the rain gauges used for CR2METv2; and Rodrigo Marinao Rivas for his support in the classification of the catchments into hydrological regimes. Dr. Zambrano-Bigiarini thanks Conicyt-Fondecyt 11150861 "Understanding the relationship between the spatio-temporal characteristics of meteorological drought and the availability of water resources, by using satellite-based rainfall and snow-cover data. A case study in a data-scarce Andean Chilean catchment" for the financial support from 2016 to 2018.
Pablo Mendoza received support from Fondecyt Project 11200142. The authors are also grateful to the active R community for unselfish and prompt support, in particular to Robert J. Hijmans, and Alberto Viglione / Juraj Parajka for developing and maintaining the raster and TUWmodel R packages, respectively.