Reply on RC2

There is a lack of connection between the supplementary section and the main text. For instance, when the authors introduce the CHIRPS product (~L240), they could link it to Fig. S1 to give a clear picture of the improvement. Another example is Tables S1 and S2, which are not mentioned anywhere in the text but would be a useful reference in the discussion section, where the authors discuss these hydrological signatures for all models.

The authors do not specify which model configuration is the baseline (which I assume is M1). Furthermore, while they present performance statistics, it is unclear whether the differences are statistically significant enough to merit the additional data. Moreover, when they discuss the time-series analysis and the differences between the models, they do so in a purely descriptive manner; it would be better to quantify the differences, for instance by using a distance metric to evaluate the similarity of each series to the observed data. See DOI: 10.1016/j.rse.2011.020 for a summary of some useful metrics. My suggestion would be to plot the Mahalanobis distance rather than presenting the original time series (or in addition to Fig. 8).
R: Thank you for raising this important point. In the revised paper, we will clarify the statistical improvement in model performance with respect to the baseline (M1), which was calibrated with daily streamflow data, as is common practice. The %-deviation from this benchmark for all models will be presented together with comparative metrics such as XXXX in a new table. With respect to your final suggestion, we consider that distance-metric plots such as the suggested Mahalanobis distance are interesting but rather hard to interpret for hydrological time series; instead, we will modify Figure 8 to better explain the differences between model configurations.
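As an illustration of the planned comparison, a minimal sketch of the %-deviation computation against the baseline M1 (the KGE values below are hypothetical placeholders, not results from the study):

```python
# Sketch of the planned %-deviation comparison against the baseline M1.
# The KGE values below are hypothetical placeholders, not results from the paper.
kge = {"M1": 0.62, "M2": 0.65, "M3": 0.71, "M4": 0.70}

def pct_deviation(scores, baseline="M1"):
    """Percent deviation of each model's score from the baseline model."""
    base = scores[baseline]
    return {m: 100.0 * (v - base) / abs(base) for m, v in scores.items()}

dev = pct_deviation(kge)
for model, d in sorted(dev.items()):
    print(f"{model}: {d:+.1f}% vs M1")
```

Each configuration would occupy one row of the new table, with one such %-deviation column per performance metric.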
I believe that the first objective should be merged into the other objectives. Running the model (independently of the computer language used) is a trivial objective as it is met from the start of the project.
R: Thank you for this suggestion. In the revised version, we will merge the first two objectives into a single objective.
The authors need to explain how they did the catchment extraction in GRASS, providing additional detail on the parameters used. They also need to explain the IDW method in the methods section, define the acronym, and add a reference.
R: Thank you for your point. Only to clarify, SAGA GIS was used to delineate the catchment boundaries, while GRASS GIS was used to process the space-time series of precipitation and temperature. In the revised version, we will describe the parameters used in SAGA, as well as add a description of the IDW (Inverse Distance Weighting) interpolation method with the corresponding reference.
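To illustrate the method we will describe, a minimal sketch of IDW interpolation (the station coordinates and precipitation values below are hypothetical, not data from the study):

```python
import math

def idw(points, target, power=2.0):
    """Inverse Distance Weighting: estimate a value at `target` from
    (x, y, value) observations, weighting each by 1 / distance**power."""
    num = den = 0.0
    for x, y, v in points:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:           # target coincides with a station
            return v
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

# Hypothetical station coordinates (km) and precipitation values (mm)
stations = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (0.0, 1.0, 30.0)]
print(idw(stations, (0.5, 0.5)))
```

The `power` exponent controls how quickly a station's influence decays with distance; the value actually used will be reported with the other parameters.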
The authors need to improve Fig. 3; interpreting it is confusing. Perhaps it would be best to have it with 4 rows rather than arrows, even if there is a degree of repetition.

R: We agree and will add a row per model configuration in Figure 3 to clarify the different model configurations in the revised paper.
Can the authors modify the presentation of the 86 parameters in L331? It is hard to understand; I would suggest presenting the numbers in parentheses as the main parameter counts and then elaborating on how many were linked to soil types, land cover, etc.

R: Thanks for your suggestion. In the revised version, we will clarify the presentation of the model parameters.
Can the authors add box plots of the other statistics as supplementary material? It is hard to visualize them as isolated numbers. Again, can the authors perform significance tests on the statistics to determine whether the differences between them are significant?
R: We agree. In the revised version, we will replace Table 1 with a boxplot in the supplementary materials section.
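As an illustration of the kind of significance testing requested, a minimal sketch using a paired sign-flip permutation test on per-catchment scores (the KGE values below are hypothetical placeholders, and the choice of test is an assumption, not necessarily the one we will adopt):

```python
import random

def paired_permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided paired permutation (sign-flip) test on the mean
    difference between two sets of per-catchment scores."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_perm):
        # Randomly flip the sign of each paired difference
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical per-catchment KGE scores for two model configurations
kge_m1 = [0.55, 0.60, 0.48, 0.62, 0.58, 0.51, 0.66, 0.59]
kge_m3 = [0.61, 0.66, 0.55, 0.70, 0.63, 0.58, 0.71, 0.64]
print(f"p = {paired_permutation_test(kge_m1, kge_m3):.4f}")
```

A permutation test makes no normality assumption, which suits the small number of catchments and bounded metrics such as KGE.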
Can the authors mention what the criteria for defining a KGE of 0.5 as acceptable were?
R: Thank you for raising this important point. In the revised paper, we will implement a clearer description and discussion of this issue based on recent literature. We will also introduce a clear description of the other performance metrics (KGE, Pearson correlation coefficient, MAE, NSE) used for comparison purposes in the supplementary material.
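For reference, a minimal sketch of the KGE computation as defined by Gupta et al. (2009):

```python
import math

def kge(sim, obs):
    """Kling-Gupta Efficiency (Gupta et al., 2009):
    KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2), where r is the
    Pearson correlation, alpha the ratio of standard deviations, and
    beta the ratio of means between simulation and observation."""
    n = len(obs)
    mu_s, mu_o = sum(sim) / n, sum(obs) / n
    sd_s = math.sqrt(sum((x - mu_s) ** 2 for x in sim) / n)
    sd_o = math.sqrt(sum((x - mu_o) ** 2 for x in obs) / n)
    r = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / (n * sd_s * sd_o)
    alpha, beta = sd_s / sd_o, mu_s / mu_o
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# A perfect simulation yields KGE = 1
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(kge(obs, obs))
```

The decomposition into correlation, variability, and bias terms is what makes the KGE = 0.5 threshold discussion less straightforward than for a single-component metric, which we will address with the recent literature.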
Around L605, the authors mention that the corrected temperature improved model performance. The authors need to quantify this performance increase.
R: Thank you for this suggestion. We will state the performance obtained with the original temperature time series and the comparison using the corrected temperature.
The authors mention that the streamflow overestimation can be related to a precipitation bias in CHIRPSc. However, from Fig. S1, this does not seem to be the case.
R: Thank you for raising this important point. We found that precipitation overestimation persisted in drier environments despite the bias correction. The overestimate was associated with the lack of ground precipitation records to correct the CHIRPS product in headwater catchments such as Rancho Rey. Our Fig. S1 in the supplementary section shows that, in many cases, differences between the water balance fluxes (P, ET, Q) were reduced. In the revised manuscript, we will clarify this point to highlight the issues associated with precipitation bias correction.
When discussing model improvement, please quantify it. The authors mention in L635 that M3 and M4 showed better and more realistic results but fail to quantify the improvement. Moreover, from Fig. 10, it seems that even though the KGE was higher for M3 and M4, M1 was able to better reproduce the actual spatial distributions of PET and AET, overlapping more with the observed ranges.

R: That is indeed an interesting point, and we thank you for the comment. Only to clarify, M1 showed high performance for PET but lower performance for ET in comparison with M3 and M4 (shown in Figure 5). In the revised paper, we will include a statistical comparison between model performances to clarify the improvements between model configurations.
In the discussion section, the authors mention that adding PET and AET to the calibration improved model representativeness and link this to earlier studies. The authors also need to link this assertion to their own study, which is one of their objectives.
R: Thank you for this suggestion. We will include statements and refer to the performance improvements in the text of the reviewed manuscript.
Due to the missing tests, I do not see how the authors can conclude that M3 and M4 are better configurations, since the statistical significance of the differences has not been evaluated. In fact, for many of the variables, it seemed that M1 performed adequately well compared to M3 and M4. The authors could further support the increased accuracy of M3 and M4 by linking them to the FDC information.
R: Thank you for your suggestion. In the revised paper, we will include a statistical comparison using the %-deviation of each calibrated model from the baseline (M1), as well as add the corresponding metrics of the FDCs in Figure 9.
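As an illustration of the FDC metrics to be added, a minimal sketch of a flow duration curve using the Weibull plotting position (the streamflow values below are hypothetical, not data from the study):

```python
def flow_duration_curve(flows):
    """Flow duration curve: flows sorted in descending order, each paired
    with its exceedance probability P = rank / (n + 1) (Weibull plotting
    position)."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    return [((i + 1) / (n + 1), q) for i, q in enumerate(ranked)]

# Hypothetical daily streamflow sample (m3/s)
q = [5.0, 12.0, 3.0, 8.0, 20.0, 6.0, 4.0, 9.0, 2.0]
fdc = flow_duration_curve(q)
print(fdc[0], fdc[-1])  # highest flow at low exceedance, lowest at high
```

Signatures derived from such a curve (e.g. the mid-segment slope) would provide the quantitative link between the FDC information and the accuracy of M3 and M4.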
Finally, I suggest adding ": A case study in Costa Rica." to the title since it was the only region analyzed in the manuscript.
R: We agree that the analysis is limited to Costa Rica and will change the title of the revised paper to "Remote sensing-aided large-scale rainfall-runoff modelling in the tropics of Costa Rica".
Around L54, the authors mention the opportunities from including additional variables. Please specify which variables or give a few examples.
R: We agree. In the revised paper, we will include some examples that were found useful for model calibration in the recent literature, and discuss how this could be applied to Costa Rica.
Technical corrections: