Articles | Volume 30, issue 8
https://doi.org/10.5194/hess-30-2417-2026
© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.
Triple collocation validates CONUS-wide evapotranspiration inferred from atmospheric conditions
Download
- Final revised paper (published on 28 Apr 2026)
- Supplement to the final revised paper
- Preprint (discussion started on 18 Sep 2025)
- Supplement to the preprint
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2025-4225', Alexander Gruber, 21 Oct 2025
- AC1: 'Reply on RC1', Erica McCormick, 30 Jan 2026
- RC2: 'Comment on egusphere-2025-4225', Anonymous Referee #2, 02 Jan 2026
- AC2: 'Reply on RC2', Erica McCormick, 30 Jan 2026
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
ED: Publish subject to minor revisions (further review by editor) (12 Mar 2026) by Patricia Saco
AR by Erica McCormick on behalf of the Authors (14 Mar 2026)
Author's response
Author's tracked changes
Manuscript
ED: Publish as is (10 Apr 2026) by Patricia Saco
AR by Erica McCormick on behalf of the Authors (14 Apr 2026)
This manuscript presents a rigorous analysis of the uncertainties in evapotranspiration estimates from "classical" methods and an alternative method, using triple collocation analysis. It is very well written, clear, sound, relevant, and fits well into the scope of HESS. I recommend that this manuscript be published after minor revisions, which mainly concern clarifications of the methodology and justifications for certain assumptions.
My two main concerns are:
Sec. 2.2: How justified is the linear error model for ET? Given the non-linear nature of Eq. 1, I am a bit worried that it might not be. Then again, I don't know much about error structures in ET data, so this is more of a personal gut feeling. I can imagine other people having similar concerns though, so perhaps you could add some words on that, or a reference to previous work that has looked into it?
Discussion: Your discussion revolves around the different patterns you see in \sigma_eps and R_T. If I understand your methodology correctly, you compare *unscaled* \sigma_eps estimates (Eq. 6). How meaningful is such a comparison? In Fig. 2 you show clearly that the different data sets have different means and variabilities, so we do expect variations in the \beta terms. I'm not an ET guy, but I assume that most data set applications would try to remove any systematic error and therefore scale the random errors accordingly. So I would argue that it only makes sense to compare scaled random error variances, i.e., \sigma_eps estimates that relate to the same signal variability. After all, it is the signal-to-noise ratio that determines how well the data set information can be separated from the underlying noise, and this is directly reflected (in a normalized way) by R_T.
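To make the scaling argument concrete, here is a minimal numerical sketch with fully synthetic data (the three "products", their gains, and their noise levels are hypothetical and not from the manuscript). It shows that the unscaled TC error variance depends on the data set's own scale, while the R_T-type metric does not:

```python
import numpy as np

# Synthetic illustration: three hypothetical ET products with different
# multiplicative biases (betas) and independent random errors.
rng = np.random.default_rng(1)
n = 20000
truth = rng.normal(0.0, 1.0, n)
x = 1.0 * truth + rng.normal(0.0, 0.3, n)
y = 0.8 * truth + rng.normal(0.0, 0.4, n)
z = 1.2 * truth + rng.normal(0.0, 0.5, n)

def tc_stats(a, b, c):
    """Covariance-based TC estimates for data set `a`:
    unscaled error variance and the scale-free R_T^2."""
    C = np.cov([a, b, c])
    err_var = C[0, 0] - C[0, 1] * C[0, 2] / C[1, 2]  # unscaled sigma_eps^2
    r2 = C[0, 1] * C[0, 2] / (C[0, 0] * C[1, 2])     # R_T^2, normalized by signal
    return err_var, r2

ev1, r2_1 = tc_stats(x, y, z)
ev2, r2_2 = tc_stats(2.0 * x, y, z)  # rescaling x quadruples err_var,
                                     # but leaves R_T^2 exactly unchanged
```

Because R_T^2 normalizes by the signal variance, it compares the data sets on a common footing, which is the point of the comment above.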
Other comments:
It is stated repeatedly that one advantage of the SFE method is that it doesn't make assumptions about root-zone soil moisture or vegetation status. I understand that the SFE method doesn't require one to do that directly, but aren't such assumptions necessary for the computation of air temperature and humidity that are used as input for the SFE method?
The term "error" and variations thereof are used a bit loosely. There is currently a push to harmonize the terminology concerning "errors" across communities; I recommend having a look at Merchant et al. (2017) and considering adopting their proposed terminology (in particular the usage of "error" vs. "uncertainty").
Sec. 2.3: I'm missing an explanation of what you do with the redundant TCA estimates from the different triplets. In the supplement, you show the results of the individual triplets, which is fine, but in the main text it is not clear what you show. I assume it is the average of the estimates from all triplets? Did you average both \sigma_eps and R_T? If so, it is generally advised NOT to average correlation coefficients, but this advice comes from averaging Pearson correlations; I'm not sure whether it holds here too. One could actually throw all four data sets into a least-squares estimator to get adjusted estimates for the signal and error variances (see Gruber et al., 2016), and then derive the R_T estimates from these, which may be a bit more robust, but I'm only speculating here. In any case, I think it would be good to at least elaborate on what you did/show.
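For reference, the usual advice for averaging correlation coefficients is to average in Fisher z-space rather than arithmetically. A minimal sketch (the R_T values below are purely illustrative, not taken from the manuscript):

```python
import numpy as np

def fisher_mean(correlations):
    """Average correlation coefficients via the Fisher z-transform
    (arctanh), the usual advice instead of a plain arithmetic mean."""
    z = np.arctanh(np.asarray(correlations, dtype=float))
    return float(np.tanh(z.mean()))

# Hypothetical R_T values from three redundant triplets:
r_triplets = [0.85, 0.90, 0.70]
fisher_mean(r_triplets)  # slightly above the plain mean of ~0.817
```

Whether this correction matters for TC-derived R_T values is exactly the open question raised above; the sketch only shows the mechanics.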
Sec. 3.2: The section titles for 3.1 and 3.3 state what is shown, whereas the title for 3.2 is a spelled-out conclusion. In the discussion, the titles change again to questions. I suggest choosing one title naming style and staying consistent.
L81-82: "This suggests that..." The embedded relative clause with a dangling preposition reads a bit awkwardly; I suggest rephrasing this sentence.
L104: As far as I know, triple collocation is similar to, but not the same as, the "three-cornered hat" approach (see, e.g., Sjoberg et al., 2021). I recommend just removing this parenthetical clause.
L139: C_p here is upper case, but in Eq. 1 it is lower case. Also, perhaps set all equation symbols in the text in equation mode (italic) to be consistent with the equations?
L145: Why 10%? Can you justify that number, and might it be useful to mention the implications of this assumption?
L161: I find the explanation "By treating the product of \sigma_T as a single unknown variable ..." a bit misleading. It is not the fact that they are treated as a single variable that lets you solve for the error variance; it is the fact that the betas for two data sets cancel out in the covariance ratios, which then lets you get rid of the \sigma_T term by subtracting the resulting estimate from the variance of the data set.
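The cancellation argument can be verified numerically. In the sketch below (my own notation and synthetic numbers, not the paper's), each product follows x_i = beta_i * T + eps_i with independent errors, so Cov(x, y) = beta_x * beta_y * sigma_T^2, and the betas of y and z cancel in the covariance ratio:

```python
import numpy as np

# Toy error model: x_i = beta_i * T + eps_i, errors mutually independent.
rng = np.random.default_rng(7)
n = 50000
T = rng.normal(0.0, 1.0, n)               # unknown "truth" with variance sigma_T^2
x = 0.9 * T + rng.normal(0.0, 0.3, n)
y = 1.1 * T + rng.normal(0.0, 0.4, n)
z = 0.7 * T + rng.normal(0.0, 0.5, n)

# Cov(x,y)*Cov(x,z)/Cov(y,z) = (bx*by)(bx*bz)/(by*bz) * sigma_T^2
#                            = bx^2 * sigma_T^2  (the signal variance of x).
# Subtracting it from Var(x) removes the sigma_T term entirely:
C = np.cov([x, y, z])
err_var_x = C[0, 0] - C[0, 1] * C[0, 2] / C[1, 2]  # recovers ~0.3**2
```

No assumption about treating anything as "a single variable" is needed; the cancellation in the ratio does all the work.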
L199: "increasing the robustness of TC assumptions" sounds a bit odd. I guess you mean that the convergence of the error estimates increases our confidence that the assumptions are valid?
L278--: You compare the ET estimates qualitatively and mention some numbers in the text, but I think it could be useful to also show a summary table with all the relevant metrics (e.g., correlations and biases between all data set combinations).
L421: The acronym MAP hasn't been introduced.
L472-482: Weighted averaging comes from least-squares theory and serves the purpose of reducing random errors only. I guess what is meant by "this approach has the disadvantage of obscuring the individual problems" is that if data sets have different systematic errors, especially non-stationary ones, then you create an uncontrolled blend of biased estimates, and any improvement is only a matter of luck, because weights derived from random error variances do not account for these biases, which are instead assumed to be zero.
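For clarity, the least-squares blend referred to above is the standard inverse-variance weighting; a minimal sketch with hypothetical numbers (function name and values are mine):

```python
import numpy as np

def inverse_variance_blend(values, err_vars):
    """Least-squares optimal merge for *unbiased*, independent errors:
    weights proportional to 1 / sigma_eps^2. Any systematic error in the
    inputs propagates into the blend unchecked."""
    w = 1.0 / np.asarray(err_vars, dtype=float)
    blended = float(np.dot(w, values) / w.sum())
    blended_var = float(1.0 / w.sum())  # valid only under the zero-bias assumption
    return blended, blended_var

# Two hypothetical ET estimates (mm/day) with TC-derived error variances:
val, var = inverse_variance_blend([2.0, 3.0], [0.25, 1.0])
# val = 2.2 (pulled toward the lower-variance input), var = 0.2
```

The blended variance is always below the smallest input variance, but that guarantee rests entirely on the zero-bias assumption criticized above.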
L482-484: Isn't this statement trivial and already implied by the paragraph's introductory statement: "It is possible to average ET estimates weighted by each dataset's performance"?
L521: Why is this contrary to expectation? You state that this might have to do with the lower ET amounts in these regions, so, considering my earlier argument concerning scaling in TCA, I would argue that this is simply a result of showing unscaled \sigma_eps estimates. When looking at signal-to-noise ratios instead, this gradient vanishes, right?
L589: "complex" instead of "complicated"?
Eq. (4)-(7): The introduction of Q_ii seems a bit unnecessary to me. Since you define Q_ii simply as equivalent to \sigma^2_ii, you could use the latter directly in Eqs. 6 and 7, which I don't think would make them any more difficult to read. This might just be personal preference, though.
Figure 1: The x-axis date labelling confused me when I first looked at it. The figure caption only states "Mean annual SFE from 1979 to 2024". Perhaps also spell out the date range shown in the example time series: "Points show time series for [...] from Dec. 2000 to Dec. 2002"?
Figure 7/8: The order of the Figure panels is inconsistent.
Supplement: I always find it hard to visually compare patterns like these. You draw the conclusion that the differences between the triplets are small and that the assumptions can therefore be considered valid. But when exactly are differences "small enough" to draw this conclusion? There isn't an awful lot of contrast in the figures, and there do seem to be regions with greater differences. Perhaps it would be worth plotting the actual *differences* between the TCA results for the triplet combinations, or complementing the maps with boxplots of the differences?
References:
Merchant et al. (2017): https://doi.org/10.5194/essd-9-511-2017
Sjoberg et al. (2021): https://doi.org/10.1175/JTECH-D-19-0217.1
Gruber et al. (2016): https://doi.org/10.1002/2015JD024027