Articles | Volume 30, issue 8
https://doi.org/10.5194/hess-30-2417-2026
© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.
Triple collocation validates CONUS-wide evapotranspiration inferred from atmospheric conditions
Download
- Final revised paper (published on 28 Apr 2026)
- Supplement to the final revised paper
- Preprint (discussion started on 18 Sep 2025)
- Supplement to the preprint
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2025-4225', Alexander Gruber, 21 Oct 2025
- AC1: 'Reply on RC1', Erica McCormick, 30 Jan 2026
- RC2: 'Comment on egusphere-2025-4225', Anonymous Referee #2, 02 Jan 2026
- AC2: 'Reply on RC2', Erica McCormick, 30 Jan 2026
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
ED: Publish subject to minor revisions (further review by editor) (12 Mar 2026) by Patricia Saco
AR by Erica McCormick on behalf of the Authors (14 Mar 2026)
Author's response
Author's tracked changes
Manuscript
ED: Publish as is (10 Apr 2026) by Patricia Saco
AR by Erica McCormick on behalf of the Authors (14 Apr 2026)
This manuscript presents a rigorous analysis of the uncertainties in evapotranspiration estimates from "classical" methods and an alternative method, using triple collocation analysis. It is very well written, clear, sound, relevant, and fits well into the scope of HESS. I recommend that this manuscript be published after minor revisions, which mainly concern clarifications of the methodology and justifications for certain assumptions.
My two main concerns are:
Sec. 2.2: How justified is the linear error model for ET? Given the non-linear nature of Eq. 1, I am a bit worried that it might not be. Then again, I don't know much about error structures in ET data, so this is more of a personal gut feeling. I can imagine other people having similar concerns though, so perhaps you could add some words on that, or a reference to previous work that has looked into it?
Discussion: Your discussion revolves around the different patterns you see in \sigma_eps and R_T. If I understand your methodology correctly, you compare *unscaled* \sigma_eps estimates (Eq. 6). How meaningful is such a comparison? In Fig. 2 you show clearly that the different data sets have different means and variabilities, so we do expect variations in the \beta terms. I'm not an ET guy, but I assume that most data set applications would try to remove any systematic error and therefore scale the random errors accordingly. So I would argue that it only makes sense to compare scaled random error variances, i.e., \sigma_eps estimates that relate to the same signal variability. After all, it is the signal-to-noise ratio that determines how well the data set information can be separated from the underlying noise, and this is directly reflected (in a normalized way) by R_T.
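To make the scaling argument concrete, here is a minimal numerical sketch with fully synthetic data (the three "products", their gains, and their noise levels are hypothetical and not from the manuscript). It shows that the unscaled TC error variance depends on the data set's own scale, while the R_T-type metric does not:

```python
import numpy as np

# Synthetic illustration: three hypothetical ET products with different
# multiplicative biases (betas) and independent random errors.
rng = np.random.default_rng(1)
n = 20000
truth = rng.normal(0.0, 1.0, n)
x = 1.0 * truth + rng.normal(0.0, 0.3, n)
y = 0.8 * truth + rng.normal(0.0, 0.4, n)
z = 1.2 * truth + rng.normal(0.0, 0.5, n)

def tc_stats(a, b, c):
    """Covariance-based TC estimates for data set `a`:
    unscaled error variance and the scale-free R_T^2."""
    C = np.cov([a, b, c])
    err_var = C[0, 0] - C[0, 1] * C[0, 2] / C[1, 2]  # unscaled sigma_eps^2
    r2 = C[0, 1] * C[0, 2] / (C[0, 0] * C[1, 2])     # R_T^2, normalized by signal
    return err_var, r2

ev1, r2_1 = tc_stats(x, y, z)
ev2, r2_2 = tc_stats(2.0 * x, y, z)  # rescaling x quadruples err_var,
                                     # but leaves R_T^2 exactly unchanged
```

Because R_T^2 normalizes by the signal variance, it compares the data sets on a common footing, which is the point of the comment above.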
Other comments:
It is stated repeatedly that one advantage of the SFE method is that it doesn't make assumptions about root-zone soil moisture or vegetation status. I understand that the SFE method doesn't require one to do that directly, but aren't such assumptions necessary for the computation of air temperature and humidity that are used as input for the SFE method?
The term "error" and variations thereof are used a bit loosely. There is currently a push to harmonize the terminology concerning "errors" across communities; I recommend having a look at Merchant et al. (2017) and considering adopting their proposed terminology (in particular the usage of "error" vs. "uncertainty").
Sec. 2.3: I'm missing an explanation of what you do with the redundant TCA estimates from the different triplets. In the supplement, you show the results of the individual triplets, which is fine, but in the main text it is not clear what you show. I assume it is the average of the estimates from all triplets? Did you average both \sigma_eps and R_T? If so, it is generally advised NOT to average correlation coefficients, but this advice comes from averaging Pearson correlations; I'm not sure whether it holds here too. One could actually throw all four data sets into a least-squares estimator to get adjusted estimates for the signal and error variances (see Gruber et al., 2016), and then derive the R_T estimates from these, which may be a bit more robust, but I'm only speculating here. In any case, I think it would be good to at least elaborate on what you did/show.
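For reference, the usual advice for averaging correlation coefficients is to average in Fisher z-space rather than arithmetically. A minimal sketch (the R_T values below are purely illustrative, not taken from the manuscript):

```python
import numpy as np

def fisher_mean(correlations):
    """Average correlation coefficients via the Fisher z-transform
    (arctanh), the usual advice instead of a plain arithmetic mean."""
    z = np.arctanh(np.asarray(correlations, dtype=float))
    return float(np.tanh(z.mean()))

# Hypothetical R_T values from three redundant triplets:
r_triplets = [0.85, 0.90, 0.70]
fisher_mean(r_triplets)  # slightly above the plain mean of ~0.817
```

Whether this correction matters for TC-derived R_T values is exactly the open question raised above; the sketch only shows the mechanics.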
Sec. 3.2: The section titles for 3.1 and 3.3 state what is shown, whereas the title for 3.2 is a spelled-out conclusion. In the discussion, the titles change again to questions. I suggest choosing one title naming style and staying consistent.
L81-82: "This suggests that..." The embedded relative clause with a dangling preposition reads a bit awkwardly; I suggest rephrasing this sentence.
L104: As far as I know, triple collocation is similar to, but not the same as, the "three-cornered hat" approach (see, e.g., Sjoberg et al., 2021). I recommend just removing this parenthetical clause.
L139: C_p here is upper case, but in Eq. 1 it is lower case. Also, perhaps set all equation symbols in the text in equation mode (italic) to be consistent with the equations?
L145: Why 10%? Can you justify that number, and might it be useful to mention the implications of this assumption?
L161: I find the explanation "By treating the product of \sigma_T as a single unknown variable ..." a bit misleading. It is not the fact that they are treated as a single variable that lets you solve for the error variance; it is the fact that the betas for two data sets cancel out in the covariance ratios, which then lets you get rid of the \sigma_T term by subtracting the resulting estimate from the variance of the data set.
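The cancellation argument can be verified numerically. In the sketch below (my own notation and synthetic numbers, not the paper's), each product follows x_i = beta_i * T + eps_i with independent errors, so Cov(x, y) = beta_x * beta_y * sigma_T^2, and the betas of y and z cancel in the covariance ratio:

```python
import numpy as np

# Toy error model: x_i = beta_i * T + eps_i, errors mutually independent.
rng = np.random.default_rng(7)
n = 50000
T = rng.normal(0.0, 1.0, n)               # unknown "truth" with variance sigma_T^2
x = 0.9 * T + rng.normal(0.0, 0.3, n)
y = 1.1 * T + rng.normal(0.0, 0.4, n)
z = 0.7 * T + rng.normal(0.0, 0.5, n)

# Cov(x,y)*Cov(x,z)/Cov(y,z) = (bx*by)(bx*bz)/(by*bz) * sigma_T^2
#                            = bx^2 * sigma_T^2  (the signal variance of x).
# Subtracting it from Var(x) removes the sigma_T term entirely:
C = np.cov([x, y, z])
err_var_x = C[0, 0] - C[0, 1] * C[0, 2] / C[1, 2]  # recovers ~0.3**2
```

No assumption about treating anything as "a single variable" is needed; the cancellation in the ratio does all the work.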
L199: "increasing the robustness of TC assumptions" sounds a bit odd. I guess you mean that the convergence of the error estimates increases our confidence that the assumptions are valid?
L278--: You compare the ET estimates qualitatively and mention some numbers in the text, but I think it could be useful to also show a summary table with all the relevant metrics (e.g., correlations and biases between all data set combinations).
L421: The acronym MAP hasn't been introduced.
L472-482: Weighted averaging comes from least-squares theory and serves the purpose of reducing random errors only. I guess what is meant by "this approach has the disadvantage of obscuring the individual problems" is that if data sets have different systematic errors, especially non-stationary ones, then you create an uncontrolled blend of biased estimates, and any improvement is only a matter of luck, because weights derived from random error variances do not account for these biases, which are instead assumed to be zero.
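For clarity, the least-squares blend referred to above is the standard inverse-variance weighting; a minimal sketch with hypothetical numbers (function name and values are mine):

```python
import numpy as np

def inverse_variance_blend(values, err_vars):
    """Least-squares optimal merge for *unbiased*, independent errors:
    weights proportional to 1 / sigma_eps^2. Any systematic error in the
    inputs propagates into the blend unchecked."""
    w = 1.0 / np.asarray(err_vars, dtype=float)
    blended = float(np.dot(w, values) / w.sum())
    blended_var = float(1.0 / w.sum())  # valid only under the zero-bias assumption
    return blended, blended_var

# Two hypothetical ET estimates (mm/day) with TC-derived error variances:
val, var = inverse_variance_blend([2.0, 3.0], [0.25, 1.0])
# val = 2.2 (pulled toward the lower-variance input), var = 0.2
```

The blended variance is always below the smallest input variance, but that guarantee rests entirely on the zero-bias assumption criticized above.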
L482-484: Isn't this statement trivial and already implied by the paragraph's introductory statement: "It is possible to average ET estimates weighted by each dataset's performance"?
L521: Why is this contrary to expectation? You state that this might have to do with the lower ET amounts in these regions, so, considering my earlier argument concerning scaling in TCA, I would argue that this is simply a result of showing unscaled \sigma_eps estimates. When looking at signal-to-noise ratios instead, this gradient vanishes, right?
L589: "complex" instead of "complicated"?
Eq. (4)-(7): The introduction of Q_ii seems a bit unnecessary to me. Since you define Q_ii simply as equivalent to \sigma^2_ii, you could use the latter directly in Eqs. 6 and 7, which I don't think would make them any more difficult to read. This might just be personal preference, though.
Figure 1: The x-axis date labelling confused me when I first looked at it. The figure caption only states "Mean annual SFE from 1979 to 2024". Perhaps also spell out the date range shown in the example time series: "Points show time series for [...] from Dec. 2000 to Dec. 2002"?
Figure 7/8: The order of the Figure panels is inconsistent.
Supplement: I always find it hard to visually compare patterns like these. You draw the conclusion that the differences between the triplets are small and that the assumptions can therefore be considered valid. But when exactly are differences "small enough" to draw this conclusion? There isn't an awful lot of contrast in the figures, and there do seem to be regions with greater differences. Perhaps it would be worth plotting the actual *differences* between the TCA results for the triplet combinations, or complementing the maps with boxplots of the differences?
References:
Merchant et al. (2017): https://doi.org/10.5194/essd-9-511-2017
Sjoberg et al. (2021): https://doi.org/10.1175/JTECH-D-19-0217.1
Gruber et al. (2016): https://doi.org/10.1002/2015JD024027