A continental-scale evaluation of the calibration-free complementary
relationship with physical, machine-learning, and land-surface
models

Kim, Daeha; Choi, Minha; Chun, Jong Ahn

doi:https://doi.org/10.5194/hess-2021-126

16 Mar 2021

Status: this preprint was under review for the journal HESS but the revision was not accepted.

Abstract. The widespread negative correlation between the atmospheric vapor pressure deficit and soil moisture lends strong support to the complementary relationship (CR) of evapotranspiration. While it has shown outstanding performance in predicting actual evapotranspiration (ET_a) over land surfaces, the calibration-free CR formulation has not been tested in the Australian continent, which is dominantly under (semi-)arid climates. In this work, we comparatively evaluated its predictive performance with seven physical, machine-learning, and land surface models for the continent at a 0.5° × 0.5° grid resolution. Results showed that the calibration-free CR, which forces a single parameter value everywhere, produced considerable biases when compared with water-balance ET_a (ET_wb). The CR method was unlikely to outperform the other physical, machine-learning, and land surface models, overrating ET_a in (semi-)humid coastal areas for 2002–2012 while underestimating it in arid inland locations. After the parameter was calibrated against water-balance ET_a independent of the simulation period, the CR method was able to outperform the other models in reproducing the spatial variation of the mean annual ET_wb and the interannual variation of the continental means of ET_wb. However, the grid-scale interannual variability and trends were not captured acceptably even after the calibration. The calibrated parameters for the CR method were significantly correlated with the mean net radiation, temperature, and wind speed, implying that (multi-)decadal climatic variability could diversify the optimal parameters for the CR method. The other physical, machine-learning, and land surface models provided indications consistent with the prior global-scale assessments. We also argued that at least some surface information is necessary for the CR method to describe long-term hydrologic cycles at the grid scale.

Received: 03 Mar 2021 – Discussion started: 16 Mar 2021

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: closed

RC1:
'Comment on hess-2021-126', Anonymous Referee #1, 17 Mar 2021

MODEL FORCING:

The most questionable issue in this MS is that the authors still use ERA-Interim forcing to drive the CR in the MS submitted in 2021 rather than 2011, the latter is the year when the widely-known ERA-Interim paper was published in Quarterly Journal of the Royal Meteorological Society. In 2021, this is an obviously out-of-date meteorological forcing which should not be used (and also ERA-Interim’s ET output). Its successor, ERA5, with a spatial resolution of 0.25 degree, should be considered as a significant improvement, evidenced by a few recent papers when both ERA-Interim and ERA5 were used to drive the ET model (Martens et al., 2020) and hydrological model (Tarek et al., 2020). In general, ERA-Interim has larger errors than ERA5, which would propagate into the CR-model simulated ET values.

It is very strange that the authors did not use locally-preferable meteorological forcing when they focus only on Australia. In Australia, the SILO forcing (https://www.longpaddock.qld.gov.au/silo/), produced by interpolation of ground observations from the Australian Bureau of Meteorology and also other sources, should be much better than those forcing developed for a global coverage. While SILO has no net radiation nor wind speed, its air temperature, vapor pressure, air pressure, and solar radiation should be much more reliable than those from ERA-Interim/ERA5 for Australia. For this reason, the authors should try to drive the models using these.

Net radiation is often regarded as the most sensitive input for most ET models (see e.g., Figure 3 in Fisher et al. 2017). For this reason, I have to remind that net radiation from any atmospheric reanalysis dataset is essentially from the model simulations (for upward short- and long-wave radiation), which may have greater uncertainties than satellite observations. While satellite-based net radiation often has relatively coarse spatial resolution (one degree), the authors used 0.5 degree for their simulations, which is not far from it. Therefore, I wonder if the CR model could be improved with satellite-based net radiation data driving it. I suggest trying to use net radiation from CERES (https://ceres.larc.nasa.gov/) and/or GEWEX SRB (https://www.gewex.org/data-sets-surface-radiation-budget-srb/).

Refs:

Martens, B., et al., 2020. Evaluating the land-surface energy partitioning in ERA5. Geosci. Model Dev., 13, 4159–4181

Tarek, M., et al. 2020. Evaluation of the ERA5 reanalysis as a potential reference dataset for hydrological modelling over North America. Hydrol. Earth Syst. Sci., 24, 2527–2544.

Fisher, J., et al., 2017. The future of evapotranspiration: Global requirements for ecosystem functioning, carbon and climate feedbacks, agricultural management, and water resources. Water Resources Research, 53, 2618-2626

MODEL VALIDATION:

As the water-balance-based ET is key for assessing the ET models, the weakest point in ET_wb of this MS is the grid-based runoff data (GRUN and LORA) which has much larger uncertainties than the station-measured ones at the outlet of a basin. In Australia, the most popular runoff data is “Zhang et al., 2013. Collation of Australian modeller's streamflow dataset for 780 unregulated Australian catchments. CSIRO Land and Water, 115 pp.” (available at: https://publications.csiro.au/rpr/pub?pid=csiro:EP113194). Note also that it is more appropriate to involve only the unregulated basins with minimum human activities for validation purpose. While the authors did compare their continent-averaged ET_wb with previous similar studies in Line 230-239, this does not necessarily mean that ET_wb is accurate at the grid or basin scales.

Another key deficiency is the precipitation data for calculating ET_wb. For the same reason explained above, the authors should use precipitation data with a regional/continental focus rather than the one developed for a global coverage. The BILO precipitation data is often regarded as the most reliable one for Australia (https://data.gov.au/data/dataset/67749ef0-7223-437a-851a-573edde09567), which should be used to replace GPCC for a more accurate ET_wb.

I do understand that the authors want to use grid-based ET_wb data for evaluations, but the authors should also test its reliability with basin-scale ET_wb. The latter could be derived using measured runoff data from the above-mentioned 780 basins.

I would further argue that the authors’ ET_wb data are at least 10% smaller than the real values, most likely due to the suboptimal choice of the precipitation as well as the gridded runoff data. The reason I am saying this is that FLUXCOM and CR ET yield pretty similar values, as seen in Fig. 4. It should be noted that FLUXCOM must provide (and it does) one of the most accurate ET data available today since it is based on actual ET measurements by eddy-covariance, even though its inter-annual variance is somewhat subdued as seen in Fig. 3 in comparison with the CR ET values, but this temporal smoothing feature of FLUXCOM is well known from previous studies.

So all in all, it is argued here that most of the difference between CR ET and ET_wb is most likely due to unsatisfactory choices in model and water-balance forcing rather than to the need of spatially changing the alpha value of the CR model. (This is not to say that the CR would not overestimate ET rates near the sea since there the air moisture is significantly decoupled from the underlying land surface). As another choice, the authors could also apply several different sources for the forcing in the CR model as well as in the water-balance and see how they affect the outcomes (this would also serve as a sensitivity analysis). Chances are they do in a significant way.

So before one just replaces a unique calibration-free model with one needing calibration, one must make sure that the original model was evaluated correctly and exhaustively. I do not feel at all this is the case in this study.

Note also that there is a significant difference between Brutsaert’s alpha and the alpha value of the CR employed in this study. The Priestley-Taylor (PT) equation is evaluated at the measured air temperature in the former case, while in the latter at the required (but mostly unknown) wet-environment air temperature (estimated via T_ws). Without the latter, the PT equation naturally overestimates the wet-environment ET rates (and thus the actual ET rates as well) the more significantly, the drier and hotter the environment has become, therefore a correction (typically based on some measure of aridity) in the alpha value is necessary in the Brutsaert model, but not in the CR model employed in this study. So in this study the alpha value is meant to be the best available estimate of the real PT alpha value and not some weak analog of it, as in the Brutsaert model (i.e., Brutsaert et al., 2020), the latter taking up values much below the physically still interpretable value of one.

Another model application issue is that in the CR model of this study T_ws is estimated only one way, while the original authors of this CR model also described another method for the T_ws estimation, yielding somewhat smaller T_ws values (as mentioned in this MS, and therefore potentially resulting in a higher alpha-value estimate, most probably bringing the derived alpha value into the often quoted, typically observed 1.1 – 1.32 interval). In fact, most of this CR model’s applications use the latter, so it would be worth to check how it affects model outcomes and the constant alpha value estimation.

Line 238: I do not think GPCC is a reanalysis precipitation data.

Please check the text for the numerous typographical errors. For example: correctly ‘Priestley’.

Citation: https://doi.org/10.5194/hess-2021-126-RC1
- AC1: 'Reply on RC1', Jong Ahn Chun, 08 Jun 2021
  
  We greatly appreciate the referee’s efforts in reviewing our study. The constructive comments will help us improve the manuscript. To address the referee’s comments, we will update our CR ET calculations with locally preferable forcing (the SILO data). We found that the ERA-Interim forcing inputs are not very different from the SILO data. We also found that the SILO precipitation (P) and the GPCC P data are not considerably different at the 0.5° grid scale (see Figure S2). Hence, we believe that the evaluation of the CR method would not change considerably even if the datasets are replaced with the locally preferable ones.
  However, we will update our calculation with the locally preferable ones. It will improve confidence in our analysis at least somewhat. Since the referee has a concern about ETwb, we will use the SILO P data for estimating ETwb. In the revision, we will also add a comparison between the ensemble of the modeled runoff (Q) products and the Q observations suggested by the referee. Since the GRUN and the LORA datasets were already validated against global Q observations, we do not expect the ensemble of the two to be substantially biased relative to the observations. However, due to the scale mismatch between the modeled Q and the observations, only sufficiently large catchments can be considered in the comparison. This revision will improve the confidence of the comparative evaluation.
  It should be noted that the accuracy of ETwb is mostly determined by the quality of the P data, not by Q, because about 90% of P evaporates in Australia (Glenn et al., 2011). Hence, even if some biases are found in the modeled Q, they would exert only minor influences on the accuracy of ETwb. For instance, LORA and VIC give mean annual Q values for the Murray-Darling river basin of 15 mm/a and 42 mm/a, respectively (Hobeichi et al., 2019), and the difference between the two seems large. Nevertheless, the ETwb (P – Q) estimates corresponding to the mean annual P (470 mm/a) of the basin are 455 mm/a and 428 mm/a, respectively. The relative difference between the ETwb estimates is only 6%. Indeed, we did use Q products from multiple models for the evaluation, so that potential biases in the modeled flows might be reduced.
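The Murray-Darling arithmetic above can be reproduced with a few lines. This is only a sketch of the calculation as the reply states it (ETwb = P – Q, with storage change neglected); the function name and the choice of denominators for the relative differences are assumptions made for illustration.

```python
# Water-balance ET for the Murray-Darling example: ET_wb = P - Q.
# Values are the basin means quoted in the reply (mm/a).
def et_wb(p, q):
    """Long-term water-balance ET; storage change is neglected (assumption)."""
    return p - q

p_mean = 470.0                               # mean annual precipitation
q_by_product = {"LORA": 15.0, "VIC": 42.0}   # mean annual runoff

et = {name: et_wb(p_mean, q) for name, q in q_by_product.items()}

# Relative differences, each taken with respect to the LORA-based value.
rel_diff_q = (q_by_product["VIC"] - q_by_product["LORA"]) / q_by_product["LORA"]
rel_diff_et = (et["LORA"] - et["VIC"]) / et["LORA"]

print(et)
print(f"Q differs by {rel_diff_q:.0%}, ET_wb by {rel_diff_et:.1%}")
```

The same 27 mm/a gap is large relative to Q but small relative to ETwb, which is the point the reply makes.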
  Our responses to the specific comments follow.
  Comment 1: The most questionable issue in this MS is that the authors still use ERA-Interim forcing to drive the CR in the MS submitted in 2021 rather than 2011, the latter is the year when the widely-known ERA-Interim paper was published in Quarterly Journal of the Royal Meteorological Society. In 2021, this is an obviously out-of-date meteorological forcing which should not be used (and also ERA-Interim’s ET output). Its successor, ERA5, with a spatial resolution of 0.25 degree, should be considered as a significant improvement, evidenced by a few recent papers when both ERA-Interim and ERA5 were used to drive the ET model (Martens et al., 2020) and hydrological model (Tarek et al., 2020). In general, ERA-Interim has larger errors than ERA5, which would propagate into the CR-model simulated ET values. It is very strange that the authors did not use locally-preferable meteorological forcing when they focus only on Australia. In Australia, the SILO forcing (https://www.longpaddock.qld.gov.au/silo/), produced by interpolation of ground observations from the Australian Bureau of Meteorology and also other sources, should be much better than those forcing developed for a global coverage. While SILO has no net radiation nor wind speed, its air temperature, vapor pressure, air pressure, and solar radiation should be much more reliable than those from ERA-Interim/ERA5 for Australia. For this reason, the authors should try to drive the models using these.
  Response: We will replace the ERA-Interim temperature forcing data with those provided by the Bureau of Meteorology (BoM) of Australia, since it would not take long for us to update the calculations. However, we do not agree that the ERA-Interim inputs are unreliable forcing for the CR method. Even though it has since been updated to ERA-5, the ERA-Interim reanalysis system performed even better than the Bureau of Meteorology Atmospheric high-resolution Regional Reanalysis for Australia (BARRA) in reproducing observed variations of precipitation (Acharya et al., 2019). Ostensibly, one can assume that local data sources are more reliable than global products; however, global datasets, too, can be reliable for a hydrometeorological analysis.
  The objective of this study is to evaluate the CR method at a common grid scale at which physical, machine learning, and land surface models are developed. At the grid scale, we believe that the ERA-Interim data have been reliable (e.g., Kim et al., 2021).
  In addition, the referee’s argument that global data sources would not be as good as local data is a hypothesis that should be tested. Hence, we compared the mean vapor pressure deficit (VPD) of the ERA-Interim, which is a major input for the CR method, with that from the locally preferable SILO archive. Figure S1 shows that, except for some overrated values in the north-western part, the mean VPDs from the ERA-Interim did not deviate considerably from the SILO data.
  Indeed, since a higher VPD indicates a lower ET in the CR principle, the CR ET estimates will increase at least somewhat when the ERA-Interim forcing is replaced with the SILO data, deviating further from ETwb. Therefore, our conclusion on the calibration-free CR method (i.e., higher than ET_wb) would not change.
  Nonetheless, to consider the comment, we will update our calculations with the preferable ones (temperatures and vapor pressure datasets from the SILO archive), and this revision will reduce concerns about input-data quality. Since the SILO archive does not provide wind speed and net radiation data, we will use the ERA-5 datasets for the recalculation. This revision will not take long, and we do not expect largely different outcomes.
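A comparison like the one in Figure S1 needs VPD computed from air temperature and vapor pressure. The reply does not say how VPD was derived, so the sketch below assumes the Tetens-type saturation vapor pressure formula given in FAO-56; the function names and the sample values are hypothetical and come from neither dataset.

```python
import math

def saturation_vapor_pressure_kpa(t_celsius):
    """Tetens-type formula as given in FAO-56 (kPa, temperature in deg C)."""
    return 0.6108 * math.exp(17.27 * t_celsius / (t_celsius + 237.3))

def vpd_kpa(t_celsius, actual_vapor_pressure_kpa):
    """Vapor pressure deficit, floored at zero for supersaturated inputs."""
    deficit = saturation_vapor_pressure_kpa(t_celsius) - actual_vapor_pressure_kpa
    return max(deficit, 0.0)

def relative_difference(a, b):
    """Relative difference of a with respect to the reference value b."""
    return (a - b) / b

# Hypothetical grid-cell values, for illustration only.
vpd_global = vpd_kpa(25.0, 1.2)   # stands in for a reanalysis cell
vpd_local = vpd_kpa(24.5, 1.3)    # stands in for a locally interpolated cell
print(round(vpd_global, 3), round(vpd_local, 3))
print(f"{relative_difference(vpd_global, vpd_local):+.1%}")
```

Because saturation vapor pressure is nonlinear in temperature, a mean VPD depends on the time step at which it is computed, so both datasets would need the same aggregation for a fair comparison.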
  
  Comment 2: Net radiation is often regarded as the most sensitive input for most ET models (see e.g., Figure 3 in Fisher et al. 2017). For this reason, I have to remind that net radiation from any atmospheric reanalysis dataset is essentially from the model simulations (for upward short- and long-wave radiation), which may have greater uncertainties than satellite observations. While satellite-based net radiation often has relatively coarse spatial resolution (one degree), the authors used 0.5 degree for their simulations, which is not far from it. Therefore, I wonder if the CR model could be improved with satellite-based net radiation data driving it. I suggest trying to use net radiation from CERES (https://ceres.larc.nasa.gov/) and/or GEWEX SRB (https://www.gewex.org/data-sets-surface-radiation-budget-srb/).
  Response: We disagree. To predict ET, the CR method mainly uses the response of VPD to soil moisture deficiency rather than depending largely on radiation data. Any method that assumes proportionality of ET to net radiation (e.g., the PT-JPL and GLEAM) could produce ET estimates that are sensitive to changes in net radiation. However, the predictor of the CR method is the ratio between E_p and E_w, which is a normalized variable. Hence, while Ma and Szilagyi (2019) found outstanding performance of the CR method with a reanalysis radiation dataset, similar performance was found even with net radiation estimated by the simple standard method (Kim et al., 2019).
  Even though the ERA-Interim radiation data are modeled values, they can serve as forcing data for the CR method and can lead to outstanding performance (e.g., Kim et al., 2021).
  In contrast, the satellite radiation data would lead to a scale mismatch with the other ET models and the ERA forcing inputs, while the precision of the remote sensing observations is not always guaranteed. Satellite radiation data are therefore unlikely to be a good choice for our assessment.
  Instead, we will re-calculate CR ET with the ERA-5 radiation datasets to benefit from the updated ERA reanalysis system.
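The normalization argument can be made concrete with a small sketch. The reply does not write the formulation out; the code assumes the polynomial form usually attributed to Szilagyi et al. (2017), y = 2X² − X³ with y = ET/E_p and X = (E_p,max − E_p)/(E_p,max − E_w) · E_w/E_p. The function name, argument names, and sample numbers are hypothetical.

```python
def cr_evapotranspiration(e_p, e_w, e_p_max):
    """Actual ET from the polynomial CR, y = 2X^2 - X^3 (assumed form, see lead-in).

    e_p     : Penman potential evaporation
    e_w     : Priestley-Taylor wet-environment evaporation
    e_p_max : E_p that would occur over a completely dry surface
    All three share the same units; 0 < e_w <= e_p <= e_p_max is required.
    """
    if not (0.0 < e_w <= e_p <= e_p_max):
        raise ValueError("expected 0 < e_w <= e_p <= e_p_max")
    if e_p_max == e_w:
        x = 1.0  # fully wet limit: e_p equals e_w as well
    else:
        x = (e_p_max - e_p) / (e_p_max - e_w) * (e_w / e_p)
    y = 2.0 * x**2 - x**3
    return y * e_p

# A common multiplicative factor in all three terms cancels in X ...
base = cr_evapotranspiration(6.0, 4.0, 10.0)
scaled = cr_evapotranspiration(7.2, 4.8, 12.0)   # every input times 1.2
# ... but the resulting ET still scales with E_p.
print(base, scaled, scaled / base)
```

Under this assumed form, X is unchanged by a factor common to all three evaporation terms, while ET itself is not. Errors in net radiation do not enter E_p, E_w, and E_p,max as one common factor, so the sketch illustrates the normalization only and does not quantify the sensitivity of CR ET to radiation.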
  
  Comment 3: As the water-balance-based ET is key for assessing the ET models, the weakest point in ETwb of this MS is the grid-based runoff data (GRUN and LORA) which has much larger uncertainties than the station-measured ones at the outlet of a basin. In Australia, the most popular runoff data is “Zhang et al., 2013. Collation of Australian modeller's streamflow dataset for 780 unregulated Australian catchments. CSIRO Land and Water, 115 pp.” (available at: https://publications.csiro.au/rpr/pub?pid=csiro:EP113194). Note also that it is more appropriate to involve only the unregulated basins with minimum human activities for validation purpose. While the authors did compare their continent-averaged ETwb with previous similar studies in Line 230-239, this does not necessarily mean that ETwb is accurate at the grid or basin scales.
  Response: This comment is debatable. The Q data for the 780 unregulated catchments can evaluate ET only in the gauged catchments. They are too sparse to evaluate the modeled ET for the entire continent. Importantly, the catchment Q observations will lead to scale mismatches with the modeled products, because many of the 780 catchments are smaller than the 0.5° grid resolution. The Q observations are accurate, but cannot lead to general conclusions on the performance of the CR method for the arid continent.
  In this case, we believe that an acceptable evaluation reference with a larger spatial coverage could be a better option. The grid ETwb could be acceptable when the P and the synthesized Q data are of good precision. Even though the ETwb values are not true values, they could serve as a reference for cross-evaluation of the modeled ET, possibly providing practical information for ungauged basins. For example, Pan et al. (2020) assessed numerous ET models against an ensemble of modeled ETs.
  In Australia, approximately 90% of precipitation evaporates, and thus the reliability of ETwb depends mostly on the quality of the P data. Figure S2 compares the locally preferable SILO P with the GPCC P data. Only a 2.1% difference was found between the SILO and the GPCC precipitation for 2002-2012. We do not agree that this slight difference makes the GPCC data and the ETwb unreliable. The objective of this study is a comparative evaluation, not an absolute evaluation.
  To improve confidence in our evaluation, we will compare the modeled Q data with the Q observations of catchments larger than the grid resolution.
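The claim that ETwb depends mostly on P can be written as a first-order error budget. This is a sketch under two assumptions not tested here: storage change is negligible, and the continental figure of about 90% of P evaporating applies to the area considered.

```latex
\begin{align}
ET_{wb} &= P - Q, \qquad Q \approx 0.1\,P \;\Rightarrow\; ET_{wb} \approx 0.9\,P \\
\frac{\delta ET_{wb}}{ET_{wb}} &= \frac{\delta P - \delta Q}{P - Q}
  \approx \frac{1}{0.9}\,\frac{\delta P}{P} \;-\; \frac{0.1}{0.9}\,\frac{\delta Q}{Q}
\end{align}
```

With these weights, the 2.1% difference in P maps to about 2.3% in ETwb, while a hypothetical 50% error in Q maps to about 5.6%. The weights apply to the continental mean. In wetter coastal grid cells the runoff ratio is higher, so Q errors would carry more weight there.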
  
  Comment 4: Another key deficiency is the precipitation data for calculating ETwb. For the same reason explained above, the authors should use precipitation data with a regional/continental focus rather than the one developed for a global coverage. The BILO precipitation data is often regarded as the most reliable one for Australia (https://data.gov.au/data/dataset/67749ef0-7223-437a-851a-573edde09567), which should be used to replace GPCC for a more accurate ETwb. I do understand that the authors want to use grid-based ETwb data for evaluations, but the authors should also test its reliability with basin-scale ETwb. The latter could be derived using measured runoff data from the above-mentioned 780 basins.
  Response: We believe that the SILO data based on local observation would be good. Beesley et al. (2009) validated the daily SILO precipitation using the leave-one-out-cross-validation, and showed acceptable performance. As replied, we will add the comparison between the catchment Q observations and the ensemble of the modeled Q.
  
  Comment 5: I would further argue that the authors’ ETwb data are at least 10% smaller than the real values, most likely due to the suboptimal choice of the precipitation as well as the gridded runoff data. The reason I am saying this is that FLUXCOM and CR ET yield pretty similar values, as seen in Fig. 4. It should be noted that FLUXCOM must provide (and it does) one of the most accurate ET data available today since it is based on actual ET measurements by eddy-covariance, even though its inter-annual variance is somewhat subdued as seen in Fig. 3 in comparison with the CR ET values, but this temporal smoothing feature of FLUXCOM is well known from previous studies.
  Response: We disagree. The FLUXCOM could be regionally biased, and it might not be accurate in Australia, where flux observations for training are insufficient. Please see the number of the FLUXNET2015 stations in Australia here (https://fluxnet.org/sites/site-summary/). Even though the FLUXCOM is based on eddy covariance flux data, the towers are usually installed in accessible locations only. Moreover, a large part of Australia is almost uninhabitable for humans, and operating a flux tower in such a location is very difficult. It is very likely that the training data were insufficient in the arid continent. Hence, the quality of the FLUXCOM is questionable in Australia, and the referee’s argument is hypothetical.
  
  Comment 6: So all in all, it is argued here that most of the difference between CR ET and ETwb is most likely due to unsatisfactory choices in model and water-balance forcing rather than to the need of spatially changing the alpha value of the CR model. (This is not to say that the CR would not overestimate ET rates near the sea since there the air moisture is significantly decoupled from the underlying land surface). As another choice, the authors could also apply several different sources for the forcing in the CR model as well as in the water-balance and see how they affect the outcomes (this would also serve as a sensitivity analysis). Chances are they do in a significant way.
  So before one just replaces a unique calibration-free model with one needing calibration, one must make sure that the original model was evaluated correctly and exhaustively. I do not feel at all this is the case in this study.
  Response: This comment is based on a hypothesis that the ERA-Interim forcing for the CR method and the P data for ETwb are unreliable. However, we have shown here that they are acceptable through direct comparisons with the locally preferable datasets. The referee’s argument could be an overstatement.
  Nevertheless, as replied above, we will update our calculations with the locally preferable datasets so that readers are not left with doubts about the input data. The new calculations would not take long, and the evaluation of the CR method is unlikely to change much.
  
  Comment 7: Note also that there is a significant difference between Brutsaert’s alpha and the alpha value of the CR employed in this study. The Priestley-Taylor (PT) equation is evaluated at the measured air temperature in the former case, while in the latter at the required (but mostly unknown) wet-environment air temperature (estimated via Tws). Without the latter, the PT equation naturally overestimates the wet-environment ET rates (and thus the actual ET rates as well) the more significantly, the drier and hotter the environment has become, therefore a correction (typically based on some measure of aridity) in the alpha value is necessary in the Brutsaert model, but not in the CR model employed in this study. So in this study the alpha value is meant to be the best available estimate of the real PT alpha value and not some weak analog of it, as in the Brutsaert model (i.e., Brutsaert et al., 2020), the latter taking up values much below the physically still interpretable value of one.
  Response: We understand that the temperature correction could change the magnitude of Ew, and will add this point to the discussion section. However, the essential difference between Brutsaert (2015) and Szilagyi et al. (2017) is the rescaling variable. Thus, the equation for alpha developed in Brutsaert et al. (2020) is unlikely to work for the calibration-free formulation of Szilagyi et al. (2017).
  We agree that the alpha in the CR framework is an analog of the PT coefficient. However, the alpha still has the same physical meaning as the PT coefficient. It quantifies the proportion of the aerodynamic component of the Penman equation when the surface has ample water. A potential reason for the alpha deviating from 1.26 would be the chosen equation for E_p. The traditional Penman equation could overrate E_p as shown in Yang et al. (2019). We will add this point to the discussion section.
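The physical meaning of alpha described above can be written out using the standard textbook forms of the Penman and Priestley-Taylor equations. As simplifying assumptions, R_n is in evaporation-equivalent units with ground heat flux neglected, f(u) is a generic wind function, and both equations use the same slope Δ. The last assumption ignores the wet-environment temperature adjustment discussed in Comment 7.

```latex
\begin{align}
E_p &= \frac{\Delta}{\Delta+\gamma}\,R_n + \frac{\gamma}{\Delta+\gamma}\,f(u)\,\mathrm{VPD} \\
E_w &= \alpha\,\frac{\Delta}{\Delta+\gamma}\,R_n \\
E_p = E_w \;\Rightarrow\; \alpha &= 1 + \frac{\gamma\,f(u)\,\mathrm{VPD}}{\Delta\,R_n}
\end{align}
```

Setting E_p equal to E_w for a surface with ample water shows that alpha exceeds one by the ratio of the aerodynamic term to the radiation term. A Penman equation that overrates E_p would therefore shift the alpha inferred from it.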
  
  Comment 8: Another model application issue is that in the CR model of this study Tws is estimated only one way, while the original authors of this CR model also described another method for the Tws estimation, yielding somewhat smaller Tws values (as mentioned in this MS, and therefore potentially resulting in a higher alpha-value estimate, most probably bringing the derived alpha value into the often quoted, typically observed 1.1 – 1.32 interval). In fact, most of this CR model’s applications use the latter, so it would be worth to check how it affects model outcomes and the constant alpha value estimation.
  Response: We disagree. The choice for Tws would exert very minor influences on the CR ET. As shown in Table 1 in Szilagyi et al. (2017), the choice for Tws led to small differences in the alpha within the order of 0.01 or 0.02.
  
  Comment 9: Line 238: I do not think GPCC is a reanalysis precipitation data.
  Response: It is a gridded gauge-analysis product. We will revise it.
  
  Comment 10: Please check the text for the numerous typographical errors. For example: correctly ‘Priestley’.
  Response: We will check all the typographical errors in revision.
  
  References
  Acharya, S. C., Nathan, R., Wang, Q. J., Su, C.-H., and Eizenberg, N.: An evaluation of daily precipitation from a regional atmospheric reanalysis over Australia, Hydrol. Earth Syst. Sci., 23, 3387–3403, https://doi.org/10.5194/hess-23-3387-2019, 2019.
  Beesley, C. A., Frost, A. J., and Zajaczkowski, J.: A comparison of the BAWAP and SILO spatially interpolated daily rainfall datasets, 18th World IMACS / MODSIM Congress, Cairns, Australia 13-17 July 2009.
  Brutsaert, W.: A generalized complementary principle with physical constraints for land‐surface evaporation, Water Resour. Res., 51, 8087–8093, https://doi.org/10.1002/2015WR017720, 2015.
  Glenn, E.P., Doody, T.M., Guerschman, J.P., Huete, A.R., King, E.A., McVicar, T.R., Van Dijk, A.I.J.M., Van Niel, T.G., Yebra, M. and Zhang, Y.: Actual evapotranspiration estimation by ground and remote sensing methods: the Australian experience, Hydrol. Process., 25, 4103–4116, https://doi.org/10.1002/hyp.8391, 2011.
  Hobeichi, S., Abramowitz, G., Evans, J., and Beck, H. E.: Linear Optimal Runoff Aggregate (LORA): a global gridded synthesis runoff product, Hydrol. Earth Syst. Sci., 23, 851–870, https://doi.org/10.5194/hess-23-851-2019, 2019.
  Kim, D., Ha, K.-J., and Yeo, J.-H.: New drought projections over East Asia using evapotranspiration deficits from the CMIP6 warming scenarios, Earth's Future, 9, e2020EF001697, https://doi.org/10.1029/2020EF001697, 2021.
  Kim, D., Lee, W.‐S., Kim, S. T., and Chun, J. A.: Historical drought assessment over the contiguous United States using the generalized complementary principle of evapotranspiration, Water Resour. Res., 55, 6244–6267. https://doi.org/10.1029/2019WR024991, 2019b.
  Ma, N., and Szilagyi, J.: The CR of evaporation: a calibration-free diagnostic and benchmarking tool for large-scale terrestrial evapotranspiration modeling, Water Resour. Res., 55, 7246–7274. https://doi.org/10.1029/2019WR024867, 2019.
  Martens, B., Schumacher, D. L., Wouters, H., Muñoz-Sabater, J., Verhoest, N. E. C., and Miralles, D. G.: Evaluating the land-surface energy partitioning in ERA5, Geosci. Model Dev., 13, 4159–4181, https://doi.org/10.5194/gmd-13-4159-2020, 2020.
  Pan, S., Pan, N., Tian, H., Friedlingstein, P., Sitch, S., Shi, H., Arora, V. K., Haverd, V., Jain, A. K., Kato, E., Lienert, S., Lombardozzi, D., Nabel, J. E. M. S., Ottlé, C., Poulter, B., Zaehle, S., and Running, S. W.: Evaluation of global terrestrial evapotranspiration using state-of-the-art approaches in remote sensing, machine learning and land surface modeling, Hydrol. Earth Syst. Sci., 24, 1485–1509, https://doi.org/10.5194/hess-24-1485-2020, 2020.
  Szilagyi, J., Crago, R., and Qualls, R.: A calibration‐free formulation of the complementary relationship of evaporation for continental‐scale hydrology, J. Geophys. Res. Atmos., 122, 264–278, https://doi.org/10.1002/2016JD025611, 2017.
  Yang, Y., Roderick, M. L., Zhang, S., McVicar, T. R., and Donohue, R. J.: Hydrologic implications of vegetation response to elevated CO2 in climate projections, Nature Clim. Change, 9, 44–48, https://doi.org/10.1038/s41558-018-0361-0, 2019.
  
  Citation: https://doi.org/10.5194/hess-2021-126-AC1
RC2:
'Comment on hess-2021-126', Joshua Fisher, 25 Mar 2021
This is a good paper that discusses in depth the complementary relationship (CR) of evapotranspiration (ET), and conducts an extensive evaluation of the CR over Australia. The paper is well-motivated—the vapor pressure deficit (VPD)-soil moisture (SM) CR is widely used in ET estimation, primarily because high quality and high spatial resolution SM is not always readily available; whereas, VPD may be more readily available. Semi-arid places like in Australia are where the CR may be most important, and potentially where it may be the most uncertain.

The Introduction and Discussion are written very intelligently, taking a deep dive into the theory and formulations of the CR and ET estimation. The authors do a good job of describing each of the respective ET models and datasets. Relative to the strength of the overall writing, especially from these sections, the analysis and results were somewhat lacking in depth, however. Ultimately, the results were just a handful of maps and time series of the different products, with no real “truth” to benchmark against. Given how intelligent the authors were with their communication and writing of the theory, I was surprised to see the analysis so shallow. I would have liked to have seen that same intelligence from the writing applied to the analysis. The authors could have gone into much more analytical depth on spatial patterns, sensitivities, etc.

Related to the tenuous/lack of benchmarking, I suggest editing the language for use of words like bias and under/over-estimation e.g. in the Results. These terms generally refer to a metric of truth, of which none is given here (I don’t consider the water balance the “truth” given that it is also a model of models; see also comments from Reviewer 1). Better, to stick with language such as larger/smaller/etc. as the comparisons are just relative to one another.

Moreover, be cognizant in attributing pattern to process relative to model run conditions, especially when it comes to relative magnitudes. Any one model can be high or low depending on the forcing dataset it used (see e.g. comments from Reviewer 1), which is not necessarily indicative of the model (or, importantly for this paper, the inferred processes therein). The closest approximation to ascertaining process from pattern would be to identify spatial and temporal patterns regardless of magnitude. For example, the patterns mentioned for AWRA-L in L251 are interesting and likely indicative of process (though they could have easily just been attributable to something unusual in the forcing used for that model).

Line-specific comments:

Abstract is written a bit, well, abstractly. It could use more take-home information/detail like what exactly were the models and what exactly was their performance.

L37. See [Fisher et al., 2017].

L39. See [Polhamus et al., 2013].

L47. See [Fisher et al., 2011].

L54. See, for reference, [Purdy et al., 2018].

L192. PT-JPL [Fisher et al., 2008] also incorporates the complementary relationship, citing Bouchet, in the soil evaporation component—e.g., RH^VPD. This simple formulation tracks relative surface wetness well [Fisher et al., 2008], and has since been used in other major models of ET, e.g., PM-MOD16 [Mu et al., 2011]. Still, advection will contaminate the relationship, and replacement with direct soil moisture e.g. [Purdy et al., 2018], can eliminate that contamination. The new ECOSTRESS mission [Fisher et al., 2020] uses PT-JPL for the global ET product, but is currently being updated to incorporate the [Purdy et al., 2018] soil moisture formulation and inclusion, downscaled using the measured LST and NDVI following [Colliander et al., 2017].

Figure 3. I suggest making the symbols in the Taylor diagram more distinguishable.

Figure 4. PT-JPL data are available from 1984 from the same link where you got the current data.

L402. See [Purdy et al., 2018] for soil moisture incorporation into PT-JPL.

Figure 8. This seems to be redundant with Figure 4.

Josh Fisher

Colliander, A., J. B. Fisher, G. Halverson, O. Merlin, S. Misra, R. Bindlish, T. J. Jackson, and S. Yueh (2017), Spatial downscaling of SMAP soil moisture using MODIS land surface temperature and NDVI during SMAPVEX15, , (11), 2107-2111.

Fisher, J. B., K. Tu, and D. D. Baldocchi (2008), Global estimates of the land-atmosphere water flux based on monthly AVHRR and ISLSCP-II data, validated at 16 FLUXNET sites, , (3), 901-919.

Fisher, J. B., R. H. Whittaker, and Y. Malhi (2011), ET Come Home: A critical evaluation of the use of evapotranspiration in geographical ecology, , , 1-18.

Fisher, J. B., F. Melton, E. Middleton, C. Hain, M. Anderson, R. Allen, M. F. McCabe, S. Hook, D. Baldocchi, P. A. Townsend, A. Kilic, K. Tu, D. D. Miralles, J. Perret, J.-P. Lagouarde, D. Waliser, A. J. Purdy, A. French, D. Schimel, J. S. Famiglietti, G. Stephens, and E. F. Wood (2017), The future of evapotranspiration: Global requirements for ecosystem functioning, carbon and climate feedbacks, agricultural management, and water resources, Water Resources Research, 53, 2618-2626.

Fisher, J. B., B. Lee, A. J. Purdy, G. H. Halverson, M. B. Dohlen, K. Cawse-Nicholson, A. Wang, R. G. Anderson, B. Aragon, M. A. Arain, D. D. Baldocchi, J. M. Baker, H. Barral, C. J. Bernacchi, C. Bernhofer, S. C. Biraud, G. Bohrer, N. Brunsell, B. Cappelaere, S. Castro-Contreras, J. Chun, B. J. Conrad, E. Cremonese, J. Demarty, A. R. Desai, A. De Ligne, L. Foltýnová, M. L. Goulden, T. J. Griffis, T. Grünwald, M. S. Johnson, M. Kang, D. Kelbe, N. Kowalska, J.-H. Lim, I. Maïnassara, M. F. McCabe, J. E. C. Missik, B. P. Mohanty, C. E. Moore, L. Morillas, R. Morrison, J. W. Munger, G. Posse, A. D. Richardson, E. S. Russell, Y. Ryu, A. Sanchez-Azofeifa, M. Schmidt, E. Schwartz, I. Sharp, L. Šigut, Y. Tang, G. Hulley, M. Anderson, C. Hain, A. French, E. Wood, and S. Hook (2020), ECOSTRESS: NASA's Next Generation Mission to Measure Evapotranspiration From the International Space Station, , (4), 1-20.

Mu, Q., M. Zhao, and S. W. Running (2011), Improvements to a MODIS global terrestrial evapotranspiration algorithm, , , 519-536.

Polhamus, A., J. B. Fisher, and K. P. Tu (2013), What controls the error structure in evapotranspiration models?, , (0), 12-24.

Purdy, A. J., J. B. Fisher, M. L. Goulden, A. Colliander, G. Halverson, K. Tu, and J. S. Famiglietti (2018), SMAP soil moisture improves global evapotranspiration, , , 1-14.
Citation: https://doi.org/10.5194/hess-2021-126-RC2
- AC2: 'Reply on RC2', Jong Ahn Chun, 08 Jun 2021
  
  We are thankful for Dr. Fisher’s positive and constructive comments. To improve the manuscript, (1) we will conduct an additional simple statistical analysis that could isolate the effects of individual controls on the variability of ET for each model. This would make the manuscript more informative. In addition, (2) we will show the reliability of ETwb by replacing the global precipitation product with a locally preferable one, even though the two are not very different. We will also show the predictive performance of the modeled runoff data by comparing them against available catchment runoff observations. Our responses to the specific comments follow.
  Comment 1: This is a good paper that discusses in depth the complementary relationship (CR) of evapotranspiration (ET), and conducts an extensive evaluation of the CR over Australia. The paper is well-motivated—the vapor pressure deficit (VPD)-soil moisture (SM) CR is widely used in ET estimation, primarily because high quality and high spatial resolution SM is not always readily available; whereas, VPD may be more readily available. Semi-arid places like in Australia are where the CR may be most important, and potentially where it may be the most uncertain.
  The Introduction and Discussion are written very intelligently, taking a deep dive into the theory and formulations of the CR and ET estimation. The authors do a good job of describing each of the respective ET models and datasets. Relative to the strength of the overall writing, especially from these sections, the analysis and results were somewhat lacking in depth, however. Ultimately, the results were just a handful of maps and time series of the different products, with no real “truth” to benchmark against. Given how intelligent the authors were with their communication and writing of the theory, I was surprised to see the analysis so shallow. I would have liked to have seen that same intelligence from the writing applied to the analysis. The authors could have gone into much more analytical depth on spatial patterns, sensitivities, etc.
  Response: We appreciate the positive comments. To refine our statistical analysis, we will add some analysis that could isolate effects of climatic and other controls on the interannual variability of each model, e.g., via partial correlation analyses. Such information could be beneficial for selection of grid ET products in Australia.
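  A partial correlation of the kind mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the series are synthetic, the variable names (`precip`, `radiation`, `et`) and the helper `partial_corr` are hypothetical, and the residual-regression approach is one common way to compute a partial correlation, not necessarily the procedure that will be used in the revision.

```python
import numpy as np

def partial_corr(x, y, controls):
    """Correlation of x and y after linearly removing the control variable(s)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept plus the control variable(s) as columns
    z = np.column_stack([np.ones(len(x)), np.asarray(controls, dtype=float)])
    # Residuals of x and y after least-squares regression on the controls
    rx = x - z @ np.linalg.lstsq(z, x, rcond=None)[0]
    ry = y - z @ np.linalg.lstsq(z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

# Synthetic anomalies (hypothetical, not data from the manuscript):
# ET responds to precipitation only, while radiation co-varies with precipitation.
rng = np.random.default_rng(0)
n = 500
precip = rng.normal(size=n)
radiation = -0.8 * precip + rng.normal(scale=0.6, size=n)
et = precip + rng.normal(scale=0.3, size=n)

simple_r = float(np.corrcoef(et, radiation)[0, 1])
partial_r = partial_corr(et, radiation, precip)
print(f"simple r = {simple_r:.2f}, partial r (P removed) = {partial_r:.2f}")
```

  In this synthetic case the simple correlation between ET and radiation is strong only because both depend on precipitation; once precipitation is controlled for, the partial correlation is close to zero.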
  We would like to note that the objective of this work is a comparative evaluation of the CR method, not an absolute one. Australia includes large arid areas that are almost uninhabitable for humans. Awaiting ground observations in such locations for evaluating hydrologic models is unrealistic. In this case, reliable ET estimates could become an alternative evaluation reference. Pan et al. (2020), for example, compared modeled ET products with the ensemble of modeled ETs. We believe that such an evaluation can be informative too. We do not argue that the ETwb estimates are true values, but that they could serve as an acceptable reference for evaluation.
  To improve the reliability of ETwb in the revision, we will replace the GPCC precipitation (P) with the locally preferable SILO data, as recommended by Reviewer 1. Even though the GPCC data do not deviate considerably from the SILO data (see Figure S2), the replacement would avoid doubts about the accuracy of ETwb.
  In addition, we will add an evaluation of the ensemble of the modeled runoffs against the available runoff observations. Since LORA and GRUN are datasets validated against global runoff observations, the modeled runoff should show acceptable agreement. We also want to highlight that ETwb is mostly determined by precipitation rather than by runoff in the arid continent. Approximately 90% of precipitation evaporates in Australia (Glenn et al., 2011), and thus the quality of the modeled runoff would exert only a minor influence on ETwb. Please see our responses to Referee 1's comments.
  
  Comment 2: Related to the tenuous/lack of benchmarking, I suggest editing the language for use of words like bias and under/over-estimation e.g. in the Results. These terms generally refer to a metric of truth, of which none is given here (I don’t consider the water balance the “truth” given that it is also a model of models; see also comments from Reviewer 1). Better, to stick with language such as larger/smaller/etc. as the comparisons are just relative to one another.
  Response: Thank you very much for these more suitable terms. We will use appropriate terminology in revision when discussing the comparisons between the models, and will highlight that ETwb is just an acceptable evaluation reference.
  
  Comment 3: Moreover, be cognizant in attributing pattern to process relative to model run conditions, especially when it comes to relative magnitudes. Any one model can be high or low depending on the forcing dataset it used (see e.g. comments from Reviewer 1), which is not necessarily indicative of the model (or, importantly for this paper, the inferred processes therein). The closest approximation to ascertaining process from pattern would be to identify spatial and temporal patterns regardless of magnitude. For example, the patterns mentioned for AWRA-L in L251 are interesting and likely indicative of process (though they could have easily just been attributable to something unusual in the forcing used for that model).
  Response: We agree. In revision, we will look into the patterns of relative magnitudes as well as those of absolute magnitudes. And, we will tabulate forcing inputs of each model so that readers could realize differences in forcing inputs of the models at a glance.
  
  Comment 4: Abstract is written a bit, well, abstractly. It could use more take-home information/detail like what exactly where the models and what exactly was their performances.
  Response: After revision, the abstract will be rewritten accordingly. We will include several take-home lessons in the new abstract.
  
  Comment 5: L37. See [Fisher et al., 2017]. L39. See [Polhamus et al., 2013]. L47. See [Fisher et al., 2011]. L54. See, for reference, [Purdy et al., 2018].
  Response: Thanks for the references. We will cite them where necessary. They could improve the introduction.
  
  Comment 6: PT-JPL [Fisher et al., 2008] also incorporates the complementary relationship, citing Bouchet, in the soil evaporation component—e.g., RH^VPD. This simple formulation tracks relative surface wetness well [Fisher et al., 2008], and has since been used in other major models of ET, e.g., PM-MOD16 [Mu et al., 2011]. Still, advection will contaminate the relationship, and replacement with direct soil moisture e.g. [Purdy et al., 2018], can eliminate that contamination. The new ECOSTRESS mission [Fisher et al., 2020] uses PT-JPL for the global ET product, but is currently being updated to incorporate the [Purdy et al., 2018] soil moisture formulation and inclusion, downscaled using the measured LST and NDVI following [Colliander et al., 2017].
  Response: We will add the given attributes of the PT-JPL to the description.
  
  Comment 7: Figure 3. I suggest making the symbols in the Taylor diagram more distinguishable.
  Response: We will revise as recommended.
  
  Comment 8: Figure 4. PT-JPL data are available from 1984 from the same link where you got the current data.
  Response: We will update it as recommended in the revision, and the discussion will be revised accordingly.
  
  Comment 9: See [Purdy et al., 2018] for soil moisture incorporation into PT-JPL.
  Response: We confirmed it. The discussion will be revised accordingly.
  
  Comment 10: Figure 8. This seems to be redundant with Figure 4.
  Response: Figure 8 updates Figure 4 with the performance of the calibrated CR method. Hence, it is not redundant, but indicates differences from Figure 4. However, to make the manuscript more concise, we could consider replacing it with a simple explanation of the performance of the CR method after calibration with ETwb.
  
  References
  Pan, S., Pan, N., Tian, H., Friedlingstein, P., Sitch, S., Shi, H., Arora, V. K., Haverd, V., Jain, A. K., Kato, E., Lienert, S., Lombardozzi, D., Nabel, J. E. M. S., Ottlé, C., Poulter, B., Zaehle, S., and Running, S. W.: Evaluation of global terrestrial evapotranspiration using state-of-the-art approaches in remote sensing, machine learning and land surface modeling, Hydrol. Earth Syst. Sci., 24, 1485–1509, https://doi.org/10.5194/hess-24-1485-2020, 2020.
  
  Citation: https://doi.org/10.5194/hess-2021-126-AC2
AC3: 'correction of the authors' response to the referee 1's comment on Eq. (7)', Jong Ahn Chun, 25 Jan 2022

While we initially thought Referee 1's comment on Eq. (7) was wrong, we now understand why the referee requested the correction.
Before the third-round revision, Eq. (7) combined the two equations as Tdry = Twb + es(Twb)/gamma = Tavg + es(Tavg)/gamma. The last term, es(Tavg)/gamma, should have been es(Tdew)/gamma (i.e., ea/gamma), so Referee 1's comment on Tavg was right. Nonetheless, nothing needs to change in the prior calculations, because we used Twb when calculating Tdry, i.e., Tdry = Twb + es(Twb)/gamma. It was a mistake in writing, not in calculation.
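The identity discussed here can be checked numerically. The sketch below is illustrative only: it assumes a Tetens-type formula for es, a constant psychrometric constant of 0.066 kPa/°C, and hypothetical air and dew-point temperatures; the helper names are not taken from the manuscript. It solves the wet-bulb relation gamma*(Tavg - Twb) = es(Twb) - ea for Twb and then evaluates Tdry in both forms.

```python
import math

GAMMA = 0.066  # psychrometric constant in kPa/°C (assumed, roughly sea level)

def es(t_c):
    """Saturation vapor pressure (kPa) at t_c (°C), Tetens-type formula (assumed)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))

def wet_bulb(t_avg, e_a, tol=1e-8):
    """Solve GAMMA * (t_avg - t_wb) = es(t_wb) - e_a for t_wb by bisection."""
    def f(t):
        return es(t) - e_a - GAMMA * (t_avg - t)  # increasing in t
    lo, hi = t_avg - 60.0, t_avg  # f(lo) < 0 <= f(hi) for unsaturated air
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical inputs: air temperature 25 °C, dew point 10 °C
t_avg, t_dew = 25.0, 10.0
e_a = es(t_dew)  # actual vapor pressure, ea = es(Tdew)
t_wb = wet_bulb(t_avg, e_a)

t_dry_from_wb = t_wb + es(t_wb) / GAMMA      # form used in the calculations
t_dry_from_air = t_avg + e_a / GAMMA         # corrected last term of Eq. (7)
t_dry_misprinted = t_avg + es(t_avg) / GAMMA # the term as originally written

print(t_wb, t_dry_from_wb, t_dry_from_air, t_dry_misprinted)
```

Under these assumptions the first two expressions agree to within the solver tolerance, while the originally printed term gives a much higher value because es(Tavg) exceeds ea whenever the air is unsaturated.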
The revised manuscript does not include the misstated last term, so it should not be an issue for potential readers.
We deeply regret our earlier response to the referee's comment and sincerely apologize to Referee 1. Once again, we greatly appreciate the referee's sound comments. Thank you.

Citation: https://doi.org/10.5194/hess-2021-126-AC3

Status: closed

RC1:
'Comment on hess-2021-126', Anonymous Referee #1, 17 Mar 2021

MODEL FORCING:

The most questionable issue in this MS is that the authors still use ERA-Interim forcing to drive the CR in the MS submitted in 2021 rather than 2011, the latter is the year when the widely-known ERA-Interim paper was published in Quarterly Journal of the Royal Meteorological Society. In 2021, this is an obviously out-of-date meteorological forcing which should not be used (and also ERA-Interim’s ET output). Its successor, ERA5, with a spatial resolution of 0.25 degree, should be considered as a significant improvement, evidenced by a few recent papers when both ERA-Interim and ERA5 were used to drive the ET model (Martens et al., 2020) and hydrological model (Tarek et al., 2020). In general, ERA-Interim has larger errors than ERA5, which would propagate into the CR-model simulated ET values.

It is very strange that the authors did not use locally-preferable meteorological forcing when they focus only on Australia. In Australia, the SILO forcing (https://www.longpaddock.qld.gov.au/silo/), produced by interpolation of ground observations from the Australian Bureau of Meteorology and also other sources, should be much better than those forcing developed for a global coverage. While SILO has no net radiation nor wind speed, its air temperature, vapor pressure, air pressure, and solar radiation should be much more reliable than those from ERA-Interim/ERA5 for Australia. For this reason, the authors should try to drive the models using these.

Net radiation is often regarded as the most sensitive input for most ET models (see e.g., Figure 3 in Fisher et al. 2017). For this reason, I have to remind that net radiation from any atmospheric reanalysis dataset is essentially from the model simulations (for upward short- and long-wave radiation), which may have greater uncertainties than satellite observations. While satellite-based net radiation often has relatively coarse spatial resolution (one degree), the authors used 0.5 degree for their simulations, which is not far from it. Therefore, I wonder if the CR model could be improved with satellite-based net radiation data driving it. I suggest trying to use net radiation from CERES (https://ceres.larc.nasa.gov/) and/or GEWEX SRB (https://www.gewex.org/data-sets-surface-radiation-budget-srb/).

Refs:

Martens, B., et al., 2020. Evaluating the land-surface energy partitioning in ERA5. Geosci. Model Dev., 13, 4159–4181

Tarek, M., et al. 2020. Evaluation of the ERA5 reanalysis as a potential reference dataset for hydrological modelling over North America. Hydrol. Earth Syst. Sci., 24, 2527–2544.

Fisher, J., et al., 2017. The future of evapotranspiration: Global requirements for ecosystem functioning, carbon and climate feedbacks, agricultural management, and water resources. Water Resources Research, 53, 2618-2626

MODEL VALIDATION:

As the water-balance-based ET is key for assessing the ET models, the weakest point in ET_wb of this MS is the grid-based runoff data (GRUN and LORA) which has much larger uncertainties than the station-measured ones at the outlet of a basin. In Australia, the most popular runoff data is “Zhang et al., 2013. Collation of Australian modeller's streamflow dataset for 780 unregulated Australian catchments. CSIRO Land and Water, 115 pp.” (available at: https://publications.csiro.au/rpr/pub?pid=csiro:EP113194). Note also that it is more appropriate to involve only the unregulated basins with minimum human activities for validation purpose. While the authors did compare their continent-averaged ET_wb with previous similar studies in Line 230-239, this does not necessarily mean that ET_wb is accurate at the grid or basin scales.

Another key deficiency is the precipitation data for calculating ET_wb. For the same reason explained above, the authors should use precipitation data with a regional/continental focus rather than the one developed for a global coverage. The BILO precipitation data is often regarded as the most reliable one for Australia (https://data.gov.au/data/dataset/67749ef0-7223-437a-851a-573edde09567), which should be used to replace GPCC for a more accurate ET_wb.

I do understand that the authors want to use grid-based ET_wb data for evaluations, but the authors should also test its reliability with basin-scale ET_wb. The latter could be derived using measured runoff data from the above-mentioned 780 basins.

I would further argue that the authors’ ET_wb data are at least 10% smaller than the real values, most likely due to the suboptimal choice of the precipitation as well as the gridded runoff data. The reason I am saying this is that FLUXCOM and CR ET yield pretty similar values, as seen in Fig. 4. It should be noted that FLUXCOM must provide (and it does) one of the most accurate ET data available today since it is based on actual ET measurements by eddy-covariance, even though its inter-annual variance is somewhat subdued as seen in Fig. 3 in comparison with the CR ET values, but this temporal smoothing feature of FLUXCOM is well known from previous studies.

So all in all, it is argued here that most of the difference between CR ET and ET_wb is most likely due to unsatisfactory choices in model and water-balance forcing rather than to the need of spatially changing the alpha value of the CR model. (This is not to say that the CR would not overestimate ET rates near the sea since there the air moisture is significantly decoupled from the underlying land surface). As another choice, the authors could also apply several different sources for the forcing in the CR model as well as in the water-balance and see how they affect the outcomes (this would also serve as a sensitivity analysis). Chances are they do in a significant way.

So before one just replaces a unique calibration-free model with one needing calibration, one must make sure that the original model was evaluated correctly and exhaustively. I do not feel at all this is the case in this study.

Note also that there is a significant difference between Brutsaert’s alpha and the alpha value of the CR employed in this study. The Priestley-Taylor (PT) equation is evaluated at the measured air temperature in the former case, while in the latter at the required (but mostly unknown) wet-environment air temperature (estimated via T_ws). Without the latter, the PT equation naturally overestimates the wet-environment ET rates (and thus the actual ET rates as well) the more significantly, the drier and hotter the environment has become, therefore a correction (typically based on some measure of aridity) in the alpha value is necessary in the Brutsaert model, but not in the CR model employed in this study. So in this study the alpha value is meant to be the best available estimate of the real PT alpha value and not some weak analog of it, as in the Brutsaert model (i.e., Brutsaert et al., 2020), the latter taking up values much below the physically still interpretable value of one.

Another model application issue is that in the CR model of this study T_ws is estimated only one way, while the original authors of this CR model also described another method for the T_ws estimation, yielding somewhat smaller T_ws values (as mentioned in this MS, and therefore potentially resulting in a higher alpha-value estimate, most probably bringing the derived alpha value into the often quoted, typically observed 1.1 – 1.32 interval). In fact, most of this CR model’s applications use the latter, so it would be worth to check how it affects model outcomes and the constant alpha value estimation.

Line 238: I do not think GPCC is a reanalysis precipitation data.

Please check the text for the numerous typographical errors. For example: correctly ‘Priestley’.

Citation: https://doi.org/10.5194/hess-2021-126-RC1
- AC1: 'Reply on RC1', Jong Ahn Chun, 08 Jun 2021
  
  We greatly appreciate the efforts of the referee in reviewing our study. The constructive comments will help us to improve the manuscript. To address the referee’s comments, we will update our CR ET calculations with locally preferable forcing (the SILO data). We found that the ERA-Interim forcing inputs are not very different from the SILO data. We also found that the SILO precipitation (P) and the GPCC P data are not considerably different at the 0.5° grid scale (see Figure S2). Hence, we believe that the evaluation of the CR method would not change considerably even if the datasets are replaced with the locally preferable ones.
  Nevertheless, we will update our calculations with the locally preferable datasets, which should improve confidence in our analysis to some extent. Since the referee has a concern about ETwb, we will use the SILO P data for estimating ETwb. In the revision, we will also add a comparison between the ensemble of the modeled runoff (Q) products and the Q observations suggested by the referee. Since the GRUN and the LORA datasets were already validated against global Q observations, we do not expect the ensemble of the two to be substantially biased relative to the observations. However, due to the scale mismatch between the modeled Q and the observations, only sufficiently large catchments can be considered in the comparison. This revision will improve the confidence of the comparative evaluation.
  It should be noted that the accuracy of ETwb is mostly determined by the quality of the P data, not by Q, because approximately 90% of P evaporates in Australia (Glenn et al., 2011). Hence, even if some biases are found in the modeled Q, they would exert only a minor influence on the accuracy of ETwb. For instance, LORA and VIC simulated a mean annual Q for the Murray-Darling river basin of 15 mm/a and 42 mm/a, respectively (Hobeichi et al., 2019), and the difference between the two seems large. Nevertheless, the ETwb (P – Q) estimates corresponding to the mean annual P (470 mm/a) of the basin are 455 mm/a and 428 mm/a, respectively, a relative difference of only 6%. Indeed, we used Q products from multiple models for the evaluation, so that potential biases in the modeled flows might be reduced.
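  The Murray-Darling arithmetic above can be reproduced with a few lines. This is a minimal sketch using only the numbers quoted in this reply; the variable names are illustrative, and taking the larger ETwb estimate as the denominator of the relative difference is an assumption.

```python
# Mean annual values for the Murray-Darling basin quoted in the reply (mm/a)
p_mean = 470.0
q_modeled = {"LORA": 15.0, "VIC": 42.0}

# Long-term water balance with negligible storage change: ETwb = P - Q
et_wb = {name: p_mean - q for name, q in q_modeled.items()}

# Relative spread of the two runoff estimates versus the two ETwb estimates
q_ratio = q_modeled["VIC"] / q_modeled["LORA"]
et_rel_diff = abs(et_wb["LORA"] - et_wb["VIC"]) / max(et_wb.values())

print(et_wb)
print(f"Q differs by a factor of {q_ratio:.1f}; ETwb differs by {100 * et_rel_diff:.0f}%")
```

  A nearly threefold difference in modeled Q therefore translates into a difference of about 6% in ETwb, because Q is a small fraction of P in this basin.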
  Our responses to the specific comments follow.
  Comment 1: The most questionable issue in this MS is that the authors still use ERA-Interim forcing to drive the CR in the MS submitted in 2021 rather than 2011, the latter is the year when the widely-known ERA-Interim paper was published in Quarterly Journal of the Royal Meteorological Society. In 2021, this is an obviously out-of-date meteorological forcing which should not be used (and also ERA-Interim’s ET output). Its successor, ERA5, with a spatial resolution of 0.25 degree, should be considered as a significant improvement, evidenced by a few recent papers when both ERA-Interim and ERA5 were used to drive the ET model (Martens et al., 2020) and hydrological model (Tarek et al., 2020). In general, ERA-Interim has larger errors than ERA5, which would propagate into the CR-model simulated ET values. It is very strange that the authors did not use locally-preferable meteorological forcing when they focus only on Australia. In Australia, the SILO forcing (https://www.longpaddock.qld.gov.au/silo/), produced by interpolation of ground observations from the Australian Bureau of Meteorology and also other sources, should be much better than those forcing developed for a global coverage. While SILO has no net radiation nor wind speed, its air temperature, vapor pressure, air pressure, and solar radiation should be much more reliable than those from ERA-Interim/ERA5 for Australia. For this reason, the authors should try to drive the models using these.
  Response: We will replace the ERA-Interim temperature forcing data with the ones provided by the Bureau of Meteorology (BoM) of Australia, since it would not take long for us to update the calculations. However, we do not agree that the ERA-Interim inputs are unreliable forcing for the CR method. Even though it has been superseded by ERA5, the ERA-Interim reanalysis system performed even better than the Bureau of Meteorology Atmospheric high-resolution Regional Reanalysis for Australia (BARRA) in reproducing observed variations of precipitation (Acharya et al., 2019). Ostensibly, one can assume that local data sources are more reliable than global products; however, global datasets, too, can be reliable for a hydrometeorological analysis.
  The objective of this study is to evaluate the CR method at a common grid scale at which physical, machine learning, and land surface models are developed. At the grid scale, we believe that the ERA-Interim data have been reliable (e.g., Kim et al., 2021).
  In addition, the referee’s argument that global data sources would not be as good as local data is a hypothesis that should be tested. Hence, we compared the mean vapor pressure deficit (VPD) of the ERA-Interim, which is a major input for the CR method, with that from the locally preferable SILO archive. Figure S1 shows that, except for some higher values in the north-western part, the mean VPDs from the ERA-Interim did not deviate considerably from the SILO data.
  Indeed, since a higher VPD indicates a lower ET under the CR principle, the CR ET estimates will increase to some extent when the ERA-Interim forcing is replaced with the SILO data, deviating further from ETwb. Therefore, our conclusion on the calibration-free CR method (i.e., higher than ET_wb) would not change.
  Nonetheless, to address the comment, we will update our calculations with the preferable datasets (temperature and vapor pressure from the SILO archive), and this revision will reduce concerns about input-data quality. Since the SILO archive does not provide wind speed and net radiation data, we will use the ERA5 datasets for the recalculation. This revision will not take long, and we do not expect largely different outcomes.
  
  Comment 2: Net radiation is often regarded as the most sensitive input for most ET models (see e.g., Figure 3 in Fisher et al. 2017). For this reason, I have to remind that net radiation from any atmospheric reanalysis dataset is essentially from the model simulations (for upward short- and long-wave radiation), which may have greater uncertainties than satellite observations. While satellite-based net radiation often has relatively coarse spatial resolution (one degree), the authors used 0.5 degree for their simulations, which is not far from it. Therefore, I wonder if the CR model could be improved with satellite-based net radiation data driving it. I suggest trying to use net radiation from CERES (https://ceres.larc.nasa.gov/) and/or GEWEX SRB (https://www.gewex.org/data-sets-surface-radiation-budget-srb/).
  Response: We disagree. To predict ET, the CR method mainly uses the response of VPD to soil moisture deficiency rather than depending largely on radiation data. Any method that assumes the proportionality of ET to net radiation (e.g., the PT-JPL and GLEAM) could produce ET estimates that are sensitive to changes in net radiation. However, the predictor of the CR method is the ratio between E_p and E_w, which is a normalized variable. Hence, while Ma and Szilagyi (2019) found outstanding performance of the CR method with a reanalysis radiation dataset, similar performance was found even with net radiation estimated by the simple standard method (Kim et al., 2019).
  Even though the ERA-Interim radiation data are modeled values, they can serve as forcing data for the CR method and can lead to outstanding performance (e.g., Kim et al., 2021).
  In contrast, the satellite radiation data would lead to a scale mismatch with the other ET models and the ERA forcing inputs, while the precision of the remote sensing observations is not always guaranteed. Satellite radiation is therefore unlikely to be a good choice for our assessment.
  Instead, we will re-calculate CR ET with the ERA5 radiation datasets to benefit from the updated ERA reanalysis system.
  
  Comment 3: As the water-balance-based ET is key for assessing the ET models, the weakest point in ETwb of this MS is the grid-based runoff data (GRUN and LORA) which has much larger uncertainties than the station-measured ones at the outlet of a basin. In Australia, the most popular runoff data is “Zhang et al., 2013. Collation of Australian modeller's streamflow dataset for 780 unregulated Australian catchments. CSIRO Land and Water, 115 pp.” (available at: https://publications.csiro.au/rpr/pub?pid=csiro:EP113194). Note also that it is more appropriate to involve only the unregulated basins with minimum human activities for validation purpose. While the authors did compare their continent-averaged ETwb with previous similar studies in Line 230-239, this does not necessarily mean that ETwb is accurate at the grid or basin scales.
  Response: This point is debatable. The Q data for the 780 unregulated catchments can only evaluate ET within the gauged catchments, and they are too sparse to evaluate the modeled ET for the entire continent. Importantly, the catchment Q observations would lead to scale mismatches with the modeled products, because many of the 780 catchments are smaller than the 0.5° grid resolution. The Q observations are accurate, but they cannot support general conclusions on the performance of the CR method for the arid continent.
  In this case, we believe that an acceptable evaluation reference with larger spatial coverage is a better option. The gridded ETwb can be acceptable when the P and the synthesized Q data are of good precision. Even though the ETwb values are not true values, they can serve as a reference for cross-evaluation of the modeled ET, possibly providing practical information for ungauged basins. For example, Pan et al. (2020) assessed numerous ET models against an ensemble of modeled ETs.
  In Australia, approximately 90% of precipitation evaporates, and thus the reliability of ETwb depends mostly on the quality of the P data. Figure S2 compares the locally preferable SILO P with the GPCC P data. The SILO and GPCC precipitation differed by only 2.1% for 2002-2012. We do not agree that this slight difference makes the GPCC data and the ETwb unreliable. The objective of this study is a comparative evaluation, not an absolute evaluation.
  To improve confidence in our evaluation, we will compare the modeled Q data with the Q observations of catchments larger than the grid resolution.
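The water-balance argument in the response above can be illustrated with a minimal sketch. It assumes a long-term balance ET_wb = P - Q with negligible storage change (the manuscript may treat storage differently), and all numbers are hypothetical values chosen only to mirror the 90% evaporation share and the 2.1% precipitation difference quoted in the response; the function names are illustrative.

```python
def water_balance_et(p_mm, q_mm):
    """Long-term water-balance ET (mm/yr), assuming negligible storage change."""
    return p_mm - q_mm


def relative_difference_pct(value, reference):
    """Relative difference of value from reference, in percent."""
    return 100.0 * (value - reference) / reference


# Hypothetical continental means (mm/yr), not taken from the manuscript.
p_ref = 500.0                  # reference precipitation
p_alt = p_ref * (1.0 + 0.021)  # alternative product, 2.1 % larger
q = 0.10 * p_ref               # runoff equal to 10 % of precipitation

et_ref = water_balance_et(p_ref, q)
et_alt = water_balance_et(p_alt, q)
et_diff_pct = relative_difference_pct(et_alt, et_ref)

print(f"ET_wb changes by {et_diff_pct:.2f} % for a 2.1 % change in P")
```

With runoff held fixed, the relative change in ET_wb is the relative change in P divided by the evaporated fraction (here 2.1 % / 0.9), so a small precipitation difference stays small in ET_wb when most precipitation evaporates.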
  
  Comment 4: Another key deficiency is the precipitation data for calculating ETwb. For the same reason explained above, the authors should use precipitation data with a regional/continental focus rather than the one developed for a global coverage. The SILO precipitation data is often regarded as the most reliable one for Australia (https://data.gov.au/data/dataset/67749ef0-7223-437a-851a-573edde09567), which should be used to replace GPCC for a more accurate ETwb. I do understand that the authors want to use grid-based ETwb data for evaluations, but the authors should also test its reliability with basin-scale ETwb. The latter could be derived using measured runoff data from the above-mentioned 780 basins.
  Response: We believe that the SILO data, which are based on local observations, would be a good choice. Beesley et al. (2009) validated the daily SILO precipitation using leave-one-out cross-validation and showed acceptable performance. As replied above, we will add the comparison between the catchment Q observations and the ensemble of the modeled Q.
  
  Comment 5: I would further argue that the authors’ ETwb data are at least 10% smaller than the real values, most likely due to the suboptimal choice of the precipitation as well as the gridded runoff data. The reason I am saying this is that FLUXCOM and CR ET yield pretty similar values, as seen in Fig. 4. It should be noted that FLUXCOM must provide (and it does) one of the most accurate ET data available today since it is based on actual ET measurements by eddy-covariance, even though its inter-annual variance is somewhat subdued as seen in Fig. 3 in comparison with the CR ET values, but this temporal smoothing feature of FLUXCOM is well known from previous studies.
  Response: We disagree. FLUXCOM could be regionally biased and might not be accurate in Australia, where flux observations for training are insufficient. Please see the number of FLUXNET2015 stations in Australia here (https://fluxnet.org/sites/site-summary/). Even though FLUXCOM is based on eddy covariance flux data, the towers are usually installed in accessible locations only. A large part of Australia is almost uninhabitable, and operating a flux tower in such locations is very difficult. It is therefore very likely that the training data were insufficient for the arid continent. Hence, the quality of FLUXCOM is questionable in Australia, and the referee’s argument is hypothetical.
  
  Comment 6: So all in all, it is argued here that most of the difference between CR ET and ETwb is most likely due to unsatisfactory choices in model and water-balance forcing rather than to the need of spatially changing the alpha value of the CR model. (This is not to say that the CR would not overestimate ET rates near the sea since there the air moisture is significantly decoupled from the underlying land surface). As another choice, the authors could also apply several different sources for the forcing in the CR model as well as in the water-balance and see how they affect the outcomes (this would also serve as a sensitivity analysis). Chances are they do in a significant way.
  So before one just replaces a unique calibration-free model with one needing calibration, one must make sure that the original model was evaluated correctly and exhaustively. I do not feel at all this is the case in this study.
  Response: This comment rests on the hypothesis that the ERA-Interim forcing for the CR method and the P data for ETwb are unreliable. However, we have shown here that they are acceptable through direct comparisons with the locally preferable datasets. The referee’s argument could be an overstatement.
  However, as replied above, we will update our calculations with the locally preferable datasets to avoid any doubt among readers. The new calculations would not take long, and the evaluation of the CR method is unlikely to change much.
  
  Comment 7: Note also that there is a significant difference between Brutsaert’s alpha and the alpha value of the CR employed in this study. The Priestley-Taylor (PT) equation is evaluated at the measured air temperature in the former case, while in the latter at the required (but mostly unknown) wet-environment air temperature (estimated via Tws). Without the latter, the PT equation naturally overestimates the wet-environment ET rates (and thus the actual ET rates as well) the more significantly, the drier and hotter the environment has become, therefore a correction (typically based on some measure of aridity) in the alpha value is necessary in the Brutsaert model, but not in the CR model employed in this study. So in this study the alpha value is meant to be the best available estimate of the real PT alpha value and not some weak analog of it, as in the Brutsaert model (i.e., Brutsaert et al., 2020), the latter taking up values much below the physically still interpretable value of one.
  Response: We understand that the temperature correction could change the magnitude of Ew, and will add this point to the discussion section. However, the essential difference between Brutsaert (2015) and Szilagyi et al. (2017) is the rescaling variable. Thus, the equation for alpha developed in Brutsaert et al. (2020) is unlikely to work for the calibration-free formulation of Szilagyi et al. (2017).
  We agree that the alpha in the CR framework is an analog of the PT coefficient. However, the alpha still has the same physical meaning as the PT coefficient: it quantifies the proportion of the aerodynamic component of the Penman equation when the surface has ample water. A potential reason for the alpha deviating from 1.26 is the chosen equation for E_p. The traditional Penman equation could overrate E_p, as shown in Yang et al. (2019). We will add this point to the discussion section.
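For orientation, the interpretation of alpha given in this response can be written out with the textbook forms of the Penman and Priestley-Taylor equations. This is a sketch in generic notation, not necessarily the exact form used in the manuscript: Q_n (available energy in water-equivalent units), f(u) (a wind function), and T_w (the wet-environment air temperature) are symbols assumed here for illustration.

```latex
E_p = \frac{\Delta}{\Delta+\gamma}\,Q_n + \frac{\gamma}{\Delta+\gamma}\,f(u)\,(e_s - e_a),
\qquad
E_w = \alpha\,\frac{\Delta(T_w)}{\Delta(T_w)+\gamma}\,Q_n
```

If E_w is read as a Penman-type sum evaluated under wet conditions at the same temperature, the radiation term is E_w / alpha and the aerodynamic term is the remaining (alpha - 1)/alpha share of E_w, which is roughly one fifth for alpha = 1.26.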
  
  Comment 8: Another model application issue is that in the CR model of this study Tws is estimated only one way, while the original authors of this CR model also described another method for the Tws estimation, yielding somewhat smaller Tws values (as mentioned in this MS, and therefore potentially resulting in a higher alpha-value estimate, most probably bringing the derived alpha value into the often quoted, typically observed 1.1 – 1.32 interval). In fact, most of this CR model’s applications use the latter, so it would be worth to check how it affects model outcomes and the constant alpha value estimation.
  Response: We disagree. The choice for Tws would exert very minor influences on the CR ET. As shown in Table 1 in Szilagyi et al. (2017), the choice for Tws led to small differences in the alpha within the order of 0.01 or 0.02.
  
  Comment 9: Line 238: I do not think GPCC is a reanalysis precipitation data.
  Response: It is a gridded gauge-analysis product. We will revise it.
  
  Comment 10: Please check the text for the numerous typographical errors. For example: correctly ‘Priestley’.
  Response: We will check all the typographical errors in revision.
  
  References
  Acharya, S. C., Nathan, R., Wang, Q. J., Su, C.-H., and Eizenberg, N.: An evaluation of daily precipitation from a regional atmospheric reanalysis over Australia, Hydrol. Earth Syst. Sci., 23, 3387–3403, https://doi.org/10.5194/hess-23-3387-2019, 2019.
  Beesley, C. A., Frost, A. J., and Zajaczkowski, J.: A comparison of the BAWAP and SILO spatially interpolated daily rainfall datasets, 18th World IMACS / MODSIM Congress, Cairns, Australia 13-17 July 2009.
  Brutsaert, W.: A generalized complementary principle with physical constraints for land‐surface evaporation, Water Resour. Res., 51, 8087–8093, https://doi.org/10.1002/2015WR017720, 2015.
  Glenn, E.P., Doody, T.M., Guerschman, J.P., Huete, A.R., King, E.A., McVicar, T.R., Van Dijk, A.I.J.M., Van Niel, T.G., Yebra, M. and Zhang, Y.: Actual evapotranspiration estimation by ground and remote sensing methods: the Australian experience, Hydrol. Process., 25, 4103–4116, https://doi.org/10.1002/hyp.8391, 2011.
  Hobeichi, S., Abramowitz, G., Evans, J., and Beck, H. E.: Linear Optimal Runoff Aggregate (LORA): a global gridded synthesis runoff product, Hydrol. Earth Syst. Sci., 23, 851–870, https://doi.org/10.5194/hess-23-851-2019, 2019.
  Kim, D., Ha, K.-J., and Yeo, J.-H.: New drought projections over East Asia using evapotranspiration deficits from the CMIP6 warming scenarios, Earth's Future, 9, e2020EF001697, https://doi.org/10.1029/2020EF001697, 2021.
  Kim, D., Lee, W.‐S., Kim, S. T., and Chun, J. A.: Historical drought assessment over the contiguous United States using the generalized complementary principle of evapotranspiration, Water Resour. Res., 55, 6244–6267. https://doi.org/10.1029/2019WR024991, 2019b.
  Ma, N., and Szilagyi, J.: The CR of evaporation: a calibration-free diagnostic and benchmarking tool for large-scale terrestrial evapotranspiration modeling, Water Resour. Res., 55, 7246–7274. https://doi.org/10.1029/2019WR024867, 2019.
  Martens, B., Schumacher, D. L., Wouters, H., Muñoz-Sabater, J., Verhoest, N. E. C., and Miralles, D. G.: Evaluating the land-surface energy partitioning in ERA5, Geosci. Model Dev., 13, 4159–4181, https://doi.org/10.5194/gmd-13-4159-2020, 2020.
  Pan, S., Pan, N., Tian, H., Friedlingstein, P., Sitch, S., Shi, H., Arora, V. K., Haverd, V., Jain, A. K., Kato, E., Lienert, S., Lombardozzi, D., Nabel, J. E. M. S., Ottlé, C., Poulter, B., Zaehle, S., and Running, S. W.: Evaluation of global terrestrial evapotranspiration using state-of-the-art approaches in remote sensing, machine learning and land surface modeling, Hydrol. Earth Syst. Sci., 24, 1485–1509, https://doi.org/10.5194/hess-24-1485-2020, 2020.
  Szilagyi, J., Crago, R., and Qualls, R.: A calibration‐free formulation of the complementary relationship of evaporation for continental‐scale hydrology, J. Geophys. Res.-Atmos., 122, 264–278, https://doi.org/10.1002/2016JD025611, 2017.
  Yang, Y., Roderick, M. L., Zhang, S., McVicar, T. R., Donohue, R. J.: Hydrologic implications of vegetation response to elevated CO2 in climate projections. Nature Clim. Change 9, 44–48, https://doi.org/10.1038/s41558-018-0361-0, 2019.
  
  Citation: https://doi.org/10.5194/hess-2021-126-AC1
RC2: 'Comment on hess-2021-126', Joshua Fisher, 25 Mar 2021
This is a good paper that discusses in depth the complementary relationship (CR) of evapotranspiration (ET), and conducts an extensive evaluation of the CR over Australia. The paper is well-motivated—the vapor pressure deficit (VPD)-soil moisture (SM) CR is widely used in ET estimation, primarily because high quality and high spatial resolution SM is not always readily available; whereas, VPD may be more readily available. Semi-arid places like in Australia are where the CR may be most important, and potentially where it may be the most uncertain.

The Introduction and Discussion are written very intelligently, taking a deep dive into the theory and formulations of the CR and ET estimation. The authors do a good job of describing each of the respective ET models and datasets. Relative to the strength of the overall writing, especially from these sections, the analysis and results were somewhat lacking in depth, however. Ultimately, the results were just a handful of maps and time series of the different products, with no real “truth” to benchmark against. Given how intelligent the authors were with their communication and writing of the theory, I was surprised to see the analysis so shallow. I would have liked to have seen that same intelligence from the writing applied to the analysis. The authors could have gone into much more analytical depth on spatial patterns, sensitivities, etc.

Related to the tenuous/lack of benchmarking, I suggest editing the language for use of words like bias and under/over-estimation e.g. in the Results. These terms generally refer to a metric of truth, of which none is given here (I don’t consider the water balance the “truth” given that it is also a model of models; see also comments from Reviewer 1). Better, to stick with language such as larger/smaller/etc. as the comparisons are just relative to one another.

Moreover, be cognizant in attributing pattern to process relative to model run conditions, especially when it comes to relative magnitudes. Any one model can be high or low depending on the forcing dataset it used (see e.g. comments from Reviewer 1), which is not necessarily indicative of the model (or, importantly for this paper, the inferred processes therein). The closest approximation to ascertaining process from pattern would be to identify spatial and temporal patterns regardless of magnitude. For example, the patterns mentioned for AWRA-L in L251 are interesting and likely indicative of process (though they could have easily just been attributable to something unusual in the forcing used for that model).

Line-specific comments:

Abstract is written a bit, well, abstractly. It could use more take-home information/detail like what exactly were the models and what exactly were their performances.

L37. See [Fisher et al., 2017].

L39. See [Polhamus et al., 2013].

L47. See [Fisher et al., 2011].

L54. See, for reference, [Purdy et al., 2018].

L192. PT-JPL [Fisher et al., 2008] also incorporates the complementary relationship, citing Bouchet, in the soil evaporation component, e.g., RH^VPD. This simple formulation tracks relative surface wetness well [Fisher et al., 2008], and has since been used in other major models of ET, e.g., PM-MOD16 [Mu et al., 2011]. Still, advection will contaminate the relationship, and replacement with direct soil moisture e.g. [Purdy et al., 2018], can eliminate that contamination. The new ECOSTRESS mission [Fisher et al., 2020] uses PT-JPL for the global ET product, but is currently being updated to incorporate the [Purdy et al., 2018] soil moisture formulation and inclusion, downscaled using the measured LST and NDVI following [Colliander et al., 2017].

Figure 3. I suggest making the symbols in the Taylor diagram more distinguishable.

Figure 4. PT-JPL data are available from 1984 from the same link where you got the current data.

L402. See [Purdy et al., 2018] for soil moisture incorporation into PT-JPL.

Figure 8. This seems to be redundant with Figure 4.

Josh Fisher

Colliander, A., J. B. Fisher, G. Halverson, O. Merlin, S. Misra, R. Bindlish, T. J. Jackson, and S. Yueh (2017), Spatial downscaling of SMAP soil moisture using MODIS land surface temperature and NDVI during SMAPVEX15, , (11), 2107-2111.

Fisher, J. B., K. Tu, and D. D. Baldocchi (2008), Global estimates of the land-atmosphere water flux based on monthly AVHRR and ISLSCP-II data, validated at 16 FLUXNET sites, , (3), 901-919.

Fisher, J. B., R. H. Whittaker, and Y. Malhi (2011), ET Come Home: A critical evaluation of the use of evapotranspiration in geographical ecology, , , 1-18.

Fisher, J. B., F. Melton, E. Middleton, C. Hain, M. Anderson, R. Allen, M. F. McCabe, S. Hook, D. Baldocchi, P. A. Townsend, A. Kilic, K. Tu, D. D. Miralles, J. Perret, J.-P. Lagouarde, D. Waliser, A. J. Purdy, A. French, D. Schimel, J. S. Famiglietti, G. Stephens, and E. F. Wood (2017), The future of evapotranspiration: Global requirements for ecosystem functioning, carbon and climate feedbacks, agricultural management, and water resources, , , 2618-2626.

Fisher, J. B., B. Lee, A. J. Purdy, G. H. Halverson, M. B. Dohlen, K. Cawse-Nicholson, A. Wang, R. G. Anderson, B. Aragon, M. A. Arain, D. D. Baldocchi, J. M. Baker, H. Barral, C. J. Bernacchi, C. Bernhofer, S. C. Biraud, G. Bohrer, N. Brunsell, B. Cappelaere, S. Castro-Contreras, J. Chun, B. J. Conrad, E. Cremonese, J. Demarty, A. R. Desai, A. De Ligne, L. Foltýnová, M. L. Goulden, T. J. Griffis, T. Grünwald, M. S. Johnson, M. Kang, D. Kelbe, N. Kowalska, J.-H. Lim, I. Maïnassara, M. F. McCabe, J. E. C. Missik, B. P. Mohanty, C. E. Moore, L. Morillas, R. Morrison, J. W. Munger, G. Posse, A. D. Richardson, E. S. Russell, Y. Ryu, A. Sanchez-Azofeifa, M. Schmidt, E. Schwartz, I. Sharp, L. Šigut, Y. Tang, G. Hulley, M. Anderson, C. Hain, A. French, E. Wood, and S. Hook (2020), ECOSTRESS: NASA's Next Generation Mission to Measure Evapotranspiration From the International Space Station, , (4), 1-20.

Mu, Q., M. Zhao, and S. W. Running (2011), Improvements to a MODIS global terrestrial evapotranspiration algorithm, , , 519-536.

Polhamus, A., J. B. Fisher, and K. P. Tu (2013), What controls the error structure in evapotranspiration models?, , (0), 12-24.

Purdy, A. J., J. B. Fisher, M. L. Goulden, A. Colliander, G. Halverson, K. Tu, and J. S. Famiglietti (2018), SMAP soil moisture improves global evapotranspiration, , , 1-14.
Citation: https://doi.org/10.5194/hess-2021-126-RC2
- AC2: 'Reply on RC2', Jong Ahn Chun, 08 Jun 2021
  
  We are thankful for Dr. Fisher’s positive and constructive comments. To improve the manuscript, we will (1) conduct an additional simple statistical analysis that could isolate the effects of individual controls on the variability of ET for each model, which would make the manuscript more informative. In addition, (2) we will show the reliability of ETwb by replacing the global precipitation product with a locally preferable one, even though the two are not very different. We will also show the predictive performance of the modeled runoff data by comparing them against available catchment runoff observations. Our responses to the specific comments follow.
  Comment 1: This is a good paper that discusses in depth the complementary relationship (CR) of evapotranspiration (ET), and conducts an extensive evaluation of the CR over Australia. The paper is well-motivated—the vapor pressure deficit (VPD)-soil moisture (SM) CR is widely used in ET estimation, primarily because high quality and high spatial resolution SM is not always readily available; whereas, VPD may be more readily available. Semi-arid places like in Australia are where the CR may be most important, and potentially where it may be the most uncertain.
  The Introduction and Discussion are written very intelligently, taking a deep dive into the theory and formulations of the CR and ET estimation. The authors do a good job of describing each of the respective ET models and datasets. Relative to the strength of the overall writing, especially from these sections, the analysis and results were somewhat lacking in depth, however. Ultimately, the results were just a handful of maps and time series of the different products, with no real “truth” to benchmark against. Given how intelligent the authors were with their communication and writing of the theory, I was surprised to see the analysis so shallow. I would have liked to have seen that same intelligence from the writing applied to the analysis. The authors could have gone into much more analytical depth on spatial patterns, sensitivities, etc.
  Response: We appreciate the positive comments. To refine our statistical analysis, we will add some analysis that could isolate effects of climatic and other controls on the interannual variability of each model, e.g., via partial correlation analyses. Such information could be beneficial for selection of grid ET products in Australia.
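As an illustration of the partial correlation analysis mentioned in the response above, the following minimal sketch computes a first-order partial correlation from pairwise Pearson correlations. It is not the authors' analysis: the variable names and the synthetic anomalies are hypothetical, and only one control variable is handled.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Synthetic standardized anomalies, purely illustrative (not manuscript data).
# ET is driven by precipitation only; net radiation co-varies with precipitation.
random.seed(0)
n = 2000
precip = [random.gauss(0, 1) for _ in range(n)]
rnet = [-0.8 * p + 0.6 * random.gauss(0, 1) for p in precip]
et = [p + 0.5 * random.gauss(0, 1) for p in precip]

raw = pearson(et, rnet)
partial = partial_corr(et, rnet, precip)
print(f"raw r(ET, Rn) = {raw:.2f}; partial r(ET, Rn | P) = {partial:.2f}")
```

In this constructed case the raw correlation between ET and net radiation is strong only because both depend on precipitation; once precipitation is controlled for, the partial correlation is close to zero.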
  We would like to reiterate that the objective of this work is a comparative evaluation of the CR method, not an absolute one. Australia includes large arid areas that are almost uninhabitable. Awaiting ground observations in such locations for evaluating hydrologic models is unrealistic. In this case, reliable ET estimates could become an alternative evaluation reference. Pan et al. (2020), for example, compared modeled ET products even with the ensemble of modeled ETs. We believe that such an evaluation could be informative too. We do not argue that the ETwb estimates are true values, but they could serve as an acceptable reference for evaluation.
  To improve the reliability of ETwb in revision, we will replace the GPCC precipitation (P) with the locally preferable SILO data, as recommended by Reviewer 1. Even though the GPCC data do not deviate considerably from the SILO data (see Figure S2), the replacement would avoid doubts about the accuracy of ETwb.
  In addition, we will add an evaluation of the ensemble of the modeled runoffs against the available runoff observations. Since LORA and GRUN have been validated against global runoff observations, the modeled runoff is expected to show acceptable agreement. We also want to highlight that ETwb is mostly determined by precipitation rather than by runoff in the arid continent. Approximately 90% of precipitation evaporates in Australia (Glenn et al., 2011), and thus the quality of the modeled runoff would exert only a minor influence on ETwb. Please see our responses to Referee 1's comments.
  
  Comment 2: Related to the tenuous/lack of benchmarking, I suggest editing the language for use of words like bias and under/over-estimation e.g. in the Results. These terms generally refer to a metric of truth, of which none is given here (I don’t consider the water balance the “truth” given that it is also a model of models; see also comments from Reviewer 1). Better, to stick with language such as larger/smaller/etc. as the comparisons are just relative to one another.
  Response: Thank you very much for these more suitable terms. We will use appropriate terminology in revision when discussing the comparisons between the models, and will highlight that ETwb is just an acceptable evaluation reference.
  
  Comment 3: Moreover, be cognizant in attributing pattern to process relative to model run conditions, especially when it comes to relative magnitudes. Any one model can be high or low depending on the forcing dataset it used (see e.g. comments from Reviewer 1), which is not necessarily indicative of the model (or, importantly for this paper, the inferred processes therein). The closest approximation to ascertaining process from pattern would be to identify spatial and temporal patterns regardless of magnitude. For example, the patterns mentioned for AWRA-L in L251 are interesting and likely indicative of process (though they could have easily just been attributable to something unusual in the forcing used for that model).
  Response: We agree. In revision, we will look into the patterns of relative magnitudes as well as those of absolute magnitudes. And, we will tabulate forcing inputs of each model so that readers could realize differences in forcing inputs of the models at a glance.
  
  Comment 4: Abstract is written a bit, well, abstractly. It could use more take-home information/detail like what exactly were the models and what exactly were their performances.
  Response: After revision, the abstract will be rewritten accordingly. We will include several take-home lessons in the new abstract.
  
  Comment 5: L37. See [Fisher et al., 2017]. L39. See [Polhamus et al., 2013]. L47. See [Fisher et al., 2011]. L54. See, for reference, [Purdy et al., 2018].
  Response: Thanks for the references. We will cite them where appropriate, as they could improve the introduction.
  
  Comment 6: PT-JPL [Fisher et al., 2008] also incorporates the complementary relationship, citing Bouchet, in the soil evaporation component—e.g., RH^VPD. This simple formulation tracks relative surface wetness well [Fisher et al., 2008], and has since been used in other major models of ET, e.g., PM-MOD16 [Mu et al., 2011]. Still, advection will contaminate the relationship, and replacement with direct soil moisture e.g. [Purdy et al., 2018], can eliminate that contamination. The new ECOSTRESS mission [Fisher et al., 2020] uses PT-JPL for the global ET product, but is currently being updated to incorporate the [Purdy et al., 2018] soil moisture formulation and inclusion, downscaled using the measured LST and NDVI following [Colliander et al., 2017].
  Response: We will add the given attributes of the PT-JPL to the description.
  
  Comment 7: Figure 3. I suggest making the symbols in the Taylor diagram more distinguishable.
  Response: We will revise as recommended.
  
  Comment 8: Figure 4. PT-JPL data are available from 1984 from the same link where you got the current data.
  Response: We will update it as recommended in revision, and the discussion will be revised accordingly.
  
  Comment 9: See [Purdy et al., 2018] for soil moisture incorporation into PT-JPL.
  Response: We confirmed it. The discussion will be revised accordingly.
  
  Comment 10: Figure 8. This seems to be redundant with Figure 4.
  Response: Figure 8 updates Figure 4 with the performance of the calibrated CR method. Hence, it is not redundant but shows the differences from Figure 4. However, to make the manuscript more concise, we could consider a brief explanation of the performance of the CR method after calibration with ETwb instead.
  
  References
  Pan, S., Pan, N., Tian, H., Friedlingstein, P., Sitch, S., Shi, H., Arora, V. K., Haverd, V., Jain, A. K., Kato, E., Lienert, S., Lombardozzi, D., Nabel, J. E. M. S., Ottlé, C., Poulter, B., Zaehle, S., and Running, S. W.: Evaluation of global terrestrial evapotranspiration using state-of-the-art approaches in remote sensing, machine learning and land surface modeling, Hydrol. Earth Syst. Sci., 24, 1485–1509, https://doi.org/10.5194/hess-24-1485-2020, 2020.
  
  Citation: https://doi.org/10.5194/hess-2021-126-AC2
AC3: 'correction of the authors' response to the referee 1's comment on Eq. (7)', Jong Ahn Chun, 25 Jan 2022

While we thought the referee 1's comment on Eq. (7) was wrong, we now understand why the referee required the correction.
Before the third-round revision, Eq. (7) combined the two equations as Tdry = Twb + es(Twb)/gamma = Tavg + es(Tavg)/gamma. Correctly, the last term, es(Tavg)/gamma, should have been es(Tdew)/gamma (or ea/gamma). So, the referee 1's comment on Tavg was right. Nonetheless, nothing needs to change in the prior calculations, because we used Twb when calculating Tdry, i.e., Tdry = Twb + es(Twb)/gamma. It was therefore a mistake in writing, not in calculation.
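The corrected relation can be checked numerically with a short sketch. It assumes the FAO-56 Tetens-type formula for saturation vapor pressure and a constant psychrometric coefficient of 0.066 kPa/°C; neither is confirmed as the authors' choice, and the function names and input values are hypothetical.

```python
import math

GAMMA = 0.066  # psychrometric coefficient (kPa/degC); assumed near-sea-level value


def es(t_c):
    """Saturation vapor pressure (kPa) at t_c (degC), FAO-56 Tetens-type formula."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def wet_bulb(t_avg, e_a, gamma=GAMMA, tol=1e-8):
    """Solve Twb + es(Twb)/gamma = Tavg + ea/gamma for Twb by bisection."""
    target = t_avg + e_a / gamma
    lo, hi = -50.0, t_avg  # Twb cannot exceed Tavg when ea <= es(Tavg)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid + es(mid) / gamma > target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def t_dry(t_wb, gamma=GAMMA):
    """Dry-environment air temperature: Tdry = Twb + es(Twb)/gamma."""
    return t_wb + es(t_wb) / gamma


# Hypothetical inputs: air temperature 25 degC, actual vapor pressure 1.5 kPa.
t_avg, e_a = 25.0, 1.5
t_wb = wet_bulb(t_avg, e_a)

via_wet_bulb = t_dry(t_wb)               # form used in the calculations
corrected_rhs = t_avg + e_a / GAMMA      # corrected last term, ea/gamma
misprinted_rhs = t_avg + es(t_avg) / GAMMA  # misprinted last term, es(Tavg)/gamma

print(f"Twb = {t_wb:.2f}, Tdry = {via_wet_bulb:.2f}, "
      f"corrected = {corrected_rhs:.2f}, misprinted = {misprinted_rhs:.2f}")
```

Under these assumptions the wet-bulb form and the corrected right-hand side agree, while the misprinted term gives a larger value whenever the air is unsaturated, which is consistent with the error being one of notation rather than of computation.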
The revised manuscript does not include the misrepresented last term; thus it would not be an issue for potential readers.
We deeply regret our response to the referee's comment and send a sincere apology to Referee 1. Once again, we greatly appreciate the sound comments from Referee 1. Thank you.

Citation: https://doi.org/10.5194/hess-2021-126-AC3

Daeha Kim, Minha Choi, and Jong Ahn Chun

Viewed

Total article views: 2,199 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,594	536	69	2,199	61	70

Views and downloads (calculated since 16 Mar 2021)

Month	HTML	PDF	XML	Total
Mar 2021	339	62	7	408
Apr 2021	48	18	1	67
May 2021	28	7	1	36
Jun 2021	56	20	5	81
Jul 2021	21	5	0	26
Aug 2021	24	7	0	31
Sep 2021	23	15	0	38
Oct 2021	29	80	0	109
Nov 2021	20	53	0	73
Dec 2021	28	9	0	37
Jan 2022	32	11	2	45
Feb 2022	31	3	0	34
Mar 2022	16	8	1	25
Apr 2022	21	7	1	29
May 2022	11	6	1	18
Jun 2022	7	7	2	16
Jul 2022	19	1	0	20
Aug 2022	4	7	0	11
Sep 2022	14	6	0	20
Oct 2022	19	3	2	24
Nov 2022	9	4	0	13
Dec 2022	12	7	0	19
Jan 2023	17	13	0	30
Feb 2023	13	1	0	14
Mar 2023	7	3	2	12
Apr 2023	8	6	0	14
May 2023	9	4	1	14
Jun 2023	4	5	2	11
Jul 2023	30	9	2	41
Aug 2023	22	5	0	27
Sep 2023	13	8	2	23
Oct 2023	7	5	0	12
Nov 2023	10	3	0	13
Dec 2023	10	1	2	13
Jan 2024	11	1	1	13
Feb 2024	17	14	0	31
Mar 2024	20	14	3	37
Apr 2024	43	5	10	58
May 2024	36	2	3	41
Jun 2024	55	3	3	61
Jul 2024	33	2	0	35
Aug 2024	38	4	0	42
Sep 2024	30	1	1	32
Oct 2024	26	10	1	37
Nov 2024	31	5	0	36
Dec 2024	32	5	0	37
Jan 2025	31	4	1	36
Feb 2025	34	4	2	40
Mar 2025	32	7	3	42
Apr 2025	36	8	2	46
May 2025	42	8	1	51
Jun 2025	39	17	0	56
Jul 2025	41	12	2	55
Aug 2025	6	3	0	9

Viewed (geographical distribution)

Total article views: 1,876 (including HTML, PDF, and XML) Thereof 1,876 with geography defined and 0 with unknown origin.

Latest update: 10 Aug 2025

Short summary

This work evaluates a convenient operational method to simulate evaporation over dry land surfaces across Australia. While the chosen method, based on the responsive behavior of atmospheric water demand, outperformed commonly used sophisticated models in predicting evaporation in the United States and China, it showed some poor performance in wet river basins in Australia. Yet, its performance was still good under (semi-)arid climates.

