Reference crop evapotranspiration (ETo) is calculated using a standard formula with temperature, vapor pressure, solar radiation, and wind speed as input variables. ETo forecasts can be produced when forecasts of these input variables from numerical weather prediction (NWP) models are available. As raw ETo forecasts are often subject to systematic errors, statistical calibration is needed to improve forecast quality. The most straightforward and widely used approach is to directly calibrate raw ETo forecasts constructed with the raw forecasts of input variables. However, the predictable signal in ETo forecasts may not be fully exploited by this approach, which does not deal with error propagation from input variables to ETo forecasts. We hypothesize that correcting errors in input variables as a precursor to forecast calibration will lead to more skillful ETo forecasts. To test this hypothesis, we evaluate two calibration strategies that construct raw ETo forecasts with the raw (strategy i) or bias-corrected (strategy ii) input variables in ETo forecast calibration across Australia. Calibrated ETo forecasts based on bias-corrected input variables (strategy ii) demonstrate lower biases, higher correlation coefficients, and higher skills than forecasts produced by the calibration using raw input variables (strategy i). This investigation indicates that improving raw forecasts of input variables could effectively reduce error propagation and enhance ETo forecast calibration. We anticipate that future NWP-based ETo forecasting will benefit from adopting the calibration strategy developed in this study to produce more skillful ETo forecasts.

As a variable measuring the evaporative demand of the atmosphere, reference crop evapotranspiration (ETo) has been widely used to estimate potential water loss from the land surface to the atmosphere (Hopson and Webster, 2009; Liu et al., 2019; Renard et al., 2010). Quantification of ETo has been increasingly performed to support efficient water use and water management (Mushtaq et al., 2019; Perera et al., 2016). Forecasts of short-term ETo (days to weeks) are highly valuable for real-time decision-making on farming activities and water allocation to competing users (Djaman et al., 2018; Kumar et al., 2012).

A plethora of methods with different statistical assumptions, dependence on observations, and requirements of weather forecasts have been developed to predict future ETo (Bachour et al., 2016; Ballesteros et al., 2016; Karbasi, 2018; Mariño et al., 1993). ETo is affected jointly by temperature, vapor pressure, solar radiation, and wind speed (Bachour et al., 2016; Luo et al., 2014). Prediction models using these weather variables as inputs allow for representations of atmospheric dynamics and often produce reasonable ETo forecasts (Torres et al., 2011). The increasing availability of weather and climate forecasts based on numerical models has opened up new opportunities for ETo forecasting (Cai et al., 2007; Srivastava et al., 2013; Tian and Martinez, 2014; Zhao et al., 2019a). Forecasts of temperature, vapor pressure, solar radiation, and wind speed from numerical weather prediction (NWP) models/general circulation models (GCMs) could be translated into ETo forecasts using the Food and Agriculture Organization (FAO) ETo equation (Allen et al., 1998; Cai et al., 2007).

Despite the advantages in modeling atmospheric dynamics, flexibility in temporal and spatial scales (Pelosi et al., 2016), and high data availability (Er-Raki et al., 2010), NWP/GCM-based raw ETo forecasts often demonstrate systematic errors (Turco et al., 2017). Limitations in model algorithms, parameterization, and data assimilation often lead to significant errors in raw forecasts of weather variables (Lim and Park, 2019; Vogel et al., 2018). As a result, raw ETo forecasts calculated directly with the raw forecasts of input weather variables (e.g., temperature, vapor pressure, solar radiation, and wind speed) typically demonstrate substantial inconsistencies with observations (Medina and Tian, 2020; Zhao et al., 2019a) and need to be calibrated to improve forecast quality.

Effective calibration aims to correct errors in raw forecasts and provide unbiased, reliable, and skillful calibrated forecasts. Theoretically, two different strategies could be adopted to achieve this goal in the calibration of ETo forecasts. The first strategy is to construct raw ETo forecasts directly with the raw forecasts of the input variables and then calibrate the derived ETo forecasts. This strategy lumps errors from the input variables together in the raw ETo forecasts and corrects the combined errors directly (Tian and Martinez, 2014; Zhao et al., 2019a). Because it is straightforward, it has been adopted by most existing calibrations of NWP/GCM-based ETo forecasts. For example, Medina et al. (2018) used a linear regression bias-correction method to calibrate ETo forecasts from three NWP models and achieved significant improvements in forecast quality. Medina and Tian (2020) employed three probabilistic calibration methods to calibrate ETo forecasts from multiple NWP models and generated more skillful and reliable forecasts than those obtained with a simple regression bias-correction model. Another probabilistic postprocessing method, the Bayesian joint probability (BJP) model, was adopted to improve the accuracy and skills of GCM-based ETo forecasts across multiple sites in Australia (Zhao et al., 2019a, b).

Alternatively, ETo forecast calibration could start with correcting errors in input variables. Raw forecasts of input variables could be improved first, and raw ETo forecasts could then be constructed with the corrected input variables. After that, the derived raw ETo forecasts could be further improved through calibration. This strategy requires one more step than the one using the raw input variables. With the improved input variables, errors in the resultant raw ETo forecasts could be significantly reduced (Nouri and Homaee, 2018; Perera et al., 2014). However, it remains unclear whether improving raw forecasts of input variables ultimately adds skill to calibrated ETo forecasts (Medina and Tian, 2020).

Which calibration strategy produces more skillful calibrated forecasts is a critical question in NWP-based ETo forecasting, but the answer remains unclear. Since NWP/GCM-based ETo forecasting is increasingly conducted to support water resource management, there is a need to investigate the necessity of correcting raw forecasts of the input variables in ETo forecast calibration.

We hypothesize that reducing errors in input variables as a precursor will enhance ETo forecast calibration and lead to more skillful calibrated forecasts. To test this hypothesis, we compare two calibration strategies that construct raw ETo forecasts based on the raw (strategy i) or bias-corrected (strategy ii) input variables in calibrating ETo forecasts across Australia. This study aims to fill a knowledge gap in NWP-based ETo forecasting and develop a calibration strategy to produce more skillful ETo forecasts.

In this study, we use the ETo data derived from the Australian Water Availability Project (AWAP)'s gridded data of temperature, vapor pressure, and solar radiation (Jones et al., 2007, 2014), as well as wind speed data developed by McVicar et al. (2008), as observations for ETo forecast calibration. Weather forecasts from the Australian Community Climate and Earth System Simulator G2 version (ACCESS-G2) model are extracted as inputs for the calculation of raw ETo forecasts. We regrid the ACCESS-G2 forecasts using bilinear interpolation to match the grid spacing of the AWAP data. The 3-hourly ACCESS-G2 forecasts during April 2016–March 2019 are aggregated to the daily scale to match the timeframe of the original site observations used to generate the AWAP data. The ACCESS-G2 weather forecasts have a forecast horizon of 9 d. AWAP ETo during April 1999–March 2019 is used for the training of the calibration model, and data during April 2016–March 2019 are selected for forecast calibration and evaluation.

We calculate ETo forecasts and AWAP ETo using the FAO56 equation (Allen et
al., 1998):
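For reference, the standard daily FAO56 Penman–Monteith form given by Allen et al. (1998) is shown below. The symbols follow FAO56 conventions and are assumed here; they are not taken from the notation of this paper.

```latex
\begin{equation}
\mathrm{ET_o} =
\frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}
     {\Delta + \gamma\,(1 + 0.34\,u_2)}
\end{equation}
```

Here ETo is in mm d$^{-1}$, $R_n$ is the net radiation at the crop surface and $G$ the soil heat flux density (both in MJ m$^{-2}$ d$^{-1}$), $T$ is the mean daily air temperature at 2 m height (°C), $u_2$ is the wind speed at 2 m height (m s$^{-1}$), $e_s$ and $e_a$ are the saturation and actual vapor pressure (kPa), $\Delta$ is the slope of the saturation vapor pressure curve (kPa °C$^{-1}$), and $\gamma$ is the psychrometric constant (kPa °C$^{-1}$).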

In constructing raw ETo forecasts, temperature and solar radiation are
readily available from the ACCESS-G2 outputs. To obtain the wind speed
forecasts, we first use the forecasts of zonal (

The calibration model used in this study is the Seasonally Coherent
Calibration (SCC) model, which is introduced in detail in Sect. 2.3.2. For
the calibration across Australia with a spatial resolution of 0.05

Four sets of ETo forecast calibrations.

We conduct four calibrations to evaluate how the two different strategies affect the calibrated ETo forecasts (Table 1 and Fig. S1). Our recent investigation suggests that calibrating ETo anomalies, which are calculated as departures from the climatological mean, could produce more skillful calibrated forecasts than calibrating ETo forecasts directly (Yang et al., 2021b). As a result, in this study, we primarily focus on calibrations based on ETo anomalies (calibrations 1 and 2). The comparison between calibrations 1 and 2 investigates whether the bias correction of input variables further improves ETo forecasts when the calibration is conducted based on ETo anomalies and the climatological mean. We also conduct additional calibrations that postprocess ETo forecasts directly (calibrations 3 and 4) to test whether the contribution of improving input variables to ETo forecast calibration, if any, depends on how ETo forecasts are calibrated (based on anomalies vs. based on ETo). Calibrations 3 and 4 help evaluate the general applicability of strategy ii to enhance NWP/GCM-based ETo forecasting. Key steps of the four calibrations can be found in the schematic diagram showing how raw ETo forecasts are constructed and how calibrations are conducted (Fig. S1). In the main text, we primarily analyze results from calibrations 1 and 2. Improvements from bias-correcting the input variables in calibrations 3 and 4 are very similar to those in calibrations 1 and 2 (see the Supplement). To avoid redundancy, we present results from calibrations 3 and 4 mainly in the Supplement.

In ETo forecast calibration employing strategy ii (calibrations 2 and 4), we use a nonparametric quantile mapping method (QUANT) to correct raw forecasts of the input variables. The QUANT method has been widely used in hydrological and climatological investigations to correct bias in raw forecasts (Boe et al., 2007). To use QUANT, we first construct the empirical cumulative distribution function (CDF) of both raw forecasts and AWAP data for each variable. We then calculate the percentile of each raw forecast value in the CDF of the raw forecasts. Next, these percentiles are used to look up the values at the same percentiles in the CDF of the corresponding AWAP data, which are then treated as the bias-corrected forecasts.
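The three steps of empirical quantile mapping can be sketched as follows. This is a minimal illustration, not the QUANT implementation used in this study: the function name `quantile_map`, the mid-point plotting positions, and the linear interpolation between sorted values are all assumptions.

```python
import numpy as np

def quantile_map(raw_train, obs_train, raw_new):
    """Empirical quantile mapping (illustrative sketch, hypothetical helper).

    raw_train: archived raw forecasts used to build the forecast CDF
    obs_train: reference data (e.g. AWAP) used to build the observed CDF
    raw_new:   raw forecasts to be bias corrected
    """
    raw_sorted = np.sort(np.asarray(raw_train, dtype=float))
    obs_sorted = np.sort(np.asarray(obs_train, dtype=float))

    # Step 1: empirical CDFs via mid-point plotting positions (an assumption).
    p_raw = (np.arange(1, raw_sorted.size + 1) - 0.5) / raw_sorted.size
    p_obs = (np.arange(1, obs_sorted.size + 1) - 0.5) / obs_sorted.size

    # Step 2: percentile of each new forecast in the raw-forecast CDF.
    # np.interp clamps values outside the training range to the end points.
    p_new = np.interp(raw_new, raw_sorted, p_raw)

    # Step 3: look up the same percentile in the observed CDF.
    return np.interp(p_new, p_obs, obs_sorted)
```

For example, if the archived raw forecasts are uniformly 2 units higher than the reference data, the mapped values are shifted down by 2 units. Forecasts outside the range of the archive are mapped to the smallest or largest reference value, which is one of the known limitations of this simple form.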

After we construct the raw ETo forecasts, based on either raw (calibrations
1 and 3) or bias-corrected (calibrations 2 and 4) forecasts of the input
variables, we employ the SCC model to further calibrate the ETo forecasts.
For the calibrations (calibrations 1 and 2) applying SCC to ETo anomalies,
the first step is to derive the climatological mean at the daily scale using
the 20-year AWAP ETo. Calibrations 3 and 4 skip this step and apply the SCC
model to ETo forecasts directly. We use the method developed by Narapusetty et al. (2009) and adopt
trigonometric functions and harmonics to simulate the annual cycle of AWAP
ETo to derive the climatological mean:
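As an illustration of this step, a smooth annual cycle can be estimated by regressing daily ETo on a constant plus a few sine and cosine harmonics of the day of year. The sketch below is a generic harmonic regression, not the exact formulation of Narapusetty et al. (2009); the function names, the use of two harmonics, the 365.25 d period, and the ordinary least-squares fit are assumptions.

```python
import numpy as np

def harmonic_design(day_of_year, n_harmonics=2, period=365.25):
    """Design matrix with a constant and cos/sin pairs for each harmonic."""
    t = 2.0 * np.pi * np.asarray(day_of_year, dtype=float) / period
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(k * t))
        cols.append(np.sin(k * t))
    return np.column_stack(cols)

def fit_climatology(day_of_year, values, n_harmonics=2):
    """Least-squares estimate of the harmonic coefficients."""
    X = harmonic_design(day_of_year, n_harmonics)
    coef, *_ = np.linalg.lstsq(X, np.asarray(values, dtype=float), rcond=None)
    return coef

def climatological_mean(day_of_year, coef):
    """Evaluate the fitted annual cycle on the given days of the year."""
    n_harmonics = (len(coef) - 1) // 2
    return harmonic_design(day_of_year, n_harmonics) @ coef
```

Anomalies then follow as `values - climatological_mean(day_of_year, coef)`, evaluated with the same coefficients for the reference data and for the raw forecasts.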

We then remove the climatological mean from both raw ETo forecasts and AWAP ETo to generate anomalies. We calibrate the derived anomalies of raw ETo forecasts against the anomalies of AWAP ETo using the SCC model. The SCC model is composed of four key components, including (i) a joint probability model to characterize the connection between raw forecasts and observations, (ii) reconstruction of seasonal patterns in raw forecasts based on the long-term observations, (iii) reparameterization to obtain parameters for short-archived raw forecasts, and (iv) generation of calibrated forecasts with the parameters and the joint model. The SCC model has been introduced in detail in our site- and continental-scale calibrations of NWP precipitation forecasts (Wang et al., 2019; Yang et al., 2021a).

In this study, we use the Yeo–Johnson transformation method to transform the
anomalies of forecasts and reference data to approach a normal distribution (Yeo and Johnson, 2000):
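For reference, the general form of the Yeo–Johnson transformation (Yeo and Johnson, 2000) for a value $y$ and transformation parameter $\lambda$ is given below. The symbols $\psi$, $y$, and $\lambda$ are assumed here and may differ from the notation used in the rest of this paper.

```latex
\begin{equation}
\psi(\lambda, y) =
\begin{cases}
\dfrac{(y + 1)^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\ y \geq 0 \\[2ex]
\ln(y + 1), & \lambda = 0,\ y \geq 0 \\[1ex]
-\dfrac{(1 - y)^{2 - \lambda} - 1}{2 - \lambda}, & \lambda \neq 2,\ y < 0 \\[2ex]
-\ln(1 - y), & \lambda = 2,\ y < 0
\end{cases}
\end{equation}
```

Unlike the Box–Cox transformation, this form is defined for negative values, which makes it applicable to anomalies.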

We assume that the transformed anomalies of ETo forecasts (

With the long-term (20-year) AWAP ETo data, we can directly estimate

With the optimized parameters (means, standard deviations, and correlations)
for the BN distribution (Eq. 4), a conditional distribution for

We evaluate the performance of the calibrations using a strict leave-one-month-out cross-validation, in which each of the 36 months during April 2016–March 2019 and the same month in the 20-year reference data (April 1999 to March 2019) are left out in parameter inference. The optimized parameters are then used to calibrate raw forecasts of the left-out month. This process is repeated until all 36 months are processed. The cross-validation ensures that raw forecasts used to generate calibrated forecasts are not used in parameter optimization.
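The splitting logic of this cross-validation can be sketched as follows. The helper `leave_one_month_out_splits` is hypothetical, and the sketch assumes that "the same month" refers to the same calendar year and month in the reference data.

```python
import numpy as np

def leave_one_month_out_splits(fc_year_month, ref_year_month):
    """Yield one split per (year, month) present in the forecast archive.

    fc_year_month:  sequence of (year, month) pairs, one per forecast record
    ref_year_month: sequence of (year, month) pairs, one per reference record
    Each split is (target, fc_train_mask, ref_train_mask, fc_test_mask).
    """
    fc = np.asarray(fc_year_month)
    ref = np.asarray(ref_year_month)
    targets = sorted({(int(y), int(m)) for y, m in fc})
    for year, month in targets:
        held_out_fc = (fc[:, 0] == year) & (fc[:, 1] == month)
        held_out_ref = (ref[:, 0] == year) & (ref[:, 1] == month)
        # Parameters are inferred from everything except the held-out month,
        # and then applied to the held-out raw forecasts only.
        yield (year, month), ~held_out_fc, ~held_out_ref, held_out_fc
```

With a forecast archive covering April 2016–March 2019, the generator produces 36 splits, so that every raw forecast is calibrated with parameters inferred without it.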

We also produce climatology forecasts based on the monthly mean and standard
deviation parameters of AWAP ETo (Eq. 4). The randomly sampled
climatology is used as the baseline to evaluate the calibrated ETo
forecasts. We evaluate the calibrations by checking bias, temporal
variability, skill score, and reliability of the calibrated forecasts. We
conduct

We evaluate bias of the raw and calibrated forecasts relative to AWAP ETo
using the following equation:
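The exact expression is not reproduced here. As a generic illustration, bias can be computed as the mean difference between forecasts and reference data, either in absolute units or as a percentage of the reference total. Both helper functions below are assumptions rather than the definition used in this study.

```python
import numpy as np

def mean_bias(forecast, observed):
    """Mean difference between forecasts and observations (same units)."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(forecast - observed))

def percent_bias(forecast, observed):
    """Total forecast error as a percentage of the observed total."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(100.0 * np.sum(forecast - observed) / np.sum(observed))
```

For ensemble forecasts, the ensemble mean could be passed as `forecast`; positive values indicate overprediction in both measures.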

We use the Pearson correlation coefficient (

We use the continuous ranked probability score (CRPS) to measure skills in
the raw and calibrated forecasts (Grimit et al., 2006):
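For an ensemble forecast, the CRPS is commonly estimated from the ensemble members directly. The sketch below uses the standard estimator, the mean absolute error of the members minus half the mean absolute difference between member pairs, together with a skill score relative to a reference forecast. The function names are hypothetical, and this is not necessarily the form used in this study.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against one observation.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'|, with expectations taken over
    the ensemble members. For a single member it reduces to |x - y|.
    """
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return float(term1 - term2)

def crps_skill_score(crps_forecast, crps_reference):
    """Skill score in percent; positive means better than the reference."""
    return 100.0 * (crps_reference - crps_forecast) / crps_reference
```

Because the estimator reduces to the absolute error for a single member, deterministic raw forecasts and ensemble calibrated forecasts can be scored on the same scale. In practice the CRPS values would be averaged over all forecast days before the skill score is computed.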

We further calculate the CRPS skill score (

In the calculation of the CRPS skill score, either climatology forecasts or the
last observations (persistence) have been used as reference forecasts (Pappenberger et al.,
2015; Thiemig et al., 2015). However, reference forecasts based on
persistence are more suitable for evaluating the performance of forecasts
shorter than 2 d. As a result, we choose climatology forecasts as the
reference, since errors in climatology forecasts are similar among all lead
times and thus could be used to demonstrate the increasing errors in raw and
calibrated forecasts as lead time advances. For

We evaluate the reliability of calibrated forecasts using the probability
integral transform (PIT) value calculated with the following equation:
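For an ensemble forecast, the PIT value is the forecast cumulative distribution function evaluated at the observation. A minimal sketch using the empirical CDF of the ensemble members is given below; the function name and the use of the raw empirical CDF without smoothing or tie handling are assumptions.

```python
import numpy as np

def pit_value(members, obs):
    """Fraction of ensemble members that do not exceed the observation."""
    members = np.asarray(members, dtype=float)
    return float(np.mean(members <= obs))
```

If the ensemble spread is reliable, PIT values collected over many forecasts should be approximately uniformly distributed between 0 and 1. A concentration near 0 and 1 would point to ensembles that are too narrow, and a concentration near 0.5 to ensembles that are too wide.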

We further evaluate the reliability of calibrated ETo forecasts from
calibration 2 using the reliability diagram (Hartmann et
al., 2002), which assesses how well the predicted probabilities of forecasts
match observed frequencies. We convert the calibrated ensemble ETo forecasts
to forecast probabilities exceeding three thresholds, including 3, 6, and 9 mm d

Raw forecasts of the five input variables demonstrate significant
inconsistencies with the corresponding AWAP data (Figs. S2–S6). In most
parts of Australia, raw daily maximum temperature (

Raw forecasts of the input variables generally agree with the AWAP data in
temporal patterns during the study period, but the

Quantile mapping effectively corrects biases in raw forecasts of the input
variables. Through the bias correction, overpredictions and
underpredictions in raw forecasts of the five variables are substantially
reduced, resulting in biases close to zero for all lead times across
Australia (Figs. S2–S6). Quantile mapping also improves the
correlation between forecasts of input variables and AWAP data (Figs. S7–S11). The most significant improvements are found in wind speed
forecasts, in which the

Raw ETo forecasts constructed with the bias-corrected input variables are
much more accurate than those calculated with raw forecasts of the input
variables (Figs. 1 and S12). When raw ETo forecasts are constructed with
raw input variables, biases in input variables are translated into errors in
the raw ETo forecasts, which demonstrate substantial positive biases of 1 mm d

Bias in (three columns on the left) raw ETo forecasts constructed with raw forecasts of input variables and (three columns on the right) raw ETo forecasts constructed with bias-corrected input variables.

The comparison between the correlation coefficient of raw ETo forecasts constructed with the bias-corrected inputs and AWAP ETo vs. the correlation coefficient of raw ETo forecasts constructed with the raw inputs and AWAP ETo. The boxplot on the right summarizes results across all grid cells.

The adoption of quantile mapping to input variables also improves the
temporal patterns of raw ETo forecasts (Fig. 2). Compared with the raw ETo
forecasts constructed with raw input variables, the raw ETo forecasts based
on bias-corrected inputs generally show higher correlations with AWAP ETo,
particularly in northern Australia, where

The calibration with the SCC model further reduces biases in ETo forecasts
(Fig. 3). The calibrated ETo forecasts from calibration 2 show low biases
close to zero across all grid cells and lead times. Overpredictions in
Queensland in the raw ETo forecasts calculated with the bias-corrected input
variables are effectively corrected (Figs. 1, 3, and S12), leading to
lower biases in the calibrated forecasts. According to the

Bias in calibrated ETo forecasts from calibration 2, in which raw ETo forecasts are constructed with bias-corrected input variables. Maps on the left show the spatial patterns of bias, and the boxplot on the right summarizes biases across all grid cells.

Differences in absolute bias between calibrated ETo forecasts from calibration 2 and those from calibration 1. Maps on the left show the spatial patterns of difference in absolute bias, and the boxplot on the right summarizes results across all grid cells.

Compared with the calibration constructing raw ETo forecasts with raw
forecasts of input variables (calibration 1), the postprocessing based on
bias-corrected input variables (calibration 2) produces more accurate
calibrated ETo forecasts (Fig. 4). Specifically, calibrated ETo forecasts
from calibration 2 demonstrate significantly smaller (

We further examine the representation of ETo temporal variability by
calibrated forecasts. The

The correlation coefficient between calibrated ETo forecasts from
calibration 2 and AWAP ETo. Maps on the left show the spatial patterns of

Differences in the correlation coefficients between calibrated
forecasts from calibration 2 and AWAP ETo vs. calibration 1. Maps on the
left show the spatial patterns of differences in

The adoption of bias correction to raw forecasts of input variables results
in better representation of ETo variability in calibrated ETo forecasts
(Fig. 6 and Table S2). Increases in

Spatial patterns of improvements in

The calibration of ETo forecasts with the SCC model significantly improves
forecast skills. The raw ETo forecasts calculated with bias-corrected input
variables demonstrate low skills, even at short lead times (Figs. 7 and
S13). Specifically, for the first two lead times, central and southern
Australia show skills better than the climatology forecasts by 10 % to
20 %. However, in most parts of northern Australia, raw forecasts are
worse than randomly sampled climatology. Skills in raw ETo forecasts
decrease quickly with lead time. Regions with positive skills shrink
substantially at lead times 3 and 4 and disappear at longer lead times. At
lead time 9, skills of raw forecasts are mainly below

CRPS skill score in the (three columns on the left) raw ETo forecasts calculated with bias-corrected input variables and (three columns on the right) calibrated forecasts from calibration 2.

The calibration significantly improves forecast skills across all lead times (Table S2). Calibrated ETo forecasts from calibration 2 show CRPS skill scores above 35 % at lead time 1 across Australia, and the skills are generally above 30 % at lead times 2 and 3. Since ETo forecasts have been widely used to inform real-time decision-making for farming, high skills in calibrated ETo forecasts for the short lead times are expected to be highly valuable for activities such as irrigation scheduling. Although skills of calibrated forecasts also decrease with lead time, they remain above zero at long lead times (Figs. 7 and S13).

Differences in CRPS skill score between the calibrated ETo forecasts from calibration 2 and those from calibration 1. Maps on the left show the spatial patterns of difference in CRPS skill score, and the boxplot on the right summarizes results across all grid cells.

We further compare skills of calibrated ETo forecasts between calibrations 2
and 1 (Fig. 8). We achieve significant increases (

The calibrated ensemble ETo forecasts from calibration 2 demonstrate high
reliability (Fig. 9). In addition to correcting bias, the SCC model
converts deterministic raw forecasts to ensemble forecasts, which use 100
ensemble members to quantify forecast uncertainty. Figure 9 demonstrates
highly reliable ensemble spreads in calibrated forecasts across all lead
times. In most grid cells, the

The

Reliability diagrams of calibrated ETo forecasts during
April 2016–March 2019 with thresholds of 3, 6, and 9 mm d

The reliability diagram further confirms the consistency between forecast
probabilities and observed frequencies (Fig. 10). The plotted curves based
on three thresholds (3, 6, and 9 mm d

We also compare the bias, correlation coefficient, CRPS skill score, and
reliability of calibrated forecasts from calibrations 3 and 4 to evaluate
whether we can obtain similar improvements through the bias correction of
input variables if we conduct the ETo forecast calibration in a different
way (without using ETo climatological mean and anomalies). Results show that
the adoption of bias correction also leads to lower bias, higher correlation
coefficient, and higher CRPS skill score in terms of magnitude, spatial
patterns, and trend along the lead times, when ETo forecasts are calibrated
directly (Figs. S15–S17). In addition, the

Although the selected metrics measure different aspects of forecast quality,
they generally agree with each other in demonstrating improvements in
calibrated ETo forecasts with the adoption of strategy ii. As introduced
in the Method section, bias measures average differences; the correlation
coefficient shows consistency between observations and forecasts in temporal
variability; the CRPS skill score measures the performance of the calibrated
forecasts relative to climatology forecasts; the

This investigation further highlights the importance of statistical calibration in NWP-based ETo forecasting (Medina and Tian, 2020). According to an investigation across 40 sites in Australia, raw ETo forecasts constructed with NWP outputs reasonably captured the magnitude and variability of ETo, but forecast skills better than climatology were limited to the first six lead times (Perera et al., 2014). Our investigation suggests that statistical calibration could substantially improve forecast skills and successfully extend the skillful forecasts to lead time 9 across Australia. Findings of this investigation agree well with the site-scale short-term ETo forecasting based on GCM outputs (Zhao et al., 2019a) in the improvements of forecast skills through statistical calibration. Calibrated forecasts from calibration 2 demonstrate skills similar to those reported by Zhao et al. (2019a) across three Australian sites. Thanks to the capability of SCC in calibrating short-archived forecasts (Wang et al., 2019), we achieve the improvements based on a much shorter archive of raw forecasts (3 years vs. 23 years) than Zhao et al. (2019a). Calibrated forecasts from calibration 2 also demonstrate low biases (0.32 %–0.95 %) comparable with those of calibrated ETo forecasts (0.49 %–0.63 %) based on the Bayesian model averaging (BMA) model and weather forecasts from three NWP models in the USA during 2014–2016 (Medina and Tian, 2020).

This investigation also contributes to filling a knowledge gap in NWP-based ETo forecasting. Although previous calibrations using raw forecasts of input variables to construct the raw ETo forecasts (strategy i) often achieved significant improvements in skills, it has remained unclear whether improving forecasts of input variables could further enhance ETo forecast calibration (Medina and Tian, 2020). How the raw ETo forecasts should be constructed represents a critical knowledge gap in the area of NWP-based ETo forecasting (Medina and Tian, 2020). Results of this investigation provide strong evidence for the necessity of improving input variables prior to constructing raw ETo forecasts. The nonlinear and nonstationary behaviors of the input variables used for ETo calculation have been reported (Paredes et al., 2018). This study suggests that when raw input variables are used to construct the raw ETo forecasts, complex interactions among these variables may lead to errors in raw ETo forecasts that could not be effectively corrected through statistical calibration. Bias correction of input variables could help limit the propagation of errors from input variables to ETo forecasts (Zappa et al., 2010), as evidenced by the higher accuracy and higher skills in calibrated ETo forecasts when input variables are bias corrected. In addition, a further evaluation based on a different way of implementing the SCC model demonstrates similar improvements in calibrated ETo forecasts with the adoption of bias correction to input variables (calibrations 3 and 4). Results from calibrations 3 and 4 further confirm that additional skills have been added to raw ETo forecasts through the bias correction of input variables, and the improvements to calibrated ETo forecasts tend to be independent of calibration models. Consequently, we anticipate that future NWP-based ETo forecasting could benefit from adopting this calibration strategy to produce more skillful calibrated ETo forecasts.

This investigation also provides valuable implications for the forecasting of integrated variables, which are derived from multiple NWP/GCM variables. Variables such as drought index (Zhang et al., 2017), bushfire danger index (Sharples et al., 2009), and severe weather index (Rabbani et al., 2020) are often derived by combining multiple weather variables produced by NWP models. Our investigation suggests that improving the input variables could effectively reduce error propagation from inputs to integrated variables. In this study, the extra step proves particularly useful in reducing errors in the integrated variable that could not be corrected through calibration. We anticipate that this extra step could help improve forecasts of other integrated variables.

Although we have conducted thorough analyses on the contribution of improving input variables to ETo forecast calibration, further investigations will be needed to validate the robustness of findings in this study. First, we anticipate that the ETo forecasts could be further improved if a more sophisticated calibration model is applied to raw forecasts of the input variables. In this study, we adopt a simple bias-correction method to improve the input variables. Limitations of quantile mapping have been reported in previous studies (Schepen et al., 2020; Zhao et al., 2017). Our analyses demonstrate that the raw ETo forecasts calculated with the bias-corrected input variables still show low forecast skills, particularly at long lead times (Fig. 7). If a more sophisticated calibration method is applied to the input variables, error propagation from input variables to ETo forecasts will likely be further reduced. As a result, we anticipate that the calibrated ETo forecasts will gain further improvements in forecast skills. Another advantage of correcting input variables with a sophisticated model is that it will produce a set of skillful calibrated weather forecasts. Well-calibrated forecasts of temperature, vapor pressure, solar radiation, and wind speed could be useful for forecast users such as crop modelers and bushfire managers.

Second, the two calibration strategies should be tested using other NWP models. In this study, we use one NWP model to investigate a critical knowledge gap in NWP-based ETo forecasting. Additional investigations are needed to examine whether improvements achieved with the adoption of calibration strategy ii will hold for ETo forecasting based on other NWP models. Third, further investigations based on other calibration models are needed to validate findings of this investigation. Our analyses based on two different methods (based on ETo anomalies vs. based on original ETo) demonstrate similar improvements in calibrated ETo forecasts with the adoption of bias correction to input variables. Additional evaluations will be needed to verify whether forecast skills will be improved using strategy ii but based on a different calibration model. In addition, we use bilinear interpolation to match the NWP forecasts and AWAP data. More sophisticated remapping methods should be evaluated to understand the impacts of forecast regridding on statistical calibration.

The applicability of the calibration strategy developed in this study to seasonal ETo forecasting should be further investigated. Seasonal ETo forecasting based on GCM climate forecasts has been increasingly performed (Tian et al., 2014; Zhao et al., 2019b). In these investigations, raw ETo forecasts were also constructed directly with raw GCM climate forecasts. As a result, these investigations are expected to have suffered from error propagation from input variables to seasonal ETo forecasts. Whether the calibration strategy (strategy ii) developed in this study is applicable to seasonal ETo forecasting warrants further investigation.

NWP outputs have been increasingly used for ETo forecasting to support water resource management. Statistical calibration plays an essential role in improving the quality of ETo forecasts. However, it is unclear whether improving raw forecasts of input variables is necessary for the calibration of ETo forecasts. We aim to fill this knowledge gap through a thorough comparison of two calibration strategies in the calibration of NWP-based ETo forecasts.

This investigation clearly suggests the necessity of improving input variables as part of ETo forecast calibration. With this extra step, the bias, correlation coefficient, and skills of the calibrated ETo forecasts are all improved. Further investigation indicates that the improvements tend to be independent of the calibration method applied to ETo forecasts. Forecasting the highly variable ETo is often challenging. This investigation addresses a common challenge in NWP-based ETo forecasting and develops an effective calibration strategy for adding extra skills to ETo forecasts. We anticipate that future NWP-based ETo forecasting could benefit from adopting this strategy to produce more skillful calibrated ETo forecasts. This strategy is also expected to be applicable to enhancing the forecasting of other integrated variables that are calculated using multiple NWP/GCM variables as inputs.

Data used in this study are available by contacting the corresponding author.

The supplement related to this article is available online at:

QY and QJW conceived this study. QJW developed the calibration model. QY took the lead in writing and improving the article. All co-authors, including KH and YT, contributed to discussing the results and improving this study.

The authors declare that they have no conflict of interest.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Computations for this research were undertaken with the assistance of resources and services from the National Computational Infrastructure (NCI), which is supported by the Australian Government. This research was supported by the “Sustaining and strengthening merit-based access to National Computational Infrastructure” (NCI) LIEF grant (LE190100021) and facilitated by The University of Melbourne.

This research has been supported by the Australian Research Council (grant no. LP170100922) and the Australian Bureau of Meteorology (grant no. TP707466).

This paper was edited by Nadia Ursino and reviewed by three anonymous referees.