Evapotranspiration plays an important role in the terrestrial water cycle. Reference crop evapotranspiration (ET

As a critical process in the terrestrial water cycle, evapotranspiration transfers a large amount of water from the land surface to the atmosphere. Reference crop evapotranspiration (ET

Raw ET

Failure to correctly simulate the temporal trends of the climate system
could partially explain the low skill of GCM-based raw ET

Raw ET

Statistical techniques have been developed to correct time-dependent errors in raw GCM forecasts. A commonly adopted method is to replace the linear trend in raw forecasts with the observed trend (Kharin et al., 2012). Using this method, Kharin et al. (2012) corrected trends in decadal temperature forecasts and successfully reduced the systematic residual drifts in raw forecasts. The corrected trends also brought the long-term climate behavior of the forecasts into line with the observations (Kharin et al., 2012). To correct errors associated with the representation of temporal changes and variability, Pasternack et al. (2021) adopted a time-varying mean to characterize the climate trend when calibrating decadal temperature forecasts. Beyond decadal-scale calibration, recent studies suggest that seasonal climate forecasts can also benefit from the correction of time-dependent errors. For example, Shao et al. (2021) improved the BJP model by adding trend reconstruction algorithms to deal with time-dependent errors. The new algorithms allow observed trends to be reconstructed in calibrated forecasts. With this feature, the improved BJP model (hereafter BJP-ti) has been shown to add skill to seasonal temperature forecasts through trend reconstruction.
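As a minimal illustration of trend replacement, the sketch below fits ordinary least-squares linear trends to a forecast series and an observed series (one value per year) and swaps the forecast trend line for the observed one. It is our own simplified example of the general idea, not the exact procedure of Kharin et al. (2012); the function name `replace_linear_trend` is hypothetical, and cross-validation is ignored.

```python
import numpy as np

def replace_linear_trend(fcst, obs, years):
    """Replace the OLS linear trend of `fcst` with the OLS linear trend of `obs`.

    fcst, obs: 1-D arrays with one value per year (same length).
    years: 1-D array of the corresponding years.
    """
    fcst = np.asarray(fcst, dtype=float)
    obs = np.asarray(obs, dtype=float)
    t = np.asarray(years, dtype=float)
    # np.polyfit with deg=1 returns [slope, intercept]
    b_f, a_f = np.polyfit(t, fcst, 1)
    b_o, a_o = np.polyfit(t, obs, 1)
    # Remove the fitted forecast trend line and add the fitted observed one
    return fcst - (a_f + b_f * t) + (a_o + b_o * t)
```

Because each OLS line passes through the sample means, the adjusted series also takes on the observed mean, so this simple version corrects the mean bias together with the trend.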

We hypothesize that reconstructing trends in seasonal ET

We develop monthly ET

Seasonal climate forecasts from the latest version (SEAS5) of the ECMWF
model are used to construct the raw ET

To match ET

We construct monthly raw ET

In this study, ET

Calibration with the BJP-ti model involves six steps: (1) transformation of the data, (2) detrending of the data, (3) joint probability modeling of the transformed and detrended forecasts and observations, (4) generation of ensemble calibrated forecast members conditional on the raw forecast, (5) addition of the observed trend back to the ensemble members, and (6) back-transformation of the data to obtain the final calibrated forecasts. We describe each step in detail below.

The first step is to transform raw forecasts and observations so that they
approximately follow a normal distribution. We adopt the Yeo–Johnson transformation method (Yeo
and Johnson, 2000) to transform ET
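For reference, a minimal NumPy sketch of the Yeo–Johnson transformation and its inverse (the inverse is what the back-transformation in step 6 needs) is given below. The parameter `lam` is treated as known here; how it is estimated within BJP-ti is not shown, and the function names are our own. SciPy's `scipy.stats.yeojohnson` can also estimate the parameter by maximum likelihood.

```python
import numpy as np

def yeo_johnson(y, lam):
    """Yeo-Johnson transform (Yeo and Johnson, 2000) for a fixed parameter lam."""
    y = np.asarray(y, dtype=float)
    z = np.empty_like(y)
    pos = y >= 0
    # Branch for non-negative values
    if abs(lam) > 1e-12:
        z[pos] = ((y[pos] + 1.0) ** lam - 1.0) / lam
    else:
        z[pos] = np.log1p(y[pos])
    # Branch for negative values
    if abs(lam - 2.0) > 1e-12:
        z[~pos] = -((1.0 - y[~pos]) ** (2.0 - lam) - 1.0) / (2.0 - lam)
    else:
        z[~pos] = -np.log1p(-y[~pos])
    return z

def inverse_yeo_johnson(z, lam):
    """Inverse of `yeo_johnson`; the sign of z equals the sign of the original y."""
    z = np.asarray(z, dtype=float)
    y = np.empty_like(z)
    pos = z >= 0
    if abs(lam) > 1e-12:
        y[pos] = (lam * z[pos] + 1.0) ** (1.0 / lam) - 1.0
    else:
        y[pos] = np.expm1(z[pos])
    if abs(lam - 2.0) > 1e-12:
        y[~pos] = 1.0 - (1.0 - (2.0 - lam) * z[~pos]) ** (1.0 / (2.0 - lam))
    else:
        y[~pos] = -np.expm1(-z[~pos])
    return y
```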

Step 2 is to generate detrended forecasts and observations in the
transformed space. For each grid cell, we separately infer linear trends for
transformed forecasts and observations. With the trend parameters
(
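Steps 2 and 5 can be sketched as a pair of inverse operations on the transformed data. This is a simplified example under our own assumptions: the slope is estimated by ordinary least squares rather than with the informative priors used in BJP-ti, the trend is expressed relative to a reference year of our choosing (`ref_year`), and the function names are hypothetical.

```python
import numpy as np

def fit_trend(z, years, ref_year):
    """OLS slope of the transformed series z against (years - ref_year)."""
    t = np.asarray(years, dtype=float) - ref_year
    slope, _intercept = np.polyfit(t, np.asarray(z, dtype=float), 1)
    return slope

def detrend(z, years, slope, ref_year):
    # Step 2 (sketch): remove the linear change relative to ref_year, so every
    # year is expressed at the climate state of the reference year
    return np.asarray(z, dtype=float) - slope * (np.asarray(years, dtype=float) - ref_year)

def retrend(z_detr, years, slope, ref_year):
    # Step 5 (sketch): add a linear change back; exact inverse of `detrend`
    return np.asarray(z_detr, dtype=float) + slope * (np.asarray(years, dtype=float) - ref_year)
```

In step 5, `retrend` would be applied to the calibrated ensemble members with the observed slope, so that the calibrated forecasts carry the observed trend rather than the raw-forecast trend.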

In step 3, we assume a bivariate joint distribution (

For each month of the year, model parameters are inferred from training data
pairs (predictor and predictand) over the study period (1990–2019). The
posterior distribution of the model parameters is as follows:

In the BJP-ti model, informative priors are applied to constrain the
inferred trends and avoid overfitting. The priors are estimated separately for each grid cell, month, and lead time. This informative prior distribution

In step 4, once all the parameters are inferred, we draw 1000 members from a
conditional distribution of the predictand,
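Steps 3 and 4 can be illustrated with the standard conditional distribution of a bivariate normal: if (x, y) has mean (mu_x, mu_y) and covariance entries s_xx, s_xy, s_yy, then y given x is normal with mean mu_y + (s_xy / s_xx)(x - mu_x) and variance s_yy - s_xy^2 / s_xx. The sketch below assumes that x (forecast) and y (observation) have already been transformed and detrended, and it plugs in sample estimates from `np.mean` and `np.cov` in place of the Bayesian posterior inference used in BJP-ti, so parameter uncertainty is not propagated. The function names are hypothetical.

```python
import numpy as np

def fit_bivariate_normal(x, y):
    """Point estimates of the mean vector and 2x2 covariance matrix of (x, y)."""
    data = np.vstack([x, y])  # shape (2, n): row 0 = predictor, row 1 = predictand
    return data.mean(axis=1), np.cov(data)

def sample_conditional(x_new, mu, cov, n_members=1000, rng=None):
    """Draw ensemble members of y given x = x_new from a fitted bivariate normal."""
    rng = np.random.default_rng() if rng is None else rng
    mu_x, mu_y = mu
    s_xx, s_xy, s_yy = cov[0, 0], cov[0, 1], cov[1, 1]
    cond_mean = mu_y + s_xy / s_xx * (x_new - mu_x)
    # Guard against tiny negative values caused by rounding
    cond_var = np.maximum(s_yy - s_xy ** 2 / s_xx, 0.0)
    return rng.normal(cond_mean, np.sqrt(cond_var), size=n_members)
```

The members drawn this way would then be retrended with the observed trend (step 5) and back-transformed (step 6).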

To evaluate the performance of the calibration, we adopt a
leave-one-year-out cross-validation strategy for each grid cell and lead
time. Specifically, for one of the 30 years during 1990–2019, we keep month

To evaluate how the reconstruction of trends affects the quality of
calibrated forecasts, we compare BJP-ti calibrated forecasts with those
generated using the original BJP model, which does not reconstruct trends.
The BJP model omits steps 2 (detrending) and 5 (retrending) in Sect. 2.3.
In the main text, we present the comparison for the months (August,
September, and October) with large areas (Fig. S3) of statistically
significant (at the 95 % confidence level) temporal trends in observed
ET

The metrics used to evaluate the calibrations are the correlation coefficient, CRPS skill score, bias, and reliability. Each metric is described below.

We use the Pearson correlation coefficient (

We use the continuous ranked probability score (CRPS) to measure the skill
of the raw and calibrated forecasts as follows (Grimit et al., 2006):
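For an ensemble with members x_1, ..., x_m and observation y, a common estimator of the CRPS is (1/m) Σ_i |x_i - y| - (1/(2m^2)) Σ_i Σ_j |x_i - x_j|. We assume this matches the ensemble form cited from Grimit et al. (2006); the sketch below implements it directly with NumPy broadcasting, and the function name is our own.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against one observation (kernel form)."""
    x = np.asarray(members, dtype=float)
    # Mean absolute difference between the members and the observation
    term_obs = np.mean(np.abs(x - obs))
    # Half the mean absolute difference over all m*m pairs of members
    term_spread = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term_obs - term_spread
```

The pairwise term needs an m-by-m array, which is about 8 MB for 1000 members and is therefore manageable per forecast.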

We further calculate the CRPS skill score (
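A skill score of this kind compares the mean CRPS of a forecast with that of a reference forecast. The sketch below assumes climatology as the reference (consistent with the later comparisons against climatology forecasts), averages the CRPS over all forecast events before taking the ratio, and reports the result in percent; these choices and the function name are our assumptions.

```python
import numpy as np

def crps_skill_score(crps_fcst, crps_ref):
    """CRPS skill score in percent: 100 * (1 - mean CRPS_fcst / mean CRPS_ref).

    Positive values mean the forecast outperforms the reference (e.g. climatology).
    """
    return 100.0 * (1.0 - np.mean(crps_fcst) / np.mean(crps_ref))
```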

We evaluate the bias of the raw and calibrated forecasts using the
following equation:
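As an assumption for illustration, one widely used definition of forecast bias is the relative bias of the ensemble-mean forecasts, 100 × (Σ_t fbar_t - Σ_t o_t) / Σ_t o_t, where fbar_t is the ensemble mean for event t and o_t the corresponding observation. The sketch below implements that definition, which may differ from the equation used in this study; the function name is hypothetical.

```python
import numpy as np

def relative_bias(ens_forecasts, obs):
    """Relative bias (%) of ensemble-mean forecasts summed over all events.

    ens_forecasts: array of shape (n_events, n_members); obs: shape (n_events,).
    """
    ens_mean = np.asarray(ens_forecasts, dtype=float).mean(axis=1)
    obs = np.asarray(obs, dtype=float)
    return 100.0 * (ens_mean.sum() - obs.sum()) / obs.sum()
```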

To evaluate the reliability of calibrated ensemble forecasts, we calculate
the probability integral transform (PIT) value using the following equation:
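For an ensemble forecast, the PIT value is commonly approximated by the fraction of members that do not exceed the observation; for a reliable forecast system, PIT values over many events are uniformly distributed. The sketch below computes this empirical PIT and, as a summary of uniformity, the alpha index shown for the calibrated forecasts, using a commonly used definition, 1 - (2/n) Σ_i |p_(i) - i/(n + 1)| over the sorted PIT values p_(i). Both the alpha-index formula as written here and the function names are our assumptions.

```python
import numpy as np

def pit_value(members, obs):
    """Empirical PIT: fraction of ensemble members not exceeding the observation."""
    return np.mean(np.asarray(members, dtype=float) <= obs)

def alpha_index(pit_values):
    """Alpha index: 1 for perfectly uniform PIT values, 0 in the worst case."""
    p = np.sort(np.asarray(pit_values, dtype=float))
    n = p.size
    # Expected positions of n uniformly distributed values
    uniform = np.arange(1, n + 1) / (n + 1)
    return 1.0 - 2.0 * np.mean(np.abs(p - uniform))
```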

We evaluate the ability of BJP-ti to reconstruct temporal trends for
months with large areas of statistically significant trends in observed
ET

Trends in raw forecasts, BJP calibrated forecasts, and BJP-ti
calibrated forecasts at month 0 and observed ET

Observed ET

Raw ET

Trends in raw forecasts become weaker at longer lead times (left columns in
Figs. S4 and S5). For the lead time of month 3, trends in raw ET

Calibrated ET

Calibration with the BJP-ti model successfully reconstructs the observed trends in the calibrated forecasts (third column in Figs. 1, S4, and S5). Inconsistencies between raw forecasts and observations in the spatial patterns and magnitudes of trends are effectively corrected through the calibration, particularly in regions with significant observed trends. The weakening of trends at longer lead times seen in the raw forecasts is also corrected: in the BJP-ti calibrated forecasts, trends at all lead times are consistent with observations in both spatial pattern and magnitude.

We further examine whether reconstructing trends improves the representation
of ET

Differences in the correlation coefficient (

Reconstructing trends yields more skillful calibrated forecasts. We compare the CRPS skill scores of BJP-ti calibrated forecasts with those
produced with the BJP model for the three selected months (Fig. 3). At
month 0, reconstructing trends increases the CRPS skill score of the calibrated forecasts by
5 %–10 % in August, September, and October. The distribution of areas with increased CRPS skill scores is generally consistent with that of the improved

Differences in CRPS skill scores between BJP-ti and BJP calibrated forecasts for three selected months (August, September, and October) and three lead times (months 0, 3, and 6). Red polygons show regions with significant observed trends.

Differences in CRPS skill scores between BJP-ti and BJP calibrated forecasts over 1990–2019.

We further evaluate the overall performance of the calibration over the
whole study period by comparing the CRPS skill scores of the raw and BJP-ti
calibrated forecasts (Fig. 5). Calibration with the BJP-ti model
substantially improves the skill of the raw ET

CRPS skill scores in

Note that simple bias correction is often applied to raw
ECMWF forecasts before they are used. We applied quantile mapping to the raw
ET
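The quantile mapping procedure is not detailed here. As an illustration only, the sketch below implements one common empirical variant: each raw value is assigned a non-exceedance probability within the sorted raw-forecast climatology (with linear interpolation) and mapped to the same quantile of the observed climatology. The function name and the interpolation scheme are our assumptions, not necessarily those used in this study.

```python
import numpy as np

def quantile_map(values, fcst_clim, obs_clim):
    """Empirical quantile mapping of raw forecast values onto the observed climatology."""
    fcst_sorted = np.sort(np.asarray(fcst_clim, dtype=float))
    # Non-exceedance probabilities assigned to the sorted raw climatology: 0, ..., 1
    probs = np.linspace(0.0, 1.0, fcst_sorted.size)
    # Probability of each value within the raw climatology (clipped to [0, 1])
    p = np.interp(np.asarray(values, dtype=float), fcst_sorted, probs)
    # Same quantile of the observed climatology (linear interpolation by default)
    return np.quantile(np.asarray(obs_clim, dtype=float), p)
```

A mapping of this kind adjusts the marginal distribution of the raw forecasts, but it does not on its own reconstruct observed trends.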

With errors corrected, including the time-dependent errors, the BJP-ti calibrated forecasts achieve CRPS skill scores above 20 % at month 0 in most grid cells (Fig. 5). Eastern Australia, including New South Wales and Victoria, shows CRPS skill scores of up to 30 %. Beyond month 0, the skill score of the calibrated forecasts drops sharply, with most of Australia showing CRPS skill scores below 10 % at month 1. The skill score decreases further at longer lead times but remains above zero in many parts of Australia even at month 6, indicating better performance than the climatology forecasts.

We also summarize the CRPS skill scores of calibrated forecasts across Australia by target month for the seven lead times (Fig. 6). Each box shows the variability among all grid cells across Australia for a given month and lead time. At the first lead time (month 0), all months show CRPS skill scores markedly better than the climatology forecasts in most grid cells, with the median CRPS skill score above 20 % for seven months. However, the skill score decreases quickly with lead time. At lead time 1, the CRPS skill score is mostly below 10 % for all target months. Skill also varies among months. For October, November, and December, the CRPS skill score is above 0 in more than 50 % of grid cells even at lead time 6, indicating better performance than the climatology forecasts. For other months, such as January, April, May, and June, the median CRPS skill score drops slightly below 0 beyond the first lead time (month 0).

Box plot of CRPS skill score by target month in BJP-ti calibrated forecasts.

Raw monthly ET

Bias in

The calibration based on the BJP-ti model also improves the correlation
coefficients between forecasts and observations. Raw forecasts are able to
capture the high seasonality in ET

In this study, we generate 1000 ensemble members for each raw forecast to
quantify the uncertainties of the calibrated forecasts. As indicated by the

Alpha index of BJP-ti calibrated ensemble ET

This investigation confirms that the misrepresentation of climate trends is
an important error source in GCM-based ET

This investigation also verifies our hypothesis that correcting
time-dependent errors through trend reconstruction can add extra skill to
calibrated ET

Climate change poses challenges to the statistical calibration of seasonal climate forecasts. Many post-processing models, such as those based on probability theory (Tian et al., 2014; Wang et al., 2009), rely on the climatology of observations to construct the probability distribution function for calibration (Wilks, 2018). However, non-stationary behavior of the climate system induced by elevated greenhouse gas emissions has been increasingly reported (Haustein et al., 2016; Lima et al., 2015). Many calibration models developed for seasonal forecasts do not consider the impacts of climate change on the observed climatology. Although these models have proven effective in correcting biases in raw forecasts, assuming a static climatology may limit the use of predictable information in the raw forecasts. This investigation and our previous calibration of seasonal temperature forecasts (Shao et al., 2020, 2021) suggest that reconstructing trends in calibrated forecasts is an effective way to capture the non-stationary behavior of the climate system and to make the statistical calibration of seasonal climate forecasts more robust.

This investigation further validates the strength of the
trend reconstruction algorithms in BJP-ti. Previously, we applied this model
to correct seasonal temperature forecasts and achieved significant
improvements in forecast skill relative to the original BJP model (Shao et al., 2020, 2021). This study also demonstrates that BJP-ti can be applied more generally to hydroclimate variables that show temporal trends (Shao et al., 2022a, b). The successful application to ET

In this investigation, we successfully improve ET

First, more sophisticated cross-validation methods should be developed for the inference of trend parameters. The current leave-one-out method has proven effective for inferring the mean vector and covariance matrix (Shao et al., 2020). However, this strategy may not guarantee independence between the left-out data and the data used to infer the trend parameters. We decided not to implement a data-splitting approach to cross-validation because of the risk of introducing sampling errors. Future investigations should address this challenge and develop more robust cross-validation methods for inferring trend parameters.

In this study, we directly use the raw forecasts of individual input
variables (e.g., temperature, solar radiation, and vapor pressure) to
construct the raw ET

Correction of lead-time-dependent errors should be further investigated in
future GCM-based ET

Future forecast calibration should also investigate the impacts of climate
change on the temporal variations in ET

ET

This investigation also provides valuable insights for improving statistical
calibrations of seasonal climate forecasts in the future. In recent decades,
climate trends have been increasingly observed. However, many calibration
models for seasonal forecasts have not taken the non-stationary behavior of
the climate system into consideration. Improved forecast skills in seasonal
ET

Data used in this study are available by contacting the corresponding author.

The supplement related to this article is available online at:

QY and QJW conceived this study. QJW developed the calibration model. QY took the lead in writing and improving the paper. All co-authors, including AW, WW, YS, and KH, contributed to discussing the results and improving the paper.

The contact author has declared that neither they nor their co-authors have any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We thank the European Centre for Medium-Range Weather Forecasts (ECMWF), for providing the SEAS5 data (

This research has been supported by the Australian Research Council (grant no. LP170100922).

This paper was edited by Yi He and reviewed by two anonymous referees.