Reconstructing climate trends adds skills to seasonal reference crop evapotranspiration forecasting

Evapotranspiration plays an important role in the terrestrial water cycle. Reference crop evapotranspiration (ETo) has been widely used to estimate water transfer from vegetation surface to the atmosphere. Seasonal ETo forecasting provides valuable information for effective water resource management and planning. Climate forecasts from General Circulation Models (GCMs) have been increasingly used to produce seasonal ETo forecasts. Statistical calibration plays a critical role in correcting bias and dispersion errors in GCM-based ETo forecasts. However, time-dependent errors, resulting from GCM’s 10 misrepresentations of climate trends, have not been explicitly corrected in ETo forecast calibrations. We hypothesize that reconstructing climate trends through statistical calibration will add extra skills to seasonal ETo forecasts. To test this hypothesis, we calibrate raw seasonal ETo forecasts constructed with climate forecasts from the European Centre for MediumRange Weather Forecasts (ECMWF) SEAS5 model across Australia, using the recently developed Bayesian Joint Probability trend-aware (BJP-ti) model. Raw ETo forecasts demonstrate significant inconsistencies with observations in both magnitudes 15 and spatial patterns of temporal trends, particularly at long lead times. The BJP-ti model effectively corrects misrepresented trends and reconstructs the observed trends in calibrated forecasts. Improving trends through statistical calibration increases the correlation coefficient between calibrated forecasts and observations (r) by up to 0.25 and improves the continuous ranked probability score (CRPS) skill score by up to 15 (%) in regions where climate trends are misrepresented by raw forecasts. Skillful ETo forecasts produced in this study could be used for streamflow forecasting, modelling of soil moisture dynamics, 20 and irrigation water management. This investigation confirms the necessity of reconstructing climate trends in GCM-based seasonal ETo forecastings, and provides an effective tool for addressing this need. We anticipate that future GCM-based seasonal ETo forecasting will benefit from correcting time-dependent errors through trend reconstruction.


Introduction
As a critical process in the terrestrial water cycle, evapotranspiration transfers a large amount of water from the land surface 25 to the atmosphere. Reference crop evapotranspiration (ETo) measures the evaporative demand of the atmosphere for a hypothetical crop of given height, with defined surface resistance factor and albedo. It is generally computed using the Penman-Monteith equation following Allen et al. (1998, see section 2.1), which is known as FAO56. McMahon et al. (2013) provides additional information about the process. Reference crop evapotranspiration (ETo) measures the evaporative demand of the 2 atmosphere and thus provides valuable information for understanding and simulating terrestrial hydrology. Forecasting of ETo 30 has been used to support water resource management (Anderson et al., 2015;Le Page et al., 2021) and improve soil moisture modelling (Yu et al., 2016). In addition, ETo forecasting also helps constrain the significant uncertainties in streamflow forecasting (Greuell et al., 2019;Van Osnabrugge et al., 2019). Seasonal ETo forecasts have been used to support water allocation among competing users (Chauhan and Shrivastava, 2009) and in planning farming activities (Zinyengere et al., 2011). In recent years, climate forecasts produced by General Circulation Models (GCMs) have been adopted increasingly 35 used for seasonal ETo forecasting, since GCMs often produce forecasts of all climate variables needed to estimate future ETo (Tian et al., 2014;Zhao et al., 2019a).
Raw ETo forecasts constructed with GCM climate forecasts often inherit significant errors from the raw forecasts of climate variables, including temperature, solar radiation, wind speed, and vapor pressure. Due to deficiencies in GCM's representation of physical processes of the atmosphere (Woldemeskel et al., 2014), model parameterization (O'Gorman and Dwyer, 2018), 40 and data assimilation (O'kane et al., 2019), raw GCM forecasts often demonstrate systematic errors (Weisheimer and Palmer, 2014). For example, inconsistencies with observations have been reported for the raw forecasts of all variables needed to construct ETo forecasts using the Food and Agriculture Organization (FAO)FAO 56 method (Groisman et al., 2000;Slater et al., 2017). These inconsistencies often lead to significant bias and low skills in the resultant raw ETo forecasts (Zhao et al., 2019b). 45 Failing to correctly simulate the temporal trends of the climate system could be partially responsible for the low skills of GCMbased raw ETo forecasts. Time-dependent errors are introduced when GCMs lack skills in modelling climate trends driven by rising atmospheric greenhouse gas (GHG) concentrations (Sansom et al., 2016). There is mounting evidence that climate change has resulted in increasing trends in temperature (Smith et al., 2007) and vapor pressure (Byrne and Gorman, 2018), but led to decreasing trends in solar radiation (Liepert, 2002). However, GCMs configured for seasonal climate forecasts often 50 misrepresent these observed trends. For example, an evaluation across nine climate regions in the U.S. showed that nine of ten selected GCMs failed to reproduce the observed temporal trends in seasonal temperature forecasts (Bhowmik and Sankarasubramanian, 2020). In the Middle East, seasonal temperature forecasts by the Climate Forecast System version 2 (CFSv2) model overestimated the warming trend in reference data by approximately 0.4° decade -1 (Alizadeh-Choobari et al., 2019). In Australia, evaluations of the European Centre for Medium-Range Weather Forecasts (ECMWF) SEAS5 model 55 identifieddemonstrated significant discrepancies between observed and forecasted trends in temperature (Shao et al., 2020(Shao et al., , 2021a. Forecasts of fire weather index (calculated with forecasts of precipitation, wind speed, temperature, and humidity) based on the ECMWF System 4 model demonstrated significant inconsistencies with observations in temporal trends in Europe during 1981-2010 (Bedia et al., 2018). As a result, it is unlikely that raw ETo forecasts constructed with raw forecasts of these climate variables would faithfully reproduce the observed climate trends. Failing to capture the observed trends inevitably 60 introduces errors to GCM-based raw ETo forecasts.
Raw ETo forecasts constructed with climate forecasts need to be calibrated to correct biases and dispersion errors. Statistical calibration models initially developed for other variables, such as precipitation or temperature, have been adopted to calibrate 3 raw ETo forecasts (Medina and Tian, 2020;Zhao et al., 2019a). Using a quantile-mapping method, Tian and Martinez (2014) improved seasonal ETo forecasts based on CFSv2 outputs in Florida, the U.S. In the calibration of seasonal ETo forecasts in 65 Australia, Zhao et al. (2019b) used the Bayesian Joint Probability (BJP) model to post-process ETo forecasts constructed with climate forecasts from the Australian Bureau of Meteorology's Australian Community Climate and Earth-System Simulator-Seasonal prediction system version 1 (ACCESS-S1) model across three weather stations. This investigation validated the BJP model's strengths in error correction and skill enhancement in ETo forecasting. However, none of these calibrations have explicitly dealt with time-dependent errors caused by the misrepresentation of climate trends in GCM forecasts. 70 Statistical techniques have been developed to correct time-dependent errors in raw GCM forecasts. A commonly adopted method is to replace the linear trend in raw forecasts with the observed trend (Kharin et al., 2012). Using this method, Kharin et al. (2012) corrected trends in decadal temperature forecasts and successfully reduced the systematic residual drifts in raw forecasts. Meanwhile, improvements in trends effectively adjusted the long-term climate behavior in forecasts to match observations (Kharin et al., 2012). To correct errors associated with the representation of temporal changes and variability, 75 Pasternack et al. (2020) adopted a time-varying mean to characterize the climate trend in the calibration of decadal temperature forecasts. In addition to these decadal-scale calibrations, recent studies suggested that seasonal climate forecasting could also benefit from correcting time-dependent errors. For example, Shao et al. (2021)  We hypothesize that reconstructing trends in seasonal ETo forecasts through statistical calibration will help correct timedependent errors and thereby improve forecast skills. To test this hypothesis, we adopt the BJP-ti model to calibrate seasonal ETo forecasts constructed with climate forecasts from the ECMWF SEAS5 model across Australia. This investigation aims to 85 1) reconstruct climate trends in seasonal ETo forecasts through statistical calibration and 2) investigate how trend reconstruction affects the skill of calibrated ETo forecasts.

Observations and forecasts
We develop monthly ETo data (treated as observations for calibration) based on gridded monthly temperature, solar radiation, 90 and vapor pressure data from the Australian Water Availability Project (AWAP) (Jones et al., 2007(Jones et al., , 2014. Since the AWAP project does not provide wind speed data, we use a constant wind speed of 2 m s -1 in deriving the ETo observations (Allen et al., 1998). Based on these AWAP variables, we produce monthly ETo observations during 1990-2019 for forecast calibration. 4 Seasonal climate forecasts from the latest version (SEAS5) of the ECMWF model are used to construct the raw ETo forecasts.
The re-forecast period of SEAS5 is 1981-2016, and thewith an ensemble size of is 25 members. SEAS5 forecasts have a 95 horizon of seven months (months 0 to 6), with a spatial resolution of 0.4°. Real-time forecasts started in 2017, with an ensemble size of 51 members (Stockdale et al., 2017). SEAS5 forecasts have a horizon of seven months (months 0 to 6), with a spatial resolution of 0.4°. While SEAS5 produces climate forecasts across the globe, the calibration in this study is performed across Australia only.
To match ETo observations, we combine the archived re-forecasts and operational forecasts to derive raw ETo forecasts for the 100 period of 1990-2019. ECMWF runs for re-forecasts, and operational forecasts are configured in a similar way, except for differences in initialization (Johnson et al., 2019). Absolute errors in raw ETo forecasts during the two periods are comparable ( Figure S1). We choose the first 25 ensemble members of the real-time forecasts (2017-2019) to match the ensemble size of the re-forecasts . To match ETo observations, we combine the archived re-forecasts and operational forecasts to derive raw ETo forecasts for the period of 1990-2019. We choose the first 25 ensemble members of the real-time forecasts 105 (2017-2019) to match the ensemble size of the re-forecasts . Next, wWe calculate the ensemble mean of the 25 ensemble members of ECMWF forecasts of temperature, solar radiation, and vapor pressure for the calculation of raw ETo forecasts. To be consistent with the ETo observations, we use a constant wind speed of 2 m s -1 in deriving raw ETo forecasts.
FinallyIn addition, we aggregate the grid spacing of AWAP data from 0.05° to match the ECMWF's spatial resolution of to 0.4° to match the ECMWF's spatial resolution.. 110

Calculation of ETo observations and forecasts
We construct monthly raw ETo forecasts and ETo observations using the monthly ECMWF climate forecasts and AWAP data based on the FAO 56 ETo method (Allen, et al., 1998): where is the monthly reference crop evapotranspiration (mm month -1 ); is the slope of the vapor pressure curve 115 ( ° ); is net radiation at the crop surface ( ℎ ); is soil heat flux density ( ℎ ), which is calculated based on temperature; is the psychrometric constant ( °-1 ); is average air temperature (°); is the wind speed at 2 m (m s -1 ); and and are saturated and actual vapor pressure ( ), respectively.

Forecast calibration with the BJP-ti model
In this study, ETo forecast calibration is conducted across Australia for each grid cell, each month, and lead time separately 120 during 1990-2019. We employ the BJP-ti model to calibrate the raw ETo forecasts. This model was developed recently by extending the original BJP model's capability to deal with errors resulting from the misrepresentation of climate trends. In this study, the calibration model is configured by month k (k = 1 to 12 corresponding to January to December) of the year. Calibration with the BJP-ti model involves six steps, including 1) data transformation, 2) data detrending, 3) joint probability modelling of the transformed, detrended forecasts and observations, 4) generation of ensemble calibrated forecast members 125 conditional on the raw forecast, 5) adding the observed trend back to ensemble members, and 6) back-transforming the data to obtain the final calibrated forecasts. We further introduce these steps in detail as follows.
The first calibration step is to transform raw forecasts and observations to approach the normal distribution. We adopt the Yeo-Johnson transformation method (Yeo and Johnson, 2000) to transform ETo:.
where is a transformation parameter; x refers to raw ETo forecasts or ETo observations (mm month -1 ); is the transformed x (forecasts or observations) generated through the Yeo-Johnson transformation. The above transformation is performed by month of the year for raw forecasts and observations separately. The transformation parameter (λ) is inferred using the Bayesian Maximum a Posterior (MAP) method (Shao et al., 2020).
Step 2 is to generate detrended forecasts and observations in the transformed space. For each grid cell, we infer linear trends 135 for transformed forecasts and observations separately. With the trend parameters ( and ), trends in transformed forecasts and observations are removed to produce detrended data. Specifically, each transformed forecast and observation record is adjusted based on the middle year of the study period (1990-2019) and trend parameters using the following equations: where ( ) and ( ) refer to transformed ETo forecasts and observations for month k (k = 1 to 12 corresponding to January In step 3, we assume a bivariate joint distribution (z) between predictor (detrended transformed raw forecasts) and predictand (detrended transformed observations) where μ is the mean vector, and Σ is the covariance matrix. We denote the parameters from equations 3-5 as a vector θ = μ, Σ, , . 150 For each month of the year, model parameters are inferred with training data pairs (predictor and predictand) during the study period . The posterior distribution of the model parameters is: where (θ) is the prior distribution for model parameters, and (D|θ) is the likelihood function. D refers to all data pairs ( ( ) and ( )) used for parameter inference. A Gibbs sampler is utilized to repeatedly sample the parameter sets θ from the 155 conditional posterior distribution of the model parameters.
In the BJP-ti model, informative priors are applied to set boundaries for inferred trends to avoid overfittingextreme . The priors are estimated for values for each grid cell, month, and lead time separately. This informative prior distribution ( ) for trend parameters and is formulated as follows (Shao et al., 2021a): where is the standard deviation of the prior, which is set based on trends of transformed forecasts and observations; is the mean and is the standard deviation for predictors or predictands extracted from the diagonal of covariance matrix Σ (see In step 4, once all the parameters are inferred, we draw 1000 members from a conditional distribution of the predictand ( ( * )), for a given new forecast ( ( * )). In step 5, we add the trend from Equation 4 back to ( * ), to produce calibrated ensemble forecast ( ( * )). In step 6, we back-transform ( * ) to the original space to produce the calibrated ensemble forecasts. Our analysis indicated that our trend-reconstruction strategy (detrending and retrending in the transformed space, and setting limits 175 to inferred trends) would not introduce unwanted large bias in the calibrated forecasts ( Figure S2).

Evaluation of forecast calibration
To evaluate the performance of the calibration, we adopt a leave-one-year-out cross-validation strategy for each grid cell and lead time. Specifically, for one of the 30 years during 1990-2019, we keep month k aside, and then use month k from the remaining 29 years to infer the BJP-ti parameters. Once the parameters are inferred, we generate a calibrated forecast for month 180 k in the year held aside. This process is repeated until a calibrated forecasts areis obtained for month k from each of the 30 years. Similar processes are conducted for other months and other lead times until we obtain calibrated forecasts for all months and the seven lead times for each grid cell across Australia. Evaluation metrics employed to examine the performance of calibrations include the correlation coefficients, skill score, bias, and reliability. The calculation of these metrics is further introduced as follows. 190

Correlation coefficient
We use the Pearson correlation coefficient (r) between raw/calibrated forecasts and observations in each month to examine their consistency in temporal dynamics: where ( ) is the ensemble mean of raw/calibrated ETo forecasts for month k in year t (mm month -1 ); T is the total years during 195 the study period; ̅ is the average of ( ) (mm month -1 ); ( ) is the corresponding ETo observations of the same month (mm month -1 ), and is the average of ( ) (mm month -1 ).

Forecast skills
We use the continuous ranked probability score (CRPS) to measure the skill of the raw and calibrated forecasts (Grimit et al., 2006): 200 where ( , ) is the cumulative density function of an ensemble forecast, and ( ) is the observation at time ; is the Heaviside step function ( = 1 if − ( ) ≥ 0 and = 0 otherwise); the overbar represents averaging across the months during 1/1990-12/2019. For deterministic raw forecasts, CRPS is reduced to absolute errors. 205 We further calculate the CRPS skill score ( ) to measure the skill of raw and calibrated forecasts relative to climatology forecasts using the following equation: where is the CRPS value of climatology forecasts; and refers to CRPS value of raw or calibrated forecasts. Positive indicates better skill than the climatology forecasts, and vice versa. To make the CRPS skill scores 210 of calibrated forecasts generated by different models (BJP vs. BJP-ti) comparable, we use the climatology forecasts from the BJP model as the reference in the calculation of . We evaluate the accuracy of the raw and calibrated forecasts using the following equation: where Bias refers to the bias in ETo (mm month -1 ); n is total months during the 30-year study period (1/1990-12/2019); ( ) is raw or calibrated forecasts of ETo (mm month -1 ), and ( ) is the corresponding ETo observations of the same month (mm month -1 ). Raw forecasts are deterministic since they are calculated based on the ensemble mean of each input variable. For calibrated forecasts, we use the ensemble mean to calculate bias.

Reliability 220
To evaluate the reliability of calibrated ensemble forecasts, we calculate the probability integral transform (PIT) value using the following equation: where ( , ) is the cumulative density function of the ensemble forecast, and ( ) is the observation. For reliable forecasts, the collection of ( ) follows a standard uniform distribution. We use the alpha ( ) index to summarize the reliability in each 225 grid cell with the following equation to check the overall reliability across Australia (Renard et al., 2010): where π* (t) is the sorted π(t), t=1,2,…n in ascending order, and n is the total number of months. The -index measures the total deviation of calibrated forecasts from the corresponding uniform quantile. Perfectly reliable forecasts should have an αindex of 1, and forecasts with no reliability would have an α-index of 0. 230

Trends in observations and raw/calibrated forecasts
We evaluate the capability of BJP-ti in reconstructing temporal trends for months with large areas of statistically significant trends in observed ETo. Since the trend parameters are estimated by month, we first examine the trend in ETo observations for each month k of the year for 1990-2019 ( Figure S1S3). August, September, and October show larger areas with statistically 235 significant trends than other months. As a result, the evaluation of trends in raw/calibrated forecasts is mainly conducted for these three months. Observed ETo shows increasing trends in many parts of Australia in the three selected months (Figure 1, right 4 th column).
Compared with findings from previous investigations, observed trends identified in this study also demonstrate significant spatial variability and varying magnitudes in different months (Donohue et al., 2010;McVicar et al., 2012). We found more 245 positive trends in our study period  than the period of 1981-2006 (Donohue et al., 2010). In August, areas with increasing trends larger than 6 mm decade -1 are mainly located in western parts of Australia. In contrast, central and eastern Australia demonstrates much lower trends of less than 4 mm decade -1 . Observed trends are close to zero in Victoria and Tasmania and even become negative in parts of the Northern Territory. In September, areas with significant increasing trends larger than 6 mm decade -1 are located in many parts of Australia, with the exception of a narrow coastal fringe and areas around 250 the Tropic of Capricorn. In this month, decreasing trends are observed in a small part of eastern areas of Western Australia, where observations are relatively poor. In October, central-eastern Australia, including the inland regions of Victoria, New South Wales, South Australia, and south-west Queensland, demonstrate increasing trends of up to 8 mm decade -1 . At the decadal scale, trends in ETo is comparable with the standard deviation.
Raw ETo forecasts also demonstrate trends, but they differ from those in observations in both spatial patterns and magnitudes 255 (left column in Figure 1). In August, raw forecasts show increasing trends (> 6 mm decade -1 ) in Western Australia, which partially match those in observations. However, in eastern parts of Australia, raw forecasts overpredict trends in observations.
In September, raw forecasts demonstrate even larger overpredictions (>8 mm decade -1 ) in trends than those of August, particularly in Western Australia and New South Wales. In October, raw forecasts are better aligned with observations in the increasing trends in south-eastern Australia; however, they overpredict trends in Western Australia, and underpredict trends in 260 Northern Australia. Figures S2 S4 and S3S5). For the lead time of month Month 3, trends in raw ETo forecasts show similar spatial patterns to those of month Month 0 in August, but the trends mainly drop to less than 2 mm decade -1 . Similarly, the magnitudes of increasing trends in the other two months are also much lower at Mmonth 3 than those at month Month 0. At month Month 6, trends in raw forecasts of the three selected months are 265 close to zero across Australia.

Trends in raw forecasts become weaker at longer lead times (left columns in
Calibrated ETo forecasts produced with the original BJP model demonstrate trends similar to those of raw forecasts in spatial patterns, but show smaller magnitudes (second columns in Figures 1, S2S4, and S3S5). Specifically, at month Month 0, the BJP-calibrated forecasts preserve the spatial variability of trends of the raw forecasts and show higher trends in Western Australia, central parts of Australia, and southern regions of the country for August, September, and October, respectively, but 270 the increasing trends are all less than 4 mm decade -1 , lower than those in raw forecasts (Figure1). Consistencies in the spatial patterns of trends are also found between BJP-calibrated forecasts and raw forecasts at other lead times ( Figures S2 S4 and   S3S5). Similarly, trends are also lower in BJP-calibrated forecasts than those of the corresponding raw forecasts at longer lead times.
Calibration with the BJP-ti model successfully reconstructs the observed trends in the calibrated forecasts (third columns in 275 Figures 1, S2S4, and S3S5). Inconsistencies between raw forecasts and observations in the spatial patterns and magnitudes of trends are effectively corrected through statistical calibration, particularly for regions that demonstrate significant observed trends. In addition, the tendency that trends become weaker at longer lead times in the raw forecasts is also effectively corrected.
In the BJP-ti calibrated forecasts (third columns in Figures 1, S2S4, and S3S5), all lead times show trends consistent with observations in both spatial patterns and magnitudes. 280  and South Australia. In October, r increases cover large areas of southern and central regions of Australia. Similar improvements are also found in the remaining nine months ( Figure S6). Slight decreases in r are also found in regions where the observed trends are not statistically significant.  Reconstruction of trends results in more skillful calibrated forecasts. We compare the CRPS skill scores of BJP-ti calibrated forecasts with those produced with the BJP model for the three selected months (Figure 3). At month Month 0, the CRPS skill 315 score of calibrated forecasts is increased by 5-10% in August, September, and October, when trends are reconstructed. The distribution of areas with increased CRPS skill scores is generally consistent with that of the improved r (Figure 2). Increases in CRPS skill score are greater at longer lead times, in both magnitude and area, than those at short lead times. At Mmonth 3, areas with increased CRPS skill scores expand in Western Australia in August and in northern Western Australia in September.

Skills of raw and calibrated ETo forecasts
Month 6 demonstrates further improvements, with larger areas showing increases in CRPS skill score of over 15% in coastal 320 areas of northern Australia in August and September, and central Australia in October. The other nine months also demonstrate similar improvements in the CRPS skill score in regions with significant trends ( Figure S7). In addition, comparison for all months together also demonstrates improved skill scores following trend reconstruction (Figure 4). Slight decreases in CRPS skill score are also found in regions where the observed trends are not statistically significant. raw ETo forecasts. Compared with the climatology forecasts, raw ETo forecasts demonstrate much lower skills, with CRPS skill scores lower than -25% in all grid cells, even for those at short lead times.
We need to point out that simple bias-correction is often applied to raw ECMWF forecasts before they are used. We applied quantile mapping to the raw ETo forecasts and were able to improve skills in ETo forecasts ( Figure S8). However, the biascorrected forecasts still demonstrate skills much worse than climatology forecasts, particularly at long lead times. 335 With the correction of errors, including the time-dependent errors, the BJP-ti calibrated forecasts demonstrate CRPS skill scores more significant than 20 (%) at month Month 0 in most grid cells. Eastern parts of Australia, such as New South Wales and Victoria, show CRPS skill scores of up to 30 (%). Beyond month Month 0, the skill score decreases significantly in calibrated forecasts. Most areas of Australia show CRPS skill scores lower than 10 (%) at month 1. The skill score further decreases at longer lead times, but remains above zero in many parts of Australia, even at Mmonth 6, suggesting better 340 performances than the climatology forecasts. We also summarize the CRPS skill score of calibrated forecasts by target month at the seven lead times across Australia (Figure  345 56). Individual boxes indicate the variability among all the grid cells across Australia for that month and lead-time. At the first lead time (Mmonth 0), all months show CRPS skill score markedly better than climatology forecasts across most grid cells, with the median CRPS skill score being above 20% for seven months. However, the skill score decreases quickly with lead time. At lead time 1, the CRPS skill score is mainly lower than 10% for all target months. Skills of calibrated forecasts vary among the months. For October, November, and December, the CRPS skill score is above 0 for more than 50% of grid cells, 350 even at lead time 6, indicating better performance than the climatology forecasts. For other months, such as January, April, May, and June, the median CRPS skill score decreases to values slightly below 0 beyond the lead time (Month 0)of month 1. The spatial patterns of bias in the raw ETo forecasts are consistent across all seven lead times, demonstrating systemic errors 360 in raw ETo forecasts ( Figure 6). The BJP-ti calibration substantially corrects the systematic errors in the raw forecasts, resulting in biases close to 0 in calibrated forecasts for all lead times (Figures 6 7 and S4S9). In this study, we generate 1000 ensemble members for each raw forecast to quantify the uncertainties of the calibrated forecasts.

Correlation between raw/calibrated forecasts and observations
As indicated by the α-index, calibrated ETo forecasts are highly reliable. The α-index of calibrated ensemble ETo forecasts is above 0.96 in most parts of Australia for all the seven lead times (Figures 7 9 and S5S11). The high reliability of the calibrated 380 forecasts suggests reasonable representations of uncertainties in calibrated ETo forecasts, and the distributions of calibrated ensemble forecasts are neither too narrow nor too wide ( Figure 79).

The necessity of reconstructing climate trends in seasonal ETo forecasting
This investigation confirms that the misrepresentation of climate trends is an important error source in GCM-based ETo 385 forecasting. Most previous investigations on climate trends in seasonal forecasts were primarily focused on temperature (Krakauer, 2019) andprecipitation (Alizadeh-Choobari et al., 2019), and existing ETo forecasting studies have not investigated trends in ETo forecasts, despite temporal trends in ETo being observed at weather stations across the globe (Djaman et al., 2018;Kousari and Ahani, 2012). Although the ECMWF model runs have been forced with the observed greenhouse gas concentrations for our study period (Johnson et al., 2019), and have actually produced temporal trends in raw ETo forecasts 390 (Figure 1), the trends show significant inconsistencies with observations. In addition, raw ETo forecasts at long lead times Although it may take decades for climate change to substantially alter the magnitude of ETo ( Figures S11 and 12), we 400 recommend that future GCM-based ETo forecasting should still correct time-dependent errors. More frequent extreme weather events in recent years support model projections that climate change will intensify in the future (Kharin et al.,201), and may induce more significant temporal trends in ETo.We recommend that future GCM-based ETo forecasting should correct timedependent errors, since climate change has been projected to intensify in the future (Kharin et al., 2013), and may induce more significant temporal trends in ETo. 405

Implications for improving statistical calibration models
Climate change has posed challenges to the statistical calibration of seasonal climate forecasts. Many post-processing models, such as those based on the probabilistic theory (Tian et al., 2014;Wang et al., 2009), often rely on the climatology of observations to construct the probability distribution function for calibration (Wilks, 2018). However, the non-stationary behavior of the climate system induced by elevated greenhouse gas emissions has been increasingly reported (Haustein et al., 410 2016;Lima et al., 2015). Many calibration models developed for seasonal forecasts have not considered the climate change impacts on the observed climatology. Although these models are proven to be effective in correcting biases in raw forecasts, assuming a static climatology may have hindered the utilization of predictable information in the raw forecasts. This investigation and our previous calibration of seasonal temperature forecasts (Shao et al., 2020(Shao et al., , 2021a, suggest that reconstructing trends in calibrated forecasts is an effective solution for capturing the non-stationary behavior of the climate 415 system for more robust statistical calibrations of seasonal climate forecasts.
This current investigation has further validated the strength of the trend-reconstruction algorithms in BJP-ti. Previously, we applied this model to correct seasonal temperature forecasts and achieved significant improvements in forecast skills relative to the original BJP model (Shao et al., 2020(Shao et al., , 2021a. This study further demonstrates the feasibility for the general application of BJP-ti to different hydroclimate variables showing temporal trends (Shao et al., 2021b(Shao et al., , 2021c. The successful application 420 to ETo forecasts confirms the robustness of trend reconstruction algorithms based on the data transformation, Bayesian inference, and using statistical significance of observed trends to deal with overfitting of trend parameters in the BJP-ti model.
We also anticipate that the BJP-ti algorithms for trend reconstruction could be adopted by other calibration models to enhance for the general application of BJP-ti to different hydroclimate variables showing temporal trends. We also anticipate that the BJP-ti algorithms for trend reconstruction could be adopted by other calibration models to enhance seasonal forecast calibration.

Future work for seasonal ETo forecasting
In this investigation, we successfully improve ETo forecast calibration by reconstructing climate trends. We also identify a few challenges that should be addressed in the future to further enhance GCM-based seasonal ETo forecasting. 430 First of all, more sophisticated cross-validation methods should be developed for the inference of trend parameters. The current leave-one-out method has been proven to be effective in the inference of the mean vector and covariance matrix (Shao et al., 2020). However, this strategy may not guarantee the independence between the left-out data and data used for the inference of trend parameters. We decided not to implement the data-splitting method for cross-validation because of the risk of introducing sampling errors. Future investigations should take this challenge into consideration and develop more robust cross-validation 435 methods for the inference of trend parameters.
In this study, we directly use the raw forecasts of individual input variables (e.g., temperature, solar radiation, and vapor pressure) to construct the raw ETo forecasts. However, trends in these variables have been reported in previous investigations.
Whether correcting errors, including time-dependent errors in the raw forecasts of each input variable, will lead to more skillful calibrated ETo forecasts, warrants further investigation. 440 Correction of lead-time-dependent errors should be further investigated in future GCM-based ETo forecasting. We found sharp declines in the skill of calibrated ETo forecasts from lead time month Month 0 to month Month 1. Model initialization with field observations plays a critical role in seasonal climate forecasting based on GCMs (Doblas-Reyes et al., 2013;Hazeleger et al., 2013). Short-lead-time forecasts are more skillful since they are closer to the observed state of the climate system than those at long lead times. At long lead times, the predictable signal is often much smaller than the intrinsic uncertainty of GCMs. 445 As a result, skills of raw forecasts often decrease quickly in the first month (Swapna et al., 2015), posing a challenge to statistical calibration, even for those using sophisticated calibration models (Hawthorne et al., 2013). Currently, we calibrate raw ETo forecasts of each lead time independently. Whether correcting the lead-time-dependent biases will add extra skill to calibrated forecasts, particularly to those at long lead times, warrants further investigation (Schaeybroeck and Vannitsem, 2018). 450 Future forecast calibration should also investigate the impacts of climate change on the temporal variations of ETo. In addition to the increasing or decreasing trends, warming climate also induced more significant temporal variations in ETo, following increasing climate extremes (Wen et al., 2012). The increasing variations could pose another challenge to statistical calibration models assuming an unchanged variance of observations. This current investigation provides a remedy for dealing with the

Conclusions
ETo forecasting provides useful information for hydrological investigations and has been increasingly used to support water resource forecasting and management. Anthropogenic disturbances have induced changes in the climate system and resulted in trends in many climate variables. GCMs often misrepresent these climate trends and thus lead to time-dependent errors in 460 seasonal climate forecasts. We have recently improved the BJP model to deal with this error source through the reconstruction of observed climate trends in calibrated forecasts. In this study, we apply the BJP-ti model to calibrate raw seasonal ETo forecasts constructed with climate forecasts from the ECMWF SEAS5 model. The BJP-ti model effectively corrects misrepresented climate trends and reconstructs observed trends in calibrated ETo forecasts. More importantly, forecast skills in areas showing statistically significant observed trends in observations are improved following trend reconstruction. This 465 investigation highlights the necessity of correcting time-dependent errors for enhancing GCM-based seasonal ETo forecasting.
We conclude that future ETo forecasting based on GCM climate forecasts could obtain more skillful forecastscould improve forecast skills through climate trend reconstructionreconstructing climate trends in forecasts.
This investigation also provides valuable insights for improving statistical calibrations of seasonal climate forecasts in the future. In recent decades, climate trends have been increasingly observed. However, many calibration models for seasonal 470 forecasts have not taken the non-stationary behavior of the climate system into consideration. Improved forecast skills in seasonal ETo forecasts through the reconstruction of temporal trends, together with our previous calibration of seasonal temperature forecasts, validate the robustness and effectiveness of biastrend-reconstruction algorithms in the BJP-ti model.
We anticipate that these algorithms would be applicable to enhance other calibration models.

Data availability: 475
Data used in this study are available by contacting the corresponding author.

Author contributions：
Q. Yang and Q. J. Wang conceived this study. Q. J. Wang developed the calibration model. Q. Yang took the lead in writing and improving the manuscript. All co-authors, including A. Western, W. Wu, Y. Shao, and K. Hakala, contributed to discussing the results and improving the manuscript.