Quantifying overlapping and differing information of global precipitation for GCM forecasts and El Niño–Southern Oscillation

While El Niño–Southern Oscillation (ENSO) teleconnection has long been used in statistical hydroclimatic forecasting, global climate models (GCMs) increasingly provide dynamical precipitation forecasts for hydrological modelling and water resources management. It is not yet known to what extent dynamical GCM forecasts provide new information compared to statistical teleconnection. This paper develops a novel Set Operations of Coefficients of Determination (SOCD) method to explicitly quantify the overlapping and differing information for GCM forecasts and ENSO teleconnection. Specifically, the intersection operation of the coefficient of determination derives the overlapping information for GCM forecasts and Niño3.4 index, and the difference operation then determines the differing information in GCM forecasts (Niño3.4 index) from Niño3.4 index (GCM forecasts). A case study is devised for the Climate Forecast System version 2 (CFSv2) seasonal forecasts of global precipitation in December-January-February. The results show that the overlapping information for GCM forecasts and Niño3.4 index is significant for 34.94% of global land grid cells, the differing information in GCM forecasts from Niño3.4 index is significant for 31.18% of grid cells and the differing information in Niño3.4 index from GCM forecasts is significant for 11.37% of grid cells. These results confirm the effectiveness of GCMs in capturing the ENSO-related variability of global precipitation and illustrate where there is room for improvement of GCM forecasts. Overall, the bootstrapping-based significance tests of the three types of information yield eight patterns in total, which disentangle the close but divergent association of GCM forecast correlation skill with ENSO teleconnection.


Introduction
Seasonal hydroclimatic forecasts are important for agricultural scheduling, water management and drought mitigation (Sheffield et al., 2014;Anghileri et al., 2016;Peng et al., 2018;He et al., 2019;Zhao et al., 2019). In hydroclimatic forecasting, uncertainty generally arises from catchment initial conditions and future climate forcings (Wood and Lettenmaier, 2006;Yuan et al., 2014;Huang et al., 2020). At short lead times of up to about one month, initial conditions tend to outweigh climate forcings; at longer lead times, climate forcings become the more important contributor (Li et al., 2009;Yossef et al., 2013). Therefore, besides remote sensing-based estimation of the initial conditions of snow cover, soil moisture and groundwater storage (Mei et al., 2020;Xu et al., 2020b;Sheffield et al., 2014), efforts have been devoted to developing sub-seasonal to seasonal hydroclimatic forecasts of temperature and precipitation (Schepen et al., 2020;Strazzo et al., 2019;Bennett et al., 2016;Cash et al., 2019;Li et al., 2017). While temperature forecasts have improved substantially in the past decades, generating skilful precipitation forecasts remains a challenging task (Becker et al., 2022).
Teleconnections with climate indices generally reflect slowly varying and recurrent components of atmospheric circulation, such as those associated with sea surface temperature (SST), that link climate anomalies over large distances in both the tropics and extratropics (Webster and Yang, 1992;Mason and Goddard, 2001;Lim et al., 2021). As one of the most prominent teleconnections, ENSO affects the global climate through eastward-propagating Kelvin waves, westward-propagating Rossby waves and the Walker circulation spanning the tropical Pacific, Indian and Atlantic Oceans (Yang et al., 2018;Webster and Yang, 1992). For regions exhibiting teleconnection patterns, various forecasting models have been developed, including historical resampling methods (Hamlet and Lettenmaier, 1999;Wood and Lettenmaier, 2006;Lim et al., 2021), statistical (Bayesian) methods (Hidalgo and Dracup, 2003;Strazzo et al., 2019;Emerton et al., 2017) and machine learning methods (Xu et al., 2020a;Li et al., 2021).
Major climate centers develop global climate models (GCMs) to generate operational forecasts of global climate (Bauer et al., 2015;Saha et al., 2014;Khan et al., 2017;Johnson et al., 2019a;Kirtman et al., 2014). For example, the United States National Centers for Environmental Prediction (NCEP) runs the Climate Forecast System version 2 (CFSv2) (Saha et al., 2014) and the European Centre for Medium-Range Weather Forecasts (ECMWF) operates the fifth-generation seasonal forecast system (SEAS5) (Johnson et al., 2019b). In contrast to teleconnections, which are generally "statistical", GCM forecasts are "dynamical" in that GCMs assimilate observations to reduce initial-state uncertainty and couple atmosphere, land, ocean and sea ice modules to represent complex interactions among different components of the Earth system (Bauer et al., 2015;Corti et al., 2015;Becker et al., 2022). Previous studies found that GCM forecasts tend to be skilful in regions subject to prominent ENSO teleconnection and also highlighted that GCM forecasts can be skilful in some extratropical regions where ENSO teleconnection is limited (Johnson et al., 2019b;Kirtman et al., 2014;Delworth et al., 2020).
Conventional ENSO-based statistical forecasts and emerging GCM dynamical forecasts generally represent two different sources of information (Wood and Lettenmaier, 2006;Bauer et al., 2015;Emerton et al., 2017;Delworth et al., 2020;He et al., 2021). While both are valuable and can be combined to generate improved forecasts (Madadgar et al., 2016;Wanders et al., 2017;Strazzo et al., 2019), it is not yet known to what extent their information overlaps or differs. Small overlap and large difference would indicate that GCM forecasts offer new information compared with ENSO teleconnection, while large overlap and small difference would imply that GCM forecasts might not provide additional information. Zhao et al. (2021) investigated the overlapping information to attribute GCM forecast correlation skill to ENSO teleconnection. In this paper, we build a Set Operations of Coefficients of Determination (SOCD) method upon Zhao et al. (2021) to further account for the differing information. As will be demonstrated through the methods and results, besides the overlapping information, there exist two types of differing information, i.e., the differing information in GCM forecasts from ENSO and the differing information in ENSO from GCM forecasts. The three types of information yield eight patterns that disentangle the close but divergent association of GCM correlation skill with ENSO teleconnection.

Data description
GCM precipitation forecasts are generally five-dimensional data (Kirtman et al., 2014;Saha et al., 2014;Delworth et al., 2020;Zhao et al., 2021;Becker et al., 2022).
Taking the NCEP-CFSv2 forecasts as an example, the five dimensions are: 1) forecast start time s, which represents the time at which forecasts are generated and is marked by the number of months since January 1960; 2) lead time l, which represents the number of months ahead of the start time and ranges from 0 to 9; 3) ensemble member n, which is meant to explicitly account for forecast uncertainty and ranges from 1 to 24, i.e., 24 ensemble members in total; 4) latitude y; and 5) longitude x. GCM forecasts are therefore formulated as:
$$F = \{ f_{s,l,n,y,x} \} \qquad (1)$$
where f represents an individual forecast value under the five dimensions and all the forecast values form a dataset F.
The observed precipitation corresponding to the forecasts has three dimensions:
$$O = \{ o_{t,y,x} \} \qquad (2)$$
in which o represents an individual observation value and O the dataset of observations. The three dimensions are target time t, latitude y and longitude x. It is important to note that, in aligning observations with forecasts, target time t is mathematically the sum of start time s and lead time l, i.e., t = s + l.
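The alignment t = s + l can be made concrete with integer month indices. A minimal sketch, assuming that January 1960 maps to index 0 (the text only says start time is counted in months since January 1960); the helper names are hypothetical:

```python
# Hypothetical helpers: month indices counted from January 1960 (January 1960 -> 0).
def months_since_1960(year: int, month: int) -> int:
    """Convert a calendar year and month (1-12) to months since January 1960."""
    return (year - 1960) * 12 + (month - 1)


def target_time(start: int, lead: int) -> int:
    """Target time t = s + l, with s and t both in months since January 1960."""
    return start + lead


def to_year_month(index: int) -> tuple[int, int]:
    """Convert months since January 1960 back to (year, month)."""
    years, month0 = divmod(index, 12)
    return 1960 + years, month0 + 1


s = months_since_1960(2000, 11)  # forecast issued at the start of November 2000
t = target_time(s, 1)            # 1-month lead time
print(s, t, to_year_month(t))    # 490 491 (2000, 12)
```

The same integer arithmetic works for observations and for the Niño3.4 series, so all three datasets can be joined on t.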
The Niño3.4 index, which indicates the SST of the east-central tropical Pacific (5° N-5° S, 170°-120° W), is one of the most popular indicators of the status of ENSO (Hamlet and Lettenmaier, 1999;Emerton et al., 2017;Lin et al., 2020):
$$\text{Niño3.4} = \{ \text{niño3.4}_t \} \qquad (3)$$
in which there is only one dimension, i.e., time t, for Niño3.4.
F, O and Niño3.4 shown in Eqs. (1) to (3) lay the basis for the analysis of overlapping and differing information in this paper.
In the North American Multi-Model Ensemble (NMME) experiment (Kirtman et al., 2014), CFSv2 retrospective forecasts covering 1982 to 2010 have been temporally aggregated to monthly resolution and spatially regridded to a 1.0° × 1.0° grid. In the meantime, the daily Unified Rain-gauge Database (Chen et al., 2008) of the Climate Prediction Center (CPC-URD) precipitation observations over land have also been aggregated and regridded by the NMME. In the analysis, both CFSv2 forecasts and CPC-URD observations are obtained from the International Research Institute for Climate and Society (IRI) at Columbia University (https://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/). Monthly Niño3.4 is obtained from the CPC (https://www.cpc.ncep.noaa.gov/data/indices/).

Consideration of seasonality
Precipitation worldwide exhibits seasonality, e.g., wet and dry seasons of monsoonal precipitation (Webster and Yang, 1992;Zhao et al., 2017;Liu et al., 2022). As a result, the predictive performance of GCM forecasts varies across seasons (Kirtman et al., 2014;Bauer et al., 2015;Strazzo et al., 2019) and ENSO teleconnection also exhibits seasonal variability (Mason and Goddard, 2001;Peel et al., 2004;Emerton et al., 2017). Once the target season is fixed, lead time l is determined by start time s. Taking December-January-February (DJF) as an example, forecasts generated at the start of December are at 0-month lead time, forecasts generated at the start of November are at 1-month lead time, and so on.
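With the target season fixed, each year's seasonal forecast comes from one start time s = t − l and the three consecutive lead times that cover the season. A minimal NumPy sketch of that selection, assuming a hypothetical 5-D array `fcst` ordered (s, l, n, y, x) whose first start time is `start_of_record`, and assuming the seasonal value is the mean of the three monthly forecasts (the text only says monthly values are aggregated, so a seasonal total would serve equally well):

```python
import numpy as np


def seasonal_ensemble_mean(fcst, start_of_record, season_first_months, lead):
    """Seasonal, ensemble-mean forecast for one three-month target season per year.

    fcst: array with axes (s, l, n, y, x); start index 0 corresponds to
          `start_of_record` (months since January 1960).
    season_first_months: first target month of each season (e.g. each December),
          in months since January 1960.
    lead: lead time of the first target month (0 for DJF forecasts issued in December).
    Returns an array with axes (k, y, x), one entry per season k.
    """
    seasons = []
    for t0 in season_first_months:
        s = t0 - lead - start_of_record          # start time s = t - l, as a row index
        if s < 0 or s >= fcst.shape[0] or lead + 3 > fcst.shape[1]:
            raise ValueError("season not covered by the forecast array")
        monthly = fcst[s, lead:lead + 3]         # three target months, axes (month, n, y, x)
        seasons.append(monthly.mean(axis=(0, 1)))  # average over months and members
    return np.stack(seasons)


# Synthetic check: every forecast value equals its own target time t = s + l.
start_of_record = 490                            # hypothetical first start: November 2000
S, L, N, Y, X = 30, 10, 24, 2, 3
t_grid = start_of_record + np.arange(S)[:, None] + np.arange(L)[None, :]
fcst = np.broadcast_to(t_grid[:, :, None, None, None], (S, L, N, Y, X)).astype(float)
decembers = [491, 503]                           # December 2000 and December 2001
result = seasonal_ensemble_mean(fcst, start_of_record, decembers, lead=1)
print(result[:, 0, 0])                           # [492. 504.]
```

Filling the synthetic array with target times makes misalignment easy to spot: each seasonal mean should equal the January index of that season.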
$$o = \{ o_k \} \qquad (5)$$
In Eq. (5), o denotes observed precipitation in the target season (DJF) across multiple years at the selected grid cell (y, x), with k indexing the year. Similar to forecasts, monthly observations are aggregated into seasonal values.
Furthermore, the Niño3.4 index in the same season as observed precipitation is obtained:
$$\text{niño3.4} = \{ \text{niño3.4}_k \} \qquad (6)$$
In Eq. (6), niño3.4 is the concurrent Niño3.4 index of the target season (DJF) across multiple years.
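Observations and the concurrent Niño3.4 index can be brought to the same DJF season with one helper applied to each monthly series. A minimal sketch, assuming the seasonal value is the three-month mean and that a season is labelled by the year of its December (both conventions are chosen here for illustration; `djf_means` is a hypothetical name):

```python
import numpy as np


def djf_means(monthly, first_month, years):
    """DJF means of a monthly series, labelled by the year of December.

    monthly: 1-D series whose element 0 is month `first_month`
             (months since January 1960).
    years: December years, e.g. 1982 -> Dec 1982, Jan 1983, Feb 1983.
    """
    monthly = np.asarray(monthly, dtype=float)
    out = []
    for year in years:
        dec = (year - 1960) * 12 + 11 - first_month   # position of December in the series
        if dec < 0 or dec + 3 > monthly.size:
            raise ValueError(f"DJF {year} is outside the series")
        out.append(monthly[dec:dec + 3].mean())
    return np.array(out)


# Synthetic series starting in January 1982, where each value is its own month index.
first_month = (1982 - 1960) * 12                      # 264
series = np.arange(first_month, first_month + 36)
res = djf_means(series, first_month, [1982, 1983])
print(res)                                            # [276. 288.]
```

The same call serves a grid cell's monthly precipitation and the monthly Niño3.4 series, which keeps o and niño3.4 on identical year labels.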

Quantification of information in forecasts and Niño3.4
The coefficient of determination (R²) quantifies the proportion of the variance of a dependent variable explained by a regression model built upon one or more independent variables (Pham, 2006). In this paper, the dependent variable is observed seasonal precipitation (Eq. 5). The candidate independent variables are GCM precipitation forecasts (Eq. 4) and the Niño3.4 index (Eq. 6). Three classic linear regression models are set up to account for the information of observations in the forecast ensemble mean and the Niño3.4 index.
The first model regresses observed seasonal precipitation o against the ensemble mean f of GCM precipitation forecasts:
$$o_k = \alpha_1 + \beta_1 f_k + \varepsilon_{1,k} \qquad (7)$$
in which α1 and β1 are respectively the intercept and slope parameters and ε1,k is the residual of regression. The unexplained variance is indicated by the sum of squared residuals, i.e., $\sum_k \varepsilon_{1,k}^2$, so that
$$R^2(o \sim f) = 1 - \frac{\sum_k \varepsilon_{1,k}^2}{\sum_k (o_k - \bar{o})^2}$$
In this way, the proportion of variance explained by the ensemble mean is quantified.
The second model regresses observed seasonal precipitation o against niño3.4:
$$o_k = \alpha_2 + \beta_2 \, \text{niño3.4}_k + \varepsilon_{2,k} \qquad (8)$$
in which α2, β2 and ε2,k are respectively the intercept parameter, slope parameter and residual of regression. This regression quantifies the proportion of variance of observed precipitation explained by Niño3.4.
The third model regresses observed seasonal precipitation o against both the ensemble mean f and niño3.4:
$$o_k = \alpha_3 + \beta_{3,1} f_k + \beta_{3,2} \, \text{niño3.4}_k + \varepsilon_{3,k} \qquad (9)$$
in which α3, β3,1, β3,2 and ε3,k are respectively the intercept parameter, the slope parameter of the ensemble mean, the slope parameter of Niño3.4 and the residual of regression. The proportion of the variance of observed precipitation explained by the union of the ensemble mean and Niño3.4 is therefore measured by this bivariate regression.
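The three regressions differ only in their predictor columns, so one least-squares helper covers all of them. A minimal NumPy sketch on hypothetical synthetic data for a single grid cell (the variable names and the data-generating coefficients are illustrative, not taken from the paper):

```python
import numpy as np


def r_squared(o, *predictors):
    """Coefficient of determination of an OLS fit of o on an intercept plus predictors."""
    o = np.asarray(o, dtype=float)
    X = np.column_stack([np.ones_like(o)] + [np.asarray(p, dtype=float) for p in predictors])
    coef, *_ = np.linalg.lstsq(X, o, rcond=None)
    residuals = o - X @ coef
    return 1.0 - np.sum(residuals ** 2) / np.sum((o - o.mean()) ** 2)


# Hypothetical data for one grid cell: K seasons of observations, ensemble mean and Niño3.4.
rng = np.random.default_rng(42)
K = 29
nino = rng.normal(size=K)
f = 0.8 * nino + rng.normal(size=K)
o = 0.6 * nino + 0.5 * f + rng.normal(size=K)

r2_f = r_squared(o, f)           # first model: ensemble mean only
r2_n = r_squared(o, nino)        # second model: Niño3.4 only
r2_u = r_squared(o, f, nino)     # third model: the union of both predictors
print(round(r2_f, 3), round(r2_n, 3), round(r2_u, 3))
```

Because the bivariate model nests both single-predictor models, its R² can never be smaller than either of theirs.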

Quantification of overlapping and differing information
As shown by the Venn diagrams in Figure 1, the information of observed precipitation contained in the forecast ensemble mean, the Niño3.4 index and their union is respectively quantified by $R^2(o \sim f)$, $R^2(o \sim \text{niño3.4})$ and $R^2(o \sim f \cup \text{niño3.4})$. Following classic set theory, the SOCD method performs the set operations of intersection and difference to quantify the overlapping and differing information:
1) The proportion of variance explained by the ensemble mean but not by the Niño3.4 index is derived by the difference operation:
$$R^2(o \sim f \setminus \text{niño3.4}) = R^2(o \sim f \cup \text{niño3.4}) - R^2(o \sim \text{niño3.4}) \qquad (10)$$
In Eq. (10), $R^2(o \sim f \setminus \text{niño3.4})$ measures the differing information of GCM forecasts on observed precipitation from the Niño3.4 index.
2) The intersection operation derives the proportion of variance of seasonal precipitation explained by both the ensemble mean and the Niño3.4 index:
$$R^2(o \sim f \cap \text{niño3.4}) = R^2(o \sim f) + R^2(o \sim \text{niño3.4}) - R^2(o \sim f \cup \text{niño3.4}) \qquad (11)$$
In Eq. (11), $R^2(o \sim f \cap \text{niño3.4})$ represents the overlapping information.
3) The proportion of variance explained by the Niño3.4 index but not by the ensemble mean is derived by the difference operation:
$$R^2(o \sim \text{niño3.4} \setminus f) = R^2(o \sim f \cup \text{niño3.4}) - R^2(o \sim f) \qquad (12)$$
In Eq. (12), $R^2(o \sim \text{niño3.4} \setminus f)$ represents the differing information of the Niño3.4 index from GCM forecasts.
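Given the three R² values, Eqs. (10) to (12) reduce to plain arithmetic. A minimal sketch with made-up inputs (0.5, 0.3 and 0.6 are illustrative, not results from the paper); the dictionary keys are hypothetical names:

```python
def socd(r2_f: float, r2_n: float, r2_union: float) -> dict[str, float]:
    """Set operations on coefficients of determination (Eqs. 10-12)."""
    return {
        "f_minus_nino": r2_union - r2_n,    # differing information in forecasts from Niño3.4
        "overlap": r2_f + r2_n - r2_union,  # intersection, by inclusion-exclusion
        "nino_minus_f": r2_union - r2_f,    # differing information in Niño3.4 from forecasts
    }


parts = socd(r2_f=0.5, r2_n=0.3, r2_union=0.6)
print(parts)   # approximately 0.3, 0.2 and 0.1
```

The three parts add up to the union R², which is the inclusion-exclusion identity behind the Venn diagrams in Figure 1. Unlike the area of a Venn region, however, the intersection term computed this way is not guaranteed to be non-negative.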

Eight patterns for overlapping and differing information
The significance of overlapping and differing information is tested by bootstrapping (Efron and Tibshirani, 1986). The null hypothesis is that the three variables under investigation, i.e., o, f and niño3.4, are fully independent of one another. Under the null hypothesis, the samples in Eqs. (4), (5) and (6) are randomly selected with replacement to calculate the overlapping and differing information; one thousand such recalculations form the respective reference distributions for these R² values. Comparing the R² values for the original samples with their reference distributions yields p-values that tell how extreme the original R² values are. In this way, significance is tested (Efron and Tibshirani, 1986;Pham, 2006). As the null hypothesis is full independence, the R² values, which indicate the amount of information of the dependent variable contained in the independent variable(s), are expected to be rather small under it. From this perspective, the larger the R² values for the original samples are, the more extreme they are and the less likely the null hypothesis holds. Therefore, a one-tailed test is implemented for the significance of the R² values (Pham, 2006). Specifically, at the significance level of 0.10, the SOCD method checks whether the R² value falls into the top 10% of the corresponding bootstrapping-derived reference distribution.
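The bootstrap test can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code: it assumes that o, f and niño3.4 are each resampled independently with replacement (which is how sampling under full independence is described above), uses 1000 resamples as in the text, and takes the p-value as the fraction of the reference distribution at or above the observed value. The synthetic data and function names are hypothetical.

```python
import numpy as np


def r_squared(o, *predictors):
    """R^2 of an OLS fit of o on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(o))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, o, rcond=None)
    resid = o - X @ coef
    return 1.0 - resid @ resid / np.sum((o - o.mean()) ** 2)


def socd_from_data(o, f, nino):
    """[differing f from nino, overlap, differing nino from f] via Eqs. (10)-(12)."""
    r2_f, r2_n, r2_u = r_squared(o, f), r_squared(o, nino), r_squared(o, f, nino)
    return np.array([r2_u - r2_n, r2_f + r2_n - r2_u, r2_u - r2_f])


def bootstrap_pvalues(o, f, nino, n_boot=1000, seed=0):
    """One-tailed bootstrap p-values under the null of mutual independence."""
    rng = np.random.default_rng(seed)
    observed = socd_from_data(o, f, nino)
    k = len(o)
    exceed = np.zeros(3)
    for _ in range(n_boot):
        # Resampling each series separately (with replacement) breaks any pairing between them.
        ob = o[rng.integers(0, k, size=k)]
        fb = f[rng.integers(0, k, size=k)]
        nb = nino[rng.integers(0, k, size=k)]
        exceed += socd_from_data(ob, fb, nb) >= observed
    # Fraction of the reference distribution at or above the observed value.
    return exceed / n_boot


# Hypothetical grid cell: the forecast carries the ENSO signal plus a non-ENSO signal.
rng = np.random.default_rng(1)
k = 29
nino = rng.normal(size=k)
other = rng.normal(size=k)
f = nino + other
o = f + 0.3 * rng.normal(size=k)

p = bootstrap_pvalues(o, f, nino)
significant = p < 0.10   # level 0.10: observed value lies in the top 10% of the reference
print(p, significant)
```

In this synthetic case the forecast already contains the ENSO signal, so the overlap and the differing information in f should test significant, while the differing information in niño3.4 typically should not.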
The overlapping information and the two types of differing information each have two possible outcomes of the significance test, i.e., significant or non-significant. Therefore, a three-digit number is devised in Table 1 to represent the results of the significance tests.
The first digit indicates the significance of $R^2(o \sim f \setminus \text{niño3.4})$, the second digit the significance of $R^2(o \sim f \cap \text{niño3.4})$ and the third digit the significance of $R^2(o \sim \text{niño3.4} \setminus f)$. As shown in Table 1, there are in total 8 (2 × 2 × 2) patterns, with 1 representing the significant case and 0 the non-significant case. The meanings of the eight patterns are illustrated in the last column of Table 1.
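Encoding the three test outcomes as a string keeps the digit order explicit. A minimal sketch, assuming the digit order used for Table 1 (differing information in GCM forecasts, overlapping information, differing information in Niño3.4); the short descriptions are paraphrases written for this sketch, not the wording of Table 1:

```python
from itertools import product

NAMES = (
    "differing info in GCM forecasts from Niño3.4",
    "overlapping info",
    "differing info in Niño3.4 from GCM forecasts",
)


def pattern_code(sig_diff_f: bool, sig_overlap: bool, sig_diff_n: bool) -> str:
    """Three-digit pattern: 1 = significant, 0 = non-significant."""
    return "".join("1" if s else "0" for s in (sig_diff_f, sig_overlap, sig_diff_n))


def describe(code: str) -> str:
    """List which of the three types of information are significant in a pattern."""
    sig = [name for name, digit in zip(NAMES, code) if digit == "1"]
    return ", ".join(sig) if sig else "none significant"


# Enumerate all 2 x 2 x 2 combinations of test outcomes.
for flags in product((True, False), repeat=3):
    code = pattern_code(*flags)
    print(code, "->", describe(code))
```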

Spatial plots of correlation skill and ENSO teleconnection
GCM forecast correlation skill and ENSO teleconnection for DJF are shown on the left-hand side of Figure 2. The correlation skill is mathematically Pearson's correlation coefficient between the GCM forecast ensemble mean and observed precipitation.
In the upper left part of Figure 2, the correlation skill is higher than 0.3 in a substantial number of grid cells around the world. This result indicates that the ensemble mean is generally indicative of observed precipitation, i.e., high values of the ensemble mean coincide with high values of observed precipitation and vice versa (Saha et al., 2014;Yuan et al., 2014;Cash et al., 2019). The lower left part shows ENSO teleconnection, which mathematically represents Pearson's correlation coefficient between the Niño3.4 index and observed precipitation. Both positive and negative ENSO teleconnections are observed.
For example, the teleconnection tends to be positive in southern North America, south-eastern South America, southern China and eastern Africa, implying above-average precipitation in El Niño years and below-average precipitation in La Niña years; it is negative in the northern part of South America, southern Africa and Southeast Asia, i.e., there can be below-average precipitation in El Niño years and above-average precipitation in La Niña years (Mason and Goddard, 2001;Emerton et al., 2017;Yang et al., 2018).
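Correlation skill and ENSO teleconnection are both Pearson correlations, and each corresponds to one of the single-predictor regressions in the SOCD method. For ordinary least squares with an intercept and one predictor, the coefficient of determination equals the squared Pearson correlation (a standard identity, stated here as a supporting step rather than taken from the paper):

```latex
R^2(o \sim f) = r(o, f)^2, \qquad
R^2(o \sim \text{niño3.4}) = r(o, \text{niño3.4})^2
```

Squaring removes the sign, so a correlation skill of 0.3 corresponds to R^2 = 0.09, and a teleconnection of -0.5 carries the same R^2 = 0.25 as one of +0.5. This is why grid cells with positive correlation skill and negative ENSO teleconnection can still share information in the SOCD framework.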
The SOCD method yields eight patterns in total to characterize the overlapping and differing information for the GCM forecast ensemble mean and the Niño3.4 index. While the Venn diagram in Figure 1 is largely conceptual, the right-hand side of Figure 2 showcases Venn diagrams generated from real-world data. The eight patterns in Table 1

Figure 3: Spatial distribution of the eight patterns of overlapping and differing information.
The spatial distribution of the eight patterns, obtained by applying the SOCD method to all land grid cells, is shown in Figure 3. Grid cells under the pattern 000, which indicates poor GCM correlation skill and limited ENSO teleconnection, are in grey. It is noted that a considerable number of grid cells around the world are colored; that is, at least one of the overlapping information and the two types of differing information is significant there. From the left-hand side of Figure 2, positive correlation skill corresponds to positive ENSO teleconnection in southern North America and eastern Africa, and positive correlation skill corresponds to negative teleconnection over the northern part of South America, southern Africa and Southeast Asia. Meanwhile, Figure 3 shows that in these regions a considerable number of grid cells fall under the patterns 010, 110 and 011, indicating significant overlapping information.

Patterns of overlapping and differing information
The eight patterns serve as a link between correlation skill and ENSO teleconnection. The pattern 010, which concentrates on the overlapping information, is shown in Figure 4. It is also significant in southern Africa and northern South America, where positive correlation skill and negative ENSO teleconnection coexist. As both correlation skill and ENSO teleconnection are mathematically Pearson's correlation coefficients, each can be classified into three cases, i.e., significantly positive (P), non-significant (ns) and significantly negative (N) (Kirtman et al., 2014;Emerton et al., 2017;Huang and Zhao, 2022). On the right-hand side of Figure 4, the Sankey diagram shows that 18.95% of the global land grid cells exhibit the pattern 010. For this pattern, 8.98% of grid cells exhibit significantly positive correlation skill, 9.85% non-significant correlation skill and 0.12% significantly negative correlation skill; 3.77% exhibit significantly positive ENSO teleconnection, 10.92% non-significant ENSO teleconnection and 4.25% significantly negative ENSO teleconnection.
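The text does not state which test classifies a correlation as P, ns or N. As one possibility, here is a minimal sketch that uses a two-sided permutation test in NumPy; the test choice, the 0.10 level and the function name are assumptions for illustration:

```python
import numpy as np


def classify_correlation(x, y, alpha=0.10, n_perm=1000, seed=0):
    """Label a Pearson correlation as 'P', 'ns' or 'N' using a two-sided permutation test."""
    rng = np.random.default_rng(seed)
    r = np.corrcoef(x, y)[0, 1]
    # Reference distribution: correlations after shuffling y, which destroys any pairing.
    null = np.array([np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perm)])
    p = np.mean(np.abs(null) >= abs(r))
    if p >= alpha:
        return "ns"
    return "P" if r > 0 else "N"


rng = np.random.default_rng(3)
x = rng.normal(size=29)
noise = rng.normal(size=29)
print(classify_correlation(x, 2 * x + 0.5 * noise))    # P
print(classify_correlation(x, -2 * x + 0.5 * noise))   # N
```

A parametric t-test on r would be an equally reasonable choice; the permutation version simply mirrors the resampling logic of the SOCD significance test.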

The pattern 100 focuses on the significant differing information of global precipitation in GCM forecasts from Niño3.4 index.
From the left-hand side of Figure 5, it can be observed that this pattern (middle left part) tends to cover grid cells where correlation skill is around or above 0.3 (upper left part) but ENSO teleconnection is nearly zero (lower left part). This observation is confirmed by the right-hand side of Figure 5: of the 17.71% of grid cells falling into the pattern 100, most exhibit significantly positive correlation skill (15.78% in 17.71%), while all of them exhibit non-significant ENSO teleconnection (17.71% in 17.71%). These grid cells tend to be located in Europe and North Asia, where the influence of ENSO is limited and skilful GCM forecasts can relate to other teleconnections such as the Arctic Oscillation and the North Atlantic Oscillation (Hamouda et al., 2021).
The pattern 110 indicates that the overlapping information is significant and that the differing information in GCM forecasts from the Niño3.4 index is also significant. The implication is that, for global seasonal precipitation in DJF, GCM forecasts not only contain the information carried by the Niño3.4 index but also provide a considerable amount of new information.
On the left-hand side of Figure 6, some grid cells under the pattern 110 are observed in southeast Australia, eastern Africa and northeastern Asia. A comparison of Figure 6 with Figure 4 shows that some grid cells in southern North America, northern South America and southern Africa are under the pattern 110, although many of them tend to be under the pattern 010. Around the world, 11.35% of grid cells fall into the pattern 110. For these grid cells, correlation skill is predominantly significantly positive (11.25% in 11.35%), whereas ENSO teleconnection tends to be non-significant (7.09% in 11.35%).

The pattern 001 focuses on the differing information in the Niño3.4 index from GCM forecasts. As shown in Figure 7, this pattern covers 4.87% of grid cells around the world. On the left-hand side of Figure 7, it is worth noting that a number of grid cells in Western Australia exhibit significantly negative ENSO teleconnection but non-significant correlation skill. The implication is that GCM forecasts there might fail to account for the information of ENSO teleconnection. On the right-hand side of Figure 7, almost all grid cells under the pattern 001 exhibit non-significant correlation skill (4.86% in 4.87%), while their ENSO teleconnection can be significantly negative (2.21% in 4.87%) or significantly positive (1.73% in 4.87%).
The pattern 011 indicates that both the overlapping information and the differing information in the Niño3.4 index from GCM forecasts are significant. Grid cells exhibiting this pattern tend to be scattered across parts of southern North America, northern South America, Southeast Asia and southern Africa. They account for 4.38% of grid cells around the world; among them, 1.72% exhibit significantly positive ENSO teleconnection and 2.66% significantly negative ENSO teleconnection. For these areas, the significant overlap suggests that a substantial part of the variance of seasonal precipitation can be explained by both GCM forecasts and Niño3.4, while the significant differing information indicates the part that can only be explained by the Niño3.4 index.
The pattern 101 is shown in Figure 9. It suggests that at some grid cells the overlapping information is not significant but both types of differing information, i.e., in GCM forecasts from the Niño3.4 index and in the Niño3.4 index from GCM forecasts, are significant. About 1.86% of grid cells fall into this pattern.
The pattern 111 is shown in Figure 10. It implies that at some other grid cells the overlapping information and both types of differing information are all significant. Only 0.26% of grid cells around the world exhibit the pattern 111.

Among the eight patterns, the pattern 000 covers the most grid cells. The left-hand side of Figure 11 shows that grid cells under the pattern 000 generally exhibit non-significant correlation skill and non-significant ENSO teleconnection. This result is in sharp contrast to the pattern 010, which indicates reasonable correspondence between correlation skill and ENSO teleconnection (Figure 4), and to the patterns 100 and 110, which suggest significantly positive correlation skill (Figures 5 and 6). Overall, 40.62% of grid cells fall under the pattern 000. These grid cells predominantly exhibit non-significant correlation skill (40.30% in 40.62%) and non-significant ENSO teleconnection (40.47% in 40.62%).

Association of correlation skill with ENSO teleconnection
The results under the eight patterns are further pooled in the analysis. From Figure 12, it can be observed that the eight patterns serve as an effective link between correlation skill and ENSO teleconnection at the global scale. For the patterns that indicate significant information, the Sankey diagram on the right-hand side shows that the percentages, from highest to lowest, are 18.95% for the pattern 010, 17.71% for the pattern 100, 11.35% for the pattern 110, 4.87% for the pattern 001, 4.38% for the pattern 011, 1.86% for the pattern 101 and 0.26% for the pattern 111. More than half of the grid cells that exhibit significant correlation skill have significant overlapping information with Niño3.4, with 11.25% (8.98%) of grid cells under the pattern 110 (010) exhibiting significantly positive correlation skill, indicating considerable impacts of ENSO teleconnection on CFSv2 correlation skill.
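Because each digit of a pattern code marks one type of information, the three headline percentages in the abstract follow directly from the eight pattern percentages reported in the text: a type of information is significant wherever its digit is 1. A short check using only those reported percentages (the dictionary and function names are illustrative):

```python
# Pattern percentages of global land grid cells for CFSv2 DJF, as reported in the text.
PATTERN_PERCENT = {
    "010": 18.95, "100": 17.71, "110": 11.35, "001": 4.87,
    "011": 4.38, "101": 1.86, "111": 0.26, "000": 40.62,
}


def share_significant(digit: int) -> float:
    """Percentage of grid cells whose pattern has '1' at the given digit position."""
    return round(sum(p for code, p in PATTERN_PERCENT.items() if code[digit] == "1"), 2)


print(share_significant(0))  # 31.18: differing information in GCM forecasts from Niño3.4
print(share_significant(1))  # 34.94: overlapping information
print(share_significant(2))  # 11.37: differing information in Niño3.4 from GCM forecasts
```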

Figure 12: Illustrations of correlation skill (upper left part) and ENSO teleconnection (lower left part) under the eight patterns (middle left part) at the global scale and Sankey diagram showing the percentages of grid cells exhibiting significantly positive (P), non-significant (ns) and significantly negative (N) correlation skill/ENSO teleconnection (right part)
GCM forecasts and the Niño3.4 index generally represent two different sources of information on global precipitation. In Figure 13, GCM forecast correlation skill is plotted against ENSO teleconnection in scatter plots. Figure 13a pools global land grid cells and uses a Viridis heatmap to indicate point density. It can be observed that the correlation skill is largely positive, falling above the horizontal line. In addition, the heatmap suggests that the correlation skill tends to increase as ENSO teleconnection becomes more strongly positive and also as it becomes more strongly negative. These results suggest that the skill of GCM forecasts benefits from prominent ENSO teleconnection, since GCMs tend to capture the influences of ENSO on the variability of global precipitation (Saha et al., 2014;Khan et al., 2017;Delworth et al., 2020;Johnson et al., 2019b;Becker et al., 2022).
The other eight subplots of Figure 13 (Figures 13b-i) are arranged in descending order of the percentage of grid cells. Overall, a close but divergent association of correlation skill with ENSO teleconnection can be observed:
1) There exists significant overlapping information in GCM forecasts and Niño3.4 index under the patterns 010 (Figure 13c), 110 (Figure 13e), 011 (Figure 13g) and 111 (Figure 13i). The significance is for 34.94% of global land grid cells, i.e., 18.95% (010) + 11.35% (110) + 4.38% (011) + 0.26% (111).
2) There is significant differing information in GCM forecasts from Niño3.4 index under the patterns 100 (Figure 13d), 110 (Figure 13e), 101 (Figure 13h) and 111 (Figure 13i). The significance is for 31.18% of global land grid cells, i.e., 17.71% (100) + 11.35% (110) + 1.86% (101) + 0.26% (111). Under these patterns, the correlation skill tends to be higher than ENSO teleconnection. In particular, significantly positive correlation skill coincides with overall non-significant ENSO teleconnection under the pattern 100 in Figure 13d. Overall, these results imply that, apart from ENSO, GCMs account for other hydroclimatic teleconnections to produce skilful precipitation forecasts (Saha et al., 2014;Delworth et al., 2020;Lin et al., 2020).
3) There is significant differing information in Niño3.4 index from GCM forecasts under the patterns 001 (Figure 13f), 011 (Figure 13g), 101 (Figure 13h) and 111 (Figure 13i). The significance is for 11.37% of global land grid cells, i.e., 4.87% (001) + 4.38% (011) + 1.86% (101) + 0.26% (111). Under these patterns, ENSO teleconnection is generally higher than correlation skill. Remarkable ENSO teleconnection coincides with overall non-significant correlation skill under the pattern 001 in Figure 13f. These results suggest that some ENSO teleconnection is yet to be exploited by GCMs to improve precipitation forecast skill.
4) Neither the overlapping information nor the two types of differing information are significant under the pattern 000, which covers 40.62% of grid cells. From Figure 13b, it can be observed that both correlation skill and ENSO teleconnection are limited and that the corresponding scatter points tend to cluster around the origin. This result suggests that, where ENSO teleconnection is limited, GCM forecasts still have plenty of room for improvement.

Discussion
The SOCD method is further applied to investigate the eight patterns considering the effects of seasonality, lead time, lag time and significance level. The additional results are presented in the supplementary material.
1) The effect of seasonality is shown in Figures S1 to S6. It can be observed that regions exhibiting significant ENSO teleconnection vary by season (Figures S1 to S3) and that the eight patterns remain effective in characterizing the overlapping and differing information (Figures S4 to S6).
2) The effect of lead time is illustrated in Figures S7 to S10. At lead times of 1 and 2 months, the percentage of the pattern 010 remains the highest among the seven patterns other than 000. This result highlights the existence of significant overlapping information in DJF, particularly over southern North America, northern South America and southern Africa.
3) The effect of the lag time of the Niño3.4 index is illustrated in Figures S11 to S14. Compared with the concurrent teleconnection, the spatial distribution of the eight patterns tends to be similar for the monthly Niño3.4 index at lag times of 1 and 2 months, with a slight increase in the percentage of the pattern 000. This result reflects the temporal persistence of the Niño3.4 index (Yang et al., 2018).
4) The effect of the significance level is shown in Figures S15 to S18. As the significance level is reduced from 0.10 to 0.05 and further to 0.01, the percentage of the pattern 000 evidently increases, but the seven patterns that highlight significant overlapping and differing information remain.
The SOCD method is also extended to evaluate the overlapping and differing information under other GCM forecasts and hydroclimatic teleconnections. In the supplementary material, Figures S19 and S20 show the results for the CanCM4 forecasts generated at the Canadian Meteorological Center (CMC) (Merryfield et al., 2013).
The CanCM4 forecasts appear to be less skilful in Europe but more skilful in the western part of Australia. Overall, the percentage of the pattern 000 is slightly higher than that for CFSv2 forecasts. These results suggest that different GCM forecasts can be complementary to each other in different regions and that they can be combined to generate more skilful forecasts (Kirtman et al., 2014;Slater et al., 2019;Schepen et al., 2020). Figures S21 and S22 present the eight patterns for the Indian Ocean Dipole (IOD) (Cai et al., 2021). It can be observed that the percentage of the pattern 010 is reduced from 18.95% to 9.41%, while the percentage of the pattern 100 is increased from 17.71% to 22.83%. The indications are that CFSv2 forecasts share less overlapping information with IOD than with ENSO and that there exists considerable differing information in CFSv2 forecasts from IOD teleconnection.
The correlation skill is one of the most popular measures of forecast skill owing to its simplicity of calculation and robustness to zero and missing values (Barnston et al., 2012;Yuan et al., 2014;Ma et al., 2016;Slater et al., 2019;Huang and Zhao, 2022). From spatial plots of correlation skill at regional or global scales, it can be observed where GCM forecasts are skilful and where they are not satisfactory (Slater et al., 2019;Delworth et al., 2020). Previously, it was observed that GCM forecasts tend to be skilful in regions subject to prominent influences of ENSO; accordingly, forecast skill is attributed to the effectiveness of GCMs in capturing ENSO-related climate dynamics (Kirtman et al., 2014;Slater et al., 2019;Lin et al., 2020).
In this paper, the developed SOCD method not only confirms the significant overlapping information but also highlights that there exists significant differing information in GCM forecasts from ENSO teleconnection for 31.18% of global land grid cells and significant differing information in ENSO teleconnection from GCM forecasts for 11.37% of grid cells. It is noted that the linear regression models used here only account for linear relationships. Possible nonlinear relationships between forecasts and observations call for nonlinear models in future analyses of the overlapping and differing information (Strazzo et al., 2019;Schepen et al., 2020;Li et al., 2021).

Conclusions
While ENSO teleconnection has conventionally been used in hydroclimatic forecasting of regional precipitation and streamflow, GCM forecasts are increasingly available for hydrological applications. It is important to investigate to what extent emerging GCM forecasts provide "new" information compared to conventional ENSO teleconnection. The SOCD method developed in this paper addresses this issue through the mathematical formulation of set operations. Specifically, the union operation quantifies the information of global seasonal precipitation contained in GCM forecasts and the Niño3.4 index combined; the intersection operation derives the overlapping information of global precipitation in GCM forecasts and the Niño3.4 index; and the difference operation further identifies two types of differing information, i.e., the differing information in GCM forecasts from the Niño3.4 index and the differing information in the Niño3.4 index from GCM forecasts. The significance tests of the three types of information yield eight patterns in total, which disentangle the close but divergent association of GCM forecast correlation skill with ENSO teleconnection. GCM forecasts and the Niño3.4 index generally provide two different sources of data for hydroclimatic forecasting. While the existence of significant overlapping information suggests that they can provide some similar information, the existence of significant differing information indicates that the two data sources can also complement each other. In the future, more effort can be devoted to investigating more GCM forecast datasets and more hydroclimatic teleconnections to yield insights into the skill of GCM forecasts and to facilitate their application to hydrological modelling and water resources management.