the Creative Commons Attribution 4.0 License.
A decomposition approach to evaluating the local performance of global streamflow reanalysis
Tongtiegang Zhao
Zexin Chen
Yu Tian
Bingyao Zhang
Yu Li
Xiaohong Chen
While global streamflow reanalysis has been evaluated at different spatial scales to facilitate practical applications, its local performance in the time–frequency domain is yet to be investigated. This paper presents a novel decomposition approach to evaluating streamflow reanalysis by combining wavelet transform with machine learning. Specifically, the time series of streamflow reanalysis and observation are respectively decomposed and then the approximation components of reanalysis are evaluated against those of observed streamflow. Furthermore, the accumulated local effects are derived to showcase the influences of catchment attributes on the performance of streamflow reanalysis at different scales. For streamflow reanalysis generated by the Global Flood Awareness System, a case study is devised based on streamflow observations from the Catchment Attributes and Meteorology for Large-sample Studies. The results highlight that the reanalysis tends to be more effective in characterizing seasonal, annual and multiannual features than daily, weekly and monthly features. The Kling–Gupta efficiency (KGE) values of original time series and approximation components are primarily influenced by precipitation seasonality. High values of KGE tend to be observed in catchments where there is more precipitation in winter, which can be due to low evaporation that results in reasonable simulations of soil moisture and baseflow processes. The longitude, mean precipitation and mean slope also influence the local performance of approximation components. On the other hand, attributes on geology, soils and vegetation appear to play a relatively small part in the performance of approximation components. Overall, this paper provides useful information for practical applications of global streamflow reanalysis.
1 Introduction
Global streamflow reanalysis provides valuable information for water resources management (Beck et al., 2017; Harrigan et al., 2020; Pokhrel et al., 2021). Streamflow reanalysis is generated by using climate reanalysis to drive global hydrological models (GHMs; Alfieri et al., 2020; Hersbach et al., 2020; Muñoz-Sabater et al., 2021), and multiple datasets exist, e.g., the Global Flood Awareness System (GloFAS) forced by the European Centre for Medium-Range Weather Forecasts (ECMWF)'s latest global atmospheric reanalysis (GloFAS-ERA5; Harrigan et al., 2020), the Global Reach-Level A Priori Discharge Estimates for SWOT (GRADES; Lin et al., 2019) and the Global Reach-Level Flood Reanalysis (GRFR; Yang et al., 2021). In practice, streamflow reanalysis can bridge the data gaps for ungauged and poorly gauged catchments and provides estimates on a large spatial scale and with sufficient temporal resolution (Lin et al., 2019; Harrigan et al., 2020; Yang et al., 2021). For example, the recent GloFAS-ERA5 provides streamflow information at the daily time step and with a spatial resolution of 0.1° across the globe (Harrigan et al., 2020).
The local performance plays a critical part in practical applications of global streamflow reanalysis (Veldkamp et al., 2018; Munia et al., 2020; Feng et al., 2021). By evaluating global reanalysis against observed streamflow, diagnostic plots and verification metrics are generated to showcase its local performance (Xie et al., 2019; Gao et al., 2020; Cantoni et al., 2022; Huang et al., 2022; Zhao et al., 2022a; Han et al., 2023; Liu et al., 2023). In the meantime, hydrological signatures derived from reanalysis are compared to those obtained from observed streamflow to facilitate insights into the effectiveness of hydrological models (Beck et al., 2017; Chen et al., 2022; Zhao et al., 2022b). For example, the performances of 10 Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) models are evaluated for low, mean and high flows using five streamflow percentile series (Chen et al., 2021). Where observation data are limited, streamflow reanalysis can serve as reference data to calibrate hydrological models, and then the model outputs can be compared to observations to assess whether they are suitable for practical applications (Senent-Aparicio et al., 2021).
Time series analysis is one of the most important approaches to investigating the performance of hydrological models (Saraiva et al., 2021; Manikanta and Vema, 2022; Guo et al., 2022). From the perspective of time series, hydrological simulations are a combination of the components of periodic motion, trend, seasonality and error, which can be extracted by using decomposition approaches (Abebe et al., 2022; Manikanta and Vema, 2022; Xu et al., 2022). As one of the most important decomposition approaches, wavelet transform decomposes streamflow into time series of wavelet coefficients under certain frequencies (Manikanta and Vema, 2022). Unlike other decomposition approaches, it therefore allows for multiresolution analysis (Montoya et al., 2022). Owing to the time–frequency characterization, wavelet-based features of reanalysis and observed streamflow can be compared in order to zoom into detailed information for multiple time series segments (Manikanta and Vema, 2022). If there are errors in the reanalysis at specific timescales or during specific periods, the sources of these errors can be identified by the technique of time–frequency characterization (Lane, 2007).
While global streamflow reanalysis has been evaluated at different spatial scales (Harrigan et al., 2020; Chen et al., 2021; Senent-Aparicio et al., 2021), the time series characteristics of streamflow reanalysis in the time–frequency domain are yet to be investigated. Meanwhile, it is difficult to interpret the local performance of global streamflow reanalysis across different locations (Sichangi et al., 2016; Ghiggi et al., 2019; Tu et al., 2024), let alone the additional interpretation of the local performance at different timescales. This paper aims to bridge the gap by presenting a novel evaluation of global streamflow reanalysis by combining the discrete wavelet transform (DWT) with machine learning techniques. That is, the DWT is employed to exploit streamflow reanalysis in the time–frequency domain; then the accumulated local effects (ALEs) are derived by the random forest model to showcase the performance of original time series of reanalysis and its decomposed components at different scales. As will be demonstrated in the Methods and Results sections, streamflow reanalysis does exhibit different local performances at different timescales, and the influences of catchment attributes are illustrated.
2 Methods
2.1 Overview of the decomposition approach
A novel decomposition approach that combines the wavelet transform with machine learning techniques is proposed to evaluate global streamflow reanalysis in the time–frequency domain. There are three steps.

1. Decomposition of time series. The DWT is used to decompose the reanalysis and observed streamflow time series, resulting in “approximation” and “detail” components at different scales.
2. Verification of decomposed series. The Kling–Gupta efficiency (KGE), correlation, bias ratio and variability ratio are derived to indicate the local performance of original time series, approximation and detail components at various scales. In the meantime, the density-based spatial clustering of applications with noise (DBSCAN) algorithm is used to remove outliers from the verification metrics.
3. Influences of catchment attributes. The ALEs derived from the random forest model are employed to elaborate on the influences of catchment attributes and then identify the driving factors.
2.2 Decomposition of time series
Both reanalysis and observed streamflow time series are decomposed into approximation and detail components using the DWT (Chalise et al., 2023). It is executed by controlling the scaling and shifting factors associated with a mother wavelet (Nalley et al., 2012). Following Wei et al. (2012), the Daubechies wavelet of order 5 is used to decompose the streamflow time series (Talukder et al., 2020):
in which q(t) is the time series to be decomposed, m and n are integers that respectively represent the amount of dilation and translation of the wavelet, t represents the discrete time, and ψ represents the wavelet basis function (Nalley et al., 2012):
The DWT decomposes a signal into approximation (low-frequency) and detail (high-frequency) coefficients, thereby separating its frequency components based on magnitude (Quilty and Adamowski, 2021). In the initial decomposition, which utilizes high-pass and low-pass filters and the inverse DWT, the original signal is decomposed into the detail component (D_{1}) and the approximation component (A_{1}). Subsequently, A_{1} is further decomposed into D_{2} and A_{2}, and so on for successive levels. At each level, this process applies high-pass and low-pass filters followed by a downsampling operator:
Therefore, the streamflow time series is decomposed into the approximation coefficients and detail coefficients (Talukder et al., 2020):
in which cA_{l}[t] is the coefficient of approximation, cD_{l}[t] is the coefficient of detail, the subscript l represents the decomposition level, L is the low-pass filter and H is the high-pass filter. The inverse DWT is used to obtain the approximation components and detail components (Guo et al., 2022):
in which IDWT is the inverse DWT, A_{l} is the approximation component and D_{l} is the detail component at level l.
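The stepwise splitting into A_{l} and D_{l} can be illustrated with a short, dependency-free sketch. As a simplifying assumption, the Haar wavelet is swapped in for the order-5 Daubechies wavelet used in this paper, because under Haar the approximation component at level l reduces to the block mean over 2^{l} samples. The function name `haar_components` and the sample values are hypothetical, and the series length is assumed to be a multiple of 2^{l_m}.

```python
def haar_components(series, max_level):
    """Return ([A_1..A_L], [D_1..D_L]) of a Haar multiresolution analysis.

    Illustration only: Haar replaces the db5 wavelet, and len(series)
    must be a multiple of 2**max_level.
    """
    n = len(series)
    if n % (2 ** max_level) != 0:
        raise ValueError("series length must be a multiple of 2**max_level")
    approximations, details = [], []
    previous = list(series)  # A_0 is the original series
    for level in range(1, max_level + 1):
        width = 2 ** level
        current = []
        for start in range(0, n, width):
            block = series[start:start + width]
            # Haar approximation: the block mean, repeated over the block
            current.extend([sum(block) / width] * width)
        # D_l is the information lost when moving from A_{l-1} to A_l
        details.append([p - c for p, c in zip(previous, current)])
        approximations.append(current)
        previous = current
    return approximations, details


flow = [3.0, 5.0, 4.0, 8.0, 10.0, 6.0, 7.0, 5.0]
A, D = haar_components(flow, max_level=3)
# The coarsest approximation plus all detail components recovers the series.
rebuilt = [A[-1][t] + sum(d[t] for d in D) for t in range(len(flow))]
```

Each A_{l} is smoother than A_{l-1}, which mirrors the filtering of short-term fluctuations at higher decomposition levels. A wavelet library with db5 filters would be needed to reproduce the decomposition described here.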
For reanalysis and observed streamflow time series, the decomposition is denoted as
in which d_{t} is the reanalysis, q_{t} is the observed streamflow and l_{m} is the maximum decomposition level. The subscripts d and q respectively represent reanalysis and observed streamflow.
The DWT captures time series information at multiple scales in the time–frequency domain, with each scale corresponding to a specific period (Joo and Kim, 2015; Manikanta and Vema, 2022). Specifically, the approximation and detail components at the decomposition level l correspond to the timescale of 2^{l} d (Nalley et al., 2012).
2.3 Verification of decomposed series
The KGE stands out as a widely used verification metric to evaluate the model performance (Frame et al., 2021; Huang and Zhao, 2022; Zhao et al., 2022b). It indicates the performance of original time series and approximation and detail components. When evaluating the performance of original time series, the KGE is calculated as follows:
$\mathrm{KGE}_{o} = 1 - \sqrt{(r_{o} - 1)^{2} + (\beta_{o} - 1)^{2} + (\gamma_{o} - 1)^{2}}$
As can be seen, the KGE_{o} comprises three components, namely, the Pearson correlation coefficient r_{o}, the bias ratio β_{o} and the variability ratio γ_{o}:
in which μ is the mean streamflow and σ is the streamflow standard deviation. The subscripts d and q respectively represent reanalysis and observed streamflow. The KGE ranges from −∞ to 1, with a perfect value of 1.
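A minimal sketch of the KGE and its three components is given below for a pair of reanalysis and observed series. The bias ratio is taken as μ_{d}/μ_{q}. The variability ratio is assumed here to be the ratio of coefficients of variation, (σ_{d}/μ_{d})/(σ_{q}/μ_{q}); if the ratio of standard deviations σ_{d}/σ_{q} is intended instead, only the marked line changes. The function name `kge_components` and the sample values are hypothetical.

```python
import math


def kge_components(sim, obs):
    """Return (KGE, r, beta, gamma) for simulated and observed series."""
    n = len(obs)
    mu_d, mu_q = sum(sim) / n, sum(obs) / n
    sd_d = math.sqrt(sum((x - mu_d) ** 2 for x in sim) / n)
    sd_q = math.sqrt(sum((x - mu_q) ** 2 for x in obs) / n)
    cov = sum((a - mu_d) * (b - mu_q) for a, b in zip(sim, obs)) / n
    r = cov / (sd_d * sd_q)                   # Pearson correlation
    beta = mu_d / mu_q                        # bias ratio
    gamma = (sd_d / mu_d) / (sd_q / mu_q)     # assumption: ratio of CVs
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return kge, r, beta, gamma


observed = [1.0, 2.0, 3.0, 4.0]
doubled = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated, but biased by a factor of 2
kge, r, beta, gamma = kge_components(doubled, observed)
```

In this toy case the correlation and variability ratio are ideal, so the whole departure from the perfect value of 1 comes from the bias ratio.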
To investigate the relationship between reanalysis and observations, it is necessary to extract the corresponding grid cell for each hydrometric station. The grid cell in which the hydrometric station is located may not overlap with the simulated river network in streamflow reanalysis due to the inaccuracy of the routing module in a distributed hydrological model (Chen et al., 2021). There are three steps to identify the target cell: firstly, the initial cell is located according to the latitude and longitude of the hydrometric station; secondly, the KGE between reanalysis and observed streamflow is calculated for the initial cell and its eight surrounding cells; and finally, the cell with the largest KGE is used as the target cell (Zhao et al., 2022b).
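The three-step search for the target cell can be sketched as follows. The grid indices, the `scores` lookup and the function name `select_target_cell` are hypothetical; in practice the score of each cell would be the KGE between its reanalysis series and the station observations, and cells without data are assumed to return `None`.

```python
def select_target_cell(row, col, kge_of_cell):
    """Pick the cell with the largest KGE among the initial cell and its
    eight neighbors. `kge_of_cell` maps a (row, col) tuple to a KGE value,
    or to None when the cell has no reanalysis series."""
    best_cell, best_kge = None, float("-inf")
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            cell = (row + d_row, col + d_col)
            score = kge_of_cell(cell)
            if score is not None and score > best_kge:
                best_cell, best_kge = cell, score
    return best_cell, best_kge


# Hypothetical KGE values around a station located in cell (10, 20)
scores = {(10, 20): 0.12, (10, 21): 0.55, (9, 20): -0.30}
target, best = select_target_cell(10, 20, scores.get)
```

Here the neighboring cell (10, 21) is selected, which corresponds to the situation where the station's own cell does not lie on the simulated river network.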
Hydrometric stations with outliers in terms of the KGE, correlation, bias ratio and variability ratio are excluded from the investigation, as outliers can deteriorate the performance of machine learning techniques (Lee and Kam, 2023). The DBSCAN, which is used to remove the outliers of KGE and its three components, offers a distinctive advantage in detecting outliers by defining clusters as dense regions separated by sparser areas (Smiti, 2020). This characteristic makes the algorithm effective in distinguishing outliers from the main clusters (Li et al., 2022). There are two key parameters in the DBSCAN, namely, the maximum cluster radius (ε) and the minimum number of points (MinPts; Smiti, 2020). Points within a distance ε of one another are considered part of a dense region, while those with fewer than MinPts neighbors are treated as outliers (Li et al., 2022). Following the study conducted by Brinkerhoff et al. (2020), the “elbow”-based approach is used to determine ε, and MinPts is set to 5. By setting these parameters, the DBSCAN effectively identifies and isolates outliers, preserving the integrity of the main cluster structures (Hauswirth et al., 2021).
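The outlier rule can be sketched without a clustering library. Under the standard DBSCAN definition, a point is noise if it is neither a core point (at least MinPts points within ε, counting itself, which is an assumed convention) nor within ε of a core point. The value of ε is taken as given rather than derived by the elbow approach, and the function name `dbscan_noise_mask` and the sample metric vectors are hypothetical.

```python
import math


def dbscan_noise_mask(points, eps, min_pts=5):
    """Return True for each point that DBSCAN would label as noise."""
    n = len(points)
    # eps-neighborhood of every point (each neighborhood includes the point itself)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    # Noise: not a core point and not reachable from any core point
    return [
        not is_core[i] and not any(is_core[j] for j in neighbors[i])
        for i in range(n)
    ]


# Hypothetical (KGE, correlation, bias ratio, variability ratio) per station
metrics = [
    (0.50, 0.70, 1.00, 0.90),
    (0.52, 0.71, 1.02, 0.91),
    (0.48, 0.69, 0.98, 0.88),
    (0.55, 0.72, 1.05, 0.93),
    (0.47, 0.68, 0.97, 0.89),
    (0.51, 0.73, 1.01, 0.92),
    (-6.00, 0.10, 4.00, 3.00),  # isolated station
]
mask = dbscan_noise_mask(metrics, eps=0.3, min_pts=5)
kept = [m for m, is_noise in zip(metrics, mask) if not is_noise]
```

The pairwise distance computation is quadratic in the number of stations, which is acceptable for a few hundred catchments.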
2.4 Influences of catchment attributes
The ALEs are derived by the random forest model to showcase the influences of catchment attributes on the performance of original time series and its approximation components at different scales. The random forest model is employed to establish a predictive relationship between the performance and multiple catchment attributes. This model is well suited to capture complex relationships within the dataset through its ensemble of decision trees, which renders it an effective tool for performance prediction (Wei et al., 2023). To implement the model, the data are split into training and testing sets at a ratio of 75:25 (Naghibi et al., 2017). That is, 75 % of catchments are randomly allocated for training and the remaining 25 % for testing. The random forest model is trained on the training set, with the hyperparameters tuned to optimize its prediction accuracy (Wei et al., 2023). Afterwards, the model is validated on the testing set, and the coefficient of determination (R^{2}) is calculated to evaluate its prediction accuracy based on catchment attributes.
Taking the KGE of original time series as an example, the prediction of the performance of reanalysis using the random forest model is denoted as
$\mathrm{KGE}_{p} = \mathrm{RF}(X)$
in which KGE_{p} is the predicted KGE using the random forest model, RF(⋅) is the random forest model and X is the catchment attributes. The R^{2} between the predicted KGE_{p} and the calculated KGE_{o} is denoted by
$R^{2} = 1 - \frac{\sum_{i} (\mathrm{KGE}_{o,i} - \mathrm{KGE}_{p,i})^{2}}{\sum_{i} (\mathrm{KGE}_{o,i} - \mu)^{2}}$
in which μ is the mean of the calculated KGE and the index i runs over the catchments. The KGE_{p} and KGE_{o} represent the predicted KGE of the random forest model and the calculated KGE between reanalysis and observed streamflow, respectively.
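The random split and the R^{2} score can be sketched as below. The random forest itself is not implemented; the predictions are assumed to come from any fitted model. The R^{2} is assumed to take the usual form of 1 minus the ratio of residual to total sum of squares, and the function names and the seed are hypothetical.

```python
import random


def split_catchments(ids, train_fraction=0.75, seed=0):
    """Randomly allocate catchments to training and testing sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def r_squared(calculated, predicted):
    """Coefficient of determination between calculated and predicted KGE."""
    mean = sum(calculated) / len(calculated)
    ss_res = sum((o - p) ** 2 for o, p in zip(calculated, predicted))
    ss_tot = sum((o - mean) ** 2 for o in calculated)
    return 1.0 - ss_res / ss_tot


train_ids, test_ids = split_catchments(range(554))
score = r_squared([0.2, 0.4, 0.6], [0.25, 0.4, 0.55])
```

A prediction that always returns the mean KGE gives an R^{2} of 0 under this definition, so positive values indicate that the catchment attributes carry predictive information.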
The ALEs are used to describe how catchment attributes influence the performance of approximation components at various scales for reanalysis based on the random forest model. They illustrate how changes in one input variable impact model predictions by analyzing the differences within small quantile-based intervals (Stein et al., 2021). An advantage of the ALEs is that they overcome the confounding effects of correlated catchment attributes (Stein et al., 2021). The ALE curves reveal whether the association is linear or exhibits more complex patterns (Teng et al., 2022). The uncentered ALE $\hat{\tilde{f}}_{j,\mathrm{ALE}}(x)$ is formulated as follows:
$\hat{\tilde{f}}_{j,\mathrm{ALE}}(x) = \sum_{k=1}^{k_{j}(x)} \frac{1}{n_{j}(k)} \sum_{i:\, x_{j}^{(i)} \in N_{j}(k)} \left[ f\left(z_{k,j}, x_{\setminus j}^{(i)}\right) - f\left(z_{k-1,j}, x_{\setminus j}^{(i)}\right) \right]$
in which x is the value of the catchment attribute j, and k runs over the quantiles up to k_{j}(x), the one that contains x. By dividing the range of x, n_{j}(k) is the number of values of x that fall in quantile N_{j}(k), z_{k-1,j} and z_{k,j} are the boundary values of x within that quantile, f is the output of the random forest model and $x_{\setminus j}^{(i)}$ denotes the values of all catchment attributes other than j for catchment i.
The ALE $\hat{f}_{j,\mathrm{ALE}}(x)$ is derived from the uncentered ALE values by subtracting its mean across all quantiles (Konapala et al., 2020):
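The computation of a first-order ALE curve can be sketched for any prediction function, here standing in for the fitted random forest. Within each quantile-based interval, the attribute of interest is moved to the upper and lower boundary while the other attributes are kept at their observed values, and the mean prediction differences are accumulated. Centering is assumed to use the mean of the uncentered ALE weighted by the number of catchments per interval. The function name `ale_curve` and the toy model are hypothetical.

```python
def ale_curve(predict, rows, j, n_bins=4):
    """Return (interval boundaries, centered ALE at each upper boundary)."""
    values = sorted(row[j] for row in rows)
    n = len(values)
    # Quantile-based boundaries z_0 < z_1 < ... taken from the observed values
    edges = sorted({values[min(n - 1, round(k * (n - 1) / n_bins))]
                    for k in range(n_bins + 1)})
    effects, counts = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Rows in (lo, hi]; the first interval also includes its lower boundary
        members = [r for r in rows
                   if (lo < r[j] <= hi) or (lo == edges[0] and r[j] == lo)]
        diffs = []
        for r in members:
            upper, lower = list(r), list(r)
            upper[j], lower[j] = hi, lo  # other attributes stay as observed
            diffs.append(predict(upper) - predict(lower))
        effects.append(sum(diffs) / len(diffs))
        counts.append(len(members))
    uncentered, total = [], 0.0
    for effect in effects:
        total += effect  # accumulate the local effects
        uncentered.append(total)
    mean = sum(u * c for u, c in zip(uncentered, counts)) / sum(counts)
    return edges, [u - mean for u in uncentered]


# Toy model: prediction is linear in attribute 0 with slope 2
rows = [(float(i), float((3 * i) % 5)) for i in range(9)]
edges, ale = ale_curve(lambda r: 2.0 * r[0] + r[1], rows, j=0, n_bins=4)
```

For the linear toy model, the ALE increases by 2 per unit of the attribute regardless of the second attribute, which illustrates how the curve isolates the effect of a single attribute.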
Furthermore, the local interpretable model-agnostic explanations (LIMEs) elucidate individual predictions made by a trained black-box machine learning model (Xiang et al., 2023). The LIMEs are used to identify, for each catchment, the catchment attribute that dominates the performance of approximation components at various scales.
A transformation is applied to the bias and variability ratios of original time series and its approximation components when investigating the influences of catchment attributes. The bias ratio and variability ratio are transformed as follows (Poncelet et al., 2017):
in which β^{*} represents the bias ratio after transformation, and γ^{*} is the variability ratio after transformation. This transformation is applied because increases in the values of bias and variability ratios do not necessarily indicate improved performance. After the transformation, both β^{*} and γ^{*} attain a maximum value of 1, which indicates the best performance. Notably, this transformation does not affect the ranking of performance among catchments.
3 Data
3.1 Streamflow reanalysis
The GloFAS-ERA5 streamflow reanalysis v2.1 provides valuable hydrological time series forced by the latest global atmospheric reanalysis ERA5 (Harrigan et al., 2020). Developed jointly by the Joint Research Centre (JRC) of the European Commission, the University of Reading and the ECMWF (Harrigan et al., 2020), this streamflow reanalysis is generated by coupling the Hydrology Tiled ECMWF Scheme for Surface Exchanges over Land (HTESSEL) land surface model with the LISFLOOD hydrological and channel routing model (Alfieri et al., 2020; Harrigan et al., 2020). Specifically, the daily surface and subsurface runoff generated by the HTESSEL model are routed using the LISFLOOD model (Harrigan et al., 2020). The GloFAS-ERA5 provides a spatial resolution of 0.1° at a daily time step, covering the time period from 1 January 1979 to near real time (Harrigan et al., 2020). Harrigan et al. (2020) found that the GloFAS-ERA5 streamflow reanalysis tends to be skillful across 86 % of tested catchments and also noted that there exists considerable variability in the skill, e.g., significant positive biases in the central United States and Africa.
3.2 Observed streamflow
The observed streamflow is sourced from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset (Newman et al., 2015; Addor et al., 2017). An advantage of this dataset is the presentation of time series from 1980 to 2015 (Addor et al., 2017). There are 671 catchments across the continental United States (CONUS), which exhibit diverse hydrometeorological characteristics. Notably, these catchments are primarily located at headwaters, resulting in minimal influence from human activities (Stein et al., 2021). In the meantime, the CAMELS provides information on six categories of catchment attributes, including climate, geology, topography, soil, vegetation and streamflow indices (Addor et al., 2017; Stein et al., 2021). Categorical attributes are not used in the investigation of the influences on model performance (Stein et al., 2021). The influences of catchment attributes on performance of streamflow time series characteristics are investigated using 38 attributes across five categories: climate, geology, topography, soil and vegetation.
To facilitate the evaluation of streamflow reanalysis, the stations whose data length meets the requirement for the decomposition into 10 levels are selected (Nalley et al., 2012). The maximum decomposition level l_{m} is denoted by
$l_{m} = \mathrm{int}\left[\frac{\log\left(N / (2v - 1)\right)}{\log 2}\right]$
in which v represents the number of vanishing moments of the Daubechies wavelet (set to 5), and N is the number of data points. With v = 5, a decomposition into 10 levels requires at least 9 × 2^{10} = 9216 data points. Specifically, 661 stations with a data length exceeding 9216 d are selected for the investigation.
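The link between data length and decomposition level can be checked with a few lines. The formula below, l_{m} = int[log_{2}(N / (2v − 1))], is an assumed form in the style of Nalley et al. (2012); it is consistent with the threshold of 9216 d for 10 levels when v = 5. The function name `max_decomposition_level` is hypothetical.

```python
import math


def max_decomposition_level(n_points, vanishing_moments=5):
    """Largest level l such that (2v - 1) * 2**l does not exceed n_points."""
    return int(math.log2(n_points / (2 * vanishing_moments - 1)))


level_at_threshold = max_decomposition_level(9216)   # 9216 / 9 = 1024 = 2**10
level_just_below = max_decomposition_level(9215)

# Timescale in days associated with each decomposition level
timescales = {level: 2 ** level for level in range(1, level_at_threshold + 1)}
```

The resulting timescales range from 2 d at level 1 to 1024 d at level 10, with level 8 corresponding to 256 d.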
4 Results
4.1 Approximation and detail components
The time series of streamflow reanalysis and observation along with their approximation and detail components are presented in Fig. 1. The plots are for station 6224000, for which streamflow reanalysis exhibits the highest KGE value of 0.82. The approximation and detail components at level l correspond to the timescale of 2^{l} d. For example, A_{1} and A_{8} correspond to the periods of 2 and 256 d, respectively. It can be observed that the original time series of reanalysis generally captures the primary features of the observed streamflow. Under the stepwise decomposition of the streamflow time series, the KGE tends to increase from 0.48 for A_{1} to 0.62 for A_{8} and increase from −4.57 for D_{1} to 0.48 for D_{8}. This result indicates that streamflow reanalysis tends to capture seasonal and annual information more effectively than daily, weekly or monthly information. At higher decomposition levels, the series of approximation and detail components become smoother, owing to the filtering of short-term noise. As the decomposition level increases, the reanalysis becomes better able to capture the information in the observation.
The KGEs of approximation and detail components across the CONUS are illustrated in Fig. 2. There are respectively 554 and 417 catchments for the approximation and detail components after removing the outliers. It can be observed that the KGEs of the approximation components tend to increase from A_{1} to A_{10} and that, by contrast, the KGEs of the detail components exhibit considerable fluctuations from D_{1} to D_{10}. The comparison between the left and right parts of Fig. 2 highlights that the detail components are more difficult to characterize than the approximation components. This outcome is attributable to the presence of environmental noise in the original time series (de Macedo Machado Freire et al., 2019). Given that the KGEs of the detail components can drop below −2.5 in some catchments, more attention is paid to the approximation components in the subsequent analysis.
4.2 Performance across the CONUS
The KGE values of original time series and its approximation components for the 554 catchments after removing the outliers are presented in Fig. 3. In total, there are 11 spatial plots for original time series and its components after decomposition. It can be observed that the original time series tends to exhibit relatively high KGEs in the western United States and relatively low KGEs in the central United States. This observation is consistent with those by Addor et al. (2017), who found poor performances in the high plains and deserts of the southwest. In the meantime, the approximation components from A_{1} to A_{10} tend to exhibit high KGEs in the western United States and low KGEs in the central United States. This finding indicates that the KGE values of approximation components are related to the KGE values of original time series. Moreover, as the scale increases from A_{1} to A_{10}, the performance of approximation components tends to improve. The KGEs in the central United States change from negative values in A_{1} to positive values in A_{10}. That is, seasonal, annual and multiannual features tend to be better represented by streamflow reanalysis than daily, weekly and monthly features.
The KGE and its three components for the 554 catchments are illustrated by boxplots in Fig. 4. For the KGEs between streamflow reanalysis and observations, it can be observed that the local performance of streamflow reanalysis generally improves from A_{1} to A_{7} and then remains promising from A_{8} to A_{10}. Specifically, the median value of KGE is 0.02 for A_{1}, 0.09 for A_{2}, 0.19 for A_{3}, 0.24 for A_{4}, 0.29 for A_{5}, 0.36 for A_{6}, 0.47 for A_{7}, 0.43 for A_{8}, 0.42 for A_{9} and 0.40 for A_{10}. This trend is due to the fact that the correlation ratio tends towards 1 from A_{1} to A_{7}. In the meantime, it is noted that A_{7} exhibits higher KGE than the original time series. This result implies that errors in the original time series primarily stem from daily, weekly and monthly components. Focusing on the correlation, the medians of correlation for approximation components exceed 0.2, implying valuable information in multiple timescale approximations. The bias ratio remains nearly constant at each scale for approximation components. That is, the mean values of approximation components are generally similar to the mean values of the original time series.
4.3 Influences of catchment attributes
The influences of catchment attributes on the KGE and its three components are measured by the mean absolute ALEs and illustrated in Fig. 5. From the first row, it can be observed that the KGE values of original time series and its approximation components are primarily influenced by precipitation seasonality. Positive (negative) values of precipitation seasonality indicate that precipitation peaks in summer (winter). That is, the season with more precipitation has a significant impact on the KGE. Longitude and mean slope also have a significant impact on the KGE across original time series and daily, weekly, and monthly features (from A_{1} to A_{5}). In the meantime, the correlations of annual and multiannual features (from A_{7} to A_{10}) are mainly affected by the precipitation seasonality, while daily, weekly and monthly features are influenced by longitude and mean slope of the catchment. This result suggests that the influences of catchment attributes on correlation of annual and multiannual features are different from daily, weekly and monthly features. Furthermore, the bias ratio is primarily influenced by mean precipitation, and the variability ratio is mainly affected by catchment area and depth to bedrock. The geology, soils and vegetation appear to have minor impacts on the local performance of global streamflow reanalysis.
To further illustrate how catchment attributes affect the performances of original time series and its approximation components, the ALE curves are presented for the three influential attributes of precipitation seasonality, mean precipitation and mean slope of catchment. The influences of precipitation seasonality on the KGE and its three components are presented in Fig. 6. It can be observed that the relationships between the KGE and precipitation seasonality are generally nonlinear. The KGE gradually decreases with the increasing precipitation seasonality. That is, the KGE values are notably low when precipitation tends to concentrate in summer and turn out to be high when precipitation tends to concentrate in winter. The ALE curves of the daily, weekly and monthly features (from A_{1} to A_{5}) are similar to original time series, reducing towards −0.5. The seasonal, annual and multiannual features (from A_{6} to A_{10}) decrease around 0. In the meantime, the influences of precipitation seasonality on the correlation, bias and variability ratios are similar to that on the KGE. These results can be due to low evaporation in winter that results in reasonable simulations of soil moisture and baseflow processes (Poncelet et al., 2017).
The influences of mean precipitation on the KGE, correlation, bias ratio and variability ratio across different scales are illustrated in Fig. 7. The mean precipitation has a positive effect on the KGE of original time series and its approximation components, with a nonlinear increase of the KGE with rising mean precipitation, particularly for the annual and multiannual features. In the meantime, it affects the correlation, bias ratio and variability ratio of original time series positively. This result suggests that mean precipitation tends to have a consistent influence on the KGE, correlation, bias and variability ratios for the approximation components. This result can be due to the fact that rainfall–runoff processes are more linear in humid catchments than in arid catchments, leading to less variability in hydrologic states and facilitating more accurate simulations (Parajka et al., 2013).
The influences of mean slope on the KGE and its three components across different scales are shown in Fig. 8. It can be observed that there is a nonlinear relationship between the KGE and mean slope of catchment. As the mean slope increases, the KGE of original time series and its approximation components tends to increase. This result may be due to the mean slope of catchment affecting the simulation of runoff generation and infiltration (Stein et al., 2021; Massmann, 2020). It is noted that the KGE values of approximation components gradually increase when the mean slope of catchment surpasses 150. In particular, the correlation and variability ratio of original time series generally increase with the increase in the KGE. That is, the mean slope of catchment has a similar effect on the KGE, correlation and variability ratio. On the other hand, the bias ratio initially decreases and then increases with the increase in mean slope. In other words, the relationship between bias ratio and mean slope of catchment is nonmonotonic.
4.4 Driving factors of each catchment
The most important attribute that influences the KGE is identified for each catchment by the LIMEs method and then illustrated by spatial plots in Fig. 9. It can be observed that the most important attributes influencing the KGE exhibit regional clustering. The KGE of original time series is primarily influenced by precipitation seasonality in the western and central United States and by depth to bedrock in the eastern United States (Addor et al., 2017; Pfister et al., 2017). The substantial differences in precipitation seasonality between the western and central United States result in significant differences in the KGE. On the other hand, the most important attribute controlling the KGE of approximation components is different from that of original time series. It can be observed that the KGE values of approximation components from A_{6} to A_{8} are primarily controlled by precipitation seasonality in the eastern United States, while the original time series is controlled by depth to bedrock. The higher depth to bedrock may exhibit larger storage values, consequently leading to higher baseflow (Pfister et al., 2017). In the meantime, the number of catchments controlled by precipitation seasonality tends to increase from A_{1} to A_{8}, with a high proportion observed in A_{6}, A_{7} and A_{8}. That is, the performance of the annual variability of streamflow reanalysis is influenced by precipitation seasonality.
5 Discussion
Global streamflow reanalysis provides valuable information for water resources management (Alfieri et al., 2020; Harrigan et al., 2020; Yang et al., 2021). Building upon previous studies evaluating the performance of hydrological signatures derived from reanalysis and observed streamflow (Beck et al., 2017; Chen et al., 2021; Tu et al., 2024), this paper presents a novel evaluation by combining the wavelet transform with machine learning. Specifically, streamflow reanalysis and observation are respectively decomposed by the DWT into detail and approximation components at different scales. As a result, streamflow characteristics in the time–frequency domain are unraveled by extracting features and removing noise from the original signal (Manikanta and Vema, 2022). This approach provides a new perspective by paying attention to the difference between global streamflow reanalysis and observed streamflow in the time–frequency domain. The KGE generally indicates that streamflow reanalysis exhibits a robust capability to capture the information of seasonal, annual and multiannual variability, particularly the annual fluctuations. This result suggests that hydrological simulations at daily or even hourly timescales are more challenging.
Hydrological models generally exhibit different performances across different catchments (Newman et al., 2015; O'Neill et al., 2021; Tu et al., 2024). The differences can be related to heterogeneous streamflow patterns under unique combinations of climate and catchment attributes (Stein et al., 2021). Previous studies have found that model performance is related to aridity index, with generally better performance in wetter catchments compared to drier ones (Poncelet et al., 2017). In addition to aridity index, other factors are also linked to the model performance, such as impact of snow (Newman et al., 2015), catchment area (Harrigan et al., 2020), precipitation intermittency (Newman et al., 2015) and human activities (Veldkamp et al., 2018). In this paper, it is found that the KGE values of original time series and approximation components are primarily influenced by precipitation seasonality. This outcome can be due to lower evaporation in winter, when the soil moisture is higher and baseflow can be better simulated (Poncelet et al., 2017). On the other hand, the relationships between KGE and catchment attributes are nonlinear. The results highlight that the wavelet transform can facilitate the evaluation of the local performance of global streamflow reanalysis to provide more effective information.
6 Conclusions
This paper has presented a novel decomposition approach to evaluating global streamflow reanalysis by combining the widely used wavelet transform and machine learning techniques. Specifically, the reanalysis and observed streamflow are decomposed by the DWT, and the resulting components are then used to indicate the local performance of the time series characteristics in the time–frequency domain. Furthermore, the influences of catchment attributes on the performance of the original time series and its approximation components at various scales are investigated using the ALEs. A large-sample test is conducted for the CAMELS dataset so as to evaluate the effectiveness of GloFAS streamflow reanalysis. The results show that the streamflow reanalysis tends to characterize seasonal, annual and multiannual variabilities more effectively than daily, weekly and monthly variabilities. The ALEs identify precipitation seasonality as the most important attribute influencing the KGE of the original time series and its approximation components. The longitude, mean precipitation and mean slope also influence the performance of approximation components. On the other hand, the attributes on geology, soils and vegetation seem to have a relatively minor influence on the performance of approximation components. Overall, evaluating global streamflow reanalysis at different timescales using decomposition approaches provides useful information for its practical applications.
The GloFAS-ERA5 streamflow reanalysis v2.1 can be downloaded from the Copernicus Climate Data Store at https://cds.climate.copernicus.eu/ (Harrigan et al., 2020). The CAMELS dataset can be sourced from the US National Center for Atmospheric Research and is accessible at https://gdex.ucar.edu/dataset/camels.html (Newman et al., 2015; Addor et al., 2017).
TZ and ZC designed the experiments. ZC and YT carried them out. TZ and ZC developed the model code and performed the experiments. ZC, TZ, BZ, YL and XC prepared the manuscript.
The contact author has declared that none of the authors has any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
This research is supported by the Ministry of Science and Technology of the People's Republic of China (2023YFF0804900), the National Natural Science Foundation of China (52379033) and the Guangdong Provincial Department of Science and Technology (2019ZT08G090).
This research has been supported by the Ministry of Science and Technology of the People's Republic of China, Department of Science and Technology for Social Development (grant no. 2023YFF0804900), the National Natural Science Foundation of China (grant no. 52379033), and the Guangdong Provincial Department of Science and Technology (grant no. 2019ZT08G090).
This paper was edited by Hongkai Gao and reviewed by two anonymous referees.
Abebe, S. A., Qin, T., Zhang, X., and Yan, D.: Wavelet transform-based trend analysis of streamflow and precipitation in Upper Blue Nile River basin, J. Hydrol.: Reg. Stud., 44, 101251, https://doi.org/10.1016/j.ejrh.2022.101251, 2022.
Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies, Hydrol. Earth Syst. Sci., 21, 5293–5313, https://doi.org/10.5194/hess-21-5293-2017, 2017.
Alfieri, L., Lorini, V., Hirpa, F. A., Harrigan, S., Zsoter, E., Prudhomme, C., and Salamon, P.: A global streamflow reanalysis for 1980–2018, J. Hydrol. X, 6, 100049, https://doi.org/10.1016/j.hydroa.2019.100049, 2020.
Beck, H. E., van Dijk, A. I. J. M., de Roo, A., Dutra, E., Fink, G., Orth, R., and Schellekens, J.: Global evaluation of runoff from 10 state-of-the-art hydrological models, Hydrol. Earth Syst. Sci., 21, 2881–2903, https://doi.org/10.5194/hess-21-2881-2017, 2017.
Brinkerhoff, C. B., Gleason, C. J., Feng, D., and Lin, P.: Constraining Remote River Discharge Estimation Using Reach-Scale Geomorphology, Water Resour. Res., 56, e2020WR027949, https://doi.org/10.1029/2020WR027949, 2020.
Cantoni, E., Tramblay, Y., Grimaldi, S., Salamon, P., Dakhlaoui, H., Dezetter, A., and Thiemig, V.: Hydrological performance of the ERA5 reanalysis for flood modeling in Tunisia with the LISFLOOD and GR4J models, J. Hydrol.: Reg. Stud., 42, 101169, https://doi.org/10.1016/j.ejrh.2022.101169, 2022.
Chalise, D. R., Sankarasubramanian, A., Olden, J. D., and Ruhi, A.: Spectral Signatures of Flow Regime Alteration by Dams Across the United States, Earth's Future, 11, e2022EF003078, https://doi.org/10.1029/2022EF003078, 2023.
Chen, H., Liu, J., Mao, G., Wang, Z., Zeng, Z., Chen, A., Wang, K., and Chen, D.: Intercomparison of ten ISIMIP models in simulating discharges along the Lancang-Mekong River basin, Sci. Total Environ., 765, 144494, https://doi.org/10.1016/j.scitotenv.2020.144494, 2021.
Chen, Z., Zhao, T., Tu, T., Tu, X., and Chen, X.: PairwiseIHA: A python toolkit to detect flow regime alterations for headwater rivers, Environ. Model. Softw., 154, 105427, https://doi.org/10.1016/j.envsoft.2022.105427, 2022.
de Macedo Machado Freire, P. K., Santos, C. A. G., and Lima da Silva, G. B.: Analysis of the use of discrete wavelet transforms coupled with ANN for short-term streamflow forecasting, Appl. Soft Comput., 80, 494–505, https://doi.org/10.1016/j.asoc.2019.04.024, 2019.
Feng, D., Gleason, C. J., Lin, P., Yang, X., Pan, M., and Ishitsuka, Y.: Recent changes to Arctic river discharge, Nat. Commun., 12, 6917, https://doi.org/10.1038/s41467-021-27228-1, 2021.
Frame, J. M., Kratzert, F., Raney II, A., Rahman, M., Salas, F. R., and Nearing, G. S.: Post-Processing the National Water Model with Long Short-Term Memory Networks for Streamflow Predictions and Model Diagnostics, J. Am. Water Resour. Assoc., 57, 885–905, https://doi.org/10.1111/1752-1688.12964, 2021.
Gao, H., Dong, J., Chen, X., Cai, H., Liu, Z., Jin, Z., Mao, D., Yang, Z., and Duan, Z.: Stepwise modeling and the importance of internal variables validation to test model realism in a data scarce glacier basin, J. Hydrol., 591, 125457, https://doi.org/10.1016/j.jhydrol.2020.125457, 2020.
Ghiggi, G., Humphrey, V., Seneviratne, S. I., and Gudmundsson, L.: GRUN: an observation-based global gridded runoff dataset from 1902 to 2014, Earth Syst. Sci. Data, 11, 1655–1674, https://doi.org/10.5194/essd-11-1655-2019, 2019.
Guo, J., Sun, H., and Du, B.: Multivariable Time Series Forecasting for Urban Water Demand Based on Temporal Convolutional Network Combining Random Forest Feature Selection and Discrete Wavelet Transform, Water Resour. Manage., 36, 3385–3400, https://doi.org/10.1007/s11269-022-03207-z, 2022.
Han, J., Miao, C., Gou, J., Zheng, H., Zhang, Q., and Guo, X.: A new daily gridded precipitation dataset for the Chinese mainland based on gauge observations, Earth Syst. Sci. Data, 15, 3147–3161, https://doi.org/10.5194/essd-15-3147-2023, 2023.
Harrigan, S., Zsoter, E., Alfieri, L., Prudhomme, C., Salamon, P., Wetterhall, F., Barnard, C., Cloke, H., and Pappenberger, F.: GloFAS-ERA5 operational global river discharge reanalysis 1979–present, Earth Syst. Sci. Data, 12, 2043–2060, https://doi.org/10.5194/essd-12-2043-2020, 2020.
Hauswirth, S. M., Bierkens, M. F. P., Beijk, V., and Wanders, N.: The potential of data driven approaches for quantifying hydrological extremes, Adv. Water Resour., 155, 104017, https://doi.org/10.1016/j.advwatres.2021.104017, 2021.
Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz-Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., Bonavita, M., De Chiara, G., Dahlgren, P., Dee, D., Diamantakis, M., Dragani, R., Flemming, J., Forbes, R., Fuentes, M., Geer, A., Haimberger, L., Healy, S., Hogan, R. J., Hólm, E., Janisková, M., Keeley, S., Laloyaux, P., Lopez, P., Lupu, C., Radnoti, G., de Rosnay, P., Rozum, I., Vamborg, F., Villaume, S., and Thépaut, J.-N.: The ERA5 global reanalysis, Q. J. Roy. Meteorol. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803, 2020.
Huang, Z. and Zhao, T.: Predictive performance of ensemble hydroclimatic forecasts: Verification metrics, diagnostic plots and forecast attributes, WIREs Water, 9, e1580, https://doi.org/10.1002/wat2.1580, 2022.
Huang, Z., Zhao, T., Xu, W., Cai, H., Wang, J., Zhang, Y., Liu, Z., Tian, Y., Yan, D., and Chen, X.: A seven-parameter Bernoulli-Gamma-Gaussian model to calibrate subseasonal to seasonal precipitation forecasts, J. Hydrol., 610, 127896, https://doi.org/10.1016/j.jhydrol.2022.127896, 2022.
Joo, T. W. and Kim, S. B.: Time series forecasting based on wavelet filtering, Exp. Syst. Appl., 42, 3868–3874, https://doi.org/10.1016/j.eswa.2015.01.026, 2015.
Konapala, G., Kao, S.-C., Painter, S. L., and Lu, D.: Machine learning assisted hybrid models can improve streamflow simulation in diverse catchments across the conterminous US, Environ. Res. Lett., 15, 104022, https://doi.org/10.1088/1748-9326/aba927, 2020.
Lane, S. N.: Assessment of rainfall-runoff models based upon wavelet analysis, Hydrol. Process., 21, 586–607, https://doi.org/10.1002/hyp.6249, 2007.
Lee, E. and Kam, J.: Deciphering the black box of deep learning for multipurpose dam operation modeling via explainable scenarios, J. Hydrol., 626, 130177, https://doi.org/10.1016/j.jhydrol.2023.130177, 2023.
Li, Z., Gao, S., Chen, M., Gourley, J. J., and Hong, Y.: Spatiotemporal Characteristics of US Floods: Current Status and Forecast Under a Future Warmer Climate, Earth's Future, 10, e2022EF002700, https://doi.org/10.1029/2022EF002700, 2022.
Lin, P., Pan, M., Beck, H. E., Yang, Y., Yamazaki, D., Frasson, R., David, C. H., Durand, M., Pavelsky, T. M., Allen, G. H., Gleason, C. J., and Wood, E. F.: Global Reconstruction of Naturalized River Flows at 2.94 Million Reaches, Water Resour. Res., 55, 6499–6516, https://doi.org/10.1029/2019WR025287, 2019.
Liu, L., Zhou, L., Gusyev, M., and Ren, Y.: Unravelling and improving the potential of global discharge reanalysis dataset in streamflow estimation in ungauged basins, J. Clean. Product., 419, 138282, https://doi.org/10.1016/j.jclepro.2023.138282, 2023.
Manikanta, V. and Vema, V. K.: Formulation of Wavelet Based Multi-Scale Multi-Objective Performance Evaluation (WMMPE) Metric for Improved Calibration of Hydrological Models, Water Resour. Res., 58, e2020WR029355, https://doi.org/10.1029/2020WR029355, 2022.
Massmann, C.: Identification of factors influencing hydrologic model performance using a top-down approach in a large number of U.S. catchments, Hydrol. Process., 34, 4–20, https://doi.org/10.1002/hyp.13566, 2020.
Montoya, R., Poudel, B. P., Bidram, A., and Reno, M. J.: DC microgrid fault detection using multiresolution analysis of traveling waves, Int. J. Elect. Power Energ. Syst., 135, 107590, https://doi.org/10.1016/j.ijepes.2021.107590, 2022.
Munia, H. A., Guillaume, J. H. A., Wada, Y., Veldkamp, T., Virkki, V., and Kummu, M.: Future Transboundary Water Stress and Its Drivers Under Climate Change: A Global Study, Earth's Future, 8, e2019EF001321, https://doi.org/10.1029/2019EF001321, 2020.
Muñoz-Sabater, J., Dutra, E., Agustí-Panareda, A., Albergel, C., Arduini, G., Balsamo, G., Boussetta, S., Choulga, M., Harrigan, S., Hersbach, H., Martens, B., Miralles, D. G., Piles, M., Rodríguez-Fernández, N. J., Zsoter, E., Buontempo, C., and Thépaut, J.-N.: ERA5-Land: a state-of-the-art global reanalysis dataset for land applications, Earth Syst. Sci. Data, 13, 4349–4383, https://doi.org/10.5194/essd-13-4349-2021, 2021.
Naghibi, S. A., Ahmadi, K., and Daneshi, A.: Application of Support Vector Machine, Random Forest, and Genetic Algorithm Optimized Random Forest Models in Groundwater Potential Mapping, Water Resour. Manage., 31, 2761–2775, https://doi.org/10.1007/s11269-017-1660-3, 2017.
Nalley, D., Adamowski, J., and Khalil, B.: Using discrete wavelet transforms to analyze trends in streamflow and precipitation in Quebec and Ontario (1954–2008), J. Hydrol., 475, 204–228, https://doi.org/10.1016/j.jhydrol.2012.09.049, 2012.
Newman, A. J., Clark, M. P., Sampson, K., Wood, A., Hay, L. E., Bock, A., Viger, R. J., Blodgett, D., Brekke, L., Arnold, J. R., Hopson, T., and Duan, Q.: Development of a large-sample watershed-scale hydrometeorological data set for the contiguous USA: data set characteristics and assessment of regional variability in hydrologic model performance, Hydrol. Earth Syst. Sci., 19, 209–223, https://doi.org/10.5194/hess-19-209-2015, 2015.
O'Neill, M. M. F., Tijerina, D. T., Condon, L. E., and Maxwell, R. M.: Assessment of the ParFlow–CLM CONUS 1.0 integrated hydrologic model: evaluation of hyper-resolution water balance components across the contiguous United States, Geosci. Model Dev., 14, 7223–7254, https://doi.org/10.5194/gmd-14-7223-2021, 2021.
Parajka, J., Viglione, A., Rogger, M., Salinas, J. L., Sivapalan, M., and Blöschl, G.: Comparative assessment of predictions in ungauged basins – Part 1: Runoff-hydrograph studies, Hydrol. Earth Syst. Sci., 17, 1783–1795, https://doi.org/10.5194/hess-17-1783-2013, 2013.
Pfister, L., Martínez-Carreras, N., Hissler, C., Klaus, J., Carrer, G. E., Stewart, M. K., and McDonnell, J. J.: Bedrock geology controls on catchment storage, mixing, and release: A comparative analysis of 16 nested catchments, Hydrol. Process., 31, 1828–1845, https://doi.org/10.1002/hyp.11134, 2017.
Pokhrel, Y., Felfelani, F., Satoh, Y., Boulange, J., Burek, P., Gädeke, A., Gerten, D., Gosling, S. N., Grillakis, M., Gudmundsson, L., Hanasaki, N., Kim, H., Koutroulis, A., Liu, J., Papadimitriou, L., Schewe, J., Müller Schmied, H., Stacke, T., Telteu, C.-E., Thiery, W., Veldkamp, T., Zhao, F., and Wada, Y.: Global terrestrial water storage and drought severity under climate change, Nat. Clim. Change, 11, 226–233, https://doi.org/10.1038/s41558-020-00972-w, 2021.
Poncelet, C., Merz, R., Merz, B., Parajka, J., Oudin, L., Andréassian, V., and Perrin, C.: Process-based interpretation of conceptual hydrological model performance using a multinational catchment set, Water Resour. Res., 53, 7247–7268, https://doi.org/10.1002/2016WR019991, 2017.
Quilty, J. and Adamowski, J.: A maximal overlap discrete wavelet packet transform integrated approach for rainfall forecasting – A case study in the Awash River Basin (Ethiopia), Environ. Model. Softw., 144, 105119, https://doi.org/10.1016/j.envsoft.2021.105119, 2021.
Saraiva, S. V., Carvalho, F. de O., Santos, C. A. G., Barreto, L. C., and de Freire, P. K. M. M.: Daily streamflow forecasting in Sobradinho Reservoir using machine learning models coupled with wavelet transform and bootstrapping, Appl. Soft Comput., 102, 107081, https://doi.org/10.1016/j.asoc.2021.107081, 2021.
Senent-Aparicio, J., Blanco-Gómez, P., López-Ballesteros, A., Jimeno-Sáez, P., and Pérez-Sánchez, J.: Evaluating the Potential of GloFAS-ERA5 River Discharge Reanalysis Data for Calibrating the SWAT Model in the Grande San Miguel River Basin (El Salvador), Remote Sens., 13, 3299, https://doi.org/10.3390/rs13163299, 2021.
Sichangi, A. W., Wang, L., Yang, K., Chen, D., Wang, Z., Li, X., Zhou, J., Liu, W., and Kuria, D.: Estimating continental river basin discharges using multiple remote sensing data sets, Remote Sens. Environ., 179, 36–53, https://doi.org/10.1016/j.rse.2016.03.019, 2016.
Smiti, A.: A critical overview of outlier detection methods, Comput. Sci. Rev., 38, 100306, https://doi.org/10.1016/j.cosrev.2020.100306, 2020.
Stein, L., Clark, M. P., Knoben, W. J. M., Pianosi, F., and Woods, R. A.: How Do Climate and Catchment Attributes Influence Flood Generating Processes? A Large-Sample Study for 671 Catchments Across the Contiguous USA, Water Resour. Res., 57, e2020WR028300, https://doi.org/10.1029/2020WR028300, 2021.
Talukder, S., Singh, R., Bora, S., and Paily, R.: An Efficient Architecture for QRS Detection in FPGA Using Integer Haar Wavelet Transform, Circ. Syst. Sig. Process., 39, 3610–3625, https://doi.org/10.1007/s00034-019-01328-2, 2020.
Teng, L. Y., Mattar, C. N. Z., Biswas, A., Hoo, W. L., and Saw, S. N.: Interpreting the role of nuchal fold for fetal growth restriction prediction using machine learning, Sci. Rep., 12, 3907, https://doi.org/10.1038/s41598-022-07883-0, 2022.
Tu, T., Wang, J., Zhao, G., Zhao, T., and Dong, X.: Scaling from global to regional river flow with global hydrological models: Choice matters, J. Hydrol., 633, 130960, https://doi.org/10.1016/j.jhydrol.2024.130960, 2024.
Veldkamp, T. I. E., Zhao, F., Ward, P. J., de Moel, H., Aerts, J. C. J. H., Schmied, H. M., Portmann, F. T., Masaki, Y., Pokhrel, Y., Liu, X., Satoh, Y., Gerten, D., Gosling, S. N., Zaherpour, J., and Wada, Y.: Human impact parameterizations in global hydrological models improve estimates of monthly discharges and hydrological extremes: a multi-model validation study, Environ. Res. Lett., 13, 055008, https://doi.org/10.1088/1748-9326/aab96f, 2018.
Wei, D., Gephart, J. A., Iizumi, T., Ramankutty, N., and Davis, K. F.: Key role of planted and harvested area fluctuations in US crop production shocks, Nat. Sustain., 6, 1177–1185, https://doi.org/10.1038/s41893-023-01152-2, 2023.
Wei, S., Song, J., and Khan, N. I.: Simulating and predicting river discharge time series using a wavelet-neural network hybrid modelling approach, Hydrol. Process., 26, 281–296, https://doi.org/10.1002/hyp.8227, 2012.
Xiang, X., Yu, H., Wang, Y., and Wang, G.: Stable local interpretable model-agnostic explanations based on a variational autoencoder, Appl. Intel., 53, 28226–28240, https://doi.org/10.1007/s10489-023-04942-5, 2023.
Xie, J., Xu, Y.-P., Gao, C., Xuan, W., and Bai, Z.: Total Basin Discharge From GRACE and Water Balance Method for the Yarlung Tsangpo River Basin, Southwestern China, J. Geophys. Res.-Atmos., 124, 7617–7632, https://doi.org/10.1029/2018JD030025, 2019.
Xu, Z., Mo, L., Zhou, J., Fang, W., and Qin, H.: Stepwise decomposition-integration-prediction framework for runoff forecasting considering boundary correction, Sci. Total Environ., 851, 158342, https://doi.org/10.1016/j.scitotenv.2022.158342, 2022.
Yang, Y., Pan, M., Lin, P., Beck, H. E., Zeng, Z., Yamazaki, D., David, C. H., Lu, H., Yang, K., Hong, Y., and Wood, E. F.: Global Reach-Level 3-Hourly River Flood Reanalysis (1980–2019), B. Am. Meteorol. Soc., 102, E2086–E2105, https://doi.org/10.1175/BAMS-D-20-0057.1, 2021.
Zhao, T., Chen, H., Tian, Y., Yan, D., Xu, W., Cai, H., Wang, J., and Chen, X.: Quantifying overlapping and differing information of global precipitation for GCM forecasts and El Niño–Southern Oscillation, Hydrol. Earth Syst. Sci., 26, 4233–4249, https://doi.org/10.5194/hess-26-4233-2022, 2022a.
Zhao, T., Chen, Z., Tu, T., Yan, D., and Chen, X.: Unravelling the potential of global streamflow reanalysis in characterizing local flow regime, Sci. Total Environ., 838, 156125, https://doi.org/10.1016/j.scitotenv.2022.156125, 2022b.