Seamless streamflow model provides forecasts at all scales from daily to monthly and matches the performance of non-seamless monthly model

Subseasonal streamflow forecasts inform a multitude of water management decisions, from early flood warning to reservoir operation. ‘Seamless’ forecasts, i.e., forecasts that are reliable over a range of lead times (1-30 days) and when aggregated to multiple time scales (e.g. daily and monthly), are of clear practical interest. However, existing forecasting products are often ‘non-seamless’, i.e., designed for a single time scale and lead time (e.g. 1 month ahead). If seamless forecasts are to be a viable replacement for existing ‘non-seamless’ forecasts, it is important that they offer (at least) similar predictive performance at the time scale of the non-seamless forecast. This study compares the recently developed seamless daily Multi-Temporal Hydrological Residual Error (MuTHRE) model to the (non-seamless) monthly streamflow post-processing (QPP) model that was used in the Australian Bureau of Meteorology’s Dynamic Forecasting System. Streamflow forecasts from both models are generated for 11 Australian catchments, using the GR4J hydrological model and post-processed rainfall forecasts from the ACCESS-S climate model. Evaluating monthly forecasts with key performance metrics (reliability, sharpness, bias and CRPS skill score), we find that the seamless MuTHRE model provides essentially the same performance as the non-seamless monthly QPP model for the vast majority of metrics and temporal stratifications (months and years). When this outcome is combined
found large differences in the partitioning of baseflow and direct runoff. However, to the best of the authors’ knowledge, no studies have compared aggregated probabilistic forecasts from a seamless model against probabilistic forecasts from a non-seamless model.
to establish whether aggregated forecasts from a seamless model achieve comparable performance to those from a non-seamless forecasting model at its native time scale.
This aim is achieved by comparing the monthly forecast performance of the seamless MuTHRE model (aggregated from daily to monthly) against the non-seamless monthly streamflow post-processing model of Woldemeskel et al. (2018), used in the Australian Bureau of Meteorology’s Dynamic Forecasting System.
QPP models from the entire evaluation period, i.e. all months and years, with more detailed stratified performance evaluation performed for individual months and years. We also demonstrate the ability of the MuTHRE model to produce seamless forecasts, which are reliable over a range of lead times and aggregation scales. This is achieved by evaluating both (i) daily forecasts stratified by lead times from 1-28 days, and (ii) cumulative flow forecasts for periods of 1-28 days. The forecast is considered ‘seamless’ if reliability metrics are similar across all lead times and aggregation scales. The evaluation of cumulative flow forecasts expands on the analysis of McInerney et al. (2020), which evaluated only daily and monthly forecasts, and provides an important demonstration of seamless forecasting over the entire range of time scales between 1 and 28 days. We note that cumulative flow forecasts over 1 month correspond to monthly forecasts.


Introduction
Subseasonal streamflow forecasts (with lead times up to 30 days) can be used to inform a range of water management decisions, from flood warning and reservoir flood management at shorter lead times (e.g. up to a week) to river basin management at time scales up to a month. Since different applications require forecasts over a range of lead times and time scales, recent research has focussed on producing seamless forecasts, i.e. forecasts from a single product that are reliable and sharp across multiple lead times and aggregation time scales (McInerney et al., 2020). Current forecasting practice often employs more traditional non-seamless forecasts, i.e. forecasts that are developed and applicable at only a single lead time and time scale (e.g., Mendoza et al., 2017; Gibbs et al., 2018; Woldemeskel et al., 2018). For seamless forecasts to be a viable replacement for non-seamless forecasts, it is important to establish that the performance of seamless forecasts is competitive with their non-seamless counterparts at the native time scale of the latter. This is the focus of our study.
McInerney et al. (2020) have shown that seamless subseasonal forecasting is achievable. That study developed the Multi-Temporal Hydrological Residual Error (MuTHRE) model, which represents seasonality, dynamic biases and non-Gaussian errors. Using a case study with 11 catchments in the Murray Darling Basin, Australia, it was concluded that subseasonal forecasts generated using the MuTHRE model are indeed seamless: daily forecasts are consistently reliable (i) for lead times between 1 and 30 days, and (ii) when aggregated to monthly forecasts.

Recent research by
Seamless subseasonal forecasts, from residual error models such as MuTHRE, produce reliable forecasts over a wide range of aggregation time scales (e.g. daily to monthly) and lead times (1-30 days). In contrast, non-seamless forecasts are only available at a single time scale (e.g. monthly), and cannot be aggregated to longer time scales. The practical benefits of this are outlined as follows:
1. Seamless forecasts can be used to inform decisions at a range of time scales. Forecast users can utilize seamless subseasonal forecasts to inform a wide range of decisions, including
- Flood warning, where short-term forecasts (up to 1 week) on individual days are of practical interest (Cloke and Pappenberger, 2009);
- Hydro-electric reservoir management, which can utilize forecasts of inflow between 7 and 15 days to increase production in the electricity grid (Boucher and Ramos, 2019);
- Managing reservoirs for rural water supply, where forecast volumes over long aggregation scales (e.g. weeks/months), and at long lead times (up to 1 month), are required due to long travel times (Murray-Darling Basin Authority, 2019);
- Operation of urban water supply systems, where monthly forecasts are of value (Zhao and Zhao, 2014).
2. Seamless daily forecasts are easily integrated into river system models used for real-time decision-making. Perhaps the greatest potential for seamless forecasts is their use as input into real-time decision-making tools used by urban and rural water authorities. These tools include river system models (e.g. eWater Source, Welsh et al., 2013), which run natively at the daily scale and are used to inform resource management decisions over larger time scales. Streamflow forecasts from non-seamless models cannot be used as input into these models, since they do not match the required input time scale of

Forecasting models
This section describes the seamless MuTHRE daily streamflow post-processing (QPP) model and the non-seamless monthly QPP model.

Probability model
Both QPP models can be represented as a probability model ($Q_t$) for streamflow $q_t$ at time $t$, where $\theta$ are parameters of the hydrological and error models (described below) and $x_t$ are inputs to the hydrological model. The transformation $z$, with parameters $\theta_z$, is used to reduce both heteroscedasticity and skewness in residuals. We choose the Box-Cox transformation (e.g., Box and Cox, 1964), $z(q; \lambda, A) = [(q + A)^{\lambda} - 1] / \lambda$. The power parameter $\lambda$ is set to 0.2 in both QPP models (McInerney et al., 2017). For the seamless MuTHRE model, the offset parameter $A$ is inferred as part of the hydrological model calibration (McInerney et al., 2020), while for the non-seamless monthly QPP model it is set to 1% of the mean observed monthly streamflow, i.e. $A = 0.01 \times \mathrm{mean}(q^{\mathrm{mon}})$ (Woldemeskel et al., 2018).
The residual error term $\eta_t$ is modelled as an AR(1) process after standardization. When generating forecasts, recent streamflow observations are used to update errors via the AR(1) model, and reduce uncertainty in $\eta_t$ for short lead times.
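The two building blocks described above, the Box-Cox transformation with offset and the AR(1) error update, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the truncation of back-transformed flows at zero, and the use of plain Gaussian innovations are assumptions made for brevity.

```python
import random

LAMBDA = 0.2  # power parameter stated in the text for both QPP models


def box_cox(q, offset, lam=LAMBDA):
    """Box-Cox transform with offset A: z = ((q + A)^lam - 1) / lam."""
    return ((q + offset) ** lam - 1.0) / lam


def inv_box_cox(z, offset, lam=LAMBDA):
    """Back-transform to flow space; negative flows are truncated to zero (assumption)."""
    base = lam * z + 1.0
    if base <= 0.0:
        return 0.0
    return max(base ** (1.0 / lam) - offset, 0.0)


def ar1_forecast_errors(last_error, phi, sigma_y, n_days, rng):
    """Propagate a standardized AR(1) error forward from the last observed error.

    The most recent observed error conditions the first forecast days, so the
    spread of the errors grows with lead time.
    """
    errors = []
    nu = last_error
    for _ in range(n_days):
        nu = phi * nu + rng.gauss(0.0, sigma_y)  # Gaussian innovation (simplification)
        errors.append(nu)
    return errors
```

With `sigma_y = 0` the error simply decays geometrically from its last observed value, which shows how a recent observation reduces uncertainty at short lead times.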

Overall approach
The seamless MuTHRE model operates at the daily time scale.
Uncertainty due to both forecast rainfall and hydrological errors is represented using the ensemble dressing approach (Pagano et al., 2013). The ensemble of daily raw streamflow forecasts, $q^{\mathrm{raw}}$, obtained by propagating an ensemble of rainfall forecasts through the hydrological model $h$ (as described in Section 2.1), accounts for forecast rainfall uncertainty. A randomly generated replicate of the residual term, $\eta$, is then added to each replicate of the raw streamflow forecasts to account for hydrological uncertainty. Note that this approach to capturing forecast rainfall and hydrological uncertainty relies on the rainfall forecasts being reliable in order to produce reliable streamflow forecasts (Verkade et al., 2017).
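A minimal sketch of ensemble dressing is given below, assuming the residual is added in Box-Cox transformed space and that back-transformed flows are truncated at zero. The names `dress_ensemble` and `residual_sampler` are hypothetical; the sampler stands in for whatever residual error model is used.

```python
def to_transformed(q, offset, lam=0.2):
    """Box-Cox transform with offset."""
    return ((q + offset) ** lam - 1.0) / lam


def from_transformed(z, offset, lam=0.2):
    """Inverse Box-Cox transform, truncated at zero flow (assumption)."""
    base = lam * z + 1.0
    if base <= 0.0:
        return 0.0
    return max(base ** (1.0 / lam) - offset, 0.0)


def dress_ensemble(raw_ensemble, residual_sampler, offset, lam=0.2):
    """Add one residual replicate to each raw streamflow replicate.

    raw_ensemble: list of replicates, each a list of daily raw flows obtained
        from one rainfall forecast member (rainfall uncertainty).
    residual_sampler: callable n_days -> list of residuals (hydrological uncertainty).
    """
    dressed = []
    for raw_replicate in raw_ensemble:
        eta = residual_sampler(len(raw_replicate))
        dressed.append([
            from_transformed(to_transformed(q, offset, lam) + e, offset, lam)
            for q, e in zip(raw_replicate, eta)
        ])
    return dressed
```

Because each raw replicate receives its own residual replicate, the spread of the dressed ensemble reflects both uncertainty sources.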

Deterministic model implementation
In the context of equation (

Residual error model implementation
The MuTHRE model assumes that the mean of the residual error $\eta_t$ in equation (5) varies in time due to "seasonality" and "dynamic biases" (associated with hydrologic non-stationarity).
The seasonality component
In the MuTHRE model, the scaling factor $s_t$ in equation (5) is constant (set to 1 for simplicity).

Innovations are modelled using a two-component mixed-Gaussian distribution, $y_t \sim w_1 N(0, \sigma_1^2) + (1 - w_1) N(0, \sigma_2^2)$, where $\sigma_1$ and $\sigma_2$ are the standard deviations of the two components, and $w_1$ is the weight of the first component (with component means set to zero). Compared to a standard Gaussian distribution, the mixed-Gaussian distribution allows for fatter tails (i.e., excess kurtosis) in the distribution of innovations, which has been shown to improve reliability of daily forecasts at short lead times (Li et al., 2016; McInerney et al., 2020).
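The fat-tail property of the mixed-Gaussian distribution can be checked with a short sketch. The parameter values in the usage note are illustrative only and are not taken from the study; the function names are hypothetical.

```python
import random


def sample_innovation(w1, sigma1, sigma2, rng):
    """Draw one innovation from a zero-mean two-component Gaussian mixture."""
    sigma = sigma1 if rng.random() < w1 else sigma2
    return rng.gauss(0.0, sigma)


def excess_kurtosis(w1, sigma1, sigma2):
    """Analytical excess kurtosis of the zero-mean mixture.

    For zero-mean Gaussian components, E[y^2] = sum of w_i * sigma_i^2 and
    E[y^4] = 3 * sum of w_i * sigma_i^4.
    """
    w2 = 1.0 - w1
    second_moment = w1 * sigma1 ** 2 + w2 * sigma2 ** 2
    fourth_moment = 3.0 * (w1 * sigma1 ** 4 + w2 * sigma2 ** 4)
    return fourth_moment / second_moment ** 2 - 3.0
```

For example, `excess_kurtosis(0.9, 0.5, 2.0)` is positive (fatter tails than a Gaussian), whereas equal component standard deviations recover the Gaussian case with zero excess kurtosis.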

Calibration
In the seamless MuTHRE model the residual term η represents hydrological uncertainty only, i.e. it does not include forecast rainfall uncertainty. The parameters of the residual error model

Overall approach
The non-seamless monthly QPP model operates at the monthly time scale. The ensemble of daily raw streamflow forecasts is collapsed to a single deterministic monthly forecast, $q^{\mathrm{det,mon}}$ (i.e. the uncertainty from the raw streamflow replicates is discarded). Combined forecast rainfall and hydrological uncertainty is then represented through the residual term $\eta$, with replicates of $\eta$ added to the deterministic forecast $q^{\mathrm{det,mon}}$ to produce the monthly streamflow forecasts.

Deterministic model implementation
In the context of equation (2), the deterministic term in the non-seamless model at its monthly time step $t$ is computed as follows, where $T_t$ is the averaging window (range of days) corresponding to the monthly time step $t$. In other words, the residual error model is applied at the monthly scale and after collapsing the ensemble of raw forecasts to a single time series.
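The collapsing step can be sketched as below. The text does not state how the ensemble is collapsed, so the use of the ensemble mean here is an assumption, as is the function name; only the averaging over the window $T_t$ is taken from the description above.

```python
def monthly_deterministic(raw_ensemble, window):
    """Collapse a daily raw ensemble to one deterministic monthly value.

    raw_ensemble: list of replicates, each a list of daily raw flows.
    window: iterable of day indices making up the monthly time step (T_t).
    The ensemble mean is used for the collapse (assumption; a median or
    another summary could be substituted).
    """
    days = list(window)
    n_members = len(raw_ensemble)
    daily_mean = [
        sum(replicate[d] for replicate in raw_ensemble) / n_members
        for d in days
    ]
    return sum(daily_mean) / len(daily_mean)
```

After this step, no information about the spread of the raw replicates remains, which is why the residual term must represent the combined rainfall and hydrological uncertainty.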

Residual error model implementation
where $\sigma_y$ is the standard deviation of the innovations.

Calibration
In the non-seamless monthly QPP model the residual term η represents combined forecast rainfall and hydrological

Differences between the MuTHRE and monthly QPP models
The seamless MuTHRE and non-seamless monthly QPP models differ in their model structure, and hence their approach to calibration and forecasting.

Differences in model structure
The residual error models used in the MuTHRE and monthly QPP models represent different sources of uncertainty and have differences in their implementations, as outlined below and shown in Figure 1a:

Differences in calibration approach
Both approaches use the same deterministic hydrological model calibrated using observed rainfall and streamflow data.
However, due to structural differences in their representation of residual errors, the seamless MuTHRE and non-seamless monthly QPP models differ in the approach used to calibrate the residual error model parameters. The key differences are illustrated in Figure 1(b) and outlined below:
- The seamless MuTHRE model uses hydrological model simulations forced by observed rainfall, while the non-seamless monthly QPP model uses forecast rainfall as input to the hydrological model;
- The seamless MuTHRE model is calibrated using daily observed streamflow data, while the non-seamless monthly QPP model is calibrated using monthly observed streamflow.
This difference in calibration provides another practical benefit of the seamless MuTHRE model for forecast providers, in that improvements in rainfall forecasting are easily integrated into the forecasting system. Since the non-seamless monthly QPP model is calibrated using forecast rainfall, it must be recalibrated whenever a new rainfall forecast is to be used. In contrast, the seamless MuTHRE model uses only observed rainfall in calibration and does not require recalibration with different forecast rainfall, allowing for easier use of improved rainfall forecast products in operational settings.

Differences in forecasting
The differences in the model structure and calibration approach for the seamless MuTHRE model and non-seamless monthly QPP model result in key differences in terms of the forecasts that each model can produce. Figure 1(c) illustrates these differences and shows that the seamless MuTHRE model produces daily streamflow forecasts that can be used at a range of lead times and aggregation periods, while the non-seamless monthly QPP model produces only one-month ahead monthly forecasts.

Catchments and Data
A set of 11 catchments from the Murray Darling Basin in Australia, consisting of four catchments on the Upper Murray River (NSW and Victoria) and seven catchments on the Goulburn River (Victoria), is used in the case study. These catchments have winter dominated rainfall which leads to higher streamflow between June and October (see

Hydrological model
The conceptual rainfall-runoff model GR4J (Perrin et al., 2003) is used as the deterministic hydrological model $h$ (introduced in Section 2.1) for simulating daily streamflow from rainfall and PET inputs. GR4J has been widely used and evaluated over diverse catchment climatologies and physical characteristics (Perrin et al., 2003; Hunter et al., 2021). GR4J represents processes of interception, infiltration and percolation, and has four calibration parameters: $x_1$ is the capacity of the production store (mm), $x_2$ is the water exchange coefficient (mm), $x_3$ is the capacity of the routing store (mm), and $x_4$ is the time parameter of the unit hydrograph (days).

Performance metrics
Streamflow forecasts are evaluated using numerical metrics for the following attributes:
Reliability, which refers to statistical consistency between the forecast distribution and observations, is evaluated using the reliability metric of Evin et al. (2014) (which is based on the predictive quantile-quantile plot). Lower metric values are better, with 0 indicating perfect reliability, and 1 being worst reliability.
Sharpness refers to the spread of the forecast distribution, with sharper forecasts those with lower uncertainty. We use the sharpness metric of McInerney et al. (2020), which is based on the ratio of the average 90% inter-quantile range (IQR) of the forecasts and a climatological distribution (described below). Lower values are better, with 0 representing a deterministic forecast (with no uncertainty) and 1 representing the same sharpness as climatology.
Volumetric bias refers to the long-term water balance error. It is quantified using the metric of McInerney et al. (2017) as the relative absolute difference between total observed streamflow and the total forecast streamflow (averaged over the forecast replicates). Lower values are better, with 0 representing unbiased forecasts.
Combined performance is quantified using the continuous ranked probability score (CRPS) (Hersbach, 2000). We express this metric as a skill score (CRPSS) relative to the climatological distribution. Higher CRPSS values are better, with a value of 1 indicating a perfectly accurate deterministic forecast, and 0 indicating the same skill as the climatological distribution.
The climatological distribution represents the distribution of daily streamflow for a given time of the year based solely on previously observed streamflow at that time of the year. The climatological distribution is constructed using a 29-day moving-window approach, described in detail in McInerney et al. (2020).
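Three of these metrics can be sketched for ensemble forecasts as follows. This is an illustrative sketch, not the code used in the study: the CRPS is computed with the standard ensemble (empirical CDF) estimator, the quantiles use simple linear interpolation, and all function names are hypothetical. Forecasts and climatology are given as one list of ensemble members per time step.

```python
def _mean(values):
    return sum(values) / len(values)


def _quantile(values, p):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def crps_ensemble(members, obs):
    """CRPS of an ensemble: E|X - y| - 0.5 * E|X - X'|."""
    n = len(members)
    term1 = sum(abs(x - obs) for x in members) / n
    term2 = sum(abs(xi - xj) for xi in members for xj in members) / (2 * n * n)
    return term1 - term2


def crpss(forecasts, climatology, observations):
    """Skill score relative to climatology: 1 is perfect, 0 matches climatology."""
    crps_f = _mean([crps_ensemble(f, o) for f, o in zip(forecasts, observations)])
    crps_c = _mean([crps_ensemble(c, o) for c, o in zip(climatology, observations)])
    return 1.0 - crps_f / crps_c


def sharpness(forecasts, climatology):
    """Ratio of the average 90% inter-quantile range of forecasts and climatology."""
    width_f = _mean([_quantile(f, 0.95) - _quantile(f, 0.05) for f in forecasts])
    width_c = _mean([_quantile(c, 0.95) - _quantile(c, 0.05) for c in climatology])
    return width_f / width_c


def volumetric_bias(forecasts, observations):
    """Relative absolute difference between total observed and mean forecast flow."""
    total_forecast = sum(_mean(f) for f in forecasts)
    total_observed = sum(observations)
    return abs(total_observed - total_forecast) / total_observed
```

By construction, a forecast identical to climatology gives a CRPSS of 0 and a sharpness of 1, matching the reference values stated above.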

Aggregation and stratification
The main aim of this paper is to compare the performance of the seamless MuTHRE model and the non-seamless monthly QPP model at the monthly scale. The monthly MuTHRE forecasts are obtained by aggregating daily forecasts to the monthly scale. Overall evaluation of monthly forecasts is performed using data from the entire evaluation period, i.e. all months and years, with more detailed stratified performance evaluation performed for individual months and years.
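We also demonstrate the ability of the MuTHRE model to produce seamless forecasts, which are reliable over a range of lead times and aggregation scales. This is achieved by evaluating both (i) daily forecasts stratified by lead times from 1-28 days, and (ii) cumulative flow forecasts for periods of 1-28 days. The forecast is considered ‘seamless’ if reliability metrics are similar across all lead times and aggregation scales. The evaluation of cumulative flow forecasts expands on the analysis of McInerney et al. (2020), which evaluated only daily and monthly forecasts, and provides an important demonstration of seamless forecasting over the entire range of time scales between 1 and 28 days. We note that cumulative flow forecasts over 1 month correspond to monthly forecasts.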
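Aggregating daily ensemble forecasts to cumulative flows can be sketched as below, assuming that aggregation is done replicate by replicate so that the day-to-day persistence within each replicate carries through to the aggregated distribution. The function name is hypothetical.

```python
from itertools import accumulate


def cumulative_flows(daily_ensemble):
    """Cumulative flow for every aggregation period, replicate by replicate.

    daily_ensemble: list of replicates, each a list of daily flows.
    Returns, per replicate, the running totals over 1, 2, ..., n days.
    """
    return [list(accumulate(replicate)) for replicate in daily_ensemble]
```

The last entry of each replicate is the total over the full forecast period, so the distribution of those totals across replicates corresponds to the longest aggregation scale.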

Evaluation of performance differences between QPP models
Forecast performance for the two models is compared across multiple catchments using practical significance tests, as described next. For each combination of performance metric (e.g., reliability, CRPSS) and stratification (e.g., month, year), a statistical test is used to determine whether differences in metric values over the range of catchments are of practical relevance.
Statistical tests are performed using the paired Wilcoxon signed rank test (Bauer, 1972), with controls applied to reduce the false discovery rate (Benjamini and Hochberg, 1995; Wilks, 2006). Practically relevant differences are taken as 20% of the median metric value for the non-seamless monthly QPP model (following McInerney et al., 2020).
days over all catchments. Again, we see that reliability is relatively constant over all lead times, with median metric values between 0.04 and 0.06 (Figure 4b). We also note that sharpness, volumetric bias and CRPSS metrics are typically better for cumulative forecasts than daily forecasts (compare left and right columns in Figure 4).

Demonstration of seamless forecasting capabilities of the MuTHRE model
In summary, the forecasts from the MuTHRE model are seamless, since they are reliable over (a) the range of lead times, and (b) multiple aggregation scales, from the shortest scale of 1 day, to the longest of 1 month, and everything in between. This confirms and extends previous findings in McInerney et al. (2020). In contrast to the seamless MuTHRE model, the non-seamless monthly QPP model does not have the capability to produce forecasts of daily streamflow and cumulative flows for time periods less than one month. The monthly forecasts from the MuTHRE and monthly QPP models are compared in Figure 6 in terms of overall performance (left column), and when stratified by month (middle column), and year (right column). The key findings are as follows.

Comparison between monthly forecasts
Reliability. Figure 6a shows that the overall reliability of monthly forecasts from the MuTHRE and monthly QPP models is similar. While the median metric value of 0.06 for the seamless MuTHRE model is larger than the median value of 0.04 for the non-seamless monthly QPP model, these differences are not practically significant. Figure 6b shows that when performance is stratified by month, the two models have similar reliability (i.e. differences are not practically significant) for all 12 months. When stratified by year, the MuTHRE model offers similar reliability to the monthly QPP model for 20 out of the 22 years, with the non-seamless monthly QPP model offering practically significant improvements in 2 of the 22 years (Figure 6c).

Sharpness. Figure 6d shows that the overall sharpness of monthly forecasts from the seamless MuTHRE model is slightly better than the non-seamless monthly QPP model (median metric values of 0.44 cf. 0.49), although differences are not practically significant. Figure 6e shows that when sharpness is stratified by month, the seamless MuTHRE model provides practically significant improvement in 1 month (September) and similar performance in the other 11 months. Figure 6f shows sharpness stratified by year is similar for both models for all years.

Volumetric bias. Figure 6g shows that the overall volumetric bias from both models is similar (median of 0.01). Figure 6h shows that when stratified by month, the MuTHRE model produces similar/better performance in all months, with practically significant improvements in December and similar performance in the remaining 11 months. Figure 6i shows that when stratified by year, the MuTHRE model produces similar/better performance in 19 out of 22 years, with practically significant improvements in one year, while the monthly QPP model provides practically significant improvements in 3 years.

CRPSS. In terms of overall CRPSS, Figure 6j shows that the seamless MuTHRE model (median metric value of 0.45) provides slight improvement over the non-seamless monthly QPP model (median metric value of 0.42), although these differences are not practically significant. Figure 6k shows that when stratified by month, the seamless MuTHRE model provides similar performance in all 12 months. Figure 6l shows that when performance is stratified by year, the seamless MuTHRE model provides practically significant improvements in CRPSS in 2 out of 22 years, and similar performance in the remaining 20 years.
In summary, aggregated forecasts from the seamless MuTHRE model offer similar (differences not practically significant), and in some cases superior, performance to forecasts from the non-seamless monthly QPP model, for the vast majority of performance metrics and stratifications considered in this study.

Interpretation of key findings
The empirical results show that the seamless MuTHRE model achieves essentially the same performance as the non-seamless monthly QPP model at the monthly time scale, and even provides improvement in some aspects. At first glance, this outcome may seem surprising for the following reasons:
- The seamless MuTHRE model is required to produce reliable forecasts over a range of lead times and aggregation scales, whereas the non-seamless monthly QPP model is only required to produce monthly streamflow forecasts.
- The seamless MuTHRE model is calibrated at the daily scale, using only observed daily streamflow during calibration, while the non-seamless monthly QPP model is calibrated to match the observed monthly streamflow.
- The seamless MuTHRE model does not see the forecast rainfall during calibration, whereas the non-seamless monthly QPP model does.
The subsections below describe how the seamless MuTHRE model is able to achieve comparable/better performance than the non-seamless monthly QPP model despite these apparent challenges.

Time scale of forecasting/calibration
The seamless MuTHRE model produces daily forecasts that can be aggregated from time scales of one day to one month, whereas the non-seamless monthly QPP model produces forecasts only at the monthly scale. One might expect the enhanced capability obtained from the seamless MuTHRE model to come at some cost in performance at the monthly scale.
Encouragingly, this is not the case.
The ability to reliably aggregate daily forecasts to the monthly scale demonstrates that the seamless MuTHRE model is adequately capturing temporal persistence in daily forecasts. The MuTHRE model represents temporal persistence in hydrological errors using the daily AR(1) model, and the (30-day) dynamic bias component. This is important because neglecting temporal persistence in hydrological errors can result in an underestimation of hydrological uncertainty for aggregated predictions/forecasts (Evin et al., 2014). The reliability of aggregated forecasts also suggests that the (post-processed) rainfall forecasts are capturing the temporal persistence of observed rainfall required to produce reliable monthly rainfall forecasts (see Section 5.1.2).
The seamless MuTHRE model is not calibrated to optimize performance at the monthly scale, since it uses only observed daily streamflow during calibration. On the other hand, the non-seamless monthly QPP model is calibrated to match the observed monthly streamflow, which could lead to improved performance at the monthly scale compared with the seamless MuTHRE model. The performance at the monthly scale is particularly impressive since monthly data is not used in calibration.
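The effect of neglecting temporal persistence on aggregated uncertainty can be illustrated with a small calculation. The sketch below assumes a stationary AR(1) error with a fixed marginal variance, which is a simplification of the residual error model described in this paper; the function name and the parameter values in the usage note are illustrative.

```python
def variance_of_sum_ar1(phi, n_days, marginal_var=1.0):
    """Variance of the sum of n_days consecutive stationary AR(1) errors.

    Uses Var(sum) = var * [n + 2 * sum_{k=1}^{n-1} (n - k) * phi^k],
    where phi^k is the lag-k autocorrelation of the AR(1) process.
    """
    total = float(n_days)
    for lag in range(1, n_days):
        total += 2.0 * (n_days - lag) * phi ** lag
    return marginal_var * total
```

With `phi = 0` the variance of a 30-day sum is 30 times the daily variance, whereas a positive lag-one correlation such as `phi = 0.8` gives a substantially larger value. Treating persistent daily errors as independent would therefore understate the spread of monthly totals.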

Use of observed vs forecast rainfall in calibration
The residual error model in the non-seamless monthly QPP model represents combined rainfall and hydrological uncertainty.
It uses forecast rainfall during calibration, and can (in theory) correct for biases and under/over-dispersion in rainfall forecasts.
In contrast, the seamless MuTHRE model represents only hydrological uncertainty, and is calibrated using observed rainfall. Uncertainty due to forecast rainfall is represented by propagating rainfall forecasts through the hydrological model. Since the
https://doi.org/10.5194/hess-2021-589 Preprint. Discussion started: 26 January 2022. © Author(s) 2022. CC BY 4.0 License.