Seamless streamflow forecasting at daily to monthly scales: MuTHRE lets you have your cake and eat it too

Abstract. Subseasonal streamflow forecasts inform a multitude of water management decisions, from early flood warning to reservoir operation. 'Seamless' forecasts, i.e., forecasts that are reliable and sharp over a range of lead times (1-30 days) and aggregation time scales (e.g. daily to monthly), are of clear practical interest. However, existing forecast products are often 'non-seamless', i.e., developed and applied for a single time scale and lead time (e.g. 1 month ahead). If seamless forecasts are to be a viable replacement for existing 'non-seamless' forecasts, it is important that they offer (at least) similar predictive performance at the time scale of the non-seamless forecast. This study compares forecasts from two probabilistic streamflow post-processing (QPP) models: the recently developed seamless daily Multi-Temporal Hydrological Residual Error (MuTHRE) model and the more traditional (non-seamless) monthly QPP model used in the Australian Bureau of Meteorology's Dynamic Forecasting System. Streamflow forecasts from both post-processing models are generated for 11 Australian catchments, using the GR4J hydrological model and pre-processed rainfall forecasts from the ACCESS-S numerical weather prediction model. Evaluating monthly forecasts with key performance metrics (reliability, sharpness, bias and CRPS skill score), we find that the seamless MuTHRE model achieves essentially the same performance as the non-seamless monthly QPP model for the vast majority of metrics and temporal stratifications.


Introduction
Subseasonal streamflow forecasts (with lead times up to 30 days) can be used to inform a range of water management decisions, from flood warning and reservoir flood management at shorter lead times (e.g. up to a week) to river basin management at time scales up to a month. The uncertainty in these forecasts is often represented using ensemble and probabilistic methods.
Probabilistic streamflow forecasts have traditionally been developed and applied at only a single lead time and time scale (e.g., Souza Filho and Lall, 2003; Pal et al., 2013; Hidalgo-Muñoz et al., 2015; Mendoza et al., 2017; Gibbs et al., 2018). However, since different applications require forecasts over a range of lead times and time scales, recent research has focussed on producing seamless forecasts, i.e. forecasts from a single product that are (statistically) reliable and sharp across multiple lead times and aggregation time scales (McInerney et al., 2020). For seamless forecasts to be a viable replacement for more traditional non-seamless forecasts (i.e. forecasts for a single lead time and time scale), it is important to establish that the performance of seamless forecasts is competitive with their non-seamless counterparts at the native time scale of the latter.
Recent research by McInerney et al. (2020) has shown that seamless subseasonal forecasting is achievable. McInerney et al. (2020) developed the Multi-Temporal Hydrological Residual Error (MuTHRE) model for post-processing daily streamflow forecasts in order to improve reliability across a range of time scales. Using a case study with 11 catchments in the Murray Darling Basin, Australia, it was concluded that subseasonal forecasts generated using the MuTHRE streamflow post-processing model are indeed seamless: daily forecasts are consistently reliable (i) for lead times between 1 and 30 days, and (ii) when aggregated to the monthly scale.
Seamless subseasonal forecasts are reliable over a wide range of aggregation time scales (e.g. daily to monthly) and lead times (1-30 days). In contrast, non-seamless forecasts are either: (i) only available at a single time scale (e.g. a post-processing model developed directly at the monthly scale does not generate daily forecasts), or (ii) cannot be reliably aggregated to longer time scales (e.g., from daily to monthly). The practical benefits of seamless forecasts are as follows:
1. Seamless forecasts can be used to inform decisions at a range of time scales. Forecast users can utilize seamless subseasonal forecasts to inform a wide range of decisions, including:
- Flood warning, where short-term forecasts (up to 1 week) on individual days are of practical interest (Cloke and Pappenberger, 2009);
- Managing hydropower systems, which can utilize forecasts of inflow between 7 and 15 days to increase production in the electricity grid (Boucher and Ramos, 2019);
- Managing reservoirs for rural water supply, where forecast volumes over long aggregation scales (e.g. weeks/months), and at long lead times (up to 1 month), are required due to long travel times (Murray-Darling Basin Authority, 2019);
- Operation of urban water supply systems, where monthly forecasts are of value (Zhao and Zhao, 2014).
2. Seamless daily forecasts are easily integrated into river system models used for real-time decision-making. Perhaps the greatest potential for seamless forecasts is their use as input into real-time decision-making tools used by urban and rural water authorities. These tools include river system models (e.g. eWater Source; Welsh et al., 2013), which run natively at the daily scale and are used to inform resource management decisions over larger time scales. Non-seamless streamflow forecasts cannot be used as input into these models, because they do not match the time scale of the river system model, and are not reliable when aggregated to longer time scales (e.g. from daily to monthly).
3. Seamless forecasts simplify forecasting systems, as a single seamless product can serve a range of forecast requirements at different time scales. As forecasts are often required at multiple time scales (e.g. daily to monthly), non-seamless forecast strategies require developing models (e.g. hydrological, statistical or post-processing) for each time scale of interest (e.g. a daily model and a monthly model). Seamless forecasts offer practical benefits to forecast providers, e.g. the Australian Bureau of Meteorology, as they reduce the need to develop multiple non-seamless forecasts for different applications. A seamless forecasting system offers a single product that can serve a wide range of forecast requirements.
These practical benefits of seamless forecasts provide a clear motivation for their development and use. However, for seamless forecasts to be a viable replacement for non-seamless forecasts, it is important that they do not come at the cost of a substantial loss of performance at the native time scale of the non-seamless forecast. For example, if aggregated forecasts from a seamless daily model were considerably worse than monthly forecasts from an existing non-seamless model, users of the monthly forecasts would prefer to continue using forecasts from the non-seamless model. In general, one might expect forecasts from a non-seamless model, developed and calibrated at a single time scale, to provide superior performance compared to forecasts from a seamless model calibrated at a shorter time scale and then aggregated. While the non-seamless model has only 'one job to do', which is to provide quality forecasts at a single time scale, the seamless model is expected to produce good performance over a range of lead times and aggregation time scales. Herein lies a major challenge of seamless forecasting.
Our interest in comparing the performance of aggregated seamless forecasts with non-seamless forecasts at their native time scale has similarities to previous research in aggregating deterministic streamflow predictions. For example, Wang et al. (2011) found that the WAPABA monthly rainfall-runoff model produced similar/better performance than aggregated predictions from the SIMHYD/AWBM daily rainfall-runoff models, despite only using observed monthly forcing data. Yang et al. (2016) compared daily and sub-daily versions of the SWAT model (with daily and sub-daily observed rainfall inputs) and found large differences in the partitioning of baseflow and direct runoff. However, to the best of the authors' knowledge, no studies have compared aggregated probabilistic forecasts from a seamless model against probabilistic forecasts from a non-seamless model.
The aim of this study is to establish whether aggregated forecasts from a (probabilistic) seamless model achieve comparable performance to those from a non-seamless (probabilistic) model at its native time scale.This aim is achieved by comparing the monthly forecast performance of the seamless MuTHRE post-processing model (aggregated from daily to monthly) against the non-seamless monthly streamflow post-processing model used in the Australian Bureau of Meteorology's Dynamic Forecasting System (Woldemeskel et al., 2018).
The remainder of the paper is organized as follows. Section 2 describes the forecasting methods, with a focus on the streamflow post-processing models, Section 3 introduces the case study methods, Sections 4 and 5 present and discuss case study results, and Section 6 provides concluding remarks.

Forecasting methods
The forecasting methods investigated in this study share a similar general structure but differ in the streamflow post-processing model. To facilitate the presentation, this section is organised as follows. The general structure is outlined in Section 2.1.
Common features of the post-processing models are described in Section 2.2.Specific details of the MuTHRE and monthly QPP models are described in Sections 2.3 and 2.4.

General structure
The forecasting methods in this study employ a deterministic hydrological model forced with an ensemble of rainfall forecasts and combined with a streamflow post-processing (QPP) model. This general structure is illustrated schematically in Figure 1 and detailed next. First, the hydrological model is forced with the ensemble of rainfall forecasts to produce 'raw' streamflow forecasts. Second, a probabilistic streamflow post-processing model is applied to the raw forecasts to generate the (post-processed) streamflow forecasts. The streamflow post-processing models are constructed using the residual error modelling approach. They comprise a deterministic component and a residual error model. The residual error model employs a streamflow transformation to represent the heteroscedasticity and skew of the errors, an autoregressive term to represent error persistence, and components to capture other features of errors such as seasonality.
We consider two forecasting methods, which differ in the structure and details of the streamflow post-processing model.A schematic representation of these models is given in Figure 2a.
1. Seamless MuTHRE streamflow post-processing model (McInerney et al., 2020). The residual error model is formulated at the daily scale and is applied directly to (daily) raw streamflow forecasts. Conceptually, the ensemble of raw streamflow forecasts accounts for forecast rainfall uncertainty and the residual error model accounts for hydrological uncertainty.
2. Non-seamless monthly streamflow post-processing (QPP) model (Woldemeskel et al., 2018). The residual error model is formulated at the monthly scale. It is applied to raw streamflow forecasts aggregated to the monthly scale and collapsed to their median value. Conceptually, the residual error model accounts for both hydrological and forecast rainfall uncertainty.
The post-processing models also differ in their parameter estimation (calibration) procedure. Figure 2b shows that the MuTHRE model is calibrated using observed daily rainfall and observed daily streamflow, whereas the monthly QPP model is calibrated to forecast daily rainfall and observed monthly streamflow (see Sections 2.3.4 and 2.4.4 for details).
Figure 2c illustrates the key operational distinction between the models. The MuTHRE model produces seamless daily streamflow forecasts that can be used at a range of lead times and aggregation periods (e.g. daily, weekly, fortnightly, monthly).
In contrast, the monthly QPP model produces only one-month ahead non-seamless monthly forecasts.
The next section presents common features of the post-processing models, before moving to specific model details.

Deterministic component
The deterministic component $q_t^{\rm det}$ is obtained from the raw streamflow forecasts (Figure 2a). The deterministic components used in the seamless MuTHRE and non-seamless monthly streamflow post-processing approaches are detailed in Sections 2.3.2 and 2.4.2, respectively.

Residual error model
The residual error model describing the relationship between the probabilistic streamflow estimate $Q_t$ and the deterministic component $q_t^{\rm det}$ is formulated as additive in transformed space,

$z(Q_t; \theta_z) = z(q_t^{\rm det}; \theta_z) + \eta_t$,    (1)

where $\eta_t$ is a random residual error term. The transformation $z$, with parameters $\theta_z$, is used to reduce the heteroscedasticity and skewness in residuals. We choose the Box-Cox transformation (e.g., Box and Cox, 1964),

$z(q; \lambda, A) = \dfrac{(q + A)^{\lambda} - 1}{\lambda}.$    (2)

The power parameter $\lambda$ is set to 0.2 in both streamflow post-processing models (McInerney et al., 2017). In the seamless MuTHRE model, the offset parameter $A$ is inferred as part of the hydrological model calibration (McInerney et al., 2020), while in the non-seamless monthly QPP model it is set to 1% of the mean observed monthly streamflow, i.e. $A = 0.01\,{\rm mean}(q^{\rm mon})$ (Woldemeskel et al., 2018).
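The Box-Cox transformation with offset described above can be sketched as follows (a minimal illustration; the function names and example flow values are ours, not from the cited papers):

```python
import numpy as np

def box_cox(q, lam=0.2, A=0.0):
    """Box-Cox transform with offset A: z = ((q + A)**lam - 1) / lam."""
    q = np.asarray(q, dtype=float)
    return ((q + A) ** lam - 1.0) / lam

def box_cox_inverse(z, lam=0.2, A=0.0):
    """Back-transform from z-space to streamflow space."""
    z = np.asarray(z, dtype=float)
    return (lam * z + 1.0) ** (1.0 / lam) - A

# Hypothetical daily flows (mm/day): the transform compresses high flows,
# reducing heteroscedasticity and skew of the residuals
q = np.array([0.5, 2.0, 10.0, 50.0])
z = box_cox(q, lam=0.2, A=0.01)
q_back = box_cox_inverse(z, lam=0.2, A=0.01)
```

Note the transform is monotonic, so the back-transform recovers the original flows exactly.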
The residual error term $\eta_t$ is standardized and then modelled as an AR(1) process,

$\nu_t = (\eta_t - \mu_t)/\sigma_t$,    (3)

$\nu_t = \varphi\,\nu_{t-1} + y_t$,    (4)

where $\mu_t$ and $\sigma_t$ are the mean and scaling factor of the residual errors, $\varphi$ is the lag-1 autoregressive parameter, and $y_t$ is an independent innovation.
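A sketch of generating a standardized AR(1) residual series and de-standardizing it back to a residual replicate (parameter values and function names are illustrative, not calibrated values from the study):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_ar1(n, phi, sd_innov, rng):
    """Generate an AR(1) series nu_t = phi * nu_{t-1} + y_t, Gaussian innovations."""
    nu = np.zeros(n)
    # start from the stationary distribution of the AR(1) process
    nu[0] = rng.normal(0.0, sd_innov / np.sqrt(1.0 - phi ** 2))
    for t in range(1, n):
        nu[t] = phi * nu[t - 1] + rng.normal(0.0, sd_innov)
    return nu

phi, sd_innov = 0.8, 0.6            # illustrative values
nu = simulate_ar1(10_000, phi, sd_innov, rng)

# de-standardize: eta_t = mu_t + sigma_t * nu_t (MuTHRE fixes sigma_t = 1)
mu_t, sigma_t = 0.0, 1.0
eta = mu_t + sigma_t * nu

# lag-1 autocorrelation of the simulated series should be close to phi
acf1 = np.corrcoef(nu[:-1], nu[1:])[0, 1]
```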

Model structure
The seamless MuTHRE post-processing model operates at the daily time scale. Uncertainty due to forecast rainfall and hydrological errors is represented using the ensemble dressing approach (Pagano et al., 2013). The ensemble of daily raw streamflow forecasts, $q^{\rm raw}$, obtained by propagating an ensemble of rainfall forecasts through the hydrological model $h$, accounts for forecast rainfall uncertainty. A randomly generated replicate of the residual term, $\eta$, is then added to each of the $N_{\rm foc}$ raw streamflow forecast ensemble members to account for hydrological uncertainty. This produces an ensemble of $N_{\rm foc}$ post-processed streamflow forecasts. See schematic in Figure 2a. Note that this approach to capturing forecast rainfall and hydrological uncertainty requires the rainfall forecasts to be reliable in order to produce reliable streamflow forecasts (Verkade et al., 2017).
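A rough illustration of the ensemble dressing step (this is not the authors' implementation: the gamma raw ensemble, the simple Gaussian residual, and all variable names are invented for the sketch; residuals are added in Box-Cox transformed space as in equation (1)):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw ensemble: N_foc members x 30 lead days (rainfall uncertainty)
n_foc, n_days = 100, 30
q_raw = rng.gamma(shape=2.0, scale=3.0, size=(n_foc, n_days))

# Dress each raw member with one residual replicate (hydrological uncertainty),
# added in transformed space
lam, A = 0.2, 0.01
z_raw = ((q_raw + A) ** lam - 1.0) / lam
eta = rng.normal(0.0, 0.3, size=(n_foc, n_days))   # illustrative residual replicates
z_post = z_raw + eta
# back-transform (clip to keep the base of the power non-negative)
q_post = np.maximum(lam * z_post + 1.0, 1e-12) ** (1.0 / lam) - A
```

Dressing widens the ensemble: the post-processed spread reflects both rainfall and hydrological uncertainty.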

Deterministic component
In the context of equation (1), the deterministic component $q_t^{\rm det}$ is set to each ensemble member of the raw daily streamflow forecasts in turn, i.e., the residual error model is applied directly to each ensemble member of the raw forecasts (Figure 2a).

Residual error model
The MuTHRE model assumes that the mean of the residual error, $\mu_t$ in equation (3), varies in time due to 'seasonality' and 'dynamic biases' (associated with hydrologic non-stationarity); $\mu_t$ combines a seasonality component and a dynamic bias component. The scaling factor $\sigma_t$ in equation (3) is constant (set to 1 for simplicity).
Innovations are modelled using a two-component mixed-Gaussian distribution,

$y_t \sim w_1\,N(\mu_1, \sigma_1^2) + (1 - w_1)\,N(\mu_2, \sigma_2^2)$,

where $\mu_1$ and $\mu_2$ are the means of the two components, which are set to zero, $\sigma_1$ and $\sigma_2$ are the standard deviations of the components, and $w_1$ is the weight of the first component. Compared to a standard Gaussian distribution, the mixed-Gaussian distribution allows for fatter tails (i.e., excess kurtosis) in the distribution of innovations, which has been shown to improve the reliability of daily forecasts at short lead times (Li et al., 2016). Note that the mixed-Gaussian distribution does not offer benefits at longer lead times, nor when aggregating forecasts to the monthly scale (McInerney et al., 2020).
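Sampling from such a mixture is straightforward, and the fatter tails show up as positive excess kurtosis (the weights and standard deviations below are invented for illustration, not calibrated values):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_mixed_gaussian(n, w1, sd1, sd2, rng):
    """Sample innovations from a two-component zero-mean Gaussian mixture."""
    component = rng.random(n) < w1          # pick component 1 with probability w1
    return np.where(component,
                    rng.normal(0.0, sd1, n),
                    rng.normal(0.0, sd2, n))

# Illustrative parameters: a dominant narrow component plus a wide one
y = sample_mixed_gaussian(200_000, w1=0.9, sd1=1.0, sd2=3.0, rng=rng)

# Excess kurtosis of the mixture is positive (fatter tails than a Gaussian)
kurtosis = np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0
```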

Calibration of residual error model
The parameters of the residual error model, including the autoregressive parameter $\varphi$ and the mixed-Gaussian parameters $\{\sigma_1, \sigma_2, w_1\}$, are estimated using maximum likelihood. Full details of the calibration procedure are provided in McInerney et al. (2020).

Model structure
The non-seamless monthly QPP model operates at the monthly time scale. The raw forecasts are aggregated from the daily to the monthly scale and collapsed to their median value, yielding $q^{\rm det,mon}$, i.e., the uncertainty from the raw streamflow ensemble is discarded. The combined forecast rainfall uncertainty and hydrological uncertainty are represented through the residual error term $\eta$. Monthly streamflow forecasts are obtained from $q^{\rm det,mon}$ by adding $N_{\rm foc}$ replicates of $\eta$. See schematic in Figure 2a.

Deterministic component
The deterministic component in the non-seamless model at its monthly time step $t$ is computed as follows,

$q^{\rm raw,mon}(t) = \underset{t^* \in T(t)}{\rm average}\; q^{\rm raw}(t^*)$,

where $T(t)$ is the averaging window (range of days) corresponding to the monthly time step $t$; the deterministic component $q^{\rm det,mon}(t)$ is then the median of $q^{\rm raw,mon}(t)$ over the raw forecast ensemble.
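The aggregate-then-collapse computation above can be sketched in two lines of NumPy (the gamma ensemble and variable names are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical raw daily forecast ensemble: N_foc members x days in the month
n_foc, n_days = 100, 30
q_raw = rng.gamma(shape=2.0, scale=3.0, size=(n_foc, n_days))

# Step 1: aggregate each member over the averaging window T(t)
q_raw_mon = q_raw.mean(axis=1)        # one monthly value per ensemble member

# Step 2: collapse the ensemble to its median -> deterministic component
q_det_mon = np.median(q_raw_mon)      # ensemble spread is discarded here
```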

Residual error model
The residual error model is applied at the monthly scale after collapsing the ensemble of raw forecasts to a single time series.
The monthly residual error model captures seasonality in residuals by varying the mean of the residual errors with the calendar month.

Calibration of residual error model
The parameters of the monthly residual error model are calibrated using: (i) monthly deterministic forecasts $q^{\rm det,mon}$ obtained using forecast rainfall as described in Section 2.4.2; and (ii) monthly observed streamflow $q^{\rm mon}$.
All parameters are calibrated using the method of moments. Full details are provided in Woldemeskel et al. (2018).

Catchments and Data
The case study uses a set of 11 catchments from the Murray Darling Basin in Australia, including four catchments on the Upper Murray River (NSW and Victoria) and seven catchments on the Goulburn River (Victoria). These catchments have winter-dominated rainfall, which leads to higher streamflow between June and October (see Figure 3), and have fewer than 5% missing streamflow data.
Rainfall forecasts are provided by the Australian Community Climate Earth-System Simulator - Seasonal (ACCESS-S) (Hudson et al., 2017). The ACCESS-S rainfall forecasts are pre-processed using the method of Schepen et al. (2018) in order to reduce biases and improve reliability in comparison to observed rainfall. An ensemble of 100 pre-processed rainfall forecasts that begin on the first day of each month and extend out to a maximum lead time of 1 month is used.

Hydrological model
The conceptual rainfall-runoff model GR4J (Perrin et al., 2003) is used as the deterministic hydrological model $h$ for simulating daily streamflow from rainfall and PET inputs (see Section 2.1). GR4J has been widely used and evaluated over diverse catchment climatologies and physical characteristics (Perrin et al., 2003; Hunter et al., 2021). GR4J represents the processes of interception, infiltration and percolation, and has four calibration parameters: $x_1$ is the capacity of the production store (mm), $x_2$ is the water exchange coefficient (mm), $x_3$ is the capacity of the routing store (mm), and $x_4$ is the time parameter of the unit hydrograph (days).

Calibration/evaluation procedure
Calibration of model parameters and evaluation of forecasts is performed using a leave-one-year-out cross validation procedure (McInerney et al., 2020). For each calendar year $j$, hydrological and residual error model parameters are calibrated using observed streamflow data from the entire evaluation period, except for year $j$ and the subsequent years $j+1$ to $j+4$ (which are excluded to reduce the influence of system memory on model evaluation, as described in Pokhrel et al., 2013).
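The leave-one-year-out split with the 4-year memory buffer can be sketched as follows (the year range is hypothetical; the paper uses a 22-year record but does not state the exact years here):

```python
# Sketch of the leave-one-year-out cross-validation split: year j and the
# following 4 years are withheld from calibration to limit system memory
years = list(range(1995, 2017))            # hypothetical 22-year evaluation period

def calibration_years(j, all_years, gap=4):
    """Return the years used for calibration when evaluating year j."""
    excluded = set(range(j, j + gap + 1))  # j, j+1, ..., j+4
    return [y for y in all_years if y not in excluded]

calib_2000 = calibration_years(2000, years)   # 22 - 5 = 17 calibration years
```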
Hydrological model parameters are estimated using likelihood maximisation based on the BC0.2 error model (McInerney et al., 2020), implemented using a quasi-Newton optimization algorithm run with 100 independent multistarts (Kavetski and Clark, 2010).Methods for estimating residual error model parameters are described in Sections 2.3.4 and 2.4.4.
Note that in this work we do not consider parametric uncertainty (in the hydrological and residual error models), which is expected to be a (relatively) minor contributor to total forecast uncertainty given the long data period used in the estimation; this simplification is common in contemporary forecasting implementations (e.g., Engeland and Steinsland, 2014;Verkade et al., 2017).
For each year $j$, calibrated hydrological and error models are used to generate an ensemble of 100 streamflow forecasts. Daily forecasts from the MuTHRE model begin on the first day of each month, and extend out to a maximum lead time of 1 month (the same as the rainfall forecasts). This calibration/forecasting process is repeated for all 22 years, resulting in 22 sets of one-year forecasts, which are subsequently merged into a single 22-year forecast to facilitate evaluation against streamflow observations.

Performance metrics
Streamflow forecasts are evaluated using numerical metrics for the following attributes:
Reliability refers to the degree of statistical consistency between the forecast distribution and the observed data. It is evaluated using the reliability metric of Evin et al. (2014). Lower metric values are better, with 0 representing perfect reliability and 1 representing the worst reliability.
Sharpness refers to the spread of the forecast distribution, with sharper forecasts being those with lower spread. We use the sharpness metric of McInerney et al. (2020), which is based on the ratio of the average 90% inter-quantile range (IQR) of the forecasts to that of a climatological distribution (described below). Lower values are better, with 0 representing a deterministic forecast (with no spread) and 1 representing the same sharpness as climatology. In contrast to the other attributes considered here, sharpness is a property of the forecast only and does not depend on the observed data.
Volumetric bias refers to the long-term water balance error.It is quantified using the metric of McInerney et al. (2017) as the relative absolute difference between total observed streamflow and the total forecast streamflow (averaged over the forecast ensemble).Lower values are better, with 0 representing unbiased forecasts.
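The sharpness and volumetric bias computations can be sketched as follows (a minimal sketch on synthetic data, not the exact metric implementations of McInerney et al.; all data and function names are invented):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic observations, a forecast ensemble that tracks them, and a
# climatological ensemble that ignores them
n_ens, n_t = 100, 500
obs = rng.gamma(2.0, 3.0, size=n_t)
fcst = obs[None, :] * rng.lognormal(0.0, 0.3, size=(n_ens, n_t))
clim = rng.choice(obs, size=(n_ens, n_t))

def sharpness(fcst, clim):
    """Mean 90% inter-quantile range of the forecast relative to climatology."""
    iqr90 = lambda x: np.percentile(x, 95, axis=0) - np.percentile(x, 5, axis=0)
    return np.mean(iqr90(fcst)) / np.mean(iqr90(clim))

def volumetric_bias(fcst, obs):
    """Relative absolute difference between total forecast and observed volume."""
    total_fcst = fcst.mean(axis=0).sum()   # ensemble-mean forecast, summed in time
    return abs(total_fcst - obs.sum()) / obs.sum()

s = sharpness(fcst, clim)      # < 1: sharper than climatology
b = volumetric_bias(fcst, obs) # near 0: small water-balance error
```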
Combined performance is quantified using the continuous ranked probability score (CRPS). The CRPS is defined as the integral of the squared difference between the forecast cumulative distribution function (CDF) and the empirical CDF of the observation, averaged over all time steps.
Note that the CRPS can be decomposed into terms representing individual performance aspects, namely reliability and uncertainty/resolution (related to sharpness) (Hersbach, 2000). We express this metric as a skill score (CRPSS) relative to the climatological distribution. Higher CRPSS values are better, with a value of 1 representing a perfectly accurate deterministic forecast, and 0 representing the same skill as the climatological distribution.
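A sketch of the ensemble CRPS (energy form) and the skill score relative to climatology, on synthetic data (the data and forecast construction are invented; this is not the study's scoring code):

```python
import numpy as np

rng = np.random.default_rng(4)

def crps_ensemble(ens, y):
    """CRPS of an ensemble forecast for one observation y:
    E|X - y| - 0.5 * E|X - X'| (energy form)."""
    ens = np.asarray(ens, dtype=float)
    term1 = np.mean(np.abs(ens - y))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2

# Synthetic observations, a skilful forecast ensemble, and climatology
obs = rng.gamma(2.0, 3.0, size=200)
fcst = obs[None, :] * rng.lognormal(0.0, 0.2, size=(100, 200))
clim = rng.choice(obs, size=(100, 200))

crps_f = np.mean([crps_ensemble(fcst[:, t], obs[t]) for t in range(200)])
crps_c = np.mean([crps_ensemble(clim[:, t], obs[t]) for t in range(200)])
crpss = 1.0 - crps_f / crps_c     # skill score relative to climatology
```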
The climatological distribution represents the distribution of daily streamflow for a given time of the year based solely on previously observed streamflow at that time of the year. The climatological distribution is constructed using a 29-day moving-window approach, described in detail in McInerney et al. (2020).

Aggregation and stratification
The study focuses on the performance of the streamflow post-processing models at the monthly scale.The monthly MuTHRE forecasts are obtained by aggregating daily forecasts to the monthly scale.The monthly QPP model generates monthly forecasts directly.
Overall evaluation of monthly forecasts is performed using data from the entire evaluation period, i.e. all months and years, with more detailed stratified performance evaluation performed for individual months and years.
We also demonstrate the ability of the MuTHRE model to produce seamless forecasts, which are reliable over a range of lead times and aggregation scales. This is achieved by evaluating both (i) daily forecasts stratified by lead times from 1-28 days, and (ii) cumulative flow forecasts for periods of 1-28 days. The forecast is considered 'seamless' if reliability metrics are similar across all lead times and aggregation scales. The evaluation of cumulative flow forecasts expands on the analysis of McInerney et al. (2020), who evaluated only daily and monthly forecasts, and provides an important demonstration of seamless forecasting over the entire range of time scales from 1 to 28 days. We note that cumulative flow forecasts over 1 month correspond to monthly forecasts.
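The two evaluation views, stratification by lead time and cumulative flows, can be sketched as array operations on a daily forecast ensemble (the gamma ensemble and names are invented for the sketch):

```python
import numpy as np

rng = np.random.default_rng(5)

# Daily forecast ensemble for one month: members x lead days (1..28)
ens = rng.gamma(2.0, 3.0, size=(100, 28))

# (i) stratification by lead time: column d-1 holds all forecasts at lead d
lead_7 = ens[:, 6]                 # forecasts at a lead time of 7 days

# (ii) cumulative flow forecasts: running sum over lead time, per member
cum = np.cumsum(ens, axis=1)       # cum[:, d-1] = flow accumulated over d days
monthly = cum[:, -1]               # 28-day cumulative flow ~ monthly forecast
```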

Evaluation of practical significance of differences between streamflow post-processing models
Forecast performance of the two streamflow post-processing models is compared across multiple catchments using practical significance tests, as described next.For each combination of performance metric (e.g., reliability) and stratification (e.g., month), a statistical test is used to determine whether differences in metric values over the range of catchments exceed a predefined margin representing practical significance (relevance).
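The multiple-comparison control used in this screening can be sketched as follows: a minimal pure-NumPy Benjamini-Hochberg step on hypothetical p-values (the paper pairs this with the paired Wilcoxon signed rank test, which is omitted here; the p-values below are invented):

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of rejected hypotheses, controlling the FDR at q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m       # q * i / m for sorted p_(i)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.where(below)[0])             # largest i with p_(i) <= q*i/m
        reject[order[: k + 1]] = True
    return reject

# Hypothetical p-values from paired tests across stratifications
p_vals = [0.001, 0.004, 0.03, 0.2, 0.5, 0.8]
rejected = benjamini_hochberg(p_vals, q=0.05)      # only the two smallest survive
```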
The statistical tests are performed using the paired Wilcoxon signed rank test (Bauer, 1972), with controls applied to reduce the false discovery rate to 5%, corresponding to a confidence level of 95% (Benjamini and Hochberg, 1995; Wilks, 2006). The practical significance margin is taken as 20% of the median metric value for the non-seamless monthly QPP model (following McInerney et al., 2020).

Figure 5 (left column) shows the performance of daily forecasts from the MuTHRE model for lead times of 1-28 days, evaluated over all case study catchments. The key finding from this analysis is that reliability is relatively constant over all lead times, with median metric values lying in the tight range of 0.04-0.06 (Figure 5a). We also note that forecasts are sharper and have better CRPSS at short lead times, and that bias is relatively constant.

Cumulative flow forecasts
Figure 4b shows cumulative flow forecasts out to 28 days in the Biggara catchment for the representative time period. The cumulative flows based on observed streamflow lie well within the 90% probability limits of the MuTHRE forecasts for all lead times. Figure 5 (right column) shows the performance of cumulative flow forecasts from the MuTHRE model for lead times of 1-28 days over all catchments. Again, we see that reliability is relatively constant over all lead times, with median metric values between 0.04 and 0.06 (Figure 5b). We also note that sharpness, volumetric bias and CRPSS metrics are typically better for cumulative forecasts than for daily forecasts (compare left and right columns in Figure 5). In contrast to the seamless MuTHRE model, the non-seamless monthly QPP model does not have the capability to produce forecasts of daily streamflow and cumulative flows for time periods below one month.

Figure 7 compares monthly forecasts from the MuTHRE and monthly QPP models in terms of overall performance (left column), when stratified by month (middle column), and when stratified by year (right column). The key findings are as follows.

Comparison of monthly forecasts
Reliability. Figure 7a shows similar overall reliability of monthly forecasts from the MuTHRE and monthly QPP models.
While the median metric value of 0.06 for the MuTHRE model is worse than the median value of 0.04 for the monthly QPP model, these differences are not practically significant (based on the test described in Section 3.4.3). Figure 7b shows that when performance is stratified by month, the two models have similar reliability (i.e. differences are not practically significant) for all 12 months. When stratified by year, the MuTHRE model achieves similar reliability to the monthly QPP model for 20 out of the 22 years, while the monthly QPP model achieves practically significant improvements in 2 of the 22 years (Figure 7c).

Sharpness.
Figure 7d shows that the overall sharpness of monthly forecasts from the MuTHRE model is slightly better than that of the monthly QPP model (median metric values of 0.44 cf. 0.49), although differences are not practically significant. Figure 7e shows that when sharpness is stratified by month, the MuTHRE model provides practically significant improvement in September and similar performance in the other 11 months. Figure 7f shows sharpness stratified by year is similar for both models for all years.
Volumetric bias. Figure 7g shows that the overall volumetric bias from both models is similar (median of 0.01). Figure 7h shows that when stratified by month, the MuTHRE model produces practically significant improvements in December and similar performance in the remaining 11 months. Figure 7i shows that when stratified by year, the MuTHRE model produces practically significant improvements in 1 year (2005), while the monthly QPP model provides practically significant improvements in 3 years, with similar performance in the remaining 18 years.

CRPSS.
In terms of overall CRPSS, Figure 7j shows that the MuTHRE model (median metric value of 0.45) provides a slight improvement over the monthly QPP model (median metric value of 0.42), although these differences are not practically significant. Figure 7k shows that when stratified by month, the MuTHRE model provides similar performance in all 12 months. Figure 7l shows that when performance is stratified by year, the MuTHRE model provides practically significant improvements in CRPSS in 2 out of 22 years, and similar performance in the remaining 20 years.
In summary, aggregated forecasts from the seamless MuTHRE model offer similar (differences not practically significant), and in some cases superior, performance compared to forecasts from the non-seamless monthly QPP model, for the vast majority of performance metrics and stratifications considered in this study.

Interpretation of key findings
The empirical results show that the seamless MuTHRE model achieves essentially the same performance as the non-seamless monthly QPP model at the monthly time scale, and even provides improvements in some aspects. At first glance, this outcome may seem surprising for the following reasons:
- The seamless MuTHRE model is required to produce reliable forecasts over a range of lead times and aggregation scales, whereas the non-seamless monthly QPP model is only required to produce reliable monthly streamflow forecasts;
- The seamless MuTHRE model is calibrated at the daily scale, using only observed daily streamflow during calibration, while the non-seamless monthly QPP model is calibrated to match the observed monthly streamflow;
- The seamless MuTHRE model does not 'see' the forecast rainfall during calibration, whereas the non-seamless monthly QPP model does.
The subsections below describe how the seamless MuTHRE model is able to achieve comparable/better performance than the non-seamless monthly QPP model despite these apparent challenges.

Time scale of forecasting/calibration
The seamless MuTHRE model produces daily forecasts that can be aggregated from time scales of one day to one month, whereas the non-seamless monthly QPP model produces forecasts only at the monthly scale. One might expect the enhanced capability obtained from the seamless MuTHRE model to come at some cost in performance at the monthly scale. Encouragingly, this is not the case.
The ability to reliably aggregate daily forecasts to the monthly scale demonstrates that the seamless MuTHRE model is adequately capturing temporal persistence in daily forecasts. The MuTHRE model represents temporal persistence in hydrological errors using the daily AR(1) model, and the (30-day) dynamic bias component. This is important because neglecting temporal persistence in hydrological errors can result in an underestimation of hydrological uncertainty for aggregated predictions/forecasts (Evin et al., 2014). The reliability of aggregated forecasts also suggests that the (pre-processed) rainfall forecasts are capturing the day-to-day temporal persistence of observed rainfall required to produce reliable monthly rainfall forecasts (see Section 5.1.2).
The seamless MuTHRE model is not calibrated to optimize performance at the monthly scale, as it uses only observed daily streamflow during calibration. On the other hand, the non-seamless monthly QPP model is calibrated to match the observed monthly streamflow, which could lead to improved performance at the monthly scale compared to the seamless MuTHRE model. As such, the comparable performance of the MuTHRE model at the monthly scale is particularly encouraging given that monthly data is not used in its calibration.

Use of observed vs forecast rainfall used in calibration
Both approaches use the same deterministic hydrological model calibrated using observed rainfall and streamflow data.
However, due to structural differences in their representation of residual errors, the seamless MuTHRE and non-seamless monthly QPP models differ in the approach used to calibrate the residual error model parameters. The residual error model in

Future work
Future work is recommended on the following aspects:
- Further testing and development of the MuTHRE model on a wide range of catchments. The monthly QPP model has been comprehensively evaluated on 300 catchments around Australia (Woldemeskel et al., 2018).
- Evaluation of how the quality of rainfall forecasts impacts the performance of the seamless MuTHRE model and its ability to match or improve on the performance of the non-seamless monthly QPP model at the monthly scale.

Conclusions
Subseasonal streamflow forecasts at time scales ranging from daily to monthly are of major interest in water management.
This study compares two streamflow post-processing (QPP) models, namely the 'seamless' daily Multi-Temporal Hydrological Residual Error (MuTHRE) model and the more traditional 'non-seamless' monthly QPP model used in the Australian Bureau of Meteorology's Dynamic Forecasting System. The MuTHRE model is designed at the daily scale and can be aggregated up to the monthly scale, whereas the monthly QPP model is designed directly at the monthly scale and does not produce forecasts at the daily scale. A case study with 11 catchments in south-east Australia, the GR4J conceptual rainfall-runoff model, and pre-processed ACCESS-S rainfall forecasts, is reported.
The key finding is that the seamless MuTHRE model achieves essentially the same monthly-scale performance as the non-seamless monthly QPP model for the majority of metrics (reliability, sharpness, bias and CRPSS) and stratifications (monthly and yearly). Remarkably, the seamless post-processing model achieves high quality forecasts (based on the metrics considered in this study) at its native daily scale and matches the performance of the non-seamless monthly model at the monthly scale, despite not being calibrated at that time scale. Seamless subseasonal forecasts, which are reliable over a wide range of lead times (1-30 days) and time scales (daily-monthly), offer numerous practical benefits over non-seamless forecasts. For users, seamless subseasonal forecasts can inform a wide range of management decisions from flood warning to water supply operation, while for service providers, seamless forecasts reduce the number of forecast products that require development and operation: a single modelling tool provides the full range of forecast time scales. The encouraging results from this study help motivate broader adoption of seamless forecasts, as they offer additional capability without loss in performance.

Figure 1: Illustration of the general approach used to produce streamflow forecasts. Layers represent ensemble members.

Figure 2: Conceptual diagrams of the seamless MuTHRE model and the non-seamless monthly QPP model. Ensemble components are indicated with multiple 'layers'. Panel (a) shows the post-processing model structure, including the deterministic component and the residual error model (REM). Panel (b) shows the calibration approach used to estimate the parameters of the streamflow post-processing models.


The seasonality term μ(s)_t describes the mean value of μ on day-of-the-year d_t, the dynamic bias term μ(b)_t describes the mean value of μ (after removing seasonality) over the preceding N_b days (N_b = 30 is used), and μ* is a constant that captures the remaining bias. Full details of these terms are provided in McInerney et al. (2020).
The residual error model parameters are estimated from the following daily scale data (see Figure 2b): (i) daily hydrological model simulations q_sim forced with observed rainfall x; (ii) daily observed streamflow q. The seasonality (μ(s)) and dynamic bias (μ(b)) terms are calculated using moving averages, and the parameters μ* and ρ are estimated as the sample mean and lag-1 auto-correlation of the de-trended residuals, while the mixed-
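The moment-based estimation described above can be sketched on synthetic residuals. This is a minimal illustration, not the study's implementation: the variable names, the day-of-year averaging (McInerney et al. (2020) give the full smoothing details), and the synthetic data are all assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical daily residuals with a seasonal cycle plus noise (illustrative).
dates = pd.date_range("1991-01-01", "2000-12-31", freq="D")
eta = pd.Series(0.3 * np.sin(2 * np.pi * dates.dayofyear / 365.25)
                + 0.1 * rng.standard_normal(len(dates)), index=dates)

# Seasonality mu_s: mean residual for each day of the year.
doy_mean = eta.groupby(eta.index.dayofyear).mean()
mu_s = eta.index.dayofyear.map(doy_mean).to_numpy()

# Dynamic bias mu_b: moving average of the deseasonalised residual over the
# preceding N_b = 30 days (shifted so only past days are used).
deseason = eta - mu_s
mu_b = deseason.rolling(30, min_periods=1).mean().shift(1).fillna(0.0)

# Remaining constant bias mu_star (sample mean) and lag-1 auto-correlation rho
# of the de-trended residuals.
detrended = deseason - mu_b
mu_star = detrended.mean()
rho = detrended.autocorr(lag=1)

print(round(float(mu_star), 3), round(float(rho), 3))
```

On this synthetic series the seasonal cycle is absorbed by mu_s, so mu_star is close to zero; with real residuals these moments are computed from the model simulations and observations listed above.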


by month. Innovations are assumed to be independent and identically distributed Gaussian, and the standard deviation of the innovations is a model parameter.
the following monthly scale data (see Figure 2b):

Figure 3: Location of the 11 case study catchments (panel a), and mean observed streamflow for each month (panel b) and each year.

Figure 4: Time series of daily and cumulative probabilistic forecasts from the seamless MuTHRE model for the Murray River at Biggara (401012, see Figure 3) for May 2002. The non-seamless monthly QPP model does not have the capability to produce these forecasts.

Figure 4 illustrates the streamflow forecast time series in the Biggara catchment (Catchment ID 401012, see Figure 3). Daily forecasts from the seamless MuTHRE model for a representative time period beginning on 1 May 2002 are shown in Figure 4a. The observed daily streamflow lies within the 90% probability limits of the MuTHRE forecasts for each lead time. As expected, the probability limits are tight for short lead times (when forecast rainfall uncertainty and hydrological uncertainty are small), and widen for longer lead times.

Figure 5: Performance of MuTHRE forecasts in terms of daily streamflow (left) and cumulative flow (right). Metrics shown for reliability (top row), sharpness (second row), volumetric bias (third row) and CRPSS (bottom row). The bars indicate the full range of metric values across the 11 case study catchments and the line indicates the median metric values. Note the inverted y-axis for CRPSS, for visual consistency with the other metrics.
In summary, the forecasts from the MuTHRE model are seamless, because they are reliable over (a) the range of lead times, and (b) multiple aggregation scales, from the shortest scale of 1 day, to the longest scale of 1 month, and everything in between.This result confirms and extends previous findings in McInerney et al. (2020) who focused on daily and monthly scales only.
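For concreteness, two of the metrics used throughout (reliability, assessed via the probability integral transform, and the CRPS) can be computed from an ensemble forecast as sketched below. This is a generic illustration using the standard energy-form CRPS estimator and a synthetic Gaussian ensemble; it is not necessarily the implementation used in the study, and all names and values are illustrative.

```python
import numpy as np

def crps_ensemble(ens, obs):
    """Energy-form CRPS estimator for a single ensemble forecast:
    mean |X - y| - 0.5 * mean |X - X'|, averaged over ensemble members."""
    ens = np.asarray(ens, dtype=float)
    term1 = np.mean(np.abs(ens - obs))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2

def pit_value(ens, obs):
    """Probability integral transform: forecast CDF evaluated at the
    observation. Uniform PIT values over many forecasts indicate reliability."""
    return np.mean(np.asarray(ens) <= obs)

rng = np.random.default_rng(2)
ens = rng.normal(10.0, 2.0, size=1000)  # hypothetical forecast ensemble
obs = 11.0

print(round(crps_ensemble(ens, obs), 3), round(pit_value(ens, obs), 3))
```

Lower CRPS indicates a better (sharper and better-centred) forecast; the CRPS skill score (CRPSS) then compares this against a reference climatology forecast.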

Figure 6: Time series of monthly probabilistic forecasts for the Murray River at Biggara (401012, see Figure 3) from the seamless MuTHRE model and the non-seamless monthly QPP model. Results are shown between the years 2000 and 2011.

Figure 6 compares monthly forecasts from the seamless MuTHRE model and the non-seamless monthly QPP model for the Biggara catchment. While there are some minor differences between the two forecasts (e.g. the monthly QPP model produces larger spread than the MuTHRE model during 2010), the two forecasts are clearly very similar.

Figure 7: Overall performance (all months and years, left column), performance stratified by month (middle column) and performance stratified by year (right column), of monthly forecasts from the seamless MuTHRE and non-seamless monthly QPP models.

In contrast, the MuTHRE model has currently been evaluated on 11 catchments in the Murray Darling Basin. Evaluation of the MuTHRE model over a wide range of hydro-climatic conditions is required to ensure the findings of this study are robust. Potential enhancements of the MuTHRE model, including specialised treatment of zero flows in ephemeral catchments (McInerney et al., 2019; Wang et al., 2020), may be required to ensure the MuTHRE model remains competitive with the monthly QPP model over a wider range of flow regimes.
- Deeper understanding of the reasons for the MuTHRE model matching the monthly QPP model at the monthly scale. For example, systematic testing of different combinations of MuTHRE and monthly QPP model components could help diagnose the specific reasons why the MuTHRE model performs so well.
), the deterministic component in the MuTHRE model at its daily time step t is

of days with no flow. Catchment properties are summarised in Table 1. This same set of catchments was used to extensively evaluate the MuTHRE model in McInerney et al. (2020). Time series of daily observed streamflow over a 22-year period between 1991 and 2012 are obtained from the Hydrologic Reference Stations (HRS) dataset (http://www.bom.gov.au/water/hrs). Observed rainfall and PET data over the same period are obtained from the Australian Bureau of Meteorology's climate data service (www.bom.gov.au/climate), with a climatological average used for PET (McInerney et al., 2021).