Evaluating a landscape-scale daily water balance model to support spatially continuous representation of flow intermittency throughout stream networks

There is a growing interest globally in the spatial distribution and temporal dynamics of intermittently flowing streams and rivers, and how this varies in relation to climatic and other environmental factors. However, biases in the distribution of stream gauges may give a misleading impression of spatial-temporal variations in streamflow intermittency within river networks. Here, we developed an approach to quantify catchment-wide streamflow intermittency over long time frames and in a spatially explicit manner, using readily accessible and spatially contiguous daily runoff data from a national-scale water balance model. We examined the ability of the water balance model to simulate streamflow in two hydro-climatically distinctive (subtropical and temperate) regions in Australia, with a particular focus on low-flow simulations. We also evaluated the effect of model time step (daily vs. monthly) on flow intermittency estimation to inform future model selection. The water balance model showed better performance in the temperate region characterised by steady baseflow than in the subtropical region with flashy hydrographs and frequent cease-to-flow periods. The model tended to overestimate low-flow magnitude mainly due to overestimation of gains (e.g. groundwater release to baseflow) during low-flow periods. Modelled patterns of flow intermittency revealed highly dynamic behaviour in space and time, with cease-to-flow events affecting between 29 and 80 % of the river network over the period of 1911–2016, using a daily streamflow model. The daily flow model did not perform better than the monthly flow model in quantifying flow intermittency at a monthly time step, and model selection should depend on the intended application of the model outputs. Our general approach to quantifying spatio-temporal patterns of flow intermittency is transferable to other parts of the world, and it can inform hydro-ecological understanding and management of intermittent streams where limited gauging data are available.


Introduction
Intermittent streams that cease to flow for some period of most years are prevalent within river networks globally Datry et al., 2014). Their spatial extent is projected to increase in regions experiencing drying trends related to climate change and water extraction for human uses (Larned et al., 2010). Intermittent streams have seen increasing research interest over the past decade (e.g. Costigan et al., 2016;Fritz et al., 2013;Gallart et al., 2017;, and there is a growing interest in conserving these unique ecosystems. The scarcity of spatially explicit information on flow intermittency has been identi-fied as one of the key issues confronting intermittent stream management (Acuña et al., 2017). Flow intermittency exerts primary control on the transfer of energy, materials and organisms by surface water through river networks (Jaeger et al., 2019) and is a key driver of riverine ecosystems (Stanley et al., 1997;Datry et al., 2017;Poff et al., 1997). Therefore, improved understanding of temporal and spatial patterns in flow intermittency is fundamentally important for effective river management.
Previous studies have predominantly relied on the use of gauged streamflow data to make inferences about the distribution of intermittent streams in many regions, including France (Snelder et al., 2013), Australia (Kennard et al., 2010b;Bond and Kennard, 2017), Spain, and North America (de Vries et al., 2015). However, spatial biases in the distribution of stream gauges used in such studies may give misleading impressions of spatial patterns and the extent of streamflow intermittency (Snelder et al., 2013). Alternative methods for quantifying the extent of intermittent flow include citizen observation networks supported by regular reports from trained volunteers Turner and Richter, 2011), the use of electrical arrays by measuring the electrical conductivity of the streambed (Jaeger and Olden, 2012), development of predictive models for intermittent streams (González-Ferreras and Barquín, 2017), and deployment of unmanned aerial systems (Spence and Mengistu, 2016). These alternatives are generally appropriate over small spatial extents and short time frames but are difficult to scale up to larger areas to quantify flow intermittency in space and time. Satellite remote-sensing-based quantification of flow intermittency (Hou et al., 2019) can cover larger spatial extents but, for now, remains applicable only to relatively large rivers (> 30 m in the case of Landsat imagery) and can be affected by factors such as vegetation and cloud obstruction.
Spatially contiguous runoff data derived from water balance models provide another potential alternative to quantify spatio-temporal variations in flow intermittency. For example, Yu et al. (2018) used runoff simulations obtained from a water balance model, WaterDyn (Raupach et al., 2009), to generate spatially explicit and catchment-wide estimates of streamflow intermittency, but only at a relatively coarse monthly time step. Depending on the application, flow simulations at a finer temporal scale (e.g. daily) may be necessary to capture the dynamic aspects of hydrological processes. These kinds of simulations are important to better understand the causes of flow intermittency at multiple spatial scales and enable ecologically relevant characterisation of streamflow properties such as the magnitude, frequency, duration, and rate of change in high-or low-flow events. However, there are few examples of studies quantifying spatial and temporal variation in flow intermittency across river networks using spatially contiguous daily flow data. That is partly because streamflow simulation is more challenging at a daily versus monthly time step due to higher uncertainties in input data at this finer temporal scale (Wang et al., 2011). Water balance models at a daily time step have been increasingly developed around the world (Lin et al., 2019;Bierkens et al., 2015). One prominent regional example is the Australian Water Resource Assessment Landscape (AWRA-L) model (van Dijk, 2010). The AWRA-L model has been developed by the Commonwealth Scientific and Industrial Research Organisation (CSIRO) and the Australian Bureau of Meteorology (BoM) to simulate the terrestrial water balance across Australia at a daily time step (van Dijk, 2010;Frost et al., 2016). The model yields spatially contiguous daily water availability values gridded at a spatial resolution of 0.05 arcdeg spatial resolution (approximately 5 km × 5 km) (Frost et al., 2016). The development of such water balance models in Australia and other parts of the world provides the potential to quantify spatial and temporal variation in runoff, and hence flow intermittency, at a daily time step. However, this requires an effective and efficient conversion process to translate gridded runoff estimates to accumulated streamflow estimates down the river network. This is especially challenging for large study areas due to lags in runoff, which can influence the timing of flow peaks and rates of recession. Additionally, many national-scale water balance models, including AWRA-L, were calibrated on a large domain that covers multiple climate conditions (Viney et al., 2015), providing a best "average" response but potentially inconsistent accuracy of runoff simulations within particular climate domains. As the predictive performance for ungauged basins strongly depends on climate settings, this compromise raises the question as to whether such models can be used to quantify flow intermittency over multiple climate conditions. Although substantial efforts have been made in evaluating hydrological models in different climate conditions (Do et al., 2020;Gudmundsson et al., 2012;Zaherpour et al., 2018;Lin et al., 2019), a limited number of such studies have focused particularly on model performance during low-flow conditions, which is particularly important for flow intermittency quantification.
In this study, we sought to apply spatially contiguous daily runoff outputs from the AWRA-L water balance model to quantify the spatial extent and temporal patterns of flow intermittency. To assess the accuracy of the AWRA-L model for daily flow simulations, we first developed a simple but effective technique to convert runoff to streamflow for two hydro-climatically distinctive regions. The translation of gridded runoff to aggregated streamflow/discharge on vector river flow lines makes AWRA-L outputs more accessible to fluvial geomorphologists and ecologists, who may intend to relate daily hydrologic characteristics of rivers to a broad range of physical and ecological phenomena. We further assessed the uncertainty of the AWRA-L model in capturing patterns of flow intermittency. Lastly, we evaluated the effect of time step (daily vs. monthly) on the relative performance of the model in replicating observed patterns of cease-to-flow periods at reference gauges. A previous study conducted at the monthly time step (Yu et al., 2018) was used to benchmark flow intermittency estimated from the AWRA-L model.

Study areas
This research was conducted in two hydro-climatically distinctive regions: south-east Queensland and the Tamar River catchment in Tasmania (Fig. 1). The south-east Queensland (SEQ) region is located in the eastern part of Australia (Fig. 1a) and comprises five major coastal river basins with a total area of 21 331 km 2 (Fig. 1b). SEQ has 7229 stream segments and their corresponding sub-catchments according to the Australian Hydrologic Geospatial Fabric (Geofabric), with a minimum upstream drainage area of 1.5 km 2 . SEQ is a region of transitional temperate to subtropical climate ( Fig. 1a) with substantial inter-and intra-annual variation in rainfall. The majority of rainfall and streamflow usually occurs in the summer months of January to March, often followed by a second minor discharge peak between April and June, but high and low flows may occur at any time of year (Kennard et al., 2007). Thus, there are a range of flow regimes, with many streams being intermittent to varying degrees. The Tamar River catchment (Tamar) is located in Tasmania, an island state off Australia's south coast (Fig. 1a, c). It drains a catchment area of approximately 11 215 km 2 , comprising over one-fifth of Tasmania's land mass and is located in north-east and central Tasmania. According to climate data from BoM (http://www.bom.gov.au/climate/data, last access: 10 November 2020), Tamar is characterised by a temperate climate condition, with rainfall relatively evenly distributed throughout the year.

Streamflow gauge data
Gauged daily streamflow data were sourced from the BoM water data website (http://www.bom.gov.au/waterdata, last access: 10 November 2020) and were used to assess accuracy of AWRA-L-modelled streamflow (Sect. 3.3) and to estimate an appropriate zero-flow threshold of modelled streamflow data for quantifying patterns of streamflow intermittency (Sect. 3.4). A total of 25 gauges in SEQ and 15 gauges in Tamar were selected (Fig. 1b, c) to assess modelled streamflow accuracy. These gauges had less than 0.5 % missing values over the period from 1 January 2005 to 31 December 2017 and had minimal hydrologic modification due to human activities. A larger set of 43 gauges in SEQ (including 21 of the 25 gauges used by us for streamflow validation) was used to estimate the zero-flow threshold for this region (see Yu et al., 2018, for details of stream gauges). The gauges were widely dispersed throughout each study area and encompassed a range of stream sizes, catchment areas (22-3881 km 2 in SEQ; 33-3294 km 2 in Tamar), and flow regime types, ranging from highly intermittent to perennial streams (see results). However, the set of stream gauges used in our analyses under-represented the frequency of small low-order streams in both regions. Therefore, we regard the selected gauges to be representative of the range of environmental and hydrological conditions in the regions, except for extremely small catchments with an area < 22 km 2 that likely have higher cease-to-flow occurrence.

Conversion from spatially contiguous runoff to streamflow
AWRA-L is a daily 0.05 • grid-based distributed water balance model that is conceptualised as a small catchment. It simulates the water flow through the landscape from the rainfall entering the grid cell through the vegetation and soil and then out of the grid cell through evapotranspiration, surface water flow, or lateral flow of groundwater to the neighbouring grid cells (Viney et al., 2015). AWRA-L was calibrated and validated at the national scale during its development by CSIRO and BoM, with 301 gauges used for calibration and a different set of 304 gauges used for validation (Zhang et al., 2013). Simulated daily runoff from the AWRA-L model (version 5) was downloaded from BoM (http://www.bom.gov.au/water/landscape, last access: 28 November 2018). These data are in gridded format and require conversion to streamflow for each sub-catchment by aggregating the gridded runoff data with a hierarchically nested catchment to simulate streamflow throughout river networks. The conversion process may or may not need to use a river routing model to propagate streamflow through river networks, partly depending on the size of the catchment of interest (Robinson et al., 1995). If streamflow simulated with a routing model shows little difference to that without a routing model, then the conversion process is more efficient without a routing model, and the readily available runoff data can be more accessible for potential applications, such as flow characterisation for ungauged stream segments. In addition, a conversion process involving a routing model can be computationally intensive and usually requires parallel computing to speed up the calculations (David et al., 2011b). Therefore, in this study, we applied two approaches to determine an effective and efficient runoff-streamflow conversion. The first approach coupled a river routing model to the water balance model, and its effects on flow simulations are compared to the model performance of a lumped model, which was operated without any river routing (Fig. 2). As the conversion process was achieved using the "catchstats" package (https://github.com/nickbond/catchstats, last access: 10 November 2020) in the R programming language (R Development Core Team, 2017), the second approach was to speed up the conversion process by incorporating a parallel algorithm to exiting functions of that package. The conversion process was run on a Griffith University high- performance computing node with 12 cores and 12 GB of RAM. The hierarchically nested catchment dataset used in this study was sourced from the Geofabric dataset (Stein et al., 2014), which provides a fully connected and directed stream network derived from the national 9 arcsec DEM and flow direction grid (∼ 250 m resolution), and associated catchment hierarchy at the national scale. The routing model applied in this study was the Routing Application for Parallel computatIon of Discharge (RAPID) model (David et al., 2011b). RAPID solves the matrix-based Muskingum equation to route flow through each stream of the river network and performs streamflow computation for every stream segment of a river network, including ungauged streams. Various water balance models have been used in combination with RAPID (Follum et al., 2017;Lawrence et al., 2011;Lin et al., 2019).
To test the effects of river routing, we first calculated a series of flow metrics (Table 1) for flow simulations from both the lumped and coupled models. The calculated flow metrics are commonly used to describe the critical components of flow regimes across average-, high-, and low-flow conditions, including flow magnitude and variability; the timing, frequency, and duration of high and low flows; and rates of changes in flow events (Poff et al., 1997;Olden and Poff, 2003). Calculation of these streamflow characteristics allows a comprehensive assessment of the effects of river routing on streamflow simulations in the two regions. We then applied the Wilcoxon rank sum test for each flow metric to determine whether the inclusion of river routing can improve model accuracy based on a significance level of 5 %. We used the 10th and 90th percentiles of daily flows to respectively describe low-flow and high-flow thresholds Gudmundsson et al., 2019). The calculation process was conducted with the "hydrostats" package in the R language (Bond, 2019).

Accuracy assessment of modelled streamflow
To evaluate overall model performance in streamflow simulations, we calculated the modified Kling-Gupta efficiency (KGE; Kling et al., 2012) between the observed and modelled streamflow for all gauges in SEQ and Tamar (Eq. 1). Figure 2. Model configurations and their applications in this study. AWRA-L runoff outputs are translated to accumulated streamflow estimates with the river routing algorithm (coupled model) and without (lumped model). These two model configurations are applied to test the effect of river routing on streamflow simulation accuracy. Based on the lumped model, we simulate daily streamflow throughout river networks (daily AWRA-L) and further convert the daily stimulations to monthly outputs (monthly AWRA-L). Both simulations are used to quantify streamflow intermittency, while results from a different monthly model (monthly WaterDyn) are used to benchmark the flow intermittency estimates from monthly AWRA-L. Table 1. Flow metrics used to describe average-, high-, and low-flow conditions across key components of hydrological variation. Note that a spell independence criterion of 5 d was applied to regard periods between spells of less than 5 d as "in spell".

Conditions
Component KGE is an integrated skill metric which measures the Euclidean distance between a point and the optimal point that has the maximum correlation coefficient, zero variability error, and zero bias error between the simulated and observed streamflow (Kling et al., 2012;Gupta et al., 2009). KGE takes values from −1 to 1: KGE = 1 indicates perfect agreement between simulations and observations, and KGE < −0.41 indicates that the mean of observations provides better estimates than simulations (Knoben et al., 2019). To evaluate model performance in different components of flow regimes, we also calculated each summary flow metric (Table 1) for observed and modelled streamflow data at all gauges in SEQ and Tamar and visually compared their frequency distributions. The use of KGE provides an overall assessment of AWRA-L model performance, and the flow metrics in Table 1 are used to comprehensively evaluate the model accuracy for various components of flow regimes, including the flow metrics related to low flows. Only six of the 25 gauges in SEQ and three of the 15 gauges in Tamar were the same as those used to calibrate the AWRA-L water balance model. This small overlap between the AWRA-L calibration gauge set (n = 301) and the streamflow model validation gauge set (n = 25 in SEQ and 15 in Tamar) means that potential overestimation of streamflow model performance is likely to be minimal.
where KGE is the modified KGE statistic (dimensionless); r is the correlation coefficient between simulated and observed runoff (dimensionless); β is the bias ratio (dimensionless); γ is the variability ratio (dimensionless); µ is the mean runoff in cubic metres per second (m 3 s −1 ); CV is the coefficient of variation (dimensionless); σ is the standard deviation of runoff in cubic metres per second (m 3 s −1 ); and subscripts s and o refer to simulated and observed runoff values, respectively. Furthermore, considering that this study aims to apply flow simulations to quantify flow intermittency, the model accuracy of low-flow simulation is particularly important. The study period (1 January 2005-31 December 2017) was considered sufficient to assess low flows. The 13-year study period is close to a discharge record length of 15 years, which Kennard et al. (2010a) concluded is sufficient to enable accurate estimation of low-flow metrics. In addition, our study period begins in the middle of the Australian Millennium drought (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)) and includes a significant low-flow period. A preliminary analysis showed that AWRA-L-modelled streamflow was sensitive to rainfall events, relative to the response of observed flow (Fig. 3). This finding indicates that over-responsiveness of AWRA-L to rainfall may potentially contribute to overestimation of low flow. We hypothesised that this over-responsiveness is partly due to overestimation of in situ gains to low-flow discharge (e.g. groundwater release to baseflow) as well as underestimation of transmission losses (e.g. depression filling and evapotranspiration) during water movement through various flow paths in the stream network (Davison and van der Kamp, 2008). Given that we do not have access to the AWRA-L model to directly adjust model parameters, we instead compared the observed and modelled low-flow magnitude at all gauges in the two study areas along the gradient of their catchment areas (22-3881 km 2 in SEQ; 33-3294 km 2 in Tamar) to test this hypothesis. We expect that (1) if the difference in low-flow magnitude occurs at all gauges, then low-flow overestimation can be at least attributed to the overestimation of gains to low-flow discharge. Alternatively, (2) if the difference in low-flow magnitude occurs towards the downstream of the catchment, then low-flow overestimation may be related to underestimation of transmission losses.

Quantifying flow intermittency using spatially contiguous flow simulations
Given the fact that water balance models often over-predict the magnitude of very low flows due to the difficulties of quantifying hydrological processes influencing low-flow discharge (Ye et al., 1997;Smakhtin, 2001;Staudinger et al., 2011), we adopted the same method used in Yu et al. (2018) to estimate a threshold of zero flow from the model that related measured zero-flow duration at each gauge to catchment environment variables. We used linear regression to model the mean annual zero-flow duration (daily time step) at each gauge as a function of catchment environment variables. This regression analysis was only conducted in SEQ as most gauges in the Tamar catchment had perennial flow. The environmental variables were the same as those in Yu et al. (2018) and included variables related to climate (annual daily maximum temperature), catchment geology topography (catchment area, catchment average slope, and catchment average elevation), and catchment soil properties (catchment average saturated hydraulic conductivity). Regression models were developed using all possible predictor variable combinations, and we selected the "best" model for predicting zero-flow duration based on the corrected Akaike's information criterion (AICc) (Hurvich and Tsai, 1989). To estimate the prediction error of the selected model, we applied leave-one-out cross-validation on the selected 43 gauges and reported prediction error (R 2 ) to estimate the model prediction performance. Regression model development and cross-validation were conducted with the MuMIn and boot packages in R (R Development Core Team, 2017). Regression analyses were performed on all combinations of predictor variables, and the best model with the lowest AICc (−54.2) retained five covariates, including annual daily maximum temperature, catchment area, slope, average elevation, and average saturated hydraulic conductivity. The developed predictive model showed a good model fit with an adjusted R 2 of 0.71, and the leave-one-out cross-validation on the regression model showed relatively good model performance with an average R 2 of 0.64. We checked for spatial autocorrelation of the regression model residuals (as recommended by Dormann et al., 2007) and found they were not significantly autocorrelated (Moran's I = −0.06, p = 0.69). Examination of spatial residual maps further supported this conclusion, with no spatial trends in model residuals apparent. Next, we used the predictive models to extrapolate estimates of overall flow intermittency (in terms of the proportion of days with zero flow) to each segment throughout the river network. Finally, for each segment, the time series of daily runoff was truncated (flows below the threshold were set to "0") by adopting an appropriate threshold of "zero flow" that preserved the proportion of days with flow as estimated in the previous step. The adopted thresholds ranged from 0 to 1.668 m 3 s −1 , with a median value of 0.002 m 3 s −1 . We recognise several sources of uncertainty in our approach to estimating the zero-flow thresholds. The unexplained variation in the predictive model may be due to the limited number of environmental attribute covariates used in the model and hence ability to adequately represent the range of environmental processes that influence streamflow intermittency. Additional uncertainty in model predictions may arise because the distribution of stream gauges used for model calibration under-represented the frequency of extremely small catchments that likely had higher cease-to-flow occurrence.
Based on the modelled daily streamflow from AWRA-L, we calculated annual flow intermittency as the number of zero-flow days per year over the period of 2005-2016. To evaluate the effect of time step (daily vs. monthly) on the relative performance of AWRA-L in replicating observed pat- The over-responsiveness of the model to rainfall is illustrated in the noticeable increase in modelled streamflow when a rainfall event occurred, while there is no obvious increase in observed streamflow. Rainfall data were sourced from the AWRA-L input.
terns of cease-to-flow periods, we compared model outputs with those derived from a monthly water balance modelthe WaterDyn model (Fig. 2). Monthly flow intermittency estimated from WaterDyn was thus used to benchmark results from the monthly AWRA-L. To do this, we aggregated daily outputs to a monthly time step (termed "monthly AWRA-L" hereafter, Fig. 2). We tried two different aggregation methods. One considered that the flows for a month were zero when at least one day in that month had zero flow (termed "monthly AWRA-L_01" hereafter), and the other considered that all days in a month must have zero flow for that month to be zero (termed "monthly AWRA-L_30" hereafter). These two methods together should provide both upper and lower bounds of comparing daily and monthly models in estimating flow intermittency. The WaterDyn model was developed to provide monthly spatially contiguous water balance data at the Australian continental scale by CSIRO and BoM with a similar model structure to AWRA-L (Raupach et al., 2018), and it has been used to quantify the spatial and temporal patterns of flow intermittency in SEQ following similar methods to this study (Yu et al., 2018). Modelled flow intermittency from all three sources (i.e. daily and monthly AWRA-L, and monthly WaterDyn) was also tested against the measured flow intermittency derived respectively from daily and monthly observed streamflow data at gauged locations in SEQ.
Taking advantage of the modelled long-term runoff data from AWRA-L over the period of 1911-2016, we further quantified spatial and temporal dynamics of flow intermittency for every stream segment within SEQ, and compared the results with those from the WaterDyn model over the same period (Yu et al., 2018). The spatial pattern of flow intermittency was represented by the mean annual number of zero-flow days across the period of 1911-2016 for AWRA-L and by the mean annual number of calendar months for Wa-terDyn. The temporal pattern of flow intermittency was expressed as the proportion of streams with flow intermittency > 30 d and 1 month (termed "intermittent streams" hereafter) for AWRA-L and WaterDyn, respectively.

Negligible effects of river routing on daily flow simulations
The lumped and coupled (i.e. with routing) models using AWRA-L-simulated runoff were run in both SEQ and Tamar, and produced similar values for various flow metrics between the lumped and coupled models in both regions ( Fig. 4; p values were greater than 0.50 for most flow metrics based on the Wilcoxon test results). There were noticeable but not statistically significant differences for two flow metrics related to low flows (the variability in timing and the frequency of lowflow spells), and only the duration of low-flow spells was statistically significant (p = 0.03). These results suggested that the routing algorithm has nearly negligible effects on flow simulations in our study areas, which is reasonable because of the small size of the two watersheds. Therefore, in the subsequent analysis, we only used the results from the AWRA-L lumped model as it is relatively less computationally intensive and was able to maintain a comparable model performance to that of the coupled model taking into account the routing effect.

Accuracy assessment of modelled streamflow in SEQ and Tamar
The overall accuracy of streamflow estimated by the AWRA-L lumped model (referred to as "modelled streamflow" in this section) was evaluated for 25 gauges in SEQ and 15 gauges in Tamar. Results suggested a fair to good explanatory value across all gauges (Fig. 5) Fig. 5). However, no significant difference was found in the overall model performance between the two hydro- climatically distinctive regions, according to the Wilcoxon test (w = 247, p = 0.10). Concerning model performance in simulating different components of flow regimes, the modelled streamflow in SEQ revealed a generally good match with the observed streamflow across all high-flow metrics and the magnitude of average flow, but the model tended to overestimate the variation in the magnitude of average flow (almost 2 times higher on average), report earlier timing of low flows, overestimate the frequency (48 % higher), and underestimate the duration (74 % lower) of low flows (Fig. 6). Compared to the model performance in SEQ, the flow simulations in Tamar showed slightly better performance, predicting well not only for the high-flow metrics but also for the metrics related to average flows (Fig. 6). However, flow simulations in Tamar also exhibited slightly earlier estimations for the timing of low-flow spells (13 % earlier), overestimations for low-flow spell frequency (92 % lower on average), and underestimation for low-flow spell duration (58 % lower) (Fig. 6).
Varying degrees of difference in the magnitude of low flow between the observed and simulation were found among the gauges. There appeared to be a tendency toward larger differences with increasing catchment area in SEQ but not in Tamar (Fig. 7). The models appeared to overestimate in situ gains to low flow in some reaches in both regions, while underestimating transmission losses in SEQ, suggesting that overestimation of in situ gains in AWRA-L likely contributes to the overall overestimation of low flow in downstream catchments.

Quantifying flow intermittency using flow simulations
We calculated annual flow intermittency at gauged locations in SEQ using three sources of modelled flow (daily and monthly AWRA-L, and monthly WaterDyn). Annual flow intermittency calculated using daily AWRA-L flow (i.e. the average number of cease-to-flow days per year) was tested against annual flow intermittency estimated using observed data (Fig. 8a). The AWRA-L model displayed the potential to be used to estimate flow intermittency at a daily time step, with a fair match with the observed flow intermittency (R 2 = 0.56) in SEQ. Nonetheless, the model tended to overestimate flow intermittency for gauges located in relatively wet areas (e.g. ≤ 40 d of flow intermittency per year) while underestimating it for gauges located in relatively dry areas (e.g. ≥ 40 d of flow intermittency per year). Figure 8b shows annual flow intermittency calculated using monthly AWRA-L flow and monthly WaterDyn flow. In this case, annual flow intermittency was defined as the average number of months characterised with zero flow. The WaterDyn model showed much more accuracy than the two aggregation methods based on the monthly AWRA-L model in estimating flow intermittency (R 2 = 0.53, 0.43, and 0.32 respectively for monthly WaterDyn, monthly AWRA-L_01, and monthly AWRA-L_30). More specifically, the Water-Dyn model displayed a similar estimation pattern to the daily AWRA-L model: overestimation in relatively wet areas and underestimation in relatively dry areas. By contrast, not surprisingly, the two aggregation methods showed the upper and lower bounds of flow intermittency estimates from the monthly AWRA-L model: monthly AWRA-L_01 overestimated flow intermittency and monthly AWRA-L_30 underestimated flow intermittency at nearly all gauges (Fig. 8b).  The spatial patterns of flow intermittency derived from the daily AWRA-L and monthly WaterDyn flow simulations aligned well for the main stems and some coastal streams, which were predicted to flow for most of the time (Fig. 9a,  b). There were noticeable spatial differences between model predictions of streamflow intermittency for low-order inland streams. For example, in the western Brisbane River catchment and the South Coast River catchment, most inland streams were predicted by the daily model to flow for a longer period than by the monthly model; while in the Pine River catchment and the Logan-Albert River catchment, many inland streams were predicted by the daily model to flow for a shorter period (Fig. 9a). Compared to the monthly WaterDyn model, fewer streams were predicted by the daily AWRA-L to experience extremely long dry events as well as less than 1 month of zero flows (Fig. 9c, d). However, more streams on average (60 and 49 % for the AWRA-L and WaterDyn model, respectively) were predicted to flow intermittently (> 30 d or > 1 month) to varying degrees in SEQ, which suggests that flow intermittency was prevalent in SEQ, irrespective of the water balance model used. Temporally, the daily model estimated that the proportion of intermittent streams in SEQ varied from 29 to 80 % over the study period , while the monthly model estimated the range to be from 3 to 80 % estimated during the same time span (Fig. 10). The two temporal patterns were temporally correlated (r = 0.71), and similar predictions with higher proportions of intermittent streams were estimated for the dry years by both models. Compared to dry years, the two models exhibited greater differences in predictions for the wet years, where the daily model tended to predict a higher proportion of intermittent streams. Overall, the daily model suggested a drier history in SEQ in terms of flow intermittency than the monthly model. The models successfully identified the extensive drying associated with severe drought periods. Notably, the Widespread drought (1914-1920), WWII drought (1939-1946), and Millennium drought (2001 were all visible in both two sets of model predictions.

Discussion
The scarcity of information on the spatial and temporal extent of flow intermittency has been identified as a major barrier for ecologists and managers to understand and protect intermittent stream ecosystems (Acuña et al., 2017). This barrier has been partly overcome in previous studies by using statistical models relating flow intermittency to surrounding environmental variables (Snelder et al., 2013;Jaeger et al., 2019;González-Ferreras and Barquín, 2017;Bond and Kennard, 2017), but most of these studies focused on only the spatial variations in flow intermittency, except for Jaeger et al. (2019), overlooking its temporal aspects. This issue becomes particularly urgent in a time when flow regimes of streams are changing worldwide, mainly in response to climate change and water extraction for human uses (Jaeger et al., 2014;Chiu et al., 2017). Monthly runoff data have been recently used to quantify flow intermittency for entire river networks (Yu et al., 2018), and the current study takes one step further to use daily runoff data in flow intermittency estimation, which is especially needed for studies aimed at quantifying ecological responses to short-term flow events (e.g. frequent zero-flow events within a month). In this study, we comprehensively examined the ability of a daily water balance model to simulate streamflow, with a particular focus on low-flow simulations. We also investigated how to better choose water balance models to estimate flow intermittency by answering the question of whether daily flow models outperform monthly flow models at both daily and monthly scales. Our study can not only inform the estimation of the spatial distribution of intermittent flow but also reveal the temporal dynamics of intermittent streams over long time frames.

Efficient runoff-streamflow conversion for eco-hydrological research
Effects of river routing on daily flow simulations were found to be negligible in SEQ and Tamar, most probably due to the relatively small size of the two catchments and the relatively short length of even the longest streams (Cunha et al., 2012). This can be verified with the concept of time of concentration, which is commonly used to measure the time needed for water to flow from the most remote point in a catchment to the catchment outlet. By following the formula for calculating the time of concentration proposed by Pilgrim and McDermott (1982) that has been widely used in Australia, we found the time of concentration in SEQ is around 33 h, only slightly more than a daily time step (24 h). This illustrates why it is difficult for a daily time-step routing model to effectively capture routing lags in our study domain. A negligible effect of river routing on flow simulations was also observed in previous studies (David et al., 2011a). Robinson et al. (1995) found that catchment size is a primary factor in determining which process, the hillslope or the channel net- work transport component, characterises lags in catchment runoff down the river network. In areas such as SEQ and Tamar that have a relatively small catchment size, the inclusion of channel network transport contributes little to the improvement of flow simulations. The negligible effect of river routing in SEQ and Tamar allowed us to simplify the simulation of daily flows without coupling with a river routing model. Hence we were able to use existing runoff outputs from the daily AWRA-L model. Arguably, similar opportunities exist in other small catchments.

Accuracy assessment of modelled daily streamflow in two hydro-climatically distinctive regions
Daily streamflow estimates showed a fair to good overall alignment with the observed flows in both SEQ and Tamar, with all gauges showing that flow simulations were better estimates than the mean of observations (KGE ≥ −0.41 at all gauges). Interestingly, although streamflow was more ac-curately simulated in the Tamar than in SEQ (the median values of KGE were 0.47 and 0.42, respectively), the differences between the two hydro-climatically distinctive regions were relatively small. Despite ongoing efforts to calibrate AWRA-L against a set of reference scales distributed across the continent (Viney et al., 2015), this finding was reassuring given the much higher variability in rainfall and soil moisture in SEQ, factors that typically can lead to a more non-linear streamflow response to rainfall (Poncelet et al., 2017), which possibly undermines the ability of water balance models to reliably predict runoff (Sheng et al., 2017). These results hence bode well for the application of AWRA-L outputs across diverse hydroclimatic regions.
When looking into the model performance for specific components of the flow regime, average-and high-flow metrics were both modelled well in Tamar, while only highflow metrics were modelled well in SEQ. However, in both regions, the AWRA-L model showed poor performance in low-flow metrics: overestimating the frequency and under- Figure 10. Comparison of intra-annual variation of the proportion of intermittent streams in length from 1911 to 2016 across SEQ, derived from streamflow simulations from the daily flow model (lumped, solid grey line) and monthly flow model (dashed grey line). Three severe droughts in Australia were also presented as transparent grey rectangles: Widespread drought (1914-1920), WWII drought (1939-1946), and Millennium drought (2001. The time series of annual mean precipitation is shown for reference. estimating the duration of low flows, consistent with previous studies (Costelloe et al., 2005;Ivkovic et al., 2014;Ye et al., 1997;Staudinger et al., 2011). This suggests that the AWRA-L model is a generally robust model in predicting average and high flows but still needs some improvement to better simulate low flows. Runoff generation processes can vary substantially through space and time due to such factors as variations in soil depth, antecedent soil moisture, and groundwater connectivity, and this can influence spatio-temporal variations in low-flow characteristics, including streamflow intermittency (Zimmer and McGlynn, 2017). However, it is unknown to what extent this contributed to uncertainty in the simulation of low flows and estimation in streamflow intermittency in this study. The uncertainty of AWRA-L in low-flow simulations can be linked to its overresponsiveness to rainfall, partly caused by overestimation of in situ gains and underestimation of transmission losses to low-flow discharge, as shown in SEQ. Previous studies found that lateral flow exchange between grid cells of land surface models (e.g. AWRA-L) plays a significant role in redistributing soil water (Kim and Mohanty, 2016), and thus may improve in situ surface/subsurface runoff simulations (Lee and Choi, 2017). On the other hand, hydrological processes involved in transmission losses have been extensively discussed (Jarihani et al., 2015;Konrad, 2006), and studies have developed methods to calculate transmission losses for better flow simulations (Lange, 2005;Costa et al., 2012). Therefore, low-flow simulations by AWRA-L can possibly be improved by incorporating lateral flow exchange algorithms and better accounting for hydrological process such as evapotranspiration from riparian vegetation and infiltration into channel beds. This improvement is made more likely as AWRA-L has been released as a community modelling system (https://github.com/awracms/awra_cms, last access: 10 November 2020), which allows co-development by the research community.

Choose appropriate water balance model to quantify spatio-temporal dynamics of flow intermittency
Our results suggest that the temporal resolution of analysis should be dictated by the resolution of input streamflow data. More specifically, the daily AWRA-L flow showed promise for estimating flow intermittency at a daily time step, while the monthly WaterDyn model was better than the monthly AWRA-L model in flow intermittency estimation at a monthly time step. This suggests that monthly flow models can sometimes outperform daily flow models in quantifying flow intermittency, depending on the intended temporal resolution of the analysis. For example, daily flow models may be appropriate for studies aimed at quantifying ecological responses to short-term flow events, while monthly flow models are more suitable for research requiring the average degree of flow intermittency at a large spatial or temporal scale, such as examining the effect of flow intermittency on aquatic/streamside vegetation or species distributions (Stromberg et al., 2005). In addition, our study also suggested that the suitability of a monthly model (WaterDyn) for monthly resolution of analysis was not challenged by a daily model (AWRA-L) simply through aggregating daily streamflow simulations to a monthly time step. The aggregation methods used here applied 1 or 30 d as a threshold and, respectively, either substantially overestimated or underestimated flow intermittency. Spatially contiguous runoff data were used in this study to quantify spatial and temporal dynamics of flow intermittency, shedding light on the temporal aspect of flow intermittency that has been often overlooked in previous studies. Annual flow intermittency in SEQ was shown to vary significantly from year to year, ranging from 29 to 80 % of total stream length for the AWRA-L model. However, given the limited spatial resolution of the Geofabric stream network data (9 arcsec longitude-latitude resolution, with a minimum upstream drainage area of 1.5 km 2 ) and hence ability to resolve the smallest streams, and that small streams are more likely to be intermittent, the proportion of predicted intermittent streams in SEQ may be underestimated in our study. Although there are differences in the temporal patterns of estimated flow intermittency between the AWRA-L and Water-Dyn models, neither model estimated intermittency to have a clear trend over the past century. However, there is still the concern about the potential shift of some perennial streams to intermittent streams due to climate change and intense human activities, as it has been evident in several regions where the number of low-flow and/or no-flow days is increasing (King et al., 2015;Ruhí et al., 2016;Sabo, 2014). Jaeger et al. (2014) investigated the effect of climate change on flow intermittency patterns and found that annual zeroflow days frequency were projected to increase by 27 % by mid-century in the Lower Colorado River basin of the United States. Research looking into projected changes in regional climate regimes can provide insights into future scenarios people may face, but such research is still scarce.
The approach developed here to generate spatially continuous estimates of streamflow characteristics (including flow intermittency) throughout stream networks has potential applicability to other regions of Australia and globally. All the data used in this study are available for the Australian national scale, and similar datasets also exist in other countries. For example, similar to the Geofabric data (Stein et al., 2014) used here, the National Hydrography Dataset Plus (NHDPlus) and HydroATLAS (Linke et al., 2019) provide hydrographic datasets and hydro-environmental attributes for national (USA) and global scales, respectively. In addition, similar to the daily flow model AWRA-L used in this study, other global-and national-scale hydrologic models are also available, such as the global WaterGAP model (Döll et al., 2003), the community Noah land surface model (Noah-MP) (Niu et al., 2011) in the USA, and the HYPE model (Lindström et al., 2010) in Sweden.

Conclusions
In this study, we presented an approach to quantifying spatially explicit and catchment-wide flow intermittency over long time frames based on spatially contiguous daily runoff data from a readily accessible water balance simulation. This research builds upon previous studies using monthly runoff data, and paves the way for ecological research looking for metrics of flow intermittency at a daily time step. By testing this approach in eastern Australia, we not only confirmed our previous finding that intermittent flow conditions prevailed in the majority of streams, but also provided more detailed information on their spatio-temporal variability at a daily time step. The proposed approach has potential applicability to other catchments globally, but our results also highlighted some complexities that future research should address to help improve the reliability of model outputs.
Data availability. The data used in this study are available publicly online, and the access websites have been listed in the main text where they were first mentioned.
Author contributions. AIJMvD, HXD, MJK, and SY designed the research, and SY and HXD carried it out. SY wrote the original draft, and HXD, AIJMvD, PL, NRB, and MJK contributed to writing of subsequent drafts.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. The project was supported by the Australian Climate and Water Summer Institute organised by the Australian Energy and Water Exchange Research Initiative (OzEWEX) and partners. This research was undertaken at the NCI National Facility in Canberra, Australia, which is supported by the Australian Commonwealth Government. We also gratefully acknowledge the support of the Griffith University eResearch Services team and the use of the high-performance computing cluster "Gowonda" to complete this research. We would like to thank the Editor Christa Kelleher for handling our submitted manuscript, and thank George Allen and three anonymous reviewers for their comments and suggestions that much improved the final manuscript.
Financial support. This research has been supported by the China Scholarship Council and Griffith University (grant no. 201506040057) and the University of Michigan (grant no. U064474).
Review statement. This paper was edited by Christa Kelleher and reviewed by George Allen and three anonymous referees.