A predictive model for spatio-temporal variability in 1 stream water quality 2

Abstract. Degraded water quality in rivers and streams can have large economic, societal and ecological impacts. Stream water quality can be highly variable both over space and time. To develop effective management strategies for riverine water quality, it is critical to be able to predict these spatio-temporal variabilities. However, our current capacity to model stream water quality is limited, particularly at large spatial scales across multiple catchments. This is due to a lack of understanding of the key controls that drive spatio-temporal variabilities of stream water quality. To address this, we developed a Bayesian hierarchical statistical model to analyse the spatio-temporal variability in stream water quality across the state of Victoria, Australia. The model was developed based on monthly water quality monitoring data collected at 102 sites over 21 years. The modelling focused on six key water quality constituents: total suspended solids (TSS), total phosphorus (TP), filterable reactive phosphorus (FRP), total Kjeldahl nitrogen (TKN), nitrate-nitrite (NOx), and electrical conductivity (EC). Among the six constituents, the models explained varying proportions of variation in water quality. EC was the most predictable constituent (88.6 % variability explained) and FRP had the lowest predictive performance (19.9 % variability explained). The models were validated for multiple sets of calibration/validation sites and showed robust performance. Temporal validation revealed a systematic change in the TSS model performance across most catchments since an extended drought period in the study region, highlighting potential shifts in TSS dynamics over the drought. Further improvements in model performance need to focus on: (1) alternative statistical model structures to improve fitting for the low concentration data, especially records below the detection limit; and (2) better representation of non-conservative constituents by accounting for important biogeochemical processes. We also recommend future improvements in water quality monitoring programs which can potentially enhance the model capacity, via: (1) improving the monitoring and assimilation of high-frequency water quality data; and (2) improving the availability of data to capture land use and management changes over time.



Introduction
Deteriorating water quality in aquatic systems such as rivers and streams can have signific ant environmental, economic and social ramifications (e.g.Whitworth et al., 2012;Vörösmarty et al., 2010;Qin et al., 2010;Kingsford et al., 2011).However, our ability to manage and mitigate water quality impacts is hampered by the variability in water quality both across space and time, and our inability to predict this variability (Chang, 2008;Bengraı̈ne and Marhaba, 2003;Ai et al., 2015).Water quality conditions can vary across individual events, as well as at daily, seasonal and inter-annual scales at an individual location (Arheimer and Lidén, 2000;Kirchner et al., 2004;Larned et al., 2004;Pellerin et al., 2012;Saraceno et al., 2009).Water quality conditions also typically differ significantly across locations (Meybeck and Helmer, 1989).These variabilities in stream water quality are driven by three key mechanisms: (1) the source of constituents, which defines the total amount of constituents being available in a catchment; (2) the mobilization of constituents in particulate and dissolved forms, which detaches constituents from their sources via processes such as erosion and biogeochemical processing; and (3) the delivery of mobilized constituents from catchments to receiving waters via multiple hydrologic pathways including surface and subsurface flow (Granger et al., 2010).
Spatial variability in stream water quality is driven by natural catchment characteristics (e.g., climate, geology, soil type, topography and hydrology) as well as by human activities within catchments (e.g., land use and management, vegetation cover etc.), all of which control the extent and magnitude of the three key mechanisms described above (Lintern et al., 2018a).At the same time, temporal shifts in water quality are influenced by changes in climatic, hydrological and other catchment conditions, such as temperature (Roberts and Mulholland, 2007), the timing and magnitude of rainfall events (Fraser et al., 1999), runoff generation and streamflow (Ahearn et al., 2004;Mellander et al., 2015;Sharpley et al., 2002), and vegetation cover changes over time (Kaushal et al., 2014;Ouyang et al., 2010).
Despite undertstanding of the basic mechanisms, we currently lack the ability to model these spatiotemporal variabilities at larger scales to inform the development of effective policy and mitigation strategies.Conceptual or physically-based water quality models are typically limited by the simplification of physical processes (Hrachowitz et al., 2016).Furthermore, practical implementation of these models can be also limited by the intensive requirements of data and calibration effort, https://doi.org/10.5194/hess-2019-342Preprint.Discussion started: 23 July 2019 c Author(s) 2019.CC BY 4.0 License.
In contrast, whilst most statistical water quality models are easier to implement, they often focus on either the spatial variation of time-averaged water quality conditions (Tramblay et al., 2010;Ai et al., 2015), or the temporal variation at individual locations (Kisi and Parmar, 2016;Kurunç et al., 2005;Parmar and Bhardwaj, 2015).Consequently, it remains challenging to address spatio-temporal variability simulaneously over long time periods and large regions.This lack of integrated modelling of both spatial and temporal variability in water quality can not only limits our understanding of the key factors that affect water quality dynamics over both of these dimensions.It also hinders our ability to predict future water quality changes in un-monitored locations.
The aim of this research is to develop a data-driven model to predict spatio-temporal changes in stream water quality.This model was established using long-term (21 years) stream water quality observations across 102 catchments in the state of Victoria, Australia.The model built on two previous studies on the same dataset that identified the key drivers for water quality spatial and temporal variabilit ies, respectively (Lintern et al., 2018b;Guo et al., 2019).Our approach aims to bridge the gap between fullydistributed water quality models and statistical approaches to provide useful information for catchment managers, especially for large-scale water quality assessments.

Spatio-temporal modelling framework
A Bayesian hierarchical approach was used to model the spatio-temporal variability in stream water quality.The Bayesian approach enables the inherent natural stochasticity of water quality to be incorporated into the model (Clark, 2005), and the hierarchical model structure enables the key controls of temporal variability in water quality to vary across locations (Webb and King, 2009;Borsuk et al., 2001).
The structure of this model is presented below in Eq. 1 to 6.The transformed concentration of a constituent (see Sect. 2.2 for justification) at time i and site j (  ) is assumed to be normally distributed with a mean   and standard deviation  representing inherent randomness (Eq.1).
To describe spatial variability, the site-level mean ( ̅  ) is modelled as a linear function of a global intercept (int), and the sum of the m catchment characteristics  1, to  , (e.g.land use, topography) weighted by their relative contributions to spatial varaibility (_ 1 to _  ) (Eq. 3).
The temporal variability, represented by the deviation from the mean (∆  ), is a linear combination of n temporal variables,  1, to  , (e.g., climate condition, streamflow, vegetation cover) (Eq.4), at time i and site j.
To account for differences in these temporal influences across sites, the effect of each temporal variable at site j (_ , with N in 1,2, … n) is drawn from a distribution with a mean of   _ , (Eq.5), which is then modelled with a linear combination of two additional chatchment characteristics,  1, and  2, (Eq.6).
_ , ~ ( _ , ,  _  ) ,    , , … Section 2.2 introduces the data used to develop these Beyesian hierarchical models.Section 2.3 describes how the detailed model strucutre was determined, including the choice of key predictors for the spatial variability (i.e., catchment characteristics  1, to  , ) and temporal variability (i.e. 1, to  , and  1, and  2, ), and all their corresponding coefficient values.The approaches to evaluate model performance and robustness are described in Sect.2.4.

Data collection and processing
The Bayesian hierarchical models were developed with 21-year stream water quality observations at 102 catchments in state of Victoria, Australia.The collection and processing of the data are detailed in previous publications that worked with the same dataset (Lintern et al., 2018b;Guo et al., 2019) sampled between 1994 and 2014 at 102 sites were used to develop the model (Fig. 1).This was because these sites and this time period provided the longest consistent period of continuous records over the greatest number of monitoring sites.The catchments corresponding to these water quality monitoring sites were delineated using the Geofabric tool (Bureau of Meteorology, 2012), and have areas ranging from 5 km 2 to 16,000 km 2 .The water quality parameters of interest were: total suspended solids (TSS), total phosphorus (TP), filterable reactive phosphorus (FRP), total Kjeldahl nitrogen (TKN), nitratenitrite (NOx) and electrical conductivity (EC).These parameters represent sediments, nutrients and salts, which are some of the key concerns for water quality managers in Australia and around the world.These water quality data were sampled following standard DELWP protocols (Australian Water Technologies, 1999) and analysed in National Association of Testing Authorities accredited laboratories .We selected potential spatial explanatory variables (i.e.predictors to explain spatial variability) based on catchment characterisitics that are widely known to influence water quality condition (Lintern et al., 2018a).Fifty potential explanatory catchment characteristics were selected based on a literature review .Temporal explanatory variables of discharge (originally in ML d -1 ) and water temperature (°C) corresponding to the same timestamps for water quality observations were also extracted for each site over the study period (Department of Environment Land Water and Planning Victoria, 2016).Discharge was converted to streamflow (mm d -1 ) for each catchment, which allowed us to also calculate the average streamflows over 1, 3, 7, 14 and 30 days preceding the water quality sampling dates.In addition, gridded climate data (Frost et al., 2016;Raupach et al., 2009Raupach et al., , 2012) ) and the normalized difference vegetation index (NDVI) data (NASA LP DAAC, 2017;Eidenshink, 1992) were also extracted to calculate the catchment average daily rainfall (mm), daily evapotranspiration (ET) (mm), daily average temperature (°C), daily root zone (shallower than 1m) and deep (deeper than 1m) soil moisture, as well as monthly NDVI.A summary of these data and their sources is in Table S2 in the Supplementary Material.
The raw input data were filtered and transformed to increase the data symmetry, making them more suitable for use in the linear spatio-temporal model structure (Eqs.3, 4 and 6).For the filtering process, we first removed all water quality records with flags of quality issues and values below the limits of reporting (LOR).This was because that uncertainty of values below LOR may amplify after the transformation, posing large influence in the subsequent model fitting.Furthermore, those low concentrations were of less interest; poor water quality conditions (i.e., high constituent concentrations) were our primarily concerns to model.Water quality records corresponding to days with zero flows were also excluded from further analyses.
For the transformation process, we transformed the data of each of the spatial catchment characteristics,  S3 and S4 in the Supplementary Material.

Model fitting
Based on the general spatio-temporal modelling structure (Eqs. 2 to 6), we identified the best spatial predictors ( 1 to   in Eq. 3) and the best temporal predictors ( 1 to   in Eq. 4) in two sequential studies (Lintern et al., 2018b;Guo et al., 2019).The predictors were selected using an exhaustive search approach (May et al., 2011;Saft et al., 2016), which considered a large number of potential predictors and all possible combinations of these predictors.This selection approach required firstly fitting an individual model to each candidate predictor set, and then comparing all fitted models to select a single best set of predictors.Alternative models were evaluated based on the Akaike Information Criterion (AIC) (Akaike, 1974) and Bayesian Information Criterion (BIC) (Schwarz, 1978) to ensure optimal balance between model performance and complexity.
The key factors identified for the spatial and temporal variabilities in each constituent are listed in Tables S5 and S6 in the Supplementary Materials.General speaking, the key factors controlling the spatial variability in river water quality were land-use and long-term climate conditions (Lintern et al., 2018b).
Temporal variability was mainly explained by temporal changes in streamflow conditions, water temperature and soil moisture (Guo et al., 2019).We further modelled the spatial variation in each of these temporal relationships (β_T1 to β_Tn in Eq. 4) with two spatial characteristics,  1 and  2 (Eq.6), where a higher number of predictors was not used to avoid over-fitting.We found  1 and  2  1 and  2 were selected as the two catchment characteristics which had the highest correlations with the fitted parameter values of each temporal predictor, which were also summarized in Table S6 in the Supplementary Material.
After identifying the predictors for each constituent, the Bayesian hierarchical spatio-temporal model was fitted for each constituent across all monitoring sites.To achieve this, we used the R package rstan Stan Development Team, 2018), which enabled both the sampling of parameter values from posterior distributions with Markov chain Monte Carlo (MCMC) and model evaluation.Constituent standard deviation (σ) was assumed to be drawn from a prior of minimally informative distribution of half-normal N(0,10) that was truncated to only positive values (Gelman, 2006;Stan Development Team, 2018).The regression coefficient of each spatial predictor (β_S1, β_S2, …, β_Sm in Eq. 3) was assumed to be drawn from an independent hyper-parameter normal distribution with mean of β_S and standard deviation of σ_S.The site-level regression coefficients of the temporal predictors (β_T1,j, β_T2,j, …, β_Tn,j in Eq. 4, respectively) were sampled from the corresponding hyper-parameter normal distribution with means of µ.β_T1, µ.β_T2, …, µ.β_Tn and standard deviations of σ.β_T1, σ.β_T2, …, σ.β_Tn.The hyper-parameters were further assumed to be drawn from minimally informative normal distributions with N(0,5) (for all the means) and minimally informative half-normal distribution of N(0,10) that was truncated to only positive values (for all the standard deviations).In each model run there were four independent Markov chains.A total of 20,000 iterations were used for each chain.Convergence of the chains was checked using the Rhat value (Sturtz et al., 2005).

Model performance and sensitivity analyses
The performance of the fitted model for each constituent was first evaluated by comparing the simulated and observed concentrations at 102 sites altogether to understand how the full spatio-temporal variabilties were captured (Sect.Additional evaluations of model sensitivity were conducted with calibration and validation on subsets of the full data (Sect.3.2).Firstly, to understand the sensitivity of the model to the monitoring sites included for calibration, we randomly selected 80% of the sites for calibration and used the remaining 20% for validation, and repeated this validation process for five times for each constituent.The calibration and validation performance was compared to each other, as well as with the performance of the full model.
We also evaluated the model sensitivity to the periods of calibration.Since the study region was greatly influenced by a prolonged drought from 1997 to 2009 -known as the Millennium Drought, we focused on analysing the impact of this drought period.Specifically, we calibrated the model for each constituent to pre-, during-and post-drought periods (1994-1996, 1997-2009 and 2010-2014, respectively) and then validated the model on the remaining period which was not used for calibration.For example, when calibrating to the pre-drought period (1994)(1995)(1996), validation was performed on both the during and post-drought data (1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014).Each corresponding calibration and validation performance was compared with each other as well as against that of the full model, to identify potential impacts of the drought on model robustness.

Model performance
The spatio-temporal water quality models show varying performances among the constituents.When assessed with only the above-LOR data (Fig. 2), the best performing models are those for EC and TKN, which capture 90.7% and 65.8% of the total observed spatio-temporal variability.The modelling power is lowest for FRP (NSE = -1.92),which might be related to the large number of FRP records below the LOR (38%).Similar to FRP, poorer model performance is also observed for NOx and TSS, with NSE  When simulating the site-level mean concentrations, the spatio-temporal models generally show good abilities to capture variability across sites for all constituents (Fig. 3).constituents are generally consistent with their capacities to capture spatio-temporal variability (Table 1).

Model sensitivity to calibration sites and periods
This section presents model sensitivity to different calibration sites and periods of record (as detailed in Sect.2.4).Note that in these evaluations, the FRP model is not a focus due to the poor model performance observed in Sect.3.1.
We first compare the performance of each spatio-temporal model fitted with the full dataset with those obtained from the five corresponding "partial" models that were calibrated to only 80% of the monitoring sites.Across constituents, the calibration performances obtained from the full dataset are comparable with the five models calibrated with 80% of the sites (calibration dataset).In addition, each pair of calibration and validation performance is highly consistent.In either comparison, the corresponding differences in NSE are within 0.1 (  The performance of the full model for each constituent is also compared with that of the three models calibrated to the pre-, during and post-drought periods.In general, we observe consistent performance for each constituent, across calibrations to the three periods of contrasting hydrological conditions (Table 3, see Figs.S7 to S12 in the Supplementary Material for detailed model fittings).One notable common pattern is that the performance for calibration and validation is more consistent for the drought period than either the pre-and post-drought periods.However, this is most likely explained by relative sizes of the calibration data sets, which are 3, 13 and 5 years for the pre-, during and postdrought periods respectively.Of all constituents (excluding FRP), TSS shows greater differences in model performances across periodsespecially when comparing the pre-drought calibration with its validation.Fig. 4 shows the corresponding TSS model fit as represented by the site-level mean concentrations for the three calibration/validation datasets.Notably, when calibrated to the pre-drought period and validated on both the during-and post-drought periods, the model over-estimates a majority of the data (Fig. 4 (b)); and when calibrated to the during-drought period, it slightly under-estimates pre-and post-drought period TSS (Fig. 4 (d)).

Implications for statistical water quality modelling
Our spatial-temporal water quality models are able to capture the majority of observed variability across the 102 sampling locations in Victoria (Sect.3.1); the model performances also allow us to explore some limitations of the modelling framework.The greatest limiting factor for model performance seems to be when high proportions of LOR data are present.As shown in Fig. 2 and  Figure 2 highlights another possible influence on model performance, which is a combination of our inability to analyse low concentrations and the limited resolution of these low-concentration measurements due to heavy transformation in data processing.This is evidenced by visually inspecting the fittings which show distinct "categorical" behaviour for low concentrations for some constituents.
This "categorical" issue impacts the six constituents to different extents, ranking from the strongest as: FRP, TSS, TP, NOx, TKN and EC -a ranking that is broadly aligned with the degree of lacking model performance for these constituents.Similar to the below-LOR records, when these categorical values are present in large proportions of the full records (e.g.TSS and FRP), they can also violate the linear model assumptions and cause performance deterioration.This issue could be overcome by alternative model structures that explicitly account for truncated data.For example, Wang and Robertson (2011) and Zhao et al. (2016) illustrated an approach to resolving the discontinuity of the likelihood estimation in modelling fitting to data with presence of zero values, which can be potentially extended to improve fitting for the categorical levels at low concentrations.
In addition, our current models are empirical relationships which are likely unable to represent complex biogeochemical processes.For example, performances for FRP and NOx might be limited because: 1) the linear model structure can over-simplify constituent dynamics due to biogeochemical processes that are often highly non-linear; 2) the model may not include parameters that can adequately represent relevant biogeochemical processes (due to the lack of these data).To better capture changes in reactive constituents, greater consideration of and data representing biogeochemical processes may be required Lastly, it is worth noting that our results are presented in the transformed scale for which the spatiotemporal models were developed and the statistical assumptions hold (see details on data transformation in Sect.2.2).Model performance is heavily affected when simulated model outputs are backtransformed to the measurement scale (see Figs. S13 in the Supplementary Information).However, our models are very useful in representing and predicting proportional changes in concentrations, which adds important information for assessing and managing catchment water quality.For example, an increase of 1 mg L -1 in suspended solids would be alarming in pristine streams and/or periods of good water quality, while having much less impact on highly polluted conditions.The transformed models developed in this study can help managers to understand these proportional changes to identify critical locations and periods of key water quality concerns.

Implications for water quality monitoring programs
Within the current spatio-temporal models, water quality temporal variability is based on monthly monitoring data.This suggests potential oppourtunities to further strengthen the model capacity to explain temporal variability by utilizing data with higher temporal resolution.This approach can be supported by recent developments that significantly improved the accessibility of high frequency water quality monitoring data (Bende-Michl and Hairsine, 2010;Outram et al., 2014;Lannergård et al., 2019;Pellerin et al., 2016).Another potential development is to use remote sensing data to augment low frequency sampled data with higher frequency remotely sensed estimates e.g. for sediments and nutrients (Glasgow et al., 2004;Ritchie et al., 2003).Alternatively, where high frequency data are lacking for the target constituent, high frequency proxy data could also be utilized to enhance the understanding obtained from low frequency samples.For example, turbidity can be used as surrogate for sediments and nutrients (Schilling et al., 2017;Robertson et al., 2018;Lannergård et al., 2019) , 2004;Smith et al., 2013).Therefore, model performance can potentially be further improved by increased capacities in the monitoring of temporal patterns of land management.

Potential impacts of long-term drought on water quality dynamics
Results of model calibration and validation to different time periods suggest a systematic decrease in TSS concentrations since the prolonged drought, in comparison with the pre-drought period under the same spatial and temporal conditions.Such a shift is not observed for any other five constituents analysed (nutrients and salts) (Sect.3.2).
In the literature, impacts of the Millennium Drought on the hydrology and runoff regimes of southeastern Australia are well understood (van Dijk et al., 2013;Leblanc et al., 2012;Saft et al., 2015).
However, less is known about how this significant and prolonged drought event has impacted water quality (Bond et al., 2008).Previous studies on other drought events around the world mainly focused on changes in water quality as responses to the reduced streamflow during drought.For examples , reduction in sediment levels have been reported during drought, due to lower erosion from the contributing catchment and lower rates of solid transport associated with reduced flows (Murdoch et al., 2000;Caruso, 2002).At a more local scale, increasing sediment concentrations during droguht have also been observed in streams adjscent to land with high densities of livestock and bushland, which both constantly contribute to sediment load during drought, leading to elevated concentrations due to lower dilution rate (Caruso, 2002).Similarly to sediments, the impact of droughts on stream nutrient and salt concentrations were also commonly understood as responses to reduced runoff generation and streamflow.Nutrient concentrations typically decrease during droughts in catchments with no significant point-source pollution (Mosley, 2015), as nutrient leaching and overland flow are reduced (Caruso, 2002).In contrast, catchments with significant point-source pollution generally experience water quality deterioration during drought due to reduced dilution (van Vliet and Zwolsman, 2008;Mosley, 2015).For salinity, concentration often increases during drought with reduced dilution and increased evaporation (Caruso, 2002), particularly for catchments that are more influenced by saline groundwater input where drought can increase the relative contribution of saline groundwater input (Costelloe et al., 2005).
However, our findings highlight other possible pathways on how drought can affect stream water quality.The results suggest that the prolonged drought induced changes in sediment dynamics i.e.
changes of relationships between sediments and its predictors (Figs. 4 and 5).In contrast to sediments, our models for nutrients and salts maintain consistent performance for different drought and non-drought periods, suggesting no clear shifts in dynamics.A few studies have also reported changes in the concentration-discharge relationships for sediments and nutrients.Specifically, these relationships changed from high-to low-flow conditions (Zhang, 2018;Moatar et al., 2017), as well as from drought to the recovery period (Burt et al., 2015).However, effects of extended multi-year droughts on the concentration-discharge relationships are less explored.Furthermore, there is also a lack of comprehensive assessments on the change of relationships between water quality and other relevant controls (e.g.water temperature, land cover etc.) during extended drought over large geographica l regions.Our findings highlight great oppourtunities to use this dataset to further investigate the impacts of prolonged droughts on water quality dynamics, especially the changes in relationships between TSS and each of its key controls across multiple catchments.
In addition, we acknowledge that our ability to represent the pre-and post-drought conditions in this study may be limited by the record length, since only 2 years of pre-drought and 4 years of post-drought data were available.Once longer records build up, they will enable us to update our understanding of the impact of this prolonged drought.We would be also able to conduct more sophisticated investigations, such as comparing the impacts of long-term droughts versus individual dry and wet years.
Addressing these research questions are particularly important in a changing climate that will be  et al., 2015;Chiew et al., 2014;Ukkola et al., 2015).

Conclusions
Using long-term stream water quality data collected from 102 sites in south-eastern Australia, we developed a Bayesian hierarchical statistical model to analyse the spatio-temporal variabilities in six key water quality constituents: TSS, TP, FRP, TKN, NOx and EC.The spatio-temporal models are capable of predicting future water quality changes in non-monitored locations under similar conditions to the historical period we investigated.A notable shift in TSS dynamics is observed since the extended drought in the study region, which highlights potential oppourtunities for further research to better understand the impact of this significant drought event on water quality.
Despite the promising performance of these models, the results also illustrate areas of further improvement, both in the modelling framework but also in the monitoring of water quality.In improving the modelling framework, alternative statistical approaches could be considered to reduce the impact of below detection limit and low concentration data on model performance.In addition, the models could be extended to take into account some key biogeochemical processes to better represent spatial-temporal variability in non-conservative constituents (e.g., FRP or NOx).To further enhance the performance of the current models, we recommend that future water quality monitoring programs be enhanced with: 1) collection and assimilation of high-frequency sampling data to enhance the temporal resolution of water quality data; and 2) more frequent monitoring of changes in land use intensity and management to be able to include these parameters in the model.These improvements will be very helpful to operational catchment management and mitigation.

Figure 1 .
Figure 1.Map of (a) the 102 selected water quality monitoring sites and their catchment boundaries, with insert showing the location of the state of Victoria within Australia; (b) annual average temperature and (c) annual precipitation and (d) elevation across Victoria.
https://doi.org/10.5194/hess-2019-342Preprint.Discussion started: 23 July 2019 c Author(s) 2019.CC BY 4.0 License.temporalexplanatory variables, as well aseach constituent to improve the symmetry of individua l distributions.The log-sinh transformation(Wang et al., 2012) was used for all catchment characteristics due to the presence of zero values in several characteristics (e.g., percentage area of different types of land use).The best log-sinh transformation parameter was determined for each spatial explanatory variable across all 102 catchments.In addition, all observed constituent concentrations and temporal explanatory variables were Box-Cox transformed.For each variable, i.e., 21-year time-series data across all 102 sites, we first identified the optimal Box-Cox parameter at each site λ, and then the averaged λ across all sites to determine the final λ used to transform a respective variable.This ensured a consistent transformation for each variable across all sites.All log-sinh and Box-Cox transformation parameters used are summarized in Table https://doi.org/10.5194/hess-2019-342Preprint.Discussion started: 23 July 2019 c Author(s) 2019.CC BY 4.0 License.via a Spearman correlation analyses (p<0.05) between the fitted parameter values of each temporal predictor variable (β_T1 to β_Tn) and potential spatial explanatory variables as mentioned in Sect.2.2.
3.1).As explained in Sect.2.2, the model calibration for each constituent was performed with only the above-LOR data.Therefore, model performance was first evaluated with only these above-LOR data.Performance was then evaluated with the full dataset including the below-LOR data, to understand the model capacity to simulate the full distribution of https://doi.org/10.5194/hess-2019-342Preprint.Discussion started: 23 July 2019 c Author(s) 2019.CC BY 4.0 License.constituent concentration.In addition, the model performance for capturing spatial differences was assessed by comparing the simulated and observed long-term mean concentration at each site.The performance assessments were based on both visual inspection of model fitting as well as the Nash-Sutcliffe efficiency (NSE), which suggested the proportion of variability that can be explained by the models (Nash and Sutcliffe, 1970).
https://doi.org/10.5194/hess-2019-342Preprint.Discussion started: 23 July 2019 c Author(s) 2019.CC BY 4.0 License.values of 0.216 and 0.225, where the proportion of below-LOR samples were 17.3% and 15%, respectively.When evaluated against the entire dataset (i.e., including both below-and above LOR data), the models explain 19.9% (FRP) to 88.6% (EC) of spatio-temporal variability (Table1).Model performances for FRP, NOx and TSS improve notably compared with previous evaluation on above-LOR data.However, FRP, NOx and TSS remain as the three constituents that are most difficult to predict.We further discuss the possible factors influencing their model performance in Sect.4.2.

Figure 2 .
Figure 2. Performance of the spatio-temporal models for each of the six constituents, represented by the simulated and observed concentrations of above-LOR records across all 102 calibration sites, in Box-Cox transformed space.Darker regions represent denser distribution of simulation and observation points.Dashed red lines show the 1:1 lines whereas dashed blue lines show the LOR levels.For each constituent, the percentage of data below the LOR and the model performance (NSE) are also specified.Table 1.Comparison of model performance for all records and only the above-LOR records for each constituent.

Figure 3 .
Figure 3. Model fit for site-level mean concentration at the 102 calibration sites, for the selected six constituents, in Box-Cox transformed space.The NSE for each constituent is also show and dashed red lines show the 1:1 lines.
https://doi.org/10.5194/hess-2019-342Preprint.Discussion started: 23 July 2019 c Author(s) 2019.CC BY 4.0 License.Table 2. Comparison of model performances (as NSE) of the full model (Column 2) and the five partial models (Columns 3 to 7) with each calibrated to 80% randomly selected monitoring sites.In Columns 3 to 7, the top row showing calibration performance and the bottom row showing the validation performance (i.e. at the 20% sites that were not used for calibration).
and the post-drought (2010-2014) periods.For each of the models, the calibration performance is shown on the top row and the validation performance (i.e. over the periods that were not used for calibration) is shown on the bottom row.https://doi.org/10.5194/hess-2019-342Preprint.Discussion started: 23 July 2019 c Author(s) 2019.CC BY 4.0 License.304 Figure 4. Comparison of the TSS model performance, as the simulated against observed site-305 level mean concentrations in Box-Cox transformed space.The left column shows calibration 306 performance for the model calibrated to the pre-drought (1994-1996), drought (1997-2009) and 307 the post-drought (2010-2014) periods, respectively; the right column shows the corresponding 308 validation performance for each period.See Sect.2.4 for details of the calibration and validation 309 https://doi.org/10.5194/hess-2019-342Preprint.Discussion started: 23 July 2019 c Author(s) 2019.CC BY 4.0 License.majority of sites for the pre-drought period.This is consistent with Fig. 4 in suggesting a systematic decrease in TSS concentration since the drought.The better performance of the full model during and after drought (Fig. 5) can be a results of calibration period of the full spatio-temporal modelbetween 1994 and 2014which was dominated by the during-and post-drought periods; consequently, the full spatio-temporal model can be largely defined by observed TSS dynamics during and after the drought.In summary, Figs. 4 and 5 suggest that since the drought, TSS concentrations experienced a largescale downward shift compared to the pre-drought period, under otherwise identical spatial and temporal conditions.Such a shift indicates changes in the relationships between TSS and its key spatial and temporal controls since the start of the drought.Some possible causes are further discussed in Sect.4.3.

Figure 5 .
Figure 5.Comparison of the performance of the full spatio-temporal TSS model across a) predrought (1994-1996), b) during drought (1997-2009) and c) post-drought (2010-2014) periods, as represented by the simulated against observed site-level mean concentrations in Box-Cox transformed space.
https://doi.org/10.5194/hess-2019-342Preprint.Discussion started: 23 July 2019 c Author(s) 2019.CC BY 4.0 License.from removing a large proportion of below-LOR data.This substantially increased the degrees of skewness and discontinuity of the data, essentially violating the assumption of linear modelling of continuous data, and thus limiting the performance of the spatio-temporal model.It is worth noting that in this study, since we modelled spatial and temporal variabilities in an integrated manner, the model may compensate representation of the individual components of spatial and temporal variability to improve fitting to the overall variability during calibration.Consequently, in this spatio-temporal modelling framework, large presence of below-LOR data can limit the accurate representation of both variability components.
https://doi.org/10.5194/hess-2019-342Preprint.Discussion started: 23 July 2019 c Author(s) 2019.CC BY 4.0 License.characterized by lower streamflows and possibly a shift towards more intermittent flows in many parts of the world (Saft

Table 2
, see Figs.S1 to S6 in the SupplementaryMaterial for detailed fitting plots for the partial calibration/validation).These suggest that the spatiotemporal model performance are highly robust and remain unaffected by the choice of calibration sites.

Table 1
, model performance is best for TKN and EC, where proportions of below-LOR records are low.For constituents where the LOR records occupy greater proportions of the entire dataset, we observe poorer model performances (e.g.FRP).As illustrated in Fig.2, the FRP calibration data have a clear left-truncation pattern resulted Tang et al., 2005;DeFries and Eshleman are available from Australia state agencies, such as the Victorian Water Quality Monitoring Network database (Department of Environment Land Water and Planning Victoria, 2016) and the NSW Water information database (WaterNSW, 2018), and collated at national level in the https://doi.org/10.5194/hess-2019-342Preprint.Discussion started: 23 July 2019 c Author(s) 2019.CC BY 4.0 License. of Meteorology's Water Data Online portal (Bureau of Meteorology, 2019).These datasets may have great potential to enhance the temporal resolutions of records for other key water quality constituents (e.g.nutrients and sediments).Changes in land management over time (e.g.tillage, fertiliser application, irrigation) are currently not considered as predictors of water quality temporal variability.This is due to a lack of availability and/or inconsistency of available data.However, changes in land use management practices can occur over short time periods, which can lead to increases in pollutant sources and changes to runoff generation processes (e.g.Tang et al., 2005;DeFries and Eshleman

Table 2 . Comparison of model performances (as NSE) of the full model (Column 2) and the five partial models (Columns 3 to 7) with each calibrated to 80% randomly selected monitoring sites. In Columns 3 to 7, the top row showing calibration performance and the bottom row showing the validation performance (i.e. at the 20% sites that were not used for calibration).
://doi.org/10.5194/hess-2019-342Preprint.Discussion started: 23 July 2019 c Author(s) 2019.CC BY 4.0 License. https