Multivariate autoregressive modelling and conditional simulation for temporal uncertainty propagation in urban water systems

Uncertainty is often ignored in urban water systems modelling. Commercial software used in engineering practice often ignores uncertainties of input variables and their propagation because of a lack of user-friendly implementations. This can have serious consequences, such as the wrong dimensioning of urban drainage systems (UDS) and the inaccurate estimation of pollution released to the environment. This paper introduces an uncertainty analysis framework in urban drainage modelling and applies it to a case study in the Haute-Sûre catchment in Luxembourg. The framework makes use of the EmiStatR model 5 which simulates the volume and substance flows in UDS using simplified representations of the drainage system and processes. A Monte Carlo uncertainty propagation analysis showed that uncertainties in chemical oxygen demand (COD) and ammonium (NH4) loads and concentrations can be large and have a high temporal variability. Further, a stochastic sensitivity analysis that assesses the uncertainty contributions of input variables to the model output response showed that precipitation has the largest contribution to output uncertainty related with water quantity variables, such as volume in the chamber, overflow volume and 10 flow. Regarding the water quality variables, the input variable related to COD in the wastewater has an important contribution to the uncertainty for COD load (66%) and COD concentration (62%). Similarly, the input variable related to NH4 in the wastewater plays an important role in the contribution of total uncertainty for NH4 load (34%) and NH4 concentration (35%). The Monte Carlo simulation procedure used to propagate input uncertainty showed that among the water quantity output variables, the overflow flow is the most uncertain output variable with a coefficient of variation (cv) of 1.59. Among water 15 quality variables, the annual average spill COD concentration and the average spill NH4 concentration were the most uncertain model outputs (coefficients of variation of 0.99 and 0.82, respectively). Also, low standard errors for the coefficient of variation were obtained for all seven outputs. These were never greater than 0.05, which indicates that the selected MC replication size (1,500 simulations) was sufficient. We also evaluated how uncertainty propagation can explain more comprehensively the impact of water quality indicators for the receiving river. While the mean model water quality outputs for COD and NH4 20 concentrations were slightly above the threshold, the 0.95 quantile was 2.7 times above the mean value for COD concentration, and 2.4 times above the mean value for NH4. This implies that there is a considerable probability that these concentrations in the spilled CSO are substantially larger than the threshold. However, COD and NH4 concentration levels of the river water will likely stay below the water quality threshold, due to rapid dilution after CSO spill enters the river.

The objectives of this study are to: 1) select and characterise the main sources of input uncertainty accounting for temporal 95 auto-and cross-correlation within EmiStatR; 2) propagate input uncertainty through EmiStatR, taking into account temporal auto-and cross-correlation of uncertain dynamic inputs; 3) quantify and assess the contributions of each uncertainty source to model output uncertainty dynamically (over time) for the Luxembourg case study. Sewer Flow (CSF) and pollution; and 6) Combined Sewer Overflow (CSO) and pollution. Figure 1 illustrates the scheme of the sewer system analysed. 105 Basically, the total dry weather flow, Q DWF [l · s −1 ] is calculated as: 4 https://doi.org/10.5194/hess-2020-342 Preprint. Discussion started: 10 July 2020 c Author(s) 2020. CC BY 4.0 License.
where Q DWFt [l · s −1 ] is the dry weather flow at time t and Q st [l · s −1 ] is the dry weather flow of the residential sewage in the catchment at time t, calculated as 86, 400 −1 · pe t · qs t (where 86, 400 = 24 × 60 × 60 is a measurement unit conversion factor), with pe t [PE] the population equivalents of the connected CSO structure at time t, and qs t [l·PE −1 ·d −1 ] the individual 110 water consumption of households at time t. Q ft [l · s −1 ] is the infiltration flow at time t that enters the pipes from groundwater flow through cracks and joints, calculated as A imp · q ft , where A imp [ha] is the impervious area of the catchment, and q ft [l · s −1 · ha −1 ] is the infiltration water inflow flux (specific infiltration discharge from groundwater flow) at time t. Variables qs t and pe t are dynamic and can be defined as time series with daily, weekly and seasonal patterns.
The contribution of rain water to the combined sewage flow, Q r [m 3 · s −1 ], is derived from precipitation as follows: where 1/6 is a factor for units conversion, P t is a time series of precipitation per unit time at time t [mm · min −1 ]; A imp is the impervious area of the catchment [ha]; A total is the total area of the catchment [ha]; C imp is the run-off coefficient for impervious areas [-]; and C per is the run-off coefficient for pervious areas [-]. From Q rt , the CSO volume calculation is based on the exceeding volume stored in the Combined Sewer Overflow Chamber (CSOC). The CSO volume depends on four CSOC [mg · l −1 ], and the concentration due to rainwater pollution at time t, COD rt [mg · l −1 ]: 125 B COD,Svt = (cs mrt + 1) −1 V Svt (cs mrt · C CODt + COD rt ) The variable V Svt depends directly on the water volume in the CSO chamber at time t, V Chambert [m 3 ]. It is computed as: where V rt is the rain weather volume at time t accumulated during a time interval ∆t [min], V dwt [m 3 ] is the total dry weather volume (amount of dry weather water in combined sewage flow) at time t, V dt is the volume of throttled outflow to the WWTP C COD t = 10 3 · pe t · C COD,S qs t · pe t + 86, 400 · A imp · q ft (5) where C COD,S is the COD sewage pollution per capita [PE] load per day [g · PE −1 · d −1 ]. Similar equations as above apply to 135 the second water pollution indicator NH 4 .

Sewer system in the Haute-Sûre catchment
The study area is composed of three sub-catchments of the Haute-Sûre catchment in the north-west of the Grand-Duchy of Luxembourg. The combined sewer system drains three villages: Goesdorf (GOE), Kaundorf (KAU), and Nocher-Route (NOR).
The local sewer system downstream each village has a CSO tank to store pollutant peaks in the first flush of combined sewage 140 flows. Table 2 shows the general characteristics of each CSO tank for each village. Figure 2 depicts the location of the CSO tanks and the delineation of the sub-catchments. The main land use types in the villages are residential, smaller industries and farms. Outside of the villages forest as well as agricultural arable and grassland are the dominating land uses. The receiving water bodies of the CSO structures are tributaries of the river Sûre (Sauer, in German).

Input data 145
The input variables of the EmiStatR model are shown in Table 1. Following Torres-Matallana et al. (2018), seven input variables were calibrated: water consumption (qs t ), infiltration flow (q ft ), flow time structure equivalent to the time of concentration to the combined sewer overflow tank (CSOT) structure (t f S ), run-off coefficient for impervious area (C imp ), run-off coefficient for pervious area (C per ), orifice coefficient of discharge (C d ), and the initial water level (Lev ini ). The main objective of the calibration process is to represent appropriately the water volume in the CSOT.

150
The observed precipitation (P t ) is a one year time series for 2010 at 10 minute time interval, measured at stations Esch-sur-Sûre and Dahl (Fig. 2). The variable water consumption (qs t ) is also dynamic and represented as a time series with a daily pattern according to factors proposed in the design German guideline ATV-A 134 (Evers et al., 2000). while COD r was equal to zero. Table 1 summarises the base values of the general input variables and Table 2 presents the base values of input variables for each individual CSO. These base values were used when running EmiStatR in deterministic mode (see Section 3.1). Some of the variables were calibrated based on observations in the CSOT to simulate water level and concentrations and loads of pollutants spilled in the CSO to the stream, river or lake. Average spill COD conc. e , C COD,Sv ,av [mg · l −1 ] 99.9th perc.. f spill COD conc., C COD, Sv ,99 .9 [mg · l −1 ] Maximum overflow COD conc., Average spill NH 4 conc., C NH4 ,Sv ,av [mg · l −1 ] 99.9th perc. spill NH 4 conc., C NH4 ,Sv ,99 .9 [mg · l −1 ] Maximum spill NH 4 conc., C NH4 ,Sv ,max [mg · l −1 ] a PE = population equivalents; b COD = chemical oxygen demand; c NH 4 = ammonium; d coef. = coefficient; d conc. = concentration; f perc. = percentile.

Selection of model input for uncertainty quantification
Following recommendations from Nol et al. (2010), not all model inputs were taken into account in the uncertainty propagation analysis. Only inputs that are very uncertain and to which the model output is very sensitive were included because these are the ones that have the largest contribution to output uncertainty (Heuvelink (1998), Section 4.4). The level of uncertainty of the inputs was defined by expert judgement and similar case studies in the literature. A quick-scan was used to determine the model sensitivity to each of the model inputs, by running EmistatR in deterministic mode with input base values given in Table 1.

170
The level of model sensitivity was defined by analysing the mathematical model structure and components of the model, expert judgement and simulations with EmiStatR. Inputs that rank high on both the level of uncertainty and on model sensitivity were selected and included in the uncertainty propagation analysis.

Uncertainty quantification of selected model input
Because we used a statistical approach, probability distribution functions (pdfs) are the basis to represent uncertainties of the 175 selected model inputs. This constitutes the most difficult step of an uncertainty analysis and is done in different ways for constants and dynamic variables, as explained in the following sub-sections.

Uncertain constants
Following Heuvelink et al. (2007), an uncertain continuous numerical constant C can be characterised by its marginal (cumulative) pdf (mpdf): Usually a parametric approach can be taken, meaning that a common shape for F C is chosen (e.g., normal, lognormal, exponential, uniform) so that the mpdf is reduced to a number of parameters. In this study, the input variables that are in this category are: water consumption (qs), infiltration inflow (q f ), total area (A total ), impervious area (A imp ), the run-off coefficients for impervious area (C imp ) and pervious area (C per ), population equivalents (pe), flow time structure (t f S ), and initial water level 185 (Lev ini ).

Univariate autoregressive modelling
Dynamic uncertain inputs may be temporally autocorrelated. This may dramatically influence the outcome of an uncertainty propagation analysis and must therefore be accounted for. One way of doing this is by assuming an autoregressive order one (AR(1)) model: where y t is the uncertain input at time t, µ is its mean, φ is the autoregressive parameter (0 ≤ φ < 1), and w t is a Gaussian white noise time series with mean zero and variance σ 2 w . The initial value y 0 is taken from a normal distribution with mean µ and variance σ 2 . The parameters of the model can be estimated based on observations, or in absence of observations, suitable values are taken based on expert judgment or literature reference values. Note that the effect of the initial condition usually 195 fades out quickly and hence is not of important concern.
The implementation of the AR(1) model in R was done via the R function arima.sim of the R base package stats (R-Core-Team and contributors worldwide, 2017), both for model calibration and simulation.

Multivariate autoregressive modelling
In case of multiple uncertain dynamic inputs, cross-correlation between these inputs may also need to be included. For ex-200 ample, C COD,S and C NH4 ,S and their uncertainties are likely correlated. This can be done using a multivariate AR(1) model (Luetkepohl, 2005), which is a natural extension of the univariate AR(1) model: where Y (t) is a vector of inputs at time t, A is a square matrix with parameters that define how the variables at time t + 1 depend on those at time t, µ is now a vector of means and ε(t) a vector of zero-mean, normally distributed white noise 205 processes. We further assume that the variance-covariance matrix C of ε(t) is time-invariant. The initial value Y 0 is assumed normally distributed and uncorrelated (Λ is a diagonal matrix). In order to estimate the vector µ and matrices A and C, a sample of the variables of interest is needed. Parameter estimation is done by means of the R-package mAr (Barbosa, 2015).

Input precipitation model
In case precipitation is selected as an uncertain input to be included in the uncertainty analysis, then it too must be characterised 210 by a pdf. Since precipitation, however, is not normally distributed and has many zeros, we cannot make the Gaussian assumption and hence we cannot use the approach described in Section 2.5.2 to model its dynamic behaviour and uncertainty. In addition, we usually have precipitation measurements nearby so we need to condition the simulations to these measurements. Recall from Section 2.3 that in the case study precipitation data are recorded at stations Esch-sur-Sûre and Dahl.
Torres-Matallana et al. (2017) present a model to simulate precipitation inside a target catchment given a known precipitation 215 time series in a nearby location outside the catchment, while accounting for the uncertainty that is introduced due to spatial variation in precipitation. The method used for input precipitation uncertainty characterisation is essentially the same as the application of a Kalman filter/smoother (Kalman, 1960;Webster and Heuvelink, 2006). Calibration of that model requires precipitation time series at two locations near the catchment of interest. We briefly summarise the method here. We denote the measured time series of precipitation at the first location as P 1 (t) and that at the second location as P 2 (t). Once the model 220 is calibrated, it is used to simulate precipitation inside the target catchment from a single precipitation time series nearby the catchment.

Calibration
We begin by relating the two precipitation time series as: where δ(t) is a positive multiplicative factor that varies over time. We assume that P 1 (t), P 2 (t) and δ(t) are stationary and log-normally distributed stochastic processes. After log-transformation we get log(P 1 (t)) = log(P 2 (t)) + log(δ(t)) (10) We apply a Kernel (daniell) smoothing to the precipitation time series to avoid rapid fluctuation of the time series for precipitation depth values smaller than 0.1 mm. This also solves problems associated with taking logarithms of near-zero 230 values. Next, in order to estimate the parameters of δ(t), we filter the time series allowing the computation of a ratio between the two measured time series. This ratio represents the difference in precipitation as registered in two nearby rain gauge stations.
It is computed only for those cases where the precipitation depth of the two time series is greater than 0.01 mm.
To simplify notation we write LP 1 (t) = log(P 1 (t)), LP 2 (t) = log(P 2 (t)) and Lδ(t) = log(δ(t)). Since two out of three determine the third, we need only define two processes. We model the joint distribution of LP 1 (t) and Lδ(t) by a bivariate 235 AR(1) process, as introduced before: where ε 1 and ε δ are zero-mean, cross-correlated and normally distributed white noise processes.

Conditional simulation
To simulate a time series P for the target catchment from an observed time series P o at a nearby location, we make use of the fact that the calibrated AR(1) model quantifies how precipitation at one location relates to that at a nearby location. We make use of Eq. 9: This requires simulations of δ(t). These are obtained using the calibrated model Eq. 11, but now applied to the vector [LP o Lδ] T , which characterises the joint pdf of LP o and Lδ. We use this model to simulate Lδ conditional to the observed time series LP o . Since the two processes are jointly normally distributed we can make use of a well known property of the multivariate normal distribution (Searle, 1997, page 47). Let U and V be two jointly normally distributed random vectors. The 250 conditional distribution of U given V = v is then also normal and given by: We make use of this equation to simulate δ by substituting: (MC) simulation. We used MC simulation (Hammersley and Handscomb, 1964;Kalos and Whitlock, 2008) to analyse how 260 input uncertainty propagates through the EmiStatR model, because it is flexible and straightforward to implement. It is also feasible in our case study because EmistatR is a relatively simple model that does not involve a long computation time.

Monte Carlo simulation
The MC method runs the EmiStatR model repeatedly, each time using different model input values, sampled from their pdf.
The method thus consists of the following steps: 2. Compute and store sample statistics from the n model outputs.

270
Here, n is the number of MC runs, i.e. the MC sample size. Common sample statistics that measure the uncertainty are the standard deviation and quantiles of the distribution of MC outputs, such as the difference between the 0.95 and % 0.05 quantile, which can be easily calculated from the n Monte Carlo outputs.
Sampling from the pdf of uncertain inputs was done using simple random sampling.

275
Proper presentation of MC outputs is important to get the most out of the experiment. Therefore, summary statistics are one important way to summarise the MC outputs. Commonly, a MC study yields n model outputs, which are stored in the MC result matrix X in Boos and Osborne (2015). From this matrix, various statistics can be computed. Basic summary statistics include the mean µ MC , the standard deviation (σ MC ) and the variance σ 2 MC . From these we can compute the coefficient of variation CV MC (σ MC /µ MC ), which is a dimensionless expression of relative uncertainty. The coefficient of variation is a standardised 280 measure of the spread of a sampling distribution, being useful because it allows to directly compare variation in samples with different units, or with very different means (Marwick and Krishnamoorthy, 2019). We computed estimates and standard errors for these statistics and also for the interquartile range (IQR MC ), 0.005 (ζ 0.005 ) and 0.995 (ζ 0.995 ) quantiles, and the 99% width of the prediction band (ζ w,0.99 ).

Bootstrap computation for Monte Carlo summary 285
Following Boos and Osborne (2015) "Good statistical practice dictates that summaries in MC studies should always be accompanied by standard errors", we used the bootstrap method to compute standard errors of all MC statistics. These tools are particularly relevant in a case without analytic solutions (Boos, 2003). According to Boos and Osborne (2015, p. 228) standard errors for MC output statistics are often not computed, being an additional computational step on top of the overall analysis.
Standard errors are straightforward to compute for simple statistics such as the sample mean over the replications of the MC 290 output, but are more difficult to compute for more complex statistics, such as medians, sample variances and the classical Pearson measures of skewness and kurtosis. Therefore, to avoid burdensome computations we opted to compute the standard 13 https://doi.org/10.5194/hess-2020-342 Preprint. Discussion started: 10 July 2020 c Author(s) 2020. CC BY 4.0 License. errors by the bootstrap method. We briefly explain the bootstrap method below. For a detailed explanation we refer to Efron (1979).
To compute the bootstrap variance of estimators we follow the logic given by Boos and Osborne (2015). From a MC sample 295 Y 1 , ..., Y n , we draw a random sample of size n with replacement Y * 1 , ..., Y * n , and compute an estimatorθ of the MC statistic θ from this resample. We independently repeat this process B times, resulting in a sample of estimatorsθ 1 , ...,θ B . Then the bootstrap variance estimate,V B , is the sample variance of this sample of estimators: whereθ is the mean of the sample of estimators. The MC standard error, se, is simply the square root of the bootstrap variance.

Contributions of input variables to total uncertainty
A number of m + 1 MC analysis are needed to compute the contributions of input variables to total uncertainty, where m is 305 the number of model input variables selected for uncertainty quantification. The first MC analysis, MC tot , is done to compute the total output uncertainty by varying stochastically all input variables. The uncertainty associated with the first variable x 1 is quantified by a second MC analysis MC 1 , in which only x 1 is equal to its deterministic value, while the other input variables vary stochastically. Similarly, the other MC simulations MC 2 , MC 3 , ..., MC m are used to quantify the uncertainty for the variables x 2 , x 3 , ..., x m .

310
To quantify the contributions of individual input variables to the total uncertainty of the model inputs, the stochastic sensitivity S i for each uncertain input x i is computed. The first-order stochastic sensitivity index S i is defined as (Saltelli et al., 2008, p. 160-161): The first-order stochastic sensitivity index represents the main effect contribution of each input factor to the total variance of 315 the output. The larger the index, the more important the input uncertainty. We computed stochastic sensitivity indices per time step and aggregated contributions for the whole year. For plotting purpose, we aggregated the outputs from 10 minutes time step to hourly time steps. The aggregation was done for each individual MC run before the contributions were computed.

Water quality impact
The results of the MC uncertainty propagation were also compared with the water quality standards. Standards are introduced 320 to evaluate the impact of emissions of COD and NH 4 in CSOs into the receiving water. However, as Toffol (2006) Blumensaat et al. (2012). We assessed the emissions accordingly to the German guideline ATV-A 128 (1992), which is the standard for dimensioning and design of stormwater structures in combined sewers and commonly used in Luxembourg. The Austrian ÖWAV-RB 19 (2007) is also taken into account because it provides key reference guidelines for design of urban water infrastructure in central Europe.
Three main indicators are taken into account: hydraulic impact, COD concentration, and acute ammonium toxicity.

Hydraulic impact
According to the Austrian guidelines and as summarised by Kleidorfer and Rauch (2011), the evaluation of the hydraulic impact is given by: is the maximum sewer overflow discharge with return period one year, and Q r1 [l·s −1 ] is 335 the maximum water discharge in the river with return period once per year. The factor f h is taken as 0.1 in more sensitive streams, whereas it is 0.5 for streams with more stable bed and higher re-colonisation potential Toffol (2006  For NH 4 a similar approach was used.

Acute ammonium toxicity
Following Kleidorfer and Rauch (2011), "the ammonia (NH 3 ) concentration depends on the ammonium (NH 4 ) concentration and on the dissociation equilibrium between NH 3 and NH 4 (which is influenced by temperature and pH-value)". According to Kleidorfer and Rauch (2011), the Austrian guideline ÖWAV-RB 19 (2007) establishes a maximum value of 2.5 mg · l −1 for 355 the ammonium (NH 4 ) concentration calculated for one hour duration for salmonid streams. For cyprinid streams a maximum value of 5.0 mg · l −1 is recommended.

Selection of model inputs for uncertainty quantification
In this section we assess the degree of uncertainty and sensitivity for all input variables, following the procedure described in 360 Section 2.4. We summarise the results in Tables 3 and 4.

Wastewater
Water consumption, qs, is a fairly uncertain input variable and the model output is sensitive to this variable. Volume and flow of CSO are sensitive to changes in qs. Regarding water quality output, total load of NH 4 is very sensitive to changes in qs.
Pollution of sewage as COD load per capita per day, C COD,S is the first selected input variable for propagation of uncertainty, 365 due to the fact that it is both a very uncertain input variable and the model output (average and 99.9 percentile overflow COD concentration) is very sensitive to it. Pollution of sewage as NH 4 load per capita per day, C NH4 ,S , is also included in the uncertainty propagation analysis. It is a very uncertain input variable and the model output (overflow load and concentrations of NH 4 ) is very sensitive to it. The variables C COD,S and C NH4 ,S are very uncertain because these are correlated to the temporal and spatial pattern of water consumption, which has a daily, weekly and seasonal temporal variability.

Infiltration water
Inflow of infiltration water, q f is a very uncertain input variable because this inflow depends of the number of anomalies in the pipes (cracks or wrong connections) that allow infiltrations flowing into and out of the system. The distribution of these anomalies has a strong random component and hence q f is very uncertain, and model output is sensitive to it. Pollution of infiltration water as COD load per capita per day, COD f and pollution of infiltration water as NH 4 load per capita per day, 375 NH4 f are not uncertain because in the Haute-Sûre study area the values of these variables are negligibly small.

Rainwater
Precipitation, P is the main driving force of the model and given the spatial variability of the rain fields, this input is considered very uncertain. The model output, additionally, is very sensitive to it. As a consequence, this input variable is treated as the third input variable in the uncertainty propagation analysis. Pollution of runoff as COD concentration, COD r is the fourth 380 16 https://doi.org/10.5194/hess-2020-342 Preprint. Discussion started: 10 July 2020 c Author(s) 2020. CC BY 4.0 License.
input variable considered in the uncertainty propagation, given that it is a very uncertain and very sensitive input variable, particularly to load and concentration of COD in the overflow. Pollution of runoff as NH 4 concentration, NH4 r is considered fairly uncertain. The model output (overflow load and concentration of NH 4 ) is very sensitive to it.

Sub-catchment
The model is very sensitive to the total area A total and to the run-off coefficient for pervious area (C per ) and sensitive to the 385 impervious area A imp and to the run-off coefficient for impervious area (C imp ). However, we did not include A total and C per in the uncertainty analysis because these can be fairly accurately derived from spatial databases and hence their uncertainty is not large. The population equivalents pe is a sensitive variable but not very uncertain. Hence this variable was not included in the uncertainty analysis. The theoretical largest flow time in the catchment t f S is not uncertain and not sensitive.

CSO structure 390
Although model output is very sensitive to maximum throttled outflow Q d,max and volume V , these are not included in the uncertainty analysis because their values are accurately known. The same is true for the variables curve level -volume lev2vol, orifice diameter D d and discharge coefficient C d . These variables are accurately known and therefore not considered as uncertain variables. The initial water level in the chamber Lev ini is very uncertain but the model output is not sensitive to this variable. Therefore, Lev ini was not included in the uncertainty analysis.

Uncertainty quantification of selected model input
After evaluation of the model output sensitivity and taking into account the degree of uncertainties of each input, we selected four input variables to be included in the uncertainty analysis. These are C COD,S , C NH4 ,S , COD r and P (Table 4).

Sewage per capita COD and Ammonium
The fit of pdfs for the two uncertain inputs C COD,S and C NH4 ,S was based on measurements under dry weather flow condi-  Table 5 presents summary statistics of the dry weather flow measurements of COD and NH4 and the corresponding value of C COD,S and 405 C N H4,S . COD is converted to C COD,S by means of a simple conversion from mg·l −1 to g·PE −1 ·d −1 , by multiplying COD by the measured per capita flow (112 l·PE −1 ·d −1 ) and dividing by 1,000. NH 4 was converted to C N H4,S in a similar way.
Closer inspection showed that C COD,S and C NH4 ,S observations are best characterised by a lognormal distribution (Fig. 3).
Since C COD,S and C NH4 ,S are dynamic and cross-correlated, we calibrated a bivariate AR(1) model with state vector Y = [log(C COD,S ) log(C NH4 ,S )] T . The estimated parameters of the model using the methodology described in Section 2.5.3 are: 410 The defined multivariate autoregressive model also capture the dynamic behaviour, temporal correlation and cross-correlation of the input variables, deriving the probability distributions of C COD,S and C NH4 ,S from measurements in the Haute-Sûre catchment, which agreed well with values reported in the literature (Katukiza et al., 2014;Heip et al., 1997).

415
Regarding COD r , due to the fact that no field measurements were available, expert judgement and reference values from the literature were the basis to characterise the pdf of this input variable. The variable was assumed to be lognormally distributed with a mean value of 71 [mg·l −1 ]. Although, House et al. (1993) and Welker (2008) reported a higher value, 107 [mg·l −1 ] for COD r , we selected a lower value due to the specific characteristics of the CSO system in the Haute-Sûre catchment. The value of 150 [mg·l −1 ] as standard deviation of COD r leads to a coefficient of variation (sd·mean −1 ) equal to 2.11, which is greater 420 than the coefficient of variation for C COD,S (0.84). We allow the standard deviation of COD r to be greater than the standard deviation of C COD,S , because COD measurements in rain water are very uncertain.  20 https://doi.org/10.5194/hess-2020-342 Preprint. Discussion started: 10 July 2020 c Author(s) 2020. CC BY 4.0 License.

Input precipitation model
Precipitation and its associated uncertainty was modelled as an autoregressive model conditioned to the observed precipitation at a nearby measurement station. We assumed a multivariate lognormal distribution and included temporal correlation of the Next we generated conditional simulations of the 10-minute precipitation for 2010 for each subcatchment using the approach described in Section 2.5.4. Note that this involves simulating log-transformed precipitation which can easily be transformed to precipitation data using the antilog. The simulation procedure was repeated as many times as simulated precipitation time series were required for the MC uncertainty propagation analysis.
The simulated precipitation time series captured the main statistics of the observed time series well. Despite the satisfactory 435 performance of the proposed method, some cases showed an overestimation of the simulated precipitation, mainly due to high values of the ratio of the multiplicative factor δ(t). This behaviour was also recognised by McMillan et al. (2011), who stated that the multiplicative factor used in their study "does not capture the distribution tails, especially during heavy precipitation where input errors would have important consequences for runoff prediction".

Uncertainty propagation 440
Model output sensitivity and the degree of uncertainties evaluation of each model input helped to define the four input variables included in the uncertainty analysis: C COD,S , C NH4 ,S , COD r and P . In this section we present the results of the uncertainty propagation for these four selected input variables to the model output, both for water quantity (volume in the combined sewer overflow tank, CSOT, and overflow volume and flow) and for water quality (loads and concentrations of chemical oxygen demand, COD, and ammonium, NH 4 ).

Monte Carlo simulation size
In order to perform the MC propagation analysis, we first did a convergence test to estimate the number of simulations required.
Besides this test, we also computed the standard error of all MC outputs. These two methods have the same aim and are closely related. In the convergence test, the standard deviation of two different MC simulations with different random seeds were computed and compared for the seven output variables of EmiStatR, three representing water quantity variables (V Chamber , 450 V Sv and Q Sv ) and four for water quality (B COD,Sv , B N H4,Sv , C COD,Sv,av , and C N H4,Sv,av ). The results of the test indicated that in most cases between 250 and 1,000 MC simulations are enough to reach stable results in terms of the Nash-Sutcliffle model efficiency coefficient (NSE), where a NSE of 1 means a perfect match between observations and model output. In this case we got a NSE ≈ 0.998 for overflow volume. Regarding the water quality variable B COD,Sv , the test showed that a larger number of MC simulations is required. Between 1,000 and 2,000 simulations are required to reach stable results (NSE ≈ 0.880 455 for overflow COD load and 0.998 for overflow NH 4 load). Therefore, a number of 1,500 MC simulations was used to perform the uncertainty analysis of the water quantity and water quality outputs. Figure 4 illustrates results of the convergence test for the cases where the number of MC replications is 250, 1,000 and 1,500. In this figure the MC1 output is plotted on the x-axis and MC2 output on the y-axis. Although the model output corresponds to yearly time series at 10 minutes resolution, we only plotted those points where the overflow magnitude, and therefore COD and NH 4 load, is different from zero. As an 460 indication, for a MC replication size of 1,500, the NSE values for overflow COD and NH 4 concentrations are 0.816 and 0.998, respectively.
The computing times per MC replication are presented in Table 6. The computations were performed with two different Linux machines, a laptop with four cores for simulations between 50 and 500 replications, and a server with 80 cores for performing the simulations above 500 replications. Similar execution times were reached for MC1 and MC2 for one-month 465 time series at 10-min time steps (August 2010, 4,464 time steps), while substantial differences were obtained when the 80 cores server was used. We obtained similar timing for 1,500 replications with 50 cores as for 250 replications using three cores in the laptop. The timing reached demonstrates the feasibility to perform a solid MC uncertainty propagation analysis with EmiStatR.

Monte Carlo output and uncertainty quantification 470
The seven output variables from EmiStatR were analysed by MC input uncertainty propagation. Figure 5   tation as main driving input. For illustration purposes, two events of two-day duration each are shown. The first event occurred in spring (May 2010), the second in both events, and shows that the uncertainty is high when there is a high precipitation event.
The more intense the precipitation input, as seen in the figure inset at top-left (May event), the greater the uncertainty band width for overflow flow, as well as for COD and NH 4 loads (insets three and four) and concentrations (insets five and bottom).
The MC estimated statistics and the standard errors (se) are presented in Table 7. The table shows the uncertainty quantification 480 of outputs obtained from the MC uncertainty propagation for the first MC simulation (all selected input variables are uncertain).  Table 7 shows the standard deviation (sd) and the coefficient of variation (cv) for the seven output variables considered in the uncertainty propagation. For the volume in the CSO chamber, V Chamber , the annual mean standard deviation, σ MC , Low standard errors (se) for the coefficient of variation were obtained for all seven outputs. These were never greater than 0.05, which indicates that the selected MC replication size (1,500 for mc1) is a suitable value. This holds for all output statistics, because in all cases the standard error is small to the estimated value.

Contributions of input variables to total uncertainty
The contributions of input variables to the total uncertainty of the model inputs were also computed using the procedure described in Section 2.6.4. A total of four MC simulations with a total of 6,000 runs were performed for estimating S i (Eq. (16)). Afterwards, four contributions were evaluated per time step and aggregated for the whole year. Following Eq. (16), the per time step contributions of input variables to output variables in terms of percentage of variance, stochastic sensitivity S i of 500 the input variables C COD,S , C N H4,S , COD r and P were calculated. An example of the contributions analysis per time step is presented in Fig. 6. Here we remark that a high uncertainty over time is shown mainly for the Spring event.
The aggregated over time contributions of input variables to output variables in terms of percentage of variance, stochastic sensitivity S i of the input variables, were also calculated ( Table 8). Note that P is the only source of uncertainty for V Chamber and V Sv , while uncertainty in NH 4 inputs only propagates to NH 4 outputs, and similar for COD (Fig. 6).     27 https://doi.org/10.5194/hess-2020-342 Preprint. Discussion started: 10 July 2020 c Author(s) 2020. CC BY 4.0 License.
We found, as expected, that precipitation, P , is the only source of uncertainty from all uncertain input considered for water quantity output variables V Chamber and V Sv . Regarding average values for the whole year, for the water quality output variables B COD,Sv and C COD,Sv ,av , C COD,s has the largest contribution to the output variance, about 66 percent for B COD,Sv and about 62 percent for C COD,Sv ,av . The second variable that contributes to uncertainty of these COD output variables is P , with about 3 percent for B COD,Sv and 9 percent for C COD,Sv ,av . Similarly, the input variable C N H4,S play an important role in the 510 contribution of total uncertainty for B NH4 ,Sv (on average about 34 percent of the variance for the whole year) and C NH4 ,Sv ,av (about 35 percent). Equally contributing to uncertainty of these NH 4 output variables is P with about 66 percent for B NH4 ,Sv and 65 percent for C NH4 ,Sv ,av . From these results we can infer that precipitation is a main source of uncertainty for all six outputs considered.

Uncertainty and water quality impact 515
Quantification and assessment of the water quality impact is an important step after the uncertainty propagation. As described in Section 2.7, the assessment of water quality standards was done taking into account the reference thresholds recommended in the European Union guidelines for COD, and the German and Austrian guidelines for hydraulic impact and acute ammonium toxicity. in the overflow volume was higher than the thresholds. However, note that when entering the river system it will quickly be 540 diluted, suggesting that the negative impact on the environment will be dampened by the receiving water body.

Acute ammonium toxicity
We compared the acute ammonium toxicity reference values presented in Section 2.7.3 (2.5 mg · l −1 for the ammonium concentration calculated for one hour duration for salmonid streams, and for cyprinid streams a maximum value of 5.0 mg · l −1 ), with the values we found for ammonium. An average spill NH 4 concentration with a mean of 7.19 mg · l −1 , standard deviation of 5.66 mg·l −1 , and 90% prediction band width of 16.64 mg·l −1 was obtained. Figure 5 (bottom) shows the average spill NH 4 550 concentration for the two periods chosen for illustration. The ammonium (NH 4 ) concentrations in the overflow flow are higher than the reference values, which are given for concentrations in the river.
Table 9 (bottom) shows the exceedance percentage of overflow NH 4 concentration over the 2.5 and 5.0 mg·l −1 thresholds for the mean, 0.95 quantile and 0.995 quantile. We found a 1.8% exceedance of the mean and quantile values over the 2.5 and 5.0 mg·l −1 thresholds. A slightly lower percentage (1.1%) was found for the 5.0 mg·l −1 threshold, regarding mean value.

Discussion
This study aimed to select and characterise the main sources of input uncertainty in urban water systems, while accounting for temporal auto-and cross-correlation of uncertain model inputs, by propagating input uncertainty through the EmiStatR model, and quantifying and assessing the contributions of each uncertainty source to model output uncertainty dynamically (over time).
In the following discussion, we start with the accuracy of Monte Carlo analysis. Then, we discuss the water quality impact of 560 the model outputs to the environment, in relation to the uncertainty analysis, and finally, we highlight some limitations and possible solutions of the approach used in this work.

Uncertainty and water quality impact
Next we discuss how the uncertainty propagation analysis done gives additional insight regarding hydraulics, COD concentration and acute ammonium toxicity impact on water quality over the river Sûre due to the CSO discharges under study. After 565 doing the uncertainty propagation analysis we not only have predictions of model outputs but we also know how uncertain these are. An added value arises when we take into account the uncertainty information. For the case of the overflow spill flow, the expected model output (mean of 55.5 l·s −1 ) is below the environmental threshold of 75 l·s −1 , but the 0.95 quantile (164.1 l·s −1 ) is much above the threshold. This indicates that there is a considerable chance of being above the threshold.
Regarding water quality outputs, although the mean model output for COD and NH 4 concentrations is fairly above of the 570 thresholds, the 0.95 quantile is 2.7 times above the mean value for COD concentration, and 2.4 times above the mean value for NH 4 . Also here we can conclude that we are not certain that we are below the threshold, because there is a considerable probability that the true values are above, even though the expected value is below the thresholds.
We were able to compute the water quantity and quality at CSO outlet to the river. We found that water quality (COD and NH 4 ) were sometimes above the environmental threshold. Even if the expected value was below the threshold there could still 575 be a considerable probability that the quality was above the threshold because of the large uncertainty. Therefore, policy and decision makers and water managers need to be aware of this, because whenever concentrations are above the threshold this may harm the environment. Nevertheless it is worth noting that we computed concentration in the outlet of the CSO. When this spilled water enters the river it will quickly mix with the much cleaner river water and concentrations will drop quickly, so it is only a local problem. How local it is and how the river water quality is distributed in space and time is not an easy problem 580 to solve and requires the use of hydrological and hydraulic river models e.g. SIMBA (IFAK, 2007) or MIKE 11 (DHI, 2017).
Those models have been well-developed and for some of them uncertainty analyses have also been done (Beven and Binley, 1992;Refsgaard, 1997;Beven and Freer, 2001;Vrugt et al., 2003aVrugt et al., , 2008Beven et al., 2010;Andrés-Doménech et al., 2010;Beven, 2012;Jerves-Cobo et al., 2020;Yu et al., 2020), but obviously such uncertainty analyses can only be done if the inputs to these models are known as well as the uncertainty associated with these inputs. One of these inputs is inlet from CSO. That 585 is where our paper makes a very valuable contribution, because our work has quantified water quantity and quality of CSO structures, including uncertainty, and that is exactly what these river models need to be able to do an uncertainty propagation analysis.

Accuracy of Monte Carlo analysis
Regarding the Monte Carlo replication size for uncertainty propagation, we presented in Fig. 4 the results for three output 590 variables and three replications size 250, 1,000 and the selected 1,500 (NSE closer to 1.0 for most of the output variables). We compute replications for 50, 100, and from 250 to 2,000 at steps of 250 replications the comparison of two equal MC runs (MC1 and MC2) with different seed for the pseudo-random number generator. the results suggest that the output variables related to COD (load and concentration) have a larger dispersion when we compare MC1 and MC2 for the same replications size. This is also reflected in the larger standard errors reported in Table 7 for e.g. the overflow COD load. Nevertheless, 1,500 runs are a 595 feasible MC replication size for running a relative simple and fast model as EmiStatR (7.29 minutes in average execution time using parallel computing and 50 cores for a time series with 4,464 time steps). For a more complex full hydrodynamic model with a high computational burden, 1,500 replication four times to compute contributions it may be not possible. Therefore, we suggest to check the intermediate results of the MC convergence test and we will find that e.g. for quantity variables as the spill overflow volume and quality variables as the overflow NH 4 load, 250 replications (7.10 minutes in average execution 600 time using parallel computing and three cores for a time series with 4,464 time steps) per individual MC execution seem to be enough, which make more feasible the execution of this kind of uncertainty propagation. Figures 5 and 6 shows that there is a large uncertainty for the early May event and smaller uncertainty for the September event. This is due mainly to the presence of a large dry period before the spill event in May, i.e. a shorter dry period preceding the spill flow leads to a lower uncertainty. This finding suggest also an interaction between the antecedent dry period and the

Other sources of uncertainty
In this work we only looked at input uncertainty and not at parameter and model structural uncertainty. Further research can be done on those topics. Neumann (2007)  uncertainty. They emphasised that "still, uncertainty analysis is seldom applied in practice and the relative contribution of the individual model elements is poorly understood.". Also, they highlighted that after inferring the river process parameters with system measurements of flow and dissolved oxygen, combined sewer overflow pollution loads became the dominant uncertainty source along with rainfall variability. These findings agreed with our results.

615
Bachmann-Machnik et al. (2018) recognised that the most important parameters causing uncertainties in the sewer system model are connected area and the stormwater runoff quality. Our analysis confirms these findings, specifically regarding the stormwater runoff quality. In our study the input variable runoff COD was an important source of uncertainty with relation to the annual mean overflow COD released from the CSO.

620
Despite the extensive temporal uncertainty propagation analysis the approach also has some limitations which we present hereafter addressing possible solutions in future work.
1. Incorporation of the spatial distribution of model inputs. Specifically for precipitation, Breinholt et al. (2012) stated that due to a poor representation of the spatial precipitation that is measured by point gauges and the complexity of the sewer systems, large output uncertainty can be expected. We also infer that we obtained a large output uncertainty 625 due to neglect of the inherent spatial variability of precipitation. Therefore, we suggest that further research is needed to account for spatial variability of precipitation, that can bring light to understand how this variability impacts in the output uncertainty and quantify it properly. This issue should be related also to the problem of change of support. When modelling precipitation, we also ignored the support effect, i.e. we ignored that the sub-catchment area is much greater than a point. Future research may address this issue of change-of-support. Studies that tackled this issue are found e.g. place and that overall uncertainty in key water quality parameters actually decreases. We contribute in this discussion by advising to quantify uncertainties at sub-model level, because as we demonstrated the computational budget can be 635 reduced and make it feasible when dealing at the sub-module uncertainty propagation.
3. Accounting for cross-correlation between the inputs precipitation and runoff COD concentration. It is worth noting that we did not include correlation between COD r and P . Including such correlation would yield a more realistic model of the uncertainty because these variables are known to have a strong correlation. It is highly recommendable to include correlations between COD r and precipitation, because loads in chemical oxygen demand are correlated with 640 the overland flow due to precipitation, which may transport distributed pollutants to the sewer system. Also the inputs C COD,s and C NH4 ,s can be related with a daily curve that reflect the pattern of consumption in the household like the German ATV-A 134 curve. We used the latest version of EmiStatR (version 1.2.2.0), which considers this kind of patterns.
4. Absence of high frequency water quality observations to compare with model outputs and uncertainty prediction 645 bands. In order to gain understanding of the temporal dynamics of nutrients (nitrogen, N, and phosphorus, P), Yu et al.
(2020) applied high frequency monitoring in a groundwater fed low-lying urban polder in Amsterdam (The Netherlands).
They argued that although spatial and temporal concentration patterns from discrete sampling campaigns of water quality parameters, such as EColi, showed a clear dilution pattern, the temporal patterns of N and P were still poorly understood, given their reactive nature and more complex biogeochemistry. Therefore, high frequency measurement, is a key factor 650 to understand these temporal dynamics and patterns.

5.
Absence of a joint spatio-temporal uncertainty analysis. According to Zhou et al. (2020), the limitations in algorithms for classic uncertainty estimates is the cause that only the uncertainty in one dimension (either temporal variability or spatial heterogeneity) is considered, whereas the variation in the other dimension is dismissed, resulting in an incomplete assessment of the uncertainties. Zhou et al. (2020), also showed that classic metrics underestimate the uncertainty through 655 averaging, which means a loss of information in the variation across spatio-temporal scales. To handle this limitation, suitable methods are the three-dimensional variance partitioning for a new uncertainty estimation in both spatio-temporal scales (Zhou et al., 2020), or spatio-temporal geostatistics (Gräler et al., 2016).

Conclusions
In this final section we conclude with highlighting the importance of temporal uncertainty propagation analysis and the selec-660 tion and characterisation of uncertain model inputs impacting model sensitivity. We also point out that uncertainty propagation analysis helps to identify the most contributing sources and can provide better evidence for the impact assessment of pollutant release from sewer systems to the environment, in particular to the receiving waters.
1. Uncertainty analysis is important because it quantifies the accuracy of model outputs and quantifies the uncertainty source contributions. The latter provides essential information to take informed decisions about how to improve 665 the accuracy of the model output. But MC uncertainty analysis is only possible if it is computationally feasible. We used a simplified urban water system model with capabilities to apply for minimising transient pollution from urban wastewater systems in parallel mode, which minimises model running time, allowing uncertainty propagation, long term simulations and evaluation of complex scenarios. These capabilities are crucial also for e.g. real time control applications, where simplified models of fast running times are desirable.
specific cases, depending on the type of input variable: i) Uncertain constant inputs, characterised by their marginal (cumulative) pdf e.g. water consumption, infiltration flow, impervious area and run-off coefficients; ii) Temporally autocorrelated dynamic uncertain inputs, characterised by univariate time series autoregressive modelling e.g. COD r ; iii) Temporally cross-correlated multiple dynamic uncertain inputs, characterised by multivariate time series modelling, considering cross-and no-correlations among variables e.g. C COD,S and C NH4 ,S ; and iv) rain gauge input precipitation, 680 characterised by autoregressive model conditioned to the observed precipitation (P ).
4. Model input uncertainty propagation through the simplified combined sewer overflow model (EmiStatR) helped to understand how does uncertainty propagate and how large is the uncertainty of EmiStatR outputs in a case study. Three output variables were considered for water quantity and four variables for water quality. The Monte Carlo uncertainty propagation analysis showed that among the water quantity output variables, the overflow flow, Q Sv , is the 685 more uncertain output variable and has a large coefficient of variation (cv of 1.585). Among water quality variables, the annual average spill COD concentration, C COD,Sv ,av , and the average spill NH 4 concentration, C NH4 ,Sv ,av , were found to have large uncertainty (coefficients of variation of 0.988 and 0.815, respectively). Also, low standard errors (se) for the coefficient of variation were obtained for all seven outputs. They were never greater than 0.05, which indicated that the selected MC replication size (1,500 simulations) was a suitable value. 690 5. Regarding the main sources of uncertainty model outputs, for water quantity outputs, was precipitation, while for COD water quality outputs were P , C COD,S and COD r , and for NH 4 outputs P and C NH4 ,S .
6. Finally, we evaluated how uncertainty propagation analysis can explain more comprehensively the impact of water quality indicators to the receiving river for the Luxembourg case study. Although the mean model water quality outputs for COD and NH 4 concentrations is fairly above of the thresholds, the 0,95 quantile is 2.7 times above the mean 695 value for COD concentration, and 2.4 times above the mean value for NH 4 . We conclude that we are not certain that environmental thresholds are not exceeded, because there is a considerable probability that values are above, even though the expected value is below the thresholds. This is valid for concentrations in the spilled CSO, therefore, is important to highlight that the results confirmed our hypothesis that annual mean COD and NH 4 river concentrations are lower than the released CSO concentrations due to dilution and henceforth compliant with the water quality thresholds given by the 700 guidelines consulted.
Code and data availability. The code scripts and datasets related to Figures 03 to 06 of this paper are available on Zenodo: