A space-time Bayesian hierarchical modeling framework for projection of seasonal maximum streamflow

. Timely projections of seasonal streamflow extremes can be useful for the early implementation of annual flood risk adaptation strategies. However, predicting seasonal extremes is challenging particularly under nonstationary conditions and if extremes are correlated in space. The goal of this study is to implement a space-time model for the projection of seasonal streamflow extremes that considers the nonstationarity (inter-annual variability) and spatio-temporal dependence of high flows. We develop a space-time model to project seasonal streamflow extremes for several lead times up to 2 months using a Bayesian 5 Hierarchical modeling (BHM) framework. This model is based on the assumption that streamflow extremes (3-day maxima) at a set of gauge locations are realizations of a Gaussian elliptical copula and generalized extreme value (GEV) margins with nonstationary parameters. These parameters are modeled as a linear function of suitable covariates describing the previous season selected using the deviance information criterion (DIC). Finally, the copula is used to generate streamflow ensembles, which capture spatio-temporal variability and uncertainty. We apply this modeling framework to predict 3-day maximum 10 streamflow in spring (May-June) at seven gauges in the Upper Colorado River Basin (UCRB) with 0 to 2 months lead time. In this basin, almost all extremes that cause severe flooding occur in spring as a result of snowmelt and precipitation. Therefore, we use regional mean snow water equivalent and temperature from the preceding winter season as well as indices of large-scale climate teleconnections – ENSO, AMO, and PDO – as potential covariates for 3-day spring maximum streamflow. Our model evaluation, which is based on the comparison of different model versions and the energy skill score, indicates that the model can 15 capture the space-time variability of extreme streamflow well and that model skill increases with decreasing lead time. We also find that the use of climate variables slightly enhances skill relative to using only snow information. Median projections and their uncertainties are consistent with observations thanks to the representation of spatial dependencies through covariates in the margins and a Gaussian copula. This spatio-temporal modeling framework helps to plan seasonal adaptation and preparedness measures as predictions of extreme spring streamflows become available 2 months before actual flood occurrence.

1 Livneh and Badger, 2020). However, such models should also consider potential nonstationarities (inter-annual-variability) due to climate change (Musselman et al., 2018) and the spatio-temporal dependence of the data as flood risk may increase if multiple catchments are affected at once (Thieken et al., 2015;Hochrainer-Stigler, 2020). Here, we develop a Bayesian Hierarchical Model (BHM) for the prediction of extreme spring streamflow that considers both nonstationarity and spatiotemporal dependence and ask: 60 -How does the representation of nonstationarity (inter-annual-variability) through suitable covariates improve seasonal predictions?
-How does the explicit representation of spatial dependencies improve prediction performance?
-To what extent are seasonal projections for longer lead times still skillful?
We address these questions by applying the proposed BHM to project 3-day spring maximum (May-June) streamflow at seven gauges in the Upper Colorado River Basin (UCRB). We consider 3-day maxima instead of 1-day maxima because snowmeltdriven events in spring typically have longer durations and flood volumes than events driven by convective storms in summer.

Proposed framework
We propose a space-time modeling framework for the prediction of seasonal streamflow extremes that has three components: (i) a hierarchical model structure, (ii) nonstationary margins, and (iii) a spatial dependence model. Each of these model com-70 ponents, their estimation strategies, and the estimation of ensembles of seasonal streamflow extremes are described below.

Hierarchical model structure
We conduct a nonstationary frequency analysis of seasonal streamflow extremes at m gauges in a river basin -say, q 1 , . . . , q m -over k years. To do this analysis, we consider a Bayesian hierarchical model that accounts for spatial dependence and nonstationarity. In the first layer of the hierarchical model structure, also known as the data layer, we consider that the joint 75 distribution of streamflow at multiple gauges in each year is modeled using a Gaussian elliptical copula with generalized extreme value (GEV) margins (Coles, 2001;Katz, 2013;He et al., 2015). Specifically, the proposed model structure for m locations is (q 1 (t), . . . , q m (t)) ∼ C m (Σ; µ (t) , σ (t) , ξ) (1) 80 where C m is an m-dimensional Gaussian elliptical copula with dependence matrix Σ and GEV parameters (location, scale, and shape) µ i (t) ∈ (−∞, ∞), σ i (t) > 0 and ξ i ∈ (−∞, ∞). The second layer of the hierarchy, also known as the latent layer, Marginal distribution y(s 1 ,t 1 )……… y(s 1 ,t j ) ………y(s 1 ,t k ) y(s i ,t 1 )………. y(s i ,t j ) ………y(s i ,t k ) ……… ……… ……… y(s m ,t 1 )………y(s m ,t j ) …….y(s m ,t k ) ……… ……… ……… q(s 1 ,t 1 ) ……… q(s 1 ,t j ) ………q(s 1 ,t k ) ……… ……… ……… q(s m ,t 1 ) ………q(s m ,t j ) …….q(s m ,t k ) considers that the three parameters of the GEV are nonstationary and are time-varying, i.e., for the streamflow gauge i at time . Specifics of models (1) and (2) are given in the next sections. A conceptual sketch of the BHM is shown in Fig. 1 which shows the data layer (Gaussian copula and GEV marginal 85 distributions) and the latent layer (time dependence of GEV parameters).

Marginal distributions
The first two GEV parameters (location and scale) are modeled as linear functions of time-dependent large-scale climate variables and regional mean variables from the previous season while the shape parameter is considered to be stationary: α µji x j (t), i = 1, . . . , m 90 log (σ i (t)) = α σ0i + n j=1 α σji x j (t), i = 1, . . . , m where x j (t) is covariate j at time t, α µji , α σji , and α ξ0i are the regression coefficients for covariate j and gauge i at time t. log (σ) is used to ensure positive scale parameters. The validity of the nonstationary location and scale parameters can be checked through the significance of their slope coefficients' posterior PDFs (i.e., by checking whether 95% of the sample 95 values are greater (lower) than 0). Covariates will be discussed in section 3.2. The shape parameter ξ is often modeled as a single value per study area/region because of large estimation uncertainties (Cooley et al., 2007;Renard, 2011;Apputhurai and Stephenson, 2013;Atyeo and Walshaw, 2012) or modelled spatially along with the other GEV parameters but considering a specific range of potential values (Cooley and Sain, 2010;Bracken et al., 2016). Here, we model the shape parameter for each catchment individually but consider it to be stationary in time.

100
2.3 Gaussian copula for spatial dependence Copulas are a flexible tool for modeling multivariate random variables since they can represent dependence independently of the choice of marginal distributions (Genest and Favre, 2007;Bracken et al., 2018;Brunner et al., 2019), which is a particularly appealing feature for extremal processes. This study focuses on Gaussian copulas because of their ease of implementation in a Bayesian and high-dimensional framework.

105
Let be a random vector of extreme streamflow at m gauges and time t. The Gaussian copula builds the joint cumulative distribution function (CDF) of q (t) as with Φ being the CDF of the standard normal distribution, and F it (·) the marginal GEV 110 or empirical CDF for streamflow gauge i at time t.
The copula dependence matrix, Σ, is a symmetric positive definite matrix that captures the strength of dependence between all gauge pairs using the Pearson correlation coefficient. The element c ij of Σ quantifies the dependence between gauges i and j and its values can vary between -1 and 1: 115 By definition, the dependency between a streamflow gauge and itself is unity, so the diagonal elements of Σ are 1's. Since Σ is symmetric, only m(m − 1)/2 values need to be fitted (values in the lower or upper triangle of Σ). The Gaussian copula only assumes linear correlation after quantile transformation of the marginals with the inverse normal CDF. This does not impose a linear correlation structure on the marginal distributions, meaning that nonlinear dependence between variables can be captured at the data level (Bracken et al., 2018).

120
There are two main approaches to estimate the unknown parameters of the conditional copula (Hochrainer-Stigler, 2020).
The first one is called inference functions for margins (IFM). In this approach, the marginal distribution parameters are estimated in the first step and the copula parameters in the second step. One major disadvantage of this approach is the loss of estimation efficiency, as in the first step, the dependence between the marginal distributions is not considered. The second approach (hereafter referred to as pseudo-observations fitting) estimates the copula parameters without assuming specific parametric distribution functions of the marginals, and pseudo-observations are used instead. The framework proposed here considers pseudo-observations fitting to estimate the copula dependence parameters.

Estimation strategy
Inference of the model parameters is done in a Bayesian framework, which can account for parameter uncertainties. The posterior distributions of the model parameters, θ = [µ, log σ, ξ] and Σ, given the data (3-day spring maximum (May-June) 130 streamflow at each gauge and covariates) and considering a record length of k days by Bayes' rule, are represents the PDF of the GEV for location i and time t, is the empirical CDF (pseudo-observations) for location i at time t, and p Σ ( Σ) and p θ (θ) represent the priors of the 135 GEV regression coefficients and Gaussian copula dependence matrix, respectively. p θ (θ) is defined as

140
where M V N α µj 0, Σ αµ j , M V N α σj 0, Σ ασ j , and M V N α ξ0 | 0, Σ α ξ 0 represent probability densities of multivariate normal distributions with mean 0 and covariance matrices Σ αµ j , Σ ασ j , and Σ α ξ 0 correspond to the priors of the GEV regression coefficients α µj , α σj , and α ξ0 respectively; and p Σα µ j Σ αµ j , p Σα σ j Σ ασ j , and p Σα ξ 0 Σ α ξ 0 are the priors of Σ αµ j , Σ ασ j , and Σ α ξ 0 , which based on Gelman and Hill (2006) are assumed to follow an inverse-Wishart distribution to ensure a positive definite covariance matrix 145 Σ αµ j ∼ Inv wishart (ν, A j I) ν corresponds to the degrees of freedom (m + 1), I is an (m + 2) × (m + 2) identity matrix, and A j , B J , and C 0 are scalars properly set for Σ αµ j , Σ ασ j , and Σ α ξ 0 , respectively. The regression coefficients are modeled jointly to capture their spatial 150 correlations. Finally, p Σ (Σ) represents the prior of the Gaussian copula dependence matrix and is also assumed to follow an inverse-Wishart distribution to ensure a positive definite covariance matrix where C u corresponds to the covariance matrix of the uniform quantiles obtained from the pseudo-observations.

155
The predictive posterior distribution of spring maximum streamflow (ensembles) for the m streamflow gauges can be estimated using the posterior distributions (M samples) of GEV regression coefficients, α µj , α σj , and α ξ0 , and the Gaussian copula dependence matrix, Σ, which have been estimated using the estimation strategies presented in the previous section. To estimate spring maximum streamflow ensemble members, for each posterior sample of all model parameters (α µj , α σj , α ξ0 , and Σ) the procedure is as follows: 160 1. If the posterior PDFs of the slope coefficients of the GEV parameters were found to be significant, compute the GEV parameters (µ i (t), log (σ i (t))) for each year and gauge using α µi , α σi , and covariates, x (t), using Eq. (3)-(4).
2. Simulate marginal cumulative distributions, F it , for the m streamflow gauges from a Gaussian copula with a dependence matrix, Σ.
3. Compute spring maximum streamflow for each streamflow gauge i at time t using the following expression where σ i (t) = exp (log(σ i (t)) 4. Repeat steps 2-4 for each gauge and year of the record.
This procedure has to be repeated M times.
We demonstrate the utility of the framework proposed in the previous section by applying it to project 3-day spring maximum (May-June) streamflow at seven gauges in the Upper Colorado River basin (UCRB) with 0 to 2 months lead time (Fig. 2).  Systems, 2007). The basin has a snowmelt-dominated flow regime, i.e., most of the precipitation is accumulated during winter in the form of snow. Over 85% of the basin's streamflow and flood peaks occur in spring due to snowmelt. Therefore, using snow information of the basin might provide skillful projections of spring maximum streamflow several months in advance.

Streamflow data 180
Daily spring, May through June, streamflow data were obtained from the US Geological Survey (USGS) using the R package dataRetrieval (De Cicco et al., 2018). We selected streamflow gauges located inside the UCRB with no more than 10% or three consecutive years of missing data from 1965 to 2018. This procedure resulted in the selection of seven streamflow gauges ( Fig. 2 and Table 1). The seven gauges have drainage areas between 14.5 and 432.5 km 2 , elevations between 2105 and 3179 m.a.s.l., and mean streamflow and mean seasonal (May-June) streamflow range from 1 to 28.3 m 3 s −1 and from 0.3 to 8.3 185 m 3 s −1 , respectively. The seven catchments are not nested, instead they are individual sub-catchments of the larger UCRB. We computed 3-day spring maximum streamflow for each year at each streamflow gauge. For a gauge with missing annual values, these values were substituted with the gauge's median value. Figure 3 shows the spatial dependence of 3-day spring maximum streamflow for the seven gauges, which was assessed using Kendall's rank correlation for each pair of stations. The dependence of 3-day spring maximum streamflows between gauges 190 is generally positive as indicated by Kendall's rank correlation values higher than 0.4 (significant at 95% confidence level) for all station pairs. These significant spatial correlations require the inclusion of a copula in the BHM to capture the spatial dependence of the data.

Covariates
It has been shown in previous studies that snow water equivalent (SWE) accumulated until the beginning of spring is the most  (ENSO, Redmond and Koch, 1991;Kahya and Dracup, 1994;Rajagopalan et al., 2000;Thomson et al., 2003;Timilsena et al., 2009). However, the relationship of these indices with annual or seasonal extremes has not been explored in-depth and earlier studies shown modest relationships -e.g. between Southwest US seasonal or annual maximum streamflow and ENSO and PDO (Sankarasubramanian and Lall, 2003;Werner et al., 2004). 205 We test the usefulness of both snow-related and climatic drivers within the BHM framework by using SWE and different climatic indices as potential covariates of the marginal GEV distributions. Specifically, we tested the following covariates for March, or April depending on the lead time (0, 1 or 2 months lead time), and spatial average April mean temperature (SAAMT).

210
As indicated earlier, we focus on projecting 3-day extreme spring streamflow at lead times of 0-, 1-and 2-months lead time corresponding to forecast issuance on May 1st, April 1st and March 1st, respectively Fig. 4. The covariates use all information prior to forecast issuance. We did not consider spatial average precipitation as covariate since it can not be well predicted ahead of more than two weeks (Werner and Yeager, 2013).
The potential covariates -monthly ENSO, PDO, and AMO climate indices, monthly accumulated SWE from November to 215 April, and daily mean temperature for April -were computed using the following data sources: For SWE and daily mean temperature, we only considered snow gauges (18)  covariates as the mean across all snow gauges and meteorological stations for each year, respectively. We considered the same SWE (SASWE) and April mean temperature (SAMAT) covariates for all the gauges because these regional variables do not rely on one single station, which may not necessarily stay in operation, and they can capture the spring maximum 225 streamflow signal for all the station gauges. For example, suppose in the future, a particular snow gauge is out of operation.
In that case, one can keep using the average of the operative snow gauges and still derive the covariate. Figure 5 displays the time series of normalized spring maximum streamflow with the SASWE (SAMAT) covariate and the best local SWE (MAT) covariate, respectively, for the station UCRB1. The two regional covariates can capture the inter-annual variability of the spring maximum streamflow without an important reduction of the correlation compared to local covariates. The suitability 230 of the regional covariates is also verified for the other catchments and lead times.
We assessed the strength of the relationship between the covariates and spring maximum streamflow by computing the Spearman's rank correlation coefficient, which is shown in Fig. 6 for a 0-month lead time (covariates were calculated from November to April). SASWE (Fig. 6a) exhibits a significant and strong positive correlation with spring maximum streamflow at all the gauges, which was expected and also supports the choice of SASWE as a covariate for all the station gauges. At 3.3 Suitability of the GEV distribution and Gaussian copula 245 We checked the suitability of the GEV distribution as a marginal and of the Gaussian copula as a spatial dependence model using their maximum likelihood estimates for 3-day spring maximum streamflow.
To check the validity of the GEV distribution as a marginal distribution, we fitted a stationary GEV distribution at each gauge using maximum likelihood. Then, we ran two goodness-of-fit tests, the Cramer-von Mises and Anderson-Darling tests (D'Agostino and Stephens, 1986). The p-values for the two tests were higher than 0.3 for most of the gauges, except for UCRB1 250 (0.11), which implies that at the 95% confidence level, there is insufficient evidence to reject the null hypothesis that the data come from a GEV distribution. Q-Q plots of the stationary GEV distribution fitted to 3-day spring maximum streamflow for the seven streamflow gauges of UCRB can be found in Fig. A3.
For testing the suitability of the Gaussian copula, we ran three multivariate normality tests using marginal transformations based on pseudo-observations: Royston's (Royston, 1982), Mardia's (Mardia, 1970), and Henze-Zirkler's tests (Henze and 255 Zirkler, 1990). The p-values obtained for the three tests were 0.27, 0.45, and 0.6, respectively, which did not reject the null hypothesis that the transformed data follow a multivariate normal distribution. As an illustration of the goodness fit of the Gaussian copula, Figure 7 shows the pairwise dependence structure between gauges UCRB2 and UCRB4 for different copula families (for more details about the other types of copulas, the reader is referred to Hochrainer-Stigler (2020) or Genest and q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q (d) PDO (e) AMO  Favre (2007)). Black points in each panel ( Fig. 7b-g) correspond to the observed CDF values. The simulated contour lines 260 represent simulated contour lines for different dependence structures, including independence (no copula, Fig. 7b), Gaussian ( Fig. 7c), Student-t (Fig. 7d), Joe (Fig. 7e), Gumbel (Fig. 7f), and Vine copulas (Fig. 7g). Only Gaussian, Student-t, and Vine copulas can capture the dependence structure of the data. This visual inspection confirms that the Gaussian copula is suitable to replicate the dependence structure of the observed data.

265
The specific structure of the BHM for the UCRB incorporates the covariates described in section 3.2. We modeled the location parameter of the GEV at each gauge in a nonstationary way but the scale parameter of the GEV is kept stationary for all gauges.
An initial run of the BHM with a nonstationary scale parameter showed that the regression coefficients (α σj = 0 for j > 0 in Eq.(4)) were not significant (posterior PDFs contain zero in the 95% credible interval). The priors of the covariance matrix of α µj , α σ0 , and α ξ0 used are: We considered noninformative priors for the covariance matrix of the GEV regression coefficients by setting A 0 = 100,

275
A j = 10, B 0 = 1, and C 0 = 1. Since we fitted different types of BHMs, Σ αj is only considered for nonstationary models. For the dependence matrix of the Gaussian copula, we used an informative prior where C u corresponds to the covariance matrix of the uniform quantiles obtained from the pseudo-observations of spring maximum streamflow.

Implementation and model fitting
The model was implemented in R (R Core, 2017) using the program JAGS (Just Another Gibbs Sampler; Plummer, 2003) and the R package rjags (Plummer, 2019), which provides an interface from R to the JAGS library for Bayesian data analysis.
Posterior distributions of the GEV regression coefficients and Gaussian copula matrix were estimated using Gibbs sampling (Gelman and Hill, 2006;Robert and Casella, 2011) based on the priors assigned in the previous section. The predictive posterior 285 distributions of spring maximum streamflow (ensembles) for all years were estimated according to section 2.5. We ran three parallel chains with different initial values and each simulation was performed for 2,000,000 iterations with a burn-in size value of 1,000,000 to ensure convergence. To reduce the sample dependence (autocorrelation), the thinning factor was set to 500, which yields 6000 samples (2000 samples from each chain). The scale reduction factorR (Gelman and Rubin, 1992)

Model cross-validation and verification metrics
To test the out-of-sample predictability of the model, we performed leave-one-year-out cross-validation by dropping one year 295 from the record (1965-2018) and fitting the BHM using the remaining years, also known as the calibration years. The fitted model is applied to provide estimates for the one validation year. This cross-validation procedure was repeated 54 times.
As the goal of this study is to provide seasonal streamflow extremes projections for risk-based flood adaptation, we also implemented this leave-one-year-out cross-validation just for high-flow years in which all the station gauges exceeded their 60th percentile. 300 We computed the energy skill score (ESS) as our verification metric since we are interested in capturing the spatio-temporal dependence of the data. The energy score (ES) assesses probabilistic forecasts of a multivariate quantity (Gneiting and Raftery, 2007;Gneiting et al., 2008): where M = 3000 is the size of the ensemble forecast, q j is the 7 × 1 vector of the jth ensemble forecast at year t, q o is 305 the 7 × 1 vector of observed streamflow at year t, and || · || denotes the Euclidean norm. This performance metric is a direct generalization of the continuous ranked probability score (CRPS; Gneiting and Raftery, 2007;Gneiting et al., 2008), to which the energy score reduces in dimension d = 1. Based on ES, the energy skill score (ESS) is defined as where ES projection is the mean ES of the projection model analyzed and ES reference is the mean ES of the reference projec-310 tion. The ESS ranges from −∞ to 1. ESS < 0 indicate that the reference projection has higher skill than the projection model, ESS = 0 implies equal skill, and ESS > 0 means that the projection model has a higher skill, with ESS = 1 representing a perfect score. For this study, we considered the climatology (sampling from the observations) as the reference projection and benchmark. ESS was computed for both the calibration and validation models.  Table 2 shows DIC values for different candidate BHMs for a 0-month lead time and the best model for the other two lead times (last two rows). In the case of a 0-month lead time, the best model corresponds to the first row of Table 2. For a 0-month lead time, predictive skill resides in SWE and air temperature as this hydro-climatic information relates 320 well to the streamflow extremes in May-June (see Fig. 6). However, for longer lead times of 1 and 2-months, the large-scale climate indices add predictive value. All the models fitted consider a Gaussian copula to model spatial dependence. Detailed tables with different models fitted for 1-month and 2-months lead times can be found in Tables A1 and A2.

Ability of the BHM to capture spatio-temporal dependence
To assess the ability of the BHM in capturing spatio-temporal dependence through nonstationary covariates and a Gaussian 325 copula, we compared the best model for the 0-month lead time selected in the previous section (i.e., SASWE and SAAMT as covariates with a Gaussian copula) against three models that do not consider a Gaussian copula: Stationary, nonstationary with SASWE as covariate, and nonstationary with SASWE and SAAMT as covariates. capturing the spatio-temporal dependence compared to the benchmark (i.e., values below 0). This model also shows a lower variability of the skill score than the other Bayesian hierarchical models, which is expected because the stationary model produces the same projection for each year. When a nonstationary model with SASWE as a covariate is considered (grey box), the skill is substantially improved compared to the benchmark and the stationary model (i.e., most of the values above 0). This improvement is consistent with Fig. 6 and previous studies stating that SWE until the beginning of the spring season is the 335 most skillful predictor of spring-summer seasonal streamflow in mountainous regions (Koster et al., 2010;Livneh and Badger, 2020). When SAAMT is added as a covariate (dark grey box), there is a significant improvement of ESS compared to the model with only SASWE. The best model with copula (dark turquoise box) substantially increases model skill. Model skill increases were tested with a 95 % confidence interval using a nonparametric test for the median (Gibbons and Chakraborti, 1992).
To further highlight the ability of the Gaussian copula to capture spatio-temporal dependence, we computed spatial joint 340 threshold non-exceedance probabilities for the seven stations and compared them to non-exceedance probabilities derived from three models without the copula, i.e., spatially independent models for each streamflow gauge. In Figure 9, we display the distribution of the probability that all streamflow gauges do not exceed their kth percentile for the observations and the four models shown in Fig. 8 for a 0-month lead time and the calibration period. The model with nonstationary covariates and a Gaussian copula (dark turquoise box plots) can almost reproduce the shape of the observed distribution in terms of the 345 inter-quartile range (boxes), which is an indication that this model can capture the spatio-temporal dependence of the data well. Figure 9 also shows that even without considering a copula, a nonstationary model with skillful covariates can partially capture the spatio-temporal dependence (see in Fig. 9 the difference of the distribution shape for the stationary and nonstationary models without copula). The ability of nonstationry model to partially capture the spatio-temporal dependence suggests that with increasing skill of the covariates, the added value of the copula gets smaller. Similar performance was observed for the 350 nonstationary model with SASWE as a covariate plus a Gaussian copula and for a nonstationary model with a less skillful covariate (ENSO) plus a Gaussian copula (not shown here). In order to assess the spatial performance of the BHM, Figure 10 shows the time series of the spatial average projection of maximum specific streamflow in spring over all seven gauges of the UCRB from the calibration (1965-2018) for the best model for a 0-month lead time without a Gaussian copula and with a Gaussian copula, respectively. Our results show that 355 by adding a Gaussian copula to the BHM (Fig. 10b), observations can be better captured by the ensembles' median; and the ensembles represent a higher variability, which allows for capturing some observations that are not captured without the copula (Fig. 10a). For the remainder of this paper, we use the term "average projection of spring maximum specific streamflow" to refer to spatial average projection of spring maximum specific streamflow over all seven gauges of the UCRB.

Stationary SASWE SASWE+SAAMT SASWE+SAAMT with Gaussian copula
Observed Figure 9. Distribution of the probability that all gauges do not exceed their kth percentile for four different versions of the BHM for a 0-month lead time for the calibration period . The whiskers show the 95% credible intervals, boxes the inter-quartile range, and horizontal lines inside the boxes, the median. Outliers are not displayed. The observations are shown as blue circles and the same color scheme as in Fig. 8 is considered for the four versions of the BHM.

360
To define the extent to which seasonal streamflow extremes projections for longer lead times can be skillful, we assess the performance of the BHM at different lead times for the calibration and leave-1-year-out cross-validation, including crossvalidation focusing on extremes (60th percentile). Covariates considered for each lead time were presented in section 4.1. Figure 11 displays the Energy Skill Score (ESS) distribution for different lead times from calibration, leave-1-year-out crossvalidation, and leave-1-year-out cross-validation on extremes. EES decreases as the lead time increases indicating a decrease 365 of model skill with increasing lead time (Fig. 11a). However, the three lead times capture the multivariate dependence better than the calibration benchmark (ESS values above 0). For the leave-1-year-out cross-validation (Fig. 11b), there is a substantial decrease in performance compared to the calibration, except for the 0-month lead time (dark turquoise boxplot). However, model performance for the 1-month lead time is better than the benchmark (ESS quartiles above 0), and the model for a 2-months lead time shows lower performance than the benchmark only for the third quartile (above 0). Finally, for the leave- 1-year-out validation for extremes (Fig. 11c), all lead times show higher skill than the benchmark (ESS quartiles above 0), highlighting the potential usefulness of the model for flood risk assessments.
To assess at site performance, we computed the continuous rank probability skill score (CRPS; Hersbach, 2000;Gneiting and Raftery, 2007) at each gauge (see Fig. A4). As for the ESS, the CRPSS ranges from −∞ to 1, and its values have the same Observations not captured by ensembles Observations captured by ensembles Ensembles meaning (see section 3.6). We obtained similar results to ESS (Fig. 11) with the exception of a few gauges for leave-1-year-out 375 cross-validation and leave-1-year-out cross-validation for extremes where the performance was poorer than the benchmark for the cross-validation. Figure 12 shows the time series of average projected spring maximum specific streamflow ensembles and the distributions of the Pearson correlation coefficient from the leave-1-year-out cross-validation for the three lead times and the benchmark.
Simulations relying on models with a copula show a similar variability and can capture observed values inside of their en-380 sembles' variability for all three lead times ( Fig. 12a-b). The benchmark (Fig. 12d) cannot capture the observations since it converges to a stationary model, i.e., it gives the same projection for all years. There is a slight performance reduction for models with 1-and 2-months lead times compared to the 0-month lead time (Fig. 12e). The medians of the Pearson correlation

Discussion
Compared to operational forecast models that consider short lead times and seasonal streamflow forecast models that are useful for reservoir operation with a focus on water availability during the dry season, the proposed BHM proposed here has 390 the following benefits: -It allows for considering potential climate change effects by modeling the margins in a nonstationary setting using suitable covariates.
-It allows for capturing spatio-temporal dependence by including a Gaussian copula. Consequently, the spatial BHM captures observations that are not captured by the average projection of spring maximum specific streamflow of a BHM 395 without a copula.
-It provides average projections of spring maximum specific streamflow up to 2-months in advance by relying on the predictive skill of snow accumulated during the winter season. Whiskers show the 95% credible intervals, boxes the inter-quartile range, and horizontal lines inside the boxes, the median. Outliers are not displayed. All lead time models consider a Gaussian copula.
A question coming to mind is: how can the proposed modeling framework be used to deliver interpretable seasonal average projections? It might be difficult for decision-makers to make decisions based on average spring maximum specific streamflow 400 over all seven gauges of the UCRB (see Fig. 12). To overcome this difficulty, we propose to provide interpretable average projections of spring maximum specific streamflow by providing the first three quartiles of the ensembles along with some past streamflow values as a reference. Providing reference values can help to make decisions about risk adaptation up to two months in advance. Reference values can be, e.g., the observed median specific streamflow for an average year without flood occurrence; the maximum observed specific streamflow on the record; or the lowest observed specific streamflow (threshold) 405 that can cause flood occurrence. To find the threshold flow that can cause flood occurrence, we computed the ratio between average maximum and average spring mean specific streamflow over all seven gauges of the UCRB. Then, we picked the average spring maximum specific streamflow value of the year with the highest ratio value, which corresponds to the threshold streamflow. The reason for this is that when the basin is drier (high ratio between peak runoff and total seasonal runoff), even if the peak streamflow is not so high, a flood can occur. Based on the lowest observed specific streamflow (threshold, q thresh ) that 410 can trigger flood occurrence we define a potential flooding alarm system. This system defines different flooding alarm levels by comparing the threshold against the first three quartiles (exceedance probability, q 25th , q 50th , and q 75th ) of average projections of spring maximum specific streamflow for the year analyzed. This potential flooding alarm system is shown in Fig. 13a. Thus, for each lead time, the flooding alarm is activated with a low risk of flooding if q 25th > q thresh ; moderate risk of flooding if q 50th > q thresh ; or high risk of flooding if q 25th > q thresh . In all other cases, the alarm is not activated.  (Werner and Yeager, 2013). In addition, Figure 13c shows the average projection of spring maximum specific streamflow of the UCRB for 2018 when the flooding alarm is successfully not activated because the three quartiles for each lead time and the observed streamflow for 2018 are below the threshold streamflow.

425
The nonstationary and spatial BHM framework proposed here was applied to the UCRB using 3-day maxima. However, the framework is flexible and can be applied to other types of maxima such as 1-day maxima, be implemented in other regions, be applied to other types of extremes such as droughts, or be used under future climate conditions. In order to apply the framework in another variable or region, the choice of covariates has to be reconsidered and potentially adjusted. In the application presented here, we only modeled the location parameter as nonstationary. If the framework is applied to another basin, this 430 modeling choice has to be reconsidered. It is advisable as a first step to do an initial run of the model for defining which parameters should be considered nonstationary. If one wishes to apply the framework to predict another type of extreme such as low-flows, one needs to reconsider distribution choice and to identify suitable covariates. In addition, the framework is not limited to projections, it can also be adapted for simulation purposes by considering real-time covariates. In addition, it can be easily adjusted such that it represents future climate conditions if future projections of the covariates are available. However, 435 the predictive skill of the model fitted and applied to the UCRB may change under future climate conditions. The relative importance of snowmelt and precipitation in causing flood events may change in the future, with precipitation becoming relatively more important. Consequently, the model's predictive skill, which heavily relies on SWE as a covariate, might slightly decrease in future. However, in the case of headwater basins in mountain regions such as the one considered in this study, snowmelt will remain the dominant flood generation process in the future, as shown by climate change projections in the region (Safeeq et al., 2016). Under changing conditions, one may want to reconsider covariate choice and test additional potentially skillful covariates such as specific ocean and atmosphere features that are expected to have a stronger relationship with the basin streamflow and extremes (e.g.; Grantz et al., 2005;Regonda et al., 2006;Bracken et al., 2010). Although this framework can be applied to a more extensive stream gauges network, we do not recommend that since clusters of different streamflow behavior will develop as the size of the region of interest increases. In that case, it is more efficient to fit a model 445 for each cluster than to fit a model for the entire region, which will be more computationally expensive. Fitting a model for each cluster allows for using different covariates for each cluster, which may help to provide more skillful estimates than the non-cluster case.

Summary and conclusions
In this study, we presented a Bayesian Hierarchical Model (BHM) to project seasonal streamflow extremes for several catch-450 ments in a river basin for several lead times. The streamflow extremes at a number of gauges in a basin are modeled using a Gaussian elliptical copula and Generalized Extreme Value (GEV) margins with nonstationary parameters. These parameters are modeled as a linear function of suitable covariates from the previous season.
We applied this framework to project 3-day spring maximum (May-June) streamflow at seven gauges in the Upper Colorado River Basin (UCRB) network, at 0-, 1-, and 2-months lead time. As potential covariates, we used indices of large-scale climate 455 teleconnections -ENSO, AMO, and PDO, regional mean snow water equivalent, and temperature from the preceding winter season.
From the analysis of different models for a 0-month lead time, we conclude that: -The spatial average snow water equivalent (SASWE) accumulated during fall and spring is the most skillful predictor of spring season maximum streamflow across the UCRB.

460
-The increase in BHM performance is low when adding other climatic indices such as PDO.
-Including a copula in the BHM enables capturing the spatio-temporal dependence of streamflow extremes which is not fully possible with independent marginal models.
The comparative analysis for three different lead times revealed that increasing the lead time from 0 to 2 months only weakly decreases model skill. This finding implies that the framework proposed could be useful for the early implementation 465 of flood risk adaptation and preparedness strategies. We propose an alternative to guide decision making by providing average projections of spring maximum specific streamflow as the first three quartiles of the ensembles of the average projection of spring maximum specific streamflow along with past observed specific streamflow values as reference. Such a communication strategy could help decision-makers to implement adaptation strategies that address the spatial dimension of flooding.