An Uncertainty Partition Approach for Inferring Interactive Hydrologic Risks

15 Extensive uncertainties exist in hydrologic risk analysis. Particularly for interdependent hydrometeorological extremes, the random features in individual variables and their dependence structures may lead to bias and uncertainty in future risk inferences. In this study, an iterative factorial copula (IFC) approach is proposed to quantify parameter uncertainties 20 and further reveal their contributions to predictive uncertainties in risk inferences. Specifically, an iterative factorial analysis (IFA) approach is developed to diminish the effect of the sample size and provide reliable characterization for parameters’ contributions to the resulting risk inferences. The proposed approach is applied to multivariate flood risk inference for the Wei River basin to demonstrate the applicability of IFC for tracking the 25 major contributors to resulting uncertainty in a multivariate risk analysis framework. In detail, the multivariate risk model associated with flood peak and volume will be established and further introduced into the proposed iterative factorial analysis framework to reveal the 2 individual and interactive effects of parameter uncertainties on the predictive uncertainties in the resulting risk inferences. The results suggest that uncertainties in risk inferences would 30 mainly be attributed to some parameters of the marginal distributions while the parameter of the dependence structure (i.e. copula function) would not produce noticeable effects. Moreover, compared with traditional factorial analysis (FA), the proposed IFA approach would produce a more reliable visualization for parameters’ impacts on risk inferences, while the traditional FA would remarkably overestimate the contribution of parameters’ interaction 35 to the failure probability in AND (i.e. all variables would exceed the corresponding thresholds), and at the same time, underestimate the contribution of parameters’ interaction to the failure probabilities in OR (i.e. one variable would exceed its corresponding threshold) and Kendall (i.e. the correlated variables would exceed a critical multivariate threshold).


Introduction
Many hydrological and climatological extremes are highly correlated among each other, and it is desired to explore their interdependence through multivariate approaches. Examples include sea level rise and fluvial flood (Moftakhari et al., 2017), drought and heat waves (Sun 45 et al., 2019), soil moisture and precipitation (AghaKouchak, 2015). Moreover, even one specific hydrological extreme may have multiple attributes, such as the peak and volume for a flood, duration and severity for a drought, and duration and intensity of a storm (Karmakar and Simonovic, 2009;Kong et al., 2019). Traditional univariate approaches, mainly focusing on one variable or one attribute of hydrological extremes (e.g. flood peak), may not be 3 sufficient to describe those hydrological extemes containing multivariate characteristics. Thus the univariate frequency/risk analysis methods may be unable to obtain reliable risk inferences for the failure probability or recurrence intervals of interdependent extreme events (Chebana and Ouarda, 2011;Requena et al., 2013;Salvadori et al., 2016;Sadegh et al., 2017) 55 Since the introduction of the copula function into hydrology and geosciences by De Michele and Salvadori (2003), the copula-based approaches have been widely used for multivariate hydrologic risk analysis. The copula functions are able to model correlated variables with complex or nonlinear dependence structures. Also, these kinds of methods are easily implemented since the marginal distributions and dependence models can be estimated in 60 separate processes, which also give flexibility in the selection of both marginal and dependence models. A large amount of research has been developed for multivariate hydrologic simulation through copula functions, such as multivariate flood frequency analysis (Sraj et al., 2014;Xu et al., 2016;Fan et al., 2018Fan et al., , 2020; drought assessments (Song and Singh 2010;Kao and Govindaraju 2010;Ma et al. 2013); storm or rainfall dependence 65 analysis (Zhang and Singh 2007;Vandenberghe et al. 2010); streamflow simulation (Lee and Salas 2011;Kong et al., 2015) and other water and environmental engineering applications Huang et al., 2017).
For both univariate and multivariate analyses for hydrometeorological risks, uncertainty 70 would be one of the unavoidable issues which needs to be well addressed. The uncertainty in hydrometeorological risk inference mainly results from stochastic variability of hydrometeorological processes and incomplete knowledge of the watershed systems (Merz 4 and Thieken, 2005). Many studies have been proposed to address uncertainty in both univariate and multivariate hydrological risk analysis (e.g. Merz and Thieken, 2005;75 Serinaldi, 2013; Dung et al., 2015;Zhang et al., 2015;Sadegh et al., 2017;Fan et al., 2018;2020). However, one critical issue in uncertainty quantification of hydrological inference is how to characterize the major sources for uncertain risk inference. Qi et al. (2016) employed a subsampling ANOVA approach (Bosshard et al., 2013) to quantify individual and interactive impacts of the uncertainties in data, probability distribution functions and 80 probability distribution parameters, on the total cost for flood control in terms of flood peak flows. Even though the subsampling ANOVA approach is able to reduce the effect of the biased estimator on quantification of variance contribution resulting from the traditional ANOVA approach, it should be noticed that merely subsampling one uncertainty parameter/factor (referred as single-subsampling ANOVA), as used in the studies by 85 Bosshard et al. (2013) and Qi et al. (2016a), will lead to (i) an underestimation of the individual contribution for the factor to be sampled and (2) overestimation of contributions from those non-sampled factors. Moreover, few studies have been reported to characterize the individual and interactive effects of parameter uncertainties in marginal and dependence models on the multivariate risk inferences. 90 Consequently, as an extension of previous research, this study aims to propose an iterative factorial copula (IFC) approach for quantifying and partitioning uncertainty metrics from different sources in multivariate hydrologic risk inference. In detail, the parameter uncertainties are quantified through a Monte Carlo-based Bootstrap algorithm. The 95 interactions of parameter uncertainties are explored through a multilevel factorial analysis 5 approach. The contributions of parameter uncertainties are analyzed through an iterative factorial analysis (IFA) method, in which all uncertainty factors will be subsampled to generate more reliable results. The applicability of the proposed IFC approach will be demonstrated through case studies of flood risk analysis in the Wei River basin in China. parameter uncertainty quantification, and (iv) parameter interaction and sensitivity analysis. 105 In IFC, modules (i) and (ii) are proposed to construct the most appropriate copula-based hydrologic risk model. In detail, a number of distributions, such as Gamma, generalized extreme value (GEV), lognormal (LN), Pearson type III (P III), and log-Pearson type III (LP III) distributions, are usually employed to describe the probabilistic features of individual random variables (e.g. flood peak and volume). Also, in order to quantify the dependence 110 structures of correlated random variables, many copula functions have been proposed, such as Gaussian copula, Student t copula, Archimedean copula family (e.g. Clayton, Gumbel, Frank and Joe copulas). In the current study, the indices of root mean square errors (RMSE) and Akaike information criterion (AIC) will be employed to identify the most appropriate model for hydrologic risk inference. Module (iii) quantifies parameter uncertainties in marginal 115 distributions and copulas. Modules (iv) would be the core part of our study to identify the main sources of uncertainties in multivariate risk inference by the proposed iterative factorial analysis (IFA) approach.

125
A copula function is a multivariate distribution function with uniform margins on the interval [0, 1]. Sklar's Theorem states that any d-dimensional distribution function F can be formulated through a copula and its marginal distributions (Nelsen, 2006). In detail, a multivariate copula function can be expressed as: More details on the theoretical background and properties of various copula families can be found in Nelsen (2006).

7
If appropriate copula functions are specified to reflect the joint probabilistic characteristics 140 for a multivariate extreme event, the conditional, primary and secondary return periods (RP) can be obtained. Consider one kind of hydrological extreme (denoted as X) with d attributes (i.e. X = (x1, x2, …, xd)), and for a specific extreme event X * with its attributes being X * = (x1 * , x2 * , …, xd * ), three categories of multivariate RP can be applied for determining the potential risk of X * .

145
(i) "OR" case T OR : where Ĉ is the multivariate survival function of the Xi's proposed by Salvadori et al. (2013;, and ( | ) Salvadori et al. (2013;2016), 8 and the Inclusion-Exclusion principle proposed by Joe (2014), the multivariate survival function Ĉ can be obtained by: where #(S) denotes the cardinality of S. The joint RP in AND (denoted as T AND ) of the extreme event indicates the occurrence probability with all of its variables xi's, i = 1, 2, …, d, exceeding the corresponding thresholds * i x 's 165 (iii) "Kendall" case: The Kendall RP characterizes the hydrologic disasters exceeding a critical layer as defined by (Salvadori et al., 2011): can be expressed as (Salvadori et al., 2011): where KC is the Kendall distribution function associated with the copula C, which can be expressed as: In addition to the multivariate RP, Failure probability (FP) can be another index to provide more coherent, general and well devised tools for multivariate risk assessment and 9 communication. In general, the failure probability M p to indicate the occurrence of a critical event for at least one time in M years of design life can be defined as (Salvadori et al., 2016): Similar to the multivariate RP concept, the failure probability in a multivariate context can 180 also be characterized in "OR", "AND", and "Kendall" scenarios expressed by the following equations. For a given critical threshold * , the failure probabilities violating this critical value can be expressed as (Salvadori et al., 2016): where OR T p , AND T p , and Kendall T p respectively denote the failure probability in "AND", "OR" and "Kendall" cases. T indicates the service time of the facilities under consideration.

Uncertainty in the Copula-based Risk Model
Extensive uncertainties may be involved in the parametric estimation of a copula function 205 due to: (i) the inherent uncertainty in the flooding process; (ii) uncertainty in the selection of appropriate marginal functions and copulas; and, (iii) statistical uncertainty or parameter uncertainty within the parameter estimation process (e.g. the availability of samples) (Zhang et al., 2015). Several methods have been proposed to quantify parameter uncertainties in copula-based models. For instance, Dung et al. (2015) proposed bootstrap-based methods for 210 quantifying the parameter uncertainties in bivariate copula models. Zhang et al. (2015) employed a Bayesian inference approach for evaluating uncertainties in copula-based hydrologic drought models, in which the Component-wise Hit-And-Run Metropolis algorithm is adopted to estimate the posterior probabilities of model parameters.

215
In this study, a bootstrap-based algorithm, is applied to quantify parameter uncertainties in the copula-based multivariate risk model. The procedures describing the bootstrap-based algorithm to derive probabilistic distributions of the parameters in both marginal and dependence models are presented as follows: 1. Predefine a large number of bootstrapping samplings NB 220 2. Implement the resampling with replacement over observed pairs Z = (X, Y) to obtain Z* = (X*, Y*); Z* has the same size as Z.
3. Fit the chosen marginal distributions to X* and Y*, and estimate the associated parameters ( , XY  ).
4. Fit the chosen copula to Z*, and estimate the parameter in the copula function θ. 225 5. Repeat step 2-5 NB times, and obtain NB sets of ( , , ) XY    . Moreover, in order to reject those parameters that lead to bad fits for both marginal and copula models, the A-D test and the Cramer-von-Mises test are introduced in the bootstrap procedure to ensure that the obtained parameters can pass statistical tests for both the marginal distribution and copula models. Then the kernel method will be adopted to quantify the probabilistic features for

Interactive and Sensitivity Analysis for Parameter Uncertainties
Due to the uncertainties existing in the unknown values of parameters for a copula model, the associated risk or the return period for a flooding event may also be uncertain. Few studies have been reported to analyze the effect of uncertainties in the copula model on evaluating 245 the risk for a flood event. To address the above issue, an iterative factorial analysis (IFA) approach will be proposed to reveal the individual and interactive effects of parameter uncertainties on the predictive uncertainties of different risk inferences.
Consider a copula-based bivariate risk assessment model which has two marginal 250 distributions (A and B) and one copula (C). The parameters in the two marginal distributions are assumed to be respectively denoted as γ A with a levels and γ B with b levels, while the parameter in the copula is denoted with θ C with c levels. The three factor ANOVA model for such a factorial design in terms of the predictive risk (denoted as R) in response to the parameters γA, γB, θC and n replicates, can be expressed as: 13 0 1, 2,..., 1, 2,..., 1, 2,... 1, 2,..., where R0 denotes the overall mean effect; ,,  respectively indicate the effect for parameter θ C in the copula at the ith level, parameter γ A in the first marginal distribution at the jth level, and parameter γ B in the first marginal distribution at the kth level; ,, indicate interactions between factors θ C and γ A , θ C and γ B , as well as γ A and γ B , respectively;  However, one major issue for the ANOVA approach is that the biased variance estimator in 290 ANOVA would underestimate the variance in small sample size scenarios (Bosshard et al., 2013). Thus the sample size may significantly affect the resulting variance contributions expressed in Equations (15a) -(15e). A subsampling approach has been advanced by Bosshard et al. (2013) to diminish the effect of the sample size in ANOVA and has been employed for uncertainty partitioning in flood design and hydrological simulation (Qi et al.,295 2016a, b). In such a subsampling scheme, one factor (denoted as X) with T levels (these levels can be different values for numerical parameters, or different types for non-numerical factor (e.g. model type)), would choose two levels in each iteration. For T possible levels of X, we can obtain a total of 2 T C possible pairs for X, expressed as a 2 2 T C  matrix as follows: However, such a subsampling approach is mainly applied to subsample merely one factor or one parameter (here we refer to this method as single-subsampling ANOVA) in previous 305 studies (Bosshard et al. 2013;Qi et al., 2016a, b). However, a critical issue for the singlesubsampling ANOVA it that it will lead to an underestimation of the individual contribution for the factor to be sampled and overestimation of contributions for those non-sampled factors. Consequently, in this study, we will propose an IFA approach to subsample all the factors to be addressed, and then quantify the contribution of each factor to the response Consequently, there are a total number of 2 2 2 c a b C C C iterations in IFA for the three-factor model 320 expressed as Equation (13). For each iteration, the sums of squares can be reformulated as: Also, for each iteration, the corresponding contributions for each factor can be obtained as: Finally, the individual and interactive contributions for those factors can be obtained by averaging the corresponding contributions in all iterations, expressed as:

Applications
The proposed IFC approach can be applied to various multivariate risk inference problems. In this study, we will apply IFC for multivariate flood risk inference at the Wei River basin in China. The Weihe River plays a key role in the economic development of western China, and 355 thus is known regionally as the 'Mother River' of the Guanzhong Plain of the southern part of the loess plateau (Song et al. 2007;Zuo et al. 2014;Du et al. 2015, Xu et al., 2016. It originates from the Niaoshu Mountain at an elevation of 3485 m above mean sea level in Weiyuan County of Gansu Province (Du et al. 2015). The Weihe River basin is characterized by a semi-arid and sub-humid continental monsoon climate, resulting in significant temporal-360 spatial variations in precipitation, with an annual average precipitation of 559 mm (Xu et al., 2016). Furthermore, there is a strong decreasing gradient from south to north, in which the southern region experiences a sub-humid climate with annual precipitation ranging from 800 to 1000 mm, whereas the northern region has a semi-arid climate with annual precipitation ranging from 400 to 700 mm (Xu et al., 2016). Over the entire basin, the mean temperature 365 ranges from 6 to 14 0 C, the annual potential evapotranspiration fluctuates from 660 to 1,600, and the annual actual evapotranspiration is about 500 mm (Du et al. 2015).
Observed daily streamflow data at Xianyang and Zhangjiashan gauging stations were used for hydrologic risk analysis. Figure 2 show the locations of these two gauging stations based 370 on the daily stream flow data, the flood peak applied is defined as the maximum daily flow over a period and the associated flood volume is considered as the cumulative flow during the flood period. In this study, the flood characteristics are obtained based on an annual scale.
This means that one flood event is identified in each year. The detailed method to identify the flood peak and the associated flood volume can be found in Yue (2000Yue ( , 2001. Table 1 shows and Akaike Information Criterion (AIC), to screen the performance of those potential models.
The results are presented in concluded that the GEV and lognormal approaches show the best performance for respectively modelling flood peak and volume at both gauging stations.
Also, the goodness-of-fit statistic test is performed based on the Cramér von Mises statistic 21 proposed by Genest et al. (2009). The indices of RMSE and AIC were employed to evaluate the performance of the obtained copulas and identify the most appropriate ones. Table 3 shows statistical test results for the selected copulas. The results show that, for the Zhangjiashan station, all candidate copulas except the Joe copula performed well, while all 410 six copulas would be able to provide satisfactory risk inferences at the Xianyang station.
Moreover, based on the values of RMSE and AIC, the Gumbel copula was chosen to model the dependence of flood peak and volume at Zhangjiashan station, while the Joe copulas performed best at the Xianyang station, except that although the p-value is slightly lower than Gumbel, the overall result favours Joe. Consequently, the Gumbel and Joe copula were   uncertainties can be observed in the predictive joint RP even for moderate flood events. As shown in Figure 5, considerable uncertainties may appear in the predictive joint RP of OR even for a flood with an actual joint RP of 20 years, while prediction of the Kendall RP for a 20-year (in Kendall RP) flood event may range from 10 to 50 years, as presented in Figure 6.

470
It has been observed that parameter uncertainties in the copula-based multivariate risk model would lead to significantly imprecise risk predictions. However, one critical issue to be addressed is how the parameter uncertainties and their interactions would influence the risk inference. Consequently, a multilevel factorial analysis, based on Equations (13) and (14), 475 were proposed to primarily visualize the individual and interactive effects of parameter 24 uncertainties in the marginal and dependence models on the resulting risk inferences. In this study, a total number of 6 parameters (i.e. three from GEV, two from LN, and one from copula) was addresses, and based on probabilistic features of these parameters, three quantile levels (i.e. 0.1, 0.5 and 0.9) were chosen to characterize the resulting risk inferences under implying a significant interaction between these two parameters. Table 4 provides the results 500 from an ANOVA table for the failure probability in AND. It is quite interesting that: i) even though the effect from the parameter in the copula function is not as visible as the effects from the parameters (except the location parameter in GEV) in the marginal distributions (as shown in Figure 7), such an effect is still statistically significant; ii) the effect from the location parameter of GEV is statistically insignificant, which also lead to insignificant 505 interactive effects between the location parameter and other parameters; iii) the interactions between the parameter in copula and the parameters in marginal distributions would be more likely statistically insignificant; iv) the statistical significance (significant or not) for individual and interactive effects from parameters is almost the same between these two gauge stations. All these conclusions obtained from Table 4 are consistent with the 510 implications described in Figure 7. Table 4 here

515
In terms of the failure probabilities in OR and Kendall, as presented in Figures 8 and 9, these have similar patterns with the failure probability in AND (presented in Figure 7). The individual/main effects from the marginal distributions (except the location parameter in GEV) are generally more visible than the parameters in copula functions. Also, some In terms of the failure probability in OR, the individual and interactive effects of model parameters on predictive risk uncertainties show similar patterns with the parameters' effects on the failure probability in AND. As shown in Figure 11, the shape parameter in the GEV distribution and the sdlog in the LN distribution are the two major sources of uncertainty in 595 failure probabilities in OR. However, compared with the failure probability in AND, parameter interaction has a less effect on the resulting uncertainty of risk inference in OR. As shown in Figures 10 and 11, the effect of parameter interaction on the risk in AND ranges between 13.96% and 20.05%, while in comparison, the parameters' interactive effect on the risk in OR varies between 10.25% and 11.57%. Apparently, it can also be observed that some Conversely, the location parameter in GEV and the dependence parameter in copula merely have relatively minor individual effects. However, it is noticeable that, although the dependence parameter has a minor effect (0.78%, 1.03%) on the risk in Kendall, such an 625 effect is much higher than the effect on the risk in AND (less than 0.23%) and the risk in OR (less than 0.06%). -

Contribution Partition of Uncertainty Sources through Different Approaches
In this study, the individual and interactive contributions of parameter uncertainties are quantified through the developed IFA approach, in which each parameter has three levels (i.e. 675 0.1, 0.5, 0.9 quantiles) to be subsampled. In fact, the parameters' contributions can also be characterized by the traditional factorial analysis (FA) approach based on Equations (15) as well as the IFA approach with more factor levels (e.g. 4 or 5 levels for each parameter).  Table 7 here It can be seen that for different subsampling scenarios, the resulting contributions may be different. However, such a difference would be tolerable since (1)   As shown in Figure 13, the proposed IFA approach may lead to slightly different results for 705 34 different subsampling schemes (four or five levels). However, an increase in parameter level would highly increase computational demand. For instance, if each parameter has four levels, the IFA approach would lead to a total number of 46,656 (i.e. 6 6 ) two-level factorial designs.
Moreover, the subsampling scheme for factors with five levels would lead to a total number of one million (i.e. 10 6 ) two-level factorial designs. Consequently, the three-level 710 subsampling scheme would generally be recommended and also can generate acceptable results.

715
The proposed IFA approach would generally produce a great number of two-level factorial designs. For one specific factor (e.g. GEV_shape), it would have two levels (lower and upper levels) for all factorial designs. However, the detailed value for the lower or upper level may be different in different factorial designs. This may finally lead to different contributions for this factor. Figure 14 presents the variations of parameters' contributions to the prediction of 720 failure probabilities in AND, OR, and Kendall. We already concluded that the shape parameter in GEV (i.e. GEV_shape) and the sdlog in LN (i.e. LN_sdlog) distribution would generally have the most significant contributions to predictive uncertainties in risk inferences.
However, as shown in Figure 14, the detailed contributions for these two parameters would

Conclusions
Uncertainty quantification is an essential issue for both univariate and multivariate hydrological risk analyses. A number of research works have been posed to reveal uncertain features in multivariate hydrological risk inference. However, it is required to know the major 770 sources/contributors for predictive uncertainties in multivariate risk inferences. In this study, an iterative factorial copula approach (IFC) has been proposed for uncertainty quantification and partition in multivariate hydrologic risk inference. In IFC, a copula-based multivariate risk model has been developed and the bootstrap method is adopted to quantify the reliable risk inferences. For other catchments, the proposed IFC method can be adopted to reveal the major sources for uncertainties in risk inferences and then provide potential pathways to get reliable risk inferences.
This study is the first attempt to characterize parameter uncertainties in a copula-based 795 multivariate hydrological risk model and further reveal their contributions to predictive uncertainties for different risk inferences. As an improvement of ANOVA, the developed IFA 38 method can mitigate the effect of bias variance estimation in ANOVA and generate reliable results. Moreover, another noteworthy feature for the IFA approach is that it does not only characterize the impacts for continuous factors (e.g. model parameters in this study), but also 800 reveals the impacts of discrete or non-numeric factors. Such a feature can allow the proposed IFA approach to be employed to further explore the impacts of non-numeric factors (e.g. model structures, sample size) in hydrologic systems analysis.
Code and data availability: The flooding data for the studied catchments as well as the  Tables  Table 1. Flood characteristics for different stations  Table 2. Statistical test results for marginal distribution estimation: LN means lognormal 935 distribution, P III means Pearson Type III distribution, and LP III means log-Pearson Type III distribution. K-S test denotes the Kolmogorov-Smirnov test. Table 3. Performance for quantifying the joint distributions between flood peak and volume through different copulas: CvM is the Cramér von Mises statistic proposed by Genest et al. (2009), with p-value larger than 0.05 indicating satisfactory performance.  Gumbel and Joe copula (parameter denoted as theta) would be respectively adopted to model 965 the dependence between flood peak and volume at Zhangjiashan and Xianyang stations.   Marginal distribution F 2 = F(y, θ 2 ) Joint distribution estimation F XY (x, y) = C(F(x, θ 1 ), (y, θ 2 ), θ c ) Parameter uncertainty quantification through Bootstrap (θ 1 ,θ 2 , θ c ) 1 ~ [(X 1 ,Y 1 ) 1 , (X 2 ,Y 2 ) 1 , …, (X n ,Y n ) 1 ] (θ 1 ,θ 2 , θ c ) 2 ~ [(X 1 ,Y 1 ) 2 , (X 2 ,Y 2 ) 2 , …, (X n ,Y n ) 2 ] ………. Wei River Figure 3. Probabilistic features for parameters in marginal distributions and copula: for both Xianyang and Zhangjiashan stations, the GEV (parameters include shape, scale and location) function would be employed to quantify the distribution of flood peak, while the lognormal distribution (parameters denoted as meanlog and sdlog) is applied for flood volume. The Gumbel and Joe copula (parameter denoted as theta) would be respectively adopted to model the dependence between flood peak and volume at Zhangjiashan and Xianyang stations.    Figure 6. Uncertainty quantification of the joint RP in "Kendall": the red dash lines indicate the predictive means, the two blue dash lines respectively indicate the 5% and 95% quantiles, and the grey lines indicate the predictions under different parameter samples with the same joint RP of the red and blue dash lines; The cyan lines denote the predictions under different return periods with the model parameters being their mean values.   Figure 11. Contributions of parameter uncertainties to predictive failure probabilities in OR under different design standards (i.e. return periods (RP)) and different service periods