Simultaneously Determining Global Sensitivities of Model Parameters and Model Structure

Model structure uncertainty is known to be one of the three main sources of hydrologic model uncertainty, along with input and parameter uncertainty. Some recent hydrological modeling frameworks address model structure uncertainty by supporting multiple options for representing hydrological processes. It is, however, still unclear how best to analyze structural sensitivity using these frameworks. In this work, we apply an Extended Sobol' Sensitivity Analysis (xSSA) method that operates on grouped parameters rather than individual parameters. The method can estimate not only traditional model parameter sensitivities but is also able to provide measures of the sensitivities of process options (e.g., linear vs. non-linear storage) and sensitivities of model processes (e.g., infiltration vs. baseflow) with respect to a model output. Key to the xSSA method's applicability to process option and process sensitivity is the novel introduction of process option weights in the Raven hydrological modeling framework. The method is applied to both artificial benchmark models and a watershed model built with the Raven framework. The results show that: 1) The xSSA method provides sensitivity estimates consistent with those derived analytically for individual as well as grouped parameters linked to model structure. 2) The xSSA method with process weighting is computationally less expensive than the alternative aggregate sensitivity analysis approach performed for the exhaustive set of structural model configurations, with savings of 81.9% for the benchmark model and 98.6% for the watershed case study. 3) The xSSA method applied to the hydrologic case study analyzing simulated streamflow showed that model parameters adjusting forcing functions were responsible for 42.1% of the overall model variability, while surface processes caused 38.5% of the overall model variability in a mountainous catchment; such information may readily inform model calibration.
4) The analysis of time-dependent process sensitivities regarding simulated streamflow is a helpful tool to understand model-internal dynamics over the course of the year.


Introduction
Hydrologic processes such as infiltration of water into soil or water interception by the canopy of trees are often too complex to parameterize or insufficiently understood at scales of interest to be represented in every detail in computer models. The consequence of this is that simplified conceptual or empirical models are often used to represent these physical processes; such models are typically computationally expedient and possess a relatively smaller number of parameters than continuum models based upon the Freeze and Harlan (1969) blueprint. They are also non-unique: a large number of such process algorithms can be formulated for any given process.

Applying a standard sensitivity analysis to each of N candidate model structures yields sensitivity estimates for each parameter conditional on which of the N model structures it is based on. It remains unclear how to aggregate these N estimates to derive a global overall sensitivity for all parameters. Francke et al. (2018) proposed a similar method that was only capable of distinguishing between binary model components (i.e., a model feature/enhancement is either present or not). Although Pfannerstill et al. (2015) did not explicitly focus on modeling frameworks, they studied the sensitivity of parameters regarding individual model processes by verifying whether a process output is behavioral. Dai et al. (2017) proposed a so-called process sensitivity metric which is based on Sobol' sensitivities. Their approach handles a countable set of process options for each of the model processes and derives an overall sensitivity through model averaging.
In all cases above, the resultant sensitivity metrics may be useful for (e.g.) differentiating between the magnitude of model sensitivity to structure versus that of parameters. However, these methods cannot be used to provide insight into the sensitivity of individual model structural choices, nor can they be used to disentangle the complex relations between model parameter and structure sensitivities or the interplay between interacting model structures. It is therefore difficult to use such methods to identify preferred model structures to inform the process of model calibration. In addition, none of the methods that derive sensitivities across multiple model structures recognize the fact that model parameters may be present or absent conditional upon the model structure. Lastly, the above-mentioned methods are generally computationally expensive, are only available for a small number of process parameterization options, and only determine the sensitivity of the parameterization in general without providing insight into what is causing this sensitivity by analyzing, for example, the sensitivity of individual parameterization options or individual parameters. While potentially useful for some applications, the available approaches have either not been applied or do not allow for such an in-depth analysis of model structure and hence might provide only limited support for improvements in hydrologic modeling.
Here, we propose a new technique, the Extended Sobol' Sensitivity Analysis (xSSA) method, and apply it to models whose structure can vary continuously between discrete process options for estimating hydrologic fluxes. The proposed method is based on the existing concept of grouping parameters when applying the Sobol' method (Sobol and Kucherenko, 2004; Saltelli et al., 2008; Gilquin et al., 2015). This concept has been applied by Dai et al. (2017) to derive the sensitivities of processes with several alternative options by using model averaging over the discrete set of options. To our knowledge, the method of grouping parameters to derive sensitivities of parameters, process options, and processes without the explicit necessity of model averaging has not yet been applied. The xSSA method is made uniquely possible by a special property of the Raven hydrologic modeling framework (Craig et al., 2020), whereby hydrologic fluxes (e.g., infiltration, runoff, or baseflow) may be simulated as weighted combinations of multiple process algorithms. As will be demonstrated below, the xSSA is uniquely capable of simultaneously providing global sensitivities of parameters, process algorithms (e.g., the Green and Ampt (1911) infiltration method), and hydrologic processes (e.g., infiltration).
The xSSA method allows us to efficiently estimate not only the global sensitivity of model parameters independent of the chosen model structure, but also the sensitivity of alternative model process options (e.g., that of different snowmelt algorithms) and the sensitivity of hydrological process components (e.g., snowmelt vs. infiltration). We here pose these as four distinct sensitivity metrics:

A. Conditional parameter sensitivity: Which model parameter is most influential given a certain model structure? For example, which model parameter is most important in the HBV model? (This is the traditional Sobol' metric.)

B. Unconditional parameter sensitivity: Which model parameter is most influential independent of model option choice? For example, which model parameter is overall the most influential given all possible model structures (available in the modeling framework)?

C. Process option sensitivity: Which of the available options for a process in a modeling framework is the most sensitive? For example, which choice of the infiltration process description has the largest impact on the simulated streamflow in my catchment of interest?

D. Process sensitivity: Which model process or component is most influential upon model results? For example, is infiltration more important than the handling of snow melt? Or, is the simulated streamflow more sensitive to infiltration or evaporation?
Below, we define these metrics explicitly and introduce the xSSA methodology for calculating them. The xSSA method is tested using two artificial benchmark models to check for consistency between analytically and numerically derived sensitivity index estimates. The proposed method is also compared to the existing Baroni method, revealing limitations that can be resolved using the xSSA method. The xSSA method is then applied to a hydrologic modeling case study using the Raven hydrologic modeling framework, demonstrating the insights that may be gained through the simultaneous in-depth analysis of model parameters and model structure to improve hydrologic modeling practices. The method is demonstrated to be more efficient than a conventional approach whereby the standard Sobol' method is repeatedly applied to distinct model structures, as in the study by Van Hoey et al. (2014), in addition to providing more useful information regarding model sensitivities.

Materials & Methods
The section will first introduce the models and their setups (Sec. 2.1) used to test and validate the proposed Extended Sobol' Sensitivity Analysis (xSSA) method as here applied to determine model structure and parameter sensitivities. We will then briefly revisit the traditional method of Sobol' that is so far primarily used to obtain model parameter sensitivities (Sec. 2.2).

Artificial Benchmark Model Setups
The artificial benchmark models are employed to demonstrate that the proposed method is capable of deriving the sensitivities of not only individual model parameters but also of grouped parameters linked to individual process options (e.g., the linear baseflow algorithm) or processes regardless of available options (e.g., baseflow). The benchmark models are further used to demonstrate limitations of existing methods that were previously used to analyze model structure sensitivities. We use two hypothetical models here: one model where each model parameter is only used in distinct processes and process options (disjoint-parameter benchmark; Fig. 1A) and a more advanced benchmark model where parameters are shared between several processes and process options (shared-parameter benchmark; Fig. 1B). The latter is assumed to be more realistic as model parameters such as, for example, the thickness of the upper soil layer can appear in multiple processes (e.g., evaporation, quickflow, infiltration, and percolation). The two benchmark models are both assumed to consist of three processes: A, B, and C as well as D, E, and F, respectively. The model output f(x), a function of model parameters x, is defined by

f(x) = A · B + C    (disjoint-parameter benchmark)    (1)
f(x) = D · E + F    (shared-parameter benchmark)    (2)

The product of processes A (D) and B (E) is intended to mimic non-linear coupling of model processes while the addition of C (F) is intended to resemble linear process coupling. Each of the three processes is assumed to allow for multiple process options. For the disjoint-parameter benchmark model, process A is set to have 2 options (A1, A2), B has 3 options (B1, B2, B3), and C has two options (C1, C2); the option formulations are given in Eqs. 3 to 9 (e.g., option A2 = 1). For the shared-parameter benchmark model, process D is set to have 2 options (D1, D2), E has 3 options (E1, E2, E3), and F has two options (F1, F2); their formulations are given in Eqs. 10 to 16. This allows for 2×3×2 = 12 individual models for the disjoint-parameter benchmark and 12 individual models for the shared-parameter setup using seven model parameters x1 to x7 that are all sampled uniformly from the range [−π, π]. By design, not all model parameters are used in each of the 12 models. The number of "active" parameters for the shared-parameter setup ranges from 3 (e.g., D1 · E1 + F1) to 6 (e.g., D2 · E3 + F2). For the disjoint-parameter setup each parameter appears in exactly one process option. In the shared-parameter setup, parameter x1 is used in two process options of the same process (D1 and D2). Parameters x2 and x3 are used in multiple process options of different processes: x2 is present in process options D2 and E1 to evaluate the behavior of sensitivities for multiplicative parameters, and x3 is present in process options E2 and F2 to check for additive behavior. A schematic of the model options and associated model parameters is shown in Figure 1A and B.

https://doi.org/10.5194/hess-2020-215 — Preprint. Discussion started: 30 June 2020. © Author(s) 2020. CC BY 4.0 License.
The Sobol' sensitivity indexes of all 12 model configurations can be derived analytically, closely following the description in Saltelli et al. (2008) (p. 179-182). They are listed for the shared-parameter benchmark model in Table B1 of Appendix B.

A reasonable approach for evaluating the sensitivity of the 12 individual models involves choosing exactly one process option for each process in Eq. 2, e.g., D = D1, E = E2, and so on. This can be generalized by choosing a weighted sum of all available process options to represent a process, e.g., D = w1 · D1 + w2 · D2 + .... The sum of the weights wi per process is assumed to be 1.
In case of the shared-parameter benchmark example, Eq. 2 therefore changes to

f(x) = (w_d1 · D1 + w_d2 · D2) · (w_e1 · E1 + w_e2 · E2 + w_e3 · E3) + (w_f1 · F1 + w_f2 · F2)

The 12 individual models can be obtained when the weights are set accordingly; e.g., the Ishigami-Homma function can be obtained by setting w_d1 = w_e1 = w_f1 = 1. However, given that weights can take on non-integer values, we now have an infinite number of model structures rather than 12. The same can be constructed for the disjoint-parameter model setup.
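The weighted-option construction can be sketched in a few lines. The option functions below are hypothetical stand-ins (the actual formulations of D1, D2, E1 to E3, F1, and F2 are given in Eqs. 10 to 16), but the weighting mechanics are identical: with one-hot weights, the generalized model collapses to one of the 12 discrete structures.

```python
import numpy as np

# Hypothetical stand-ins for the process options; the actual formulations
# are the paper's Eqs. 10 to 16.
D_OPTS = [lambda x: np.sin(x[0]), lambda x: x[0] * x[1]]
E_OPTS = [lambda x: 1.0 + 0.1 * x[1] ** 4,
          lambda x: x[2] ** 2,
          lambda x: np.cos(x[3])]
F_OPTS = [lambda x: x[4] + x[2], lambda x: x[5] * x[6]]

def weighted_model(x, wd, we, wf):
    """Generalized model D * E + F where each process is the weighted sum
    of its options (the weights per process sum to 1)."""
    d = sum(w * opt(x) for w, opt in zip(wd, D_OPTS))
    e = sum(w * opt(x) for w, opt in zip(we, E_OPTS))
    f = sum(w * opt(x) for w, opt in zip(wf, F_OPTS))
    return d * e + f

def discrete_model(x, i, j, k):
    """One of the 12 fixed model structures: D_i * E_j + F_k."""
    return D_OPTS[i](x) * E_OPTS[j](x) + F_OPTS[k](x)
```

Setting, e.g., wd = (1, 0), we = (1, 0, 0), wf = (1, 0) recovers the structure D1 · E1 + F1, while non-integer weights yield structures in between.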
For sampling the continuum of all process options, the weights need to be independently and identically distributed (iid).
Therefore, random numbers r i are sampled from the uniform distribution, U[0,1], and transformed into the weights following the approach described by Moeini et al. (2011). N − 1 such random numbers are required for N weights of competing options.

The recipe for transforming the uniformly sampled numbers ri into weights is specified in Eq. A1 of Appendix A. For the benchmark example, one requires 4 such uniform random numbers (r1, ..., r4) to derive the 7 weights (w1, ..., w7).
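Eq. A1 is not reproduced here; one standard recipe for such a transformation, and a reasonable sketch of it, is the "sorted spacings" construction, which turns N − 1 iid uniforms into N non-negative weights on the simplex:

```python
import numpy as np

def weights_from_uniform(r):
    """Transform N-1 iid U[0,1] samples into N non-negative weights summing
    to one, via sorted spacings of the unit interval (a standard recipe; the
    exact transformation used in the paper is its Eq. A1)."""
    cuts = np.concatenate(([0.0], np.sort(np.asarray(r, dtype=float)), [1.0]))
    return np.diff(cuts)
```

For a process with three competing options, two uniforms suffice, e.g., `weights_from_uniform([0.7, 0.2])` yields the weights (0.2, 0.5, 0.3).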
The approach of weighted model options hence comes at the expense of introducing additional parameters ri to derive the weights. A larger number of model parameters always results in an increased number of model runs needed for the sensitivity analysis. However, using the model with weighted options, one now has to run and analyze only one generalized model structure instead of 12 fixed structures. Therefore, this approach reduces the number of required model runs, provided that the model allows one to derive outputs of weighted process options directly. This feature is available in the hydrologic modeling framework Raven and was the primary reason for the choice of this flexible modeling framework over others.

Figure 1: The three processes are connected through A · B + C (D · E + F) to obtain the hypothetical model outputs. Processes A (D) and C (F) have two options, process B (E) has three. The parameters xi required for each process option are listed right of the respective process options. Each parameter of the disjoint-parameter benchmark model (A) appears in exactly one process option (disjoint parameters), while in the more advanced benchmark model (B) parameters x1, x2, and x3 appear in multiple process options and processes. The process formulations can be found in Eqs. 3 to 9 and Eqs. 10 to 16. (C) The third setup is used for a sensitivity analysis of the hydrologic modeling framework Raven. The three options Mi are used for the infiltration process, three options Ni for quickflow, two options Oi for evaporation, two options Pi for baseflow, and three options Qi for snow balance. All other processes needed for the model are used with one fixed option, i.e., convolution for surface and delayed runoff R1 and S1, respectively, potential melt T1, percolation U1, rain-snow partitioning V1, and precipitation correction W1. The remaining processes also have only one option but none of them contains tunable parameters; they are merged into a "remaining" process X1. The Raven model parameters x1 to x35 are listed right of the process options. Details about the chosen process options can be found in Appendix C, Table C1, and details on the model parameters and their ranges in Appendix C, Table C2.

Hydrologic Modeling Framework Raven
The Raven hydrologic modeling framework developed by James R. Craig at the University of Waterloo (Craig et al., 2020) is a C++ based framework that gives users full flexibility regarding model input handling and hydrologic process description chosen for each process of the water cycle. It is platform independent, open-source and is retrievable from http://raven.uwaterloo.ca.

For this study we used the released version 3.0. The Raven framework currently allows for an ensemble of about 8 × 10^12 hydrologic model configurations with, for example, 14 options for infiltration, 13 options for percolation, and 9 options for baseflow handling. The overall number of model structures is hypothetical, as not all processes need to be present in a model setup. For example, the sublimation process allows for 6 different options in Raven but would likely not be used to model an arid catchment. Further, other processes might appear several times. For example, convolution processes can be defined for each soil layer and hence would increase the number of possible models.
In Raven, the user defines the model as a list of hydrologic processes which move water between storage compartments corresponding to physical stores (e.g., topsoil, canopy, snowpack). The list determines the state variables, the connections between stores, and the parameters required. For each hydrologic process, several process algorithm options are implemented. There are, for example, 14 infiltration process options available, among them the infiltration algorithms of GR4J (Perrin et al., 2003), HMETS (Martel et al., 2017), the UBC watershed model (Quick and Pipes, 2009), PRMS (Markstrom et al., 2015), HBV (Bergström, 1995), VIC (Wood et al., 1992), and VIC/ARNO (Clark et al., 2008).

Raven has another unique feature relative to other modular frameworks: rather than selecting one process option (e.g., the HMETS method for estimation of infiltration runoff fluxes), one can specify multiple process options (e.g., HMETS, VIC/ARNO, and HBV) and define weights for each option (e.g., 0.4, 0.5, and 0.1, respectively). Raven then internally uses the weighted sum of the fluxes calculated by the process options. Raven is run only once, not multiple times, to obtain the outputs for the multiple process options. Based on this feature, we chose Raven as the model for our study. Please note that the proposed sensitivity method is applicable to any multi-model framework that allows one to mix and match process descriptions. However, in the case of a framework without weights for process options, the application of the method would be much less efficient.
For the case study used herein, Raven is applied in lumped mode and we have chosen three different options Mi for the infiltration process, three options Ni for quickflow, two options Oi for evaporation, two options Pi for baseflow, and three options Qi for snow balance. All other processes, i.e., convolution for surface runoff R1 and delayed runoff S1, potential melt T1, percolation U1, rain-snow partitioning V1, and precipitation correction W1, are used with one fixed process option.

The remaining processes also have only one option but none of them contains tunable parameters. They are merged into a "remaining" process X1. This remaining process will never appear in the sensitivity analysis because it is constant. Details of the process options can be found in Appendix C, Table C1.

Some of these option combinations correspond to existing named models; all other combinations are unnamed models. The Raven model is stable for all of these combinations, although a check of the hydrologic realism of these models, as done by Clark et al. (2008), was not performed. The model parameters x1 to x35 and their ranges are listed in Table C2 of the appendix. An additional 13 (= 3 + 3 + 2 + 2 + 3) weights are required for the weighted model setup (similar to Eq. 18) and are also sampled using the approach described in Appendix A. Therein, 8 (= 2 + 2 + 1 + 1 + 2) parameters ri are sampled uniformly from U[0, 1] and transformed into weights wi.

Case Study Domain
The Salmon River catchment located in the Canadian Rocky Mountains in British Columbia is selected as the study watershed.

The domain is depicted in Figure 2A. The monthly distribution of precipitation is shown in Figure 2B, where rain is highlighted as the dark blue portion of each bar and snow as the light blue portion. The basin has a dryness index (PET/P) of 0.735, which demonstrates the energy limitation of this catchment.
The lumped model was set up for the simulation period from January 1, 1989 to December 31, 2010, while the first two years were discarded as warm-up. Hence, 20 years of daily streamflow simulations were used for this study.

In this section we briefly describe the Sobol' method (Sobol', 1993) that is traditionally used to derive model parameter sensitivities (Sec. 2.2.1). This corresponds to sensitivity metric A mentioned in the introduction. To calculate the sensitivity metrics B, C, and D, we propose an extended version of the Sobol' method, which is introduced in Sec. 2.2.2.

Traditional Sobol' Sensitivity Analysis: Sensitivities of Individual Model Parameters
Traditionally, the Sobol' sensitivity analysis, like all other methods, focuses on the sensitivity of model parameters. When multiple model structures are analyzed independently, this yields separate, then aggregated, sensitivity estimates for parameters that are active in multiple models. This, however, may underestimate the sensitivity of parameters that are active in only a few models, while parameters that are active in almost all models might be overestimated in their aggregated sensitivity. We here briefly revisit the implementation of the traditional Sobol' method to emphasize the differences with the extended method we propose, which will be able to handle multiple process options and derive (overall) parameter sensitivities, sensitivities of process options, and sensitivities of whole processes.
Usually two Sobol' indexes are derived: the main and the total Sobol' index. The main Sobol' index S_xi of a parameter xi regarding a certain model output f(x) represents the variability Vi in the model output that can be achieved by changing this parameter while keeping all other parameters at a nominal value. This impact is normalized by the overall model variability V(f) that can be generated when all model parameters are varied. Therefore, the main index is derived by

S_xi = V(E(f | xi)) / V(f)    (19)

where V denotes variances and E expected values. E(f | xi) is the expected model output when the model parameter xi is fixed.
Similarly, the total Sobol' index ST_xi for a parameter xi regarding a certain model output f(x) resembles the main index but includes parameter interactions. Therefore, it is derived using the variability of model output that can be generated by changing all parameter subsets that include parameter xi. Since there might be a large number of such subsets, the total index for parameter xi can also be viewed as 1 minus the variability V_∼i that can be achieved by changing all parameters except parameter xi, normalized by the overall possible model output variability V(f):

ST_xi = 1 - V(E(f | x_∼i)) / V(f)    (20)

where x_∼i denotes all model parameters except xi. For the numerical estimation of the indexes S_xi and ST_xi, one constructs two base matrices A and B which each contain K parameter sets (rows) of N parameters (columns). The samples are assumed to be independent within one matrix and between the matrices. Based on A and B, a set of N additional matrices Ci is constructed, where Ci is a copy of A but column i is replaced with column i of matrix B. The model then needs to be forced with all these parameter sets; in total K × (N + 2) model runs are required, where N is the number of parameters and K is the so-called number of reference parameter sets. K needs to be chosen large enough to obtain stable Sobol' indexes. This number is highly dependent on the model, but K = 1000 seems to be a good rule-of-thumb (Cuntz et al., 2015, 2016). The 12 possible shared-parameter benchmark models (Eq. 2) contain 3 (4 models), 4 (5 models), 5 (2 models), or 6 (1 model) parameters. Hence, 72 000 (= 4×5000 + 5×6000 + 2×7000 + 1×8000) model runs would be required if K = 1000 reference parameter sets were used.
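The A/B/Ci matrix scheme can be sketched in a few lines of NumPy. The specific main- and total-index estimators below (Saltelli- and Jansen-type) are one common choice and are not necessarily the exact estimators used in the paper:

```python
import numpy as np

def sobol_indices(model, n_params, K, seed=0):
    """Main (S) and total (ST) Sobol' indexes via the A/B/C_i matrix scheme;
    costs K * (n_params + 2) model runs. `model` maps a (K, n_params) array
    to K outputs. Parameters are sampled U[-pi, pi] as in the benchmarks."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-np.pi, np.pi, (K, n_params))
    B = rng.uniform(-np.pi, np.pi, (K, n_params))
    fA, fB = model(A), model(B)
    var = np.var(np.concatenate([fA, fB]))
    S, ST = np.empty(n_params), np.empty(n_params)
    for i in range(n_params):
        Ci = A.copy()
        Ci[:, i] = B[:, i]                           # C_i: copy of A, column i from B
        fC = model(Ci)
        S[i] = np.mean(fB * (fC - fA)) / var         # Saltelli-type main-index estimator
        ST[i] = 0.5 * np.mean((fA - fC) ** 2) / var  # Jansen-type total-index estimator
    return S, ST
```

For the Ishigami-Homma function, the estimates approach the analytical values S = (0.314, 0.442, 0.0) and ST = (0.558, 0.442, 0.244) as K grows.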

The Sobol' method is here generalized to groups of parameters xG rather than focusing on individual parameters xi. The subscript G is used here to refer to parameter groups, such that VG represents the variance caused by a group of parameters xG. The calculation of the main and total Sobol' indexes is marginally changed: instead of changing individual parameters xi, groups of parameters get changed. The derivation of the main index is generalized to

S_xG = V(E(f | xG)) / V(f)

where V denotes variances and E expected values. E(f | xG) is the expected model output when the set of model parameters xG is fixed. This simplifies to Eq. 19 in case the group xG contains exactly one model parameter xi. Similarly, the total Sobol' index can be generalized to

ST_xG = 1 - V(E(f | x_∼G)) / V(f)

where E(f | x_∼G) is the expected model output when all model parameters except the ones in group xG are fixed. This simplifies to Eq. 20 when the group xG contains only the parameter xi. Note that the groups are not assumed to be mutually exclusive, which means that parameters can appear in multiple groups.
The numerical approximation of these indexes is similar to the traditional approach. It is again based on the two matrices A and B containing K parameter sets each. Assuming that the sensitivity of M groups needs to be estimated, M matrices Cm have to be constructed, where Cm is a copy of A but all columns that correspond to parameters in group m are replaced by the corresponding columns of B. For example, if the group consists of parameters x2, x4, and x5, the columns 2, 4, and 5 would be replaced by the columns 2, 4, and 5 of matrix B. The number of model runs that need to be performed is (M + 2) × K, where K is the number of reference parameter sets.
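A minimal sketch of this grouped variant only changes which columns of A are swapped out; the estimators are the same illustrative Saltelli/Jansen-type choices as above, and the U[0, 1] sampling is an assumption for the toy test case:

```python
import numpy as np

def grouped_sobol_indices(model, n_params, groups, K, seed=0):
    """xSSA-style Sobol' indexes for (possibly overlapping) groups of
    parameters; costs K * (len(groups) + 2) model runs. `groups` is a list
    of lists of column indices; parameters are sampled U[0, 1] here."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(0.0, 1.0, (K, n_params))
    B = rng.uniform(0.0, 1.0, (K, n_params))
    fA, fB = model(A), model(B)
    var = np.var(np.concatenate([fA, fB]))
    S, ST = np.empty(len(groups)), np.empty(len(groups))
    for m, g in enumerate(groups):
        Cm = A.copy()
        Cm[:, g] = B[:, g]                           # swap out all columns of group m
        fC = model(Cm)
        S[m] = np.mean(fB * (fC - fA)) / var
        ST[m] = 0.5 * np.mean((fA - fC) ** 2) / var
    return S, ST
```

For an additive toy model f = x1 + x2 + x3 with uniform inputs, the group {x1, x2} carries two thirds of the output variance and {x3} one third, which the estimator reproduces.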
The Extended Sobol' Sensitivity Analysis (xSSA) method can be used to derive conventional model parameter sensitivities by using groups that contain exactly one of the model parameters xi or one of the random numbers ri required to derive the weights (sensitivity metric B). It can also supply the sensitivity of process options by defining a group for each process option containing exactly the parameters of that option (sensitivity metric C), or the sensitivity of whole model processes by defining a group for each process containing all parameters active in at least one of the process options (sensitivity metric D).
As an example (see Fig. 1A), the group to derive the sensitivity of process option A2 of the artificial benchmark model would contain parameters x1 and x2. The group to determine the sensitivity of process B of the benchmark model would contain parameters x2, x3, x4, and x5.
The shared-parameter benchmark model consists of 7 model parameters xi and 4 random variables ri used to derive the 7 weights wi. Let us assume we use K = 1000 reference parameter sets for each of the three analyses. One would require 13 000 (= (7 + 4 + 2) × 1000) model runs to derive individual parameter sensitivities using the xSSA method. This compares to the 72 000 model runs required when the conventional Sobol' sensitivity analysis method is applied to the 12 individual models.
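The run-count bookkeeping in this section reduces to the (M + 2) × K formula and can be verified directly:

```python
def sobol_runs(n_groups, K=1000):
    """Model runs for one Sobol'/xSSA analysis varying n_groups factors."""
    return (n_groups + 2) * K

# Conventional approach: one standard Sobol' analysis per model for all 12
# shared-parameter structures, which have 3, 4, 5, or 6 active parameters.
conventional = 4 * sobol_runs(3) + 5 * sobol_runs(4) + 2 * sobol_runs(5) + 1 * sobol_runs(6)

# xSSA on the single weighted model: 7 parameters + 4 weight-generating r_i
# (metric B), 7 option groups + 4 r_i groups (metric C), 3 process groups (D).
xssa = sobol_runs(7 + 4) + sobol_runs(7 + 4) + sobol_runs(3)
saving = 1.0 - xssa / conventional  # about 57 % fewer runs
```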

It requires 13 000 (= (7 + 4 + 2) × 1000) additional model runs to derive the sensitivity of the seven process options A1, A2, B1, ..., C2 and 5 000 (= (3 + 2) × 1000) additional model runs to derive the sensitivity of the three processes A, B, and C. The total of 31 000 model runs for all three analyses thus leads to a computational cost reduction of 57% while providing additional information about process option and process sensitivity.

Sensitivity Analysis: Experiments

The artificial benchmark models are used to prove that the proposed method of Extended Sobol' sensitivity indexes and its implementation is working. They are furthermore employed to demonstrate some limitations of the existing method proposed by Baroni and Tarantola (2014) to derive sensitivities regarding model structures. The analytically-derived values for the traditional approach analyzing the 12 individual models independently (sensitivity metric A) can be found in Appendix B; earlier studies, e.g., Cuntz et al. (2015), have demonstrated that these indexes can be obtained with a classical Sobol' analysis for the Ishigami-Homma model. The Baroni method, also applied to the two benchmark models, was set up using the same computational budgets K as above. The method additionally requires the definition of the algorithmic parameter ni, which denotes the number of realizations for each analyzed source of uncertainty Ui (ni and Ui are terms used in the original publication). The term "sources of uncertainty" (Ui) is used by Baroni and Tarantola (2014) to describe groups of parameters with aggregated sensitivity, and is here equivalent to the process groupings A, B, and C (D, E, and F, respectively).

Experiments Using the Raven Modeling Framework
The Salmon watershed model is analyzed using the xSSA method with K = 1000 reference parameter sets, assuming that this number of parameter sets is large enough to derive stable results. The analysis of individual models (sensitivity metric A) would have required 3 258 000 model runs and is not performed here.

The budget for sensitivity metric B, analyzing the unconditional sensitivity of parameters independent of the choice of model options, results in (43 + 2) × K model runs for the 35 model parameters xi and 8 weight-deriving random numbers ri. The budget for process options (metric C) is (27 + 2) × K model runs for the 19 process options M1, M2, ..., W1 and the 8 weight-deriving random numbers ri. The process X1 is not analyzed since it does not contain any parameters and is hence constant, resulting in a zero sensitivity. The budget for the analysis to obtain process sensitivities (metric D) is (11 + 2) × K for the 11 processes M, N, ..., W.
The sensitivities are determined for simulated streamflow Q(t) for the 20-year simulation period from 1991 to 2010. The main and total Sobol' indexes S_xi(t) and ST_xi(t), respectively, are determined for each time step t and are aggregated to S_w_xi and ST_w_xi using variance-weighted means (Cuntz et al., 2015):

S_w_xi = Σ_t [ S_xi(t) · V(t) ] / Σ_t V(t)
ST_w_xi = Σ_t [ ST_xi(t) · V(t) ] / Σ_t V(t)

where V(t) is the variance of the model outputs at time step t.
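The variance-weighted aggregation of the time-dependent indexes is a one-liner; a minimal sketch:

```python
import numpy as np

def variance_weighted_index(S_t, V_t):
    """Aggregate a time series of Sobol' indexes S(t) into one scalar,
    weighting each time step by the model output variance V(t) at that
    step (variance-weighted mean, following Cuntz et al., 2015)."""
    S_t, V_t = np.asarray(S_t, float), np.asarray(V_t, float)
    return np.sum(S_t * V_t) / np.sum(V_t)
```

Time steps where the model ensemble varies strongly (e.g., freshet peaks) thus dominate the aggregated index, while low-variance periods contribute little.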

Results and Discussion
We will present the results of the Extended Sobol' Sensitivity Analysis (xSSA) applicable not only to model parameters but also to model process options and processes. First, the xSSA method will be compared to an existing method to derive sensitivities of groups of parameters introduced by Baroni and Tarantola (2014), highlighting limitations of this existing method (Sec. 3.1). Second, we will present the convergence of the xSSA results regarding parameters, process options, and processes, focusing on the more realistic shared-parameter benchmark model (Sec. 3.2). The results of xSSA for the hydrologic modeling framework are presented in Sec. 3.3.

In this section the proposed xSSA method based on grouped parameters and weighted process options will be compared against analytically-derived Sobol' sensitivity indexes for both benchmark problems (Fig. 1A and B). The xSSA method will also be compared to the sensitivity analysis method introduced by Baroni and Tarantola (2014) (hereafter called the Baroni method), which also makes use of grouped parameters. Weighted process options are irrelevant for the Baroni method as only one "process option" was used in their publication.

The Baroni method defines the term "sources of uncertainty" (Ui) to describe groups of parameters with aggregated sensitivity; it is here taken to be equivalent to the process groupings A, B, and C for the disjoint-parameter benchmark model and D, E, and F for the shared-parameter benchmark model. The Baroni method pre-samples a defined number ni of sets for each "uncertainty source" Ui. Forcing datasets could be one such source of uncertainty; in this case, the Baroni method pre-samples ni input time series and the Sobol' method would use the ID of the time series (1 ... ni) as the hyper-parameter to derive the sensitivity of the inputs. In the example proposed by Baroni and Tarantola (2014), the sources of uncertainty are distinct and the parameterizations of the "sources of uncertainty" are disjoint. The question is how the Baroni method would be applied if two competing error structures of the same forcings are to be tested, for example, when both error structures share one or more of the same statistical parameters, such as the mean error and the error variance, eliminating the disjoint nature of parameters that all past studies using the Baroni method implicitly assume and utilize. The same question appears in the context of process option sensitivity when a parameter (e.g., porosity) is shared between multiple alternative process options or even multiple processes (e.g., a soil evaporation process option and a percolation process option). The xSSA method is not limited in these situations, as it uses the weighted sum of all competing model options and explicitly allows parameters to appear in multiple process options and even in multiple processes.
To demonstrate this major difference between the Baroni and the proposed xSSA concept, we define three groups (sources of uncertainty) as the processes A, B, and C of the disjoint-parameter benchmark model (Fig. 1A). For this scenario, all the parameters are disjoint and it can be shown that both methods converge to the analytically derived Sobol' indexes (Fig. 3A and 3B).
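Such convergence toward the analytical values can be checked with a compact Monte Carlo sketch. The following is our own illustration (not the paper's actual code): it estimates main and total Sobol' indexes for the Ishigami-Homma function with a = 2 and b = 1 using the Saltelli (2010) main-effect and Jansen total-effect estimators and compares them to the well-known analytical partial variances:

```python
import numpy as np

rng = np.random.default_rng(42)
a, b = 2.0, 1.0

def ishigami(x):
    # Ishigami-Homma function with x_i ~ U(-pi, pi)
    return np.sin(x[:, 0]) + a * np.sin(x[:, 1]) ** 2 + b * x[:, 2] ** 4 * np.sin(x[:, 0])

n, d = 100_000, 3
A = rng.uniform(-np.pi, np.pi, (n, d))
B = rng.uniform(-np.pi, np.pi, (n, d))
fA, fB = ishigami(A), ishigami(B)
V = np.var(np.concatenate([fA, fB]))

S, ST = np.empty(d), np.empty(d)
for i in range(d):
    C = A.copy()
    C[:, i] = B[:, i]                          # re-sample only parameter i
    fC = ishigami(C)
    S[i] = np.mean(fB * (fC - fA)) / V         # Saltelli (2010) main-effect estimator
    ST[i] = 0.5 * np.mean((fA - fC) ** 2) / V  # Jansen (1999) total-effect estimator

# Analytical partial variances of the Ishigami-Homma function
V1 = 0.5 * (1.0 + b * np.pi ** 4 / 5.0) ** 2
V2 = a ** 2 / 8.0
V13 = b ** 2 * np.pi ** 8 * 8.0 / 225.0
Va = V1 + V2 + V13
```

With enough samples, S and ST approach the analytical ratios (e.g., S 1 = V1/Va, ST 1 = (V1 + V13)/Va), mirroring the convergence behavior reported for the benchmark experiments.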
The shared-parameter benchmark model (Fig. 1B), on the contrary, has parameters that appear in several process options of the same process (e.g., x 1 in D 1 and D 2) and parameters that are involved in several process options across processes (e.g., x 3 in E 2 and F 2). This non-disjoint (overlapping) setting of influencing factors leads to the result that the Baroni method converges to a wrong sensitivity for some processes. This is caused by the usage of the n i pre-sampled parameter sets for each source of uncertainty (here processes). When a parameter appears in two sources of uncertainty, one has to decide which parameter value to choose for running the model; during the Sobol' analysis (when creating the M matrices C m), one process expects this parameter to stay constant while the other assumes it is changed. This contradiction cannot be resolved.

In our implementation we chose to use the parameter value of the last process, leading to process F converging to the analytically-derived correct value. Process E also converges almost to the correct value, but only because parameter x 3, which is shared between processes E and F, is very insensitive (ST x3 = 0.0045). Process D shares the highly sensitive parameter x 2 (ST x2 = 0.77089) with process E. Thus the sensitivity for process D is significantly underestimated. The underestimation is caused by the fact that the parameter value is supposed to change but is kept constant because it gets overwritten by the value of x 2 of the pre-sampled set of process E.
We also tested several numbers of pre-sampled parameter sets n i, as this is mentioned by Baroni and Tarantola (2014) to be one factor that can influence the convergence of the method. We tested n i with 32, 64, 128, 256, 512, and 1024, and all led to the same results (results not shown).
The xSSA method does not pre-sample parameters. When one group is analyzed, all parameters contained in this group get perturbed. The C m matrices can be defined without causing any contradiction or overwriting of parameter values. The method intrinsically counts sensitivities repeatedly for parameters that appear multiple times. This characteristic of the method is intentional and desirable. When the groups are defined as process options, i.e., various conceptual implementations of the same hydrologic process, parameters will be used in multiple options. Some parameters such as, for example, soil thicknesses or porosity might even be required across processes (for example infiltration and percolation). We do not expect the process sensitivities to sum up to 1, which is anyway not achievable with non-additive models. In addition to the practical benefit of allowing for non-disjoint parameter groups, the theoretical underpinning of analytically-derived Sobol' indexes does not require this constraint, as shown by the xSSA results converging towards those values.

Figure 3. The Baroni method (A, C) is compared to the proposed xSSA method (B, D) using the disjoint-parameter and shared-parameter benchmarking models (Fig. 1A and Fig. 1B, respectively). The model parameters are here grouped into the model processes A, B, and C for the disjoint-parameter benchmarking model and D, E, and F for the shared-parameter benchmarking model. The main difference between these two models is that parameters of the latter model can appear in multiple processes (or groups of uncertainty), which is regarded as more realistic. Note that the x-axis is in logarithmic scale.
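The grouped perturbation described above can be sketched numerically. The following is our own illustration based on the Ishigami-Homma function with hypothetical overlapping groups (not the paper's benchmark setup): each group simply defines which columns of the sample matrix are re-sampled together, so a parameter shared between groups causes no contradiction.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 1.0

def model(x):
    return np.sin(x[:, 0]) + a * np.sin(x[:, 1]) ** 2 + b * x[:, 2] ** 4 * np.sin(x[:, 0])

# Hypothetical overlapping groups: parameter index 1 appears in both "D" and "E".
groups = {"D": [0, 1], "E": [1, 2]}

n = 50_000
A = rng.uniform(-np.pi, np.pi, (n, 3))
B = rng.uniform(-np.pi, np.pi, (n, 3))
fA, fB = model(A), model(B)
V = np.var(np.concatenate([fA, fB]))

ST = {}
for name, cols in groups.items():
    C = A.copy()
    C[:, cols] = B[:, cols]  # perturb the whole group at once; overlap is unproblematic
    ST[name] = 0.5 * np.mean((fA - model(C)) ** 2) / V  # Jansen total-effect estimator
```

Because the complement of group "D" is only the interaction-free parameter x 3, the total group index ST["D"] converges to 1, while ST["E"] converges to 1 minus the main effect of x 1, illustrating that overlapping groups pose no theoretical problem.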

Extended Sobol' Sensitivity Analysis for Shared-Parameter Benchmark Setup
The shared-parameter benchmark setup is utilized to compare the xSSA-derived numerical sensitivity metric values with the analytically derived, correct sensitivity metric values for all three metrics: parameters, process options, and processes. We have chosen the shared-parameter over the disjoint-parameter benchmark model here as it appears to be the more difficult model. Figure 4 shows the convergence of the sensitivity estimates for the model parameters (x 4, x 5, x 6, x 7), the weight generating random numbers (r 1, r 2), the process options, and the processes; Figure 4F is the same as Figure 3D.
It holds for all three analyses that the model parameters/options/processes with the largest sensitivities converge slowest. For example, model parameter x 2 and weight generating random number r 2 have much higher sensitivities than the other parameters analyzed and need about 5000 model runs to obtain an error below 0.1 (Fig. 4A and B). Similarly, Fig. 4C and 4D show that the most influential process options (D 2, E 1) and the most influential variable (r 2) converge slowest (S D2 = 0.43, S E1 = 0.38, S r2 = 0.06, ST D2 = 0.80, ST E1 = 0.77, ST r2 = 0.32). Process F's sensitivity estimates converge faster than the other two, consistent with the fact that this process is the least sensitive (Fig. 4E and F).
It can be noted that the weight generating random numbers r i show high interaction effects, which makes sense since they always couple at least two parameters or process options, and hence tend to converge slower due to the higher sensitivity. In this study we are primarily interested in the model parameter, process option, and process sensitivities and hence suggest that a number of K = 1000 model runs is sufficient to derive useful sensitivity estimates.

Extended Sobol' Sensitivity Analysis Applied to Hydrologic Modeling Framework
Each subsection here focuses on one experiment performed using the hydrologic modeling framework Raven. The subsections address the results of the unconditional parameter sensitivities (Sec. 3.3.1), the sensitivities of the process options (Sec. 3.3.2), and the processes (Sec. 3.3.3). All these sensitivity indexes are presented as variance-weighted aggregates over time such that one index per model parameter, option, or process can be analyzed. The last subsection focuses on temporally varying process sensitivities over the course of the year (Sec. 3.3.4), as shown to be of importance previously (Dobler and Pappenberger, 2012; Herman et al., 2013; Günther et al., 2019; Bajracharya et al., 2020).
It is important to note that the results of the analysis for the hydrologic framework are the product of an iterative process. We strongly believe that this kind of analysis will help to assess the hydrologic realism of models since the estimates are easier to interpret and to compare to experience and known evidence.

(Unconditional) Parameter Sensitivity
The variance-weighted main and total Sobol' sensitivity estimates S w xi and ST w xi are shown in Figure 5. The sensitivities are unconditional since the estimates average over all possible model structures through the weighted sum of all analyzed model options, as described in Eq. 18. The analysis of model parameters (Fig. 5A) shows that the most important ones are x 24 to x 27, which are all associated with the potential melt (process T) that handles the melting of the snowpack until it is gone.

The quickflow parameters x 5 (maximum release rate from topsoil) and x 6 (baseflow rate exponent n of topsoil) are sensitive as well. Parameters of medium sensitivity are x 8 (PET correction factor), x 16 (degree day refreeze factor), x 18 (refreeze factor), x 19 (maximum snow liquid saturation), x 29 (thickness of topsoil), x 31 (temperature of rain-snow transition), and x 34 (snow correction factor).
Besides that, the most influential parameters are the weight generating random variables associated with the processes that are most sensitive (indicated by the same color in Fig. 5A), i.e., r 3, r 4, r 7. This is intuitive since switching processes may cause large variability in the model outputs and hence shifting their weighted averages is also likely to lead to large variability.
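The role of the weight generating variables can be illustrated with a toy blending of two process options. The option equations below are illustrative assumptions of ours, not Raven's actual algorithms:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical release algorithms competing for the same quickflow process:
def option_linear(storage, k=0.05):
    return k * storage              # linear reservoir release

def option_power(storage, k=0.01, n=1.5):
    return k * storage ** n         # non-linear (power-law) reservoir release

# With two options, one uniform random number r yields weights (r, 1 - r)
# that sum to one (cf. Eq. 18); the simulated flux is their weighted sum:
storage = 10.0
r = rng.uniform()
w1, w2 = r, 1.0 - r
blended = w1 * option_linear(storage) + w2 * option_power(storage)

# Varying r alone moves the output between the two pure-option fluxes,
# which is why mixtures of dissimilar options make the r_i variables sensitive.
lo = min(option_linear(storage), option_power(storage))
hi = max(option_linear(storage), option_power(storage))
```

If the two options produce very different fluxes for the same state, perturbing r alone shifts the blended output across that whole range, producing exactly the high weight sensitivities observed for r 3, r 4, and r 7.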
A sensitivity analysis regarding model parameters is often performed prior to model calibration to identify the most sensitive parameters, which are in turn the parameters that are most likely to be identifiable during calibration.
The analysis shows that 13 of the 35 parameters (x 5, x 6, x 8, x 16, x 18, x 19, x 24, x 25, x 26, x 27, x 29, x 31, and x 34) are responsible for 96.5% of the overall model variability (only ST w xi; 77.2% if ST w ri is included). All other parameters are unlikely to be identifiable during model calibration using streamflow measurements alone. Independent of the model structure selected, these remaining model parameters are negligible and thus could be fixed at default values for the Salmon River catchment over the 20-year simulation period.
This analysis helps to identify the most important parameters independent of model structure and therefore helps to identify the main sources of parametric uncertainty in models regardless of structural configuration, presuming that individual structures are equally viable. It likewise determines non-identifiable parameters, as a traditional sensitivity analysis does, with respect to streamflow.

Process Option Sensitivity
The results of the sensitivity analysis of model parameters are consistent with the analysis of process options (Fig. 5B), where all model parameters used in a process option are varied together rather than individually. The analysis of the process option sensitivities identifies as most sensitive those options that contain model parameters previously determined to be sensitive.
The potential melt process, i.e., the algorithm used to determine incoming melt energy, is used in this study with only one process option (POTMELT_HMETS), and it is still the hydrologic process that the simulated streamflow is most sensitive to (orange bar). The three infiltration options are all equally sensitive (light blue bars). The same holds for the two options of the evaporation process (dark blue bars), which are slightly more influential than the infiltration options (light blue bars).
The quickflow options BASE_VIC and BASE_TOPMODEL (medium blue bars) are the second-most sensitive after the potential melt. The quickflow option BASE_LINEAR_ANALYTIC, however, is much less sensitive. The two baseflow options (dark green bars) as well as the convolution options for surface and delayed runoff (yellow and light green bar) exhibit almost no influence, with sensitivity metrics near zero (ST w G < 0.0017 with G ∈ {P 1, P 2, R 1, S 1}). The only percolation option (PERC_LINEAR; light red bar), the rain-snow partitioning option (RAINSNOW_HBV; medium red bar), and the precipitation correction (RAINSNOW_CORRECTION; dark red bar) show medium sensitivities similar to those of the infiltration options. The SNOBAL_HBV option is the most sensitive among the three snow balance options. SNOBAL_HMETS is slightly less sensitive, while SNOBAL_SIMPLE_MELT has a zero sensitivity. The latter serves as a consistency check of the implementation: the zero sensitivity is expected since the SNOBAL_SIMPLE_MELT option does not require any parameters (see Tab. C1). Model outputs of such options do not change for different model runs and hence have a zero variance, which leads to a zero Sobol' index. Another interesting result is the high sensitivity of the weight generating random numbers associated with the snow balance options (r 7 and r 8). The sensitivity of these parameters is caused by the fact that they are responsible for the "mixing" of the outputs of the model options. In this case they mix an option whose output is always the same (SNOBAL_SIMPLE_MELT) and two options that can vary significantly with parameter choice (SNOBAL_HBV and SNOBAL_HMETS). Hence, the weighting of these options can perturb the model output drastically, yielding a high sensitivity of r 7 and r 8.
In summary, it can be deduced that the potential melt, the quickflow options BASE_VIC and BASE_TOPMODEL, and the evaporation options are most influential upon modeled streamflow. The interpretation and use of this process option sensitivity is open and depends upon the purpose of the sensitivity analysis. As an example of interpretation, we can consider whether or not we wish to maximize the flexibility of our models in calibration; if so, we may wish to discard insensitive processes.
The three infiltration options are equally sensitive and hence equally appropriate; the choice of the infiltration option will therefore not influence the model performance.
This analysis of process options allows, for the first time, model process options to be objectively compared by mix-and-matching all of them through the approach of a weighted mean of all outputs. It can assist the setup of models by guiding choices of process options and hence guides model structure decisions depending on the purpose of the model built.

Process Sensitivity
The sensitivity analysis of the eleven processes (Fig. 5C) consistently identifies potential melt T (orange bar) as the dominating process for the Salmon River catchment. Technically, potential melt as well as rain-snow partitioning V and precipitation correction W handle inputs to the hydrologic system and can hence be regarded to quantify input uncertainties or, in other words, adjustments of forcing functions. The three processes are responsible for 42.1% of the overall model variability in this catchment. Note that, unlike for the parameter and process option sensitivities (Figs. 5A and 5B), the process weights r i are not explicitly included in the process sensitivity results in Fig. 5C; the weights are part of the parameters that get grouped for each process to assess its sensitivity.

The processes associated with the surface are quickflow N and snow balance Q (medium blue and medium green bar, respectively), which are the second and third-most influential processes. These two processes together control about 38.5% of the model variability. The strong impact of these processes (together with the input adjustments) highlights the importance of snow and melting processes in this mountainous, energy-limited catchment.
The soil-related processes of infiltration M, evaporation O, and percolation U show a medium sensitivity (19.2% of total model variability). This demonstrates that soil processes are of secondary importance for streamflow prediction, but they may gain importance if the uncertainty of the snow and melting processes can be reduced, e.g., by narrowing parameter ranges during calibration.
Baseflow P and the convolution of the surface R and delayed runoff S have almost no influence on the simulated streamflow (controlling 0.2% of the overall variability). These three processes demonstrate that subsurface and routing processes are not important in this catchment over the 20-year simulation period at a daily time step. This in turn leads to the conclusion that even if, for example, additional groundwater observations were available, they would not help to reduce model streamflow prediction uncertainty.
This model structure-based sensitivity analysis can help to guide model development by targeting the dominant model processes. It derives a high-level sensitivity of the main model components, i.e., processes. It reveals, for the first time, the sensitivity of model processes independent of the model structure chosen and hence is one step towards sensitivity analyses regarding model structure using a true model ensemble obtained by mix-and-matching a variety of model process options.

Process Sensitivity Over Time
The previous analysis estimated the time-aggregated sensitivities S w P and ST w P of model processes P (Fig. 5C), which might mask interesting patterns in the temporal sensitivities of streamflow to the eleven processes. We therefore augment the analysis by calculating the normalized total Sobol' sensitivity indexes ST P (t) of each process P at every day of the year t (Fig. 6).
Each value displayed is an average over 20 values, one for each year of the 20-year simulation period. The figure also shows the weights V(t) (Fig. 6, black line) that were used to derive the variance-weighted total Sobol' indexes (Eq. 24) previously discussed using Fig. 5C. The weights are generally higher during the high-flow freshet period (mid February to mid May) and are close to zero for the rest of the year. Infiltration (light blue) has an almost constant but minor sensitivity throughout the whole year. Quickflow (medium blue) is most of the time the dominating process, especially in summer, but not during the high-flow melt season. Evaporation (dark blue) is consistently responsible for about 35.4% of the sensitivity during summer (June to October) and is, expectedly, less important during winter. Snow balance (medium green) and potential melt (orange) are important as long as snow is present (November to May). Potential melt is about twice as influential as the snow balance process. Percolation (light red) is almost constant in its sensitivity but nearly negligible. Baseflow (dark green) and the convolution of the surface runoff as well as the delayed runoff (light green and yellow) are not even visible in the graph and have negligible sensitivities throughout the whole year.
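The variance-weighted aggregation of the time-dependent indexes (Eq. 24) can be sketched as follows; the time series below are synthetic stand-ins, not the case-study results:

```python
import numpy as np

rng = np.random.default_rng(2)

T, P = 365, 3                            # days of the year, number of processes
ST_t = rng.uniform(0.0, 1.0, (T, P))     # synthetic time-dependent ST_P(t) values
var_t = rng.gamma(2.0, 5.0, T)           # synthetic per-day ensemble variance of streamflow

# Variance-based weights: days with large output variance (e.g., freshet) count more
V_t = var_t / var_t.sum()

# One variance-weighted aggregate index per process (cf. Eq. 24)
ST_w = (V_t[:, None] * ST_t).sum(axis=0)
```

Because each aggregate is a convex combination of the daily values, a process that dominates only during the high-variance freshet period (like potential melt here) receives a large aggregate index even if it is unimportant for the rest of the year.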
These results highlight the importance of the weighting procedure when deriving the aggregated sensitivities shown in Fig. 5.
Potential melt (orange) is the most sensitive process when using variance-weighted aggregates due to its dominant influence in the high-flow season. The arithmetic mean of all time-dependent sensitivities S i (t) and ST i (t) would certainly have resulted in a much higher sensitivity of the quickflow process, which is not as important during the melting period but is responsible for 41.7% of the model variability during summer (June to October). The same holds for the evaporation process, which is highly sensitive in summer but not during the melting season.

Conclusions
The traditional method of deriving sensitivity index estimates for model parameters conditional on a fixed model structure is of limited applicability when the model is allowed to vary in its structure. First, the number of model runs can be massive when each model is analyzed independently. Second, the analysis derives a unique model-dependent sensitivity index for each parameter but no overall parameter sensitivity across the ensemble. Third, aggregated sensitivities of model processes or the sensitivity of a set of process options may lead to more useful insights than analyzing individual model parameters.
In this work we introduce two new concepts. The first is the idea of formulating the model ensemble using weighted sums of the outputs of competing process options. The second key contribution is the application of the conventional Sobol' sensitivity analysis method based on grouped parameters and the interpretation of these groups as process options and hydrologic processes. The Extended Sobol' Sensitivity Analysis (xSSA) method uses these groups of parameters to perturb them simultaneously rather than individually, allowing analyses to be performed simultaneously for model parameters, model process options, and model processes. While the grouping of parameters is not a new concept for Sobol' analyses, such groups have, to our knowledge, not yet been interpreted in the context of model structural sensitivity assessment. The method was successfully tested using two artificial benchmark models based upon the Ishigami-Homma function. The estimated sensitivity indexes are proven to converge to the analytically derived Sobol' sensitivity indexes for model parameters, process options, and processes. The xSSA method is shown to resolve limitations of an existing method that also derives sensitivities of groups of parameters but that cannot handle overlapping parameter groupings.

Figure 5. Results of the Sobol' sensitivity analysis of the hydrologic modeling framework Raven. (A) The sensitivities of 35 model parameters (see Table C2) and 8 parameters r i that are used to determine the weights of process options are estimated. The Sobol' sensitivity index estimates are also determined for (B) 19 process options and (C) the 11 processes. The different colors indicate the association of parameters and process options with the eleven processes. Parameters x 29 and x 30 are associated with several process options and are not colored but gray. The Sobol' main and total effects are shown (dark and light colored bars, respectively). All sensitivity index estimates shown are originally time-dependent and are aggregated as variance-weighted averages (Eq. 23 and 24). The average weights over the course of the year are shown in Figure 6.
The Extended Sobol' sensitivity analysis method was also applied to a hydrologic modeling framework that supports the representation of internal model fluxes using a weighted sum of fluxes calculated from individual process algorithms/options. The sensitivity analysis of the hydrologic modeling framework used here identified the potential melt process and other surface processes as the most influential processes regarding streamflow in a mountainous, energy-limited, and snow-dominated catchment, while all subsurface and routing processes were insensitive. This information helps to guide further model development and model calibration, and can inform the incorporation of additional observations to reduce model uncertainty. Three processes (potential melt, rain-snow partitioning, precipitation correction) handle solely inputs to the hydrologic system and can hence be attributed as input uncertainty or, in other words, model components adjusting forcing functions.
The presented methods of weighted process options and the application of the Extended Sobol' sensitivity analysis method provide a simultaneous analysis of model structure, model parameters, and forcing adjustments in a frugal way consistent with known methods based on the Sobol' method.
Code and data availability. The code and data used for this analysis will be made available on GitHub (https://github.com/julemai/xSSA) upon publication of the manuscript.

Appendix A: Sampling of Process Option Weights

To sample the weights of N + 1 process options, first generate a vector of random numbers (r 1, r 2, ..., r N) from a uniform distribution between 0 and 1. The corresponding vector of weights (w 1, w 2, ..., w N+1) can then be calculated from the sorted values r (1) ≤ r (2) ≤ ... ≤ r (N) as the successive spacings:

w 1 = r (1), w i = r (i) − r (i−1) for i = 2, ..., N, and w N+1 = 1 − r (N).
This sampling leads to the following CDF F N and PDF f N for each of the N + 1 weights w i:

F N (w) = 1 − (1 − w)^N and f N (w) = N (1 − w)^(N−1) for 0 ≤ w ≤ 1,

i.e., each weight follows a Beta(1, N) distribution.

Is process D or E more important?
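The sampling and its distributional property can be checked numerically. The sketch below is ours and assumes the sorted-spacings construction described above; it verifies that the weights are non-negative, sum to one, and empirically follow F N (w) = 1 − (1 − w)^N:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_weights(N, rng):
    """Turn N uniform random numbers into N+1 non-negative weights summing to 1
    via the spacings of the sorted sample (assumed to match Appendix A)."""
    r = np.sort(rng.uniform(0.0, 1.0, N))
    return np.diff(np.concatenate(([0.0], r, [1.0])))

N, m = 2, 100_000
w = np.array([sample_weights(N, rng) for _ in range(m)])

# Empirical CDF of a single weight at w = 0.3 vs. the theoretical F_N
emp = (w[:, 0] <= 0.3).mean()
theo = 1.0 - (1.0 - 0.3) ** N  # = 0.51 for N = 2
```

This construction is the standard way to sample uniformly from the simplex of weights, which is what makes each individual weight Beta(1, N)-distributed.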
The analytically-derived sensitivity indexes of sensitivity metric A are given in Table B1 using Ishigami-Homma parameters a = 2.0 and b = 1.0. The values are given for each of the 12 models that can be built using the different process options of the artificial benchmark model.
The analytically-derived results of the overall parameter sensitivities (independent of the model options chosen), the sensitivities of the process options, and the sensitivities of the processes are listed in Eq. B1, Eq. B2, and Eq. B3, respectively. The parameters r i therein are the random variables required to derive the weights w i of the process options according to Eq. 18 using the sampling strategy described in Appendix A. r 1 is used to derive the weights w d1 and w d2, r 2 and r 3 are used to derive the weights w e1, w e2, and w e3, and r 4 is used to derive the weights w f1 and w f2.

Table B1. The model structure is assumed to be f(x) = D · E + F to mimic additive and multiplicative model structures. The model f(x) = D1 · E1 + F1 corresponds to the Ishigami-Homma function (Ishigami and Homma, 1990). The parameters a and b of the Ishigami-Homma function are set to 2.0 and 1.0, respectively. A dash (−) in the table indicates that the parameter is not active in the according model.

The Raven hydrologic modeling framework (Craig et al., 2020) has been employed for this study. We used the released version 3.0 of Raven. The process options M 1, M 2, ..., X 1 selected for this study are listed in Table C1. The model parameters active in the individual process options are given in that table as well. In total, 35 model parameters are used in at least one of the model options. The valid ranges and parameter descriptions are given in Table C2. Further details about the process option implementation and the parameters can be found in the Raven documentation (Craig, 2019).

Table C1. Process options selected for this study and the model parameters active in each option (column headers: Process, Process option, Parameters active). The first block lists processes with multiple options.