Possibilistic response surfaces combining fuzzy targets and hydro-climatic uncertainty in flood vulnerability assessment

Several alternatives have been proposed to shift the paradigms of water management under uncertainty from predictive to decision-centric. An often-mentioned tool is the stress-test response surface, mapping system performance over a large sample of future hydro-climatic conditions. Dividing this exposure space between success and failure requires clear performance targets. In practice, however, stakeholders and decision-makers may be confronted with ambiguous objectives for which there are no clearly-defined (crisp) performance thresholds. Furthermore, response surfaces can be non-deterministic, as they do not fully capture all possible sources of hydro-climatic uncertainty. The challenge is thus to combine two different types of uncertainty: the irreducible uncertainty of the response itself relative to the variables that describe change, and the fuzziness of the performance target. We propose possibilistic surfaces to assess flood vulnerability with fuzzy performance thresholds. Three approaches are tested and compared on an un-gridded sample of the exposure space: (i) an aggregation of logistic regressions based on α-cuts combines the uncertainty of the response itself and the ambiguity of the target within a single possibility measure; (ii) an alternative approximates the response with a fuzzy analytical surface; and (iii) a convex delineation expresses the largest range of failure specific to a given management rule without probabilistic assumptions. To illustrate the proposed approaches, we use the flood-prone reservoir system of the Upper Saint-François River Basin in Canada as a case study. This study shows that ambiguity can effectively be considered when generating a response surface and suggests how further research could build a possibilistic framework for hydro-climatic uncertainty.

as weights on the response surface to inform probabilities associated to climate states. GCMs can thus remain useful without conditioning the decision process, and once updated, their outcomes on the system can be mapped on the response without the need for new simulations of the water system. The intention shared within the overall decision-centric framework is to adapt classic risk assessment to the "death of stationarity" (Milly et al., 2008) while producing information more useful and engaging than a fully descriptive scenario approach (Weaver et al., 2013). Response surfaces have been illustrated by many case studies (e.g. Nazemi et al., 2013; Turner et al., 2014; Whateley et al., 2014; Herman et al., 2015; Steinschneider et al., 2015; Spence et al., 2016; Pirttioja et al., 2019; Ray et al., 2020), expanded to many-objective or stakeholder systems (Poff et al., 2016; Culley et al., 2016; Kim et al., 2019) and sometimes officially adopted in management processes (Moody and Brown, 2013; Weaver et al., 2013). Although the response surface is a powerful and efficient tool to circumvent the problems and arbitrariness brought by "top-down", GCM-based assessments, the applications to date remain relatively recent and scarce (Guo et al., 2018). Moreover, many assumptions associated with the stress-test approach can introduce additional uncertainty.
One source can be the ambiguity of the user-defined performance targets. The stress-test approach needs performance target values (thresholds) in order to separate the exposure space between accepted and rejected domains. However, such targets are often unclear or arbitrary, and are heavily reliant on political, sociological and institutional processes (El-Baroudy and Simonovic, 2004). Fuzzy set theory (Zadeh, 1965) provides an analytical framework to characterize and manipulate stakeholders' ambiguity (Huynh et al., 2007). It has been extensively used in the water domain (El-Baroudy and Simonovic, 2004; Qiu et al., 2018), in particular to solve multi-objective decision-making problems (e.g. Jun et al., 2013). However, to the best of our knowledge, fuzzy set theory has not yet been used to handle imprecise thresholds between satisfactory and failure regions of a response surface. The very notion of an arbitrary threshold defining success, like flood control reliability above 0.95, can be considered as a departure from a strictly probabilistic framework and could justify a complementary possibilistic approach based on fuzzy sets (Dubois et al., 2004).
Independently from performance targets, response functions also have their own noise or internal uncertainty, as their selected driving variables can only partially explain hydrological and climatic uncertainties. As such, performance is an expected value rather than a deterministic one, hence possibly underestimating real risks. Irreducible uncertainty usually requires adaptive management (Brown et al., 2011), but there is interest in integrating part of this information into the response surface tool. Kay et al. (2014) proposed the use of uncertainty allowances that could vary depending on the response type and catchment.
More specifically, flood control systems operate on shorter time scales and are even harder to assess over long-term climate shifts (Knighton et al., 2017), and are thus also more challenging to evaluate with response functions. Kim et al. (2018) stress how the choice of modelling time scale (daily vs hourly) can lead to risk underestimation. The choice among different scenario-neutral methods can lead to different results (Keller et al., 2019), notably the choice of the synthetic series generator (Nazemi et al., 2020). Steinschneider et al. (2015) compare different sources of uncertainty, acknowledging the strong impacts of hydrological modelling and internal climate variability compared to long-term climate uncertainty.
Testing a limited number of stressors as explanatory variables therefore leads to a response function that returns uncertain performance. Kim et al. (2019) propose to associate probabilities to uncertain response functions through logistic regression. Gridded sampling of the exposure space can also entail a loss of information (e.g. Huang, 2000, for elevation models) and thus risk under-estimation.
The objective of the present study is to combine, with a possibilistic approach, two different types of uncertainty: the fuzziness of performance targets and the irreducible uncertainty of the response surface. The rationale behind it and three tested implementations are presented in section 2: a numerical approximation of a fuzzy-random logistic regression, a fuzzy analytical approximation of the response itself, and a convex delineation of the largest range of failure. A case study, a flood-prone reservoir system in southern Québec, Canada, is presented in section 3. Results are presented in section 4, followed by a discussion on the respective merits and limitations of the proposed methods.

Rationale
A performance metric quantifies the ability of a system to mitigate the number or amplitude of local failures. For example, the reliability of a flood control system can be measured as the proportion of a given period where no flooding happens. When performing a stress-test of a system, overall success or failure is usually defined by a performance target θ; for example, reliability above 0.95 over a given period can define overall success.
A stress-test maps the performance p, on a response surface, to a limited number of descriptive variables x_i. It aims at delineating the subsets A and D of overall success and failure (Fig. 1). Such variables, like the mean flow, the peak flow, or temporal autocorrelations, are aggregations of the time series that are the inputs of a water system simulation. Because a limited number of descriptors do not capture all possible fluctuations of a time series, a term of irreducible uncertainty ε remains. The response surface is then given by:

p = g(x_1, x_2) + ε

In a risk-averse approach, the objective is to find the range of failure (more than success), the space over which a system fails to satisfy a performance target θ. With 2 variables, this space is the set of solutions D = {(x_1*, x_2*)} to the inequality p < θ. Simplifying the response surface by averaging it over p (vertically) can thus under-estimate the failure domain. Irreducible uncertainty can be addressed through adaptive management (Brown et al., 2011), uncertainty allowances (Kay et al., 2014), and extensive Monte-Carlo sampling. If possible though, it can be convenient to directly integrate information about remaining uncertainty within the response surface itself. It can be represented through a transition zone between success and failure domains, as performed by Kim et al. (2019) with a logistic regression. Besides, most studies use gridded sampling of the exposure space, which is a horizontal aggregation that also results in information loss like in the case of digital elevation models (Huang, 2000), and which in this case can also under-estimate risks. A simple un-gridded alternative is proposed in section 3.2.
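The stress-test setup described above can be sketched as follows. The response g(), its coefficients, and the noise level are hypothetical stand-ins, not the case-study simulation:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stress test (illustrative only): x1 and x2 stand for the two descriptors
# of the exposure space, g() for the deterministic part of the response, and
# eps for the irreducible uncertainty left out by the descriptors.
def performance(x1, x2, rng):
    g = 1.0 - 0.1 * x1 - 0.1 * x2                  # deterministic part g(x1, x2)
    eps = rng.normal(0.0, 0.02, size=np.shape(g))  # irreducible noise term
    return np.clip(g + eps, 0.0, 1.0)

theta = 0.95                          # crisp performance target
x1 = rng.uniform(0, 1, 5000)          # un-gridded sampling of the exposure space
x2 = rng.uniform(0, 1, 5000)
p = performance(x1, x2, rng)

failure = p < theta                   # rejected sub-space D: p < theta
print(f"share of sampled exposure space in failure: {failure.mean():.2f}")
```

Because of the noise term, points near the boundary can fall on either side of the target, which is precisely the transition zone discussed below.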

Fuzzy performance targets
The performance target θ defines the set of successful outcomes. It is a subjective or arbitrary opinion from stakeholders or decision-makers attributing a normative value to a certain performance level. The vast majority of the studies reported in the literature assume that the threshold between satisfactory and unsatisfactory states is crisp (e.g. Culley et al., 2016; Kim et al., 2019). As such a threshold directly shapes the partition of the response function, with a crisp value the exposure space can be subdivided in only two sub-spaces: failure versus success.
The very existence of a target is the basis of satisficing behaviors (Simon, 1955) that differ from utility-maximizing behaviors as coined by Von Neumann and Morgenstern (1944). In practice however, while clearly following a satisficing model, there might be situations whereby the water manager is unable (or unwilling) to provide a crisp, well-defined target, or where such a threshold is disagreed upon by stakeholders. For example, when controlling water levels in a reservoir to prevent inundations, the operator can handle certain tolerances above the maximum desired level. Of course, the greater the deviation from the desired level, the less acceptable it becomes.

(https://doi.org/10.5194/hess-2020-214. Preprint. Discussion started: 1 July 2020. © Author(s) 2020. CC BY 4.0 License.)
Mathematically, fuzzy set theory (FST) handles imprecisely-defined or ambiguous quantities. Introduced by Zadeh in 1965, fuzzy set theory has become a common tool in decision-making analysis and computational sciences when non-probabilistic uncertainty stemming from ambiguity or vagueness must be considered (Yu et al., 2002). In our case, FST allows us to introduce vagueness in target-based decision making, without forsaking a target-based model in favor of an unbounded maximizing behavior (although a fuzzy target can also be seen as a generalization of both maximizing and satisficing behaviors; see Castagnoli and LiCalzi, 1996, and Huynh et al., 2007).
We consider here the case where such a target θ may not be precisely defined by stakeholders but can take many subjective qualifications from acceptable to unbearable, hence relaxing (without fully removing) the arbitrary condition of satisfying a crisp value. A fuzzy set A_µ of acceptable states therefore qualifies the performance p with a membership value between 0 and 1. The membership function µ associated to the fuzzy set A_µ describes the degree to which any value of p more or less belongs to A_µ (Figure 2, eq. 3).
When a threshold corresponds to a fuzzy set, it means that there is a transition zone between success and failure states where intermediate levels of membership exist. Conversely, another interpretation is that the membership function is the distribution of the possibilities (Zadeh, 1978; Dubois and Prade, 1988) that any given performance p represents a success.
An α-cut A_α is the crisp subset of A_µ for which the membership degree to A_µ is equal to or above α. The largest α-cut is called the support of the fuzzy set A_µ (p ≥ θ_1). The smallest α-cut is the core of the fuzzy set (p ≥ θ_2).
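A linear membership function and its α-cuts can be sketched as below; the bounds θ_1 = 0.93 and θ_2 = 0.97 are the case-study values used later in the paper:

```python
import numpy as np

theta1, theta2 = 0.93, 0.97   # support bound (theta_1) and core bound (theta_2)

def mu(p):
    """Linear membership: 0 below theta1 (full failure), 1 above theta2 (full success)."""
    return np.clip((p - theta1) / (theta2 - theta1), 0.0, 1.0)

def alpha_cut_threshold(alpha):
    """Performance threshold theta = mu^-1(alpha) defining the crisp alpha-cut."""
    return theta1 + alpha * (theta2 - theta1)

print(mu(0.92))                   # outside the support: membership 0
print(mu(0.98))                   # inside the core: membership 1
print(alpha_cut_threshold(0.5))   # ~0.95 for the linear shape
```

Every crisp threshold θ between θ_1 and θ_2 thus corresponds to one α-cut of the fuzzy set of success.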

Combination of fuzzy targets and uncertain response function
The challenge is to combine the two different sources of uncertainty described in section 2.1: the uncertainty or low quality of the response itself relative to the variables that describe change, and the fuzziness of the performance target. In order to integrate both, three methods are suggested: an approximated fuzzy-random logistic regression, an analytical approximation of the response surface, and a convex delineation of the space of failure.

Approximation of a fuzzy-random logistic regression
As the goal of the response surface is to divide the exposure space between success and failure, the value associated to any combination of variables can be either 0 or 1, depending on whether a specific performance target θ is reached or not. As seen in section 2.1, an intrinsic uncertainty remains in response surfaces. Kim et al. (2019) introduce the logistic regression to incorporate probabilistic information into the response surface. The logistic regression is used to explain a binary outcome from independent variables (x_1, x_2), and returns a probability of success π:

π(x_1, x_2) = 1 / (1 + e^(−(β_0 + β_1·x_1 + β_2·x_2)))

where x_i are the defining variables of the exposure space and β_i the regression coefficients. The logistic response surface therefore provides the probability π of meeting the target θ over the (x_1, x_2) exposure space. Partitions of the space between success and failure sub-spaces, which can be defined as π-cuts, are now relative to a specific probability of success π* taken by π_θ:

D = {(x_1, x_2) : π_θ(x_1, x_2) < π*}

By considering the domain of successful outcomes as a fuzzy set, we introduce a layer of uncertainty that is different in nature from the irreducible hydro-climatic uncertainty. While the logistic regression returns a probability of surpassing any given performance target for a combination of variables (eq. 5 and 6), the fuzzy set of success returns the possibility of any such performance target being actually considered as a success (eq. 7).
Fuzzy regression models, including fuzzy logistic regression (e.g. Pourahmad et al., 2011; Namdari et al., 2014), replace probabilities by fuzzy numbers; they usually do not combine them. Fuzzy probabilities (Zadeh, 1984) are considered within the so-called fuzzy random regression field; however, no fuzzy random logistic regression seems to have been developed to date (see Chukhrova and Johannssen, 2019, for a review of the fuzzy regression field). Here we use a discretised approximation of a fuzzy random logistic regression based on α-cuts. A single target θ defining a crisp set of success A is also an α-cut of the fuzzy set of success A_µ. Then a single logistic regression for any success threshold θ is also the probability of belonging to the α-cut of the fuzzy set of success defined by θ:

π_θ(x_1, x_2) = P(p ∈ A_α | x_1, x_2), with α = µ(θ)
Following the interpretation of Huynh et al. (2007), the overall possibility Π of the random variable p belonging to the fuzzy set A_µ can be given by the integral over α of the probabilities of success defined at every α-cut:

Π(x_1, x_2) = ∫_0^1 π_(µ^−1(α))(x_1, x_2) dα
And thus the approximated logistic regression for a fuzzy set of success is the average of the logistic regressions over all the associated α-cuts:

Π(x_1, x_2) ≈ (1/K) Σ_(k=1)^K π_(θ_k)(x_1, x_2), with θ_k = µ^−1(α_k)

With a uniform discretization of 10 α-levels, the spacing of every α-cut, defined with θ = µ^−1(α), relies on the shape of the membership function. A linear shape of µ(p) leads to a uniform sampling of the α-cuts, while a sigmoid shape leads to a Gaussian sampling of α-cuts centered on θ (Fig. 3).
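The α-cut aggregation can be sketched end-to-end. The sample below is a hypothetical stand-in for the stress-test results (not the paper's simulations), and the logistic fit is a plain-numpy Newton/IRLS implementation rather than a specific library:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stress-test sample: two descriptors and a noisy performance.
x = rng.uniform(0, 1, size=(4000, 2))
p = np.clip(1.0 - 0.1 * x[:, 0] - 0.1 * x[:, 1] + rng.normal(0, 0.02, 4000), 0, 1)

theta1, theta2 = 0.93, 0.97          # support and core bounds of the fuzzy target

def fit_logistic(X, y, iters=25):
    """Plain-numpy logistic regression via Newton's method (IRLS)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-(A @ beta)))
        W = prob * (1.0 - prob)
        H = A.T @ (A * W[:, None]) + 1e-6 * np.eye(A.shape[1])  # ridge for stability
        beta += np.linalg.solve(H, A.T @ (y - prob))
    return beta

def aggregated_possibility(query, n_cuts=10):
    """Average the logistic surfaces fitted at each alpha-cut threshold."""
    Aq = np.column_stack([np.ones(len(query)), query])
    pis = []
    for a in np.linspace(0.05, 0.95, n_cuts):
        theta = theta1 + a * (theta2 - theta1)      # theta = mu^-1(alpha), linear mu
        beta = fit_logistic(x, (p >= theta).astype(float))
        pis.append(1.0 / (1.0 + np.exp(-(Aq @ beta))))
    return np.mean(pis, axis=0)                     # possibility Pi of fuzzy success

Pi = aggregated_possibility(np.array([[0.2, 0.2], [0.9, 0.9]]))
print(Pi)  # higher possibility under mild stressors than under severe ones
```

With a linear membership function the α-levels map to uniformly spaced thresholds between θ_1 and θ_2, as stated above; a sigmoid µ would concentrate the thresholds around their midpoint.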

Analytical approximation of the response function and fuzzy set intersection
Instead of fitting a logistic function to the binary outcomes of success and failure, performance itself can be directly approximated as an analytical response surface (eq. 3). The final outcome is then a direct mapping from the performance approximation to a [0, 1] degree of success with the membership function of the fuzzy set of success A_µ (eq. 5). For every (x_1, x_2), a single approximated performance is given by p* = g*(x_1, x_2), so the possibilistic response surface is defined by µ_R(p*). The membership function µ is modified as follows to account for the fitting error R.
To any value of p* is associated a membership degree µ between 0 and 1, depending on θ_1 and θ_2, with 1 defining complete success. The fitting error, or uncertainty around performance p*, can be expressed as another set S_p, centered on any value p with a 2R-sized support (Figure 4a). This set can also follow any shape depending on the user's risk aversion. With a risk-averse attitude, a crisp set defines R here, assuming a uniform possibility distribution. But an actual distribution of R around the approximation could also be used.
The modified membership degree µ_R(p*) should account for the 2R-large interval that represents the possibility domain around p*. So at any given p*, the possible acceptability values are represented by the intersection between the sets S_p and A_µ, given by the MIN operator. The resulting value µ_R(p*) is the average over this intersection.

The new membership function µ_R over the entire domain of performance is then the moving average of µ with a 2R window size (Figure 4b). A single possibility surface is thus obtained for any (x_1, x_2) coordinate (eq. 11).

Convex hulls as range of success and failure
The climate stress test seeks to identify accepted and rejected sub-spaces A and D within the exposure space. As seen in section 2.1.1, gridded sampling can result in risk under-estimation. With an uncertain, noisy response function and a non-gridded sampling of the exposure space, the sets A and D of accepted and rejected points do not form two cohesive, identifiable and mutually exclusive sub-spaces. The methods described in sections 2.2.1 and 2.2.2 are regression or surface fits that incorporate the remaining errors but are still approximations and might not represent all possibilities of success or failure. In a risk-averse approach, decision-relevant outliers could also be considered, in order to prepare for the most unlikely, but possible, failures.

The question is which performance values should be attributed to any location between sampled points.
One simple way to conservatively identify a sub-space from a set of points is their convex hull, the smallest possible convex space that contains the set. Convex hulls are extensively used in point process analysis and notably decision theory and risk analysis (Harris, 1971). They can be used to identify failure regions when a response surface is inadequate, e.g. in mechanical engineering (Missoum et al., 2007).

The underlying assumption is that, for any triangle of points contained in a set, any point within the triangle also belongs to the set. Following a possibility-centric approach further, what is sought here is the largest convex range of failure (LCRF).
While less impactful for a risk-averse decision process, the largest range of success can also be expressed to further differentiate the regions of the response function. With a deterministic response function, a single threshold will discriminate a space between two complementary sub-spaces, accepted and rejected. With a noisy response and a crisp target, both sub-spaces will overlap, creating a transition zone (Fig. 5).
Considering a fuzzy performance target, we modify the definition of both accepted and rejected sub-spaces. The loosely accepted sub-space is the convex hull of all performance values superior to the lower threshold. The loosely rejected sub-space is the convex hull of all performance values inferior to the higher threshold. In other words, the largest range of failure (resp. success) is simply defined by the smallest convex set of points where p falls outside the smallest (resp. belongs to the largest) α-cut of A_µ, also called the core (resp. support).
This method gives more weight to outliers, as they define the convex hull. It is a simple measure of possibility, and does not discriminate points within the transition zone. Different management rules are compared according to the relative position and overlap areas of their respective transition zones.
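The two hulls can be computed directly with `scipy.spatial.ConvexHull`; the sample below is a hypothetical stand-in for the stress-test results, with the case-study fuzzy bounds:

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(1)

# Hypothetical un-gridded sample of the exposure space with a noisy performance
x = rng.uniform(0, 1, size=(2000, 2))
p = np.clip(1.0 - 0.1 * x[:, 0] - 0.1 * x[:, 1] + rng.normal(0, 0.02, 2000), 0, 1)

theta1, theta2 = 0.93, 0.97
failure_pts = x[p < theta2]    # partial failures: p outside the core of A_mu
success_pts = x[p >= theta1]   # partial successes: p inside the support of A_mu

lcrf = ConvexHull(failure_pts)     # largest convex range of failure (LCRF)
lcrs = ConvexHull(success_pts)     # largest convex range of success
print(f"LCRF area: {lcrf.volume:.2f}, vertices: {len(lcrf.vertices)}")
```

For 2-D points, `ConvexHull.volume` is the enclosed area, and `vertices` gives the few outlier points that entirely define the range, which is what makes the iterative sampling discussed below attractive.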

As decision-centric methods rely on a large number of simulations, computing power parsimony is an applicability concern (e.g. Zatarain Salazar et al., 2017). The LCRF is only defined by its vertices, and only the border closer to the success region matters for decision purposes. Large parts of the response surface could potentially be ignored, saving computation time. An iterative sampling of the response function can thus complement the LCRF method.
3 Application

A reservoir system in eastern Canada is used as a case study to illustrate the applicability of the possibilistic response surfaces.

Upper Saint-François River Basin features
The Upper Saint-François River Basin (USFRB) is located in the province of Quebec, Canada. The selected gauging point, near the agglomeration of Weedon, drains an area of 2940 km² with an average annual flow of 2.1 billion cubic meters. The system (Fig. 6) involves the Saint-François River, controlled by two reservoirs, Lake Saint-François and Lake Aylmer, with a combined storage capacity of 941 million cubic meters, and the uncontrolled affluent Saumon River. The operating objectives are (i) to protect the shores of the river and lakes from floods, (ii) to ensure minimum river discharges and water levels in the lakes to preserve aquatic ecosystems, (iii) to provide the downstream run-of-river power station with a reliable water discharge; and (iv) to maintain desired water levels in the lakes. We use streamflow stressors instead of climatic ones, as the present study does not aim at differentiating between several sources of uncertainty (climate change, climate variability, run-off modelling) but proposes a method that accommodates any of them.
In Québec, the CEHQ water agency regularly produces projections of river discharges throughout the province. Similarly to other stress-test studies that generate inflow instead of climate time series (Feng et al., 2017), the selected driving variables (axes x and y of the response function) are the total annual inflow volume and a measure of the intra-annual variability of streamflow. The intra-annual variability is here measured with the dispersion coefficient G, a measure also known as the Gini coefficient in economics but also employed in hydrology (Masaki et al., 2014). It is similar to the variation coefficient used in other studies but bound between 0 and 1, which offers a convenient interpretation: at G=0 all daily discharges in a year are equal; at G=1 the entire yearly run-off happens in a single day. Like the variation coefficient, it allows for a second variable statistically independent of the total annual run-off volume. For q_i the ordered daily discharges of a given year and N=365 days:

G = Σ_(i=1)^N (2i − N − 1) q_i / (N Σ_(i=1)^N q_i)
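The dispersion coefficient G can be computed from a daily series with the standard sorted-sample Gini estimator (the paper's exact formula is not reproduced here, so this form is an assumption consistent with the stated limit behaviors):

```python
import numpy as np

def dispersion_coefficient(q):
    """Gini-type dispersion G of daily discharges q (0: uniform flow, ->1: single-day runoff)."""
    q = np.sort(np.asarray(q, dtype=float))   # ordered daily discharges q_i
    n = q.size
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) * q) / (n * np.sum(q))

# Sanity checks against the two limit cases described in the text
flat = np.ones(365)            # constant flow all year -> G = 0
spike = np.zeros(365)
spike[-1] = 1.0                # entire yearly runoff in one day -> G ~ 1
print(dispersion_coefficient(flat), dispersion_coefficient(spike))
```

For the single-day-runoff case the estimator returns (N−1)/N rather than exactly 1, which converges to 1 for daily series.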

Simulation and response surface
The model is built with HEC-ResSim, the Reservoir System Simulation software developed by the US Army Corps of Engineers (Klipsch and Hurst, 2007). It relies on a network of elements representing the physical system (reservoirs, junctions, reaches) and operates the reservoirs through a rule-based modeling of operational constraints and targets. Hydrologic inputs consist of 30-year-long daily river discharges for each sub-basin. The main outputs are daily water levels in lakes, reservoir releases, as well as the discharge at the outlet. A complementary Jython routine is developed in order to run HEC-ResSim in a loop to systematically load a large set of different hydro-climatic scenarios. Dam characteristics and operational rules were provided by the Quebec Water Agency (MELCC, 2018).

The model is developed with a first set of operating rules (rule 1) expected to mimic the current operation of the system.
It reproduces measured daily releases over the 2000-2014 period. 4008 simulations are then run, each taking as input a synthetic daily flow series spanning 30 years. In order to increase the density of the un-gridded exposure space sampling, results are divided into 5-year periods. Such a decomposition is deemed acceptable given the reservoir system, whose storage capacity is designed for seasonal regulation, not multi-year, mitigating the effects of boundary conditions. It leads to a sample of 24,048 points, each one representing a five-year simulation. Observation independence is not considered here, as the prime objective is to maximize the diversity and noise of the sample.
Although the operating rules were designed taking into account all operating objectives, the present study focuses on the flood control performance p. More specifically, it is the reliability (Hashimoto et al., 1982) of the system keeping the river discharge at Weedon below 300 m³ s⁻¹. Mathematically, if F(t) is the state of flooding at time step t (1 if flooding, 0 otherwise) over a period of T time steps, then p is given by:

p = 1 − (1/T) Σ_(t=1)^T F(t)

The response function is built by representing p as a function of the selected inflow characteristics (yearly volume and dispersion). As developed in section 2, the separation of the exposure subspace is first performed through a performance target. The LCRF method is also tried through an iterative sampling to evaluate its potential for computational parsimony. The convex range of failure is thus first calculated on downscaled time series, including raw ones without bias correction (Fig. 7a).
Then each iteration expands the range by sampling pre-generated synthetic time series around its boundaries, and simulating the water system with only those. Failure points constitute the new hull for the next iteration (Fig. 7b). Not finding failure points is the exit condition. Results are compared to the LCRF calculation on the full sample.
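The flood-control reliability used as the performance metric p can be computed directly from a simulated daily discharge series; a minimal sketch, where the discharge distribution is purely illustrative and only the 300 m³/s threshold comes from the case study:

```python
import numpy as np

Q_FLOOD = 300.0   # flood threshold at Weedon, m^3/s

def flood_reliability(discharge):
    """Reliability p: fraction of time steps with discharge below the flood threshold."""
    discharge = np.asarray(discharge, dtype=float)
    flooding = discharge >= Q_FLOOD     # F(t) = 1 when flooding occurs
    return 1.0 - flooding.mean()

# Five years of hypothetical daily discharges at the outlet (illustrative distribution)
rng = np.random.default_rng(3)
q = rng.gamma(shape=2.0, scale=50.0, size=5 * 365)
p = flood_reliability(q)
print(f"flood-control reliability over the period: {p:.3f}")
```

Each of the 24,048 five-year sub-series yields one such value of p, placed at its (volume, dispersion) coordinates in the exposure space.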

Simulations
The simulation is first run with 122 of the original time series made available by the CEHQ. These are the bias-corrected rainfall/run-off simulations considered as the most reliable, corresponding to different radiative forcing scenarios.

Logistic surface aggregated over fuzzy target
The logistic regression is first performed with the response surface converted into crisp binary outcomes. Success is defined by p ≥ 0.95, failure by p < 0.95. The logistic surface provides the probability of success π at any coordinates (Fig. 9).
Depending on the risk attitude of stakeholders or decision-makers, the surface can be divided into success and failure regions for specific probabilities of success (π-cuts, eq. 9). For both cases, and for both rule sets, the logistic regression is performed 10 times for 10 α-cuts corresponding to a uniform sampling of α-levels (see section 2.2.1). The aggregated logistic regression at every coordinate is the average of the 10 logistic regressions, each one considering a single α-cut as the crisp set over p that defines successful outcomes. Figure 11 compares the resulting aggregated surfaces for both rule sets.

Bivariate surface approximation
The second method consists in computing analytical functions, one for each rule, to fit the available sample, in this case with a bivariate quadratic approximation (Fig. 13). The resulting error R = 0.03 (selected here as the 95% quantile of the error distribution) is used to modify the membership function µ of the success fuzzy set A_µ, via a moving average with a 2R-sized window (section 2.2.2).
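A bivariate quadratic fit and the 95%-quantile error R can be sketched with an ordinary least-squares design matrix; the sample is again a hypothetical stand-in for the simulation results:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical stress-test sample (illustrative response, not the case-study data)
x1 = rng.uniform(0, 1, 3000)
x2 = rng.uniform(0, 1, 3000)
p = np.clip(1.0 - 0.1 * x1 - 0.1 * x2 - 0.05 * x1 * x2
            + rng.normal(0, 0.02, 3000), 0, 1)

# Bivariate quadratic design matrix for p* = g*(x1, x2)
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
coef, *_ = np.linalg.lstsq(A, p, rcond=None)
p_star = A @ coef

# Fitting error R taken as the 95% quantile of the absolute residuals
R = np.quantile(np.abs(p - p_star), 0.95)
print(f"R = {R:.3f}")
```

The cross term x1·x2 is what lets this surface capture the inflow-dependent effect of dispersion that the (linear-in-x) logistic regression cannot represent.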

With an explicit, deterministic function and a modified membership function µ_R, the membership degree to the fuzzy set A_µR allows mapping success (white, membership degree 1) and failure (black, membership degree 0) sub-spaces with a continuous transition (Fig. 14a).

Comparison between operation rules seems however less conclusive in this case (Fig. 14b), at least around the GCM-based projections. No projection falls out of the 0.95 Π-cut. 4 projections fail to meet the 0.99 Π-cut for rule 1, 6 for rule 2 (rule 1 thus being slightly better). For rule 1, 44 projections do not reach an aggregated membership of 1, against 40 for rule 2, swapping positions in this case. It can be noted that, while both rules are similar in the vicinity of GCM-based projections, rule 1 performs better in the low-inflow, high-dispersion zone, and rule 2 in the high-inflow, low-dispersion zone. The method allows however for a non-linear relation between the x_1, x_2 variables and performance, as opposed to the logistic regression. Dispersion has a varying effect on performance depending on total inflow, with an increasing slope as yearly inflow grows.

Largest convex range of failure (LCRF)
The third approach to identify accepted and rejected sub-spaces for each management rule is the comparison of respective convex hulls, each hull representing a possibility sub-space of success/failure of the system with respect to flood reliability. It heavily relies on outliers and thus represents an upper possibility bound, here called the largest convex range of success or failure (LCRF). It answers the question "where can the system possibly fail or succeed" given a sample of points. Figure 15a shows the largest range of success (dashed line) for both rule 1 (blue) and rule 2 (red), which includes all values with flood reliability greater than or equal to the acceptability threshold, 0.95, here considered as a crisp value. The solid line hulls represent the largest range of failure, containing all values with flood reliability below 0.95. The overlap between the largest ranges is a transition zone. With a similar range of failure and a larger range of success, rule set 2 (red) would this time be considered superior to rule set 1.
In Fig. 15b, sub-spaces are defined with a fuzzy performance target, with a ±0.02 tolerance and bounds defined as [0.93, 0.97[ in section 2.1.2. The largest range of success now accepts candidate values that are partially accepted (partial successes, p ≥ 0.93), so it considers as success the largest α-cut of A_µ. The shape of the membership function has no influence here.

Conversely, the largest convex range of failure considers as success only the smallest α-cut of A_µ; it now accepts candidate values considered as partial failures of the system (p < 0.97).

Discussion
By itself, the stress-test approach is a departure from a probabilistic framework towards a possibilistic one. It asks what situations lead to a system failure, instead of evaluating the system for the most probable future. Since response surfaces are not deterministic, further information on irreducible uncertainty must be incorporated, e.g. through the use of logistic regression (Kim et al., 2019). In this paper, we further consider that the threshold employed to define success might itself be ambiguous or contentious. The fuzzy or possibilistic framework (Zadeh, 1965, 1978; Dubois and Prade, 1988), often used in decision-making analysis, provides the analytical tools to incorporate an uncertainty that is not probabilistic in nature, the ambiguity of a decision target, within the popular stress-test tool that itself seeks to depart from probabilistic approaches.

Applying a fuzzy target would be straightforward for a deterministic response surface, each performance value on the exposure space being mapped to a degree of success between 0 and 1. This study explores how to combine a fuzzy definition of success or failure with the remaining hydro-climatic uncertainty of the response surface, and compares different methods and interpretations.
As a first option, the aggregated logistic regression measures within a single possibility value the probabilistic information of the regressions and the fuzzy definition of the performance target. The shape of the membership function also affects how the α-cuts are sampled, thus allowing for different interpretations of ambiguity and decision theories. A linear membership function translates a form of neutrality towards marginal gains or losses within the fuzzy boundary. A sigmoid shape gives more weight to the median α-cut, corresponding to a degree of success of 0.5, with diminishing marginal improvement or loss the further the α-cut is from the median, which can be thought of as grounded in prospect theory (Kahneman and Tversky, 1979). The relative performance of the compared rules, however, is not altered here by the inclusion of the fuzzy target, so the resulting decision is not affected. That might no longer be the case when the response surfaces have different slopes and gradient directions depending on the tested alternative.
The analytic approximation by a quadratic function similarly proposes a continuous measure of possibility over the exposure space, but has the advantage of capturing non-linear relations between the describing variables, which the logistic regression cannot.

Possible drawbacks are that, while the logistic regression treats equally all results that fall outside the fuzzy target, extreme performance values here shape the fitted surface and might gain an influence they do not have in the decision process. The sample might also be unsuited to any fitting attempt. For comparing options, the method remains less conclusive: in the vicinity of GCM projections, the two rules cannot easily be sorted out. Importantly, however, it confirms the possibility of diverging slopes and directions, as the preference between the two rules can switch depending on the position in the exposure space, and therefore that considering fuzzy targets could very well alter the preference between rules. A preference that varies across the exposure space is also a case for adaptive management.
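A minimal sketch of such a quadratic approximation for two describing variables, fitted by ordinary least squares; the interaction term x1·x2 is what lets the fitted surface express relations between predictors that a linear-in-predictors logistic regression misses. The function names and variables are illustrative, not the study's implementation.

```python
import numpy as np

def fit_quadratic_surface(x1, x2, perf):
    """Least-squares quadratic approximation of the response surface
    perf ~ f(x1, x2), including the interaction term x1*x2."""
    A = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
    coeffs, *_ = np.linalg.lstsq(A, perf, rcond=None)

    def surface(u1, u2):
        return (coeffs[0] + coeffs[1] * u1 + coeffs[2] * u2
                + coeffs[3] * u1**2 + coeffs[4] * u2**2 + coeffs[5] * u1 * u2)

    return surface
```

The fitted surface can then be passed through the fuzzy membership function to yield a continuous possibility value anywhere on the exposure space.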
The largest convex ranges of success or failure provide an upper possibility bound and thus easily integrate the fuzzy target by either maximizing or minimizing the α-cut of the fuzzy set of success. The shape of the membership function has no impact here, only its bounds. An advantage of the method is its consistency: if the whole philosophy of a stress test is to ask where the system can possibly fail, then it is sensible to look for, and prepare for, the least probable cases of failure, those in a region where success is almost guaranteed.
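As a sketch of the idea, assuming SciPy's convex-hull routines, the range can be delineated as the convex hull of the exposure-space points whose degree of success falls below a chosen α-cut; membership in the hull then acts as the upper possibility bound of failure. Function names are hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

def convex_failure_range(points, membership, alpha):
    """Convex hull of the exposure-space points whose degree of success
    is below the alpha-cut of the fuzzy success set."""
    failures = points[membership < alpha]
    return ConvexHull(failures)

def possibly_fails(hull, query):
    """True where query points lie inside the convex failure range."""
    tri = Delaunay(hull.points[hull.vertices])
    return tri.find_simplex(query) >= 0
```

Minimizing or maximizing α simply shrinks or enlarges the set of failure points handed to the hull, so only the bounds of the membership function matter, as noted above.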
However, the convex hulls are obviously highly reliant on the generation of synthetic scenarios, which has its intrinsic randomness. The largest convex range of failure focuses on a very limited number of vertices as the hydro-climatic situations of interest, the least probable but still possible configurations of failure; this limited set, however, also entails a strong sensitivity that is a concern (Nazemi et al., 2020) and should be further studied, possibly by integrating the resulting differences within a possibilistic approach.
The fact that they are defined by a very small number of points could allow for much shorter simulation times with the right sampling method; however, a first trial with a simple search algorithm shows, for now, an important spread in results for limited samples.

Un-gridded and un-aggregated sampling here allows a more comprehensive exploration of the variability of the response, which can be more consistent with the whole stress-test approach for certain systems. It also makes use of existing streamflow scenarios, but this has drawbacks. One advantage of usual stress tests is their scenario-neutral property (Prudhomme et al., 2010): the extensive computation of the response surface only needs to be done once, and further information on future conditions can be projected directly onto it. Here, streamflow scenarios based on GCM projections were used to generate synthetic time series in order to obtain the granular, non-gridded response. This creates a link between downscaled scenarios and the response that is usually avoided. The underlying assumption is that, between the number of scenarios, the different types of bias correction (or lack thereof) and the imposed perturbations, the diversity of the synthetic time series leads to a relative independence from the initial bias of GCM simulations. Besides, the generation of synthetic time series always relies on available data in one way or another; a "scenario-neutral" generation could rely on historical observations and be skewed towards a conservative bias in the face of, for example, brutal climate shifts. The present generation method prioritizes sample diversity, but its assumptions could be further examined, such as the lack of independence between observations and thus the applicability of the logistic regression.
Likewise, the choice of describing variables was not the focus of the study, but should be subject to an initial comparison of predictive value within a larger set of candidate predictors.

Another trade-off of such synthetic generation is the uneven sampling, denser in the vicinity of available streamflow scenarios from downscaled rainfall series. The tool used to generate un-gridded sampling should ideally ensure a balanced sample density over the response surface. This study, like others, used the streamflow scenarios from GCM projections as a prioritization tool and focused on their vicinity, thus paying less attention to the sampling density in other areas.
The integration of uncertainty and ambiguity quantification within the response surface tool could allow for further aggregation options in a multi-objective problem (as in Poff et al., 2016, or Kim et al., 2019), while easily controlling its two separate components, response uncertainty and target ambiguity. Other sources of uncertainty could also be added and combined, such as ambiguity about the streamflow threshold that defines a state of flooding, the goodness of fit of the approximations, or expert judgement on data quality.
We explore in this study how to integrate fuzzy performance targets within uncertain response surfaces in decision-centric vulnerability assessments. Three methods are proposed to produce a possibilistic surface: aggregating logistic regressions over α-cuts combines the probability of success and the target ambiguity in a single measure; a quadratic approximation of the response surface itself allows for non-linear relations; and the largest convex ranges seek upper bounds for the possibility of success or failure. Two possible management rules are compared for the Upper Saint-François reservoir system in Canada.

Aggregated logistic regression and the largest range of failure show complementary ways to integrate fuzzy targets and differentiate failure domains, with their respective advantages and limitations. For continuous approximations, fit quality could be integrated in the final uncertainty measure. The largest convex range method could be refined by further perturbation of the streamflow series at the vertices, in order to find a physical boundary between success and failure.
Challenging old probabilistic assumptions, notably in a climate-crisis context, brings new tools that also imply new choices and degrees of arbitrariness. How to transparently elaborate fuzzy targets jointly with stakeholders, or how to choose a synthetic scenario generator, are necessary research continuations. The presented approach enables further work on multi-objective problems and aggregation choices. The framework introduced here to solve a practical challenge can be consolidated from a more theoretical perspective, drawing on both possibility theory and decision-making under deep uncertainty.
Code and data availability. The data can be provided upon authorization from the MELCC, Québec, Canada (Ministère de l'Environnement et de la Lutte contre les Changements Climatiques). The codes required to reproduce the results are available upon request (thibaut.lachaut1@ulaval.ca).
Author contributions. TL and AT conceptualized the study. TL developed the methods, models and simulations, and drafted the manuscript.
AT acquired the funding and provided extensive supervision.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. The work was supported by a project from Ministère de l'Environnement et de la Lutte contre les Changements Clima-