Accurate estimation of extreme discharges in rivers, such as the Meuse, is crucial for effective flood risk assessment. However, hydrological models that estimate such discharges often lack transparency regarding the uncertainty in their predictions. This was evidenced by the devastating flood that occurred in July 2021, which was not captured by the existing model for estimating design discharges. This article proposes an approach to obtain uncertainty estimates for extremes with structured expert judgment using the classical model (CM). A simple statistical model was developed for the river basin, consisting of correlated generalized extreme value (GEV) distributions for discharges from upstream tributaries. The model was fitted to seven experts' estimates and historical measurements using Bayesian inference. Results fitted only to the measurements were informative solely for more frequent events, while fitting only to the expert estimates reduced uncertainty solely for extremes. Combining both historical observations and estimates of extremes provided the most plausible results. The classical model reduced the uncertainty by assigning the most weight to the two most accurate experts, based on their estimates of less extreme discharges. The study demonstrates that with the presented Bayesian approach that combines historical data and expert-informed priors, a group of hydrological experts can provide plausible estimates for discharges and potentially also other (hydrological) extremes with relatively manageable effort.

Estimating the magnitude of extreme flood events comes with considerable uncertainty. This became clear once more on 18 July 2021: a flood wave on the Meuse River, following a few days of rain in the Eifel and Ardennes, caused the highest peak discharge ever measured at Borgharen. Unprecedented rainfall volumes fell in a short period of time

Extreme value analysis often involves estimating the magnitude of events that are greater than the largest from historical (representative) records. This requires establishing a model that describes the probability of experiencing such events within a specific period and subsequently extrapolating this to specific exceedance probabilities. For the Meuse, the traditional approach is fitting a probability distribution to periodic maxima and extrapolating from it

GRADE (Generator of Rainfall And Discharge Extremes) is a model-based answer to these shortcomings. It is used to determine design conditions for the rivers Meuse and Rhine in the Netherlands. GRADE is a variant of a conventional regional flood frequency analysis. Instead of using only historical observations, it resamples these into long synthetic time series of rainfall that express the observed spatial and temporal variation. It then uses a hydrological model to calculate tributary flows and a hydraulic model to simulate river discharges

GRADE is an example where underestimation of uncertainty is observed, but it is certainly not the only model. For example,

In this context, structured expert judgment (SEJ) is another data-based approach. Expert judgment (EJ) is a broad term for gathering data from judgments based on expertise in a knowledge area or discipline. It is indispensable in every scientific application as a way of assessing the truth or value of new information. Structured expert judgment formalizes EJ by eliciting expert judgments in such a way that judgments can be treated as scientific data. One structured method for this is the classical model, also known as Cooke's method

While examples of specifically using the classical model in hydrology are not abundantly available, there are many examples of expert judgment as prior information meant to decrease uncertainty and sensitivity. Four examples in which a Bayesian approach, similar to this study, was applied to limit the uncertainty in extreme discharge estimates are given by

This study applies structured expert judgment to estimate the magnitude of discharge events for the Meuse River up to an annual exceedance probability of, on average, once per 1000 years. We aim to get uncertainty estimates for these discharges. Their credibility is assessed by comparing them to GRADE, the aforementioned model-based method for deriving the Meuse River's design flood frequency statistics. A statistical model is quantified with both observed annual maxima and seven experts' estimates for the 10- and 1000-year discharge on the main Meuse tributaries. The 10-year discharges (unknown to experts at the moment of the elicitation) are used to derive a performance-based expert weight that is used to inform the 1000-year discharges. Participants use their own approach to come up with uncertainty estimates. To compare (a) the method that combines data and expert judgments with (b) the data-only method and (c) the expert-estimates-only approach, we quantify the model based on all three options. The differences show the added value of each component. This indicates the method's performance both when measurements are available and when they are not, for example, in data-scarce areas.

Figure

Map of the Meuse catchment considered in this study, with main river, tributaries, streams, and catchment bounds.

The numbered dots indicate the locations along the tributaries where the discharges are measured. These locations' names and the tributaries' names are shown in the lower left. Elevation is shown with the grey scale. Elevation data were obtained from European Digital Elevation Model

catchment overview, i.e., a map with elevation, catchments, tributaries, and gauging locations

land use, i.e., a map with land use from

river profiles and time of concentration, i.e., a figure with longitudinal river profiles and a figure with time between the tributary peaks and the peak at Borgharen for discharges at Borgharen greater than 750

tabular catchment characteristics, such as area per catchment, as well as the catchment's fraction of the total area upstream of the downstream locations; soil composition from

statistics of precipitation, including the daily precipitation per month and catchment; sum of annual precipitation per catchment; and intensity duration frequency curves for the annual recurrence intervals of 1, 2, 5, 10, 25, 50, and the maximum, all calculated from gridded E-OBS reanalysis data provided by

hyetographs and hydrographs, i.e., temporal rainfall patterns and hydrographs for all catchments/tributaries during the 10 largest discharges measured at Borgharen (sources described below).

This information, included in the Supplement, was provided to the experts to support them in making their estimates. The discharge data needed to fit the model to the observations were obtained from

To obtain estimates for downstream discharge extremes, experts needed to quantify individual components in a model that gives the downstream discharge as the sum of the tributary discharges multiplied by a factor correcting for covered area and hydrodynamics:

The tributary peak discharge,

In summary, using the method of SEJ described in Sect.

the tributary peak discharge,

the factor

the correlation between tributary peak discharges (as explained below).
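The model structure described above (a downstream peak equal to the sum of the tributary peaks multiplied by one correction factor) can be illustrated with a minimal sketch. The function name, argument names, and numbers are illustrative assumptions and are not taken from the study.

```python
def downstream_peak(tributary_peaks, factor):
    """Downstream peak discharge: correction factor times the sum of tributary peaks.

    `tributary_peaks` holds one peak discharge per tributary (any consistent unit);
    `factor` corrects for uncovered catchment area and hydrodynamic effects.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return factor * sum(tributary_peaks)


# Hypothetical tributary peaks for a single event (illustrative numbers only)
example_peaks = [450.0, 300.0, 250.0]
print(downstream_peak(example_peaks, factor=1.1))
```

In this form, the uncertainty in the downstream discharge follows from the uncertainty in the tributary peaks, their mutual dependence, and the factor.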

With these, the model in Eq. (

The experts' estimates are elicited using the classical model. This is a structured approach to elicit uncertainty for unknown quantities. It combines expert judgments based on empirical control questions, with the aim to find a single combined estimate for the variables of interest (a rational consensus). The classical model is typically employed when alternative approaches for quantifying uncertain variables are lacking or unsatisfying (e.g., due to costs or ethical limitations). It is extensively described in

In the classical model, a group of participants, often researchers or practitioners in the field of interest, provide uncertainty estimates for a set of questions. These can be divided into two categories: seed and target questions. Seed questions are used to assess the participants' ability to estimate uncertainty within the context of the study. The answers to these questions are known by the researchers but not by the participants at the moment of the elicitation. Seed questions are often sourced from similar studies or cases and are as close as possible to the variables of interest. In any case, they are related to the field of expertise of the participant pool. Target questions concern the variables of interest, for which the answer is unknown to both researchers and participants.

Because the goal is to elicit uncertainty, experts estimate percentiles rather than a single value. Typically, these are the 5th, 50th, and 95th percentile. Two scores are calculated from an expert's three-percentile estimates: the statistical accuracy (SA) and the information score. The three percentiles create a probability vector with four inter-quantile intervals,

In addition to the SA, the information score compares the degree of uncertainty in an expert's answers with that of the other experts. Percentile estimates that are close together (compared to the other participants) are more informative and get a higher information score. The product of the statistical accuracy and information score gives the expert's weight,

The statistical accuracy dominates the expert weight, while the information score modulates between experts with a similar SA. Experts with an SA lower than

This is called the global weight (GL) DM.

Alternatively, experts can be given the same weight, which results in the equal weight (EQ) DM. This does not require eliciting seed variables, but neither does it distinguish experts based on their performance, a key aspect of the classical model (CM).
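The scoring described above can be sketched as follows, using the standard formulation of the classical model as we understand it: the statistical accuracy is the p-value of a chi-square test (three degrees of freedom for four inter-quantile intervals) on 2N times the relative entropy between the observed and expected interval frequencies. The function names, the cutoff argument, and the example numbers are assumptions for illustration. The information score is taken as a given input here rather than computed.

```python
import math

# Probability a well-calibrated expert assigns to the four inter-quantile
# intervals defined by the 5th, 50th, and 95th percentiles.
EXPECTED = (0.05, 0.45, 0.45, 0.05)


def interval_counts(estimates, realizations):
    """Count how many seed realizations fall in each inter-quantile interval."""
    counts = [0, 0, 0, 0]
    for (q05, q50, q95), x in zip(estimates, realizations):
        counts[sum(x > q for q in (q05, q50, q95))] += 1
    return counts


def statistical_accuracy(counts, expected=EXPECTED):
    """p-value of the chi-square test (3 d.o.f.) on 2 * N * KL(observed || expected)."""
    n = sum(counts)
    kl = sum((c / n) * math.log((c / n) / p)
             for c, p in zip(counts, expected) if c > 0)
    x = max(2.0 * n * kl, 0.0)  # guard against tiny negative rounding errors
    # Closed-form survival function of the chi-square distribution with 3 d.o.f.
    # (valid only for four intervals, as used here).
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def normalized_weights(accuracies, informations, cutoff=0.0):
    """Weight = SA * information score, set to zero below the cutoff, then normalized."""
    raw = [sa * info if sa >= cutoff else 0.0
           for sa, info in zip(accuracies, informations)]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all experts fall below the cutoff")
    return [w / total for w in raw]


# Hypothetical example: 10 seed questions, realizations spread over the intervals
print(statistical_accuracy([1, 4, 4, 1]))   # well calibrated: high score
print(statistical_accuracy([5, 0, 0, 5]))   # overconfident: score near zero
```

Because the statistical accuracy can differ by orders of magnitude between experts while the information score varies much less, the product is dominated by the former, as noted above.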

To construct the DM, probability density functions (PDFs), such as

List of experts with their affiliations and professional interests.

In this study, the seed questions involve the 10-year discharges for the tributaries of the river Meuse. An example of a seed question is “what is the discharge that is exceeded on average once per 10 years, for the Vesdre at Chaudfontaine?” The target questions concern the 1000-year discharges as well as the ratio between the upstream sum and downstream discharge. Discharges with a 10-year recurrence interval are exceptional but can, in general, be reliably approximated from measured data. Seven experts participated in the in-person elicitation that took place on 4 July 2022. The study and model were discussed before the assessments to make sure that the concepts and questions were clear. After this, an exercise for the Weser catchment was done, in which the experts answered four questions that were subsequently discussed. In this way, the experts could compare their answers to the realizations and view the resulting scores using the classical model.

Apart from the training exercise, the experts answered 26 questions: 10 seed questions regarding the 10-year discharge (one for each tributary), 10 target questions regarding the 1000-year discharge, and 6 target questions for the ratios between the upstream sum and downstream discharge (10- and 1000-year, for three locations). A list of the seven participants' names, their affiliations, and their field of expertise is shown in Table

The model for downstream discharges (Eq.

For each tributary, a (joint) distribution of the model parameters was determined using Bayesian inference based on expert estimates and observed tributary discharge peaks during annual maxima at Borgharen. Bayesian methods explicitly incorporate uncertainty, a key aspect of this study, and provide a natural way to integrate expert judgment with observed data.
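A minimal sketch of such a posterior for one tributary is given below: the log-likelihood of the observed peaks under a GEV distribution, plus log-prior terms that evaluate the GEV-implied 10- and 1000-year discharges against the expert estimates, plus a prior on the tail shape. Two elements are assumptions made only for this illustration and are not the study's choices: the expert percentiles are approximated by a lognormal density (`lognormal_logpdf`), and the tail shape prior is a zero-mean normal with a hypothetical standard deviation `xi_prior_sd`.

```python
import math


def gev_logpdf(x, mu, sigma, xi):
    """Log-density of the GEV distribution (xi > 0 gives a heavy upper tail)."""
    if sigma <= 0:
        return -math.inf
    z = (x - mu) / sigma
    if abs(xi) < 1e-9:  # Gumbel limit
        return -math.log(sigma) - z - math.exp(-z)
    s = 1.0 + xi * z
    if s <= 0:  # x lies outside the support
        return -math.inf
    try:
        return -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(s) - s ** (-1.0 / xi)
    except OverflowError:
        return -math.inf


def gev_quantile(mu, sigma, xi, return_period):
    """Discharge exceeded on average once per `return_period` years."""
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-9:
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi


def lognormal_logpdf(q, q05, q50, q95):
    """ASSUMPTION: expert 5/50/95 percentiles approximated by a lognormal density."""
    if q <= 0:
        return -math.inf
    s = (math.log(q95) - math.log(q05)) / (2 * 1.645)
    u = (math.log(q) - math.log(q50)) / s
    return -math.log(q * s * math.sqrt(2 * math.pi)) - 0.5 * u * u


def log_posterior(params, observations, expert_q10, expert_q1000, xi_prior_sd=0.25):
    """Unnormalized log-posterior of (mu, sigma, xi) for one tributary."""
    mu, sigma, xi = params
    if sigma <= 0:
        return -math.inf
    log_lik = sum(gev_logpdf(x, mu, sigma, xi) for x in observations)
    log_prior = (
        lognormal_logpdf(gev_quantile(mu, sigma, xi, 10), *expert_q10)
        + lognormal_logpdf(gev_quantile(mu, sigma, xi, 1000), *expert_q1000)
        - 0.5 * (xi / xi_prior_sd) ** 2  # ASSUMPTION: weakly informative normal prior
    )
    return log_lik + log_prior


# Hypothetical observations and expert percentiles (illustrative numbers only)
obs = [400.0, 550.0, 700.0]
print(log_posterior((500.0, 150.0, 0.1), obs, (700, 900, 1100), (1200, 1800, 3000)))
```

In a sketch like this, leaving out the expert terms corresponds to a data-only fit and leaving out `log_lik` to an expert-only fit. A sampler such as Markov chain Monte Carlo would then be run on `log_posterior` to obtain a parameter trace.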

Bayes' theorem gives the posterior distribution,

The likelihood can be calculated using Eq. (

The prior consists of two parts, the expert estimates for the 10- and 1000-year discharge and a prior for the GEV tail shape parameter

Conceptual visualization of elements in the likelihood function of a tributary GEV distribution.

Apart from the expert estimates, we prefer a weakly informative prior for

With both experts' estimate

The posterior distribution comprises the prior tail shape distribution, the prior expert estimates of the 10- and 1000-year discharges, and the likelihood of the observations. As described in Sect.

With the just-described procedure, the (posterior) distributions for the tributary discharge (

Regarding the correlation matrix that describes the dependence between tributary extremes, the observed correlations were used for the data-only option and the expert-estimated correlations for the expert-only option. For the combined option, we took the average of the observed correlation matrix and the expert-estimated correlation matrix. Other possibilities for combining correlation matrices are available
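As a minimal sketch of the combined option, the element-wise average of two correlation matrices can be computed and checked as follows. The matrices are hypothetical. The positive-definiteness check is included only as a safeguard: an average of two valid correlation matrices keeps a unit diagonal and remains positive semi-definite.

```python
def average_correlation(observed, expert):
    """Element-wise mean of two correlation matrices of equal size."""
    n = len(observed)
    return [[0.5 * (observed[i][j] + expert[i][j]) for j in range(n)]
            for i in range(n)]


def is_positive_definite(matrix):
    """Check positive definiteness by attempting a Cholesky factorization."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    return False
                lower[i][j] = acc ** 0.5
            else:
                lower[i][j] = acc / lower[j][j]
    return True


# Hypothetical correlation matrices for three tributaries
observed = [[1.0, 0.8, 0.6], [0.8, 1.0, 0.7], [0.6, 0.7, 1.0]]
expert = [[1.0, 0.4, 0.3], [0.4, 1.0, 0.5], [0.3, 0.5, 1.0]]
combined = average_correlation(observed, expert)
print(combined, is_positive_definite(combined))
```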

The three components from Eq. (

tributary (marginal) discharges, represented by the GEV distributions from the Bayesian inference,

the interdependence between tributaries, represented by a multivariate normal copula, and

the ratio between the upstream sum and downstream discharges (

In line with the objective of this article, an uncertainty estimate is derived for the downstream discharges. This section describes the method in a conceptual way. Appendix

To calculate a single exceedance frequency curve for a downstream location, 10 000 events (annual discharge maxima) are drawn from the 9 tributaries' GEV distributions. Note that 10 tributaries are displayed in Fig.

Individual exceedance frequency curves for each GEV realization or downstream discharge and the different percentiles derived from these.

The dependence between tributaries is incorporated in two ways. First, the 10 000 events underlying each downstream discharge curve are correlated. This is achieved by drawing the 9

This result section first presents the experts' scores for the classical model (Sect.

The experts estimated three percentiles (5th, 50th, and 95th) for the 10- and 1000-year discharge for all larger tributaries in the Meuse catchment. An overview of the answers is given in the Supplement. Based on these estimates, the scores for the classical model are calculated as described in Sect.

Scores for the classical model for the experts (top 7 rows) and decision makers (bottom 3 rows).

The statistical accuracy varies between 2.3

Seed question realizations compared to each expert's estimates. The position of each realization is displayed as a percentile point in the expert's distribution estimate.

The information scores show, as usual, less variation. The expert with the highest statistical accuracy (expert D) also has the lowest information score. Expert E, who has a high statistical accuracy as well, estimated more concentrated percentiles, resulting in a higher information score.

The variation between the three decision makers (DMs) in the table is limited. Optimizing the DM (i.e., excluding experts based on statistical accuracy to improve the DM score) has a limited effect. In this case, only experts D and E would have a non-zero weight, resulting in more or less the same results compared to including all experts, even when some of them contribute with marginal weights. The equal-weights DM in this case results in an outcome that is comparable to that of the performance-based DM, i.e., a high statistical accuracy with a slightly lower information score compared to the other two DMs.

We present the model results as discussed earlier through three cases: (a) only data, (b) only expert estimates, and (c) the two combined as described in Sect.

We requested the experts to briefly describe the procedure they followed for making their estimates. Overall, three approaches could be distinguished. The first was using a simple conceptual hydrological model, in which the discharge follows from catchment characteristics like (a subset of) area, rainfall, evaporation and transpiration, rainfall-runoff response, land-use, subsoil, slope, or the presence of reservoirs. Most of this information was provided to the experts, and if not, they made estimates for it themselves. A second approach was to compare the catchments to other catchments known by the expert and possibly adjust the outcomes based on specific differences. A third approach was using rules of thumb, such as the expected discharge per square kilometer of catchment or a “known” factor between an upstream tributary discharge and a downstream discharge (of which the statistics are better known). For estimating the 1000-year discharge, the experts had to do some kind of extrapolation. Some experts scaled with a fixed factor, while others tried to extrapolate the rainfall, for which empirical statistics were provided. The hydrological data (described in Sect.

Figure

Discharge per area for each tributary and experts, based on the estimate for the 50th percentile

Extreme discharge statistics for Chooz

We calculated the extreme discharge statistics for each of the tributaries based on the procedures described in Sect.

Figure

Combining all the marginal (tributary) statistics with the factor for downstream discharges and the correlation models estimated by the experts, we get the discharge statistics for Borgharen. The results for this are shown in Fig.

As with the statistics of the tributaries, we observe high accuracy for the data-only estimates in the in-sample range, constrained uncertainty bounds for EJ-only in the range with higher return periods and both when combined. The combined results match the historical observations well. Note that this is not self-evident as the distributions were not fitted directly to the observed discharges at Borgharen but rather obtained through the dependence model for individual catchments and Eq. (

Zooming in on the discharge statistics for the downstream location of Borgharen, we consider the 10-, 100-, and 1000-year discharge. Figure

Kernel density estimates for the 10-year

Comparing the three modeling options discussed thus far, we see that the data-only option is very uncertain, with a 95 % uncertainty interval of 4000 to around 9000

The combined results are surprisingly close to the currently used GRADE statistics for dike assessment; the uncertainty is slightly larger, but the median is very similar. The EJ-only results are less precise, but the median values are similar to the combined results and GRADE statistics. The large uncertainty is mainly the result of equally weighting all experts instead of assigning most weight to experts D and E (as done for the global-weight DM). For the combined data and EJ approach, the results for the tributary discharges roughly cover the intersection of the EJ-only and data-only results (see Fig.

This study proposed a method to estimate credible discharge extremes for the Meuse River (1000-year discharges in the case of this research). Observed discharges were combined with expert estimates through the GEV distribution using Bayesian inference. The GEV distribution typically has less predictive power in the extrapolated range. Including expert estimates, weighted by their ability to estimate the 10-year discharges, improved the precision in this range of extremes.

Several model choices were made to obtain these results. Their implications warrant further discussion and substantiation. This section addresses the choice of the elicited variables, the predictive power of 10-year discharge estimates for 1000-year discharges, the overall credibility of the results, and, finally, some comments on model choices and uncertainty.

We chose to elicit tributary discharges rather than the downstream discharges (our ultimate variable of interest) themselves. We believe that experts' estimates for tributary discharges correspond better to catchment hydrology (rainfall-runoff response). Additionally, this choice enables us to validate the final result with the downstream discharges. With the chosen setup we thus test the experts' capabilities for estimating system discharge extremes from tributary components while still considering the catchment hydrology rather than just informing us with their estimates for the end results. However, this does not guarantee that the downstream discharges calculated from the experts' answers match the discharges they would have given if elicited directly.

We fitted the GEV distribution based on the elicited 10- and 1000-year discharges. In particular, the GEV's uncertain tail shape parameter is informed through this, as the location and scale parameter can be estimated from data with relative certainty. Alternatively, we could have estimated the tail shape parameter directly or estimated a related parameter, such as the ratio or difference between discharges. The latter was done by

Regarding the goodness of fit of the chosen GEV distribution, we note that some of the experts estimated 1000-year discharges much higher or lower than would be expected from observations. This might indicate that the GEV distribution is not the right model for observations and expert estimates. However, a significantly lower estimate indicates that the estimated discharge is wrong, as it is unlikely that the 1000-year discharge is lower than the highest on record. A significantly higher estimate, on the other hand, might be valid due to a belief in a change in catchment response under extreme rainfall (e.g., due to a failing dam). This would violate the GEV distribution's “identically distributed” assumption. However, the GEV has sufficient shape flexibility to facilitate substantially higher 1000-year discharges, so we do not consider this a realistic shortcoming. Accordingly, rather than viewing the GEV as a limiting factor for fitting the data, we use it as a validation for the classical model scores, as described in Sect.

Finally, we note the model's omission of seasonality. The July 2021 event was mainly extraordinary because of its magnitude in combination with the fact that it happened during summer. Including seasonality would have been a valuable addition to the model, but it would also have (at least) doubled the number of estimates provided by each expert, which was not feasible for this study. The exclusion of seasonality from our research does not alter our main conclusion, which is the possibility of enhancing the estimation of extreme discharges through structured expert judgments.

The experts participating in this study were asked to estimate 10- and 1000-year discharges. While both discharges are unknown to the expert, the underlying processes leading to the different return period estimates can be different. An implicit assumption is that the experts' ability to estimate the seed variables (a 10-year discharge) reflects their ability to estimate the target variables (a 1000-year discharge). This assumption is in fact one of the most crucial assumptions in the classical model. The objective of this research is not to investigate this assumption. For an example of a recent discussion on the effect of seed variables on the performance of the classical model, the reader is referred to

The GRADE results from

To evaluate the value of the applied approach that uses data combined with expert estimates, we compared the results that were fitted to only data or only expert judgment to the results of the combination. For the expert-judgment-only option, we used an equal-weight decision maker, a conservative choice as the experts' statistical accuracy could potentially still be determined based on a different river where data for seed questions are available. While the marginal distributions of the EJ-only case present wide bandwidths (see Fig.

Finally, we note that using expert judgment to estimate discharges through a model (like we did) still gives the analyst a large influence on the results. We try to keep the model transparent and provide the experts with unbiased information, but by defining the model beforehand and providing specific information, we steer the participants towards a specific way of reasoning. Every step in the method, such as the choice for a GEV distribution, the dependence model, or the choice for the classical model, affects the end result. By presenting the method and explicitly providing background information, we hope to have made this transparent and shown the usefulness of the method for similar applications.

This study sets out to establish a method for estimation of statistical extremes through structured expert judgment and Bayesian inference, in a case study for extreme river discharges on the Meuse River. Experts' estimates of tributary discharges that are exceeded in a once-per-10-year and once-per-1000-year event are combined with high river discharges measured over the past 30–70 years. We combine the discharges from different tributaries with a multivariate correlation model describing their dependence and compare the results for three approaches: (a) data-only, (b) expert-judgment-only, and (c) them combined. The expert elicitation is formalized with the classical model for structured expert judgment.

The results of applying our method show credible extreme river discharges resulting from the combined expert-and-data approach. In comparison to GRADE, the prevailing method for estimating discharge extremes on the Meuse, our approach gives similar ranges for the 10-, 100-, and 1000-year discharges. Moreover, the two experts with the highest scores from the classical model had discharge estimates that correspond well with those discharges that might be expected from the observations. This indicates that using the classical model to assess expert performance is a suitable way of using expert judgment to limit the uncertainty in the out-of-sample range of extremes. The experts-only approach performs satisfactorily as well, albeit with a considerably larger uncertainty than the EJ-data option. The method may also be applied to river systems where measurement data are scarce or absent, but adding information on less extreme events is desirable to increase the precision of the estimates.

On a broader level, this study has demonstrated the potential of combining structured expert judgment and Bayesian analysis in informing priors and reducing uncertainty in statistical models. When estimates on uncertain extremes, which cannot satisfactorily be derived (exclusively) from a (limited) data record, are needed, the presented approach provides a means (though not the only one) of supplementing this information. Structured expert judgment provides an approach to deriving defensible priors, while the Bayesian framework offers flexibility in incorporating these into probabilistic results by adjusting the likelihood of input or output parameters. In our application to the Meuse River, we successfully elicited credible extreme discharges. However, case studies for different rivers should verify these findings. Our research does not discourage the use of more traditional approaches such as rainfall–runoff or other hydrodynamic or statistical models. Considering the credible results and the relatively manageable effort required, the approach (when well implemented) can present an attractive alternative to models that approach uncertainty in extremes in a less transparent way.

Section

Three model components are elicited from the experts and data:

marginal tributary discharges in the form of a MCMC GEV parameter trace, where each combination

a ratio between the sum of upstream peak discharges and the downstream peak discharge, represented by a single probability distribution

the interdependence between tributary discharges in the form of a multivariate normal distribution.
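The way these three components combine into one downstream exceedance frequency curve can be sketched as follows: correlated standard normal draws are transformed to uniforms (a normal copula), mapped through each tributary's GEV inverse CDF, summed, and scaled by the factor. This is a simplified illustration under stated assumptions. The GEV parameters and the factor are passed in as fixed values for one curve, whereas in the study they come from the parameter trace and the factor distribution. All names and numbers are hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(acc)
            else:
                lower[i][j] = acc / lower[j][j]
    return lower


def gev_inverse_cdf(u, mu, sigma, xi):
    """GEV quantile for non-exceedance probability u (xi > 0: heavy upper tail)."""
    y = -math.log(u)
    if abs(xi) < 1e-9:  # Gumbel limit
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi


def simulate_curve(gev_params, correlation, factor, n_events=10_000, rng=None):
    """Return (annual exceedance frequency, downstream discharge) pairs for one curve."""
    rng = rng or random.Random()
    lower = cholesky(correlation)
    totals = []
    for _ in range(n_events):
        z = [rng.gauss(0.0, 1.0) for _ in gev_params]
        total = 0.0
        for i, (mu, sigma, xi) in enumerate(gev_params):
            zc = sum(lower[i][j] * z[j] for j in range(i + 1))  # correlated normal
            u = 0.5 * math.erfc(-zc / math.sqrt(2.0))           # standard normal CDF
            u = min(max(u, 1e-12), 1.0 - 1e-12)                 # keep u inside (0, 1)
            total += gev_inverse_cdf(u, mu, sigma, xi)
        totals.append(factor * total)
    totals.sort(reverse=True)
    # The i-th largest of n annual maxima gets exceedance frequency i / (n + 1).
    return [((i + 1) / (n_events + 1), q) for i, q in enumerate(totals)]


# Hypothetical three-tributary example (illustrative numbers only)
params = [(500.0, 150.0, 0.10), (300.0, 100.0, 0.05), (200.0, 80.0, 0.0)]
corr = [[1.0, 0.6, 0.45], [0.6, 1.0, 0.6], [0.45, 0.6, 1.0]]
curve = simulate_curve(params, corr, factor=1.1, n_events=2000, rng=random.Random(1))
print(curve[0], curve[-1])
```

Repeating `simulate_curve` with a different parameter combination and factor each time yields a set of curves from which percentiles, and hence an uncertainty interval, can be derived.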

The exceedance frequency curves for the downstream discharges are calculated based on nine tributaries (

The

For calculating a single curve,

This is the first of two ways in which the interdependence between tributary discharges is expressed. The second is the next step, drawing a (

An

Note that this notation corresponds to Eq. (

This procedure results in one exceedance frequency curve for the downstream discharge. The procedure is repeated 10 000 times to generate an uncertainty interval for the discharge estimate. Note that the full Monte Carlo simulation comprises

Figure

Correlation matrices estimated by the experts.

This research is part of a PhD project. The full dataset related to that project contains the data from this research as well and has the following DOI:

The supplement related to this article is available online at:

GR, OMN, and MK planned the study. GR prepared and carried out the expert elicitation, processed and analyzed the results, and wrote the paper draft. OMN and MK reviewed and edited the paper.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

We would like to thank all experts who participated in the study – Alexander Bakker, Eric Sprokkereef, Ferdinand Diermanse, Helena Pavelková, Jerom Aerts, Nicole Jungermann, and Siebolt Folkertsma – for their time and effort dedicated to making this research possible. Secondly, we thank Dorien Lugt and Ties van der Heijden, whose hydrological and statistical expertise greatly helped in preparing the study through test rounds.

This research was funded by the TKI project EMU-FD.

This paper was edited by Daniel Viviroli and reviewed by two anonymous referees.