Assessing the rarity and magnitude of very extreme flood events occurring less than twice a century is challenging because such rare events are seldom observed. Here we develop a new approach that pools reforecast ensemble members from the European Flood Awareness System (EFAS) to increase the sample size available for estimating the frequency of extreme local and regional flood events. We assess the added value of such pooling, determine where in Central Europe the most extreme events can be expected, and evaluate how event severity relates to physiographic and meteorological catchment characteristics. We work with a set of 234 catchments from the Global Runoff Data Centre that are matched to EFAS catchments and for which simulated floods agree well with observed streamflow. We pool EFAS-simulated flood events for 10 perturbed ensemble members and lead times ranging from 22 to 46 d, where flood events are only weakly dependent (

Reliable estimates of the frequency and magnitude of extreme flood events are needed to develop suitable preparedness and adaptation measures. However, estimates of flood events occurring less than twice a century are usually highly uncertain and unreliable because observed records are short. Different model-based approaches have been proposed to increase the sample size available for flood frequency analysis. Two important classes of such methods are stochastic models and large ensembles that rely on climate simulations. Stochastic models rely on statistical principles to generate large samples of flood events with similar characteristics to the observations

An alternative, physically based approach to generating large ensembles of climate variables is to use reforecast simulations, i.e. forecasts generated for past periods

While this ensemble pooling or UNSEEN approach has been successfully used to assess the frequency of rare wind, wave height, storm surge, and precipitation events

To assess the potential value of reforecast ensemble pooling in flood frequency analysis, we use reforecast simulations of streamflow generated by the European Flood Awareness System (EFAS). EFAS combines a weather prediction model with a hydrological model to generate hydrological simulations including streamflow (Fig.

Illustration of workflow.

Our evaluation of the ensemble pooling approach for flood frequency analyses in Europe uses a set of 234 catchments in Central Europe, with areas ranging from a first quartile of 698 km

The 234 catchments in Central Europe selected for the analysis based on model performance and the availability of catchment boundaries and characteristics. The four example catchments, used for illustration purposes, are highlighted in red.

EFAS provides deterministic and probabilistic medium-range streamflow forecasts and early warning information

In addition to streamflow forecasts, EFAS provides streamflow reforecasts generated by forcing LISFLOOD with medium- to sub-seasonal range meteorological reforecasts

We select the most suitable catchments for analysis out of 847 catchments in Central Europe which are part of the Global Runoff Data Centre database

The pooled frequency analysis relies on reforecasts of daily streamflow time series generated through EFAS v4.0. For our analysis, we use the 10 perturbed ensemble runs and 24 lead times, a subset of available lead times (

Simulated streamflow time series can be biased because of uncertainties introduced through the modelling process. Substantial bias indicates low model fidelity, i.e. limited agreement between observed and modelled distributions

Comparison of observed and simulated cumulative distribution functions of peak-over-threshold flood events derived from observed streamflow time series, raw simulations without any bias correction, flood events corrected by the median ratio between observed and simulated flood distributions, and empirically quantile mapped simulations for the four example catchments.
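The two bias-correction options compared in the figure can be sketched as follows. This is a minimal illustration under stated assumptions, not the EFAS or study code: the function names are hypothetical, inputs are assumed to be 1-D arrays of flood event magnitudes, and the empirical quantile mapping uses a simple rank-based non-exceedance probability with linear interpolation between observed order statistics.

```python
import numpy as np


def median_ratio_correction(sim_events, obs_events):
    """Scale simulated flood events by the ratio of observed to simulated medians."""
    sim = np.asarray(sim_events, dtype=float)
    obs = np.asarray(obs_events, dtype=float)
    return sim * (np.median(obs) / np.median(sim))


def empirical_quantile_mapping(sim_values, sim_reference, obs_reference):
    """Replace each simulated value by the observed value at the same empirical quantile.

    sim_reference and obs_reference are the simulated and observed flood samples
    used to calibrate the mapping (a hypothetical calibration period).
    """
    sim_sorted = np.sort(np.asarray(sim_reference, dtype=float))
    obs = np.asarray(obs_reference, dtype=float)
    # Non-exceedance probability of each value within the simulated reference sample;
    # values above the simulated maximum get probability 1 (no extrapolation).
    p = np.searchsorted(sim_sorted, np.asarray(sim_values, dtype=float), side="right") / sim_sorted.size
    # Observed quantile at that probability (linear interpolation between order statistics).
    return np.quantile(obs, p)
```

The median-ratio variant only rescales the simulated distribution, whereas quantile mapping reshapes it to match the observed one, which is why the two corrections can diverge in the upper tail.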

After bias correction, we identify flood events in the time series simulated for different lead times and perturbed members (10 perturbed runs for each lead time). We use two flood extraction procedures, namely the annual maxima (AM) and peak-over-threshold (POT) approaches. Both approaches are applied to each of the simulated time series generated for different lead times (24) and perturbations (10), i.e. for 240 time series per catchment. The extracted AM event sets are used in the subsequent independence tests (see Sect.
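A minimal sketch of the two extraction procedures is given below. The POT threshold and the minimum separation between events are hypothetical parameters, since their values are not stated here, and the declustering rule (accept peaks from largest to smallest, rejecting any that lie too close to an already accepted peak) is one common choice assumed for illustration rather than the study's exact rule.

```python
import numpy as np


def annual_maxima(years, flow):
    """Annual maximum (AM) flow for each calendar year in a daily series."""
    years = np.asarray(years)
    flow = np.asarray(flow, dtype=float)
    unique_years = np.unique(years)
    maxima = np.array([flow[years == y].max() for y in unique_years])
    return unique_years, maxima


def peaks_over_threshold(flow, threshold, min_separation):
    """Declustered peak-over-threshold (POT) events.

    Candidate peaks are local maxima above `threshold`; they are accepted from
    largest to smallest, rejecting any peak within `min_separation` time steps
    of an already accepted peak.
    """
    flow = np.asarray(flow, dtype=float)
    n = flow.size
    candidates = [
        i for i in range(n)
        if flow[i] > threshold
        and (i == 0 or flow[i] >= flow[i - 1])
        and (i == n - 1 or flow[i] >= flow[i + 1])
    ]
    accepted = []
    for i in sorted(candidates, key=lambda j: flow[j], reverse=True):
        if all(abs(i - j) >= min_separation for j in accepted):
            accepted.append(i)
    accepted.sort()
    idx = np.array(accepted, dtype=int)
    return idx, flow[idx]
```

In the setting described above, both functions would be applied to each of the 240 series per catchment (24 lead times × 10 perturbed members).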

Using the AM flood samples extracted from different simulation runs, we assess the suitability of the perturbed ensemble streamflow simulations for ensemble pooling by evaluating whether individual simulation runs can be considered independent and stable, i.e. that simulated distributions vary only slightly across lead times

First, we assess model stability, i.e. we check whether the distribution of the generated ensemble changes with lead time. Ideally, a pooled ensemble should exhibit only weak changes in distribution with lead time. Such stability is assessed by comparing the distribution of AM events across different lead times (Fig.

Second, we check whether individual model runs can be considered independent. Ensemble member independence is an important factor determining the increase in effective sample size achieved through ensemble pooling. If all
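To illustrate why member independence matters, a standard textbook approximation for n equally correlated samples with a common pairwise correlation ρ is n_eff = n / (1 + (n - 1)ρ). This formula describes the effective sample size for estimating a mean and is used here only as an illustration; the study does not state that it relies on it, and the helper name is hypothetical.

```python
def effective_sample_size(n_members: int, rho: float) -> float:
    """Effective sample size of n equally correlated samples (common correlation rho)."""
    if n_members < 1:
        raise ValueError("n_members must be positive")
    if n_members == 1:
        return 1.0
    # An equicorrelated set requires rho > -1/(n-1); at the bound the variance is zero.
    if not (-1.0 / (n_members - 1) < rho <= 1.0):
        raise ValueError("rho outside the valid range for an equicorrelated set")
    return n_members / (1.0 + (n_members - 1) * rho)
```

For 10 members, independence (ρ = 0) yields 10 effective samples, while perfect dependence (ρ = 1) collapses them to a single one; intermediate dependence falls in between.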

For the frequency analysis, we use the POT instead of AM flood samples to ensure the inclusion of relevant events and to reduce the dependence between ensemble members (i.e. runs for different lead times and for different perturbations).

For the local (catchment-specific) frequency analysis, we pool all POT events from the perturbed members of the lead times that can be considered independent, i.e. lead times

We fit a theoretical generalized Pareto distribution
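A sketch of fitting a generalized Pareto distribution (GPD) to pooled POT peaks with `scipy.stats.genpareto`, fixing the location at the threshold, is shown below. The conversion to a T-year return level assumes λ events per year on average, so that the per-event exceedance probability is 1/(λT). The nonparametric bootstrap for uncertainty bounds is an assumption for illustration; the uncertainty method used in the study is not specified here, and all function names are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_gpd(peaks, threshold):
    """Maximum-likelihood GPD fit to POT peaks, with the location fixed at the threshold."""
    shape, loc, scale = stats.genpareto.fit(np.asarray(peaks, dtype=float), floc=threshold)
    return shape, loc, scale


def return_level(return_period_years, events_per_year, shape, loc, scale):
    """T-year flood: GPD quantile with per-event exceedance probability 1/(lambda*T)."""
    T = np.asarray(return_period_years, dtype=float)
    p_exceed = 1.0 / (events_per_year * T)
    if np.any(p_exceed >= 1.0):
        raise ValueError("events_per_year * T must exceed 1")
    return stats.genpareto.ppf(1.0 - p_exceed, shape, loc=loc, scale=scale)


def bootstrap_interval(peaks, threshold, events_per_year, return_period_years,
                       n_boot=200, alpha=0.1, seed=0):
    """Percentile bootstrap interval for a return level (resampling peaks with replacement)."""
    rng = np.random.default_rng(seed)
    peaks = np.asarray(peaks, dtype=float)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choice(peaks, size=peaks.size, replace=True)
        c, loc, scale = fit_gpd(resample, threshold)
        estimates.append(return_level(return_period_years, events_per_year, c, loc, scale))
    return np.quantile(estimates, [alpha / 2, 1 - alpha / 2], axis=0)
```

With pooled samples, the bootstrap interval narrows because each refit draws on many more peaks, which mirrors the reduced uncertainty reported for pooled compared with observation-based estimates.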

As a reference for these theoretical estimates, we provide empirical return period estimates of the observed flood events derived using the Weibull plotting position
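The Weibull plotting position assigns the m-th largest of n events the exceedance probability p = m/(n + 1); for annual maxima the return period is T = 1/p. For POT samples, the sketch below additionally divides by the mean number of events per year, T = 1/(λp). This POT conversion is an assumption, since the exact variant used in the study is not given here, and the function name is hypothetical.

```python
import numpy as np


def weibull_return_periods(events, events_per_year=1.0):
    """Empirical return periods (years) from the Weibull plotting position m/(n+1)."""
    x = np.asarray(events, dtype=float)
    n = x.size
    order = np.argsort(x)[::-1]            # indices sorted from largest to smallest
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(1, n + 1)     # rank 1 = largest event
    p_exceed = ranks / (n + 1.0)
    return 1.0 / (events_per_year * p_exceed)
```

For example, the largest of 3 annual maxima receives a return period of 4 years.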

To identify physiographic and hydrometeorological characteristics that are important for explaining flood quantiles at different return periods, we use linear modelling. We fit linear regression models of varying size, i.e. with varying numbers of explanatory variables, using exhaustive search
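An exhaustive search over predictor subsets, retaining the model with the lowest Bayesian information criterion (BIC), can be sketched with ordinary least squares as follows. The Gaussian form BIC = n ln(RSS/n) + k ln(n) (up to a constant, with k coefficients including the intercept) and the function names are assumptions for illustration; the predictors in the study would be catchment characteristics and the response a flood quantile for a given return period.

```python
import itertools

import numpy as np


def bic_ols(X, y):
    """Fit OLS with an intercept and return (BIC, coefficients)."""
    n = y.size
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    k = design.shape[1]
    return n * np.log(rss / n) + k * np.log(n), coef


def best_subset_by_bic(X, y, names):
    """Evaluate every non-empty predictor subset; keep the one with the lowest BIC."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for size in range(1, X.shape[1] + 1):
        for subset in itertools.combinations(range(X.shape[1]), size):
            bic, coef = bic_ols(X[:, list(subset)], y)
            if best is None or bic < best[0]:
                best = (bic, [names[i] for i in subset], coef)
    return best
```

Exhaustive search is feasible only for a modest number of candidate predictors, since the number of subsets grows as 2^p.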

After performing the local frequency analysis, we look at probabilities of regional flooding. That is, we estimate the return periods of events that affect a certain percentage of catchments within a larger region, i.e. a river basin. We focus on the major river basins in Europe
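One way to estimate such regional return periods from a pooled sample is sketched below. It assumes, hypothetically, that "jointly flooded" means a catchment's pooled annual maximum exceeds a catchment-specific flood threshold in the same pooled year, and that each pooled year (year × member × lead time) counts as one year of record; the exact definition used in the study is not given here.

```python
import numpy as np


def regional_flood_return_period(annual_max, local_thresholds, fraction):
    """Empirical return period (years) of at least `fraction` of catchments flooding together.

    annual_max: array (n_pooled_years, n_catchments) of pooled annual maxima.
    local_thresholds: per-catchment flood thresholds (e.g. a local T-year flood).
    """
    annual_max = np.asarray(annual_max, dtype=float)
    flooded = annual_max > np.asarray(local_thresholds, dtype=float)[None, :]
    share = flooded.mean(axis=1)               # fraction of catchments flooded per pooled year
    n_hits = int(np.sum(share >= fraction))
    if n_hits == 0:
        return np.inf                          # never observed in the pooled sample
    return annual_max.shape[0] / n_hits        # pooled years per regional event
```

Because the pooled sample is much longer than observed records, events affecting a large share of catchments can be counted directly instead of extrapolated.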

After identifying catchments with satisfactory model performance in terms of the EFAS historical runs, we assess the suitability of the streamflow ensemble generated using the perturbed numerical weather predictions and different lead times for ensemble pooling. This assessment focuses on AM instead of POT flood samples because POT event identification can lead to the selection of an unequal number of events across lead times, which makes it impossible to compute correlations. We first consider the stability (i.e. lack of drift across lead times) of annual maxima flood events simulated for 24 lead times ranging from 0 to 46 d for one example catchment (Fig.

Stability across lead times from 0 to 46 d for one example catchment.

We take a closer look at model stability for all catchments by assessing the dependence of the empirical 95 % flood quantile on lead time, using Spearman's correlation coefficient. The median correlation between the lead time and the simulated 95 % quantile across all catchments is 0.02, and the lower and upper quartiles are
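The stability check for one catchment can be sketched as follows: compute the empirical 95 % quantile of the AM sample at each lead time and correlate it with lead time using `scipy.stats.spearmanr`. The function name and the input layout (one AM sample per lead time) are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def stability_correlation(lead_times, am_by_lead):
    """Spearman correlation between lead time and the empirical 95 % AM quantile."""
    q95 = [np.quantile(np.asarray(am, dtype=float), 0.95) for am in am_by_lead]
    rho, p_value = stats.spearmanr(lead_times, q95)
    return rho, p_value
```

A correlation near zero, as in the median reported above, indicates little systematic drift of the upper tail with lead time; a value near ±1 would indicate a monotonic drift.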

We now take a look at AM (in)dependence across perturbed ensemble members by computing Spearman's rank correlation between pairs of AM series derived for the 10 perturbed ensemble members at each lead time. AM (in)dependence across perturbed ensemble members seems to depend both on the catchment and on lead time (Fig.
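For a single catchment and lead time, the pairwise dependence between members can be summarized by the median Spearman correlation over all member pairs (45 pairs for 10 members). The sketch assumes a hypothetical input layout with one column per perturbed member and one row per year.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def median_pairwise_spearman(am_members):
    """Median Spearman correlation over all pairs of member AM series (columns)."""
    am = np.asarray(am_members, dtype=float)
    rhos = []
    for i, j in combinations(range(am.shape[1]), 2):
        rho, _ = stats.spearmanr(am[:, i], am[:, j])
        rhos.append(rho)
    return float(np.median(rhos))
```

Repeating this for each of the 24 lead times, and then taking the median across catchments, yields the per-lead-time dependence curve.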

Member (in)dependence (Spearman's correlation) per lead time (0 to 1104 h, i.e. 46 d) across the 10 perturbed ensemble members for four example stations with different flood seasonality ratios (strong summer vs. strong winter flood regime; clockwise from upper left to lower right).

We seek to better understand which types of catchments show high/low ensemble member dependence across lead times. Therefore, we compute the median Spearman's rank correlation across the 10 ensemble members and 24 lead times for each of the 234 catchments and try to relate this median correlation to a catchment's flood seasonality ratio. The flood seasonality ratio

Median annual maxima dependence across all lead times and ensemble members per catchment.

Median Spearman's correlation across ensemble members and catchments per lead time (0 to 1104 h) for

The high dependence at short lead times suggests that simulations at these lead times should be removed before pooling flood events for frequency analysis. To determine the lead times to be excluded, we compute the median AM dependence across ensemble members and catchments for each lead time and perform a Pettitt change point test
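The Pettitt test locates a single change point in a sequence using the statistic U_t = Σ_{i≤t} Σ_{j>t} sgn(x_i - x_j), with K = max |U_t| and the approximate p value 2 exp(-6K² / (n³ + n²)). The sketch below applies it to a sequence such as the median dependence per lead time; the function name and the returned "first index after the change" convention are assumptions for illustration.

```python
import numpy as np


def pettitt_test(x):
    """Pettitt change point test.

    Returns (index of the first value after the change, K statistic, approximate p value).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    s = np.sign(x[:, None] - x[None, :])       # s[i, j] = sgn(x_i - x_j)
    # U_t sums sgn(x_i - x_j) over i in the first segment (0..t) and j in the second.
    u = np.array([s[: t + 1, t + 1:].sum() for t in range(n - 1)])
    t_star = int(np.argmax(np.abs(u)))
    k = float(np.abs(u[t_star]))
    p_value = min(1.0, 2.0 * np.exp(-6.0 * k**2 / (n**3 + n**2)))
    return t_star + 1, k, p_value
```

Lead times before the detected change point would then be excluded from pooling.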

Flood estimates derived from theoretical distributions fitted to pooled peak-over-threshold (POT) flood events from 10 ensemble members and 13 lead times are more robust, i.e. have smaller uncertainty, than flood estimates derived from distributions fitted to a small sample of observed POT events, as illustrated in Fig.

Observed vs. simulated flood frequency curves, including uncertainty bounds for four example catchments with different seasonality ratios.

The differences between observation- and simulation-based best estimates and uncertainty ranges vary by return period and by catchment (Fig.

Relative difference between the observed and simulated ((obs-sim)/sim)

We now use the best estimates derived by ensemble pooling to map spatial patterns of flood quantiles over Central Europe for different return periods (Fig.

Theoretical flood quantiles corresponding to return periods of

Predictor importance for flood quantiles. Regression coefficients for significant explanatory variables retained when choosing the linear model with the lowest BIC (

Ensemble pooling can also be used to derive regional flood estimates, i.e. to compute the probability that a certain percentage of catchments within a region, here a large river basin, are jointly flooded (Fig.

Probabilities of regional flooding for European river basins with more than five catchments, with

Pooling flood events derived from a streamflow reforecast ensemble substantially increases the sample size available for flood frequency analysis. In doing so, it enables the study of very rare extremes absent in relatively short observed time series. Increasing the sample size also facilitates the study of spatial patterns in the distribution of flood estimates corresponding to long return periods (e.g. 200 years; Fig.

The utility of reforecast pooling rests on the performance of the underlying hydrological simulations. The use of reforecast simulations instead of observations comes at the cost of potentially introducing uncertainty through simulated meteorological input or the hydrological model itself

An additional limitation is the spatial applicability of the approach. Because hydrological model simulations must be bias corrected, ensemble pooling is currently limited to gauged catchments, i.e. catchments for which streamflow observations are available. In theory, using simulations instead of observations would enable the extension of the spatial coverage to ungauged locations. However, such an extension would only be possible if no bias correction were required or if bias correction could be regionalized and applied to all catchments.

Sample size is only effectively increased compared to observations if the simulated flood samples for different ensemble members can be considered independent

The flood ensemble pooling approach described herein is not limited to the EFAS reforecasts over Europe but could also be applied to other streamflow reforecast modelling systems, such as the Global Flood Awareness System

Pooling of publicly accessible reforecast flood events such as those generated through the European Flood Awareness System (EFAS) can be a useful tool to improve the robustness of flood estimates, particularly for rare events with long return periods. However, as with other extremes

Our application of the pooling approach over 234 European catchments shows that local floods are most extreme in the Alps and Great Britain and least extreme in Scandinavia and Central Europe. It also indicates that regional extreme flood events, in which a large fraction of catchments flood simultaneously, are more likely in Central Europe than in Scandinavia. We conclude that pooled reforecast ensembles are beneficial for studying the probability of extreme and spatially extensive events, provided that the model represents hydrologic extremes accurately, because they yield flood estimates with considerably reduced uncertainty compared to observation-derived flood estimates.

The historical and reforecast simulations of river discharge generated through EFAS are available for download through the Copernicus data store (

MIB and LJS jointly designed the study and developed the methodology. MIB prepared the data, performed the analyses, and wrote the first draft of the paper. LJS revised and edited the paper.

At least one of the (co-)authors is a member of the editorial board of

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We would like to acknowledge high-performance computing support from Cheyenne (

This work has been supported by the Swiss National Science Foundation via a PostDoc.Mobility grant (grant no. P400P2_183844; granted to Manuela I. Brunner) and a John Fell Fund grant (to Louise J. Slater). This open-access publication was funded by the University of Freiburg.

This paper was edited by Roberto Greco and reviewed by Alfonso Senatore and one anonymous referee.