Simplifying a hydrological ensemble prediction system with a backward greedy selection of members – Part 2: Generalization in time and space



Introduction
The ability of probabilistic forecasts to encompass the many sources of uncertainty in Hydrological Ensemble Prediction Systems (HEPS) has already been demonstrated (Roulin, 2007; Rousset et al., 2007; Velázquez et al., 2011). Yet the simultaneous consideration of the uncertainty associated with both the meteorological inputs and the structural and parametric configuration of the hydrological models can lead to systems with too many members to be computationally and operationally implementable.
Nonetheless, reliability, a crucial feature in ensemble forecasting, may be achieved through the uncertainty cascade model proposed by Pappenberger et al. (2005). This approach states that the output uncertainty of a hydrological model is affected by several components: uncertainty in the meteorological data used to drive the model, initialization uncertainty (i.e. the initial state of the model), and model uncertainty (from parameter identification to model conceptualization).
Combining information derived from several Meteorological Ensemble Prediction Systems (MEPS) has been shown to improve early flood warning systems (He et al., 2009), and the THORPEX Interactive Grand Global Ensemble (TIGGE) (Bougeault et al., 2010) favours this new opportunity. Moreover, if the parametric uncertainty of hydrological models is assessed under the principle of equifinality (Beven and Binley, 1992) and if the structural uncertainty is tackled through a multi-model approach, the number of scenarios in the uncertainty cascade model may rapidly turn out to be quite large. Simplification of such a HEPS thus becomes a mandatory step from an operational standpoint.
Published by Copernicus Publications on behalf of the European Geosciences Union. D. Brochero et al.: Simplifying a hydrological ensemble prediction system, Part 2
In such a context, the hydrological and meteorological communities have focused their efforts on many lines of simplification. For instance, Pappenberger et al. (2005) fed 10-day ahead rainfall forecasts, consisting of one deterministic, one control, and 50 ensemble forecasts, into a rainfall-runoff model (LisFlood) whose parameter uncertainty was represented by six different parameter sets identified through a Generalized Likelihood Uncertainty Estimation (GLUE) analysis and functional hydrograph classification. Raftery et al. (2005) proposed Bayesian Model Averaging (BMA) as a means for the statistical postprocessing of forecast ensembles derived from numerical weather prediction models. The BMA predictive probability density function (PDF) is a weighted average of PDFs centred on the bias-corrected forecasts from a set of different models. The weights assigned to each model reflect that model's contribution to the forecasting skill over a training period (Vrugt et al., 2006). In line with that, Vrugt et al. (2008) proposed estimating the BMA weights with the DiffeRential Evolution Adaptive Metropolis (DREAM) Markov Chain Monte Carlo (MCMC) algorithm.
Other studies identified the meteorological forecasts as the most uncertain component of the cascade model (Todini, 2004; Pappenberger et al., 2005; Jaun et al., 2008), triggering interest in novel member selection techniques. For example, Marsigli et al. (2001), Molteni et al. (2001) and Jaun et al. (2008) selected MEPS members based on lagged ensembles, and derived representative members through hierarchical clustering over the domain of interest. Ebert et al. (2007) analysed the relation between atmospheric circulation patterns and extreme discharges to select representative members of MEPS. Finally, Xuan et al. (2009) established, in a deterministic way ("best match" approach), the location of the forecast that is most similar to the rainfall pattern of the catchment.
In the companion paper, Brochero et al. (2011) described in depth the hydrological member selection methodology adopted here: a Backward Greedy Selection combined with Cross-Validation, hereafter BGS-CV, to retain the uncertainty properties of an 800-member HEPS derived from the fifty members of the European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble propagated through sixteen simple lumped hydrological models.
Another aspect of particular interest in the evaluation of probabilistic forecasts, and therefore in hydrological member selection, is the identification of a pertinent set of criteria. In conventional forecasting, i.e. when confronting an observation with a single prediction, it is now generally accepted that the calibration of hydrological models should be approached as a multi-objective problem (Gupta et al., 1998, 1999; Yapo et al., 1998; Wagener et al., 2001; Confesor and Whittaker, 2007). Probabilistic forecasting is no different in that regard. In fact, the complexity of confronting an observation with an ensemble of predictions calls for a variety of criteria, here called scores, that each focus on one or more characteristics of the probabilistic set. To assess these properties, several statistical measures should thus be considered concurrently (Wilks, 2005; Cloke and Pappenberger, 2009). Few studies have approached hydrological member selection from a multi-score point of view. Vrugt et al. (2006) posed the BMA inverse problem in a multi-objective framework, examining the Pareto set of solutions between the Continuous Ranked Probability Score (CRPS), the Mean Absolute Error (MAE), and the Ignorance Score with the AMALGAM method (Vrugt and Robinson, 2007). In continuity with that, the companion paper shows that a combined criterion grouping various characteristics of the probabilistic forecast is adequate to guide the selection of hydrological members with the BGS-CV method. It is also important to note that the BGS-CV method offers the possibility of combining results from different studies, which has been highlighted as one of the avenues for improving HEPS (Cloke and Pappenberger, 2009).
In this paper we evaluate the generalization of a simplification scheme for the complex 800-member HEPS presented in Sect. 2. A brief description of the selection of hydrological members is given in Sect. 3. The generalization methodology, with local and regional orientations, is explained in Sect. 4. We first test whether the selections of hydrological members obtained in sixteen catchments for the 9-day lead time remain valid for the other 8 lead times. Additionally, we evaluate the ability to extrapolate the selections to neighbouring catchments. Finally, we present the integration of results from different catchments within a regional framework. Results and discussion are gathered in Sect. 5, while conclusions and guidelines for future work are given in Sect. 6.

HEPS configuration and catchment locations
As already mentioned, the 800-member HEPS at hand is the propagation of the 50 perturbed members of the ECMWF EPS, which are assumed a priori to be equally likely (Gouweleeuw et al., 2005), through sixteen lumped hydrological models. Details of the HEPS configuration can be found in Brochero et al. (2011).
This HEPS was implemented over 28 French catchments representing a large range of hydro-climatic conditions (Fig. 1), and evaluated over a 17-month period. The main characteristics of these catchments are summarized in Table 1. Henceforth, each basin in Table 1 is identified only by its first three characters.
It is important to note that this study evaluates probabilistic hydrological forecasting from a cooperative point of view, seeking diversity in the final selection of hydrological members, i.e. each member should complement the others. This clarification is needed to avoid interpreting the results as a competition between the different conceptualizations of the sixteen hydrological models used. Such a comparison would not be fair, because some models, such as GR4J, were specifically devised for the catchment scale, whereas others have undergone a series of substantial changes to bring them to a lumped form.
Hydrol. Earth Syst. Sci., 15, 3327-3341, 2011, www.hydrol-earth-syst-sci.net/15/3327/2011/

Hydrological members' selection
The selection of hydrological members is described in detail in the companion paper (Brochero et al., 2011). It is executed in three steps:
Step 1: Resampling with a variation of k-fold cross-validation. Because the series are short (500 forecast-observation pairs), a rigorous application of the selection requires evaluating different types of events in the training, validation, and test sets. The data are therefore partitioned following a k-fold cross-validation technique.
Step 2: Backward greedy selection. Optimization for a preselected number of hydrological members (nmin) relies on the Combined Criterion (CC), which brings together the Continuous Ranked Probability Score (CRPS), the IGNorance Score (IGNS), the Mean Squared Error (MSE) evaluated in the Reliability Diagram (RD), the δ ratio evaluated in the rank histogram, and the MeDian of the Coefficients of Variation (MDCV), combined as in Eq. (1), where the result of each criterion for the selected ensemble (subscript se) is divided by the criterion calculated on the initial 800-member HEPS (subscript ie); z_m represents thresholds that orient the terms towards a direct minimization, and w_cp are the weights assigned to each component. Here, the weight assigned to reliability (the critical factor) is twice that of the other factors, which have unit weight. The member elimination mechanism starts with all members and, at each step, removes the hydrological member whose removal has the greatest impact on the training set error (i.e. reduces the training error the most).
Step 3: Combination of results. The variability among the five experiments configured in the first step is very likely to lead to different solutions. An integration mechanism is thus needed to obtain a global solution for each catchment. The importance of each hydrological member within the ensemble is assumed to be directly proportional to the iteration number at which it was eliminated during the selection process in each experiment.
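The three steps can be sketched in Python as follows. This is a minimal sketch under explicit assumptions, not the companion paper's implementation: the ensemble CRPS stands in for the Combined Criterion, the k = 5 experiments are built from contiguous time blocks, only the training part of each experiment drives the elimination (no validation or test set is used), and all function names are hypothetical. Importance is the iteration at which a member is eliminated, so members that survive longer get larger ranks, and the mean rank across experiments gives the global ordering.

```python
import numpy as np


def crps_ensemble(ens, obs):
    """Mean CRPS over time; ens has shape (n_members, n_times), obs (n_times,)."""
    term1 = np.abs(ens - obs).mean(axis=0)
    term2 = np.abs(ens[:, None, :] - ens[None, :, :]).mean(axis=(0, 1))
    return float((term1 - 0.5 * term2).mean())


def backward_greedy_ranks(ens, obs, score=crps_ensemble):
    """Elimination iteration of each member (1 = removed first).

    At each iteration, the member whose removal yields the lowest score on
    the training data is dropped; the last survivor gets rank n_members.
    """
    remaining = list(range(ens.shape[0]))
    rank = np.zeros(ens.shape[0], dtype=int)
    iteration = 1
    while len(remaining) > 1:
        trial = [score(ens[[m for m in remaining if m != cand]], obs)
                 for cand in remaining]
        removed = remaining[int(np.argmin(trial))]
        rank[removed] = iteration
        remaining.remove(removed)
        iteration += 1
    rank[remaining[0]] = iteration
    return rank


def bgs_cv(ens, obs, n_keep, k=5):
    """Step 1: k resampling experiments; Step 2: backward greedy selection
    on each training part; Step 3: combination by mean elimination rank."""
    blocks = np.array_split(np.arange(obs.size), k)
    ranks = []
    for held_out in blocks:
        train = np.setdiff1d(np.arange(obs.size), held_out)
        ranks.append(backward_greedy_ranks(ens[:, train], obs[train]))
    mean_rank = np.mean(ranks, axis=0)
    # members eliminated last (highest mean rank) are the most important
    order = np.argsort(-mean_rank, kind="stable")
    return order[:n_keep], mean_rank


# Synthetic example: 12 members with increasing systematic offsets.
rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 5.0, size=100)
ens = obs + rng.normal(0.0, 3.0, size=(12, 100)) + np.linspace(-6.0, 6.0, 12)[:, None]
selected, mean_rank = bgs_cv(ens, obs, n_keep=4)
```

Backward elimination costs on the order of n² score evaluations per experiment, which is manageable for 800 members but is the main reason a cheap score matters.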
Care is needed when interpreting the final selection of hydrological members: if the HEPS is driven by a MEPS with interchangeable members (e.g. the ECMWF EPS), the selection should rather be read as a method for selecting and weighting hydrological models according to their participation in the final selected subset. Therefore, in the simplest case, a new simplified high-performance HEPS can be created by keeping the same proportion of each hydrological model and associating it with a random choice of meteorological members.
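A minimal sketch of this proportional scheme, assuming the per-model member counts are available as a dictionary. The counts below are hypothetical (they sum to 50), models are indexed 0 to 15 and meteorological members 0 to 49. For each model, meteorological members are drawn at random without replacement, so every (model, meteorological member) pair of the simplified HEPS also belongs to the initial 800-member HEPS.

```python
import numpy as np

N_MET = 50   # perturbed members of the ECMWF EPS
N_HYD = 16   # lumped hydrological models


def sample_by_model_counts(model_counts, rng):
    """Build one simplified HEPS as a list of (model, met_member) pairs.

    model_counts maps a model index (0..N_HYD-1) to the number of members
    that model keeps, e.g. its participation in a BGS-CV selection.
    """
    members = []
    for model, count in model_counts.items():
        if not 0 <= model < N_HYD or count > N_MET:
            raise ValueError(f"invalid count for model {model}")
        mets = rng.choice(N_MET, size=count, replace=False)
        members.extend((model, int(m)) for m in mets)
    return members


# Hypothetical participation of 8 models in a 50-member selection.
counts = {0: 6, 2: 9, 5: 8, 8: 7, 13: 8, 10: 4, 11: 4, 15: 4}
rng = np.random.default_rng(1)
heps_50 = sample_by_model_counts(counts, rng)
```

Repeating the draw with different seeds gives the family of BGS-CV-guided random selections used in the box-plot comparisons.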
Note that the CC can be used to compare the performance of a selection of hydrological members with that of the 800-member set. In a general framework where all features of the ensemble forecast have the same importance, a selection with performance equal to that of the 800-member set leads to a CC equal to 5; values lower than 5 indicate a selection of higher performance than the base set of 800 members, and values greater than 5 indicate that some feature of the 800-member set has deteriorated. Hereafter, this particular case of unit weights in the CC is called the normalized sum (NS). This distinction is important because it shows the priority that can be assigned a priori to any feature when training the selection of hydrological members with BGS-CV.
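The exact form of Eq. (1) is not given here, so the sketch below assumes a form consistent with the description above: lower-is-better scores enter as the ratio se/ie, the IGNS ratio is shifted by a threshold z1 so that both terms stay positive (the default z1 = −2 is an assumption), and the MDCV, which should be maximized, enters inverted as ie/se. With unit weights the result is the normalized sum, equal to 5 at parity. The example values come from the Q25 to P72 transfer discussed in the results, except the δ ratio, for which an arbitrary equal placeholder is used; the function name is hypothetical.

```python
SCORES = ("crps", "igns", "rd_mse", "delta", "mdcv")


def combined_criterion(se, ie, weights=None, z1=-2.0):
    """Weighted sum of per-score ratios (hypothetical form of Eq. 1).

    se, ie: dicts of scores for the selected and the initial ensemble.
    Lower values are better. Unit weights give the normalized sum (NS).
    """
    if weights is None:
        weights = dict.fromkeys(SCORES, 1.0)
    terms = {
        "crps": se["crps"] / ie["crps"],
        "igns": (se["igns"] - z1) / (ie["igns"] - z1),  # shift keeps ratio positive
        "rd_mse": se["rd_mse"] / ie["rd_mse"],
        "delta": se["delta"] / ie["delta"],
        "mdcv": ie["mdcv"] / se["mdcv"],  # inverted: larger dispersion is better
    }
    return sum(weights[name] * terms[name] for name in SCORES)


ie = {"crps": 0.16, "igns": -1.59, "rd_mse": 8.67e-3, "delta": 1.0, "mdcv": 0.15}
se = {"crps": 0.16, "igns": -1.65, "rd_mse": 4.21e-3, "delta": 1.0, "mdcv": 0.16}
ns = combined_criterion(se, ie)  # unit weights: normalized sum
cc = combined_criterion(se, ie, weights={**dict.fromkeys(SCORES, 1.0), "rd_mse": 2.0})
```

Doubling the reliability weight, as in the selection, makes the RD MSE term count twice, so a parity selection would score 6 rather than 5 under the CC.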
Note that the normalized sum may hide the deterioration of one metric compensated by one or more other metrics. It is thus necessary to accompany this measure with the results of each of its components for a collective analysis. In this sense, the analysis is facilitated if each component is associated with an index that reflects the gain or loss of the selected subset over the initial 800-member set (Eq. 2). The absolute value is used in the denominator to account for possible negative values of the IGNS. For the MDCV, the numerator is inverted, because the purpose of this metric is to maximize the dispersion of the selected subset of hydrological members.
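Equation (2) is likewise not shown here. A form that reproduces the figures reported in this paper is G = 100 (ie − se)/|ie| for scores to be minimized, with the numerator inverted to (se − ie) for the MDCV. Applied to the reliability-diagram MSE of the Q25 to P72 example (4.21 × 10⁻³ against 8.67 × 10⁻³), it gives about 51 %, the gain quoted in the results. This reconstruction is an assumption, sketched below with a hypothetical function name.

```python
def gain_index(se, ie, maximize=False):
    """Percentage gain of a selected-ensemble score (se) over the initial one (ie).

    For scores to be minimized the gain is 100 * (ie - se) / |ie|; for the
    MDCV (maximize=True) the numerator is inverted to (se - ie). The absolute
    value in the denominator keeps the sign meaningful for negative IGNS.
    """
    numerator = (se - ie) if maximize else (ie - se)
    return 100.0 * numerator / abs(ie)


rd_gain = gain_index(4.21e-3, 8.67e-3)             # about 51.4 %
igns_gain = gain_index(-1.65, -1.59)               # about 3.8 %
mdcv_gain = gain_index(0.16, 0.15, maximize=True)  # about 6.7 %
```

A positive index always means the selection improved on the 800-member set, whatever the sign of the underlying score.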

Generalization test methodology
The generalization ability of a hypothesis, namely the quality of its inductive bias, can be measured if data outside the training process are available. The methodology proposed in the companion paper simulates this by dividing the training set into two parts: one part is used for training (i.e. to find a hypothesis) and the remaining part (validation set) is used to test the generalization ability. Nevertheless, if the error is to be reported as an approximation of the expected selection error, a third set must be used: a test set, sometimes also called the publication set, containing examples used neither in training nor in validation (Alpaydin, 2010; Hudson and Demuth, 2011).
Thus, the method of combining results, based on the mean rank of elimination, draws on all the series in order to make the best use of the information in a short series (from the point of view of the periodicity of the hydrological cycle). However, the results of this procedure should be regarded as indicators of relative performance, or otherwise as an optimistic estimate of the performance of the selection process (Diamantidis et al., 2000).
Figure 2 shows the generalization (test) methodology for the selection of hydrological members at two levels: the local level focuses on the extrapolation of results to different forecast time horizons (FTHs) within the same catchment, while the regional level tests temporal and spatial performance in nearby catchments or, from a broader perspective, the integration of regional results.

Extrapolation to different forecast time horizons
The selection of hydrological members is performed on the results of sixteen hydrological models fed with the 9th-day FTH of the ECMWF MEPS. Applying this selection to the other eight FTHs (1 to 8 days) is thus a first-level test. It must be stressed that simplifying the HEPS is only valuable if the selection of hydrological members is invariant with respect to the FTH. However, one may argue that the assumption of statistical independence between the test and training data is somewhat questionable, particularly for FTHs close to the ninth day.

Extrapolation to a different catchment
Transferring selected members to a neighbouring catchment, and further to different FTHs, constitutes a rigorous test of the generalization ability of the results at both the temporal and spatial scales. The choice of the second catchment could first be viewed as a simple nearest-neighbour problem. However, we explored the possibility of regionalizing the selection of hydrological members by grouping catchments with k-means clustering and subsequently integrating the results to select the most representative hydrological members.
It is convenient at this point to define some notation to describe the assignment of catchments to a region or cluster. For each catchment l, with property set x_l, we introduce a corresponding set of binary indicator variables b_k^l ∈ {0, 1}, where k = 1, ..., K, describing which of the K clusters catchment l (or its property set x_l) is assigned to: if x_l is assigned to cluster k then b_k^l = 1, and b_j^l = 0 for j ≠ k. The objective function is then:
J = \sum_{l} \sum_{k=1}^{K} b_k^l \, \lVert \mathbf{x}_l - \mathbf{m}_k \rVert^2 \qquad (3)
which represents the sum of the squared distances of each catchment to its assigned cluster centre m_k. The goal is to find the values of b_k^l and m_k that minimize J. The iterative application of Eq. (3) leads to the following procedure for finding the centres m_k:
Algorithm 1 k-means pseudo-code
1. Define the number of clusters (K) (here …
Details of the k-means clustering algorithm are given by Bishop (2006). Figure 1 shows an example of k-means clustering results based only on the geographic location of the basin outlets.
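Since only the first step of Algorithm 1 is shown, the sketch below uses the standard Lloyd iteration described by Bishop (2006): an assignment step (setting b_k^l) alternates with an update step (recomputing m_k), and neither step increases J. The random initialization, the convergence test, the function names and the outlet coordinates in the example are illustrative assumptions.

```python
import numpy as np


def kmeans(x, k, n_iter=100, seed=0):
    """Lloyd's k-means on the rows of x (one row of properties per catchment).

    Returns (labels, centres): labels[l] is the cluster k with b_k^l = 1.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)]

    def assign(c):
        # squared distance of every catchment to every centre, shape (L, K)
        d2 = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centres)
        # update step: each centre moves to the mean of its catchments
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return assign(centres), centres


def objective(x, labels, centres):
    """J: sum of squared distances of each catchment to its assigned centre."""
    x = np.asarray(x, dtype=float)
    return float(((x - centres[labels]) ** 2).sum())


# Hypothetical outlet coordinates (arbitrary units) forming two groups.
outlets = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
labels, centres = kmeans(outlets, k=2)
```

Lloyd's algorithm only reaches a local minimum of J, so in practice several random initializations are usually compared and the lowest J kept.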

Regional integration mechanism
The integration of the selections of hydrological members for region X, consisting of C catchments, is defined from matrix S, which has C columns and nmin rows containing the nmin most important hydrological members of each catchment, as assessed by the mean rank of elimination (R). A regional solution rs with q members is then formed by taking the most important members of each catchment without replacement, i.e. a member cannot be selected again later, until the number of members in rs equals the desired q. Algorithm 2 details this procedure:
Algorithm 2 Regional integration mechanism pseudo-code
1. Determine the C catchments in region X (clustering process).
2. Define the matrix S.
3. Establish the number of hydrological members q in the regional solution rs.
4. Initialize rs = {}, h = 0 and i = 1.
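Only the first four steps of Algorithm 2 are listed. The sketch below completes the procedure as described in the text, assuming that members are taken row by row, i.e. the i-th most important member of every catchment in turn, and that a member already in rs is skipped. This visiting order, the (model, meteorological member) identifiers and the function name are assumptions.

```python
def regional_solution(S, q):
    """Regional selection rs of q distinct members.

    S is a list of C columns, one per catchment, each listing member ids
    from most to least important (mean elimination rank). Members are taken
    row by row across catchments, without replacement.
    """
    rs = []
    n_rows = max(len(column) for column in S)
    for i in range(n_rows):              # i-th most important member ...
        for column in S:                 # ... of each catchment in turn
            if i < len(column) and column[i] not in rs:
                rs.append(column[i])
                if len(rs) == q:
                    return rs
    return rs  # fewer than q distinct members were available


# Hypothetical (model, meteorological member) ids for two catchments.
S = [[(3, 12), (6, 40), (14, 7)],
     [(3, 12), (9, 33), (1, 2)]]
rs = regional_solution(S, q=4)
```

Here the member shared by both catchments enters rs once, and the slot it frees is filled by the next member in the row-by-row order.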

Diversity evaluation
The participation of hydrological models in the regional selection stresses the importance of integrating models with different characteristics. To view this in a deterministic framework, we propose an index based on the performance rank assigned to each model in each catchment. Its calculation is summarized as follows:
- The MSE for catchment i and hydrological model j is first calculated (MSE_{i,j}).
- Performances are then ranked within each catchment, leading to PR_{i,j}: the model with the lowest MSE is assigned rank PR = 16 and the model with the highest MSE rank PR = 1.
- Finally, the mean rank of performance, or rank index RI_j, of each model is estimated from the results of all 28 basins:
RI_j = \frac{1}{28} \sum_{i=1}^{28} PR_{i,j}
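The rank index takes only a few lines with NumPy, as sketched below. Ties in MSE are broken by model order, which is an assumption, and the 3-catchment, 3-model MSE matrix in the example is made up.

```python
import numpy as np


def rank_index(mse):
    """Mean performance rank RI_j of each model.

    mse has shape (n_catchments, n_models). Within each catchment the model
    with the lowest MSE gets rank n_models and the highest MSE gets rank 1.
    """
    mse = np.asarray(mse, dtype=float)
    n_models = mse.shape[1]
    # position 0 = lowest MSE within the catchment (row)
    position = np.argsort(np.argsort(mse, axis=1, kind="stable"), axis=1)
    pr = n_models - position   # PR_{i,j}
    return pr.mean(axis=0)     # average over catchments


mse = [[0.20, 0.35, 0.50],
       [0.40, 0.30, 0.60],
       [0.10, 0.25, 0.90]]
ri = rank_index(mse)
```

With the 16 models and 28 basins of this study, RI_j ranges from 1 (always worst) to 16 (always best).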

Results and discussion
In the companion paper we showed the high performance of the 800-member HEPS for the 9th-day FTH. However, as one of the objectives of this paper is to show the transferability of the selections of hydrological members to other FTHs, the performance of the 800-member HEPS in these scenarios must also be shown, to clearly establish our point of reference for judging the quality of the selections. In the companion paper we also stressed that the main advantages of the 800-member HEPS rest on the δ ratio and the RD MSE scores.
Figure 3 shows the behaviour of the HEPS for different set-ups and FTHs. Results focus on reliability (RD MSE) and ensemble consistency (δ ratio) for two schemes formed from the sixteen hydrological models, one driven by the deterministic ECMWF forecast and the other by the 50 perturbed members of the ECMWF EPS. The results in Fig. 3, expressed in terms of interquartile range (iqr) and median, are obtained by grouping the scores of the 28 basins evaluated here. Note that the δ ratio and RD MSE scores are directly comparable across basins since their scale is independent of the measured variable.
Figure 3 illustrates that the advantages of the 800-member HEPS become apparent after the 4th-day FTH. According to Velázquez et al. (2011), part of this difficulty may be inherited from the meteorological ensembles, which are not reliable before about a 3-day lead time. Furthermore, the spread of both the RD MSE and the δ ratio, viewed through the interquartile range, shows two features: first, the 16-member HEPS has greater dispersion than the 800-member HEPS, and second, the spread of the 800-member HEPS diminishes with increasing lead time.

Selection process
The optimal number of hydrological members for simplifying the HEPS was identified in the companion paper as lying between 50 and 100, depending on the catchment. In most cases, a significant gain was achieved with respect to the balance of the different criteria evaluated on the initial 800-member HEPS. Results presented in this section are based on a selection of 50 hydrological members.
Table 2 presents the results of the 50-member selection based on the combined criterion for 16 catchments uniformly distributed over France (see Fig. 1). The overall performance is the normalized sum given by Eq. (1) with unit weights: values lower than 5 indicate a selection of higher performance than the base set of 800 hydrological members, and values greater than 5 indicate a deterioration of some feature of the 800-member set.
To facilitate the visualization of results, Table 2 shows the performance of a single selection that follows the proportion of hydrological members found in the BGS-CV process. Figures 4 and 6, in contrast, present the performance of multiple selections guided by the BGS-CV solution combined with a random choice of the ECMWF meteorological members.
Based on the gain index formulation (Eq. 2), the CRPS and the MDCV show low variability for the 50-member selection, with mean gain indexes around 2 % and 5 %, respectively.
The RD MSE shows a minimum gain of 49 % (catchment B21) and a maximum gain of 87 % (catchment K17), reflecting the emphasis given to this property in the formulation of the combined criterion used in the selection process. For the IGNS, gain indexes between −5 % and 27 % (excluding catchment B21) reflect an acceptable behaviour.
Finally, the δ ratio is the most difficult score to minimize or preserve: a positive gain index was obtained in only 25 % of the cases (4/16), with values ranging from −39 % for catchment H53 to 27 % for catchment B31. Note that the δ ratio is inversely related to the number of members in the selection, which explains the difficulty of maintaining the value of the initial 800-member HEPS during the selection process. Nonetheless, the companion paper showed that the δ ratio is the best individual metric for the selection of hydrological members.
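The δ ratio is not defined in this part of the paper. Assuming the usual rank-histogram definition, δ = Δ/Δ0, where Δ sums the squared deviations of the N + 1 bin counts s_k from their expected value M/(N + 1) for M observations and an N-member ensemble, and Δ0 = MN/(N + 1) is the expected value of Δ for a reliable ensemble, it can be sketched as below. Values near 1 indicate a consistent ensemble, and larger values a departure from a flat histogram. Ties between members and the observation are ignored, which assumes a continuous variable; the function names are hypothetical.

```python
import numpy as np


def rank_histogram(ens, obs):
    """Counts s_k of the observation's rank among N members (N + 1 bins).

    ens has shape (N, T) and obs shape (T,); rank 0 means the observation
    is below every member, rank N above every member.
    """
    ens, obs = np.asarray(ens, dtype=float), np.asarray(obs, dtype=float)
    ranks = (ens < obs).sum(axis=0)
    return np.bincount(ranks, minlength=ens.shape[0] + 1)


def delta_ratio(ens, obs):
    """delta = Delta / Delta_0 for an N-member ensemble and M observations."""
    s = rank_histogram(ens, obs)
    n, m = np.asarray(ens).shape[0], len(obs)
    big_delta = ((s - m / (n + 1)) ** 2).sum()
    return float(big_delta / (m * n / (n + 1)))


# Three constant members; observations spread evenly over the 4 bins.
ens = np.array([[1.0] * 8, [2.0] * 8, [3.0] * 8])
flat = np.array([0.5, 1.5, 2.5, 3.5] * 2)
delta_flat = delta_ratio(ens, flat)             # perfectly flat histogram
delta_biased = delta_ratio(ens, np.zeros(8))    # every observation below
```

The biased case piles all counts into the first bin, the signature of an ensemble that systematically overestimates the observations.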

Local analysis
For operational convenience, it is fundamental that the 50 hydrological members selected for the 9th-day FTH are also appropriate for the 8 shorter time horizons. A lack of transferability of the selected members would considerably reduce the level of simplification actually achieved.
Here, temporal transferability is first evaluated by comparing the normalized sum of the 50-member selection with that of the 800-member set, whose normalized sum equals 5 in all cases. It is then compared with the performance of 200 random combinations of 50 hydrological members, to evaluate whether good performance may be attributable to chance alone. Results for the first 8 FTHs and sixteen basins are gathered in box-plot diagrams (Fig. 4), where the random experiments are set up following these guidelines:
- Experiments considering the participation of hydrological models found with BGS-CV: each model is assigned the number of members found with BGS-CV, with meteorological members chosen randomly from the ECMWF EPS.
- Experiments without any a priori participation of hydrological models: hydrological members are picked randomly from the initial 800-member HEPS.
Figure 4 shows that the median of the 200 evaluations of the 50-member HEPS selected for the 9th-day FTH is superior to the 800-member reference in 82 % of the evaluated cases. It is also noteworthy that in only 11 % of the cases (14/128) does the selection of 50 hydrological members guided by the BGS-CV process perform worse than the 25th percentile of the 200 random combinations. All these cases correspond to short lead times (1 to 3 days), notably the 2-day FTH. Another aspect that draws attention is the low dispersion of the BGS-CV-guided selections, as represented by the interquartile range, which highlights the importance of the participation of hydrological models in the selection process.
Figure 4 also shows that the selection slowly loses efficiency as it moves away from the 9th-day FTH. It also reveals a systematic deficiency for catchment A69 and, to a lesser extent, for catchment B21. Nonetheless, these results are very encouraging.

Regional analysis
As described in Sect. 4.2, the regional analysis assesses the generalization ability of the hydrological member selection for a specific catchment with respect to another one. For example, Fig. 5 shows the transfer of the selection obtained for catchment Q25 for a lead time of 9 days to catchment P72 for the 4-day lead time.
In general, Fig. 5 shows that results for the different scores are very similar for the 800-member and 50-member sets, except for the RD MSE, for which the gain index reaches 51 %. In particular, Fig. 5a shows that the 50-member CRPS equals the reference value. Since the CRPS reduces to the mean absolute error (MAE) for a point forecast (Gneiting and Raftery, 2007), it is worth stressing that the CRPS values are always lower than the MAE values obtained when the deterministic counterpart is taken as the mean of each daily ensemble, in agreement with results obtained by other authors (Boucher et al., 2009; Velázquez et al., 2011).
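The link between the CRPS and the MAE can be checked numerically with the kernel form of the ensemble CRPS, E|X − y| − ½E|X − X′|, in the minimal sketch below (function names are hypothetical). For a one-member ensemble the second term vanishes and the CRPS reduces to the absolute error of that member, which is the sense in which the CRPS generalizes the MAE.

```python
import numpy as np


def crps_series(ens, obs):
    """Daily CRPS of an empirical ensemble: E|X - y| - 0.5 * E|X - X'|.

    ens has shape (N, T), obs shape (T,); returns an array of length T.
    """
    ens, obs = np.asarray(ens, dtype=float), np.asarray(obs, dtype=float)
    spread = np.abs(ens[:, None, :] - ens[None, :, :]).mean(axis=(0, 1))
    return np.abs(ens - obs).mean(axis=0) - 0.5 * spread


def abs_error_of_mean(ens, obs):
    """Daily absolute error of the ensemble mean (deterministic counterpart)."""
    ens, obs = np.asarray(ens, dtype=float), np.asarray(obs, dtype=float)
    return np.abs(ens.mean(axis=0) - obs)


point = crps_series([[3.0, 1.0]], [1.0, 2.0])   # one member: equals |x - y|
pair = crps_series([[0.0], [2.0]], [1.0])       # two members around y
pair_mean_error = abs_error_of_mean([[0.0], [2.0]], [1.0])
```

The lower CRPS reported above is an empirical finding rather than an identity: with members {0, 2} and an observation of 1, the CRPS is 0.5 while the ensemble mean has zero error.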
Another remarkable feature of the CRPS is its direct relationship with flow magnitude: the shapes of the CRPS series and of the hydrograph are similar.
A direct optimization strategy could then focus on removing the hydrological members that have a large impact on the daily extreme CRPS values. Note also that the selection preserves not only the mean CRPS (0.16) but also the structure of the CRPS series.
Figure 5b shows that the trimmed mean IGNS of the 50-member HEPS (−1.65) is also an improvement over the initial value (−1.59). Regarding the time structure of the IGNS, both the 50-member and 800-member series have high values for extreme events, revealing a systematic problem in terms of ensemble bias.
With regard to the reliability diagram, Fig. 5c shows a considerable improvement in agreement (4.21 × 10⁻³) over the initial value (8.67 × 10⁻³). This gain in reliability may be traced back to the optimization criterion used: the combined criterion (CC), which focuses primarily on system reliability through its weights. Similarly, Fig. 5d reveals that the rank histograms have a nearly uniform distribution, even if the first and last ranks reflect a slight bias. These imperfections demonstrate the difficulty inherent in minimizing the δ ratio. Figure 5e illustrates the occurrence of each lumped model within the 50-member hydrological ensemble. A wide selection of models alone could justify the multi-model approach advocated here. Results show that 12 models out of 16 were selected in this case, and that no model was selected more than 9 times. Since these models are not of equal quality with regard to MSE performance, for instance, this suggests that the selection favoured a diversity of errors. At the end of the selection process, the MDCV has slightly increased, from 0.15 to 0.16.
To give an overview of the extrapolation of results to the nearest basin, Fig. 6 shows such an assessment under the same selection schemes analysed in Fig. 4, i.e. analysing various combinations that either follow or ignore the solution found with BGS-CV. Although the solution found with BGS-CV (red stars in Fig. 5) generally exhibits the highest performance, the interchangeability of MEPS members as inputs to the hydrological models leads us to focus on the median of the evaluations that follow the participation of hydrological models found with BGS-CV.
In addition, the dispersion of the BGS-CV-guided selections, evaluated from the interquartile range, is clearly smaller than that of completely random selections. Likewise, the median of the BGS-CV-guided selections is usually better than the reference set of 800 hydrological members, which corresponds to a normalized sum equal to 5.
It is also noteworthy that the selections obtained in basins A69, A79 and B21 are all extrapolated to basin A70; however, only the selection from basin A79 shows considerable efficiency in most of the FTHs evaluated. It follows that, while the geographic location of the basin outlet is an acceptable feature for extrapolating results, it is not sufficient in some cases, which calls for a more detailed analysis of other factors such as the hydrometeorological and physiographic characteristics of the basins.
The regional analysis integrating several basins, which seeks to identify the features that facilitate the combination of results, revealed that geographical location is the most important feature, followed by evapotranspiration, precipitation and flow, when the normalized sum is used to evaluate the gain. However, geographic location alone was found to be sufficient. The corresponding results are presented in Table 3, after application of the k-means algorithm and the regional integration procedure described in Sect. 4.2.2.
Note that the results in Table 3 come from the evaluation of a single combination of randomly chosen MEPS members that respects the participation of hydrological models found with BGS-CV. Additionally, for the purpose of extrapolating results, a threshold z_1 equal to −4 was used in the evaluation of the normalized sum, because some trimmed mean IGNS values lower than −2 were obtained for the first lead times (1 to 4 days). In Table 3, the normalized sum (NS) for the 9-day FTH is generally lower than 5 for the catchments subjected to regional integration (except basin A70). Furthermore, in 44 % of these assessments (catchments H24, K17, U25, J85, K73, H36, and H53), the regional integration gives better results than the local performance indicators shown in Table 2. While the regional integration in clusters 1, 2 and 3 yields normalized sums lower than 5 in 85 % of the cases, the remaining 15 % corresponding mainly to the first lead times (1 to 3 days), clustering followed by regional integration is less efficient for clusters 4 and 5, whose normalized sums are higher than 5 in 65 % of the cases.
The behaviour of cluster 5 is inherited from the low extrapolation efficiency highlighted for basins A69, A92, and B21 (Fig. 6). The proposed regional integration mechanism therefore behaves consistently, since its efficiency depends on the performance of its components.
With regard to cluster 4, the regional solution shows a lower diversity of hydrological models. This is evident in Fig. 7, which shows that for this cluster 70 % of the hydrological members originate from only three hydrological models (HM03, HM06, and HM14), a behaviour quite different from that of clusters 1, 2 and 3, where the proportion of the three most selected models reaches 58 %, 56 %, and 44 %, respectively.
Thus, diversity in the final selection of hydrological members appears to have a significant impact on the performance of the selection. In other words, the participation of hydrological models in the regional selection stresses the importance of integrating models with different characteristics. Viewed in a deterministic framework, the index based on the performance rank assigned to each model in each catchment (Sect. 4.2.3) shows that the most selected models (HM01, HM03, HM06, HM09, and HM14) occupy quite different ranks (Fig. 7). For instance, HM03 and HM09 show high performance, while HM01, HM06 and HM14 are of lower performance.
This feature exemplifies the notion of diversity discussed at various stages in the literature on ensemble methods. Alpaydin (2010) showed statistically that if the outputs of an ensemble of d models are independent and identically distributed, the error variance of the ensemble average is reduced by a factor d, and that it decreases further when the errors are negatively correlated. Vrugt et al. (2008) proposed positive correlation (lack of diversity) as an efficient criterion for removing members from an ensemble. Diversity can be defined as the search for models with complementary skills, so that each model focuses on different cases. Diversity in the ensemble is thus a vital requirement for successful modelling. In practice, it has proved difficult to define a single measure of diversity, and even more difficult to relate that measure to ensemble performance through a neat and expressive dependency (Kuncheva, 2004). Nevertheless, the regional clusters in Fig. 7 make use of most of the 16 available models, whatever their performance rank. For example, the most frequently selected models in cluster 2 are HM03 and HM06, even though HM02 has the same performance rank as HM03 and HM06 has one of the lowest ranks in the ensemble.
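The variance argument can be checked with a short computation, assuming the d model errors share a common variance σ² and a common pairwise correlation ρ. The variance of their average is 1ᵀΣ1/d² = (σ²/d)(1 + (d − 1)ρ): it equals σ²/d for independent errors, grows with positive correlation, and falls below σ²/d for negatively correlated errors (a valid covariance matrix requires ρ ≥ −1/(d − 1)). The values of d and ρ below are arbitrary and the function name is hypothetical.

```python
import numpy as np


def var_of_average(sigma2, d, rho):
    """Variance of the mean of d errors with variance sigma2 and correlation rho.

    Builds the equicorrelated covariance matrix and returns 1' Sigma 1 / d**2.
    """
    if d > 1 and rho < -1.0 / (d - 1):
        raise ValueError("rho too negative: covariance matrix not valid")
    cov = sigma2 * (rho * np.ones((d, d)) + (1.0 - rho) * np.eye(d))
    ones = np.ones(d)
    return float(ones @ cov @ ones / d**2)


independent = var_of_average(1.0, 16, 0.0)   # sigma2 / d = 0.0625
redundant = var_of_average(1.0, 16, 0.5)     # (1 + 15 * 0.5) / 16 = 0.53125
diverse = var_of_average(1.0, 16, -0.05)     # (1 - 0.75) / 16 = 0.015625
```

With 16 strongly correlated members, averaging barely helps, which is why redundancy is a natural elimination criterion.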

Conclusions
A companion paper has already demonstrated the success of the backward greedy member selection technique for simplifying an 800-member HEPS that combines the 50 perturbed members of the ECMWF MEPS with 16 lumped hydrological models (Brochero et al., 2011). The present paper has focused on how well a 50-member HEPS, selected from the 800-member ensemble for the 9-day FTH, generalizes in time and space. When applied to the other 8 time horizons, the 50 selected members also improved performance over the initial 800-member HEPS in 82 % of the situations. The selection was particularly successful when transferred to a nearby catchment of the same cluster. Member diversity seems to be the key to this simplified HEPS, which uses only 6.25 % of the initial structures (50 members out of 800). Indeed, most 50-member HEPS were shown to rely on a broad selection of hydrological models, which gives further support to the multi-model hydrological approach.
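The general idea of backward greedy selection can be sketched as follows: start from the full ensemble and repeatedly drop the member whose removal gives the best remaining score, until the target size is reached. This is a simplified, hypothetical illustration, not the BGS-CV procedure of the companion paper: it uses the empirical ensemble CRPS as the only criterion and omits both cross-validation and the combination of several scores. The function names are introduced here for illustration.

```python
from typing import List, Sequence


def ensemble_crps(members: Sequence[float], obs: float) -> float:
    """Empirical CRPS of one ensemble forecast (lower is better):
    mean|x_i - y| - 0.5 * mean|x_i - x_j|."""
    m = len(members)
    term1 = sum(abs(x - obs) for x in members) / m
    term2 = sum(abs(xi - xj) for xi in members for xj in members) / (2 * m * m)
    return term1 - term2


def mean_crps(forecasts: Sequence[Sequence[float]],
              observations: Sequence[float],
              idx: Sequence[int]) -> float:
    """Average CRPS over all time steps, using only the members listed in idx."""
    total = 0.0
    for fc, obs in zip(forecasts, observations):
        total += ensemble_crps([fc[i] for i in idx], obs)
    return total / len(observations)


def backward_greedy_selection(forecasts: Sequence[Sequence[float]],
                              observations: Sequence[float],
                              target_size: int) -> List[int]:
    """Hypothetical sketch: remove one member at a time, always the one
    whose removal yields the lowest mean CRPS of the remaining ensemble."""
    selected = list(range(len(forecasts[0])))
    while len(selected) > target_size:
        best_score, member_to_drop = None, None
        for candidate in selected:
            remaining = [i for i in selected if i != candidate]
            score = mean_crps(forecasts, observations, remaining)
            if best_score is None or score < best_score:
                best_score, member_to_drop = score, candidate
        selected.remove(member_to_drop)
    return selected
```

In a toy case where members 0 and 1 bracket the observations and members 2 and 3 are far off, the procedure discards members 3 and 2 first. Each removal step evaluates every remaining member, so reducing 800 members to 50 in this naive form is costly; the sketch favours readability over speed.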
Comparing the scores obtained for the 50 representative hydrological members with those of the initial 800-member ensemble indicated that the proposed selection methodology, which is based on cross-validation and on the combination of scores into a single function, generally leads to good performance in terms of gains in individual scores. However, these gains were not entirely transferable under the extrapolation scheme evaluated here. This drawback may be partly attributable to the simplicity of the selection methodology, which relies on a linear combination of scores with no real control over the balance among them, or to the need to evaluate more features in the clustering approach to enhance transferability.
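The lack of control over balance in a linear combination of scores can be illustrated with a hypothetical normalized sum. The exact form of Eq. (1) is not reproduced in this section, so the sketch assumes that each score is negatively oriented and divided by its value for the reference ensemble. Under that assumption, five scores with unit weights give 5 for a selection equal to the reference, consistent with the threshold of 5 quoted in Table 3.

```python
from typing import Optional, Sequence


def normalized_sum(selection_scores: Sequence[float],
                   reference_scores: Sequence[float],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Hypothetical normalized sum: weighted sum of score ratios
    (selection / reference), assuming every score is lower-is-better."""
    if len(selection_scores) != len(reference_scores):
        raise ValueError("score lists must have the same length")
    if weights is None:
        weights = [1.0] * len(selection_scores)  # unit weights
    return sum(w * s / r for w, s, r in
               zip(weights, selection_scores, reference_scores))


# A large gain in one score can mask losses in three others:
ratios = [0.5, 1.2, 1.1, 1.05, 1.0]
print(normalized_sum(ratios, [1.0] * 5))  # below 5, yet 3 of 5 scores got worse
```

This is exactly the kind of imbalance that a multiobjective (Pareto) formulation would expose rather than average away.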
A more sophisticated approach would optimize all performance diagnostics simultaneously, or find a Pareto set of solutions that identifies the trade-offs among the various performance metrics. Such a framework, albeit for the combination rather than the selection of hydrological members, was proposed by Vrugt et al. (2006). It consists of optimizing the Bayesian Model Averaging (BMA) weights and variance with the AMALGAM (A Multi-ALgorithm Genetically Adaptive Multiobjective) method.
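The notion of a Pareto set can be illustrated with a minimal non-dominated filter over candidate selections, each described by a vector of scores (for example CRPS, IGNS and RD MSE), all assumed here to be lower-is-better. This is only the dominance test that defines the Pareto set; it is not AMALGAM, which is a multi-algorithm evolutionary search for such a set.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if score vector a is no worse than b everywhere and strictly
    better somewhere (minimization of every score)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: Sequence[Sequence[float]]) -> List[int]:
    """Indices of candidates that no other candidate dominates."""
    return [i for i, a in enumerate(candidates)
            if not any(dominates(b, a) for j, b in enumerate(candidates) if j != i)]


# Hypothetical two-score example: (2.5, 2.5) is dominated by (2.0, 2.0),
# and (1.0, 3.5) is dominated by (1.0, 3.0).
cands = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.5, 2.5), (1.0, 3.5)]
print(pareto_front(cands))
```

Every member of the resulting set represents a different trade-off between scores, so the final choice among them becomes an explicit decision rather than a by-product of fixed weights.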
Finally, for a HEPS driven by interchangeable meteorological members, it would be interesting to combine the participation of hydrological models found with BGS-CV with meteorological members chosen by a technique such as the one proposed by Molteni et al. (2001), instead of drawing them at random.
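The random experiments of Figs. 4 and 6 (200 draws of 50 members, summarized by their interquartile range) can be sketched as two samplers: one that keeps the number of members contributed by each hydrological model and draws the meteorological members at random, and one that draws 50 of the 800 (meteorological member, model) pairs completely at random. The model counts, the scoring function and all names below are hypothetical placeholders; the `rng.sample` call in `proportional_sample` is where a representative-member technique such as that of Molteni et al. (2001) could replace the random draw.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

# A hydrological member is a (meteorological member index, hydrological model id) pair.
Member = Tuple[int, str]


def proportional_sample(model_counts: Dict[str, int], n_met: int,
                        rng: random.Random) -> List[Member]:
    """Keep the per-model counts of a selection, but draw the
    meteorological members at random (without replacement per model)."""
    sample: List[Member] = []
    for model, k in model_counts.items():
        for met in rng.sample(range(n_met), k):
            sample.append((met, model))
    return sample


def fully_random_sample(models: List[str], n_met: int, size: int,
                        rng: random.Random) -> List[Member]:
    """Draw `size` members from all n_met x len(models) combinations."""
    pool = [(met, model) for model in models for met in range(n_met)]
    return rng.sample(pool, size)


def iqr_of_experiments(draw: Callable[[], List[Member]],
                       score: Callable[[List[Member]], float],
                       n_experiments: int = 200) -> float:
    """Interquartile range of a score over repeated random draws."""
    values = [score(draw()) for _ in range(n_experiments)]
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Illustrative usage with hypothetical counts (not the values of Fig. 7):
rng = random.Random(0)
counts = {"HM03": 20, "HM06": 15, "HM14": 15}
models = [f"HM{i:02d}" for i in range(1, 17)]
oriented = proportional_sample(counts, 50, rng)
random_50 = fully_random_sample(models, 50, 50, rng)
```

In practice `score` would compute the normalized sum of the selected members' forecasts against observations; here it is left as a parameter because that computation depends on data not given in this section.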

Fig. 1. Location of the catchments grouped by clusters. Some of them have been used in the BGS-CV process, while the others have been used for extrapolation. The colours identify the five regions evaluated in this paper.

Fig. 2. Generalization test methodology for the selection of hydrological members found with BGS-CV.

Fig. 3. Interquartile range (iqr) of RD MSE and δ ratio assessed in the 28 catchments under two HEPS schemes: the 16-member HEPS (16 hydrological models driven by the deterministic ECMWF forecast) and the 800-member HEPS (16 hydrological models driven by the 50 perturbed members of the ECMWF forecast).

Fig. 4. Evolution of the normalized sum (NS) used to evaluate the response sensitivity, in terms of the interquartile range (iqr) of 200 random experiments at different FTHs, under two guidelines: (1) considering the participation of hydrological models found with BGS-CV (vertical blue bars), and (2) without regard to any a priori participation of hydrological models, i.e. completely random selection (vertical cyan bars).

Fig. 5. Comparison between the initial ensemble (800 members) and the selected ensemble (50 members) for a lead time of 9 days. (a) Top: observed flow; bottom: CRPS (x-axis formatted as day/month). Note the correspondence between higher observed flows and higher CRPS. (b) Top: observed flow; bottom: IGNS (x-axis formatted as day/month). (c) Reliability diagram error (MSE based on vertical distances between the points). (d) Rank histogram for the 50 selected hydrological members. The horizontal dashed line indicates the frequency N/(d + 1) expected for a uniform distribution. (e) Occurrences of the models used in the final solution of 50 hydrological members.
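The rank histogram of panel (d) and its uniform reference level can be sketched as follows, assuming N forecast/observation pairs and d members, which gives d + 1 possible ranks. For simplicity, an observation equal to a member is counted as not below it; ties are usually broken at random instead.

```python
from typing import List, Sequence


def rank_histogram(forecasts: Sequence[Sequence[float]],
                   observations: Sequence[float]) -> List[int]:
    """Count how often the observation falls at each rank 0..d among d members.
    Ties are not randomized here: members equal to the observation are not counted as below it."""
    d = len(forecasts[0])
    counts = [0] * (d + 1)
    for members, obs in zip(forecasts, observations):
        rank = sum(1 for x in members if x < obs)  # number of members below obs
        counts[rank] += 1
    return counts


def uniform_frequency(n_cases: int, d: int) -> float:
    """Expected count per bin for a perfectly uniform (reliable) histogram."""
    return n_cases / (d + 1)
```

A flat histogram close to N/(d + 1) in every bin is consistent with a statistically reliable ensemble, whereas U or dome shapes indicate under- or over-dispersion.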

Fig. 6. Evolution of the normalized sum (NS) used to evaluate the response sensitivity of the extrapolation of results to the nearest catchments. Each vertical bar represents the interquartile range (iqr) of 200 combinations of 50 hydrological members under the following guidelines: the combination keeps the same proportion of hydrological models as found with BGS-CV (blue vertical bars), or the selection is completely random (cyan vertical bars). Note the poor extrapolation of the selection from basin A69 to basin A79, notably for early lead times (2 to 5 days); these results do not appear in the figure because their values are above 7.

Fig. 7. Participation of hydrological models. Distributions in the five regions (clusters) are presented in (a)-(e). Model performance, evaluated as the mean rank index, is shown in (f).

Table 1.
Main characteristics of the studied basins (mean annual values) based on 36-year series.
P: precipitation, ET: potential evapotranspiration, Q: flow. To distinguish the basins used for training from those used for testing, the latter are highlighted in bold.

Table 2.
Selection of 50 hydrological members based on the combined criterion and the BGS-CV process for the 9-day FTH. Next to each score, the gain index given by Eq. (2) is shown. NS is the normalized sum (Eq. 1 with unit weights). NHM is the number of hydrological models participating in the selection. RD MSE values are expressed in units of 10⁻³.

Table 3.
Normalized-sum test of the regional integration in new catchments and at different FTHs, where regions are defined by clustering the geographical locations of the basin outlets. Values lower than 5 indicate that the scores of the selection are better than those of the reference set. See the distribution in Fig. 1. In each cluster, the catchments highlighted in bold are the series not used in training the hydrological member selection.