Fuzzy committees of specialized rainfall-runoff models: further enhancements and tests

Often a single hydrological model cannot capture the details of a complex rainfall–runoff relationship, and a possibility here is building specialized models to be responsible for a particular aspect of this relationship and combining them to form a committee model. This study extends earlier work of using fuzzy committees to combine hydrological models calibrated for different hydrological regimes – by considering the suitability of the different weighting function for objective functions and different class of membership functions used to combine the specialized models and compare them with the single optimal models.


Introduction
Conceptual rainfall-runoff models are based on fluxes and storages representing relevant hydrological processes, and one of the challenges is to identify a set of parameters characterizing the behaviour of time-varying streamflows in a catchment.In lumped models the parameters cannot be measured directly due to the dimensional and scaling problems (Beven, 2000).These are calibrated based on the measurement of meteorological forcing data to produce model predictions that are as close as possible to the observed discharge data using some degree of expertise and experience.Typically this approach focuses on a single model using the best single set of parameters.However the model produced by one best set of parameters might not equally well describe the characteristic of the hydrological processes for all ranges of flow, and multiple models can be built from different components of flow hydrograph that correspond to different flow regimes.These models can be then combined providing a more comprehensive and accurate representation of catchment processes.Such models are referred to as multi-models, or committee models.
Multi-model approaches are not new in hydrological modelling -examples are the early works of Keefer and Mc-Quivey (1974), Todini and Wallis (1977), Bruen (1985) and Becker and Kundzewicz (1987), who built piece-wise linear models instead of the overall linear hydrological model.Cavadias and Morin (1986) aggregated several watershed models which were considered by WMO (1986) for intercomparison of their model performances.Juemoe et al. (1987) combined a conceptual model and a statistical model, which is known as synthesized constrained linear systems model.This model was developed by a combination of the Xinanjiang model (Zhao, 1977) and the constrained linear system model (Todini and Wallis, 1977).McLeod et al. (1987) combined three models, namely, transfer function noise model, periodic autoregressive model and conceptual model for flow forecast.Since then various authors have been exploring various approaches to identification of different hydrological regimes and the ways of combining specialized models, both process-based and data driven (e.g.Shamseldin et al., 1997;Abrahart and See, 2002;Solomatine and Xue, 2004;Anctil and Tape, 2004;Solomatine, 2006;Oudin et al., 2006;Ajami et al., 2006;Fenicia et al., 2007;Nasr and Bruen, 2008;Cullmann et al., 2008;Toth, 2009).
Published by Copernicus Publications on behalf of the European Geosciences Union.

N. Kayastha et al.: Fuzzy committees of specialized rainfall-runoff models
This paper continues to explore and improve the dynamic combination "fuzzy committee" method outlined in Solomatine (2006) and further developed and tested in Fenicia et al. (2007).Weights assigned to each specialized model's output are based on optimally designed fuzzy membership functions, and they may be different at every time step depending on the current value of flow.In the present paper we test the performance of committee models that use several weighting schemes in objective functions for calibration of specialized models, as well as different membership functions to combine models.We also test their performance on test data sets.The method is tested on three catchments, and two approaches of optimization are used: (i) multi-objective optimization non-dominated sorted genetic algorithms (NSGA II) by Deb et al. (2002) to find Pareto-optimal solutions of specialized models and (ii) a single objective optimization -genetic algorithm (GA) (e.g.see Goldberg, 1989) and adaptive cluster covering algorithm (ACCO) by Solomatine (1999) are used to calibrate the single specialized models and single optimal models.

Lumped conceptual modelling
A simplified version of HBV model (Lindström et al., 1997;Fenicia et al., 2007) is used for this study.This is a lumped conceptual hydrological model which describes hydrological processes at catchment scale.The model comprises subroutines for snow accumulation and melt, soil moisture accounting procedure, routines for runoff generation, and a simple routing procedure.The model has 13 parameters; however, only 9 parameters are effectively used when there is no snowfall.

Building specialized models
We can build several sub-models instead of using only one single model to characterize better the various regimes which represent the catchment hydrological behaviour.The submodels are also called "specialized models".One of the approaches of multi-modelling has been implemented in this study.The details of such an approach, previously adopted by Fenicia et al. (2007), are briefly outlined below and complemented by the possibilities of its further improvement.We considered high flows and low flows as distinctive regimes, or states of the system behaviour.Our aim was to reproduce the system response during both regimes accurately.In order to evaluate the performance of the single hydrological model in both conditions, the two weighted objective functions are used, where one is stressing the model error with respect to low-flow simulation, and the other stressing the model error with respect to high flows.
The two objective functions are defined as follows: where n is total number of time steps, Q s,i is simulated flow for the time step i, and Q o,i is observed flow for the time step i.The types of the weighting functions (schemes) together with their parameters will be referred to further as WStype, and corresponding equations and figures are given below.
WStype WLF,i WHF,i Eq.Figures In above equations the variables l and h are calculated as where Q o,max is maximum observed flow, N is power value (for the experiment in this study we considered only 1, 2 or 3), and α is the threshold for selecting weights of flows (it was chosen to be 0.75).Note that both N and α can be also subjected to optimization, but in this study it was not done.By computing both objective functions over the whole range of discharges, both functions constrain the model to fit the entire hydrograph for WStype I where α parameter is not used.However for WStype with α parameter, W LF excludes high flows from computation of objective function if the condition is l > α.In the same way, W HF excludes low flows if h ≤ α for WStype III and IV.One potential issue related to the scaling formulas (Eq.7) is worth mentioning: Q o,max is the maximum for calibration data, but this of course does not guarantee that it will not be superseded in the future when model is in operation (or when simulating operation by using verification data).The quadratic function will still handle values above 1, but if the calibration maximum is exceeded considerably, then the high flow will be given disproportionally high weights, and low flows disproportionally low ones.A solution could be in using a bit wider range for scaling.

Combining specialized models
The specialized models are built under the conditions of different regimes of catchment hydrological responses and are combined using the appropriate combining scheme.However the issue is how to handle the compatibility at the boundaries between the two different specialized models.One of the possible ways is to use a soft weighting scheme that switches smooth transition between boundaries.The contribution of each specialized model is based on using a fuzzy membership function -the so-called "fuzzy committee" described by Solomatine (2006).In this weighting scheme we initially used trapezoidal functions parameterized by the two transitional parameters [γ , δ] (see Fig. 2a): the membership function (weight) of low-flow model is assigned 1 when the relative flow is below the parameter γ , then starts to decrease in the proximity of the region boundary when the relative flow is between γ and δ; it decreases to zero beyond the boundary when the relative flow is above δ.Similarly, the membership function of the high-flow model follows the opposite logic.These membership functions for the two specialized models are described by Eqs. ( 9) and ( 10).The outputs of models are multiplied by the weights that depend on the value of flow and then normalized (Eq.8).So the overall committee model is defined as follows: where m LF and m HF are membership functions (denoted also as MFtype) for the two specialized models, Q LF,i and Q HF,i are simulated high and low flows for the time step i; γ and δ are threshold for high and for low flows respectively, N is power value which was used to smooth between the models (if N = 1, MFtype is referred to as Type A (see Fig. 2a), and for N = 2 or more -as Type B (see Fig. 2b)).
Building committee models consists of the following steps.First the two optimal specialized models -model 1 for low flow (Q LF,i ) and model 2 for high flow (Q HF,i ) -are sought using optimization (minimizing RMSE LF for model 1 and RMSE HF for model 2); this can be done by solving a single-objective optimization problem separately for these two models, or by multi-objective optimization for two objective functions RMSE LF and RMSE HF .After that model 1 and model 2 are combined using the two membership functions m LF and m HF whose two parameters δ and γ are found by optimization to ensure the lowest classical root mean squared error (RMSE) of the committee model (Fig. 2).The resulting model is subsequently verified (tested) on the verification (test) data set, and compared to the single hydrological model (which is optimized by a single-objective optimization algorithm) based on RMSE as an objective function and in addition evaluated using Kling-Gupta efficiency index (KGE) proposed by Gupta et al. (2009).They discuss limitations of using a single criterion like Nash-Sutcliffe efficiency (NSE) (or closely related mean squared error, MSE) and show that it can be decomposed into the three components -representing correlation, variability, and bias.These authors also suggest aggregating this information into one formula (representing distance in the space of the three mentioned criteria to the ideal point, which is zero).They state however that "the primary purpose of this study was not to design an improved measure of model performance, but to show clearly that there are systematic problems inherent with any optimization that is based on mean squared errors (such as NSE).The alternative criterion KGE was simply used for illustration purposes."We decided that this statement is too modest and followed the other authors (e.g.Pechlivanidis et al., 2011;Coron et al., 2013) that explicitly use KGE as one of the model performance measures.An additional advantage of using KGE is that it is a relative measure permitting intercatchment comparison.Using RMSE along with KGE, we run the risk of having an overlap (since KGE is in fact constructed of components of NSE and hence RMSE), but still consider this to be useful to do to follow an adopted hydrological practice, and to allow for having a better judgement about various facets of model performance including an explicit absolute measure of model error.
The equations of RMSE and KGE are given below.
where Q o,i is the observed discharges for the time step i, Q s,i is the simulated discharges (single optimal or committee models) for the time step I , and n is the number of observations.
where c is the linear cross-correlation coefficient between Q o and Q s , α is a measure of variability in the data values (equal to the standard deviation of Q s , over the standard deviation of Q o ) , and β is equal to the mean of Q c , over the mean of Q o .KGE is subject to maximization with an ideal value at unity.

Results and discussion
Three catchments, namely, Alzette catchment in Luxembourg, Leaf River catchment in USA and Bagmati catchment in Nepal, are selected for case study.The summary statistics and records of data for calibration and verification of catchments are presented in Table 1.These data cover multipleyear periods (except Alzette), all seasons and multiple peak flows.Ideally, we have to try to split data into statistically similar sets (coverage of seasons, number and size of peaks, variance, mean, etc.).Of course in this type of splits of hydrological data, one is constrained by the wish to keep data in contiguous blocks (to be able to plot the time-series data such as hydrographs), so the calibration and verification data sets almost always have statistical differences.
The experiment follows the one used in an earlier study (Fenicia et al., 2007) where the Alzette catchment was considered, and only calibration data were considered for building the models without further validation.We present here two additional catchments (Leaf and Bagmati) with both calibration and verification periods, and compare the overall model performance when using different weighing schemes for objective functions (Fig. 1) and different membership functions (Fig. 2).First two months of calibration data are considered as the warming-up in Leaf and Bagmati catchment.However for Alzette catchment we used the hourly data set of only one year for the calibration period and one year for verification.To somehow compensate for the lack of data in calibration, we allocated 168 h of data for the warmup period.
The ranges of HBV model parameters for optimization are given in Table 2.We produced the specialized models (the best single model specialized on high flows and low flows) which are optimized by multi-and single-objective optimization algorithms.The identified best sets of parameters for different models are given in Table 3.It is worth mentioning that in this paper we use only one model structure (HBV), but the use of several calibration methods and several error functions results in several model parameterizations, or instantiations.For simplicity we will be speaking of several models, meaning actually the same model structure but its several parameterizations.
We present results of calibration using several optimization algorithms: NSGA-II, GA and ACCO.The reasons for using different optimization algorithms for calibration are as follows: (1) the initially used GA appeared to be quite slow (in terms of the required model runs), so we decided to test the use of faster algorithms as well; (2) to cross-check one by another since they both use randomization of initial population and this affects results.We have also tried the use of a stepwise line search (SLS) algorithm (Kuzmin et al., 2008), which showed its effectiveness and efficiency, and its working depends less on the randomized generation of the initial population but uses rather some a priori estimates; these tests however were not finalized, so we decided not to present the intermediate results in this comparison.
In each experiment a committee model is compared with the single optimal model which is calibrated by two different single-objective optimization algorithms (GA and ACCO).
The best single models specialized for low and high flows respectively (found by NSGA-II, GA and ACCO) are used in the committee model.The points denoted "committee models" correspond to the model parameterizations generated during the exhaustive search for the best γ and δ ensuring the lowest RMSE.We also tested a committee model which is built by combining the specialized models, and compared against the single optimal model for all catchments.In Tables 4 and 5 these models are denoted as Q c (ACCO), Q c (GA) and Q c (NSGA-II).Interestingly, the committee model is better on both objective functions (RMSE and KGE) than the single optimal models for all case studies.
The graphs of Pareto-optimal of single specialized models (calibrated by NSGA-II), committee models and single optimal models (calibrated by ACCO and GA) for Leaf catchment are shown in Fig. 3 (Kayastha and Solomatine, 2013).In this plot one may see the 20205 single specialized models presented as they were generated during multi-objective optimization process by using WStype I weighting scheme.The 70 model parameterizations identified as the best (Paretooptimal) are presented as well (by the darker points).The Pareto-optimal models in calibration are not necessarily the best in verification -this is of course no surprise; what is however important is to stress that their performance is not too much lower than that of calibration data.Among the Pareto-optimal models, the best single model specialized on low flows obtained 11.01 in RMSE LF (all errors are in m 3 s −1 ), and it can be seen from Figure 3b that this model is also very close to be the best in verification with RMSE LF being 18.33.However, the best single model specialized on high flows (RMSE HF is 6.08) is not too bad, but not the best in verification (its RMSE HF is 8.42 and in Fig. 3b it is easy to see that there are many models with lower RMSE HF ).
The committee model resulted in the classical RMSE of 15.63 in calibration and 25.23 in verification.To represent this model in Fig. 3 (solid square), we had to calculate for it the corresponding RMSE LF and RMSE HF , and we did it  using the same type of weighing scheme (WStype I) which was used in calibration of specialized models.In the same way we presented the single optimal models identified using the two single objective optimization methods (ACCO and GA represented by a circle and a star respectively).It can be seen that the committee models are closer to the ideal point than the other single optimal models, and this means that the committee model's performance is the highest of any of the single models.Plots for Bagmati and Alzette are not presented here due to space limitations.The performances of the best single models specialized on high and low flows (RMSE HF and RMSE HF ) on various catchments are presented in Table 4.However, again, it should be noted that RMSE LF , and RMSE HF cannot be compared since they use different formulas.RMSE LF values are even higher than those of RMSE HF -the reason is that the number of low flows is much higher than of high flows, and the denominator (total number of observations) in both formulas is the same.
In Leaf catchment we tested all possible combinations of different weighting schemes types and classes of membership functions, the results of which are presented in Table 5. Noticeably, all committee models improved their performances in verification in comparison to the single hydrological models which were optimized by single objective optimization.However, in the other two catchments Alzette and Bagmati, the number of experiments was smaller, but in all of them the committee models demonstrated the higher performance in both calibration and test data sets.
Table 4 reports the performance of committee models and single-optimal models calibrated by ACCO and GA for each catchment.For the Bagmati catchment, the RMSE calculated by single optimal model in calibration is 101.01 m 3 s −1 , and verification is 112.42 m 3 s −1 .However, it can be noticeably improved by the committee models and obtained around 94.16-95.96m 3 s −1 in calibration and 109.38-110.29 m 3 s −1 in verification.The RMSE of single model produced 26.76 m 3 s −1 in verification period for Leaf catchment.However, when new types of weighting and membership functions were used, RMSE dropped to 23.41 m 3 s −1 (see Table 5).
The visual plots of the committee models which are built from the combination of the two specialized models for high and low flows with respect to the hydrograph simulations are represented in Fig. 4. It can be observed that the committee model combines the best features of the specialized models.
Our experiments have led to one important observation related to using the weighting function for objective functions (Fig. 1 and Eqs.3-7) in calibration of specialized models quadratic function we used earlier (Fenicia et al., 2007) was in fact the first guess that it will allow for distinguishing the low and high flows.In our latest experiments it appeared, quite expectedly, that other function (for example, cubic) may work better in calibration period.
It is useful to mention an issue of overfitting that was raised by one of the reviewers and reiterated by the editor.This problem (typically addressed in data-driven modelling) may lead to a decrease of accuracy in operation (which can be detected during validation), and there are several ways of dealing with it, for example trying to limit the complexity of a model, and/or using cross-validation data set to control the calibration process.The committee model includes a number of parameters that increase accuracy of the overall model.However, this contributes to increasing the complexity of the model and may lead to overfitting.However the number of parameters added is very small (only twoδ and γ in the membership function), so an increase in complexity is also small, and so we see no need in using regularization.Earlystopping strategy cannot be used since process of training is pure exhaustive search, and not iterative, so it cannot be stopped prematurely.Alternatively, a cross-validation set can be used; however, it would require effort in finding the optimal splitting of the data set into three rather than two sets, with the smaller subsets for each purpose, and subsequent analysis of the impact of having smaller sets for calibration and testing.So in this paper we have not used crossvalidation but can recommend doing so in future research if the size of data would allow for this.

Conclusions and direction for further work
In this study we presented further improvements to a fuzzy committee approach -one possible way to improve the hydrological model prediction involving a combination of model outputs obtained by differently parameterized models with the same model structure.The two best single models specialized in low flows and high flows, which were found by optimization based on two objective functions RMSE LF and RMSE HF respectively, are combined using membership function to form the committee model and this model compared to the single hydrological model based on RMSE as an objective function.
The major findings of this study are as follows: -By conducting a number of experiments, we can confirm the initial results reported earlier in  al. ( 2007) that the combination of two specialized models indeed provides a method leading to a higher model of the resulting model.In three case studies we could reproduce the situation shown in Fig. 3 when the fuzzy committee model is better on both objective functions than the single model(s).In calibration a committee model is always better than the single model, independent of the values of parameters MFtype and WStype (however we have to optimize δ and γ ).
-We found that the best committee model identified by calibration, when tested on verification data, outperforms the best single model (identified by calibration) in all case studies (Table 4).We see this as the most important conclusion from this study.
-We conducted a thorough investigation of the impact of MFtype and WStype on model performance using the Leaf catchment data (Table 5).We cannot suggest the "universal" best set of parameters MFtype and WStype applicable for any case study: in calibration all of them led to models better than the single one, but in verification performance of models using different MFtype and WStype slightly differs for different cases (however it is still higher than that of the single model).
-There is an interesting effect concerning direct optimization of parameters γ and δ.It appeared that in most experiments after optimization these parameters obtained very close values, which means that there is a very narrow region where specialized models "work together".Potentially this may lead to situations when a minor change in average flows will force the committee model to produce relatively large changes in outputs.
-Results for the Alzette case study should be considered with care since the data set was limited (one year for calibration and one year for verification).
Further development and application of the presented approach is seen in the following.
-It would be useful to explore possible interactions between the parameters of the weighting functions and the shapes and parameters of the membership functions.
-The committee model may be sensitive to the choice of these parameters determining the shapes of the weighting and membership functions, and indeed exploring the use of more robust optimization methods for their identification may ensure higher robustness of the committee model.
-More accurate comparison between performances of various models can be made if the problem of overfitting is addressed (for example by cross-validation during model calibration and stopping the optimization process earlier to ensure minimum error on crossvalidation) instead of the "deep" optimization of the model on calibration set and using the single validation set.
An aspect related to the model performance metrics which needs attention as well is worth mentioning.We used the same objective functions for different magnitude of flows which originate from statistical theory.However the nature of this metric (e.g.RMSE) is basically oriented towards high flows and may not be suited for low flows.Therefore the performance measure can be in the form of transformed metric (e.g.transformed RMSE) to calibrate a low-flow model (e.g.van Werkhoven et al., 2009;Willems, 2009;Kollat et al., 2012), or being a multi-scale objective function taking into account errors at various timescales (Kuzmin et al., 2008).
Further developments are foreseen in improving the weighting schemes involving hydrological states and various combinations of variables influencing the streamflow (for example those presented by Oudin et al., 2006;Kim et al., 2006;Corzo and Solomatine, 2007a, b;Marshall et al., 2007;Jeong and Kim, 2009;Fernando et al., 2012).Combining these approaches will hopefully lead to the techniques for discovering various regimes in the time series representing the modelled system -and this would allow for optimal combination of domain (hydrologic) knowledge incorporated in models with automatic machine learning or time-series analysis routines.

Fig. 2 .
Fig. 2. (a) A typical fuzzy membership function used to combine the single specialized models (MFtype A), (b) a class of membership functions for high and low-flow models tested in the new experiments (MFtype B).

Fig. 3 .
Fig. 3.The identified sets of Pareto-optimal parameterizations of single specialized models (optimized by NSGA-II), committee models and single optimal models, calibrated by ACCO and GA in Leaf catchment: (a) calibration data set and (b) verification data set (model parameterizations from (a) are used).
The two weighting functions W LF and W HF allow for placing the stronger weight on the low or on the high portions of the hydrograph.As a result, RMSE LF places stronger weight on low-flow errors and weaker weight on high-flow errors than RMSE HF .(Please note that values RMSE LF and RMSE HF cannot be compared to each other and to the values of RMSE because of difference in weighting; this is important when viewing the resulting plots.)

Table 1 .
The summary of the runoff data for Alzette catchment in Luxembourg, Leaf catchment in USA, and Bagmati catchment in Nepal.

Table 2 .
The ranges of model parameters.Note: The unit d (day) is used for Leaf and Bagmati catchments instead of h (hour).

Table 3 .
Sets of parameters identified by different optimization algorithms.
SM: single hydrological model (single optimal model-optimized by a single-objective optimization algorithm based on the classical RMSE); LF and HF: low-flow model and high-flow model (optimized by a single-objective optimization algorithms and multi-objective optimization based on the RMSE LF and RMSE HF ).

Table 4 .
The performances of single optimal models (optimized based on classical RMSE) and committee models of various catchments.Committee models are assembled by combination of the weighting scheme WStype I and membership function MFtype A.

Table 5 .
Performance of committee models for the Leaf catchment, with possible combinations of the various weighting schemes and membership functions.
The value of α = 0.75 used in WStype II, III and IV (bold: best CMs, italics: best SMs).