The ecological integrity of freshwater ecosystems is intimately linked to natural fluctuations in the river flow regime. In catchments with few human-induced alterations of the flow regime (e.g. abstractions and regulations), existing hydrological models can be used to predict changes in the local flow regime and thereby assess changes in the living environment that its rivers provide for endemic species. However, hydrological models are traditionally calibrated to give a good general fit to observed hydrographs, e.g. using criteria such as the Nash–Sutcliffe efficiency (NSE) or the Kling–Gupta efficiency (KGE). Much ecological research has shown that aquatic species respond to a range of specific characteristics of the hydrograph, including the magnitude, frequency, duration, timing, and rate of change of flow events. This study investigates the performance of specially developed and tailored criteria formed from combinations of those specific streamflow characteristics (SFCs) found to be ecologically relevant in previous ecohydrological studies. These are compared with the more traditional Kling–Gupta criterion for 33 Irish catchments. A split-sample test with a rolling window is applied to reduce the influence of differences between the calibration and evaluation periods on the conclusions. These tailored criteria are shown to be marginally better suited to predicting the targeted streamflow characteristics; however, traditional criteria are more robust and produce more consistent behavioural parameter sets, suggesting a trade-off between model performance and model parameter consistency when predicting specific streamflow characteristics. Analysis of the fitting to each of 165 streamflow characteristics revealed a general lack of versatility for criteria with a strong focus on low-flow conditions, especially in predicting high-flow conditions. On the other hand, the Kling–Gupta efficiency applied to the square root of flow values performs as well as two sets of tailored criteria across the 165 streamflow characteristics. These findings suggest that traditional composite criteria such as the Kling–Gupta efficiency may still be preferable to tailored criteria for the prediction of streamflow characteristics when robustness and consistency are important.

River flow is the cornerstone of freshwater ecosystems, the ecological integrity of which relies on natural fluctuations in the river flow regime

Most rainfall–runoff models used to predict SFCs relevant for stream ecology require parameter calibration. The selection of the calibration criterion or objective function is of great importance for predictions of SFCs

In order to improve the prediction of a diverse range of SFCs (e.g. related to both high-flow and low-flow conditions or both magnitude and duration of flows), multi-objective calibration methods applied to flows (referred to as traditional objective functions hereafter) have been explored by others. For instance,

To further improve the prediction of a range of SFCs, a pragmatic approach is to use an objective function fitted to the target SFCs (referred to as tailored objective functions hereafter) in the expectation that this will improve predictions of these same SFCs.

Hydrological models are generally found to be less accurate than regional regression models in predicting particular SFCs because separate regression models can be developed for each target SFC

The objective of this study is to compare the skill of tailored objective functions fitted to SFCs with that of more traditional objective functions fitted to flows in predicting SFCs. This comparison is articulated around four research questions:

Which objective function provides the most accurate SFC predictions?

Which objective function provides the most robust SFC predictions?

Which objective function provides the most stable SFC predictions?

Which objective function yields the most consistent behavioural parameter sets?

The paper is organised as follows: Sect.

In the absence of adequate local data, the selection of streamflow characteristics used in the tailored objective functions relies on previous studies that identified sets of SFCs representative of the habitat preferences of fish communities in the southeastern US

The indices are listed and detailed in Table

List and description of the 18 selected streamflow characteristics. Detailed calculations for each SFC are available in Table

This study used discharge records with a minimum of 14 hydrological years with complete daily discharge data in the period from 1 October 1986 to 30 September 2016. If any daily value was missing, the relevant hydrological year was discarded as the calculation of some streamflow characteristics requires a strictly continuous daily streamflow time series. The length of 14 years was set as the minimum requirement in order to have 7 years for calibration and 7 years for evaluation for each catchment. A minimum calibration period length of 5 years is recommended by
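The record-screening rule described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the dictionary input format, and the convention of labelling a hydrological year by the calendar year in which it ends are all assumptions made for the example.

```python
from datetime import date, timedelta


def hydrological_year(day):
    """Label a date by the calendar year in which its hydrological year
    (1 October to 30 September) ends. The labelling convention is an assumption."""
    return day.year + 1 if day.month >= 10 else day.year


def complete_hydrological_years(flows):
    """Return the sorted hydrological years that have a value for every day.

    `flows` maps datetime.date -> discharge; a missing key or a None value
    marks a gap (hypothetical input format).
    """
    counts = {}
    for day, discharge in flows.items():
        if discharge is not None:
            year = hydrological_year(day)
            counts[year] = counts.get(year, 0) + 1
    complete = []
    for year, n_days in counts.items():
        # Number of days from 1 October to 30 September, inclusive.
        expected = (date(year, 9, 30) - date(year - 1, 10, 1)).days + 1
        if n_days == expected:
            complete.append(year)
    return sorted(complete)


def meets_minimum(flows, min_years=14):
    """True if the record holds at least `min_years` complete hydrological years."""
    return len(complete_hydrological_years(flows)) >= min_years
```

A year with a single missing day is dropped entirely, which mirrors the requirement for strictly continuous daily series within each retained year.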

The data availability for the gauges meeting these requirements is presented in Appendix Fig.

Spatial location and information on the study catchments:

Catchment selection was also influenced by the quality of the discharge data, including the goodness of fit of the rating equation at the gauge, the number of measurements, and their coverage of low-flow and high-flow extremes, as determined by

The Soil Moisture Accounting and Routing for Transport (SMART) model used here is an enhancement of the SMARG (Soil Moisture Accounting and Routing with Groundwater) lumped, conceptual rainfall–runoff model developed at National University of Ireland, Galway

The routing component distinguishes between five different runoff pathways: overland flow, drain flow, interflow, shallow groundwater flow, and deep groundwater flow (Fig.

List and description of the 10 parameters of the SMART model.

Conceptual representation of the SMART model structure.

Split-sample tests are commonly used to analyse the performance of hydrological models

The split-sampling strategy in this study is adapted from the original approach by

Split-sampling strategy using a 7-year rolling window, adapted from

The SMART model is used in a lumped manner to predict streamflow at the catchment outlet. The model is forced with daily rainfall and potential evapotranspiration data provided by the national meteorological office,

The calibration of the model is done using six different objective functions. The calibration procedure is illustrated in steps (a) to (d) of Fig.

Model calibration and evaluation strategy for the prediction of SFCs with different objective functions. Steps

In step (c) (Fig.

Finally, in step (d), the best 1 % of parameter sets (i.e. those with the highest efficiency scores on the chosen objective function) are retained as “behavioural”, yielding a set of
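The selection of behavioural parameter sets in step (d) can be illustrated with a short sketch. The function name and the list-of-scores input are hypothetical, and ties or missing scores are not handled.

```python
def select_behavioural(scores, fraction=0.01):
    """Return the indices of the best-scoring parameter sets, best first.

    `scores[i]` is the efficiency of parameter set i on the chosen objective
    function (higher is better). At least one set is always retained.
    """
    n_keep = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]
```

Returning indices rather than scores keeps the link to the sampled parameter sets, which is what a later comparison of behavioural sets across split-sample tests would need.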

To examine the absolute performance of each of the six objective functions, a benchmark is defined by randomly sampling

The method used to assess the predictions with a model calibrated with each of the six different objective functions is described in steps (e) to (h) of Fig.

First, the overall performance for the evaluation period of the model calibrated with each of the six objective functions is assessed by averaging the median efficiency scores obtained in step (g) (Fig.

Next, the calibrated models are compared using the tailored (SFC) objective functions for the evaluation period (Sect.

The use of 14 split-sample tests allows for the comparison of the calibrated models for different evaluation periods (see Fig.

The robustness of the models measures their ability to match their calibration fitting skill with their performance in the evaluation period. Poor robustness can indicate model overfitting to the calibration data, which could reduce the predictive power of the model. The robustness is calculated from the difference between the median efficiency in calibration and the median efficiency in evaluation, then by averaging these differences across the 14 split-sample tests, and finally by averaging these across the 33 study catchments (Sect.
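The three-stage averaging described above can be sketched as follows, assuming the scores are stored in nested dictionaries keyed by catchment and then by split-sample test (a hypothetical layout). The sign convention, calibration minus evaluation so that a positive value is a drop in performance, is also an assumption.

```python
from statistics import mean, median


def robustness(cal_scores, eval_scores):
    """Average drop in median efficiency from calibration to evaluation.

    `cal_scores[catchment][test]` and `eval_scores[catchment][test]` hold the
    efficiencies of the behavioural parameter sets (hypothetical layout).
    """
    per_catchment = []
    for catchment, tests in cal_scores.items():
        drops = [
            median(tests[test]) - median(eval_scores[catchment][test])
            for test in tests
        ]
        # Average over the split-sample tests of one catchment.
        per_catchment.append(mean(drops))
    # Average over the study catchments.
    return mean(per_catchment)
```

Under this convention, a value close to zero indicates that the performance obtained in calibration carries over to the evaluation period.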

Finally, the model consistency obtained with each of the six objective functions is explored. The concept of consistency has been previously used in selecting from competing model structures

For this analysis, the same Latin hypercube sampling of the

To explore the reasons for the trends identified in model performance, stability, robustness, and consistency, the ability of the six objective functions to predict the shape and timing, the variability, and the bias of the observed hydrograph is examined by assessing the three components of
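As an illustration of such a decomposition, the sketch below assumes that the three components are those of the Kling–Gupta efficiency in its original form: the linear correlation r (shape and timing), the ratio of standard deviations alpha (variability), and the ratio of means beta (bias). The function names are hypothetical, and the exact formulation used in the study may differ.

```python
from math import sqrt
from statistics import mean, pstdev


def kge_components(obs, sim):
    """Return (r, alpha, beta) for observed and simulated flow series."""
    mean_obs, mean_sim = mean(obs), mean(sim)
    sd_obs, sd_sim = pstdev(obs), pstdev(sim)
    covariance = mean([(o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)])
    r = covariance / (sd_obs * sd_sim)  # shape and timing
    alpha = sd_sim / sd_obs             # variability
    beta = mean_sim / mean_obs          # bias
    return r, alpha, beta


def kge(obs, sim):
    """Kling-Gupta efficiency: 1 minus the distance of (r, alpha, beta) from (1, 1, 1)."""
    r, alpha, beta = kge_components(obs, sim)
    return 1.0 - sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def kge_sqrt(obs, sim):
    """KGE applied to square-rooted flows, which reduces the weight of high flows."""
    return kge([sqrt(q) for q in obs], [sqrt(q) for q in sim])
```

A perfect simulation gives r = alpha = beta = 1 and an efficiency of 1; inspecting the components separately shows which aspect of the hydrograph drives a loss in the aggregate score.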

In addition, the ability of the six objective functions to predict each individual SFC is assessed by calculating the absolute normalised error between the simulated and the observed SFC values (Eq.

For each component analysed, the approach is the same as the one used for assessing the overall model performance in Sect.

Finally, the comparative performance of the objective functions to calibrate the hydrological model is assessed on 156 different SFCs and the 9 percentiles of the flow duration curve where Eq. (

The SMART model calibrated on

On average, all six objective functions reproduce the observed hydrograph well when more weight is given to predicting high flows, with

When more importance is given to predicting average-flow conditions, i.e. using

Comparison of the overall performance in evaluation of the model calibrated with the six objective functions. The three traditional objective functions are used as evaluation efficiencies.

While

The differences between most of the objective functions are small (Fig.

Comparison of the skills in evaluation of the model calibrated with the six objective functions. The first column of panels compares them on the overall performance on the three tailored objective functions used as evaluation efficiencies (described in Sect.

The best traditional objective function to predict any of the sets of SFCs is consistently

In addition, the dispersion of the performance across the 33 study catchments, measured by the standard deviation (represented as error bars in Fig.

The average stability of the performance across the 14 split-sample tests shows only small differences between the different objective functions (Fig.

The robustness of the different objective functions on the three sets of SFCs uncovers a general trend whereby traditional objective functions are more robust than tailored objective functions; i.e. the drop in performance from the calibration period to the evaluation period is smaller for the traditional objective functions (Fig.

The average drop in performance is consistently below 0.01 for

Unlike the measures of average model performance and stability, the consistency measures reveal more significant differences between the six objective functions compared here (Fig.

Comparison of the consistency of the set of behavioural parameter sets identified with the six objective functions across the 14 split-sample tests (described in Sect.

The consistency ratios for the tailored objective functions appear to be related to the number of SFCs they contain. For instance,

First, comparing the six objective functions on the three components of

Comparison of performance in evaluation of the model calibrated with the six objective functions on individual components of the objective functions. Panel

The poor performance of

The normalised errors for the 18 SFCs that are contained in the three tailored objective functions (Fig.

However,

Unlike the overall performance behaviour described in Sect.

Extending the number of SFCs examined shows that

Comparison of performance in evaluation of the model calibrated with the six objective functions on 156 streamflow characteristics and 9 percentiles of the flow duration curve. A detailed description of each SFC can be found in the Appendix of

Amongst the tailored objective functions,

Beyond the patterns identified above, the relative agreement across the six objective functions on which SFCs show the largest and smallest errors provides some insight into the easiest and hardest SFCs to predict. The average number of flow reversals from one day to the next (ra8) is clearly the most difficult to predict, followed, to a lesser extent, by the average slopes of the rising and recession limbs (ra1 and ra3). Overall, high-flow events are harder to predict, whether it is their magnitude (mh1–mh12: mean daily maximum for each month; mh19: skewness in annual maximum daily flow; mh20: mean annual maximum daily flow), their duration (dh1–dh10: mean and variability in the annual maximum of a moving mean over 1, 3, 7, 30, and 90 d windows), their timing (th1: timing of annual maximum flow), or their frequency, except for the variability in high-flood events (fh2) and the average number of days exceeding 7 times the median flow (fh4). On the other hand, some SFCs based on the magnitude of flows are easier to predict, e.g. the variability in the percentiles of the log-transformed discharge record (ma4), the skewness in daily flows (ma5), various ratios of flow percentiles (ma6–ma8), and various spreads between flow percentiles (ma9–ma11). The volumes of floods exceeding the median, twice the median, and 3 times the median (mh21, mh22, and mh23) are also well predicted, alongside the 90th and 75th percentiles normalised by the median flow (mh16 and mh17). Finally, the mean annual maxima of moving means over 7 and 30 d windows normalised by the median flow (dh12 and dh13) are the best-predicted SFCs relating to the duration of flows. For the percentiles of the flow duration curve, all six objective functions appear better suited to predicting its low tail, which is consistent with the lower relative errors for the SFCs on the magnitude of low flows compared with those of high flows.
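To make the hardest-to-predict characteristic more concrete, the sketch below counts flow reversals as sign changes in the day-to-day flow differences and averages the count over years. This is an illustrative reading of the index only: the function names are hypothetical, and details such as the treatment of days with no change in flow are assumptions that may differ from the definition used for ra8 in the study.

```python
from statistics import mean


def count_flow_reversals(flows):
    """Count switches between rising and falling flow in a daily series.

    Days with no change in flow are skipped (an assumption), so a plateau
    between a rise and a fall still counts as one reversal.
    """
    reversals = 0
    previous_sign = 0
    for today, tomorrow in zip(flows, flows[1:]):
        diff = tomorrow - today
        sign = (diff > 0) - (diff < 0)
        if sign == 0:
            continue
        if previous_sign != 0 and sign != previous_sign:
            reversals += 1
        previous_sign = sign
    return reversals


def mean_annual_reversals(flows_by_year):
    """Average the reversal count over years; `flows_by_year` maps year -> daily flows."""
    return mean(count_flow_reversals(series) for series in flows_by_year.values())
```

Because the count depends on the sign of every daily change, small day-to-day errors in a simulated series can alter it substantially, which is one plausible reason why such an index is harder to reproduce than magnitude-based ones.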

The choice of the objective function for ecological applications influences the predictive performance of the hydrological model for specific streamflow characteristics

The selection of particular streamflow characteristics for their ecological relevance does not imply that they can represent the overall hydrograph. Indeed, while some indicators originally used as ecologically relevant SFCs

In this context, the definition of a good tailored objective function for ecologically relevant streamflow predictions must be based on SFCs that are key descriptors of the ecological response while also being key descriptors of the hydrological behaviour in the catchment. Otherwise, model consistency may be compromised, and the model predictions will not be as robust outside its calibration conditions. Moreover, the number of SFCs contained in the tailored objective function needs to be considered, given that consistency seems to improve with the number of SFCs contained in the objective function. However, as only three different sets of SFCs were tested in this study, more research would be required to confirm this hypothesis.

Composite traditional objective functions such as the Kling–Gupta efficiency are strong contenders for the prediction of these SFCs. In particular, the use of the KGE on square-rooted flows (i.e.

In future research on the skills of objective functions to predict SFCs, a recently formulated non-parametric version of the KGE criterion could prove useful to predict various SFCs at once. It reduces the emphasis on high-flow conditions and provides a more balanced criterion across various flow conditions, while avoiding the original KGE's assumptions about the nature of the errors, which are not necessarily justified for streamflow records

The lack of long continuous time series of observed streamflow is known to be a limiting factor for ecohydrological studies, and, in this case study, the use of 14 years, i.e. 7 years each for the calibration and evaluation periods, is a prime example of this issue. Previous research suggests that a 5-year period is enough to capture the temporal hydrological variability

In order to overcome the lack of long time series of streamflow data, we included non-continuous (i.e. interrupted) data periods to increase the number of study catchments (see Fig.

Given forcing and evaluation data uncertainty and model structural uncertainties, the small differences in model performance calibrated with the different objective functions could be considered insignificant. However, to reduce the influence of data uncertainty, this comparison of objective functions was carried out on a set of 33 study catchments and on 14 split-sample tests. Moreover, the use of the median performance across a set of behavioural parameter sets reduces the influence of equifinality problems

The findings in this study could also be somewhat model specific and region specific. However,

Finally, the analysis of the consistency was based on the number of times the exact same parameter set was identified as behavioural across the 14 split-sample tests. However, a parameter set identified as behavioural in one split-sample test may lie close, in parameter space, to a different parameter set identified as behavioural in another test, and the approach selected here does not credit such near matches. To overcome this limitation, future research on the topic could use clustering analysis techniques and compare the spread of the cluster(s) formed by the behavioural parameter sets instead.
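The exact-match counting that underlies this analysis can be sketched as follows, assuming that each split-sample test returns the indices of its behavioural parameter sets within a shared sample. Both function names are hypothetical, and `recurring_fraction` is only an illustrative summary, not the consistency ratio reported in the study.

```python
from collections import Counter


def behavioural_counts(behavioural_by_test):
    """Count in how many split-sample tests each parameter-set index is behavioural.

    `behavioural_by_test` is a list with one collection of indices per test.
    """
    counts = Counter()
    for indices in behavioural_by_test:
        counts.update(set(indices))  # each test counts at most once per index
    return counts


def recurring_fraction(behavioural_by_test, min_tests):
    """Fraction of ever-behavioural sets that are behavioural in at least `min_tests` tests."""
    counts = behavioural_counts(behavioural_by_test)
    if not counts:
        return 0.0
    recurring = sum(1 for n in counts.values() if n >= min_tests)
    return recurring / len(counts)
```

Because only identical indices are matched, two nearly identical parameter sets count as unrelated, which is precisely the limitation noted above.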

Hydrological models are usually preferred over statistical regression models when the impacts of a changing climate on the flow regime and the associated ecologically relevant SFCs are of interest. Even though regression models may fit historical calibration data better

Assuming a suitable set of SFCs has been found, as described in Sect.

Understanding the ecological response to altered flow regimes is hindered by the lack of corresponding hydrological data

One approach to regionalisation is the transfer of optimised parameter values from gauged to ungauged locations

Alternatively, streamflow characteristics can be directly transferred from gauged to ungauged locations

Desirable qualities for a useful objective function are that it identifies model parameter values that perform well in the evaluation, i.e. outside calibration and independent of the period considered, and that it consistently identifies the same parameter sets regardless of the study period, i.e. that it describes a consistent catchment hydrological behaviour. This study explored these aspects for six different objective functions intended to predict three combinations of streamflow characteristics that are assumed to be relevant for stream ecology. In relation to the research questions presented in the Introduction, the study showed that tailored objective functions (fitted to SFCs) perform marginally better than traditional objective functions (fitted to flows) in predicting all three combinations of SFCs on average (Q1), while proving to be less robust outside calibration than their traditional counterparts (Q2); no general trend could be found to support the claim that any objective function yields more stable SFC predictions across the split-sample tests (Q3); and traditional objective functions fitted to untransformed flows and to square-rooted flows more consistently select the same parameter sets as behavioural across the split-sample tests than any of the three tailored objective functions made of SFCs (Q4). In addition, it was found that the ranking of the six objective functions is not altered when considering their performance on a very large and diverse set of SFCs.

This study reveals that a gain in fitting performance for the SFCs may be at the expense of consistency in the behavioural parameter sets across the split-sample tests. This highlights that fitting ecologically relevant SFCs well does not guarantee that all the key hydrological processes defining the catchment response are represented, i.e. that the SFCs form an informative signature. Unless streamflow characteristics are proven to be both ecologically relevant and an informative signature, carefully selected traditional objective functions fitted to flows are likely to remain preferable for predicting ecologically relevant streamflow characteristics, as they avoid consistency issues.

The rainfall and potential evapotranspiration daily datasets are available online from

The supplement related to this article is available online at:

This work is part of the PhD research of TH at the UCD Dooge Centre for Water Resources Research under the supervision of MB and FEO'L. TH developed the idea. TH collected the data and performed the model simulations. TH wrote the original draft, edited the different drafts, and produced the final version of this paper. MB and FEO'L reviewed and edited the different drafts and the final version of this paper.

The authors declare that they have no conflict of interest.

The authors would like to thank the editor and the anonymous reviewers for their comments and suggestions that contributed to improving this paper.

This research has been supported by Ireland's Environmental Protection Agency (grant no. 2014-W-LS-5).

This paper was edited by Jan Seibert and reviewed by Cristina Prieto and two anonymous referees.