In many real-world flood forecasting systems, the runoff thresholds for activating warnings or mitigation measures correspond to the flow peaks with a given return period (often 2 years, which may be associated with the bankfull discharge). At locations where the historical streamflow records are absent or very limited, the threshold can be estimated with regionally derived empirical relationships between catchment descriptors and the desired flood quantile. Whatever the functional form, such models are generally parameterised by minimising the mean square error, which assigns equal importance to overprediction and underprediction errors.

Considering that the consequences of an overestimated warning threshold (leading to the risk of missing alarms) generally have a much lower level of acceptance than those of an underestimated threshold (leading to the issuance of false alarms), the present work proposes to parameterise the regression model through an asymmetric error function, which penalises the overpredictions more.

The estimates by models (feedforward neural networks) with increasing degree of asymmetry are compared with those of a traditional, symmetrically trained network, in a rigorous cross-validation experiment based on a database of catchments covering the country of Italy. The analysis shows that the use of the asymmetric error function can substantially reduce the number and extent of overestimation errors, compared with the use of the traditional square errors. This reduction comes at the expense of increased underestimation errors, but the overall accuracy is still acceptable and the results illustrate the potential value of choosing an asymmetric error function when the consequences of missed alarms are more severe than those of false alarms.

In the operation of flood forecasting systems, it is necessary to determine the values of threshold runoff that trigger the issuance of flood watches and warnings. Such critical values might be used for threshold-based flood alerts based on real-time data measurements along the rivers (WMO, 2011) or for identifying in advance, through a rainfall-runoff modelling chain, the rainfall quantities that will cause such streamflow levels to be exceeded, as in the Flash Flood Guidance Systems framework (Carpenter et al., 1999; Ntelekos et al., 2006; Reed et al., 2007; Norbiato et al., 2009).

A runoff threshold should correspond to a

In the absence of more sophisticated physically based approaches, based on detailed information of each specific cross section that is rarely available due to limited field surveys, the literature suggests estimating the bankfull flow as the flood having a 1.5- to 2-year return period (Carpenter et al., 1999; Reed et al., 2007; Harman et al., 2008; Wilkerson, 2008; Hapuarachchi et al., 2011; Cunha et al., 2011; Ward et al., 2013), and a flow that is slightly higher than bankfull may be identified with the 2-year return period flood (Carpenter et al., 1999; Reed et al., 2007).

Many operational systems all around the world adopt a statistically based definition of the flooding flow and the flows associated with given return periods are used as threshold stages for activating flood warning procedures.

The 2-year recurrence is used by many river forecast services in the United States, as suggested by Carpenter et al. (1999), also due to the fact that “the good national coverage of the 2-yr return period flows that the U.S. Geological Survey (USGS) maintains nationwide supports its use” (Ntelekos et al., 2006), as well as in British Columbia (Canada).

However, floods with different annual exceedance probabilities, associated with different levels of risk, are also frequently adopted in operational real-time flood warning systems: for example, in the Czech Republic, a flood watch usually corresponds to a 1- to 5-year flow return period (Daňhelka and Vlasák, 2013). In Italy, where a national directive issued in 2004 introduced a system based on at least two levels of flow thresholds, many regions have identified the alert levels as flood quantiles with return periods of 2, 5 or 10 years (e.g. the Abruzzo, Lombardia, Puglia Regions). In southern France, the AIGA (Adaptation d'Information Géographique pour l'Alerte) flood warning system compares real-time peak discharge estimated along the river network (on the basis of rainfall field estimates and forecasts) to flood frequency estimates of given return periods (with three categories: yellow for values ranging from 2- to 10-year floods, orange for values between the 10- and the 50-year floods and red for peaks exceeding the 50-year flood) in order to provide warnings to the national and regional flood forecasting offices (Javelle et al., 2014).

For river sections where the streamflow gauges are newly installed or where historical rating curves are not available, the observations of the annual maxima are absent or very limited and it is not possible to obtain a reliable estimate of flood quantiles on the basis of statistical analyses of series of observed flood peak discharges.

For these ungauged or poorly gauged basins, the peak flow of a given frequency to be associated with the watch/warning threshold can be estimated by transferring information from data-rich sites to data-poor ones, as is done in the corpus of methodologies applied in RFFA (Regional Flood Frequency Analysis) at ungauged sites, which have always received considerable attention in the hydrologic literature (Bloeschl et al., 2013). Among the possible approaches (statistical and process based) to predict floods in ungauged basins, many researchers have traditionally applied regression-like regionalisation methods for (i) the estimation of the index flood (Dalrymple, 1960), usually defined as either the mean or the median (that is, the 2-year return period quantile) of the annual maximum flood series, or for (ii) the direct estimate of other quantiles of annual maxima in ungauged basins (Stedinger and Lu, 1995; Salinas et al., 2013). Such methods are based on the assumption that there is a relationship between catchment properties and the flood frequency statistics and are implemented through a regression-type model that relates the flood quantile or the index flood to a number of relevant morpho-climatic indexes. Linear or power (often linearised through a log-transformation) forms, with either a multiplicative or additive error term, are the most commonly used functions (see e.g. Stedinger and Tasker, 1985; GREHYS, 1996; Pandey and Nguyen, 1999; Brath et al., 2001; Kjeldsen et al., 2001, 2014; Bocchiola et al., 2003; Merz and Bloeschl, 2005; Griffis and Stedinger, 2007; Archfield et al., 2013; Smith et al., 2015).

In order to allow for more flexibility in the model structure (whose true form is of course not known), the international literature has recently proposed methods based on the use of artificial neural networks (ANNs), providing a non-linear relationship between the input and output variables without having to define its functional form a priori. Successful applications of ANNs for the estimation of index floods or flood quantiles at ungauged sites are reported in Muttiah et al. (1997), Hall et al. (2002), Dawson et al. (2006), Shu and Burn (2004), Shu and Ouarda (2008), Singh et al. (2010), Simor et al. (2012) and Aziz et al. (2013).

Both the traditional power form or linear regression methods and the neural networks models are generally parameterised by minimising the mean or root mean of the squared errors, i.e. a symmetric function assigning the same importance to overestimation and underestimation errors.

Nevertheless, the consequences of under- or overestimating the runoff threshold when it is used for early warning are extremely different.

Adopting a watch threshold that is higher than the runoff/stage that actually produces flooding damages would in fact lead to missing such events, failing to issue an alarm. Underestimating the runoff threshold may instead determine the issue of false alarms.

False alarms may certainly lead to monetary losses and also “undermine the credibility of the warning organisation but are generally much less costly than an unwarned event” (UCAR, 2010): in fact the costs of failing to issue an alarm grow rapidly in a real emergency, since a totally missed event has strongly adverse effects on preparedness. The costs of false warnings are not only commonly much smaller than the avoidable losses of a flood, but also cannot be compared with indirect and/or intangible flood damages such as loss of lives or serious injuries (Pappenberger et al., 2008; Verkade and Werner, 2011).

Furthermore, regarding the effects of false alarms, “in opposition to ‘cry wolf’ effect, for some they may provide an opportunity to check procedures and raise awareness, much like a fire practice drill” (Sene, 2013).

Overall, false alarms usually have a higher level of acceptance than misses, and this entails that the estimate of flood warning thresholds should be cautionary, so as to conservatively reduce the number of missed alarms.

For the development of watches and warnings it is therefore important to obtain estimates that are as accurate as possible, minimising both positive and negative errors; but, considering that an error will always be present, it is better to underpredict than to overpredict the threshold estimate, for safety reasons.

To obtain a conservative estimate of the thresholds, the present work proposes, for the first time to the author's knowledge, a parameterisation algorithm that weights positive and negative errors asymmetrically: predictions that exceed the “observed” values (in the present case represented by the quantile estimate based on the statistical analysis of measured flow peaks) are penalised more than those that underestimate them, in order to decrease the extent of overestimation and therefore the risk of missing a flood occurrence.

It is important to underline that the proposed asymmetric error function is here applied for optimising a neural network model for predicting the 2-year return period flood (due to its association with the bankfull conditions) but it might be used to improve any other kind of methodology for the estimate of flood warning thresholds associated with any return period.

Section 2 presents the asymmetric error functions; Sect. 3 describes the information available in a database covering the entire country of Italy and the identification of the subsets to be used for a rigorous cross-validation approach. Section 4 presents the implementation of the models for estimating the 2-year return period flood in ungauged catchments, consisting of artificial neural networks calibrated using respectively the symmetric square error and the asymmetric error functions. The results are presented and then discussed in Sect. 5 and Sect. 6 concludes.

The scientific literature on forecasting applications, in any scientific area, adopts almost exclusively an objective function based on the sum or mean of the squared discrepancies, i.e. a symmetric quadratic function, due to the well-established good statistical properties of the minimum mean square error estimator.

On the other hand, in economics as well as in engineering and many other fields, there are cases where the forecasting problem is inherently non-symmetric and, in the financial forecasting literature, the use of mean squared error, even if still widely applied, is nowadays not always accepted.

Error (or loss) functions devised to keep account of an asymmetric behaviour have been proposed, such as the linear exponential, the double linear and the double quadratic (Christoffersen and Diebold, 1996; Diebold and Lopez, 1996; Granger, 1999; Granger and Pesaran, 2000; Elliot et al., 2005; Patton and Timmerman, 2006). In particular, Elliot et al. (2005) recently presented a family of parsimoniously parameterised error functions that nests mean squared error loss as a special case (Patton and Timmerman, 2006).

Asymmetric quad–quad loss function (with

Such function, adapted from Elliot et al. (2005) and defining the error

For

When

In the water engineering field, the asymmetric Elliot error function with
quadratic amplification (

It should be noted that the proposed methodology is a deterministic one, where an optimal point forecast is obtained by minimising the conditional expectation of the future loss; such a framework does not have the advantages of a probabilistic one in terms of quantification of the uncertainties of the prediction, but it aims to identify the optimal value for the threshold in terms of operational utility.

In Sect. 4, the asymmetric quadratic error function is proposed for optimising the parameters of an input–output model, based on artificial neural networks, between the input variables summarising a set of catchment descriptors (obtainable also for ungauged river sections) and the 2-year return period flood, thus warranting that overestimation errors, which would increase the risk of missing flood warnings, are weighted more than underestimation ones.
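As an illustration of the idea, the following minimal sketch implements a double quadratic (quad–quad) loss in which overestimation errors receive a larger weight than underestimation errors. It rests on stated assumptions rather than on the exact formulation of this paper: the error is taken as predicted minus observed, and `over_weight` is a hypothetical parameter (not the asymmetry parameter of the Elliot family) giving the weight applied to positive errors, with `1 - over_weight` applied to negative ones.

```python
import numpy as np


def quad_quad_loss(pred, obs, over_weight=0.8):
    """Element-wise double quadratic loss (illustrative sketch).

    Assumptions (not taken from the paper): error = pred - obs, and
    over_weight in [0.5, 1) is the weight on overpredictions; the
    underpredictions are weighted by 1 - over_weight.
    """
    err = np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float)
    weights = np.where(err > 0, over_weight, 1.0 - over_weight)
    return weights * err ** 2


def mean_quad_quad_error(pred, obs, over_weight=0.8):
    """Average of the double quadratic losses over all records."""
    return float(np.mean(quad_quad_loss(pred, obs, over_weight)))


# An overprediction of 2 units costs four times an underprediction of 2 units
# when over_weight = 0.8 (weights 0.8 versus 0.2).
loss_over = quad_quad_loss([12.0], [10.0])[0]
loss_under = quad_quad_loss([8.0], [10.0])[0]
```

With `over_weight = 0.5` the two branches coincide and the loss reduces to half the squared error, i.e. the symmetric case up to a constant factor.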

The case study refers to a database of almost 300 catchments scattered all over the Italian Peninsula, compiled within the national research project “CUBIST – Characterisation of Ungauged Basins by Integrated uSe of hydrological Techniques” (Claps and the CUBIST Team, 2008).

The 12 geomorphological and climatic descriptors are listed in Table 1. The data set unfortunately lacks information on other hydrological properties (e.g. on soils, land-cover, vegetation) and the climatic characterisation is very limited (for example information on extreme rainfall would be extremely important), but the CUBIST set is currently the only database available to the Italian hydrological community at a national scale.

The data set is described in Di Prinzio et al. (2011), where, following a catchment classification procedure based on multivariate techniques, the descriptors were used to infer regional predictions of mean annual runoff, mean maximum annual flood and flood quantiles through a linear multi-regression model.

As described in that work, in order to reduce the high dimensionality of the set of geomorphological and climatic descriptors, a Principal Components (PC) analysis was applied, obtaining a set of derived uncorrelated variables. There are as many PC variables as original variables, but they are ordered in such a way that the first component has the greatest variability, the second accounts for the second largest amount of variance in the data and is uncorrelated with the first, and so forth. In the present data set, the first three principal components explain more than three-quarters of the total variance (see Di Prinzio et al., 2011) and these first three PCs are chosen here as input variables to the models described in the following, assuming that they may adequately represent, in a parsimonious manner, the main features of the study catchments.
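The dimensionality reduction described above can be sketched as follows. This is only an illustrative example: the descriptor matrix is synthetic (random numbers standing in for the 12 CUBIST descriptors), and standardising each descriptor before the decomposition is an assumption, since the pre-processing adopted in Di Prinzio et al. (2011) is not detailed here.

```python
import numpy as np


def principal_components(X, n_keep=3):
    """Return the first n_keep PC scores and the explained-variance ratios.

    Assumption: descriptors are standardised (zero mean, unit variance)
    before the decomposition.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Singular value decomposition of the standardised data matrix:
    # rows of Vt are the principal directions, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = Z @ Vt.T
    return scores[:, :n_keep], explained


# Synthetic stand-in for a table of 12 catchment descriptors (50 catchments).
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(50, 12))
pcs, explained = principal_components(descriptors)
share_first_three = explained[:3].sum()  # fraction of variance retained
```

On real descriptors, `share_first_three` is the quantity that the text reports as exceeding three-quarters of the total variance; on the random data used here it carries no meaning.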

Table 1. Geomorphological and climatic descriptors of the CUBIST database of Italian catchments.

The database, in addition to the morpho-pluviometric information, includes the annual maxima flow records for periods ranging from 5 to 63 years, whose median values, corresponding to the 2-year return period, represent the output variable to be simulated by the models. Only 9 of the almost 300 locations had fewer than 8 years of data; therefore, all records were deemed to be sufficient for the purposes of this study.
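The correspondence between the median of the annual maxima and the 2-year flood follows from the definition of return period, T = 1 / (1 - F), where F is the non-exceedance probability: the median has F = 0.5 and hence T = 2 years. A minimal sketch, using an invented five-year record purely for illustration:

```python
import numpy as np


def return_period(non_exceedance_probability):
    """Return period (years) of an annual-maximum quantile: T = 1 / (1 - F)."""
    return 1.0 / (1.0 - non_exceedance_probability)


def two_year_flood(annual_maxima):
    """Empirical 2-year flood, taken as the median of the annual maxima."""
    return float(np.median(annual_maxima))


# Hypothetical annual maximum discharges (invented values, not CUBIST data).
example_record = [120.0, 80.0, 200.0, 150.0, 95.0]
q2 = two_year_flood(example_record)
```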

The data set covers a great diversity of hydrological, physiographic and climatic properties and, in order to partially reduce such heterogeneity, it was decided to limit the analysis to catchments that have a 2-year flood in the range of 10–1000 m

As will be detailed in Sect. 4, the database is divided into three disjoint subsets (called training, cross-validation and test sets) in order to allow for a rigorous independent validation and also to increase the generalisation abilities of the model when encountering records different from those used in the calibration (or

The way in which the data are divided may have a strong influence on the performance of the model and it is important that each one of the three sets contains all representative patterns that are included in the data set. As proposed in the recent literature (Kocjancic and Zupan, 2001; Bowden et al., 2002; Shahin et al., 2004) a self-organising map (SOM) may be applied to this aim. The SOM is a data-driven classification method based on unsupervised artificial neural networks that may be applied for several clustering purposes (for hydrological applications see, for example, Minns and Hall, 2005; Kalteh et al., 2008).

Figure 2. Mean value (red dash) and bars spanning the 10th to 90th percentiles of the resulting training, cross-validation and testing sets for each of the three input variables (PC1, PC2 and PC3).

In recent years, SOMs have also been successfully applied for catchment classification either based on geo-morpho-climatic descriptors (Hall and Minns, 1999; Hall et al., 2002; Srinivas et al., 2008; Di Prinzio et al., 2011) or based on hydrological signatures (Chang et al., 2008; Ley et al., 2011; Toth, 2013); however, it is important to underline that the clustering is not carried out here in order to identify a pooling group of similar catchments for developing a region-specific model, but for the optimal division of the available data for the parameterisation and independent testing of a single model to be applied over the entire study area.

The SOM is in fact used to cluster similar data records together: an equal number of data records is then sampled from each cluster, ensuring that records from each class (that is catchments with different features) are represented in the training, validation and test sets, which, as a result, have similar statistical properties (Bowden et al., 2002; Shahin et al., 2004).

A SOM (Kohonen, 1997) organises input data through non-linear techniques depending on their similarity. It is formed by two layers: the input layer contains one node (neuron) for each variable in the data set. The output-layer nodes are connected to every input through adjustable weights, whose values are identified with an iterative training procedure. The relation is of the competitive type, matching each input vector with only one neuron in the output layer, through the comparison of the presented input pattern with each of the SOM neuron weight vectors, on the basis of a distance measure (here the Euclidean one). In the trained (calibrated) SOM, all input vectors that activate the same output node belong to the same class.
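The competitive matching described above can be sketched with a very small one-dimensional SOM. This is a simplified illustration, not the implementation used in the study: the learning-rate and neighbourhood schedules (`lr0`, `sigma0`, linear decay), the number of epochs and the synthetic input data are all assumptions introduced for the example.

```python
import numpy as np


def train_som(X, n_nodes=3, n_epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a one-dimensional SOM (nodes in a row) with Euclidean matching."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialise the node weight vectors from randomly chosen input records.
    weights = X[rng.choice(len(X), size=n_nodes, replace=False)].copy()
    positions = np.arange(n_nodes)  # node coordinates along the row
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = lr0 * (1.0 - frac)                   # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3      # shrinking neighbourhood
        for x in X[rng.permutation(len(X))]:
            # Competitive step: the winner is the node closest to the input.
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Neighbours of the winner are pulled towards the input as well.
            h = np.exp(-((positions - winner) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights


def assign_class(X, weights):
    """Class of each record = index of the nearest node weight vector."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dist, axis=1)


# Synthetic three-dimensional inputs standing in for PC1, PC2 and PC3.
rng = np.random.default_rng(1)
inputs = rng.normal(size=(60, 3))
som_weights = train_som(inputs)
classes = assign_class(inputs, som_weights)
```

All input vectors that receive the same value in `classes` activate the same output node and therefore belong to the same cluster.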

In the present application, the dimension of the input layer is equal to 3 (that is, the first three principal components of the catchment descriptors); as far as the output layer is concerned, there is not a predefined number of classes; a parsimonious output was chosen that is formed by three nodes in a row, each one corresponding to a class, to ensure the resulting sets were not too dissimilar.

The three resulting clusters are formed by 121, 70 and 76 catchments respectively; each cluster is then divided into three parts, and one-third is assigned to each of the training, validation and test sets. Overall, the training, validation and test sets are therefore almost equally numerous (91, 88 and 88 records respectively) and formed by the same proportion of catchments belonging to each of the clusters, and thus have a similar information content, as shown by the similar statistics of the three variables in the three sets represented in Fig. 2.
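The division of each cluster into three parts can be sketched as below. The cluster sizes are those reported above; how the records left over after the division by three were allocated is not stated, so the rule used here (any extra record goes to the training set) is an assumption, although it does reproduce the reported set sizes.

```python
import numpy as np


def split_by_cluster(labels, seed=0):
    """Assign one-third of each cluster to training, validation and test sets."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    sets = {"train": [], "validation": [], "test": []}
    for cluster in np.unique(labels):
        # Shuffle the members of the cluster, then deal them out in turn.
        members = rng.permutation(np.flatnonzero(labels == cluster))
        for offset, name in enumerate(sets):
            sets[name].extend(members[offset::3].tolist())
    return sets


# Cluster labels with the sizes reported in the text (121, 70 and 76).
cluster_labels = np.repeat([0, 1, 2], [121, 70, 76])
subsets = split_by_cluster(cluster_labels)
sizes = {name: len(indices) for name, indices in subsets.items()}
```

Dealing out the shuffled members in turn gives 41 + 24 + 26 = 91 training records and 40 + 23 + 25 = 88 records in each of the other two sets.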

Artificial neural networks are massively parallel and distributed information processing systems, composed of nodes, arranged in layers, which are able to infer a non-linear input–output relationship. ANNs, in particular feedforward networks, have been widely used in many hydrological applications (see for example the recent review papers by Maier et al., 2010 and by Abrahart et al., 2012) and the readers may refer to the abundant literature for details on their characteristics and implementation.

Three different layer types can be distinguished: an input layer (receiving the input information), one or more hidden layers (for intermediate computations) and an output layer (producing the final output); adjacent layers are connected through multiplicative weights and, in each node, the sum of weighted inputs and a threshold (called bias) is passed through a non-linear function known as an activation function.

The models applied here are networks formed by one hidden layer, with tan-sigmoid activation functions, and a single output node (corresponding to the estimated flood with 2-year return period), with a linear activation function.
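The forward computation of such a network can be sketched as follows, using the hyperbolic tangent (to which the tan-sigmoid function is mathematically equivalent) in the hidden layer and a linear output node. The weights below are random placeholders, the array names are hypothetical, and the three-input, three-hidden-node layout is used only as an example size.

```python
import numpy as np


def forward(x, W1, b1, W2, b2):
    """One-hidden-layer feedforward pass: tanh hidden nodes, linear output."""
    hidden = np.tanh(x @ W1 + b1)   # weighted sum plus bias, then activation
    return hidden @ W2 + b2         # linear output node


# Placeholder parameters for a network with 3 inputs, 3 hidden nodes, 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)

x = rng.normal(size=(5, 3))          # five synthetic input records
y = forward(x, W1, b1, W2, b2)       # one (rescaled) output per record
n_parameters = W1.size + b1.size + W2.size + b2.size
```

For this layout the number of adjustable parameters is 3 × 3 + 3 + 3 × 1 + 1 = 16.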

The identification of the network's weights and biases (called the training procedure) is carried out with a non-linear optimisation, searching for the minimum of an error (or learning) function measuring the discrepancy between predicted and observed values; feedforward networks are generally trained with a learning algorithm known as backpropagation (Rumelhart et al., 1994), based on the steepest descent or on more efficient quasi-Newton methods.

In order to avoid overfitting, which degrades the generalisation ability of the model, the early stopping or optimal stopping procedure was applied (see, for example, Coulibaly et al., 2000). For applying early stopping, the available data have been divided into three disjoint subsets with a similar information content, as described in Sect. 3.2: a training set, an early stopping validation set and a test set. While the network is parameterised minimising the error function on the training set, the error function on the early stopping validation set is also monitored; if the error function on such second set increases continuously for a specific number of iterations, this is a sign of overfitting of the training set: the training is then stopped and network parameters at the lowest validation error are returned. The third set (test set) is not used in any way during the parameterisation phase, but it is used for out-of-sample, independent evaluation of the resulting models.
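The early stopping logic described above can be sketched generically as below. The function and argument names (`update`, `val_error`, `max_fail`) are hypothetical, and the toy problem at the end merely mimics a validation error that first decreases and then increases; in a real application `update` would perform one training iteration on the training set and `val_error` would evaluate the error function on the early stopping validation set.

```python
import copy


def train_with_early_stopping(params, update, val_error, max_fail=6, max_iter=1000):
    """Stop when the validation error has increased for max_fail consecutive
    iterations and return the parameters with the lowest validation error."""
    best_params = copy.deepcopy(params)
    best_err = prev_err = val_error(params)
    fails = 0
    for _ in range(max_iter):
        params = update(params)          # one iteration on the training set
        err = val_error(params)          # monitored on the validation set
        if err < best_err:
            best_params, best_err = copy.deepcopy(params), err
        # Count consecutive increases of the validation error.
        fails = fails + 1 if err > prev_err else 0
        prev_err = err
        if fails >= max_fail:
            break
    return best_params, best_err


# Toy illustration: the "parameter" drifts by 0.1 per iteration and the
# validation error is lowest at 1.0, after which it keeps increasing.
best_w, best_err = train_with_early_stopping(
    0.0,
    update=lambda w: w + 0.1,
    val_error=lambda w: (w - 1.0) ** 2,
    max_fail=3,
)
```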

Neural networks, including those applied in the recent hydrological
literature for the estimation of index floods or flood quantiles at ungauged
sites, are traditionally trained minimising the square error function, which
is symmetrical about the

In the present work, the results obtained by a network trained with a conventional square error function are compared with those obtained when parameterising the network through the minimisation of an asymmetric loss function, which takes into account both over and underestimation discrepancies but penalises more the overprediction errors, since the consequences of missing alarms are more severe than those of false alarms.

For both types of models, the output values (2-year flood values) are
rescaled as a function of the overall minimum and maximum values to the
[

The first implemented model is obtained through the minimisation of the traditional, symmetric mean squared error, applying the quasi-Newton Levenberg–Marquardt backpropagation algorithm (Hagan and Menhaj, 1994), widely applied and regarded as one of the most efficient neural network training algorithms.

The input variables are the first three principal components of the catchment descriptors, so the input layer is formed by three nodes; the output node corresponds to the estimated flood with 2-year return period; as far as the dimension of the hidden layer is concerned, there is, unfortunately, no definitive established methodology for its determination because the optimal network architecture is highly problem-dependent. Different architectures with a number of hidden nodes varying from 2 to 6 were set up: the mean squared error of the estimates over the third, independent set was lowest for the model having three hidden nodes.

Figure 3. Architecture of the chosen network, with three input nodes, three hidden nodes and one output node.

The architecture with three input nodes, three hidden nodes and one output node, represented in Fig. 3, is therefore the network finally chosen; the network parameterised by minimising the symmetric mean square error function will be denoted as ANN-Symm, and in Sect. 5 its results will be compared with those of the asymmetric models having the same architecture but parameterised with a different error function.

The quad–quad loss function described in Sect. 2 is here applied for
calibrating the network parameters of the asymmetric models. The learning
function to be minimised is therefore the average value of the double
quadratic errors (mean quad–quad error, MQQE), obtainable averaging the

For this reason, in the present application, different asymmetric networks,
with

The training of the four asymmetric networks, based on the minimisation of the MQQE, is carried out through the generalisation of the backpropagation algorithm proposed by Crone (2002) and applied by Silva et al. (2010), which may be used for parameterising artificial neural networks with any differentiable (analytically or numerically) error function.
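To illustrate why any differentiable error function can drive a gradient-based training, the sketch below computes the derivative of a mean double quadratic error with respect to the network outputs and checks it against a finite-difference estimate. It is not the algorithm of Crone (2002); the sign convention (error = predicted minus observed) and the hypothetical `over_weight` parameter are assumptions made for the example, and the data are invented.

```python
import numpy as np


def mqqe(pred, obs, over_weight=0.8):
    """Mean double quadratic error (assumed form: heavier weight on pred > obs)."""
    err = pred - obs
    w = np.where(err > 0, over_weight, 1.0 - over_weight)
    return float(np.mean(w * err ** 2))


def mqqe_gradient(pred, obs, over_weight=0.8):
    """Analytical derivative of the mean error with respect to each prediction."""
    err = pred - obs
    w = np.where(err > 0, over_weight, 1.0 - over_weight)
    return 2.0 * w * err / err.size


def numerical_gradient(pred, obs, over_weight=0.8, h=1e-6):
    """Central finite-difference approximation of the same derivative."""
    grad = np.zeros_like(pred)
    for i in range(pred.size):
        step = np.zeros_like(pred)
        step[i] = h
        grad[i] = (mqqe(pred + step, obs, over_weight)
                   - mqqe(pred - step, obs, over_weight)) / (2.0 * h)
    return grad


# Invented rescaled predictions and targets.
pred = np.array([0.2, 0.7, 0.5])
obs = np.array([0.4, 0.5, 0.9])
analytic = mqqe_gradient(pred, obs)
numeric = numerical_gradient(pred, obs)
```

In a backpropagation scheme this derivative would replace the usual `2 * (pred - obs) / n` term of the squared error as the error signal at the output node, while the rest of the chain rule through the network is unchanged.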

As described in Sect. 4.2, the neural networks are trained over the
standardised (rescaled) output values of the training and cross-validation
sets and they are successively used for predicting the output over the
independent test set: such ANN output values are then scaled back, obtaining
the predictions

The performances of the models are therefore evaluated through a set of
indexes that describe the prediction error (

The following error statistics have been computed: MAE (mean absolute error)

In order to take into account the differences in sign of the errors,
representing the extent of overpredictions as compared to underpredictions,
the overall percentage of positive errors (Over %) is computed: Over % (percentage of overestimates)

Parallel box plots of the errors (

Therefore, the extent of overestimation is also evaluated through the number of high errors, taking into account only the more relevant, and therefore potentially more dangerous, overpredictions. An estimate that is more than 30 % higher than the corresponding target value was considered here as a high overprediction: OverH % (percentage of high overprediction errors)

On the other hand, even if – as discussed – generally less crucial in terms of consequences, the number of high underestimation errors should also be monitored, since excessively low values imply the tendency of the model to establish thresholds leading to the issuance of too many false alarms.

UnderH % (percentage of high underprediction errors):
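The error statistics introduced in this section can be sketched as follows. The error is taken as predicted minus observed, so that positive errors are overestimates as stated above; the 30 % threshold for high overpredictions is the one given in the text, whereas using the same 30 % threshold for high underpredictions is an assumption of this example. The numerical values are invented.

```python
import numpy as np


def error_statistics(pred, obs, high=0.30):
    """MAE, RMSE and the percentages of (high) over- and underpredictions."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    err = pred - obs          # positive = overestimate
    rel = err / obs           # relative error with respect to the target
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "Over %": 100.0 * float(np.mean(err > 0)),
        "OverH %": 100.0 * float(np.mean(rel > high)),
        # Assumed symmetric threshold for high underpredictions.
        "UnderH %": 100.0 * float(np.mean(rel < -high)),
    }


# Invented example: four catchments with the same target value.
stats = error_statistics(pred=[140.0, 110.0, 90.0, 60.0],
                         obs=[100.0, 100.0, 100.0, 100.0])
```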

The results may be evaluated also through the scatter plots of predicted
(

Scatter plots of the predicted (

The box plot (Fig. 4) allows a visual assessment of both the accuracy of the models and their tendency to over- or underestimate: the boxes should be compact and close to the dotted line representing zero error but at the same time it is better if the data lie below such a line, thus indicating that the method does not tend to overpredict the thresholds and the warning system is therefore less likely to miss a potentially dangerous flood.

It may be seen in Fig. 4 that for the network that was trained by minimising the traditional square error (ANN-Symm), the box and whiskers are centred on the zero-error line and the quantiles (top/bottom of the box, top/bottom whiskers) are at a similar distance from such a line, showing that the errors are equally distributed among overestimations and underestimations. The box is compact, demonstrating the good accuracy of the method for a substantial part of the test set, but, due to the symmetric disposition of the errors, many overestimation errors, some of them remarkably high, occur, as shown by the position of the upper whisker.

Analysing Table 2, the relatively good accuracy of the ANN-Symm model is demonstrated by the values of the MAE and RMSE, which are the lowest among the implemented models. The symmetric distribution of the overall errors is shown by an Over % close to 50 % and the similar values of the OverH % (34 %) and UnderH % (32 %) confirm that also the high relative errors are equally split among over and underestimates.

Such results were expected since the training is based on a symmetric loss function, but the consequence is that the ANN-Symm model issues a remarkable number of significant overprediction errors: for about one-third of the test catchments, the estimates are more than 30 % higher than the observations.

The analysis of Table 2 shows that the asymmetrically trained networks tend,
for decreasing

At the same time, and more importantly, the number of positive
(overestimation) errors larger than 30 % substantially decreases with

Conversely, as expected, the more asymmetric the network, the higher the underprediction errors, as shown by the values of UnderH %: the number of significant negative errors gradually increases from one-third up to 47 % of the total.

Table 2. Goodness-of-fit criteria of the 2-year flood estimates obtained by the symmetric and asymmetric networks on the independent test set of catchments.

The accuracy (given by the total amount of the discrepancies independently of their sign) also deteriorates when the asymmetry is more pronounced, but the drop is moderate and the RMSE and MAE values are not far from those of the ANN-Symm network.

Looking at the parallel box plots (Fig. 4), it may be seen that the boxes
become less compact and, as expected, their position shifts downwards with
increasing asymmetry. The length of the upper whiskers substantially
decrease with

It may be noted, in particular from the scatter plots (Fig. 5), that, for both symmetric and asymmetric models, the errors are not negligible: this is due to the shortcomings of the available data set but mainly to the intrinsic limitations of a regional approach applied to the extreme variability of the study area. As already underlined in Sect. 3.1, the national data set lacks important information that may help to characterise the hydrological behaviour and the phenomena governing formation of extreme flows. On top of the unavoidable risk of erroneous data, the absence in the database of additional influences certainly further hampers the possibility to obtain a reliable relationship with the flood quantiles. Most importantly, the data set covers the entire Italian Peninsula, characterised by extremely different hydro-climatic settings (from Alpine to Mediterranean ones) and this high heterogeneity is certainly an additional reason that limits the performance.

Notwithstanding the limitations of the data set, which equally affect all the proposed models, the results demonstrate that the use of the double quadratic error function, even if at the expense of more substantial underestimation errors, can substantially decrease the number and extent of overestimation errors, compared with the use of the traditional square errors.

In the application to a specific cross section, the degree of asymmetry might be identified as proportional to the risk aversion of the situation: when the impact of false alarms is, at least comparatively, small, the decision-makers are averse to the consequences (economic and social) of a flood and, rather than risking a missed alarm, can accept many cases of false alarm with the associated costs.

A crucial issue in the operation of flood forecasting/warning systems at ungauged locations is how to assess the possible impacts of the forecasted flows, i.e. the identification of streamflow values that may actually cause flooding, to be associated with thresholds that trigger the issuance of flood watches and warnings. The values that may produce damaging conditions (or flooding flows), in the absence of detailed local information on each cross section, are in many parts of the world estimated as the peak floods having a certain return period, often 2 years, which is generally associated with the bankfull discharge.

For locations where the gauges are new or where historical rating curves are not available, the series of past annual flow maxima are absent or very limited, and the peak flow of given frequency to be associated with the watch/warning threshold can be estimated with regionally derived empirical relationships, such as those that may be applied for the estimation of the index flood at ungauged sites. Such regression-like methods consist in a relation between a set of catchment descriptors that may also be obtained for ungauged sites and the desired flood quantile; linear or power forms are the most commonly used functions, but recent studies have successfully applied artificial neural network models, due to their flexibility, to flood quantile and index flood estimation.
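As an illustration of the power-form regressions mentioned above, the following minimal sketch fits a relation of the type Q = c·A^b between a single catchment descriptor A and a flood quantile Q by ordinary least squares in log-log space. The function name `fit_power_law`, the use of a single descriptor, and the synthetic numbers in the usage lines are assumptions made only for this example; they do not reproduce the data or the neural network model of the present study.

```python
import math


def fit_power_law(descriptor, quantile):
    """Fit quantile = c * descriptor**b by least squares on the logarithms.

    Hypothetical helper: one descriptor only, strictly positive values.
    Returns the pair (c, b).
    """
    if len(descriptor) != len(quantile) or len(descriptor) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    xs = [math.log(a) for a in descriptor]
    ys = [math.log(q) for q in quantile]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("descriptor values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                         # exponent (slope in log-log space)
    c = math.exp(mean_y - b * mean_x)     # multiplicative coefficient
    return c, b


# Synthetic, noise-free illustration (not real catchment data).
areas = [10.0, 100.0, 1000.0]
quantiles = [2.0 * a ** 0.7 for a in areas]
c, b = fit_power_law(areas, quantiles)
```

Such a fit is symmetric in the log-transformed errors, so it treats overprediction and underprediction of the quantile alike.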

Whatever the function form, such models are generally parameterised by minimising the mean square error, which assigns equal weight to overprediction and underprediction errors, whereas the consequences of such errors are extremely different when the estimates are to be used as warning thresholds. In fact, false alarms (due to an underprediction of the warning threshold) generally have a much higher level of acceptance than misses (which would derive from an overestimated threshold).

For this reason, in the present work, the regression model (a feedforward neural network) is parameterised by minimising an asymmetric error function (of the double quadratic type) that penalises overestimation errors more than underestimation errors. The predictions of models with increasing degree of asymmetry are compared with those of a traditional neural network (trained on the symmetric mean of square errors), in a rigorous cross-validation experiment based on a database of catchments covering the entire country of Italy.
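A double quadratic error function of this kind can be sketched as follows. The parameterisation is an assumption made for illustration: a single weight `alpha` in (0, 1) is applied to the squared overestimation errors and `1 - alpha` to the squared underestimation errors, with a factor 2 so that `alpha = 0.5` gives back the ordinary mean square error. The function name and the sign convention (error = predicted minus observed) are likewise hypothetical and may differ from the formulation used in the study.

```python
def double_quadratic_loss(observed, predicted, alpha=0.5):
    """Mean asymmetric squared error (illustrative parameterisation).

    alpha in (0, 1) is the weight of overestimation errors (predicted > observed);
    underestimation errors receive weight 1 - alpha.
    alpha = 0.5 reduces to the symmetric mean square error.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty samples of equal length")
    total = 0.0
    for obs, pred in zip(observed, predicted):
        err = pred - obs                      # positive means overestimation
        weight = alpha if err > 0 else 1.0 - alpha
        total += 2.0 * weight * err * err
    return total / len(observed)


# Hypothetical values: one overestimation (+2) and one underestimation (-1).
obs = [10.0, 20.0]
pred = [12.0, 19.0]
symmetric = double_quadratic_loss(obs, pred, alpha=0.5)
asymmetric = double_quadratic_loss(obs, pred, alpha=0.8)
```

With `alpha` above 0.5 the overestimation term dominates the loss, so a model trained to minimise it is pushed towards lower, more conservative threshold estimates.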

The results confirm, as expected, that the more asymmetric the network, the more numerous and larger the underprediction errors, and the less numerous and less severe the overestimation errors. As also expected, the accuracy measured by symmetric metrics decreases when the asymmetry is more pronounced, but the drop is moderate and the RMSE and MAE values are not far from those of the traditionally trained network.
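The kind of comparison described above can be illustrated with a short sketch that computes, for a set of predictions, the symmetric scores (RMSE and MAE) together with the number and mean magnitude of overestimation and underestimation errors. The function name `error_summary`, the dictionary keys, and the numbers in the usage lines are hypothetical and serve only as an example; the exact indices adopted in the study may be defined differently.

```python
import math


def error_summary(observed, predicted):
    """Symmetric scores plus separate statistics for over- and underestimation."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty samples of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    over = [e for e in errors if e > 0]       # predicted above observed
    under = [-e for e in errors if e < 0]     # predicted below observed (magnitudes)
    n = len(errors)
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "n_over": len(over),
        "n_under": len(under),
        "mean_over": sum(over) / len(over) if over else 0.0,
        "mean_under": sum(under) / len(under) if under else 0.0,
    }


# Hypothetical values: errors of +3, -4 and 0.
summary = error_summary([10.0, 20.0, 30.0], [13.0, 16.0, 30.0])
```

Computing such a summary for each trained model makes it possible to read the trade-off directly: fewer and smaller overestimations against more and larger underestimations, with RMSE and MAE as the symmetric reference.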

Undoubtedly, the nature of the regional approach, as well as the shortcomings of the data set and the extreme heterogeneity of the study area, generate errors much greater than those obtainable with detailed local studies. On the other hand, where no alternatives exist, the proposed methodology may provide a preliminary estimate of the threshold runoff that does not overestimate the actual

Notwithstanding the acknowledged limitations of the data set, which equally affect all the proposed models, the analysis shows that the use of the asymmetric error function substantially reduces the number and extent of overestimation errors, compared to the use of the traditional square errors. Of course, such a reduction comes at the expense of increasing underestimation errors, but the overall precision is still acceptable and the study highlights the potential benefit of choosing an asymmetric error function when the consequences of missed alarms are more severe than those of false alarms.

Minimising the asymmetric error function has the purpose of optimising the threshold from an operational point of view, in a deterministic framework: future analyses may be devoted to investigate the uncertainty of the issued predictions, since a probabilistic approach (provided that the methodology is able to include all sources of uncertainty and its quality may be objectively assessed) may provide very valuable insights for a more complete evaluation of the model, supplementing the information provided by point-value predictions.

It is important to highlight that the asymmetric error function is used, in this study, to parameterise a neural network, but of course it might be used to optimise any other model or equation, when aiming to obtain conservative estimates, for safety reasons.

The appropriate degree of asymmetry might be identified depending on the risk averseness of the specific flood-prone context. The quantification of risk aversion is extremely difficult and case specific, and it should take into account that the perception of society may be very different from a technical appraisal of the involved costs. In addition, it should also include indirect, intangible and long-term impacts. More research on the societal perception in different contexts would greatly improve the process of risk-based decision-making (Merz et al., 2009), including the choices concerning flood-warning thresholds. Hopefully, in the coming years, a more direct collaboration between the hydrologic and socio-economic research communities, as advocated in the new Panta Rhei science initiative (Montanari et al., 2013; Javelle et al., 2014), in particular with regard to data-driven modelling (Mount et al., 2016), will bring progress in this direction.

The author thanks Stacey Archfield and the anonymous referees for their constructive suggestions and Monica Di Prinzio and Attilio Castellarin for the elaboration of the data set carried out in 2011 within the Italian National Programme CUBIST.

The present work was developed within the framework of the Panta Rhei Research Initiative of the International Association of Hydrological Sciences (IAHS), Working Group on “Data-driven hydrology”.

Edited by: S. Archfield