The Nash–Sutcliffe efficiency (NSE) is a widely used score in hydrology, but it is not common in other environmental sciences. One of the reasons for its unpopularity is that its scientific meaning is somewhat unclear in the literature. This study attempts to establish a solid foundation for the NSE from the viewpoint of signal processing. In this view, a simulation is regarded as a received signal containing a wanted signal (observations) contaminated by an unwanted signal (noise). This view underlines the important role of the error model between simulations and observations.

By assuming an additive error model, it is easy to point out that the NSE is
equivalent to an important quantity in signal processing: the
signal-to-noise ratio. Moreover, the NSE and the Kling–Gupta efficiency (KGE)
are shown to be equivalent, at least when there are no biases, in the sense
that they measure the relative magnitude of the power of noise to the power
of the variation in observations. The scientific meaning of the NSE suggests a
natural way to define

In the general cases, when the additive error model is replaced by a mixed additive–multiplicative error model, the traditional NSE is shown to be prone to contradictions in model evaluations. Therefore, an extension of the NSE is derived, which only requires one to divide the traditional noise-to-signal ratio by the multiplicative bias. This has a practical implication: if the multiplicative bias is not considered, the traditional NSE and KGE underestimate or overestimate the generalized NSE and KGE when the multiplicative bias is greater than or smaller than one, respectively. In particular, the observed mean turns out to be the worst simulation from the viewpoint of the generalized NSE.

In hydrology, the Nash–Sutcliffe efficiency (NSE) is one of the most widely used similarity measures for calibration, model comparison, and verification (ASCE, 1993; Legates and McCabe, 1999; Moriasi et al., 2007; Pushpalatha et al., 2012; Todini and Biondi, 2017). However, Schaefli and Gupta (2007) pointed out that the NSE is not commonly used in other environmental science fields, even though calibration, model comparison, and verification are also employed in those fields. Does this mean that the NSE is a special metric that is only relevant for hydrological processes? If this is not the case, what causes this limited use outside of hydrology? We argue that one of the reasons can be traced back to the lack of a consensual scientific meaning of the NSE in the literature.

The NSE was first proposed by Nash and Sutcliffe (1970), who approached
calibration from a linear regression viewpoint (Murphy et al., 1989).

A similar efficiency was introduced in Ding (1974),
4 years after the
introduction of the NSE. We call this efficiency the Nash–Ding efficiency (NDE):

Identifying the NSE as

However, when applied to forecast verification, in which simulations are
replaced by forecasts, the special choice of

In recent years, starting with the work of Gupta and Kling (2011), the NSE has been
recognized as a compromise between different criteria that measures overall
performance by combining different scores for means, variances, and
correlations. The decomposed form of the NSE in terms of the correlation

One of the weak points of the multiple-criteria viewpoint is that it explains the elegant form (Eq. 1) using the unintuitive form (Eq. 5). We suspect that a more profound explanation for the elegant form (Eq. 1) exists, one that also gives us the scientific meaning of the NSE. In pursuing this explanation, we return to the insight of Nash and Sutcliffe (1970) when they first proposed the NSE as a measure. This insight was expressed clearly in Moriasi et al. (2007), who understood the NSE as the relative magnitude of the variance in noise to the variance in informative signals. This encouraged us to approach the NSE from the perspective of signal processing. We will show that the NSE is indeed a well-known quantity in signal processing.
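This noise-to-signal reading can be illustrated numerically. The following sketch (with synthetic data and NumPy, which are choices of this illustration rather than part of the original study) checks that, for a simulation built by adding noise to observations, the textbook NSE coincides with one minus the ratio of the noise power to the power of the variation in observations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "wanted signal" (observations) and additive noise.
obs = 3.0 * np.sin(np.linspace(0.0, 8.0 * np.pi, 5000)) + 10.0
noise = rng.normal(scale=0.5, size=obs.size)
sim = obs + noise  # received signal = wanted signal + noise

# Textbook definition of the NSE.
nse = 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Signal-processing reading: one minus the noise-to-signal ratio,
# i.e., noise power over the power of the variation in observations.
nsr = np.mean(noise ** 2) / np.var(obs)

print(nse, 1.0 - nsr)  # the two values coincide
```

Because the squared simulation errors are exactly the squared noise values under the additive model, the two expressions agree identically, not just on average.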

This paper is organized as follows. In Sect. 2, we revisit the traditional NSE from the viewpoint of signal processing of simulations and observations. In practice, the nature and behavior of the NSE can only be established with an additive error model imposed on simulations and observations. As the additive error model implies that the variances in simulations are greater than the variances in observations, Sect. 3 extends the error model from Sect. 2 by introducing multiplicative biases in addition to additive biases in order to cover the other cases. An extension of the NSE in these general cases is then derived. Finally, Sect. 4 summarizes the main findings of this study and discusses some implications of using the NSE in practice.

From now on, we will consider simulations and observations from the
perspective of signal processing. According to this view, observations form
a desired signal that we wish to faithfully reproduce whenever we run a
model to simulate such observations. This simulation introduces another
signal known as the received signal in signal processing, and it is assumed to
be the wanted signal (the observations) contaminated by a certain unwanted
signal (noise). This means that we will have a good simulation whenever
model errors, as represented by the noise, are small. In this section, we
assume a simple additive error model for simulations:

Using the error model shown in Eq. (7), it is easy to calculate two expectations in the
formula of the NSE,

In order to examine the relationship between the NSE and SNR, we note that the
error model shown in Eq. (7) is preserved in the translations

Because the reciprocal of

This new interpretation of the NSE has two important implications for the use of the NSE in practice. Firstly, note that the NSR depends not only on the power of noise but also on the power of the signals under consideration. Thus, the NSE should not be used as a performance measure when comparing two different kinds of signals. We may commit a possibly erroneous assessment by considering that our model is better for flow regime A than for flow regime B, when this may be the consequence of the simple fact that the signals in case A are stronger than those in case B. From its mathematical form, it is clear that the NSR favors high-power signals (i.e., strong signals always result in a small NSR); therefore, it is easy to get high NSE values for strong signals. Such NSE values may be wrongly identified as an indicator of good performance, resulting in misleading evaluations of model performance.

Secondly, as a ratio of the power of noise to the power of the variation in
observations, the

Recall that the NSE is invariant in the translations along the vector

In order to avoid the abovementioned misjudgment, it is desirable to have a score
that is invariant in any translation. From Eq. (16), it is easy to see that
the bias term causes the NSE to vary with different displacements of
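These two behaviors, invariance under a joint translation along the vector (1, 1) and sensitivity to displacements that introduce a bias, can be verified directly (a synthetic sketch; the shift of 7.0 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)

def nse(sim, obs):
    """Nash–Sutcliffe efficiency."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

obs = rng.normal(loc=10.0, scale=2.0, size=5000)
sim = obs + rng.normal(scale=1.0, size=obs.size)

base = nse(sim, obs)
# Translating simulations and observations together (along the vector
# (1, 1)) leaves both the numerator and the denominator unchanged ...
shifted_both = nse(sim + 7.0, obs + 7.0)
# ... whereas translating the simulations alone introduces an additive
# bias and degrades the score.
shifted_sim_only = nse(sim + 7.0, obs)

print(base, shifted_both, shifted_sim_only)
```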

Similar to

Joint probability distributions of simulations and observations
with different values of

In the previous section, we showed that the four variables

The upper bounds of the NSE, NDE, and KGE as functions of

Firstly, the three scores are monotonic functions of

As we can construct a new score simply by taking any monotonic function
of

Secondly, in practice, the choice of an appropriate score can be determined
by its magnitude and sensitivity. In this sense, Fig. 2 explains why
modelers tend to favor the KGE in practice. This is because the

Thirdly, the smaller the correlation, the more sensitive the NSE and KGE. This
is the consequence of the non-linear dependence of

Finally, at the threshold

Similar to the threshold of

The threshold of

In order to extend the additive error model to the general cases, we first
note that the error model shown in Eq. (7) indeed gives us the conditional distribution
of simulations on observations. As all of the information on simulations and
observations is encapsulated in their joint probability distribution, we can
seek the general form of this conditional distribution from their joint
distribution in the general cases. For this purpose, we will assume that
this joint probability distribution is a bivariate normal distribution:
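Although the distribution itself is not reproduced here, a standard property of the bivariate normal is that the conditional mean of one component given the other is linear, which is what makes the mixed additive–multiplicative error model emerge. A Monte Carlo sketch (parameter values are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Parameters of the bivariate normal joint distribution (arbitrary values).
mu_o, mu_s = 10.0, 12.0
sigma_o, sigma_s, rho = 2.0, 3.0, 0.8

cov = [[sigma_o**2, rho * sigma_o * sigma_s],
       [rho * sigma_o * sigma_s, sigma_s**2]]
obs, sim = rng.multivariate_normal([mu_o, mu_s], cov, size=200_000).T

# Regressing simulations on observations recovers the linear conditional
# mean E[s | o], i.e., a mixed additive–multiplicative error model
# s = alpha * o + beta + noise with slope alpha = rho * sigma_s / sigma_o.
alpha, beta = np.polyfit(obs, sim, deg=1)

print(alpha)  # close to rho * sigma_s / sigma_o = 1.2
```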

It is worth noting that the nature and behavior of the NSE in Sect. 2 are
established relying solely on the additive error model, without any
assumption on the joint probability distribution of

We present a further argument for the irrelevance of the traditional NSE
under the error model shown in Eq. (33) by proving that the NSE (Eq. 1) is not
invariant in the translations that preserve this error model. In the case of
the error model shown in Eq. (7), we have shown that this additive error model
is preserved in the translations

In order to seek an appropriate form of the NSE in the general cases, we rely on
the nature and behavior of the traditional NSE examined in Sect. 2 by
imposing three conditions on the generalized NSE: (1) it measures the noise
level in simulations; (2) it is invariant in the translations

In Sect. 1, we noted that the decomposed form (Eq. 5) of the NSE is relatively unintuitive, even though it is derived from the elegant form (Eq. 1). From Sect. 3.1, we know that Eq. (1) is indeed only relevant in the additive error model shown in Eq. (7). It becomes irrelevant when multiplicative biases are introduced into Eq. (7). Therefore, if we continue to use the traditional NSE in the general cases, an unintuitive form of the NSE will be expected, as verified by Eq. (5). The appropriate NSE in such cases is the generalized NSE (Eq. 45).
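Since Eq. (45) is not reproduced here, the following sketch implements only the verbal description above: the generalized NSE divides the traditional noise-to-signal ratio by the multiplicative bias. Estimating the multiplicative bias as the regression slope of simulations on observations is an assumption of this illustration, not the paper's derivation:

```python
import numpy as np

rng = np.random.default_rng(4)

def nse_traditional(sim, obs):
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def nse_generalized(sim, obs):
    # Sketch of the generalized NSE as described in the text: the
    # traditional noise-to-signal ratio divided by the multiplicative
    # bias. Estimating the bias as the regression slope of sim on obs
    # is an assumption of this illustration.
    alpha = np.polyfit(obs, sim, deg=1)[0]
    nsr = np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
    return 1.0 - nsr / alpha

obs = rng.normal(loc=0.0, scale=2.0, size=50_000)
sim = 1.5 * obs + rng.normal(scale=0.5, size=obs.size)  # multiplicative bias > 1

# With a multiplicative bias greater than one, the traditional score
# underestimates the generalized one.
print(nse_traditional(sim, obs), nse_generalized(sim, obs))
```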

What is the scientific meaning of the generalized NSE (Eq. 45)? Clearly, it
measures the relative magnitude of the power of noise to the power of the
variation in observations when the multiplicative factor is removed. Thus,
similar to the traditional NSE, the NSE value of zero still marks the
threshold between good and bad simulations. It also attains a maximum equal to
one when models do not have additive biases and random errors. However, a subtle difference exists in the general cases: the perfect score

We now prove a surprising result: the upper bound of the NSE in the general
cases is the same as in the cases of the additive error model, which is
given by Eq. (25). By making use of the two identities obtained with Eqs. (34) and (35) in Eq. (44), we
have

Joint probability distributions of simulations and observations
with the same

A further simple argument will show why the noise
levels are the same in Fig. 3. Let us consider a simulation

As the upper bound of the generalized NSE is invariant when we introduce
multiplicative biases into the additive error model (Eq. 7), all conclusions in
Sect. 2.3 still hold. Thus, it is legitimate to use the upper bounds of the NDE
and KGE, as expressed by Eqs. (23) and (26), respectively, in the general cases. This
implies that the values

In order to check how the generalized versions of the NSE, NDE, and KGE work,
we re-evaluate the performance of the two simulations in Eqs. (36) and (37). In the
case of

With the generalized NSE, it is now possible to deal with the benchmark
model

In order to assign an appropriate value of the NSE for the cases

In order to clarify this subtle problem, we summarize our arguments as
follows:

From the perspective of signal processing, the additive error model cannot
deal with the benchmark model

In the additive error model,

The mixed additive–multiplicative model enables us to interpret the case of
the observed mean when the multiplicative bias

However, the traditional NSE is not robust to multiplicative biases. When we design a new score that is robust to multiplicative biases, the observed mean should be interpreted as the worst simulation, as it gives us no information on the variability in observations.

Although the observed mean can be easily obtained in hydrological model calibration and seems to be reasonable as a benchmark, it makes no sense to choose the observed mean as a benchmark simulation from the signal-processing viewpoint of the NSE.
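This point can be illustrated numerically (synthetic data; the generalized score is again taken from the verbal description above, with the multiplicative bias estimated as a regression slope, which is an assumption of this sketch): the observed-mean benchmark scores exactly zero under the traditional NSE, but its multiplicative bias vanishes, so dividing the noise-to-signal ratio by it drives the generalized NSE to minus infinity.

```python
import numpy as np

rng = np.random.default_rng(5)
obs = rng.normal(loc=10.0, scale=2.0, size=10_000)
sim = np.full_like(obs, obs.mean())  # the observed-mean benchmark

# Traditional NSE: numerator and denominator coincide, so the score is 0.
nse = 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Multiplicative bias of a constant simulation: the regression slope is 0,
# so the generalized noise-to-signal ratio NSR / alpha diverges and the
# generalized NSE tends to minus infinity (the worst simulation).
alpha = np.polyfit(obs, sim, deg=1)[0]

print(nse, alpha)
```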

The Nash–Sutcliffe efficiency (NSE) is a widely used score in hydrology, but it is not common in other environmental sciences. One of the reasons for its unpopularity is that its scientific meaning is somewhat unclear in the literature. Many attempts to establish a solid foundation for the NSE from several viewpoints, such as linear regression, skill scores, and multiple-criteria scores, exist. This study contributes to these previous works by approaching the NSE from the viewpoint of signal processing. Thus, a simulation is viewed as a received signal containing a wanted signal (observations) contaminated by an unwanted signal (noise). This view underlines the important role of the error model between simulations and observations, which is usually left implicit in our assumptions. Thus, our approach follows Bayesian inference, in which an error model is formally defined and a goodness-of-fit measure is then derived (Mantovan and Todini, 2006; Vrugt et al., 2008). The rationale is to avoid the use of the NSE as a predefined measure without an explicit error model, as in generalized likelihood uncertainty estimation (Beven and Binley, 1992), which has caused a long debate in the hydrology community (Mantovan and Todini, 2006; Stedinger et al., 2008).

By assuming an additive error model, it is easy to point out that the NSE is
equivalent to an important quantity in signal processing: the
signal-to-noise ratio. More precisely, the NSE measures the relative magnitude
of the power of noise to the power of the variation in observations. Therefore,
the NSE is a universal metric that should be applicable in any scientific
field. However, due to its dependence on the power of the variation in
observations, the NSE should not be used as a performance measure to compare
different signals. Its scientific meaning suggests a natural way to choose

As the NSE can be easily increased simply by adding appropriate constants to
simulations and observations, we seek its upper bound

As the additive error model cannot describe the simulations that have
variances smaller than the observation variances, we need to work with a more
general error model to deal with such cases. By assuming a bivariate normal
distribution between simulations and observations, the general error model
is found to be the mixed additive–multiplicative error model. In the
general cases, the traditional NSE is shown to be prone to contradictions: different evaluations of model performance can be drawn from a
simulation simply by scaling it. Therefore, an extension of the NSE
needs to be derived. By requiring that the generalized NSE is invariant in
affine transformations of simulations and observations induced by the
general error model, which helps to avoid any contradiction, the most
appropriate form is found to be the traditional one adjusted by the
multiplicative bias. Again, this has a practical implication on the use of
the NSE and KGE: if the multiplicative factor is not taken into account and the
traditional ones are used instead, both scores are
underestimated or overestimated when the multiplicative bias is greater than or smaller
than one, respectively. The threshold values of

Finally, we summarize some profound explanations that the signal
processing approach to the NSE proposes:

Despite their different forms, the NSE, NDE, KGE, and the correlation coefficient are equivalent, at least when there are no biases, in the sense that they all measure the noise-to-signal ratio, i.e., the ratio of the power of noise to the power of the variation in observations.

The threshold

Furthermore, the signal-processing-based approach seamlessly enables us to
derive the corresponding thresholds for other scores (like the NDE and KGE) in the
same manner, a problem which is not well defined if the benchmark approach
is still followed. Corresponding to

The traditional form of the NSE only reflects the noise-to-signal ratio in the additive error model. It no longer reflects this when multiplicative biases are introduced; as a result, it has an unintuitive form in the general cases.

It is necessary to adjust the traditional NSE in the general cases to avoid potential contradictions in model evaluations. If the effect of multiplicative biases on the noise-to-signal ratio is not considered and the traditional NSE continues to be used, the NSE is underestimated or overestimated when the multiplicative bias is greater than or smaller than one, respectively.

All simulations that are uncorrelated with observations are considered to be the worst
simulations when measured by the NSE or KGE, as no information on the variation
in observations can be retrieved in these cases. The constant simulation
given by the observed mean

The source codes used in this study are available at

No datasets were generated or analyzed during the current study.

LD conceived the idea and prepared the manuscript. The idea was further developed during discussions. YS corrected the treatment of the NSE and KGE in hydrology and revised the manuscript.

The contact author has declared that neither of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work has been supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) within the framework of the Program for Promoting Researches on the Supercomputer Fugaku “Large ensemble atmospheric and environmental prediction for disaster prevention and mitigation” project (grant nos. hp200128, hp210166, and hp220167); the Foundation of River & Basin Integrated Communications (FRICS); and the Japan Science and Technology Agency “Moonshot R&D” project (grant no. JPMJMS2281).

This research has been supported by the Ministry of Education, Culture, Sports, Science and Technology (grant no. hp220167) and the Japan Science and Technology Agency (grant no. JPMJMS2282).

This paper was edited by Roger Moussa and reviewed by two anonymous referees.