When working with hydrological data, it is useful to be able to quantify the similarity of different datasets. The choice of how to make this quantification has a direct influence on the results, with different measures of similarity emphasising particular sources of error (for example, errors in amplitude as opposed to displacements in time and/or space). The Wasserstein distance considers the similarity of mass distributions through a transport lens. In a hydrological context, it measures the “effort” required to rearrange one distribution of water into the other. While the approach is more broadly applicable, this work pays particular attention to hydrographs. The Wasserstein distance is adapted for working with hydrographs in two different ways and tested in two contexts: model calibration and the “averaging” of hydrographs. This alternative definition of fit is shown to be successful in accounting for timing errors due to imprecise rainfall measurements. Averaging an ensemble of hydrographs with the Wasserstein distance is shown to be suitable when the members differ in peak shape and timing but not in total peak volume; when they differ in volume, the traditional mean works well.

A fundamental aspect of hydrology is understanding the distribution and movement of water through the Earth system. It is therefore necessary to quantify the similarity of a pair of spatial (e.g. precipitation fields) or temporal (e.g. hydrographs) distributions of water. The goal of this quantification may simply be to gauge the semblance of the distributions. In other, more complex cases, it may act as an objective function for parameter estimation, model calibration, or data assimilation, where the goal is to minimise the discrepancy between a model output and observed data. In yet other cases, we may be interested in the “average” distribution of water among an ensemble. These varied tasks are unified by the requirement of some measure of similarity between pairs of distributions, and the characteristics of the chosen measure are important for the quality of the result.

Such a quantification varies in nomenclature according to discipline and purpose, but some common terms are “objective”, “response”, or “misfit function”. While it is abundantly clear that there is a great need for comparative measures in hydrology, there remains ambiguity in selecting that which best quantifies discrepancy for any given application. In this selection process, we are led back to the more fundamental query:

Two hydrographs (

Synthetic example of the comparison of modelled (purple) and observed (green) spatial fields in panel

A commonly used metric for comparing both hydrographs and spatial fields is the root mean square error (RMSE), given by Eq. (

The cause of the large residuals in Fig.

In a more general, multi-variate setting, an analogous issue can occur. The misalignment of features displayed in Fig.

This poor quantification of fit under the influence of the displacement of features has led to a range of other “scores” being used when validating forecasts from numerical weather predictions (see for example

Root mean square error

An inadequate measure of fit can also lead to spurious local minima in the parameter estimation objective function, making local optimisation algorithms ineffectual

Whether the misfit is unsuitable due to incorrect quantification of timing errors (Fig.

While briefly addressed in

Much like techniques considered in

In this work, rather than detailing highly specific applications with complex models, we instead introduce methods that are suited for a range of tasks in (but not limited to) hydrology. The main goal is to highlight some of the key concepts of optimal transport and how they may find use in hydrology via simple, illustrative examples. We also discuss the limitations of its use and where further research is required. We limit our experiments to one-dimensional problems, in particular the comparison of hydrographs. However, Sect.

Optimal transport (OT) provides the numerical machinery for interrelating density functions. As distributions of water can be well represented as a density function with appropriate scaling, we suggest that OT gives the required tools for making more satisfactory comparisons than can be achieved by point-wise misfit functions.

Before the more modern interpretation with regards to density functions, OT originated with the work of

Little progress was made on generalised solution methods for this problem until it was rephrased by

The central aspect of the

Panel

Key restrictions for transport are mass conservation, meaning the source and target must have equal mass, and non-negativity, meaning that there cannot be negative mass at any point in either. These properties have led OT to be framed as a problem between probability densities, as these naturally obey both requirements. If we denote the source as

With the overarching goal of transporting

Comparison of RMSE and squared 2-Wasserstein distance for the double-Gaussian problem displayed in Fig.

Once found, the optimal transport map can be used to define the

Armed with the Wasserstein distance, we can, for example, gauge the similarity of two hydrographs or a pair of precipitation fields. Again, note the key property that this is a

These properties with respect to displacements have encouraged application in atmospheric chemistry and seismology alike

Beyond a straightforward measure of fit, the Wasserstein distance can also be used to define an “average” of density functions, known as the

Two Gaussian density functions (blue, red,

Taking the arithmetic mean at each point of the two distributions gives a poor intermediate representation. Both of the densities have a single peak, but the intermediate one in Fig.

Now that the favourable properties of the Wasserstein distance for mass transport problems have been explored, we must discuss the computational methods. This aspect is perhaps the greatest difficulty in large-scale implementation. There are a variety of available techniques, some of which have been explored previously in similar applications, whilst others are a source of future discussion. A brief survey follows, with further details given in the associated references, many of which have unexplored potential for hydrological applications.

It was shown by

The more commonly used approach, especially for machine learning and graphics applications, is to use the Kantorovich formulation and thus consider the source and target densities to be discrete probability distributions, for which a linear program can be solved to obtain a transport plan (a generalised version of the transport map). While this is effective for a broader range of problems than the Monge–Ampère partial differential equation (PDE) that is specific to the squared Euclidean distance cost function, solving the linear program is computationally expensive and scales poorly with the size of the problem. This burden can be somewhat alleviated using the entropically regularised approach pioneered by

There appears to be a bias in the literature towards discrete approaches, seemingly due to many machine learning problems not having a continuous interpretation, leaving the Monge–Ampère and related continuous approaches meaningless. We therefore should not be discouraged from using continuous methods for problems in the Earth sciences with spatial or temporal data, despite the relative lack of previous applications in the present literature. Indeed, we can see previous success of these methods in

The method used in this work is structured around the special case of transport in one dimension, which accommodates a highly computationally efficient solution. When the source density

Process of moving from densities to 2-Wasserstein distance in one spatial/temporal dimension
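The one-dimensional procedure can be sketched in a few lines of code. The following is a minimal illustration rather than the implementation used in this study: it relies on the standard result that, in one dimension, the squared 2-Wasserstein distance equals the integral over quantile levels of the squared difference between the two inverse cumulative distribution functions. The function name, the quantile discretisation (`n_quantiles`) and the Gaussian test densities are our own assumptions.

```python
import numpy as np

def wasserstein2_squared_1d(t, f, g, n_quantiles=1000):
    """Approximate squared 2-Wasserstein distance between two 1-D densities.

    t : increasing grid (time or space); f, g : non-negative values on t.
    Both inputs are rescaled to unit mass before comparison (an assumption
    of this sketch; total-mass differences are therefore ignored).
    """
    t = np.asarray(t, dtype=float)
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)

    # Step 1: cumulative distribution functions of the normalised densities.
    cdf_f = np.cumsum(f) / np.sum(f)
    cdf_g = np.cumsum(g) / np.sum(g)

    # Step 2: invert the CDFs by interpolation at mid-point quantile levels.
    s = (np.arange(n_quantiles) + 0.5) / n_quantiles
    f_inv = np.interp(s, cdf_f, t)
    g_inv = np.interp(s, cdf_g, t)

    # Step 3: integrate the squared difference over the quantile levels.
    return float(np.mean((f_inv - g_inv) ** 2))

# Illustrative use: two identical Gaussian pulses displaced by 2 time units.
t = np.linspace(0.0, 10.0, 2001)
source = np.exp(-0.5 * ((t - 3.0) / 0.3) ** 2)
target = np.exp(-0.5 * ((t - 5.0) / 0.3) ** 2)
print(wasserstein2_squared_1d(t, source, target))
```

For a pure translation such as this one, the result should be close to the square of the displacement, which is the behaviour that makes the distance attractive for timing errors.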

In this study, we use the Nelder–Mead algorithm for optimisation, so the gradient of the Wasserstein distance is not required. However, if a gradient-based method were to be used, the derivative for the one-dimensional case can be derived directly from Eq. (

Much like the Wasserstein distance, finding the Wasserstein barycentre of a series of densities is vastly simplified in one dimension. Rather than performing some type of optimisation scheme to minimise Eq. (

Overall, the Wasserstein barycentre definition of the average is equivalent to the histogram interpolation of

Method of finding the barycentre (solid black) from five Gaussian densities (dashed). Panel

Again, it should be made clear that the computational results discussed here apply

OT is defined only for probability densities. For the applications we envisage, the non-negativity requirement will naturally be obeyed, as we are measuring inherently positive masses of water. The restriction of unit mass requires a little further consideration. We could take one of two differing philosophies: modify the data so they are compatible with OT or redefine OT such that it works with the type of data used. Here, we will consider both. Firstly, we can scale the water distributions such that they have unit mass and

Visual representation of computing the hydrograph–Wasserstein distance. Panel

We will now take the alternative approach of making a modification to OT to suit our hydrological purposes. The proposed result is derived for one-dimensional data but could potentially be extended to the multi-variate case in the same manner as the Radon Wasserstein distance

In a transport sense, we can see from Fig.

Now that the mapping and form of the inverse cumulative distributions have been proposed, the modified Wasserstein distance under the interpretation that it is the total transport cost can be defined according to Eq. (

While Eq. (

The adjustment of barycentres for distributions of arbitrary total mass is much simpler. We can use the scaling defined in Eq. (

As a first application, the calibration of conceptual rainfall–runoff models using the Wasserstein distance as a minimisation objective was considered. This experiment is similar to that found in

A conceptual rainfall–runoff model gauges the relationship between a rainfall time series (hyetograph) and streamflow time series (hydrograph) for a particular watershed. By calibrating the model parameters to best capture this relationship, the model can be used for forecasting future streamflow under chosen rainfall conditions or to infer properties of the watershed itself. Automatic calibration methods seek to do this by minimising the discrepancy between the simulated streamflow from the rainfall–runoff model and the streamflow that was observed at gauging stations

As hydrographs are time series, the efficient one-dimensional techniques described in Sect.

As was used by

As precipitation is a more uncertain measurement than streamflow, we subjected the synthetic rainfall event to timing errors, recalling that these types of errors are poorly represented when using point-wise misfit metrics

True rainfall (solid blue) with timing errors in observations (dashed blue) and the observed hydrograph (black) produced from the true rainfall and storage model described by Eqs. (

We then calibrated the three model parameters to this streamflow using the erroneous rainfall measurements with a variety of misfit functions. Note that in the case of error-less data and a unique solution, all misfit functions discussed here would have a global minimum of 0 at the true model parameters. Optimisation was performed with the RMSE, Wasserstein distance (

Misfit surface as a function of

The same display as Fig.

The same display as Fig.

The Wasserstein-based distances provided a misfit minimum closer to the true model parameters than the RMSE. A greater understanding of this can be garnered by examining the hydrographs produced by the calibrated model (Fig.

Output of the rainfall–runoff model for calibrated parameters under each objective function. Observations displayed in black and simulation in red.

The Wasserstein-based distances better recovered the model parameters and visually give a better hydrograph fit. The RMSE calibration underestimated the maximum flow peaks in favour of more sustained lowered flows. The reason for this can be traced back to Fig.

Of course, this only shows the efficacy of the Wasserstein distance for one particular rainfall event. Random synthetic rainfall events were therefore generated using the method described in Appendix

Calibrated model parameters for 500 rainfall events generated randomly using the method in Appendix

Whilst still exhibiting some variability and bias, the Wasserstein-based distances performed significantly better across the 500 trials, with a median result closer to the true value for all parameters and lower variability. The ordinary Wasserstein distance outperformed the hydrograph–Wasserstein distance for this model. The size of the penalty-weighting factor

The better performance of the penalised Wasserstein distance compared to the hydrograph–Wasserstein distance may be due to the form of this model. As

Consequently, there is no “trade-off” between lowering the Wasserstein distance or penalty term; both are always possible as the model parameters only influence both if mass is pushed outside the time window (hence there being

If there were model parameters that influenced both shape

Section

The effective rainfall time series was generated using the method described in Appendix

Effective rainfall (top, blue) and observed streamflow (bottom, black) for the test problem. Rainfall was generated using the method described in Appendix

The behaviours of both the Wasserstein distance and the RMSE were explored here with respect to the time delays induced by the IUH model. The misfit as a function of the two model parameters is shown in the left-hand-side panels of Fig.

Root mean square error and 2-Wasserstein (with

The key difference between the misfit surfaces in Fig.

From Fig.

Two hydrographs for the same rainfall event but with different hydrological models (

We acknowledge that it takes a relatively poor initial estimate of the model parameters to become trapped in a local minimum for this IUH model. Indeed, with only two model parameters, a brute-force global optimisation method is certainly feasible. Furthermore, a good initial estimate may be generated using the method of moments for the Nash IUH. However, these results show the ability of the Wasserstein distance to recognise the error in improperly aligned peak flows.
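To make the method-of-moments remark concrete, the sketch below assumes the common gamma-density form of the Nash IUH, with a shape parameter `n` (number of reservoirs) and a scale parameter `k` (storage constant). For this form, the IUH has mean `n*k` and variance `n*k**2`, and because streamflow is the convolution of effective rainfall with the IUH, these equal the differences between the streamflow and rainfall moments. The symbols, function names and synthetic rainfall pulse are our illustrative assumptions, and the sketch takes `n >= 1`.

```python
import math
import numpy as np

def nash_iuh(t, n, k):
    """Nash IUH in its gamma-density form (assumed parameterisation, n >= 1)."""
    t = np.asarray(t, dtype=float)
    return (t / k) ** (n - 1) * np.exp(-t / k) / (k * math.gamma(n))

def mean_and_variance(t, x):
    """First moment and central second moment of a non-negative series."""
    w = np.asarray(x, dtype=float) / np.sum(x)
    mean = np.sum(w * t)
    return mean, np.sum(w * (t - mean) ** 2)

def nash_method_of_moments(t, rain, flow):
    """Initial estimate of (n, k) from rainfall and streamflow moments."""
    mean_r, var_r = mean_and_variance(t, rain)
    mean_q, var_q = mean_and_variance(t, flow)
    lag = mean_q - mean_r      # equals n * k for the Nash IUH
    spread = var_q - var_r     # equals n * k**2 for the Nash IUH
    k = spread / lag
    n = lag / k
    return n, k

# Illustrative check on a synthetic event with known parameters.
dt = 0.1
t = np.arange(0.0, 200.0, dt)
rain = np.where((t >= 5.0) & (t < 15.0), 1.0, 0.0)
flow = np.convolve(rain, nash_iuh(t, 3.0, 4.0))[: t.size] * dt
n_est, k_est = nash_method_of_moments(t, rain, flow)
print(n_est, k_est)
```

With noise-free synthetic data the estimates should lie close to the parameters used to generate the flow; with real data they would serve only as a starting point for the optimiser.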

The power of the Wasserstein distance may come to the fore when a more complex watershed, and thus a more complex model, is used, perhaps producing multiple pulses per rainfall event and yielding multiple peaks in the IUH. These results also capture the general behaviour of models possessing significant delay times, allowing the simulated hydrograph to “shift” in time across the observations, in a similar manner to Fig.

Beyond the use of the Wasserstein distance purely as an objective function, attention will now be turned to the application of Wasserstein barycentres to hydrology, using hydrographs as the object of study. The power of the Wasserstein barycentre is that it gives a notion of an “average” when features have been displaced.

This means that an ensemble of hydrographs describing the same event, perhaps with different climate or hydrological models, can be “averaged” into a single hydrograph that carries characteristics of each ensemble member. Note that this will only be true if the ensemble members differ mainly in timing and peak shape rather than in peak volume.
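The averaging idea can be sketched for one-dimensional data as follows. The sketch uses the standard result that, in one dimension, the inverse cumulative distribution function of the Wasserstein barycentre is the weighted average of the members' inverse cumulative distribution functions. It is an illustration under our own assumptions (function name, equal default weights, unit-mass scaling of each member, Gaussian test pulses) and not the code used in this study.

```python
import numpy as np

def wasserstein_barycentre_1d(t, members, weights=None, n_quantiles=2000):
    """Approximate 1-D Wasserstein barycentre of non-negative series on grid t.

    Each member is rescaled to unit mass; the returned density integrates
    to approximately one over t.
    """
    t = np.asarray(t, dtype=float)
    if weights is None:
        weights = np.full(len(members), 1.0 / len(members))

    # Average the inverse CDFs of the members at common quantile levels.
    s = (np.arange(n_quantiles) + 0.5) / n_quantiles
    q_bar = np.zeros(n_quantiles)
    for w, m in zip(weights, members):
        m = np.asarray(m, dtype=float)
        cdf = np.cumsum(m) / np.sum(m)
        q_bar += w * np.interp(s, cdf, t)

    # Invert the averaged quantile function to obtain the barycentre CDF,
    # then differentiate it to recover a density on the original grid.
    cdf_bar = np.interp(t, q_bar, s, left=0.0, right=1.0)
    return np.gradient(cdf_bar, t)

# Illustrative use: two pulses of equal volume but different timing.
t = np.linspace(0.0, 10.0, 1001)
early = np.exp(-0.5 * ((t - 3.0) / 0.5) ** 2)
late = np.exp(-0.5 * ((t - 7.0) / 0.5) ** 2)
barycentre = wasserstein_barycentre_1d(t, [early, late])
pointwise_mean = 0.5 * (early / early.sum() + late / late.sum())
print(t[np.argmax(barycentre)])
```

In this example the barycentre has a single peak located between the two members, whereas the point-wise mean retains two peaks of reduced amplitude at the original times.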

To test this premise, two different IUH models were applied to the same synthetic rainfall event, giving the differing hydrographs shown in Fig.

Unlike the ordinary mean, the Wasserstein barycentre captured the correct number and general characteristics of the peak flows. Again, this is built on the assumption that the peaks of each hydrograph mainly differ in shape and timing but not volume. If they differed in volume, the barycentre could exhibit peaks in unusual locations, as the mass is being “viewed” halfway through transport across the domain.

Consider the two hydrographs shown in Fig.

The same hydrological model applied to rainfall events differing in amplitude (dashed black and red, inverted plots in panel

We therefore see that while the Wasserstein barycentre is well suited for an ensemble of hydrographs with differing peak flow timings, it is not well suited for amplitude differences. This is to be expected from its definition. We therefore suggest that the conventional mean is best suited for an ensemble with differing amplitudes but consistent timings, whilst the Wasserstein barycentre is appropriate when timing is variable, but volumes in peak flows are consistent.

Quantifying the similarity of temporal or spatial distributions of water occupies an important role within hydrology. The way in which a “good” fit is defined directly influences the character of the results. Many commonly used misfit functions, such as root mean square error and Nash–Sutcliffe efficiency, quantify fit by considering differences in amplitude. While this is perfectly acceptable when errors are restricted to amplitude, these measures of misfit quantify displacement errors poorly.
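The contrast between amplitude-based and transport-based measures can be illustrated with a small numerical experiment. The sketch below is our own construction (Gaussian pulses, grid and function names are assumptions): it compares a pulse with progressively delayed copies of itself. Once the pulses no longer overlap, the RMSE stops changing with the delay, whereas the squared 2-Wasserstein distance, computed here from inverse cumulative distribution functions, continues to grow with the square of the delay.

```python
import numpy as np

def rmse(f, g):
    """Point-wise root mean square error between two series."""
    return float(np.sqrt(np.mean((np.asarray(f) - np.asarray(g)) ** 2)))

def w2_squared(t, f, g, n_quantiles=2000):
    """Squared 2-Wasserstein distance in 1-D via inverse CDFs (unit mass)."""
    s = (np.arange(n_quantiles) + 0.5) / n_quantiles
    f_inv = np.interp(s, np.cumsum(f) / np.sum(f), t)
    g_inv = np.interp(s, np.cumsum(g) / np.sum(g), t)
    return float(np.mean((f_inv - g_inv) ** 2))

def pulse(t, centre, width=0.3):
    """Gaussian pulse standing in for a single flow peak."""
    return np.exp(-0.5 * ((t - centre) / width) ** 2)

t = np.linspace(0.0, 12.0, 2401)
observed = pulse(t, 3.0)
for delay in (0.5, 1.0, 3.0, 5.0):
    simulated = pulse(t, 3.0 + delay)
    print(delay, rmse(observed, simulated), w2_squared(t, observed, simulated))
```

The flat RMSE response at large delays is what gives a local optimiser no information about the direction in which a misplaced peak should move.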

In this work, we have suggested the Wasserstein distance, derived from optimal transport, as a candidate misfit function for applications where displacement or timing errors are prominent. This quantifies difference in terms of the effort required to transform one mass distribution into another. A modification of traditional OT and the associated “hydrograph–Wasserstein distance” was also developed.

While the Wasserstein-based measures certainly gave better calibration results than RMSE, there was still a slight bias towards peaks of reduced amplitudes, although to a much lesser extent. Some type of multi-objective method may circumvent this by using a measure comparing the amplitude of peak flows. Although the Wasserstein distance with a penalty term outperformed the hydrograph–Wasserstein distance for these synthetic tests, there may be broader implications for this second misfit function. Very little attention has been given thus far to the modification of OT for particular applications, with most preferring to force data into the density function mould. Other ways of modifying OT whilst still capturing its essence are therefore a key point of further research, as they may allow the simultaneous capture of displacement and total mass errors without the need for a user-defined weighting term between the aspects. Focus in this work has been placed upon the Wasserstein distance rather than the optimal map from which it is derived. There is unexplored potential in this optimal map for providing a two-way mapping between collected data and a reference distribution in a similar vein to flow anamorphosis

As proposed by

It is also important to remember that, for more complex models and data, we cannot expect the Wasserstein distance to capture

For the synthetic experiments, a method for generating random rainfall events was required. This is done by first setting a time window length. Let this window be the interval [0,

A number of rainfall events

To corrupt the measurements with timing errors, the timing of each storm is shifted independently by a number

Code is available at

All data used in this study are synthetic and can be generated from the code found at

JCM wrote the manuscript and code for this paper. JCM and MS both developed the theory of the work. MS assisted in editing and preparing the manuscript and supervised the project.

The contact author has declared that neither of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Jared C. Magyar completed this work with the support of an InLab honours scholarship funded by the CSIRO Future Science platform for Deep Earth Imaging. The authors would also like to acknowledge helpful discussions with Andrew Valentine on this topic. The authors would like to thank Uwe Ehret and Luk Peeters for their constructive reviews that improved the quality of the manuscript.

This paper was edited by Gerrit H. de Rooij and reviewed by Uwe Ehret and Luk Peeters.