This article introduces an improvement in the Series Distance (SD) approach for the improved discrimination and visualization of timing and magnitude uncertainties in streamflow simulations. SD emulates visual hydrograph comparison by distinguishing periods of low flow and periods of rise and recession in hydrological events. Within these periods, it determines the distance of two hydrographs not between points of equal time but between points that are hydrologically similar. The improvement comprises an automated procedure to emulate visual pattern matching, i.e. the determination of an optimal level of generalization when comparing two hydrographs, a scaled error model which is better applicable across large discharge ranges than its non-scaled counterpart, and “error dressing”, a concept to construct uncertainty ranges around deterministic simulations or forecasts. Error dressing includes an approach to sample empirical error distributions by increasing variance contribution, which can be extended from standard one-dimensional distributions to the two-dimensional distributions of combined time and magnitude errors provided by SD.

In a case study we apply both the SD concept and a benchmark model (BM) based on standard magnitude errors to a 6-year time series of observations and simulations from a small alpine catchment. Time–magnitude error characteristics for low flow and rising and falling limbs of events were substantially different. Their separate treatment within SD therefore preserves useful information which can be used for differentiated model diagnostics, and which is not contained in standard criteria like the Nash–Sutcliffe efficiency. Construction of uncertainty ranges based on the magnitude of errors of the BM approach and the combined time and magnitude errors of the SD approach revealed that the BM-derived ranges were visually narrower and statistically superior to the SD ranges. This suggests that the combined use of time and magnitude errors to construct uncertainty envelopes implies a trade-off between the added value of explicitly considering timing errors and the associated, inevitable time-spreading effect which inflates the related uncertainty ranges. Which effect dominates depends on the characteristics of timing errors in the hydrographs at hand. Our findings confirm that Series Distance is an elaborated concept for the comparison of simulated and observed streamflow time series which can be used for detailed hydrological analysis and model diagnostics and to inform us about uncertainties related to hydrological predictions.

Manifold epistemic and aleatory uncertainties make the simulation of streamflow a fairly uncertain task. The assessment of uncertainties, i.e. quantification, evaluation, and communication, is thus of great concern in decision making, model evaluation, the design of technical structures like flood protection dams or weirs, and many other issues. The quantification and evaluation of uncertainties typically involves the comparison of simulated and observed rainfall–runoff data.

For this purpose, visual hydrograph inspection is still the most widely used technique in hydrology as it allows for the simultaneous consideration of various aspects such as the occurrence of hydrological rainfall–runoff events, the timing of peaks and troughs, the agreement in shape, and the comparison of individual rising or falling limbs within an event. The main strength of visual hydrograph comparison results from the human ability to identify and compare matching, i.e. hydrologically similar parts of hydrographs ("to compare apples with apples") and particularly to discriminate vertical (magnitude) and horizontal (timing) agreement of hydrographs. Whereas the former implies that rising and falling limbs of the two time series are intuitively and meaningfully matched before they are compared, the latter refers to a joint yet individual consideration of timing and magnitude errors. Visual hydrograph inspection is hence a powerful yet demanding evaluation technique which is still rather difficult to mimic by automated methods. Clear disadvantages of visual hydrograph inspection, however, are its subjectivity and its restriction to a limited number of events.

To overcome these shortcomings, a large number of numerical criteria

A less recognized issue of MSE-type criteria is that these compare points with identical abscissa, i.e. at the same position in time. This means that points in the observation are “vertically” compared to points in the simulation (in the following we refer to them as vertical metrics). The problem with this is that small errors in timing may be expressed as large errors in magnitude. It is obvious that neither individual criteria nor the combination of different vertical metrics within a multi-objective approach can compensate for this.
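This timing-to-magnitude error conversion is easy to demonstrate. The following minimal Python sketch (illustrative data only, not from the case study) compares two identically shaped triangular hydrographs that differ only by a small time shift:

```python
import numpy as np

# Hypothetical example: a triangular flood wave and the same wave shifted
# by two time steps. The shapes are identical; only the timing differs.
t = np.arange(20)
obs = np.maximum(0.0, 5.0 - np.abs(t - 8))   # observed hydrograph, peak at t = 8
sim = np.maximum(0.0, 5.0 - np.abs(t - 10))  # simulated hydrograph, peak at t = 10

# A vertical metric such as the mean squared error compares points at
# identical positions in time ...
mse = np.mean((obs - sim) ** 2)

# ... and therefore reports a substantial magnitude error (1.7 here),
# although the only deviation between the two series is a timing shift.
print(mse)
```

Sorting both series by value shows that they contain exactly the same magnitudes; the entire error reported by the vertical metric stems from timing alone.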

Just as with performance criteria, many methods related to the quantification,
visualization, and communication of uncertainties were developed in recent
decades, and the value of knowledge about simulation uncertainty is now
generally acknowledged. The range of methods is large and comprises manifold
probabilistic and non-probabilistic approaches. Probabilistic concepts, for
instance, include the total model uncertainty concept

Motivated by the limitations of vertical distance metrics,

Here we present substantial improvements (Sect. 2) to the original approach
of

Time series of observed (black) and simulated (grey) discharge
during a hydrological event. The horizontal line represents a user-specified
threshold which differentiates between event and non-event periods. The light
grey lines represent the Series Distance connectors linking hydrologically
comparable points in the two time series. Time and magnitude distances are
calculated between these points. The black rectangle highlights time steps
where a part of the recession of the simulation overlaps with a rising part
of the observation (figure from

SD was developed to emulate the strengths of visual
hydrograph inspection in an automated procedure, which typically rests on the
following premises

Hydrographs contain individual events separated by periods of low flow.

Events are composed of rising and falling limbs or segments which are separated by peaks and troughs.

These different parts of event hydrographs reflect different hydrometeorological
processes and should be compared individually, so as not to compare apples
with oranges. This is of particular importance if the simulated (sim in the
following) and observed (obs in the following) hydrographs belong to different parts of the hydrograph at the same time
step

A comprehensive evaluation of the agreement of matching rising and falling limbs of two hydrographs requires consideration of both errors in timing and magnitude as this better informs us about ways to improve the model. A simulated rising limb can, for example, match perfectly with its observed counterpart with respect to values but occur systematically too early or too late, which would indicate the need to adjust model parameters related to runoff concentration and flood routing or to improve the related model components.

A comprehensive comparison of sim and obs should also provide information on the overall agreement with respect to the occurrence of relevant events and times of low flow. This is typically expressed by contingency tables, which contain information about correctly predicted, missed, and falsely predicted events.
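Such a contingency table can be sketched as follows. The representation of events as (start, end) index pairs and the overlap-based pairing rule are simplifying assumptions for illustration; the actual SD pairing of events is described in Sect. 2.2:

```python
def contingency(obs_events, sim_events):
    """Count hits, misses, and false alarms between observed and simulated
    events, each given as a (start, end) index pair. Two events are
    considered matching if they overlap in time (a simplifying assumption;
    other pairing rules are possible)."""
    def overlaps(a, b):
        return a[0] <= b[1] and b[0] <= a[1]

    # hits: observed events with a matching simulated counterpart
    hits = sum(any(overlaps(o, s) for s in sim_events) for o in obs_events)
    # misses: observed events without a simulated counterpart
    misses = len(obs_events) - hits
    # false alarms: simulated events without an observed counterpart
    false_alarms = sum(
        not any(overlaps(s, o) for o in obs_events) for s in sim_events)
    return {"hits": hits, "misses": misses, "false_alarms": false_alarms}
```

For example, with observed events at (0, 5), (10, 15), and (20, 25) and simulated events at (1, 6) and (30, 35), the function reports one hit, two misses, and one false alarm.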

The criteria listed above inform about different error sources, and their
individual evaluation therefore provides useful information for targeted
model improvement. As SD accounts for all of these aspects, it is not a
single formula but rather a procedure which includes the following steps. For
each step, the main innovations are described in detail in the sections below.

Hydrograph preprocessing (Sect. 2.1). New: routines to create gap-free, non-negative time series and to filter irrelevant fluctuations.

Identification and pairing of events (Sect. 2.2). New: routines to read user-specified events and to treat the entire time series as a single, long event.

Identification, matching, and coarse-graining of segments (Sect. 2.3). New: this part has been completely reworked and now applies the coarse-graining procedure.

Calculation of the distance between matching segments with respect to both timing and magnitude (Sect. 2.4). This is the core of SD, and it is important to note that the distances are computed between points of the hydrographs considered to be hydrologically similar. New: routines to calculate a scaled magnitude error.

Calculation of a contingency table which counts matching, missing, and false events. No changes.

The application of SD usually requires some preprocessing to ensure gap-free
and non-negative time series of equal length; related routines are now
included in the SD code. Further routines are available for the adjustment
of consecutive identical values (the identification of rising and falling
limbs requires non-zero gradients) and for time series smoothing, which is
often necessary due to the presence of non-relevant, sensor-related micro-segments.
Smoothing is based on the Douglas–Peucker algorithm
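For reference, the core of the Douglas–Peucker algorithm can be sketched as follows. The tolerance parameter `eps` and the (time, discharge) point representation are illustrative assumptions, not the settings of the SD code:

```python
import numpy as np

def douglas_peucker(points, eps):
    """Simplify a polyline with the Douglas-Peucker algorithm.
    points: sequence of (time, discharge) pairs.
    eps: tolerance; inner vertices closer than eps to the chord between
    the first and last point of a section are dropped.
    Returns the indices of the retained vertices."""
    pts = np.asarray(points, dtype=float)

    def simplify(i, j):
        if j <= i + 1:
            return [i, j]
        dx, dy = pts[j] - pts[i]
        seg = pts[i + 1:j] - pts[i]
        # perpendicular distance of every inner vertex to the chord i-j
        d = np.abs(dx * seg[:, 1] - dy * seg[:, 0]) / np.hypot(dx, dy)
        k = int(np.argmax(d)) + i + 1
        if d[k - i - 1] <= eps:
            return [i, j]                  # all inner vertices within tolerance
        # keep the farthest vertex and recurse on both halves
        return simplify(i, k)[:-1] + simplify(k, j)

    return simplify(0, len(pts) - 1)
```

Applied to a hydrograph with a tiny sensor fluctuation and one real peak, e.g. `[(0, 0), (1, 0.01), (2, 0), (3, 2), (4, 0), (5, 0)]` with `eps = 0.1`, the micro-fluctuation at index 1 is removed while the peak at index 3 and its flanks are retained.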

For many aspects of hydrology such as flood forecasting or studies of
rainfall–runoff transformation, it is useful to consider a hydrograph as a
succession of distinct events, usually triggered by rainfall events, separated
by periods of low flow. As SD is based on the concept of comparing similar
parts of obs and sim hydrographs, it ideally also involves the steps of
identifying events both in the obs and sim time series and then relating the
resulting events between the series. On this level, the general agreement of
the two series is evaluated with a contingency table, which counts the number
of hits (observed events that have a matching simulated counterpart), misses
(observed events without a simulated counterpart), and false alarms (simulated
events without an observed counterpart). This is also the basis for the
further steps of the SD procedure: only for matching pairs of obs–sim
events can matching segments of rise and fall within the events be
identified and the combined time–magnitude error be computed. For misses,
false alarms, and periods of low flow this is not possible. For these cases,
the best indicator of hydrological similarity in obs and sim is similarity
in time; i.e. the distance between the observed and simulated hydrograph can
be computed with a standard vertical distance measure. The detection of
events in hydrographs and their subsequent pairing, however, is not trivial and
has to our knowledge not yet been solved in an automated and generalized way.
The original version of SD applied a simple no-event threshold (see
Fig.

This section describes the core of the SD concept, i.e. the way to identify, within a matching pair of an observed and a simulated event, hydrologically comparable points of the hydrographs in order to quantify their distance in magnitude and time. This pattern matching procedure has been substantially improved in the new version of SD and is therefore described in detail here.

The term “hydrologically comparable” relates to how a hydrologist would
visually compare hydrographs and includes several aspects and constraints. The first constraint is based on the perception that even if hydrological
simulations may deviate from the observations in magnitude or timing, their
temporal order is usually correct. Therefore, in SD, matching points are
compared chronologically by preserving their temporal occurrence: the first
point in obs is compared to the first in sim, the second to the second, the
last to the last. Please note that this does not require the two events to be
of equal length, as in SD, the hydrograph is considered a polygon from
which the points to compare can be sampled by linear interpolation without
restriction to its edge nodes. This is explained in detail below. The second
constraint relates to the slope of the hydrograph: to ensure hydrological
consistency, points within rising segments of sim are only compared to points
in rising segments of obs, and the same applies to falling segments. This
creates a problem related to the within-event variability of the two
hydrographs: it is easy to imagine a case in which the number of segments in the
obs and sim event differs. This can be either due to sensor-related
high-frequency micro fluctuations of the observations, which can create
sequences of many short rising and falling segments, or to general deviations of
the simulation from the observation, such as a double-peaked simulated event
while the observed event is single-peaked. In visual hydrograph evaluation, a
hydrologist will detect the dominant patterns of rise and fall in the two
time series and identify matching segments by doing two things: filtering out
short, non-relevant fluctuations and then relating the remaining ones by jointly
evaluating their similarity in timing, duration, and slope. The stronger the
overall disagreement of the obs and sim event, the more visual
coarse-graining will be done before the hydrographs are finally compared,
while at the same time the degree of coarse-graining will also influence the
hydrologist's evaluation of the hydrograph agreement: the higher the required
degree of coarse-graining, the smaller the agreement. In SD, these steps
are emulated by iteratively maximizing an objective function: while
increasingly coarse-graining the two events, their overall time and magnitude
distance is evaluated. The final evaluation of agreement is then done on the
level at which the optimal trade-off between coarse-graining and hydrograph
distance occurs, i.e. where the objective function is minimal. The procedure
consists of four steps and is explained in the following sections:
(1) determination of segment properties, (2) equalizing the number of segments in
the obs and sim event, (3) iterative coarse-graining, and (4) distance
computation for the optimal coarse-graining level.
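The polygon view of the hydrograph and the chronological matching constraint described above can be sketched as follows; the function names and the fixed number of sampled points `n` are our illustrative choices:

```python
import numpy as np

def sample_segment(t, q, n):
    """Sample n points evenly spaced along the duration of a segment,
    treating the hydrograph as a polygon: points between edge nodes are
    obtained by linear interpolation."""
    t = np.asarray(t, float)
    q = np.asarray(q, float)
    ts = np.linspace(t[0], t[-1], n)
    return ts, np.interp(ts, t, q)

def pair_points(obs_t, obs_q, sim_t, sim_q, n):
    """Pair points of matching obs and sim segments in chronological
    order: the first sampled obs point is connected to the first sampled
    sim point, the second to the second, and so on. Returns the time and
    magnitude components of each connector."""
    to, qo = sample_segment(obs_t, obs_q, n)
    ts, qs = sample_segment(sim_t, sim_q, n)
    return ts - to, qs - qo   # timing error, magnitude error per pair
```

For example, a simulated rising limb with the same shape as its observed counterpart but delayed by one time step yields connectors with a constant timing error of one step and zero magnitude error, which is exactly the diagnosis a vertical metric cannot provide.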

For each segment

If the number of segments in the obs and sim event differs, they are

It is important to note that this procedure is a purely logical assimilation:
the timing and magnitude of the points in the dissolved segment remain
unchanged;
they are only reassigned to the new and larger segment. This also implies
that the meaning of coarse-graining in the context of SD is slightly
different from its meanings in statistics and thermodynamics or in upscaling

Obviously, this procedure includes a false classification: the rising segment
in the previous example is now hidden within a larger falling segment. This
can be considered as the price of coarse-graining and can be quantified by
the number of falsely classified edge nodes (

With the number of segments in the obs and sim events equalized, their SD
timing and magnitude distance can be computed. To this end, the first obs
segment is compared to the first sim segment, the second to the second, etc.
Since the segments can differ in length, we assume that for each segment
pair, the appropriate number of points is evenly distributed along the
segment duration and can thus be found by linear interpolation between the
time series edge nodes. The first point in the obs segment is then connected
to the first point in the sim segment, the second to the second, etc. For
each connector its horizontal and vertical projection, i.e. length in time
and magnitude, respectively, is determined (compare again
Fig.

In the initial version of SD, the number of points for each segment pair was
found by calculating the mean of the two relative durations,

At this point the result of the SD procedure – a two-dimensional
distribution of time and magnitude errors, separately for the rising and the
falling segments – is available. However, in practice the problem of
non-intuitive segment matching often spoils the results. Due to the
constraint of time-ordered segment matching, any minor change in monotonicity
within a rising or a falling limb that is present in only one of the obs and
sim events will produce a false matching of segments. The left panel in
Fig.

We overcome this problem using iterative coarse-graining again: within the
events, successively more segments are logically aggregated with their
neighbours until finally the entire event consists of only two segments: one
rise and one fall. In contrast to the previous step, in which coarse-graining
was applied to either sim or obs in order to equalize the number of segments
in the simulated and observed event, here it is applied simultaneously
to both the obs and the sim event. Hence, an equal number of segments and unique
segment matching is ensured. The final comparison of the two events is done
for the coarse-graining step in which the total SD errors and the degree of
coarse-graining together are small. Both requirements are considered in the
coarse-graining objective function (
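The selection of the optimal coarse-graining level can be sketched as follows. Note that the objective function used here, a weighted sum of a normalized distance term and the fraction of falsely classified edge nodes, is a hypothetical stand-in for the actual objective function referenced above:

```python
# Illustrative sketch only: the objective function below is an assumed
# stand-in (weighted sum of a normalized SD distance term and the fraction
# of falsely classified edge nodes), not the paper's exact formulation.
def optimal_level(levels, w_dist=0.5, w_cg=0.5):
    """levels: one dict per coarse-graining step with pre-computed,
    normalized entries 'distance' (total SD error) and 'false_nodes'
    (fraction of falsely classified edge nodes, i.e. the price of
    coarse-graining). Returns the index of the step minimizing the
    objective, i.e. the level at which the final comparison is done."""
    scores = [w_dist * lv["distance"] + w_cg * lv["false_nodes"]
              for lv in levels]
    return scores.index(min(scores))
```

The trade-off described in the text is visible in the scores: early steps have many poorly matching segments (large distance, no misclassification), the final two-segment representation has a large misclassification share, and the optimum lies in between.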

Note that

As can be seen in Fig.

Illustration of the time-ordered matching of segments in the coarse-graining procedure. The rising and falling segments of the simulation (sim) and observation (obs) are numbered and colour-coded according to their chronological order. Series Distance compares segments with identical number and colour.

Coarse-graining steps: all plots contain data from the same
multi-peak discharge event but for different levels of coarse-graining. The
initial conditions (top left panel) are characterized by a large number of
poorly matching simulated (dashed) and observed (solid) segments as indicated
by the non-intuitively placed SD connectors (grey lines). Segments required
to match according to the chronological order constraint of SD are indicated
by matching colours. In the last coarse-graining step (top right panel) the
connectors are placed more meaningfully but the representation of the entire
event by only two segments (one rise, one fall) appears inadequately coarse.
The optimal level of coarse-graining, here reached at step three, yields
visually acceptable connectors while preserving a detailed segment structure
(bottom left panel). This step is associated with a minimum of the
coarse-graining objective function (Eq.

Once the coarse-graining is done, the optimal value of

In the initial version of SD, the magnitude error (

The application of SD timing and magnitude error models (

Analogously to the scaled vertical SD error model in Eq. (

The SD concept can be applied to a variety of tasks such as model
diagnostics, parameter estimation, calibration, or the construction of
uncertainty ranges. In this section we provide one example thereof and
describe a heuristic approach for the construction of uncertainty ranges for
deterministic streamflow simulations. Uncertainty ranges provide regions of
confidence around an uncertain estimate; they are of practical relevance and a
straightforward means of highlighting and assessing magnitude and timing
uncertainties of hydrological simulations or forecasts. Conceptually,
uncertainty ranges should be wide enough to capture a significant portion of
the observed values but as narrow as possible to be precise and, thus,
meaningful. These requirements are antagonistic as large uncertainty ranges,
which capture most or all observations, are usually imprecise to a degree
that makes them useless for decision-making purposes

The method we propose here follows the concept proposed by

Provided with a record of past streamflow observations (

The choice of the sampling strategy, however, strongly influences the
statistics of the resulting uncertainty ranges and should be carefully
selected. In our case, the precondition was that the approach should be
extendible to two-dimensional cases to allow its later application to the
error distributions of the SD approach. Therefore, we defined the sampling
strategy according to the variance contribution, which is straightforward to
apply for the one-dimensional case: for each point of the error distribution
its relative contribution (d
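A sketch of this one-dimensional sampling strategy follows; the exact subset definition (adding points of increasing contribution until a given share of the total variance is covered) is our assumption for illustration:

```python
import numpy as np

def variance_subset(errors, share=0.8):
    """Sample a subset of an empirical 1-D error distribution by increasing
    variance contribution. Each point's relative contribution is its
    squared deviation from the mean divided by the total sum of squared
    deviations; points are added in order of increasing contribution until
    `share` of the total variance is covered. (The exact subset definition
    is an assumption for illustration.)"""
    e = np.asarray(errors, float)
    dev2 = (e - e.mean()) ** 2
    contrib = dev2 / dev2.sum()          # relative variance contribution
    order = np.argsort(contrib)          # increasing contribution
    cum = np.cumsum(contrib[order])
    return e[order[cum <= share]]        # subset covering `share` of variance
```

A useful side effect, noted later in the discussion, is that this sampling effectively filters outliers: a single extreme error dominates the total variance and is therefore the first point excluded from the subset.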

SD yields two-dimensional distributions of coupled errors in timing and
magnitude and thus requires a two-dimensional strategy for the sampling of
error subsets and the construction of envelope curves (Fig.

How does one sample from bivariate distributions of coupled errors with different
units? Statistics and computational geometry offer concepts based on the ordering
of multivariate data sets, such as geometric median or centre point
approaches. The former provides a central tendency for higher dimensions and
is a generalization of the median which, for one-dimensional data, has the
property of minimizing the sum of distances. Centre points are generalizations
of the median in higher-dimensional Euclidean space and can be approximated
by techniques such as the Tukey depth

Sketch of the one- and two-dimensional error dressing method using
normally distributed random numbers (

Analogously to the one-dimensional case, the points are ordered by increasing
combined variance contribution d
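A sketch of the two-dimensional extension follows; standardizing each dimension by its own total sum of squared deviations, so that the differing units of timing and magnitude errors become comparable, is our assumption for illustration:

```python
import numpy as np

def variance_subset_2d(dt, dq, share=0.8):
    """Two-dimensional analogue of variance-contribution sampling for
    coupled timing (dt) and magnitude (dq) errors. Each dimension is
    normalized by its own total sum of squared deviations so that the
    differing units become comparable (an assumption of this sketch); the
    combined contribution of a point is the sum of its normalized squared
    deviations in both dimensions. Returns a boolean mask of the subset."""
    dt = np.asarray(dt, float)
    dq = np.asarray(dq, float)
    ct = (dt - dt.mean()) ** 2 / ((dt - dt.mean()) ** 2).sum()
    cq = (dq - dq.mean()) ** 2 / ((dq - dq.mean()) ** 2).sum()
    c = (ct + cq) / (ct + cq).sum()      # combined relative contribution
    order = np.argsort(c)                # increasing combined contribution
    cum = np.cumsum(c[order])
    mask = np.zeros(len(c), dtype=bool)
    mask[order[cum <= share]] = True
    return mask
```

As in the one-dimensional case, points that are extreme in either timing or magnitude (or both) contribute most to the combined variance and are the first to be excluded from the sampled subset.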

SD distinguishes periods of low flow, rising limbs, and falling limbs. Hence,
subsets from the two 2-D error distributions (rising and falling limbs) and
from the one-dimensional error distribution (low flow) are calculated and
applied to each point of a simulation: points of low flow are dressed with
the low-flow error subset, points of rise with error subsets from rising limbs, etc.
Altogether this yields a region of overlapping error ovals around a
simulation (Fig.

This case study, based on real-world data, serves to present and to discuss relevant aspects of SD by comparison with a benchmark error model (BM).

We used discharge observations (

For the benchmark model, we derived distributions of 1-D vertical errors. We
did not differentiate cases of low flow and events, which is rather
simplistic but standard practice. For the SD approach we did differentiate
these cases. This may be considered an unfair advantage for SD as it allows
the construction of more custom-tailored uncertainty envelopes. However, as
the objective of the case study is not a competition between the two
approaches but a way to present interesting aspects of SD, we considered it
justified. For SD, the required starting and end points of hydrological
events were manually determined both in

Both for SD and BM, we applied scaled errors (

Based upon SD and BM we derived empirical error distributions from the
entire test period and then used them, in the same period, to construct
uncertainty envelopes around the simulation

The evaluation of deterministic uncertainty ranges requires methods to quantify properties such as coverage or precision. Here we propose a set of statistics which can be applied to uncertainty ranges irrespective of how they were constructed. While this ensures comparability of the SD and BM-derived ranges, it does not exploit the advantages of the SD approach, i.e. separate treatment of time and magnitude uncertainties.

Coverage (

Precision (PRC) allows the comparison of different uncertainty ranges. PRC is
the average width of the uncertainty envelope, i.e. the average difference of
the upper (UE

Finally we suggest scaling PRC by the value of the simulation according to
Eq. (

In the case study, we used

In this section we first discuss some general aspects of the SD concept and then compare it to the benchmark approach using the case study data.

Series Distance is an elaborate method for the comparison of simulated and observed streamflow time series. The concept allows the distinction between different hydrological conditions (low flow and rising and falling limbs) and determines joint errors in timing and magnitude of matching points within matching segments of related hydrographs. Differences in the high- and/or low-frequency agreement of the obs and sim hydrographs are considered with an iterative coarse-graining procedure, which effectively mimics visual hydrograph comparison. This differentiated evaluation makes SD a powerful tool for model diagnostics and performance evaluation.

The challenges of SD are, however, in the details: the robust, precise, and
meaningful partitioning of the hydrograph into periods of low flow and events
is difficult. We tested various approaches including baseflow separation and
filtering techniques

Qualitative analyses of the weighting factors

Qualitative description of the impact of the different weighting
factors of the objective function

Optimal coarse-graining solution of the event depicted in
Fig.

The hydrograph matching algorithm (HMA) proposed by

Error dressing is a simple method and straightforward to apply. Conceptually
it is very similar to statistical concepts like the total uncertainty method
introduced by

The error dressing concept in the presented form does not distinguish between seasons or different flow magnitudes, as the same error distributions are applied to each rising (or falling) limb. More sophisticated implementations are of course possible, such as a differentiation of errors according to flow magnitudes to better capture extremes, or according to forecast lead times. The same applies to the sampling strategy: as an alternative to the variance-contribution method presented here, one could sample specific quantiles using the median as the central reference, or fit and apply any parametric function to the distribution. A practical insight from applying the error dressing concept is that the variance-based method effectively filters outliers, which sometimes occur when errors are calculated between poorly matching segments.

A last general issue relates to the sampling from the two-dimensional error
distribution. Due to the superposition of error clouds in successive time
steps it is possible that errors in timing at one time step mimic errors in
magnitude at neighbouring time steps (Fig.

Statistical properties of the individual Series Distance (SD) and
benchmark (BM) error distributions from the case study. For the entire
distribution we provide the first and third quartile, the mean, median, and
the percentage of outliers (data points which are more than 3 standard
deviations apart from the mean). For the subset we provide the sampled upper
(maximum) and lower (minimum) boundaries. The subscripts with SD refer to errors
in magnitude (

As described in Sect.

Altogether four error distributions were calculated: for SD two 2-D
distributions (one for the rising and one for the falling event limbs) and
one 1-D distribution for the low-flow conditions; for BM a single 1-D
distribution of magnitude errors for the entire time series. The
distributions are shown in Fig.

One- and two-dimensional error distributions from the case study.
The upper row contains Series Distance (SD) results for the rising and
falling limbs. The left panel in the lower row shows the one-dimensional SD
distribution of errors for the periods of low flow. The panel in the bottom
right contains the 1-D distribution of magnitude errors of the benchmark
model (BM) for the entire time series. The highlighted subset represents the
80 % subset used to construct the uncertainty envelopes. Distribution
statistics are provided in Table

Comparing the 2-D distributions reveals distinct differences in shape: for
the rising limbs the distribution is rather oval; for the falling limbs it is
almost circular. This is particularly evident in the sampled subsets. The
uniform spread of the errors within the oval and the circle indicates that
for the data at hand, the timing and magnitude errors are largely
uncorrelated but dependent upon the hydrological conditions (rise or fall).
The (scaled) magnitude errors for both distributions are located
between

Together, these results confirm that different flow conditions,
i.e. low-flow, rising or falling limbs of events, exhibit different error
characteristics. This suggests that a differentiation between hydrological
conditions can be meaningful. For instance, timing errors of the recession
in the case study would be strongly underestimated by timing errors of the
rising limbs, and vice versa, as depicted in the lower panel of
Fig.

Subsets of both the SD and BM error distributions were used to construct
uncertainty envelopes (UE) around the entire simulated time series

In comparison, the uncertainty envelope of the BM model appears slimmer and
more precise. However, due to the lack of consideration of timing
uncertainties, especially during steep flood rises, the uncertainty envelopes
become very narrow. Such a “vanishing” of the uncertainty envelopes implies
that there are no timing errors to be expected at all (compare, e.g., the
period 6–7 June 2001 in Fig.

The statistical evaluation of the different uncertainty envelopes
(Table

Coverage (

Time series detail showing the resulting one- and two-dimensional
uncertainty envelopes around the historic streamflow simulation. The
envelopes were derived upon Series Distance (UE

Vertical and horizontal error bars. The upper panel shows magnitude
error bars (

To further investigate the individual effects of errors in timing and
magnitude, we also applied them separately to the simulated time series. To
this end we applied case-specific subsets of the error distributions –
i.e. 2-D errors for rising and falling limbs and 1-D error distributions for
low flow – to each point of the simulated time series just as in the
previously described error dressing approaches. The difference was that we
did not apply the entire error subset (oval or circle) but its projections
onto the time and magnitude axes. The resulting uncertainty bars
therefore extend from the maximum to the minimum magnitude (upper panel) and
timing (lower panel) values of the error subsets and are depicted in
Fig.

The lower panel in Fig.

Finally, even if the SD error distributions are not used to construct
uncertainty envelopes, knowledge of magnitude and timing error distributions
is valuable for model diagnostics: in their approach to identifying
characteristic error groups in hydrological time series

The main goal of this paper was to present major developments in the SD
concept since its first version presented by

Applying the SD concept and a benchmark model (BM) based on standard magnitude errors to a 6-year time series of observations and simulations in a small alpine catchment revealed that different flow conditions (low flow and rising and falling limbs during events) exhibit distinctly different characteristics of timing and magnitude errors with respect to mean and spread. Separate treatment of timing and magnitude errors and a differentiation of flow conditions as done in SD is thus recommended in general as it preserves useful information. Exploiting these characteristics and their correlations can support targeted model diagnostics. Deeper insights can easily be provided if the error distributions are further differentiated by discharge magnitude classes, by season, or by considering the temporal autocorrelation of errors. The latter would allow the development of a time-conditioned error sampling strategy when constructing 2-D uncertainty envelopes.

Applying the error distributions of both SD and BM to construct uncertainty ranges around the fairly accurate simulation revealed a remarkable timing uncertainty. This suggests that we commonly underestimate the role of horizontal uncertainties in streamflow simulations. For the given data, the BM-derived uncertainty ranges were in consequence visually narrower and statistically superior to the SD ranges. This suggests that the use of the SD concept to construct uncertainty envelopes according to the proposed error dressing method implies a trade-off between two effects: on the one hand, the explicit consideration of timing errors potentially yields better-tailored uncertainty envelopes, as apparent timing errors are treated as such. On the other hand, the time-spreading effect of the SD envelope construction, which essentially is the union of the time and magnitude error uncertainty ranges, can lead to an undesirable inflation. For the case study data, the latter effect predominated, while for hydrological forecasts based on uncertain meteorological forecasts the opposite may be the case. This also opens interesting avenues for new ways to construct uncertainty ranges based on the SD concept, e.g. as the intersection (rather than the union) of the two error components.

We conclude that Series Distance is an elaborate concept for the comparison
of simulated and observed streamflow time series which can be used both for
detailed hydrological analysis and model diagnostics. Its application,
however, involves considerably more effort than standard diagnostic measures,
an effort that is typically justified if timing errors are dominant or of
particular interest. More generally, we believe that for hydrological studies there is a
large potential for intuitive distance metrics such as the hydrograph
matching algorithm proposed by

To foster the use of the SD concept and the methods therein, we publish
ready-to-use Matlab code alongside the manuscript under a Creative
Commons license (CC BY-NC-SA 4.0). It is accessible via

A development release of the Series Distance program code (Ehret and Seibert,
2016), licensed under a Creative Commons license (CC BY-NC-SA 4.0), is
published alongside this manuscript via GitHub

Parametrization files for the Large Area Runoff Simulation Model (LARSIM, Ludwig and Bremicker, 2006) and the corresponding hydrometeorological data sets which we used in the case study can be obtained from the authors upon request. LARSIM executables can be obtained upon request from the Landesanstalt für Umwelt, Messungen und Naturschutz Baden-Württemberg, Germany (LUBW, 2016).

We thank Tilmann Gneiting from the Heidelberg Institute for Theoretical Studies (H-ITS) for valuable discussions on the error dressing concept, Clemens Mathis from Wasserwirtschaft Vorarlberg for providing the case study data and hydrological model, and all users of SD who provided valuable feedback and constructive criticism throughout recent years. We furthermore thank the three anonymous referees for providing valuable comments and acknowledge support for open-access publishing by the Deutsche Forschungsgemeinschaft (DFG) and the Open Access Publishing Fund of Karlsruhe Institute of Technology (KIT). The article processing charges for this open-access publication were covered by a Research Centre of the Helmholtz Association. Edited by: V. Andréassian. Reviewed by: three anonymous referees.