When estimating discharges through rating curves, temporal data consistency is a critical issue. In this research, consistency in stage–discharge data is investigated using a methodology called Bidirectional Reach (BReach), which starts from a definition of consistency commonly used in operational hydrology. A period is considered to be consistent if no consecutive and systematic deviations from a current situation occur that exceed observational uncertainty. To this end, the capability of a rating curve model to describe a subset of the (chronologically sorted) data is assessed for each observation by identifying the outermost data points for which the rating curve model behaves satisfactorily. These points are called the maximum left or right reach, depending on the direction of the investigation. This temporal reach should not be confused with a spatial reach (a stretch of a river). Changes in these reaches throughout the data series indicate possible changes in data consistency, which, if not resolved, could introduce additional errors and biases. In this research, various measurement stations in the UK, New Zealand and Belgium are selected based on their extensive historical rating information and their specific characteristics related to data consistency. For each country, regional information is used as much as possible to estimate observational uncertainty. Based on this uncertainty, a BReach analysis is performed and the results are subsequently validated against available knowledge about the history and behavior of the site. For all investigated cases, the methodology provides results that appear to be consistent with this knowledge of historical changes, and thus facilitates a reliable assessment of (in)consistent periods in stage–discharge measurements. 
This assessment is useful not only for the analysis and determination of discharge time series, but also for enhancing applications based on these data (e.g., by informing hydrological and hydraulic model evaluation design about consistent time periods to analyze).

For many applications in hydraulics, hydrology and water management, reliable river discharges are crucial. A common practice for estimating these discharges is the use of rating curves. Through the calibration of a relation between stage and discharge measurements (i.e., a rating curve), high-frequency stage measurements can be transformed into high-frequency discharge estimates. As calibrating this relation requires only a limited number of simultaneous stage–discharge measurements, it is a relatively low-cost method for discharge assessment in rivers.
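As a minimal illustration of this transformation, the sketch below assumes the widely used single power-law form Q = a(h − b)^c, in which `a`, `b` and `c` are illustrative names for the coefficient, the cease-to-flow offset and the exponent. It is not necessarily the exact parameterization used in this paper.

```python
def power_law_discharge(h, a, b, c):
    """Discharge Q = a * (h - b) ** c at stage h (assumed single power law).

    Stages at or below the offset b (cease-to-flow level) give zero flow.
    """
    depth = h - b
    return a * depth ** c if depth > 0 else 0.0


def stage_to_discharge(stages, a, b, c):
    """Convert a (high-frequency) stage series into a discharge series."""
    return [power_law_discharge(h, a, b, c) for h in stages]


# Hypothetical parameters (Q in m3/s, h in m) and a short stage record
a, b, c = 12.0, 0.20, 1.6
stages = [0.20, 0.45, 0.80, 1.30]
discharges = stage_to_discharge(stages, a, b, c)
print([round(q, 2) for q in discharges])
```

Only the three calibrated values are needed to convert an arbitrarily long stage record, which is what makes the approach cheap once the curve is trusted.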

The use of rating curves requires attention to the consistency of the
measured stage–discharge data set. Several causes (e.g., geometric changes of
the river bed, infrastructure works, weed growth) can alter the hydraulic
behavior of the river in the considered measurement location temporarily or
permanently and thus limit the validity of a calibrated rating curve.
Information about this temporal (in)consistency is hence critical to prevent
additional errors and biases from occurring in the determined river
discharges. Moreover, a correct assessment of (in)consistent periods can
enhance other applications based on the investigated data. For instance,
errors in hydrological or hydraulic model results that are caused by changes
in a river's situation can (if they lead to inconsistency of the rating curve
data as well) be avoided by using known consistent stage–discharge time
periods for model evaluation. Methods to detect and describe this temporal
(in)consistency have been studied by several authors (see

Therefore,

In

Study sites and their characteristics.

The BReach methodology is applied to three stage–discharge data sets in the
UK, two in New Zealand and five in Belgium. These stations are selected based
on their particular properties with regard to data consistency. Their
well-documented history enables a verification of the results of a BReach
analysis. An overview of these stations and their main characteristics is
given in Table

The aim of the BReach methodology is to identify consistency in rating curve
data based on a quality analysis of model results. The methodology consists
of several consecutive steps

Step 1: selection of a model structure for the analysis.

Step 2: sampling of the parameter space.

Step 3: assessment of acceptable model results.

Step 4: assessment of different degrees of tolerance.

Step 5: assessment of the bidirectional reach for all degrees of tolerance.

Step 6: identification of consistent data periods.

A first step is the choice of a rating curve model that appropriately approximates the relation between discharge and stage over a substantial part of the measured range. In this paper, the chosen rating curve model depends on the characteristics of the measurement station.

For the station of Colsterworth (UK, Table

For all other analyzed stations (except for the station of Clog-y-Fran, UK),
a segmented rating curve with two segments is used
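A segmented rating curve can be sketched as a set of power laws, each valid over its own stage range. The code below is a hypothetical, generic version (function and argument names are illustrative): every segment has its own (a, b, c), and the segments are separated by breakpoint stages. With two segments it has 3 × 2 + 1 = 7 values; with three segments it has 3 × 3 + 2 = 11, which matches the parameter count given below for Clog-y-Fran, although the paper's exact form, including any continuity constraints at the breakpoints, may differ.

```python
from bisect import bisect_left


def segmented_discharge(h, segments, breakpoints):
    """Evaluate a segmented power-law rating curve at stage h (illustrative form).

    segments    -- list of (a, b, c) tuples, lowest stage range first
    breakpoints -- sorted breakpoint stages, one fewer than the segments
    Segment k applies to breakpoints[k - 1] < h <= breakpoints[k].
    """
    if len(breakpoints) != len(segments) - 1:
        raise ValueError("need one breakpoint between consecutive segments")
    k = bisect_left(breakpoints, h)  # number of breakpoints strictly below h
    a, b, c = segments[k]
    depth = h - b
    return a * depth ** c if depth > 0 else 0.0


# Hypothetical two-segment curve: 2 x (a, b, c) + 1 breakpoint = 7 values
two_segment = [(10.0, 0.1, 1.8), (14.0, 0.3, 1.5)]
print(segmented_discharge(0.5, two_segment, [0.8]))
print(segmented_discharge(1.2, two_segment, [0.8]))
```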

Although both model structures are simple, this approach is satisfactory for
nearly all stations. By analyzing well-chosen subsets of the data (e.g.,
winter data if the influence of weed growth can be expected, as in the river
Grote Nete at Hulshout, Belgium) or by performing an analysis on the data
after sorting them by stage instead of chronologically (e.g., to assess the
influence of a downstream movable weir in the river Meuse at Maaseik,
Belgium), the chosen rating curve models are also adequate for less
straightforward flow situations (see
Sects.

The complex flow situation in the river Taf at Clog-y-Fran (UK), however, requires a more complex rating curve model. At this station, the hydraulic behavior is influenced by a combination of weed growth affecting low flow behavior, considerable overspill over the right bank at higher stages and an unstable bed control. For these reasons, a segmented rating curve with three segments is used. The second segment overcomes difficulties similar to those described for the two-segment rating curve. The third segment represents the flow for stages above the level of bank overspill.

Generally, the choice of rating curve model should be based as much as possible on the
existing flow situation at the rating curve station. If more complex
flow situations (e.g., hysteresis or backwater effects) can be observed and
described, it is possible to apply the BReach methodology with an adapted
rating curve model

It is important to mention that all decisions to be made in the BReach
methodology, such as the assessment of the measurement uncertainty
(Sects.

The sampling of the power law parameters (Eq.

The two-segment rating curve (Eq.

Values for

The three-segment rating curve used at Clog-y-Fran has 11 parameters, of
which

For all types of rating curves, the parameter space is sampled using Latin
hypercube sampling. For the single power law,
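Latin hypercube sampling splits the range of each parameter into equally wide strata and draws exactly one value per stratum, with the strata combined in random order across parameters. The following standard-library sketch illustrates the idea; the function name and the sampling ranges in the usage line are hypothetical and are not the ranges used in this paper.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Latin hypercube sample of parameter sets.

    bounds -- list of (low, high) tuples, one per parameter.
    Each range is split into n_samples equal strata; each stratum is sampled
    exactly once, and the strata are shuffled independently per parameter.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # one uniform draw inside each stratum, then a random pairing order
        values = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(values)
        columns.append(values)
    # transpose: one tuple (one candidate parameter set) per sample
    return [tuple(col[j] for col in columns) for j in range(n_samples)]


# Hypothetical ranges for (a, b, c) of a single power law
parameter_sets = latin_hypercube(1000, [(1.0, 50.0), (0.0, 0.5), (1.0, 3.0)], seed=42)
print(len(parameter_sets), parameter_sets[0])
```

Compared with plain random sampling, the stratification guarantees that the full range of every parameter is covered even for moderate sample sizes.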

Following

Many authors have estimated these measurement uncertainties. A
good literature overview with a summary of the major findings is given in

Discharge measurements are more uncertain, and their errors are subject to
heteroscedasticity, i.e., error distributions vary with changing discharge
values. Therefore, uncertainty on discharge measurements is typically
expressed as a percentage of the occurring discharge. In nearly all studies,
discharge measurements are assumed to have a negligible bias.

Although most of these studies share some general considerations, actual
uncertainties on stage and discharge measurements can depend on location,
flow conditions and measurement technique, and hence estimated uncertainty
boundaries vary considerably. Therefore, this
paper uses available local information as much as possible for the estimation of
observational uncertainty boundaries. Nevertheless, the ranges provided in the
literature offer a valuable framework to validate these local findings. For
each country separately, an estimation of local uncertainty boundaries is
described in Sect.

Based on the estimated observational uncertainty, results of a model (i.e., a rating curve model with a sampled set of parameters) are categorized as acceptable or nonacceptable. The result of this step is a binary matrix with classification results for each parameter set and each data point.
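The classification step can be sketched as follows. The acceptability criterion here is an assumption for illustration: a model result is taken to be acceptable when the rating curve passes through the rectangle spanned by the 95 % stage bounds (absolute, ±`dh`) and discharge bounds (relative, ±`q_rel`), which for a non-decreasing curve reduces to two comparisons. The names `is_acceptable` and `classification_matrix` are hypothetical.

```python
def power_law(h, a, b, c):
    """Single power-law rating curve (assumed form), zero below the offset b."""
    return a * (h - b) ** c if h > b else 0.0


def is_acceptable(curve, h_obs, q_obs, dh, q_rel):
    """True if the curve passes through the observation's uncertainty box.

    Assumed box: stage h_obs +/- dh (absolute) and discharge q_obs * (1 +/- q_rel)
    (relative, as discharge errors are heteroscedastic). For a non-decreasing
    curve this holds exactly when Q(h_low) <= q_high and Q(h_high) >= q_low.
    """
    q_low, q_high = q_obs * (1 - q_rel), q_obs * (1 + q_rel)
    return curve(h_obs - dh) <= q_high and curve(h_obs + dh) >= q_low


def classification_matrix(param_sets, model, stages, discharges, dh, q_rel):
    """Binary matrix with one row per parameter set and one column per gauging."""
    matrix = []
    for params in param_sets:
        curve = lambda h, p=params: model(h, *p)
        matrix.append([is_acceptable(curve, h, q, dh, q_rel)
                       for h, q in zip(stages, discharges)])
    return matrix


stages = [0.5, 0.9, 1.4]
discharges = [1.8, 6.6, 16.5]  # hypothetical gaugings (m3/s)
m = classification_matrix([(12.0, 0.2, 1.6), (100.0, 0.0, 1.0)],
                          power_law, stages, discharges, dh=0.01, q_rel=0.08)
print(m)
```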

As mentioned in the introduction, the BReach methodology evaluates the capability of a rating curve model to describe a subset of the data for each observation. For this evaluation, a definition of satisfactory behavior of a rating curve model is necessary. In this paper, this definition is called the degree of tolerance; it expresses the percentage of model results that are allowed to be nonacceptable in a sequence of data points.

Before assessing the bidirectional reach, all stage–discharge measurements
are sorted chronologically and their index within this sorted data series is
used to refer to them. Subsequently, a degree of tolerance is selected and
the binary classification matrix is used to evaluate a model and its results
from the perspective of one data point. The temporal span for which this
model behaves satisfactorily is assessed both in the direction of the previous
and the following data points using a directional search that stops as soon
as the required conditions are not met. Within these spans, the index of the
outermost observation with an acceptable result is referred to as the left
(previous points) or right (following points) reach. This information is
aggregated for all parameter sets by taking the outermost left and right
reaches. They are called the maximum left and right reach and represent the
indices beyond which none of the sampled parameter sets is acceptable within
a data series with satisfactory behavior. Assessment of the maximum left and
right reach is repeated for all data points and for all degrees of tolerance,
and results are summarized in a combined BReach plot (e.g.,
Fig.
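The reach search described above can be sketched in a few functions. This is an illustrative reading of the procedure, not the published BReach code: the stopping rule is assumed to be "the share of nonacceptable results in the span searched so far exceeds the degree of tolerance", and a data point for which no parameter set yields a reach is assumed to reach only itself. All function names are hypothetical.

```python
def right_reach(row, i, tolerance):
    """Outermost index >= i with an acceptable result for one parameter set.

    row       -- classification results (True = acceptable) of one parameter set
    tolerance -- degree of tolerance as a fraction (0.1 = 10 %)
    The search stops as soon as the share of nonacceptable results in the span
    i..j exceeds the tolerance (assumed stopping rule). None = no reach.
    """
    bad, reach = 0, None
    for j in range(i, len(row)):
        bad += 0 if row[j] else 1
        if bad / (j - i + 1) > tolerance:
            break
        if row[j]:
            reach = j
    return reach


def left_reach(row, i, tolerance):
    """Mirror image of right_reach: search towards earlier data points."""
    last = len(row) - 1
    r = right_reach(row[::-1], last - i, tolerance)
    return None if r is None else last - r


def max_reaches(matrix, tolerance):
    """(maximum left reach, maximum right reach) of every data point.

    Aggregates over all parameter sets (rows) by taking the outermost reaches;
    a point without any reach is assumed to reach only itself.
    """
    result = []
    for i in range(len(matrix[0])):
        lefts = [r for r in (left_reach(row, i, tolerance) for row in matrix) if r is not None]
        rights = [r for r in (right_reach(row, i, tolerance) for row in matrix) if r is not None]
        result.append((min(lefts, default=i), max(rights, default=i)))
    return result


# Two hypothetical parameter sets: one fits the first half, one the second
m = [[True, True, True, False, False, False],
     [False, False, False, True, True, True]]
print(max_reaches(m, 0.0))
```

Repeating `max_reaches` for a series of tolerances (e.g., `{tol: max_reaches(m, tol) for tol in (0.0, 0.1, 0.2)}`) yields the per-point information that a combined BReach plot summarizes; in this toy example the jump in reaches between indices 2 and 3 marks a consistency change.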

Combined BReach plots (e.g., Fig.

In this paper, a BReach analysis is performed for all stations. If a seasonal variation in the rating curve behavior (due to weed growth) is presumed, a second analysis is performed on a subset of data measured during the winter months (December to March). In the UK and in Belgium, such a set of winter data is not expected to be influenced by weed growth. If an analysis of all data shows no consistency while an analysis of only the winter data indicates consistent periods, this combination can confirm the influence of weed growth.
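Selecting the winter subset is a simple filter on the gauging dates. In the sketch below, "December to March" is assumed to include both months, and the function and variable names are hypothetical; the retained gaugings keep their chronological order and would be re-indexed before the reach search.

```python
from datetime import date

WINTER_MONTHS = {12, 1, 2, 3}  # December to March (assumed inclusive)


def winter_subset(dates, stages, discharges):
    """Gaugings taken in the winter months, in their original chronological order."""
    return [(d, h, q) for d, h, q in zip(dates, stages, discharges)
            if d.month in WINTER_MONTHS]


dates = [date(2001, 1, 5), date(2001, 6, 1), date(2001, 12, 20),
         date(2002, 3, 31), date(2002, 4, 1)]
stages = [0.62, 0.35, 0.81, 0.70, 0.55]
discharges = [4.1, 1.2, 6.3, 5.0, 3.2]
winter = winter_subset(dates, stages, discharges)
print([d.isoformat() for d, _, _ in winter])
```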

If the behavior of the rating curve can be assumed to change with changing stage, an additional BReach analysis is performed. For the latter, the data are sorted by stage instead of chronologically. Results of such an analysis can reveal the stage ranges in which the rating curve behavior alters. As multisegment rating curves are designed to accommodate these alterations, they are not useful in this context. Therefore, a single power law is used for all BReach analyses on data sorted by stage.
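For the analysis on data sorted by stage, the reach search can in principle be reused unchanged after reordering the columns of the classification matrix; indices then refer to ranks in stage rather than to positions in time. A minimal sketch with hypothetical helper names:

```python
def order_by_stage(stages):
    """Indices of the gaugings sorted by increasing stage (ties keep time order)."""
    return sorted(range(len(stages)), key=lambda k: stages[k])


def reorder_columns(matrix, order):
    """Rearrange every row of a classification matrix into the given order."""
    return [[row[k] for k in order] for row in matrix]


stages = [1.2, 0.4, 0.9, 0.4]
order = order_by_stage(stages)
print(order)
print(reorder_columns([["a", "b", "c", "d"]], order))
```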

To avoid confusion between both a temporal BReach analysis and an analysis on
data sorted by stage and between several types of rating curve models,
results of the analyses will be referred to as BReach

The assessment of 95 % uncertainty boundaries of the stage and discharge data is based on available local information. As the availability of this information differs between countries, the approach followed is described separately for each country in this section.

For the UK stations,

In

However,

For the Belgian measurement stations, no prior information concerning measurement uncertainties was available. Nevertheless, it is possible to gain insight into plausible characteristics of measurement errors by analyzing simultaneous measurements. Although, in this paper, a BReach analysis is performed on only five Belgian measurement stations, simultaneous measurements of nine different stations are used for a preliminary uncertainty assessment of discharge measurements in order to maximize the amount of (scarce) data.

A pair of simultaneously measured discharges consists of two discharge measurements that are made with the same type of device within a time span of 2 h and for which the corresponding measured stages are identical. Combining this information for nine Belgian stations results in a set of 42 simultaneous pairs, all measured with an OTT QLiner. The restriction to a single type of measurement device prevents a mixture of possibly different error distributions, each corresponding to a different measurement technique. The errors of two simultaneous measurements are assumed to be independent.
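The pairing rule can be written as a filter over all combinations of gaugings. The record layout (dictionary keys) and the same-station condition are assumptions made for this sketch, and the function name is hypothetical; stages are compared as recorded, so in practice they would be compared at the recording precision of the station.

```python
from datetime import datetime, timedelta
from itertools import combinations


def simultaneous_pairs(gaugings, max_gap=timedelta(hours=2)):
    """Pairs of gaugings that qualify as simultaneous measurements.

    gaugings -- list of dicts with (hypothetical) keys 'station', 'time',
                'device', 'stage' and 'discharge'.
    A pair qualifies when both gaugings belong to the same station, use the
    same type of device, are at most max_gap apart and have identical stages.
    """
    pairs = []
    for g1, g2 in combinations(gaugings, 2):
        if (g1["station"] == g2["station"]
                and g1["device"] == g2["device"]
                and abs(g1["time"] - g2["time"]) <= max_gap
                and g1["stage"] == g2["stage"]):
            pairs.append((g1, g2))
    return pairs


t0 = datetime(2010, 5, 1, 9, 0)
gaugings = [
    {"station": "A", "time": t0, "device": "OTT QLiner", "stage": 1.25, "discharge": 10.2},
    {"station": "A", "time": t0 + timedelta(minutes=90), "device": "OTT QLiner", "stage": 1.25, "discharge": 10.6},
    {"station": "A", "time": t0 + timedelta(hours=4), "device": "OTT QLiner", "stage": 1.25, "discharge": 10.4},
    {"station": "B", "time": t0, "device": "OTT QLiner", "stage": 1.25, "discharge": 3.0},
]
print([(p[0]["discharge"], p[1]["discharge"]) for p in simultaneous_pairs(gaugings)])
```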

To overcome the heteroscedastic character of discharge measurement errors,
they are expressed as a percentage of the real discharge. Nevertheless,
different authors find that parameters of error distributions change with
changing discharges

Neither data set allows for a direct assessment of measurement errors.
However, if an error distribution is assumed, it is possible to test equality
between the distributions of both the relative differences of the
simultaneously measured discharge pairs and a created set of relative
differences based on two equally sized samples of measurement errors from the
assumed distribution. For instance, a Gaussian measurement error with zero
mean and a standard deviation of 4 % is assumed for the low flow data set
(see Sects.

Independent of the real discharge, this relative difference of two
simultaneous measurements can thus be expressed by their measurement errors.
If the assumed error distribution (Gaussian,

The same analysis is repeated for both low and high flow data and for
Gaussian and logistic error distributions with different values of the scale
parameters, equidistantly taken from the intervals [1 %, 6 %] and
[0.35 %, 4.4 %], respectively. As an example,
Fig.

Kolmogorov–Smirnov test results (simultaneous discharge measurements).

Low flow data have values for

The pattern of the other results (Gaussian distribution with high flow data,
logistic distribution with both low and high flow data) is very similar.
Table

The limited amount of data prohibits a more precise description of this
tendency toward lower uncertainties for higher normalized discharges.
Moreover, the lowest normalized flow in the set of simultaneous discharge
measurements is 0.72. Results of

For the assessment of uncertainties on stage measurements, the data
availability is different. Two simultaneous stage measurements are provided
for each stage–discharge measurement. However, the first type of measurement
is recorded from a staff gauge during a discharge measurement and the second
type is registered by a continuous measurement device. Hence, it can be
expected that error distributions of both data types differ and a similar
approach to that for discharge measurements is not justified. Therefore, 95 %
uncertainty boundaries are estimated to be

In order to benchmark the results of the BReach analyses, a residual analysis
is performed for several of the investigated measurement stations. An
analysis of the relative deviations from an “average” rating curve is
frequently used in operational hydrology, as their behavior can be used as an
indication of the stability of a measurement station or of a shift in the
rating curve

The performed analysis is based on a set of parameters that results from the
minimization of the root mean square error (
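A residual analysis of this kind can be sketched as follows. As a crude stand-in for a proper RMSE minimization, the sketch simply selects the candidate parameter set (e.g., from the Latin hypercube sample) with the lowest RMSE; a gradient-based or simplex optimizer would normally be used instead. Relative deviations are expressed in percent of the fitted discharge, and all names are hypothetical.

```python
import math


def power_law(h, a, b, c):
    """Single power-law rating curve (assumed form), zero below the offset b."""
    return a * (h - b) ** c if h > b else 0.0


def rmse(params, stages, discharges):
    """Root mean square error of a parameter set against the gaugings."""
    squared = [(power_law(h, *params) - q) ** 2 for h, q in zip(stages, discharges)]
    return math.sqrt(sum(squared) / len(squared))


def best_parameter_set(candidates, stages, discharges):
    """Crude RMSE minimization: the best of a finite set of candidates."""
    return min(candidates, key=lambda p: rmse(p, stages, discharges))


def relative_residuals(params, stages, discharges):
    """Deviations of the gaugings from the 'average' curve, in % of the fitted Q.

    Assumes every stage lies above the offset b, so the fitted Q is positive.
    """
    residuals = []
    for h, q in zip(stages, discharges):
        q_fit = power_law(h, *params)
        residuals.append(100.0 * (q - q_fit) / q_fit)
    return residuals


stages = [0.5, 0.9, 1.4, 0.7]
discharges = [1.8, 6.6, 16.5, 4.3]  # hypothetical gaugings, chronological
candidates = [(12.0, 0.2, 1.6), (10.0, 0.2, 1.6), (12.0, 0.1, 1.8)]
best = best_parameter_set(candidates, stages, discharges)
print(best, [round(r, 1) for r in relative_residuals(best, stages, discharges)])
```

Plotted in chronological order, long runs of residuals with the same sign would point to a shift in the rating curve.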

For each measurement station, results of the BReach analyses are validated using the available local information.

In Fig.

In Fig.

The English Environment Agency uses a segmented power law to assess discharges at this measurement station. Rating curve changes generally involve changes to the rating curve coefficients for the lowest and medium flows. The time instants of these official changes are indicated with cyan lines that depart from the bisector. If a change also involves the flood rating curve, an asterisk and (if available) some background information are added to the date indication. Although many of the rating curve changes correspond to discontinuities, the BReach plot sometimes suggests different or fewer moments of change.

In Fig.

Combined

In Fig.

Although the flow situation in Clog-y-Fran is complex, the available
information about the station can be linked with results of a BReach
analysis. These results indicate the need for an in-depth analysis that
should lead to an appropriate modeling approach for periods with weed growth.
For the remaining (winter) data, an assessment of consistent periods is
possible. However, the choice of an appropriate rating curve model is crucial
for success. Figure

In Fig.

In the stage–discharge data of Clog-y-Fran (Fig.

In Fig.

The Horizons Regional Council interpolates rating curves based on stage–discharge measurements. As these interpolations are updated up to a few times a year, plotting these official rating curve changes on the BReach plot would not be informative.

Combined BReach

Figure

Combined BReach

Combined BReach

In Fig.

In Fig.

In

Combined BReach

As data are available in two other measurement stations on the river Demer, a
comparison between the results of these stations is interesting.
Figure

Stage–discharge data for

Figure

The Flemish Hydrological Information Centre uses a segmented power law to
assess discharges in this measurement station. In Figs.

Combined BReach

Figure

Although the data set is limited to only 38 points, the BReach results offer insight into the situation of the measurement station. However, a larger data set would likely lead to more robust conclusions.

In Maaseik, BReach

Combined BReach

In Fig.

In Maaseik, data points with limited maximum reaches in the temporal BReach
plot correspond with points with more extreme values for the residuals
(Fig.

Also, in the results for Aarschot (Fig.

In the station of Barnett's Bank, however, both approaches show a different
amount of information (Fig.

In this section, some general thoughts about the use of the BReach methodology for rating curve data are given. The quality of the results clearly depends on the gauging frequency of the stage–discharge data. The stations analyzed in this paper vary from densely gauged (up to a mean of 22 gaugings a year) to rather sparsely gauged (2 gaugings a year). Stations with a more complex flow situation are gauged more frequently; in many cases, local hydrological services deliberately differentiate the gauging frequency according to the complexity of a station. Based on the available data, it was possible to recognize the history and characteristics of each analyzed station in the BReach results. Nevertheless, it is difficult to pinpoint a minimum gauging frequency that guarantees a successful application. A large time gap in the measured data can introduce uncertainty about the exact moment of a consistency change. In extreme situations, a temporary change can even disappear from the data entirely, resulting in a misleading, apparently consistent period. The bar (with an indication of the years) above a BReach plot allows these noninformative periods to be detected. If more detail is needed, an additional BReach plot can be created in which absolute time is used on both axes (so that the indices used in the current plots are projected onto these time axes), with the moments of the available gaugings indicated on the bisector.
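The gauging frequency and large time gaps discussed here are easy to quantify before a BReach analysis. The sketch below computes the mean number of gaugings per year and flags gaps longer than a threshold; the function name and the one-year threshold are hypothetical choices, not recommendations from this paper.

```python
from datetime import date


def gauging_statistics(dates, max_gap_days=365):
    """Mean number of gaugings per year and the gaps longer than max_gap_days.

    dates        -- chronologically sorted gauging dates
    max_gap_days -- hypothetical threshold for flagging noninformative periods
    """
    span_years = (dates[-1] - dates[0]).days / 365.25
    per_year = len(dates) / span_years if span_years > 0 else float("nan")
    gaps = [(d1, d2) for d1, d2 in zip(dates, dates[1:])
            if (d2 - d1).days > max_gap_days]
    return per_year, gaps


dates = [date(2000, 1, 1), date(2000, 7, 1), date(2002, 7, 1),
         date(2003, 1, 1), date(2004, 1, 1)]
per_year, gaps = gauging_statistics(dates)
print(per_year, gaps)
```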

Results of

In any case, it is necessary to be informed about the specific situation of
the analyzed rating curve station. Not only is it important for an adequate
choice of a rating curve model, it is also required for a correct
interpretation of the BReach results and the design of possible alternative
BReach analyses (Sect.

The computational load of the BReach methodology depends on several aspects.
First, it increases linearly with the size of the sample of the parameter
space (and is thus larger for more complex rating curves). Second, and more
importantly, the necessary calculation time strongly depends on both the
number of stage–discharge data points and the degree of consistency of the
data set. The principle of the BReach algorithm is that for each data point,
a maximum left and right reach must be searched. If a data set is highly
consistent, the length of these searches increases significantly. Doubling
the number of data points can (for consistent data sets) hence result in
8 times the original calculation time. In the research for this paper,
all calculations were performed on a personal computer with a 3.4 GHz Intel
Core i7 CPU and 8 GB RAM. For most stations, a BReach analysis took a few
minutes to a few hours. In the most complex case (Clog-y-Fran, with 1166 data
points and
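The factor of 8 quoted above corresponds to a run time T that grows with the cube of the number of data points N, in addition to the linear growth with the number of sampled parameter sets M. Read as a scaling law (an interpretation of the reported behavior for consistent data sets, not a measured complexity of the BReach code):

$$
T(M, N) \propto M\,N^{3}
\quad\Longrightarrow\quad
\frac{T(M, 2N)}{T(M, N)} = 2^{3} = 8,
\qquad
\frac{T(2M, N)}{T(M, N)} = 2
$$

A cubic term would arise, for instance, if each of the N points is searched over up to N neighbors and the tolerance check re-counts the nonacceptable results of the whole span at every step; keeping a running count of nonacceptable results would reduce each search to linear time.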

At the moment, interpretation of BReach results is done manually by the user. The availability of a (semi)automatic routine that identifies possible consistent data periods would improve the BReach methodology. As the degree of squareness of a BReach plot within a certain period expresses the lack of important discontinuities, it might play a role in the decision process for assessing consistent periods.
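As a purely hypothetical starting point for such a routine (it is not part of the BReach methodology as published), the squareness of a candidate period could be measured as the share of its data points whose maximum left and right reaches span the whole period, and periods could then be grown greedily while this share stays above a threshold. The input is the list of per-point (maximum left, maximum right) index pairs for one degree of tolerance; the function names and the threshold of 0.9 are arbitrary choices for illustration.

```python
def squareness(reaches, start, end):
    """Share of points in start..end whose maximum reaches cover the whole period.

    reaches -- (maximum left reach, maximum right reach) per data point
    Hypothetical score: 1.0 is a perfectly square block in the BReach plot.
    """
    inside = reaches[start:end + 1]
    covering = sum(1 for left, right in inside if left <= start and right >= end)
    return covering / len(inside)


def candidate_periods(reaches, threshold=0.9):
    """Greedy left-to-right split into periods with squareness >= threshold."""
    periods, start, n = [], 0, len(reaches)
    while start < n:
        end = start
        while end + 1 < n and squareness(reaches, start, end + 1) >= threshold:
            end += 1
        periods.append((start, end))
        start = end + 1
    return periods


reaches = [(0, 2)] * 3 + [(3, 5)] * 3  # e.g. output of a reach search
print(candidate_periods(reaches))
```

A greedy split is only one option; any proposal produced this way would still need the site knowledge that the paper stresses before being accepted as a consistent period.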

The objective of this paper was to test the BReach methodology for assessing temporal consistency in rating curve data on various stage–discharge data sets in the UK, New Zealand and Belgium. This led to successful results for all tested sites.

For each country, local information is used as much as possible to estimate the observational uncertainties that serve as input for the methodology. In this context, a new approach is proposed for the Belgian data that uses relative differences between simultaneous discharge measurements to test the plausibility of several a priori assumed error distributions. This approach offers promising insights into the plausible character of measurement error distributions, complementing a more general use of existing literature data about observational uncertainties. However, the limited size of the data set with simultaneous measurements is an important restriction. To investigate the possibilities of the proposed approach more thoroughly, a more elaborate data set with a large spread in time, measurement stations, measurement devices and flow conditions is necessary. Such an enlarged data set would not only increase the reliability of a KS test, but would also make it possible to use more bins with smaller ranges of normalized discharge (replacing the current two arbitrary subgroups) and to investigate other measurement devices.

Overall, the results of the BReach analyses correspond to site-specific situations. Nevertheless, the investigated cases show that knowledge about the local situation of a measurement station is crucial to design the necessary BReach analyses and to interpret their results correctly. Results show consistency in locations that are known to be stable. Where human interventions (e.g., installation of a weir, deepening of a river) altered the rating curve behavior, results show corresponding consistency changes. In locations influenced by weed growth, a higher consistency can be assessed after isolating winter data. Similarly, consistency can be assessed for higher stages at a station where a downstream weir influences low flow behavior. Stations that are prone to geomorphological changes caused by flood events show discontinuities in the BReach plots at the time instants of the highest floods. Moreover, the plots can also indicate which peak floods do not cause consistency changes. The return period that serves as a threshold for consistency changes varies from station to station. These results provide extra insight into the rating curve behavior and confirm the added value of the proposed BReach methodology as a preliminary assessment of data consistency prior to an in-depth determination of discharges and their uncertainty. In addition, this assessment of (in)consistent periods can enhance other applications based on the investigated data (e.g., by informing hydrological and hydraulic model evaluation design about consistent time periods to analyze).

A comparison between the results of a residual analysis and a BReach analysis shows that the latter mainly provides additional information for data sets that consist of several consecutive consistent time periods that differ from one another.

In the BReach methodology, the chosen rating curve model is required to
appropriately approximate the relation between discharge and stage over a
substantial part of the measured range. In this paper, analyses with only a
subset of the data or with stage–discharge data sorted by stage
(BReach

The BReach code is available on

The New Zealand rating curve data were obtained freely from
Horizons Regional Council (

KVE, SVH and NV developed the BReach methodology and designed the overall setup of the analyses. SVH implemented the BReach methodology in Python. GC and JF provided observational uncertainties for the UK data and contributed to the design of the analyses and the interpretation of results for these data. KVE assessed observational uncertainties for Belgian data, performed the BReach analyses and prepared the paper with contributions from all co-authors.

The authors declare that they have no conflict of interest.

The authors thank Flanders Hydraulics Research and its Hydrological Information Centre for providing gauging data and accompanying information and Waterwegen en Zeekanaal NV for providing information about infrastructure works in Aarschot, Zichem and Diest. The authors also thank the Environment Agency, Natural Resources Wales, Horizons Regional Council and Marlborough District Council for providing the stage–discharge and rating curve data for the UK and New Zealand sites. Furthermore, they thank Brent Watson for providing additional information about the measurement station in Mais. This research has benefitted from a statistical consultation with Ghent University FIRE (Fostering Innovative Research based on Evidence). Gemma Coxon and Jim Freer were supported by NERC MaRIUS: Managing the Risks, Impacts and Uncertainties of droughts and water Scarcity, grant number NE/L010399/1. The authors also want to thank Kolbjorn Engeland and the anonymous reviewer for their constructive comments and helpful suggestions on an earlier version of the paper. Edited by: Bettina Schaefli Reviewed by: Kolbjorn Engeland and one anonymous referee