Consistency assessment of rating curve data in various locations using Bidirectional Reach (BReach)

Abstract. When estimating discharges through rating curves, temporal data consistency is a critical issue. In this research, consistency in stage–discharge data is investigated using a methodology called Bidirectional Reach (BReach), which departs from a (in operational hydrology) commonly used definition of consistency. A period is considered to be consistent if no consecutive and systematic deviations from a current situation occur that exceed observational uncertainty. Therefore, the capability of a rating curve model to describe a subset of the (chronologically sorted) data is assessed in each observation by indicating the outermost data points for which the rating curve model behaves satisfactorily. These points are called the maximum left or right reach, depending on the direction of the investigation. This temporal reach should not be confused with a spatial reach (indicating a part of a river). Changes in these reaches throughout the data series indicate possible changes in data consistency and if not resolved could introduce additional errors and biases. In this research, various measurement stations in the UK, New Zealand and Belgium are selected based on their significant historical ratings information and their specific characteristics related to data consistency. For each country, regional information is maximally used to estimate observational uncertainty. Based on this uncertainty, a BReach analysis is performed and, subsequently, results are validated against available knowledge about the history and behavior of the site. For all investigated cases, the methodology provides results that appear to be consistent with this knowledge of historical changes and thus facilitates a reliable assessment of (in)consistent periods in stage–discharge measurements. This assessment is not only useful for the analysis and determination of discharge time series, but also to enhance applications based on these data (e.g., by informing hydrological and hydraulic model evaluation design about consistent time periods to analyze).

Abstract. When estimating discharges through rating curves, temporal data consistency is a critical issue. In this research, consistency in stage-discharge data is investigated using a methodology called Bidirectional Reach (BReach), which departs from a (in operational hydrology) commonly used definition of consistency. A period is considered to be consistent if no consecutive and systematic deviations from a current situation occur that exceed observational uncertainty. Therefore, the capability of a rating curve model to describe a subset of the (chronologically sorted) data is assessed in each observation by indicating the outermost data points for which the rating curve model behaves satisfactorily. These points are called the maximum left or right reach, depending on the direction of the investigation. This temporal reach should not be confused with a spatial reach (indicating a part of a river). Changes in these reaches throughout the data series indicate possible changes in data consistency and if not resolved could introduce additional errors and biases. In this research, various measurement stations in the UK, New Zealand and Belgium are selected based on their significant historical ratings information and their specific characteristics related to data consistency. For each country, regional information is maximally used to estimate observational uncertainty. Based on this uncertainty, a BReach analysis is performed and, subsequently, results are validated against available knowledge about the history and behavior of the site. For all investigated cases, the methodology provides results that appear to be consistent with this knowledge of historical changes and thus facilitates a reliable assessment of (in)consistent periods in stage-discharge measurements. This assessment is not only useful for the analysis and determination of discharge time series, but also to enhance applications based on these data (e.g., by informing hydrological and hydraulic model evaluation design about consistent time periods to analyze).

Introduction
For many applications in hydraulics, hydrology and water management, reliable river discharges are crucial. A commonly used practice for the estimation of these discharges is the use of rating curves. Through the calibration of a relation between stage and discharge measurements (i.e., a rating curve), high-frequency stage measurements can be transformed into high-frequency discharge measurements. As this relation is based on only a limited number of simultaneous stage-discharge measurements, it is a relatively budgetfriendly method for discharge assessment in rivers.
The use of rating curves requires attention for the consistency of the measured stage-discharge data set. Several causes (e.g., geometric changes of the river bed, infrastructure works, weed growth) can alter the hydraulic behavior of the river in the considered measurement location temporarily or permanently and thus limit the validity of a calibrated rating curve. Information about this temporal (in)consistency is hence critical to prevent additional errors and biases from occurring in the determined river discharges. Moreover, a correct assessment of (in)consistent periods can enhance other applications based on the investigated data. For instance, errors in hydrological or hydraulic model results that are caused by changes in a river's situation can (if they lead to inconsistency of the rating curve data as well) be avoided by using known consistent stage-discharge time periods for model evaluation. Methods to detect and describe this temporal (in)consistency have been studied by several authors (see Van Eerdenbrugh et al., 2016, for a more extensive review). McMillan et al. (2010) assumed that changes in rating curve behavior are mainly caused by floods and that periods in between are consistent. Other methods described the variation of rating curve parameters in time (Westerberg et al., 2011;Reitan and Petersen-Øverleir, 2011). Within predefined consistent periods, Jalbert et al. (2011) accounted for an aging error toward an initial rating curve, which expresses the increasing risk of a change of the river bed in time. Morlot et al. (2014) expanded this method with two preliminary steps. First, the stage-discharge data set is segmented into consistent periods, and subsequently hydraulic analogues of each stage-discharge measurement are selected within these periods. Although measurement uncertainties are considered in the latter step, the first and thus defining data segmentation does not account for them. Van Eerdenbrugh et al. (2016) discuss that in the methods mentioned above, the assessment of temporary or permanent changes of the hydraulic regime requires assumptions or decisions that more or less influence eventual results. Moreover, most of the methods start from a definitive choice of a rating curve model and comprehend an assessment of its parameters distribution. However, if data consistency is assessed prior to a definitive in-depth analysis (as in Morlot et al., 2014), it can provide an increased understanding that contributes to the selection of an appropriate, more definitive rating curve model. An important criterion for this preliminary consistency analysis is that results minimally depend upon choices and decisions made by users.
Therefore, Van Eerdenbrugh et al. (2016) have developed a methodology to enable the detection of consistent periods in stage-discharge data. It is called Bidirectional Reach (BReach) and considers a period to be consistent if no consecutive and systematic deviations from a current situation occur that exceed observational uncertainty. This definition of consistency is commonly used in operational hydrology (Reitan and Petersen-Øverleir, 2011). It requires the assessment of (1) observational uncertainty, (2) a current situation and (3) the consecutive and systematic character of nonacceptable deviations. Observational uncertainty is estimated for each country using regional information (Sects. 2.2.3 and 2.4). The assessment of a current situation is done by evaluating the capability of a rating curve model to describe a subset of the data in each pair of stage-discharge measurements (Sect. 2.2.5). This capability is defined by a degree of tolerance, i.e., a definition of satisfactory behavior for the rating curve model in a series of gauging points (Sect. 2.2.4). By combining multiple degrees of tolerance, complementary information is provided that allows for the exclusion of causes for model failure other than data inconsistency. Hence, changes throughout time in the combined model performance indicate possible changes in data consistency. This information is used for the last requirement in the definition of consistent periods, i.e., the assessment of the consecutive and systematic character of nonacceptable deviations (Sect. 2.2.6). In Sect. 2, the different steps of the methodology are briefly explained and all necessary choices are discussed.
In Van Eerdenbrugh et al. (2016), one observed and several synthetic data sets are used to evaluate and test the robustness of the methodology. The methodology was shown to perform well, with robust results despite decreased data availability, erroneous estimations of measurement uncertainty and even a partially deficient rating curve model. All investigated data sets in this study belong to the same geographical location. Therefore, the objective of the current paper is to perform an additional analysis with more diverse measured data sets in order to further explore the methodology's applicability. For this purpose, several gauging stations in the United Kingdom (UK), New Zealand and Belgium are selected based on their well-documented history and their specific characteristics related to rating curve consistency. For each country, regional information is maximally used to estimate observational uncertainty. Based on this uncertainty, a BReach analysis is performed and, subsequently, results are validated against available knowledge about the history and behavior of the site. In a selection of the investigated stations, results of the BReach methodology are additionally compared with results of a classical residual analysis.

Study areas and data
The BReach methodology is applied to three stage-discharge data sets in the UK, two in New Zealand and five in Belgium. These stations are selected based on their particular properties with regard to data consistency. Their welldocumented history enables a verification of the results of a BReach analysis. An overview of these stations and their main characteristics is given in Table 1. The UK data are provided with a quality indication and hence only stagedischarge measurements marked as "good" are used in this research. The New Zealand data were preprocessed by the Horizons Regional Council and the Marlborough Regional Council and are assumed to have a sufficient quality level. For the Belgian stations, raw (unprocessed) gauging data are available. Therefore, stage-discharge measurements with recorded stages that deviate more than 5 cm from the nearest continuous value are treated as outliers and not used in the analysis. These continuous stage data have a temporal resolution of 1 h (before 2003) and of 15 min (after 2003 Wilson and Wöhling (2015).
that only measurements with large errors are excluded from the analysis.

BReach methodology: description and practical application
The aim of the BReach methodology is to identify consistency in rating curve data based on a quality analysis of model results. The methodology consists of several consecutive steps (Van Eerdenbrugh et al., 2016): - Step 1: selection of a model structure for the analysis. - Step 2: sampling of the parameter space. - Step 3: assessment of acceptable model results.
-Step 4: assessment of different degrees of tolerance. - Step 5: assessment of the bidirectional reach for all degrees of tolerance. - Step 6: identification of consistent data periods.
In this section, all steps and their practical application in this paper are briefly described.

2.2.1
Step 1: selection of a model structure for the analysis A first step is the choice of a rating curve model that appropriately approximates the relation between discharge and stage for an important part of the measured range. In this paper, the chosen rating curve model depends on the characteristics of the measurement station. For the station of Colsterworth (UK , Table 1), a flat V weir controls the flow and thus a power law can be used to describe the stage-discharge relationship: where Q is the discharge (m 3 s −1 ), c is a scale coefficient (m 3/n s −1 ), h is the stage (m), h 0 is a location parameter (m) that expresses the stage of zero flow and n is an exponent (-) that is a function of the type and the shape of the considered cross section. For all other analyzed stations (except from the station of Clog-y-Fran, UK), a segmented rating curve with two segments is used (e.g., Reitan and Petersen-Øverleir, 2008;Le Coz et al., 2014;McMillan and Westerberg, 2015): where each segment describes a different flow situation and h br, 1 is the breakpoint between two consecutive segments. In this breakpoint, continuity between both segments must be provided. By using this rating curve model with a breakpoint at low flow conditions, it is possible to account for two different situations. First, the model is able to describe a change in flow situation. In many cases, flow at low stages is locally controlled (e.g., by one or several riffles). At higher stages, the flow situation at these riffles becomes drowned and the flow is controlled by a longer river reach (e.g., Reitan and Petersen-Øverleir, 2008;Le Coz et al., 2014). Second, a twosegment rating curve allows the effect of geomorphological changes throughout time to be accounted for. If the river bed deepens, the value of h 0 in Eq. (1) is expected to decrease. It is thus possible that in certain periods of the measured stage-discharge data, the stage of zero flow is higher than the lowest measured stage within the complete period and hence the sampling range of h 0 (Sect. 2.2.2) is too narrow. For these periods, the use of a second segment can overcome this shortcoming and the role of the first segment will thus be small(er). Although both model structures are simple, this approach is satisfactory for nearly all stations. By analyzing wellchosen subsets of the data (e.g., winter data if the influence of weed growth can be expected, as in the river Grote Nete at Hulshout, Belgium) or by performing an analysis on the data after sorting them by stage instead of chronologically (e.g., to assess the influence of a downstream movable weir in the river Meuse at Maaseik, Belgium), the chosen rating curve models also satisfy for less straightforward flow situations (see Sects. 2.3, 3.7 and 3.8).
The complex flow situation in the river Taf at Clog-y-Fran (UK), however, requires a rating curve model with increased complexity. In this station, the hydraulic behavior is influenced by the combination of weed growth affecting low flow behavior, a considerable overspill over the right bank at higher stages and an unstable bed control. For these reasons, a segmented rating curve with three segments is used. The second segment overcomes similar difficulties, as described for the two-segment rating curve. The third segment is representing the flow for stages higher than bank overspill.
Generally, the choice of rating curve model should maximally be based on the existing flow situation at the rating curve station. In case more complex flow situations (e.g., hysteresis or backwater effects) can be observed and described, it is possible to apply the BReach methodology with an adapted rating curve model (e.g., Jones, 1916;Petersen-Øverleir, 2006;Dottori et al., 2009;Reitan and Petersen-Øverleir, 2011). In case there is little or no knowledge of the flow situation, it is tempting to use a rating curve model with multiple segments and wide sample ranges for the breakpoints. If the amount of samples is sufficiently large, the possibility of obtaining nearly identical values for the parameters of two adjacent segments theoretically enables an excess of segments in the chosen model to be eliminated. As shown in the example at Clog-y-Fran (Sect. 3.3), the parameter sets that result in a model structure with the largest maximum reaches will be decisive for eventual BReach results. This approach however involves the risk of overfitting the model to the available gauging data, mainly in case of small and inconsistent stage-discharge data sets. It is not implausible that in such a case of sparse gauging data, eventual BReach results are obtained by a model structure that is not capable of describing the real flow situation at the site, but instead incidentally fits a series of consecutive gauging points that belong not only to different height ranges but also to different consistent periods. Therefore, and similar to in other rating curve applications, the choice of an appropriate rating curve model should preferably be based on a hydraulic analysis of the measurement site (Le Coz et al., 2014).
It is important to mention that all decisions to be made in the BReach methodology, such as the assessment of the measurement uncertainty (Sects. 2.2.3 and 2.4) and of different degrees of tolerance (Sect. 2.2.4) are made independently of the appropriateness of the chosen rating curve model. Despite the methodology's ability to account for a limited model deficiency (Sect. 2.2.4 and Van Eerdenbrugh et al., 2016), this additionally advocates a well-considered choice of a model structure.

2.2.2
Step 2: sampling of the parameter space The sampling of the power law parameters (Eq. 1) is similar to Van Eerdenbrugh et al. (2016), where sampling intervals are bounded to a physically realistic order of magnitude; h 0 is sampled from the interval [h bed -40 cm, h min, cont -2 cm], where h bed is the lowest bed level of the available reliable cross section measurements. If no cross section data are available, it is the local datum toward which the measured stages are expressed. The value of h min, cont is the lowest measured stage in the continuously measured data series. Samples of n are taken from the interval [0.5, 3.5]. The outermost values obtained when applying the power law function for all gauging points with the upper and lower limits for h 0 and n are used to define the sampling interval for the coefficient c. The lower limit is obtained by halving the resulting lowest value for c, and for the upper limit the highest value is doubled. Both parameters h 0 and n are sampled from a uniform distribution. For parameter c, a more dense sampling is aimed at for smaller values. Hence, this parameter is sampled from a uniform distribution after log-transformation.
The two-segment rating curve (Eq. 2) has seven parameters, of which h 1 is computed to obtain continuity between the two consecutive segments of the rating curve. Sampling of h 0 , n 0 , n 1 , c 0 and c 1 is the same as for the single power law. The sampling interval for h br, 1 is [h min, cont -2 cm, h br, max ]. For all stations, the height range in which the lowest flows occur is assessed visually from the stage-discharge plots. An upper limit of this height range is estimated and taken as a value for h br, max ( Table 2).
The three-segment rating curve used at Clog-y-Fran has 11 parameters, of which h 0 , n 0 , n 1 , n 2 , c 0 , c 1 and c 2 are sampled similarly to for the two-segment power law and h 1 and h 2 are again computed to obtain continuity between two Based on the information about out-of-bank flow at higher stages, h br, 2 is sampled from the interval [2.9 m, 3.5 m]. Both parameters are sampled from a uniform distribution. For all types of rating curves, the parameter space is sampled using a Latin Hypercube sampling. For the single power law, 1.3×10 6 samples are taken. For the two-segment and the three-segment power law, 6.5×10 6 and 1.3×10 7 samples are taken, respectively.

Step 3: assessment of acceptable model results
Following Van Eerdenbrugh et al. (2016), a result of a rating curve model and a parameter set is classified as acceptable if it fits in a rectangular acceptance zone that is enclosed by the 95 % uncertainty boundaries of the accompanying stage and discharge measurement.
An estimation of these measurement uncertainties is made by many authors. A good literature overview with a summary of the major findings is given in Pelletier (1988) and in McMillan et al. (2012) for both stage and discharge measurements. In these studies, errors on stage measurements are generally indicated as relatively small. Most of the estimated 95 % uncertainty boundaries lie within ±10 mm, although values up to ±40 mm are also mentioned for more uncertain locations. Error distributions are mostly assumed to have a negligible bias and to be independent of the value of the measured variable (homoscedastic).
Discharge measurements are more uncertain and their errors are subjected to heteroscedasticity, i.e., error distributions vary with changing discharge values. Therefore, uncertainty on discharge measurements is typically expressed as a percentage of the occurring discharge. In nearly all studies, it is assumed to have a negligible bias. Pelletier (1988) reports 95 % uncertainty boundaries between ±4 and ±17 % for 5-35 verticals with the velocity-area method. McMillan et al. (2012) report the same order of magnitude for the velocity-area method with various techniques. Coxon et al. (2015) found that despite expressing errors as a percentage of occurring discharge, the value of the scale parameter in the assumed error distribution depends on the value of the normalized discharge (i.e., measured discharge divided by mean discharge), with 95 % boundaries up to ±25 % for low normalized flows and ±13 % for the highest normalized flows.
Although most of these studies share some general considerations, eventual uncertainties on stage and discharge measurements can depend upon location, flow conditions and measurement technique, and hence estimated uncertainty boundaries are subjected to a relatively large variation. Therefore, this paper maximally uses available local information for the estimation of observational uncertainty boundaries. Nevertheless, the ranges provided in literature offer a valuable framework to validate these local findings. For each country separately, an estimation of local uncertainty boundaries is described in Sect. 2.4.
Based on the estimated observational uncertainty, results of a model (i.e., a rating curve model with a sampled set of parameters) are categorized as acceptable or nonacceptable. The result of this step is a binary matrix with classification results for each parameter set and each data point.

Step 4: assessment of different degrees of tolerance
As mentioned in the introduction, the BReach methodology evaluates the capacity of a rating curve model to describe a subset of the data in each observation. For this evaluation, a definition of satisfactory behavior of a rating curve model is necessary. In this paper, this definition is called the degree of tolerance and expresses the percentage of model results that are allowed to be nonacceptable in a sequence of data points. Van Eerdenbrugh et al. (2016) discuss that possible causes of a rating curve model being nonacceptable in a data point are (1) the occurrence of a higher observational error than estimated for the definition of acceptable results, (2) model deficiency in certain ranges of the investigated variables and (3) data inconsistency. Due to the random occurrence in time of causes (1) and (2), corresponding nonacceptable results tend to be singularities in a chronologically sorted series of stage-discharge measurements. If, on the contrary, a nonacceptable model result is caused by the occurrence of data inconsistency, it can be expected that for the same model, nonacceptable results will also occur in other, neighboring (in time) data points. Hence, these causes of failure will be highlighted when using higher degrees of tolerance (i.e., relaxation of the amount of points that can be nonacceptable in a sequence of data). As different degrees of tolerance provide complementary information, degrees of 0, 5, 10, 20 and 40 % are used for this research.

2.2.5
Step 5: assessment of the bidirectional reach for all degrees of tolerance Before assessing the bidirectional reach, all stage-discharge measurements are sorted chronologically and their index within this sorted data series is used to refer to them. Subsequently, a degree of tolerance is selected and the binary classification matrix is used to evaluate a model and its results from the perspective of one data point. The temporal span for which this model behaves satisfactorily is assessed both in the direction of the previous and the following data points using a directional search, that stops as soon as the required conditions are not met. Within these spans, the index of the outermost observation with an acceptable result is referred to as the left (previous points) or right (following points) reach. This information is aggregated for all parameter sets by taking the outermost left and right reaches. They are called the maximum left and right reach and represent the indices beyond which none of the sampled parameter sets is acceptable within a data series with satisfactory behavior. Assessment of the maximum left and right reach is repeated for all data points and for all degrees of tolerance, and results are summarized in a combined BReach plot (e.g., Fig. 3a).
In this plot, each gray tint represents results for a specific degree of tolerance. For each data point on the x axis, the gray zone represents the span between the index of the maximum left reach (under the bisector) and the maximum right reach (above the bisector). The vertical distance between the bisector and the index of the maximum left reach represents the maximum amount of data points before the investigated data point that can be described with at least one set of parameters under the chosen degree of tolerance. Similarly, the vertical distance between the index of the maximum right reach and the bisector represents this maximum amount for the data points after the investigated data point.

2.2.6
Step 6: identification of consistent data periods Combined BReach plots (e.g., Fig. 3a) provide a visual means to evaluate the capability of the rating curve models to describe a subset of the data in each point. Changes of this capability throughout time result in discontinuities of a BReach plot, and each degree of tolerance provides complementary information. In accordance with the discussion in Sect. 2.2.4, discontinuities in the maximum reaches for stringent degrees of tolerance provide information about the diversity of measurements caused by (1) the occurrence of a higher observational error than estimated for the definition of acceptable results, (2) model deficiency in certain ranges of the investigated variables or (3) data inconsistency. The resulting BReach plots show changes in model performance precisely, but include too wide a variety of possible causes to detect data inconsistency. For a higher degree of tolerance, a model is allowed to generate nonacceptable results in a larger percentage of the data points. Therefore, discontinuities caused by (1) and (2) will disappear from the plots due to their random character. As a result, changes in consistency will be emphasized in the plot, but the larger tolerance does not facilitate a precise location of these changes. If plots that combine all degrees of tolerance indicate consistent data periods (i.e., periods without important discontinuities), plots with higher degrees of tolerance are used to assess the amount and indicative locations of consistency changes, and, based on this information, plots with stringent degrees of tolerance are used to locate these possible consistency changes more precisely (Van Eerdenbrugh et al., 2016).

Alternative analyses
In this paper, a BReach analysis is performed for all stations. If a seasonal variation in the rating curve behavior (due to weed growth) is presumed, a second analysis is performed on a subset of data measured during winter months (between December and March). In the UK and in Belgium, such a set of winter data is not expected to be influenced by weed growth. The combination of a BReach analysis on all data that shows no consistency and an analysis on only winter data that indicates consistent periods can confirm the influence of weed growth.
If it can be assumed that the behavior of the rating curve changes with changing stages, an additional BReach analysis is performed. For the latter, the data are sorted by stage instead of chronologically. Results of such an analysis can reveal the height ranges in which the rating curve behavior alters. As multisegmented rating curves aim to overcome these alterations, it is not interesting to use them in this context. Therefore, a single power law is used for all BReach analyses on data sorted by stage.
To avoid confusion between both a temporal BReach analysis and an analysis on data sorted by stage and between several types of rating curve models, results of the analyses will be referred to as BReach x_ys . In this formulation, x is the type of analysis (t, on chronologically sorted data, or h, on data sorted by stage) and y is the amount of segments in the chosen rating curve model.

Assessment of uncertainties on stage and discharge measurements
The assessment of 95 % uncertainty boundaries of the stage and discharge data is based on available local information. This information availability differs for each country and hence, in this section, the followed approach is described for each country.

UK measurement stations
For the UK stations, Coxon et al. (2015) have analyzed the relative rating curve residuals from 26 measurement stations with very stable rating curves. A relative residual is defined as the ratio of the deviation (between discharge measurement and derived rating curve) and the measured discharge. The distribution of these residuals is investigated for different bins of normalized flow Q n (i.e., measured flow divided by mean flow). Results of this investigation show that logistic distributions with a zero location parameter (i.e., µ = 0) and a scale parameter (σ ) that varies exponentially with normalized discharge (Eq. 3) fit the residuals well for all bins.
The 95 % uncertainty boundaries of discharge measurements for the UK data used in this paper are derived from these distributions and vary between ±28 % for the lowest normalized flows and ±13 % for the highest normalized flows. For stage measurements, that typically have smaller measurement errors than discharges, a uniform error of ±5 mm was assumed by Coxon et al. (2015). Again, 95 % boundaries of this error (±4.875 mm) are used for the definition of the acceptance zone in the BReach methodology.

New Zealand measurement stations
In McMillan et al. (2010), the uncertainty on measured discharges in the measurement station of Barnett's Bank is assumed to follow a Gaussian distribution with zero mean (µ) and a standard deviation (σ ) of 4 %. Errors on stage measurement are considered Gaussian with zero mean and a standard deviation of 2 cm. Uncertainty boundaries of 95 % are thus ±8 % for discharges and ±4 cm for stages. These estimations are based on literature data and local expertise. However, McMillan and Westerberg (2015) assume error distributions for Barnett's Bank similarly to those described in Sect. 2.4.1 (Coxon et al., 2015). In this case, 95 % uncertainty boundaries for discharge measurements vary again between ±28 % for the lowest normalized flows and ±13 % for high normalized flows and are thus substantially higher than in the abovementioned approximation with a normal distribution. Stage uncertainty boundaries, on the contrary, are estimated smaller by Coxon et al. (2015) (±4.875 mm versus ±4 cm). Therefore, two different BReach analyses are performed for all New Zealand data, each based on one of these uncertainty estimations, and results are compared.

Belgian measurement stations
For the Belgian measurement stations, no prior information concerning measurement uncertainties was available. Nevertheless, it is possible to gain insight into plausible characteristics of measurement errors by analyzing simultaneous measurements. Although, in this paper, a BReach analysis is performed on only five Belgian measurement stations, simultaneous measurements of nine different stations are used for a preliminary uncertainty assessment of discharge measurements in order to maximize the amount of (scarce) data.
A pair of simultaneously measured discharges consists of two discharge measurements that are measured with the same type of device within a time span of 2 h and for which the corresponding measured stages are identical. Combining this information for nine Belgian stations results in a set of 42 simultaneous pairs that are all measured with an OTT QLiner. The restriction to only one type of measurement device pre- To overcome the heteroscedastic character of discharge measurement errors, they are expressed as a percentage of the real discharge. Nevertheless, different authors find that parameters of error distributions change with changing discharges (e.g., McMillan et al., 2012;Coxon et al., 2015). To investigate this, the simultaneous discharge measurements are sorted according to their normalized discharge (see Sect. 2.4.1). Subsequently, two subsets of this data set are created, containing the 21 lowest and highest pairs of measurement. They are referred to as low flow and high flow data. These subsets are assumed to be unbiased (error distribution with zero mean, see Sect. 2.2.3).
Neither data set allows for a direct assessment of measurement errors. However, if an error distribution is assumed, it is possible to test equality between the distributions of both the relative differences of the simultaneously measured discharge pairs and a created set of relative differences based on two equally sized samples of measurement errors from the assumed distribution. For instance, a Gaussian measurement error with zero mean and a standard deviation of 4 % is assumed for the low flow data set (see Sects. 2.4.2 and McMillan et al., 2010). From this distribution, two samples 1 and 2 are taken, each with size m (in this paper m = 10 6 ), and they pairwise represent the assumed errors of two simultaneous flow measurements. As these errors are expressed as a percentage of the real discharge, a measurement (for both j = 1 and j = 2) can be written as follows: with i ∈ [1, m], Q meas, j, i one of both measured discharges in measurement pair i and Q true, i the real discharge that occurred during the measurements. Combining Eq. (4) for both measurements in a pair leads to the following: Independent of the real discharge, this relative difference of two simultaneous measurements can thus be expressed by their measurement errors. If the assumed error distribution (Gaussian, µ = 0 % and σ = 4 %) is correct, a data set calculated from the measurement pairs using the left-hand side of Eq. (5) (further called Q m, % ) will have the same distribution as a data set calculated from the two sets of sampled errors using the right-hand side of Eq. (5) (further called Q , % ). When applying a two-sample nonparametric Kolmogorov-Smirnov (KS) test on these data sets, the resulting p value is 0.62, which is much higher than the commonly used 5 % level for rejection of the hypothesis that both data sets are equally distributed. The corresponding value of the Kolmogorov-Smirnov statistic is 0.16. This is the maximum vertical distance between the empirical cumulative distribution functions (ECDFs) of both tested data sets (Fig. 1a).
The same analysis is repeated for both low and high flow data and for Gaussian and logistic error distributions with different values of the scale parameters, equidistantly taken from the interval [1 %, 6 %] and [0.35 %, 4.4 %], respectively. As an example, Fig. 1b shows the resulting values of the KS statistic against the corresponding value of the scale parameter for the low flow data set using a Gaussian distribution. A p value resulting from a KS test depends both on the value of the KS statistic and on the number of points in the investigated data sets. As the latter remains constant for all tests, p values and values of the KS statistic will show a similar (although inverse) pattern. Figure 1b clearly shows the occurrence of the lowest value of the KS statistic (and corresponding highest p value) for a standard deviation of 3.12 %. In Fig. 1c, the ECDF of Q m, % corresponds well with the ECDF of Q , % for this latter distribution. However, the occurrence of a high p value (and corresponding small value of the KS statistic) provides no confirmation of the null hypothesis, and it is possible that many other hypotheses lead to similar p values. Nevertheless, the value of the KS statistic provides information not only about differences in central tendency but about any difference in the ECDFs. From this perspective, Spear and Hornberger (1980), Hornberger and Spear (1981) and Hornberger et al. (1985) compared the ECDFs of both behavioral and nonbehavioral parameter values and used the KS statistic as a measure for the sensitivity of a parameter. In this research, it is used to evaluate the behavior of error distributions that are a priori chosen based on currently available knowledge, without excluding the plausibility of other, unexplored error distributions.
The pattern of the other results (Gaussian distribution with high flow data, logistic distribution with both low and high flow data) is very similar. Table 3 shows the characteristics of both flow classes and both distribution types that correspond with a minimum KS statistic. There is a good correspondence between the ECDFs of Q m, % and Q , % for both distribution types. Adjusted p values (i.e., p values after Benjamini-Hochberg correction, that accounts for a false discovery rate; Benjamini and Hochberg, 1995) and KS statistics are also very similar and hence prohibit making a distinction in favor of one single distribution type. Like in other studies (Sect. 2.2.3), this table clearly indicates that high flow data correspond with lower values of the scale parameters (and thus smaller uncertainty boundaries) than low flow data. For each flow class, the 95 % uncertainty boundaries of the two distributions do not differ strongly, but they are relatively small compared with the uncertainty boundaries applied in Sects. 2.4.1 and 2.4.2 and with literature data (e.g., Pelletier, 1988;McMillan et al., 2012). A tentative explanation for these low uncertainty values could be the relatively tranquil flow situations in the investigated measurement stations due to low slopes. Moreover, most of the investigated locations are situated at a bridge, facilitating discharge measurements in controlled conditions. The limited amount of data prohibits a more precise description of this tendency toward lower uncertainties for higher normalized discharges. Moreover, the lowest normalized flow in the set of simultaneous discharge measurements is 0.72. Results of Coxon et al. (2015) show that an increase of measurement uncertainties can be expected for lower normalized flows. As more than 80 % of all investigated Belgian stage-discharge data have normalized discharges within the range of the low flow data subset or lower, it was decided to assume 95 % uncertainty boundaries to be ±6.4 % for all investigated Belgian discharge measurements. Although these values originate from the investigated low flow data measured with QLiners, they are applied for all discharge measurements, independent of their measurement technique. It can be expected that discharge uncertainties will differ for different techniques (e.g., McMillan et al., 2012;Song et al., 2012;Le Coz et al., 2014), but a lack of simultaneous discharge measurements prohibits a similar analysis for other measurement devices. Extra measurement campaigns might augment insight for these devices.
For the assessment of uncertainties on stage measurements, the data availability is different. Two simultaneous stage measurements are provided for each stage-discharge measurement. However, the first type of measurement is recorded from a staff gauge during a discharge measurement and the second type is registered by a continuous measurement device. Hence, it can be expected that error distributions of both data types differ and a similar approach to that for discharge measurements is not justified. Therefore, 95 % uncertainty boundaries are estimated to be ±2 cm. This value is based on literature data and on local expertise.

Residual analysis as a benchmark
In order to benchmark the results of the BReach analyses, a residual analysis is performed for several of the investigated measurement stations. An analysis of the relative deviations from an "average" rating curve is frequently used in operational hydrology, as their behavior can be used as an indication of the stability of a measurement station or of a shift in the rating curve (e.g., Petersen-Øverleir, 2004; World Meteorological Organisation, 2010; Morlot et al., 2014).
The performed analysis is based on a set of parameters that results from the minimization of the root mean square error (E RMS ) of the chosen rating curve model in all data points (which is further referred to as "the E RMS optimized model"). This approach assumes that relative errors of discharge measurements are homoscedastic and follow a Gaussian distribution (Petersen-Øverleir, 2004(Petersen-Øverleir, , 2006. At the stations that are selected for this analysis (Maaseik, Aarschot and Barnett's Bank), similar conditions are assumed when assessing discharge measurement uncertainty boundaries (Sect. 2.4).

Results and discussion
For each measurement station, results of the BReach analyses are validated using the available local information.

River Witham at Colsterworth, UK
In Fig. 2a, a combined BReach t_1s plot is shown for Colsterworth. The BReach plot shows data consistency during the entire measured period. This corresponds with the nature of the measurement station, a flat V weir with a stable stage-discharge relationship. Even a 0 % degree of tolerance shows no discontinuities within the entire data period. Figure 2b shows the available stage-discharge measurements at Colsterworth.

River Taw at Taw Bridge, UK
In Fig. 3a, a combined BReach t_2s plot is shown for Taw Bridge. In this plot, time instants near the peak discharges (return period ≥ 2 years) are indicated with a red mark  on the bisector. This plot shows changes in data consistency that correspond with historical information. An important change in consistency occurs at index 299 and stagedischarge points after this time instant are likely to belong to one single consistent period. This starting point corresponds with the moment of installation of a flat V weir (October 1998). Before this date, the data series shows many discontinuities, also for higher degrees of tolerance. The time instants of these discontinuities often coincide with those of the highlighted peak floods. Hence, the plot suggests that these flood events caused geomorphological changes of the river bed that induced changes in consistency and that periods in between were relatively stable. The English Environment Agency uses a segmented power law to assess discharges in this measurement station. Rating curve changes generally imply changes of the rating curve coefficients for the lowest and medium flows. The time instants of these official changes are indicated with cyan lines that depart from the bisector. If the change involves also the flood rating curve, an asterisk and (if available) some background information is added to the date indication. Although many of the rating curve changes correspond with discontinuities, the BReach plot sometimes suggests different or fewer moments of change.
In Fig. 3b, results of a BReach h_1s analysis on the stagedischarge data measured after installation of the weir is shown. As can be expected, the plot shows consistency for nearly the complete height range. Only for the highest stages and the lower degrees of tolerance, some discontinuities occur in the plot. Figure 3c shows the available stage-discharge measurements at Taw Bridge. Data measured after installation of the weir are indicated separately.

River Taf at Clog-y-Fran, UK
In Fig. 4a and 5a, combined BReach t_3s plots based on only winter data, and all data are shown for Clog-y-Fran. Although the plot with all data (Fig. 5a) indicates many discontinuities, the maximum reaches of the more tolerant degrees cover a large part of the data set for several points. They are sometimes alternated by data points with more limited reaches. The plot based on only winter data (Fig. 4a) indicates larger consistent blocks. In this latter plot, time instants near the peak discharges (return period ≥ 5 years) are indicated with a red mark on the bisector. The time instants of the discontinuities often coincide with those of the highlighted peak floods. Hence, the plots do not only confirm the influence of weed growth, but also suggest that high flood events cause geomorphological changes of the river bed that induce changes in consistency and that periods in between often  are relatively stable. Nevertheless, not all floods in Fig. 4a cause discontinuities and not all discontinuities can be linked with the occurrence of large floods. Besides erosion due to large floods, the cross section is also known to be prone to the (more gradual) build-up of silt. This and other unknown processes might influence the result of this BReach analysis to some extent. Natural Resources Wales, who manage this gauging station, use a segmented power law to assess discharges in this measurement station. In Fig. 4a, the available time instants of these official changes are indicated with cyan lines that depart from the bisector. Although many of the rating curve changes correspond with discontinuities, the BReach plot sometimes suggests different or fewer moments of change.
Although the flow situation in Clog-y-Fran is complex, the available information about the station can be linked with results of a BReach analysis. These results indicate the need for an in-depth analysis that should lead to an appropriate modeling approach for periods with weed growth. For the remaining (winter) data, an assessment of consistent periods is possible. However, the choice of an appropriate rating curve model is crucial for success. Figure 4b and 4c show  . Combined (a) BReach t_3s plot (winter data), (b) BReach t_1s plot (winter data) and (c) BReach t_2s plot (winter data) for Clog-y-Fran. In all subplots, for each index on the x axis the gray area indicates the span between the index of the maximum left reach (under the bisector) and the maximum right reach (above the bisector). Each gray tint represents a different degree of tolerance (i.e., percentage of data points allowed to have nonacceptable model results). results of a BReach t_1s and a BReach t_2s analysis on winter data in Clog-y-Fran, respectively. The two-segment rating curve has only a breakpoint at the stage of overspill over the right bank. These two figures do not mutually differ a lot, showing that a difference in the rating curve model that affects higher flows has a minor effect on eventual BReach results. This corresponds with earlier results of Van Eerdenbrugh et al. (2016) based on synthetic data. However, comparison of Fig. 4b and c with Fig. 4a reveals that BReach results alter importantly when adding an extra segment (with a breakpoint at low stages) to the rating curve model. In all other stations where a segmented rating curve (two segments) is used, there is a more limited or even a negligible difference with BReach results resulting from a simple power law. Therefore, it is plausible to assume that in Clog-y-Fran, the flow situation changes from a locally controlled flow (e.g., caused by a riffle affecting the lowest flows) towards a flow situation controlled by a longer river reach for higher flows. It is plausible that this importance of an appropriate modeling of low flow stage-discharge relations on BReach results corresponds with a higher distinctive capacity of these data toward temporal consistency.
In Fig. 5b, results of a BReach h_1s analysis of the winter data is shown. The plot shows a relatively large consistency for stages above index 172 (1.33 m). This is linked with the influence of geomorphological changes on river stages, that is expected to decrease for increasing discharges due to the corresponding increase of the conveyance of the river cross sections. For high discharges, the order of magnitude of these influences will not exceed the width of the observa- Degr. of tolerance Figure 5. (a) Combined BReach t_3s plot (all data) and (b) combined BReach h_1s plot (winter data) for Clog-y-Fran. In all subplots, for each index on the x axis the gray area indicates the span between the index of the maximum left reach (under the bisector) and the maximum right reach (above the bisector). Each gray tint represents a different degree of tolerance (i.e., percentage of data points allowed to have nonacceptable model results). (c) Stage-discharge data for Clog-y-Fran.
tional uncertainty boundaries anymore and will thus result in more consistent BReach results. Figure 5c shows the available stage-discharge measurements at Clog-y-Fran. Winter data are indicated separately.
In the stage-discharge data of Clog-y-Fran (Fig. 5c), four gaugings have discharge values that are smaller than 50 % of all other available discharge measurements with a similar stage. These four observations are all measured on the same day and there is no indication of similar deviations in the months before and after this date. Although it is plausible that these deviations are caused by an erroneous registration of the discharge, there was not enough information to consider these gaugings as outliers. These data occur near the end of the time series (January 2007) and have only a minor effect on the BReach results.

River Pohangina at Mais, New Zealand
In Fig. 6a, a combined BReach t_2s plot based on measurement uncertainties as applied in McMillan et al. (2010) is shown for Mais. In this plot, time instants near the highest measured stages (return period ≥ 1 year) are indicated with a red mark on the bisector. Throughout the whole data set, many discontinuities occur in the plot. The time instants of these discontinuities often coincide with those of the highlighted peak floods. Hence, this plot confirms that, in this gravel-bed river, most of these flood events cause geomorphological changes of the river bed that induce changes in consistency and that periods in between are relatively stable. The Horizons Regional Council interpolates rating curves based on stage-discharge measurements. As these interpolations are changed up to a few times a year, it is not informative to plot these official rating curve changes on the BReach plot. Figure 6b is a combined BReach t_2s plot based on measurement uncertainties described by Coxon et al. (2015). There is a high resemblance with Fig. 6a and general conclusions are identical. There are a few reasons for this high resemblance. First, Van Eerdenbrugh et al. (2016) show that a limited misjudgment of observational errors does not alter the conclusions of a BReach analysis fundamentally. Moreover, the classification of results of a rating curve model as acceptable or nonacceptable is based on the assessed uncertainties on both stage and discharge measurements (see  Figure 6c shows the available stage-discharge measurements at Mais.

River Wairau at Barnett's Bank, New Zealand
In Fig. 7a and b, combined BReach t_2s plots are shown for Barnett's Bank, with measurement uncertainties of McMillan et al. (2010) and of Coxon et al. (2015), respectively.
Again, both plots are very similar and general conclusions are identical. Figure 7c shows the available stage-discharge measurements at Barnett's Bank. In Fig. 7a, time instants near the highest measured stages (return period ≥ 0.5 years) are indicated with a red mark on the bisector. McMillan et al. (2010) suggest a 0.5-year return period as a threshold that induces consistency changes in this gravel-bed river. This is partly confirmed in the BReach plot, in which discontinuities often (but not always) coincide with the highlighted peak floods that cause geomorphological changes of the river bed. Periods in between these  Coxon et al. (2015). In all subplots, for each index on the x axis the gray area indicates the span between the index of the maximum left reach (under the bisector) and the maximum right reach (above the bisector). Each gray tint represents a different degree of tolerance (i.e., percentage of data points allowed to have nonacceptable model results). (c) Stage-discharge data for Barnett's Bank.
consistency changes are relatively stable. The Marlborough Regional Council interpolates rating curves based on stagedischarge measurements. As these interpolations are changed up to a few times a year, it is not informative to plot these official rating curve changes on the BReach plot.

River Demer at Aarschot, Zichem and Diest, Belgium
In Van Eerdenbrugh et al. (2016), stage-discharge measurements in the river Demer at Aarschot are used to validate the BReach methodology. Although in the current research, a different rating curve model is used and observational uncertainties are assessed slightly different (see Sects. 2.2.1 and 2.4.3), the resulting BReach t_2s plot (Fig. 8) shows similar results and indicates no consistency before index 29 (August 1982) due to a deepening and widening of the river's cross section and a heightening of the river dikes. After that time instant, a more consistent period starts that lasts until index 233 (February 2005). During the last decade, the data show again a lack of consistency. This is possibly a joint effect of the occurrence of large floods, the introduction of new measurement devices and local maintenance works that af- fect the cross section of the river bed (Van Eerdenbrugh et al., 2016). In this figure, time instants near the highest measured stages (return period ≥ 5 years) are indicated with a red mark on the bisector. As data are available in two other measurement stations on the river Demer, a comparison between the results of these stations is interesting. Figure 9a shows combined BReach t_2s results at Zichem, situated 16 km upstream of Aarschot. Moments near the highest measured stages (return period ≥ 5 years) are indicated with a red mark on the bisector. In Zichem, an important change in consistency is shown at index 72 (December 1988). This corresponds with historical information. In 1988, the river bed near Zichem was deepened and widened and the dikes were heightened, causing the detected consistency change. Before that time instant, the plot shows several discontinuities that possibly suggest changes in consistency. Unfortunately, it was not possible to verify these changes due to a lack of information about this time period. After 1988, the plot suggests the start of a new consistent period until index 144 (March 2008). However, it is difficult to pinpoint the end of this second consistent period precisely. In Zichem, the stage-discharge measurements of March 2008 are the first available measurements since Octo-ber 2002 and thus this change may already be situated within this period. Since then, the stage-discharge data show nearly no consistency. Again, it is likely that this is a joined effect of several different causes (occurrence of floods, change of measurement device, deviation of the mouth of a small tributary at the location of the measurement station, occasional observations of weed growth in the river). Figure 9b shows combined BReach t_2s results in Diest (5 km upstream of Zichem). Moments near the highest measured stages (return period ≥ 5 years) are indicated with a red mark on the bisector. In this station, only 34 stage-discharge gaugings measured during 1 decade are available. Nevertheless, a similar tendency to that in the recent data of Aarschot and Zichem can be noticed in the plot. The data are consistent until index 24 (March 2008). Again, it is plausible that this consistency change is linked with the occurrence of peak discharges, with a change in measurement device and with the occasional occurrence of weed in the river bed.
The Flemish Hydrological Information Centre uses a segmented power law to assess discharges in this measurement station. In Figs. 8 and 9a, b, the time instants of these official changes of the rating curves are indicated with cyan lines that depart from the bisector. Many of the rating curve changes correspond with discontinuities or with the start of a year with a major flood. Nevertheless, the BReach plot sometimes suggests different moments of change. In Fig. 10a-c, a plot of the available stage-discharge measurements are given for Aarschot, Zichem and Diest. These plots show that for low stages in Aarschot, recently measured discharges (black) are higher than discharges during the long consistent period (red). In Zichem and Diest, however, recent discharges tend to be smaller. This latter effect is possibly caused by the observed weed growth in these two stations.
3.7 River Grote Nete at Hulshout, Belgium Figure 11a shows a combined BReach t_2s plot for Hulshout. Although the plot indicates no consistent periods, the maximum reaches of the most tolerant degree cover almost the complete data set for several data points. They are alternated by data points with very limited reaches. Figure 11b shows a combined BReach t_2s plot of the winter data in Hulshout. For this subset of data, the plot indicates a high consistency for almost the complete period. These results indicate the in-  fluence of weed growth and the need for an in-depth analysis that should lead to an appropriate modeling approach for periods with weed growth. Figure 11c shows the available stage-discharge measurements at Hulshout. Winter data are indicated separately.
Although the data set is limited to only 38 points, BReach results offer insight into the situation of the measurement station. However, it is likely that a more elaborate data set will result in more robust conclusions.

River Meuse at Maaseik, Belgium
In Maaseik, BReach t_2s plots of all data points with high degrees of tolerance (Fig. 12a) show an alternation of data points with nearly no reach and data points that have maximum reaches that cover a large part of the data set. In Fig. 12b, results of a BReach h_1s analysis on the same stagedischarge data are shown. This plot shows no consistency for the lower stages, but indicates a relatively high consistency for stages beyond index 31 (23.46 m). This corresponds with the local situation in Maaseik. Stage-discharge measurements at lower stages are influenced by the downstream movable weir in Linne. For higher stages, this influence is smaller. Moreover, the effect of dredging the river bed and of the installed guiding dam on the occurring stages decreases for increasing discharges due to the corresponding increase of the conveyance of the river cross sections. For high discharges, the order of magnitude of these influences will not exceed the width of the observational uncertainties boundaries anymore and will thus result in more consistent BReach results. An in-depth analysis should lead to an appropriate modeling approach for low flow data. Figure 12c shows the available stage-discharge measurements at Maaseik.

Results of the residual analysis
In Fig. 13, results of the BReach analysis are plotted together with the relative residuals of the E RMS optimized model for Maaseik, Aarschot and Barnett's Bank.
In Maaseik, data points with limited maximum reaches in the temporal BReach plot correspond with points with more extreme values for the residuals (Fig. 13a). When sorting the stage-discharge data along height, the residuals show the same pattern as the BReach plot ( Fig. 13b) with large absolute values and a high variability of the residuals (and thus no data consistency) for low stages and small absolute values and lower variability (and thus large consistency) for higher stages. In both subplots, the two approaches thus provide comparable information.
Also, in the results for Aarschot (Fig. 13c), a period that is indicated as consistent in the BReach results corresponds with smaller absolute values and a lower variability of the residuals, while inconsistent periods coincide with larger absolute values and a high variability of the residuals. Again, the information content of both methods can be compared.
In the station of Barnett's Bank, however, both approaches show a different amount of information (Fig. 13d). This station is subjected to many geomorphological changes that are mainly caused by floods. The BReach results suggest the existence of different consecutive consistent periods and provide information about the floods that are situated at discontinuities in the plot (and thus probably related to an important change in the river's geometry  Figure 11. Combined BReach t_2s plot with (a) all data and (b) only winter data for Hulshout. In all subplots, for each index on the x axis the gray area indicates the span between the index of the maximum left reach (under the bisector) and the maximum right reach (above the bisector). Each gray tint represents a different degree of tolerance (i.e., percentage of data points allowed to have nonacceptable model results). (c) Stage-discharge data for Hulshout.
of information is the general character of the E RMS optimized model, that is fitted to the complete data set. If a data set mainly consists of a long consistent time period (as in Aarschot), the model fit will be dominated by this period and thus residuals in this period will be small. In case the data set consists of different consecutive situations that mutually differ (as in Barnett's Bank), this general fit will be insufficient to meet the characteristics of individual consistent time periods, and a residual plot will thus be uninformative. The approach of the BReach methodology, that evaluates the performance of a chosen model from the perspective of each data point separately, does not suffer from this generalization and is thus capable of revealing these smaller consistent periods.

General considerations regarding the use of the BReach methodology
In this section, some general thoughts about the use of the BReach methodology for rating curve data are given. It is obvious that the quality of results is related with the gauging frequency of the stage-discharge data. The stations analyzed in this paper vary from densely measured (up to a mean amount of 22 gaugings a year) to rather poorly measured (2 gaugings a year). Stations with a more complex flow situa-tion are measured more frequently. In many cases, local hydrological services decide to apply a similar differentiation in the gauging frequency that depends on the station's complexity. Based on the available data, it was possible to recognize the history and characteristics of each analyzed station in the BReach results. Nevertheless, it is difficult to pinpoint a minimum required gauging frequency to guarantee a successful application. If a large time gap occurs in the measured data, this can introduce uncertainty about the exact moment of a consistency change. In extreme situations, a temporary change can even disappear from the data, resulting in a (misleading) apparently consistent period. The bar (with indication of the years) above a BReach plot permits detection of these noninformative periods. If more detail is wanted, it can be interesting to create an additional BReach plot in which the absolute time is used in both axes (and thus the indices used in the current plots are projected on these time axes) and with an indication of the moments of the available gaugings on the bisector. Results of Van Eerdenbrugh et al. (2016) show that stagedischarge data for higher stages have a smaller distinctive capacity in the BReach t analysis. This corresponds with results of Di Baldassarre and Claps (2011), who confirm the validity of one single flood rating curve throughout a period with different geometric situations (affecting the rating curve for  Figure 12. (a) Combined BReach t_2s plot (all data), (b) BReach h_1s plot (all data) for Maaseik. In all subplots, for each index on the x axis the gray area indicates the span between the index of the maximum left reach (under the bisector) and the maximum right reach (above the bisector). Each gray tint represents a different degree of tolerance (i.e., percentage of data points allowed to have nonacceptable model results). (c) Stage-discharge data for Maaseik.
lower flows). A limited deficiency of the rating curve model for the highest flows leads to satisfying BReach results as the effects of the model deficiency disappear from the plots with higher degrees of tolerance (Van Eerdenbrugh et al., 2016). In the current paper, results in Clog-y-Fran (Sect. 3.3) confirm these findings. It is, however, important to emphasize that these results are site-specific and are expected to depend on the extent to which the higher parts of the cross section contribute to changes in the flow situation. On the contrary, a model deficiency in a height range that contributes significantly to changes in the flow situation will lead to important changes in BReach t results. This is shown in this paper for low stages at Clog-y-Fran (Sect. 3.3). In any case, it is necessary to be informed about the specific situation of the analyzed rating curve station. Not only is it important for an adequate choice of a rating curve model, it is also required for a correct interpretation of the BReach results and the design of possible alternative BReach analyses (Sect. 2.3). For instance, it would not be possible to distinguish between the BReach t results of all available data at Hulshout and Maaseik (Sect. 3.7 and 3.8) without any knowledge of the local situation.
The computational load of the BReach methodology depends on several aspects. First, it increases linearly with the size of the sample of the parameter space (and is thus larger for more complex rating curves). Second (and more important), the necessary calculation time strongly depends on both the amount of stage-discharge data points and the degree of consistency of the data set. The principle of the BReach algorithm is that for each data point, a maximum left and right reach must be searched. If a data set is highly consistent, the length of these searches increases significantly. Doubling the amount of data points can (for consistent data sets) hence result in 8 times the original calculation time. In the research for this paper, all calculations are performed on a personal computer with a 3.4 GHz CPU Core I7 and 8 GB RAM. For most stations, a BReach analysis took a few minutes to a few hours. In the most complex case (Clog-y-Fran, with 1166 data points and 1.3 × 10 7 samples), calculation of BReach results required 72 h.
At the moment, interpretation of BReach results is done manually by the user. The availability of a (semi)automatic routine that identifies possible consistent data periods would improve the BReach methodology. As the degree of squareness of a BReach plot within a certain period expresses the lack of important discontinuities, it might play a role in the decision process for assessing consistent periods.

Index [-]
Index [-] Figure 13. Combined BReach t_2s plot (all data) and relative residuals of the E RMS optimized rating curve for (a) Maaseik, (c) Aarschot and (d) Barnett's Bank. Combined BReach h_1s plot (all data) and relative residuals of the E RMS optimized rating curve for Maaseik. In all BReach plots, for each index on the x axis the gray area indicates the span between the index of the maximum left reach (under the bisector) and the maximum right reach (above the bisector). Each gray tint represents a different degree of tolerance (i.e., percentage of data points allowed to have nonacceptable model results).

Conclusions
The objective of this paper was to test the BReach methodology for assessing temporal consistency in rating curve data on various stage-discharge data set in the UK, New Zealand and Belgium. This led to successful results for all tested sites.
For each country, local information is maximally used to estimate observational uncertainties that serve as an input for the methodology. In this context, a new approach is proposed for the Belgian data using relative differences between simultaneous discharge measurements to test the plausibility of several a priori assumed error distributions. This approach offers promising insights in the plausible character of measurement error distributions in addition to a more general use of existing literature data about observational uncertainties. However, the limited size of the data set with simultaneous measurements is an important restriction. In order to inves-tigate the possibilities of the proposed approach more profoundly, a more elaborate data set with large spread in time, measurement stations, measurement device and flow conditions is necessary. Such an enlarged data set would not only increase the reliability of a KS test, but would also enhance the possibility to use more bins with smaller ranges of normalized discharge (replacing the current two arbitrary subgroups) and to investigate other measurement devices.
Overall, results of the BReach analyses correspond with site-specific situations. Nevertheless, the investigated cases show that knowledge about the local situation of a measurement station is crucial to design the necessary BReach analyses and to interpret their results correctly. Results show consistency in locations that are known as stable. Where human interventions (e.g., installation of a weir, deepening of a river) altered the rating curve behavior, results show corresponding consistency changes. In locations influenced by weed growth, a higher consistency can be assessed after isolating winter data. Similarly, consistency can be assessed for higher stages in a station where a downstream weir influences low flow behavior. Stations that are prone to geomorphological changes caused by flood events show discontinuities in the BReach plots at the time instants of the highest floods. Moreover, the plots can also indicate which peak floods do not cause consistency changes. The return period that serves as a threshold for consistency changes varies from station to station. These results provide extra insight into the rating curve behavior and confirm the added value of the proposed BReach methodology as a preliminary assessment of data consistency prior to an in-depth determination of discharges and their uncertainty. Moreover, this assessment of (in)consistent periods can enhance other applications based on the investigated data (e.g., by informing hydrological and hydraulic model evaluation design about consistent time periods to analyze).
A comparison between the results of both a residual analysis and a BReach analysis shows that the latter mainly provides additional information in case of a data set that consists of different, consecutive consistent time periods that mutually differ.
In the BReach methodology, the chosen rating curve model is required to appropriately approximate the relation between discharge and stage for an important part of the measured range. In this paper, analyses with only a subset of the data or with stage-discharge data sorted by stage (BReach h ) enable a part of a known model deficiency to be overcome. Nevertheless, it is advisable to select a best possible model structure based on the available knowledge about flow conditions in the investigated measurement site.