The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task becomes more complicated when climate is non-stationary, because prediction then also requires extrapolation through time. In this context, process-based FDC models that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods.

This study compares a stochastic (process-based) method and a statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash–Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by frequent wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are favored over statistical models.

The flow duration curve (FDC) provides a compact summary of the variability
of daily streamflow by indicating what proportion of the flow regime exceeds
a given flow rate. FDCs have considerable practical relevance, particularly
in supporting decisions that are affected by the availability and reliability
of surface water. Common applications of FDCs include the design and
management of hydropower infrastructure
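As a minimal sketch of the definition above, an empirical FDC can be built by ranking daily flows and assigning each an exceedance probability. The function name, the example flows, and the Weibull plotting position i/(n + 1) are assumptions made for illustration; they are not taken from this study.

```python
def empirical_fdc(daily_flows):
    """Return (exceedance_probabilities, ranked_flows) for an empirical FDC.

    Flows are ranked from largest to smallest. The i-th ranked flow is
    assigned the exceedance probability i / (n + 1) (Weibull plotting
    position, an assumption of this sketch).
    """
    ranked = sorted(daily_flows, reverse=True)
    n = len(ranked)
    exceedance = [(i + 1) / (n + 1) for i in range(n)]
    return exceedance, ranked


# Hypothetical daily flows (m3/s), for illustration only.
example_flows = [12.0, 3.5, 7.1, 0.9, 5.4, 2.2, 18.3, 1.4, 4.0]
exceedance, ranked = empirical_fdc(example_flows)
```

Each pair (exceedance[i], ranked[i]) is one point of the curve: the largest flow is exceeded least often, the smallest most often.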

Despite their utility, empirical FDCs are unavailable for many basins,
primarily because they require extensive on-site observations of daily
streamflow

Recent efforts to predict FDCs in ungauged catchments focus on statistical
approaches that predict the flow distribution based on the catchment's
similarity to nearby, gauged watersheds

Stochastic, process-based models that mechanistically link the drivers,
state, and response of the system are a promising avenue to address these
issues. In these models, basic assumptions about the stochastic structure
of rainfall and the (deterministic) response of catchments allow the analytic
derivation of streamflow probability density functions (PDFs). (Note that
because the FDC can be obtained directly by transforming the PDF, a
predictive technique that yields the streamflow PDF will also allow the FDC
to be estimated.)
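The transformation from a streamflow PDF to an FDC can be sketched numerically: integrating the PDF gives the cumulative distribution, and one minus that value is the exceedance probability plotted in the FDC. The gamma density and its parameter values below are placeholders chosen for illustration, and the function names are hypothetical.

```python
import math


def gamma_pdf(q, shape, scale):
    """Gamma density, used here only as an illustrative streamflow PDF."""
    if q <= 0.0:
        return 0.0
    return (q ** (shape - 1.0) * math.exp(-q / scale)
            / (math.gamma(shape) * scale ** shape))


def fdc_from_pdf(pdf, q_max, n=2000):
    """Return (flows, exceedance) by trapezoidal integration of the PDF."""
    dq = q_max / n
    flows = [i * dq for i in range(n + 1)]
    density = [pdf(q) for q in flows]
    cdf = [0.0]
    for i in range(1, n + 1):
        cdf.append(cdf[-1] + 0.5 * (density[i - 1] + density[i]) * dq)
    total = cdf[-1]  # renormalize for the truncated upper tail
    exceedance = [1.0 - c / total for c in cdf]
    return flows, exceedance


# Illustrative parameter values (assumptions, not estimates from the study).
flows, exceedance = fdc_from_pdf(lambda q: gamma_pdf(q, 2.0, 1.5), q_max=30.0)
```

Plotting flows against exceedance gives the FDC implied by the chosen PDF. The grid must extend far enough into the upper tail for the truncation to be negligible.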

Process-based models successfully reproduce streamflow PDFs in numerous
gauged catchments worldwide

Using Nepal as a test case, this study compares the process-based and statistical approaches on the basis of (i) their ability to predict FDCs in ungauged basins, (ii) their sensitivity to data scarcity, represented both by the spatial density of the stream gauge network and by the temporal extent (length) of the available streamflow records, and (iii) their ability to accommodate changes in the rainfall regime.

Nepal provides an ideal setting to compare the two approaches, for four
reasons. First, the country is representative of global availability of
streamflow data, as measured by the density of its stream gauge network
(Fig.

Section

The process-based approach models daily streamflow as a random variable.
Subject to strong simplifying assumptions about rainfall stochasticity and
runoff generation, the streamflow PDF can be analytically derived. During the
wet season, daily rainfall is represented as a stationary marked Poisson
process with exponentially distributed depths. Assuming linear
evapotranspiration losses,

The model was successfully validated in a variety of regions with seasonally
dry climates worldwide, including Nepal, where observed FDCs were predicted
in 24 gauged catchments with a median Nash–Sutcliffe coefficient of 0.90 on
log-transformed flow quantiles
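The rainfall assumption of the process-based model (a marked Poisson process with exponentially distributed depths) can be illustrated with a short simulation. This is a sketch under stated assumptions: the Poisson process is approximated at a daily time step by an independent rain/no-rain draw on each day, and the function name and parameter values are hypothetical.

```python
import random


def simulate_daily_rainfall(n_days, frequency, mean_depth, seed=0):
    """Simulate wet-season daily rainfall depths.

    frequency  : probability that rain occurs on a given day (daily
                 approximation of the Poisson rate; an assumption here)
    mean_depth : mean of the exponential distribution of rainfall depths
    """
    rng = random.Random(seed)
    rain = []
    for _ in range(n_days):
        if rng.random() < frequency:
            # expovariate takes the rate, i.e. 1 / mean.
            rain.append(rng.expovariate(1.0 / mean_depth))
        else:
            rain.append(0.0)
    return rain


# Illustrative values only.
rain = simulate_daily_rainfall(20000, frequency=0.3, mean_depth=10.0)
wet_days = [r for r in rain if r > 0.0]
observed_frequency = len(wet_days) / len(rain)
observed_mean_depth = sum(wet_days) / len(wet_days)
```

With a long enough series, the observed frequency and mean depth approach the values used to generate it, which is the sense in which these two statistics characterize the rainfall forcing.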

In ungauged catchments, the process-based model is implemented as follows.
Three of the seven parameters of the model (

The statistical approach is entirely driven by observation data and does not
assume any specific runoff generation process. Instead, it identifies and
exploits statistical correlations that may occur between streamflow observed
at existing gauges and the geology, topography, and climate of the
corresponding catchments. The index flow model used in this study was
developed by

Catchment characteristics. Median values and interquartile distances (IQD) are given for the whole sample of 25 gauges. The table also presents characteristics of the Chepe Kohla watershed considered in the analysis as a case study.

Predictions in ungauged catchments are obtained by first using linear
regressions to predict

The two methods were evaluated using observed streamflow data from 25
Nepalese catchments mapped in Fig.

We focused on the Chepe Kohla catchment in central Nepal (Fig.

Rainfall characteristics over the sampled catchments were obtained from 178
precipitation gauges

Finally, the precipitation depth received on any given day by a catchment is
assumed to be the average of the precipitation depths observed by individual
rain gauges. It follows that the aggregated mean rainfall intensity can be
expressed as

If no precipitation station is located within the catchment, rainfall
characteristics observed at the rain station closest to the catchment
centroid were considered. Although aggregating rainfall

Recession characteristics were estimated using streamflow observations as
described in

Potential evapotranspiration was approximated by applying the empirical
relation estimated by

We used three cross-validation techniques to evaluate the predictive ability of both methods in ungauged basins. Firstly, a leave-one-out analysis was carried out to assess predictive performance in a realistic situation, where FDCs are predicted in Nepal using all streamflow gauges available in the region. Secondly, we examined the sensitivity of the methods to decreasing data availability by reducing the number of gauges available to calibrate the models. Finally, we performed a similar data-degradation procedure, but in this case we reduced the number of daily streamflow observations, while holding the number of gauges constant. This final analysis accounts for the challenges posed by recent or temporary installation of stream gauges, which introduces uncertainties into the estimation of model parameters due to the short streamflow records used. These errors can propagate through the model and affect the prediction of FDCs.

Numerical simulation analysis to assess predictions under change.
Future rainfall characteristics (frequency

In a leave-one-out analysis, one gauge is “left out” of the data set, and
streamflow is predicted at the “missing” location using observations from
the remaining gauges. The predicted FDC is then compared to observations from
the omitted gauge. The resulting error between observation and prediction
indicates how the method would perform at that catchment if it were
ungauged. Repeating the procedure for all gauges offers an approximation to
the overall prediction error of the method. To measure this error, we
constructed error duration curves
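The leave-one-out procedure can be sketched as a loop over gauges, scored here with a Nash–Sutcliffe coefficient on log-transformed flow quantiles. The predictor below is a placeholder that simply averages the remaining gauges; it stands in for either model and is not the method used in this study. All names and data are hypothetical.

```python
import math


def nash_sutcliffe_log(observed, predicted):
    """Nash-Sutcliffe coefficient computed on log-transformed quantiles."""
    log_obs = [math.log(q) for q in observed]
    log_pred = [math.log(q) for q in predicted]
    mean_obs = sum(log_obs) / len(log_obs)
    residual = sum((o - p) ** 2 for o, p in zip(log_obs, log_pred))
    variance = sum((o - mean_obs) ** 2 for o in log_obs)
    return 1.0 - residual / variance


def leave_one_out(quantiles_by_gauge, predict):
    """Score each gauge using a prediction built from all other gauges."""
    scores = {}
    for gauge, observed in quantiles_by_gauge.items():
        training = {g: q for g, q in quantiles_by_gauge.items() if g != gauge}
        scores[gauge] = nash_sutcliffe_log(observed, predict(training))
    return scores


def mean_of_neighbours(training):
    """Placeholder predictor: average each quantile across remaining gauges."""
    return [sum(column) / len(column) for column in zip(*training.values())]


# Hypothetical flow quantiles (high, median, low) at three gauges.
quantiles_by_gauge = {
    "A": [10.0, 5.0, 1.0],
    "B": [20.0, 10.0, 2.0],
    "C": [30.0, 15.0, 3.0],
}
scores = leave_one_out(quantiles_by_gauge, mean_of_neighbours)
```

In this toy data set gauge B lies exactly midway between A and C, so the placeholder predicts it perfectly, whereas A and C are predicted with a bias.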

The effect of the number of calibration gauges was assessed using a jackknife
cross-validation analysis

The available streamflow data did not allow a direct evaluation of the
effects of time-series length through cross-validation, because such an
analysis requires substantial overlaps in the monitoring periods of all
gauges. Therefore we focused the final analysis on the Chepe Kohla catchment,
which has the longest observation record in our data set. We evaluated the
effect of the length of the available observation records on parameter
estimation, and propagated the ensuing uncertainty in the parameters to the
FDCs predicted by each model. To do this, we selected a fixed number of full
years of streamflow observations, estimated the parameters, predicted the FDC
using these parameters, and compared the results to the empirical FDC
obtained from the full observation record. The procedure was repeated 10 000
times. The estimation errors in the model parameters and the resulting FDC
prediction performances (NSC) were recorded as a function of the number of
sampled years. This analysis is not intended to describe the models' ability
to predict FDCs at catchments with short observation records: in this case,
constructing an empirical FDC using the available (however short) observation
record is likely to be the best course of action
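The resampling procedure described above can be sketched as follows. For brevity, the sketch tracks the relative error in mean flow as a stand-in for the model parameters and FDC performance recorded in the study. The synthetic record, the function name, and the choice to draw years without replacement are assumptions.

```python
import random


def resample_mean_flow_error(record, n_years, n_repetitions, seed=0):
    """Relative error in mean flow when only n_years full years are used.

    record maps each year to its list of daily flows. Years are drawn
    without replacement (an assumption of this sketch).
    """
    rng = random.Random(seed)
    all_flows = [q for year in sorted(record) for q in record[year]]
    reference = sum(all_flows) / len(all_flows)
    errors = []
    for _ in range(n_repetitions):
        years = rng.sample(sorted(record), n_years)
        sample = [q for year in years for q in record[year]]
        estimate = sum(sample) / len(sample)
        errors.append((estimate - reference) / reference)
    return errors


# Synthetic 10-year record with year-to-year variability (illustration only).
generator = random.Random(1)
record = {}
for year in range(2000, 2010):
    wetness = generator.uniform(0.5, 1.5)
    record[year] = [wetness * generator.expovariate(1.0) for _ in range(365)]

# Largest absolute relative error for each number of sampled years.
spread = {}
for n_years in (1, 5, 10):
    errors = resample_mean_flow_error(record, n_years, n_repetitions=200)
    spread[n_years] = max(abs(e) for e in errors)
```

When all ten years are sampled the estimate matches the full-record value, so the error vanishes; with a single year the error reflects interannual variability.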

Sensitivity of models to changes in the precipitation regime.

We used numerical simulations to assess the ability of both models to predict
streamflow when subject to changing rainfall regimes, as described in Fig.

Synthetic streamflow time series were generated by coupling the stochastic
rainfall generator described in

Flow duration curve prediction performance in ungauged basins. The
error duration curves of the leave-one-out cross-validation analysis using
the process-based and statistical models are presented in panels

We compared the synthetic FDCs to model predictions that were made with

Although the recession assumptions of the process-based model are used to
generate the synthetic streamflow used as a control, we believe that the
analysis is not biased against the statistical approach for three reasons.
Firstly, the only parameter of the statistical approach that is influenced by
rainfall (

Results from the leave-one-out cross-validation analysis are presented in
Fig.

Sensitivity of models to data scarcity.

Figure

When considering short observation windows, parameter uncertainties also
drive the performance of the models. Figure

Simulation results presented in Fig.

The analysis suggests that both statistical and process-based methods to
estimate FDCs in ungauged basins perform comparably in Nepal, over a wide
range of gauge densities and observation durations. Yet prediction
performances varied significantly between the models as data became
increasingly sparse. The statistical method is more sensitive to spatially
sparse data, which degrades the interpolation accuracy of

The statistical model relies on two assumptions about the correlations of
observed data. The first assumption is that catchments with similar low-flow
indices (

While the performance of the process-based model is also driven by parameter
estimation uncertainties, these errors arise from simplifying assumptions
about local hydrological processes (rather than uncertainties from their
statistical interpolation from neighboring gauges). Additional
cross-validation analyses (shown in Sect. S2 of the Supplement) suggest that
uncertainties caused by the aggregation of observed point-rainfall statistics
at the catchment level drive prediction errors of high-flow quantiles. While
increasingly accurate remote sensing rainfall data will progressively allow
such spatial heterogeneities to be resolved, current precipitation products
(e.g., TRMM 3B42) remain substantially biased in mountainous regions like
Nepal, where they do not outperform available rain gauges in predicting the
frequency and intensity of areal rainfall

This study compares two specific methods on their ability to predict FDCs in
the particular context of ungauged Nepalese basins. Results are thus not
necessarily representative of the relative performance of process-based and
statistical methods in general, particularly in regions where abundant field
data allow more advanced statistical approaches to be implemented. Yet
fundamentally, the statistical model relies on observed correlations rather
than assumptions about hydrologic mechanisms. Because FDC shapes are modeled
non-parametrically, the approach is applicable to regions with highly
variable catchment responses. However, prediction performance in ungauged
basins is constrained by interpolation errors in the mean flow. This makes
the method unsuitable for regions where the local determinants of mean flow
(i.e., rainfall, evapotranspiration, glacial melt) cannot be accurately
monitored at the catchment level. In contrast, a key advantage of the
process-based model is its ability to exploit characteristics of the
stochastic structure of rainfall that can be estimated from daily rainfall
observations. The model is appropriate for regions where the spatial
heterogeneity of runoff is driven by rainfall, and where the frequency and
intensity of rainfall depths at the catchment level can be readily estimated
(i.e., small catchments with numerous rain gauges, or places where satellite
observations provide a good representation of rainfall statistics). Unlike
rainfall, recession behavior arises from lumped and complex interactions
between climate, vegetation, and groundwater processes that typically cannot
be monitored in a spatially explicit manner. The process-based model is
therefore inappropriate for regions where the hydrologic response of the
catchment is the main source of runoff heterogeneity, or where the assumed
recession behavior (in particular the relation between

Conveniently, the appropriate implementation contexts for both methods appear
to be complementary, and the optimal method in a given region is determined
by the driving source of runoff heterogeneity in the catchments. Ultimately,
the performance of both methods is constrained by their ability to estimate
their parameters in ungauged basins. This relation is apparent in Fig.

Expected shifts in the frequency and intensity of monsoon rainfall over Nepal
only have a marginal impact on the streamflow distributions in the Chepe
Kohla catchment, as shown by the numerical simulation presented in
Fig.

Although rainfall stationarity is an inherent assumption of the process-based
approach, climate change can be incorporated by updating the relevant
parameters to their future value to predict the (pseudo-)stationary future
state of the system. The method accounts for otherwise confounding changes in
the frequency and intensity of rainfall, which are expected in Nepal. By
explicitly accounting for soil moisture dynamics and recession behavior, the
model emulates the (causal) effect of rainfall on streamflow. As a result,
the method reliably predicts the distribution of future streamflow, provided
that governing flow generation processes are in line with the basic
assumptions listed in Sect.

In contrast, the statistical model is solely based on observed correlations,
leading to two important sources of errors for predictions under change.
First, the model only accommodates rainfall changes to the extent that the
estimated statistical relation between rainfall and runoff is representative
of local runoff coefficients. The model will not reliably predict future
streamflows if runoff coefficients are strongly spatially heterogeneous, or
if the cross-sectional sample of gauges fails to capture important processes
governing mean flow. This source of uncertainty appears to be significant in
Nepal, as illustrated by the substantial bias in annual flow predictions in
Fig.

Lastly, a key assumption in this study is that catchment response (in terms of low-flow or recession characteristics) is independent of climate. It is possible that shifts in climate have an effect on catchment response by affecting the partitioning of effective rainfall between storage and runoff. Although not quantitatively assessed in this study, we expect that this effect would negatively affect the performance of both approaches.

Stochastic, process-based models predicted the FDCs for ungauged catchments
in Nepal well, with a performance that was comparable to that of statistical
models. This suggests that in regions with globally representative gauge
densities, and under seasonally dry climates, the advantages of the
statistical approaches relative to stochastic models noted in previous
analyses

Nonetheless, this study finds a complementarity between the different sources of uncertainty in the stochastic and statistical methods. This suggests that model selection should be driven by a consideration of the main drivers of heterogeneity in any study catchment: process-based models are advisable if climate is likely to be the main source of runoff heterogeneity. Conversely, statistical methods are more appropriate for regions with substantially different recession behaviors across catchments. These distinctions provide a potentially robust basis for model selection in any given application.

The results also suggest that the sensitivity of statistical approaches to
changes in rainfall statistics is dependent on the “resilience” of the flow
regime as defined by

The excellent performance of both process-based and statistical models for
the FDC and PDF in ungauged basins suggests that extending probabilistic
analyses in such basins to also include flow-derived variables such as
hydropower capacity

This appendix presents the analytical expression of the FDC in seasonal climates
derived in

Dry-season streamflow is modeled as a seasonal recession starting at the last
discharge peak of the wet season. Because wet-season streamflow is a
gamma-distributed variable, streamflow at discharge peaks, and therefore the
initial condition of the seasonal recession, is itself a gamma-distributed
variable

Assuming a power-law relation between discharge and recession rate, the
cumulative distribution function of dry-season streamflow can be expressed as

The FDC for seasonally dry climates is finally obtained by plotting the
streamflow quantiles
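The dry-season recession described in this appendix can be illustrated by Monte Carlo simulation rather than analytically. The sketch assumes one common form of power-law recession, dQ/dt = -a Q^b with b > 1, whose solution is Q(t) = (Q0^(1-b) + a (b-1) t)^(1/(1-b)). It also assumes that the time since the last wet-season peak is uniformly distributed over the dry season. The parameterization, function names, and parameter values are assumptions and may differ from the cited derivation.

```python
import random


def recession_flow(q0, t, a, b):
    """Solution of dQ/dt = -a * Q**b (b > 1) started at Q(0) = q0."""
    return (q0 ** (1.0 - b) + a * (b - 1.0) * t) ** (1.0 / (1.0 - b))


def dry_season_flows(shape, scale, a, b, season_days, n_samples=5000, seed=0):
    """Sample dry-season flows, returned in decreasing order.

    The initial discharge is gamma-distributed (shape, scale); the time
    since the start of the recession is drawn uniformly over the season.
    """
    rng = random.Random(seed)
    flows = []
    for _ in range(n_samples):
        q0 = rng.gammavariate(shape, scale)
        t = rng.uniform(0.0, season_days)
        flows.append(recession_flow(q0, t, a, b))
    flows.sort(reverse=True)
    return flows


# Spot check with b = 2: Q(t) = 1 / (1 / q0 + a * t).
check = recession_flow(2.0, 10.0, a=0.1, b=2.0)

# Illustrative parameter values only.
dry_flows = dry_season_flows(shape=2.0, scale=5.0, a=0.05, b=2.0,
                             season_days=240)
```

Ranking the sampled flows and pairing them with exceedance probabilities gives a numerical approximation of the dry-season part of the FDC under these assumptions.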

The Swiss National Science Foundation is gratefully acknowledged for funding (M. F. Müller).

Edited by: S. Archfield