A comparative assessment of rainfall–runoff modelling against regional flow duration curves for ungauged catchments

Rainfall–runoff modelling has long been a special subject in hydrological sciences, but identifying behavioural parameters in ungauged catchments is still challenging. In this study, we comparatively evaluated the performance of the local calibration of a rainfall–runoff model against regional flow duration curves (FDCs), which is a seemingly alternative method of classical parameter regionalisation for ungauged catchments. We used a parsimonious rainfall–runoff model over 45 South Korean catchments under semi-humid climate. The calibration against regional FDCs was compared with the simple proximity-based parameter regionalisation. Results show that transferring behavioural parameters from gauged to ungauged catchments significantly outperformed the local calibration against regional FDCs due to the absence of flow timing information in the regional FDCs. The behavioural parameters gained from observed hydrographs were likely to contain intangible flow timing information affecting predictability in ungauged catchments. Additional constraining with the rising limb density appreciably improved the FDC calibrations, implying that flow signatures in temporal dimensions would supplement the FDCs. As an alternative approach in data-rich regions, we suggest calibrating a rainfall–runoff model against regionalised hydrographs to preserve flow timing information. We also suggest use of flow signatures that can supplement hydrographs for calibrating rainfall–runoff models in gauged and ungauged catchments.


Introduction
A standard method to predict daily streamflow is to employ a rainfall-runoff model that conceptualises catchment functional behaviours, and simulate synthetic hydrographs from atmospheric drivers (Wagener and Wheater, 2006; Blöschl et al., 2013). A prerequisite of this conceptual modelling approach is parameter identification to enable the rainfall-runoff model to imitate actual catchment behaviours. Conventionally, behavioural parameters are estimated via model calibration against observed hydrographs (referred to as the "hydrograph calibration" hereafter). The hydrograph calibration conveniently attains reproducibility of the predictand (i.e. streamflow time series), which is commonly used as a performance measure in rainfall-runoff modelling studies. Because the degree of belief in hydrological models is normally measured by how well they can reproduce observations (Westerberg et al., 2011), use of the hydrograph calibration has a long tradition in runoff modelling (Hrachowitz et al., 2013).
The hydrograph calibration, however, can be challenged by epistemic errors in input and output data, sensitivity to calibration criteria, and inapplicability when data are missing or poor (Westerberg et al., 2011; Zhang et al., 2008). Importantly, it is difficult to know whether the parameters optimised toward maximising hydrograph reproducibility uniquely represent actual catchment behaviours, since multiple parameter sets may show similar predictive performance (Beven, 2006, 1993). This low uniqueness of the optimal parameter set, namely the equifinality problem in conceptual hydrological modelling, can become a significant uncertainty source, particularly when extrapolating the optimal parameters to ungauged catchments (Oudin et al., 2008).
Published by Copernicus Publications on behalf of the European Geosciences Union.
To overcome or circumvent those disadvantages, distinctive flow signatures (i.e. metrics or auxiliary data representing catchment behaviours) in lieu of observed hydrographs can be used to identify model parameters (e.g. Yilmaz et al., 2008; Shafii and Tolson, 2015). The flow duration curve (FDC) has received particular attention in the signature-based model calibrations as a single criterion (e.g. Westerberg et al., 2014, 2011; Yu and Yang, 2000; Sugawara, 1979) or one of calibration constraints (e.g. Pfannerstill et al., 2014; Kavetski et al., 2011; Hingray et al., 2010; Blazkova and Beven, 2009; Yadav et al., 2007). The FDC, the relationship between flow magnitude and its frequency, provides a summary of temporal streamflow variations in a probabilistic domain (Vogel and Fennessey, 1994). Many FDC-related studies have found that climatological and geophysical characteristics within a catchment determine the shape of the FDC (e.g. Cheng et al., 2012; Ye et al., 2012; Yokoo and Sivapalan, 2011; Botter et al., 2007). With only a few physical parameters, the shape of the period-of-record FDC can be analytically expressed (Botter et al., 2008). Based on this strong relationship between catchment physical properties and the FDC, one may hypothesise that model calibration against the FDC (referred to as the "FDC calibration" hereafter) can provide parameters that sufficiently capture actual catchment behaviours. Sugawara (1979) made the first attempt at the FDC calibration, emphasising its advantage in reducing negative effects of epistemic errors in rainfall-runoff data. Westerberg et al. (2011) also showed that the FDC calibration may provide predictions that are robust to moderate disinformation, such as the presence of event flows under inconsistency between inputs and outputs.
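As an illustration of the definition above, an empirical FDC can be obtained by ranking daily flows in descending order and assigning each an exceedance probability. The following Python sketch assumes the Weibull plotting position (rank divided by n + 1), which is one common convention and not necessarily the one used in this study; the function name is hypothetical.

```python
def empirical_fdc(flows):
    """Return (exceedance probability, flow) pairs, highest flow first."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    # Assumed convention: Weibull plotting position, rank / (n + 1).
    return [((rank + 1) / (n + 1), q) for rank, q in enumerate(ranked)]


# Two series with the same magnitudes but different timing.
series_a = [1.0, 4.0, 2.0]
series_b = [2.0, 1.0, 4.0]
fdc_a = empirical_fdc(series_a)
fdc_b = empirical_fdc(series_b)
```

Because the ranking discards the order of the flows, `fdc_a` and `fdc_b` are identical, which is the loss of flow timing information discussed in this study.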
If it allows rainfall-runoff models to sufficiently capture functional behaviours of catchments, the FDC calibration would have a particular value in comparison to the parameter regionalisation for prediction in ungauged catchments. The parameter regionalisation, which transfers or extrapolates behavioural parameters from gauged to ungauged catchments (e.g. Kim and Kaluarachchi, 2008; Oudin et al., 2008; Parajka et al., 2007; Wagener and Wheater, 2006; Dunn and Lilly, 2011), conveniently provides a priori estimates of behavioural parameters and thus became a popular approach to parameter identification in ungauged catchments (see a comprehensive review in Parajka et al., 2013). However, a critical concern is that regionalised parameters are highly dependent on model calibrations at gauged sites that may have substantial equifinality problems. Without flow information in ungauged catchments, it is impossible to know whether regionalised parameters are behavioural. Thus, regionalised parameters might be insufficiently reliable and highly uncertain (Bárdossy, 2007; Oudin et al., 2008; Zhang et al., 2008).
On the other hand, the calibration against regional FDC (referred to as "RFDC_cal" hereafter) may reduce the primary concern in the classical parameter regionalisation scheme. The regional models predicting FDC at ungauged sites have shown strong performance, for instance via regression analyses between quantile flows and catchment properties (e.g. Shu and Ouarda, 2012; Mohamoud, 2008; Smakhtin et al., 1997), geostatistical interpolation of quantile flows (e.g. Pugliese et al., 2014; Westerberg et al., 2014), and regionalisation of theoretical probability distributions (e.g. Atieh et al., 2017; Sadegh et al., 2016), among many variations. The parameters obtained from RFDC_cal are deemed behavioural, because a distinctive flow signature of the target ungauged catchment directly identifies them; however, predicted FDC should be reliable in this case. An FDC is a compact representation of runoff variability at all timescales from inter-annual to event scale, embedding various aspects of multiple flow signatures (Blöschl et al., 2013). Based on this strength, several studies have already shown promising predictive performance using RFDC_cal for ungauged catchments (e.g. Westerberg et al., 2014; Yu and Yang, 2000).
Nevertheless, practical questions arise when using RFDC_cal for ungauged catchments. First, the FDC is simplified information with flow magnitudes only; hence, the FDC calibration could worsen the equifinality problem relative to the hydrograph calibration. Because regional FDC contain no flow timing information, one may be concerned that parameters obtained from RFDC_cal provide poorer predictive performance than regionalised parameters gained from the hydrograph calibration. Indeed, there is additional uncertainty in predicted FDC possibly introduced by the regionalisation models (Westerberg et al., 2011; Yu et al., 2002). RFDC_cal may be undesirable when a simple parameter regionalisation can provide better performance, because regionalising observed FDC may require expensive efforts. Several comparative studies on parameter regionalisation (e.g. Parajka et al., 2013; Oudin et al., 2008) have suggested that the simple proximity-based parameter transfer can be competitive in many regions. Second, there may be additional flow signatures to improve predictive performance of the FDC calibration. Additional constraining can lead to better predictive performance of the RFDC (Westerberg et al., 2014); however, it is still an open question which flow signatures can supplement the FDC calibration.
As discussed, RFDC_cal seems promising for prediction in ungauged catchments. However, to our knowledge, RFDC_cal has never been evaluated in a comparative manner with classical parameter regionalisation except by Zhang et al. (2015), who assessed its performance in part. Therefore, this study aimed to evaluate predictive performance of RFDC_cal in comparison with a conventional parameter regionalisation. We focused on the absence of flow timing in the FDC and its impacts on rainfall-runoff modelling. In this work, a parsimonious four-parameter conceptual model was used to simulate daily hydrographs for 45 catchments in South Korea. To predict FDC in ungauged catchments, a geostatistical regional model was adopted. Monte Carlo sampling was used to identify model parameters and measure equifinality in the hydrograph and the FDC calibrations.

Description of the study area and data
For this study, we selected 45 catchments located across South Korea with no or negligible human-made influences on flow variations (Fig. 1). South Korea is characterised by a temperate and semi-humid climate with rainy summer seasons. North Pacific high pressure brings monsoon rainfall with high temperatures during summer seasons, while dry and cold weather prevails in winter seasons due to Siberian high pressure. Typical ranges of annual precipitation are 1200-1500 and 1000-1800 mm in the northern and the southern areas respectively (Rhee and Cho, 2016). Annual mean temperatures in South Korea range between 10 and 15 °C (Korea Meteorological Administration, 2011). Approximately 60-70 % of precipitation falls in summer seasons between June and September (Bae et al., 2008). Streamflow usually peaks in the middle of summer seasons because of heavy rainfall or typhoons, and hence information on catchment behaviours is largely concentrated in summer-season hydrographs. Snow accumulation and ablation occurring at high elevations have minor influences on flow variations due to the relatively small amount of winter precipitation (Bae et al., 2008).
The study catchments were selected based on availability of streamflow data. High-quality daily streamflow data across South Korea have been produced since the establishment of the Hydrological Survey Centre in 2007 (Jung et al., 2010), though river stages have been monitored for extensive periods at a few gauging stations. Thus, we collected streamflow data at 29 river gauging stations from 2007 to 2015 together with inflow data of 16 multi-purpose dams for the same data period from the Water Resources Management Information System operated by the Ministry of Land, Infrastructure, and Transport of the South Korean government (available at http://www.wamis.go.kr/). The mean annual flow of the study catchments was 739 mm yr⁻¹ with a standard deviation of 185 mm yr⁻¹ during 2007-2015.
In addition, as atmospheric forcing inputs, we collected daily precipitation and maximum and minimum temperatures for 2005-2015 at 3 km grid resolution produced by spatial interpolations between 60 stations of the automated surface observing system (ASOS) maintained by the Korea Meteorological Administration (2011). The ASOS data were interpolated by the Parameter-elevation Regression on Independent Slope Model (PRISM; Daly et al., 2008), and overestimated pixels of the PRISM grid data were smoothed by the inverse distance method. Jung and Eum (2015) found that this combined method improved the spatial interpolation of precipitation and the temperatures in South Korea. The annual mean precipitation and temperature of the study catchments vary within ranges of 1145-1997 mm yr⁻¹ and 8.0-13.8 °C during 2007-2015. Hydro-climatological features of the 45 catchments are summarised in Table 1.

Hydrological model (GR4J)
A parsimonious rainfall-runoff model, GR4J (Perrin et al., 2003), was adopted to simulate daily hydrographs of the 45 catchments for 2007-2015. GR4J conceptualises functional catchment response to rainfall with four free parameters that regulate the water balance and water transfer functions. Figure 2 schematises the structure of GR4J. The four parameters (X1 to X4) conceptualise soil water storage, groundwater exchange, routing storage, and the base time of unit hydrograph respectively. Since its parsimonious and efficient structure allows robust calibration and reliable regionalisation of the parameters, GR4J has been frequently used for modelling daily hydrographs with various purposes under diverse climatic conditions (Zhang et al., 2015). The computation details and discussion are found in Perrin et al. (2003). The potential evapotranspiration (PE in Fig. 2) was estimated by the temperature-based model proposed by Oudin et al. (2005) for lumped rainfall-runoff modelling.

Preliminary data processing
Before rainfall-runoff modelling, we preliminarily processed the grid climatic data to convert precipitation data to liquid water forcing (i.e. rainfall and snowmelt depths) using a physics-based snowmelt model proposed by Walter et al. (2005). The preliminary snowmelt modelling was mainly for reducing systematic errors from no snow component in GR4J, which may affect model performance in catchments at relatively high elevations. We chose this preliminary processing to avoid adding more parameters (e.g. the temperature index) to the existing structure of GR4J. In the case of GR4J, one additional parameter implies 25 % complexity increase in terms of the number of parameters. The snowmelt model uses the same inputs as GR4J to simulate point-scale snow accumulation and ablation processes (i.e. no additional inputs are required). The snowmelt model is a physics-based model but uses empirical methods to estimate its parameters for the energy balance simulation. As outputs, it produces the liquid water depths and the snow water equivalent. For lumped inputs to GR4J, we took spatially averaged pixel values of the liquid water depths and the maximum and minimum temperatures within the boundary of each catchment.
After the snowmelt modelling, consistency between the liquid water depths and the observed flows (i.e. input-output consistency) was checked using the current precipitation index (CPI; Smakhtin and Masse, 2000) defined as

$$I_t = I_{t-1} K + R_t, \tag{1}$$

where $I_t$ is the CPI (mm) at day $t$, $K$ is a decay coefficient (0.85 d⁻¹), and $R_t$ is the liquid water depth (mm d⁻¹) at day $t$. CPI mimics temporal variations of typical streamflow data by converting intermittent precipitation data to a continuous time series with an assumption of the linear reservoir.
The input-output consistency can be evaluated using correlation between CPI and observed streamflow as in Westerberg et al. (2014) and Kim and Kaluarachchi (2014). The Pearson correlation coefficients between CPI and streamflow data of the 45 catchments had an average of 0.67 with a range of 0.43-0.79, and no outliers were found in the box plot of the correlation coefficients. Hence, we assumed that consistency between climatic forcing and observed hydrographs was acceptable.
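The consistency check described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the recursion of Smakhtin and Masse (2000) in which the CPI of the previous day is multiplied by the decay coefficient and the liquid water depth of the current day is added. The zero initial value and the function names are assumptions made for the example.

```python
import math


def current_precipitation_index(liquid_water, decay=0.85, initial=0.0):
    """CPI series from daily liquid water depths (mm/d).

    Assumed recursion: I_t = I_{t-1} * K + R_t, started from `initial`.
    """
    cpi, previous = [], initial
    for depth in liquid_water:
        previous = previous * decay + depth
        cpi.append(previous)
    return cpi


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def input_output_consistency(liquid_water, observed_flow, decay=0.85):
    """Correlation between CPI and observed streamflow."""
    return pearson(current_precipitation_index(liquid_water, decay), observed_flow)
```

In practice a warm-up period would reduce the influence of the assumed initial value on the first days of the CPI series.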

The hydrograph calibration in gauged catchments
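To search for behavioural parameter sets of GR4J against the streamflow observations (i.e. the hydrograph calibration), we used the objective function of Zhang et al. (2015) as the calibration criterion to consider the Nash-Sutcliffe efficiency (NSE) and the water balance error (WBE) together:

$$\mathrm{OBJ} = (1 - \mathrm{NSE}) + 5\,\left|\ln(1 + \mathrm{WBE})\right|^{2.5}, \tag{2a}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} \left(Q_{\mathrm{obs},i} - Q_{\mathrm{sim},i}\right)^2}{\sum_{i=1}^{N} \left(Q_{\mathrm{obs},i} - \bar{Q}_{\mathrm{obs}}\right)^2}, \tag{2b}$$

$$\mathrm{WBE} = \frac{\sum_{i=1}^{N} \left(Q_{\mathrm{sim},i} - Q_{\mathrm{obs},i}\right)}{\sum_{i=1}^{N} Q_{\mathrm{obs},i}}, \tag{2c}$$

where $Q_{\mathrm{obs}}$ and $Q_{\mathrm{sim}}$ are the observed and simulated flows respectively, $\bar{Q}_{\mathrm{obs}}$ is the arithmetic mean of $Q_{\mathrm{obs}}$, and $N$ is the total number of flow observations. The best parameter set for each study catchment was obtained from minimisation of the OBJ using the Monte Carlo simulations described below.
Hydrol. Earth Syst. Sci., 21, 5647-5661, 2017 www.hydrol-earth-syst-sci.net/21/5647/2017/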
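For readers who wish to reproduce the calibration criterion, the following Python sketch combines the NSE and the WBE in one score. It assumes the commonly cited form of the objective function used by Zhang et al. (2015), namely (1 - NSE) + 5 |ln(1 + WBE)|^2.5; this form and the function names are assumptions of the sketch and should be checked against the original reference.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency between observed and simulated flows."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def wbe(obs, sim):
    """Water balance error: relative bias of the total simulated volume."""
    return sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def obj(obs, sim):
    """Assumed objective: (1 - NSE) + 5 * |ln(1 + WBE)| ** 2.5 (lower is better).

    Undefined when the simulated volume is zero or negative (WBE <= -1).
    """
    return (1.0 - nse(obs, sim)) + 5.0 * abs(math.log(1.0 + wbe(obs, sim))) ** 2.5
```

A perfect simulation gives an OBJ of zero, while a simulation equal to the observed mean on every day gives an OBJ of one because its NSE is zero and its volume is unbiased.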
To determine sufficient runs for the random simulations, we calibrated GR4J parameters using the shuffled complex evolution (SCE) algorithm (Duan et al., 1992) for one catchment with moderate input-output consistency with the parameter range given in Table 2 by Demirel et al. (2013). Then, the total number of random simulations was iteratively determined by adjusting the number of runs until the minimum OBJ of the random simulations became adequately close to the OBJ value from the SCE algorithm. We found that approximately 20 000 runs could provide the minimum OBJ value equivalent to that from the SCE algorithm. Subsequently, GR4J was calibrated by 20 000 runs of the Monte Carlo simulations for all 45 catchments, and the parameter sets with the minimum OBJ values were taken for runoff predictions. In addition, we sorted the 20 000 parameter sets in terms of corresponding OBJ values in ascending order, and the first 50 sets (0.25 % of the total samples) were taken to measure the degree of equifinality. We measured the equifinality simply by the prediction area between 2.5 and 97.5 % boundaries of runoff simulations given by the collected 50 parameter sets. This prediction area was later compared to that from the FDC calibration under the same Monte Carlo framework. Note that we estimated the prediction area to comparatively evaluate the degree of equifinality between the hydrograph and the FDC calibrations under the same sampling size and the same acceptance rate for all the catchments. For more sophisticated and reliable uncertainty estimation other methods are available, such as the generalised likelihood uncertainty estimation (GLUE; Beven and Binley, 1992), the Bayesian total error analysis (BATEA; Kavetski et al., 2006), and the differential evolution adaptive Metropolis (DREAM; Vrugt and Ter Braak, 2011).
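The sampling and ranking procedure can be illustrated with a short Python sketch. It is not the implementation used in this study: a hypothetical one-parameter linear reservoir stands in for GR4J, the sum of squared errors stands in for OBJ, and the percentile bounds use linear interpolation between ranked values. All function names are assumptions.

```python
import random


def toy_model(params, rain):
    """Hypothetical linear reservoir standing in for GR4J."""
    (k,) = params
    store, flows = 0.0, []
    for depth in rain:
        store += depth
        outflow = k * store
        store -= outflow
        flows.append(outflow)
    return flows


def monte_carlo(rain, obs, bounds, n_runs, n_keep, seed=0):
    """Sample parameters uniformly, score each run, keep the n_keep best."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        params = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        sim = toy_model(params, rain)
        score = sum((o - s) ** 2 for o, s in zip(obs, sim))  # stand-in for OBJ
        runs.append((score, params, sim))
    runs.sort(key=lambda run: run[0])  # ascending: best first
    return runs[:n_keep]


def quantile(sorted_values, p):
    """Linearly interpolated quantile of an ascending list."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def prediction_area(simulations, lower=0.025, upper=0.975):
    """Sum over time of the width between the lower and upper bounds."""
    area = 0.0
    for values in zip(*simulations):
        ranked = sorted(values)
        area += quantile(ranked, upper) - quantile(ranked, lower)
    return area
```

With `n_runs=20000` and `n_keep=50` the acceptance rate matches the 0.25 % used here; the prediction area of the retained simulations then serves as the simple equifinality measure.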

Model calibration against the regional FDC for ungauged catchments
Each catchment was treated as ungauged for the comparative evaluation of RFDC_cal in the leave-one-out cross-validation (LOOCV) mode. For regionalising empirical FDC, the geostatistical method recently proposed by Pugliese et al. (2014) was used. Pugliese et al. (2014) employed the top-kriging method (Skøien et al., 2006) to spatially interpolate the total negative deviation (TND), which is defined as the area between the mean annual flow and below-average flows in a normalised FDC. The top-kriging weights that interpolate TND values were taken as weights to estimate flow quantiles of ungauged catchments from empirical FDC of surrounding gauged catchments. The FDC of an ungauged catchment in Pugliese et al. (2014) is estimated from normalised FDC of surrounding gauged catchments as

$$\hat{\Psi}(w_0, p) = \hat{\varphi}(w_0, p)\,\bar{Q}(w_0), \qquad \hat{\varphi}(w_0, p) = \sum_{i=1}^{n} \lambda_i\,\varphi_i(w_i, p), \tag{3}$$

where $\hat{\Psi}(w_0, p)$ is the estimated quantile flow (m³ s⁻¹) at an exceedance probability $p$ (unitless) for an ungauged catchment $w_0$, $\hat{\varphi}(w_0, p)$ is the estimated normalised quantile flow (unitless), $\bar{Q}(w_0)$ is the annual mean streamflow (m³ s⁻¹) of the ungauged catchment, $\varphi_i(w_i, p)$ and $\lambda_i$ are normalised quantile flows (unitless) and corresponding top-kriging weights (unitless) of gauged catchment $w_i$, respectively, and $n$ is the number of neighbouring gauged catchments. The unknown mean annual flow of an ungauged catchment, $\bar{Q}(w_0)$, can be estimated with a rescaled mean annual precipitation defined as

$$\mathrm{MAP}^{*} = 3.171 \times 10^{-5} \cdot \mathrm{MAP} \cdot A, \tag{4}$$

where MAP* is the rescaled mean annual precipitation (m³ s⁻¹), MAP is mean annual precipitation (mm yr⁻¹), $A$ is the area (km²) of the ungauged catchment, and the constant 3.171 × 10⁻⁵ converts the units of MAP* from mm yr⁻¹ km² to m³ s⁻¹. A distinct advantage of the geostatistical method is its ability to estimate all flow quantiles in an FDC with a single set of top-kriging weights. Since a parametric regional FDC (e.g. Yu et al., 2002; Mohamoud, 2008) is obtained from independent models for each flow quantile in many cases (for instance, by multiple regressions between selected quantile flows and catchment properties), fundamental characteristics in an FDC continuum would be entirely or partly lost. The geostatistical method, on the other hand, treats all flow quantiles as a single object; thereby, features in an FDC continuum can be preserved. It showed promising performance in reproducing empirical FDC using only topological proximity between catchments. More details on the geostatistical method can be found in Pugliese et al. (2014).
D. Kim et al.: Performance of hydrological modelling against the FDCs
For regionalising empirical FDC of the 45 catchments, we followed the same procedure as Pugliese et al. (2014). We obtained top-kriging weights (λ_i) by the geostatistical interpolation of TND values from observed FDC for the calibration period (2011-2015). Then, the top-kriging weights were used to interpolate empirical flow quantiles. The number of neighbours for the TND interpolation was iteratively determined as five, at which level additional neighbouring TND are unlikely to bring better agreement between the estimated and observed TND. In other words, normalised flow quantiles of five catchments surrounding the target ungauged catchment were interpolated with the top-kriging weights. Then, the interpolated normalised quantiles were multiplied by MAP* of the target ungauged catchment. We predicted flow quantiles at 103 exceedance probabilities (p of 0.001, 0.005, 99 points between 0.01 and 0.99 at an interval of 0.01, 0.995, and 0.999) for rainfall-runoff modelling against regional FDC (i.e. RFDC_cal).
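Once the top-kriging weights are available, the remaining steps are a weighted sum and a rescaling. The Python sketch below illustrates them under the assumption that the weights have already been computed elsewhere (the top-kriging interpolation itself is not implemented); the function names are hypothetical.

```python
def exceedance_probabilities():
    """The 103 exceedance probabilities used for the regional FDC."""
    middle = [round(0.01 * i, 2) for i in range(1, 100)]  # 0.01 ... 0.99
    return [0.001, 0.005] + middle + [0.995, 0.999]


def rescaled_map(map_mm_per_year, area_km2):
    """Rescaled mean annual precipitation in m3/s.

    1 mm/yr over 1 km2 equals 1000 m3 per year, i.e. about 3.171e-5 m3/s.
    """
    return 3.171e-5 * map_mm_per_year * area_km2


def regional_fdc(donor_normalised_fdcs, weights, map_mm_per_year, area_km2):
    """Weighted sum of donor normalised quantiles, rescaled by MAP*.

    `weights` are assumed to be precomputed top-kriging weights, one per donor.
    """
    scale = rescaled_map(map_mm_per_year, area_km2)
    n_quantiles = len(donor_normalised_fdcs[0])
    return [
        scale * sum(w * fdc[j] for w, fdc in zip(weights, donor_normalised_fdcs))
        for j in range(n_quantiles)
    ]
```

For example, a catchment of 100 km² with a mean annual precipitation of 1000 mm yr⁻¹ has a MAP* of about 3.17 m³ s⁻¹.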
For runoff prediction in ungauged catchments, the GR4J parameters were identified by the same Monte Carlo sampling but towards minimisation of OBJ value between the regional and the modelled flow quantiles at the 103 exceedance probabilities.The best parameter set, which provided the minimum OBJ value, was taken as the best behavioural set of RFDC_cal for each catchment.

Proximity-based parameter regionalisation for ungauged catchments
We selected the proximity-based parameter transfer (referred to as "PROX_reg" hereafter) to comparatively evaluate predictive performance of RFDC_cal. The parameter regionalisation has three classical categories: (a) proximity-based parameter transfer (i.e. PROX_reg; e.g. Oudin et al., 2008); (b) similarity-based parameter transfer (e.g. McIntyre et al., 2005); and (c) regression between parameters and physical properties of gauged catchments (e.g. Kim and Kaluarachchi, 2008). A comprehensive review on the parameter regionalisation in Parajka et al. (2013) reported that PROX_reg has competitive performance under humid climate with low-complexity models relative to the other categories. Based on modelling conditions in this study (semi-humid climate and four parameters), we chose PROX_reg to evaluate RFDC_cal.
To predict runoff at the 45 catchments in the LOOCV mode, we transferred the behavioural parameter sets obtained from the hydrograph calibration of the five donor catchments used for the FDC regionalisation.In other words, we used the same donor catchments for FDC regionalisation and PROX_reg.This allowed us to have consistency in transferring hydrological information from gauged to ungauged catchments between RFDC_cal and PROX_reg.Using the best behavioural parameter sets of the five donor catchments, we generated five runoff time series and took the arithmetic averages of them to represent runoff predictions by PROX_reg.
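The averaging step of PROX_reg can be sketched as follows in Python. The donor selection shown here ranks gauged catchments by Euclidean distance between centroids, which is a simplifying assumption for illustration only; in this study the donors were the five neighbours used for the FDC regionalisation. The function names are hypothetical.

```python
import math


def nearest_donors(target_xy, gauged_xy, k=5):
    """Names of the k gauged catchments closest to the target.

    Assumption: proximity is the Euclidean distance between centroids.
    """
    ranked = sorted(gauged_xy, key=lambda name: math.dist(target_xy, gauged_xy[name]))
    return ranked[:k]


def prox_reg_prediction(donor_simulations):
    """Arithmetic mean, day by day, of the runoff series simulated
    with each donor's best behavioural parameter set."""
    n_donors = len(donor_simulations)
    return [sum(day) / n_donors for day in zip(*donor_simulations)]
```

Each donor simulation is obtained by running the model for the target catchment with the forcing of the target and the parameters of the donor, so the averaging is applied to model outputs and not to the parameters themselves.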

Performance evaluation
We used multiple performance metrics to evaluate predictive performance of all modelling approaches applied in this study. Predictive performance of each modelling approach was graphically evaluated using box plots of the performance metrics of the 45 catchments. In addition, we performed several paired t tests to check the statistical significance of performance differences between the modelling approaches. What follows is the description of the performance metrics.
To measure high- and low-flow reproducibility, we chose two traditional performance metrics: (1) the NSE between observed and predicted flows (Eq. 2b) and (2) the NSE of log-transformed flows (LNSE) respectively. LNSE is calculated as

$$\mathrm{LNSE} = 1 - \frac{\sum_{i=1}^{N} \left(\ln Q_{\mathrm{obs},i} - \ln Q_{\mathrm{sim},i}\right)^2}{\sum_{i=1}^{N} \left(\ln Q_{\mathrm{obs},i} - \overline{\ln Q_{\mathrm{obs}}}\right)^2}. \tag{5}$$

Although NSE and LNSE are frequently used for performance evaluation, they may be sensitive to errors in flow observations (Westerberg et al., 2011). Hence, we additionally selected three typical flow metrics that embed dynamic flow variation in a compact manner: the runoff ratio ($R_{QP}$), the baseflow index ($I_{BF}$), and the rising limb density ($D_{RL}$). $R_{QP}$, $I_{BF}$, and $D_{RL}$ are proxies of aridity and water-holding capacity, contribution of the baseflow to flow variations, and flashiness of catchment behaviours, respectively. They are defined as the ratio of runoff to precipitation, the ratio of baseflow to total runoff, and the inverse of average time to peak (d⁻¹) as

$$R_{QP} = \frac{\bar{Q}}{\bar{P}}, \tag{6a}$$

$$I_{BF} = \frac{\sum_{t} Q_{B,t}}{\sum_{t} Q_{t}}, \tag{6b}$$

$$D_{RL} = \frac{N_{RL}}{T_{R}}, \tag{6c}$$

where $\bar{Q}$ and $\bar{P}$ are average flow and precipitation for a given period (mm d⁻¹), $Q_t$ and $Q_{B,t}$ (m d⁻¹) are the streamflow and the base flow at time $t$ respectively, $N_{RL}$ is the number of rising limbs, and $T_R$ is the total amount of time when the hydrograph is rising (days). $Q_{B,t}$ can be calculated by subtracting direct flow $Q_{D,t}$ from $Q_t$ as

$$Q_{D,t} = c\,Q_{D,t-1} + \frac{1 + c}{2}\left(Q_t - Q_{t-1}\right), \qquad Q_{B,t} = Q_t - Q_{D,t}, \tag{7}$$

where $c$ is the filter parameter, which was set to 0.925 (Brooks et al., 2011; Eckhardt, 2007). Flow signature reproducibility of RFDC_cal and PROX_reg was evaluated by the relative absolute bias between modelled and observed signatures as

$$D_{FS} = \frac{\left|FS_{\mathrm{sim}} - FS_{\mathrm{obs}}\right|}{FS_{\mathrm{obs}}}, \tag{8}$$

where $D_{FS}$ is the relative absolute bias, $FS_{\mathrm{sim}}$ is a flow signature of the modelled flows, and $FS_{\mathrm{obs}}$ is that of the observed flows.
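The three flow signatures and the relative absolute bias can be computed as in the following Python sketch. It assumes a one-parameter recursive digital filter for the baseflow separation, zero direct flow on the first day, and direct flow constrained between zero and the total flow; these choices and the function names are assumptions of the sketch and may differ from the processing used in this study.

```python
def runoff_ratio(flow, precip):
    """Ratio of mean flow to mean precipitation (same units and period)."""
    return sum(flow) / sum(precip)


def baseflow(flow, c=0.925):
    """Baseflow from a one-parameter recursive digital filter.

    Assumptions: direct flow is zero on the first day and is
    constrained to the range [0, total flow] on every day.
    """
    direct_prev, base = 0.0, [flow[0]]
    for t in range(1, len(flow)):
        direct = c * direct_prev + 0.5 * (1.0 + c) * (flow[t] - flow[t - 1])
        direct = min(max(direct, 0.0), flow[t])
        base.append(flow[t] - direct)
        direct_prev = direct
    return base


def baseflow_index(flow, c=0.925):
    """Ratio of total baseflow to total flow."""
    return sum(baseflow(flow, c)) / sum(flow)


def rising_limb_density(flow):
    """Number of rising limbs divided by the number of rising time steps."""
    n_limbs, rising_steps, rising = 0, 0, False
    for previous, current in zip(flow, flow[1:]):
        if current > previous:
            rising_steps += 1
            if not rising:
                n_limbs += 1  # a new limb starts after a non-rising step
            rising = True
        else:
            rising = False
    return n_limbs / rising_steps if rising_steps else 0.0


def relative_absolute_bias(signature_sim, signature_obs):
    """Relative absolute bias between modelled and observed signatures."""
    return abs(signature_sim - signature_obs) / abs(signature_obs)
```

A flashy hydrograph with many short rises yields a rising limb density close to one per day, whereas a hydrograph with a few long rises yields a small value.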

Hydrograph calibration and FDC regionalisation in gauged catchments
Figure 3a displays results of the parameter identification against the observed hydrographs (i.e. the hydrograph calibration). The 45 catchments had mean NSE and LNSE of 0.66 and 0.65 between the simulated and observed flows for the calibration period, respectively. The average NSE reduction from the calibration to the validation periods was 0.06 with a standard deviation of 0.10. The temporal transfer of the calibrated parameters did not decrease the mean LNSE value, while a wider LNSE range indicates that uncertainty of low-flow predictions may increase when temporally transferring the calibrated parameters. The predictive performance was closely related to the input-output consistency (Fig. 3b), which was measured by the Pearson correlation coefficient between the CPI and the observed flows. A low input-output consistency implies that the rainfall-runoff data may include significant epistemic errors such as minimal flow responses to heavy rainfall or excessive response to tiny rainfall. If the model calibration compensates for disinformation from such errors, the parameters would be forced to have biases. Figure 3b shows that consistency in input-output data is a critical factor affecting parameter identification and thus performance. Perhaps screening catchments with low input-output consistency would provide better predictions in ungauged catchments. However, we did not consider it in the LOOCV for RFDC_cal and PROX_reg, since variation in input-output consistency would be a common situation. Rather, reducing the number of gauged catchments lowers spatial proximity, and thus can cause biases for ungauged catchments too. Overall, 27 catchments and 33 catchments showed NSE and LNSE values greater than 0.6. We assumed that the hydrograph calibration under the Monte Carlo framework, which was assisted by the SCE optimisation, was able to acceptably identify the behavioural parameters under given data quality.
Figure 4 illustrates the 1 : 1 scatter plot between the observed and predicted flow quantiles of all the catchments, indicating high applicability of the top-kriging FDC regionalisation.The overall NSE and LNSE values between the observed and regionalised flow quantiles show good applicability of the geostatistical method.The NSE and LNSE values for individual catchments have averages of 0.83 and 0.91 with standard deviations of 0.25 and 0.11, respectively, implying that low-flow predictions were slightly better.The performance of the geostatistical method was relatively poor at locations where gauging density is low.Catchments 4, 10, 35, and 36, which recorded 0.6 or less NSE, are limitedly hatched with or adjacent to the other catchments; nonetheless, LNSE values of those catchments were still greater than  2016) that performance of the geostatistical method was sensitive to river gauging density.Transferring flow quantiles from remote catchments may not sufficiently capture functional similarity between donor and receiver catchments.In spite of the minor shortcomings, the geostatistical FDC regionalisation was deemed acceptable based on the high NSE and LNSE of flow quantiles.Topological proximity was generally a good predictor of flow quantiles for the study catchments.

Comparing hydrograph predictability between RFDC_cal and PROX_reg
Figure 5 compares the box plots of NSE and LNSE values between RFDC_cal and PROX_reg. PROX_reg generally outperforms RFDC_cal in predicting both high and low flows, suggesting that transferring parameters identified by observed hydrographs would be a better choice than a local calibration against predicted FDC. The differences between NSE values of PROX_reg and RFDC_cal have an average of 0.22 with a standard deviation of 0.34. Only eight catchments showed higher NSE with RFDC_cal. These higher NSE values of PROX_reg imply that PROX_reg is preferable when high-flow predictability is needed such as for flood analyses.
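The reported mean and standard deviation of the NSE differences give a rough sense of the paired t statistic. The Python sketch below is a back-of-the-envelope check only, assuming that 0.34 is the sample standard deviation of the 45 paired differences; the function name is hypothetical and the result is not a value reported in this study.

```python
import math


def paired_t_statistic(mean_difference, sd_difference, n_pairs):
    """t statistic of a paired t test from summary statistics."""
    return mean_difference / (sd_difference / math.sqrt(n_pairs))


# Summary statistics quoted in the text for NSE (PROX_reg minus RFDC_cal).
t_nse = paired_t_statistic(0.22, 0.34, 45)
```

Under these assumptions the statistic is about 4.3 with 44 degrees of freedom, well above the two-sided 5 % critical value of roughly 2.0, which is consistent with a significant difference in favour of PROX_reg.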
In the case of LNSE, PROX_reg still had a higher median than RFDC_cal (0.53 and 0.62 for RFDC_cal and PROX_reg respectively).In 25 catchments, PROX_reg provided LNSE values greater than those of RFDC_cal.The low performance of RFDC_cal was also found in the comparative assessment of Zhang et al. (2015), which evaluated RFDC_cal for 228 Australian catchments using the same GR4J model.Zhang et al. (2015) found that RFDC_cal was inferior to PROX_reg in the Australian catchments, because the FDC calibration poorly reproduced temporal flow variations relative to the hydrograph calibration.This study confirms the difficulty of capturing dynamic catchment behaviours with FDC containing no flow timing information.
A major weakness of RFDC_cal is the absence of flow timing information in the parameter calibration process. Unlike RFDC_cal, PROX_reg did not discard the flow timing information. The regionalised parameters may be able to implicitly transfer the flow timing information from gauged to ungauged catchments (this hypothesis will be discussed in Sect. 4.4). Figure 6 illustrates how the absence of flow timing negatively influences predictive performance. For this comparison, the parameters were recalibrated against the observed FDC (not regional FDC) under the same Monte Carlo method to discard errors introduced by the FDC regionalisation (i.e. equivalent to calibration against perfectly regionalised FDC). The parameters identified by the observed hydrograph (Fig. 6a) brought a good predictability in both high and low flows, resulting in an excellent performance to reproduce the FDC. On the other hand, an excellent FDC reproducibility does not guarantee a good predictability in high flows (Fig. 6b). This indicates that reproducing FDC with rainfall-runoff models would be less able than the hydrograph calibration to capture functional catchment responses.
In addition, Fig. 6 shows that the prediction area of the 50 behavioural parameters from the Monte Carlo simulations (indicated by the grey areas and the blue arrows) became much larger when using the FDC calibration instead of the hydrograph calibration. We calculated the ratio of the prediction area of the FDC calibration to that of the hydrograph calibration, and refer to this as the equifinality ratio. It quantifies the degree of equifinality augmented by replacing the hydrograph calibration with the FDC calibration. Figure 7 displays the scatter plot between the equifinality ratio and the input-output consistency. The equifinality augmented by the loss of flow timing is likely to increase as the input-output consistency decreases. The average of the equifinality ratios was 1.96, implying that potential equifinality inherent in RFDC_cal could be substantial. This may suggest that the equifinality problem embedded in RFDC_cal could be more significant than that in PROX_reg.

Comparing flow signature predictability between RFDC_cal and PROX_reg
RFDC_cal showed strong performance in reproducing I_BF relative to PROX_reg (Fig. 8). This result can be explained by considering that baseflow has fewer temporal variations than direct runoff in the South Korean catchments under typical monsoonal climate. High seasonality of monsoonal precipitation causes high temporal variations in direct runoff during June to September, while relatively steady baseflow is dominant during dry seasons (October to May). In Namgang Dam (whose flow variation is displayed in Fig. 6), for example, the coefficient of variation (CV) of direct runoff was 5.86 for 2007-2015, which is approximately 3.5 times as high as the CV of the baseflow.
On the other hand, RFDC_cal was less able to reproduce D_RL than PROX_reg. This highlights the weakness of RFDC_cal in which only flow magnitudes were used for identifying model parameters. In PROX_reg, by contrast, flow timing information gained from the observed hydrographs could be preserved, even after behavioural parameters were transferred to ungauged catchments. Overall, PROX_reg seems to be better than RFDC_cal at predicting the three flow signatures together.
The box plots in Fig. 9 provide an indication that D_RL is likely to supplement the FDC calibration and thus improve RFDC_cal. From the collection of 50 behavioural parameter sets given by the FDC calibration, we chose the parameter set providing the lowest bias for each flow signature as the best behavioural sets, and simulated runoff again for all catchments. The high-flow predictability was fairly improved by additional constraining with D_RL, suggesting that flow metrics associated with flow timing make up for the weakness of the FDC calibration. Additional constraining with R_QP and I_BF did not bring appreciable improvement in the FDC calibration. However, PROX_reg was still better than the additional constraining with D_RL, indicating that a further study is needed for better constraining rainfall-runoff models using FDC together with additional flow metrics.
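The additional constraining step can be sketched as below. The sketch assumes the rising limb density is the number of rising limbs divided by the number of time steps with rising flow, a common definition that may differ in detail from the one used in this study, and it assumes that "lowest bias" means the smallest absolute relative bias. The function names and flow values are hypothetical.

```python
def rising_limb_density(flows):
    """Number of rising limbs divided by the number of time steps with rising flow."""
    rising_steps = 0
    limbs = 0
    was_rising = False
    for previous, current in zip(flows, flows[1:]):
        is_rising = current > previous
        if is_rising:
            rising_steps += 1
            if not was_rising:
                limbs += 1  # a new rising limb starts at this step
        was_rising = is_rising
    return limbs / rising_steps if rising_steps else 0.0


def select_by_signature(simulations, observed, signature):
    """Index of the simulation whose signature has the smallest absolute relative bias.

    Assumes the observed signature is non-zero.
    """
    target = signature(observed)
    biases = [abs(signature(sim) - target) / target for sim in simulations]
    return biases.index(min(biases))


# Hypothetical observed flows with two separate flood events.
observed = [1, 3, 6, 4, 2, 5, 3, 2, 1]
# Two hypothetical simulations with similar magnitudes but different event timing.
simulations = [
    [1, 2, 3, 4, 6, 5, 3, 2, 1],  # one long rising limb
    [1, 4, 6, 3, 2, 4, 3, 2, 1],  # two rising limbs, as observed
]

best_index = select_by_signature(simulations, observed, rising_limb_density)
```

Here the second simulation is selected because it reproduces the number and duration of rising limbs, which is information an FDC alone cannot supply.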

Paired t tests between the modelling approaches
For comparative evaluation in this study, we produced several runoff prediction sets using multiple rainfall modelling approaches. First, we calibrated GR4J against the observed hydrographs (referred to as Q_cal), and transferred the behavioural parameters to ungauged catchments in the LOOCV mode (PROX_reg). We constrained GR4J with the regional FDC (RFDC_cal). To evaluate equifinality, we recalibrated the GR4J parameters against the observed FDC (referred to as FDC_cal). Additionally, we constrained the model with observed FDC plus the flow signatures, and significant performance improvement was found with D_RL (referred to as FDC + DRL_cal). A paired t test using the performance metrics (NSE, LNSE, or D_FS) between these modelling approaches can answer various questions beyond the graphical evaluations with box plots. For paired t tests, we added one more case of transferring parameters gained from FDC_cal to ungauged catchments (referred to as FPROX_reg). FPROX_reg transfers behavioural parameters with no flow timing information from gauged to ungauged catchments. The mean NSE of FPROX_reg was 0.44 with a standard deviation of 0.49.
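The leave-one-out transfer behind PROX_reg and FPROX_reg can be sketched as follows. This is a simplified illustration that borrows the parameters of the single nearest donor by straight-line distance; the distance measure, the number of donors, the catchment names, the coordinates and the parameter values are all assumptions and are not taken from this study.

```python
import math


def nearest_donor(target, catchments):
    """Name of the gauged catchment closest to `target`, excluding the target itself."""
    tx, ty = catchments[target]["xy"]
    donors = (name for name in catchments if name != target)
    return min(
        donors,
        key=lambda n: math.hypot(catchments[n]["xy"][0] - tx, catchments[n]["xy"][1] - ty),
    )


def loocv_transfer(catchments):
    """Treat each catchment as ungauged in turn and borrow its nearest donor's parameters."""
    return {
        name: catchments[nearest_donor(name, catchments)]["params"]
        for name in catchments
    }


# Hypothetical coordinates (km) and four GR4J-like parameters (x1..x4) per catchment.
catchments = {
    "A": {"xy": (0.0, 0.0), "params": (350.0, 0.5, 90.0, 1.7)},
    "B": {"xy": (10.0, 0.0), "params": (420.0, -0.2, 70.0, 1.4)},
    "C": {"xy": (50.0, 40.0), "params": (280.0, 1.1, 110.0, 2.0)},
}

transferred = loocv_transfer(catchments)
```

Under this scheme the only difference between PROX_reg and FPROX_reg would be where the donor parameters come from: the hydrograph calibration (Q_cal) in the former and the FDC calibration (FDC_cal) in the latter.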
A primary hypothesis of this study was that RFDC_cal could outperform PROX_reg. This question can be addressed by looking at the NSE differences between RFDC_cal and PROX_reg. The mean NSE difference between them was −0.22 and the standard error was 0.051, indicating that the NSE differences were significantly less than zero at a 95 % confidence level. The paired t test did not lend support to the hypothesis (i.e. PROX_reg outperformed RFDC_cal significantly). However, we can assume that D_RL can improve the predictive performance of FDC_cal. The mean NSE difference between FDC + DRL_cal and FDC_cal was 0.12 and the standard error was 0.025, confirming the significance at a 95 % confidence level.
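The arithmetic behind these tests can be sketched as below. The t statistic of a paired test is the mean difference divided by its standard error, so the reported values imply t statistics of about −4.3 and 4.8. Assuming all 45 catchments entered the test (44 degrees of freedom, which is an assumption here), the two-sided 5 % critical value is roughly 2.02, and both statistics exceed it in magnitude. The function `paired_t` and the five-catchment NSE values are hypothetical and only show how the mean difference and standard error are obtained from paired samples.

```python
import math
import statistics


def paired_t(metric_a, metric_b):
    """Return (mean difference, standard error, t statistic) for paired samples."""
    differences = [a - b for a, b in zip(metric_a, metric_b)]
    mean_diff = statistics.mean(differences)
    std_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return mean_diff, std_error, mean_diff / std_error


# t statistics implied by the reported mean differences and standard errors.
t_rfdc_vs_prox = -0.22 / 0.051  # RFDC_cal minus PROX_reg
t_drl_vs_fdc = 0.12 / 0.025     # FDC + DRL_cal minus FDC_cal

# Hypothetical NSE values for five catchments, for illustration only.
nse_a = [0.60, 0.55, 0.70, 0.40, 0.65]
nse_b = [0.50, 0.50, 0.60, 0.35, 0.50]
mean_diff, std_error, t_stat = paired_t(nse_a, nse_b)
```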
Likewise, we tested several questions relevant to rainfall-runoff modelling in ungauged catchments using different combinations. In Table 3, we summarise the results of paired t tests for scientific questions that may arise from this study. One interesting question is, "Did the behavioural parameters from Q_cal contain flow timing information for ungauged catchments?" We addressed this question by comparing PROX_reg and FPROX_reg with a hypothesis that predictability in ungauged catchments would decrease if the regionalised parameters were gained only from flow magnitudes. FPROX_reg uses FDC_cal for searching behavioural parameters at gauged catchments; thus, it cannot transfer flow timing information to ungauged catchments through the behavioural parameters. The mean NSE difference between PROX_reg and FPROX_reg was 0.10, and the standard error was 0.031. The NSE differences were significantly greater than zero. The behavioural parameters from Q_cal were likely to have flow timing information affecting predictability in ungauged catchments.

Discussion and conclusions

RFDC_cal for rainfall-runoff modelling in ungauged catchments
The use of regional FDC as a single calibration criterion appears to be a good choice for searching behavioural parameters in ungauged sites. As discussed earlier, the FDC is a compact representation of runoff variability at all timescales, and thus able to embed multiple hydrological features in catchment dynamics (Blöschl et al., 2013). A pilot study of Yokoo and Sivapalan (2011) discovered that the upper part of an FDC is controlled by interaction between extreme rainfall and fast runoff, while the lower part is governed by baseflow recession behaviour during dry periods. The middle part connecting the upper and the lower parts is related to the mean within-year flow variations, which is controlled by interactions between water availability, energy, and water storage (Yaeger et al., 2012; Yokoo and Sivapalan, 2011). It is well documented that hydro-climatological processes within a catchment are reflected in the FDC (e.g. Cheng et al., 2012; Ye et al., 2012; Coopersmith et al., 2012; Yaeger et al., 2012; Botter et al., 2008), and therefore the model parameters identified solely by a regional FDC are expected to provide reliable predictions in ungauged catchments (e.g. Westerberg et al., 2014; Yu and Yang, 2000).
The comparative evaluation in this study provides another expected result, that the FDC calibration is able to reproduce the FDC itself, but it insufficiently captures functional responses of catchments due to the absence of flow timing information. A hydrograph is the most complete flow signature embedding numerous processes interacting within a catchment (Blöschl et al., 2013), being more informative than an FDC. Since any simplification of a hydrograph, including the FDC, loses some amount of flow information, it is no surprise that the FDC calibration worsens the equifinality. This study emphasises that the absence of flow timing in RFDC_cal may cause larger prediction errors than regionalised parameters gained from observed hydrographs. The paired t test between PROX_reg and FPROX_reg highlights that regionalised parameters gained from observed hydrographs were likely to contain intangible flow timing information even for ungauged catchments. The flow timing information implicitly transferred to ungauged catchments is a major difference between PROX_reg and RFDC_cal. The errors introduced by the FDC regionalisation were not significant due to the high performance of the geostatistical method in this study.
Because the hydrograph calibration can compensate for the errors in input-output data, one may convert the hydrograph into the FDC to avoid effects of disinformation on rainfall-runoff modelling. However, in this case, the loss of valuable flow timing information must be weighed as a trade-off. For RFDC_cal in this study, we began with converting the observed hydrographs into the flow quantiles to regionalise them; thus, the flow timing information was initially lost. As shown, the performance of RFDC_cal was generally lower than that of PROX_reg. Therefore, when condensing observed hydrographs into flow signatures, preserving all available flow information in the hydrograph would be key for successful rainfall-runoff modelling. This study shows that using only regionalised FDC could lead to less reliable rainfall-runoff modelling in ungauged catchments than regionalised parameters. An FDC is unlikely to preserve all flow information in a hydrograph necessary for rainfall-runoff modelling.

Suggestions for improving RFDC_cal
Westerberg et al. (2014) suggested the necessity of further constraining to reduce predictive uncertainty in RFDC_cal. This study found that RFDC_cal could provide comparable performance to regenerate the flow signatures within which only flow magnitudes are involved (i.e. R_QP and I_BF). To supplement regional FDC, flow signatures associated with flow timing seem to be essential. Figure 9 shows the potential of additional constraining with D_RL, and Q2 in Table 3 confirms it. Other flow signatures in temporal dimensions, such as the high- and the low-flow event durations in Westerberg and McMillan (2015), can be candidates to improve RFDC_cal. However, uncertainty in those flow signatures will be a challenge when it comes to building regional models for ungauged catchments (Westerberg et al., 2016). An alternative method of RFDC_cal is to directly regionalise hydrographs to ungauged catchments (e.g. Viglione et al., 2013). In data-rich regions, topological proximity could better capture spatial variation of daily flows than rainfall-runoff modelling with regionalised parameters (Viglione et al., 2013). Although a dynamic model may be required for regionalising observed daily flows at an expensive computational cost, flow timing information would be contained in regionalised hydrographs. The parameter identification against the regional hydrographs may become a better approach than RFDC_cal and/or other signature-based calibrations.

Limitations and future research directions
There are caveats in our comparative evaluation. First, uncertainty in input-output data was not considered in our assessment. McMillan et al. (2012) reported typical ranges of relative errors in discharge data as 10-20 % for medium to high flow and 50-100 % for low flows. We assumed that quality of the discharge data was adequate. However, other methods objectively considering uncertainty could better estimate model performance and the equifinality (e.g. Westerberg et al., 2011, 2014). Second, we used a conceptual runoff model with a fixed structure for all the catchments. Uncertainty from the model structure would vary across the study catchments; nevertheless, the structural uncertainty was not measured here. Our comparative assessment was based on the basic premise that modelling conditions should be fixed for all study catchments. Third, we compared RFDC_cal and PROX_reg in a region with sufficient data lengths and quality at gauged catchments. The lessons from this study may not extend to ungauged catchments under poor data availability. Finally, though the proximity-based parameter regionalisation was good for the South Korean catchments, comparison between RFDC_cal and other regionalisation methods, such as the regional calibration and the similarity-based parameter transfer, may provide beneficial information for rainfall-runoff modelling in ungauged catchments. Comparative assessment between RFDC_cal and other parameter regionalisation using more sample catchments under diverse climates will provide more meaningful lessons.
We can no longer hypothesise that the parameters gained against regionalised FDC would perform sufficiently, because an FDC contains less information than a hydrograph (i.e. the absence of flow timing). For improving RFDC_cal, we suggest supplementing RFDC_cal with flow signatures in temporal dimensions. Then, the question of how to make flow signatures more informative than (or equally informative to) hydrographs should be addressed. This may be impossible only using flow signatures originating from hydrographs (e.g. mean annual flow, baseflow index, recession rates, FDC). Combinations of those signatures are unlikely to be more informative than their origins (i.e. hydrographs), though it depends on how much disinformation is present in the observed flows. Future research topics could include finding new signatures that supplement hydrographs, and how to combine them with existing flow signatures for rainfall-runoff modelling in ungauged catchments.

Conclusions
While rainfall-runoff modelling against regional FDC appeared a good approach for prediction in ungauged catchments, this study highlights its weakness in the absence of flow timing information, which may cause poorer predictive performance than the simple proximity-based parameter regionalisation. The following conclusions are worth emphasising.
For ungauged catchments in South Korea, where spatial proximity well captured functional similarity between gauged catchments, the model calibration against regional FDC is unlikely to outperform the conventional proximity-based parameter transfer for daily runoff prediction. The absence of flow timing information in regional FDC seems to cause a substantial equifinality problem in the parameter identification process and thus lower predictability.
The model parameters gained from observed hydrographs contain flow timing information even for ungauged catchments. This intangible flow timing information is inevitably discarded if one calibrates a rainfall-runoff model against regional FDC. This information loss may reduce predictability in ungauged catchments significantly.
To improve the calibration against regional FDC, flow metrics in temporal dimensions, such as the rising limb density, need to be included as additional constraints. As an alternative approach, if river gauging density is high, regionalised hydrographs preserving flow timing information can be used for local calibrations at ungauged catchments.
For better prediction in ungauged catchments, it is necessary to find new flow signatures that can supplement the observed hydrographs. How to combine them with existing information will be a future research topic for rainfall-runoff modelling in ungauged catchments.

Hydrol. Earth Syst. Sci., 21, 5647-5661, 2017, www.hydrol-earth-syst-sci.net/21/5647/2017/

Figure 1. Locations of the study catchments in South Korea. The numbers are labelled at the outlet of each catchment.

Figure 3. (a) Box plots of high flow (NSE) and low flow (LNSE) reproducibility of the behavioural parameters obtained from the hydrograph calibration at the 45 catchments. (b) The relationship between the input-output consistency and the model performance. The straight lines in the box plots connect the performance metrics for the calibration (2011-2015) and the validation periods (2007-2010) in each catchment.

Figure 4. 1:1 scatter plot between the empirical flow quantiles and the flow quantiles predicted by the top-kriging FDC regionalisation method.

Figure 5. Box plots of NSE and LNSE values between the observed and the predicted hydrographs by RFDC_cal and PROX_reg for the 45 catchments under the cross-validation mode.

Figure 6. The observed and predicted hydrographs, the prediction areas, and the observed and predicted FDC given by (a) the hydrograph calibration and (b) the FDC calibration for Namgang Dam (Catchment 2 in Fig. 1).

Figure 7. The input-output consistency vs. equifinality increased by replacing the hydrograph calibration with the FDC calibration. The equifinality ratio is defined as the ratio between the prediction areas of the 50 behavioural parameters gained from the FDC calibration and the hydrograph calibration.

Figure 8 summarises the performance of RFDC_cal and PROX_reg to regenerate the three flow signatures of R_QP, I_BF, and D_RL. RFDC_cal is competitive in reproducing the average-based signatures R_QP and I_BF, while it showed relatively weak ability to regenerate the event-based signature D_RL. R_QP and I_BF are flow metrics based on averages of long-term flow and precipitation in which no flow timing information is involved. In particular, RFDC_cal showed strong performance in reproducing I_BF relative to PROX_reg.

Figure 8. Flow signature reproducibility comparison between RFDC_cal and PROX_reg in terms of R_QP (a), I_BF (b), and D_RL (c).

Figure 9. Predictive performance of the FDC calibrations additionally conditioned by R_QP (FDC + RQP), I_BF (FDC + IBF), and D_RL (FDC + DRL) in comparison to the other modelling approaches. Q_cal and FDC_cal refer to the hydrograph and the FDC calibration in gauged catchments respectively. Thirty-eight catchments with positive NSE for all the modelling approaches were used in the box plots.

Table 1. Summary of hydrological features of the study catchments.
a Ratio of potential ET to total precipitation. b Percentage of snowfall to total precipitation. Climatological features were calculated using spatial averages of the grid data, while the flow metrics were from the daily hydrographs for 2007-2015 as explained in Sect. 3.6.

Table 3. Results of the paired t tests for potential questions on rainfall-runoff modelling in ungauged catchments.
a Performance metric (PM) used for t test. b Mean PM difference between the corresponding pair. c Standard error of PM. * PM is significantly different from zero. The significance was evaluated at 95 % confidence levels.