Calibration approaches for distributed hydrologic models in poorly gaged basins: implication for streamflow projections under climate change

This study tests the performance and uncertainty of calibration strategies for a spatially distributed hydrologic model in order to improve model simulation accuracy and understand prediction uncertainty at interior ungaged sites of a sparsely gaged watershed. The study is conducted using a distributed version of the HYMOD hydrologic model (HYMOD_DS) applied to the Kabul River basin. Several calibration experiments are conducted to understand the benefits and costs associated with different calibration choices, including (1) whether multisite gaged data should be used simultaneously or in a stepwise manner during model fitting, (2) the effects of increasing parameter complexity, and (3) the potential to estimate interior watershed flows using only gaged data at the basin outlet. The implications of the different calibration strategies are considered in the context of hydrologic projections under climate change. To address the research questions, high-performance computing is utilized to manage the computational burden that results from high-dimensional optimization problems. Several interesting results emerge from the study. The simultaneous use of multisite data is shown to improve the calibration over a stepwise approach, and both multisite approaches far exceed a calibration based on only the basin outlet. The basin outlet calibration can lead to projections of mid-21st century streamflow that deviate substantially from projections under multisite calibration strategies, supporting the use of caution when using distributed models in data-scarce regions for climate change impact assessments. Surprisingly, increased parameter complexity does not substantially increase the uncertainty in streamflow projections, even though parameter equifinality does emerge. The results suggest that increased (excessive) parameter complexity does not always lead to increased predictive uncertainty if structural uncertainties are present. 
The largest uncertainty in future streamflow results from variations in projected climate between climate models, which substantially outweighs the calibration uncertainty.


Introduction
In an effort to advance hydrologic modeling and forecasting capabilities, the development and implementation of physically based, spatially distributed hydrologic models have proliferated in the hydrologic literature, supported by readily available geographic information system (GIS) data and rapidly increasing computational power. Distributed hydrologic models can account for spatially variable physiographic properties and meteorological forcing (Beven, 2012), improving simulations compared to conceptual, lumped models for basins where spatial rainfall variability effects are significant (Ajami et al., 2004; Koren et al., 2004; Reed et al., 2004; Khakbaz et al., 2012; Smith et al., 2012) and for nested basins (Bandaragoda et al., 2004; Brath et al., 2004; Koren et al., 2004; Safari et al., 2012; Smith et al., 2012). The benefits of distributed modeling have been recognized by the U.S. National Oceanic and Atmospheric Administration's National Weather Service (NOAA/NWS) and demonstrated in the Distributed Model Intercomparison Project (DMIP) (Reed et al., 2004; Smith et al., 2004, 2012, 2013). Importantly, distributed hydrologic models can evaluate hydrological response at interior ungaged sites, a benefit not afforded by lumped models. The use of distributed hydrologic modeling for interior point streamflow estimation is particularly relevant for poorly gaged river basins in developing countries, where reliable predictions at interior sites are often required to inform water infrastructure investments. As international development agencies begin to integrate climate change considerations into their decision-making processes (e.g., Yu et al., 2013), these investments need to be robust under both current climate conditions and possible future climate regimes.
Despite their roots in physical realism, distributed hydrologic models can suffer from substantial uncertainty. A major source of uncertainty originates from the proper identification of parameter values that vary across the watershed, especially when observed streamflow data are only available at one or a few points (Exbrayat et al., 2014). Parameters can be discretized across the watershed in several ways (Flugel, 1995; Efstratiadis et al., 2008; Khakbaz et al., 2012): uniquely for each grid cell or hydrologic response unit (fully distributed), based on sub-basins whose boundaries do not necessarily ensure homogeneous characteristics (semi-distributed) or, in the simplest case, a single parameter set for all model grid cells (lumped). With limited data, the parameter identification problem, particularly for the fully distributed case, can be impractical or infeasible (Beven, 2001). The parameterization challenge has spurred substantial advances in understanding appropriate calibration techniques for distributed hydrologic models. Many studies have attempted to reduce the dimensionality of the calibration problem to alleviate the issue of equifinality (Beven and Freer, 2001), which is the phenomenon whereby multiple parameter sets produce indistinguishable model performance. This work has found favorable results when the parametric complexity of the distributed model is aligned with the data available for calibration (Leavesley et al., 2003; Ajami et al., 2004; Eckhardt et al., 2005; Frances et al., 2007; Zhu and Lettenmaier, 2007; Cole and Moore, 2008; Pokhrel and Gupta, 2010; Khakbaz et al., 2012). There has also been extensive research exploring the use of multiple objectives and different operational procedures to understand parameter estimation tradeoffs and identifiability for distributed model calibration, with great success (Madsen, 2003; Efstratiadis and Koutsoyiannis, 2010; Li et al., 2010; Kumar et al., 2013).
Despite these advances, important questions persist. It remains difficult to compare the uncertainty that emerges from different operational calibration procedures for multisite applications (i.e., whether gages in series should be used sequentially or simultaneously for calibration) and under different levels of parametric complexity. Due to the computational burden required to calibrate distributed models, this uncertainty is problematic to explore. Furthermore, in poorly gaged basins, it is challenging to quantify the lost accuracy and increased uncertainty for interior flow estimation when a distributed model is calibrated only at an outlet gage (which is often all that is available in developing-country river basins). In the case of significant spatial variability in the basin properties that influence runoff generation (e.g., permeability, vegetation, and slope), accurate runoff predictions are unlikely at interior locations based only on the lumped information obtained at the basin outlet (Anderson et al., 2001; Cao et al., 2006; Breuer et al., 2009; Lerat et al., 2012; Smith et al., 2012; Wang et al., 2012). The extent of this error and uncertainty is not well understood for heterogeneous basins due to the computational expense required to explore this issue. Finally, the implications of these calibration issues have rarely been examined explicitly for possible future climate conditions, as is required in climate change impact studies. This question has been explored for lumped, conceptual models (Wilby, 2005; Steinschneider et al., 2012), but has been difficult to evaluate for computationally expensive distributed models.
This study addresses the above research challenges by focusing on the following four questions: (1) how does the calibration procedure for using multisite data affect the accuracy and uncertainty of distributed models used for streamflow predictions at ungaged sites; (2) what effects does increased parameter complexity have on distributed model calibration and prediction; (3) how much degradation in model accuracy and uncertainty can be expected for interior flow estimation based on a calibration procedure using only the basin outlet; and (4) how do different calibration formulations for a distributed model alter projections of streamflow at ungaged sites under climate change conditions? These questions are considered in an application of a distributed version of the daily HYMOD hydrologic model to the Kabul River basin in Afghanistan and Pakistan. To address these research questions, high-performance computing is utilized to manage the computational burden that often hinders such explorations (Laloy and Vrugt, 2012; Zhang et al., 2013).

Study area
The Kabul River basin (67 370 km²) is a plateau surrounded by mountains located in the eastern central part of Afghanistan (Fig. 1). It is the most important river basin of Afghanistan, containing 35 % of the country's population. While it encompasses just 12 % of the area of Afghanistan, the basin's average annual streamflow (about 24 billion cubic meters) is about 26 % of the country's total streamflow volume (World Bank, 2010).
Water resources from the basin are shared by Afghanistan and Pakistan and serve as a water supply source for more than 20 million people. The shared use of transboundary water between these two countries is central in establishing regional water resources development for this area (Ahmad, 2010). It is crucial to develop tools that can support engineering plans for existing and potential water infrastructure to take full advantage of the water resources in the basin. The government [...]
While the dominant source of streamflow in winter is baseflow and winter rainfall, glaciers and snow cover are the most important long-term forms of water storage and, hence, the main source of runoff during the ablation period for the basin (Shakir et al., 2010). In total, 2.9 % (1954 km²) of the basin is glacierized based on the Randolph Glacier Inventory version 3.2 (Pfeffer et al., 2014). The meltwater from glaciers and snow produces the majority (75 %) of the total streamflow (Hewitt et al., 1989).
[...] each sub-watershed delineated by the stations located inside the Kabul Basin (Fig. 1). Two different climate patterns are distinguishable across the sub-basins. The sub-basins on the Kunar River tributary (Kama, Asmar, Chitral, Gawardesh, and Chaghasarai) receive moderate annual precipitation and are highly affected by snow and glacier cover. All of these sub-basins have high ratios of mean annual flow to mean annual precipitation, with the ratios for the Kama, Asmar, Chitral, and Chaghasarai sub-basins larger than 1. Conversely, the Daronta sub-basin contains only minimal glacial cover and is relatively dry. Daronta is also much less productive, with an average annual streamflow of only 165 mm yr⁻¹, far below that of the other sub-basins. Issues of shared water resources between Afghanistan and Pakistan in the Kabul River basin are becoming complex due to the impacts of climatic variability and change (IUCN, 2010). The vulnerability of glacial streamflow regimes to changes in temperature and precipitation (Stahl et al., 2008; Immerzeel et al., 2012; Radic et al., 2014) highlights the need to assess the impact of climate change on future water availability in this area.

Data
Gridded daily precipitation and temperature products with a spatial resolution of 0.25° were gathered between calendar years 1961 and 2007 from the Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation (APHRODITE) data set (Yatagai et al., 2012). There has been some concern regarding underestimation of precipitation in APHRODITE for some regions of Asia (Palazzi et al., 2013); our preliminary data analysis (intercomparison of precipitation products between five different databases) confirmed this for the Kabul River basin (shown in Fig. S1 in the Supplement). Thus, the APHRODITE precipitation was bias-corrected by the precipitation product from the University of Delaware global terrestrial precipitation (UD) data set (Legates and Willmott, 1990). Daily series of bias-corrected APHRODITE precipitation were coupled with APHRODITE temperature for 160 grid cells of 0.25° resolution to produce a climate forcing data set for the distributed domain of the Kabul River basin model.
This study used the set of global climate change simulations from the World Climate Research Programme's Coupled Model Intercomparison Project Phase 5 (CMIP5) multimodel ensemble (Taylor et al., 2012). Monthly climate outputs of GCMs (general circulation models) were downscaled to a daily temporal resolution and 0.25° spatial resolution based on the bias-correction spatial disaggregation (BCSD) statistical downscaling method introduced by Wood et al. (2004).
Monthly streamflow observations for seven locations in the Kabul River basin (Fig. 1) were gathered between calendar years 1960 and 1981 from two data sources: the Global Runoff Data Centre (GRDC) database and the United States Geological Survey (USGS) database (Table 1). Streamflow data were not collected in Afghanistan after September 1980 until recently because stream gaging was discontinued soon after the Soviet invasion of Afghanistan in 1979 (Olson and Williams-Sether, 2010). Though measurements were taken at a daily time step, data are only made available for public use at monthly aggregated levels, calculated using the mean of the daily values. The available monthly streamflow observations at each station were used for calibrating and validating the distributed hydrologic model (Fig. 2). Kama and Asmar stations are treated as ungaged sites because they align with the potential dam project on the Kunar River tributary. The two gage stations are left out of the multisite calibrations in order to evaluate the model's ability to predict streamflow at interior ungaged sites. Furthermore, half of the record at the Dakah station, located at the basin outlet, is also used for validation purposes.
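Because the observations are monthly means of daily values while the model runs daily, simulated flows have to be aggregated before they can be compared with the record. The following is a minimal Python sketch of that aggregation; the function name `monthly_means` and the assumption of a gap-free daily series beginning on a known start date are illustrative choices, not details taken from the study.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_means(start, daily_values):
    """Average a gap-free daily series into calendar-month means.

    `start` is the date of the first value (an assumed input format).
    Returns a dict keyed by (year, month).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for offset, value in enumerate(daily_values):
        day = start + timedelta(days=offset)
        key = (day.year, day.month)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}
```

Missing days are not handled here; a real record would need an explicit rule for months with incomplete data.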
The Randolph Glacier Inventory version 3.2 (RGI 3.2) data set (Pfeffer et al., 2014) was used to extract glacial coverage in the Kabul River basin, which totaled 5.7 % of the basin area (Fig. S2). In the hydrological modeling process, the model needs to be informed by reliable estimates of the volume of water retained in glaciers, especially for future simulations under warming conditions. We followed the method proposed in Grinsted (2013), which uses multivariate scaling relationships to estimate glacier and ice cap volume based on elevation range and area. Specifically, the scaling law including area and elevation range factors was applied to estimate glacier/ice cap volume when the glacier depth exceeded 10 m. Otherwise, glacier/ice cap volume was estimated with the area-volume scaling law. The elevation range spanned by each individual glacier is estimated using the global digital elevation model (DEM) from the shuttle radar topography mission (SRTMv4) at 250 m resolution (Jarvis et al., 2008). The density of ice (0.9167 g cm⁻³) is applied to calculate glacier/ice cap volume in meters of water equivalent.
The land cover and soil type data for the Kabul River basin (Fig. 1) are provided by the Food and Agriculture Organization of the United Nations (Latham et al., 2014) and the United States Department of Agriculture - Natural Resources Conservation Service Soils (USDA-NRCS, 2005), respectively.

Distributed Hydrologic Model (HYMOD_DS)
In this study the lumped conceptual hydrological model HYMOD (Boyle, 2001) is coupled with a river routing model to be suitable for modeling a distributed watershed system. We name it HYMOD_DS, denoting the distributed version of HYMOD. Snow and glacier modules have been introduced to enhance the modeling process for glacier and snow covered areas within the Kabul River basin. The HYMOD_DS is composed of hydrological process modules that represent soil moisture accounting, evapotranspiration, snow processes, glacier processes and flow routing. The model operates on a daily time step and requires daily precipitation and mean temperature as input variables. The overall model structure of the HYMOD_DS and its 15 parameters are described in Fig. 3 and Table 2, respectively. Further details are provided below.
The HYMOD conceptual watershed model has been extensively used in studies on streamflow forecasting and model calibration (Wagener et al., 2004; Vrugt et al., 2008; Kollat et al., 2012; Gharari et al., 2013; Remesan et al., 2013). The HYMOD is a soil moisture accounting model based on the probability-distributed storage capacity concept proposed by Moore (1985). This conceptualization represents a cumulative distribution of varying storage capacities ($C$) with the following function:
$$F(C) = 1 - \left(1 - \frac{C}{C_{\max}}\right)^{B}, \quad 0 \le C \le C_{\max},$$
where the exponent $B$ is a parameter controlling the degree of spatial variability of storage capacity over the basin and $C_{\max}$ is the maximum storage capacity. The model assumes that all storages within the basin are filled up to the same critical level ($C^{*}(t)$), unless this amount exceeds the storage capacity of that particular location. With this assumption, the total water storage $S(t)$ contained in the basin corresponds to
$$S(t) = \frac{C_{\max}}{B+1}\left[1 - \left(1 - \frac{C^{*}(t)}{C_{\max}}\right)^{B+1}\right].$$
Consequently, two parameters are introduced for the runoff generation process with two components:
$$\mathrm{Runoff}_1 = \max\left(P(t) + C^{*}(t-1) - C_{\max},\, 0\right),$$
$$\mathrm{Runoff}_2 = \left(P(t) - \mathrm{Runoff}_1\right) - \left(S(t) - S(t-1)\right),$$
where $P(t)$ is precipitation, $\mathrm{Runoff}_1$ is surface runoff, and $\mathrm{Runoff}_2$ is subsurface runoff. A parameter ($\alpha$) is introduced to represent how much of the subsurface runoff is routed over the fast ($Q_{\mathrm{fast}}$) and slow ($Q_{\mathrm{slow}}$) pathway:
$$Q_{\mathrm{fast}} = \mathrm{Runoff}_1 + \alpha\,\mathrm{Runoff}_2, \qquad Q_{\mathrm{slow}} = (1-\alpha)\,\mathrm{Runoff}_2.$$
The potential evapotranspiration (PET) is derived based on the Hamon method (Hamon, 1961), in which daily PET in millimeters is computed as a function of daily mean temperature and hours of daylight:
$$\mathrm{PET} = \mathrm{Coeff} \times 29.8 \times L_d \times \frac{0.611\,\exp\left(17.27\,T/(T+237.3)\right)}{T+273.2},$$
where $L_d$ is the daylight hours per day, $T$ is the daily mean air temperature (°C), and Coeff is a bias correction factor. The hours of daylight are calculated as a function of latitude and day of year based on the daylight length estimation model (CBM model) suggested by Forsythe et al. (1995).
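As a rough illustration of the storage-capacity concept and the fast/slow partitioning described above, the Python sketch below advances the soil moisture store by one time step. The function names, the omission of evapotranspiration, and the specific excess-runoff expressions (the form commonly used in HYMOD implementations) are assumptions made for illustration and should not be read as a transcription of HYMOD_DS.

```python
def storage_from_level(c_star, c_max, b):
    """Basin storage S for critical level C* under a Pareto-type capacity
    distribution F(C) = 1 - (1 - C/Cmax)^B (assumed standard PDM form)."""
    c_star = min(max(c_star, 0.0), c_max)
    return c_max / (b + 1.0) * (1.0 - (1.0 - c_star / c_max) ** (b + 1.0))


def runoff_step(precip, c_prev, c_max, b, alpha):
    """One time step of runoff generation without evapotranspiration.

    Returns (q_fast, q_slow, c_new), where c_new is the updated critical level.
    """
    # Surface runoff: precipitation that overtops even the largest store.
    runoff1 = max(precip + c_prev - c_max, 0.0)
    c_new = min(precip + c_prev, c_max)
    # Subsurface runoff: infiltrated water in excess of the storage gain.
    gain = storage_from_level(c_new, c_max, b) - storage_from_level(c_prev, c_max, b)
    runoff2 = max((precip - runoff1) - gain, 0.0)
    # alpha splits the subsurface runoff between fast and slow pathways.
    q_fast = runoff1 + alpha * runoff2
    q_slow = (1.0 - alpha) * runoff2
    return q_fast, q_slow, c_new
```

By construction the two outflows plus the change in storage add up to the precipitation input, which is a convenient check when experimenting with $B$, $C_{\max}$, and $\alpha$.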
The HYMOD_DS includes snow and glacier modules with separate runoff processes, i.e., the runoff from the glacierized area is calculated separately and added to runoff generated from the soil moisture accounting module coupled with the snow module. The implicit assumption here is that there is no interchange of water between soil layers and glacial area and that runoff from glacial areas is regarded as surface flow. The runoff from each area is weighted by its area fraction within the basin to obtain total runoff.
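The area weighting in the last sentence amounts to a simple convex combination. A minimal Python sketch, with hypothetical argument names and runoff expressed as depth (e.g., mm per day) over each area:

```python
def total_runoff(glacier_runoff, nonglacier_runoff, glacier_fraction):
    """Combine glacier and non-glacier runoff depths by area fraction."""
    if not 0.0 <= glacier_fraction <= 1.0:
        raise ValueError("glacier_fraction must lie in [0, 1]")
    return (glacier_fraction * glacier_runoff
            + (1.0 - glacier_fraction) * nonglacier_runoff)
```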
The time rate of change in snow and glacier volume governed by ice accumulation and ablation (melting and sublimation) is expressed by the degree day factor (DDF) mass balance model (Moore, 1993; Stahl et al., 2008). The dominant phase of precipitation (snow vs. rain) is determined by a temperature threshold ($T_{\mathrm{th}}$). The snowmelt $M_s$ and glacier melt $M_g$ are calculated as
$$M_s = \mathrm{DDF}_s \,\max\left(T - T_s,\, 0\right), \qquad M_g = \mathrm{DDF}_g \,\max\left(T - T_g,\, 0\right),$$
with $\mathrm{DDF}_s$ ($T_s$) and $\mathrm{DDF}_g$ ($T_g$) applied separately for snow and glacier modules, respectively. To account for the higher melting rate of glaciers than snow owing to the low albedo (Konz and Seibert, 2010; Kinouchi et al., 2013), we introduced a parameter $r > 1$ to constrain $\mathrm{DDF}_g$ to be larger than $\mathrm{DDF}_s$ (i.e., $\mathrm{DDF}_g = r \times \mathrm{DDF}_s$). For the rain that falls on the glacierized area, the glacier parameter $K_g$ determines the portion of rain becoming surface runoff as a multiplier for the rainfall. The remaining rainfall is assumed to be accumulated to the glacier store.
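The degree day logic described above can be sketched in a few lines of Python. This is an illustrative reading of the text, assuming that melt is zero below the melt threshold and that melt is not limited by the available snow or ice store; the function names are hypothetical.

```python
def degree_day_melt(temp, ddf, t_melt):
    """Melt depth for one day: DDF times positive degrees above t_melt."""
    return ddf * max(temp - t_melt, 0.0)


def snow_and_glacier_melt(temp, ddf_s, t_s, t_g, r):
    """Return (snowmelt, glacier melt) with DDF_g = r * DDF_s and r > 1."""
    if r <= 1.0:
        raise ValueError("r must exceed 1 so that DDF_g > DDF_s")
    ddf_g = r * ddf_s
    return degree_day_melt(temp, ddf_s, t_s), degree_day_melt(temp, ddf_g, t_g)


def glacier_rain_partition(rain, k_g):
    """Split rain on glacierized area into (surface runoff, glacier store)."""
    runoff = k_g * rain
    return runoff, rain - runoff
```

For example, with a daily mean temperature of 5 °C, `snow_and_glacier_melt(5.0, 4.0, 0.0, 1.0, 1.5)` applies a snow factor of 4 and a glacier factor of 6 to 5 and 4 positive degrees, respectively.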
The within-grid routing process for direct runoff is represented by an instantaneous unit hydrograph (IUH) (Nash, 1957), in which a catchment is depicted as a series of $N$ reservoirs each having a linear relationship between storage and outflow with the storage coefficient of $K_q$. Mathematically, the IUH is expressed by a gamma probability distribution:
$$u(t) = \frac{1}{K_q\,\Gamma(N)}\left(\frac{t}{K_q}\right)^{N-1}\exp\left(-\frac{t}{K_q}\right),$$
where $\Gamma$ is the gamma function. The within-grid groundwater routing process is simplified as a lumped linear reservoir with the storage recession coefficient of $K_s$.
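To make the gamma-shaped IUH concrete, the Python sketch below evaluates it, integrates it into daily weights, and convolves those weights with a runoff series. It assumes the classic Nash form in which $K_q$ acts as a storage time constant (storage equals $K_q$ times outflow); if the model instead treats $K_q$ as a release rate, the argument would need to be inverted. Function names and the midpoint integration are illustrative choices.

```python
import math


def nash_iuh(t, n, k):
    """Nash cascade IUH: gamma density with shape n and scale k (assumed form)."""
    if t <= 0.0:
        return 0.0
    return (t / k) ** (n - 1.0) * math.exp(-t / k) / (k * math.gamma(n))


def daily_weights(n, k, n_days, substeps=200):
    """Integrate the IUH over each day with the midpoint rule."""
    dt = 1.0 / substeps
    weights = []
    for day in range(n_days):
        area = sum(nash_iuh(day + (j + 0.5) * dt, n, k) for j in range(substeps))
        weights.append(area * dt)
    return weights


def route(runoff, weights):
    """Convolve a daily runoff series with the daily unit hydrograph weights."""
    out = [0.0] * len(runoff)
    for i, r in enumerate(runoff):
        for j, w in enumerate(weights):
            if i + j < len(out):
                out[i + j] += r * w
    return out
```

With a sufficiently long horizon the daily weights sum to approximately one, so routing redistributes runoff in time without changing its volume.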
The transport of water in the channel system is described using the diffusive wave approximation of the Saint-Venant equation (Lohmann et al., 1998):
$$\frac{\partial Q}{\partial t} = D\,\frac{\partial^{2} Q}{\partial x^{2}} - C\,\frac{\partial Q}{\partial x},$$
where $C$ and $D$ are parameters denoting wave velocity (Velo) and diffusivity (Diff), respectively. Similar to most other hydrological models (Efstratiadis et al., 2008), HYMOD_DS is not designed to model water abstractions for agricultural lands and dam operations within the basin. According to the World Bank (2010), water demand for agricultural use is about 2000 million cubic meters, or about 8.3 % of the total annual flow. The Naglu dam (Fig. 1) upstream of the Daronta streamflow gage forms the largest and most important reservoir in the basin, with an active storage of 379 million cubic meters. In our hydrologic modeling process, the water consumed by irrigated croplands is implicitly accounted for by the evapotranspiration module. We note that the degree of irrigation impact during the time frame used for calibration is likely much smaller than the current level. We also expect that using monthly data for calibration somewhat reduces the bias from human interference, particularly the daily operations of the Naglu dam. Nevertheless, the calibration results for the gage below this dam (Daronta), and to a lesser extent the basin outlet (Dakah), should be approached with caution. Given that a majority of the gages examined in this study are on an underdeveloped branch of the Kabul River, issues of human interference on calibration are somewhat mitigated.

Methods
The purpose of this study is to explore the implications of different calibration strategies and choices for a computationally expensive distributed hydrologic model. A variety of calibration experiments are conducted, with the results from preceding experiments informing choices made for subsequent ones. All calibration approaches are tested in terms of their ability to predict flows at interior site gages that were left out of the calibration process. In all cases, the genetic algorithm (GA) introduced by Wang (1991) is used as an optimization method for model parameter calibration, and the objective function is based simply on the Nash-Sutcliffe efficiency (NSE) (Nash and Sutcliffe, 1970), which is by far the most utilized performance metric in hydrological model applications (Biondi et al., 2012). A multisite average of the NSE is used when evaluating performance across multiple sites. We fully recognize that the use of one objective, such as the NSE, is inferior to multiobjective approaches that can identify Pareto optimal solutions that provide good model performance across different components of the flow regime (Madsen, 2003; Efstratiadis and Koutsoyiannis, 2010; Li et al., 2010; Kumar et al., 2013). However, in this particular study daily hydrologic model simulations can only be compared against available monthly streamflow records, reducing the number of viable objectives against which to calibrate. That is, statistics representing peak flows, extreme low flows, and other daily flow regime characteristics often used in multiobjective optimization approaches are unavailable. We believe that the use of a monthly NSE value as a single objective, while coarse, does not inhibit our ability to provide insight into the research questions posed. In addition to the NSE, the Kling-Gupta efficiency (KGE) (Gupta et al., 2009) is adopted as an alternative model performance metric, which equally weights model mean bias, variance bias, and correlation with observations.
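For reference, the two performance metrics and the multisite average can be written compactly. The Python sketch below uses the standard definitions of the NSE and of the KGE as formulated by Gupta et al. (2009); the function names and the list-of-pairs input for the multisite average are illustrative assumptions.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    mo, ms = _mean(obs), _mean(sim)
    so = math.sqrt(_mean([(o - mo) ** 2 for o in obs]))
    ss = math.sqrt(_mean([(s - ms) ** 2 for s in sim]))
    cov = _mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)])
    r = cov / (so * ss)      # linear correlation
    alpha = ss / so          # variability ratio
    beta = ms / mo           # mean bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def multisite_mean_nse(pairs):
    """Average NSE over (obs, sim) pairs, one pair per gaging site."""
    return _mean([nse(obs, sim) for obs, sim in pairs])
```

A perfect simulation scores 1 on both metrics, while a simulation equal to the observed mean at every step scores an NSE of 0.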
In this study, three levels of parameter complexity are considered: lumped, semi-distributed, and fully distributed formulations (Fig. 4). The different levels of parameter complexity are defined according to the spatial distribution of unique hydrologic model parameters. In the lumped formulation a single parameter set is applied to the entire basin. In the semi-distributed formulation, a unique parameter set is assigned to each sub-basin, defined based on the location of available streamflow gaging sites. The fully distributed parameter structure follows the spatial discretization of climate input grids, allowing for a unique parameter set for each grid cell. No matter the parameterization scheme, the model structure follows the climate input grids; i.e., the hydrological water cycle within each grid cell is modeled separately. We note that a lumped model structure (i.e., no gridded or sub-unit structure) has often been considered as a baseline model formulation in the assessment of distributed modeling frameworks (e.g., see Smith et al., 2013). However, the focus of our study is on ungaged interior site streamflow estimation, making this formulation somewhat inappropriate. Furthermore, preliminary tests comparing streamflow simulations at the basin outlet (Dakah) between a gridded and basin-averaged structure, both with a lumped parameter formulation, support the use of the distributed grid structure (Fig. S3).
The parameter complexity varies with the calibration experiment being conducted but, for each experiment and regardless of the parameterization, the optimization is repeated 50 times using the GA to explore calibration uncertainty. The high computational cost of performing such a large number of calibrations is managed using the parallel computing power provided by the Massachusetts Green High-Performance Computing Center (MGHPCC), from which several thousand processors are available.
In the first modeling experiment, we explore two calibration strategies for using multisite streamflow data, a stepwise and a pooled approach. In the stepwise calibration, parameters are calibrated for upstream gaged sub-catchments and subsequently fixed during calibration of downstream points, while for the pooled approach, parameters are calibrated for multiple sub-catchments simultaneously. Both approaches are assessed for the semi-distributed formulation. The better of the two methods is identified for use in the second experiment, where the effects of increased parameter complexity are tested in terms of streamflow prediction accuracy and uncertainty. In the third experiment, we consider the situation where there are only data at the basin outlet for calibration. Here, the model is calibrated against the outlet gage under all levels of parameter complexity and is compared against the best combination of calibration strategy (stepwise or pooled) and parameter complexity (lumped, semi-distributed, or fully distributed) identified in the previous experiments. Finally, a subset of the calibration approaches deemed worthy of further investigation are compared in terms of their projections of future streamflow under climate change to highlight how model calibration differences can alter the results of a climate change assessment for water resources applications. These experiments are described in further detail below.

Multisite calibration: stepwise and pooled approaches
In the first experiment, the semi-distributed parameterization concept is compared under alternative multisite calibration strategies, the stepwise and pooled calibration approaches. To conduct the stepwise calibration, a nested class of sub-basins is defined corresponding to multiple gaging stations. In the first step of the stepwise calibration, the optimization process is carried out with nested sub-basins at the lowest level (i.e., the most upstream sites). Once parameters of nested sub-basins are determined, the parameters are fixed, and the calibration procedure proceeds with nested basins at upper levels until parameters for the entire basin are determined. In this particular application to the Kabul River basin, five gaged sub-basins were selected and the stepwise calibration procedure for those sub-basins followed this direction: Chitral → Gawardesh → Chaghasarai → Daronta → Dakah (Fig. 5). The stepwise calibration approach involves a number of GA implementations corresponding to the number of gaging sites. The GA optimization was therefore carried out a total of 250 times in this application: 50 optimization runs, each comprising one GA implementation for each of the five sub-basin regions.
The pooled calibration strategy involves calibrating all parameters of the model domain simultaneously against multiple streamflow gages within the watershed. This approach aims at looking for suitable parameters that are able to produce satisfactory model results at all gaging stations in a single implementation of GA optimization. That is, the GA searches the entire parameter space at once to maximize the average NSE across all sites. This operational feature reduces the processing time spent on the GA implementation compared to the stepwise calibration strategy. To identify the better of the two multisite calibration approaches, the comparison focused on their ability to predict streamflow and calibration uncertainties at two interior site gages (Kama and Asmar) that were assumed to be ungaged (Fig. 5), as well as for validation data at the basin outlet.
It is important to note that the evaluation of these multisite calibration strategies is somewhat weakened because of the lack of overlapping data periods among most of the stations (Fig. 2). This drawback prevents the calibration methods from accounting for simultaneous information from different tributaries, which, if available, would better enable the calibration methods to account for heterogeneity of hydrological processes across the sub-basins.
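The operational difference between the two multisite strategies can be seen on a deliberately simple toy problem. The Python sketch below is not the HYMOD_DS calibration: it assumes a hypothetical two-sub-basin model with one runoff coefficient per sub-basin and replaces the GA with an exhaustive grid search, purely to show that the stepwise approach fixes upstream parameters before moving downstream, whereas the pooled approach searches all parameters at once against the average NSE.

```python
from itertools import product

GRID = [i / 10 for i in range(11)]  # candidate runoff coefficients 0.0 .. 1.0


def nse(obs, sim):
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def simulate(c_up, c_down, rain):
    """Toy model: upstream gage sees c_up*rain; outlet adds c_down*rain."""
    up = [c_up * p for p in rain]
    outlet = [u + c_down * p for u, p in zip(up, rain)]
    return {"up": up, "outlet": outlet}


def stepwise(rain, obs):
    """Calibrate the upstream sub-basin first, fix it, then calibrate downstream."""
    c_up = max(GRID, key=lambda c: nse(obs["up"], simulate(c, 0.0, rain)["up"]))
    c_down = max(GRID, key=lambda c: nse(obs["outlet"], simulate(c_up, c, rain)["outlet"]))
    return c_up, c_down


def pooled(rain, obs):
    """Search both parameters at once, maximizing the average NSE over gages."""
    def score(pair):
        sim = simulate(pair[0], pair[1], rain)
        return (nse(obs["up"], sim["up"]) + nse(obs["outlet"], sim["outlet"])) / 2.0
    return max(product(GRID, GRID), key=score)
```

On error-free synthetic data both strategies recover the same coefficients; differences of the kind discussed in this study arise only once model structural error and observation noise make the per-site optima inconsistent with one another.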

Increased parameter complexity
In the second experiment, the better of the two approaches (stepwise or pooled) identified in the first experiment is further tested with respect to the three different levels of parameter complexity. In addition to the semi-distributed parameter formulation considered in the first experiment, lumped and fully distributed parameter formulations are calibrated for the selected approach to investigate the gain or loss arising from different levels of parameter complexity. Since the HYMOD_DS employed in this study involves 15 parameters, its lumped version contains a single, 15-member parameter set applied to all model grid cells. The semi-distributed conceptualization of HYMOD_DS contains a single parameter set for each sub-basin, totaling 75 parameters. In the distributed parameterization the number of parameters increases dramatically. With 160 grid cells of 0.25° resolution, the number of parameters requiring calibration reaches 2400. As the number of parameters increases across the parameterization schemes, calibration becomes increasingly computationally expensive. The numbers of model runs used in the GA optimization algorithm for the lumped, semi-distributed, and distributed parameterization schemes are 15 000 (150 populations × 100 generations), 75 000 (750 × 100), and 480 000 (2400 × 200), respectively. These population/generation sizes were supported using convergence tests for each calibration. Again, 50 separate GA optimizations were used to explore calibration uncertainties for each parameterization scheme. To give a sense of the computational burden of this experiment, we note that 50 trials of the HYMOD_DS calibration under the distributed conceptualization required 1000 processors over 7 days on the MGHPCC system.
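The parameter counts and GA budgets quoted above follow from simple arithmetic, reproduced in the short Python sketch below. The dictionary layout and function names are illustrative; the numbers (15 parameters, 5 sub-basins, 160 grid cells, and the population and generation sizes) are those stated in the text.

```python
N_PARAMS_PER_UNIT = 15
N_TRIALS = 50

# Number of spatial units carrying a unique parameter set in each scheme.
UNITS = {"lumped": 1, "semi-distributed": 5, "fully distributed": 160}

# (population size, number of generations) used by the GA in each scheme.
GA_SETTINGS = {
    "lumped": (150, 100),
    "semi-distributed": (750, 100),
    "fully distributed": (2400, 200),
}


def parameter_count(scheme):
    return N_PARAMS_PER_UNIT * UNITS[scheme]


def model_runs_per_calibration(scheme):
    population, generations = GA_SETTINGS[scheme]
    return population * generations


def model_runs_all_trials(scheme):
    return N_TRIALS * model_runs_per_calibration(scheme)
```

Across the 50 trials, the fully distributed scheme therefore implies 24 million model runs, compared with 750 000 for the lumped scheme.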

Basin outlet calibration
The third experiment considers the situation where there are only gaged data at the basin outlet (Dakah) for calibration, a common situation when calibrating hydrologic models in data-scarce river basins. Here, we evaluate the potential of the basin outlet calibration to estimate interior watershed flows in terms of both accuracy and precision at all gaging stations. All levels of parameter complexity are considered for this calibration. The main purpose of this experiment is to compare the veracity of a distributed hydrologic model calibrated only using basin outlet data with results from multisite calibrations to better understand the degradation in model performance under data scarcity. Other than the use of an NSE objective only at the basin outlet, all other GA settings for each level of parameter complexity are identical to the settings used in the second experiment.

Climate change projections of streamflow
The fourth experiment investigates how the choice of calibration approach can alter the projections of future streamflow under climate change. To explore this question, streamflow simulations for the 2050s, defined as the 30-year period spanning from 2036 to 2065, are carried out using climate projections from the CMIP5 (Taylor et al., 2012). A total of 36 different climate models run under two future conditions of radiative forcing (RCP 4.5 and 8.5) are used. Streamflow projections are developed for the basin outlet (Dakah) and two interior gages left out of the calibration (Kama and Asmar). By using 36 different GCMs and 50 optimization trials for each calibration scheme, this analysis compares the uncertainty in future streamflow projections originating from uncertainty in different hydrologic model parameterization schemes and under alternative future climates.
Streamflow projections are considered under all three parameterization schemes (lumped, semi-distributed, and fully distributed) for both the basin outlet model and the best multisite calibration approach (stepwise or pooled). Multiple streamflow characteristics are evaluated, including monthly streamflow, wet (April-September) and dry (October-March) season flows, and daily peak flow response. The differences and uncertainty in these metrics across calibration approaches will highlight the importance of calibration strategy for evaluating future water availability and flood risk.
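As a minimal sketch of the seasonal aggregation, assuming a set of twelve monthly mean flows keyed by month number, the wet and dry season flows can be computed as below. The function name and the sample values are hypothetical, and a simple unweighted mean over months is assumed (month lengths are ignored).

```python
WET_MONTHS = (4, 5, 6, 7, 8, 9)        # April-September
DRY_MONTHS = (10, 11, 12, 1, 2, 3)     # October-March

def seasonal_means(monthly_flow):
    """Return (wet, dry) season mean flow from a dict {month number: mean flow}."""
    wet = sum(monthly_flow[m] for m in WET_MONTHS) / len(WET_MONTHS)
    dry = sum(monthly_flow[m] for m in DRY_MONTHS) / len(DRY_MONTHS)
    return wet, dry

# Made-up monthly mean flows (m^3/s), for illustration only
example_flow = {1: 200, 2: 210, 3: 300, 4: 600, 5: 1100, 6: 1600,
                7: 1700, 8: 1300, 9: 700, 10: 350, 11: 250, 12: 220}
wet_flow, dry_flow = seasonal_means(example_flow)
print(f"wet season mean: {wet_flow:.1f}, dry season mean: {dry_flow:.1f}")
```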

Results
For the remaining part of the paper, we introduce the following shorthand: Lump, Semi, and Dist indicate the lumped, semi-distributed, and fully distributed parameterization schemes, and Outlet, Stepwise, and Pooled correspond to basin outlet, stepwise, and pooled calibrations. The comparison between different calibration strategies is based on the model performance evaluated with the NSE, as well as an alternative metric, the KGE.
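For reference, the two performance metrics can be sketched in a few lines of Python. The NSE follows its standard definition; for the KGE, the form of Gupta et al. (2009) with correlation, variability ratio, and bias ratio is assumed here, since this section does not state which variant is used. Function names and sample data are illustrative.

```python
import math

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over variance of observations."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def kge(obs, sim):
    """Kling-Gupta efficiency (assumed 2009 form): r, alpha = sd ratio, beta = mean ratio."""
    n = len(obs)
    mu_o = sum(obs) / n
    mu_s = sum(sim) / n
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)) / n
    r = cov / (sd_o * sd_s)      # linear correlation
    alpha = sd_s / sd_o          # variability ratio
    beta = mu_s / mu_o           # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Made-up monthly flows, for illustration only
obs_example = [120.0, 340.0, 560.0, 410.0, 230.0]
sim_example = [100.0, 360.0, 520.0, 450.0, 250.0]
print(nse(obs_example, sim_example), kge(obs_example, sim_example))
```

Both metrics equal 1 for a perfect simulation. A simulation equal to the observed mean gives an NSE of 0, which is why negative values at a site indicate performance worse than the mean-flow benchmark.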

Pooled calibration vs. stepwise calibration
This section reports the results from the first experiment comparing the stepwise and pooled calibration approaches for the semi-distributed model parameterization. Figure 6 compares the Semi-Stepwise and Semi-Pooled, with box plots representing the 50 calibration trials. Under the stepwise calibration the results for four sub-basins (Chitral, Gawardesh, Chaghasarai, and Daronta) are optimal because there is no interaction between those sub-basins. However, the calibrated parameter sets of each sub-basin act as constraints in the last step of the Semi-Stepwise, resulting in the degradation of model skill at the basin outlet (Dakah) and the two left-out gages (Asmar and Kama). This becomes apparent when comparing the Semi-Stepwise to the Semi-Pooled results. The model skill under the Semi-Pooled is similar to that from the Semi-Stepwise with respect to the four upstream sub-basins, but the Semi-Pooled outperforms the Semi-Stepwise at the verification gages. This is particularly true for the Asmar gage, which exhibits a downward bias and substantial variability in performance under the Semi-Stepwise. The Semi-Pooled results suggest that small sacrifices of model performance at certain sites can improve and stabilize basin-wide performance. Expected values of KGE from the 50 calibrations are also provided (values in parentheses at the bottom of Fig. 6), and this performance metric leads to the same conclusion. Therefore, the Semi-Pooled was selected as the better multisite calibration strategy and is considered for further analyses in the following sections.

Pooled calibration with alternative parameterizations
Here we examine results for the three levels of parameter complexity applied to the pooled calibration approach. Figure 7 shows the comparison of the pooled calibrations. Unsurprisingly, streamflow predictions from the Lump-Pooled have the lowest accuracy and largest uncertainty at the calibration sites, particularly for the Chaghasarai and Daronta sites. This demonstrates the well-known difficulty in representing flow characteristics of a spatially variable system with a homogeneous parameter set (Beven, 2012). The pooled calibration substantially improves with increasing parameter complexity at the calibration sites. Both the Semi-Pooled and Dist-Pooled produce NSE values above 0.8 for all calibration sites; however, the Dist-Pooled shows a somewhat higher performance, undoubtedly from its greater freedom to overfit to the calibration data. The advantage of the Dist-Pooled with respect to streamflow predictions at validation sites is less clear, however: the Semi-Pooled and Dist-Pooled yield similar predictions at these sites, suggesting that the fully distributed conceptualization leads to overfitting of the model as compared to the semi-distributed conceptualization. We reached the same conclusion when examining the KGE values, which rise with greater parameter complexity at calibration sites but no longer follow this pattern strictly at validation sites. Interestingly, the Lump-Pooled performs well at the verification sites despite its poor performance at calibration sites. The Lump-Pooled does not show significant degradation in skill at Kama compared to the more complex parameterizations, and the flow prediction at Asmar actually exhibits the best performance of all three model variants. A partial reason for this unexpected result arises from different overlapping periods in the calibration and validation data (see Fig. 2). The periods used for the calibration for Chitral (1978-1981) and Gawardesh (1975-1978) do not overlap with the validation period for Asmar (1966-1971), which encompasses those two sub-basins. Instead, the validation at Asmar is mostly affected by the calibration to Dakah because of the 4 overlapping years (1968-1971) between those two sites. This explains why the Lump-Pooled shows high skill at Asmar despite the low skill at its sub-basins. However, the low model skill at Chaghasarai from the Lump-Pooled propagates to the validation result at Kama, as these two sites have a relatively long overlapping period (8 years, from 1967 to 1974).

Limitations of the basin outlet calibration
In the third experiment the HYMOD_DS was calibrated only to data at the basin outlet under all levels of parameter complexity, and streamflow records for all six sub-basins, as well as flows at Dakah not used during calibration, are used for model validation. First, we consider the flows at Dakah. During the calibration period, all three parameterization schemes produce very accurate streamflow predictions with NSE (KGE) values above 0.95 (0.96) (Fig. 8). High accuracy holds even under the Lump-Outlet, despite the spatial heterogeneity of the basin. While NSE and KGE values at Dakah rise marginally with greater parameter complexity during calibration, this no longer holds during the validation period, suggesting no benefit with an increase in parameter complexity.
The validation results for the six sub-basins demonstrate the danger in relying on outlet data alone when calibrating a distributed model for flow prediction at interior points. Streamflow predictions at interior sites exhibit low accuracy and high uncertainty, with the worst performance at the Daronta site (all NSEs and KGEs are negative). We note that the poor performance at Daronta is likely due in part to the impacts of water abstraction and the operation of Naglu dam. Further examination (Fig. S4) showed that the HYMOD_DS significantly overestimated streamflow at Daronta and underestimated flow at three sites in the eastern part of the basin (Chitral, Gawardesh, and Chaghasarai). Model performance at Kama and Asmar is somewhat better than at the other validation sites, although improvements are not the same across all parameterizations. The Lump-Outlet predictions at these sites still have low average accuracy (average NSE < 0.7 and average KGE < 0.6), while the Semi-Outlet exhibits large uncertainty in performance across the 50 optimization trials. Surprisingly, the over-parameterized Dist-Outlet shows promising results with high expected accuracy at Kama and Asmar (mean NSE (KGE) of 0.84 (0.71) and 0.90 (0.88), respectively) and comparable performance at many of the other sites. One exception is Gawardesh, where the Lump-Outlet outperforms the other model variants, although the reason for this is not immediately clear. Overall, the results indicate that any calibration based on basin outlet data should be used with substantial caution when predicting flows at interior basin sites.
After reviewing all of the calibration experiments, it becomes clear that the Semi-Pooled and Dist-Pooled calibrations provide more robust performance compared to the basin outlet calibrations due to their improved representation of internal hydrologic processes across the basin. To further compare these calibration strategies against one another, we evaluate the variability in optimal parameters resulting from the 50 trials of the GA algorithm. Figure 9 shows the coefficient of variation (CV) of Cmax (a parameter for the soil moisture accounting module) over the basin from all combinations of the two calibration approaches (outlet and pooled) and the three parameterization schemes. A clear pattern of increasing variability (higher uncertainty in Cmax) emerges as parameter complexity increases for both the outlet and pooled calibration strategies. That is, the semi- and fully distributed parameterizations lead to significantly variable parameter sets that produce similar representations of the observed basin response. Figure 9 also suggests that the equifinality can be alleviated to an extent by pooling data across sites. The pooled calibration approaches consistently show lower variability in Cmax compared to the outlet calibration at the same level of parameter complexity. These results are relatively consistent across the remaining 14 HYMOD_DS parameters. The implications of parameter stability for streamflow projections under climate change are addressed in the next section.
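The parameter-variability measure can be sketched as follows. This assumes the CV is the standard deviation divided by the mean of the optimal values of one parameter at one location across trials; whether the sample or population standard deviation is used is not stated, so the sample form is an assumption. The cell labels and Cmax values are made up.

```python
import statistics

def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (sample form is an assumption)."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical optimal Cmax values from five GA trials at two grid cells.
# A real analysis would use all 50 trials at every cell.
cmax_by_cell = {
    "cell_A": [310.0, 295.0, 305.0, 300.0, 290.0],   # tightly clustered optima
    "cell_B": [120.0, 480.0, 260.0, 390.0, 200.0],   # widely scattered optima
}

cv_by_cell = {cell: coefficient_of_variation(vals)
              for cell, vals in cmax_by_cell.items()}
for cell, cv in cv_by_cell.items():
    print(f"{cell}: CV = {cv:.2f}")
```

A high CV at a cell means that very different parameter values reach similar objective function values there, which is the signature of equifinality discussed above.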

Climate change projections of streamflow with uncertainty
Here we explore how projections of future water availability and flood risk under climate change are influenced by the choice of calibration approach. For the Kabul River basin, we first examine average monthly streamflow estimates across four calibration strategies: the Semi-Pooled and Dist-Pooled (most promising calibration strategies), as well as the Lump-Outlet (as a baseline) and Dist-Outlet (the best outlet calibration strategy). Figure 10 shows the monthly streamflow estimates for the historical period with the whisker bars indicating the uncertainty range across the 50 calibration trials. The monthly streamflow predictions are also provided for the 2050s under the RCP 4.5 and 8.5 scenarios. For the future scenarios, the whisker bars are derived by averaging over the 36 different climate projections for each of the 50 trials. For the historical time period, all calibration schemes match the observed monthly streamflow at Dakah well, but monthly streamflow is underestimated in most months at Kama and Asmar under the basin outlet calibrations, particularly by the Lump-Outlet. The historical monthly streamflow estimates from the outlet calibration strategies also tend to be highly uncertain for the months of June, July, August, and September, especially compared to the Semi-Pooled and Dist-Pooled.
Under future climate projections for the 2050s, the four calibration strategies show similar changes in monthly streamflow at Dakah, but the magnitudes of change are somewhat different. All calibration strategies suggest a reduction in streamflow for June, July, and August under both RCP 4.5 and 8.5 scenarios. Also, the peak monthly flow, which occurred in June or July in the historical period, is shifted to May at Dakah. However, the Lump-Outlet predicts less reduction of flow in June and July and a greater reduction in August and September as compared to the other three calibrations. Considering that all calibration schemes had similar levels of good performance at this site for both calibration and validation periods, it is notable that they project future streamflow somewhat differently.
Future monthly streamflow predictions at Kama and Asmar vary widely between the four calibration schemes, mostly an artifact of their historic differences (Fig. 10). Streamflow projections under the outlet calibration strategies tend to show large uncertainties at these two sites, particularly the Lump-Outlet calibration. For the three months of July-September, the outlet calibration and pooled calibration strategies provide substantially different insights about future water availability at Kama and Asmar. The outlet calibrations suggest less water with large uncertainties for those months as compared to the pooled calibrations. At Kama, the pooled calibrations suggest significant changes in the pattern of peak monthly flow timing under both RCP scenarios; instead of having a clear peak in July, streamflow from May to August shows similar amounts of water.
To further understand the sources of uncertainty in future water availability, we evaluate the separate and joint influence of uncertainties in parameter estimation and future climate on seasonal streamflow projections across all calibration schemes. Figure 11 represents the uncertainty of wet and dry seasonal streamflow at Dakah from three sources: (1) calibration uncertainty across the 50 trials, with future climate uncertainty averaged out for each trial; (2) future climate uncertainty across the 36 projections, with calibration uncertainty averaged out across the 50 trials; and (3) the combined uncertainty across all 1800 (50 × 36) simulations. The results suggest, somewhat surprisingly, that uncertainty decreases as parameter complexity increases and, less surprisingly, when pooled calibration approaches are applied. Another clear point is that the uncertainty resulting from different climate change scenarios substantially outweighs that from calibration uncertainty.
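The three uncertainty measures can be sketched as below, assuming the projections are arranged as a 50 × 36 table of seasonal flows (trial by climate projection) and that each uncertainty is expressed as a CV, as in Fig. 11. The function names, the labels Par, Clim, and Joint as dictionary keys, and the synthetic data are illustrative only.

```python
import statistics

def cv(values):
    """Coefficient of variation (sample standard deviation / mean)."""
    return statistics.stdev(values) / statistics.mean(values)

def decompose_uncertainty(flows):
    """flows[i][j]: seasonal flow for calibration trial i and climate projection j."""
    n_trials, n_gcms = len(flows), len(flows[0])
    # (1) calibration uncertainty: average out climate within each trial
    trial_means = [statistics.mean(row) for row in flows]
    # (2) climate uncertainty: average out calibration within each projection
    gcm_means = [statistics.mean([flows[i][j] for i in range(n_trials)])
                 for j in range(n_gcms)]
    # (3) joint uncertainty: all trial-by-projection combinations
    all_values = [v for row in flows for v in row]
    return {"Par": cv(trial_means), "Clim": cv(gcm_means), "Joint": cv(all_values)}

# Synthetic 50 x 36 table: a small trial effect and a larger climate effect
# added to a base flow of 1000 (arbitrary units). Values are made up.
flows_example = [[1000.0 + 1.0 * (i - 24.5) + 10.0 * (j - 17.5)
                  for j in range(36)] for i in range(50)]
result = decompose_uncertainty(flows_example)
print(result)
```

In this synthetic table the climate effect is deliberately larger than the trial effect, so the climate CV exceeds the calibration CV; the ordering in the real projections is an empirical result, not a property of the method.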
Up to this point, there has been little difference between the Semi-Pooled and Dist-Pooled model variants. These two versions were further analyzed with respect to extreme streamflow to see if distinguishing characteristics emerge. Distributed models have been shown to yield clear gains in predicting peak flows (Reed et al., 2004), and spatial variability in model parameters significantly influences runoff behavior (Brath and Montanari, 2000; Pokhrel and Gupta, 2011). The spatial variability of optimal parameters derived from the Semi-Pooled and Dist-Pooled is shown in Fig. S6, with larger variability across all parameters for the Dist-Pooled than for the Semi-Pooled. To understand the effects of spatial variability and calibration uncertainty of parameters on extreme event estimation, the 100-year daily flood event was calculated under the Semi-Pooled and Dist-Pooled for each of the 50 historic simulations and 1800 future simulations across both RCP scenarios. Although the intermodel comparison is intended to be a useful addition that provides a distinction between the parameterization schemes in the pooled calibration approach, results from this analysis should be viewed in the context of a theoretical calibration exercise, not for decision-making purposes, because no observed daily streamflow is available against which to compare the estimated 100-year daily flood events. Projections of the 100-year daily flood, estimated using a log-Pearson type III distribution fit to 30 years of annual peaks, differ somewhat between the Semi-Pooled and Dist-Pooled (Fig. 12). At three validation sites, extreme floods are consistently larger under the Semi-Pooled than the Dist-Pooled, and the mean difference in the 100-year daily flood estimate between the two calibration approaches grows between the historic runs and the RCP 4.5 and 8.5 scenarios. This suggests that the flood-generation process is fundamentally different between the two parameterizations, with the Semi-Pooled formalization magnifying the effect of climate change on extremes. Furthermore, there is substantially more uncertainty in the 100-year daily flood estimate under the Semi-Pooled. Figure 12 shows the combined uncertainty across both climate projections and calibrations, but this uncertainty is broken down further in Fig. 13. Similar to Fig. 11, three sources of uncertainty are evaluated for the 100-year daily flood, including calibration uncertainty alone, climate projection uncertainty alone, and their combined effect. For both the Semi-Pooled and Dist-Pooled, calibration uncertainty has a smaller influence than projection uncertainties and, for all sites, the Dist-Pooled has a smaller uncertainty range than the Semi-Pooled, even for calibration uncertainty alone. This was a truly surprising result, given the parametric freedom in the Dist-Pooled model and the fact that no daily data were ever used in the calibration of either model. It appears that a lack of model parsimony does not necessarily lead to greater uncertainty in model simulations under different climate conditions, somewhat counter to what would be expected of overfit models. One possible reason for this result would be if increased parametric freedom somehow offset the effects of structural deficiencies in the model. However, further research is needed to investigate this issue.
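As an illustration of the flood frequency step, the sketch below estimates a T-year flood from a series of annual peaks with a log-Pearson type III distribution. The fitting method is not described in this section, so the sketch assumes a method-of-moments fit in log10 space with the Wilson-Hilferty approximation for the frequency factor, without the skew weighting or outlier adjustments of formal guidelines. The function name and the synthetic peaks are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def lp3_quantile(annual_peaks, return_period=100.0):
    """T-year flood from a log-Pearson III fit (method of moments in log10 space)."""
    logs = [math.log10(q) for q in annual_peaks]
    n = len(logs)
    m = mean(logs)
    s = stdev(logs)
    # Sample skew coefficient of the log-transformed peaks
    g = n * sum((y - m) ** 3 for y in logs) / ((n - 1) * (n - 2) * s ** 3)
    # Standard normal quantile for the non-exceedance probability 1 - 1/T
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    if abs(g) < 1e-6:
        k = z   # zero skew: log-Pearson III reduces to lognormal
    else:
        # Wilson-Hilferty approximation of the Pearson III frequency factor
        k = (2.0 / g) * ((1.0 + g * z / 6.0 - g ** 2 / 36.0) ** 3 - 1.0)
    return 10.0 ** (m + k * s)

# Synthetic 30-year series of annual daily peak flows (m^3/s), made up
peaks_example = [1000.0 + 40.0 * i + 150.0 * ((i * 7) % 5) for i in range(30)]
q100 = lp3_quantile(peaks_example, 100.0)
q10 = lp3_quantile(peaks_example, 10.0)
print(f"10-year flood: {q10:.0f}, 100-year flood: {q100:.0f}")
```

Applying such a function to each of the 50 historic and 1800 future simulations would give the ensembles of 100-year flood estimates whose spread is compared between the two parameterizations. With only 30 annual peaks, the sample skew is itself highly uncertain, which is one reason to treat the resulting estimates as comparative rather than absolute.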

Discussion and conclusion
In this study we examined a variety of calibration experiments to better understand the benefits and costs associated with different calibration choices for a complex, distributed hydrologic model in a data-scarce region. The goal of these experiments was to provide insight regarding the use of multisite data in calibration, the effects of parameter complexity, and the challenges of using limited data for distributed model calibration, all in the context of projecting future streamflow under climate change. This study tested two multisite calibration strategies, the stepwise and pooled approaches, finding that the pooled approach using all data simultaneously provides improved calibration results. This suggests that small sacrifices of model performance at certain sites can improve and stabilize basin-wide performance. The pooled calibration substantially improves with increasing parameter complexity at the calibration sites, but the semi-distributed and distributed pooled calibrations gave similar streamflow predictions at the validation sites, suggesting overfitting of the model under the fully distributed conceptualization. It is worth noting that for the transformation of rainfall to runoff, up to five or six parameters can be identified on the basis of a single hydrograph (Wagner et al., 2001). Under this premise, the number of the HYMOD_DS parameters being calibrated in the semi-distributed approach remains realistic, but the fully distributed parameterization scheme likely causes poor identifiability of the parameters. Thus, pursuing a parsimonious configuration (e.g., optimization for a small portion of the parameters) with an effort to increase the amount of information (e.g., multivariable/multisite) is critical in the calibration of watershed system models (Gupta et al., 1998; Efstratiadis et al., 2008). We also note the important role of experienced hydrologists in designing a parsimonious hydrologic calibration (e.g., Boyle et al., 2000). In this study, the feasible ranges of the HYMOD_DS parameters were kept wide (as is often done in automatic hydrologic calibrations) without consideration of the physical properties of the basin; the judgment of local hydrologic experts could help reduce the feasible ranges used during the calibration and thus contribute to a reduction of calibration uncertainty.
Calibration based only on data at the basin outlet is all too common in hydrologic model applications and is sometimes considered comparable to multisite calibrations even for predictions at interior gages (Lerat et al., 2012). In contrast, others have reported improvements in interior flow predictions by using internal flow measurements (Anderson et al., 2001; Wang et al., 2012; Boscarello et al., 2013). This is in agreement with the findings from this study, demonstrating the superiority of the pooled calibration approach to the basin outlet calibration in terms of its ability to represent interior hydrologic response correctly. This study shows the danger in relying on an outlet calibration for interior flow prediction.
It was shown that caution is needed when using an outlet calibration approach for streamflow predictions under future climate conditions. This study showed that the basin outlet calibration can lead to projections of mid-21st century streamflow that deviate substantially from projections under multisite calibration strategies. In the climate change context, the pooled calibration with semi-distributed and distributed parameter formulations showed clear gains in reducing uncertainties in predictions of monthly and seasonal water availability as compared to the basin outlet calibrations. Surprisingly, increased parameter complexity in the calibration strategies did not increase the uncertainty in streamflow projections, even though parameter equifinality did emerge. The results suggest that increased (excessive) parameter complexity does not always lead to increased uncertainty if structural uncertainties in the model are present.
The semi-distributed pooled and distributed pooled calibrations are very similar for monthly streamflow projections, yet differ in their projections of extreme flows in part due to their differences in the spatial variability of optimal parameters, with the distributed pooled calibration showing less uncertainty for 100-year daily flood events. We evaluated the separate and joint influence of uncertainties in parameter estimation and future climate on projections of seasonal streamflow and 100-year daily flood across calibration schemes and found that the uncertainty resulting from variations in projected climate between the CMIP5 GCMs substantially outweighs the calibration uncertainty. These results agree with other studies showing the dominance of GCM uncertainty in future hydrologic projections (Chen et al., 2011; Exbrayat et al., 2014). While the GCM-based simulations still have widespread use in assessing the impacts of climate change on water resources availability, the bounds of uncertainty resulting from an ensemble of GCMs cannot be well-defined because of the low credibility with which GCMs are able to produce time series of future climate (Koutsoyiannis et al., 2008). This issue hinders a straightforward appraisal of future water availability under climate change and has motivated other efforts, e.g., performance-based selection of GCMs (Perez et al., 2014).
In addition to the uncertainties surrounding model parameters and future climate explored in this study, there is also significant uncertainty in streamflow projections stemming from structural differences between applied hydrologic models, which can be especially pertinent where robust calibration is hampered by the scarcity of data (Exbrayat et al., 2014). Furthermore, the residual error variance of hydrologic model simulations would increase the effects of hydrologic model uncertainty as compared to that of the climate projections (Steinschneider et al., 2014). These issues need to be addressed in future work for exploring a comprehensive uncertainty assessment of climate change risk for poorly monitored hydrologic systems.
Successful automatic calibration algorithms for hydrologic models are based primarily on global optimization algorithms that are computationally expensive and require a large number of function evaluations (Kuzmin et al., 2008). Although the speed and capacity of computers have increased multifold in the past several decades, the time consumed by running hydrological models (especially complex, physically based, distributed hydrological models) is still a concern for hydrology practitioners. A single trial of parameter optimization of HYMOD_DS associated with 100 000 runs can take 28 days on a single processor (Fig. S7). Accordingly, the use of high-performance computing power was essential in this study to better understand the implications of different calibration choices and their associated uncertainty for streamflow projections. Enhanced data with high spatial and temporal resolution are increasingly available from remote sensing and satellite products. In the future, remote sensing and satellite information can be integrated into calibration approaches to develop more robust estimates of spatially distributed parameter values, enabling internal consistency of distributed hydrological modeling. Significant progress has been made toward this end (Tang et al., 2009; Khan et al., 2011; Thirel et al., 2013). Future work will consider using high-performance computing power (e.g., Laloy and Vrugt, 2012; Zhang et al., 2013) to understand how such information can enhance the hydrologic simulation at ungaged sites and reduce the calibration uncertainty of distributed hydrologic models in data-scarce regions.
The Supplement related to this article is available online at doi:10.5194/hess-19-857-2015-supplement.

Figure 2. Streamflow data usage for the model calibration and validation.

Figure 4. Model structure based on climate input grids and three different parameterization concepts.

Figure 5. (a) Sub-basins corresponding to five gaging stations are used for the multisite calibrations. (b) Two sub-basins (Kama and Asmar) are assumed to be ungaged and used for evaluating the calibration approaches.

Figure 6. Comparison of the stepwise and pooled calibrations under the semi-distributed parameterization. Each calibration is conducted 50 times. Values at the bottom represent expected values of NSE (upper row) and KGE (in parentheses, lower row) from 50 calibrations.

Figure 7. Comparison of the pooled calibrations for the three parameterizations: lumped, semi-distributed, and distributed. Each calibration is conducted 50 times. Values at the bottom represent expected values of NSE (upper row) and KGE (in parentheses, lower row) from 50 calibrations.

Figure 8. Comparison of the basin outlet calibrations for the three parameterizations: lumped, semi-distributed, and distributed. Each calibration is conducted 50 times. Values at the bottom represent expected values of NSE (upper row) and KGE (in parentheses, lower row) from 50 calibrations.

Figure 9. Coefficient of variation (CV) of 50 optimal values of Cmax (parameter for the soil moisture accounting module in the HYMOD_DS) from the basin outlet calibrations (left panel) and the pooled calibrations (right panel).

Figure 10. Historical and 2050s average monthly streamflow predictions at Dakah, Kama, and Asmar under four calibration strategies: Lump-Outlet, Dist-Outlet, Semi-Pooled, and Dist-Pooled. The error bars represent the streamflow ranges resulting from 50 trials of the HYMOD_DS calibration. For each of the 50 trials, the 2050s streamflow predictions are averaged over 36 GCM climate projections.

Figure 11. Uncertainties in wet and dry season average streamflow predictions for the 2050s, derived from the basin outlet and pooled calibrations for Dakah. Uncertainties are evaluated by the CV of average seasonal streamflow predictions. Three uncertainty sources are considered: calibration uncertainty across 50 calibration trials (Par), climate uncertainty across GCM projections (Clim), and combined uncertainty (Joint).

Figure 12. Comparison of GCM average 100-year daily flood events derived from the semi-distributed and distributed pooled calibrations. The uncertainty range is from 50 trials of the model calibration.

Figure 13. Uncertainties in 100-year daily flood estimates for the 2050s, assessed using the Semi-Pooled and Dist-Pooled calibrations. Uncertainties are evaluated by calculating the CV of the 2050s 100-year flood estimates under three uncertainty sources: calibration uncertainty across 50 calibration trials (Par), climate uncertainty across GCM projections (Clim), and combined uncertainty (Joint).

Table 1. Streamflow gaging stations in the Kabul River basin.
Velo    Wave velocity in the channel routing (m s⁻¹)    0.5    5
Diff    Diffusivity in the channel routing (m² s⁻¹)    200    4000