Exploring the physical controls of regional patterns of flow duration curves – Part 2: Role of seasonality, the regime curve, and associated process controls

The goal of this paper is to explore the process controls underpinning regional patterns of variations of streamflow regime behavior, i.e., the mean seasonal variation of streamflow within the year, across the continental United States. The ultimate motivation is to use the resulting process understanding to generate insights into the physical controls of another signature of streamflow variability, namely the flow duration curve (FDC). The construction of the FDC removes the time dependence of flows. Thus in order to better understand the physical controls in regions that exhibit strong seasonal dependence, the regime curve (RC), which is closely connected to the FDC, is studied in this paper and later linked back to the FDC. To achieve these aims a top-down modeling approach is adopted; we start with a simple two-stage bucket model, which is systematically enhanced through addition of new processes on the basis of model performance assessment in relation to observations, using rainfall-runoff data from 197 United States catchments belonging to the MOPEX dataset. Exploration of dominant processes and the determination of required model complexity are carried out through model-based sensitivity analyses, guided by a performance metric. Results indicated systematic regional trends in dominant processes: snowmelt was a key process control in cold mountainous catchments in the north and north-west, whereas snowmelt and vegetation cover dynamics were key controls in the north-east; seasonal vegetation cover dynamics (phenology and interception) were important along the Appalachian mountain range in the east. A simple two-bucket model (with no other additions) was found to be adequate in warm humid catchments along the west coast and in the south-east, with both regions exhibiting strong seasonality, whereas much more complex models are needed in the dry south and southwest. 
Agricultural catchments in the mid-west were found to be difficult to predict with the use of simple lumped models, due to the strong influence of human activities. Overall, these process controls arose from general east-west (seasonality) and north-south (aridity, temperature) trends in climate (with some exceptions), compounded by complex dynamics of vegetation cover and to a lesser extent by landscape factors (soils, geology and topography).


Introduction
This is the second paper of a 4-part series (the others being Cheng et al., 2012; Coopersmith et al., 2012; and Yaeger et al., 2012) that attempts to understand the physical controls on regional patterns of variations of signatures of streamflow variability, with a particular focus on the flow duration curve (FDC). Instead of directly exploring the FDC, a key frequency-based signature of daily streamflow variability, as in the first paper (Cheng et al., 2012), we approach it from a different perspective, exploring regional patterns of another signature of streamflow variability, the regime curve (RC), which denotes the mean seasonal variation of streamflow. This is motivated by a previous modeling study in hypothetical catchments by Yokoo and Sivapalan (2011), which suggested that the regime curve contains valuable information on the middle part of the FDC, serving as the bridge between the high and low flows at either end of the FDC, and that understanding the physical controls of the regime curve can assist in achieving the same regarding the FDC. An empirical study of the FDCs of 197 catchments across the United States presented by Cheng et al. (2012), as part of the present study, has provided empirical support to these model predictions.
Motivated by the findings of Yokoo and Sivapalan (2011) and Cheng et al. (2012), the goal of this study is to explore the process controls of regime behavior, i.e., seasonal variation of streamflow, through a comparative study of 197 catchments located across the continental United States, covering a range of climates and physiographic properties, and belonging to the MOPEX dataset. This is essentially a data-based study, assisted by process-based modeling. Instead of applying an existing model to all 197 catchments, the analysis involves systematic model development and assessment of model predictions and performance in comparison to observed data. This downward or top-down approach to model development (Klemeš, 1983; Jothityangkoon et al., 2001; Farmer et al., 2003; Sivapalan et al., 2003; Bai et al., 2009; Thompson et al., 2011) commenced with the development of a simple two-bucket model (hereafter referred to as the "base model"). This model was initially applied to all 197 catchments, and its performance assessed. Guided by alternative hypotheses regarding the reasons for the poor fits against regime curves estimated from observed streamflow data, the model was enhanced step by step through addition of new processes initially left out of the base model. Model development was continued until the model performance could not be improved any longer. The complete model was then utilized in sensitivity studies to decipher (a) the dominant process controls on the regime curve and (b) the minimum complexity of models (i.e., the mix of processes required) needed to achieve a satisfactory fit to the empirical regime curves. In this way we hope to develop an understanding of the process controls of the regime curves across the continental United States, and also the main climatic and landscape factors that contribute to the regional patterns of the process controls underpinning the regime curves.
The work presented in this paper is an exercise in comparative hydrology (Falkenmark and Chapman, 1989; Sivapalan, 2009), where the goal is to develop generalizable understanding through comparative analysis of rainfall-runoff data in catchments located along a climatic or other gradient. Instead of studying one catchment in considerable detail, the focus is on the use of simpler models to discover features or process controls that are similar or different amongst a population of catchments (Sivapalan et al., 2011). Finally, the assessment of catchment response is with respect to holistic signatures of catchment response (e.g., flow duration curves, regime curve, flood frequency curve etc.) and not in terms of detailed process descriptions. This Darwinian (Harte, 2002; Sivapalan et al., 2011) and functional (Black, 1996; Sivapalan, 2005; McDonnell et al., 2007; Wagener et al., 2007; Sawicz et al., 2011) approach to comparative data analysis and modeling is in contrast with much of the past research in catchment hydrology modeling, which has focused on developing predictive understanding in individual catchments on the basis of models based on individual processes or internal descriptions (Dooge, 1986). Such bottom-up approaches have been hampered by the inability to map the heterogeneity of subsurface pathways and process complexity. Extrapolation to and prediction of catchment responses across different places and a range of scales has remained a challenging problem. A synthesis of these two top-down and bottom-up approaches is possibly the key to developing new understanding and new theories of hydrologic responses at catchment scales. The present study is a step in this direction.
The paper begins, in Sect. 2, with information on the data used in the study and the methodology used to achieve its aims. This section presents in particular the outlines of the downward approach to model development adopted in the paper, and procedures for model calibration and model performance assessment. Section 3 presents an illustration of the model development exercise, using the results from nine selected example catchments. This is followed, in Sect. 4, by a comparative assessment of model performance to determine (a) the dominant process control of the regime curve for the entire population of catchments, (b) the minimum model complexity required to achieve satisfactory predictions of the regime curves, and (c) the manifestations of these process controls on the shapes of the FDCs. The results are summarized in the form of a schematic diagram. Section 5 summarizes the main conclusions of the study and recommendations for further research.

Data
This is a study in comparative hydrology and uses data from 197 catchments located across the continental United States belonging to the MOPEX dataset and spanning a variety of climates and physiographic regions, with over 50 yr of continuous daily climatic and flow data. Daily precipitation (P), temperature (T), and potential evaporation (PET) time series are used as climate inputs, while the daily flow data are used to generate regime curves (RCs), 50-yr averages of streamflow for each day of the year, which are used for model development, calibration and comparative performance assessment. The PET was calculated based on the NOAA Pan Evaporation Atlas (NOAA, 1982), where it was estimated using the Penman (1948) method, and the solar radiation required in the calculation was estimated from percent sunshine (Hamon et al., 1954). The mix of vegetation types for each catchment and the characteristic LAI (leaf area index) profiles for each vegetation type were obtained from the NASA Land Data Assimilation Systems (available at: http://ldas.gsfc.nasa.gov/nldas/NLDASmapveg.php). The composite LAI profile for each catchment, which is then used as input to the models, is calculated as the average of the monthly values for each vegetation type from the Mosaic vegetation dataset (University of Maryland (UMD) vegetation classification, with 14 classes in total), weighted by the area fraction of each vegetation type within the catchment.
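The area-weighted composite LAI described above can be illustrated with a minimal sketch. The function name `composite_lai` and the array layout (one row of 12 monthly values per vegetation class) are assumptions made for illustration and are not taken from the MOPEX or NLDAS data formats.

```python
import numpy as np

def composite_lai(monthly_lai_by_class, area_fraction):
    """Area-weighted monthly LAI profile for one catchment (illustrative).

    monthly_lai_by_class: array of shape (n_classes, 12), hypothetical layout
    area_fraction: fraction of catchment area covered by each class
    """
    lai = np.asarray(monthly_lai_by_class, dtype=float)
    w = np.asarray(area_fraction, dtype=float)
    w = w / w.sum()  # renormalise so the weights sum to one
    return w @ lai   # weighted average over classes, one value per month
```

For example, a catchment that is 25 % grassland with LAI 1 and 75 % forest with LAI 3 in every month would receive a composite LAI of 2.5.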
Nine example catchments, chosen from this dataset and spread across the country (from north to south, west to east, and dry to humid), are used to highlight the diversity of regime behaviors exhibited within the continental United States. They are also used to illustrate the systematic, downward approach to model development (Sivapalan et al., 2003) that is eventually implemented in the 197 study catchments. They are selected based on both their locations and their classes within the Köppen climate classification map. We can therefore consider them representative of the climate conditions under the regional similarity assumption (Merz and Blöschl, 2004; Patil and Stieglitz, 2011), even though they do not fully represent the country as a whole. Figure 1 presents the empirical regime curves of the nine selected catchments (estimated over the calendar year), which are located in the states of Washington (WA), Idaho (ID), New York (NY), California (CA; one catchment each in Northern and Southern California), Missouri (MO), Georgia (GA), Texas (TX) and Florida (FL). Regime curves are presented for P, PET, and total streamflow (Q), as well as the fast flow ($Q_f$) and slow flow ($Q_u$) components of measured streamflow. The fast flow and slow flow components were obtained by the baseflow separation algorithm of Lyne and Hollick (1979):
$$Q_f(t) = a\,Q_f(t-1) + \frac{1+a}{2}\left[Q(t) - Q(t-1)\right], \qquad Q_u(t) = Q(t) - Q_f(t),$$
where a is the filter parameter, which was set to 0.925 (Brooks et al., 2011). Since the hydrologic partitioning is not strongly sensitive to baseflow separation methods (Troch et al., 2009), we use this easily implementable algorithm for the baseflow separation in this study. The RCs for PET are similar in shape, with an almost sinusoidal variation and a consistent peak near the middle of the year, but their amplitudes differ significantly across the continental United States. For comparative purposes, the aridity index (AI), which is the ratio of annual PET to annual precipitation, is also noted in Fig. 1.
Individually, catchments near the east coast (NY, GA, FL) are relatively humid with AI < 1. In the north-east, e.g., NY, precipitation tends to remain constant throughout the year without much seasonality. In the south-east, rainfall seasonality increases north to south (GA, FL), with FL exhibiting strong precipitation seasonality that is almost in-phase with PET, due in part to the influence of the hurricane season. Consequently, while within-year variability of flows tends to decrease as we move from north to south along the east coast, the timing of peak flow shifts from March in NY to September in FL. As we move east to west in the north (NY, ID, WA), seasonality of precipitation increases, indeed becoming out of phase with PET (note ID and WA, which exhibit strong out-of-phase seasonality). NY and ID exhibit pronounced peak flows during spring not seen in the south, evidently due to snowmelt, whereas the catchment in WA experiences bimodal streamflow variability, with peaks in spring and again in winter. In the middle of the continental United States, the aridity index increases from the north (ID) to south (TX), with the seasonality of precipitation undergoing a significant transformation, culminating in a bi-modal distribution in TX (peaks in spring and again in autumn). In TX, because of high aridity, with PET > P over the entire year, there is hardly any streamflow observed. Catchments on the west coast are very diverse, although they all display a precipitation seasonality that is out of phase with PET. The Washington catchment has flow peaks not only in winter but also in spring (likely arising from mountain snowmelt), whereas the catchment in Northern California remains humid, exhibiting high flows due to strong winter precipitation that coincides with low PET but without the spring flow peak caused by snowmelt. In Southern California, although the climate is as dry as in Texas, there is spring streamflow due to the out-of-phase seasonality between precipitation and PET. Overall, the variability captured in the nine example catchments (presented in Fig. 1) provides a snapshot into the enormous spatio-temporal variability of climate and hydrology across the continental United States.
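As a rough illustration of how the fast and slow flow regime curves could be derived from a daily record, the sketch below applies a single forward pass of a Lyne and Hollick type recursive filter and then averages flows by day of year. The single pass, the clipping of fast flow to the range between zero and total flow, and the function names are assumptions made here for illustration; the number of filter passes used in the study is not stated in the text.

```python
import numpy as np

def lyne_hollick(q, a=0.925):
    """One forward pass of a recursive digital filter (illustrative sketch).

    Returns (fast flow, slow flow). Clipping fast flow to [0, q] is an
    assumption commonly made in practice, not a detail from the paper.
    """
    q = np.asarray(q, dtype=float)
    qf = np.zeros_like(q)
    for t in range(1, len(q)):
        qf[t] = a * qf[t - 1] + 0.5 * (1.0 + a) * (q[t] - q[t - 1])
        qf[t] = min(max(qf[t], 0.0), q[t])
    return qf, q - qf

def regime_curve(q, day_of_year):
    """Mean flow for each of days 1..365 across all years of record."""
    q = np.asarray(q, dtype=float)
    doy = np.asarray(day_of_year)
    return np.array([q[doy == d].mean() for d in range(1, 366)])
```

Applying `regime_curve` separately to the two filter outputs would give regime curves comparable in kind to the $Q_f$ and $Q_u$ curves of Fig. 1.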
Figure 2 shows the corresponding FDCs of the nine selected catchments, which are plotted as the sorted 50-yr daily streamflow against the frequency of occurrence. They indicate clear differences between the shapes of the FDCs of fast flow, $Q_f$ (which show significant ephemerality in all cases), and those of slow flow, $Q_u$, and total flow, Q. On the other hand, for each catchment, the FDCs of $Q_u$ and Q show strong similarities to each other. In spite of this, there are regional differences between the FDCs, with the nine catchments dividing into two groups, organized around the aridity index: TX and Southern CA exhibiting strong ephemerality of flows, and all of the remaining (more) humid catchments exhibiting similar FDCs, in spite of the strong differences in the timing of the within-year variability of climate and streamflow. In other words, much of the richness in the regime curves presented in Fig. 1 is lost in the FDCs, due to the fact that the timing of flows is ignored in the construction of the FDCs.
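The loss of timing information in the FDC can be seen directly from how such a curve is built. The sketch below is a minimal illustration; the Weibull plotting position i/(n+1) is an assumption, since the text does not state which plotting position was used.

```python
import numpy as np

def flow_duration_curve(q):
    """Return (exceedance frequency, flows sorted from high to low)."""
    q_sorted = np.sort(np.asarray(q, dtype=float))[::-1]
    n = len(q_sorted)
    exceedance = np.arange(1, n + 1) / (n + 1.0)  # assumed plotting position
    return exceedance, q_sorted
```

Because only the sorted values enter the curve, any reordering of the daily flows in time yields exactly the same FDC, whereas the regime curve would change.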

Downward approach to model development
We have already seen a glimpse into the enormous diversity of both regime behavior and FDCs, and the connections between the two. The main goal of this paper is the elucidation of the process controls underpinning regional patterns of variation of streamflow regime. To achieve this, we adopt a comparative modeling approach, using data from 197 catchments belonging to the MOPEX dataset, and representing strong gradients of climate (including aridity and seasonality), as well as soils, geology, topography and vegetation. The model development follows the downward approach pioneered by Jothityangkoon et al. (2001) and Farmer et al. (2003), and later reviewed by Sivapalan et al. (2003). Model development commences with a simple two-stage bucket model, which we call the base model. We initially apply the base model to all the study catchments, and attempt to obtain the best possible fits to the empirically derived regime curves using an automatic calibration algorithm. Since our motivation is to explore the first-order effects only, regime curves can provide sufficient information for this study. To keep it simple and robust, we use the regime curves estimated over the full length of record for the calibration.
Being a simple model, the base model is unlikely to be adequate in many catchments. In catchments where improved parameterization cannot improve the predictions, we incorporate additional processes that we hypothesize will be able to fill the gap between predictions and observed data. We then reapply the improved model to the study catchments, especially to catchments for which the previous model was found deficient, calibrate the parameters, assess the resulting improvements in model performance, and explore possible further improvements. We continue this process of model development until no further improvements can be obtained in model performance. Through this systematic assessment of model prediction, model updating, and model re-assessment, we use the model as a tool to explore the catchments' runoff characteristics. Note that the focus of the modeling is on comparative assessment across many catchments, and exploration of dominant process controls, and not on obtaining perfect fits to the observed streamflow hydrographs or quantifying model performances in detail for any given model or catchment.
The details of the base model, several model enhancements that were made to the base model as part of the downward approach outlined above, and the final complete model will be presented later in Sect. 3 together with the results of each improvement (Fig. 3). We next describe the approach adopted for model calibration and parameter estimation, and methods used to carry out comparative assessment of model performance as a way to elucidate dominant process controls and the minimum complexity required to reproduce observed regime behavior.

Parameter calibration and model performance assessment
The distillation of dominant processes from these 197 catchments and the heterogeneous features that describe them is accomplished in four parts. First, we must determine which parameters are required; this was achieved, as described in Sect. 2.2, by sequentially increasing model complexity, adding new processes until the model's performance is adequate. Second, these parameters must be automatically calibrated for all the 197 catchments; this is done via the Markov chain Monte Carlo (MCMC) algorithm, a tool designed to search a multidimensional parameter space more efficiently than brute force. Third, the model's performance must be assessed; this is done by a simple sum of squared errors between the observed and predicted regime curves. Finally, the performance of the various models for each catchment, containing differing numbers of parameters, must be compared, addressing the relative differences in complexity. This last step is managed with the use of the Akaike information criterion (AIC) (Akaike, 1974), which assesses the marginal value of each new parameter added. This section will discuss the last three parts: parameter calibration, model performance assessment, and process selection.

Parameter calibration and validation
The measured total streamflow was separated into fast flow and slow flow through the application of the baseflow separation algorithm of Lyne and Hollick (1979), and regime curves of both flows were calculated for the purpose of model performance assessment. The full model (i.e., the base model with all modifications) was applied to all 197 MOPEX catchments to simulate the regime curves of both the fast and slow flow; explicit Euler was used to solve the model equations; and model parameters were estimated through automatic calibration, by comparing the predicted streamflow regime curves to those estimated from observed data. We adapted the parameter estimation method from Harman et al. (2011), in what is called a naïve Bayesian model. Based on the fits obtained during model application, we assume that the errors associated with predicted fast flow and slow flow regime curves ($Q_f$, $Q_u$) are approximately normally distributed, i.e., $N[x|(\mu, \sigma^2)]$. We also assume $Q_f$ and $Q_u$ are normally distributed with their means as the values predicted by the model ($Q_f = f(P, PET, GSI, LAI, S_{b1}, t_w, \alpha, t_c)$, $Q_u = g(Q_w, PET, GSI, S_e, S_{b2}, t_u, t_c)$) with unknown variances ($\sigma_f^2$, $\sigma_u^2$). The likelihood function $L(X|\theta)$ of the observations $X = \{Q_{f1}, Q_{f2}, \ldots, Q_{fn}, Q_{u1}, Q_{u2}, \ldots, Q_{un}\}$, given the model $\theta = \{S_{b1}, t_w, \alpha, t_c, S_e, S_{b2}, t_u, \sigma_f^2, \sigma_u^2\}$ with P, PET, GSI as input, can be calculated as follows:
$$L(X|\theta) = \prod_{i=1}^{n} N\left[Q_{fi}\,|\,(f_i, \sigma_f^2)\right] \prod_{i=1}^{n} N\left[Q_{ui}\,|\,(g_i, \sigma_u^2)\right],$$
where $f_i$ and $g_i$ denote the model-predicted fast and slow flow on day i. The posterior likelihood function of the model based on Bayes' theorem is then
$$L(\theta|X) = \frac{L(X|\theta)\,L(\theta)}{L(X)},$$
where $L(\theta)$ is the prior distribution; since we do not have definite information about the variables, it is set to unity as a uniform prior distribution. $L(X)$ is the probability of the observations, although it is not necessary to evaluate it since the sampling method we use depends only on ratios of successive likelihoods, and so this term cancels.
We then employ the Metropolis algorithm (Metropolis et al., 1953; Kuczera and Parent, 1998). Starting with an optimum based on previous model development, we calculate the likelihood value for each randomly selected set of parameters ($\theta_{i+1}$) near the current parameter value ($\theta_i$). The new parameter set is accepted if it leads to a larger likelihood value ($L(X|\theta_{i+1}) > L(X|\theta_i)$), i.e., it helps predict the streamflow regime better than the previous set, and then a new search starts from the new set ($\theta_{i+1}$). However, there is the possibility that this set leads only to another local optimum. To reach the globally optimal parameter set, we therefore also accept a parameter set with a smaller likelihood if the ratio of the likelihood values $L(X|\theta_{i+1})/L(X|\theta_i)$ is larger than a uniform random value between zero and one. We run this algorithm to search for the next available parameter set that improves upon the largest likelihood, and save 500 samples in a chain. The algorithm is run twice to generate 1000 samples in total for each site. The parameter set with the largest likelihood was selected as optimal for the full model.
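The acceptance rule described above can be sketched as a generic random-walk Metropolis sampler. This is a textbook form under stated assumptions, not the authors' implementation: the Gaussian proposal, the step size `step`, the use of log-likelihoods for numerical stability, and the function name are all choices made here for illustration.

```python
import numpy as np

def metropolis(log_like, theta0, step, n_samples=500, seed=0):
    """Random-walk Metropolis with a uniform prior (illustrative sketch).

    log_like: function returning log L(X|theta) for a parameter vector
    Returns (chain, log-likelihoods, parameter set with largest likelihood).
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    ll = log_like(theta)
    chain, lls = [], []
    for _ in range(n_samples):
        # Propose a nearby parameter set (Gaussian proposal is an assumption).
        cand = theta + step * rng.standard_normal(theta.shape)
        ll_cand = log_like(cand)
        # Accept if the likelihood ratio exceeds a uniform draw in (0, 1];
        # improvements are therefore always accepted.
        u = 1.0 - rng.random()
        if np.log(u) < ll_cand - ll:
            theta, ll = cand, ll_cand
        chain.append(theta.copy())
        lls.append(ll)
    chain, lls = np.array(chain), np.array(lls)
    return chain, lls, chain[np.argmax(lls)]
```

Working with the difference of log-likelihoods is equivalent to comparing the ratio $L(X|\theta_{i+1})/L(X|\theta_i)$ with a uniform random number, and the normalising term $L(X)$ never needs to be evaluated.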
One of the advantages of a Bayesian framework is that we can estimate uncertainty (Bai et al., 2009; Harman et al., 2011): the upper and lower bounds are defined from the plot of likelihood and parameter values. For each catchment, the MCMC sampling yields a chain of likelihood values, which are added cumulatively starting from the smallest parameter value; the upper and lower bounds are then defined as the parameter values at which the cumulative sum of the likelihood values just exceeds 95 % and 5 % of the total, respectively. The relative error is calculated as half of the range between the upper and lower bounds, expressed as a percentage of the parameter value with the maximum likelihood. The median relative error presented in Table 1 is the median of this uncertainty among the catchments.
Since our goal is not to deliver precise predictions of the streamflow time series, but rather to gain a general understanding of first-order impacts of different processes on flow generation mechanisms along a climatic or other gradient, a qualitative validation, also called "scientific validation" (Biondi et al., 2012), suits our purpose better. Scientific validation can be used to identify integral processes for which the model should account, as well as to demonstrate the model's ability to adequately represent reality, since validation tests alone may not guard against equifinal solutions (Biondi et al., 2012). This is the essence of the downward approach to modeling, as outlined in Sect. 2.2, and it is this systematic model development procedure itself that helps to validate the importance of each remaining process. As a model could produce good results with a wide range of specific parameter values, to ensure that the model produces reasonable results with realistic parameters, the parameter set should be considered as a combined set (Freer et al., 1996). The Bayesian framework we used is able to find optimum parameter sets by giving greater weight to the better simulations. These parameter sets and predictions can then be chosen as more likely than others. In addition to the assessment of model hypotheses and parameters, a multi-criteria approach can also be used to verify model performance. In this work, we calibrate the parameters to optimize both the fast flow and slow flow simultaneously. This multi-objective check helps provide information regarding where individual subsystems or processes are significant in the catchments. For example, some processes may not affect the total discharge, but could influence the quantities of observed fast flow (Figs. 6 and 7). A multi-objective calibration enables us to detect those improvements in model performance that negatively affect the global discharge but are beneficial for characterizing the fast flow component and detecting the main control processes.

Performance assessment for the full model in all 197 catchments
Even with all modifications, the model is still relatively simple, and it is probable that even the full model may not be able to reproduce streamflow satisfactorily in catchments that have other, perhaps anthropogenic, factors dominating the flow generation mechanism. Therefore, after the calibration, we assessed the model performance for all 197 catchments and removed 45 catchments where the full model failed to generate adequate predictions. These were mostly located in the agricultural Midwest, many of them known to be dominated by tile drains or irrigation. Different catchments have distinct flow characteristics (i.e., the magnitude and the variability of the flow). To compare the performance among catchments, the model predictions are then assessed through the use of a performance indicator, the mean square error (MSE) estimated on the standardized flows (separately for both fast and slow flows) as follows:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(SQ_{obs,i} - SQ_{sim,i}\right)^2, \qquad (4)$$
where $SQ_{obs}$ and $SQ_{sim}$ are the standardized values of the observed and simulated flows, and N is the length of the data. Both flows are standardized by the observed mean and standard deviation to remove the influence of the differences in flow characteristics:
$$SQ = \frac{Q - \mathrm{mean}(Q_{obs})}{\mathrm{std}(Q_{obs})}, \qquad (5)$$
where Q represents the time series of flows (observed for $SQ_{obs}$ or model-predicted for $SQ_{sim}$), $Q_{obs}$ the time series of observed flow, $SQ_{obs}$ the standardized observed flow, and $SQ_{sim}$ the standardized simulated flow; both $SQ_{obs}$ and $SQ_{sim}$ are represented by SQ in the equation as they are calculated in the same way. The summation in Eq. (4), and the mean and standard deviation in Eq. (5), are taken over days 1-365, considering that we are dealing with the regime curve only.
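A minimal sketch of this performance indicator follows. The function name is hypothetical, and the use of the population standard deviation (NumPy's default) is an assumption, since the text does not say which estimator was used.

```python
import numpy as np

def standardized_mse(q_obs, q_sim):
    """MSE between regime curves standardized by the observed mean and std."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    mu, sd = q_obs.mean(), q_obs.std()  # observed statistics only
    sq_obs = (q_obs - mu) / sd
    sq_sim = (q_sim - mu) / sd          # same scaling for simulated flow
    return np.mean((sq_obs - sq_sim) ** 2)
```

Because both series are scaled by the same observed statistics, the indicator does not change when a catchment's flows are multiplied by a constant, which is what allows comparison across catchments with very different flow magnitudes.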

Process selection for catchments with satisfactory prediction by full model
For the catchments classified as satisfactory, we assume the full model captures the dominant processes in those catchments. For each well-modeled catchment, we then performed comparative assessments of the models using different combinations of the four modified processes identified through the model's development. The comparative assessment is carried out to determine (a) dominant processes that contributed most to the reproduction of the observed regime curves and (b) the minimum model complexity (i.e., the number and type of model enhancements needed to be added to the base model to reproduce the observed regime curve). The Akaike information criterion (AIC) is used to perform this comparative performance assessment (Akaike, 1974). The AIC is a statistical metric often used to measure the relative goodness of fit of models by generating a measure of information loss, and is used in model selection to choose the candidate model that minimizes information loss. Recently, it has also been used to assess needed model complexity to achieve the required quality of model predictions (Engelhardt et al., 2012). The smaller the AIC value, the less information is lost, and the better the model. Assuming, for simplicity, a Gaussian distribution for the streamflow, we can estimate the AIC using the following expression: where n is the sample size (i.e., in this case 365 days as we resolve the regime curve on a daily basis) and k is the number of parameters used in each model. The difference between the AIC of the base model prediction and the AIC of the model prediction after each model enhancement, e.g., $\Delta AIC_1 = AIC_0 - AIC_1$, is used as a measure of the improvement in model performance. Comparative assessments of the model performance after the addition of each process enhancement at the first level can be used to determine the dominant process, i.e., the one process that helps most to improve the prediction in comparison to that of the base model. Similarly, the required minimum model complexity is inferred also through the use of the AIC, when it can be determined that the addition of a particular process enhancement does not lead to significant improvement in model performance.
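To make the comparison concrete, the sketch below uses a common least-squares form of the AIC under Gaussian errors, AIC = n ln(MSE) + 2k. This specific expression is an assumption made for illustration and may differ from the one used in the study; the function names are likewise hypothetical.

```python
import numpy as np

def aic_gaussian(mse, n, k):
    """Least-squares AIC under Gaussian errors (assumed form): n ln(MSE) + 2k."""
    return n * np.log(mse) + 2 * k

def delta_aic(mse_base, k_base, mse_new, k_new, n=365):
    """AIC of the base model minus AIC of the enhanced model.

    Positive values mean the enhancement improves the fit by more than
    the penalty incurred for its additional parameters.
    """
    return aic_gaussian(mse_base, n, k_base) - aic_gaussian(mse_new, n, k_new)
```

Under this form, adding one parameter without reducing the error gives a difference of -2, so an enhancement is only favoured when the reduction in MSE outweighs the parameter penalty.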

Illustrative results: progression of model development
In this section we present the detailed development and results of the model enhancement process, including the thought processes involved in making the model choices. In this presentation, we focus on bringing out the process controls of the streamflow regime curve in qualitative terms, using some of the nine catchments presented in Figs. 1 and 2 as examples.

Base model
Yokoo and Sivapalan (2011) suggested, in terms of reproducing the flow duration curve, that a catchment's streamflow response can be partitioned into two different components: fast flow (e.g., surface streamflow processes whose variability directly reflects that of event precipitation), and slow flow (e.g., subsurface flow whose variability reflects the strong filtering of precipitation variability by flow pathways with significantly longer residence times, and is therefore reflected in the catchment's regime curve).
Guided by this thinking, we start with a nonlinear, six-parameter model operating as a two-stage filter, with two buckets arranged in series and simulating both fast flow and slow flow and their interactions (Fig. 3). In the first stage, precipitation events are filtered nonlinearly into fast streamflow and soil wetting (infiltration to deeper soil). In the second stage, the infiltrated water is filtered (somewhat more linearly), governed by the competition between topographically-driven subsurface drainage and vegetation-driven evapotranspiration. In terms of streamflow generation, the first bucket is treated as an overflow bucket, whereas the second is treated initially as a leaky bucket (with no overflows). Each of the two filters (buckets) is assigned a storage capacity (i.e., $S_{b1}$ and $S_{b2}$, respectively, although in the base model $S_{b2}$ is not invoked) and associated characteristic response times (i.e., $t_w$ and $t_u$, respectively). The second (deeper) bucket is also assigned a root zone storage capacity (i.e., $S_e$) that is used in the prediction of transpiration. Two more buckets are added to route the fast flow and slow flow components separately. In reality, once the fast flow and slow flow enter the channel, both flows are routed together. However, since we are not aiming to predict the hydrograph or peak flow precisely, but rather, to appropriately predict regime behavior, such a technique is acceptable for our purposes. Because the drainage area of these catchments varies from hundreds of square kilometers ($10^2$ km$^2$) to tens of thousands of square kilometers ($10^4$ km$^2$), these two routing buckets are used to introduce lag time for the flow and to attenuate the variability to obtain a smoother regime curve more closely resembling the observed regime curve. A sixth parameter (i.e., $t_c$) is used here to represent the lag times introduced by flow routing in the river network.
The water balance equations for the two storage buckets are as follows:
$$\frac{dS_1}{dt} = P - Q_{1f} - Q_w - ET_1, \qquad \frac{dS_2}{dt} = Q_w - Q_{2u} - ET_2,$$
where $S_1$ and $S_2$ are the water storage in the first stage and second stage, P the precipitation, $Q_{1f} = (S_1 - S_{b1})/\Delta t$ the saturation excess streamflow from the first bucket, $Q_w = S_1/t_w$ the wetting (infiltration) into the second bucket, $ET_1 = PET\,(S_1/S_{b1})$ the evapotranspiration from the first bucket, $Q_{2u} = S_2/t_u$ the subsurface drainage from the second bucket, and $ET_2 = PET\,(S_2/S_e)$ the evapotranspiration from the second bucket. The water balance equations for the two stream routing buckets are as follows:
$$\frac{dS_{c1}}{dt} = Q_{1f} - Q_f, \qquad \frac{dS_{c2}}{dt} = Q_{2u} - Q_u,$$
where $S_{c1}$ is the water storage in the river network from the first bucket, $Q_f = S_{c1}/t_c$ the fast flow at the catchment outlet after stream routing, $S_{c2}$ the water storage in the river network from the second bucket, and $Q_u = S_{c2}/t_c$ the slow flow at the catchment outlet after stream routing. The parameter $t_c$ is the mean residence time, i.e., the catchment-scale-averaged time raindrops need to travel from hillslope to catchment outlet. It relates to the drainage area, river network structure, topographic gradient, etc.; however, in this paper we estimate it through calibration. Although these runoff components are treated separately because of their distinct generation mechanisms and flow paths, they are routed together in the network once they enter river channels; thus we use the same mean residence time parameter for both fast flow and slow flow. This base model works well in humid catchments that exhibit strong seasonality (Fig. 4) such as those found in Northern CA, WA, and FL. In this case, there was little enhancement needed in spite of the fact that these are vegetated catchments. Since precipitation is a main driver of the model, it is reasonable to say that the model works well in catchments whose streamflow response follows a pattern similar to that of the precipitation.
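The structure of the base model can be illustrated with a short explicit Euler sketch built from the flux definitions given above. Several details are assumptions made here rather than taken from the paper: a daily time step `dt`, overflow occurring only when storage exceeds $S_{b1}$, evapotranspiration ratios capped at one, storages kept non-negative, and zero initial storage. The function name and argument order are hypothetical.

```python
import numpy as np

def run_base_model(p, pet, s_b1, t_w, s_e, t_u, t_c, dt=1.0):
    """Two storage buckets in series plus two routing buckets (sketch).

    p, pet: daily precipitation and potential evaporation series
    Returns routed fast flow and slow flow series (q_f, q_u).
    """
    n = len(p)
    s1 = s2 = sc1 = sc2 = 0.0
    q_f, q_u = np.zeros(n), np.zeros(n)
    for i in range(n):
        q_1f = max(s1 - s_b1, 0.0) / dt     # overflow of the first bucket
        q_w = s1 / t_w                      # wetting of the second bucket
        et1 = pet[i] * min(s1 / s_b1, 1.0)  # ET from the first bucket
        q_2u = s2 / t_u                     # subsurface drainage
        et2 = pet[i] * min(s2 / s_e, 1.0)   # ET from the second bucket
        q_f[i] = sc1 / t_c                  # routed fast flow
        q_u[i] = sc2 / t_c                  # routed slow flow
        # Explicit Euler updates of the four storages.
        s1 = max(s1 + dt * (p[i] - q_1f - q_w - et1), 0.0)
        s2 = max(s2 + dt * (q_w - q_2u - et2), 0.0)
        sc1 += dt * (q_1f - q_f[i])
        sc2 += dt * (q_2u - q_u[i])
    return q_f, q_u
```

With constant rainfall and no evaporation, the sketch settles to a steady state in which the routed outflow equals the rainfall input, which is a convenient check on the water balance.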

Modification 1: snowmelt
The base model worked well in many humid catchments that exhibited strong seasonality (e.g., catchments in Northern CA and Florida, Fig. 1). However, it failed in over half of the catchments, many of which were the northern, colder catchments. As seen in Fig. 1, most of these catchments (e.g., WA, ID, NY) experience sharp peak flows in spring. Considering the temperatures at this time of the year, a plausible reason for this is snowmelt, especially in ID and NY. Winter precipitation at these latitudes, especially in mountainous regions, typically falls as snow, which accumulates on the ground during the winter months and remains there until spring, when temperatures increase and the snowpack melts. To improve the model in these catchments, we added a simple snowmelt component to the base model using the degree-day factor method (e.g., Eder et al., 2003), based on available mean daily air temperatures. The snowmelt component involves the following quantities: $S_n$ is the storage in the snowpack, $P_s$ the precipitation in the form of snow, $P_r$ the precipitation in the form of rain, $Q_n$ the snowmelt, $T_{crit}$ the snow-rain transition temperature (assumed here as 0 °C), ddf (1.5 mm day$^{-1}$ K$^{-1}$) the degree-day factor, and $H_{pos}$ the temperature excess over the critical temperature, used in combination with the degree-day factor as a surrogate for the driving forces of snowmelt. Figure 5 compares the predictions of the base model with those of the enhanced model that included the snowmelt component for the catchment in Idaho. The results show that the enhanced model leads to a dramatic improvement in the ability to predict streamflow timing, duration and magnitude, even though the snowmelt enhancement is rather parsimonious. On the other hand, the catchment in NY (no figure is presented, for the sake of brevity) required further modifications to reproduce the observed regime curves.
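The degree-day bookkeeping described above can be sketched as follows. This is only one plausible reading: the rule that all precipitation is snow when the air temperature is at or below $T_{crit}$, and the limit that melt cannot exceed the snowpack storage, are assumptions of this example, as is the function name.

```python
def snow_step(Sn, P, T, T_crit=0.0, ddf=1.5, dt=1.0):
    """One daily update of a degree-day snow store.

    Sn: snowpack storage (mm), P: precipitation (mm/day), T: air temperature (deg C).
    Returns the new snowpack storage, the rain Pr and the snowmelt Qn.
    """
    Ps = P if T <= T_crit else 0.0        # snow when at or below T_crit (assumption)
    Pr = P - Ps                           # remainder falls as rain
    H_pos = max(T - T_crit, 0.0)          # temperature excess over T_crit
    Qn = min(ddf * H_pos, Sn / dt)        # melt limited by available snow (assumption)
    Sn_new = Sn + (Ps - Qn) * dt
    return Sn_new, Pr, Qn


cold_day = snow_step(10.0, 4.0, -2.0)     # accumulation, no melt
warm_day = snow_step(10.0, 4.0, 2.0)      # rain plus 3 mm of melt
thin_pack = snow_step(2.0, 0.0, 5.0)      # melt capped by the 2 mm in storage
```

In a coupled model, the rain and the melt would together replace $P$ as the input to the first bucket.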

Modification 2: subsurface-influenced fast flow
With the incorporation of the snowmelt component, the model was able to capture the flow peak during late spring and early summer that was caused by snowmelt. It performed well in the northern mid-western mountainous catchments (e.g., ID, WY), but continued to under-estimate the fast flow during late winter and early spring in the northeastern catchments (e.g., NY), where snowmelt was significant, and also in southeastern catchments (e.g., GA, VA), which exhibit low seasonality of precipitation and little or no snowmelt impact. The rainfall during this period is similar to the rainfall experienced in summer but generates much larger streamflow, and this non-linear rainfall-runoff response could be related to the high water table (Lana-Renault et al., 2007; Li et al., 2011). These studies have shown that, during the wet season, the hydrological response can depend more on the water table level than on the precipitation characteristics (depth and intensity) alone. As the water table rises, the dominant flow generation mechanism would then switch from infiltration excess to saturation excess. Analysis of internal dynamics based on model predictions (not presented here for brevity) showed that the under-estimation of fast flow during spring was accompanied by large amounts of water stored in the second bucket, suggesting that water that would otherwise overflow to the river is being kept in storage because the second bucket has no overflow mechanism. This may explain the underestimation of fast flow during spring, when PET and ET are small.
As a result, an overflow mechanism that mimics saturation excess-induced fast flow (albeit in a somewhat conceptual or qualitative manner) was introduced, in which $Q_{2f}$ is the overflow from the second bucket and $S_{b2}$ is the threshold storage capacity of the second bucket.
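A minimal sketch of such a threshold overflow is given below. It assumes, by analogy with the first bucket's saturation excess term, that any storage above $S_{b2}$ is released within one time step; the function name and this exact functional form are assumptions of the example, not taken from the text.

```python
def second_bucket_overflow(S2, Sb2, dt=1.0):
    """Overflow Q2f from the second bucket when storage exceeds the threshold.

    Assumed form: excess storage above Sb2 is released within one time step.
    """
    return max(S2 - Sb2, 0.0) / dt


wet_season = second_bucket_overflow(120.0, 100.0)   # storage above threshold
dry_season = second_bucket_overflow(80.0, 100.0)    # storage below threshold
```

With this term, the second bucket behaves as a leaky bucket most of the time and as an overflow bucket only when it is nearly full, which is what allows similar rainfall to produce more fast flow in the wet season.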
To illustrate the impact of this process, we applied the model to a catchment with little snowmelt influence. Figure 6 compares model predictions in GA between the base model (with snowmelt included) and an enhanced model that included both the snowmelt and the subsurface-influenced fast flow components. The results show that this enhancement indeed helped to increase fast flow during winter and early spring, but the model still over-estimated the fast flow during the summer and autumn seasons, and the underestimation of slow flow was not improved. The improvement due to this component is less significant than the improvement due to the snowmelt component in the ID mountainous catchment, because there snowmelt is the dominant streamflow generation mechanism and, as such, was able to transform both the timing and magnitude from precipitation to streamflow. In GA, other processes such as interception loss and phenology are all important in streamflow generation, and the streamflow regime curves follow the trend of the precipitation regime curves, which has already been captured by the base model. The addition of these other processes helps to adjust the peak flows rather than dramatically alter both the timing and magnitude of the streamflow, as snowmelt did in the ID catchment. However, this does not mean that subsurface-influenced fast flow is unimportant; as we will show in Sect. 3.5, the combination of all three processes considerably improves the estimation of both streamflow timing and magnitude.

Modification 3: interception loss
Although the incorporation of the subsurface-influenced $Q_f$ helped improve the fast flow prediction during late winter and early spring, we still tended to over-estimate the magnitude of $Q_f$ for most of the year. This was especially evident in several humid catchments where the seasonality of precipitation is not significant (e.g., no snow and precipitation uniform throughout the year) and vegetation cover variability is the strongest controlling factor. In these catchments, $Q_f$ tended to be over-estimated during the growing season (from late spring to autumn), when vegetation cover approaches its maximum value. Since catchments on the east coast have dense vegetation cover, the overestimation of surface flow during the growing season and the underestimation during the non-growing season could be caused by the presence of vegetation. One of the effects of vegetation on the water cycle is canopy interception (Savenije, 2004). It has been shown that interception can have a significant impact on the water cycle (Beven, 2001; Savenije, 2004); evaporation from intercepted water may reach 35 % of total rainfall in wet catchments and over 40 % in dry areas (Calder, 1990). This influence can then affect the infiltration, antecedent soil moisture, and runoff generation (Keim et al., 2006). Given the high proportion of vegetation cover in these catchments, the interception mechanism should not be ignored. Therefore, to reduce the overestimation of surface flow during the growing season, we added an interception loss component $I$ that depends on the normalized leaf area index, where $\alpha$ is the fraction of precipitation that is intercepted (a model parameter, to be estimated by calibration), LAI the remotely sensed estimate of the leaf area index, and LAI$_{max}$ the annual maximum of the LAI, used to normalize the LAI time series.
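One way to read the interception component described above is as the product of the calibrated fraction, the normalized LAI, and the precipitation. The sketch below uses that multiplicative form, which is an assumption of this example (as are the function and argument names); the source only states the roles of $\alpha$, LAI and LAI$_{max}$.

```python
def interception_loss(P, lai, lai_max, alpha):
    """Interception loss I (mm/day), assumed here to be alpha * (LAI / LAI_max) * P."""
    return alpha * (lai / lai_max) * P


# Same rainfall, canopy at half versus full development (illustrative numbers)
spring = interception_loss(P=10.0, lai=2.0, lai_max=4.0, alpha=0.3)
summer = interception_loss(P=10.0, lai=4.0, lai_max=4.0, alpha=0.3)
net_summer_rain = 10.0 - summer    # rainfall left to reach the first bucket
```

Under this form the loss peaks when the canopy is fully developed, which is the season in which the fast flow was being over-estimated.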
Figure 7 compares the predictions of the model enhanced with both the snowmelt and the subsurface-influenced fast flow components with those of a further enhanced model that includes snowmelt, subsurface-influenced fast flow, and interception loss. The results show that the incorporation of canopy interception helps reduce the fast flow magnitude throughout the year and slightly increases the slow flow during winter and early spring, but the model is still not able to capture the strong seasonality in the flow.

Modification 4: phenology
In several catchments where the intra-annual variability of precipitation is relatively small, the seasonality of flow is nevertheless much stronger than that of precipitation. The incorporation of the interception loss reduced the fast flow magnitude without differentiation, but was not able to increase the seasonality in the flow; the model continued to underestimate the spring flow peak of both fast and slow flow components. This is even more pronounced in some semi-humid and humid catchments (e.g., GA, VA), where rainfall arrives year-round without significant seasonality, as illustrated by GA in Fig. 1. We attribute this discrepancy to the growth cycle of vegetation and its impact on both interception and transpiration. Therefore, we applied a correction to the PET data using a growing season index (GSI) (Thompson et al., 2011) in order to improve the estimates of actual evapotranspiration and account for the effects of these plant water-use patterns, i.e., phenology. The phenology-corrected PET, denoted as PET$_c$, is estimated using two threshold temperatures, $T_{min}$ and $T_{max}$, which were originally proposed as minimum and maximum threshold soil temperatures of −2 °C and 5 °C (Jolly et al., 2005) to cover a large range of species. Here, we approximate them by air temperatures of −5 °C and 10 °C (Thompson et al., 2011) because soil temperatures are not available.
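To make the role of the two thresholds concrete, the sketch below scales PET by a temperature-only index that rises linearly from 0 at $T_{min}$ to 1 at $T_{max}$. This linear ramp and the multiplicative correction are assumptions made for illustration; the growing season index of Jolly et al. (2005) also involves other drivers, and the exact form used in the paper is not given in this section.

```python
def temperature_gsi(T, T_min=-5.0, T_max=10.0):
    """Temperature-based index in [0, 1]; linear ramp between thresholds (assumed)."""
    if T <= T_min:
        return 0.0
    if T >= T_max:
        return 1.0
    return (T - T_min) / (T_max - T_min)


def phenology_corrected_pet(PET, T):
    """PET_c, assumed here to be PET scaled by the temperature index."""
    return PET * temperature_gsi(T)


winter_pet = phenology_corrected_pet(1.0, -8.0)   # dormant vegetation
spring_pet = phenology_corrected_pet(4.0, 2.5)    # halfway up the ramp
summer_pet = phenology_corrected_pet(5.0, 22.0)   # fully active vegetation
```

Suppressing PET in the cold months leaves more water in storage for drainage, which is the mechanism by which the correction raises winter and spring flows.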
Figure 8 compares the predictions of the base model with snowmelt, subsurface-influenced fast flow, and interception added to it with those of an enhanced model that incorporates phenology as well. The introduction of the growing season index (GSI) affects both $Q_f$ and $Q_u$, increasing them substantially during winter and spring, when transpiration from the vegetation is much smaller. This can also be seen in the simulated ET: for the model without phenology, ET closely follows PET during winter (since there is no restriction on water availability during this period), whereas for the enhanced model ET is much lower from November to April, which substantially increases both the slow flow and the fast flow. With these three modifications, the model now performs well in these forested catchments. As a result, we reach the final and complete model resulting from the four different enhancements presented above (Fig. 3), whose water balance equations for the two complete hillslope buckets combine the base-model terms with these additions.

4 Comparative model performance assessment

Performance of complete model across study catchments
The key aim of this paper is to use the complete model developed through the downward approach above to explore (a) the dominant process controls that underpin the magnitude and timing of the regime curve, and (b) the minimum model complexity, in terms of the mix of processes, needed to reproduce the observed regime curves. Before we embark on this exploration, which is the subject matter of this section, we need to reassure ourselves that the complete model is sufficient for these purposes. For this reason we assessed the quality of model predictions on the basis of the MSE for normalized flows (see Eq. 4). Simulations with the full model showed that the 50-yr averaged fast flow and slow flow regime curves fitted the corresponding empirical regime curves well in the eastern and western catchments, but failed in several mid-western catchments (e.g., Iowa) and also in extremely dry catchments in Oklahoma and Texas. Catchments in the southwest (TX, OK) are very dry, with aridity indices exceeding 1.5. The primary vegetation cover is grassland, and rivers are ephemeral: there can be as few as one flow event during the entire year. Catchment responses in these areas were found to be much more difficult to predict with simple lumped models than those of the humid, more forested catchments in the east or the highly seasonal catchments on the west coast. Another area where the complete model did not produce good predictions is the Midwest (especially catchments in Iowa), where the dominant vegetation cover is agricultural and anthropogenic effects related to agricultural water extractions cannot be ignored. For example, in the Raccoon River catchment in Iowa, subsurface (i.e., tile) drainage is estimated to cover over 40 % of the area (Zucker and Brown, 1998). Additionally, there appears to be considerable human-induced water extraction (Hatfield et al., 2009). These human activities have significantly altered the hydrologic response, which our simple model is not yet able to address.
Being simple, the model cannot be expected to accommodate anthropogenic activities or very complex catchments; therefore, we need to eliminate the catchments where the model performs poorly. To ensure that the model captures the dynamics as well as the volume of the streamflow, we use the MSE as our criterion. The decomposition of the MSE (or Nash-Sutcliffe efficiency) shows that the MSE consists of three components: the mean, the variance and the correlation coefficient (Gupta et al., 2009). However, as the error is scaled by the standard deviation, it can be problematic for comparisons amongst catchments. To avoid this, we standardized the flow before the MSE calculation. We selected the 90 % of the catchments with the lowest MSE in fast flow, slow flow, and total flow separately, and then took the intersection of these three sets to determine the catchments that had the lowest MSE in fast flow, slow flow, and total flow simulation simultaneously. The resulting 152 catchments were then considered "satisfactory" catchments, and the regional breakdown of the MOPEX catchments into "satisfactory" and "not satisfactory" is presented in Fig. 9. Note that the "not satisfactory" catchments are left out of the comparative performance assessments presented next.
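The screening procedure described above can be sketched as follows. The standardization shown (subtracting the observed mean and dividing by the observed standard deviation for both series) is an assumption, since the paper's Eq. 4 is not reproduced in this section; the function names and the toy catchment identifiers are likewise hypothetical.

```python
from statistics import mean, pstdev


def standardized_mse(obs, sim):
    """MSE after standardizing both series by the observed mean and std (assumed)."""
    mu, sd = mean(obs), pstdev(obs)
    return mean(((o - mu) / sd - (s - mu) / sd) ** 2 for o, s in zip(obs, sim))


def best_fraction(mse_by_catchment, fraction=0.9):
    """Catchment IDs belonging to the given fraction with the lowest MSE."""
    ranked = sorted(mse_by_catchment, key=mse_by_catchment.get)
    return set(ranked[: int(round(fraction * len(ranked)))])


def satisfactory(mse_fast, mse_slow, mse_total, fraction=0.9):
    """Catchments in the best fraction for fast, slow and total flow at once."""
    return (best_fraction(mse_fast, fraction)
            & best_fraction(mse_slow, fraction)
            & best_fraction(mse_total, fraction))


# Toy example with ten hypothetical catchments c0..c9
mse_fast = {f"c{i}": float(i) for i in range(10)}        # c9 is worst
mse_slow = {f"c{i}": float(10 - i) for i in range(10)}   # c0 is worst
kept = satisfactory(mse_fast, mse_slow, mse_fast)
```

Taking the intersection means that a catchment is dropped if it falls in the worst 10 % for any one of the three flow components, so fewer than 90 % of the catchments generally remain.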

Regional distribution of model parameters
In the rest of the analysis, we focus on the catchments in which the complete model generated satisfactory regime curves for both fast flow and slow flow. Table 1 presents an overview of the parameters in all the "satisfactory" catchments: the mean, minimum, maximum, standard deviation and median relative error. The median relative error is lowest, close to 10 % (11.5 %), for the second bucket capacity ($S_{b2}$); around 20 % for interception loss ($\alpha$) and the subsurface flow drainage time scale ($t_u$); around 30 % for the mean residence time associated with river network routing ($t_c$), the characteristic time scale of wetting ($t_w$) and the first bucket capacity ($S_{b1}$); and 46 % for the root zone soil moisture capacity ($S_e$). Given the simplicity of the model and the large variations between catchments, this is deemed acceptable.
The average values of the key parameters are presented in detail in Table 2 for three catchment groups (eastern US, central US, and western US) to give an impression of the regional distribution of these parameters. The eastern catchments are located near the east coast and within the Appalachian mountain region, while the western catchments are those located on the west coast and in the Rocky Mountains area; the remainder of the catchments forms the central US group (after removal of catchments deemed "not satisfactory"). These results should be considered indicative only, given the conceptual nature of the models and the relative parsimony of the model structures used.
Table 2 shows that interception loss as a fraction of precipitation ($\alpha$), which has a significant impact on the water balance, especially on evaporation (Liu, 1997), lies in the 20-30 % range. Generally, it is larger on the east coast, where vegetation is dense, and smaller in the dry catchments in the west and south-west (e.g., Texas and Southern California). This is consistent with what would be expected: forests are believed to be able to intercept more rainfall than grasslands (Deguchi et al., 2006), while coniferous forests tend to retain more rainfall than broad-leaved forests (Marin et al., 2000). Average storage capacities of the first (surface) bucket ($S_{b1}$) do not exhibit significant differences between the three regions. On the other hand, capacities of the second (subsurface) bucket ($S_{b2}$) show considerable variation: the mean value of $S_{b2}$ in eastern US catchments is smaller than that in the central US, which is in turn smaller than that in the west, suggesting effectively deeper soils as we move towards the west and southwest. The root zone soil moisture capacity, $S_e$, is small in north-eastern catchments and in some southern mountainous catchments, reflecting the presence of thin soils and shallow-rooted trees. Root zone storage capacity turns out to be highest in central parts of the continental United States, reflecting deeper soils and deep-rooted vegetation. This increasing trend of soil moisture capacity from east to west may be related to climate seasonality (Samuel et al., 2008): in the eastern, humid catchments, where rainfall arrives throughout the year, the low moisture storage capacity and higher slopes help to drain this water quickly, leading to a smaller quantity of storage; in the center of the continental United States, with moderate seasonality and flat topography, the Midwestern catchments are usually characterized by deep soils and stronger soil moisture retention characteristics (Endres et al., 2001; McIsaac et al., 2010); near the west coast, due to the strong seasonality in $P$, which is out of phase with PET, the soil moisture tends to accumulate during the wet season, leading to higher overall storage. The characteristic time scale of wetting ($t_w$) is longest in the east, shorter in the west, and shortest in the central US. This trend is opposite to that of the subsurface flow drainage time scale ($t_u$). This presumably reflects the effects of soil permeability and topographic slope, which show a similar regional pattern with respect to $t_u$. This is consistent with the findings of McGuire et al. (2005) in seven catchments with diverse geologic and geomorphic conditions: rather than to basin area, the residence time is strongly related to terrain indices representing flow path distance and gradient. The mean residence time associated with river network routing ($t_c$) is a function of topographic slope and drainage area: the larger the drainage area and the flatter the topography, the longer the network residence time. In any case, the $t_c$ values are much smaller than those of the subsurface flow residence time, $t_u$. Since we are mainly concerned with the regime curve, the magnitudes of $t_w$ and $t_c$ are too small to have any impact on the streamflow regime curve, whereas the magnitude of $t_u$ is highly critical.

Elucidation of dominant processes for fast flow and slow flow
Having completed the modeling of all 197 catchments, we then sought to identify which of the four process modifications we made to the base model contributed most to improving the model performance. To do so, we ran the base model with each one of the process enhancements, one by one, while maintaining all remaining model parameters at their previously calibrated values. For presentation purposes, we denote the base model and the four subsequent levels of addition by the names M0 to M4, where the numbers (0-4) refer to the number of processes added to the model. We then use the letters P, I, S, and G to specify the added process: phenology, interception, snowmelt, or subsurface-influenced fast flow, respectively. For example, M1P is a Level 1 model, i.e., the base model plus phenology, and M3PIS is a Level 3 model, i.e., the base model plus phenology, interception and snowmelt. We estimated the AIC for the base model (AIC$_0$) and the AICs for each of the four Level 1 models (AIC$_{1P}$, AIC$_{1I}$, AIC$_{1S}$, and AIC$_{1G}$), along with the corresponding reductions in AIC ($\Delta$AIC$_{1P}$, $\Delta$AIC$_{1I}$, $\Delta$AIC$_{1S}$, and $\Delta$AIC$_{1G}$). Based on assessments of the performance of the four Level 1 models (M1), the process addition that leads to the largest improvement in model performance (i.e., in relation to the base model) is deemed the dominant process. For example, in the Idaho catchment (Figs. 1 and 5), $\Delta$AIC$_{1S}$ turned out to be largest, on the basis of which we conclude that snowmelt is the dominant process in this catchment.
Note that if, in a particular catchment, none of the processes contributed to a decrease in AIC through its addition to the base model, or if the reduction is too small (e.g., less than 3 %), we consider the base model to be sufficient. The latter means that the magnitude of precipitation and its seasonality are the main or dominant controls on the regime curve, and the roles of vegetation, temperature and topography are second-order effects that can be left out of any initial model simulations.
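The selection rule described above can be sketched as follows. The least-squares form of the AIC used here, $n \ln(\mathrm{MSE}) + 2k$, is an assumption (this section does not give the paper's AIC formula), and the 3 % threshold is applied relative to the magnitude of the base-model AIC, which is one possible reading of the text. All names are illustrative.

```python
import math


def aic(mse, n, k):
    """Least-squares AIC: n * ln(MSE) + 2k (assumed form, not from the source)."""
    return n * math.log(mse) + 2 * k


def dominant_process(aic_base, aic_level1, min_rel_reduction=0.03):
    """Return the Level 1 process with the largest AIC reduction, or 'base'.

    aic_level1 maps a process letter (P, I, S, G) to the AIC of that M1 model.
    """
    reductions = {p: aic_base - a for p, a in aic_level1.items()}
    best = max(reductions, key=reductions.get)
    too_small = reductions[best] < min_rel_reduction * abs(aic_base)
    if reductions[best] <= 0 or too_small:
        return "base"
    return best


# Hypothetical AIC values for two catchments
snow_catchment = dominant_process(100.0, {"P": 98.0, "I": 99.0, "S": 80.0, "G": 101.0})
seasonal_catchment = dominant_process(100.0, {"P": 99.0, "I": 99.5, "S": 98.0, "G": 101.0})
```

In the second example the best reduction is only 2 %, so the base model is retained even though snowmelt gives a nominally lower AIC.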
Figure 10 presents the results of this assessment of dominant processes for the 152 satisfactory catchments, separately for fast flow and slow flow. For fast flow (Fig. 10a), the dominant process in northern catchments is generally snowmelt, due to the considerable amount of precipitation falling as snow (these catchments are circled and labeled a, b, c, and d). Yet, there are slight differences among them: the northwestern catchments (circles a, d) are mountainous catchments, and snowmelt is the only additional process needed. Moving east to the center of the continental United States, i.e., catchments in the Midwest such as in Indiana (circle b), catchments are much flatter and winter temperatures are higher than in the northwestern mountainous catchments. Snowmelt is no longer the only dominant process for these catchments; some are dominated by subsurface-influenced fast flow, because the soil in these places is silty clay loam with relatively small subsurface drainage rates, and consequently the water table can rise to the surface during parts of the year, generating saturation excess overland flow. On the other hand, on the east coast, the Appalachian catchments are covered with dense vegetation, and phenology is therefore dominant. The snow influence fades in the central and southern catchments, where vegetation impact increases (circles e, f, g). For the central catchments in Missouri (circle e), snow and vegetation impacts are equally important; in some of the northern catchments snowmelt reduces AIC more, and in others phenology and/or interception reduces AIC more. In the eastern forested watersheds (circle f), where snow is rarely seen, phenology and interception are the most dominant processes in some catchments, while in others, due to the small soil moisture storage capacity, subsurface-driven fast flow appears to be important. Catchments in New Mexico and Arizona (circle g), even though they are arid, contain woodland or wooded grassland coverage over 60 % of the catchment areas, and, given the dry climate, streamflow is extremely sensitive to vegetation effects. Southeastern catchments (circle h) are marked here as base model dominant. Although there is dense vegetation coverage in these catchments, seasonality in climate makes simulation much easier. The catchments in Florida experience a wet season from mid-summer to early autumn (Fig. 1i); they receive abundant rainfall caused by frequent convective activity as well as the occasional tropical storms similar to those experienced in monsoon Asia (Fernald and Purdum, 1998). The catchments in Georgia also display seasonality of precipitation; they receive heavy rainfall during winter and spring, when the evapotranspiration rate is quite low, thus enhancing the seasonality observed in streamflow generation (Opsahl et al., 2007). The phenology influence is mitigated somewhat in these southern catchments, since the duration of vegetation coverage is much longer than in the Appalachian Mountain catchments.
When it comes to slow flow, spatial patterns of dominant processes are, for the most part, similar to those for fast flow: snowmelt dominates in northern catchments and is replaced by vegetation effects in southern catchments. Snowmelt is the most dominant process in north-western catchments (catchments located within circles a, d, e.g., ID). As we move further east, vegetation cover increases and phenology appears to be the dominant process in many northeastern catchments (circles b, c). Most of the catchments in the Mississippi River region (circle e) indicate phenology to be the dominant process, given the considerable vegetation cover and intermediate rainfall. As in the case of fast flow, phenology is dominant in Arizona and New Mexico, and these otherwise dry catchments appear to be highly sensitive to vegetation effects (circle f). The dominant processes in southeastern catchments for slow flow generation appear to be more diverse than in the case of fast flow. Here all four process additions are involved in slow flow generation; their effects are of a similar order and no single process dominates.

Minimum model complexity for reproduction of regime curves
Although the full model generated acceptable predictions of the streamflow regime for all "satisfactory" catchments, especially outside of the mid-west and south-west (see Fig. 9), we discovered in the previous section that the importance of each process addition was not the same everywhere. Some of the processes are never invoked (e.g., snowmelt in warm catchments) or could easily be left out in some of the catchments without loss of overall performance (e.g., phenology in southern catchments where the weather is always warm).
In this section, we determine the minimum model complexity that can generate satisfactory predictions, i.e., the smallest set of processes deemed essential to reproduce the regime curve, so as to reveal and concentrate on the most necessary processes in each catchment. In some catchments this is obvious; for example, snowmelt is clearly not needed in southern catchments. In many other catchments, this is not so self-evident, and we can only determine it through careful quantitative assessment.
Once again we use the AIC to measure model performance. However, this time we apply the optimized parameter sets for the full model repeatedly to the 15 possible reduced model structures (one Level 0 model, i.e., the base model; four Level 1 models; six Level 2 models; and four Level 3 models). In each case we estimate the AIC of the total flow predictions for each of the 15 models. Starting from the base model (M0), we compare the AIC at every modeling step with the AIC of the full model (AIC$_4$): if the AIC of the base model (AIC$_0$) is smaller than AIC$_4$, then we can say that the base model is adequate to generate satisfactory predictions. Otherwise, we continue to the Level 1 models (M1); if none of the M1 models yields an AIC below AIC$_4$, we continue to the Level 2 models, and so on. This comparative assessment comes to an end when we arrive at the model structure that produces the smallest AIC.
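The enumeration of model structures and the level-by-level search described above can be sketched as follows. The naming follows the M-level convention with the letters P, I, S and G; the stopping rule implemented here (accept the lowest level at which some structure has an AIC no larger than that of the full model, and pick the best structure at that level) is one reading of the procedure, and the AIC values in the example are hypothetical.

```python
from itertools import combinations

PROCESSES = ("P", "I", "S", "G")  # phenology, interception, snowmelt, subsurface fast flow


def model_structures(max_level=3):
    """Names of all structures from Level 0 up to max_level, e.g. M0, M1P, M3PIS."""
    return [f"M{level}" + "".join(combo)
            for level in range(max_level + 1)
            for combo in combinations(PROCESSES, level)]


def minimum_complexity(aic_by_model, aic_full):
    """Lowest-level structure whose AIC does not exceed that of the full model."""
    for level in range(len(PROCESSES)):
        ok = {m: a for m, a in aic_by_model.items()
              if m.startswith(f"M{level}") and a <= aic_full}
        if ok:
            return min(ok, key=ok.get)
    return "M4" + "".join(PROCESSES)   # nothing simpler suffices


structures = model_structures()
choice = minimum_complexity({"M0": 120.0, "M1P": 110.0, "M1S": 95.0, "M2PS": 90.0},
                            aic_full=100.0)
```

The count of reduced structures is 1 + 4 + 6 + 4 = 15, matching the number quoted in the text; the full model (Level 4) is the sixteenth.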
Since interception and phenology are both vegetation effects, to reduce the number of models for presentational purposes (i.e., to obtain a clearer picture), we combine interception and phenology into a single category of "vegetation effects". In this way half of the model classes are eliminated, with only 8 model groups remaining. Figure 11 presents the results of this analysis, displaying regional patterns of needed model complexity.
One can see in Fig. 11 that the base model is sufficient for the west coast catchments as well as the southeastern catchments in Florida (circles a, i), where the climate is humid and seasonality is strong. Consistent with what was found in the case of the dominant process for fast and slow flows (Fig. 10), snowmelt is again found to be important in many northern catchments (circles b, c, d, e). Most of the northwestern mountainous catchments (circle b) need the base model plus snowmelt only, although some indicate the need to include vegetation effects and also subsurface-influenced fast flow (presumably reflecting the presence of thin soils and substantial vegetation cover). Moving further east, both snow and vegetation effects are found to be necessary (circles c, d). This is again consistent with the dominant processes identified for fast and slow flow (Fig. 10), where both phenology and snow were seen to be equally important. On the east coast (circle e), not only vegetation and snow but also subsurface-induced fast flow is found to be necessary (reflecting the occurrence of saturation excess streamflow). In southern catchments (circles h, f, g), snow is obviously not needed, but vegetation effects and subsurface-influenced fast flow must be accounted for. In North Carolina (circle g), vegetation effects are seen as the only addition needed, while both vegetation and subsurface-influenced fast flow are found to be needed in Georgia and Missouri (circle f).

Mapping the model process classes
The results from the model performance assessments presented in the previous sections, especially those presented in Figs. 10 and 11, can now be synthesized to develop broad classifications regarding dominant processes underpinning regional patterns of the variation of streamflow regimes across the continental United States. The results are presented in Fig. 12, along with the cluster plot of the observed flow regime curves, to demonstrate this regional and functional hydrological similarity. Although these results must be viewed with some caution, considering that they are based on analysis of the 152 satisfactory catchments, the broad generalizations presented in Fig. 12 can serve as the foundation or even motivation for further detailed data analyses and modeling investigations.
The results shown in Fig. 12 indicate, firstly, that the base model is sufficient to capture the regime curve in western and south-eastern catchments where seasonality dominates. In north-western mountainous catchments, such as in Idaho, the addition of snowmelt to the base model is sufficient to capture the shorter duration high flows occurring in late spring and early summer. Going west to east in the northern humid/cold regions, seasonality of precipitation decreases, vegetation cover becomes denser, and models must capture both snowmelt and vegetation effects, as well as the possibility of saturation excess overland flow. Moving north to south (in the east), the importance of snowmelt decreases, and only vegetation effects and saturation excess streamflow remain important. As one approaches Florida, once again the base model appears to be sufficient. As one moves east to west from Florida, catchments become drier, with much reduced streamflow, and prediction of regime behavior becomes increasingly difficult with simple lumped models, until one reaches Southern California, where again the base model appears sufficient due to the out-of-phase seasonality experienced there.
Fig. 12. Conceptual map of the spatial distribution of the controlling processes and the regime curve clusters: "B" refers to the base model; "S" refers to snowmelt; "V" denotes vegetation impact (phenology and/or interception); "G" stands for subsurface-influenced fast flow; and "Human Impacted" means with strong anthropogenic activity impact.
Figure 12 also summarizes the main drivers of the regional patterns of dominant processes and needed model complexity. In broad terms, seasonality increases east to west, while temperature and climate aridity increase north to south and phenology decreases north to south. There are exceptions to these trends as well. For example, the extreme southeast experiences strong seasonality, likely due in part to the influence of hurricanes as well as close proximity to two large bodies of water, the Gulf of Mexico and the Atlantic Ocean. Likewise, the north-west (e.g., Washington State) is warmer than would be expected for such northern latitudes. Additional features that are critical include the occurrence of precipitation as snow in northern latitudes, and vegetation cover dynamics (i.e., phenology) in the forested regions
[...] in prediction of the FDC is not so good. This is to be expected, since a model focused on predicting the regime curves only cannot be expected to predict the high and low flows well; therefore, the model needs to be further enhanced to achieve this.
The results also demonstrate that in some catchments (WA, MO, and FL) removal of a process does not have an obvious effect on the FDC. Conversely, for snowmelt-dominated catchments such as the one in Idaho, the removal of snowmelt makes the FDC much flatter. This is consistent, given that snowmelt is the most important process addition in Idaho. On the other hand, in eastern catchments with dense vegetation cover (e.g., NY, GA), removal of phenology actually steepens the FDC. In dry catchments (Southern CA, TX), only the influence of phenology is recognizable, although we have learned from the regime curves that the other three processes are also important. Thus, differences in dominant processes can contribute to significant differences between the regime curves, which cannot be easily recognized in the FDCs because of the strong influence of high flows and low flows. In general, because of the connection between the RC and the FDC, seasonality is present in the FDC, though not as obviously as in the RC, due to the loss of temporal information. While the time element is lost in the FDC, information on extreme values and frequencies, which are averaged out in the RC, is gained.
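The loss of temporal information in the FDC can be illustrated with a minimal sketch. It is not the processing used in this study: the function names, the sinusoidal synthetic flows, the 365-day year, and the Weibull plotting position are all assumptions made for illustration. Two hypothetical catchments with identical flow magnitudes but peaks in different seasons yield clearly different regime curves and practically identical FDCs.

```python
import math


def regime_curve(daily_flow, days_per_year=365):
    """Mean flow for each day of the year, averaged across all years."""
    n_years = len(daily_flow) // days_per_year
    return [
        sum(daily_flow[y * days_per_year + d] for y in range(n_years)) / n_years
        for d in range(days_per_year)
    ]


def flow_duration_curve(daily_flow):
    """Flows ranked from high to low, paired with exceedance probability."""
    ranked = sorted(daily_flow, reverse=True)
    n = len(ranked)
    # Weibull plotting position (an assumed convention): rank / (n + 1)
    return [((i + 1) / (n + 1), q) for i, q in enumerate(ranked)]


def synthetic_flow(n_years, peak_day, days_per_year=365):
    """Hypothetical sinusoidal seasonal flow that peaks on peak_day."""
    return [
        2.0 + math.cos(2 * math.pi * (d - peak_day) / days_per_year)
        for _ in range(n_years)
        for d in range(days_per_year)
    ]


# Same flow magnitudes, different timing of the seasonal peak.
spring = synthetic_flow(3, peak_day=120)
autumn = synthetic_flow(3, peak_day=300)

rc_spring, rc_autumn = regime_curve(spring), regime_curve(autumn)
fdc_spring = [q for _, q in flow_duration_curve(spring)]
fdc_autumn = [q for _, q in flow_duration_curve(autumn)]

# The regime curves peak on different days of the year ...
rc_peak_shift = abs(rc_spring.index(max(rc_spring)) - rc_autumn.index(max(rc_autumn)))
# ... while the FDCs coincide up to floating-point rounding.
fdc_max_diff = max(abs(a - b) for a, b in zip(fdc_spring, fdc_autumn))

print(rc_peak_shift, fdc_max_diff)
```

The converse also holds in principle: day-to-day extremes that shape the tails of the FDC are smoothed by the across-year averaging in `regime_curve`, which this deterministic example does not attempt to show.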

Discussion and conclusions
The goal of this paper has been to identify the dominant processes underpinning streamflow regime behavior across the continental United States. For this reason, we analyzed rainfall-runoff data from 197 catchments belonging to the MOPEX dataset. The analyses involved a systematic process of model development following the downward approach (Sivapalan et al., 2003); starting with a simple base model, it is enhanced through the addition of key processes needed to reproduce the regime curve.
The resulting final (complete) model was then used to perform sensitivity studies to (a) decipher the most dominant process control, and (b) determine the minimum model complexity needed to generate a satisfactory reproduction of the empirical regime curves. The sensitivity analyses were carried out in opposite directions. In one case, we started with the base model, and then increased model complexity by including additional processes one by one until we reached the final form of the model, all the while monitoring the improvement in model performance. In the other case, we started with the full model and dropped processes one by one until we arrived at the minimum model complexity needed to achieve satisfactory predictions.
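The two directions of the sensitivity analysis can be sketched as follows. This is only an illustration under stated assumptions, not the procedure implemented in this study: the `score` callable stands in for a performance metric (assumed here to be higher-is-better), the greedy drop rule and the threshold are hypothetical, and `toy_score` is an invented stand-in for running a calibrated model in a snow-dominated catchment.

```python
# Process additions to the base model, in the order used for model enhancement.
PROCESSES = ["snowmelt", "subsurface_fast_flow", "interception", "phenology"]


def forward_addition(score, processes=PROCESSES):
    """Start from the base model and add processes one by one,
    recording the performance after each addition."""
    active = []
    history = [(tuple(active), score(active))]
    for process in processes:
        active.append(process)
        history.append((tuple(active), score(active)))
    return history


def dominant_process(history):
    """Process whose addition produced the largest performance gain."""
    gains = {
        after[0][-1]: after[1] - before[1]
        for before, after in zip(history, history[1:])
    }
    return max(gains, key=gains.get)


def backward_elimination(score, threshold, processes=PROCESSES):
    """Start from the full model and drop every process whose removal
    still leaves the performance at or above the threshold."""
    active = list(processes)
    for process in processes:
        trial = [p for p in active if p != process]
        if score(trial) >= threshold:
            active = trial
    return active


def toy_score(active):
    """Hypothetical metric for a snow-dominated catchment (assumed values)."""
    return 0.3 + 0.5 * ("snowmelt" in active) + 0.05 * ("phenology" in active)


history = forward_addition(toy_score)
minimal = backward_elimination(toy_score, threshold=0.7)

print(dominant_process(history))  # process with the largest gain
print(minimal)                    # minimum set of added processes
```

In this toy setting both directions agree: snowmelt gives the largest gain when added, and it is the only process that cannot be dropped without falling below the assumed threshold.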
The results revealed interesting regional patterns in the process controls of the regime curves across the continental United States, which are also related to Köppen's climate classification map. Snowmelt was found to be the most important process, for both fast and slow flows, in modeling north-western catchments, which fall in Köppen's snow steppe climate class (Dsa). However, it was not sufficient for slow flow prediction in cold, north-eastern catchments (the snow, fully humid, warm summer class, Dfb), where vegetation effects take over as most important due to the presence of significant forest cover. Vegetation effects and the role of a rising water table were found to be significant for fast flow in the Appalachian and southern catchments. In summary, the requisite process for modeling cold, mountainous forested catchments is snowmelt; for cold, forested catchments near the east coast, the processes include both snowmelt and vegetation; the warm, humid catchments in the southeast with strong seasonality (the warm temperate, fully humid, hot summer class, Cfa) can be easily modeled with the simple base model; and the warm, very dry catchments in the south and south-west (BSk: arid, steppe, cold arid) require much more complex models.
The reasons for the regional patterns of process controls of regime curves across the United States also became clear through these regional studies. The obvious reasons are seasonality (which increases east to west, with some exceptions), aridity (which increases north to south, with some exceptions), and temperature (which increases north to south, again with exceptions due to the effects of mountain topography and proximity to oceans). As seasonality increases from east to west, the needed model complexity decreases (except in the mid-west, due to human interference); the same decrease in complexity is observed going from south to north as aridity decreases; and the importance of snowmelt increases from warm to cold catchments (south to north).
Despite the understanding gained regarding the process controls underpinning regional variations of regime curves, their impact on the shapes of FDCs has been found to be less strong. Two different processes that occur during different times of the year could have a significant effect on the shape of the regime curve, yet may not significantly affect the shape of the FDC. However, interesting regional patterns were seen in both the process controls on the regime curve determined here, and the empirically determined parameters of the mixed gamma distribution as applied to the FDC determined in Cheng et al. (2012). Sorting these catchments into classes may be a way to provide more explanatory power for these patterns and process controls, thus motivating the development of the classification scheme outlined in Coopersmith et al. (2012).

Fig. 1. Observed regime curves of precipitation, PET, fast flow (Q_f), slow flow (Q_u) and total flow (Q) in the nine selected catchments across the country. AI is the aridity index (PET/P).

Fig. 3. Structure of the complete model: reservoirs are represented in solid green boxes; green is used for state variables, blue for fluxes and brown for model parameters. Red boxes represent the added processes, and dashed lines denote the fluxes from these added processes.
adapted from Harman et al. (2011) to sample the parameter space towards constructing the posterior distribution. The algorithm, a Markov chain Monte Carlo (MCMC) technique, is able to sample the parameters efficiently in the vicinity of the maximum likelihood.

Fig. 4. Comparison of regime curves of P, PET, ET, Q, Q_f, and Q_u in a catchment in Northern CA between observation (blue line) and base model simulation (red line).

Fig. 5. Comparison of regime curves of P, PET, ET, Q, Q_f, and Q_u in a catchment in ID among observation (blue line), base model (B, red line) and base model with snowmelt component (BS, solid red line).

Fig. 6. Comparison of regime curves of P, PET, ET, Q, Q_f, and Q_u in a catchment in GA among observation (blue line), base model with snowmelt (BS, solid red line) and base model with snowmelt as well as subsurface-influenced fast flow component (BSG, red dotted line).

Fig. 7. Comparison of regime curves of P, PET, ET, Q, Q_f, and Q_u in a catchment in GA among observation (blue line), base model with snowmelt and subsurface-influenced fast flow component (BSG, solid red line) and base model with snowmelt, subsurface-influenced fast flow and interception loss component (BSGI, red dotted line).

Fig. 8. Comparison of regime curves of P, PET, ET, Q, Q_f, and Q_u in a catchment in GA among observation (blue line), base model with snowmelt, subsurface-influenced fast flow and interception loss component (BSGI, solid red line) and the complete model (BSGIP, red dotted line).

Fig. 9. Spatial distribution of the goodness of the model prediction in 197 catchments.

Fig. 10. The most important process in catchments with effective model prediction: (a) fast flow, (b) slow flow. The circled areas represent regions of process similarity.

Fig. 11. The needed process complexity for catchments that produced satisfactory simulation performance. The circled areas represent regions of process similarity.

Table 1. Overview of the estimated parameters for all the satisfactory catchments.

Table 2. Mean value, standard deviation and median relative error of 7 parameters for eastern, central and western catchments.