Articles | Volume 28, issue 6
Technical note
26 Mar 2024
Technical note |  | 26 Mar 2024

Technical note: Testing the connection between hillslope-scale runoff fluctuations and streamflow hydrographs at the outlet of large river basins

Ricardo Mantilla, Morgan Fonley, and Nicolás Velásquez

A series of numerical experiments were conducted to test the connection between streamflow hydrographs at the outlet of large watersheds and the time series of hillslope-scale runoff yield. We used a distributed hydrological routing model that discretizes a large watershed ( 17 000 km2) into small hillslope units ( 0.1 km2) and applied distinct surface runoff time series to each unit that deliver the same volume of water into the river network. The numerical simulations show that distinct runoff delivery time series at the hillslope scale result in indistinguishable streamflow hydrographs at large scales. This limitation is imposed by space-time averaging of input flows into the river network that are draining the landscape. The results of the simulations presented in this paper show that, under very general conditions of streamflow routing (i.e., nonlinear variable velocities in space and time), the streamflow hydrographs at the outlet of basins with Horton–Strahler (H–S) order 5 or above (larger than 100 km2 in our setup) contain very little information about the temporal variability of runoff production at the hillslope scale and therefore the processes from which they originate. In addition, our results indicate that the rate of convergence to a common hydrograph shape at larger scales (above H–S order 5) is directly proportional to how different the input signals are to each other at the hillslope scale. We conclude that the ability of a hydrological model to replicate outlet hydrographs does not imply that a correct and meaningful description of small-scale rainfall–runoff processes has been provided. Furthermore, our results provide context for other studies that demonstrate how the physics of runoff generation cannot be inferred from output signals in commonly used hydrological models.

1 Introduction

The question of how will river flows change in the presence of climatic and anthropogenic changes dominates the literature in water resources journals and research conducted around the world (Arnell and Lloyd-Hughes, 2014; Blöschl et al., 2019; Gudmundsson et al., 2021; Hirabayashi et al., 2013; Whitehead et al., 2018). Slowly evolving regional climatic changes and rapid anthropogenic modifications to the landscape will impact how watersheds deliver water to communities along the river network, and recognition of this fact has fostered the development of methods and techniques to address this urgent problem (Kang et al., 2016; Kourtis and Tsihrintzis, 2021). Physics-based hydrological modeling has emerged as the preferred alternative for predicting the future of the hydrological cycle under the projected changes (Barnett et al., 2008). An example of rapid anthropogenic change in the US Midwest is the use of drainage tiling, which is effective for increasing corn yields by promoting the rapid movement of subsurface flows into rivers (Fonley et al., 2021; Schilling et al., 2019; Schilling and Helmers, 2008; Velásquez et al., 2021). This raises the question of how these landscape modifications will impact the cycles of flooding and droughts in future climatic scenarios (Akter et al., 2018; Júnior et al., 2015; Peng et al., 2019). In the typical approach to this question, a land-surface model is conceptualized, calibrated, validated, and then forced with past and future climatic inputs (rainfall, radiation, relative humidity) derived from global circulation models (Barnett et al., 2008; Condon et al., 2015; Hidalgo et al., 2009; Sadeghi Loyeh and Massah Bavani, 2021; Taye et al., 2015; Quintero et al. 2018). The underlying premise is that the hillslope-scale process equations correctly describe the water movement and partitioning into runoff and infiltration. This premise allows for changing parameter values to reflect future land-surface conditions (e.g., vegetation types, deforestation, land cover, land use) along with future conditions of meteorological forcing. Nevertheless, results from Huang et al. (2017) at 12 large-scale watersheds using nine models indicate significant limitations during extreme events. Some limitations can be improved by calibration at the cost of losing representation of processes such as evapotranspiration (Ahmed et al., 2023).

The use of land-surface models relies on their validation after parameter calibration (Beven, 1989). Historically, parameters in land-surface models are adjusted (calibrated) to match streamflow hydrographs at gauged sites in the outlet of large watersheds (> 100 km2), and a portion of the data is left to validate the hydrological model using unobserved events (Bérubé et al., 2022; Gupta et al., 2006; Refsgaard, 1997; Shen et al., 2022). Once this test has been passed, a land-surface model is deemed appropriate to explore future scenarios (Sadeghi Loyeh and Massah Bavani, 2021; Schilling et al., 2008; Whitehead et al., 2018). Several authors have highlighted the issue that stronger and more appropriate methods and techniques for validating/invalidating hydrologic models are needed (e.g., Beven and Lane, 2019), but the ability to simulate streamflow hydrographs continues to dominate the literature as the standard for model validation. Other data-based techniques for streamflow prediction are not appropriate to address these questions, because their parameters cannot be directly linked to physical processes and because there is a recognition that non-stationarities render historical information unreliable to predict the future behavior of hydrological systems (Bayazit, 2015; Cancelliere, 2017; Milly et al., 2008; Salas et al., 2018). Furthermore, a recent simulation-based study by Remmers et al. (2020) concluded that “it is challenging and in most cases impossible to infer model structure from model output for the part of model space, bucket-based hydrological models, that [were] sampled”. This again raises the historic question: what are the mechanisms that blur such differences? This issue of identifiability of hydrologic models has been raised as far back as research in the early 60s (Dawdy and O'Donnell, 1965). In this sense, different authors have arrived at similar conclusions. According to de Boer-Euser et al. (2017), it is challenging to link hydrograph differences and model structure components. Moreover, in an intercomparison experiment of three models, Vetter et al. (2015) observed similar performance among them. Tijerina et al. (2021) described similar spatial performance of two hydrological models over the conterminous USA and the Mai et al. (2022) comparison of 13 models obtained different rankings in the best-performing models when evaluating discharge versus distributed variables such as snow-water equivalent (SWE). Additionally, in the recent evaluation of the Hydrology Laboratory Research Distributed Hydrologic Model (HL-RDHM) by Madsen et al. (2020), it was shown that the performance of streamflow predictions was scale dependent, decreasing for small basins compared to the performance in larger basins, a result that is consistent with other multiscale evaluations of predictive performance of distributed models (Seo et al., 2021, 2023).

There is a long history in the hydrologic literature showing that dispersion and aggregation of flows in the river network dominates the shape of the hydrograph at larger scales (Surkan, 1968; Kirkby, 1976; Beven and Wood, 1993; Snell and Sivapalan, 1994; Mantilla et al., 2006; Zarlenga et al., 2022); however, there is not a systematic evaluation of how quickly the river network effects dominate over hillslope-scale variability under generic conditions of routing and heterogeneity among hillslopes. In the present work, we explore how different hillslope-scale ( 0.1 km2) surface runoff time series correspond to hydrographs at the outlet of larger-scale (> 100 km2) basins. We used a distributed hydrological model that discretizes the landscape into small hillslope control volumes interconnected by the full extent of the river network of the Cedar River basin ( 17 000 km2). We created a set of forcing signals that are significantly different from each other but share the same volume of water injected into the hillslope surface. Each of the input signals represents the output of distinct land-surface process descriptions. Our results show that input signals that are very different from each other at the hillslope scale produce streamflow hydrographs at large scales that are indistinguishable from each other. This result confirms that our ability to reproduce hydrographs at the outlet of a large basin is not a reliable indicator that we have correctly described small-scale processes controlling runoff production that include the description of vegetation, soil types, land use practices, snowmelt rates, etc. Furthermore, we provide a quantitative prediction for how quickly the information of small-scale processes dissipate in the river network, showing why it is difficult, if not impossible, to establish a causal link between runoff generation mechanisms and output hydrographs as discussed by Remmers et al. (2020). Finally, our work provides quantitative measures that prevent linking the properties of outlet hydrographs to plot-scale processes that can prevent accepting model descriptions that can be right for the wrong reasons (Kirchner, 2006).

Our paper is organized as follows. In Sect. 2, we present the main methodological steps of our work, including the description of the routing model, the river network used, the input signals representing surface runoff, and the metrics for comparison of similarity between hydrographs. In Sect. 3, we present the main results of the simulations for three levels of complexity in the description of flow routing. Section 4 discusses the results and how they furnish the basis for the conclusions presented in Sect. 5.

2 Methodology and data

2.1 The hydrological model and the river network

The distributed hydrological model used in this study can be succinctly described as a nonlinear water transport equation through a directed river, which is the routing component of the operational flood forecasting model HLM (Mantilla et al., 2022). The model is formulated in the context of a mass conservation equation developed by Gupta and Waymire (1998), and it uses the water velocity parameterization given by Mantilla (2007) where flow velocity for any link j in the river network at time t is given by v(j,t)=voqj,nλ1Ajλ2, where Aj (in km2) is the upstream basin area from link j, and qj,n (in m3 s−1) is the flow from the channel link j toward the downstream channel link n, which is further related to water storage in the link scj by the relation qj,n=v(j,t)ljscj(t). The equation is non-homogeneous; therefore, the velocity vo (given in m s−1) is a reference quantity that corresponds to the flow velocity for a channel link that drains a 1 km2 catchment when 1 m3 s−1 flows through. The nonlinear scaling relationship of velocities given by the parameters λ1 and λ2 are derived from assumptions about the variation of local and downstream variation in hydraulic geometry (cross-sectional area and friction) with respect to basin area that are well established in the literature (Singh, 2022). The parameterization of velocity replaces the momentum conservation equation (see Mantilla (2007) for details) and is equivalent to the kinematic wave simplification of the Saint-Venant equations integrated over the channel length. The parameterization is given by

(1) d q j , n ( t ) d t = v o q j , n λ 1 t A j λ 2 l j 1 - λ 1 q p , c j , t - q j , n t + f j q f , j t .

The index f in the equation refers to the set of upstream links draining into link j. The parameterization is equivalent to assuming that the channel slope is equal to the slope of the water surface (normal flow) and that the flow is purely kinematic, obeying a power-law relation between flow depth and velocity (such as Manning, Chezy, or the Darcy–Weisbach equations). As pointed out by Beven (1979), there is distinction between flow velocity and celerity in the system. In our equation, celerity c is related to flow velocity by the relationship cj,t=vj,t1-λ1, making celerity spatially and temporally variable. In particular, for the three scenarios of flow velocity that are investigated in this paper (see Velásquez and Mantilla, 2020 for definitions), when velocity is assumed to be constant in space and time (i.e. λ1=0 and λ2=0), celerity is equal to velocity; in the case of the global self-similar scenario where λ1=0.3, celerity cj,t=1.43v(j,t); and for the local self-similar scenario where λ1 takes random values between 0.1 and 0.5, celerity takes values between 1.1 v(j,t) and 2.0 v(j,t). Finally, the function qp,c(j,t) represents the flow from the hillslope j into the channel link j. For this paper, the runoff function is

(2) q p , c j , t = q p , c j , 0 + a h j 0 t p t - τ e - k p τ d τ ,

where the parameter kp is the inverse residence time in the hillslope surface when water flows at constant velocity vh, i.e., qp,cj,t=ahjkpspj=ahjvhlhjspj over the hillslope length lhj, and p(t) represents the flow of water into the hillslope surface in units of [L T−1] (e.g., mm h−1). In hydrological models, the function p(t) represents “effective precipitation” or is the result of other physical processes represented in the land-surface model, such as snowmelt, that are controlled by the accumulation of water in the hillslope surface, including interception, evaporation, infiltration, and exfiltration.

The river network chosen for this study was the Cedar River upstream of Cedar Rapids, Iowa (Fig. 1). We chose this network to maintain the realism of the connectivity structure. However, simulations not presented in this paper indicate that our results can be generalized to any random self-similar network. At the outlet, the network reaches a Horton order of 8, with a width function that shows the largest accumulation of channels at a common distance in the upper region of the watershed (Fig. 1a). We also show width functions for select smaller Horton-order locations in the river network to illustrate the natural variability of river network connectivity across scales.

Figure 1Cedar River watershed network structure (right) with the width function at the outlet (ω=8) and at nested subwatersheds with orders ω=2 to 7.

2.2 Development of hillslope-scale runoff fluctuations

In general, any rainfall pattern which is integrable over an interval of length can be written in terms of a Fourier series expansion as

(3) p t = a 0 2 + n = 1 N a n cos 2 π n t T r - φ n ,

where phase shifts allow us to represent sines (typically present in the Fourier series expansion) in terms of cosines of the same period. In this format, the coefficients can be chosen to sufficiently represent any function, including those that are not smooth.

We applied inverted and shifted cosines representing one, two, three, and four cosine waves distributed over the course of 1 or 3 d. The cosines were shifted to affirm that they were nonnegative and inverted – beginning and ending with zero precipitation intensity – to impose smoothness on the rainfall signal. We included both right-skewed and left-skewed sawtooth (linear) patterns and signals depicting uniform precipitation over 1, 2, 3, and 6 d.

Input to the model was applied as flux into a hillslope surface represented by a linear reservoir that feeds into the adjacent river link. Each input was designed to deliver 48 mm of water to the hillslope, and each signal began at time zero. Flow in the river network was primed by simulating a 1 d rainfall event and allowing the system to drain for 10 d before applying the rainfall signal so that each link of the river network had an initial flow that was commensurate with the watershed it drained.

2.3 Developing a reference hydrograph for each location in the river network

Comparing hydrographs across scales can be difficult due to the variability involved. To address this difficulty, we created dimensionless hydrographs for all locations in the river network with the following equations:

(4) q = q q ref and t = t t ref ,

where qref is a reference maximum streamflow, and tref is a reference hydrograph time to peak. For reference, we chose the instantaneous unit hydrograph that resulted from applying precipitation via a Dirac delta function, such that the volume of water was directly applied as an initial condition onto the hillslope surface, acting as an instantaneous event. From that hydrograph, we identified the peak flow and the time to peak flow for every link in the river network. We defined an event's end as the moment when the flow was smaller than 1 % of the peak flow.

2.4 Analysis of convergence of streamflow signals at larger scales

To analyze the convergence of the hydrographs for streams of order ω, we compared moments about the time axis (t) and about the streamflow axis (q). We computed the first moment in t and in q as


and we computed the second-centered moment with the following equations,


Using these definitions of moments, a ratio of the moments between two different hydrographs can be defined as

(9) R i ( A | B ) Z ( ω j ) = max M i z ω j A , M i z ω j B min M i z ω j A , M i z ω j B ,

where Z represents the variable of interest (in this case q and t). If the moments of two different functions, A and B, approach each other, then Eq. (9) tends to 1 in the limit as

(10) lim ω R i ( A | B ) Z ω = 1 .

The min and max functions in Eq. (9) guarantee that the limit will be reached from above in all cases. This property allowed us to define a sigmoid function to characterize the rate of convergence to the limit of ω as

(11) R ^ i B Z ω = S f 1 - 1 1 + e ω i - ω α ,

where Sf is a scale factor, ωi is the starting Horton order, and α is a parameter controlling the shape of the curve. Larger values of α indicate rapid convergence of the curve to 1 as the Horton order of the basin increases; conversely, small values of α indicate slower convergence to 1.

3 Results

For each of the proposed input signals, we integrated the HLM routing equations using different assumptions about the distribution of velocity in space and time, including linear routing with constant velocity in space and time, nonlinear global self-similar routing, and locally self-similar nonlinear routing (Velásquez and Mantilla, 2020). For each setup, we obtained hydrographs at all the watershed links (43 000 in total). At the smaller scales (ω=2,3, or 4), each signal produced a distinct hydrograph. However, the hydrographs became indistinguishable with increasing Horton order.

3.1 Comparing hydrographs at multiple scales

In Fig. 2, we show the resulting hydrographs at multiple scales for input signals with a duration of 3 d. The top-left panel in Fig. 2 shows the different input signals used in this experiment. We chose locations in the river network corresponding to complete-order streams of orders 2 to 8 to illustrate differences in hydrograph shapes at different scales. The first set of graphs shows the hydrographs calculated under the assumption of constant flow velocity in the channels (i.e. λ1=0 and λ2=0) for locations with increasing Horton order. We repeated the calculation using more complex nonlinear routing assumptions. In the global self-similar case, we assumed v0, λ1, and λ2 to be the same in the entire network (yellow shaded panels in Fig. 2), while in the local self-similar case the same parameters are variable throughout the catchment (green shaded panels in Fig. 2).

Figure 2The 3 d duration input signals and resulting hydrographs at subwatersheds of orders between 2 and 8. The magnitude and time duration of the hydrographs is standardized relative to the 3Day1Cos (blue) hydrograph.


3.2 Convergence

In Fig. 2, we illustrate how all the 3 d duration signals tend to converge. Moreover, we can define the convergence of two or more signals relative to any signal by comparing their moments. Figure 3 shows the average value of RA|B for the first (circles) and the second (triangles) moments when we compare moment ratios between the LTri3Day signal with the 3DayUniform one. In both dimensions (time and magnitude, Figures 3a and b, respectively), the value of RA|B tends towards 1 with the increase of the Horton order. In the Supplement, we illustrate how the convergence of hydrographs occurs for all links in the network (Supplement Fig. S1). The convergence of hydrographs also holds for the more complex cases of nonlinear routing under the assumption of global and local self-similarity.

Figure 3Changes of the first (circles) and second (triangles) moment ratios of the LTri3Day signal with respect to the 3DayUniform signal for different routing schemes. In panel (a), moment ratios are about the relative time RiBt, and in panel (b) moment ratios are about the relative streamflow RiBq.


3.3 General convergence to the instantaneous hydrograph (Dirac simulation)

To obtain a more general idea of the convergence, we computed RA|B for all the input signals using the Dirac function as a reference. RA|B at each Horton order ω was the median value obtained for all the links of order ω. The dots in Fig. 4 correspond to the computed median value of RA|B. According to our results, each signal has a different RA|B for ω=2 and, in all the cases, tends to converge toward 1 with increasing values of ω. Moreover, the convergence rate increases as a function of the initial differences. Signals with higher RA|B values at ω=2 tend to converge at a faster rate than others.

Figure 4Convergence for all the signals relative to the Dirac simulation assuming constant velocity routing. The dots correspond to the median value of the moments ratio R, and the lines correspond to Eq. (11) results.


We describe the convergence rate of the median values in Fig. 4 using Eq. (11). In the equation, we fixed the scale factor Sf to 1.3, and we took ωi=2 as the reference Horton order. Then, using the nonlinear least squares method, we adjusted the parameter α for a set of generated hydrographs. With this approach, we obtained a good representation of the signal's convergence starting at ω=2 (continuous lines in Fig. 4).

According to Fig. 4, the parameter α is an index of the convergence rate for any signal. Larger values of α correspond to faster convergence rates. Moreover, we found that α depends on the order of the moment, the dimension being considered (t or q), and the streamflow routing approach. In Fig. 5, we present α vs. RA|B(ω=2) for each moment order, dimension, and assumptions on the variability of velocities in space and time. In all cases,α was proportional to RA|B(ω=2). This means that the greater the difference in the hydrographs at smaller scales, the faster the convergence rate to the Dirac shape at larger scales. The Supplement presents convergence for the nonlinear global and local self-similarity cases (Figs. S2 and S3).

Figure 5Values of the estimated parameter α as a function of moment ratios for hydrographs at order 2 streams for different scenarios of routing.


We found that the lower α values in the ratio of the second moment in time (Fig. 5b) correspond to lower values of RA|B (Fig. 4b). Additionally, higher α values occur for the ratios of the second moment in q (Fig. 5d), which coincide with large RA|B values (Fig. 4b). Also, we found that the convergence rates of the linear, nonlinear, and self-similar HLM setups are more alike in t than in q.

4 Discussion and conclusions

Hydrologists strive to “be right for the right reasons” when modeling the hydrologic cycle; however, the datasets available to validate hydrological models are sparse and often comprise only streamflow observations at the outlets of large catchments. Typically, hydrologic modelers calibrate and validate their models using available streamflow observations. Our study sheds some new light on the limits of this strategy and provides an explanation for the difficulty in establishing a causal link between small-scale runoff generation processes and hydrograph shapes at the outlets of large river basins.

The moment ratios expressed in Eq. (9) show how hydrographs generated by different hillslope-scale runoff signals differ from each other across Horton order scales. Our study demonstrates that differences in the hydrographs at the hillslope scale are smoothed out for larger scales. The processes controlling the convergence rate are the spatial aggregation imposed by the self-similar river network draining the landscape and the attenuation that is controlled by the flow routing equations. Simulations not shown in this paper indicate that our results hold true for any self-similar network structure. However, a systematic analysis of how different network configurations determine the rate of convergence and if self-similarity is a necessary condition remains to be done. Two interesting findings of our study are that (i) the rate of convergence of hydrographs to a common shape at larger scales is proportional to how different the input signals are at the hillslope scale, and (ii) that the convergence is independent of the assumptions on the spatial and temporal variability of in-channel velocities imposed in the routing scheme.

Typically, validation of small-scale land-surface representations relies on the ability to reproduce streamflow hydrographs at the outlet of large catchments where historical records are available. An intermediate step in developing a site-specific hydrological model is calibrating its parameters that control the rainfall–runoff transformation (e.g., infiltration rates, hydraulic conductivity, interception rates) and runoff routing through the river network (e.g., channel and floodplain friction coefficients). Our results suggest that two very different descriptions of small-scale processes (e.g., variable saturated area vs. variable infiltration rates) can lead to equivalent hydrographs at larger scales. The fact that two competing hypotheses lead to the same result hampers the possibility of determining how changes in the small-scale process will affect streamflow hydrographs in larger catchments. Our analysis shows that the river network connectivity leads to an averaging of the runoff produced in different locations and at different times, indicating that if the right volume of runoff is applied at a given Horton-scale the hydrographs for the network of five orders or above would be indistinguishable.

The generic routing schemes tested in this study give us confidence that our results can be generalized to any hydrological distributed model of any basin that explicitly includes a river network with Horton–Strahler order 5 or larger. This extends the problem of model “equifinality” (Beven, 2006) to a larger problem of “equivalent models” where distinct descriptions of hillslope-scale descriptions lead to the same resulting outlet hydrograph.

Code and data availability

Data and software for this research can be found at (Mantilla et al., 2022).


The supplement related to this article is available online at:

Author contributions

RM conceptualized the problem, prepared preliminary results, and wrote the original manuscript, MF contributed to conceptualization, developed the mathematical equations to prescribe runoff at the hillslope scale, and wrote code for initial testing. NV contributed to the conceptualization, implemented the equations in parallelized HPC (high-performance computing), performed all the large-scale simulations needed in the study, and developed figures in the article. All the authors performed review and editing tasks.

Competing interests

The contact author has declared that none of the authors has any competing interests.


Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.


This work was completed with support from the Iowa Flood Center, MID-American Transportation Center (grant number: 69A3551747107), the Iowa Highway Research Board, and Iowa Department of Transportation (contract number: TR-699). RM also acknowledges current funding from NSERC (RGPIN-2023-04724) and Manitoba Hydro.

Financial support

This research has been supported by the Iowa Flood Center, MID-American Transportation Center (grant no. 69A3551747107), the Iowa Highway Research Board, and Iowa Department of Transportation (contract number: TR-699). Ricardo Mantilla also received funding from current funding from NSERC (grant no. RGPIN-2023-15 04724) and Manitoba Hydro.

Review statement

This paper was edited by Yongping Wei and reviewed by Keith Beven and Warrick Dawes.


Ahmed, M. I., Stadnyk, T., Pietroniro, A., Awoye, H., Bajracharya, A., Mai, J., Tolson, B. A., Shen, H., Craig, J. R., Gervais, M., Sagan, K., Wruth, S., Koenig, K., Lilhare, R., Déry, S. J., Pokorny, S., Venema, H., Muhammad, A., and Taheri, M.: Learning from hydrological models' challenges: A case study from the Nelson basin model intercomparison project, J. Hydrol., 623, 129820,, 2023. 

Akter, T., Quevauviller, P., Eisenreich, S. J., and Vaes, G.: Impacts of climate and land use changes on flood risk management for the Schijn River, Belgium, Environ. Sci. Policy, 89, 163–175,, 2018. 

Arnell, N. W. and Lloyd-Hughes, B.: The global-scale impacts of climate change on water resources and flooding under new climate and socio-economic scenarios, Climatic Change, 122, 127–140,, 2014. 

Barnett, T. P., Pierce, D. W., Hidalgo, H. G., Bonfils, C., Santer, B. D., Das, T., Bala, G., Wood, A. W., Nozawa, T., Mirin, A. A., Cayan, D. R., and Dettinger, M. D.: Human-Induced Changes United States, Science, 319, 1080–1083,, 2008. 

Bayazit, M.: Nonstationarity of Hydrological Records and Recent Trends in Trend Analysis: A State-of-the-art Review, Environ. Process., 2, 527–542,, 2015. 

Bérubé, S., Brissette, F., and Arsenault, R.: Optimal Hydrological Model Calibration Strategy for Climate Change Impact Studies, J. Hydrol. Eng., 27, 1–13,, 2022. 

Beven, K.: On the generalized kinematic routing method, Water Resour. Res., 15, 1238–1242,, 1979. 

Beven, K.: Changing ideas in hydrology – The case of physically-based models, J. Hydrol., 105, 157–172,, 1989. 

Beven, K.: A manifesto for the equifinality thesis, J. Hydrol., 320, 18–36,, 2006. 

Beven, K. and Wood, E. F.: Flow routing and the hydrological response of channel Network, in: Channel Network Hydrology, edited by: Beven, K. and Kirkby, M. J., John Wiley & Sons, Chichester, 99–128, ISBN 10 0471935344, ISBN 13 978-0471935346, 1993. 

Beven, K. and Lane, S.: Invalidation of Models and Fitness-for-Purpose: A Rejectionist Approach, in: Computer Simulation Validation. Simulation Foundations, Methods and Applications, edited by: Beisbart, C. and Saam, N., Springer, Cham,, 2019. 

Blöschl, G., Hall, J., Viglione, A., Perdigão, R. A. P., Parajka, J., Merz, B., Lun, D., Arheimer, B., Aronica, G. T., Bilibashi, A., Boháč, M., Bonacci, O., Borga, M., Čanjevac, I., Castellarin, A., Chirico, G. B., Claps, P., Frolova, N., Ganora, D., and Živković, N.: Changing climate both increases and decreases European river floods, Nature, 573, 108–111,, 2019. 

Cancelliere, A.: Non Stationary Analysis of Extreme Events, Water Resour. Manage., 31, 3097–3110,, 2017. 

Condon, L. E., Gangopadhyay, S., and Pruitt, T.: Climate change and non-stationary flood risk for the upper Truckee River basin, Hydrol. Earth Syst. Sci., 19, 159–175,, 2015. 

Dawdy, D. R. and O'Donnell, T.: Mathematical Models of Catchment Behavior, Am. Soc. Civil Engineers Proc., 91, 123–137,, 1965. 

de Boer-Euser, T., Bouaziz, L., De Niel, J., Brauer, C., Dewals, B., Drogue, G., Fenicia, F., Grelier, B., Nossent, J., Pereira, F., Savenije, H., Thirel, G., and Willems, P.: Looking beyond general metrics for model comparison – lessons from an international model intercomparison study, Hydrol. Earth Syst. Sci., 21, 423–440,, 2017. 

Fonley, M. R., Qiu, K., Velásquez, N., Haut, N. K., and Mantilla, R.: Development and Evaluation of an ODE representation of 3D subsurface tile drainage flow using the HLM flood forecasting system, Water Resour. Res., 57, e2020WR028177,, 2021. 

Gudmundsson, L., Boulange, J., Do, H. X., Gosling, S. N., Grillakis, M. G., Koutroulis, A. G., Leonard, M., Liu, J., Schmied, H. M., Papadimitriou, L., Pokhrel, Y., Seneviratne, S. I., Satoh, Y., Thiery, W., Westra, S., Zhang, X., and Zhao, F.: Globally observed trends in mean and extreme river flow attributed to climate change, Science, 371, 1159–1162,, 2021. 

Gupta, H. V., Beven, K. J., and Wagener, T.: Model Calibration and Uncertainty Estimation, in: Encyclopedia of Hydrological Sciences, edited by: Anderson, M. G. and McDonnell, J. J.,, 2006. 

Gupta, V. K. and Waymire, E. C.: Spatial variability and scale invariance in hydrologic regionalization, in: Scale Dependence and Scale Invariance, Hydrology, edited by: Sposito, G., Cambridge University Press, 88–135,, 1998. 

Hidalgo, H. G., Das, T., Dettinger, M. D., Cayan, D. R., Pierce, D. W., Barnett, T. P., Bala, G., Mirin, A., Wood, A. W., Bonfils, C., Santer, B. D., and Nozawa, T.: Detection and attribution of streamflow timing changes to climate change in the Western United States, J. Climate, 22, 3838–3855,, 2009. 

Hirabayashi, Y., Mahendran, R., Koirala, S., Konoshima, L., Yamazaki, D., Watanabe, S., Kim, H., and Kanae, S.: Global flood risk under climate change, Nature Climate Change, 3, 816–821,, 2013. 

Huang, S., Kumar, R., Flörke, M., Yang, T., Hundecha, Y., Kraft, P., Gao, C., Gelfan, A., Liersch, S., Lobanova, A., Strauch, M., van Ogtrop, F., Reinhardt, J., Haberlandt, U., and Krysanova, V.: Evaluation of an ensemble of regional hydrological models in 12 large-scale river basins worldwide, Climatic Change, 141, 381–397,, 2017. 

Júnior, J. L. S., Tomasella, J., and Rodriguez, D. A.: Impacts of future climatic and land cover changes on the hydrological regime of the Madeira River basin, Climatic Change, 129, 117–129,, 2015. 

Kang, N., Kim, S., Kim, Y., Noh, H., Hong, S. J., and Kim, H. S.: Urban drainage system improvement for climate change adaptation, Water Switzerland, 8, 268,, 2016. 

Kirchner, J. W.: Getting the right answers for the right reasons: Linking measurements, analyses, and models to advance the science of hydrology, Water Resour. Res., 42, W03S04,, 2006. 

Kirkby, M. J.: Tests of the random network model, and its application to basin hydrology, Earth Surf. Process., 1, 197–212,, 1976. 

Kourtis, I. M. and Tsihrintzis, V. A.: Adaptation of urban drainage networks to climate change: A review, Sci. Total Environ., 771, 145431,, 2021. 

Mai, J., Shen, H., Tolson, B. A., Gaborit, É., Arsenault, R., Craig, J. R., Fortin, V., Fry, L. M., Gauch, M., Klotz, D., Kratzert, F., O'Brien, N., Princz, D. G., Rasiya Koya, S., Roy, T., Seglenieks, F., Shrestha, N. K., Temgoua, A. G. T., Vionnet, V., and Waddell, J. W.: The Great Lakes Runoff Intercomparison Project Phase 4: the Great Lakes (GRIP-GL), Hydrol. Earth Syst. Sci., 26, 3537–3572,, 2022. 

Madsen, T., Kristie F., and Hogue, T.: Evaluation of a Distributed Streamflow Forecast Model at Multiple Watershed Scales, Water, 12, 1279,, 2020. 

Mantilla, R.: Physical basis of statistical scaling in peak flows and stream flow hydrographs for topologic and spatially embedded random self-similar channel networks, PhD thesis, University of Colorado, Boulder, CO, 2007. 

Mantilla, R., Gupta, V. K., and Mesa, O. J.: Role of coupled flow dynamics and real network structures on Hortonian scaling of peak flows, J. Hydrol., 322, 155–167,, 2006. 

Mantilla, R., Krajewski, W. F., Velásquez, N., Small, S. J., Ayalew, T. B., Quintero, F., Jadidoleslam, N., and Fonley, M.: The Hydrological Hillslope-Link Model for Space-Time Prediction of Streamflow: Insights and Applications at the Iowa Flood Center, in: Extreme Weather Forecasting, Elsevier, ISBN 978-0-12-820124-4,, 2022. 

Mantilla, R., Fonley, M., and Velásquez, N.: Data for: Technical Note: Testing the Connection Between Hillslope Scale Runoff Fluctuations and Streamflow Hydrographs at the Outlet of Large River Basins, Zenodo [data set],, 2022. 

Milly, P. C. D., Betancourt, J., Falkenmark, M., Hirsch, R. M., Kundzewicz, Z. W., Lettenmaier, D. P., and Stouffer, R. J.: Stationarity is dead: Whither water management?, Science, 319, 573–574,, 2008. 

Peng, Y., Wang, Q., Wang, H., Lin, Y., Song, J., Cui, T., and Fan, M.: Does landscape pattern influence the intensity of drought and flood?, Ecol. Indic., 103, 173–181,, 2019. 

Quintero, F., Mantilla, R., Anderson, C., Claman, D., and Krajewski, W.F.: Assessment of changes in flood frequency due to the effects of climate change: Implications for engineering design, Hydrology, 5, 19,, 2018. 

Refsgaard, J. C.: Parameterisation, calibration and validation of distributed hydrological models, J. Hydrol., 198, 69–97,, 1997. 

Remmers, J. O. E., Teuling, A. J., and Melsen, L. A.: Can model structure families be inferred from model output?, Environ. Model. Softw., 133, 104817,, 2020. 

Sadeghi Loyeh, N. and Massah Bavani, A.: Daily maximum runoff frequency analysis under non-stationary conditions due to climate change in the future period: Case study Ghareh Sou basin, J. Water Clim. Change, 12, 1910–1929,, 2021. 

Salas, J. D., Obeysekera, J., and Vogel, R. M.: Techniques for assessing water infrastructure for nonstationary extreme events: a review, Hydrolog. Sci. J., 63, 325–352,, 2018. 

Schilling, K. E. and Helmers, M.: Effects of subsurface drainage tiles on streamflow in Iowa agricultural watersheds: Exploratory hydrograph analysis, Hydrol. Process., 22, 4497–4506,, 2008. 

Schilling, K. E., Jha, M. K., Zhang, Y. K., Gassman, P. W., and Wolter, C. F.: Impact of land use and land cover change on the water balance of a large agricultural watershed: Historical effects and future directions, Water Resour. Res., 45, 1–12,, 2008. 

Schilling, K. E., Gassman, P. W., Arenas-Amado, A., Jones, C. S., and Arnold, J.: Quantifying the contribution of tile drainage to basin-scale water yield using analytical and numerical models, Sci. Total Environ., 657, 297–309,, 2019. 

Seo, B., Krajewski, W. F., Quintero, F., Buan, S., and Connelly, B.: Assessment of Streamflow Predictions Generated Using Multimodel and Multiprecipitation Product Forcing, J. Hydrometeor., 22, 2275–2290,, 2021. 

Seo, B., Quintero, F., and Krajewski, W. F.: Hydrologic Assessment of IMERG Products Across Spatial Scales Over Iowa, J. Hydrometeorol., 24, 997–1015,, 2023. 

Shen, H., Tolson, B. A., and Mai, J.: Time to Update the Split-Sample Approach in Hydrological Model Calibration, Water Resour. Res., 58, 1–26., 2022. 

Singh, V. P.: Regional Hydraulic Geometry, In: Handbook of Hydraulic Geometry: Theories and Advances, Cambridge University Press, Cambridge, 529–554,, 2022. 

Snell, J. D. and Sivapalan, M.: On geomorphological dispersion in natural catchments and the geomorphological unit hydrograph, Water Resour. Res., 30, 2311–2323,, 1994. 

Surkan, A. J.: Synthetic Hydrographs: Effects of Network Geometry, Water Resour. Res., 5, 112–128,, 1969. 

Taye, M. T., Willems, P., and Block, P.: Implications of climate change on hydrological extremes in the Blue Nile basin: A review, J. Hydrol. Reg. Stud., 4, 280–293,, 2015.  

Tijerina, D., Condon, L., FitzGerald, K., Dugger, A., O'Neill, M. M., Sampson, K., Gochis, D., and Maxwell, R.: Continental Hydrologic Intercomparison Project, Phase 1: A Large-Scale Hydrologic Model Comparison Over the Continental United States, Water Resour. Res., 57, 1–27,, 2021. 

Velásquez, N. and Mantilla, R.: Limits of predictability of a global self-similar routing model in a local self-similar environment, Atmosphere, 11, 791,, 2020. 

Velásquez, N., Mantilla, R., Krajewski, W.F., Fonley, M., and Quintero, F.: Improving Hillslope Link Model Performance from Non-Linear Representation of Natural and Artificially Drained Subsurface Flows, Hydrol., 8, 187,, 2021. 

Vetter, T., Huang, S., Aich, V., Yang, T., Wang, X., Krysanova, V., and Hattermann, F.: Multi-model climate impact assessment and intercomparison for three large-scale river basins on three continents, Earth Syst. Dynam., 6, 17–43,, 2015. 

Whitehead, P. G., Jin, L., Macadam, I., Janes, T., Sarkar, S., Rodda, H. J. E., Sinha, R., and Nicholls, R. J.: Modelling impacts of climate change and socio-economic change on the Ganga, Brahmaputra, Meghna, Hooghly and Mahanadi river systems in India and Bangladesh, Sci. Total Environ., 636, 1362–1372,, 2018. 

Zarlenga, A., Fiori, A., and Cvetkovic, V.: On the interplay between hillslope and drainage network flow dynamics in the catchment travel time distribution, Hydrol. Process., 36, e14530,, 2022. 

Short summary
Hydrologists strive to “Be right for the right reasons” when modeling the hydrologic cycle; however, the datasets available to validate hydrological models are sparse, and in many cases, they comprise streamflow observations at the outlets of large catchments. In this work, we show that matching streamflow observations at the outlet of a large basin is not a reliable indicator of a correct description of the small-scale runoff processes.