Interactive comment on “Large-sample hydrology: a need to balance depth with breadth”

A holy grail of hydrology is to understand catchment processes well enough that models can provide detailed simulations across a variety of hydrologic settings at multiple spatiotemporal scales, and under changing environmental conditions. Clearly, this cannot be achieved only through intensive place-based investigation at a small number of heavily instrumented catchments, or by empirical methods that do not fully exploit our understanding of hydrology. In this opinion paper, we discuss the need to actively promote and pursue the use of a "large catchment sample" approach to modeling the rainfall–runoff process, thereby balancing depth with breadth. We examine the history of such investigations, discuss the benefits (improved process understanding resulting in robustness of prediction at ungauged locations and under change), examine some practical challenges to implementation and, finally, provide perspectives on issues that need to be taken into account as we move forward. Ultimately, our objective is to provoke further discussion and participation, and to promote a potentially important theme for the upcoming Scientific Decade of the International Association of Hydrological Sciences entitled Panta Rhei.


Introduction
"Because almost any model with sufficient free parameters can yield good results when applied to a short sample from a single catchment, effective testing requires that models be tried on many catchments of widely differing characteristics, and that each trial cover a period of many years."(Linsley, 1982).

Motivations for developing large-sample hydrology
A holy grail of hydrological science is to achieve a degree of process understanding that enables construction of models that are capable of providing detailed and physically realistic simulations across a variety of different hydrologic environments, and at multiple spatial and temporal scales (Nash and Sutcliffe, 1970; Klemeš, 1986a; Michel et al., 2006). With its focus on reducing predictive uncertainty, the Prediction in Ungauged Basins (PUB) initiative of the International Association of Hydrological Sciences (IAHS) has helped move the culture of hydrologic science closer to this objective, and away from a reliance upon universal models applied via recalibration at each study location (Hrachowitz et al., 2013).
This move has deep roots (see Linsley, 1982), but has become considerably stronger since the 1999 IAHS meeting in Birmingham, where the idea was extensively discussed. It has helped drive the search for improved understanding of the hydrological cycle, and for modeling approaches that (a) achieve the three R's (reliability, robustness and realism); (b) have greater generality and transposability; and (c) have parameters that can be more easily specified from data.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Clearly, this cannot be achieved only through detailed studies at heavily instrumented catchments, although such studies are of critical importance. Nor can it be achieved by simple regionalization methods based primarily on statistical approaches and/or spatial extrapolation rather than improved understanding of hydrologic behavior. What is needed is to take further advantage of the extensive data sets now available (and becoming available) to make common a large-sample approach to hydrological investigation (Andréassian et al., 2006). While this emphasis has long been the focus of what is known as comparative hydrology (e.g., Kovacs, 1984; Falkenmark and Chapman, 1989; Thompson et al., 2011; Blöschl et al., 2013; see also Sect. 5.2), it is gratifying to see that more and more recent investigations have recognized the value of adopting this approach (see numerous examples cited in this paper).

The context of current practice
The context of much current hydrological practice is a focus on depth rather than breadth (with several notable exceptions), wherein detailed process investigation and model development/refinement are conducted at only one or a limited number of catchments. The typical goal is to (a) learn more about a specific catchment by improving upon some prior concept, or (b) establish a basis for prediction and decision-making at that specific location. This might be called place-based learning. By contrast, the scientific aspiration is to generalize from the study of specific cases, so that we can discover and establish general hydrological principles, thereby advancing hydrological understanding.
Over the past several decades, the hydrology community has developed a number of generic model codes or model development frameworks (multi-hypothesis environments) that can be applied to a given catchment study (see discussion in Clark et al., 2011 and Gupta et al., 2012). These codes make it possible to select a specific off-the-shelf model structure, after which one need only specify values for the parameters. If model performance is inadequate, attempts can be made to diagnose model deficiencies (Fig. 1) and find ways to correct/improve the model/hypothesis (Gupta et al., 2008; McMillan et al., 2011; Euser et al., 2013). The community has also made available a large number of data sets that can be used to test such codes.
However, attempts at such generalization are fraught with difficulties. Faced with the tremendous geo-eco-hydroclimatic variability of environments across the world, attempts based on generic models are typically complicated by considerable noise in the individual results (e.g., Oudin et al., 2010; Savenije, 2009), these being due to unresolved model identification issues arising from a combination of inadequate data (insufficient information), data noise, model structural inadequacy, and weak model identification techniques (e.g., see Gupta et al., 1998, 2008, 2012). This is particularly true as both model realism and spatiotemporal resolution are increased.
On the one hand, such difficulties have contributed to the counter notion of uniqueness of place (Beven, 2000), and that the model structure should adapt to reflect spatial differences in the dominant hydrologic processes. On the other hand, global-scale hydrological studies typically use only a single conceptual structure to represent all locations around the world, attempting to account for the place-to-place differences entirely through the specification of model parameters (representing differences in soil and vegetation properties) while largely ignoring the spatial variability in dominant hydrological processes (Dirmeyer et al., 2006).

Purpose of the paper
The purpose of this opinion paper is to encourage a greater focus on large catchment sample type studies in hydrology, to complement the approach of intensive place-based investigation. Since the use of large samples in hydrology is certainly not new, we provide some historical perspective (but do not attempt a comprehensive review), motivating the need for more such studies, illuminating some of the challenges, and examining issues related to their design and implementation. To be clear, we do not suggest a reduction in efforts dealing with detailed catchment studies; both kinds of investigations (including ones in the middle ground) are necessary. Our objective is to provoke further discussion and participation, and to promote an important theme for the upcoming IAHS scientific decade (Panta Rhei; Montanari et al., 2013).

Early attempts at large-sample studies
The issue of using large numbers of catchments for hydrological investigation is not new; early attempts go back more than thirty years. However, practical reasons (including limitations in data access, computing requirements or the ability to collaborate efficiently) made such efforts difficult. Further, there was a common belief that models developed in one location could not be readily applied elsewhere. Consequently, it was difficult to really know the respective merits and usefulness of any existing model.
In 1967, the World Meteorological Organization (WMO) launched an initiative to develop an inventory of models, along with advice to users regarding their accuracy under various hydro-climatic conditions. Not surprisingly, it was quickly deemed useful to carry out an actual model intercomparison study and, in 1973, ten simulation models from seven countries were applied to a set of six catchments (from the USA, the former Soviet Union, Australia, Japan, Cameroon and Thailand) that represent a variety of hydroclimatic conditions (WMO, 1975; Sittner, 1976). Although the investigation did not arrive at definitive conclusions regarding the merits of the models tested, it drew attention to the need for a deeper and wider evaluation of models. At that time, a small number of other model inter-comparison studies involving more than one catchment were also carried out; for example, see Mein and Brown (1975), who compared three models on four catchments in Australia, and James (1972), Egbuniwe and Todd (1976) and Magette et al. (1976) for their work on the Stanford model using 2 to 16 catchments. From the 1970s onward, a significant number of studies have used large samples for the statistical investigation of floods (e.g., Hardison, 1974; Hosking et al., 1985; Lichty et al., 1990).
A few years later, Linsley (1982) listed generality as one of four main properties desirable in hydrological models (along with accuracy, applicability and ease-of-use). Linsley argued that it was necessary to break out of the habitual practice of developing a different model for each catchment, because that eliminates the opportunity for the learning that comes with repeated applications of the same model. He suggested the necessity for extensive testing of new models (see introductory quote) so that ones that are not sufficiently general can be eliminated, and stressed the usefulness of large-scale applications, saying that the application of a model to many catchments results in many sets of parameters which can conceivably serve as a basis for objective determination of parameters from physical characteristics of the catchments.
Along the same line of thought, Klemeš (1986a) proposed a formal four-level testing scheme to evaluate transposability of a model in time and space. Despite the demanding nature of such testing, Klemeš (1986a) regarded this scheme as a minimum requirement, and stated that the "use of more test basins [...] would increase the credibility standing of a model, and an accumulation of test results may lead to meaningful generalizations". Bergström (1991) agreed, stating "growing confidence in hydrological modeling can be obtained by applying the model under a span of different geographical, climatological and geological conditions." These ideas, expressed by leading hydrologists, set the basis for studies involving large catchment samples, which have now begun to be more common.
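To make the idea of testing transposability in time concrete, the following is a minimal sketch of a split-sample test, the simplest kind of test in schemes of this type: calibrate on one part of the record, validate on the other, then swap the roles. The one-parameter model (flow = c * precip), the function names and the numbers are all illustrative assumptions, not the procedure or data of any study cited here.

```python
def calibrate_runoff_coefficient(precip, flow):
    """Least-squares fit of the toy model flow = c * precip (assumed model)."""
    return sum(p * q for p, q in zip(precip, flow)) / sum(p * p for p in precip)


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency of a simulated series against observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return 1.0 - sse / sum((o - mean_obs) ** 2 for o in observed)


def split_sample_test(precip, flow):
    """Calibrate on one half of the record, validate on the other, then swap."""
    half = len(precip) // 2
    first, second = slice(0, half), slice(half, None)
    scores = []
    for cal, val in [(first, second), (second, first)]:
        c = calibrate_runoff_coefficient(precip[cal], flow[cal])
        simulated = [c * p for p in precip[val]]
        scores.append(nse(flow[val], simulated))
    return scores


# Hypothetical record: the second half has a higher runoff ratio than the
# first, so a coefficient calibrated on one half transfers poorly to the other.
precip = [10, 0, 20, 5, 15, 0, 30, 10]
flow = [4, 0, 8, 2, 9, 0, 18, 6]
scores = split_sample_test(precip, flow)
```

In a large-sample setting the same function would simply be applied to every catchment in the sample, and the distribution of validation scores (rather than any single value) would be examined.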

Examples of relevant literature from a rainfall-runoff modeling perspective
While several modeling studies during the 1980s used more than one catchment (e.g., Weeks and Hebbert, 1980; Naef, 1981; Pirt and Bramley, 1985; Loague and Freeze, 1985; Weeks and Ashkanasy, 1985; WMO, 1986; Srikanthan and Goodspeed, 1988), the actual emergence of large-sample studies arguably occurred in the early 1990s, coinciding with a progressive increase in availability of computing power, and using time series that are sufficiently long to enable robust assessments.
To illustrate the growing interest in large-sample studies, a list of 94 published rainfall-runoff modeling studies that used more than 30 catchments is presented in the Supplement. This list is not intended to be comprehensive or exhaustive, but rather a rough overview; many other kinds of hydrological investigations (such as flood studies, hydro-ecological investigations, etc.) are also based on a large-sample approach, but are not listed here. The sample sizes in our list range from 30 to 1508, with a median of 140. Early studies were mainly from Australia, France and Belgium, followed later by the UK, Austria and USA, etc. (see e.g., the reviews by Boughton, 2006 and Merz et al., 2006, for Australia and Austria respectively, and more comprehensive discussion in Blöschl et al., 2013).
The catchments in this list are from a variety of physical, climatic and hydrological conditions (see Fig. 2). While some studies focused on national data sets, others included catchments from several countries (although typically fewer than five). The studies focused on a range of spatial and temporal scales: catchment areas ranged from 1 to 130 000 km² and models were run at hourly, daily, monthly, annual and inter-annual time steps. The study goals covered a variety of purposes. Not surprisingly, most studies used conceptual-type rainfall-runoff (CRR) models rather than physically based models, probably due to their relative ease of implementation on large samples and lower data and computing requirements. In general, the reasons for using large catchment samples were (1) to arrive at conclusions that were more general than could be achieved using a single catchment (e.g., about the relative merits of various methods); (2) to establish the range of applicability, or expected level of efficiency, of methods/models; or (3) to ensure sufficient information to enable statistically significant relationships to be established (e.g., between catchment descriptors and model parameters in regionalization studies).
In addition, several groups compiled large-sample data sets with the goal of facilitating collective efforts. The pioneering inter-comparison studies sponsored by the WMO (1975, 1986, 1992) involved several teams and data sets. The Model Parameter Experiment promoted by NOAA (MOPEX, www.nws.noaa.gov/oh/mopex/; see Schaake et al., 2001; Chahinian et al., 2006; Duan et al., 2006; Schaake and Duan, 2006, etc.) engaged several groups in the study of a sample of 438 catchments from the USA and 40 in France. Most recently, the Distributed Model Intercomparison Projects (DMIP-I and DMIP-II, Smith et al., 2004, 2012; www.nws.noaa.gov/oh/hrl/dmip/) engaged several groups in a comparative assessment of spatially distributed hydrological models (and associated parameter estimation strategies), using comprehensive data sets from a large number of US catchments.
Likewise, large-scale comparative studies such as the North American Land Data Assimilation System (NLDAS-I and NLDAS-II), the Project for Intercomparison of Land-surface Parameterization Schemes (PILPS), the Rhone-Aggregation Land Surface Scheme Intercomparison Project, and the Global Energy and Water Cycle Exchanges Project (GEWEX), have compiled a large array of hydrometeorological data sets (see Table 1). While these data sets were compiled for regional-scale land-surface models at a relatively large spatial resolution (1/8° or larger), such data sets could provide a useful starting point for hydrological investigations over large and diverse spatial domains.

General benefits of large-sample hydrology
There are at least four clear benefits to studies that work with data from large numbers of catchments:

Improved understanding
First, large-sample studies provide better opportunities for improving hydrological science (Ehret et al., 2013) by facilitating rigorous testing of competing model structures and component hypotheses (Clark et al., 2011), and enabling better diagnosis of limitations, range of applicability and capabilities for extrapolation (Gupta et al., 2008; Martinez and Gupta, 2011; Coron et al., 2012). Recent applications of the comparative hydrology approach are an excellent example (Blöschl et al., 2013).

Robustness of generalizations
Second is the possibility of bringing methods of statistical analysis to bear, so that statistical robustness can be achieved, and degrees of confidence can be established (Mathevet et al., 2006). In doing so, large samples can help both to reduce the impact of severe data errors (sometimes present in only a few of the catchments), and to identify and target outliers (unusual cases) for special attention (Andréassian et al., 2010).
In this context, it is important to not reject a data set just because the selected model fails to reproduce its behavior (Le Moine et al., 2007;Boldetti et al., 2010).

Classification, regionalization and model transfer
Third, large samples support and facilitate the development of catchment classification and regionalization systems that provide insights into hydrological behavior (a key motivation of the PUB initiative), thereby facilitating transfer of understanding to ungauged locations (Sivapalan, 2003; Götzinger and Bardossy, 2007; Oudin et al., 2010). Since process dominance varies with climatic and physiographical characteristics of a catchment (terrain, soil, geology, etc.) and other factors, a satisfactory regionalization system (see important work by Winter, 2001; McDonnell and Woods, 2004; Wagener et al., 2007; among others) cannot be achieved without spatially extensive data sets that are representative of catchment types worldwide. Further, prior testing on large and diverse catchment sets paves the way for improved model transfer into operational use; as expressed by Bergström (1991), "the large number of applications have gradually built up our confidence in the use of these models to a degree where we can continue our operational applications and accept the models as the foundation for further model development."

Estimation of uncertainty
Fourth, large samples support and facilitate better understanding of how much uncertainty can be expected in model predictions given available knowledge (Andréassian et al., 2007), by making possible a statistical regionalization of uncertainty estimates (Skøien and Blöschl, 2007; Bourgin et al., 2013; Blöschl et al., 2013) and thereby indicating (a priori) how much prediction uncertainty can be expected at arbitrary locations (including ungauged ones).

Availability and quality of data sets having large numbers of catchments
Widespread availability of large-sample data sets is, of course, a key requirement for further progress. However, attempts to make (currently local) data sets available to the global community have run into problems related to economics and ownership. While hydrology data in the USA are mainly in the public domain and easily accessible to scientists worldwide, legal and economic barriers to exchange of data inhibit the wider spread of many national data sets (Fig. 3). While discharge data are increasingly freely available, climate data are often less easily accessible in many countries. To better understand such barriers, Viglione et al. (2010) surveyed data providers and users from 32 European countries. The primary reason cited was economic, with public institutions responsible for data collection and administration facing growing financial pressure due to reductions in government funding (Freebairn and Zillman, 2002). [...] 1 that could be useful for large-sample hydrological investigations. However, there is the important issue of adequate metadata (details regarding how the data were collected, and what can be done with them). While it is increasingly becoming standard practice to provide metadata, at least for large data sets (e.g., see the WMO/GRDC initiative www.bafg.de/GRDC and wis.wmo.int, and the work on metadata for NEXRAD by Kruger et al., 2011), such information is not generally available for catchment data sets. One reason is that it is generally more difficult to apply the same kind of rigorous quality checks that can be implemented at heavily instrumented catchments, and even something as basic as visual inspection can be too time-consuming when compiling samples of hundreds of catchments.
Finally, as pointed out by a reviewer, we should mention that there is a considerable amount of relevant grey literature dealing with catchment data sets from applied sciences and operational practice (e.g., NOAA, 2007; Hughes, 2013; Uhlemann et al., 2013). We face a significant challenge in better exploiting and incorporating such data into scientific investigation.

Reporting and sharing protocols for data and models
A longstanding issue is, of course, the need for coherence in the way data and models are reported, stored and shared; this is currently done in a number of different ways depending on their nature and the purpose for which they were compiled. Data deemed to be of wider interest to the hydrologic community are now increasingly published in journals through data and analysis notes, along with metadata. Further, data collected by Hydro-meteorological Services, and other public agencies, are more easily accessible via the Internet, although typically with much less meta-information (Viglione et al., 2010). While considerable attention has been given to protocols for documenting and sharing data during the past several decades (Jones et al., 1979; Goodall et al., 2008; Viglione et al., 2010; see also https://www.wmo.int/pages/prog/gcos/Publications/gcos-96.pdf), the procedures for documenting and sharing models (computer codes) remain extremely ad hoc. At the same time, protocols for the reporting of model performance are largely non-existent. As noted elsewhere: "As a community, we have fallen into reliance on measures and procedures for model performance evaluation that say little more than how good or bad the model-to-data comparison is in some 'average' sense" (Gupta et al., 2008). Consistent reporting of sets of more informative (than mean squared error or Nash-Sutcliffe efficiency) and properly benchmarked measures of model performance is necessary to better facilitate the generalization of findings from individual case studies (Mathevet et al., 2006; Perrin et al., 2006; Schaefli and Gupta, 2007, among others).
It is important to keep in mind that the primary purpose of reporting is to make the information useful to the recipient. The comparative assessment of Blöschl et al. (2013) reported that inconsistency in use and reporting of model performance hampered their investigation. Further, they had to actually approach the authors of the publications to ask for their data and model outputs. To effectively handle such problems, a data repository linked to the papers would have been extremely useful.
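As a hedged illustration of why benchmarking matters, the sketch below computes the Nash-Sutcliffe efficiency (which uses the mean of the observations as its implicit reference) alongside a benchmark efficiency of the same form in which the reference is an explicit benchmark series. The function names, the choice of benchmark and the numbers are assumptions made for illustration; they are not taken from the studies cited above.

```python
from statistics import fmean


def nash_sutcliffe(observed, simulated):
    """1 minus the ratio of squared model error to variance about the observed mean."""
    mean_obs = fmean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ref = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / ref


def benchmark_efficiency(observed, simulated, benchmark):
    """Same form as NSE, but the reference is a user-supplied benchmark series."""
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ref = sum((o - b) ** 2 for o, b in zip(observed, benchmark))
    return 1.0 - sse / ref


# Hypothetical flows (illustrative numbers only, not data from any study).
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [1.5, 2.5, 2.5, 3.5, 4.5]
# Assumed benchmark, e.g. a simple climatology that already tracks the regime.
benchmark = [1.0, 2.0, 3.0, 4.0, 4.0]

nse = nash_sutcliffe(obs, sim)
be = benchmark_efficiency(obs, sim, benchmark)
```

With these numbers the model scores well against the observed mean yet worse than the benchmark (a negative benchmark efficiency), which is the kind of information a single aggregate score hides.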

Identification of model structures and parameters
With growing availability of geo-spatial data sets and increases in computational power, there has been significant progress in the development of local-, regional- and continental-scale spatially distributed hydrologic models that simulate hydrological processes at relatively high resolution, and at points that are effectively ungauged (e.g., Carpenter and Georgakakos, 2004; Ivanov et al., 2004; Reed et al., 2004; Koren et al., 2004; Smith et al., 2012). However, while such models make possible detailed support for water resource management (e.g., Blöschl et al., 2008), more work needs to be done on identifiability and transferability of their structures and parameters (e.g., see Beven, 1989, 2002; Grayson et al., 1992; Kirchner, 2006; Samaniego et al., 2010; Andréassian et al., 2012), and on quantifying predictive uncertainty (Beven and Freer, 2001; Doherty and Johnston, 2003; Pokhrel et al., 2008). To establish credibility, such models (and their sub-components) must be properly tested at large numbers of catchments encompassing a wide variety of land-surface and climatic conditions.

Overcoming barriers to sharing data
To state the obvious, large-sample hydrology requires large samples of relevant data sets. Most current studies of large-sample catchment hydrology have focused on regional or national scales (e.g., Parajka et al., 2005, 2007; Oudin et al., 2008, 2010; Kumar et al., 2013) and there is a need to extend these to global scale. As computer power and data storage capabilities continue to increase, one might expect that data exchange will also increase. Nonetheless, the PUB synthesis effort (Blöschl et al., 2013), driven largely by the desire to conduct comparative studies, demonstrated clearly that it is indeed possible to compile and jointly analyze large data sets.
Clearly, it will be necessary for the community at large to continue to develop, and vigorously promote, specific policies designed to make hydrology-related data more easily and widely available (Beniston et al., 2012), and this will require deliberate efforts by a spectrum of organizations, including the International Association of Hydrological Sciences, and the Panta Rhei (Change in Hydrology and Society) initiative. At a more general level it may be necessary to bring about a policy change for data providers and data users. Governments and hydrological services need to be informed about the benefits of shared information for them, and more transparent protocols for the transfer of information may be needed to fully exploit the rich hydrologic data legacy that exists around the world (Viglione et al., 2010). By pointing beyond benchmarking, to the possibility of actually learning new things, we hope that hydrologists and institutions will be convinced of the need to find ways to overcome the economic and legal constraints to building and sharing reference data sets.
Further, as mentioned in Sect. 4.2, easier access to the data and model outputs used in published work will be extremely useful (the Supporting Materials concept now used by several journals is a good step forward). It is common knowledge that the disparity in data characteristics (amount, type, quality, resolution, etc.) from region to region impedes our attempts to make progress in improving process understanding, refining models, improving predictions and reducing uncertainty. In this regard, progress will likely be much faster if large-sample studies include some discussion of the data requirements and availability specific to the various regions, and where and how the models are limited by data. Of course, there will always be regional issues, and issues with interpreting historical data sets.

Enhancing the link with comparative hydrology
Large-sample studies have traditionally focused on statistical analysis (e.g., to develop regression relationships for flood regionalization or to estimate and transfer model parameters to ungauged basins). With PUB, there has been a move beyond this to focus on the importance of investigating and demonstrating causal processes (Merz and Blöschl, 2008a, b; Carrillo et al., 2011; Troch et al., 2013), by seeking to understand the controls and macro-scale signatures that arise from their actions and interactions (Wagener and Montanari, 2011; Grauso et al., 2008; Sawicz et al., 2011, 2013).
In this regard, the study of comparative hydrology (Falkenmark and Chapman, 1989) seeks to exploit knowledge regarding a much wider array of conditions and processes than can ever be possible with a single model structure, or by studies based on limited numbers of catchments, thereby providing a valuable complement to the detailed investigation of specific catchments. For example, the PUB Synthesis report (Blöschl et al., 2013) revealed significant trends of decreasing performance with increasing aridity, and increasing performance with increasing catchment size and data availability (see Parajka et al., 2013; Salinas et al., 2013), patterns that would have been impossible to detect in any other way. However, comparative studies of regionalization approaches have resulted in strikingly varied conclusions. One likely reason, though not the only one (see Oudin et al., 2008), may be the use of insufficiently large catchment samples, which would cause the conclusions to be overly dependent on differences in the characteristics represented by each data set.
In fact, large-sample hydrology provides an opportunity for both regionalization and comparative hydrology. While regionalization studies focus on improved hydrological predictions by making use of data from neighboring catchments, in comparative hydrology the focus is on contrasting catchments from different regions to develop generalized understanding about the causes and controls that give rise to these differences. It is noteworthy that the PUB decade of IAHS has seen significant progress in both regionalization approaches and in comparative hydrology, as summarized by Blöschl et al. (2013).
Of course, several key issues will need deeper investigation. For example, no clarity has yet emerged regarding which (observable) catchment characteristics are hydrologically relevant to catchment classification, as required for comparative hydrology. Further, how process dominance (and complexity) varies with environment and scale (Skøien et al., 2003), and the role of thresholds, are not well understood. These make it difficult to properly link catchment types to the selection of appropriate process representation (thereby determining model structures). Importantly, classification could help in identifying typical (i.e., normal or representative) catchments to be targeted for intensive investigation, thereby providing a stronger basis for catchment observatory initiatives such as CUAHSI. Meanwhile, the contrast with typical or normal ranges for catchments in a given class will facilitate the detection of outlier catchments similarly deserving of targeted special attention.

Achieving good performance for the right reasons
Clearly, model structures used in any large-sample investigation should be representative of the diversity of hydrological processes present, so that good performance is achieved for the right reasons (Klemeš, 1986b; Kirchner, 2006; Martinez and Gupta, 2010, 2011). Moreover, the tendency to universally apply a fixed set of assumptions regarding driving mechanisms and process structure can sometimes miss the more important processes in a particular catchment (Savenije, 2009; Halihan et al., 2009; Blöschl et al., 2013).
One obvious approach to model development would be to begin with a highly complex representation and progressively simplify the structure applied to each catchment to achieve a justifiably parsimonious representation. However, as expressed by Bergström (1991), "Going from complex to simpler model structures requires an open mind, because it is frustrating to have to abandon seemingly elegant concepts and theories. It is normally much more stimulating, from an academic point of view, to show significant improvement of the model performance by increasing complexity." Following the latter approach, Nash and Sutcliffe (1970) had presented a strategy for model development from the simple to the complex "which may help to avoid this frustration" (Bergström, 1991). This allows one to begin with simple assumptions regarding process structure, identify catchments where these assumptions result in poor performance, and then progressively introduce appropriate complexity justified by diagnostic tests and other evidence. In this regard, the repeated use of a universal model structure as a starting point can aid in the development of the experience necessary to diagnose what does, and does not, work at a specific location.
Hydrol. Earth Syst. Sci., 18, 463-477, 2014 www.hydrol-earth-syst-sci.net/18/463/2014/
Ultimately, the goal of working with large catchment sets is to better understand the hydrological cycle. Through a process of designing improved local models, and by understanding the relationships between functional behavior (expressed through model structure) and observable characteristics of the catchment, we should progress towards better comprehension of catchments as systems.

Estimating model parameters
Structure selection does not, of course, resolve the problem of parameter specification, which is made more complex by the spatial heterogeneities involved in large-sample studies. Fortunately, progress has been made by recognizing that parameter values at specific locations within spatial fields can be linked to observable basin physical characteristics (soil texture, vegetation, topography, etc.) that display strong patterns of organization in space (Abdulla and Lettenmaier, 1997; Fernandez et al., 2000; Hundecha and Bardossy, 2004; Pokhrel et al., 2008; Hundecha et al., 2008; Samaniego et al., 2010; Kumar et al., 2013; among others). For example, some studies have pursued an approach of developing regional relationships that map observable catchment characteristics onto what are sometimes called a priori parameter estimates (Koren et al., 2000; Leavesley et al., 2003; Blöschl, 2005). An important benefit of this approach is to regularize the optimization problem, limiting the degrees of freedom to a small number of regional transfer function coefficients called global-, super-, or hyper-parameters (see Pokhrel et al., 2008; Samaniego et al., 2010).
To obtain robust estimates for these coefficients, the model must be run at a large number of catchments, and across diverse hydro-climatic conditions, so that hypothesis tests about the structures of the regional transfer functions can be conducted. This approach modifies the modeling problem in an interesting way, as one must now evaluate an augmented model hypothesis consisting of both (a) the catchment model structure relating system inputs to state variables and outputs, and (b) the regional transfer function structure and coefficients relating catchment properties to model parameters.
Although the complexity of the hypothesis under investigation is increased, this approach provides statistical advantages by (i) accounting for random variability in catchment properties that tends to diminish their correlation with lumped catchment-scale properties (Kling and Gupta, 2009); (ii) reducing the biasing effects of data noise due to the damping effects of larger sample sizes; and (iii) improving identifiability due to increased diversity of hydro-climatic conditions covered. Recent work on multi-scale parameter regionalization (Samaniego et al., 2010; Kumar et al., 2013) suggests that the approach is robust and can facilitate the transfer of model hypotheses across spatial domains. Further strength could be achieved by constraining the model using information about hydrological dynamics provided by soil moisture, snow cover, etc. (Parajka and Blöschl, 2008; Parajka et al., 2009).
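The idea of replacing per-catchment parameters with a small set of regional transfer function coefficients can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the studies cited above: the single-parameter linear reservoir, the logistic transfer function, the single catchment attribute, the synthetic data, and the grid search.

```python
import math

def transfer(attr, a, b):
    """Hypothetical transfer function: map one catchment attribute to a
    recession coefficient k in (0, 1) using two global coefficients."""
    return 1.0 / (1.0 + math.exp(-(a + b * attr)))

def simulate(precip, k, storage=0.0):
    """Toy linear reservoir: each step releases a fraction k of storage."""
    flows = []
    for p in precip:
        storage += p
        q = k * storage
        storage -= q
        flows.append(q)
    return flows

def regional_error(coeffs, catchments):
    """Sum of squared errors over ALL catchments for one (a, b) pair."""
    a, b = coeffs
    total = 0.0
    for c in catchments:
        sim = simulate(c["precip"], transfer(c["attr"], a, b))
        total += sum((s - o) ** 2 for s, o in zip(sim, c["obs"]))
    return total

def calibrate(catchments, grid):
    """Exhaustive search over candidate (a, b) pairs (illustration only;
    a real study would use a proper optimizer)."""
    return min(grid, key=lambda ab: regional_error(ab, catchments))

# Synthetic data: five catchments generated from the same "true" coefficients.
TRUE_A, TRUE_B = -1.0, 2.0
precip = [5.0, 0.0, 12.0, 3.0, 0.0, 0.0, 8.0, 1.0, 0.0, 4.0]
catchments = []
for attr in [0.1, 0.3, 0.5, 0.7, 0.9]:
    obs = simulate(precip, transfer(attr, TRUE_A, TRUE_B))
    catchments.append({"attr": attr, "precip": precip, "obs": obs})

# Two global coefficients are estimated instead of five local k values.
grid = [(a / 2.0, b / 2.0) for a in range(-6, 7) for b in range(-6, 7)]
best_a, best_b = calibrate(catchments, grid)

# The fitted transfer function can then be applied to a catchment that
# has an attribute value but no streamflow record.
k_ungauged = transfer(0.6, best_a, best_b)
```

In this sketch the augmented hypothesis has two parts that are tested together: the reservoir structure in `simulate` and the transfer function structure in `transfer`. A poor regional fit could point to either part, which is why a diverse catchment sample is needed to tell them apart.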

Testing hypotheses and assessing reliability
Ultimately, for models to be demonstrably robust, they must pass the kind of crash testing proposed by Linsley (1982), Klemeš (1986a), Andréassian et al. (2009), and Coron et al. (2012), among others. While large-sample studies offer a unique opportunity to test hypotheses that are based in process understanding, the aggregate performance indices commonly used in hydrology for performance assessment are poorly benchmarked (Legates and McCabe, 1999; Seibert, 2001; Schaefli and Gupta, 2007; Parajka et al., 2013) and not easily related to catchment properties and functioning (Gupta et al., 2008). They are, therefore, of limited usefulness in large-sample assessments. A key to better use of large-sample studies is to find ways to relate model performance and predictive uncertainty (and therefore model adequacy) to catchment structure and function, thereby providing insight into which processes the model is incapable of describing well (Fig. 1) and moving beyond the commonplace approach of tuning the model to compensate for model structural adequacy problems (Gupta et al., 2012). In this regard, it will be helpful to shift the focus of model evaluation away from data fitting towards an emphasis on reproduction of diagnostic signatures (Fig. 4), that is, a move away from hydrograph mimicry towards model fidelity (Vogel and Sankarasubramanian, 2003; Yadav et al., 2007; Gupta et al., 2008; Blöschl et al., 2008; Martinez and Gupta, 2011; Clark et al., 2011; Koster and Mahanama, 2012; Castiglioni et al., 2010; etc.). The idea is to make it harder to win the game through calibration (in the sense of model tuning) and to instead seek answers that are correct for the right reasons (Kirchner, 2006). Therefore, further progress in diagnostic analysis (Yilmaz et al., 2008; Martinez and Gupta, 2010, 2011; McMillan et al., 2011), improved understanding of physical land-surface constraints (Koster and Mahanama, 2012), and improved assessments of model structural adequacy (Gupta et al., 2012) and uncertainty (Montanari et al., 2009; Montanari, 2011; Montanari and Koutsoyiannis, 2012) are required. All of these areas depend on increased analytical sophistication, and will undoubtedly benefit from the breadth of information contained in large-sample data sets.
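The contrast between an aggregate index and diagnostic signatures can be made concrete with a small sketch. The two signatures chosen here (runoff ratio and the ratio of the 90th percentile flow to the median flow), the helper names, and the synthetic series are illustrative assumptions, not the specific signatures used in the works cited above.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 is no better
    than using the mean of the observations."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def runoff_ratio(flow, precip):
    """Water balance signature: total runoff over total precipitation."""
    return sum(flow) / sum(precip)

def quantile(values, p):
    """Linear-interpolation quantile for 0 <= p <= 1."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def high_flow_ratio(flow):
    """Variability signature: 90th percentile flow over median flow."""
    return quantile(flow, 0.9) / quantile(flow, 0.5)

def signature_errors(obs, sim, precip):
    """Relative error of each simulated signature against the observed one."""
    pairs = {
        "runoff_ratio": (runoff_ratio(obs, precip), runoff_ratio(sim, precip)),
        "high_flow_ratio": (high_flow_ratio(obs), high_flow_ratio(sim)),
    }
    return {name: (s - o) / o for name, (o, s) in pairs.items()}

# Synthetic example: the "model" reproduces the shape of the hydrograph
# but underestimates every flow by 20 percent.
precip = [5.0, 0.0, 12.0, 3.0, 0.0, 0.0, 8.0, 1.0, 0.0, 4.0]
obs = [1.0, 2.0, 6.0, 4.0, 2.0, 1.0, 5.0, 3.0, 2.0, 1.0]
sim = [0.8 * q for q in obs]

score = nse(obs, sim)
errors = signature_errors(obs, sim, precip)
```

Here the aggregate score is roughly 0.86, which on its own says little about what is wrong. The signature errors are more informative: the runoff ratio is off by about 20 percent while the high-flow ratio is essentially unchanged, which points to a water balance problem rather than a problem with flow variability.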

Coping with variability and change
The non-stationarities underlying changing climate and catchment conditions make the problem of estimating hydrological fluxes and predicting catchment response more difficult. In such cases, it becomes important to distinguish situations that are predictable from those that are not (Blöschl and Montanari, 2010; Kumar, 2011). It is now common to use scenario analysis as a way of assessing the impacts of future change on hydrological response (Mahmoud et al., 2009, 2011; among many others). However, investigation using large-sample data sets can facilitate a much wider range of analyses.
For example, large-sample data sets were used by Merz et al. (2011) and Coron et al. (2012) to investigate changes in estimated model parameters associated with changes in climate, and by Ter Braak and Prentice (1988) to use spatial gradients as a surrogate for temporal gradients in investigations of change. Similarly, Peel and Blöschl (2011) suggested that changing climate could cause hydrological processes in one catchment to become similar to those in other catchments currently experiencing conditions similar to the target climate, thereby providing a basis for understanding the effects of change. Ultimately, these analyses must exploit the information provided by data other than runoff, including hydrogeological information, soil moisture from remote sensing products (Parajka et al., 2006), snow characteristics (e.g., Blöschl and Kirnbauer, 1991, 1992), etc.

Trading depth of analysis for sample size
Finally, while the benefits of large-sample hydrology are many and important, they come at a cost. When large numbers of catchments are analyzed, all procedures for selecting model structures and estimating parameters must be automated. This makes it difficult to attend to clues that might otherwise be provided by the assessment of local knowledge (whether hard or soft data), or by process understanding that can be gained from field trips. In this regard, soft information can be as valuable as hard data for improving understanding of catchment functioning (Seibert and McDonnell, 2002). Ultimately, there is no free lunch, and investigations that exploit the benefits of large catchment sample sizes (providing breadth) must be complemented by detailed investigations at specifically targeted catchments (providing depth).

In conclusion
This paper has discussed the need to actively promote and pursue the use of a large catchment sample approach to modeling the rainfall-runoff process, thereby balancing depth with breadth. In either case, the need to understand hydrological change will require that large-sample investigations be guided by process understanding, with statistical analysis playing the important supporting roles of (a) capturing the summary effects of various controls, including feedbacks across processes and scales, and (b) detecting spatial and temporal patterns that will need to be properly explained.
Only then can we expect to achieve improved predictability and the ability to extrapolate to new situations (Kumar, 2011).
We hope that this paper serves the function of provoking further discussion, and will promote what could be an important theme for Panta Rhei, the upcoming IAHS Scientific Decade. In this regard, as suggested by reviewer Sopan Patil, we end with a call for increasing the diversity of efforts in large-sample hydrology (beyond the focus on catchment modeling taken by this paper).

Edited by: M. Sivapalan

Fig. 1. How an approach based in evaluation of signature properties can be used to detect and diagnose model deficiencies and develop appropriate ways to improve the model/hypothesis (figure based on ideas presented in Gupta et al., 2008).

Fig. 2. Map showing locations of catchments reported in the Supplement. Symbol size is proportional to the number of catchments from the corresponding country.

Fig. 3. Perceptions regarding economic barriers to the availability of hydrometeorological data in Europe. Survey results are stratified by data providers (dark grey) and data users (light grey); western Europe (left subplots) and eastern Europe (right subplots); and type of data (bars indicating streamflow, precipitation, radar, geospatial, and others). Yes responses indicate that economic barriers are perceived to exist, whereas No responses indicate that economic barriers are perceived not to exist (reproduced from Viglione et al., 2010).

Fig. 4. Classical versus Diagnostic model evaluation. The classical approach compares model simulations directly with the collected data. In the diagnostic approach, patterns in the model simulations are compared with corresponding patterns in the data (figure based on ideas from Gupta et al., 2008).
Supplementary material related to this article is available online at http://www.hydrol-earth-syst-sci.net/18/463/2014/hess-18-463-2014-supplement.pdf.

Acknowledgements. This paper was partially developed in the context of the Panta Rhei research initiative of the International Association of Hydrological Sciences. We wish to express our appreciation to the Editor, Murugesu Sivapalan, and the several reviewers, Bethanna Jackson, Hillary McMillan, Dingbao Wang, Evan Coopersmith, Sopan Patil, Manfred Ostrowski and Thorsten Wagener, plus one additional anonymous reviewer, for their detailed and constructive comments. Their many remarks and suggestions helped to streamline and improve the presentation of our opinion paper. The first author acknowledges partial support by a grant from the Co-operative Research Programme (Trade and Agriculture) of the Organisation for Economic Co-operation and Development (OECD), and from the Australian Research Council through the Centre of Excellence for Climate System Science (grant number CE110001028). The third author acknowledges funding from the Austrian Science Fund (projects W1219-N22 and P 23723-N21). Olivier Delaigue is thanked for providing Fig. 2.

Table 1. Examples of data sets potentially useful for large-sample hydrological investigations.