TopREML: a topological restricted maximum likelihood approach to regionalize trended runoff signatures in stream networks

. We introduce topological restricted maximum likelihood (TopREML) as a method to predict runoff signatures in ungauged basins. The approach is based on the use of linear mixed models with spatially correlated random effects. The nested nature of streamﬂow networks is taken into account by using water balance considerations to constrain the covariance structure of runoff and to account for the stronger spatial correlation between ﬂow-connected basins. The restricted maximum likelihood (REML) framework generates the best linear unbiased predictor (BLUP) of both the predicted variable and the associated prediction uncertainty, even when incorporating observable covariates into the model. The method was successfully tested in cross-validation analyses on mean streamﬂow and runoff frequency in Nepal (sparsely gauged) and Austria (densely gauged), where it matched the performance of comparable methods in the prediction of the considered runoff signature, while signiﬁcantly outperforming them in the prediction of the associated modeling uncertainty. The ability of TopREML to combine deterministic and stochastic information to generate BLUPs of the prediction variable and its uncertainty makes it a particularly versatile method that can readily be applied in both densely gauged basins, where it takes advantage of spatial covariance information, and data-scarce regions, where it can rely on covariates, which are


Introduction
Regionalizing runoff and streamflow for the purposes of making predictions in ungauged basins (PUB) continues to be one of the major contemporary challenges in hydrology.
At global, regional and local scales only a small fraction of catchments are monitored for streamflow , and this fraction is at risk of decreasing given the ongoing challenge of maintaining existing gauging stations (Stokstad, 1999). Reliable information about local streamflows is essential for the management of water resources, especially in the context of changing climate, ecosystem and demography; flow prediction uncertainties are bound to propagate and lead to significantly suboptimal design and management decisions (e.g., Sivapalan et al., 2014). Techniques are needed to maximize the use of available data in datascarce regions to accurately predict streamflow, while providing a reliable estimate of the related modeling uncertainty.

Linear models
There are a number of approaches to predicting runoff in ungauged catchments, including process-based modeling (e.g., Müller et al., 2014), graphical methods based on the construction of isolines (e.g., Bishop and Church, 1992), and statistical approaches. Statistical approaches are often implemented via linear regression, wherein the runoff signature of interest is considered to be an unobservable random variable correlated with observable features of both gauged and ungauged basins (e.g., rainfall, topography). Such linear models are well understood and widely implemented, not only for PUB (see review in Blöschl et al., 2013, p.83) but also across a wide variety of fields in the physical and social sciences (e.g., Myers, 1990).
Spatial correlation is generally problematic for linear model predictions, including the multiple regression approaches commonly used for regionalization. For example if these models predict a hydrologic outcome, y, using a matrix (1) Here τ is an a priori unknown set of weights that represent the influence of each external trend on the hydrological outcome being modeled. The residuals, η, are the observed variation of y that cannot be explained by a linear relation with X. If the residuals are independent and identically distributed (iid), the best linear unbiased predictions (BLUP) of both y and its uncertainty (i.e., Var(y)) can readily be obtained using ordinary least squares (OLS) regression. Unfortunately, residuals are rarely iid in hydrological applications due to the spatial organization of hydrological processes around the topology of river channel networks. This organization has the potential to introduce non-random spatial correlations with a structure imposed by the river network. To recover a suitable model in which residuals remain independent requires that the model structure be altered to explicitly account for the spatial and topological correlation in the residuals.

Spatial correlation models
There are several techniques available to address spatially correlated data. Within PUB, kriging-based geostatistical methods (Cressie, 1993) have been widely used (e.g., Huang and Yang, 1998;Gottschalk et al., 2006;Sauquet, 2006;Sauquet et al., 2000;Skøien et al., 2006). In a geostatistical framework, a parametric function is used to model the relationship between distance and covariance in observations. The ensuing semi-variogram is assumed to be homogenous in space, and predictions at a point are computed as a weighted sum of the available observations. The weights are chosen to minimize the variance while meeting a given constraint on the expected value of the prediction. In ordinary kriging for PUB applications, this given constraint is simply the average of the streamflow signature as observed in gauged catchments. Ordinary kriging can also be extended as "universal kriging" to include a linear combination of observable features (Olea, 1974). Kriging approaches are widely used to predict spatially distributed point-scale processes like soil properties (e.g., Goovaerts, 1999) and climatic features (e.g., Goovaerts, 2000). Although ordinary kriging has also been used to interpolate runoff (e.g., Huang and Yang, 1998), the theoretical justification for this approach is less robust than for point-scale processes. Runoff is organized around a topological network of stream channels, and the covariance structure implied for prediction should reflect the higher correlation between streamflow at watersheds that are "flow connected" (i.e., share one or more subcatchments), compared to unconnected but spatially proximate catchments. Currently, two broad classes of geostatistical methods accommodate this network-aligned correlation structure.
The first suite of methods posits the existence of an underlying point-scale process, which is assumed to have a spatial auto-correlation structure that allows kriging to be applied. Because the runoff point-scale process is only observed as a spatially integrated measure made at specific gauged locations along an organized network of streams, the spatial auto-correlation structure of the point-scale process cannot itself be observed. Block-kriging approaches (Gottschalk et al., 2006;Sauquet, 2006;Sauquet et al., 2000) infer the semi-variogram of the (unobserved) point scale so as to best reproduce the observed spatial correlation of the area-integrated runoff at the gauges -a procedure known as regularization. The topology of the network is implicitly accounted for by the fact that nested catchments have overlapping areas, which affect the relation between observed (area integrated) and modeled (point-scale) covariances. Yet, complex catchment shapes complicate the regularization of semi-variograms, meaning that the estimation of the pointscale process becomes analytically intractable and requires a trial-and-error approach in most practical applications (e.g., Top-kriging; Skøien et al. (2006)). Top-kriging is an extension of the block-kriging approach that accommodates non-stationary variables and short observation records. Topkriging provides an improved prediction method for hydrological variables when compared to ordinary kriging or linear regression techniques Viglione et al., 2013;Castiglioni et al., 2011) and was recently extended to account for deterministic trends . Topkriging represents an important advance for PUB, but it does have a few drawbacks: (i) the regularization process is unintuitive, and requires extensive trial-and-error to determine both the form of a suitable point-scale variogram, and its parameters; (ii) this trial-and-error process is likely to be computationally expensive; (iii) like all kriging techniques, the estimation of the variogram is challenging when accounting for observable features: the presence of an unknown trend coefficient and variogram leads to an underdetermined problem, making consistent estimates for both challenging. Cressie (1993, p. 166) showed that the presence of a trend tends to impose a spatially inhomogeneous, negative bias on the estimated semi-variogram. The bias increases quadratically with distance, meaning that estimates of the long-range variance (the sill) are strongly impacted by the presence of the trend, leading to an underestimation of the prediction uncertainty. This bias, however, only marginally affects the prediction itself.
Geomorphological considerations of the topology of a river network generally focus on the channels, and lead to an intuitive conceptualization that topological interpolation should focus on runoff correlations along flow paths. The second type of approach embraces this topological structure. It does not consider a point-scale runoff generation process, but instead models the hillslope-scale runoff delivery process to the channel network as a uni-dimensional directed tree (Cressie et al., 2006;Ver Hoef et al., 2006). Runoff correlation is expected to decrease with the distance along the stream following a known parametric function. However, un-M. F. Müller and S. E.Thompson: TopREML -runoff regionalization on stream networks 2927 like Euclidian distances, the streamwise distance does not have the necessary properties to provide a solvable kriging system. This issue is addressed in Cressie et al. (2006) andPeterson (2010), where streamflow is modeled as a random process represented by a Brownian motion that starts at the trunk of the tree (i.e., the river mouth) moves upstream, bifurcates and evolves independently on each branch. The resulting model only allows spatial dependence with points that are upstream on the river network and provides a positive definite covariance matrix that is estimated through restricted maximum likelihood (REML). Models of this nature have been successfully tested on stream chemistry data (Ver Hoef et al., 2006) and further developed to also allow spatial auto-correlation among random variables on stream segments that do not share flow, with potential applications to the modeling of the concentration of upstream moving species (e.g., fishes or insects) (Ver Hoef and Peterson, 2010). While these methods do not account for the streamflow generation process, they avoid the conceptual and prediction uncertainty challenges confronted by kriging techniques.

The topological restricted maximum likelihood approach
Inspired by both types of approaches, here we present a method based on the use of linear mixed models to generate a BLUP for hydrological variables on a flow network. Rather than using a kriging estimator, we adopt a REML framework (Gilmour et al., 2004;Patterson and Thompson, 1971;Lark et al., 2006) to estimate variance parameters. This reduces the bias on the semi-variogram by allowing the variance to be estimated independently from the trend coefficients (Cressie, 1993;Lark et al., 2006). This use of a REML framework to estimate a linear mixed effect model on a topological support is termed topological restricted maximum likelihood (TopREML). The approach is based on the following conceptual assumptions: -Flow generation and propagation: similar to Topkriging, runoff is assumed to be generated at a point scale on the landscape, from where it is routed to a channel and measured at a gauge (Fig. 1i). Runoff observations made at any individual gauge (Fig. 1ii) can be broken up into a local contribution, derived from a never previously gauged catchment area, and an upstream contribution that was previously observed at upstream gauge(s) along the channel (Fig. 1iii). TopREML disaggregates all flow contributions into a cascade of local components, as observed at each successive gauge, and uses these characteristics to constrain the covariance structure of runoff and to account for the stronger spatial correlations between flow-connected basins.
-Treatment of time: for the local effects to form a suitable basis for spatial interpolation, variations associated with temporal correlation (e.g., travel time effects) need to be removed. This is achieved by considering timeaveraged streamflow data, with the proviso that the time averaging window is much greater than the characteristic catchment and channel response timescales. This treatment of time has several specific consequences. First, TopREML is only suitable for the regionalization of time-averaged and statistically stationary runoff properties (i.e., runoff signatures). Stationarity is necessary to ensure that the water balance assumption used to separate local from upstream runoff contributions is valid. However, as a consequence, TopREML cannot be used to interpolate transient signatures, such as those associated with real-time forecasting. Nor can it be used to describe runoff properties that are correlated over timescales larger than the time averaging window. Because of the stationarity assumption applied, all correlation arguments described in this manuscript refer to the spatial, and not temporal, correlation of the runoff signatures.
-Network topology: network topology in TopREML also follows a conceptual model that is similar to the model posited by Top-kriging. Topology is conceptualized by area connectivity. That is, flow-connected gauges are characterized by overlapping drainage areas. Unlike Top-kriging, TopREML does not require information about a spatially random point process, but solely relies on information measured at the gauges. It uses the inter-centroidal Euclidian distance between drainage areas of the local flow contributions at each gauge -the isolated drainage areas (IDA) -as a distance metric to compute streamflow correlation. The underlying assumption is that runoff signatures of local flow generation regions that are close to each other (in Euclidian space) are more likely to be identical. Although TopREML does not require that the characteristics of a point-scale runoff generation process are known in order to support interpolation (a necessary requirement for Top-kriging), the existence of such a point process is consistent with the treatment of spatial correlation in TopREML. To illustrate this consistency, a stylized example relating point-scale runoff generation to the existence of a covariance structure that relates flowconnected gauges is outlined in Appendix A.

Paper outline
We first derive the TopREML estimator and its variance for mass conserving (i.e., linearly aggregated) variables, with extensions to some non-conservative variables (Sect. 2). We then apply the approach in two case studies to evaluate its ability to predict mean runoff and runoff frequency by comparison to other available interpolation techniques: Sects. 3.1 and 4.1 present leave-one-out cross-validations in Nepal (sparse gauges, significant trends) and Austria (dense gauge network, no observed trends). In both cases, TopREML performed similarly to the best alternative geostatistical method. We then use numerical simulations to illustrate the effect of the two distinguishing features of TopREML: its ability to properly predict runoff using highly nested networks of stream gauges and its ability to properly estimate the prediction variance when accounting for observable features (Sects. 3.2 and 4.2). Finally, we discuss the limits and delineate the context in which TopREML -and geostatistical methods in general -can successfully be applied to predict streamflow signatures in ungauged basins (Sect. 5).

Accounting for spatially correlated residuals
Linear models can be used to make predictions about hydrological variables along a network, provided that the models explicitly address the effects of network structure. A mixed linear model approach provides a suitable framework for this accounting. In this framework, the effects of observable features on the hydrological outcome are assumed to be independent of the network, and retain their influence independently, as so-called "fixed effects". The role of spatial structure is assumed to lead to correlation specifically in the residuals η. The residuals are split into two parts: (i) one containing "random effects", u, that exhibit spatial correlation along the flow network and (ii) a remaining, spatially independent, white noise term, , which does not have any spatial structure. With these assumptions, the mixed linear model is written as: . (2) To proceed, we assume that u and (and therefore y) are normally distributed with zero mean and are independent from each other. The variance associated with is denoted σ 2 , the variance of u is assumed to be proportional to σ 2 according to some ratio, ξ , and finally, u is assumed to have a spatial dependence captured by a correlation structure G, which is related to the spatial layout of gauges along the river network and a distance parameter φ (the correlation range). Thus, the random effects can be specified as To solve this mixed model, five unknowns must be found: σ 2 , ξ , φ, the fixed (τ ) and random (u) effects. Once τ and u are known, the empirical best linear unbiased prediction (E-BLUP) of y can be made at ungauged locations (Lark et al., 2006). The solution strategy adopted here is to prescribe a parametric form for G(φ), allowing the covariance structure along the network to be specified, and the likelihood function for the model to be written in terms of all five unknowns. Identifying the parameter values that optimize this model thus simultaneously solves for the correlation structure, covariance parameters, fixed and random effects. To proceed with the specification of G(φ), however, the form of the covariance structure that arises along the network needs to be addressed.

Covariance structure of mass conserving variables
In the linear mixed model framework, the propagation of hydrological variables through the flow network introduces Hydrol. Earth Syst. Sci., 19, 2925-2942, 2015 www.hydrol-earth-syst-sci.net/19/2925/2015/ topological effects into the covariance structure of that variable. Firstly, linearly propagated variables, such as annual specific runoff, are discussed. Nonlinearly propagating variables can in some cases be transformed to allow the linear solutions to be used (as outlined in Sect. 2.5). Consider a set of streamflow gauges monitoring a watershed as illustrated in Fig. 1ii. Because of the nested nature of the river network, the catchment area related to any upstream gauge is entirely included within the area drained by all downstream gauges.
To account for the network structure, the catchment at any location along a stream can be subdivided into the IDA that are monitored for the first time by an upstream gauge. This is illustrated in Fig. 1iii, and leads to a subdivision into nonoverlapping areas, each associated with the most upstream gauge that monitors them. In making this subdivision, it is implicitly assumed that the timescales at which a hydrological variable is propagated in the channel are negligible compared with the timescales on which hillslope effects operate (a generally valid assumption for small to moderately sized watersheds; see D' Odorico and Rigon, 2003). IDAs can be associated with both gauged locations and ungauged locations. In what follows, indices i, j , k and m are used to refer to gauged sites, while index n refers to ungauged sites where a prediction is to be made. With these assumptions, observations of y i made at gauge i can be expressed as a linear combination of contributions from the upstream IDAs: where y k is the contribution of the IDA related to gauge k (that is, y i is equivalent to y i only if there are no gauges upstream of gauge i); UP is the set of IDA monitored by gauges that are located upstream of i; a k = A k / UP m=i A m ≤ 1 is the surface area of the drainage area k normalized by the total watershed area upstream of gauge i. The covariance between observations of y made at different gauges can then be expressed as where Cov y k , y m is the covariance between the contributions of sub-catchments k and m. By summing over UP in Eq. (5) (rather then the complete set of available gauges), the model assumes no correlation between runoff observed at flow-unconnected gauges.
Here we assume that the area-averaged process y is drawn from a second-order stationary random process, and that the covariance between y k and y m will depend only on the relative position of sub-catchments m and k, given some specified correlation function ρ(·) of the distance c km between the centroids of the two sub-catchments (Cressie, 1993). We assume that this function is well approximated by an exponential function ρ(c km , φ) = exp(−c/φ). A justification for this assumption, which reproduces the streamflow variances observed in our case studies well (Fig. 8), is derived for strongly idealized conditions in Appendix A. Finally, because the observations made at the gauges represent an area-averaged process, the averaging generates a nugget variance σ 2 that is homogenous across observations. The nugget consists of the variance of processes that are spatially correlated over scales smaller than the sub-catchments (see Appendix A) and of measurement errors at the gauges.
With this background, the covariance matrix of y can be expressed as where σ 2 = Var(y k , y k ), U i,j = 1{j ∈ UP i }, A = aa T , and where R i,j = ρ(c i,j , φ). [· ·] denotes the element-byelement matrix multiplication. The matrix G describing the correlation between the random effects in Eq. (3) is finally The topology of the network is described by the matrix U, which ensures that only those catchments that are on the same sub-network (upstream or downstream) of the considered gauge are utilized in the determination of the covariance of y. This spatial constraint comes at the expense of neglecting potential correlations with neighboring catchments that are not flow-connected, and the effects of this tradeoff are investigated in the Monte Carlo experiment described in Sect. 3.2. The effect of spatial proximity is addressed by use of the Euclidian distance between catchment centroids (matrix R), and the effect of scale is accounted for by weighting by the catchment area of the IDAs (matrix A).

REML estimation
The restricted maximum likelihood approach partitions the likelihood of y ∼ N Xτ , σ 2 (ξ G + I N ) into two parts, one of which is independent of τ (Corbeil and Searle, 1976). This allows the determination of fixed effects and the variance parameters of the model (here σ 2 , φ and ξ ) to be undertaken separately. The variance parameters are then estimated by maximizing the restricted log likelihood expression (Gilmour et al., 1995) where det(·) is the matrix determinant operator, ν = N − k, , R is the correlation matrix in Eq. (7) and K is the block matrix: The REML estimatorsσ 2 andφ that maximize λ R can be obtained through numerical optimization.

E-BLUP and prediction variance at ungauged catchments
Once the variance componentsφ andξ are estimated, the fixed effect coefficientsτ and the random effects u can be obtained by solving the linear system (Henderson, 1975): The empirical best linear unbiased prediction of y n at an ungauged site n can be computed by summing the fixed and random effect predictions (Lark et al., 2006) where x n is the vector of fixed covariates at ungauged site n and g n a correlation vector between site n and each gauge. Knowingφ, g n can be readily obtained from the relative position of site n and the gauges in the river network. The variance of the TopREML prediction error can be expressed as The covariance matrix of the error on τ and u in Eq. (11) can be expressed as a function of the inverted model matrix K (Lark et al., 2006): This provides where K −1 11 , K −1 22 and K −1 12 are k × k, N × N and k × N partitions of the inverted K matrix. If is an error that is truly iid and does not affect the true value of y n (e.g., measurement errors), then Eq. (14) corresponds to the mean square error of the TopREML prediction of y n . If, by contrast, represents random variations of the true value of y n that are correlated over short distances (and so do not appear correlated in our data), then should be included in Eq. (11) and the prediction variance becomes because and u are independent. In reality is likely composed of both spatially correlated and iid error components and the true variance will be somewhere between these two bounds (Lark et al., 2006).

Application to non-conservative variables
Unlike mean specific runoff, numerous streamflow signatures (e.g., runoff frequency or descriptors of the recession behavior) are non-conservative and cannot be expressed as linear combinations of their values in upstream subcatchments. In such conditions the derivations in Sect. 2.2 cannot be applied and the correlation structure in Eq. (7) will lead to biased REML predictions. The effect of the network structure on streamflow can nonetheless be accounted if the nonlinearities can be neglected or eliminated through algebraic transformations. For instance, runoff frequency λ is defined as the probability, on daily timescales, that a gauge will record a positive increment in streamflow (Botter et al., 2007;Müller et al., 2014). Provided all sub-basins are large enough to significantly contribute to streamflow, a runoff pulse at any of the upstream sub-basins causes a streamflow increase at the gauge. Therefore, runoff frequency does not scale linearly through the river network. It can nonetheless be shown (see Appendix B) that if runoff pulses occur independently for each sub-basin, the logarithm of the complement to runoff probability (i.e., ln(1−λ)) propagates linearly throughout the network, enabling the application of TopREML to predict runoff probability at ungauged catchments.
A similar reasoning can be applied to predict recession parameters. For example, the exponential function Q(t) = Q 0 exp(−k r t) is a widely used approach to model base flow recession, where Q(t) is the discharge at time t, Q 0 the peak discharge and k r the recession constant which can be considered to represent the inverse of the average response time in storage (Wittenberg and Sivapalan, 1999). Because expected values scale linearly, the average response time at a gauge can be modeled as a linear combination of the mean response times of the upstream IDAs. Therefore, although recession constants themselves do not propagate linearly, their value in ungauged basins can be estimated by taking the inverse of TopREML predictions of average response times.  Figure 2. Empirical correlograms of the mean specific summer flow recorded at the 57 gauges of the Austrian data set. Distance has a different effect on the correlation between flow-connected (black circles) and flow-unconnected (white triangles) gauges. Both correlograms are well fitted by an exponential function but the spatial correlation range doubles when gauges are flow connected. Both empirical correlograms are constructed using 5 km bins.

Implementation
The computational implementation of TopREML in R (R Core Team, 2008) is described in Appendix C and an operational script is provided as a Supplement to this manuscript. To run the script, two vector data sets (e.g., ESRI Shapefiles) are needed as inputs -one containing the catchments where runoff is available and another containing the basins where predictions are to be made. Catchment polygons and explanatory and predicted variables must be provided as attributes of the vector polygons. The way in which the catchment polygons are nested provides the topology of the stream network. TopREML uses the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (Wright and Nocedal, 1999) to maximize the restricted log likelihood, though stochastic algorithms are required if a non-differentiable (e.g., spherical) covariance function is selected. The selection of initial values for σ 2 , φ and ξ is a key user input that may affect the performance of optimization algorithms by causing them to converge to a local extrema. We found that initial values of [σ 2 0 , φ 0 , ξ 0 ] = [σ 2 LM , Ec km , 1] worked well in our case studies, with σ 2 LM the variance of the OLS residuals of the linear model and Ec km the average distance between IDA centroids.

Case studies
Observed streamflow data are used to evaluate the ability of TopREML to predict streamflow signatures in ungauged basins. The assessment is based on leave-one-out cross-validations, where the tested model is applied to predict runoff at one basin based on observations from all the other basins. After predicting runoff at all available basins in that manner, the model is evaluated based on its mean absolute prediction error. Streamflow variables from 57 catchments in Upper Austria  and 52 catchments in Nepal (Department of Hydrology and Meteorology, 2011) are used in two separate leave-one-out analyses. The location of the gauges is shown in Fig. 3, and Table 2 provides a summary of relevant catchment characteristics. Further details on the data sets are provided in Skøien et al. (2014) for Austria and Müller et al. (2014) in Nepal. The two regions differ significantly with respect to gauge density (high in Austria and low in Nepal) and in the nature of the runoff signature and observable features. The Nepalese data set provides specific runoff and wet season runoff frequency as well as gauge elevation and bias-adjusted annual rainfall derived from the Tropical Rainfall Measurement Mission 3B42v7 data set (Müller and Thompson, 2013). Gauge elevation and annual rainfall are used as observable features for specific runoff (Chalise et al., 2003). The Austrian data set was directly taken from the rtop package , where mean summer runoff observations are provided to demonstrate Top-kriging. The Austrian data set did not contain additional observable features and previous studies have found spatial proximity to be a significantly better predictor of runoff than catchment attributes in Austria (Merz and Blöschl, 2005).
The predictive ability of TopREML was evaluated on (a) specific annual runoff in Nepal, (b) wet season runoff frequency in Nepal and (c) average summer streamflow in Austria. The performance of TopREML (TR) was compared to five other widely used regionalization methods: sample mean (LM 0 ), linear regression (LM), universal kriging (UK) and Top-kriging (TK). As shown in Table 1, these methods cover a wide spectrum of incrementally specific assumptions and comparing them provides an assessment of the value added by increased model complexity for regionalization of these streamflow parameters. Code to implement all four methods is readily available in R, with dedicated packages available for Top-kriging -rtop -and universal kriging -gstat (Pebesma, 2004).  N is the number of catchments; Q the specific runoff (mm yr −1 ) in Nepal and the mean summer streamflow (m 3 s) in Austria; λ is the rainy season runoff frequency (d −1 ) in Nepal; A the catchment area in km 2 ; c the distance in km between the centroids of IDA; Dpt the depth of the stream network graph (i.e., the maximum number of flow-connected gauges); P y the annual rainfall in millimeters given by TRMM over Nepal and adjusted according to (Müller and Thompson, 2013); z g is the gauge elevation in meters above sea level. Median values are provided with 25th and 75th quantiles in parenthesis.

Figure 3. Location of the gauges and related catchments included in the cross-validation analyses in Upper Austria (a) and Nepal (b).
Coloring is semi-transparent to emphasize overlapping catchment areas. Dark colors represent upstream catchments, whose runoff is monitored by many gauges downstream. Light colors represent downstream catchments with only few downstream gauges to monitor runoff.

Network effects
Conventional geostatistical methods predict runoff by weighing observations from surrounding basins based on their geographic distance. TopREML also incorporates the topology of the stream network by including or excluding basins based on their flow-connectedness. This adds topological information to the determination of the covariance structure of runoff, at the expense of discarding information that could be derived from correlations between spatially proximate regions that are not connected to the gauge of interest by a flow path. Assessing the net benefits of accounting for network effects requires being able to control the topology of the network, and thus requires numerical simulations. A series of Monte Carlo experiments as described in Fig. 4 were run to simulate network complexity by varying the number of flow-connected basins that are within (N inner ) and beyond (N outer ) the predefined spatial auto-correlation range of the randomly generated runoff. A non-topological geostatistical method like universal kriging would include all basins within and exclude all basins beyond the spatial autocorrelation range. We expect TopREML to outperform universal kriging when the number of flow-connected basins beyond the auto-correlation range increases and the number of connected basins within the auto-correlation range decreases.

Variance estimation and observable features
A key advantage of the REML framework is its ability to avoid the downward bias in the covariance function that affects kriging-based methods (including Top-kriging) when external trend coefficients are simultaneously estimated. This bias particularly affects the prediction of the variance. Again, empirical cross-validation analysis does not allow an assessment of this bias, because the observation data sets used contained only one observation per location. Numerical simulations, however, allow many realizations of the underlying stochastic process to be made at each location, and thus allow the prediction variance to be compared with the numerical variance. We evaluate TopREML's ability to predict variances (and therefore evaluate prediction uncertainties) at ungauged locations using the Monte Carlo procedure on the synthetic catchments described in Fig. 4. We construct the observed prediction uncertainty by taking the standard deviation of the prediction errors across all 1000 Monte Carlo runs and compare it to the square root of the median predicted variance. The external trend is omitted from the model specification (i.e., it is not observed) in a first experiment, and explicitly included in the model in the second experiment. We compare TopREML and Top-kriging based on their ability to model prediction variance. We expect TopREML to provide a better estimate of the variance than Top-kriging when accounting for observable features. Because the trend is spa- Figure 4. Monte Carlo generation procedure: (i) a spatially correlated Gaussian field with an exponential covariance function (mean = 30, partial sill = 8, nugget = 2, range = 3) is generated along a 7 × 7 irregular grid. The central pixel (in black) represents the downstream-most catchment, where runoff is to be predicted. Among the remaining pixels, 24 inner isolated drainage areas (IDA) are within a radius of one spatial correlation range (dashed circle) of the central pixel, and 24 outer pixels are beyond that radius. (ii) A predefined number of inner and outer pixels are randomly selected as part of the set of catchments that are flow-connected to the central pixel. In the figure, all 24 inner pixels and 12 outer pixels are selected and form the flow catchment outlined with a thick black line. (iii) A tree graph is randomly generated (grey arrows) with its trunk at the prediction pixel and branches passing through all the flow-connected pixels. The random field generated in step one is aggregated along the tree by summing the value of all lower order branches at each confluence. (iv) A new spatially correlated field (mean = 1, partial sill = 0.15, nugget = 0, range = 0.5) is generated at each pixel -that is the observed trend. The trend is multiplied by a predefined trend coefficient (τ = 10) and added to the aggregated runoff at each pixel -that is the observed runoff. (v) Based on the observed runoff and (if applicable) trend at the 48 non-central pixels, TopREML and the compared baseline method (Top-kriging or universal kriging) are used to predict runoff at the central pixel. Prediction errors are recorded and the procedure repeated 1000 times to get the mean and variance of the errors.
tially correlated, omitting it in the model specification adds a significant spatially correlated component to the error and Eq. (15) should be used to predict the variance. Conversely, including a trend in the model will cause the remaining error to mostly consists of (spatially uncorrelated) residuals, so in this case Eq. (14) is used.

Case studies
Basin-level predictions of the considered signatures are presented in Fig. 5 for the three cross-validation analyses described in Sect. 3.1. Figure 5 also provides box plots summarizing the distribution of the ensuing cross-validation errors. In the three analyses, the prediction errors related to TopREML were comparable to the best alternate method: a linear model for annual specific runoff (Nepal) and Topkriging for runoff frequency (Nepal) and summer runoff (Austria). Figure 5a presents results for annual specific runoff in Nepal and shows that observable features play a significant role in the prediction of runoff. The linear model showed a highly significant effect of annual precipitation (τ (LM) yearlyPrecip = 0.99, t-stat: 9.1) a moderately significant effect of altitude (τ (LM) meanElev = 0.39, t-stat: 2.5) and an overall fit of R 2 = 0.63. The positive sign of the altitude coefficient can be attributed to the effects of glacial melt on runoff, which are more significant at higher altitudes, while the average effect of evapotranspiration explains the negative and noisy intercept of −313 mm yr −1 . While including rainfall and altitude in the model decreased the median absolute error by 43 % (LM to LM 0 ), further increasing the complexity of the model by allowing for spatial (UK) and topological effects (TK and TR) did not improve the predictive performance: residuals from the linear regression appeared to be correlated at a range shorter than the distance between the gauges in Nepal. Indeed, fitting the empirical semi-variograms with exponential functions revealed spatial correlation ranges that were on the order of the mean distance between IDA centroids for annual streamflow (21.6 km), and significantly below that distance (7.0 km) for the regression residuals. Nonetheless, the lack of parsimony of TopREML did not appear to affect its predictive performance, which almost perfectly reproduced the performance of the linear model -the most parsimonious method.
In contrast, the analysis revealed significant spatial effects for both runoff frequency in Nepal, which has a much larger spatial correlation range than annual streamflow (426 kmpresumably set by meteorology and the correlation range of storm events; Müller and Thompson (2013)), and summer average streamflow in Austria, which has a range of 19.1 km but is sampled by a much higher density of streamflow gauges than in Nepal. Allowing for spatial correlation in the residuals (UK) decreased the median absolute error by 11 % compared to the linear model (LM) for runoff frequency in Nepal and 31 % for summer runoff in Austria. Accounting for topological effects further reduced errors by 33 % (runoff frequency) and 40 % (summer runoff) for both TopREML and Top-kriging methods.    (a) and (b)). The graphs display the expectation and standard deviation of that difference over the 1000 Monte Carlo runs. (c) presents the observed (grey boxes) and predicted (black error bars) standard deviation on the prediction errors for Top-kriging (TK) and TopREML (TR). Note that the slight downward biases that appear on the graph remain below 1 % of the expected value of the predicted outcome.

Numerical simulation
Results from the Monte Carlo analysis are presented in Fig. 6, showing the outcomes of the two numerical experiments described in Sect. 3.2. Figure 6a and b shows the effect of network complexity on the performance of TopREML relative to the baseline performance of universal kriging. This effect is measured as the difference in the relative errors of the two methods as a function of N outer , the ratio of basins beyond the spatial correlation range of runoff that are flow-connected, and N inner , the ratio of basins within range that are not flow-connected. The effect is expected to increase with N outer and decrease with N inner , reaching zero when 100 % of observed basins lie within the spatial correlation range and 0 % of the basins beyond the range are flow-connected. In that case (not shown in the figure), TopREML and universal kriging perform similarly and the mean difference in the relative error of the two methods is zero. Figure 6a shows that the relative performance of TopREML improves with the number of flow-connected catchments that are located beyond the spatial correlation range, and which are therefore not properly accounted for by universal kriging. Conversely, Fig. 6b shows that the relative performance of TopREML decreases with decreasing network effects within the spatial correlation range. A linear regression of the relative performance of TopREML against N outer and N inner showed that both trends are significant and in the expected direction. However, the positive coefficient associated with N outer (9.1, t-stat: 11.9) is larger in absolute value and more statistically significant than the negative coefficient associated with N inner (−2.6, t-stat: −2.6), which suggests that the benefits of including distant flow-connected basins outweigh the costs of discarding nearby (but unconnected) IDAs.
In Fig. 6c, the Monte Carlo analysis showed that model uncertainty is well predicted by TopREML and strongly underestimated by Top-kriging, both with and without considering an external trend. Including a trend in the model reduces the prediction variance of TopREML -this effect is expected because the variance explained by the trend is no longer included in the modeling error . The decrease in the prediction variance is well modeled by TopREML, which predicts the observe model uncertainty almost exactly.

Performance of TopREML
Cross-validation outcomes suggest that TopREML is an attractive operational tool for predicting streamflow in ungauged basins. The method performs as well as the best alternative approach in the prediction of the considered runoff signatures in Nepal and Austria, and significantly outperforms Top-kriging in the prediction of modeling uncertainties in the numerical analysis. Two distinguishing features of TopREML are responsible for these encouraging results. First, TopREML incorporates the topology of the stream network by restricting correlations to runoff observed at flowconnected catchments. This allows TopREML to explicitly model the higher correlation in streamflow anticipated along channels, but comes at the expense of discarding correlations with neighboring, but not flow-connected catchments. Such correlations can, for instance, be driven by large-scale weather patterns. This tradeoff was investigated in a Monte Carlo analysis showing that modeling performance increases more rapidly when including distant flow-connected basins (slope in Fig. 6a) than it decreases when discarding nearby unconnected basins (slope in Fig. 6b). Further, empirical correlograms of Austrian summer runoff (Fig. 2) reveal significantly lower and shorter-ranged spatial correlations when basins are not flow-connected. Both results suggest that the benefit of accounting for network effects on correlations outweighs the cost of losing some information on the correlation between unconnected basins. Second, the REML frame-work provides an unbiased estimation of variance parameters, even when accounting for observable features. This allows TopREML to accurately predict modeling uncertainties even for highly trended and auto-correlated runoff signatures, as visible in the Monte Carlo analysis presented on Fig. 6c. By contrast, the expected downward bias in the kriging estimation of partial sills (Cressie, 1993) is clearly visible in the underestimation of prediction uncertainties by the Topkriging method.
TopREML also has considerably lower computational requirements than Top-kriging, both in terms of input data and optimization complexity. Unlike Top-kriging, where watershed polygons are necessary inputs for the regularization procedure, vectors are not fundamentally indispensable for TopREML. Indeed, TopREML does not rely on a distributed point process but assumes homogenous IDAs. It follows that its only fundamental data requirement is a table (i.e., a data.frame) of IDAs displaying the observed regionalization variable and the area, centroid coordinates and network position (i.e., own ID and downstream ID) of the IDA. When considering runtime, both methods rely on numerical optimization, but Top-kriging uses it to back calculate the point semivariogram in its regularization procedure. This may substantially increase the dimensionality of the optimization task, depending on the grid resolution chosen for the discretization of the catchment areas, which in turn has a highly significant effect on prediction performances (Skøien et al., 2006). By contrast, the dimensionality of the optimization in TopREML is driven by the number of catchments, not an arbitrary grid. More importantly, TopREML admits a well-defined objective function, the restricted likelihood, that is differentiable if the selected variogram function is differentiable. This allows gradient optimization methods to be used, which are much less computationally intensive than the stochastic algorithm required by Top-kriging. The resampling analysis shown in Appendix C suggests that TopREML reduces the computation runtime by an order of magnitude, relative to the implementation of Top-kriging in the rtop package, for comparable prediction performances.
Despite these encouraging results, TopREML is subject to stringent linearity assumptions on the nature of the regionalized runoff signature. The predicted variable should aggregate linearly both on hillslope surfaces and at channel junctions that are subject to mass conservation. This limitation also affects block-kriging approaches, as pointed out by Skøien et al. (2006), who suggest that Top-kriging can still be applied, in an approximate way on non-conservative variables. Here we assert that hydrologic arguments can be used to convert some non-conservative variables into linearly aggregating processes using simple algebraic transformations. This theoretically more robust approach was here successfully tested in a cross-validation analysis of runoff frequency in Nepal.  (WCM, 2000), and have a typical range of 1-10 km (Anders et al., 2006). Convective rainfall are assumed dominant in region with a high frequency of lighting strikes (≥ 10/km 2 yr −1 ) as recorded by the TRMM satellite (LIS, 2011) and have a typical scale of 10-100 km (Bosch et al., 1999;Smith et al., 2005). Frontal precipitations are assumed dominant in the remaining regions and have a typical scale in excess of 100 km (Bosch et al., 1999;Xu et al., 2014). (c) Drainage density is estimated based on the Hydro1k data set (Hyd, 2004), using 154 large basins (Wot, 2003) as units of analysis. Drainage densities are displayed in three classes: low (0.01-0.025 km −1 ), medium (0.025-0.027 km −1 ) and high (> 0.027 km −1 ).

Model selection
The regionalization methods assessed in the cross-validation analysis range from simple linear regressions with strong independence assumptions, to complex geostatistical methods that allow for both spatial and topological correlations. Results indicate that while complex methods perform best in general, there seems to be a threshold beyond which increasing the complexity of the statistical method does not significantly improve the prediction performance: while a linear model is better than a simple average for the prediction of annual streamflow in Nepal (Fig. 5a), accounting for spatial (UK) and topological (TR) correlation does not further improve predictions. In that situation, parsimony prescribes selecting the least complex of the best performing methods. Under these conditions, the selection of the optimal method is driven by the interplay between the layout of the gauges and the spatial correlation range of the considered runoff signature. A dense network of flow gauges is necessary for geostatistical methods to properly estimate the semivariogram and improve on predictions from linear regressions -the case studies suggest that the mean distance between the gauges must be on the order of half the spatial correlation range of the runoff signature. Sparser gauge densities do not allow geostatistical methods to capture spatial correlations and their prediction is effectively driven by the deterministic components of the model, i.e., the intercept and (when available) observable features.
An interesting tradeoff arises if observable features are themselves spatially correlated and explain a significant part of the spatial correlation of the predicted variable. Including these observable features in the model reduces the correlation scale of the residuals, possibly crossing the threshold below which geostatistics are not the most parsimonious approach. In Nepal, controlling for rainfall reduced the spatial correlation range of annual streamflow from 21.6 to 7 km -well be-  Figure 9. Algorithmic chart of the provided TopREML implementation. Dashed frames and arrows represent vector data and operations and the bold arrow represents the step requiring numerical optimization. The complexity of the computational tasks represented by the remaining plain arrows is driven by matrix inversion, which is of polynomial complexity. In the figure, X is a matrix of observed covariates and y a vector of outcomes measured at the available gauges, as defined in Eq. (1); x is a vector of identical covariates observed at the prediction location. A, U and c ij are matrices of relative catchment areas, network topology and inter-centroidal distances of the available gauges, as defined in Eq. (6); a, U out and c out ij are equivalent matrices for the prediction location. σ 2 , φ and ξ are estimated variance parameters as defined in Eq. (3); τ , u and G are the estimated fixed and random effects (Eq. 11) and variancecovariance matrix (Eq. 7); g is the estimated covariance at the prediction location (used in Eq. 13). Finally, y out and Var(y out − y) are the predicted outcome and the related prediction variance. low the mean distance between the gauges (13.9 km). In that case there is a tradeoff between relying on observable features or variance information to make a prediction, and parsimony and stationarity considerations come into play when selecting the regionalization model. For instance, while parsimony generally prescribes the use of observable features, a climate may be less stationary -and therefore a less reliable external trend -than embedded geology or geomorphology.
In general, geostatistical approaches improve on the prediction of ungauged basins if the distance between the stream gauges is significantly smaller than the spatial correlation scale of runoff. Favorable areas are characterized by high drainage densities or localized rainfall, in addition to a high density of streamflow gauges. All three variables are highly heterogeneously distributed at a global scale, as seen on Fig. 7. The multiplicity of local settings likely explains the large diversity of existing regionalization methods and suggests that the selection of the optimal regionalization approach has to be made locally. Lastly, the decreasing returns to improvements in the complexity of the model also suggest that the performance of statistical methods for PUB is ultimately bounded by the spatial heterogeneity of runoff generating processes. Statistical methods resolve parts of that heterogeneity using the spatial distribution of observable features (linear regressions) and/or based on the analysis of the variance of a sample of the predicted variable (geostatistics). Yet very important parts of the hydrological activity related to storage and flow path characteristics take place underground: they cannot be observed and included in the statistical models (Gupta et al., 2013). This residual spatial heterogeneity can ultimately only be resolved through a better understanding of the particular catchment processes governing runoff in the considered region. Approaches coupling statistical regionalization with processbased models that assimilate both a conceptual understanding of catchment-scale processes and the random nature of runoff (e.g., Botter et al., 2007;Schaefli et al., 2013;Müller et al., 2014) are particularly promising.

Conclusions
We introduced TopREML as a method to predict runoff signatures in ungauged basins. The approach takes into account the spatially correlated nature of runoff and the nested character of streamflow networks. Unlike kriging approaches, the restricted maximum likelihood (REML) estimators provide the best linear unbiased predictor (BLUP) of both the predicted variable and the associated prediction uncertainty, even when incorporating observable features in the model.
The method was successfully tested in cross-validation analyses on mass conserving (mean streamflow) and nonconservative (runoff frequency) runoff signatures in Nepal (sparsely gauged) and Austria (densely gauged), where it matched the performance of the best alternative method: Topkriging in Austria and linear regression in Nepal. TopREML outperformed Top-kriging in the prediction of uncertainty in Monte Carlo simulations and its performance is robust to the inclusion of observable features.
The ability of TopREML to combine deterministic (observable features) and stochastic (covariance) information to generate a BLUP makes it a particularly versatile method that can readily be applied in densely gauged basins, where it takes advantage of spatial covariance information, as well as data-scarce regions, where it can rely on covariates with spatial distributions that are increasingly observable thanks to remote-sensing technology. This flexibility, along with its ability to provide a reliable estimate of the prediction uncertainty, offer considerable scope to use this computationally inexpensive method for practical PUB applications.
Hydrol. Earth Syst. Sci., 19, 2925-2942 The aim of this analysis is to explore the likely forms of a correlation structure between spatially aggregated processes, given that the underlying point-scale processes are also spatially correlated. In order to maintain tractability, the analysis will consider a strongly idealized case. While we anticipate deviations from the results in non-ideal situations, we nonetheless interpret this idealized analysis as offering insight that constrains the choice of correlation function in the TopREML analysis.
Assuming that the underlying point-scale process Y is conservative, the aggregated process y k related to the subcatchment S k of gauge k can be expressed as where A k is the area of S k . To proceed, we make the assumption that the area of the drainage areas S k are approximately equal. While this is a strong constraint, under situations where gauges are placed near confluences and where subcatchments for a given stream ratio are adequately monitored by the gauge network, Horton scaling ensures that the drainage areas are of a similar order of magnitude. Thus, we will take A k = A∀k. The subcatchments are further assumed to have similar shapes and (by definition) do not overlap.
Following Cressie (1993) (p. 68), the covariance between two aggregated random variables y k and y m is expressed as a function of the covariogram C P (·) of the underlying pointscale process: Cov y k , y m = 1 A 2 S k S m C P (| x 2 − x 1 |)dx 1 dx 2 where S k and S m are the surfaces of subcatchments k and m, and ν(D) is the probability density function of the distance between randomly chosen points within S k and S mtwo identical and non-overlapping shapes. Analytical expressions for ν(D) can be derived for simple geometries (e.g., Mathai, 1999), although complex algebraic expressions typically result. For analytical tractability we adopt a simplified expression: which approximates distance frequency function of adjacent elliptical subcatchments, as shown in Fig. 8. In Eq. (A2) the parameters a 0 , a D > a c , D 1 and D 2 are positive functions of A, and c is the distance between the centroids of the subcatchments. We also assume that the underlying point-scale process is second-order stationary and follows an exponential correlation function: where σ 2 p and a p are respectively the point variance and spatial range of the process.
Inserting Eqs. (A2) and (A3) into Eq. (A1) allows the covariance of the two spatially aggregated random variables to also be expressed as an exponential function of the distance c between their supports C A (c) = ξ σ 2 exp (−φc) , where ξ σ 2 = σ 2 p a 0 a p +a D [exp(a p D 2 + a D D 2 ) -exp(−a p D 1 − a D D 1 )] > 0 and φ = a p +a D −a c > 0. This exponential form was adopted in the covariance derivation in the main text.
We note that within this analysis, the spatial aggregation of the point-scale process creates a nugget variance arising from spatial correlation scales smaller than the subcatchments. The nugget variance can be derived (for this idealized case) by considering the average covariance of points within the catchments: Cov y k , y k = 1 A 2 where ν 0 (D) now represents the probability density function (pdf) of the distance between two randomly selected points within S k : where D 0 is the maximum distance between two points within S k . Again, inserting Eqs. (A6) and (A3) into Eq. (A5), we get the nugget variance resulting from spatial aggregation: Therefore, under the aforementioned assumptions, catchment-scale variance parameters σ 2 and ξ in Eq. (6) can be expressed in terms of point-scale parameters: The Supplement related to this article is available online at doi:10.5194/hess-19-2925-2015-supplement.