Optimal design of hydrometric station networks based on complex network analysis

Hydrometric networks play a vital role in providing information for decision-making in water resource management. They should be set up optimally to provide as much information as possible that is as accurate as possible and, at the same time, be cost-effective. Although the design of hydrometric networks is a well-identified problem in hydrometeorology and has received considerable attention, there is still scope for further advancement. In this study, we use complex network analysis, in which a network is defined as a collection of nodes interconnected by links, to propose a new measure that identifies critical nodes of station networks. The approach can support the design and redesign of hydrometric station networks. The science of complex networks is a relatively young field and has gained significant momentum over the last few years in different areas such as brain networks, social networks, technological networks, or climate networks. The identification of influential nodes in complex networks is an important field of research. We propose a new node-ranking measure, the weighted degree-betweenness (WDB) measure, to evaluate the importance of nodes in a network. It is compared to previously proposed measures on synthetic sample networks and then applied to a real-world rain gauge network comprising 1229 stations across Germany to demonstrate its applicability. The proposed measure is evaluated using the decline rate of the network efficiency and the kriging error. The results suggest that WDB effectively quantifies the importance of rain gauges, although the benefits of the method need to be investigated in more detail.


Introduction
Hydrometric networks monitor a wide range of water quantity and water quality parameters such as precipitation, streamflow, groundwater, or surface water temperature (Keum et al., 2017). Designing adequate hydrometric monitoring is key to water resources management, e.g., for flood estimation, water budget analysis, hydraulic design, and monitoring climate change. Even after the advent of remote-sensing-based information, such as satellite precipitation estimates, in-situ observations are considered an essential source of information in hydrometeorology (Rossi et al., 2017).
The basic characteristics of hydrometric networks comprise the number of stations, their locations, observation periods and sampling frequency (Keum et al., 2017). The general understanding is that the higher the number of monitoring stations, the more reliable the quantification of areal average estimates and point estimates at any ungauged location. However, a higher number of stations raises the cost of installation, operation, and maintenance, and may provide redundant information and therefore not increase the information content obtained from the network. Scarcity of funds for hydrometric monitoring has led to a slow but steady decommissioning of hydrometric stations globally in recent decades, raising the need for cost-effective design (Mishra and Coulibaly, 2009). For example, Putthividhya and Tanaka (2012) made an effort to design an optimal rain gauge network based on station redundancy and the homogeneity of the rainfall distribution. Adhikary et al. (2015) proposed a kriging-based geostatistical approach for optimizing rainfall networks, and Chacon-Hurtado et al. (2017) provided a generalized procedure for optimal rainfall and streamflow monitoring in the context of rainfall-runoff modeling. Yeh et al. (2017) optimized a rain gauge network by applying the entropy method to radar datasets. Most of the aforementioned studies inherently assume that expanding the gauge network with supplementary stations adds information that ultimately leads to less uncertainty (Wadoux et al., 2017). However, increasing the number of stations does not necessarily decrease the uncertainty (Stosic et al., 2017), and expendable stations (those of relatively little significance) contribute little to no information while having the same maintenance cost as influential (significant) stations (Mishra and Coulibaly, 2009).
This study aims to discriminate between influential and expendable stations in hydrometric station networks based on their relative information content. We propose complex networks as a suitable tool for this optimization problem. A complex network is defined as a collection of nodes, such as rain gauge stations, interconnected with links. Complex networks are powerful tools for extracting information from large high-dimensional datasets (Donges et al., 2009a; Cohen and Havlin, 2010; Kurths et al., 2019). This non-parametric method allows investigating the topology of local and non-local statistical interrelationships. Examples of non-local connections in a climate network are the global influence of the El Niño Southern Oscillation (ENSO) on regional rainfall (Agarwal, 2019; Ferster et al., 2018), and of the Atlantic Meridional Overturning Circulation (AMOC) on air surface temperature (Caesar et al., 2018), via teleconnections and ocean circulation, respectively. Once the spatial network of stations has been constructed, statistical network measures (e.g. degree, betweenness centrality) are used to quantify the behaviour of the network and its components for a range of applications. Examples are the identification of the community structure of stations or homogeneous regions to unravel dominant climate modes (Agarwal et al., 2018a; Halverson and Fleming, 2015), catchment classification indicating hydrologic similarity (Fang et al., 2017), short- and long-range spatial connections in rainfall (Agarwal et al., 2018a; Boers et al., 2014b; Jha et al., 2015) and spatio-temporal hydrologic patterns (Halverson and Fleming, 2015; Konapala and Mishra, 2017). Complex network analysis complements classical eigen techniques, such as empirical orthogonal functions (EOFs) or coupled patterns (CP) maximum covariance analysis (Donges et al., 2015). EOFs, CPs and related methods rely on dimensionality reduction, whereas network techniques allow studying the full complexity of the statistical interdependence structure and are not limited to linear and spatial-proximity connections. Also, higher-order complex network measures (betweenness centrality, closeness centrality, participation coefficient) provide additional information on the hidden structure of statistical interrelationships in climatological data (Donges et al., 2015).
In this study, we propose a complex network-based method to identify the influential and expendable stations in a rainfall network. Several methods in the field of complex networks have been proposed to evaluate the importance of nodes (Chen et al., 2012; Hou et al., 2012; Jensen et al., 2016; Kitsak et al., 2010; Zhang et al., 2013; Hu et al., 2013); however, the application and interpretation of complex networks in hydrology is still in its infancy. Degree ($k$), betweenness centrality ($B$), and closeness centrality ($CC$) are the measures commonly used in complex networks (Gao et al., 2013). Studies in different disciplines have shown that degree and betweenness centrality often outperform other node-ranking measures (Gao et al., 2013; Liu et al., 2016). We propose a novel measure, weighted degree-betweenness (WDB), which combines $k$ and $B$, to identify the stations providing the most information to the network. Our main objective is to develop a node ranking method using complex network theory that can be used to identify not only the influential but also the expendable stations in large hydrometric station networks. We acknowledge that this study is a preliminary effort to explore the application of complex networks in hydrology and that many further studies are necessary before the methodology can be considered a trustworthy optimization tool for measurement networks. Our aim is not to question the credibility of operating stations, but to propose an alternative evaluation procedure towards optimal design and redesign of observational hydrometric monitoring networks based on complex networks.

Network Construction
A network or a graph is a collection of entities (nodes, vertices) interconnected with lines (links, edges) as shown in Fig. 1. These entities could be anything, such as humans defining a social network (Arenas et al., 2008), computers constructing a web network (Zlatić et al., 2006), neurons forming brain networks (Bullmore and Sporns, 2012), streamflow stations creating a hydrological network (Halverson and Fleming, 2015) or climate stations describing a climate network (Agarwal et al., 2018). Formally, a network or graph is defined as an ordered pair $G = \{N, E\}$, containing a set $N = \{N_1, N_2, \ldots, N_N\}$ of nodes together with a set $E$ of links $\{i, j\}$, which are 2-element subsets of $N$. In this work, we consider undirected and unweighted simple networks, where only one link can exist between a pair of vertices and self-loops of the type $\{i, i\}$ are not allowed. This type of network can be represented by the symmetric adjacency matrix (Eq. 1):
$$A_{i,j} = \begin{cases} 1, & \{i, j\} \in E \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$
$A_{i,j} = 1$ denotes a link between the $i$-th and $j$-th station and 0 denotes otherwise. The adjacency matrix represents the connections in the network. Fig. 1 is a simple representation of such a network, i.e., one with a set of identical nodes ($N_i$, with $i = 1$ to $4$) connected by identical links. In general, (large) networks of real-world entities with irregular topology are called complex networks. The links represent similar evolution or variability at different nodes and can be identified from data using a similarity measure such as Pearson correlation (Ekhtiari et al., 2019), synchronization (Agarwal, 2019; Boers et al., 2019; Conticello et al., 2018) or mutual information (Paluš, 2018).

Event synchronization
Event synchronization (ES) has been specifically designed to calculate nonlinear correlations among bivariate time series with events defined on them (Quiroga et al., 2002). This method has advantages over other time-delayed correlation techniques (e.g., Pearson lag correlation), as it allows us to investigate extreme event series (such as non-Gaussian and event-like data sets) and uses a dynamic time delay (Ozturk et al., 2019). The latter refers to a time delay that is adjusted according to the two time series being compared, which allows for better adaptability to the variable and region of interest. Various extensions of ES have been proposed, addressing, for instance, boundary effects (Rheinwalt et al., 2016) and bias by varying event rates.
In the following, we define events by applying an $\alpha$ percentile threshold to the signals $x(t)$ and $y(t)$. The $\alpha$ percentile threshold is selected as a trade-off between a sufficient number of rainfall events at each location and a rather high threshold to study heavy precipitation. Events then occur at times $t_l^x$ and $t_m^y$, where $l = 1, 2, 3, \ldots, S_x$ and $m = 1, 2, 3, \ldots, S_y$.
Events in $x(t)$ and $y(t)$ are considered to coincide if they occur within a time lag $\pm\tau_{lm}^{xy}$, which is defined as
$$\tau_{lm}^{xy} = \frac{1}{2}\min\{t_{l+1}^x - t_l^x,\; t_l^x - t_{l-1}^x,\; t_{m+1}^y - t_m^y,\; t_m^y - t_{m-1}^y\}, \qquad (2)$$
where $S_x$ and $S_y$ are the total numbers of such events (greater than the threshold $\alpha$) that occurred in the signals $x(t)$ and $y(t)$, respectively. The above definition of the time lag helps to separate independent events, which in turn makes it possible to take into account the fact that different processes may be responsible for the generation of events. We need to count the number of times an event occurs in the signal $x(t)$ after it appears in the signal $y(t)$, and vice versa, and this is achieved by defining the quantities $C(x|y)$ and $C(y|x)$, where
$$C(x|y) = \sum_{l}\sum_{m} J_{xy} \qquad (3)$$
and
$$J_{xy} = \begin{cases} 1, & 0 < t_l^x - t_m^y < \tau_{lm}^{xy} \\ 1/2, & t_l^x = t_m^y \\ 0, & \text{otherwise.} \end{cases} \qquad (4)$$
This definition of $J_{xy}$ prevents counting a synchronized event twice. When two synchronized events match exactly ($t_l^x = t_m^y$), we use a factor 1/2 since they double count in $C(x|y)$ and $C(y|x)$. Similarly, we can define $C(y|x)$, and from these quantities we obtain
$$Q_{xy} = \frac{C(x|y) + C(y|x)}{\sqrt{(S_x - 2)(S_y - 2)}}, \qquad (5)$$
which is a normalized measure of the strength of event synchronization between the signals $x(t)$ and $y(t)$. This implies $Q_{xy} = 1$ for perfect synchronization and $Q_{xy} = 0$ if no events are synchronized. After repeating this procedure for all pairs ($x \neq y$) of grid sites, we obtain a similarity matrix. In this case, the similarity matrix for precipitation data is a square, symmetric matrix, which represents the strength of synchronization of the extreme rainfall events between each pair of grid sites.
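As an illustration of the event synchronization measure described above, the following minimal Python sketch computes the synchronization strength for two lists of event times. It is not the implementation used in this study. It assumes that the first and last event of each series are skipped (because the dynamic time lag needs a preceding and a following event) and that the count is normalized by the square root of $(S_x - 2)(S_y - 2)$; the function name `event_synchronization` and the example event times are hypothetical.

```python
from math import sqrt


def event_synchronization(tx, ty):
    """Return the synchronization strength Q for two sorted lists of event times.

    Assumption: the first and last event of each series are skipped, because
    the dynamic time lag needs both a previous and a next event.
    """
    sx, sy = len(tx), len(ty)
    if sx < 3 or sy < 3:
        raise ValueError("each series needs at least three events")
    c_xy = 0.0  # events in x that shortly follow an event in y
    c_yx = 0.0  # events in y that shortly follow an event in x
    for l in range(1, sx - 1):
        for m in range(1, sy - 1):
            # dynamic time lag: half the smallest neighbouring inter-event gap
            tau = min(tx[l + 1] - tx[l], tx[l] - tx[l - 1],
                      ty[m + 1] - ty[m], ty[m] - ty[m - 1]) / 2.0
            lag = tx[l] - ty[m]
            if lag == 0:
                # exact coincidence: half a count in each direction
                c_xy += 0.5
                c_yx += 0.5
            elif 0 < lag < tau:
                c_xy += 1.0
            elif -tau < lag < 0:
                c_yx += 1.0
    return (c_xy + c_yx) / sqrt((sx - 2) * (sy - 2))


# Hypothetical event times (e.g. day indices of heavy rainfall)
x_events = [1, 5, 9, 13, 17]
q_same = event_synchronization(x_events, x_events)                # identical series
q_delayed = event_synchronization(x_events, [2, 6, 10, 14, 18])   # one-day delay
q_apart = event_synchronization(x_events, [3, 7, 11, 15, 19])     # lag equals tau
```

In this toy example the identical and the one-day-delayed series are fully synchronized, whereas a delay equal to the time lag itself yields no synchronized events because the inequality is strict.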

Node Ranking Measures
A large number of measures have been defined to characterize the behaviour of complex networks. We focus here on those traditional and contemporary network measures which have been proposed to quantify the importance of nodes in a network: degree $k$, betweenness centrality $B$ (Stolbova et al., 2014), bridgeness $Bri$ (Jensen et al., 2016), and degree and influence of line $DIL$ (Liu et al., 2016).

Traditional network measures
The degree $k$ of a node in a network counts the number of connections linked to the node directly. The degree of the $i$-th node is calculated as
$$k_i = \sum_{j=1}^{N} A_{i,j},$$
where $N$ is the total number of nodes in a network. For example, the degree of nodes 1, 2 and 4 in network N1 (Fig. 1a) is 1 and for node 3 it is 3. In the network N2 (Fig. 1b), all nodes have degree 3. The degree can explain the importance of nodes to some extent, but nodes that have the same degree may not play the same role in a network. For instance, a bridging node connecting two important nodes might be very relevant even though its degree could be much lower than that of less important nodes.
The betweenness centrality $B$ is a measure of the control that a particular node exerts over the interaction between the remaining nodes. In simple words, $B$ describes the ability of nodes to control the information flow in networks. To calculate betweenness centrality, we consider every pair of nodes and count how many times a third node can interrupt the shortest paths between the selected node pair. Mathematically, the betweenness centrality $B$ of the $i$-th node is
$$B_i = \sum_{j \neq i \neq k} \frac{\sigma_i(j, k)}{\sigma(j, k)},$$
where $\sigma(j, k)$ represents the number of shortest paths between nodes $j$ and $k$, while $\sigma_i(j, k)$ is the number of those shortest paths running through node $i$. In network N1, $B$ of node 3 is 3, i.e., node 3 can disturb the information transfer between all of the three pairs 1-2, 1-4, 2-4, and for the other nodes $B = 0$. In the network N2, all nodes have $B = 0$ because no node can interrupt the information flow. Thus, node 3 is a critical node in the network N1 but not in the network N2. Jensen et al. (2016) developed the bridgeness measure $Bri$ to distinguish local centres, i.e. nodes that are highly connected to a part of the network (e.g. a highly correlated station in a homogeneous region), from global bridges, i.e. nodes that connect different parts of a network (Fig. 2, e.g. teleconnection between Indian rainfall and climate indices).
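The degree and betweenness values quoted above for the two four-node networks can be checked with a short Python sketch. The networks are encoded here as inferred from the text (N1 as a star centred on node 3, N2 as a fully connected graph), which is an assumption since Fig. 1 is not reproduced. Betweenness is computed with Brandes' accumulation scheme, a standard algorithm swapped in for illustration, counting each unordered node pair once; the function names are hypothetical.

```python
from collections import deque


def degrees(adj):
    """Degree of each node; adj maps a node to the set of its neighbours."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Unnormalised betweenness; each unordered node pair is counted once."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # breadth-first search from s: distances, path counts, predecessors
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # accumulate pair dependencies from the farthest nodes back to s
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every unordered pair was visited from both of its end points
    return {v: b / 2.0 for v, b in bc.items()}


# Networks as inferred from the text (assumption: Fig. 1 is not reproduced here)
N1 = {1: {3}, 2: {3}, 3: {1, 2, 4}, 4: {3}}                    # star around node 3
N2 = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3}}  # fully connected

k_N1, B_N1 = degrees(N1), betweenness(N1)
k_N2, B_N2 = degrees(N2), betweenness(N2)
```

Under these assumptions the sketch reproduces the values stated in the text: node 3 of N1 has degree 3 and betweenness 3, while every node of N2 has degree 3 and betweenness 0.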

Contemporary network measures
$Bri$ is a decomposition of betweenness centrality $B$ into a local and a global contribution. Therefore, the $Bri$ value of node $i$ is always smaller than or equal to the corresponding $B$ value, and they only differ by the local contribution of the first direct neighbours. To calculate $Bri$ we consider the shortest paths between nodes outside the neighbourhood of node $i$, $N_G(i)$. Mathematically, it is represented as
$$Bri_i = \sum_{j \notin N_G(i),\; k \notin N_G(i)} \frac{\sigma_i(j, k)}{\sigma(j, k)}.$$
The neighbourhood of node $i$, $N_G(i)$, consists of all direct neighbours of node $i$. For example, in the networks N1 and N2, all nodes (except node 3 in N1) have $B = 0$ and hence $Bri = 0$. However, node 3 in the network N1 has all the other nodes in its direct neighbourhood; hence, it also has $Bri = 0$.
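The relation between $B$ and $Bri$ can be illustrated with a small Python sketch that computes both measures by enumerating node pairs. This is a brute-force illustration for small graphs, not the implementation of Jensen et al. (2016); the function names and the five-node path graph used as a second example are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Distances from s and number of shortest paths to each reachable node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma


def betweenness_and_bridgeness(adj):
    """Return (B, Bri); adj maps each node to the set of its neighbours."""
    info = {s: bfs_counts(adj, s) for s in adj}
    B = {v: 0.0 for v in adj}
    Bri = {v: 0.0 for v in adj}
    for j, k in combinations(adj, 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue  # j and k are not connected
        for i in adj:
            if i == j or i == k:
                continue
            dist_i, sigma_i = info[i]
            if j not in dist_i:
                continue  # i lies in another component
            # i is on a shortest j-k path if the distances add up
            if dist_i[j] + dist_i[k] == dist_j[k]:
                share = sigma_i[j] * sigma_i[k] / sigma_j[k]
                B[i] += share
                # bridgeness keeps only pairs outside the neighbourhood of i
                if j not in adj[i] and k not in adj[i]:
                    Bri[i] += share
    return B, Bri


star = {1: {3}, 2: {3}, 3: {1, 2, 4}, 4: {3}}            # N1 as inferred from the text
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}  # hypothetical path graph

B_star, Bri_star = betweenness_and_bridgeness(star)
B_path, Bri_path = betweenness_and_bridgeness(path)
```

For the star, the centre keeps $B = 3$ but has $Bri = 0$ because all other nodes are its direct neighbours. In the path graph, the middle node has $B = 4$ but $Bri = 1$, since only the pair of end nodes lies outside its neighbourhood.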
The degree and influence of line ($DIL$), introduced by Liu et al. (2016), considers the node degree $k$ and the importance of line $I$ to rank the nodes in a network:
$$DIL_i = k_i + \sum_{j \in N_G(i)} I_{e_{ij}} \cdot \frac{k_i - 1}{k_i + k_j - 2},$$
where the line between nodes $i$ and $j$ is $e_{ij}$ and its importance is defined as $I_{e_{ij}} = U/\lambda$, where $U = (k_i - p - 1) \cdot (k_j - p - 1)$ reflects the connectivity ability of a line (link), $p$ is the number of triangles having one edge $e_{ij}$, and $\lambda = p/2 + 1$ is defined as an alternative index of line $e_{ij}$. $N_G(i)$ is the set of neighbours of node $i$ (for a detailed explanation see Liu et al., 2016).
The equation for $DIL$ implies that all nodes having $k_i = 1$ will have $DIL_i = 1$, since the second term of the equation will be zero. Hence, in the network N1 all nodes, except node 3, have $DIL = 1$. Node 3 has $DIL = 3$, equal to its degree, since the second term is zero (all the connected nodes 1, 2 and 4 have $k_j = 1$, hence $I_{e_{ij}} = 0$). All the nodes in the network N2 have $DIL = 3$.
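The worked values above can be reproduced with a minimal Python sketch of the degree and influence of line measure. It assumes the form of the measure as we read it from Liu et al. (2016), i.e. the degree plus a neighbour sum of line importance $U/\lambda$ weighted by $(k_i - 1)/(k_i + k_j - 2)$, and it treats the weight as zero when both end nodes have degree 1; the function name `dil` and the four-node path graph are hypothetical.

```python
def dil(adj):
    """Degree and influence of line; adj maps a node to its neighbour set."""
    k = {v: len(nbrs) for v, nbrs in adj.items()}
    scores = {}
    for i in adj:
        total = float(k[i])
        for j in adj[i]:
            p = len(adj[i] & adj[j])            # triangles containing edge e_ij
            U = (k[i] - p - 1) * (k[j] - p - 1)  # connectivity ability of e_ij
            lam = p / 2 + 1                      # alternative index of e_ij
            importance = U / lam
            denom = k[i] + k[j] - 2
            # assumption: an isolated edge (both degrees 1) contributes nothing
            if denom > 0:
                total += importance * (k[i] - 1) / denom
        scores[i] = total
    return scores


# Networks as inferred from the text, plus a hypothetical four-node path
N1 = {1: {3}, 2: {3}, 3: {1, 2, 4}, 4: {3}}
N2 = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3}}
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

dil_N1 = dil(N1)
dil_N2 = dil(N2)
dil_path = dil(path)
```

In the path graph the two inner nodes obtain a score of 2.5, above their degree of 2, because the link between them is the only one whose importance is non-zero.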

Methodology
We will first propose a new node ranking measure that we call weighted degree-betweenness (WDB). We will then compare the efficacy of this measure with the existing traditional and contemporary node ranking methods using two synthetic networks.

Weighted Degree-Betweenness
WDB is a combination of two network measures, degree and betweenness centrality. We define the WDB of a particular node $i$ as the sum of the betweenness centrality of node $i$ and that of all directly connected nodes $j$, $j = 1, 2, 3, \ldots, k_i$, in proportion to their contribution to node $i$. The WDB of a node $i$ is given by
$$WDB_i = B_i + \omega_i,$$
where $B_i$ is the betweenness centrality of node $i$, and $\omega_i$ stands for the cumulative effect of the influence or contribution of the directly connected nodes of $i$, which are $j = 1, 2, 3, \ldots, k_i$. In $\omega_i$, the betweenness centrality of each neighbour $j$ is weighted using $k_i$, the degree of node $i$, and $k_j$, the degree of the node $j$ directly connected to node $i$.

Comparison with Existing Node Ranking Measures Using Synthetic Networks
In this section, we motivate the development of the new node ranking measure WDB by comparing it to existing measures. Identifying nodes that occupy interesting positions in a real-world network using node ranking helps to extract meaningful information from large datasets at little cost. Degree ($k_i$) and betweenness centrality ($B_i$) are the most common node ranking metrics (Gao et al., 2013; Okamoto et al., 2008; Saxena et al., 2016). The network measures $k_i$, $B_i$ and $WDB_i$ of each node are given for an undirected and unweighted network $G = (N, E)$ with 8 nodes and 11 edges, shown in Fig. 2 along with the node number.
In general, high-degree nodes represent the most connected (highly correlated) nodes in a network. Rheinwalt et al. (2015) considered these highly correlated nodes of a homogeneous precipitation community as local centres representing homogeneous precipitation patterns for that particular community. Agarwal et al. (2018) defined local centres as the nodes having maximum intra-community links and minimum inter-community links based on the Z-P space approach.
However, degree alone cannot distinguish the roles of nodes in the sample network, as seen for nodes 5, 7, and 8, which have the same degree ($k_i = 2$), although node 5 serves as a bridge node linking the two parts of the network. In a larger complex network, such bridge nodes have strategic relevance, as most of the information can be accessed quickly just by capturing those nodes. For example, Kurths et al. (2019) [...] The proposed measure WDB has higher discrimination power compared to betweenness centrality. Node 5 has the highest WDB score and is ranked as the most influential node, which reflects its role as a global bridge node. WDB distinguishes between nodes 1, 2, 3 (WDB = 14.4) and nodes 7, 8 (WDB = 13.3), which is important in case we need to rank nodes sequentially.
We further evaluate WDB against the network measure $Bri$. For this comparison, we use the same synthetic network as Jensen et al. (2016), shown in Fig. 3. Betweenness centrality once again assigns a smaller value to the global bridge (node 6) than to the local centres (nodes 4, 7). Bridgeness expresses the higher importance of node 6 compared to nodes 4 and 7; however, it does not distinguish between the other nodes in the network (nodes 1, 2, 3, ... have $Bri = 0$).
Similarly, DIL fails to represent the bridge nodes, as it assigns higher values to local centres. WDB ranks the nodes according to their role in the network as global bridges, local centres, and end nodes. For example, WDB is also able to differentiate between nodes 4 and 7, for which the bridgeness measure provides equal scores.

Evaluation of the Proposed Measure for a Rain Gauge Network
In the context of hydrometric station networks, we hypothesize that higher ranking nodes are more influential stations in the network. Losing such stations could reduce the network stability and efficiency, given their role in, among others, bridging different communities (processes) and capturing more detailed process information than lower ranking stations. Stations with the lowest ranks in the network are the least influential and are seen as expendable stations. To test this hypothesis, we apply the proposed node ranking measure to a hydrometric station network consisting of more than 1000 stations in Germany. The benefit of WDB is that it captures the bridge nodes in the hydrometric station network that are adequate to quantify the local and non-local rainfall variability, process identification, interpolation of measurements and transferability of precipitation measurements across locations. In contrast, expendable stations correspond to sites of spatially extended coherent rainfall, surrounding a local centre which represents the variability of such regions. Stations within such regions of coherent rainfall provide redundant information and can be removed (except the local centre) without loss of information. The information loss caused by removing stations is quantified via two measures: (a) the decline rate of network efficiency, and (b) the relative kriging error.

Decline Rate of Network Efficiency
The decline rate of network efficiency quantifies the decrease in information flow within a network when nodes are removed. The network efficiency is defined as
$$\eta = \frac{1}{N(N-1)} \sum_{i \neq j} \eta_{ij},$$
where $N$ is the total number of nodes in a network and $\eta_{ij}$ is the efficiency between nodes $N_i$ and $N_j$. $\eta_{ij}$ is inversely related to the shortest path length: $\eta_{ij} = 1/d_{ij}$, where $d_{ij}$ is the length of the shortest path between nodes $N_i$ and $N_j$. The average path length $L$ measures the average number of links along the shortest paths between all possible pairs of network nodes. A network with small $L$ is highly efficient, because two nodes are likely to be separated by a few links only. The decline rate of network efficiency $\mu$ is defined as
$$\mu = 1 - \frac{\eta_{new}}{\eta_{old}},$$
where $\eta_{new}$ is the efficiency of the network after removing nodes, and $\eta_{old}$ is the efficiency of the complete network.
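The two quantities defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this study. It assumes that unreachable node pairs contribute zero efficiency and that the efficiency of the reduced network is normalized by the number of remaining nodes; the function names and the four-node star network are hypothetical.

```python
from collections import deque


def efficiency(adj):
    """Average of 1/d_ij over all ordered node pairs (0 for unreachable pairs)."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for s in adj:
        # breadth-first search gives shortest path lengths from s
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(1.0 / d for node, d in dist.items() if node != s)
    return total / (n * (n - 1))


def remove_nodes(adj, removed):
    """Copy of the network without the removed nodes and their links."""
    removed = set(removed)
    return {v: {w for w in nbrs if w not in removed}
            for v, nbrs in adj.items() if v not in removed}


def decline_rate(adj, removed):
    """mu = 1 - eta_new / eta_old for a given set of removed nodes."""
    return 1.0 - efficiency(remove_nodes(adj, removed)) / efficiency(adj)


# Hypothetical star network: node 3 is linked to nodes 1, 2 and 4
star = {1: {3}, 2: {3}, 3: {1, 2, 4}, 4: {3}}

eta_old = efficiency(star)
mu_centre = decline_rate(star, [3])  # removing the central node
mu_leaf = decline_rate(star, [1])    # removing a peripheral node
```

In this toy example, removing the central node destroys all paths, whereas removing a peripheral node yields a negative decline rate, i.e. the efficiency of the remaining network increases.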
We hypothesise that the network efficiency decreases more strongly when higher ranking stations, e.g. bridge nodes, are removed.

Relative Kriging Error
As a second measure to evaluate the information loss when stations are removed from the network, we use a kriging-based geostatistical approach (Adhikary et al., 2015; Keum et al., 2017). Kriging is an optimal surface interpolation technique assuming that the variance in a sample of observations depends on their distance (Adhikary et al., 2015). The algorithm estimates unknown variable values at unsampled locations in space, where no measurements are available, based on the known sampling values from the surrounding areas (Hohn, 1991; Webster and Oliver, 2007). Ordinary Kriging is used in this study for interpolating rainfall data and estimating the kriging error. The kriging estimator is expressed as
$$Z^*(x_0) = \sum_{i=1}^{n} w_i Z(x_i),$$
where $Z^*(x_0)$ refers to the estimated value of $Z$ at the desired location $x_0$; $w_i$ represents the weights associated with the observation at the location $x_i$ with respect to $x_0$; and $n$ indicates the number of observations within the domain of the search neighbourhood of $x_0$ for performing the estimation of $Z^*(x_0)$. Ordinary Kriging is implemented through ArcGIS v10.4.1 (Redlands, CA, USA) (ESRI, 2009) and its geostatistical analyst extension (Johnston et al., 2001).
The kriging variance $\sigma_z^2(x_0)$ in Ordinary Kriging can be computed as (Adhikary et al., 2015; Xu et al., 2018)
$$\sigma_z^2(x_0) = \sum_{i=1}^{n} w_i \gamma(h_{0i}) + \mu_z,$$
where $\gamma(h)$ is the variogram value for the distance $h$; $h_{ij}$ is the distance between observed data points $x_i$ and $x_j$; $\mu_z$ is the Lagrangian multiplier in the $Z$ scale; $h_{0i}$ is the distance between the unsampled location $x_0$ (where the estimation is desired) and the sample locations $x_i$; and $n$ is the number of sample locations.
The square root of the kriging variance, also called the kriging standard error (KSE), is used as a gauge network evaluation factor. We estimate the increase in the kriging standard error across the study area when stations are removed to evaluate the performance of the WDB measure in identifying influential and expendable stations in a large network. Goovaerts (1997, p. 179) states.
The relative kriging error before and after removing the stations is denoted as
$$\Re = \frac{\sigma_{new} - \sigma_{old}}{\sigma_{old}} \times 100\,\%,$$
where $\sigma_{new}$ denotes the kriging standard error after removing stations, and $\sigma_{old}$ is the error for the original network. We hypothesise that the increase in the relative kriging error is higher when removing high ranking stations.
To cover a broad range of rainfall characteristics, the error is calculated for different statistics, i.e. the mean, the 90th, 95th and 99th percentile rainfall and the number of wet days (precipitation > 2.5 mm).

Rainfall Data
To evaluate the proposed measure in the context of the optimal design of hydrometric networks, we apply it to an extensive network of rain stations in Germany and adjacent areas (Fig. 4). The data cover 110 years at daily resolution (1 January 1901 to 31 December 2010). The 1229 rain stations in Germany (blue dots in Fig. 4) are operated by the German Weather Service. Data processing and quality control were performed according to Österle et al. (2006), and in this study, we assume that the data are free from measurement errors. In addition, 211 stations from different sources outside Germany (red dots in Fig. 4) were included in the analysis to minimize spatial boundary effects in the network construction; however, these stations were excluded from the node ranking analysis.

Network Construction
We begin the network construction by extracting event time series from the 1229 daily rainfall time series. The event series represent heavy rainfall events, i.e., precipitation exceeding the $\alpha$ = 95th percentile at that station (Rheinwalt et al., 2016). The 95th percentile is a compromise between having a sufficient number of rainfall events at each location and a rather high threshold to study heavy precipitation. All rainfall event series are compared with each other using event synchronization (section 2.2), which is the basis for deriving a complex network. This results in the similarity matrix $Q$, where the entry at index pair $(i, j)$ defines the synchronization in the occurrence of heavy rainfall events at station $i$ and station $j$ (Eq. 5).
Applying a certain threshold ($\theta$) to the $Q$ matrix yields the adjacency matrix (Eq. 1):
$$A_{i,j} = \begin{cases} 1, & Q_{i,j} > \theta^{Q}_{i,j} \\ 0, & \text{otherwise.} \end{cases}$$
Here, $\theta^{Q}_{i,j}$ is a chosen threshold, and $A_{i,j} = 1$ denotes a link between the $i$-th and $j$-th sites, and $A_{i,j} = 0$ denotes otherwise. The adjacency matrix represents a rain gauge network, and complex network theory can subsequently be employed to reveal properties of the given network.
Two criteria have been proposed to generate an adjacency matrix from a similarity matrix, namely a fixed link density (Agarwal et al., 2018a; Stolbova et al., 2014) or a global fixed threshold (Jha et al., 2015; Sivakumar and Woldemeskel, 2014). However, both criteria are subjective and may lead to the presence of weak and non-significant links in the complex network. These non-significant links might obscure the topology of strong and significant connections. To minimize these threshold effects, we choose the threshold $\theta^{Q}_{i,j}$ objectively by considering all links in the network that are significant. A link is significant (i.e. two stations are significantly synchronized) if the synchronization value exceeds $\theta^{Q}_{i,j}$, the 95th percentile (corresponding to a 5% significance level) of the synchronization obtained for two synthetic variables that have the same number of events, but distributed randomly in the time series (i.e., both event series are independent). We calculate ES for 100 pairs of such random time series and derive the 95th percentile of the resulting ES distribution. Using this 5% significance level, we assume that synchronization cannot be explained by chance if the ES value between two stations is larger than the 95th percentile of the test distribution. We select the 5% significance level since it is a well-accepted criterion in statistics. To validate the results, we repeated the analysis for thresholds in the range of the 90th to 99th percentile and observed that the node rankings are robust for comparatively high thresholds. For the sake of brevity, the detailed analysis is presented for the 95th percentile only.
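The surrogate-based choice of the threshold and the subsequent construction of the adjacency matrix can be sketched in Python as follows. This is an illustrative sketch and not the code used in this study. It assumes integer day indices as event times, a nearest-rank percentile, an event synchronization function that skips the first and last event of each series, and a small made-up similarity matrix; all function names and numbers are hypothetical.

```python
import random
from math import ceil, sqrt


def event_sync(tx, ty):
    """Synchronization strength of two sorted event-time lists (boundary events skipped)."""
    count = 0.0
    for l in range(1, len(tx) - 1):
        for m in range(1, len(ty) - 1):
            tau = min(tx[l + 1] - tx[l], tx[l] - tx[l - 1],
                      ty[m + 1] - ty[m], ty[m] - ty[m - 1]) / 2.0
            # exact matches and events within the dynamic lag both add one
            # to the combined count C(x|y) + C(y|x)
            if abs(tx[l] - ty[m]) < tau:
                count += 1.0
    return count / sqrt((len(tx) - 2) * (len(ty) - 2))


def significance_threshold(n_x, n_y, n_days, n_pairs=100, pct=95, seed=0):
    """Percentile of ES between independent random event series of given sizes."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_pairs):
        tx = sorted(rng.sample(range(n_days), n_x))
        ty = sorted(rng.sample(range(n_days), n_y))
        values.append(event_sync(tx, ty))
    values.sort()
    rank = max(1, ceil(pct / 100 * len(values)))  # nearest-rank percentile
    return values[rank - 1]


def adjacency(Q, theta):
    """A[i][j] = 1 where Q[i][j] exceeds the pair-specific threshold theta[i][j]."""
    n = len(Q)
    return [[1 if i != j and Q[i][j] > theta[i][j] else 0 for j in range(n)]
            for i in range(n)]


# Threshold for two stations with 50 events each in a 1000-day record (made-up sizes)
theta_50 = significance_threshold(50, 50, 1000)

# Made-up similarity matrix for three stations and a uniform threshold of 0.25
Q = [[0.0, 0.8, 0.1],
     [0.8, 0.0, 0.3],
     [0.1, 0.3, 0.0]]
theta = [[0.25] * 3 for _ in range(3)]
A = adjacency(Q, theta)
```

Because the threshold depends on the number of events in both series, it would in practice be evaluated per station pair; the uniform threshold in the example is only for brevity.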

Decline Rate of Network Efficiency
In this section, we evaluate the ranking of stations derived from the proposed WDB measure using the decline rate of network efficiency. The rain gauges are ranked in decreasing order according to their WDB values. Highly ranked rain gauges are interpreted as the most influential stations, and low ranked ones as expendable stations.
Firstly, we analyze the decline rate of network efficiency $\mu$ when one station is removed from the network. In each trial, we remove only one station (starting with the highest rank). After $n = 1229$ (number of nodes) trials, we investigate the relationship between $\mu$ and the node ranking measured by WDB. We expect an inverse relationship between $\mu$ and the rank position derived from WDB: the higher the node ranking, the more important is that node, leading to a higher loss in network efficiency (Fig. 5). $\mu$ is high for high-ranking stations and decays with node ranking. Interestingly, $\mu < 0$ for very low ranking stations, i.e. the network efficiency increases when single, low ranking stations are removed. This is explained by the decrease of the redundancy in the network when such stations are removed. In each implementation, only one node is removed from the network according to the ranking, with replacement (bootstrapping).
Secondly, we successively remove a larger number of stations, from 1 to 123 stations (10%), considering three cases. In case I, we remove up to the 10% highest ranking stations. This implies that in the first iteration we remove the top-ranked station, in the second iteration we remove the top two stations, and so on. Fig. 6 shows a clear increase in $\mu$ when more and more influential stations are removed. In case II, up to the 10% lowest ranking stations are successively removed. The efficiency increases when the lowest ranking stations are removed. In case III, up to 10% of the stations are randomly removed. Case III is repeated ten times to understand the effect of random sampling. In general, $\mu$ increases when random stations are removed. However, the effect is much lower (in absolute terms) compared to the effect of removing high or low ranking stations, respectively. The variation in $\mu$ between the ten trials and within one trial is caused by randomness. For example, $\mu$ rises instantaneously when the algorithm picks a high ranking station.

Relative Kriging Error
As the second approach to assess the suitability of WDB for identifying influential and expendable stations, we analyse the change in the kriging error when stations are removed from the network. We first estimate the kriging standard error across the study area for all 1229 stations, termed $\sigma_{old}$. Then, we measure the increase or decrease in the kriging standard error across the study area when stations are removed, termed $\sigma_{new}$. The variogram is kept constant during the network modifications. Similarly to the evaluation using the decline rate of network efficiency in section 4.3, three cases are investigated: removing the 10% highest ranking stations, removing the 10% lowest ranking stations, and ten trials of removing 10% of the stations randomly.
The change in the kriging error is calculated for five characteristics, i.e., the mean, the 90th, 95th and 99th percentile, and the number of wet days (Table 1). For each case and rainfall characteristic, we run the model 100 times and report the mean value of $\Re$ in Table 1.
Removing the 10% highest ranking stations (case I) leads to positive and high ($\Re$ > 5%) relative kriging errors for all five statistics considered, i.e. the kriging error increases substantially when these stations are removed. When the 10% lowest ranking stations (case II) are removed, the $\Re$ values are small compared to those obtained by removing high ranking stations. The relative errors in estimating the mean, the percentile rainfall characteristics (90th and 95th) and the number of wet days at ungauged locations are low (< 5%) for the 10% lowest ranking stations, suggesting that these stations do not contribute much information. Case III, i.e. removing stations randomly, shows mostly positive and high ($\Re$ > 5%) values, because high ranking nodes are removed as well, which leads to higher values of $\Re$ (%). In future work, a weighted kriging method could be used to further advance the model.

Discussion
Building on the young science of complex networks, we propose a novel node ranking measure, the weighted degree-betweenness (WDB). The proposed measure, based on degree and betweenness centrality, accounts not only for the local (captured by degree) and global (captured by betweenness centrality) characteristics of nodes but also for the cumulative effect of the influence or contribution of the directly connected (localized) nodes.
Further, this study proposes to use WDB to support the optimal design of large hydrometric networks. We compared WDB with traditional (i.e. degree and betweenness centralities) and contemporary (i.e. bridgeness and DIL) measures by applying them to prototypical situations. The results show that degree and betweenness centrality are unable to differentiate between the different roles of nodes in a network. The contemporary measures bridgeness and DIL showed higher power in discriminating different roles of nodes but are limited in providing a nuanced picture of marginal differences, for example between a local centre and a global bridge. In our test framework, WDB seems to be comparatively more informative in distinguishing the different roles of nodes and assigns a unique value to each node depending on its importance and influence in our test network.
The preliminary application of WDB to the hydrometric monitoring network shows its ability to rank the nodes in a large hydrometric network in relation to their different roles, such as global bridge, local center, dead-end node, hub (high degree), or non-hub (low degree). The resulting ranking can be used to identify influential and expendable hydrometric stations. For example, removing low ranking stations in the German rain gauge network does not adversely affect the network efficiency, and errors are within the permissible limit. This is explained by the redundancy in the information that those stations provide, which in turn is attributed to the similarity between the gauges due to common driving mechanisms or spatial similarity, as advocated by Tobler's Law of Geography (Tobler, 1970). The results of our analysis suggest that WDB identifies the expendable nodes correctly, as shown by the decline rate of efficiency and the insignificant change in relative kriging error. On the other hand, WDB rewards stations that provide unique information, as it considers different aspects of the spatio-temporal relationships in the observation network. However, this evaluation could be further strengthened by using a weighted kriging method or by evaluating the results at individual locations rather than for the entire layer.
We further analyzed the characteristics of the stations with the highest ranks. We plot the network (Fig. 7a) corresponding to the 10% (~122) highest ranking stations, i.e. all the links originating from these 122 stations. The size and color of each diamond-shaped rain gauge marker indicate its degree and betweenness centrality. All other stations are plotted in the background without highlighting their degree and betweenness. To ease interpretation, we further plot the connections corresponding to two high ranking stations (Fig. 7b) and two low ranking stations (Fig. 7c).
Although the degree of these four stations is roughly the same, the connections of the low ranking stations are regionally confined and rather reflect the similarity in rainfall variability within (homogeneous) regions. The highest ranked stations are not governed by local or global features alone but rather by the quantitative combination of both (Fig. 7a).
This observation could reflect the critical nodes in pathways of moisture transport, extreme rainfall propagation, or (in the case of betweenness centrality) a handful of stations that are positioned between large communities and, unlike most stations, tend to possess intercommunity connections (Halverson and Fleming, 2015; Molkenthin et al., 2015; Tupikina et al., 2016). We computed the geographical distance between all connected rain gauges and plotted its median (Fig. 7d) and 95th percentile (Fig. 7e) against the node ranking to test whether the long-range connections of the selected nodes in Fig. 7b are a typical feature of high ranking stations. There is a clear association between rank and distance: high ranking stations tend to show longer connections, implicitly affirming that the WDB measure has the potential to capture highly influential nodes in the network.
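The link-length statistics described above can be sketched as follows. The sketch assumes that station coordinates are given as (latitude, longitude) in degrees, that distances are great-circle distances on a sphere of radius 6371 km, and that the percentile uses linear interpolation; the station names and function names are hypothetical.

```python
import math
from statistics import median

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def link_length_stats(coords, adj):
    """Median and 95th percentile link length (km) for every connected node."""
    stats = {}
    for node, neighbours in adj.items():
        if not neighbours:
            continue  # isolated stations have no link lengths
        lengths = [haversine_km(coords[node], coords[v]) for v in neighbours]
        stats[node] = (median(lengths), percentile(lengths, 95))
    return stats

# Hypothetical stations: A is linked to one nearby and one distant gauge.
coords = {"A": (50.0, 8.0), "B": (50.1, 8.1), "C": (53.5, 13.0)}
adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
for node, (med, p95) in sorted(link_length_stats(coords, adj).items()):
    print(node, round(med, 1), round(p95, 1))
```

Plotting the two statistics against the node rank would then reproduce the kind of comparison shown in Fig. 7d and 7e.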
Further, Fig. 7 is also consistent with the relative kriging errors reported in section 4.4 and Table 1. Intuitively, "the kriging variance is expected to be greater at a location surrounded by data that are very different from one another (Fig. 7b) than at a location surrounded by similarly valued (Fig. 7c) data" (Goovaerts, 1997; Heuvelink and Pebesma, 2002). Hence, we notice higher kriging errors (Table 1) when removing influential stations compared to randomly selected and low ranking stations. Based on our analysis, we suggest that the ranking of nodes provided by the new measure could benefit the optimal design of hydrometric networks or the redesign of existing ones. However, the impact of the similarity measure, the number of stations in the network, the spatial boundary, the data length, and the threshold needs to be investigated in detail before the method can be used further. We acknowledge that complex network science is still in its infancy in hydrology, although it has grown manifold in other domains and offered powerful solutions there. This shows that more intensive applications, new interpretable network measures, and visualization tools are needed to find modern solutions to traditional hydrological problems.

Conclusions
This study proposes to apply complex networks to the optimization of hydrometric monitoring networks. In addition, it proposes a novel node ranking measure for identifying influential and expendable nodes in a complex network. The new network measure, weighted degree-betweenness (WDB), combines degree and betweenness centrality and accounts not only for the local and global characteristics of nodes but also for the cumulative effect of the influence or contribution of the directly connected (localized) nodes. Its comparison to existing measures demonstrates that WDB is more sensitive to the different roles of nodes, such as global connecting nodes or local centres, as it considers different aspects of the spatio-temporal relationships in the observation network.
We propose to use WDB for ranking rain gauges in hydrometric networks. Applying WDB to a network of 1229 rain gauges in Germany allows influential and expendable stations to be identified. Two criteria, the decline rate of network efficiency and the kriging error, are used to evaluate the performance of the proposed node ranking measure. The results suggest that the proposed measure is indeed capable of effectively ranking the stations in large hydrometric networks.
We suggest that the proposed measure is not only useful for rain gauge networks but also has the potential to support the selection of an optimal number of stations for prediction in ungauged basins (PUB) and the estimation of missing values by identifying influential stations in the region. Similarly, the proposed method can be applied to gridded satellite data (rainfall, soil moisture) to locate the strategic points where stations should be installed to ensure a highly efficient observation network: influential grid points in the network of satellite data indicate where to install monitoring stations. However, given the preliminary nature of this study, the application of WDB needs to be investigated in more detail, which is beyond the scope of this study. In addition, follow-up studies addressing threshold and spatial boundary issues of the network, physically interpretable measures, and visualization are needed to prove the benefit of complex network science in hydrometric network

The variogram models are a function of three parameters, known as the range, the sill, and the nugget (Fig. A2a). The range is typically the distance at which the model first flattens out, i.e. station locations separated by distances smaller than the range are spatially autocorrelated, whereas locations farther apart than the range are not. The value of γ at the range is called the sill. The variance of the sample is used as an estimate of the sill. The nugget represents measurement error and/or microscale variation at spatial scales that are too fine to detect and is seen as a discontinuity at the origin.
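The role of the three parameters can be illustrated with a spherical variogram, one common model form. The choice of the spherical model, the parameter values, and the function name are assumptions for illustration; the text above does not state which model was fitted.

```python
def spherical_variogram(h, nugget, sill, range_):
    """Spherical variogram model gamma(h).

    nugget: jump of gamma just away from the origin
    sill:   plateau value reached at the range
    range_: separation distance at which the model flattens out
    """
    if h <= 0:
        return 0.0  # gamma(0) = 0 by definition
    if h >= range_:
        return sill  # beyond the range, values are treated as uncorrelated
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

# Hypothetical parameters: nugget 0.1, sill 1.0, range 50 km.
for h in (0.0, 1.0, 25.0, 50.0, 80.0):
    print(h, round(spherical_variogram(h, 0.1, 1.0, 50.0), 3))
```

In this form the discontinuity at the origin equals the nugget, and the curve is constant at the sill for all separations larger than the range.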

Figure 1: Topology of two sample networks to explain network structures and measures. (a) Network N1 with four nodes and three links; (b) network N2 with four nodes and six links.
quantified the spatial diversity of Indian rainfall teleconnections at different timescales by identifying linkages between climatic indices (e.g. El Niño/Southern Oscillation, Indian Ocean Dipole, North Atlantic Oscillation, Pacific Decadal Oscillation, and Atlantic Multidecadal Oscillation) and seven Indian rainfall stations (bridge nodes). Betweenness centrality (B) has a higher power in discriminating different roles compared to degree (k). For example, nodes 4 and 5 have the highest B (B_4 = B_5 = 24), followed by node 6 (B_6 = 20). On the other hand, B gives equal scores to local centers (node 4), i.e., nodes with high B within a single region, and to global bridges (node 5), which connect detached regions. As mentioned, global bridges connect different parts of a network (e.g. the teleconnection between Indian rainfall and ENSO); the measurement and interpretation of spatially large variations, process identification, interpolation of measurements, and transferability of precipitation measurements across locations would be restricted in the absence of high B nodes.

Figure 2: Synthetic network to explain the degree (k), betweenness centrality (B), and weighted degree-betweenness (WDB) measures, with the node number (1 to 8) followed by the degree, betweenness centrality, and WDB values in brackets [k, B, WDB]. Degree and betweenness are limited in distinguishing the roles of different nodes in the network and in distinguishing centers from bridges, respectively.

Figure 3: Synthetic network used to compare the network measures betweenness centrality, bridgeness, and DIL with the proposed measure WDB. Numbers 1 to 11 are node counts, and values in brackets represent the network measure values in the order [betweenness centrality, bridgeness, DIL, WDB]. Node 6 is a global bridge node that connects two sub-networks. Nodes 4 and 7 are hubs that are connected to most of the nodes in the sub-networks. Nodes 5, 10, and 11 are dead-end nodes.

Figure 4: Location of rain gauge stations in Germany and adjacent areas. Blue dots indicate stations lying inside Germany that are used in the analysis. Red dots indicate stations outside of Germany that are used for network construction only, to minimize the boundary effect.

Figure 5: Decline rate of network efficiency corresponding to the removal of each node in the rainfall network.

Figure 6: Decline rate of network efficiency as a function of the number of stations removed from the network. Case I: up to the 10% highest ranking stations are removed (black); case II: up to the 10% lowest ranking stations are removed (red); case III: up to 10% randomly drawn stations are removed (10 trials) (blue).

Figure 7: Connections and location of the 10% (~122) highest ranking rain gauges (a). The size and colour of the diamond marker indicate the degree and betweenness centrality of the rain gauges, respectively. Connections corresponding to

Figure A1: Sample rain gauge network constructed using the cross-correlation similarity measure and a 90th percentile threshold, for illustrative purposes only. Autocorrelation (diagonal) has been ignored in the network construction. Numbers 1 to 11 are node counts, and values in brackets represent the WDB values.
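The construction named in the caption of Figure A1 can be sketched as follows. The sketch assumes a lag-zero Pearson correlation as the cross-correlation measure, a threshold taken as the 90th percentile of all off-diagonal correlation values, and a link wherever the correlation strictly exceeds that threshold; the gauge names, the short series, and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Lag-zero Pearson correlation of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def build_network(series, q=90.0):
    """Link two gauges if their correlation exceeds the q-th percentile.

    Only distinct pairs are considered, so the diagonal (autocorrelation)
    never contributes to the threshold or to the links.
    """
    names = list(series)
    pairs = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pairs[(a, b)] = pearson(series[a], series[b])
    vals = sorted(pairs.values())
    pos = (len(vals) - 1) * q / 100.0  # percentile by linear interpolation
    lo, hi = math.floor(pos), math.ceil(pos)
    threshold = vals[lo] + (vals[hi] - vals[lo]) * (pos - lo)
    adj = {name: set() for name in names}
    for (a, b), r in pairs.items():
        if r > threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj, threshold

# Hypothetical rainfall series for five gauges.
series = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [1, 2, 3, 4, 6],
    "g3": [5, 3, 4, 1, 2],
    "g4": [2, 5, 1, 4, 3],
    "g5": [3, 1, 5, 2, 4],
}
adj, threshold = build_network(series)
print(adj)
```

With a 90th percentile threshold, roughly the strongest 10% of all gauge pairs are retained as links, so the resulting network is sparse by construction.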