Sensors and sensor networks play an important role in decision-making related to water quality, operational streamflow forecasting, flood early warning systems, and other areas. In this paper we review a number of existing applications and analyse a variety of evaluation and design procedures for sensor networks with respect to various criteria. Most of the existing approaches focus on maximising the observability and information content of a variable of interest. In the context of hydrological modelling, only a few studies use the performance of the hydrological simulation in terms of output discharge as a design criterion. In addition to the review, we propose a framework for classifying the existing design methods, and a generalised procedure for optimal network design in the context of rainfall–runoff hydrological modelling.
Optimal design of sensor networks is a key procedure for improved water management as it provides information about the states of water systems. As the processes taking place in catchments are complex and the measurements are limited, the design of sensor networks is (and has been) a relevant topic since the beginning of the International Hydrological Decade (1965–1974, TNO, 1986) until today (Pham and Tsai, 2016). During this period, the scientific community has not yet arrived at an agreement about a unified methodology for sensor network design due to the diversity of cases, criteria, assumptions, and limitations. This is evident from the range of existing reviews on hydrometric network design, such as those presented by WMO (1972), TNO (1986), Nemec and Askew (1986), Knapp and Marcus (2003), Pryce (2004), NRC (2004), and Mishra and Coulibaly (2009).
The design of rainfall and streamflow sensor networks depends to a large extent on the scale of the processes to be monitored and the objectives to address (TNO, 1986; Loucks et al., 2005). Therefore, the temporal and spatial resolution of measurements are driven by the measurement objectives. For example, information for long-term planning does not require the same level of temporal resolution as for operational hydrology (WMO, 2009; Dent, 2012). On the global and country scale, sensor networks are commonly used for climate studies and trend detection (Cihlar et al., 2000; Grabs and Thomas, 2002; WMO, 2009; Environment Canada, 2010; Marsh, 2010; Whitfield et al., 2012), and are denoted as National Climate Reference Networks (WMO, 2009). On a regional or catchment scale, applications require careful selection of monitoring stations, since water resource planning and management decisions, such as operational hydrology and water allocation, require high temporal and spatial resolution data (Dent, 2012).
This paper presents a review of methods for optimal design and evaluation of
precipitation and discharge sensor networks at catchment scale, proposes a
framework for classifying the design methods, and suggests a generalised
framework for optimal network design for surface hydrological modelling. It
is possible to extend this framework to other variables in the hydrological
cycle, since optimal sensor location problems are similar. The framework
introduced here is part of the results of the FP7 WeSenseIt project.
The structure of this paper is as follows: first, a classification of sensor network design approaches according to the explicit use of measurements and models is presented, including a review of existing studies. Next, a second way of classification is suggested, which is based on the classes of methods for sensor network analysis, including statistics, information theory, case-specific recommendations, and others. Then, based on the reviewed literature, an aggregation of approaches and classes is presented, identifying potential opportunities for improvement. Finally, a general procedure for the optimal design of sensor networks is proposed, followed by conclusions and recommendations.
The design of a sensor network uses the same concepts as experimental design (Kiefer and Wolfowitz, 1959; Fisher, 1974). The design should ensure that the data are sufficient and representative, and can be used to derive the conclusions required from the measurements (EPA, 2002), or to assess the water status of a river system (EC, 2000). In the context of rainfall–runoff hydrological modelling, the design should provide sufficient data for accurate simulation and forecasting of discharge and water levels at stations of interest.
The objectives of the sensor network design have been categorised into two groups, the optimality alphabet (Fedorov, 1972; Box, 1982; Fedorov and Hackl, 1997; Pukelsheim, 2006; Montgomery, 2012), which uses different letters to name different design criteria, and the Bayesian framework (Chaloner and Verdinelli, 1995; DasGupta, 1996). The alphabetic design is based on the linearisation of models, optimising particular criteria of the information matrix (Fedorov and Hackl, 1997). Bayesian methods are centred on principles of decision-making under uncertainty, seeking to maximise the gain in information (Shannon, 1948) between the prior and posterior distributions of parameters, inputs, or outputs (Lindley, 1956; Chaloner and Verdinelli, 1995). Among the most used alphabetic objectives are D-optimality, which minimises the volume of the uncertainty ellipsoid around the model parameters, and G-optimality, which minimises the maximum variance of the predicted variable; both can also be used as objective functions in Bayesian design.
These general objectives are indirectly addressed in the literature on the optimisation of hydrometric sensor networks through several functional alternatives. These approaches do not consider block experimental design (Kirk, 2009), because initial conditions cannot be replicated in uncontrolled environments such as natural catchments.
On the practical side, the design of a sensor network should start with the institutional set-up, purposes, objectives, and priorities of the network (Loucks et al., 2005; WMO, 2008b). From the technical point of view, an optimal measurement strategy requires the identification of the process for which data are required (Casman et al., 1988; Dent, 2012). Because information objectives are neither unique nor constant, and the characterisation of the processes is never complete, the sensor network design should be re-evaluated on a regular basis. Therefore, the sensor network should be re-evaluated when the studied process, information needs, information use, or modelling objectives change. Consequently, regulations regarding monitoring activities are often strict not in terms of station density, but in the suitability of data for providing information about the status of the water system (EC, 2000; EPA, 2002).
The design of meteorological and hydrometric sensor networks should consider at least three aspects. First, it should meet various objectives that are sometimes conflicting (Loucks et al., 2005; Kollat et al., 2011). Second, it should be robust in the event of failure of one or more measurement stations (Kotecha et al., 2008). Third, it must take into account different purposes and users with different temporal and spatial scales (Singh et al., 1986). Therefore, the design of an optimal sensor network is a multi-objective problem (Alfonso et al., 2010b).
The sensor network design can also be seen from an economic perspective (Loucks et al., 2005). In most cases, the main limitation in the deployment of sensor networks is related to costs, which are sometimes the main driver of decisions to reduce monitoring networks. The trade-off between the cost of the sensor network and the cost of having insufficient information is not usually assessed, because the assessment of the consequences of decisions is made a posteriori (Loucks et al., 2005; Alfonso et al., 2016). In most studies, the improvement of information content metrics (e.g. entropy, uncertainty reduction, among others) becomes marginal as the number of extra sensors increases (Pardo-Iguzquiza, 1998; Dong et al., 2006; Ridolfi et al., 2011), and thus the adequate number of sensors can be selected based on a threshold in the rate of increase of the objective function. However, in many practical applications the number of available sensors may be defined by budget limitations. Therefore, the optimal number of sensors in a network is strictly case-specific (WMO, 2008c).
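To illustrate the diminishing-returns argument, the following sketch adds sensors greedily and stops once the marginal gain falls below a threshold. The coverage-set gain used here is a hypothetical stand-in for an information metric such as joint entropy or variance reduction:

```python
# Sketch: choosing the number of sensors from diminishing returns.
# The coverage sets are illustrative; in practice the gain would be an
# information metric (e.g. joint entropy or Kriging-variance reduction).

def greedy_selection(candidates, threshold):
    """Add sensors greedily; stop when the marginal gain drops below threshold."""
    selected, covered = [], set()
    while True:
        best, best_gain = None, 0
        for name, cells in candidates.items():
            if name in selected:
                continue
            gain = len(cells - covered)      # marginal contribution of sensor
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None or best_gain < threshold:
            break  # marginal benefit no longer justifies an extra sensor
        selected.append(best)
        covered |= candidates[best]
    return selected

candidates = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}, "D": {6}}
print(greedy_selection(candidates, threshold=2))  # -> ['A', 'C']
```

Sensor "B" is skipped because, once "A" is selected, it adds little that "C" does not already provide, and "D" never clears the threshold.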
Typical data flow in discharge simulation using hydrological models.
Scenarios for designing of sensor networks may be categorised into three
groups: augmentation, relocation, and reduction (NRC, 2004; Mishra and
Coulibaly, 2009; Barca et al., 2015).
The lack of data usually drives the sensor network augmentation, whereas economic limitations usually push for reduction. These costs of the sensor network usually relate to the deployment of physical sensors in the field, and transmission, maintenance, and continuous validation of data (WMO, 2008c).
Augmentation and relocation problems are fundamentally similar, as they require estimation of the measured variable at ungauged locations. For this purpose, statistical models of the measured variable are often employed. For example, Rodriguez-Iturbe and Mejia (1974) described rainfall in terms of its correlation structure in time and space, Pardo-Igúzquiza (1998) expressed areal averages of rainfall events with ordinary Kriging estimation, and Chacon-Hurtado et al. (2009) represented rainfall fields using block Kriging. In contrast, for network reduction, the analysis is driven by what-if scenarios as the measurements become available. Dong et al. (2005) employed this approach to re-evaluate the efficiency of a river basin network based on the results of hydrological modelling.
In principle, augmentation and relocation aim to increase the performance of the network (Pardo-Igúzquiza, 1998; Nowak et al., 2010). In reduction, by contrast, network performance is usually decreased. The driver of these decisions is usually related to factors such as operation and maintenance costs (Moss et al., 1982; Dong et al., 2005).
The typical data flow for hydrological rainfall–runoff modelling can be summarised as in Fig. 1. For discharge simulation, precipitation and evapotranspiration are the most common data requirements (WMO, 2008c; Beven, 2012), while discharge data are commonly employed for model calibration, correction, and updating (Sun et al., 2015). Data-driven hydrological models may use measured discharge as input variables as well (e.g. Solomatine and Xue, 2004; Shrestha and Solomatine, 2006). Methods for updating hydrological models, in particular data assimilation, have been widely used in discharge forecasting, using the model error to update the model states. In this way, more accurate discharge estimates can be obtained (Liu et al., 2012; Lahoz and Schneider, 2014). In real-time error correction schemes, typically, a data-driven model of the error is employed, which may require as input any of the mentioned variables (Xiong and O'Connor, 2002; Solomatine and Ostfeld, 2008).
In a conceptual way, we can express the quantification of discharge at a
given station as (Solomatine and Wagener, 2011)
There is a variety of approaches for the evaluation of sensor networks, ranging from theoretically sound to more pragmatic. In this section, we provide a general classification of these approaches, and more details of each method are given in the next section.
Although most of the approaches for the design of sensor networks make use of data, some rely solely on experience and recommendations. Therefore, a first tier in the proposed classification consists of recognising both measurement-based and measurement-free approaches (Fig. 2). The former make use of the measured data to evaluate the performance of the network (Tarboton et al., 1987; Anctil et al., 2006), while the latter use other data sources (Moss and Tasker, 1991), such as topography and land use.
Proposed classification of methods for sensor network evaluation.
The measurement-based approach can be further subdivided into model-free and model-based approaches (Fig. 2), depending on the use of modelling results in the performance metric.
In model-free approaches, water systems and the external processes that drive their behaviour are observed through existing measurements, without the use of catchment models. Then, metrics about the amount and quality of information in space and time are evaluated with regard to the management objectives and the decisions to be made in the system. Some performance metrics in this category are joint entropy (Krstanovic and Singh, 1992), information transfer (Yang and Burn, 1994), interpolation variance (Pardo-Igúzquiza, 1998; Cheng et al., 2007), and autocorrelation (Moss and Karlinger, 1974), among others. Figure 3 presents the flowchart for the case when precipitation and discharge, as the main drivers of catchment hydrology (WMO, 2008c), are considered in model-free network evaluation.
General procedure for model-free sensor network evaluation.
Fundamentally, the model-free approach aims to minimise the variance of the measured variable, thereby (and in theory) minimising the variance in the estimation (Eq. 3). However, a design that is optimal for estimation is not necessarily also optimal for prediction (Chaloner and Verdinelli, 1995).
In the model-based approach, the evaluation of sensor network performance is carried out using a catchment model (Dong et al., 2005; Xu et al., 2013). In this case, measurements of precipitation are used to simulate discharge, which is compared to the discharge measurements at specific locations. Therefore, any metric of the modelling error can be used to evaluate the performance of the network. Figure 4 presents a generic model-based approach for evaluating sensor networks.
General procedure for model-based sensor network evaluation.
In the model-based design of sensor networks, it is assumed that the model structure and parameters are adequate. Therefore, it is possible to identify the set of measurements that minimises the model error.
As the name suggests, this approach does not require prior collection of data on the measured variable to evaluate sensor network performance. The evaluation of sensor networks is based on either experience or physical characteristics of the area such as land use, slope, or geology. In this group of methods, the following can be mentioned: case-specific recommendations (Bleasdale, 1965; Wahl and Crippen, 1984; Karasseff, 1986; WMO, 2008a) and physiographic components (Tasker, 1986; Laize, 2004). This approach is the first step towards any sensor network development (Bleasdale, 1965; Moss et al., 1982; Nemec and Askew, 1986; Karasseff, 1986).
In this section, we classify the methods used to quantify the performance of the sensor networks based on the mathematical apparatus used to evaluate the network performance. These methods can broadly be categorised as statistics-based, information theory-based, expert recommendations, and others.
Statistics-based methods refer to methods where the performance of the network is evaluated with statistical uncertainty metrics of the measured or simulated variable. These methods aim to minimise either interpolation variance (Rodriguez-Iturbe and Mejia, 1974; Bastin et al., 1984; Bastin and Gevers, 1985; Pardo-Igúzquiza, 1998; Bonaccorso et al., 2003), cross-correlation (Maddock, 1974; Moss and Karlinger, 1974; Tasker, 1986) or model error (Dong et al., 2005; Xu et al., 2013).
Methods to evaluate sensor networks considering a reduction in the interpolation variance assume that for a network to be optimal, the measured variable should be as certain as possible in the domain of the problem. To achieve this, a stochastic interpolation model that provides uncertainty metrics is required. Geostatistical methods such as Kriging (Journel and Huijbregts, 1978; Cressie, 1993) or copula interpolation (Bárdossy, 2006) have an explicit estimation of the interpolation error. This characteristic makes them suitable for identifying areas with expected poor interpolation results (Bastin et al., 1984; Pardo-Igúzquiza, 1998; Grimes et al., 1999; Bonaccorso et al., 2003; Cheng et al., 2007; Nowak et al., 2009, 2010; Shafiei et al., 2013).
In the case of Kriging, the optimal estimate of a variable at ungauged locations is assumed to be a linear combination of the measurements, with a Gaussian probability distribution. Under the ordinary Kriging formulation, the estimation variance depends on the semi-variogram and on the relative positions of the sensors and the estimation point, but not on the measured values themselves.
Therefore, as an objective function, the optimal sensor network is the one for which the total Kriging variance (TKV), i.e. the Kriging variance summed over all estimation points in the domain, is minimum.
Bastin and Gevers (1985) optimised a precipitation sensor network at pre-defined locations to estimate the average precipitation for a given catchment. Their selection of the optimal sensor location consisted of minimising the normalised uncertainty by reducing the network. The main drawback of their approach is that the network can only be reduced and not augmented. Similar approaches have also been used by Rodriguez-Iturbe and Mejia (1974), Bogárdi et al. (1985), and Morrissey et al. (1995). Pardo-Igúzquiza (1998) advanced this formulation by removing the pre-defined set of locations (allowing augmentation); instead, rain gauges were allowed to be placed anywhere in the catchment and its surroundings. A simulated annealing algorithm was used to search for the set of sensors that minimises the interpolation uncertainty.
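For illustration, the ordinary Kriging variance and the TKV objective can be sketched as follows. The exponential variogram and its parameters are assumed for the example; in practice they would be fitted to the observed spatial correlation of the rainfall field:

```python
import numpy as np

def variogram(h, sill=1.0, rng=20.0):
    """Exponential semi-variogram; sill and range are assumed here and
    would be fitted to the observed spatial correlation in practice."""
    return sill * (1.0 - np.exp(-h / rng))

def ok_variance(stations, target):
    """Ordinary Kriging estimation variance at a target point (no nugget)."""
    n = len(stations)
    d = np.linalg.norm(stations[:, None, :] - stations[None, :, :], axis=-1)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(d)
    a[-1, -1] = 0.0                              # Lagrange-multiplier block
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(stations - target, axis=-1))
    sol = np.linalg.solve(a, b)                  # weights lambda and multiplier mu
    return float(sol[:n] @ b[:n] + sol[-1])      # lambda . gamma0 + mu

def total_kriging_variance(stations, grid):
    """TKV: estimation variance summed over all points of interest."""
    return sum(ok_variance(stations, p) for p in grid)

grid = np.array([[x, y] for x in range(0, 51, 10) for y in range(0, 51, 10)])
spread = np.array([[10.0, 10.0], [40.0, 10.0], [10.0, 40.0], [40.0, 40.0]])
clustered = np.array([[24.0, 24.0], [26.0, 24.0], [24.0, 26.0], [26.0, 26.0]])
print(total_kriging_variance(spread, grid),
      total_kriging_variance(clustered, grid))   # spread network has lower TKV
```

Comparing candidate configurations this way reproduces the intuition behind the variance-minimisation designs: a network spread over the domain yields a lower TKV than a clustered one, and a search algorithm (e.g. simulated annealing, as in Pardo-Igúzquiza, 1998) can automate the selection.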
Copula interpolation is a geostatistical alternative to Kriging for the modelling of spatially distributed processes (Bárdossy, 2006; Bárdossy and Li, 2008; Bárdossy and Pegram, 2009). As a geostatistical model, the copula provides metrics of the interpolation uncertainty, considering not only the location of the stations and the model parameterisation, but also the value of the observations. Li et al. (2011) use the concept of a copula to provide a framework for the design of a monitoring network for groundwater parameter estimation, using a utility function, related to the cost of a given decision with the available information.
In the case of copulas, the full conditional probability distribution function of the variable is interpolated. As such, the interpolation uncertainty depends on the confidence interval, measured values, parameterisation of the copula, and the relative position of the sensors in the domain of the catchment. More details on the formulation of copula-based designs can be found in Bárdossy and Li (2008).
Cheng et al. (2007), as well as Shafiei et al. (2013), recognised that the temporal resolution of the measurements affects the definition of optimality in minimum interpolation variance methods. This change in the spatial correlation structure occurs because precipitation data between stations become more correlated at coarser sampling resolutions (Ciach and Krajewski, 2006). For this purpose, the sensor network has to be split into two parts, a base network and non-base sensors. The former should remain in the same position for long periods, to characterise long-term fluctuations, based on the definition of a minimum threshold for an area with acceptable accuracy. The latter are relocated whenever they no longer provide a significant contribution to the monitoring objective, in order to improve the accuracy of the whole system.
Recent efforts have used minimum interpolation variance approaches to consider the non-stationarity assumption of most geostatistical applications in sensor network design (Chacon-Hurtado et al., 2014). To this end, changes in the precipitation pattern and its effect on the uncertainty estimation were considered during the development of a rainfall event.
The objective of minimum cross-correlation methods is to avoid placing sensors at sites that may produce redundant information. Cross-correlation was suggested by Maddock (1974) for sensor network reduction, as a way to identify redundant sensors. In this scope, the objective function can be written as the minimisation of the cross-correlation between the records of pairs of stations in the network.
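A minimal sketch of this criterion, flagging the station whose record is, on average, most correlated with the rest of the network and is therefore a candidate for removal (the synthetic records are purely illustrative):

```python
import numpy as np

def most_redundant(records):
    """Index of the station most correlated, on average, with the rest
    of the network, i.e. the best candidate for removal."""
    r = np.corrcoef(records)           # station-by-station correlation matrix
    np.fill_diagonal(r, 0.0)           # ignore self-correlation
    return int(np.argmax(np.abs(r).mean(axis=1)))

records = np.array([
    [1.0, 2.0, 3.0, 4.0],      # station 0
    [2.0, 4.0, 6.0, 8.0],      # station 1: perfectly correlated with station 0
    [1.0, -1.0, 1.0, -1.0],    # station 2: independent signal
])
print(most_redundant(records))  # flags station 0 or its duplicate, station 1
```

In an actual reduction study, the flagged station would be removed and the procedure repeated until the remaining stations are sufficiently uncorrelated.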
Stedinger and Tasker (1985) introduced the method called network analysis using generalised least squares (NAUGLS), which assesses the parameters of a regression model for daily discharge simulation based on the physiographic characteristics of a catchment (Stedinger and Tasker, 1985; Tasker, 1986; Moss and Tasker, 1991). The method builds a generalised-least-squares (GLS) covariance matrix of regression errors to correlate flow records and to account for flow records of different lengths when estimating the sampling mean squared error.
A comparable method was proposed by Burn and Goulter (1991), who used a correlation metric to cluster similar stations. Vivekanandan and Jagtap (2012) proposed an alternative for the location of discharge sensors in a recurrent approach, in which the most redundant stations were removed and the most informative stations retained, using Cook's D statistic.
These methods assume that the optimal sensor network configuration is one that satisfies a particular modelling purpose, e.g. a minimum error in simulated discharge. Accordingly, the design of a sensor network should minimise the difference between the simulated and recorded variables. A widely used model performance metric is the Nash–Sutcliffe efficiency (NSE; Nash and Sutcliffe, 1970). Theoretically, this score varies from minus infinity to 1, although its practical range lies between 0 and 1. On the one hand, an NSE equal to 0 indicates that the model has the same explanatory capability as the mean of the observations. On the other end, a value of 1 represents a perfect fit between model results and observations. Model output error formulations have been used to identify the set of sensors that provides the best model performance (Tarboton et al., 1987) and to propose measurement strategies regarding the number of gauges and sampling frequency.
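For reference, the NSE can be computed directly from its definition; the short example below verifies the two anchor points mentioned above (1 for a perfect fit, 0 for a model that predicts the mean of the observations):

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    var = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / var

obs = [2.0, 4.0, 6.0, 8.0]
print(nse(obs, obs))                    # perfect fit -> 1.0
print(nse(obs, [5.0, 5.0, 5.0, 5.0]))   # mean of observations -> 0.0
```

In a model-based network evaluation, this score would be computed for the discharge simulated with each candidate sensor configuration, and the configuration with the highest NSE retained.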
Another application is provided by Dong et al. (2005), who proposed evaluating the rainfall network using a lumped HBV model (Lindström et al., 1997). They found that the model performance does not necessarily improve when extra rain gauges are added. A similar approach was presented by Xu et al. (2013), who evaluated the effect of diverse rain gauge locations on runoff simulation using a similar hydrological model. They found that rain gauge locations could have a significant impact, particularly at gauge densities below 0.4 stations per 1000 km².
Anctil et al. (2006) aimed at improving lumped neural network rainfall–runoff forecasting models through mean areal rainfall optimisation, and concluded that different combinations of sensors lead to noticeable streamflow forecasting improvements. Studies in other fields have also used this method. For example, Melles et al. (2009, 2011) obtained optimal monitoring designs for radiation monitoring networks, which minimise the prediction error of mean annual background radiation. The main drawback of this approach is that multiple error metrics may need to be considered, as different objectives relate to different processes.
The use of information theory (Shannon, 1948) in the design of sensor networks for environmental monitoring is rooted in communication theory, which studies the problem of transmitting signals from a source to a receiver through a noisy medium. Information theory makes it possible to estimate probability distribution functions in the presence of partial information with the least biased estimation (Jaynes, 1957). Some of its concepts are analogous to statistical concepts: entropy is related to uncertainty (variance), and mutual information to correlation (Cover and Thomas, 2005; Alfonso, 2010; Singh, 2013).
Information theory-based methods for designing sensor networks mainly consider the maximisation of information content that sensors can provide, in combination with the minimisation of redundancy among them (Krstanovic and Singh, 1992; Mogheir and Singh, 2002; Alfonso et al., 2010a, b, 2013; Alfonso, 2010; Singh, 2013). Redundancy can be measured by using mutual information (Singh, 2000; Steuer et al., 2002), directional information transfer (Yang and Burn, 1994), or total correlation (Alfonso et al., 2010a, b; Fahle et al., 2015), among others.
The principle of maximum entropy (POME) is based on the premise that the probability distribution with the largest remaining uncertainty (i.e. the maximum entropy) is the one that best represents the current state of knowledge. POME has been used as a criterion for the design of sensor networks by allowing the identification of the set of sensors that maximises the joint entropy of the measurements (Krstanovic and Singh, 1992), in other words, the set that provides as much information content, from the information theory perspective, as possible (Jaynes, 1988).
In the design of sensor networks, the objective is to maximise the joint entropy of the selected set of sensors, i.e. the total information content that the network jointly provides.
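A greedy sketch of this objective, in which sensors are added one at a time so as to maximise the joint entropy of quantised records (the series and quantisation below are illustrative; note that a duplicated sensor adds no joint entropy and is therefore never preferred):

```python
import math
from collections import Counter

def joint_entropy(series):
    """Joint Shannon entropy (bits) of a set of quantised time series."""
    symbols = list(zip(*series))         # one tuple of readings per time step
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())

def greedy_max_entropy(candidates, k):
    """Pick k sensors whose records jointly maximise entropy (greedy)."""
    chosen = []
    for _ in range(k):
        best = max((i for i in candidates if i not in chosen),
                   key=lambda i: joint_entropy(
                       [candidates[j] for j in chosen] + [candidates[i]]))
        chosen.append(best)
    return chosen

candidates = {
    "a": [0, 0, 1, 1],
    "b": [0, 0, 1, 1],   # duplicate of "a": adds no new information
    "c": [0, 1, 0, 1],   # complementary signal
}
print(greedy_max_entropy(candidates, 2))  # -> ['a', 'c']
```

The greedy step mirrors the POME-based designs: at each iteration the sensor that adds the most new information to the already selected set is retained, so redundant sensors are naturally excluded.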
Krstanovic and Singh (1992) presented a concise work on rainfall network evaluation using entropy. They used POME to obtain multivariate distributions to associate different dependencies between sensors, such as joint information and shared information, which were later used either to reduce the network (in the case of high redundancy) or to expand it (in the case of a lack of common information).
Fuentes et al. (2007) proposed an entropy-utility criterion for environmental sampling, particularly suited for air-pollution monitoring. This approach considers Bayesian optimal sub-networks using an entropy framework, relying on the spatial correlation model. An interesting contribution of this work is the assumption of non-stationarity, contrary to traditional atmospheric studies, and relevant in the design of precipitation sensor networks.
Hydraulic 1-D models and metrics of entropy have been used to select the adequate spacing between sensors for water level in canals and polder systems (Alfonso et al., 2010a, b). This approach is based on the current conditions of the system, which makes it useful for operational purposes, but it does not necessarily remain valid when the conditions of the water system or the operation rules change. Studies on the design of sensor networks using these methods have been on the rise in recent years (Alfonso, 2010; Alfonso et al., 2013; Ridolfi et al., 2014; Banik et al., 2017).
Benefits of POME include the robustness of the description of the posterior probability distribution, since it aims to produce the least biased outcome; this is important because neither the models nor the measurements are completely certain. Li et al. (2012) presented, as part of a multi-objective framework for sensor network optimisation, the criterion of maximum (joint) entropy as one of the objectives. Other studies in this direction have been presented by Lindley (1956), Caselton and Zidek (1984), Guttorp et al. (1993), Zidek et al. (2000), Yeh et al. (2011), and Kang et al. (2014).
More recently, Samuel et al. (2013) and Coulibaly and Samuel (2014) proposed a mixed method involving regionalisation and dual entropy multi-objective optimisation (CRDEMO), which is a step forward when compared to single-objective optimisation for sensor network design.
Mutual information is a measure of the amount of information that one variable contains about another. It can be quantified as the reduction in the entropy of one variable due to the knowledge of the other.
An optimal sensor network should avoid collecting repetitive or redundant
information; in other words, it should reduce the mutual (shared) information
between sensors in the network. Alternatively, it should maximise the
transferred information from a measured to a modelled variable at a point of
interest (Amorocho and Espildora, 1973). Following this idea, Husain (1987)
suggested an optimisation scheme for the reduction of a rain sensor network.
His objective was to minimise the trans-information between pairs of
stations. However, assumptions of the probability and joint probability
distribution functions are strong simplifications of this method. To overcome
these assumptions, the Directional Information Transfer (DIT) index was
introduced (Yang and Burn, 1994) as the inverse of the coefficient of
non-transferred information (NTI) (Harmancioglu and Yevjevich, 1985). Both DIT and NTI are normalised measures of information transfer between two variables.
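These quantities are straightforward to compute from quantised records. The sketch below estimates mutual information from plug-in entropies and normalises it by the marginal entropy, following the DIT idea of Yang and Burn (1994); the binary quantisation is illustrative:

```python
import math
from collections import Counter

def entropy(xs):
    """Plug-in Shannon entropy (bits) of a quantised series."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """Trans-information T(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def dit(xs, ys):
    """Directional information transfer from Y to X: the fraction of H(X)
    that can be inferred from Y (normalisation after Yang and Burn, 1994)."""
    return mutual_information(xs, ys) / entropy(xs)

print(dit([0, 0, 1, 1], [0, 0, 1, 1]))                   # fully redundant -> 1.0
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))    # independent -> 0.0
```

A pair of stations with DIT close to 1 is redundant, whereas a DIT close to 0 indicates that the stations observe essentially independent signals.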
Particularly for the design of precipitation sensor networks, Ridolfi et al. (2011) presented a definition of the maximum achievable information content for designing a dense network of precipitation sensors at different temporal resolutions. The results of this study show that there exists a linear dependency between the non-transferred information and the sampling frequency of the observations.
Total correlation generalises mutual information to more than two variables: it is defined as the difference between the sum of the marginal entropies and the joint entropy of the set of variables, and thus quantifies the overall redundancy within a group of sensors.
Recommended minimum densities of stations (area in km² per station) (WMO, 2008c).
A method to estimate trans-information fields at ungauged locations has been proposed by Su and You (2014), employing a trans-information–distance relationship. This method accounts for the spatial distribution of precipitation, supporting the augmentation problem in the design of precipitation sensor networks. However, as the trans-information between sensors decreases monotonically with distance, the resulting sensor networks are generally sparse.
Among the most used planning tools for hydrometric network design are the technical reports of the WMO (2008c), in which minimum station densities are suggested for different physiographic units (Table 1). Although these guidelines do not indicate where to place hydrometric sensors, they recommend that the distribution of sensors be as uniform as possible and that network expansion be considered. The document also encourages the use of computationally aided design and evaluation for a more comprehensive design. For instance, Coulibaly et al. (2013) used these guidelines to evaluate the Canadian national hydrometric network.
Moss et al. (1982) presented one of the first attempts to use physiographic components in the design of sensor networks in a method called Network Analysis for Regional Information (NARI). This method is based on relations of basin characteristics proposed by Benson and Matalas (1967). NARI can be used to formulate the following objectives for network design within a Bayesian framework: minimum cost of the network, maximum information, and maximum net benefit from the data-collection programme.
Laize (2004) presented an alternative for evaluating precipitation networks based on the Representative Catchment Index (RCI), a measure of how representative a given station in a catchment is for a given area, relative to the stations in the surrounding catchments. The author argues that the method, which uses datasets of land use and elevation as physiographic components, can help identify areas with an insufficient number of representative stations in a catchment.
Most of the first sensor networks were designed based on expert judgement and practical considerations. Aspects such as the objective of the measurement, security, and accessibility are decisive in selecting the location of a sensor. Nemec and Askew (1986) presented a short review of the history and development of the early sensor networks, highlighting that "basic pragmatic approaches" still received most of the attention, due to their practicality in the field and their closeness to decision-makers.
Bleasdale (1965) presented a historical review of the early development of the rainfall sensor networks in the United Kingdom. In the early stages of the development of precipitation sensor networks, two main characteristics influencing the location of the sensors were identified: sites that were conventionally satisfactory and sites where good observers were located. However, the necessity of a more structured approach to selecting the location of sensors was underlined. As a guide, Bleasdale (1965) presented a series of recommendations on the minimum density of sensors for operational purposes, summarised in Fig. 5, relating the characteristics of the area to be monitored to the minimum required number of rain gauges, as well as their temporal resolution.
Minimum number of rain gauges required in reservoired moorland areas – adapted from Bleasdale (1965).
In a more structured approach, Karasseff (1986) introduced some guidelines
for the definition of the optimal sensor network to measure hydrological
variables for operational hydrological forecasting systems. The study
specified the minimum requirements for the density of measurement stations
based on the fluctuation scale and the variability of the measured variable
by defining zonal representative areas. This author suggested the following
considerations for selecting the optimal placement of hydrometric stations:
in the lower part of inflow and wastewater canals; at the heads of irrigation and watering canals taking water
from the sources; at the beginning of a debris cone before the zone of
infiltration, and at its end, where groundwater decrement takes place; at the boundaries of irrigated areas and zones of considerable
industrial water diversions (towns); and at the sites of hydroelectric power plants and hydro-projects.
From a different perspective, Wahl and Crippen (1984), as well as Mades and Oberg (1986), proposed a qualitative score assessment of different factors related to data use and the historical availability of records, in order to evaluate the value of individual sensors. Their analyses aimed at identifying candidate sensors to be discontinued due to their limited accuracy.
These approaches aim to identify the information needs of particular groups of users (Sieber, 1970), following the idea that the location of a sensor (or group of sensors) should satisfy at least one specific purpose. To this end, surveys are conducted to identify user interest in the measurement of certain variables, considering, among other factors, the location of the sensor, the record length, the frequency of the records, and the methods of transmission.
Singh et al. (1986) applied two questionnaires to evaluate the streamflow network in Illinois. The first identified the main uses of streamflow data collected at gauging stations: participants described how the data were used and categorised each use as site-specific management, local or regional planning and design, or determination of long-term trends. The second questionnaire determined present and future needs for streamflow information. Based on the results, the network was reduced where interest in certain sensors was limited, which allowed the existing network to be enhanced with more sophisticated sensors and recording methods. Additionally, this redirection of resources increased the coverage at specific locations.
There are also other methods that cannot be easily attributed to the previously mentioned categories. Among them, value of information, fractal, and network theory-based methods can be mentioned.
The value of information (VOI, Howard, 1966; Hirshleifer and Riley, 1979) is defined as the value a decision-maker is willing to pay for extra information before making a decision. This willingness to pay is related to the reduction of uncertainty about the consequences of making a wrong decision (Alfonso and Price, 2012).
The main feature of this approach is the direct description of the benefits of additional pieces of information, compared with the costs of acquiring them (Black et al., 1999; Walker, 2000; Nguyen and Bagajewicz, 2011; Alfonso and Price, 2012; Ballari et al., 2012). The main advantage of this method is that it provides a pragmatic framework in which information has a utilitarian, usually economic, value, which is especially suited for budget-constrained conditions.
One of the assumptions of this type of method is that a prior estimation of consequences is needed. If a decision-maker chooses among actions $a$ whose utility $U(a,\theta)$ depends on an uncertain state $\theta$ with prior probability $p(\theta)$, the best attainable expected utility before any measurement is $\max_a \sum_\theta p(\theta)\,U(a,\theta)$. The value of a single message $m$, which updates the prior to the posterior $p(\theta \mid m)$, is the resulting gain in expected utility, $V(m) = \max_a \sum_\theta p(\theta \mid m)\,U(a,\theta) - \max_a \sum_\theta p(\theta)\,U(a,\theta)$. The value of information, VOI, is the expected utility of the values over all possible messages, $\mathrm{VOI} = \sum_m p(m)\,V(m)$.
Following the same line of ideas, Khader et al. (2013) proposed the use of decision trees to support the development of a sensor network for the quality of drinking groundwater. VOI offers a straightforward methodology to represent the causes and consequences of scenarios with different types of actions, including the expected effect of additional information. A recent effort by Alfonso et al. (2016) towards identifying valuable areas for collecting information for floodplain planning consists of the generation of VOI maps, in which probabilistic flood maps and the consequences of urbanisation actions are taken into account to identify the areas where extra information is most critical.
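As a minimal illustration of the VOI concept, the sketch below evaluates a hypothetical two-state flood decision; the probabilities, utilities, and sensor message likelihoods are invented for illustration and are not taken from the cited studies.

```python
# Minimal value-of-information sketch for a two-state, two-action
# decision problem (all numbers are hypothetical).
# States: flood / no flood; actions: protect / wait.

# Prior probability of each state.
prior = {"flood": 0.2, "no_flood": 0.8}

# Utility (negative cost) of each action under each state.
utility = {
    ("protect", "flood"): -10,     # protection cost, damage avoided
    ("protect", "no_flood"): -10,  # protection cost wasted
    ("wait", "flood"): -100,       # full flood damage
    ("wait", "no_flood"): 0,
}

# Likelihood of a sensor message given the true state
# (an imperfect sensor that reports correctly 90 % of the time).
likelihood = {
    ("warn", "flood"): 0.9, ("warn", "no_flood"): 0.1,
    ("clear", "flood"): 0.1, ("clear", "no_flood"): 0.9,
}

actions = ["protect", "wait"]
states = list(prior)
messages = ["warn", "clear"]

def expected_utility(p):
    """Best achievable expected utility under belief p over states."""
    return max(sum(p[s] * utility[(a, s)] for s in states) for a in actions)

# Expected utility when acting on the prior alone.
eu_prior = expected_utility(prior)

# Expected utility when the decision can depend on the sensor message.
eu_posterior = 0.0
for m in messages:
    p_m = sum(likelihood[(m, s)] * prior[s] for s in states)  # P(message)
    posterior = {s: likelihood[(m, s)] * prior[s] / p_m for s in states}
    eu_posterior += p_m * expected_utility(posterior)

voi = eu_posterior - eu_prior  # value of observing the sensor message
print(f"EU prior: {eu_prior:.2f}, EU with message: {eu_posterior:.2f}, VOI: {voi:.2f}")
```

Here the imperfect sensor raises the expected utility from -10 (always protecting) to -4.6, so a rational decision-maker would pay up to 5.4 utility units for the message.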
Fractal-based methods employ the concept of Gaussian self-affinity, whereby sensor networks show the same spatial patterns at different scales; this affinity can be measured by the fractal dimension (Mandelbrot, 2001). Lovejoy et al. (1986) proposed the use of fractal-based methods to measure the dimensional deficit between the observations of a process and its real domain. Consider a set of evenly distributed cells representing the physical space, with the fractal dimension of the network representing the number of observed cells in the correlation space; the shortfall due to non-measured cells in the correlation space is known as the fractal deficit of the network. Because a large number of stations must be available at different scales, the method is suitable for large networks, but less useful for deploying a few sensors at the catchment scale.
Lovejoy and Mandelbrot (1985) and Lovejoy and Schertzer (1985) introduced the use of fractals to model precipitation. They argued that the intermittent nature of the atmosphere can be characterised by fractal measures with fat-tailed probability distributions of the fluctuations, and stated that standard statistical methods are inappropriate to describe this kind of variability. Mazzarella and Tranfaglia (2000) and Capecchi et al. (2012) presented two different case studies using this method for the evaluation of rainfall sensor networks. The former study concludes that, for network augmentation, it is important to select the optimal locations that improve coverage, as measured by the reduction of the fractal deficit; however, it offers no practical recommendations on how to select such locations. The latter proposes the inspection of seasonal trends, as the meteorological processes of precipitation may have significant effects on the detection capabilities of the network.
A common approach for the quantification of the dimensional deficit is the box-counting method (Song et al., 2007; Kanevski, 2008), mainly used in the fractal characterisation of precipitation sensor networks. The domain is covered with boxes of decreasing side length $L$, and the number of boxes $n(L)$ containing at least one station is counted. The fractal dimension of the network ($D_b$) is the scaling exponent in $n(L) \propto L^{-D_b}$, estimated as the slope of a regression of $\log n(L)$ against $\log(1/L)$.
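The box-counting estimate can be sketched as follows; the station coordinates and box sizes are illustrative only, not taken from the cited studies.

```python
import numpy as np

# Hypothetical station coordinates (km) in a 100 x 100 km domain.
rng = np.random.default_rng(42)
stations = rng.uniform(0, 100, size=(30, 2))

def box_count(points, box_size):
    """Number of boxes of a given side length containing at least one station."""
    idx = np.floor(points / box_size).astype(int)
    return len({tuple(i) for i in idx})

# Count occupied boxes n(L) over a range of nested box sizes L.
sizes = np.array([50.0, 25.0, 12.5, 6.25])
counts = np.array([box_count(stations, L) for L in sizes])

# Box-counting dimension: slope of the regression of log n(L) on log(1/L).
slope, intercept = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
print(f"Estimated fractal dimension: {slope:.2f}")
```

Note that with few stations the count saturates at the number of stations for small boxes, which is precisely the instability discussed next.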
Due to the scarcity of measurements in precipitation networks, the quantification of the fractal dimension by box counting may be unstable. An alternative fractal dimension may be calculated using a correlation integral (Mazzarella and Tranfaglia, 2000) instead of the number of boxes, such that $C(r) = \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=i+1}^{N} H(r - d_{ij})$, where $N$ is the number of stations, $d_{ij}$ is the distance between stations $i$ and $j$, $H$ is the Heaviside step function, and $r$ is the scaling radius.
The consequent definition of the fractal dimension of the network is the ratio between the logarithm of the correlation integral and the logarithm of the scaling radius. This ratio is obtained from a regression between $\log C(r)$ and $\log r$ for different values of $r$.
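A minimal sketch of the correlation-integral estimate, again with hypothetical station coordinates and illustrative radii:

```python
import numpy as np

# Hypothetical station coordinates (km) in a 100 x 100 km domain.
rng = np.random.default_rng(1)
pts = rng.uniform(0, 100, size=(25, 2))
n = len(pts)

# Pairwise distances between all stations (upper triangle only).
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
pair_d = d[np.triu_indices(n, k=1)]

def correlation_integral(r):
    """C(r): fraction of station pairs closer than the scaling radius r."""
    return 2.0 * np.sum(pair_d < r) / (n * (n - 1))

# Regress log C(r) against log r over a range of radii; the slope is
# the correlation-based fractal dimension of the network.
radii = np.array([15.0, 30.0, 60.0])
c = np.array([correlation_integral(r) for r in radii])
dim, _ = np.polyfit(np.log(radii), np.log(c), 1)
print(f"Correlation dimension: {dim:.2f}")
```

For a well-distributed two-dimensional network the estimated dimension approaches 2; values well below 2 indicate clustering or coverage gaps.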
Recently, research efforts have been devoted to the use of so-called network theory to assess the performance of discharge sensor networks (Sivakumar and Woldemeskel, 2014; Halverson and Fleming, 2015). These studies analyse three main features, namely the average clustering coefficient, the average path length, and the degree distribution. Average clustering is a measure of the tendency of stations to form clusters. Average path length is the average of the shortest paths between every combination of station pairs. Degree distribution is the probability distribution of node degrees across all the stations, where the degree of a station is the number of stations to which it is connected. Halverson and Fleming (2015) observed that regular streamflow networks are highly clustered (so the removal of any randomly chosen node has little impact on the network performance) and have long average path lengths (so information may not easily be propagated across the network).
In hydrometric networks, three metrics are identified (Halverson and Fleming, 2015): degree distribution, clustering coefficient, and average path length. The first is based on the node degree, i.e. the number of nodes to which a given node is connected; the degree distribution describes the probability of a node being connected to a given number of other nodes. These quantities are calculated from the adjacency matrix $A$ (a binary matrix in which connected nodes are represented by 1 and missing links by 0). The degree of node $i$ is therefore defined as $k_i = \sum_j A_{ij}$.
The clustering coefficient is a measure of how much the nodes cluster together; high clustering indicates that nodes are highly interconnected. The clustering coefficient (CC) for a given station $i$ is defined as $\mathrm{CC}_i = 2e_i / (k_i(k_i - 1))$, where $k_i$ is the degree of station $i$ and $e_i$ is the number of links among its neighbours.
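These three metrics can be illustrated on a small hypothetical network of six stations; the adjacency matrix below is invented for illustration (in practice it would encode, e.g., sufficiently correlated station records).

```python
import numpy as np
from collections import deque

# Hypothetical adjacency matrix for six stations: two triangles of
# stations joined by a single link (1 = linked, 0 = no link).
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
n = len(A)

# Node degree: k_i = sum_j A_ij.
degree = A.sum(axis=1)

def clustering(i):
    """CC_i = 2 e_i / (k_i (k_i - 1)): fraction of realised links
    among the neighbours of node i."""
    nbrs = np.flatnonzero(A[i])
    k = len(nbrs)
    if k < 2:
        return 0.0
    e = A[np.ix_(nbrs, nbrs)].sum() / 2  # links among the neighbours
    return 2.0 * e / (k * (k - 1))

cc = [clustering(i) for i in range(n)]

def shortest_paths(src):
    """Breadth-first-search hop distances from one station to all others."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in np.flatnonzero(A[u]):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

# Average path length over all ordered station pairs (connected network).
d_sum = sum(d for i in range(n) for j, d in shortest_paths(i).items() if j != i)
apl = d_sum / (n * (n - 1))

print("degrees:", degree.tolist())
print("clustering:", [round(x, 2) for x in cc])
print(f"average path length: {apl:.2f}")
```

The two tightly linked triangles give high clustering for the outer stations (CC = 1.0), while the single bridge link (stations 3 and 4 in 1-based counting) gives those nodes lower clustering and high betweenness, mirroring the discussion above.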
Classification of sensor network design criteria including recommended reading.
According to Halverson and Fleming (2015), an optimal configuration of streamflow networks should consist of measurements with small membership communities, high betweenness, and index stations with large numbers of intra-community links. Small communities represent clusters of observations, thus indicating efficient measurements. Large numbers of intra-community links ensure that the network has some degree of redundancy and is thus resistant to sensor failure. High betweenness indicates that the stations with the most inter-community links are adequately connected and thus able to capture the heterogeneity of the hydrological processes at a larger scale.
Advantages and disadvantages of sensor network design methods.
Table 2 summarises the sensor network design classes and approaches, with the selected references to the relevant papers in each of the categories for further reference.
It is of special interest in this review to highlight the lack of model-based information theory methods, as well as the low number of publications on network theory-based methods. Quantitative studies comparing different methodologies for the design of sensor networks are also limited. It is suggested, therefore, that a pilot catchment be used by the scientific community to test all the available methods for network evaluation and to establish similarities and differences among them.
Table 3 summarises the main advantages and disadvantages of each of the design and evaluation methods. These recommendations are general, but take into account the most common considerations in the design of sensor networks. Some of the advantages of these methods have been exploited in combined methodologies, such as those presented by Yeh et al. (2011), Samuel et al. (2013), Barca et al. (2015), Coulibaly and Samuel (2014), and Kang et al. (2014).
Based on the presented literature review, in this section an attempt is made to present a first version of a unified, general procedure for sensor network design. Such a procedure logically links various methods in a flowchart, following the measurement-based approaches (Fig. 6). The flowchart suggests two main loops: one to optimise the network performance (optimisation loop), and a second one to represent the selection of the number of sensors in either augmentation or reduction scenarios. Most measurement-based methods, as well as most design scenarios, can be seen as particular cases of this generalised algorithmic flowchart.
Sensor network (re)design flowchart (CML: candidate measurement locations).
The general procedure consists of 11 steps (boxes in Fig. 6). In the first place, physical measurements (1) are acquired by the sensor network. These data are used to parameterise an estimator (2), which will be used to estimate the variable at the candidate measurement locations (CML) using, for instance, Kriging (Pardo-Igúzquiza, 1998; Nowak et al., 2009) or 1-D hydrodynamic models (Neal et al., 2012; Rafiee, 2012; Mazzoleni et al., 2015). The sensor network reduction does not require such estimators as measurements are already in place.
The selection of the CML should consider factors such as physical and technical availability, as well as costs related to maintenance and accessibility of stations, as illustrated by the WMO (2008c) recommendations. The selection of CML can also be based, for example, on expert judgement. These limitations may be presented in the form of constraints in the optimisation problem.
Then an optimisation loop starts (Fig. 6) with the estimation of the measured variable at the CML (3), using the estimator built in (2). Next, the performance of the sensor network at the CML is evaluated (4) using any of the previously discussed methods. The selection of the method depends on the designer and their information requirements, which also determine whether an optimal solution has been found (5). The stopping criteria in the optimisation problem can be a desired accuracy of the network, a number of iterations without improvement, or a maximum number of iterations. As pointed out in the review, these performance metrics can be either model-based or model-free, and should not be confused with the use of a (geostatistical) model of the measured variable.
If the optimisation loop has not converged, a new set of CML is selected (6). Optimisation algorithms may drive the search for the new potential CML (Pardo-Igúzquiza, 1998; Kollat et al., 2008, 2011; Alfonso, 2010). The decision about adequate performance should not only consider the expected performance of the network, but also recognise the effect of a limited number of sensors.
Once the performance is optimal, an iteration over the number of sensors is required. If the scenario is for network augmentation (7), then a possibility of including additional sensors has to be considered (8). The decision to go for an additional sensor will depend on the constraints of the problem, such as a limitation on the number of sensors to install, or on the marginal improvement of performance metrics.
The network reduction scenario (9) is inverse: for diverse reasons, mainly of a financial nature, networks require fewer sensors. Therefore, the analysis concerns which sensors to remove from the network, within the problem constraints (10).
Finally, the sensor network is selected (11) from the results of the optimisation loop, with the adequate number of sensors. It is worth mentioning that an extra loop is required for re-evaluation, typically on a periodic basis, when the objectives of the network may be redefined, new processes need to be monitored, or information from other sources becomes available, any of which can potentially modify the definition of optimality.
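The optimisation and sensor-budget loops above can be sketched as a greedy placement procedure. The performance function below (mean distance from every candidate location to its nearest sensor) is only a placeholder standing in for any of the reviewed criteria (Kriging variance, joint entropy, model error, etc.); the candidate grid and sensor budget are likewise illustrative.

```python
import numpy as np

# Candidate measurement locations (CML): a hypothetical 5 x 5 grid.
cml = [(x, y) for x in range(5) for y in range(5)]

def network_performance(sensors):
    """Placeholder performance metric: mean distance from every CML to
    its nearest sensor (lower is better). A real design would plug in
    any of the criteria reviewed above (steps 3-4 in Fig. 6)."""
    return float(np.mean([
        min(np.hypot(cx - sx, cy - sy) for sx, sy in sensors)
        for cx, cy in cml
    ]))

def design_network(n_sensors):
    """Greedy placement: repeatedly add the CML that most improves the
    metric (steps 3-6), stopping at the sensor budget (steps 7-8)."""
    network = []
    for _ in range(n_sensors):
        best = min(
            (c for c in cml if c not in network),
            key=lambda c: network_performance(network + [c]),
        )
        network.append(best)  # step 6: accept the best new CML
    return network

net = design_network(3)
print("selected sensors:", net)
print("performance:", round(network_performance(net), 2))
```

A network reduction scenario would run the same loop in reverse, greedily removing the sensor whose loss degrades the metric least.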
This paper summarised some of the methodological criteria for the design of sensor networks in the context of hydrological modelling, proposed a framework for classifying the approaches in the existing literature, and also proposed a general procedure for sensor network design. The following conclusions can be drawn.
Most of the sensor network methodologies aim to minimise the uncertainty of the variable of interest at ungauged locations, and the way this uncertainty is estimated varies between methods. In statistics-based methods, the objective is usually to minimise the overall uncertainty of precipitation fields or the discharge modelling error. Information theory-based methods aim to place measurements at locations with maximum information content and minimum redundancy. Network theory-based methods do not estimate this uncertainty directly; instead, the topology of the network (clustering, path length, degree distribution) is used as a proxy for its redundancy and coverage. In methods based on practical case-specific considerations and value of information, the critical consequences of decisions dictate the network configuration.
However, in spite of the underlying resemblances between methods, different formulations of the design problem can lead to rather different solutions. This gap has not been covered in depth in the literature, and therefore reaching general agreement on a sensor network design procedure remains an open issue.
In particular, for catchment modelling, the driving criteria should also consider model performance. This criterion ensures that the model adequately represents the states and processes of the catchment, reducing model uncertainty and leading to more informed decisions. Currently, most network design methods do not ensure minimum modelling error, as often it is not the main performance criterion for design.
Furthermore, in recent years, the rise of various sensing technologies in operational environments has promoted the inclusion of additional design considerations towards a unified heterogeneous sensor network. These new sensing technologies include, e.g., passive and active remote sensing using radars and satellites (Thenkabail, 2015), microwave links (Overeem et al., 2011), mobile sensors (Haberlandt and Sester, 2010; Dahm et al., 2014), crowdsourcing, and citizen observatories (Huwald et al., 2013; Lanfranchi et al., 2014; Alfonso et al., 2015). These non-conventional information sources have the potential to complement conventional networks by exploiting the synergies between, and reducing the limitations of, the various sensing techniques; at the same time, they require new network design methods capable of handling heterogeneous, dynamic data with varying uncertainty.
The proposed classification of the available network design methods was used to develop a general framework for network design. Different design scenarios, namely relocation, augmentation, and reduction of networks, are included for measurement-based methods. This framework is open and offers “placeholders” for various methods to be used depending on the problem type.
Concerning further research, from the hydrological modelling perspective we propose directing efforts towards the joint design of precipitation and discharge sensor networks. Hydrological models use precipitation data to provide discharge estimates; however, as these simulations are error-prone, the assimilation of discharge data, or error correction, reduces the systematic errors in the model results. The joint design of both precipitation and discharge sensor networks may help to provide more reliable estimates of discharge at specific locations.
Another direction of research may include methods for designing dynamic sensor networks, given the increasing availability of low-cost sensors, as well as the expansion of citizen-based data collection initiatives (crowdsourcing). These information sources have been on the rise in recent years, and one may foresee the appearance of interconnected, multi-sensor heterogeneous sensor networks in the near future.
The presented review has also shown that limited effort has been devoted to considering changes in long-term patterns of the measured variable in sensor network design. The commonly made assumption of stationarity has become more problematic in recent years due to new sensing technologies and increased systemic uncertainties, e.g. from climate and land use change and rapidly changing weather patterns. Although this topic has been recognised for quite some time (see e.g. Nemec and Askew, 1986), the number of publications presenting effective methods to deal with it is still limited. This problem, and the techniques to solve it, are being addressed in ongoing research.
No data sets were used in this article.
The authors declare that they have no conflict of interest.
We would like to thank Joanne Craven for the editing support in the final stages of this article, and the three anonymous referees whose comments greatly helped us to improve this document to its current form. Edited by: Laurent Pfister Reviewed by: three anonymous referees