Development of a method of robust rain gauge network optimization based on intensity-duration-frequency results

Based on rainfall intensity-duration-frequency (IDF) curves, fitted in several locations of a given area, a robust optimization approach is proposed to identify the best locations to install new rain gauges. The advantage of robust optimization is that the resulting design solutions yield networks which behave acceptably under hydrological variability. Robust optimization can overcome the problem of selecting representative rainfall events when building the optimization process. This paper reports an original approach based on Montana IDF model parameters. The latter are assumed to be geostatistical variables, and their spatial interdependence is taken into account through the adoption of crossvariograms in the kriging process. The problem of optimally locating a fixed number of new monitoring stations based on an existing rain gauge network is addressed. The objective function is based on the mean spatial kriging variance and rainfall variogram structure using a variance-reduction method. Hydrological variability was taken into account by considering and implementing several return periods to define the robust objective function. Variance minimization is performed using a simulated annealing algorithm. In addition, knowledge of the time horizon is needed for the computation of the robust objective function. A shortand a longterm horizon were studied, and optimal networks are identified for each. The method developed is applied to north Tunisia (area = 21 000 km 2). Data inputs for the variogram analysis were IDF curves provided by the hydrological bureau and available for 14 tipping bucket type rain gauges. The recording period was from 1962 to 2001, depending on the station. The study concerns an imaginary network augmentation based on the network configuration in 1973, which is a very significant year in Tunisia because there was an exceptional regional flood event in March 1973. This network consisted of 13 stations and did not meet World Meteorological Organization (WMO) recommendations for the minimum spatial density. Therefore, it is proposed to augment it by 25, 50, 100 and 160 % virtually, which is the rate that would meet WMO requirements. Results suggest that for a given augmentation robust networks remain stable overall for the two time horizons.


Introduction
Rain gauge monitoring networks are highly important to estimate precipitation and erosion, and to evaluate runoff.In fact, rainfall intensity is the most important variable in runoff and erosion impact prediction.More generally, rainfall network accuracy depends on precipitation variability as well as on the network size and design.Several optimization approaches have been proposed in the literature since the early work of Bras and Rodríguez-Iturbe (1976) and Delhomme (1978), who proposed a methodology of network design based on the minimization of the mean areal kriging error variance.The adoption of geostatistical methods for rainfall network sizing and augmentation was also performed by Pardo-Igúzquiza (1998).In Delhomme (1978), the optimal location of rain gauges was identified using a technique called the fictitious point method while in Pardo-Igúzquiza (1998) an automatic optimization technique (namely simulated annealing) was adopted.Barca et al. (2008) provided a methodology for assessing the Published by Copernicus Publications on behalf of the European Geosciences Union.
optimal location of new monitoring stations within an existing rain gauge monitoring network.The methodology used geostatistics and probabilistic techniques (simulated annealing) combined with GIS.A method composed of kriging and entropy that can determine the optimum number and spatial distribution of rain gauge stations in catchments was proposed in Chen et al. (2008).Chebbi et al. (2011) have considered mono-objective criteria assuming 1 h rainfall intensity interpolation and erosivity factor interpolation and using one single extreme rainfall event to conduct the analysis.Rainfall quantities retained in previous studies were mainly taken in a deterministic way.Effectively, a single rainfall pattern was selected for which the average kriging variance was minimized to achieve the best new rain gauge locations (Delhomme, 1978;Pardo-Igúzquiza, 1998;Chebbi et al., 2011).
In the present study, it is aimed to find out new observation locations using a collection of rainfall patterns or rainfall auxiliary variables, each characterized by its probability of occurrence.Because robust optimization is an approach which can deal with the uncertainty in optimization problems by computing a solution that can cope with possible different scenarios (Mulvey et al., 1995;Bai et al., 1997;Beyer and Sendhoff, 2007), we claim that a robust network augmentation framework is proposed here.
Regarding water related problems, the literature contains a few applications of robust optimization techniques.Watkins and McKinney (1997) proposed two problems to illustrate the suitability of robust optimization in the resolution of water resources problems.The first is a problem of urban transfer of water vis-à-vis the availability of water and the need to consider water supply as random variable for decision making.The second problem relates to the management of the quality of subsoil waters, considering the uncertainty of the aquifer parameters.Ricciardi et al. (2007) considered a similar question in the context of aquifer remediation.Afonso and Cunha (2007) developed a robust model to design biological reactors and secondary settling tanks in wastewater treatment plants.Cunha and Sousa (2010) presented models for the robust design of water distribution networks to enable them to face the uncertainty of network working conditions under extreme events.Zeferino et al. (2012) have recently proposed a robust optimization model for the sitting and sizing of wastewater treatment plants at regional level that includes uncertainty issues associated with river flows.Accidents such as broken conduits or tanks and change in demand may affect how water distribution functions.Our study proposes to apply these approaches to decide on rain gauge network development.The problem of the best rain gauge location is addressed.North Tunisia is the study domain (area = 21 000 km 2 ).Rainfall intensity at a given location is considered as a random variable.Local intensityduration-frequency curves are assumed to reflect the hydrological variability.Section 2 presents the method used in this paper.Section 3 presents the case study and the avail-able data.Section 4 sets out the results obtained, and some concluding remarks are presented in Sect. 5.

Definition of candidate stations
The main purpose of the rain gauge selection algorithm is to identify an optimal set of locations for a particular number of stations over the study area.The domain is in the Mediterranean area, and elevation ranges from 0 to 1281 m.According to WMO (1994, Sect. 20), the minimum recommended density is 1 station per 600 km 2 for Mediterranean plain areas in difficult conditions.This means, for instance, places where gauges are difficult to install and maintain, perhaps because of rugged topography or site inaccessibility.The initial network we consider was in operation at the time of the March 1973 flood event, which is why we took this as the point of departure of our problem.It was composed of 13 sparsely distributed stations, which was wholly inadequate to cover the rainfall variability over the study area.This study has a methodological character, and so various scenarios are simulated where the number of stations of the initial network is increased by 25 % (scenario 1), 50 % (scenario 2), 100 % (scenario 3) and 160 % (scenario 4).Moss and Tasker (1991) suggested that the number of candidate stations should be at least three times the number of the desired optimal stations.Accordingly, for scenarios 1 to 3, 40 candidate stations are assumed.In scenario 4, which achieves WMO-required density, the number of candidate stations is increased to 60.These 60 candidate stations, which contain the 40 candidate stations considered previously, are imaginary locations of rain gauges, equally distributed over the study area.

The IDF database
The search for "robust" solutions is an adequate method to solve the problem as it takes into account the hydrological uncertainty inherent to rainfall occurrence.Since we are interested in short duration rainfall, the maximum rainfall intensity for specified durations is the variable to be studied in this paper.To deal with hydrological risk inherent to rainfall data, we would need data on the maximum rainfall intensities recorded for several events.However, the problem is that we do not have this type of information for the study area.Thus, the adjustment parameters of the intensity-duration-frequency (IDF) curves (Koutsoyiannis et al., 1998) are proposed as alternative or auxiliary variables (parameters a(T ) and b(T ) of Eq. 1).An existing IDF study performed by DGRE-ST2i (2007) is taken as basis for this.The following times of reference were considered in DGRE-ST2i (2007): 5, 10, 15, 30, 45,. . . , 180 min.For stations having short observation periods (3 to 10 yr) without gaps, DGRE-ST2i (2007) selected thresholds (generally one threshold per time of reference), identified using various tests, and adopted the intensities that were greater than the fixed threshold, to constitute the time series.The peak over threshold approach was adopted by DGRE-ST2i (2007) for the rain gauges characterized by a large number of gaps in the rainy months, even in the case where the number of observation years exceeded 10.For the rain gauges characterized by recordings without gaps and observed over long periods, DGRE-ST2i (2007) considered the M highest values observed in M years to achieve the statistical analysis.
The Montana model, which predicts maximum rainfall intensity over duration t for the return period T as a power function of the duration (Eq. 1) (Burlando and Rosso, 1996), was adopted by DGRE-ST2i (2007).The estimated model parameters a(T )and b(T ) are reported.
where T is the return period (in years).Here, we adopt the hydrological risk definition where the risk is p = 1/T (Bobée and Ashkar, 1993) so that we may consider that T reflects the hydrological risk, t is rainfall duration in minutes, and a(T ) and b(T ) are Montana IDF model parameters.

Geostatistical framework for IDF parameters
In this study a(T ) and b(T ) are taken as geostatistical variables (Matheron, 1965).In fact, it is assumed that it is possible to represent the a(T ) and b(T ) spatial structures through variogram functions.Furthermore, the analysis is made possible by the fact that the two parameters a(T ) and b(T ) are known at the same experimental points.Further, we first test whether a(T ) and b(T ) are dependent or independent.For this, their experimental cross-variogram γ a(T )b(T ) (Chilès and Delfiner, 1999) is examined (Eq.2).The experimental cross-variogram is defined as the variance of the difference between two variables of different types or attributes at two locations.
where N (h) represents the number of sample points separated by interdistance h, and x i and x i + h are sampling locations separated by interdistance h.
For the cross-correlation analysis, it is recommended to adopt the co-dispersion coefficient graph r a(T )b(T )(h) (Matheron, 1965), which is linked to the cross-variogram and to the direct variograms by where γ a(T )a(T ) and γ b(T )b(T ) are the direct variograms respectively of a(T ) and b(T ).
Generally, the direct variogram function is a key tool to quantify the variability associated with the regionalized variable, Z(the variable Z is set as a(T ) or b(T )).The experimental semivariogram, γ (h), is calculated from the data as a function of the point separation, h, and is given by where N (h) is the number of sample points separated by h, x i and x i + h are sampling locations separated by a distance h, and Z(x i ) and Z(x i + h) are values of the variable Z measured at the corresponding locations.
The sample variogram is fitted to a variogram model.For a stationary regionalized variable, the variogram is characterized by three main parameters: range, sill, and nugget."Range" is the distance at which measurements cease to be correlated with each other."Sill" is the variogram value at and beyond the range distance.The "nugget" effect is the random component of the digital values, graphically expressed by the discontinuity of the variogram at the origin.A parametric approach is used to derive the variogram model.Without loss of generality, further developments are given for the example of the spherical model in Eq. ( 5), which is a bounded variogram: 3 . (5) The sill s(δ) represents the highest variance for a large data point distance, and, for the spherical model, the range r(δ) refers to the distance over which the data are correlated.The crossed and direct variograms are estimated from the pairs (a(T ), b(T )) estimated at each observed location for the various durations (t = 5, 10, 15, 20, 30, 45, 60, 120 and 180 min).
It is assumed that if the graph r a(T )b(T ) (h) is constant, it may be concluded that the parameters a(T ) and b(T ) are dependent (Chilès et al., 1991).
Furthermore, network optimization is achieved by adopting an objective function to be maximized (or minimized).The quality of a(T ) spatial interpolation is selected as a target, and kriging is adopted as the interpolation method.Effectively, advantage is taken of the fact that kriging is accompanied by the estimation of the variance of estimation error, which is an indicator of interpolation accuracy.Also, an advantage of kriging is that kriging variance error can be simulated using imaginary networks (with no observed data at the krigged location).We therefore used kriging methods to define the objective function.Hence, the minimization of the average (spatial mean over the study domain) kriging error of a(T ) is the specific approach proposed here.Six return periods (T = 2, 5, 10, 20, 50, 100 yr) covering a broad panoply of risk situations are considered.
As a(T ) and b(T ) are not independent, it is proposed to develop the kriging estimate of a(T ) using b(T ) as information.External drift kriging (EDK) is a suitable method to achieve this goal.In fact, in the case of a sparse network, kriging with external drift seems more appropriate than ordinary kriging.EDK requires knowledge of the values of b(T ) at the locations where a(T ) is to be interpolated.This is achieved by first using an ordinary kriging approach to b(T ).Therefore, a map of b(T ) is produced that is then adopted to interpolate a(T ).The kriging systems are set out in Appendix A.

The objective function
The decision model presented here is built within the framework of robust optimization and is inspired by the case studies reported in Mulvey et al. (1995).The objective function is written using the concept of regret by considering a quadratic term expressing the difference, for each scenario, between the values of the standardized mean spatial kriging variance achieved by the solution to be implemented and by the optimal solution for the scenario.This means that the optimal solution for the model proposed will be solution-robust (Laguna, 1998).As such, the optimal solution obtained will be "close" to the optimum for any of the realized scenarios ensuring the optimality robustness.The robust optimization method adopted requires knowledge of a time horizon.For a fixed hydrologic risk p = 1/T , where T is the return period, we express the probability of an overrun of the event of return period T during time horizon of duration N. It corresponds to where N is the number of years in the horizon.Two time horizons are successively considered in the present study: the short term (N = 5 yr) and the long term (N = 30 yr). u(T ) is further scaled by dividing it by the sum of u(T ) over the various return periods.
where N T is the number of return periods; this means the number of scenarios considered in the study.
To evaluate the mean spatial kriging error variance over the study domain, a grid mesh with a resolution of 4 km was used.The optimization problem consists of minimizing the objective function expressed by where (Eq. 7), with S ref (T = T i ) being the value of the standardized mean spatial kriging variance obtained for every return period T i independently of the other return periods.It is taken as reference.
In addition, standardization of the mean spatial kriging variance is obtained by using the interquartile range of a(T ) kriging error variance map: where (σ 2 i(a(T =T i ))) is the variance of kriging errors of a(T = T i ) at the computing node i depending on locations of stations, and n is the number of grid nodes. (10) is the 75th percentile of the pattern of the variance of kriging errors of a(T = T i ).
σ 2 25 %(a(T =T i )) is the 25th percentile of the pattern of the variance of kriging errors of a(T = T i ).
This objective function is subjected to domain constraints expressed by the set of possible locations for the stations (as such the solution space is defined).

Case study and data
The study area is composed of two watersheds: the north coast watershed (BV 3) and the Medjerda watershed (BV 5) in Tunisia.The study area is characterized by a sub-humid to semi-arid climate and covers more than 20 000 km 2 .Figure 1 shows the study area together with the reference rain gauges of the DGRE-ST2i study composed of 14 stations.It also shows the initial network consisting of the 13 stations functioning in March 1973 during the extreme flood event.Candidate locations for composing the robust network are also marked.Table 1 presents the rain gauges from the DGRE-ST2i study.The Montana IDF model parameters estimated by DGRE-ST2i (2007) are presented in Table 2 for the 14 stations.The size of the time series analysed in DGRE-ST2i ranges from 10 to 33 yr (Table 1).The observations periods range from 1962 to 2001, depending on the station.

Dependence of the parameters a(T ) and b(T )
The co-dispersion coefficient graph r a(T )b(T ) (h) is plotted in Fig. 2.An erratic fluctuation of r a(T )b(T )(h) around a fixed value can clearly be seen.Therefore, r a(T )b(T ) (h) can be assumed as nearly constant for all T allowing the supposition that a(T ) and b(T ) are dependent variables.This authorizes the kriging of a(T ) by taking b(T ) as external drift.

Structural analysis of the parameters a(T ) and b(T )
Spherical variogram models are adjusted to sample variograms.The ranges and sills were identified manually, and an attempt was made to take account of a good approximation of the first points of the sample variograms (short interdistances) as well as a good approximation of high inter- distances.Table 3 reports the adjusted sill and range for every return period.Adjusted ranges of a(T ) and b(T ) extend from 30 to 50 km.Sample and adjusted variograms of a(T ) and b(T ) are plotted in Fig. 3. Instead of using all the return periods previously studied, it is proposed to adopt representative return periods.In effect, the analysis highlights three groups of return periods where the variograms are found to be quite similar: group 1 including only {T = 2 yr}; group 2 including {T = 5, 10, 20 yr} and group 3{T = 50, 100 yr}.Hence, T 1 = 2 yr (normal situation), T 2 = 5 yr (small risk) and T 3 = 50 yr (high risk) were selected as representative of each group.

Comparison of the resulting robust networks
We first calculated the probability of overrun of the event u(T ) during the time horizon (Eq.6) and thus the associated weight ω(T ) (Eq. 7).Table 4 summarizes the values of u(T ) and ω(T ) for each return period for the two horizons.In the case of the short-term time horizon, we find that T = 50 yr is nearly neglected, and there is more focus on T = 2 yr in the weighting system.For the long-term time horizon, the method assigns equal weights to normal and moderate risk (2 and 5 yr) while more weight is assumed for high risk situations (T = 50 yr).Therefore, these findings are consistent with the intuitive point of view.Table 5 reports the values of the standardized variances OF ref (T ), which obviously decreased as the network size increased (Table 5).It was also found that the most improvement is obtained when one splits from scenario 1 (three new stations to implement) to scenario 2 (six new stations to implement).The comparison of the resulting robust networks obtained in turn for the two time horizons (short term and long term) is reported in Table 6, where the locations which are different for the two horizons are in italic.It shows that optimal networks are similar for scenario 1.This is an important result: the locations of the 3 new stations chosen from 40 candidates are independent of the time horizon.This kind of network reinforcing is needed, without doubt.For scenario 2, which consists of adding six new stations chosen from 40 candidates, the two robust networks differ only by one station, for both time horizons.This also indicates good accuracy of the result.Five new stations can be added, and they are equally representative for the short-and long-term perspectives.For scenario 3, which requires 13 new locations to be found (out of 40 candidates), the two robust networks obtained in turn for the short-and long-term horizons differ only by 2 stations.Hence, as many as 11 new locations are common to the two time horizon perspectives, which is a very encouraging result from a decision making point of view.However, using 60 candidates for scenario 4, the short and long-term visions differ by 7 stations out of 21 (Fig. 4), which may be seen as problematic by a network manager.In a previous paper (Chebbi et al., 2011), mono-objective criteria have been considered assuming 1 h rainfall intensity interpolation and erosivity factor interpolation and using one single extreme rainfall event to conduct the analysis.The comparison of previous results with the present study highlights that the mean spatial kriging variance in the case of the mono-objective criterion is lower than or equal to that obtained in the case of the robust optimization.Nevertheless, the essential advantage of the robust optimization lies in the fact that it allows overcoming the problem of using one single rainfall event and yields networks which work "adequately", when considering various extreme events with different return periods.

Conclusions
The robust optimization approaches are recommended in cases where the variables of interest are uncertain .The hydrological risk is considered in the present study, which aims to find new observation locations for rain gauges for recording short duration rainfall.The novel approach consists of considering IDF curve parameters to locate the best sites for installing imaginary new rain gauges.The method assumes a time horizon to minimize an objective functionbased IDF parameter a(T ) of the Montana model, considering the weighting of various return periods T .Kriging interpolation using the variance-reduction method was applied to build the objective function using the variance error of a(T ) estimation.The weighted mean spatial error variance was considered.
The region of north Tunisia, which has a sub-humid to semi-arid climate, was used to develop the methodology.The comparison of the resulting robust networks considered three return periods (T 1 = 2 yr, T 2 = 5 yr and T 3 = 50 yr), representing normal to high risk situations and two different time horizons (short term = 5 yr, and long term = 30 yr).It is suggested that robust networks are quite similar to each other when selecting 3, 6 and 13 locations from 40 candidates to reinforce an initial network of 13 stations covering an area of 21 000 km 2 .The conclusions are quite different when selecting 21 new stations: only 14 out of 21 stations, assuming 60 candidate locations, are common to the two time horizons.Therefore, the conclusion that may be drawn is that when the size of the new network is augmented, it is more difficult to obtain a unique robust solution.Further research topics aim to develop entropy approaches to define the robust objective function.

Table 3 .
Adjusted sills and ranges for a(T ) and b(T ) assuming spherical variogram models.

Table 5 .
Values of OF ref (T ) objective function obtained for interpolation of a(T ).