A method for low-flow estimation at ungauged sites : a case study in Wallonia ( Belgium )

Well-integrated water management can notably require estimating low flows at any point of a river. Depending on the management practice, it can be needed for various return periods. This is seldom addressed in the literature. This paper shows the development of a full analysis chain including quality analysis of gauging stations, low-flow frequency analysis, and building of a global model to assess low-flow indices on the basis of catchment physical parameters. The most common distributions that fit low-flow data in Wallonia were two-parameter lognormal and gamma. The recession coefficient and percolation were the most explanatory variables, regardless of the return period. The determination coefficients of the models ranged from 0.51 to 0.67 for calibration and from 0.61 to 0.80 for validation. The regression coefficients were found to be linked to the return period. This was used to design a complete equation that gives the low-flow index based on physical parameters and the desired return period (in a 5 to 50 yr range). The interest of regionalisation and the development of regional models are also discussed. Four homogeneous regions are identified, but to date the global model remains more robust due to the limited number of 20-yr-long gauging stations. This should be reconsidered in the future when enough data will be available.


Introduction
It is now recognised that river low flows can lead to severe consequences in water quality and river ecological status (Whitehead et al., 2009).Navigation and power supply sectors can also be affected by low flows (Middelkoop et al., 2001).Furthermore, as pressures on rivers become more important during low flows, some conflicts between the different water uses can arise, especially between instream water use and water abstraction demand (Hébert et al., 2003).Water managers therefore need to be able to quantify low flows at any point of a river, in magnitude as well as in frequency.
Low flows can have different meanings depending on the definitions of authors.In this study, low flows are considered as the lowest discharge values observed in a river, which usually occur between May and November in Wallonia (Belgium).The index chosen to characterise low flows is MAM7 which stands for mean annual minimum flow on a 7-day average basis.
In gauged catchments, the value of MAM7 is calculated on a yearly basis.Then, a distribution can be fitted to the data in order to find a relationship between low flows and probabilities of non-exceedance (1/T ).The distribution that best fits is chosen according to statistically and graphically based tests, and its parameters can be calculated by different methods (Smakhtin, 2001).Matalas (1963), Joseph (1970), Condie and Nix (1975), Tasker (1987), Leppärjärvi (1989) and Yue and Pilon (2005), amongst many others, have compared various distributions and methods to estimate parameters.The best-adjusted distribution is usually different according to the study region and the low-flow index (Abi-Zeid and Bobée, 1999).In other methods, the understanding of the generating processes of low flows contributes to the choice of distribution (Pacheco et al., 2006).
Low-flow calculation and frequency analysis are easy to handle for long-time gauged catchments.For ungauged catchments, however, low-flow index has to be inferred using neighbouring gauged catchment data.
Regarding the estimation of low flows in ungauged river catchments, Smakhtin (2001) reviewed all possible techniques but cited regional regression approach as the most widely used.The other methods use time-series simulation, regional prediction curves, or spatial interpolation.Regional regression consists in delineating hydrologically homogeneous regions based on catchment characteristics and developing, for each region, a regression model relating the lowflow index to these characteristics.As this approach is based on physical parameters of catchments, it allows a better understanding of low flows from a physical point of view, in addition to estimating them.This technique has been employed by numerous scientists, e.g.Nathan and McMahon (1992) for south-eastern Australia, Laaha and Blöschl (2006) for Austria, Vezza et al. (2010) for north-western Italy, and Tsakiris et al. (2011) for the state of Massachusetts in the USA.However, only a few of them included the return period in their analysis.
The temporal and spatial components of low-flow hydrology such as frequency and regional distribution, respectively, are closely related according to Smakhtin (2001), but articles that deal with both at the same time are rare.Some of them have aimed at assessing low flow for fixed return periods.Kim and Lee (2010) used Bayesian multiple regression analysis for regional low-flow frequency in south-east Korea and utilised the developed model to predict 7-day minimum flow at ungauged sites for a fixed 10 yr return period.Comparing to conventional techniques based on a t distribution, the Bayesian analysis resulted in lower confidence intervals for parameter estimates.After a low-flow frequency analysis comparing nine distributions, Ouarda and Shu (2009) developed models to estimate low flows in Quebec, employing single and ensemble artificial neural networks, for three different return periods and taking into account seven physiographical and meteorological variables.In Virginia, Hayes (1991) used the Pearson type III distribution for frequency analysis.Then, he developed regression models to estimate the annual minimum average 7-consecutive-day discharge from basin and climate characteristics at ungauged sites for two different return periods.Saravi et al. (2010) applied the regional regression approach for frequency analysis of annual peak maximum series of flood flows and also used the models to estimate flood quantiles at ungauged sites for several return periods.They considered seven climatic and physical catchment characteristics.
The only study which aimed at assessing a low-flow index for a range of return period is that of Chen et al. (2006).They performed a frequency analysis of low flows comparing five distributions and using L moments for the Dongjiang basin in South China.The quantile function of the 3-parameter lognormal distribution, which was the most appropriate distribution for the study data set, was then integrated into this regression model in order to establish the formula of the mean 7-day low flow for any return period.But they used a limited data set of 14 sites and used the catchment area as the only characteristic to assess the mean 7-day low flow.
To date, there is no mean to estimate low-flow indices in ungauged rivers for a range of return periods, on the basis of a set of meaningful physical parameters.Hence, in the case of droughts, different measures to maintain a minimum flow in rivers could need this information (e.g."Plan d'Action Sécheresse" in France, drought plans in the UK).
The aim of this study is to propose a complete method to build a model that can estimate low flows for any return period in ungauged rivers.We propose a full analysis chain: a selection of quality gauging stations and a low-flow frequency analysis with comparison of different distributions are first carried out, followed by the development of a regression model that uses physical parameters of the catchments.We aim at obtaining a formula that can be used for any return period between 5 and 50 yr.To do so, we propose a new approach which evaluates the relationships between regression coefficients and the return period.The regionalisation of low flows is also tested in order to improve the model performances.

Study area
The Walloon Region of Belgium covers an area of 16 844 km 2 .The two main catchments crossing this region are the Meuse (70 % of the area of Wallonia) and the Scheldt (20 % of Wallonia) catchments.Wallonia is characterised by a high number of small basins (70 % of gauged catchments are smaller than 200 km 2 ).

Choice of low-flow index
The mean annual minimum of 7-day average flows (MAM7), which is one of the most widely used indices, was selected for this study.The main advantages of this parameter are that it eliminates day-to-day variations and allows analyses to be less sensitive to measurement errors.Moreover, 7-day low flows are not very different from 1-day low flows (Smakhtin, 2001).Averaging over some days also allows smoothing out some human influences on flows such as variation of hourly flows due to hydropeaking and little abstraction from farmers.

Selection of gauging stations
More than 240 gauging stations have been installed in Wallonia during the last 40 yr.From those, we selected the stations that fulfilled the following criteria: -minimum of 20 yr of data (Laaha and Blöschl, 2005); Finally, 59 gauging stations were selected.The main rejection reason was a too short chronicle.

Frequency analysis
The data used for this analysis are the annual minimum 7day average flow series that we abbreviated to AM7 and correspond to 7Q or Q 7 , which is generally used in the USA for return periods of 2 and 10 yr (7Q2 and 7Q10 or Q 7,2 and Q 7,10 ) (Smakhtin, 2001;Hayes, 1991;Vogel and Kroll, 1989).We performed a frequency analysis in order to predict AM7 for return periods (T ) of 5, 10, 20 and 50 yr (AM7 T ).As the length of available data is quite short (maximum of 45 yr), AM7 T cannot be predicted with accuracy for a return period higher than 50 yr.
Parameters of all distribution laws were estimated using the maximum likelihood procedure.Indeed, this method provides asymptotically minimum variance estimates, is adapted to all distributions and to low flows, and has given good results in other studies (Joseph, 1970;Condie and Nix, 1975;Landwehr et al., 1979;Leppärjärvi, 1989;Nathan and McMahon, 1990).
These distributions, except Fréchet, were ordered by increasing posterior probability, and decreasing Akaike's information criterion (AIC) and Bayesian information criterion (BIC).Posterior probabilities were calculated from prior probabilities and Bayes factors.These factors were approximated via Schwarz's method (Schwarz, 1978).As no a priori information on the suitability of each law was available, prior probabilities were considered equal for all distribution laws.AIC and BIC both take into account the likelihood function and the number of parameters, but BIC also considers the size of the sample.Therefore, 2-parameter distributions were favoured by this ranking.
The statistical χ 2 test was also carried out to check the adjustment of each distribution to the sample.
Finally, the Fréchet distribution and the three best distributions for which the χ 2 test hypothesis was accepted were compared graphically.The graph showed observed data (AM7) in function of probabilities of non-exceedance as well as frequency curves for the four distributions.If two different distributions gave the same fit, the simpler one with less parameters was selected (Miquel, 1984).Once the bestfitted distribution was chosen, AM7 T was estimated for the four return periods, along with their 95 % confidence interval (CI).
The selection of the best distribution, except for the Fréchet distribution, was performed using HYFRAN (HYdrological FRequency ANalysis) software which was created, for the purpose of fitting statistical laws, by B. Bobée from the National Institute for Scientific Research -Water, Soil, Environment Centre (University of Quebec) (El Adlouni et al., 2008).

Development of a global regression model
It is acknowledged that catchments which have similar physical and climatic features have similar hydrological responses (Smakhtin, 2001).Firstly, we developed a regression model for the whole study area to estimate AM7 from catchment characteristics.Secondly, we tested if homogeneous regions can be identified in the area, using regionalisation techniques.In these regions, we developed regional models and compared their performance with the global model.
In this study, climatic and physical catchment characteristics were described by the 25 variables presented in Table 1."Summer" refers to the July to September period and "winter" refers to the October to April period.Position data (Alt, X and Y) were measured with a GPS or by levelling.Catchment boundaries were defined using a digital terrain model (DTM).Catchment features (A, DD, Sls, Ls and Ss) were derived from GIS maps.The hydrological type of soil describes the infiltration rate (high for A to very low for D) and drainage (excellent for A to very bad for D) of soils (Demarcin et al., 2011).Meteorological data (AP, SP, WP, ST) were interpolated for each catchment by the hydrological model EPICgrid (Sohier et al., 2009) from meteorological data measured at some locations in Wallonia, by means of the Thiessen polygon method.PET was also computed by the model, using the Penman equation.This model is actually able to simulate the atmosphere-soil-plant continuum, using soil, geology, land use, agricultural practices, topography (DTM) and meteorological data.Percolation is defined as the quantity of rainfall that reaches deeper soil layers.It was estimated for each catchment using a "capacitive" approach: each soil layer is considered as a tank that empties Table 1.The 25 physical and climatic variables considered in this study to describe catchments, along with their abbreviation and unit."Summer" refers to the July to September period and "winter" refers to the October to April period.when the layer water content is greater than field capacity (Sohier et al., 2009).

Variable
Recession is the part of stream flow in which discharge depletes gradually and there is no rainfall or human influence (Dacharry, 1997;Tallaksen, 1995).The recession coefficient is the parameter of the exponential model that describes the recession process.It was calculated for each catchment, using the method developed by Lang and Gille (2006).Recession periods were first defined according to flow and precipitation thresholds, and by removing overland flow influence.Thresholds were adapted to Wallonia.A mean or master recession curve was then constructed using a technique based on the correlation method.The recession coefficient is the parameter of this exponential curve.
Since catchment area was highly correlated to AM7 T (correlation coefficient of 0.72 for T = 5 yr), we used specific flows (AM7 T divided by the area) as the dependent variable.The explanatory variables were the 24 other climatic and physical catchment characteristics.
Out of these 24 variables, it was necessary to select a few of them that were not correlated to each other and that could explain the most variability of specific AM7 T .For this purpose, we used three different methods: stepwise, maximum R 2 improvement and adjusted R 2 selection.Regression coefficients were estimated using the ordinary least squares technique.As the second method gives the best model for each number of variables (p), the one with the Mallows coefficient (C p ) close to p + 1 was selected (Mallows, 1995).For the two last methods, non-significant variables (p value > 0.05) were removed one by one from the model.Also, variables with a variance inflation factor (VIF) above 10 were deleted because this indicates a multicollinearity problem (Confais and Le Guen, 2006;Vezza et al., 2010).
The normality and the equality of variance of residuals were evaluated graphically: Q-Q plot for normality and residuals-predicted AM7 T plot for the equality of variance (residuals must be around 0) (Confais and Le Guen, 2006;Vezza et al., 2010).
The model was calibrated using the variables related to the catchments of the 59 selected gauging stations.The validation of the models was performed using another data set associated with 19 stations which met all criteria, but their record length was between 15 and 20 yr.The validation sample was representative of the study area as these stations are spread over Wallonia.
In order to compare the performance of the models obtained by the three methods, R 2 , adjusted R 2 and RMSE (root-mean-square error) were calculated for each calibration and validation according to the following formulae (Laaha and Blöschl, 2006;Vezza et al., 2010): in which var (Y ) is the variance of observed AM7 T , n is the number of observations, p is the number of variables in the model, Y i is the observed value of AM7 T for the observation i, and Ŷi is the predicted value.R 2 gives the part of the variability of AM7 T explained by the model.R 2 can be adjusted for the number of explanatory variables in the model, which is useful to compare different models.When a variable is added to the model, unlike R 2 , the adjusted R 2 increases only if the new variable has additional predictive capability.RMSE quantifies the difference between observed and predicted AM7 T .
To evaluate the uncertainty of the model, the prediction interval can be calculated.It represents the confidence interval of predicted values of AM7, i.e. plausible values of AM7 for given values of explanatory variables.It takes into account the errors associated with the calibration of the model and with the data used to calculate the predicted value.The 100(1-α) percent confidence interval for the predicted value Ŷp is obtained as follows (Dagnélie, 2006): in which t 1−α/2 is the tabulated t quantile from the Student distribution with degrees of freedom equal to n-p-1, s is the estimate of the residual variance (the mean squared error (MSE)), -X is the matrix of the observations used to calibrate the model, and -X p is the vector of observations used to calculate the predicted value.
All statistical analyses were performed using SAS (statistical analysis system) software.The last step was to evaluate the relationships between regression coefficients and the return period in order to insert the return period as a variable in the equations.

Low-flow regionalisation
As mentioned above, we tested the possibility of delineating homogeneous regions within the study area.Catchment and climate variables were standardised and the ones describing land use and soils were weighted (divided by the square root of the number of variables per characteristic).
Homogeneous regions were obtained by performing a cluster analysis with the 25 variables for the 59 catchments.The clustering method used was the agglomerative hierarchical clustering which merges two small groups into a bigger one.The starting clusters were 59 groups of one observation each.
Since all variables are quantitative, we chose Ward's algorithm (Ward, 1963) as the merging strategy.Moreover, this method has often been used in low-flow regionalisation (Laaha and Blöschl, 2006;Vezza et al., 2010).Each merger was then carried out in order to have the smallest difference in R 2 between two groupings.The determination coefficient R 2 represents the proportion of information kept after the fusion of clusters.The fusions were stopped before this difference in R 2 became large.
To interpret the results of clustering and characterise the regions, we carried out a principal component analysis (PCA) which helps understand the main differences between groups.The mean and standard deviation of all variables for each group were also calculated and boxplots were drawn.This allowed locating groups in the plane of variables and comparing groups according to these 25 variables.

Frequency analysis
For each of the 59 stations, the distribution that best fitted the data was chosen amongst 2-parameter lognormal, 2parameter Weibull, gamma, Fréchet, 3-parameter lognormal and 3-parameter Pearson distributions.Two-parameter lognormal and gamma are the most common laws in Wallonia.No relationship was found between the type of distribution selected and the length of data, catchment area or the spatial location of the catchment.

Development of a global regression model
Applicability conditions were checked and the logarithm of AM7 T was chosen in order to improve the normality of residuals.In addition, Laaha and Blöschl (2006) proposed to use a logarithmic transformation when outliers increase with observed flow, which occurred in our case.Finally, this transformation allows avoiding negative estimates of AM7 T .
The Cook distance allowed us to detect three outliers.These observations corresponded to higher specific AM7.Removing them from the data set would have improved the fitting of the model in calibration, but this was not a sufficient reason.They were therefore kept, but one should note that the model is better calibrated within a range of specific AM7.
For each regression method (stepwise, maximum R 2 improvement and adjusted R 2 selection), the same variables were selected for all return periods, except the 50 yr return period for the two last methods.Maximum R 2 improvement and adjusted R 2 selection actually resulted in the same final equations, except for the 50 yr return period.Table 2 gives the regression coefficients for each method and return period.The general equation is AM7 T = AREA × 10 constant+ax 1 +bx 2 +...+zx p .
(5) Equation ( 6) is an example of the equation for AM7 T 5 obtained by the stepwise method: AM7 T 5 = AREA × 10 −2.7851 + 0.0017 P e − 13.4274 RC .(6) Table 3 presents, for each method, the values of R 2 , adjusted R 2 and RMSE for the calibration and the validation of the models.
For calibration, the performance of the two models is similar except for the 50 yr return period for which the models obtained by the maximum R 2 improvement and adjusted R 2 selection perform better (higher adjusted R 2 and lower RMSE).However, the validation is clearly better with the model obtained by the stepwise method for all return periods.Since the aim of this study is to be able to estimate AM7 T in ungauged catchments, we continued the analyses with the models obtained by stepwise.According to Laaha and Blöschl (2007), the stepwise method maximises the robustness and the predictive performance of the model, and minimises collinearity between variables.Our results confirm the lower collinearity when using the stepwise method.
If we consider the logarithms of specific AM7, VIF is 1.09 for the stepwise method, while VIFs range from 1.4 to 3.21 for the maximum R 2 improvement method and from 1.4 to 2.68 for the adjusted R 2 selection method.
For the stepwise model, the R 2 and RMSE of calibration decrease when the return period increases, which means that the part of the variance of AM7 T explained by the model and residuals both decrease.This seems contradictory but can be explained by the diminution in the variance of observed AM7 T when T increases, this diminution being relatively bigger than the reduction in MSE.From a validation point of view, R 2 is quite high but diminishes also when T becomes larger.RMSE decreases as well but is in the same range of values as the RMSE of calibration.Therefore, the variance of observed AM7 T of validation stations is higher.
In conclusion, the model performs generally well and even better for predicting since the part of the variance of AM7 T explained by the model is higher in validation.This performance is detailed in the following paragraphs.
Residuals increase linearly with observed specific AM7 T , as can be observed in Fig. 1 for T5.Therefore, the models overestimate low values of specific AM7 T (especially under 10 −3 m 3 s −1 km −2 for T5 and T10, and under 5 × 10 −4 m 3 s −1 km −2 for T20 and T50), and underestimate higher values of specific AM7 T (especially over 3 × 10 −3 m 3 s −1 km −2 for T5 and T10, and over 2 × 10 −3 m 3 s −1 km −2 for T20 and T50).This problem of lack of calibration for very high and low values of specific AM7 T is due to the small number of observations for this range of specific AM7 T , especially for specific AM7 T over 4 × 10 −3 m 3 s −1 km −2 for T5, T10 and T20, and over 3 × 10 −3 m 3 s −1 km −2 for T50.
The limits of prediction intervals can be calculated using Eq. ( 4) for the logarithm of specific AM7 T .In our case, for a risk α of 5 % and 56 degrees of freedom, t equals 2.003.The inverse matrix (X X) −1 associated with our observations is The sensibility of the model can also be studied.For a constant value of RC, the specific AM7 T 5 increases by 3.9 % when Pe increases by 10 mm.For a constant value of Pe, the logarithm of specific AM7 T 5 decreases by 6 % when RC increases by 0.002 day −1 .Plotting the constant and regression coefficients of the models against the return period, Fig. 2 shows that the constant and regression coefficients are linked to the return period.We adjusted a logarithmic relationship in order to calculate AM7 T for any return period T between 5 and 50 yr with this formula: AM7 T = Area × 10 −2.6457−0.0847ln T +0.0017Pe−9.8077RC−4×10−5 Pe ln T −2.4148RC ln T .(7) The choice of a logarithmic relationship was made in the context of our results and should be verified in other situations.Compared to AM7 T predicted with the models for each return period, values estimated by this model are lower by 0.5 to 3 % for 5 yr and 50 yr return periods.For 10 yr and 20 yr return periods, the difference is even lower (between 0.1 and 2 %), and estimated values are generally higher for a 10 yr return period and generally lower for a 20 yr return period.This means that, for 5 yr and 50 yr return periods, Table 3.Comparison of the performance of the models developed using three different regression methods (stepwise, maximum R 2 improvement and adjusted R 2 selection).The performance indices considered for calibration and validation are R 2 (determination coefficient), R 2 adj (adjusted determination coefficient) and RMSE (root-mean-square error).Maximum R 2 improvement and adjusted R 2 selection resulted in the same final equations, except for the 50 yr return period.The numbers in bold are the highest values of indices for each return period.The stepwise model seems the best option for the purpose of this study.this model slightly underestimates AM7 T that are already underestimated but improves the estimation of AM7 T overestimated by the models for each return period.For a 10 yr return period, it is generally the opposite: overestimation of overestimated AM7 T but improvement of the estimation of underestimated AM7 T .

Low-flow regionalisation
The cluster analysis resulted in four groups of catchments; the study area could therefore be divided into four homogeneous regions.Catchments in a homogeneous region are contiguous.
The PCA helped to elucidate the differences between groups (Fig. 3a).The first component is highly positively correlated to precipitation, the area percentage of forests, the slopes, the altitude, the area percentage of soils of the hydrological group B and the recession coefficient.It is negatively correlated to the area percentage of arable lands and soils of the hydrological group A, the summer temperature, Y map coordinate, and the area percentage of urban lands.These results have a real physical meaning.The southern region of Wallonia, corresponding to lower Y map coordinates, is characterised by higher altitudes and slopes, while soils of the hydrological group A and arable lands are more present in the northern region of Wallonia where precipitation is smaller and there are also many big city centres.
The second component is positively correlated to percolation and the area percentage of soils of the hydrological group D.
The area percentage of soils of the hydrological group C, soils that were not mapped, permanent crops and grasslands, the area, the drainage density, the X map coordinate and the potential evapotranspiration are variables which are less correlated to the first two components.They contribute therefore less to the grouping of catchments in the regions.
As shown in Fig. 3b, the regions can be distinguished by the climatic and physical features used for the analysis.
Region 1 is located in the north of Wallonia.It is a region of low altitude (average of 62.4 m) and gentle slopes (average of 3.2 % for the median slope) that receives less precipitation (average of 817 mm), has higher summer temperatures (average of 16.6 • C), is rather agricultural and urban (average of 58.1 % and 10.3 % of the area of catchments covered by arable and urban lands, respectively), and has soils of good infiltration capacity and permeability predominating (average of 44.4 % and 40.7 % of soils of hydrological groups A and B, respectively; 127.5 mm for percolation and 0.0012 for the recession coefficient).
Region 2 has a central spatial location.It is a region characterised mainly by intermediate values of features: medium altitude, steep slopes, average amount of precipitation, average summer temperatures for Belgium, fairly urbanised and agricultural (average of 4.6 % and 22.8 % of the area of catchments covered by urban and arable lands, respectively), rather forested and grassy (average of 30.7 % and 35 % of the area of catchments covered by forests and grasslands, respectively), and soils of moderate infiltration capacity and relatively low permeability predominating (average of 63.4 % and 19.2 % of soils of hydrological groups B and C, respectively; 78.4 mm for percolation and 0.0018 for the recession coefficient).
Region 3 is situated in the south of Wallonia.It is a region of higher altitude (average of 260.8 m) and steep slopes that receives a lot of precipitation (average of 1138 mm), has lower summer temperatures (average of 14.9 • C), is not very urbanised (average of 3.1 % of the area of catchments covered by urban lands), and has soils of relatively good infiltration capacity and moderate permeability predominating (average of 72.6 % and 19.8 % of soils of hydrological groups B and C, respectively; 99.3 mm for percolation and 0.0030 for the recession coefficient), where forests and grasslands prevail (average of 47.4 % and 34.8 % of the area of catchments covered by forests and grasslands, respectively).Region 4 is to the south of region 3.It is a region of higher altitude and steep slopes (average of 7.6 % for the median slope) that receives a lot of precipitation (average of 1079 mm), has average summer temperatures, is rather forested and grassy (average of 29.7 % and 36.4 % of the area of catchments covered by forests and grasslands, respectively), and has soils of low infiltration capacity but good permeability predominating (average of 34.3 % and 43 % of soils of hydrological groups B and C, respectively; 218.8 mm for percolation and 0.0012 for the recession coefficient).

Development of regional models
It has been shown by several studies that developing one regression equation per region results in better estimates than a global equation (Smakhtin, 2001;Laaha and Blöschl, 2006;Vezza et al., 2010).However, in our case, groups do not contain more than 20 catchments each.Group 4 has only 5 observations, and this can be a problem when checking applicability conditions.Nevertheless, we developed regional models using the stepwise method for the regions containing enough catchments to calculate statistics and compared their performance with the global model.In region 1, there are 18 catchments for calibration and 6 for validation.Region 3 contains 20 catchments for calibration and 10 for validation.
Applicability conditions were checked and the logarithm of AM7 T was also chosen.Outliers were not removed for the same reasons as the global regression model.Tables 4 and  5 give the regression coefficients for each return period for regions 1 and 3, respectively.The general equation is Eq. ( 5).The same conclusion as for the global model can be drawn regarding residuals.
Regional models give good results (see Table 6 for the performance indices) but do not improve all estimates of AM7 T ; residuals increase for some catchments.The gain in precision is not considered sufficient when balanced with the loss of robustness due to the smaller number of catchments used to calibrate and validate the models.The global model is thus preferred at the moment for its greater robustness.However, looking at their current performance, regional models seem promising for the future, when more data from gauging stations will be available.Indeed, it would be interesting to carry out a new cluster analysis and develop one model per homogeneous region in 10 yr.It should actually improve estimates of low flows.

Frequency analysis
Two-parameter lognormal and gamma are the most common distributions that fit low-flow data in Wallonia.The 2parameter lognormal law has already been used by Galéa et al. (1999) to fit low-flow data of the Loire catchment in France.Gamma was the best distribution for the Missouri catchment in the USA (Joseph, 1970).As our results demonstrated that the type of distribution is not linked to the spatial location or the area of the catchment, a frequency analysis per catchment was essential to determine the best distribution and therefore accurately estimate AM7 T for return periods of 5, 10, 20 and 50 yr.

Global regression model
The explanatory variables selected by stepwise for the global model were the recession coefficient and percolation.Indeed, they are the two variables that most correlated to specific AM7 T (correlation coefficient of −0.47 and 0.39 for a return period of 5 yr) without being correlated to each other (coefficient of −0.28).These two features are linked to geology: the more permeable the substratum, the higher the percolation and the lower the recession coefficient.Geology is considered by Smakhtin (2001) as one of the natural factors that most influence low flows.Indeed, the main component of low flow is baseflow, which depends on geology and in particular on substratum permeability.Percolation allows estimating groundwater recharge, and the recession coefficient helps characterise water input from groundwater to the river during low-flow periods.Moreover, Vogel and Kroll (1992) found that low-flow characteristics were highly correlated to catchment area, average basin slope and baseflow recession constant.Therefore, this equation quantifies the role played by geology in determining low flows in Wallonia.
The physical role of the variables selected by the other methods can also be explained.In our region, the precipitation from October to April quantifies water input during the period of groundwater recharge.The hydrological type of soil describes the infiltration rate (high for A to very low for D) and drainage (excellent for A to very bad for D) of soils.Yet, higher infiltration favours higher groundwater recharge, and groundwater is the main source of water in rivers during low-flow periods.Therefore, soils of the hydrological group A permit a higher groundwater input into the rivers during low-flow periods than soils of the hydrological groups B and C. Grasslands favour infiltration thanks to their dense root system.The renewal of the roots creates preferential infiltration paths.
The global model resulted in good prediction for the middle range of specific AM7 T (4 × 10 −4 to 4 × 10 −3 m 3 s −1 km −2 for return periods of 5 and 10 yr, 3×10 −4 to 4 × 10 −3 m 3 s −1 km −2 for a return period of 20 yr, and 2 × 10 −4 to 3 × 10 −3 m 3 s −1 km −2 for a return period of 50 yr).Around 10 % of the observations are outside this infiltration rate and drainage decrease.Compared to region 3, region 4 is characterised by lower altitude, precipitation, and percentage of grasslands and forests, and higher slope, temperature, and percentage of arable lands and soils of good infiltration rate and drainage.All these factors are in fact related.Crops grow better on soils of good infiltration rate and drainage, and need suitable temperature and precipitation.Forests rather cover steeper lands situated at higher altitude, where soil properties are less good.Actually, the four regions nearly correspond to natural regions called agrogeographical regions (Fig. 4) (Christians, 1971).
To date, only two regions contain enough gauging stations to permit the development of a specific model.Compared to the global one, they presented little interest because of the loss of robustness.But considering that a lot of new gauging stations have been put in place during the last decade, we propose to reconsider them in the future.

Conclusions
We developed a full analysis chain allowing us to estimate low flows anywhere in gauged and ungauged catchments in Wallonia, and this for any desired return period between 5 and 50 yr.
This method puts together the selection of gauging stations for low-flow calculation, frequency analysis to fit a frequency distribution to low-flow data, an optional cluster analysis to delineate homogeneous regions if enough data are available, regression analysis to develop models predicting low flows from catchment characteristics, and a new approach that evaluates the relationships between regression coefficients and the return period.
This method answers the need of water managers since it allows them to freely choose the return period of the low flows that they will consider in their regulation for each water use.
This method has been applied to a quite small region.The lack of data available at the moment affected the development of regional models and the precision of the global model for very high and low values of AM7 T .It is therefore advised to carry out this study again in 10 yr when more stations have at least 20 yr of data.It would also be interesting to take climate change into consideration, and repeat it for other areas to see if the same variables are selected for all return periods.

Fig. 1 .Fig. 3 .Fig. 2 .
Fig. 1.Plot of global model residuals against observed specific AM7 T (example for the return period of 5 yr).Residuals increase linearly with observed specific AM7 T .

Fig. 3 .
Fig. 3. Groups of catchments (a) and correlation circle (b) in the plane of the two first principal components.The two first components of the PCA allow distinguishing the four groups of catchments.

Fig. 4 .
Fig. 4. Map of homogeneous regions and agro-geographical regions in Wallonia.The four homogeneous regions nearly correspond to Walloon agro-geographical regions.

Table 2 .
Regression coefficients of the global model by method of regression and return period.The general equation is AM7 T = AREA × 10 constant+ax 1 +bx 2 +...+zx p .L g is the area percentage of grasslands [%], S A the area percentage of soils of hydrological group A [%], WP winter precipitation [mm], RC recession coefficient [day −1 ], Pe percolation [mm], Sl 90 the 90th percentile of slope [%], S B the area percentage of soils of hydrological group B [%], and S C the area percentage of soils of hydrological group C [%].

Table 4 .
Regression coefficients of the model for region 1 by return period.The general equation is AM7 T = AREA × 10 constant+ax 1 +bx 2 +...+zx p .Pe is percolation [mm] and SP summer precipitation [mm].

Table 5 .
Regression coefficients of the model for region 3 by return period.The general equation is AM7 T = AREA × 10 constant+ax 1 +bx 2 +...+zx p .ST is summer precipitation [mm], and RC the recession coefficient [day −1 ].