Selection of an appropriately simple storm runoff model

Abstract. An appropriately simple event runoff model for catchment hydrological studies was derived. The model was selected from several variants as having the optimum balance between simplicity and the ability to explain daily observations of streamflow from 260 Australian catchments (23– 1902 km2). Event rainfall and runoff were estimated from the observations through a combination of baseflow separation and storm flow recession analysis, producing a storm flow recession coefficient ( kQF). Various model structures with up to six free parameters were investigated, covering most of the equations applied in existing lumped catchment models. The performance of alternative structures and free parameters were expressed in Aikake’s Final Prediction Error Criterion (FPEC) and corresponding Nash-Sutcliffe model efficiencies (NSME) for event runoff totals. For each model variant, the number of free parameters was reduced in steps based on calculated parameter sensitivity. The resulting optimal model structure had two or three free parameters; the first describing the non-linear relationship between event rainfall and runoff (Smax), the second relating runoff to antecedent groundwater storage ( CSg), and a third that described initial rainfall losses ( Li), but which could be set at 8 mm without affecting model performance too much. The best three parameter model produced a median NSME of 0.64 and outperformed, for example, the Soil Conservation Service Curve Number technique (median NSME 0.30–0.41). Parameter estimation in ungauged catchments is likely to be challenging: 64% of the variance inkQF among stations could be explained by catchment climate indicators and spatial correlation, but corresponding numbers were a modest 45% for CSg, 21% forSmax and none forLi , respectively. In gauged catchments, better estimates of event rainfall depth and intensity are likely prerequisites to further improve model performance.


Introduction
Estimating catchment streamflow where or when it is not observed is a well established field of hydrological research (e.g.Sivapalan et al., 2003).Accurate prediction requires an appropriate model structure and methods to estimate model parameters.There is a well-known trade off between, on one hand, using a simple model that does not describe the available data well and, on the other, using a complex model that contains too many similar equations to reliably estimate their parameters (the ''equifinality" problem; Beven, 1993).
This study revisits the question posed by Jakeman and Hornberger (1993): "How much complexity is warranted in a rainfall-runoff model?".However, where those authors focused on the number of linear or parallel stores that best described the delayed release of water from a catchment, the current study focuses on the optimal functional form of the set of equations used to estimate what part of event precipitation is converted into storm runoff.The scope of this study is limited to hydrological models with process equations that operate on a daily time step and describe the behaviour of catchments rather than (segments of) hillslopes.Many such so-called 'lumped' models have been proposed (reviewed in e.g.Beven, 2004;Blöschl, 2005;Maidment, 1992) and are widely used as a comparatively parsimonious, pragmatic approach to estimating streamflow generation under historic, scenario or forecasted conditions.
The equifinality problem is particularly prominent in catchment storm runoff estimation, since alternative runoff generation processes can produce essentially the same streamflow patterns at the catchment outlet.Storm runoff can be generated by a combination of infiltration excess runoff, runoff from saturated zones, subsurface storm flow and direct rainfall onto water bodies, and even in highly instrumented field experiments the alternative mechanisms can be difficult to distinguish (see contributions in Kirkby, 1978;Beven, 2006).
Published by Copernicus Publications on behalf of the European Geosciences Union.
A. I. J. M. van Dijk: Selection of an appropriately simple storm runoff model Reflecting this ambiguity, the various existing rainfallrunoff models make different, often implicit, assumptions about the significance or insignificance of alternative runoff processes.A generic set of equations that captures most thresholds and variables may be: R = f sat P n + (1 − f sat )P n − I + R return (1) where R is runoff, P n net rainfall, I net infiltration into the soil, R return return flow from the soil (all in mm per event or mm d −1 ), and f sat the fraction saturated area.P n is often expressed as total rainfall less an initial loss L i (mm) which may be conceptualised as a constant or a proportion f n (or a combination of both) and assumed to represent infiltration and/or evaporative losses.The fraction f I represents the fraction of net rainfall on unsaturated soil that infiltrates, and f s the fraction of I that can be retained in the soil.This generalised model is usually simplified one way or another, for example, by assuming that f sat , f n , f I or f s are either negligible or equal to unity (e.g.Bergström, 1992;Chiew et al., 2002).It may also be made more complicated by introducing further functional relationships, for example expressing f sat as a function of groundwater level or storage, or expressing f I as a function of storm size or rainfall intensity and/or soil water content, or expressing f s as a function of actual and maximum soil water storage.Additionally, one or several of the variables in these equations may be represented by spatial distribution functions, for example to represent sub-grid variability in coarse resolution land surface models (e.g.Bonan, 1996;Liang et al., 1994;Liang and Xie, 2001;Oleson et al., 2004).Each addition introduces further assumptions and, importantly, model parameters.
An illustration of the ambiguity in model interpretation due to the multitude of equivalent storm runoff processes is provided by considering the widely used Soil Conservation Service Curve Number method (SCS-CN; USDA-SCS, 1985).It can be recast in a form somewhat similar to Eq. (1) as: Where S is notionally the maximum retention after runoff begins (mm).The SCS-CN model was derived empirically however, and when comparing the second term of Eq. ( 5) to Eqs. (1-4) it can be interpreted in several ways: -Equivalent to (1-f n ), representing the functional relationship between storm size and fraction rainfall in excess of infiltration capacity, in which case P n could be interpreted as a proxy for rainfall intensity and S as a proxy for maximum infiltration rate (while f sat = 0 and f s = 0).
-Equivalent to (1-f s ), representing the functional relationship between storm size and return flow fraction, in which case S represents maximum soil storage capacity (while f sat = 0 and f I = 1).
-Equivalent to f sat , representing a functional relationship between storm size and saturated catchment area that could arise if cumulative infiltration or run-off/run-on processes lead to increase of the saturated area over the course of a storm, where P n might be a proxy for cumulative infiltration or actual runoff, and S a proxy of the efficiency of soil and catchment drainage (in which case f n = 1 and f s = 0).
Elaborations of the SCS-CN method that implicitly or explicitly assume one of these three underlying explanations (or a combination) do exist, for example by relating S to land cover or soil conditions, or by modifying S as a function of antecedent rainfall or groundwater storage (for examples of both see Maidment, 1992;Mishra and Singh, 2003).Presumably the effectiveness of these elaborations will depend on dominant runoff processes.
The same point could have been demonstrated with other lumped runoff models, but this study does not attempt to address the ambiguity in interpreting models, which requires field study.However, the examples given emphasise the risks in increasing model complexity (adding processes, equations, parameters) without a solid justification through improved model performance.To date, there does not appear to have been a comprehensive and formal statistical analysis to assess what is an appropriately simple model to describe the relationship between event precipitation and event runoff at catchment scale.Since dominant runoff processes are expected to vary between catchments of different substrate, climate and land use, it would be anticipated that different model structures may perform better in different catchments.
The aims of the study were as follows: -Test several alternative versions and simplifications of the generalised storm flow model expressed in Eq. ( 1) for their performance in reproducing estimated event storm flow from 260 catchments across Australia.

Data
Daily streamflow data (in ML d −1 ) were collated for 362 catchments across Australia as part of previous studies (Peel et al., 2000;Guerschman et al., 2008Guerschman et al., , 2009)).Streamflow data for these selected catchments were considered of satisfactory quality and any influence of river regulation, water extraction, urban development, or other processes upstream considered unimportant.Large lakes or wetlands do not occur in any of the catchments, but smaller impoundments can occur.From the data set, those records were selected that had good quality observations for at least five years during the period 1990-2006 and no less than 50 runoff events (defined as an increase in streamflow from one day to the next).
The selected 260 stations were located in southwest West Australia, Tasmania, and coastal regions of the eastern states (Fig. 1).The contributing catchments of all gauges were delineated through digital elevation model analysis and visual quality control.Catchment areas varied between 23-1902 (median 333) km 2 .
Daily streamflow volumes were converted to streamflow depths (Q, mm d −1 ) and varied from 2 to 1937 (median 114) mm y −1 .Catchment-average daily precipitation was calculated using a gridded 0.05 • precipitation product de-rived by interpolation of station data (Jeffrey et al., 2001).Of the rain gauges used in interpolation, on average there were three (range 0-22) inside or within 5 km of each catchment.The range of average annual rainfall for the catchment sample was 317-2983 (median 851) mm y −1 ; precipitation other than rainfall was not important.Priestley-Taylor potential evapotranspiration (E 0 ) was 651-2417 (1254) mm y −1 and catchment humidity (H , the ratio of average rainfall over average E 0 ) was 0.13-3.48(0.68).The data set includes catchments under native forest, catchment fully cleared for grazing, and catchments with a varying combination of cropping, grazing, plantation forestry and native vegetation.

Streamflow analysis
The streamflow data were separated into time series of daily estimated baseflow (BF or Q BF ) and quick flow (QF or Q QF ) by combining forward and backward recursive linear reservoir baseflow filter as described in Van Dijk (2010).It was assumed that the estimated QF represents the sum of all storm runoff processes and BF the delayed groundwater discharge, but it is noted that the hydrograph per se cannot provide evidence for this interpretation.
Estimating event runoff from daily storm flow totals requires an additional step that considers the storm flow recession.To estimate the storm flow recession constant (k QF ) from the storm flow time series, all days t = i showing a storm flow peak were identified, that is, all days for which Q QF (i-1)<Q QF (i) > Q QF (i+1).For each station a weighted average storm flow recession constant was calculated following the theory for a linear store (cf.Van Dijk, 2010) as: Total event rainfall P (i) for the event peaking on day t =i was subsequently estimated as: where t =i−2 was chosen to account for the fact that the recorded rainfall event and the peak in storm flow were occasionally separated by up to two days due to the different times of rainfall and streamflow recording (09:00 a.m. and 24:00 p.m., respectively).Total event runoff R(i) was estimated as: where t n is the day on which Q QF (t n ) is ten times less than Q QF (i).This was done to avoid inclusion of storm runoff from subsequent rainfall events.The term S R (t n ) is the estimated storm runoff still in storage at the end of day t n based on linear reservoir theory.
Several studies have found that runoff response is positively related to groundwater level measurements in piezometers and antecedent streamflow rates (e.g.Dunne and Black, 1970) and this was also observed for several Australian catchments (e.g.Liu et al., 2007;Peña Arancibia et al., 2007;Beck et al., 2010).This is commonly attributed to the growth of saturated areas as groundwater rises.To allow inclusion of this correlation in the model, groundwater storage S g was estimated from the daily baseflow (Q BF ) estimates as: where k BF is the baseflow recession constant.Values of S g on the first day of each runoff event were used as an estimate of antecedent groundwater storage S g (i) before runoff event i. Due to the method of baseflow separation these values were estimated from the preceding baseflow recession with a forward filter and therefore not influenced by the storm flow event itself (cf.Van Dijk, 2010).

Evaluation of alternative model structures
For each of the model variants that was tested, the most complex model (that is, the one with the maximum number of parameters) was gradually simplified based on parameter sensitivity analysis, and the corresponding change in prediction error was calculated.The basic model structure tested had the form (cf. Eqs.1-4): where C 1 , C 2 , C 3 and C 4 are all optionally free parameters.
Each of these parameters can be effectively omitted by giving it a value of either zero or unity.For example, C 4 = 0 simplifies the equation to a power function of P n , while in addition C 3 = 1 produces a constant runoff fraction, and C 1 = 0 removes the rainfall threshold before runoff is produced.With C 3 = 1 and C 2 =1 the equation mirrors the SCS-CN model.To test whether model performance was further improved if the model took into account the effect of antecedent groundwater storage S g the following modifications were also tested: where C 5 and C 6 are again parameters that could be fitted or prescribed values of zero or unity.It follows that the most complex models had six free parameters.
To assess the trade-off between the number of free model parameters and the improvement in model performance, Akaike's Final Prediction Error Criterion (FPEC; Akaike, 1970) was used.FPEC represents the expected prediction error that would result were the model tested on a different data set, and is calculated as the product of the prediction error (ε) and a penalization factor that considers the degrees of freedom (d, the number of free parameters) in comparison to the number of observations (n): The error ε was estimated as the mean squared error between observed and modelled event runoff estimates.It was found by optimising the free model parameters with minimum ε as objective function.Latin Hypercube Sampling was used Hydrol.Earth Syst.Sci., 14, 447-458, 2010 www.hydrol-earth-syst-sci.net/14/447/2010/ to find a near-optimal parameter set for each model variant.
The number with draws was 10 d (but with a minimum of 10 2 and a maximum of 10 4 ), after which the optimal parameter set was found with a Nelder-Mead Simplex search.The value of n was calculated as the number of rainfall events used to fit the model.In principle, the model with the lowest FPEC should be adopted.However, Schoups et al. (2008) pointed out that Eq. ( 14) assumes that n d and may lead to underestimates of prediction error and favour overly complex models.This caveat was considered when interpreting FPEC values.Nash-Sutcliffe model efficiency (NSME; Nash and Sutcliffe, 1970) is arguably the most common metric to express runoff model performance in calibration (despite some undesirable properties; Legates and Mc-Cabe, 1999;Criss and Winston, 2008).As it can be calculated from the mean squared error, an adjusted NSME for prediction could be calculated from the FPEC and was used in interpretation.
Another factor to consider in deciding the optimal model structure was correlation between fitted parameters, with high correlations being indicative of parameter equivalence.The parametric and non-parametric (ranked) coefficient of correlation (r and r * , respectively) between fitted parameters was considered an indicator of possible equivalence between model parameters.

Spatial predictors of model parameters
The procedure described in Van Dijk (2010) was used to assess the predictability of calculated values of k QF and the fitted parameter values of the optimal model.The analysis involved statistical analysis of parameteric and nonparameteric correlation coefficients (r and r * ) with a variety of catchment attributes, including catchment geology; morphology (size, mean slope, flatness); soil characteristics (saturated hydraulic conductivity, dominant texture class value, plant available water content, clay content, solum thickness); climate indices (P , E 0 , H , remotely sensed actual evapotranspiration, average monthly excess precipitation); and land cover characteristics (fraction woody vegetation, fractions non-agricultural land, grazing land, horticulture, and broad acre cropping, remotely sensed vegetation greenness).Following this correlation analysis, any spatial correlation in the residual variance was analysed by fitting semi-variograms.Further details on the data sources and procedures can be found in Van Dijk (2010).

Optimal model structure
The average station record included 2178 storms (range 691-3734).Of these, 19% (2-51%) produced storm flow, resulting in an average 631 (30-1005) events.The reduction in prediction error that is required to accept another parameter is influenced by the definition of n in Eq. ( 14).The number of observed runoff events was used here, implying that on average an improvement of 0.32% would be required for an additional parameter to be accepted.Had the number of rainfall events been used instead, the improvement would only need to be 0.09% on average.Conversely, however, typically only some 10 to 20 events produced the large majority of runoff and variance in the runoff (Fig. 4a); if this number had been used instead the improvement would need to be in the order of 10-20%.
The best results obtained with all model variants using between six and one parameter are listed in Table 1.Equation (12) provided the best results among the alternative model structures tested.Table 1 suggests that the six parameter model has the best predictive power, but for reasons mentioned this small difference is not a robust result.Model performance appeared to remain very similar as the number of fitting parameters was reduced to three.FPEC increased slightly as the number of parameters was reduced to two (by 0.2% and 4.1% in median and mean FPEC, respectively), but increased much more if the number of parameters was further reduced to one.The two-and three-parameter model variants produced similar NSME values (Table 1; based on calculated FPEC and hence also allowing for the number of free parameters).The main differences occur for catchments with overall low model performance (Fig. 2).
An interpretative notation for the three and the two parameter version of Eq. ( 12) could be: with and where S max (mm) is maximum storage capacity, L i (mm) initial loss, and C Sg (mm −1 ) saturated area coefficient.The definition of S max is similar but slightly different from the SCS-CN model (Eq.5).

Parameter values and predictability
Correlation among parameters decreased as the number of free parameters was reduced.For the six-parameter model, the highest absolute value of r (r * ) between parameters was 0.39 (0.40), and gradually reduced to 0.29 (0.25) for the three parameter model, and 0.12 (0.04) for the two parameter model, respectively.The highest correlation in the three parameter model was between optimised values of L i and C Sg ; correlation with S max was small (|r|<0.15).Statistics of the distribution of optimised values for the best three parameter model Eq. ( 12) are listed in Table 2.The amount of variance between stations that could be explained by catchment attributes and the fraction of residual variance that was spatially correlated are listed in Table 3.
Values of k QF showed correlation with catchment climate indicators, such as average monthly excess precipitation (AMEP, r * = −0.52),potential ET (E 0 , r = 0.50) and humidity (H , r * =-0.49), and some correlation with the frac-tion of catchment under non-agricultural cover (r * = −0.45).The highest k QF values (i.e.fastest recessions) occurred in dry catchments.The strongest regression equation was a linear function of E 0 , explaining 23% of the variance.Another 41% of the variance was spatially correlated over distances of ca.300 km; the remaining 36% of variation was left unexplained (Table 3).
None of the catchment attributes correlated with L i and a predictive regression equation could not be established.The strongest correlation found was with depth-averaged rainfall intensity (r * = 0.31).There was also no obvious spatial correlation.
Similarly, no catchment attribute correlated with S max .The strongest correlation found was with the coefficient of variation in monthly rainfall (r * = 0.33).About 21% of the variance in (log-transformed) S max values appeared correlated over lengths of 1000 km or more.
Finally, the saturated area coefficient C Sg in mm −1 showed correlation with catchment climate indices such as AMEP (r * = −0.52),average rainfall (r * = −0.48)and catchment humidity (r * = −0.44).The best regression equation (with AMEP) still only explained 13% of the variance however.About 32% of the variance appeared spatially correlated over distances of ca.200 km.

Optimal storm runoff model structure
Based on the calculated FPEC values alone, the sixparameter model could be accepted as theoretically having the smallest prediction error.However, the deterioration in performance between the six-and three-parameter model seems insignificant when considering that the FPEC likely was too lenient on additional parameters.Among the remaining three parameters, L i appeared the least necessary parameter.Mean FPEC increased by 4% if the median fitted L i value (8 mm) was used.Together with the apparent lack of predictability of L i (no correlation with catchment attributes or spatial correlation could be found), this may not be enough basis to prefer the three-parameter model over the two-parameter variant.Reducing the number of parameters to one strongly deteriorated performance.
The SCS-CN technique is one of most widely used models to estimate event runoff, and shows some similarities with the optimal model structure derived here.Therefore a direct comparison of performance is of interest.an otherwise identical approach, two commonly used versions of the SCS-CN model were fitted to the event rainfall and runoff estimates.The two parameter version with initial loss I a (in SCS-CN notation, equivalent to L i ) and maximum storage S (equivalent to S max ) produced a mean FPEC of 1.82 mm and a median NSME of 0.41.The values of I a and S were slightly negatively correlated (r = −0.20)rather than showing the positive correlation expected (cf.Maidment, 1992;Mishra and Singh, 2003;USDA-SCS, 1985).Converting the optimised S to curve numbers produced CN values that were beyond the recommended range of 30-100 for 80 out of 260 stations.Fitting the one-parameter version, where it is assumed that I a = 0.2S, produced a FPEC of 1.80 mm and a median NSME of 0.30.Converting the optimised S values to curve numbers suggested an average CN of 60 (standard deviation ±12), and CN values were within the recommended range for 255 out of 260 stations.
It is concluded that the optimal two-parameter model selected outperforms the SCS-CN method by 14-15% when considering the mean FPEC (1.58 vs. 1.80-1.82)and more so when considering the median NSME (0.61 vs. 0.30-0.41).Therefore, at least for Australian conditions, Eq. ( 16) appears an improvement when compared to the SCS-CN technique, or indeed any of the other storm runoff models that can be expressed in terms equivalent to ( 12) or ( 13).Because the main difference with the SCS-CN technique is the consideration of groundwater storage dependent runoff response, it follows that antecedent wetness conditions have demonstrable predictive potential (cf.Beck et al., 2010).

Storm flow recession coefficient
Most catchments showed storm flow recession 'half times' of about one day (k QF = 0.77 d −1 ) with the most rapid drainage in dry catchments.This is consistent with the expectation that storm flow under dry conditions would be predominantly through infiltration excess overland flow during a small number of high intensity rainfall events.The 260 catchments varied considerably in size (23 to 1902 km 2 ) and because of associated differences in runoff travel time a relationship with catchment size might have been expected.Regression analysis did not indicate any such relationship (r = 0.20).Published methods to estimate surface travel times (Maidment, 1992) produce travel times between <0.05 day to ca. 0.5 day for the catchment size range, and ca.0.1 days for the median 333 km 2 catchment.Compared to the derived recession half times of around one day these numbers are small, and therefore it appears overland and channel storm flow routing is not the main cause for the observed storm flow recessions.It is concluded that storm flow recession is likely dominated by the release of water that is temporary retained in the catchment (e.g. in ephemeral water bodies, draining soil or fast responding groundwater).

Initial loss
The range of estimates for initial loss (50% between 3-19 mm, 90% less than 41 mm) agrees with the values reported in literature (e.g.Tromp-van Meerveld and McDonnell, 2006).Values are also consistent with the SCS-CN approach: a common assumption in applying the method is that www.hydrol-earth-syst-sci.net/14/447/2010/ Hydrol.Earth Syst.Sci., 14, 447-458, 2010 initial abstraction equals 0.2 of maximum retention (USDA-SCS, 1985) and combining this with curve number estimates of 60 to 90 (covering most of the range recommended for forest and grazing land) produces initial abstraction estimates of 6-34 mm.Optimal values of L i could not be predicted, but an initial loss of 8 mm caused little deterioration in model performance.Trialling alternative values for L i suggested that values of 6-12 mm produced the best FPEC, but FPEC deteriorated by less than 2% for any value between zero and 19 mm.
Initial losses are a conceptual water balance component covering a variety of processes, including rainfall retained by vegetation canopy and other surfaces and subsequently evaporated (typically in the order of 1-3 mm; e.g.van Dijk and Bruijnzeel, 2001), losses to wet up a dry soil Surface, and runoff retained in surface depressions that need to be filled before catchment runoff occurs (technically not an initial loss but likely to be lumped into it due to the model structure).

Maximum storage capacity
Between the three-and two-parameter model versions, the optimised S max values changed by more than 20% for 41% of stations.It is concluded that this parameter is rather poorly constrained.Values of S max found through optimisation were generally very high: 74% of stations had fitted values of more than 600 mm, and the maximum bound set at 10 5 mm was still suboptimal for 9% of stations.Such high values make interpretation as a 'maximum potential retention' unrealistic and call for another interpretation.When the ratio S max over P n attains high values, Eq. (15a) approaches the linear relationship: For example, for S max = 500 and P n = 50, the difference in R calculated from Eqs. (15a) and ( 16) is 10%.Fitting Eq. ( 16) to the data led to an overall deterioration in FPEC of 0.9%.Although Eq. ( 16) was preferred for its more realistic limits, it may be more conceptually appropriate to rewrite it in the equivalent form: where k P is a constant of proportionality that describes the initial increase in runoff fraction with event precipitation.More than once mechanism can be invoked to explain why runoff fraction should increase with rainfall event, as implied by Eq. ( 17).A rapid increase of temporarily saturated surface area as more rainfall accumulates (e.g. because of a perched water table) provides one possible explanation and has been observed in field studies (Latron and Gallart, 2008;Tanaka, 1992).An alternative explanation is that P n may function as a surrogate for peak storm rainfall intensity; the key storm  characteristic if runoff is dominated by Horton overland flow.For example, field studies in Indonesia demonstrated: (i) that an effective depth-averaged rainfall intensity can be calculated for every storm from short intervals measurements; (ii) that for a given site this index has strong predictive power to estimate storm runoff coefficient; and (iii) that there appeared to be an approximately linear relationship between storm rainfall depth and intensity.These findings could be combined to produce a theory linking event rainfall and runoff coefficient with a functional form that in fact closely resembles that of Eq. ( 17) and explained observed runoff from study plots of a range of sizes (<1 to 40 000 m 2 ; van Dijk et al., 2005a, b;Van Dijk and Bruijnzeel, 2004).The spatially variable infiltration model underlying the theory was originally developed and validated for sites in Australia and several southeast Asian countries (Yu et al., 1997) while the intrastorm rainfall intensity distribution has been shown equally valid for Australia (Surawski and Yu, 2005).In summary, the relationship between event size and runoff fraction can be explained by expansion of the saturated area during the storm, or the statistical relationship between event size and peak intensity, or a combination of both.

Saturated area coefficient
Antecedent baseflow proved a good predictor of storm runoff response.This would not surprise if saturation overland flow associated with groundwater (or other slowly draining stores) is an important runoff generating mechanism.The potential importance of this mechanisms has been recognised since the 1960s (Cappus, 1960, reproduced in Beven, 2006;Dunne and Black, 1970; see also recent review in Latron and Gallart, 2008).
To assess whether the saturated areas implied by the model simulations were realistic, the apparent median fraction of Hydrol.Earth Syst.Sci., 14, 447-458, 2010 www.hydrol-earth-syst-sci.net/14/447/2010/ saturated area (f sat ) was estimated for each catchment by combining the optimised C Sg value with the groundwater storage (S g ) estimated from the median baseflow rate in the time series with Eq. (15c).The resulting f sat was less than 5% of the area for 72% of the catchments.Values were positively correlated to catchment humidity (r = 0.61; Fig. 3).
Values greater than 20% were calculated for 14% of catchments.For some of these, humidity was high and therefore the estimated values may still be realistic.For some others total runoff was suspiciously high (Q/P >0.4), suggesting potential errors in estimated rainfall, streamflow or catchment area that were compensated by high f sat estimates.In the remaining cases, presumably saturated area was overestimated and the associated overestimate of saturation runoff compensated by an underestimation of infiltration excess runoff.This further highlights the uncertainty in model parameter estimation.The power of baseflow in explaining runoff response does not necessarily imply that water storage in the unsaturated zone, and in the soil in particular, has no effect on runoff response.Previous analysis using satellite-observed wetness of the top few cm of soil showed little value in predicting runoff response in Australian catchments: effectively, the dynam-ics in these shallow observations were much more rapid than those observed in surface runoff response, which increased more gradually during the course of the wet season in phase with baseflow (Liu et al., 2007;Beck et al., 2010).It may be that deeper soil moisture still plays a role in determining runoff response, however, for example by influencing rapid sub-surface pathways that allow infiltrated water to generate return flow.Without any direct observations or reliable estimates of root zone soil moisture content this could not be investigated.Field observations or soil water content estimates produced by a hydrological model may help to asses this in future.

Sources of uncertainty
The choice of parameter estimation method and formulation of model error are potential sources of uncertainty.In the discussion version of this paper (Van Dijk, 2009) the sum of absolute rather than squared errors was used to estimate FPEC and to optimise parameter values.This yielded the same optimal model structure and overall conclusions, even though optimised parameter values did vary.
A. I. J. M. van Dijk: Selection of an appropriately simple storm runoff model Example median, poor and good results are shown in Fig. 4. In this case, the poorest model result (Fig. 4d) can be attributed to the lack of large runoff events, corresponding to low annual QF (27 mm) and rainfall (521 mm).While poor model performance was generally associated with drier catchments, the reverse was not always true and some of the best model performances were also found for dry catchments.Comparison against catchment attributes showed that the FPEC (i.e. the standard error of estimate) was strongly correlated with annual average QF (r = 0.77).
Expressing model performance in model efficiency (NSME; equivalent to normalising FPEC by the observed variance) removed the correlation with annual average QF (r = 0.11) and rainfall (r = 0.13).The best two and three parameter models had an average NSME of 0.62 and 0.67 for event runoff when weighted by observed variance in the 260 records, and median values of 0.61 and 0.64, respectively (Table 2).Previous catchment modelling studies using some of the stations analysed here found similar median NSME values of 0.60-0.75 in calibration (e.g.Viney et al., 2009;Zhang and Chiew, 2009).However, in those cases NSME was not penalised for the number of free parameters and more importantly, NSME related to daily streamflow time series rather than event runoff.An estimate of the achievable NSME had the optimal runoff structure been incorporated within a catchment model, was obtained by adding the observed baseflow time series to the observed and modelled event runoff totals.This increased median NSME to 0.71.
The semi-variogram suggested that about 31% of the variation in NSME was correlated over spatial scales of ca.400 km, and some degree of clustering of catchments with similar model performance is apparent in Fig. 1: model performance appears comparatively poorer in the coastal areas of southwest West Australia, western Victoria and western Tasmania and in inland News South Wales, and better along the eastern sea board.None of the catchment attributes appeared a good predictor of NMSE.The higher correlations suggested poorest performance in catchments with alluvial geomorphology, low rainfall intensity, low relief and low tree cover, but in all cases r was a modest 0.25-0.30.
The quality of rainfall data is expected to be the main constraint on runoff estimation.The event rainfall data used in the current analysis were based on interpolation of daily rainfall gauge data, and Fig. 1 suggests that at least some of the catchments with poor model performance are found in areas with very sparse rainfall gauging, although a statistically meaningful relationship could not be established.The lack of data on (and hence consideration of) intra-storm rainfall intensity is another likely degrading factor.Although rainfall intensity is correlated to event rainfall depth, the relationship is not direct and intensity differences between storms of equal total depth can be more than an order of magnitude (e.g.Van Dijk and Bruijnzeel, 2004).Current developments in event rainfall and rainfall intensity estimation from ground-based radar and remote sensing should help address both constraints in future and allow improved runoff estimation, at least in gauged catchments.

Conclusions
Streamflow data for 260 Australian catchments were used to evaluate the performance of alternative conceptual storm runoff models and derive a model of appropriate simplicity.Event rainfall and runoff was estimated by baseflow separation; a storm flow recession coefficient k QF was calculated from the daily storm flow data and used to estimate event runoff.Four model structures with a maximum of six free parameters were investigated, covering most of the model equations used in existing lumped catchment runoff models.The following conclusions are drawn: 1.A non-linear response model with two or three parameters provides the optimal model structure for modelling event storm flow in Australian catchments.The optimal model produced a median Nash-Sutcliffe model efficiency (NSME) for event runoff of 0.64 across all records.
2. The SCS-CN technique had a similar functional form but had an error 14-15% larger and NSME of 0.30-0.41.The difference can be attributed to the predictive value of antecedent baseflow.
3. Of the three model parameters in the optimal model structure, one related event runoff to storm size (S max ) and another related runoff to groundwater storage estimated from antecedent baseflow (C Sg ).A third parameter described initial rainfall losses (L i ) but could be set at 8 mm without affecting model performance too much.
A fourth parameter k QF , the storm flow recession coefficient, related event runoff to daily storm flow and was calculated directly from streamflow records rather than optimised.
4. Of the total variance in k QF values among stations, 64% could be explained by climate indicators and spatial correlation.The scope to estimate the other parameters in gauged catchments appeared limited; fractions explained were a modest 45% for C Sg and 21% for S max , while none of the variation in L i could be explained.
5.More accurate estimates of event rainfall depth and rainfall intensity are likely to be prerequisite to further increase model performance in gauged catchments.

Figure 1 .
Figure 1.Map showing the location of the stations for which data were analysed.The size of the inner dot corresponds to the Nash-Sutcliffe model efficiency (attained with the best threeparameter model and corrected for the number of free parameters).The distribution of gauges underpinning the interpolated rainfall product is indicated (for an arbitrary day in 2005).

Fig. 1 .
Fig. 1.Map showing the location of the stations for which data were analysed.The relative size of the inner dot corresponds to the Nash-Sutcliffe model efficiency (attained with the best three-parameter model and corrected for the number of free parameters).The distribution of gauges underpinning the interpolated rainfall product is indicated (for an arbitrary day in 2005).

Figure 1 .Figure 2 .
Figure 1.Map showing the location of the stations for which data were analysed.The size of the inner dot corresponds to the Nash-Sutcliffe model efficiency (attained with the best threeparameter model and corrected for the number of free parameters).The distribution of gauges underpinning the interpolated rainfall product is indicated (for an arbitrary day in 2005).

Fig. 2 .
Fig. 2. Cumulative distribution of Nash-Sutcliffe model efficiency (NSME) for the model variants with three and two free parameters,

Figure 3 .
Figure 3. Relationship between catchment humidity index (H, the ratio of rainfa potential evaporation) and the median estimated fraction of saturated area (exceeded the time).

Fig. 3 .
Fig.3.Relationship between catchment humidity index (H , the ratio of rainfall over potential evaporation) and the median estimated fraction of saturated area (exceeded half of the time).

Fig. 4 .
Fig. 4. Examples of event runoff predicted by the optimal three-parameter model against event runoff inferred from streamflow observations.Shown are results for those gauges with the median NSME (gauge 129001), worst (318311) and best (222009) plotted on (a) linear scale and (b-d) double logarithmic scale.

Table 1 .
Change in predictive model performance as the number of free parameters is reduced from six to one.Listed are Final Prediction Error Criterion (FPEC) and adjusted Nash-Sutcliffe Model Efficiency (NSME) calculated from it.All values relate to the best performing model structure.

Table 2 .
Descriptors of the distribution in parameters of the preferred model variants: the storm flow recession coefficient k QF (d −1 ) calculated directly from the observations, and optimised model parameters initial loss L i (mm), maximum storage capacity S max (in 10 3 mm), and saturated area coefficient C Sg (in 10 −3 mm −1 ).

Table 3 .
Summary of the analysis of variance in parameter values derived from fitting the optimal runoff model to event runoff estimates from the 260 catchments.Listed are the fraction of variance explained by catchment attributes, the residual variance showing spatial correlation and the remaining unexplained variance.Also listed are the range (km) of the fitted semi-variograms.