The within-day behaviour of 6 minute rainfall intensity in Australia

The statistical behaviour and distribution of high-resolution (6 min) rainfall intensity within the wet part of rainy days (total rainfall depth ≥10 mm) are investigated for 42 stations across Australia. This paper compares nine theoretical distribution functions (TDFs) in representing these data. Two goodness-of-fit statistics are reported: the Root Mean Square Error (RMSE) between the fitted and observed within-day distribution; and the coefficient of efficiency for the fit to the highest rainfall intensities (average intensity of the 5 highest intensity intervals) across all days at a site. The three-parameter Generalised Pareto distribution was clearly the best performer. Good results were also obtained from the Exponential, Gamma, and two-parameter Generalised Pareto distributions, each of which is a two-parameter function, which may be advantageous when predicting parameter values. Different parameter estimation techniques are also compared. The behaviour of the statistical properties of the within-day intensity distributions was also investigated, and trends with latitude, Köppen climate zone (strongly related to latitude) and daily rainfall amount were identified. The latitudinal trends are likely related to a changing mix of rainfall generation mechanisms across the Australian continent.


Introduction
Rainfall data at high temporal resolution are required to accurately model the dynamics of surface runoff processes and, in particular, sediment entrainment (e.g. Dodov and Foufoula-Georgiou, 2005; Kandel et al., 2005; Mertens et al., 2002). These processes respond to rainfall intensity variations over short intervals. However, measurement of rainfall intensity at sufficient resolution is available only at a limited number of locations across Australia. On the other hand, there is good coverage of rainfall data at a daily time step; consequently, many models used to inform water managers use a daily time step. The overall goal of this research is to establish a means of estimating the within-day statistical distribution of rainfall intensity given the daily rainfall depth and other readily available hydrometeorological data (e.g. temperature, pressure). This paper makes a first step towards that goal by examining the within-day statistical behaviour of rainfall intensity and its representation by different statistical distributions.
Correspondence to: A. W. Western (a.western@unimelb.edu.au)
There are several ways of capturing the effects of short timescale rainfall intensity variability in catchment modelling. The rainfall time series can be explicitly represented in a short time step model; however, running short time step distributed models on large catchments is impractical. Alternatively, model parameters can be modified (e.g. calibrated) in an attempt to capture the effect of the short time scale processes but with a long (say daily) model time step; however, this effective parameter approach is not well suited to nonlinear processes. Another option is the distribution function (DF) approach, in which the cumulative distribution function (cdf) of short time step (say 6 min) rainfall intensity is the input (Van Dijk and Bruijnzeel, 2004; Kandel et al., 2005). This function is then modified to produce a cdf of runoff rate by a typically non-linear runoff-intensity relationship that can be updated on a daily basis depending on the catchment wetness or other states such as surface cover. Point-scale work has shown that, from a water quality/erosion perspective, the probability distribution of rainfall intensity within the day and the total daily volume are of primary importance, while the time sequence of intensity is of secondary value (Kandel et al., 2005). Van Dijk and Bruijnzeel (2004) reached similar conclusions for events. The key meteorological input requirement of such models is the cdf of rainfall intensity within the day.
Fig. 1 caption (fragment): ... (Peel et al., 2007) for Australia. The climate class symbols have the following meanings: Aw = tropical, savannah; BWh = arid, desert, hot; BWk = arid, desert, cold; BSh = arid, steppe, hot; BSk = arid, steppe, cold; Csa = temperate, dry hot summer; Csb = temperate, dry warm summer; Cwa = temperate, dry winter, hot summer; Cfa = temperate, no dry season, hot summer; Cfb = temperate, no dry season, warm summer.
The intention of this paper is to examine how best to represent the cdf of 6 min rainfall using wet and dry fractions, coupled with an appropriate continuous distribution function of rainfall intensities during the wet fraction. In the absence of a comprehensive treatment of the theoretical distribution function (TDF) selection problem, this paper aims to fill the gap for within-day rainfall intensity distributions in Australia. Specifically, the aim of this investigation was to quantify how well a range of available TDFs fit the measured within-day rainfall intensity data and, in particular, fit the characteristics of rainfall that are most relevant to runoff generation and erosion, that is, the high intensities. The principal questions addressed by this work are:
- How well does each of the TDFs perform and how do they rank with respect to each other?
- Which approach to parameter estimation shows the greatest skill: the method of moments, L-moments, LH-moments, or Least Squares (LS)?
- Does the "best" TDF vary with location around Australia (i.e. with climate zone) and how do characteristics of the distribution relate to climatic characteristics?
The paper also aims to examine variation in the statistical behaviour of the within-day intensity distributions between locations.
To address these aims we analysed high resolution (6 min) rainfall data recorded at 42 Bureau of Meteorology pluviometer installations around Australia. It is important to note that the paper does not aim to develop a new rainfall disaggregation method, as DF models do not require an explicit time sequence.

Data and methods
High resolution rainfall data from pluviograph stations across Australia were obtained and a detailed analysis was conducted to explore the distribution of within-day intensities. There were three stages to the analysis. First, the raw rainfall intensity records were filtered to ensure data quality and to exclude days of small rainfall depth (not of interest for runoff or erosion). Second, nine different theoretical distributions were fitted to the measured cumulative distribution function (CDF) of rainfall intensity. Multiple methods for estimating the distribution parameter values were employed. Third, two objective functions were employed to assess the goodness-of-fit of the different distributions. Data processing and analysis were principally carried out using custom routines written in Fortran-90. Each stage is described in more detail in the following sections.

Data
Pluviograph records were obtained from the Australian Bureau of Meteorology (BOM) for the 42 sites shown in Fig. 1. The Köppen climate zones for Australia (Peel et al., 2007) are also shown. Where stations are very close to a zone boundary the classification was checked with site data. Table 1 shows pertinent properties of the 42 meteorological stations used. This set of sites (identified by Lu and Yu (2002) for a separate study) provides broad spatial coverage across Australia. Record lengths span at least 20 yr and the mean annual rainfall ranges from 196 mm at Oodnadatta to 2439 mm at Koombooloomba. Site elevations range from sea level to 760 m, nine of the ten Köppen climate zones present in continental Australia are represented, and there is a selection of sites from each of the winter dominated, summer dominated and non-seasonal rainfall regimes.
Hydrol. Earth Syst. Sci., 15, 2561-2579, 2011, www.hydrol-earth-syst-sci.net/15/2561/2011/

Quality control and censoring
Rainfall intensity data for each station were supplied at the BOM standard 6-min time increment, with each 24 h period divided into 240 intervals (hereinafter referred to as pluviograph data). Prior to the early 1990s the BOM pluviometer network used Dines pluviographs, which recorded via a paper chart and pen connected to a float and siphon mechanism. Since that time, tipping bucket rain gauges with a 0.2 mm tip size have been used and the time of individual tips recorded (Srikanthan et al., 2002). Both these types of records are provided by the BOM as 6 min data. Srikanthan et al. (2002) showed that the short time interval data from these two gauge types are statistically similar. This is consistent with the conclusions of Fankhouser (1998), who found little dependence on measurement characteristics (e.g. bucket size) for tipping bucket gauges. For this analysis, a day was designated as the period starting and finishing at 09:00 h (as per the Bureau standard). This investigation was concerned only with intraday characteristics; therefore inter-day relationships could be neglected, and periods of record where data were missing were excluded rather than in-filled. Thus only days with a complete pluviograph record were used (i.e. 240 values, including zeroes, starting at 09:00 h). Records of rainfall intensity measured using tipping bucket technology incur errors at very low rain rates due to resolution problems (see review by Nystuen, 1999). This error is related to the inherent quantisation involved in tipping bucket technology (the finite volume bucket must fill and empty for rain to be recorded). In addition, low intensity periods have been handled differently over time by the Bureau of Meteorology, with earlier data having single tips spread across multiple 6 min periods and later data having the tip assigned to a single 6 min period. As this work was concerned with the upper end of the rainfall intensity spectrum, pluviograph records were censored in two ways to eliminate low intensity data from consideration. First, only days where the total rainfall depth (P) equalled or exceeded 10 mm were considered. Second, only those 6-min intervals where intensity (R) exceeded a threshold minimum (R_min) of 1 mm h⁻¹ (0.1 mm/6 min) were considered in fitting the CDF. Finally, in order to numerically resolve the higher order moments, the number (n) of 6-min intervals where the intensity exceeded R_min on any day needed to be at least four.
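The censoring rules described above can be sketched in Python as follows. This is a minimal illustration and not the authors' Fortran-90 routines; the function name `censor_day` and the input layout (a list of 240 six-minute depths in mm, with `None` marking a missing value) are assumptions made for the example.

```python
from typing import List, Optional

INTERVALS_PER_DAY = 240   # 6-min intervals in a 24 h day starting at 09:00
INTERVALS_PER_HOUR = 10   # converts a 6-min depth (mm) to an intensity (mm/h)
P_MIN_MM = 10.0           # minimum daily depth P
R_MIN_MM_PER_H = 1.0      # minimum 6-min intensity R_min
MIN_WET_INTERVALS = 4     # minimum number n of retained intervals


def censor_day(depths_mm: List[Optional[float]]) -> Optional[List[float]]:
    """Return the retained intensities (mm/h) for one day, or None if the day is rejected.

    The input layout is hypothetical: 240 six-minute depths in mm, None = missing.
    """
    # Days with an incomplete record are dropped rather than in-filled.
    if len(depths_mm) != INTERVALS_PER_DAY or any(d is None for d in depths_mm):
        return None
    # Keep only days whose total depth equals or exceeds 10 mm.
    if sum(depths_mm) < P_MIN_MM:
        return None
    # Keep only intervals whose intensity exceeds R_min.
    intensities = [d * INTERVALS_PER_HOUR for d in depths_mm]
    wet = [r for r in intensities if r > R_MIN_MM_PER_H]
    # At least four retained intervals are needed to resolve the higher order moments.
    if len(wet) < MIN_WET_INTERVALS:
        return None
    return wet
```

Under these assumptions, the wet fraction of an accepted day would simply be `len(wet) / 240`.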
The results of this censorship regime, in terms of the number of rainy days on the record and the percentage of the rainfall depth that fell within the various categories, are summarised in Table 2. The bottom line describes the data analysed by this investigation, showing for example that in Darwin 12.3 % of analysed days (i.e. days with a complete pluviograph record) had sufficient rain (P ≥ 10 mm) and that on these days 88.2 % of the total rainfall depth was received. In contrast, almost half of Melbourne's rainfall depth is delivered on days where the total accumulation is less than 10 mm. Over all stations, the average rainfall depth retained in the data after censorship was 74.7 % of the total rainfall depth, which was considered reasonable given our interest in processes sensitive to large events.
Rainfall was also censored if 6-min intensity was less than 1 mm h⁻¹. On average, this accounted for 5.5 % of the rainfall depth at each station, with this proportion varying from 1.8 % to 9.2 %. We undertook sensitivity testing using thresholds of 1 mm h⁻¹ and 2 mm h⁻¹ and found the fitted parameter values and quality of fits were insensitive to the exact level of the threshold. The 1 mm h⁻¹ threshold is a reasonable compromise given the discretisation inherent in tipping bucket rain gauges (which is typically a 2 mm h⁻¹ discretisation, i.e. 0.2 mm tip and 6 min intervals), the practical need to remove the artefact of single tips being spread over many time increments in the data, and our primary interest in higher intensities that are significant to surface processes.

General approach
With any analysis of information from multiple stations a decision must be made as to whether a local (analysis by individual site) or a regional (all sites together) approach should be taken. There are advantages to both. A local approach enables a better understanding of local behaviour and of contrasts between sites, while a regional analysis provides a more robust relationship over a region due to the inclusion of more data. Here we chose a local approach because we are more interested in understanding the site level behaviour and in exploring the variation between sites.

Theoretical Distribution Functions
Nine different theoretical distribution functions (TDFs) (Table 3) were fitted to the data for the wet fraction of the day. The wet fraction is calculated as the proportion of 6 min intervals in the day with rainfall intensity exceeding 1 mm h⁻¹. The TDFs were selected from distributions well known in the meteorological and hydrological literature. The mathematical formulation of each TDF, and the parameter estimation techniques employed, followed the methods presented in Stedinger et al. (1993) as identified in the right-most column of Table 3. Of these distributions it is worth noting that the Generalised Pareto distribution and its special case, the Exponential distribution, can be interpreted as peak-over-threshold distributions (Madsen and Rosberg, 1997; Claps and Liao, 2003), which provides some theoretical justification for their suitability here. Other distribution functions with greater flexibility (more parameters) have been used to describe rainfall (e.g. the two-component extreme value distribution (Rossi et al., 1984)); however, given that we aim subsequently to predict the parameter values for distributions from daily meteorological observations, we limited distributions to those that have three or fewer parameters.
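For reference, the relationship between the Generalised Pareto and Exponential distributions can be written out explicitly. The form below uses the sign convention that is common in the hydrological literature; because Table 3 is not reproduced in this section, the symbols ξ (location), α (scale) and κ (shape) are assumptions rather than necessarily the paper's own notation.

```latex
F(x) = 1 - \left[ 1 - \kappa \, \frac{x - \xi}{\alpha} \right]^{1/\kappa}, \quad \kappa \neq 0,
\qquad
\lim_{\kappa \to 0} F(x) = 1 - \exp\!\left( - \frac{x - \xi}{\alpha} \right).
```

If ξ is taken to be the censoring threshold, F describes the intensities in excess of that threshold, which is the peak-over-threshold interpretation referred to above.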
The final three TDFs in Table 3a are Extreme Value Distributions (EVDs). These have been derived specifically to represent the distribution of the largest observation drawn from a large sample. The validity of including these EVDs is open to question because the full range of observed intensity (ignoring the minor censoring at very low intensities) has been included, whereas EVDs describe distributions of extreme values (i.e. maximum or minimum) taken from each of a set of realisations; that is, the rainfall intensity data to which they are being fitted are not an extreme value data set, at least using traditional ways of thinking about rainfall. However, a recent analysis of heavy rainfall by Wilson and Toumi (2005) shows that the distribution is in fact "heavy tailed" in some cases, a characteristic feature of EVDs.
The utility of Wang's (1997) LH-moment method (LH4 moments in this case) was also examined using the GEV distribution as a test case. This fitting method was not pursued: although it yielded a better fit to the upper tail of the distribution than the L-moment estimates, the LH4 estimates were (for a large majority of pluviograph stations) inferior to those produced by the product moment and LS methods. Consequently, the results presented in this paper examine only the relative merit of the other three parameter estimation techniques.
It should be noted that there is temporal structure to within-day rainfall that involves both intermittency and serial correlation during rainfall periods. This structure impacts on fitting techniques and in particular uncertainty estimation for fitted parameters (Willems et al., 2007). In this study we have not attempted to estimate the uncertainty in the fit of parameters for each type of distribution because of this issue.

Assessment of fit
There are two possible approaches to assessment of the performance of different distributions: either examining how closely the distribution functions fit the data by some sort of analysis of residuals from the distribution function, or examining the uncertainty in the quantile estimates resulting from the fitted distribution. To estimate the uncertainties in the quantile estimates requires either independent samples or a rigorous treatment of any temporal structure in the data. Rainfall over a day is both intermittent and exhibits (potentially intensity dependent) serial correlation. This structure would need to be incorporated into the uncertainty estimation for the parameters of each of the distributions and for each of the fitting methods. Because of this complexity we opted to examine the fits based on a residual analysis rather than uncertainty in the quantile estimates.
Two measures of goodness-of-fit were selected to quantify the fit of the distributions. First, the Root Mean Square Error (RMSE, defined by Eq. 1) of the fitted TDF compared with the observed rainfall intensity data was computed. RMSE quantifies how well the shape of each TDF matches the recorded within-day data considering the entire range of intensity values above the 1 mm h⁻¹ threshold. Note that this yields one RMSE value per rain day analysed. A low RMSE value indicates that the fitted TDF provides a good approximation to the shape of the rainfall intensity CDF, showing that a good fit to both the volume and the duration of different rainfall rates has been achieved.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(\hat{I}_j - I_j\right)^2} \qquad (1)$$

where: Î_j and I_j are the fitted and measured rainfall intensity at the j'th probability of exceedance respectively; and n is the number of 6-min intervals during the wet fraction (wf) of the day (that is: n = 240 wf). Note that for the LS fitting method, the objective function is to minimise the RMSE.
Given the ultimate aim of providing input to erosion models, a second goodness-of-fit statistic was used to quantify the fit to the upper tail of the rainfall distribution. A number of alternatives were considered, including the maximum 6-min intensity; the average of the 2, 3, 5 and 10 highest intensity 6 min periods; and the 80th and 90th percentile intensities. Of course, many of these measures were highly cross-correlated (i.e. r² > 0.8). Inspection of fitting results for Melbourne and Darwin showed that some degree of averaging was useful (to avoid over-emphasizing errors in the fit of the highest one or two intensity values) but that averaging over long periods tended to reduce differences between the fit of different TDFs. The average of the five highest intensity periods, designated I_HI [mm h⁻¹], was selected as providing a reasonable balance between these competing factors. It should be noted that I_HI captures 30 min of rainfall in total but not necessarily from consecutive intervals.
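A minimal Python sketch of the upper-tail statistic is given below. The function name `i_hi` is hypothetical, and the behaviour when fewer than five intensities are supplied (averaging those available) is an assumption, since the text does not say how such days were handled.

```python
from typing import Sequence

N_HIGHEST = 5  # five 6-min intervals, i.e. 30 min of rainfall in total


def i_hi(intensities_mm_per_h: Sequence[float], n_highest: int = N_HIGHEST) -> float:
    """Mean of the n_highest largest 6-min intensities in a day (mm/h)."""
    if not intensities_mm_per_h:
        raise ValueError("no intensities supplied")
    # The intervals need not be consecutive, so a simple sort is sufficient.
    top = sorted(intensities_mm_per_h, reverse=True)[:n_highest]
    # Assumption: if fewer than n_highest values are supplied, average those available.
    return sum(top) / len(top)
```

The same function can be applied to fitted intensities to obtain the fitted counterpart of the observed value.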
Formal statistical testing of distribution fits was also considered. Several alternatives exist for testing whether a sample comes from a hypothesised distribution. These include the Anderson-Darling test (Stephens, 1974), the probability plot correlation coefficient (PPCC) test (Filliben, 1975), the Kolmogorov-Smirnov test and the Chi-squared goodness-of-fit test. Of these, critical values only exist for a subset of the candidate distributions for the Anderson-Darling (Lognormal, Exponential and Weibull) and PPCC (Gamma, GEV, Weibull and Gumbel) tests (Engineering Statistics Handbook, Chapter 1.3.5.14, Heo et al., 2008). The Kolmogorov-Smirnov test requires the distribution to be fully specified for the critical values to be valid (Engineering Statistics Handbook, Chapter 1.3.5.16). Because we wanted to test all the distributions consistently and needed to estimate the parameter values from the data, these three tests were not suitable. Thus we used the Chi-squared test and followed the Engineering Statistics Handbook (2011) recommendations. It should be noted that any intermittency and serial correlation should be accounted for in implementing these tests. We did not do this, which means the power of the Chi-squared test is overestimated (i.e. more days are found to be statistically different to the hypothesised distribution than is really the case, given that there will be some temporal structure to the data). This is a limitation of the testing that was attempted.
The Chi-squared test requires continuous data to be discretised into bins and it is recommended that there be at least 5 data points in each bin and at least 5 bins. The upper limit of the first bin was set arbitrarily to 1.5 mm h⁻¹ (larger where necessary to ensure that it contained at least 5 data points). The number of subsequent bins was set to 2 n_r^0.4, where n_r is the number of remaining data points. For these bins, ranges were allocated on an equal probability basis using the fitted distribution. If bins existed with fewer than five data points, the number of bins was reduced and ranges recalculated until all bins had at least five observation points. Only days that met the above criteria were selected for testing, which were generally days with more than 3 h of rainfall (i.e. 30 observation points). This testing indicated that each candidate distribution was rejected on about half of the days tested. Subsequent analysis showed the lower half of the distribution contributed more than 50 % of the Chi-squared statistic on 70-75 % of days (except lognormal, 50 % of days) and that the statistic was insensitive to the upper tail. Given our greater interest in the upper tail, this testing was not useful for distinguishing candidate distributions.
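The binning procedure can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the function names are hypothetical, rounding 2 n_r^0.4 down to an integer is an assumption, and the fitted distribution is supplied by the caller as a pair of functions (`cdf` and its inverse `quantile`). The Chi-squared statistic itself is not computed here.

```python
from bisect import bisect_left
from typing import Callable, List, Optional, Sequence

FIRST_BIN_UPPER = 1.5  # mm/h, initial upper limit of the first bin
MIN_PER_BIN = 5        # recommended minimum number of points per bin
MIN_BINS = 5           # recommended minimum number of bins


def initial_bin_count(n_remaining: int) -> int:
    """Number of equal-probability bins, 2 * n_r ** 0.4 (rounding down is an assumption)."""
    return int(2.0 * n_remaining ** 0.4)


def bin_counts(data: Sequence[float], edges: List[float]) -> List[int]:
    """Count observations in the bins (-inf, e0], (e0, e1], ..., (e_last, inf)."""
    counts = [0] * (len(edges) + 1)
    for x in data:
        counts[bisect_left(edges, x)] += 1
    return counts


def choose_edges(data: Sequence[float],
                 cdf: Callable[[float], float],
                 quantile: Callable[[float], float]) -> Optional[List[float]]:
    """Return bin edges meeting the minimum-count rules, or None if the day cannot be tested."""
    if len(data) < MIN_PER_BIN * MIN_BINS:
        return None
    # Widen the first bin where necessary so that it holds at least five points.
    first_upper = max(FIRST_BIN_UPPER, sorted(data)[MIN_PER_BIN - 1])
    n_first = sum(1 for x in data if x <= first_upper)
    k = initial_bin_count(len(data) - n_first)
    p0 = cdf(first_upper)
    while k + 1 >= MIN_BINS:
        # The probability mass above the first bin is split equally between k bins.
        edges = [first_upper] + [quantile(p0 + (1.0 - p0) * i / k) for i in range(1, k)]
        if min(bin_counts(data, edges)) >= MIN_PER_BIN:
            return edges
        k -= 1  # too few points in some bin: reduce the number of bins and retry
    return None
```

For example, with 35 points remaining above the first bin the initial count would be 8 bins, which is then reduced until every bin holds at least five observations.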

Summary statistics for each station
The results of fitting at a given pluviograph station are summarised herein by one RMSE value and one I_HI value for each rain day in the record (>30 000 following quality control). In order to quantify the goodness-of-fit over all the rain days at a given station, two summary statistics were computed: mCOE and RMSE90.
- The goodness-of-fit between the fitted I_HI (from the fitted TDF) and the observed data was quantified using the Modified Coefficient of Efficiency (mCOE) (one value of mCOE per station) as defined by Legates and McCabe (1999). The mCOE is essentially similar to the well known Coefficient of Efficiency (Nash and Sutcliffe, 1970), but instead of squaring the error between fitted and observed data (which gives extra weight to outliers), the absolute magnitude of the error is used (refer to Legates and McCabe (1999) for a thorough derivation and discussion).
- The range of RMSE values at a station was summarised by the 90th percentile RMSE (i.e. 90 % of RMSE values are less than or equal to this RMSE value). Herein this statistic is denoted as RMSE90. The 90th percentile was chosen on the basis that it provides an indication of the minimum level of performance that can be expected from the majority of fits.
The meaning of these two statistics will become clearer as some illustrative results are introduced in the next section.
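The equation used to compute mCOE was (as per Legates and McCabe (1999)):

$$\mathrm{mCOE} = 1 - \frac{\sum_{i=1}^{S}\left|I_{HI,i} - \hat{I}_{HI,i}\right|}{\sum_{i=1}^{S}\left|I_{HI,i} - \bar{I}_{HI}\right|}$$

where: Î_HI and I_HI are the fitted and measured mean intensities of the 5 highest intensity intervals of the day; Ī_HI is the mean value of the set of measured I_HI values; and S is the number of rain days in the pluviometer record for that station.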
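The two station-level summary statistics can be sketched in Python as below. The mCOE function assumes the usual Legates and McCabe (1999) form, one minus the ratio of the summed absolute errors to the summed absolute deviations of the observations from their mean. The nearest-rank percentile rule used for RMSE90 is also an assumption, as the text does not state which percentile definition was used; the function names are hypothetical.

```python
from typing import Sequence


def mcoe(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Modified coefficient of efficiency based on absolute (not squared) errors."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("observed and fitted must be non-empty and of equal length")
    obs_mean = sum(observed) / len(observed)
    abs_error = sum(abs(o - f) for o, f in zip(observed, fitted))
    abs_spread = sum(abs(o - obs_mean) for o in observed)
    # Undefined (division by zero) if every observed value is identical.
    return 1.0 - abs_error / abs_spread


def rmse90(daily_rmse: Sequence[float]) -> float:
    """Smallest daily RMSE that at least 90 % of the values do not exceed (nearest rank)."""
    if not daily_rmse:
        raise ValueError("no RMSE values supplied")
    ordered = sorted(daily_rmse)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) in integer arithmetic
    return ordered[rank - 1]
```

A perfect fit gives mCOE = 1, while a fit that is no better than always predicting the observed mean gives mCOE = 0.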

Illustrative results: Melbourne and Darwin
Fits of Exponential, Gamma, and Generalised Pareto 2 and 3 parameter (GPT2 and GPT3) TDFs for Melbourne and Darwin are shown in Figs. 2 and 3 for nine randomly selected days at each station. These distributions were fitted using the LS method (except for the Gamma distribution, which used product moments, PM). It is clear that for some days (for example 2 January 1970 at Melbourne) there is little difference between the quality of fit for the various TDFs, while for others there is a significant difference. This is largely controlled by the skewness of the rainfall intensity distribution on the particular day, with the GPT2 and GPT3 distributions being more flexible in terms of matching the variations in skewness. There also appears to be a wider range in observed distribution shapes at Melbourne than at Darwin. These figures give a qualitative idea of the range in fit quality. More quantitative results of the fitting for Melbourne and Darwin are shown in Figs. 4 and 5. The charts are paired (referred to as "chart-pairs"), showing fitted versus observed I_HI (top) and RMSE (bottom). A number of additional statistics are provided with these plots, as described in detail in each figure caption. The charts in Fig. 4 facilitate comparison of fitting skill using PM for three different TDFs (LGN3, GAMA, and GEV) at two locations: Melbourne (left) and Darwin (right). The charts in Fig. 5 show results for Darwin. They compare the fitting skill achieved by the three different parameter estimation methods (L-moments (LM), PM and LS) and also show the improvement in fit when an additional degree of freedom is available, i.e. GPT3 (right) versus GPT2 (left). In summary, these two figures show that fitting skill varies as a function of: (i) TDF; (ii) location; (iii) fitting method; and (iv) number of TDF parameters.

Fit results for various TDFs
In Fig. 4 the amount of scatter around the line-of-perfect-agreement is greater for the lognormal fit than for either the gamma or GEV distributions, and this is the case for both Melbourne and Darwin. The mCOE statistics support this observation, with the lognormal statistic more than 10 % lower than that of either of the other TDFs. The variation in RMSE is of a similar magnitude for Melbourne; that is, the 90th percentile RMSE is 2.8 ± 15 %, with the lognormal TDF at the upper end of this range. In contrast, the RMSE values in Darwin vary over a much wider range, with the lognormal fit (RMSE90 = 15.4) clearly inferior compared with the other two TDFs (RMSE90 = 7.6 and 8.3). This suggests that location-related differences in fitting skill may be important. The difference most likely arises because Darwin receives much heavier rainfall than Melbourne: approximately three times heavier if the median or 90th percentile I_HI values are used as the basis of comparison (e.g. median I_HI is 30.2 mm h⁻¹ in Darwin compared to 9.5 mm h⁻¹ in Melbourne). Indeed, the RMSE90 values for the GAMA and GEV distributions are threefold larger in Darwin than in Melbourne, while the LGN3 value is fivefold higher (suggesting that LGN3 fits get poorer as rainfall intensity increases in general).
Given that the elevated RMSE values for Darwin are driven by the higher rainfall intensity of monsoonal events, should the data be normalised (e.g. by I_HI) so as to facilitate comparison between stations (i.e. RMSE calculated for non-dimensional results)? It is the authors' opinion that this was not necessary, as the objective of this work was to examine TDF fits at each station, not between stations. For this task RMSE based on unscaled rainfall intensity data was suitable, and has the added advantage of indicating the error magnitude in units [mm h⁻¹] that are readily comprehended (for example, an RMSE of 1.0 mm h⁻¹ has more physical meaning than a normalised RMSE of 0.1). Thus, from the RMSE data in Fig. 4 it can be concluded that: (i) GAMA and GEV in Darwin and Melbourne have superior performance to LGN3; (ii) RMSE90 values computed for Darwin are more than double those in Melbourne; and (iii) Darwin experiences events having far higher intensity than Melbourne (i.e. many events where the observed I_HI exceeds 20 mm h⁻¹, putting result (ii) into context).
A final point to note from the fitted versus observed plots in Fig. 4 is that both the GAMA and GEV TDFs tend to slightly underestimate I_HI at higher observed values of I_HI. This is indicated by the negative bias and by the dashed regression lines lying consistently below the line-of-perfect-agreement. Consequently, runoff and erosion predictions using the fitted TDFs would tend to be underestimated compared with those based on the observed data.

Impact of fitting method and number of TDF parameters
Figure 5 illustrates two trends in fitting skill: first, product moments are more successful than L-moments, while LS is the best of the three; and second, the extra degree of freedom available to GPT3 noticeably improves the fitting indices. The best fit is shown by the chart-pair at the bottom right (GPT3_LS). It is interesting to note that the middle-right chart-pair (GPT3_PM) has a very similar fit to the bottom-left chart-pair (GPT2_LS). Given this result, two conflicting conclusions can be drawn regarding the value of the additional degree of freedom available to GPT3 over GPT2. The advantage of the third parameter is most evident in the product moment fits (middle chart-pairs), with the fit statistics for GPT3 far better than those of GPT2 (mCOE = 0.891 compared with 0.741, and RMSE90 = 1.91 compared with 2.56). However, looking at the bottom chart-pairs (LS fit), the improvement offered by the third parameter is smaller (mCOE = 0.928 compared with 0.884, and RMSE90 = 1.58 compared with 2.03). The optimisation provided by the LS process narrows the gap between the GPT2 and GPT3 goodness-of-fit to such an extent that the value of the third parameter must be questioned.
To summarise: the GPT3_LS combination clearly provides the best fit of the combinations shown in Fig. 5 (and in fact later figures show this to be the case across all the pluviograph stations). However, the GPT2_LS combination should not be ruled out at this point, as the fit is only marginally poorer but is achieved with one fewer model parameter. Using one fewer parameter should lead to less parameter uncertainty and thus to reduced uncertainty in the estimated rainfall intensity.
In the present analysis it is not possible to decide whether fewer model parameters are more desirable than maximising the potential goodness-of-fit; this will be a question for work that follows this TDF selection study (i.e. attempting to predict TDF parameter values from daily climate measurements). However, it is an important consideration in the selection process: the candidates retained should include not only the best fitting TDF but also TDFs with two rather than three parameters.
Figures 4 and 5 looked at specific results for two pluviograph stations and illustrate the meaning of the goodness-of-fit indices (mCOE and RMSE90). Figure 6 summarises these goodness-of-fit results for all 42 stations using two sets of box plots (mCOE top, RMSE90 bottom). Three boxes are shown for each of the nine TDFs, one for each fitting method (see definitions in the figure legend). The results shown in Fig. 6 were the primary tool for ranking the fitting methods and TDFs. Note that LS fitting to GAMA and LGN3 caused technical problems and hence results for these cases do not appear in Fig. 6. The impediments to LS calculation in these cases are as follows. For GAMA an analytic CDF is unavailable and so an iterative numerical solution was required instead. Computation times became excessive when LS was attempted using the pattern search algorithm coupled with the numerical solution to the GAMA CDF. In the case of LGN3, estimation of the location parameter was not robust, with the denominator of the algorithm tending toward zero under some conditions. This problem could be avoided by imposing a number of constraints on the location parameter. However, given the poor fitting performance of LGN3 obtained with L-moment and product moment estimation, it was felt that the TDF was unlikely to be selected and hence the effort required to implement an LS solution was not justifiable.

Ranking the fitting methods
The trends previously observed for individual pluviographs are reinforced by the results in Fig. 6. These show that the LS method produces consistently higher mCOE values (top) and smaller RMSE90 values (bottom) than either of the other fitting methods. Furthermore, in all cases both the magnitude of the statistic is better and the range of values is smaller. The reduction in range implies that LS improves the poorest fits to a greater extent than the better fits, with the fit at all stations an improvement over those achieved using other fitting methods.
The superiority of the LS fitting is founded on the success of the PM fit in that the PM parameter values were used to initialise the LS optimisation. The PM fits in Fig. 6 [...] minimise RMSE values. It is noteworthy that mCOE values are also substantially improved by the LS process by comparison with the mCOE values achieved using the PM approach (see especially GEV and GPT2 results, top of Fig. 6).
On the basis of these observations it is clear that the LS method represents the best fitting method, followed by PM and then the LM method. Thus, the first conclusion that this study draws is with regard to fitting method: candidate TDFs should first be fitted by PM and then optimised by LS to obtain the highest fitting skill as measured by mCOE and RMSE90.
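The two-stage procedure (product moments first, then least squares) can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation. It uses the exponential distribution with location ξ and scale α because its quantile function is analytic, assumes the RMSE is taken between the sorted observed intensities and the fitted quantiles at Weibull plotting positions i/(n+1), and uses a simple compass-style pattern search. The function names and the sample intensities are hypothetical.

```python
import math

def exp_quantile(p, xi, alpha):
    """Quantile of an exponential distribution with location xi and scale alpha."""
    return xi - alpha * math.log(1.0 - p)

def rmse(params, observed_sorted):
    """RMSE between sorted observations and fitted quantiles (assumed form)."""
    xi, alpha = params
    if alpha <= 0.0:
        return math.inf  # reject invalid scale values
    n = len(observed_sorted)
    total = 0.0
    for i, x in enumerate(observed_sorted, start=1):
        p = i / (n + 1.0)  # Weibull plotting position (an assumption)
        total += (exp_quantile(p, xi, alpha) - x) ** 2
    return math.sqrt(total / n)

def product_moment_fit(data):
    """Product moment estimates: mean = xi + alpha and sd = alpha."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return [mean - sd, sd]

def pattern_search(objective, start, step=0.5, tol=1e-6, max_iter=10000):
    """Compass search: try +/- step on each parameter, halve step on failure."""
    best = list(start)
    f_best = objective(best)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for k in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[k] += delta
                f_trial = objective(trial)
                if f_trial < f_best:
                    best, f_best, improved = trial, f_trial, True
        if not improved:
            step /= 2.0
    return best, f_best

# Hypothetical wet-period 6-min intensities (mm/h) for one rain day.
intensities = sorted([1.2, 1.5, 1.8, 2.4, 3.0, 3.1, 4.6, 6.0, 9.5, 18.0])
pm_params = product_moment_fit(intensities)
pm_rmse = rmse(pm_params, intensities)
ls_params, ls_rmse = pattern_search(lambda p: rmse(p, intensities), pm_params)
print("PM:", pm_params, pm_rmse)
print("LS:", ls_params, ls_rmse)
```

Because the search only accepts moves that lower the objective, the LS result can never have a higher RMSE than the PM starting point, which mirrors the reasoning given for the ranking of the fitting methods.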

Ranking the TDFs
In this section the focus is on ranking the fit provided by the nine TDFs. The objective was to reduce the number of candidates from nine down to the best three or four TDFs, with the ultimate aim of using these in a subsequent study to predict the parameters of these TDFs from daily climate variables.
The reason for identifying multiple candidate TDFs, rather than only the TDF with the best fit, is that the parameter values of some TDFs may be more amenable to prediction than others. One reason (highlighted earlier) is that a two-parameter TDF may be more identifiable (i.e. able to be predicted) than a three-parameter TDF. A second possibility is that the parameters of one TDF may be more identifiable than the parameters of another TDF. For example, it may be that the two parameters of EXP are more readily predicted than the two parameters of GPT2, due perhaps to different structural relationships between the TDF parameters and the statistics of the distribution (i.e. mean, variance, skewness, and kurtosis).

Elimination: lognormal TDFs
The first two eliminations are straightforward. The skill shown by the LOGN and LGN3 distributions is clearly poorer than that of the other TDFs. Hence, LOGN and LGN3 were eliminated as candidate distributions.

Elimination: Extreme Value Distributions
The performance of the three EVDs (GEV, GMBL and WEBL) is mixed (consider results for fitting by LS). The box plots for GEV and WEBL are second only to GPT3, while the skill of the GMBL is not as good as that of the other TDFs (median mCOE is less than 0.9 and median RMSE90 is greater than 1.0). The good results for the GEV and WEBL support the notion that EVDs are suitable for representing within-day rainfall intensity distributions. However, the skill shown cannot be considered exceptional in that the EVDs' fit is inferior to GPT3 (at all but one station). Thus, on balance, the case for selecting an EVD is not considered strong enough, given the concern that within-day rainfall is not a classic extreme value distribution. Hence, the decision was taken to exclude GEV, GMBL and WEBL distributions from further consideration.

Variability with location
One factor that cannot be discerned from Fig. 6 is whether fitting skill varies with location. To understand how much of an influence location has, two questions were asked:
- Which TDF fits best at each pluviograph station?
- Can spatial trends in the goodness-of-fit statistics be discerned?
The results shown in Fig. 6 suggest that GPT3 provides the best fit to the data. However, because the range of mCOE and RMSE90 values overlaps with the box plots of other TDFs, it is possible that at particular stations one of the other TDFs yields a better fit. Thus, on a station-by-station basis the TDF and fitting method showing the highest mCOE and, independently, the lowest RMSE90 were identified. The aggregated results are summarised in Table 4 and demonstrate that GPT3 LS is unequivocally the best fitting TDF, with GPT2 and WEBL distributions providing a lower RMSE90 result at only 3 and 1 pluviometer stations, respectively.

Fig. 8. Box plots showing the marginal error associated with choosing a TDF and fitting method other than the best available combination at each pluviograph. Results are shown for fits by product moments and by least squares; boxes span the 25th to 75th percentiles with the median marked, whiskers extend to 1.5 times the interquartile range, and points beyond this are plotted as outliers. Error is very low for the GPT3/MLE combination as at most locations this combination exhibits the highest fitting skill (therefore zero error). The upper box plot indicates the spread of error for mCOE and the lower plot the error in RMSE90. The most attractive TDF and fitting method combinations are those displaying low values in both the upper and lower plots. Note that while outliers are identified above, this does not imply data were removed from any subsequent analysis.
The fact that GPT3 consistently provides the best fit, rather than different TDFs being better at different locations, suggests that GPT3 has sufficient flexibility to accommodate a range of within-day rainfall intensity distributions, and perhaps that the shape of rainfall CDFs does not vary strongly with location. The answer is probably a combination of both factors, with GPT3 LS clearly the first choice distribution for fitting within-day rainfall intensity.
While GPT3 provides the best fit across almost all the stations, the next question is whether the level of fitting skill varies systematically with location. To examine this possibility, maps showing the spatial variation of mCOE and RMSE90, such as Fig. 7, were constructed. Symbol size on these maps indicates the goodness-of-fit, with larger circle diameters indicating a poorer fit (i.e. low mCOE or high RMSE90). Maps were constructed for the four TDFs not yet eliminated (GPT3 LS, GPT2 LS, EXP LS and GAMA PM) and, from a qualitative visual inspection, the pattern of circle sizes looked similar for each TDF. One pattern observed by the authors was that larger RMSE90 values were concentrated in the North-East and lower values in the South and South-West. This is a similar spatial pattern, albeit with a larger proportional difference between the high and low values, to the pattern of mean wet period rainfall intensities, and it reflects the higher magnitude of rainfall intensity in the North of Australia (as discussed with respect to Melbourne and Darwin earlier). To investigate whether this clustering could be quantified, spatial statistics were employed.

Final ranking
To rank the remaining four TDFs (EXP, GAMA, GPT2 and GPT3) some additional statistics were calculated which focus on the marginal error associated with selecting one TDF over another, rather than simply looking at the magnitude of mCOE and RMSE90. Marginal error is defined as the difference between the best performing TDF and the TDF of interest on a station-by-station basis. That is, for the ith TDF (TDF i) at a given pluviograph station:
- marginal error in mCOE for TDF i: $\max_j(\mathrm{mCOE}_j) - \mathrm{mCOE}_i$
- marginal error in RMSE90 for TDF i: $\mathrm{RMSE90}_i - \min_j(\mathrm{RMSE90}_j)$
where j runs over the TDF and fitting method combinations at that station, so that both errors are zero for the best combination.

The box plots shown in Fig. 8 depict the range of each marginal error statistic across all the pluviometer stations with the TDFs fitted by both product moments and LS. Note that the error for GPT3 LS is zero or close to zero in both the upper and lower charts because at most stations it gives the best fit. Consider the mCOE results first (top of Fig. 8). These results reinforce the fact that GPT3 LS is the best-fit benchmark, followed by GPT2 LS and EXP LS, which exhibit the next lowest (and very similar) marginal error magnitudes. Turning to the lower box plot, the GPT3 LS result is followed by GPT2 LS, then GAMA PM and then EXP LS.

Based on their performance as measured by the goodness-of-fit statistics mCOE and RMSE90, we suggest that the two best performing TDFs were GPT3 and GPT2, where GPT3 has a slightly better fit but GPT2 has the advantage of only two parameters. In selecting between two and three parameter distributions there is likely a trade-off between higher bias in the two parameter distribution (due to less flexibility) and higher uncertainty in parameter estimation in the three parameter distribution. The main advantage of GPT2 over GAMA and EXP is that it outperforms them at the higher intensities. Although the GPT3 distribution clearly provided the best fit, the performance penalty for choosing the GPT2, exponential or gamma distributions is only small. Therefore, it would be incorrect to interpret their ranking below GPT3 as a recommendation against their utility; in point of fact, because they rely on only two parameters they are viewed as quite attractive options.
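The marginal error statistic used in this section can be illustrated with a short sketch. The code below assumes that the error is expressed as a non-negative quantity (best mCOE minus the candidate's mCOE, and the candidate's RMSE90 minus the best RMSE90), which is consistent with the best combination having zero error. The function name and all scores are hypothetical and are not values from the study.

```python
def marginal_errors(station_scores):
    """Marginal error of each TDF/fitting combination at one station.

    station_scores maps a combination name to a (mCOE, RMSE90) tuple.
    Returns a dict mapping each name to (mCOE error, RMSE90 error).
    """
    best_mcoe = max(m for m, _ in station_scores.values())  # highest is best
    best_rmse = min(r for _, r in station_scores.values())  # lowest is best
    return {
        name: (best_mcoe - m, r - best_rmse)
        for name, (m, r) in station_scores.items()
    }

# Hypothetical scores for a single pluviograph station.
scores = {
    "GPT3_LS": (0.98, 0.40),
    "GPT2_LS": (0.96, 0.55),
    "EXP_LS": (0.95, 0.70),
    "GAMA_PM": (0.93, 0.60),
}
errors = marginal_errors(scores)
for name, (e_mcoe, e_rmse) in errors.items():
    print(f"{name}: mCOE error={e_mcoe:.2f}, RMSE90 error={e_rmse:.2f}")
```

Repeating this for every station and collecting the errors per combination gives the kind of data that box plots such as those in Fig. 8 summarise.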
It is worth briefly discussing the results from a more theoretical perspective. First, the GPT3 (and by inference its special cases) is a peak-over-threshold distribution, which matches the analysis undertaken here, albeit with a low threshold. Also, GPT2 and EXP are both special cases of GPT3, with GPT2 being equivalent to GPT3 with the location parameter set to zero and EXP being equivalent to GPT3 with κ = 0 (Claps and Liao, 2003). Some inferences can be made from the fitted parameters for GPT3. First, κ < 0, κ = 0 and κ > 0 imply light, normal and heavy tailed distributions respectively. Light tails are not expected as they imply an upper bound, which is unlikely for rainfall intensity. We examined the results from both Melbourne and Darwin and found that κ varied from slightly negative (−0.23 and −0.24, respectively) to strongly positive (a few values >1 and >2, respectively), with the average being 0.11 and 0.15, respectively. This indicates a slight tendency towards heavy tailed distributions. Second, the location parameter, ξ, for GPT3 can be interpreted as a threshold above which the distribution holds. We thresholded the data at 1 mm/h before fitting the distributions. For Melbourne and Darwin respectively, we found that 35 % and 7 % of fitted ξ values exceeded 1 mm/h but only 1 % and 2 % exceeded 2 mm/h, which indicates that our thresholding was at a reasonable value from the perspective of our fitting of GPT3.
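The nesting of EXP and GPT2 within GPT3 can be checked numerically. The sketch below assumes the parameterisation F(x) = 1 − [1 + κ(x − ξ)/α]^(−1/κ), which is the sign convention consistent with the statement that κ < 0 implies an upper bound; some references define κ with the opposite sign, so the convention should be confirmed before reusing fitted values. The function names and parameter values are illustrative only.

```python
import math

def gpt3_cdf(x, xi, alpha, kappa):
    """GPT3 CDF under the assumed convention (kappa > 0 is heavy tailed)."""
    if x <= xi:
        return 0.0
    z = (x - xi) / alpha
    if abs(kappa) < 1e-12:
        return 1.0 - math.exp(-z)  # EXP special case (kappa = 0)
    base = 1.0 + kappa * z
    if base <= 0.0:
        return 1.0  # at or beyond the upper bound xi - alpha / kappa (kappa < 0)
    return 1.0 - base ** (-1.0 / kappa)

def gpt2_cdf(x, alpha, kappa):
    """GPT2 as the special case of GPT3 with the location set to zero."""
    return gpt3_cdf(x, 0.0, alpha, kappa)

# EXP limit: a very small kappa should be close to the kappa = 0 result.
exp_value = gpt3_cdf(3.0, 1.0, 2.0, 0.0)
near_exp = gpt3_cdf(3.0, 1.0, 2.0, 1e-6)

# Light tail: kappa < 0 gives an upper bound at xi - alpha / kappa.
upper_bound = 1.0 - 2.0 / (-0.5)
print(exp_value, near_exp, upper_bound, gpt3_cdf(upper_bound, 1.0, 2.0, -0.5))
```

With ξ = 1 mm/h, α = 2 mm/h and κ = −0.5 the distribution is bounded above at 5 mm/h, which illustrates why strongly negative κ values would be physically implausible for rainfall intensity.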

Overview of within day intensity behaviour
The TDFs essentially represent three aspects of the within-day distribution of 6-min rainfall intensity: the mean, standard deviation and skewness. In addition, the wet fraction parameter represents the duration of rainfall within the day exceeding the 1 mm/h intensity threshold. The data are discussed in terms of these standard statistical parameters rather than the GPT3 distribution parameters for clarity of interpretation. In addition, the behaviour of the highest intensities, represented by I HI, is considered. To understand how these parameters vary between rainfall stations an exploratory analysis was undertaken and the existence of relationships with Köppen climate zone, annual rainfall depth, annual rain days, mean rain day rainfall depth, elevation, and latitude considered. The relationship between the within-day statistics and daily rainfall amount was also examined.
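As an illustration of the statistics discussed in this section, the sketch below computes them for a single day of 240 six-minute intensities. It assumes that the wet period consists of intervals strictly above the 1 mm/h threshold, that population (divide by n) moments are used, and that I HI is the mean of the five highest 6-min intensities, which need not be consecutive. The function name and the synthetic day are hypothetical.

```python
import math

def within_day_stats(intensities, threshold=1.0, n_high=5):
    """Wet-period statistics for one day of 6-min intensities (mm/h)."""
    wet = [x for x in intensities if x > threshold]
    if not wet:
        raise ValueError("no intervals exceed the intensity threshold")
    n = len(wet)
    mean = sum(wet) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in wet) / n)
    skew = sum((x - mean) ** 3 for x in wet) / n / sd ** 3 if sd > 0 else 0.0
    top = sorted(intensities, reverse=True)[:n_high]
    return {
        "wet_fraction": n / len(intensities),
        "mean": mean,
        "sd": sd,
        "cv": sd / mean,
        "skewness": skew,
        "i_hi": sum(top) / n_high,
    }

# Synthetic day: 200 dry intervals, 30 at 2 mm/h, 5 at 10 mm/h, 5 at 20 mm/h.
day = [0.0] * 200 + [2.0] * 30 + [10.0] * 5 + [20.0] * 5
daily_depth_mm = sum(x * 0.1 for x in day)  # each interval lasts 0.1 h
stats = within_day_stats(day)
print(round(daily_depth_mm, 1), stats)
```

The synthetic day has a depth of about 21 mm, so it would pass the 10 mm daily rainfall criterion used to select rain days in this study.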
Figure 9a shows box plots of daily mean wet period intensity. Latitude and station numbers are shown on the x-axis. Boxes are organised by latitude from south to north and are coloured by Köppen climate zone. Similar figures were drawn for each of the explanatory variables and each of the statistics. Box order was varied, both according to Köppen class first and then the explanatory variable, and also according to the explanatory variable alone (as in Fig. 9). This enabled assessment both of differences between Köppen classes and of trends with each of the explanatory variables. All the examples shown use latitude as the explanatory variable as it consistently showed the strongest relationship with the rainfall behaviour. There are, however, significant correlations between the explanatory variables, most notably latitude and Köppen class, so attributing the behaviour to a particular explanatory variable is difficult.
Figure 9a shows a trend of increasing rainfall intensity towards the equator, particularly for latitudes less than 30° S. A similar but noisier pattern was observed with wet day mean rainfall depth (annual rainfall/annual rain days). By considering the groups of colours in Fig. 9a, differences between Köppen classes become evident. It is also clear from the rapid expansion in inter-quartile range compared with the median that the between-day variability in within-day intensity distributions becomes larger towards the equator, especially below a latitude of about 30° S. Very similar patterns of behaviour were evident for the within-day wet period standard deviation (not shown) of 6-min intensities and also for I HI (Fig. 9c).
The one site that is a consistent and significant exception to the above trends is Koombooloomba (31083). This site is located on the Great Dividing Range near Cairns, Queensland. This is an area with extremely high rainfall gradients associated with orographic effects acting on the prevailing easterly winds blowing off the Pacific Ocean and up the escarpment of the Great Dividing Range. The site is at 760 m, and the terrain rises from near sea level (∼20 m) over the 15 km east (i.e. upwind) of the site. No other sites in the data set are subject to orographic effects even approaching this magnitude.
Figure 9b shows that the coefficient of variation of within day wet period 6-min intensity grows smoothly with latitude, although the proportional change across the continent is smaller than for any of the mean, standard deviation or I HI. It can be concluded from this trend that the standard deviation grows more quickly than the mean towards the equator. Again, weaker patterns were observed with mean wet-day rainfall and with Köppen class. The inter-quartile range in CV remains approximately constant across all stations. Skewness (not shown) was observed to be very consistent between stations with an inter-quartile range from about 1.1 to 2.4 and a median of 1.7. The wet fraction tends to decrease towards the equator but has a slightly higher inter-quartile range in the intermediate latitudes considered (Köppen zones BWh, BSh, Cfa and Cwa). The opposing trends in intensity and wet period partially offset each other in terms of daily rainfall accumulation, although there is an increasing trend in daily rainfall accumulation towards the equator. Taken together, the changes in within-day statistical behaviour of rainfall intensity probably reflect a shift in domination from frontal rainfall systems to convective rainfall systems towards the equator. The changes in inter-day variability (inter-quartile range) of the wet fraction possibly reflect a mix of frontal and convective systems in the intermediate latitudes, with increasing dominance of frontal systems in southern Australia and convective systems in northern Australia. In interpreting these data it should be remembered that they only reflect days with rainfall accumulations greater than 10 mm, which account for most of the rainfall at these sites.

Hydrol. Earth Syst. Sci., 15, 2561-2579, 2011, www.hydrol-earth-syst-sci.net/15/2561/2011/
Figure 10 shows how the within day statistics of rainfall vary with daily rainfall amount for the various Köppen climate classes. Individual boxes represent all days within a 10 mm range in daily rainfall, beginning with the 10-20 mm range. As daily rainfall amount increases there is an increase in mean intensity (Fig. 10a) and also standard deviation and skewness (not shown) for all climate zones. These combine to result in a proportionally greater increase in the highest intensities observed during the rain day (Fig. 10c). The most northerly Köppen zones (Aw and BSh, see Fig. 1) show the highest mean intensities and also the greatest inter-day variability in mean intensity for a given daily rainfall accumulation, while the most southerly zones (BSk, Csa, Csb, Cfb) show the lowest intensity and inter-day variability. This indicates that the trends in intensity with latitude are not just due to differing daily rainfall accumulations. The coefficient of variation shows interesting behaviour with daily accumulation, first increasing, then reaching a plateau or beginning to decrease. This behaviour results from the changes in standard deviation, which increases with daily rainfall accumulation but tends to asymptote towards constant behaviour at large daily rainfalls. Skewness shows similar patterns to standard deviation but the changes are less pronounced. The wet fraction (Fig. 10d) shows an almost linear growth with daily rainfall accumulation, as does the inter-day variability (inter-quartile range) in wet fraction. Considering both the mean intensity and wet fraction together, it is clear that most of the increase in daily rainfall accumulation is due to growing rainfall duration rather than increases in intensity.
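The grouping of rain days into 10 mm bins, as used for Fig. 10, can be sketched as follows. The example assumes each day is supplied as a (daily depth, statistic) pair, labels each bin by its mid-point, and drops bins holding fewer than 10 days, following the rules stated in the caption of Fig. 10. The function name and the sample data are hypothetical, and the median stands in for the full box plot summary.

```python
from statistics import median

def bin_by_daily_rainfall(days, bin_width=10.0, lower=10.0, min_days=10):
    """Median of a within-day statistic per daily rainfall bin.

    days is a list of (daily_depth_mm, statistic) pairs. Bins start at
    `lower` and are `bin_width` wide; bins with fewer than `min_days`
    days are omitted. Keys of the result are bin mid-points in mm.
    """
    bins = {}
    for depth, value in days:
        if depth < lower:
            continue  # below the rain-day threshold
        index = int((depth - lower) // bin_width)
        bins.setdefault(index, []).append(value)
    return {
        lower + (index + 0.5) * bin_width: median(values)
        for index, values in sorted(bins.items())
        if len(values) >= min_days
    }

# Hypothetical data: 12 days in the 10-20 mm bin, 3 days in the 20-30 mm bin.
sample_days = [(12.0 + 0.5 * i, float(i)) for i in range(12)]
sample_days += [(25.0, 3.0)] * 3
binned = bin_by_daily_rainfall(sample_days)
print(binned)
```

Here only the 10-20 mm bin survives, because the 20-30 mm bin holds just three days and falls below the minimum of ten.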
It is clear from the relationships shown in Figs. 9 and 10 that the parameter values for the intensity distributions will change with both latitude and the amount of rainfall on a given day. Both these factors could be incorporated into a predictive model for the parameters that is based on location and daily rainfall depth. However, the results in Fig. 9 also indicate that there is considerable variability between days with similar amounts of rain at a station, which suggests it may also be valuable to explore other predictors.

Summary and conclusions
This study was conducted as a precursor to a detailed investigation into the question of whether within-day rainfall characteristics and intensity distributions can be inferred from daily measurements of climatic variables. Given this context the study focussed primarily on identifying the most appropriate theoretical distribution function(s) with which to represent within-day rainfall intensities. In respect of this aim, the analysis demonstrated that the three-parameter Generalised Pareto Distribution provides the best fit, followed by the two-parameter Generalised Pareto, Exponential and Gamma distributions. The ranking was made on the basis of performance with respect to two objective functions: the root mean square error of the fitted theoretical distribution compared to the measured within-day pluviograph data; and the fitted versus the mean of the measured 5 highest 6-min rainfall intensities across the day, I HI, where the intervals did not have to be consecutive.

In addition to these specific conclusions, the study provides a range of other more general insights into the nature of within-day rainfall intensity data and information on fitting distribution functions to it.
- Parameter Estimation Methods: Fitting theoretical distribution functions using L-moment methods was found to be consistently inferior to the standard product moment method. The best fit was achieved by first estimating parameter values by product moments, then improving the fit performance using an optimisation to minimise root mean square error (Eq. 1).
- Variability of Fit Performance with Location: The importance of location in fitting a theoretical distribution function was found to be small, with the same distribution (GPT3) being consistently identified as best performing between sites. However, the root mean square error statistic was noted to increase as rainfall intensity increased.
- Implications of Distribution Function Ranking: The relatively poor fit of the lognormal (2 and 3 parameter) distribution functions suggests that they should not be used as the basis for modelling within-day rainfall patterns.
- Extreme Value Distributions: The GEV and Weibull distributions (and to a lesser degree the Gumbel distribution) provided fits to the within-day rainfall data of a quality that approaches but does not exceed that of the GPT3 distribution. Given that the extreme value distributions provide no clear performance advantage, coupled with the doubt over the validity of using them to describe within-day rainfall data, it is recommended that extreme value distributions not be used for this purpose.
It is important to note that in absolute terms the quality of the calibrated TDF fits to the measured rainfall intensity data is very high. This suggests that the TDFs are an excellent means to summarise the distribution of within-day data (240 points) by only 2 or 3 TDF parameter values plus the wet fraction statistic (giving a 3 or 4 parameter model).
The analysis has also provided insight into the within-day statistical behaviour of rainfall and the inter-day variation in this behaviour. Clear trends with latitude (increasing across the continent towards the equator) were identified for key within-day statistical properties including the mean, standard deviation and coefficient of variation of wet period 6-min intensity, and maximum intensities (I HI). Mean intensity, standard deviation and maximum intensities also became more variable between days for locations closer to the equator. Skewness remained approximately constant. The duration of rainfall during rain days tended to decrease towards the equator. Trends with daily rainfall accumulation demonstrated increases in mean, standard deviation and maximum intensities, more complex behaviour for the coefficient of variation and skewness, and strongly increasing rainfall duration. Most of the difference in daily accumulation is due to duration rather than intensity changes. The spatial trends in within-day rainfall behaviour are believed to be linked to a shift in dominance of frontal and convective rainfall mechanisms across the continent.

Fig. 2. Fitted cumulative distribution functions (CDFs) of EXP, GPT2, GPT3 and GAMA for nine events representative of varying daily rainfall depths for Melbourne. TDFs were fitted using the LS technique.

Fig. 3. Fitted cumulative distribution functions (CDFs) of EXP, GPT2, GPT3 and GAMA for nine events representative of varying daily rainfall depths for Darwin. TDFs were fitted using the LS technique.

Fig. 4. Six paired scatter plots are shown to illustrate the skill of three selected TDFs: three parameter Lognormal (top); Gamma (middle); and the Generalized Extreme Value (lower). The data are for two pluviograph stations: Melbourne on the left and Darwin on the right. The plots are in pairs showing the fitted I 30 and the RMSE for each event (both plotted versus the measured I 30). Values for mCOE, bias and the square of Pearson's correlation coefficient (r 2), as well as the line-of-perfect-agreement (solid) and the linear regression line (dashed) are printed on the I 30 charts. The RMSE plots indicate the 50th (solid line) and 90th (dashed line) percentile RMSE and measured I 30 values, and also indicate the percentage of RMSE values that are greater than 16 mm h−1 and hence outside the vertical scale of the plot.

Fig. 5. Six paired scatter plots based on data from Darwin Airport are shown to illustrate first the relative skill of TDFs having two (GPT2, left side) or three parameters (GPT3, right side) and second the success of three different fitting schemes: L-moments (top); Product Moments (middle); and Least Squares Estimation (lower). The plots are in pairs showing the fitted I 30 and the RMSE for each event (both plotted versus the measured I 30). Values for mCOE, bias and the square of Pearson's correlation coefficient (r 2), as well as the line-of-perfect-agreement (solid) and the linear regression line (dashed) are printed on the I 30 charts. The RMSE plots indicate the 50th (solid line) and 90th (dashed line) percentile RMSE and measured I 30 values, and also show the percentage of RMSE values that are greater than 16 mm h−1 and hence outside the vertical scale of the plot.

Fig. 6. Box plots showing the spread of mCOE (upper chart) and RMSE90 (lower chart) values across the 42 pluviograph stations. The results indicate the spread of values associated with each of the nine TDFs and the three fitting methods (note that the least squares estimation technique was not able to be employed for the GAMA and LGN3 distributions). High fitting skill is indicated by mCOE values close to 1.0 and by RMSE90 values close to zero. Note that while outliers are identified above, this does not imply data were removed from any subsequent analysis.

Fig. 7. Maps indicating the spatial variation of mCOE (left) and 90th percentile RMSE (right) for GPT3 fitted using least squares estimation. Note that smaller circle sizes indicate a better fit (i.e. maximum mCOE and minimum RMSE).

Fig. 9. Box plots showing variation in daily mean wet period intensity, daily wet period intensity coefficient of variation, extreme intensity I HI, and wet fraction. Boxes are arranged from highest (most southerly) to lowest latitude and are labelled with station number and latitude. Colours show Köppen climate classification of the stations (see Fig. 1). All intensity statistics use 6-min data. Boxes show the inter-quartile range, whiskers extend 1.5 times the inter-quartile range and notches show confidence limits on the median.

Fig. 10. Box plots showing variation in daily mean wet period intensity, daily wet period intensity coefficient of variation, extreme intensity I HI, and wet fraction with daily rainfall accumulation for various Köppen climate zones. Daily rainfall has been categorised into 10 mm bins with a lower limit of 10 mm, i.e. bin 1 includes daily rainfalls of 10-20 mm. Only some bins are labelled to maintain clarity and labels represent the middle of the bin range. Note that boxes are only drawn where at least 10 days fall in the observation bins; as a result, bins are missing for the second highest accumulation amount in some cases and some observations exist above the maximum plotted box.

Table 1. Properties of the 42 study sites.

Table 3. Theoretical Distribution Functions tested for skill in fitting the CDF of within-day rainfall intensity. This table also indicates the short name assigned to each TDF, lists the parameters of the distribution and their function (scale, shape, or location), and indicates the source for relationships used in the fitting process (pages from Stedinger et al., 1993).

Table 4. Combination of TDF and fitting method with the highest fitting skill for mCOE and RMSE90 statistics, indicating the percentage of stations for which each combination is the best.