A Hybrid Stochastic Rainfall Model That Reproduces Rainfall Characteristics at Hourly through Yearly Time Scale

A novel approach of stochastic rainfall generation that can reproduce various statistical characteristics of observed rainfall at hourly through yearly time scale is presented. The model uses the Seasonal Auto-Regressive Integrated Moving Average (SARIMA) model to generate monthly rainfall. Then, it downscales the generated monthly rainfall to the hourly aggregation level using the Modified Bartlett-Lewis Rectangular Pulse (MBLRP) model, a type of Poisson cluster rainfall 10 model. Here, the MBLRP model is fine-tuned such that it can reproduce the fine-scale properties of observed rainfall. This was achieved by first generating a set of fine scale rainfall statistics reflecting the complex correlation structure between rainfall mean, variance, auto-covariance, and proportion of dry periods, and then coupling it to the generated monthly rainfall, which were used as the basis of the MBLRP parameters to downscale monthly rainfall. The approach was tested at the 29 gauges located in the Midwest to the East Coast of the Continental United States with a variety of rainfall characteristics. The results 15 of the test suggest that our hybrid model accurately reproduces the first through the third order statistics as well as the intermittency properties from the hourly to the annual time scale; and the statistical behaviour of monthly maxima and extreme values of the observed rainfall was well reproduced as well.


Introduction and Background
Most human and natural systems affected by rainfall react sensitively to temporal variability of rainfall across small (e.g.quarter-hourly) through large (e.g.monthly, yearly) time scale.Small scale rainfall temporal variability influences shortterm watershed responses such as flash flood (Reed et al., 2007) and subsequent transport of sediments (Ogston et al., 2000) and contaminants (Zonta et al., 2005).Large scale rainfall temporal variability influences long-term resilience of human-flood systems (Yu et al., 2017), human health (Patz et al., 2005), food production (Shisanya et al., 2011), and evolution of human society (Warner and Afifi, 2014) and ecosystems (Borgogno et al., 2007;Fernandez-Illescas and Rodriguez-Iturbe, 2004).
While the risk exerted by these impacts needs to be precisely assessed for the management of the systems, the observed rainfall record is oftentimes not long enough (Koutsoyiannis and Onof, 2001).Furthermore, the rainfall records do not exist when the risks need to be assessed for the future.For this reason, stochastic rainfall generators, which can generate Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-267Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 28 May 2018 c Author(s) 2018.CC BY 4.0 License.synthetic rainfall record with infinite length, have been frequently used to provide the rainfall input data to the modelling studies for the risk assessment.
The Poisson cluster rainfall generation model (Rodriguez-Iturbe et al, 1987;1988) is one of the most widely applied stochastic rainfall generators.Figure 1 shows a schematic of the Modified Bartlett-Lewis Rectangular Pulse (MBLRP) model, which is a typical Poisson cluster rainfall model.The model assumes that a series of rain storms (black circles) comprising a sequence of rain cells (red circles), arrives in time according to a Poisson process.Kim et al. (2013a) summarized the MBLRP model structure as follow: "In the MBLRP model, X1 [T] is a random variable that represents the storm arrival time, which is governed by a Poisson process with parameter λ [1/T]; X2 [T] is a random variable that represents the duration of storm activity (i.e., the time window after the beginning of the storm within which rain cells can arrive), which varies according to an exponential distribution with parameter γ [1/T]; X3 [T] is a random variable that represents the rain cell arrival time within the duration of storm activity, which is governed by a Poisson process with parameter β [1/T]; X4 [T] is a random variable that represents the duration of the rain cells.The distribution of the rain cell durations is known to have a long-tailed distribution (Rodriguez-Iturbe et. al., 1987), which was assumed to vary according to an exponential distribution with parameter η [1/T] that, in turn, is a random variable represented by a gamma distribution with parameters ν [T] and α [dimensionless]; and X5 [L/T] is a random variable that represents the rain cell intensity, which varies according to an exponential distribution with parameter μ [L/T].From the physical viewpoint, λ is the expected number of storms that arrive in a given period, 1/γ is the expected duration of storm activity, β is the expected number of rain cells that arrive within the duration of storm activity, 1/η is the expected duration of rain cells and μ is the average rain cell intensity.Parameters ν and α do not have a clear physical meaning, but the expected value and variance of η can be expressed as α/ν and α/ν2.Therefore, the model has six parameters: λ, γ, β, ν, α and μ; however, it is customary to use the dimensionless ratios φ = γ/η and κ = β/η as parameters instead of γ and β."As suggested by the figure, Poisson cluster rainfall models are designed to reflect the original spatial structure of rain storms containing multiple rain cells (Austin and Houze Jr., 1972;Olsson and Burlando, 2002), so they are good at reproducing 5 the first through the third order statistics of the observed rainfall at quarter-hourly through daily accumulation levels, as well as other hydrologically important statistics such as proportion of non-rainy period (Olsson and Burlando, 2002).The performance of the Poisson cluster rainfall models in reproducing the statistical properties of observed rainfall has been validated for various climates at numerous locations across the globe (Bo et al., 1994;Cameron et al., 2000;Cowpertwait, 1991;Cowpertwait et al., 2007;Derzekos et al., 2005;Entekhabi et al., 1989;Glasbey et al., 1995;Gyasi-Agyei and Willgoose;10 1997, Gyasi-Agyei;1999, Islam et al.;1990, Kaczmarska et al., 2014;;Khaliq and Cunnane, 1996;Kim et al., 2016;Kim et al., 2013b;Kim et al., 2014;Kossieris et al., 2015;Kossieris et al., 2016;Onof and Wheater, 1994a;Onof and Wheater, 1994b; Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-267Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 28 May 2018 c Author(s) 2018.CC BY 4.0 License.Onof and Wheater, 1993;Rodriguez-Iturbe et al., 1988;Rodriguez-Iturbe et al., 1987;Smithers et al., 2002;Velghe et al., 1994;Verhoest et al., 1997;Wasko et al., 2015).For this reason, they have been widely applied to assess the risks exerted on human and natural systems such as floods (Paschalis et al., 2014), water availability (Faramarzi et al., 2009), contaminant transport (Solo-Gabriele, 1998), and landslides (Peres and Cancelliere, 2014;2016).Recently, Poisson cluster rainfall models have also been used to generate future rainfall scenario under climate change (Kilsby et al., 2007;Burton et al., 2010;Fatichi et al., 2011).
In the meantime, Poisson cluster rainfall models have an intrinsic limitation derived from a fundamental model assumption.As described by Figure 1, they generate the rainfall time series assuming that the rain storms arrive according to a Poisson process, which assumes that rain storm occurrences are independent.In addition, the rain cells in different storms are independent with each other.These model assumptions deprive the model of the ability to reproduce the long-term memory of rainfall that is often observed in reality (Marani, 2003).
Let us introduce some notation.The aggregated process  (ℎ) at time-scale h hours is defined in terms of the continuous time process  by the equation: We can then write the variance at time-scale nh as: , where  ℎ is the variance of rainfall depths at scale h hours and  ℎ () the covariance of lag k at scale h hours.
The second term of the right-hand side of Equation 1, which represents the rainfall correlation between individual records separated by k time-steps of the time series of rainfall depths at scale h hours, is likely to be underestimated by the Poisson cluster rainfall model because it can only reproduce short-term memory in the rainfall signal through its model structure, i.e.
through the clustering of rain cells.The degree of underestimation will increase as the correlation between the individual records (  (ℎ) ) of the observed rainfall time series increases and as the aggregation level n increases.This underestimation was consistently observed in the rainfall data of the United States (Kim et al., 2013a).If ℎ = 1 in Equation 1, i.e. hourly rainfall, and  ≅720 (24hours/day x 30 days = 720 hours ≅ 1 month), the left-hand side of Equation 1 will represent the variance of monthly rainfall, which can be represented on the vertical axis of the box plots in Figure 2.
In Figure 2 greater than that of the blue box plots in general, which implies that the variability of the observed rainfall is systematically greater than that of the synthetic rainfall.The discrepancy between the two are shown as the gray shading in the figure.In addition, the monthly extreme values shown as star marks are also underestimated by synthetic rainfall.This is, in particular, caused by the aforementioned limitations of the Poisson cluster rainfall models.
Considering that the management strategies of the water-prone human and natural systems may be governed by the few extreme rainfall values observed in the shaded domain of Figure 2, the risk analysis based on the rainfall data generated by Poisson cluster rainfall models may miss the system behaviour that is crucial for development of the management plans.
As a matter of fact, other rainfall models have the similar issue: they cannot reproduce the temporal variability of observed rainfall across all time scales (Paschalis et al, 2014).For example, Markov chains, alternating renewal processes, and generalized linear models can reproduce the variability only at time scales coarser than one day.Models based on autoregressive properties of rainfall are typically good at reproducing the observed rainfall variability only for a limited range of scales, for instance from one month to a year or two (Mishra and Desai, 2005;Modarres and Ouarda, 2014;Yoo et al., 2016).Several studies discussed the need to use composite rainfall models to resolve this scale problem of rainfall models.This study proposes a composite rainfall generation model that can reproduce various statistical properties of observed rainfall at time scales ranging between one hour and one year.First, the model generates the monthly rainfall time series using the Seasonal Auto-Regressive Integrated Moving Average (SARIMA) model.Then, it downscales the generated monthly rainfall time series to the hourly aggregation level using a Poisson cluster rainfall model.Compared to the previous studies with similar methodology (Fatichi et al., 2011;Paschalis et al., 2014), our model has a novelty in that: (i) it models the monthly rainfalls so as to generate monthly statistics that will serve to calibrate the MLBRP model; (ii) each of the generated individual monthly rainfalls are downscaled using month-specific MLBRP model parameter sets that reflect the complex correlation structure of various rainfall statistics at fine time scale such as mean, variance, covariance, and proportion of dry periods.This distinctive approach of our model enables an accurate reproduction of the first through the third order statistics as well as the proportion of dry periods from the hourly to the annual time scale; and the statistical behaviour of monthly maxima and extreme values of the observed rainfall is well reproduced.

Study Area
Figure 3 shows the study area, which encompasses the Midwest to the East Coast of the Continental United States.
We used the National Climatic Data Centre (NCDC) hourly rainfall data observed at 29 gauge locations (triangles in Figure 3) for the period between 1981 and 2010.The study area has a variety of rainfall characteristics (Kim et al., 2013b).The northern, middle, and the southern part of the study area are classified as Humid Continental (warm summer), Humid Continental (cool summer), and Humid Subtropical climate, respectively, according to the Köppen Climate Classification (Köppen, 1900;Kottek, 2006).The annual rainfall of the study area varies from 750 mm to 1500 mm.

Monthly Rainfall Generation
We applied the Seasonal Auto-Regressive Integrated Moving Average (SARIMA) model to generate monthly rainfall.
Generation of monthly rainfall based on Autoregressive relationship has been widely applied due to its parsimonious nature 5 (Mishra and Desai, 2005) and was proven to successfully reproduce the first through the third-order statistics of the observed rainfall at monthly time scale (Delleur and Kavvas, 1978;Katz and Skaggs, 1981;Ü nal et al., 2004;Mishra and Desai, 2005).
Rainfall data of different stations have different temporal persistence, so we applied the SARIMA model with different autoregressive(p), differencing(d), and moving average terms(q) to different stations.The choice of the optimal model for each station was determined through the following processes: First, a model structure of SARIMA(p, d, q)(P, D, Q)m is 10 assumed, where P, D, Q represent the numbers of seasonal autoregressive, differencing, and moving average terms, respectively, and m represents the number of periods (here, months) in each seasonhere  = 12.Second, the parameters of the SARIMA model are determined through the method of maximum likelihood.Third, the the Bayesian Information Criterion (BIC) are calculated for the fitted SARIMA model.Lastly, the first to third steps are repeated for a combination of different values of p (0p2), d (0d2), q (0q2), P (0P2), D (0D2), and Q (0Q2), and the model structure with the lowest BIC is selected for the station.Therefore, a total of 729 (=3 6 ) SARIMA model structures were tested to obtain the optimal model for a station.The optimal model structure and the BIC values were shown in Figure 3. Through this process, we generated 200 years of monthly rainfall for the 29 gauges.

Generation of fine time scale rainfall statistics
The second module generates the fine time scale (1hour through 16 hours) statistics corresponding to each monthly rainfall value generated through the SARIMA model.In so doing we are now considering the monthly rainfall, when divided by the number of days in the month times 24, as providing us with an estimate of the mean rainfall for that particular month.
The second module consists of univariate regressions and functional relations linking the mean hourly rainfall to the other statistics that are required to fit the MBLRP model.With these statistics MLBRP model parameters are obtained and these will be used to disaggregate the generated monthly rainfall.Figure 5 describes the process of the fine time scale rainfall generation. 2 ) is stochastically generated using its relationship to one-hour rainfall mean ( 1 ) using the following formula: (2) where subscripts with square brackets are used for the residuals so as to avoid confusion with the time-scale, and where  [6] is the coefficient determined from the regression analysis (note that the constant term is zero here since, trivially,  1 = 0 when  1 = 0), and  [6] is normally distributed: 2 ).Note that  1 which is the mean hourly rainfall for the month in question is just the monthly total obtained using the SARIMA model, divided by the number of hours in the month.
Similar principles can be applied to the remaining statistics connected through solid arrows in Figure 5. Figure 6 shows the linear relationship between the rainfall statistics observed at the NCDC gauge ID 200164 (star mark in Figure 3) during the month of July.Let us look at this process in a little more detail, focusing first upon the dashed arrows: in Figure 5,  1 and  2 are connected to  1 (1) through a dashed arrow, which means that  1 (1) is derived from  1 and  2 .The following equations establish the relationship between the variances at time-scales h and 2h from which we shall obtain the relationship between  1 and  2 : Or, in simplified notation: 10  Using the discrepancies  between these two estimators which are approximately normally distributed as shown in Figure 7(b), i.e. ~(0,  2 ) we therefore estimate the autocorrelation lag-1 of hourly rainfalls using Looking now at the solid arrows in Figure 5, we see that the residual terms (denoted  [𝑖] ) are likely to be correlated.
For example, consider the following equations relating  1 to  2 and  2 to  4 : (5) From equation (4), it is clear that term ε [7] is dependent upon the hourly autocorrelation (lag-1) coefficient, and similarly therefore that ε [8] in equation ( 6) is dependent upon the two-hourly (lag-1) autocorrelation coefficient. . .The bivariate probability density functions were developed using the Gaussian Copula and its parameters are determined using the maximum likelihood method.

MBLRP Model Parameter Estimation
In this process, each of the monthly rainfall values generated by the SARIMA model is coupled with one set of six MBLRP model parameters that define the random nature of rain storm and rain cell arrival frequency, and the intensity and duration of rain cells (Figure 1).
In this study, the parameters of the MBLRP model were determined such that the rainfall statistics of the generated rainfall resemble the 20 fine-scale rainfall statistics that were coupled with the monthly rainfall generated by the SARIMA model.The Isolated-Speciation Particle Swarm Optimization (ISPSO, Cho et al., 2011) algorithm was employed to identify a set of parameters that minimizes the following objective function: Fi is the i th statistic of the synthetic rainfall time series (e.g.mean of hourly rainfall, standard deviation of 4-hourly rainfall, etc.).The mathematical formulae for the Fis were derived by Rodriguez-Iturbe et al. (1988) as a function of the six parameters (λ, ν, α, μ, ϕ, κ); fi is the i th generated statistic, and wi the weighting factor given to the i th rainfall statistic depending on the use of the synthetic rainfall time series (Kim and Olivera, 2011).Here, it should be noted that a time step with rainfall less than 0.5mm was considered dry when the proportion of non-rainy period was calculated because small rainfall values are known to distort the "true" proportion of non-rainy period exerting an adverse effect on calibration process (Kim et al, 2016, Cross et al., 2018).
It is noteworthy that Module 2 may fail to generate a realistic set of fine scale rainfall statistics due to the complex interdependencies between them.The unrealistic fine scale rainfall statistics cannot be represented by the MBLRP model that reflects the original spatial structure of rainfall in reality, which entails poorly calibrated model parameters with high objective function value of Equation 8. To exclude the poorly calibrated parameter sets caused by the unrealistic fine scale rainfall statistics generated by Module 2, we repeated the process of Module 2 and Module 3 until the objective function value of Equation 8 becomes lower than a given threshold value (0.8 in this study).If the algorithm fails to find the parameter set after 50 repetitions, the parameter set with the lowest objective function value is chosen.Figure 4 describes this filtering process, and the red squares in Figure 6 shows the chosen parameter sets.

Downscaling of Monthly Rainfall Using the MBLRP Model
The MBLRP model was used to downscale the monthly rainfall to the hourly aggregation level.First, the MBLRP model generates the hourly rainfall time series using the parameter set for the monthly rainfall being downscaled.Second, the discrepancy between the generated fine time scale statistics and the statistics of the generated synthetic hourly rainfall time series is calculated using the following formula: ,where   is the discrepancy between the generated statistics and statistics of j th synthetic hourly rainfall time series.   is the i th statistic of j th time series and   is the difference between maximum and minimum values of    about i th statistic.
Third, the first and the second process are repeated 300 times.Then the synthetic hourly rainfall time series with the lowest discrepancy value is chosen.Finally, we repeated the entire process for 200 times to obtain 200 synthetic hourly rainfall time series for each of the generated monthly rainfall.preserve the rainfall persistence at the aggregation interval that is greater than the typical model storm duration, i.e. a few hours.See Figure 1 for example.Within the duration of one storm, rainfall at different time steps may be similar insofar as a portion of it is from the same rain cell.However, the rainfall within one storm is independent of the rainfall within another storm.Therefore, it is natural that Poisson cluster rainfall models tend to underestimate the observed rainfall variance (which reflects the covariance structure -see Equation 1) at time scales exceeding the rain storm duration.Kim et al. (2013b), when mapping the average model storm duration across the continental United States using Equation 11, showed that the model storm duration of the MBLRP model approximately ranges from 2 to 100 hours, so it is not only at the annual scale, but already at the scale of several hours (depending upon the location) that the variability may be underestimated by the MBLRP model.A similar trend as exhibited in Figure 10 was observed at all of the 29 gauges.

Reproduction of Sub-Daily Rainfall Statistics
Figure 12 compares the mean, standard deviation, lag-1 autocorrelation, and the proportion of dry periods of the observed (x) and synthetic (y) rainfall time-series at hourly through 16 hourly aggregation levels.The colour of the scatters represents the calendar months.In each plot, the coefficient of determination ( 2 ) of the linear regression between the two variables is shown.All four statistics were accurately reproduced across various sub-daily time scales with  2 equal to 0.98 (mean), and varying between the following limits for the other statistics: 0.96 and 0.98 (standard deviation), 0.58 and 0.94 (lag-1 autocorrelation), and 0.70 and 0.87 (proportion of dry periods).A linear regression line passing through the origin is shown in each plot.As the slope of the regression line approaches the value of one, the less biased the extreme values reproduced by the model.As the  2 of the regression line approaches the value of one, the more consistent the extreme values reproduced by the model.In all cases, our hybrid model did not show the tendency of underestimating extreme values, which is one of the most widely discussed issues in Poisson cluster rainfall 10 modelling (Cowpertwait, 1998;Cross et al., 2018;Furrer and Katz, 2008;Verhoest et al., 2010;Kim et al., 2013a;Onof et al., 2013;Kim et al., 2016).This is a somewhat surprising result: our algorithm to incorporate large scale variability of the observed rainfall not only served its original purpose but also enhanced the capability of the model to reproduce extreme rainfall values.Figure 14 shows the degree of bias of extreme value reproduction (slope of the regression line in Figure 13) varying with recurrence interval.The values corresponding to the traditional MBLRP model is also shown.The degree of underestimation of the traditional methods varies between 73% and 87%, and it tends to increase as the recurrence interval increases.A similar tendency was observed for our model, but the degree of underestimation was significantly reduced.For our model, the degree of underestimation is the greatest for the 1-hour extreme rainfall and tends to decrease as the duration of the rainfall increases.This tendency was not observed with the traditional MBLRP model.It is important that the rainfall should reproduce not only the extreme rainfall values but also the distribution of the rainfall monthly maxima from which extreme rainfall values are derived.We performed the two-sample Kolmogorov-Smirnov (K-S) test between the monthly maxima of the synthetic rainfall and the observed rainfall.A significance level of 5% was used.Among all 348 calendar months (29 gauges x 12 months), the null hypothesis of assuming that two distributions are the same could not be rejected at 334, 318, 267, 248, 272, and 282 months for the 1-, 2-, 4-, 8-, 16-and 24-hour rainfall, respectively (82 percent of all gauges).On the contrary, the traditional approach successfully reproduced the observed monthly maxima distribution only at 243, 202, 189, 165, 168, and 172 months (47  While significant variability is observed for all six parameters, the parameter μ, which represents the average rain cell intensity, showed the greatest variability, ranging over two orders of magnitudes.This explains why our model is good at both reproducing large scale rainfall variability and small scale extreme values: the variability of the rain cell intensity parameter has the effect of stretching out the distribution of rainfall depths at a range of levels of aggregation, thereby increasing the probability of very large values.And it is course the variability of this cell intensity parameter that is also the most important factor responsible for the increase in the large scale rainfall variance.The physical characteristics of rainfall can be estimated using Equation 10 through Equation 14.We repeated the same analysis on these variables to compare the variability of the rainfall characteristics of our hybrid mode and that of the

Conclusion
The phenomena observed in hydrologic systems and the subsequent human and environmental systems are the consequences of the complex interactions between the components that are influenced by rainfall variability at various ranges of time scale.Therefore, a good or realistic rainfall model must properly reflect the rainfall variability at all hydrologically relevant time scales.Its importance will gather more attentions because of the recent trend of the hydrologic societies that started to recognize the hydrologic, human, and environmental systems from a holistic view point and interpret them based on continuous and dynamic simulation as opposed to the event-based ones (Wagener et al., 2010).
This study is one of many recent efforts in this regard (Fatichi et al., 2011;Kim et al., 2013a;Paschalis et al., 2014).
First, we showed that the Poisson cluster rainfall model, which is probably the most widely applied stochastic rainfall models, cannot reproduce large-scale rainfall variability due to in-built limitations that lie in the model assumptions.Then, we showed that a combination of an autoregressive model for monthly time scale and the "well-tuned" Poisson cluster rainfall model for the finer ranges of time scale is capable of reproducing not only the first through the third order statistics of the rainfall depths, but also the intermittency properties of the observed rainfall.
An additional model could be integrated to our hybrid model to incorporate further rainfall variability.For example, an approach based on random cascades (Molnar and Burlando, 2005;Müller and Haberlandt, 2016;Pohle et al., 2018) can be integrated to our model for reproducing the rainfall variability at the time scale as fine as five minutes.In addition, the SARIMA model that was adopted in this study could be further modified to account for the coarser rainfall variability caused by El Niño-Southern Oscillation (ENSO) and North Atlantic Oscillation (NAO).

Data Availability
Our hybrid model is not easy to implement because it requires extensive analysis of the correlation structure of the fine-scale rainfall statistics to fine-tune the MBLRP model to downscale the generated monthly rainfall.For this reason, we shall continue our work on all possible rain gauge locations across the world and share the results (several hundred years of synthetic rainfall data in text format) through the following website: http://letitrain.info.We ask for cooperation from the international community to share their rainfall data.

Figure 1 :
Figure 1: Schematic of the Modified Bartlett-Lewis Rectangular Pulse Model.

Figure 2 :
Figure 2: Box plots of the observed monthly rainfall at the NCDC Gauge 85663 in Florida, US. (red).The box plots of the synthetic monthly rainfall generated by the Modified Bartlett-Lewis Rectangular Pulse model at the same gauge are shown for reference (blue).
Koutsoyiannis (2001) used two seasonal autoregressive models with different temporal resolution to generate two different time series referring to the same hydrologic process.Then, they adjusted the fine scale time-series using their novel coupling Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-267Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 28 May 2018 c Author(s) 2018.CC BY 4.0 License.algorithm so that this series becomes consistent with the coarser scale time series without affecting the second-order statistical properties.Menabde and Sivapalan (2000) combined the alternating renewal process with a multiplicative cascade model in which a multi-year rainfall time series generated by a Poisson process based model is disaggregated using a bounded random cascade model.Their model reproduced the observed scaling behaviour of extreme events very well up to 6 minutes of temporal resolution.Fatichi et al. (2011) developed a model that generates monthly rainfall using an autoregressive model and disaggregating the generated monthly rainfall using a Poisson cluster rainfall model.Their composite model showed improved performance in reproducing the rainfall interannual variability that the latter often fails to reproduce.Kim et al. (2013a) proposed a model where the Poisson cluster rainfall model is used to disaggregate the monthly rainfall that is randomly drawn from a Gamma distribution.They found that incorporating the observed rainfall interannual variability through their composite approach also helps reproduce the statistical behaviour of rainfall annual maxima and extreme values at time scales ranging from 1 to 24 hours.Paschalis et al. (2014) introduced a composite model consisting of a Poisson cluster rainfall model or Markov chain model for large time scale and a multiplicative random cascade model for small time scale, which performed better than individual models across a wide range of scales at four different sites with distinct climatological characteristics.

Figure 5 :
Figure 5: Schematic of the algorithm to generate fine time-scale rainfall statistics.In the figure,  ℎ ,  ℎ ,  ℎ (1) =  ℎ (1)/ ℎ and  ℎ in each rectangle represent the rainfall mean, variance, lag-1 autocorrelation, and proportion of dry periods at time-scale h hours, respectively.The statistic connected to each solid arrow head is stochastically generated based on its linear relationship to the one connected to the tail of the same arrow.The statistic Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-267Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 28 May 2018 c Author(s) 2018.CC BY 4.0 License.

Figure 6 :
Figure 6: Linear relationship between various fine time-scale rainfall statistics of the observed rainfall (black dots).Based on this regression relationship, up to 50 sets of statistics are generated for each of the months (hollow blue squares) until the objective function of the set becomes smaller than given threshold value.Then, the set with objective function less than the threshold in the MBLRP parameter calibration process is finally chosen (red squares).
Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-267Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 28 May 2018 c Author(s) 2018.CC BY 4.0 License.The autocorrelation lag-k is  ℎ () =  ℎ ()/ ℎ , so, for  = 1 and ℎ = 1 hour, we obtain the relation: the lag-one autocorrelation using standard estimators of the terms in the right-hand side of this relation, i.ehow good is the estimation likely to be?The figure below compares this estimator with the standard estimator (1) ̂ of the autocorrelation.

Figure 7 :
Figure 7: (a) Comparison of estimator () ̂ (horizontal axis) with estimator   2  ̂−  (vertical axis) of the autocorrelation lag-1 of hourly rainfall, (b) The histogram of the discrepancies between these two estimators.
Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-267Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 28 May 2018 c Author(s) 2018.CC BY 4.0 License.The autocorrelations at various time scales are known to be correlated with each other(Kim et al., 2013a, Kim et al., 2014), which means that ε[7]  and ε [8] should be correlated with each other.Figure8(a) shows the bivariate probability density function of these two variables at the NCDC gauge 366889 for the month of September.Figure 8(b) shows the colour map of the correlation coefficient between different ε [i] s.This study developed bivariate probability density functions for consecutively numbered random variables ε, i.e. ε [i] and ε [i+1] (for i ranging from 1 to 4 and 6 to 9 respectively -see Figure 5).These were then used to sample values of ε [i+1] conditional upon ε [i] , This procedure in effect assumes that a Markov structure governs the sequences {ε [i

Figure 8 :
Figure 8: (a) Relationship between  [] and  [] and the fitted bivariate distribution.(b) Color map of the correlation coefficient between different  [] s.

Figure 9
Figure 9 compares the mean, variance, lag-1 autocorrelation, and skewness of the observed (x) and generated rainfall (y).Each scatter represents one rainfall gauge.The first and the second-order moments were reproduced accurately with the coefficient of determination ranging between 0.72 and 0.96.Skewness was reproduced fairly well with the correlation coefficient of 0.43.

Figure 10 :
Figure 10: Behaviour of the rainfall variance with regard to aggregation interval of the observed rainfall time series (black) and the 200 years of synthetic rainfall generated by the MBLRP model (blue) and our hybrid model (red).
Figure 11(a) compares the variance of the observed (x) and synthetic (y) rainfall time series at yearly (purple), monthly (red), 15-daily (yellow), weekly (blue), and 32hourly (green) aggregation levels.The comparison of the variance at the finer time scale is carried out in the following Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-267Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 28 May 2018 c Author(s) 2018.CC BY 4.0 License.section.Figure 10(b) compares the observed (x) and synthetic rainfall time series generated by the traditional MBLRP model (y).

Figure 11 :
Figure 11: (a) Comparison of the large scale rainfall variance of the observed rainfall (x) and the rainfall generated by our hybrid model (y); (b) Comparison of the large scale rainfall variance of the observed rainfall (x) and the rainfall generated by the traditional MBLRP model (y).The different colours of the scatter correspond to the different aggregation interval of rainfall time series.As indicated by the concentration of the scatters below the 1:1 line in Figure 11(b), the traditional MBLRP model systematically underestimates the variability at time scales greater than 32 hours.Our model did not show any bias in this range of large time-scales.

Figure 12 :
Figure 12: Comparison of the statistics of the observed (x) and synthetic (y) rainfall time series at sub-daily time scale.The colour of the dots represents the statistics of each calendar month.

Figure 13 :
Figure 13: Comparison of the extreme rainfall values estimated from the observed (x) and synthetic rainfall generated by the model of this study (y).The plots comparing (a) 20-, (b) 50-, (c) 100-, and (d) 200-year rainfall are shown.Different colours of the scatter represent the rainfall duration.A circle represents the result according the hybrid model, and a triangle represents the result according to the traditional MBLRP model 5

Figure 14 :
Figure 14: Degree of over/underestimation of extreme values by our model (blue) and the traditional MBLRP model (red).  and   are extreme rainfall estimated from synthetic rainfall and observed rainfall, respectively.
percent of all gauges).Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-267Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 28 May 2018 c Author(s) 2018.CC BY 4.0 License. of the Parameters of the MBLRP model and Extreme Values Our model uses different parameter sets of the MBLRP model to disaggregate different monthly rainfalls.This means that one given calendar month can have many different parameter sets.By contrast, the traditional MBLRP model uses one parameter set for each calendar month.Therefore, if we look at the variability of each month's parameters, we can see how the model of this study explains the variability of rainfall unlike the MBLRP model.Figure 15 shows a box plot of the parameters for each month at the NCDC Gauge 460582.The parameters of the traditional MBLRP model are shown together for reference (triangles).

Figure 15 :
Figure 15: Variability of the six parameters of the MBLRP model of this study (box plot) at the NCDC Gauge 460582 (star mark in Figure 3).The parameters of the traditional MBLRP model are shown together for reference (triangle).

Figure 16 :
Figure 16: Variability of the rainfall characteristics of the MBLRP model of this study (box plot) at the NCDC Gauge 460582 (star mark in Figure 3).The rainfall characteristics of the traditional MBLRP model are shown together for reference (triangle).

Figure 16
Figure16shows box plots of the various rainfall characteristics for each month at the NCDC Gauge 460582.The values were calculated using Equations 10 through 14.The rainfall characteristics of the traditional MBLRP model are shown 5 together for reference (triangles).The variability of the average storm depth, the average storm duration, and the average number of rain cells per storm was significant, so the y-axes of the box plots were drawn in log-scale.This result suggests that the parameter variability that is incorporated in our model's distinct algorithm contributes to the highly variable external (average storm depth, average storm duration) and internal (average number of rain cells per storm, average rain cell duration) properties of the generated rainfall.10 , the red box plots represent the variability of the monthly rainfall observed at the NCDC rain gauge located in 85663, Florida, USA during the period between 1961 and 2010 The blue box plots represent the variability of the monthly rainfall estimated from the 50 years of hourly synthetic rainfall data generated by the Modified Bartlett-Lewis Rectangular Pulse (MBLRP) model, a type of Poisson cluster rainfall generators.Note that the vertical length of the red box plots are Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-267Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 28 May 2018 c Author(s) 2018.CC BY 4.0 License.