A novel approach to stochastic rainfall generation that can reproduce various statistical characteristics of observed rainfall at hourly to yearly timescales is presented. The model uses a seasonal autoregressive integrated moving average (SARIMA) model to generate monthly rainfall. Then, it downscales the generated monthly rainfall to the hourly aggregation level using the Modified Bartlett–Lewis Rectangular Pulse (MBLRP) model, a type of Poisson cluster rainfall model. Here, the MBLRP model is carefully calibrated such that it can reproduce the sub-daily statistical properties of observed rainfall. This was achieved by first generating a set of fine-scale rainfall statistics reflecting the complex correlation structure between rainfall mean, variance, auto-covariance, and proportion of dry periods, and then coupling this set to the generated monthly rainfall; the coupled statistics were used as the basis of the MBLRP parameterization. The approach was tested on 34 gauges located from the Midwest to the east coast of the continental United States with a variety of rainfall characteristics. The results of the test suggest that our hybrid model accurately reproduces the first- to the third-order statistics as well as the intermittency properties from the hourly to the annual timescales, and that it reproduces well the statistical behaviour of monthly maxima and extreme values of the observed rainfall.

Most human and natural systems affected by rainfall react sensitively to temporal variability of rainfall across small (e.g. quarter-hourly) to large (e.g. monthly, yearly) timescales. Small-scale rainfall temporal variability influences short-term watershed responses such as flash floods (Reed et al., 2007) and subsequent transport of sediments (Ogston et al., 2000) and contaminants (Zonta et al., 2005). Large-scale rainfall temporal variability (Iliopoulou et al., 2016; Tyralis et al., 2018) influences long-term resilience of human–flood systems (Yu et al., 2017), human health (Patz et al., 2005), food production (Shisanya et al., 2011), and the evolution of human society (Warner and Afifi, 2014) and ecosystems (Borgogno et al., 2007; Fernandez-Illescas and Rodriguez-Iturbe, 2004).

Schematic of the Modified Bartlett–Lewis Rectangular Pulse model. The blue area represents duration (width) and intensity (height) of each rain cell, respectively. The dashed line represents superposed sum of the rain cell intensities.

The risk posed by these impacts needs to be precisely assessed for the management of such systems, but the observed rainfall record is often not long enough for this purpose (Koutsoyiannis and Onof, 2001). Furthermore, no rainfall records exist when the risks need to be assessed for future periods. For this reason, stochastic rainfall generators, which can create synthetic rainfall records of any desired length, have been frequently used to provide rainfall input data to modelling studies for risk assessment.

The Poisson cluster rainfall generation model (Rodriguez-Iturbe et al., 1987, 1988) is one of the most widely applied stochastic rainfall generators. Figure 1 shows a schematic of the Modified Bartlett–Lewis Rectangular Pulse (MBLRP) model, which is a typical Poisson cluster rainfall model. The model assumes that a series of rainstorms (black circles) comprising a sequence of rain cells (red circles) arrives in time according to a Poisson process. The MBLRP model has six parameters of which a brief description is provided in the lower text box of Fig. 1.
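The generation mechanism described above can be illustrated with a minimal simulation sketch. It assumes the commonly used MBLRP parameterization with six parameters (storm arrival rate, a gamma-distributed cell duration rate with shape and rate parameters, mean cell intensity, and two dimensionless ratios for the storm activity duration and the cell arrival rate) and exponentially distributed cell intensities. The function names, argument names, and units are illustrative assumptions and are not taken from this study.

```python
import random


def _add_pulse(depth, start, end, intensity):
    """Spread one rectangular pulse (mm/h over [start, end)) onto hourly bins."""
    end = min(end, len(depth))
    hour = int(start)
    while hour < end:
        overlap = min(end, hour + 1) - max(start, hour)
        depth[hour] += intensity * overlap
        hour += 1


def simulate_mblrp_hourly(lam, nu, alpha, mu, phi, kappa, n_hours, seed=None):
    """Return a list of hourly rainfall depths (mm) from an MBLRP process.

    Assumed parameterization (not stated explicitly in the text above):
      lam        storm arrival rate (1/h), Poisson process
      alpha, nu  shape and rate of the gamma distribution of eta, where
                 1/eta is the mean rain cell duration (h) within a storm
      mu         mean rain cell intensity (mm/h), exponential
      phi        gamma/eta, 1/gamma being the mean storm activity duration
      kappa      beta/eta, beta being the cell arrival rate within a storm
    """
    rng = random.Random(seed)
    depth = [0.0] * n_hours
    t = 0.0
    while True:
        t += rng.expovariate(lam)                # next storm origin
        if t >= n_hours:
            break
        eta = rng.gammavariate(alpha, 1.0 / nu)  # second argument is a scale
        beta, gamma = kappa * eta, phi * eta
        storm_end = t + rng.expovariate(gamma)   # end of cell generation
        cell_start = t                           # first cell at storm origin
        while cell_start < storm_end:
            duration = rng.expovariate(eta)
            intensity = rng.expovariate(1.0 / mu)
            _add_pulse(depth, cell_start, cell_start + duration, intensity)
            cell_start += rng.expovariate(beta)  # next cell in the same storm
    return depth
```

Because storm origins follow a Poisson process and cells of different storms are drawn independently, any dependence in the simulated series is confined to the duration of individual storms.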

As suggested by the figure, Poisson cluster rainfall models are designed to reflect the original spatial structure of rainstorms containing multiple rain cells (Austin and Houze Jr., 1972; Olsson and Burlando, 2002), so they are good at reproducing the first- to the third-order statistics of the observed rainfall at quarter-hourly to daily accumulation levels, as well as other hydrologically important statistics such as the proportion of non-rainy periods (Olsson and Burlando, 2002). The performance of the Poisson cluster rainfall models in reproducing the statistical properties of observed rainfall has been validated for various climates at numerous locations across the globe (Bo et al., 1994; Cameron et al., 2000; Cowpertwait, 1991; Cowpertwait et al., 2007; Derzekos et al., 2005; Entekhabi et al., 1989; Glasbey et al., 1995; Gyasi-Agyei and Willgoose, 1997; Gyasi-Agyei, 1999; Islam et al., 1990; Kaczmarska et al., 2014, 2015; Khaliq and Cunnane, 1996; Kim et al., 2013b, 2014, 2016, 2017a, b; Kossieris et al., 2015, 2016; Onof and Wheater, 1993, 1994a, b; Ritschel et al., 2017; Rodriguez-Iturbe et al., 1987, 1988; Smithers et al., 2002; Velghe et al., 1994; Verhoest et al., 1997; Wasko et al., 2015). For this reason, they have been widely applied to assess the risks exerted on human and natural systems such as floods (Paschalis et al., 2014), water availability (Faramarzi et al., 2009), contaminant transport (Solo-Gabriele, 1998), and landslides (Peres and Cancelliere, 2014, 2016; Thomas et al., 2018). Recently, Poisson cluster rainfall models have also been used to generate future rainfall scenarios under climate change (Kilsby et al., 2007; Burton et al., 2010; Fatichi et al., 2011).

However, Poisson cluster rainfall models have an intrinsic limitation derived from a fundamental model assumption. As described by Fig. 1, they generate the rainfall time series assuming that the rainstorms arrive according to a Poisson process, which assumes that rainstorm occurrences are independent. In addition, the rain cells in different storms are independent of each other. These model assumptions deprive the model of the ability to reproduce the long-term memory of rainfall that is often observed in reality (Marani, 2003).

Let us introduce some notation. The aggregated process

We can then write the variance at timescale

Box plots of the observed monthly rainfall at gauge NCDC-85663 in Florida, USA (red). The box plots of the synthetic monthly rainfall generated by the Modified Bartlett–Lewis Rectangular Pulse model at the same gauge are shown for reference (blue). Whiskers reach minimum and maximum values of monthly rainfall during the period between 1961 and 2010, and grey shaded boxes represent the discrepancy in the variability of the two monthly rainfall values.

The second term of the right-hand side of Eq. (1), which represents the
rainfall correlation between individual records separated by

In Fig. 2, the red box plots represent the distribution of the monthly rainfall observed at gauge NCDC-85663 located in Florida, USA, during the period between 1961 and 2010. The blue box plots represent the variability of the monthly rainfall estimated from the 50 years of hourly synthetic rainfall data generated by the Modified Bartlett–Lewis Rectangular Pulse (MBLRP) model, a type of Poisson cluster rainfall generator. Here, the MBLRP model used the parameter set that was calibrated to reproduce the observed rainfall mean, variance, lag-1 auto-covariance, and proportion of dry periods at sub-daily aggregation intervals (1, 2, 4, 8, and 16 h), which is a typical practice of MBLRP model calibration (Rodriguez-Iturbe et al., 1987, 1988; Kim et al., 2013a). Note that the vertical lengths of the red box plots are greater than those of the blue box plots in general, which implies that the variability of the observed rainfall is systematically greater than that of the synthetic rainfall. The discrepancy between the two is shown as the grey shading in the figure. In addition, the monthly extreme values shown as the highest points of the lines are also underestimated by synthetic rainfall. This is, in particular, caused by the aforementioned limitations of the Poisson cluster rainfall models.

Considering that the management strategies of the water-prone human and natural systems may be governed by the few extreme rainfall values observed in the shaded domain of Fig. 2, the risk analysis based on the rainfall data generated by Poisson cluster rainfall models may miss system behaviour that is crucial for development of the management plans. As a matter of fact, other rainfall models have similar issues: they cannot reproduce the temporal variability of observed rainfall across all timescales (Paschalis et al., 2014). For example, Markov chains, alternating renewal processes, and generalized linear models can reproduce the variability only at timescales coarser than 1 day. Models based on autoregressive properties of rainfall are typically good at reproducing the observed rainfall variability only for a limited range of scales, for instance from 1 month to 1 or 2 years (Mishra and Desai, 2005; Modarres and Ouarda, 2014; Yoo et al., 2016).

Several studies discussed the need to use composite rainfall models to resolve this scale problem of rainfall models. Koutsoyiannis (2001) used two seasonal autoregressive models with different temporal resolution to generate two different time series referring to the same hydrologic process. The fine-scale time series was then adjusted using a novel coupling algorithm so that it becomes consistent with the coarser-scale time series without affecting the second-order statistical properties. Menabde and Sivapalan (2000) combined the alternating renewal process with a multiplicative cascade model in which a multi-year rainfall time series generated by a Poisson-process-based model is disaggregated using a bounded random cascade model. Their model reproduced the observed scaling behaviour of extreme events very well down to a temporal resolution of 6 min. Fatichi et al. (2011) developed a model that generates monthly rainfall using an autoregressive model and disaggregates the generated monthly rainfall using a Poisson cluster rainfall model. Their composite model showed improved performance in reproducing the rainfall interannual variability that the Poisson cluster rainfall model alone often fails to reproduce. Kim et al. (2013a) proposed a model where the Poisson cluster rainfall model is used to disaggregate the monthly rainfall that is randomly drawn from a Gamma distribution. They found that incorporating the observed rainfall interannual variability through their composite approach also helps reproduce the statistical behaviour of rainfall annual maxima and extreme values at timescales ranging from 1 to 24 h. Paschalis et al. (2014) introduced a composite model consisting of a Poisson cluster rainfall model or Markov chain model for large timescales and a multiplicative random cascade model for small timescales, which performed better than individual models across a wide range of scales at four different sites with distinct climatological characteristics.

This study proposes a composite rainfall generation model that can reproduce various statistical properties of observed rainfall at timescales ranging between 1 h and 1 year. First, the model generates the monthly rainfall time series using a seasonal autoregressive integrated moving average (SARIMA) model. Then, it downscales the generated monthly rainfall time series to the hourly aggregation level using a Poisson cluster rainfall model. Compared to the previous studies with similar methodology (Fatichi et al., 2011; Paschalis et al., 2014), our model is novel in that (i) it models the monthly rainfall values so as to generate monthly statistics that will serve to calibrate the MBLRP model, and (ii) each of the generated individual monthly rainfall values is downscaled using month-specific MBLRP model parameter sets that reflect the complex correlation structure of various rainfall statistics at fine timescales such as mean, variance, covariance, and proportion of dry periods, which existing composite approaches that are not based on Poisson cluster rainfall models showed limitations in reproducing, especially at sub-daily scale. This distinctive approach of our model enables an accurate reproduction of the first- to the third-order statistics as well as the proportion of dry periods from the hourly to the annual timescale, and the statistical behaviour of monthly maxima and extreme values of the observed rainfall is well reproduced.

Figure 3 shows the study area, which encompasses a region from the Midwest to the east coast of the continental United States. We used the National Climatic Data Centre (NCDC) hourly rainfall data observed at 34 gauge locations (triangles in Fig. 3) for the period between 1981 and 2010. The study area has a variety of rainfall characteristics (Kim et al., 2013b). The northern, middle, and the southern part of the study area are classified as humid continental (warm summer), humid continental (cool summer), and humid subtropical climate, respectively, according to the Köppen climate classification (Köppen, 1900; Kottek, 2006). The annual rainfall of the study area varies from 750 to 1500 mm.

Study area and 34 NCDC hourly rainfall gauges. The label of the
markers is presented in the following format:
xxxxxx(

Figure 4 describes the model structure of this study. The model is composed of four distinct modules. The first module generates the monthly rainfall. The second module generates the fine-scale (1 to 16 h) rainfall statistics corresponding to each of the generated monthly rainfall values in the first module. The third module estimates the parameters of the MBLRP model based on the fine-scale rainfall statistics generated by the second module. As a result of this process, each of the generated monthly rainfall values is coupled with the MBLRP parameter set reflecting its fine-scale statistical characteristics. The fourth module downscales each of the monthly rainfall values using the MBLRP model based on the parameters obtained in the third module.

The four different modules of the model of this study.

We applied a seasonal autoregressive integrated moving average (SARIMA)
model to generate monthly rainfall. Generation of monthly rainfall based on
autoregressive relationships has been widely applied due to its parsimonious
nature (Mishra and Desai, 2005) and was proven to successfully reproduce the
first to the third-order statistics of the observed rainfall at monthly
timescales (Delleur and Kavvas, 1978; Katz and Skaggs, 1981; Ünal et al.,
2004; Mishra and Desai, 2005). Furthermore, some recent models assuming an
autoregressive process (Langousis and Koutsoyiannis, 2006; Koutsoyiannis,
2010; Efstratiadis et al., 2014; Dimitriadis and Koutsoyiannis, 2015, 2018)
succeeded in reproducing the various statistical properties of the observed
rainfall over a wider range of scales. Rainfall data of different stations
have different temporal persistence, so we applied the SARIMA model with
different autoregressive (

The second module generates the fine-timescale statistics corresponding to each monthly rainfall value generated through the SARIMA model. These synthetic fine-timescale statistics will later be used for the calibration of the MBLRP model to downscale the monthly rainfall to the hourly level. In so doing we first consider the monthly rainfall, when divided by the number of days in the month times 24, as providing us with an estimate of the mean hourly rainfall for that particular month. Then, this estimated mean hourly rainfall is provided as the input variable of the module that generates the statistics needed to fit the MBLRP model, namely the mean, variance, autocorrelation coefficient, and the proportion of dry periods at 1, 2, 4, 8, and 16 h aggregation intervals, as described in Fig. 5. In this process, the module employs the information obtained from univariate regression analyses between the fine-scale statistics of the observed rainfall (Fig. 6) and the mathematical formulae relating rainfall variance and auto-covariance at different timescales (Eq. 4), as explained below.
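The conversion of a monthly total into a mean hourly rainfall described above can be sketched as follows. The function name is hypothetical, and the use of a calendar year to determine the number of days in the month is an assumption, because a synthetic series has no intrinsic calendar year.

```python
import calendar


def mean_hourly_rainfall(monthly_total_mm, year, month):
    """Mean hourly rainfall (mm/h) implied by a monthly total.

    The monthly total is divided by the number of hours in the month,
    i.e. the number of days in the month times 24.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    return monthly_total_mm / (days_in_month * 24)
```

For a 31-day month, a total of 148.8 mm corresponds to a mean of about 0.2 mm per hour, which then serves as the input variable of the second module.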

Schematic of the algorithm to generate fine-timescale rainfall statistics. The statistics in the blue boxes are used to calibrate the MBLRP model and the statistics in grey boxes are used to estimate the lag-1 autocorrelation.

Linear relationship between various fine-timescale statistics of the rainfall observed for the month of July of different years at gauge NCDC-200164 (black dots). The solid black line represents the least-squares regression line. Based on this regression relationship, a set of the 20 fine-timescale statistics is generated, which is immediately used as the basis of the MBLRP model parameter calibration. If the objective function of the parameter calibration corresponding to the generated set is greater than a given threshold, the set is rejected (blue squares), and only the set with an objective function lower than the threshold value is chosen (red squares).

Figure 5 shows a schematic of the second module. In the figure,

The linear relationships were also identified at all other gauges investigated. This is a secondary yet significant finding of this study: a simple linearity can accurately express the relationship between the variables reflecting such chaotic and dynamic interactions occurring in natural phenomena concerning rainfall. Also note that the linearity established here applies only to sub-daily timescales. For example, a power-law may better express the relationship between the mean and standard deviation at daily scale (Sotiriadou et al., 2016).

Consider, for example, statistic

Similar principles can be applied to the remaining statistics connected through solid arrows in Fig. 5. The black dots in Fig. 6 show the linear relationship between the rainfall statistics observed at gauge NCDC-200164 (star mark in Fig. 3) during the month of July of different years.
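A minimal sketch of how one fine-scale statistic could be sampled from another through a univariate linear regression is given here. It assumes normally distributed residuals with constant variance, which is a simplifying assumption of this sketch and not necessarily the residual model of this study. All function names are hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, residual standard deviation).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    resid_sd = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return slope, intercept, resid_sd


def sample_statistic(x_new, slope, intercept, resid_sd, rng):
    """Draw the dependent statistic conditional on the predictor x_new.

    Assumption of this sketch: Gaussian residuals around the fitted line.
    """
    return intercept + slope * x_new + rng.gauss(0.0, resid_sd)
```

In use, `fit_line` would be applied once per pair of statistics connected by a solid arrow, and `sample_statistic` would be called along the chain, starting from the mean hourly rainfall of the generated month.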

The statistic connected to the dashed arrow head is calculated based on the
ones connected to the tail of the same arrow using the mathematical
(deterministic) relationship connecting these variables (Eq. 4). For
instance, in Fig. 5,

If we estimate the lag-1 autocorrelation using standard estimators of the
terms in the right-hand side of this relation, i.e. by using

Figure 7a reveals that there exist discrepancies between the true c(1) and
its estimator (

From Eq. (6), it is clear that the term

The autocorrelations at various timescales are known to be correlated with
each other (Kim et al., 2013a, 2014), which means that

Residual terms (

As a result of this process, a total of 20 rainfall statistics at fine timescales (mean; variance; lag-1 autocorrelation; and proportion of dry period at 1-, 2-, 4-, 8-, and 16-hourly aggregation interval) are sampled using these conditional distributions and the individual monthly rainfall that is generated by the SARIMA model.

In this process, each of the monthly rainfall values generated by the SARIMA model is coupled with one set of six MBLRP model parameters that define the random nature of rainstorm and rain cell arrival frequency, and the intensity and duration of rain cells (Fig. 1).

In this study, the parameters of the MBLRP model were determined such that
the rainfall statistics of the generated rainfall resemble the 20 fine-scale
rainfall statistics that were coupled with the monthly rainfall generated by
the SARIMA model. The Isolated-Speciation Particle Swarm Optimization (ISPSO;
Cho et al., 2011) algorithm was employed to identify a set of parameters that
minimizes the following objective function:

It is noteworthy that Module 2 may fail to generate a realistic set of fine-scale rainfall statistics due to the complex interdependencies between them. Unrealistic fine-scale rainfall statistics cannot be represented by the MBLRP model, which reflects the original spatial structure of real rainfall; the result is poorly calibrated model parameters with a high objective function value of Eq. (8). To exclude the poorly calibrated parameter sets caused by the unrealistic fine-scale rainfall statistics generated by Module 2, we repeated the process of Module 2 and Module 3 until the objective function value of Eq. (8) fell below a given threshold value (0.8 in this study). If the algorithm fails to find such a parameter set after 50 repetitions, the parameter set with the lowest objective function value is chosen. Figure 4 describes this filtering process, and the red squares in Fig. 6 show the chosen parameter sets.
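The filtering loop between Module 2 and Module 3 can be sketched as follows. The two callables are hypothetical placeholders: `generate_stats` stands for Module 2, and `calibrate` stands for Module 3 and is assumed to return a parameter set together with its objective function value. The threshold of 0.8 and the limit of 50 repetitions are the values stated in the text.

```python
def calibrate_with_rejection(generate_stats, calibrate,
                             threshold=0.8, max_tries=50):
    """Repeat Module 2 and Module 3 until the objective is below threshold.

    generate_stats() -> fine-scale statistics        (placeholder, Module 2)
    calibrate(stats) -> (parameters, objective)      (placeholder, Module 3)

    Returns (stats, parameters, objective, attempts). If no attempt falls
    below the threshold, the attempt with the lowest objective is returned.
    """
    best = None
    for attempt in range(1, max_tries + 1):
        stats = generate_stats()
        params, objective = calibrate(stats)
        if best is None or objective < best[2]:
            best = (stats, params, objective)
        if objective < threshold:
            break  # accepted: this set is also the best one so far
    return best + (attempt,)
```

Keeping track of the best attempt inside the loop means the fallback after 50 unsuccessful repetitions needs no second pass over the candidates.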

The MBLRP model was used to downscale the monthly rainfall to the hourly
aggregation level. First, the MBLRP model generates the hourly rainfall time
series using the parameter set for the monthly rainfall being downscaled.
Second, the discrepancy between the fine-timescale statistics generated by
the second module of the model (Fig. 5) and the statistics of the synthetic
hourly rainfall time series generated by the MBLRP model is calculated using
the following formula:

Third, the first and the second processes are repeated 300 times. Then the synthetic hourly rainfall time series with the lowest discrepancy value is chosen. Finally, we repeated the entire process 200 times to obtain 200 synthetic hourly rainfall time series for each of the generated monthly rainfall values.
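The selection procedure of the fourth module can be sketched as a nested loop. Here `simulate` and `discrepancy` are hypothetical placeholders for the MBLRP generator and for the discrepancy measure between generated and target statistics, whose formula is not reproduced in this sketch. The default counts of 300 candidates and 200 retained series are the values stated in the text.

```python
def downscale_month(simulate, discrepancy, target_stats,
                    n_candidates=300, n_series=200):
    """Return n_series hourly series, each the best of n_candidates draws.

    simulate()                    -> one candidate hourly series (placeholder)
    discrepancy(series, target)   -> non-negative mismatch score (placeholder)
    """
    selected = []
    for _ in range(n_series):
        best_series, best_score = None, float("inf")
        for _ in range(n_candidates):
            series = simulate()
            score = discrepancy(series, target_stats)
            if score < best_score:
                best_series, best_score = series, score
        selected.append(best_series)
    return selected
```

With the default settings this amounts to 60 000 simulated candidate series for every generated monthly rainfall value.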

One of the primary purposes of the stochastic rainfall model is to provide
synthetic rainfall for the ungauged periods, which can be the periods of
missing data or future periods. For this reason, we separated the period of
model calibration and validation at some gauge locations (square marks in
Fig. 2) where the record length of each period is sufficiently long
(

Figure 9 compares the mean, variance, lag-1 autocorrelation, and skewness of
the monthly rainfall time series generated by the SARIMA model (

Comparison of

Figure 10 shows the behaviour of the rainfall variance varying with temporal
aggregation intervals between 1 h and 1 year at gauge NCDC-122738. The
behaviour corresponding to the observed calibration (black, 1981–2010),
observed validation (green, 1951–1980), MBLRP (blue) and our hybrid model
(red) is shown together. In addition, the behaviour based on the
two-parameter generalized Hurst–Kolmogorov process (grey, GHK hereafter;
Koutsoyiannis, 2016; Dimitriadis and Koutsoyiannis, 2018) is shown together. The
good fit between the GHK behaviour (grey) and the observed behaviour (black and
green) indicates that the observed rainfall has a clear long-term
persistency, which is also a feature of all 34 NCDC gauges. While our model
successfully reproduces the rainfall variance across the timescale, the
MBLRP model is successful in reproducing the rainfall variance only at the
hourly accumulation level. This reflects the fact that Poisson cluster
rainfall models are not designed to preserve the rainfall persistence at the
aggregation interval that is greater than the typical model storm duration,
i.e. a few hours. See Fig. 1 for example. Within the duration of one storm,
rainfall at different time steps may be similar insofar as a portion of it is
from the same rain cell. However, the rainfall within one storm is
independent of the rainfall within another storm. Therefore, it is natural
that Poisson cluster rainfall models tend to underestimate the observed
rainfall variance (which reflects the covariance structure – see Eq. 1) at
timescales exceeding the rainstorm duration. Kim et al. (2013b), when
mapping the average model storm duration across the continental United States
using Eq. (11), showed that the model storm duration of the MBLRP model
ranges from approximately 2 to 100 h, so it is not only at the annual scale,
but already at the scale of several hours (depending upon the location) that
the variability may be underestimated by the MBLRP model.
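The variance behaviour discussed above can be computed from any hourly series with a short routine such as the following sketch. It sums the series over non-overlapping windows and reports the sample variance at each aggregation interval. The function names are illustrative.

```python
def aggregate(series, k):
    """Sum a series over consecutive non-overlapping windows of length k.

    Trailing values that do not fill a complete window are dropped.
    """
    n_windows = len(series) // k
    return [sum(series[i * k:(i + 1) * k]) for i in range(n_windows)]


def variance_by_scale(series, scales):
    """Sample variance of the aggregated series for each scale in scales."""
    result = {}
    for k in scales:
        agg = aggregate(series, k)
        mean = sum(agg) / len(agg)
        result[k] = sum((v - mean) ** 2 for v in agg) / (len(agg) - 1)
    return result
```

For a series of independent values, the variance of the aggregated series grows in proportion to the aggregation interval, whereas positive correlation between time steps makes it grow faster. This is why a model without persistence beyond the storm duration tends to fall below the observed variance at coarse scales.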

Behaviour of the rainfall variance with regard to the aggregation interval of rainfall time series at gauge NCDC-122738. The behaviour corresponding to the observed calibration (black, 1981–2010), observed validation (green, 1951–1980), MBLRP (blue) and our hybrid model (red) is shown together.

A trend similar to that exhibited in Fig. 11 was observed at all 34 gauges.
Figure 11 compares the variance of the synthetic (

As indicated by the concentration of the scatters above the

Figure 12 compares the mean, variance, lag-1 autocorrelation, skewness, and
the proportion of dry periods of the synthetic (

Comparison of the statistics of the synthetic (

The scatters in Fig. 13 compare the 20-, 50-, 100-, and 200-year rainfall
estimated from the observed rainfall (

A linear regression line passing through the origin is shown in each plot. In all cases, our hybrid model did not show the tendency of underestimating extreme values, which is one of the most widely discussed issues in Poisson cluster rainfall modelling (Cowpertwait, 1998; Cross et al., 2018; Furrer and Katz, 2008; Verhoest et al., 2010; Kim et al., 2013a, 2016; Onof et al., 2013). This is a somewhat surprising result: our algorithm to incorporate large-scale variability of the observed rainfall not only served its original purpose but also enhanced the capability of the model to reproduce extreme rainfall values.

Comparison of the extreme rainfall values estimated from the
observed rainfall (

Figure 14 shows the degree of bias of extreme-value reproduction (slope of the regression line in Fig. 13) varying with the recurrence interval. The values corresponding to the traditional MBLRP model are also shown. The degree of underestimation of the traditional methods varies between 73 % and 87 %, and it tends to increase as the recurrence interval increases. A similar tendency was observed for our model, but the degree of underestimation was significantly reduced. For our model, the degree of underestimation is the greatest for the 1 h extreme rainfall and tends to decrease as the duration of the rainfall increases. This tendency was not observed with the traditional MBLRP model.

Degree of over- or underestimation of extreme values by our model (red)
and the traditional MBLRP model (blue).

A good rainfall model should reproduce not only the extreme values but also
the distribution of the maxima from which extreme values are derived. We
performed the two-sample Kolmogorov–Smirnov (K-S) test between the monthly
maxima of the synthetic rainfall and the observed rainfall. A significance
level of 5 % was used. Among all 408 calendar months (34 gauges

Relative frequency and the fitted GEV distribution of the 1, 4, and 16 h monthly maxima of January, April, July, and October rainfall at NCDC gauge 132203. Results of observed rainfall (black), our hybrid model (red), and the traditional MBLRP model (blue) are shown. The upper 10th percentile of the distribution (dashed box in the plots in the first, third, and fifth row) is magnified in the lower rows (plots in the second, fourth, and sixth row).

Comparison of the shape (

Figure 15 shows the relative frequency and the fitted GEV distribution of the monthly maxima of January, April, July, and October at NCDC gauge 132203. The black, red, and blue lines correspond to the result of observed rainfall, our hybrid model, and the traditional MBLRP model, respectively. The GEV distribution of the 1, 4, and 16 h rainfall durations are shown in the plots of the first, third, and fifth row, respectively. The plots in the second, fourth, and the sixth row magnify the upper 10th percentile of the distribution of the upper figures that is denoted as the dashed box. For all months and durations, our hybrid model outperforms the traditional MBLRP model in reproducing the head-to-tail part of the distribution. The distribution of the traditional MBLRP model was skewed toward the lower values. A similar tendency was observed at most gauge locations while at some of the gauges our hybrid model showed similar or slightly degraded performance compared to the traditional MBLRP model in reproducing the distribution of extreme values. We discuss this finding further in Sect. 5.1.

Figure 16 compares the shape (

Our model uses different parameter sets of the MBLRP model to disaggregate
different monthly rainfall values. This means that one given calendar month can
have many different parameter sets. By contrast, the traditional MBLRP model
uses one parameter set for each calendar month. Therefore, if we look at the
variability of each month's parameters, we can see how the model of this
study explains the variability of rainfall unlike the MBLRP model. Figure 17
shows a box plot of the parameters for each month at gauge NCDC-460582. The
parameters of the traditional MBLRP model are shown together for reference
(triangles). While significant variability is observed for all six
parameters, the parameter

Variability of the six parameters of the MBLRP model of this study (box plot) at gauge NCDC-460582 (star mark in Fig. 3). The parameters of the traditional MBLRP model are shown together for reference (triangle).

The physical characteristics of rainfall can be estimated using Eqs. (11)
to (15). We repeated the same analysis on these variables to
compare the variability of the rainfall characteristics of our hybrid model
and that of the MBLRP model.

Variability of the rainfall characteristics of the MBLRP model of this study (box plot) at gauge NCDC-460582 (star mark in Fig. 3). The rainfall characteristics of the traditional MBLRP model are shown together for reference (triangle).

Figure 18 shows box plots of the various rainfall characteristics for each
month at gauge NCDC-460582. The values were calculated using Eqs. (11)
to (15). The rainfall characteristics of the traditional MBLRP model are
shown together for reference (triangles). The variability of the average
storm depth, the average storm duration, and the average number of rain cells
per storm was significant, so the

Comparison of the slope of regression analysis between the
statistics shown in Fig. 6 for the calibration (

Our hybrid model uses one MBLRP model parameter set for each simulated month
of each simulated year, while the MBLRP model needs only 6 parameters regardless of the
simulation length. However, this does not mean that our model requires 600
MBLRP model parameters (6 per month

We admit that this large discrepancy in model parsimony is an issue to
be resolved before our model can be applied in practice. In this regard, we are
planning to apply our model to additional gauge locations across the world
and share the result through the website (

Our approach of separating the calibration and validation periods, adopted
at some gauge locations, may seem surprising because most stochastic rainfall
generators are calibrated based upon the statistics under an assumption of
temporal stationarity of the rainfall process. According to this assumption,
the statistics of the periods of calibration and the validation should be the
same, which obviates the need for validating the model for separate periods.
However, this assumption often does not account for cases in which, for
example, the observation period is too short (e.g. a few extreme events are
included in only one part of the time period) or the time series is indeed
non-stationary. For this reason, the discrepancy of the model performance
between the calibration and the validation period should not only be
attributed to the model's limitations but also to the difference between
statistics from the two periods. In view of these considerations, our primary
purpose of separating the period of calibration and validation should be
understood as an assessment of the model's applicability to rainfall
generation for a future period. From the point of view of the calibration
period, the validation period is an ungauged period just as any future
period, and our model can be easily extended to the future period by adding a
term accounting for long-term rainfall non-stationarity to the SARIMA model
(first module). Our hybrid model assumes the stationarity not only of
typical rainfall statistics such as the mean, variance, covariance, and
proportion of dry periods but also of the relationships between them (see
Fig. 6). The latter has not been explicitly discussed in previous studies, so
it was also of interest to examine whether such relationships between the
statistics hold over different periods and how any discrepancy affects the
final model performance. Figure 19 compares the slopes of the
regressions between the statistics shown in Fig. 6 for the
calibration (

The phenomena observed in hydrologic systems and their subsequent effects on human and environmental systems are the consequences of complex interactions between components that are influenced by rainfall variability across a range of timescales. Therefore, a good or realistic rainfall model must properly reflect the rainfall variability at all hydrologically relevant timescales. This requirement will attract more attention given the recent trend in the hydrologic community toward viewing hydrologic, human, and environmental systems holistically and interpreting them through continuous and dynamic simulations as opposed to event-based simulations (Wagener et al., 2010).

This study is one of many recent efforts in this regard (Fatichi et al., 2011; Kim et al., 2013a; Paschalis et al., 2014). First, we showed that the Poisson cluster rainfall model, which is probably the most widely applied type of stochastic rainfall model, cannot reproduce large-scale rainfall variability because of limitations built into the model assumptions. Then, we showed that a combination of an autoregressive model for monthly timescales and a “well-tuned” Poisson cluster rainfall model for finer timescales is capable of reproducing not only the first- to third-order statistics of the rainfall depths but also the intermittency properties of the observed rainfall.
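The importance of month-to-month dependence for large-scale variability can be illustrated with a short calculation. The sketch below is not part of the model presented here; it assumes, for simplicity, a stationary series of monthly totals with a common variance (real rainfall is seasonal, so this is only an approximation), and the function name and numerical values are hypothetical. Under that assumption, the variance of an n-month total is the single-month variance multiplied by n plus twice the lag-weighted sum of the autocorrelations, so a generator that produces nearly uncorrelated monthly totals will underestimate the annual variance whenever the observed monthly totals are positively correlated.

```python
def aggregated_variance(sigma2, rho, n=12):
    """Variance of the sum of n consecutive values of a stationary series.

    sigma2 : variance of a single value (e.g. one monthly total)
    rho    : list with rho[k - 1] = lag-k autocorrelation, k = 1..n-1
             (lags not supplied are treated as zero)
    """
    factor = float(n)
    for k in range(1, n):
        r = rho[k - 1] if k - 1 < len(rho) else 0.0
        # Each lag-k pair appears (n - k) times on either side of the diagonal
        factor += 2.0 * (n - k) * r
    return sigma2 * factor


# Hypothetical monthly variance of 100 mm^2, used only for illustration
uncorrelated = aggregated_variance(100.0, [])  # 1200.0
# Hypothetical, geometrically decaying autocorrelation with lag-1 value 0.3
persistent = aggregated_variance(100.0, [0.3 ** k for k in range(1, 12)])
print(uncorrelated, persistent)
```

With these illustrative numbers the persistent series has a larger annual variance than the uncorrelated one, even though both have the same monthly variance.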

Additional models could be integrated into our hybrid model to incorporate further rainfall variability. For example, an approach based on random cascades (Lombardo et al., 2012, 2017; Molnar and Burlando, 2005; Müller and Haberlandt, 2016; Pohle et al., 2018) could be integrated into our model to reproduce the rainfall variability at timescales as fine as 5 min, and a multivariate downscaling approach (Koutsoyiannis et al., 2003; Moon et al., 2016) may be applied to obtain spatially consistent rainfall at multiple sites. In addition, the SARIMA model adopted in this study could be further modified to account for the coarser rainfall variability caused by the El Niño–Southern Oscillation (ENSO) and the North Atlantic Oscillation (NAO). Lastly, the two-module structure of our model, composed of a large-scale rainfall generation module and a downscaling module, may be integrated into existing space–time rainfall generators to enhance their ability to generate large-temporal-scale rainfall variability (Burton et al., 2008; Müller and Haberlandt, 2015; Paschalis et al., 2013; Peleg and Morin, 2014; Peleg et al., 2017; Benoit et al., 2018).

Our hybrid model is not easy to implement because it
requires extensive analysis of the correlation structure of the fine-scale
rainfall statistics to fine-tune the MBLRP model and downscale the generated
monthly rainfall. For this reason, we will continue to apply the model to all
possible rain gauge locations across the world and share the results (several
hundred years of synthetic rainfall data in text format) through the following
website:

Conceptualization: DK, CO, and JP; data curation: JP; formal analysis: JP; funding acquisition: DK; investigation: JP, DK, and CO; methodology: DK, CO, and JP; project administration: DK; resources: DK; software: JP; supervision: DK; validation: CO and DK; visualization: JP; writing – original draft: DK and JP; writing – review and editing: CO and DK.

The authors declare that they have no conflict of interest.

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF 2017R1C1B2003927, 50 % grant). This study was also supported by the Basic Research Laboratory Program through the NRF of Korea funded by the Ministry of Education (NRF 2015-041523, 50 % grant).

Edited by: Carlo De Michele
Reviewed by: Panayiotis Dimitriadis and three anonymous referees