Stochastic simulation of reference rainfall scenarios for hydrological applications using a universal multi-fractal approach

. Hydrological applications such as storm-water management usually deal with region-speciﬁc reference rainfall regulations based on intensity–duration–frequency (IDF) curves. Such curves are usually obtained via frequency analysis of rainfall and exceedance probability estimation of rain intensity for different durations. It is also common for reference rainfall to be expressed in terms of precipitation P , accumulated in a duration D , with a return period T . Me-teorological modules of hydro-meteorological models used for the aforementioned applications therefore need to be capable of simulating such reference rainfall scenarios. This paper aims to address three research gaps: (i) the discrep-ancy between standard methods for deﬁning reference precipitation and the strong multi-scale intermittency of precipitation, (ii) a lack of procedures to adapt multi-fractal precipitation modelling to speciﬁed partial statistical references, and (iii) scarcity of proper multi-scale tools to quan-titatively estimate the effectiveness of such simulation procedures. Therefore, it proposes (i) a procedure based on extreme non-Gaussian statistics in two scaling regimes due to earth’s ﬁnite size to tackle multi-scale intermittency head on, (ii) a renormalization technique to make simulations comply with the aforementioned partial statistical references, and (iii) multi-scale metrics to compare simulated rainfall time series with those observed. While the ﬁrst two proposals are utilized to simulate reference rainfall scenarios for three regions (Paris, Nantes, and Aix-en-Provence) in France that are characterized by different climates, the last one is used to validate them. The

parameters, 8) Low Computational complexity -the entire simulation procedure including parameter estimation is not too time consuming. The existence of simplifying physical principles such as universality help the frameworks in being highly parsimonious and computationally simple without compromising too much on the physical relevance of the simulations. Table. 1 shows a literature-based comparison of the desirable characteristics possessed by each model sub-classification. As shown in Fig. 1 most of the aforementioned models (10 out of 12) seem to be more focussed on computational and conceptual simplic-50 ity than on physics. Alternatives such as Universal Multifractal (UM) cascades that aren't computationally that complicated (compared to high-resolution Numerical Weather Prediction models that explicitly represent given atmospheric processes on a limited range of scale) therefore seem to be attractive choices especially since they are capable of representing fields with high spatio-temporal variability (Schertzer and Lovejoy, 1989;Tessier et al., 1996;Schertzer, 2006, 2007;Schertzer et al., 2010;Schertzer and Lovejoy, 2011;Hoang et al., 2012;Lovejoy and Schertzer, 2013;Gires et al., 55 2013; Hoang et al., 2014).
The objective of this paper is to simulate region specific reference rainfall scenarios which could be used as realistic inputs (as they exhibit larger variability and intermittency over a wide range of scales compared to uniform rainfall or synthetic hyetograms) to hydrological models for optimally designing storm-water management infrastructures. This paper attempts to understand the possibility of simulating reference rainfall ensembles characterized by the required properties (P,D,T) while 60 exhibiting temporal variability and intermittency close to that of observed rainfall data. Section 2 discusses the different regions considered in France, their corresponding reference rainfall regulations and the observational datasets used. These rainfall datasets are analysed via multifractal techniques as shown in Section 3 to identify scaling regimes and corresponding UM parameters necessary to simulate rainfall. Section 4 gives a brief recollection about discrete-in-scale UM cascades, explains in detail the procedure used here to simulate reference rainfall scenarios, and finally defines four metrics to quantitatively compare 65 the simulations with corresponding datasets. Finally, the conclusions of this study along with its limitations and some future scope (extension to higher dimensions and other regions) are discussed in Section 5.
2 Regions considered and observational datasets used French regional storm-water management/discharge regulations are usually expressed in relation with some reference rainfall events expressed in terms of precipitation P , duration D, return period T values. As shown in Table. 2 the P ,D,T values -for 3 70 different localities -display high variability, but this is not that surprising since these values correspond to reference rainfall and rainfall like many other geophysical fields exhibits high spatio-temporal variability. As seen from the P ,D,T combinations for Nantes and Aix-en-Provence it is very clear that these specifications are highly variable even within the same region considered and the corresponding hydrological designs have to take into account such high space-time variability of rainfall at least up to (and in fact more than) these legal constraints or regulations. Therefore, it is quite logical that the modelling technique 75 to be used for stochastically simulating an ensemble of such highly variable reference rainfall scenarios should explicitly incorporate properties of heterogeneity and Non-Gaussian statistics among several other properties that the observed fields typically exhibit. The rainfall datasets used for the three regions (Paris, Nantes and Aix-en-Provence) were obtained from MeteoFrance. Different time series were collected according to the time step (6-minute, hourly, daily). Figure. 2 shows the selected conurbations and their climatological rainfall data. These three regions were selected for this study as their monthly 80 cumulative rainfall climatology (computed from daily data sets) are quite different from each other: while Paris receives around 40-60 mm monthly rainfall, Nantes receives a higher monthly rainfall from around 40-90 mm, Aix-en-Provence on the other hand receives a more variable monthly rainfall from around 10-80 mm. Cities are chosen here since storm-water management is more vital in urban areas due to their limited infiltration capacity. Information about the datasets used for each city/conurbation are given in Table 3. Since the proportion of data missing is low, replacing these values with zeros will probably not result in 85 any significant change to the actual data. For the sake of simplicity, we shall henceforth refer to the daily, hourly and 6-minutes datasets of Paris, Nantes and Aix as PD1, PD2, PD3, ND1, ND2, ND3, AD1, AD2, AD3 respectively.
The concept of universality in complex systems states that only a few parameters (out of many) are relevant for defining the system since the same dynamical process is repeated scale after scale or the process interacts with many independent processes 90 over a range of scales resulting in this reduction (Schertzer and Lovejoy, 1987;SCHERTZER et al., 1991;Schertzer and Lovejoy, 1997). In the UM framework only three parameters α,C 1 ,H (therefore referred to as UM parameters) are necessary.
The three universal multifractal parameters have different geometrical and physical meanings. The degree of multifractality α defines the deviation from monofractality and its value is between 0 and 2 (if α = 0 the process is mono-/uni-fractal with a unique fractal scaling exponent contrary to other cases (α ̸ = 0), if α = 2 the process has maximum multifractality with a 95 larger spectrum of scaling exponents), codimension of the mean C 1 which describes the sparseness of the level of activity that dominantly contributes to the mean field (C 1 = 0 if the rainfall is homogeneous or in other words, if it always rains). The parameter H (not exactly the same as Hurst's exponent but related to it) quantifies the deviation from a conservative process (H = 0), where the ensemble average of the field is conserved or in other words the ensemble average of the normalized field is 1. In a stochastic multifractal formalism, the q-th order statistical moment of rainfall R λ observed at a scale l follows the 100 multiscaling equation: where λ is the intermediate scale ratio or resolution (ratio of the largest scale to the intermediate scale l), the equality sign is used here in a scaling sense, and the scaling exponent K(q) is the scaling moment function that is scale-independent. For conservative UM, K(q) depends only on the UM parameters as follows: By computing the trace moments and double trace moments the function K(q) and UM parameters can be empirically estimated (Schertzer and Lovejoy, 1987;Lavallee et al., 1993) as briefly discussed in the following two subsections. We consider each observational dataset to be a single sample (to avoid any reduction in the largest scale considered which may lead to different multifractal characteristics). However, there is a drawback due to this small sample size: the estimate of spectral slope 110 β is unreliable (coefficient of determination of the straight line fit is too low). Since H = β+K(2)−1 2 , consequently the H values estimated using β are also not very accurate. Therefore H is estimated by considering the first order (q = 1) un-normalized trace moment ⟨R λ 1 ⟩ (initially) assuming that the field is non-conservative where once again equality sign is used for a possible asymptotic equivalence (λ → ∞).

115
It turns out that for all the datasets (from PD1 to AD3) the slope of a straight line fitted through a log-log plot of ⟨R λ 1 ⟩ vs. λ is close to zero, implying H ≈ 0 (as shown in Table. 4). Therefore, we proceed by assuming the observed rainfall used in this study are conservative fields.
ratio Λ = largest scale smallest scale is averaged to obtain rainfall over coarser and coarser resolutions R λ ,where the intermediate scale ratio λ is a decreasing integer power of λ 1 (λ = λ 1 n , Λ = λ 1 N ; n = N, . . . , 0), which is the scale ratio of the elementary cascade step and usually equals 2: Since rainfall fields are multifractals their statistics follow the multiscaling equation Eq. (1), therefore the trace moments at 125 coarser and coarser resolutions T M λ = ⟨R λ q ⟩ ⟨R λ ⟩ q when plotted vs. λ in a log-log coordinate can be used to estimate the slope K(q) of a fitted straight line. Figure. 3 shows the results of this analysis done on all the datasets (PD1 to AD3): there are two scaling regimes having distinct slope or K(q) with a scaling break (the scale where K(q) changes abruptly and distinctly) at around 2 to 4 weeks. This break in temporal scaling can be attributed to the synoptic maximum (Tessier et al., 1996) or in other words the lifetime of planetary scale atmospheric structures. The similarity of scaling breaks observed in all the datasets justify 130 the dependence of scaling break on the value of the largest planetary spatial scale (and its corresponding eddy turnover time or lifetime). All these scaling ranges of both the first and second scaling regimes are tabulated in Table. 4. Henceforth the scaling moment functions of the first and second scaling regime are denoted as K 1 (q) and K 2 (q) respectively.

Double Trace Moment (DTM) Analysis
Although the TM analysis helps in estimating K(q), it does not provide explicit estimates of UM parameters α, C 1 . To do this 135 the DTM analysis (Lavallee et al., 1993) is used: where η is the power to which the field is raised. Eq. (5) suggests that when K(q, η) vs. η is plotted in log-log coordinates, the slope of a fitted straight line gives the estimate of α, whereas C 1 is calculated using this α estimate and the y-intercept of the fitted straight line. While performing the usual DTM analysis it is found that the α estimates are larger than 2 (thereby exceeding 140 the limits in Eq. 2) in the first scaling regime for all the datasets considered here. Generally this could be due to two different issues: (i) an incorrect α estimation procedure, or (ii) an incorrect assumption about the processes conservativeness. However, for the datasets considered here the first possibility seems more likely due to the fact that the H estimates are negligibly small (as shown in Table. 4 and discussed earlier in section 3) and that Fourier analysis of these datasets are unreliable due to the small sample size chosen (N s = 1). Therefore, to overcome this issue an iterative DTM procedure is used here. More technical 145 details about this procedure is given in the Appendix A. Table. 4 shows the UM parameters estimated using the 9 different datasets, while Fig. 4 shows the DTM based estimation procedure. The parameters for the first scaling regime and second scaling regimes are denoted by the subscripts 1 and 2 respectively. Although 3 different scaling breaks and 6 different pairs of α,C 1 values are empirically estimated (3 pairs for each scaling regime) for each region, for simulating a reference rainfall scenario that corresponds to rainfall observed in the corresponding region only 1 scaling break and 2 pairs of α,C 1 values (1 pair for each scaling regime) are necessary (since these values are not too dependent on the dataset used, this choice is justified). The UM parameters estimated from the daily and six-minutes data are selected to be used for the first and second scaling regime in the simulations, whereas the median value of scaling breaks (out of the three scaling breaks estimated from daily, hourly and six-minutes datasets) are chosen. To confirm that this selection procedure does not result in any significant difference in the multifractal characteristics of the datasets and the corresponding simulations we compute the Multifractal

155
Comparison Index (MCI) based on the difference in the theoretical maximum observable singularity from a finite-sized sample γ s (Hubert et al., 1993;Douglas and Barros, 2003) based on the difference between UM parameter values observed from datasets and selected for simulations (as indicated by the subscripts obs and sel) with the analytical expression with respect to α and C 1 , the indices i,j denote the scaling regime (first or second) and the dataset (6-minutes, hourly or daily) used respectively.
MCI is computed to be 0.03 for both Paris and Nantes, and 0.04 for Aix. These low values of MCI justify the aforementioned selection procedure.

4 Discrete-in-scale Universal Multifractal cascades
Multifractal cascade processes have strongly non-Gaussian statistics (e.g., fat-tailed distributions) and therefore are capable of generating structures of highly varying intensities. These cascades represent the atmospheric physical processes underlying rainfall generation in an abstract (Richardson's idea of energy transfer from large to small scales by random breakups of eddies) but explicit manner by the concept of scale-symmetry or scale-invariance (a property respected even by the Navier-

170
Stokes equation used by state-of-the-art NWP models for operational weather forecasting). Therefore, these types of models can be considered as a bridge between purely statistical and purely physical models. Due to their multiplicative property the heterogeneity of the simulated field increase incrementally at smaller scales (making these models capable of generating scaledependent rain rates as observed in nature). Although discrete-in-scale cascades consider scale-ratios that are integer powers of integers they exhibit better scaling properties and are pedagogically straightforward compared to continuous-in-scale cascades 175 (Lovejoy and Schertzer, 2010a, b). Furthermore, for the current purpose of simulating rainfall fields anisotropic and vector generalizations do not seem too relevant. The discrete-in-scale UM cascade model is used here to simulate an ensemble of rainfall scenarios for each region (and its corresponding P ,D,T specifications). The basic idea of discrete-in-scale cascades Lovejoy, 1989, 2011) is to iteratively divide large-scale eddies (structures) using a constant integer scale ratio λ 1 (usually 2 as mentioned earlier) and multiplicatively distribute flux (ε λ ) to these sub-eddies randomly (stochastically). It is convenient to do this using an additive noise or generator Γ λ (generator) the exponential of which results in the multiplicatively modulated multifractal flux field at resolution λ (Schertzer and Lovejoy, 1989). To simulate universal multifractals (whose statistics are governed by Eqs. 1 and 2) this generator must satisfy To do this an extremal Lévy random variable of index α and suitable amplitude (Pecknold et al., 1993;Gires et al., 2013) -185 that is a function of C 1 -is chosen as Γ λ (this generator generates the singularity γ λ corresponding to each sub-eddy). In the present context rainfall R λ accumulated in a given interval of time is the flux ε λ . Such a simulated field when normalized by its ensemble average is canonically conserved (Schertzer and Lovejoy, 2004b).

Simulating reference rainfall scenarios
To have the same P ,D,T characteristics of the reference rainfall, a simulated rainfall series with largest (temporal) scale (T sim ) 190 needs to have a specific number (ρ) of peak values of rainfall (≥ P ) accumulated over specific durations (D) so that their return period T = Tsim ρ . A simple way to do this is to multiply the simulated multifractal time series (with largest scale T sim = ρT, ∀ ρ ∈ Z + ) by an appropriate renormalization constant (RC): P divided by the ρ-th highest value in the multifractal series aggregated over duration D. Therefore, the simulated rainfall series are dependent on these P ,D,T values, resulting in 1 rainfall series for Paris, 11 rainfall series for Nantes, and 3 rainfall series for Aix. Since the observed datasets have two scaling 195 regimes it is necessary to use a double cascade: a coarser resolution cascade using parameters α 1 ,C 11 for the first scaling regime and a finer resolution cascade using parameters α 2 ,C 12 for the second scaling regime. Let the smallest scale observed and simulated be δ (here δ = 6 minutes for all three regions). The largest temporal scale selected from observed datasets T sel is related to the largest scale that can be simulated T (s,sim) (largest scale in simulated sample which is a power of λ 1 = 2 and ≤ T sim ): where ⌊x⌋ denotes the integer part of x. to finally extract a time series ε Λ δ where Λ δ = Tsim δ (here δ = 6 minutes). The ρ-th highest value in a aggregated multifractal seriesε (ε Λ δ aggregated to resolution Tsim D ) when multiplied by RC should equal P. If we rank the values in seriesε in decreasing order and call itε DO , then the ρ-th valueε DO (ρ) is the ρ-th highest value. Therefore, RC is computed as The RC computed using Eq. (10) when multiplied to ε Λ δ gives the final rainfall series that has characteristics corresponding to the reference rainfall. This entire procedure is repeated n e times to generate an n e member ensemble of possible reference rainfall scenarios (Fig. 5. schematically presents the whole simulation method). Here n e = 10, i.e. an ensemble of 10 members (m1 to m10) are simulated.

230
The MCM is a theoretical metric and is computed based on the maximum observable theoretical singularity γ s (Eq. 7) from a finite sample size N s ≈ λ Ds where D S is the sample dimension (Schertzer and Lovejoy, 1992) in each dataset and in each simulated member of the ensemble where j indicates the dataset used (daily, hourly or 6 minute), i denotes the scaling regime (first or second), k indicates the 235 ensemble member. MCM is closely related to MCI defined earlier, the only difference between them is that MCM uses the UM parameters estimated from DTM analysis of simulated members, whereas MCI directly uses the UM parameters selected for simulations from the observed datasets. Therefore, as expected the MCM computed for all the simulations are very low (shown in Figure. 9) and close to MCI. Since both MCM and MCI depend only on the UM parameters (or in other words the multifractal characteristics of the series), they are scale-independent (this means that MCM of two fields of same or different 240 resolutions or lengths are not too different). Since renormalization does not affect the multifractal properties of a series, the MCM is independent of P ,D,T . Lower values of MCM imply that the simulation has multifractal properties close to that of observed data.

Rainfall Comparison Metric (RCM)
RCM on the other hand is a more practical metric and is computed based on the highest rainfall value present in the dataset 245 and in each simulation member: where Λ(obs(j)) = L obs(j) δ(j) ; Λ(sim(k), j) = L sim(k) δ(j) ; δ(j) =1 day,1 hour,6 minutes for j = 1, 2, 3, the indices j, k have the same meaning as in MCM.
Lower values of RCM imply that the extreme behaviour of simulations are closer to that of the observed data. But RCM 250 is sensitively dependent on scale and P ,D,T . Therefore, as shown in Figure. 9 the RCM values are larger for cases where P D is larger. This might be due to the fact that the datasets used (since they are of shorter lengths) are not actually representative of these specific P ,D,T values that correspond to rainfall events that are more extreme (since probability of observing rarer events is higher in larger datasets).

255
SCM is a metric that instead of comparing the actual fields compares the singularities corresponding to them, and is computed as: where γ Λ(obs(j)) = log R Λ(obs(j)) log Λ(obs(j)) ; γ Λ(sim(k),j) = log R Λ(sim(k),j) log Λ(sim(k),j) , the indices j, k have the same meaning as in MCM. Lower values of SCM imply that the simulations are closer to the observations (after reducing the effect of scale-dependence 260 on the comparison) since the singularities corresponding to the simulations and the singularities corresponding to the observations are close to each other. Although SCM is scale-dependent it is less sensitive to scale than RCM; moreover SCM is also dependent on P ,D,T . Therefore, SCM values of all simulations are low (≤0.15) even for cases where P D is larger as shown in Figure. 9.

265
The main drawback of the MCM, RCM and SCM are that they focus only on either the maximum rainfall values or the maximum singularities. On the contrary, a range of values rather than threshold values can be used. For instance, the codimension of singularity c(γ) takes into account a range of singularities larger than γ. Following Schertzer and Lovejoy 1987: meaning that c(γ) can be obtained as the negative of the slope of a straight line fitted to log-log plot of Pr[R λ ≥ λ γ ] with 270 respect to λ. Equation. 14 (where ≈ indicates an asymptotic equivalence) implies that c(γ) is almost scale independent and any metric defined using it should also be not very scale-sensitive. The CCM is defined as where γ Λ(obs(j)) (i) = min[γ Λ(obs(j)) ] + 1 n (i − 1)(max[γ Λ(obs(j)) ] − min[γ Λ(obs(j)) ]), the indices j, k have the same meaning as in MCM, whereas i indexes the singularities (here n = 10 singularities are used for the comparison procedure).

275
The CCM is dependent on P ,D,T via the singularities γ Λ , therefore in an almost scale-independent manner. As shown in Fig. 9, the SCM and CCM values are consistently low implying that it is possible to simulate reference rainfall ensembles characterized by the required properties (P, D, T ) while taking into account temporal variability.

Conclusions
A novel method is proposed to simulate reference rainfall scenarios that are indispensable for hydrological applications such 280 as designing green roofs and other generic storm-water management devices. The suggested discrete-in-scale Universal Multifractal cascade based method is used here to stochastically simulate an ensemble of reference rainfall scenarios (with rainfall events exceeding or equal to P mm within D hours duration having a return period of T years) as specified by regional stormwater management regulations for three conurbations in France. The extreme variability of P ,D,T values which is a direct result of the extreme space-time variability of precipitation and underlying atmospheric processes, not only justifies but also 285 makes the choice of UM framework rather crucial in producing computationally cheap, physically realistic reference rainfall ensembles. Furthermore, four new metrics are proposed to quantify the performance of the suggested procedure and analyse their effectiveness. Three (MCM, SCM and CCM) out of the four metrics (which are not too scale-dependent) seem to indicate that the simulations are good. CCM being almost scale-independent, and utilizing a range of values rather than just maxima for comparison seems to be the most reliable comparison metric. Therefore, the consistently low CCMs show that the pro-290 posed method is indeed an attractive choice to stochastically simulate physically-based reference rainfall scenarios. Although only purely temporal, discrete-in-scale, conservative simulations over three locations (Paris, Nantes, Aix) are considered in this study, this approach could possibly be generalized to spatio-temporal, continuous-in-scale, non-conservative simulations over other locations as well. Finally, it is worth noting that the suitability of the UM framework in simulating reference rainfall scenarios and in several other geophysical applications (Ramanathan et al., 2018Satyanarayana, 295 2019, 2021;Ramanathan et al., 2021), illustrates that this generalized, physically based, computationally simple framework could probably be ideal for framing reference rainfall regulations guiding hydrological applications/designs.

Appendix A
To get more accurate α estimates for the first scaling regime an iterative DTM procedure is implemented here. Following earlier studies (Hoang et al., 2012) the idea of this procedure is to estimate η min = ( CΣ C1 ) 1 α max[1, 1 q ] and η max = ( 1 C1 ) 1 α min[1, 1 q ]: 300 first using an initial guess of α, C 1 (based on initial guesses of η min and η max ) then subsequent η range and α, C 1 estimates are obtained in each iteration until there is no longer any change in the η range and therefore the α, C 1 estimates. The codimension (difference between the dimension of the embedding space and that of the dimension of the set under consideration) of non-zero rainfall support C Σ = 1 − D Σ (here D Σ is the fractal dimension of rainfall greater than the minimum threshold considered).
However, the procedure used here is slightly modified: instead of searching for both η min and η max simultaneously in each 305 iteration, the current procedure fixes η min as a constant value (here it is initially 1) and obtains different η max , α, C 1 values in each iteration. If the α estimate is still > 2 or if the α values keep changing even after a certain number of iterations, η min is slightly reduced and the whole procedure is repeated. A q value of 0.8 is used here so that the usable range of η is larger (since the multifractal phase transition due to divergence of moments is more delayed) resulting in more reliable α, C 1 estimates.