Forecasting green roof detention performance by temporal downscaling of precipitation time-series projections

This study presents a strategy to evaluate the suitability of different multiplicative random cascades for producing rainfall time series that take climate change into account and can serve as inputs for green infrastructure models. The multiplicative random cascades reproduce a (multi)fractal distribution of precipitation through an iterative and multiplicative random process. In the current study, the initial model, a flexible cascade that deviates from multifractal scale invariance, was improved with (i) a temperature dependency and (ii) an additional function to reproduce the temporal structure of rainfall. The structure of the models with depth and temperature dependency was found to be applicable in the eight locations studied across Norway and France. The resulting time series from both the reference period and the projections based on RCP 8.5 were applied to two green roofs with different properties. The different models led to a slight change in the performance of the green roofs, but this was not significant compared to the range of outcomes due to ensemble uncertainty in climate modelling and the stochastic uncertainty due to the nature of the process. The hydrological dampening effect of the green infrastructure was found to decrease in most of the Norwegian cities due to an increase in precipitation, especially in Bergen (Norway), while slightly increasing in Marseille (France) due to a decrease in rainfall event frequency.



Introduction
Hydrologic performance of stormwater green infrastructure (GI) is usually divided between retention and detention. Retention refers to water stored, infiltrated, or evapotranspired. Actual evapotranspiration can be estimated from a water balance including potential evapotranspiration, accumulated precipitation, a soil moisture evaluation function, and a crop factor (Johannessen et al., 2017; Oudin et al., 2005). The temporal resolution of modelled evapotranspiration processes for green infrastructure is typically daily (Stovin et al., 2013) or hourly (Kristvik et al., 2019). Detention refers to water temporarily stored in the GI before being discharged into a downstream stormwater network. The detention process typically operates at a temporal resolution of minutes. Consequently, modelling GI detention performance requires higher-resolution data to estimate its outflow (Schilling, 1991). Therefore, both high-resolution climate data and projections at subdaily and subhourly scales are needed in order to model GI and to estimate its potential as a climate change adaptation measure.
In Norway and most European countries, precipitation has been measured with tipping buckets in numerous cities for years to decades. Moreover, climate projections at daily resolution for future precipitation and temperature from the EURO-CORDEX project are available at 1 × 1 km spatial resolution in Norway (Dyrrdal et al., 2018) and 12 × 12 km resolution in France (Jacob et al., 2014). Consequently, the use of such data by urban hydrologists to assess the resilience of GI solutions to climate change depends on the possibility of downscaling them to a subhourly resolution.
Downscaling includes two families of methods: dynamical downscaling and statistical downscaling (Benestad, 2016). Dynamical downscaling methods use physically based equations and are usually computationally expensive, especially when generating high-resolution data. Statistical downscaling consists of improving the resolution of data based on statistical properties observed in a lower-resolution dataset. The computational cost is lower; therefore, statistical methods might still be used to fill the gap in the next decades until the computational power is sufficient for routine dynamical downscaling. In addition, the use of a stochastic approach is necessary due to the current limitations in the parametrization of small-scale processes (below the truncation scale) and the lack of coupling between resolved and parameterized scales (Sanchez et al., 2016).
Statistical downscaling has already been extensively used to temporally downscale data for various temporal resolutions, usually hourly or daily data. Three popular methods can be mentioned: (i) the method of fragments, (ii) the method based on point process theory, and (iii) the method of multiplicative random cascades. The method of fragments (Li et al., 2018; Lu et al., 2015) is a resampling method based on k-nearest neighbours (Kalra and Ahmad, 2011), which has been applied to derive hourly data from daily data. It can be accurate and effective due to its resampling nature, but it requires a large dataset, and by design it cannot extrapolate beyond the observed data. Therefore, it might not be suitable for downscaling climate projections. Methods based on point process theory have also been used (Glasbey et al., 1995; Onof et al., 2000). The main principle is to generate storm occurrences and then describe them using rain cells and statistical distributions based on a Poisson point process. Multiplicative random cascades (MRCs) successively apply random cascades to split each value into N values of finer resolution (N = 2 in most cases). It is a very popular method that deserves further investigation (Gaur and Lacasse, 2018; Rupp et al., 2012; Thober et al., 2014). MRCs were originally based on the hypothesis of multifractal scale invariance (Schertzer and Lovejoy, 1987) and were further developed by Gupta and Waymire (1993) and Olsson (1998). While multifractal scale invariance continues to be studied (Gires et al., 2020), several studies noted a deviation from that behaviour, which led to the use of more flexible models (Koutsoyiannis and Langousis, 2011; Veneziano et al., 2006). Multiplicative random cascades can be divided into canonical and micro-canonical types. The canonical type ensures conservation of mass on average, while the micro-canonical type ensures exact conservation.
The parameters of the canonical MRC are often calibrated by fitting simulated to observed non-centred moments of depth or intensity across timescales (Paschalis et al., 2012). The principle of micro-canonical MRC is usually based on reverse cascades: studying how the data are split and then reproducing the properties of the weight distribution depending on different quantities. The influences of timescale, rainfall intensity (Paschalis et al., 2012; Rupp et al., 2009) or season (McIntyre et al., 2016) have been extensively studied. Lombardo et al. (2012) suggested that the commonly used MRC suffers from conceptual weaknesses due to the non-stationary process of autocorrelation and proposed a method to improve the model. More recently, Bürger et al. (2014, 2019) suggested including a temperature dependency in MRC models to make them more robust. This also enables them to be used with projections.
Green infrastructures, due to their retention and detention capacities, are seen as a promising solution to manage stormwater and cope with climate change, especially in cities where urbanization increases. Among green infrastructures, green roofs are especially suitable for dense urban centres. They are designed to retain day-to-day rain by evapotranspiration and attenuate major rainfall events (Stovin, 2010). Depending on their characteristics they can also help to detain extreme rainfall (Hamouz et al., 2020). Due to the timescale of their detention process, and their sensitivity to the initial water content at the beginning of a rainfall event, they are suitable for evaluating downscaled time series. Moreover, it is especially relevant to evaluate their detention performance by the end of the century under a scenario such as RCP 8.5 (Thorndahl and Andersen, 2021). The results could be used to evaluate, at a strategic level, their potential for mitigating stormwater in order to make robust decisions (Walker et al., 2013).
While downscaling models have been used to model the performance of green infrastructure under current climate (Stovin et al., 2017) or applied to intensity-duration-frequency (IDF) curves to perform an event-based simulation of local stormwater measures (Kristvik et al., 2019), none has been developed to produce future high-resolution time series as input for green infrastructure models. The aim of this research is to evaluate different MRC downscaling models and their potential to produce input time series to predict the performance of stormwater green infrastructure, for the case of green roofs. To achieve this aim, the paper details the following parts: (i) the development of a general structure of MRC, (ii) the improvement of the MRC structure by adding a temperature dependency, (iii) the addition of an ordering function to improve the temporal structure of the produced rainfall time series, (iv) the evaluation of the capability to reproduce the performance of GI based on observed data, and finally (v) the analysis of a possible shift in the performance of GI at the end of the century.

Meteorological data
Time series of precipitation and temperature from six locations in Norway and two in France, representing four different climates (Table 1) according to the Köppen-Geiger classification (Peel et al., 2007), were used to apply the downscaling method. In Norway, the precipitation was measured by 0.2 mm plumatic tipping rain gauges manufactured by Kongsberg Våpenfabrikk. The rain gauges were not heated and thus did not operate at cold temperatures. They were successively replaced by Lambrecht 1518H3 gauges (measuring tip of 0.1 mm) in the 1990s and 2000s. The stations were operated by the Norwegian Water Resources and Energy Directorate (NVE) and the Norwegian Meteorological Institute (MET). The data were quality checked by MET (Lutz et al., 2020). In Lyon and Marseille, precipitation was measured by 0.2 mm Précis-Mécanique tipping bucket rain gauges. Ten climate projections (temperature and precipitation) at daily resolution with RCP 8.5 for the period from 2071 to 2099 for the Norwegian cities were available online at https://nedlasting.nve.no/klimadata/kss (last access: 26 May 2022; Dyrrdal et al., 2018). For Lyon and Marseille (France), 12 climate projections at daily resolution were available for the same period and RCP (2071 to 2099, RCP 8.5) from http://www.drias-climat.fr/ (last access: 25 May 2022). RCP 8.5, a scenario with a high greenhouse gas emission baseline leading to a radiative forcing of 8.5 W m−2 in 2100, and the end of the century were chosen to test the methods since they were the scenario and period that deviate the most from the current climate among the available data in both countries. In practice, it is relevant to evaluate GI performance at the end of the century, but their design could be based on a different period depending on their lifetime.

Data aggregation and processing
The historical data were aggregated two by two from 1 min resolution (or 6 min where that was the resolution of the original dataset) to more than 1 d resolution in order to capture a part of the uncertainty linked to the estimation of the parameters of the models. The aggregation was done for each possible time step: all multiples of 2 up to 1500 min (as there are 1440 min in a day). During the process of aggregation, both the weights, Eq. (1), and the rainfall continuity indicator, Eq. (2), measuring the proportion of high weight on the side of the highest neighbouring depth, were computed. Given j ∈ [1, ..., 750] a timescale in minutes, i ∈ [0, ..., N_{2j}] a time step with N_{2j} the number of time steps at scale 2j, and d_{i,2j} a rainfall depth, the weight w_{i,2j} and the indicator S_{i,2j} of the side of the neighbour were calculated according to Eqs. (1) and (2).
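The pairwise aggregation can be illustrated with a minimal Python sketch. Since Eq. (1) is not reproduced in this text, the sketch assumes that the weight is the share of the parent depth received by the drier child, which is consistent with non-zero weights lying in ]0, 0.5]. The function names and the sample series are hypothetical, and the continuity indicator of Eq. (2) is not covered.

```python
def aggregate_pairs(depths):
    """Aggregate consecutive pairs of child depths into parent depths.

    Returns (parents, weights). weights[i] is None for a dry parent,
    0.0 when one child holds all the water (a "zero weight"), and
    otherwise the smaller child's share of the parent, in ]0, 0.5].
    """
    parents, weights = [], []
    # Walk over pairs (0, 1), (2, 3), ...; a trailing odd element is dropped.
    for k in range(0, len(depths) - 1, 2):
        left, right = depths[k], depths[k + 1]
        parent = left + right
        parents.append(parent)
        if parent > 0:
            weights.append(min(left, right) / parent)
        else:
            weights.append(None)
    return parents, weights


def zero_weight_proportion(weights):
    """Fraction of wet parents whose water went entirely to one child."""
    wet = [w for w in weights if w is not None]
    if not wet:
        return None
    return sum(1 for w in wet if w == 0.0) / len(wet)


# Hypothetical depths (mm) at the finer resolution.
child_depths = [0.0, 0.5, 0.25, 0.25, 0.0, 0.0, 0.75, 0.25]
parents, weights = aggregate_pairs(child_depths)
p_zero = zero_weight_proportion(weights)
```

Applying `aggregate_pairs` repeatedly to its own `parents` output walks up the timescales, which is how one weight sample per scale could be collected from a single observed series.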

Downscaling process
The MRC downscaling process transforms daily rainfall depths into rainfall depths at a finer timescale, e.g. 1 min, by successively distributing the depth of a parent time step between its two children time steps. The process is iterated until the desired timescale is reached. Figure 1 describes the downscaling process. In practice, the downscaling started at a 1440 min (1 d) time step and used eight iterations to reach a time step of 5.625 min (1440/2^8). The results were interpolated and scaled to a 6 min time step for comparison with observed data. The final time step of 6 min was chosen based on the resolution of the original datasets in Lyon and Marseille. Three steps are necessary to downscale a parent time step to two children time steps. First, the occurrence of a zero weight (Eq. 3a), i.e. the probability of assigning all the water from the parent time step to only one of the children time steps (Fig. 1, centre left), is tested. This property is especially important and has been acknowledged by other studies. Second, if a zero weight does not occur, a non-zero weight w_{i,2j} ∈ ]0, 0.5] is generated from a probability distribution (Eq. 3b). It distributes the depth from the parent time step between the two children time steps, as illustrated in Fig. 1, centre right. Finally, the weights w_{i,2j} and 1 − w_{i,2j} have to be assigned to the children time steps. The occurrence of S_W (Eq. 4), i.e. allocating the highest weight to the child next to the neighbour with the highest depth, is tested (Fig. 1, bottom).
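The three steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the calibrated models of this study: the zero-weight probability `p_zero`, the spread `sigma` and the ordering probability `p_sep` are hypothetical constants, whereas in the models they depend on timescale, depth and temperature. The neighbours are taken as the adjacent parent time steps, and the non-zero weight is drawn from a normal distribution with mean 0.5 truncated to ]0, 0.5]. The final interpolation to 6 min is omitted.

```python
import random
from statistics import NormalDist


def sample_nonzero_weight(sigma, rng):
    """Draw w from normal(mu=0.5, sigma) truncated to ]0, 0.5] by inverse CDF."""
    dist = NormalDist(mu=0.5, sigma=sigma)
    lo = dist.cdf(0.0)
    u = lo + (0.5 - lo) * (1.0 - rng.random())  # u lies in ]lo, 0.5]
    return min(0.5, max(dist.inv_cdf(u), 1e-12))


def split_step(parent, left_nb, right_nb, p_zero, sigma, p_sep, rng):
    """Split one parent depth into two children depths that sum to it."""
    if parent <= 0.0:
        return 0.0, 0.0
    # Step 1: test for a zero weight; step 2: otherwise draw a non-zero weight.
    w = 0.0 if rng.random() < p_zero else sample_nonzero_weight(sigma, rng)
    low, high = w * parent, (1.0 - w) * parent
    # Step 3: decide which child receives the larger share.
    if left_nb == right_nb:
        high_left = rng.random() < 0.5
    else:
        toward_wetter = rng.random() < p_sep
        high_left = toward_wetter == (left_nb > right_nb)
    return (high, low) if high_left else (low, high)


def downscale(depths, n_levels, p_zero, sigma, p_sep, seed=0):
    """Apply n_levels cascade iterations, halving the time step each time."""
    rng = random.Random(seed)
    series = list(depths)
    for _ in range(n_levels):
        children = []
        for i, parent in enumerate(series):
            left_nb = series[i - 1] if i > 0 else 0.0
            right_nb = series[i + 1] if i + 1 < len(series) else 0.0
            children.extend(
                split_step(parent, left_nb, right_nb, p_zero, sigma, p_sep, rng)
            )
        series = children
    return series


# Three hypothetical daily depths (mm), eight iterations: 1440 -> 5.625 min.
daily = [12.0, 0.0, 3.5]
fine = downscale(daily, 8, p_zero=0.3, sigma=0.2, p_sep=0.7, seed=1)
```

Because each split assigns w and 1 − w of the parent depth, the sketch is micro-canonical: the daily totals are conserved up to floating-point rounding.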

Downscaling models conceptualization and calibration
Based on the observed data, six different MRC models were developed. Different mathematical expressions and probability distributions, detailed in Appendix A, were defined to represent Eqs. (3a), (3b) and (4), depending on the hypotheses inherent to the models described below (Table 2). The models consist of three generators: a zero-weight generator, a non-zero-weight generator and a stochastic element permutation generator (SEP generator). Both the zero-weight and non-zero-weight generators (Eqs. 3a and 3b) were considered to vary with timescale (indicated with S in the model naming). The letter I in the nomenclature indicates a depth/intensity dependency for the zero-weight generator (Eqs. A2a-A2d). The temperature dependency for the zero-weight generator (Eqs. A3a-A3e) is indicated by the letter T in the nomenclature. The temperature dependency was added in an attempt to improve the robustness of the model toward climate change, under the hypothesis that the change in rainfall pattern would be correlated to the shift in temperature and that the existing observed datasets already carry the necessary information for calibration. In the models MRC S-SEP, MRC SI-SEP and MRC SIT-SEP (Table 2) the generated weights were permuted stochastically depending on the neighbour (indicated with SEP, Eqs. 4 and A5), while the MRC S, MRC SI and MRC SIT models used an equal probability (0.5) of permuting the two children weights.
Table 1. Locations and input data for current and future climate. The climate column gives the Köppen-Geiger classification for climate. Observed days is the number of days with observed data. YearPr is the annual precipitation in millimetres, YearWt the annual number of wet days (> 1 mm), and YearTe the mean annual temperature; for these quantities the 5th, 50th, and 95th percentiles are given as indicators.
Similarly to Rupp et al. (2009), the generators of the MRC models include timescale dependency through analytical formulas. In practice, this means that there is a single set of parameters per generator and not a different set at each cascade step. It ensured a relatively parsimonious number of parameters compared to other recent works on micro-canonical MRC, where a set per cascade step is often used (e.g. 12 to 36 total parameters by Bürger et al., 2019, or from 6 to 224 parameters per cascade step by Müller-Thomy, 2020). It should be noted that, based on dataset analysis and as advised by Serinaldi (2010), the micro-canonical MRC was preferred even though universal and canonical MRCs represent the most parsimonious approach in terms of parameters. This choice of a model that deviates from the hypothesis of multifractal scale invariance was supported by several other studies (Koutsoyiannis and Langousis, 2011; Veneziano et al., 2006). The number of parameters can be lower with the use of universal MRCs, which were excluded from this study due to their lack of flexibility (Serinaldi, 2010). The analytical timescale dependency also allows the model to be used with any desired initial time step shorter than 1500 min. Homogeneity of the resolution in the input datasets was not required for calibration and data processing (i.e. the model can be calibrated using multiple datasets with different resolutions between 1 min and 1 d). The parameters of each generator of the MRC models required calibration for each location. A single-step calibration, based on the processed data, was sufficient for generators with only timescale dependency. A multiple-step calibration with data manipulation was necessary for generators with depth/intensity and temperature dependency, and for the non-zero-weight generator. This choice was motivated by the development of the model through data analysis and conceptualization. In particular, the steps and calibration of the three generators were chosen to avoid compensation between processes by using a bottom-up approach (i.e. starting from local properties and then adding dependencies to progressively upscale the model). Additionally, the regression over timescale was used to avoid parameter sets that lead to the correct distribution of precipitation intensities without temporal consistency. Later studies can further improve the procedure to make the models easier to calibrate. The optimizations were based on non-linear least squares from the scipy.optimize module of the SciPy library in Python, with default parameters (Virtanen et al., 2020).
Figure 1. Workflow for downscaling to transfer a depth from time step T to time step T/2. The red boxes involve the generation of a random number. The process starts with a 1440 min time step to reach 5.625 min. An interpolation is then done to reach a 6 min time step.
Table 2. Nomenclature of the models and various quantities taken into account by each model depending on the process considered; S is the timescale, D the rainfall depth/intensity, T the temperature and N the close neighbour.
V. Pons et al.: Downscaling models to forecast green roof detention performance
- The parameters of the zero-weight generator with only timescale dependency (Eq. A1) followed a single-step calibration against observed zero-weight proportions by non-linear least squares.
- The parameters of the zero-weight generator with timescale and depth dependency (Eqs. A2a-A2d) followed a two-step calibration. (i) For each timescale, the proportion of zero weights depending on depth was evaluated using a weighted running window to compensate for the rare occurrence of extreme depths. The proportion of zero weights depending on depth was then fitted to a function (Eq. A2a). (ii) The functions modelling the parameters depending on timescale were then calibrated by least squares (parameters of Eqs. A2b, A2c and A2d).
- The parameters of the zero-weight generator with timescale, depth and temperature dependency (Eqs. A3a-A3e) followed a similar calibration procedure. (i) Using running windows of temperature, the proportion of zero weights depending on depth was fitted by least squares for different temperatures (Eq. A3a). (ii) Given a timescale, the parameters depending on temperature were fitted to a Gaussian function (Eq. A3b). (iii) The parameters of the Gaussian function depending on timescale were then fitted to a set of functions by least squares (Eqs. A3c, A3d and A3e).
- The non-zero-weight generator consisted of a truncated normal distribution on [0, 0.5] with µ = 0.5 (Eq. 3b) and a function σ depending on timescale (Eq. A4). It was chosen over the more commonly used beta distributions (McIntyre et al., 2016) after a goodness-of-fit test was applied to the historical data. The calibration was done in two steps. (i) σ was evaluated by non-linear least squares for each timescale. (ii) The parameters of Eq. (A4) were calibrated against the evaluated σ depending on timescale by least squares.
- The parameters of the SEP generator (Eq. A5) followed a single-step calibration by least squares against the processed proportion of high weights on the side of the highest neighbour depending on timescale.
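Step (i) of the non-zero-weight calibration can be illustrated with a minimal sketch that estimates σ for a single timescale. Several choices here are assumptions rather than the procedure of this study: the misfit is taken between the model and empirical cumulative distributions, a plain grid search stands in for the non-linear least squares of scipy.optimize so that the example only needs the Python standard library, and the weights are synthetic.

```python
import random
from statistics import NormalDist


def truncated_cdf(w, sigma):
    """CDF at w of a normal(mu=0.5, sigma) truncated to [0, 0.5]."""
    dist = NormalDist(mu=0.5, sigma=sigma)
    lo = dist.cdf(0.0)
    return (dist.cdf(w) - lo) / (0.5 - lo)


def fit_sigma(weights, candidates):
    """Return the candidate sigma with the smallest squared CDF misfit."""
    data = sorted(weights)
    n = len(data)
    best_sigma, best_cost = None, float("inf")
    for sigma in candidates:
        # Compare the model CDF with the empirical CDF at each sorted weight.
        cost = sum(
            (truncated_cdf(w, sigma) - (k + 0.5) / n) ** 2
            for k, w in enumerate(data)
        )
        if cost < best_cost:
            best_sigma, best_cost = sigma, cost
    return best_sigma


# Synthetic non-zero weights drawn with a known sigma (hypothetical value).
rng = random.Random(42)
true_sigma = 0.20
dist = NormalDist(mu=0.5, sigma=true_sigma)
lo = dist.cdf(0.0)
sample = [
    dist.inv_cdf(lo + (0.5 - lo) * (1.0 - rng.random())) for _ in range(5000)
]

grid = [0.05 + 0.01 * k for k in range(46)]  # candidate sigmas, 0.05 to 0.50
sigma_hat = fit_sigma(sample, grid)
```

Repeating the fit at every timescale would give the σ values against which a timescale function such as Eq. (A4) could then be calibrated in step (ii).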

Green Infrastructure modelling
In order to quantify the influence of rainfall input on green roof performance estimation, two green roofs located in Trondheim were modelled. They were selected due to data availability and the contrast in their behaviours: (i) a typical extensive green roof (E-Green roof) with sedum vegetation, 30 mm of substrate, and a 10 mm "eggbox" drainage layer (Hamouz and Muthanna, 2019), and (ii) a detention-based extensive green roof (D-Green roof) with sedum vegetation, 30 mm of substrate, and 100 mm of lightweight clay aggregates (Hamouz et al., 2020). The model (Eqs. 5a-5c) was a simple reservoir model with a smoothed linear function (Eq. 5c) for the outflow, Oudin's model for potential evapotranspiration (PET, Eq. 5b) and a soil moisture evaluation function to estimate actual evapotranspiration (AET) (Johannessen et al., 2017).
WC_i is the water content (mm) at time t_i. P_i is the precipitation (mm min−1). The discharge Q_i (mm min−1) is based on the empirical function of Eq. (5c). The temperature T_mean is in degrees Celsius, and the extra-terrestrial radiation R_a is derived from the latitude and the Julian day. The constant 1/(λρ) ≈ 0.408 depends on the latent heat and the volumetric mass of water. The factor C is a calibrated factor depending on the maximum storage and the crop factor. The smoothed linear function (Eq. 5c) has three parameters: K the conductivity slope, S_K the smoothing factor and WC_K the starting delay. The model was developed based on data from extreme tests with artificial precipitation (Hamouz et al., 2020) by establishing a relationship between water content and runoff. The outflow depending on water content was used as input for the calibration of the parameters of the discharge function using a Bayesian calibration with the DREAM setup (Laloy and Vrugt, 2012). It should be noted that the model remains limited as it lumps processes and neglects dynamic effects, i.e. the wetting of the aggregates and substrate and the spatial distribution of water content within the roof (Hamouz et al., 2020). The D-Green roof's model was tested against measured discharge with, as input, a rainfall series of 2.5 months from 10 July to 25 September 2018 and a 1-month series from 5 September to 5 October 2019. The E-Green roof's model was tested against measured discharge with a rainfall series from April 2017 to September 2017 as input. Snow periods were mostly excluded from the evaluation.
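A reservoir model of this kind can be sketched as below. Only the PET function follows a known formula (Oudin et al., 2005), with R_a assumed to be in MJ m−2 d−1 so that PET is in mm d−1. Because Eqs. (5a) and (5c) are not reproduced in this text, the softplus form of the discharge function, the linear soil moisture factor in the AET term, the explicit time stepping and all parameter values are hypothetical stand-ins.

```python
import math


def oudin_pet(t_mean, r_a):
    """Oudin PET (mm/d) from mean temperature (deg C) and R_a (MJ m-2 d-1)."""
    if t_mean + 5.0 <= 0.0:
        return 0.0
    return 0.408 * r_a * (t_mean + 5.0) / 100.0


def smoothed_linear_discharge(wc, k, s_k, wc_k):
    """Hypothetical smoothed linear outflow (mm/min).

    Close to zero below the starting delay wc_k and close to
    k * (wc - wc_k) above it, with s_k controlling the smoothing.
    """
    x = (wc - wc_k) / s_k
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))  # stable log(1+e^x)
    return k * s_k * softplus


def simulate_roof(precip, t_mean, r_a, dt, k, s_k, wc_k, c, wc_max, wc0=0.0):
    """Explicit water balance; precip in mm/min, dt in min. Returns discharge."""
    wc, discharge = wc0, []
    pet = oudin_pet(t_mean, r_a) / 1440.0  # mm/d -> mm/min
    for p in precip:
        q = smoothed_linear_discharge(wc, k, s_k, wc_k)
        q = min(q, wc / dt + p)  # cannot release more water than available
        # Assumed AET: PET scaled by C and by a linear soil moisture factor.
        aet = min(c * pet * min(wc / wc_max, 1.0), wc / dt + p - q)
        wc = max(wc + (p - q - aet) * dt, 0.0)
        discharge.append(q)
    return discharge


# Hypothetical event: 1 h of rain at 0.5 mm/min, then 5 h dry, 6 min steps.
rain = [0.5] * 10 + [0.0] * 50
flow = simulate_roof(rain, t_mean=15.0, r_a=30.0, dt=6.0,
                     k=0.01, s_k=1.0, wc_k=20.0, c=1.0, wc_max=30.0)
```

In this sketch the peak of `flow` stays well below the rainfall intensity, which is the detention effect that the two roofs are meant to provide.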

Evaluation of the downscaling models
For each location, the observed precipitation was aggregated to daily resolution and downscaled to obtain 200 time series with a 6 min time step. They were used to model the extensive and detention-based extensive green roofs in parallel. It should be noted that irrigation needs and snow periods were neglected since the primary objective of the study was to evaluate the produced time series. There were 10 projections available in Norway for RCP 8.5 and 12 in France from the EURO-CORDEX project. Each projected time series was downscaled 20 times (200 simulations for Norwegian locations, and 240 simulations for French locations) to capture the following: (i) the variability between the projections and (ii) the variability due to the nature of the downscaling model. The number of simulations per location and per period was chosen to ensure a reasonably low simulation time and to represent the stochastic uncertainty inherent to the downscaling process. The stability of the percentile estimator with 200 simulations was verified against 1000 simulations for one model and one location to validate the choice.
To evaluate the performance of the downscaling model and the projected performance of green roofs, different indicators were used:
- The lag-1 autocorrelation depending on timescale was evaluated. It was chosen to assess the temporal structure of the produced time series. The autocorrelation depending on lag time for timescales of 6, 48 and 180 min was used for an in-depth analysis.
- The survival distributions of precipitation and discharge from both roofs were assessed at 6 min time steps. This approach is similar to the use of flow duration curves recently applied to green roofs by Johannessen et al. (2018). The exceedance probabilities were presented with a log axis to account for extreme probabilities. The median, 5th and 95th percentiles of the downscaled time series were represented. The survival distribution of discharge from the roofs with downscaled time series compared to the distribution based on observed data indicates the applicability of the downscaled time series as an input for green infrastructure modelling.
- Along with the survival distribution, a performance indicator derived from the Kolmogorov-Smirnov (KS) distance was used. The KS distance itself was not relevant for the survival distributions, where the extreme probabilities are of prime importance. The authors did not find a standard indicator for such cases in the literature; therefore an indicator that penalizes errors at extreme probabilities more heavily was developed.
- Three different discharge thresholds were used to report exceedance frequency in different operating modes: 1 L s−1 ha−1 for small events, 10 L s−1 ha−1 for major events and 100 L s−1 ha−1 for extreme events. Those thresholds were chosen in common for all roofs to facilitate comparison. They represent a compromise to have the same operating modes for each location even if the occurrence of those modes differs due to different climate conditions. Small event durations were counted in days per year, major events in hours per year and extreme events in minutes per year.
- The distribution of dry periods and the retention fraction were computed. They are not expected to be affected by the downscaling process since the dry periods affecting the roofs can be observed at daily resolution, and the retention fraction can be estimated with conceptual models using daily time-step data. However, they provide additional information to analyse the behaviour of the roofs.
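Two of these indicators are simple enough to sketch directly. The unit conversion is exact (1 L s−1 ha−1 = 10−3 m3 s−1 over 104 m2 = 10−4 mm s−1 = 0.006 mm min−1); the function names, the strict "greater than" comparison and the sample discharge values are assumptions made for this example.

```python
def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a series (NaN for a constant series)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0.0:
        return float("nan")
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return cov / var


# 1 L s-1 ha-1 expressed as a depth rate in mm/min.
L_S_HA_TO_MM_MIN = 0.006


def exceedance_minutes(discharge_mm_min, threshold_l_s_ha, dt_min=6.0):
    """Total time (min) during which discharge exceeds the threshold."""
    limit = threshold_l_s_ha * L_S_HA_TO_MM_MIN
    return dt_min * sum(1 for q in discharge_mm_min if q > limit)


# Hypothetical discharge series (mm/min) at 6 min time steps.
q = [0.0, 0.01, 0.1, 1.0, 0.05]
small = exceedance_minutes(q, 1.0)      # small events, > 1 L/s/ha
major = exceedance_minutes(q, 10.0)     # major events, > 10 L/s/ha
extreme = exceedance_minutes(q, 100.0)  # extreme events, > 100 L/s/ha
r1 = lag1_autocorrelation([1.0, 2.0, 3.0, 4.0])
```

Dividing the exceedance time by the number of simulated years and converting to days, hours or minutes would give the per-year durations reported for the three operating modes.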

Hybrid event-based downscaling
In order to assess the applicability of downscaled time series to predict the future performance of green infrastructure, the methods were compared to the current recommended practice in the locations: the use of an event-based design method based on IDF curves with a climate factor (CF) (Kristvik et al., 2019). In particular, the variational method (Alfieri et al., 2008) was applied. Given a return period, it consists of considering the constant-intensity rainfall leading to the highest discharge. It should be noted that the comparison was intended to follow the recommended design method and not the guidelines of a specific city, since these can differ in terms of regulation. For instance, in Trondheim a threshold for maximum discharge has to be fulfilled (Trondheim Kommune, 2015), while in Lyon the first 15 mm of a 20-year return period rainfall has to be retained, and beyond those 15 mm a threshold is set for maximum discharge from the parcel (Greater Lyon council, 2020). The longest available time series, which originated from Trondheim, was the most adequate for this example. For 2-, 5- and 10-year return period rainfall and runoff events, three approaches were compared: (i) peak runoff of runoff events based on an observed precipitation time series (reference), (ii) peak runoff of rainfall events based on the variational method and the IDF curves, with and without climate factor (typical design approach), and (iii) a hybrid approach based on downscaling 10^5 rainfall events with a daily depth based on the return period curves, with and without climate factors. This last approach used the MRC SIT-SEP model, and the initial water content was set to the most probable value based on analysis of a long time series. According to the current recommendation in Norway for Trondheim municipality, a climate factor of 1.4 was applied (Dyrrdal and Førland, 2019).
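The idea of the variational method can be illustrated with a short sketch: for one return period, every candidate duration gives a constant-intensity block rainfall from the IDF curve, and the duration producing the highest peak discharge is retained. The IDF expression and its coefficients, the linear reservoir used in place of the green roof model, and all parameter values are hypothetical.

```python
def idf_intensity(duration_min, a=5.0, b=10.0, c=0.75):
    """Hypothetical IDF curve for one return period (intensity in mm/min)."""
    return a / (duration_min + b) ** c


def peak_discharge(intensity, duration_min, k=0.02, dt=1.0, wc0=0.0):
    """Peak outflow (mm/min) of a linear reservoir Q = k * WC under block rain."""
    wc, peak = wc0, 0.0
    for _ in range(int(duration_min / dt)):
        q = k * wc
        wc += (intensity - q) * dt
        peak = max(peak, k * wc)
    return peak


def variational_design(durations, climate_factor=1.0):
    """Return (critical duration, peak discharge) over the tested durations."""
    candidates = (
        (d, peak_discharge(climate_factor * idf_intensity(d), d))
        for d in durations
    )
    return max(candidates, key=lambda pair: pair[1])


durations = [5, 10, 15, 30, 60, 120, 180, 360, 720, 1440]  # minutes
d_ref, peak_ref = variational_design(durations)
d_cf, peak_cf = variational_design(durations, climate_factor=1.4)
```

With an initially empty linear reservoir the peak scales directly with the climate factor. A model with a starting delay such as WC_K would respond non-linearly, which is one reason to compare the event-based and downscaling-based approaches.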

Green infrastructure model
The parameterized empirical reservoir model was applied to the extensive green roof and the detention-based extensive green roof. The performance was evaluated both on the time series and on individual events extracted from the time series. The criteria were (i) the Nash-Sutcliffe efficiency (NSE) indicator on time series for both discharge and water content, (ii) the NSE for rainfall events defined with a minimum inter-event time of 6 h to further analyse the behaviour of the model, and (iii) the volumetric error on the time series to account for model retention evaluation. The water content was estimated directly from discharge measurement using the empirical curve. The performance was as follows:
- NSE > 0.8 for both discharge and water content for the extensive green roof. For the three most intense events the NSE ranged from 0.75 to 0.9. The water balance error was found to be 2.1 %.
- NSE > 0.94 for both discharge and water content for the detention-based extensive roof. For the three most intense events the NSE ranged from 0.85 to 0.96. The water balance error was found to be 5 %.
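For reference, the NSE compares the squared model error with the variance of the observations: NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))². A minimal sketch follows; the volumetric error is assumed here to be the relative difference between simulated and observed volumes, since its exact definition is not given in the text, and the sample series are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def volumetric_error(observed, simulated):
    """Relative volume difference (assumed definition of the error)."""
    return (sum(simulated) - sum(observed)) / sum(observed)


obs = [0.0, 1.0, 2.0, 3.0, 4.0]
perfect = nash_sutcliffe(obs, obs)
mean_only = nash_sutcliffe(obs, [2.0] * 5)
vol_err = volumetric_error([1.0, 2.0, 3.0], [1.1, 2.2, 3.3])
```

A model that only reproduces the observed mean scores 0, so the values above 0.8 reported here indicate that most of the observed variance is captured.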
The conceptual limitation of the model can be seen in Fig. 2 at the beginning of the events of the testing period. It suggests that short events with low intensity are not reproduced well by the model as it cannot represent the delay induced by the wetting of the different layers of the roofs. Since the objectives of this study involve the use of a simple model to reproduce the behaviour of two roofs, the model was not further improved.

Analysis of climate properties
Figure 3 presents the zero-weight proportion depending on timescale, depth and temperature for two different datasets (Bodø and Hamar). In Fig. 3a the proportion of zero weights decreases with increasing timescale for Bodø. In Hamar the proportion decreases until 45 min and increases for higher timescales. Based on this observation, two types of datasets were identified in terms of zero-weight occurrence. For data from Bodø, Bergen, Kristiansund and Trondheim, the proportion of weights that equalled zero decreased with increasing timescale. For the data from Hamar, Lyon, Marseille and Kristiansand, the proportion decreased until the 45 min timescale and increased afterwards (Fig. B1a). Given a timescale, the proportion of weights equal to zero was not uniform across depths (e.g. Bodø and Hamar in Fig. 3b with a timescale of 48 min). Therefore, the monotonicity or non-monotonicity of the proportion of weights equal to zero depending on timescale can be explained by different distributions of depth in the observed data. The proportion depended on depth, which is consistent with previous work (Rupp et al., 2009). It should be noted that a high proportion of zero weights is linked to shorter and more intense rainfall events. It could explain why the proportion is higher in Lyon than in Bergen (cf. Appendix B, Fig. B1a).
In Fig. 3b, the zero-weight proportion decreases with increasing depth for the case of Bodø. In the case of Hamar, it increases for depths higher than 2 mm. Figure 3c and d show that a temperature dependency may explain this behaviour. In Bodø, the proportion depending on depth gives similar results for different ranges of temperature at 48 min resolution (Fig. 3c). On the contrary, in Hamar, the subsets with lower temperature lead to a lower proportion of weights being equal to zero, compared to subsets with higher temperature (Fig. 3d). Moreover, the higher depths were observed in subsets with higher temperature. The increase observed in Hamar can be explained by the distribution of observed values. It is consistent with the observation of different temporal distributions of rainfall for different temperature ranges, such as convective rains (Berg et al., 2013; Zhang et al., 2013). If, for a depth of 10 mm at a resolution of 48 min, the probability of a weight being equal to zero is higher, then the probability of an intense rainfall is also higher. The non-homogeneity of observed datasets and the shift in temperature with climate change might lead to inconsistency in time series produced by downscaling methods that exclude depth and/or temperature dependency. The 48 min timescale was chosen to exemplify these properties. The same properties can be observed for different timescales, but the magnitude differs and tends to decrease with increasing timescale (Fig. B1b, c and d). Developing a model without temperature dependency might prevent comparability of parameters between locations and does not necessarily lead to parameter-parsimonious models. Moreover, a model such as MRC SI can result in overfitting when used with datasets like Hamar. The functions necessary to represent the behaviour without considering the temperature dependency are more complex and less explanatory.
Based on this analysis, it was possible to add the temperature dependency and conceptualize a more explanatory model (MRC SIT, with Eqs. A3a-A3e) with more robust results for the influence of climate change. This relies on the hypothesis that the information about the correlation between rainfall and temperature will be expressed in the same way through those variables in the future.
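The zero-weight proportion discussed in this section can be estimated from an observed fine-resolution series, as in the following sketch. It assumes a cascade with a branching number of 2 (each coarse step is split into two halves) and counts a zero weight whenever all the depth of a wet coarse step falls in one half. The function name and the depth-bin arguments are hypothetical.

```python
def zero_weight_proportion(fine_depths, depth_min=0.0, depth_max=float("inf")):
    """Share of wet coarse steps whose depth falls entirely in one half.

    fine_depths: rainfall depths at the fine resolution (e.g. mm per 24 min);
    adjacent pairs are summed into coarse steps (e.g. 48 min).
    Only coarse steps with a depth in (depth_min, depth_max] are counted.
    """
    wet = 0
    zero = 0
    for i in range(0, len(fine_depths) - 1, 2):
        first, second = fine_depths[i], fine_depths[i + 1]
        total = first + second
        if total <= 0 or not (depth_min < total <= depth_max):
            continue  # dry coarse step, or depth outside the requested bin
        wet += 1
        if first == 0 or second == 0:
            zero += 1  # one weight is 0 and the other is 1
    return zero / wet if wet else float("nan")
```

Applying the function to subsets of the data selected by temperature range would give the kind of comparison shown for Bodø and Hamar.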

Evaluation of the downscaling methods
An overview of the performance of the downscaling and green roof models in Bergen is presented in Fig. 4. All the downscaling models performed similarly in terms of dry-period distribution and slightly underestimated the dry periods in the observed data (Fig. 4b). The dry periods were directly linked to the zero-weight probability. In green infrastructure modelling, the length of the dry periods influences the retention performance, as it can lead to water stress hindering evapotranspiration. However, dry periods leading to water stress can also be evaluated with daily time-step series (there is no need for minute time-step series). Therefore, dry periods longer than the initial daily resolution are not significantly affected by downscaling.
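As an illustration of how a dry-period distribution can be extracted from a rainfall series, a minimal sketch follows. A dry period is assumed here to be a run of consecutive zero-depth time steps; the function name is hypothetical and the study may apply a different definition.

```python
def dry_period_lengths(depths, step_minutes):
    """Return the durations (in minutes) of consecutive runs of zero-depth steps."""
    lengths = []
    run = 0
    for depth in depths:
        if depth == 0:
            run += 1
        elif run:
            lengths.append(run * step_minutes)
            run = 0
    if run:
        lengths.append(run * step_minutes)  # series ends during a dry period
    return lengths
```

The resulting list can be sorted to build an empirical distribution and compared between observed and downscaled series.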
The distribution of precipitation (Fig. 4a) was properly reproduced by the MRC SI, MRC SI-SEP, MRC SIT and MRC SIT-SEP models (KS rel = 1 in this case, indicating that the maximum distance has the same order of magnitude in data and model results), while MRC S and MRC S-SEP underestimated low precipitation depths and overestimated high precipitation depths (KS rel = 10², meaning that the maximum distance reached 2 orders of magnitude). This was expected, as the time steps with high depth have a higher probability of not being split in the observed data. This is not the case for the MRC S and MRC S-SEP models, for which the probability is uniformly distributed. In Bergen, the observed precipitation was within the 90 % coverage interval for MRC SI, MRC SI-SEP, MRC SIT and MRC SIT-SEP. For these four models, the discharge of the D-Green roof was underestimated by 1 order of magnitude, with a KS rel of 1.7 × 10¹ (Fig. 4c), due to the behaviour of the roof, which only rarely produces high discharge. The downscaling probably tends to generate hyetographs that are less favourable for this roof. Although the discharge of the E-Green roof did not fall in the 90 % coverage interval, it can be considered slightly underestimated, since the magnitude is similar, with a KS rel of 2.0 (Fig. 4d). However, this was not the case for all locations: in Hamar the most extreme precipitation tended to be underestimated, while the discharge from both roofs had the same order of magnitude as the observed data but tended to be overestimated. These findings could suggest inconsistency in the temporal structure of rainfall. This hypothesis is supported by the autocorrelation (Fig. 5), which was overestimated at the 6 min time step.
The autocorrelation was underestimated by the MRC S and MRC S-SEP models. The use of the rainfall continuity indicator increased the lag-1 autocorrelation for all models but did not improve the overall performance. The models MRC SI, MRC SI-SEP, MRC SIT and MRC SIT-SEP underestimated the lag-1 autocorrelation between the 48 and 300 min timescales, but an in-depth analysis with different lags at the 48 and 180 min timescales shows that, despite this underestimation at lag 1, the general behaviour of the observed time series is reproduced. Similar observations were made for other locations.
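The autocorrelation analysis at several timescales can be sketched as follows: the fine-resolution series is aggregated to the timescale of interest, and the sample autocorrelation is computed at the chosen lag. The function names are hypothetical, and the biased estimator used here is an assumption, not necessarily the estimator used in the study.

```python
def aggregate(depths, factor):
    """Sum consecutive groups of `factor` time steps (an incomplete tail is dropped)."""
    n = len(depths) // factor
    return [sum(depths[i * factor:(i + 1) * factor]) for i in range(n)]


def autocorrelation(series, lag=1):
    """Sample autocorrelation at the given lag (biased estimator)."""
    n = len(series)
    if lag < 1 or lag >= n:
        raise ValueError("lag must satisfy 1 <= lag < len(series)")
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    if variance_sum == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance_sum = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum
```

For a 6 min series, `autocorrelation(aggregate(depths, 8), lag=1)` would give the lag-1 value at the 48 min timescale.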
To evaluate the produced time series, it is necessary to compare the discharge simulated with observed time series to the discharge simulated with downscaled time series. For most of the locations, the predicted range of precipitation or discharge deviated, for the lowest probabilities, from the values obtained with observed time series: (i) when the precipitation range matched the observed distribution, the discharge tended to be overestimated; (ii) when the precipitation was underestimated, the discharge with observed data tended to lie in the range obtained from downscaled time series. While the downscaled time series suffer from some limitations when compared to results obtained from the observed time series, the raw discharge time series might not be suitable either for robust decision making in green infrastructure implementation, as they do not represent the natural variation of performance of green infrastructure.
In order to evaluate the potential of discharge from downscaled time series to approach the range of performance linked to natural variability, a 3-year moving window was used on the precipitation time series and on the discharge time series resulting from observed precipitation. The resulting 5th and 95th percentiles of the annual duration exceeding 1, 10 and 100 L s⁻¹ ha⁻¹ are presented in Fig. 6 to evaluate the time series in different operating modes of the roofs. They are compared to the stochastic variability (5th and 95th percentiles) from the six models. Each horizontal line in Fig. 6 represents the range between the 5th and 95th percentiles for the threshold and model considered. The thresholds represent the discharge for small events, major events and extreme events, respectively. In Fig. 4, the thresholds correspond to 0.006, 0.06 and 0.6 mm min⁻¹. A good estimate is defined by a complete or partial overlap between the observed natural variability and the stochastic variability range; the order of magnitude of the estimates should be similar. For instance, in Bergen (first column), the annual duration for which the discharge of the E-Green roof exceeds 100 L s⁻¹ ha⁻¹ (third row) ranges, based on observed input, from 4 to 10 min. The MRC S model provided values around 200 min; it is not a good estimate, as there is no overlap and the order of magnitude differs. The MRC SI model resulted in a range from 10 to 20 min; it is a good estimate, as the ranges overlap and the orders of magnitude are similar. The MRC S and MRC S-SEP models tend to underestimate the order of magnitude of the range of exceedance frequencies of the small events (1 L s⁻¹ ha⁻¹) (in Bergen, Hamar and Marseille) but tend to overestimate major events (10 L s⁻¹ ha⁻¹) (Hamar) and extreme events (100 L s⁻¹ ha⁻¹) (Bergen, Bodø, Hamar and Marseille). The other models gave mostly good estimates for each of the thresholds (Figs. 6, C1).
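The correspondence between the thresholds in L s⁻¹ ha⁻¹ and in mm min⁻¹, and the computation of an annual exceedance duration, can be sketched as follows. The function names are hypothetical, and averaging the annual durations over each 3-year window is an assumption about how the moving window is applied.

```python
def lsha_to_mm_per_min(q_l_s_ha):
    """Convert L s^-1 ha^-1 to mm min^-1 (1 L over 1 m2 is 1 mm; 1 ha is 10 000 m2)."""
    return q_l_s_ha * 60.0 / 10_000.0


def annual_exceedance_minutes(discharge_mm_min, step_minutes, threshold_l_s_ha, years):
    """Mean number of minutes per year during which discharge exceeds the threshold."""
    threshold = lsha_to_mm_per_min(threshold_l_s_ha)
    steps_above = sum(1 for q in discharge_mm_min if q > threshold)
    return steps_above * step_minutes / years


def moving_window_means(yearly_values, window=3):
    """Mean of each run of `window` consecutive yearly values (assumed window statistic)."""
    return [
        sum(yearly_values[i:i + window]) / window
        for i in range(len(yearly_values) - window + 1)
    ]
```

For example, `lsha_to_mm_per_min(1)` gives 0.006 mm min⁻¹, which matches the lowest threshold quoted for Fig. 4.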
In Marseille, the models MRC SI, MRC SI-SEP, MRC SIT and MRC SIT-SEP tended to underestimate the upper bound of the extreme event precipitation, with values lower than 50 min yr⁻¹, whereas the observed time series led to a maximum of 90 min yr⁻¹. However, those models kept the order of magnitude, while the MRC S and MRC S-SEP models estimated it to be higher than 10² min. The same behaviour was observed with the Hamar (Fig. 6) and Lyon datasets (Appendix C, Fig. C1). This suggests that the models performed worse for drier locations, possibly due to the calibration procedure, since fewer wet days are available for calibration. MRC SI and MRC SIT performed similarly, but due to its structure, MRC SI may overfit to the calibration data. This could result in an inaccurate prediction in the case of a significant temperature shift between the calibration and prediction datasets. To conclude, MRC S and MRC S-SEP lead to an overestimation of the natural variability, while MRC SI, MRC SI-SEP, MRC SIT and MRC SIT-SEP give more accurate estimates.
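The good-estimate criterion used in this evaluation (overlap of the ranges and similar order of magnitude) can be expressed as a small check. This is only a sketch: the function name is hypothetical, and "similar order of magnitude" is interpreted here, as an assumption, as the geometric midpoints of the two ranges differing by less than a factor of 10.

```python
import math


def is_good_estimate(observed_range, model_range):
    """Check overlap and similar order of magnitude of two (low, high) ranges.

    Both ranges must have strictly positive bounds (e.g. minutes per year).
    """
    obs_low, obs_high = observed_range
    mod_low, mod_high = model_range
    overlap = mod_low <= obs_high and obs_low <= mod_high
    # Geometric midpoint of each range on a log10 scale (assumed comparison).
    obs_mid = 0.5 * (math.log10(obs_low) + math.log10(obs_high))
    mod_mid = 0.5 * (math.log10(mod_low) + math.log10(mod_high))
    similar_magnitude = abs(obs_mid - mod_mid) < 1.0
    return overlap and similar_magnitude
```

With the Bergen example, an observed range of 4 to 10 min and a modelled range of 10 to 20 min pass the check, whereas a modelled range around 200 min does not.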

Assessment of green roof future performance
All six models were used to assess the performance of green roofs under future climate, as illustrated for Bergen in Fig. 7. It was nevertheless acknowledged that the MRC S and MRC S-SEP models gave less accurate estimates. The four models MRC SI, MRC SI-SEP, MRC SIT and MRC SIT-SEP led to similar results in Bergen (Fig. 7). The difference in estimates between the models with the coherence indicator (MRC SI-SEP, MRC SIT-SEP) and without it (MRC SI, MRC SIT) was negligible in comparison to the stochastic uncertainty inherent to the models and the variability linked to the different projections available under RCP 8.5 (Fig. 7). In Bergen, according to the projections, the performance of both roofs is likely to worsen: under the current climate, the 100 L s⁻¹ ha⁻¹ exceedance was lower than 1 min for the D-Green roof; according to the MRC SIT-SEP model it might reach between 5 and 19 min in the future climate. This suggests a shift in the order of magnitude from 10⁰ to more than 10¹ min. Similarly, the E-Green roof might have a 100 L s⁻¹ ha⁻¹ exceedance shift from 10¹ to 10² min, which means that the threshold would regularly be reached. As illustrated by Figs. 8 and C2, the performance shift depends highly on the location. While the 100 L s⁻¹ ha⁻¹ exceedance of the green roofs was likely to get worse in Bergen, it was found to stay stable despite a small increase in Bodø and to improve in Hamar and Marseille. The increase of exceedance frequency in the Norwegian cities was due to an increase in precipitation (Table 1).
However, the increase in temperature led to an increase in potential evapotranspiration and therefore might have attenuated or even counterbalanced the effect of the rainfall increase by lowering the initial water content in the roofs at the beginning of a rainfall event. Table 3 shows that the retention fraction was likely to decrease in Bergen, Bodø, Hamar, Kristiansand and Kristiansund. It was found to increase more significantly in Lyon and Marseille, and slightly in Trondheim. The models with temperature dependency performed similarly to the models with only depth dependency in most of the locations. However, in Lyon and Marseille, the 100 L s⁻¹ ha⁻¹ exceedance or precipitation predicted differed from [16][17][18][19][20][21][22][23][24][25][26][27]. This suggests that some locations are more sensitive than others to temperature-dependent patterns. The models MRC SI, MRC SI-SEP, MRC SIT and MRC SIT-SEP allow the shift in performance of the different roofs to be evaluated using the exceedance range.

Design perspectives
The potential of downscaling models to improve current practices was investigated. Figure 9 presents results based on continuous simulation, on the variational method and on the hybrid approach with downscaled events. It shows that the variational method underestimated the peak runoff obtained with observed data, whereas the distribution from the hybrid approach covered these peaks. This suggests that the variational method might not be conservative enough when compared to peak runoff from runoff events instead of rainfall events. Even if the hybrid event-based downscaling leads to a realistic distribution based on probable rainfall events, the downscaling models might need a different calibration or conceptualization to be optimized specifically for extreme events. Moreover, the initial water content for the events remains a limitation of this method. The observed peaks show a range of possible outcomes, which highlights the limitations of the variational method with its single estimate, whereas the hybrid event-based downscaling method, leading to a range of probable outcomes, gave promising results that can lead to more robust design and decision making. Due to its characteristics, the shift in performance between current climate and future climate is higher for the E-Green roof than for the D-Green roof. This is due to the detention layer in the D-Green roof, which is not saturated by a 10-year return period event (Hamouz et al., 2020).
Figure 7. Comparison between performance under current climate and future climate in Bergen for the MRC S, MRC S-SEP, MRC SI, MRC SI-SEP, MRC SIT and MRC SIT-SEP models, with a range from the 5th to 95th percentiles. They are compared to observed input, which represents the fine-resolution observed time series or simulation using this time series as input.

Conclusions
In this study, multiplicative random cascade models with different variable dependencies were developed. They were based on a study of timescale, depth and temperature characteristics of the datasets to ensure a consistent structure, with a view to applying them to daily-resolution climate projections. The applicability of the synthetic time series as input for performance modelling of green infrastructure was evaluated. They were used to predict the shift in runoff exceedance under a future climate. Six downscaling models were developed: two models with only timescale dependency (MRC S and MRC S-SEP), two models with timescale and depth dependency (MRC SI and MRC SI-SEP), and two models with timescale, depth and temperature dependency (MRC SIT and MRC SIT-SEP). The models MRC S-SEP, MRC SI-SEP and MRC SIT-SEP include a rainfall continuity property with the intention of improving the temporal structure of the rainfall. The parametrization of the models ensures the continuity of the different properties modelled and a low number of parameters.
Figure 8. Future performance of green roofs (D and E) in Bergen, Bodø, Hamar and Marseille; exceedance frequency for small events, major events and extreme events. The stochastic variability linked to the downscaled time series is evaluated with the 5th to 95th percentiles. Observed represents the fine-resolution observed time series or simulation using this time series as input. The 5th to 95th percentiles were estimated with a 3-year moving window. Due to the log axis, occurrences lower than 10⁰ are not visible.
The MRC S and MRC S-SEP models were not sufficient to predict the future performance of green infrastructure, as they led to an overestimation of runoff; the MRC SI, MRC SI-SEP, MRC SIT and MRC SIT-SEP models led to better performance: it was possible to predict the runoff exceedance frequency with a similar order of magnitude to the estimate of the natural variability of performance based on observed time series. The structure of the MRC SI and MRC SI-SEP models makes them more vulnerable to overfitting than MRC SIT and MRC SIT-SEP, which makes them less reliable for future performance estimates. However, the differences between them were negligible compared to the variability linked to the different outcomes of climate models, the variability inherent to the model and its accuracy. The MRC S-SEP, MRC SI-SEP and MRC SIT-SEP models add an equation to improve the temporal structure of downscaled rainfall. The models predicted higher runoff from the detention-based extensive green roof, which is consistent with their properties; however, the change in performance was not significant compared to the stochastic uncertainty.
Under RCP 8.5, the different downscaling models and the green roof models suggest that the performance shift due to climate change highly depends on the location. The runoff exceedance is likely to increase in Bergen, while slightly decreasing in Lyon and Marseille and keeping the same order of magnitude in the other locations. The results were compared to one of the current practices: the use of the variational method with a climate factor. The comparison highlighted the limitation of this practice, which provides a single estimate and underestimates the observed peaks. A hybrid method using downscaling on extreme events led to promising results by estimating a distribution of peak runoff performance.
The models performed well in the eight locations and four different climates. The use of a more advanced calibration procedure with Bayesian methods should improve the results. Similarly, a sensitivity analysis could improve the parametrization, especially for the models with depth and temperature dependency, in order to fix non-behavioural parameters. The current study does not include irrigation and snow modelling; a study centred on green infrastructure modelling is therefore needed to extend the results. In order to be applied in practice on event-based simulation for design perspectives, the downscaling models need to be improved with a calibration procedure developed for extreme events and not on the complete spectrum of observations as in the current study.
Figure 9. Performance depending on the return period in Trondheim for the extensive green roof (a) and the detention-based extensive green roof (b). The transparent coloured area is the distribution based on the hybrid event-based downscaling under current climate; the dotted line denotes the use of the climate factor (CF); the points represent the peak runoff of runoff events from observed precipitation; the vertical lines show the results based on the variational method (VM). The 2-, 5- and 10-year return periods are displayed.

A5 SEP generator
The stochastic element permutation follows a function that generates, depending on timescale, the threshold to be compared to a uniformly generated random number:
$$\mathrm{SEPGen}_S(S_{\mathrm{time}}) = a_{14} \log(S_{\mathrm{time}})^4 + a_{13} \log(S_{\mathrm{time}})^3 + a_{12} \log(S_{\mathrm{time}})^2 + a_{11} \log(S_{\mathrm{time}}) + a_{10}. \tag{A5}$$
Figure C2. Future performance of green roofs (D and E) in Lyon, Kristiansand, Kristiansund and Trondheim; exceedance frequency for small events, major events and extreme events. The stochastic variability linked to the downscaled time series is evaluated with the 5th to 95th percentiles. Observed represents the fine-resolution observed time series or simulation using this time series as input. The 5th to 95th percentiles were estimated with a 3-year moving window. Due to the log axis, occurrences lower than 10⁰ are not visible.
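The SEP generator of Eq. (A5) is a fourth-degree polynomial in the logarithm of the timescale. The sketch below evaluates it and compares the result to a uniform random number. The function names are hypothetical; the use of the natural logarithm and the direction of the comparison (permute when the random number is below the threshold) are assumptions, and the coefficients must come from calibration.

```python
import math
import random


def sep_threshold(timescale_minutes, coeffs):
    """Evaluate a10 + a11*x + a12*x^2 + a13*x^3 + a14*x^4 with x = log(timescale).

    coeffs: (a10, a11, a12, a13, a14), lowest degree first.
    The natural logarithm is an assumption; the base is not stated in Eq. (A5).
    """
    x = math.log(timescale_minutes)
    result = 0.0
    for a in reversed(coeffs):  # Horner's scheme, highest degree first
        result = result * x + a
    return result


def should_permute(timescale_minutes, coeffs, rng=random):
    """Assumed decision rule: permute when a uniform draw falls below the threshold."""
    return rng.random() < sep_threshold(timescale_minutes, coeffs)
```

Passing a seeded `random.Random` instance as `rng` makes the stochastic decision reproducible.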
provided the French data. TMM, ES and JLBK supervised each step of the study. VP wrote the first manuscript. The manuscript was revised by all co-authors.