Technical note: Stochastic simulation of streamflow time series using
phase randomization

Abstract. Stochastically generated streamflow time series are widely used in water resource planning and management. Such series represent sets of plausible yet unobserved streamflow realizations which should reproduce the main characteristics of observed data. These characteristics include the distribution of daily streamflow values and their temporal correlation as expressed by short- and long-range dependence. Existing streamflow generation approaches have mainly focused on the time domain, even though simulation in the frequency domain provides good properties. These properties comprise the simulation of both short- and long-range dependence, as well as extension to multiple sites. Simulation in the frequency domain is based on the randomization of the phases of the Fourier transformation. We here combine phase randomization simulation with a flexible, four-parameter Kappa distribution, which allows for the extrapolation to yet unobserved low and high flows. The simulation approach consists of seven steps: 1) fitting the theoretical Kappa distribution, 2) normalization and deseasonalization, 3) Fourier transformation, 4) random phase generation, 5) inverse Fourier transformation, 6) back transformation, and 7) simulation. The simulation approach is applicable both to individual and multiple sites. It was applied to and validated on a set of four catchments in Switzerland. Our results show that the stochastic streamflow generator based on phase randomization produces realistic streamflow time series with respect to distributional properties and temporal correlation. However, cross-correlation among sites was in some cases found to be underestimated. The approach can be recommended as a flexible tool for various applications such as the dimensioning of reservoirs or the assessment of drought persistence.


Abstract. Stochastically generated streamflow time series are widely used in water resource planning and management. Such series represent sets of plausible yet unobserved streamflow realizations which should reproduce the main characteristics of observed data. These characteristics include the distribution of daily streamflow values and their temporal correlation as expressed by short-and long-range dependence. Existing streamflow generation approaches have mainly focused on the time domain, even though simulation in the frequency domain provides good properties. These properties comprise the simulation of both shortand long-range dependence as well as extension to multiple sites. Simulation in the frequency domain is based on the randomization of the phases of the Fourier transformation. We here combine phase randomization simulation with a flexible, four-parameter kappa distribution, which allows for the extrapolation to as yet unobserved low and high flows. The simulation approach consists of seven steps: (1) fitting the theoretical kappa distribution, (2) normalization and deseasonalization of the marginal distribution, (3) Fourier transformation, (4) random phase generation, (5) inverse Fourier transformation, (6) back transformation, and (7) simulation. The simulation approach is applicable to both individual and multiple sites. It was applied to and validated on a set of four catchments in Switzerland. Our results show that the stochastic streamflow generator based on phase randomization produces realistic streamflow time series with respect to distributional properties and temporal correlation. However, cross-correlation among sites was in some cases found to be underestimated. The approach can be recommended as a flexible tool for various applications such as the dimensioning of reservoirs or the assessment of drought persistence.

Highlights.
1. Stochastic simulation of streamflow time series for individual and multiple sites by combining phase randomization and the kappa distribution.
2. Simulated time series reproduce temporal correlation, seasonal distributions, and extremes of observed time series.
3. Simulation procedure suitable for use in water resource planning and management.

Introduction
Stochastically generated streamflow time series are used in various applications of water resource planning and management. These applications include water and reservoir management, the determination of the dimensions of hydraulic structures such as reservoirs, and the estimation of hydrological extremes such as droughts and floods. Stochastically generated time series mimic the characteristics of observed data and represent sets of plausible realizations of streamflow sequences (Ilich, 2014;Borgomeo et al., 2015;Tsoukalas et al., 2018b). They are essential for many uncertainty studies in hydrology because they can serve as input for deterministic Published by Copernicus Publications on behalf of the European Geosciences Union.
water system models in which they allow for the propagation of natural variability and uncertainty (Tsoukalas et al., 2018b). Stochastic models for the generation of synthetic streamflow time series need to fulfil certain requirements. They should reproduce both the marginal distribution of observed streamflow time series as well as their temporal dependence structure (Sharma et al., 1997;Salas and Lee, 2010). Temporal dependence encompasses both short-and long-range dependence. While short-range dependence typically refers to the dependence of daily streamflow values measured within a few days, long-range dependence refers to dependencies across months or years. This temporal dependence has been found to depend on magnitude, in that low values have stronger dependence than high values (Lee and Salas, 2011). A proper representation of this long-range dependence is of particular importance in studies where storage in reservoirs is of interest (Tsoukalas et al., 2018b). If one is interested in extreme events, the model should allow for the generation of values that go beyond the magnitude of those observed (Herman et al., 2016). This requires the choice of a suitable theoretical marginal distribution. Streamflow typically exhibits a skewed distribution, which requires the use of a three-or more-than-three-parameter distribution (Koutsoyiannis, 2000;Blum et al., 2017). Studies looking at individual hydrological events such as floods or droughts require a daily resolution. Therefore, the stochastic model should allow for outputs at such a fine temporal resolution. Often, study regions encompass several sites whose streamflows are correlated. Consequently, the model ideally not only allows for the simulation of streamflow at individual sites, but also for the joint simulation of streamflow at multiple sites, taking into account their spatio-temporal dependence. From a practitioner's point of view, the model should not only reproduce the characteristics of the observed data, but it should also be simple (Sharma et al., 1997).
Many different approaches have been proposed for the stochastic simulation of streamflow time series, each able to fulfil some but usually not all of the desired properties listed above. One commonly used approach is the use of a synthetic weather generator in combination with a rainfallrunoff model (Pender et al., 2015). This approach is affected by uncertainties due to hydrological model selection and calibration, which can be avoided by using direct synthetic streamflow generation approaches (Herman et al., 2016). According to Stedinger and Taylor (1982), the development of such direct approaches consists of the following steps: (1) selection of a model for seasonal marginal distributions, (2) selection of a model for spatial and temporal dependence, and (3) validation of the model. Different groups of direct approaches exist which are distinct in terms of their flexibility regarding marginal distributions and temporal dependence structures.
A first group of models consists of parametric models such as autoregressive moving average (ARMA) models and their modifications. While these models are commonly used in stochastic hydrology, they only allow for modelling of shortrange dependence because their autocorrelation decreases strongly with increasing lag time (Sharma et al., 1997). This means they guarantee neither the reproduction of observed persistence of annual flows nor the correlation structure among flows in different months (Stedinger and Taylor, 1982). This makes them unsuitable for applications where long-range dependence is important (Koutsoyiannis, 2000). However, AR models can be used to generate seemingly long-memory processes if a parametric autocorrelation structure is used to fit the data (Papalexiou, 2018). A second group of parametric models is based on the temporal disaggregation of annual series and enables the representation of longrange dependence (Stedinger and Taylor, 1982;Salas and Lee, 2010). These models include fractional Gaussian noise models (Mandelbrot, 1965), fast fractional Gaussian noise models (Mandelbrot, 1971), broken line models (Mejia et al., 1972), and fractional autoregressive integrated moving average models (Hosking, 1984). Disaggregation models can be extended to multi-site applications (Grygier and Stedinger, 1988). However, this group of models has been shown to exhibit parameter estimation problems and only allows for the representation of a narrow range of autocorrelation functions (Koutsoyiannis, 2000). A third group of models is nonparametric in its approach and includes kernel density estimation (Lall and Sharma, 1996;Sharma et al., 1997) and various bootstrap approaches. The latter include simple bootstrap, which is only useful if data are uncorrelated, moving block-bootstrap, nearest-neighbour bootstrap (Salas and Lee, 2010;Herman et al., 2016), matched-block bootstrap (Srinivas and Srinivasan, 2006), and maximum-entropy bootstrap (Srivastav and Simonovic, 2014), which also take lagged correlations into account. These nonparametric techniques resample from the data with perturbations and directly reproduce the characteristics of the original data (Sharma et al., 1997). However, the reproduction of long-range dependence is difficult, and variance can be underestimated or overestimated (Salas and Lee, 2010). To allow for values that go beyond the observed distribution, Salas and Lee (2010) proposed a model employing k-nearest-neighbour resampling with a gamma kernel perturbation. A further group of models consists of models that employ Markov chains and their variations. These models account for transition probabilities between different hydrological states (Stagge and Moglen, 2013;Bracken et al., 2014;Pender et al., 2015) and can be combined with nonparametric approaches such as k-nearest neighbours (Prairie et al., 2008). They can be extended to multiple sites by scaling the simulated values at individual sites with spatially correlated random numbers (Mehrotra and Sharma, 2006).
Several alternatives to these well-established simulation procedures have been proposed, which allow for a flexible choice of marginal distributions. These include models where the temporal dependence structure is modelled with Hydrol. Earth Syst. Sci., 23, 3175-3187, 2019 www.hydrol-earth-syst-sci.net/23/3175/2019/ copula functions, which are, however, difficult to apply for higher orders of autocorrelation (Lee and Salas, 2011). Examples of new simulation procedures based on the Autoregressive to Anything (ARTA) model proposed by Cario and Nelson (1996) following Li and Hammond (1975) are the SMARTA model by Tsoukalas et al. (2018b) or the SPARTA model by Tsoukalas et al. (2018a), which employ Nataf's joint distribution model for the simulation of stochastic time series, representing both short-and long-range dependence.
In addition, simulation schemes based on wavelet decomposition, which avoid assumptions about the temporal dependence structure, have been proposed by Kwon et al. (2007), Wang et al. (2010), and Erkyihun et al. (2016). Borgomeo et al. (2015) have shown how simulated annealing can be used to generate synthetic streamflow time sequences that represent possible climate-induced changes in user-specified streamflow properties. All these previously mentioned models are based on the time domain. An alternative to time-domain models is frequency-domain models (Shumway and Stoffer, 2017), which allow for the simulation of surrogate data with the same Fourier spectra as the raw data (Theiler et al., 1992). Such methods are based on the randomization of the phases of the Fourier transformation and have been commonly applied in hypothesis testing, when identifying nonlinearity in time series (Schmitz and Schreiber, 1996;Kugiumtzis, 1999;Venema et al., 2006;Maiwald et al., 2008), and in trend detection (Radziejewski et al., 2000). We hereafter refer to such methods, which are also known as amplitude-adjusted Fourier transformations (AAFTs) (Lancaster et al., 2018), as phase randomization simulations. Serinaldi and Lombardo (2017) used an iterative AAFT method to generate binary series of rainfall occurrence and non-occurrence. An extension of the amplitude-adjusted Fourier transformation has been presented by Keylock (2007), who applied randomization procedures to wavelet-decomposed signals to generate surrogate data. In hydrology, phase randomization simulation has rarely been applied for purposes other than hypothesis testing (Fleming et al., 2002) even though it has desirable properties which make it suitable for a wider range of applications. Indeed, its implementation is relatively simple, it can simulate time series with both short-and long-range dependence, and it can be extended to multiple sites. However, its application is often limited to the reproduction of the empirical distribution of the data. We here propose the use of phase randomization simulation for the stochastic generation of streamflow time series at individual and multiple sites. To allow for non-empirical distributions, we combine the data simulated by phase randomization with the flexible, fourparameter kappa distribution introduced by Hosking (1994) as a generalization of the three-parameter kappa distribution suggested by Mielke (1973). The stochastic streamflow generation approach shall represent a flexible tool which is easy to apply and generalizable to different contexts. This is enabled by combining a nonparametric time dependence model with a flexible four-parameter distribution. The simulation approach can be tailored to the specific problem at hand and be used for various water resource management applications.
We now turn to some theoretical background on Fourier transformation and phase randomization. For a more detailed introduction to the Fourier transformation, the reader is referred to textbooks by Morrison (1994) or Shumway and Stoffer (2017). We then discuss the use of phase randomization for the stochastic generation of streamflow time series. For illustration purposes, we apply and validate the approach on a set of four catchments in Switzerland. Finally, we discuss potential applications of the simulation approach.

Theoretical background
The basic idea behind all surrogate methods is to randomize the Fourier phases of the underlying (hydrological) process. The Fourier transformation converts a time-domain signal into a frequency-domain signal, which is complex-valued. This transformation may be depicted as a decomposition of the time series into sine and cosine waves of different amplitude, phase, and period (Fleming et al., 2002;Shumway and Stoffer, 2017). In the frequency domain, the power spectral density (power spectrum) expresses the same information in cycles as the autocovariance function expresses in lags in the time domain. The periodogram, the empirical counterpart of the power spectrum, shows high values at those frequencies which correspond to strong periodic components (Shumway and Stoffer, 2017).
The surrogate approach utilizes the property that realizations of linear Gaussian processes differ only in their Fourier phases and not their power spectrum. It preserves the autocorrelation structure of the raw series by conserving its power spectrum through phase randomization. The procedure consists of three main steps (Radziejewski et al., 2000;Maiwald et al., 2008;Kim et al., 2010). In the first step, the streamflow series is converted from the time domain to the frequency domain by Fourier transformation (Morrison, 1994). In this frequency domain, the data are represented by the phase angle and the power spectrum, as represented by the periodogram. The phase angle θ of the power spectrum is uniformly distributed over the range of − π to π . In the second step, the phases in the phase spectrum are randomized while the power spectrum is preserved. In the third step, the inverse Fourier transformation is applied to transform the data from the frequency domain back to the temporal domain (Maiwald et al., 2008).
The Fourier transformation of a given time series x = (x 1 , . . . , x t , . . . , x n ) of length n is where i = √ −1 is the imaginary unit. The original time series can be recovered by the back transformation e iω j t f ω j , t = 1, 2 , . . ., n, if the transformation is calculated for discrete frequencies ω j = 2π/n, j = 1, 2, . . . , n. The Fourier transformation surrogate method constructs a new time series y t with the same periodogram as the observations. Apart from this, the new series are statistically independent of x t . This can be achieved by fixing the Fourier amplitudes |f (ω j )| and replacing the Fourier phases φ(ω j ) = arg(f (ω j )) with uniformly distributed random numbers φ rand (ω j ) ∈ [−π , π]. A new realization is given by e iω j t f ω j e iφ rand( ω j ) .
The surrogate data consist of the same values as the original data in another temporal order but with the same time dependence structure as the original data (Schreiber and Schmitz, 2000). The approach can be extended to multiple sites by multiplying the phases of each site by the same set of random phases. This is possible because the cross-spectrum, which describes the cross-correlation of the data in the frequency domain, reflects relative phases only (Prichard and Theiler, 1994;Schreiber and Schmitz, 2000).

Stochastic streamflow simulation
Here, we use phase randomization to simulate stochastic streamflow time series to be used in various water resource management studies. The stochastic series generated using phase randomization are combined with a theoretical distribution to allow extrapolation to unobserved values which still realistically represent daily streamflow values. The observed streamflow time series require pre-treatment before phase randomization can be applied. First, they need to be normalized because phase randomization assumes Gaussianity (Maiwald et al., 2008). Second, they need to be deseasonalized in order to remove monthly/daily fluctuations (Pender et al., 2015). The stochastic simulation procedure consists of the following seven steps.
1. Fitting of theoretical kappa distribution: the fourparameter kappa distribution (Hosking, 1994) is fitted to the daily values of the observed input time series using L moments. This distribution will be used for the back transformation in Ste 7 and permits extreme values going beyond the empirical distribution to be obtained. It has four parameters and its cumulative distribution function is expressed as where ξ is the location parameter, α is the scale parameter which must be positive, and k and h are the shape parameters.
The kappa distribution was found to be suitable for fitting observed streamflow data in US catchments (Blum et al., 2017). A suitable fit was also found for our data as confirmed by the Kolmogorov-Smirnov and Anderson-Darling tests which did not reject the null hypothesis at α = 0.05 for most catchments. We fitted a separate distribution for each day to take into account seasonal differences in the distribution of daily streamflow values.  kappa distribution would prevent us from obtaining values that go beyond the range of observed data (Srinivas and Srinivasan, 2006). Depending on the input time series, other suitable theoretical distributions than the kappa distributions could be used for back transformation.
7. Simulation: steps 4-7 are repeated m times to generate m time series of the same length as the observed time series.
The method is extended to the simulation of stochastic streamflow time series at multiple sites. To model the cross-correlation between sites, the phase randomization performed in Step 4 of the procedure is performed in the same way for all the stations in the data set (Prichard and Theiler, 1994). In contrast, the parameters of the monthly kappa distributions and the power spectrum are calculated for each individual site separately.

Model validation
The simulation was validated on the observed streamflow time series of a set of four catchments in Switzerland (Fig. 1), namely, Plessur-Chur, Birse-Moutier, Thur-Jonschwil, and Cassarate-Pregassona. The catchments are characterized by diverse catchment characteristics and flow regimes (Table 1). Their catchment areas range between 74 and 493 km 2 and their mean elevations between 930 and 1850 m a.s.l. Plessur represents a catchment with a melt-dominated flow regime with high flows in summer but low flows in winter. In contrast, the flow regimes of Birse and Thur are dominated by precipitation with high flows in winter and low flows in summer. The regime of Cassarate shows two peaks, one in spring due to melt processes and one in autumn due to precipitation.
The model outlined in the previous section was fitted to the observed time series over 50 years  for each individual catchment. The application of this approach is only recommended for records longer than 30 years to reduce uncertainty in the estimation of the parameters of the kappa distribution. The model was then run, on the one hand, for each individual catchment and, on the other hand, for the four sites jointly. In both cases, 100 sets of stochastic streamflow time series of the same length as the observed series were generated as in Salas and Lee (2010) and Pender et al. (2015).
Both the temporal correlation structure and seasonal streamflow statistics were used to compare observed and simulated streamflow time series in order to assess the validity of the stochastic streamflow generation model. As in Kim et al. (2010), we used the autocorrelation function on daily values to represent the short-range temporal correlation. Further, we also used the partial autocorrelation function (Stedinger and Taylor, 1982). In addition to shortrange dependence, long-range dependence was assessed by looking at the autocorrelation function of annual discharge sums. The seasonal statistics were validated with respect to the seasonal distributions (winter: December-February, spring: March-May, summer: June-August, and autumn: September-November) and the monthly means, maxima, minima, and standard deviations. In addition to general distribution characteristics, the approach was validated for low and high flows because these characteristics are often of interest in hydrological simulation studies (Borgomeo et al., 2015). High and low flows were defined as above or below threshold values, respectively. For high flows, the 95th percentile was used as a threshold, while the 5th percentile was used for low flows.

Simulation at individual sites
The stochastic streamflow generator was found to produce realistic annual hydrograph realizations as illustrated in Fig. 2 for the Plessur catchment (Fig. 1). This is confirmed vi- sually by observing the temporal correlation structure as well as the seasonal statistics (see Fig. 3 for Thur and Plessur). The stochastic generator produces time series with mean regimes similar to the observed mean regime, and reproduces both the autocorrelation (ACF) and partial autocorrelation functions (PACF). Seasonal distributions match well thanks to the good fit of the kappa distribution to the data. Monthly means and standard deviations match particularly well, while monthly maxima and minima show some deviations from the observed maxima and minima, as was intended by using a theoretical instead of an empirical distribution. The suitability of the kappa distribution for producing realistic high and low flows is confirmed in Fig. 4. The distribution produces low flows similar to observed low flows, but with different outliers. In two catchments (Thur and Cassarate), however, observed low flows were rather overestimated. High flow distributions match well in all catchments, and values ex-ceeding observed values are generated. The four-parameter kappa distribution (Houghton, 1978;Griffiths, 1989) was found to be more suitable for representing daily streamflow values compared to distributions with even more parameters, which are rather prone to over-fitting. Similarly, tests on distributions with only three parameters (e.g. Burr type XII; Burr, 1942 and generalized Gamma distributions;Stacy and Mihram, 1965) were here not satisfactory because the distributions were not flexible enough. In cases where distributions with fewer parameters provide a satisfactory fit, they could, however, be used instead of the kappa distribution to ensure model parsimony.
The stochastic streamflow generator is able to reproduce not only the streamflow distribution and the short-range dependence in the data, but also the long-range dependence over several years (Fig. 5). Both the rapid decrease in the Hydrol. Earth Syst. Sci., 23, 3175-3187, 2019 www.hydrol-earth-syst-sci.net/23/3175/2019/ Figure 3. Comparison of observed and stochastically generated time series for the melt-dominated Plessur catchment (upper two rows) and the rainfall-dominated Birse catchment (lower two rows) for the following characteristics: mean hydrograph over 50 years, autocorrelation function, partial autocorrelation function, seasonal distributions, monthly means, monthly maxima, monthly minima, and monthly standard deviations. Black lines represent observations, while orange lines represent simulations.
ACF at short lags (up to 5 years) and the cyclical behaviour at lags longer than 5 years are reproduced as well.
The good performance of the stochastic streamflow generator with respect to streamflow distribution and temporal correlation -both short and long range -is not limited to these four example catchments, but generalizes to other data sets used as input.

Simulation at multiple sites
The stochastic streamflow generator can be extended from the simulation at individual sites to the joint simulation at multiple sites. In addition to reproducing distribution and temporal correlation at individual sites, it should then be able to reproduce the cross-correlation among sites, which describes the similarity of time series at two sites.  Cassarate. The results are given for 10 simulation runs (S1-S10), and high flows are plotted with (middle column) and without (right column) outliers. Whiskers extend to the lowest/highest data point, which is still within 1.5 times the interquartile range.
shows the cross-correlation function (CCF) for pairs of stations among the example catchments for the observed time series and the 100 simulation runs. Cross-correlation is already generally low for observations because the selected sample catchments are characterized by diverse discharge regimes and seasonality. The shape of the cross-correlation is reproduced for all pairs of stations. However, the magnitude of cross-correlation is underestimated for certain pairs of stations in the simulated time series compared to the observed series independently of the simulation run considered. For the catchment pair Birse-Thur, whose discharge behaviour is rainfall-dominated, the simulated cross-correlation is much lower than the observed one. In the observations, spatially consistent rainfall events lead to a joint rise in discharge at both stations. This behaviour is not captured by the stochastic discharge generator. The underestimation of cross-correlation is also visible when looking at the crosscorrelation of below-or above-threshold events (not shown).

Discussion and conclusions
The stochastic streamflow generator based on phase randomization has been shown to produce realistic streamflow time series with respect to both distributional properties and temporal correlation. Compared to models commonly used for the stochastic generation of streamflow time series, such as autoregressive moving average models, the simulation approach presented here reproduces not only short-range, but also long-range dependence. However, the representation of this dependence is limited to ranges within the length of the observed time series. Instead of producing one long time series, the simulation procedure allows for the simulation of multiple series of the same length as the original series. The use of ensembles of the same length as the observed time series might not be equivalent to using a long time series. Still, long-range dependence features may not be generated in either case since the model is fitted based on a limited number of years of observations. While the reproduction of the temporal dependence was well reproduced here, this is not necessarily the case under all conditions. Embrechts et al. (2010) have shown that any nonlinear transformation of a Gaussian time series, which is done during back transformation, reduces the strength of the linear correlations in the time series as expressed by Pearson's correlation coefficient and preserves only rank correlations. If one is working with heavytailed and zero inflated marginals (as present when looking at intermittent processes), it can happen that autocorrelations are reduced during back transformation (Papalexiou, 2018). Phase randomization was here combined with the flexible four-parameter kappa distribution, which was found to effectively represent daily streamflow values. The distribution of daily flows was found to be modelled well in all seasons. However, the use of one distribution per day has the disadvantage of introducing a lot of parameters, which makes the model non-parsimonious (Koutsoyiannis, 2016). If the user is not reliant on the generation of unobserved values, he/she might use the empirical instead of theoretical kappa distribution for back transformation instead. The use of the kappa distribution allows us to generate values that go beyond the range of observed values, which would not be the case if the empirical distribution was used. This ability of the generator to extrapolate extremes makes it suitable for applications where extreme events such as floods and droughts are of interest.
The generator can, on the one hand, be used to simulate streamflow at individual sites, and, on the other, to simulate jointly at multiple sites, which is not necessarily the case for other existing models. Its application to the example catchments, however, resulted in somewhat underestimated crosscorrelations between stations. This underestimation can be explained by the fact that phase randomization preserves the cross-correlation in the normal domain but not necessarily in the domain of the original distribution. This cannot be overcome even if the simulation run which best reproduces these cross-correlations is extracted from a large set of simulations. However, Stedinger and Taylor (1982) showed that estimators of the autocorrelation and cross-correlation of flows which do not match the historical sample estimates often provide more accurate estimates of the true but unknown correlations. Still, there are several potential avenues for improving the representation of cross-correlation. A first possibility would be the use of phase annealing (Hörning and Bárdossy, 2018). Phase annealing modifies the Fourier phases in an iterative way in order to optimize certain statistics, such as the cross-correlation function, and makes it possible to take covariates into consideration for the generation of time series. However, using phase annealing increases the computational effort. A second possibility was presented by Keylock (2012), who only randomized the phases corresponding to the wavelet coefficients lying above a certain threshold. He suggested fixing the large wavelet scales if one wanted to ensure that the low-frequency behaviour between the observations and simulated series remains the same. This can indeed be a solution for retaining the cross-correlation between two series. However, it comes with the disadvantage that the temporal structure of the simulated series is not very variable from the one of the observed series anymore. A third possibility is the introduction of functions correcting for the phase differences between two series as done by Nguyen et al. (2019), who applied this approach to correct for biases across multiple atmospheric variables derived from global circulation models. Another possibility for addressing the underestimation of cross-correlation would be the inflation of the cross-spectrum in the original domain in order to allow for a certain target cross-correlation after back transformation. To do so, transformation approaches have been introduced, which inflate the original process, which should after the back transformation to the original domain result in a process with a target distribution and correlation structure (Papalexiou, 2018;Tsoukalas et al., 2018b). An additional disadvantage of the method presented here (and of most other approaches presented in the literature) is that time irreversibility, which has been shown to be significant at a daily scale (Koutsoyiannis, 2019), is not explicitly modelled.
The streamflow generator was here used on observed streamflow time series. The input time series, however, do not necessarily need to consist of observed values. One could also use the generator on streamflow simulated with a hydrological model. This extends its application to climate impact studies where a hydrological model is driven by meteorological time series generated with global and/or regional climate models. Alternatively, the representation of non-stationary conditions in the properties of the marginal distribution or the temporal dependence structure could also be achieved by adjusting the parameters of the marginal distribution or the frequency spectrum, respectively. Phase randomization simulation can potentially accommodate not only changing climate conditions, but also changes in land use or water extractions. The approach is not limited to the simulation of streamflow time series, but extends to other hydro-meteorological variables such as precipitation, evapotranspiration, or snowmelt. This would require the test and identification of a suitable marginal distribution. In the case of intermittent processes, mixed-type marginal distributions would need to be used (Papalexiou, 2018). Distributions other than the kappa distribution can be used in PRSim by specifying a suitable (mixture) distribution. Spatio-temporal modelling of precipitation fields, for example, may be performed using a technique based on phase randomization. However, it must be noted that due to the large number of zero observations (specifically with fine temporal resolution), the normal score transforma-tion can become non-unique. In this case, additional efforts are needed to preserve the spatial structure of precipitation.
The stochastic streamflow generator presented here represents a flexible tool for streamflow simulation at individual or multiple sites. It can be used for various applications such as the design of hydropower reservoirs, the assessment of flood risk, or the assessment of drought persistence and the estimation of the risk of multi-year droughts.
Code availability. The stochastic simulation procedure for a single site using the empirical, kappa, or any other distribution and some of the functions used to generate the validation plots are provided in R package PRSim. The stable version can be found in the CRAN repository https://cran.r-project.org/web/packages/PRSim/ index.html (Brunner and Furrer, 2019), and the current development version is available at https://git.math.uzh.ch/reinhard.furrer/ PRSim-devel.
Author contributions. AB and MIB jointly developed the concept and methodology of the study. MIB and RF set up the simulation approach. MIB did the data analysis, produced the figures, and wrote the first draft of the manuscript. The manuscript was revised by RF and AB and edited by MIB.