Analysis and modelling of a 9.3 kyr palaeoﬂood record: correlations, clustering, and cycles

. In this paper, we present a unique 9.5 m palaeo-lacustrine record of 771 palaeoﬂoods which occurred over a period of 9.3 kyr in the Piànico–Sèllere Basin (southern Alps) during an interglacial period in the Pleistocene (some-time from 780 to 393 ka) and analyse its correlation, clustering, and cyclicity properties. We ﬁrst examine correlations, by applying power-spectral analysis and detrended ﬂuctuation analysis (DFA) to a time series of the number of ﬂoods per decade, and ﬁnd weak long-range persistence: a power-spectral exponent β PS ≈ 0.39 and an equivalent power-spectral exponent from DFA, β DFA ≈ 0.25. We then examine clustering using the one-point probability distribution of the inter-ﬂood intervals and ﬁnd that the palae-oﬂoods cluster in time as they are Weibull distributed with a shape parameter k W = 0.78. We then examine cyclicity in the time series of number of palaeoﬂoods per year, and ﬁnd a period of about 2030 years. Using these characterizations of the correlation, clustering, and cyclicity in the original palaeoﬂood time series, we create a model consisting of the superposition of a fractional Gaussian noise (FGN) with a 2030-year periodic component and then peaks over threshold (POT) applied. We use this POT FGN + Period model to create 2 600 000 synthetic realizations of the same length as our original palaeoﬂood time series, but with varying intensity of periodicity and persistence, and ﬁnd optimized model parameters that are congruent with our original palaeoﬂood series. We create long realizations of our optimized palaeoﬂood model, and ﬁnd a high temporal variability of the ﬂood frequency, which can take values of between 0 and > 30 ﬂoods century − 1 . Finally, we show the practical utility of our optimized model realizations to calculate the uncertainty of the forecasted number of ﬂoods per century with the number of ﬂoods in the preceding century. A key ﬁnding of our paper is that neither fractional noise behaviour nor cyclicity is suf-ﬁcient to model frequency ﬂuctuations of our large and continuous palaeoﬂood record, but rather a model based on both fractional noise superimposed with a long-range periodicity is necessary.

Abstract.In this paper, we present a unique 9.5 m palaeolacustrine record of 771 palaeofloods which occurred over a period of 9.3 kyr in the Piànico-Sèllere Basin (southern Alps) during an interglacial period in the Pleistocene (sometime from 780 to 393 ka) and analyse its correlation, clustering, and cyclicity properties.We first examine correlations, by applying power-spectral analysis and detrended fluctuation analysis (DFA) to a time series of the number of floods per decade, and find weak long-range persistence: a power-spectral exponent β PS ≈ 0.39 and an equivalent power-spectral exponent from DFA, β DFA ≈ 0.25.We then examine clustering using the one-point probability distribution of the inter-flood intervals and find that the palaeofloods cluster in time as they are Weibull distributed with a shape parameter k W = 0.78.We then examine cyclicity in the time series of number of palaeofloods per year, and find a period of about 2030 years.Using these characterizations of the correlation, clustering, and cyclicity in the original palaeoflood time series, we create a model consisting of the superposition of a fractional Gaussian noise (FGN) with a 2030-year periodic component and then peaks over threshold (POT) applied.We use this POT FGN+Period model to create 2 600 000 synthetic realizations of the same length as our original palaeoflood time series, but with varying intensity of periodicity and persistence, and find optimized model parameters that are congruent with our original palaeoflood series.We create long realizations of our optimized palaeoflood model, and find a high temporal variability of the flood frequency, which can take values of between 0 and > 30 floods century −1 .Finally, we show the practical utility of our opti-mized model realizations to calculate the uncertainty of the forecasted number of floods per century with the number of floods in the preceding century.A key finding of our paper is that neither fractional noise behaviour nor cyclicity is sufficient to model frequency fluctuations of our large and continuous palaeoflood record, but rather a model based on both fractional noise superimposed with a long-range periodicity is necessary.

Introduction
Risk estimates of floods are often based on instrumental records that cover a comparable small time period (e.g. a maximum of 150 years) and might be influenced by anthropogenic activities.Therefore, palaeoflood sequences represent a promising source for understanding the long-term hazard dynamics of the unperturbed climate system (e.g.Kochel and Baker, 1982;Stedinger and Cohn, 1986;Baker, 1987Baker, , 2006;;Ely et al., 1993;O'Connor et al., 1994;Knox, 2000;Yang et al., 2000;Greenbaum et al., 2000;Redmond et al., 2002;Benito et al., 2004;Czymzik et al., 2010;Huang et al., 2013;Swierczynski et al., 2013;Wirth et al., 2013).One important aspect of these dynamics is to understand to what extent flood frequencies can be considered constant over time, and more specifically, whether floods in a given geographic region have any correlations with themselves and whether the floods are clustered in time.Here we examine correlations, clustering, and cycles in a 9336-year record of palaeofloods using varved (annual) sediments from palaeo-Published by Copernicus Publications on behalf of the European Geosciences Union.lake Piànico-Sèllere (northern Italy), and construct a model to take into account the observed behaviour.In this introduction, we present the idea of correlations and clustering in time series, and then summarize the overall organization of this paper.
We first consider correlations in a time series.Consider an uncorrelated time series (e.g. a white noise), where values are independent of one another, i.e. it is equally likely at each time step to have values above or below the median value.To illustrate correlations, take a flood intensity time series, with flood intensity the number of floods per year; a "flood" here might be defined in many different ways.If the flood intensity time series is uncorrelated, when there is a year with a large intensity (a large number of floods occurring, above the median number of floods), it is equally probable to have the next year a number of floods that is above or below the median, i.e. a flood intensity value that is "large" (more floods) or "small" (zero or few floods).In contrast, for a flood intensity time series that exhibits positive correlations, adjacent values will have flood intensities that are on average closer to each other (in intensity) than for an uncorrelated time series; large values tend to follow large ones, and small follow small.Temporal correlations are also referred to as persistence or memory (see Witt and Malamud, 2013, and references therein).Two examples of positive correlations in time series are given in Fig. 1a and b, using a tree ring standardized growth index for Bristlecone pine, White Mountain, California, USA, for the years AD 0-1962 and cosmic ray neutron counts per hour, Beijing, China, 1 January to 11 March 2008.In these two Fig. 1 panels, successive values in both series are positively correlated with one another, and are examples of persistent time series.
Correlations can be both short-range (where only values in a time series close to each other are correlated) or longrange (where all values in the time series are correlated with one another).We will find that the unequally spaced palaeoflood time series used in this paper exhibit both long-range correlations and long period cyclical behaviour, and will focus on long-range (vs.short-range) persistent models in this paper.Long-range correlations have been discussed and documented for many processes in the environmental and Earth Sciences (see Box et al., 2013;Witt and Malamud, 2013, and references therein), with examples including river runoff and precipitation (Hurst, 1951;Mandelbrot and van Ness, 1968;Montanari et al., 1996;Koscielny-Bunde et al., 2006;Mudelsee, 2007;Khaliq et al., 2009;Ghil et al., 2011), atmospheric variability (Govindan et al., 2004) and temperatures over short-to very-long timescales (Pelletier and Turcotte, 1997;Fraedrich and Blender, 2003).The idea of longrange correlations is commonly used to characterize observational and experimental measurement data across many other scientific disciplines: For instance Peng et al. (1993bPeng et al. ( , 1994) ) found long-range persistence in the nucleotide organization of DNA sequences, Kurths et al. (1995) detected longrange correlations in time series from radio astronomy, Peng 1878 1888 1898 1908 1918 1928 1938 1948 1958 1968 1978 1988 Daily discharge (m 3 s −1 ) Year 1878 1888 1898 1908 1918 1928 1938 1948 1958 1968 1978 (1992) and USGS (2017) and described in Malamud et al. (1996).(d) Partialduration flood series, where floods are the largest 116 maximum daily discharge from (c) over the 116-year period, with daily discharges separated by at least 30 days to be considered a flood.
In our partial duration series, 48 of the 116 years have 0 floods, 33 years have 1 flood, 26 years have 2 floods, and 6/2/1 years have 3/4/5 floods.The value of these maximum daily discharges for each flood event is projected to the x-axis (daily discharge = 0 m 3 s −1 ).Below this is shown (green dots) all events along one line, an example of a data series that is strongly clustered in time.Clustering is due to (i) seasonal effects (cyclicity) and (ii) longer-term cycles.et al. (1993a) and Penzel et al. (2003) analysed long-term recordings of heart rate variability.
The palaeoflood time series we will examine in this paper is unequally spaced in time.Some examples given above include unequally spaced time series, such as Ghil et al. (2011), who studied long-range persistence of the unequally spaced Nile River time series.The palaeoflood time series we examine is also integer valued, i.e. is not a continuous onepoint probability distribution.Some long-range persistence research has investigated integer-valued time series.For example, the example given above for Peng et al. (1993bPeng et al. ( , 1994) ) research on long-range persistence of DNA's nucleotide or-Lake Iseo   3 for more detailed time series).The relative age represents years before the top of the stratigraphic section examined, with actual age estimated to be anywhere from 780 to 393 ka (see text).These data, the number of detrital layers per year, have been lodged online at the World Data Centre PANGEA (Mangili et al., 2017).
ganization represented the DNA as sequences with four different types of symbols.In another two examples, Altmann et al. (2012) studied correlations in texts as represented by binary sequences and Schaigorodsky et al. (2014) investigate long-range memory in the opening moves of chess games.
An alternative to examining the temporal correlations of flood intensities (here taken as the number of floods per year) is to examine whether the flood intensities over a given threshold cluster in time.Clustering is the grouping of values in time more than one would expect if the process that created them were random.An example of clustering, in unequally spaced flood magnitudes over a given threshold, is given in Fig. 1c and d.Here we observe the clustering on an annual cycle.Correlation and clustering (or lack thereof) have been studied for different extreme natural events, including earthquakes (e.g.Livina et al., 2005;Hainzl et al., 2006;Lennartz et al., 2008;Davidsen and Kwiatek, 2013), volcanic eruptions (e.g.Nathenson, 2001), floods (e.g.Pelletier and Turcotte, 1997;Milly and Wetherald, 2002), and tropical temperatures (e.g.Blender et al., 2008).
In this paper, we use a record of 771 detrital layers of the laminated sediments of palaeolake Piànico-Sèllere (northern Italy) covering a time span of 9336 years.These 771 layers are interpreted as indictors of flood events which occurred somewhere during the period of 780 to 393 ka (thousands of years ago).In Sect. 2 we give a detailed description of the palaeoflood sequence.In Sect. 3 we explain key definitions and methods used in this paper.Then in Sect. 4 we analyse the distribution of the palaeoflood frequencies per year, decade, and century compared to a Poisson (stationary) model, the one-point probability distribution of the interflood intervals as a potential indicator of clustering, the temporal correlations among the flood events, and cyclicity.In Sect. 5 we model the record of palaeoflood timings as elements of a long-range correlated synthetic time series that exceed a threshold.For this model type, scaling relationships for the distribution of the interevent intervals and the temporal correlations are derived in dependence on correlation properties of the time series and on the applied threshold.In Sect. 5 we also construct a model that consists of multiple realizations of synthetic time series that captures the same clustering and correlation properties that we find in our palaeoflood record.Finally, in Sect.6 we provide a summary of the paper.

Data
As discussed in the introduction, this paper focuses on the correlation, clustering, and cyclicity of palaeofloods.We present here for the first time and use a very comprehensive flood record at sub-annual resolution, obtained from the varved sediments of the Piànico-Sèllere Basin, located in the Borlezza Valley (Province of Bergamo, Italy; Fig. 2a).These data have been lodged online at the World Data Centre PANGEA (Mangili et al., 2017).This palaeolake sequence was first described in the mid 1800s (e.g.Stoppani, 1857), mainly for its fossil content.A detailed stratigraphic study was published by Moscariello et al. (2000).Palaeolake Piànico-Sèllere is located at the foothills of the southern Alps in Italy.Its sediments (45 • 48 N, 10 • 2 E, 280-350 m a.s.l.) are visible in outcrops (Fig. 2b).The sediment formation is more than 48 m thick and extends for about 600 m laterally.It includes four fine-grained laminated stratigraphic units (described more fully below).The size of the palaeolake has been reconstructed to about 3 km in length and 500-800 m in width (Casati, 1968).The palaeocatchment area of Piànico-Sèllere Lake has been estimated to be less than 13 km 2 (Moscariello et al., 2000).
The lacustrine sequence that forms the Piànico Formation (Moscariello et al., 2000) is a 48 m thick stratigraphic interval that includes four units.For the palaeoflood data set created here, only the 9.5 m unit called BVC (Banco Varvato Carbonatico or carbonate varved bed) will be considered (described more fully below).The age of the sediment is still debated.Tephrochronological dating of the sequence gives 393 ± 12 ka (Brauer et al., 2007), which corresponds to the interglacial period at about 400 ka, i.e.Marine Isotope Stage (MIS) 11.This interglacial period is considered the best analogue to the Holocene because of similar orbital parameters (e.g.Loutre and Berger, 2003).However, Pinti et al. (2001) dated a tephra layer in the varved BVC sequence at 779 ± 13 ka, assigning the sequence to the interglacial at about 780 ka (MIS 19).For the purposes of the paper here, we will be less concerned with the actual age and more concerned with the period of overall time that has elapsed, based on interpreting the BVC alternating layers (rhythmites) as varves.
The BVC 9.5 m unit constitutes an almost continuous succession of about 15 500 rhythmites (alternating layers of dark and light sediment), 0.2-0.8mm thick (Moscariello et al., 2000).These endogenic calcite rhythmites have been interpreted to be varves (annual cycles) as they have a structure very similar to Holocene Alpine lake varves, in which calcite precipitation takes place in spring and summer (Kelts and Hsü, 1978;Moscariello et al., 2000).The Piànico-Sèllere varved sequence formed under interglacial conditions, as testified by the flora remains included in the sediments (e.g.Amsler, 1900;Maffei, 1924;Rossi, 2003;Martinetto, 2009) and the oxygen stable isotope values of this interval (Mangili et al., 2007).The calcite varves consist of two layers (Fig. 2b): a lightly coloured and ∼ 0.5 mm thick spring/summer layer formed by up to 96 % of endogenic calcite, and a dark and thin winter layer constituted of organic remains, diatom frustules, and occasional detrital grains (Moscariello et al., 2000;Mangili et al., 2005).Using wavelet analysis, Brauer et al. (2008) found that the varve thickness is partially modulated by solar activity (88 and 208-year cycles), and probably also by the thermohaline ocean circulation (512-year cycle).In the BVC stratigraphic interval, the upper 60 % (9271 out of 15 500 varves) were examined here in detail using the same methodology and location as described in Mangili et al. (2005), but extending the vertical profile from 896 varves (Mangili et al., 2005) to 9271 varves (here).Note that at varve 4019 of 9271, there was found to be a "gap" of time, consisting of 65 missing varves, bringing the total sequence to 9336 varves (with 65 varves missing); this gap is described below in much more detail.
We briefly describe our microfacies sampling methodology (see Mangili et al., 2005 for further details).Continuous vertical profiles of sediment samples were collected from two outcrops stratigraphically similar, 150 m apart: (i) the Main Section, for which data are presented in this paper, and (ii) the Wall Section, a control section which we used to evaluate uncertainties in the Main Section results.Both sections had more than 30 key marker layers so the two outcrops could be correlated with each other.For each vertical section, the outcrops were cleaned with a sharp knife until a smooth and vertical surface was obtained.Then a block of sediment was carved out in situ to enable easy pushing of a special stainless box (33 cm × 5 cm) with removable side walls onto the sample.Samples were taken with at least a 5 cm vertical overlap, ensuring that a marker layer, allowing correlation, was present in both samples.The samples were then slowly dried at room temperature to avoid shrinkage and cracking and covered with a transparent epoxy resin, resulting in resin impregnation of the surface layer (1-2 mm) of the sediments.Samples were cut into two halves with the fresh surface again carefully dried and impregnated.From one half, 10 cm thick samples with a 4 cm overlap for final thin-section (120 mm × 35 mm) preparation were cut out.The thin sections were analysed with a petrographic microscope.For measurement of varve and detrital layer thickness, 100× magnification was chosen.
Approximately 8 % of the varve couplets contain also one or two detrital layers (Fig. 2b), which we describe in more detail below.The detritus is mainly constituted of fine, siltsize Triassic dolomite from the catchment (Mangili et al., 2005).Only in the thickest described layers are found finesand sized particles in minor amounts.No detrital layers contain gravel.Due to their composition and grain size, these detrital layers can easily be distinguished from the background endogenic sedimentation of the lake.These detrital layers are considered to be the result of channelized streamflow that originated in the hills surrounding the Piànico-Sèllere Basin and triggered by extreme precipitation events (Mangili et al., 2005).The site of the outcrops from which the samples have been taken have been explicitly selected to avoid gravel sediments which could cause hiatuses through erosion.We interpret the site to be a low-energy sedimentary environment in a distal position of the inflowing water.The position of a detrital layer within a varve allows the identification of the season in which the extreme precipitation event/flood took place: a "spring" detrital layer settled before the beginning of endogenic calcite precipitation, a "summer" detrital layer is within the varve summer layer and a "fall/winter" detrital layer is at the top of the calcite layer or included in the winter layer (Mangili et al., 2005).Due to the reduced thickness of the varve winter layer (0.06 mm mean), fall and winter detrital layers are not distinguished here.Another layer type (called matrix-supported) has also been observed in the varved sediments (Mangili et al., 2005); these layers are thought to result from reworking processes within the lake, are not linked to extreme precipitation events and will, therefore, not be taken into consideration in this paper.
In our analysis, we will focus on the temporal succession of detrital layers (flood events), the series of events that is graphically presented in Fig. 2c, and which consists of 771 single palaeoflood events occurring during a time interval of T = 9336 years.There are 9271 varves present, each taken to represent 1 year, in addition to a "gap" of 65 years (varve index 4019-4083) which is due to a slump at the Main Section.The varve structure there was not recognizable.To ascertain the temporal period of the gap, we correlated three marker layers between the Main Section's disturbed interval with an undisturbed sequence at the Wall Section, where we counted the number of varves, and were able to conclude that the gap in the Main Section is 65 years.The flood events have each been given a specific varve index (relative age) in the time series.In this paper, we will use the following notation (see also Table 1 for a list of all abbreviations and notations used in this paper) for the observed data with respect to varves and detrital (flood) layers: n year (t), t = 1, . . ., N varves is the number of detrital layers per varve (flood events per year) where the index t is the varve index (starting from the top varve) and represents relative age in years (see above), and N varves = 9336 years is the length of the observational period, i.e. the total number of varves including the "gap" of 65 years.The timings of the considered 771 palaeoflood events are transformed into three event series (Fig. 3), the number of floods (detrital layers) per year (n year ), per decade (n decade ), and per century (n century ), over the 9271 years of palaeoflood data (over a 9336-year record).These three data sets are integer-valued time series which contain N year = 9271, N decade = 925, and N century = 92 data points, and will provide the basis for our time series analyses in Sect. 4. In these three palaeoflood time series (Fig. 3) which represent our data, we observe that there are time periods with little fluctuation of the number of floods per century (i.e. the 24th to 18th centuries with values over the range 10-16 floods century −1 ).However, we also see sudden transitions from many to a few floods (as from 24 to 4 floods century −1 from the 39th to 38th centuries) as well as from a few to many floods (as from 7 to 25 floods century −1 from the 46th to 45th centuries).
The thicknesses of the detrital layers representing the 771 flood events that we investigate in this paper range from 0.02 to 23.00 mm (with quartiles of 25 %: 0.07 mm, 50 %: 0.10 mm, and 75 %: 0.18 mm) (see Appendix Fig. A1).Although detrital layers, which are mainly constituted of siltsized dolomite, indicate the occurrence (season and year) of an extreme precipitation event (flood), the thickness is not taken as representative of the magnitude (e.g.peak discharge or flow velocity) of the event.This is because the thickness of a layer can also be influenced by the distance between the source of the detritus and the studied section and the amount of detritus available at the time of the extreme precipitation and weaker vs. stronger rainfall during the extreme precipitation events (Kämpf et al., 2012(Kämpf et al., , 2015)).In addition, over the 9336-year period we investigate, the source of given rainfall events might result in different thickness signatures (Swierczynski et al., 2012(Swierczynski et al., , 2013)).We also note that although we 2) a 100-year portion of (A.1) (from t = 8090 to 8190 years) is expanded with an illustration of 4 flood years that have detrital layers (floods) in them, and the interevent occurrence times between them.These 4 flood years, each with n year ≥ 1 flood in them, appear as yellow symbols in all panels.(b) Interevent occurrence times, , temporally located (i.e. as a function of the relative age) when they occurred.(c) Interevent occurrence times, j , plotted as a function of "natural" time, where each is no longer represented temporally when it occurs, but rather successively one after another, j = 1, 2, 3, . . ., 741; interevent occurrence times of j = 0 are not shown.The detrital varve index j includes only those years (varves) where there are n year ≥ 1 floods yr −1 (i.e. at least one detrital layer in that year).
can identify when a flood is thought to have occurred, this does not preclude the possibility that other floods occurred but were not identified as a detrital layer.We do acknowledge research (e.g.Corella et al., 2014;Schillereff et al., 2014) where the type and thickness of detritus in Holocene varves are used to make interpretations of the strength of a hydrometeorological event; however, due to the age of our sediments this sort of analysis of event magnitude was not possible here.In this paper, we therefore focus on just the temporal attributes of the 771 floods themselves over this 9336-year period in the Pleistocene.See data availability for access to the palaeoflood database.
In terms of the relative temporal uncertainty of the floods (based on the varve sequence), a flood in a given year (represented by the detrital layers) can (very rarely) erode some of the sediment underneath it, forming a hiatus of missing sediment.This limitation is discussed in detail in Mangili et al. (2005), who analysed, for 10 % of the outcrop presented in this paper, the same stratigraphic intervals in both the Main Section and the Wall Section, again 150 m apart.Out of the 896 varves they examined in both sections, they found that 3 varves out of 896 were not present (e.g.eroded away by a flood) in the Wall Section but were present in the Main Section, and that 4 varves out of 896 were not present in the Main Hydrol.Earth Syst.Sci., 21, 5547-5581, 2017 www.hydrol-earth-syst-sci.net/21/5547/2017/ Section but present in the Wall Section.Overall, this represented a total of 889 out of 896 varves (99.2 %) in common between the two outcrops, which means that if we extend their results to the other 90 % of the outcrop, some uncertainty might exist, with up to 1 % of the varves "missing" in time, which should have a very small impact on the relative timing of floods discussed in this paper.The analyses in this paper will concentrate on the fluctuations in flood frequency over time (i.e. the number of floods per year and per decade), examining the temporal correlations, clustering, and cyclicity of the flood frequencies.Before proceeding to the analyses of these time series, we provide next (Sect.3) the definitions and methods used in subsequent sections.

Definitions and methods for the analysis of event series
Both correlations and clustering were introduced in Sect. 1.
Here we provide more in-depth definitions and explanations of clustering and correlation methods that will be used in examining our palaeoflood time series.In Sect.3.1, we introduce interevent occurrence times (IEOTs).Then, in Sect.3.2, Poisson processes, a model for non-correlated and nonclustered time series.Section 3.3 describes the Weibull distribution (in the context of IEOTs) as an indicator of clustering.Section 3.4 introduces autocorrelation as a method to quantify short-range and long-range correlations in a given time series.In Sect.3.5, we describe power-spectral analysis and detrended fluctuation analysis as methods for quantifying long-range correlations.Finally, in Sect.3.6 we briefly introduce fractional noises.

Interevent occurrence times
In this paper, we will consider the number of palaeofloods per year, decade, and century as event series in time.As indicators of clustering and/or correlations (or lack thereof), we will apply statistical methods to the event magnitudes (i.e. the number of floods per year, decade, or century).We will also consider the time intervals between successive events, i.e. the interevent occurrence times (IEOTs) which we introduce here.We will later use the statistical distribution of the IEOTs as an indicator of clustering (or lack of) in the original time series.We introduce IEOTs in the context of our yearly palaeoflood time series (top panel of Fig. 3) where, as previously, t is the relative age in years in our time series and n year is the number of floods per year.First, we define a flood year to be any year t with at least one flood in it, i.e. n year (t) ≥ 1 flood yr −1 .Second, we define IEOT as the time interval between successive years that have one or more floods, i.e. more formally, n year (t) ≥ 1 flood yr −1 and n year (t + ) ≥ 1 flood yr −1 , but for all years in-between, n year (t + l) = 0 floods yr −1 for l = 1, 2, . . ., −1.
In Fig. 4a we illustrate finding the IEOTs: we present the number of detrital layers per year (Fig. 4a.1, which is the same as the top panel in Fig. 3) with as inset (Fig. 4a.2) an example of four detrital layers with IEOTs of = 52, 3, and 26 years.In Fig. 4b we give all the as a function of relative age t for the entire palaeoflood data set.The values of range from 1 to 125 years (with quartiles of 25 %: 3 years, 50 %: 7 years, and 75 %: 15 years) and a mean = 12.5 years.We then use a subscript j to indicate successive IEOTs, j , which we plot using "natural" time, in other words where each is no longer represented temporally when it occurs, but rather successively one after another, j = 1, 2, 3, . . ., N , with the total number of natural time intervals N = 739 (we discard the j value corresponding to the gap).In Fig. 4c we plot the same from Fig. 4b, but now as a function of the detrital varve index (natural time), j = 1, 2, 3, . . ., N .
The use of natural time (i.e. the time between all events is "equal" in natural time) to represent unequally spaced events in time series has been mainly used to examine seismicrelated time series (e.g.Uyeda et al., 2009;Rundle et al., 2012), but has found use in a variety of other disciplines ranging from biology and environmental sciences to cardiology (see Varotsos et al., 2011, for a review of its use in various disciplines).

Poisson process as a model for an uncorrelated non-clustered time series
The standard example of an event series that is uncorrelated in time, non-clustered, and stationary in time is the realization of a Poisson process.In later analyses (Sect.4.1) we will compare the statistical distribution of the magnitudes in our three palaeoflood time series (Fig. 3) to those of a Poisson process, thus inferring the lack of correlations and clustering (if a Poisson process) or the presence of correlations or clustering (if not a Poisson process).If a real-world process (e.g.flooding) by which a time series results is assumed to be Poissonian, then many synthetic realizations of that Poisson process can be created, and their statistical properties confronted with those of the observed "real-world" time series.
Poisson processes have been found in some cases to model the temporal occurrence of floods (e.g.Kirby, 1969;Todorovic and Zelenhasic, 1970) and in other cases to not be an appropriate model (e.g.Mudelsee et al., 2004, who showed that the winter floods of the Elbe River from 1500 to 2000 cannot be modelled by a stationary Poisson process).
For a time series to be considered a realization of a Poisson process, the following must be true (Cox and Lewis, 1978): i. the time series elements are non-negative integers; ii. the time series considered is stochastic (i.e. has no obvious trends and periodicities and no deterministic components); iii. the one-point probability distribution of the time series elements follows a Poisson distribution, that is, a positive integer-valued random variable X, whose probability densities P λ are defined as (Cox and Lewis, 1978) where "!" means factorial, exp is the exponential function, λ is the rate parameter with λ > 0, and n is the intensity (number of occurrences) of the "random" variable per given time unit (e.g.hour, year, decade, century).The rate parameter λ is a physical quantity and thus has units (time unit) −1 , but the Poisson process as given in Eq. ( 1) is a mathematical model and P λ does not have units.
iv. the times between successive events (interevent occurrence times, IEOTs) are independent in time (i.e.there is no correlation between one IEOT and others, so events occur independently of one another); v. the one-point probability distribution P of the IEOTs is exponential with with the same rate parameter λ as in Eq. ( 1) and the time unit of the original time series being small enough that almost all time units have just no or one event (i.e.λ 1).
The Poisson process as defined in Eq. ( 1) does not lead to any temporal clustering.The one-point probability distribution for the Poisson process for 0 < λ ≤ 1 had no mode, and for λ > 1 it has a defined mode.

Weibull distribution of IEOTs as an indicator of clustering
If the j (the IEOTs), in addition to not being exponentially distributed, are Weibull distributed, this is taken as an indicator of clustering of the original time series (Bunde et al., 2005;Witt et al., 2010).The two-parameter Weibull probability distribution (Weibull, 1951) is a standard waiting time distribution, i.e. frequently used for modelling the time intervals between successive events (Cox and Lewis, 1978).The continuous (vs.discrete, as in Eq. ( 1) for the Poisson distribution) two-parameter Weibull probability distribution is given by (Weibull, 1951) where the two parameters are λ W for scale and k W for shape, and is any real number ≥ 0. When the shape parameter k W = 1.0, the two-parameter continuous Weibull distribution (Eq.3) turns into that is, the exponential distribution which describes the IEOT distribution of a Poisson process with a parameter λ = 1/λ W (see Eq. 1).
For shape parameters 0.0 < k W < 1.0, the two-parameter Weibull distribution (Eq.3) is heavy-tailed (i.e.asymptotically scale invariant) with a tail parameter of (1−k W ), which means that the probability density for very large values in Eq. ( 3) scales with a power law with the tail parameter as power-law exponent.

Autocorrelation as a method for quantifying temporal (short-range and long-range) correlations
There are a variety of methods that can be used to explore and quantify temporal correlations both in observed time series and realizations of a modelling process.As discussed in Sect. 1, correlations (persistence) have been studied in many environmental time series.Measures for correlations quantify the statistical dependence between variables, and in particular, measures for temporal correlations describe the intensity of this relation between time series elements with a fixed temporal distance.To quantify the strength of correlations, we use autocorrelation analysis applied to the time series of number of floods per year (n year ), decade (n decade ), and century (n century ) (Fig. 3).The autocorrelation function C (e.g.Priestley, 1982) for the number of floods is defined as where n j is a time series of the number of floods per year (decade, century) with j = 1,2, . . ., N, with N = N year , N decade or N century (i.e. the total number of years, decades, and centuries in the record), n is the mean number of floods (per year, decade, century) for the entire record, τ is the time lag (in years, decades, centuries), and σ 2 (n) is the variance of the time series.Positive values of the autocorrelation function are indicative of a process that on average, for a given lag τ , has positive persistence between values separated by that lag τ .In other words, if there are a large number of floods in a given year, on average, τ years later will also be followed by a large number of floods (compared to the mean number of floods per year overall).The converse is also true: if there are few floods in a given year, on average, τ years later will also be followed by only a few floods (compared to the mean number of floods per year overall).Negative values of the autocorrelation function are indicative of negative persistence, whereby if there are a large number of floods in a given year, on average, τ years later will be followed by a very few number of floods (compared to the mean number of floods per year overall).
If the correlations are essentially linear and thus can be described by the autocorrelation function, two types of correlations can be considered: (i) short-range correlations (Priestley, 1982;Box et al., 2013) and (ii) long-range correlations (e.g.Taqqu and Samorodnitsky;1992;Beran, 1994;Malamud and Turcotte, 1999).
Short-range correlations are characterized by a decay of the autocorrelation function C(τ ) (Eq. 5) that is bounded by an exponential decay for large lags, τ : where τ 0 , γ 0 , and γ are non-negative constants.In particular, this definition applies for time series with a finite correlation length (C(τ ) = 0 for τ > τ 0 ).Statistical models for shortrange correlated time series include autoregressive (AR) and moving average (MA) processes (Priestley, 1982).
In contrast to short-range correlated time series, longrange correlated time series approach a power-law decay of the autocorrelation function C(τ ) (Eq. 5) for large lags τ : The parameter β (0.0 <β< 1.0) is the strength of the longrange correlations.The autocorrelation function has two limiting values: β = 0.0 (which represents short-range persistence between the time series elements) and β = 1.0 (pink or 1/f noise).Koutsoyiannis and Montanari (2007) have shown that the statistical uncertainty of the mean value of (hydrological) time series is increased in the presence of correlations, and in particular long-range correlations.

Power-spectral analysis and detrended fluctuation analysis (DFA) for quantifying long-range correlations
In the last section, we used the autocorrelation function as one method to quantify long-range (and short-range) correlations.Here we describe two more methods for quantifying long-range correlations: power-spectral analysis and detrended fluctuation analysis.We focus on long-range (vs.short-range) correlations as these form the main part of our analyses and modelling in later sections.These two methods are both more robust than autocorrelation in quantifying long-range correlations (Witt and Malamud, 2013).
Long-range correlations are reflected by a scaling of the power-spectral density S (square of the modulus of the Fourier coefficients appropriately normalized) with frequency f .The power-spectral density S exhibits a power-law scaling such that (Taqqu and Samorodnitsky, 1992;Beran, 1994) where f is the frequency and the relationship holds (for longrange persistence) over all β (Pelletier and Turcotte, 1999;Witt and Malamud, 2013).Positive exponents (β > 0.0) in Eq. ( 8) represent positive (long-range) persistence and negative ones (β < 0.0) anti-persistence.The specific case of β = 0.0 corresponds to an uncorrelated time series (e.g. a white noise), and a value of β = 1.0 is known also as a 1/f or pink noise (Mandelbrot and van Ness, 1968;Bak et al., 1987).Some examples of these long-range persistent time series are given in the next Sect.3.6.In this paper, to avoid confusion with other estimators (e.g.DFA; see next), we will indicate the measurement of the strength of long-range persistence by power-spectral analysis using the notation β PS.
Another common method for quantifying long-range correlations is detrended fluctuation analysis (DFA) (e.g.Peng et al., 1994;Kantelhardt et al., 2001).Here, the scaling properties of long-range correlated time series are quantified in terms of the fluctuation function.In this paper the number of floods per year, decade, or century can be analysed by DFA; we call these time series here x t , t = 1, . . ., N. The fluctuation function is based on the running sums (or profile) of the considered time series x t , t = 1, . . ., N: This time series of the running sums s t , t = 1, . . ., N is then split up into non-overlapping segments of length l.For the kth segment of the running sums, s k,i , i = 1, . . ., l, the fluctuation is determined as the variance of the difference of this segment and its best-fitting polynomial trend t k,i , i = 1, . . ., l (with the polynomial order k, usually between 1 and 4), The fluctuation of the time series is the mean of the fluctuation of the segments: where k is the mean value taken over all fluctuations of length k segments.If the underlying time series x t , t = 1, . . ., N has long-range correlations, then the fluctuation function F (l) exposes for long segment lengths l a power-law scaling and will scale as (Peng et al., 1992) The strength of long-range correlations, β, is related to the scaling parameter of the fluctuation function, α, as β = 2α−1.If polynomials of order k are considered, then the resultant estimate of the long-range dependence is called DFAk (e.g.DFA1, DFA2).In this paper we will calculate α using DFA1 to DFA4.We will provide results using For Eqs. ( 7) (autocorrelation analysis), (5) (power-spectral analysis), and (9) (DFA), in each case we have an inverse power-law decay with increasing temporal scales (lag, frequency, temporal segment length), which defines a self-affine time series (the time series is statistically self-similar when comparing different temporal scales) (Mandelbrot and Van Ness, 1968).If the power-law exponent held over "all" temporal scales (which it rarely does), then the correlation length (the largest lag or temporal scale at which there are still statistical correlations in the time series) would be infinite.One potential significance of this "infinite" correlation length is that "all" values are statistically correlated with one another in the time series.A second significance is that the time series can be explained as a stochastic fractal (Malamud and Turcotte, 1998).In the case of our palaeoflood series this means potentially that the flood timings are organized in time as the points of a Cantor dust (see Ott, 1993, for a discussion of Cantor sets).
Extensive details about power-spectral analysis, DFA, and software written in R (R Core Team, 2013) for performing time series analysis using these techniques are given in the review paper by Witt and Malamud (2013).This latter study also investigates biases of both techniques which typically occur when they are applied to time series that have a onepoint probability distribution that is strongly non-Gaussian and/or the time series has very few values (e.g.just a few thousand data).

Fractional noises as examples of long-range correlated time series
Fractional noises are standard examples of long-range correlated time series (Malamud and Turcotte, 1999;Mandelbrot, 1999), which we will use here to help us model long-range persistent characteristics of our palaeoflood time series.Examples of six fractional noises are shown in Fig. 5, with the strength of long-range persistence ranging from β = 0.0 (a white noise) to β = 1.0 (a pink noise), in 0.2 increments.In the white noise in Fig. 5, the values are uncorrelated with each other, and as we increase the strength of long-range persistence, the values, although drawn from the same underlying probability distribution, become more correlated with one another (and their clustering increases).
4 Results of statistical metrics, clustering, correlations, and cyclicity of the palaeoflood sequence In this section, we first statistically analyse the 771 timings of palaeofloods in terms of their elementary metrics of the number of palaeofloods per year, decade, and century compared to a Poisson process (Sect.4.1).We then examine the probability of IEOTs compared to a Weibull distribution as a potential indicator of clustering (Sect.4.2).Next, we use autocorrelation of the palaeoflood time series to characterize the temporal correlations (Sect.4.3) and autocorrelation of the IEOTs as another potential indicator of clustering (Sect.4.4).We then characterize the temporal correlations of the time series using detrended fluctuation analysis and power-spectral analysis (Sect.4.5).Finally, we characterize the cyclicity of the palaeoflood time series using a sinusoidal model (Sect.4.6).As the techniques we used are designed for time series with a continuous one-point probability distribution (e.g.Gaussian, log-normal, or Levy distribution) vs. the integer values that are found in our palaeoflood time series, we throughout also show the statistical significance of our findings.

Number of palaeofloods per year, decade, and century, compared to a Poisson process
We now examine the number of floods per year, decade, and century, as if the process that created them resulted in a series of floods uncorrelated in time.Here, we will consider a time series to be the realization of the underlying process.If we are able to show that our time series has properties different from a Poisson process (Sect.3.2), we can infer that it is correlated or clustered.We will use as a model a Pois- son process with a constant rate parameter to create model time series of flood events per year, decade, and century (i.e.similar to the palaeoflood time series shown in Fig. 3).We will see below that the resultant model realizations will be stationary, uncorrelated in time, and non-clustered, and then show in subsequent sections that this modelling is inappropriate for our palaeoflood time series.We first interpret the number n in Eq. ( 1) as the number of detrital layers per varve (n year , floods yr −1 ).The rate parameter λ controls the probability of an event: the mean num-ber of events per year (i.e. per varve) is n year = λ year and the variance of the number of events is σ 2 n year = λ year .This equation holds only mathematically but not physically, i.e. it holds for the numbers but not for the units.The probability in Eq. ( 1) is not time dependent, and rather only depends on (if we take our unit of time to be 1 year) the probability of n year = 1 flood yr −1 , n year = 2 floods yr −1 , etc., and not the relative temporal spacing of the floods in time.To model the series of the number of flood events per year in the 9.3 kyr Piànico-Sèllere palaeoflood time series, we use a (constant) rate parameter of λ = 0.083 floods yr −1 that is the mean number of detrital layers per varve, i.e. (771 floods)/(9271 years) = 0.083 floods yr −1 , where we discard the 65-year gap for the total number of years considered.
As discussed in Sect.3.2, another consequence of considering the number of floods per year as a Poisson process is that the time interval between two successive events (i.e. the IEOTs between two successive floods) follows an exponential one-point probability distribution P for very small time units (Eq.2) with rate parameter λ.Now, the relative ordering between single floods is taken into account.For example, if in a given decade we have three varves, each with one flood, Eq. ( 2) will be different if the 3 years with floods are in years 1, 2, and 3 ( 1 = 2 = 1 year, 3 ≥ 8 years because the fourth flood will be in a subsequent decade) vs. one flood each in years 1, 5, and 9 of the decade (i.e. 1 = 2 = 4 years, 3 ≥ 2 years).Note that the time interval can have non-integer values, for instance if two or more events (floods) occur in a single time unit (within 1 year).
For our palaeoflood time series (Fig. 3), we now consider whether the histograms (horizontal bars on the right-hand side of Fig. 3) of the observed number of years, decades, and centuries with a given number of floods per year, decade, and century follow Poisson distributions.For each palaeoflood time series, we create a Poisson model which consists of 100 realizations of a Poisson process, each realization with 9271, 927, and 92 (respectively, for year, decade, and century) time series elements and with rate parameters λ = λ year = 0.083 floods yr −1 = λ decade = 0.83 floods decade −1 = λ century = 8.3 floods century −1 (the measurements values are different as well as the units; the differences cancel out because of multiplication of measurement values and units).We discard the 65-year gap for the purposes of this model.Each realization therefore has values that are uncorrelated in time and follow a Poisson distribution (Eq.1).Many common software programs (e.g.Excel, R, Matlab) are able to easily generate such realizations.
We give the results of our Poisson model for each respective time resolution (year, decade, century) in Fig. 6a as diamonds (mean of the 100 realizations) ± error bars (standard deviation of the 100 realizations).We find that for the yearly data, the number of years observed (k year ) with a given number of floods per year, n year = 0, 1, 2, 3 floods yr −1 , follows closely the number of years given by the Poisson model.For example, returning to the values given above, the number of years with n year = 0, 1, 2, 3 floods yr −1 is observed to be k year = 8530, 712, 28, and 1 years (respectively), and with the Poisson model k year = 8531 ± 26, 709 ± 25, 30 ± 5, and 1 ± 1 years (mean ± standard deviation) (respectively).In contrast, for both the decadal and centennial data, the observed data set contains decades and centuries with many fewer or greater floods compared to the model data (see Fig. 6a).For example, the number of decades with n decade = 0, 1, 2 floods decade −1 is observed to be k decade = 453, 283, and 123 decades (respectively), with the Poisson model giving k decade = 406 ± 15, 338 ± 13, and 140 ± 11 decades (mean ± standard deviation) (respectively).
We therefore conclude that at the yearly scale the observed data follow a Poisson process, but at the decadal and centennial scales, the time series cannot be modelled as a Poisson process.In subsequent sections we will explore whether clustering, correlations, or both are responsible for this.

Probability of interevent occurrence times (IEOTs) compared to a Weibull distribution as a potential indicator of clustering
Here, we analyse the one-point probability distributions of the interevent occurrence times (IEOTs) (Sect.3.1).As discussed in Sect.3.2, a realization of a Poisson process is characterized by "events" that are uncorrelated with one another and the j have an exponential distribution (Eq.2).Conversely, it means that if the j (the IEOTs) are nonexponentially distributed, the process is non-Poisson.In Fig. 7a, we give the frequency density of (the one IEOT that includes the sediment gap is excluded) of the flood record along with an exponential distribution which represents the interevent occurrence times of events in a Poisson process.The distribution of the IEOTs has a higher number of both short and long time intervals compared to an exponential distribution.This observation is supported by statistical hypothesis testing where an Anderson-Darling test (Anderson and Darling, 1952) was adjusted to integer values of the IEOT distribution (using a method described in Choulakian et al., 1994;Arnold and Emerson, 2011), and the test rejected the null hypothesis that this distribution can be explained as the result of a Poisson process (pvalue < 0.002).We have shown above that our data are not a Poisson process, and now wish to further characterize the distribution of the IEOT.For the palaeoflood IEOTs, we use a maximumlikelihood estimation (MLE) and assume a two-parameter Weibull distribution (Eq. 3 and discussed in Sect.3.3).We find (see Fig. 7a) a best-fit shape parameter for the Weibull distribution of k W = 0.91 ± 0.024, where the error bars represent the 95 % confidence intervals.However, this result is questionable as the MLE technique is tailored to a Weibull distribution as a continuous probability distribution, but the interevent occurrence times have discrete (integer) val-ues.Therefore, we use benchmarks (500 realizations of a Weibull distributed model for random numbers) for fitting Weibull distributions to integer-valued data.We find that the MLE estimator tends to over-fit the shape parameter by 0.12, i.e. we can correct our estimate of the shape parameter to k W = 0.79 ± 0.024.

Autocorrelation of palaeoflood sequences to quantify correlations
We now use the autocorrelation function (ACF) discussed in Sect.3.4 and apply it to our palaeoflood sequence as a potential indicator of correlations.In our analyses of the palaeoflood sequence, only pairs of years, decades, or centuries n j , n j +τ which do not contain the 65-year gap in the 9336year sequence of floods (detrital layers) are considered.The autocorrelation function (Eq.5) applied to the time series of the number of floods per year, decade, and century is given as correlograms in Fig. 6b (squares).This correlogram of the yearly floods shows significant positive correlations with a decaying trend and no obvious periodic components for lags of 1 ≤ τ ≤ 200 years.Due to the strongly fluctuating behaviour of the annual data's correlogram (squares) in Fig. 6b it is difficult to determine the functional form of the decay, i.e. to decide whether the decay is exponential (thin tailed, short-range persistent) or approaching a power law (heavy tailed, long-range persistent).Therefore, we apply the autocorrelation function to the time series of the number of floods per decade (with respect to non-overlapping 10-year time intervals) and find (Fig. 6b) a power-law behaviour with a power-law exponent for the decadal time series of γ ACF = 0.34 ± 0.04 (fitted for lags 1 ≤ τ ≤ 20 decade, which is equivalent to 10 ≤ τ ≤ 200 years) with uncertainties ±1 standard error of the exponent.Comparing to Eq. ( 7), γ ACF = (1−β ACF ), giving β ACF = 0.66 ± 0.04.We will return to this power-law model (Eq.13) in Sects.4.5 and 5.2.We conclude that for the floods per decade time series, when using the autocorrelation function, the data exhibit longrange correlations over the range 10 ≤ τ ≤ 200 years.
The time series of the number of floods per century contains only 92 data points.We therefore have calculated the autocorrelation function C only for lags τ = 1 century and τ = 2 century as graphically presented in Fig. 6b.Based on this figure, we see significant positive correlations and thus confirm the results found for the temporal correlations of the number of floods per year and decade.Due to the very few lag values, however, we cannot discuss the functional shape of this autocorrelation function.
We conclude that the three considered time series n year , n decade , and n century have positive autocorrelations for lags of τ < 200 years.For the number of floods per decade, n decade , we find indications of a power-law shape of the autocorrelation function which indicates long-range persistence of the time series over lags τ > 200 years.However, we will see later (Sect.4.6) for lags τ > 200 years that a power-law model cannot be conclusively fit, and thus over all lags, we cannot conclusively use ACF as a robust quantifier of long-range correlations.

Autocorrelation of interevent occurrence times (IEOTs) as a potential indicator of clustering
Another indicator of clustering is positive temporal correlations of the IEOTs (Sect.3.1).Such correlations are particularly caused by a "lumping" of small (large) IEOTs which results in time intervals with an increased (decreased) flood rate which correspond to flood clusters (phases of little flooding activity).In Fig. 7b, we examine correlations (vs.clustering) of the IEOTs by applying the autocorrelation function (Eq.5) to the j for the IEOTs.We find significant positive correlations of the j for flood year lags 1 ≤ τ ≤ 20 (no units).This indicates that for the IEOT, small values of j tend to follow small ones, and large ones tend to follow large ones.Similar to Eq. ( 13), the correlogram (see Fig. 7b) of j exhibits a power-law behaviour with a power-law exponent of γ = 0.45 ± 0.14 (fit for lags of 1 ≤ τ ≤ 20 IEOTs, which is equivalent to 12.5 ≤ τ ≤ 250 years).Power-law correlations for IEOTs are reported for a class of theoretical models by Bunde et al. (2003) and Eichner et al. (2007), who have studied peaks over thresholds of long-range correlated time series.We will explore this in more detail below.
In summary, we have analysed the one-point probability distribution and the autocorrelation function of the IEOTs of the palaeofloods with respect to a time resolution of 1 year (i.e.we have not used time resolutions between palaeofloods that are sub-annual, such as 3 months).The distribution of the IEOTs can be well approximated by a Weibull distribution with a shape parameter of k W = 0.78.Therefore, the IEOTs are more likely to be very short or very long temporal periods compared to a random (uncorrelated) occurrence of the IEOTs, and therefore the palaeoflood time series is not a realization of a Poisson process.Furthermore, the IEOTs have positive temporal correlations.This is particularly caused by a clustering of the very high and very low IEOTs and thus by a temporal clustering of the floods.The autocorrelation function seems to follow a power-law behaviour.

Detrended fluctuation analysis (DFA) and power-spectral analysis for quantifying long-range correlations and as a potential indicator of cyclicity
We now apply both DFA and power-spectral analysis (both discussed in Sect.3.5) to our time series given in Fig. 3  Sect.3.5, both methods are more statistically robust than autocorrelation in quantifying long-range correlations.Methods for quantifying long-range correlations have systematic and random errors in the resultant estimator for time series with very asymmetric probability distributions and the finite size effects of the time series.The systematic error is how much the resultant estimator deviates from the underlying "correct" value, and random errors are the spread of the estimated quantity.Witt and Malamud (2013), using 17 000 synthetic benchmark series, studied systematically these two types of errors for power-spectral analysis, DFA, and semivariogram analysis.ACF gives similar results of robustness to semivariogram analysis due to the two methodologies being very similar.Witt and Malamud (2013) found that the systematic and random errors (biases and standard errors) of DFA and PS analysis are significantly smaller (particularly for time series with non-Gaussian one-point probability distribution, and with very few values in the time series) compared to semi-variogram analysis, and both DFA and PS analysis are appropriate for a much broader range of long-range persistence strengths compared to semi-variogram analysis.We therefore only consider power-spectral analysis and DFA when considering the quantification of long-range correlations.
We start with our yearly flood record n year which contains 771 floods and extends over 9336 years.It has a heavily skewed one-point probability distribution with a coeffi-cient of variation of c v = 3.46.Witt and Malamud (2013) showed that very asymmetric one-point probability distributions lead to a systematic underestimation of the persistence strength for both techniques.Due to the palaeoflood time series elements being integer valued and in a small range (0 ≤ n year ≤ 4 floods yr −1 ), both techniques lead to uncertain results.Therefore, we will not apply power-spectral analysis and DFA to the annual palaeoflood data set.But we will quantify long-range persistence of the decadal and centennial flood series, as the data are less heavily skewed (c v = 1.30 and 0.77 respectively), and where there is a wider range of integer values (0 ≤ n decade ≤ 8 floods decade −1 and 0 ≤ n century ≤ 31 floods century −1 ).
In Fig. 8 we show the results of the DFA analysis of n decade .The fluctuation functions for different orders of detrending are shown on logarithmic axes.The almost linear shape indicates power-law behaviour of the fluctuation function (see Eq. 12) for DFA3 and DFA4.DFA1 (DFA2) follows this linear shape just for segment lengths up to 70 (100) decades.We expect a slowly varying trend (a longterm cyclicity > 700 (1000) years) to cause this dependence on the type of detrending and will analyse it in more detail in Sect.4.6.The power-law exponent for the decadal palaeoflood time series was estimated (for DFA3) as α DFA = 0.62, which corresponds to a strength of long-range persistence of β DFA = 0.25.This is a weak strength of long-range persistence.The third data set that counts the number of floods per century contains just 92 time series elements, which is not enough for a reliable DFA.
Power-spectral analysis for the decadal palaeoflood series n decade is shown in Fig. 9.Although the power-spectral density S has a lot of scatter, it is clearly oriented along a line in the log-log axes, and thus exposes power-law (selfaffine) behaviour (see Eq. 8).The persistence strength (i.e. the magnitude of the power-law exponent of the powerspectral density S) is estimated as β PS = 0.39.This again indicates a weak strength of long-range persistence.The centennial palaeoflood series n century is not suited for powerspectral analysis due to the short length of N century = 92 centuries.
We also see the hint of long-term periodic components superimposed onto the long-range persistence that is identified using power-spectral analysis.In Fig. 9 are given three coloured lines that represent the 95th percentiles (95 %), described in detail in a later section (Sect.5.1).What we observe is that three values (large blue dots, outlined with a grey ellipse) are above the three 95 % lines (i.e. they are significant with p < 0.05) at a cyclicity of 1300 to 2300 years.Again, as for DFA, we will explore this long-term cyclicity (fluctuation) that appears to be superimposed onto the longrange persistent signal in the next subsection.
We now consider both the systematic and random errors that can occur when using DFA and power-spectral analysis by using the techniques given in Witt and Malamud (2013) and their systematic study using thousands of benchmark Power-spectral analysis: the power-spectral density (periodogram) is presented as a function of the frequency (measured per decade) for the number of floods per decade.Also given is the bestfitting power-law function (Eq.8) and the corresponding power-law exponent.Additionally, presented are the 95th percentiles of the power spectra created for synthetic data based on peaks over thresholds of fractional noises (see Sect. 5.1) for different strengths of persistence (see the legend and the description in the text).Highlighted are the three values of the power-spectral density that exceed all of these 95th percentiles; the corresponding cycle lengths are given.
synthetic time series to calibrate the "error" in each method for a given one-point probability distribution, length of time series, and strength (uncalibrated) of correlations.We ignore any underlying periodicity as a contributing factor to any errors.We find for our palaeoflood per decade time series with 927 values and a coefficient of variation of c v = 1.3 that β PS would lead to a calibrated β * PS = 0.52 with (0.32, 0.71) the 95 % confidence interval and that calibrating the value of β DFA would give β * DFA = 0.37 with (0.13, 0.63) the 95 % confidence interval.
However, we have already indicated that there might be a long-range fluctuation (cyclicity) superimposed onto our long-range persistence signal.Therefore, we are unable to "calibrate" our β PS and β DFA using the techniques of Witt and Malamud (2013) and will instead proceed with an investigation of the period and amplitude of the long-term cyclicity (see Sect. 4.6), which will lead us to a model-based approach (see Sect. 5).In summary, by applying DFA and power-spectral analysis we find evidence for long-range correlations in our palaeoflood data set.For the decadal palaeoflood time series and over temporal scales from yearly to millennial, the estimated strengths of persistence for the (uncalibrated) values are β PS = 0.39 and β DFA = 0.25.

Cyclicity and fluctuations in the 9.3 kyr palaeoflood record
Several climate cycles on millennial scales are known from the analysis of long-term palaeoclimate proxy data (e.g.ice cores).They are related to Heinrich events (Heinrich, 1988;Bond et al., 1992) and the Dansgaard-Oeschger (Dansgaard et al., 1993;Johnsen et al., 1992;Voelker, 2002;Fisher, 2016) cycle which are caused by ocean ice dynamics and variations in solar activity (Gray et al., 2010).The corresponding cycle lengths range between 1000 and 5000 years.These long-term cyclicity processes might play a role as a background signal in the flood frequency described in our palaeoflood data set.In Sect.4.5 we quantified the long-range persistence of our three palaeoflood records.We also found that by applying (i) DFA there was an indication of a long-term cyclicity with a period > 700-1000 years, and (ii) with power-spectral analysis a hint of long-term cyclicity with a period of 1300-2300 years.We now further explore potential long-term temporal cycles and fluctuations in our data.
We first return to our autocorrelation analysis results presented in Sect.4.3 (Fig. 6b) where we used lags τ up to 200 years and found, (i) for the decadal and centennial palaeoflood time series, positive correlations (C > 97.5 % of the Poisson model) for τ < 200 years, and (ii) for the yearly palaeoflood time series, much weaker correlations (in places below the confidence limits, and with significant scatter).
We now extend the same autocorrelation analysis up to lags τ of 7500 years for all three time series.For the yearly palaeoflood time series, we found significant scatter, and so we just show the results for the autocorrelation analysis applied to the decadal and centennial palaeoflood time series (Fig. 10).As before, we also show the lower and upper bounds of the 95 % confidence interval of the ACF (for lags  τ > 0) of the Poisson model.We now observe a strong oscillating signal on both sides of the 95 % confidence interval, for about three periods, with minima at about 900, 3100, and 5000 years, and maxima at about 2000, 4000, and 5500-6500 years.In other words, we see a strong indicator of a cycle with a period in the range of 1900-2100 years.This is not contradictory with our previous observations using DFA (power-spectral analysis) of cyclicity with a period of > 700-1000 years (1300-2300 years).In terms of correlations, for both the decadal and centennial autocorrelation signals, the cycle overwhelms the signal, so no short-term or long-term correlation behaviour can be concluded from Fig. 10.In the hydrology community, semivariograms are also commonly used for studying temporal correlations in time series (e.g.Chiverton et al., 2015).The variogram was developed by a French professor of mining and engineering, Matheron (1963).We therefore apply semivariograms to our yearly, decadal, and centennial palaeoflood time series for lags of up to 7500 years, with an explanation of semivariograms and results given in Appendix B. Results are comparable to that found with the autocorrelation analysis, with a cyclicity at a period of about 1900-2100 years.Due to the long-term cyclicity, the semivariogram does not approach a limit value for large lags.Previously (Sect.4.1 to 4.3), our modelling considered the flood rate per year to be constant (λ year = 0.83 floods yr −1 ) (see Eq. ( 1), Fig. 6, and Fig. 7).We will now model the flood rate λ year as a function of time t (in years) and use the symbol (t), t = 1, . . ., 9336 years, to denote the long-term fluctuations of the flood rate in our palaeoflood record.First, to visualize the long-term fluctuations of the flood rate, we use a kernel density estimator (see Appendix C) which computes a weighted mean of the number of floods per year n year for a sliding time window.We applied Gaussian (based on a standard deviation of σ = 250 years) weights, dealt appropriately with the gap and the values close to the boundaries of the observational interval, and also computed 95 % confidence intervals for (for technical details, see Appendix C).The results (Fig. 11a) show that the time-dependent flood rate varies on millennial scales.We have high values of > 0.15 floods yr −1 at the beginning of the record, a first minimum with = 0.065 floods yr −1 at a relative age of t = 1000 years, a second maximum at t = 2000 years, a second minimum at t = 3400 years, and an absolute maximum with = 0.16 floods yr −1 at a relative age of t = 4300 years.The long-term fluctuations of the flood rate (t) continue for larger relative ages, but get weaker.As the 95 % confidence intervals of the absolute maximum and some of the minima do not overlap, the fluctuations are statistically significant.This long-term fluctuation can also be seen in the time series of n century (Fig. 11b), but due to scatter the millennial-scale cyclicity is not well defined.
We then model the cyclical behaviour of the flood frequency by fitting a sinusoidal function to the annual palaeoflood time series n year .A best-fit sinusoidal function is obtained by applying (with period T ) least-squares regression to time t in years: n year (t) = a 1 + a 2 sin(2πt/T ) + a 3 cos(2πt/T ) + ε (t) , (15) with a 1 a constant very similar to the average flood rate λ year = 0.083 floods yr −1 given in Eq. ( 1); the term "a 2 sin(2π t/T ) + a 3 cos(2π t/T )" is the best-fit sinusoidal function with optimized parameters a 2 and a 3 , and ε(t) is the residual of the statistic model.Using trigonometric functions, we can rewrite the middle two terms of the right-hand side of Eq. ( 16) such that with the amplitude A rate = (a 2 2 + a 2 3 ) 0.5 and the phase = arctan(a 3 /a 2 ).We have calculated the best-fitting sinusoidal model for periods 1500 < T < 2500 years (with a step size of T = 10 years) and have found that the model leads to the smallest variance of the residuals (σ 2 (ε) = 0.081) when T = 2030 years (see Fig. D1), where we find just one minimum and a very low curvature (platykurtic) shape.The model with T = 2030 years best explains the original palaeoflood time series when considering the time-dependent flood rate as a sine wave.The corresponding model parameters are (when T = 2030 years) a 1 = 0.082 floods yr −1 (≈ λ year = 0.083 floods yr −1 ), amplitude A rate = 0.049 floods yr −1 , and phase = 39 • .The value for A rate is almost 60 % of the average palaeoflood rate (λ year ).This model is graphically presented in Fig. 11 as a red line.
When the periodic rate model is compared to the kernel density estimate of the time-dependent rate , we find (i) a good agreement of the positions of the maxima and minima, and (ii) constant values of the maxima and of the minima of the periodic rate model but very different values for the five maxima of the time-dependent flood rate , and that (iii) the periodic rate model is far outside the 95 % confidence levels of the time-dependent rate for the time interval of 5800 < t < 6700 years, which is just 10 % of the length of the palaeoflood record.We can conclude that the periodic rate model captures the most important features of the timedependent rate .
4.7 Summary of clustering, correlation, and cyclicity of the 9.3 kyr Piànico-Sèllere palaeoflood record In Sect. 4 we have seen that the decadal number of floods cannot be explained by a Poisson process due to the palaeoflood time series' one-point probability distribution and long-range persistence (strength β PS ≈ 0.39 using powerspectral analysis and β DFA ≈ 0.25 using DFA).Furthermore, we found our palaeoflood series to be clustered as the interevent occurrence times are approximately Weibull distributed (shape parameter k W = 0.78 ± 0.02) and long-range correlated (power-law exponent of the autocorrelation function γ = 0.45 ± 0.14).We also found evidence for longterm cyclicity (period T ≈ 2030 years).In the next section we will suggest a minimal model that captures these identified characteristics of clustering, correlation, and cyclicity.
5 Creating a peaks over threshold (POT) model to capture correlations and clustering of the observational data In the previous section, we examined a 9.3 kyr palaeoflood record from Piànico-Sèllere that occurred sometime during the period 780 to 393 ka, and found long-range persistence indicated by power-law behaviour of the power-spectral density, the fluctuation function, and the autocorrelation function as well as temporal clustering and long-period cyclicity of the flood time series.We now introduce a model using peaks over threshold (POT) of synthetic time series that consist of a fractional Gaussian noise (FGN) + period, which we will abbreviate as a POT FGN+Period model.This model is based on the idea of long-range persistence and cyclicity that (i) captures the correlation and clustering properties of the palaeoflood observational data, (ii) captures the longperiod cyclicity, and (iii) depends on a minimum number of parameters.In this section, we will first present a method by which we can create a general model to incorporate correlations, clustering, and cyclicity (Sect.5.1).We next quantify the correlation, clustering, and cyclicity properties of this model (Sect.5.2) and specify the parameters of this general model to reflect the specific properties of our observed 9.3 kyr palaeoflood data (Sect.5.3).We then confront the model with the optimized parameters with the palaeoflood data (Sect.5.4) and finally discuss properties and a use of this specified model by analysing long model realizations (Sect.5.5).

A POT model utilizing fractional noises and long-term cyclical behaviour (POT FGN+Period )
To describe our palaeoflood record, we model a series of events along a timeline such that they exhibit correlation and clustering.As discussed in Sect.3.6, fractional noises exhibit linear correlations (i.e. by linear, we mean correlations that can be quantified by autocorrelation function or power-spectral analysis) that are long-range persistent (Beran, 1994).POTs of fractional noises result in event series that exhibit long-range persistence.Here the events are considered to be the values of the considered fractional noise that exceed a high threshold (i.e. the POTs) (Altmann and Kantz, 2005; Bunde et al., 2005;Eichner et al., 2007;Olla, 2007;Santhanam and Kantz, 2008;Moloney and Davidsen, 2009).It has been shown that the interevent occurrence times of such POT values follow a heavy-tailed or Weibull distribution and have long-term correlations (Bunde et al., 2005).Similar results have been found to hold for processes with multifractal and other non-linear correlations (Bogachev et al., 2007(Bogachev et al., , 2008;;Olla, 2007), but not for non-linear deterministic systems (Schweigler and Davidsen, 2011).We have taken the idea of using POTs of fractional noises and have modified it in order to gain long-term periodic fluctuations in the flood frequency and an annual number of floods ranging from 0 ≤ x year (t) ≤ 5 floods yr −1 .The long-period cyclical behaviour of our 9.3 kyr palaeoflood data set was taken into account by superimposing a sine wave with a period of T = 2030 years onto an input fractional Gaussian noise.The resulting time series is formally different from the periodic fractional noises introduced by Montanari et al. (1999).An integer-valued time series of floods per year was created from the strongly fluctuating input signal by introducing a group of thresholds.The model POT FGN+Period was implemented using the following three-part schema.
Step 1. Creation of a superposition of a fractional noise and a periodic signal (FGN+Period) to input to our POT model a. Begin with a realization y t , t = 1, . . ., 9271 years of a fractional Gaussian noise (FGN) with a given longrange persistence strength (β model ).
b. Normalize the FGN so that it has mean µ = 0.0 and standard deviation σ = 1.0.An example of this normalized FGN is illustrated in Fig. 12a.
c. Add to the normalized FGN a sine wave with period T = 2030 years and a given amplitude (which can be varied) A model .The value T = 2030 years is based on the results of Sect.4.6.The result of the normalized FGN superimposed with a sine wave is a periodic FGN.
Step 2. POTs to obtain the number of floods per year f.Define x max thresholds where the thresholds θ 01 , θ 12 , θ 23 , . . .are x max horizontal lines which will intersect our periodic FGN.The thresholds θ 01 , θ 12 , θ 23 , . . .are chosen such that the resultant one-point probability distribution of the number of floods per time unit is a Poisson distribution with a rate parameter of λ year = 0.083 floods yr −1 , i.e. the same as in our palaeoflood series.
g. Determine for each value of the periodic FGN y t , t = 1, . . ., 9271 years the highest of the x max thresholds which is exceeded.The ordinal number of this threshold is the modelled number of floods per year x year (as illustrated in Fig. 12b).If no threshold is exceeded, the number of floods per year is set to zero, x year (t) = 0 floods yr −1 .
h.The result is an integer-valued time series that models the number of floods per year (as illustrated in Fig. 12c) that has the same one-point probability distribution, the same number of data points, and the same long-term period (T = 2030 years) as our original palaeoflood record.The model depends on two model parameters (long-range persistence strength β model and amplitude of periodic signal A model ).
Step 3. Number of floods per decade and century i.For computing the modelled number of floods per decade x decade the number of floods per year x year is summed up over 10 consecutive years.j.For computing the modelled number of floods per century x century the number of floods per decade x decade is summed up over 10 consecutive decades.
The resultant outputs of our POT FGN+Period model, the integer-valued synthetic flood time series x year , depend on two parameters which are both related to the strongly fluctuating input signal: the strength of long-range persistence β model and the amplitude of the superimposed sine wave A model .In Fig. 13, 12 model realizations are shown.The six floods per year time series presented in Fig. 13a are constructed with strength of long-range persistence β model = 0.0, 0.2, . . ., 1.0 but without a long-term periodic component (A model = 0.0).We see visually from Fig. 13a that higher values of persistence lead to stronger clustering of the annual number of floods, i.e. years with many floods tend to follow years with many floods, and years without floods tend to follow years without floods.The six synthetic floods per year time series presented in Fig. 13b are constructed with white noise (β model = 0) and with an increasing amplitude of the long-term periodic component A model = 0.0, 0.2, . . ., 1.0.
In Fig. 13b we can visually observe that for high values of A model the years with many floods cluster periodically with a period of T = 2030 years.
We now return to the three 95 % lines in Fig. 9.These were derived as follows: we simulated 1000 realizations of the model POT FGN+Period with A model = 0.0 and β model = 0.1, 0.3, or 0.5.For a fixed value of β model the power-spectral density (the periodogram) was computed for all resultant event time series.For each frequency f the 1000 values of S(f ) were ordered from smallest to biggest and the 950th of 1000 values formed the three 95 % coloured lines in Fig. 9.
In summary, the time series presented in Fig. 13a and b show that high values of either of the model parameters lead to strong clustering.In the next subsection, we will investigate the dependence of the clustering and correlation properties on the model parameters.

Quantifying correlation, clustering, and cyclicity
properties of the POT FGN+Period model In the previous subsection, we have shown that it is possible to use the POTs of a fractional noise that has been superimposed with a 2030-year period to create an integer-valued time series with long-range persistence (correlation) properties and periodicity.In addition, we have been able to capture the same "number" of values in our synthetic flood time series as in our palaeoflood record.Now we want to evaluate the strength of clustering caused by persistence and the contribution of the long-term periodicity for different model parameters of our POT FGN+Period model.This is done by creating ensembles of realizations for different pairs of model parameters and by measuring the strength of long-range persistence, the shape parameter of the Weibull distribution of the interevent occurrence times, and the periodic rate fluctuations of these realizations, and by comparing them with the values measured for the palaeoflood record.The model has two parameters: (i) β model , the long-range persistence strength of the underlying fractional Gaussian noise, and (ii) A model , the amplitude of the superimposed periodic component with a period T = 2030 years.Both parameters were sampled with a step size of 0.02 from 0.00 to 1.00, resulting in a square grid of 51 × 51 = 2601 pairs of model parameters.For each pair of model parameters (β model , A model ), 1000 synthetic flood series (model realizations) were produced using the peaksover-thresholds method described in Sect.5.1 and illustrated in Fig. 12.The differences between the realizations for one and the same pair of model parameters are caused by different noise realizations.
The amplitude of the best-fitting sinusoidal flood rate A rate assuming a period of T = 2030 years and the shape parameter k W of the best-fitting Weibull distribution of the interevent occurrence times were determined for each model realization (x year ).Further, each resultant palaeoflood yearly series (x year ) was aggregated to give the number of floods per decade (x decade ), and then (using power-spectral analysis and detrended fluctuation analysis) its strength of long-range persistence β PS and β DFA was computed.
In Fig. 14a to d, for each of the 51 × 51 "cells" of the parameter space of our POT FGN+Period model, are given the mean values of measured β PS , β DFA , k W , and A rate (see the legend for colours and contours) for that cell's (β model , A model ) 1000 model realizations.The graphs show that on average an increase in the model parameter long-range persistence strength of the underlying fractional Gaussian noise, β model , leads to an increased strength in long-range persistence, β PS and β DFA , of the synthetic floods per decade series x decade , and to a decrease in the shape parameter k W of the Weibull distribution of the modelled interevent occurrence times (of x year ).Moreover, an increase in the A model parameter of our POT FGN+Period model (i.e. the amplitude of the superimposed periodic component with period T = 2030 years) leads to a larger best-fitting sinusoidal flood rate A rate .However, in both cases the intensity of the increase depends also on the second model parameter, as otherwise the images presented in Fig. 14a to d would have horizontally or vertically striped structures.Furthermore, the mean values of β PS , β DFA , k W , and A rate (Fig. 14a to d) do not give information about the spread of these measured parameters for a specific model parameter.This implies that for a specific cell, we do not have the information about the probability to have a model realization that is similar to the Piànico palaeoflood data set.

Parameter fitting of the POT FGN+Period model
Here we identify pairs of model parameters (β model , A model ) that correspond to flood time series whose strength of long-range persistence (β PS or β DFA ), interevent occurrence time distribution (measured as the shape parameter of the Weibull distribution k W ), and best-fitting sinusoidal flood rate A rate have values close to those measured for the palaeoflood record from Piànico.From the 2.6 × 10 6 synthetic flood series realizations generated in Sect.5.2, the realizations' parameters are within the range of our palaeoflood original series parameters as follows: 162 213 realizations (6.2 % of the total number of model realizations) with 0.370 ≤ β PS ≤ 0.410, 157 782 realizations (6.1 %) with 0.23 ≤ β DFA ≤ 0.27, 132 025 realizations (5.1 %) with 0.76 ≤ k W ≤ 0.81, and 152 056 realizations (5.8 %) with 0.0450 ≤ A rate ≤ 0.0530.In Fig. 14e the corresponding model parameters of these realizations are presented as clouds of points and each point stands for a POT FGN+Period model realization that is similar to the Piànico palaeoflood record with regards to a single measure of correlation or clustering.These points are located in specific areas: for instance, model realizations with a strength of longrange persistence β PS similar to the value of the palaeoflood record (pink dots) are found for β model < 0. We first compute a kernel density estimator for each of the 1000 model realizations.Then we take the mean of these at each time step, giving us a mean (t) (dashed black line).We then compute the 95 % confidence band limits (dark grey bands), by ordering the 1000 values of (t) for each time step t and choosing the 25th and 975th values.As the flood rate of the first and last 500 years might be affected by edge effects, the corresponding dark grey colour banding is changed to light grey.For a comparison, the corresponding time-dependent flood rate for the palaeoflood data (solid black line), as shown in Fig. 11a, is given.Hydrol  14e also shows that there are just a very few model parameters that belong to all four clouds and thus are in the desired ranges of β PS , β DFA , k W , and A rate .To identify optimum model parameters, we choose POT FGN+Period model realizations where the following four parameters intersect: β PS and β DFA (characterize strength of long-range persistence, pink and brown dots respectively), k W (characterizes clustering, blue dots), and A rate (characterizes cyclicity strength, green dots).In other words, we find where the brown, pink, blue, and green dots intersect.Just 143 realizations (presented as black dots in Fig. 14e) out of the 2.6 × 10 6 model realizations are part of this intersection.They are located in a small area of the 2-D parameter space: 0.10 ≤ β model ≤ 0.30 and 0.20 ≤ A model ≤ 0.38.It should be noted that this area does not intersect with the x-axis or the y-axis, meaning that both model parameters are necessary for an appropriate description of the palaeoflood data.Because 143 realizations were not sufficient for effectively investigating (and visualizing) the 2-D probability distribution of β model and A model , for 0.06 ≤ β model ≤ 0.34 and 0.16 ≤ A model ≤ 0.42 (a slightly extended area as to that just found) we created more realizations (10 000 per cell) to gain a total of 1518 realizations with the desired long-range persistence, clustering, and cyclicity properties.The 2-D probability density of these points has been approximated by a normalized 2-D histogram (Fig. 14f).The grid cell of β model = 0.25 and A model = 0.30 (presented as a red bullet) has the highest probability.This pair of model parameters is considered to be the best fitting.Approximately 50 % of the 1518 model realizations that are similar to the palaeoflood data set are found for model parameters 0.19 < β model < 0.29 and 0.26 < A model < 0.34; this range is indicated by the blue dashed lines in Fig. 14f.These ranges give an estimate of the accuracy of POT FGN+Period model parameter determination.
In summary, the model evaluation provides evidence that both model parameters of the strongly fluctuating input signal (the strength of long-range persistence of the fractional noise and the amplitude of the sine wave superposed onto it) of our POT FGN+Period model are required for an appropriate modelling of the clustering properties of the palaeoflood record.Optimum parameters including error bars (β model = 0.24 ± 0.05 and A model = 0.30 ± 0.04) have been determined.One realization of the model with optimum parameters is given in Fig. 15a.

Model confrontation
In the last subsection, we specified the parameters of our POT FGN+Period model in order to reproduce the observed clustering and long-term periodic properties.Now we will investigate further model properties such as the distributions of floods per decade or per century and confront these properties with the values measured for the palaeoflood record from Piànico.
For a comparison of the properties of the palaeoflood data and of the POT FGN+Period model with the specified parameters, we have created 1000 model realizations (50 of these x year series are shown in Fig. 15b).For each of the 1000 model realizations the time-dependent flood rate model (t) (Sect.4.6) was computed which enabled the calculation of the mean time-dependent flood rate model (t) and its 95 % confidence intervals (Fig. 15c).The mean time-dependent flood rate model (t) fluctuates periodically with a 2030-year period and replicates the long-term changes in rate fluctuations qualitatively.However, the amplitudes of the rate changes for some time intervals (t = 0 to 600 years, 3800 to 4800 years, and 5500 to 6800 years) are not correctly captured, i.e. they are outside the 95 % confidence intervals of model .This is due to the stationary character of our palaeoflood model and the non-stationary character of the longterm rate fluctuations.
For the ensemble of all model realizations, the histograms of the number of floods per year, per decade, and per century were computed, such that the mean number of floods per time unit with 95 % error bars could be estimated (Fig. 15d).The number of Piànico palaeofloods per year, decade, and century (as discussed in Sect.4.1 and seen in Fig. 6a) fit well within the 95 % confidence intervals of the considered histograms.
Finally, for the ensemble of all POT FGN+Period (with specified parameters) model realizations, the clustering and correlation properties were quantified: the persistence strength (β PS , β DFA ), the shape parameter of the Weibull distribution of the interevent occurrence times (k W ), and the timedependent flood rate ( model (t)) were computed (see Table 2).Table 2 shows that the correlation properties (β PS , β DFA ) of the model are on average weaker than those of the palaeoflood record.Nevertheless, the 95 % confidence intervals from the model contain the original palaeoflood values for β PS , β DFA , k W , and A rate .The clustering properties (k W , A rate ) of the data are excellently captured by the model as the mean values of the corresponding measures computed for the model realizations are very close to the values measured for the data.
In summary, we find a good agreement between the POT FGN+Period model and the palaeoflood data series in particular with respect to the distribution of floods per time unit and the clustering properties.

Using and analysing realizations from the POT FGN+Period model
We have shown in Sect.5.4 above that our POT FGN+Period model reproduces several properties of the palaeoflood time series, including the distribution of floods per year, decade, and century as well as long-range persistence, clustering, and cyclicity properties.We now use realizations from the POT FGN+Period model with optimized parameters to explore how the number of floods per time unit (e.g. per decade or century) is related to the number of floods in the following time unit (e.g.does a century (decade) with a few floods tend to follow a century (decade) with a few floods?).To do this we create one long model realization with 10 9 years (10 8 decades, 10 7 centuries).We computed the probability of floods per century for given numbers of floods in the preceding century.
Figure 16a shows the 2-D probability distribution of the number of floods per century (n century (j )) and the number of floods in the succeeding century (n century (j +1)).This distribution is concentrated along the diagonal line n century (j +1) = n century (j ) and thus indicates positive correlations in the number of floods per century time series realizations.Figure 16b presents the distribution of the number of floods per century if the preceding century contained 0, 8, or 16 floods century −1 .These distributions are unimodal with a systematic shift in the mode and a dispersion that is increasing with the numbers of floods in the preceding century.In other words, the uncertainty of the forecasted number of floods per century increases with the number of floods in the preceding century.
Our results shown in Fig. 16 imply that low numbers of floods century −1 follow low ones, and that high numbers of floods century −1 follow high ones.In other words, a century with no floods is much more likely to follow a century with no or a very few floods rather than a century with many floods, and a century with many (e.g.20) floods is much more likely to follow a century with many (e.g.20) floods than a century with no or a very few floods.However, this holds only in the statistical sense, and successions of centuries with many floods can certainly follow centuries with very few floods (and vice versa).For example, in the palaeoflood data set we have found a "jump" from 24 to 4 floods century −1 (between the 39th and 38th centuries).In our model the probability that a century with 4 floods will follow a century with 24 floods is 0.02 % on average, but will be much higher for time periods with a high amplitude of the periodic component of the input signal.We also observe that in our original palaeoflood time series there is a jump from 7 to 25 floods century −1 between the 46th and 45th centuries.A jump of this magnitude in our optimized POT FGN+Period model has a probability of 3.9 % based on our long synthetic realization, and is thus well captured by the model.Furthermore, we found for the POT FGN+Period model that medium changes in flood frequencies are fairly likely, as for instance a transition from 25 to < 10 floods century −1 has a probability of > 20 %.In a similar manner, we have also examined the summary statistics for the number of floods per decade using our long synthetic time series realization.For the simulated 10 8 decades, we find between 0 and 13 floods decade −1 , although > 10 floods decade −1 is very unlikely.Similar to the centennial data, we find indications of correlations that decades with few (many) floods are followed by decades with few (many) floods, i.e. on average, low values follow low ones and high values follow high ones.

Summary
We have presented a 9.3 kyr comprehensive flood record at sub-annual resolution, obtained from the varved interglacial Pleistocene sediments of the Piànico-Sèllere Basin.A lacustrine sediment unit of 9.5 m thickness has been considered which consists of an almost continuous succession of about 15 500 varves that are interpreted as annual cycles.Approximately 8 % of the varves contain 1-3 detrital layers which are considered to be the result of channelized streamflow that originated in the hills surrounding the Piànico-Sèllere Basin and triggered by extreme precipitation events (Mangili et al., 2005).Our analysis was aimed at understand the temporal succession of detrital layers.The analysed palaeoflood record is unique as it comprises the relative timings of 771 flood events, is continuous (except a gap of 65 years, which is less than 1 % of the length of the 9336-year time period), and was not affected by anthropogenic influences.This palaeoflood record provides an example of the high natural variability of flood frequency over a long period (e.g.variations of 0-31 floods century −1 ).Because of the comprehensive data set, the temporal succession of palaeofloods could be extensively studied as to its underlying statistics of correlations, clustering, and cyclicity.We showed that the correlations of the decadal number of floods over the 9.3 kyr palaeoflood are long range, with a long-range persistence strength of β PS ≈ 0.39 (power-spectral analy-sis) and β DFA ≈ 0.25 (DFA).Additionally, the flood frequency is modulated by a long-term cyclicity with a period of T = 2030 years.The palaeofloods are also shown to be temporarily clustered as the interevent occurrence times are Weibull distributed (with a shape parameter of k W = 0.78).
We have derived a model (POT FGN+Period ) that is based on POTs of a fractional Gaussian noise (FGN) superimposed by a long period signal.This model allows us to construct many realizations of an event series with properties similar to those identified for the original palaeoflood time series.Should more information (e.g. higher resolution, extending the data series in time) become available related to our original palaeoflood series, then the parameters of correlation, clustering, and cyclicity are likely to be slightly changed, and a modified model can then be easily created.
We used our model to create 2 600 000 synthetic flood series with different parameters, and then confronted the clustering, correlation, and cyclicity properties of our original palaeoflood record with the synthetic series to come up with optimized parameters in our POT FGN+Period model.Based on the optimized POT FGN+Period model, we found that it is important when modelling our original palaeoflood time series that the model needs both long-range correlations and a slowly varying component (periodicity) to capture the correlation, clustering, and cyclicity properties of the palaeoflood record.In other words, the observed temporal flood clusters in our 9.3 kyr time series cannot be explained by either longrange correlations or slow cyclical changes; rather, both components need to be present.
As an example of the application of our optimized POT FGN+Period model we use it to create long simulations with parameters similar to that of our palaeoflood time series.We show that centuries with no or a few floods tend to follow each other and that centuries with many floods tend to follow centuries with many floods.Those centuries that are extremely dry or wet we mostly attribute to the influence of noise in both the 9.3 kyr palaeoflood record and the corresponding model that we have created.We also found that the uncertainty of the forecasted number of floods per century increases with the number of floods in the preceding century.
Our research in this paper combines the statistical analysis of correlations, clustering, and cyclicity in a very complete and unique interglacial record of floods from the Pleistocene, allowing one to create a model to simulate many realizations with similar parameters to our original series.We believe that this approach is applicable to other environmental event time series (e.g.palaeo-hazards); where event magnitude is unknown, the timings are unequally spaced in time, and where the one-point probability distribution of the number of events per time unit is strongly asymmetric (i.e.non-Gaussian).This is true for many palaeo and historical environmental event series (e.g.earthquakes, wildfires, and volcanic eruptions).We also believe our approach of creating an optimized model that is congruent with the correlation, clustering, and cyclicity pa-

Figure 1 .
Figure 1.Example of correlations, clustering, and cyclicity (a and b following Witt et al., 2010).(a) Tree ring standardized growth index for Bristlecone pine, White Mountain, California, USA, for the years AD 0-1962 (Ferguson et al., 1994).(b) Cosmic ray neutron counts per hour, Beijing, China, 1 January to 11 March 2008 (NGDC, 2008).In (a) and (b) successive values in each both series are positively correlated with one another, and are examples of persistent time series.(c) Maximum daily discharge, Q, for station 05474500 on the Mississippi River at Keokuk, Iowa, for 116 water years, 1878-1993 with data from Slack and Landwehr (1992) and USGS (2017) and described in Malamud et al. (1996).(d) Partialduration flood series, where floods are the largest 116 maximum daily discharge from (c) over the 116-year period, with daily discharges separated by at least 30 days to be considered a flood.In our partial duration series, 48 of the 116 years have 0 floods, 33 years have 1 flood, 26 years have 2 floods, and 6/2/1 years have 3/4/5 floods.The value of these maximum daily discharges for each flood event is projected to the x-axis (daily discharge = 0 m 3 s −1 ).Below this is shown (green dots) all events along one line, an example of a data series that is strongly clustered in time.Clustering is due to (i) seasonal effects (cyclicity) and (ii) longer-term cycles.

Figure 2 .
Figure 2. Site, sediment outcrop, and data.(a) Location of palaeolake Piànico-Sèllere, Italy.Underlying image © 2017 DigitalGlobe.(b) Detail of the 9.5 m sediment outcrop containing a winter layer and parts of two summer layers with, shown here, a detrital layer that is interpreted as a flood event.(c) Graphical representation of the sediment outcrop with the grey horizontal lines indicating detrital layers, brown horizontal bars indicating the number of detrital layers per decade and the striped bar showing a 65-year gap (see Fig.3for more detailed time series).The relative age represents years before the top of the stratigraphic section examined, with actual age estimated to be anywhere from 780 to 393 ka (see text).These data, the number of detrital layers per year, have been lodged online at the World Data Centre PANGEA(Mangili et al., 2017).

Figure 3 .
Figure3.Temporal succession, histograms, and autocorrelation of the detrital layers of the 9336-year Piànico-Sèllere, Italy, palaeoflood record (data available atMangili et al., 2017).The number of observed detrital (flood) layers per year (blue dots, 0 ≤ n year ≤ 3 floods yr −1 ), decade (orange dots, 0 ≤ n decade ≤ 8 floods decade −1 ), and century (green dots, 0 ≤ n century ≤ 31 floods century −1 ) are presented over the 9336 years of the record examined.The x-axis represents relative age, with t = 1 year the most recent varve and increasing values indicating further back in time.The grey bar represents a sediment gap of 65 years (leaving 9271 years of the record examined over the 9336 years).In addition, for each time resolution is given (upper right) the coefficient of variation c v = σ/µ, where σ is the standard deviation and µ is the mean of the given time series.Shown to the right of each palaeoflood time series is the respective histogram of the number of floods per year (blue bars), decade (orange bars), and century (green bars).

Figure 4 .
Figure 4. Data series of detrital layers (floods) and interevent occurrence times (IEOTs) from the Piànico-Sèllere, Italy, palaeoflood sequence.(a)In panel (a.1) are shown the number of detrital layers (floods) per varve, n year , given as a function of varve index t = 1 to 9336 years (blue dots -data available atMangili et al., 2017) as shown in Fig.3.In panel (a.2) a 100-year portion of (A.1) (from t = 8090 to 8190 years) is expanded with an illustration of 4 flood years that have detrital layers (floods) in them, and the interevent occurrence times between them.These 4 flood years, each with n year ≥ 1 flood in them, appear as yellow symbols in all panels.(b) Interevent occurrence times, , temporally located (i.e. as a function of the relative age) when they occurred.(c) Interevent occurrence times, j , plotted as a function of "natural" time, where each is no longer represented temporally when it occurs, but rather successively one after another, j = 1, 2, 3, . . ., 741; interevent occurrence times of j = 0 are not shown.The detrital varve index j includes only those years (varves) where there are n year ≥ 1 floods yr −1 (i.e. at least one detrital layer in that year).

Figure 5 .
Figure 5. Examples of synthetic fractional Gaussian noises with different modelled strengths of long-range persistence, 0.0 ≤ β ≤ 1.0.The presented synthetic data series (unitless in magnitude and time), which have 512 elements each, are normalized to have a mean of and a standard deviation of 1, and were created by Fourier filtering (see Appendices 1 and 2 in Witt and Malamud, 2013, for further details).

Figure 6 .Figure 7 .
Figure 6.Histograms and autocorrelation of the palaeoflood time series with the corresponding Poisson model.(a) Histogram (also shown in Fig. 3) of the number of floods per year (blue bars), decade (orange bars), and century (green bars), with each compared to the Poisson model with respective rate parameters λ year = 0.083 floods yr −1 , λ decade = 0.83 floods decade −1 , and λ century = 8.3 floods century −1 .Diamonds represent the mean Poisson model value and the error bars the standard deviation over 100 realizations.(b) Autocorrelation function (ACF)(Eq.5) of the number of floods per year (blue squares), decade (orange squares), and century (green squares).Also shown for the ACF of the number of floods per decade is the best-fitting power-law model to the ACF (black dotted line) (Eq.13).Also shown are the 97.5th percentile, i.e. the upper bound of the 95 % confidence interval of the ACF (for lags τ > 0) of an uncorrelated signal with the same one-point probability distribution (i.e.realization of a Poisson model).

Figure 8 .
Figure 8. Detrended fluctuation analysis (DFA) of the 9271-year Piànico-Sèllere, Italy, palaeoflood record: the fluctuation function F for the number of floods per decade shown as a function of the segment length l and for different orders of the detrending (see the legend) on a double logarithmic scale.Segments containing parts of or the entire gap were excluded.Also shown are the best-fitting power-law function (Eq.12) for DFA3 and the corresponding power-law exponent α and its corresponding value of β DFA = 2α − 1.
Figure9.Power-spectral analysis: the power-spectral density (periodogram) is presented as a function of the frequency (measured per decade) for the number of floods per decade.Also given is the bestfitting power-law function (Eq.8) and the corresponding power-law exponent.Additionally, presented are the 95th percentiles of the power spectra created for synthetic data based on peaks over thresholds of fractional noises (see Sect. 5.1) for different strengths of persistence (see the legend and the description in the text).Highlighted are the three values of the power-spectral density that exceed all of these 95th percentiles; the corresponding cycle lengths are given.

Figure 10 .
Figure10.Autocorrelation analysis of the decadal and centennial palaeoflood records (see Fig.3) and its Poisson model for lags of up to 7500 years.Autocorrelation function (ACF) (Eq.5) of the number of floods per decade (a, orange squares) and century (b, green triangles).Also shown are the 97.5th and 2.5th percentiles, i.e. the upper and lower bounds of the 95 % confidence interval, of the ACF (for lags τ > 0) of an uncorrelated signal with the same onepoint probability distribution (i.e.realization of a Poisson model; see Sect.3.2).See also Fig.6b, which is a similar analysis, but only up to lags τ of 200 years.

Figure 11 .
Figure 11.Analysis of flood rate fluctuations.(a) Shown is the time-dependent flood rate (black line) and its 95 % confidence bands (green bands) as a function of the relative age t = 1 to 9336 years for the palaeoflood data set shown in Fig. 3 which was computed by using a kernel density estimator.The grey colour indicates results that might be affected by boundary effects.The estimator is based on a Gaussian kernel with a width of ±3σ , where σ = 250 years (Appendix C), shown in the inset figure.Also shown are the average flood rate (dashed grey horizontal line) of λ year = 0.83 floods yr −1 and the best-fit sinusoidal curve (red curved line, Eq. 16).(b) For a comparison, the number of floods per century n century (green bars) is presented.The grey vertical bar indicates the gap in the data set.

Figure 12 .
Figure 12.Schematic of the POT FGN+Period model for peaks over thresholds (POTs) of fractional Gaussian noises (FGN) (see Fig. 13 for POTs of FGN + Period): (a) shown is a synthetic fractional Gaussian noise (unitless in magnitude) with a persistence strength of β model = 0.5.The synthetic time series resolution is set at t = 1 year.See Fig. 5 for an example of other fractional Gaussian noises.(b) Three thresholds are set (dashed horizontal lines).All noise values that are below the threshold θ 01 are considered as years with 0 floods yr −1 .All noise values between the thresholds θ 01 and θ 12 (θ 12 and θ 23 ) are considered as years with 1 (2) floods yr −1 .And, all noise values that exceed the threshold θ 23 are considered years with 3 floods yr −1 .The thresholds θ 01 , θ 12 and θ 23 are chosen such that the resultant one-point probability distribution is the same as our flood series.(c) Time series of the modelled number of floods per year (blue circles) for 1000 years.For (a), (b), and (c) are also shown the histogram of the number of values at a given size.

Figure 13 .
Figure 13.POT FGN+Period model realizations.Shown are (a) long-range persistent flood time series that are modelled with the proposed POT FGN+Period model (see Fig. 12) as peaks over thresholds (POTs) of a fractional Gaussian noise (illustrated in Fig. 5) with different persistence strengths β model = 0.0, 0.2, 0.4, 0.6, 1.0 and no superimposed periodic component (A model = 0.0), and (b) flood time series x year with a long-term periodic cyclicity which are modelled as POTs of a Gaussian white noise (β model = 0.0) superimposed with a sine wave (T = 2030 years) with different sine wave amplitudes A model = 0.0, 0.2, 0.4, 0.6, 1.0.

Figure 15 .
Figure15.POT FGN+Period model with optimized parameters.(a) Shown (black time series) is a realization of a fractional Gaussian noise (FGN) with a long-range persistence strength of β model = 0.24 (normalized to a mean = 0.0 and variance = 1.0) that is superimposed with a periodic component (period: T = 2030 years; amplitude: A = 0.30).This synthetic time series is given a time resolution of t = 1 year and a length N year = 9271 years.Peaks over thresholds (POTs) are considered (as in Fig.12) for five thresholds (three shown here), θ 01 , θ 12 , etc., where input (FGN + period) values < θ 01 translate to x year = 0 flood yr −1 , input values between θ 01 and θ 12 result in x year = 1 flood yr −1 , etc.The resultant model series of the number of floods per year (grey and blue circles) for 9271 years is constructed to have a Poissonian one-point distribution and is shown at the bottom of (a).(b) Shown are 50 more model realizations of the POT FGN+Period model as number of floods per year (see the legend for the colour code), with maximum x year = 4 floods yr −1 .(c) Time-dependent flood rate (t) given as a function of time t (see Fig.11a, text and Appendix C) using 1000 model realizations similar to (b).We first compute a kernel density estimator for each of the 1000 model realizations.Then we take the mean of these at each time step, giving us a mean (t) (dashed black line).We then compute the 95 % confidence band limits (dark grey bands), by ordering the 1000 values of (t) for each time step t and choosing the 25th and 975th values.As the flood rate of the first and last 500 years might be affected by edge effects, the corresponding dark grey colour banding is changed to light grey.For a comparison, the corresponding time-dependent flood rate for the palaeoflood data (solid black line), as shown in Fig.11a, is given.(d) Black diamonds represent the mean number of floods per year, decade, and century based on 1000 realizations of our POT FGN+Period model as given in the previous panels, with error bars 95 % confidence intervals.For comparison are shown (i) the histogram (also shown in Fig. 6a) of the original palaeoflood time series number of floods per year (blue bars), decade (orange bars), and century (green bars), and (ii) the corresponding Poisson model (grey diamonds, also shown in Fig. 6a) with respective rate parameters λ year = 0.083 floods yr −1 , λ decade = 0.83 floods decade −1 , and λ century = 8.3 floods century −1 .
Figure15.POT FGN+Period model with optimized parameters.(a) Shown (black time series) is a realization of a fractional Gaussian noise (FGN) with a long-range persistence strength of β model = 0.24 (normalized to a mean = 0.0 and variance = 1.0) that is superimposed with a periodic component (period: T = 2030 years; amplitude: A = 0.30).This synthetic time series is given a time resolution of t = 1 year and a length N year = 9271 years.Peaks over thresholds (POTs) are considered (as in Fig.12) for five thresholds (three shown here), θ 01 , θ 12 , etc., where input (FGN + period) values < θ 01 translate to x year = 0 flood yr −1 , input values between θ 01 and θ 12 result in x year = 1 flood yr −1 , etc.The resultant model series of the number of floods per year (grey and blue circles) for 9271 years is constructed to have a Poissonian one-point distribution and is shown at the bottom of (a).(b) Shown are 50 more model realizations of the POT FGN+Period model as number of floods per year (see the legend for the colour code), with maximum x year = 4 floods yr −1 .(c) Time-dependent flood rate (t) given as a function of time t (see Fig.11a, text and Appendix C) using 1000 model realizations similar to (b).We first compute a kernel density estimator for each of the 1000 model realizations.Then we take the mean of these at each time step, giving us a mean (t) (dashed black line).We then compute the 95 % confidence band limits (dark grey bands), by ordering the 1000 values of (t) for each time step t and choosing the 25th and 975th values.As the flood rate of the first and last 500 years might be affected by edge effects, the corresponding dark grey colour banding is changed to light grey.For a comparison, the corresponding time-dependent flood rate for the palaeoflood data (solid black line), as shown in Fig.11a, is given.(d) Black diamonds represent the mean number of floods per year, decade, and century based on 1000 realizations of our POT FGN+Period model as given in the previous panels, with error bars 95 % confidence intervals.For comparison are shown (i) the histogram (also shown in Fig. 6a) of the original palaeoflood time series number of floods per year (blue bars), decade (orange bars), and century (green bars), and (ii) the corresponding Poisson model (grey diamonds, also shown in Fig. 6a) with respective rate parameters λ year = 0.083 floods yr −1 , λ decade = 0.83 floods decade −1 , and λ century = 8.3 floods century −1 .

Figure 16 .
Figure16.Using the optimized POT FGN+Period model for forecasting the number of floods in 1 century based on the number of floods in the previous century: based on the palaeoflood model realizations for 10 9 years are shown (a) the 2-D probability of the number of floods per century (x century (j )) and the number of floods in the succeeding century (x century (j + 1)) (background grid colours; see the legend for the scale).The solid grey diagonal line represents the equality of the number of floods in both centuries (x century (j + 1) = x century (j )).(b) The 2-D probability distribution shown in (a) is cut vertically at x century = 0, 8, and 16 floods century −1 and the corresponding probability densities of floods per centuries after a century with x century = 0, 8, and 16 floods century −1 are presented as coloured lines (see the legend).Also given is the probability density of floods per century of the palaeoflood model.
A model = 0) and (β model = 0.0, A model = 0.8).Realizations with values of A rate close to 0.049, i.e. the value of the bestfitting sinusoidal flood rate of the palaeoflood record, are centred around A model = 0.32 (green dots) with a spread that grows with increasing values of β model .Next, we will identify the model parameters β model and A model which lead to model time series that best match our palaeoflood record. Figure . Earth Syst.Sci., 21, 5547-5581, 2017 www.hydrol-earth-syst-sci.net/21/5547/2017/

Table 2 .
Statistics of long-range persistence, clustering, and cyclicity properties of the specified (β model = 0.24, A model = 0.30) palaeoflood model; 1000 model realizations were created and their long-range correlations (β PS and β DFA ), clustering (k W ), and cyclicity properties (A rate ) were quantified.Given is for each measure the mean value and the 2.5th and 97.5th percentiles of the model realizations, the value measured for the palaeoflood record, and the percentile of the value measured with respect to the values of the model.