Interactive comment on “ Flood frequency analysis of historical flood data under stationary and non-stationary modelling ”

We appreciate very much this critical point of view to our non-stationary analysis of the historical flood period. We understand that from a conventional-type engineering approach, this paper using non-systematic data brings up a number of questions on how to deal with uncertainty inherent to documentary flood data. However, the reviewer should consider this work as a scientific-exploratory approach in the application of non-stationary analysis to centennial flood data series in a site with a complete and continuous record of peak over threshold discharges on the basis of documentary evidence.

Abstract.Historical records are an important source of information on extreme and rare floods and fundamental to establish a reliable flood return frequency.The use of long historical records for flood frequency analysis brings in the question of flood stationarity, since climatic and land-use conditions can affect the relevance of past flooding as a predictor of future flooding.In this paper, a detailed 400 yr flood record from the Tagus River in Aranjuez (central Spain) was analysed under stationary and non-stationary flood frequency approaches, to assess their contribution within hazard studies.Historical flood records in Aranjuez were obtained from documents (Proceedings of the City Council, diaries, chronicles, memoirs, etc.), epigraphic marks, and indirect historical sources and reports.The water levels associated with different floods (derived from descriptions or epigraphic marks) were computed into discharge values using a one-dimensional hydraulic model.Secular variations in flood magnitude and frequency, found to respond to climate and environmental drivers, showed a good correlation between high values of historical flood discharges and a negative mode of the North Atlantic Oscillation (NAO) index.Over the systematic gauge record , an abrupt change on flood magnitude was produced in 1957 due to constructions of three major reservoirs in the Tagus headwaters (Bolarque, Entrepeñas and Buendia) controlling 80 % of the watershed surface draining to Aranjuez.Two different models were used for the flood frequency analysis: (a) a stationary model estimating statistical distributions incor-porating imprecise and categorical data based on maximum likelihood estimators, and (b) a time-varying model based on "generalized additive models for location, scale and shape" (GAMLSS) modelling, which incorporates external covariates related to climate variability (NAO index) and catchment hydrology factors (in this paper a reservoir index; RI).Flood frequency analysis using documentary data (plus gauged records) improved the estimates of the probabilities of rare floods (return intervals of 100 yr and higher).Under nonstationary modelling flood occurrence associated with an exceedance probability of 0.01 (i.e. return period of 100 yr) has changed over the last 500 yr due to decadal and multidecadal variability of the NAO.Yet, frequency analysis under stationary models was successful in providing an average discharge around which value flood quantiles estimated by non-stationary models fluctuate through time.

Introduction
Throughout Europe, national legislations on flood hazard assessment are based on flood-frequency analyses which estimate discharges associated with different return periods (usually 10, 25, 50, 100 and 500 yr).This assessment is also gathered in the European Flood Directive (2007/60/CE) together with the ordinarily used parameters (see Article 6.3 on flood hazard maps and flood risk maps), which are common practice in several continents and countries such as the USA, Australia and several South American countries.The common procedure involves the extrapolation from gauged hydrological registers, documenting 30-40 yr records, of observed floods (usually not comprising an extraordinary event) and estimate quantiles for floods.The straightforward handling of these simple statistical methods that makes them so widely used hides important uncertainties as well as scientific and technical problems which have been extensively discussed in the literature (Merz and Blöschl, 2008).This conventional method of addressing flood hazard assessment can be improved by including information of past floods (historical floods and palaeofloods), which should be accomplished using rigorous procedures of data collection and statistical modelling.Documentary records provide a catalogue of the largest flood events that occurred during periods of human settlement, and provide evidence of all other events below or above specific flow stages or thresholds (Brázdil et al., 2006).Long records of historical extreme floods have been applied successfully in hazard analysis together with the more traditional empirical, statistical and deterministic methods to estimate the largest floods (Stedinger and Cohn, 1986;Francés et al., 1994;England et al., 2003).Information on these extreme floods is highly demanded by planners and engineers and yet seldom registered in the observational record due to its short time length (Enzel et al., 1993;Benito et al., 2004).
A major issue of using long-term flood data in flood frequency analysis is the assumption of stationarity, i.e. the idea that "natural systems fluctuate within an unchanging envelope of variability" (Milly et al., 2008).The analysis of longterm historical flood records frequently reveals the occurrence of flood clusters and trends lasting from a few decades to centuries and, subsequently, the fact that the frequency of flood magnitude changes over time (Benito et al., 2003).Most of these trends observed in historical flood series are related to decadal climate variability (e.g.influence of the North Atlantic Oscillation or ENSO, El Niño) which may affect the assumption of stationary in conventional flood frequency analysis (Merz et al., 2014).Milly et al. (2008), in their study on the recent impacts of climate change in the hydrological cycle, conclude that the assumption of stationary in hydrology is dead and should no longer serve as a central, default assumption in water-resource risk assessment and planning.Yet, the application of non-stationary concepts by the engineering community has been rather challenging due to the limited number of non-stationary models and scarce long-term flood records to test the performance of such models.Centennial historical flood data provides long-term real data to test and compare stationary and non-stationary models under climate variability.The results of this analysis together with the selection of appropriate covariates explaining the changes in flood series are relevant in climate-related impacts on flood risk assessment.Moreover, the optimal merging of instrumental, historical and palaeoinformation is of great importance for understanding flood hazard and the ap- praisal of flood incidence through multi-decadal climate variability (Kundzewicz et al., 2014).
This paper aims at addressing stationary and nonstationary flood frequency analyses of historical floods (data censured above threshold), and to test their performance in flood hazard analyses.The specific objectives are to (1) analyse the long-term historical flood variability and identify the major drivers and covariates (climatic or human-induced environmental factors) describing the temporal changes; (2) test the stationary of flood series and, if so, estimate flood distribution of systematic (gauged) and non-systematic (historical) data under stationary assumption; (3) apply nonstationary models to historical flood data incorporating covariate indexes describing flood producing atmospheric circulation patterns (e.g.NAO) and other major human disturbance parameters (e.g.reservoir construction); and (4) compare results of using stationary and non-stationary models in flood hazards assessment.

Study area
The Tagus River drains the central Spanish Plateau (Meseta Central) and flows east-west into the Atlantic Ocean at Lisbon (Portugal).It is the longest river of the Iberian Peninsula (1200 km) and the third largest in catchment area (81 947 km 2 ).The studied flood records are located in Aranjuez (Fig. 1a), in the upper part of the catchment (9340 km 2 in drainage surface).In Aranjuez, the Tagus River channel has a meandering pattern on a floodplain 800-1000 m in width.Present day mean annual discharge is 35 m 3 s −1 with ordinary streamflow completely regulated by the upstream dams of Bolarque and the Entrepeñas-Buendia reservoir system.The natural streamflow is characterised by extreme seasonal and annual variability including severe floods with peak discharges of more than 30 times the mean.In the Tagus Basin, eastern and northeastern tributaries have a mixed hydrological regime from snowmelt and rainwater from the Iberian and eastern Central Range areas, whereas the southern and The Tagus River flood regime is influenced by the rainfall associated with the Atlantic fronts that cross the western Iberian Peninsula during low zonal circulation over the Atlantic (at 35-40 • N) mostly during winter months (Capel, 1981;Trigo and Palutikof, 2001).The eastern tributaries and the Tagus headwaters show a second flood maximum during autumn related to cold pool cells developing mainly along the Mediterranean coast producing intense precipitation.Flood-producing atmospheric circulation patterns are closely related to the NAO (North Atlantic Oscillation), with a low (negative) NAO index being associated with suppressed westerlies and storm track moving southerly toward the Mediterranean Sea (Walker and Bliss, 1932;van Loon and Rogers, 1978).Recent studies on the influence of the NAO on the Tagus River flooding show evidence that the largest floods (average recurrence intervals > 25 yr) are associated with negative mode of NAO during the 20-25 days (of a total 40-day period length) before the flood peak (Salgueiro et al., 2013).Analysis of flood response under natural and dam-regulated regimes (before and after the construction of reservoirs ca.[1957][1958][1959][1960] revealed changes in the behaviour of floods peaks.In particular, moderate floods (return intervals of 10-25 yr) were blurred during the post-dam period due to flood peak discharge attenuation by reservoirs.

Flood records database
A historical database was compiled from direct and indirect sources (Proceedings of the City Council, diaries, chronicles, memoirs, etc.).Written documents and maps were obtained directly from the General Archive of the Royal Palace which stores data on the administration of the Spanish Royal Heritage (16th-20th centuries), as well as from the Simancas General Archive, the Spanish National Library, and the Geographic Service of the Spanish Army.Further bibliographical sources consulted include scientific and technical reports, local history works and non-systematic compilations by of Rico Sinobas (1850), Bentabol (1900), Masachs (1948Masachs ( , 1950)), Fontana-Tarrats (1977), Gonzálvez (1977), López-Bustos (1981), Comisión Técnica de Inundaciones (1985), Font (1988), andCanales (1989).All indirect information was checked by cross-referencing between different sites (Benito et al., 2003), following the standardised methodology proposed by Barriendos and Coeur (2004).
Water stage data related to documentary descriptions and epigraphic marks were interpreted in terms of stage indicators as follows: (1) sites or landmarks reached by the flood (e.g.churches and bridges) were assumed to provide an ex-act discharge level (equal to the flood stage), (2) flooded areas (e.g.orchards, floodplain areas, gardens) provided a minimum flood stage, (3) non-flooded areas or landmarks (e.g.Royal Palace surrounded by water) were interpreted as a non-reached maximum flood stage, and (4) the relative importance of the event with respect to previous floods (e.g. the 1840 flood was 2 m higher than the flood occurring in 1820) was quoted as a range of discharge in the case of two recorded levels.
The water levels (flood stage) required to reach the described flooded areas or landmarks (buildings, streets, gardens, etc.) were converted into discharge values (O'Connor and Webb, 1988) using the HEC-RAS one-dimensional hydraulic model (Hydrological Engineering Center, 2010).The HEC-RAS software, allows for rapid calculation of watersurface profiles for specified discharges and energy loss coefficients.This conversion is an inverse problem, where the minimum discharge is obtained by matching the modelled flood water levels to those obtained from the elevation of sites inundated during the historical floods, as described in the documentary sources.Hydraulic modelling requires the estimation of key hydraulic characteristics of the river reaches (energy slope, roughness and cross-sectional topography) as well as the boundary conditions upstream or downstream depending on the flow type selected in the model.Detail cross-section topography of the river channel was obtained from echo-sounder and GPS field surveys performed by the LINDE Project (Spanish Ministry of Environment) for flood risk mapping.On the floodplain, cross sections were completed when needed from detailed topographical maps (1 : 500 or 1 : 1000 in scale).In the study reach, river channel modification resulting of constructions of dykes, river lining, gardens setting and other engineering-architectural works were reconstructed based on historical maps and technical reports described among others by García Tapia (1980), González Perez (1987) and Teran (1949).
Systematic streamflow data of annual peak discharge were obtained from gauge records, namely from the Aranjuez station (9340 km 2 ) covering the period between 1913 and 1985 although with several data gaps .In some years, annual daily maximum flows from the Aranjuez station were transformed into peak discharges using a regression equation (Jiménez Alvarez et al., 2013).The missing annual flow data of the Tagus annual maximum flows were completed using data from the Bolarque gauge station upstream of Aranjuez (7400 km 2 ) and since 1985 to present using the Villarubia de Santiago gauge station (9048 km 2 ) located 20 km upstream of Aranjuez.

Statistical analysis under assumption of stationary flood series
The potential of historical records is based not only in their value as documentary registers of the occurrence of a rare flood but also as repositories of the magnitude of these floods by documenting and sometimes mapping or graphically indicating the limits of every rare flood, over a centennial period.In statistical analysis, streamflow monitoring data are considered systematic information whereas flood observations reported as having occurred above some threshold are known as non-systematic (censored) data sets (Leese, 1973).Fitting a distribution to these systematic and nonsystematic data sets provides a compact representation of the frequency distribution, a task that requires a method to estimate the distribution parameters so that quantiles can be calculated.The parameters set for three statistical models, namely the log-normal (LN), generalised extreme value (GEV) and two-component extreme-value (TCEV) distribution functions have been estimated by the maximum likelihood estimation (MLE) method.This method was selected based on its statistical features' performance for large samples and also because of its capacity to easily incorporate in the estimation process any additional non-systematic quantified data (Leese, 1973;Stedinger and Cohn, 1986).Most historical information is non-systematic information of the censored type since only floods of certain magnitudes (typically producing damages) are registered in documentary records.The minimum flood stage (threshold levels) required to be reported in documentary records is most commonly reported in areas with urban settlements or economically important human activities.In the case of the Iberian Peninsula sites studied till now, most are located in floodplain areas (Benito et al., 2004).The flooding of sensitive areas, or perception threshold, may change through time, according to the progressive human occupation of the riverine areas and the socio-economic context.In the case of Aranjuez, the perception threshold did not register significant changes throughout time.The minimum discharge (250 m 3 s −1 ) to inundate the floodplain in Aranjuez is related to the channel bankfull discharge because gardens, roads and buildings (including Royal Palace) are placed on the Tagus' floodplain (Fig. 2).
In our study, flood discharge in years 1919, 1954 and 1941 were estimated based on epigraphic flood marks.All the historical flood data in Aranjuez provide minimum discharge estimates based on elevation points affected by flooding, although the exact water depth in those sites is unknown (lower-bound type after Naulet et al., 2005).A review of these various methods using historical data was presented by Ouarda et al. (1998) and by Francés (2004), and case study applications in Europe can be found e.g. in Francés et al. (1994), Naulet et al. (2005), Calenda et al. (2009) and Botero and Francés (2010), among others.
Statistical analyses rely on the general characteristics and stationarity of the flood.However, the temporal changes in the trajectory and statistics of a state variable may correspond to natural, low-frequency variations of the climate hydrological system or to non-stationary dynamics related to anthropogenic changes in key parameters such as land use and climate.Flood record stationarity from censored samples (systematic and/or non-systematic) was checked using Lang's test (Lang et al., 1999(Lang et al., , 2004)).This test assumes that the flood series can be described by a homogenous Poisson process.The 95 % tolerance interval of the accumulative number of floods above a threshold or censored level is computed.Stationary flood series are those remaining within the 95 % tolerance interval (Naulet et al., 2005).

Models with time-varying dependence parameters
Modelling of flood time series under a non-stationary assumption was carried out incorporating external covariates, which explains the time changes of the statistical parameters (mean and variance) for the selected frequency distribution.In this study, the NAO index and the reservoir index (RI) are introduced as explanatory variables of the marginal distribution parameters.The modelling framework was based on the "generalized additive models for location, scale and shape" (GAMLSS), as proposed by Rigby and Stasinopoulos (2005).GAMLSS has been successfully used in hydrological studies by Villarini et al. (2009aVillarini et al. ( , b, 2010aVillarini et al. ( , b, 2011) ) and López and Francés (2013).
In GAMLSS the response random variable Y (annual maximum peak discharge or historic flood above threshold discharge) has a parametric cumulative distribution function and its parameters can be modelled as a function of selected covariates, which in our study are represented by the winter NAO index (NAOw) and RI.A GAMLSS model assumes that independent observations y t at a given time t = 1, 2, 3, . . ., n follow a cumulative probability distribution function , where the evolution of its vector parameter t can be expressed as a function of the explanatory variables x t j (j = 1, 2, 3, . . ., m) via a monotonic link function g k (.) as follows: where k is a matrix of explanatory variables (i.e.covariates) of order n × m, β k is a parameter vector of length m, and h j k (x t j k ) represents the functional dependence of the distribution parameters (linear or smooth dependence) on explanatory variables x t j .In this study the smooth dependence is based on cubic spline functions.The addition of smoothing terms in the GAMLSS model has many advantages, such as identifying non-linear dependence between the parameters of the parametric distributions and the explanatory variables (López and Francés, 2013).
Due the characteristics of the selected probability distributions, only the non-stationarity in the first two moments are considered.If the relation between distribution parameters and selected covariates follows a smooth dependence, this tends to increase the complexity of the model.According to Eq. ( 1), four different models can be considered for the time-varying marginal distributions: model a -two distribution parameters are constant; model b -only the location parameter is time varying; model c -only the scale parameter is time varying; and model d -location and scale parameters are time varying.The final model is selected by comparing the corrected Akaike information criterion (AICc; Hurvich and Tsai, 1989).This AICc criterion is more strict than the classical Akaike information criterion (AIC).In the absence of a statistic to evaluate the goodness of fit of the selected models as a whole, verification was made by analysing the normality and independence of the residuals of each model, following the recommendations of Rigby and Stasinopoulos (2005).With these criteria, a final model is provided with a balance between accuracy and complexity (López and Francés, 2013).
After defining the functional dependence between distribution parameters and each selected covariate and the effective degrees of freedom for the cubic spline, the distribution functions F Y (y I |ϑ i ) were selected, according to the largest value of the maximum likelihood.In the present study the best model performance was obtained for a LN distribution function of two parameters.Detailed discussion on model fitting and selection can be found in Rigby and Stasinopou-los (2005), Stasinopoulos and Rigby (2007) and Villarini et al. (2009b).
The modelling of flood frequency over the systematic record should consider the high incidence of dam regulation on flood peak discharges.The Bolarque reservoir, together with the Entrepeñas and Buendia reservoirs complex, has substantially altered the streamflow regime of the upper Tagus watershed.Thus, under the altered regime  the reservoir regulation is an important variable to explain the abrupt change on flood magnitude and frequency observed in the systematic record.A RI was used to describe the dependence of flood variability on reservoir capacity and regulation strategies, as proposed by López and Francés (2013), for floods and recently used for low flows by Jiang et al. (2014): where N is the total number of reservoirs upstream of the gauge station; A i is the drainage surface controlled by the reservoir i; A T is the drainage surface upstream of the gauge station; C i is the total reservoir capacity i; and C T is the mean annual runoff at the gauge station.

Historical flood occurrence and discharge estimates
The historical flood records of Aranjuez and Toledo constitute the best registers to be found along the Tagus River; they both gather a continuous data record on floods over a time period of up to 600 yr.The Aranjuez Royal Palace and nearby extensive gardens were commissioned by Phillip II as a winter residence of the Spanish kings.The oldest garden, named Isla Garden (AD 1387-1409), is bounded by the River Tagus and a diversion canal from a mill dam next to the Royal Palace.The other gardens (Prince's Garden of ca. 150 ha) have been progressively extended along the Tagus River floodplain since AD 1545, together with a series of bridges, fountains, museums, as well as facilities for boat trips and deer hunting.In the middle of the garden and next to the Tagus River is located the Farmer's Lodge, a neoclassic Palace (built for Charles IV in AD 1790-1803) highly vulnerable to flooding and frequently referred to in the documentary record when the floodplain is inundated.Currently, only two epigraphic flood marks are displayed at the entrance patio: the 1916 and 1924 flood levels.
The hydraulic model comprises a 13 km reach, from upstream of the Tagus's river confluence with the Jarama River to the upstream of the Prince's Garden at the Embocador mill dam, along which 57 cross sections were surveyed (Fig. 2).Hydraulically, the Aranjuez reach is relatively complex due to its wide floodplain and the near junction to the Jarama River.Discharge estimations using one-dimensional hydraulic modelling for a floodplain river reach may contain some uncertainties in the final result due to potential channel migrations, avulsion and meandering cut-off.However, historical maps show that meanders of the Tagus River along the studied reach were stabilised since AD 1795 when a 2 km reach was straightened and numerous dykes were built at either side of the Tagus River banks.Another source of uncertainty is related to the numerous hydraulic constructions (e.g.dams, bridges, dikes, dam mills) that have been set up along the Tagus River over time.The excellent documentation of the work descriptions and their emplacements (e.g.Díaz-Marta, 1992;García Tapia, 1980) were enough to include the corresponding temporal geometric changes of the cross sections used in the hydraulic modelling.The model was calibrated using flood marks at the Labrador House Palace of the 20 December 1916 and 31 March 1924 floods.It was assumed for the calculations that flow was subcritical along the modelled reach.Manning's n values of 0.035 for the channel and 0.045-0.05for the floodplain (crops and gardens with trees) were assigned.A sensitivity test performed on the model shows that for a 25 % variation in roughness values, an error of 15 % was introduced into the discharge results.
The historical flood reports quote over 100 locations or sites (buildings, streets, orchards, gardens) to have been flooded during the historical period.A database was implemented comprising flood dates, flooded sites, GPS locations, maximum and minimum elevation and nearby cross sections used in the hydraulic modelling.As a result, for each historical flood, information of flooded and non-flooded affected sites and their elevation (maximum and/or minimum) is given, thus making it possible to calculate the discharge associated with the flooded sites.Discharge values associated with each historical flood were estimated upon calculated water-surface profiles bracketing elevations of flooded/nonflooded historical sites along the longitudinal profile (Fig. 2).Most of the documentary descriptions provide information on the minimum flood stage as written reports usually do not provide information of water depth at a site.Non-flooded sites explicitly indicated provide a maximum bound of discharge not exceeded by the flood.Figure 2 shows the calculated water-surface profiles along the study reach with the best match to seven documentary flood indicators of flooded areas during the 1866 flood that provided a minimum discharge of 650 m 3 s −1 .This individual analysis was carried out for each historical flood to estimate the associated discharge (Fig. 3).Only 10 out of the 59 historical flood records did not have an associated stage reference.In this case, the stage reference for these 10 floods was assumed to exceed the Tagus River channel capacity i.e. a minimum bankfull discharge ca.250 m 3 s −1 .
1899 and includes a total of 18 flood events, 6 of which showed minimum peak discharges of 500 m 3 s −1 (Fig. 3).The largest flood within this period occurred in December 1878 showing a minimum peak discharge of 1200 m 3 s −1 , probably the largest at basin scale within the last 750 yr.The next period in terms of flood magnitude corresponds to 1560-1611 with seven floods; four reached a minimum discharge of 400 m 3 s −1 and the largest (May, 1611) over 950 m 3 s −1 .During the 16th century, intense precipitation with large floods alternating with severe drought occurred from 1585 to 1599 (Bullón, 2011).A third historical flood period corresponds to 1739-1750 with seven flood events, three of which yielded values of minimum discharge over 350 m 3 s −1 , and the largest (December 1747) with an estimated minimum discharge of 750 m 3 s −1 (Fig. 3).During the 20th century, nine floods during the period 1913-1927 exceeded a discharge of 350 m 3 s −1 , the largest one in 20 December 1916 reached a discharge of 841 m 3 s −1 .An epigraphic mark of this flood is placed at a column at the entrance patio of the Farmer's Lodge, together with a lower water level mark corresponding to the flood of 27 March 1924 (discharge estimate of 700 m 3 s −1 ), the latter receding on 2 April 1924.However, according to our documentary data and modelling, the largest floods during the 20th century occurred in 25 January 1941, with an estimated discharge of 1100 m 3 s −1 , and on 8 March 1947, with a recorded discharge of 930 m 3 s −1 .

Climatic, environmental and human drivers of flood variability
The temporal and spatial variability of flood patterns are closely related to global change drivers, climate and human activity (e.g.Benito et al., 2008;Hall et al., 2014).A major challenge in non-stationary flood frequency analysis is the identification of major external and internal parameters describing the non-stationarity pattern of the flood series.At annual timescales, moisture influx variability in the Iberian Peninsula and therefore river flow is largely modulated by North Atlantic circulation, and hence by the NAO (Trigo et al., 2004).In the Tagus River flooding is highly related to persistent rainfall (several weeks) due to the successive passage of Atlantic cold fronts over the Iberian Peninsula in winter months (Cortesi et al., 2013).It is, therefore, natural to expect a strong impact of the NAO on flooding in large Atlantic river catchments.Most studies linking NAO index with floods (Benito et al., 2008;Machado et al., 2011;Silva et al., 2012) have used seasonal and monthly correlations leading to lack of significant connections and/or a delayed response between oscillation mode and hydrological event.Floods are produced by rainfall excess with an immediate hydrological response, requiring a daily to monthly resolution (Salgueiro et al., 2013).Figure 1b shows the relationship between the average winter (DJF) NAO index reconstructed by Luterbacher et al. (1999Luterbacher et al. ( , 2002) ) since AD 1500 and the maximum discharges recorded during those months from instrumental (gauge stations) and documentary sources for the Tagus River at Aranjuez.A strong correlation is generally observed between the negative monthly NAO index and flood discharges above 400 m 3 s −1 .The correlation NAO-flood discharge in Aranjuez is not as robust as in other sites downstream (e.g.Alcantara and Talavera; Benito et al., 2008;Salgueiro et al., 2013) since in Aranjuez some autumn floods are connected with Mediterranean cyclogenesis.
The temporal analysis of flood-NAO relationships (Fig. 3) shows that periods of decreasing flood frequency and magnitude are associated with the positive mode of NAO (e.g.1650-1700, 1750-1800).The largest flood discharges occurred at periods with negative NAO phase or at periods with high NAO variability characteristic of strong meridional flow over the Atlantic.However, note that negative winter NAO index values are not always related to the existence of extraordinary floods.
Other factors affecting the stationarity of flood series are land-use changes and reservoir construction in the upper Tagus Basin.Detailed analysis of land-use changes during the historical period is highly complex and out of the scope of this paper.However, the upper Tagus catchment is a mountain region with a relatively low population density, covering a relatively high drainage surface of 9340 km 2 .Therefore, we can assume that local land-use changes can be neglected as major drivers of change in flood patterns.Analysis of the series of annual maximum discharges recorded at gauging stations indicates a decrease in the peaks of ordinary floods over the last 50 yr (Fig. 3).This decrease in peak discharge is mainly due to the construction of dams between the 1950s and 1960s.6 Flood frequency analyses

Flood frequency analysis (FFA) under stationarity assumption
Stationarity tests (Lang et al., 1999(Lang et al., , 2004) ) of the combined documentary and instrumental (under natural regime) flood series above 250 and 350 m 3 s −1 show significant nonstationary conditions over the period 1559-1956.In the case of discharge above 250 m 3 s −1 , the flood record was stationary over the period 1850-1956, whereas for discharges above 350 m 3 s −1 the stationary period comprises 1779-1956.The test performed on the instrumental record  resulted in non-stationarity, due to a sharp decrease in flood discharges after 1957 related to streamflow regulation by reservoirs.In order to include in the analysis the maximum length of historical floods, the FFA was carried out with 32 historical floods exceeding 350 m 3 s −1 over the period 1779-1912, and 45 yr of annual maximum floods (Fig. 4).The LN (two parameters) TCEV and GEV distribution functions were applied to both, systematic data (dashed line) and systematic plus non-systematic (documentary flood) data (solid line) (Fig. 5).Discharge values associated with different return periods (T) that result from the distribution fitting are shown in Table 1.The visual matching as well as the statistical parameter analysis outcome indicate that distribution fitting is good in all cases, with the log-normal distribution providing the more realistic quantile values.In fact, only one flood over the last 230 yr reached a discharge of 1200 m 3 s −1 (1878 flood event).
The incorporation of the historical data into the FFA results in a slight increase of the magnitude of the flood quantiles when compared with the ones obtained with the systematic record before reservoir constructions (Fig. 5; Table 1).Major differences are found when the combined FFA quantiles are compared with annual maximum discharge series under the regulated systematic period.For example, the 0.01 annual probability flood of the combined historical and natural-systematic records is 1450 m 3 s −1 whereas during the regulated systematic period is 600 m 3 s −1 .

Modelling flood data under non-stationary approach
The implementation of the non-stationary model consisted in establishing the type of dependence of the distribution parameters with the associated external covariates (winter NAO index and reservoir index) as a function of time.For the selected log-normal distribution (LN with 2 parameters: µ* and σ *, the mean and the standard deviation of the logarithms, respectively) the modelled changes (linear or smooth depen-  dence) on the distribution parameters are applied to obtain the best-fitted distribution parameters for the discharge above the historical discharges threshold.It should be noticed that, for a LN distribution, the variance of the original random variable is a function of its two parameters.The reservoir index for the Aranjuez gauge station was calculated by Eq. ( 2).As previously indicated, the streamflow from the Tagus Basin headwaters is regulated by three major reservoirs (Entrepeñas, Buendia and Bolarque).Construction of the Bolarque reservoir (capacity of 31 × 10 6 m 3 ) was completed in 1955 and the Entrepeñas-Buendia (2394 × 10 6 m 3 ) were built between 1956-1958 (dams and reservoir database of the Spanish Ministry of Environment; http://sig.magrama.es/snczi/).As shown in Fig. 6, in 1956, the RI has an abrupt change point because of the construction of these dams.Modelling of maximum annual discharge (Q) followed a two-step process.Firstly, the non-stationary models incorporated only the winter (December-March) NAOw index (period with highest correlation with flooding) with two models implemented for the periods 1913-1956(Fig. 7a) and 1957-2008 (Fig. 7b).Secondly, the RI was added as a second covariate for the whole systematic period .
The non-stationary model with NAOw as external covariate over the period 1913-1956 shows a linear dependence on both parameters, the mean and the variance of the logarithms of annual flood peak discharge (Table 2) and therefore on the mean and variance of the floods (Fig. 7a).However, for the period 1957-2008, the linear dependence was only found in relation to the log-Q mean (Table 2, Fig. 7b), indicating the strong incidence of the reservoir regulation on the peaks of annual maximum floods.Also, it is interesting to underline that, for the whole period 1913-2008, the heteroscedasticity of the log-Q data is only described by the climate, and not by the reservoir construction.However, in terms of the original variable, the heteroscedasticity of the floods is described by both the climate and the reservoir construction.
The models adequately describe the changes in the percentiles of the annual maximum flood peaks although, for some periods, a high negative NAOw index does not imply high flood discharges.Moreover, there is a higher variability for the highest percentiles (75th and 95th) whereas a low variability is observed for the 5th and 25th percentiles.Fig-  (1913-1956 and 1957-2011) due to construction of a complex of three reservoirs at the Tagus headwaters.ure 8 shows the observed discharges and the estimated quantiles plotted in relation with the NAOw index.The highest annual discharges and their associated 95th percentiles show a high correlation with negative NAO index for the period with natural or low altered stream regime , although some negative NAO index values resulted in moderate peak discharges.
The modelling of flood frequency including the NAOw and RI as external covariates explains adequately the observed flood variability for the Aranjuez station (Fig. 9).The flood modelling incorporating additionally the RI allowed for a good characterisation of the sharp change on the flood regime associated with the construction of the upstream reservoirs system, which controls about 80 % of catchment surface draining to Aranjuez.These results show the potential of the RI as a covariate that explains the sharp change in the median and the variance, and how to incorporate this effect into the model.and the systematic information , with the NAO index and reservoir index as the explanatory variables of the distribution parameters.

Comparison between stationary and non-stationary models
The modelling of non-stationary series was carried out considering continuous annual maximum discharge records as a variable of response (Y ) to be explained in the model.In the following analysis the focus is on modelling flood frequency and magnitude of censored historical floods for which a minimum discharge threshold can be determined.In the study of historical (documentary) hydrology the most critical point is to establish a quantitative threshold of discharge, which in most cases depends on the human occupation of the floodplain (perception level) that may vary through time.In the case of Aranjuez, documented floods are in relation to the gardens and singular buildings (Royal Palace, Lodge, etc.) settled on the floodplain, and their situation has remained almost invariant over the last 400 yr.The threshold of discharge is therefore related to the bankfull discharge and the minimum discharge required for inundating the floodplain, which has been established as 250 m 3 s −1 .As indicated previously, the relationship between the average winter NAO index, reconstructed by Luterbacher et al. (1999Luterbacher et al. ( , 2002) ) since AD 1500, and the historical flood discharges estimated in our study support the use of NAOw as a covariate to explain nonstationarity through the historical period.Figure 10 shows the results of the estimates of quantiles since AD 1700, assuming the modelling relationships obtained from the systematic record with the natural regime .The comparison of the estimated quantiles with the non-systematic historical data shows that the reconstructed NAO index is a highly significant covariate to describe the temporal variability of historical floods recorded in Aranjuez.Similar results were obtained for the estimated quantiles with the non-stationary model based on the systematic record for the period 1913-2008 (Fig. 10), where both the NAOw index and the RI were used as covariates of the statistical parameters of the parametric distribution.
The results obtained from modelling flood frequency under non-stationary conditions while incorporating external forcing captures more adequately the dispersion of flood values and shows the effect of climate indices modulating the frequency and magnitude of floods (Fig. 10).The historical non-stationary model also captures the presence of multidecadal temporal trends, characteristic of the North Atlantic atmospheric circulation, which are incorporated in the flood frequency model.Figure 11 shows the results of FFA in stationary conditions and non-stationary conditions for an exceedance probability of 0.01 (i.e. return period of 100 yr).It can be seen that non-stationarity models indicate the existence of periods in which flood frequency experienced significant variability (decreases and increases).On the long-term, stationary FFA based on historical data reflects the average discharge for the 100 yr flood although in terms of flood hazards during some periods the stationary 100 yr flood discharge (1450 m 3 s −1 ) has associated a higher probability of occurrence.A clear decrease in flood frequency can be seen during the period 1957-2008, which has been caused by the construction of the upper Tagus reservoirs, with a subsequent decrease in the frequency of occurrence of floods in Aranjuez.

Discussion
Flood hazard analysis is critical for the elaboration of risk maps, and it is the basis of the design of hydraulic structures (e.g.dam spillways, diversion canals, dikes), urban drainage systems, land and urban planning and cross-drainage structures (e.g.bridges and culverts and dips).This probabilistic assessment of hazards and risks are routinely done assuming the stochastic nature of streamflow and their extremes.The reliability of the quantile estimator can be improved either by using the best statistical model or by increasing the amount of information (e.g.Botero and Francés, 2010).The use of historical censored data can increase the information length at the gauge station, providing, as it has been mentioned previously, that for each study case the information error and the underlying stationary hypothesis are explored.
Flood frequency analyses with systematic and additional non-systematic information have been widely demonstrated to reduce the quantile estimate uncertainty (e.g.Stedinger and Cohn, 1986;Francés et al., 1994;O'Connell, 2005).In the analysis of frequency using non-systematic data it is more important to know the number of floods exceeding a certain stage (threshold) over a period of time than the precise value of the historical flood peaks (Viglione et al., 2013).The location of the royal palace and gardens on the River Tagus floodplain (Aranjuez) made it highly likely that any flood causing damages to these royal constructions would have been reported and recorded on administrative documents, providing an accurate account of all floods exceeding the bankfull discharge.The statistical analysis with non-systematic records is suitable to deal with uncertainties related to the precise estimation of flood discharges (Stedinger and Cohn, 1986) particularly when the historical information is used as binomial censored data (i.e.without defining their exact value, just the lower limit of censoring) without losing statistical significance (e.g.Francés et al., 1994).In our FFA the 32 historical floods, exceeding 350 m 3 s −1 over the period AD 1779-1912, have been treated as lower bound information, reducing in this way the potential negative impact of the information error.
In recent decades, land-use change and anthropogenic climate change impacts on hydrology have largely questioned the stationarity hypothesis (Milly et al., 2008), and several non-stationary models have been applied in the study of annual maximum flood series (e.g.Cunderlik and Burn, 2003;Ouarda and El-Aldouni, 2011;López and Francés, 2013).However, there is still a lack of research assessing the reliability of non-stationary flood frequency models to represent hydrological disturbances due to low-frequency climate variability over centennial historical periods.This study presents a unique record of censured flood discharges over a 300 yr period that made it possible to test the weight of temporal variability on statistical parameters under a non-stationary assumption.
In the Aranjuez case study, the Lang stationary test for a discharge threshold of 350 m 3 s −1 was successfully performed covering the historical period  and the instrumental record under natural streamflow regime .The oldest historical period (1557-1778) showed a non-stationary behaviour, mainly related to a reduction on the flood frequency over the period 1630-1710, that climatically corresponds to the minimum Maunder, known by the existence of dry periods and characterised by a high hydrological variance and a decrease in flood activity over the western Iberian Peninsula (Rodrigo et al., 2000).
A detailed analysis of the stationary test shows several periods of accumulated flood events at the limit of the confidence interval due to a reduction of flood frequency (Fig. 4a).A reduction on flood frequency of large floods occurred over the first decades of the 19th century (1810-1850).In contrast, a high flood frequency of large floods was recorded over the second half of the 19th century and beginning of the 20th century.A similar flood reactivation during this period is recorded in other Iberian rivers such as Guadalentín (Benito et al., 2010), Júcar (Francés, 1998), Llobregat (Thorndycraft et al., 2005) and Guadalquivir rivers (Benito et al., 2005).Over the whole record, three of the largest five floods of the Tagus River occurred during the historical period record (1878, 1611 and 1830).The use of this non-systematic data improved the precision of the frequency analysis for the exceedance probability of 0.01 (i.e. return period of 100 yr) for describing long-term average flood risk.The frequency analysis based solely in the systematic gauge record provided lower discharge values, both under the natural flow regime  and to a greater degree during the substantially altered streamflow, post-reservoir regime .In fact, the flood quantile estimates over the systematic period under the non-stationary model (Fig. 9) shows a sharp change in the flood regime after the construction of the reservoir system.
In the western Iberian Peninsula, moisture influx variability and, therefore, river flow are largely modulated by North Atlantic circulation, characterized by the NAO (Trigo et al., 2004).In the Tagus River a strong correlation is generally observed between flood discharges above 400 m 3 s −1 (floods of ∼ 5 yr return interval) and negative monthly NAO index, including both the historic and instrumental periods.However, negative winter NAO index values are not always related to the existence of extraordinary floods.This relationship between flood occurrence and a negative NAO phase provides robust a basis for using the winter NAO index as covariate explaining expected changes on flood stationarity towards the historical period, and validation of the changing discharge for an exceedance probability quantile.The modelling of the dependence on the occurrence rate and the dispersion coef-ficient in relation with the climatic variability during the instrumental period identified the NAO index as an explanatory covariate, allowing for the modelling of the most likely changes of the statistical parameters towards the past.The implementation of a non-stationary model that includes the effects of the North Atlantic circulation mode in relation to climate variability provides an excellent description of the interannual flood variability associated with the 0.01 annual exceedance probability flood (100 yr return period) over the last 300 yr.This variability in the case of the 100 yr flood discharge obtained with the non-stationary model reproduces many of the observed changes on historical floods (with higher flood magnitudes recorded for the periods 1920-1950, 1860-1890 and 1700-1730) as well as other with belowaverage flood magnitudes (e.g.1750-1800).The projection of the NAO index in relation to climate change is still unclear although nearly half of the models predict a positive intensification of the index associated with global change (Osborn, 2004).Moreover, some studies have shown that the association between the NAO circulation mode (NAO index) and local or regional climate variables (rainfall and temperature) has changed over time (Rodó et al., 1997;Goodess and Jones, 2002).
The application of these models as a predictive tool requires the introduction of scenarios as well as an improved modelling of low-frequency atmospheric circulation under anthropogenic climate change (Hall et al., 2014;Merz et al., 2014).Our results indicate that stationary frequency analysis using historical data provides a robust reference value of the "average" probabilistic discharge during the considered period.For instance, the peak flood for an annual exceedance probability of 0.01 during the historical  and the systematic periods  under a stationary model (log-normal distribution) is ∼ 1450 m 3 s −1 whereas under the non-stationary model for the period 1700-1956 the 100 yr flood discharge ranges from a maximum value of 4180 m 3 s −1 to a low of 560 m 3 s −1 (Fig. 11).This interannual variability in probabilistic discharge reflects the high variability on streamflow characteristic of regions with contrasting interannual and interdecadal rainfall amounts, such as the Iberian Peninsula.These changes on annual flood peaks have been also observed in the historical periods with discharges that ranged between 1200 m 3 s −1 in 1878 to a low gauged discharge of 59 m 3 s −1 in 1929.Our analyses indicate that a stationary model based on long-term historical flood records provides reliable average flood discharge quantiles, although the probabilities of exceedance changed from year to year.In the case of hazard-sensitive infrastructures, the hypothesis of a non-stationary model should be integrated in their design and implementation, once it is able to address a more accurate estimate regarding the worst case scenario for the safety check flood.

Conclusions
Documentary records of the Tagus River in Aranjuez (central Spain) provide evidence of continuous and homogenous data of extreme floods over the last 300 yr.Documented reference to flooded and non-flooded sites (gardens, streets) and buildings (Royal Lodge) were successfully assessed and computed into flood discharges.The universality of this methodology depends however on the quality and continuity of the documented registers, as well as on the characteristics of the human settlement (spatial variation along time, burying processes, destruction or changes of landmarks) and the geomorphological dynamics of the area.
The discharge estimates show evidence that during the historical flood record , flood events of greater magnitude than the ones recorded in the gauging station  were documented.The documentary record contains descriptions of 59 historical floods with the largest event produced in 1878 with an estimated discharge of 1200 m 3 s −1 .Documentary evidence illustrates the high sensitivity of flood magnitude and frequency to the climatic variability during the last millennium.Unusually high flood frequencies were registered in the periods 1580-1610, 1730-1760, 1800-1810, 1870-1920, and 1940-1950.The construction of large reservoirs (Entrepeñas-Buendia and Bolarque dams), in full operation since 1957, changed dramatically the flood regime of the upper Tagus River.During the post-reservoir period (1957-present) the maximum peak flow discharge reached 280 m 3 s −1 (in 1964), with most of the events recording values below 160 m 3 s −1 .
The identification of flood clusters during the historical record uncover the potential problems of assuming a stationarity in the flood frequency analysis.In this study, two different models were used for the flood frequency analysis of the Aranjuez flood data sets: (1) the stationary model, in which the distribution parameters do not depend on covariates, i.e. the parameters are constant in time; and (2) the time-varying model (non-stationary model) that incorporates external covariates, where the distribution parameters can vary as a function of climate variability (North Atlantic Oscillation index during winter months; NAOw) and the streamflow regulation by dams (reservoir index; RI).Regarding the stationary flood frequency analysis, a stationarity test associated with different discharge thresholds (perception levels) was efficient to detect anomalous flood periods/records.The Aranjuez flood record (historical and systematic) for discharges exceeding 350 m 3 s −1 showed problems of stationarity with two cutting dates in AD 1778 and 1957, the former associated with an increase in flood frequency and the second as a result of a decrease in floods after the upstream reservoir system construction.The stationary test (Lang's test;Lang et al., 1999) was successfully passed over the period 1779-1956, covering part of the historical non-systematic record  and gauged record under natural streamflow conditions .
Flood frequency analysis using documentary data (plus gauged record) improved the estimates of the probabilities of rare floods (return intervals of 100 yr and higher).Moreover, historical information increases and tests the representation of outliers in the systematic data.A flood frequency analysis with 32 historical floods (> 350 m 3 s −1 ) and 45 annual maximum flood data from the gauged record provided the best fitting results to a log-normal distribution, with a discharge of 1450 m 3 s −1 associated with a 0.01 % annual exceedance probability (average recurrence interval of 100 yr).Our analysis shows a major impact of river regulation by large reservoirs in the occurrence of large peak flows in the upper Tagus River basin, with major impacts on the stationarity of the flood series.
Although, in the long-run, the FFA (flood frequency analysis) under stationary models provides good average values, historical flood records indicate that secular flood occurrence related to climate variability has changed over the last 300 yr, during which flood quantiles have fluctuated through time at decadal time spans.The modelling of flood time series under a non-stationary model incorporating the NAO index and reservoir index as external covariates demonstrate the existence of a high decadal variability on flood percentiles reflecting the internal climatic variability in the flood occurrence.According to non-stationary models, the peak flood associated with a 0.01 annual exceedance probability may range between 4180 and 560 m 3 s −1 .
In terms of the hydrological effects of climate change, future global circulation model projections incorporate too much uncertainty to accurately specify expected patterns of precipitation change or to estimate expected changes in the frequencies and magnitudes of extreme storm and flood events.Predictions can be improved by incorporating longterm flood records (several millennia) in climatic modelling and statistical analysis.The study of temporal variability of past climate-flood links can establish short-and long-term relationships at regional levels and in areas within different climatic zones.Regional studies of long-term climate-flood links involve calibrating the relationships, detecting trends (where they exist) and revising estimates of return periods.This integration will greatly improve our understanding of flood frequency and magnitude in the context of changing climates where the assumption of stationarity (implicit in most current flood risk models) is being questioned.The analysis of flood quantiles under stationary and non-stationary models over long-term historical periods is critical for the analysis of flood hazards of sensible infrastructure (e.g.dams, nuclear power plants) in the context of climate change.

Figure 1 .
Figure 1.The Tagus River basin in the Iberian Peninsula.(a) Location of the historical and instrumental records used; (b) relationship between the annual flood peak discharge at Aranjuez and the corresponding month value of the NAO index.Discharges over 400 m 3 s −1 are in most cases related to a negative NAO phase.

Figure 2 .
Figure 2. Above: Royal Palace of Aranjuez with a view of the Tagus River and gardens painted by Antonio Joli, ca.1753.Bottom: longitudinal profile of the channel bottom along the study reach and calculated water surface profiles fitting historical flood water indicators described as inundated during the 1866 flood.An estimated discharge of 650 m 3 s −1 was assigned to this flood.Note the water surface profile for baseflow conditions.

Figure 3 .
Figure 3. Reconstructed flood record in Aranjuez during the historical period (1557-1912) and annual maximum flood series during the instrumental periods (1913-2008), with vertical bars showing the peak discharge.NAO index values associated with flood dates are indicated (historical NAO index afterLuterbacher et al., 1999Luterbacher et al., , 2002)).

Figure 4 .
Figure 4. (a) Poisson test on the time flood process in the Aranjuez record for the period 1780-1956, for floods exceeding a discharge threshold of 350 m 3 s −1 .Note that the flood series (central line connecting points) remain within the 95% tolerance interval (outside enveloped curves).(b) Organisation of historical and systematic flood data, for flood frequency analysis.

Figure 5 .
Figure 5. Log-normal (LN), Generalised extreme value (GEV) and two-component extreme-value (TCEV) distributions fitted with (1) systematic data and (2) systematic and historical flood data.Dots represent the plotting positions for the systematic record (white dots) and historical plus systematic records (black dots).

Figure 6 .
Figure 6.Variations of the RI in the Aranjuez gauge station.Note the sharp change in RI value produced in 1957.

Figure 7 .
Figure 7. Modelling results of the variation in the marginal distributions of the annual maximum streamflow with the NAO index as the explanatory variable of the distribution parameters.(a) Quantile plot for the Aranjuez gauge in the period 1913-1956; (b) quantile plot in the period 1957-2008.

Figure 8 .Figure 9 .
Figure 8.Estimated quantiles for the model with the NAO index as the explanatory variable plotted against NAO index.(a) Plot for the Aranjuez gauge in the period 1913-1956; (b) plot of the period 1957-2011.

Figure 10 .
Figure10.Estimated quantiles for the model using systematic information under a natural regimen and the systematic information, with the NAO index and reservoir index as the explanatory variables of the distribution parameters.

Figure 11 .
Figure 11.Quantile estimates of the annual maximum floods with 0.01 annual exceedance probability based on stationary and nonstationary models.

Table 1 .
Flood quantiles for different return periods in Aranjuez obtained using log-normal (LN), two-component extreme-value (TCEV) and generalised extreme-value (GEV) distributions fitted to firstly the annual maximum systematic records only, then the combined systematic and censored historical floods for discharges above 350 m 3 s −1 .