A framework for assessing flood frequency based on climate projection information

Abstract. Flood safety is of the utmost concern for water resources management agencies charged with operating and maintaining reservoir systems. Risk evaluations guide design of infrastructure alterations or lead to potential changes in operations. Changes in climate may change the risk due to floods and therefore decisions to alter infrastructure with a life span of decades or longer may benefit from the use of climate projections as opposed to use of only historical observations. This manuscript presents a set of methods meant to support flood frequency evaluation based on current downscaled climate projections and the potential implications of changing flood risk on how evaluations are made. Methods are demonstrated in four case study basins: the Boise River above Lucky Peak Dam, the San Joaquin River above Friant Dam, the James River above Jamestown Dam, and the Gunnison River above Blue Mesa Dam. The analytical design includes three core elements: (1) a rationale for selecting climate projections to represent available climate projections; (2) generation of runoff projections consistent with climate projections using a process-based hydrologic model and temporal disaggregation of monthly downscaled climate projections into 6-h weather forcings required by the hydrologic model; and (3) analysis of flood frequency distributions based on runoff projection results. In addition to demonstrating the methodology, this paper also presents method choices under each analytical element, and the resulting implications to how flood frequencies are evaluated. The methods used reproduce the antecedent calibration period well. The approach results in a unidirectional shift in modeled flood magnitudes. The comparison between an expanding retrospective (current


Introduction
The design and safety assessment of large dams in the western United States requires estimates of flood frequency. Flood frequency relates the magnitude of floods to their probabilities of occurrence. Flood frequencies are often described by return period. The return period concept, as commonly communicated in the community and in practice, is that a 100-year flood is an event that should happen, on average, once every hundred years. A stricter interpretation is that a 100-year flood is a flood believed to have a probability of 0.01 of being equaled or exceeded in any one year. While we do not wish to challenge the current paradigm of communication of flood hazard, it is reasonable to question what a return period means within a nonstationary system (Sivapalan and Samuel, 2009). The nonstationarity concern and the current paradigm are not mutually exclusive if it is acknowledged that a flood with a 100-year return period is not a constant value. Or, working within our preferred strict interpretation of the flood return period, a flood with an exceedance probability of 0.01 this year may have a different exceedance probability in the future.
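The strict interpretation can be made concrete with a short calculation. The sketch below is illustrative only: it assumes that years are independent, and the function name, the 30-year horizon, and the drifting probabilities are hypothetical choices that do not come from the study.

```python
def prob_at_least_one_exceedance(annual_probs):
    """Probability that a flood level is equaled or exceeded at least once,
    given one annual exceedance probability per year (years assumed independent)."""
    p_none = 1.0
    for p in annual_probs:
        p_none *= 1.0 - p
    return 1.0 - p_none


# Stationary case: an exceedance probability of 0.01 in each of 30 years.
stationary = prob_at_least_one_exceedance([0.01] * 30)

# Hypothetical nonstationary case: the same flood magnitude becomes more
# likely over time (0.01 rising linearly to 0.02; illustrative values only).
drifting = prob_at_least_one_exceedance([0.01 + 0.01 * i / 29 for i in range(30)])
```

Under stationarity the chance of at least one exceedance in 30 years is about 0.26; any upward drift in the annual probability raises that value, which is why a fixed "100-year" label can understate risk over an infrastructure lifetime.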
Risk-based decisions often use the probability of occurrence of a flood with a specified magnitude and the consequences of that event. If the consequences are deemed unacceptable, modifications of infrastructure or changes in operations may be necessary to alleviate the risk. In a changing climate, and given that flood risk estimates are derived from the observed record of the past, it may be prudent to include information that describes not only the flood potential of the past but also that of the future.
D. A. Raff et al.: A framework for assessing flood frequency
Flood frequency estimation within the United States government has as its fundamental doctrine Bulletin 17-B, published by the Interagency Advisory Committee on Water Data (IACWD, 1982). Released in 1982, Bulletin 17-B provides guidance for observational data treatment and parameter estimation for flood frequency distributions (IACWD, 1982). The general methodology of Bulletin 17-B is to gather a time series of annual maximum floods at the location at which the user wishes to determine the flood magnitude versus frequency relationship. In addition to the gage information, any historical information about large floods that may pre-date the gage record is also used. Fundamentally, Bulletin 17-B assumes that flood potential can be described by a three-parameter log-Pearson distribution (log-Pearson III distribution). It is known that of the three parameters (mean, standard deviation, and skew) the skew is most sensitive to the information set. Bulletin 17-B, therefore, provides guidance on estimating the skew based upon a weighted sum of the skew of the collected data set and regional estimates of skew. All of the information is then used to fit a log-Pearson III distribution. This fitted distribution then describes the probability of an annual maximum flood being exceeded. The process used in Bulletin 17-B makes several assumptions, among them that the annual maximum floods are independent samples from a general population. The idea that information from the past is a good indication of current or future potential is called a stationarity assumption. This stationarity assumption may be less valid when the climate is changing and the flood potential at a location may be changing along with the climate. Nearly three decades ago it was acknowledged within Bulletin 17-B that little attention was given to the subject of non-stationarity and that future studies were needed. Within Bulletin 17-B, although the word non-stationarity is not used explicitly, the concept is alluded to among the eight recommendations for future studies. It was identified that there is a need to account for watersheds altered by urbanization whose flood potential may not be reflected by the observed and historical data at the location (pp. 27-28, IACWD, 1982).
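The moment-based fit described above can be sketched as follows. This is a simplified illustration, not the full Bulletin 17-B procedure: it omits outlier tests and historical adjustments, it uses the Wilson-Hilferty approximation for the Pearson III frequency factor instead of tabulated values, and the `station_weight` argument is a hypothetical stand-in for the Bulletin's skew weighting.

```python
import math
from statistics import NormalDist, mean, stdev


def sample_skew(values):
    """Bias-adjusted sample skew coefficient."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n * sum((v - m) ** 3 for v in values) / ((n - 1) * (n - 2) * s ** 3)


def lp3_quantile(annual_max_flows, exceedance_prob,
                 regional_skew=None, station_weight=1.0):
    """Flood magnitude with the given annual exceedance probability,
    from a log-Pearson III fit by moments of the log10 flows."""
    logs = [math.log10(q) for q in annual_max_flows]
    m, s, g = mean(logs), stdev(logs), sample_skew(logs)
    if regional_skew is not None:
        # Assumed simple weighting; the Bulletin derives weights from error variances.
        g = station_weight * g + (1.0 - station_weight) * regional_skew
    z = NormalDist().inv_cdf(1.0 - exceedance_prob)
    if abs(g) < 1e-6:
        k = z  # zero skew reduces to the log-normal case
    else:
        # Wilson-Hilferty approximation of the Pearson III frequency factor.
        k = (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)
    return 10.0 ** (m + k * s)
```

For example, `lp3_quantile(flows, 0.01)` would return the estimated magnitude of the flood with a 0.01 annual exceedance probability for a list `flows` of annual maxima.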
The vast majority of research since the release of Bulletin 17-B has focused on improved treatment of historical data from instrumental records and/or historical and paleoflood proxies. There are studies that have looked at more efficient selection of distributional parameters (e.g., Lane and Cohn, 1996; O'Connell et al., 2002; Stedinger et al., 1988) that perform better when compared to Bulletin 17-B (e.g., Cohn et al., 1997; England, 2003). There are studies that have improved estimates of uncertainty (e.g., Cohn et al., 2001; O'Connell et al., 2002) and those that avoid a distributional assumption (e.g., O'Connell, 2005). These methods have improved the treatment of historical data and as a collection have made vast strides forward in fitting distributions to data that have been collected for a specific site when an assumption of stationarity is supportable. There has also been work attempting to expand our assumptions of known variability through the incorporation of paleoflood data, which may have come from a different climate than that observed or known in the historical record (e.g., Frances et al., 1994; O'Connell, 1999).
It is acknowledged that the assumption of historical climate stationarity has always been questionable in flood frequency estimation. This assumption would appear to become even more questionable in the future (e.g., Milly et al., 2008), particularly as a warming climate may lead to changes in precipitation regime, seasonality, and other characteristics relevant to floods. Some studies have focused on how shifts in climate might lead to changes in extremes of precipitation and temperature (Manabe et al., 1980; Easterling et al., 2000). The Intergovernmental Panel on Climate Change recently reported in its fourth assessment report that the climate is warming and that it is very likely that heavy precipitation events will increase in frequency over most areas (IPCC, 2007a). Evidence has been mounting that precipitation rates and patterns have been changing in the observational record (e.g., Alexander et al., 2006; Kunkel et al., 2003; Kanae et al., 2004). There are further studies that have used climate projections to show shifts in future precipitation patterns (e.g., Easterling et al., 2000; Emori et al., 2005). Changes in extreme precipitation patterns have consequences for flood patterns. Hamlet and Lettenmaier (2007) showed that there were changes in flood risks during the observed warming of the 20th century.
There have been process-based approaches to consider changes to floods and flood frequencies. Using GCM projections, Hirabayashi et al. (2008) simulated daily discharges for projected climate and showed changes in precipitation and flood patterns that they identified as an increased frequency of flooding over many regions except North America and central to western Eurasia. Cameron et al. (2000) used GCM simulations to drive the TOPMODEL hydrology model to show the changes to probability of occurrence of specific discharges for the gauged, upland Wye catchment in Wales, UK. Sivapalan and Samuel (2009) illustrate an approach to use process-based methods to estimate flood frequencies that do not rely upon stationarity assumptions for three catchments in Australia.
From a statistical perspective, methods have been proposed to address how a changing climate might be related to flood frequency estimation. For example, Griffis and Stedinger (2007) proposed to use observed trends in log-Pearson III parameter estimates as a function of time to estimate distributional parameters that may be useful to describe the flood potential into the future. To evaluate the physical response to a changing climate there remains limited guidance on how to incorporate climate projection data into a framework for flood hazard assessment. In this manuscript methods to address this gap in planning capabilities are introduced. The methods described are meant to identify whether climate change may influence risk assessments made using Bulletin 17-B. The methods are designed to reveal flood frequency consistent with climate projection information at a user-specified future period. Methods are demonstrated in four case study basins: the Boise River above Lucky Peak Dam, the San Joaquin River above Friant Dam, the James River above Jamestown Dam, and the Gunnison River above Blue Mesa Dam. The analytical design includes three core elements: (1) a rationale for selecting climate projections with the objective of representing the breadth of climate projection information available; (2) generation of runoff projections consistent with climate projections, using a process-based hydrologic model and temporal disaggregation of monthly downscaled climate projections into submonthly weather forcings required by the hydrologic model; and (3) analysis of flood frequency distributions based on runoff projection results.

Data sources and methods
The following methods describe the steps used in this manuscript to estimate flood frequency from climate projections. Four river basins were considered (Sect. 2.1). The focus is to evaluate the physical response to climate projections through the use of a hydrologic tool (Sect. 2.2). The general methodology described below is to use GCM projections of temperature and precipitation to drive a hydrology model. The GCM projections are at spatial and temporal scales incompatible with modeling flood flows, so spatial and temporal downscaling methods are employed.
For each of the four river basins, a subset of 9 climate projections of temperature and precipitation was chosen from a candidate pool of 112 at each of three lookahead periods (2011-2040, 2041-2070, and 2071-2099) (Sects. 2.3 and 2.4). For each of the climate projections a weather generation scheme was employed to temporally disaggregate the monthly climate projection values into the 6-h values (Sect. 2.5) necessary to drive the hydrologic tool. The weather generation approach has a random component and therefore was applied 10 times per projection. The number of random generations, ten, was chosen somewhat arbitrarily through assessment of the differences among the random generations. The hydrologic simulations result in a set of flows from which the annual maximum discharges were compiled. The simulated annual maximum discharges were then considered in the context of estimating flood risk through flood frequency analyses (Sect. 2.6).

Basin selection
The effect of a changing climate may vary geographically. Therefore, to determine the suitability of the methods proposed, a geographically diverse set of examples was desired. Four geographically diverse reservoir watersheds were considered, each with a dam that was either built by the Bureau of Reclamation (BOR) or significantly influences Reclamation operations. The four basins are the Boise River above Lucky Peak Dam, the James River above Jamestown Dam, the Gunnison River above Blue Mesa Dam, and the San Joaquin River above Friant Dam (Fig. 1). Each of these basins has a strong snowmelt component to flood generation. Most often these basins have annual maximum discharges that are snowmelt-only or rain-on-snowmelt events. It is expected, however, that there are different geographic and other conditions that affect flood response to climate change (Hamlet and Lettenmaier, 2007).
Lucky Peak Dam is located at 43
Friant Dam is located near 37°00′ N, 119°42′ W on the San Joaquin River about 19 miles from Fresno, California. The dam impounds Millerton Lake. The drainage area at Friant Dam is approximately 4120 km² (1591 mi²). Drainage is from the western slope of the Sierra Nevada range. Elevations in the basin range from 170 m at the dam to just under 4260 m along the crest of the Sierra Nevada range. The terrain in the basin may be described as rugged forest. Mean annual precipitation over the basin is approximately 900 mm and varies significantly with elevation.

Hydrologic tool
The hydrologic model used in this study is the National Weather Service River Forecast System (NWSRFS) Sacramento Soil Moisture Accounting (SAC-SMA) Model (Burnash et al., 1971). The SAC-SMA Model is coupled to the Anderson Snow Model of snow accumulation and ablation (Anderson, 1973). This model was chosen because it is the operational model of the National Weather Service and calibrated models for all of the chosen basins were available. SAC-SMA consists of two upper and three lower soil moisture storage zones. The two upper zones are free and tension water storage, and the three lower zones are a primary free, a supplemental free, and a tension water storage zone (Burnash, 1995). The snow accumulation and ablation model computes a freezing height to distribute rain and snow by elevation. The NWSRFS SAC-SMA Model has a long history of operational use within United States federal agencies. Although this study looks at characterization of future climate, calibration sets based on an antecedent period were not altered for the future period. Further discussion of this assumption can be found in Sect. 3.4.

Climate projections data
In order to evaluate the potential changes in flood frequency from projected climate changes it is desired to have a current set of climate projections that encapsulate the projected future climate variability. In preparation for the IPCC's fourth assessment report (IPCC, 2007a, b), climate model output was collected as the World Climate Research Programme's (WCRP's) Coupled Model Intercomparison Project phase 3 (CMIP3) multi-model dataset (Meehl et al., 2007). The CMIP3 archive houses projections made from climate models that include coupled atmospheric and ocean general circulation models (GCMs). Each of these models simulates global response to various future greenhouse gas (GHG) emissions paths (IPCC, 2000). The GHG emission paths were defined beginning from the end of the 20th century, from lower to higher emission rates of carbon dioxide into the atmosphere, as a subjective function of global technological and economic developments during the 21st century.
The grid resolutions of the CMIP3 models are O(10²) km, which is not appropriate for evaluating the impacts on local flood hydrology, where information at less than O(10) km is needed. For example, the hydrologic models used in this study are used to support operational flood forecasting objectives and have been applied at resolutions of O(10) km to appropriately represent flood-relevant hydrologic processes. Spatial downscaling is used to bridge this gap in spatial resolution. There are two broad types of downscaling available, dynamical and statistical. A statistical downscaling approach was selected for use here as it provides information that is well tested and documented, is automated and efficient enough to permit downscaling of many projections, is able to produce output that statistically matches historical observations, and is capable of producing spatially and temporally continuous fine-scale precipitation and temperature information at the basins modeled (Brekke et al., 2009). Potential drawbacks of a statistical downscaling approach include its inability to identify or model local climate effects and land-surface feedbacks (Salathe et al., 2007). There is a further inherent assumption of stationarity: that the statistical relationships observed between fine-scale observations of the past and the GCMs will continue in the future. Despite these drawbacks the statistical approach has been shown to provide capabilities competitive with dynamical methods (Wood et al., 2004). There are multiple methods to accomplish statistical downscaling (e.g., Wood et al., 2002, 2004; Maurer and Hidalgo, 2008).
For this study, the focus was having access to a large set of consistently downscaled climate projections over each of the case study basins. Using these criteria, a decision was made to use data from the "Statistically Downscaled WCRP CMIP3 Climate Projections" archive, http://gdo-dcp.ucllnl.org/downscaled_cmip3_projections/ (Maurer et al., 2007). These data were developed using a statistical downscaling technique called bias-correction spatial disaggregation (Wood et al., 2002, 2004) that has been used to support numerous investigations on projected hydrologic impacts under climate change (Payne et al., 2004; Van Rheenan et al., 2004; Maurer, 2007; Christensen and Lettenmaier, 2007; Anderson et al., 2008; Brekke et al., 2009). The data archive includes downscaled versions of 112 CMIP3 projections of simulated monthly climate from 1950-2099 at 1/8° spatial resolution.
All 112 projections were obtained for the latitude-longitude coordinate of the dam for the purposes of projection selection described in Sect. 2.4 and subsequently over the entire basin in support of the weather generation methods described in Sect. 2.5. The particular projections are available at the archive described above. The 112 projections were numbered first by model in ascending alphabetical order, second by emissions path in ascending alphabetical order, and finally by model run in ascending numeric order. For example, the projections are labeled <model>.<run>.<path> in the archive, and the projections numbered here #1 through #3 are therefore bccr_bcm2_0.1.sresa1b, bccr_bcm2_0.1.sresa2, and bccr_bcm2_0.1.sresb1, respectively.
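The numbering rule can be expressed as a sort with a three-part key. The sketch below is illustrative; it assumes that labels follow the pattern seen in the three examples (model, then run, then emissions path, separated by periods, with underscores inside the model name), and the function name is hypothetical.

```python
def number_projections(labels):
    """Number projections by model (alphabetical), then emissions path
    (alphabetical), then run (numeric), starting at 1."""
    def sort_key(label):
        # Assumed label layout: "<model>.<run>.<path>", e.g. "bccr_bcm2_0.1.sresa1b".
        model, run, path = label.rsplit(".", 2)
        return (model, path, int(run))

    ordered = sorted(labels, key=sort_key)
    return {number: label for number, label in enumerate(ordered, start=1)}


numbering = number_projections([
    "bccr_bcm2_0.1.sresb1",
    "bccr_bcm2_0.1.sresa1b",
    "bccr_bcm2_0.1.sresa2",
])
```

Applied to these three labels, the mapping assigns #1 to the sresa1b run, #2 to sresa2, and #3 to sresb1, in agreement with the example in the text.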

Projection selection
The purpose of using projected climate and considering more than a single projection is to portray that the future climate is not known and to consider the variability with respect to temperature and precipitation changes and lookahead periods. Ideally, to estimate flood risk at some point in the future, one could assign a probability distribution to the expectation of temperature and precipitation. For example, one approach could be to use all 112 projections, treated as equally likely, as an ensemble representation of projected climates. This approach has the advantage of not requiring assignment of probabilities to specific projections. However, the results would tend toward the central tendency of the 112 projections, with little weight on the projections that show dramatic shifts and may have the most significant implications for flood risk. A second approach could be to evaluate model performance over the historical period at the locations of interest and use the "best" models for projections. This approach, however, has been shown to be difficult and sensitive to the evaluation metric (Gleckler et al., 2008; Reicher et al., 2008). In addition, it might not reduce the assessed projection uncertainty given the role of emissions scenarios and initialization options in establishing this uncertainty (Brekke et al., 2008).
Here a method was adopted that selects a subset of 9 GCM projections that encapsulate the variability of precipitation and temperature. This information, as opposed to attempting to identify a specific risk, can be used to show the range of risk that may exist. The selected nine projections are allowed to vary by lookahead period. Three lookahead periods were considered: 2011-2040, 2041-2070, and 2071-2099. These periods represent three different decision time frames in which one might change operations or physical infrastructure. A tercile grid is constructed based upon the projected temperature and precipitation relative to the simulated historical antecedent period (Fig. 2). The tercile grid is generated through a Cartesian sectioning between the maximum and minimum changes in precipitation and temperature at the lookahead period relative to the antecedent period. The GCM projections that were geometrically closest to the nine vertices encompassing the array of projected temperature and precipitation shifts were chosen. Projections have internal climate dynamics, and just as there is interdecadal variability in the observed and historical past, the climate models have interdecadal variability in their projected future. The interdecadal variability is not necessarily synchronous among projections, which also do not necessarily share the same dynamics or initial conditions and have other differences. Projections, therefore, depending on when in the future they are examined, may display different relative precipitation and temperature. The relative precipitation and temperature of the 9 selected GCM projections are hence different by lookahead period.
In Fig. 2 the blue lines in the second and third panels represent the locations of the projections from the 2011-2040 lookahead in the 2041-2070 and 2071-2099 lookaheads, respectively. For example, for the Gunnison River above Blue Mesa for the lookahead period of 2011-2040, projection 87 (ncar_ccsm3_0.6.sresa1b) represents the GCM projection geometrically closest to the tercile of smallest precipitation ratio and largest temperature ratio (Fig. 2, first panel). At the 2041-2070 lookahead period projection 87 remains in the upper third of temperature amongst the projections but is in the middle third of precipitation ratios. At the 2071-2099 lookahead period, projection 87 is in the middle third of both precipitation and temperature ratios.
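The selection step can be sketched as a nearest-neighbor search against nine target points. The sketch makes assumptions that the text does not state: the nine targets are taken to be the 3 x 3 lattice at the minimum, midpoint, and maximum of each axis, and distances are computed after rescaling both axes to the range 0 to 1 so that temperature and precipitation changes are comparable. All names are hypothetical.

```python
import math


def nearest_projection(changes, target, t_min, t_span, p_min, p_span):
    """Id of the projection closest to a target given in rescaled coordinates."""
    def distance(item):
        t, p = item[1]
        return math.hypot((t - t_min) / t_span - target[0],
                          (p - p_min) / p_span - target[1])
    return min(changes.items(), key=distance)[0]


def select_spanning_projections(changes):
    """changes maps projection id -> (temperature change, precipitation change)
    for one lookahead period. Returns {(i, j): id} for the nine targets, where
    i and j index low, middle, and high temperature and precipitation change."""
    temps = [t for t, _ in changes.values()]
    precs = [p for _, p in changes.values()]
    t_min, p_min = min(temps), min(precs)
    t_span = (max(temps) - t_min) or 1.0  # guard against a degenerate axis
    p_span = (max(precs) - p_min) or 1.0
    return {
        (i, j): nearest_projection(changes, (i / 2.0, j / 2.0),
                                   t_min, t_span, p_min, p_span)
        for i in range(3) for j in range(3)
    }
```

Because the changes are recomputed for each lookahead period, calling the function once per period naturally allows the selected set to differ between periods. The sketch does not prevent one projection from being chosen for two targets.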

Weather generation
The bias-corrected, spatially downscaled projections in the archive (Sect. 2.3) describe time series of temperature and precipitation conditions on a monthly time step. The SAC-SMA model, as applied in the case study basins, operates on 6-h values of temperature and precipitation. Therefore, a method is necessary to disaggregate monthly average temperature and precipitation values into 6-hourly values to force each basin SAC-SMA model. The general approach was to scale a monthly set of observed 6-hourly values by the ratio of projected temperature and precipitation to the observed monthly average temperature and precipitation within the scaled month (e.g., Maurer, 2007; Reclamation, 2008). This choice of methodology is adopted and/or modified from earlier work (Wood et al., 2002). The technique has been used in other hydrologic impacts studies under climate change where monthly climate projections were temporally disaggregated to develop sub-monthly weather forcings (e.g., Payne et al., 2004; Christensen and Lettenmaier, 2007; Maurer, 2007).
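The scaling step can be illustrated for precipitation as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, the monthly value is treated as a total, and the handling of a completely dry observed month is a choice made here, not one described in the text. Temperature is described in the text as being treated analogously and is not shown.

```python
def scale_precip_to_projected_month(observed_6h_precip, projected_monthly_total):
    """Scale a resampled observed month of 6-h precipitation so that its
    total matches the projected monthly total. Returns (scaled values, ratio)."""
    observed_total = sum(observed_6h_precip)
    if observed_total == 0:
        if projected_monthly_total == 0:
            return list(observed_6h_precip), 1.0
        # A dry observed month cannot be scaled up to a wet projected month.
        raise ValueError("observed month has no precipitation to scale")
    ratio = projected_monthly_total / observed_total
    return [ratio * value for value in observed_6h_precip], ratio


scaled, ratio = scale_precip_to_projected_month([0.0, 2.0, 0.0, 6.0], 12.0)
```

In this small example the observed total is 8 and the projected total is 12, so every 6-h value is multiplied by 1.5. The timing of wet and dry intervals is inherited unchanged from the observed month, and the size of the ratio is what the sampling constraints discussed next are meant to limit.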
Key choices in this temporal disaggregation scheme are the eligibility constraints applied to observed-historical months during the process of resampling. Several past implementations of this scheme have adopted the constraint that the sampled month only needs to be of the same calendar month as the projected month (denoted "one-square" in this manuscript, or 1-sq for short). With this constraint, it is possible that the projected month may be relatively hot and wet while the resampled observed month is relatively cold and dry. This could lead to rather large scaling ratios applied to the historical month's 6-h forcings and call into question whether the adjusted forcings are still plausible in the context of observed historical data. The tails of flood frequency distributions are important, and this opportunity for large scaling ratios can lead to anomalies in the tails of the distributions. A decision was thus made in this study to consider alternative eligibility constraints on resampling in order to limit such scaling ratios.
Two alternative sampling constraints were considered (Table 1). The first alternative sampling constraint is called 4-sq (four-square). It involves subdividing the calibration weather years into four categories: hot-wet, hot-dry, cold-wet, and cold-dry. For example, for basin A, all Januaries from the calibration set of 1967-1997 were collected and the 6-h observed values were aggregated into monthly mean temperature and total precipitation. The median temperature amongst these mean monthly values was then found and used to separate hot and cold Januaries. Then for the hot Januaries the median precipitation value was found and the hot Januaries were divided into hot-dry and hot-wet Januaries. For the cold Januaries the procedure is repeated. The result of the 4-sq method for basin A with 50 years of calibration set data is that there would be 12 or 13 historical Januaries in each of the four categories. Sampling of observed historical months was then constrained so that the categories of sampled and projected months matched in each sampling instance. For example, if projected January 2031 is hot and dry, then the randomly selected 6-h values for scaling must come from the 12 or 13 historical Januaries that have been categorized as such.
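The 4-sq classification can be sketched as two nested median splits. The function and variable names are hypothetical, and the treatment of values exactly equal to a median (assigned to the cold or dry side) is an assumption made here.

```python
from statistics import median


def classify_four_square(months):
    """months maps year -> (mean temperature, total precipitation) for one
    calendar month. Returns year -> 'hot-wet', 'hot-dry', 'cold-wet', or 'cold-dry'."""
    t_median = median(t for t, _ in months.values())
    hot = {y: v for y, v in months.items() if v[0] > t_median}
    cold = {y: v for y, v in months.items() if v[0] <= t_median}

    categories = {}
    for temp_label, group in (("hot", hot), ("cold", cold)):
        # The precipitation median is computed within each temperature group.
        p_median = median(p for _, p in group.values())
        for year, (_, precip) in group.items():
            moisture = "wet" if precip > p_median else "dry"
            categories[year] = f"{temp_label}-{moisture}"
    return categories
```

With 50 years of distinct values the temperature split yields 25 hot and 25 cold months, and each of those splits into groups of 12 and 13, consistent with the counts given above. Constrained resampling then draws only from historical years whose category matches that of the projected month.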
The second alternative sampling constraint is called 8-sq (eight-square). It involves the same procedures as the 4-square with a further subdivision by precipitation to result in four categories of precipitation and two categories of temperature. A relaxation was made from the 4-square method, which constrained sampling for a projected month to the same month from the observed historical period. The motivation for further subdividing by precipitation was to further limit the scaling constant that would be necessary to match the aggregate observed 6-h time increments to the projected mean monthly value. The motivation for the relaxation of the monthly sampling constraint was to expand the opportunities for random selection for any projected month.
Because there is a random component to the sampling methodology for temporal downscaling, it was desired to consider the range of variability that this randomness may induce. Therefore, multiple downscaling simulations were done for each projection selected for each lookahead period. The simulation set size was arbitrarily set to 10 simulations of 6-hourly values to be run through the SAC-SMA hydrologic model. To show the variability induced through the temporal downscaling methodology, consider a single projection. For the lookahead period 2011-2040 there are 30 years of modeled results for each simulation. The key variables of interest for further discussion are the annual maximum floods of each year. For a single projection (inmcm3_0.1.a1b), the temporal downscaling random component results in an empirical distribution of the ten simulations that can encompass a relatively wide distribution of annual maximum floods (Fig. 3). Empirical cumulative distribution functions of annual maximum floods during the lookahead period 2011-2040 for one projection set are shown in Fig. 3. The 90% non-exceedance level for this projection and simulation set ranges from approximately 400 to 1200 m³/s. The weather generation approach used to generate the results in Fig. 3 was the 8-sq method.
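The spread among simulations at a given non-exceedance level can be computed from empirical distributions as sketched below. The plotting position used (rank divided by sample size) and the function names are assumptions; the study does not state which plotting position underlies Fig. 3.

```python
def empirical_quantile(annual_maxima, non_exceedance):
    """Smallest value whose empirical CDF (rank / n) reaches the requested level."""
    ordered = sorted(annual_maxima)
    n = len(ordered)
    for rank, value in enumerate(ordered, start=1):
        if rank / n >= non_exceedance:
            return value
    return ordered[-1]


def quantile_range(simulations, non_exceedance=0.9):
    """Minimum and maximum of one non-exceedance level across several
    simulations (each a list of annual maximum floods)."""
    levels = [empirical_quantile(sim, non_exceedance) for sim in simulations]
    return min(levels), max(levels)
```

For one projection, `simulations` would hold ten lists of 30 annual maxima, and `quantile_range(simulations, 0.9)` would return the kind of interval quoted above for the 90% non-exceedance level.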
The National Weather Service considers two methods (station and area-weighted) for calibrating the SAC-SMA models. The station method involves mapping the gridded GCM data to a location in space using bilinear interpolation. This method is used if the corresponding SAC-SMA observed mean-area temperature (MAT) for an elevation zone is represented by a "synthetic" station. This is usually the case where there is significant elevation variation across the basin. Observed temperature data from a network of climate stations are mapped to the "synthetic" station to produce the MAT for a given basin elevation zone. This "synthetic" station location is used to extract the GCM gridded data for a given basin elevation zone. The temperature value is interpolated from the four grid cell centers surrounding the "synthetic" station location. Although there is no single standard by which SAC-SMA models are calibrated, an example can be reviewed in Bissell and Orwig (1995).
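Interpolation from four surrounding cell centers can be sketched as standard bilinear interpolation. The argument names and the example coordinates are hypothetical; only the technique itself is taken from the text.

```python
def bilinear_interpolate(lon, lat, lon0, lon1, lat0, lat1, v00, v10, v01, v11):
    """Interpolate a value at (lon, lat) from four grid cell centers.
    v00 is at (lon0, lat0), v10 at (lon1, lat0),
    v01 is at (lon0, lat1), v11 at (lon1, lat1)."""
    fx = (lon - lon0) / (lon1 - lon0)
    fy = (lat - lat0) / (lat1 - lat0)
    south = v00 * (1.0 - fx) + v10 * fx  # interpolate along the southern edge
    north = v01 * (1.0 - fx) + v11 * fx  # interpolate along the northern edge
    return south * (1.0 - fy) + north * fy


# Hypothetical station midway between four 1/8-degree cell centers.
temperature = bilinear_interpolate(-115.9375, 43.5625,
                                   -116.0, -115.875, 43.5, 43.625,
                                   0.0, 10.0, 20.0, 30.0)
```

A station exactly midway between the four centers receives the simple average of the four values, and a station at a cell center receives that cell's value.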
The area-weighted approach is used in cases where the MAT was developed without the use of a "synthetic" station. This involves intersecting the boundary of the basin elevation zone with the 1/8 degree grid, then deriving a temperature value for the zone by area-weighting the temperature of each grid cell that intersects the zone boundary.
The methods ("synthetic" station vs. MAT) used for calibration of the SAC-SMA model by the National Weather Service were retained when mapping the GCM average temperature data from the 1/8 degree grid to the SAC-SMA basin elevation zones. The designation throughout the rest of the analysis is S for station weighting and AW for area weighting. For example, the weather generation with 8-sq constraints on the Boise River with station weighting is designated S-8sq. The James River basin is the only one of the four evaluated in this manuscript that used AW.
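Area weighting reduces to a weighted mean once the overlap areas are known. In the sketch below the overlap areas are assumed to have been computed beforehand (for example with a GIS intersection); the function name and inputs are hypothetical.

```python
def area_weighted_mean(cell_values, overlap_areas):
    """Zone value from the grid cells intersecting the zone, weighted by the
    area of each cell that lies inside the zone."""
    total_area = sum(overlap_areas)
    if total_area <= 0:
        raise ValueError("zone does not overlap any grid cell")
    weighted_sum = sum(v * a for v, a in zip(cell_values, overlap_areas))
    return weighted_sum / total_area


# Two cells at 10 and 20 degrees, with the second covering three times the area.
zone_temperature = area_weighted_mean([10.0, 20.0], [1.0, 3.0])
```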

Hydrologic hazard assessment
To put information into a context that is used throughout flood hazard assessment and management, the information developed from the simulation model is used to create flood frequency curves. For each projection and each simulation by lookahead there is a modeled annual maximum flood. For each of the three lookahead periods two types of flood frequencies were considered, the expanding retrospective flood frequency and the lookahead flood frequency. The expanding retrospective is the current paradigm for flood frequency. This is how most flood frequencies are calculated, in that all information at a location of interest is considered equally when developing a flood frequency curve. Every year a new observation of an annual maximum discharge is added to an expanding record of floods at that location. For example, using expanding retrospective analysis for a basin that has a period of record from 1950-1990, those forty occurrences of annual maxima would be treated as independent samples from a general population and used to fit a distribution (e.g., log-Pearson III from Bulletin 17-B). If time then proceeds to 2020 there would be 30 additional independent samples (i.e., 1950-2020). This approach relies heavily on the stationarity assumption in that all 70 years are assumed to be independent samples from the same distribution.
The lookahead flood frequency differs from this approach in that it considers only a limited set of floods to estimate a flood frequency curve. This approach is used to account, in part, for non-stationarity. The implicit statement is that floods are representative of a given climate state and samples from a different climate state should not be considered.

D. A. Raff et al.: A framework for assessing flood frequency
For example, for a location that has a period of record from 1950-2020 as before, only the period of 1990-2020 is used to compute the flood frequency. Although the period of 1990-2020 is considered to be stationary when fitting a distribution, the approach assumes that the period of 1950-1990 does not come from this same distribution.
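The distinction between the two record definitions can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the placeholder record are hypothetical, and the year windows are treated as half-open intervals (an assumption made here so that the counts match the 40, 30, and 70 samples quoted in the text).

```python
def expanding_retrospective(record, start_year, end_year):
    """Annual maxima for every year in [start_year, end_year): the full record."""
    return [q for year, q in sorted(record.items()) if start_year <= year < end_year]


def lookahead(record, window_start, end_year):
    """Annual maxima for the most recent window [window_start, end_year) only."""
    return [q for year, q in sorted(record.items()) if window_start <= year < end_year]


# Hypothetical record: one placeholder annual maximum discharge per year.
record = {year: 100.0 + (year - 1950) for year in range(1950, 2020)}

first_fit = expanding_retrospective(record, 1950, 1990)   # record available in 1990
full_record = expanding_retrospective(record, 1950, 2020)  # record available in 2020
recent_only = lookahead(record, 1990, 2020)                # lookahead-style subset

print(len(first_fit), len(full_record), len(recent_only))
```

The only difference between the two approaches at this stage is which years are passed to the distribution fit; the fitting procedure itself is unchanged.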
The expanding retrospective flood frequencies were calculated as follows. For the 2011-2040 future period, a total of 60 samples were used to fit the log-Pearson III distribution. These 60 samples comprised 30 random samples taken from between the 5th and 95th quantiles of the calibration-set length of record for that particular basin and 30 random samples taken from between the 5th and 95th quantiles of the 2011-2040 simulations. The 60 samples were then fit to a log-Pearson III distribution as described in Bulletin 17-B without any regional skew adjustment. Because the 30 retrospective samples are drawn at random from 4500 possibilities and the 30 samples for the 2011-2040 period from 2700 possibilities, the procedure was performed 100 times to account for some of the sampling variability. For the expanding retrospective approach for the 2041-2070 lookahead period, the same procedure was followed as for the 2011-2040 period, with an additional 30 random samples taken from between the 5th and 95th quantiles of the 30 years by 9 GCM projections by 10 simulations between 2041 and 2070, for a total of 90 samples. Likewise there were a total of 120 samples for the 2071-2099 lookahead period.
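The sampling step for the 2011-2040 expanding retrospective case might look like the following sketch. The pools are filled with synthetic placeholder values rather than model output, the function names are hypothetical, and the Weibull plotting position i/(n+1) used to define the 5th to 95th quantile band is an assumption, since the text does not state which plotting position was used.

```python
import random


def trim_to_quantiles(values, lower=0.05, upper=0.95):
    """Keep values whose empirical non-exceedance probability is in [lower, upper].

    Assumption: Weibull plotting position i / (n + 1).
    """
    ordered = sorted(values)
    n = len(ordered)
    return [q for i, q in enumerate(ordered, start=1) if lower <= i / (n + 1) <= upper]


def draw_expanding_sample(pools, per_pool=30, rng=random):
    """Draw per_pool values without replacement from each trimmed pool and combine."""
    sample = []
    for pool in pools:
        sample.extend(rng.sample(trim_to_quantiles(pool), per_pool))
    return sample


rng = random.Random(0)
# Placeholder pools sized as in the text: 4500 retrospective and 2700 future maxima.
retrospective = [rng.lognormvariate(5.0, 0.5) for _ in range(4500)]
period_2011_2040 = [rng.lognormvariate(5.1, 0.5) for _ in range(2700)]

sample_60 = draw_expanding_sample([retrospective, period_2011_2040], rng=rng)
print(len(sample_60))
```

Adding a third and fourth pool to the list passed to `draw_expanding_sample` gives the 90 and 120 sample cases for the later lookahead periods.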
The lookahead flood frequencies were calculated as follows. For the 2011-2040 lookahead period, 30 random samples were taken from between the 5th and 95th quantiles of the 30 years by 9 GCM projections by 10 simulations between 2011 and 2040. The difference between this set and the expanding retrospective set is that this set omits the retrospective period, so the sample to which a distribution is fit is smaller. These 30 samples were then fit to a log-Pearson III distribution as described in Bulletin 17-B without any regional skew adjustment. This was repeated 100 times. For the 2041-2070 lookahead period, 30 random samples were taken from between the 5th and 95th quantiles of the 30 years by 9 GCM projections by 10 simulations between 2041 and 2070. Again, this is a total of 30 samples which were then fit to a log-Pearson III distribution. The procedure for the 2071-2099 lookahead was similar. For each of the three lookahead periods, the lookahead flood frequency approach therefore uses 30 years of data to fit the log-Pearson III distribution. For the 2071-2099 lookahead period the difference between the lookahead approach (30 years of data) and the expanding retrospective approach (120 years of data) is 90 years of data. The implications of reducing the sample size in an attempt to better characterize the population from which the floods are being observed are examined later in the document.
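The fit-and-repeat step can be illustrated with a short sketch. It is not the Bulletin 17-B procedure itself: it fits by the method of moments on the base-10 logarithms using the station skew only (no regional skew, as in the text), and it replaces the Bulletin 17-B frequency factor tables with the Wilson-Hilferty approximation, which is an assumption of this example. The pool is a synthetic placeholder assumed to be already trimmed to the 5th to 95th quantile band, and all names are hypothetical.

```python
import math
import random
import statistics


def lp3_quantile(flows, return_period):
    """Method-of-moments log-Pearson III estimate of the T-year flood."""
    logs = [math.log10(q) for q in flows]
    n = len(logs)
    mean = statistics.fmean(logs)
    std = statistics.stdev(logs)
    # Station skew coefficient of the logarithms.
    skew = n * sum((x - mean) ** 3 for x in logs) / ((n - 1) * (n - 2) * std ** 3)
    z = statistics.NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    if abs(skew) < 1e-6:
        k = z  # zero skew reduces to the log-normal case
    else:
        # Wilson-Hilferty approximation to the Pearson III frequency factor.
        k = (2.0 / skew) * ((1.0 + skew * z / 6.0 - skew ** 2 / 36.0) ** 3 - 1.0)
    return 10.0 ** (mean + k * std)


def repeated_estimates(pool, n_samples=30, n_repeats=100, return_period=100, seed=0):
    """Refit the distribution to n_repeats random draws of n_samples values."""
    rng = random.Random(seed)
    return [lp3_quantile(rng.sample(pool, n_samples), return_period)
            for _ in range(n_repeats)]


pool_rng = random.Random(1)
pool = [pool_rng.lognormvariate(5.0, 0.5) for _ in range(2700)]  # placeholder pool

estimates = repeated_estimates(pool)
deciles = statistics.quantiles(estimates, n=10)
q10, median, q90 = deciles[0], statistics.median(estimates), deciles[-1]
print(round(q10), round(median), round(q90))
```

The spread between `q10` and `q90` gives a rough sense of how much the 100-year estimate depends on which 30 values happen to be drawn.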

Weather generation for evaluating flood potential
Three sampling constraints (variations), described in Sect. 2.5, were considered for a weather generation method based on historical resampling. It was desired to use only one of the three variations to evaluate changes to flood frequency. A comparison of performance was therefore made between the weather generation variations. The benchmark chosen for the comparison was the calibration set used for each of the NWSRFS SAC-SMA models. As described in Sect. 2.4, for each of the three lookahead periods nine projections were selected based on their variation in temperature and precipitation. With ten simulations available per projection and a fifty-year calibration set, there are a combined 13 500 annual maximum discharges per location over the antecedent period. An evaluation was made of how many of the simulations and weather generation sequences encapsulated the observed historical annual maximum flows at each basin. It was determined that the 8-sq weather generation constraint variation produced empirical distribution functions that best encapsulated the observed historical flows for all basins. Each panel of Fig. 4 shows the empirical distribution functions of the annual maximum discharges for the Boise River Basin at Lucky Peak Dam: first for the simulated historical period using observed historical weather (blue line), and then for the simulated retrospective period, defined as 1951-1997, that overlaps the observed historical weather. For this period there are 270 grey lines representing the 10 simulations of each of the 9 GCM projections for each of the three lookahead periods, simulated over the retrospective period. For both the S-1sq and the S-4sq clouds only a couple of simulation sets encompass the observed historical values over the 0.4 to approximately 0.6 probability range. The S-8sq cloud has an approximately equal number of simulations greater and less than the observed historical values. However, it is the less frequent floods (probabilities of occurrence less than 0.01) that are the most influential in estimating flood risk. It is assumed that the ability to simulate the entire probability range of flows is a good indicator of the ability to simulate more extreme events. The ability to reproduce the calibration set empirical distribution is evidence that the methods described in Sect. 2 perform adequately over this range of exceedances. Because the 8sq constraint variation consistently encompasses the observed historical values better than the 1sq and 4sq variations, only the 8sq variation is evaluated for the remainder of the analysis.
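One way to quantify whether an ensemble "encompasses" the observed distribution is to count, at selected non-exceedance probabilities, how many ensemble members lie above and how many lie at or below the observed quantile. The sketch below is an illustration under stated assumptions: the text does not specify the metric used, the quantile interpolation rule is a choice made here, and the data are small made-up lists.

```python
def empirical_quantile(values, prob):
    """Quantile by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = prob * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def envelope_balance(observed, ensemble, probs):
    """Map each probability to (members above, members at or below) the observed quantile."""
    balance = {}
    for p in probs:
        obs_q = empirical_quantile(observed, p)
        above = sum(empirical_quantile(member, p) > obs_q for member in ensemble)
        balance[p] = (above, len(ensemble) - above)
    return balance


# Made-up annual maxima: one observed series and three simulated members.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
ensemble = [
    [0.0, 1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0, 6.0],
    [1.0, 2.0, 3.0, 4.0, 5.0],
]

result = envelope_balance(observed, ensemble, [0.5])
print(result)  # {0.5: (1, 2)}
```

A roughly even split between the two counts across the probability range corresponds to the behavior described for the S-8sq cloud.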
Figure 4 also shows that in the tail of the distribution the projection-driven floods begin to deviate significantly from the calibration set at a non-exceedance probability of approximately 0.9 to 0.95 for all weather generation variations. This was also the case for the other basins. This deviation may result from scaling anomalies that persist despite the constraints applied. Despite the selection criteria for weather generation described previously, it is still possible for a monthly precipitation value to be scaled by a value that causes an anomalous result. Thus, a further assumption was made for the distribution fitting described and analyzed in Sect. 3.3: only those flows with non-exceedance probabilities between 0.05 and 0.95 would be used to fit log-Pearson III distributions.

Evaluation of flood potential by lookahead
For each lookahead period there are nine projections with 10 simulations each, for a total of 90 simulated projections. Each simulated projection covers a thirty-year period, for a total of 2700 simulated years per lookahead. To evaluate potential changes in flood potential using the 8sq weather generation constraint, the empirical distribution functions by lookahead period were compared. All 2700 simulated annual maximum values were pooled to create a single empirical distribution function for each of the three lookahead periods. Figure 5 shows the empirical distribution functions for each of the four basins included in this study for each lookahead period as well as the retrospective period. The four basins differ in their simulated responses but also share some similarities.
The Boise River Basin shows an increase in annual maximum flood values with time for essentially all probabilities of occurrence. The San Joaquin River Basin has virtually identical annual maximum flood values through time for non-exceedance probabilities below approximately 0.30 and increasing annual maximum flood values with time for non-exceedance probabilities above 0.30. The Gunnison River Basin shows a decrease in annual maximum flood values for non-exceedance probabilities up to approximately 0.70 and an increase in annual maximum flood values with lookahead time for non-exceedance probabilities greater than 0.70. The James River Basin shows virtually no change in annual maximum flood values for non-exceedance probabilities up to approximately 0.45, with an increase in annual maximum flood values with time for non-exceedance probabilities greater than 0.45. In all four basins, the flood magnitudes at the upper end of the distributions, those with non-exceedance probabilities greater than 70%, increase with time. As previously discussed, it is the most infrequent floods that often define the flood hazard and risk at a location, and all four of the basins have simulated values that show an increase in annual maximum flood values for rare events. The following section discusses the implications of this result in the context of characterizing flood risk through the methods currently utilized in practice. Further research is needed to determine the basin or climate characteristics that drive the differences and similarities among the basins and their flood potentials.
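The pooling step can be sketched as follows. The nested list layout (projection, then simulation, then year), the function name, and the placeholder values are all assumptions made for illustration; the simple i/n empirical probability is also a choice made here rather than one stated in the text.

```python
def pooled_ecdf(simulations):
    """Pool every annual maximum and return (value, non-exceedance probability) pairs.

    simulations[p][s] is the list of annual maxima for projection p, simulation s.
    """
    pooled = sorted(q for projection in simulations for run in projection for q in run)
    n = len(pooled)
    return [(q, i / n) for i, q in enumerate(pooled, start=1)]


# Placeholder values: 9 projections x 10 simulations x 30 years.
simulations = [
    [[float(1 + p * 1000 + s * 30 + y) for y in range(30)] for s in range(10)]
    for p in range(9)
]

ecdf = pooled_ecdf(simulations)
print(len(ecdf), ecdf[0], ecdf[-1])
```

Building one such pooled function per lookahead period, plus one for the retrospective period, gives the set of curves that are compared against each other for a basin.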

Expanding retrospective vs. lookahead flood frequency evaluation
As described in Sect. 2.6, the most common method to estimate flood risk is to use an expanding retrospective analysis. A second method was also described that considers only the most recent time period to evaluate flood risk. In Sect. 3.2 it was shown that for the four basins the simulated future flood potential can deviate to varying degrees from that of the retrospective period. The expectation is therefore that the flood frequency estimates will also differ by lookahead period. For example, the Boise River Basin shows an increase in annual maximum floods for each of the lookahead periods, yet the expanding retrospective approach to evaluating the flood frequency in 2099 will blend all years leading up to 2099, despite the fact that the 2071-2099 period itself does not appear to share much in common with 1951-1997.
Figure 6 shows the expanding retrospective vs. lookahead approach to flood frequency for each of the basins and each of the lookahead periods. The solid blue lines in each of the plots represent the median flood frequency curve from the 100 flood frequency curves developed using the methods described in Sect. 2.6 for the expanding retrospective approach. The dashed blue lines represent the flood frequency curves that had the 10th and 90th quantile 100-year return period values from the random selections. The colored solid and dashed lines have corresponding meanings for the lookahead flood frequency approach. Figure 6 shows clear differences in the flood frequency estimates depending on whether the expanding retrospective approach or the lookahead approach was employed. The implications for the 100-year flood are now discussed. For all locations and all lookahead periods the expanding retrospective approach results in a lower estimate of the 100-year flood than the lookahead approach (Table 2). The percent differences in the 100-year estimates vary by lookahead period and by basin. For the 2011-2040 lookahead period the smallest percent difference is 4% in the Boise River Basin and the largest percent difference is 17% in the San Joaquin and James River Basins. For the 2041-2070 lookahead period the percent differences range from 8% in the Boise River Basin to 28% in the James River Basin. For the 2071-2099 lookahead period the smallest percent difference is 8% for the Boise River Basin and the largest is 32% for the James River Basin. The implication of this result is that characterizing the flood frequency with the current method of fitting log-Pearson III distributions to the full record may result in a biased underestimate of the true flood potential. This is an intuitive result given that the empirical distribution functions for each of the locations show a trend toward bigger floods. The expanding retrospective approach to characterizing the floods continues to give equal weight to floods that occurred during an entirely different climatology.
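For concreteness, a percent difference of this kind could be computed as in the sketch below. The text does not state which estimate is used as the denominator; taking the expanding retrospective estimate as the reference is an assumption, and the discharge values are hypothetical rather than taken from Table 2.

```python
def percent_difference(expanding_estimate, lookahead_estimate):
    """Percent by which the lookahead estimate exceeds the expanding retrospective one.

    Assumption: the expanding retrospective estimate is the reference value.
    """
    return 100.0 * (lookahead_estimate - expanding_estimate) / expanding_estimate


# Hypothetical 100-year discharges in m3/s, not values from the paper.
example = percent_difference(1000.0, 1170.0)
print(example)  # 17.0
```

A positive value means the lookahead approach gives the larger 100-year flood, which is the direction reported for every basin and lookahead period.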
Perhaps a more important implication arises in the context of designing for some lookahead period. Consider a flood frequency estimate made in 2041 for a structure with a life span until 2099, for each of the four basins analyzed in this manuscript. The current methodology would be the expanding retrospective approach over the retrospective period 1951-2041. If the flood potential is increasing through time, however, then at the end of the structure's life span in 2099 the flood potential may be very different from that of 1951-2041. So consider a comparison of the expanding retrospective approach for the 2011-2040 lookahead period, as described in Table 2, and the lookahead approach for 2071-2099. The differences for the four basins are 11%, 52%, 23%, and 45% for the Gunnison, San Joaquin, Boise, and James River Basins, respectively. Therefore, the design would underestimate the flood with a given risk by between 11% and 52%, depending on the basin, over the life span of the project.

Uncertainties
This manuscript presents methods to quantitatively describe the flood potential and flood frequency using climate projections. The results, analyses, and discussion within this manuscript are all subject to the uncertainties associated with the data and methods employed. The uncertainties involved in the greenhouse gas emissions used to generate the climate projections are not fully known. The climate projections are from state-of-the-art global circulation models that have an ability to simulate the past, but the models' ability to characterize the future is uncertain. There are also uncertainties associated with the bias-corrected spatial downscaling methodology employed for spatial disaggregation. The assumption that the relationship between the fine grid scale and climate model outputs remains the same in the future is a stationarity assumption, though it is a less strict stationarity assumption than assuming the entire flood frequency curve is stationary in time. The weather generation methods also have uncertainties associated with them and rely on an assumption that future weather patterns will be similar to those observed historically, only warmer or cooler, wetter or drier. Despite all of these uncertainties and assumptions, the methods presented attempt to encompass the state-of-the-art knowledge and ability to simulate future runoff from climate projections.
The hydrology model used, SAC-SMA, has parameterized land surface schemes that are calibrated to past events. It was assumed in this study that these calibrations are reasonable for the future period, and they were kept constant. This approach is first order in that we do not account for developing model physics and parameters. The approach is well supported in the literature on assessing hydrologic impacts through offline hydrology models as opposed to hydrology models embedded within climate models (e.g., Miller et al., 2003; Mauer, 2007; Christensen and Lettenmaier, 2007; Purkey et al., 2007). For example, although one of the inputs into the SAC-SMA model is potential evapotranspiration (PET) and PET may be altered in a changing climate, the value was not altered as part of this study. This approach is justified by Miller et al. (2003), who showed that sensitivity to PET with projected temperature changes was relatively small.
It is also necessary to reiterate that the climate models operate at a spatial scale that is inconsistent with the generation of flood flows. We have thus relied upon the climate models for representations of temperature and precipitation. We then rely upon spatial and temporal downscaling techniques to drive the offline hydrology model. Figure 4 shows that these methods, over the antecedent period, adequately reproduce the calibration set floods for non-exceedance probabilities between approximately 0.5 and 0.95.
The results also point to the need for further research. Each of the four basins responds differently to the climate projections. For a complete understanding of the flood response to climate change it will be important to determine why the responses differ. Key questions are: Is temperature the dominant driver, or is precipitation, or some combination? It may also be useful to determine what sorts of generalizations may be derived from these basins and applied to similar basins elsewhere.

Conclusions
A set of methods has been developed and presented that allows for the estimation of flood potential given a set of climate projections. These methods are intended to provide an envelope of expected variability of the climate through an equally weighted tercile selection of candidate projections of temperature and precipitation. Through the use of a weather generation scheme and a rainfall-runoff tool, simulated annual maximum discharges are derived for lookahead periods of 2011-2040, 2041-2070, and 2071-2099. These annual maximum discharges are then put into the context of flood frequency analysis. Results indicate that for the four basins analyzed in this study the climate projections result in an increased simulated annual maximum flood potential through time. An expanding retrospective approach to characterizing flood hazard may increasingly underestimate the flood potential as time progresses. Decisions based upon the expanding retrospective approach to characterizing flood frequency could be based upon underestimates of future flood potential. Additional work is required to understand the differences in basin response to the climate forcings, but current results indicate that more consideration should be given to non-stationarity when estimating flood risk.

Fig. 1. Basin selections are the Boise River above Lucky Peak Dam, the James River above Jamestown Dam, the San Joaquin River above Friant Dam, and the Gunnison River above Blue Mesa Dam.

Fig. 2. Projection selection by lookahead period and basin. Numbers represent the spread of individual climate projections. Panels moving from left to right are the three lookahead periods. Colored numbers represent the selected projections for that lookahead period. Colored lines show where previously selected projections are with respect to the spread at future lookahead periods.

Fig. 4. Evaluation of candidate weather generation schemes for the Boise River basin above Lucky Peak Dam. Blue line represents the empirical distribution function (ECDF) for the calibration set 1967-1997 for the SAC-SMA model. Grey lines represent the ensemble of projections for the same 1967-1997 period. The three panels represent the three candidate weather generation schemes.

Fig. 5. Cumulative distributions of annual maximum discharge based on ensemble hydrologic simulation for the periods and basins shown. Retrospective period is defined as 1951-1997 for all basins. CDFs are based on SAC-SMA simulation of GCM-simulated historic climate.

Fig. 6. Flood frequency curves for the locations and lookahead periods as specified. Blue lines represent the expanding retrospective approach and colored lines represent the lookahead approach.

Table 1. Weather generation scenarios (column 1) and their corresponding sampling constraints (columns 2 and 3). The implication of the sampling constraints for the number of random possibilities is shown in column 4.

Table 2. 100-year discharge values for each of the basins and lookahead periods as specified, as well as percent differences between the expanding retrospective approach and the lookahead approach for flood frequency analysis. All values are simulated annual maximum discharges in m³/s rounded to the nearest 10.