An integrated uncertainty and ensemble-based data assimilation approach for improved operational streamflow predictions

The current study proposes an integrated uncertainty and ensemble-based data assimilation framework (ICEA) and evaluates its viability in providing operational streamflow predictions via assimilating snow water equivalent (SWE) data. This step-wise framework applies a parameter uncertainty analysis algorithm (ISURF) to identify the uncertainty structure of sensitive model parameters, which is subsequently formulated into an Ensemble Kalman Filter (EnKF) to generate updated snow states for streamflow prediction. The framework is coupled to the US National Weather Service (NWS) snow and rainfallrunoff models. Its applicability is demonstrated for an operational basin of a western River Forecast Center (RFC) of the NWS. Performance of the framework is evaluated against existing operational baseline (RFC predictions), the standalone ISURF and the stand-alone EnKF. Results indicate that the ensemble-mean prediction of ICEA considerably outperforms predictions from the other three scenarios investigated, particularly in the context of predicting high flows (top 5th percentile). The ICEA streamflow ensemble predictions capture the variability of the observed streamflow well, however the ensemble is not wide enough to consistently contain the range of streamflow observations in the study basin. Our findings indicate that the ICEA has the potential to supplement the current operational (deterministic) forecasting method in terms of providing improved singlevalued (e.g., ensemble mean) streamflow predictions as well as meaningful ensemble predictions.


Introduction
Hydrologic forecasting is of primary importance in the context of flood and drought mitigation as well as for optimal water resources planning and management.The National Weather Service (NWS), the US agency responsible for short-and long-term hydrologic forecasting across the nation, primarily applies a joint snow accumulation and ablation model (i.e., SNOW17; Anderson, 1973) and rainfallrunoff model (i.e., SAC-SMA model; Burnash et al., 1973) for operational streamflow prediction in snow-dominated watersheds.For these watersheds, the accuracy of streamflow prediction largely relies on the accuracy of initial snowpack states (Clark and Hay, 2004;Franz et al., 2008).In the current NWS forecasting system, model initialization is conducted by running the calibrated joint model to the beginning of the forecast period.Subsequently, simulated states are manually updated based on the observed states.These state adjustments are typically not archived and are fairly subjective, varying significantly among forecasters who have different understanding and knowledge of the forecast basins, models, and prediction errors.Forecast accuracy thus varies across forecasters with different levels of forecasting skills (Seo et al., 2003).In addition, potential uncertainties in model forcing and parameters are typically ignored in the adjustment process.The non-systematic and manual procedures of the current model initialization method, as well as the need for more systematic uncertainty assessment motivate the development of a coupled automated uncertainty analysis and data assimilation approach.The developed approach described herein includes the following capabilities: (1) treating errors in model forcing and parameters in a systematic and meaningful way, and (2) automatically merging state measurements into the operational model(s).
Over the past several decades, a variety of uncertainty analysis methods and data assimilation techniques have been developed and reported in the hydrologic literature.Some of the uncertainty analysis methods, among many others, include the generalized likelihood uncertainty estimation (GLUE) (Beven and Binley, 1992), the Bayesian total error analysis (BATEA) (Kavetski et al., 2002), the Integrated Bayesian Uncertainty Estimator (IBUNE) (Ajami et al., 2007), the Framework for Understanding Structural Errors (FUSE) (Clark et al., 2008b;Clark and Kavetski, 2010), and Markov Chain Monte Carlo (MCMC) methods including the Random Walk Metropolis algorithm (Kuczera and Parent, 1998), the Shuffled Complex Evolution Metropolis (SCEM-UA) algorithm (Vrugt et al., 2003), and the Differential Evolution Adaptive Metropolis (DREAM) algorithm (Vrugt et al., 2008).Most of above methods have not been commonly applied in practice, particularly for the NWS operational models SNOW17 and SAC-SMA.This is partly due to their high computational requirements, since efficiency is extremely important in hydrologic forecasting operations (Weerts and El Serafy, 2006).Most recently, He (2010) proposed an Integrated Sensitivity and UnceRtainty analysis Framework (ISURF).The ISURF first applies the Generalized Sensitivity Analysis (GSA) (Hornberger and Spear, 1981) method as a screening tool to identify sensitive model parameters.Subsequently, ISURF applies the DREAM algorithm (Vrugt et al., 2008) to explicitly quantify the uncertainty structure of these identified parameters.This step-wise method significantly reduces the computational load and has been demonstrated to efficiently and adequately identify the uncertainty structure of the SNOW17 and SAC-SMA parameters at a set of study sites with contrasting hydroclimatic conditions (He, 2010;He et al., 2011b).
Data assimilation techniques have been extensively applied in meteorological and ocean sciences.Currently existing data assimilation techniques were primarily developed for numerical weather prediction (Daley, 1993) and more recently applied in hydrologic fields including operational hydrologic forecasting.The earliest data assimilation approach applied to operational hydrologic models (i.e., SAC-SMA) is the extended Kalman filter (EKF) (Kitanidis and Bras, 1980a,b).Another data assimilation technique applied to an operational model (i.e., SAC-SMA) is the variational method (VAR) (Seo et al., 2003(Seo et al., , 2009)).A third data assimilation technique applied in hydrologic modeling is the Ensemble Kalman Filter (EnKF) (Evensen, 1994).The EnKF has been applied to both the SNOW17 model (Slater and Clark, 2006) and SAC-SMA model (Vrugt et al., 2006), as well as other hydrologic models including the TOPNET model (Clark et al., 2008a) and the HBV model (Weerts and El Serafy, 2006).
These data assimilation techniques have their own advantages and disadvantages (Liu and Gupta, 2007).The EKF is inherently associated with instabilities and divergence for models with strong non-linearities (Evensen, 1994).The VAR requires the development of the adjoint model which is complicated and labor-intensive (Margulis and Entekhabi, 2001).In comparison to the EKF and VAR, the EnKF does not require reformulation or modification of the original model, which is important in an operational setting.In addition, the EnKF provides more flexibility in explicitly handling various sources of uncertainty.This distinctive characteristic of the EnKF makes it a potentially robust technique for assimilation of a range of hydrologic data into operational forecasting.There are also other ensemble data assimilation algorithms including the Ensemble Kalman Smoother (e.g., Dunne and Entekhabi, 2006), and particle filter (e.g., Moradkhani et al., 2005a) which have similar strengths as the EnKF.However, previous studies have shown the EnKF is easy to implement and is more computationally efficient (Weerts and El Serafy, 2006).
In spite of the above-mentioned attempts to integrate data assimilation approaches into the NWS operational SNOW17 model (Slater and Clark, 2006) and SAC-SMA model (Kitanidis and Bras, 1980a,b;Seo et al., 2003Seo et al., , 2009)), most of these studies assumed uncertainty-free model parameters and thus did not capitalize on the full capabilities of the EnKF to propagate parameter uncertainty and subsequently reduce forecasting uncertainty.Recent studies (e.g., Moradkhani et al., 2005a,b;Vrugt et al., 2006;Su et al., 2011) explored the capability of data assimilation techniques (i.e., EnKF and/or particle filter) in sequentially updating model parameters and states at measurement times.However, the traditional premise in operational hydrologic forecasting is that model parameters are time-invariant.Parameter uncertainty can be addressed by defining an ensemble set of parameters which approximately covers the uncertain parameter ranges.This can be achieved by perturbing parameters in the same way as the model forcing (Margulis et al., 2002).
The objectives of the current study are two-fold: (1) to explicitly quantify uncertainty in model parameters; and (2) to develop an ensemble-based data assimilation technique configured with derived parameter uncertainty and evaluate the applicability of this technique in improving NWS operational streamflow predictions.To achieve our objectives, we propose an Integrated unCertainty and Ensemble-based data Assimilation (ICEA) approach for the coupled SNOW17/SAC-SMA model of the NWS.The ICEA has two components, the ISURF framework (He, 2010) and an EnKF framework.The ISURF is applied to identify sensitive model parameters and the uncertainty structure of these parameters.Based on the parameter uncertainty structure identified along with preset uncertainty structure for model forcing, an EnKF framework is developed for the SNOW17 model to generate rain plus snowmelt (defined here as rain falling on bare ground plus melt out of the snowpack) via assimilation of snow water  Anderson (1973) and He et al. (2011a,b) 2 Study area, datasets, and models

NWS operational models
The SNOW17 is a lumped process-based model that simulates snow accumulation and ablation processes (Anderson, 1973).The model requires mean areal precipitation (MAP) and mean air temperature (MAT) as inputs and simulates SWE and a rain plus snowmelt timeseries.Snow is modeled as a single layer in the model.The heat storage of the snowpack, liquid water retention and transmission, and snowmelt are computed using empirical functions, an areal depletion curve (ADC), and 11 parameters (Table 1).The model applies a snow correction factor (SCF (−)) to account for precipitation gage catch deficiencies.The actual snowfall input to the model, P s (mm), is computed as: where P (mm) represents the observed precipitation (MAP in this study); f s (−) is fraction of snow in the precipitation and is determined as: where T a ( • C) is the observed air temperature (MAT in this study); PXTEMP is the threshold temperature to distinguish snowfall from rainfall in precipitation.Parameters SCF and PXTEMP jointly determine the snowfall input to the model for given precipitation and air temperature forcing.
The SAC-SMA model is a saturation excess rainfall-runoff model that simulates infiltration, percolation, soil moisture storage, drainage, and evapotranspiration processes (Burnash et al., 1973).Inputs to the model typically include MAP (or rain plus snowmelt when used in junction with the SNOW17 model) and potential evapotranspiration (PET).Outputs are estimated evapotranspiration and a basin-averaged runoff depth.A separate unit hydrograph is then applied to convert the runoff depth to streamflow.The model has 16 parameters, 13 of which must be estimated and three are generally set to default values (Hogue et al., 2000(Hogue et al., , 2006;;Ajami et al., 2004).

Study area and datasets
The current study focuses on the North Fork of the America River Basin (NFARB) located on the western side of the Sierra Nevada Mountains in northern California (Fig. 1).The basin is one of the study locations for the Distributed Model Intercomparison Project (DMIP) of the NWS (Koren et al., 2004;Reed et al., 2004;Smith et al., 2004).The NFARB drains into Lake Folsom with an area of 868 km 2 .Elevation in the NFARB ranges from about 200 m at the basin outlet to about 2870 m at the highest boundary.The basin is characterized by deep winter snowpack (November to February) resulting from orographic precipitation processes.The snowpack melts out in the spring (March to June).The long-term annual precipitation and runoff are 1514 mm and 837 mm, respectively.The vegetation is elevation-dependent, with thin alpine tundra forest, dense mixed coniferous forest, and grassland chaparral and woodland species in the upper, medium, and lower elevations of the basin, respectively (Shamir and Georgakakos, 2006).
The snowline at the end of the accumulation period is estimated at around 1500 m.In the NWS modeling system, the NFARB is delineated into the upper sub-basin and lower sub-basin using the snowline as the divide.The upper subbasin accounts for 37 % of the total basin area.The upper and lower sub-basins receive an annual average precipitation of about 1643 mm and 1397 mm, respectively.The long-term annual average temperatures for the upper and lower subbasins are approximately 5.7 • C and 10.7 • C, respectively.There are three snow stations which provide daily SWE observations in the upper sub-basin (Fig. 1, Table 1).One of them (CS) is a SNOw TELemtry (SNOTEL) site maintained by the National Water and Climate Center (NRCS).The other two are maintained by the California Data Exchange Center (CDEC).Additional climatic and geographic characteristics of the upper and lower sub-basins are presented in Fig. 2.
Model forcing data, including the historical MAP, MAT, and PET for both upper and lower sub-basins, are avail-  (WY 1979(WY -2002)), training period (WY 1979(WY -1984)), and prediction period (WY 1991(WY -1996)).Mean annual temperature at various percentiles over the lower (a) and upper (b) subbasins are highlighted, as well as the annual areal precipitation at various percentiles over the lower (c) and upper (d) sub-basins.
able at a 6-hourly time step from water year (WY) 1979-2002 (hereafter refer to as the "whole period").The daily streamflow discharge data at the basin outlet (recoded at USGS gage # 11427000) and daily SWE at the snow stations are also available from WY 1979-2002and WY 1991-2002, respectively.For demonstration purposes, a 6-year training period (WY 1979(WY -1984) ) is selected to identify parameter uncertainty.An equivalent length of period is selected as the prediction period (WY 1991(WY -1996)), when the SWE observations are assimilated to produce streamflow predictions.The MAT and MAP during both training and prediction periods generally resemble their counterparts during the whole period, however, MAT during the prediction period is slightly higher and MAP during the training period is higher at both sub-basins (Fig. 2).Further details on the hydroclimatic characteristics of the study basin (NFARB) during the prediction period are provided in Table 3.

Integrated uncertainty and Ensemble-based data Assimilation (ICEA)
The ICEA consists of a parameter uncertainty analysis component, ISURF, and a data assimilation component, the EnKF.These two components are described as follows.

Uncertainty analysis
The ISURF utilizes the GSA (Hornberger and Spear, 1981) as an initial step to screen and identify sensitive parameters.
The GSA first samples the parameter space and then classifies the sampled parameters into behavioral and nonbehavioral categories according to the performance of the model configured with these parameters.The behavioral parameter sets are then divided into an amount of equally sized groups.Cumulative distributions of each parameter within each group are calculated and the Kolmogorov-Smirnov (KS) test is applied to quantify the closeness between these distributions.The resulting KS value (from 0 to 1) is the maximum vertical distance observed between these distributions, with higher value indicating higher parameter sensitivity.A KS value greater than 0.25 is utilized as a threshold for parameter sensitivity following He et al. (2011b).More detailed explanation and application of the GSA appears in He et al. (2011b).
The ISURF employs the DREAM algorithm for subsequent uncertainty assessment of sensitive model parameters.The DREAM was recently introduced by Vrugt et al. (2008) to estimate optimal parameter values and their underlying posterior probability density.This is accomplished by minimizing the sum of squared residuals between modelpredicted and observed variables.In DREAM, a preset number of Markov Chains (a chain refers to a vector containing model parameters considered) are simultaneously run in parallel.For each chain, a candidate vector is generated by taking a discrete proposal distribution containing a fixed multiple of the difference between randomly chosen chains.The Metropolis ratio is used to decide whether to accept the candidate point or not.The convergence of a DREAM run is monitored with the R statistic of Gelman and Rubin (1992).The reader is referred to Vrugt et al. (2008) for detailed descriptions on DREAM.In the current study, we use standard algorithm settings for both the GSA and DREAM applied by He et al. (2011b).

Ensemble Kalman filter (EnKF)
The EnKF, first introduced by Evensen (1994), is a Monte Carlo approach which approximates a Bayesian updating scheme to determine the evolution of model states over time (between measurements) and updating of these states at measurement times.Detailed description of EnKF can be found in Evensen (1994Evensen ( , 2003) ) and Reichle et al. (2002).Basically, the EnKF involves a recursive propagation and updating process.The propagation step produces an ensemble of model state outputs between measurements based on an ensemble of model inputs as follows: where A is the model operator investigated (SNOW17 in this case) which is subject to time-varying forcing (µ(t)) and time-invariant parameters (θ); y represents model states (Table 1) and y 0 denotes initial states; j (∈ [1, 2, ..., N]), N designates ensemble size) is a single ensemble member.
The probability distributions of model forcing and parameters (from which samples µ j (t) and θ j are generated) need to be specified a priori and are further discussed in Sect. 3.3.3.
The ensemble of model states is systematically updated when the new measurement is available.This update largely relies on the differences between actual and predicted measurements, as illustrated below, where superscripts "−" and "+" denote the state estimates before and after the update, respectively, at a specific measurement time (t i ); z i is the i-th measurement (areal SWE in this case); ω i is a random error term generated by the EnKF and has the same characteristics of the systematic measurement errors associated with z i ; M is a measurement operator (linear in this case) mapping model states to the measurement variable(s); and K is the Kalman gain determined from the ensemble statistics and the measurement error specified.Definition of measurement error is presented in Sect.3.3.Figure 3 illustrates how EnKF is applied in ICEA.

Implementation of ICEA and proposed scenarios
Given that most snow events occur above the snowline (i.e., within the upper sub-basin), the current study focus on assimilation of SWE observations from the upper sub-basin.Model parameters (SNOW17 and SAC-SMA) used for the lower sub-basin remain unchanged from those applied in RFC operational forecasting.The two components of ICEA, the ISURF and the EnKF, are applied in the training period and prediction period, respectively.The general application procedure is illustrated in Fig. 4 and described as follows (specific details explained in Sect.4).
-Step 1: The ISURF is applied to the joint SNOW17/SAC-SMA model in the upper sub-basin to identify sensitive model parameters, their optimal values, and their uncertainty structures (i.e., distribution type and characteristics) during the training period.
-Step 2: The determined uncertainty structure of sensitive SNOW17 parameters, along with preset forcing uncertainty and areal SWE data uncertainty (Sects.3.3 and 3.4), is embedded into the EnKF formulation.The EnKF is subsequently employed to assimilate the daily areal SWE data (during the prediction period) to update SNOW17 model states (Table 1) for the upper subbasin.The assimilation frequency is set to be one week to mimic the operational environment.
-  as input to the SAC-SMA model to produce streamflow predictions for the upper sub-basin.The joint SNOW17/SAC-SMA model for the lower sub-basin is run to generate streamflow for the lower sub-basin.The streamflow obtained for both upper and lower subbasins is combined according to the size (area) of each sub-basin, as is the case in operations.The aggregated flow is routed using the operational unit hydrograph, yielding streamflow predictions at the NFARB outlet.
Four scenarios are considered in our study.The first two scenarios produce single-valued streamflow predictions.The last two involve EnKF applications and thus produce an ensemble of streamflow predictions.The first scenario (entitled "RFC") directly adopts model parameters used in RFC operations.The second scenario (named "ISURF") differs from the first one in that it uses ISURF-derived parameter set with the maximum likelihood (both SNOW17 and SAC-SMA) for the upper sub-basin.The third scenario (entitled "EnKF") applies a stand-alone EnKF technique configured with preset forcing uncertainty and limited information (i.e., the type of posterior distribution) on parameter uncertainty derived from the ISURF.The fourth scenario (entitled "ICEA") differs from the third one in that it uses both the posterior distribution and optimal posterior parameter ranges derived from the ISURF as inputs to the EnKF.For comparison to the first two deterministic scenarios, the ensemble mean is estimated in the last two scenarios.It is worth noting that, in all scenarios, streamflow (at 6-hourly time step) generated from both the training and prediction periods at the basin outlet are aggregated to the daily time step in order to compare with daily streamflow observations from the USGS gauge.

Areal SWE data
Snow information, including SWE, is routinely recorded by both in situ observational networks (e.g., the SNOTEL in the western US) and remote sensing platforms (e.g., satellites).However, in situ snow stations are generally sparse and not sufficient to provide accurate areal SWE information.For instance, in the western US there are over 1700 snow stations which provide SWE observations.Nevertheless, they are still not adequate to resolve the variability of SWE at the basin scale (Bales et al., 2006).There are studies which produce areal SWE estimates from point SWE observations either through interpolation (Fassnacht et al., 2003) or binary tree methods (Molotch and Bales, 2005); however, these studies are limited to specific regions and are less readily applied over broad areas.These conversion methods have also not been applied in operations.There are also remotely sensed SWE data available (e.g., from AMSR-E).However, the data suffers from poor spatial resolution (e.g., AMSR-E at 25 km resolution) and poor accuracy (e.g., Andreadis and Lettenmaier, 2006;De Lannoy et al., 2009).There are also remotely sensed snow cover area (SCA) data available (e.g., from MODIS), which can be transformed to SWE via a snow depletion curve (Andreadis and Lettenmaier, 2006;Durand et al., 2008a,b;Su et al., 2008Su et al., , 2010;;Zaitchik and Rodell, 2009;Thirel et al., 2011).However, the SCA data is error prone (e.g., MODIS SCA data is highly impacted by cloud cover).Furthermore, the relationship between SCA and SWE varies across different hydroclimatic and geographic regimes.Additionally, incorporation of SCA information into hydrologic models may not necessarily lead to improved streamflow simulations (Clark et al., 2006).
Parsimonious methods are typically used to derive areal SWE from in situ SWE measurements in operations.For instance, the NWS RFCs in the western US typically use a regression-based technique to reconcile the SNOW17simulated areal SWE and the SWE observations from available snow stations.In the current study, we use a similar regression method.The method utilizes SWE information from both the SNOW17 model and the snow stations but assumes that the areal SWE is a linear combination of point SWE measurements so that it resembles the SNOW17 simulated SWE as much as possible.Specifically, the areal SWE is calculated as a linear combination of the SWE observations from three snow stations.The weight associated with each snow station is determined via a non-negative least-squares algorithm as follows: where SWE t k represents the SWE observations from the k-th snow station (k = 1, 2, 3) at time t (t = 1, 2, ..., T , where T is the total length of prediction period in days); C k is the weight corresponding to the k-th snow station; SWE t model denotes the SNOW17-modeled SWE for the upper sub-basin at time t.By applying the above equation, the weights of three snow stations (C k ) are calculated (Table 2).It should be noted that the final weights do not necessarily add up to be 1.
The measurement error associated with the areal SWE is assumed to be normally distributed white noise with zero mean and a standard deviation of 25 mm.Sensitivity tests at six SNOTEL sites illustrates that this error definition leads to satisfactory ensemble SWE estimates via the EnKF in multiple years with contrasting wetness when using an ensemble size of 100 (He, 2010).In the EnKF applications (the last two scenarios) in this study, the ensemble size is also set to be 100.

Model forcing uncertainty
In previous EnKF applications (e.g., Margulis et al., 2002;Durand and Margulis, 2006;Clark et al., 2008a;Leisenring and Moradkhani, 2011;Wu and Margulis, 2011, among others), model forcing uncertainty is generally considered by treating input variables (i.e., precipitation and air temperature observations) as random variables and perturbing these variables according to predefined error distributions  and characteristics.In operations, however, it may be difficult to determine credible information on error distributions and characteristics that regional forecasters would agree on and implement in their forecasting practices.This study adopts an alternative, but simpler, method to address the uncertainty of SNOW17 model forcing, which can be easily implemented in an operational environment.This method draws upon how precipitation and air temperature forcing are currently utilized within the SNOW17 model to determine snowfall input.Specifically, the SNOW17 model uses the air temperature and the PXTEMP parameter to determine whether precipitation is snowfall or rainfall (Eq.2).Rainfall goes directly to runoff while snowfall first accumulates and then melts out.The snowfall input to the model is then adjusted by a snow correction factor (SCF) (Eq.1).As such, the actual snowfall digested by SNOW17 model is determined by precipitation and air temperature forcing, as well as parameters SCF and PXTMEP.Hence, instead of perturbing precipitation and air temperature timeseries, we determine the uncertainty of parameters SCF and PXTEMP and assume that this implicitly represent the uncertainty in precipitation and air temperature.This is achieved by applying the same method used in defining the uncertainty of these two SNOW17 parameters in one of our previous studies (He et al., 2011a).This method is relatively easier to understand in concept and requires no explicit quantification of the distribution type of precipitation and air temperature.

Evaluation metrics
In addition to the deterministic metrics utilized in this study, including the correlation coefficient (R), Root Mean Squared Error (RMSE), Nash-Sutcliffe Efficiency (NSE), and percent bias (Bias), we apply two metrics to quantity the spread dispersion of the EnKF-derived ensemble streamflow predictions: the Normalized Root Mean Square Error Ratio (NRR) (Anderson, 2002;Moradkhani et al., 2005b) and the 95th Percentile Uncertainty Ratio (UR95) (Hossain and Anagnostou, 2005;Moradkhani et al., 2006).These two metrics are defined as: where N designates the ensemble size (N = 100 in this study); T is the length of the prediction period (in days); NRR is a normalized measure of ensemble dispersion relative to the deviation of the ensemble mean (Anderson, 2002).It is always greater than 0, with a perfect score equivalent to 1.A value of NRR greater than 1 signifies too little spread.A value of NRR between 0 and 1 indicates too much spread.UR95 is a measure of the aggregate variability of the 95th percentile prediction range relative to the observations, generally ranging from 0 to 100 % (Moradkhani et al., 2006).

Parameter uncertainty
The GSA method is applied to derive "behavioral" SNOW17 and SAC-SMA model parameters for the upper sub-basin.Subsequent sensitivity analysis focuses on the behavioral SNOW17 parameter set since only SNOW17 parameter uncertainty is addressed in the data assimilation applications.The analysis shows that two forcing-related SNOW17 parameters, SCF and PXTEMP as well as parameter MFMAX (which determines the non-rain melt rate) and parameter SI (which determines the areal extent of the snow cover) are sensitive (Fig. 5).The KS value of other parameters are consistently lower than 0.2.Sensitivity of SCF and PX-TEMP is expected because they control the snowfall input to the model.Sensitivity of MFMAX is also intuitive because non-rain melt is deemed the dominant melting mechanism in western basins (Franz et al., 2008).The sensitivity of SI reflects the importance of the snow cover area parameter when the SNOW17 model is applied at the areal scale.Only these four sensitive parameters are analyzed in the following uncertainty analysis and data assimilation applications.Other SNOW17 parameters and all SAC-SMA parameters for the upper sub-basin are fixed at the values which provide the best streamflow simulations (highest NSE) during the training period.
The DREAM algorithm is subsequently applied to derive posterior information for the four sensitive parameters.The marginal distributions of parameters SCF, MFMAX, and SI generally follow a similar distribution type, which can be roughly identified as a normal distribution (Fig. 6).Parameter PXTEMP shows no apparent distribution in its optimal range.As such, a uniform distribution is assigned for PX-TEMP.SCF is significantly correlated to MFMAX (with a correlation coefficient of 0.77).This is due to the fact that increasing SCF leads to a deeper snowpack.A deep snowpack results in a higher maximum melt rate (MFMAX) in comparison to a shallower snowpack.MFMAX shows medium correlation to SI.The correlation between other parameter sets is not significant.Despite the existence of the cross-correlation among sensitive parameters, for simplicity, this correlation is not accounted for when formulating the EnKF in the current study (in the last two scenarios considered).However, implementation of the cross-correlation among parameters into the EnKF is being explored in our ongoing work.

Streamflow simulation
The maximum likelihood values of the four sensitive parameters, along with other SNOW17 parameters and SAC-SMA parameters determined from the GSA for the upper sub-basin, are applied to simulate streamflow at the outlet (scenario "ISURF") following the procedure described in Sect.3.2.The outflow is compared to that simulated using the RFC parameter set (scenario "RFC").Observed flow at the outlet is used as the benchmark for comparison.The annual statistics between observed flow and the flow simulated from both scenarios are calculated both on an annual basis as well as for the entire training period (Table 4).
In general, the ISURF parameters outperform the RFC parameters in providing streamflow simulations, not only during the entire training period, but also for each individual year.Particularly, the RFC parameters consistently underestimate streamflow, while the ISURF parameters largely correct these biases.For example, the RFC parameters provide relatively poor streamflow simulations during WY 1979 (e.g., bias of −55.96 % and correlation coefficient of 0.38).In comparison, the ISURF simulation during this year is fairly satisfactory (e.g., with a bias of 1.8 % and a correlation coefficient of 0.77).These results illustrate that there is room to optimize the parameters currently under use by the RFCs and improve model performance.However, it is worth noting that during real-time forecasting, RFC forecasters routinely apply run-time modifications (MODs) in adjusting initial (i.e., initial model states) and boundary (e.g., precipitation forcing) conditions in order to improve the quality of streamflow predictions (Seo et al., 2003).In the current study, no MODs are applied for the RFC simulations.

Streamflow prediction
For the stand-alone EnKF (scenario "EnKF"), identified sensitive SNOW17 parameters (SCF, PXTEMP, MFMAX, and SI) from the upper sub-basin model are perturbed following   the identified distributions (Fig. 6) within the pre-defined feasible parametric ranges (Table 1) to produce parameter ensembles.Note that it is ideal to sample parameters from their joint distribution.However, joint distribution of DREAMderived posterior parameters is generally of a complex form and not straightforward to explicitly derive (He et al., 2011b).The last scenario, ICEA, differs from the stand-alone EnKF only in parameter ranges where the perturbations are conducted.The ICEA scenario uses the optimal parameter ranges identified via the ISURF (i.e., the ranges shown in Fig. 6), which is significantly narrower than the feasible parameter ranges (Table 1).In both scenarios, perturbation of SNOW17 forcing (precipitation and air temperature) is assumed to be implicitly included in perturbing parameters SCF and PXTEMP, as discussed in Sect.3.4.Areal SWE data for the upper sub-basin is assimilated into the SNOW17 model to produce rain plus snowmelt ensembles, which are consequently applied to generate ensemble streamflow predictions at the basin outlet following the general procedure outlined in Sect.3.2.Streamflow prediction is investigated in terms of both deterministic (single-valued) and probabilistic (ensemble) predictions.For the former, streamflow predictions within the prediction period from all four scenarios are compared to the observed streamflow during the same period, using the ensemble mean from the two data assimilation scenarios (i.e., EnKF and ICEA).This comparison focuses on the whole streamflow timeseries as well as the high flows.High flows in each scenario are defined as those greater than the 95th percentile of the observed streamflow timeseries during  the prediction period.The 95th percentile flow and the number of days (with high flows) are 83 m 3 s −1 and 110, respectively.Streamflow predictions from the two data assimilation scenarios are inter-compared from two perspectives.The first involves the ensemble statistics (i.e., NRR and UR95) as well as hydrographs.The second focuses on the predictability of both scenarios for different lead times.Specifically, since the areal SWE is assimilated once per week (once SNOW17 states are updated, the model is run up to seven days before the next update occurs), lead times are considered varying from one day to seven days with a daily interval.Streamflow prediction with a specific lead time, day L (L = 1, 2, ..., 7), is a vector containing predicted flow of the L-th day following each measurement (and update) time during the entire prediction period.Deterministic metrics (including NSE, RMSE, percent bias, and correlation) from all scenarios during the prediction period are calculated (Fig. 7).In comparison to the RFC prediction, the ISURF prediction has a lower bias and RMSE and a higher correlation and NSE.However, when compared to the EnKF predictions, the ISURF prediction is poorer.This reflects the potential of data assimilation approaches for providing improved streamflow predictions over traditional (i.e., the RFC parameter calibration method) and advanced (i.e., ISURF) parameter identification methods.The combination of ISURF and EnKF, the ICEA, shows further improvement over the stand-alone EnKF.The NSE of the ICEA prediction is 2.4 % higher than that of the stand-alone EnKF, while the RMSE value of the ICEA prediction is 7.7 % lower.Compared to the RFC prediction, the ICEA NSE is 14.5 % higher and the RMSE is 27.6 % lower.The superior performance of the ICEA is even more significant in terms of percent bias.The ICEA prediction has a bias of 2.8 %, in comparison to −10.4 % for the stand-alone EnKF.As a reference, the bias of the RFC prediction is −31.6 %, suggestive of significant improvements in flow volume using the data assimilation approaches, especially the ICEA.
The performance of the four scenarios in providing high flow predictions is also investigated (Table 5; Fig. 8).Overall, all scenarios underestimate high flows, while the ICEA (mean) prediction outperforms the other three scenarios and the RFC prediction is least satisfactory (Table 5; Fig. 8).The ISURF predictions are comparable to the EnKF mean prediction (Fig. 8), yet the former has slightly better statistics (Table 5).Comparing the two ensemble scenarios specifically, the ICEA considerably outperforms the EnKF in terms of all four metrics (Table 5).Further investigation shows the average replicate-wise RMSE (averaged RMSE of all individual ensemble prediction traces) values associated with the EnKF and ICEA are 62.90 m 3 s −1 and 53.18 m 3 s −1 , respectively.These observations trace back to the fact that EnKF sample sensitive parameters in wider ranges relative to the ICEA.Part of these derived parameter sets may lead to flow predictions with larger biases.It is thus likely that the (mean) predictive skill of the EnKF is somewhat diluted by those biased predictions.The error bars of the EnKF predictions and ICEA predictions are also presented (Fig. 8c-d), while both the RFC and the ISURF only provide a single-valued prediction.It is evident that the ensemble predictions of both EnKF and ICEAs encompass the highest peak flow.
The dispersion of the ensemble streamflow predictions derived from the two data assimilation applications is also investigated (Table 6).Overall, the stand-alone EnKF ensembles show higher UR95 (25.85 %) when compared to that of the ICEA ensembles (20.58 %), indicating that the 95th uncertainty bound of the EnKF is wider than that of the ICEA (Table 6).At the annual scale, on average, the ensemble spread associated with the EnKF accounts for around 22 % to (WY 1996) to 43 % (WY 1995) of the magnitude of the observations.For the ICEA, the range is from about 3 % (WY 1995) to 34 % (WY 1996).This further indicates that the 95th percentile uncertainty bounds of the ICEA ensembles are relatively narrower.When looking at the entire ensemble, however, the spread of predictions from both scenarios are almost identical; not only during the entire prediction period but also in each individual year.The NRR values are consistently greater than 1, indicating that ensemble predictions of both scenarios have too little spread relative to the deviation of the mean prediction.Additionally, it is evident that the inter-annual variation pattern of NRR is similar for both scenarios.However, the inter-annual variation pattern of UR95 is very different in the two scenarios, suggesting that the stand-alone EnKF and ICEA provide significantly different ensemble predictions though the overall spread of both ensemble sets (NRR) are similar.
To examine the performance of the stand-alone EnKF and ICEA methods at a higher temporal resolution, ensemble streamflow predictions derived from both approaches are investigated at the daily scale against the RFC single-valued predictions as well as observed streamflow.
For demonstration purposes, the results from one year (WY 1995) are presented (Fig. 9).The selection of WY 1995 is based on the facts that (1) it is the wettest year in the prediction period in terms of both annual precipitation and the total annual flow volume; and (2) the maximum peak flow (515.29 m 3 s −1 ) within the prediction period occurs in this year (Table 3).Results indicate that both the EnKF and ICEA ensembles are fairly narrow, which is consistent with previously discussed results (Table 6).Both ensembles capture the primary peak at day 103 (Fig. 9), while this peak is underestimated by the RFC predictions.Both ensembles also contain the secondary peak at day 162.The high flows at days 106 and 213 are not totally encompassed by both ensembles.Performances of the EnKF and ICEA prior to day 230 are almost identical in terms of the ensemble spread and variation pattern.However, between day 230 and day 289, the performances of both approaches are considerably different.First, from day 230 to day 252, in comparison to the ICEA ensemble, the EnKF ensemble is much wider and well encompasses the observations; second, from day 265 to day 289, the ICEA ensemble reasonably captures the variation pattern of streamflow observations in this period, while the EnKF ensemble follows the variation of RFC predictions which deviate from the observed streamflow with a negative bias.
Recall that both the EnKF and ICEA assimilate areal SWE to update SNOW17 model states and thus produce rain plus snowmelt, which serves as input to SAC-SMA model to generate streamflow.To examine the underlying cause of the deviations in the approaches from days 230 to 289, it is necessary to scrutinize the corresponding SWE and rain plus snowmelt predictions.Given the high variability of the rain plus snowmelt time series, only the ensemble-mean values during the same year are plotted (Fig. 10).It is evident that prior to day 230, the mean SWE predictions from both approaches are almost identical, as are the rain plus snowmelt predictions.After day 230, the EnKF predicted SWE declines faster, which is most likely caused by the inconsistent non-rain melt parameter (MFMAX, which controls the non-rain melt rate of the snowpack) applied in both approaches.The enhanced decline in SWE produces higher rain plus snowmelt onward until day 252, which is the period when the EnKF ensemble streamflow predictions  significantly spread.From days 253-264, when there is a significant rainfall event (Fig. 9a), rainfall dominates the rain plus snowmelt.As such, the rain plus snowmelt predictions and thus streamflow prediction during this period again become nearly identical for both approaches (Figs.10b and 9bc).From day 265 until the snow melts out, there are no major rainfall events (Fig. 9a).Snowmelt thus dominates the rain plus snowmelt.From days 278-289, it is evident that the EnKF predicted SWE reduces to zero earlier than ICEA predicted SWE (Fig. 10a).This is most likely due to the fact that the EnKF employs parameter values for MFMAX and SI that produce large non-rain melt in this period, particularly given the fact that the EnKF samples these two parameters in significantly larger ranges (Table 1) than the ICEA (Fig. 6).During the same period, the rain plus snowmelt prediction of the ICEA is thus considerably greater than that of the EnKF (Fig. 10b).As a result, the ICEA predicted streamflow ensembles mimics the observed streamflow much better than the EnKF predicted ensembles in this period (Fig. 9b-c).
The stand-alone EnKF and ICEA predictions at different lead times are also examined.Both deterministic (using ensemble mean prediction) and ensemble statistics are calculated for lead times up to seven days (Fig. 11).The ICEA mean prediction on average outperforms the EnKF mean prediction at all lead times investigated (Fig. 11a-d), except at lead time day 2, where the NSE and correlation values of the ICEA mean prediction are slightly lower (Fig. 11c-d).This is similar to what is observed for the ICEA and EnKF mean predictions over the entire prediction period (Fig. 8).Particularly, the ICEA predictions have considerably lower bias (Fig. 11a) and generally smaller RMSE (Fig. 11b) compared to the EnKF predictions, indicating that ICEA outperforms the EnKF in providing both overall and high flow predictions.At all lead times, the 95th percentile ensemble predictions of the EnKF are wider than that of the ICEA (Fig. 11e), which is also the case when looking at the entire prediction period (Table 6).In contrast, the whole EnKF predicted streamflow ensemble is less overconfident than the ICEA ensemble at several lead times (day 2, day 6, and day 7), while the ensemble is more overconfident at other lead times (Fig. 9f).In general, the performance of both approaches remain fairly stable across all lead times (Fig. 11), while the predictive skill of operational models normally decreases with increasing lead times in real-time hydrologic forecasting.This is due to the fact that the precipitation and forcing data applied in the current study are actually observed values.In real-time forecasting, precipitation and air temperature are predicted future values (from numerical weather models) where accuracy decreases with increasing lead times (e.g., Cloke and Pappenberger, 2009;Wu et al., 2011).

Discussion and conclusions
The current study proposes an integrated uncertainty and ensemble-based data assimilation framework ICEA.This framework, consisting of a parameter uncertainty analysis algorithm (ISURF) and a data assimilation technique (EnKF), systematically addresses uncertainty in model forcing and parameters for the SNOW17 model.The performance of the framework is evaluated against observed streamflow and compared to the performance of three alternative scenarios: the current RFC operational parameters, the stand-alone ISURF, and the stand-alone EnKF.Datasets from a NWS operational basin are applied in the evaluation and comparison.
The key findings of the study include: 1.The automatic RFC prediction is improved by applying a parameter identification tool (i.e., the ISURF) to update model parameters or by employing a data assimilation technique (i.e., the EnKF) which assimilates SWE information to provide updated snow model states, or by using a combination of both (i.e., ICEA).The improvement over the current RFC prediction is most significant for the ICEA (in terms of ensemble-mean prediction), followed by the EnKF (ensemble-mean prediction) and then the ISURF.However, when evaluating only high flows (top 5th percentile), ISURF and EnKF (mean) predictions are comparable to each other and the ICEA (mean) prediction shows the most skill.
2. The 95th percentile streamflow prediction range of the stand-alone EnKF is generally wider than that of the ICEA, which is expected since the uncertainty ranges of the four sensitive SNOW17 parameters employed in the EnKF are wider.However, the spread of the whole ensemble (rather than 95th percentile) associated with the EnKF is comparable to that of the ICEA.The spread, based on the ensemble metric calculated (i.e., NRR), is too narrow relative to the (ensemble) mean prediction.This most likely stems from the definitions of uncertainty in forcing as well as areal SWE measurements.Specifically, precipitation and air temperature uncertainty are implicitly accounted for by the uncertainty of parameters SCF and PXTEMP.While this method is straightforward in concept and easy to implement in operations, it limits the spread of forcing ensemble since theoretically there are feasible bounds for those parameter values.Furthermore, a consistent standard deviation is assigned to the SWE measurement error.Sensitivity tests at the point scale illustrate that this error definition provides satisfactory ensemble results, leading to limited variations to large SWE values (He, 2010).Our ongoing work is investigating an alternative definition, namely, assuming this deviation is proportional to the observed SWE value.
3. Both the stand-alone EnKF and ICEA predicted streamflow ensembles contain peak flows during the prediction period, while the RFC prediction generally underestimate peaks.For the selected (extreme wet) year, both EnKF and ICEA ensembles perform similarly in the snow accumulation period, but dramatically different in the ablation period.This deviation stems from different melt rate values produced by both approaches which ultimately produce different rain plus snowmelt output.The streamflow ensembles predicted by both approaches reasonably capture the variation patterns of the observed streamflow; however, the ensembles are not wide enough to enclose all the observations.This further indicates that the uncertainty defined in the current study can be improved upon.
4. On average, the ICEA mean prediction outperforms the mean prediction of the stand-alone EnKF for all lead times investigated (day 1 to day 7 in with daily interval).However, the EnKF 95th percentile prediction bound is consistently larger than that of the ICEA at all lead times, which is also the case for the entire prediction period as previously discussed.As for the ensemble predictions at different lead times derived from both approaches, their spreads are again too narrow when compared to the (ensemble) mean prediction at each lead time.
Despite its demonstrated advantages, the ICEA can be further improved by addressing several issues in addition to using time-variant error for SWE measurement as previously mentioned.First, it should be highlighted that the SNOW17 model applies air temperature input to calculate melt rate during the snow ablation periods in addition to applying it as a threshold for distinguishing rainfall from snowfall together with model parameter PXTEMP.As such, the uncertainty of parameter PXTEMP only partially mimics uncertainty in air temperature forcing.To investigate the impacts of air temperature measurement error on melt rate, snowmelt, and ultimately streamflow prediction, this error needs to be explicitly defined.This error can be simply assumed to be a systematic error with zero mean and a certain variance, as with previous studies (e.g., Margulis et al., 2002;Durand and Margulis, 2006).Moreover, the uncertainty of parameter SCF derived from ISURF represents only partial uncertainty of precipitation.This is due to the fact that ISURF is applied over the entire training period to produce the uncertainty range and distribution information of SCF.Thus, the uncertainty of SCF is more representative of the average precipitation error in the training period rather than errors associated with individual precipitation events.To more comprehensively account for uncertainty in precipitation, it may be necessary to implicitly consider the error for each precipitation event.This could be achieved by assigned a random multiplier (with a certain distribution) to observed precipitation at each time step following previous studies (e.g., Margulis et al., 2002;Leisenring and Moradhkani, 2011).In addition, it should also be pointed out that model error is not considered in the current study, while model structural error is shown to considerably impact SNOW17 performance and lead to SNOW17-predicted SWE ensemble with significant spread (He et al., 2011a).This error can be addressed by perturbing the predicted SWE using a uniformly distributed and auto-correlated error assumption as explored in previous studies (e.g., Evensen, 1994;Leisenring and Moradkhani, 2011).Addressing this error is promising for resolving the narrow-ensemble issue encountered by the ICEA and standalone EnKF.Furthermore, it is worth noting that the ensemble streamflow predictions are evaluated only in terms of dispersion, while other aspects of ensemble predictions (e.g., reliability and discrimination) also contains meaningful information on the predictions (e.g., Franz et al., 2003;Brown et al., 2010;Demargne et al., 2010;Franz and Hogue, 2011).Our ongoing work, which attempts to enhance the ICEA framework to further improve streamflow predictions including real-time predictions, comprehensively considers the uncertainty in forcing, model structure, and measurement, the cross-correlation between parameters, and adopts alternative statistical metrics to verify the ensemble predictions.

Fig. 1 .
Fig. 1.Location and elevation ranges of the North Fork of the American River Basin (NFARB) as well as locations of streamflow and snow observational stations used in the current study.

Fig. 2 .
Fig. 2. Climatic characteristics of the NFARB sub-basins during the entire study period(WY 1979(WY  -2002)), training period(WY 1979(WY  - 1984)), and prediction period(WY 1991(WY  -1996)).Mean annual temperature at various percentiles over the lower (a) and upper (b) subbasins are highlighted, as well as the annual areal precipitation at various percentiles over the lower (c) and upper (d) sub-basins.

Fig. 3 .
Fig. 3. Flowchart of the EnKF applied in ICEA to recursively update SNOW17 model states with the uncertainty of sensitive model parameters considered.N and p indicate the ensemble size and the number of sensitive parameters, respectively; y 0 and µ represent model initial condition and forcing, respectively; z i and M designate the observation and measurement operator, respectively; C v , C yz , and C zz denote the variance of observation error, the covariance between model states and observations, and the variance of observations, respectively.

Fig. 4 .
Fig.4.Flowchart of modeling and assimilation (of areal SWE) procedures in generating ensemble daily streamflow.RM and Q represent rain plus snowmelt and streamflow, respectively; UH stands for unit hydrograph; t indicates time step (t = 1, 2, ..., T ; 6 hourly); ta is the daily time step aggregated from 6-hourly step; t i reprsents the measurement time when areal SWE is assimilated (refer to Fig.2for detailed assimilation procedure); j is the ensemble number (j = 1, 2, ..., N ); subscripts U and L denote variables for upper and lower sub-basins, respectively; subscript c designates the combined variable.

Fig. 6 .
Fig. 6.Marginal distributions (in bars) and scatter plots (in dots) for posterior SNOW17 parameters.R signifies the linear correlation between parameters.

Fig. 7 .
Fig. 7. Statistics between observed and predicted streamflow during the prediction period (WY 1991-1996) under four scenarios considered for Bias (a), RMSE (b), Correlation (c) and NSE (d).The ensemble-mean prediction is used in estimating statistics for the EnKF and ICEA scenarios.

Fig. 8 .
Fig. 8. Scatter plot of predicted and observed high flows during the prediction period under scenarios (a) RFC, (b) ISURF, (c) EnKF, and (d) ICEA.For the EnKF (c) and ICEA (d) scenarios, the entire ensemble prediction ranges (error bars) are also shown except for the ensemble mean prediction (dots).

Fig. 9 .
Fig. 9. Basin-average precipitation (a), ensemble streamflow predictions from the stand-alone EnKF (b), and ensemble streamflow predictions from the ICEA (c) for water year 1995.Also shown are observed streamflow (circles) and RFC streamflow predictions (black line) during the same period.The grey regions correspond to the entire ensemble range.

Fig. 10 .
Fig. 10.Ensemble mean SWE (a) and Rain plus Snowmelt (b) time series for WY 1995 derived from the stand-alone EnKF (black line) and the ICEA (dash line).Also shown in the upper panel is the areal SWE (circles) derived from SWE observations at three snow stations.

Table 1 .
SNOW17 model parameters and state variables that are updated in the data assimilation applications.Parameter ranges are estimated from .
equivalent (SWE) observations.The rain plus snowmelt serves as input to the SAC-SMA model to produce streamflow predictions.An operational watershed in California is used as the study basin to demonstrate the applicability of ICEA to the coupled SNOW17/SAC-SMA model in terms of providing streamflow predictions.Results are evaluated against the RFC predictions as well as that from a stand-alone EnKF.

Table 2 .
Basic information for the three study snow stations and the corresponding weights used in determining the areal snow water equivalent.

Table 3 .
Precipitation and observed streamflow characteristics during the prediction period.

Table 4 .
Statistics for observed and simulated streamflow using the RFC parameters and ISURF parameters during the training period.

Table 5 .
Statistics for observed and predicted high flows (above the 95th percentile of observed streamflow) during the prediction period.The EnKF and ICEA statistics are based on the ensemble mean predictions.

Table 6 .
Ensemble performance statistics associated with the EnKF and ICEA scenarios during the prediction period.