Variational assimilation of streamflow into operational distributed hydrologic models : e ff ect of spatiotemporal adjustment scale

H. Lee, D.-J. Seo, Y. Liu, V. Koren, P. McKee, and R. Corby Hydrology Laboratory, NOAA/National Weather Service, Silver Spring, Maryland, USA University Corporation for Atmospheric Research, Boulder, Colorado, USA Riverside Technology, Inc., Fort Collins, Colorado, USA NOAA, National Weather Service, West Gulf River Forecast Center, Fort Worth, Texas, USA Present address: Department of Civil Engineering, The University of Texas at Arlington, Arlington, TX 76019-0308, USA Present address: Goddard Space Flight Center, National Aeronautics and Space Administration, Greenbelt, MD 20771, USA


Introduction
Flood forecasting has long been a key research topic for government agencies in support of field operations (Droegemeier et al., 2000;NHWC, 2002;NRC, 2010;USACE, 2000).Changes in spatiotemporal patterns of precipitation events and the occurrence of unprecedented record-breaking events around the globe during the past decades (Knutson et al., 2010;Milly et al., 2008;Min et al., 2011;Trapp et al., 2007;Trenberth et al., 2003) poses pressing needs of rapid advances in real-time operational flood forecasting systems to mitigate water-related hazards (NRC, 2010).In the US River Forecast Centers (RFCs), the flood forecasting procedure has often involved manual modifications (MOD; Seo et al., 2003;Smith et al., 2003) in an attempt to merge human forecasters' understanding of the latest observations into the lumped model, which might introduce physical and dynamical inconsistencies into the forecasting process.In the case of distributed models, an automated procedure might be necessary given the excessively large dimensionality of state variables, which often cannot be handled by human forecasters in any meaningful way.
Data assimilation (DA) techniques help develop an automated procedure that merges information in the real-time hydrologic and hydrometeorologic observations into the hydrologic model dynamics and accounts for uncertainties from different error sources (Liu and Gupta, 2007;McLaughlin, 2002;Moradkhani, 2008;Seo et al., 2003Seo et al., , 2009;;Troch et al., 2003).In comparison to the application of data assimilation techniques to lumped rainfall-runoff models (e.g., Bulygina and Gupta, 2009;Moradkhani et al., 2005aMoradkhani et al., , 2005b;;Seo et al., 2003Seo et al., , 2009;;Vrugt et al., 2005Vrugt et al., , 2006; Weerts and Serafy, 2006), state updating of distributed rainfall-runoff models is highly subject to the overfitting problem due to the large dimensionality of state-space of the model and data scarcity for most basins in the world.From an operational streamflow forecasting perspective, except for atmospheric forcing, streamflow data is normally the only data source available for assimilation, which is often insufficient to constrain large degrees of freedom (DOF) associated with distributed modelling.Therefore, existing distributed rainfall-runoff models are mostly, if not all, under-determined, i.e., information available in the data is not enough to uniquely determine or identify state variables and/or parameters of the distributed rainfall-runoff model.In the case of streamflow prediction, the most significant problem in solving the under-determined system is that streamflow analysis at independent validation locations as well as streamflow prediction at any locations in a basin could be worse than base model streamflow simulations due to overfitting.This poses an obvious obstacle to advances in data assimilation for distributed models which requires developing appropriate assimilation strategy to constrain large degrees of freedom causing a state and/or parameter identifiability problem.
In the following, we summarize previous studies in which research results indicate issues on state and/or parameter identifiability when applying data assimilation techniques to distributed hydrologic modeling.Clark et al. (2008) tested the impact of assimilating streamflow at one location on streamflow prediction at other locations in the Wairau river basin in New Zealand by using the ensemble square root Kalman filter (En-SRF) and the distributed model TopNet.They found degraded streamflow results from the assimilation at independent validation locations in the basin, highlighting the importance of accurately modelling spatial variability, or correlation structure of hydrological processes in order to improve streamflow prediction at ungauged locations via assimilating streamflow observations at gauged locations.Lee et al. (2011) found that assimilating the outlet flow of the Eldon basin in Oklahoma into the gridded SAC with the variational assimilation technique degraded streamflow prediction at some interior cells in the basin in a synthetic experiment.Van Loon and Troch (2002) noted the degraded prediction of ground water depth at some locations in a 44-ha catchment in Costa Rica with the assimilation of soil moisture measured by a Trime time domain reflectometry (TDR) system at multiple sites, despite that discharge predictions were considerably Introduction

Conclusions References
Tables Figures

Back Close
Full benefited from soil moisture assimilation.Chen et al. (2011) carried out assimilation of 20 different sets of synthetically generated soil moisture observations into the SWAT model with the EnKF.They found, for some cases out of a total of 20, analyses of groundwater flow and percolation rate were degraded.Brocca et al. (2010) assimilated the rescaled Soil Wetness Index (SWI) into the semi-distributed model Modello Idrologico SemiDistribuito in continuo (MISDc) in a synthetic experiment where assimilation results on flood prediction were degraded for some experimental settings.
To address the aforementioned issues associated with data assimilation into distributed models in an operational context and develop an effective assimilation strategy to constrain degrees of freedom in distributed modelling, this study aims to investigate the effect of the spatiotemporal adjustment scale on the analysis and prediction of streamflow generated via assimilating streamflow data into the distributed SAC-SMA and kinematic-wave routing models of HL-RDHM with the variational data assimilation procedure (Seo et al., 2010;Lee et al., 2011).We test nine spatiotemporal adjustment scales based on combinations of three spatial scales (lumped, semi-distributed, distributed) of adjustment to state variables and three temporal scales (hourly, 6-hourly, time-invariant) of adjustment to mean field bias in the precipitation and potential evaporation data.On one hand, adopting coarser spatiotemporal adjustment scale reduces the dimensionality of the control vector, which may help prevent overfitting when solving the inverse problem.On the other hand, finer adjustment scale may be preferable for the basins showing highly heterogeneous soil and physiographic properties.In the experiment, three streamflow assimilation scenarios are considered, i.e., assimilating interior flow observations, outlet flow observations, or both interior and outlet flow observations.Given a spatiotemporal adjustment scale, each streamflow assimilation scenario is applied to four basins in Oklahoma and five basins in Texas, US.
The paper is organized as follows.Section 2 describes the methodology including the hydrologic model, the assimilation technique, and the evaluation metrics.Section 3 summarizes the study basins.Section 4 describes the multi-basin experiment and presents the results and discussions.Finally, Sect. 5 summarizes conclusions and Introduction

Conclusions References
Tables Figures

Back Close
Full The models used are the gridded Sacramento Soil Moisture Accounting (SAC-SMA) and kinematic-wave routing models of the National Weather Service (NWS) Hydrology Laboratory's Research Distributed Hydrologic Model (HL-RDHM, Koren et al., 2004).
The SAC-SMA is a conceptual rainfall runoff model (Burnash et al., 1973) which calculates fast and slow runoffs from two subsurface storages.The models operate on an hourly time step and at the Hydrologic Rainfall Analysis Project (HRAP) grid scale (∼16 km 2 ) (Greene and Hudlow 1982;Reed and Maidment, 1999).Multi-sensor precipitation data (Fulton et al., 1998;Seo 1998;Seo et al., 1999;Young et al., 2000) are available on the HRAP grid scale.A priori estimates of the SAC parameters (Koren et al., 2000) are derived from the soil data such as STATSGO2 (NRCS, 2006) and SSURGO (NRCS, 2004).Optimization of these a priori estimates is carried out via manual or automatic calibrations (Koren et al., 2004).For the four Oklahoma basins we use manually-optimized parameter values used in the Distributed Model Intercomparison Project (DMIP, Smith et al., 2004).For the five Texas basins, we use calibrated parameter values obtained from the West Gulf River Forecast Center.The kinematicwave routing model of Koren et al. (2004) is used to rout runoff on hillslopes and flow through the channel network.The flow direction from headwater to downstream HRAP grid cells is defined by the Cell Outlet Tracing with an Area Threshold (COTAT) algorithm (Reed 2003) using the Digital Elevation Model (DEM) data.The routing parameters are estimated from the DEM and channel hydraulic data (Koren et al., 2004).
Figure 1 shows a schematic of the gridded SAC and kinematic-wave routing models of the HL-RDHM (Lee et al., 2011).Introduction

Conclusions References
Tables Figures

Back Close
Full 2.2 Data assimilation procedure

Variational data assimilation
We begin by describing state and observation equations for a nonlinear stochastic dynamic model in Eqs. ( 1) and (2), respectively (Lewis et al., 2006).
where X k and X k+1 denote model state vectors at time k and k + 1, respectively; M represents a nonlinear dynamic model that maps X k into X k+1 within the state space, or the gridded SAC in this study; U k represents an input to the model, or the precipitation and potential evaporation data in this study; W k+1 is the model error with covariance Q k+1 ; Z k denotes the observation vector; H expresses an observation operator that defines a nonlinear relationship between the observation vector Z k and the state vector X k , or the SAC and kinematic-wave routing models in this study; and V k denotes the measurement error vector with covariance R k .
A variational data assimilation problem is generally formulated as a least-squares minimization problem that minimizes the objective function J constrained by the model physics (Lewis et al., 2006;Liu and Gupta, 2007).Assuming that errors are independent (Schweppe, 1973), Eq. (3) defines the objective function to assimilate observations over the predefined time window, or the assimilation window of size L, which starts from some point in the immediate past K − L + 1, and ends at the prediction time

Conclusions References
Tables Figures

Back Close
Full In Eq. ( 3), P f ,K −L is the forecast error covariance of state variables at time K − L.
The variational data assimilation problem is then the same as finding X K −L that minimizes J K .Given a priori model states at the beginning of the assimilation window (X − K −L ), gradient-based numerical minimization algorithms can be used to locally optimize X K −L based on the gradient of J K computed with the adjoint method (Lewis et al., 2006).In the following section, we formulate the variational data assimilation problem for the specific case of assimilating streamflow observations into the gridded SAC and kinematic-wave routing models of HL-RDHM, and describe the numerical minimization algorithm adopted to find the local minimum of the objective function used in this study.

Variational data assimilation problem formulation for the gridded SAC and kinematic-wave routing models of HL-RDHM
The data assimilation problem tackled in this paper is summarized as follows: Given a priori SAC states at the beginning of the assimilation window, potential evapotranspiration (PE), real-time observations of streamflow and precipitation, and predefined spatiotemporal scales of adjustment for the control vector, update SAC states over the assimilation window by assimilating streamflow at the outlet and/or interior locations in a basin into the gridded SAC and kinematic-wave routing models and at the same time adjusting multiplicative biases for precipitation and PE.Introduction

Conclusions References
Tables Figures

Back Close
Full To solve this problem, we write the state equation for the SAC states in Eq. ( 4) and observation equations for precipitation, PE, and streamflow in Eqs. ( 5) to (7). (4) where X S,k , X S,k+1 , and . . ., X P,k , and the same for X E,K −L+1:k .
In deriving the objective function, we assume that the forecast error of state variables and observation errors are independent of one another and time-invariant (Schweppe, 1979;Seo et al., 2003) Equations ( 8) and ( 9) pose the nonlinear constrained least-squares minimization problem with strong constraints of the model dynamics.In Eqs. ( 8) and ( 9), n Q denotes the number of stream gauge stations, Z Q,l ,k denotes the streamflow observation at the l -th gauge station at hour k, and Z B,j,i ,K −L denotes the background (i.e., a priori or before-DA) model soil moisture state associated with the j th state variable and i th cell at the beginning of the assimilation window, H Q,l ,k () denotes the observation operator that maps X S,K −L to streamflow at the l th gauge station and hour k, X S,K −L denotes the SAC states at hour K −L, σ Q,l denotes the standard deviation of the streamflow observation error at the l -th stream gauge location, σ P and σ E represent the error standard deviation of the precipitation and potential evaporation data, respectively, σ B,j,i denotes the standard deviation of the error associated with the j -th background model state at Introduction

Conclusions References
Tables Figures

Back Close
Full the i -th grid, and λ j,i denotes multiplicative adjustment factor to Z B,j,i ,K −L .The vector At the beginning of the minimization, control variables (X P,k , X E,k , and λ j,i ) are set at 1 for all i ,j,k.Model dynamics and physical bounds, Eq. ( 9), act as constraints on the minimization problem.During the minimization, we allow X P,k and X E,k to vary hourly or 6-hourly or keep them time-invariant over the entire assimilation window; and λ j,i is to be adjusted independently at every individual cell, or uniformly over a predefined sub-catchment or the entire basin.Equations ( 8) and ( 9) are solved with the conjugate gradient method using Fletcher-Reeves-Polak-Ribiere minimization (FRPRMN) algorithm (Press et al., 1992).Gradients of the objective function to the control vector were calculated using the adjoint code generated from Tapenade (http://tapenade.inria.fr:8080/tapenade/index.jsp).
The simplified objective function, Eq. ( 8), makes the variational data assimilation (VAR) procedure used in this study computationally cheaper than the conventional VAR approach, ensemble Kalman filter (EnKF), and particle filters.Since the VAR procedure explicitly considers dynamical linkage of soil moisture states and atmospheric forcing in the immediate past to soil moisture and streamflow at the prediction time at each assimilation cycle, it is unnecessary to modify or improve the procedure to account for the time-lag issue between streamflow and soil moisture contents, which needs to be considered in other filtering-based methods such as EnKF (Seo et al., 2003;Clark et al., 2008;Pauwels and De Lannoy, 2006;Weerts and Serafy, 2006).On the other hand, the VAR approach does not inherently provide an ensemble modelling capability, as in EnKF, to produce an uncertainty estimate for streamflow predictions.

Evaluation metrics
The performance of DA procedure is evaluated using correlation coefficient (r), skill score (SS), root-mean-square-error (RMSE), and timing error (TE).We developed two types of correlation-based matrices (r1, and r2) as defined in Eqs. ( 10 gauge locations.The r2-matrix compares difference in spatial correlation structure of observed and simulated streamflow in off-diagonal entities, but association of the two at diagonal entities. where R denotes the operator to calculate Pearson's correlation coefficient of two streamflow time series; Q − s,i and Q − s,j denote simulated flow (without assimilation) at gauges i and j , respectively; Q o,i and Q o,j represent observed flow at gauges i and j , respectively; Q in Eq. ( 10) can be either Q − s or Q o ; subscripts i and j denote the indices for the stream gauges at interior or outlet locations.
The Skill Score (SS; Murphy, 1996) is calculated based on the sum squared error of streamflow before and after assimilation as presented in Eq. ( 12).
In the above, Q − s,k and Q + s,k denote streamflow at time k from the model simulation before and after assimilation, respectively; Q o,k represents streamflow observation; k denotes the time index.The positive SS means improvement after assimilation and the opposite for the negative SS.The SS value is 1.0 if DA is perfect and 0.0 if DA adds nothing.

Conclusions References
Tables Figures

Back Close
Full Timing Error (TE) in streamflow simulation, Eq. ( 14), is estimated with the information on the phase difference of observed and simulated hydrographs computed with a wavelet-based technique (Liu et al., 2011).
where T denotes the equivalent Fourier period of the wavelet; s and n represent the scale and location parameter of the wavelet, respectively; W X Y n (s) denote the cross wavelet spectrum of two time series X and Y ; ( ) and ( ) denote the imaginary and real parts of the variable in the bracket, respectively; denotes the smoothing operation in both time and frequency domains (Torrence and Compo, 1998).Further details on the wavelet-based timing error estimation technique are found in Liu et al. (2011).

Study basins
Figure 2 presents nine basins used in this study, and Table 1 summarizes details on the data and each basin.In Fig. 2, ELDO2 and SLOA4 are nested in Illinois River at the border of Oklahoma (OK) and Arkansas (AR); TIFM7 is a part of Elk River at the border of Missouri (MO) and Arkansas (AR); BLUO2 is a headwater basin to Blue River in Southern Oklahoma (OK).These four basins are located in the service area of Arkansas-Red Basin River Forecast Center (ABRFC).The other five basins including GBHT2, HBMT2, ATIT2, KNLT2, HNTT2 are located in Texas (TX) in the service area of the West Gulf River Forecast Center (WGRFC).The topography of ABRFC basins is gently rolling to hilly with the maximum elevation difference between the basin outlet and the interior larger than 200 m (Smith et al., 2004).On the contrary, the topography of WGRFC basins is generally characterized as flat to very flat (Vieux, 2001).Very large runoff coefficients for HBMT2 and GBHT2 are produced due mainly to the large extent of urbanized area around Huston, TX (Liscum, 2001).Especially, 105 Introduction

Conclusions References
Tables Figures

Back Close
Full extremely large runoff coefficient of the HBMT2 watershed is caused by the combined effect of 85 % of the watershed area being highly developed, clayey soils with low infiltration rates, and the lower 42 km channel being lined with concrete (Vieux, 2001).BLUO2, KNLT2, HNTT2, and ATIT2 are dry basins in comparison to the others, with the annual precipitation of less than 850 mm and runoff coefficients of less than 0.14 (Table 1).Like HBMT2, these four basins are also largely covered by clayey soils.Morphologically, BLUO2 is very elongated than the others.SLOA4 and TIFM7 show a radial shape of channel network with tributaries that all have a drainage area of a similar size.Figure 3 presents soil type and mean event rainfall on the HRAP grid for each basin.Inter-grid variability of mean event rainfall for each basin ranges from (3) the performance of DA procedure at the optimum spatiotemporal adjustment scale.
The experiment is composed of four steps as summarized below: -Step 1. Carry out a base model simulation and evaluate its performance on streamflow simulation.
-Step 3. Given a spatiotemporal adjustment scale, assimilate streamflow observations into the model separately for each of three assimilation scenarios, i.e., assimilation of outlet flow, interior flow, or both outlet and interior flows.
-Step 4. Repeat Step 3 for each of the nine spatiotemporal adjustment scales.
In Step 2, the sensitivity of the performance of DA on streamflow observational error variance (σ 2 Q ) is examined in order to find optimal σ 2 Q .In the sensitivity run, outlet flow from the entire data period is assimilated and seven different σ 2 Q values (0.01, 0.1, 1, 10, 100, 1000, 10 000 (m 3 s −1 ) 2 ) are used.The results show that σ 2 Q = 10 (m 3 s −1 ) 2 yields optimum estimates of streamflow analysis and prediction in terms of RMSE of streamflow for all basins except TIFM7 for which σ 2 Q = 100 (m 3 s −1 ) 2 was chosen.Observational error variances for precipitation and PE are taken directly from Seo et al. (2003).Sample variances calculated from base model simulation, i.e., without assimilation, for the entire data period are used as error variances for background model states (Lee et al., 2011).We assume homoscedastic streamflow error and spatially homogeneous forcing error.This assumption can be lifted in the future in order to more effectively constrain the assimilation problem, relying on advances in uncertainty techniques that properly parameterize and quantify uncertainty associated with stage measurement, stage to discharge conversion, and spatial correlation of forcing error (Clark et al., 2008;Mandapaka et al., 2009).Introduction

Conclusions References
Tables Figures

Back Close
Full

Results and discussion
In this subsection, the experiment results are comparatively evaluated focused primarily on analysis vs. prediction, and dependent vs. independent validation.This is to make it easier to analyze and interpret the assimilation results as well as the overall performance of the DA in the presence of the issue associated with overfitting due to large degrees of freedom involved in distributed modelling.

Analysis of the assimilation problem
Prior to the assimilation, we diagnose the level of complexity of the assimilation problem for each basin via examining the spatial correlation structure of streamflow of both the observation and the base model simulation, and basin characteristics such as spatial heterogeneity of soil and precipitation.Figure 4 presents correlation-based matrices of streamflow.In Fig. 4, correlation matrices are calculated using streamflow data at any paired gauges (i.e., interior and outlet as well as interior and interior) at the concurrent time step in part due to difficulty in correctly estimating travel time for all paired gauges (even impossible for two interior gauges located at different river tributaries) and reflecting them in the calculation of correlation matrices.Correlations of time-lagged simulated interior and outlet flow as a function of a lag time closely followed those based on streamflow observations.This supports the idea of using correlation matrices shown in Fig. 4 for analyzing spatial correlation structure of streamflow process, albeit less than accurate.The 1st row of Fig. 4 presents the r1-matrices (see Eq. 10) showing spatial correlation structure of streamflow observations.In all correlation matrices, the stream gauges are sorted in the increasing order of drainage area starting from the bottomleft corner.In Fig. 4, streamflow observations at the outlet gauges of most basins are highly correlated with those at interior gauges.For BLUO2, the low spatial correlation of interior and outlet flows could be a result of less rainfall on the upstream area and the distant location of the interior gauge to the outlet.For ELDO2, the weak spatial correlation of flow at the outlet and that at DUTCH may be largely affected by different soil Introduction

Conclusions References
Tables Figures

Back Close
Full types within the drainage area of DUTCH in comparison to other parts of the basin.For ATIT2, upstream flows at some interior gauges, particularly SCAT2 and BCDT2, are weakly correlated with downstream flows at BDUT2 and the outlet.This may be due to small drainage area and the location of SCAT2 and BCDT2 on minor river tributaries.The 2nd and 3rd rows in Fig. 4 show r1-matrices of streamflow generated from base model simulation, and r2-matrices (Eq.11) of observed and simulated flows prior to the assimilation, respectively.r2-matrices in Fig 4 indicate that the model simulation generally well reproduces the spatial correlation of streamflow at two locations in a basin, particularly GBHT2, HBMT2, HNTT2, and KNLT2 with correlation difference of streamflow observation and simulation (off-diagonal terms of r2) of less than 0.1.In addition to the absolute value of r2 off-diagonal terms, the unity of their signature is treated as another information associated with the degree of complexity of the assimilation problem; that is, overall overestimation or underestimation of spatial correlation structure of the streamflow process is considered less ill-posed than combination of over-and under-estimation.In the latter case, independent validation results posterior to the assimilation may benefit less at any adjustment scales than the former due to the interference of the correlation to the assimilation procedure in a complicated way.
In this regard, ELDO2, ATIT2, KNLT2, and WTTO2 can be viewed as more ill-posed than the rest basins.

Effect of spatiotemporal adjustment scale on the performance of the DA procedure
Figures 5 and 6 present mean SS of streamflow analysis and prediction, respectively, for all assimilation scenarios and adjustment scales.Streamflow prediction is generated with updated state variables at the prediction time and historical precipitation data and monthly climatology of PE as atmospheric forcing over the forecasting window.For streamflow analysis (Fig. 5), the mean SS is calculated by averaging SS evaluated at every hour within the assimilation window for each event and for each gauge location separately (Eq.15).For streamflow prediction (Fig. 6), the mean SS is calculated by 109 Introduction

Conclusions References
Tables Figures

Back Close
Full averaging SS evaluated at every lead hour up to 6-h lead time for each event and for each gauge location separately (Eq.15).
In the above, SS i ,j,τ represents the skill score (Eq.12) calculated for i th event, j th gauge, τth lead hour; N T denotes the number of lead hours considered, e.g., N T equals the length of assimilation window (Table 2) for Fig. 5, and N T = 6 for Fig. 6; N G denotes the number of gauges, e.g., N G = 1 in the case of calculating SS for outlet flow but N G ≥ 1 in the case of computing SS for interior flows; N F denotes the number of selected flood events for each basin (Table 2).Note that the mean SS presented in Figs. 5 and 6 equally weighs SS from each event.Main observations from Figs. 5 and 6 are summarized below.The performance of DA is less sensitive to temporal adjustment scales than spatial adjustment scales.For the basins showing high spatial correlation between interior and outlet flows (GBHT2, HBMT2, HNTT2), the performance of DA is less sensitive to the spatial adjustment scales than other basins.For BLUO2, lumped adjustment yields less improvement than other cases due possibly to the low spatial correlation of interior and outlet flows.In a number of independent validation cases (i.e., validating the assimilation results with the streamflow data not used for the assimilation), the mean SS of streamflow analysis is less than zero, indicating possible overfitting.For GBHT2, HNTT2, and KNLT2, assimilating interior flows produced positive mean SS of streamflow prediction for the first 6 lead hours at both interior and outlet locations.However, assimilating outlet flow generally degrades interior flow prediction for most basins.This implies assimilating interior flow makes the DA problem less subject to the overfitting problem.Note that some events are affected by timing errors in model simulated flow estimates which are partially responsible for small to negative mean SS estimates for some cases.We discuss the timing error issue later in this section.Introduction

Conclusions References
Tables Figures

Back Close
Full To further examine the sensitivity of DA results to the adjustment scales, Figs.7 and 8 show box-whiskers plot of the mean SS in Figs. 5 and 6, respectively.In Figs. 7 and 8, each box-whiskers plot is constructed with 27 samples resulted from the combinations of nine basins and three (space or time) scales.Figures 7 and 8 can be summarized as follows.The performance of DA is generally higher at finer adjustment scales and is more sensitive to the spatial adjustment scale than the temporal adjustment scale in terms of the median SS and the central 50 percentile of the SS.The DA performance greatly depends on the streamflow assimilation scenario, i.e., assimilating outlet and/or interior flow data.Assimilating outlet flow does not improve interior flow simulation in most cases, whereas assimilating interior flows typically improve outlet flow simulation to some degree.This indicates the difficulty of propagating the information contained in outflow data backward (i.e., upstream) within the stream network to correct interior flow prediction when no information is available to correct the spatial correlation structure of streamflow processes.
An "optimum" spatiotemporal scale is selected, separately for interior flow and outlet flow prediction, for the three different assimilation scenarios (Fig. 9).The selection is based on the mean SS of streamflow analysis or prediction presented in Figs. 5 and 6.
In the case of streamflow analysis, a number of assimilation cases showed the largest improvement with the finest spatiotemporal adjustment scale.However, in the case of streamflow prediction, the optimum adjustment scales spread over a broader range of adjustment scales.This indicates the possible large over-adjustment of state variables in the cases of distributed, hourly adjustment.Despite the issue associated with the over-fitting problem, the cases of distributed, hourly adjustment generally produce the best assimilation results in comparison to other adjustment scales.

Performance of the DA procedure at the optimum spatiotemporal adjustment scale
For more quantitative analysis of assimilation results for each basin, we chose a single optimum adjustment scale for each basin which produces reasonable assimilation Introduction

Conclusions References
Tables Figures

Back Close
Full results, based on mean SS in Figs. 5 and 6, for analysis and prediction of interior and outlet flows.Selected adjustment scales are semi-distributed, hourly for GBHT2, lumped, 6 hourly for HBMT2 and distributed, hourly for all the other basins.Figures 10 to 14 present the evaluation of the overall assimilation results from different perspectives: RMSE of streamflow analysis and prediction evaluated at every lead hour, for each basin and for each assimilation scenario in Fig. 10, RMSE of streamflow analysis and prediction for all basins collectively in Figs.11 and 12, and timing error estimates from streamflow analysis and prediction for each assimilation scenario in Fig. 13.
To diagnose streamflow analysis similarly to Fig. 4, we examined the spatial correlation structure of streamflow analysis at the optimum spatiotemporal adjustment scale (not shown).The spatial correlation of observed and simulated flows at both interior and outlet locations of all basins are generally improved by assimilating streamflow data but at the expense of slightly adjusting spatial correlation structure of streamflow process Especially, in the case of ELDO2, the spatial correlation of CHRISTIE and outlet flows has been noticeably improved after streamflow assimilation, whereas in the cases of ATIT2 and KNLT2, spatial correlation structure of the streamflow at some paired gauges become worse after streamflow assimilation in comparison to that of base model simulation.This indicates that the gains from the DA do not always lead to improving the spatial correlation structure of the streamflow process, i.e., another symptom of over-adjustment.Besides, examining the spatial correlation structure of streamflow process at all adjustment scales indicated that, in comparison to outlet flow assimilation, interior flow assimilation reproduced the correlation structure more consistent to that of streamflow observations.This may be explained by the local information available in interior flow observations which is diluted at the outlet location due to the routing along the main channel as well as the averaging effect at the basin scale through process interactions.
Figure 10 presents RMSE of streamflow as a function of the lead hours.In Fig. 10, a lead hour is defined negative for the assimilation window and positive for the prediction window.For GBHT2, HBMT2, and HNTT2, all three assimilation scenarios Introduction

Conclusions References
Tables Figures

Back Close
Full improved both streamflow analysis and prediction.These basins show high spatial correlation of interior and outlet flows and the base model simulation reproduces the spatial correlation structure very well as presented in Fig. 4. For ill-posed basins ELDO2, ATIT2, KNLT2, and WTTO2, there is an indication of over-adjustment for streamflow analysis and prediction.For ELDO2, the level of over-adjustment is not outstanding possibly due to the smaller basin size and the relatively better base model simulation than the other ill-posed basins.For BLUO2, weak spatial correlation of interior and outlet flow seems to nullify the transfer of assimilation effect from one gauge location to another.TIFM7 also shows similar results as BLUO2.It is noted that for TIFM7 we use an observational error variance ten times bigger than that of the other basins.This may considerably reduce the amount of adjustment to state variables at most cells in TIFM7.Overall, assimilating streamflow data from a given gauge location generally produces improved streamflow analysis and prediction at the same gauge location, which is somewhat expected.On the contrary, the amount of improvement in streamflow analysis and prediction at independent validation locations is varying, depending on the level of under-determinedness, observational streamflow error variance, basin characteristics, and so forth.
Figure 11 presents RMSE of streamflow analysis versus peak flows of selected events; Fig. 12 is the same as Fig. 11 but for streamflow prediction.To evaluate overall performance of the VAR procedure based on multi-basin results, each plot is constructed with samples from all nine basin simulations.In comparison to assimilating outlet flow, assimilating interior flow yielded the similar amount of improvement (14 % reduction in RMSE after the assimilation) in outlet flow prediction for the first 6-h lead time.In contrast, in the case of outlet flow assimilation, gains in interior flow analysis (19 % reduction in RMSE after the assimilation) did not lead to improvement in multi-basin averaged skill in interior flow prediction over the first 6 lead hours, even though breakdown into each basin showed RMSE reduction by assimilation ranging from −31 % (ATIT2) to 14 % (GBHT2).This indicates that the outlet flow assimilation case is more vulnerable to the overfitting problem than the interior flow assimilation Introduction

Conclusions References
Tables Figures

Back Close
Full case.Assimilating both outlet and interior flows outperforms the outlet flow assimilation case in terms of outlet flow prediction (22 vs. 14 % reduction in RMSE after the assimilation), although outlet flow analysis is less improved (55 vs. 59 % reduction in RMSE after the assimilation).This proves the importance of additionally assimilating interior flows for streamflow prediction at the basin outlet, which is very important in real-time operation.Phase (or, timing) and flow magnitude are the two distinctive features of a hydrograph (Liu et al., 2011).To examine the performance of DA on the phase of a hydrograph, we also examine the timing error of a hydrograph for the assimilation and prediction windows separately.The timing error is the statistic translated from phase information estimated via the wavelet-based technique (Liu et al., 2011).Note that our timing error analysis is somewhat exploratory because of the objective equation, Eq. ( 8), used in this study, which includes no timing error modelling component.Figure 13 presents box-whiskers plots of timing error estimates (Eq.14) of observed and simulated hydrographs that characterize inter-basin inter-event variability.In Fig. 14, the reference is event-scale timing error estimates of streamflow observation and simulation prior to the assimilation.The positive timing error means observed hydrograph lagging simulated hydrograph.On the whole, timing errors of simulated hydrographs posterior to the assimilation at both outlet and interior stream gauge locations from the assimilation window are generally smaller than the reference.While flow timing errors for the prediction period are less improved via streamflow assimilation, their medians are mostly free of timing error especially in the case of outlet flow.For events with significant timing errors in rising limb, timing error correction via streamflow assimilation causes significant magnitude errors in predicted flows.Examples of this are illustrated in Fig. 14.Base model simulation for Event A in Fig. 14 shows significant timing errors in rising limb, peak flow and the overall shape of the hydrograph, whereas Event B is less contaminated by the timing error than Event A. Compared to Event B, control variables in the case of Event A are over-adjusted to improve streamflow analysis; however, this causes large underestimation of streamflow prediction.This problem is due to a lack Introduction

Conclusions References
Tables Figures

Back Close
Full of timing error modelling component in the formulation of the DA problem used in this study (see Eq. 8).As a result, the VAR procedure over-adjusts control variables to compensate for timing errors in streamflow analysis which result in magnitude errors in predicted flows.Further analysis indicated that the spatial correlation structure of streamflow process for events with timing errors of 3 h or bigger in the rising limb or peak flow simulation is more ill-posed than the other events, and that the spatial correlation structure of streamflow from the entire simulation appear to be very similar to that of events with timing errors.Timing errors must be treated separately from the magnitude error for the updating of state variables to the right direction, which is left for the future work.

Conclusion and future work
The importance of hydrologic data assimilation has been emphasized by many researchers as a unifying approach to account for different error sources in hydrologic model simulations in a cohesive manner and improving skills in streamflow prediction (Aubert et al., 2003;Liu and Gupta, 2007;Seo et al., 2003Seo et al., , 2009;;Clark et al., 2008;Vrugt et al., 2006).In comparison to lumped models, distributed rainfall-runoff models are highly subject to the overfitting problem due to large dimensionality of the statespace of the model.The streamflow data normally available in a real-time streamflow forecast environment may not provide sufficient information for constraining degrees of freedom in the inverse problem to the right direction.Depending on the complexity of a basin's hydrologic and hydrometeorologic properties and the degree of underdeterminedness in the inverse problem, the amount of adjustment necessary for state updating may vary from one basin to another.In this study, we investigated the effects of spatiotemporal scales of adjustment on the assimilation results and the overall performance of the data assimilation procedure based on multi-basin results for selected adjustment scales for each of nine basins in Oklahoma and Texas in the United States.Introduction

Conclusions References
Tables Figures

Back Close
Full Conclusions from this study can be summarized as follows: -The optimum spatiotemporal adjustment scale varies from a basin to another basin and from streamflow analysis to prediction when assimilating interior and/or outlet flows.The latter indicates the over-adjustment of state variables.The performance of the assimilation procedure is more sensitive to the spatial scale of adjustment than the temporal scale.The most preferred adjustment scale based on applying the VAR procedure to nine basins used in this study is found to be adjusting state variables in a distributed manner and adjusting precipitation and PE on an hourly basis, despite the fact that validation with streamflow at interior and outlet gauge locations at this adjustment scale may indicate overfitting in some cases.
-The accuracy of streamflow analysis and prediction is highly dependent on the streamflow assimilation scenario, i.e., assimilating outlet and/or interior flow data.
At the optimum spatiotemporal adjustment scale, both cases of assimilating interior flow and assimilating outlet flow yielded the similar amount of improvement (14 % reduction in RMSE after the assimilation) in outlet flow prediction for the first 6-h lead time.However, outlet flow assimilation produces degraded interior flow prediction for the first 6-h lead time (10 % increase in RMSE after the assimilation), but 15 % reduction in RMSE in the case of assimilating interior flow observations.This indicates that the outlet flow assimilation case is more vulner- spatial correlation structure of steamflow process.In comparison to outlet flow assimilation, interior flow assimilation reproduces the spatial correlation structure of streamflow process more consistent to that of streamflow observations.This may be explained by the local information available in interior flow observations which is diluted at the outlet location due to the routing along the main channel as well as the averaging effect at the basin scale through process interactions.This may also explain that interior flow assimilation outperforms outlet flow assimilation in terms of multi-basin averaged skill in streamflow prediction.
-Timing errors in streamflow analysis and prediction are found to be largely related to the ill-posedness of the spatial correlation structure of streamflow process at all adjustment scales and assimilation scenarios.Besides, correcting timing errors in streamflow analysis results in large magnitude errors in streamflow prediction in the cases of events with significant timing errors in rising limb.This indicates error compensation with over-adjusting state variables due partly to a lack of timing error modelling component in the objective function used in this study.
The future work should include improving the VAR procedure to account for timing errors in streamflow simulation, accounting for the model structural error (Van Loon and Troch, 2002;Chen et al., 2011) by applying the model as a weak constraint to the inverse problem (Zupanski, 1997), and incorporating an ensemble modelling capability into the VAR procedure to quantify uncertainty information associated with streamflow prediction (Zupanski, 2005).Introduction

Conclusions References
Tables Figures

Back Close
Full  Full  Full Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | gridded SAC and kinematic-wave routing models of HL-RDHM Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | and static.This assumption helps turn the vectors and matrices in Eq. (3) into scalars, which significantly reduces computational loads.Equation (8) shows the objective function to be minimized in this work.Discussion Paper | Discussion Paper | Discussion Paper | Minimize Discussion Paper | Discussion Paper | Discussion Paper | ) to (11).The r1matrix defines spatial correlation of streamflow, either observed or simulated, at paired 103 Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | 12 (HBMT2) to 90 mm (HNTT2).BLUO2 and HNTT2 show a clearer pattern of rainfall spatial variability than the other basins.Mean event rainfall at the upper half of the BLUO2 basin is approximately 25 mm smaller than that at the lower half of the basin.Each basin has one or more interior stream gauges (nine for ATIT2).The drainage area of the study basins ranges from 137 (GBHT2) to 2258 (TIFM7) km 2 .Streamflow assimilation experiments are carried out via assimilating streamflow data into the distributed SAC-SMA at the prespecified spatiotemporal adjustment scale.Three streamflow assimilation scenarios are considered: outlet flow assimilation, interior flow assimilation, and outlet flow and interior flow assimilation.The experiment is designed to investigate the following aspects of the effect of spatiotemporal adjustment scale on the three different streamflow assimilation scenarios: (1) effect of spatiotemporal adjustment scale on streamflow analysis and prediction, (2) dependency of the optimum spatiotemporal adjustment scale on streamflow assimilation scenario, and Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | able to the overfitting problem than the interior flow assimilation case.Assimilating both outlet and interior flows outperforms the outlet flow assimilation case in terms of outlet flow prediction (22 vs. 14 % reduction in RMSE after the assimilation), indicating the importance of additionally assimilating interior flows for streamflow prediction at the basin outlet.-Basins showing highly correlated interior and outlet flows tend to benefit more from streamflow assimilation and less sensitive to the adjustment scale.Streamflow assimilation into the model at most adjustment scales generally improves the 116 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

Table 1 .
Study basins where A denotes drainage area, N G the number of interior stream gauges in a basin, P mean annual precipitation, Q mean annual runoff, C runoff coefficient.(km 2 ) USGS ID N G Period of record P (mm yr −1 ) Q (mm yr −1 ) C

Table 2 .
The length of assimilation window, the number of sub-basins delineated from the channel connectivity map, the number of flood events denoted as N F , and the threshold of streamflow (Q T ) used to identify flood events.