Ensemble Streamflow Data Assimilation using WRF-Hydro and DART: Hurricane Florence Flooding

Predicting major floods during extreme rainfall events remains an important challenge. Rapid changes in flows over short time-scales combined with multiple sources of model error makes it difficult to accurately simulate intense floods. This study presents a general data assimilation framework that aims to improve flood predictions in channel routing models. Hurricane Florence, which caused catastrophic flooding and damages in the Carolinas in September 2018, is used as a case study. The National Water Model (NWM) configuration of the WRF-Hydro modeling framework is interfaced with the Data Assimi5 lation Research Testbed (DART) to produce ensemble streamflow forecasts and analyses. Hourly streamflow observations from 107 United States Geological Survey (USGS) gauges are assimilated for a period of one month. The data assimilation (DA) system developed in this paper explores two novel contributions: (1) Along-The-Stream (ATS) covariance localization and (2) spatially and temporally varying adaptive covariance inflation. ATS localization aims to mitigate not only spurious correlations, due to limited ensemble size, but also physically incorrect correlations between unconnected and 10 indirectly connected state variables in the river network. We demonstrate that ATS localization provides improved information propagation during the model update. Adaptive prior inflation is used to tackle errors in the prior, including large model biases. Analysis errors incurred during the update are addressed using posterior inflation. Results show that ATS localization is a crucial ingredient of our hydrologic DA system, providing at least 40% more accurate (RMSE) streamflow estimates than regular, Euclidean distance-based localization. Assessment of hydrographs indicates that adaptive inflation is extremely useful 15 and perhaps indispensable for improving the forecast skill during flooding events with significant model errors. We argue that adaptive prior inflation is able to serve as a vigorous bias correction scheme which varies both spatially and temporally. Major improvements over the model’s severely underestimated streamflow estimates are suggested along Pee Dee River in South Carolina and many other locations in the domain, where inflation is able to avoid filter divergence and thereby assimilate significantly more observations. 20


Introduction
Affecting nearly a hundred million people worldwide per year, flooding is the most common natural disaster (Guha-Sapir et al., 2013). Flooding impacts human life, livelihood, and property. Improved streamflow flood forecasts can benefit the public facility that provides software tools for data assimilation research, development, and education (Anderson et al., 2009). Hourly streamflow observations from USGS gauges, retrieved from the (United States) National Water Information System (NWIS), 95 are used to update the spatially distributed ensemble states of streamflow and groundwater bucket head. The resulting analyses could serve as the initial conditions for short term flood forecasts. The analyses are evaluated in terms of the assimilation priors (i.e., the one hour forecast). Hydrographs and other time-series assessment tools (including errors, bias, ensemble spread, etc) are utilized to investigate the performance of the DA framework. Streamflow distribution in space resulting from DA is also studied and compared to the model's estimate. 100 The rest of the paper is organized as follows. Section 2 presents the Hurricane Florence subdomain, the NWM submodel and its components, the uncertainties incorporated into the ensemble design and the USGS observations assimilated. DART is briefly introduced in Section 3 and then ATS localization and adaptive inflation are described in Sections 3.2 and 3.3, respectively. Spatial assessment with particular focus on bias correction is given in Section 3.6. A summary of the findings and further discussions are found in Section 4. Many of the largest flood peaks occurred after the dissipation of the hurricane on September 19, as water concentrated along its course to the sea. A time line of observed and modeled attributes is shown in Table 1. Observed    extent of the subdomain indicates the region over which atmospheric forcing data are used to drive the Noah-MP (Niu et al., 2011) land surface model (1km) and its two-way coupling to lateral surface and subsurface flow routing schemes (250m; Gochis and Chen, 2003) used by the NWM Analysis and Short-Range Forecast configurations (https://water.noaa.gov/about/nwm). As shown in Fig. 2, the lateral flow components are one-way coupled to the streamflow, reservoir and bucket models of the NWM.
Detailed description of this submodel can be found in following section. All model code, domain data and parameter sets used in this study correspond to NWM version 2.0. The single exception is the groundwater bucket model formulation and parameters which are based on NWM version 2.1. We run the equivalent of the NWM "standard analysis and assimilation" cycle without the streamflow nudging used by the NWM. This is the configuration also used for the short and medium range forecast cycles. During the time of the study, the NWM extended analysis cycle had not yet been implemented.

125
The major rivers in this region are labeled in Fig. 1. To the west, the Pee Dee river has its headwaters between Charlotte and  through aggregation of (overland, subsurface, and column drainage) routing output fluxes. These output fluxes are the inputs to the data assimilation system used in this paper, shown inside the dotted box. Random noise is applied to these inputs to generate ensemble forcings.
Ensembles are denoted by groups of three arrows (the ensemble size is much larger than 3). The ensemble fluxes drive the ensemble model components (channel and reservoir model and groundwater bucket model) used in the assimilation. The groundwater states produce additional fluxes to the channel and reservoir model. The spatially distributed streamflow and bucket head states comprise the "state" vector passed to DART for updating by USGS streamflow observations.

Channel + Bucket Submodel
We run the so-called channel+bucket submodel of the NWM. Fig. 2 illustrates the one-way runoff fluxes to the streamflow and groundwater bucket models from the "upstream" model components (land surface model, overland and subsurface routing, and spatial aggregations). The fluxes are saved as forcing files for running the channel+bucket submodel used in our assimilation 135 approach. To generate these fluxes, the full NWM is first run once with its own set of atmospheric boundary conditions.
The use of the one-way coupled submodel means that no error covariances with the "upstream" components of the model will be considered (e.g. soil moisture, surface head, etc) and that the control vector will consist of two spatially distributed states, streamflow and groundwater bucket head. Reservoir states, embedded in the stream network calculations, are not considered in the state updating.
This open loop run was then continued from 2018-07-01 through 2018-10-15 with the NWM operational analysis and assimilation cycle forcings which were collected in real-time from NOMADS (NOAA Operational Model Archive and Distribution System). This real-time forcing product is based on MRMS (Zhang et al., 2016) gauge-adjusted and radar-only observed precip-145 itation products along with short-range RAP and HRRR products (Benjamin et al., 2016, see https://water.noaa.gov/about/nwm).
For the period 2018-08-01 through 2018-10-15, fluxes were saved for forcing the channel+bucket submodel in the data assimilation experiments. Fig. 2 shows these fluxes as inputs to the data assimilation system. Initial states for the data assimilation experiments were also taken from the full model run on 2018-08-01.

150
The NWM implements Muskingum-Cunge (M-C) streamflow routing with variable parameters (e.g., Ponce and Yevjevich, 1978) in a compound channel (Garbrecht and Brunner, 1991). M-C with variable parameters is a common approach to streamflow routing over large watersheds and has been successfully applied in many instances. The compound channel (Fig. 3) provides a lower trapezoidal channel and an upper rectangular channel section to simulate overbank flows. M-C is applied to the stream channel network derived from NHDPlus version 2, shown in Fig. 1, with trapezoidal channel geometry and 155 Manning's N (roughness) parameter values for each reach.
The one-dimensional storage (S) relationship between inflow (I) and outflow (O) on a reach (spatial segment) is given by: with storage coefficient K and weighting factor X. Formulated as a finite difference over a reach, this yields an explicit solution: for the current out flow (O k , k denoting the time step index) as a function of previous and current inflows (I k−1 and I k , respectively), previous outflow (O k−1 ), and the lateral inflows (combined overland and subsurface, L) to the reach. The coefficients in equation 2 can be found in the literature, expressed as combinations of the respective Courant and Reynold's numbers: where ∆t is the time step, ∆x is the reach length, s is the reach slope, c is the celerity and q is the unit discharge (discharge divided by the top width of the flow). Also shown are the relationships to K and X parameters.
The assumptions of M-C approach do not allow for backwater effects in the solution. However, the M-C variable parameter approach allows nonlinear flood wave dynamics by accounting for the interdependence of the time-varying flow rate and its 170 geometry. Specifically, the celerity and unit discharge used for calculating the coefficients in Equation 2, depend on the flow (Q) and its area (A) and top width (b) which are mediated by the channel geometry and roughness parameters in each reach. These parameters, some of which are shown in Fig. 3, are 175 described in more detail in Section 2.6. The equation for celerity can be solved from Manning's equation for uniform flow. Garbrecht and Brunner (1991) solve the celerity equation for the case of the compound channel shown in Fig. 3. The variable parameter approach is an iterative solution, updating the flow and its geometry in alternate steps, to converge on a physically consistent discharge-geometry solution. The implementation in the NWM follows the "secant method" which takes a high and a low departure from an initial water depth and iterates through the equations of geometry and flow until the calculated flow 180 rates converge within some threshold. Before the flow rates converge, their differences are used to reduce the discrepancy in the estimated water depths.
We note that the NWM makes a "short timestep approximation," I k = I k−1 , to eliminate the spatial/topological dependence at the current time and render the solution of the M-C method embarrassingly parallel.
Reservoir objects embedded into the NWM routing network accept fluxes from the streamflow network and from the over-

185
land and subsurface routing model on adjacent grid cells. Water is discharged to the stream network via equations for both weir and orifice flow in the NWM "level pool" scheme. Because we do not include the reservoir level in the assimilation state vector, the reader is referred to Gochis et al. (2020) for further details.

Groundwater Bucket Model
Even when lateral routing processes are included in hydrologic modeling, deficiencies in soil and aquifer data and model where A is the bucket area. The finite capacity of the bucket is expressed in terms of a maximum head, z max , a tunable parameter. When the current head exceeds this threshold, the Q spill term becomes non-zero, discharging all excess head in a single timestep.

205
if z k <= z max then Head up to and including the z max results in bucket discharge following an exponential equation, containing two additional tunable parameters, E (unitless) and G (m 3 /s):

215
The spill discharge and the exponential bucket discharge are finally combined to give the total bucket outflow at the current time step ( O k ) and the depth of water corresponding to Q exp is removed from the bucket.
Calibration of the bucket parameters (in advance of NWM version 2.1 calibration) yielded the following spatially uniform 220 bucket parameters used in this study: C = 0.005, E = 7.1244, and z max = 15.6476.

Sources of Uncertainty: Ensemble Design
We construct an ensemble of 80 members. This number was selected to balance computational demands and statistical performance. We did not investigate the effect of ensemble size on the results within this study. Incorporating different sources of uncertainty into the ensemble is necessary to create variability and to obtain a good estimate of the background error co-225 variance. Background error covariances are considered amongst and between the spatially distributed streamflow and bucket states (O k and z k , respectively). We produce error distributions in these states through a priori error distributions on: 1) stream channel parameters, 2) forcing fluxes to the channel reaches and 3) forcing fluxes to the buckets.
The error distribution imposed on the streamflow channel parameters is time invariant and unaffected by the state update.
This kind of error source is termed as "multiphysics" (Berner et al., 2011), meaning that each member runs a different physical

USGS Streamflow Observations
The USGS streamflow observations used by the NWM are provided along with its output in near-real-time on NOMADS.
The streamflow observations in these files, which correspond very closely to the values assimilated by the NWM, are always "provisional" because they are near-real-time and they are subject to revision until they have been thoroughly assessed. For this study, we collected NWM observation files as well as revised values from the USGS's NWIS many months after the time 250 period of this study. As expected, there were significant revisions to the streamflow values in the months following Hurricane Florence. While it is important to study the differences between the provisional and approved data, here we chose to focus exclusively on the revised observations. However, we note that the difference between these observation sets had a significant impact on our results. Fig. 1 shows the 107 USGS streamflow gauges used for evaluation in this study in green and red. The names of the gauges in 255 red are given in the caption as these locations are specifically called out in the results. All stream gauges considered in this study have their contributing area entirely contained within this subdomain. All experiments presented here use the heteroskedastic error model of 20% of the observed flow for the observation error (m 3 /s). This is certainly a simplistic approach, but the magnitude is roughly in line with previous studies (e.g. Coxon et al., 2015). We note that while important, the observation error plays a somewhat secondary role in the quality of the assimilation, particularly given the application of inflation. model: where T is the DART state consisting of the streamflow and the bucket distributions. The superscript i is the number of the ensemble member, N e is the ensemble size, f denotes forecast (prior) and a is the analysis (posterior). The func-270 tion M refers to the WRF-Hydro submodel. θ and γ denote a set of physical parameters and input model forcings (described in Section 2.6), respectively. As the data become available, DART assimilates the observations serially and applies an EAKF update (Ensemble Adjustment Kalman Filter, Anderson, 2003) as follows:

275
The observation increments, ∆y (i) , are first computed and then used to obtain the increments to the state, ∆x j . σ xy denotes the prior covariance of the observed variable, y, and the j th element in the state vector, x. The total number of elements in the state is denoted by N x . The sample variance of the observed variable is σ 2 y . A localization coefficient, 0 ≤ α ≤ 1 is used to limit the impact of spurious correlations in the update. α is computed as a function of the distance between observation and state variables given a predefined correlation structure.

280
Streamflow gauges are available at the location of the state variables, and assumed representative of the model element to which they are associated. This makes the (forward) observation operator linear and equal to the identity matrix, significantly simplifying the implementation of the update step in DART. Variance underestimation is tackled through covariance inflation such that the ensemble right after the forecast or analysis steps is inflated around its mean: 285 where x j is the j th element of the ensemble mean and the notation f |a is used to refer to either forecast or analysis ensemble.
The inflation factor √ λ (typically larger than 1) yields a sample covariance matrix scaled by λ.

Along-The-Stream Localization
It is well-recognized that the use of small ensemble sizes produces imperfect sample covariance matrices (e.g., Houtekamer and Mitchell, 2001). In fact, with a small ensemble the probability density function of the state remains only partially explored 290 which can possibly yield loss of information and even filter divergence. In addition, the sample covariance would generally be contaminated with spurious unrealistic correlations that may degrade the quality of the Kalman update.
To overcome these issues, we resort to using covariance localization. The idea is to taper any spurious correlations between variables that are physically far from each other and are possibly uncorrelated, using α in equation (16). Studies have shown that given the Euclidean distance between different variables, a correlation function could be utilized to compute a localization 295 factor, α. In the present study, a simple Euclidean distance could be inappropriate in many circumstances. For example, reaches from two different watersheds could be physically close but be highly unrelated, particularly in terms of their error correlations.
To this end, a topologically based localization strategy that adheres to the river network structure is applied. We introduce the Along-The-Stream Localization (ATS Localization) strategy. The idea is that only the reaches upstream and downstream from an particular observation are considered during the update (Fig. 4). The localization factor, α, is computed using a selected 300 functional form (e.g. Gaspari-Cohn, boxcar, or ramped-boxcar, see inset Fig. 6) which depends on the distance between any two reaches and the tunable localization radius, r.
ATS localization highlights some key features: (i) Downstream from each observation, we assume that the flow of information only travels downstream and not also back upstream. As such, we obtain tree-like shapes where the number of close km clearly demonstrate this feature.

310
The proposed localization method shares a lot of similarities with that of Emery et al. (E20, 2020b). The fundamental difference is that we are tackling sampling errors in the forecast error covariances with each assimilation cycle. Our sample covariances are computed using the evolving ensemble unlike E20 in which the authors use time-invariant covariances. Given that the hydrologic model used in E20 is linear, their system is technically an optimal interpolation with fixed error statistics.
Localization in this context is used to address structural errors of the covariances and to compensate for time-invariant co-315 variances. Another important difference is that our ATS localization approach can ensure that the impact of the observation decreases as the distance from the observation, both upstream and downstream, increases. For example, the fifth-order polynomial function of Gaspari and Cohn (1999) can be used to find the localization coefficients or other functional forms can be used. In E20, on the other hand, all reaches that are close to the observation are assigned the same weight (i.e., α = 1).   Such a performance, however, is inconsistent in space as can be seen at Tar River (Rocky Mount).

Tuning localization parameters
Overall, the best performance is obtained using r = 100 km. This was confirmed not only at the diagnosed gauges in Fig.   330 5, but also at the majority of the other available gauges in the domain (not shown). Our analysis suggests that larger radii generally give rise to spurious correlations and smaller ones limit the amount of useful information, both of which degrade the quality of the streamflow estimates.
The effect of the choice of the correlation function, used in the ATS localization scheme, is also investigated. We compare 3 different functions: Gaspari-Cohn, simple Boxcar (similar to E20) and a Ramped-boxcar. The resulting prior ensemble mean 335 from each scenario is evaluated at all 107 streamflow gauges in the domain and the results for r = 100 km are summarized in the Taylor plot (Taylor, 2001) of Fig. 6. The diagram is useful to quantify the degree of correspondence between the gauge observations and the prior streamflow estimates in terms of 3 statistics: the Pearson correlation coefficient, centered root mean squared error, and the standard deviation. We note that boxcar and ramped-boxcar estimates produced erroneous results at few (around 7) gauges and these results had to be removed from the diagram for visual purposes. Averaging over all Boxcar and Ramped-Boxcar functions only outperform Gaspari-Cohn for small localization radii (e.g., r = 50 km), however, the results produced with Gaspari-Cohn and r = 100 km have the best accuracy. This configuration, as a result, is selected and 345 used in all other results shown in this study.

ATS Comparison to Euclidean Distance-based Localization
This section compares the proposed topologically based ATS localization to the regular Euclidean distance-based localization.
Instead of searching for close streams on the river network as in the previous section, the regular approach looks for close by reaches with a circle given a prespecified localization radius. Five different localization radii; namely r = 1, 2, 5, 10, 20 km are tested. The resulting streamflow estimates are summarized and compared to the ATS (100 km) run, for two gauges, in Table 2.
The two gauges, shown in Fig. 1, are selected such that the performance is assessed at relatively low (i.e, Tar River at Tarboro) and high (i.e., Deep River at Moncure) streamflow regimes.
Among the five experiments that use regular localization, the best performance is suggested using r = 10 km. As r decreases below 10, the quality of the prior and posterior streamflow estimates diminishes. For instance, the time-averaged prior RMSE 355 for r = 10 km and r = 1 km at Tar River is 8.86 cms and 34.323 cms, respectively. It is also noticeable that smaller localization radii yield large prior and posterior ensemble spread. This happens because the tiny localization radius tends to limit the impact of the data during the update and hence the shrinkage of the uncertainty around the ensemble mean gets restricted. For r = 20 km, the performance strongly degrades at Tar River. For example, the posterior bias is shown to grow from -0.74 cms using r = 10 km to -11.41 cms for r = 20 km. The reason for such a behavior is that with a radius of 20 km, streamflow gets 360 falsely updated with information from nearby basins. Although these basins are physically close, however, they are governed by different flow regimes. The same is not true at Deep River and that is because the basin which Deep River belongs to is much larger and thus a localization radius of 20 km cannot contaminate the streamflow as we described at Tar River. Increasing the localization radius beyond 20 km yielded catastrophic updates for the streamflow and the subsurface bucket state. In fact, all DA experiments run with regular localization and r > 20 km failed at different stages (typically 2 weeks into the run).

365
Prior and posterior streamflow results obtained using ATS localization are significantly better than those with the regular localization. Unlike regular localization, using the proposed ATS approach we are able to increase the effective search radius because the algorithm adheres to the physical aspects of the streamflow problem. Compared to the 10 km regular localization run, ATS produces at least 40% more accurate (in terms of RMSE) streamflow estimates. This is consistent for all 107 gauge locations. Because the algorithm allows the use of large localization radii, ATS scheme further yields more certain estimates 370 (smaller spread) than those that use regular localization.

Inflation
Variance underestimation in ensemble Kalman filters is a common issue that usually happens in the presence of large sampling errors and model biases (Furrer and Bengtsson, 2007). Sampling errors are the result of using a limited ensemble size. Model biases are deficiencies in the model causing predictions to be far from the observations. Other sources of errors that might 375 degrade the performance of the filter include non-Gaussianity (Anderson, 2010), systematic errors in the observational operator and representativity errors (Hodyss and Nichols, 2015). In practice, studies have shown that when model biases exist they tend to dominate other errors in the system (e.g., El Gharamti, 2018) and thus, treating model errors is often prioritized.
In this section, we consider three approaches to deal with the issue of variance underestimation: prior inflation (PR-inf), shown to produce the best results in application to atmospheric general circulation models.

385
The algorithm used to compute the inflation is adaptive in time, based on Bayes' theorem as in eq. (18), and results in spatially varying inflation fields.
The algorithm assumes the inflation to be a random variable with an inverse-gamma prior distribution p (λ).   averaged RMSE values are slightly better than those obtained using the PR-inf run.

405
At the downstream gauge (near Goldsboro as shown in Fig. 8), the discharge is almost 4 times larger than the upstream gauge and the overall model fit to the data looks better. Towards the end of the flooding event (around Sep. 30 th ), PO-inf better delineates the data compared to the NO-inf case. However, a false modeled flood wave appears during this time in

Choosing the best inflation
The results shown in figures 7 and 8 clearly demonstrate the usefulness of prior inflation at mitigating model biases.  in turn helps the filter reject less data (9.6% compared to PR-inf run's 15.1%) and produce higher quality estimates. The same behavior was observed at a few other gauge locations in the domain (not shown).
Computationally, combining both adaptive prior and posterior inflation schemes is more expensive than running each scheme alone. Our experiments suggest that the extra wall-clock time required to perform a full PP-inf run is around 20% of the total computing time required by PR-inf or PO-inf. In the current framework, the higher complexity is not found prohibitive 430 especially when one takes into account the performance benefits that PP-inf provides. As a future study, it would be interesting to run other PP-inf cases with smaller ensemble size -to match the cost of the PR-inf run -and investigate the performance.

Inflation in space
The adaptive inflation varies spatially. With each cycle a different inflation factor is assigned to each value in the state vector.
Using cross-correlations in the joint covariance, inflation is therefore computed not only for streamflow but also for the bucket 435 portion of the state. Fig.10 maps the prior inflation for both streamflow and bucket obtained using PP-inf run. The displayed inflation field is an average over all fields obtained during the flooding period; i.e., between Sep. 12 and Sep. 18. Because of the localized update, the displayed inflation patterns generally follow the tree-like localization shapes (Fig. 4). Inflation values tend to increase near the observation locations and decrease away from the gauges. This is why many reaches, especially in the north east part of the domain, have no inflation (i.e., √ λ f = 1). Given the hourly assimilation of streamflow data, bucket 440 inflation values are relatively smaller than the streamflow ones. Streamflow inflation at more than 90% of the reaches do not exceed the value of 2. Reaches with very large inflation values are located in densely observed areas, the inflation helps restore ensemble spread after multiple, sequential state updates results in loss of spread.

Overall Assessment
Prior to the the hurricane landfall on Sep. 14, streamflow estimates of the model appeared relatively good. The major differences 445 between observed and modeled streamflows resulted from the hurricane. The impact of DA prior to the hurricane is marginal.
To investigate this further, we show posterior streamflow maps on Sep. 13, 15 and 17 in Fig. 11 (top row panels). We also estimates are 56% and 90% more accurate than the open loop. Such a significant enhancement is obtained due to a massive inflation that gets applied to the prior streamflow ensemble. As shown in the right panel of Fig. 12, the inflation mean on Sep.
how powerful the adaptive inflation algorithm in tackling large biases in the model. It is important to note that if the inflation variance was fixed in time, then the inflation mean will not have the room to grow as much and hence the fit to the observed 470 discharge will not be as good. The posterior inflation mean values during the flood (not shown) were ranging between 1 and 2. In terms of ensemble spread, due to inflation the DA estimates are almost 2 orders of magnitude larger than the open loop during the flood. The posterior ensemble spread is consistently smaller than the prior given the continuous hourly shrinkage caused by the Kalman update.   This study presents two main contributions within a generalized ensemble DA framework for hydrologic systems, particularly those defined on irregular grids such as a stream network. First, a topologically based "along-the-stream" (ATS) localization is shown to improve information propagation during the model state. Localizing the impact of the update miti-505 gates sampling errors due to undersampling as well as other analysis errors. Moreover, ATS localization specifically eliminates error-covariances between unconnected streams. The algorithm requires tuning of a localization radius and we do not attempt to diagnose a physical basis for estimating the optimal radius a priori (as such a discussion should probably include estimation of temporal error covariances not considered in this study). However, ATS localization was found to produce results significantly better than the regular Euclidean distance-based approach. The improved results stem in part from a larger localization 510 radius under the ATS approach, indicating more effective propagation of the observations in the update along the stream than through Euclidean space. While the ATS approach does not further the cause of "predictions in ungauged basins", it indicates that further research into novel localization strategies for streamflow DA may bear additional fruit. On this point, we note that the impact of the ATS localization strategy on the results of this study relative to the impact of adaptive inflation and bias correction is remarkably larger than would be expected in application to atmospheric DA.

515
The second major contribution of our study is to demonstrate utility of spatially and temporally varying adaptive inflation (El Gharamti, 2018)  The most challenging aspect of the hydrologic DA is the problem of model biases or errors. These biases are usually associated with inaccurate boundary conditions (e.g. precipitation), uncertain parameters (e.g. channel roughness and slope) 535 or model physics deficiencies. This study has shown that adaptive inflation can prove effective at handling biases in the data assimilation. Apart from inflation, a handful of other techniques can be performed to mitigating bias issues. Jointly estimating highly uncertain model parameters alongside the state is an approach commonly found in hydrology (e.g. Vrugt et al., 2006;Gharamti et al., 2015;Abbaszadeh et al., 2018;Ziliani et al., 2019). Updating parameters often increase the complexity of the DA framework (nonlinearity often increases in state-parameters estimation problems) and the computational cost may become 540 prohibitive, especially for spatially varying parameters. Yet, such an approach may yield improvements to the analyses of this study. The multiphysics approach considered here aims to incorporate uncertainty in the fixed boundary condition (geometry, roughness parameters) into the ensemble in order to better model the background error covariance. A combined multiphysics and joint parameter estimation approach might also be pursued. Uncertainty of updated parameters tends to dissipate in time and may be more appropriate for certain kinds of conceptual model parameters instead of those considered in our multiphysics 545 approach. Further up the model chain, not considered in this study, running WRF-Hydro with a land surface would allow for updating of soil moisture and surface head states. Instead of treating deterministic fluxes with parameterized noise, introducing these prognostic variables would provide the ability to adjust the fluxes coming to the channel. Many studies have tried this and remarked on the problematic updating of soil moisture from streamflow due to the highly nonlinear relationship between the states, particularly for flood forecasting applications (Rakovec et al., 2015). While expanding the prognostic states of the 550 model may potentially improve aspects of the flood prediction problem, such as overland and subsurface fluxes to the channel routing configuration, it is possible that shifting the boundary conditions up the model chain may result in a similar bias issue with more degrees of freedom in the state vector. Coupled atmospheric and hydrologic DA would be a further step towards updating the prognostic states causing hydrologic errors in the state vector. These are ideas to be pursued in future studies.
An essential DA ingredient this study did not cover is Gaussian anamorphosis (Simon and Bertino, 2009; Gaussian statistics, it becomes more appropriate to transform streamflow to a Gaussian space where the update is performed and then it can be pulled back to the physical space. This is well known in hydrological applications (e.g. Clark et al., 2008). Such a transform guarantees that the updated streamflow does not consist of any unphysical (i.e., negative) values. The transformation is often conducted using empirical functions or analytical ones such as the natural logarithm. This will be investigated in a 560 follow-up study.
Finally, one-hour ahead (prior) forecasts of flooding event were the focus of this study. Future research will study the impact of DA in the whole forecast time window up to 18 hours in the short-range forecasts, and expand the DA application to medium-and long-range forecasts including additional hydrologic components and observations. The functionality of the ATS localization and inflation may change in different forecasting modes. For instance, longer localization radii could be found 565 more desirable in a long range forecast.
Code availability. The data assimilation code used in this study is openly available as part of the DART repository (master branch) on GitHub; https://github.com/NCAR/DART/tree/master/models/wrf_hydro. The model code is also freely available and can be accessed through GitHub; https://github.com/NCAR/wrf_hydro_nwm_public.
Author contributions. MEG developed the localization and the inflation algorithms, ran the DA experiments and wrote more than 60% of 570 the manuscript. JLM developed the channel+bucket submodel, the python framework that configures and runs wrf_hydro with DART, and wrote Sections 1 and 2. SJN contributed to the introduction Section and the discussion of the results. TJH provided significant help with the DART code. AR helped retrieve the observations and the precipitation data. BKJ built an OSSE framework that provided helpful insights to the real DA system.
Competing interests. The authors declare that they have no significant competing financial, professional, or personal interests that might 575 have influenced the performance or presentation of the work described in this manuscript.
Acknowledgements. The authors would like to thank Jeffrey Anderson and David Gochis for fruitful discussions. We are grateful to Nancy Collins for her assistance with coding the recursive search algorithm of the localization scheme. We also would like to acknowledge highperformance computing support from Cheyenne (