Variational assimilation of remotely sensed flood extents using a 2-D flood model

A variational data assimilation (4D-Var) method is proposed to directly assimilate flood extents into a 2-D dynamic flood model to explore a novel way of utilizing the rich source of remotely sensed data available from satellite imagery for better analyzing or predicting flood routing processes. For this purpose, a new cost function is specially defined to effectively fuse the hydraulic information that is implicitly indicated in flood extents. The potential of using remotely sensed flood extents for improving the analysis of flood routing processes is demonstrated by applying the present new data assimilation approach to both idealized and realistic numerical experiments.


Introduction
Flooding poses a significant threat to human society.Nowadays, floods are becoming more frequent as a result of intensive regional human activities and environmental change.Hydraulic or hydrodynamic models have become reliable and cost-effective tools to analyze and predict flood routing through catchments, rivers, and floodplains.These models can provide dynamic outputs (e.g., inundation area, water depth, and/or flow velocity) for flood warning and risk assessment.Nevertheless, models are not perfect, and uncertainties and computational errors may arise from various sources, including the uncertainties associated with hydrological parameters, initial and boundary conditions, as well as numerical errors as a result of numerical discretization and mathematical approximations (Le Dimet et al., 2009;Pappenberger et al., 2007a).In order to reduce prediction errors or uncertainties, field measurements are usually used to verify and calibrate a model before applying it to make predictions.Traditional trial and error approaches are commonly used in model calibration, but they are known for being subjective and tedious (Ding, 2004).Therefore, in order to make a better prediction, it would be more beneficial to have more intelligent calibration methods achieved by fusing a dynamic flood model with observed information to obtain an optimal estimate of model states and parameters.
Data and model fusion methods are termed "data assimilation", which stems from meteorology and oceanography (McLaughlin, 2002;Reichle, 2008;Wang et al., 2000).The variational data assimilation method, also called the "4D-Var method", is based on the optimal control theory of partial differential equations, which offers a powerful tool for data assimilation (Le Dimet and Talagrand, 1986;Talagrand and Courtier, 1987).As well as its operational application in meteorology and oceanography, this method also attracts great attention of hydrological society.It has been widely applied to assimilate in situ and remotely sensed hydrological data from multi-sources into the runoff-rainfall model and land surface model (Bateni et al., 2013;Le Dimet et al., 2009;Lee et al., 2012;Reichle, 2008).Also, it has been successfully applied to improve the predictive capability of 1-D and 2-D hydraulic models (Atanov et al., 1999;Bélanger and Vincent, 2005;Ding, 2004;Honnorat et al., 2007Honnorat et al., , 2009; Roux and Dartus, 2006).
Published by Copernicus Publications on behalf of the European Geosciences Union.

X. Lai et al.: Variational assimilation of remotely sensed flood extents
In river hydraulics, the available measurements commonly include water stage (level) and discharge at hydrological stations, and velocity at gauging points.These measurements are generally sparse, even for those study areas with decent monitoring systems, and are therefore likely to be insufficient to support reliable model calibration.During a flood event, the available measurements may be even scarcer due to malfunctioned operation of some monitoring systems under extreme flow conditions and the difficulty in performing field surveys.Fortunately, rich sources of remote sensing data with different spatial and temporal coverage now become increasingly available.Remote sensing imagery provides spatially distributed information about flood states, which is hard to obtain from the traditional point-based field-measuring approaches (Hostache et al., 2010).As a whole, due to their low cost and large coverage, remotely sensed data are now becoming an important source of measurements, and they are widely applied to flood monitoring and loss evaluation for flood hazards (Pender and Néelz, 2007).Furthermore, recent intensive research -such as the direct estimation of hydraulic variables (e.g., flow discharge and water stage) from satellite imagery, the use of remote sensing data to calibrate and validate models, the fusion of these data with dynamic models using data assimilation methods, among others -has significantly contributed to the advances of integrating remotely sensed data from space with flood models (e.g., Schumann et al., 2009;Smith, 1997).
Substantial efforts have been made using the 4D-Var and Bayesian-updating methods to demonstrate the potential of assimilating remotely sensed data from space for improving flood prediction (Andreadis et al., 2007;Durand et al., 2008;Giustarini et al., 2011;Komma et al., 2008).Roux and Dartus (2006) attempted to determine flood discharge from a remotely sensed river width using a 1-D hydraulic model.In 2-D river hydraulic modeling, 4D-Var methods have been developed to assimilate spatially distributed water stage (Lai and Monnier, 2009) and Lagrangian-type observations; e.g., remotely sensed surface velocity (Honnorat et al., 2009(Honnorat et al., , 2010)).Hostache et al. (2010) employed a 4D-Var method to assimilate the water stage derived from a RADARSAT-1 image of the 1997 Mosel River flood event in France into a 2-D flood model to improve model calibration.Water stage can be indirectly derived from satellite imagery or directly measured by satellite altimetry.The accuracy of indirect water stage retrieval from satellite imagery is typically in a range of 40-50 cm (Alsdorf et al., 2007;Hostache et al., 2010;Matgen et al., 2010).Simple overlay analysis of a Digital Elevation Model (DEM) and a flood-extent map may lead to high errors on the order of meters, even when a 30 m resolution ERS ASAR (Advanced Synthetic Aperture Radar) image is used (Brakenridge et al., 1998;Oberstadler et al., 1997;Schumann et al., 2011).Generally, additional steps must be performed in order to obtain an acceptable estimation of water levels for using with hydrodynamic modeling.The complexity of these steps varies with the meth-ods being applied (Matgen et al., 2007(Matgen et al., , 2010;;Raclot, 2006;Schumann et al., 2007).For instance, Raclot (2006) and Hostache et al. (2010) used a hydraulic coherence constraint to minimize the estimation errors.Schumann et al. (2007) proposed a Regression and Elevation-based Flood Information eXtraction model (REFIX) for water depth estimation and later suggested an alternative for deriving water level from river cross-section data (Schumann et al., 2008).Therefore, the derivation of water level from flood extent with acceptable accuracy is not a straightforward procedure.
Inland water level can also be directly measured from satellite altimetry that is originally developed for open oceans.The database of altimetric water level for about 250 sites on large rivers in the world has been developed based on satellite altimetry missions (http://www.legos.obs-mip.fr/en/soa/hydrologie/hydroweb/). For oceans and great lakes, the accuracy of estimating water level may reach a few centimeters (Fu and Cazenave, 2001;Crétaux and Birkett, 2006).For rivers and floodplains, the retrieved water level data quality is highly variable (Santos da Silva et al., 2010), most typically at 50 cm (Alsdorf et al., 2007).However, despite its relative high accuracy for large inland water bodies, compared with the indirectly retrieved water level, the present in-orbit satellite altimetry (four satellites includingSaral/AltiKa, Jason-2, HY-2, and Cryosat-2) is still problematic because of the spatial and temporal resolutions and coverage for sampling relative small water bodies.It essentially provides only spot measurements of water level (Alsdorf et al., 2007).To improve this, an exciting satellite mission called the Surface Water and Ocean Topography (SWOT) using swath-based technology has been proposed and will be launched for accurate monitoring of inland water bodies (https://directory.eoportal.org/web/eoportal/satellite-missions/s/swot).The SWOT mission provides great potential and new opportunity for data collection in the near future (in 2020).However, currently, the rich optical and Synthetic Aperture Radar (SAR) images will still be the main sources of remote sensing data for monitoring floods.Therefore, it is still of great interest to investigate the combined assimilation of the currently available multi-source satellite data.
In contrast to water stage, the remotely sensed flood extent can be directly derived from satellite imagery without affecting the original resolution (for example, 30 m for Envisat ASAR and 250 m for MODIS data), which is comparable to the mesh size normally adopted in flood modeling.Various simple and mature approaches are available for rapid and automatic extraction of a flood-extent map from optical and SAR imageries (Matgen et al., 2011;Smith, 1997).However, to the best of our knowledge, there has been no attempt at the direct assimilation of flood-extent data into a 2-D dynamic flood model using a 4D-Var method to date.
Herein, we attempt to use a 4D-Var method to assimilate remotely sensed flood-extent data into a dynamic flood model based on the numerical solution to the 2-D shallow water equations (SWEs) 4D-Var is a method based on the optimal control theory of a physical system governed by partial differential equations (Le Dimet and Talagrand, 1986).It allows us to perform flow state analysis or prediction of a system by combining a physically based dynamic model with observations.To implement a 4D-Var, a cost function must firstly be defined to measure the discrepancy between the computational results and observations.The cost function J over the time interval from t = 0 to t = T without regularization terms may be given as where p is the control vector, • is the Euclidean norm, H is the observation operator that maps the space of the state variables to the space of observations, U is the vector of state variables, W is the error covariance matrix, and O is the observed data.Herein, the statistical information can be incorporated into the norm through the error covariance matrix W. 4D-Var can be considered as an unconstrained optimization problem that seeks an optimal control vector p * to minimize the cost function J (p) in Eq. (1).According to the optimal control theory, optimum conditions are reached if the gradient J = 0, which means that an optimal control vector is obtained and the optimal flow analysis results are closest to the true (measured) state.This optimization problem may be solved by a descent-type algorithm, and the quasi-Newton minimization subroutine M1QN3, developed by Gilbert and Lemaréchal (1989), is adopted in this work.The algorithm calculates the gradient of the cost function -i.e., the vector of its partial derivatives with respect to each of the control variables -, which may be efficiently performed using the adjoint method, as described in Sect.2.3.

2-D shallow water equations
The 2-D SWEs are widely used to approximate flood routing over a floodplain.They can be written in a conservative form as follows: where x and y represent the Cartesian coordinates, t is the time, U = (h, hu, hv) T = (h, q x , q y ) T is a vector containing the flow variables, with h being the water depth and u and v the two velocity components, F = (hu, hu 2 + 0.5gh 2 , huv) T and G = (hv, huv, hv 2 + 0.5gh 2 ) T are the flux vectors in the x and y directions, g is the gravitational acceleration, B = [0, gh(S 0x -S f x ), gh(S 0y − S fy )] T is the vector of the source terms, S 0x = −∂Z b /∂x and S 0y = −∂Z b /∂y are the two bottom slopes, with Z b denoting the bed elevation, and S f x = n 2 q x h −7/3 q 2 x + q 2 y and S fy = n 2 q y h −7/3 q 2 x + q 2 y are the two friction slopes in x and y directions, respectively, with n being the Manning roughness coefficient.Given initial and boundary conditions, the flood routing process over a floodplain may be numerically predicted on different temporal and spatial scales by solving the above governing equations.

Adjoint governing equations
The adjoint method, based on an optimal control theory (Le Dimet and Talagrand, 1986), is usually applied to compute the gradient of the cost function, owing to its computational burden, independent of the dimension of problems (Cacuci, 2003).The adjoint equations for the 2-D SWEs can be derived for the cost function in Eq. (1), as follows: where the adjoint variable U * = (h * , q * x , q * y ) T and the coefficient matrices are given by

X. Lai et al.: Variational assimilation of remotely sensed flood extents
The partial derivative of the cost function J corresponding to the control vector p is a simple function of the adjoint variables U * , which can be found in Lai and Monnier (2009).
Adopting the adjoint equations in gradient computation significantly reduces the computational cost because evaluation of the adjoint variables requires only one backward integral in time.Once the adjoint variables are known, the partial derivatives of the cost function with respect to the control variables can be computed in a straightforward way.

Forward model and adjoint model
The 2-D SWEs in Eq. ( 2) are discretized using a finite volume Godunov-type scheme with the inter-cell mass and momentum fluxes evaluated using the HLLC (Harten-Lax-van Leer-Contact) approximate Riemann solver (Toro, 2001).The scheme has first-order accuracy in space but provides high-resolution representation of flow discontinuities.Time discretization is achieved using an explicit Euler scheme.Readers may consult Honnorat et al. (2007) for a more detailed description of the shallow flow model, which is referred to as the "forward model" herein.
The adjoint model is developed by directly differentiating the source codes of the forward model that solve the 2-D SWEs in Eq. ( 2).The automatic differentiation tool TAPE-NADE (Hascoët and Pascual, 2004) is adopted in this work to generate the reverse codes.This method, based on source codes, helps to build a consistent adjoint model corresponding to the forward solver.

Cost function for flood-extent assimilation
As mentioned previously in the introduction, the flood extent can be derived from satellite imagery more directly and easily than the water stage.However, the flood extent is not a state variable in the 2-D SWEs, but basically the union of pixels, where water depth is not 0. Therefore, it has no explicit relationship to the state variables.As a consequence, it is difficult to define a cost function to implement the assimilation of flood extent in the framework of 4D-Var.In this work, we implement the assimilation of flood-extent information into a 2-D dynamic flood model through an implicit way.
If we assume a function f as an observable quantity, the cost function may be defined as in which, the regularization terms are neglected from the above cost function to facilitate simplified but more informative verification and validation of the proposed method, and they allow direct investigation of the potential benefit of assimilating flood-extent data.
To determine the cost function for assimilation of the hydraulic information, including implicitly in the remotely sensed flood-extent data, a specific form of f should be introduced.Here, we define the flood extent related quantity f as a function with regard to state variables of water, U , namely where A is a matrix with regard to water depth that describes the wet-dry status, namely flood-extent information.
Normally, the wet-dry status of a computational cell can be determined by its water depth, h.It is dry if water depth is 0; otherwise, it is wet.However, a finite threshold (critical value) of water depth, h c , must be defined at water boundary in real-world problems.This is essential to minimize the effects of the disturbances from different land covers, the resolution of the image, and other sources of uncertainty, as suggested by Aronica et al. (2002).It should be noted that the matrix, A, describing the wet-dry status of the computational cells, should be determined according to the difference between the predicted water depth and h c so as to keep the consistence with the observed flood-extent data derived from imagery.The matrix A can be simply obtained as in which The above expression shows that the matrix A dynamically changes with the flood routing.
For the flood-extent observation derived from satellite images, the matrix A obs in f obs is an error matrix of observation describing wet-dry status information.It should be determined by the specific method for extracting flood extent. in which, a ii represents the wet-dry status or the degree of certainty of a pixel being wet in a remotely sensed image.Uncertainty in the observed flood extent can be determined by, e.g., using the fuzzy set approach (Pappenberger et al., 2007b).In the positions with high uncertainty, a ii will be assigned by a very low certainty degree.Low certainty lets the extent information in these positions take little effect on the estimate of flood states.
A normalized weight, w (ranging from 0 to 1), is introduced in this work to describe this certainty.As shown in Fig. 1, w = 1 indicates a pixel being definitely wet, and w = 0 denotes a pixel being absolutely dry.The value in between is given according to the level of certainty of a pixel being wet.The observed flood-extent map can then be depicted in a 2-D raster format with pixel values equal to w (Fig. 1).When observations are used, they should be mapped into the model space by an observation operator.
Assuming f = Ah, where U = h, we can interpret f as a physically meaningful variable; i.e., a unit water volume.In a view that the weight w in A represents the certainty of a cell being wet, deriving from observations but not the certainty of observed water depth, it is better to be used to constrain the discrepancy of predicted and observed water depth when defining cost function.For those overlapping regions between the predicted and observed extents, no discrepancy information should be used for assimilation and the corresponding cells should be deactivated in the computation of cost function, because the predicted wet-dry status is always the same as the observed one.Considering that, we further modified the cost function to become The remaining difficulty is to determine the observed water depth.To overcome this, the computational domain is first separated into two parts, as illustrated in Fig. 1; i.e., 1 represents the region with predicted water depth h > h c , while 2 is the area outside of 1 .In either part, the observed water depth is assumed to be identical to the prediction when computing cost function if the wet-dry status of the computed cell is the same as the observation belonging to the same flood extent.It should be noted that this assumption excludes those cells in the overlapping regions between the predicted and observed extents from the computation of cost function.
In those non-overlapping regions, different assumptions have to be made, depending on the specific location under consideration.Inside 1 , the observed water depth is defined to be "0" if the cell under consideration is outside the area covered by the remotely sensed flood extent.As a result, the cost function in 1 may be defined as J 1 = 0.5(1 − w) 2 h 2 , where w is the certainty of flooding, as described in the previous paragraph.Obviously, J 1 decreases to 0 when the predicted and observed extents coincide.Inside 2 , an observed water depth, h obs , is required to construct the cost function in those areas covered by the remotely sensed flood extent.Numerical experiments show that it is feasible to set h obs = 2h to keep a similar gradient along the boundary, which leads to a cost function J 2 = 0.5w 2 (-h) 2 in 2 .J 2 will also decrease to 0 when the predicted and observed extents coincide.Although this assumption seems to be "unrealistic", it is mathematically reasonable in the computation of cost function, and it is effective for assimilating flood extent to drive the assimilation algorithm.
Taking into account all of above considerations, the cost function measuring the discrepancy of observations and predictions over computational domain may be written as  Table 1.The five groups of observations used in the test case of dyke-break flood routing over a flat bottom.

Description of observations
Group A Flood extent at t = 5 s Group B Flood extents at t = 1, 3, and 5 s Group C Z(t), time history of water stage at central position of floodplain (time interval of measurement is 0.2 s) Group D Flood extent at t = 5 s and Z(t) Group E Flood extents at t = 1, 3, and 5s and Z(t) by the forward model for 5 s over the floodplain.Synthetic binary maps of the flood extent and the time history of water stage in the middle of the domain are generated and will be used as observed data during the following numerical experiments.Five groups of observations are obtained, as listed in Table 1, with different combinations of synthetic flood extents and/or the stage hydrograph at the central point.The assimilation window is set to be 5 s (the same as the duration of the forward simulation).Three series of numerical experiments are carried out by controlling n, Q i (t), or both of them, respectively.
In this case, a series of numerical experiments are carried out to verify the model using the accurate synthetic data generated that can eliminate the disturbances of numerical and measured errors encountered in an actual case.

Experiment series A
The control variable of the experiment series A is the distributed Manning coefficient n.Five assimilation experiments are run with the same first guess of n 0 = 0.02 over the whole floodplain, but with different groups of synthetic data being assimilated.In each run, the optimal analysis of flood routing over the floodplain is undertaken and the distributed n is retrieved, as provided in Table 2.
Table 3 lists the root-mean-square (RMS) errors of water depth over the whole computational domain at different output times.For the runs involving the observations of groups A and B, which just assimilate flood extents, the RMS errors decrease by 78 and 94 %, respectively.This is also clearly demonstrated by comparing the flood extents obtained from different runs that assimilate different observations (Fig. 3a).
After data assimilation, the predicted flood extents are significantly improved and agree much more closely with the "observed" extents.The more observed flood-extent data are assimilated, the closer the results become to the "true" state.
In the numerical experiment involving water stage observations (group C), only the stage hydrograph is assimilated, and the RMS errors decrease by 82 % on average.However, the predicted results at t = 3-5 s are significantly different to the true states, which can also be seen evidently due to the difference between the predicted and 'true' flood extents (Fig. 3a).
The results from simulations using observations from groups D and E show that the RMS errors are further decreased by about 95 % after assimilating both the time series of water stage and spatial flood extents.As a whole, by assimilating different synthetic data, different levels of improvement in flood prediction have been achieved during the numerical experiments, which lead to the assimilated predictions that are always much closer to the true state.It confirms that the current assimilation analysis of fusing observed flood extent and relevant information improves the accuracy of flood prediction in both space and time (Fig. 5a).The quality of the assimilated results can also be confirmed from the identified n, as listed in Table 2.The value of n for the first land block can be accurately identified in all of the experiments, regardless of whether flood extent or stage hydrograph is assimilated.However, since the stage hydrograph only provides upstream information, it cannot optimize the values of n for the downstream land blocks 4 and 5. Therefore, the n values remain to be their initial guess in the numerical experiment using the group C observations, which leads to apparent difference between the simulated and true extents after t = 3-5 s (Fig. 3a).

Experiment series B
Taking the inflow discharge as a control variable, we carried out further numerical experiments using the five given ), with R being a random number between 0 and 1, are imposed through the inflow boundary.With the help of the minimization algorithm, the initial guesses of the discharge boundary condition are corrected and the corresponding analysis results after data assimilation are computed.The hydrographs of inflow discharge for numerical experiments using the groups B, C, and E observations are shown in Fig. 4a.They are slightly corrected to minimize the cost function.
The RMS errors of each run at t = 1, 2, 3, 4, and 5 s are listed in Table 3.They decrease by 28 ∼ 32 % for those simulations assimilating the flood extents, but only 5 % for runs just assimilating point-based data provided as the stage hydrograph.Figure 3b compares the predicted and true flood extents.
In this experiment series, it is interesting to note that better prediction over the whole duration and spatial extent (Table 3 and Fig. 3b) is produced by assimilating flood extent, even though poor prediction of water stage hydrograph at the central gauge station is found (Fig. 5b).Assimilation of these data can help to estimate the inflow hydrograph and then increase the assimilation accuracy.On the contrary, pointbased time series data only imply part of the inflow discharge information prior to the propagation time from the inlet to the given points.The inclusion of point-based measurements helps to improve the accuracy of the stage hydrograph at the central station but has no obvious benefit for prediction for the whole duration and spatial extent.

Experiment series C
In the experiment series C, both the Manning coefficient and the inflow discharge hydrograph are controlled.The same initial guesses of n and discharge are used.After running the assimilation model, Q i (t) and the distributed n are corrected to minimize the cost function.Although the discharge hydrograph (Fig. 4b) and n (Table 2) of each run are not well identified, the predictions (Fig. 3c) obtained after assimilating the flood extents are much closer to the true one than those just assimilating point-based measurements.The RMS errors of the runs assimilating the observations of group A, B, C, D, and E decrease by 50, 64, 45, 48, and 41 %, respectively, as listed in Table 3.It is encouraging to observe that almost half of the RMS errors decrease for each run.As in the experiment series B, although the inclusion of point-based measurements improves the accuracy of the stage hydrograph at the central station, no obvious improvement is detected in terms of overall RMS errors.

Flood routing over a complex bottom
A test case involving flood routing over three mounds are selected to further verify the performance of the proposed model under complex circumstances, which is similar to the previous cases (Begnudelli and Sanders, 2007).The channel in this case has a length of 80 m and a width of 15 m (Fig. 6).Three mounds inside the channel are centered at (x, y) = (9.5, 7.5), (25, 3.5), and (25, 11.5), respectively.The first mound at (9.5, 7.5) is a square island with an elevation of 2 m.The second and third mounds at (25, 3.5) and (25, 11.5) are conidial with a height of 0.2 m and their elevation is assumed to decrease linearly along the radial distance from the center at a rate of 1 : 4. The computational domain is discretized into a uniform mesh of a 1 m × 1 m resolution.The channel bed is initially dry.Cases with both lumped and distributed bed roughness are investigated, respectively.A constant Manning's n = 0.03 is set up for the cases with lumped roughness.For the cases with distributed roughness, the Manning's n are set to 0.05 when x ≤ 10 m, 0.04 validated.Then, the influences of the uncertainties in floodextent data on the assimilation results are examined.

Independence on water depth threshold
To validate the independence of assimilation on the selection of water depth threshold, the numerical experiments with a lumped (constant) roughness are conducted.Based on the simulated flood process using a lumped Manning's n = 0.03, we generate the observed flood extents at t = 24, 36, and 45 s using different water depth thresholds; i.e., h c = 0.0001, 0.001, and 0.01 m.By controlling the lumped Manning's n, the flood extents are assimilated into the flood dynamic model.The unknown (or guessed) Manning's coefficients are successfully identified after assimilation of a single flood extent at different times.The RMS errors of water depth (RMSE h ) decrease significantly in all cases after the assimilation of the given single flood-extent data (Fig. 7 and Table 4) although the Manning's coefficients are not well identified in the case that assimilates the flood extent at t = 24 s when h c = 0.0001 m.These results indicate that the assimilation performance and accuracy are not sensitive to the selection of water depth threshold in the current method, provided it is in a reasonable range.It should note that water depth threshold is a finite magnitude that presents water depth of water boundary in real-world problems.Thus, the threshold cannot select arbitrarily, but it keeps the value as close to real water depth at water boundary line as possible.Sensitivity analysis may be conducted if required.

Influence of flood-extent uncertainty
In the previous numerical experiments, the observations are assumed to be accurate.However, real observed flood extent may be full of uncertainty due to the contamination caused by complex environment.To examine its influence on the model performance, assimilation of flood-extent data with  = 24 and 36 s 0.047 0.045 0.029 0.021 0.0022 uncertainty is tested.We assume that the flood areas are completely wet if h > 0.01 m, completely dry if h < 0.001 m, and partially wet or dry if 0.001 m < h < 0.01 m.Therefore, the weight or certainty degree of the cell being wet, w over whole flood areas, can be determined by w = max (min ((max (h, 0.001)-0.001)/(0.01-0.001),1), 0).This results in a gridbased flood-extent map for assimilation experiments.Two groups of assimilation experiments with respectively lumped and distributed bed roughness are conducted.For the cases with lumped bed roughness, the accurate weights calculated from water depth are first used in our assimilation experiments (Case U-24, U-36, and U-45, as presented in Table 4).The successfully identified Manning's n and the decrease of near 99 % in RMSE h (Fig. 7 and Table 4) show that the flood-extent uncertainty can be correctly accounted for in our proposed method.In realistic problems, the ideal weight is almost impossible to be accurately obtained.That considered, more challenging cases are designed to verify the method (Case B-24, B-36, and B-45, as presented in Table 4).In these three experiments, w is assumed to be 0.5 for areas with uncertainty (0.001 m < h < 0.01 m).After assimilating the given single flood extent, the controlling n is again successfully identified, which leads to a dramatic decrease in RMSE h (Fig. 7 and Table 4).
Furthermore, the cases with distributed bed roughness are also considered (Case B2-24, B2-36, B2-45, B2-36&45, B2-24&45, and B2-24&36).We still use the observations with inaccurate weight, namely w = 0.5 in areas with 0.001 m < h < 0.01 m.After assimilating the given single flood extent, the RMSE h in each experiment is apparently reduced, although the true distributed Manning's n cannot be achieved for these cases (Table 4).However, when new observations are available, the RMSE h can decrease significantly and the distributed Manning's n can be identified to become much closer to the true values.For example, the RMSE h decreased by 90 % when assimilating the single flood extent at t = 24 s, and by about 93 % when further assimilating flood extent at t = 36 or 45 s (Table 4).These results indicate that detailed content in the flood extent is important for the assimilation performance.Assimilation experiments also show that the proposed method can directly handle complex flood extents; e.g., the isolated islands inside the flooded areas, with grid-based flood extents defined to be compatible with the numerical grids.

Assimilation of an actual remotely sensed flood extent
Based on the findings of the previous numerical experiments, this section intends to investigate further the potential of the proposed data assimilation method using actual satellite remote sensing data (here, MODIS).The study area, Mengwa flood detention area (MFDA), is located at Fuyang, Anhui Province of China, on the middle reach of the Huaihe River.It is the most important region for flood control within this river basin.MFDA covers a narrow and elongated area of 180 km 2 (Fig. 8a), with a population of 148 000 farming 120 km 2 of cropland.The domain is discretized using an unstructured grid (Fig. 8b) consisting of 1222 nodes and 1136 quadrilateral and triangular cells.The size of the cell edges ranges from 200 to 400 m.The bed elevation at each cell is extracted from a DEM of a 100 m resolution, which is generated from a 1 : 2500 topographic map.
The data assimilation experiments are carried out based on the flood routing process over MFDA induced by the flood diversion event that happened in the summer of 2007.From 29 June to 15 July 2007, persistent heavy rain was experienced in the Huaihe River basin.To reduce the risk of severe flooding that might cause significant economic and human loss downstream, MFDA was operated by opening the Wangjiaba gate (Fig. 8a) to receive flood water from the Huaihe River starting from 04:28 UTC, 10 July, with an order from the Chinese central government.Until 12 July 2007, the total diverted volume reached about 0.25 × 10 9 m 3 , which effectively stored and retained flood water and hence reduced flood risk.Figure 8 plots the 45 h inflow hydrograph to MFDA through the flood gate, from 04:28 UTC, 10 July 2007.
Two MODIS instruments on the Terra and Aqua spacecraft platforms have provided daily measurements with the global coverage since 1999.The 250 m resolution with daily revisits makes them particularly suitable for monitoring the changes of flooding over a floodplain.Herein, we downloaded one scene of Aqua/MODIS Level 1B and geolocation data covering the whole MFDA from the Level 1 and Atmosphere Archive and Distribution System (LAADS).The MODIS data were acquired at 06:00 UTC with a 250 m resolution capturing the flood routing during the flood diversion event.Although MFDA was partly covered by light cloud at that moment, the image is of sufficient quality to identify the flood extent.
A simple method is adopted to extract the flood extent based on the luminance of the composite image from the band 7-2-1 combination.The luminance L of each pixel is firstly calculated using the following formula (Gonzales and Woods, 2002): cover.The flood extent is then easily extracted over MFDA by setting a critical value of luminance as a threshold to separate the water area from the image.However, due to the fact that the extraction of flood extent may be affected by the land surface, such as trees and vegetation cover (Smith, 1997), and that the current image is of a relatively low resolution of 250 m, there exist certain uncertainties in the boundary water line.In light of this, the concept of membership degree from the fuzzy set theory (Huang, 2000;Nguyen and Walker, 2006) is introduced as an indicator to determine the flood extent.The degree of membership w quantifies the grade of membership of an element to a fuzzy set, which is herein the possibility of a pixel being wet.A membership function may be written as (Huang, 2000) w where L i is the luminance of pixel I , and a and b are the upper and lower bounds of the luminance to separate the water and land.The degree of membership w = 0 and w = 1 mean that pixel i is completely dry and wet, respectively.A value between 0 and 1 characterize fuzzy members that are only partially wet/dry.Misclassification may also occur with this method.For those areas covered by heavy clouds, null values are given to the corresponding pixels and these cells are excluded from the evaluation of cost function.
From visual interpretation, we can identify that those areas with luminance L i less than 110 are covered by water and hence a = 110.The upper bound b is more difficult to determine owing to the effects of complicated land cover.In this paper, b = 121 and 126 are respectively examined.The flood extents retrieved from fixed thresholds 110, 121, and 126 are shown in Figs.9b-d.
Taking the membership degree computing from Eq. ( 12) as a weighting factor w and substituting it into Eq.( 10), the  cost function J is obtained as Based on this cost function, data assimilation experiments are conducted with a computational time step of 12 s.The simulation time is set to 36 h, starting from the gate opening at 04:28 UTC on 10 July 2007.The actual discharge hydrograph for flood diversion to MFDA, as shown in Fig. 8c, is imposed through the inflow discharge boundary.Simulation starts from an originally dry floodplain.The critical water depth to derive the boundary line of flood extent from remote sensing data, h c , is set to 0.2 m.
The Manning roughness coefficient, n, was assumed to be constant over the whole computational domain because of little knowledge about land use or cover.The control variable of the numerical experiments is the lumped Manning's n, namely the control vector contains only one element.Giving different n 0 (Table 5), we carried out six numerical simulations assimilating one single remotely sensed flood extent from MODIS data at t = 25.5 h with b = 121 and 126.The minimized cost functions of the experiments with b = 126 are less than those with b = 121, but the values are close to their minimum for an independent b (Table 5).
Figure 10 shows the computed flood extents before and after data assimilation.It can be observed that consistent flood extents are obtained in the assimilation experiments with different n 0 by assimilating the flood-extent information from MODIS data.Also, it is obvious that the computed flood extents are improved after data assimilation has been performed in both experiments.The estimated flood extents are much closer to the one extracted from MODIS (Fig. 9).The findings are encouraging, which indicate that hydraulic information from satellite imagery can be directly assimilated into a 2-D dynamic flood model via the flood extent using the cost function, as suggested in this work.
We also identify a consistent n in the assimilation experiments with different n 0 , as listed in Table 5.The identified n is about 0.2 ∼ 0.25, partly depending on n 0 .It is greater than the empirical value of a normal floodplain, which may be caused by the loss of accuracy from the low-resolution MODIS data and uncertainties in the domain topography, etc.In addition, the minimization procedure of the 4D-Var method seems to be trapped in the local minimal value for different n 0 in our experiments.Taking the experiments with b = 126 as an example, the optimized n is 0.208 if n 0 = 0.025 or 0.030, but it is close to 0.24 if n 0 = 0.5 or 0.8.After checking the relationship between the cost function and   n (Fig. 11), two local minimal values of cost function exist when n is close to 0.20 or 0.24.This leads to different estimations of n in our experiments.The double minima may originate primarily from the assumption of a constant n over the study area with heterogeneous landscapes, which is inconsistent with the actual situation.Furthermore, insufficient data (a single low-resolution flood extent) may also lead to the appearance of double minima in the cost function.

Summary and conclusions
To the best of our knowledge, no attempt has been reported to directly assimilate the flood-extent data into a 2-D flood model in the framework of 4D-Var.In this work, a 4D-Var method incorporated with a new cost function is introduced to advance this research topic.The new approach has been validated using a series of numerical experiments undertaken for an idealized test case before applying to a realistic simulation in MFDA.The main results of this study are summarized as follows: -A new cost function is defined to facilitate assimilation of flood-extent data directly using a 4D-Var method.
While it can efficiently help the 2-D flood model to assimilate the spatially distributed flood dynamic information of the flood-extent data from remote sensing im-agery, the current approach does not require those additional steps of retrieving water stage (Hostache et al., 2010).Since the flood extent is much easier to map from a remote sensing image than water stage and gradients (Schumann et al., 2009), the present scheme provides a more promising way of data assimilation for flood inundation modeling.However, as a new data assimilation method for flood modeling, an interesting research question to answer is whether the direct assimilation of flood-extent data can improve the assimilation accuracy compared with the assimilation of water level observations retrieved from the same data sources of satellite imagery.This is worth a comprehensive comparative study in the future, which may then provide a useful guideline for the practical applications of remote sensing data assimilation.
-Flood extent is a type of spatially distributed data and implicitly implies hydraulic information of flood routing.The observed flood-extent data may provide an alternative to obtaining a denser time series, as stated by Roux and Dartus (2006), and to compensating for unavailable field measurements during a flood event (Lai and Monnier, 2009).The assimilation of flood-extent data is suitable for improving flood modeling in the floodplains or similar areas (e.g., seasonal lakes with significant wetting and drying processes) with slowly varying bed slopes.However, it should be noted that this approach has its own limitation.If the flood extent does not contain enough hydraulic information, the assimilation exercise may fail.For example, in the case of flood inundation in a domain constrained by steep slopes, the water stage (but not the flood extent) varies evidently with time.Since the extent data do not actually represent the physical evolution of such a flood event, they are not suitable for assimilation.Therefore, the correlation between extent and flood dynamics must be established before applying the current data assimilation scheme.
-The results of flood modeling are much improved by successfully estimating the roughness parameter over a floodplain, even though the low-resolution MODIS data (250 m) is adopted in the application of MFDA.This implies that the proposed method may extend the usable data sources for assimilation to the imageries from most of satellites that are currently in orbit and that provide large spatial and temporal coverage.
Overall, this study shows that the assimilation of the floodextent data is effective in improving flood modeling practice.Future work should be carried out to understand the full potential of this new way of making use of spatially distributed data from various existing satellites in data assimilation.

a.Figure 1 .
Figure 1.Definition of a cost function.(a) The concept map; (b) grid-based map for showing the specific definition of cost function and possible active cells during data assimilation.
-break flood routing over a flat bottomWe first consider a flood routing process induced by a dyke break over a 10 m × 8 m rectangular floodplain with a flat bottom (i.e., Z b = 0).As shown in Fig.2a, the left boundary represents a river bank with a breach of 0.4 m in the middle.The floodplain consists of five types of land covers corresponding to Manning's n: 0.03, 0.04, 0.05, 0.06, and 0.07, respectively, from left to right.The computational domain has been discretized into a uniform mesh of a 0.2 m × 0.2 m resolution.During the simulation, a fixed time step of 0.01 s is used.The boundary discharge hydrograph Q i (t) (half of total discharge through dyke breach to floodplain) is shown in Fig.2band imposed on each of the two breach cells.The other three lateral boundaries of the floodplain are assumed to be solid walls.The floodplain is initially dry.With the aforementioned "accurate" n set for each land cover, the dyke-break flow routing process is firstly simulated www.hydrol-earth-syst-sci.net/18

Figure 2 .
Figure 2. Idealized test of flood routing over a rectangular floodplain induced by dyke breach: (a) computational domain; (b) hydrograph of the inflow discharge Qi(t).

Figure 4 .
Figure 4. Identified discharge hydrograph from (a) experiment series B and (b) experiment series C.

Figure 5 .
Figure 5. Water stage validation at the gauge point in (a) experiment series A, (b) experiment series B, and (c) experiment series C.

Figure 6 .Figure 7 .
Figure 6.Test case of a flood routing over three mounds.(a) Bed elevation and computational grids; (b) flood extent and water depth contour at t = 24 s; (c) flood extent and water depth contour at t = 36 s; (d) flood extent and water depth contour at t = 45 s.

Figure 9 .
Figure 9. (a) Luminance of MODIS image with band 7-2-1; (b) flood extent extracted from the fixed digital number threshold 110; (c) flood extent extracted from the fixed digital number threshold 121; (d) flood extent extracted from the fixed digital number threshold 126.

Figure 10 .
Figure 10.Comparing flood extents obtained before and after assimilation of the remotely sensed flood extent from MODIS image specified by b = 126 (background map) when (a) n 0 = 0.025; and (b) n 0 = 0.8.The solid line represents the boundary of the flood extent after assimilation, where water depth is equal to h c .The filled area is the flood extent computed by forward model with n 0 .

Figure 11 .
Figure 11.The relationship between the cost function and the Manning's n.

Table 2 .
The identified Manning's n in experiment series A and C.

Table 3 .
The RMS errors of water depth at t = 1, 2, 3, 4, and 5 s in experiment series A, B, and C.

Table 4 .
The water depth threshold, h c , the assimilated observations, identified Manning's n, and time-averaged RMS errors of water depth (RMSE h ) in the test case of dyke-break flood routing over three mounds.

Table 5 .
The identified n and the final cost functions in the application to MFDA.