Assimilation of transformed water surface elevation to improve river discharge estimation in a continental-scale river

. Quantifying continental-scale river discharge is essential for understanding the terrestrial water cycle, but it is susceptible to errors caused by a lack of observations and the limitations of hydrodynamic modeling. Data assimilation (DA) methods are increasingly used to estimate river discharge in combination with emerging river-related remote sensing products (e.g., water surface elevation (WSE), water surface slope, river width, and ﬂood extent). However, directly comparing simulated WSE to satellite altimetry data remains challenging (e.g., because of large biases between simulations and observations or uncertainties in parameters), and large errors can be introduced when satellite observations are assimilated into hydrodynamic models. In this study we performed direct, anomaly, and normalized value assimilation experiments to investigate the capacity of DA to improve river discharge within the current limitations of hydrodynamic modeling. We performed hydrological DA using a physically based empirical localization method applied to the Amazon basin. We used satellite altimetry data from ENVISAT, Jason 1, and Jason 2. Direct DA was the baseline assimilation method and was subject to errors due to biases in the simulated WSE. To

Abstract. Quantifying continental-scale river discharge is essential for understanding the terrestrial water cycle, but it is susceptible to errors caused by a lack of observations and the limitations of hydrodynamic modeling. Data assimilation (DA) methods are increasingly used to estimate river discharge in combination with emerging river-related remote sensing products (e.g., water surface elevation (WSE), water surface slope, river width, and flood extent). However, directly comparing simulated WSE to satellite altimetry data remains challenging (e.g., because of large biases between simulations and observations or uncertainties in parameters), and large errors can be introduced when satellite observations are assimilated into hydrodynamic models. In this study we performed direct, anomaly, and normalized value assimilation experiments to investigate the capacity of DA to improve river discharge within the current limitations of hydrodynamic modeling. We performed hydrological DA using a physically based empirical localization method applied to the Amazon basin. We used satellite altimetry data from ENVISAT, Jason 1, and Jason 2. Direct DA was the baseline assimilation method and was subject to errors due to biases in the simulated WSE. To overcome these errors, we used anomaly DA as an alternative to direct DA. We found that the modeled and observed WSE distributions differed considerably (e.g., differences in amplitude, seasonal flow variation, and a skewed distribution due to limitations of the hydrodynamic models). Therefore, normalized value DA was performed to improve discharge estimation. River discharge estimates were improved at 24 %, 38 %, and 62 % of stream gauges in the direct, anomaly, and normalized value assimilations relative to simulations without DA. Normalized value assimilation performed best for estimating river discharge given the current limitations of hydrodynamic models. Most gauges within the river reaches covered by satellite observations accurately estimated river discharge, with the Nash-Sutcliffe efficiency (NSE) > 0.6. The amplitudes of WSE variation were improved in the normalized DA experiment. Furthermore, in the Amazon basin, normalized assimilation (median NSE = 0.50) improved river discharge estimation compared to open-loop simulation with the global hydrodynamic model (median NSE = 0.42). River discharge estimation using direct DA methods was improved by 7 % with calibration of river bathymetry based on NSE. The direct DA approach outperformed the other DA approaches when runoff was considerably biased, but anomaly DA performed best when the river bathymetry was erroneous. The uncertainties in hydrodynamic modeling (e.g., river bottom elevation, river width, simplified floodplain dynamics, and the rectangular cross-section assumption) should be improved to fully realize the advantages of river discharge DA through the assimilation of satellite altimetry. This study contributes to the development of a global river discharge reanalysis product that is consistent spatially and temporally.
limited by a lack of accurate, complete, and freely available data (Hannah et al., 2011;Shiklomanov et al., 2002;Vörösmarty et al., 2001). Because of the limited temporal coverage and spatial heterogeneity of in situ gauging networks, elucidating the terrestrial water cycle is essential.
As a result of recent computational advances, global hydrodynamic models (GHMs) have been used extensively to study the terrestrial water cycle (Döll et al., 2016;Sood and Smakhtin, 2015). Simulated water dynamics obtained from GHMs are used to complement unavailable ground observations. GHMs simulate water dynamics in discretized river segments to increase computational efficiency. Nevertheless, they are subject to numerous limitations, including simplified model structures, imperfect external forcing, and uncertainties in model parameters (Liu and Gupta, 2007;Renard et al., 2010). These inadequacies are due to both a lack of information about physical processes and simplifications made to limit computational costs. There are considerable uncertainties in model parameters such as river bottom elevation due to a lack of measurements or limitations of estimation methods that affect model outputs (Brêda et al., 2019). Uncertainties in the forcing are also partially responsible for uncertainty in the surface water dynamics (Emery et al., 2020c). In combination, these constraints result in unavoidable uncertainties in GHM simulations of water dynamics.
Given the current limitations of GHMs and in situ measurements, satellite altimetry observations provide an alternative method of estimating the surface water dynamics (Feng et al., 2021). Satellite altimetry quantifies the water surface elevation (WSE) by measuring the time required for the electromagnetic (e.g., radar or laser) pulse to travel between the satellite and the water surface. Beginning with GEOS-3 in 1975, numerous satellite altimetry missions have been deployed to obtain measurements of terrestrial water surfaces. Although some of these satellites were developed for other purposes (i.e., observing the sea surface), their application has expanded to include river and lake observations (Birkett et al., 2002;Crétaux et al., 2009;Santos da Silva et al., 2010). Commonly used satellite missions for river observations are ENVISAT, Jason 1, Jason 2, Sentinel 3A, and Sentinel 3B (Bannoura, 2001;Resti et al., 2002;Zwally et al., 2002). The Surface Water and Ocean Topography (SWOT) satellite will provide an unprecedented amount of data for the first time on terrestrial waters (Biancamaria et al., 2016;Fu et al., 2012). The greatest impediment to the use of these satellites is their limited spatial and temporal coverage, which ranges from a few days to several months between successive observations of specific locations. Hence, satellite altimetry observations may not provide a comprehensive view of the terrestrial water cycle because of their spatial and temporal sparseness.
Surface water dynamics can be clarified by combining remote sensing data with a limited amount of observational data in continental-scale hydrodynamic models. Data assimilation (DA) is a mathematical technique that combines a physical model with external observations, accounting for their uncertainties, to improve model outputs or replicate the evaluation of an actual system (Emery et al., 2020a). By leveraging remote sensing data, DA methods can be used to bridge the gap between models and ground observations. DA approaches are widely used in meteorology and oceanography (e.g., Anderson, 2007;Evensen and van Leeuwen, 2002;Miyoshi and Yamane, 2007) and have recently been used in large-scale hydrology (e.g., Clark et al., 2008;Emery et al., 2018;Michailovsky et al., 2013;Paiva et al., 2013a;Revel et al., 2021;Wongchuig et al., 2019). They have also been used to correct hydrodynamic parameters such as river bathymetry (Brêda et al., 2019;Yoon et al., 2012), Manning's coefficient (Emery et al., 2020a;Pedinotti et al., 2014), and floodplain bathymetry and slope (Durand et al., 2008). Emery et al. (2020c) used DA to improve the accuracy of runoff forcing by integrating discharge observations. Using operation system simulation experiments, researchers have thoroughly investigated the potential for improving river discharge through the assimilation of remote sensing data Andreadis and Schumann, 2014;Biancamaria et al., 2011;Revel et al., 2019Revel et al., , 2021. In situ (Clark et al., 2008;Paiva et al., 2013a;Wongchuig et al., 2019) or remotely sensed (Emery et al., 2020b;Feng et al., 2021;Ishitsuka et al., 2020) discharge assimilation performs better, but the unavailability of ground observations and the limitations of remotely sensed river discharge values may hamper the performance of these DA schemes. Thus, DA approaches based on remotely sensed data can be used to improve the performance of global hydrodynamic models.
Although DA approaches can improve model performance, hydrodynamic models are not yet mature enough to directly assimilate satellite altimetry data (Emery et al., 2020a). Because of ambiguity in digital elevation models (DEMs), flaws in hydraulic parameters (e.g., river bathymetry), and the simplification of cross-section parameters, simulated WSEs may have substantial errors. Several methods have been used to circumvent these limitations when assimilating satellite altimetry into large-scale hydrodynamic models, including assimilating anomalies (i.e., removing the long-term mean WSE) and using a common datum (e.g. Emery et al., 2020a;Michailovsky et al., 2013;Paiva et al., 2013a;Wongchuig-Correa et al., 2020). To improve river discharge estimation in the Brahmaputra River, Michailovsky et al. (2013) assimilated measurements from the ENVISAT satellite into a rainfall-runoff model using a common reference for satellite altimetry and simulated river depth (i.e., adding the difference between the modeled river depth and the altimetry elevation to satellite altimetry). Likewise, anomalies from ENVISAT observations were assimilated into a continental-scale hydrologic/hydrodynamic model and compared to in situ and remotely sensed river discharge data in the Amazon basin (Paiva et al., 2013a). Moreover, global-scale hydrodynamic modeling studies have used anomaly assimilation to eliminate biases in the simulated WSE (Brêda et al., 2019;Emery et al., 2020a;Paiva et al., 2013a;Wongchuig-Correa et al., 2020). However, anomaly assimilation does not provide accurate river discharge estimates for the Amazon basin (Paiva et al., 2013a), as it cannot compensate for discrepancies in flow dynamics between observations and simulations. These differences in flow dynamics can be attributed to several factors, including differences in amplitude due to limited river width (De Paiva et al., 2013;Yamazaki et al., 2012), differences in seasonal flow due to a failure to capture anthropogenic activity (Hanazaki et al., 2022;Pokhrel et al., 2018;Shin et al., 2020), and differences in flow variation due to the assumption of rectangular crosssections (Neal et al., 2015;Saleh et al., 2013). Given such uncertainties in parameters and the structural simplification of current hydrodynamic models, anomaly assimilation of satellite altimetry may not be effective for estimating river discharge (Liu et al., 2012;Paiva et al., 2013a). Therefore, alternative approaches to direct and anomaly assimilation are required to integrate satellite altimetry into existing hydrodynamic models.
In the present study, we evaluated the potential of assimilating satellite altimetry into a global-scale hydrodynamic model to improve river discharge estimation. We investigated methods of assimilating satellite altimetry data into a hydrodynamic model (within current limitations) without contamination from the errors in simulated WSE. Large biases between satellite altimetry and simulated WSE are driven by uncertainties in parameters, whereas simplified physics and a lack of representation of anthropogenic activity (e.g., reservoir operations) introduce differences in the WSE distribution between simulations and observations. To effectively replace direct value assimilation, we propose alternative methods for DA in the Amazon basin, including anomaly and normalized value assimilation. The hydrodynamic model used in this study was the Catchment-based Macro-scale Floodplain model (CaMa-Flood: Yamazaki et al., 2011) with the local ensemble transform Kalman filter (LETKF: Hunt et al., 2007), which we used to assimilate satellite altimetry using a physically based empirical localization approach (Revel et al., 2019). The methodology is described in Sect. 2, and the findings are presented in Sect. 3. The discussion and conclusion are presented in Sects. 4 and 5, respectively.

Data assimilation framework
Using a physically based empirical localization approach, we developed a DA framework to incorporate satellite altimetry into a hydrodynamic model (Revel et al., 2021). The DA framework developed in this study is represented schematically in Fig. 1a. A collection of runoffs created with Earth2Observe's (E2O) Global Earth Observation for Integrated Water Resource Assessment, a tier-2 water resources reanalysis (WRR2) runoff data set, forced the ensemble simulations. As runoff is the single largest source of error in hydrodynamic modeling (Paiva et al., 2013a;Wongchuig et al., 2019), we simply perturbed the runoff forcing ("runoff ensemble"). CaMa-Flood (Yamazaki et al., 2011) was the hydrodynamic core of the DA scheme, and LETKF (Hunt et al., 2007) was the DA algorithm. CaMa-Flood simulations provide the current water state (i.e., WSE) and correct that value using satellite altimetry. The assimilation scheme takes advantage of physically based empirical local patches (Revel et al., 2019). The initial water state at time T x a T and the runoff are used to simulate the forecasted water state at time T + T x f T + T using the CaMa-Flood hydrodynamic model. The water status is then updated to x a T + T via DA, and any modifications are transferred to the initial condition of the following time step. In anomaly and normalized value assimilation scenarios, the forecasted water state is transformed to anomalies or normalized values using the long-term mean and standard deviation for the assimilation of converted (anomalies or normalized values) satellite altimetry. Then the assimilated water states expressed as anomalies or normalized values are converted into natural values (i.e., the corrected WSE). Further information about the transformation of water states is presented in Sect. 2.3.
To match the WSE obtained from satellite altimetry, we allocated virtual stations (VSs) to the CaMa-Flood river network, accounting for the Multi-Error-Removed Improved-Terrain DEM (MERIT DEM; Yamazaki et al., 2017Yamazaki et al., , 2019 elevations and river size. The methods used to allocate VSs to the CaMa-Flood river network are illustrated in Fig. 1b. First, we digitized the VS locations to high-resolution (i.e., 3 arcsec) MERIT Hydro (conditioned DEM) map by using latitude and longitude information to identify the nearest river. The high-resolution locations were then mapped to coarse-resolution river reaches, which were used in CaMa-Flood simulations. Finally, VSs with considerable variation in the mean WSE compared to the MERIT Hydro (Yamazaki et al., 2017 elevation (expressed as riverbank height) were filtered through comparison of mean observations and riverbank heights (i.e., VSs with mean WSEs above or below the 10 m riverbank height of the MERIT river network were removed). Next, all satellite altimetry elevations were converted into EGM96 from EGM08 via geoid conversion. Allocation of VSs to the CaMa-Flood river network is a vital step in the assimilation framework.
Using simulated long-term WSE values, we determined the localization parameters (i.e., local patch and observation localization weights; Fig. 1c). Deriving empirical localization parameters involved simulating WSE with CaMa-Flood, processing the data, running semi-variogram analyses, and assigning a threshold to spatial dependence weights. The physically based empirical localization DA approach outperformed traditional localization methods (Revel et al., 2019). Hence, when combined with LETKF, these localization pa- rameters provide a foundation for efficient continental-scale DA.

Hydrodynamic model
To diagnose the time-varying water states in the DA scheme, we used CaMa-Flood (Yamazaki et al., 2011), which is a large-scale distributed hydrodynamic model. CaMa-Flood uses a local inertial flow equation, which is a computationally efficient variant of the shallow-water equation (Bates et al., 2010;Yamazaki et al., 2011), to determine river hydrodynamics (e.g., discharge, WSE, flood depth, flooded area). Runoff (surface and subsurface flow of water per unit area) from a land surface model (LSM) forces the model, and water is routed through the river network at adaptive time steps . CaMa-Flood is capable of simulating floodplain dynamics and complex hydrodynamics such as the backwater effect (Yamazaki et al., 2011(Yamazaki et al., , 2012 and bifurcation flow (Yamazaki et al., 2014b). It is a physically based model that can simulate WSE; combining CaMa-Flood with MERIT DEM (Yamazaki et al., 2017 improves its performance relative to satellite altimetry. Consequently, the CaMa-Flood hydrodynamic model is appropriate for the DA framework described in Sect. 2.1.
We used CaMa-Flood version 4.0, which was developed with MERIT DEM and MERIT Hydro (Yamazaki et al., 2017 at a spatial resolution of 0.1 • . The simulations used the standard parameters (river channel depth, river width, roughness coefficient, and floodplain profile) of CaMa-Flood. The river channel depth was estimated using a power law relationship with prior river discharge (Sect. S3 in the Supplement, Eq. 1) (Yamazaki et al., 2011;Zhou et al., 2022). River widths were determined using remote sensing for rivers wider than 300 m (Yamazaki et al., 2014a); for narrower rivers, river width was empirically determined (Yamazaki et al., 2011). The roughness coefficient was approxi-mated as a global constant (0.03). MERIT DEM and MERIT Hydro were used to construct the river network (Yamazaki et al., 2017.

Water surface elevation transformation
Because of the large biases in simulated WSE, direct comparison with satellite altimetry is difficult. Figure 2a presents an example of WSE bias and a comparison of satellite altimetry with simulated WSE. This figure shows that direct DA can introduce additional biases into assimilated WSE. These biases are caused by inaccuracies in parameters such as riverbank elevation height errors and river bathymetry errors as well as differences in elevation due to hydrodynamic model resolution (i.e., models assume the unit-catchment outlet elevation as the riverbank elevation of the river reach). Converting WSE into anomalies can reduce the challenges created by large differences between simulated and observed WSE values. WSE anomalies were generated by subtracting the timeaveraged reference WSE (i.e., the long-term mean) from the current WSE. Therefore, each ensemble member had a different reference WSE value.
Although the use of anomalies can overcome the bias between observations and simulations, differences in flow variation between simulated and observed WSE remain (e.g., a difference in the amplitude of WSE variation, upstream water regulations that are not represented in the model). An example of a difference in flow variation is presented in Fig. 2b. The flow dynamic variation between simulations and observations can be overcome by using normalized values (i.e., subtracting the long-term mean and dividing by the standard deviation) when assimilating satellite altimetry into contemporary hydrodynamic models. To compensate more accurately for distribution discrepancies and estimate the river discharge, we used normalized value assimilation. We normalized the current WSE values using the time-averaged ref- erence WSE and the standard deviation of WSE for each perturbation in the ensemble. Hence, each perturbation had a unique reference WSE and standard deviation of WSE.

Local ensemble transformation Kalman filter
DA aims to overcome differences between observations and simulations by combining uncertain and complementary information from observations. In this study, LETKF (Hunt et al., 2007), which is a computationally efficient variant of the ensemble Kalman filter (EnKF: Evensen, 2003), an advanced Kalman filter (Kalman, 1960), was used as the DA method. We used a physically based empirical localization approach (Revel et al., 2019(Revel et al., , 2021 to enhance the computational efficiency of global-scale DA. The LETKF is a commonly used DA algorithm (e.g., Feng et al., 2021;Ishitsuka et al., 2020;Revel et al., 2019Revel et al., , 2021bWongchuig-Correa et al., 2020) for nonlinear models, which are needed for modeling hydrodynamic processes. The nonlinear hydrodynamic model can be shown in discrete form as follows: where x, u, and ϑ represent the vector of the state variable, the model forcing, and the model parameters, respectively.
The nonlinear model operator, M, is related to the time interval of t k to t k+1 , whereas errors in the model structure, parameters, forcing, and antecedent states are represented by q k . All state variables in CaMa-Flood, such as river discharge, WSE, flooded area, flood height, and storage, are included within the vector x. The model states can be related to the observations as follows: where y is the observation vector; ε is the vector of observation errors; and H is the linear observation operator, which relates the model states (x) to the observations (y).
In this study, the observations were WSEs obtained from satellite altimetry. In the anomaly and normalized value assimilations, the observed and forecasted states were transformed into anomalies and normalized values, respectively (Sect. 2.3, Fig. 2). The LETKF assimilation algorithm was used to obtain the optimal estimate of the model state vector X considering the model and observation errors. Here, the model state vector X was composed of WSE values. LETKF analysis is expressed as where X a is the posterior state estimator (or analysis); X f is the prior state estimator (or forecast); Y o is the observation (i.e., the WSE value obtained from satellite altimetry); H is the observation operator corresponding to WSE, which is a subset of H; m is the ensemble size; E f is the prior state error covariance obtained directly from the perturbations; R is the observation error covariance determined from the uncertainty of the measurements; w is the weighting term for observation localization calculated with semi-variogram analysis of the simulated WSE (Revel et al., 2019); and VDV T is defined as where I is the unit matrix of dimension m × m, representing the number of perturbations. VD −1 V T and VD −1/2 V T are calculated through eigenvalue decomposition of VDV T . The overbar represents the ensemble mean vector.

Generation of ensembles
CaMa-Flood diagnoses terrestrial water dynamics forced by surface and subsurface runoff values simulated using LSMs.
LSMs are subject to flaws such as simplified physics, e.g., hill slope dynamics (Fan et al., 2019) and uncertainty in the forcing data. We stochastically perturbed the runoff forcing of the CaMa-Flood hydrodynamic model. The errors in runoff can be attributed to the limitations of LSM physics and uncertainty in the forcing data sets. We assumed that the uncertainty of the LSM's physical processes could be represented by the variability of the multi-model runoff diagnosed in the E2O WRR2 data set (Dutra et al., 2017). The uncertainty of the forcing (e.g., precipitation, radiation) was represented by normally distributed random numbers with a standard deviation calculated from the ensemble spread of the 20th-century atmospheric model ensemble (ERA20CM: Hersbach et al., 2015) runoff data set. We multiplied each runoff from the E2O WRR2 data set (seven runoff data sets from E2O WRR2 were used) by a random number from a normal distribution with a mean of 1 and a standard deviation of 0.1 according to the ERA20CM runoff ensemble. We used seven runoff outputs from the models HTESSEL (Balsamo  ) model outputs were not used since they were incompatible with the CaMa-Flood hydrodynamic model. Therefore, 49 perturbations were used to prepare the runoff ensembles ( Fig. 1) using runoff fields from the E2O WRR2 runoff product, which generally produce a reasonable ensemble of runoff forcing for global DA.

Experimental design
We performed three types of experiments: direct DA (DIR), anomaly DA (ANO), and normalized DA (NOM). ANO was performed because assimilating WSE anomalies rather than direct values can overcome the errors associated with direct DA. Although anomalies can overcome the biases between observations and simulations, differences in flow variation between simulated and observed WSE could not be overcome by the anomaly DA method (Fig. 2b). To overcome the flow dynamic variation between simulations and observations, we performed NOM. Both forecasted values and observations were transformed into anomalies (ANO) and normalized values (NOM) for the DA experiments. The three assimilation approaches were used to identify the optimal assimilation methodology for improving discharge estimation within the present limits of hydrodynamic modeling. The anomalies and normalized values were calculated from the longterm ( We selected the Amazon basin as the test area for our DA experiments (Sect. S1, Fig. S1). The Amazon basin is the world's largest hydrological system, with a catchment area of approximately 6 million km 2 (Reis et al., 2019), and contributes nearly one-fifth of the total freshwater discharged into the ocean (Paiva et al., 2013a). The flow dynamics of the Amazon basin, which range from seasonal flooding (Papa et al., 2010;Prigent et al., 2020) to complex river hydraulics such as hysteresis in the stage-discharge relationship driven by the backwater effect (Paiva et al., 2013b;De Paiva et al., 2013), have been studied extensively. This basin receives substantial annual rainfall (≈ 2200 mm) with high spatial heterogeneity, and experiences distinct rainy and dry sea-

Observational data
We used satellite altimetry as observations for all DA experiments (Sect. 2.6). Satellite altimetry was originally developed to observe ocean surfaces, but its application has expanded through the creation of algorithms to detect surface water dynamics (Birkett et al., 2002;Crétaux et al., 2009;Santos da Silva et al., 2010). Satellite altimetry data were obtained from HydroWeb (https://hydroweb.theia-land.fr/ last access: 14 December 2022). Satellite altimetry measurements from ENVISAT and Jason 1 and 2 were used depending on the data availability during the simulation period (2009)(2010)(2011)(2012)(2013)(2014), as listed in Table 1. Table 1 summarizes the availability periods, temporal resolutions, cross-track distances, and measurement errors of the satellites used in this study. The spatial distribution of VSs is illustrated in Fig. 3. Using the methodology described in Sect. 2.1, we allocated the VSs to river pixels in CaMa-Flood (Fig. 1b). Preprocessing excluded around 3 % of VSs from analyses, which may have generated considerable inaccuracies, in particular in the experiments with direct value assimilation (DIR). These VSs were in narrow rivers at relatively high elevations. The WSE data obtained from satellite altimetry were converted from EGM08 to EGM96, as the EGM96 geoid model is used in MERIT DEM/MERIT Hydro (Yamazaki et al., 2017. As we assimilated only WSEs from satellite altimetry, we used Global Runoff Data Centre (GRDC) river discharge data for validation. We used river discharge gauges located in the main river reaches (upstream catchment area > 1000 km 2 ) and gauges with observational data covering at least 1 year of the simulation period. Furthermore, we randomly chose 80 % of the VSs in the Amazon basin for assimilation, while the remaining 20 % were preserved for WSE validation (Fig. 3). Consequently, we used only 80 % of the 324 VSs in the Amazon basin for the DA.

Evaluation diagnostics
We evaluated the relative assimilation efficiency using several diagnostics. The difference in correlation coefficient ( r) between assimilated and open-loop simulations was assessed to evaluate the improvement in the flow pattern of the discharge. r was calculated as where the correlation coefficients of the assimilated and open-loop simulations are represented by r asm and r opn , respectively. Then the relative efficacy of WSE was assessed with the relative root mean square error (rRMSE): where RMSE asm and RMSE opn are the RMSE values of the assimilated and open-loop simulations, respectively. s and o are simulation results and observations, respectively. N is the number of observations in the time series. The Nash-Sutcliffe (Nash and Sutcliffe, 1970) efficiency-based assimilation index (NSEAI; Revel et al., 2021) was used to evaluate the improvement in river discharge with DA: where NSE asm and NSE opn are the Nash-Sutcliffe (Nash and Sutcliffe, 1970) where ISS asm and ISS opn are the ISS values (Gneiting and Raftery, 2007) of the assimilated and open-loop simulations, respectively. u and l are the upper and lower confidence intervals for the estimate, o is the observed value, and α is the significance level. N is the number of observations. By rewarding narrow confidence intervals and penalizing observations outside the nominal confidence intervals, the ISS incorporates both sharpness (i.e., the size of the confidence interval) and reliability (i.e., the proportion of observations that fall within the nominal confidence interval specified). When balancing sharpness and reliability, a relative comparison of ISS (rISS) allows for the evaluation of ensemble models; those with better performance have lower ISS values (Michailovsky et al., 2013). The significance level (α) was set to 0.05 in this study. Furthermore, relative sharpness (rSharpness) and difference in reliability ( Reliability) were used to evaluate the relative assimilation performance. rSharpness is calculated as rSharpness = Sharpness asm − Sharpness opn Sharpness opn (12) where Sharpness asm and Sharpness opn are Sharpness values of the assimilated and open-loop simulations, respectively. Reliability is defined as where Reliability asm and Reliability opn are reliability values of the assimilated and open-loop simulations, respectively. Reliability is the number of observations within the bounds u and l. We used NSE (Nash and Sutcliffe, 1970) and KGE (Kling and Gupta, 2009) to evaluate the overall performance of river discharge. Furthermore, RMSE, the absolute bias between the means of the observations and simulation results (BIAS) and the difference in amplitude ( A) of WSE were evaluated.

Relative performance evaluation
In this section, we present the relative performance of each assimilation approach, considering the direct (DIR), anomaly (ANO), and normalized value (NOM) DA experiments in that order. Here we analyze the performance of assimilated values with respect to the open-loop simulation. r represents the relative change in r between the open-loop or assimilation results and observations. rRMSE represents the deviation in the RMSE of assimilation relative to that of open-loop simulation. rISS, rSharpness, and Reliability are used to assess changes in ensemble spread between the openloop and assimilated simulations, followed by a comparison of the relative performance (i.e., r, NSEAI, KEGAI, rISS, rSharpness, and Reliability) of the experiments (i.e., DIR, ANO, NOM).

Direct assimilation of satellite altimetry
In DIR, we assimilated direct satellite altimetry measurements into the CaMa-Flood hydrodynamic model. Figure 4a shows the improvement (degradation) in the correlation coefficient in green (violet) for river discharge at the GRDC locations in this study. r improved in 8.1 % of GRDC locations out of the 86 used for evaluating river discharge, whereas half of the gauges (48.8 %) showed no difference. r was reduced ( r<0) at several locations along the Madeira, Negro, and Purus tributaries (accounting for 43.0 % of all gauges). The relative change in RMSE between observed and simulated WSE (rRMSE) was used to evaluate WSE performance, as illustrated in Fig. 4b. Large negative (positive) values for rRMSE indicate better (worse) performance of the DA scheme, which is denoted by blue (red). Overall, 56.4 % and 50.8 % of the assimilation and validation VSs, respectively, showed reductions in RMSE with the assimilation of satellite altimetry into a hydrodynamic model. The WSE estimates obtained from the assimilated simulation with direct DA into a model were degraded (assimilation: 39.0 %, validation: 41.5 %) relative to the open-loop simulation in the Amazon mainstem and the Negro, Branco, Madeira, and Xingu rivers (Fig. 4b). A limited number of gauges demonstrated no change with direct DA.  when the ENVISAT satellite was available. However, confidence intervals became larger after 2010. DA marginally improved NSE scores, with low flows well replicated but peak flows showing some fluctuations. Santos Dumont (Fig. 4d) showed an improvement in the correlation coefficient of river discharge, although NSE suffered from a substantial underestimation of high flow. ISS increased by 29 %, primarily because of an improvement in sharpness, but reliability decreased. Figure 4e illustrates the variation in discharge at a station located in the mainstem of the Amazon River (Santo Antonio Do Ica), showing an improvement in NSE values but a weakening of the correlation coefficient. At this location, the tradeoff between reliability and sharpness is strong. Sharpness is often enhanced when direct satellite altimetry measurements are assimilated into an uncalibrated hydrodynamic model, but reliability is reduced.
In summary, direct DA improved flow dynamics when the simulations were within comparable limits of satellite observations. When direct satellite altimetry measurements were assimilated into the hydrodynamic model, the sharpness of river discharge improved. Furthermore, the accuracy of WSE estimates also improved with DA.

Anomaly assimilation of satellite altimetry
In ANO, anomalies of satellite altimetry were assimilated to anomalies of simulated WSE, with both anomalies produced using long-term means (for satellite altimetry: the mean of the available period; for WSE simulation: 2000-2014). Figure 5a depicts the r of river discharge in ANO, with green indicating an improved r relative to the open-loop simulation, which accounted for around 53.5 % of the GRDC gauges used to evaluate river discharge. Some degradation (purple) in r was observed in the Madeira and Purus rivers, Amazon mainstem, and smaller river reaches. WSE estimates improved in 76.1 % of assimilation VSs and 80.0 % of validation VSs (Fig. 5b). WSE performance decreased (in terms of RMSE) in the Jurua and Purus rivers, although nearly all other river reaches showed increases in the accuracy of WSE calculations with DA.   (Fig. 5c), the r of river discharge increased slightly, whereas NSEAI and rISS decreased. Although low flows were adequately characterized during the brief observation period, peaks were exaggerated, resulting in low NSE and high ISS for the assimilated simulation. River discharge in the Amazon mainstem, notably at Manacapuru gauge (Fig. 5d), was well characterized, with improvements in r and ISS but a deterioration of NSE values. By contrast, the flow variation was accurately defined at Serrinha station in the Negro River (Fig. 5e) with anomaly DA and an uncalibrated hydrodynamic model ( r, NSEAI, and rISS were improved). Through anomaly assimilation into an uncalibrated hydrodynamic model, the flow dynamics (characterized by r) of the Amazon basin improved, although NSE and ISS values worsened slightly.
Overall, the discharge estimates improved at some GRDC gauging stations (35 %) with the assimilation of WSE anomalies into the hydrodynamic model. The seasonality of river discharge improved considerably in most river reaches with anomaly assimilation. Furthermore, the WSE calculation was improved in many Amazon basin river reaches.

Normalized assimilation of satellite altimetry
In NOM, we assimilated normalized values of WSE. Longterm statistics (the mean and standard deviation of WSE for 2000-2014) were used to generate normalized values of DA for NOM. Figure 6a and b represent the r of river discharge and the rRMSE of WSE, respectively, for NOM. A total of 60.5 % of the GRDC gauges demonstrated a positive r, whereas decreases were evident in the Purus and Madeira rivers as well as the Amazon mainstem. A considerable number of VSs showed an improvement in WSE calculations with the normalized DA technique (85.6 % for both assimilation and validation VSs; Fig. 6b).
The lower panels of Fig. 6 illustrate flow dynamics along the Amazon mainstem (Sao Paulo De Olivenca; Fig. 6c) and the Japura (Vila Bittencourt; Fig. 6d) and Negro (Curicuriari; Fig. 6e) rivers. The discharge at Sao Paulo De Olivenca station in the Amazon mainstem (Fig. 6d) resembled the observed river discharge. Although NSEAI and rISS were both enhanced, r was marginally degraded. Note that the normalized value DA replicated the flow dynamics of the observations well, showing a secondary peak (e.g., October 2009) at the Sao Paulo De Olivenca station that was absent in the open-loop simulation. Although low flows and other fluctuations were accurately portrayed along the Japura River (Vila Bittencourt; Fig. 6d), the relative assimilation efficiency metrics had low values. Figure 6e illustrates a hydrograph of the Curicuriari gauge along the Negro River. The discharge at Curicuriari was well characterized, with a positive r and NSEAI and a negative rISS. Normalized DA using an uncalibrated hydrodynamic model improved the characterization of river discharge in terms of seasonal dynamics, overall accuracy, and the tradeoff between sharpness and reliability.
The normalized DA approach improves flow variation in most river reaches. In the normalized assimilation experiment, WSE estimates improved in most Amazon basin river reaches.

Comparison of assimilation experiments
To evaluate the relative improvement associated with DA, we evaluated only those GRDC gauges located in river reaches observed through satellite altimetry (satellite coverage; Fig. S2). The effectiveness of assimilation for GRDC gauges located outside the area of the satellite observations is poor, with very little difference between open-loop and assimilated simulations. Approximately 75 % of the 86 GRDC gauges lay outside the satellite coverage area (Fig. S2). The 21 gauges within the satellite coverage area were used to assess the relative improvement among experiments. Table 2 presents median relative performance statistics for river discharge estimates for all experiments. Positive values for r, NSEAI, KGEAI, and Reliability indicate that DA improved river discharge estimation. Negative values for rISS and rSharpness, in contrast, demonstrate an improvement in river discharge estimation with DA. For all experiments, Fig. 7a displays the kernel density estimate of the probability density function for the r of the river discharge. All experiments except DIR showed improvement in the median r (> 0), demonstrating improvement in the flow regime with DA for at least 50 % of the gauges. The r for river discharge in DIR showed a left-skewed distribution, which suggests a deterioration in seasonality at many gauges (75 %). Approximately 70 % of the gauges showed improvements in flow regime characterization in the anomaly and normalized value DA experiments. However, only NOM had a positive median NSEAI, which indicates that at least 50 % of the gauges had improved NSE values with normalized value DA (Table 2). Figure 7b shows boxplots of NSEAI for all experiments that demonstrate considerable improvement in the DA experiments with normalized value assimilation. KGEAI followed the same pattern, with positive median values for NOM.
Compared to r, NSEAI, and KGEAI, rISS showed the opposite trend (Fig. 7c, Table 2), with major improvements resulting from direct DA. A negative rISS means that the ISS of assimilated discharge was improved, as a lower ISS indicates better performance with DA. Direct assimilation (DIR) led to a lower median rISS value (−0.36), whereas both anomaly and normalized value assimilation had values of −0.13 and −0.18, respectively. When evaluating rISS, one must consider changes in sharpness (i.e., the width of the confidence interval) and reliability (i.e., the percentage of observations that fall within the predicted nominal confidence interval; Michailovsky et al., 2013). A large reduction in sharpness was observed in the direct assimilation experiment (DIR), mainly because the assimilation was conducted using direct values in DIR (Fig. 4c-e), whereas the ANO and NOM reprojected the assimilated values to WSE values. The lowest reliability reduction was obtained in the normalized value assimilation experiment (NOM). The reliability of direct assimilation was reduced by 54 %, whereas sharpness improved by 79 % in DIR compared to the open-loop simulation (i.e., the 95 % confidence interval was larger in the open-loop simulation: Fig. S2b-c). Although the confidence bounds (i.e., sharpness) were narrower with direct DA compared to the anomaly and normalized value DA experiments, reliability was degraded by more than 50 %.
In summary, considering the improvements measured using multiple evaluation metrics (e.g., NSEAI, KGEAI), normalized value assimilation (NOM) showed the greatest improvement relative to the open-loop simulation, whereas the smallest improvement was obtained from the direct DA experiment (DIR). However, the tradeoff between sharpness and reliability was better in the direct DA experiment, as the assimilations were performed directly. Sharpness was substantially improved in DIR. In anomaly and normalized value assimilations, WSE space is affected by the calculation of anomalies or normalized values. Hence, given the current condition of hydrodynamic modeling (i.e., the limitations of hydrodynamic models), normalized value assimilation performed best.

Absolute performance evaluation
In this section, we explore the absolute performance of river discharge and WSE. When analyzing absolute performance, we consider the r, NSE, and KGE values for river discharge; RMSE, BIAS, and A are used for WSE. r is used to assess the seasonality of river discharge estimates, whereas NSE and KGE are used to evaluate the overall performance of river discharge estimation. RMSE is used to evaluate the overall error of WSE estimation against satellite altimetry observations. Long-term bias is assessed with BIAS, and the difference in amplitude between the simulated and observed WSE is examined using A. The absolute performance of daily discharge estimates is presented in Sect. 3.2.1, and the absolute performance of WSE estimation is described in Sect. 3.2.2.

Estimation of the daily river discharge
We used r, NSE, KGE, and Sharpness to evaluate daily assimilated river discharge across all experiments, and Table 3 presents the median statistics for each metric. We obtained the reported median values using all GRDC gauges in the Amazon basin (all) and, more conservatively, using river reaches with satellite altimetry observations (satellite coverage reaches), so the impact of assimilation on river reaches outside the satellite observation area was minimal. In general, the median performance metrics in satellite coverage river reaches were better than the median performance of all discharge gauges, whereas median Sharpness was worse in satellite coverage river reaches. Sharpness was determined with the average confidence bounds, and river reaches with satellite coverage have a high discharge, resulting in larger confidence intervals. Consequently, the median Sharpness estimate for river reaches with satellite coverage was inevitably large. When only river reaches with satellite observations were considered, the NSE of DIR was reduced. Flow patterns improved with the shift from direct to anomaly or normalized value DA. However, the differences between the anomaly and normalized DA experiments were marginal. Median NSE and KGE values increased in the following order: direct, anomaly, and normalized value DA experiments. However, the direct DA experiments efficiently improved Sharpness, thereby increasing confidence in the assimilated river discharge when reliability is higher. DIR had the lowest sharpness values for both the entire river and satellitecovered reaches. Figure 8 shows the spatial distributions of absolute performance metrics (e.g., r, NSE, and KGE) for daily river discharge in all DA experiments (DIR, ANO, and NOM). Figure 8a depicts the spatial distribution of the absolute performance of river discharge estimates obtained in the direct DA experiment (DIR). The r of river discharge estimation for several GRDC gauges was > 0.8 (approximately 38 %), with a median r of 0.74. NSE and KGE were > 0.6 in 22 % and  Table 2. Summary of relative DA efficiency statistics for river discharge in each experiment. The difference in the correlation coefficient ( r), Nash-Sutcliffe efficiency-based assimilation index (NSEAI), Kling-Gupta efficiency-based assimilation index (KGEAI), relative interval skill score (rISS), relative sharpness (rSharpness), and difference in reliability ( Reliability) are shown. Positive values for r, NSEAI, KGEAI, and reliability represent better performance of DA, and lower values for rISS and sharpness indicate improvements due to DA. Improvements in each relative performance metric with DA are highlighted in bold. 34 % of gauges, respectively, and the median NSE and KGE were 0.13 and 0.46, respectively. Some gauges along the Negro, Jurua, and Upper Solimoes rivers had low accuracy for estimating river discharge through direct DA (DIR). The spatial distribution of the absolute performance of river discharge estimation through anomaly DA is shown in Fig. 8b. Anomaly DA (ANO) produced r values of river discharge that were > 0.8 in 84 % of gauges, with a median r = 0.85. In the Amazon basin, overall river discharge was well characterized by anomaly DA (35 % of stations with NSE > 0.6 and 48 % of stations with KGE > 0.6). Figure 8c illustrates the performance of NOM, showing better performance in large river reaches catchment area > 1000 km 2 . Nearly 57 % of the GRDC gauges had r > 0.8, with a median r of 0.84 in NOM. The preponderance of gauges (76 %) had NSE > 0.6, with a median NSE of 0.47. KGE values were greater than 0.6 for 92 % of the gauges, with a median of 0.62. Most gauges along the Amazon mainstem and Negro, Purus, Madeira, and Jurua rivers reliably estimated river discharge with assimilation of satellite altimetry using the normalized value DA method.

Estimation of water surface elevation
Although we used the river discharge to evaluate assimilation efficiency, WSE is an important water dynamic estimator, in particular for predicting floods. Table 4 summarizes the evaluation results of the assimilated WSE satellite altimetry measurements. We evaluated the RMSE, BIAS, and A of WSE between assimilated and observed values. RMSE represents the total departure from observations, whereas BIAS denotes the difference in long-term mean values between simulation results and observations. The mean difference in the variation in the yearly peak and trough of the hydrograph was identified with A. Transitioning from direct to normalized value assimilation did not reduce the RMSE or BIAS. Nevertheless, the anomaly and normalized assimilation methods improved the amplitude of WSE more than direct DA. Similar patterns were observed for the assimilation and validation VSs. The BIAS of WSE, which accounts for a considerable portion of the RMSE, was not corrected in the anomaly or normalized value assimilations. Figure 9 illustrates the spatial distributions of the RMSE, BIAS, and A of WSE for DIR (Fig. 9a), ANO (Fig. 9b), and NOM (Fig. 9c). Median RMSE and BIAS were lowest with direct DA (DIR), but A was larger than in the anomaly and normalized value DA experiments for all, assimilation, and validation VSs. RMSE and BIAS  Table 3. Median performance metrics for daily discharge estimates obtained from DA experiments. Median values for the correlation coefficient (r), Nash-Sutcliffe efficiency (NSE), Kling-Gupta efficiency (KGE), and width of the confidence interval (Sharpness) are presented for all GRDC gauges and for gauges in the satellite coverage area.  Fig. 9b). Large BIAS values occurred in the Ama-zon mainstem, Purus River, and Japura River, with BIAS > 4 m. The annual variation in WSE (amplitude) differed considerably in some areas of the Amazon and Negro River mainstems ( A > 8 m).
With normalized DA (NOM), a large RMSE (> 4 m) was observed in the Madeira River, downstream reaches of the Amazon mainstem, and upper Purus River. Large BIAS val- ues occurred in the mid-section of the Amazon mainstem and Japura River (BIAS > 6 m). A was particularly high in some sections of the Amazon mainstem and Negro River (| A| > 8 m). In summary, direct DA estimated WSE with low RMSE and BIAS values, whereas the best A was obtained with anomaly DA.

Comparison of discharge products
To investigate the accuracy of river discharge estimation using DA compared to state-of-the-art hydrodynamic modeling, we compared the river discharge obtained from CaMa-Flood forced with bias-corrected variable infiltration capacity LSM (Liang et al., 1994) runoff data (VIC BC: Lin et al., 2019). We used VIC BC runoff (Lin et al., 2019) to force the discharge without DA, whereas the ensemble mean discharge was examined for all DA experiments (direct DA (DIR), anomaly DA (ANO), and normalized value DA (NOM)). VIC BC runoff is produced with sparse cumulative density function matching, and combining VIC BC runoff with the CaMa-Flood hydrodynamic model yields more accurate discharge estimates (Lin et al., 2019). A comparison of boxplots showing the NSE for various discharge products is presented in Fig. 10. The median NSE for discharge determined using CaMa-Flood standard settings (CaMa VIC BC) was 0.42. Normalized value assimilation (NOM) provided the best river discharge estimates, with a median NSE of 0.50, whereas direct and anomaly DA produced medians of 0.13 and 0.39, respectively. For normalized value DA, NSE values of river discharge were confined to around 0.5, with many of the gauges demonstrating NSE values greater than zero. Hence, normalized value assimilation improved the NSE of river discharge compared to standard CaMa-Flood modeling and the other DA methods tested (i.e., anomaly and direct assimilation).  Assessing the spatial distribution of the optimal discharge product is critical to improving discharge estimates globally. Figure 10b shows the heterogeneity of the optimal discharge estimate among the four products. River discharge was compared based on the NSE of each discharge product. NOM, which accounted for 44 % of the gauges, most accurately estimated discharge overall. ANO estimated river discharge more accurately for 24 % of the GRDC gauges. Direct DA provided better estimates for 23 % of the gauges, whereas CaMa VIC BC estimated river discharge accurately for 8 % of the gauges. Note that the discharge estimates of some of the gauges located on the Amazon mainstem were better without DA. Most gauges with the most accurate discharge estimates of CaMa-Flood VIC BC were located outside of the satellite-observed river reaches and were marginally affected by DA.
In conclusion, normalized value DA (NOM) performed best for estimating river discharge, but uncalibrated CaMa-Flood simulations without DA (i.e., CaMa VIC BC) were more accurate than other DA methods (i.e., direct and anomaly assimilation). Hence, assimilating satellite altimetry can improve the accuracy of river discharge estimates compared to current state-of-the-art hydrodynamic modeling.

DA performance with current hydrodynamic models
We compared several DA methods to overcome the errors associated with assimilating satellite altimetry observations directly into a hydrodynamic model. Through the assimila-tion of anomalies or normalized values, river discharge estimation was improved considerably compared to direct DA (Fig. 7). Although WSE was correctly assimilated with direct DA (Fig. 9a), river discharge estimates were inaccurate because of parameter errors and discrepancies in flow dynamics driven by limited representation of actual physical phenomena (as illustrated in Fig. 2). These biases can be caused by discrepancies in parameters such as riverbank full height and river bottom elevations. For example, when river channel depth was overestimated in the model, the simulated WSE was lower than the observations. When assimilated WSE was converted into the CaMa-Flood prognostic variable (i.e., storage), the initial condition could be erroneous. Such errors were propagated to river discharge. In areas where the simulated water dynamics (e.g., amplitude and flow regime) were similar to observations, anomaly assimilation limited the extent to which the biases affected assimilation (Emery et al., 2020a;Paiva et al., 2013a;Wongchuig-Correa et al., 2020). Although spurious errors (due to limited ensemble size) were regulated using physically based empirical localization patches (Revel et al., 2019), direct DA was adulterated because of the biases and dynamic differences in WSE simulations. Normalized value assimilation showed better assimilation efficiency in terms of NSEAI, representing the overall accuracy of discharge estimation, compared to the anomaly DA method (Fig. 7). Currently the CaMa-Flood hydrodynamic model cannot accurately represent the dynamics of WSE because of limitations in the model framework (e.g., a lack of representation of water regulations, diversions, and lake dynamics) and the impacts of water dynamics other than river flow. These limitations also exist for most global hydrodynamic models, which do not accurately represent reservoirs, diversions, and lakes (Fleischmann et al., 2021). For example, when reservoir operations are not represented in a hydrodynamic model, assimilating observations obtained during reservoir operation will alter the flow regime. In addition, errors related to the model structure hamper the prediction of surface water dynamics and may produce a different flow dynamic than the observations. Therefore, normalized value assimilation provided the best estimates of river discharge given the current limitations of models such as biases and poor representation of flow dynamics.
Moreover, the assimilation framework is computationally efficient and effective at removing spurious correlations. LETKF is a computationally efficient filtering method that uses a local area for the assimilation (Hunt et al., 2007;Miyoshi and Yamane, 2007). In addition, we used a physically based empirical localization technique (Revel et al., 2019) to reduce erroneous correlations and assimilate observations in significantly correlated areas. It has been found that the physically based empirical localization method performed better than the conventional square-shaped local patches in hydrodynamic DA schemes (El Gharamti et al., 2021;Ishitsuka et al., 2020;Revel et al., 2019;Wongchuig et al., 2019). Hence, the assimilation framework is capable of estimating river discharge at the global scale provided satellite observations are available. Once the SWOT satellite is launched, the methods developed in this study will be valuable for accurately estimating river discharge.

DA performance under various conditions
To examine DA performance under model conditions such as biased runoff forcing and corrupted river bathymetry, we performed biased runoff and corrupted bathymetry experiments. An artificial bias of −50 % was introduced into the ensemble mean of the "runoff ensemble" (Fig. 1a) for each assimilation approach, namely, direct DA, anomaly DA, and normalized value DA (Sect. S2, Fig. S4 in the Supplement). Because of the bias introduced by the runoff forcing, river discharge was approximately 50 % lower in the open-loop simulation than in the observations. We artificially corrupted the river bathymetry to represent errors in the hydrodynamic model (Sect. S3, Fig. S5 in the Supplement). River channel depth was increased by 25 % in the corrupted bathymetry experiment. Then we assimilated satellite altimetry into the hydrodynamic model with corrupted river bathymetry through the direct, anomaly, and normalized value DA methods. In general, the WSE was reduced by approximately 25 % of the river channel depth. For simplicity, we used only a single runoff (HTESSEL; Balsamo et al., 2011) from E2O WRR2 to prepare the runoff ensemble. The HTESSEL runoff from E2O WRR2 is fairly unbiased (Dutra et al., 2017;Revel et al., 2021), and the default bathymetry parameter of CaMa-Flood should provide adequate WSE estimates (Yamazaki et al., 2012). Simulations using the default CaMa-Flood bathymetry parameter and HTESSEL runoff are referred to as "normal conditions".
Runoff bias and bathymetry errors affect the accuracy of assimilated river discharge in different ways, and Fig. 11 compares boxplots of NSE values for different DA methods with different error conditions. When neither runoff nor bathymetry was erroneous, the normalized value DA method performed best (median NSE = 0.83) at estimating river discharge in terms of NSE (Fig. 11a). When the bathymetry contained some errors but runoff was unbiased (Fig. 11b), none of the DA methods were able to improve the river discharge from open-loop simulation. The performance of river discharge in the open-loop simulation was not affected much by the river bathymetry corruption, as river discharge is not influenced much by the river bathymetry (Modi et al., 2022). But when the satellite altimetry was assimilated into the hydrodynamic model, the accuracy of the river discharge estimates was degraded. In the direct DA, assimilated WSE values were higher than the observed WSE, resulting in river discharge overestimation when the bathymetry has errors (Sect. S5, Fig. S7 in the Supplement). In contrast, anomaly and normalized value DA were affected by the bias in the open-loop statistics (i.e., mean and standard Figure 11. Comparison of the Nash-Sutcliffe efficiency (NSE) of assimilated river discharge under various conditions: (a) without runoff bias or bathymetry error, (b) without runoff bias and with bathymetry error, (c) with runoff bias and without bathymetry error, and (d) with runoff bias and bathymetry error. The direct, anomaly, and normalized value DA results are represented in blue, yellow, and red, respectively. deviation) used for generating anomalies and normalized values (Sect. S5, Fig. S7 in the Supplement). Bias in the runoff ensemble strongly affected the accuracy of river discharge estimation with anomaly and normalized value DA, as bias in the runoff causes bias in the mean and standard deviation used to generate WSE anomalies and normalized values (Fig. 11c). Direct DA provided the best discharge estimation (median NSE = 0.68) when runoff was biased. When both runoff and river bathymetry were erroneous, none of the DA methods produced better discharge estimates than open-loop simulation. Therefore, the normalized DA method worked well under normal conditions, but anomaly DA produced better discharge estimates when the river bathymetry had errors, and the direct DA method performed best under runoffbiased conditions. Simple calibration of the hydrodynamic model is recommended for successful normalized value DA (i.e., bias correction of runoff to obtain the mean discharge and river bathymetry calibration to accurately determine the mean WSE).

DA performance with calibrated river bathymetry
Investigating the performance of DA with corrected bathymetry is essential, as river bathymetry is the most influential parameter for WSE (Brêda et al., 2019). Calibrating the river bathymetry increases the accuracy of the hydraulic relationship between discharge and WSE (i.e., the rating curve; Zhou et al., 2022), thereby improving discharge estimation with direct DA (median NSEAI = −0.50; Fig. 12). Minimization of the WSE bias attributable to river bathymetry improved discharge estimates obtained with the direct DA method, although the anomaly and normalized value DA approaches had little effect on the estimation of river discharge (Fig. 12b and c). River discharge estimation can be improved by updating river-related parameters. However, anomaly and normalized value assimilation (with and without river bathymetry calibration) had greater assimilation efficiencies than direct DA with the calibrated model. Therefore, correcting river-related parameters is essential to achieving good river discharge estimates with direct DA.
Furthermore, the bathymetry parameters calibrated with the rating curve approach were the most accurate values at- tainable under current conditions (Zhou et al., 2022). However, direct DA was not capable of producing more accurate river discharge estimates than other DA approaches (i.e., anomaly and normalized value DA; Fig. 12). This finding indicates that calibrating a single parameter (i.e., river bathymetry) may be insufficient to improve the overall accuracy of river discharge estimation using direct DA. Hence, calibrating other river-related parameters (e.g., riverbank height, floodplain profile, and cross-sectional shape) is necessary to increase assimilation efficiency (median NSEAI > 0) when assimilating satellite altimetry data directly into large-scale hydrodynamic models such as CaMa-Flood.
Direct DA offers several benefits over anomaly or normalized value assimilation. Although the direct DA approaches reduced overall accuracy, the sharpness of the ensemble spread was substantially reduced compared to the anomaly and normalized value DA approaches (e.g., Fig. 4c-e). In addition, the improvement in the accuracy of river discharge estimates with the anomaly or normalized value assimilation was lower in river reaches with high biases in openloop runoff estimation (Sect. S4, Fig. S6 in the Supplement). This finding suggests that direct DA methods can be used to correct river discharge values in river reaches where runoff causes large biases but the river bathymetry parameter is reasonably accurate. By contrast, the reliability of discharge estimates in the anomaly and normalized DA experiments was highly dependent on the quality of the runoff ensemble. Therefore, direct assimilation has several advantages, such as greater confidence in DA-estimated river discharge and accurate discharge estimation even when the runoff ensemble is biased.

Potential and limitations of river hydrodynamics DA
The development of a discharge reanalysis product (such as those described by Feng et al., 2021 andWongchuig et al., 2019) is crucial to evaluating the reliability of the assimilated discharge product within the capabilities of current hydrodynamic modeling. In addition, reanalyses of river discharge play an important role in biodiversity and biogeochemistry research (Messager et al., 2021). Discharge estimated from the assimilation of satellite altimetry characterized the flow dynamics of the Amazon basin better than estimates from a state-of-the-art hydrodynamic model (Fig. 10). However, CaMa-Flood modeled river discharge better than the assimilated product in certain river reaches along the Amazon mainstem. These discrepancies are primarily due to the limitations of hydrodynamic modeling, as the assimilated WSEs were adequately represented in the assimilated simulation. As we assimilated WSE and corrected the initial conditions of the following time step using CaMa-Flood parameters (e.g., riverbank height, river bathymetry, river width, and floodplain profile), the errors of the modeling framework may have propagated into the river discharge estimates at the next time step. These limitations can be circumvented through the assimilation of in situ or remotely sensed river discharge observations into hydrodynamic models (Emery et al., 2020b;Feng et al., 2021;Paiva et al., 2013a;Wongchuig et al., 2019). Yet, with decreasing numbers of in situ gauges (Hannah et al., 2011;Shiklomanov et al., 2002;Vörösmarty et al., 2001) and the low accuracy of remotely sensed river discharge estimates (Bjerklie et al., 2018;Gleason and Durand, 2020;Gleason and Smith, 2014), obtaining consistent and reliable observations can be difficult. DA of satellite altimetry had several advantages over hydrodynamic modeling, in particular when it came to ac-curately estimating low flows and unanticipated peaks that were not reflected in the runoff forcing (Fig. S8). These unexpected peaks, which were not as large as annual peaks, were characterized well by DA methods (Fig. S8), although the open-loop simulation did not identify them. Low flows were estimated well with normalized assimilation (Sect. S6, Fig. S9 in the Supplement) and further improved through calibration of the river bottom elevation (Sect. S4, Fig. S6 in the Supplement). Hence, the DA scheme accurately represented low flows and unforeseen secondary peaks.
The normalized DA approach may have been unable to accurately predict other variables, such as WSE and flood extent, as the assimilation was performed in transformed space. WSE estimation using normalized value DA had lower overall accuracy than direct DA (Fig. 7) based on median RMSE (Table 4). Moreover, flood extent would be better estimated with direct DA than other DA methods, as flood extent is diagnosed with WSE in the CaMa-Flood hydrodynamic model. Hence, normalized DA may be unable to effectively predict various important variables (e.g., WSE and flood extent).

Conclusion
This study explored strategies for assimilating satellite altimetry data into a contemporary hydrodynamic model. As existing large-scale hydrodynamic models either are too conceptual or have uncertainties in their parameter schemes, direct or anomaly assimilation of satellite altimetry may introduce inaccuracies due to discrepancies between satellite altimetry and simulated WSE. We assessed direct, anomaly, and normalized value DA schemes using a continentalscale hydrodynamic model, CaMa-Flood (Yamazaki et al., 2011). We used the physically based localization approach of LETKF to assimilate satellite altimetry data in the Amazon basin. Normalized value assimilation performed better than other approaches to estimating river discharge in this continental-scale basin. River discharge was accurately estimated with normalized value assimilation in most river reaches covered by satellite observations (NSE > 0.6).
We investigated the capacity of DA approaches to reliably estimate river discharge through cutting-edge hydrodynamic modeling. River discharge was well characterized in the normalized value assimilation experiment, with a median NSE of ≈ 0.50, which was better than the river discharge produced by the uncalibrated model with default parameters using VIC BC runoff (Lin et al., 2019) (median NSE ≈ 0.42). The median NSE of river discharge improved by 19 % with the assimilation of satellite altimetry into a continental-scale hydrodynamic model. Improvements were evident across the entire Amazon basin; however, some degradation occurred due to the underestimation of peak river discharge in the Amazon mainstem. This underestimation of peaks may be attributable to uncertainties in other parameters of the hydrodynamic model.
The estimation of river discharge using DA methods is variable and depends on the state of the runoff data (i.e., biased runoff state) and the accuracy of river cross-sectional parameters (e.g., river bathymetry). In the current condition of the hydrodynamic model with perturbed HTESSEL runoff from the E2O WRR2 data set, the normalized value DA method performed best among other DA methods. But when the runoff was biased without river bathymetry error, the direct DA approach performed best. However, when the river bathymetry was erroneous, none of the DA methods performed better than open-loop simulation. Hence, different DA approaches should be used depending on the runoff and river bathymetry. To realize the advantages of the normalized value DA approach, basic model calibration is necessary, such as calibration of the runoff to capture the mean discharge and moderate calibration of bathymetry to capture WSE patterns.
River bathymetry calibration enhanced the accuracy of the river discharge estimates produced using the direct DA method but had minimal effect on normalized assimilation. Zhou et al. (2022) used a calibration strategy to increase the accuracy of river bathymetry by decreasing the WSE error by utilizing the stage-discharge relationship; they found that the approach does not necessarily improve river discharge accuracy. In addition, when the calibrated model was forced by runoff with large errors, normalized DA did not improve the estimation of river discharge because of bias in the mean discharge and WSE. The quality of runoff perturbation data should be evaluated before they are used in anomaly or normalized value assimilations.
The use of precise river cross-section estimates and floodplain dynamic processes may improve peak discharge estimates. To represent the river discharge more accurately, some improvements to the CaMa-Flood hydrodynamic model may be necessary. Furthermore, assimilating multiple variables such as river discharge, WSE, and the flooded area may improve discharge estimates further. Overall, the methods developed in this study demonstrate great potential for using available satellite altimetry to improve river discharge estimation in continental-scale rivers within the limitations of current hydrodynamic models.

Data availability.
The key data sets used in this study are available from https://doi.org/10.4211/hs.08e1b18aa9f240758dd13d9ac875621f