Extraction of roughness parameters from remotely-sensed products for hydrology applications

Along rivers where local in situ gauges are unavailable, estimates of river discharge are indirectly derived from the Manning formula, which relates discharge to geomorphological characteristics of the river and to flow conditions. Most components of the Manning formula can currently be derived from spaceborne products except for two features: the unobserved always-wet bathymetry and the roughness coefficient. Global-scale applications use simplified equivalent riverbed shapes and empirical parameters, while local-scale applications rely on finer model dynamics, field surveys and expert knowledge. Within the framework of the upcoming Surface Water and Ocean Topography (SWOT) mission, scheduled for launch in 2022, and more particularly the development of the SWOT-based discharge product, fine-resolution but global discharge estimates should be produced. Currently implemented SWOT-based discharge algorithms require prior information on bathymetry and roughness, and their performances highly depend on the quality of such priors. Here we introduce an automatic methodology, based solely on spaceborne data, to derive physically-based roughness coefficients for use in one-dimensional hydrological models. The evaluation of these friction coefficients showed that they allow model performances comparable to calibrated models. Finally, we illustrate two cases of application where our roughness coefficients are used as-is to initiate the experiment: a data assimilation experiment designed to correct the roughness parameters, and an application around the HiVDI SWOT-based discharge algorithm. In both cases, the roughness coefficients showed promising perspectives by reproducing (in the data assimilation application) and sometimes improving (in the SWOT discharge algorithm case) the calibrated-parameter-based performances.


Introduction
At the interface with the atmosphere and the oceans, continental water is at the core of the water cycle. It is therefore crucial to study the dynamics of continental waters to understand the Earth's climate system and close the water budget. More particularly, continental waters are freshwaters, essential to life and human activities. Freshwaters represent only 2.5% of the total amount of water, and an even smaller part is directly accessible, e.g. in rivers and lakes. Nevertheless, the continental part of the water cycle is particularly active, as freshwaters are continuously moving and renewed. Therefore, Oki and Kanae (2006) stated that freshwater availability is better measured through the estimation of water fluxes, and notably river discharge.

2020) and even wood (Hygelund and Manga, 2003), in either the river channel (Kean and Smith, 2006; Ebrahimi et al., 2008; Fathi-Moghadam et al., 2011) or floodplains (Galema, 2009; Jung et al., 2011; Whittaker, 2014; Walczak et al., 2015). Such environments are difficult to reproduce outside of the experimental setting, and these methods rely on fine-scale parameters that are not obtainable with remote-sensing techniques. Chaulagain (2018) did derive friction laws using airborne LiDAR remote-sensing data, but such products are unavailable at the global scale. More generally, these methods are hardly applicable to remote sensing, since there is no accepted methodology to estimate flow velocities via remote sensing.
From another perspective, founding works in hydrology established reference tables for the value of the friction coefficients given the soil composition, the surrounding vegetation, the shape of the river bed and the nature of the flow (Chow, 1964; Arcement and Schneider, 1989); these tables are still widely used. Most of this information can be found in global-scale products issued from automatic chains using remotely-sensed data. Among them, the processing chain IOTA 2 (Inglada et al., 2017) allows the fully automatic generation of land cover maps at country scale. Using any existing soil-related database in a supervised classification context, high-resolution satellite optical images can be processed to produce corresponding land cover maps. Such global-scale information could be turned into global-scale friction information.
Therefore, the objective of the present study is to introduce an automatic methodology to derive physically-based, model-free prior values of friction coefficients, using, among others, land cover maps produced with the IOTA 2 chain. Such products would therefore be usable globally. First, we describe the whole processing chain and emphasize how it can be exported to any study domain, given some reference data to train the IOTA 2 chain. Then, the quality of such priors is compared to friction coefficients obtained from model calibration, via hydraulic simulations using the 1D hydraulic Mascaret model (Goutal et al., 2012) applied over a portion of the upstream Garonne river in France and a portion of the Po river in Italy. Subsequently, we validate our products through two application cases: a real-data DA experiment around the Mascaret model applied to the Po river domain, and a discharge inference experiment with the HiVDI SWOT-discharge algorithm (Larnier et al., 2021) over both the Garonne and the Po, along with an additional application domain - namely a portion of the Brahmaputra river - to test our approach over another, less-gauged domain.

Arcement and Schneider (1989) (hereafter "the Guide") estimate roughness separately for the river channel and the adjacent floodplain following Cowan's formula. For both the channel and the floodplain, the Guide presents a series of tables that map river characteristics to values of each parameter of Cowan's formula. Then, it suggests a range of adjustment values given some qualitative conditions (e.g. "severe irregularity" or "minor obstruction").
While, in some cases, numerical values are provided to distinguish between two different classes (e.g. "minor obstruction in a floodplain can be assumed if the obstructions occupy less than 15% of the total cross-sectional area"), the classification is mostly based on observational data (what an operator would actually observe in the field).
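For reference, Cowan's formula combines the individual adjustment terms into a single Manning coefficient:

```latex
n = (n_b + n_1 + n_2 + n_3 + n_4)\, m
```

where $n_b$ is the base value for the bed material, $n_1$ to $n_4$ are the adjustments for surface irregularity, variations in cross-section shape and size, obstructions and vegetation, and $m$ is the meandering coefficient.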
The method developed in this paper adapts this procedure. Its novelty lies in relying on information from global remote-sensing datasets rather than on local observational data to compute each indicator. The approach makes it possible to compute each element of Cowan's formula and therefore to estimate the value of n. While land cover maps are extensively used to derive most of these terms, other sources of data are also used when more appropriate.

The general output format for our roughness coefficients is two-dimensional maps for roughness coefficients in the floodplain and one-dimensional along-centerline data points for roughness coefficients in the main channel. Indeed, while only raster-based products are used to derive roughness coefficients in the floodplain, vector-based products are also employed to derive roughness coefficients in the main channel. The next section details the global-scale remotely-sensed datasets used in our methodology.

2.1.2 Global remotely-sensed products used to derive friction coefficients

Table 1 sums up all global-scale products used for our approach and their availability.
Land cover maps

The processing chain IOTA 2 (Inglada et al., 2017) allows the fully automatic generation of land cover maps at country scale and high resolution. The chain relies on satellite optical imagery (e.g. Landsat-8, Sentinel-2) and on existing land cover databases (e.g. Corine Land Cover, Randolph Glacier Inventory). Satellite imagery is collected over a complete year and pre-processed with temporal gap-filling and resampling techniques. Three spectral indices (NDVI, NDWI, brightness) are computed at every date to build the feature vector. The land cover databases are cleansed, filtered and merged to generate a reference dataset, defined with the targeted number of land cover classes. A first part of the reference dataset is used to train a supervised classifier, namely a Random Forest, while the second part is used to validate the land cover maps obtained with the trained classifier. The chain has already been successfully applied over France to generate land cover maps for multiple years.
Depending on the reference database used to train the IOTA 2 chain, the land cover map nomenclature will differ. Our methodology relies on 8 broader classes, detailed in Table 2. In future applications, land cover maps should be aggregated to this reduced nomenclature before being used in our chain.
SoilGrids (Poggio et al., 2019) is a system for global soil mapping. SoilGrids250m, its main product, provides, for 6 different depth levels, global maps of soil properties at a resolution of 250 m. It gives access, at the pixel level, to an estimation of the fractions of silt, sand and clay that constitute the soil.
The Global River Widths from Landsat database, or GRWL (Allen and Pavelsky, 2018), was built to characterize the global coverage of rivers and streams wider than 30 m. One of its products offers a vector-based dataset of river centerlines, each segment being characterized by several properties, including an estimation of the river width at the mean annual discharge.

Table 3. Indicators and values of n_b (bed material) for both the main channel and the floodplain.
GlobalDatasetForRivers (Frasson et al., 2019) took advantage of GRWL and the Shuttle Radar Topography Mission (SRTM) digital elevation model to create a global database of river width, slope, catchment area, meander wavelength, sinuosity, and discharge and is also distributed as a vector-based dataset of river centerlines.
The next sections detail how the above products are used to derive each term of the Cowan formula.

Computation of n_b
n_b represents the friction due only to the surface onto which water flows, for both the channel and the floodplain. Arcement and Schneider (1989) directly relate the value of n_b to the materials that constitute the soil surface, e.g. sand, concrete, firm soil, gravel, cobble, etc. Since SoilGrids250m provides the fractions of sand, silt and clay that compose the ground at the surface level, n_b is derived from this database. The values of n_b for each type of soil are given in Table 3, taken from Arcement and Schneider (1989).

The value of n_b at any given location is computed as the weighted average of the adopted n_b values given the fractions of the corresponding soil types. At the few locations where SoilGrids provides no data, n_b is computed as the average of the three adopted n_b values, equal to 0.0245 s.m^(-1/3).
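As a concrete sketch, the weighted-average rule described above can be written as follows. The per-soil-type values are hypothetical placeholders (the actual values come from Table 3, which is not reproduced in the text), chosen only so that their mean equals the stated 0.0245 s.m^(-1/3) fallback:

```python
# Sketch of the n_b computation. The adopted per-soil-type values below are
# ILLUSTRATIVE PLACEHOLDERS (the real ones come from Table 3, after Arcement
# and Schneider, 1989); their mean matches the 0.0245 s.m^-1/3 fallback used
# where SoilGrids provides no data.
ADOPTED_NB = {"sand": 0.0200, "silt": 0.0260, "clay": 0.0275}  # hypothetical
FALLBACK_NB = sum(ADOPTED_NB.values()) / len(ADOPTED_NB)       # = 0.0245

def compute_nb(fractions):
    """Weighted average of adopted n_b values given SoilGrids soil fractions.

    fractions: dict mapping 'sand'/'silt'/'clay' to fractions (summing to ~1),
               or None/empty where SoilGrids provides no data.
    """
    if not fractions:
        return FALLBACK_NB
    total = sum(fractions.values())
    return sum(ADOPTED_NB[soil] * f for soil, f in fractions.items()) / total
```

The normalization by `total` makes the estimate robust to soil fractions that do not sum exactly to one.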
2.1.4 Computation of n_1

n_1 represents the effect of surface irregularities. For the floodplain it is computed based on the land cover type. Arcement and Schneider (1989) relate the value of n_1 to the degree of irregularity of the floodplain surface, specified as smooth, minor, moderate or severe. Table 4 shows how these four categories are mapped to the aggregated land cover classes derived from the IOTA 2 chain. Note that the mapping between the Arcement and Schneider (1989) categories and the land cover classes suggested here (between columns 1 and 2 of Table 4) was set according to our hydrology expertise.

Note that the "water bodies" class from Table 2 is ignored here, as we are interested in the surface below the water to derive roughness. Therefore, the nomenclature from Table 4 cannot be used for the main channel.
For the main channel, the method relies on cross-sectional profile information. To evaluate the irregularity of the cross-sectional shape, each cross-sectional profile is first smoothed using a univariate cubic spline between the edges of the profile. Illustrations of the smoothing procedure are given in Appendix B. Then, the ratio between the original cross-section length and the smoothed profile length is calculated. Finally, Table 5 shows how this ratio is mapped to the categories suggested by Arcement and Schneider (1989) to estimate the value of n_1 in the main channel. Note that, currently, the threshold values were picked to match our knowledge of the study domains.
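The length-ratio indicator can be sketched as below. This is one possible reading of the procedure, using SciPy's smoothing spline; the smoothing factor, like the Table 5 thresholds, is a study-domain choice rather than a universal constant:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def irregularity_ratio(y_station, z_bed, smoothing=None):
    """Ratio of the original cross-section profile length to its
    spline-smoothed length; 1.0 means a perfectly smooth profile.

    y_station: strictly increasing across-channel coordinates (m)
    z_bed:     bed elevations at those stations (m)
    smoothing: spline smoothing factor (None lets SciPy pick a default);
               a hypothetical knob, not a value given in the paper.
    """
    spline = UnivariateSpline(y_station, z_bed, k=3, s=smoothing)
    z_smooth = spline(y_station)

    def polyline_length(y, z):
        # Sum of straight segment lengths along the profile
        return float(np.hypot(np.diff(y), np.diff(z)).sum())

    return polyline_length(y_station, z_bed) / polyline_length(y_station, z_smooth)
```

A ratio close to 1 would map to the "smooth" category, with larger ratios falling into the minor/moderate/severe classes of Table 5.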
2.1.5 Computation of n_2

n_2 represents the effect of longitudinal variations in cross-section shape and size in the main channel. Note that n_2 is assumed to be equal to 0.0 s.m^(-1/3) in the floodplain, as suggested in Arcement and Schneider (1989).

Table 7. Indicators and values of n_3 (degree of irregularity) for the floodplain.
The value of n_2 in the channel is determined based on how frequently the channel shape and size alternate between large and small cross-sections. The channel width sequence, extracted from GRWL along the study domain, is used as a proxy to evaluate this feature. More specifically, the first derivative of the width signal is computed to measure the range of width variations. In the end, the value of n_2 is determined according to the standard deviation of the width first derivative. Table 6 shows the categories created with this indicator, mapped to the categories suggested by Arcement and Schneider (1989) to estimate the value of n_2. Additional illustrations of the GRWL width variations and their first derivative along the study domains are given in Appendix B. Similarly to the thresholds used for n_1, note that the threshold values are currently picked to match our knowledge of the study domains.
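A minimal sketch of this indicator follows. The thresholds and n_2 values are hypothetical: the paper states only that thresholds were picked to match the authors' knowledge of the study domains, and Table 6 is not reproduced here:

```python
import numpy as np

def width_variability(widths_m):
    """Standard deviation of the first derivative (finite differences) of the
    along-reach GRWL width signal, used as the indicator for n_2."""
    return float(np.std(np.diff(np.asarray(widths_m, float))))

def n2_from_widths(widths_m,
                   thresholds=(5.0, 15.0, 30.0),          # hypothetical (m)
                   values=(0.000, 0.005, 0.010, 0.015)):  # illustrative n_2
    """Map the width-variability indicator to an n_2 value (s.m^-1/3).

    Both `thresholds` and `values` are placeholders standing in for the
    Table 6 classes ("gradual" to "frequently alternating" cross-sections).
    """
    sigma = width_variability(widths_m)
    for threshold, n2 in zip(thresholds, values):
        if sigma < threshold:
            return n2
    return values[-1]
```

A reach of constant width yields a zero indicator and therefore the "gradual variation" class (n_2 = 0).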
2.1.6 Computation of n_3

n_3 represents the effect of obstructions.
Since the scope of the proposed method is framed by the SWOT mission, its goal is to estimate the value of n for large rivers (width > 30 m), for which obstructions within the channel are rare and can be neglected. The value of n_3 is therefore assumed to be equal to 0.0 s.m^(-1/3) in the channel.

The value of n_3 in the floodplain is derived from land cover maps. Table 7 shows how the land cover nomenclature is associated with the categories suggested by Arcement and Schneider (1989) to estimate the value of n_3.

2.1.7 Computation of n_4

n_4 represents the effect of vegetation.
As for the estimation of n_3, the effect of vegetation in the main channel is assumed to be negligible, and n_4 is set to 0.0 s.m^(-1/3) there. The value of n_4 in the floodplain is derived from land cover maps. Table 8 shows how the IOTA 2 classification is mapped to the categories suggested by Arcement and Schneider (1989) to estimate the value of n_4.

Computation of m
m is a coefficient that accounts for the effect of meandering. According to the Guide, the value of m can be assumed to be equal to 1.0 in the floodplain.
The sinuosity parameter calculated globally by Frasson et al. (2019) is used as a proxy to infer the value of m in the main channel; see Table 9.
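The mapping can be sketched as a simple lookup. The breakpoints below follow the standard meandering classes of Arcement and Schneider (1989) (minor, appreciable, severe); the exact mapping used in Table 9 may differ:

```python
def meander_coefficient(sinuosity):
    """Map the Frasson et al. (2019) sinuosity to Cowan's meandering
    coefficient m, following the usual Arcement and Schneider (1989)
    classes. Breakpoints are assumptions standing in for Table 9."""
    if sinuosity < 1.2:
        return 1.00   # minor meandering
    if sinuosity < 1.5:
        return 1.15   # appreciable meandering
    return 1.30       # severe meandering
```

Since m multiplies the sum of the other terms in Cowan's formula, a strongly meandering channel can raise n by up to 30% under this mapping.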

Method applicability at global scale
This method aims to be applicable globally. This should be possible, as most of the required data are publicly available and have global coverage; see Table 1.
However, we should first point out that the IOTA 2 products are currently only available over France. Thus, for study areas outside France, generating land cover maps by running the IOTA 2 chain oneself is mandatory. Running the IOTA 2 chain is manageable but requires both training and the availability of the required inputs. From our experiments (e.g. Po river, see Section 2.2.2, and Brahmaputra river, see Section 3.2), it may be difficult to find the cloud-free Sentinel-2 images required to compute land cover maps with the IOTA 2 chain. Still, cloud-free Sentinel-2 images can be produced using, for example, the MAJA software (Baetens et al., 2019).

The methodology also requires cross-sectional profiles to compute n_1 in the main channel. Such profiles are generally obtained locally from field surveys. Still, one could reconstruct profiles down to the lowest observed water level from remotely-sensed products, by combining, for example, measurements of elevations (from nadir altimetry) with water masks (from optical imagery, or even from water mask processing chains such as SURFWater).

Study Domain and Data
Our method was first tested over two European study domains: a portion of the upstream Garonne river in France and a portion of the Po river in Italy.

Garonne domain
The study area consists of a 75-km-long reach of the Garonne river, from Toulouse to Castelsarrasin, see Figure 1. The Garonne river is the largest river in the South-West of France; it drains an area of about 10,000 km² at Toulouse. The studied segment has an average slope of 0.9 m/km, the main channel width ranges from 100 to 300 m and the floodplain width can reach 4 km.
The topography of the reach is described by a set of 145 cross-sections. Each cross-section is the compound product of a local topographic survey carried out during the early 2000s (main channel) and an extraction from the French national Digital Elevation Model (DEM) RGE ALTI (floodplain).
Discharge data has been collected over the year 2019 at multiple gauges, located either directly on the Garonne or on its main tributaries. Water level data has also been collected over the same time period at a single gauge located in the middle of the reach (Verdun-sur-Garonne).
Land cover maps of France obtained from the IOTA 2 chain are already freely available. The latest version of the map classifies land use into 23 categories. This native nomenclature was deemed too detailed for developing the quasi-automatic procedure to estimate roughness coefficients. A new land cover map was therefore created by aggregating the original 23 classes down to the 8 macro-classes introduced in Table 2. Table 10 shows how the original classes were mapped to the aggregated classes.

Po domain
The study area consists of a 96-km-long reach, from Cremona to Borgoforte, see Figure 5. The Po river is the largest river in Italy; it drains an area of about 50,000 km² at Cremona. The studied segment has an average slope of 0.15 m/km and the main channel width ranges from 150 to 500 m. The floodplain is marked by the presence of two continuous dike systems, whose width ranges from 400 m to 4 km. The topography of the reach is described by a set of 91 cross-sections whose data points have been extracted from a 2005 2-m-resolution DEM made available by the Po River Basin Authority (AdBPo). This DEM was obtained by a fusion of LiDAR data for the floodplain and underwater sonar and ground survey data for the main channel.

Discharge and water level data has been collected over the period 2016-2019 at multiple in situ gauges from https://simc.arpae.it/dext3r/. Water level data has also been collected over the same time period at three virtual stations located on or very close to the Po river. This data has been downloaded from the online platform Hydroweb, which makes available water level time series of continental waters (large lakes, reservoirs, major rivers) obtained by processing nadir altimetry measurements.
Land cover maps generated by the IOTA 2 chain were not available for this study domain. The chain was therefore directly run to create a land cover map, using Corine Land Cover (CLC) 2018 as reference data for training. CLC was however not used as-is: its classes were aggregated into the 8 meta-classes described in Table 2 (see the details of the aggregation in Appendix A).
This resulted in the creation of a land cover map consistent with the one available for the Garonne study domain, see Figure 6.
The resulting roughness coefficients - defined as Strickler coefficients (the inverse of the Manning coefficient) - are illustrated in Figure 7 for the floodplain and in Figure 8 for the main channel. Moreover, general statistics on the Strickler coefficient values are displayed in Table C3 of Appendix C.

Approach
The roughness coefficient of a river portion is not a quantity that can be measured in the field. Because of that, the results generated by the method developed in this paper cannot be compared to any existing dataset. The method, however, generates roughness coefficients that can be used as input to build a hydraulic model, whose river-related outputs can be compared to observed data.
The approach adopted to validate the method consists of two steps. First, a hydraulic model is required over a study domain whose characteristics are within the scope of SWOT. This model, either constructed for the sake of validation or obtained from another source, serves as a reference model. Its roughness coefficients must be set through a calibration procedure, whose complexity should be limited to reflect the global scope of the method. The results obtained from this model are compared to observed data, which makes it possible to compute a reference level of performance achievable over the study domain, using metrics such as the Nash-Sutcliffe Efficiency (NSE). Second, a hydraulic model is constructed based on the roughness coefficients estimated with the method developed in this paper; this model is referred to as the estimation model. Its results are compared to the same observed data, and the method is deemed validated if the estimation performance is close to the reference performance.
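The comparison metrics used throughout the validation (NSE, RMSE, PBIAS) follow their standard definitions; the sketch below uses the sign convention stated later in the text, where a positive PBIAS indicates model underestimation:

```python
import numpy as np

def rmse(sim, obs):
    """Root mean square error (same unit as the series, m for water levels)."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def pbias(sim, obs):
    """Percent bias; positive values indicate model underestimation."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(100.0 * np.sum(obs - sim) / np.sum(obs))

def nse(sim, obs):
    """Nash-Sutcliffe Efficiency; 1.0 is a perfect fit, 0.0 means the model
    is no better than the observed mean."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(1.0 - np.sum((sim - obs) ** 2)
                 / np.sum((obs - np.mean(obs)) ** 2))
```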

The method described previously is first applied to the two study segments of the Garonne and Po rivers. For the channel, n_1, n_2 and m are directly determined for each cross-section. The values of n_b, n_1, n_3 and n_4 for the floodplain, and of n_b, n_3 and n_4 for the channel, are determined at each cross-section vertex and then aggregated over the cross-section length with a weighted average, to obtain a pair of values per cross-section. Cowan's formula (2) is finally used to calculate the value of n per cross-section for both the channel and the floodplain.

Model implementation over the study domains
1D hydraulic models for both the Garonne and Po study domains are constructed with Mascaret, set up to solve the shallow water equations with the finite-difference Preissmann scheme.
For the Garonne river, the upstream boundary condition is the discharge time series collected at the Portet-sur-Garonne gauge (only 10 km upstream of the actual model boundary). The downstream boundary is a rating curve built on the assumption of a normal flow. Discharge inputs from the 4 largest tributaries of the reach are integrated as point source discharges.
The collected cross-sections are used to represent the river topography. For the reference model, the roughness coefficient was assumed to be spatially homogeneous within the reach and differentiated between the main channel and the floodplain.

For the Po river, the upstream boundary condition is set to the discharge time series observed at Cremona and the downstream condition to the water level time series observed at Borgoforte. Discharge inputs from the tributaries are considered negligible.
The cross-sections that were manually delineated are used to represent the river topography; their width has been limited to the first system of dikes to prevent the model from inundating areas normally protected behind this system. Again, for the reference model, the roughness coefficient was assumed to be spatially homogeneous within the reach and still differentiated between the main channel and the floodplain. The model was calibrated over the period 2016-2018, by finding the pair of main channel and floodplain roughness coefficients that optimizes a mean NSE computed over water level time series at 4 in situ gauges (Cremona, Isola Pescaroli SIAP, Casalmaggiore, Boretto). The calibration procedure led to setting the main channel Strickler coefficient to 34 m^(1/3).s^(-1) and the floodplain one to 8 m^(1/3).s^(-1), corresponding to a mean NSE of 0.94. It also showed that the model was far less sensitive to the floodplain roughness coefficient value than to the main channel one.
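The calibration procedure described above amounts to a brute-force search over candidate Strickler pairs. The sketch below shows the idea; `run_model` is a hypothetical stand-in for a Mascaret run returning simulated water level series per gauge:

```python
import itertools
import numpy as np

def calibrate_roughness(run_model, obs_by_gauge, ks_min_grid, ks_maj_grid):
    """Pick the (main channel, floodplain) Strickler pair maximizing the mean
    NSE over the gauges, as in the Po calibration described in the text.

    run_model(ks_min, ks_maj) -> {gauge_name: simulated water level array}
    (a hypothetical interface, not the actual Mascaret API).
    """
    def nse(sim, obs):
        sim, obs = np.asarray(sim, float), np.asarray(obs, float)
        return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

    best_pair, best_score = None, -np.inf
    for ks_min, ks_maj in itertools.product(ks_min_grid, ks_maj_grid):
        sims = run_model(ks_min, ks_maj)
        mean_nse = np.mean([nse(sims[g], obs_by_gauge[g]) for g in obs_by_gauge])
        if mean_nse > best_score:
            best_pair, best_score = (ks_min, ks_maj), mean_nse
    return best_pair, best_score
```

In practice each candidate pair requires a full hydraulic simulation, which is why the paper keeps the calibration deliberately simple (one uniform pair per reach).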

                      RMSE (m)                  PBIAS (%)                 NSE
                      Reference   Estimation    Reference   Estimation    Reference   Estimation
Verdun-sur-Garonne    0.22        0.24          0.04        0.14          0.90        0.89

Table 11. Metrics of the simulated versus observed water levels for the reference and the estimation models of the Garonne at the Verdun-sur-Garonne gauge.
The MRE is close to 0 for the reference model, which is expected since it was calibrated to better fit the water level time series collected at Verdun-sur-Garonne. A bias of -0.12 m is obtained with the estimation model, and the SD has slightly decreased, going from 0.22 m to 0.21 m.

Table 11 shows the statistics obtained at Verdun-sur-Garonne. The bias previously noted is found again in the PBIAS value of 0.14% for the estimation model, compared to 0.04% for the reference model (a positive PBIAS indicates a model underestimation). Both the RMSE and the NSE are deteriorated, although only slightly. These statistics demonstrate that the results obtained with the estimation model are very close to those obtained with the reference model.
Similarly, Figures 11 and 12 show a comparison of the simulated and observed water levels for the reference model of the Po, aggregated among 4 gauges (Cremona, Isola Pescaroli SIAP, Casalmaggiore, Boretto). The MRE is close to 0 for the reference model, which is expected since it was calibrated to better fit these water level time series. A global bias of -0.33 m is obtained with the estimation model and the SD is slightly increased, from 0.40 m to 0.48 m.

Table 12 shows the results obtained at each gauge location. The global bias already noted is found in each PBIAS value (a positive PBIAS indicates a model underestimation). No clear pattern emerges from the results otherwise: the estimated roughness coefficient is purely physically-based, while the calibrated model encompasses model structural errors to match the observations. Still, the estimation model performs fairly well, which allows us to confidently use these roughness coefficients in our applications.

Applications
These new remote-data-fusion-based friction coefficients can serve various applications in hydrology. A first and obvious field of application is the use of these friction coefficients in one-dimensional shallow water models (such as HEC-RAS, Mascaret or DassFlow-1D, among others). A second field of application could be fine-scale 2D shallow water models for surveys or research on flood prediction. Indeed, our chain does not only provide friction coefficients in the main channel and in the near floodplain, but could also be extended to the entire floodplain using the 2D maps of IOTA 2 and ancillary data (SoilGrids, etc.). In this case the values computed by our chain could be used as-is or would require a small tuning. One must recall that the friction coefficients in shallow water models encompass the modeling errors, and so may vary from one model to another, notably between one-dimensional and two-dimensional models. But having a good starting point for a calibration reduces the computational cost.
Finally, these coefficients could also be used as-is, or as a good first guess for DA, in other models based on the shallow water equations or their simplifications (Manning equation, kinematic wave, etc.). For instance, they could be used in the large-scale model MGB-IPH (diffusive wave equations) or in some of the SWOT discharge algorithms, such as MetroMan (Durand et al., 2014), geoBAM (Brinkerhoff et al., 2019) or HiVDI (Larnier et al., 2021). Here we present two applications. The first application is a DA experiment with the one-dimensional shallow water model Mascaret, while the second application shows the benefits of using the computed friction coefficients as a first guess for the HiVDI SWOT discharge algorithm.

A first case of application for the remote-data-fusion-based friction coefficients consists of a Data Assimilation (DA) experiment around the Mascaret 1D hydraulic model (Goutal et al., 2012).
DA consists in correcting parameters and/or state variables of a numerical physical model from external observations to improve its forecasting capabilities. DA can be tackled with two main approaches: "state" estimation or "parameter" estimation. The SMURF framework was developed for such applications, more precisely within the preparation of the SWOT mission; for more details, the reader should refer to Mirouze and Ricci (2021). Note that SMURF can orchestrate both kinds of assimilation introduced above. Here, SMURF is used to orchestrate the application of the Ensemble Kalman filter (EnKF) DA algorithm around the Mascaret model.
The EnKF (Evensen, 2003) is a widely used extension of the classical Kalman filter (Kalman, 1960) developed for numerical models with nonlinear dynamics. Error covariance matrices are stochastically estimated directly from an ensemble of model runs, so they do not rely on a linearity assumption. For more details on the EnKF implementation, please see Appendix D.
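To make the covariance-from-ensemble idea concrete, here is a minimal stochastic-EnKF analysis step. It is a sketch, not the SMURF implementation: a linear observation operator H is assumed, whereas in the Mascaret experiment the observation operator is the hydraulic model itself:

```python
import numpy as np

def enkf_analysis(X, y, H, obs_std, rng=None):
    """One stochastic-EnKF analysis step (after Evensen, 2003).

    X:       (n_state, n_ens) ensemble of control vectors
             (e.g. the 14 Strickler coefficients of the Po experiment)
    y:       (n_obs,) observation vector
    H:       (n_obs, n_state) linear(ized) observation operator (assumption:
             the real experiment uses the Mascaret model as the operator)
    obs_std: (n_obs,) observation error standard deviations
    """
    rng = rng or np.random.default_rng(0)
    X, y = np.asarray(X, float), np.asarray(y, float)
    obs_std = np.asarray(obs_std, float)
    n_obs, n_ens = len(y), X.shape[1]

    A = X - X.mean(axis=1, keepdims=True)      # state anomalies
    HX = H @ X
    HA = HX - HX.mean(axis=1, keepdims=True)   # observation-space anomalies

    # Sample covariances estimated from the ensemble (no linearity assumption)
    Pxy = A @ HA.T / (n_ens - 1)
    Pyy = HA @ HA.T / (n_ens - 1) + np.diag(obs_std ** 2)
    K = Pxy @ np.linalg.inv(Pyy)               # Kalman gain

    # Perturbed observations: one noisy realization of y per ensemble member
    Y = y[:, None] + rng.normal(0.0, obs_std[:, None], size=(n_obs, n_ens))
    return X + K @ (Y - HX)
```

Each assimilation cycle in the experiment would apply such an update to the ensemble of control vectors, with the forecast step provided by Mascaret runs.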
Here, we performed two DA experiments around the Po model over the period from 1 October 2019 to 31 December 2019, as this period covers three flooding events. Both experiments assimilate real data from in situ and altimetry gauges. The difference between the two lies in the friction coefficients used as first guess to initiate the experiment: first, the uniform calibrated values, and second, the roughness coefficients from our methodology.
Water level measurements from 4 in situ stations and 3 virtual stations from Sentinel-3A are assimilated. All observations were converted into equivalent water surface elevations referenced to the same geoid as the Mascaret model. The observation error for the EnKF is modeled as a white noise with a given standard deviation depending on the gauge type. A dispersion of 10 cm was taken for the in situ gauges, and the averaged error provided by Hydroweb (http://hydroweb.theia-land.fr/) was used for the virtual gauges; see Table 13 for the detailed observation error values. In situ measurements are delivered every 15 or 30 minutes, with possible missing data within the time series, while virtual gauges have a revisit period of 27 days, also with possibly missing data. We set an assimilation window of 1 day for the EnKF. During a given assimilation cycle, all available measurements for the current day are assimilated and compared to the closest simulated time step.
For both experiments, the spatial distributions of Strickler coefficients in both the main channel (Ksmin variable) and the floodplain (Ksmaj variable) are simultaneously updated. Note that Mascaret does not differentiate between a right and a left floodplain, so only one overall value of the Strickler coefficient is required for the floodplain. Given the spatial coverage of the observations to be assimilated, the Strickler coefficients are set uniform over 7 zones delimited by the gauge positions on the 1D domain. Therefore, the control vector to update through data assimilation contains 14 elements: 7 Strickler coefficients for the main channel and 7 Strickler coefficients for the floodplain. The control error is set as a uniform law whose range is initialized as ±30% of the prior value in the river channel and as ±50% of the prior value for the floodplain.

DA performances for the real-data experiment can only be evaluated at the observation gauges; see Table 14 and Figures 13-14. Overall, the second experiment performs slightly better than the first one, which further validates our inferred roughness coefficients. After the assimilation in the first experiment, water level RMSEs are little or not reduced compared to the free run, while the contribution of the DA is clearer in the second experiment. Moreover, in the second experiment, DA displays interesting results with mostly improved RMSEs, but with improvements decreasing the farther downstream the observation gauge is located.
This is easily explained by the downstream control imposed by the model, where a water level time series is used as the downstream boundary condition and constrains the simulated water surface elevations, including at the two most downstream gauges.
The physically-based roughness coefficients are here used as initial prior values for DA experiments. It is true that part of the fine-resolution information was lost by aggregating over 7 larger areas, but this decision was based on the availability of the observations, as it would be complicated to assess the quality of updated roughness coefficients over unobserved parts of the study domain. Better performances could surely be obtained by finer tuning of the DA framework, but such a setup is out of the scope of the present study. Despite these conditions, the physically-based roughness coefficients are interesting prior parameters to initialize such DA experiments.

Table 13. For each zone ordered from upstream to downstream along the Po model (rows): observation error standard deviation for the associated gauge located at the upstream end of the zone (column 2); allowed value range for Ksmin (column 3); first experiment initial mean prior value for Ksmin (column 4); second experiment initial mean prior value for Ksmin (column 5); allowed value range for Ksmaj (column 6); first experiment initial mean prior value for Ksmaj (column 7); second experiment initial mean prior value for Ksmaj (column 8).

Table 14. Real-data experiment results per zone, evaluated by RMSE on Z at the observation gauges: free run RMSE (left-hand side of the slash) compared to analysis run RMSE (right-hand side of the slash).

Figure 13. Model performances over simulated water surface elevation Z at observation gauges before (blue "prior" bars) and after (red "analysis" bars) assimilation for Experiment #1, which uses uniform calibrated roughness coefficients.

SWOT Discharge Algorithm
A second case of application consists in assessing the benefits of using the remote-data-fusion-based friction coefficients as a first guess for the assimilation of synthetic SWOT observations in the HiVDI discharge algorithm (Larnier et al., 2021; Larnier and Monnier, 2020). The performance of this application is compared to the current implementation, which uses a rough estimation of the friction coefficients from the river width. As SWOT has not been launched yet, the discharge algorithms have so far been benchmarked on synthetic SWOT observations, namely the PEPSI challenges' datasets (Durand et al., 2016; Frasson et al., 2021). These datasets consist of synthetic SWOT observations of river portions with various flow regimes, computed using calibrated shallow water models. Some of the algorithms were also tested on observations from the SWOT Science simulator, which simulates the full error budget of the KaRIn instrument, or on AirSWOT data (a similar airborne instrument).

Figure 14. Model performances over simulated water surface elevation Z at observation gauges before (blue "prior" bars) and after (red "analysis" bars) assimilation, for Experiment #2 that uses roughness coefficients from our method.
The analyses of these benchmarks have highlighted the strong dependency of the accuracy of the estimated discharges on the priors, namely bathymetry, friction coefficients and mean discharge value. It is therefore mandatory to find new solutions to estimate good priors. The application of our chain to estimate prior friction coefficients is in line with this objective.
Usually, the first guess of the friction coefficients is computed from the mean width of the considered river portion. It may vary from 20 (small rivers) to 40 (very large rivers). This rough estimation, plus the fact that it provides a uniform value for the first guess, can lead to wrong discharge estimations after the assimilation of the synthetic SWOT observations.
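For illustration, a width-based first guess of this kind can be sketched as follows; the width thresholds and the linear interpolation in between are our own illustrative assumptions, not the actual HiVDI implementation:

```python
# Hypothetical sketch of a width-based first guess for the Strickler
# coefficient: a uniform value between 20 (small rivers) and 40 (very
# large rivers), chosen from the reach mean width. The 50 m and 500 m
# thresholds below are illustrative assumptions only.

def strickler_first_guess(mean_width_m: float) -> float:
    """Map a reach mean width (m) to a uniform Strickler Ks (m^(1/3)/s)."""
    if mean_width_m < 50.0:       # small river (assumed threshold)
        return 20.0
    if mean_width_m > 500.0:      # very large river (assumed threshold)
        return 40.0
    # linear interpolation between the two regimes (assumed)
    return 20.0 + 20.0 * (mean_width_m - 50.0) / (500.0 - 50.0)

print(strickler_first_guess(50.0))    # -> 20.0
print(strickler_first_guess(275.0))   # -> 30.0
```

Whatever the exact mapping, the key limitation stated above remains: the resulting first guess is uniform along the reach.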

Here we performed the comparison on three cases from the PEPSI challenges. The first two cases are the Garonne River and the Po River cases. These cases were selected because the friction coefficients for these rivers have already been estimated in this study.
The third case is the Brahmaputra case (India). This last case was selected based on the availability of input data for running the IOTA 2 chain. For this case, the Copernicus land cover at 100 m resolution (Buchhorn et al., 2020) was used for the calibration of the IOTA 2 classifier.

The PEPSI dataset does not contain any information about the separation between the main channel and the floodplain. But since the objective of these datasets was to benchmark algorithms under "nominal conditions", namely natural rivers with no or very little flooding, we conducted our experiments under the assumption that the flow is restricted to the main channel. Thus we only used friction coefficients estimated for the main channel. Moreover, no cross-sectional profiles were previously available. As introduced in Section 2.1.9, we combined simultaneous observations of width and water surface elevation, provided by the PEPSI dataset, to reconstruct the individual shape of each cross-sectional profile. Using a handcrafted routine, we then drew the profiles along the riverline such that they remain quasi-orthogonal to the riverline while never crossing each other.
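The profile reconstruction from simultaneous (width, water surface elevation) pairs can be sketched as below; the symmetric-channel assumption is ours for illustration, not necessarily that of Section 2.1.9:

```python
import numpy as np

def reconstruct_cross_section(widths, elevations):
    """Reconstruct a symmetric cross-sectional profile from simultaneous
    (width, water surface elevation) observations.

    Each pair (W, Z) is interpreted as the wetted width at elevation Z;
    sorting by elevation and mirroring the half-widths around the channel
    centreline yields a stepwise bank geometry (symmetry is assumed).
    Returns lateral distances y from the centreline and elevations zz.
    """
    order = np.argsort(elevations)
    w = np.asarray(widths, dtype=float)[order]
    z = np.asarray(elevations, dtype=float)[order]
    # left bank (highest level first), then right bank (lowest level first)
    y = np.concatenate([-w[::-1] / 2.0, w / 2.0])
    zz = np.concatenate([z[::-1], z])
    return y, zz

# Three observed (width, elevation) pairs give a 6-point profile
y, zz = reconstruct_cross_section([40.0, 60.0, 80.0], [1.0, 2.0, 3.0])
```

A real routine would also need to place and orient each reconstructed profile along the riverline, which is the handcrafted part described above.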
For the Garonne River and Po River cases, the friction coefficients already computed in our study were spatially interpolated using the curvilinear abscissae of the PEPSI 1 datasets. For the Brahmaputra River, the friction coefficients were directly computed with our new method at the location of the cross-sections in the PEPSI 2 dataset. The statistics of the values estimated using our method are listed in Table 15, with comparison to the constant values formerly estimated.
The results obtained for the three cases, compared with previous results, are displayed in Figure 15 for the Garonne River, in Figure 16 for the Po River and in Figure 17 for the Brahmaputra River. Two metrics commonly used in the benchmarking of SWOT discharge algorithms were computed on these results, namely the Nash-Sutcliffe Efficiency (NSE) and the Normalized Root Mean Squared Error (NRMSE). The values obtained are listed in Table 16.
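The two metrics can be computed as follows; note that normalizing the RMSE by the observed mean is an assumption on our part, as normalization conventions vary across benchmarks:

```python
import numpy as np

def nse(q_obs, q_est):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 means the
    estimate is no better than the observed mean."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_est = np.asarray(q_est, dtype=float)
    return 1.0 - np.sum((q_est - q_obs) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2)

def nrmse(q_obs, q_est):
    """Root mean squared error, here normalized by the observed mean
    (assumed convention)."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_est = np.asarray(q_est, dtype=float)
    return np.sqrt(np.mean((q_est - q_obs) ** 2)) / q_obs.mean()
```

For discharge series, both metrics are typically evaluated per river reach and then summarized over the domain.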

These results show that using the friction coefficients estimated by our method as a prior is very promising. Indeed, for the Garonne and Po River cases, the results outperform those obtained with the former method, with a 47 % reduction of errors (in terms of NRMSE) for the Garonne River and 13 % for the Po River. For the Brahmaputra River, the results obtained with the new method slightly underperform those obtained with the former method, with a 3 % increase of errors; the results remain comparable, however. These results show the potential benefits of this new method to better constrain the SWOT discharge algorithms.

Conclusions
The present study introduces an automatic methodology to derive physically-based roughness coefficients for hydrology applications. The methodology is applied via a processing chain based on Cowan's formula and can infer roughness coefficients in both the floodplain and the riverbed, in different formats: a two-dimensional raster image gathers the roughness coefficients in the floodplain, while a node-based shapefile contains the roughness coefficients along the main channel centerline. The interesting aspect of such a chain is that it only uses remotely-sensed, publicly available datasets and/or existing pre-processing chains. It can therefore be applied at global scale for a wide range of applications.
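As a reminder of the chain's core, Cowan's formula composes Manning's n from additive correction terms, the Strickler coefficient being its inverse; the component values below are purely illustrative and not taken from this study:

```python
def cowan_manning_n(n0, n1, n2, n3, n4, m):
    """Cowan's formula: n = (n0 + n1 + n2 + n3 + n4) * m
    n0: base value for the channel material
    n1: surface irregularities    n2: cross-section variations
    n3: obstructions              n4: vegetation
    m : meandering correction factor
    """
    return (n0 + n1 + n2 + n3 + n4) * m

def strickler_from_manning(n):
    """The Strickler coefficient Ks (m^(1/3)/s) is the inverse of Manning's n."""
    return 1.0 / n

# Illustrative case: clean earth channel, minor irregularities,
# some vegetation, no meandering correction.
n = cowan_manning_n(0.020, 0.005, 0.0, 0.0, 0.010, 1.0)
print(round(strickler_from_manning(n), 1))  # -> 28.6
```

In the processing chain described above, each additive term is inferred from remotely-sensed products (e.g. the vegetation term from land cover maps) rather than fixed by hand.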
Among the input data, land cover maps generated by the IOTA 2 chain are at the core of the methodology to derive roughness coefficients in the floodplain. Such maps are currently only available yearly over France, but the IOTA 2 chain itself can be run over different regions given adapted training data. During the present project, we used the already-available French land cover maps over a 75-km-long reach of the upstream Garonne River near Toulouse, France, and we ran the IOTA 2 chain ourselves over two additional domains: a 96-km-long reach along the middle Po River in Italy and a 250-km-long portion of the Brahmaputra River. The method was first tested over the Garonne and Po domains, and the third domain was only used in applications. The interesting aspect of using these three domains successively is the decreasing amount of available local and field data: land cover maps from the IOTA 2 chain are only available over France; local DEM and in-situ data were acquired only for the Garonne and Po domains; and profiles had to be reconstructed and simulated data had to be used for the Brahmaputra River. Therefore, we were able to apply our methodology over both well-gauged and ungauged domains.
Our chain aims at estimating physically-based roughness coefficients; therefore, for specific modeling applications, our coefficients still require some further tuning and calibration to match the model physics. Still, we evaluated the quality of such parameters against calibrated ones via hydraulic simulations using the Mascaret 1D hydraulic model, comparing the simulated water levels to water levels observed in situ. For both testing domains, namely the Garonne and Po domains, neither the simulations using the calibrated coefficients nor the simulations using the estimated coefficients performed clearly better than the other. These encouraging results allow us to confidently use those roughness coefficients as-is as a first guess to initialize different kinds of hydrology applications. One should keep in mind, though, that the produced roughness coefficients are better suited for one-dimensional applications.
Finally, the roughness coefficients generated with our method were used as a first guess in two applications: first, a real-data DA experiment, and second, an execution of one of the SWOT discharge algorithms, here HiVDI. The purpose of the DA experiment was to simultaneously correct the model's roughness coefficients in both the main channel and the floodplain, for the Mascaret model applied over the Po domain, by assimilating in-situ and altimetry observations of water levels (for a total of 7 observing gauges). The experiment showed the capacity to neatly reduce the water level RMSE. Performances were less obvious downstream, which can be explained by the model implementation forcing the water level as the downstream boundary condition.
Thus, the roughness coefficients generated by our method can be easily used in DA applications.
The second application is more related to SWOT-based applications. Indeed, one of the purposes of generating global-scale "good" priors of roughness coefficients is to later use them within SWOT discharge algorithms. The HiVDI algorithm, one of the official in-development SWOT discharge algorithms, was applied over the Brahmaputra domain, one of the PEPSI dataset domains, using our coefficients as a first guess for the friction parameters. To evaluate the benefits of our roughness coefficients, the performances of the experiment were compared to the usual implementation using the mean width to estimate the roughness coefficients. Results were very promising, as experiments using our roughness coefficients performed better or only slightly worse (for the Brahmaputra only) than the classical implementation. Given that the overall performances of SWOT discharge algorithms have been shown to highly depend on the quality of the priors, such results show the potential benefits of this new methodology to infer roughness coefficients and better constrain SWOT discharge algorithms.
The study offers several perspectives. First, provided that training data and cloud-free Sentinel-2 images are gathered at global scale, one could aim at extending the methodology to build global-scale maps of roughness coefficients. Riverlines from Frasson et al. (2019) or the SWOT a priori river database could be used to set the spatial coverage and resolution of the resulting product.

Roughness would be easily derived in the floodplain given that only global-scale raster products are used. The main challenge would lie in deriving cross-sectional profiles at global scale. Meanwhile, we are aware that several features of our methodology were parameterized according to our own knowledge of general hydrology and of the study domains we chose. To adjust our choices and/or assert them further, it would be interesting to lead a full sensitivity analysis of the roughness coefficients estimated by our methodology to the different terms of the Cowan formula and to the inputs.

Code and data availability. The IOTA 2 chain is available through the framagit platform at https://framagit.org/iota2-project/iota2.

Table C1. Descriptive statistics of estimated Strickler coefficients over the Garonne domain, over the floodplain per land cover type (rows 1-7) and over the main channel (row 8): mean (column 1), standard deviation (column 2), minimum (column 3), first quartile (column 4), median (column 5), third quartile (column 6) and maximum (column 7). Note that the "Water" land type refers to the main channel.

Table C2. Descriptive statistics of estimated Strickler coefficients over the floodplain per land cover type (rows 1-7) and over the main channel (row 8): mean (column 1), standard deviation (column 2), minimum (column 3), first quartile (column 4), median (column 5), third quartile (column 6) and maximum (column 7). Note that the "Water" land type refers to the main channel. Note also that there is no occurrence of the "low-height agricultural areas" type over this domain.

The control vector x contains the 7 Strickler coefficients for the main channel and the 7 Strickler coefficients for the floodplain used in Mascaret for a hydraulic simulation. It is initialized with the friction coefficients issued from the methodology presented in Section 2.1, averaged over the 7 zones between observation gauges. At a given assimilation cycle k, the prior version of the control vector, before assimilation, is denoted $x^b_k$, with b standing for "background". As for any Kalman-type DA algorithm, the EnKF comprises two steps per assimilation cycle k:

1. A prediction step, during which the control vector is propagated from time $t_k$ to $t_{k+1}$ using the numerical model $\mathcal{M}_{k,k+1}$, here Mascaret.

2. An analysis step, generating the analysis control vector denoted $x^a_k$, with a standing for "analysis", updated given the prior control vector, the available observations $y^o_k$ and their associated uncertainties, gathered in matrices $B_k$ for the control vector and $R_k$ for the observations. Moreover, when the control variables are of a different nature than the observations, as in our current experiment where the control variables are Strickler coefficients and the observations are water surface elevations, an observation operator denoted $\mathcal{H}_k$ (and $H_k$ for its matrix form) is used to map the control variables into the observation space, giving the Kalman-type analysis equation:

$$ x^a_k = x^b_k + K_k \left( y^o_k - \mathcal{H}_k(x^b_k) \right), \qquad K_k = B_k H_k^T \left( H_k B_k H_k^T + R_k \right)^{-1} \tag{D1} $$

Now, the EnKF specificities state that the control-variable-related error covariance matrices can be stochastically estimated from an ensemble of simulations. Instead of working with a single instance of the control vector, the EnKF works with a control ensemble $X^b_{e,k}$ containing $n_e$ instances of the control vector, each drawn from a chosen random distribution characterizing the uncertainty model. Note that in our application, we chose $n_e = 100$ to ensure an ensemble large enough with regard to sampling errors but still reasonable to avoid long execution times. In SMURF, a uniform law is used to perturb the control vector such that:

$$ x^b_{i,k} = \bar{x}^b_k + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{U}(-p, p), \qquad i = 1, \dots, n_e, $$

where $\bar{x}^b_k$ is the mean prior control vector from the previous-cycle control ensemble and p is the range around the mean values. Recall that p was set to 50 for Ksmaj and 30 for Ksmin. Note that this implementation implies that each control variable in the control vector is natively independent from all other variables.

Along with the control ensemble $X^b_{e,k}$, its counterpart in the observation space is then generated:

$$ Y^b_{e,k} = \left\{ \mathcal{H}_k(x^b_{i,k}) \right\}_{i=1,\dots,n_e}. $$

Using these two ensembles, the control-variable-related error covariance matrices are directly estimated from the ensemble anomalies, e.g.:

$$ B_k \approx \frac{1}{n_e - 1} \sum_{i=1}^{n_e} \left( x^b_{i,k} - \bar{x}^b_k \right) \left( x^b_{i,k} - \bar{x}^b_k \right)^T. $$

SMURF then applies the principle of observation randomization (Burgers et al., 1998), where the observation vector is also randomized such that:

$$ Y^o_{e,k} = \left\{ y^o_k + \varepsilon^o_i \right\}_{i=1,\dots,n_e}, \qquad \varepsilon^o_i \sim \mathcal{N}(0, R_k). $$

In the literature, there are two ways to apply Equation D1: one can apply it individually to each member to generate an analysis control ensemble, or one can apply it to the mean of the background control ensemble. In SMURF, the second option is implemented.
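The analysis step described above can be sketched as follows. This is a minimal stochastic EnKF with observation randomization, written per member for simplicity, whereas SMURF applies the analysis equation to the ensemble mean; it is an illustration, not the SMURF implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def enkf_analysis(X_b, y_obs, obs_operator, R):
    """One stochastic EnKF analysis step with observation randomization
    (Burgers et al., 1998), applied per ensemble member.

    X_b: (n_e, n_x) background ensemble of control vectors
    y_obs: (n_y,) observation vector
    obs_operator: maps a control vector to observation space (in the
                  paper's setup, a Mascaret run returning water levels)
    R: (n_y, n_y) observation error covariance matrix
    """
    n_e = X_b.shape[0]
    HX = np.array([obs_operator(x) for x in X_b])     # ensemble in obs space
    Xp = X_b - X_b.mean(axis=0)                       # control anomalies
    HXp = HX - HX.mean(axis=0)                        # obs-space anomalies
    B_HT = Xp.T @ HXp / (n_e - 1)                     # estimate of B H^T
    S = HXp.T @ HXp / (n_e - 1) + R                   # H B H^T + R
    K = B_HT @ np.linalg.inv(S)                       # Kalman gain
    # randomized observations: one perturbed draw per member
    Y = y_obs + rng.multivariate_normal(np.zeros(len(y_obs)), R, size=n_e)
    return X_b + (Y - HX) @ K.T                       # analysis ensemble
```

With an identity observation operator and a small R, the analysis ensemble mean is drawn close to the observations, which is the expected Kalman behaviour.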

Finally, note that, in the classical definition of the Kalman filter, there also exists an analysis equation to update the background error covariance matrix $B_k$ from one assimilation cycle to another. In SMURF's original implementation, the decision was made not to use this equation and to keep the initial definition of the control variable errors. This choice maintains enough dispersion in the control ensemble and ensures EnKF efficiency. However, for our application, we decided to propagate the analysis control error matrix from one assimilation cycle to another.