Reply on RC1

The objective of the paper is not clear or basically wrong. For England, an accurate appraisal of flooded areas and depths for the 100 year flood is available on a majority of locations so that the results of the calculations are useless. Such calculations as proposed in the paper can be useful for more extreme floods for which the uncertainty is always higher because of both the uncertainty of the flow and the lack of calibration data. They may be also useful for countries in which the data are scarce, maps of historical floods are lacking and for which simplified calculations could permit to obtain a whole coverage of the country without costly studies. If the ultimate objective is one out of the two quoted here above, the text should be reoriented in order to be sure that additional data for extreme floods or raw data would be available.

The purpose of the study is to test whether a local inertial approximation to the shallow water equations, combined with a sub-grid topographic parametrisation, is capable of simulating flood inundation over a large area for use in a land-surface model. The aims of the study are stated on p2. lines 18-25 and we will strengthen the statement further in a revised version of the paper.
Land-surface models represent fluxes of water and energy in the atmospheric boundary layer and serve as the lower boundary to weather forecasting and climate models. They therefore require specification of surface properties to calculate latent vs. sensible heat exchange and can be affected significantly by the presence or absence of terrestrial open water on floodplains. Correct specification of the land boundary is therefore essential to calculate these fluxes correctly. As part of weather forecasting and climate models, LSMs are also increasingly used to provide situational awareness of potential flood hazard over large areas, further motivating our study.
To perform this evaluation, we use existing Environment Agency flood inundation data to test whether the model's predictions match an existing benchmark. There is no suggestion that the results from this evaluation against the benchmark data are intended to replace the benchmark data (as we state explicitly on p.2 lines 20-25).
The structure of the paper is also not clear. First, the organization of the paper is not provided at the end of § 1. Second, the method (the best one) should be first described and second, the validation (or calibration) of the results should follow. Third, the discussion could question some aspects of the method and/or the efficiency of the method comparing it to other methods. However, the paper is not organized like this. It seems to me that the paper begins by discussing the one and then the other component of the method. For instance, if table 1 sums up the comparison between alternatives, a conclusion should be provided just below; in the paper, oppositely, one of the three methods is compared to another estimate on a data set that is not described and appears to date from before 1991 (30 years ago!). If such data set is a reference, other references should be provided; if not I guess that the conclusions from this data set are questionable similarly to other studies that tried to link bankfull depth to drainage area (they are a lot not quoted here!). Similar procedure is used later on for wbf (instead of hbf) without more explanations and any clear justification.
We thank the reviewer for their comments and will revise the structure of the paper accordingly. In the circumstances we propose to remove the analysis of the parameter-based sub-grid model (which performs less well) from the revised manuscript to make room for additional uncertainty experiments suggested below and by Reviewer 2.
Our rationale for using the original flood depth estimation procedure referred to in the 1991 study is to remain as close as possible to the method used to construct the benchmark data. This is explained on p3. lines 13-14. The justification for this approach is so that errors and uncertainties diagnosed from our comparison can then be attributed to the structure of our model rather than to differences in the driving data used. During the development of this work, we did also trial another dataset produced by estimating flood discharges from flow records across the United Kingdom as part of the Flood Estimation Handbook. We will include a substantial section showing additional results from that analysis in the revised manuscript.
In parallel work we have collated additional width observations and whilst we did not intend to use them in the present study, we will include them in a revised version of this manuscript given the interest in updated width observations.
The third parameter to be estimated is the channel roughness: I really do not understand the few lines of explanations (how to estimate roughness from a database of river cross-sections?) but I retain that it is so difficult that the authors are using a uniform value of 0.04. Similarly, I could not understand what is the relaxation time at the bottom of page 7 but I am quite sure that this parameter is not linked with the three previous ones and thus the explanation is not at the right place.
Channel roughness was calculated using a standard procedure described by Chow et al. together with a newly-compiled database of bed texture observations. We used a reference value of 0.04 only where no additional information was available. The procedure is explained in detail on p7 lines 7-13 and we propose to add further clarification in revision.

The relaxation time computed in Eq 8 is a time for the model to run to reach its steady state. It is not connected to the roughness calculation but determines the length of model integration required to achieve our aims. We will clarify the purpose of this equation in a revised submission and move it to its own section.
2.4 describes two sub grid parametrization; however, because the detailed procedure of the whole method was never presented, the reader cannot understand the advantage of any sub grid parametrization compared to the solution to establish directly a relation between depth and inundated area from the DTM source.
We will add some additional explanation here, including a diagram showing how the sub-grid information has been used.
From §2.5, I understand that the authors are not using the maps of the flooded area but a percentage of flooded area per km2 to validate their model. However, from §2.6 I understood that the plot for comparison is 50 m x 50 m. What is correct? I ask the authors to clearly explain what means a hit rate of 71%, a FAR of 9% and a success score of 67% in simple words. The following discussion between the various regions does not interest me because the tables 4 and 5 show similar results from one region to another one. It might be more useful to show quantitative results at smaller regions (subregions) for which the results are very different.
The model uses a sub-grid topographic distribution (here at 50 m resolution) to calculate flooded fraction for each 1 km grid box. The evaluation metrics are used with their standard definitions given in the text. Hit rate is the probability of detection of flooding in the 1 km grid box; FAR is the false alarm rate, and the critical success index is a metric which accounts for true positives but penalises for false negatives, as defined in the text. We will amplify this explanation in revision and propose to show an analysis of flooded fraction across the model domain as suggested in SC1. In addition, we will supply model performance metrics for sub-regions given in Figure 9 as the reviewer suggests.
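As an illustration of these metrics in simple terms, the following minimal sketch computes them from a list of 1 km grid boxes, each flagged as flooded or dry in the model and in the benchmark. It assumes the widely used contingency-table definitions (hit rate = hits / (hits + misses); FAR taken here as the false alarm ratio, false alarms / (hits + false alarms); critical success index = hits / (hits + misses + false alarms)); the definitions given in the paper take precedence if they differ. The function name and the example values are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def contingency_scores(model_wet: Sequence[bool],
                       benchmark_wet: Sequence[bool]) -> Dict[str, float]:
    """Hit rate, FAR and CSI for paired flooded/dry flags per grid box.

    Assumed definitions (standard contingency-table forms, not quoted
    from the paper):
      hit rate = hits / (hits + misses)
      FAR      = false alarms / (hits + false alarms)
      CSI      = hits / (hits + misses + false alarms)
    """
    if len(model_wet) != len(benchmark_wet):
        raise ValueError("model and benchmark must cover the same grid boxes")

    pairs = list(zip(model_wet, benchmark_wet))
    hits = sum(1 for m, b in pairs if m and b)              # flooded in both
    false_alarms = sum(1 for m, b in pairs if m and not b)  # model only
    misses = sum(1 for m, b in pairs if not m and b)        # benchmark only

    def ratio(num: int, den: int) -> float:
        # Undefined scores (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "hit_rate": ratio(hits, hits + misses),
        "far": ratio(false_alarms, hits + false_alarms),
        "csi": ratio(hits, hits + misses + false_alarms),
    }


# Hypothetical example: six grid boxes, not data from the study.
model = [True, True, True, False, False, True]
benchmark = [True, True, False, True, False, True]
scores = contingency_scores(model, benchmark)
print(scores)
```

In this made-up example the model reproduces three of the four benchmark flooded boxes (hit rate 0.75), one of its four flooded boxes is not flooded in the benchmark (FAR 0.25), and three of the five boxes flooded in either data set agree (CSI 0.6).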
For such a type of model, the sensitivity analysis is a key issue and so a wider sensitivity analysis was expected. Moreover, I could not understand why the results are not sensitive to the Manning coefficient: once the geometry is provided, transforming a flow into a flooded area or flood depth should depend on the Manning coefficient if hydraulics equations are used.
We thank the reviewer for this suggestion and propose to add a broader range of sensitivity analysis to the revised paper both in response to this comment and a similar suggestion from Reviewer 2 to consider topographic uncertainty. It is indeed notable that the sensitivity to Manning's n is more subdued than initially expected and we will explore this in more detail with a broader range of analysis in revision.
The first sentence of the conclusion should be written in a different way in order to avoid confusion: I understood that the authors calculated the percentage of flooded area for their studied area quite accurately but they are not providing a map of flooded areas accurately and are not providing at all the peak water depth of the 100 year flood in any location accurately. The objective (for validating the method) should be to provide such results as the ratio of the water depths: reference over calculation for any location (for instance on the 50 m grid).
We thank the reviewer for this important point and will add a comprehensive regional analysis comparing flooded fraction between the model and the benchmark to a revised submission. Unfortunately, flooded depth is not given in the benchmark data and so it will not be possible to include that variable in the analysis.
As a conclusion, I guess that with a new organization of the paper, adding explanations and real validation of their method, the authors could obtain a valuable paper. However, I am not sure that the method is useful for England and that the method can be extended to other countries because a lot of "morphological" equations are very specific to the local geography.
We are pleased that the reviewer sees potential value in the paper and grateful for their suggestions to improve it. We will implement all the recommended changes in a revision. Whilst the morphological equations are specific to the study region, they are really only used to ensure that the test case is as close as possible to that used to produce the benchmark data and do not preclude wider application of the model.