Comment on hess-2021-161

In their paper, the Authors examine the role of rainfall spatial and temporal variability in flood frequency across drainage scales in the highly urbanized Dead Run watershed (14.3 km²) outside of Baltimore, Maryland, USA, using a flood frequency analysis framework that combines stochastic storm transposition-based rainfall scenarios with the physically based, distributed GSSHA model. Their results show the complexity of flood response within several subwatersheds for both short (< 50 years) and long (> 100 years) rainfall return periods. The Authors show that the impact of impervious area on flood response decreases with increasing rainfall return period and that, for extreme storms, the maximum discharge is closely linked to the spatial structure of rainfall, especially the spatial coverage of the storm core. The spatial heterogeneity of rainfall increases flood peak magnitudes by 50 % on average at the watershed outlet and its subwatersheds, for both small and large return periods. According to the Authors, the results imply that the commonly made assumption of spatially uniform rainfall in urban flood frequency modeling is problematic even at relatively small basin scales.


Overall remarks
I deeply admire the effort the Authors made in preparing their article. I found it very interesting and thought-provoking. In my opinion, the paper is relatively well written, presents an interesting and practically appealing approach to the analysis of rainfall-runoff processes in small urban catchments, and may inspire scientists performing similar analyses in other cities. I am also pleased to say that the paper was written with care. In this review, let me concentrate on some issues that, I believe, could be corrected.
Having read the paper, I had the impression that, although interesting, it is somewhat too wordy in places (e.g. the introduction and the discussion) and that its content could be 'squeezed' by a page or two.
The first chapter outlines the problem of translating the spatiotemporal distribution of rainfall into flood response. The Authors concentrate on small urban areas, arguing that, owing to the complexity of hydrologic conditions and of rainfall spatiotemporal structure, flood frequency analysis in smaller urban areas still leaves room for scientific investigation. Although the Authors note that this is not a new scientific issue, they claim that the novelty of their research stems, inter alia, from a better understanding of the spatial and temporal variability of rainfall and of runoff generation, which is now possible thanks to modern techniques of rainfall monitoring and hydrological modelling. They also point out that flash flooding, especially in small urban areas, has not received proper attention from researchers. The Authors present an exhaustive literature review in this field; however, some of the cited papers are not recent, and the list could perhaps be complemented with newer research results.
To improve the modelling of spatiotemporal rainfall conditions in small catchments, the Authors apply the stochastic storm transposition (SST) method, which, combined with hydrological models, can be used for multiscale rainfall frequency analysis and flood frequency analysis.
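For readers less familiar with SST, the core idea can be illustrated with a minimal, generic sketch. It assumes a hypothetical catalog of gridded storm totals, a Poisson-distributed number of storms per year, random transposition by whole-cell shifts, and the basin-mean depth as the statistic of interest. All function names and parameters are illustrative; this is not the RainyDay implementation used by the Authors.

```python
import math
import random

def poisson(rng, lam):
    """Draw a Poisson-distributed storm count (Knuth's method, fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def basin_mean(field, mask, dy, dx):
    """Mean rainfall depth over watershed cells after shifting the storm by (dy, dx) cells."""
    rows, cols = len(field), len(field[0])
    total, n = 0.0, 0
    for i, row in enumerate(mask):
        for j, inside in enumerate(row):
            if not inside:
                continue
            si, sj = i - dy, j - dx
            if 0 <= si < rows and 0 <= sj < cols:  # cells shifted in from outside the storm get no rain
                total += field[si][sj]
            n += 1
    return total / n

def sst_annual_maxima(catalog, mask, n_years, storms_per_year, max_shift, seed=0):
    """Synthetic annual maxima of basin-mean depth from randomly transposed catalog storms."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_years):
        year_max = 0.0  # a year with no sampled storms contributes zero
        for _ in range(poisson(rng, storms_per_year)):
            storm = rng.choice(catalog)
            dy = rng.randint(-max_shift, max_shift)
            dx = rng.randint(-max_shift, max_shift)
            year_max = max(year_max, basin_mean(storm, mask, dy, dx))
        maxima.append(year_max)
    return maxima

def empirical_return_periods(maxima):
    """(return period in years, depth) pairs using the Weibull plotting position T = (n + 1) / rank."""
    ranked = sorted(maxima, reverse=True)
    n = len(ranked)
    return [((n + 1) / rank, depth) for rank, depth in enumerate(ranked, start=1)]

# Hypothetical 3x3 storm catalog (mm) and a 4-cell "watershed" mask.
catalog = [
    [[0, 10, 0], [10, 40, 10], [0, 10, 0]],  # compact convective core
    [[5, 5, 5], [5, 5, 5], [5, 5, 5]],       # widespread, uniform rain
]
mask = [[False, False, False], [False, True, True], [False, True, True]]
maxima = sst_annual_maxima(catalog, mask, n_years=200, storms_per_year=3, max_shift=1)
```

Because each synthetic year reuses observed storms at new positions, the length of the synthetic record is not limited by the length of the observations, but the tail of the result is bounded by the storms actually present in the catalog.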
In this chapter, the Authors state two scientific questions (page 3, lines 69-72): '(1) How does flood frequency in small urban watersheds vary with diverse space-time rainfall structure and rainfall magnitude? (2) Among the space-time feature of rainfall, what are the dominant features that control flood peak distribution in small urban watersheds?' In my opinion, these questions cannot contribute to the development of hydrological science; rather, they deal with the application of already fossilised knowledge of rainfall-runoff processes, here to the relatively poorly recognised conditions of a single small, highly urbanised area. I suggest rephrasing the main goal of the research to emphasise a new approach to modelling flooding processes in urbanised catchments within a more general context.
The second chapter presents the data and the case study area, the highly urbanized 14.3 km² Dead Run (DR) watershed located west of Baltimore, Maryland, USA. The Authors chose this area because its exceptional wealth of data allows rainfall and hydrologic response to be examined, and because some preliminary research had already been performed in this region. It is unclear to me why the Authors decided to carry out their research on a watershed rather than on a catchment, which is easier to model because of water balance relationships. As model input, radar rainfall data bias-corrected using 54 rain gauges in and around Baltimore City were applied. Although the Authors refer the reader to Zhou et al. (2019) for a detailed description of the bias-correction methodology, it would be good if they briefly described these techniques in a paragraph or two, which might improve the coherence of the publication and underline its highlights. Five-minute discharge data from six U.S. Geological Survey (USGS) gauges in the case study area were used to identify the runoff model parameters. The problem, however, is that only one of the six gauges provides a reasonably long discharge record; the remaining five provide data only from 2008 onward, which in my opinion is too short to observe any long-term alterations in the rainfall-runoff regime of this region, especially where intensive urbanisation has been occurring. Moreover, datasets this short should not be used in flood frequency analysis (FFA). Section 2.2 describes the two-dimensional GSSHA model used to simulate multiscale flood response; its Dead Run setup was created by one of the Authors in 2015 and later modified for the present research. The rainfall scenarios are briefly described in Section 3.3.
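To illustrate the kind of short description requested for the gauge-based correction of the radar rainfall, a minimal sketch of one common approach, a single multiplicative mean-field bias factor, is given below. This is an assumption for illustration only: the procedure of Zhou et al. (2019) may differ (e.g. it may be spatially or temporally varying), and the function names, the `min_depth` threshold and the numbers are hypothetical.

```python
def mean_field_bias(gauge, radar_at_gauge, min_depth=0.1):
    """Mean-field bias factor: summed gauge depth / summed radar depth over valid pairs."""
    pairs = [(g, r) for g, r in zip(gauge, radar_at_gauge) if g >= min_depth and r >= min_depth]
    if not pairs:
        return 1.0  # no usable gauge-radar pairs: leave the radar field unchanged
    return sum(g for g, _ in pairs) / sum(r for _, r in pairs)

def correct_field(radar_field, factor):
    """Apply one multiplicative factor to every radar pixel (mm)."""
    return [[depth * factor for depth in row] for row in radar_field]

# Hypothetical event totals (mm) at four gauges and the co-located radar pixels.
gauge = [12.0, 8.0, 0.0, 20.0]
radar = [10.0, 6.0, 0.05, 14.0]
factor = mean_field_bias(gauge, radar)  # uses the three pairs above min_depth: 40 / 30
corrected = correct_field([[3.0, 0.0], [1.5, 6.0]], factor)
```

A single factor corrects the overall volume but leaves the radar's spatial pattern untouched, which is exactly why the details of the correction matter for a study focused on rainfall spatial structure.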
The rainfall scenarios were generated with RainyDay, an open-source SST software package. From the text, one can infer that the Authors use ready-made models and techniques to perform their case study. These techniques may well have required some modification and adjustment to the specifics of the Dead Run region, but the paper does not clearly state whether such modifications go beyond the routine adaptation of the models to the case study.
Subsection 2.4 describes the characteristics of rainfall and hydrologic response. The methodology presented there suggests averaging rainfall over the whole modelled area (Eqs. 1-3), which in my opinion averages out the results and erases the spatial diversity of rainfall events and the local catchment responses characteristic of urban areas with highly differentiated land cover. This methodology deserves more comment.
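The concern can be made concrete with a small sketch: two hypothetical rainfall fields with the same basin-mean depth but very different spatial structure. The summary statistics below (cell maximum, spatial coefficient of variation, fraction of cells above an assumed storm-core threshold) are illustrative choices, not the quantities defined in Eqs. 1-3 of the manuscript.

```python
from statistics import mean, pstdev

def summarize(field, core_threshold):
    """Basin-mean depth alongside statistics that the basin mean alone does not show."""
    cells = [depth for row in field for depth in row]
    avg = mean(cells)
    return {
        "basin_mean": avg,
        "cell_max": max(cells),
        "spatial_cv": pstdev(cells) / avg if avg > 0 else 0.0,
        "core_fraction": sum(depth >= core_threshold for depth in cells) / len(cells),
    }

# Two hypothetical 4x4 rainfall fields (mm) with the same basin mean of 20 mm.
uniform = [[20.0] * 4 for _ in range(4)]
concentrated = [[80.0 if (i < 2 and j < 2) else 0.0 for j in range(4)] for i in range(4)]

for name, field in (("uniform", uniform), ("concentrated", concentrated)):
    print(name, summarize(field, core_threshold=50.0))
```

Both fields report a basin mean of 20 mm, while the concentrated one places four times that depth on a quarter of the area; any response metric built only on the basin average cannot distinguish them.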
Chapter 3 discusses the results of the simulations. Unfortunately, it is hard to assess the accuracy of the model calibration, because it is described in Appendix A, which I could not find. However, the Authors acknowledge (Fig. 2) that the errors in the estimated peak flows range from -35 % to 57 % of the peak height (probably in m³/s), which means that the models used are technically useless, even though they perform reasonably well for average discharges. The timing results are closer to reality and differ from the actual peak time by only a quarter. The inaccurate estimation of flood magnitude affects the conclusions presented in the paper. Perhaps other modelling tools or a better identification of parameters would improve the simulation results. Data quality may also contribute to these errors; obviously, radar-based rainfall data cannot match the accuracy of ground-based rain gauge monitoring.
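For clarity, a minimal sketch of how peak magnitude, peak timing and whole-hydrograph errors are typically computed is given below, assuming the percentages in Fig. 2 are relative peak errors of this kind. The hydrographs are hypothetical; the 5-minute step matches the resolution of the USGS discharge data. The Nash-Sutcliffe efficiency is shown as one common whole-hydrograph measure; the manuscript may use different metrics.

```python
def peak_metrics(obs, sim, dt_minutes=5):
    """Relative peak error (%) and peak timing error (minutes; positive means the simulated peak is late)."""
    obs_peak, sim_peak = max(obs), max(sim)
    rel_error_pct = 100.0 * (sim_peak - obs_peak) / obs_peak
    timing_min = (sim.index(sim_peak) - obs.index(obs_peak)) * dt_minutes
    return rel_error_pct, timing_min

def nse(obs, sim):
    """Nash-Sutcliffe efficiency over the whole hydrograph (1 is a perfect fit)."""
    m = sum(obs) / len(obs)
    return 1.0 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / sum((o - m) ** 2 for o in obs)

# Hypothetical 5-minute hydrographs (m3/s) for one event.
obs = [1.0, 3.0, 10.0, 6.0, 2.0]
sim = [1.0, 4.0, 8.0, 7.0, 2.0]
print(peak_metrics(obs, sim), round(nse(obs, sim), 3))
```

In this toy event the simulated peak is 20 % too low yet perfectly timed, and the overall fit remains high, which illustrates how a model can look acceptable on average while missing the peak magnitude that drives flood frequency estimates.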
The Authors state that the rainfall return periods were calculated by means of a nonparametric kernel function method for periods up to 200 years. It is not clear which dataset (a few years of measurements?) was used for this estimation or which kernel function parameters were applied. The comments based on these results are either trivial, e.g. 'For the 200-yr rainfall return period, the interquartile range (IQR) is larger than other return periods.', or at least strange, e.g. 'Unlike the IQR results, CV decreases with increasing return period', as uncertainty, and thus variability, tends to grow with the return period of the estimated quantile in any catchment.
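To make the request for kernel details concrete, a minimal sketch of kernel-based return-level estimation is given below, together with the IQR and CV statistics quoted above. It assumes a Gaussian kernel with a normal-reference bandwidth, annual maxima as input and T = 1/P(X > x); the kernel, bandwidth rule, function names and the 10-year record are all hypothetical, since the manuscript does not report them.

```python
import math
from statistics import mean, quantiles, stdev

def normal_reference_bandwidth(sample):
    """Gaussian-kernel bandwidth from the normal-reference rule h = 1.06 * s * n**(-1/5)."""
    return 1.06 * stdev(sample) * len(sample) ** (-0.2)

def exceedance_prob(x, sample, h):
    """P(X > x) under a Gaussian kernel density: average of 1 - Phi((x - x_i) / h)."""
    return sum(0.5 * math.erfc((x - xi) / (h * math.sqrt(2.0))) for xi in sample) / len(sample)

def return_level(T, sample, h, tol=1e-6):
    """Depth whose annual exceedance probability is 1/T, found by bisection."""
    target = 1.0 / T
    lo, hi = min(sample) - 10.0 * h, max(sample) + 10.0 * h
    while hi - lo > tol:  # exceedance_prob decreases monotonically in x
        mid = 0.5 * (lo + hi)
        if exceedance_prob(mid, sample, h) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def iqr_and_cv(values):
    """Interquartile range and coefficient of variation of, e.g., simulated peaks at one return period."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1, stdev(values) / mean(values)

# Hypothetical 10-year record of annual maximum rainfall depths (mm).
annual_max = [38.0, 45.0, 52.0, 41.0, 60.0, 47.0, 55.0, 43.0, 70.0, 49.0]
h = normal_reference_bandwidth(annual_max)
for T in (10, 50, 200):
    print(T, round(return_level(T, annual_max, h), 1))
```

With only ten values, the 200-year level is governed almost entirely by the kernel tail around the largest observation and by the chosen bandwidth, which is why these choices should be reported. Note also that the IQR is an absolute spread while the CV is scaled by the mean, so the two statistics need not move in the same direction.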
In my opinion, however interesting, the results obtained for the Dead Run case study cannot easily be generalised to similar catchments, even if the same models and techniques were used. I would suggest comparing the results with other similar catchments to make them more universal; otherwise, the paper will attract only local interest. Having read the discussion, I could not resist the impression that the Authors attempt to analyse too many complex phenomena with (too) short and insufficient data and with tools of seriously limited applicability. As a result, the simulations are affected by large uncertainty or evident mistakes (as with the CV-quantile return period relation). Perhaps concentrating on one event in one place (e.g. the catchment response to torrential rainfall) would improve the quality of the paper. Moreover, when almost all inputs are in fact modelled (e.g. rainfall data based on radar measurements, 200-year quantiles based on short datasets), one cannot expect credibly accurate results. On top of that, poorly calibrated models introduce additional bias, further increasing the uncertainty of the results.
The last chapter concludes the paper. It summarises the text concisely in the form of bullet points, but I miss any reference to the universality of the results. I would expect some 'take-home' recommendations for other hydrologists and practitioners. I am also not sure whether the Authors managed to answer the two questions stated in the first chapter.

Specific comments
The Authors refer to Appendix A, which I could not find. The map in Fig. 1 is unreadable: what are the blue lines, and where is the Dead Run watershed? There are misspellings in the map legend. Figure 3 is incoherent.

Technical remarks
The font size varies throughout the manuscript. It is very difficult to assess the results when graphs and tables are attached at the end of the file instead of in their original place (scrolling or printing is necessary).

Summary and recommendation
In my opinion, the paper needs substantial corrections to meet the standards of HESS. The novelty of the applied methodology is dubious, and the results obtained are unreliable. I would also suggest rephrasing some parts of the article, because it is easy to lose the thread among the many parameters, models, and methods applied in the study. Besides, the results should be generalised to other similar areas; otherwise, they are only of local interest.