Merging modelled and reported flood impacts in Europe in a combined flood event catalogue for 1950–2020

Paprotny, Dominik; Rhein, Belinda; Vousdoukas, Michalis I.; Terefenko, Paweł; Dottori, Francesco; Treu, Simon; Śledziowski, Jakub; Feyen, Luc; Kreibich, Heidi

doi:10.5194/hess-28-3983-2024

Articles | Volume 28, issue 17

https://doi.org/10.5194/hess-28-3983-2024

© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Special issue:

Methodological innovations for the analysis and management...

Research article | Highlight paper | 02 Sep 2024

Final revised paper (published on 02 Sep 2024)
Preprint (discussion started on 23 Feb 2024)

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-499', Anonymous Referee #1, 26 Mar 2024

Summary
The modeling framework outlined and implemented in this paper leverages recent advances in hydrodynamic modeling to create a spatially explicit database of almost 15,000 flood events over a 70-year period in 42 European countries. Beyond this accomplishment, the authors further classify whether the spatial extents and potential impacts of these events are reasonable by conducting a document-based search for historical information on the events included in their generated record. Furthermore, they leverage recently published databases (e.g., HANZE 2.1 and Global Flood Database) to demonstrate how the coverage of events modeled in this framework compares to events modeled (or identified in the case of HANZE 2.1) using different methodologies. I consider both the model framework and efforts to classify and compare the spatial extents and potential impacts of flood events generated in this paper a novel contribution to the discussion and evaluation of how flood risk trends have changed over time. Additionally, as the authors note, this kind of dataset of previous flood events can be used in the future to investigate how different drivers of flood risk, such as climate, land use, and demographic changes, have influenced outcomes of flood risk through time. Few studies have had the necessary data to attribute losses from flooding to specific drivers, and the record of events created in this paper constitutes a step forward to investigating these questions.
The results presented in this paper are also compelling regarding the changes in flood impacts over time and the comparison of spatial impacts with external sources. The temporal dynamics of flood potential impacts, as reported in Table 6 in the manuscript, underscore how changes in exposure, whether driven by economic growth or demographic dynamics, have played a significant role in the increase in average potential impacts from flood events over time. The spatial comparison of flooded area and numbers of impacted persons for 20 events across (1) reported impacts from HANZE 2.1, (2) modeled impacts from this study, and (3) modeled impacts from the satellite-derived Global Flood Database highlights that satellite-derived estimates systematically underestimate flooded areas by event and therefore exposure. While the authors of this study are careful to note that the estimates generated in this paper are likely an overestimation of flooding as they represent potential flooding and exposure, I agree that this kind of overestimation is still useful for analyzing trends in flood impacts over time. I also think the database of potential floods and impacts could be used in the future to analyze levels of flood protection through time if data on flood protection measures become available.
Overall, I find this paper presents a well-constructed modeling framework for recreating potential flooding and impacts across a large geographic area over a significant period of time. The authors clearly state the research gap they are filling with this work and employ a comprehensive approach that leverages recent advances in hydrodynamic models and external sources of flood information for comparison. I recommend minor revisions to further clarify certain methodological aspects of the paper and interpretation of results. I have included some comments/questions that I would ask the authors to consider below. I have also provided a few additional technical comments specific to the text in the attached document.
Comments/Questions
To improve the clarity of the steps included in the methodology section of the paper, I would suggest converting Table 1 to a flow diagram. Examples of such figures are included in Bates et al. 2021 (Figure 1) and Collins et al. 2022 (Figure 1). This modification would provide a visual and concise overview of the models, data, and filtering used within the different stages of the method section.
In reading through the methods section of the paper, I had a question in section ‘2.2.4 Deriving coastal flood footprints’ regarding the use of return periods for modeled depths and extents of identified flood events. In this section the text mentions that return periods (2, 5, 10, 20, 30, 50, 100, 200, and 500 years) are used for coastal inundation modeling at each coastal segment using Lisflood-ACC at 30 m resolution spanning 200 km landwards. Then in Line 162 the text states “Total water level of each segment-level flood event is linked with the water level used to generate flood hazard maps for each segment.”
Hypothetically, does this mean that for a coastal segment with an event where the total water level is 15 ft, the depths of water for the flooded area of this event are interpolated between return periods? For example, if the 10-year return period has a water level of 10 ft and the 20-year return period has a water level of 20 ft, would the depths associated with an event with a water level of 15 ft at that segment be the mean depth between the 10-year and 20-year return period maps? Furthermore, are the extents of these hazard maps consistent between return periods? If not, how is the area of inundation interpolated between return periods? These questions aim to clarify how flooded area and depths are interpolated between return periods. I have similar clarification questions regarding interpolation between return period hazard maps for section ‘2.3.4 Deriving riverine and compound flood footprints.’
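To make the question concrete, one possible reading of the linking step is a linear interpolation in water level between the two bracketing hazard maps, applied cell by cell. The sketch below is only an illustration of that reading and is not taken from the manuscript: the function name `interpolate_depth`, its arguments, and the choice to return zero depth below the lowest scenario are all assumptions.

```python
from bisect import bisect_right


def interpolate_depth(event_level, scenario_levels, scenario_depths):
    """Linearly interpolate the depth at one grid cell between hazard maps.

    scenario_levels: water levels used for the hazard maps, strictly ascending.
    scenario_depths: depth at this cell in each map (0.0 means dry).
    Both the interface and the boundary handling are illustrative assumptions.
    """
    if event_level < scenario_levels[0]:
        # Assumption: no flooding below the lowest mapped scenario.
        return 0.0
    if event_level >= scenario_levels[-1]:
        # Assumption: cap at the most extreme mapped scenario.
        return scenario_depths[-1]
    hi = bisect_right(scenario_levels, event_level)  # first level above the event
    lo = hi - 1
    weight = (event_level - scenario_levels[lo]) / (
        scenario_levels[hi] - scenario_levels[lo]
    )
    return (1 - weight) * scenario_depths[lo] + weight * scenario_depths[hi]


# Reviewer's hypothetical: maps at levels 10 and 20, event at 15 (same units).
print(interpolate_depth(15.0, [10.0, 20.0], [1.0, 3.0]))
```

Under this scheme a cell that is dry in the lower map but wet in the higher one receives a positive depth for any event level above the lower scenario, so the interpolated extent equals the extent of the higher map. Whether the catalogue behaves this way is exactly what the question above asks the authors to clarify.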
In the results section ‘3.2.1 Temporal changes in potential flood impacts’ there are observed increases in both the number and impact of events across all three event types shown in Table 6. However, the text in this section references percent changes and values that are not present in Table 6. To enhance clarity of results, it would be helpful to reference the values included in Table 6. For example, in Lines 469-470, based on the information provided in Table 6, the sentence should read as follows: "…they are equivalent to at least a 164% increase in potential coastal flood losses in an average year between 1950 and 2020 in the case of fatalities, 852% in the case of economic loss, and 83% in the case of affected population." If the current figures in the text are accurate, clarification on how these values were calculated would be valuable to improve clarity of the magnitude of these trends. Additionally, according to Table 6, the potential impacts for compound events appear to have increased more substantially than riverine and coastal events while the opposite is indicated in Lines 471-472.
References:
Bates, P. D., Quinn, N., Sampson, C., Smith, A., Wing, O., Sosa, J., et al. (2021). Combined modeling of US fluvial, pluvial, and coastal flood hazard under current and future climates. Water Resources Research, 57, e2020WR028673. https://doi.org/10.1029/2020WR028673
Collins, E. L., Sanchez, G. M., Terando, A., Stillwell, C. C., Mitasova, H., Sebastian, A., & Meentemeyer, R. K. (2022). Predicting flood damage probability across the conterminous United States. Environmental Research Letters, 17, 034006. https://doi.org/10.1088/1748-9326/ac4f0f

Citation: https://doi.org/10.5194/egusphere-2024-499-RC1
- AC1: 'Reply to RC1', Dominik Paprotny, 04 Apr 2024
  
  We would like to thank the reviewer for taking the time to analyse our paper and for the many useful comments. We are also thankful for the detailed “Summary” section of the review, highlighting the most important aspects of our study. We also hope that the data will be useful in future research. We will also try to follow-up shortly with flood protection and vulnerability data built upon this dataset. We have provided a detailed response to the comments in a separate file (the supplement).
  
  Citation: https://doi.org/10.5194/egusphere-2024-499-AC1
RC2:
'Comment on egusphere-2024-499', Anonymous Referee #2, 16 Apr 2024
The authors are to be commended for a huge volume of impressive work, which I am sure will stand as an important milestone in large-scale flood risk literature in charting the extent to which we can understand different risk components and how they have changed. It is very well written, structured, and presented, with probably just the right level of detail given the complexity / convolution of the methods. The construction of a modelled European fluvial-coastal flood event set, paired with historical flood observations, with robust attempts at validating individual components, is an excellent contribution to our field. I have the benefit of following a detailed public review from Anonymous Referee #1, who summarises the paper well (and whose points of revision I agree with) and I have little cause to repeat it.
I have the following specific comments for the authors to consider:
I think it would be helpful to be more explicit about what the modelled events actually are and how they can be validated with HANZE. It is quite a forgiving test of a model framework to reward the replication of an event at such a coarse level (NUTS3), and so it would be useful to get some commentary on the physical reality in the event set as a whole and what the pairing/comparison process does and does not tell us. For example, are the August 2002 floods well replicated by flooding the wrong regions but biases of different sign cancelling out?

There's no need to shy away from the inevitable subjectivity involved in creating these datasets. It could be clearer which data & method choices are well grounded, versus those which are based on assumptions or judgements. Elements such as the 2-day break between events, the various thresholds, the NUTS3 spatial limits on events: it could be clearer why these are chosen.

I am not sure I agree with the description of 'compound' events. To me, one hazard has to affect another to describe an event as compound. The events in this study are generated with riverine and coastal hazards independent from one another and so can only really be described as 'co-occurring'.

What is 'persons affected' in HANZE? Is an intersection of population and modelled flood data the same as a report of persons affected (who may be 'affected' because their road flooded and they couldn't go to work, but weren't flooded themselves). I'm not sure this quantity is particularly helpful, and cannot be modelled when only considering direct impact.

I think the paper is long enough, but the authors could consider adding (or discussing) the sensitivity of some of their choices. Would the conclusions change entirely if you bumped the quantile threshold to 99.9%, the depth threshold to 30 cm, the window between events to 3 days? It is important for a reader to understand that only a snapshot of possible results is presented.

The lack of flood protection data is a real problem, but I think it is conjecture to say or imply that the disparities between model and reality can be explained (solely) by flood defences. We don't actually know this, and I feel the authors use this slightly as a convenient excuse. Similarly, the idea that flood protection standards can be back-calculated on the basis of very coarse analysis of flood events is not strictly true. All such an approach would create is effective calibration parameters, which in reality will compensate for other model errors. The same can be said of the potential risk component attribution use cases. It would be helpful if the authors unstitch some of these points in their concluding remarks.

Again, the lack of flood protection data does make any attempt at validation almost meaningless (the authors acknowledge this). I think there would be scope for more detailed evaluation for select areas/regions where understanding of flood protection standards is relatively well constrained (e.g. parts of UK or Germany?) – this could give confidence that the framework as a whole is performative in spite of data limitations. Similarly, we know the Global Flood Database from Tellman et al. is very difficult to apply for event validation: it just does not contain visually realistic flood events in many cases. I think that section would be better termed a 'comparison' rather than 'validation', and the authors could look to validate against more detailed observational data in select countries (there are at least some datasets out there).

Very minor comments
Section 1:

Very minor comment, but the Rentschler et al. (2023) reference is a replica of the original study by Andreadis et al. (2022) (doi:10.1088/1748-9326/ac9197) and so it would be best to cite the latter.
Section 2.2.3:

Could you describe in a bit more detail what the 'coastal segments' represent and therefore whether the derived footprints would have a realistic spatial structure?

What is the rationale for the 99.6th percentile (compared to the 98th for riverine)?

What is the rationale for the 2-day break for event separation?

Does this method mean that an event cannot occur across different NUTS3 regions?
Section 2.3.4:

I don't follow the water depth extrapolation procedure for when the return period is less than 10 years. Is it realistic to extrapolate the slope from (e.g.) the 10 and 20 year depth (if that is what is meant)?
Section 3.3.1:

Why is relative discharge more important than absolute? If so, one could reasonably expect the validation to focus on the discharge which drove the flood maps rather than the event clustering.
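The questions above about the percentile threshold and the 2-day break can be illustrated with a small sketch of threshold-based event separation. It assumes a regularly sampled series and treats a run of at least `gap_days` below the threshold as the end of an event; the function names, the nearest-rank percentile, and the exact gap convention are assumptions for illustration, not the procedure documented in the manuscript.

```python
import math


def percentile_threshold(values, q):
    """Nearest-rank percentile (q in 0-100) of a list of values."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def split_events(values, threshold, steps_per_day, gap_days=2):
    """Group exceedances of `threshold` into events.

    Exceedances separated by fewer than `gap_days` of non-exceedance are
    merged into one event. Returns inclusive (start_index, end_index) pairs.
    """
    max_gap = gap_days * steps_per_day
    events = []
    for i, value in enumerate(values):
        if value <= threshold:
            continue
        # Number of below-threshold steps since the previous exceedance.
        if events and i - events[-1][1] - 1 < max_gap:
            events[-1] = (events[-1][0], i)  # extend the current event
        else:
            events.append((i, i))  # start a new event
    return events


# Toy daily series with exceedances on days 1, 3 and 6.
series = [0, 5, 0, 5, 0, 0, 5, 0]
print(split_events(series, threshold=1, steps_per_day=1, gap_days=2))
print(split_events(series, threshold=1, steps_per_day=1, gap_days=3))
```

In this toy series the 2-day break yields two events, `[(1, 3), (6, 6)]`, while a 3-day break merges them into one, `[(1, 6)]`. This is the kind of sensitivity the comments above ask the authors to report; `steps_per_day` would differ between hourly and 6-hourly model output.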
Citation: https://doi.org/10.5194/egusphere-2024-499-RC2
- AC2: 'Reply on RC2', Dominik Paprotny, 26 Apr 2024
  
  We would like to thank the reviewer for taking the time to read our paper, for several important comments, which will enable us to clarify important methodological points, and further for the overall positive assessment. Below, we respond (R) to the more detailed comments (C) of the reviewer.
  C1: I think it would be helpful to be more explicit about what the modelled events actually are and how they can be validated with HANZE. It is quite a forgiving test of a model framework to reward the replication of an event at such a coarse level (NUTS3), and so it would be useful to get some commentary on the physical reality in the event set as a whole and what the pairing/comparison process does and does not tell us. For example, are the August 2002 floods well replicated by flooding the wrong regions but biases of different sign cancelling out?
  R1: In our model framework, there are no ‘wrong’ regions as such, as we want to include both the regions where flood impacts occurred and those where they did not, even though the event also happened there from a purely hydrological perspective. We flag false positives from such a hydrological perspective only at event level, as it would require far more detailed data collection (if such data are available at all) to determine whether individual regions constitute hydrological ‘false positives’. Also, our comparison in 3.3.3 was limited to the regions with impacts known from HANZE, so it shows the accuracy unaffected by ‘false positive’ regions.
  C2: There's no need to shy away from the inevitable subjectivity involved in creating these datasets. It could be clearer which data & method choices are well grounded, versus those which are based on assumptions or judgements. Elements such as the 2-day break between events, the various thresholds, the NUTS3 spatial limits on events: it could be clearer why these are chosen.
  R2: Unfortunately, we did not save our test runs to show the statistics, but we can rerun our integrated modelling chain with modified parameters to highlight how those choices affect the aggregation of events in particular. Overall, a 1-day break led to very long flood events containing multiple waves, while a longer break divided observed flooding into smaller sub-events. Both would make it difficult to compare the modelled and observed impact zones and impact magnitudes. The use of NUTS3 regions is related to the practical availability not only of impact data, but also of socio-economic data driving exposure and (for follow-up analysis) vulnerability. We will highlight those aspects in the text.
  C3: I am not sure I agree with the description of 'compound' events. To me, one hazard has to affect another to describe an event as compound. The events in this study are generated with riverine and coastal hazards independent from one another and so can only really be described as 'co-occurring'.
  R3: We agree with the reviewer; in our view, a compound event requires interaction of the riverine and coastal drivers. However, we only determine this interaction in the context of impacts. In the HANZE database, the events are considered ‘compound’ only if such an interaction was judged to be relevant for the outcome of the flood. Otherwise, it was classified as a single-driver flood. Accordingly, if a potential ‘compound’ flood is indicated in the catalogue, but impacts can be attributed, say, only to a coastal flood, the compound event was classified as ‘no impact’ (category C), the corresponding coastal event as impactful (category A or B) and the corresponding riverine event as ‘no impact’ (category C). Similarly, if a single-driver event was found to be a ‘false positive’ (category E), the corresponding compound event was also classified as ‘false positive’. We will add this information to the text.
  C4: What is 'persons affected' in HANZE? Is an intersection of population and modelled flood data the same as a report of persons affected (who may be 'affected' because their road flooded and they couldn't go to work, but weren't flooded themselves). I'm not sure this quantity is particularly helpful, and cannot be modelled when only considering direct impact.
  R4: In HANZE, ‘persons affected’ refers to the number of people whose houses were flooded, or the number of persons evacuated if the preferred statistic is not available. We assume that the entire resident population within the potential flood zone is affected, corresponding to the HANZE statistic. This approach is likely to overestimate the affected numbers if houses are adapted to lower
  C5: I think the paper is long enough, but the authors could consider adding (or discussing) the sensitivity of some of their choices. Would the conclusions change entirely if you bumped the quantile threshold to 99.9%, the depth threshold to 30 cm, the window between events to 3 days? It is important for a reader to understand that only a snapshot of possible results are presented.
  R5: As noted in R2, we will make additional model runs with modified parameters to provide the specific numbers on the sensitivity of the catalogue to those choices.
  C6: The lack of flood protection data is a real problem, but I think it is conjecture to say or imply that the disparities between model and reality can be explained (solely) by flood defences. We don't actually know this, and I feel the authors use this slightly as a convenient excuse. Similarly, the idea that flood protection standards can be back-calculated on the basis of very coarse analysis of flood events is not strictly true. All such an approach would create is effective calibration parameters, which in reality will compensate for other model errors. The same can be said of the potential risk component attribution use cases. It would be helpful if the authors unstitch some of these points in their concluding remarks.
  R6: We would like to highlight that we realize the accuracy problem and do not aim to be able to reconstruct flood protection levels below the level of NUTS3 regions. We didn’t include flood defenses because including too high standards could filter out events that caused real-life impacts. We aimed at identifying the impacts based on the observations as much as possible. At the same time, the difference between observed and potential impacts is not only due to flood protection, but also the local level of vulnerability. We will highlight those issues in the conclusions, as they are indeed important for users of the data. We might add that we also explore both flood protection and vulnerability modelling based on the catalogue in detail in a follow-up preprint (https://doi.org/10.21203/rs.3.rs-4213746/v1), where we show that reconstructing both from our empirical data is possible, even if on a more aggregated (NUTS3) level.
  C7: Again, the lack of flood protection data does make any attempt at validation almost meaningless (the authors acknowledge this). I think there would be scope for more detailed evaluation for select areas/regions where understanding of flood protection standards is relatively well constrained (e.g. parts of UK or Germany?) – this could give confidence that the framework as a whole is performative in spite of data limitations. Similarly, we know the Global Flood Database from Tellman et al. is very difficult to apply for event validation: it just does not contain visually realistic flood events in many cases. I think that section would be better termed a 'comparison' rather than 'validation', and the authors could look to validate against more detailed observational data in select countries (there are at least some datasets out there).
  R7: The validation is indeed difficult due to both the framing of our study and data availability. As we look at potential impacts, useful validation would require rather the opposite: information from floods which were not so much constrained by flood defences. Also, we do realize the severe limitations of remotely sensed flood extents; however, they are commonly used in the field, very often with the incorrect assumption that they represent the ‘ground truth’. We will change the name of the section so that it does not imply validation. However, we would also be open to the reviewer’s recommendation of additional comparison datasets. There is a large dataset (Recorded Flood Outlines) available for England, but it seems too incomplete and heterogeneous for analysis.
  C8: Section 1: Very minor comment, but the Rentschler et al. (2023) reference is a replica of the original study by Andreadis et al. (2022) (doi:10.1088/1748-9326/ac9197) and so it would be best to cite the latter.
  R8: We were not aware of that study and we will add the reference to it.
  C9: Section 2.2.3: Could you describe in a bit more detail what the 'coastal segments' represent and therefore whether the derived footprints would have a realistic spatial structure?
  R9: The segments are of variable length, depending on the complexity of the coastline. They represent no more than 25 km of the coast (if completely straight), but usually about 15 km. They stretch up to 100 km inland, but far less for more complex areas such as deltas, estuaries, fjords, islands etc. We will make this clear in the text.
  C10: What is the rationale for the 99.6th percentile (compared to the 98th for riverine)? What is the rationale for the 2-day break for event separation?
  R10: The thresholds were empirically derived, with the separation threshold chosen partly to provide about 5 potential flood events per year, as in Vousdoukas et al. (2016a), but also to maximize consistency with the observed flood impact catalogue (as described in R2). The difference between the coastal and riverine percentiles is due mainly to the different temporal resolution of the models (1-hourly for coastal and 6-hourly for riverine). We will make this clearer in the text.
  C11: Does this method mean that an event cannot occur across different NUTS3 regions?
  R11: Not exactly; the events are first detected independently for each NUTS3 region. Only after deriving the flood zones per NUTS3-event are they aggregated into one event if multiple NUTS3-events co-occur in time. However, this only includes NUTS3 regions within one country (at present-day boundaries), hence there are no transnational events in the catalogue.
  C12: Section 2.3.4: I don't follow the water depth extrapolation procedure for when the return period is less than 10 years. Is it realistic to extrapolate the slope from (e.g.) the 10 and 20 year depth (if that is what is meant)?
  R12: We indeed forgot to mention in 2.3.4 that we didn’t use return periods here, but river discharge scenarios directly. However, due to the logarithmic nature of the relationship between river discharge and water level, we used the natural logarithm of discharge as the basis of extrapolation. This might somewhat overstate water depths at the lowest scenarios; however, we found that assuming zero water depth at the 2-year return period and using that for interpolation with the 10-year water depth often led to very low potential impacts and very limited impact zones compared to the observed data in the HANZE database. We will add this information to the text.
  C13: Section 3.3.1: Why is relative discharge more important than absolute? If so, one could reasonably expect the validation to focus on the discharge which drove the flood maps rather than the event clustering.
  R13: In determining the return periods of peak discharge, which is used to identify potentially impactful floods, the overall bias of the model is less relevant than the ability to correctly model the magnitude of flood waves relative to normal conditions. Though the riverine flood maps were not calculated with the same discharge simulation as we used, they still used an earlier version of the same pan-European model with similar meteorological forcing, therefore they should mostly have similar biases.
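As an illustration of the extrapolation described in R12, the sketch below extends water depth below the lowest available discharge scenario along a line in the natural logarithm of discharge, using the two lowest scenarios to set the slope. The function name, the argument names, and the clipping at zero depth are assumptions made for this example; the reply does not specify these details.

```python
import math


def extrapolate_depth(q, q_low, depth_low, q_high, depth_high):
    """Extrapolate water depth for a discharge q below the lowest scenario.

    The slope is taken between the two lowest scenarios, (q_low, depth_low)
    and (q_high, depth_high), as a linear function of ln(discharge).
    Negative results are clipped to zero (an assumption of this sketch).
    """
    slope = (depth_high - depth_low) / (math.log(q_high) - math.log(q_low))
    depth = depth_low + slope * (math.log(q) - math.log(q_low))
    return max(depth, 0.0)


# Hypothetical values: 1.0 m at 100 m3/s and 2.0 m at about 272 m3/s,
# i.e. one metre of depth per unit of ln(discharge).
print(extrapolate_depth(80.0, 100.0, 1.0, 100.0 * math.e, 2.0))
```

Compared with interpolating from zero depth at a low return period, a log-linear extension of this kind keeps depths positive over a wider range of small discharges, which is consistent with the remark in R12 that it may somewhat overstate depths at the lowest scenarios.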
  
  Citation: https://doi.org/10.5194/egusphere-2024-499-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (27 May 2024) by Anais Couasnon

Dear Dominik and co-authors,

Thank you for the submission of your very interesting manuscript “Merging modelled and reported flood impacts in Europe in a combined flood event catalogue, 1950–2020”.

As you know, two reviewers have now provided detailed reviews, to which you have replied in thoughtful detail. Both reviewers recommended minor revisions and therefore I would like to invite you to submit a revised version of your manuscript.

Would you please also provide an ‘author’s reply’ to the reviewers (feel free to use the same words that you used in what you have already uploaded). Please make sure to sufficiently explain if you have decided not to take some of the suggestions on board. Please can you also include a track changes document between the old manuscript and the new one (you can include this as part of your ‘author’s reply’).

In addition to the suggestions from the reviewers, I would like to suggest the following minor technical items:
(a) Can you mention how the coastal time series was detrended (line 145)?
(b) In general, can you provide a simple explanation about why detrending is necessary (both for the river discharge and the coastal water levels)?
(c) In Table 1 (which will be converted to a diagram as you mentioned), the text "TWL/Q" is confusing at the step "Estimating flood footprint". I believe the authors mean "TWL or Q"?
(d) In Section 2.3, am I understanding correctly that the discharge and thus the riverine model is obtained from Tilloy et al. (2024)? This was clear to me only later in the text (line 206). If the river discharge is used directly from this dataset, it would be useful to add an introduction sentence mentioning this right at the beginning of Section 2.3. If this dataset is different from the one from Tilloy et al. (2024), then again an introduction sentence explaining the main difference is welcome to help the reader.
(e) Can you provide a simple explanation about the fit of the parameters for the Generalized Pareto Distribution (i.e. according to which criteria, using which package etc.)?
(f) Can the authors provide a statement about the code availability?

I look forward to seeing the next version of your manuscript and will send it out again to the previous reviewers.

Best regards,

Dr Anaïs Couasnon
NHESS Editor

AR by Dominik Paprotny on behalf of the Authors (28 May 2024) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (07 Jun 2024) by Anais Couasnon

RR by Helena Garcia (11 Jun 2024)

RR by Anonymous Referee #2 (09 Jul 2024)

ED: Publish as is (10 Jul 2024) by Anais Couasnon

ED: Publish as is (10 Jul 2024) by Thom Bogaard (Executive editor)

AR by Dominik Paprotny on behalf of the Authors (11 Jul 2024)

Editorial statement

This paper is of great interest to the geoscience community and the broader public because it offers a comprehensive European flood event catalogue that merges historical records with modelled data, providing an extensive overview of coastal, riverine and compound flood impacts across Europe over seventy years. This will help enhance the accuracy and completeness of flood impact assessments, crucial for improving flood risk management and mitigation strategies. It also provides a milestone dataset for understanding changes in hazard, vulnerability and exposure for national, regional and continental flood risk assessments.
