A deep learning technique-based data-driven model for accurate and rapid flood prediction

An accurate and rapid urban flood prediction model is essential to support decision-making on flood management, especially under increasing extreme precipitation conditions driven by climate change and urbanization. This study developed a deep learning-based, data-driven flood prediction model that integrates a long short-term memory (LSTM) network with Bayesian optimization. The model was tested in a case study in northern China, and the results showed that it can accurately predict flood maps for various hyetograph inputs while substantially reducing computation time. The model predicted flood maps 19,585 times faster than the physically based hydrodynamic model and achieved a mean relative error of 9.5%. The predicted flood maps also reproduced the spatial patterns of water depths with a very high degree of similarity. In the best case, the difference between the ground truth and the model prediction was only 0.76%, and the spatial distributions of inundated paths and areas were almost identical. The proposed model showed robust generalizability and high computational efficiency, and can potentially replace and/or complement the conventional hydrodynamic model for urban flood assessment and management, particularly in applications of real-time control, optimization, and emergency design and planning.

convolutional neural network) to detect and segment the inundated areas in river channels. Guo et al. (2021) adopted a DCNN-based approach for urban flood prediction and reported satisfactory prediction accuracy and computational efficiency. Zhu et al. (2020) developed a probabilistic long short-term memory (LSTM) network coupled with a Gaussian process (GP) to improve streamflow forecasting in the upper Yangtze River. Note that, unlike other popular deep learning algorithms, the LSTM network allows inputs of unequal dimensions/lengths, which makes it especially suitable for processing time-series data, such as traffic flow and power systems (Ciechulski and Osowski, 2021). All these studies have demonstrated the remarkable capabilities of DL in automated data feature learning with high prediction accuracy and efficiency. The reliability of the methods has also been verified in various types of applications.
Despite these advances, most of the studies focused on relatively large spatial scales and required several types of input data (e.g., rainfall, terrain, flow depth) for model predictions. So far, no study has explored the automated prediction of urban-scale flood inundation using LSTM-based deep learning techniques. The goal of this study is to provide a novel end-to-end method for dynamic, rapid and accurate urban flood prediction to support real-time evaluation and emergency decision-making. Given the uncertainty of rainfall events and the advantages of LSTM, we present a deep learning-based technique that integrates an LSTM network with Bayesian optimization. The inundation areas and water depths can be forecast with only rainfall inputs. The method is tested in a case study in northern China under various rainfall conditions. The developed method showed very promising prediction accuracy and low computation cost and is thus of great value as a decision-making aid in urban flood evaluation and management.

Methodology and data
To examine the performance of the proposed approach, we first selected a case study and obtained the relevant data describing the rainfall inputs, local topography and drainage systems. A coupled 1D-2D hydrodynamic model was employed to simulate the inundation areas and water depths under various design rainfall events. The DL-based prediction model was then established and trained on the simulated flood maps, and tested with random rainfall inputs to examine its prediction accuracy and computation cost.
https://doi.org/10.5194/hess-2021-596 Preprint. Discussion started: 7 January 2022 © Author(s) 2022. CC BY 4.0 License.

Case study area
A portion of the city of Hohhot, the capital of the Inner Mongolia Autonomous Region, was used as the case study to test the performance of the proposed method. The city is located in northern China, within a cold semi-arid climate zone. The winters are dry, but the summers can be very hot and rainy. The average annual rainfall is approximately 396 mm, the majority of which falls from July to August (Zhou et al., 2018; Zhou et al., 2016). The detailed land use is shown in Fig. 1a and mainly consists of residential areas, commercial districts, institutes, green spaces and other land use. The terrain is high in the north and lower in the south (see Fig. 1b), and thus runoff generally flows from north to south. The service level of the drainage system is rather low, and the original design return period was below once a year (Zhou et al., 2018). In recent years, flooding has occurred more frequently in the area. Nevertheless, accurate historical data on flood areas and depths are lacking, and thus flood events were simulated with a coupled 1D-2D hydrodynamic model (introduced in the following sections) under various design rainfalls.
The input rainfall hyetographs for model training and validation were calculated using the regional Storm Intensity Formula (SIF), $q = 635 \times (1 + 0.841 \times \lg P) / t^{0.61}$, where $q$ is the storm intensity ((L/s)/hm²), $P$ is the design return period (a) and $t$ is the rainfall duration (minutes) (Zhang and Guan, 2012; Zhou et al., 2016). The rainfall calculation follows the national code for design of outdoor drainage (Mohurd, 2016) and the design principles of Chicago Design Storms (CDSs) (Berggren et al., 2014; Panthou et al., 2014; Zhou et al., 2012). The detailed procedures for applying the regional SIF to obtain CDSs are outlined in the national Technical Guidelines for Establishment of Intensity-Duration-Frequency Curve and Design Rainstorm Profile (Mohurd, 2014). In this study, we adopted a total of 90 rainfall events, with return periods ranging from 2 to 100 years and rainfall durations of 2, 4 or 6 hours. All rainfall inputs were generated with a temporal resolution of 10 minutes.
… Dhi, 2016d). When the underground drainage is surcharged, the excess water flows to the surface, where surface inundation is calculated under extreme precipitation. On the surface, the water typically flows along buildings or streets based on a description of the local digital elevation/topography (Mark et al., 2004; Leandro et al., 2009).
Model outputs include overland flow paths, extents, depths and velocities at different time steps.
One of the most commonly used outputs is the flood map describing the maximum water depths caused by the given rainfall inputs (Kaspersen et al., 2017; Mike by Dhi, 2016a; Zhou et al., 2012). These flood maps can be further integrated with vulnerability data to assess flood risk levels at different spatial scales (Sampson et al., 2014; De Moel et al., 2009; Ashley et al., 2007). In doing so, critical areas with higher levels of flood risk can be identified and prioritized in mitigation and adaptation plans (De Moel et al., 2015; Zhou et al., 2012). As shown in Fig. 2, changes in input rainfalls lead to variations in the simulated flood maps. In the case study, flood extents and depths increase with rainfalls of larger return periods. In total, 90 flood maps were generated as the dataset for the deep learning model. Of these, 90% were randomly selected for model training and validation, and the remaining 10% were held out for testing. For the deep learning model, the hyetographs were the inputs and the simulated flood maps were the ground truth (GT) data used to train the network. After training, the randomly sampled 10% of flood maps were used to test the model's prediction performance.
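To make the design rainfall and the dataset split concrete, the following minimal Python sketch evaluates the regional SIF given in the case study description and draws a random 90%/10% split of 90 event indices. The function names, the random seed and the return periods printed in the loop are illustrative assumptions, not values or code from the study. The unit conversion relies only on 1 (L/s)/hm² being equal to 0.36 mm/h.

```python
import math
import random


def storm_intensity(return_period_years, duration_min):
    """Regional SIF: q = 635 * (1 + 0.841 * lg(P)) / t**0.61, in (L/s)/hm^2."""
    return 635.0 * (1.0 + 0.841 * math.log10(return_period_years)) / duration_min ** 0.61


def to_mm_per_hour(q):
    """Convert (L/s)/hm^2 to mm/h: 1 L/s over 10^4 m^2 is 1e-4 mm/s = 0.36 mm/h."""
    return 0.36 * q


def split_events(n_events=90, test_fraction=0.1, seed=0):
    """Randomly hold out a fraction of the event indices for testing."""
    indices = list(range(n_events))
    random.Random(seed).shuffle(indices)
    n_test = round(n_events * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# Intensities for the three storm durations used in the study (2, 4, 6 h).
# The return periods below are illustrative picks within the 2-100 year range.
for P in (2, 10, 100):
    row = [round(to_mm_per_hour(storm_intensity(P, t)), 1) for t in (120, 240, 360)]
    print(f"P = {P:>3} a: intensity = {row} mm/h")

train_idx, test_idx = split_events()
print(len(train_idx), "training/validation events,", len(test_idx), "test events")
```

Distributing these intensities in time as a Chicago Design Storm additionally requires a peak position ratio, which is not stated in the text and is therefore left out of the sketch.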

LSTM (Long Short Term Memory) network
The LSTM network has advantages in processing time-series data, especially in retaining long-term memory of the data. As shown in Fig. 3, the LSTM network is used to predict the flood maps, with the rainfall intensity series as the network input. Ideally, the network predicts flood depth distributions in the region that are as close as possible to the real, or ground truth (GT), values. The relative error between the output and the GT values is calculated and used as the a priori condition for the Bayesian optimization (BO). Finally, an optimal network model (e.g., with an appropriate number of layers) is obtained through the iterative BO process. A benchmark LSTM network structure is shown in Fig. 4. From the input data (rainfall intensity), the network produces the output (water depth) through a series of functional layers, including an LSTM layer (containing N neural units), a Leaky ReLU activation function (Eq. (1)), and a fully connected (FC) layer. In the LSTM layer, the rainfall is input to the N neural units and N outputs (i.e., h0, h1, h2, …, hN) are obtained. The outputs of these neural units are then transformed nonlinearly by the Leaky ReLU activation function and passed to the FC layer, which delivers the output of the network.
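For orientation, the Leaky ReLU of Eq. (1) and an Adam-type parameter update consistent with the decay factors quoted in the next paragraph can be written in their textbook forms as below. This is a sketch under assumptions, not the authors' exact equations: the learning rate $\alpha$ and the step index $k$ are symbols introduced here, the textbook Adam algorithm additionally applies bias-correction terms to $m$ and $v$, and the loss function involving $n$ samples is not reconstructed because its form cannot be inferred from the text.

```latex
% Leaky ReLU with scale factor a = 0.01
f(x) =
\begin{cases}
  x,   & x \ge 0 \\
  a x, & x < 0
\end{cases}

% Adam-type update with \beta_1 = 0.9, \beta_2 = 0.999, \varepsilon = 10^{-8}
% (\alpha and the index k are assumed symbols)
m_k = \beta_1 m_{k-1} + (1 - \beta_1)\, \nabla L(\theta_k)
v_k = \beta_2 v_{k-1} + (1 - \beta_2)\, \left[ \nabla L(\theta_k) \right]^2
\theta_{k+1} = \theta_k - \frac{\alpha\, m_k}{\sqrt{v_k} + \varepsilon}
```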
where $x$ and $a$ are the input and the scale factor (0.01), respectively. Any input value less than zero is multiplied by this fixed scale factor. $\beta_1$ and $\beta_2$ are the Gradient Decay Factor (0.9) and the Squared Gradient Decay Factor (0.999), respectively. $L(\theta)$ is the loss function, $m$ and $v$ are the momentum terms, and $\varepsilon = 10^{-8}$. $n$ is the number of samples, and $y$ and $\hat{y}$ are the predicted and real results, respectively.
The neural unit is a key component of the LSTM network. The structure of a single neural unit is shown in Fig. 5 and includes a forget gate (Eq. (7)), an input gate (Eqs. (8-10)) and an output gate (Eqs. (11-12)). The forget gate determines how much of the unit state at time (t-1) is retained at time (t). The input gate determines the update of the unit state. The output of the LSTM neural unit is determined by the nonlinear activation function (Sigmoid, Eq. (13)) and the output gate. In general, an input (x) passes through a neural unit to produce an output (h). Specifically, the calculation process of a single LSTM neural unit is as follows:
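The gate computations referred to as Eqs. (7)-(13) follow the widely used LSTM formulation. A sketch in that standard form is given here, using the symbols defined in the following paragraph. The subscripts on the weights and biases of the input, candidate and output paths ($W_i$, $W_C$, $W_o$, $b_i$, $b_C$, $b_o$), the element-wise product symbol $\odot$ and the assignment of equation numbers are assumptions.

```latex
f_t  = \sigma\left( W_f \cdot [h_{t-1}, x_t] + b_f \right)    % (7)  forget gate
i_t  = \sigma\left( W_i \cdot [h_{t-1}, x_t] + b_i \right)    % (8)  input gate
C'_t = \tanh\left( W_C \cdot [h_{t-1}, x_t] + b_C \right)     % (9)  candidate unit state
C_t  = f_t \odot C_{t-1} + i_t \odot C'_t                     % (10) unit state update
o_t  = \sigma\left( W_o \cdot [h_{t-1}, x_t] + b_o \right)    % (11) output gate
h_t  = o_t \odot \tanh(C_t)                                   % (12) unit output
\sigma(z) = \frac{1}{1 + e^{-z}}                              % (13) Sigmoid
```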
where $f_t$ is the output of the forget gate, $W_f$ and $b_f$ are the weight matrix and bias of the forget gate, and $h_{t-1}$ and $x_t$ are the output of the previous neural unit (time (t-1)) and the current input (time (t)), respectively. $i_t$ is the output of the input gate, and $C'_t$ and $C_t$ are the unit states of the current input and the current time, respectively. $o_t$ is the output of the output gate, and $h_t$ is the neural unit output at time (t).
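The layer sequence described above (LSTM layer, Leaky ReLU, FC layer) can be illustrated with a small NumPy forward pass. This is a minimal sketch with random, untrained weights and toy sizes, not the trained network of the study. Feeding only the final hidden state to the FC layer, the dictionary layout of the parameters and all function names are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def leaky_relu(x, a=0.01):
    """Negative inputs are multiplied by the fixed scale factor a."""
    return np.where(x >= 0, x, a * x)


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM unit update with forget (f), input (i), candidate (c), output (o) paths."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_cand = np.tanh(W["c"] @ z + b["c"])    # candidate unit state
    c_t = f_t * c_prev + i_t * c_cand        # updated unit state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # unit output
    return h_t, c_t


def predict_flood_map(rain, W, b, W_fc, b_fc):
    """Run the rainfall series through the LSTM, then Leaky ReLU and the FC layer."""
    n_hidden = b["f"].shape[0]
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for r in rain:                           # one rainfall intensity per time step
        h, c = lstm_step(np.array([r]), h, c, W, b)
    return W_fc @ leaky_relu(h) + b_fc       # one value per grid cell


rng = np.random.default_rng(0)
n_hidden, n_cells = 8, 20                    # toy sizes, far smaller than the real grid
W = {k: rng.normal(0.0, 0.1, (n_hidden, n_hidden + 1)) for k in "fico"}
b = {k: np.zeros(n_hidden) for k in "fico"}
W_fc, b_fc = rng.normal(0.0, 0.1, (n_cells, n_hidden)), np.zeros(n_cells)

rain_2h = rng.uniform(0.0, 2.0, 12)          # 2 h event at 10 min resolution
rain_6h = rng.uniform(0.0, 2.0, 36)          # 6 h event at 10 min resolution
depths_2h = predict_flood_map(rain_2h, W, b, W_fc, b_fc)
depths_6h = predict_flood_map(rain_6h, W, b, W_fc, b_fc)
print(depths_2h.shape, depths_6h.shape)
```

Because the recurrence consumes one time step at a time, hyetographs of different lengths map to an output of the same size, which is the property that makes the LSTM convenient for rainfall events of unequal duration.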

Bayesian optimization
One problem with the aforementioned LSTM network is that its number of layers, learning rate, number of training epochs, mini-BatchSize and number of neural units are all unknown in advance. Starting from scratch, it can be very difficult and time-consuming to select and fine-tune these hyper-parameters manually.
Bayesian optimization (BO) is an algorithm that can automatically search for the optimal hyper-parameter combination. BO maintains a continuously updated probability model (Eq. (14)) based on Bayes' theorem: the posterior probability of Event A given Event B is proportional to the likelihood of Event B given Event A multiplied by the prior probability of Event A. That is, for successively occurring events, the later events are related to all previous events. As a hyper-parameter optimization scheme, BO infers the most likely parameter combination from a number of a priori attempts (i.e., training network models with different structures).
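In symbols, Bayes' theorem for two events A and B reads as follows. This is the textbook statement, offered as a reference form rather than as the authors' notation for Eq. (14):

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \;\propto\; P(B \mid A)\, P(A)
```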
The posterior probability of the optimization function is updated through repeated evaluations of the objective function to obtain the optimal parameter combination. The a priori conditions (i.e., the historical evaluation records, which in this paper are the mean relative errors of the network models already tried) guide the choice of the models tried next. When selecting the next parameter combination, the algorithm makes full use of the previous evaluation information to reduce the search time. Specifically, we defined a search range for each hyper-parameter; the BO algorithm automatically drew values from these ranges, repeatedly trained network models with different structures, and recorded the errors. In this paper, the hyper-parameters to be optimized were the number of LSTM layers, learning rate, Epoch, mini-BatchSize, and number of hidden units. The search ranges of these five parameters were set to [1-5], … function (i.e., the mean relative error (Eq. (16)), see the next section), $x^*$ is the optimal parametric combination, and $X$ is the value range of parameters.
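The search loop described above can be sketched in a few lines of NumPy. The text does not state which surrogate model or acquisition function was used, so the Gaussian-process surrogate with a squared-exponential kernel and the expected-improvement criterion below are assumptions, as are all function names. The hypothetical `toy_error` function stands in for "train a network with this setting and return its mean relative error", and a single hyper-parameter scaled to [0, 1] replaces the five-dimensional search space.

```python
import math
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a GP fitted to mean-centred data."""
    y_mean = y_obs.mean()
    K = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_new)
    mean = y_mean + K_s.T @ np.linalg.solve(K, y_obs - y_mean)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Expected improvement over the best (lowest) error seen so far."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayes_opt(objective, n_init=3, n_iter=12, seed=0):
    """Evaluate a few random settings, then let the surrogate pick each next trial."""
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)
    x_obs = rng.uniform(0.0, 1.0, n_init)
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, candidates)
        ei = expected_improvement(mean, std, y_obs.min())
        x_next = candidates[np.argmax(ei)]       # most promising untried setting
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    return x_obs, y_obs


def toy_error(x):
    """Hypothetical stand-in for the mean relative error of a trained network."""
    return 0.095 + (x - 0.3) ** 2


xs, errors = bayes_opt(toy_error)
best = int(np.argmin(errors))
print(f"best setting {xs[best]:.3f}, error {errors[best]:.4f}")
```

Each new trial is chosen using all earlier evaluation records, which is what distinguishes this scheme from a grid or purely random search over the same ranges.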

Performance indicators
To evaluate the reliability of the proposed method, five indices were used to assess the prediction results, focusing on the differences in flood depths and the spatial patterns of the flood distributions. First, the mean relative error (Mre) was used to quantify the depth error between the prediction results (PR) and the ground truth (GT). Next, the 2-D correlation coefficient (2D-CC) and structural similarity (SS) were used to evaluate the correlation and similarity of the images (distributions of flood areas), respectively. Finally, the Bhattacharyya distance (BD) and Histogram Intersection Distance (HID), which measure the similarity of two discrete or continuous probability distributions, were adopted to measure the amount of overlap between two statistical samples or images (distributions of water area).
where $\bar{I}$ and $\bar{J}$ are the average pixel values of Image I and J, respectively; $\mu_I$, $\mu_J$, $\sigma_I$, $\sigma_J$ and $\sigma_{IJ}$ are the pixel local means, standard deviations and cross covariance of Image I and J, respectively; $C_1$ and $C_2$ were 6.5 and 58.5, respectively; $p(x)$ and $q(x)$ are the probability distributions of pixels of Image I and Image J, respectively; and $X$ is the domain of $p(x)$ and $q(x)$.
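The five indicators can be illustrated with the NumPy sketch below, written from their standard textbook definitions because the equations themselves are not reproduced in this text. Several choices are assumptions: the mean relative error is averaged over wet cells only to avoid division by zero, the structural similarity is computed globally rather than over local windows, the histogram intersection is taken as the overlap of the two normalized histograms (so that identical maps score one), and the maps are assumed to be scaled to an 8-bit range, which is consistent with the constants 6.5 ≈ (0.01 × 255)² and 58.5 ≈ (0.03 × 255)².

```python
import numpy as np


def mean_relative_error(pred, gt, eps=1e-6):
    """Mean of |pred - gt| / gt over cells where the ground truth is wet (gt > eps)."""
    wet = gt > eps
    return float(np.mean(np.abs(pred[wet] - gt[wet]) / gt[wet]))


def corr2(img_i, img_j):
    """2-D correlation coefficient based on deviations from the image means."""
    di, dj = img_i - img_i.mean(), img_j - img_j.mean()
    return float(np.sum(di * dj) / np.sqrt(np.sum(di ** 2) * np.sum(dj ** 2)))


def global_ssim(img_i, img_j, c1=6.5, c2=58.5):
    """Structural similarity from whole-image statistics (a simplification)."""
    mu_i, mu_j = img_i.mean(), img_j.mean()
    cov = np.mean((img_i - mu_i) * (img_j - mu_j))
    num = (2 * mu_i * mu_j + c1) * (2 * cov + c2)
    den = (mu_i ** 2 + mu_j ** 2 + c1) * (img_i.var() + img_j.var() + c2)
    return float(num / den)


def pixel_distribution(img, bins=32, value_range=(0.0, 255.0)):
    """Normalized histogram of pixel values, used as a discrete distribution."""
    counts, _ = np.histogram(img, bins=bins, range=value_range)
    return counts / counts.sum()


def bhattacharyya_distance(p, q):
    """BD = -ln(sum(sqrt(p * q))); zero for identical distributions."""
    return float(-np.log(np.sum(np.sqrt(p * q))))


def histogram_intersection(p, q):
    """Overlap of two normalized histograms; one for identical distributions."""
    return float(np.sum(np.minimum(p, q)))


rng = np.random.default_rng(0)
gt = rng.uniform(0.0, 255.0, (40, 40))                       # synthetic "ground truth"
pred = np.clip(gt + rng.normal(0.0, 2.0, gt.shape), 0.0, 255.0)
p, q = pixel_distribution(gt), pixel_distribution(pred)

print("Mre  :", mean_relative_error(pred, gt))
print("2D-CC:", corr2(pred, gt))
print("SS   :", global_ssim(pred, gt))
print("BD   :", bhattacharyya_distance(p, q))
print("HID  :", histogram_intersection(p, q))
```

For a perfect prediction, the error and the Bhattacharyya distance are zero and the other three indicators equal one, which matches the interpretation used in the results section.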

Results and discussion
The mean relative errors of the testing dataset obtained from the 100 Bayesian optimization trials are illustrated in Fig. 6a. The mean error ranges between 0.095 and 0.73, and the error value of each BO trial is indicated by the bubble size in the plot. In particular, one of the networks, with a mean relative error of 0.095, worked best in learning the flood map features. Figure 6b shows the RMSE and loss of the best-performing model identified by the Bayesian optimization. The loss curve decreased steadily during network training, and the model converged after 100 iterations with a small loss value. This implies that the deep learning network is robust and was trained well with the input data. The detailed network structure of the optimized model is shown in Fig. 7. The learning rate, Epoch, mini-BatchSize, and number of hidden units were 0.0146, 385, 59 and 94, respectively.
We further analyzed the statistics of the performance indicators of the best-performing model.
Boxplots of the relative errors of the individual testing flood maps are shown in Fig. 8a. As noted above, the deep learning model obtained satisfactory results with a mean relative error of 9.5%. The minimum RE of a single prediction is 0.76%, which implies that the predicted flood map (both the inundation locations and depths) is very close to the ground truth map. The degree of similarity is illustrated statistically by the four types of indicators in Fig. 8b. First, the Bhattacharyya distances of the testing dataset were all close to zero, which means that the spatial distributions of the ground truth and predicted flood hazard maps were very similar and that the two map populations largely overlapped. These good results were further confirmed by the Histogram Intersection Distance, structural similarity and 2D correlation coefficient, whose values were all close to one. This implies that the spatial similarity of the predicted maps was very high. On the whole, Fig. 8a and 8b indicate that the model performs very well in learning and predicting the flood maps for different hyetographs. The computation times of the hydrodynamic model and the deep learning model are compared in …
In terms of visual quality, Fig. 9 illustrates the inundated areas of the ground truth and the predicted flood maps for the best model performance (i.e., with the minimum relative error). In total, there are 27,183 grids in the flood map. The deep learning model successfully retrieved the depths and spatial patterns of the inundated areas. The two maps were almost identical, and it is very difficult to tell the difference without examining the detailed statistics. Figure 10a shows the corresponding spatial distribution of the relative errors of the flood map. The differences between the two maps were almost negligible except in the highlighted regions near the water bodies.
The predicted flood map identified all the flow paths and local depressions present in the ground truth map. The spatial distributions of the mean relative errors of the testing dataset are shown in Fig. 10b. The results indicate good agreement between the series of predictions and the ground truth maps. The mean errors are evenly distributed in most areas, with values below 1%. We also note that the errors are greater where water depths and flow volumes are higher. In the maximum case, the relative error of predicted water depths near the water bodies can exceed 10% (Fig. 10c).
The prediction accuracies of the deep learning model are further examined as a function of water depth in Fig. 11. The flood map dataset is imbalanced, as a majority of the results contain no water or only shallow water. For water depths below 3 m, the model performed well and most errors were below 2%. The errors tended to increase under extreme conditions, with water depths above 3.5 m. Figure 11b shows that the predicted water depths are largely consistent with the ground truth water depths. These results clearly indicate that the deep learning model generalizes well

Conclusions
A rapid, accurate and dynamic flood prediction tool is of great significance for urban water management to protect people, social assets and the environment from flood hazards. This study proposed a deep learning-based, data-driven flood prediction approach that integrates the LSTM technique with Bayesian optimization. The results show that the model can accurately produce flood maps for various hyetograph inputs at much lower computation cost. The presented model showed robust generalizability and predicted the flood maps 19,585 times faster than the hydrodynamic model. The mean relative error in water depths was 9.5% and the degree of similarity of the flood maps was very high. Specifically, in the best case, the difference between the ground truth and the model prediction was only 0.76% and the spatial patterns of the two types of maps were almost identical. In conclusion, the accuracy and efficiency of the proposed method are satisfactory.
We acknowledge some limitations of this study and discuss directions for future work. First, the current training and testing data were obtained from hydrodynamic modelling due to a lack of detailed field data. In future work, we will consider adopting image capture techniques to supplement the data, such as deep learning techniques for automated detection, acquisition and evaluation of water depths from camera images. This would provide more real-case/field survey data for model training and testing. Data augmentation is also useful for enhancing the quantity and quality of input data and will be tested in future investigations. In addition, the deep learning model currently predicts only the maximum water depths caused by the rainfall inputs, which means that the temporal changes in water depths during the rainfall events were not considered. Further research to predict the dynamic changes at both temporal and spatial scales is of great interest.
Despite these limitations, this work contributes to a better understanding of deep learning techniques for urban flood mapping. The proposed methodology predicts water depths with only rainfall inputs, without further requirements for, e.g., local terrain and geographical conditions. The approach can be easily adjusted or adopted for other types of applications in the water management field.
More importantly, the proposed method can potentially replace and/or complement the conventional detailed hydrodynamic model for urban flood assessment and management, particularly in applications of real-time control, optimization, and emergency design and planning.

Data availability
The datasets that support the findings of this study are available from the corresponding author upon reasonable request.

Author contribution
QZ conceived the idea and acquired the project and financial support. QZ and ST designed the study. XL and JF collected and preprocessed the data. QZ and ST conducted all the experiments and analyzed the results. QZ wrote the first draft of the manuscript with contributions from ST. XL, JF, ZS and GC provided feedback on results and edited the manuscript.

Competing interests
The contact author has declared that neither they nor their co-authors have any competing interests.