Hydrological parameters should be carefully calibrated before a hydrological model is used to support decision making. However, existing calibration methods are difficult to apply in regions where runoff data are inadequate. To achieve accurate hydrological calibration for ungauged road networks, we propose a Bayesian updating framework that calibrates hydrological parameters based on taxi GPS data. Hydrological parameters were calibrated by adjusting their values such that the runoff generated by acceptable parameter sets corresponded to the road disruption periods, during which no taxi points were observed. The proposed method was validated on 10 flood-prone roads in Shenzhen, and the results revealed that the trends of runoff could be correctly predicted for 8 of the 10 roads. This study demonstrates that integrating hydrological models with taxi GPS data can provide a viable alternative means of model calibration and yield actionable insights for flood hazard mitigation.

In the context of climate change and increasing urbanization, flooding poses far-reaching threats to the urban road networks of coastal metropolises (Balistrocchi et al., 2020). In Australia, approximately 53 % of flood-related drowning deaths between 2004 and 2014 resulted from vehicles being driven into floodwaters. Additionally, indirect losses caused by flooding, such as canceled commutes, mandatory detours, and travel time delays, often outweigh direct losses (Kasmalkar et al., 2020). Quantifying the impact of flood exposure requires predicting both surface runoff over roads and the road disruptions it induces; these predictions are critical for flood mitigation, traffic resilience improvement, and early warning systems.

Public concerns regarding road flooding hazards have created pressure to develop fine-grained and accurate models for hydrological simulation. Hydrological modeling rests on relatively well-established theory, can approximate real-world hydrological systems, and has been widely used in road-related studies (Versini et al., 2010; Yin et al., 2016; Safaei-Moghadam et al., 2023). Because hydrological modeling is subject to uncertainty arising from the oversimplified representation of hydrological systems, from initial and boundary conditions, and from incomplete knowledge, the parameters of hydrological models must be carefully calibrated before the models are applied to practical problems, so that they closely match historical observations (Gupta et al., 1998). As uncalibrated models are indefensible and sterile, very few models documented in the literature have been applied without a calibration procedure (Beven, 2012).

Over the past four decades, numerous studies have been conducted on the development of calibration methods. Calibration methodologies range from simple trial-and-error methods, which adjust one parameter value per iteration until the differences between predicted and observed values are satisfactory, to Bayesian updating frameworks that reject the concept of a single correct solution. To a great extent, the success of model calibration depends on the availability of field-observed runoff data. However, runoff data are generally gathered at only a few sites, and some cities never measure runoff in built-up regions (Gebremedhin et al., 2020). Although administrative departments in some cities collect runoff data effectively, these cities are not always motivated to share the data with the public. For example, among China's 10 largest cities (ranked by resident population in 2021), only Shenzhen has shared runoff-related data on an open data platform. For model calibration at the road scale, runoff data are even more difficult to acquire, because road networks are far denser than river networks and, owing to their high measurement cost, flood gauges are installed on only a few flood-prone roads, leaving most roads ungauged. As pointed out by Beven (2012, p. 55), “the ungauged catchment problem is one of the real challenges for hydrological modelers”. This lack of hydrological data has prompted researchers to seek additional data sources to support flood-related decision making. With advances in mobile telecommunication technologies, big data are emerging as alternative sources of information for coping with flood risks (Paul et al., 2018; Li et al., 2018; Gebremedhin et al., 2020). Citizens can voluntarily or passively act as human sensors, generating georeferenced data that improve flood monitoring. Many studies have leveraged crowdsourced social media data (Brouwer et al., 2017; Sadler et al., 2018; Zahura et al., 2020), mobile phone data (Yabe et al., 2018; Balistrocchi et al., 2020), and taxi GPS data (She et al., 2019; Kong et al., 2022). However, most previous works have used big data either for flood mapping or for mining spatiotemporal patterns (Restrepo-Estrada et al., 2018), and calibrating parameters for ungauged roads based on big data remains an open problem.

This study extends our previous study (Kong et al., 2022) by going a step beyond simply recognizing flooded roads: we propose a calibration method for road-related hydrological parameters based on taxi GPS data. Many studies have shown that vehicle-related information during rainfall, including vehicle volume, speed, and trajectories, is useful for detecting flooded roads (Zhang et al., 2019; Qi et al., 2020; Yao et al., 2020). When a road segment is inundated by heavy rainfall, the vehicle volume may drop sharply or gradually, depending on the intensity of the rainfall event. Conversely, an abnormal drop in vehicle volume during rainfall may indicate that a road has experienced rainfall-induced inundation. This motivates us to use traffic-related data sources to calibrate hydrological parameters. In this study, we developed a transformation process that converts a rainfall time series into a time series of probabilities that no taxis will drive on a road (“no-taxi-passing probability” hereafter) for a given hydrological parameter set. We then assigned a probability to every parameter set by integrating the no-taxi-passing probability with observed taxi GPS data. We outlined a generalized taxi-data-driven calibration framework and implemented it with specific hydrological and transportation models.

Observed data are not always as informative as expected and may be inconsistent with other data sources. Hydrologists often adopt a Bayesian framework to update hydrological parameters; it provides a general formalism that combines a prior probability, representing prior knowledge, with a likelihood, reflecting how accurately a model reproduces observations, to form a posterior probability. Suppose we have several versions of a hydrological model, each with a different parameter set. The purpose of the Bayesian updating procedure adopted in this study is then to assign a posterior probability to every hydrological parameter set as new taxi data become available.

Two components are critical for this Bayesian updating procedure: the prior probability and the likelihood function. Regarding the prior probability, Beven and Binley (1992), in describing their well-known generalized likelihood uncertainty estimation (GLUE) method, stated that all parameter combinations are considered equally probable before additional information is introduced. After the first update, the posterior probability from the latest iteration serves as the prior probability for the next iteration. The likelihood, which measures how well a given model conforms to the observed taxi behavior, is harder to compute than the prior probability because the parameter set to be estimated is hydrological, whereas the observed evidence is taxi related. Therefore, we must construct a taxi-based proxy whose probability equals that of the associated hydrological parameter set, together with a function that transforms hydrological parameters into this taxi-related proxy.
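For reference, the standard discrete form of Bayes' theorem over N candidate parameter sets, including the sequential update in which the posterior from one storm event becomes the prior for the next, is sketched below. The notation (θ_i for the i-th parameter set, D_1 and D_2 for the taxi observations of two storm events) is ours and is not necessarily the notation of the equations that follow. The second expression additionally assumes that the two events are conditionally independent given θ_i. Under a GLUE-style uniform prior, P(θ_i) = 1/N.

```latex
P(\theta_i \mid D_1) = \frac{P(D_1 \mid \theta_i)\, P(\theta_i)}{\sum_{j=1}^{N} P(D_1 \mid \theta_j)\, P(\theta_j)},
\qquad
P(\theta_i \mid D_1, D_2) = \frac{P(D_2 \mid \theta_i)\, P(\theta_i \mid D_1)}{\sum_{j=1}^{N} P(D_2 \mid \theta_j)\, P(\theta_j \mid D_1)}
```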

The proxy selected in this study was the time series of no-taxi-passing probabilities. Figure 1 presents a generalized procedure for converting a rainfall time series into a time series of no-taxi-passing probabilities for each hydrological parameter set. The procedure consists of three steps. First, a hydrological model converts the rainfall time series into a hydrograph. Second, a runoff-disruption function, which relates runoff to the probability that a road is blocked, transforms the hydrograph into a time series of road disruption probabilities. Third, the taxi arrival rate is combined with the time series of road disruption probabilities to derive a time series of no-taxi-passing probabilities. The hydrological model and taxi arrival rate are assumed to be specific to each road and constant over short periods, whereas the runoff-disruption function is shared by all roads.

Generalized procedure for converting a rainfall time series into a time series of no-taxi-passing probabilities.

Integrating this three-step process with the Bayesian equation enables us to compute the posterior probability of a parameter set based on taxi data. For a specific road, suppose there are

Solving Eq. (2) involves the calculation of

Section 2.1 presented a generalized three-step procedure for converting a rainfall time series into a time series of no-taxi-passing probabilities. In this section, we specialize this process by integrating existing theories into our model. The three conceptual steps illustrated in Fig. 1 were replaced with three concrete submodels. First, a Soil Conservation Service (SCS) unit hydrograph was used to convert rainfall excess into a hydrograph for the target road. Second, an empirical runoff-disruption function, based on data extracted from various experimental, observational, and modeling studies, was applied to convert the hydrograph into a time series of road disruption probabilities. Third, a Poisson distribution describing taxi arrivals was combined with the road disruption probability time series to derive a no-taxi-passing probability time series.

Not all rainfall produces runoff, because soil storage can absorb a certain amount of rain. However, in urbanized areas, only a small proportion of rainfall infiltrates the soil or is retained on the land surface; most rain flows across urban surfaces and becomes direct runoff. The rainfall that becomes direct runoff is referred to as rainfall excess. The Natural Resources Conservation Service (NRCS), originally called the US Soil Conservation Service, developed a method to estimate rainfall excess based on soil types and land uses using the following curve number equation:

The rainfall excess derived using Eq. (7) was input to the unit hydrograph to derive the runoff. The unit hydrograph is a commonly used rainfall-runoff model that converts rainfall excess into a temporal distribution of direct runoff. First proposed by Sherman (1932), the unit hydrograph is defined as the hydrograph resulting from one unit of rainfall excess distributed uniformly over a catchment area. It assumes that rainfall is uniform over the catchment and that runoff increases linearly with rainfall excess. Although these assumptions are rarely satisfied perfectly, the results obtained from the unit hydrograph are generally acceptable for most practical cases. The model, originally designed for larger watersheds, has been found to be applicable to some catchment areas less than 5000 m²

A unit hydrograph can only be derived directly for watersheds where runoff is measured. This paucity of runoff data motivated the development of the synthetic unit hydrograph (SUH) concept. The term “synthetic” in SUH refers to a unit hydrograph derived from watershed characteristics rather than from empirical rainfall-runoff relationships. In this study, we utilized the SCS unit hydrograph, a dimensionless SUH proposed by the NRCS. For the dimensionless SUH, the discharge (i.e.,

The shape of an SCS unit hydrograph is entirely determined by the peak rate factor. A standard value of 2.08 for the peak rate factor is recommended and commonly used by the NRCS (Fig. 2). To construct an SUH from an SCS unit hydrograph, the

Standard SCS unit hydrograph. Data provided by the Natural Resources Conservation Service (2007).

For simplicity, the peak rate factor was not calibrated and was fixed at 2.08, although some studies have indicated that it varies widely, from about 2.58 for steep terrain to 0.43 for very flat terrain (Chow et al., 1988). After

Workflow of the SCS unit hydrograph for converting rainfall into runoff.
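The rainfall-to-runoff workflow can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the paper's implementation. First, it uses the standard NRCS curve number relation in SI units (S = 25400/CN − 254 mm, initial abstraction I_a = 0.2S), whereas the paper's Eq. (7) may use a different initial abstraction ratio. Second, it replaces the curvilinear dimensionless hydrograph of Fig. 2 with the common triangular approximation. That approximation uses the lag t_lag = 0.6 T_c, the time to peak T_p = Δt/2 + t_lag, the base time T_b = 2.67 T_p, and the peak q_p = 2.08 A / T_p (m³/s per cm of excess, A in km², T_p in h), which the code rescales to mm. All function names are hypothetical.

```python
import numpy as np

PEAK_RATE_FACTOR = 2.08  # SI form: q_p [m3/s per cm of excess] = 2.08 * A [km2] / T_p [h]

def scs_rainfall_excess(cum_rain_mm, cn: float, ia_ratio: float = 0.2) -> np.ndarray:
    """Incremental rainfall excess (mm) per time step from cumulative rainfall (mm).

    Assumed standard NRCS curve number relation in SI units:
        S = 25400 / CN - 254,  Ia = ia_ratio * S,  Q = (P - Ia)^2 / (P - Ia + S) for P > Ia
    """
    s = 25400.0 / cn - 254.0                        # potential maximum retention (mm)
    pe = np.maximum(np.asarray(cum_rain_mm, dtype=float) - ia_ratio * s, 0.0)
    q_cum = pe ** 2 / (pe + s)                      # cumulative rainfall excess (mm)
    return np.diff(q_cum, prepend=0.0)              # excess falling in each time step

def scs_triangular_uh(area_km2: float, tc_h: float, dt_h: float) -> np.ndarray:
    """Triangular approximation of the SCS unit hydrograph, in m3/s per mm of excess."""
    t_lag = 0.6 * tc_h                              # watershed lag (h)
    t_peak = dt_h / 2.0 + t_lag                     # time to peak (h)
    t_base = 2.67 * t_peak                          # base time of the triangle (h)
    q_peak = PEAK_RATE_FACTOR * area_km2 / t_peak / 10.0   # per cm -> per mm
    t = np.arange(0.0, t_base + dt_h, dt_h)
    rising = q_peak * t / t_peak
    falling = q_peak * (t_base - t) / (t_base - t_peak)
    return np.clip(np.where(t <= t_peak, rising, falling), 0.0, None)

def direct_runoff(rain_mm, cn: float, area_km2: float, tc_h: float, dt_h: float) -> np.ndarray:
    """Direct runoff hydrograph (m3/s): rainfall -> excess -> convolution with the unit hydrograph."""
    excess = scs_rainfall_excess(np.cumsum(rain_mm), cn)
    return np.convolve(excess, scs_triangular_uh(area_km2, tc_h, dt_h))
```

A quick sanity check follows from mass balance: 1 mm of excess over 0.1 km² is 100 m³. Summing the unit hydrograph ordinates times the time step should therefore return roughly that volume.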

The goal of Step 2 is to convert the hydrograph generated in Step 1 into a time series of road disruption probabilities, or more specifically, the probability that a taxi driver chooses to turn around upon arriving at a flooded road. Most models in the literature assume that a road is either open or closed, which is inconsistent with the empirical evidence that many drivers take the risk of driving on inundated roads. To move beyond this binary view of a flooded road as either “open” or “closed,” Pregnolato et al. (2017) proposed a curve that relates floodwater depth to the reduction in vehicle speed, which can be used to indicate the probability of road disruption. This idea was subsequently adopted by Contreras-Jara et al. (2018) and Nieto et al. (2021).

A driver will turn around when they believe that the flow rate is too risky for their vehicle. From this perspective, the road disruption probability equals the probability that the flow rate perceived by a driver exceeds the performance of their vehicle. However, it is difficult to quantify the factors that influence people's willingness to drive through a flooded roadway, and it is impossible to obtain precise knowledge of every taxi-flood encounter. Alternatively, guidelines for vehicle stability in flood flows are typically expressed as limiting values of depth times velocity. Many researchers have conducted laboratory tests on the stability of different types of vehicle models exposed to different combinations of depth and velocity (Merz and Thieken, 2009; Shah et al., 2018). As suggested by Pregnolato et al. (2017), we constructed our runoff-disruption function by integrating data from the literature and authoritative guidelines. In this study, the road disruption probability was defined as the probability that the product of flow velocity and flow depth exceeds the stability limits extracted from the literature, which are listed in Table 1 and plotted in Fig. 4. The expression of the fitting curve is

Guidelines recommended in the existing literature.

Empirical runoff-disruption function derived from the existing literature.
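The definition above can be illustrated with an empirical cumulative distribution function. The disruption probability for a given depth × velocity product is the fraction of literature stability limits that the product exceeds. This is a hedged sketch: the limit values are hypothetical placeholders, not the values in Table 1. In practice, a smooth fitted curve such as the one used in this study would replace the step function. The sketch also assumes that depth × velocity has already been obtained from the hydrograph.

```python
import numpy as np

# Hypothetical stability limits of depth x velocity (m2/s); placeholders, NOT the Table 1 values.
STABILITY_LIMITS = np.array([0.3, 0.45, 0.5, 0.6, 0.8, 1.0])

def disruption_probability(dv, limits: np.ndarray = STABILITY_LIMITS) -> np.ndarray:
    """Empirical P(limit < dv): share of stability limits exceeded by each depth*velocity value."""
    sorted_limits = np.sort(limits)
    # side="left" returns, for each dv, the number of limits strictly below it
    exceeded = np.searchsorted(sorted_limits, np.asarray(dv, dtype=float), side="left")
    return exceeded / len(sorted_limits)
```

For example, `disruption_probability(np.array([0.0, 0.5, 2.0]))` gives 0, 2/6, and 1. No limit is exceeded for still water, two of the six placeholder limits are exceeded at 0.5 m²/s, and every limit is exceeded at 2.0 m²/s.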

A road is considered to have no taxis passing in a fixed time interval if the road has no taxis arriving or if every taxi that arrives at the road turns around. Therefore, the no-taxi-passing probability can be calculated using the following equation:

Table 2 lists all the submodels and parameters used in the three-step process. The purpose of the three-step process was to calculate the time series of no-taxi-passing probabilities,

Specific submodels and parameters used in the three-step process.
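Step 3 can be checked with a short derivation, under assumptions that are ours. Suppose the taxi arrivals in an interval follow a Poisson distribution with mean λ, and each arriving taxi turns around independently with the road disruption probability p. Then P(no taxi passes) = Σ_k e^(−λ) λ^k / k! · p^k = e^(−λ(1−p)), which is the familiar Poisson thinning result. The independence assumption may differ in detail from the paper's Eq. (16), and the function names are hypothetical. The sketch compares the closed form with an explicit truncated sum.

```python
import math

def no_taxi_passing_prob(lam: float, p_disrupt: float) -> float:
    """P(no taxi passes) for Poisson(lam) arrivals, each turning around with prob p_disrupt."""
    return math.exp(-lam * (1.0 - p_disrupt))

def no_taxi_passing_prob_series(lam: float, p_disrupt: float, k_max: int = 200) -> float:
    """Same quantity by summing P(k arrivals) * P(all k taxis turn around) over k."""
    total, pmf = 0.0, math.exp(-lam)   # pmf holds P(k = 0) initially
    for k in range(k_max + 1):
        total += pmf * p_disrupt ** k
        pmf *= lam / (k + 1)           # P(k + 1) from P(k)
    return total
```

When p = 0 (a dry road), the expression reduces to e^(−λ), the chance that an open road simply sees no taxis. When p = 1 (a fully disrupted road), no taxi ever passes.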

The method outlined above was tested on the intersection of Xinzhou Road and Hongli Road in Shenzhen, which is recognized as a flood-prone point by the Water Authority of Shenzhen Municipality. Recall that the parameters to be calibrated are the curve number CN, catchment area

Detailed information on parameter sets to be calibrated.

Taxi GPS data collected during two storm events on 9 and 23 May 2015 were used to calibrate the parameter sets for the target intersection. The rainfall time series and taxi observations for these two storms are presented in Fig. 5. Each taxi observation contains two time series at 5 min intervals: taxi volumes and road statuses. The road-status series was derived from the taxi volumes, taking a value of one if the taxi volume was greater than zero and a value of zero otherwise.

Rainfall and taxi observations used to calibrate hydrological parameters:

Given the rainfall on 9 May 2015, we must calculate the time series of no-taxi-passing probabilities for each parameter combination. Because there are 4800 parameter sets, we can derive 4800 possible time series of no-taxi-passing probabilities. For simplicity, we only present the 3120th parameter set (i.e., CN

Example transformation of a rainfall time series into no-taxi-passing probabilities using the three-step procedure for the 3120th parameter set:

In the second step, the runoff was transformed into a time series of road disruption probabilities based on the runoff-disruption function (Fig. 6d). The runoff-disruption function takes the product of water depth and velocity (in units of m

In the third step, the time series of road disruption probabilities (Fig. 6e) was combined with the average number of taxis during the flooding period (Fig. 6f) and converted to no-taxi-passing probabilities using Eq. (16); the resulting time series of no-taxi-passing probabilities is presented in Fig. 6g.

After the time series of no-taxi-passing probabilities were derived for every parameter set, the degree of belief that each parameter set is optimal was calculated by combining these time series with the taxi observations on 9 May 2015. According to Eq. (5), the posterior probability of the 3120th parameter set is calculated as

By following this process, we can calculate the posterior probability of every parameter set. The posterior probability distribution over parameter sets can then be updated using the taxi observations and rainfall data from 23 May 2015 as

Evolution of the posterior probability distribution of hydrological parameter sets:
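The updating loop can be sketched over a discrete set of candidate parameter sets, but the likelihood form is an assumption made only for illustration. Each 5 min interval is treated as an independent Bernoulli trial. An interval with observed road status 0 (no taxi) contributes the modeled no-taxi-passing probability, and an interval with status 1 contributes its complement. This need not match Eqs. (2) to (5) exactly. Log-likelihoods are used to avoid numerical underflow over long storms. All function names are hypothetical.

```python
import numpy as np

def road_status(taxi_volume) -> np.ndarray:
    """1 if at least one taxi was observed in the interval, else 0."""
    return (np.asarray(taxi_volume) > 0).astype(int)

def log_likelihood(p_no_taxi: np.ndarray, status: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Bernoulli log-likelihood per parameter set (assumed form).

    p_no_taxi: shape (n_sets, n_intervals), modeled no-taxi-passing probabilities
    status:    shape (n_intervals,), observed road status (1 = taxi seen)
    """
    p = np.clip(p_no_taxi, eps, 1.0 - eps)           # guard against log(0)
    return np.where(status == 0, np.log(p), np.log1p(-p)).sum(axis=1)

def bayes_update(prior: np.ndarray, p_no_taxi: np.ndarray, status: np.ndarray) -> np.ndarray:
    """Posterior over discrete parameter sets, normalized to sum to one."""
    log_post = np.log(prior) + log_likelihood(p_no_taxi, status)
    log_post -= log_post.max()                         # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```

For a second storm event, the posterior returned by the first call is passed back in as the prior, together with that event's modeled probabilities and observed statuses.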

The proposed method was validated on flood-prone roads located in Shenzhen, China, which is a coastal city frequently hit by extreme storms during summer. To the best of our knowledge, Shenzhen is the only city that has shared runoff-related data with the public in China. Three data sources, namely taxi GPS data, rainfall data, and authoritative water level data, were used to validate our parameter calibration method. Hydrological parameters were calibrated using the first two data sources and the water level data acted as the ground truth to validate the proposed method. Taxi GPS data were anonymized and aggregated in 5 min intervals. Rainfall data, which were also collected in 5 min intervals, were measured at 115 gauging stations citywide and mapped to the road network throughout Shenzhen using the ordinary Kriging spatial interpolation algorithm. The water level data were only measured at certain flood-prone points with a dynamic sampling interval ranging from 5 min when the weather was rainy to 1 h when the weather was clear. The proposed calibration method was validated by analyzing the hydrographs derived from the calibrated hydrological models against the authoritative water levels for 10 selected roads. Detailed information on the three data sources is provided in Table 4.

Detailed information on the three data sources.

The two storm events on 9 and 23 May 2015 were treated as calibration events, and a storm on 11 June 2019 was retained for testing. Owing to data availability, there is a 4-year gap between the calibration and validation data. The hydrological environments of flood-prone roads may have changed during these years, which could render parameters calibrated on 2015 data inaccurate for 2019. To reduce the validation error caused by this time gap, we required the validated roads to have been vulnerable to flooding in both 2015 and 2019, so that their hydrological parameters were more likely to have remained unchanged. Therefore, 10 roads that were labeled as flood-prone in both the List of 2015 Flood-prone Roads in Shenzhen (Water Authority of Shenzhen Municipality, 2015) and the List of 2019 Flood-prone Roads in Shenzhen (Water Authority of Shenzhen Municipality, 2019) were carefully selected (Fig. 8).

Spatial distribution of 10 flood-prone roads in Shenzhen.

We introduced two types of prior distributions to demonstrate the effects of the prior on the calibrated parameters. The first prior distribution was determined based on prior knowledge and DEMs of Shenzhen obtained from ASTER GDEM V3, a product of NASA and Japan's Ministry of Economy, Trade, and Industry (METI) (Ministry of Economy, Trade, and Industry (METI) of Japan and the United States National Aeronautics and Space Administration (NASA), 2023). This global DEM covers nearly all of Earth's land surface (between 83°N and 83°S) at 30 m resolution and offers improved horizontal and vertical accuracy and fewer anomalies than previous versions. We imported the Shenzhen DEMs into the hydrological software PCSWMM to delineate catchments and calculate catchment areas. Subsequently, we computed the time of concentration using the watershed lag method (Natural Resources Conservation Service, 2010b). As suggested by Zhang and Huang (2018), we used the average curve number for Shenzhen in 2015, assessed to be 60, as the estimated curve number for each road under validation.
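For reference, the NRCS watershed lag method in its US customary form estimates the lag L (h) = ℓ^0.8 (S + 1)^0.7 / (1900 Y^0.5). Here ℓ is the flow length in feet, Y is the average watershed slope in percent, and S = 1000/CN − 10 is the retention in inches. The time of concentration then follows as T_c = L / 0.6. The sketch below accepts the flow length in metres and converts it. That the paper applied exactly this form and these unit conventions is our assumption, and the function name is hypothetical.

```python
FT_PER_M = 3.28084  # feet per metre

def time_of_concentration_h(flow_length_m: float, slope_percent: float, cn: float) -> float:
    """Time of concentration (h) by the NRCS watershed lag method (US customary form)."""
    length_ft = flow_length_m * FT_PER_M
    s_in = 1000.0 / cn - 10.0                          # potential maximum retention (in)
    lag_h = length_ft ** 0.8 * (s_in + 1.0) ** 0.7 / (1900.0 * slope_percent ** 0.5)
    return lag_h / 0.6                                 # Tc = L / 0.6
```

The formula encodes the expected physical trends: longer flow paths lengthen T_c, while steeper slopes and higher curve numbers (less retention) shorten it.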

We then constructed a discretized parameter space of the three parameters for each road as follows. For the curve number, we examined eight possible values centered on 60 in steps of 5. For the catchment area, we considered 20 possible values centered on the estimated value in steps of 0.01 km²

Prior probability distributions of hydrological parameter sets based on DEMs and other prior knowledge for 10 flood-prone roads.
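The discretized parameter space is a Cartesian product of three one-dimensional grids. Only the counts of 8, 20, and 30 values (4800 parameter sets in total) and the curve number and catchment area step sizes come from the text. The placement of an even-length grid around its center and the time-of-concentration step are hypothetical. With a very small area estimate, the grid would also need shifting to keep all areas positive.

```python
import itertools
import numpy as np

def centered_grid(center: float, step: float, n: int) -> np.ndarray:
    """n values spaced by step around center (assumed: for even n, one extra value below center)."""
    offsets = np.arange(n) - n // 2
    return center + step * offsets

def parameter_sets(area_km2_est: float, tc_h_est: float, tc_step_h: float = 0.05):
    """All (CN, area, Tc) combinations; tc_step_h is a hypothetical step size."""
    cn_values = centered_grid(60.0, 5.0, 8)             # 40, 45, ..., 75 under this convention
    area_values = centered_grid(area_km2_est, 0.01, 20)  # km2
    tc_values = centered_grid(tc_h_est, tc_step_h, 30)   # h
    return list(itertools.product(cn_values, area_values, tc_values))
```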

The second prior distribution assumed that the three parameters all follow uniform distributions. The parameter spaces for the second prior distribution were the same as those for the first. As a result, the joint probability of each parameter set was equal to 1/4800 (= 1/(8 × 20 × 30)).

Detailed information on the two types of prior distributions.

We first calibrated the parameters using the prior distributions derived from the DEMs and other prior knowledge. The resulting posterior distributions are presented in Fig. 10, where each row represents a road and each column a curve number. Each subplot presents the joint probability distribution of the catchment area and time of concentration for a given curve number, with color intensity representing the magnitude of the probability. After two updates, the posterior probability distributions of both the catchment area and time of concentration converge around the optimal parameter sets for most flood-prone roads. This demonstrates that incorporating taxi observations substantially reduces the uncertainty associated with the catchment area and time of concentration. The probability typically reaches its maximum when the curve number is 55 or 60. Furthermore, each subplot contains a salient cluster with higher probability than other regions, suggesting that there may be multiple acceptable parameter sets.

Posterior probability distributions of hydrological parameter sets for 10 flood-prone roads after calibration. The prior probability distributions were derived from the DEMs and additional prior knowledge.

Furthermore, the optimal catchment area conditional on the curve number decreases as the curve number increases, whereas the conditional optimal time of concentration increases with the curve number. This is expected: a higher curve number produces more rainfall excess under identical rainfall, so a smaller catchment area is required to maintain the runoff that best aligns with the taxi observations. Similarly, a longer time of concentration attenuates the peak of the additional runoff generated by a higher curve number, thereby preserving the optimal runoff response.

We also present the marginal distributions of the three parameters for the 10 roads before and after calibration in Fig. 11. The marginal posterior distributions of the curve number appear relatively similar to the marginal prior distributions, suggesting that the taxi data provide limited information about the curve number compared with the catchment area and time of concentration. This outcome may result from the range and discretization granularity of the parameter spaces: the catchment area and time of concentration have 20 and 30 possible values, respectively, whereas the curve number has only 8. The curve number thus has a smaller search space, and its impact on the no-taxi-passing probability is comparatively lower than that of the catchment area and time of concentration.

Marginal prior and posterior probability distributions of the curve number for 10 flood-prone roads.

For example, for road ID

Impacts of three parameters on the variation of the time series of runoff and no-taxi-passing probabilities:

The posterior distributions calibrated based on the uniform prior distribution are presented in Fig. 13. When comparing two posterior distributions derived from two prior distributions, it is clear that the posterior distributions of the catchment area and time of concentration are very similar, indicating that the impact of prior distributions on these parameters rapidly diminishes after taxi-related knowledge is added. As stated by Beven and Binley (1992, p. 286), “as soon as information is added in terms of comparisons between observed and predicted responses then, if this information has value, the distribution of calculated likelihood values should dominate the uniform prior distribution when uncertainty estimates are recalculated”.

Posterior probability distributions of hydrological parameter sets for 10 flood-prone roads after calibration. The prior probability distributions were derived from a uniform distribution.

After calibration, each parameter set was combined with the SCS unit hydrograph to construct an SUH, which was then driven by the rainfall data from 11 June 2019 to produce a predicted hydrograph. Because the posterior probability of each parameter set can be regarded as a fuzzy measure of the degree of belief that the parameter set is true, the runoff simulated with each parameter set was weighted by its posterior probability, and the weighted values were summed to obtain the final predicted runoff:
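The posterior-weighted sum is a one-line operation. The sketch below assumes the simulated hydrographs are stacked with one row per parameter set; this array layout and the function name are hypothetical.

```python
import numpy as np

def weighted_runoff(runoff_by_set: np.ndarray, posterior: np.ndarray) -> np.ndarray:
    """Posterior-weighted runoff: sum_i P(theta_i | data) * Q_i(t).

    runoff_by_set: shape (n_sets, n_steps); posterior: shape (n_sets,), summing to one.
    """
    # np.average divides by the sum of the weights, which is 1 for a normalized posterior
    return np.average(runoff_by_set, axis=0, weights=posterior)
```

For example, two hydrographs [1, 2] and [3, 4] with posterior weights 0.25 and 0.75 give [2.5, 3.5].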

The output of the calibrated hydrological model is runoff (with units of m

Because the posterior distributions derived from the two types of prior distributions were very similar, only the posterior distribution calibrated with the prior derived from the DEMs and other prior knowledge was used for validation. Comparisons between the observed water depth and simulated runoff for the 10 selected roads are presented in Fig. 14, and the corresponding scatter plots are presented in Fig. 15. We used the Pearson correlation coefficient, which measures the linear correlation between two variables, as the goodness-of-fit indicator. Eight of the 10 roads show significant positive Pearson coefficients, indicating that simulated runoff and observed water depth have similar and consistent variation trends.

Comparisons between the observed water depth and simulated runoff for roads 1–10. The maximum value is 30 m

Scatter plots of the observed water depth and the simulated runoff for roads 1–10.

Note that this goodness-of-fit measure only describes the degree of correlation between the observed and simulated data and may therefore contain validation bias. As pointed out by Legates and McCabe (1999), correlation-based statistics are insensitive to additive and proportional differences between simulations and observations. Therefore, fitting a rating curve between simulated runoff and observed water depth is only a partial validation, and the usefulness of the proposed calibration method requires further analysis.
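The insensitivity noted by Legates and McCabe (1999) is easy to demonstrate. The Pearson coefficient does not change when a series is scaled by a positive factor or shifted by a constant. A simulation that is badly biased in magnitude can therefore still score r = 1. The toy series below are hypothetical.

```python
import numpy as np

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(x, y)[0, 1])

observed_depth = np.array([0.00, 0.02, 0.10, 0.25, 0.18, 0.06, 0.01])  # hypothetical (m)
biased_sim = 40.0 * observed_depth + 3.0                               # scaled and shifted copy
print(pearson_r(observed_depth, biased_sim))  # approximately 1.0, despite the large bias
```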

Four main points about the proposed calibration method merit further discussion. First, although the presented validation results support the use of taxi GPS data to calibrate hydrological parameters for poorly gauged road networks, the proposed method is more applicable to roads that are frequently visited by taxis. Uncertainty increases as the taxi volume on a road decreases. A road is considered passable when at least one taxi GPS point is observed during a time interval, but we cannot assert that a road is disrupted when the taxi volume is zero. When no taxi GPS points are observed during a storm on a road with frequent taxi traffic, the road is highly likely to be disrupted by flooding, which provides relatively reliable information for parameter calibration. Conversely, when a road with little taxi traffic has no taxi points during a storm, there is a relatively high likelihood that the road remains passable and is simply showing its usual absence of taxis. Therefore, the proposed calibration method becomes relatively unreliable on roads with sparse taxi data, where a no-taxi-passing period is no longer a good proxy for the disruption period. To compensate for a shortage of taxi GPS data, additional data sources, such as ride-hailing data and bus data, should be incorporated in future work.
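This argument can be quantified under the Poisson assumption used in Step 3, together with one simplifying assumption of ours: a disrupted road sees no taxis at all. If a road is open, the chance of observing zero taxis in an interval is e^(−λ). For a prior disruption probability π, Bayes' theorem then gives P(disrupted | zero taxis) = π / (π + (1 − π) e^(−λ)). The prior of 0.1 and the arrival rates below are hypothetical.

```python
import math

def p_disrupted_given_no_taxi(lam: float, prior_disrupt: float) -> float:
    """P(road disrupted | zero taxis observed), assuming Poisson(lam) arrivals on an open road
    and no arrivals at all on a disrupted road."""
    p_zero_if_open = math.exp(-lam)
    return prior_disrupt / (prior_disrupt + (1.0 - prior_disrupt) * p_zero_if_open)

for lam in (0.2, 1.0, 5.0):  # sparse to busy road (mean taxis per 5 min interval, hypothetical)
    print(lam, round(p_disrupted_given_no_taxi(lam, prior_disrupt=0.1), 3))
```

With these inputs, the posterior disruption probability is roughly 0.12, 0.23, and 0.94. An empty interval on a sparse road barely moves the belief above the prior, whereas on a busy road it is strong evidence of disruption.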

Second, the disruption of one road may cause cascading failures, in which disruption propagates rapidly from an inundated road to adjacent non-inundated roads through the road network. For a road that is disrupted but not inundated by a storm, applying the proposed calibration method may introduce structural errors. Consider two connected roads, Road 1 and Road 2, that are both disrupted during a storm and both have taxi volumes of zero (Fig. 16). In this case, Road 1 is disrupted by flooding, whereas Road 2 is disrupted only because it is connected to Road 1. If taxi data are the only data used for calibration, the posterior distributions of the hydrological parameters for Road 1 and Road 2 will be identical after calibration, because their taxi volume sequences are identical. However, the hydrological parameters of these two roads are not the same, because only one road is flooded. Just as we cannot simply treat the no-taxi-passing period as the disruption period, we cannot equate the disruption period with the flooded period. In future work, an algorithm that distinguishes flooding-induced disruption from connectivity-induced disruption should be developed.

The difference between the disruption period and flooded period.

Third, the proposed three-step process, which consists of an SCS unit hydrograph, an empirical runoff-disruption function, and a Poisson distribution, is one realization of the generalized framework presented in Fig. 1. The submodels used in the three-step process can be flexibly replaced with other submodels according to complexity requirements and data availability. For example, a distributed hydrological model is an alternative to the SCS unit hydrograph. Unlike the SCS unit hydrograph, a distributed model partitions a watershed into physically homogeneous units and captures, at high resolution, the complex spatial variation induced by human activity, which may make it more applicable to urbanized environments such as road networks. However, because some critical data, such as road drainage and land use data, were unavailable, and because of the high computational cost of distributed models, we did not adopt this approach in this study. Another assumption made in this study is that the number of taxis arriving at a road follows a Poisson distribution. Chi-square goodness-of-fit tests showed that the frequency distribution of taxi volumes follows a Poisson distribution in more than 50 % of 5 min intervals for 7 of the 10 roads presented in Fig. 8, indicating that the Poisson model appears to be a reasonable assumption. However, this assumption may not hold universally, particularly in other urban contexts, where alternative distributions, such as the Weibull distribution, may provide a more accurate representation.
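A chi-square test of the Poisson assumption for one 5 min interval might be set up as follows. This is a sketch under stated assumptions, not the exact procedure used here. The sample is assumed to be the taxi counts observed in that interval across many ordinary days. The Poisson mean is estimated from the data, so one extra degree of freedom is removed (`ddof=1`). Counts at or above the largest observed value are pooled into a tail bin so that the expected frequencies sum to the sample size. In practice, sparse bins with expected counts below about 5 would also be merged. The binning scheme and function name are our assumptions; the code relies on `scipy.stats.chisquare` and `scipy.stats.poisson`.

```python
import numpy as np
from scipy import stats

def poisson_chisquare(counts):
    """Chi-square goodness-of-fit test of taxi counts against a fitted Poisson distribution."""
    counts = np.asarray(counts, dtype=int)
    n, lam = counts.size, counts.mean()
    k_max = int(counts.max())
    # Observed frequencies for 0..k_max-1, plus a tail bin for counts >= k_max
    observed = np.bincount(counts, minlength=k_max + 1).astype(float)
    observed = np.append(observed[:k_max], observed[k_max:].sum())
    probs = np.append(stats.poisson.pmf(np.arange(k_max), lam),
                      stats.poisson.sf(k_max - 1, lam))   # P(X >= k_max)
    expected = n * probs                                   # sums to n
    # One parameter (lam) was estimated from the data, hence ddof=1
    return stats.chisquare(observed, expected, ddof=1)
```

A small p-value (for example, below 0.05) would indicate that the counts in that interval are poorly described by a Poisson distribution, as can happen with overdispersed traffic.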

Fourth, the parameter values in this study were discretized, although hydrological model parameters are inherently continuous. Discretization may miss the optimal solutions, particularly when the hydrological model is sensitive to these parameters. Discretization is therefore neither a requirement nor a recommended strategy. Future research should address the optimization or posterior inference problem in a continuous parameter space using established methods, such as Monte Carlo methods (e.g. Markov chain Monte Carlo sampling).
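As an illustration of inference in a continuous parameter space, the sketch below uses random-walk Metropolis, one standard Markov chain Monte Carlo method, over the three calibrated parameters (curve number, catchment area, time of concentration). The prior bounds, step sizes, and the Gaussian stand-in log-likelihood (`toy_log_lik`, `TARGET`, `SCALE`) are hypothetical. In practice the log-likelihood would compare the modeled no-taxi-passing probabilities with the observed taxi record.

```python
import numpy as np

# Hypothetical prior bounds: curve number, catchment area (km^2), time of concentration (min).
LOWER = np.array([30.0, 0.01, 5.0])
UPPER = np.array([98.0, 5.0, 120.0])

def log_prior(theta):
    """Uniform prior inside the bounds, zero density outside."""
    return 0.0 if np.all((theta >= LOWER) & (theta <= UPPER)) else -np.inf

def metropolis(log_lik, theta0, step, n_samples, seed=0):
    """Random-walk Metropolis sampler over a continuous parameter space."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_prior(theta) + log_lik(theta)    # theta0 must lie inside the bounds
    chain = np.empty((n_samples, theta.size))
    accepted = 0
    for i in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_prior = log_prior(proposal)
        # Skip the (possibly expensive) likelihood when the prior rules the proposal out.
        lp_prop = lp_prior + log_lik(proposal) if np.isfinite(lp_prior) else -np.inf
        if np.log(rng.random()) < lp_prop - lp:   # accept with probability min(1, ratio)
            theta, lp = proposal, lp_prop
            accepted += 1
        chain[i] = theta
    return chain, accepted / n_samples

# Stand-in log-likelihood: a Gaussian around hypothetical "true" parameter values.
TARGET = np.array([80.0, 1.2, 30.0])
SCALE = np.array([3.0, 0.2, 5.0])

def toy_log_lik(theta):
    return -0.5 * np.sum(((theta - TARGET) / SCALE) ** 2)

chain, acc_rate = metropolis(toy_log_lik, (LOWER + UPPER) / 2, step=0.5 * SCALE, n_samples=20000)
posterior_mean = chain[2000:].mean(axis=0)    # discard the first 2000 samples as burn-in
print("acceptance rate:", round(acc_rate, 2), "posterior mean:", posterior_mean.round(2))
```

Because each likelihood evaluation requires a full rainfall-runoff simulation, the number of model runs equals the chain length; the step sizes would need tuning so that the acceptance rate stays moderate.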

An urban flooding model requires various types of data for calibration. In this study, we proposed a Bayesian calibration framework for the hydrological parameters of a road network based on taxi GPS data. A three-step procedure consisting of a rainfall-runoff model, a runoff-disruption function, and a no-taxi-passing probability model transforms a given rainfall time series into a time series of no-taxi-passing probabilities for each parameter set; this transformation is the key to taxi-data-driven model calibration. The calculated no-taxi-passing probabilities, which acted as a proxy for the associated hydrological parameter sets, were compared with the observed taxi data using Bayes' theorem to estimate the posterior probability distributions of the hydrological parameter sets. Three parameters, namely the curve number, catchment area, and time of concentration, were calibrated. The proposed calibration method was instantiated by combining classical hydrological models with traffic flow models and was validated on 10 flood-prone roads in Shenzhen. The validation results indicate that the runoff trends were correctly predicted for 8 of the 10 roads, which demonstrates the potential of calibrating hydrological parameters based on taxi GPS data.
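The three-step procedure and the Bayesian update can be sketched end to end on a discrete parameter grid. This is a simplified illustration under several assumptions that the text does not fix. Runoff is generated with the SCS curve number method and routed with the SCS triangular unit hydrograph (time to peak equal to half the time step plus 0.6 times the time of concentration, base time 2.67 times the time to peak). The logistic `disruption_prob` and its `q50` and `scale` values are hypothetical stand-ins for the empirical runoff-disruption function. A disrupted road is assumed to record no taxis, while an open road receives Poisson arrivals with a hypothetical `base_rate`. Each 5 min interval contributes a Bernoulli likelihood term (zero taxi points observed or not), and the prior over the grid is uniform. All function names are hypothetical.

```python
import itertools
import numpy as np

DT_MIN = 5.0  # time step in minutes, matching the 5 min taxi-count intervals

def scs_excess(rain_mm, cn):
    """Incremental runoff depth (mm) per step from the SCS curve number method."""
    s = 25400.0 / cn - 254.0               # potential maximum retention (mm)
    ia = 0.2 * s                            # initial abstraction (mm), standard 0.2 ratio
    p = np.cumsum(rain_mm)                  # cumulative rainfall (mm)
    q = np.where(p > ia, (p - ia) ** 2 / (p - ia + s), 0.0)  # cumulative runoff (mm)
    return np.diff(q, prepend=0.0)

def triangular_uh(tc_min):
    """SCS triangular unit hydrograph on the DT_MIN grid, weights normalized to sum to 1."""
    tp = DT_MIN / 2.0 + 0.6 * tc_min        # time to peak (min)
    tb = 2.67 * tp                           # base time (min)
    t = np.arange(DT_MIN, tb + DT_MIN, DT_MIN)
    w = np.where(t <= tp, t / tp, np.clip((tb - t) / (tb - tp), 0.0, None))
    return w / w.sum()

def discharge(rain_mm, cn, area_km2, tc_min):
    """Outlet discharge (m^3/s): excess rainfall routed through the unit hydrograph."""
    excess_m = scs_excess(rain_mm, cn) / 1000.0
    routed = np.convolve(excess_m, triangular_uh(tc_min))[: len(rain_mm)]
    return routed * area_km2 * 1e6 / (DT_MIN * 60.0)   # m^3 per step -> m^3/s

def disruption_prob(q_m3s, q50=0.5, scale=0.1):
    """Hypothetical logistic runoff-disruption function; q50 and scale are illustrative."""
    return 1.0 / (1.0 + np.exp(-(q_m3s - q50) / scale))

def no_taxi_prob(p_disrupt, base_rate):
    """P(no taxi point): a disrupted road has none; otherwise arrivals ~ Poisson(base_rate)."""
    return p_disrupt + (1.0 - p_disrupt) * np.exp(-base_rate)

def grid_posterior(rain_mm, observed_zero, grid, base_rate):
    """Posterior over a discrete (cn, area, tc) grid under a uniform prior."""
    log_lik = []
    for cn, area, tc in grid:
        p0 = no_taxi_prob(disruption_prob(discharge(rain_mm, cn, area, tc)), base_rate)
        p0 = np.clip(p0, 1e-12, 1.0 - 1e-12)
        # Bernoulli likelihood: each interval either has zero taxi points or not.
        log_lik.append(np.sum(np.where(observed_zero, np.log(p0), np.log1p(-p0))))
    log_lik = np.array(log_lik)
    post = np.exp(log_lik - log_lik.max())  # the uniform prior cancels on normalization
    return post / post.sum()

# Hypothetical example: a 30 min storm of 8 mm per 5 min step within a 4 h window.
rain = np.zeros(48)
rain[6:12] = 8.0
grid = list(itertools.product([60, 75, 90], [0.05, 0.2], [10, 30]))
rng = np.random.default_rng(1)
p0_true = no_taxi_prob(disruption_prob(discharge(rain, 90, 0.2, 10)), base_rate=3.0)
observed_zero = rng.random(rain.size) < p0_true   # synthetic "observed" taxi record
post = grid_posterior(rain, observed_zero, grid, base_rate=3.0)
print("MAP parameter set:", grid[int(np.argmax(post))])
```

Swapping `disruption_prob` for a fitted empirical curve, or the triangular hydrograph for a curvilinear one, leaves `grid_posterior` unchanged; only the mapping from parameters to no-taxi-passing probabilities changes.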

This study highlights the potential of integrating transportation-related data with hydrological theory to improve transportation resilience and manage flood risk on road networks. We hope that our study can provide a flexible calibration framework for countries that have little runoff data but abundant taxi data. We acknowledge that the application of the proposed method is currently limited by the heterogeneous spatial distribution of taxis across the city and by the cascading effects of road inundation, but we expect these limitations to ease with the increasing availability of vehicle data and the continued improvement of modeling approaches.

The data and code used to validate the proposed method are available on Zenodo (

JY conceptualized the article and collected field data. XK designed the methodology and was responsible for code development. KX plotted the figures and revised the manuscript. BD managed the implementation of research activities. SJ discussed the results and contributed to method validation. XK wrote the final version of the article with contributions from all co-authors.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Shan Jiang thanks Tufts University for its support.

This research has been supported by the National Key Research and Development Program of China (grant no. 2022YFC3303100).

This paper was edited by Yue-Ping Xu and reviewed by Jeffrey M. Sadler and two anonymous referees.