Belowground urban stormwater networks (BUSNs) are critical for removing excess rainfall from impervious urban areas and preventing or mitigating urban flooding. However, available BUSN data are sparse, preventing the modeling and analysis of urban hydrologic processes at regional and larger scales. We propose a novel algorithm for estimating BUSNs by drawing on concepts from graph theory and existing, extensively available land surface data, such as street network, topography, and land use/land cover. First, we derive the causal relationships between the topology of BUSNs and urban surface features based on graph theory concepts. We then apply the causal relationships and estimate BUSNs using web-service data retrieval, spatial analysis, and high-performance computing techniques. Finally, we validate the derived BUSNs in the metropolitan areas of Los Angeles, Seattle, Houston, and Baltimore in the US, where real BUSN data are partly available to the public. Results show that our algorithm can effectively capture 59 %–76 % of the topology of real BUSN data, depending on the supporting data quality. This algorithm has promising potential to support large-scale urban hydrologic modeling and future urban drainage system planning.

Urban flooding events pose an escalating threat to urban areas at regional and larger scales. The worsening of the urban flooding issue can first be attributed to urbanization and associated regional population migration. The United Nations estimates that, globally, the urban population will grow to more than two-thirds of the total population by 2050

Generally, there are two types of systems for transporting stormwater from urban areas to local water bodies: combined sewer systems (CSSs) and separate sewer systems (SSSs). CSSs collect domestic sewage and/or industrial wastewater in addition to stormwater, whereas SSSs have two separate systems for collecting stormwater and sewage/wastewater. During heavy rainfalls, the overflows from CSSs are a major source of pollution; therefore, at the end of the 20th century, many major countries around the globe started adopting SSSs in their urban development plans and partially or fully transforming their existing CSSs into SSSs

The US Environmental Protection Agency (EPA) uses the term “municipal separate storm sewer system” (MS4) to refer to the stormwater collection part of SSSs. In this study, we focus on MS4s because they are the most dominant type of stormwater transport systems in the US. The EPA defines an MS4 as a publicly owned urban stormwater conveyance system that directly receives excess surface runoff from urban areas during storm events and delivers it to lakes, rivers, or oceans. Thus, MS4s play an irreplaceable role in preventing or mitigating urban floods. However, due to a lack of good-quality MS4 data, most urban modules in existing hydrological models focus on surface hydrological processes (e.g., those associated with impervious areas and compacted soils) and do not explicitly account for MS4s

In this study, we focus on the belowground urban stormwater network (BUSN) elements of MS4s; aboveground elements such as street inlets, manholes, and ditches are not the subject of this study. We attempt to address the data scarcity challenge based on two premises. The first premise is the topological relationship between street/road networks and BUSNs. Generally, BUSNs are required to protect streets/roads from flooding; thus, they are often constructed parallel to street/road networks, particularly for important streets, as illustrated in Fig.

A schematic of an urban drainage system adapted from

The shortest paths in the two graphs shown in Fig.

Paths in bold indicate the paths that are in one graph and not in the other.

A comparison of unweighted

Thus, our primary objective is to propose, develop, and validate a novel algorithm for deriving BUSN topological properties from ubiquitous existing aboveground data. The rest of the article is structured as follows: Sect. 2 describes the conceptual basis and technical details of the new algorithm, Sect. 3 lists four metropolitan areas in the US as the case studies, Sect. 4 illustrates this algorithm in these case studies, and Sect. 5 closes with a summary and discussions on the limitations and future implications.

In this section, we first explain the conceptual framework of our algorithm for deriving the topology of BUSNs, including the following:

complex network analysis concepts (which are transferable and not specific to any location) from graph theory that we adapt to capture the topological relationships between BUSNs and street/road networks,

a generic procedure for BUSN derivation,

a simple yet effective way of validating the BUSNs derived by the algorithm,

the technical details of implementing the algorithm in the US based on US federal and local urban drainage design criteria as well as publicly available land surface data.

A schematic diagram of a generic procedure for estimating BUSNs using publicly available aboveground datasets.

BUSNs and street networks are both complex, hierarchical networks. The function of such networks largely depends on their nontrivial topological structure

The mathematical definition of BC is as follows:

We demonstrate the difference between weighted and unweighted BC in a directed graph using a simple example. Figure

Table

A simple case demonstrating the validation method. Real BUSN elements with more than 60 % coverage percentage are considered as “covered”.

The paths highlighted using bold font in Table

Furthermore, the weights of edges can be calculated in various ways depending on their network function.
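To make the contrast between weighted and unweighted BC concrete, the following minimal Python sketch computes unnormalized edge betweenness on a toy directed graph by brute-force enumeration of shortest paths. The three-node graph, its integer weights, and the function names `simple_paths` and `edge_betweenness` are hypothetical and chosen only for illustration; they are not the example used in this study, and brute-force enumeration is only practical for very small graphs.

```python
from itertools import permutations


def simple_paths(adj, source, target, path=None):
    """Yield every simple directed path from source to target."""
    path = [source] if path is None else path
    if source == target:
        yield path
        return
    for nxt in adj.get(source, {}):
        if nxt not in path:
            yield from simple_paths(adj, nxt, target, path + [nxt])


def edge_betweenness(adj, weighted=True):
    """Unnormalized edge BC: share of shortest paths that cross each edge."""
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    bc = {(u, v): 0.0 for u in adj for v in adj[u]}

    def length(p):
        # Weighted length sums edge weights; unweighted length counts hops.
        return sum(adj[a][b] if weighted else 1 for a, b in zip(p, p[1:]))

    for s, t in permutations(nodes, 2):
        paths = list(simple_paths(adj, s, t))
        if not paths:
            continue  # t is not reachable from s
        best = min(length(p) for p in paths)
        shortest = [p for p in paths if length(p) == best]
        for p in shortest:
            for edge in zip(p, p[1:]):
                bc[edge] += 1.0 / len(shortest)
    return bc


# Hypothetical toy graph: directed edges with integer weights.
toy = {"A": {"B": 1, "C": 5}, "B": {"C": 1}, "C": {}}
unweighted_bc = edge_betweenness(toy, weighted=False)
weighted_bc = edge_betweenness(toy, weighted=True)
print(unweighted_bc)
print(weighted_bc)
```

In this toy case, the direct edge A→C carries the A-to-C shortest path when weights are ignored, but it carries none once its larger weight makes the detour through B shorter.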

For adapting the BC concept to BUSNs, we rely on two facts: (1) a BUSN is also a complex network and a directed graph, and (2) BUSNs are well connected to street networks through some connecting elements such as street inlets and catch basins, as shown in Fig.

Moreover, it is neither necessary nor feasible to have a belowground stormwater pipe below each street. For example, a country road or a street in a very sparsely populated area may not need a belowground stormwater pipe because the corresponding surface infrastructure, such as flood buffering zones, may be sufficient to protect it from floods. Streets in residential and commercial areas are relatively more important and need BUSNs for flood protection for two possible reasons: (1) the streets are so important that extra flood protection is needed in addition to existing surface infrastructure (e.g., buffering zones and retention ponds), or (2) there is not enough space for surface infrastructure in heavily populated urban areas, where BUSNs are the most economical and feasible option. Indeed, within a street/road network, some streets are more important than others depending on several factors, such as road type, urban form (e.g., street circulation system and buildings' arrangement, distribution, and spatial accessibility), land use (e.g., residential, commercial, and industrial), and land cover (e.g., open spaces, parks, and impervious surfaces). Thus, more important streets are more likely to have BUSNs underlying them. In this study, we quantify the relative importance of streets by incorporating the aforementioned factors into the BC concept.

To address the single-weight-factor limitation pointed out in the

As an edge should have a single integrated weight for computing the BC, different street/pipe attributes should be summarized into a single value.

We transform the values of street/pipe attributes into edge weights such that streets/pipes with more significance have lower weights. The reason for this is that we measure the street/pipe significance based on their BC values, and edges with lower weights are more likely to have higher BC values.

Considering that street/pipe attributes can have different ranges or even data types, we normalize their values before assigning them as edge weights.

In this study, we consider four street attributes, namely, land cover type, road type, the discharge capacity of the associated storm drain pipe, and the building footprint, using integer, string, float, and float data types, respectively. First, we normalize each attribute by data binning (i.e., dividing the values into five categories and assigning each category an integer number from 1 through 5). These integer values correspond to different levels of relative importance, from very high (1) to very low (5). For example, we normalize building footprints such that streets with higher building footprints have lower weights. The reason for this is that a higher building footprint value indicates that the street is located in a high-density residential area or a business center; therefore, the stormwater should be drained more quickly. We note that the only requirement for a normalized weight is that it should be greater than zero, as zero edge weights may lead to infinitely many paths of equal length, meaning that the shortest path cannot be determined. After transforming all of the attributes into edge weights by normalizing them to the range [1, 5], we compute the integrated weight of each edge by taking the average of the four weights. The same relative importance logic applies to the integrated weight (i.e., a lower integrated weight value for a street increases the probability of the street having a higher BC value). Consequently, such a street has a higher relative significance and requires more stormwater transport capacity. We provide more details on the implementation of these rules in our algorithm in the following section.
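The binning and averaging described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the rank-based (quantile-style) binning, the function names `bin_scores` and `integrated_weight`, and all attribute values for the five streets are hypothetical and are not taken from this study.

```python
def bin_scores(values, higher_is_more_important=True, n_bins=5):
    """Map raw values to integer scores from 1 (very high importance)
    to n_bins (very low importance) using rank-based bins.

    Assumption: equal-count bins by rank; tied values may land in
    different bins in this simplified version.
    """
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_more_important)
    scores = [0] * len(values)
    for rank, idx in enumerate(order):
        scores[idx] = rank * n_bins // len(values) + 1
    return scores


def integrated_weight(*factor_scores):
    """Average the per-factor scores street by street."""
    return [sum(s) / len(s) for s in zip(*factor_scores)]


# Hypothetical building footprint areas (m^2) for five streets.
footprint_m2 = [1200.0, 300.0, 50.0, 800.0, 10.0]
footprint_score = bin_scores(footprint_m2)

# Hypothetical scores (already binned to 1..5) for the other three factors.
road_score = [1, 2, 5, 3, 5]
land_cover_score = [2, 2, 4, 3, 5]
pipe_capacity_score = [1, 3, 5, 2, 4]

weights = integrated_weight(footprint_score, road_score,
                            land_cover_score, pipe_capacity_score)
print(footprint_score)
print(weights)
```

Because every score is at least 1, the averaged weight is always strictly positive, which satisfies the requirement that no edge has a zero weight.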

Comparing the Fisher–Jenks

A flowchart demonstrating details of the proposed framework.

In this subsection, we outline a generic procedure to derive a BUSN based on IWBC that can be conceptually applicable to any urban area. Figure

Step 1 entails the calculation of the surface slopes of all streets in the street network of interest (Fig.

Step 2 involves setting the flow directions between streets based on the surface slopes obtained in Step 1. At this stage, we assume street length is the only weighting factor. Now the street network becomes a directed, weighted network, as shown in Fig.

Step 3 entails the initial estimation of relative street importance by calculating the weighted betweenness, as shown in Fig.

Step 4 requires the estimation of streets' right-of-way (ROW) based on local/federal regulations. ROW is a part of the land that is reserved by local/federal authorities for construction, maintenance, and future expansion of transportation elements, such as highways and public utilities

Step 5 entails the estimation of the hydraulic properties of the potential BUSN pipes (e.g., pipe size and slope) by accounting for both the weighted betweenness from Step 3 and the recommendations from the street-/road-relevant regulations at the federal, state, or local levels (see Fig.

Step 6 involves the calculation of IWBC by assigning different weights to different streets and integrating several weighting factors, such as road type, land use/land cover (LULC), surface topography, and building density.

Step 7 requires the derivation of the BUSN by removing those relatively unimportant streets based on IWBC. We assume that BUSN pipes are only needed for those remaining streets. Thus, the topology of the BUSN is the same as that of the remaining, relatively important streets.

Finally, Step 8 involves checking the connectivity of the remaining network based on the concept of weakly connected components from graph theory. In graph theory, network connectivity is an important measure of a network's resilience to losing edges or nodes (i.e., the impact that removing edges and nodes has on the overall network flow). For this purpose, after removing the unimportant streets, we first detect the isolated subnetworks by determining the weakly connected components (i.e., the groups of nodes that remain mutually unreachable even after converting the network to an undirected graph by ignoring edge directions). We then count the streets in each subnetwork and remove those subnetworks whose street count is less than the average street count of the subnetworks.
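The connectivity check in Step 8 can be sketched as follows. This is a minimal Python illustration, not the implementation used in this study: the union-find approach, the function names, and the toy edge list are assumptions made for the example.

```python
from collections import defaultdict


def weakly_connected_components(edges):
    """Group directed edges into components, ignoring edge direction."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)  # union the two end nodes

    groups = defaultdict(list)
    for u, v in edges:
        groups[find(u)].append((u, v))
    return list(groups.values())


def drop_small_subnetworks(edges):
    """Keep subnetworks whose street count is at least the average count."""
    components = weakly_connected_components(edges)
    if not components:
        return []
    mean_size = sum(len(c) for c in components) / len(components)
    return [e for c in components if len(c) >= mean_size for e in c]


# Hypothetical streets left after Step 7: three isolated subnetworks
# with 3, 1, and 4 streets, respectively (average 8/3).
remaining = [(1, 2), (2, 3), (3, 4),
             (10, 11),
             (20, 21), (21, 22), (22, 23), (23, 24)]
kept = drop_small_subnetworks(remaining)
print(kept)
```

In this toy case, only the single-street subnetwork falls below the average street count and is removed.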

In Step 6, we proposed a weight integration strategy for combining continuous and discrete weighting factors into a unified discrete weight system. Some urban features, such as road type and land use/land cover, are only quantified with discrete values and cannot be represented by continuous values. The integrated weight ranges from 1 to 5. For any edge, a smaller weight indicates higher relative importance because the edge will have a higher chance of being on the shortest path. First, we transform all continuous weighting factors into discrete values in two possible ways: (1) the quantile method, which is based on the equal number of features in each class, and (2) the Fisher–Jenks

Due to the scarcity of publicly available real BUSN data and their low quality, we can only validate the topology of the derived BUSNs. Therefore, although our proposed algorithm provides hydraulically feasible approximations for the size and slope of the BUSN pipes, we cannot validate them. Our topology validation strategy is based on the principle of spatial proximity for the places where some real BUSN data are available. As shown in Fig.

A map of urban areas that are the subject of this study as well as their area.

There are often non-negligible uncertainties in both street network and real BUSN data. For instance, for any urban area, one may estimate the total length of the real BUSN pipes and that of the streets from the available data. In principle, the real BUSN's total length should not exceed that of the corresponding street network. In reality, however, this may not be the case if notably fewer data are missing from the real BUSN dataset than from the corresponding street network dataset. To account for this situation in our validation, we first discretize the targeted urban domain into 1 km resolution grid cells. For each grid cell, we then calculate the ratio of the total length of the real BUSN pipes to the total length of the street network elements. Theoretically, this ratio should be no more than 1.0. We discard those cells with a ratio larger than the theoretical maximum (1.0) and only calculate

Input data for the Los Angeles case:

Examples demonstrating the quality of publicly available datasets.

This subsection describes the detailed implementation of the previously outlined algorithm over the US.

We retrieve the raw input data that are available at least in the US from the following sources:

Street network data are retrieved from OpenStreetMap

Digital elevation model (DEM) data at a 10

Land use/land cover (LULC) data at a 30

Building footprints are retrieved from the Microsoft Building Footprints (MSBF) dataset

BUSN design criteria and recommended parameters are sourced from the

We perform the following post-processing operations on the raw input data:

We retrieve the “Road type” and “length” attributes for each street directly from OpenStreetMap (see Table

We calculate the surface slope and flow direction of each street in four steps. First, within any street network, we remove intersection points that are closer than the DEM resolution and, therefore, cannot be effectively used. Second, we hydrologically condition the DEM data to more accurately represent the flow direction of surface runoff. Third, we compute the street slope using the conditioned DEM data and set the slope value to 0.4 % if the computed slope is less than 0.4 %, as streets must have a minimum longitudinal slope of 0.4 %

We also estimate the streets' ROW in four steps. First, we assign the number of lanes to each street based on its road type defined in OpenStreetMap (OpenStreetMap, 2021) (see Table

We determine the dominant land cover type within the buffer zone of each street from the high-resolution LULC data.

We estimate the total area of building footprints within the buffer zone of each street by summing up the footprints of the buildings with more than 30 % of their areas within the buffer zone.
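The 30 % inclusion rule for building footprints can be illustrated with a simplified Python sketch. Here, both the buffer zone and the buildings are approximated as axis-aligned rectangles given as `(xmin, ymin, xmax, ymax)`, which is an assumption made only to keep the example self-contained; real buffers and footprints are polygons and require proper geometric overlay operations. The function names and coordinates are hypothetical, and summing the full footprint of each qualifying building (rather than only its part inside the buffer) is our reading of the rule.

```python
def rect_area(rect):
    """Area of an axis-aligned rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def overlap_area(a, b):
    """Area of the intersection of two axis-aligned rectangles."""
    return rect_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))


def footprint_in_buffer(buffer_rect, buildings, min_fraction=0.3):
    """Sum the footprints of buildings with more than min_fraction
    of their area inside the buffer zone."""
    total = 0.0
    for building in buildings:
        area = rect_area(building)
        if area > 0 and overlap_area(building, buffer_rect) / area > min_fraction:
            total += area  # assumption: count the full footprint
    return total


# Hypothetical buffer zone of one street and three buildings (metres).
street_buffer = (0, 0, 100, 20)
buildings = [
    (10, 5, 20, 15),   # fully inside: counted
    (50, 15, 60, 35),  # 25 % inside: not counted
    (80, 10, 90, 30),  # 50 % inside: counted
]
total_footprint = footprint_in_buffer(street_buffer, buildings)
print(total_footprint)
```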

Road type definitions

Upon performing these post-processing operations, each street has seven attributes: road type, length, ROW, surface slope, flow direction, land cover type, and building footprints' area. Once all of the input data are ready, we consider four weighting factors for IWBC: road type, LULC, building footprint area, and stormwater pipe flow capacity, as shown in Table

Ranges and scores of the weighting factors.

We group each weighting factor into five classes and assign a provisional weight to each class. We assign a higher provisional weight to a class of lower importance and vice versa. Recall that, for any edge in a weighted network, a higher weight implies a lower probability of being included in the shortest path and, thus, a smaller IWBC value.

The weighting factors in Table

We derive

Compute the provisional BC values (

Assign a suitable pipe size to each street in the network based on the

Set the pipe slopes based on the obtained street slopes and the permissible slope ranges corresponding to each pipe size given in Table

Compute the arithmetic mean of the four weighting factors and determine the corrected BC values for the network (i.e., the final IWBC values).

Slope range based on storm drain pipe size.

Model performance with drainage adequacy classifier index (DACI) values varying from 0.0 to 1.0 with a 0.05 interval for the four case studies.

Three different BUSNs estimated for the San Fernando Valley case with three different drainage adequacy classifier index (DACI) values, showing

Note that this predictor–corrector approach does not yet change the baseline street network topology. The obtained IWBC values are our basis for obtaining the derived BUSN by removing the relatively unimportant streets from the baseline network. One may begin by removing those edges with IWBC values below a threshold, as these edges represent less important streets that are less likely to require belowground stormwater pipes. Intuitively, by increasing the IWBC threshold, we remove more elements from the baseline street network; thus, the drainage capacity of the BUSN corresponding to the remaining part of the street network decreases. There is, nevertheless, a nonlinear relationship between the increase in the IWBC threshold and the decrease in the drainage capacity of the derived BUSN. This is due to two reasons:

In most street networks, the numbers of edges associated with lower IWBC values are nonlinearly larger than those with higher IWBC values. Our analysis shows that the lowest IWBC values have the highest frequency in the network. For example, Fig.

The edges with lower IWBC values correspond to the pipes with smaller diameters, and their removal has a smaller impact on the total BUSN's drainage capacity compared with removing those edges corresponding to pipes with larger diameters.

Results for the Los Angeles case. Panel

Summary of real BUSN and street network data.

“No. of types” means the number of element type categories in a database (e.g., trunk and culvert), and “comm.” stands for network communities.

Therefore, IWBC cannot be directly used to guide this removal operation. We carry out this operation in two steps:

We use the first out of 10 classes based on the Fisher–Jenks method (FJ1) to identify the group of streets with the lowest IWBC values. FJ1 has the highest edge count and the lowest within-class variance in IWBC values.

Considering the small variance of IWBC values of the streets in FJ1, the quantile method is suitable for categorizing the streets based on their IWBC values. Thus, we use the quantile method as an indicator for removing edges from the baseline network.
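A quantile-based removal within the lowest-IWBC class can be sketched as follows. This Python example is only an illustration under stated assumptions: it takes as given that the streets have already been assigned to FJ1 (the Fisher–Jenks classification itself is not shown), and the fraction `q`, the function name `remove_lowest_fraction`, and the IWBC values are hypothetical.

```python
def remove_lowest_fraction(iwbc_by_street, q):
    """Drop the fraction q (0..1) of streets with the smallest IWBC values.

    Assumption: q acts as a quantile cut-off within the given class.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie between 0 and 1")
    ranked = sorted(iwbc_by_street, key=iwbc_by_street.get)
    n_remove = int(q * len(ranked))
    removed = set(ranked[:n_remove])
    return {s: v for s, v in iwbc_by_street.items() if s not in removed}


# Hypothetical IWBC values for five streets in the lowest class (FJ1).
fj1 = {"s1": 0.001, "s2": 0.004, "s3": 0.002, "s4": 0.010, "s5": 0.003}
kept_streets = remove_lowest_fraction(fj1, q=0.4)
print(sorted(kept_streets))
```

With `q = 0.0` no street is removed, and with `q = 1.0` the whole class is removed, so the cut-off spans the full range between keeping and discarding the least important streets.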

In this study, we use the DACI as an empirical parameter. For a case study where real BUSN data are partially available, we increase the DACI value until there is no significant increase in the average

Results for the Houston case. Panel

Results for the Seattle case. Panel

We choose the following four major cities in the US as the case studies to demonstrate the algorithm (Fig.

Interestingly, the real BUSN and street network data show different characteristics among these case studies. The street network structure in the Baltimore and Seattle cases is quite different from that in the Houston and Los Angeles cases. In graph theory, a community refers to a group of nodes (street intersections) in a network where the density of the connections among them is higher than in the rest of the network. In a street network, a community can be analogous to an urban cluster. Table

We retrieve the input data for the four case studies following the procedure described in Sect.

Furthermore, although the input data for the proposed BUSN algorithm are generally available in the US, the data quality might vary among different categories of input data and different locations. For example, in the available real BUSN data, there are often some missing edges. As is evident from Fig.

For each of the case studies, we run the algorithm with DACI values varying from 0.0 to 1.0 at a 0.05 interval (Fig.

Figure

Figures

Results for the Baltimore case. Panel

Comparison of the overland slope of urbanized areas for the four case studies based on

The algorithm performs very well in the Los Angeles and Houston cases, with

In this study, we do not account for the construction limitations and difficulties that arise in BUSNs in hilly terrain; therefore, we expect poorer performance in such areas. We quantify the slope variability of urban areas based on the cumulative percent (CP) graph for slope, as shown in Fig.

Table

Summary of the model performance for the four cases.

This study presents a novel algorithm for estimating belowground urban stormwater networks based on graph theory concepts and publicly available information. Most of the procedure is automatic, except for one empirical parameter that is specified by the user. Inputs of the algorithm are mostly land surface data, such as street network, topography, land use/land cover, and building footprints, that are readily available to the public and cover at least the whole US. We successfully validated the topology of the derived BUSNs for four US cities on both the west and east coasts, with the average coverage percentage varying from 59 % to 76 %.

Although we developed our proposed framework based on publicly available datasets and design manuals in the US, it is flexible and can be adapted to other regions with different design criteria and data availability. Moreover, despite relying only on publicly available datasets, which are not the most accurate available datasets, the model showed satisfactory performance.

There are a few directions to further improve the algorithm, including but not limited to the following:

The quality and availability of input data for the algorithm can be further enhanced at the regional or larger scales (e.g., the street network data).

The DACI threshold for deriving BUSNs is an empirical, user-specified parameter in this study. Estimating it a priori based on the hydroclimatic conditions for any urban watershed can be achieved via a rigorous hydraulic analysis involving estimating peak runoff and adequately detailed BUSN hydraulic modeling.

We may generalize this DACI threshold parameter at the regional or larger scales based on the regional hydroclimate conditions (e.g., intensity, duration, and frequency of extreme rainfall and peak runoff). However, these improvements are beyond the scope of this study and are left for future work.

We may further expand our algorithm to account for drainage catchments in urban areas and break down the derived BUSNs into several subnetworks that follow the catchments.

The BUSN algorithm in this study is designed for separate sewer systems. Considering that combined sewer systems have different design criteria, the applicability of the algorithm for such systems requires further research.

Ultimately, our proposed algorithm for estimating BUSNs is a valuable tool to support the parameterization of large-scale urban hydrologic modeling, particularly in the areas where BUSN data are not available. It may also provide decision support in regional-scale urban planning from the angle of stormwater and flood management.

The source code and the data generated from this study are available from the corresponding author upon reasonable request.

TC and HYL conceived the idea. TC designed and implemented the algorithm, and performed the analyses with inputs from HYL. Both authors contributed to writing the paper.

Hong-Yi Li acknowledges his financial interest in Pythias Analytics regarding the support from the Alfred P. Sloan Foundation.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Taher Chegini was supported by the University of Houston's internal funds and the Alfred P. Sloan Foundation via the Houston Advanced Research Center (grant no. UH0421). Hong-Yi Li was also supported by the US Department of Energy Office of Science Biological and Environmental Research as part of the Earth System Model Development program area through the collaborative, multi-program Integrated Coastal Modeling (ICoM) project.

This research has been supported by the Alfred P. Sloan Foundation (grant no. UH0421).

This paper was edited by Fuqiang Tian and reviewed by three anonymous referees.