Conventional rainfall frequency analysis faces several limitations. These include difficulty incorporating relevant atmospheric variables beyond precipitation and limited ability to depict the frequency of rainfall over large areas that is relevant for flooding. This study proposes a storm-based model of extreme precipitation frequency based on the atmospheric water balance equation. We developed a storm tracking and regional characterization (STARCH) method to identify precipitation systems in space and time from hourly ERA5 precipitation fields over the contiguous United States from 1951 to 2020. Extreme “storm catalogs” were created by selecting annual maximum storms with specific areas and durations over a chosen region. The annual maximum storm precipitation was then modeled via multivariate distributions of atmospheric water balance components using vine copula models. We applied this approach to estimate precipitation average recurrence intervals for storm areas from 5000 to 100 000 km
The probability of extreme rainfall is of great interest and importance in flood risk estimation and management (e.g., Koutsoyiannis et al., 1998; Langousis et al., 2009; Nerantzaki and Papalexiou, 2022; Troutman and Karlinger, 2003). Standard practice is to fit a univariate probability distribution to either the largest rainfall observations each year (an annual maxima series) or the rainfall values that exceed a high threshold (a peaks-over-threshold or partial duration series) (Coles, 2001; Madsen et al., 1997; Miniussi et al., 2020). In either case, the rainfall series corresponds to a given duration (e.g., 1 or 24 h) and spatial scale. The latter is usually the sampling orifice of a rain gauge (roughly 0.1 m
First, relying exclusively on rainfall observations – as opposed to using atmospheric and land surface processes and variables that contribute to rainfall – can preclude knowledge and measurements that could be informative for precipitation frequency analysis (Katz et al., 2002; Klemeš, 1993). This contrasts with techniques that estimate probable maximum precipitation – widely used in flood hazard analyses for major dams and nuclear facilities – which for decades have considered concepts and measurements of atmospheric water vapor storage and transport (e.g., Chen and Bradley, 2006; Rakhecha and Clark, 1999; Rousseau et al., 2014; World Meteorological Organization, 2009), and have recently expanded to dynamic atmospheric simulations (e.g., Alaya et al., 2018; Lee and Kim, 2018; Toride et al., 2019). To overcome this limitation, some recent rainfall frequency studies have attempted to use rainfall-related variables (e.g., sea surface temperature and dew point temperature) and changes in large-scale weather systems as predictors of rainfall frequency and its changes (see, e.g., Kunkel et al., 2020; Roderick et al., 2020a).
Second, while a single rain gauge can provide local observations that lead to ready-to-use quantile estimates (e.g., for flood hazard modeling), it severely restricts the number of extremes and cannot represent areal maxima over a large region. This is because much of the instrumental record at a gauge consists of local and smaller events, which have limited value for understanding rarer and more extreme storms at large scales (e.g., Durrans et al., 2002; Steiner et al., 1999; Svensson and Jones, 2010). While this shortcoming can be ameliorated using regionalization techniques that utilize multiple nearby gauges (e.g., Dawdy et al., 2012; Schaefer, 1990), these gauge-based analyses still struggle to represent precipitation areal maxima due to rain gauges' limited sampling area (Matsoukas et al., 1999; Villarini et al., 2008) and the complexity of storm spatiotemporal extents (Efstratiadis et al., 2014; Krajewski, 1987).
In this study, we present an alternative approach for rainfall frequency
analysis that addresses these two limitations. Two key features of the
approach are highlighted here:
First, our approach integrates a more physically detailed (albeit still highly simplified) rainfall-producing process. Specifically, we consider the atmospheric water balance equation, in which the change of water vapor storage within a control volume is balanced by water fluxes into and out of it – namely precipitation, evapotranspiration, and water vapor flux convergence (Bradbury, 1957; Banacos and Schultz, 2005; Su and Smith, 2021). Due to mass conservation, precipitation can be modeled as a combination of the remaining components that jointly form a multivariate distribution (Alaya et al., 2020; Klemeš, 1993; Gao et al., 2005). Previous multivariate modeling of precipitation frequency can be seen in De Michele and Salvadori (2003), Jun et al. (2017), and Salvadori and De Michele (2007), but these focused on joint modeling of “precipitation dimensions” (e.g., rainfall intensity, volume, and duration) rather than the atmospheric water balance. Here, we use vine copulas to represent this multivariate distribution by decomposing it into bivariate dependence structures and marginal distributions (Aas et al., 2009). Applications of vine copula models have been seen in risk analysis (Bevacqua et al., 2017; Sarhadi et al., 2018; Xiong et al., 2014), rainfall simulation (Gyasi-Agyei and Melching, 2012; Vernieuwe et al., 2015), and streamflow modeling (Pereira and Veiga, 2018), but to the best of our knowledge, no study has applied vine copulas to extreme rainfall frequency analysis.
Second, our approach is “storm-centered” (Chang et al., 2016; Li et al., 2020; National Research Council, 1988, 1994) rather than gauge-centered. We use storm tracking methods to identify and follow two-dimensional rainfall systems (i.e., storm objects) in space and time within a land–atmosphere reanalysis dataset. These storm objects, particularly their high-rainfall areas, are treated as control volumes for computing the atmospheric water balance used in the multivariate modeling.
Our “storm-centered” approach has three main properties. (1) Unlike gauge-centered approaches (Restrepo-Posada and Eagleson, 1982), we can identify all major storms over a region. (2) We can examine precipitation frequency and drivers over user-defined areas, rather than over the spatial scale of a rain gauge orifice. This further allows us to derive “storm-centered” depth–area–duration (DAD) relationships and average recurrence intervals (ARIs) over a region. Specifically, based on areal precipitation extracted from storm objects, we can characterize how precipitation estimates change with storm area, given a certain ARI and storm duration. (3) The ARIs estimated by our approach represent the frequency of extreme storms within a chosen region. Such ARIs should be interpreted differently from those in traditional gauge-based analysis, which represent local precipitation frequencies. The last two properties are discussed in more detail in Sect. 5. Previous “storm-centered” studies have investigated precipitation properties based on observations and model simulations (e.g., Chang et al., 2016; Davis et al., 2006; Hoskins and Hodges, 2002; Pérez-Alarcón et al., 2022; Shaw et al., 2016). Though it has long been recognized that storm-centered methods hold some advantages over gauge-based methods in rainfall frequency analysis (e.g., National Research Council, 1994), relatively few studies have explored the topic (see Wright et al., 2020 for a recent review and discussion).
Descriptions of used ERA5 variables.
The objective of this study is to develop an alternative approach for extreme precipitation frequency analysis by integrating storm tracking with a multivariate vine copula model of the atmospheric water balance. The approach allows an explicit representation of the dependencies between rainfall-contributing components. It also highlights some of the strengths and limitations of using reanalysis and other atmospheric simulations to study extreme rainfall and its drivers. We use the approach to investigate the frequencies and characteristics of major storms in the Mississippi Basin and its five major subbasins. The remainder of the paper is organized as follows: Sect. 2 describes the study basin and datasets. Section 3 details the proposed methods for storm identification and multivariate modeling. Results are shown in Sect. 4, followed by discussion in Sect. 5. Conclusions are provided in Sect. 6.
The study site is the Mississippi River Basin, located in the central United
States with a drainage area of over 3 220 000 km
Mississippi Basin, including (1) Arkansas-Red, (2) Missouri, (3) Upper Mississippi, (4) Ohio-Tennessee, and (5) Lower Mississippi subbasins. Blue shading denotes the domain over which storm tracking was
applied (24–52
ERA5 Reanalysis was used to identify storms and to compute atmospheric moisture components. This dataset, produced by the European Centre for Medium-Range Weather Forecasts (ECMWF), provides hourly estimates of
global climate variables on a 0.25
In addition, Integrated Multi-satellite Retrievals for Global Precipitation
Measurement (IMERG) data from 2001 to 2019 were used to validate storm tracking results. IMERG surface precipitation rates are estimated globally on a 30 min 0.1
Process of the storm tracking and regional characterization method (STARCH).
To identify and analyze regional extreme storm events in the Mississippi
Basin, we developed the storm tracking and regional characterization method
(STARCH, publicly available at
Step 1 – storm identification: individual storm objects within a precipitation field were identified at a single time step. First, an initial threshold of 0.5 mm h
Step 2 – storm tracking: in this step, storms were tracked across time steps using the overlapping ratio method (NCAR, 2019). For storm
Step 3 – area–duration selection: the third step is to extract extreme precipitation events of desired duration
Step 4 – storm catalog generation: the final step is to quantify storm characteristics and build annual maximum “storm catalogs”. For each storm selected in Step 3, we computed the storm area, duration, centroid speed, bearing (direction of centroid movement, measured clockwise from north), and atmospheric water balance components (described below in Sect. 3.2). The annual maximum storm catalogs were created by collecting the largest storm each year with area
STARCH was applied to ERA5 from 1951 to 2020 and IMERG from 2001 to 2019 using the parameters from Table 2. Since the precipitation patterns in IMERG
are somewhat more scattered than those in ERA5, we increased the morphing
structure radius and reduced the precipitation thresholds to avoid overly
isolated storm regions. The same overlapping ratio threshold was used for
both datasets. Afterward, the area–duration selection was applied to ERA5
storms for the five subbasins and the whole Mississippi Basin. Values of
storm area
Parameter settings used in STARCH.
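The storm identification and tracking steps of STARCH (Steps 1 and 2) can be illustrated with a minimal sketch. This is not the STARCH implementation: the 4-connected labeling, the definition of the overlap ratio (shared cells divided by the size of the smaller object), the `min_ratio` value, and all function names are assumptions made for illustration, and the morphological processing mentioned above is omitted. Only the 0.5 mm per hour threshold is taken from the text.

```python
from collections import deque


def label_storms(field, threshold=0.5):
    """Label 4-connected regions of cells with precipitation >= threshold.

    Returns a list of storm objects, each a set of (row, col) cells.
    """
    n_rows, n_cols = len(field), len(field[0])
    seen = set()
    storms = []
    for r in range(n_rows):
        for c in range(n_cols):
            if (r, c) in seen or field[r][c] < threshold:
                continue
            # Breadth-first flood fill starting from this wet cell.
            cells, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                i, j = queue.popleft()
                cells.add((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < n_rows and 0 <= nj < n_cols
                            and (ni, nj) not in seen
                            and field[ni][nj] >= threshold):
                        seen.add((ni, nj))
                        queue.append((ni, nj))
            storms.append(cells)
    return storms


def overlap_ratio(a, b):
    """Shared cells divided by the size of the smaller object (assumed definition)."""
    return len(a & b) / min(len(a), len(b))


def match_storms(prev, curr, min_ratio=0.3):
    """Link each current storm to the previous storm it overlaps most.

    Returns (current_index, previous_index or None) pairs; None marks a
    storm with no overlap of at least min_ratio (a hypothetical threshold).
    """
    links = []
    for k, storm in enumerate(curr):
        best, best_ratio = None, min_ratio
        for p, old in enumerate(prev):
            ratio = overlap_ratio(storm, old)
            if ratio >= best_ratio:
                best, best_ratio = p, ratio
        links.append((k, best))
    return links


# Two consecutive hourly precipitation fields (mm per hour), made-up values.
t0 = [[0.0, 1.2, 1.5, 0.0, 0.0],
      [0.0, 0.9, 2.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.7]]
t1 = [[0.0, 0.0, 1.1, 1.4, 0.0],
      [0.0, 0.0, 1.8, 0.6, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0]]
storms_t0 = label_storms(t0)
storms_t1 = label_storms(t1)
links = match_storms(storms_t0, storms_t1)
```

In this toy example the single storm at the second time step shares half of its cells with the larger storm at the first time step, so the two are linked into one track, while the isolated wet cell at the first time step is not continued.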
This study treats each storm area identified by STARCH as a control volume
for computing a vertically integrated atmospheric water balance (Su and Smith, 2021), which can be written as
All water balance components
This section describes a multivariate vine copula model to estimate the
frequency of extreme precipitation based on other atmospheric water balance
components. According to Eq. (2), precipitation can be calculated as the sum
of the evapotranspiration, water vapor flux divergence, and the time derivative of total precipitable water. In principle, the water balance is
closed by mass conservation. However, due to data assimilation and
differencing schemes, ERA5 does not guarantee water balance closure (see
Sect. 5.1 for further discussion). Therefore, a residual error term was
introduced to Eq. (2):
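The equations referred to here are not preserved in this text. As a hedged reconstruction from the verbal description only, the vertically integrated balance and its residual-augmented form might be written as below. The symbols are assumptions rather than the authors' notation: $P$ is precipitation, $E$ evapotranspiration, $W$ total precipitable water, $\mathbf{Q}$ the vertically integrated water vapor flux, and $\varepsilon$ the residual.

```latex
\begin{align}
  \frac{\partial W}{\partial t} + \nabla \cdot \mathbf{Q} &= E - P, \\
  P &= E - \nabla \cdot \mathbf{Q} - \frac{\partial W}{\partial t} + \varepsilon .
\end{align}
```

The text describes precipitation as a sum of the components, which suggests that the signs of the divergence and time derivative terms are absorbed into how those terms are defined; the signs shown here follow the standard form of the balance.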
Treating the right-hand side terms in Eq. (6) as random variables,
precipitation can be represented by a multivariate distribution
We used the R package “VineCopula” to sequentially select vine structures
and estimate copula parameters (Nagler et al., 2021). First, the tree structure at the top level (
Comparison of storm tracking results based on ERA5
For each annual maximum storm catalog described in Sect. 3.1 (144 cases in
total), we fitted an individual vine model to the water balance components
We also computed relative root mean square error (RRMSE) to measure
discrepancies between simulated and reference precipitation at the same
recurrence intervals:
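The RRMSE formula itself is not preserved in this text. The sketch below uses one common definition, the root mean square of the errors relative to the reference values, which is an assumption and may differ from the one used in the study. The function name and sample values are hypothetical.

```python
import math


def rrmse(simulated, reference):
    """Relative RMSE: sqrt(mean(((sim - ref) / ref) ** 2)) (assumed definition)."""
    if len(simulated) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [((s - r) / r) ** 2 for s, r in zip(simulated, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up precipitation values at three matching recurrence intervals.
reference = [10.0, 20.0, 40.0]
simulated = [11.0, 19.0, 42.0]
error = rrmse(simulated, reference)  # about 0.071, i.e., roughly 7 %
```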
In this section, STARCH storm tracking results based on ERA5 were compared with those from IMERG. This comparison is intended to validate the basic spatiotemporal properties of ERA5-simulated storm systems against observation-based IMERG results. This was done via visual inspection of storm tracks and comparing storm characteristics between the two datasets for the 2001–2019 period. An example over 9 h is shown in Fig. 4. Precipitation regions in ERA5 are generally more contiguous in space than in IMERG. Despite some local differences, storm patterns are generally similar between the two datasets. This is confirmed by the large-sample comparison of the storm characteristics shown in Table 3. The average area (maximum precipitation intensity) of ERA5 storms is 7 % (2 %) higher than IMERG storms, while the average speed (bearing) differs by 4 % (7 %). Discrepancies in storm duration and number are slightly larger; storms in ERA5 last 26 % longer than those in IMERG, while the number of storms per hour is about 20 % smaller. These differences are attributable to the intermittent precipitation regions in IMERG, where the scattered precipitation grids around the storm body are likely to be identified as individual storms with a shorter duration. In summary, despite some discrepancies, ERA5 storm object properties are roughly consistent with a satellite-based observational dataset, lending support to subsequent analyses that rely on ERA5-based objects.
Comparison of average storm characteristics based on the ERA5 and IMERG datasets from 2001 to 2019.
Q–Q plots of standardized atmospheric water balance components between vine copula simulations and reference data from the storm catalog in the Arkansas-Red Basin with a storm area of 25 000 km
Goodness-of-fit of the vine copula models was assessed by comparing model
simulations with reference data from the annual maximum storm catalogs.
Quantile–quantile (Q–Q) plots were used to compare water balance components
from reference data against simulated values at the same quantile. The
simulations here were generated from 100 bootstrap realizations with a
sample size of 70. By examining the Q–Q plots of all 144 cases, we found
that the points of reference values and mean simulations fall approximately
on the diagonal line, indicating a satisfactory model fit (see Fig. 5 for
one example). The good fit is also supported by Pearson's correlation
coefficients between sorted reference data and mean simulations. The mean
correlation coefficient is 0.995 for
We compared copula-simulated precipitation annual maxima against the reference ERA5 precipitation annual maxima from each storm catalog. The ARI of the reference and simulated precipitation was estimated using the Cunnane plotting position. An example for one storm catalog is shown in Fig. 6; a generalized extreme value (GEV) distribution fitted to the ERA5 precipitation annual maxima using L-moments is also shown. Both the univariate GEV and vine copula models can describe the distribution of reference precipitation, with copula estimates slightly exceeding those of the GEV, and tracking the reference more closely, for recurrence intervals above 10 years. By examining all 144 storm catalogs, we found that both vine and GEV models agree well with the empirical distributions of the reference data. Good agreement between the model and reference data is also supported by small RRMSEs with a mean of 3 % and a standard deviation of 1 %, indicating that the vine model can depict the dependence structures between atmospheric water balance components to generate realistic precipitation ARIs.
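The empirical ARIs mentioned above can be sketched as follows. The Cunnane plotting position assigns the value of rank i (in ascending order) out of n a non-exceedance probability of (i - 0.4)/(n + 0.2), and the ARI of an annual maximum is the reciprocal of its exceedance probability. The function name and sample values are hypothetical.

```python
def cunnane_ari(annual_maxima):
    """Return (value, ARI in years) pairs sorted from smallest to largest."""
    n = len(annual_maxima)
    pairs = []
    for rank, value in enumerate(sorted(annual_maxima), start=1):
        # Cunnane plotting position for the non-exceedance probability.
        non_exceedance = (rank - 0.4) / (n + 0.2)
        pairs.append((value, 1.0 / (1.0 - non_exceedance)))
    return pairs


# Made-up annual maximum precipitation rates (mm per hour).
sample = [12.1, 30.4, 18.7, 22.9, 15.3]
result = cunnane_ari(sample)
```

With a 70-year catalog, the largest annual maximum plots at an ARI of 70.2/0.6 = 117 years under this formula, so estimates for rarer events rest on the fitted model rather than on the empirical points.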
Comparison of estimated return levels of precipitation between vine copula and GEV models in the Arkansas-Red Basin with a storm area of 25 000 km
100-year precipitation rate return levels from the vine copula and univariate GEV models were calculated for all 144 storm catalogs (Fig. 7; results of 10-year and 500-year return levels are also shown in Figs. A1 and A2 in Appendix A, respectively). As expected, the precipitation rate decreases with increasing storm area and duration. The storm catalog for the entire Mississippi Basin yields the overall highest precipitation estimates, varying from 3.8 to 31.0 mm h
Estimated 100-year precipitation rate (mm h
100-year DAD relationships based on vine copula estimates for the Upper
Mississippi Basin are shown in Fig. 8a. At the 5000 km
Estimated DAD curves.
Our approach can characterize the areal distributions of storm precipitation
at different frequency levels. Figure 8b shows depth–area relationships for
24 h storms with ARIs ranging from 2 to 500 years in the Upper Mississippi Basin. The precipitation estimates are the highest at 5000 km
We also compared DAD relationships across subbasins. Figure 8c and d shows the precipitation estimates for 2 and 24 h storms with 100-year ARI in the six basins. Many of these largest storms are from the Lower Mississippi Basin. At the 2 h duration, the overall Mississippi Basin has depth estimates between 24 mm (100 000 km
Subbasin differences between the DAD curves are more substantial at the 24 h duration (Fig. 8d). Akin to the 2 h duration, Mississippi and Lower Mississippi Basins have similar curves and higher precipitation than the other subbasins. The 24 h duration curves for the Arkansas-Red and Ohio-Tennessee Basins are similar, while Missouri and Upper Mississippi Basins are quite close. Overall, the differences in DAD relationships indicate strong spatial heterogeneity of extreme storms inside the Mississippi Basin, presumably stemming from distance to significant moisture sources.
This section provides a deeper examination of ERA5 water balance components
used in the vine copula model.
Boxplots of the average water balance components. Points show the average of the water balance components from each storm catalog, with colors denoting storm durations.
The sum of
As mentioned in Sect. 3.3, the residual
In contrast to long-term water balances (Berrisford et al., 2011; Gutenstein et al., 2021), the evapotranspiration term for extreme storms is small, at around 3 % of precipitation (Fig. 9d). The mean evapotranspiration rate decreases from 0.12 to 0.09 mm h
Kendall's
The mean and standard deviation (in parentheses) of Kendall's
The absolute values of correlation coefficient
We further examined how correlations between water vapor components depend on storm area and duration (Fig. 10). The average correlation coefficient between the divergence and time derivative term increases with storm area (Fig. 10a), while exhibiting a more complex pattern for duration – increasing from 0.21 at 2 h to 0.24 at 24 h and then decreasing to 0.15 at 72 h (Fig. 10b). For the divergence and the residual terms, the mean correlation drops from 0.5 to 0.34 with increasing area and from 0.46 to 0.35 with increasing duration. The average correlation between the divergence and precipitation terms decreases from 0.69 to 0.58 with increasing area while rising from 0.49 at 2 h duration to 0.75 at 72 h duration. The average correlation between the time derivative and residual terms shows a modest reduction from 0.13 to 0.08 with increasing area, and a larger drop, from 0.29 to 0.17, with increasing duration. The average correlation between residual and precipitation terms drops from 0.29 to 0.17 with increasing area, while it increases from 0.18 at 2 h to 0.27 at 24 h and drops to 0.23 at 72 h duration. In short, there are complex relationships among water balance components that depend on storm spatiotemporal scale and cannot be ignored in any attempt at modeling their joint roles in extreme precipitation.
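For readers less familiar with rank correlation, a minimal sketch of Kendall's rank correlation between two water balance components is given below. It implements the simple tau-a variant, in which tied pairs count as neither concordant nor discordant; which variant the study used is not stated in this text, so that choice is an assumption. The sample values are made up.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical standardized values of two components for five storms.
divergence = [1.0, 2.0, 3.0, 4.0, 5.0]
residual = [2.0, 1.0, 4.0, 3.0, 5.0]
tau = kendall_tau(divergence, residual)  # 8 concordant, 2 discordant pairs
```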
The correlation analyses shown in Table 4 and Fig. 10 are imperfect
representations of the dependency structure between water balance components, since correlation coefficients distill bivariate distributions into a single number. The vine copula, on the other hand, can capture such structures. More detailed relationships can be seen by plotting the simulation results from the vine copula model. Figure 11 shows an example – the 25 000 km
Bivariate dependency structure of simulated atmospheric water balance components from the 25 000 km
The histograms along the diagonal show the marginal distributions of each water balance component used in the vine copula model; the divergence term's histogram is smooth due to the use of a GEV marginal distribution, while the other three components used empirical CDFs. We used empirical CDFs for these three variables to avoid the additional parameters and errors introduced by fitting parametric distributions; this is common practice in vine copula modeling. However, empirical CDFs constrain the simulated variables to the maximum of the original sample, leading to unrealistic upper-bounded tail behavior. Therefore, we fitted a GEV distribution to the dominant component (i.e., the divergence term) to allow the model to generate extreme precipitation that exceeds the original maximum. Note that it is feasible to fit parametric distributions to all atmospheric water balance components. The influence on the results is minor as long as the parametric distributions fit each component well. For example, the distribution of the residual term can be fitted by a t-distribution, while the time derivative and evapotranspiration terms can be fitted with beta distributions. Another advantage of using parametric distributions is that nonstationarities (e.g., changing location and scale) in each atmospheric water balance component can be modeled with distribution parameters that vary with time or climate indices (see Sect. 5.5).
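The difference between the two kinds of marginal distribution can be illustrated with a short sketch. The inverse empirical CDF can never return a value above the sample maximum, whereas the GEV quantile function extrapolates beyond it. The GEV parameters below are hypothetical and not fitted to any data from the study; the shape parameter follows the convention in which positive values give a heavier upper tail.

```python
import math


def gev_quantile(p, loc, scale, shape):
    """Quantile function of the GEV distribution for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if abs(shape) < 1e-12:
        # Gumbel limit as the shape parameter tends to zero.
        return loc - scale * math.log(-math.log(p))
    return loc + scale / shape * ((-math.log(p)) ** (-shape) - 1.0)


def empirical_quantile(sample, p):
    """Inverse empirical CDF: smallest sorted value whose ECDF reaches p."""
    ordered = sorted(sample)
    index = max(0, math.ceil(p * len(ordered)) - 1)
    return ordered[index]


# Made-up sample of a water balance component (mm per hour).
sample = [4.2, 5.1, 6.3, 7.8, 9.5]
bounded = empirical_quantile(sample, 0.999)            # capped at max(sample)
unbounded = gev_quantile(0.999, loc=5.5, scale=1.5, shape=0.1)
```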
The residual term
ERA5 Reanalysis is based on numerical weather forecasts combined with multiple observations and is subject to multiple error sources, including numerical model errors, observation errors, data assimilation (DA) errors, and spatiotemporal heterogeneity of data sources (Bosilovich et al., 2008; Nogueira, 2020). As a result, precipitation bias exists in ERA5 over the study region, which can influence our model estimates. Previous studies have shown discrepancies in precipitation climatology between ERA5 and observations over the CONUS. We performed an additional comparison with interpolated gauge-based precipitation fields, which supports the finding that ERA5 underestimates extreme precipitation in the Mississippi Basin (see Appendix B). The underestimation may be related to insufficiently strong water vapor flux divergence, given its primary role in extreme storms and high correlation with precipitation as described in Sect. 4.4. Also, the coarse spatial and temporal resolution of ERA5 may limit its ability to represent small-scale, short-lived convective storms that generate extreme precipitation (Beck et al., 2019; Ebert et al., 2007). Studies have also mentioned precipitation bias arising from orography smoothing and inadequate observations over mountainous areas (Essou et al., 2017; Jiao et al., 2021). Overall, one must be careful when analyzing extreme storms based on reanalysis, since the accuracy can vary greatly in space and time depending on topography, climate region, and the quantity and quality of assimilated observations (Ebert et al., 2007; Essou et al., 2017; Zhang et al., 2018).
Uncertainties also exist in atmospheric water balance components in ERA5
Reanalysis and can influence the precipitation estimates from the vine
copula model. Uniquely among reanalyses (to the best of our knowledge), ERA5
includes coarser-resolution (3 h, 0.5
Our “storm-centered” approach allows the estimation of precipitation frequency for a specific area and duration within a region (as shown in Sect. 4.3). This constitutes an alternative way to approach the long-standing task of deriving DAD relationships, i.e., describing how extreme precipitation depth varies with averaging area and duration, usually for the purpose of flood analysis (Alexander, 1963; USACE, 1945; Weather Bureau, 1946). It should be noted, however, that the derivation of DAD from individual storms has generally not been extended to recurrence intervals as is done here (e.g., in Fig. 8). The one notable exception that we are aware of is stochastic storm transposition (SST; Alexander, 1963; Foufoula-Georgiou, 1989; Wright et al., 2020). Our method – which can produce ARI estimates associated with DAD relationships – shares SST's focus on storm catalogs (Wright et al., 2013; Zhou et al., 2019). While it is not the primary focus of this study, we show the potential of the “storm-centered” idea to address the old question of DAD relationships, with the help of advancements in precipitation products and storm tracking methods.
Both DAD and our approach also share a connection to area reduction factors (ARFs), which are fractions between 0 and 1 that give the ratio of the average precipitation depth over an area to a point-scale precipitation depth for a fixed duration. DAD and ARF are closely related concepts, despite serving somewhat different purposes. ARFs are used to convert gauge-based (i.e., point-scale) precipitation frequency estimates to areal estimates of the same ARI and duration (e.g., Kao and Deneale, 2021; Miller, 1964; Olivera et al., 2008). Most ARF studies have tried to obtain such ratios using a “fixed-area” approach, i.e., by relating precipitation depth from point to area at a fixed location (e.g., Asquith and Famiglietti, 2000; Breinl et al., 2020; Durrans et al., 2002), though others have argued that storm-centered ARF approaches are more conceptually valid (Kim and Kang,
2017; Thorndahl et al., 2019; Wright et al., 2014). The ability to derive
storm-centered DAD relationships using our method can, in principle, obviate
the need for ARFs entirely, something that has been advocated previously
(Wright et al., 2014). To support this point, we compared vine copula DAD curves with those estimated by the ARFs from Kao et al. (2020) in the Ohio River Basin at 10-year ARI (see Appendix C and Fig. C1). The vine copula DAD estimates agree well with those ARF estimates for storm duration between 6 and 72 h, while for 2 h storms, the vine copula estimates are more conservative, i.e., the precipitation depth reduces much more slowly with increasing area. Such discrepancies may be attributable to the ARF estimation of Kao et al. (2020) being a “fixed-area” approach, i.e., the precipitation depth is compared to areal depth in a watershed. Nevertheless, the number of large watersheds in the Mississippi Basin (e.g., watersheds greater than 50 000 km
The ARI is conventionally interpreted as the expected time interval (in years) between events exceeding a certain magnitude at one specific location
(Coles, 2001; Serinaldi, 2015). In contrast, our “storm-centered” approach identifies all the storms at different locations within the basin to create annual maximum storm catalogs. As a result, the ARI estimates in our study describe the probability that an extreme storm will happen somewhere within a region (e.g., the Mississippi Basin or a subbasin), while the storm location is not specified. Examples of equivalent formulations of ARI can be found in Bosma et al. (2020) and Zhu et al. (2013). Moving from this formulation to the computation of ARIs at one specific site (e.g., a watershed) requires one to model the storm spatial arrival process, which describes the probability that the extreme storm “hits” the chosen site (e.g., Nathan et al., 2016; Wilson and Foufoula-Georgiou, 1990; Wright et al., 2020). Modeling the arrival process is challenging because the storm occurrence rate can vary greatly across a region due to inhomogeneous precipitation properties (e.g., Wilson and Foufoula-Georgiou, 1990; Wright et al., 2020; Yu et al., 2021). One simplification is to assume that storm position is independent of storm characteristics and that the storm center is equally likely to occur anywhere within the basin (Alexander, 1969; Foufoula-Georgiou, 1989). Then, the exceedance probability at a specific watershed within the basin can be written as
Changes in annual and extreme precipitation due to anthropogenic climate change in the Mississippi Basin have been observed and investigated in many studies (e.g., Karl and Knight, 1998; Groisman et al., 2004; Pan et al., 2016; Gori et al., 2022), with important implications for precipitation frequency estimates (e.g., Milly et al., 2008; Bender et al., 2014; Zscheischler et al., 2018). For multivariate distribution models (e.g., copulas), nonstationarities may exist in the marginal distributions of random variables or their relationships, i.e., dependency structures (Xu et al., 2020). Our approach can consider nonstationarities by using marginal distributions and dependence structures with parameters that vary as a function of, e.g., time, temperature, or climate indices in vine copulas (for examples, see Bender et al., 2014; Jiang et al., 2015; Sarhadi et al., 2018; Xu et al., 2020). This attribute gives our method more flexibility to analyze extreme storm frequency in a changing climate, especially to reveal nonstationary relationships between moisture components. Nevertheless, we expect high uncertainties when using nonstationary models due to statistical issues (Serinaldi and Kilsby, 2015) and a lack of clarity around precipitation change across gridded precipitation datasets (Mallakpour et al., 2022). Therefore, multi-model trend comparison and careful interpretation are necessary components for nonstationary modeling (Mallakpour et al., 2022), especially when using reanalysis that exhibits substantial inter-model variability (Alexander et al., 2020). As mentioned above, such modeling is potentially within reach of our framework but is beyond the scope of this study.
In this study, we present a storm-centered multivariate copula model based on
the atmospheric water balance equation to estimate the frequency of extreme
spatiotemporal precipitation. The model was applied to extreme precipitation
within the Mississippi Basin. Two-dimensional storm objects were identified
using the storm tracking and regional characterization (STARCH) method applied to ERA5 precipitation fields over the contiguous United States from
1951 to 2020. STARCH identified storm objects at each time step, then merged
them across time to track storms. Afterward, an area–duration selection
algorithm was used to extract the largest storms from each year with
specific areas and durations in order to create “storm catalogs”, each of
70 annual maxima. Selected areas were 5000, 25 000, 50 000, and 100 000 km
The annual maximum precipitation distribution was represented using a joint distribution of atmospheric water balance components: land surface evapotranspiration, water vapor flux divergence, the time derivative of water vapor storage, and a residual error term. The latter was included to account for imperfect water balance closure in ERA5. Within each storm catalog, water balance components from ERA5 were computed for each annual maximum storm object and fitted to a multivariate vine copula model. This model used a GEV marginal distribution for the divergence term and empirical distributions for the other components. The fitted model was then used to simulate samples of water balance components to calculate the associated precipitation via the water balance equation. The frequency of annual precipitation maxima was then estimated nonparametrically based on these simulations. The following conclusions can be drawn:
It is feasible to generate plausible extreme precipitation estimates from a vine copula model that incorporates additional physical information from the atmospheric water balance. Good fits were found based on Q–Q plots and comparisons of estimated ARI against those from more conventional univariate GEV distributions fitted to ERA5 precipitation. This indicates that the copula approach can represent the complex dependence structures between water balance components. Percentage differences between the vine copula and univariate GEV models were on average less than 4 % for the 100-year ARI but increased for rarer quantiles.
Among water balance components, water vapor flux divergence is the dominant term in extreme precipitation. Land surface evapotranspiration plays the smallest role, constituting about 3 % of precipitation. The sum of the time derivative of precipitable water and the residual term constitutes an average of 17 % of precipitation. Nonlinear dependencies among water balance components vary with storm region, area, and duration; these cannot be neglected when modeling their joint roles in extreme precipitation.
Prior studies have shown that, due to data assimilation and numerical methods, the atmospheric water vapor mass in ERA5 is not perfectly conserved. Despite this, we found relatively good atmospheric water balance closure for extreme events in the Mississippi Basin, with the residual on average constituting about 10 % of precipitation. To our knowledge, this is the first study that examines atmospheric water balance closure for precipitation extremes in reanalysis data. The residual term is the greatest in the 5000 km
Despite advances in numerical and data assimilation schemes, ERA5 is still subject to model bias; it tends to underestimate extreme precipitation over the Mississippi Basin. Given the dominant role of water vapor flux divergence in precipitation extremes, this bias may be attributable to inadequate simulation of that term. Coarse spatiotemporal resolution and limited observations for assimilation over mountainous areas can also contribute to precipitation biases.
Overall, rainfall frequency analysis can benefit from utilizing additional information from atmospheric and land surface processes. Compared with the conventional approach, the vine copula model allows an explicit representation of the dependence structures between atmospheric water balance components and enables us to investigate the main driver of extreme precipitation. The model does not need an explicit parametric form for the tail of the precipitation distribution, as it is determined by the tails of the moisture components and their dependence structures. This dependence structure can also serve as a constraint that prevents unrealistically large estimates (Bevacqua et al., 2017). Though not explored here due to the complications in interpreting the results (e.g., nonstationary ARIs), the approach is also able to accommodate nonstationary conditions by incorporating marginal distributions and dependence structures that vary with time or according to other climate predictors (e.g., temperature or climate indices; see Bender et al., 2014; Sarhadi et al., 2018; Xu et al., 2020).
Our “storm-centered” approach enables us to focus on the major extreme storms over a region, notwithstanding ERA5's tendency to underestimate those extremes. This feature contrasts with more typical gauge-based analyses that may restrict the number of extreme events due to a limited sampling area. By preserving the spatiotemporal structure of the storms, we can investigate extreme precipitation frequency in user-defined areas. Such areal precipitation estimates demonstrate the potential of using the “storm-centered” idea to derive DAD relationships for storms with different ARIs, with the help of long-term reanalysis products and storm tracking techniques.
The ARIs estimated by our approach represent the frequencies of extreme storms that happen over the entire region. To obtain precipitation ARIs for one specific site within that basin, an additional storm arrival process (i.e., the probability that the storm “hits” the site) is needed to modify the ARI estimates. Such arrival processes can be realized using statistical models or Monte Carlo simulations and will be a direction of future study. Despite evident biases in extreme precipitation, reasonable atmospheric water balance closure can be found in ERA5, lending confidence that it and similar reanalyses can represent reasonable water vapor interactions (Brown and Kummerow, 2014), which can help to understand the mechanisms of extreme events. In the longer run, the performance of this approach can be expected to benefit from further developments in numerical weather simulation and data assimilation.
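The last two steps of the procedure summarized above (computing precipitation from simulated water balance components, then estimating ARIs nonparametrically) can be illustrated with a minimal Python sketch. It is not the STARCH or vine copula code used in this study. The function names, the sign convention P = E − ∇·Q − dW/dt + residual, and the synthetic component magnitudes are all assumptions made for illustration, and the components are drawn independently here, whereas the study samples them jointly from a fitted vine copula.

```python
import random


def precipitation_from_components(evap, divergence, storage_change, residual):
    """Storm precipitation from water balance components (all in mm).

    Assumed sign convention: P = E - div(Q) - dW/dt + residual,
    so moisture convergence (negative divergence) adds to precipitation.
    """
    return evap - divergence - storage_change + residual


def empirical_ari_quantile(annual_maxima, ari_years):
    """Empirical quantile with non-exceedance probability 1 - 1/ARI.

    Uses linear interpolation between order statistics; no parametric
    tail is fitted.
    """
    if not annual_maxima:
        raise ValueError("annual_maxima must not be empty")
    if ari_years <= 1:
        raise ValueError("ARI must exceed 1 year for annual maxima")
    ordered = sorted(annual_maxima)
    prob = 1.0 - 1.0 / ari_years
    position = prob * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


def simulate_annual_maxima(n_years, seed=0):
    """Draw synthetic components and convert them to precipitation.

    Hypothetical stand-in for sampling from a fitted vine copula: the
    draws below are independent and their magnitudes are invented.
    """
    rng = random.Random(seed)
    simulated = []
    for _ in range(n_years):
        evap = rng.uniform(1.0, 5.0)
        divergence = -rng.gauss(80.0, 15.0)  # convergence is negative divergence
        storage_change = rng.gauss(-10.0, 5.0)
        residual = rng.gauss(8.0, 4.0)
        simulated.append(
            precipitation_from_components(evap, divergence, storage_change, residual)
        )
    return simulated


if __name__ == "__main__":
    sims = simulate_annual_maxima(10000)
    for ari in (10, 100, 500):
        print(ari, round(empirical_ari_quantile(sims, ari), 1))
```

Because the quantile is read directly from the simulated sample, the number of simulations must be large relative to the target ARI (for example, far more than 500 draws for a 500-year estimate) for the estimate to be stable.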
Estimated 10-year precipitation from vine copula and univariate GEV models. GEV estimates are in parentheses.
Estimated 500-year precipitation from vine copula and univariate GEV models. GEV estimates are in parentheses.
To better assess the bias in ERA5 extreme precipitation, here we provide a
comparison between ERA5 Reanalysis and a gauge-based product, the Gridded
5 km GHCN-Daily Temperature and Precipitation Dataset (nClimGrid, Vose et al., 2014), which provides daily precipitation on a 0.05
Comparison of basin-averaged 99th percentile daily precipitation between ERA5 and nClimGrid from 1951 to 2020. The percentage difference is computed by (nClimGrid-ERA5)/nClimGrid
99th percentile daily precipitation during 1951–2020 for
The vine copula DAD curve was compared against the ARFs in Kao et al. (2020). The ARFs were estimated for 10-year precipitation with durations of 2–72 h and areas of 10–100 000 km
Comparison of DAD curves estimated from vine copulas (solid markers) and ARFs from Kao et al. (2020) (empty markers) for the Ohio River Basin with 10-year ARI and 2–72 h durations.
The STARCH code is available at Zenodo:
ERA5 Reanalysis data can be downloaded from ECMWF Climate Data Store (
YL and DBW worked together to set up the study idea and perform the modeling analysis. YL developed the code for the method in the study. YL wrote the paper with contributions from both authors.
The contact author has declared that neither of the authors has any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Yuan Liu's and Daniel B. Wright's contributions were supported by the US National Science Foundation (NSF) Hydrologic Sciences Program (award number 1749638).
This research has been supported by the National Science Foundation (grant no. 1749638).
This paper was edited by Efrat Morin and reviewed by Geoff Pegram and one anonymous referee.