Assessing the relationship between the intensity, duration, and frequency (IDF) of extreme precipitation is required for the design of water management systems. However, when modeling sub-daily precipitation extremes, commonly only short observational time series are available. This problem can be overcome by applying the duration-dependent formulation of the generalized extreme value (GEV) distribution, which fits an IDF model across a range of durations simultaneously. The originally proposed duration-dependent GEV model exhibits a power-law-like behavior of the quantiles and accounts for
a deviation from this scaling relation (curvature) for sub-hourly durations

The number of heavy precipitation events has increased significantly in Europe

The definition of precipitation extremes is based on the occurrence probability and is quantified using quantiles (return levels) and associated
occurrence probabilities, often expressed as return periods in a stationary interpretation. Quantitative estimations of quantiles and associated
probabilities mostly follow one of two popular methods, namely (1) block maxima and their description with the generalized extreme value (GEV)
distribution – a heavy-tailed and asymmetric distribution – or (2) threshold exceedances and a description with the generalized Pareto distribution

One way to describe the characteristics of extremes for various durations (timescales) is through intensity–duration–frequency (IDF) curves, which relate extreme precipitation intensities to their duration (timescale) and their frequency (occurrence probability or average return period). These relations have been known since the mid-20th century

Historically, a set of GEV distributions is sought individually for a set of durations (e.g., 5

It is widely agreed that precipitation intensities for given exceedance probabilities follow a power-law-like function (scaling) across duration

The commonly used variant of the d-GEV with five parameters

In this study, we compare different ways to parameterize IDF curves, including the features of multiscaling and duration offset. In addition, we present a
new d-GEV parameter, the intensity offset, which accounts for the deviation from the power law and the flattening of IDF curves for long durations. To our
knowledge, this comprehensive analysis of different features has not been conducted before. Section

We use precipitation measurements in an area in and around the catchment of the river Wupper in western Germany. In order to compare different models
for the d-GEV, we use the quantile skill index (QSI) introduced in

Precipitation sums for the minute, hour, and day are provided by Wupperverband and the German Meteorological Service. Rain gauges are located in and around the catchment of the river Wupper in North Rhine–Westphalia, Germany. In total, 115 stations are used. Data from two measuring devices located less than 250 m apart are combined into one station in order to obtain a longer time series, resulting in a total of 92 grouped stations. Where measurement series are grouped, both instruments commonly provide measurements for a certain overlapping period. When merging, we therefore use only the observations with the higher measuring frequency for the analysis. For example, when combining two time series with hourly and daily data, respectively, the time series of all aggregation levels (durations) are obtained from the hourly data for the overlapping period. This choice is made because 24

Years with more than 10 % of missing values are disregarded. Some years contain measurement artifacts, where identical rainfall values were
repeated over several time steps. After consulting the data maintainers, these years are removed before the analysis. The data exhibit heterogeneity
in terms of the temporal frequency and the length of the resulting time series. Figure

Time series for different durations are obtained from an accumulation over a sample of durations as follows:

The set of durations

Minute measurements might be less accurate when only a small number of raindrops is recorded and the measured intensity is affected by sampling uncertainty. However, for events that are identified as annual maxima, we expect the rain amount to be large enough that the higher sampling uncertainty, compared to sums accumulated over longer periods, can be neglected.
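The accumulation over durations and the extraction of annual maxima described above can be illustrated with a minimal sketch. It assumes that the aggregated series is the mean intensity over a moving window of d consecutive time steps and that each window is assigned to the year of its last time step; the function names `moving_mean` and `annual_maxima` are hypothetical and are not taken from the software used in this study.

```python
from collections import defaultdict


def moving_mean(values, d):
    """Mean intensity over every window of d consecutive time steps."""
    if d < 1 or d > len(values):
        return []
    window_sum = sum(values[:d])
    means = [window_sum / d]
    for i in range(d, len(values)):
        # Slide the window by one step: add the new value, drop the oldest.
        window_sum += values[i] - values[i - d]
        means.append(window_sum / d)
    return means


def annual_maxima(years, values, d):
    """Largest d-step mean intensity per year.

    Assumption: a window belongs to the year of its last time step.
    Precipitation is non-negative, so 0.0 is a safe starting maximum.
    """
    maxima = defaultdict(float)
    for offset, mean in enumerate(moving_mean(values, d)):
        year = years[offset + d - 1]
        maxima[year] = max(maxima[year], mean)
    return dict(maxima)


if __name__ == "__main__":
    years = [2000, 2000, 2001, 2001]
    intensities = [0.0, 1.0, 4.0, 3.0]
    print(annual_maxima(years, intensities, d=2))
```

Repeating the call for every duration in the chosen set yields one series of annual maxima per duration, which is the input needed for fitting a duration-dependent model.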

One of the most prominent ideas of extreme value statistics is based on the Fisher–Tippett–Gnedenko theorem, which states that, under suitable
assumptions, maxima drawn from sufficiently large blocks follow one of three distributions. These distributions differ in their tail behavior. The GEV
distribution comprises all three cases in one parametric family and is widely used in extreme precipitation analysis as follows
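In its standard textbook form, with location μ, scale σ > 0, and shape ξ ≠ 0 (this notation is assumed here and may differ from the symbols used elsewhere in the text), the GEV distribution function for a block maximum z can be written as

```latex
G(z) = \exp\left\{ -\left[ 1 + \xi \, \frac{z - \mu}{\sigma} \right]^{-1/\xi} \right\},
\qquad 1 + \xi \, \frac{z - \mu}{\sigma} > 0 ,
```

where the limit ξ → 0 gives the Gumbel case, G(z) = exp{−exp[−(z − μ)/σ]}.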

Here, the non-exceedance probability

Water management authorities and other institutions rely on return values for different durations. However, the GEV distribution in the form of
Eq. (

There are multiple empirical formulations for the relationship between intensity

Here,

IDF curve examples showing the visualization of different IDF curve features. (

When applying the GEV separately to every duration out of a set of durations and interpolating in a second independent modeling step, the number of
parameters equals three GEV parameters times the number of selected durations plus at least three parameters for interpolating every quantile. For the
set of durations chosen here, and for evaluating five quantiles, this implies estimating 15

Models can be further improved by adding the multiscaling feature

Using only the curvature feature,

Summarizing this section, the following three different features were presented: (1) curvature, described by the duration offset parameter

Parameters of the d-GEV distribution are estimated by maximizing the likelihood (maximum likelihood estimation – MLE) under the assumption of
independent annual maxima.
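As an illustration of the likelihood that is maximized, the following minimal sketch evaluates the negative log-likelihood of independent observations under a plain GEV distribution with shape ξ ≠ 0. It is a simplification: the d-GEV additionally makes the parameters depend on duration, and the function names and the synthetic sample are assumptions made only for this example.

```python
import math


def gev_quantile(p, mu, sigma, xi):
    """Return level with non-exceedance probability p (requires xi != 0)."""
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)


def gev_neg_log_likelihood(sample, mu, sigma, xi):
    """Negative log-likelihood of independent GEV observations (xi != 0)."""
    if sigma <= 0:
        return math.inf
    nll = 0.0
    for z in sample:
        t = 1.0 + xi * (z - mu) / sigma
        if t <= 0:
            # The observation lies outside the support for these parameters.
            return math.inf
        nll += math.log(sigma) + (1.0 + 1.0 / xi) * math.log(t) + t ** (-1.0 / xi)
    return nll


if __name__ == "__main__":
    # Deterministic synthetic "annual maxima": quantiles at plotting positions.
    n = 200
    sample = [gev_quantile((i - 0.5) / n, 10.0, 2.0, 0.1) for i in range(1, n + 1)]
    print(gev_neg_log_likelihood(sample, 10.0, 2.0, 0.1))  # near the minimum
    print(gev_neg_log_likelihood(sample, 13.0, 2.0, 0.1))  # clearly larger
```

Passing this function to a numerical optimizer would yield the maximum likelihood estimates; for a duration-dependent model, the sum would run over the annual maxima of all durations.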

Finding reasonable initial values for d-GEV parameters in the optimization process was a major challenge during parameter estimation because
optimization stability strongly depends on the choice of initial values. Details about this procedure can be found in Appendix

After estimating GEV parameters, quantiles

To compare different IDF models in terms of the QS, we require another verification measure. The quantile skill score QSS compares
the quantile score

The QSS takes values

The QSI has a symmetric range and indicates either (1) a good skill over the reference when leaning clearly towards 1, (2) little or no skill
when being close to 0, or (3) worse performance than the reference when leaning clearly towards

In this study, the quantile score was calculated in a cross-validation setting. For each station, the available years with maxima are divided into

Then, the QSI is derived from the averaged cross-validated QS of the model,

To assess an individual model feature, we use, in the following, the corresponding model without this specific feature as the reference.
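The verification measures described in this section can be sketched as follows. This is a minimal example under two assumptions: the quantile score is taken to be the mean check loss of a predicted quantile, and the QSI is taken to be a mirrored version of the QSS that ranges from −1 to 1, which is consistent with the symmetric range described above. The function names are hypothetical.

```python
def quantile_score(observations, q, p):
    """Mean check loss of the predicted quantile q for probability p."""
    total = 0.0
    for y in observations:
        u = y - q
        total += p * u if u >= 0 else (p - 1.0) * u
    return total / len(observations)


def quantile_skill_score(qs_model, qs_ref):
    """Skill relative to a reference; 1 is perfect, 0 means no gain.

    Assumes a strictly positive reference score.
    """
    return 1.0 - qs_model / qs_ref


def quantile_skill_index(qs_model, qs_ref):
    """Symmetric skill measure in [-1, 1] (assumed mirrored form of the QSS).

    Assumes strictly positive scores for both model and reference.
    """
    if qs_model <= qs_ref:
        return 1.0 - qs_model / qs_ref
    return -(1.0 - qs_ref / qs_model)


if __name__ == "__main__":
    maxima = [1.0, 2.0, 3.0]
    qs = quantile_score(maxima, q=2.0, p=0.5)
    print(qs, quantile_skill_index(0.5, 1.0), quantile_skill_index(1.0, 0.5))
```

In a cross-validation setting, the score of the model and of the reference would each be averaged over the held-out folds before the index is computed.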

To provide an estimate of the uncertainty of the intensity quantile estimates in IDF curves, we obtain 95 % confidence intervals using a
bootstrapping method. To account for dependence between annual maxima of different durations, we apply the ordinary non-parametric bootstrap percentile
method

Please note that, in this paragraph, the empirical quantiles used for the confidence intervals should not be confused with the intensity quantiles, which describe the return level and are referred to as quantiles as well. For each station, we draw a sample of years (with replacement) from the set of years with available data. This way, for a chosen year, all maxima from this year are used, and we expect that sampling in this way maintains the dependence structure of the data. We then estimate the parameters of the d-GEV, which is used to calculate the intensity quantile that is connected to a certain non-exceedance probability (see Eq.
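The year-wise resampling described above can be sketched as follows. This is a minimal illustration, not the implementation used in this study: the argument `estimator` is a hypothetical placeholder for the step that fits the d-GEV to the resampled years and returns one intensity quantile, and the number of bootstrap replicates and the seed are arbitrary choices.

```python
import math
import random


def percentile(sorted_values, p):
    """Empirical p-quantile of sorted values with linear interpolation."""
    k = p * (len(sorted_values) - 1)
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def bootstrap_interval(maxima_by_year, estimator, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap interval based on resampling whole years.

    maxima_by_year maps a year to a dict {duration: annual maximum}, so all
    maxima of a drawn year stay together and their dependence is preserved.
    """
    rng = random.Random(seed)
    years = sorted(maxima_by_year)
    estimates = []
    for _ in range(n_boot):
        drawn = rng.choices(years, k=len(years))  # sampling with replacement
        resample = [maxima_by_year[year] for year in drawn]
        estimates.append(estimator(resample))
    estimates.sort()
    alpha = (1.0 - level) / 2.0
    return percentile(estimates, alpha), percentile(estimates, 1.0 - alpha)


if __name__ == "__main__":
    # Toy data: one duration, maxima 1..20; the estimator is a simple mean
    # standing in for "fit the model and return a return level".
    data = {year: {1: float(year - 1990)} for year in range(1991, 2011)}
    mean_of_maxima = lambda rows: sum(row[1] for row in rows) / len(rows)
    print(bootstrap_interval(data, mean_of_maxima))
```

Because the years are the resampling unit, the same drawn years are used for all durations within one replicate.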

We conduct a simulation study to examine whether the derived confidence intervals provide reasonable coverage despite the dependence between the
annual maxima of different durations. To this end, we simulate 500 samples of data, each with a size of

Results are presented in the following order. (1) Modeling performance is verified with the QSI for the three different IDF curve features, i.e., curvature, multiscaling, and flattening. (2) IDF curves with all three features are shown for two rain gauges, together with 95 % confidence intervals obtained by a bootstrapping method. (3) The reliability of this bootstrapping method, applied to the new model with all three features, is investigated with a coverage analysis based on simulated data.

The QSI is used to compare the quantile score of a model with that of a reference. In order to specifically investigate the influence of a single
model feature, we use these features in a model and compare with a reference without this specific feature; e.g.,

Figure

Quantile skill index (QSI) of the three features (columns) for four different cases (rows) where the investigated feature is combined with no other (upper row), one other (second and third row), or both other features (lower row) in model and reference. Column titles indicate the feature switched on in the model and switched off in the reference. The slightly opaque labels in the panels indicate which model and reference are used (see also Table

The curvature (duration offset

Multiscaling allows for different slopes of different

The intensity offset

When modeling only the durations

Quantile skill index for data with an hourly resolution. The visualization scheme follows that of Fig.

We conclude that the choice of parameters depends on the study purpose. When focusing on a wide range of durations, we recommend using the features
curvature, multiscaling, and flattening. If the focus lies on long durations, or the data do not provide a sub-hourly resolution, simple scaling
models might be sufficient. These recommendations are further elaborated on in the discussion in Sect.

Figure

IDF curves for two example stations within the Wupper catchment. Empirical quantile estimates are denoted with the plus signs (

Confidence intervals in Fig.

Bootstrapping coverage. Using a Brown–Resnick max-stable process, the coverage was determined in order to investigate the reliability of 95 % confidence intervals from bootstrapping. A total of three different levels of dependence were used.

In this study, we show that model performance can be increased when the flattening of IDF curves in the long-duration regime is taken into account. We assume that this behavior arises from seasonal effects. That means that the annual maxima of different durations may not follow the same scaling
process. However, this topic is currently under further investigation

The analyzed features (curvature, multiscaling, and flattening) have different impacts on modeling performance, depending on the duration and return period. All features are able to improve the model for certain regimes, but they should be chosen according to the problem at hand. If the focus is on small timescales of minutes, then curvature is important for a good modeling result. When curvature is used and medium to long timescales are also of importance, then the flattening feature should be used, as it helps to compensate for the deterioration due to curvature over longer durations. Multiscaling is a good choice if a loss in skill for short durations can be accepted in exchange for an improvement at long durations, regardless of which other features are used.

The skill of each feature depends on the presence of the other features. This dependence is strongest for flattening, which can only improve the model when curvature is used. The modeling performance of curvature depends less on the presence of other features; the same applies to multiscaling.

These suggestions hold for models that are supposed to cover a wide range of timescales from minutes to days. For data with hourly or coarser temporal resolution, the skill gain from using the features is much smaller. Here, flattening can slightly improve the model on the daily timescale, while multiscaling only slightly improves the modeling of long durations and leads to a slight reduction in skill for the hourly timescale.

Additional parameters give the model more flexibility. Including

IDF curve for Bever. A comparison of a model with flattening (

The parametric form of the IDF relation is based on three modifications to a simple power law which are motivated by our understanding of the rainfall
process, namely curvature

The aim of this study is to compare and suggest new parametric forms of consistent IDF curves that are applicable to a large range of durations from minutes to several days and, therefore, cover events from short-lived convective storms to long-lasting synoptic events. The dependence on duration is implemented in the location and scale parameters and allows for three features, i.e., curvature, multiscaling, and flattening. The analysis of these features enables us to understand more about the underlying physical effects beyond the subject of return periods and provides more flexible IDF curves that are suitable for a wide range of durations. The results of our simulation study show that bootstrapping provides reasonable estimates of uncertainty, even in the presence of dependence between durations.

Our findings agree with

Consistent modeling using the d-GEV enables the use of fewer parameters. In this way, the model can be easily extended, e.g., using physically relevant
atmospheric covariates. Thus, improving the parameterization of the d-GEV is crucial for paving the way for further steps. In future studies, we plan to
include spatial covariates in the estimation of the newly proposed d-GEV parameters, including the intensity offset, in order to use data from different
locations more efficiently. Also, the concept of non-stationary precipitation with respect to IDF curves is important to consider

The analysis of the performance shows that the new parametric form of the duration-dependent GEV suggested here, together with the bootstrap-based confidence intervals, offers a consistent, flexible, and powerful approach to describing the intensity–duration–frequency (IDF) relationships for various applications in hydrology, meteorology, and other fields.

The estimation of d-GEV parameters was conducted with the R base

All initial-value techniques were based on the same first step. An individual GEV distribution was fitted to each duration

Another initial duration offset

The same

Overview of initial value combinations (suggestions). The initial values for the parameters

For the coverage analysis in Sect.

We investigate how the choice of durations that are used to train the model influences the model performance. Since the number of training data points
is much higher for long durations

In this way, there is more training data for short durations available, which might shift the model's performance focus to other duration regimes. However, it is important to note that this is only an artificial increase in available data, since the additional data points do not contain substantial new information.

The model with the new artificial training data set (Eq.

In order to evaluate whether the GEV distribution is an appropriate choice for this analysis, we provide quantile–quantile (QQ) plots
(Fig.

QQ plots for selected stations. Confidence intervals were obtained by simulating transformed Fréchet distributed values from the model distribution and extracting a 95 % interval.

In Sect.

Overview of models and references for verification.

Parts of the rainfall data are freely available from the German Weather Service

The conceptualization was done by HWR, FSF, and JU, with FSF and JU also curating the data, developing the methodology, and validating the findings. FSF conducted the formal analysis, validated and visualized the project, and prepared the original draft. HWR acquired the funding and supervised the project. The software was developed by FSF, JU, and OEJ. The review and editing of the paper was done by JU, OEJ, and HWR. Moreover, all of the authors have read and agreed to the published version of the paper.

The contact author has declared that neither they nor their co-authors have any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We would like to thank the German Weather Service (DWD) and the Wupperverband, especially Marc Scheibel, for maintaining the station-based rainfall gauges and providing us with data.

This research has been supported by the Bundesministerium für Bildung und Forschung (grant no. 01LP1902H) and the Deutsche Forschungsgemeinschaft (grant nos. GRK2043/1, GRK2043/2, and DFG CRC 1114). We acknowledge support from the Open Access Publication Initiative of Freie Universität Berlin.

This paper was edited by Xing Yuan and reviewed by Rasmus Benestad and three anonymous referees.