A gridded multi-site precipitation generator for complex terrain: an evaluation in the Austrian Alps

Dabhi, Hetal P.; Rotach, Mathias W.; Oberguggenberger, Michael

doi:https://doi.org/10.5194/hess-27-2123-2023

Articles | Volume 27, issue 11

Research article

07 Jun 2023

Abstract

For climate change impact assessment, many applications require very high-resolution, spatiotemporally consistent precipitation data on current or future climate. In this regard, stochastic weather generators are designed as a statistical downscaling tool that can provide such data. Here, we adopt the precipitation generator framework of Kleiber et al. (2012), which is based on latent and transformed Gaussian processes, and propose an extension of that framework for a mountainous region with complex topography by allowing elevation dependence in the model. The model is used to generate two-dimensional fields of precipitation with a 1 km spatial resolution and a daily temporal resolution in a small region with highly complex terrain in the Austrian Alps. This study aims to evaluate the model with respect to its ability to simulate realistic precipitation fields over the region using historical observations from a network of 29 meteorological stations as input. The model's added value over the original setup and its limitations are also discussed. The results show that the model generates realistic fields of precipitation with good spatial and temporal variability. The model is able to generate some of the difficult areal statistics useful for impact assessment, such as the areal dry and wet spells of different lengths and the areal monthly mean of precipitation, with great accuracy. The model also captures the inter-seasonal and intra-seasonal variability very well, while the inter-annual variability is well captured in summer but largely underestimated in autumn and winter. The proposed model adds substantial value over the original modeling framework, specifically with respect to the precipitation amount. The model is unable to reproduce the realistic spatiotemporal characteristics of precipitation in autumn. 
We conclude that, with further development, the model is a promising tool for downscaling precipitation in complex terrain for a wide range of applications in impact assessment studies.

Received: 14 Jan 2022 – Discussion started: 24 Jan 2022 – Revised: 12 Nov 2022 – Accepted: 02 Dec 2022 – Published: 07 Jun 2023

1 Introduction

Precipitation is a major component of the hydrological cycle. With global warming, the hydrological cycle is expected to intensify, and the risk associated with extreme events will increase (Tabari, 2020; Pfahl et al., 2017). The resulting changes in precipitation will be unequally distributed around the world. There are many hydrologic responses to climate change, and the potential impacts of these are likely to affect the availability of fresh water, agriculture, the timing and severity of wildfires, and habitat sustainability (Bates et al., 2008; Kundzewicz et al., 2008). With the increasing awareness of climate change and its global impacts on ecosystems and human societies (Konapala et al., 2020; Haddeland et al., 2014; Schewe et al., 2014), there is also an increasing need to understand the effects and impacts that would occur at the local scale. Knowledge of how the local hydrological cycle and water resources will be affected by climate change is essential for planning reliable adaptation strategies and water policy.

In Austria, where a large part of the country is covered by mountains, the local hydrological cycle depends heavily on temporal and spatial variations in precipitation. Tourism and agriculture are among the main drivers of Austria's economy, and the accessibility of water resources for human consumption and ecosystems is largely contingent on the spatiotemporal distribution of precipitation. In the Austrian Alps, studies on the observed and projected impact of climate change have shown changes in the availability of snow cover and water flux (e.g., Abermann et al., 2009; Wijngaard et al., 2016). This will ultimately have an impact on the economy, the ecosystem, the environment, and society. To assess the impacts of climate change at local scales, precise climate information is critical and can serve the requirements of decision-makers. Often, such information should be consistent in space and time for the present and future climate. Many applications in hydrology require very high-resolution precipitation data, typically at a 1 km spatial resolution and a daily temporal resolution. However, obtaining such high-resolution precipitation data is still a challenging task, especially in mountainous regions (Henn et al., 2018). Most importantly, for complex topography, such as the Austrian Alps, even a 1 km resolution cannot include the impact of topography on climate correctly. For such regions, many applications in hydrology and ecology require even higher-resolution data – at a spatial scale of 100 m and at an hourly temporal scale. Climate models with higher resolutions, like regional climate models, are also unable to provide such data. For this reason, various downscaling methods have been in use over the past few decades. Among these downscaling methods, statistical downscaling using stochastic weather generators (WGs) has become very popular, mainly because WGs are computationally parsimonious.

A vast variety of WGs have been developed based on different approaches. The most widely used WGs are founded on a rather simplistic approach in which the sites are mutually independent in space and time. Such WGs are generally referred to as “single-site” WGs. Among the single-site WGs, the most popular are the parametric models based on Richardson (1981). Richardson (1981) used a Markov chain to simulate time series of precipitation occurrence (wet/dry days) and amount, and other variables were generated upon the condition of whether the generated day was wet or dry (e.g., Dabhi et al., 2021; Caron et al., 2008; Zhang et al., 2004; Dubrovský et al., 2004; Wilks, 1992). Details on the available WGs can be found in the review articles by Ailliot et al. (2015), Maraun et al. (2010), and Wilks and Wilby (1999); Maraun et al. (2010) focused solely on precipitation.

The major drawback of single-site WGs is that they focus on a single location: they can generate realistic data at that location, but the generated data lack a spatially correlated structure. Obtaining a spatially and temporally consistent (and therefore more realistic) dataset from single-site models is impossible. Thus, over the past two decades, the focus has moved towards the development of spatiotemporal WGs, also known as “multi-site” WGs. Precipitation, with its uneven occurrence and intensity, is particularly challenging to model while maintaining its spatiotemporal structure, and this task is even more difficult in complex topography such as the Alps.

Numerous approaches have been proposed to generate spatially and temporally correlated precipitation data. Wilks (1998) published one of the early works on the multi-site generation of daily precipitation data; in the aforementioned study, single-site parametric WGs at sites were forced with correlated random numbers to generate the occurrence of precipitation, and the amount of precipitation was generated using a mixture of two exponential distributions. Other approaches to the spatiotemporal modeling of precipitation are hidden Markov models (e.g., Ghamghami et al., 2016; Charles et al., 1999; Ailliot et al., 2009), copula-based approaches (Serinaldi, 2009; Bárdossy and Pegram, 2009), resampling based on the k-nearest-neighbors approach (Apipattanavis et al., 2007; Buishand and Brandsma, 2001; Rajagopalan and Lall, 1999), Poisson cluster models (Ramesh et al., 2012; Cowpertwait, 1995), artificial neural networks (Harpham and Wilby, 2005), and approaches based on generalized linear models (GLMs) (Kleiber et al., 2012; Verdin et al., 2018). Moreover, Baxevani and Lennartsson (2015) proposed a spatiotemporal model using a censored latent Gaussian field for precipitation generation, Olson and Kleiber (2017) used the approximate Bayesian computation method, and Gao et al. (2021) developed a multi-site stochastic daily rainfall model by coupling a univariate Markov chain with a multi-site rainfall event model. More sophisticated WGs also exist that can provide high-resolution spatiotemporal fields combining physical and stochastic approaches (e.g., Peleg et al., 2017; Paschalis et al., 2013).

However, most of the aforementioned approaches simulate precipitation only at the locations where observations are available, and such multi-site WGs have been implemented for the Alps; for example, Keller et al. (2015) and Keller et al. (2017) used a Wilks-type WG for precipitation simulation and downscaling, respectively, for a mountainous catchment in the Swiss Alps, and Breinl et al. (2013) used a semi-parametric multi-site precipitation generator for the mountains in the Austrian–German Alps. Nevertheless, high-resolution data in space and time are needed to provide more realistic input for local climate impact assessment. To achieve this, a gridded multi-site model is required. Sparks et al. (2017) proposed a multi-site multivariate WG based on the use of periodically extended empirical orthogonal functions (EOFs), in which they modeled precipitation as a censored latent Gaussian process. They generated gridded precipitation and temperature data over Europe using gridded input data, but their WG cannot provide gridded data without gridded observations. Peleg et al. (2017) developed a WG called AWE-GEN-2d, which can generate two-dimensional fields of various meteorological variables where precipitation is generated at a 2 km spatial resolution and a 5 min temporal resolution, and used it in the Swiss Alps. Although their approach is sophisticated, as it is a hybrid approach combining dynamical and statistical approaches, it requires spatially distributed data for the calibration of the WG and cannot generate data for a region outside of the calibration region. Such WGs are of limited use if the observed gridded data are not available, which is often the case. To our knowledge, not much work has been done on complex terrain, like the European Alps, using multi-site gridded WGs without gridded observations.

Wilks (2009) developed one of the first WGs that could provide gridded data of precipitation and temperature at locations with no observations. Kleiber et al. (2012) also provided an approach using a GLM-based model that utilizes Gaussian processes to generate gridded data. Their approach is appealing, as it generates a readily available field of precipitation using kriging for the interpolation of the model parameters. One advantage of their approach is that various covariates can be included in the GLM framework, such as large-scale climate indices, local climate information, and topographical information, which makes the model more flexible. Another advantage is that it is a probabilistic approach that allows one to quantify the uncertainties in the parameter estimation. However, Kleiber et al. (2012) only tested the model for multi-site precipitation generation, i.e., at locations with observations, and not for generated gridded data of precipitation. As many applications for impact studies require gridded data as input, one must evaluate the model for its ability to reproduce gridded fields before producing gridded data for such applications. Another important point is that, because the model can provide data at locations without historical observations, one can obtain historical time series of daily precipitation at those locations. In order to use the model for this purpose, it is necessary to assess the model performance with respect to gridded fields. Verdin et al. (2018) modified the framework of Kleiber et al. (2012) by including seasonal precipitation as an additional covariate, and they evaluated the model for gridded data, although it was implemented on flat terrain. Moreover, their modified model and the original model both used an isotropic and stationary covariance structure with ordinary kriging (OK) for the interpolation of the model parameters; this may not be suitable for complex topographical terrain. Bennett et al. (2018) generated precipitation fields using a latent-variable approach that provides a parsimonious method to jointly generate the rainfall occurrence and amount. They used an isotropic, powered exponential function to include spatial correlations and kriging for the interpolation of the parameters. However, they also implemented their model in relatively flat terrain in South Australia. To our knowledge, no space–time gridded precipitation generator has been evaluated with respect to its ability to generate two-dimensional fields of precipitation in highly complex terrain without the requirement for gridded input data.

Here, we propose an extension of the framework of Kleiber et al. (2012) for complex terrain and evaluate the model with respect to its capability to generate realistic two-dimensional fields of precipitation for a mountainous region in the Austrian Alps. In addition, we examine the added value of our model over the original isotropic setup and discuss the limitations of the model.

This article is organized as follows: Sect. 2 describes the extension of the isotropic framework for the implementation in a mountainous region; Sect. 3 details the study area, the data, and the model evaluation strategy used in the study; Sect. 4 presents and analyses the results; Sect. 5 comprises a discussion of the results; and Sect. 6 summarizes the study.

2 Model description

2.1 Precipitation occurrence

At a location s and on a day t, the precipitation occurrence O(s,t) is 0 (dry day) if no precipitation occurs and 1 (wet day) if precipitation occurs. A wet day is defined as a day with a precipitation amount in excess of 0.1 mm.

For a set of locations s and on a day t, a latent Gaussian process W_O(s,t) is defined with a mean function μ_O(s,t) and a covariance function $C_{O}(h, v, t)$, where $h = |s_{i} - s_{j}|$ is the horizontal (Euclidean) distance between two locations denoted by i and j and $v = |v_{i} - v_{j}|$ is the elevation difference between the two locations (with $v_{i}$ denoting the elevation of location i). The suffix “O” stands for occurrence. As we are implementing the model in a mountainous region, we also define the covariances among the sites as a function of the difference in elevation. In comparison with the original model, where an isotropic covariance structure is used as a function of the horizontal distances among the sites, this allows us to include anisotropy in the model. The precipitation occurrence is then defined as follows:

$$O(s, t) = \begin{cases} 0 & \text{if } W_{O}(s, t) < 0, \\ 1 & \text{if } W_{O}(s, t) \geq 0. \end{cases} \tag{1}$$

Here, the mean function is

$$\mu_{O}(s, t) = \beta_{O}^{\prime}(s)\, X_{O}(s, t), \tag{2}$$

where X_O is a vector of covariates and β_O is a vector of regression parameters, as in Eq. (5).

Kleiber et al. (2012) used a stationary and isotropic exponential covariance function of the form

$$C(h, t) = \exp\left(-\frac{|h|}{A(t)}\right), \tag{3}$$

where A(t) is the time-dependent scale parameter.

As our goal is to use the model in complex topography, we introduce anisotropy in the covariance function (Eq. 3) by taking into account the difference in elevation between two locations. Thus, our stationary and anisotropic covariance function $C_{O}(h, v, t)$ takes the following form:

$$C_{O}(h, v, t) = \exp\left(-\frac{|h|}{A(t)} - \frac{|v|}{B(t)}\right), \tag{4}$$

where A(t) and B(t) are the time-dependent scale parameters in the horizontal and vertical directions, respectively.
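As an illustration of Eqs. (1) and (4), the following minimal sketch builds the anisotropic covariance matrix for a few sites and draws one correlated occurrence field by thresholding a latent Gaussian vector. The site coordinates, the scale parameters `A_KM` and `B_KM`, the zero mean, and all function names are hypothetical choices made for this example and are not values from the study.

```python
import math
import random


def anisotropic_cov(h_km, v_km, a_km, b_km):
    """Eq. (4): exp(-|h|/A - |v|/B) for horizontal distance h and elevation difference v."""
    return math.exp(-abs(h_km) / a_km - abs(v_km) / b_km)


def covariance_matrix(sites, a_km, b_km):
    """Covariance matrix for sites given as (x_km, y_km, elevation_km) tuples."""
    n = len(sites)
    cov = [[0.0] * n for _ in range(n)]
    for i, (xi, yi, zi) in enumerate(sites):
        for j, (xj, yj, zj) in enumerate(sites):
            h = math.hypot(xi - xj, yi - yj)
            cov[i][j] = anisotropic_cov(h, zi - zj, a_km, b_km)
    return cov


def cholesky(mat):
    """Lower-triangular factor L with L L^T = mat (mat must be positive definite)."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(mat[i][i] - s)
            else:
                low[i][j] = (mat[i][j] - s) / low[j][j]
    return low


def simulate_occurrence(mean, chol, rng):
    """Draw W_O = mean + L z with z ~ N(0, I) and threshold at zero (Eq. 1)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    latent = [m + sum(chol[i][k] * z[k] for k in range(i + 1))
              for i, m in enumerate(mean)]
    return [1 if w >= 0.0 else 0 for w in latent]


# Hypothetical sites: two valley sites and one nearby high-mountain site.
sites = [(0.0, 0.0, 0.8), (1.5, 0.0, 0.9), (1.5, 0.5, 2.4)]
A_KM, B_KM = 25.0, 1.0  # assumed horizontal and vertical scales (km)

cov = covariance_matrix(sites, A_KM, B_KM)
chol = cholesky(cov)
field = simulate_occurrence([0.0, 0.0, 0.0], chol, random.Random(42))
```

In this toy setup, the second and third sites are only 0.5 km apart horizontally but 1.5 km apart in elevation, so their covariance is lower than that of the two valley sites, which is the behavior the vertical term in Eq. (4) is meant to allow.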

Elevation dependence in the covariance structure is a natural assumption in complex terrain. In the literature (e.g., Wilks, 1999, 2009), it has been used for precipitation simulation in the mountains.

At the base of this model is the single-site precipitation generator based on a GLM framework (e.g., Stern and Coe, 1984; Chandler and Wheater, 2002; Furrer and Katz, 2007); this is similar to a Richardson-type precipitation generator (Richardson, 1981), in which the daily precipitation occurrence is modeled as a first-order Markov chain and the daily precipitation amount is modeled using a gamma distribution. The GLM-based approach allows more flexibility, as one may include as many covariates as desirable; therefore, the seasonality or the influence of large-scale circulation on the local precipitation can be included. In the GLM approach, taking the previous day's occurrence as a covariate forms a first-order Markov chain. Thus, at individual sites, the model reduces to a logit model given by

$$\log\left(\frac{p_{t}}{1 - p_{t}}\right) = \beta_{O}^{\prime}(s)\, X_{O}(s, t), \tag{5}$$

where p_t is the probability of occurrence on a day t.
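To make the link between Eq. (5) and a first-order Markov chain concrete, the sketch below uses only an intercept and the previous day's occurrence as covariates. The coefficient values in `beta` are hypothetical and chosen purely for illustration; they are not estimates from the study.

```python
import math
import random


def inv_logit(eta):
    """Inverse of the logit link in Eq. (5)."""
    return 1.0 / (1.0 + math.exp(-eta))


def wet_probability(beta, covariates):
    """p_t obtained from the linear predictor beta' X."""
    return inv_logit(sum(b * x for b, x in zip(beta, covariates)))


def simulate_chain(beta, n_days, rng, first_state=0):
    """Simulate daily occurrence where yesterday's state is the only covariate."""
    states = [first_state]
    for _ in range(n_days - 1):
        p = wet_probability(beta, [1.0, float(states[-1])])
        states.append(1 if rng.random() < p else 0)
    return states


# Hypothetical coefficients: (intercept, previous-day occurrence).
beta = [-0.8, 1.4]
p01 = wet_probability(beta, [1.0, 0.0])  # P(wet today | dry yesterday)
p11 = wet_probability(beta, [1.0, 1.0])  # P(wet today | wet yesterday)

chain = simulate_chain(beta, 20000, random.Random(1))
wet_fraction = sum(chain) / len(chain)
# Stationary wet-day probability of the two-state chain.
stationary_wet = p01 / (1.0 + p01 - p11)
```

With a positive coefficient on the previous day's occurrence, `p11` exceeds `p01`, i.e., wet days tend to follow wet days; the simulated wet-day fraction should approach `stationary_wet` for long series.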

Note that we use a logit link function instead of a probit link function, as was the choice in the original model. The parameters β_O are estimated at each location using the maximum likelihood estimation (MLE) approach.

In order to spatialize the model to obtain gridded data, the estimated regression parameters at observation locations must be interpolated at grid locations. The Gaussian process allows for a spatial interpolation method called kriging; this method allows one to interpolate the model parameters β_O associated with each covariate (estimated at the observation locations) to any location of interest. These interpolated coefficients are then used to obtain the mean function (Eq. 2). Here, we use kriging with external drift (KED) to interpolate the regression parameters. As precipitation in the mountains is unequally distributed across terrain, we allow elevation as an external drift in kriging such that the predicted values of precipitation (through the interpolated parameters) reflect the elevation dependence of precipitation. Again, inclusion of the elevation in kriging interpolation is natural in complex terrain. It has been used in the literature for precipitation interpolation in the mountains and proven to outperform OK (e.g., Tobin et al., 2011; Rata et al., 2020). Moreover, we have found linear dependence in the model parameters with elevation (not shown). In KED, an auxiliary variable is assumed (which is elevation here) that is linearly related to the variable of interest (which is the β parameter associated with each covariate in the model).
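The following is a minimal sketch of kriging with external drift for one regression parameter, with elevation as the drift variable. It assumes an isotropic exponential covariance in horizontal distance with a fixed, hypothetical range (`range_km`), whereas in the study the variogram is estimated by MLE; the station coordinates, parameter values, and function names are likewise invented for the example.

```python
import math


def exp_cov(h_km, range_km):
    """Isotropic exponential covariance (an assumption for this sketch)."""
    return math.exp(-h_km / range_km)


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ked_weights(stations, target, range_km):
    """KED weights for points given as (x_km, y_km, elevation_km).

    The two extra rows/columns enforce the unbiasedness constraints:
    the weights sum to 1 and reproduce the elevation at the target.
    """
    n = len(stations)

    def cov(p, q):
        return exp_cov(math.hypot(p[0] - q[0], p[1] - q[1]), range_km)

    matrix = [[cov(stations[i], stations[j]) for j in range(n)]
              + [1.0, stations[i][2]] for i in range(n)]
    matrix.append([1.0] * n + [0.0, 0.0])
    matrix.append([s[2] for s in stations] + [0.0, 0.0])
    rhs = [cov(s, target) for s in stations] + [1.0, target[2]]
    return solve_linear(matrix, rhs)[:n]


# Hypothetical stations and a hypothetical regression parameter at each of them.
stations = [(0.0, 0.0, 0.7), (10.0, 2.0, 1.2), (4.0, 12.0, 1.9),
            (15.0, 15.0, 2.8), (8.0, 7.0, 1.0)]
beta_at_stations = [-0.90, -0.70, -0.45, -0.20, -0.80]
target = (6.0, 5.0, 1.6)  # grid point with known elevation

weights = ked_weights(stations, target, range_km=20.0)
beta_at_target = sum(w * b for w, b in zip(weights, beta_at_stations))
```

Because of the two constraints, a parameter that is exactly linear in elevation at the stations would be reproduced exactly at the grid point, which is the sense in which KED encodes the assumed linear relation with the auxiliary variable.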

The variogram for the regression parameter associated with each of the covariates is estimated using MLE; the estimated nugget is close to zero. Once the parameters of the logit model are interpolated, precipitation occurrence can be modeled at each grid point. However, the generated gridded field of precipitation occurrence, which is correlated in space, must also be correlated in time. Hence, the time-dependent covariance $C_{O}(h, v, t)$ (see Eq. 4) has been introduced. This covariance is estimated using the residuals in the logit model. We use the method of moments approach, as suggested by Kleiber et al. (2012), to estimate the parameters of the covariance function. The parameters are estimated separately for each month to allow for seasonality in the generated data. Following this process, the generated field of precipitation occurrence is correlated in both space and time and also reflects the seasonality.
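As a rough illustration of how the two scale parameters of Eq. (4) can be recovered from pairwise correlations, the sketch below fits the exponential form by least squares on the log-correlations, using the fact that $-\log C_{O} = |h|/A + |v|/B$ is linear in $1/A$ and $1/B$. This is a simplified stand-in and not necessarily the moment estimator of Kleiber et al. (2012); the input pairs are synthetic, and in practice such a fit would be repeated for each month.

```python
import math


def fit_scales(pairs):
    """Least-squares fit of A and B in exp(-h/A - v/B).

    pairs: iterable of (h_km, v_km, correlation) with correlation in (0, 1].
    Solves the 2x2 normal equations for (1/A, 1/B) by Cramer's rule.
    """
    shh = shv = svv = shy = svy = 0.0
    for h, v, rho in pairs:
        y = -math.log(rho)
        shh += h * h
        shv += h * v
        svv += v * v
        shy += h * y
        svy += v * y
    det = shh * svv - shv * shv
    inv_a = (shy * svv - shv * svy) / det
    inv_b = (shh * svy - shv * shy) / det
    return 1.0 / inv_a, 1.0 / inv_b


# Synthetic pairwise correlations generated from known (hypothetical) scales.
TRUE_A, TRUE_B = 30.0, 1.2
separations = [(5.0, 0.1), (12.0, 0.8), (28.0, 0.3), (40.0, 1.5)]
pairs = [(h, v, math.exp(-h / TRUE_A - v / TRUE_B)) for h, v in separations]

a_hat, b_hat = fit_scales(pairs)
```

With noise-free synthetic input the fit returns the generating scales; with empirical correlations from residuals it would return the best-fitting exponential form in the least-squares sense.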

Note that Gaussian process modeling is considered a nonparametric method. In a parametric model, the number of parameters remains fixed with respect to the amount of data available (i.e., number of stations in our case), whereas the number of parameters grows with the number of data points when using nonparametric methods.

We also compare the results of our model with a simulation using ordinary kriging (OK) instead of KED in our model as well as with the original isotropic model using OK and KED. This will be discussed in Sect. 4.3.

2.2 Precipitation amount

To simulate spatially correlated fields of precipitation, another Gaussian process W_A(s,t) is defined with a mean function μ_A(s,t) and a covariance function $C_{A} (h, v, t)$ such that

$$Y(s, t) = G_{s, t}^{-1}\left(\Phi\left(W_{A}(s, t)\right)\right). \tag{6}$$

Here, $G_{s,t}$ is the cumulative distribution function (CDF) of the gamma distribution at the location s and time t, and Φ is the CDF of a standard normal distribution. The suffix “A” stands for amount.
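The transformation in Eq. (6) can be sketched with the Python standard library alone. The example assumes that the latent value has already been standardized, evaluates the gamma CDF with a power series, and inverts it by bisection; the function names and the shape and scale values are hypothetical, and a library quantile function could be used instead.

```python
import math
from statistics import NormalDist


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the series for the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (shape + k)
        total += term
        k += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))


def gamma_ppf(q, shape, scale):
    """Inverse gamma CDF by bisection (q is clipped away from 0 and 1)."""
    q = min(max(q, 1e-12), 1.0 - 1e-12)
    lo, hi = 0.0, scale
    while gamma_cdf(hi, shape, scale) < q:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def latent_to_amount(w, shape, scale):
    """Eq. (6): Y = G^{-1}(Phi(W)) for a standardized latent value w."""
    return gamma_ppf(NormalDist().cdf(w), shape, scale)


# Hypothetical gamma parameters at one site and day.
SHAPE, SCALE_MM = 0.7, 6.0
amounts = [latent_to_amount(w, SHAPE, SCALE_MM) for w in (-1.0, 0.0, 1.0)]
```

The map is monotone, so the spatial ordering of the latent field is preserved in the simulated amounts, while the marginal distribution at each site is the local gamma distribution.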

At an individual location, the amount model is the gamma GLM with a logarithmic link function as given by Furrer and Katz (2007). The shape parameter α of the gamma distribution varies in space but not in time, whereas the scale parameter γ varies in both space and time. Hence, each location has its own distinct value of shape and scale parameters, with the scale parameter varying with time. Thus, we have

$$\log\left(\gamma(s, t)\, \alpha(s)\right) = \beta_{A}^{\prime}(s)\, X_{A}(s, t), \tag{7}$$

with the mean of the gamma distribution being the product of the scale and shape parameters, i.e., γα. X_A is the vector of covariates possibly different from those selected in the occurrence model.
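A short worked example of Eq. (7): given the regression parameters, the covariates for a day, and the time-constant shape parameter, the time-varying scale follows by rearranging the log link. The numerical values below are hypothetical.

```python
import math


def gamma_scale(beta_a, covariates, shape):
    """Scale gamma(s, t) from Eq. (7): log(gamma * alpha) = beta_A' X_A."""
    log_mean = sum(b * x for b, x in zip(beta_a, covariates))
    return math.exp(log_mean) / shape


# Hypothetical values: intercept plus previous-day occurrence only.
beta_a = [1.2, 0.3]
x_a = [1.0, 1.0]
alpha = 0.75

scale = gamma_scale(beta_a, x_a, alpha)
mean_amount = scale * alpha  # equals exp(beta_A' X_A)
```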

The scale and shape parameters (γ and α, respectively) and the model parameters (β_A) are estimated at each individual observation site using the MLE approach and are then interpolated using KED. We allow the scale parameter to vary with every month in order to include seasonal variations in precipitation at each location.

The mean function μ_A(s,t) of the Gaussian process W_A(s,t) is again obtained from a regression on covariates. The covariance function $C_{A} (h, v, t)$ is the same as that given in the occurrence model (Eq. 4) but with different parameters. The parameters of the covariance function are estimated for each month separately using the method of moments approach in order to allow seasonality in the spatiotemporal pattern of the precipitation amount.

3 Implementation

3.1 Study area and data

The model is implemented in a small region comprising highly complex terrain (ranging from 256 to over 3500 m a.s.l., meters above sea level) in the Austrian Alps. The area surrounds the catchment of the Oetz River, mainly in the federal state of Tyrol but also including a part of the Autonomous Province of Bolzano in northern Italy. The reason for selecting this region is that the catchment of the Oetz River is a widely researched area (e.g., Wijngaard et al., 2016; Abermann et al., 2009). To include more stations in the study, we allowed stations from the surrounding region, including northern Italy. The study region comprises several valleys, including the Oetz and Pitz valleys in Austria and the Passeier Valley in South Tyrol. Daily observations from 29 meteorological stations (Fig. 1) are selected based on the availability of homogeneous time series for the 30-year period from 1981 to 2010. Typically, input data for hydrological applications are required on an hourly temporal scale for the spatial scale of the terrain considered; however, because very few hourly datasets are available over climate timescales, we selected daily data for the study. The dataset comprises data provided by the Austrian National Weather Service (ZAMG – Zentralanstalt für Meteorologie und Geodynamik), the Hydrographic Service of Austria, the Hydrographic Service of the Autonomous Province of Bolzano, the Institute of Atmospheric and Cryospheric Sciences – University of Innsbruck, and TIWAG (Tiroler Wasserkraft AG). The highest station is on a glacier (Hintereisferner) at an elevation of 2860 m a.s.l., whereas the lowest station is at 588 m a.s.l. in northern Italy. A few stations have missing values for single days or for a short period. Such values are ignored when computing the observed statistics. The data at all of the stations are thoroughly quality controlled by the respective service providers.

Figure 1. The study area used in this work showing (a) the location of the region in the central Alps and (b) the locations of the 29 meteorological observation stations whose data are used in the study. Latitude and longitude are given in degrees north and east, respectively. Gray shading denotes the elevation (m a.s.l.). Stations with an elevation higher than 1500 m a.s.l., usually high-mountain stations, are shown using the symbol “M”, and the stations with an elevation lower than 1500 m a.s.l., typically the valley stations, are shown using the symbol “V”. The stations shown in red are selected as example stations to illustrate the results in the article.

In the northern part of the region, we have a dense network of stations, whereas the southern part has relatively fewer stations. The average inter-station distance between two locations is 28.15 km, the maximum inter-station distance is 72.84 km, and the minimum inter-station distance is 1.25 km. The average altitude difference between two stations is 0.605 km, while the maximum altitude difference is 2.272 km. The locations of the 29 stations are shown in Fig. 1, and further details about the stations are given in Table 1.

Table 1. List of the 29 meteorological stations whose data are used in the study. The names in bold are the three representative stations used to illustrate the results.

The mean annual precipitation observed in the lowlands is approximately 780 mm over an average of 150 wet days per year, whereas the highest mean annual precipitation of 1320 mm over an average of 176 wet days per year is observed at the high-mountain station of Dresdner Huette. The highest number of mean annual wet days is 220 at St. Martin in the Passeier Valley in South Tyrol, with an annual average of 887 mm of precipitation.

Due to strongly different topography, a large variability in both space and time is observed in the dataset. Of the 29 stations, Prutz has the most distinct climatological characteristics. For example, Prutz has the largest variability in almost all months. Moreover, the most extreme precipitation (156.5 mm) in a day is recorded at Prutz (in July 2009), while the second highest amount of precipitation amongst the remaining 28 stations was recorded on the same day at Dresdner Huette (35.1 mm). Apart from Prutz, only Dresdner Huette recorded a daily precipitation amount as high as 120.4 mm during the 30 years of record. At the St. Leonhard im Pitztal location, there are two stations operated by two different service providers: one by the Austrian Hydrographic Service (St. Leonhard im Pitztal-1; see Table 1) in the northern part of the valley and the other by ZAMG (St. Leonhard im Pitztal-2; see Table 1) in the southern part of the Pitz Valley. St. Leonhard im Pitztal-2 has somewhat different climatological characteristics than the nearby stations. Another station that has quite a different climatology compared with the Austrian stations is St. Martin. Note that this station is in the south of the Alpine crest (i.e., in northern Italy) and has the lowest elevation in the observed data. Thus, there are high variations in the observed climatologies of precipitation from valley to valley and also for stations within the same valley. This creates a particular challenge with respect to the simulation.

To reduce the sampling uncertainty and increase the robustness of the observed statistics, we increase the sample size of the observed data by considering a 7 d window centered on the day of interest. This minimizes the chance that a particular calendar date is dry in all 30 years purely by chance, which would yield a dry-day probability of exactly 1.0 (rather than a value such as 0.98), a problematic model setting.
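The effect of the 7 d window can be sketched as follows. With 30 years of data, pooling 7 days per year gives 7 × 30 = 210 values per calendar day instead of 30. The occurrence array below is synthetic, the function name is hypothetical, and the window is assumed to wrap around the end of the year.

```python
def pooled_dry_probability(occurrence, day_index, half_width=3):
    """Dry-day probability for a calendar day from a centered window.

    occurrence[year][day] is 1 for a wet day and 0 for a dry day.
    Returns (probability of a dry day, number of pooled values).
    """
    n_days = len(occurrence[0])
    sample = [year[(day_index + offset) % n_days]
              for year in occurrence
              for offset in range(-half_width, half_width + 1)]
    return 1.0 - sum(sample) / len(sample), len(sample)


# Synthetic record: 30 years x 365 days, wet only on selected days and years.
occurrence = [[1 if (day % 7 == 3 and year % 5 == 0) else 0
               for day in range(365)]
              for year in range(30)]

single_day = pooled_dry_probability(occurrence, 100, half_width=0)
windowed = pooled_dry_probability(occurrence, 100, half_width=3)
```

In this synthetic record, day 100 is dry in all 30 years, so the single-day estimate is exactly 1.0, whereas the windowed estimate also sees the wet days on the neighboring date and stays below 1.0.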

We generate N=30 stochastic realizations, each 30 years long (30 realizations × 30 years = 900 years), of daily two-dimensional fields of precipitation on a 1 km grid over the region using the aforementioned observed 30 years of daily data. The Shuttle Radar Topography Mission (SRTM) 1 km (30 arcsec) resolution dataset (Becker et al., 2009) is used as a simulation grid. We select a 1 km spatial resolution to reduce the simulation time and the data storage requirements.

For the North Atlantic Oscillation Index (NAOI) (see Sect. 3.2), a daily time series from 1981 to 2010 is obtained from the National Oceanic and Atmospheric Administration (NOAA) website (https://www.cpc.ncep.noaa.gov/products/precip/CWlink/pna/nao.shtml, last access: 15 May 2023).

Note that the observed data are from different service providers; therefore, the time of the data collection may differ, which may affect the results.

3.2 The selection of covariates in the model

We allow several covariates in the model so that the model can capture a realistic structure of precipitation patterns over the region. These include the day-to-day time dependence, the seasonality, and the influence of large-scale circulation. As the first possible covariate, we select the occurrence of precipitation on the previous day (Occ_t−1) so that the day-to-day temporal dependency in occurrence at a location is captured. To include seasonality, the time-dependent first- and second-order harmonics of sine and cosine (see Table 2) are considered as possible covariates. To allow for the influence of large-scale circulation over Europe, the NAOI is considered as a possible covariate. Studies show that there are links between the NAOI and precipitation characteristics (Casty et al., 2005; Beniston, 1997). A strongly positive NAOI is associated with persistent high pressure over the Alpine region, resulting in warmer than average temperatures and lower than average precipitation. In general, the winter NAOI correlates negatively with precipitation. Along with the described covariates, we also consider their interaction terms as possible covariates.

Table 2. List of the covariates included in the occurrence and amount models (Eqs. 8 and 9, respectively).

“n” is 365 (366 in the case of a leap year).

For the selection of the covariates in the model, we use both the Akaike information criterion (AIC) (Akaike, 1974) and the Bayesian information criterion (BIC) (Schwarz, 1978). The criteria do not select the same set of covariates at all of the stations: the BIC has a tendency to select the simplest model, whereas the AIC has a tendency to select more complex models. However, it turns out that the BIC helps to identify the most important covariates at all of the stations.
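The different tendencies of the two criteria follow from their penalty terms: the AIC adds 2 per parameter, whereas the BIC adds log(n) per parameter, which is larger than 2 once the sample has more than about 7 values. The sketch below uses hypothetical log-likelihoods and parameter counts for two candidate models; only the number of days (10 957 for 1981 to 2010) is derived from the study period.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k log(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


N_DAYS = 30 * 365 + 7  # 1981-2010 contains seven leap years

# Hypothetical candidates: (log-likelihood, number of parameters).
simple_model = (-6000.0, 4)
complex_model = (-5990.0, 8)

aic_simple, aic_complex = aic(*simple_model), aic(*complex_model)
bic_simple = bic(*simple_model, N_DAYS)
bic_complex = bic(*complex_model, N_DAYS)
```

In this example the gain in log-likelihood is enough for the AIC to prefer the larger model, while the stronger BIC penalty favors the smaller one.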

The three most important covariates for the occurrence model at the majority of stations are Occ_t−1, $\cos(2 \pi t / n)$, and $\cos(4 \pi t / n)$, where “t” is the day of the year. We select the covariates that are selected by both the AIC and BIC at the majority of stations. The selected covariates are listed in Table 2. The BIC selected this set of covariates at 18 of the 29 stations (see Sect. 3.1), whereas the AIC selected the same set of covariates at 11 stations. Thus, the vector of covariates in the model is as follows:

$$\begin{aligned} X_{O}(s, t) = \Big( & 1,\ \mathrm{Occ}_{t-1},\ \cos\left(\frac{2 \pi t}{n}\right),\ \sin\left(\frac{2 \pi t}{n}\right),\ \cos\left(\frac{4 \pi t}{n}\right), \\ & \cos\left(\frac{4 \pi t}{n}\right) \cdot \sin\left(\frac{4 \pi t}{n}\right),\ \mathrm{Occ}_{t-1} \cdot \cos\left(\frac{2 \pi t}{n}\right),\ \mathrm{NAOI} \Big), \end{aligned} \tag{8}$$

where “n” is 365 (or 366 in the case of a leap year). The first term is associated with the intercept in the model.
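As an illustration, the covariate vector of Eq. (8) can be assembled for a single station and day as in the following minimal sketch. The function name and argument names are hypothetical, and how the NAOI value is assigned to a given day is an assumption that is not specified here.

```python
import math

def occurrence_covariates(t, n, occ_prev, naoi):
    """Return the covariate vector X_O(s, t) of Eq. (8) as a list.

    t        -- day of the year (1..n)
    n        -- 365, or 366 in a leap year
    occ_prev -- 1 if precipitation occurred on day t-1, else 0
    naoi     -- NAOI value assigned to day t (assumed to be given)
    """
    c1 = math.cos(2 * math.pi * t / n)  # first-order cosine harmonic
    s1 = math.sin(2 * math.pi * t / n)  # first-order sine harmonic
    c2 = math.cos(4 * math.pi * t / n)  # second-order cosine harmonic
    s2 = math.sin(4 * math.pi * t / n)  # second-order sine harmonic
    return [
        1.0,              # intercept
        float(occ_prev),  # Occ_{t-1}
        c1,
        s1,
        c2,
        c2 * s2,          # interaction of the second-order harmonics
        occ_prev * c1,    # interaction of Occ_{t-1} with the first-order cosine
        naoi,
    ]

x_o = occurrence_covariates(t=365, n=365, occ_prev=1, naoi=-0.5)
```

The vector has eight entries: the intercept plus the seven selected covariates.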

For the precipitation amount, we also consider all of the possible covariates described for the occurrence model and, as above, select the covariates using both the AIC and BIC. In addition to the same seven covariates as in the occurrence model, the second-order harmonic of sine is selected by both the AIC and BIC at the majority of stations (17 stations by the BIC and 16 stations by the AIC). Thus, we allow a total of eight covariates in the amount model (see Table 2), and the vector of covariates for the amount model is as follows:

$$
\begin{aligned}
X_{A}(s, t) = \Big( & 1,\ \mathrm{Occ}_{t-1},\ \cos\left(\frac{2\pi t}{n}\right),\ \sin\left(\frac{2\pi t}{n}\right),\ \cos\left(\frac{4\pi t}{n}\right),\ \sin\left(\frac{4\pi t}{n}\right), \\
& \cos\left(\frac{4\pi t}{n}\right) \cdot \sin\left(\frac{4\pi t}{n}\right),\ \mathrm{Occ}_{t-1} \cdot \cos\left(\frac{2\pi t}{n}\right),\ \mathrm{NAOI} \Big).
\end{aligned}
\tag{9}
$$

The correlations for the precipitation amount in the model are computed only for days when precipitation was observed.

3.3 Model evaluation strategy

Although the model produces daily fields of precipitation, before evaluating the model for gridded data, we first evaluate it at the individual locations for which observations are available. It is common practice in the validation of WGs to check first that the input statistics are reproduced. From the simulated gridded data, the 30-year time series of daily precipitation at the nearest grid point to each observation location is extracted from each of the N=30 realizations. The mean of the simulated statistics in each realization is compared with the observed statistics. The validation is carried out for daily and long-term statistics, and statistics that are more difficult for the model to reproduce are also considered. For the illustration of the results at individual locations, three example stations (from the 29 available stations) are selected: (i) Oetz, (ii) Pitztal Glacier, and (iii) Prutz. These three stations are highlighted in red in Fig. 1b. The three stations are selected such that Oetz represents the results at valley stations, Pitztal Glacier represents the results at high-mountain stations, and Prutz represents stations with different climatic characteristics; its climatic characteristics are the most distinct from those at the surrounding stations, which makes it the most challenging to simulate. Note that Pitztal Glacier is the highest station amongst the 29 observation stations (see Table 1).
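The extraction step described above can be sketched as follows. This is a toy illustration under stated assumptions: the grid is given as a list of planar coordinates, each realization is a list of daily series (one per grid point), and the wet-day threshold of 0.1 mm is hypothetical. All function names are illustrative.

```python
import math

WET_THRESHOLD_MM = 0.1  # hypothetical wet-day threshold, not taken from the text

def nearest_grid_index(station_xy, grid_xy):
    """Index of the grid point closest (Euclidean distance) to the station."""
    return min(range(len(grid_xy)), key=lambda i: math.dist(station_xy, grid_xy[i]))

def wet_day_fraction(series):
    """Fraction of days with precipitation at or above the threshold."""
    return sum(1 for p in series if p >= WET_THRESHOLD_MM) / len(series)

def mean_over_realizations(realizations, grid_index, statistic):
    """Compute a statistic per realization at one grid point, then average."""
    values = [statistic(real[grid_index]) for real in realizations]
    return sum(values) / len(values)

# Toy data: 4 grid points (coordinates in km), 2 realizations, 4 days each.
grid_xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
station_xy = (0.9, 0.2)
realizations = [
    [[0.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 5.0], [1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 3.0, 0.0]],
    [[0.0, 4.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 0.0, 0.0]],
]

idx = nearest_grid_index(station_xy, grid_xy)
sim_wet = mean_over_realizations(realizations, idx, wet_day_fraction)
```

In the study, the same pattern would apply to N=30 realizations of 30 years each, with the resulting mean compared against the corresponding observed statistic.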

In the next step, we evaluate the model with respect to its ability to reproduce spatial statistics. Thus, gridded observed data are required. We use the Alpine Precipitation Grid Dataset (APGD) (Isotta et al., 2014) from the Swiss Federal Office for Meteorology and Climatology (MeteoSwiss), which has a 5 km spatial resolution and a daily temporal resolution. The dataset is based on measurements from high-resolution rain gauge networks, incorporating more than 5500 rain gauge measurements on average per day from more than 8500 stations in seven Alpine countries. With a 10–15 km station spacing, the dataset is one of the densest in situ observation networks over high-Alpine topography worldwide. These data are available from 1971 to 2008. Note that this dataset is not a perfect reference. To obtain 30 years of gridded observations, we select the period from 1979 to 2008 for the validation of the simulated gridded data.

In order to assess the interpolation accuracy, we perform a holdout cross-validation in which one or more stations are withheld from the model fitting process. We withhold the same three stations that were selected for the illustration of the results: Oetz, Pitztal Glacier, and Prutz. The model should be able to reproduce the observed statistics accurately at the withheld stations. For cross-validation, we also generate N=30 realizations of 30 years, i.e., 900 years of data.

For the uncertainty estimation in the N=30 realizations, we use a tolerance interval (TI) (Patel, 1986; Krishnamoorthy and Mathew, 2009; Young, 2010) instead of the conventional method of using a confidence interval for the sampling error, which is sensitive to sample size. As opposed to confidence intervals, which would give expected bounds on the means of the simulated data, the TI gives bounds on future individual observations. In our view, TIs provide an appropriate visualization of the expected variability in the simulated data as well as a means of comparison with the original data. Here, we use a parametric two-sided TI with a normal distribution. The TIs are computed for each of the statistics considered in this study, obtained from the 30 simulated realizations at each station. As uncertainty criteria, we select a confidence level of 95 % and a 99 % proportion of the population for the TI, i.e., the TIs indicate the range containing 99 % of the simulated values (with 95 % confidence). The TIs are shown in each figure as a shaded area around the curve and are denoted as ${TI}_{99}^{95}$ throughout the article.
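A two-sided normal TI has the form mean ± k·s, where s is the sample standard deviation and the factor k depends on the sample size, the population proportion, and the confidence level. The sketch below is one possible way to approximate k and is not necessarily the method used in this study: it assumes Howe's approximation for the tolerance factor and, to stay within the Python standard library, the Wilson–Hilferty approximation for the chi-square quantile.

```python
import math
from statistics import NormalDist, mean, stdev

def chi2_quantile_wh(prob, dof):
    """Wilson-Hilferty approximation to the chi-square quantile (lower-tail prob)."""
    z = NormalDist().inv_cdf(prob)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3

def tolerance_factor(n, coverage=0.99, confidence=0.95):
    """Approximate two-sided normal tolerance factor k (Howe's approximation)."""
    dof = n - 1
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    chi2 = chi2_quantile_wh(1.0 - confidence, dof)
    return z * math.sqrt(dof * (1.0 + 1.0 / n) / chi2)

def tolerance_interval(values, coverage=0.99, confidence=0.95):
    """Return (lower, upper) bounds: mean -/+ k * sample standard deviation."""
    k = tolerance_factor(len(values), coverage, confidence)
    m, s = mean(values), stdev(values)
    return m - k * s, m + k * s

k30 = tolerance_factor(30)  # factor for N = 30 realizations
```

For N=30, a 99 % proportion, and 95 % confidence, this gives k of about 3.35, noticeably larger than the normal quantile of about 2.58 that would apply if the mean and standard deviation were known exactly.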

To quantify the model performance, along with various error metrics, we also take correlation coefficients (CCs) and coefficients of determination (R² values) into account. Thus, we employ the following metrics: (i) mean bias error (MBE), (ii) mean absolute error (MAE), (iii) root-mean-square error (RMSE), (iv) CC, and (v) R². Each of the performance metrics serves a different purpose. The MBE measures the overall bias in the model performance, whereas the MAE and RMSE both provide information on the mean magnitude of the error, regardless of the direction of the error. The greater the difference between the MAE and RMSE, the greater the variance in the individual errors in the sample. Similarly, the CC shows the association between the two variables (which are the observed and synthetic statistics here), whereas R² shows the proportion of data variation explained by the model. All of the performance metrics for the spatial statistics (corresponding to Fig. 13) are obtained between the observed and synthetic statistics derived at all of the grid points; for example, for the spatial occurrence probabilities, the metrics are derived between the observed and synthetic “spatial series” of the occurrence probabilities.
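The five metrics can be sketched as follows. Two conventions are assumptions rather than statements from the text: the bias is taken as synthetic minus observed, and R² is computed as 1 − SS_res/SS_tot with respect to the observed values (it could alternatively be defined as the squared CC).

```python
import math

def performance_metrics(obs, sim):
    """Return MBE, MAE, RMSE, CC, and R2 between observed and synthetic values."""
    n = len(obs)
    errors = [s - o for o, s in zip(obs, sim)]  # assumed sign: synthetic - observed
    mbe = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_obs, mean_sim = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_sim = sum((s - mean_sim) ** 2 for s in sim)
    cc = cov / math.sqrt(ss_obs * ss_sim)  # Pearson correlation coefficient

    r2 = 1.0 - sum(e * e for e in errors) / ss_obs  # assumed: 1 - SS_res / SS_tot
    return {"MBE": mbe, "MAE": mae, "RMSE": rmse, "CC": cc, "R2": r2}

# Toy example: the synthetic values carry a constant bias of +0.5.
metrics = performance_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

In this toy example MAE and RMSE coincide (0.5) because all errors are identical, and CC equals 1 while R² is 0.8, which illustrates why correlation alone cannot reveal a systematic bias.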

4 Results

An extensive evaluation of the model-generated data is carried out here.

Figure 2. Daily conditional probability of a dry day following a dry day (P_dd) at three selected stations: (a) Oetz, (b) Pitztal Glacier, and (c) Prutz (Fig. 1, Table 1). The observed probabilities (navy blue) are obtained from the 30 years (1981–2010) of observed data. The simulated probabilities are the mean of the 30 realizations (sky blue) and of the holdout cross-validation simulation (brown). The solid lines are the curves fitted (using the LOESS method) to the observed and simulated probabilities, respectively.
