The joint probability of precipitation and soil moisture is investigated here over Europe with the goal of deriving meaningful insights into the potential joint use of these variables for the detection of agricultural droughts within a multivariate probabilistic modeling framework. The use of copulas is explored, as this framework is often used in hydrological studies for the analysis of bivariate distributions. The analysis is performed for the period 1996–2020 on the empirical frequencies derived from ERA5 precipitation and LISFLOOD soil moisture datasets, both available as part of the Copernicus European Drought Observatory. The results show an overall good correlation between the two standardized series (Kendall's

Agricultural drought, defined as a condition of unusually high precipitation shortages and/or soil water deficits causing adverse effects on crop yields and production (Panu and Sharma, 2002), is probably the most recognized of the four main drought types or phases (Wilhite and Glantz, 1985). This is mainly due to the more direct and easier to understand impacts compared to the other types of droughts (Mishra and Singh, 2010). The scientific literature on agricultural drought provides a large variety of indices (WMO and GWP, 2016), with the aim of reproducing the temporal dynamics of crop water deficit through a combination of climatic observations, hydrological modeling, and remote-sensing data (Zargar et al., 2011).

The difficulty in capturing the multifaceted nature of agricultural drought events across the world with a single approach (Sivakumar et al., 2011) is confirmed by the absence of consensus in the scientific literature on the most reliable agricultural drought index. Despite the large range of available indices, some common characteristics can be identified, such as the focus on some proxy variables of plant water availability – through soil moisture (Dutra et al., 2008), actual evapotranspiration (Anderson et al., 2011), or basic meteorological information (Vicente-Serrano et al., 2010) – and the need to account for deviations from long-term conditions (i.e., use of standardized anomalies).

Meteorological drought indicators computed on appropriate aggregation timescales (McKee et al., 1993; Vicente-Serrano et al., 2010) have demonstrated a good capability of representing agricultural drought conditions in several case studies (e.g., Bachmair et al., 2018; Mohammed et al., 2022; Tian et al., 2018). They have been successfully integrated into a number of operational drought monitoring systems, thanks to their minimal input data requirement and ease of use. Among those indices, the Standardized Precipitation Index (SPI; McKee et al., 1993) computed on short-to-medium aggregation periods (i.e., SPI-3 and SPI-6) is often adopted as a suitable proxy variable for agricultural droughts (WMO, 2012).

As highlighted by Sheffield and Wood (2007), simplified indices for drought monitoring, such as the Palmer Drought Severity Index (PDSI; Palmer, 1965) or the previously mentioned meteorological indicators, have been slowly integrated with indices directly based on modeled soil moisture data. This transition is fostered by the increasing availability worldwide of process-based hydrological models. Soil moisture percentiles, or similarly standardized quantities, are often used in this context (Mo and Lettenmaier, 2013; Xia et al., 2014). The ever-growing records of remote-sensing-based estimates of soil moisture are becoming an additional data source to support the development of dedicated soil-moisture-based drought indices (Cammalleri et al., 2017; Carrão et al., 2016).

In the context of agricultural drought, an overall good agreement between SPI and soil moisture indices has been demonstrated over a large range of agricultural practices, crop types and climatic conditions. Halwatura et al. (2017) showed how SPI-3 represents a good approximation of modeled soil moisture over three different climatic regions in eastern Australia. Sims et al. (2002) found a high correlation between short-term precipitation deficit and soil moisture variations in North Carolina, while Ji and Peters (2003) highlighted the high correlation between SPI-3 and vegetation growth over croplands and grasslands in the US Great Plains. Wang et al. (2015) observed a good matching between soil moisture dynamics and SPI at the scale of 1–3 months when testing various indices over China. In Europe, Manning et al. (2018) highlighted how precipitation is the main driver of soil moisture droughts for a set of both dry and wet sites.

In spite of the above-mentioned consistencies, the outcome of any drought analysis is inevitably affected by the index selected to characterize drought conditions over a certain study region, as also highlighted by Quiring and Papakryiakou (2003) in testing different indices over the Canadian prairies. These authors suggest that a variety of drought indices should always be tested to determine the most appropriate one for a given application. It follows that the synergy between multiple indices can be exploited by the use of multivariate indicators (Hao and Singh, 2015), a family of approaches that encompasses a variety of merging strategies, including combined cascading indices (Cammalleri et al., 2021a; Rembold et al., 2019), composite and integrated approaches (Brown et al., 2008; Svoboda et al., 2002), and joint probability functions (Bateni et al., 2018; Hao and AghaKouchak, 2013; Kanthavel et al., 2022).

The latter category, in particular, aims at capturing the complex statistical dependence among different drought-related variables (Hao and Singh, 2015), and it has seen a growing relevance in many hydrological applications thanks to the introduction of copula functions and their ability to model a wide range of dependence structures (Nelsen, 2006; Salvadori et al., 2007; Joe, 2015). In the field of drought indices, the approach proposed by Kao and Govindaraju (2010) for the computation of the joint deficit index (JDI) has been applied to a variety of drought-related quantities over different regions, often including precipitation and soil moisture (e.g., Dash et al., 2019; Kwon et al., 2019).

A key feature in using joint probability is the possibility of characterizing the so-called tail dependence (TD), namely the asymptotic dependence of the extremes (Frahm et al., 2005). While TD has received considerable attention in the scientific literature on hydrological extremes (e.g., Aghakouchak et al., 2010; Poulin et al., 2007; Serinaldi, 2008), its use is largely unexploited in studies focusing on combined drought indices.

Studies on the marginal distribution of either precipitation or soil moisture usually adopt the gamma distribution for precipitation and the beta distribution for soil moisture. The use of the gamma family for the implementation of the SPI at different accumulation periods has become a standard practice in many applications (e.g., Mo and Lyon, 2015; Yuan and Wood, 2013). While other distributions have also proven to be reliable, such as the exponentiated Weibull (Pieper et al., 2020) and the Pearson Type III (Ribeiro and Pires, 2016), fitting the gamma distribution is still the most adopted approach. Over Europe, Stagge et al. (2015) demonstrated how the gamma distribution outperformed the other tested distributions across all accumulation periods and regions.

A more limited number of applications based on soil moisture data are available in the scientific literature compared to SPI. The use of the beta distribution for soil moisture data was introduced as early as the late 1970s, with the pioneering study of Ravelo and Decker (1979), following the consideration that soil moisture is a double-bounded quantity, ranging between its residual and saturation values. Sheffield et al. (2004) successfully applied this standardization for drought analyses over the US, while the same distribution was adopted by Cammalleri et al. (2016) on modeled data over Europe. Most recently, the beta distribution was also used to characterize the frequency of global satellite soil moisture data (Sadri et al., 2020).

Conversely, no standard approaches have been identified for the application of copulas to model the bivariate joint distribution of precipitation and soil moisture, mainly due to the large variety of probabilistic structures that may be observed between these two quantities. Common fitting strategies rely on the application of various copula families to identify the optimal one for each specific site (e.g., Hao and AghaKouchak, 2013) or are based on an a priori selection of a copula family following empirical evidence (e.g., Dixit and Jayakumar, 2021). Independently of the selection strategy, the adopted copula implicitly assumes an underlying TD behavior, the influence of which on extreme detection should be properly accounted for.

A comprehensive study on the joint probabilistic dynamics of precipitation and soil moisture is currently lacking in the scientific literature on multivariate drought modeling. Hence, the main goal of this study is to fill this gap by investigating the mutual relationship between the empirical frequencies of precipitation (cumulated over 3 months, as for SPI-3) and soil moisture datasets available over Europe as part of the European Drought Observatory of the Copernicus Emergency Management Service (EDO,

A large set of copulas is tested for this purpose across the entire European domain, to identify an optimal modeling of the dependence especially in proximity of the tails (given its major role in extreme detection). The spatial distribution of the results is analyzed to infer evidence of common patterns and behavior, which may support future operational applications based on similar parametric approaches.

The study focuses on Europe and makes use of the dataset of indicators available over the region as part of EDO. Precipitation data accumulated over consecutive 3-month periods are used here, as the quantity at the base of the SPI-3 index. Hourly total precipitation maps from the ECMWF ERA5 global atmospheric reanalysis model (

Soil moisture records over the entire European domain are derived from the simulations of the LISFLOOD distributed hydrological rainfall–runoff model (de Roo et al., 2000). LISFLOOD runs in near-real time as part of the European Flood Awareness System (Thielen et al., 2009), and it provides daily soil moisture maps for the root zone at a spatial resolution of 5 km. Daily modeled data are averaged at monthly scale and converted into a soil moisture index (SMI) as in Seneviratne et al. (2010). The model is calibrated and validated over an extensive network of river discharge stations following the procedure described in Arnal et al. (2019), and it has been successfully tested for drought analyses over Europe as part of EDO for the computation of the soil moisture anomaly (SMA) index (Cammalleri et al., 2015). Similar to precipitation, empirical frequencies are computed from the monthly soil moisture data in order to obtain a non-parametric calculation of the standardized anomaly, SMA, which is thus independent of a theoretical fitting (i.e., beta distribution). We will refer to this dataset as standardized soil moisture hereafter.
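A non-parametric standardization of this kind can be sketched as follows. This is a minimal illustration and not the procedure used in the study: the Weibull plotting position rank/(n + 1), the grouping of values by calendar month, and the final mapping to standard-normal quantiles are all assumptions, and the function names and sample values are hypothetical.

```python
from statistics import NormalDist


def empirical_frequencies(values):
    """Return the empirical non-exceedance frequency of each value.

    Assumption: Weibull plotting position, rank / (n + 1), with tied
    values sharing their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / (n + 1) for r in ranks]


def standardized_anomaly(values):
    """Map empirical frequencies to standard-normal quantiles (assumed step)."""
    normal = NormalDist()
    return [normal.inv_cdf(p) for p in empirical_frequencies(values)]


# Hypothetical SMI values for one calendar month across five years.
smi_january = [0.42, 0.55, 0.31, 0.60, 0.48]
freqs = empirical_frequencies(smi_january)
anomalies = standardized_anomaly(smi_january)
```

Because only ranks enter the calculation, the result does not depend on any fitted distribution, which is the property the text above relies on.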

In this study, data collected for the most recent 25 years (1996–2020) are used as a common period. This period is chosen to minimize the effects of non-stationarity in precipitation records and to avoid the inclusion of early LISFLOOD records that are affected by a lower number of ground meteorological stations in the forcing (Thieming et al., 2022). The time series of both standardized precipitation and soil moisture at grid cell scale are preliminarily tested for auto-correlation using the partial auto-correlation function (PACF; Box and Jenkins, 1976). This analysis returned positive and statistically significant (95 % confidence interval) values only at lag

The 300 maps (12 months

The introduction of copulas in multivariate probability modeling has provided hydrologists with a flexible tool to reproduce the joint probability of multiple dependent variables characterized by a variety of marginal distributions (De Michele and Salvadori, 2003; Salvadori and De Michele, 2004).

Limiting the focus on bivariate variables, the joint probability distribution,

A large variety of parametric formulations has been introduced in the literature to explicitly link the marginal distributions to the joint probability, with some of the most common copula families used in hydrology belonging to the elliptical and Archimedean copulas (Chen and Guo, 2019). Two measures of dependence play a major role in parametric copula inference. The Kendall rank correlation coefficient (

In this study, the parametric bivariate probability of standardized precipitation and soil moisture is assessed by using the R package “VineCopula” (Aas et al., 2009; Dißman et al., 2013). The Akaike information criterion (AIC; Stoica and Selen, 2004) is used to select, for each spatial grid cell, the best-fitting copula among the wide range of families available in the package. The main properties of some relevant copulas are reported in Table 1, as they will be useful to interpret the subsequent results.
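The AIC-based selection for a single grid cell can be illustrated with a short sketch. It assumes that the maximized log-likelihood of each candidate copula is already available (in the study this is handled by the R package); the log-likelihood values and the helper names below are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def select_by_aic(fits):
    """Pick the family with the minimum AIC.

    fits maps a family name to (maximized log-likelihood, number of parameters).
    Returns the best family and the full AIC table.
    """
    table = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(table, key=table.get)
    return best, table


# Hypothetical fits for one grid cell (values are illustrative only).
fits = {
    "gaussian": (41.2, 1),
    "student_t": (44.0, 2),
    "clayton": (38.5, 1),
    "bb7": (44.6, 2),
}
best, table = select_by_aic(fits)
```

The penalty term 2k is what keeps a two-parameter family from being selected over a one-parameter family on likelihood alone.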

Main copulas analyzed in this study and their upper and lower-tail-dependence coefficients (

In particular, from the data in Table 1 it is important to highlight how the BB7 copula is a combination of Joe and Clayton copulas, from which it inherits the tail dependences, and how the TD behavior of a copula can be inverted (i.e., the upper-tail dependence can become the lower and vice versa) by simply considering the reciprocal marginals (commonly known as rotated forms, identified by the suffix 180). Information from both non-parametric and parametric approaches is here jointly used to discriminate between different TD behaviors.
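The tail-dependence properties mentioned above can be made concrete with a small sketch based on the standard closed-form coefficients for a few Archimedean families. The notation (lambda_L and lambda_U for the lower and upper coefficients, theta and delta for the parameters) and the function names are assumptions made for this illustration; the BB7 case assumes that theta is the Joe-type parameter and delta the Clayton-type parameter.

```python
def tail_dependence(family, theta=None, delta=None):
    """Return (lambda_L, lambda_U) for a few common Archimedean copulas."""
    if family == "clayton":            # theta > 0: lower-tail dependence only
        return 2 ** (-1 / theta), 0.0
    if family in ("gumbel", "joe"):    # theta >= 1: upper-tail dependence only
        return 0.0, 2 - 2 ** (1 / theta)
    if family == "bb7":                # theta >= 1 (Joe part), delta > 0 (Clayton part)
        return 2 ** (-1 / delta), 2 - 2 ** (1 / theta)
    if family == "frank":              # no tail dependence in either tail
        return 0.0, 0.0
    raise ValueError(f"unsupported family: {family}")


def rotate_180(coefficients):
    """A 180-degree rotation swaps the lower and upper coefficients."""
    lower, upper = coefficients
    return upper, lower


clayton = tail_dependence("clayton", theta=1.0)
gumbel = tail_dependence("gumbel", theta=2.0)
bb7 = tail_dependence("bb7", theta=2.0, delta=1.0)
clayton_180 = rotate_180(clayton)
```

In this example the BB7 lower coefficient equals that of the Clayton copula with the same delta, and its upper coefficient equals that of the Joe (or Gumbel) copula with the same theta, which is the inheritance described in the text.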

Even if a copula is selected as the optimal based on the AIC, this does not necessarily exclude the possibility that other copulas may perform similarly. For this reason, we introduced a further test based on the relative likelihood criterion (Burnham and Anderson, 2002),

The interpretation of the selected copula functions may help highlight the transferability of the observed results over different contexts. For this reason, the observed spatial distribution of the selected copulas is analyzed through a random forest classifier (Breiman, 2001), in order to find evidence of reproducible patterns beyond simple chance.

As input features we consider a set of commonly available variables, such as ground elevation, annual average temperature, annual total precipitation, precipitation seasonality (ratio between total precipitation in warm and cold months), annual average normalized difference vegetation index (NDVI), annual average soil moisture, and soil type. As hyperparameters for the random forest, we tuned the number of trees (ntree) and the number of features randomly sampled at each split (mtry) using the “randomForest” R package (Breiman, 2001).

The degree of correlation between the monthly standardized 3-month precipitation and soil moisture (analogous to non-parametric SPI-3 and SMA) is preliminarily tested on the full time series of each grid cell using the Kendall's

Spatial distribution of the Kendall's

The results reported in Fig. 1 confirm the expected direct relation between the two variables, with a relatively homogeneous distribution of medium/high (between 0.3 and 0.5)

The analysis of the non-parametric tail-dependence values is summarized in the plot depicted in Fig. 2, where the cumulative frequency of the difference between the empirical

Analysis of the frequency of the empirical tail-dependence coefficients. The plot shows the cumulative frequency distribution of the differences between the empirical

The plot in Fig. 2 highlights how the majority (about 50 %) of the grid cells can be considered characterized by a symmetric behavior in the tail-dependence coefficients according to the abovementioned criterion (

The results reported in Fig. 2 were used to divide the entire domain into three categories (symmetric, LTD, and UTD) as depicted in Fig. 3. This map shows evidence of some coherent spatial patterns, such as the predominance of LTD in southern France, southern Italy, northern Germany and Denmark, and western Ukraine (among others), and a clustering of UTD in Poland, Czechia, southern Scandinavia, and Greece. The symmetric condition seems overall more spread across the entire domain, also thanks to its higher frequency, with a slight predominance over northern Europe (i.e., northern Scandinavian peninsula and Iceland).
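The three-way classification can be illustrated with a minimal sketch. The estimator below (joint frequency in each tail of the pseudo-observations divided by the tail fraction q) is a simple stand-in chosen for illustration, not necessarily the non-parametric estimator used in the study, and the tolerance tol is a hypothetical placeholder for the symmetry threshold. All function names and data are likewise hypothetical.

```python
def pseudo_observations(x):
    """Rank-transform a series to (0, 1) as rank / (n + 1); ties are not handled."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u


def empirical_tail_dependence(x, y, q=0.1):
    """Estimate (lower, upper) tail dependence at tail fraction q."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    n = len(u)
    lower = sum(1 for a, b in zip(u, v) if a <= q and b <= q) / (n * q)
    upper = sum(1 for a, b in zip(u, v) if a > 1 - q and b > 1 - q) / (n * q)
    return lower, upper


def classify(lower, upper, tol):
    """Label a cell as symmetric, LTD, or UTD from the coefficient difference."""
    diff = lower - upper
    if abs(diff) <= tol:
        return "symmetric"
    return "LTD" if diff > 0 else "UTD"


# Synthetic series: the lowest values coincide, the highest values do not.
x = list(range(20))
y = list(range(20))
y[8:12], y[16:20] = y[16:20], y[8:12]
lower, upper = empirical_tail_dependence(x, y, q=0.2)
label = classify(lower, upper, tol=0.2)
```

With short records such as the 300 monthly values per cell used here, only a handful of points fall in each tail, which is consistent with the spatial noise discussed later in the text.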

Spatial distribution of the three categories derived from the differences in the empirical tail-dependence coefficients.

Given the results of the tail-dependence assessment, it is useful to focus the copula parametric analysis on the capability of reproducing such patterns instead of finding the single copula that can perform reasonably well over the entire domain. Indeed, the search for the optimal copula based on the minimum AIC returns the BB7 as the optimal one in about 80 % of the domain (not shown). This result is a consequence of the BB7 flexibility (being derived from a combination of two purely asymmetric functions), which allows reproducing both symmetric and asymmetric tail-dependence coefficients according to the values assumed by the two parameters. However, the fact that a single flexible copula works well over a large range of conditions may hide the key spatial patterns observed in the TD analysis. These patterns may be better reproduced by adopting a limited number of more specialized copulas.

By limiting the search to a subset of copula functions, comprising only purely symmetric or purely asymmetric tail behaviors, more interesting results are obtained, as summarized by the frequency plot in Fig. 4. The grid cells where symmetric tail behavior copulas are selected as optimal are about 55 % of the domain (see Fig. 4b), with a predominance of Student's

Frequency of the optimal copulas based on the minimum AIC. The bar plot in panel

The spatial distribution of these optimal copulas (Fig. 5) mostly agrees with the patterns observed in Fig. 3, supporting the findings on the spatial distribution of TD coefficients. In addition, this result further confirms that a rather limited range of simple copula functions is able to capture the overall dynamics of dependence between precipitation and soil moisture over the entire European domain. Despite the observed spatial clusters in the obtained optimal copulas, the overall patterns in Fig. 5 are still rather noisy and may be difficult to interpret. This erratic behavior can be partially explained by the fact that different copulas may perform quite similarly over some grid cells; hence the AIC of the optimal copula (AIC

Spatial distribution of the optimal copulas obtained by minimizing the AIC. The symmetric tail behavior class includes both Gaussian and Student's

To further investigate this hypothesis, we evaluated the possibility of replacing the optimal copulas with either a Student's

Frequency analysis of the relative likelihood computed between the optimal AIC (AIC

The results in Fig. 6 show that, if we assume a relative likelihood of 0.1 as a threshold to detect a statistically significant difference, the Student's

Spatial distribution of the grid cells where the selection of the optimal copula is “univocal” according to the relative likelihood criterion.

The univocal areas derived from the previous analysis are mapped in Fig. 7, highlighting some of the more consistent spatial clusters already observed in both Figs. 3 and 5, as well as a large fraction of cells in northern Europe where a univocal optimal copula cannot be selected. These grid cells with univocal copula are used as a starting point for the random forest classification, given the robustness in their signal and the agreement in the outcome of both parametric and non-parametric TD behaviors.
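The relative likelihood criterion behind the univocal classification can be sketched as follows, using the standard form exp((AIC_min - AIC_i) / 2) from Burnham and Anderson (2002) and the 0.1 threshold mentioned above. The rule that a cell is univocal only when every alternative falls below the threshold is an assumption of this sketch, and the copula names and AIC values are hypothetical.

```python
import math


def relative_likelihood(aic_min, aic_alt):
    """Relative likelihood of an alternative model with respect to the best one."""
    return math.exp((aic_min - aic_alt) / 2)


def is_univocal(aic_by_copula, threshold=0.1):
    """Return the best copula and whether all alternatives are clearly worse.

    Assumed rule: the choice is univocal when the relative likelihood of
    every alternative copula is below the threshold.
    """
    best = min(aic_by_copula, key=aic_by_copula.get)
    aic_min = aic_by_copula[best]
    others = [
        relative_likelihood(aic_min, value)
        for name, value in aic_by_copula.items()
        if name != best
    ]
    return best, all(r < threshold for r in others)


# Hypothetical AIC values for two grid cells.
clear_cell = {"student_t": -84.0, "clayton": -75.0, "gumbel": -70.0}
ambiguous_cell = {"student_t": -84.0, "clayton": -82.0}
clear_result = is_univocal(clear_cell)
ambiguous_result = is_univocal(ambiguous_cell)
```

In the second cell an AIC gap of 2 gives a relative likelihood of about 0.37, so the alternative cannot be ruled out at the 0.1 threshold.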

A sample corresponding to 25 % of the univocal grid cells (about 8 % of the entire domain) was used to train the random forest, adopting a number of trees (ntree) of 80 and a single feature randomly sampled at each split (mtry

Summary of the confusion matrix analysis applied to the trained random forest on the testing subset.

Map of the optimal copula as modeled by the trained random forest classifier.

Finally, the trained classifier was applied to the entire dataset to obtain a classification of the European domain in terms of the expected optimal copula and the corresponding TD behavior. This map, reported in Fig. 8, shows a strong resemblance to both the empirically derived map in Fig. 3 and the optimal AIC fitting in Fig. 5. Besides this overall agreement, some notable discrepancies can be observed over northern Scandinavia and Iceland, two regions where low Kendall's

The overarching goal of the study is to investigate the joint probability of two standardized variables aiming at capturing agricultural drought conditions; hence the overall agreement between these two quantities is a fundamental prerequisite. A direct relationship between standardized 3-month cumulated precipitation and soil moisture is expected, since both SPI-3 and SMA are similarly used agricultural drought indices, and this can support the identification of the most suitable set of copula families (Salvadori et al., 2007; Genest et al., 2007). This direct relationship is overall confirmed by the positive Kendall's

Sehler et al. (2019) studied the correlation between remote-sensing-based precipitation and soil moisture, finding a moderate correlation over southern Europe and a weak (often not significant) correlation in central Europe. However, central Europe is close to the upper limit of the analyzed remote-sensing products, which can explain such low performance. Limited correlation even among different soil moisture products has been observed in northern Europe in other studies (Almenda-Martín et al., 2022), confirming the difficulty of modeling soil moisture dynamics over this region.

The obtained values for the Kendall's

The outcome of the tail-dependence analysis is even more interesting, given the role that such a metric plays in the detection of extreme events (and in particular the low tail for droughts). The TD investigation is sometimes overlooked in the development of multivariate drought indices, where previous studies often focused on optimizing the copula to the local data without analyzing the implicit assumption on the TD, the consistency with the non-parametric TD, and the implications of the associated dependence. Previous studies on the joint probability of precipitation and soil moisture are rather scarce, and TD is rarely the focus of such analyses or, at least, limited to specific areas and/or conditions.

As an example, Manning et al. (2018) performed a very detailed analysis over 11 FLUXNET sites in Europe on the role of precipitation and evapotranspiration on soil moisture drought, based on pairs of copula constructions, but the authors did not provide any indication of which bivariate copula was the optimal one for each site. Kwon et al. (2019) reported that the Frank copula was the most frequent optimal choice in their study over South Korea. However, some clear spatial patterns observed in their outcomes were not discussed, with Frank being the selected copula mostly in the central area of the domain but with Gumbel and Student's

Frequency distribution of the pairwise binary correlation between standardized precipitation and soil moisture lower than

Dash et al. (2019) found Frank (among the Archimedean copulas) working the best for 3-month precipitation and soil moisture over an Indian basin, while Hao and AghaKouchak (2013) highlighted the good performance of Frank and Gumbel in five regions of California, even if neither Gaussian nor Student's

The absence of a standard procedure to investigate tail dependence may be another factor affecting the limited focus on the topic in many studies on multivariate drought indices. Non-parametric TD has the clear advantage of avoiding any alteration of the data due to the fitting procedure, but the outcomes in this study also show a high degree of spatial noise likely due to the intrinsic nature of non-parametric analyses, the large uncertainty in non-parametric methods (Serinaldi et al., 2015), and the effects of the limited sample size (for this last issue, see also the illustration 3.18 in Salvadori et al., 2007). The threshold used here to define a symmetric behavior, based on a random shuffling of the data, seems to successfully overcome the difficulty of defining a self-consistent maximum difference in TD values, but it cannot be seen as a reliable approach to easily identify TD symmetry without the support of further evidence (e.g., by theoretical analyses).

In this regard, the fitting of parametric copula functions returns spatial patterns in TD coefficients similar to the ones obtained with the non-parametric approach. However, the absence of univocal fittings can be observed for large areas, as well as some contrasting results compared to the non-parametric TD especially over northern Europe (areas with a low correlation). The grid cells where a given copula clearly outperforms the alternative options are limited to roughly one-third of the domain, further stressing the evidence that clear-cut outcomes are difficult to infer from a single methodology. Thus, it seems reasonable to state that only a critical concerted analysis of both parametric and non-parametric TDs can return robust practical indications based on a convergence of evidence.

A clear outcome of our study is the predominance of regions with symmetric tail-dependence coefficients, where the Student's

To further explore this behavior, the time series of standardized variables were converted into binary vectors based on the commonly used standardized drought threshold of

Regions such as southern France, the northern UK, northern Germany, and Denmark (where a strong LTD is observed; see Fig. 8) are appropriate candidates for a robust assessment of agricultural drought conditions based on a joint precipitation–soil moisture index, whereas some regions in central Europe (e.g., Poland, Czechia, Switzerland) may not equally benefit from the use of a joint index due to the lower importance of LTD.

Overall, the parametric copula fittings confirm most of the non-parametric TD patterns, suggesting that a parametric approach is suitable for an operational implementation of a precipitation–soil moisture joint drought index over most of Europe. This implies that the proposed procedure, based on the combination of parametric and non-parametric analyses, can be considered a reliable tool to provide meaningful insight into the potential application of joint probability as a detector of extreme droughts.

At first glance, it may seem difficult to assign an explanation for the observed spatial patterns in LTD and UTD. However, the proven possibility of reasonably reconstructing these spatial patterns with a random forest classifier, starting from only a small sample of robust training data (less than 10 % of the domain) and with commonly available driving features, suggests that the observed clusters are unlikely to be caused only by chance and that hidden structures may be present and may be further explored. This result is encouraging for an extension of the derived approach to other regions of the world.

The use of combined indices based on a copula seems a promising development in the field of drought detection and monitoring. In this study, we analyzed the joint probability of two variables commonly used in agricultural drought analyses: the empirical frequencies of 3-month cumulated precipitation and soil moisture. We focused on the probabilistic characteristics that are key for agricultural drought studies.

The overall agreement in the marginal probability of the two standardized variables suggests that they are indeed valid candidates for the development of a joint drought index over the European domain. However, an in-depth analysis of the tail dependence, derived with both non-parametric and parametric approaches, shows some clear spatial patterns, which have a direct repercussion for the capability of such data to provide robust and coherent estimates of drought extremes. In this regard, regions such as southern France, the northern UK, northern Germany, and Denmark may benefit more from the joint use of the two standardized variables thanks to the observed strong low-tail dependence (i.e., increasing agreement on the left tail extremes). The joint dependence of standardized precipitation and soil moisture is well reproduced by using three common copulas (Student's

The codes used for this analysis can be provided upon request via the corresponding author.

All the data used in this study can be accessed and retrieved through the European Drought Observatory (EDO) web portal (

CC designed the experiments, with inputs from AT and CDM. CC developed the codes and performed the analyses. CC prepared the paper, which was expanded and revised by all co-authors.

At least one of the (co-)authors is a member of the editorial board of

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

We would like to thank the reviewers for their thoughtful comments and efforts towards improving our manuscript.

This paper was edited by Alexander Gruber and reviewed by two anonymous referees.