We investigate the potential of causal inference methods (CIMs) to reveal hydrological connections from time series. Four CIMs are selected according to two criteria: linear or nonlinear, and bivariate or multivariate. A priori, multivariate nonlinear CIMs are best suited for revealing hydrological connections because they fit nonlinear processes and deal with confounding factors such as rainfall, evapotranspiration, or seasonality. The four methods are applied to a synthetic case and a real karstic case study. The synthetic experiment confirms our expectation: unlike the other methods, the multivariate nonlinear framework has a low false-positive rate and allows for ruling out a connection between two disconnected reservoirs forced with similar effective precipitation. However, for the real case study, the multivariate nonlinear method was unstable because the uneven distribution of missing values affected the final sample size for the multivariate analyses, forcing us to address the robustness of the results. Nevertheless, although we recommend a nonlinear multivariate framework to reveal actual hydrological connections, all CIMs bring valuable insights into the system's dynamics, making them a cost-effective and recommendable comparative tool for exploring data. Still, causal inference remains attached to subjective choices, operational constraints, and hypotheses that are challenging to test. As a result, the robustness of the conclusions that the CIMs can draw always deserves caution, especially with real, imperfect, and limited data. Therefore, alongside research perspectives, we encourage a flexible, informed, and limit-aware use of CIMs without omitting any other approach that aims at the causal understanding of a system.

Causal inference methods (CIMs) aim at identifying causal interactions between variables from data only

Our study compares four CIMs on hydrological case studies in the spirit of other recent comparative studies

Like

As a result, by design, multivariate nonlinear CIMs (e.g., PCMCI–CMI) would seem best suited to retrieving effective hydrological connections: they account for nonlinear dependencies and deal with confounding effects, e.g., seasonality or the forcing of precipitation and evapotranspiration, through a multivariate framework. Following a theoretical introduction of the different CIMs, we test this assertion and gain hands-on experience with the CIMs using a toy model: a virtual experiment reproducing a simple case of two parallel hydrological reservoirs forced by the same effective precipitation. In a real case study, we apply the CIMs to data acquired at the Lorette cave (Rochefort, southern Belgium). This dataset includes rainfall and potential evapotranspiration data, electrical resistivity time series patterns from the subsurface obtained from a geophysical monitoring experiment using time-lapse electrical resistivity tomography (ERT)

With CCF, we consider that a variable

Usually, the CCF method is not explicitly presented as a CIM. Nevertheless, we consider it to be such because the method is simple, intelligible and linearly interpretable, and, in practice, widely used for the implicit purpose of making causal inference, despite its limits as a bivariate linear method, in most scientific domains like hydrology (e.g.,
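As a minimal illustration of the bivariate linear approach, the sketch below computes a lagged cross-correlation between two series. It is not the implementation used in this study: the function names `pearson` and `ccf`, the toy data, and the 3-step delay are assumptions chosen for illustration, and no significance test is included.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ccf(x, y, max_lag):
    """Correlation between x(t - lag) and y(t) for lag = 0..max_lag."""
    return {lag: pearson(x[: len(x) - lag], y[lag:]) for lag in range(max_lag + 1)}


# Hypothetical toy data: y repeats x three steps later, plus small noise.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0] * 3 + [x[t - 3] for t in range(3, 500)]
y = [v + rng.gauss(0.0, 0.1) for v in y]

r = ccf(x, y, max_lag=10)
best_lag = max(r, key=lambda lag: abs(r[lag]))  # lag with the strongest dependence
```

In this toy case the strongest correlation sits at the imposed delay. With real hydrological series, shared seasonality and common forcing would also produce strong correlations, which is the limitation of a bivariate linear method noted above.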

CCM is primarily designed to reveal weak nonlinear interactions between time series

For a causal analysis and to check whether

There are few recent applications of CCM in hydrology

Finally, since CCM is rooted in chaos theory, it could be argued that CCM applies only under the strict assumptions of its parent theory, which is built on deterministic mathematical models, i.e., on low-dimensional, noiseless systems of infinite length with non-cyclic and non-intermittent series or relations
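The cross-mapping idea can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the CCM implementation used in this study: it uses a fixed embedding dimension of 2, a unit delay, a single library size (so convergence with library length is not assessed), and a hypothetical pair of logistic maps in which `x` drives `y` but not the reverse. The helper names (`embed`, `cross_map_skill`, `logistic_pair`) are ours.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def embed(series, dim, tau):
    """Time-delay embedding: the point at time t is (s[t], s[t-tau], ...)."""
    start = (dim - 1) * tau
    return [[series[t - k * tau] for k in range(dim)]
            for t in range(start, len(series))]


def cross_map_skill(source, target, dim=2, tau=1):
    """Estimate `source` from the shadow manifold of `target`.

    A high skill is read as evidence that `source` influences `target`.
    """
    manifold = embed(target, dim, tau)
    truth = source[(dim - 1) * tau:]
    estimates = []
    for i, point in enumerate(manifold):
        # dim + 1 nearest neighbours on the manifold, excluding the point itself
        nearest = sorted((math.dist(point, other), j)
                         for j, other in enumerate(manifold) if j != i)[: dim + 1]
        d0 = max(nearest[0][0], 1e-12)
        weights = [math.exp(-d / d0) for d, _ in nearest]
        total = sum(weights)
        estimates.append(sum(w * truth[j] for w, (_, j) in zip(weights, nearest)) / total)
    return pearson(estimates, truth)


def logistic_pair(n, coupling=0.1, x0=0.4, y0=0.2):
    """Hypothetical toy system: x is autonomous, y is forced by x."""
    xs, ys = [x0], [y0]
    for _ in range(n - 1):
        x, y = xs[-1], ys[-1]
        xs.append(x * (3.8 - 3.8 * x))
        ys.append(y * (3.5 - 3.5 * y - coupling * x))
    return xs, ys


xs, ys = logistic_pair(300)
skill_x_from_y = cross_map_skill(xs, ys)  # tests "x influences y"
skill_y_from_x = cross_map_skill(ys, xs)  # tests "y influences x"
```

A full analysis would repeat the estimation for increasing library sizes and check that the skill converges, and it would compare the skill against a suitable null distribution.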

PCMCI is based on a stochastic framework testing conditional independence

Parents are subset variables found in the dataset that includes delayed variables up to

If

PCMCI is based on a strict framework of assumptions: faithfulness, causal sufficiency, the absence of contemporaneous dependencies, the causal Markov condition, stationarity, and the assumptions behind the selected independence tests such as linearity or nonlinear constraints, or hyperparameters related to the estimators (see
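To make the idea of testing conditional independence concrete, the sketch below contrasts a raw correlation with a partial correlation in a common-driver setting. It is only an illustration of the linear (ParCorr-like) statistic with a single conditioning variable: it does not reproduce the PCMCI algorithm, its parent selection, or its significance tests, and the variable names and noise levels are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given one variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical common driver z (e.g., a shared forcing) acting on x and y,
# which are otherwise unrelated.
rng = random.Random(1)
n = 2000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [zi + rng.gauss(0.0, 0.5) for zi in z]
y = [zi + rng.gauss(0.0, 0.5) for zi in z]

raw = pearson(x, y)                  # strong, but induced by z only
conditioned = partial_corr(x, y, z)  # close to zero once z is accounted for
```

The raw dependence between `x` and `y` disappears once the driver is conditioned on, which is the behavior a multivariate framework relies on when all relevant drivers are observed (causal sufficiency).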

Given its flexibility, PCMCI is comparable to several other CIMs found in the literature. First, the PC algorithm itself is a CIM

The karstic study site is the Lorette cave, next to the city of Rochefort in southern Belgium (Fig. 1a) (

Study site and data:

An ERT profile was installed to investigate the hydrology of the subsurface and potential connections above the cave

R0 is associated with a dense limestone area in the model's center (Fig. 1b,

Time series of Fig. 1c are the 11 inputs for the four selected CIMs. Table 1 shows their statistics. Bivariate CIMs (CCF and CCM) are applied between each pair of time series on their overlapping time domain with a maximum causal delay

Summary statistics of the time series variable.

We aim at assessing CIM performances for the detection of effective hydrological connectivity, in particular, the assertion that multivariate nonlinear methods are best suited for that purpose. To this end, we built a simple hydrological reservoir model, inspired by the common cause problem (Fig. 2). Two separate and independent reservoirs, A and B, and their discharges

The conceptual and mathematical model for the synthetic case study. Two reservoirs A and B are forced by inflows

The model is forced by real effective precipitation data

For the synthetic cases, we derived four models based on Table 2. The unit hydrographs
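The common cause problem behind this virtual experiment can be sketched as follows. This is a simplified stand-in, not the model of Fig. 2 or the parameters of Table 2: we assume exponential unit hydrographs, hypothetical recession constants of 5 and 8 d, and a synthetic intermittent precipitation series instead of the real effective precipitation data.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def unit_hydrograph(k, length):
    """Discrete exponential unit hydrograph (assumed shape), summing to 1."""
    w = [math.exp(-t / k) for t in range(length)]
    total = sum(w)
    return [v / total for v in w]


def route(inflow, uh):
    """Discharge as the causal convolution of the inflow with a unit hydrograph."""
    return [sum(uh[j] * inflow[t - j] for j in range(min(t + 1, len(uh))))
            for t in range(len(inflow))]


# Hypothetical effective precipitation (mm/d): rain on about 30 % of days.
rng = random.Random(42)
precip = [rng.expovariate(0.2) if rng.random() < 0.3 else 0.0 for _ in range(730)]

# Two disconnected reservoirs forced by the same input.
q_a = route(precip, unit_hydrograph(k=5.0, length=60))
q_b = route(precip, unit_hydrograph(k=8.0, length=60))

r_ab = pearson(q_a, q_b)  # high despite the absence of any A-B connection
```

The two discharges are strongly correlated although no water is exchanged between the reservoirs, which is why a bivariate statistic alone cannot rule out a connection in this setting.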

Model parameters for the synthetic cases.

Figure 3 depicts the average and interquartile range of

Patterns of statistical time dependencies between

Regarding the multivariate CIMs (panels c and d), differencing time series is theoretically unnecessary as PCMCI already conditions on the past variables (Eq. 2) and can deal with auto-correlation and seasonality. The linear ParCorr method seems to discriminate the connected case from the disconnected case. Still, it always shows a peak at lag 1 d, whatever the case, which could be misinterpreted as an effective connection. Only the nonlinear CMI method applied to the non-differenced data seems to reject the idea of connection when it is effectively absent. This finding supports our theoretical assertion: the multivariate nonlinear method is best suited to addressing effective hydrological connectivity. Furthermore, the method appears to perform better if seasonality is left in the time series. Still, Fig. 3 shows the pattern of the statistics, not the result of a causality test and its

Since we know for each simulation whether or not there is a causal link between A and B, Table 3 reports true positives (TPs), false positives (FPs), true negatives (TNs), and false negatives (FNs) for the problematic lag of 1 d. We consider the multivariate PCMCI methods ParCorr and CMI, with the latter having two different

Causal test statistics for the synthetic cases at lag 1 d.

Figure 4 reports the causal graphs for significant pairwise dependencies between first-order differenced time series, for a better screening of time dependencies using bivariate methods (Fig. 3). Detailed time dependencies are also reported in the Supplement (Sect. S2.1, S2.2), allowing us to consider that

Graph of pairwise cross-dependencies:

For multivariate methods, we chose to report causal graphs for the raw (non-differenced) data since differencing is theoretically unnecessary and reduced the precision of CMI in the virtual experiment (Table 3). Hence, Fig. 5 shows linear conditional dependencies (ParCorr) obtained from the raw time series for the full dataset (all data, Fig. 5a) and considering the discharge series one by one (P1, P2, and P3, Fig. 5b to d). The P1, P2, and P3 datasets allow the analysis to be performed over larger time domains (Fig. 1c). Except for R4, the dominant relationships between resistivity and meteorological variables are maintained between the graphs, demonstrating stability in the ParCorr results despite differences in the considered time domain.

Graph of ParCorr cross-dependencies: considering

Regarding PCMCI–CMI, we found unstable results on all datasets: all data, P1, P2, and P3 (see Sect. S2.3). The causal graphs varied substantially when we repeated the analysis with the same parameters due to the stochastic nature of the independence test

Consensual graph of CMI cross-dependencies obtained from the ensemble of simulations performed in the sensitivity analysis, considering

The CCF results for the real case, on the differenced data, show a dubiously high number of significant connections. We interpret this as an indication that the method does not reveal only effective hydrological connections, which reflects the complexity of inferring connectivity from CIMs with strongly correlated, synchronous series forced by common drivers. As a result, statistical dependencies, or functional connections, are ubiquitous. A corollary would be that, in hydrology, one could commonly build a point-to-point model based on two time series and their statistical dependencies (functional connections). This partly explains why hydrological systems may be modeled with simple functional lumped models or linear transfer functions

Besides, we see that our results contain many false positives reasonably identified from the sign of the dependencies (Fig. 4a). The fact that linear methods report signed dependencies makes it easier to interpret the results and potential false positives. For instance, positive relationships between resistivity and drip discharge are unexpected if water transfers from low-resistivity areas to the drip outlets. Often, positive relationships, e.g., R5

From this latter observation, we recall that dependencies can appear by chance, in particular if the series are short, hence the importance of significance tests. As suggested in

Though clearly not perfect, CCF is simple, linearly interpretable, and widely used. Considering that other CIMs carry their own imperfections, we do not discourage using CCF, provided that users know its limitations, test for linearity, and remove harmonics (or use suitable surrogate comparisons) and confounding factors using either a statistical or a physical model. Manual handling of confounding factors is close to assessing connectivity/causality in a multivariate framework, except that it is not automated but supervised by our knowledge and expertise. Here, we only removed the seasonal signal by differencing, which is not sufficient and results in many dependencies.
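One possible form of surrogate comparison is sketched below. It is an assumption on our part, not the test applied in this study: the null distribution is built from circular time shifts of one series, which preserve its auto-correlation while breaking its alignment with the other series. The function name, the number of surrogates, and the minimum shift are hypothetical choices.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shift_surrogate_pvalue(x, y, n_surr=200, min_shift=20, seed=0):
    """P-value of |corr(x, y)| against circularly shifted copies of y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    n = len(y)
    exceed = 0
    for _ in range(n_surr):
        s = rng.randrange(min_shift, n - min_shift)
        surrogate = y[s:] + y[:s]  # same values and auto-correlation, shifted in time
        if abs(pearson(x, surrogate)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_surr + 1)


def ar1(n, phi, rng):
    """Auto-correlated toy series (first-order autoregressive process)."""
    s = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        s.append(phi * s[-1] + rng.gauss(0.0, 1.0))
    return s


rng = random.Random(7)
x = ar1(300, 0.9, rng)
y_indep = ar1(300, 0.9, rng)                  # unrelated to x
y_dep = [v + rng.gauss(0.0, 0.3) for v in x]  # noisy copy of x

p_indep = shift_surrogate_pvalue(x, y_indep)
p_dep = shift_surrogate_pvalue(x, y_dep)
```

Such surrogates account for auto-correlation but not for a shared driver or a shared seasonal cycle, so they complement rather than replace the handling of confounding factors discussed above.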

CCM results present fewer connections (Fig. 4b), with many connections pointing from RF or resistivity series (except for R4, the clayey limestone) to P1. In particular, this result is encouraging as we expect a fast preferential flow and effective connection between the surface and P1 (

In addition, noise is expected to reduce both the CCM's true- and false-positive rates, i.e., the general mapping skills (see

Currently, we consider CCM to be best suited to testing whether or not there is a functional connection between two points, similarly to CCF but considering nonlinear dependencies. Therefore, our recommendations align with those of CCF. However, if CCM has better predictive capabilities than CCF, it can be concluded that the dependency is nonlinear.

ParCorr is linearly interpretable and computationally efficient. As with CCM, the ParCorr results for the real experiment (Fig. 5) seem promising because they generally favor the expected preferential flow and effective connection between the surface and P1 (

Based on the results of the synthetic experiment (Fig. 3 and Table 3), PCMCI–ParCorr, or its variant multivariate GC, may suffer from a relatively high false-positive rate (

The result of the virtual experiment (Fig. 3 and Table 3) confirmed our general expectation that a multivariate nonlinear method is best suited for assessing effective connectivity. PCMCI–CMI had the lowest false-positive rate (Table 3), which is particularly desired given the confounding problem in hydrological systems. However, it had a low recall relative to other approaches, meaning that the results contain false negatives.
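For reference, the rates discussed here follow directly from the counts of true and false positives and negatives. The short sketch below uses hypothetical counts, not the values of Table 3.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def detection_scores(tp, fp, tn, fn):
    """False-positive rate, recall, and precision from confusion counts."""
    return {
        "false_positive_rate": safe_ratio(fp, fp + tn),  # spurious links among absent ones
        "recall": safe_ratio(tp, tp + fn),               # detected links among real ones
        "precision": safe_ratio(tp, tp + fp),            # real links among detected ones
    }


# Hypothetical counts for a cautious method: few false alarms, many misses.
scores = detection_scores(tp=30, fp=5, tn=95, fn=70)
```

With these illustrative counts, the false-positive rate is 0.05 while the recall is only 0.30: a detected link is likely to be real, but many real links go undetected.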

In the real case study, PCMCI–CMI was found to be unstable, providing different results across consecutive runs (see Figs. S5 and S6). Two main reasons might explain this instability. First, PCMCI conditions dependencies on the parents (Eq. 2) and, therefore, builds a dataset containing the initial variables and their delayed versions up to
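The effect of missing values on the sample size of a lagged multivariate dataset can be illustrated with a short sketch. It is a simplified assumption of how complete samples are counted (a time step is usable only if every variable is observed at that step and at all considered delays) and does not reproduce how any particular software handles gaps. The series, gap positions, and maximum delay are hypothetical.

```python
def usable_samples(series_list, max_lag):
    """Count time steps t at which every series is observed at t, t-1, ..., t-max_lag."""
    n = len(series_list[0])
    count = 0
    for t in range(max_lag, n):
        if all(s[t - lag] is not None
               for s in series_list
               for lag in range(max_lag + 1)):
            count += 1
    return count


n = 100
# Series a: one continuous gap of 10 values.
a = [1.0] * n
for t in range(40, 50):
    a[t] = None
# Series b: 10 isolated missing values, one every 10 steps.
b = [1.0] * n
for t in range(5, n, 10):
    b[t] = None

only_a = usable_samples([a], max_lag=3)   # 84 usable time steps
only_b = usable_samples([b], max_lag=3)   # 57 usable time steps
both = usable_samples([a, b], max_lag=3)  # 48 usable time steps
```

Both series miss the same number of values, yet scattered gaps remove far more samples than a single continuous gap, and combining variables reduces the usable sample further.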

For short datasets, or with unevenly distributed missing values, we consider that building a consensual graph from multiple runs is a good strategy as our results suggested a fast preferential flow in the case of P1 only (Fig. 6a and b), as expected (
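A consensual graph can be sketched as a frequency count of links over repeated runs. The example below is an illustration only: the retention threshold is an assumption, and the listed links are hypothetical placeholders that reuse variable names from this study, not its results.

```python
from collections import Counter


def consensus_links(runs, min_fraction=0.5):
    """Keep links (source, target, lag) found in at least `min_fraction` of the runs."""
    counts = Counter(link for run in runs for link in set(run))
    n_runs = len(runs)
    return {link: c / n_runs for link, c in counts.items() if c / n_runs >= min_fraction}


# Hypothetical outcomes of four repeated runs with identical parameters.
runs = [
    {("RF", "P1", 1), ("R1", "P1", 1)},
    {("RF", "P1", 1)},
    {("RF", "P1", 1), ("R3", "P2", 2)},
    {("RF", "P1", 1), ("R1", "P1", 1)},
]

stable = consensus_links(runs, min_fraction=0.75)
```

Only the link found in every run passes the 0.75 threshold here. The choice of threshold trades recall against the false-positive rate and remains a subjective decision.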

We generally consider that multivariate nonlinear CIMs are preferred to assess effective connectivity based on the theoretical background and the results obtained from our synthetic and real experiments. Still, PCMCI–CMI may miss effective connections because of its low recall, evidenced by the virtual case (Table 3) and the low number of arrows reported in our consensual graph (Fig. 6). In addition, other assumptions may be violated

For our comparative study, we limited ourselves to specific methods and set aside some of their hypotheses to focus on a few elements relevant to potential CIM users in hydrology. More details can be found regarding the underlying frameworks of the newly introduced CIMs, for CCM and/or chaos theory

Besides, the selected CIMs operate on the time domain, limiting us to studying close temporal connections. However, hydrological connections and processes can be spread over much longer timescales. This further questions our tacit assumption of causal sufficiency. Accordingly, studying methods that operate on the frequency domain or that couple the frequency domain with the time domain may deserve particular interest (e.g.,

The PCMCI algorithm is still being actively developed. A more recent implementation of PCMCI is now available in the new version of the Tigramite Python package (v4.2), with refined default parameters and improved computational performance. In particular, a new algorithm, PCMCI+, deals with contemporaneous links and strong auto-correlation in series, with the promise of higher recall and a well-controlled false-positive rate

A final consideration is more epistemological: should hydrological connectivity (or causality) be studied from a purely empirical and single automated perspective, as with CIMs? We remind the reader that all types of methods can contribute to our causal understanding of environmental systems, e.g., dye tracing tests or spatially detailed inverse resistivity models

Regarding CIMs only,

The results highlighted that the nonlinear multivariate method, PCMCI coupled with the conditional mutual information test (PCMCI–CMI), stands out for its low false-positive rate relative to the other three methods. Hence, statistical dependencies revealed by PCMCI–CMI are more likely to be effective hydrological connections. This advantage is particularly valuable since hydrological systems present highly interdependent time series (or functional connections), favoring a high false-positive rate. This finding confirms our initial expectation that multivariate nonlinear CIMs are best suited to inferring effective connectivity while dealing with nonlinear dependencies and confounding factors resulting from seasonality or meteorological forcing. However, PCMCI–CMI has a low recall, i.e., it misses effective connections, and particular attention should be paid to the robustness of the outcome for small sample sizes or temporal gaps in the data, as evidenced in our real case study. Furthermore, PCMCI–CMI relies on hypotheses that are challenging to test

CCF and the Student's

The supplement related to this article is available online at:

DD conceptualized the research questions and formulated the ideas and the methodology within the frame of his PhD thesis, supervised by MV and MVC. Together with DD, OdV contributed to the methodology and formal analysis. AW was involved in investigating, collecting, modeling, and curating ERT data. DD wrote the original draft manuscript. All authors have contributed to reviewing and editing the manuscript. All authors have read and agreed to the published version of the manuscript.

At least one of the (co-)authors is a member of the editorial board of

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We are thankful to Michel Van Ruymbeke, who designed and constructed drip discharge sensors utilized in this study, and to Olivier Kaufmann, who also constructed drip discharge sensors, maintained the underground monitoring system, and conceptualized the time-lapse ERT experiment. The authors would like to thank the four anonymous reviewers whose insightful comments significantly improved the quality of the manuscript. Arnaud Watlet publishes with the permission of the Executive Director, British Geological Survey (UKRI-NERC).

This research has been supported by the “Fonds pour la Formation à la Recherche dans l’Industrie et dans l’Agriculture” (FRIA). Damien Delforge is a FRIA grantee of the “Fonds de la Recherche Scientifique” (FNRS). The publication in an open access journal has been supported by the sector of science and technology of UCLouvain.

This paper was edited by Nunzio Romano and reviewed by four anonymous referees.