Can causal discovery lead to a more robust prediction model for runoff signatures?
Abstract. Runoff signatures characterize a catchment's response and provide insight into its hydrological processes. These signatures are governed by the co-evolution of catchment properties and climate processes, making them useful for understanding and explaining hydrological responses. However, catchment behaviour can vary significantly across spatial scales, which complicates the identification of the key drivers of hydrologic response. This study represents catchments as networks of variables linked by cause-and-effect relationships. We examine whether the direct causes of runoff signatures can explain these signatures across different environments, with the goal of developing more robust, parsimonious, and physically interpretable predictive models. We compare predictive models that incorporate causal information derived from the relationships between catchment, climate, and runoff characteristics. We use the Peter and Clark (PC) causal discovery algorithm, along with three prediction models: Bayesian Network (BN), Generalized Additive Model (GAM), and Random Forest (RF). The results indicate that BN exhibits the smallest decline in accuracy between training and test simulations. While RF achieves the highest overall performance, it also shows the largest drop in accuracy between the training and test phases. When the training sample is small, the accuracy of the causal RF model, which uses only the causal parents as predictors, is comparable to that of the non-causal RF model, which uses all selected variables as predictors. This study demonstrates the potential of causal inference techniques to represent the interconnected processes in hydrological systems in a more interpretable and effective manner.
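As a minimal sketch of two ideas used above, the snippet below shows how the causal parents of a runoff signature could be read off a directed graph, and how the decline in accuracy between training and testing could be computed. The edge list, the variable names, and the helper functions `causal_parents` and `accuracy_drop` are hypothetical and chosen for illustration only; they are not the study's graph, data, or code. In practice, the graph would come from a causal discovery algorithm such as PC.

```python
# Hypothetical causal graph: each edge points from cause to effect.
# Variable names are illustrative assumptions, not results of the study.
EDGES = [
    ("elevation", "aridity"),
    ("elevation", "forest_cover"),
    ("aridity", "baseflow_index"),
    ("soil_depth", "baseflow_index"),
    ("baseflow_index", "runoff_ratio"),
]


def causal_parents(edges, target):
    """Return the direct causes (parents) of `target`, sorted by name."""
    return sorted({cause for cause, effect in edges if effect == target})


def accuracy_drop(train_score, test_score):
    """Decline in a skill score from training to testing.

    Assumes a score where higher is better; a smaller drop suggests
    a model that transfers better to unseen catchments.
    """
    return train_score - test_score


# Predictors for a "causal" model: only the parents of the signature.
causal_predictors = causal_parents(EDGES, "baseflow_index")

# Predictors for a "non-causal" model: every other variable in the graph.
all_variables = sorted({name for edge in EDGES for name in edge})
non_causal_predictors = [v for v in all_variables if v != "baseflow_index"]

print(causal_predictors)
print(non_causal_predictors)
```

The causal predictor set is a subset of the non-causal one, which is what makes the causal models more parsimonious; whether that subset also generalizes better is the empirical question the study addresses.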