The importance of vegetation in understanding terrestrial water storage variations

Abstract. So far, various studies have aimed at decomposing the
integrated terrestrial water storage variations observed by satellite
gravimetry (GRACE, GRACE-FO) with the help of large-scale hydrological
models. While the results of the storage decomposition depend on model
structure, little attention has been given to the impact of the way that
vegetation is represented in these models. Although vegetation structure and
activity represent the crucial link between water, carbon, and energy cycles,
their representation in large-scale hydrological models remains a major
source of uncertainty. At the same time, the increasing availability and
quality of Earth-observation-based vegetation data provide valuable
information with good prospects for improving model simulations and gaining
better insights into the role of vegetation within the global water cycle. In this study, we use observation-based vegetation information such as
vegetation indices and rooting depths for spatializing the parameters of a
simple global hydrological model to define infiltration, root water uptake,
and transpiration processes. The parameters are further constrained by
considering observations of terrestrial water storage anomalies (TWS), soil
moisture, evapotranspiration (ET) and gridded runoff (Q) estimates in a
multi-criteria calibration approach. We assess the implications of including
varying vegetation characteristics on the simulation results, with a
particular focus on the partitioning between water storage components. To
isolate the effect of vegetation, we compare a model experiment in which
vegetation parameters vary in space and time to a baseline experiment in
which all parameters are calibrated as static, globally uniform values. Both experiments show good overall performance, but explicitly including
varying vegetation data leads to even better performance and more physically
plausible parameter values. The largest improvements regarding TWS and ET are
seen in supply-limited (semi-arid) regions and in the tropics, whereas Q
simulations improve mainly in northern latitudes. While the total fluxes and
storages are similar, accounting for vegetation substantially changes the
contributions of different soil water storage components to the TWS
variations. This suggests an important role of the representation of
vegetation in hydrological models for interpreting TWS variations. Our
simulations further indicate a major effect of deeper moisture storages and
groundwater–soil moisture–vegetation interactions as a key to understanding
TWS variations. We highlight the need for further observations to identify
the adequate model structure rather than only model parameters for a
reasonable representation and interpretation of vegetation–water
interactions.



S4 Regional IAV TWS Composition
Figure S7 Global and regional average inter-annual variability of simulated total water storage (wTotal) and its components (wSoil, wDeep, wSlow, wSnow) for B, including the regional Impact Index I for each storage. Figure S8 Global and regional average inter-annual variability of simulated total water storage (wTotal) and its components (wSoil, wDeep, wSlow, wSnow) for VEG, including the regional Impact Index I for each storage.

Figure S9 Global and regional average mean seasonal cycles of modelled transpiration (T) over evapotranspiration (ET) for B and VEG experiments.
Figure S10 Global distribution of modelled transpiration (T) over evapotranspiration (ET) for B and VEG experiments, as well as the difference between both (lower right).

S6 Q Components
Figure S11 Global and regional average mean seasonal cycle of observed grid-wise runoff from GRUN (Q) and simulated total runoff (Qtotal), as well as its components Qslow and Qfast, for the B and VEG experiments. corr is the Pearson correlation coefficient of the respective simulation with observed Q.

S7 Comparison of VEG & VEG without capillary rise
Figure S12 Global distribution of the Impact Index I for the contribution of simulated snow (wSnow), soil (wSoil), deep water storage (wDeep) and delayed water storage (wSlow) to the mean seasonal cycle of total water storage, for VEG and VEG-noGW2Soil, which is a variant of the VEG experiment, in with the capillary rise from wDeep to wSoil is turned off prior to model calibration.
Figure S13 Global distribution of the Impact Index I for the contribution of simulated snow (wSnow), soil (wSoil), deep water storage (wDeep) and delayed water storage (wSlow) to the inter-annual anomalies of total water storage, for VEG and VEG-noGW2Soil, which is a variant of the VEG experiment, in with the capillary rise from wDeep to wSoil is turned off prior to model calibration. Figure S14 Impact Index I for the contribution of simulated snow (wSnow), soil (wSoil), deep water storage (wDeep) and delayed water storage (wSlow) to the global average mean seasonal cycle and inter-annual variability of total water storage, for VEG and VEG-noGW2Soil, which is a variant of the VEG experiment, in with the capillary rise from wDeep to wSoil is turned off prior to model calibration.

S8 Comparison of VEG & VEG with fixed kTransp at 0.05
Figure S15 Grid-wise Pearson's correlation coefficient for total water storage (TWS), evapotranspiration (ET) and runoff (Q) between 1) observations and VEG, and 2) observations and VEG-nok2, as well as differences between 1) and 2) (brown color, i.e., negative values indicate higher correlations for VEG-nok2, while purple color, i.e., positive values indicate better correlation values for VEG). VEG-nok2 is a variant of the VEG experiment, in which the kTransp parameter is not calibrated but fixed at a low value of 0.05. Figure S16 Global and regional mean seasonal cycles of total water storage (TWS), evapotranspiration (ET) and runoff (Q) for VEG and VEG-nok2, which is a variant of the VEG experiment, in which the kTransp parameter is not calibrated but fixed at a low value of 0.05, compared to the observational constraints by GRACE (TWS), FLUXCOM (ET) and GRUN (Q).
Figure S17 Global distribution of the Impact Index I for the contribution of simulated snow (wSnow), soil (wSoil), deep water storage (wDeep) and delayed water storage (wSlow) to the mean seasonal cycle of total water storage, for VEG and VEG-nok2, which is a variant of the VEG experiment, in which the kTransp parameter is not calibrated but fixed at a low value of 0.05. Figure S18 Global distribution of the Impact Index I for the contribution of simulated snow (wSnow), soil (wSoil), deep water storage (wDeep) and delayed water storage (wSlow) to the inter-annual anomalies of total water storage, for VEG and VEG-VEG-nok2, which is a variant of the VEG experiment, in which the kTransp parameter is not calibrated but fixed at a low value of 0.05. Figure S19 Impact Index I for the contribution of simulated snow (wSnow), soil (wSoil), deep water storage (wDeep) and delayed water storage (wSlow) to the global average mean seasonal cycle and inter-annual variability of total water storage, for VEG and VEG-nok2, which is a variant of the VEG experiment, in which the kTransp parameter is not calibrated but fixed at a low value of 0.05.

S9 PFT experiment
The following shows an experiment similar to the traditional approach of global hydrological models, in which vegetationdependent parameters are defined and calibrated for different plant-functional type (PFT) classes separately and then model performance and TWS composition is analyzed in comparison to the B and VEG experiments. The results show that the larger number of parameters (due to different sets for different PFT) does not lead to marked improvements of model performance, but instead increases parameter uncertainty possibly due to overparameterization. In terms of TWS composition, we see substantial differences in the PFT experiment compared to B and VEG, which underlines our conclusions that the representation of vegetation in GHMs is critical for interpreting TWS variations.
Based on the GSWP2 land cover classification (Dirmeyer et al. 2006), we consider 12 PFT classes (Fig. S20), for which we define individual values of wSoilmax(2) (maximum available water capacity of the 2nd soil layer) and sberg (scaling parameter to derive the runoff/infiltration coefficient). Since state-of-the-art global hydrological models (GHMs) usually include seasonal dynamics of leaf area index (LAI) to calculate, e.g., transpiration, we decided to keep the definition of the active vegetation fraction as a function of seasonal EVI data as in the VEG experiment. For the PFT experiment, we focus (i) on wSoilmax(2) because GHMs usually apply a PFT specific rooting depth, and (ii) on sberg because this is similar to the runoff coefficient γ which is tuned in some GHMs (e.g., the WaterGAP model (Müller Schmied et al. 2021)).
When considering these 12 PFT classes, the number of calibration parameters increases from 12 (in B) and 16 (in VEG) to 34 (in PFT). Analysis of parameter uncertainty shows high uncertainties for a set of parameters common with B, while optimized parameter values are between those of B and VEG (Table S1). Additionally, and unlike B and VEG, PFT has high uncertainty of wSoilmax(2) for all PFT classes, and high correlation between each PFT's wSoilmax(2) and sberg (Fig. S21). High uncertainty of wSoilmax(2) is an indication that having one wSoilmax(2) per PFT may not explain the within-PFT variability. On the other hand, high correlation between each PFT's wSoilmax(2) and sberg is systematic, as both parameters are based on the same spatial distribution of PFT classes -and highlights an advantage of the VEG experiment, in which both are based on independent data sets.
In terms of model performance, Fig. S22 shows a partial improvement for TWS and ET in the PFT experiment. Especially Furthermore, the results of the PFT experiment confirm that changing the representation of vegetation has a marked impact on the simulated TWS composition (Fig. S23-S25). In PFT, among the liquid water storages wSoil contributes most to mean seasonal TWS variability, with Impact Index values between those of B and VEG (Fig. S23, Fig. S25).
Compared to VEG, wSlow is in general less important in PFT, while wDeep has a less impact on mean seasonal TWS, but its contribution to inter-annual TWS variability increases.
Overall, this analysis underlines that including continuous fields of vegetation parameters is preferable than the 'traditional' PFT-based approaches of defining parameters for distinct PFT classes (and their calibration) -in terms of model calibration and the uncertainty of calibrated model parameters, but also regarding model performance in relation to the number of model parameters. Furthermore, we could highlight that the representation of vegetation in hydrological models is crucial for the partitioning of simulated TWS.

S10 Consistency Check of Observational Data
In the following, we check for possible inconsistencies between the different observational data products. Similar to that more water leaves the system than comes in when looking at the observational data. In comparison, there is obviously no imbalance for the simulations from B and VEG as they close the water balance by definition of the model -which represents the major advantage of using models instead of observational based data from different sources.
We also calculated each variable in Eq. (S1) by solving the water balance with the other observed components and we compared the resulting water-balance-derived variable with the actual observed one. Differences between both indicate inconsistencies between a particular observed variable and the remaining observational variables. For ET, Q and TWS, we additionally plot the modelled fluxes and storage changes from B and VEG to evaluate the effect of observational inconsistencies on model simulations (Fig. S27). The modelled fluxes are smoother and closer to the observations than the same estimate of the variable from the water balance. Therefore, we find that the model allows to potentially bridge the inconsistencies between the different data products. However, for dS, B and VEG show a time shift compared to the observed storage change, that is not reflected in dS calculated from P, ET and Q observations. Accordingly, this underlines that the phase lag between observed and modelled TWS variations is not caused by data inconsistencies, but rather related to the potential deficiencies in the model structure, as already discussed in the main text of the manuscript.

S11 Analysis for Koeppen-Geiger Zones
Additionally to the hydroclimatic cluster analysis shown in the main text, we performed a regional analysis for Koeppen-Geiger climate zones. To do so, we aggregated Koeppen-Geiger subgroups considering the main climate group and distinguishing between humid and semi-arid conditions. The resulting zones are shown in Fig. S29.  is also evident from the performance maps in Fig. 4 of the main manuscript.
However, as mentioned in the main text, the advantage of the hydroclimatic cluster regionalization becomes obvious when interpreting results of the Arid and Temp-sa KG zones. This is because these climate zones are distributed across the Southern and Northern Hemisphere, causing 2 peaks in the regional seasonal cycles for TWS, ET and Q, due to opposing seasonal dynamics. The Arid KG zone includes the Semi-arid cluster regions (R5) in the Southern Hemisphere, as well as parts of the Temperate region (R2) (mainly in North America). The Temp-sa KG zone covers a rather small fraction of the study area, that is spread over the Temperate region (R2) in the Northern Hemisphere and the Semi-arid region (R5) of the Southern Hemisphere.
The effect of opposing seasonal cycles also exists in the Tropic KG zone, although less pronounced due to the proximity to the equator where the climate is more homogeneous and seasonality is low. The Tropic KG corresponds to the Humid cluster region (R3) on the Southern Hemisphere, and parts of the Sub-humid region (R4) on the Northern Hemisphere.
Compared to the hydroclimatic cluster regions, the Tropic KG has less seasonal variation (a smaller amplitude) of TWS, ET and Q, due to its larger area North and South of the equator. Both, B and VEG underestimate the ongoing depletion of TWS from September to December in Tropic KG, which is likely related to the opposing seasonal cycles of TWS in the Humid (R3) and the Sub-humid (R4) cluster regions. In the Tropic KG, Q peaks in March (as in Humid (R3)) and has a second, smaller peak in September (when Q peaks in the Sub-humid region (R4)). However, model performance is very similar for Tropic KG and the Humid and Sub-humid cluster regions.

Figure S30 Global and regional mean seasonal cycles of total water storage (TWS), evapotranspiration (ET) and runoff (Q) for the B and VEG experiments compared to the observational constraints by GRACE (TWS), FLUXCOM (ET) and GRUN (Q).
Figure S31 Global and regional mean seasonal cycles of simulated total water storage and its components for B and VEG, including the regional Impact Index I for each storage.