Learning from satellite observations: increased understanding  of catchment processes through stepwise model improvement

Hulsman, Petra; Savenije, Hubert H. G.; Hrachowitz, Markus

doi:https://doi.org/10.5194/hess-25-957-2021

Articles | Volume 25, issue 2

https://doi.org/10.5194/hess-25-957-2021

Articles | Volume 25, issue 2

Research article

24 Feb 2021

Research article |

| 24 Feb 2021

Learning from satellite observations: increased understanding of catchment processes through stepwise model improvement

Petra Hulsman, Hubert H. G. Savenije, and Markus Hrachowitz

Abstract

Satellite observations can provide valuable information for a better understanding of hydrological processes and thus serve as valuable tools for model structure development and improvement. While model calibration and evaluation have in recent years started to make increasing use of spatial, mostly remotely sensed information, model structural development largely remains to rely on discharge observations at basin outlets only. Due to the ill-posed inverse nature and the related equifinality issues in the modelling process, this frequently results in poor representations of the spatio-temporal heterogeneity of system-internal processes, in particular for large river basins. The objective of this study is thus to explore the value of remotely sensed, gridded data to improve our understanding of the processes underlying this heterogeneity and, as a consequence, their quantitative representation in models through a stepwise adaptation of model structures and parameters. For this purpose, a distributed, process-based hydrological model was developed for the study region, the poorly gauged Luangwa River basin. As a first step, this benchmark model was calibrated to discharge data only and, in a post-calibration evaluation procedure, tested for its ability to simultaneously reproduce (1) the basin-average temporal dynamics of remotely sensed evaporation and total water storage anomalies and (2) their temporally averaged spatial patterns. This allowed for the diagnosis of model structural deficiencies in reproducing these temporal dynamics and spatial patterns. Subsequently, the model structure was adapted in a stepwise procedure, testing five additional alternative process hypotheses that could potentially better describe the observed dynamics and pattern. These included, on the one hand, the addition and testing of alternative formulations of groundwater upwelling into wetlands as a function of the water storage and, on the other hand, alternative spatial discretizations of the groundwater reservoir. Similar to the benchmark, each alternative model hypothesis was, in a next step, calibrated to discharge only and tested against its ability to reproduce the observed spatio-temporal pattern in evaporation and water storage anomalies. In a final step, all models were re-calibrated to discharge, evaporation and water storage anomalies simultaneously. The results indicated that (1) the benchmark model (Model A) could reproduce the time series of observed discharge, basin-average evaporation and total water storage reasonably well. In contrast, it poorly represented time series of evaporation in wetland-dominated areas as well as the spatial pattern of evaporation and total water storage. (2) Stepwise adjustment of the model structure (Models B–F) suggested that Model F, allowing for upwelling groundwater from a distributed representation of the groundwater reservoir and (3) simultaneously calibrating the model with respect to multiple variables, i.e. discharge, evaporation and total water storage anomalies, provided the best representation of all these variables with respect to their temporal dynamics and spatial patterns, except for the basin-average temporal dynamics in the total water storage anomalies. It was shown that satellite-based evaporation and total water storage anomaly data are not only valuable for multi-criteria calibration, but can also play an important role in improving our understanding of hydrological processes through the diagnosis of model deficiencies and stepwise model structural improvement.

Download & links

Article (PDF, 2269 KB)

Supplement (3521 KB)

Download & links

How to cite.

Received: 24 Apr 2020 – Discussion started: 14 May 2020 – Revised: 20 Jan 2021 – Accepted: 25 Jan 2021 – Published: 24 Feb 2021

1 Introduction

Traditionally, discharge observations at basin outlets are used for hydrological model development and calibration, which can be a robust strategy in small watersheds with relatively uniform characteristics such as topography and land cover but not for larger, heterogeneous basins (Blöschl and Sivapalan, 1995; Daggupati et al., 2015). As a result, temporal dynamics of discharge may be well reproduced. This, however, does not ensure that the spatial patterns and temporal dynamics of model internal storage and flux variables provide a meaningful representation of their real pattern and dynamics (Beven, 2006b; Kirchner, 2006; Clark et al., 2008; Gupta et al., 2008; Hrachowitz et al., 2014; Garavaglia et al., 2017). Especially in large, poorly gauged basins, this traditional model calibration and testing method is likely to result in a poor representation of spatial variability (Daggupati et al., 2015) due to equifinality and the related boundary flux problem (Beven, 2006b).

An increasing number of satellite-based observations have become available over the last decade, giving us insight into a wide range of hydrology-relevant variables, including precipitation, total water storage anomalies, evaporation, surface soil moisture or river width (Xu et al., 2014; Jiang and Wang, 2019). These data are increasingly used as model forcing or for parameter selection and model calibration (e.g. Li et al., 2015; Mazzoleni et al., 2019; Tang et al., 2019).

Many studies used a single satellite product in the calibration procedure, some of them additionally using discharge data, others not. For instance, hydrological models have been calibrated with respect to evaporation (e.g. Immerzeel and Droogers, 2008; Winsemius et al., 2008; Vervoort et al., 2014; Bouaziz et al., 2018; Odusanya et al., 2019), water storage anomalies from GRACE (Gravity Recovery and Climate Experiment; Werth et al., 2009), river width (Revilla-Romero et al., 2015; Sun et al., 2018) or river altimetry (Getirana, 2010; Michailovsky et al., 2013; Sun et al., 2015; Hulsman et al., 2020).

Other studies simultaneously calibrated hydrological models with respect to multiple remotely sensed variables but only exploiting basin-average time series, without consideration for spatial patterns (e.g. Milzow et al., 2011; López et al., 2017; Kittel et al., 2018; Nijzink et al., 2018). On the other hand, some studies exclusively calibrated models to spatial patterns of observed variables (Stisen et al., 2011; Koch et al., 2016; Mendiguren et al., 2017; Demirel et al., 2018; Zink et al., 2018). As most satellite-based observations such as evaporation are not measured directly but are themselves a result of underlying models using satellite data as input (Xu et al., 2014), more focus has been recently placed on calibration to the relative spatial variability instead of using absolute magnitudes (Stisen et al., 2011; van Dijk and Renzullo, 2011; Dembélé et al., 2020).

To fully exploit the information content of satellite-based observations, simultaneous model calibration on both temporal dynamics and spatial patterns of multiple variables has the potential to improve the representation of spatio-temporal variability and, linked to that, their underlying model internal processes and therefore the model realism (Rientjes et al., 2013; Rakovec et al., 2016; Herman et al., 2018). Strikingly, only a few studies so far have used satellite-based observations to calibrate with respect to the temporal and spatial variation simultaneously (Rajib et al., 2018; Dembélé et al., 2020).

In general, most studies that have made use of remotely sensed data for model applications have exclusively addressed the problem of parameter selection and thus model calibration. However, as models are always abstract and simplified representations of reality, every model structure needs to be understood as a hypothesis to be tested (Clark et al., 2011; Fenicia et al., 2011; Hrachowitz and Clark, 2017). Yet, most studies on model structural improvement have so far only relied on spatially aggregated variables (Fenicia et al., 2008; Kavetski and Fenicia, 2011; Hrachowitz et al., 2014; Nijzink et al., 2016), while spatial data remain rarely used for that purpose (e.g. Fenicia et al., 2016; Roy et al., 2017).

The overall objective of this paper is therefore to explore the simultaneous use of spatial patterns and temporal dynamics of satellite-based evaporation and total water storage observations for a stepwise structural improvement and calibration of hydrological models for a large river system in a semi-arid, data-scarce region. More specifically, we tested the research hypotheses that (1) spatial patterns and temporal dynamics in satellite-based evaporation and water storage anomaly data contain relevant information to diagnose and to iteratively improve on model structural deficiencies and that (2) these data, when simultaneously used with discharge data for calibration, do contain sufficient information for a more robust parameter selection.

2 Site description

The Luangwa River in Zambia is a large, mostly unregulated tributary of the Zambezi with a length of about 770 km (Fig. 1). This poorly gauged river basin has an area of 159 000 km², which is mostly covered with deciduous forest, shrubs and savanna and where an elevation difference of up to 1850 m can be found between the highlands and lowlands along the river (The World Bank, 2010; Hulsman et al., 2020). In this semi-arid basin, the mean annual evaporation (1555 mm yr⁻¹) exceeds the mean annual precipitation (970 mm yr⁻¹).

https://hess.copernicus.org/articles/25/957/2021/hess-25-957-2021-f01

Figure 1Map of the Luangwa River basin in Zambia with (a) the elevation, groundwater reservoir units at 0.1^∘ resolution and 1^∘ grid according to GRACE and (b) the main landscape types.

The Luangwa River flows into the Zambezi upstream of the Cahora Bassa dam, which is used for hydropower production and flood and drought protection. The operation of this dam is very difficult since there is only limited information available from the poorly gauged upstream tributaries (SADC, 2008; Schleiss and Matos, 2016). As a result, the local population has in the past suffered from severe floods and droughts (ZAMCOM et al., 2015; Schumann et al., 2016).

2.1 Data availability

2.1.1 In situ discharge observations

Historical daily in situ discharge data were available from the Zambian Water Resources Management Authority at the Great East Road Bridge gauging station, located at 30^∘13^′ E and 14^∘58^′ S (Fig. 1), for the time period 2004 to 2016, yet containing considerable gaps, resulting in a temporal coverage of 31 %.

2.1.2 Spatially gridded observation

Spatially gridded data were used for a topography-based landscape classification into hydrological response units (HRUs; Savenije, 2010), as model forcing (precipitation and temperature) and for parameter selection (evaporation and total water storage; see Table 1).

Table 1Data used in this study.

n/a stands for “not applicable”.

Download Print Version | Download XLSX

More specifically, topography was extracted from GMTED with a spatial resolution of 0.002^∘ (Danielson and Gesch, 2011). Daily precipitation data were extracted from CHIRPS (Climate Hazards Group InfraRed Precipitation with Station) with a spatial resolution of 0.05^∘. Monthly temperature data extracted from CRU at a spatial resolution of 1^∘ were used to estimate the potential evaporation, applying the Hargreaves method (Hargreaves and Samani, 1985; Hargreaves and Allen, 2003). These monthly observations were interpolated to a daily timescale using daily averaged in situ temperature measured at two locations with the coordinates 28^∘30^′ E, 14^∘24^′ S and 32^∘35^′ E, 13^∘33^′ S. The satellite-based total evaporation data were extracted from WaPOR (Water Productivity Open-access portal; FAO, 2018) version 1.1 as it proved to perform well in African river basins (Weerasinghe et al., 2020). This product was available on 10 d temporal and 250 m spatial resolution. Satellite-based observations on the total water storage anomalies were extracted from the Gravity Recovery and Climate Experiment (GRACE). With two identical GRACE satellites, the variations in the Earth's gravity field were measured to detect regional mass changes, which are dominated by variations in the terrestrial water storage after having accounted for atmospheric and oceanic effects (Landerer and Swenson, 2012; Swenson, 2012). In this study, the long-term bias between the discharge, evaporation (WaPOR) and total water storage anomalies (GRACE) was corrected by multiplying the evaporation by a correction factor of 1.08 to close the long-term water balance.

The gridded information provided for the precipitation, temperature and evaporation was rescaled to the model resolution of 0.1^∘. If the resolution of the satellite product was higher than 0.1^∘, then the mean of all cells located within each model cell was used. Otherwise, each cell of the satellite product was divided into multiple cells, such that the model resolution was obtained, retaining the original value. In contrast, the modelled total water storage was rescaled to 1^∘, the resolution of the GRACE data set, by taking the mean.

3 Modelling approach

A previously developed and tested (Hulsman et al., 2020) distributed, process-based hydrological model was implemented for the Luangwa basin; see Sect. 3.1 for more information. This benchmark model (Model A) was calibrated with respect to discharge for the time period 2002–2012 and validated for the time period 2012–2016 with respect to discharge, evaporation and total water storage anomalies. Then, the model was calibrated with respect to all above variables, hence discharge, evaporation and total water storage anomalies simultaneously, for the time period 2002–2012 and validated with respect to the same variables for the time period 2012–2016. Model deficiencies were then diagnosed for this benchmark model (Model A) based on the results of both calibration strategies.

Next, model structure changes were applied creating Models B–D to improve the deficiencies found in Model A. These changes concerned the groundwater upwelling into the unsaturated zone as explained in Sect. 4.2. The same calibration and validation strategies as applied to Model A were applied to Models B–D. Model improvements were evaluated, and further deficiencies were diagnosed for these models based on the calibration and validation results.

To improve the deficiencies diagnosed in Models B–D, further model structural changes, i.e. increased levels of spatial discretization of the saturated zone as explained in Sect. 4.3, resulted in Models E and F. Similar to the previous models, the same calibration and validation strategies were applied to Models E and F, and model improvements and deficiencies were diagnosed based on the calibration and validation model performances.

The calculation of the model performance with respect to discharge, evaporation and total water storage is explained in Sect. 3.2. The calibration and validation procedures are described in Sect. 3.3 and 3.4.

3.1 Hydrological models

3.1.1 Benchmark model (Model A)

This model is a process-based hydrological model developed in a previous study by Hulsman et al. (2020) for the Luangwa basin. In this model, the water accounting was distributed by discretizing the basin and using spatially distributed forcing data, while the same model structure and parameter set were used for the entire basin. Each $0.1^{\circ} \times 0.1^{\circ}$ model cell was then further discretized into functionally distinct landscape classes, i.e. hydrological response units (HRUs), inferred from topography (Fig. 1b) but connected by a common groundwater component (Euser et al., 2015) following the FLEX-Topo modelling concept (Savenije, 2010), which was previously successfully applied in many different and climatically contrasting regions (Gao et al., 2014, 2016; Gharari et al., 2014; Nijzink et al., 2016). Here, the landscape was classified based on the local slope and the height above the nearest drainage (HAND; Rennó et al., 2008) into sloped areas (slope ≥ 4 %), flat areas (slope < 4 %, HAND ≥ 11 m) and wetlands (slope < 4 %, HAND < 11 m). For this purpose, the drainage network was derived from a digital elevation map extracted from GMTED (Sect. 2.1.2) using a flow accumulation map after having burned in a river network map extracted from OpenStreetMap (https://wiki.openstreetmap.org/wiki/Shapefiles, last access: February 2021) to obtain an as accurate as possible drainage network, as done successfully in previous studies (Seyler et al., 2009). According to this classification, the wetland areas covered 8 % of the basin, flat areas 64 % and sloped areas 28 % (Fig. 1).

https://hess.copernicus.org/articles/25/957/2021/hess-25-957-2021-f02

Figure 2Schematization of the model structure applied to each grid cell. Symbol explanation: precipitation (P), effective precipitation (P_e), interception evaporation (E_i), plant transpiration (E_t), infiltration into the unsaturated root zone (R_u), drainage to fast runoff component (R_f), delayed fast runoff (R_fl), lag time (T_lag), groundwater recharge (R_r), upwelling groundwater flux (R_GW), fast runoff (Q_f) and groundwater/slow runoff (Q_s).

Download

The model consisted of different storage components schematized as reservoirs representing interception and unsaturated storage, as well as a slow-responding reservoir, representing the groundwater, and a fast-responding reservoir (Fig. 2). The water balance for each reservoir and the associated constitutive equations are summarized in Table 2. The individual model structures of each parallel HRU were very similar. Functional differences between HRUs were thus mostly accounted for by different parameter sets. To allow for the use of partly overlapping prior parameter distributions while maintaining relationships between parameters of individual HRUs that are consistent with our physical understanding of the system and to limit equifinality, model process constraints (Gharari et al., 2014; Hrachowitz et al., 2014) were applied for several parameters (Table 3). For instance, in the Luangwa basin, the sloped areas are dominated by dense vegetation, suggesting higher interception capacities and larger storage capacities in the unsaturated zone compared to the remaining part of the basin. In addition, for each HRU the model structure was adjusted where necessary to include processes unique to that area. For instance, water percolates and recharges the groundwater system in sloped and flat areas, whereas in wetlands this was assumed to be negligible due to groundwater tables that are very shallow and thus close to the surface.

Table 2Equations applied in the hydrological model. Fluxes [mm d⁻¹]: precipitation (P), effective precipitation (P_e), potential evaporation (E_p), interception evaporation (E_i), plant transpiration (E_t), infiltration into the unsaturated zone (R_u), drainage to fast runoff component (R_f), delayed fast runoff (R_fl), groundwater recharge (R_r for each relevant HRU and R_r,tot combining all relevant HRUs), groundwater upwelling (R_GW for each relevant HRU and R_GW,tot combining all relevant HRUs), fast runoff (Q_f for each HRU and Q_f,tot combining all HRUs), groundwater/slow runoff (Q_s) and total runoff (Q_m). Storages [mm]: storage in interception reservoir (S_i), storage in unsaturated root zone (S_u), storage in groundwater/slow reservoir (S_s) and storage in fast reservoir (S_f). Parameters: interception capacity (I_max) [mm], maximum upwelling groundwater (C_max) [mm d⁻¹], maximum root zone storage capacity (S_u,max) [mm], reference storage in the saturated zone (S_s,ref) [mm], splitter (W) [–], shape parameter (β) [–], transpiration coefficient (C_e) [–], time lag (T_lag) [d], exponent (γ) [–], reservoir timescales [d] of fast (K_f) and slow (K_s) reservoirs, areal weights for each grid cell (p_HRU) [–] and time step (Δt) [d]. Model calibration parameters are shown in bold letters in the table below. The equations were applied to each hydrological response unit (HRU) unless indicated differently.

Download Print Version | Download XLSX

Table 3Model parameter and ranges (Hulsman et al., 2020).

Download Print Version | Download XLSX

The runoff was first calculated for each individual grid cell. A simple routing scheme based on the flow direction and constant flow velocity as calibration parameters was applied to estimate the flow at the outlet. In total, this model consisted of 16 calibration parameters with uniform prior distributions and constraints as summarized in Table 3.

3.1.2 First model adaptation: adding groundwater upwelling (Models B–D)

Satellite-based evaporation and total water storage observations were used to evaluate the benchmark model (Model A) with respect to the spatial and temporal variability visually and using model performance metrics as described in Sect. 3.2 to detect model deficiencies in these system-internal variables. The first model adaptation was applied to improve the hydrological model with respect to the deficiencies detected in Model A. Therefore, a detailed description of the reasoning behind the first model adaptation was explained in Sect. 4.2 after having described the deficiencies in Model A in Sect. 4.1.3.

In short, groundwater upwelling (R_GW) was added in wetland areas (see Fig. 2). This upwelling groundwater was made (1) a linear function of the water content in the unsaturated reservoir (Model B; Eq. 9 in Table 2), (2) a linear function of the water content in the slow-responding reservoir (Model C; Eq. 10) and (3) a non-linear function of the water content in the slow-responding reservoir (Model D; Eq. 11). As a result, upwelling water from the saturated zone feeds the unsaturated zone, controlled by the water content in the unsaturated (Model B) or in the saturated zone (Models C and D), and thus increasing the water availability for transpiration from the unsaturated zone in wetland areas. Compared to the benchmark model (Model A), Model B introduces one additional calibration parameter, Model C two and Model D three (Tables 2 and 3).

3.1.3 Second model adaptation: discretizing the groundwater system (Models E and F)

Similar to the first model adaptation, the second model adaptation was developed to improve deficiencies detected in Models B–D. Therefore, a detailed description of the reasoning behind the second model adaptation was explained in Section 4.3 after having described the deficiencies in Models B–D in Sect. 4.2.3.

In short, the spatial resolution of the slow-responding reservoir was gradually increased from lumped (Models A–D) to semi-distributed (Model E) and fully distributed (Model F). In Model E, the slow-responding reservoir was divided into four units as visualized in Fig. 1a, whereas in Model F it was further discretized into a grid of 10×10 km², equivalent to the remaining parts of the model. For both alternative formulations, Models E and F, the slow reservoir timescales K_s remained constant throughout the basin to limit the number of calibration parameters. For both Models E and F, groundwater upwelling was included according to Eq. (10) (Table 2), hence using Model C as a basis, introducing two additional calibration parameters compared to the benchmark model (Model A; Tables 2 and 3).

3.2 Model performance metrics

3.2.1 Discharge

The model performance with respect to discharge was evaluated using eight distinct signatures simultaneously characterizing the observed discharge data (Euser et al., 2013; Hulsman et al., 2020). The model performance measure was based either on the Nash–Sutcliffe efficiency (E_NS,θ; Eq. 28 in Table 4) or the relative error (E_R,θ; Eq. 29) depending on the individual signature. The resulting performance metrics for the eight signatures then included the Nash–Sutcliffe efficiencies of the daily discharge time series (E_NS,Q), its logarithm (E_NS,logQ), the flow duration curve (E_NS,FDC) and its logarithm (E_NS,logFDC), of the autocorrelation function of daily flows (E_NS,AC) and the relative errors of the mean seasonal runoff coefficient during dry and wet periods (E_R,RCdry, E_R,RCwet) and of the rising limb density of the hydrograph (E_R,RLD). All these signatures were combined into an overall performance metric based on the Euclidian distance to the “perfect” model (D_E,Qcal; Eq. 31). In absence of more information, and to obtain balanced solutions, all individual performance metrics were equally weighted in Eq. (31). Here, a $D_{E, Q cal} = 1$ indicates a perfect fit.

Table 4Overview of equations used to calculate model performance.

Download Print Version | Download XLSX

The discharge data availability was very limited during the validation time period (2012–2016). As a result, hydrological years were not fully captured, resulting in incomplete information on the hydrologic signatures such as rising limb density or autocorrelation function. That is why the overall model performance (D_E,Qval) was calculated using the signatures E_NS,Q, E_NS,logQ, E_NS,FDC and E_NS,logFDC, excluding E_R,RCdry, E_R,RCwet, E_R,RLD and E_NS,AC. It is therefore important to note that D_E,Qcal cannot be meaningfully compared with D_E,Qval. Instead, following the overall objective of the analysis, D_E,Qval of the different alternative model hypothesis were compared to evaluate the differences between the models.

3.2.2 Evaporation and total water storage

The model performance was also evaluated with respect to both the temporal dynamics and the spatial pattern of evaporation and total storage, respectively. For this purpose, satellite-based evaporation data (WaPOR) were used on a 10 d timescale and total water storage anomaly data (GRACE) on a monthly timescale.

Temporal variation

To quantify the models' skill in reproducing the temporal dynamics of evaporation and total water storage anomalies, the respective Nash–Sutcliffe efficiencies (Eq. 28) were used as performance metrics. This performance metric was applied to assess the models' skill in reproducing the basin-average time series of evaporation and total water storage anomalies, i.e. $E_{NS, Basin, E}$ and $E_{NS, Basin, S}$ . Similarly, the models' performance in mimicking the dynamics of evaporation in all grid cells dominated by the wetland HRU was calculated with the Nash–Sutcliffe efficiency ( $E_{NS, Wetland, E}$ ). Grid cells were considered as wetland-dominated if they were completely covered by wetlands, hence if p_HRU=1, with p_HRU the areal weight of wetland areas within that cell. With respect to evaporation, the flux was normalized first with Eq. (33) to emphasize temporal variations rather than absolute values in an attempt to reduce bias-related errors in the observation:

\begin{matrix} (33) & E_{normalized} = \frac{E - E_{\min}}{E_{\max} - E_{\min}} . \end{matrix}

Spatial variation

The model performance with respect to the spatial pattern of evaporation and total water storage anomalies was calculated with the spatial efficiency metrics E_SP,E and E_SP,S (Eq. 30), respectively, which were successfully used in previous studies (Demirel et al., 2018; Koch et al., 2018). The spatial model performance was first calculated for each time step within the dry period which was in September and October and then averaged to obtain the overall model performance (E_SP, Eq. 30). The spatial pattern was averaged over the dry season to minimize the effect of precipitation errors.

3.2.3 Multi-variable

The overall potential of the models to simultaneously reproduce the temporal dynamics as well as the spatial patterns of all observed variables, i.e. discharge, evaporation and total water storage anomalies, was tested with the overall model performance metric D_E,ESQ. This metric was the Euclidian distance (Eq. 32) of the following individual metrics: the temporal variation of the basin-average evaporation ( $E_{NS, Basin, E}$ ) and total water storage anomalies ( $E_{NS, Basin, S}$ ), spatial pattern of the evaporation (E_SP,E) and total water storage anomalies (E_SP,S), as well as discharge (D_E,Q). See Table 5 for an overview of all model performance metrics used in this study.

Table 5Overview of the applied model performance metrics.

Download Print Version | Download XLSX

3.3 Model calibration

In general, the model was calibrated by first running the model with 5×10⁴ random parameter sets generated with a Monte Carlo sampling strategy from uniform prior parameter distributions (Table 3). Then, the optimal and 5 % best-performing parameter sets were selected according to the model performance metric as described in the previous section. The model was calibrated within the time period 2002–2012 with respect to (1) discharge (D_E,Qcal) and (2) all variables simultaneously (D_E,ESQcal). As the objective of this study was to explore the information content of multiple variables using multiple model evaluation criteria for stepwise model structure development and calibration, it was important to use the same parameter sets for all models as a common starting point to rule out the effect of different parameter sets. This was efficiently possible with the Monte Carlo parameter sampling strategy, which, in addition, also allowed for a relatively straightforward and intuitive interpretation and communication of the results.

3.4 Model validation

The model was validated with respect to discharge, evaporation and total water storage anomalies for the time period 2012–2016. During validation each variable was evaluated separately, both temporally and spatially. This included the temporal variation of the basin-average evaporation ( $E_{NS, Basin, E}$ ) and total water storage anomalies ( $E_{NS, Basin, S}$ ), evaporation in wetland areas ( $E_{NS, Wetland, E}$ ), spatial pattern of the evaporation (E_SP,E) and total water storage anomalies (E_SP,S), as well as discharge (D_E,Qval). In addition, the model was evaluated with respect to the overall performance (D_E,ESQval). This was done for the solutions from both calibration strategies.

4 Model results

4.1 Benchmark model (Model A)

4.1.1 Discharge-based calibration

For the benchmark model (Model A), the model performance of all model realizations following the first calibration strategy, i.e. calibrating to discharge, resulted in an optimum $D_{E, Q cal, opt} = 0.76$ and $D_{E, Q val} = 0.37$ during validation (Table 6, Fig. 3). As shown in Fig. 4, the main features of the hydrological response were captured reasonably well. However, particularly in the validation period, low flows were somewhat underestimated. Note that in 2013, the observed high flows were probably underestimated due to failures in the recording, which resulted in a truncated top in the hydrograph and flat top in the flow duration curve during the validation time period (Fig. 4) and which affect the validated model performance values (D_E,Qval). The range in the calibrated model performance with respect to each discharge signature separately is visualized in Fig. S1 in the Supplement.

https://hess.copernicus.org/articles/25/957/2021/hess-25-957-2021-f03

Figure 3Model performance with respect to discharge, evaporation and storage for all models. The model is calibrated to discharge (D_E,Qcal, darker boxplots in the first column) and validated to the discharge, evaporation and storage (lighter boxplots). The dots represent the model performance using the “optimal” parameter set and the boxplot the range of the best 5 % solutions, both according to discharge (D_E,Qcal). The following performance metrics were used: (1) discharge using the overall model performance metric (D_E,Qcal for calibration and D_E,Qval for validation), (2) temporal evaporation basin-average ( $E_{NS, Basin, E}$ ), (3) temporal evaporation, wetland areas only ( $E_{NS, Wetland, E}$ ), (4) spatial evaporation (E_SP,E), (5) temporal storage basin-average ( $E_{NS, Basin, S}$ ), (6) spatial storage (E_SP,S) and (7) the combination of evaporation, storage and discharge (combined metric D_E,ESQval).

Download

https://hess.copernicus.org/articles/25/957/2021/hess-25-957-2021-f04

Figure 4Range of model solutions for Model A. Panel (a) shows the hydrograph and (b, c) the flow duration curve of the recorded (black) and modelled discharge: the line indicates the solution with the highest calibration objective function with respect to discharge (D_E,Qcal) and the shaded area the envelope of the solutions retained as feasible. The data in the white area were used for calibration and the grey shaded area for validation.

Download

The basin-average evaporation ( $E_{NS, Basin, E} = 0.54$ ) and total water storage anomalies ( $E_{NS, Basin, S} = 0.74$ ) were in general also reproduced rather well (Figs. S3 and S5). In contrast, the model failed to mimic the evaporation dynamics in wetland-dominated areas as it decreased rapidly to zero in the dry season in contrast to the observations ( $E_{NS, Wetland, E} = 0.25$ , Fig. 5). Similarly, the spatial variability in evaporation ( $E_{SP, E} = 0.17$ ) and water storage anomalies ( $E_{SP, S} = - 0.02$ ) were poorly captured as several areas were over- or underestimated (Figs. 6 and 7). Note that in both figures the normalized evaporation and total water storage anomalies were plotted by applying Eq. (33) to emphasize relative spatial differences rather than absolute values.

https://hess.copernicus.org/articles/25/957/2021/hess-25-957-2021-f05

Figure 5Range of model solutions for Model A. Panel (a) shows the time series and (b, c) the duration curve of the recorded (black) and modelled normalized evaporation for wetland-dominated areas: the line indicates the solution with the highest calibration objective function with respect to discharge (D_E,Qcal) and the shaded area the envelope of the solutions retained as feasible. The data in the grey shaded area were used for validation.

Download

https://hess.copernicus.org/articles/25/957/2021/hess-25-957-2021-f06

Figure 6Spatial variability of the normalized total evaporation for Model A averaged over all days within the dry season. Panel (a) shows the observation according to WaPOR data, (b) the model result using the “optimal” parameter set with respect to discharge (D_E,Qcal) and (c) the difference between the observation and model.

Download

https://hess.copernicus.org/articles/25/957/2021/hess-25-957-2021-f07

Figure 7Spatial variability of the normalized total water storage anomalies for Model A averaged over all days within the dry season. Panel (a) shows the observation according to GRACE data, (b) the model result using the “optimal” parameter set with respect to discharge (D_E,Qcal) and (c) the difference between the observation and model.

Download

https://hess.copernicus.org/articles/25/957/2021/hess-25-957-2021-f08

Figure 8Model performance with respect to discharge, evaporation and storage for all models. The model is calibrated to multiple variables simultaneously (D_E,ESQcal, darker boxplots in the first column) and evaluated with respect to each flux individually (lighter boxplots). The dots represent the model performance using the “optimal” parameter set and the boxplot the range of the best 5 % solutions, both according to D_E,ESQcal. The following performance metrics were used: (1) discharge using the overall model performance metric (D_E,Qval), (2) temporal evaporation basin-average ( $E_{NS, Basin, E}$ ), (3) temporal evaporation, wetland areas only ( $E_{NS, Wetland, E}$ ), (4) spatial evaporation (E_SP,E), (5) temporal storage basin-average ( $E_{NS, Basin, S}$ ), (6) spatial storage (E_SP,S) and (7) the combination of evaporation, storage and discharge (combined metric D_E,ESQval).

Download

4.1.2 Multi-variable calibration

Calibrating with respect to multiple variables simultaneously in the second calibration strategy resulted in a reduced model skill in simultaneously reproducing all flow signatures in the validation period with $D_{E, Q val} = 0.07$ (Table 7, Figs. 8 and 9). Compared to the first calibration strategy, the simulated evaporation did not change significantly with respect to the temporal dynamics ( $E_{NS, Wetland, E} = 0.27$ , $E_{NS, Basin, E} = 0.57$ ) and spatial pattern ( $E_{SP, E} = - 0.18$ ). Evaporation from wetland-dominated areas remained underestimated in the dry season (Fig. 10), and large areas in the basin were still under- or overestimated (Fig. 11). The reproduction of the total water storage anomalies decreased though, mostly with respect to the spatial pattern ( $E_{SP, S} = - 0.14$ , Fig. 12). On the other hand, when looking at the 5th and 95th percentile range instead of the “optimal” parameter set, then an improvement was observed in the spatial pattern in evaporation ( $E_{SP, E, 5 / 95} = - 0.10$ –0.22) and in total water storage ( $E_{SP, S, 5 / 95} = - 0.17$ –0.08; compare Tables 6 and 7).

https://hess.copernicus.org/articles/25/957/2021/hess-25-957-2021-f09

Figure 9Range of model solutions for Models A to F. The left panels show the hydrograph and the right panels the flow duration curve of the recorded (black) and modelled discharge: the line indicates the solution with the highest calibration objective function with respect to multiple variables (D_E,ESQcal) and the shaded area the envelope of the solutions retained as feasible. The data in the grey shaded area were used for validation.

Download

https://hess.copernicus.org/articles/25/957/2021/hess-25-957-2021-f10

Figure 10Range of model solutions for Models A to F. The left panels show the time series and the right panels the duration curve of the recorded (black) and modelled normalized evaporation for wetland-dominated areas: the line indicates the solution with the highest calibration objective function with respect to multiple variables (D_E,ESQcal) and the shaded area the envelope of the solutions retained as feasible. The data in the grey shaded area were used for validation.

Download

https://hess.copernicus.org/articles/25/957/2021/hess-25-957-2021-f11

Figure 11Spatial variability of the normalized total evaporation for Models A, C and F averaged over all days within the dry season. Panels (a, d, g) show the observation according to WaPOR data, (b, e, h) the model result using the “optimal” parameter set with respect to multiple variables (D_E,ESQcal) and (c, f, i) the difference between the observation and model.

Download

https://hess.copernicus.org/articles/25/957/2021/hess-25-957-2021-f12

Figure 12Spatial variability of the normalized total water storage anomalies for Models A, C and F averaged over all days within the dry season. Panels (a, d, g) show the observation according to GRACE data, (b, e, h) the model result using the “optimal” parameter set with respect to multiple variables (D_E,ESQcal) and (c, f, i) the difference between the observation and model.

Download

Table 6Summary of model performance with respect to evaporation ( $E_{NS, Basin, E}$ , $E_{NS, Wetland, E}$ and E_SP,E), total water storage anomalies ( $E_{NS, Basin, S}$ or E_SP,S), discharge (D_E,Qcal and D_E,Qval) and all variables combined (D_E,ESQval): the parameter sets were selected based on discharge (D_E,Qcal).

Download Print Version | Download XLSX

Table 7Summary of the model performance with respect to evaporation ( $E_{NS, Basin, E}$ , $E_{NS, Wetland, E}$ and E_SP,E), total water storage anomalies ( $E_{NS, Basin, S}$ or E_SP,S), discharge (D_E,Qval) and all variables combined (D_E,ESQval): parameter sets selected based on multiple variables simultaneously (D_E,ESQcal).

Download Print Version | Download XLSX

4.1.3 Model deficiencies

Regardless of the calibration strategy, the benchmark model failed in particular to adequately reproduce evaporation dynamics in wetland-dominated areas. During the dry seasons, the modelled evaporation decreased rapidly to zero in contrast to the observations (Figs. 5 and 10). Partly as a consequence of that, the spatial pattern of evaporation was captured poorly, as illustrated in Figs. 6 and 11. Apart from the wetlands, the modelled average dry season evaporation was also extremely low in the centre of the basin, which did not correspond with the satellite observations. At the same time, the evaporation was significantly overestimated in the southern part of the basin. Also the spatial pattern in total water storage anomalies were poorly represented since the model significantly overestimated storage anomalies in large parts of the basin (Figs. 7 and 12). Note, overestimations in specific regions do not necessarily mean the actual (non-normalized) model values were also higher compared to the observation, but it does mean the model results in this cell/region were high relative to the remainder of the basin compared to observations. This was the case for the evaporation and total water storage, even though it was negative during dry seasons (compare Figs. S8 and S9).

4.2 First model adaptation: adding groundwater upwelling (Models B–D)

In the benchmark model (Model A), there was no groundwater upwelling into the wetlands and floodplains around the river channels, similar to many distributed conceptual hydrological models (e.g. Samaniego et al., 2010; Bieger et al., 2017). However, according to field and satellite-based observations, wetland areas remain moist at the end of the dry season, while the remaining areas of the basin become very dry. Given the low elevation of these wetlands above rivers, it is plausible to assume that groundwater from higher parts of the catchment is pushed up into the unsaturated root zone of these wetlands. As a result, water deficits in the unsaturated zone are partly replenished by upwelling groundwater. It thereby can sustain relatively elevated levels of moisture, available for plant transpiration long into the dry season.

To improve the representation of evaporation in the model, the process of upwelling groundwater (R_GW) was added to the model. In principle, it was assumed that the upwelling groundwater is regulated by the head difference between upland groundwater and the groundwater in the wetland. As this information was not available, due to the lack of continuous gradients in the type of model used (Hrachowitz and Clark, 2017), this was done in a simplified way. In three alternative formulations of this hypothesis, the upwelling groundwater was made (1) a linear function of the water content in the unsaturated reservoir (Model B, Eq. 9), (2) a linear function of the water content in the slow-responding reservoir (Model C, Eq. 10) and (3) a non-linear function of the water content in the slow-responding reservoir (Model D, Eq. 11). In other words, in Model B the groundwater upwelling was driven by the water deficit in the unsaturated zone; hence the lower the water content in the unsaturated zone, the higher the groundwater upwelling. In Models C and D, the groundwater upwelling was driven by the water content in the slow-responding reservoir, the groundwater system, such that the higher the water content in the slow-responding reservoir, the higher the groundwater upwelling. As a result of the non-linear relation between the groundwater upwelling and the water content in the slow-responding reservoir in Model D, the groundwater upwelling increased the most under dry conditions and the least under wet conditions. In Models B–D, the groundwater upwelling flowed into the unsaturated zone until it was saturated, hence until its maximum S_u,max was reached (Eq. 12). Model B required one additional calibration parameter, Model C two and Model D three (Tables 2 and 3).

4.2.1 Discharge-based calibration

Following the first calibration strategy, the performances of Models B–D with respect to discharge did not improve significantly for the calibration period ( $D_{E, Q cal} = 0.75$ –0.79) compared to Model A, regardless of the model (Table 6, Figs. 3 and S2). For the validation period, Models B and D experienced a pronounced reduction of their ability to adequately reproduce the discharge signatures with $D_{E, Q val} = 0.08$ and −1.7, respectively, since the flows were mostly underestimated (Fig. S2). On the other hand, Model C showed significant improvements with $D_{E, Q val} = 0.81$ . With respect to the evaporation from wetland-dominated areas, the largest improvements were found for Model D ( $E_{NS, Wetland, E} = 0.41$ ), where the evaporation did not drop rapidly to zero anymore, even though it was still significantly underestimated in the dry season (Fig. S4). But this came at the cost of decreased simulations of all remaining variables (Table 6, Fig. 3), i.e. the discharge, basin-average evaporation and total water storage and their spatial patterns (Figs. S2–S7). For example Fig. S6 illustrates the poorly simulated temporally averaged dry season evaporation for Model D, which was higher in wetland areas (centre of the basin) compared to the surrounding areas and which was not observed in the satellite-based observations. For Models B and C, the model performances with respect to the remaining variables remained comparable to Model A or even decreased, as can be seen in Table 6 and Fig. 3. As a result, when considering all variables simultaneously, Model C performed the best, with $D_{E, ES Q val} = 0.32$ .

4.2.2 Multi-variable calibration

Following the second calibration strategy, Model C experienced the largest increases compared to Model A in its ability to describe features of discharge, with $D_{E, Q val} = 0.61$ , while Model D decreased the most to $D_{E, Q val} = - 0.08$ , with the high flows being overestimated and low flows underestimated (Table 7, Figs. 8 and 9). With this calibration strategy, large improvements were observed in the reproduction of the evaporation from wetland-dominated areas for all three Models B–D, especially for Model D, with $E_{NS, Wetland, E} = 0.59$ , where the evaporation was simulated well, even during the dry season as it did not decrease rapidly to zero in the dry season compared to Model A (Fig. 10). For Models C and D, the spatial pattern in evaporation and total water storage anomalies improved, albeit moderately (Table 7), as large areas were still under- or overestimated (Figs. S12 and S13), whereas it decreased slightly for Model B. For Models B–D, the basin-average temporal dynamics in evaporation and total water storage anomalies remained similar or decreased slightly (Table 7, Figs. S10 and S11). Overall, when considering the model performance with respect to all variables simultaneously, Model C showed the best performance, with $D_{E, ES Q val} = 0.33$ .

4.2.3 Model deficiencies

According to the results, the representation of evaporation strongly benefitted from including upwelling groundwater as a function of the water content in the slow-responding reservoir (Eq. 10; Model C), especially for the second calibration strategy. The incorporation of this flux resulted in increased levels of water supply to the unsaturated zone of wetlands to sustain higher levels of transpiration throughout the dry periods (Fig. 10). But even though the evaporation increased during dry periods, it was still underestimated, especially towards the end of the dry season, due to too large an amount of groundwater upwelling depleting the slow-responding reservoir. The major weakness of the model remained its very limited ability in representing the spatial pattern in evaporation as there were several local clusters of considerable mismatches, both over- and underestimating observed evaporation. This was clearly visible for example in the centre and southern part of the basin (Fig. 11). Also the spatial pattern in the total water storage anomalies remained poorly represented, in spite of some improvements compared to Model A, as they were considerably overestimated in the northern parts of the basin (Fig. 12). This could be a result of deficiencies in the hydrological models or in the satellite-based observations.

4.3 Second model adaptation: discretizing the groundwater system (Models E and F)

In all above models, the groundwater layer was simulated as a single lumped reservoir, assuming equal groundwater availability throughout the entire basin. As groundwater processes can occur on relatively large spatial scales, this assumption may be valid for small-scale or mesoscale catchments but not necessarily for larger basins such as the Luangwa basin. This may partly be responsible for the deficiency of all above models to meaningfully reproduce the spatial pattern of the total water storage. Taking Model C as a basis for further model adaptations, two more alternative model hypothesis were formulated. In both models the slow-responding reservoir, representing the groundwater, was spatially discretized. For Model E, the reservoir was split into four units with an area of 15 396–47 239 km², each containing four to six different GRACE cells (see Fig. 1a). In contrast, Model F was formulated with a completely distributed slow reservoir at the resolution of the remaining parts of the model, i.e. 10×10 km². In Models E and F, the slow reservoir timescales K_s remained constant throughout the basin to limit the number of calibration parameters. Models E and F did not require additional calibration parameters. See Tables 2 and 3 for the corresponding model equations and calibration parameter ranges.

4.3.1 Discharge-based calibration

Following the first calibration strategy, the calibrated and validated model performance with respect to discharge did not change significantly for Model E compared to Model C. For Model F on the other hand, the calibrated model performance increased to $D_{E, Q cal} = 0.91$ (Table 6, Figs. 3 and S2), but during validation it decreased to $D_{E, Q val} = 0.52$ compared to Model C as a result of overestimated high flows (Fig. S2). In other words, the discharge simulation was only affected when applying a fully distributed groundwater system (Model F). Also the simulated dynamics of the evaporation improved for Model F, especially for wetland-dominated areas ( $E_{NS, Wetland, E} = 0.56$ , Table 6), even though it remained significantly underestimated during the dry season (Fig. S4). But for both models, no improvements in the spatial pattern of evaporation can be observed, with $E_{SP, E} = 0.05$ and −0.03 for Models E and F, respectively. As shown in Fig. S6, for Model E and F the temporally averaged dry season evaporation was very low in the centre of the basin compared to the remaining part of the basin in contrast to the satellite-based observations. The spatial pattern of total water storage anomalies was at least slightly better mimicked by Model F with $E_{SP, S} = 0.08$ (Fig. S7), which, in turn, came at the price of a poorer reproduction of the temporal dynamics of the basin-averaged total water storage anomalies ( $E_{NS, Basin, S} = 0.66$ , Fig. S5).

4.3.2 Multi-variable calibration

Including multiple variables in the calibration process did not improve the representation of the hydrological response with respect to discharge for Models E and F compared to Model C with $D_{E, Q val} = 0.30$ and 0.51, respectively (Table 7, Figs. 8 and 9). For both models, the flows were underestimated during low flows and overestimated during high flows (Fig. 9). Also the evaporation from wetland-dominated areas did not improve for both models as it decreased rapidly in the dry season (Fig. 10). On the other hand, the spatial pattern in the evaporation was slightly better mimicked for Model F ( $E_{SP, E} = 0.23$ ) but still at low performance levels similar to Models A–D, with large areas still being under- or overestimated (Fig. S12). Slight improvements could be observed though for the representation of spatial pattern in total water storage in Models F ( $E_{SP, S} = 0.09$ , Fig. S13), albeit modestly. Overall, when considering the model performance with respect to all variables simultaneously, Model F showed the best performance, with $D_{E, ES Q val} = 0.37$ .

4.3.3 Model deficiencies

Applying the second calibration strategy, Model F poorly reproduced the evaporation from wetlands (Fig. 10) since the water availability for evaporation decreased rapidly in the dry season due to the limited water availability in the slow-responding reservoir. This was a direct result of the limited connectivity in the distributed groundwater system within the basin and very likely points to the presence of contiguous groundwater systems extending beyond the modelling resolution that sustain dry-season evaporation in wetlands. Strikingly, discretizing the groundwater basin only had limited effects on the spatial pattern in evaporation and total water storage anomalies. Despite their limited improvements, they remained poorly captured as several local clusters were over- and underestimated (Figs. 11 and 12).

5 Discussion

As illustrated in the previous sections, satellite-based evaporation and storage anomaly data were used in an attempt to (1) iteratively improve a benchmark model structure and (2) identify parameter sets with which the model can simultaneously reproduce the temporal dynamics as well as the spatial patterns of multiple flux and storage variables.

The results suggested that among the tested models, Models C and F provided the overall best representation of the hydrological processes in the Luangwa basin, following the first and second calibration strategy respectively. The addition of upwelling groundwater alone (Model C) significantly improved the discharge simulations during validation regardless of the calibration strategy and the simulation of evaporation from wetland areas following the second calibration strategy. Discretizing the slow-responding reservoir (Model F) reached reasonable overall performance levels, i.e. D_E,ESQval, when calibrating with respect to discharge and its signatures only (Fig. 3), with improved simulations of evaporation from wetland areas. But calibrating with respect to multiple variables proved instrumental as it allowed the spatial pattern of the evaporation to be improved compared to calibrating with respect to discharge (Figs. 11 and S12) while maintaining high levels for the other performance criteria (Fig. 8). In general, it could also be observed that a further discretization of the model led to a better representation of the system especially with respect to the spatial patterns. Nevertheless, while the model structure and calibration strategy did influence the spatial pattern in the evaporation (Figs. S6 and S12) and total water storage anomalies (Figs. S7 and S13), none of the tested models could adequately reproduce the observed spatial pattern, which could be a result of model deficiencies or uncertainties in the satellite-based spatial patterns.

A potential reason for the models' problems in meaningfully describing the spatial pattern of the evaporation was in this study the use of the same parameters within a specific HRU in different model grid cells, as also observed in previous studies (Stisen et al., 2018). As a result, the simulated spatial pattern was strongly influenced by the catchment classification method into distinct HRUs. In this study, the catchment was classified merely on the basis of topography into flat, sloped and wetland areas, whereas ecosystem diversity could also be considered as an additional layer in the classification. The poor representation of the spatial pattern in total water storage was also partly linked to that. Spatially distributing calibration parameters could improve the modelled spatial pattern, assuming there are sufficient data available to meaningfully constrain the increased number of calibration parameters and thus to avoid elevated equifinality. In a preliminary test, the maximum interception storage (I_max) was spatially distributed using a linear transfer function with LAI (leaf area index) data similar to previous studies (Samaniego et al., 2010; Kumar et al., 2013) and using Model F as a basis. This did not result in obvious improvements as shown in Fig. S14. It was considered outside the scope of this study to analyse additional parameter distribution strategies with the limited data availability in this study region.

Another likely reason for the poorly modelled spatial pattern is the absence of lateral exchange of subsurface water between model grid cells in the tested models, as contiguous groundwater bodies of varying but unknown spatial scale will shape water transfer through the landscape in the real world, which remains unaccounted for in the model. Lateral exchange fluxes are, as any flux, driven by continuous gradients and resistances. However, conceptual-type models, such as the one used in this study, only mimic gradients within grid cells but not between grid cells. As a result, the head difference between neighbouring cells remains unknown, which entails that the direction and magnitude of lateral exchange between cells are unknown. Consequently, these fluxes can only be expressed on the basis of free calibration parameters. However, in this data-scarce region it will not be possible to test whether the additional calibration parameters and the associated exchange fluxes are physically plausible. These unspecified boundary fluxes across grid cells are at the core of the closure problem (Beven, 2006a) and touch on the limits of what can be done in hydrology with our current observational technology and the available data. Therefore, adding lateral exchange flow to the model was considered outside the scope of this study.

In addition, each of the applied data sources have their own uncertainties and bias. These include uncertainties in observed discharge due to rating curve uncertainties (Westerberg et al., 2011; Domeneghetti et al., 2012; Tomkins, 2014) and limited data availability, in precipitation data, often as a result of poorly capturing mountainous regions or extreme events on small scales (Hrachowitz and Weiler, 2011; Kimani et al., 2017; Dinku et al., 2018; Le Coz and van de Giesen, 2019), in estimates of total water storage anomalies as a result of data (post-)processing including data smoothing using a radius of for example 300 km, affecting the spatial variability on a basin scale (Landerer and Swenson, 2012; Blazquez et al., 2018), and in evaporation data due to model, input data and parameter estimation uncertainties (Zhang et al., 2016). In general satellite products are a result of models that are prone to uncertainties related to the input data or model conceptualization. Uncertainties in for example the spatial pattern of the precipitation affect the spatial pattern of the evaporation considerably, as shown in Fig. S15. In an ideal situation, the data would be validated with field measurements to assess the error magnitude. However, this was not possible due to data limitations. To compensate for bias errors in the satellite-based evaporation, and to allow for more reliable comparisons with model results, the satellite-based evaporation was adjusted with a correction factor of 1.08 (Sect. 2.1.2). Correcting the precipitation in a similar manner instead of the evaporation did not significantly affect the model results since normalized values were used for model calibration and evaluation (Fig. S16).

The results in this study were sensitive to the choice of performance metrics with respect to the individual variables (discharge, evaporation and total water storage) and all variables combined. For instance, the overall model performance measure D_E,ESQval (Eq. 32) was strongly influenced by the validated discharge model performance D_E,Qval due to its large range and variation between models compared to the remaining variables, for which the range was smaller and similar for all models (Fig. 8). As a result, the overall model performance measure might not reflect each variable equally well, which affected the choice of best performing model. However, this did not cause the poorly reproduced spatial pattern in the evaporation as it remained poorly modelled also when calibrating only with respect to that variable (E_SP,E, Fig. S17). In addition, the histogram component (γ) in the spatial efficiency metric (E_SP, Eq. 30) becomes less meaningful for very coarse resolutions when the river basin consists of only a few grid cells, as was the case for GRACE. It would be interesting to examine the different components in E_SP in more detail in future studies to assess the overall suitability of this metric to identify feasible parameter sets across different spatial scales.

Reflecting the results of previous studies, this study found that calibrating to multiple variables including the spatial patterns improved the simulation of the evaporation and storage with some trade-off in the discharge simulation depending on the model structure (Stisen et al., 2011; Rientjes et al., 2013; Demirel et al., 2018; Herman et al., 2018; 2018; Dembélé et al., 2020). But in contrast and in addition to previous studies, this study also provided an example, illustrating that spatial data, here evaporation and total water storage, can contain relevant information to diagnose model deficiencies and to therefore enable stepwise model structural improvement. Previous studies have largely relied on discharge observations to improve model structures (Hrachowitz et al., 2014; Fenicia et al., 2016), and only few studies used satellite data (Roy et al., 2017), even though they provide valuable information on the internal processes temporally and spatially, which is not available with discharge data alone (Daggupati et al., 2015; Rakovec et al., 2016). Roy et al. (2017) observed that the simulated evaporation according to the spatially lumped model HYMOD (HYdrological MODel) rapidly dropped to zero in contrast to the satellite product GLEAM (Global Land Evaporation Amsterdam Model) in the Nyangores River basin in Kenya. They improved this simulated evaporation while maintaining good discharge performances by modifying the corresponding equation in HYMOD such that it was a function of the soil moisture.

While here we focussed on upwelling groundwater and spatial discretization, a promising avenue for future studies may be to evaluate the incorporation of simple formulations of subsurface exchange fluxes between model grid cells. Similarly, a further discretization of HRUs into different land cover and ecosystem types may be worthwhile. In addition, a systematic sensitivity analysis is recommended to explore the influence of individual factors such as model structure and parameters on the spatial and temporal variability of different variables and to further improve the representation of the hydrological processes.

6 Conclusion

The objective of this paper was to explore the added value of satellite-based evaporation and total water storage anomaly data to increase the understanding of hydrological processes through stepwise model structure improvement and model calibration for large river systems in a semi-arid, data-scarce region. For this purpose, a distributed process-based hydrological model with sub-grid process heterogeneity for the Luangwa River basin was developed and iteratively adjusted. The results suggested that (1) the benchmark model (Model A) calibrated with respect to discharge reproduced observed discharge but also basin-average evaporation and total water storage anomalies rather well, while poorly capturing the evaporation for wetland-dominated areas as well as the spatial pattern of evaporation and total water storage anomalies. Testing five further alternative model structures (Models B–F), it was found that (2) among the tested model hypotheses, Model F, allowing for upwelling groundwater from a distributed representation of the groundwater reservoir and (3) simultaneously calibrating this model with respect to multiple variables, i.e. discharge, evaporation and total water storage anomalies, resulted in marked improvements of the model performance, providing the best simultaneous representation of all these variables with respect to their temporal dynamics and spatial pattern. As a result of the limited data availability and model hypotheses tested in this study, it should be noted that Model F allowed for the rejection of alternative hypotheses tested here but may be rejected in future studies in favour of another hypothesis. However, this study illustrated that satellite-based evaporation and total water storage anomaly data are not only valuable for multi-criteria calibration, but can also play an important role in improving our understanding of hydrological processes through the diagnosis of model deficiencies and stepwise model structural improvement.

Appendix A: Abbreviations

CHIRPS	Climate Hazards Group InfraRed Precipitation with Station
CMRSET	CSIRO MODIS reflectance scaling evapotranspiration
CRU	Climatic Research Unit
CSIRO	Commonwealth Scientific and Industrial Research Organization
FAO	Food and Agriculture Organization
GEOS	Goddard Earth Observing System model
GMTED	Global Multi-resolution Terrain Elevation Data
GRACE	Gravity Recovery and Climate Experiment
HRU	Hydrological response unit
MERRA	Modern-Era Retrospective analysis for Research and Applications
MODIS	Moderate Resolution Imaging Spectroradiometer
NDVI	Normalized Difference Vegetation Index
SSEBop	Operational Simplified Surface Energy Balance
WaPOR	Water Productivity Open-access portal

Data availability

Local discharge data were provided by WARMA (Water Resources Management Authority) in Zambia.

Author contributions

PH and MH designed the experiment. PH did the analysis. PH and MH wrote the first draft. All the authors discussed the results and contributed to writing the final paper.

Competing interests

The authors declare that they have no conflict of interest.

Acknowledgements

This research is supported by the TU Delft|Global Initiative, a programme of the Delft University of Technology to boost Science and Technology for Global Development.

Financial support

This research has been supported by the NWO-WOTRO (grant no. W 07.303.102).

Review statement

This paper was edited by Markus Weiler and reviewed by Simon Stisen and one anonymous referee.

References

Beven, K.: Searching for the Holy Grail of scientific hydrology: Q_t=(S, R, Δt)A as closure, Hydrol. Earth Syst. Sci., 10, 609–618, https://doi.org/10.5194/hess-10-609-2006, 2006a.

Beven, K. J.: A manifesto for the equifinality thesis, J. Hydrol., 320, 18–36, https://doi.org/10.1016/j.jhydrol.2005.07.007, 2006b.

Bieger, K., Arnold, J. G., Rathjens, H., White, M. J., Bosch, D. D., Allen, P. M., Volk, M., and Srinivasan, R.: Introduction to SWAT+, A Completely Restructured Version of the Soil and Water Assessment Tool, J. Am. Water Resour. Assoc., 53, 115–130, https://doi.org/10.1111/1752-1688.12482, 2017.

Blazquez, A., Meyssignac, B., Lemoine, J. M., Berthier, E., Ribes, A., and Cazenave, A.: Exploring the uncertainty in GRACE estimates of the mass redistributions at the Earth surface: implications for the global water and sea level budgets, Geophys. J. Int., 215, 415–430, https://doi.org/10.1093/gji/ggy293, 2018.

Blöschl, G. and Sivapalan, M.: Scale issues in hydrological modelling: A review, Hydrol. Process., 9, 251–290, https://doi.org/10.1002/hyp.3360090305, 1995.

Bouaziz, L. J. E., Weerts, A., Schellekens, J., Sprokkereef, E., Stam, J., Savenije, H., and Hrachowitz, M.: Redressing the balance: quantifying net intercatchment groundwater flows, Hydrol. Earth Syst. Sci., 22, 6415–6434, https://doi.org/10.5194/hess-22-6415-2018, 2018.

Clark, M. P., Rupp, D. E., Woods, R. A., Zheng, X., Ibbitt, R. P., Slater, A. G., Schmidt, J., and Uddstrom, M. J.: Hydrological data assimilation with the ensemble Kalman filter: Use of streamflow observations to update states in a distributed hydrological model, Adv. Water Resour., 31, 1309–1324, https://doi.org/10.1016/j.advwatres.2008.06.005, 2008.

Clark, M. P., Kavetski, D., and Fenicia, F.: Pursuing the method of multiple working hypotheses for hydrological modeling, Water Resour. Res., 47, W09301, https://doi.org/10.1029/2010WR009827, 2011.

Daggupati, P., Yen, H., White, M. J., Srinivasan, R., Arnold, J. G., Keitzer, C. S., and Sowa, S. P.: Impact of model development, calibration and validation decisions on hydrological simulations in West Lake Erie Basin, Hydrol. Process., 29, 5307–5320, https://doi.org/10.1002/hyp.10536, 2015.

Danielson, J. J. and Gesch, D. B.: Global multi-resolution terrain elevation data 2010 (GMTED2010), in: Open-File Report 2011-1073, US Geological Survey, Reston, Virginia, 2011.

Dembélé, M., Hrachowitz, M., Savenije, H. H. G., Mariéthoz, G., and Schaefli, B.: Improving the Predictive Skill of a Distributed Hydrological Model by Calibration on Spatial Patterns With Multiple Satellite Data Sets, Water Resour. Res., 56, e2019WR026085, https://doi.org/10.1029/2019WR026085, 2020.

Demirel, M. C., Mai, J., Mendiguren, G., Koch, J., Samaniego, L., and Stisen, S.: Combining satellite data and appropriate objective functions for improved spatial pattern performance of a distributed hydrologic model, Hydrol. Earth Syst. Sci., 22, 1299–1315, https://doi.org/10.5194/hess-22-1299-2018, 2018.

Dinku, T., Funk, C., Peterson, P., Maidment, R., Tadesse, T., Gadain, H., and Ceccato, P.: Validation of the CHIRPS satellite rainfall estimates over eastern Africa, Q. J. Roy. Meteorol. Soc., 144, 292–312, https://doi.org/10.1002/qj.3244, 2018.

Domeneghetti, A., Castellarin, A., and Brath, A.: Assessing rating-curve uncertainty and its effects on hydraulic model calibration, Hydrol. Earth Syst. Sci., 16, 1191–1202, https://doi.org/10.5194/hess-16-1191-2012, 2012.

Euser, T., Winsemius, H. C., Hrachowitz, M., Fenicia, F., Uhlenbrook, S., and Savenije, H. H. G.: A framework to assess the realism of model structures using hydrological signatures, Hydrol. Earth Syst. Sci., 17, 1893–1912, https://doi.org/10.5194/hess-17-1893-2013, 2013.

Euser, T., Hrachowitz, M., Winsemius, H. C., and Savenije, H. H. G.: The effect of forcing and landscape distribution on performance and consistency of model structures, Hydrol. Process., 29, 3727–3743, https://doi.org/10.1002/hyp.10445, 2015.

FAO: WaPOR Database Methodology: Level 1. Remote Sensing for Water Productivity Technical Report: Methodology Series, FAO, Rome, 74 pp., 2018.

FAO and IHE Delft: WaPOR quality assessment. Technical report on the data quality of the WaPOR FAO database version 1.0, FAO and IHE Delft, Rome, 134 pp., 2019.

Fenicia, F., McDonnell, J. J., and Savenije, H. H. G.: Learning from model improvement: On the contribution of complementary data to process understanding, Water Resour. Res., 44, W06419, https://doi.org/10.1029/2007WR006386, 2008.

Fenicia, F., Kavetski, D., and Savenije, H. H. G.: Elements of a flexible approach for conceptual hydrological modeling: 1. Motivation and theoretical development, Water Resour. Res., 47, W11510, https://doi.org/10.1029/2010WR010174, 2011.

Fenicia, F., Kavetski, D., Savenije, H. H. G., and Pfister, L.: From spatially variable streamflow to distributed hydrological models: Analysis of key modeling decisions, Water Resour. Res., 52, 954–989, https://doi.org/10.1002/2015WR017398, 2016.

Funk, C. C., Peterson, P. J., Landsfeld, M. F., Pedreros, D. H., Verdin, J. P., Rowland, J. D., Romero, B. E., Husak, G. J., Michaelsen, J. C., and Verdin, A. P.: A quasi-global precipitation time series for drought monitoring, in: Data Series 832, US Geological Survey, South Dakota, 4 pp. https://doi.org/10.3133/ds832, 2014.

Gao, H., Hrachowitz, M., Fenicia, F., Gharari, S., and Savenije, H. H. G.: Testing the realism of a topography-driven model (FLEX-Topo) in the nested catchments of the Upper Heihe, China, Hydrol. Earth Syst. Sci., 18, 1895–1915, https://doi.org/10.5194/hess-18-1895-2014, 2014.

Gao, H., Hrachowitz, M., Sriwongsitanon, N., Fenicia, F., Gharari, S., and Savenije, H. H. G.: Accounting for the influence of vegetation and landscape improves model transferability in a tropical savannah region, Water Resour. Res., 52, 7999–8022, https://doi.org/10.1002/2016WR019574, 2016.

Garavaglia, F., Le Lay, M., Gottardi, F., Garçon, R., Gailhard, J., Paquet, E., and Mathevet, T.: Impact of model structure on flow simulation and hydrological realism: from a lumped to a semi-distributed approach, Hydrol. Earth Syst. Sci., 21, 3937–3952, https://doi.org/10.5194/hess-21-3937-2017, 2017.

Getirana, A. C. V.: Integrating spatial altimetry data into the automatic calibration of hydrological models, J. Hydrol., 387, 244–255, https://doi.org/10.1016/j.jhydrol.2010.04.013, 2010.

Gharari, S., Hrachowitz, M., Fenicia, F., Gao, H., and Savenije, H. H. G.: Using expert knowledge to increase realism in environmental system models can dramatically reduce the need for calibration, Hydrol. Earth Syst. Sci., 18, 4839–4859, https://doi.org/10.5194/hess-18-4839-2014, 2014.

Gupta, H. V., Wagener, T., and Liu, Y.: Reconciling theory with observations: elements of a diagnostic approach to model evaluation, Hydrol. Process., 22, 3802–3813, https://doi.org/10.1002/hyp.6989, 2008.

Hargreaves, G. H. and Allen, R. G.: History and evaluation of hargreaves evapotranspiration equation, J. Irrig. Drain. Eng., 129, 53–63, https://doi.org/10.1061/(ASCE)0733-9437(2003)129:1(53), 2003.

Hargreaves, G. H. and Samani, Z. A.: Reference Crop Evapotranspiration from Temperature, Appl. Eng. Agricult., 1, 96–99, https://doi.org/10.13031/2013.26773, 1985.

Herman, M. R., Nejadhashemi, A. P., Abouali, M., Hernandez-Suarez, J. S., Daneshvar, F., Zhang, Z., Anderson, M. C., Sadeghi, A. M., Hain, C. R., and Sharifi, A.: Evaluating the role of evapotranspiration remote sensing data in improving hydrological modeling predictability, J. Hydrol., 556, 39–49, https://doi.org/10.1016/j.jhydrol.2017.11.009, 2018.

Hrachowitz, M. and Clark, M. P.: HESS Opinions: The complementary merits of competing modelling philosophies in hydrology, Hydrol. Earth Syst. Sci., 21, 3953–3973, https://doi.org/10.5194/hess-21-3953-2017, 2017.

Hrachowitz, M. and Weiler, M.: Uncertainty of Precipitation Estimates Caused by Sparse Gauging Networks in a Small, Mountainous Watershed, J. Hydrol. Eng., 16, 460–471, https://doi.org/10.1061/(ASCE)HE.1943-5584.0000331, 2011.

Hrachowitz, M., Fovet, O., Ruiz, L., Euser, T., Gharari, S., Nijzink, R., Freer, J., Savenije, H. H. G., and Gascuel-Odoux, C.: Process consistency in models: The importance of system signatures, expert knowledge, and process complexity, Water Resour. Res., 50, 7445–7469, https://doi.org/10.1002/2014WR015484, 2014.

Hulsman, P., Winsemius, H. C., Michailovsky, C. I., Savenije, H. H. G., and Hrachowitz, M.: Using altimetry observations combined with GRACE to select parameter sets of a hydrological model in a data-scarce region, Hydrol. Earth Syst. Sci., 24, 3331–3359, https://doi.org/10.5194/hess-24-3331-2020, 2020.

Immerzeel, W. W. and Droogers, P.: Calibration of a distributed hydrological model based on satellite evapotranspiration, J. Hydrol., 349, 411–424, https://doi.org/10.1016/j.jhydrol.2007.11.017, 2008.

Jiang, D. and Wang, K.: The role of satellite-based remote sensing in improving simulated streamflow: A review, Water, 11, 1615, https://doi.org/10.3390/w11081615, 2019.

Kavetski, D. and Fenicia, F.: Elements of a flexible approach for conceptual hydrological modeling: 2. Application and experimental insights, Water Resour. Res., 47, W11511, https://doi.org/10.1029/2011WR010748, 2011.

Kimani, W. M., Hoedjes, C. B. J., and Su, Z.: An Assessment of Satellite-Derived Rainfall Products Relative to Ground Observations over East Africa, Remote Sens., 9, 430, https://doi.org/10.3390/rs9050430, 2017.

Kirchner, J. W.: Getting the right answers for the right reasons: Linking measurements, analyses, and models to advance the science of hydrology, Water Resour. Res., 42, W03S04, https://doi.org/10.1029/2005WR004362, 2006.

Kittel, C. M. M., Nielsen, K., Tøttrup, C., and Bauer-Gottwein, P.: Informing a hydrological model of the Ogooué with multi-mission remote sensing data, Hydrol. Earth Syst. Sci., 22, 1453–1472, https://doi.org/10.5194/hess-22-1453-2018, 2018.

Koch, J., Siemann, A., Stisen, S., and Sheffield, J.: Spatial validation of large-scale land surface models against monthly land surface temperature patterns using innovative performance metrics, J. Geophys. Res.-Atmos., 121, 5430–5452, https://doi.org/10.1002/2015JD024482, 2016.

Koch, J., Demirel, M. C., and Stisen, S.: The SPAtial EFficiency metric (SPAEF): Multiple-component evaluation of spatial patterns for optimization of hydrological models, Geosci. Model Dev., 11, 1873–1886, https://doi.org/10.5194/gmd-11-1873-2018, 2018.

Kumar, R., Samaniego, L., and Attinger, S.: Implications of distributed hydrologic model parameterization on water fluxes at multiple scales and locations, Water Resour. Res., 49, 360–379, https://doi.org/10.1029/2012WR012195, 2013.

Landerer, F. W. and Swenson, S. C.: Accuracy of scaled GRACE terrestrial water storage estimates, Water Resour. Res., 48, W04531, https://doi.org/10.1029/2011WR011453, 2012.

Le Coz, C. and van de Giesen, N.: Comparison of rainfall products over sub-Sahara Africa, J. Hydrometeorol., 21, 553–596, https://doi.org/10.1175/JHM-D-18-0256.1, 2019.

Li, Z., Yang, D., Gao, B., Jiao, Y., Hong, Y., and Xu, T.: Multiscale hydrologic applications of the latest satellite precipitation products in the Yangtze river basin using a distributed hydrologic model, J. Hydrometeorol., 16, 407–426, https://doi.org/10.1175/JHM-D-14-0105.1, 2015.

López, P. L., Sutanudjaja, E. H., Schellekens, J., Sterk, G., and Bierkens, M. F. P.: Calibration of a large-scale hydrological model using satellite-based soil moisture and evapotranspiration products, Hydrol. Earth Syst. Sci., 21, 3125–3144, https://doi.org/10.5194/hess-21-3125-2017, 2017.

Mazzoleni, M., Brandimarte, L., and Amaranto, A.: Evaluating precipitation datasets for large-scale distributed hydrological modelling, J. Hydrol., 578, 124076, https://doi.org/10.1016/j.jhydrol.2019.124076, 2019.

Mendiguren, G., Koch, J., and Stisen, S.: Spatial pattern evaluation of a calibrated national hydrological model – a remote-sensing-based diagnostic approach, Hydrol. Earth Syst. Sci., 21, 5987–6005, https://doi.org/10.5194/hess-21-5987-2017, 2017.

Michailovsky, C. I., Milzow, C., and Bauer-Gottwein, P.: Assimilation of radar altimetry to a routing model of the Brahmaputra River, Water Resour. Res., 49, 4807–4816, https://doi.org/10.1002/wrcr.20345, 2013.

Milzow, C., Krogh, P. E., and Bauer-Gottwein, P.: Combining satellite radar altimetry, SAR surface soil moisture and GRACE total storage changes for hydrological model calibration in a large poorly gauged catchment, Hydrol. Earth Syst. Sci., 15, 1729–1743, https://doi.org/10.5194/hess-15-1729-2011, 2011.

Nijzink, R. C., Samaniego, L., Mai, J., Kumar, R., Thober, S., Zink, M., Schäfer, D., Savenije, H. H. G., and Hrachowitz, M.: The importance of topography-controlled sub-grid process heterogeneity and semi-quantitative prior constraints in distributed hydrological models, Hydrol. Earth Syst. Sci., 20, 1151–1176, https://doi.org/10.5194/hess-20-1151-2016, 2016.

Nijzink, R. C., Almeida, S., Pechlivanidis, I. G., Capell, R., Gustafssons, D., Arheimer, B., Parajka, J., Freer, J., Han, D., Wagener, T., van Nooijen, R. R. P., Savenije, H. H. G., and Hrachowitz, M.: Constraining Conceptual Hydrological Models With Multiple Information Sources, Water Resour. Res., 54, 8332–8362, https://doi.org/10.1029/2017WR021895, 2018.

Odusanya, A. E., Mehdi, B., Schürz, C., Oke, A. O., Awokola, O. S., Awomeso, J. A., Adejuwon, J. O., and Schulz, K.: Multi-site calibration and validation of SWAT with satellite-based evapotranspiration in a data-sparse catchment in southwestern Nigeria, Hydrol. Earth Syst. Sci., 23, 1113–1144, https://doi.org/10.5194/hess-23-1113-2019, 2019.

Rajib, A., Evenson, G. R., Golden, H. E., and Lane, C. R.: Hydrologic model predictability improves with spatially explicit calibration using remotely sensed evapotranspiration and biophysical parameters, J. Hydrol., 567, 668–683, https://doi.org/10.1016/j.jhydrol.2018.10.024, 2018.

Rakovec, O., Kumar, R., Attinger, S., and Samaniego, L.: Improving the realism of hydrologic model functioning through multivariate parameter estimation, Water Resour. Res., 52, 7779–7792, https://doi.org/10.1002/2016WR019430, 2016.

Rennó, C. D., Nobre, A. D., Cuartas, L. A., Soares, J. V., Hodnett, M. G., Tomasella, J., and Waterloo, M. J.: HAND, a new terrain descriptor using SRTM-DEM: Mapping terra-firme rainforest environments in Amazonia, Remote Sens. Environ., 112, 3469–3481, https://doi.org/10.1016/j.rse.2008.03.018, 2008.

Revilla-Romero, B., Beck, H. E., Burek, P., Salamon, P., de Roo, A., and Thielen, J.: Filling the gaps: Calibrating a rainfall-runoff model using satellite-derived surface water extent, Remote Sens. Environ., 171, 118–131, https://doi.org/10.1016/j.rse.2015.10.022, 2015.

Rientjes, T. H. M., Muthuwatta, L. P., Bos, M. G., Booij, M. J., and Bhatti, H. A.: Multi-variable calibration of a semi-distributed hydrological model using streamflow data and satellite-based evapotranspiration, J. Hydrol., 505, 276–290, https://doi.org/10.1016/j.jhydrol.2013.10.006, 2013.

Roy, T., Gupta, H. V., Serrat-Capdevila, A., and Valdes, J. B.: Using satellite-based evapotranspiration estimates to improve the structure of a simple conceptual rainfall–runoff model, Hydrol. Earth Syst. Sci., 21, 879–896, https://doi.org/10.5194/hess-21-879-2017, 2017.

SADC: Integrated Water Resources Management Strategy and Implementation Plan for the Zambezi River Basin, Euroconsult Mott MacDonald, available at: http://www.zambezicommission.org/sites/default/files/clusters_pdfs/Zambezi River_Basin_IWRM_Strategy_ZAMSTRAT.pdf (last access: February 2021), 2008.

Samaniego, L., Kumar, R., and Attinger, S.: Multiscale parameter regionalization of a grid-based hydrologic model at the mesoscale, Water Resour. Res., 46, W05523, https://doi.org/10.1029/2008WR007327, 2010.

Savenije, H. H. G.: Topography driven conceptual modelling (FLEX-Topo), Hydrol. Earth Syst. Sci., 14, 2681–2692, https://doi.org/10.5194/hess-14-2681-2010, 2010.

Schleiss, A. J. and Matos, J. P.: Chapter 98: Zambezi River Basin, in: Chow's Handbook of Applied Hydrology, edited by: Singh, V. P., McGraw-Hill Education – Europe, USA, 2016.

Schumann, G., Kirschbaum, D., Anderson, E., and Rashid, K.: Role of Earth Observation Data in Disaster Response and Recovery: From Science to Capacity Building, in: Earth Science Satellite Applications, edited by: Hossain, F., Springer International Publishing, Seattle, USA, 2016.

Seyler, F., Muller, F., Cochonneau, G., Guimarães, L., and Guyot, J. L.: Watershed delineation for the Amazon sub-basin system using GTOPO30 DEM and a drainage network extracted from JERS SAR images, Hydrol. Process., 23, 3173–3185, https://doi.org/10.1002/hyp.7397, 2009.

Stisen, S., McCabe, M. F., Refsgaard, J. C., Lerer, S., and Butts, M. B.: Model parameter analysis using remotely sensed pattern information in a multi-constraint framework, J. Hydrol., 409, 337–349, https://doi.org/10.1016/j.jhydrol.2011.08.030, 2011.

Stisen, S., Koch, J., Sonnenborg, T. O., Refsgaard, J. C., Bircher, S., Ringgaard, R., and Jensen, K. H.: Moving beyond run-off calibration – Multivariable optimization of a surface–subsurface–atmosphere model, Hydrol. Process., 32, 2654–2668, https://doi.org/10.1002/hyp.13177, 2018.

Sun, W., Song, H., Cheng, T., and Yu, J.: Calibration of hydrological models using TOPEX/Poseidon radar altimetry observations, Proc. IAHS, 368, 3–8, https://doi.org/10.5194/piahs-368-3-2015, 2015.

Sun, W., Fan, J., Wang, G., Ishidaira, H., Bastola, S., Yu, J., Fu, Y. H., Kiem, A. S., Zuo, D., and Xu, Z.: Calibrating a hydrological model in a regional river of the Qinghai–Tibet plateau using river water width determined from high spatial resolution satellite images, Remote Sens. Environ., 214, 100–114, https://doi.org/10.1016/j.rse.2018.05.020, 2018.

Swenson, S. C.: GRACE monthly land water mass grids NETCDF RELEASE 5.0, PO.DAAC, CA, USA, 2012.

Swenson, S. C. and Wahr, J.: Post-processing removal of correlated errors in GRACE data, Geophys. Res. Lett., 33, L08402, https://doi.org/10.1029/2005GL025285, 2006.

Tang, X., Zhang, J., Gao, C., Ruben, G. B., and Wang, G.: Assessing the uncertainties of four precipitation products for SWAT modeling in Mekong River Basin, Remote Sens., 11, 304, https://doi.org/10.3390/rs11030304, 2019.

The World Bank: The Zambezi River Basin: A Multi-Sector Investment Opportunities Analysis, in: Volume 3 State of the Basin, The International Bank for Reconstruction and Development, The World Bank, Washington, D.C., 2010.

Tomkins, K. M.: Uncertainty in streamflow rating curves: Methods, controls and consequences, Hydrolo. Process., 28, 464–481, https://doi.org/10.1002/hyp.9567, 2014.

University of East Anglia Climatic Research Unit, Harris, I. C., and Jones, P. D.: CRU TS4.01: Climatic Research Unit (CRU) Time-Series (TS) version 4.01 of high-resolution gridded data of month-by-month variation in climate (Jan. 1901–Dec. 2016), Centre for Environmental Data Analysis, Norwich, UK, https://doi.org/10.5285/58a8802721c94c66ae45c3baa4d814d0, 2017.

van Dijk, A. I. J. M. and Renzullo, L. J.: Water resource monitoring systems and the role of satellite observations, Hydrol. Earth Syst. Sci., 15, 39–55, https://doi.org/10.5194/hess-15-39-2011, 2011.

Vervoort, R. W., Miechels, S. F., van Ogtrop, F. F., and Guillaume, J. H. A.: Remotely sensed evapotranspiration to calibrate a lumped conceptual model: Pitfalls and opportunities, J. Hydrol., 519, 3223–3236, https://doi.org/10.1016/j.jhydrol.2014.10.034, 2014.

Weerasinghe, I., Bastiaanssen, W., Mul, M., Jia, L., and van Griensven, A.: Can we trust remote sensing evapotranspiration products over Africa?, Hydrol. Earth Syst. Sci., 24, 1565–1586, https://doi.org/10.5194/hess-24-1565-2020, 2020.

Werth, S., Güntner, A., Petrovic, S., and Schmidt, R.: Integration of GRACE mass variations into a global hydrological model, Earth Planet. Sc. Lett., 277, 166–173, https://doi.org/10.1016/j.epsl.2008.10.021, 2009.

Westerberg, I., Guerrero, J. L., Seibert, J., Beven, K. J., and Halldin, S.: Stage-discharge uncertainty derived with a non-stationary rating curve in the Choluteca River, Honduras, Hydrol. Process., 25, 603–613, https://doi.org/10.1002/hyp.7848, 2011.

Winsemius, H. C., Savenije, H. H. G., and Bastiaanssen, W. G. M.: Constraining model parameters on remotely sensed evaporation: justification for distribution in ungauged basins?, Hydrol. Earth Syst. Sci., 12, 1403–1413, https://doi.org/10.5194/hess-12-1403-2008, 2008.

Xu, X., Li, J., and Tolson, B. A.: Progress in integrating remote sensing data and hydrologic modeling, Prog. Phys. Geogr., 38, 464–498, https://doi.org/10.1177/0309133314536583, 2014.

ZAMCOM, SADC, and SARDC: Zambezi Environment Outlook 2015, Harare, Gaborone, 2015.

Zhang, K., Kimball, J. S., and Running, S. W.: A review of remote sensing based actual evapotranspiration estimation, Wiley Interdisciplin. Rev.: Water, 3, 834–853, https://doi.org/10.1002/wat2.1168, 2016.

Zink, M., Mai, J., Cuntz, M., and Samaniego, L.: Conditioning a Hydrologic Model Using Patterns of Remotely Sensed Land Surface Temperature, Water Resour. Res., 54, 2976–2998, https://doi.org/10.1002/2017WR021346, 2018.

Articles

Short summary

Satellite observations have increasingly been used for model calibration, while model structural developments largely rely on discharge data. For large river basins, this often results in poor representations of system internal processes. This study explores the combined use of satellite-based evaporation and total water storage data for model structural improvement and spatial–temporal model calibration for a large, semi-arid and data-scarce river system.