A hybrid data-driven approach to analyze the drivers of lake level dynamics

Somogyvári, Márk; Scherer, Dieter; Bart, Frederik; Fehrenbach, Ute; Okujeni, Akpona; Krueger, Tobias

doi:https://doi.org/10.5194/hess-28-4331-2024

Articles | Volume 28, issue 18

Research article

20 Sep 2024

Abstract

Lakes are directly exposed to climate variations as their recharge processes are driven by precipitation and evapotranspiration, and they are also affected by groundwater trends, changing ecosystems and changing water use.

In this study, we present a downward model development approach that uses models of increasing complexity to identify and quantify the dependence of lake level variations on climatic and other factors. The presented methodology uses high-resolution gridded weather data inputs that were obtained from dynamically downscaled ERA5 reanalysis data. Previously missing fluxes and previously unknown turning points in the system behavior are identified via a water balance model. The detailed lake level response to weather events is analyzed by calibrating data-driven models over different segments of the data time series. Changes in lake level dynamics are then inferred from the parameters and simulations of these models.

The methodology is developed and presented for the example of Groß Glienicker Lake, a groundwater-fed lake in eastern Germany that has been experiencing increasing water loss in the last half-century. We show that lake dynamics were mainly controlled by climatic variations in this period, with two systematically different phases in behavior. The increasing water loss during the last decade, however, cannot be accounted for by climate change. Our analysis suggests that this alteration is caused by the combination of regional groundwater decline and vegetation growth in the catchment area, with some additional impact from changes in the local rainwater infrastructure.

Received: 14 Sep 2023 – Discussion started: 28 Nov 2023 – Revised: 10 Jul 2024 – Accepted: 29 Jul 2024 – Published: 20 Sep 2024

1 Introduction

One of the most visible effects of climate change in recent years has been the decline in surface water levels, especially in lakes (Woolway et al., 2020). However, not all lakes react to changes in climate in the same way; some are more exposed to climate variations, while others are more exposed to anthropogenic effects (Mason et al., 1994). Understanding the drivers of lake level dynamics and their importance is thus essential for the development of mitigation measures or conservation strategies.

The response of lake levels to changing meteorological conditions has been a focus of research for many decades, but in recent years, an increased interest in water availability has broadened this research topic, with more and more cases waiting for practical solutions (Kebede et al., 2006; Schulz et al., 2020; Getachew et al., 2021; Woolway et al., 2020). This broadened interest often comes with the challenge of limited data availability, especially in remote areas or at the beginning of research campaigns (Woolway et al., 2020; Altunkaynak, 2007; Solomatine and Ostfeld, 2008). Hence, a practical approach that can work in such conditions is needed.

Here, we revisit the downward model development approach of Sivapalan et al. (2003) as a way of tailoring hydrological models to the data availability. Downward model development starts from large-scale, low-complexity models and then progresses to the smaller-scale processes (Hrachowitz and Clark, 2017). We demonstrate that such a downward model development approach is suitable for lake system understanding, with the goal of identifying the key drivers of lake level dynamics. In our case, these drivers may be climatic variations or changes in natural water fluxes due to land cover changes or groundwater trends but could also be changes in water use and water infrastructure. We propose a hybrid data-driven methodology, where the system understanding gained at a specific level of model complexity is used for the design of the higher-complexity models. We start with a water balance model, which then informs the development of a linear regression model operating at a higher temporal resolution. The development may then continue with non-linear models (e.g., artificial neural networks) or with the higher-temporal-resolution or spatially distributed models of the catchment. The downward modeling can also fit organically into the development of process-based models, as will be shown.

Water balance modeling, as in our case, is often an initial step in understanding hydrological systems. Water balance models do not require a complete system understanding to function properly (Xu and Singh, 1998) but can capture the macro-scale behavior of the system very well based on a set of influxes and outfluxes. On a monthly timescale, these models only need a handful of hydrological variables and will therefore often work in cases where knowledge is limited.

Mason et al. (1994) used water balance modeling to simulate the responses of the largest closed lakes around the globe and showed that lakes act as natural low-pass filters over any sudden variation in aridity. Crapper et al. (1996) used such models to predict future levels of Lake Goran in Australia by taking the cumulative sum of the predicted storage change from the model. Kebede et al. (2006) used a monthly water balance model at Lake Tana and identified that the main driver of lake level change was the variation in rainfall and not human-induced activities. Schulz et al. (2020) showed that variations in the levels of Lake Urmia were mainly driven by climate, while local agricultural water extraction had little effect on the overall trend. However, the authors also showed that, even without affecting the trend or even some dynamic variations, the abstractions weakened the resilience of the lake in relation to climatic changes and that the lake levels could be stabilized by limiting abstraction rates.

The main issue in water balance modeling stems from its highly simplified nature: while these models are robust and easy to use, they are very general and can overlook details that may be relevant to the hydraulic system. One such issue is how to handle any time lag between the inputs and the lake level changes. This issue is well known; Langbein (1961) already suggested incorporating the time lag into a water balance model with geometric weight functions. However, most studies tend to sidestep this issue by simply modeling on coarser timescales (e.g., monthly or yearly).

At the other end of the complexity spectrum, process-based models are constructed via simulating the individual hydrological (physical) processes that affect the lake dynamics, including spatially and temporally differentiated inflows and outflows, lake bathymetry, weather effects, and thermal or chemical forcing (Beletsky et al., 2013; Laval et al., 2003; Valipour et al., 2023). Process-based models can resolve the behavior of the lakes at greater spatial and temporal resolutions and can help to study and predict the hydrological evolution of lakes even under complex environmental conditions. Lake models can be expanded to include further physical and biochemical processes such as water quality; hence, their application range is very broad.

Lake Erie, for example, has been subject to extensive modeling work to support adaptive management (Arhonditsis et al., 2019). Getachew et al. (2021) combined water balance and process-based modeling in a prediction framework for lake levels at Lake Tana. The process modeling of these studies focused on the recharge dynamics using the Soil Water Assessment Tool (SWAT), but there is also extensive literature using groundwater modeling software such as MODFLOW that can better handle lake–groundwater interactions (Lu et al., 2022; Dehghanipour et al., 2019). The downside of process-based models is their time-intensive setup, their large number of parameters requiring extensive data and their large computational costs. This is critical in situations where available data and prior knowledge are limited. In the absence of comprehensive data to run and parameterize process-based models, their theoretical superiority over simple water balance models vanishes.

As models of intermediate complexity, data-driven models are based on readily available observations of the investigated system, while the internal system mechanics are approximated using statistical methods. The underlying system behavior is thus approximated from the mathematical relations between the system input and output data (Souza et al., 2016).

In hydrology, data-driven methods are typically used for prediction or management, for which they are frequently embedded into a system dynamics model framework that goes beyond natural science hydrology (Hassanzadeh et al., 2012; Alifujiang et al., 2017). The term data-driven modeling is often used as an overarching term for a wide variety of novel machine learning methods (Zhu et al., 2020b; Elshorbagy et al., 2010a) but usually excludes methods, such as time series analysis or regression, that are also data-driven by design.

Time series analysis methods often only consider the lake level time series themselves, predicting them based on their own past values. Şen et al. (2000) analyzed the time series of Lake Van in Turkey with linear and non-linear trends and combined them with a Markov model to predict future lake levels. Ebtehaj et al. (2019) based a linear prediction on the spectral decomposition of time series. Multiple studies showed the applicability of autoregressive integrated moving average (ARIMA) models in the context of lake water management. ARIMA models combine linear regression on past values with moving averages and are suitable for short-term time series predictions. Hence, the approach is very popular for prediction applications in hydrology (Ghashghaie and Nozari, 2018; Irvine and Eberhardt, 1992; Montanari et al., 1997). However, simple time series approaches are limited as they do not use any weather forcing as input, as pointed out by Kakahaji et al. (2013), who compared multiple prediction methods based on the Lake Urmia dataset, including water balance modeling, linear predictor models and different machine learning approaches (multi-layer perceptron and fuzzy networks). The authors concluded that, in data-scarce scenarios, linear approaches are preferred, while non-linear machine learning methods could only outperform them when properly trained.

Linear regression has been widely used to model the responses of hydrological systems to rainfall (Clarke, 1973; Tasker, 1980). Linear regression assumes a linear relationship between the model input and output, with the linear coefficients calibrated based on the misfit of the model (usually by the ordinary least-squares method). As this is an easy-to-use and robust methodology, it has been the standard data-driven approach in geosciences for decades. Linear models are usually fitted deterministically, but they are also suitable to be implemented within Bayesian frameworks for sensitivity analysis and uncertainty quantification (Kroll and Song, 2013). In more recent studies, linear regression is still widely used as the reliable baseline to compare other more advanced methodologies. For example, Elshorbagy et al. (2010a, b) compared the predictive capabilities of six different methods using linear models as a baseline. Linear models were similarly used by several studies to show the advantages of machine learning methods (Heuvelmans et al., 2006; Sahoo and Jha, 2013).

Machine learning applications are gaining increasing popularity in hydrological practice, for example, for rainfall–runoff modeling (Kratzert et al., 2019; Sahoo et al., 2019; Klotz et al., 2022), water resources management (Oyebode and Stretch, 2019), drought prediction (Li et al., 2021) or lake level prediction (Kisi et al., 2012; Demir and Yaseen, 2023). Machine learning methods provide black-box solutions with a non-linear internal mathematical structure. They can be used as predictors based on the lake level variation data only (Zhu et al., 2020a), or they can be used to predict dynamics based on forcing data (Páliz Larrea et al., 2021). Machine learning models typically have a very large number of parameters. Proper calibration of parameters requires large datasets, which again limits the applicability of these approaches. An even larger issue regarding the context of our study is the black-box nature of machine learning methods: it is very challenging to analyze individual processes when the black-box method is designed to mimic the overall behavior of the system (McGovern et al., 2019). This is also true for most other data-driven approaches as they are designed for prediction rather than understanding, but models of lower complexity, e.g., regression-type models, could still be analyzed with relative ease.

In this paper we present a case of limited prior knowledge where a process-based model cannot yet be set up with the required level of confidence, although predictions of lake level change and an assessment of potential drivers are increasingly demanded by policymakers and stakeholders. We use the case of Groß Glienicker Lake, a groundwater-fed lake at the outskirts of Berlin, Germany, that has experienced drastic water losses over the last decades. This loss is not systematically observed in all lakes of the region (Lischeid et al., 2021); hence, further drivers beyond climatic changes need to be examined, e.g., water infrastructure and land use changes.

We follow the downward model development approach as follows: by means of a monthly water balance analysis, we identify and quantify missing water fluxes in the hydraulic system and use this as a baseline to identify any turning points and changes over the investigated period. This informs a daily data-driven linear model that can unfold the lake level responses to specific events in more detail. By identifying the main drivers of the lake level dynamics and system changes, our study will support the development of a future process-based model, while the results can already be used in local water management initiatives.

2 Methodology

In this study, we propose a top-down model development approach (Sivapalan and Young, 2005; Hrachowitz and Clark, 2017) to understand the lake level dynamics, starting from simple water balance models and moving to more complex data-driven approaches, with an outlook toward what we can learn from these for even more complex process-based modeling. We propose a hybrid data-driven modeling framework, consisting of the following steps:

1. monthly water balance modeling to quantify fluxes,
2. identifying the main turning points in the system using the water balance residuals,
3. daily linear regression modeling between the turning points,
4. model response analysis to isolated weather forcings,
5. further analysis of steps 3 and 4 with non-linear approaches of increasing complexity (if needed), and
6. triangulation of findings using independent data.

Our proposed methodology requires the development of multiple models. First, a water balance model is created on the monthly scale to quantify the fluxes of the hydraulic system and to identify any major turning points during the investigation period. The evolution of the water balance residuals can indicate systematic changes, like increases in outflow from the catchment.

Next, daily-scale data-driven models calibrated over the periods between the turning points are compared in order to analyze the differences in their lake level responses. We start this analysis using linear regression models due to their simplicity and transparency. The model responses to the different weather forcings (precipitation, evapotranspiration) are compared separately as well to understand the system in detail.

If linear models cannot capture the system behavior sufficiently, we propose increasing the model complexity using non-linear models, such as artificial neural networks. In our study, the linear approach provided good fits and enough insights into system understanding based on the available data – the fact that the system seems to behave linearly is in itself an interesting result. As the last step, the findings are validated against independent information. In the following, first we present the meteorological forcing data and the way these were obtained. Then, we present the methodologies of the water balance and linear regression approaches.

2.1 Forcing meteorological data

The proposed methodology relies on local meteorological data from the investigated lake catchment, which we obtained from the second version (v2) of the Central European Refined analysis (CER) (Jänicke et al., 2017), a gridded meteorological dataset for central Europe with a focus on the region of Berlin–Brandenburg. As with its predecessor, CER v1, the CER v2 dataset has been produced by means of an observation-based model approach: global ERA5 reanalysis data have been dynamically downscaled using the Weather Research and Forecasting (WRF) model and validated against 211 weather stations. The methodology has been comprehensively described and successfully applied in different regions of the world, for instance, in High Asia (Wang et al., 2021; Maussion et al., 2014). The CER v2 dataset covers the period from 1980 to 2022 (with continuous updates for the most recent years) using a convection-resolving approach at its highest spatial resolution of 2 km horizontal grid spacing. Only data from 2002 to 2022 were used in this study because the CER v1 dataset, which goes back only to 2002, was used to test the robustness of our methodology.

There are two advantages to using a dynamically downscaled gridded dataset instead of relying on interpolated station data. First, such an approach provides an estimate of actual evapotranspiration for each grid point using land cover, vegetation and soil data and dynamic data on soil moisture, while station-based observations are typically restricted to potential evapotranspiration (lysimeters or eddy flux towers are available at only very few locations). Second, this approach explicitly takes into account the mesoscale heterogeneity of weather systems, which is of particular importance for precipitation and actual evapotranspiration, both of which vary strongly at spatial scales of a few kilometers or less. When we tested our lake models using weather station data, we were unable to obtain the same model fit quality as with the CER v2 dataset. The largest differences occurred after extreme rainfall events, when, due to spatial variability, the rainfall recorded at a station could differ substantially from the rainfall elsewhere in the catchment. Because summer storms have a strong impact on the lake levels, we could not close the water balance models using weather station data alone.

2.2 Water balance modeling

In groundwater-fed lake systems without any surface water connections, the lake level dynamics will be mainly dependent on the inflow from the groundwater. This flow is controlled by the groundwater level–lake level relation. Hence, the groundwater dynamics and lake level dynamics are strongly related, and the lake level changes can be used as an indicator for the groundwater level changes in the catchment. The water balance equation for a groundwater-fed lake system can be formulated as follows:

$$\Delta S_{\mathrm{lake}}(t) = P_{\mathrm{lake}}(t) - E_{A,\mathrm{lake}}(t) + F_{\mathrm{in}}(t) - F_{\mathrm{out}}(t) + \epsilon \tag{1}$$

where ΔS_lake is the change in lake water storage, P_lake is the total precipitation over the lake, and E_A,lake is the total lake evaporation. F_in and F_out are, in this case, the subsurface inflow and outflow of water to the lake, which can be combined into the net subsurface water inflow (ΔF). The final term, ϵ, explains any remaining errors and uncertainties in the data. If there were any surface water connections to the lake, an extra net surface water inflow would have to be accounted for. All terms in Eq. (1) are expressed in units of volume over time (e.g., in m³ d⁻¹ or m³ month⁻¹), with fluxes being integrated over the lake surface area.
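As a minimal sketch of Eq. (1), the snippet below converts the precipitation and evaporation depths over the lake into volumes and adds the subsurface fluxes; the error term ϵ is omitted. The function name is hypothetical, and the example uses a placeholder lake area of 1 km² rather than the actual surface area of Groß Glienicker Lake.

```python
import numpy as np

def lake_storage_change(p_mm, ea_mm, f_in_m3, f_out_m3, lake_area_m2):
    """Eq. (1) without the error term, in m^3 per time step.

    p_mm, ea_mm       : precipitation and lake evaporation depths (mm per step)
    f_in_m3, f_out_m3 : subsurface inflow and outflow volumes (m^3 per step)
    lake_area_m2      : lake surface area used to turn depths into volumes
    """
    p_m3 = np.asarray(p_mm, dtype=float) / 1000.0 * lake_area_m2   # mm -> m -> m^3
    ea_m3 = np.asarray(ea_mm, dtype=float) / 1000.0 * lake_area_m2
    return p_m3 - ea_m3 + np.asarray(f_in_m3, dtype=float) - np.asarray(f_out_m3, dtype=float)

# Placeholder month: 50 mm rain, 80 mm evaporation, 20,000 m^3 in, 5,000 m^3 out
ds_month = lake_storage_change(50.0, 80.0, 20_000.0, 5_000.0, lake_area_m2=1.0e6)
```

Dividing such a volume by the lake area gives an equivalent depth, which is only a meaningful lake level change if the lake walls are close to vertical over the range of interest.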

Precipitation (P_catchment) and actual evapotranspiration (ET_A,catchment) over the (subsurface) catchment area, and not just over the lake, strongly influence subsurface flow processes that feed the lake. However, these effects show some time delay. Therefore, the water balance equation for the catchment reads as follows:

$$\Delta S(t) = \int_{t-\tau^{*}}^{t} P_{\mathrm{catchment}}(\tau)\,d\tau - \int_{t-\tau^{*}}^{t} ET_{A,\mathrm{catchment}}(\tau)\,d\tau + \Delta F'(t) + \epsilon \tag{2}$$

Here, the first integral sums precipitation over the catchment back in time until a precipitation event ceases to cause an inflow into the lake at time t, while the second integral does the same for actual evapotranspiration. We denote this time interval by τ^∗ and call it the hydraulic memory of the system; in the literature, it is sometimes called the lake response time (Mason et al., 1994; Gong et al., 2015). Here, ΔS denotes the change in storage over the whole catchment, which is mainly the change in groundwater storage; with this assumption, storage changes in the unsaturated zone are neglected. In the model, τ^∗ represents the time water spends traveling through the unsaturated zone plus the time the resulting pressure impulse takes to travel through the system. The hydraulic memory of the system is estimated from the observed data.

In Eq. (2), precipitation over the lake is included in the precipitation of the catchment, and evaporation is included in the catchment evapotranspiration term. The modified water balance equation leaves ΔF^′ to account for any remaining net subsurface inflow unaffected by climatic forcing – for instance, water abstractions, in which case ΔF^′ would be negative – or diverging regional groundwater flows. If these flows are approximately constant over the investigated time period, they will not appreciably affect the lake level dynamics.

Considering the discrete nature of daily input data, the integrals can be substituted by sums:

$$\Delta S(t) = \sum_{i=1}^{\tau^{*}} P_{\mathrm{catchment}}(t-i) - \sum_{i=1}^{\tau^{*}} ET_{A,\mathrm{catchment}}(t-i) + \Delta F' + \epsilon \tag{3}$$
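The discrete form in Eq. (3) amounts to two moving-window sums. The sketch below evaluates it with cumulative sums, assuming daily catchment-averaged depths (e.g., mm d⁻¹) and a constant ΔF′ expressed in the same units; converting to volumes would additionally require multiplying by the catchment area. The function name is illustrative and not taken from the study.

```python
import numpy as np

def catchment_storage_change(p, et, tau, delta_f=0.0):
    """Discrete Eq. (3) without the error term.

    For each day t >= tau, sums P and ET_A over days t-1, ..., t-tau and adds
    the constant net subsurface inflow delta_f. Days with an incomplete
    window (t < tau) are returned as NaN.
    """
    p = np.asarray(p, dtype=float)
    et = np.asarray(et, dtype=float)
    n = len(p)
    # c[k] holds the sum of the first k values, so x[t-tau] + ... + x[t-1] = c[t] - c[t-tau]
    cp = np.concatenate(([0.0], np.cumsum(p)))
    ce = np.concatenate(([0.0], np.cumsum(et)))
    t = np.arange(tau, n)
    ds = np.full(n, np.nan)
    ds[tau:] = (cp[t] - cp[t - tau]) - (ce[t] - ce[t - tau]) + delta_f
    return ds
```

The cumulative-sum trick keeps the cost linear in the series length, which matters when many τ^∗ values are tested.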

The complete water balance can then be used to estimate the changes in catchment storage. To use such a model for the lake level dynamics, the catchment storage change (ΔS) needs to be converted to a lake level change (Δz). Lake level change can be estimated from lake storage change using a bathymetric model, but this approach is not suitable for catchment storage. Hence, we assumed that lake level changes are linearly related to catchment storage changes. We based this assumption on the fact that lake level changes are relatively small compared to the scale of the catchment and that the catchment geometries are simple in a lowland, sedimentary geological setting. The catchment storage change–lake level change relation reads as follows:

$$\Delta z(t) = \alpha\,\Delta S(t) + \beta \tag{4}$$

The slope (α) and intercept (β) can be estimated by optimizing the fit between the observed and modeled lake level changes. In simpler hydrological systems (such as our case), α could be equal to 1 and β could be equal to zero. The unit of α is one over area (m⁻²) to account for the conversion from the change in volume to the change in depth, while the unit of β is the same depth over time as for Δz (e.g., m d⁻¹ or m month⁻¹).
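One simple way to optimize this fit, assumed here rather than taken from the study, is an ordinary least-squares line fit of observed Δz against modeled ΔS, for example with NumPy's `polyfit`. The helper name and synthetic values are illustrative.

```python
import numpy as np

def fit_storage_to_level(ds, dz):
    """Least-squares estimate of alpha and beta in Eq. (4): dz = alpha * ds + beta.

    NaNs (e.g., the first tau days of Eq. (3)) are dropped before fitting.
    """
    ds = np.asarray(ds, dtype=float)
    dz = np.asarray(dz, dtype=float)
    ok = np.isfinite(ds) & np.isfinite(dz)
    alpha, beta = np.polyfit(ds[ok], dz[ok], deg=1)   # coefficients, highest power first
    return alpha, beta

# Synthetic check: modeled storage changes mapped to levels with known alpha and beta
ds = np.array([np.nan, -2.0, 0.5, 1.0, 3.0])
dz = 2.0e-3 * ds + 0.01
alpha, beta = fit_storage_to_level(ds, dz)
```

Whether α and β are fitted this way or fixed a priori (e.g., β = 0) is a modeling choice; fixing them reduces the number of free parameters when ΔF′ is calibrated as well.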

The water balance equation is closed by calibrating the value of ΔF^′ to achieve the best fit between the observed and modeled lake level changes. To characterize the fit quality, we use the R² score, i.e., the coefficient of determination. The calibration is done via a simple grid search, testing a series of values with reasonably small intervals between them.
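A grid search for ΔF′ can be sketched as follows. This illustration rests on assumptions: α and β are held fixed (here at 1 and 0, in consistent units), R² is computed directly from its definition, and the candidate grid and synthetic data are placeholders.

```python
import numpy as np

def coefficient_of_determination(obs, sim):
    """R^2 = 1 - SS_res / SS_tot."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    ss_res = np.sum((obs - sim) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def calibrate_delta_f(ds_climate, dz_obs, candidates, alpha=1.0, beta=0.0):
    """Return the candidate Delta F' (and its R^2) that best reproduces dz_obs.

    ds_climate: climate-driven storage change (Eq. 3 evaluated with delta_f = 0).
    """
    ds_climate = np.asarray(ds_climate, dtype=float)
    scores = np.array([
        coefficient_of_determination(dz_obs, alpha * (ds_climate + df) + beta)
        for df in candidates
    ])
    best = int(np.argmax(scores))
    return candidates[best], scores[best]

# Synthetic example with a true net subsurface inflow of -0.05 per step
ds_climate = np.array([0.1, -0.2, 0.05, 0.3, -0.1])
dz_obs = ds_climate - 0.05
grid = np.round(np.arange(-0.2, 0.201, 0.01), 2)
best_df, best_r2 = calibrate_delta_f(ds_climate, dz_obs, grid)
```

Because R² penalizes a constant offset between observed and simulated values, shifting ΔF′ changes the score even though it does not change the shape of the simulated series.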

After the water balance equation is closed, the remaining residuals (ϵ) are analyzed to identify any systematic trends or turning points in the system. Turning points can be identified as the starting point of a continuous increase or decrease in the residual values (see Fig. 5 for an example). To quantify these effects, transient fluxes can be introduced to the water balance, with constant values within certain time intervals. These fluxes are calibrated in the same way, using a grid search.
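As an illustration only, and not the authors' procedure, a single turning point can be located automatically by fitting a continuous two-segment (hinge) line to the cumulative residuals and keeping the breakpoint with the smallest squared error. The helper name, the minimum segment length and the use of cumulative residuals are assumptions.

```python
import numpy as np

def find_turning_point(residuals, min_seg=3):
    """Hypothetical helper: index of the last step before the cumulative
    residuals change slope, found by a brute-force hinge regression."""
    y = np.cumsum(np.asarray(residuals, dtype=float))
    x = np.arange(len(y), dtype=float)
    best_k, best_sse = None, np.inf
    for k in range(min_seg, len(y) - min_seg):
        # Continuous two-segment model: y = a + b*x + c*max(0, x - k)
        X = np.column_stack([np.ones_like(x), x, np.maximum(0.0, x - k)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((y - X @ coef) ** 2))
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k

# Residuals that are unbiased at first and then systematically negative
residuals = np.r_[np.zeros(10), -np.ones(10)]
k = find_turning_point(residuals)
extra_flux = residuals[k + 1:].mean()   # first guess for a transient flux after the break
```

The mean residual after the break is only a starting value; it could be refined with the same grid search used for ΔF′.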

2.3 Data-driven modeling

Data-driven models use the statistical relationship between the model input data and the observed outputs. Based on the modified discrete water balance equation (Eq. 3), we can frame the general modeling problem as follows:

$$\Delta z(t) = f\big(P(t-\tau^{*}), \dots, P(t), ET_{A}(t-\tau^{*}), \dots, ET_{A}(t)\big) + \epsilon \tag{5}$$

This means that we are looking for the functional relationship between the meteorological input data (fluxes in units of mm d⁻¹ considering a daily timescale) and the observed lake level changes (in m d⁻¹). This equation can be amended by additional input data, like data on water abstraction (if such data are available).

The simplest function that can be used in this model is a linear function, which would read as follows:

$$\Delta z(t) = a + b_{P,-\tau} P(t-\tau^{*}) + b_{P,-\tau+1} P(t-\tau^{*}+1) + \dots + b_{P,0} P(t) + b_{ET,-\tau} ET(t-\tau^{*}) + \dots + b_{ET,0} ET(t) + \epsilon \tag{6}$$

where a is the intercept of the linear function, $b_{P,-\tau+i}$ are the linear coefficients for precipitation at time step $t-\tau^{*}+i$ (i.e., $\tau^{*}-i$ days in the past), and $b_{ET,-\tau+i}$ are the respective coefficients for actual evapotranspiration. Although this is a relatively simple formula, the function can have a high dimensionality, which grows with the memory τ^∗ and the number of input features F ($f : \mathbb{R}^{F(\tau^{*}+1)} \to \mathbb{R}$, since each input enters with lags 0 to τ^∗). This is often referred to as a multilinear problem in the literature (Sahoo and Jha, 2013).
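The design matrix behind Eq. (6) can be built by stacking lagged copies of each input. The sketch below assumes a NumPy array with one column per input (e.g., P and ET_A) and produces one column per lag from τ* down to 0, so F inputs yield F(τ*+1) columns. The function name is hypothetical.

```python
import numpy as np

def lagged_design_matrix(inputs, tau):
    """Lagged inputs for Eq. (6).

    inputs: array of shape (n_days, F), one column per input series.
    Returns X of shape (n_days - tau, F * (tau + 1)); row j belongs to day
    t = j + tau, and each input contributes [x(t - tau), ..., x(t)].
    """
    inputs = np.asarray(inputs, dtype=float)
    n, f = inputs.shape
    cols = []
    for k in range(f):
        for lag in range(tau, -1, -1):          # lag tau first, lag 0 last
            cols.append(inputs[tau - lag : n - lag, k])
    return np.column_stack(cols)

# Two toy inputs: P = 0..4 and ET = 0, 10, ..., 40
inputs = np.column_stack([np.arange(5.0), 10.0 * np.arange(5.0)])
X = lagged_design_matrix(inputs, tau=2)
```

The first τ* days are dropped because their windows would reach back before the start of the record.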

The linear model can be split into two sub-models to investigate the individual responses to precipitation and evapotranspiration:

$$\Delta z_{P}(t) = b_{P,-\tau} P(t-\tau^{*}) + \dots + b_{P,0} P(t) \tag{7}$$
$$\Delta z_{ET}(t) = b_{ET,-\tau} ET(t-\tau^{*}) + \dots + b_{ET,0} ET(t) \tag{8}$$
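Eqs. (7) and (8) also make it straightforward to look at the response to an isolated forcing event (step 4 of the framework). As a sketch, assuming coefficients ordered as in Eq. (6) from lag τ* down to lag 0, the level response to a single-day pulse can be read directly off the reversed coefficient vector; the function name and example coefficients are illustrative.

```python
import numpy as np

def pulse_response(b, pulse=1.0):
    """Response of Eq. (7) or (8) to a forcing pulse on day 0 (zero input otherwise).

    b: coefficients ordered [b_{-tau}, ..., b_{-1}, b_0], as in Eq. (6).
    Returns the daily level change and the cumulative level change on days 0..tau.
    """
    b = np.asarray(b, dtype=float)
    # d days after the pulse, the pulse enters the sum at lag d, i.e. through b_{-d}
    daily = pulse * b[::-1]
    return daily, np.cumsum(daily)

# Hypothetical coefficients (m per mm) for lags 2, 1, 0 and a 10 mm rain event
daily, cumulative = pulse_response([1e-4, 2e-4, 3e-4], pulse=10.0)
```

Comparing such pulse responses between models calibrated on different periods is one way to see how the lake reaction to a given event has changed.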

Optionally, input data might be filtered prior to the analysis. In this study, we used Butterworth filters from the scipy.signal Python package. For the autocorrelation analysis in Sect. 4.1, a band-stop filter was used, which removes the signal with a 365 d period from the lake level data. For the plots of the linear regression analysis (Figs. 6–9), a low-pass filter with a cutoff period of 20 d (i.e., a cutoff frequency of 1/20 d⁻¹) was applied to the lake level data. This was necessary for the visualization in Fig. 7, where the higher-frequency components would otherwise appear as noise over the coefficients. A comparison plot of the filtered and unfiltered data is shown in Fig. S1 in the Supplement.
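A possible implementation of the two filters with `scipy.signal` is sketched below. The filter orders, the stop-band edges and the use of zero-phase filtering are not given in the text; the values here (order 4 for the low-pass, order 2 with edges at 1/400 and 1/330 d⁻¹ for the band-stop, forward-backward filtering with `sosfiltfilt`) are assumptions. Second-order sections are used because very low normalized frequencies can make transfer-function coefficients numerically fragile.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 1.0  # one sample per day, so frequencies are in cycles per day

def lowpass(z, period_days=20.0, order=4):
    """Zero-phase Butterworth low-pass removing periods shorter than period_days."""
    sos = butter(order, 1.0 / period_days, btype="lowpass", fs=FS, output="sos")
    return sosfiltfilt(sos, np.asarray(z, dtype=float))

def remove_annual_cycle(z, band_days=(400.0, 330.0), order=2):
    """Zero-phase Butterworth band-stop centered on the ~365 d period."""
    edges = [1.0 / band_days[0], 1.0 / band_days[1]]   # low and high edge in cycles/day
    sos = butter(order, edges, btype="bandstop", fs=FS, output="sos")
    return sosfiltfilt(sos, np.asarray(z, dtype=float))
```

A narrow stop band rings for a long time, so the first and last few years of a band-stop-filtered series should be interpreted with care.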

To make the linear model coefficients more comparable for the different input types, the input data may be standardized as well. Standardization rescales the input time series to have a zero mean and a standard deviation of 1. Standardization may also be necessary for non-linear machine learning approaches.
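Standardization changes the units of the fitted coefficients. The sketch below, with hypothetical helper names, standardizes each input column (using the population standard deviation, as scikit-learn's `StandardScaler` does) and maps coefficients fitted on standardized inputs back to raw units, so that they can again be read as, for example, meters of lake level per millimeter of rain.

```python
import numpy as np

def standardize(x):
    """Column-wise z-scores: zero mean and unit (population) standard deviation."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(axis=0), x.std(axis=0)
    return (x - mean) / std, mean, std

def coefficients_to_raw_units(a_std, b_std, mean, std):
    """Rewrite a_std + sum(b_std * (x - mean) / std) as a_raw + sum(b_raw * x)."""
    b_raw = np.asarray(b_std, dtype=float) / std
    a_raw = a_std - np.sum(b_raw * mean)
    return a_raw, b_raw

z, m, s = standardize([[1.0, 10.0], [3.0, 30.0]])
a_raw, b_raw = coefficients_to_raw_units(1.0, [2.0, 4.0], m, s)
```

Standardized coefficients are convenient for comparing inputs with each other; raw-unit coefficients are what a water balance interpretation needs.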

The linear model is fitted using ordinary least-squares regression from the scikit-learn Python library (Pedregosa et al., 2011). The method minimizes the sum of squared errors between observed and simulated data and has a closed-form solution, i.e., the orthogonal projection of the observations onto the space spanned by the input data.
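A minimal version of this fit with scikit-learn's `LinearRegression` is shown below, assuming daily P and ET_A series and lake level changes Δz of equal length. NumPy's `sliding_window_view` builds the lag windows in the order used in Eq. (6); the function name and the synthetic check are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.linear_model import LinearRegression

def fit_lagged_linear_model(p, et, dz, tau):
    """Fit Eq. (6) by ordinary least squares.

    Returns the intercept a, the precipitation and ET coefficients (each ordered
    from lag tau down to lag 0) and the in-sample R^2.
    """
    # Row j of each window matrix holds [x(t - tau), ..., x(t)] for day t = j + tau
    xp = sliding_window_view(np.asarray(p, dtype=float), tau + 1)
    xe = sliding_window_view(np.asarray(et, dtype=float), tau + 1)
    X = np.hstack([xp, xe])
    y = np.asarray(dz, dtype=float)[tau:]          # drop days without a full window
    model = LinearRegression().fit(X, y)
    b_p, b_et = model.coef_[: tau + 1], model.coef_[tau + 1 :]
    return model.intercept_, b_p, b_et, model.score(X, y)
```

On noise-free synthetic data the true coefficients are recovered exactly; on real data, comparing the fitted coefficient curves between calibration periods is what reveals changes in the lake response.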

The lake response time and/or system memory can be estimated in multiple ways; in this study, we used two separate methods. First, the k-lag autocorrelation of the lake level data was calculated:

$$AC(\tau) = \mathrm{corr}\big(z(t), z(t-\tau)\big) \tag{9}$$
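Eq. (9) can be evaluated directly with NumPy, as sketched below. How the autocorrelation curve is turned into a memory length is not specified here; the 1/e threshold in the second helper is an illustrative assumption, and the input would typically be the seasonally filtered lake level series.

```python
import numpy as np

def lag_autocorrelation(z, max_lag):
    """Eq. (9): Pearson correlation between z(t) and z(t - tau), tau = 0..max_lag."""
    z = np.asarray(z, dtype=float)
    ac = [1.0]                                        # tau = 0 is trivially 1
    for tau in range(1, max_lag + 1):
        ac.append(np.corrcoef(z[tau:], z[:-tau])[0, 1])
    return np.array(ac)

def memory_from_autocorrelation(ac, threshold=np.exp(-1.0)):
    """Hypothetical rule: first lag at which the autocorrelation drops below threshold."""
    below = np.flatnonzero(np.asarray(ac) < threshold)
    return int(below[0]) if below.size else None

ac = lag_autocorrelation(np.tile([1.0, -1.0], 50), 2)
```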

The k-lag autocorrelation shows the time dependence of the lake level data and gives a good indication of the ideal memory time frame for the modeling (Seeboonruang, 2015). A second approach, which is often used in rainfall–runoff models, is to fit and evaluate a series of linear regression models with different memory windows and to compare their fit qualities to identify the lake response time.
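The second approach can be sketched as a loop over candidate memory lengths. Two details are assumptions rather than the authors' stated procedure: every candidate is scored on the same target days so the scores are comparable, and R² is computed on a held-out validation segment, because the in-sample R² of nested least-squares models can only increase as τ* grows. For brevity, only precipitation is used as input.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def validation_r2_by_memory(p, dz, max_tau, cal_frac=0.7):
    """Validation R^2 of lagged linear models for memory lengths 0..max_tau."""
    p = np.asarray(p, dtype=float)
    dz = np.asarray(dz, dtype=float)
    y = dz[max_tau:]                                   # same target days for every tau
    n_cal = int(cal_frac * len(y))                     # first part calibrates, rest validates
    scores = {}
    for tau in range(max_tau + 1):
        windows = sliding_window_view(p, tau + 1)[max_tau - tau:]   # rows aligned with y
        X = np.column_stack([np.ones(len(windows)), windows])
        coef, *_ = np.linalg.lstsq(X[:n_cal], y[:n_cal], rcond=None)
        y_val = y[n_cal:]
        resid = y_val - X[n_cal:] @ coef
        scores[tau] = 1.0 - np.sum(resid ** 2) / np.sum((y_val - y_val.mean()) ** 2)
    return scores
```

In practice the score typically rises steeply up to the true memory and then flattens, so the smallest τ* on the plateau is a reasonable choice.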

3 Study site and data

We applied this method to Groß Glienicker Lake, a groundwater-fed lake system at the border of Berlin and the federal state of Brandenburg in Germany. Like several lakes in the region, Groß Glienicker Lake has shown a drastic loss in lake levels over the last half century, with an increasing rate over the last decade (Fig. 1b).

Figure 1. (a) Catchment of Groß Glienicker Lake (© OpenStreetMap contributors 2023; distributed under the Open Data Commons Open Database License (ODbL) v1.0). (b) Overview of lake level changes in Groß Glienicker Lake together with concurrent time series of temperature and precipitation. Key events potentially impacting the lake system are shown as background colors.

The lake catchment delineated from topographic data spans over 33 km², mainly consisting of forest (20 km²), cropland (6 km²) and urban area (4 km²). The main recharge area is the heathland west of the lake dominated by sandy soils. The regional groundwater flow direction points southeast, toward the Havel River, a major tributary of the river Elbe, but due to the lack of groundwater wells and low gradients, the exact groundwater flow system is currently not known.

Two lakes are located within the catchment, i.e., Sacrower Lake and Groß Glienicker Lake. The latter has been chosen for this study because it is a focus of the local concerns. The lake nevertheless is representative of declining lake levels that are widespread in the Berlin–Brandenburg region and beyond. Both lakes are groundwater fed, with no active surface water connections, similarly to many other lakes in the region (Lischeid, 2021). A connection between the two lakes used to exist, but it has been closed since 1996 due to the declining levels in both lakes.

Groß Glienicker Lake has been extensively studied from a hydrochemical point of view because, between 1970 and 1990, a large amount of untreated sewage was regularly discharged into the lake from a nearby army base, leading to eutrophication. To mitigate the effects of this pollution, a restoration campaign started in the early 1990s, which is well documented (Wolter, 2010; Kleeberg et al., 2012; Heinrich et al., 2022). There are, however, only a limited number of studies that focused on lake level dynamics, although the continuous decrease in lake levels has been a concern for the local communities and authorities for a while.

The lake is located on the administrative boundary of the German federal states of Berlin and Brandenburg. This makes the accessibility of infrastructural and water management data complicated. From the Berlin side, data regarding water supply, wastewater management and canalization maps are available on the city web page for a period of multiple years. From the Brandenburg side, geological information, as well as limited information on the rainwater infrastructure (e.g., manhole cover locations), is available. The lake levels are monitored by an automatic measurement station at the south side of the lake operated by the city of Berlin. Daily lake level data starting from January 1970 are openly available on the website Wasserportal Berlin. The hypsographic curve based on the bathymetric model of the lake shows a linear relation between the lake volumes and lake levels (Jahn and Witt, 2002, p. 66), but note that this relation cannot be used to link the lake levels to the catchment storage.

The catchment shown in Fig. 1a was delineated from surface topography data and therefore represents the surface catchment. It is used throughout the analysis in place of the unknown groundwater catchment. Given the integrative nature of our analysis and its focus on lake level dynamics, this is of limited concern, but it could introduce uncertainties where an exact quantification of the hydrological fluxes is needed (see Sect. 4.2). The catchment lies in a lowland area with an elevation range of 40 m, and the unsaturated zone is up to 15 m deep (Geoportal Brandenburg – Detailansichtdienst, 2024).

The Central European Refined analysis (CER; Jänicke et al., 2017) provides atmospheric data for the Berlin–Brandenburg region on a 2 km spatial grid at hourly temporal resolution. In this study, we used daily aggregated precipitation and actual evapotranspiration, integrated over the catchment area of the lake. Actual evapotranspiration is calculated from atmospheric parameters using static land use data. The land use composition of the catchment was estimated from the 2015 remote-sensing-based land cover analysis of Pflugmacher et al. (2019). Figure 2 shows an overview of the CER v2 data over the study period 2002–2023.

https://hess.copernicus.org/articles/28/4331/2024/hess-28-4331-2024-f02

Figure 2. Weather forcing input data from the CER v2 dataset: (a) investigation period and (b) example year of 2009.

4 Results

Air temperature shows a clear increasing trend over the last decades (Fig. 1b). Precipitation shows no long-term trend, only shorter-term variations. This is in line with the climate analysis of the German Weather Service, which projects only a slow increase in precipitation in the Brandenburg region (DWD, 2019). The same analysis, however, also states that extreme events are becoming more frequent and have contributed a larger fraction of the annual precipitation in recent years.

Figure 2 shows that summer periods in the catchment are dominated by extreme rainfall events. These events are separated by dry periods, as seen in the example year of 2009 (Fig. 2b). Actual evapotranspiration, in contrast, behaves in a much more periodic and regular way, with similar patterns over the years. Notably, the downscaled actual evapotranspiration data do not show the increasing trend that one might expect from the rising air temperatures (Fig. 1b). Such a trend would, however, only occur in energy-limited systems with unlimited water availability (e.g., over open water bodies). While potential evapotranspiration would follow the temperature trend, actual evapotranspiration in water-limited regimes is controlled by water availability rather than by air temperature.

4.1 System memory (lake response time)

The memory of the hydraulic system is estimated from the k-lag autocorrelation of the lake level data. A band-stop filter is first applied to remove the annual cycle, which dominates the periodicity of the lake levels and could distort the analysis (see Fig. S1 for the filtered time series). The result is shown in Fig. 3a.
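
A minimal sketch of this memory analysis, assuming a gap-free daily lake level series in a NumPy array. The band-stop step here simply zeroes the Fourier coefficients within ±10 % of the annual frequency; this FFT-masking filter is a stand-in chosen for brevity, since the filter design used in the study is not specified in this section. The function names and the `half_width` parameter are hypothetical.

```python
import numpy as np


def remove_annual_cycle(levels, period_days=365.25, half_width=0.1):
    """Band-stop by FFT masking: zero all frequencies within +/- half_width
    (relative) of the annual frequency, then transform back."""
    x = np.asarray(levels, dtype=float)
    x = x - x.mean()
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0)  # cycles per day
    f_annual = 1.0 / period_days
    spectrum[np.abs(freqs - f_annual) <= half_width * f_annual] = 0.0
    return np.fft.irfft(spectrum, n=x.size)


def autocorrelation(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: x.size - k], x[k:]) / denom
                     for k in range(max_lag + 1)])


def first_zero_crossing(acf):
    """Smallest lag at which the autocorrelation drops to zero or below."""
    idx = np.flatnonzero(acf <= 0.0)
    return int(idx[0]) if idx.size else None
```

With a daily series `levels`, `acf = autocorrelation(remove_annual_cycle(levels), 100)` gives the curve; `first_zero_crossing(acf)` and `int(np.argmin(acf))` return the lags of the zero crossing and of the minimum.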

https://hess.copernicus.org/articles/28/4331/2024/hess-28-4331-2024-f03

Figure 3. Lake response time analysis: (a) autocorrelation of lake levels and (b) fit quality of linear models of the weather forcing–lake level relations (Eq. 6) using different system memory time frames.

The autocorrelation in Fig. 3a decreases rapidly: it reaches zero at around 20 d and its minimum at around 30 d. Another method for estimating the optimal memory length in hydrology (mainly in rainfall–runoff modeling studies) is to compare linear models with different memory lengths fitted to the same data. Figure 3b shows this for memory lengths from 1 to 100 d using the r² metric. The overall picture is very similar to the autocorrelation: fit quality increases rapidly up to about 20 d, after which the increase slows down and r² stays at a high value of about 0.8. This broad plateau of near-optimal fits indicates that the linear regression is robust and insensitive to the exact choice of memory length. Based on these analyses, we use 30 d as the lake response time, or hydraulic memory, throughout this study.
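
The memory-length comparison can be sketched as an ordinary least-squares sweep. This assumes that Eq. (6) regresses the daily lake level change on precipitation and actual evapotranspiration at lags 0 to memory−1 plus an intercept; the intercept, the lag indexing and the function names are assumptions for illustration. Each fit drops the first memory−1 days, so the sample size shrinks slightly as the memory grows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def lagged_design(p, et, memory):
    """Rows for days t = memory-1 .. n-1: P(t-k), ET(t-k) for k = 0..memory-1, then 1."""
    pw = sliding_window_view(np.asarray(p, float), memory)[:, ::-1]  # column k = lag k
    ew = sliding_window_view(np.asarray(et, float), memory)[:, ::-1]
    return np.hstack([pw, ew, np.ones((pw.shape[0], 1))])


def r2_for_memory(p, et, dz, memory):
    """Coefficient of determination of a least-squares fit with the given memory (days)."""
    X = lagged_design(p, et, memory)
    y = np.asarray(dz, float)[memory - 1:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def memory_sweep(p, et, dz, memories=range(1, 101)):
    """r² for each candidate memory length, as in a plot like Fig. 3b."""
    return {m: r2_for_memory(p, et, dz, m) for m in memories}
```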

4.2 Water balance

The water balance model is built on a monthly (30 d) scale, as suggested by the system memory analysis. The monthly precipitation (P(t_m)) and actual evapotranspiration (ET_A(t_m)) time series are obtained by summing the daily values, and the lake level time series (used for model validation) is averaged to monthly means. The monthly weather values are then compared with the change in mean lake level in the following month (Δz(t_{m+1})).

$$\Delta z(t_{m+1}) = P(t_m) - ET_A(t_m) + \Delta F' \qquad (10)$$

Equation (10) is the water balance equation used here. All terms are in millimeters per month (mm month⁻¹).

Figure 4 shows the two main steps of the water balance estimation. First, the water availability is calculated each month by subtracting actual evapotranspiration from precipitation. The cumulative sum of the water availability shows that, if only these two processes affected the lake, lake levels would have increased over the investigation period.

https://hess.copernicus.org/articles/28/4331/2024/hess-28-4331-2024-f04

Figure 4. Water balance modeling: (a) cumulative sum of water availability and (b) observed and estimated lake levels.

To obtain a more realistic picture of the water level changes, an additional loss term needs to be introduced (ΔF′ in Eq. 10). A constant outflow equivalent to 4.5 mm month⁻¹ was required to bring the water balance curve as close as possible to the observed lake levels. This flux was estimated by a grid search that maximizes the r² score, and it can be attributed to the net groundwater outflow of the system.
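
The grid search for the constant loss term can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes gap-free monthly series in mm, integrates Eq. (10) so that the level of month m+1 responds to the weather of month m, and (as an assumption) removes the unknown absolute level offset by shifting the simulated curve onto the mean of the observations before computing r². The function names are hypothetical.

```python
import numpy as np


def simulate_levels(p_month, et_month, flux):
    """Relative lake levels (mm) from Eq. (10); `flux` is the constant ΔF' (negative = loss)."""
    dz = np.asarray(p_month, float) - np.asarray(et_month, float) + flux
    # The level of month m+1 responds to the weather of month m, so the first level is 0.
    return np.concatenate(([0.0], np.cumsum(dz[:-1])))


def r2_with_offset(obs, sim):
    """r² after shifting the simulated curve by its mean offset from the observations."""
    obs = np.asarray(obs, float)
    sim = sim + np.mean(obs - sim)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def grid_search_flux(p_month, et_month, obs, candidates):
    """Return (best ΔF', its r²) over a grid of constant monthly fluxes."""
    scores = [r2_with_offset(obs, simulate_levels(p_month, et_month, f)) for f in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```

For example, `grid_search_flux(p, et, obs, np.arange(-10.0, 0.01, 0.1))` scans constant losses of up to 10 mm month⁻¹; a loss of 4.5 mm month⁻¹ corresponds to ΔF′ = −4.5 in this sign convention.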

There is a clear break point around 2015, after which the modeled and observed curves start to diverge. Before this turning point, the best fit reached r² = 0.76, meaning that until then 76 % of the variance in lake levels can be explained by variations in the meteorological inputs alone. After this point, the difference between the curves increases markedly, as shown by the misfit curve in Fig. 5.

https://hess.copernicus.org/articles/28/4331/2024/hess-28-4331-2024-f05

Figure 5. Differences between the observations and the water balance model (non-climatic water balance anomaly). At positive values, there is surplus water in the lake compared to the model; at negative values, the lake shows a water deficit that is unaccounted for by the model.

Figure 5 shows that the residual variations cannot be explained by a single missing water balance component but instead point to a change in the system. Until 2015, the model agreed well with the lake level observations, apart from short-term deviations in both directions.

Between 2015 and 2022, the lake levels exhibit a downward trend (Fig. 4) that is not captured by the model. From 2015 onward, the model increasingly overestimates the observed lake levels (Fig. 5). The difference corresponds to a rainfall equivalent of about 10 mm per month, which, given the size of the catchment, amounts to around 4×10⁶ m³ per year. Because this change in the water balance happens very quickly, the time of change can be clearly identified as 2015, a strong turning point in the hydrological system.
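
The conversion between the monthly rainfall-equivalent deficit and the annual volume can be written out as follows. The catchment area A is not stated in this section, so the value shown is only what the two quoted numbers imply and should be read as an approximate back-calculation.

$$
10\ \mathrm{mm\ month^{-1}} \times 12\ \mathrm{months\ yr^{-1}} = 0.12\ \mathrm{m\ yr^{-1}}, \qquad
A \approx \frac{4\times10^{6}\ \mathrm{m^{3}\ yr^{-1}}}{0.12\ \mathrm{m\ yr^{-1}}} \approx 3.3\times10^{7}\ \mathrm{m^{2}} \approx 33\ \mathrm{km^{2}}
$$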

4.3 Linear model

To investigate the changes in the hydrological system in more detail, two data-driven linear models were constructed. Both are set up identically, taking daily precipitation and actual evapotranspiration as inputs with a memory of 30 d into the past (Eq. 6). The lake level data are filtered using a Butterworth low-pass filter with a cutoff period of 20 d. The only difference between the models is the calibration period. The 2004 model is calibrated on the 7 years starting in 2004, a relatively steady period according to the previous analysis, while the 2015 model is calibrated on the last 7 years of the dataset, starting in 2015 after the previously identified turning point. The calibrated models are then run over the complete time series, with each simulation anchored to the observed lake level on the first day of its calibration period. The results are shown in Fig. 6.
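
A minimal sketch of the calibration-and-simulation procedure, assuming Eq. (6) is an ordinary least-squares regression of the daily lake level change on precipitation and actual evapotranspiration at lags 0 to 29 d plus an intercept (the intercept and lag indexing are assumptions). For brevity, the Butterworth pre-filtering of the lake levels is omitted, and the names `calibrate` and `simulate` are hypothetical. Each simulation is integrated forward and backward from the observed level on the first day of its calibration window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

MEMORY = 30  # days of system memory (Sect. 4.1)


def lagged_features(p, et, memory=MEMORY):
    """Rows for days t = memory-1 .. n-1; columns P(t-k), ET(t-k) for k = 0..memory-1, then 1."""
    pw = sliding_window_view(np.asarray(p, float), memory)[:, ::-1]
    ew = sliding_window_view(np.asarray(et, float), memory)[:, ::-1]
    return np.hstack([pw, ew, np.ones((pw.shape[0], 1))])


def calibrate(p, et, level, start, stop, memory=MEMORY):
    """Fit coefficients on the daily level changes of days start <= t < stop."""
    X = lagged_features(p, et, memory)
    dz = np.diff(np.asarray(level, float), prepend=np.nan)  # dz[t] = level[t] - level[t-1]
    days = np.arange(memory - 1, len(level))
    use = (days >= max(start, 1)) & (days < stop)
    coef, *_ = np.linalg.lstsq(X[use], dz[days[use]], rcond=None)
    return coef


def simulate(p, et, level, coef, anchor, memory=MEMORY):
    """Integrate modeled daily changes forward and backward from the observed level on day `anchor`."""
    n = len(level)
    dz_hat = np.full(n, np.nan)
    dz_hat[memory - 1:] = lagged_features(p, et, memory) @ coef
    sim = np.full(n, np.nan)  # days before the first full memory window stay NaN
    sim[anchor] = level[anchor]
    for t in range(anchor + 1, n):
        sim[t] = sim[t - 1] + dz_hat[t]
    for t in range(anchor - 1, max(memory - 2, 0) - 1, -1):
        sim[t] = sim[t + 1] - dz_hat[t + 1]
    return sim
```

With a hypothetical day index `i2004` for the start of the first calibration window, the 2004 model would be `coef = calibrate(p, et, level, i2004, i2004 + 7 * 365)` followed by `simulate(p, et, level, coef, anchor=i2004)`.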

https://hess.copernicus.org/articles/28/4331/2024/hess-28-4331-2024-f06

Figure 6. Daily lake levels modeled with a linear model using different time periods for calibration (marked by blue shading): (a) 2004 model and (b) 2015 model.

The results in Fig. 6 clearly show the different dynamics of the two periods. The 2004 model, trained on the earlier period, behaves very similarly to the water balance model. The fit is good overall in the first 12 years, with some larger deviations in more extreme years such as 2007, which was exceptionally dry. After 2015, the model systematically overestimates the lake level, so an increasing gap opens between observed and modeled lake levels. This gap closely resembles that of the water balance model. Because the linear model was calibrated independently of the water balance model, this agreement provides an independent check on the chosen water balance parameters.

The 2015 model shows the opposite behavior. It captures the final steep decrease in lake levels relatively well (with an even better fit than the 2004 model), but when its behavior is extrapolated over the first half of the dataset, it overestimates the lake levels.

These results support the conclusion that the behavior of the lake system changed systematically around 2015. To diagnose these changes further, we now examine the calibrated models: their mathematical structure and their response behavior. Note that, because these models are purely data-driven, any process missing from the input data is compensated for by adjusting the coefficients for P and ET.

To make the effects of the different predictors comparable, the input data are first standardized for this analysis. The model coefficients then indicate the relative importance of the different inputs across the model's memory window. The results for the two investigated periods are compared in Fig. 7.
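
Standardizing before fitting can be sketched as follows, under the same assumed form of Eq. (6) (lags 0 to memory−1 plus an intercept). The summary measure `et_share`, the share of evapotranspiration in the summed absolute coefficients, is one plausible reading of the percentage importances quoted in this section; it is a hypothetical definition, not necessarily the one used in the study.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def standardize(x):
    """z-score a series (zero mean, unit population standard deviation)."""
    x = np.asarray(x, float)
    return (x - x.mean()) / x.std()


def standardized_coefficients(p, et, dz, memory=30):
    """Fit the lagged model on z-scored P and ET; return (P coefs, ET coefs, ET share)."""
    pw = sliding_window_view(standardize(p), memory)[:, ::-1]   # column k = lag k
    ew = sliding_window_view(standardize(et), memory)[:, ::-1]
    X = np.hstack([pw, ew, np.ones((pw.shape[0], 1))])
    y = np.asarray(dz, float)[memory - 1:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    a_p, a_et = coef[:memory], coef[memory:2 * memory]
    # Hypothetical importance measure: ET share of the summed absolute coefficients.
    et_share = np.abs(a_et).sum() / (np.abs(a_p).sum() + np.abs(a_et).sum())
    return a_p, a_et, et_share
```

Plotting `a_p` and `a_et` against the lag index for each calibration window gives curves of the kind shown in Fig. 7.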

https://hess.copernicus.org/articles/28/4331/2024/hess-28-4331-2024-f07

Figure 7. Model coefficients for precipitation and actual evapotranspiration: (a) 2004 model, with calibration period of 2004–2011, and (b) 2015 model, with calibration period of 2015–2022.

Figure 7 shows the coefficient values of the two models for different time lags. For example, the precipitation coefficient at lag 5 is the weight with which the precipitation of 5 d ago enters the calculation of today's lake level change (see Eq. 6). The plot shows that the lake reacts differently to precipitation and to evapotranspiration and that this difference depends on the calibration period, due to the hypothesized system changes. The effect of precipitation is detectable immediately, and its weight declines with increasing lag. In the 2004 model, the importance of rainfall is generally higher than in the 2015 model, where it decreases rapidly after the first 10 d.

These findings can be explained by the following conceptualization: after rainfall, as rainwater reaches the groundwater table, it creates a hydraulic gradient, and the hydraulic signal reaches the lake very rapidly. The impact of the rainfall remains visible for a few days, as some of the water takes longer to seep through the soil, and it decays continuously over time. Actual evapotranspiration, on the other hand, has a delayed influence on lake level changes: in the 2004 model, the importance of past days varies but is on average constant beyond 5–10 d, whereas in the 2015 model it increases for lags beyond 10–15 d. The overall importance of actual evapotranspiration is also higher in the 2015 model (42 % vs. 34 % in the 2004 model).

To analyze the model behavior further, we created two sub-models according to Eqs. (7) and (8). They separate the lake level response to the two input features, i.e., they simulate the lake level responses that would result from precipitation only or from actual evapotranspiration only. We used non-standardized inputs and outputs for this analysis, so that the two models can be compared directly in terms of the effects of each input. Two examples are shown in Fig. 8.
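
Assuming Eqs. (7) and (8) are the precipitation-only and evapotranspiration-only partial sums of the fitted Eq. (6), the two sub-model responses can be sketched as follows. The coefficient layout `[P lags, ET lags, intercept]` and the choice to drop the intercept from both components are assumptions, and `component_responses` is a hypothetical name. Subtracting the outputs of the 2004 model from those of the 2015 model gives difference curves of the kind compared in Fig. 8.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def component_responses(p, et, coef, memory=30):
    """Split a fitted lagged model into P-only and ET-only lake level responses.

    coef = [P lags 0..memory-1, ET lags 0..memory-1, intercept] (assumed layout).
    Returns cumulative responses from day memory-1 onward, intercept excluded.
    """
    pw = sliding_window_view(np.asarray(p, float), memory)[:, ::-1]   # column k = lag k
    ew = sliding_window_view(np.asarray(et, float), memory)[:, ::-1]
    dz_p = pw @ coef[:memory]              # daily change driven by precipitation
    dz_et = ew @ coef[memory:2 * memory]   # daily change driven by evapotranspiration
    return np.cumsum(dz_p), np.cumsum(dz_et)
```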

https://hess.copernicus.org/articles/28/4331/2024/hess-28-4331-2024-f08

Figure 8. Modeled hydraulic responses to different weather patterns: (a) summer weather input with high-intensity storms, (b) spring weather input with light rain, (c) model response to precipitation in summer, (d) model response to precipitation in spring, (e) model response to actual evapotranspiration in summer, and (f) model response to actual evapotranspiration in spring.

In this plot, we zoom into two parts of the dataset to compare the two models in detail, focusing on typical weather events. The first period (Fig. 8a) is the late summer of 2006, which had many days without precipitation but with high evapotranspiration, as well as single-day events with relatively high rainfall amounts. During this period, the 2015 model shows a systematically stronger response to actual evapotranspiration (Fig. 8e), which leads to a larger simulated lake level drop. The offset is not only vertical; there is also a time lag of 5–10 d between the two responses (as expected from the coefficients in Fig. 7).

The precipitation response is more complex: the 2004 model responds much more strongly to the larger rainfall events but more weakly to the absence of rain (Fig. 8c). Over this period, these effects balance out, so the two precipitation responses end up similar.

The situation is different during the calmer spring season of 2016 (Fig. 8b). Here, both the evapotranspiration responses (Fig. 8f) and the precipitation responses (Fig. 8d) of the two models run close to each other, with only small differences.

Therefore, the general offset between the two models is systematic. Between September and June, a period with regular rainfall and few dry days or extreme rainfall events, the models behave similarly; the discrepancies in these periods are usually small, so the two models stay relatively close to each other.

This result shows that the main difference in system behavior between the two periods stems from differences in evapotranspiration during the summer. The seasonality of the model differences is shown in more detail in Fig. 9, where the median of the lake level response differences for the two inputs is plotted for each month of the year.
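
The monthly summary behind Fig. 9 can be sketched as follows, assuming daily differences between the 2015 and 2004 model responses and their dates are available as arrays. The source does not state how the confidence intervals were computed; this sketch uses a percentile bootstrap of the median as a stand-in, and the function name and defaults are hypothetical.

```python
import numpy as np


def monthly_median_ci(dates, diff, n_boot=1000, alpha=0.05, seed=0):
    """Median of daily response differences per calendar month, with a
    percentile-bootstrap confidence interval for that median."""
    dates = np.asarray(dates, dtype="datetime64[D]")
    months = dates.astype("datetime64[M]").astype(int) % 12 + 1  # 1 = January
    diff = np.asarray(diff, float)
    rng = np.random.default_rng(seed)
    result = {}
    for month in range(1, 13):
        vals = diff[months == month]
        if vals.size == 0:
            continue  # no data for this calendar month
        boot = np.median(rng.choice(vals, size=(n_boot, vals.size), replace=True), axis=1)
        lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        result[month] = (float(np.median(vals)), float(lo), float(hi))
    return result
```

Each entry maps a calendar month to `(median, lower, upper)`, which can be plotted directly as a line with a shaded band.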

https://hess.copernicus.org/articles/28/4331/2024/hess-28-4331-2024-f09

Figure 9. Yearly dynamics of the model discrepancies (2015 model − 2004 model): (a) median difference in precipitation response with confidence intervals and (b) median difference in evapotranspiration response with confidence intervals.

The median difference in precipitation response (Fig. 9a) shows that, in the long run, the differences in rainfall response between the two models cancel out. There are some small positive anomalies in spring and fall, but this effect is much smaller than that seen for evapotranspiration.

The evapotranspiration response also shows a small positive anomaly during these periods, which keeps the two models close during the winter. Figure 9b clearly shows that the main difference between the two models originates from the difference in summer evapotranspiration. This difference is very consistent over the years, as indicated by the narrow confidence interval, which is not surprising because rainfall varies much more from year to year.

5 Discussion

The water balance model shows a yearly deficit of 4×10⁶ m³ relative to the climatic water balance since 2015. We take the two linear models as representing the system behavior during the relatively stable periods 2002–2015 and 2015–2022, respectively. In these simple models, the change in system behavior between the two periods is projected onto differences in the responses to precipitation and actual evapotranspiration, whereas in reality a number of other processes are likely responsible. The changes in the responses can nevertheless be analyzed to formulate hypotheses about the processes actually at work. In this section, we discuss some of these hypotheses.

5.1 Water management

A possible factor responsible for the accelerated decrease in lake water levels since 2015 that has been put forward by local stakeholders is an increase in water abstractions at the nearby water supply wells. Our analysis, however, does not support this hypothesis. There was no reported change in the abstraction rate of the local waterworks in this period, and an increased abstraction rate would hardly explain the change in the short-term system dynamics – it would appear as a constant shift in water loss instead. Nevertheless, groundwater abstractions could affect the resilience of the lake to climate change effects, as was shown by Schulz et al. (2020) for Lake Urmia. Based on process-based modeling, their study did not find a direct correlation between the abstractions and water level variability of Lake Urmia as the lake could buffer the reduced inflow. However, in forecast scenarios, they achieved higher lake levels with reduced abstraction rates.

Abstractions by local households, directly from the lake or from the groundwater, could also have an effect because residents tend to use these water sources for gardening. To calculate an upper bound for such private abstractions, we assume that all 5000 residents of Groß Glienicke (on the Brandenburg side of the lake) use 7 % (Schleich and Hillenbrand, 2009) of an average daily water consumption of 200 L (OECD average) for gardening, which would amount to 26 250 m³ of abstraction per year. This is far less than the estimated water deficit.
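
The upper-bound estimate for private abstractions is simple arithmetic; the snippet below only makes the inputs explicit. With a 365-day year, these inputs give about 25 550 m³, slightly below the 26 250 m³ quoted above, which presumably reflects rounding or an input not stated here. Either way, the result is roughly 150 times smaller than the estimated annual deficit of 4×10⁶ m³.

```python
residents = 5000        # Groß Glienicke, Brandenburg side of the lake
daily_use_l = 200       # litres per person and day (OECD average)
garden_share = 0.07     # fraction used for gardening (Schleich and Hillenbrand, 2009)
deficit_m3 = 4e6        # non-climatic water deficit since 2015, m³ per year

garden_m3 = residents * daily_use_l * garden_share * 365 / 1000  # litres -> m³ per year
print(f"{garden_m3:,.0f} m³/yr, about {deficit_m3 / garden_m3:.0f}x smaller than the deficit")
```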

Another local water facility is a former sewage farm (Rieselfeld Karolinenhöhe) north of the catchment, where large volumes of untreated wastewater were infiltrated into the groundwater system until 2010. The effects of this facility have been studied extensively (Haacke et al., 2018; Liese et al., 2004), but no direct link has been found between the sewage farm and the lake's catchment, as the infiltrated water flowed directly into the Havel River. The sewage farm ceased operation in 2010, well before the identified turning point.

Another infrastructural change in the area was an upgrade of the sewer system. The most notable example is the former British air force base (General Steinhoff Kaserne) east of the lake, where an area of almost 500 000 m² was connected to the rainwater canalization system between 2012 and 2017; the new system now routes the collected rainwater to the Havel River outside the catchment (Döllefeld et al., 2021). Assuming that 90 % of this water would previously have reached the groundwater or the lake (an assumption based on the urban evapotranspiration fraction of the CER dataset), this could account for up to 225 000 m³ of the missing fluxes from the catchment.

Dialog with the local community also suggested that this canalization upgrade extended beyond the former air base and might also have included the sewage system. Unfortunately, no reports or studies are available, but, as for the private abstractions, we can estimate the sewage production of the districts around the lake. Assuming 20 000 residents in the area and an average sewage production of 120 L d⁻¹ per person (Umweltbundesamt, 2023), we arrive at an estimate of about 900 000 m³ per year. This could serve as an upper bound for the potential effect of the infrastructure change if all of this water had previously been discharged directly into the catchment. This is, however, most likely a large overestimation, as documents from 2012 and earlier show the area to be connected to the Berlin canalization system (https://www.berlin.de/umweltatlas/wasser/regen-und-abwasser/2012/literatur/, last access: 11 September 2024). The upgrade most likely affected only the rainwater canalization, which has a reported average yearly flux of around 100 000 m³.
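
The two infrastructure estimates can be checked in the same way and compared with the annual deficit. The 225 000 m³ air-base figure implies an annual rainfall of about 500 mm on the connected area; that value is inferred here from the quoted numbers and is an assumption, not a figure from the source. The sewage bound assumes a per-capita production of 120 L per day, which reproduces the roughly 900 000 m³ per year given in the text.

```python
deficit_m3 = 4e6                    # yearly non-climatic deficit since 2015, m³

# Former air base: connected area x annual rainfall x fraction reaching groundwater/lake
airbase_area_m2 = 500_000
annual_rain_m = 0.5                 # ASSUMPTION: implied by the 225 000 m³ figure
recharge_fraction = 0.9
airbase_m3 = airbase_area_m2 * annual_rain_m * recharge_fraction

# Sewage upper bound: residents x litres per person and day x 365 days, in m³
sewage_m3 = 20_000 * 120 * 365 / 1000

rainwater_reported_m3 = 100_000     # reported average yearly rainwater canalization flux

for name, volume in [("air base", airbase_m3),
                     ("sewage (upper bound)", sewage_m3),
                     ("rainwater canalization", rainwater_reported_m3)]:
    print(f"{name}: {volume:,.0f} m³/yr = {100 * volume / deficit_m3:.1f} % of the deficit")
```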

Still, this amount could have affected the hydrological system to some extent (see Fig. 11), but it cannot explain the observed misfit. In particular, a lack of infiltrated water in the catchment would not explain the observed seasonality of the misfits (Fig. 9). Further analysis of this issue is limited because the lake lies on the administrative boundary between the German states of Berlin and Brandenburg, so some of the relevant information is only available for one side of the lake.

5.2 Vegetation increase

A significant part of the catchment area west of the lake is covered by forests and heathlands (Fig. 1a). Satellite imagery reveals an increasing trend in the normalized difference vegetation index (NDVI) between 2002 and 2018 (Fig. 10a). This suggests a general increase in vegetation within the catchment area over the past decades, likely attributable to increased forest canopy density and an expansion of woody vegetation.

https://hess.copernicus.org/articles/28/4331/2024/hess-28-4331-2024-f10

Figure 10. Analysis of vegetation trends: (a) yearly average NDVI values integrated over the catchment and (b) comparison of the yearly non-climatic water balance anomaly and the cumulative NDVI anomaly of the catchment. The NDVI anomaly is calculated relative to the average NDVI of the 2002–2015 period, before the expected turning point.

Comparing these data with the water balance gap shows striking similarities. In Fig. 10b, we calculated the cumulative sum of the NDVI anomaly relative to the 2002–2015 period average and obtained a similar trend to the water balance anomaly. This suggests a possible connection between the two trends.

A denser canopy intercepts more rainfall, which is then available for evaporation, and more mature trees have higher transpiration rates; hence, a denser canopy reduces groundwater recharge. The model discrepancies in Fig. 8 are most pronounced in the growing season, when the tree canopies are most developed. This supports the hypothesis that the forest in the catchment has a strong effect on the hydrological system.

The land cover analysis also indicates that our modeling could be improved by accounting for the heterogeneous land cover of the catchment when calculating evapotranspiration. Besides the observed 10 % increase in NDVI, MODIS evapotranspiration data show a 5 %–15 % increase in forest evapotranspiration in the region (see Fig. S2 in the Supplement). The impact of this change on the lake levels is equivalent to a yearly flux of 800 000 m³.

This amount could partly explain the water balance deficit, and the increase in evapotranspiration would also explain why the two linear models differ most during the growing season. However, to gain a more precise understanding of the effects of vegetation cover changes, a more detailed process-based analysis would be required, including biophysical modeling of the trees and detailed modeling of the recharge process.

5.3 Regional groundwater trends

Another hypothesis relates the change in lake dynamics to a larger-scale, regional groundwater trend. Lischeid (2021) analyzed lake and groundwater level time series in the region with principal component analysis and concluded that lakes situated on the higher parts of this lowland region are more sensitive to falling water levels than lakes in the valley bottoms, because higher-lying lakes are prone to losing their direct connection with the groundwater.

The larger region of Brandenburg has a negative climatic water balance, with water flowing in from areas with a positive budget either as groundwater or as surface water flow. This system, however, is currently under stress not only due to climate change but also due to the reduced flows of the Spree River, which are caused by the closure of open-pit mines in the Lausitz region (Habel et al., 2023). Consequently, groundwater levels have been declining in multiple parts of the region over the last decades, and several mainly groundwater-fed lakes show similarly decreasing levels (e.g., Groß Seddiner Lake, Großer Wummsee).

The exact effect of groundwater trends cannot be quantified without groundwater modeling. Still, some signs of deviation between the lake and groundwater levels can be seen after 2015, though these are not decisively clear. We also cannot attribute the seasonality in the differences between the two periods to this explanation.

5.4 Environmental tipping point

Ultimately, it seems most plausible that the observed lake level behavior is the result of a combination of the above-mentioned explanations.

Figure 11 compares the estimated impacts of all potential explanations of the non-climatic water balance anomaly considered here. None of the anthropogenic or ecological factors alone is sufficient to explain the water deficit completely, even when its maximum potential impact is considered. Note that the effect of regional groundwater trends cannot be estimated; we therefore used it only to account for the flux remaining after the other explanations were applied.

https://hess.copernicus.org/articles/28/4331/2024/hess-28-4331-2024-f11

Figure 11. Potential impacts of the different explanations of the non-climatic water balance anomaly.

Based on these results, we could not single out one reason for the sudden change in the water balance in 2015, but we found multiple processes that probably all contribute to the loss of water. Through their combined effect, the hydrological system could have reached a tipping point around 2015 that altered the water balance. This tipping point could have been a critical groundwater level, reached due to the infrastructural and environmental changes, below which the surface water–groundwater connection was disrupted. The lake bed morphology or the subsurface catchment geometry could explain the existence of such a critical level.

Another explanation is that the increase in vegetation on the west side of the catchment reduced the groundwater levels locally so that it altered the groundwater flow regime. The gradient of the groundwater table in this area is very small ( $3 \times 10^{- 4}$ m m⁻¹); hence, a local decrease in recharge could divert the groundwater flow and modify the subsurface catchment size. This explanation is in line with our finding that the difference in the hydrological system appears mainly during the growing season. However, to analyze these explanations further, more detailed process-based modeling is required.

6 Conclusions

In this paper, we have shown how a systematic downward model development approach, using water balance and data-driven models, could help with the investigation of a relatively under-studied lake system.

Water balance and data-driven models are well suited to such cases because they rely mainly on observed data, which are generally more readily available than system knowledge (process understanding). In the current information age, this imbalance is expected to shift even further in favor of data-rich problems. The presented methodology is readily transferable to similar groundwater-fed lowland lakes and can be used to identify the major drivers of lake level dynamics. It can be adapted to systems with surface water connections by extending the models with further features: a net surface water inflow term in the water balance model and an additional feature for surface water dynamics in the data-driven model. With the help of high-resolution weather forcing data and lake level observations, we identified a relation between climatic and lake level variations. Water balance modeling helped to estimate the inflows and outflows of the system and to reveal long-term dynamics. Data-driven modeling then gave a more detailed picture of the short-term behavior of the lake system, including its responses to different weather patterns. Together, these methods provided an effective toolset for understanding lake level changes and their drivers in a case where prior knowledge of the hydrological system and its processes was limited.

The developed water balance and data-driven models provided very good fits with lake level observations, which shows not just the potential of the modeling approaches but also the applicability of the CER v2 weather dataset. The approach revealed the main drivers of the lake level dynamics and provided insights into systemic changes in the hydrological system, which led to hypotheses regarding the lake level loss.

However, the presented methodology was not able to clearly identify the exact reason behind the non-climatic lake level loss, and the proposed hypotheses can only be proved or disproved with additional experiments and/or process-based modeling.

Another drawback of the presented methodology is its strong reliance on good-quality data. Closing the water balance and obtaining a good fit with the linear model were possible only because of the high accuracy of the weather dataset. Due to the spatial variability of precipitation, replacing it with weather station data would lead to a significant drop in model accuracy. Hence, in data-scarce regions, robust process-based approaches might be a better solution, as they are capable of transferring knowledge from other comparable catchments, although without data they would operate with large uncertainties.

Our results showed that the lake level variations of Groß Glienicker Lake between 2002 and 2015 can be explained by the variations in net precipitation, i.e., by precipitation and actual evapotranspiration over the catchment. We identified a change in the hydraulic system around 2015, which not only resulted in a loss of 4×10⁶ m³ of water per year but also changed the hydraulic response of the lake to the climatic inputs.

Increased evapotranspiration due to the maturation of a forest in the catchment could explain the altered system dynamics, and the change in vegetation cover aligns well with the observed hydrological trend. Therefore, the water loss can be attributed at least partly to the growth of the forest. Another likely reason is the continuous decline of groundwater levels in Brandenburg due to climate change, which has been suggested to disrupt the connection of surface water bodies to the groundwater, increasing their outflow. Regional studies show similar lake level trends in several lakes of the area.

Ultimately, the most likely explanation is the combination of the aforementioned processes, which made the hydrological system cross a tipping point during the investigation period. Further analysis of this explanation requires more detailed modeling of the individual processes and the development of a groundwater model.

The findings of this paper will be used to support the development of such a model. Our main recommendations are to include a dynamic land cover model that can account for changes in vegetation and to take advantage of gridded meteorological forcing data. While we would still suggest including the waterworks wells in the model (for which data series are available), we have shown that the effect of private abstractions is negligible, whereas the effect of infrastructural changes can be significant. Finally, we have emphasized the importance of the regionally observed changes in groundwater levels, which need to be considered in any physically based modeling effort.

Data availability

The CER v2 dataset is available from the website of the Chair of Climatology, TU Berlin, under the following link: https://www.tu.berlin/en/klima/research/regional-climatology/central-europe/cer (Bart et al., 2023).

The lake level data of Groß Glienicker Lake can be accessed at the water portal of the Berlin Senate: https://wasserportal.berlin.de/stationen_start.php (SenUVK, 2023).

Supplement

The supplement related to this article is available online at: https://doi.org/10.5194/hess-28-4331-2024-supplement.

Author contributions

DS, FB and UF prepared and curated the forcing meteorological data series. MS performed the modeling and the data analysis. AO provided the remote sensing data and analyzed the vegetation growth. TK and DS guided the conceptualization and the internal review process. All co-authors contributed to the preparation of the paper.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Financial support

This study was funded through the Einstein Research Unit “Climate and Water under Change” from the Einstein Foundation Berlin and Berlin University Alliance under grant no. ERU-2020-609.

This open-access publication was funded by the Humboldt-Universität zu Berlin.

Review statement

This paper was edited by Theresa Blume and reviewed by Stephan Schulz and one anonymous referee.

References

Alifujiang, Y., Abuduwaili, J., Ma, L., Samat, A., and Groll, M.: System Dynamics Modeling of Water Level Variations of Lake Issyk-Kul, Kyrgyzstan, Water-Sui, 9, 989, https://doi.org/10.3390/w9120989, 2017.

Altunkaynak, A.: Forecasting Surface Water Level Fluctuations of Lake Van by Artificial Neural Networks, Water Resour. Manag., 21, 399–408, https://doi.org/10.1007/s11269-006-9022-6, 2007.

Arhonditsis, G. B., Neumann, A., Shimoda, Y., Kim, D.-K., Dong, F., Onandia, G., Yang, C., Javed, A., Brady, M., Visha, A., Ni, F., and Cheng, V.: Castles built on sand or predictive limnology in action? Part A: Evaluation of an integrated modelling framework to guide adaptive management implementation in Lake Erie, Ecol. Inform., 53, 100968, https://doi.org/10.1016/j.ecoinf.2019.05.014, 2019.

Bart, F., Schmidt, B., Wang, X., Holtmann, A., Meier, F., Otto, M., and Scherer, D.: CER v2 dataset, TU Berlin [data set], https://www.tu.berlin/en/klima/research/regional-climatology/central-europe/cer (last access: 1 November 2023), 2023.

Beletsky, D., Hawley, N., and Rao, Y. R.: Modeling summer circulation and thermal structure of Lake Erie, J. Geophys. Res.-Oceans, 118, 6238–6252, https://doi.org/10.1002/2013JC008854, 2013.

Clarke, R. T.: A review of some mathematical models used in hydrology, with observations on their calibration and use, J. Hydrol., 19, 1–20, https://doi.org/10.1016/0022-1694(73)90089-9, 1973.

Crapper, P. F., Fleming, P. M., and Kalma, J. D.: Prediction of lake levels using water balance models, Environ. Softw., 11, 251–258, https://doi.org/10.1016/S0266-9838(96)00018-4, 1996.

Dehghanipour, A. H., Zahabiyoun, B., Schoups, G., and Babazadeh, H.: A WEAP-MODFLOW surface water-groundwater model for the irrigated Miyandoab plain, Urmia lake basin, Iran: Multi-objective calibration and quantification of historical drought impacts, Agr. Water Manage., 223, 105704, https://doi.org/10.1016/j.agwat.2019.105704, 2019.

Demir, V. and Yaseen, Z. M.: Neurocomputing intelligence models for lakes water level forecasting: a comprehensive review, Neural Comput. Appl., 35, 303–343, https://doi.org/10.1007/s00521-022-07699-z, 2023.

Döllefeld, M., Haag, L., and Welsch, J.: Umweltatlas Berlin – planungsrelevante Umweltdaten für Berlin, ZfV – Zeitschrift für Geodäsie, Geoinformation und Landmanagement, 2, 138–143, https://doi.org/10.12902/zfv-0341-2021, 2021.

DWD: Klimareport Brandenburg. 1. Auflage; Deutscher Wetterdienst, Offenbach am Main, Deutschland, 40 pp., ISBN 978-3-88148-518-0, 2019.

Ebtehaj, I., Bonakdari, H., and Gharabaghi, B.: A reliable linear method for modeling lake level fluctuations, J. Hydrol., 570, 236–250, https://doi.org/10.1016/j.jhydrol.2019.01.010, 2019.

Elshorbagy, A., Corzo, G., Srinivasulu, S., and Solomatine, D. P.: Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology – Part 1: Concepts and methodology, Hydrol. Earth Syst. Sci., 14, 1931–1941, https://doi.org/10.5194/hess-14-1931-2010, 2010a.

Elshorbagy, A., Corzo, G., Srinivasulu, S., and Solomatine, D. P.: Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology – Part 2: Application, Hydrol. Earth Syst. Sci., 14, 1943–1961, https://doi.org/10.5194/hess-14-1943-2010, 2010b.

Geoportal Brandenburg – Detailansichtdienst: https://geoportal.brandenburg.de/detailansichtdienst/render?url=https://geoportal.brandenburg.de/gs-json/xml?fileid=A140C263-7D61-447B-81C2-8824792AE190, last access: 29 April 2024.

Getachew, B., Manjunatha, B. R., and Bhat, H. G.: Modeling projected impacts of climate and land use/land cover changes on hydrological responses in the Lake Tana Basin, upper Blue Nile River Basin, Ethiopia, J. Hydrol., 595, 125974, https://doi.org/10.1016/j.jhydrol.2021.125974, 2021.

Ghashghaie, M. and Nozari, H.: Effect of Dam Construction on Lake Urmia: Time Series Analysis of Water Level via ARIMA, J. Agric. Sci. Technol., 20, 1541–1553, 2018.

Gong, Y., Liu, G., and Schwartz, F.: Quantifying the Response Time of a Lake–Groundwater Interacting System to Climatic Perturbation, Water-Sui, 7, 6598–6615, https://doi.org/10.3390/w7116598, 2015.

Habel, M., Nowak, B., and Szadek, P.: Evaluating indicators of hydrologic alteration to demonstrate the impact of open-pit lignite mining on the flow regimes of small and medium-sized rivers, Ecol. Indic., 157, 111295, https://doi.org/10.1016/j.ecolind.2023.111295, 2023.

Haacke, N., Frick, M., Scheck-Wenderoth, M., Schneider, M., and Cacace, M.: 3-D Simulations of Groundwater Utilization in an Urban Catchment of Berlin, Germany, Adv. Geosci., 45, 177–184, https://doi.org/10.5194/adgeo-45-177-2018, 2018.

Hassanzadeh, E., Zarghami, M., and Hassanzadeh, Y.: Determining the Main Factors in Declining the Urmia Lake Level by Using System Dynamics Modeling, Water Resour. Manag., 26, 129–145, https://doi.org/10.1007/s11269-011-9909-8, 2012.

Heinrich, L., Dietel, J., and Hupfer, M.: Sulphate reduction determines the long-term effect of iron amendments on phosphorus retention in lake sediments, J. Soil. Sediment., 22, 316–333, https://doi.org/10.1007/s11368-021-03099-3, 2022.

Heuvelmans, G., Muys, B., and Feyen, J.: Regionalisation of the parameters of a hydrological model: Comparison of linear regression models with artificial neural nets, J. Hydrol., 319, 245–265, https://doi.org/10.1016/j.jhydrol.2005.07.030, 2006.

Hrachowitz, M. and Clark, M. P.: HESS Opinions: The complementary merits of competing modelling philosophies in hydrology, Hydrol. Earth Syst. Sci., 21, 3953–3973, https://doi.org/10.5194/hess-21-3953-2017, 2017.

Irvine, K. N. and Eberhardt, A. J.: Multiplicative, Seasonal Arima Models for Lake Erie and Lake Ontario Water Levels, JAWRA J. Am. Water Resour. As., 28, 385–396, https://doi.org/10.1111/j.1752-1688.1992.tb04004.x, 1992.

Jahn, D. and Witt, H.: Gewässeratlas von Berlin: Senatsverwaltung für Stadtentwicklung, UNICOM, Berlin, https://www.berlin.de/sen/uvk/_assets/umwelt/wasser-und-geologie/publikationen-und-merkblaetter/wasseratlas.pdf (last access: 12 September 2024), 2002.

Jänicke, B., Meier, F., Fenner, D., Fehrenbach, U., Holtmann, A., and Scherer, D.: Urban-rural differences in near-surface air temperature as resolved by the Central Europe Refined analysis (CER): sensitivity to planetary boundary layer schemes and urban canopy models, Int. J. Climatol., 37, 2063–2079, https://doi.org/10.1002/joc.4835, 2017.

Kakahaji, H., Banadaki, H. D., Kakahaji, A., and Kakahaji, A.: Prediction of Urmia Lake Water-Level Fluctuations by Using Analytical, Linear Statistic and Intelligent Methods, Water Resour. Manag., 27, 4469–4492, https://doi.org/10.1007/s11269-013-0420-2, 2013.

Kebede, S., Travi, Y., Alemayehu, T., and Marc, V.: Water balance of Lake Tana and its sensitivity to fluctuations in rainfall, Blue Nile basin, Ethiopia, J. Hydrol., 316, 233–247, https://doi.org/10.1016/j.jhydrol.2005.05.011, 2006.

Kisi, O., Shiri, J., and Nikoofar, B.: Forecasting daily lake levels using artificial intelligence approaches, Comput. Geosci., 41, 169–180, https://doi.org/10.1016/j.cageo.2011.08.027, 2012.

Kleeberg, A., Köhler, A., and Hupfer, M.: How effectively does a single or continuous iron supply affect the phosphorus budget of aerated lakes?, J. Soil. Sediment., 12, 1593–1603, https://doi.org/10.1007/s11368-012-0590-1, 2012.

Klotz, D., Kratzert, F., Gauch, M., Keefe Sampson, A., Brandstetter, J., Klambauer, G., Hochreiter, S., and Nearing, G.: Uncertainty estimation with deep learning for rainfall–runoff modeling, Hydrol. Earth Syst. Sci., 26, 1673–1693, https://doi.org/10.5194/hess-26-1673-2022, 2022.

Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and Nearing, G.: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrol. Earth Syst. Sci., 23, 5089–5110, https://doi.org/10.5194/hess-23-5089-2019, 2019.

Kroll, C. N. and Song, P.: Impact of multicollinearity on small sample hydrologic regression models, Water Resour. Res., 49, 3756–3769, https://doi.org/10.1002/wrcr.20315, 2013.

Langbein, W. B.: Salinity and hydrology of closed lakes: A study of the long-term balance between input and loss of salts in closed lakes, 412, US Government Print. Office, https://doi.org/10.3133/pp412, 1961.

Laval, B., Imberger, J., Hodges, B. R., and Stocker, R.: Modeling circulation in lakes: Spatial and temporal variations, Limnol. Oceanogr., 48, 983–994, https://doi.org/10.4319/lo.2003.48.3.0983, 2003.

Li, J., Wang, Z., Wu, X., Xu, C.-Y., Guo, S., Chen, X., and Zhang, Z.: Robust Meteorological Drought Prediction Using Antecedent SST Fluctuations and Machine Learning, Water Resour. Res., 57, e2020WR029413, https://doi.org/10.1029/2020WR029413, 2021.

Liese, M., Nagare, R., and Voigt, H.-J.: 12 Jahre Pilotbetrieb Karolinenhöhe – eine erste Auswertung. Kompetenzzentrum Wasser Berlin gGmbH, https://www.kompetenz-wasser.de/media/pages/forschung/publikationen/44/efe477f2f4-1702634137/Liese-2004-44.pdf (last access: 17 September 2024), ISBN 978-3-9811684-2-6, 2004.

Lischeid, G.: Abschätzung des mittelfristigen Niedrigwasserrisikos anhand der Daten des Grundwassermonitorings, KW Korrespondenz Wasserwirtschaft, 12, 780–785, https://doi.org/10.3243/kwe2021.12.004, 2021.

Lischeid, G., Dannowski, R., Kaiser, K., Nützmann, G., Steidl, J., and Stüve, P.: Inconsistent hydrological trends do not necessarily imply spatially heterogeneous drivers, J. Hydrol., 596, 126096, https://doi.org/10.1016/j.jhydrol.2021.126096, 2021.

Lu, C., He, X., Zhang, B., Wang, J., Kidmose, J., and Jarsjö, J.: Comparison of Numerical Methods in Simulating Lake–Groundwater Interactions: Lake Hampen, Western Denmark, Water-Sui, 14, 3054, https://doi.org/10.3390/w14193054, 2022.

Mason, I. M., Guzkowska, M. A. J., Rapley, C. G., and Street-Perrott, F. A.: The response of lake levels and areas to climatic change, Climatic Change, 27, 161–197, https://doi.org/10.1007/BF01093590, 1994.

Maussion, F., Scherer, D., Mölg, T., Collier, E., Curio, J., and Finkelnburg, R.: Precipitation Seasonality and Variability over the Tibetan Plateau as Resolved by the High Asia Reanalysis, J. Climate, 27, 1910–1927, https://doi.org/10.1175/JCLI-D-13-00282.1, 2014.

McGovern, A., Lagerquist, R., Gagne, D. J., Jergensen, G. E., Elmore, K. L., Homeyer, C. R., and Smith, T.: Making the Black Box More Transparent: Understanding the Physical Implications of Machine Learning, B. Am. Meteorol. Soc., 100, 2175–2199, https://doi.org/10.1175/BAMS-D-18-0195.1, 2019.

Montanari, A., Rosso, R., and Taqqu, M. S.: Fractionally differenced ARIMA models applied to hydrologic time series: Identification, estimation, and simulation, Water Resour. Res., 33, 1035–1044, https://doi.org/10.1029/97WR00043, 1997.

Oyebode, O. and Stretch, D.: Neural network modeling of hydrological systems: A review of implementation techniques, Nat. Resour. Model., 32, e12189, https://doi.org/10.1111/nrm.12189, 2019.

Páliz Larrea, P., Zapata-Ríos, X., and Campozano Parra, L.: Application of Neural Network Models and ANFIS for Water Level Forecasting of the Salve Faccha Dam in the Andean Zone in Northern Ecuador, Water-Sui, 13, 2011, https://doi.org/10.3390/w13152011, 2021.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E.: Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., 12, 2825–2830, 2011.

Pflugmacher, D., Rabe, A., Peters, M., and Hostert, P.: Mapping pan-European land cover using Landsat spectral-temporal metrics and the European LUCAS survey, Remote Sens. Environ., 221, 583–595, https://doi.org/10.1016/j.rse.2018.12.001, 2019.

Sahoo, B. B., Jha, R., Singh, A., and Kumar, D.: Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting, Acta Geophys., 67, 1471–1481, https://doi.org/10.1007/s11600-019-00330-1, 2019.

Sahoo, S. and Jha, M. K.: Groundwater-level prediction using multiple linear regression and artificial neural network techniques: a comparative assessment, Hydrogeol. J., 21, 1865–1887, https://doi.org/10.1007/s10040-013-1029-5, 2013.

Schleich, J. and Hillenbrand, T.: Determinants of residential water demand in Germany, Ecol. Econ., 68, 1756–1769, https://doi.org/10.1016/j.ecolecon.2008.11.012, 2009.

Schulz, S., Darehshouri, S., Hassanzadeh, E., Tajrishy, M., and Schüth, C.: Climate change or irrigated agriculture – what drives the water level decline of Lake Urmia, Sci. Rep.-UK, 10, 236, https://doi.org/10.1038/s41598-019-57150-y, 2020.

Seeboonruang, U.: An application of time-lag regression technique for assessment of groundwater fluctuations in a regulated river basin: a case study in Northeastern Thailand, Environ. Earth Sci., 73, 6511–6523, https://doi.org/10.1007/s12665-014-3872-7, 2015.

Şen, Z., Kadioğlu, M., and Batur, E.: Stochastic Modeling of the Van Lake Monthly Level Fluctuations in Turkey, Theor. Appl. Climatol., 65, 99–110, https://doi.org/10.1007/s007040050007, 2000.

SenUVK: Wasserportal Berlin, SenUVK Berlin [data set], https://wasserportal.berlin.de/stationen_start.php (last access: 13 September 2024), 2023.

Sivapalan, M. and Young, P. C.: Downward Approach to Hydrological Model Development, in: Encyclopedia of hydrological sciences, edited by: Anderson, M. G., Wiley, Chichester, https://doi.org/10.1002/0470848944.hsa141, 2005.

Sivapalan, M., Blöschl, G., Zhang, L., and Vertessy, R.: Downward approach to hydrological prediction, Hydrol. Process., 17, 2101–2111, https://doi.org/10.1002/hyp.1425, 2003.

Solomatine, D. P. and Ostfeld, A.: Data-driven modelling: some past experiences and new approaches, J. Hydroinform., 10, 3–22, https://doi.org/10.2166/hydro.2008.015, 2008.

Souza, F. A., Araújo, R., and Mendes, J.: Review of soft sensor methods for regression applications, Chemometr. Intell. Lab., 152, 69–79, https://doi.org/10.1016/j.chemolab.2015.12.011, 2016.

Tasker, G. D.: Hydrologic regression with weighted least squares, Water Resour. Res., 16, 1107–1113, https://doi.org/10.1029/WR016i006p01107, 1980.

Umweltbundesamt: Sewage sludge disposal in Germany, https://www.umweltbundesamt.de/en/topics/sewage-sludge-disposal-in-germany, last access: 31 May 2023.

Valipour, R., Fong, P., McCrimmon, C., Zhao, J., van Stempvoort, D. R., and Rao, Y. R.: Hydrodynamics of a large lake with complex geometry and topography: Lake of the Woods, J. Great Lakes Res., 49, 82–96, https://doi.org/10.1016/j.jglr.2022.09.009, 2023.

Wang, X., Tolksdorf, V., Otto, M., and Scherer, D.: WRF-based dynamical downscaling of ERA5 reanalysis data for High Mountain Asia: Towards a new version of the High Asia Refined analysis, Int. J. Climatol., 41, 743–762, https://doi.org/10.1002/joc.6686, 2021.

Wolter, K.-D.: Restoration of Eutrophic Lakes by Phosphorus Precipitation, with a Case Study on Lake Gross-Glienicker, in: Restoration of Lakes, Streams, Floodplains, and Bogs in Europe, Springer, Dordrecht, 85–99, https://doi.org/10.1007/978-90-481-9265-6_7, 2010.

Woolway, R. I., Kraemer, B. M., Lenters, J. D., Merchant, C. J., O'Reilly, C. M., and Sharma, S.: Global lake responses to climate change, Nat. Rev. Earth Environ., 1, 388–403, https://doi.org/10.1038/s43017-020-0067-5, 2020.

Xu, C.-Y. and Singh, V. P.: A Review on Monthly Water Balance Models for Water Resources Investigations, Water Resour. Manag., 12, 20–50, https://doi.org/10.1023/A:1007916816469, 1998.

Zhu, S., Hrnjica, B., Ptak, M., Choiński, A., and Sivakumar, B.: Forecasting of water level in multiple temperate lakes using machine learning models, J. Hydrol., 585, 124819, https://doi.org/10.1016/j.jhydrol.2020.124819, 2020a.

Zhu, S., Lu, H., Ptak, M., Dai, J., and Ji, Q.: Lake water-level fluctuation forecasting using machine learning models: a systematic review, Environ. Sci. Pollut. R., 27, 44807–44819, https://doi.org/10.1007/s11356-020-10917-7, 2020b.


Short summary

We study the drivers behind changes in lake levels by building a series of models of increasing complexity. We show that the decreasing levels of Groß Glienicker Lake in Germany are not simply the result of climatic changes but are also affected by other processes. In our example, reduced inflow caused by a growing forest, regionally sinking groundwater levels and modifications to the local rainwater infrastructure together resulted in increasing lake level loss.