Karst spring discharge modeling based on deep learning using spatially distributed input data

,

2 Data and Study Areas

Overview
In this study, we investigate three different karst springs: Aubach spring in the Hochifen-Gottesacker area in Austria (Fig. 75 1a), springs of Unica river in Slovenia (Fig. 1b) and Lez spring in southern France (Fig. 1c). All springs show different characteristics regarding relevant system properties (e.g. catchment size, complexity of the hydrological system), environmental conditions (e.g. dominant climate, anthropogenic forcing) and data availability (see also Table A1). Furthermore, all areas are well studied, existing data was easily accessible and except Unica, several previous modeling approaches are available for comparison.

Aubach Spring, Austria
Aubach spring is a major karst spring in the Hochifen-Gottesacker karst area in the northern Alps at the border between Germany and Austria. Southern border of the area is the Schwarzwasser valley, which geologically forms the contact zone between the Helvetic Säntis nappe in the north and sedimentary rocks of the Flysch zone in the south (Goldscheider, 2005). In the northern part the dominant karst formation is the Schrattenkalk formation, a cretaceous limestone with a thickness of about 100 m. This Schrattenkalk is structured in folds, which hydrogeologically form parallel sub-catchments ( Fig. 1a) that contribute to different proportions to the several springs in the valley (Goldscheider, 2005;Chen and Goldscheider, 2014). In this study we focus on one large, non-permanent spring called Aubach spring (1080 m asl, discharge up to 10 m 3 /s). The Hochifen-Gottesacker area is largely influenced by seasonal snow accumulation and melting in the elevated regions (>1,600 m asl), which is also clearly reflected in the discharge of Aubach spring by increased baseflow and daily snowmelt-induced variations, 90 especially in the months of April to June. Earlier studies by Goldscheider (2005) and Chen and Goldscheider (2014) have identified one major catchment area of Aubach spring with approximately 9 km 2 (Fig. 1a), still, to smaller parts upstream catchments can also contribute to Aubach spring discharge. This applies also to the non-karstified Flysch area directly in the South (southernmost sub-catchment in Fig. 1a), where precipitation events are only relevant during low flow conditions.
For this study Aubach spring is selected because of the good data availability. We use 8 years of hourly discharge data pro-100 vided by the office of the federal state of Vorarlberg, division of water management, and we further use precipitation and temperature data from three surrounding climate stations: Oberstdorf, Walmendinger Horn (shown in Fig. 1a) and Diedamskopf.
Additionally, due to the high importance of snow in the area, we run a snowmelt routine as preprocessing of the meteorological input data as described in Chen et al. (2018). This routine is a slightly modified version (after Hock, 1999) of the HBV hydrological model snow routine (e.g. Bergström, 1975Bergström, , 1995Kollat et al., 2012;Seibert, 2000), which redistributes the precipitation 105 time series in accordance with probable snow accumulation and snowmelt.

Unica Springs, Slovenia
The Unica springs (450 m asl) are located on the southern edge of a karst polje in SW Slovenia and are important from a biodiversity and water supply perspective. There are two permanent and several temporary springs that feed the Unica river.
The joint discharge during 1989-2018 ranged from 1 to 90 m 3 /s, while the mean discharge was 21 m 3 /s (ARSO, 2020a). The 110 springs are fed by three clearly distinguishable sub-catchments covering an area of about 820 km 2 . The main recharge area is the highly karstified Javorniki plateau (up to 1,800 m; marked B on Fig. 1b), whose predominant lithology is Cretaceous rocks, mainly limestones, changing in places to dolomites and breccias. To a lesser extent, Jurassic and Palaeogene carbonate rocks are also present. The thickness of the unsaturated zone is estimated to be up to several hundred meters (Petrič et al., 2018, and references therein). To the east, the hydrology of the area is controlled by a strike-slip fault zone, along which a chain of 115 karst poljes developed (between 500 and 700 m; marked C on Fig. 1b). Upper Triassic dolomites predominate, changing to Jurassic limestones and dolomites in the south and west, forming aquifers with fracture porosity, which in places have very low to moderate permeability, and in some parts a superficial river network forms. As the karst poljes follow each other in a downward series, they are connected in a common hydrological system with transitions between surface and groundwater flows and frequent flooding (Mayaud et al., 2019). In the west, the Pivka River Basin (between 500 and 700 m; marked A on Fig.1b)

120
consists of poorly permeable Eocene flysch in the north, which conditions a surface river network. The southern part consists of Cretaceous and Jurassic carbonate rocks forming a shallow karst aquifer. Surface flow occur during high water levels, receiving additional water from intermittent springs on the western foothills of the Javorniki. The water flow of the sinking rivers in the subsurface from the A and C parts is clearly of the channel flow type. The springs were selected for this study because they drain a complex binary karst system of the so-called Classical Karst, they are well studied with a large amount 125 of hydro-meteorological data and their hydrology is influenced by significant ::::::::: substantial snow accumulation and melting. along the basin and through the major geologic fault of Corconne-Les Matelles in the northern part of the basin (indicated by a grey line in Fig. 1c).
The hydrogeological basin associated to the Lez spring has been estimated to be about 240 km 2 (Fig. 1c) on the basis of the hydrodynamic response to high discharge continuous pumping into the saturated zone of the aquifer (Thiéry and Bérard, 1983).
However, the effective recharge catchment of the Lez spring, which corresponds to the extent of Jurassic limestone outcrops, 145 has been estimated to be about 130 km 2 (Fleury et al., 2009;Jourde et al., 2014). The Lez karst aquifer is under anthropogenic pressure (i.e. aquifer exploitation for water supply) with pumping performed directly within the karst conduit. The discharge is measured at the spring pool and is regularly null during low water periods, when the pumping rate exceeds the natural spring discharge. Ecological water discharge towards the Lez river (160 L/s then 230 L/s after 2018) is ensured during such periods by a partial deviation of the pumped water to the river. Lirou spring (Fig. 1c) is the main of several overflow springs that activate 150 during high flow periods (Jourde et al., 2014).
The Lez catchment is exposed to a Mediterranean climate, which is characterized by hot and dry summers, mild winters and wet autumns. Analyses by MeteoFrance show that on average 40% of the annual precipitation occurs between September and November with a high variability across years (Bicalho et al., 2012). The average annual rainfall rate for the 2008-2018 period is 904 mm.

Spatial Climate Data
Besides climate station data, we explored raster data from the E-OBS (Cornes et al., 2018), the ERA5-Land (Muñoz Sabater, 2019) and from the RADOLAN (DWD Climate Data Center (CDC)) as spatially distributed model inputs. E-OBS provides 175 daily gridded meteorological data for Europe from 1950 to present, derived from in-situ observations, ERA5-Land provides hourly reanalysis data from 1981 to present. Both are available with a spatial resolution of 0.1°x 0.1°(approx. 8 km x 11 km for all study areas). Depending on the dataset, different sets of parameters are available. In case of E-OBS we initially provide our models with precipitation (P), mean, minimum and maximum temperature (T, Tmin, Tmax), relative humidity (rH) and surface shortwave downwelling radiation (Rad). For ERA5-Land, where a significantly :::::::::: substantially larger set of parameters is 180 available, the following were used as initial inputs: total precipitation (P), 2m temperature (T), total evaporation (E), snowmelt (SMLT), snowfall (SF) and volumetric soil water of all four available layers (SWVL1: 0 -7 cm, SWVL2: 7 -28 cm, SWVL3: 28 -100 cm, SWVL4: 100 -289 cm). Relevant input parameters from both datasets are later selected through Bayesian optimization (see section 3.3). The spatial extent of the input data is chosen very generously for each spring, so that between 6 and 8 additional cells are available as input data around the respective catchments. This prevents a predefinition of the area that 185 needs to be identified as relevant as well as reduces the influence of possible border effects due to the CNN approach using 3x3 filters (compare methodology section). The resolution of ERA5-Land and E-OBS data corresponds to the grid cell size shown in the catchment plots in Figures 1a-c, although each showing a slightly different absolute position of grid center points. We already see that the relation of grid resolution to catchment size varies strongly, with Aubach catchment being the smallest and Unica catchment being the largest. Depending on the temporal resolution of the available spring discharge measurements, we 190 choose the spatial input data in accordance, thus E-OBS for Unica and Lez spring, ERA5-Land for Aubach spring.
Compared to the catchment size of Aubach spring (about 9 km 2 ), the spatial resolution (approx. 8 km x 11 km) of the gridded input data is extremely coarse. We therefore additionally explore a combination of ERA5-Land input parameters (except P) with radar based precipitation data (RADOLAN) that offers a spatial resolution of 1 km x 1 km (DWD Climate Data Center (CDC)). For each of the RADOLAN grid cells, the according parameter value of the ERA5-Land data is chosen, where the spatial extent of the 2D-input data to save calculation time, but increase the number of cells compared to the ERA5 section around Aubach spring due to the higher resolution of the RADOLAN grid.

200
In this study, we simulate karst spring discharge with deep learning models using meteorological input data. As proof of feasibility, we use meteorological data from surrounding climate stations. However, these stations often show limited data availability in terms of the number of parameters (mostly limited to precipitation and temperature, rarely more), the record length, as well as the sampling interval. Also, the spatial coverage and proximity is often unsatisfactory, which especially in mountainous regions can introduce a distinct error of parameters with high spatial variability such as precipitation.

205
Gridded meteorological data can offer a solution to these issues, as they usually provide good temporal coverage and resolution, a reasonable spatial resolution as well as a large-scale (e.g. continental or even global) availability. Further, especially reanalysis data include a larger parameter set. However, when the catchment of the spring is unknown, it remains unclear which cells to retrieve time series from when using 1D-time series as inputs. Based on our revised version of the approach of Anderson and Radic (2021), we demonstrate a solution by processing 2D-inputs and letting the model decide by itself, which 210 parts of the input data are relevant to model the spring discharge.

Convolutional Neural Networks (CNN)
Convolutional Neural Networks (LeCun et al., 2015) are widely applied in several domains such as object recognition, image classification, and signal processing. The structure of most CNN models is based on the repetition of blocks that are made up of several layers, typically at least one convolutional layer followed by a pooling layer. The former matches the dimension of the 215 input data (e.g. 2D for image alike data, 1D for sequences such as time series) and uses filters with a fixed size (receptive field) to produce feature maps of the input. The latter performs down-sampling of the produced feature maps and hence increases the density of information. A large variety of model structures based on such blocks, in combination with additional layers in between to prevent exploding gradients (e.g. batch normalization layers ::::::::::::::::::::: (Ioffe and Szegedy, 2015)) or model overfitting (e.g. dropout layers ::::::::::::::::::: (Srivastava et al., 2014)) are possible; however CNNs usually end with one or several fully connected dense 220 layers to produce a meaningful output.

Time Interval
:::: :::: (365) : :::: and early stopping patience are varied manually for each model at each test site. Hyperparameters for the 1D-CNNs of both setups are optimized on the respective optimization set as stated above, maximizing the sum of Nash-Sutcliffe efficiency and R 2 (calculated as explained below). The number of optimization steps is also varied manually for each model and is always 255 a trade-off between accuracy and computational costs. In case of many available input parameters we treat input parameter selection equally as a global optimization problem and use Bayesian optimization to simultaneously select a proper set of input parameters and hyperparameters. Thus, input optimization is used for each 2D-model, as ERA5-Land and E-OBS offer several different meteorological parameters, as well as to the 1D-model of Unica springs. For Lez spring and Aubach spring, only a smaller input parameter set is available and hence fully used. For all models we offer an additional input (Tsin), which 260 is a sinus curve fitted to the temperature data. This parameter can provide the model with information on seasonality and on the current position in the annual cycle (Kong-A-Siou et al., 2014). Precipitation is the only parameter that is not optimized but fixed as input, because it has undoubtedly the most important influence on the discharge of a karst spring. ::: The ::::::::: optimized :::::::::::::: hyperparameters, :::::::::: information :: on ::::: some ::::: fixed :::::::::::::: hyperparameters, :::: and : a :::::::: summary :: of ::: the ::::::: number :: of ::::::::: parameters ::: in :::: each :::::: model, :: is :::: given :: in ::::::::: Appendix ::::: Table ::: D1. :

265
We calculate several metrics to evaluate the performance of our models: Nash-Sutcliffe Efficiency (NSE) (Nash and Sutcliffe, 1970), squared Pearson r (R 2 ), root mean squared error (RMSE), Bias (Bias) as well as Kling-Gupta-Efficiency (KGE) (Gupta et al., 2009). For squared Pearson r we use the notation of the coefficient of determination (R 2 ), because we compare the linear fit between simulated and observed discharge, thus of a simple linear model, which makes them equal in this case.

295
:::: Error :::::::: measures ::::::: indicate : a :::: high :::::::: accuracy :: of ::: the :::::: model ::::::::: simulation, : NSE and R 2 values both are 0.74, KGE is 0.79. Overall, this is a satisfying fit of a complex time series. We observe that peaks in winter and spring are underestimated but the snowmelt period, clearly visible by increased baseflow and daily variations from April to June, is nicely fitted, as well as the following summer peaks. A short series of discharge peaks in the end of September/beginning of October is not captured. We assume that these were caused by small-scale precipitation events that are not represented in the data of the climate stations used as inputs.

300
Interestingly, daily variations, which are probably ::::: might :: be : learned during the snowmelt period, are also visible in periods not influenced by snow (e.g. in August). From experience (Chen et al., 2017b) we know the high relevance of snow in this area and by coupling the CNN model with a snow routine data preprocessing, we are able to further improve the model performance ( Fig. 3b). We now can achieve a fit with 0.77 for both NSE and R 2 , KGE increases to 0.84. Our model is able to nicely fit the second largest peak of the whole dataset, which occurs in February, though, the peak is slightly overestimated, whereas other 305 peaks still tend to be underestimated. The snowmelt period remains well simulated, but shows increasing deviations close the end of the period. The earlier noticed daily variations in summer and autumn, now no longer appear, which is most certainly an effect of the snowmelt preprocessing.
Please note that the 95% model uncertainty :::: from ::::::: random ::::::: number :::::::::: dependency, estimated from 10 different ::::::::: differently :::::::: initialized : models with a Monte-Carlo dropout distribution from 100 runs each (i.e. 1000 simulations in total), is very low 310 for both modeling results (a+b) compared to the overall variability of the discharge. Major source of uncertainty is therefore probably the especially :: In ::::::: general, ::: we :::::: assume ::: the : spatially limited input data : to ::: be ::: the ::::: major :::::: source :: of ::::::::: uncertainty, as all climate stations have a certain distance to the catchment area; :::::::: however, :: as :::::::: explained :: in ::: the ::::::: methods ::::::: section, :::: such ::::::::: uncertainty :: is ::: not ::::::: included :: in ::: the ::::::::: confidence ::::::: interval ::::: shown :: in ::: the ::::: plots. Other modeling approaches (Chen and Goldscheider, 2014;Chen et al., 2017bChen et al., , 2018  on the same data set. Despite our model shows a slightly lower NSE value compared to these three models, it is in the same order of magnitude and performs probably :::: range :::: and ::: we :::::: assume :: it :: to ::::::: perform at least equally as none of the previous studies covered a complete annual cycle as contiguous test period, including high peaks in late winter and strong snowmelt influence in spring and early summer. Figure 3c shows the results of the 2D-modeling setup using (only) ERA5-land input data. Based on Bayesian optimization, 325 besides the fixed and not optimized input P, the following input parameters are selected: T, E, SMLT, SWVL2 and SWVL4 (for a comparison of selected parameters with other study areas see also Table A1). The performance of the 2D-model is similar to that of the 1D-models, showing a NSE (0.76) and RMSE in-between both, a larger R 2 (0.8) but a lower KGE (0.71). This performance is still very satisfying ::: high : considering that the major catchment is extremely small (about 9 km 2 ) compared to one ERA5-Land grid cell, and that a large grid section of 14 x 14 ERA5-Land cells (1.4 • x 1.4 • ) was used as input. We see 330 that the major peak in February is slightly underestimated, as well as the beginning of the snowmelt period in April; however, the end of this period in May/June has improved now compared to (b). Both 1D-models are superior in estimating the peaks especially during summer, except the already mentioned peaks in September/October, which have improved using the 2D-input data. This supports the assumption that the climate stations do not represent these precipitation events, but the 2D-data does.
To account for the small area of the catchment of Aubach spring, Figure 3d shows the results of the 2D-input data, using the eters. We have reduced the spatial extent of the 2D-input, but use an even higher grid cell number (22 x 22 or 22 2 km 2 ), with a reasonable buffer around the catchments. Additionally to P, the optimized model uses inputs from all available parameters except E and SWVL3, thus T, Tsin, SMLT, SF, SWVL1/2/4. This model shows the best performance of all four models by reaching an : a : NSE of 0.81, R 2 of 0.82 and KGE of 0.81. Similar to the model in (c), the beginning of the snowmelt period 340 in April remains slightly underestimated and compared to the 1D models and the peaks in summer are less well fitted. Nevertheless, we generally see a very satisfying fitof the simulation results :: an :::::::: accurate :: fit, especially the largest peak in February is simulated almost perfectly :::: well ::::::::: represented. Compared to the 1D-approach, the main source of uncertainty for both 2D-models is probably :::::: should :: be the uncertainty of parameter values resulting from the ERA5-land grid cell sizes, which is too large compared to the catchment size. Improved downscaling of ERA5 data or other high resolved climate data for a combination with 345 RADOLAN precipitation data might be a promising approach for simulating small catchments like this one. Model uncertainty derived from random number effects and Monte Carlo dropout is (equally to the 1D-models) satisfyingly small. In total we see that both the 1D and the 2D approach for this catchment bear significant :::::::: substantial : uncertainties in terms of input data, even though the results are quite satisfactory ::::::: generally :::: very ::::::: accurate. On the one hand the climate stations represent the true observed climate, on the other hand this is true only for a very specific point, in this case even outside the catchment and embedded into 350 a highly variable topography. The 2D data, however, shows too large grid cell sizes for Aubach spring catchment and is itself modeled (in case of ERA5-Land). We therefore do not think that one approach is superior in terms of uncertainty for this study area : , ::: but ::: we :::: can :::: show :::: that :::: even ::: in ::: this :::: case ::::: with :::::::: relatively :::::: course ::::::: gridded :::: input :::: data ::::::::: compared :: to ::: the ::::::::: catchment ::::: size, ::: the :::::::::: 2D-approach :::::: offers : a :::::: decent ::::::::: alternative.

390
Tmax, rH and Rad. We generally observe a similar shape of the simulation as for the 1D-model but with overall reduced errors.
Still, the plateau shape of some peaks is not well captured but the same conceptual understanding as for the 1D-model seems to be learned. Recessions are generally too conservative, especially the simulation of low flow periods and minor discharge events improve clearly though. These results are plausible, as minor discharge peaks are probably ::::::::: presumably : due to small precipitation events, which are probably not ::: may ::: not :: be : reflected in the selected climate station data, but visible in the gridded 395 data derived from a larger set of observational points. We have, however, noticed that training the 1D-model with data starting 1961 (20 years earlier), can still significantly improve the performance, similar to the 2D results shown here. We do not provide these results as they were produced in a slightly different model setting, not fully consistent with this work. To fully exploit the gridded parameter set and maintain comparability, the results shown here are based on a start in 1981, which corresponds to the start date of rH from the E-OBS data (other parameters start earlier). As for Aubach spring, both models show a comparably 400 low uncertainty based on random number variation and Monte-Carlo dropout, the uncertainty of the 2D-simulation is even a bit lower than for the 1D-model. The :::::: Again, ::: we :::::: assume ::: the :::::::: spatially :::::: limited :::::: climate :::::: station :::: data :: to ::: be : Kovačič et al., 2020) even including ANNs (Sezen et al., 2019), but none of these directly modeled Unica springs discharge, but rather focused on other aspects like cave hydraulics or polje modeling.

410
Lez spring represents a third class of study area, as the catchment size (around 240 km 2 ) is somewhere in between the two others, the climate is Mediterranean and the spring runs dry for a significant :::::::::: considerable amount of time during the annual cycle due to a constant exploitation :: of ::: the :::: karst :::::: aquifer : through pumping. Figure 5 shows both the results for the 1D-(a) and the 2D-model (b). Despite comparably short training (daily data, starting in 2008) we observe a very high fit of the 1D-model above 0.86 for NSE, R 2 and KGE. As well the timing of the peaks, the absolute height of the peaks, as the dry periods are 415 simulated very well :::::::: accurately, except some deviations in early 2019.
For the 2D-model we use input from 19 x 18 E-OBS grid cells and the Bayesian model selects only rH and Rad as inputs besides the fixed input P. Considering the high relevance of potential evapotranspiration in Mediterranean areas, this is probably perfectly ::::: should :: be :::::: easily learnable from relative humidity and radiation data. The performance of the model is basically very good, but clearly lower compared to the 1D-model, showing NSE, R 2 and KGE between 0.75 and 0.78. Generally, : 2018 is 420 better simulated than 2019, which is however also slightly visible for the 1D-model. Some non-existent peaks are simulated by the model in the dry sections, after all one of them (in Oct. 2018) is also simulated by the 1D-model. These :::::::::: Presumably, :::: these : differences in performance can probably be explained by looking again at the input data. The climate stations, from which the interpolated precipitation time series is derived, are mainly located inside the catchment and additionally represent a good spatial coverage. Compared to both other study areas the 1D input uncertainty is definitely the best ::::: lowest for Lez spring 425 catchment. Though, it seems harder to extract the relevant data from the gridded data, which may be related to uncertainties from the grid cell size in relation to the catchment size (e.g. which may be more important than for Unica but way less than for Aubach). Nevertheless, the model uncertainty based on initializations and derived from Monte-Carlo dropout again is satisfyingly small for both model setups, especially for the :::::: during dry periods.

Spatial Input Sensitivity Results
The most important results of the spatial input sensitivity analysis from all catchments are shown in Figure 6. In case of Aubach spring modeled with ERA5-Land data ( Fig. 6a) : , we can see that the catchment is hardly the size of one grid cell.
Hence, despite the quite solid discharge modeling, we see no clear spatial meaning of the precipitation channel heatmap but 455 the higher sensitivity in the direction the weather is approximately coming from :::: main :::::::: direction :: of ::: the ::::::: weather :::: area. We also see a border effect with an almost uniform decrease in sensitivity toward the edges. This is an important reason to choose the spatial extent of the data large enough, as it is probably ::: this ::::: effect :::::: should ::: be related to the size of the filter in the convolutional layer (3 x 3). This occurs even though clipping is used while performing convolutions :::::: applied : to improve the informative value of the edges. Though, not all heatmaps show this pattern . In case of Lez spring (Fig. 6d)we see that the sensitive region is 460 clearly separated from the rest, thus we cannot see a border effect even if probably present. For Aubach spring, precipitation shows only the fourth highest sensitivity (S) in terms of absolute values, while the second most sensitive parameter is snowmelt (SMLT), which shows also the best spatial agreement with the catchment area. This is plausible insofar as the discharge for a large part of the time is dominated by snowmelt and to a lesser extent directly by precipitation. We conclude that even though the modeling results are satisfying, not much meaning can be extracted from the spatial sensitivity analysis for such a small 465 catchment. Please find heatmaps of all other parameters in Appendix C ::::: Figure ::: C1. The combined approach of RADOLAN and ERA5-Land (Fig. 6b) data shows the heatmap in more detail in relation to the size of the catchment. We show only the precipitation heatmap, because it is the only parameter with a native resolution of 1 km x 1 km and we do not consider the spatial patterns of the remaining ERA5-Land-based parameters to be meaningful to interpret. We observe that the most sensitive cells are identified close to the spring and at the border between the main catchment and the southern adjacent subcatchment.
Heatmaps of all four selected E-OBS parameters at Unica catchment are shown in Figure 6c. In accordance with our expectation for karst areas, we see the highest sensitivity for precipitation, which :::::: visually : also identifies the catchment area quite 480 exactly. We think that the size of the Unica catchment in terms of catchment delineation is probably the best out of all three catchments investigated in this study, concerning the given spatial resolution of the used input data. ::: very ::::: well. Especially Tmax and rH show high sensitivities on larger areas, however they are usually highly spatially autocorrelated and do not show a strong spatial heterogeneity like precipitation, which makes it plausible that the model learns from larger data fractions :::: areas.
The model further identifies an area in the north as most sensitive for radiation. We know that radiation is a controlling factor 485 for snowmelt. Thus, we can speculate that snowmelt in this northern area, which is already part of the Alps, might be well correlated to the catchment snowmelt. On catchment scale, however, more snow is expected in the southern part, which cannot be seen on the heatmap.
From the obtained insights we can conclude that karst spring discharge can be predicted accurately with the presented ::: 1D 535 ::: and ::: 2D : approaches. Their accuracy rivals that of existing models in the three study areas, with the difference that far less prior knowledge of the system under consideration is required than, e.g., for lumped parameter models. This can significantly :::::::::: substantially : reduce the amount of work required, provided that a mere simulation of the spring discharge is the objective; albeit, we do not gain knowledge about hydraulic processes in the aquifer as we do from lumped parameter models. We can further show that gridded climate data can provide an excellent substitute for non-existent or patchy climate station data. This 540 does not require knowledge of the exact catchment area, which is a critical component especially for karst springs. Rather, 2D-CNNs can be used to generate a first approximation of the catchment area ::::::: location. However, this approach is subject to some inaccuracies and needs further development in combination with 2D-meteorological input data in a finer spatial resolution in relation to the catchment size. Additionally, a sufficient heterogeneity of precipitation in comparison to the catchment size is necessary. Thenit can most certainly : , ::: we :::::: assume :: it ::: can : be used to delineate catchments quite accurately, which can then 545 be evaluated against tracer tests and hydrogeological studies. In the present state, however, it does not replace conventional catchment delineation methods. In terms of accuracy, we do not find that one of the tested model setups is fundamentally superior. Nevertheless, we would conclude that the 2D-approach is superior to the 1D-approach with respect to the effort of data collection as well as data and parameter availability. Though, a significantly :::::::::: substantially : increased computational effort is necessary for the training as well as for the optimization of the models. In summary, gridded meteorological data is excellently 550 useful to overcome missing climate station data and to get a quite good idea of the spatial extent of larger catchments in relation to the grid cell size.
Code and data availability. We provide complete model codes on Github (AndreasWunsch/CNN_KarstSpringModeling) .
Due to redistribution restrictions from several parties we cannot provide a dataset. Nevertheless, the data is available from the respective local authorities listed in the main text and in the following. 2D-datasets (E-OBS, ERA5-Land) are fully accesible online via Copernicus

555
(cds.climate.copernicus.eu). Aubach spring discharge and climate data from surrounding climate stations in Austria are availble on request from the office of the federal state of Vorarlberg, division of water management, Oberstdorf station data (German Meteorological Service) is available online (opendata.dwd.de). Data from Slovenia can be retrieved from ARSO (Slovenian Environment Agency)(ARSO, 2020a, b).
Lez spring discharge was provided by SNO KARST (2021), climate data is available on request from MeteoFrance.

Appendix B: Lez Catchment Precipitation Interpolation
The Thiessen's polygon interpolation method consists of calculating a weighted average of the precipitation data by allocating a contribution percentage to each meteorological station, based on its influence area on the catchment. These influence areas are calculated through geometric operations. First, we draw straight-line segments between each adjacent station, then we add the perpendicular bisectors of each segment, which will define the edges of the polygons. Each meteorological station thus 565 corresponds to a particular polygon, for which the precipitation over the surface is assumed to be the same as the measured precipitation at the station.
contributed to data curation activities. AW wrote the original paper draft with contributions from GC and NR. All authors contributed to interpretation of the results, and review and editing of the paper draft. TL and NG supervised the work.
Competing interests. The authors declare that they have no conflict of interest. Neither the European Commission nor ECMWF is responsible for any use that may be made of the Copernicus information or data it contains. We acknowledge the E-OBS dataset and the data providers in the ECA&D project (https://www.ecad.eu), data from MeteoFrance, DWD and the office of the federal state of Vorarlberg, division of water management. Lez spring discharge data were provided by the KARST observatory network