Spatiotemporal optimization of groundwater monitoring networks using data-driven sparse sensing methods

Abstract. Groundwater monitoring and the specific collection of data on the spatiotemporal dynamics of the aquifer are prerequisites for effective groundwater management and determine nearly all downstream management decisions. An optimally designed groundwater monitoring network (GMN) will provide the maximum information content at the minimum cost (Pareto optimum). In this study, PySensors, a Python package containing scalable, data-driven algorithms for sparse sensor selection and signal reconstruction with dimensionality reduction, is applied to an existing GMN in 1D (hydrographs) and 2D (gridded groundwater contour maps). The algorithm first fits a basis object to the training data and then applies a computationally efficient QR algorithm that ranks existing monitoring wells (for 1D) or suitable sites for additional monitoring (for 2D) in order of importance, based on the state reconstruction of this tailored basis. This procedure enables a network to be reduced or extended along the Pareto front. Moreover, we investigate the effect of basis choice on reconstruction performance by comparing the three types typically used for sparse sensor selection (identity, random projection, and SVD/PCA). For the extension case, we define a gridded cost function that penalizes unsuitable locations. Our results show that the proposed approach


Introduction
Groundwater is a vital resource for drinking water supply and industrial, commercial, and agricultural uses. Therefore, effective groundwater management and monitoring practices are critical to ensure the availability and quality of water supplies for future generations. A groundwater monitoring network (GMN) is defined by a spatial arrangement of groundwater monitoring wells and a temporal sampling frequency (Loaiciga et al., 1992). In most cases, there are economic interests behind groundwater management and thus also behind a monitoring network. As a result, while many monitoring networks meet the basic requirements for groundwater management, they are scientifically insufficient to monitor aquifer dynamics. Considering monitoring costs and monitoring quality (i.e., the information gained by monitoring) as axes of a two-dimensional coordinate system, optimal GMNs lie along a Pareto front on which the maximum information content is achieved for the respective budget. Moreover, existing GMNs have usually grown historically with regard to the locations and number of monitoring wells and are therefore often inefficient, meaning that the monitoring quality is relatively low for the given costs. Optimizing the network with respect to the number of wells and their locations could thus reduce operating costs without loss of monitoring quality (Emmert et al., 2016). Furthermore, directives such as the European Water Framework Directive (WFD; EC, 2000) or the European Nitrates Directive (ND; EC, 1991) demand the integration of regional monitoring networks into national or international networks. Selecting a reasonable subset of these networks capable of capturing the dynamics of the groundwater body is an essential and challenging task.
Usually, GMNs are classified according to their purpose into a groundwater quality monitoring network (GQMN), i.e., mostly multivariate, and a groundwater quantity/groundwater level monitoring network (GLMN), i.e., univariate. This classification does not preclude a GMN from performing both tasks. However, optimization approaches usually address one of the two tasks. To date, there are neither standard regulations for the planning and expansion of existing GMNs nor established methods. Instead, a high degree of subjectivity prevails. In the last few decades, many studies have been published dealing with the optimization of GMNs. The widely varying requirements for optimizing monitoring networks have led to various approaches that attempt to meet these requirements differently. The choice of method usually depends on the GMN type (GQMN vs. GLMN), scale (local, regional, and national), uni- or multivariate network, optimization strategy (extension of the GMN vs. reduction of redundant wells), consideration of dynamics (spatial vs. spatiotemporal), and, lastly, the purpose of the monitoring network. With the latter, a distinction is usually made between risk-oriented monitoring (mainly concerning groundwater quality in the catchment of waterworks) and surveillance monitoring, e.g., according to the European WFD.
In general, the design of a monitoring network is considered a nonlinear and non-convex optimization problem whose optimality criterion measures the useful information contained in the information matrix of the design (Ushijima et al., 2021). GMN optimization approaches are commonly divided into the following three categories based on the techniques applied: (a) those based on hydrogeological conceptual models and hydrogeological expert knowledge, (b) those based on numerical groundwater flow models (Kim and Lee, 2007; Singh and Datta, 2016; Thakur, 2017; Sreekanth et al., 2017), and (c) those based on data analysis with (geo-)statistical techniques. Many studies have focused on the geostatistical ability of kriging frameworks to determine new monitoring wells based on the reduction of estimation variance as the optimization criterion (Nunes et al., 2004; Li et al., 2011; Varouchakis and Hristopulos, 2013; Bhat et al., 2015; Thakur, 2015; Ohmer et al., 2019). With the steady increase in computational capacity in recent years, there are a growing number of studies that tackle these optimization problems using traditional data-driven heuristic optimization techniques such as genetic algorithms (GAs; Dhar and Patil, 2012; Reed and Kollat, 2013; Khader and McKee, 2014; Puri et al., 2017; Pourshahabi et al., 2018; Ayvaz and Elçi, 2018; Yudina et al., 2021; Komasi and Goudarzi, 2021), artificial neural networks (ANNs; Alizadeh et al., 2018), particle swarm optimizations (Gaur et al., 2013; Guneshwor et al., 2018; De Jesus et al., 2021), support vector machines (SVMs; Asefa et al., 2004; Bashi-Azghadi and Kerachian, 2010), and relevance vector machines (RVMs; Khalil et al., 2005; Ammar et al., 2008), or a combination of these approaches. Further studies use entropy- and information-theory-based approaches (Hosseini and Kerachian, 2017; Keum et al., 2017; Alizadeh and Mahjouri, 2017) and Kalman filtering (KF; Kollat et al., 2011; Júnez-Ferreira et al., 2016).
In most of the mentioned studies, the optimization of the GMN amounts to a computationally intensive combinatorial search with innumerable multi-dimensional iteration steps, since many complex physical systems are described by a high-dimensional state x ∈ R^n. Moreover, improved data recording and increasing storage lead to fast and strongly growing system complexities and, therefore, to increased computing time beyond Moore's law (Moore, 1998). However, the dynamics of such complex systems often evolve on a low-dimensional attractor, which can be used to predict and control these systems. Pattern extraction is associated with the search for coordinate transformations that simplify the system dynamics and the computational effort (Brunton and Kutz, 2017). In recent years, powerful new techniques in data science have been developed that are capable of analyzing complex data and extracting essential features and correlations from high-dimensional dynamic systems. Sparse sampling (Candes et al., 2005; Candes and Wakin, 2008; Baraniuk, 2007), sparse reconstruction (Yildirim et al., 2009; Annoni et al., 2018; Castillo and Messina, 2020), and sparse classification (Brunton et al., 2016) enable the recovery of relevant information from remarkably few measurements. Although sparse sampling, such as compressed sensing, is a common and powerful method often used in other fields of science, including seismic and medical image processing, fluid dynamics, and remote sensing, to our knowledge only a few studies in the field of hydrogeology have applied sparse sensing to hydrogeological tasks. Hussain and Muhammad (2013) utilized sparse signal extraction methods based on l1-norm minimization to exploit the spatial sparsity in hydrodynamic models and thereby reduce the number of measurements needed to reconstruct the signal. Lee et al. (2021) used compressed sensing to generate groundwater level (GWL) contour maps based on sparsely sampled or incomplete data from a groundwater model below the Nyquist-Shannon sampling criterion (Shannon, 1949). They found that compressed sensing performed much better than traditional interpolation methods such as kriging. Ushijima et al. (2021) developed an experimental design algorithm to select locations for a network of monitoring wells with maximum information. The combinatorial search was performed with a GA combined with a proper orthogonal decomposition (POD) to reduce the computational cost of using the GA. POD, which is often formulated using the singular value decomposition (SVD), is a dimensionality reduction method that extracts relevant large coherent structures/patterns (low-dimensional features) from high-dimensional data (Pollard et al., 2017).
This study focuses on a data-driven algorithm to optimize a GMN regarding the number and locations of monitoring wells for temporal and spatial GWL reconstruction. The algorithm uses data-driven sparse-sensing techniques and a QR-based sensor placement algorithm that ranks sensors, in our case monitoring wells, according to their information content. It is based on the work of de Silva et al. (2021a), Manohar et al. (2018), and Clark et al. (2019), implemented in the PySensors package, and has, e.g., been successfully applied in the similar contexts of sensor placement for sea surface temperature reconstruction and fluid flow data (Manohar et al., 2018; Clark et al., 2019) and wind flow data (Annoni et al., 2018), as well as for classification tasks in, e.g., image recognition or cancer classification by microarrays (Brunton et al., 2016). We have adapted this methodology for the first time for application to GMNs, as we see the following advantages over existing methods: (i) it can take spatial and temporal information into account simultaneously, (ii) it allows the ranking of existing monitoring wells based on their information content, (iii) based on the ranking, an existing network can be reduced, while either the values of the abandoned wells or a spatially continuous GWL can be reconstructed, (iv) it proposes locations for an extension of a network that provide the best possible gain in knowledge, and (v), if necessary, it allows the application of a cost function for the extension of the network (Clark et al., 2019), either to prefer more suitable locations (e.g., in terms of infrastructure) or to exclude certain areas (such as inaccessible terrain, steep slopes, etc.).
We apply the adapted algorithm to a real-world GLMN to demonstrate its suitability for groundwater monitoring networks in general. The data set used for this purpose consists of weekly GWL monitoring between 1990 and 2015 from 480 monitoring sites in the upper alluvial aquifer of the Upper Rhine Graben (URG). In particular, we show how the algorithm can be applied to address the following questions regarding the optimization of an existing network:

- What is the ranking of monitoring wells in an existing network in terms of their information content/reconstruction performance, i.e., in which order should the wells be removed if a network reduction is desired?
- How does the reconstruction/interpolation error vary as wells are progressively removed from the monitoring network in the order of the proposed ranking, and how does this compare to the removal of randomly selected wells?
- When the goal is network extension, where should new wells be placed for maximum information gain? How significant is the increase in information, i.e., how much will the spatial reconstruction error be reduced?
- How well does a combined reduction/extension (i.e., replacement) of a certain number of wells perform compared to a straightforward extension?

Compressed sensing
Most multi-dimensional natural signals are compressible (or, respectively, sparsely representable). This means that when the signals are transformed into a convenient coordinate system (basis), only a limited number of basis modes are active. These basis modes correspond to the large mode amplitudes (Brunton and Kutz, 2017). In data compression, for example, JPEG or MP3 file compression, only these values are stored to efficiently reconstruct the input signal with a considerable reduction in data size and little loss of information.
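This compressibility can be illustrated with a discrete cosine transform (the transform family behind JPEG): keeping only the few largest-magnitude DCT coefficients of a smooth signal still reconstructs it almost perfectly. A minimal NumPy/SciPy sketch (our own illustration, not part of the study):

```python
import numpy as np
from scipy.fft import dct, idct

# Smooth "natural" signal: a few low-frequency oscillations plus a trend
n = 512
t = np.linspace(0, 1, n)
x = 2.0 * np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t) + t

# Transform to the DCT basis; most energy concentrates in few coefficients
s = dct(x, norm="ortho")

# Keep only the K largest-magnitude coefficients (a K-sparse approximation)
K = 20
s_sparse = np.zeros_like(s)
idx = np.argsort(np.abs(s))[-K:]
s_sparse[idx] = s[idx]

# Inverse transform: reconstruction from ~4 % of the coefficients
x_rec = idct(s_sparse, norm="ortho")
rel_err = np.linalg.norm(x - x_rec) / np.linalg.norm(x)
print(f"relative reconstruction error: {rel_err:.4f}")
```

The relative error stays small even though more than 95 % of the coefficients were discarded, which is precisely the sparsity that compressed sensing exploits.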
A compressible signal x ∈ R^n can be written as a sparse vector s ∈ R^n in a new orthonormal basis Ψ ∈ R^(n×n) such that

x = Ψ s.

The vector s is K-sparse if it is a linear combination of only K basis vectors (i.e., it has exactly K nonzero elements). The theory of compressed sensing uses this principle, as it attempts to infer the sparse representation s in a known transform basis from a very small, low-dimensional (compressed) subsample

y = C x = C Ψ s,

where the vector y ∈ R^p is a set of incoherent observations and C ∈ R^(p×n) is an observation matrix of p linear observations. The objective of compressed sensing is to find the sparsest vector ŝ that is consistent with y (under a set of conditions) by l1-norm minimization, as follows:

ŝ = argmin_s' ||s'||_1 such that y = C Ψ s',

which almost certainly yields the sparsest possible solution for s (Candes et al., 2005; Candes and Wakin, 2008; Donoho, 2006; Baraniuk, 2007).

Sparse sensor placement
While compressed sensing uses random measurements to reconstruct high-dimensional unknown data in a universal basis Ψ ∈ R^(n×n), data-driven sparse sensor placement uses available information about a signal from observed samples to build a tailored basis Ψ_r ∈ R^(n×r) for the respective signal and thus to identify optimal sensor placements for the reconstruction of this signal with low losses. Let the full signal be an unknown linear combination of basis coefficients a ∈ R^r (the vector of mode amplitudes of x in the basis Ψ_r):

x = Ψ_r a.

The central challenge is to design an incoherent measurement matrix C (i.e., rows of C not correlated with the columns ψ of Ψ_r) that allows us to identify the optimal p observations y_i to accurately reproduce the signal x:

y = C x = C Ψ_r a.

For n candidate sensor locations and a given sensor budget p, the sampling matrix C must be structured as follows:

C = [e_γ1, e_γ2, ..., e_γp]^T.

Here, e_j ∈ R^n are the canonical basis vectors with a unit entry at index j and zeros elsewhere. Thus, each row of C observes only a single spatial location corresponding to one sensor location. The observations are made up of p elements selected from x, with γ ∈ N^p as an index set of the sensor locations with cardinality |γ| = p; additionally, the number of sensors must satisfy p ≥ r for a well-defined linear inverse problem (Manohar et al., 2018). The unknown x can thus be reconstructed by approximating a via the Moore-Penrose pseudoinverse:

x̂ = Ψ_r (C Ψ_r)† y,

where † denotes the Moore-Penrose pseudoinverse. It is assumed that the optimal sensor selection C* is mostly a sparse subset selection operator, whose nonzero entries in the rows represent the monitoring wells.
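As a sketch of this reconstruction step (illustrative only; variable names are ours, not from the paper), given a tailored basis and a set of sensor indices, the state estimate follows directly from the pseudoinverse formula x̂ = Ψ_r (C Ψ_r)† y:

```python
import numpy as np

rng = np.random.default_rng(0)

n, r, p = 100, 8, 12          # state dimension, basis modes, sensors (p >= r)
Psi_r = np.linalg.qr(rng.standard_normal((n, r)))[0]  # orthonormal tailored basis

# True state lies in the span of the basis (mode amplitudes a)
a_true = rng.standard_normal(r)
x_true = Psi_r @ a_true

# Sensor selection: C picks p rows of the state (here: arbitrary fixed indices)
gamma = np.sort(rng.choice(n, size=p, replace=False))
y = x_true[gamma]             # y = C x, the p point measurements

# Reconstruction: x_hat = Psi_r (C Psi_r)^+ y
a_hat = np.linalg.pinv(Psi_r[gamma, :]) @ y
x_hat = Psi_r @ a_hat

print(np.linalg.norm(x_true - x_hat))  # ~0 when x lies in the basis span
```

When the true state lies exactly in the span of Ψ_r and the sampled rows have full column rank, the recovery is exact; real hydrograph data only lie approximately in the span, which is why sensor placement (the choice of γ) matters.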

Tailored basis Ψ_r
As described above, in data-driven sparse sensing the universal basis Ψ is replaced by a tailored basis Ψ_r, which is built from the training data X_tr, e.g., by using dimensionality reduction techniques. In this study, we use the following three basis types, which are typically used for sparse sensor selection:
- Identity basis. Centered raw data are used directly, without dimensionality reduction: Ψ_r = X_tr. Since no low-rank approximation of the data is performed, no information is lost. However, this comes at the cost of a longer computation time (de Silva et al., 2021a).
- Random projection basis. Dimensionality is reduced by projecting the input data onto a randomly generated matrix, Ψ_r = G X_tr, where the entries of G ∈ R^(2p×m) are drawn from a Gaussian distribution with mean zero and variance 1/n (Dasgupta, 2000; Li et al., 2006).
- SVD/principal component analysis (PCA). Linear dimensionality reduction is performed using a truncated SVD. The SVD is a numerically robust and efficient method for extracting dominant low-dimensional patterns from data (Golub and Kahan, 1965; Halko et al., 2011). For a matrix X ∈ C^(n×m), the SVD is given by

X = U Σ V*.

The columns of U are the left singular vectors of X. They are often termed spatial correlations, principal components, features, or POD modes of the data set. Σ is a diagonal matrix of singular values.
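The three basis choices can be sketched as follows (a minimal illustration with made-up dimensions; `X_tr` has one row per location and one column per time snapshot, mirroring the notation above, and the random-projection scaling is our simplification):

```python
import numpy as np

rng = np.random.default_rng(1)
n_loc, n_snap = 50, 200                      # locations x time snapshots
X_tr = rng.standard_normal((n_loc, n_snap))
X_tr -= X_tr.mean(axis=1, keepdims=True)     # center the training data

# 1) Identity basis: centered raw data, no dimensionality reduction
Psi_identity = X_tr                          # shape (n_loc, n_snap)

# 2) Random projection basis: project snapshots onto Gaussian random vectors
r = 10
G = rng.normal(0.0, 1.0 / np.sqrt(n_snap), size=(n_snap, r))
Psi_random = X_tr @ G                        # shape (n_loc, r)

# 3) SVD/PCA basis: first r left singular vectors (dominant spatial modes)
U, S, Vt = np.linalg.svd(X_tr, full_matrices=False)
Psi_svd = U[:, :r]                           # shape (n_loc, r)

print(Psi_identity.shape, Psi_random.shape, Psi_svd.shape)
```

Note the trade-off visible in the shapes: the identity basis keeps all snapshots (no information loss, more computation), while the projected and truncated bases compress the data to r modes.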

QR pivoting for sparse sensors
While the previous steps serve to best fit the basis to the training data, the next steps aim to determine the resulting optimal sensor locations that minimize the reconstruction error. This optimization problem is solved with an approximate greedy solution using reduced QR factorization with column pivoting (Brunton and Kutz, 2017; Halko et al., 2011). The QR factorization decomposes a matrix A ∈ R^(m×n) into a unitary matrix Q, an upper-triangular matrix R, and a column permutation matrix C such that A C^T = Q R. The diagonal entries of R are determined by selecting the pivot column with maximum l2-norm among all remaining columns. Subsequently, the orthogonal projection of the pivot column is subtracted from all other columns, and the process is repeated iteratively over all columns. Thus, QR factorization with column pivoting yields r column indices (which correspond to sensor locations) that best sample the r basis modes (columns) of Ψ_r^T.
Since the pivot columns represent the sensors, the QR factorization results in a hierarchical list of all n pivots, where the first p pivots are optimized for the reconstruction of Ψ_r. This means that, in the GMN optimization based on hydrograph data, all monitoring wells are ranked based on their information content. When using spatial input data, e.g., from interpolation or model results, all gridded input data cells are ranked based on their information content, which allows recommendations for the placement of additional monitoring wells at locations with supposedly high information content. The QR decomposition approach used here includes a cost constraint function (Clark et al., 2019). This constraint allows different costs to be considered when selecting sensor placements, such as favoring or excluding certain areas.
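The ranking step can be sketched with SciPy's pivoted QR factorization (an illustrative stand-in for the PySensors implementation; names and dimensions are ours): the column pivots of Ψ_r^T, taken in order, are the candidate locations ranked by information content.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(2)

n_loc, r = 60, 6
# Tailored basis, e.g., the first r left singular vectors of the training data
X_tr = rng.standard_normal((n_loc, 300))
U, _, _ = np.linalg.svd(X_tr - X_tr.mean(axis=1, keepdims=True), full_matrices=False)
Psi_r = U[:, :r]

# Pivoted QR of Psi_r^T: the pivot order ranks candidate locations by importance
_, _, pivots = qr(Psi_r.T, pivoting=True)
ranking = pivots                 # ranking[0] is the most informative location

p = 10                           # sensor budget
sensors = np.sort(ranking[:p])   # top-p ranked locations
print("selected sensor indices:", sensors)
```

The pivot array is a permutation of all n candidate locations, i.e., the hierarchical list described above, from which any budget p can simply take the leading entries.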

Application cases
In principle, there are two possible application cases of the algorithm with regard to groundwater monitoring data: (i) application to the observed data at the wells only (i.e., the hydrographs) and (ii) application to spatially continuous gridded information that has been regionalized based on the well data (e.g., by interpolation). While the optimization based on hydrographs only serves to rank the individual wells of the network according to their information content and thus to identify and eliminate redundant wells, the spatially continuous input data additionally allow a GMN extension at optimal locations and the best possible reduction of the spatial prediction error.

Error metrics
The reconstruction error for a given set of measurements is calculated using the root mean square error (RMSE) as the scoring function. We further use the following metrics, widely used for the calibration and evaluation of hydrological models: mean absolute error (MAE), Nash-Sutcliffe efficiency (NSE), Kling-Gupta efficiency (KGE), squared Pearson's correlation coefficient (R²), and relative bias (rBias). In the following equations, o stands for the observed values, r for the reconstructed values, cov is the covariance, σ is the standard deviation, µ is the arithmetic mean, and n stands for the number of measurements.
The RMSE is one of the most commonly used error index statistics. It is defined as

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (o_i − r_i)² ).

In general, the lower the RMSE, the better the model performance. It is useful for comparing different model performances for a given time series. However, only the relative root mean square error (rRMSE) is meaningful when comparing model performance between different time series.
Analogous to the RMSE, the smaller the MAE, the better the performance:

MAE = (1/n) Σ_{i=1}^{n} |o_i − r_i|.

The NSE (Nash and Sutcliffe, 1970) is a widely used goodness-of-fit measure for hydrologic models, as it normalizes model performance onto an interpretable scale (Knoben et al., 2019):

NSE = 1 − [ Σ_{i=1}^{n} (o_i − r_i)² / Σ_{i=1}^{n} (o_i − µ(o))² ].

The NSE ranges between −∞ and 1, where 1 indicates a perfect correspondence between observations and reconstructions, while NSE = 0 indicates that the model has the same explanatory power as µ(o).
The KGE (Gupta et al., 2009) was proposed as an alternative to the NSE because it addresses several of its shortcomings (Knoben et al., 2019). Like the NSE, a KGE = 1 indicates a perfect model correspondence; however, explicit statements on benchmark performance have varied so far. It is defined as

KGE = 1 − sqrt( (r − 1)² + (α − 1)² + (β − 1)² ),

where r is the linear correlation between o and r, α is a measure of the variability error, and β is a bias term. We use the squared Pearson correlation coefficient as a general coefficient R². It describes the degree of collinearity between measured and reconstructed data. R² ranges from 0 to 1, with higher values indicating lower error variance. In general, values above 0.5 are considered acceptable.
The relative bias is a measure of a systematic over- or underestimation of a model:

rBias = Σ_{i=1}^{n} (o_i − r_i) / Σ_{i=1}^{n} o_i.

The optimal rBias is 0. Positive values indicate model underestimation; negative values indicate model overestimation (Gupta et al., 1999).
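These metrics can be computed with a few lines of NumPy (our own helper function using the standard definitions; the paper's exact implementation is not specified):

```python
import numpy as np

def error_metrics(o, r):
    """Standard reconstruction-error metrics for observed o and reconstructed r."""
    o, r = np.asarray(o, float), np.asarray(r, float)
    resid = o - r
    rmse = np.sqrt(np.mean(resid ** 2))
    mae = np.mean(np.abs(resid))
    nse = 1.0 - np.sum(resid ** 2) / np.sum((o - o.mean()) ** 2)
    corr = np.corrcoef(o, r)[0, 1]        # linear correlation
    alpha = r.std() / o.std()             # variability ratio
    beta = r.mean() / o.mean()            # bias ratio
    kge = 1.0 - np.sqrt((corr - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    r2 = corr ** 2
    rbias = np.sum(resid) / np.sum(o)     # positive: underestimation
    return {"RMSE": rmse, "MAE": mae, "NSE": nse, "KGE": kge, "R2": r2, "rBias": rbias}

# A perfect reconstruction returns the optimal value of every metric
o = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(error_metrics(o, o))
```

A perfect reconstruction yields RMSE = MAE = rBias = 0 and NSE = KGE = R² = 1, matching the optima stated above.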
Statements about model performance in Sect. 3 are based on the Moriasi et al. (2007) guidelines for model evaluation.

Hydrogeological framework
The Upper Rhine Graben (URG), also known as the Rhine Rift Valley, is a 300 km long and, on average, 50 km wide structural trough. It was formed in the Oligocene in response to the Alpine orogeny and was subsequently filled with fluvial to lacustrine sediments of the Late Miocene, Pliocene, and Quaternary (Przyrowski and Schäfer, 2015). The Pliocene and Quaternary alluvial gravels and sands represent the largest groundwater reservoir in central Europe (LUBW, 2006). Based on their permeability and the appearance of fine-grained horizons, the Pliocene and Quaternary gravels are subdivided into three (locally also more) aquifers, partly separated by fine-grained sediments (Wirsing and Luz, 2007). The study region is in the Baden-Württemberg part of the URG (Fig. 1). The Rhine forms the western boundary, and the Kaiserstuhl volcanic complex forms the southern boundary.
To the east, the URG is bounded by a rift flank uplift composed of a system of troughs and highs, which follow the ENE structural grain of the Variscan fold belt (Derer, 2003). Along the study area these are, from south to north, the Black Forest high, Kraichgau basin, and Odenwald-Spessart high. Groundwater recharge occurs predominantly through lateral inflow and infiltration of streams from the Black Forest valleys in the east, the Freiburg basin in the southwest, and the infiltration of the Rhine and other surface waters.

Data and preprocessing
The data set used in this study consists of weekly GWL measurements from 480 wells in the uppermost aquifer within the Quaternary sand/gravel deposits of the URG, covering the period from 1990 to 2015 (i.e., 1304 time steps). Data values that deviate by more than ±3σ from the moving average (with a window size of 11 values) are considered outliers and were removed from further processing. Data gaps were subsequently filled based on information from highly correlated neighboring hydrographs, using the clustering results of Wunsch et al. (2022). Where this did not yield plausible results or was not possible due to similarly missing data in neighboring hydrographs, an alternative PCHIP (piecewise cubic Hermite interpolating polynomial) interpolation was performed. To avoid possible bias, the measurement data are globally and locally centered. The data set is split into two subsets: the first 80 % (January 1990-December 2009; 1043 time steps) is used to train the algorithm and the last 20 % (January 2010-December 2014; 261 time steps) for testing/validation. The well data are publicly available from the Baden-Württemberg State Office for Environment web service (LUBW, 2021).
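The outlier screening and gap filling described above can be sketched as follows (a simplified stand-in for the paper's preprocessing; the window size and ±3σ threshold follow the text, while the helper function and PCHIP fallback via SciPy are our own):

```python
import numpy as np
import pandas as pd
from scipy.interpolate import PchipInterpolator

def preprocess_hydrograph(gwl: pd.Series, window: int = 11, k: float = 3.0) -> pd.Series:
    """Flag deviations of more than k*sigma from an 11-value moving average
    as outliers, then fill the resulting gaps with PCHIP interpolation."""
    ma = gwl.rolling(window, center=True, min_periods=1).mean()
    resid = gwl - ma
    outlier = np.abs(resid) > k * resid.std()
    cleaned = gwl.mask(outlier)                        # outliers -> NaN

    ok = cleaned.notna()
    interp = PchipInterpolator(np.flatnonzero(ok.values), cleaned[ok].values)
    filled = cleaned.copy()
    filled[~ok] = interp(np.flatnonzero(~ok.values))
    return filled

# Weekly series with one obvious spike that gets removed and re-interpolated
t = pd.date_range("1990-01-07", periods=52, freq="7D")
gwl = pd.Series(np.sin(np.linspace(0, 4 * np.pi, 52)), index=t)
gwl.iloc[20] += 10.0                                   # artificial outlier
clean = preprocess_hydrograph(gwl)
print(clean.iloc[20])
```

In the study, gap filling primarily used correlated neighboring hydrographs; the PCHIP step shown here corresponds only to the fallback case.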

Groundwater contour maps
We used the hydrograph data to generate 1304 weekly groundwater contour maps with a grid size of 50 m × 50 m using ordinary kriging. In addition to finding a sparse set of monitoring wells for the optimal temporal reconstruction of the other hydrographs, the objective is to identify monitoring wells that allow an optimal spatiotemporal reconstruction of the GWL from a subset of the wells. Moreover, the spatially continuous information of the gridded contour maps is used to suggest additional locations for an extended network. We used an isotropic Gaussian semivariogram model for the interpolation, which is flexible and, therefore, a good candidate for a standard model (Krivoruchko, 2011). The associated parameters partial sill (42.7 m), range (17 853 m), lag size (1485 m), and nugget (0.05 m) were optimized using automatic cross-validation (CV) diagnostics to achieve the lowest mean square error. It should be noted that the use of a single variogram model (Gaussian) may not be the optimal way to quantify spatial correlation, especially for nonstationary data. However, this is a necessary simplification due to the automation process that still produces comparable interpolation results, while the best possible interpolation result is not the focus of this study. Just as with the hydrographs, the first 80 % of the contour maps (January 1990-December 2009; 1043 time steps) were used to train the algorithm and the last 20 % (January 2010-December 2014; 261 time steps) for testing/validation.
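A stripped-down ordinary kriging step with a Gaussian semivariogram can be sketched as follows (our own minimal implementation with synthetic coordinates and an illustrative variogram parameterization; the study used an automated, cross-validated geostatistical workflow, not this code):

```python
import numpy as np

def gaussian_variogram(h, nugget=0.0, psill=1.0, rng_=200.0):
    """Isotropic Gaussian semivariogram model (illustrative parameterization)."""
    return nugget + psill * (1.0 - np.exp(-((h / rng_) ** 2)))

def ordinary_kriging(xy, z, xy0, **vario):
    """Ordinary kriging prediction at points xy0 from samples (xy, z)."""
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.empty((n + 1, n + 1))
    A[:n, :n] = gaussian_variogram(d, **vario)
    A[:n, n] = A[n, :n] = 1.0          # unbiasedness constraint
    A[n, n] = 0.0
    d0 = np.linalg.norm(xy[:, None, :] - xy0[None, :, :], axis=-1)
    b = np.vstack([gaussian_variogram(d0, **vario), np.ones((1, xy0.shape[0]))])
    # Least squares for numerical robustness (Gaussian models are ill-conditioned)
    w = np.linalg.lstsq(A, b, rcond=None)[0]
    return w[:n].T @ z                 # weighted sum of the sample values

rng = np.random.default_rng(3)
xy = rng.uniform(0, 1000, size=(30, 2))                      # well coordinates (m)
z = np.sin(xy[:, 0] / 300) + 0.5 * np.cos(xy[:, 1] / 300)    # synthetic GWL field

pred = ordinary_kriging(xy, z, xy[:3])   # predict at three of the sample points
print(pred, z[:3])
```

With a zero nugget, ordinary kriging is an exact interpolator, so predictions at the sample locations reproduce the observed values, which is a quick sanity check for any kriging implementation.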

Results and discussion
The following section is structured as follows. First, the grid search results regarding the three types of basis used, the number of basis modes, and a varying number of sensors are presented and discussed. Since we are applying the presented sensor placement approach to a groundwater monitoring well optimization, we use the term "well" as a synonym for sensors in the following. This is followed by the results of the GMN optimization based solely on the hydrograph data set.
Finally, the results of the GMN optimization with the interpolated GWL as inputs are shown.

Grid search results
Figure 2a shows the RMSE between the estimated and actual GWL for the validation set as a function of the number of wells that the network is reduced to, the type of tailored basis (identity, random projection, and SVD basis), and the number of basis modes used. Since the number of wells and the number of basis modes interact, it is necessary to determine the number of basis modes that results in the lowest reconstruction error for a given number of wells. If the number of wells is close to the number of basis modes, the reconstruction error increases significantly for all three basis types. While the number of basis modes for the identity and random projection bases is theoretically open-ended, the dimensionality for SVD, and thus the number of basis modes, must be less than the number of wells. The results show that, with only a few basis modes, SVD has the highest accuracy (Fig. 2a). As the number of basis modes exceeds the number of remaining wells, the identity and random projection bases perform better. In general, random projection and identity basis perform similarly. With fewer than 950 basis modes, slightly better results are obtained with random projection; above 950, the identity basis performs marginally better. Figure 2b shows the lowest RMSE per number of wells achieved in the grid search, and Fig. 2c shows the corresponding number of basis modes. The gray dashed line in Fig. 2b shows the median of reconstruction errors from 100 iterations with a random well selection as a benchmark. Except for the SVD basis with fewer than 50 wells, all three basis types perform considerably better than reconstruction with randomly placed wells, independently of the number of wells. Our findings are consistent with those of Manohar et al. (2018), Clark et al. (2019), and de Silva et al. (2021a), where SVD consistently underperforms compared to random projection and identity basis. However, the latter two show almost identical results with an optimized number of basis modes (Fig. 2b), with random projection performing marginally better with a small number of wells and identity basis performing slightly better with a larger number of wells. For consistency reasons, and based on the grid search results, we decided to compute the optimization steps shown in the following uniformly with the identity basis and a fixed number of 1043 basis modes. This combination, on average, shows the best results for any chosen number of remaining wells (Fig. 2c).

Ranking of wells and network reduction with hydrograph data
As examples, all results of the following section are presented using five reduction stages: 10 %, 25 %, 50 %, 75 %, and 90 %. Thus, in the 25 % stage, the 25 % of wells with the lowest information content are removed, and their hydrographs are predicted using the remaining optimal 75 % of the monitoring wells. The reduction stages are also shown in the grid search results (Fig. 2b and c). The color scheme of the reduction stages is kept for all remaining figures. Figure 3 shows the result of the ranking of the monitoring wells computed with an identity basis and using 1043 basis modes. The ranks are assigned from 1 (essential well with high information content) to 480 (most redundant well); thus, lower numbers mean a higher ranking with respect to information content and importance for reconstructing potentially removed redundant wells. The reduction stage at which the respective wells are removed is indicated in parentheses. The color scheme ranges from dark red for redundant monitoring wells that can be eliminated with a minor loss in prediction accuracy to dark blue for important monitoring wells that contain essential information about the system and are needed for the accurate reconstruction of signals at other monitoring wells. In addition, Fig. 4 shows the centered hydrographs of the most important 10 % and the most redundant 10 % of all hydrographs. Most of the important wells (blue; removal at a reduction of > 90 %) show a pronounced flashiness (i.e., high frequency and rapidity of short-term changes) and strong irregular patterns during the recording period. These dynamics indicate a strong interaction with surface waters or boundary inflows, for example, from side valleys of the rift flanks, which can also be seen in Fig. 3 from the location of the wells. Additionally, the most important wells include those with a distinct trend, which can best be seen for the two highest-ranked wells at the bottom, showing an upward trend over the considered period.
In contrast, the redundant wells show low flashiness and also include wells with high seasonality, though most of the signals seem to be dominated by interannual variations. Most of these wells are located in the northern part of the study area within the URG. Since the eastern boundary in this area is the Kraichgau basin, the landscape profile is less pronounced than in the Black Forest hill range in the south and the Odenwald in the north. Therefore, less recharge occurs through stream infiltration, which is often the reason for more pronounced short-term variations or flashiness. The hydrographs of all wells can be found in the Appendix. Overall, the ranking shows that the most important wells include the ones with noticeably unusual behavior, i.e., patterns that are not present in many of the other wells (like flashiness, trends, and jumps) and are thus hard to reconstruct. The more redundant wells either show a higher seasonality or tend to show low variability. Both patterns are common to a larger number of the wells and can be reconstructed more easily.
While the ranking itself already contains essential information and could be used, for example, to equip higher-ranked monitoring wells with higher-quality sensors or to sample them at a higher temporal frequency, we use the ranks here to reduce the original network well by well, with most of the results shown only for the five abovementioned reduction stages, i.e., GMN reduction by 10 %, 25 %, 50 %, 75 %, and 90 %.
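The reduction-and-reconstruction step can be sketched as follows: given a fitted basis and a retained subset of wells, the signals at the removed wells are estimated by least squares from measurements at the retained wells. The data, the size of the retained set, and the SVD basis below are illustrative assumptions, not the study's configuration.

```python
# Sketch of reconstructing removed wells from a retained subset
# (synthetic low-rank data; retained set chosen arbitrarily here).
import numpy as np

rng = np.random.default_rng(3)
n_wells, n_train, n_test, n_modes = 40, 300, 60, 5

# Synthetic hydrograph data: a few shared dynamics drive all wells.
modes = rng.standard_normal((n_modes, n_wells))
X_train = rng.standard_normal((n_train, n_modes)) @ modes
X_test = rng.standard_normal((n_test, n_modes)) @ modes

# Basis fitted to the training data (columns of Psi = basis modes).
_, _, Vt = np.linalg.svd(X_train, full_matrices=False)
Psi = Vt[:n_modes].T                     # shape (n_wells, n_modes)

# Hypothetical choice: the ranking retains the first 10 wells.
retained = np.arange(10)

# Reconstruct the full state from measurements at retained wells only:
# x_hat = Psi @ pinv(Psi[retained, :]) @ y
coef = np.linalg.pinv(Psi[retained]) @ X_test[:, retained].T
X_hat = (Psi @ coef).T                   # shape (n_test, n_wells)
```

Because the synthetic data are exactly low-rank, the reconstruction is essentially exact here; for real hydrographs, the residual at the removed wells is what the error measures in the next paragraphs quantify.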
The upper part of Fig. 5 shows the development of the prediction accuracy during GMN reduction for the error measures NSE, KGE, and R² (left), as well as rBias, RMSE, and MAE (right), for the validation data set (mean and ranges over the reconstructed validation period of all predicted/removed wells). Even with only a few optimally selected wells, the predictive power is considerably higher than the mean value of the time series (NSE = 0). An average performance over the validation period of all predicted removed wells rated as satisfactory (NSE > 0.5) is already achieved with only nine remaining wells (corresponding to a reduction of 98.1 %), a rating of good (NSE > 0.65) with 21 remaining wells (95.6 %), and a rating of very good (NSE > 0.75) with 54 remaining wells (88.7 %). With more than 191 wells (60.2 %), the NSE rises above 0.9. KGE and R² behave in much the same way as NSE. A KGE of 0.75 is achieved with nine wells (98.1 %) and of 0.9 with 144 wells (70.0 %); an R² of 0.75 is achieved with 22 wells (95.4 %) and of 0.9 with 155 wells (67.7 %). An MAE of 0.1 m is achieved with 31 monitoring wells (93.5 %) remaining, of 0.05 m with 147 wells (69.4 %), and of 0.01 m with 394 wells (17.9 %). Beyond a reduction of about 75 %, the removal of each subsequent well leads to a disproportionate decrease in accuracy, with a very steep drop beyond about 95 %. Below a 75 % reduction, the curve is nearly linear, i.e., performance increases only slightly with each additional monitoring well. The rBias also approaches zero at reductions below 75 %. We therefore conclude that about 25 % of the wells can be seen as an absolute minimum required to adequately describe the system dynamics in the considered study area, although the average NSE of the reconstruction is still rated as good with only the optimally selected 10 % of the wells.

The lower part of Fig. 5 displays the performance of the QR-optimized wells compared to an equal number of randomly selected remaining wells for reductions of the GMN by 10 % (48 wells) to 90 % (432 wells). Removing wells based on the ranking results leads to a lower prediction loss; only at a reduction of more than 90 % are the errors in the same range as for the randomly removed wells. The information content of these 10 % remaining wells is therefore probably not sufficient to reflect the overall dynamics. Again, below a reduction of about 75 %, the performance differences become more pronounced, with considerably higher average, 25 % quantile, and minimum NSE, KGE, and R² values (and correspondingly lower MAE and RMSE). Below a reduction of 25 %, the 75 % quantile and minimum NSE, KGE, and R² values are also clearly higher (lower for MAE and RMSE). This clearly shows that the advantages of the data-driven optimization method come into play especially for moderate to smaller reductions of a GMN.

Figure 6 shows the temporal reconstruction accuracy at the considered reduction stages from 10 % to 90 % for eight selected wells (see also Fig. 3). These wells were chosen to reflect the spectrum of dynamics and to represent the full ranking range. The reconstruction is always based on the higher-ranked remaining wells (while keeping the chosen reduction stages). Consequently, well 154-304-1, the highest-ranked well shown (rank 59; bottom), which could theoretically be reconstructed with a maximum of 58 remaining wells, is reconstructed with 10 % of the wells (48). Similarly, well 132-257-4, the lowest-ranked well shown (rank 478; top), which could theoretically be reconstructed with a maximum of 477 wells, is reconstructed with 90 %, 75 %, 50 %, 25 %, and 10 % remaining wells (432, 360, 240, 120, and 48, respectively) for comparison.
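The skill scores used above follow their standard definitions (NSE after Nash and Sutcliffe; KGE after Gupta et al., 2009). A self-contained sketch with a hypothetical observed/simulated pair:

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 = perfect, 0 = no better than the mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta efficiency (Gupta et al., 2009): combines correlation,
    variability ratio, and bias ratio; 1 = perfect."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]      # linear correlation
    alpha = sim.std() / obs.std()        # variability ratio
    beta = sim.mean() / obs.mean()       # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Hypothetical groundwater levels (m) for a reconstructed well.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sim = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
scores = (nse(obs, sim), kge(obs, sim))
```

A simulation equal to the observations yields NSE = KGE = 1, while predicting the observed mean everywhere yields NSE = 0, which is the baseline referenced in the text.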
The results show that the individual dynamics of the hydrographs can already be adequately reconstructed with a 10 % subset of the monitoring network. As expected, as the number of wells increases, the accuracy improves on average, hydrographs are reproduced more consistently, and short-term peaks are captured more accurately. Although these improvements appear comparatively slight given the overall dynamics, the absolute errors for some wells and time steps can still amount to several tens of centimeters, albeit reduced only by many additional wells. Whether this justifies the increased operating costs of the monitoring network depends on the task at hand. The reconstruction results for the other wells can be found in the Appendix.

Figure 6. Hydrograph reconstruction at the five reduction stages 10 %-90 % shown for eight monitoring wells, along with the respective error measures.

Network reduction and extension based on gridded GWL contour maps
This application case is based on spatially continuous gridded weekly GWL contour maps from 1990 to 2015. Analogous to the hydrograph data set, the first 80 % of this period was used for model training and the last 20 % for evaluation. According to the ranking, we investigate how well the GWL can be reconstructed at two reduction stages, in which 10 % and 20 % of the GMN are removed. We selected these reduction stages since, in the analysis of the reduction stages with hydrographs, the advantages of the data-driven optimization method were more pronounced for moderate to smaller reductions of a GMN; moreover, reducing an existing network by more than 20 % seems unrealistic in practice. Furthermore, we extend the existing network by 10 % and 20 % additional wells and analyze where new wells are placed to supposedly improve the GMN. To account for more or less suitable locations (e.g., with regard to infrastructure), we incorporate costs into the QR factorization via a non-uniform spatial step function on a 50 × 50 grid (corresponding to the GWL grid used). The cost function grid is assigned zero (no additional costs) at existing monitoring well locations to ensure that existing wells are retained, since, technically, an extension by, e.g., 10 % is realized by placing 110 % of wells on the gridded GWL maps. At potentially well-suited additional sites, defined as lying within 50 m of roads and paths, outside surface waters, and where the slope is less than 20 %, we assigned a cost value of 21. For all other areas, which are considered unsuitable, the cost weighting is set to 22. Alternatively, a gradual cost function could be used, in which the weighting increases with distance to infrastructure or similar criteria. It should be noted that the weighting depends on the system, basis, and cost function and must be adjusted for the particular case (Clark et al., 2019). We adjusted the mentioned weighting factors iteratively until they produced the desired behavior.
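The cost-weighted selection can be sketched as a greedy column-pivoted QR in which each candidate column's score is its residual norm minus its cost, which is the idea behind cost-constrained QR as used in PySensors (Clark et al., 2019). The implementation below is a simplified illustrative sketch, not the package's code; the grid size, cost values, and data are hypothetical.

```python
import numpy as np

def cost_constrained_ranking(Psi, costs, n_select):
    """Greedy cost-weighted QR pivoting: at each step pick the column whose
    residual norm minus its cost is largest, then deflate the remainder."""
    R = Psi.astype(float).copy()          # (n_modes, n_candidate_locations)
    remaining = list(range(R.shape[1]))
    chosen = []
    for _ in range(n_select):
        norms = np.linalg.norm(R[:, remaining], axis=0)
        scores = norms - costs[remaining]  # cost penalizes unsuitable cells
        j = remaining[int(np.argmax(scores))]
        chosen.append(j)
        remaining.remove(j)
        # Deflate: remove the component along the chosen column.
        v = R[:, j]
        nv = np.linalg.norm(v)
        if nv > 0:
            q = v / nv
            R -= np.outer(q, q @ R)
    return chosen

rng = np.random.default_rng(1)
Psi = rng.standard_normal((20, 100))      # basis modes x candidate grid cells
costs = np.zeros(100)
costs[50:] = 1e3                          # prohibitively expensive cells
sensors = cost_constrained_ranking(Psi, costs, n_select=10)
# Selected locations avoid the high-cost half of the hypothetical grid.
```

Setting zero cost at existing wells and large costs elsewhere reproduces the behavior described in the text: existing wells are selected first, and new wells land only on suitable cells.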
With the weightings chosen in this way, the first 480 wells are placed at existing monitoring well locations and all subsequent wells at suitable new locations, while the algorithm avoids the remaining areas. Finally, we consider a combined reduction/extension scenario, in which the original number of wells is kept, but the 10 % and 20 % most redundant wells are removed and replaced afterward. Technically, this is done in a two-step procedure consisting of the reduction step followed by the above-described extension, with the cost function adapted for the existing wells. The maps on the left side of Fig. 7a show the spatial distribution of the monitoring wells resulting from a 10 % (yellow dots) and 20 % (red dots) reduction (1) and a 10 % (light blue dots) and 20 % (dark blue dots) extension (2) of the GMN, as well as from a combined reduction/extension of the GMN (3). For the latter, the 10 % and 20 % most redundant monitoring wells were eliminated and replaced with wells at optimal locations. We note that the ranking based on the spatially interpolated data differs from the ranking based on the hydrographs alone (see Appendix A).
This difference can be explained by the fact that the ranking reflects the information content with respect to reconstruction at the lowest possible error: in the case of hydrographs, the goal is to reconstruct the hydrographs of the removed wells, whereas here the goal is to reconstruct the interpolated surface (which constitutes a best guess of the spatially continuous GWL based on the available data).
While the first 10 % of reduced wells are evenly distributed across the study area, the subsequent removal step (i.e., an additional 10 %, thus 20 % removed wells in total) eliminates well clusters in the central and northern regions. This is plausible because clusters of nearby wells tend to show similar dynamics and thus, in accordance with Tobler's law, add little information to an interpolation. Optimal locations for additional wells are identified primarily along the western and eastern margins, i.e., along the Rhine and downstream of the alluvial valley aquifers of the adjacent Black Forest. These are, on the one hand, areas with expectedly higher groundwater dynamics (e.g., high seasonal amplitudes and high flashiness) and, on the other hand, areas with increased interpolation uncertainty due to the elongated geometry of the URG (transition from interpolation to extrapolation). Optimal well locations are primarily, but not exclusively, located in areas of increased variability (standard deviation of the interpolated GWL; see Fig. 7a2 and a3).
The box plots in Fig. 7c show the mean (left) and maximum (right) absolute error of the 261 reconstructed GWL contour maps of the evaluation set for all abovementioned scenarios. Note that the MAE is now taken as the mean over the spatial axis, i.e., separately for each of the 261 reconstructed GWL contour maps (whereas, for the hydrographs, the MAE was taken as the mean over the time axis for each reconstructed well). This was done because, in this application case, the focus is on the error of a spatial reconstruction of GWL contours and not explicitly on time series. Correspondingly, the maximum absolute error (maxAE) is the maximum over the spatial axis for each of the 261 reconstructed GWL contour maps. We therefore also refer to them as the mean and maximum spatial reconstruction errors. Thus, the boxes in Fig. 7c show the variability in the mean and maximum absolute errors over the 261 time steps. On average, the model can reconstruct the GWL contour maps with very high accuracy, with mean absolute errors far below 1 cm. This seems very low compared to the reconstruction of the hydrographs. However, reconstructing a large number of similar values (i.e., raster pixels) is much easier for the model, for two reasons: (i) there are many more training patterns for each type of dynamics than with the hydrographs alone, and (ii) the overall dynamics are reduced by the interpolation itself, which smooths the data spatially and temporally. Taking these limitations of spatially interpolated data into account, it seems more reasonable to focus on the maximum absolute error, which allows the identification of areas with higher errors, where the model (with the existing data) cannot produce reliable reconstructions and where additional wells would bring the most information.
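The spatial error aggregation described above can be sketched as follows; the array shapes correspond to the 261 weekly maps on a 50 × 50 grid, but the data themselves are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_maps, n_cells = 261, 2500              # weekly maps on a 50 x 50 grid

# Hypothetical true and reconstructed GWL maps, flattened to (time, cells).
gwl_true = rng.standard_normal((n_maps, n_cells))
gwl_rec = gwl_true + 0.01 * rng.standard_normal((n_maps, n_cells))

abs_err = np.abs(gwl_true - gwl_rec)
mae_per_map = abs_err.mean(axis=1)       # mean over the spatial axis
maxae_per_map = abs_err.max(axis=1)      # maximum over the spatial axis
# One value per reconstructed contour map (261 values each), i.e., the
# distributions summarized by the box plots in Fig. 7c.
```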
When comparing the maxAE (Fig. 7c, right) for all scenarios, we see that a reduction of the network increases the spatial reconstruction error by a factor of about 2 for a 10 % reduction and about 3 to 4 for a 20 % reduction. For comparison, the gray box (100 %) shows the reconstruction errors with an unchanged GMN (this error results from the fact that the model is trained with the first 80 % of all time steps, while the reconstruction is performed for the unknown 20 % of the evaluation data set). An extension of the network by 10 % considerably reduces the spatial reconstruction error, to less than about two-thirds of the initial value, while an extension by 20 % reduces it further, to about one-tenth. Most interestingly, the reconstruction errors for the combined reduction/extension scenarios with 90 %/10 % and 80 %/20 % (thus an unchanged total of 480 wells) are slightly below those of the straightforward GMN extensions to 110 % (528 wells) and 120 % (576 wells). To a lesser degree, this also applies to the mean absolute errors, at least for the 80 %/20 % scenario, which performs slightly better than an extension by 20 % and considerably better than a 10 % extension. In practice, this means that, with a combined reduction/extension, sensors/data loggers that become available can be reused at better locations elsewhere. This reduces the installation costs of the additional wells and the operating costs of the GMN and, moreover, performs about the same as, or even better than, a pure extension.

Conclusions
This study investigated data-driven sparse sensing approaches based on the work of de Silva et al. (2021a), Clark et al. (2019), and Manohar et al. (2018) and adapted them to optimize an existing GMN. The algorithm fits a tailored basis to the training data, which is subsequently used in a QR decomposition to rank the monitoring wells by importance based on reconstruction performance. This approach allows us to remove groundwater monitoring wells with low information content if needed, to equip higher-ranked monitoring wells with higher-quality sensors, or to sample them at a higher temporal frequency. When using spatially continuous input data (obtained by interpolation or numerical simulation), the ranking is performed according to the same scheme for all locations. This ranking can be used as a decision-making aid in the search for locations for additional monitoring wells. We incorporated a cost function to eliminate inaccessible locations from the site selection process; adjusting the cost constraint allows a specific adaptation to the individual problem definition.
Our results show that identifying redundant, low-ranking monitoring sites would allow a drastic reduction of the monitoring network, with a minor loss of information, compared to a random reduction (which corresponds to a reduction based on other criteria, as is often the case in practice). In the case of a desired network extension, the reconstruction quality can benefit from the additional removal of unsuitable wells.
As in related previous studies (Clark et al., 2019; Manohar et al., 2018), using the identity data basis (raw data without dimensionality reduction) and the total number of available basis modes yielded a lower reconstruction error for a given number of wells than the other basis types and mode numbers. This is because no information is lost by constructing a low-rank approximation of the data. However, for larger data sets than the one used in this study, an optimization without prior dimensionality reduction can lead to impractically long computation times. Just as in the work of Clark et al. (2019) and Manohar et al. (2018), a randomized projection of the data performed, on average, only slightly worse than the raw data and may be worthwhile for large data sets or multiple computational runs due to its lower computational cost. Even though the widely used SVD basis gave the worst results for our data set, the reconstruction errors are still lower than for a random network optimization.
In addition to GMNs, this approach can also be applied to optimize groundwater quality or multivariate monitoring networks. As with all data-driven methods, the quality of the results depends strongly on the availability of input data (spatially and, above all, temporally). Since this approach relies on detecting patterns in data and placing monitoring locations based on those patterns, it benefits from large data sets. Therefore, we see the main application of this technique in optimizing monitoring networks of regional-scale groundwater systems, where a comprehensive overview of the variability and quantity of groundwater bodies and the assessment of long-term changes in natural conditions are the monitoring objectives.
Overall, we have demonstrated that modern data-driven sparse sensing methods are well suited for application to groundwater monitoring networks, provided a good historical data basis exists. The applied method can be used for an optimization regarding the number of wells and their locations, for a network reduction or extension, or for both combined. Using hydrographs (1D) as input data, the applied approach allows an information-based assessment of an operated monitoring network. The outcomes can be used to identify representative key wells for selecting expressive subnetworks, to equip the critical wells with improved data loggers, or to release installed sensors/loggers at redundant wells for more suitable locations. With two-dimensional input data, the spatial dependency structures and spheres of influence of the wells can be considered in the optimization, both for the reduction and for the extension of monitoring networks tailored to the dynamics of the aquifer. Although an optimized reduction can generally lead to greater cost efficiency, it should always be carried out judiciously and in combination with expert knowledge of the system.
Appendix A

Figure A1. Comparison of the QR-based ranking with 1D hydrograph data (left; panel a1), with 2D interpolated GWL contour maps (middle; panel a2), and differences between the 1D and 2D rankings (right; panel a3).