Accurate prediction of the groundwater table is important for the efficient management of groundwater resources. Despite being the most widely used tools for depicting the hydrological regime, numerical models suffer from formidable constraints, such as extensive data requirements, high computational cost, and inevitable parameter uncertainty. Artificial neural networks (ANNs), in contrast, can make predictions on the basis of more easily accessible variables, rather than requiring explicit characterization of the physical systems and prior knowledge of the physical parameters. This study applies an ANN to predict the groundwater table in a freshwater swamp forest of Singapore. The inputs to the network are solely the surrounding reservoir levels and rainfall. The results reveal that the ANN is able to produce an accurate forecast with a leading time of 1 day, whereas the performance decreases when the leading time increases to 3 and 7 days.

Physical-based numerical models are widely used in groundwater table simulation. Different numerical models have been developed for different regions with different objectives, such as describing regional groundwater flow patterns and understanding local hydrological processes (e.g. Matej et al., 2007; Pool et al., 2011; Yao et al., 2015). Numerical models solve deterministic equations to simulate the groundwater system based on knowledge of the system characteristics, initial conditions, system forcings, etc. The essential data for developing a groundwater numerical model include topography, geological coverage, soil properties, land use maps, vegetation distribution, evapotranspiration information, and hydrologic and climatic data. These extensive data requirements make numerical models highly data dependent and data sensitive. Fitting a physical model is not possible when data are insufficient, and the accuracy of a numerical model depends to a great extent on the accuracy of its inputs. Numerical models are also less competent in forecasting, as most of the system forcings (e.g. evapotranspiration, rainfall) are themselves hard to predict. As a result of the aforementioned constraints, numerical models tend to produce imperfect results in spite of perfect knowledge of the governing laws (Sun et al., 2010).

To combat the deficiencies of the numerical models, artificial neural networks (ANNs) have emerged as an alternative modelling and forecasting approach with a variety of applications in hydrology research (e.g. French et al., 1992; Maier and Dandy, 2000). Unlike the traditional physical-based models, the ANN-based approach does not require explicit characterization of the physical properties, or accurate representation of the physical parameters, but rather simply determines the system patterns based on the relationships between inputs and outputs mapped in the training process. ANNs typically use input variables that are more accessible to make predictions, and therefore circumvent the data reliance inherent to the numerical models. As compared to classical regression techniques, e.g. linear regression modelling, ANNs are capable of simulating the nonlinear dynamics of the hydrological processes and hence result in superior modelling and forecasting performance.

In recent years, ANNs have also been successfully applied in groundwater table modelling. Yang et al. (1997) utilized an ANN to predict groundwater table variations in subsurface-drained farmland. Coulibaly et al. (2001) calibrated three different ANN models using groundwater recordings and other hydrometeorological data to simulate groundwater table fluctuations. Lallahem et al. (2005) showed the feasibility of using an ANN to estimate the groundwater level in an unconfined chalky aquifer. Daliakopoulos et al. (2005) examined the performance of different ANN architectures and training algorithms in groundwater table forecasting. Taormina et al. (2012) developed a two-step ANN model to simulate the groundwater fluctuations in a coastal aquifer using past observed groundwater levels and external inputs, i.e. evapotranspiration and rainfall. Most of the above studies, however, focus on applying ANNs in large-scale semiarid or arid watersheds, where the groundwater table is less variable and long-term groundwater table variation (e.g. monthly, annual) is of more concern. In addition, these studies use historical groundwater tables as inputs to the network, which requires long, continuous groundwater table records that are a luxury in many regions.

This study, for the first time, applies an ANN to forecast the groundwater table in a tropical wetland – the Nee Soon Swamp Forest (NSSF) in Singapore. Fed by water from reservoirs and precipitation, the groundwater table in the NSSF is close to the ground level and extremely sensitive to changes in hydrometeorological conditions. This study selects surrounding reservoir levels and rainfall as inputs to the network, avoiding the need for long, continuous groundwater table records. The forecast is made with three leading times, i.e. 1 day, 3 days, and 7 days, which provide sufficient reaction time for human intervention to maintain favourable hydrological conditions for conserving local ecosystems. The methodology, application, results, and conclusions are elaborated in the following sections.

As defined by Haykin (1999), artificial neural networks (ANNs) are massively parallel distributed processors made up of simple processing units, known as neurons, which have a natural propensity for storing experiential knowledge and making it available for use. ANNs are inspired by biological neural networks to emulate the way in which human brains function. The fact that neurons can be interconnected in numerous ways results in numerous possible topologies that can be divided into two basic classes, i.e. feedforward neural networks (FNNs) and recurrent neural networks (RNNs; Graves et al., 2009). In FNNs information flows from inputs to outputs in only one direction, whereas in RNNs some of the information can flow not only in one direction from inputs to outputs but also in the opposite direction.

There are many algorithms for training neural network models, most of which employ some form of gradient descent using backpropagation to compute the actual gradients (Werbos, 1974). The backpropagation algorithm is implemented by taking the derivatives of the cost function with respect to the synaptic weights and then changing the weights in a gradient-related direction (Sexton and Dorsey, 2000; Mandischer, 2002).

This study opts for a standard FNN and a quasi-Newton training algorithm, more specifically a multilayer perceptron (MLP) trained with the Levenberg–Marquardt (LM) algorithm, owing to the superior accuracy of this combination in groundwater table forecasting (Daliakopoulos et al., 2005).

The multilayer perceptron (MLP) was developed for pattern classification by
Rosenblatt (1958). The architecture of a typical MLP consists of an input
layer, one hidden layer and an output layer. In mathematical terms, a
computational neuron in the hidden or output layers can be described by
the following pair of equations:
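The equations themselves are not reproduced in this text. As a hedged reconstruction, the standard neuron model in Haykin (1999) is assumed to be the pair intended; the symbols below follow that textbook convention and are not taken from this study:

```latex
u_k = \sum_{j=1}^{m} w_{kj}\, x_j ,
\qquad
y_k = \varphi\left(u_k + b_k\right)
```

Here $x_1, \dots, x_m$ are the input signals, $w_{k1}, \dots, w_{km}$ are the synaptic weights of neuron $k$, $u_k$ is the linear combiner output, $b_k$ is the bias, $\varphi(\cdot)$ is the activation function, and $y_k$ is the output signal of the neuron.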

The universal approximation theorem states that every continuous function defined on a closed and bounded set can be approximated arbitrarily closely by an MLP provided that the number of neurons in the hidden layers is sufficiently high and that their activation functions belong to a restricted class of functions with particular properties (Hornik et al., 1989).

Figure 1. Geographical location of the Nee Soon Swamp Forest in Singapore.

The Levenberg–Marquardt (LM) algorithm, independently developed by
Levenberg (1944) and Marquardt (1963), provides a numerical solution to the problem of
minimizing a nonlinear function. The update rule of the LM algorithm can be
presented as follows:
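The update rule is not reproduced in this text. A hedged reconstruction, assuming the usual textbook form of the LM update for a sum-of-squares cost (the notation is an assumption, not the authors'), is:

```latex
\mathbf{w}_{n+1} = \mathbf{w}_{n}
  - \left( \mathbf{J}_{n}^{\mathsf{T}} \mathbf{J}_{n} + \mu \mathbf{I} \right)^{-1}
    \mathbf{J}_{n}^{\mathsf{T}} \mathbf{e}_{n}
```

Here $\mathbf{w}_n$ is the weight vector at iteration $n$, $\mathbf{e}_n$ is the vector of network errors, $\mathbf{J}_n$ is the Jacobian of the errors with respect to the weights, $\mathbf{I}$ is the identity matrix, and $\mu > 0$ is the combination coefficient. A large $\mu$ gives a short step along the steepest descent direction, while $\mu \to 0$ recovers the Gauss–Newton step.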

The LM algorithm essentially blends the steepest descent method and the
Gauss–Newton algorithm. The optimization process is guided by the
combination coefficient

Figure 1 shows the geographical location of the study area – the Nee Soon Swamp Forest (NSSF) in Singapore. The NSSF is located in the northern part of the Singapore central catchment nature reserve, bounded by the Upper Seletar, Upper Peirce, and Lower Peirce reservoirs. As the only substantial freshwater swamp forest remaining on the main island of Singapore, the NSSF houses a diversity of flora and fauna, some of which are found nowhere else in Singapore or the world (Karunasingha et al., 2013).

Figure 2. Observed vs. MLR- and ANN-forecasted groundwater tables (P1).

With an estimated area of about 750 ha, the NSSF covers the lower area of
shallow valleys with slow-flowing streams and a few higher grounds with
dryland forests. The elevation of NSSF ranges between 1 and 80 m above mean
sea level (a.m.s.l.). The aquifer depth in the NSSF is 20–40 m, and the
major soil type features silty sand with a hydraulic conductivity of

The surrounding reservoirs serve as important freshwater storage for Singapore, with reservoir levels being kept relatively high, ranging from 10 to 40 m a.m.s.l. Singapore has a typical tropical rainforest climate with abundant rainfall; the annual rainfall in the NSSF region can be as high as 3000 mm. Although evapotranspiration is another important influence on the groundwater, observations are not available because of the restrictions on setting up monitoring stations in the protected forest, and it is therefore excluded from the ANN setup. Reservoir levels and rainfall, as the major water source and driving force, are fed to the networks as inputs, while the output is the observed groundwater tables with a leading time of 1, 3, and 7 days (i.e. future observed groundwater tables after 1, 3, and 7 days).

A multiple-input multiple-output (MIMO) network is selected over four separate multiple-input single-output (MISO) ANNs for two reasons: (1) it is easier to implement; and (2) cross correlation exists in the observed groundwater tables, e.g. the synchronous response to dry and wet conditions. By targeting the groundwater table measurements at 4 locations simultaneously, the cross-correlation impact can be captured in the synaptic weights of the trained ANN, and hence a better performance is expected. The MIMO network is composed of an input layer with 4 input neurons (3 reservoir levels and one rainfall), a hidden layer with 10 neurons (inspired by the universal approximation theorem and determined by trial and error), and an output layer with 4 output neurons (future observed groundwater tables at the 4 piezometers). The logistic function and threshold function are respectively adopted as the activation functions for the hidden layer and the output layer.
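The 4-10-4 layout described above can be illustrated with a minimal forward-pass sketch in NumPy. This is not the authors' implementation: the function names, the random weight initialisation and the sample input are hypothetical, and the output layer is assumed to be linear (identity) for simplicity so that it can return continuous water levels, whereas the text names a threshold function for that layer.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) activation used for the hidden layer."""
    return 1.0 / (1.0 + np.exp(-z))

def init_mlp(n_in=4, n_hidden=10, n_out=4, seed=0):
    """Create randomly initialised weights for a 4-10-4 MLP (hypothetical init)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, size=(n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, size=(n_out, n_hidden)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Map inputs (3 reservoir levels + rainfall) to 4 groundwater tables."""
    x = np.atleast_2d(x)
    hidden = logistic(x @ params["W1"].T + params["b1"])
    # Assumption: identity output activation, not the threshold function in the text.
    return hidden @ params["W2"].T + params["b2"]

params = init_mlp()
x = np.array([[0.6, 0.4, 0.5, 0.1]])  # made-up, pre-scaled inputs for one day
y = forward(params, x)
print(y.shape)  # (1, 4): one forecast per piezometer
```

Because all four outputs share the same hidden layer, the hidden weights are fitted to all piezometers at once, which is the mechanism by which a MIMO network can pick up cross correlation between sites.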

Figure 3. Scatter plots of observed and ANN-forecasted groundwater tables (P1).

Daily observed data, i.e. reservoir levels, rainfall and groundwater
tables, are available for 2012 and 2013. The data set is divided into three
subsets as follows:

Training data (January 2012–December 2012)

Training data are used for adjusting the synaptic weights in the network. An entire year's data are selected as the training data, so as to expose the network to a complete annual cycle for robust training.

Cross-validation data (January 2013–June 2013)

Cross-validation data are used for avoiding overfitting. When the errors between the predicted values and desired values in the cross-validation data begin to increase, the training stops and this is considered to be the point of best generalization. One half of a year's data are selected as the cross-validation data.

Testing data (July 2013–December 2013)

Testing data are used for evaluating the performance of the network. Once the network is trained, the weights are frozen; the testing set is fed into the network and the network output is then compared with the desired output. The remaining half of a year's data are selected as the testing data.
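The three-way split and the lead-time targets described above can be sketched as follows. The arrays are synthetic stand-ins for the observations, the helper names are hypothetical, and assigning each sample to a subset by the date on which the forecast is issued (rather than the date being forecast) is an assumption.

```python
import numpy as np

def make_lead_pairs(dates, inputs, gwt, lead_days):
    """Pair inputs on day t with groundwater tables observed on day t + lead_days."""
    if lead_days < 1:
        raise ValueError("lead_days must be >= 1")
    return dates[:-lead_days], inputs[:-lead_days], gwt[lead_days:]

def split_by_period(dates, x, y):
    """Split samples into the 2012 / Jan-Jun 2013 / Jul-Dec 2013 subsets."""
    val_start = np.datetime64("2013-01-01")
    test_start = np.datetime64("2013-07-01")
    masks = {
        "train": dates < val_start,
        "validation": (dates >= val_start) & (dates < test_start),
        "test": dates >= test_start,
    }
    return {name: (x[m], y[m]) for name, m in masks.items()}

# Synthetic daily records for 2012-2013 (placeholders for the real observations).
dates = np.arange("2012-01-01", "2014-01-01", dtype="datetime64[D]")
rng = np.random.default_rng(0)
inputs = rng.random((dates.size, 4))  # 3 reservoir levels + rainfall
gwt = rng.random((dates.size, 4))     # groundwater tables at 4 piezometers

splits = {
    lead: split_by_period(*make_lead_pairs(dates, inputs, gwt, lead))
    for lead in (1, 3, 7)
}
for lead, parts in splits.items():
    print(lead, {name: xy[0].shape[0] for name, xy in parts.items()})
```

During training, the validation subset would be evaluated after each weight update, and training would stop once the validation error starts to rise.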

Figure 4. Observed vs. ANN-forecasted groundwater tables (P4).

Table 1. Evaluation statistics of the ANN forecast.

Figure 2 illustrates examples at P1 of the observed groundwater tables and the groundwater tables forecasted by a multiple linear regression (MLR) model and by the ANN model. Due to the complicated geological characteristics and hydrological processes, the relationship between the inputs (reservoir levels, rainfall) and the output (groundwater table) is highly nonlinear. The MLR model is therefore unsuitable for this purpose and produces inferior forecasts, especially at the extreme values. In contrast, the ANN forecast successfully resolves the rising and falling tendencies of the groundwater tables, resulting in a reasonable groundwater table forecast. The scatter plots of the observed groundwater tables and the ANN forecast are presented in Fig. 3. The response of the groundwater tables to the system forcings, for such a confined and wet catchment, is rapid and sensitive. The correlation between the inputs and outputs fades as the leading time increases, which degrades the model performance at 3 and 7 days. The groundwater tables experience a drastic drop in July and August 2013, caused by a continuous 2-month drought. As such an extreme drought condition does not exist in the training data, the ANN tends to overpredict the groundwater tables for that period.
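For reference, an MLR baseline of the kind compared above can be sketched with an ordinary least-squares fit. The function names are hypothetical and the details of the authors' MLR setup are not given in the text, so this is only one plausible form: an intercept plus one coefficient per input, fitted separately for each output column.

```python
import numpy as np

def fit_mlr(x, y):
    """Least-squares fit of y = b0 + x @ b for every output column."""
    design = np.column_stack([np.ones(len(x)), x])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # shape: (n_inputs + 1, n_outputs)

def predict_mlr(coef, x):
    """Apply the fitted linear model to new inputs."""
    return np.column_stack([np.ones(len(x)), x]) @ coef

# Synthetic check: data generated by an exactly linear rule is recovered.
rng = np.random.default_rng(1)
x_demo = rng.random((50, 4))
true_coef = rng.random((5, 4))
y_demo = np.column_stack([np.ones(50), x_demo]) @ true_coef
coef = fit_mlr(x_demo, y_demo)
print(np.allclose(coef, true_coef))
```

A model of this form can only scale and shift the inputs, which is consistent with its weaker performance at the extremes of a nonlinear response.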

Figure 5. Scatter plots of observed and ANN-forecasted groundwater tables (P4).

Figures 4 and 5, respectively, present the groundwater table time series and scatter plots at P4. P4 is located near the Upper Seletar Reservoir, and the groundwater table is affected by the spillway discharge released from the reservoir. Failing to include the spillway information makes the ANN less competent in capturing the groundwater table extreme values caused by the spillway discharge, and hence results in the lower forecast accuracy at P4.

Table 1 summarizes the ANN forecast efficiency through evaluating the root mean square error (RMSE) and the correlation
coefficient (
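The two evaluation statistics named above can be computed as in the following sketch. The Pearson form of the correlation coefficient is assumed, since the text does not spell out the definition, and the sample values are made up for illustration.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error between observed and forecasted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

def pearson_r(observed, predicted):
    """Pearson correlation coefficient (assumed definition)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    o = o - o.mean()
    p = p - p.mean()
    return float(np.sum(o * p) / np.sqrt(np.sum(o ** 2) * np.sum(p ** 2)))

# Made-up groundwater tables (m a.m.s.l.) for illustration only.
obs = [5.10, 5.25, 5.40, 5.30]
pred = [5.12, 5.20, 5.35, 5.33]
print(rmse(obs, pred), pearson_r(obs, pred))
```

RMSE is expressed in the units of the groundwater table and penalises large misses, whereas the correlation coefficient measures only how well the forecast follows the observed pattern, so the two are best read together.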

This study, for the first time, applies artificial neural networks (ANNs) to predict the groundwater table variations in a tropical wetland – the Nee Soon Swamp Forest (NSSF) in Singapore. The ANN model solely utilizes the easily accessible surrounding reservoir levels and rainfall as inputs to forecast the groundwater tables, without requiring any other prior knowledge of the system's physical properties. The ANN forecast shows generally promising accuracy, while its performance decreases when the leading time progresses due to the fading correlation between the network inputs and outputs.

In this study, surrounding reservoir levels and rainfall are selected as ANN inputs. The limited number of inputs avoids the heavy data requirements inherent in the numerical models. However, improvements are expected if more variables can be involved in the training, cross-validation, and testing process; examples of such variables are spillway discharge, evapotranspiration, soil properties, and water level measurements. Lower data requirements, lower computational cost and higher site-specific forecast accuracy are the advantages of the ANN-based approach over the physical-based numerical models. Numerical models, however, can describe the spatiotemporal variations of the system processes over the entire model domain, provided that sufficient information on the model inputs is available. Therefore, the ANN and the numerical model can act as natural complements: the ANN is more suitable for site-specific forecasts, while the numerical model provides better spatial coverage.

This study forms part of the research project “Nee Soon Swamp Forest Biodiversity and Hydrology Baseline Studies – Phase 2” funded by the National Parks Board (NParks), Singapore. The authors are also grateful to the Public Utilities Board (PUB), Singapore, for the data support that made this study possible.

Edited by: D. Solomatine