Effect of water surface area on the remotely sensed water quality parameters of Baysh Dam Lake, Saudi Arabia

Water quality parameters help to decide how water can be used based on its quality. Changes in the water surface area of a lake are expected to affect its water quality. Chlorophyll a, nitrate concentration and water turbidity were extracted from satellite images to record how each parameter varied as the amount of water in the lake changed. Each water quality measure was recorded together with the corresponding surface area reading to analyze the effects. Water quality parameters were estimated from the Sentinel-2 sensor at the satellite's temporal resolution for the years 2017–2018. Data were pre-processed and then processed to estimate the maximum chlorophyll index (MCI), green normalized difference vegetation index (GNDVI) and normalized difference turbidity index (NDTI). The normalized difference water index (NDWI) was used to calculate and record the changes in the water surface area of Baysh Dam Lake. Results showed different correlation coefficients between the lake surface area and the water quality parameters estimated from remote sensing data. The response of the water quality parameters to surface water changes was expressed in four different surface water categories. MCI is more sensitive to surface water changes than GNDVI and NDTI. Neural network analysis showed that GNDVI and NDTI behave similarly, both following a sigmoidal function, whereas MCI follows an exponential trend. Therefore, monitoring the surface water area of the lake is essential for water quality monitoring.


Introduction
Water bodies in lakes and dam reservoirs are exposed to many factors that affect water quality. Climate change alters water temperature, which increases or decreases the evaporation rate and so plays a major role in pollutant concentrations. The ecosystem of Wadi Baysh contains considerable vegetation, including a large number of trees; in the rainy season, most of that ecosystem is submerged by water [1,2].
Organic material from inside the lake affects the water quality. Runoff may also transport leaves and pieces of wood into the dam lake, as well as sediment particles, which are the main driver of lake water turbidity [3,4]. This organic material is a fundamental resource for the living organisms in the dam lake, including the life cycles of bacteria and algae [5].
For perspective, losing 1,000 m³ of freshwater is not a disaster. However, storing around 140 million m³ of rainwater in which microorganisms and algae can grow could be a catastrophe. Throughout history, small polluted pools have been responsible for hundreds of deaths. Rainwater collected at the outlet of any watershed carries many elements and organic materials [6,7].
Microorganisms consume organic material and deoxygenate the water. Green algae then start to grow, creating visible layers on the surface; some species of these algae generate toxic gases and pollute the entire water body [8].
If harmful algal colonies begin to develop on the surface of the Baysh Dam lake under strong sunlight and low rainfall, their growth could become uncontrolled and pollute the soil and groundwater [9].
In some cases, changes in water quality measures are minor and unnoticeable, but over time the water body becomes contaminated and then affects the surrounding ecosystem. Water quality monitoring and pollution prevention are far preferable to having over 100 million m³ of contaminated water in one location, which would affect the whole region and require huge budgets for future treatment [10,11].
Baysh Dam was designed to hold 190 million m³, with a surface area of 8 km² at full capacity. For safety reasons, the actual surface area at full capacity has never been recorded [12]. The maximum safe operating capacity of Baysh Dam is 120 million m³, with a surface area of 4.4 km². At the safe operating level, the surface area changes rapidly with any inflow or outflow from the dam because of the shape of the wadi at that elevation.
Remote sensing applications for monitoring water quality parameters are needed to reduce field effort to a minimum [13]. The Sentinel-2 sensor, developed by the European Space Agency (ESA), provides high spatial resolution data suitable for models that detect water quality parameters. A recent study on Sentinel-2 showed that the most accurate normalized difference water index (NDWI), giving the highest reflectance, is obtained from band 5 and band 3 [14].
Sentinel-2 bands have been used to record lake surface area and to develop models that detect chlorophyll and nitrogen concentrations with low root mean square error [4,15]. Furthermore, the selected satellite carries the MultiSpectral Instrument (MSI), which more than one study has shown to be more accurate than the Moderate Resolution Imaging Spectroradiometer (MODIS). MSI has been used to detect suspended particulate matter in water bodies, with acceptable results in the 560–780 nm wavelength range [16].
The main objective of the current study is to monitor the effect of the lake surface area on water quality. The maximum chlorophyll index (MCI), green normalized difference vegetation index (GNDVI) and normalized difference turbidity index (NDTI) will be estimated to represent the water quality parameters in the dam lake, and NDWI will be used to delineate the lake surface area. Partition analysis and artificial neural network analysis will be used to examine the effect of water surface area on the estimated water quality parameters.

Study area
Baysh Dam is located in the western part of the Asir Mountains, approximately 100 km north of Jizan City, Saudi Arabia (Fig. 1). The dam lies in an arid region with marked temperature differences, which strongly affect algal growth, dissolved oxygen and eutrophication processes. The dam was constructed for flood control, irrigation of farmland and groundwater recharge. A water treatment plant is located about 5 km from the gates of Baysh Dam. The plant operates in two phases: conventional water treatment followed by reverse osmosis. It produces 70,000 m³/d of irrigation water and is managed and used by the Ministry of Environment, Water and Agriculture. The catchment area of the dam is more than 4,000 km² [3,6]. The turbidity of the dam lake remains acceptable as long as the water volume behind the dam exceeds 80 million m³ [4]. Because the treatment plant requires low-turbidity water to operate in normal mode, the dam authority in Jazan opens the dam gates to lower the water level for safety, but does not let the volume fall below 80 million m³, so that low-turbidity water remains available for the plant.

Remote sensing data collection
Data were collected from January 2017 to December 2018 at the temporal resolution (revisit interval) of the satellite, yielding 52 scenes in total. The MSI has 13 spectral bands: four at 10 m resolution (three visible bands and one near-infrared band), six at 20 m (three vegetation red-edge bands, a narrow near-infrared band and two short-wave infrared (SWIR) bands) and three at 60 m (coastal aerosol, water vapor and SWIR cirrus). ESA provides two levels of processed images, Level-1B and Level-1C [17]. Level-1C was used in this paper because Level-1C images include radiometric and geometric corrections. The geodetic system for Level-1C images is WGS84 [18].
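As a minimal pre-processing sketch, Level-1C pixel values are stored as integer digital numbers (DN). These can be converted to top-of-atmosphere reflectance by adding any radiometric offset and dividing by the quantification value, assumed here to be 10,000. Products processed before ESA processing baseline 04.00 have no offset. Reprocessed products carry an offset that should be read from the product metadata. The function name and the treatment of DN = 0 as no-data are assumptions for illustration, not the authors' documented workflow.

```python
import numpy as np

# Assumed quantification value, as stated in Level-1C product metadata.
QUANTIFICATION_VALUE = 10_000.0


def dn_to_toa_reflectance(dn, offset=0.0, quantification=QUANTIFICATION_VALUE):
    """Convert Level-1C digital numbers (DN) to top-of-atmosphere reflectance.

    offset: 0 for products processed before baseline 04.00; newer products
    carry a radiometric offset that should be read from the metadata.
    DN == 0 marks no-data pixels, which are returned as NaN.
    """
    dn = np.asarray(dn, dtype=np.float64)
    reflectance = (dn + offset) / quantification
    return np.where(dn == 0, np.nan, reflectance)
```

For example, a band array read from a Level-1C tile would be passed through `dn_to_toa_reflectance` before any index is computed.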

Realization of water quality parameters
Three remotely sensed indices were derived to represent three water quality parameters: MCI, GNDVI and NDTI. These were computed following Matthews et al. [19], Gitelson and Merzlyak [20] and Lacaux et al. [21], respectively. Detailed derivations of these indices are given in Elhag et al. [4]. The NDWI was proposed by Gao [22] and later improved by Ganaie et al. [23] to measure liquid water molecules at the top of the canopy. NDWI is calculated by the following equation:

$$\mathrm{NDWI} = \frac{\mathrm{NIR} - \mathrm{SWIR}}{\mathrm{NIR} + \mathrm{SWIR}}$$

where NIR is the Sentinel-2 near-infrared band and SWIR is the Sentinel-2 short-wave infrared band.
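The indices named above can be computed per pixel from Sentinel-2 reflectances. The sketch below uses the usual normalized-difference forms of GNDVI, (NIR − Green)/(NIR + Green), and NDTI, (Red − Green)/(Red + Green). It uses Gao's NDWI, (NIR − SWIR)/(NIR + SWIR). MCI is computed as the height of one red-edge band above a linear baseline between two neighbouring bands. Several choices here are assumptions for illustration, not values taken from the study:
- the Sentinel-2 bands used for MCI (B4, B5 and B6, with approximate centre wavelengths);
- the NDWI water threshold of 0;
- the 20 m pixel size used to turn the water mask into an area.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b) element-wise, with NaN where a + b == 0."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (a - b) / denom, np.nan)


def gndvi(nir, green):
    """Green NDVI: (NIR - Green) / (NIR + Green)."""
    return normalized_difference(nir, green)


def ndti(red, green):
    """Normalized difference turbidity index: (Red - Green) / (Red + Green)."""
    return normalized_difference(red, green)


def ndwi(nir, swir):
    """Gao's NDWI: (NIR - SWIR) / (NIR + SWIR)."""
    return normalized_difference(nir, swir)


def mci(b4, b5, b6, wl4=665.0, wl5=705.0, wl6=740.0):
    """Height of band 5 above the straight baseline joining bands 4 and 6.

    Assumption: Sentinel-2 bands B4, B5, B6 with approximate centre
    wavelengths in nm; the study's exact band choice may differ.
    """
    b4, b5, b6 = (np.asarray(v, dtype=np.float64) for v in (b4, b5, b6))
    return b5 - b4 - (b6 - b4) * (wl5 - wl4) / (wl6 - wl4)


def water_area_km2(ndwi_values, threshold=0.0, pixel_size_m=20.0):
    """Area (km^2) of pixels whose NDWI exceeds a hypothetical threshold.

    NaN pixels compare as False and are therefore counted as non-water.
    """
    water = np.asarray(ndwi_values, dtype=np.float64) > threshold
    return np.count_nonzero(water) * pixel_size_m**2 / 1e6
```

For one scene, `water_area_km2(ndwi(nir, swir))` would give a lake area estimate, assuming both reflectance arrays have been resampled to the same 20 m grid.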

Regression analysis
Regression analysis is the process of constructing a curve, or mathematical function, that best fits a series of data points, possibly subject to constraints. There are several fitting functions and no general best fit; the best fit depends on the dimensionality of the data and on the chosen mathematical function [24][25][26][27].
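Because no single fitting function is best in general, one practical check is to fit competing functions to the same data and compare their R². The NumPy sketch below fits a straight line and an exponential curve. The input arrays (for example, lake surface area in km² and mean index values) are hypothetical. The log-linear shortcut used for the exponential fit is a common simplification that weights residuals differently from a direct nonlinear fit, and it is not necessarily the procedure used in this study.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=np.float64)
    y_hat = np.asarray(y_hat, dtype=np.float64)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_linear(x, y):
    """Least-squares line y = a + b x; returns (a, b, R^2)."""
    x = np.asarray(x, dtype=np.float64)
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns the highest power first
    return a, b, r_squared(y, a + b * x)


def fit_exponential(x, y):
    """Fit y = A exp(k x) by regressing ln(y) on x (requires all y > 0).

    R^2 is reported on the original y scale so it is comparable with fit_linear.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    k, ln_a = np.polyfit(x, np.log(y), deg=1)
    a = np.exp(ln_a)
    return a, k, r_squared(y, a * np.exp(k * x))
```

Both models have two parameters, so their R² values on the same data can be compared directly.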
To describe the effect of surface area on the water quality parameters of the dam lake, the relations between different surface water areas and water quality measures must be examined. Scatter plots of both variables were used to visualize the connection between water surface area and the quality parameters. Both the water quality readings and the calculated area values are treated as independent variables. In this case, principal component analysis, neural network analysis and partition analysis are established methods for exploring the relationship between two independent variables [28,29].

Principal component analysis
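Principal component analysis (PCA) transforms a set of possibly correlated variables into a set of linearly uncorrelated variables, the principal components. The number of principal components is less than or equal to the number of original variables. Following Monahan [30], let the column-centred data vectors $\mathbf{x}_{(i)}$ form the rows of a data matrix $\mathbf{X}$. The first weight vector $\mathbf{w}_{(1)}$ must then satisfy

$$\mathbf{w}_{(1)} = \arg\max_{\lVert \mathbf{w} \rVert = 1} \sum_{i} t_{1(i)}^{2} = \arg\max_{\lVert \mathbf{w} \rVert = 1} \sum_{i} \left(\mathbf{x}_{(i)} \cdot \mathbf{w}\right)^{2}$$

In matrix form this becomes

$$\mathbf{w}_{(1)} = \arg\max_{\lVert \mathbf{w} \rVert = 1} \lVert \mathbf{X}\mathbf{w} \rVert^{2} = \arg\max_{\lVert \mathbf{w} \rVert = 1} \mathbf{w}^{\mathsf{T}} \mathbf{X}^{\mathsf{T}} \mathbf{X} \mathbf{w}$$

Since $\mathbf{w}_{(1)}$ is a unit vector, it equivalently satisfies

$$\mathbf{w}_{(1)} = \arg\max_{\mathbf{w} \neq \mathbf{0}} \frac{\mathbf{w}^{\mathsf{T}} \mathbf{X}^{\mathsf{T}} \mathbf{X} \mathbf{w}}{\mathbf{w}^{\mathsf{T}} \mathbf{w}}$$

Once $\mathbf{w}_{(1)}$ is found, the first component of a data vector $\mathbf{x}_{(i)}$ can be expressed in two ways. In the transformed coordinates it is the score $t_{1(i)} = \mathbf{x}_{(i)} \cdot \mathbf{w}_{(1)}$. In the original variables it is the corresponding vector $(\mathbf{x}_{(i)} \cdot \mathbf{w}_{(1)})\,\mathbf{w}_{(1)}$.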
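The maximisation of wᵀXᵀXw over unit vectors w is solved by the eigenvector of XᵀX that has the largest eigenvalue, with X column-centred. The NumPy sketch below returns w(1) and the scores t1(i) for a hypothetical input matrix. Its columns could, for instance, hold MCI, GNDVI, NDTI and surface area values. It is a generic PCA illustration, not the software the authors used.

```python
import numpy as np


def first_principal_component(X):
    """Return (w1, scores) for the first principal component of X.

    Rows of X are observations x_(i) and columns are variables. Columns are
    mean-centred; w1 is the unit eigenvector of Xc^T Xc with the largest
    eigenvalue, which maximises w^T Xc^T Xc w subject to ||w|| = 1.
    """
    X = np.asarray(X, dtype=np.float64)
    Xc = X - X.mean(axis=0)                   # centre each variable
    _, eigvecs = np.linalg.eigh(Xc.T @ Xc)    # symmetric matrix; eigenvalues ascending
    w1 = eigvecs[:, -1]                       # eigenvector of the largest eigenvalue
    scores = Xc @ w1                          # t_1(i) = x_(i) . w_(1)
    return w1, scores
```

`np.outer(scores, w1)` gives the vectors (x(i)·w(1)) w(1) in the original variables. The sign of `w1` is arbitrary, so only its direction is meaningful.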

Neural network analysis
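The neural network regression model is written as:

$$\hat{Y} = \alpha + \sum_{h=1}^{H} w_{h}\, \phi\!\left(\alpha_{h} + \sum_{j=1}^{p} w_{jh} X_{j}\right)$$

Here $\hat{Y} = \hat{E}(Y \mid X)$, and $X_{1}, \dots, X_{p}$ are the inputs. $H$ is the number of hidden nodes, and $\alpha$ and $\alpha_{h}$ are bias terms. The weights $w_{jh}$ connect inputs to hidden nodes, and $w_{h}$ connect hidden nodes to the output. This neural network model has one hidden layer, but additional hidden layers are possible. The activation function $\phi(z)$ used for the hidden layer is the hyperbolic tangent, a sigmoidal (S-shaped) function.
It is important that the final output be linear, so that the predictions are not constrained to the bounded range of the activation function. The skip-layer neural network for regression adds direct linear connections from the inputs to the output:

$$\hat{Y} = \alpha + \sum_{j=1}^{p} \beta_{j} X_{j} + \sum_{h=1}^{H} w_{h}\, \phi\!\left(\alpha_{h} + \sum_{j=1}^{p} w_{jh} X_{j}\right)$$

where $\beta_{j}$ are the skip-layer weights. Because such models have many parameters and can overfit the training data, cross-validation is critical to make sure that the predictive performance of the neural network model is adequate.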
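A skip-layer regression network of the kind described in this section can be sketched as a forward pass:

Ŷ = α + Σj βj Xj + Σh wh tanh(αh + Σj wjh Xj)

In this NumPy sketch the parameter names (`alpha`, `beta`, `alpha_h`, `W_in`, `w_out`) are hypothetical, and a tanh hidden layer is assumed. It shows prediction only, not training or cross-validation, and it is not the software used in the study.

```python
import numpy as np


def skip_layer_nn(X, alpha, beta, alpha_h, W_in, w_out):
    """Forward pass of a one-hidden-layer skip-layer regression network.

    X       : (n, p) inputs
    alpha   : output bias (scalar)
    beta    : (p,) skip-layer weights, inputs -> output
    alpha_h : (H,) hidden-node biases
    W_in    : (p, H) input-to-hidden weights w_jh
    w_out   : (H,) hidden-to-output weights w_h
    Returns the (n,) predictions, which are linear (unbounded) in the hidden outputs.
    """
    X = np.atleast_2d(np.asarray(X, dtype=np.float64))
    hidden = np.tanh(alpha_h + X @ W_in)       # tanh activation on each hidden node
    return alpha + X @ beta + hidden @ w_out   # linear output layer plus skip terms
```

Take one input (surface area) and two hidden nodes, the configuration reported in the results. In that case `W_in` has shape (1, 2), and `beta`, `alpha_h` and `w_out` have shapes (1,), (2,) and (2,).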

Partition analysis
Partition analysis was used to relate all the conditions to the main objective of this paper. Each water quality parameter in the lake has its own characteristics and conditions. Consequently, changes in surface area affect each parameter in a particular way, which is explained through the partition analysis [31].
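Euler introduced a generating function for the partition function P(n) that gives rise to a recurrence equation for P(n) (Berndt [32]):

$$\sum_{n=0}^{\infty} P(n)\,x^{n} = \prod_{k=1}^{\infty} \frac{1}{1 - x^{k}}, \qquad P(n) = \frac{1}{n} \sum_{k=0}^{n-1} \sigma_{1}(n-k)\,P(k)$$

where σ1(n) is the divisor function (the sum of the divisors of n). A recurrence relation involving the partition function Q is given by Hirschhorn [33]: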
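Euler's divisor-sum recurrence can be checked numerically. The plain-Python sketch below assumes the standard form n·P(n) = Σ_{k=0}^{n−1} σ1(n−k)·P(k), with P(0) = 1, and computes the first partition numbers. It illustrates the recurrence only. It is not part of the decision-tree partitioning applied to the lake data.

```python
def divisor_sigma1(n):
    """Sum of the positive divisors of n (sigma_1)."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


def partition_numbers(n_max):
    """Return [P(0), ..., P(n_max)] using n P(n) = sum_{k<n} sigma_1(n - k) P(k)."""
    p = [1]  # P(0) = 1
    for n in range(1, n_max + 1):
        total = sum(divisor_sigma1(n - k) * p[k] for k in range(n))
        p.append(total // n)  # the sum is always an exact multiple of n
    return p
```

For example, `partition_numbers(5)` ends with 7, the number of ways to write 5 as a sum of positive integers.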

Results and discussion
Changes in the lake's surface area have a clear effect on the dam's water quality. Because both the surface area and the water quality values were derived from satellite images, the relation between them can be examined as a function of water surface area. Whenever the surface area of the dam lake changes, its water quality is affected. Although the effect on MCI values is weak, MCI shows the same inverse relation with surface area [34].

Regression analysis
Regression results showed that mean pixel values best represented the association between the water quality parameters and the remotely estimated surface area. Changes in surface area affect each water quality parameter in a slightly different way. MCI, GNDVI and NDTI were the main quality parameters in this study. Fig. 2 shows a strong correlation (R² = 0.94) between MCI mean pixel values and the dam lake surface area (km²), together with a positive association for the MCI mean values. The same procedure was applied to the GNDVI and NDTI values, giving R² = 0.95 for the mean values of each.

Principal component analysis
Root mean square error (RMSE) was computed to confirm the association between the mean values of the in-situ water quality measurements and the values derived from remote sensing data, according to the summary of the fit analysis. The change in area has a clear effect on NDTI, with only minor effects on the other components, MCI and GNDVI. However, when each quality measure is analyzed separately, more than 95% of the quality values respond positively to decreases in surface area [35]. The direction and magnitude of the combined relation between the quality measures and the change in surface area are shown in Fig. 3. Separate analysis of the quality data could be misleading because of outliers. Moreover, each quality parameter has its own correlation line, which differs from those of the other parameters [36,37].
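For reference, the RMSE between paired in-situ measurements and remotely sensed estimates is the square root of the mean squared difference. This generic sketch assumes the two arrays are already paired, for example by acquisition date. The function name is illustrative only.

```python
import numpy as np


def rmse(observed, estimated):
    """Root mean square error between paired observed and estimated values."""
    observed = np.asarray(observed, dtype=np.float64)
    estimated = np.asarray(estimated, dtype=np.float64)
    if observed.shape != estimated.shape:
        raise ValueError("observed and estimated must have the same shape")
    return float(np.sqrt(np.mean((observed - estimated) ** 2)))
```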
MCI responds to changes in the dam lake surface area differently from GNDVI and NDTI. This finding is also supported by the neural network analysis shown in Table 1, where the prediction profile of MCI follows an exponential trend while those of GNDVI and NDTI follow a sigmoidal trend.

Neural network analysis
A total of 51 values were fed into the neural network, which used one hidden layer with two nodes (Fig. 4). The hidden layer of this network is sensitive to changes in surface area. For the quality parameters, the network gave promising results with a very low percentage error.
The MCI values have a percentage error of less than 0.0012%, and the regression line of the points [38] has an R² value of 0.977. Plotting the predicted MCI values against the measured data yields an exponential curve that clarifies the relation between water surface area and MCI at Baysh Dam [4,14].
Table 1 shows the sigmoidal functions for the GNDVI and NDTI values. Their regression lines vary inversely with surface area changes, but not exponentially as chlorophyll concentration does. For nitrogen concentration, 34 of the 51 readings were used to train this specific neural network, giving an R² value of 0.953. The remaining 17 readings were kept to validate the results of the neural network. For water turbidity, R² is 0.95 for the measured values and 0.98 for the predicted values, and the RMSE is 0.00026. The validation process for all parameters is presented in Table 1.
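The 34/17 division of the 51 readings can be reproduced with a random hold-out split. This sketch assumes a simple random split with a fixed seed. The seed and the function name are hypothetical choices, since the study does not state how the 17 validation readings were selected.

```python
import numpy as np


def holdout_split(n_samples, n_train, seed=0):
    """Randomly split indices 0..n_samples-1 into sorted training and validation sets."""
    if not 0 < n_train < n_samples:
        raise ValueError("n_train must be between 1 and n_samples - 1")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return np.sort(order[:n_train]), np.sort(order[n_train:])
```

Calling `holdout_split(51, 34)` returns 34 training indices and 17 validation indices that together cover every reading exactly once.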

Partition analysis
The surface area values were divided into four area levels to emphasize minor changes in the water quality parameters (Fig. 5). The effect on chlorophyll concentration was minor because of interactions with other factors, but it is traceable and notable. The partition analysis has four splits based on the LogWorth statistic. The decision tree showed unevenness in the surface area splits affecting MCI, indicating the latter's sensitivity to surface area [34,39]. The surface area of 3.28 km² has the maximum LogWorth value (4.99), indicating the optimal split [39].
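LogWorth is −log10(p) for the test of whether a split separates the response. The sketch below scans candidate surface-area thresholds for a single binary split and reports the one with the largest LogWorth. It assumes a pooled two-sample t-test and applies no adjustment for the number of candidate cut points. Some statistical packages do adjust the p-value, so this sketch is not expected to reproduce the LogWorth values reported here. The function name and inputs are hypothetical.

```python
import numpy as np
from scipy import stats


def best_split_logworth(area, response):
    """Find the single surface-area threshold with the largest LogWorth.

    Candidate thresholds are midpoints between consecutive distinct area
    values. Each split divides the response into two groups compared by a
    pooled-variance two-sample t-test; LogWorth = -log10(p-value).
    No multiple-comparison adjustment is applied.
    """
    area = np.asarray(area, dtype=np.float64)
    response = np.asarray(response, dtype=np.float64)
    values = np.unique(area)
    best_threshold, best_logworth = None, -np.inf
    for lower, upper in zip(values[:-1], values[1:]):
        threshold = (lower + upper) / 2.0
        left = response[area <= threshold]
        right = response[area > threshold]
        if len(left) < 2 or len(right) < 2:
            continue  # a t-test needs at least two points per group
        p_value = stats.ttest_ind(left, right, equal_var=True).pvalue
        logworth = -np.log10(p_value)
        if logworth > best_logworth:
            best_threshold, best_logworth = threshold, logworth
    return best_threshold, best_logworth
```

Applying the same function within each resulting group would produce further splits, building a small decision tree one level at a time.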
The same procedures were applied to GNDVI and NDTI, as illustrated in Figs. 6 and 7, respectively. Although GNDVI and NDTI showed decision-tree evenness with four splits, their optimum LogWorth values were 23.7 and 15.63, respectively, at the same surface area split (3.28 km²). This finding supports the vulnerability of nitrogen concentration to changes in lake surface area [40]. Therefore, monitoring the dam lake surface area based on the LogWorth statistics is crucial. In the current study, a lake surface area of 3.28 km² proved critical for the estimated water quality parameters.

Conclusions
Changes in the lake's surface area have a clear effect on the turbidity of the dam's water. Because both the surface area and the NDTI values were derived from satellite images, the two can be compared directly, and they are inversely related: whenever the surface area of the dam lake increases, the turbidity of the water decreases. Although the effect on MCI values is weak, MCI shows the same consistent relation with surface area. The lake surface area is a proxy for the amount of water in Baysh Dam. Over the two years of water quality analysis, the amount of water, expressed in this study as the water surface area, showed a consistent relation with chlorophyll concentration, nitrogen concentration and the sedimentation process. Nevertheless, chlorophyll concentration was sensitive to changes in the lake surface area, while nitrogen concentration and turbidity behaved more steadily.