Integrated versus isolated scenario for prediction dissolved oxygen at progression of water quality monitoring stations

This study examined the potential of a Multi-layer Perceptron Neural Network (MLP-NN) for predicting dissolved oxygen (DO) in the Johor River Basin. The river water quality parameters were monitored monthly at four stations by the Department of Environment (DOE) over a period of ten years, from 1998 to 2007. Five water quality parameters were selected for the proposed MLP-NN modelling: temperature (Temp), water pH, electrical conductivity (COND), nitrate (NO₃) and ammoniacal nitrogen (NH₃-NL). Two scenarios were introduced: the first (Scenario 1) established the prediction model for DO at each station based on the five input parameters, while the second (Scenario 2) established the prediction model for DO based on the five input parameters and the DO predicted at the previous (upstream) station. A model must be verified by checking that its outputs are close enough to the observed values to satisfy the verification criteria. Therefore, to investigate the efficiency of the proposed model, a verification of the MLP-NN based on field data collected during 2009–2010 is presented. A sensitivity analysis was adopted to evaluate the effect of the input parameters on the model. The most effective inputs were found to be the oxygen-containing (NO₃) and oxygen-demanding (NH₃-NL) parameters. On the other hand, Temp and pH were found to be the least effective parameters, whereas COND made the lowest contribution to the proposed model. In addition, 17 neurons were selected as the best number of neurons in the hidden layer of the MLP-NN architecture. Three statistical indexes were used to evaluate the performance of the proposed model: Coefficient of Efficiency (CE), Mean Square Error (MSE) and Coefficient of Correlation (CC). A relatively low correlation between the observed and predicted values in the testing data set was obtained in Scenario 1. In contrast, after adopting Scenario 2, high coefficients of correlation between the observed and predicted values were obtained for the test sets, namely 0.98, 0.96 and 0.97 for all stations. The results for Scenario 2 were more adequate than those for Scenario 1, with a significant improvement for all stations ranging from 4 % to 8 %.
Correspondence to: A. Najah (ali najah@ymail.com)


Introduction
Water is a vital resource necessary for all aspects of human and ecosystem survival and health. In addition to drinking and personal hygiene, water is needed for agricultural production, industrial and manufacturing processes, hydroelectric power generation, waste assimilation, recreation, navigation, enhancement of fish and wildlife and a variety of other purposes (Biswas, 1981). The term water quality is used to describe the condition of water, including its chemical, physical and biological characteristics. Water quality is one of the main characteristics of a river, whose purpose is not only human water supply (Dogan et al., 2009; Lopes et al., 2005).
Dissolved oxygen (DO) is one of the important water quality parameters for the survival of aquatic life. It is a critical parameter used frequently and continuously to determine the water quality of rivers. The sources of DO in a water body include re-aeration from the atmosphere, photosynthetic oxygen production and DO loading. The sinks include oxidation of carbonaceous and nitrogenous materials, sediment oxygen demand, and respiration by aquatic plants (Kuo et al., 2007). The problems associated with low concentrations of DO in rivers have been recognized for over a century. The impacts of low DO concentrations or, at the extreme, anaerobic conditions in a normally well-oxygenated river system are an unbalanced ecosystem with fish mortality, odours and other aesthetic nuisances. When DO concentrations are reduced, aquatic animals are forced to alter their breathing patterns or lower their level of activity. Both of these actions will retard their development, and can cause reproductive problems (such as increased egg mortality and defects) and/or deformities (Kalff, 2002; Cox, 2003).
The simplicity of measurement of dissolved oxygen obscures the fact that a number of physical and chemical processes within the water body contribute to the dissolved oxygen level within the stream (Lopes et al., 2005). Notably, the Winkler method is the most reliable technique used to measure dissolved oxygen in freshwater systems. This is a multistep chemical method in which the test is performed on-site, as delays between sample collection and testing may alter the oxygen content (Sengorur et al., 2006).
Published by Copernicus Publications on behalf of the European Geosciences Union.
Water quality modelling is the basis of water pollution control projects. It predicts the trend of water quality from the current condition of the water environment and from the rules governing the transfer and transformation of pollutants in the river basin (Najah et al., 2010). In addition, several water quality models, such as deterministic and stochastic models, have been developed in order to manage the best practices for conserving water quality (Hull et al., 2008; Einax et al., 1999). Most of these models are very complex and require a significant amount of field data to support the analysis. Furthermore, many statistical-based water quality models assume that the relationship between the response variable and the prediction variables is linear and that the variables are normally distributed (Ansare et al., 2000; Garcia et al., 2002). However, as water quality can be affected by so many factors, traditional data processing methods are no longer efficient enough for solving the problem (Xiang et al., 2006), as such factors show a complicated non-linear relation to the variables of water quality forecast. Therefore, statistical approaches usually do not achieve high precision (Rankovic et al., 2010).
Recently, the neural networks approach has been applied to many branches of science. There are a number of studies in which neural networks are used to address water resources problems (Alvisi et al., 2006; El-shafie et al., 2008; Najah et al., 2009; Akhtar et al., 2009; Hung et al., 2009; Ming et al., 2010; Najah et al., 2010a, b; El-shafie et al., 2011). In water quality issues, Artificial Neural Networks (ANNs) were first applied by French and Recknagel (1994) to the task of learning to predict algal blooms from water quality databases. In their application, a feed-forward ANN was trained to make predictions of abundance of species of phytoplankton in Saidenbach Reservoir, Germany. Similarly, Yabunaka et al. (1997) also applied ANNs to predict algal bloom by simulating the future growth of five phytoplankton species and the chlorophyll-a (Chl-a) concentrations in the same lake.
Hence, motivated by the successful applications in modelling non-linear system behaviours in a wide range of areas, ANNs are used to predict water quality parameters in complex systems. The literature offers some recent successful ANN applications related to water quality predictions. The main intentions were to minimize fieldwork and improve the accuracy of prediction. For instance, Hatzikos et al. (2005) utilized neural networks with active neurons as a modelling tool for predicting seawater quality indicators, such as water temperature, pH, dissolved oxygen and turbidity.
The water quality of Johor River is deteriorating due to increasing levels of various pollutants. It continues to be silted and contaminated by wastes due to the lack of enforcement by local authorities. These contaminants eventually flow into the estuaries of Johor River, which are rich habitats that provide spawning and feeding areas for fish and poultry.
This study demonstrated the application of an Artificial Neural Network to predict water quality in terms of dissolved oxygen (DO), exploiting the dynamic processes hidden in the measured data itself. The use of the Multi-Layer Perceptron Neural Network (MLP-NN) model for water quality prediction in Johor River could be complementary in capturing patterns in the historical data set and improving the prediction accuracy.

Study area data analysis
Johor is the second largest state in Peninsular Malaysia, with an area of 18 941 km². The Johor River and its tributaries are important sources of water supply, not only for the state of Johor but also for Singapore. The river is 122.7 km long and drains an area of 2636 km². It originates from Mount Gemuruh and flows through the southeastern part of Johor and finally into the Straits of Johor. The catchment is irregular in shape. The maximum length and breadth are 80 km and 45 km, respectively. About 60 % of the catchment comprises undulating highlands rising to a height of 366 m, while the remainder encompasses lowland and swampy areas.
The locations of the stations are mapped in Fig. 1a and b. The monitoring network includes four locations along the main stream of the river, which are near the mouths of the major tributaries. The proposed models in this research were constructed under the assumption that land use/cover remained unchanged during the study period. However, land use/cover is an important factor for the prediction of water quality parameters. A more precise prediction of water quality parameters could be achieved by adding variables representing the land use/cover status to the model.
The selection of appropriate input parameters is a very important aspect of neural network modelling. In order to use the MLP-NN structures effectively, the input parameters must be selected with great care. This depends strongly on a good understanding of the problem. Statistical correlation analysis is the most popular analytical technique for selecting inputs. The drawback of cross-correlation is that it is only able to capture linear dependence between two variables. Consequently, it can lead to the omission of important inputs that are non-linearly related to the output. To evaluate the effect of input parameters on the model, two evaluation processes were used. The first was based on a priori knowledge supported by statistical correlation analysis. The second was based on the prediction accuracy of water quality parameters. Moreover, in the literature, various input parameters have been used to create models for predicting dissolved oxygen (Table 1).
Hydrol. Earth Syst. Sci., 15, 2693–2708, 2011, www.hydrol-earth-syst-sci.net/15/2693/2011/
Based on the literature, existing measured values and statistical analyses, the following five water quality parameters were selected for the ANN modelling in this study: temperature (Temp), water pH, electrical conductivity (COND), nitrate (NO₃) and ammoniacal nitrogen (NH₃-NL). The river water quality parameters were monitored regularly each month at four different stations by the Department of Environment (DOE) over a period of ten years, i.e. from 1998 to 2007. The analysed results of the study sites are given in Table 2.
The water quality data used in this study were collected within the Johor River. Both in-situ measurements and laboratory analyses were conducted. Four observation points were selected based on the location of the stations. At each point, samples were taken at three different depths: surface, middle and bottom. All these parameters were measured during sampling using a water quality checker, a multi-parameter YSI 550A with five sensors. The YSI was re-calibrated daily to ensure data accuracy.
pH is the indicator of the acidic or alkaline condition of water. Notably, the INWQS threshold range of pH for Malaysian rivers is 5.00 to 9.00 (DOE, 1994). From the results, the mean pH of the Johor River varied from 6.22 to 6.39. At all stations, pH was almost equal and did not show a statistically significant difference. Basically, the pH value is controlled by the dissolved carbon dioxide (CO₂), which forms carbonic acid in water (Hem, 1985). The main source of this chemical is likely to be urban runoff or industrial wastewater.
On the other hand, electrical conductivity (COND) is associated with major water quality parameters due to the dilution effect of stream flow and can be used as a general water quality indicator. The mean COND of Johor River varied from 20.01 µS to 22.14 µS, except at Station 3, where it was 53.8 µS. This change in COND might be an indicator of a discharge or some other source of pollution that entered the stream.
Meanwhile, the NO₃ ion is usually derived from anthropogenic sources like agricultural fields, domestic sewage and other waste effluents containing nitrogenous compounds (Das and Acharya, 2003). The mean values did not vary greatly among the locations.
Ammoniacal nitrogen (NH₃-N) is used to measure the amount of ammonia, a toxic pollutant often found in agricultural fertilizer and domestic sewage. NH₃-N has been promoted as a tool to define the status of surface water quality in Malaysia (DOE, 2003). The mean NH₃ of the Johor River varied from 0.1 mg l⁻¹ to 0.15 mg l⁻¹. At all stations, NH₃ was almost equal and did not show a statistically significant difference.
The coefficient of variation (CV), i.e. the standard deviation of the given data set normalized by its mean, is employed to measure the statistical dispersion of the data (Singh et al., 2009).
All parameters showed a coefficient of variation between 3.08 % and 214.96 %. Such variability among the samples might be due to the large geographical variations in climate influences in the study area. Temperature showed the lowest variation, which might be due to the buffering capacity of the river. The correlation coefficients between DO and the input parameters were calculated and are presented in Table 3.
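As a minimal sketch of how the CV of one parameter could be computed, the following function expresses the standard deviation as a percentage of the mean. The use of the sample standard deviation (n - 1 denominator) and the readings in the example are assumptions for illustration; they are not taken from Table 2.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: 100 * standard deviation / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Assumption: sample standard deviation (n - 1 denominator).
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical monthly temperature readings (degrees C), illustration only.
temperature = [27.1, 27.8, 28.3, 26.9, 27.5]
print(round(coefficient_of_variation(temperature), 2))
```

A parameter whose readings cluster tightly around the mean, as temperature does in this study, yields a small CV, whereas a parameter with occasional extreme values yields a CV that can exceed 100 %.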

Artificial Neural Network (ANN)
An artificial neural network (ANN) is tailored to mimic natural neural networks using a computing process (Haykin, 1999). Among many types of ANNs, the most widely used is the feed-forward neural network, such as the multi-layer perceptron (MLP) network with the back-propagation training algorithm. The MLP is organized as layers of computing elements, known as neurons, which are connected between layers via weights. Apart from the input layer receiving inputs from the environment and the output layer generating the network's response, one or more intermediate hidden layers also exist. For brevity, we refrain from discussing the details of the neural network methodology and instead refer the reader to the papers written by Lek et al. (1996b), as well as Olden and Jackson (2001), for more comprehensive treatments.
Generally, forecasting models can be divided into statistical and physically based approaches. Statistical approaches determine the relationships between historical data sets, whereas physically based approaches model the underlying processes directly. MLP networks are closely related to statistical models and are the type of ANN most suited to forecasting applications (Rumelhart et al., 1986). When using ANNs for forecasting, the modelling philosophy employed is similar to that used in traditional statistical approaches. In both cases, the unknown model parameters (i.e. the connection weights in the case of ANNs) are adjusted to obtain the best match between the historical set of model inputs and the corresponding outputs.
These neural networks are commonly used in ecological studies because they are believed to be universal approximators of any continuous function (Hornik and White, 1989). A neural network consists of at least three layers, comprising an input layer, an output layer and one or more hidden layers, as shown in Fig. 2. Each neuron in one layer is connected to the neurons in the next layer, but there are no connections between the units of the same layer (Kasabov, 1996). The number of neurons in each layer may vary depending on the problem. The weighted sum of the input components is calculated as follows (Freeman and Skapura, 1991):
$$\mathrm{Net}_j = \sum_{i=1}^{n} W_{ij}\, x_i + \theta_j$$
where $\mathrm{Net}_j$ is the weighted sum of the $j$-th neuron for the input data received from the preceding layer with $n$ neurons, $x_i$ is the output of the $i$-th neuron in the preceding layer, $W_{ij}$ is the weight between the $j$-th neuron and the $i$-th neuron in the preceding layer and $\theta_j$ is the bias term of the $j$-th neuron. The output of the $j$-th neuron, $\mathrm{out}_j$, is calculated with a sigmoid function as follows:
$$\mathrm{out}_j = f(\mathrm{Net}_j) = \frac{1}{1 + e^{-\mathrm{Net}_j}}$$
The network is trained by adjusting the weights. The training process is done with a large number of training sets and training cycles (epochs). The main goal of the learning procedure is to find the optimal set of weights, which can ideally produce the correct output for the corresponding input. The output of the network is compared with the desired response to determine the error. The performance of the MLP is measured in terms of a desired signal and the criterion for convergence. For one sample, it is determined by the sum square error (SSE), expressed as follows:
$$\mathrm{SSE} = \sum_{i=1}^{m} \left(T_i - \mathrm{out}_i\right)^2$$
where $T_i$ and $\mathrm{out}_i$ are the desired (target) output and the output of the neural network, respectively, for the $i$-th output neuron, and $m$ is the number of neurons in the output layer.
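The weighted sum, sigmoid output and sum square error described above can be sketched in a few lines. This is only an illustration of one layer of neurons under the stated definitions, assuming the logistic form of the sigmoid; the function names and the weights in the example are hypothetical and are not the trained network of this study.

```python
import math


def sigmoid(net):
    """Logistic sigmoid applied to a neuron's weighted sum."""
    return 1.0 / (1.0 + math.exp(-net))


def layer_forward(inputs, weights, biases):
    """Compute the outputs of one layer.

    weights[j][i] is the weight between neuron j and input i of the
    preceding layer; biases[j] is the bias term of neuron j.
    """
    outputs = []
    for w_j, theta_j in zip(weights, biases):
        net_j = sum(w * x for w, x in zip(w_j, inputs)) + theta_j
        outputs.append(sigmoid(net_j))
    return outputs


def sum_square_error(targets, outputs):
    """SSE for one sample over all output neurons."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))


# Hypothetical example: two inputs feeding a single neuron.
out = layer_forward([1.0, 2.0], [[0.4, -0.2]], [0.0])
print(out, sum_square_error([1.0], out))
```

In the example the weighted sum is 0.4 * 1.0 - 0.2 * 2.0 + 0.0 = 0, so the neuron output is 0.5 and the SSE against a target of 1.0 is 0.25.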

Dissolved Oxygen (DO) prediction with MLP-NN
By definition, the prediction procedure is an operation through which the future dissolved oxygen pattern can be provided. In this study, the ANN with its non-linear and stochastic modelling capabilities was utilized to develop a prediction model that mimicked the DO pattern at the Johor River based on the five input parameters mentioned earlier (Scenario 1), which can be expressed as follows:
$$\mathrm{DO}_N = f\left(\mathrm{Temp},\ \mathrm{pH},\ \mathrm{COND},\ \mathrm{NO_3},\ \mathrm{NH_3\text{-}NL}\right)_N$$
where $\mathrm{DO}_N$ is the dissolved oxygen at station $N$, and the arguments of $f$ are the five input parameters at the same station.
Most of the recent studies attempted to predict the concentrations of DO at each station. Generally, the water pollution at a downstream station is affected by the discharge from the local area of the upstream station (Zaqoot et al., 2009). Hence, the effect of DO at the upstream station needed to be considered in the proposed model. Therefore, a second scenario (Scenario 2) was formed to establish the prediction model for DO at each station based on the five input parameters and the predicted DO at the previous (upstream) station, which can be expressed as follows:
$$\mathrm{DO}_N = f\left(\mathrm{Temp},\ \mathrm{pH},\ \mathrm{COND},\ \mathrm{NO_3},\ \mathrm{NH_3\text{-}NL}\right)_N \text{ with the additional input } \widehat{\mathrm{DO}}_{N-1} \quad (6)$$
where $\widehat{\mathrm{DO}}_{N-1}$ is the DO predicted at the upstream station. This procedure of using the predicted DO can be repeated for the third and fourth stations downstream. The schematic representation of the proposed networks for Scenario 2 is shown in Fig. 3.
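The chaining of stations in Scenario 2 can be sketched as a loop that feeds each station's predicted DO into the model of the next station downstream. This is a hedged illustration only: `predict_along_river` is a hypothetical helper, and each entry of `models` stands in for a trained MLP-NN as any callable that maps a list of features to a DO value.

```python
def predict_along_river(models, station_inputs):
    """Predict DO station by station, from upstream to downstream.

    models[k] is a stand-in for the trained network of station k;
    station_inputs[k] holds the five measured parameters at station k
    (Temp, pH, COND, NO3, NH3-NL).
    """
    predictions = []
    for k, (model, five_inputs) in enumerate(zip(models, station_inputs)):
        features = list(five_inputs)
        if k > 0:
            # Scenario 2: add the DO predicted at the upstream station
            # as a sixth input.
            features.append(predictions[-1])
        predictions.append(model(features))
    return predictions
```

The first station has no upstream neighbour, so it is predicted from the five inputs alone (Scenario 1); every later station receives six inputs, which matches the five-neuron and six-neuron input layers of the two scenarios.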
The ANN models were established using the above two equations. The architecture of the networks consisted of an input layer of five and six neurons for Scenario 1 and Scenario 2, respectively. The hyperbolic tangent sigmoid transfer function was employed between the input and the hidden layers. Moreover, a linear transfer function was employed between the hidden and output layers (corresponding to the predicted DO). Finally, the optimal ANN, together with a flowchart of the algorithm's procedure, is shown in Fig. 4.
It is important to divide the data set in such a way that the training, validation and test data sets are statistically comparable. In this study, the water quality data were divided into three sets: a training set, a validation set and a testing set. The testing set, which comprised 15 % of the data, had never been seen by the network before. The statistical properties (i.e. mean, standard deviation and range) of the three sets were compared, as shown in Fig. 5.
According to the statistical properties of those data sets, no significant differences between the divisions of the data were observed. All samples were normalized to the [0, 1] range. Thus, all of the values $X_i$ from the training, validation and test sets were scaled to a new value $x_i$ as follows:
$$x_i = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}$$
where $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the parameter.
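Scaling to the [0, 1] range is commonly done with min-max normalization, which the following sketch illustrates for a single parameter. The function name and the sample values are hypothetical, and the sketch assumes the minimum and maximum are taken over the values passed in.

```python
def min_max_scale(values):
    """Scale a list of values linearly so that min -> 0 and max -> 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("cannot scale a constant series")
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical pH readings, illustration only.
print(min_max_scale([6.0, 6.2, 6.5]))
```

Bringing all five inputs to a common range prevents parameters with large numerical values, such as COND, from dominating the weight updates during training.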

Selection of back propagation training algorithm
The back propagation (BP) learning algorithm (Rumelhart et al., 1986) is a method conventionally used to train Artificial Neural Networks by adjusting the weighted connections. Standard back propagation is a gradient descent algorithm in which the network weights are moved along the negative of the gradient of the performance function. Because traditional BP relies on gradient descent to determine the weights in the network, it computes rather slowly due to linear convergence.
There are a number of variations on the basic algorithm that are based on other standard optimization techniques. In this paper, the Levenberg-Marquardt algorithm (LMA) was used, which appears to be the fastest method for training moderate-sized feed-forward neural networks (Demuth et al., 2008). The LMA is a very simple but robust method, which provides a numerical solution to the problem of minimizing a function over a space of parameters of the function. Principally, it involves solving the following equation:
$$\left(J^{T} J + \lambda I\right)\delta = J^{T} E$$
where $I$ is the identity matrix, $J$ is the Jacobian matrix for the system, $\lambda$ is Levenberg's non-negative damping factor, $\delta$ is the weight update vector that we want to find and $E$ is the error vector containing the output errors for each input vector used in training the network. The vector $\delta$ tells us by how much we should modify the network weights to reach a better solution. The damping factor $\lambda$ is adjusted at each iteration. If the reduction of $E$ is rapid, a smaller value can be used, bringing the algorithm closer to the Gauss-Newton algorithm, whereas if an iteration gives insufficient reduction in the residual, $\lambda$ can be increased, giving a step closer to the gradient descent direction. In that way, the LMA is considered a hybrid between the Gauss-Newton and steepest descent algorithms (Souza et al., 2009). The Jacobian matrix is created by taking the partial derivatives of each output with respect to each weight and has the following form:
$$J = \begin{bmatrix} \dfrac{\partial F(x_1, w)}{\partial w_1} & \cdots & \dfrac{\partial F(x_1, w)}{\partial w_W} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial F(x_N, w)}{\partial w_1} & \cdots & \dfrac{\partial F(x_N, w)}{\partial w_W} \end{bmatrix} \quad (9)$$
where $F(x_i, w)$ is the network output for the $i$-th input vector $x_i$ of the training set, $w$ is the vector of the $W$ network weights and $N$ is the number of input vectors.
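As a rough illustration of the update rule and the damping schedule described above, the following sketch applies them to a single tanh neuron with two weights, so that the 2 x 2 linear system can be solved in closed form. The toy data, the starting weights and the factor of 10 used to adjust λ are assumptions made for this example; they are not the settings used in this study, where the network has many more weights.

```python
import math


def model(x, w):
    """A single tanh neuron: out = tanh(w[0] * x + w[1])."""
    return math.tanh(w[0] * x + w[1])


def sse(xs, ys, w):
    return sum((y - model(x, w)) ** 2 for x, y in zip(xs, ys))


def lm_step(xs, ys, w, lam):
    """Solve (J^T J + lam * I) delta = J^T E for the two weights."""
    a11 = a12 = a22 = g1 = g2 = 0.0
    for x, y in zip(xs, ys):
        out = model(x, w)
        slope = 1.0 - out * out        # derivative of tanh
        j1, j2 = slope * x, slope      # d(out)/dw[0], d(out)/dw[1]
        e = y - out                    # output error for this sample
        a11 += j1 * j1
        a12 += j1 * j2
        a22 += j2 * j2
        g1 += j1 * e
        g2 += j2 * e
    a11 += lam
    a22 += lam
    det = a11 * a22 - a12 * a12
    return [(a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det]


def fit(xs, ys, w, lam=0.01, iterations=50):
    best = sse(xs, ys, w)
    for _ in range(iterations):
        delta = lm_step(xs, ys, w, lam)
        trial = [w[0] + delta[0], w[1] + delta[1]]
        trial_sse = sse(xs, ys, trial)
        if trial_sse < best:
            w, best = trial, trial_sse
            lam /= 10.0    # good step: move towards Gauss-Newton
        else:
            lam *= 10.0    # poor step: move towards gradient descent
    return w, best


# Toy data generated from known weights (1.5, -0.5), illustration only.
xs = [-2.0 + 0.5 * k for k in range(9)]
ys = [math.tanh(1.5 * x - 0.5) for x in xs]
weights, error = fit(xs, ys, [0.5, 0.0])
print(weights, error)
```

A step is accepted only if it lowers the SSE, so the error never increases from one iteration to the next; rejected steps simply raise λ and are retried with a shorter, more gradient-like update.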

Performance criteria
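Because the water quality parameters had been reliably monitored over the 10-yr period, the performances of the proposed models could be examined and evaluated. The performances of the models were evaluated according to three statistical indexes. The Coefficient of Efficiency (CE), introduced by Nash and Sutcliffe (1970), is often used to evaluate model performance. It is defined as follows:
$$\mathrm{CE} = 1 - \frac{\sum_{i=1}^{n}\left(\mathrm{DO}_{m,i} - \mathrm{DO}_{p,i}\right)^2}{\sum_{i=1}^{n}\left(\mathrm{DO}_{m,i} - \overline{\mathrm{DO}}_m\right)^2}$$
where $n$ is the number of observations, $\mathrm{DO}_p$ and $\mathrm{DO}_m$ are the predicted and measured dissolved oxygen, respectively, and $\overline{\mathrm{DO}}_m$ is the average of the measured dissolved oxygen. The Mean Square Error (MSE) can be used to determine how well the network output fits the desired output. Smaller values of MSE indicate better performance. It is defined as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{DO}_{m,i} - \mathrm{DO}_{p,i}\right)^2$$
The coefficient of correlation (CC) is often used to evaluate the linear relationship between the predicted and measured dissolved oxygen. It is defined as follows:
$$\mathrm{CC} = \frac{\sum_{i=1}^{n}\left(\mathrm{DO}_{m,i} - \overline{\mathrm{DO}}_m\right)\left(\mathrm{DO}_{p,i} - \overline{\mathrm{DO}}_p\right)}{\sqrt{\sum_{i=1}^{n}\left(\mathrm{DO}_{m,i} - \overline{\mathrm{DO}}_m\right)^2 \sum_{i=1}^{n}\left(\mathrm{DO}_{p,i} - \overline{\mathrm{DO}}_p\right)^2}}$$
where $\overline{\mathrm{DO}}_p$ is the average of the predicted dissolved oxygen.

Results and discussion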

Optimizations of the Neurons Number
One of the most important characteristics of an MLP-NN is the number of neurons in the hidden layer. If an insufficient number of neurons is used, the network will be unable to model the complex data and the resulting fit will be poor. On the contrary, if too many neurons are used, the training time may become excessively long and the network may overfit the data. In this study, the number of neurons needed in the hidden layer to achieve the precision criteria was determined by a trial and error approach. The optimum number of neurons was determined based on the minimum value of the Mean Square Error (MSE) of the training data set. The training of the MLP-NN was performed with the number of neurons varied from 1 to 20. Figure 6 shows the relationship between the number of neurons and the MSE during training. The figure shows that the MSE was 0.2761 when one neuron was used and decreased to 0.0310 when 17 neurons were used. Increasing the number of neurons beyond 17 did not significantly decrease the MSE. Thus, 17 neurons were selected as the best number of neurons.
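The trial and error selection can be sketched as choosing the smallest number of neurons beyond which the training MSE no longer improves by more than a chosen tolerance. The helper below and its tolerance are assumptions for illustration. In the example, only the MSE values for 1 and 17 neurons come from the text; the values for 5, 10 and 20 neurons are hypothetical placeholders, since Fig. 6 is not reproduced here.

```python
def select_hidden_neurons(mse_by_neurons, tolerance):
    """Return the smallest neuron count after which no larger count
    lowers the training MSE by more than `tolerance`."""
    counts = sorted(mse_by_neurons)
    for position, n in enumerate(counts):
        larger = [mse_by_neurons[c] for c in counts[position + 1:]]
        if not larger or mse_by_neurons[n] - min(larger) <= tolerance:
            return n


training_mse = {
    1: 0.2761,    # reported in the text
    5: 0.12,      # hypothetical placeholder
    10: 0.06,     # hypothetical placeholder
    17: 0.0310,   # reported in the text
    20: 0.0305,   # hypothetical placeholder
}
print(select_hidden_neurons(training_mse, tolerance=0.001))
```

Selecting on training MSE alone favours larger networks, so in practice the validation error is usually inspected as well to guard against overfitting.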

Test and validation of the model
Figure 4 shows the proposed architecture used to predict the dissolved oxygen at Station 1 (DONN1), which was developed according to the procedure discussed in the previous section. Training, validation and testing processes of the MLP-NN model were performed to minimize the Mean Square Error (MSE) between the output and the desired response, as shown in Fig. 7. It was apparent that the performance goal was achieved in fewer than 15 iterations (epochs).
On the other hand, Fig. 8 compares the predicted and observed DO using the 45° line and two deviation lines at ±15 % from the 45° line, for both the validation and testing data sets. Figure 8 shows that DONN1 can predict the DO with a high level of accuracy: the error for all but a few of the records was below 15 %, and those few records still fell within the ±15 % deviation lines.
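The share of records lying between the two deviation lines can be checked numerically as well as graphically. The sketch below assumes that the deviation is measured relative to the observed value, i.e. that a record is inside the band when the prediction lies between 0.85 and 1.15 times the observation; the function name and the sample values are hypothetical.

```python
def fraction_within_band(observed, predicted, band=0.15):
    """Fraction of records whose prediction deviates from the
    observation by no more than `band` (relative to the observation)."""
    inside = sum(
        1 for o, p in zip(observed, predicted)
        if abs(p - o) <= band * abs(o)
    )
    return inside / len(observed)


# Hypothetical DO values in mg/l, illustration only.
observed = [10.0, 10.0, 10.0, 10.0]
predicted = [10.0, 11.0, 11.6, 8.4]
print(fraction_within_band(observed, predicted))
```

In this example two of the four predictions deviate by 16 % and fall outside the band, so the function returns 0.5.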

Sensitivity analysis
To evaluate the effect of the input parameters on the model, two evaluation processes were used. First, the performance of various possible combinations of the parameters was evaluated using the Coefficient of Efficiency (CE) and Mean Square Error (MSE) to determine the parameters with the greatest effect on the output. Overall, six networks were compared, as shown in Table 4. Each one demonstrated how significantly the eliminated parameter affected the network accuracy. The precision of the MLP-NN was highest when all the suggested parameters were used as inputs to the model, where MSE and CE were 0.05 and 0.95 for the testing data set, respectively. Meanwhile, the accuracy of the second network decreased slightly (MSE = 0.07, CE = 0.91) when COND was eliminated.
For comparison with results reported in previous research, Rankovic et al. (2010) developed a separate neural network model for each independent input variable in order to determine the most effective variable. The correlation was 0.4802 between the conductivity and DO for the testing data set. The negative correlation of DO with electrical conductivity was well documented (Zaqoot et al., 2009). Although it was evident that COND was less effective on the DO, the level of accuracy increased 3 % when it was included in the model input, as shown in the first network in Table 4. Therefore, COND was not neglected in this study. Conversely, the Coefficient of Efficiency reduced gradually if any one of the input parameters was removed, which reduced the predictive capability of the ANN. Furthermore, DO was found to be sensitive to the NO₃ parameter: the accuracy of the sixth network (NO₃ eliminated) decreased (MSE = 0.26, CE = 0.6) for the testing data set. Singh et al. (2009) computed the DO levels in the Gomti River (India) using three-layer feed-forward neural networks with back propagation learning. The sensitivity analysis revealed that NO₃ provided relatively higher contributions to the network. Figure 9 shows the effect on predictive accuracy if any one of the input parameters was removed from the model. For each of the five input parameters, removing the parameter dramatically increased the predictive error. Thus, the five input parameters were essential to the model.
The second assessment process was based on partitioning the neural network connection weights in order to determine the relative importance of each input parameter in the network (Garson, 1998; Emad et al., 2010). In this study, the proposed network consisted of five environmental parameters. Assuming the connection weights from the input nodes to the hidden nodes demonstrate the relative predictive importance of the independent parameter, the importance of each input parameter can be expressed as follows:
$$I_j = \frac{\sum_{m=1}^{N_h}\left(\left(\left|W_{jm}^{ih}\right| \Big/ \sum_{k=1}^{N_i}\left|W_{km}^{ih}\right|\right)\times\left|W_{mn}^{ho}\right|\right)}{\sum_{k=1}^{N_i}\left\{\sum_{m=1}^{N_h}\left(\left(\left|W_{km}^{ih}\right| \Big/ \sum_{k=1}^{N_i}\left|W_{km}^{ih}\right|\right)\times\left|W_{mn}^{ho}\right|\right)\right\}} \quad (13)$$
where $I_j$ is the relative importance of the $j$-th input parameter on the output parameter, $N_i$ and $N_h$ are the numbers of input and hidden neurons, respectively, and $W$ is the connection weight. Meanwhile, superscripts "i", "h" and "o" refer to the input, hidden and output layers, respectively, whereas subscripts "k", "m" and "n" refer to the input, hidden and output neurons, respectively. Table 5 shows the connection weight values for the proposed model. It is important to note that Garson's algorithm uses the absolute values of the connection weights when calculating parameter contributions. The relative importance of each of the input parameters as computed by Eq. (13) is shown in Fig. 10. The relative importance shows the significance of a parameter compared with the others in the model. Although the network did not necessarily represent physical meaning through the weights, it suggested that all the parameters had strong effects on the prediction of DO, with predictor contributions ranging from 15 to 25 %. The most effective inputs were the oxygen-containing (NO₃) and oxygen-demanding (NH₃-NL) parameters. On the other hand, Temp and pH were found to be the least effective parameters. Moreover, COND made the lowest contribution to the proposed model. These findings agreed with those of the previous evaluation (the combinations of parameters).
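Garson's weight-partitioning scheme can be sketched for a network with a single output neuron as follows. The function and the weights in the example are hypothetical and are not the values of Table 5; the sketch assumes that every hidden neuron has at least one non-zero incoming weight.

```python
def garson_importance(w_input_hidden, w_hidden_output):
    """Relative importance of each input for a single-output network.

    w_input_hidden[k][m] is the weight from input k to hidden neuron m;
    w_hidden_output[m] is the weight from hidden neuron m to the output.
    Absolute values are used throughout, as in Garson's algorithm.
    """
    n_inputs = len(w_input_hidden)
    n_hidden = len(w_hidden_output)
    # Total absolute incoming weight of each hidden neuron.
    hidden_totals = [
        sum(abs(w_input_hidden[k][m]) for k in range(n_inputs))
        for m in range(n_hidden)
    ]
    contributions = [
        sum(
            abs(w_input_hidden[k][m]) / hidden_totals[m]
            * abs(w_hidden_output[m])
            for m in range(n_hidden)
        )
        for k in range(n_inputs)
    ]
    total = sum(contributions)
    return [c / total for c in contributions]


# Hypothetical weights: two inputs, two hidden neurons, one output.
print(garson_importance([[3.0, 1.0], [1.0, 1.0]], [1.0, -2.0]))
```

For these weights the first input receives 1.75 / 3 of the total importance and the second 1.25 / 3; the returned shares always sum to 1, so they can be reported directly as percentages.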

Performance of the proposed scenarios
Using the same architecture that was used to predict dissolved oxygen at Station 1 (DONN1), the DO at Stations 2, 3 and 4 was predicted. Figure 11 demonstrates the performance of the proposed models. The scatter plots of the three models showed that the records fell approximately on the ideal line, except for three records whose error markedly exceeded 15 %; these were found in the third model, which was used to predict the DO at Station 3. These three records deviated more from the observed values because extreme values were found in the samples, which were contaminated by noise owing to systematic and random errors. The Coefficient of Efficiency (CE) and MSE computed for the validation and testing data sets of the three stations are presented in Table 6. The CE values for the three stations for the validation data set were within an acceptable range, i.e. 0.9-0.97, while the CE values for the three stations for the test data set were 0.83, 0.89 and 0.86, respectively. The respective MSE values of the validation and testing data sets for the three locations were 0.03-0.05 for DONN2, 0.03-0.10 for DONN3 and 0.06-0.10 for DONN4.
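The three statistical indexes used here can be computed directly from paired measured and predicted values. The sketch below uses the standard Nash-Sutcliffe, mean square error and Pearson correlation definitions; the function names and the sample values are hypothetical and are not data from Table 6.

```python
import math


def mean_square_error(measured, predicted):
    n = len(measured)
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n


def coefficient_of_efficiency(measured, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit."""
    mean_m = sum(measured) / len(measured)
    residual = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    spread = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - residual / spread


def coefficient_of_correlation(measured, predicted):
    """Pearson correlation between measured and predicted values."""
    mean_m = sum(measured) / len(measured)
    mean_p = sum(predicted) / len(predicted)
    covariance = sum(
        (m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted)
    )
    spread_m = sum((m - mean_m) ** 2 for m in measured)
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    return covariance / math.sqrt(spread_m * spread_p)


# Hypothetical DO values in mg/l, illustration only.
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(mean_square_error(measured, predicted))
print(coefficient_of_efficiency(measured, predicted))
print(coefficient_of_correlation(measured, predicted))
```

The example shows why CE and CC should be read together: a single poor prediction leaves the correlation close to 1 (about 0.98) while the efficiency drops to 0.5, because CE penalizes the size of the errors and CC only measures linear association.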
The relatively low correlation between the observed and predicted values in the testing phase was perhaps due to the non-homogeneous nature of water quality parameters. Moreover, Ying et al. (2007) showed that the selection of affecting factors (input parameters) plays a key role, since these factors have a great impact on the forecast results. Thus, the low correlation in this study was attributed to the fact that the input parameters did not include all the relevant parameters. In addition, water pollution at the downstream station was related to the discharge from the upstream station. Hence, to overcome the problem, this study introduced another approach (Scenario 2) so that a high level of accuracy could be reached. This approach predicted the DO with the predicted DO at the upstream station as an additional input to the model, as expressed by Eq. (6). Figure 12 demonstrates the performance of the proposed models (Scenario 2). The scatter plots of the three models showed that the records fell approximately on the ideal line for both the validation and testing data sets.
Compared with Scenario 1, Scenario 2 achieved a high level of accuracy in simulating the magnitude and patterns of DO at all stations and reduced the deviation error from the ±15 % reached by Scenario 1 to ±10 %.
For further analysis, we adopted the accuracy improvement (AI) index, based on the correlation coefficient, to measure the significance of the proposed Scenario 2 over Scenario 1, expressed as follows:

$$\mathrm{AI} = \left(\frac{CC_{\mathrm{Scen2}} - CC_{\mathrm{Scen1}}}{CC_{\mathrm{Scen1}}}\right) \times 100 \qquad (14)$$

where $CC_{\mathrm{Scen2}}$ is the value of the correlation coefficient for Scenario 2 and $CC_{\mathrm{Scen1}}$ is the same statistical index for Scenario 1. Table 7 shows that Scenario 2 was more adequate than Scenario 1, with prediction accuracy improving by 4 % to 8 % across all stations.
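As a worked illustration of the AI index, the sketch below assumes that the improvement is expressed relative to the Scenario 1 correlation coefficient. The input values are hypothetical and are not taken from Table 7.

```python
def accuracy_improvement(cc_scen1, cc_scen2):
    """AI (%) of Scenario 2 over Scenario 1, relative to the Scenario 1 correlation coefficient."""
    return (cc_scen2 - cc_scen1) / cc_scen1 * 100.0


# Hypothetical values for illustration only: a rise in CC from 0.92 to 0.98
# corresponds to an improvement of roughly 6.5 %.
example_ai = accuracy_improvement(0.92, 0.98)
```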
For further assessment, the proposed models were compared with results reported in the literature. Soyupak et al. (2003) employed the ANN modelling approach to calculate the pseudo-steady-state time- and space-dependent DO concentrations in three different reservoirs with entirely different properties. The correlation coefficients between the neural network estimates and the field measurements were higher than 0.95. In addition, Sengorur et al. (2006) ... The developed model was compatible with the results of other researchers. High coefficients of correlation (0.98, 0.96 and 0.97) were obtained between the observed and predicted values for the test sets. These results revealed that the input parameters selected in this study were directly relevant to the target (DO). The selection of input parameters can affect the model output remarkably (Singh et al., 2009).
The results also indicated that the proposed model is an attractive alternative, offering a relatively fast algorithm with good theoretical properties for predicting dissolved oxygen: the performance goal was achieved in fewer than 15 iterations (epochs), and the approach can be extended to predict other water quality parameters.
The model is considered verified when its output and the observed values are close enough to satisfy the verification criteria. Therefore, in order to investigate the efficiency of the proposed model, the verification of the MLP-NN based on field data collected during 2009-2010 is presented. Scatter plots of the observed and predicted DO values for both scenarios are presented in Figs. 13 and 14.
The comparison of Scenario 1 with the proposed model (Scenario 2) for the prediction of water quality parameters shows that the network output of Scenario 2 depicts the behaviour of the water quality parameter patterns more accurately than Scenario 1. Most of the predicted water quality parameter values are close to the actual observations. The value of R2 should be close to 1: R2 above 0.9 indicates a very satisfactory model performance, a value between 0.6 and 0.9 indicates a fairly good performance, and values below 0.5 indicate unsatisfactory performance. The proposed model was efficient in predicting the concentration of water quality parameters in the Johor River and was compatible with the results of other researchers. The results also indicated that the proposed model is an attractive alternative, offering a relatively fast algorithm with good theoretical properties for predicting the water quality parameters, and that it can be extended to other water quality parameters.
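The R2 performance bands stated above can be written as a small helper, sketched below. The function name is hypothetical, and the text does not assign a label to values from 0.5 up to 0.6, so the sketch returns "unclassified" for that range rather than guessing.

```python
def classify_r2(r2):
    """Map an R2 value to the performance label given in the text."""
    if r2 > 0.9:
        return "very satisfactory"
    if 0.6 <= r2 <= 0.9:
        return "fairly good"
    if r2 < 0.5:
        return "unsatisfactory"
    # The text leaves the interval from 0.5 up to 0.6 unlabelled.
    return "unclassified"
```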
Nevertheless, because water quality forecasts are easily affected by the external environment, the obtained model sometimes produces results that deviate considerably from the actual values. Further study is therefore needed to identify a suitable forecast model, understand its laws of change and solve the problem of forecast deviation. In general, this research has integrated several analytical and modelling methods that should prove useful for the institutions directly involved in river basin management in Malaysia. Moreover, the tools used in this work could form a basis for more effective decision making by policy makers, helping to maintain and improve the management of river basins. However, it should be emphasized that there is currently no structured method for identifying which network structure best approximates the function mapping the inputs to the outputs. In addition, data pre-processing is an essential step for a water quality prediction model and requires more survey and analysis, which could lead to better accuracy in application. Moreover, optimal selection of the key parameters still needs to be achieved, for example by augmenting the AI model with an optimization method such as a genetic algorithm or particle swarm optimization. Variable selection (the input pattern) in an AI model is also always a challenging task because of the complexity of the hydrologic process. Other advanced ANN models, such as the Dynamic Neural Network (DNN), could be investigated and might provide a better prediction model. The investigation and application of more robust input pattern selection approaches, such as a systematic search for optimal or near-optimal variable combinations in a DNN with an ensemble procedure, would be desirable in future water quality prediction studies.

www.hydrol-earth-syst-sci.net/15/2693/2011/ Hydrol. Earth Syst. Sci., 15, 2693-2708, 2011

Conclusions
In this study, a model based on artificial neural networks was developed for predicting the DO concentration in the water of the Johor River (Malaysia). Among the many types of ANN, the most widely used is the three-layer feed-forward neural network, which was adopted in this study. The LMA was used for its faster convergence and lower error rate, overcoming the shortcomings of the traditional BP algorithm, which is slow to converge and easily trapped in local minima. The architecture giving the optimal result was a multi-layer perceptron neural network (MLP-NN) with the hyperbolic tangent sigmoid transfer function between the input and hidden layers and a linear transfer function between the hidden and output layers. The number of neurons was varied in the range 1-20, and 17 neurons were selected as the best number based on the minimum MSE of the training data set. The sensitivity analysis showed that all the studied input parameters (temperature (Temp), water pH, electrical conductivity (COND), nitrate (NO3) and ammonical nitrogen (NH3-NL)) have a strong effect on dissolved oxygen. In addition, NO3 is the most influential parameter, with a relative importance of 25 %. Two scenarios were introduced: the first (Scenario 1) was to establish the prediction model for water quality parameters at each station based on the five input parameters, while the second (Scenario 2) was to establish the prediction model for water quality parameters based on the five input parameters and the value of DO at the previous (upstream) station. Compared with Scenario 1, Scenario 2 achieved a higher level of accuracy in simulating the magnitude and patterns of all water quality parameters at all stations, with the accuracy improvement percentage (AI %) ranging from 4 % to 8 %.
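The architecture summarized above (a hyperbolic tangent hidden layer followed by a linear output) and the choice of hidden-layer size by minimum MSE can be sketched as follows. This is a minimal illustration only: the weights are random placeholders, LMA training is not shown, and all function names are hypothetical.

```python
import math
import random


def init_mlp(n_inputs, n_hidden, seed=0):
    """Create placeholder weights for an n_inputs -> n_hidden -> 1 network."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(params, x):
    """Forward pass: tanh transfer in the hidden layer, linear transfer at the output."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def select_hidden_size(mse_by_size):
    """Return the hidden-layer size with the lowest MSE.

    mse_by_size maps a candidate neuron count (e.g. 1 to 20) to the MSE
    obtained after training a network of that size.
    """
    return min(mse_by_size, key=mse_by_size.get)


# Five inputs (Temp, pH, COND, NO3, NH3-NL) and 17 hidden neurons, as reported above.
params = init_mlp(n_inputs=5, n_hidden=17)
```

In practice the inputs would be normalized before being passed to `forward`, since the tanh transfer function saturates for large arguments.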
The verification of the developed model based on field data collected during 2009-2010 showed that the model performs very satisfactorily in predicting DO concentration (R2 values equal to or greater than 0.9) for all stations. The results showed that the proposed model can be applied successfully and can contribute to enhancing the accuracy of water quality prediction.

Fig. 1a. Map showing the geographical setting of the survey area with monitoring
Fig. 2. A typical multi-layer perceptron neural network architecture.

Fig. 4. An optimal architecture of ANN, together with a flowchart of the algorithm's procedure.

Fig. 5. Statistical properties (i.e. mean, standard deviation, range) for training, validation and testing data sets.

Fig. 6. Relationship between the number of neurons and MSE.

Fig. 7. Training, validation and test mean squared errors for the developed model.

Fig. 8. A scatter diagram of the predicted versus observed DO for validation (in green dot) and testing (in yellow dot).

Fig. 9. Percentage error of the model if any one of the input parameters was removed.

Fig. 10. The relative importance of each input parameter at Station 1.
Fig. 13. Scatter plots between the observed and predicted value for each of the
Fig. 14. Scatter plots between the observed and predicted value for each of the

Table 1. Input parameters used in previous studies for the ANN model.

Table 2. Basic statistics of the measured water quality parameters in Johor River.

Table 3. The correlation coefficient between DO and the input parameters.
by the MLP-NN network. Hence, a total of four models for DO prediction were constructed.

Table 4. Predictive accuracy if any one of the input parameters was removed from the model.

Table 5. Connection weights between the input and hidden layers (W1) and weights between the hidden and output layers (W2).

Table 6. Coefficient of Efficiency (CE) and Mean Square Error (MSE) associated with MLP-NN models for each station at Johor River.

Table 7. A summary of the correlation coefficient for Scenario 1 and Scenario 2 and the AI %.