Downstream prediction using a nonlinear prediction method



Introduction
Urbanization and urban growth are essential considerations for planners and policy makers because the pattern of urbanization has major implications for hydrological processes. Urbanization can affect several hydrological problems, such as flood prevention in urban areas, the allocation of adequate water resources in terms of quality and quantity, and waterborne waste disposal (Hall, 1984). The problems of urban hydrology therefore involve both flood and pollution prevention; this study, however, considers only flood prevention, based on river flow prediction. When an undeveloped area is transformed into a developed one, the soil structure is disturbed (Hall, 1984), and this can change the magnitude of the river flow. Because of the increase in impervious areas and the lack of drainage, the volume of runoff increases significantly, and with it the magnitude of the river flow. Hence, downstream flooding problems arise in urban areas. Several methods, such as empirical and physical process methods, can be used to estimate the river flow in a watershed located in an urban area. In empirical urban hydrology research, the behaviour of the river flow in the downstream area is important for providing accurate information about the river flow as a whole (Viessman and Lewis, 1996). This information can support planning, development and flood prevention in the downstream area.
The Langat River, one of the longest rivers in the state of Selangor, Malaysia, is used as a case study. This research focuses on the downstream area at Kajang, which is well known for flood hazards. Figure 1 shows the four gauging stations along the Langat River. The river flows from east to southeast, that is, from the Lui River to Kajang. The total length from the upstream to the downstream station is about 34.4 km, and the downstream area has been identified as a flood risk area (Mohammed et al., 2011). Checkpoint 1 (station number 3118445) is the Lui River gauging station (upstream) and Checkpoint 2 (station number 2917401) is the Kajang gauging station (downstream). The flow of the Langat River at the Kajang gauging station is used for the river flow analysis and prediction with the nonlinear prediction method. This area had a population of 229 655 in 2000, which increased to 342 657 in 2010 (Department of Irrigation and Drainage Malaysia, 2005). This population growth reflects the development of the Kajang area. Furthermore, the study area is adjacent to an industrial area and pig farms, so flooding here can damage the industrial area and cause pollution in the Langat River basin. Studies of the downstream area (Kajang) are therefore important to provide information about the flow downstream. This station was chosen so that the release of water from Checkpoint 2 could be estimated over a given period of time. The results could help to identify the preventive measures that could be undertaken in this downstream area.
The analysis and prediction of river flow can provide information about the dynamics of the river flow system. However, river flow does not depend on rainfall alone: the characteristics of an area, such as its shape, slope, land, soil structure and climate change, can also affect the river flow (Viessman and Lewis, 1996). Stochastic methods are therefore often used to analyse complex natural systems such as river flow. Meanwhile, nonlinear time series analysis has developed rapidly in recent years. One method that has produced important findings is chaos theory, which holds that a complex system can be analysed by deterministic methods that use a minimum number of the system's variables (Islam and Sivakumar, 2002). Over the past decades, a number of studies have characterized, modelled and predicted hydrological phenomena as deterministic systems (e.g. Jayawardena and Lai, 1993; Sivakumar, 2000; Ghorbani et al., 2010). The predictions of river flow and other hydrological processes in these studies were in good agreement with the observed data (Sivakumar, 2003; Regonda et al., 2005; She and Yang, 2010; Khatibi et al., 2012). In addition, prediction using chaos theory can reveal the number of variables that govern the dynamics of the river flow.
Studies on river flow analysis and prediction in Malaysia have been carried out and refined for a variety of purposes, such as providing information for flood prevention. Methods such as support vector machines (Shabri and Suhartono, 2012), neural network models (Ahmad and Juahir, 2006) and hydrodynamic modelling (Ghani et al., 2010) have been used for river flow prediction. However, several methods, such as chaos theory, Bayesian methods and wavelet methods, have yet to be explored for river flow prediction in Malaysia. River flow prediction using chaos theory involves a single variable (river flow data), even though other dominant variables also affect river flow. In contrast, the Bayesian and wavelet methods depend on a number of dominant variables, such as rainfall, temperature and soil type. To the best of our knowledge, this is the first attempt to use chaos theory for the analysis and prediction of river flow in Malaysia.

Nonlinear prediction method
The nonlinear prediction method (NLP) of chaos theory is used to analyse the river flow and predict its future values. NLP involves two steps: phase space reconstruction and prediction. Phase space reconstruction uses the observed (one-dimensional) data to build an m-dimensional phase space that reflects the dynamics of the river flow (Abarbanel, 1996; Adenan and Noorani, 2013). A scalar time series x(t) forms a one-dimensional series
$$X = \{x_1, x_2, x_3, \ldots, x_N\} \tag{1}$$
where N is the total number of points. This series can be transformed into m-dimensional vectors
$$Y_t = \left(x_t, x_{t+\tau}, x_{t+2\tau}, \ldots, x_{t+(m-1)\tau}\right), \qquad t = 1, 2, \ldots, N-(m-1)\tau \tag{2}$$
where τ is an appropriate time delay and m is a chosen embedding dimension (Abarbanel, 1996; Tongal and Berndtsson, 2013). According to Eq. (2), the values of τ and m are needed to reconstruct the phase space. In this study, τ is fixed in advance. The selection of an appropriate τ is important in phase space reconstruction, because an optimal τ separates neighbouring trajectories adequately for the given dimension of the phase space. If τ is too small, successive coordinates are nearly identical and cannot properly describe the dynamics of the system. If τ is too large, the trajectories in the phase space diverge and information about the dynamics is lost (Sangoyomi et al., 1996; Islam and Sivakumar, 2002). An optimal value of m allows the reconstructed phase space to describe the topology of the attractor. The number of dimensions of the reconstructed phase space equals the number of columns of the matrix of delay vectors built from the time series with the chosen embedding parameters. If there are too few columns, the reconstruction cannot reflect the dynamics of the system. The selection of the preliminary parameter pair (τ, m) is therefore important for reconstructing the phase space dynamics.
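The delay-vector construction of Eq. (2) can be illustrated with a short Python/NumPy sketch. It is an illustration under stated assumptions rather than the authors' implementation: the helper name `delay_embed` is hypothetical, indices are 0-based, and the input is a synthetic stand-in series rather than the Langat River record.

```python
import numpy as np


def delay_embed(x, m, tau=1):
    """Return the delay vectors Y_t = (x_t, x_{t+tau}, ..., x_{t+(m-1)tau}) as rows.

    Hypothetical helper illustrating Eq. (2); indices are 0-based.
    """
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau  # number of complete delay vectors
    if m < 1 or tau < 1 or n_vectors < 1:
        raise ValueError("need m >= 1, tau >= 1 and a series longer than (m - 1) * tau")
    # Column j holds the lagged copy x_{t + j*tau}, so row t is the vector Y_t.
    return np.column_stack([x[j * tau : j * tau + n_vectors] for j in range(m)])


# Example with the two parameter pairs used in this study (tau = 1).
flow = np.sin(0.3 * np.arange(50))          # synthetic stand-in, not the Langat data
Y_model_1 = delay_embed(flow, m=6, tau=1)   # Model I: (1, 6)
Y_model_2 = delay_embed(flow, m=14, tau=1)  # Model II: (1, 14)
print(Y_model_1.shape, Y_model_2.shape)     # (45, 6) (37, 14)
```

Each row is one point of the reconstructed phase space and the number of columns equals m, so a Model I reconstruction has 6 columns and a Model II reconstruction has 14.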

Determination of preliminary parameter pair (τ, m)
Previous studies on river flow prediction showed that a time delay of τ = 1 in phase space reconstruction gave good predictions (Sivakumar, 2002, 2003). A time delay of τ = 1 is therefore used in this study. The embedding dimension m is calculated using the correlation dimension method and the false nearest neighbour (FNN) method. Two models, Model I and Model II, which use different preliminary parameter pairs for the phase space reconstruction, are considered. Model I uses τ = 1 with m obtained from the correlation dimension, and Model II uses τ = 1 with m obtained from FNN. The prediction results of the two models are then compared to determine which performs better.
The correlation dimension method is one of the most fundamental methods in the study of chaotic time series and has been widely used to detect chaotic behaviour in hydrological studies (Jayawardena and Lai, 1994; Martins et al., 2011; Khatibi et al., 2012). For a given radius r, the correlation function C(r) measures the proportion of pairs of vectors $Y_t$ that lie closer to each other than r. The Euclidean distance is used to calculate the distance between points in the phase space:
$$\left\| Y_i - Y_j \right\| = \sqrt{\sum_{k=0}^{m-1} \left( x_{i+k\tau} - x_{j+k\tau} \right)^2} \tag{3}$$
The correlation dimension is based on the correlation integral introduced by Grassberger (1986):
$$C(r) = \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=i+1}^{N} H\left( r - \left\| Y_i - Y_j \right\| \right) \tag{4}$$
where H is the Heaviside function, which takes the value 0 or 1:
$$H(u) = \begin{cases} 1, & u > 0 \\ 0, & u \le 0 \end{cases} \tag{5}$$
H acts as a threshold on the Euclidean distance between two points $Y_i$ and $Y_j$ on the attractor, so that C(r) counts the pairs of points $(Y_i, Y_j)$ whose distance is less than the radius r. In the limit of an infinite amount of data (N → ∞) and a sufficiently small radius (r → 0), the relation $C(r) \cong \alpha r^{D_2}$, where α is a constant, is expected (Men et al., 2004). The correlation exponent v is then defined as
$$v = \lim_{r \to 0} \lim_{N \to \infty} \frac{\ln C(r)}{\ln r} \tag{6}$$
and the correlation dimension $D_2$ is the value at which v saturates as m increases. If the correlation exponent saturates at a finite, low and non-integer value, the system is considered to be of low-dimensional chaotic nature (Men et al., 2004). If the correlation exponent increases without limit as m increases, the system should be studied as a stochastic system.
The false nearest neighbour method (FNN) is an effective method for finding the embedding dimension m for phase space reconstruction, and it has been used to analyse river flow time series (Wu and Chau, 2010; Ghorbani et al., 2012). The idea is as follows: if a point is a true nearest neighbour of another point, the distance between the two should not change significantly when the embedding dimension is increased. The distance between a point and its nearest neighbour is computed with the Euclidean distance.
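The correlation integral C(r) and the correlation exponent v (the slope of ln C(r) vs. ln r) can be sketched in Python with NumPy. This is a minimal Grassberger-Procaccia style illustration, not the code used in the study. The helper names are hypothetical, the radii passed in are assumed to lie inside the scaling region, and the demonstration uses i.i.d. uniform noise, for which v should track m instead of saturating.

```python
import numpy as np


def delay_embed(x, m, tau=1):
    # Rows are the delay vectors Y_t of Eq. (2) (0-based indices).
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[j * tau : j * tau + n] for j in range(m)])


def pairwise_distances(Y):
    # Euclidean distances ||Y_i - Y_j|| for all pairs i < j, as one flat array.
    return np.concatenate(
        [np.linalg.norm(Y[i + 1 :] - Y[i], axis=1) for i in range(len(Y) - 1)]
    )


def correlation_sum(distances, r):
    # C(r): fraction of pairs closer than r, i.e. pairs with H(r - d) = 1.
    return np.mean(distances < r)


def correlation_exponent(x, m, radii, tau=1):
    """Least-squares slope of ln C(r) vs. ln r over the supplied radii.

    Assumption of this sketch: the caller picks radii inside the scaling region.
    """
    d = pairwise_distances(delay_embed(x, m, tau))
    radii = np.asarray(radii, dtype=float)
    c = np.array([correlation_sum(d, r) for r in radii])
    keep = c > 0  # ln C(r) is undefined where no pair is closer than r
    slope, _intercept = np.polyfit(np.log(radii[keep]), np.log(c[keep]), 1)
    return slope


rng = np.random.default_rng(0)
noise = rng.uniform(size=800)       # i.i.d. noise: v should grow with m
radii = np.logspace(-1.7, -1.0, 8)  # r from about 0.02 to 0.1
for m in (1, 2):
    print(m, round(correlation_exponent(noise, m, radii), 2))
```

For the uniform noise the estimated exponent is close to 1 for m = 1 and close to (slightly below, because of edge effects) 2 for m = 2, so it keeps growing with m; for a deterministic series, v would instead level off at the correlation dimension as m increases.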
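FNN can be calculated using the following algorithm. Assume that $Y_j$ is the nearest neighbour of $Y_i$ in the m-dimensional phase space and that $R_m(i) = \left\| Y_i - Y_j \right\|$ is the distance between them. When the embedding dimension is increased from m to m + 1, $Y_j$ is counted as a false nearest neighbour if
$$\frac{\left| x_{i+m\tau} - x_{j+m\tau} \right|}{R_m(i)} > R_T$$
This criterion is applied to all points i in the phase space, and the percentage of false nearest neighbours is calculated. The threshold $R_T$ is a value between 10 and 30; in this study, $R_T$ = 15 is used (Wu and Chau, 2010). The algorithm is repeated for increasing embedding dimensions, and the dimension at which the percentage of false nearest neighbours becomes close to zero is used as the embedding dimension.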
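The FNN procedure can be sketched in Python with NumPy. The sketch assumes the common distance-ratio criterion, in which the nearest neighbour $Y_j$ of $Y_i$ counts as false when $|x_{i+m\tau} - x_{j+m\tau}| / R_m(i) > R_T$, with $R_T$ = 15 as the default. The function names, the 1 % tolerance used to decide that the FNN percentage is "close to zero", and the decision to skip neighbours at zero distance are assumptions of this sketch, not details taken from the study.

```python
import numpy as np


def fnn_percentage(x, m, tau=1, r_t=15.0):
    """Percentage of false nearest neighbours when going from dimension m to m + 1.

    Uses the ratio test |x_{i+m*tau} - x_{j+m*tau}| / R_m(i) > r_t;
    neighbours at zero distance are skipped (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - m * tau  # vectors that also have an (m + 1)-th coordinate
    Y = np.column_stack([x[j * tau : j * tau + n] for j in range(m)])
    extra = x[m * tau : m * tau + n]  # the added coordinate x_{i + m*tau}
    false_count = 0
    valid = 0
    for i in range(n):
        dist = np.linalg.norm(Y - Y[i], axis=1)
        dist[i] = np.inf  # a point is not its own neighbour
        j = int(np.argmin(dist))
        if dist[j] == 0.0:
            continue
        valid += 1
        if abs(extra[i] - extra[j]) / dist[j] > r_t:
            false_count += 1
    return 100.0 * false_count / valid if valid else 0.0


def choose_embedding_dimension(x, max_m=20, tau=1, r_t=15.0, tol=1.0):
    # Smallest m whose FNN percentage falls below `tol` percent (hypothetical rule).
    for m in range(1, max_m + 1):
        if fnn_percentage(x, m, tau, r_t) < tol:
            return m
    return None
```

For a pure sine wave, many neighbours in one dimension are false (points on the rising and falling branches share the same value), while in two dimensions the trajectory unfolds into a closed loop and the percentage drops to about zero, so the sketch returns m = 2.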

Prediction
In this study, river flow is predicted using the local linear approximation method, which was proposed by Lorenz (1969). The method is applied (1) to examine whether the river flow in the downstream area can be predicted and (2) to compare the prediction results of Models I and II. The first step is to reconstruct the phase space. The preliminary parameter pair (τ, m) is important for this reconstruction because the reconstructed phase space is used to make the prediction. Models I and II differ only in the phase space reconstruction: both use τ = 1, but they determine m differently, Model I using the correlation dimension and Model II using FNN. Assume that the reconstructed phase space consists of the vectors
$$Y_t = \left(x_t, x_{t+\tau}, x_{t+2\tau}, \ldots, x_{t+(m-1)\tau}\right)$$
as in Eq. (2).
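The prediction step can be sketched in Python with NumPy. This is one common way to implement a local linear approximation in the spirit of the relation $Y_{M+1} = A Y_M + B$ fitted by least squares, not necessarily the exact procedure used by the authors. Assumptions: several nearest neighbours are used (a hypothetical parameter `k`, defaulting here to 2(m + 1)) so that the least-squares fit is well posed; only the new coordinate of $Y_{t+1}$ is predicted, which with τ = 1 is the next daily value; and the function names are hypothetical.

```python
import numpy as np


def delay_embed(x, m, tau=1):
    # Rows are the delay vectors Y_t of Eq. (2) (0-based indices).
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[j * tau : j * tau + n] for j in range(m)])


def local_linear_predict(history, m, tau=1, k=None):
    """One-step-ahead forecast of the value that follows `history`.

    Finds the k nearest neighbours Y_M of the latest state Y_t, fits the value
    following each neighbour as A . Y_M + B by least squares, and applies the
    same (A, B) to Y_t.
    """
    history = np.asarray(history, dtype=float)
    Y = delay_embed(history, m, tau)
    query = Y[-1]     # current state Y_t
    library = Y[:-1]  # past states whose successor is known
    # The value that follows library row s is x_{s + (m-1)*tau + 1}.
    targets = history[(m - 1) * tau + 1 :]
    if k is None:
        k = 2 * (m + 1)  # assumption: enough neighbours for an (m + 1)-parameter fit
    k = min(k, len(library))
    nearest = np.argsort(np.linalg.norm(library - query, axis=1))[:k]
    design = np.column_stack([library[nearest], np.ones(k)])  # columns: Y_M and 1
    coef, *_ = np.linalg.lstsq(design, targets[nearest], rcond=None)
    return float(np.append(query, 1.0) @ coef)  # A . Y_t + B


def rolling_forecast(series, n_train, m, tau=1, k=None):
    # Predict each value after the first n_train points from all earlier values.
    series = np.asarray(series, dtype=float)
    return np.array(
        [local_linear_predict(series[:t], m, tau, k) for t in range(n_train, len(series))]
    )
```

For the setting described in the text (three years used for reconstruction, one year predicted), a call such as `rolling_forecast(flow, n_train, m=14)` would correspond to Model II, where `flow` and `n_train` are placeholders for the user's own daily series and the length of the reconstruction period.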

Performance evaluation
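The accuracy of the models in predicting daily river flow is evaluated using the mean absolute error (MAE), the root mean square error (RMSE) and the correlation coefficient (CC), defined as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t^{o} - y_t^{f} \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( y_t^{o} - y_t^{f} \right)^2}$$
$$\mathrm{CC} = \frac{\sum_{t=1}^{n} \left( y_t^{o} - \bar{y}^{o} \right)\left( y_t^{f} - \bar{y}^{f} \right)}{\sqrt{\sum_{t=1}^{n} \left( y_t^{o} - \bar{y}^{o} \right)^2 \sum_{t=1}^{n} \left( y_t^{f} - \bar{y}^{f} \right)^2}}$$
where $y_t^{o}$ is the observed value and $y_t^{f}$ the forecast value at time t, $\bar{y}^{o}$ and $\bar{y}^{f}$ are their means, and n is the number of data points. MAE and RMSE provide information on the predictive ability of the models.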
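The three performance measures can be computed with a few lines of Python/NumPy. This is a small sketch with hypothetical function names; CC is taken here as the Pearson correlation coefficient, computed with `numpy.corrcoef`.

```python
import numpy as np


def mae(observed, forecast):
    # Mean absolute error.
    o, f = np.asarray(observed, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(o - f)))


def rmse(observed, forecast):
    # Root mean square error.
    o, f = np.asarray(observed, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((o - f) ** 2)))


def cc(observed, forecast):
    # Pearson correlation coefficient between observed and forecast values.
    o, f = np.asarray(observed, float), np.asarray(forecast, float)
    return float(np.corrcoef(o, f)[0, 1])


obs = [1.0, 2.0, 3.0, 4.0]
fc = [1.0, 2.0, 3.0, 5.0]
print(mae(obs, fc), rmse(obs, fc), round(cc(obs, fc), 4))  # 0.25 0.5 0.9827
```

Because RMSE squares the errors, the single error of 1 weighs more heavily in RMSE (0.5) than in MAE (0.25), which is why the two measures are usually reported together.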

Results and Discussion
River flow prediction using NLP involves phase space reconstruction and prediction, so the discussion of the findings is divided into two parts. The first part determines the parameters for the phase space reconstruction of Models I and II, and the second part discusses the prediction results.
A phase diagram provides information about the dynamics of a system through its trajectories in the phase space. The trajectories of interest lie on a subspace called the attractor, and observing the attractor trajectories in the phase space can provide information about chaotic behaviour in the system. Figure 3 shows the phase diagrams of the observed data in two and three dimensions with τ = 1. The trajectories in the phase space can indicate the presence of chaotic behaviour in the data (Sivakumar, 2002). In Fig. 3, the trajectories of the attractor are clearly visible in both phase diagrams, indicating that the data involved in this analysis are chaotic. The dynamics of the system can therefore be studied using chaos theory rather than stochastic methods.
This study uses data from January 2002 to December 2005 (1433 days). Three years of data are used to reconstruct the phase space and predict the behaviour one year ahead. The phase space reconstruction depends on the embedding dimension. In Model I, the embedding dimension is based on the correlation dimension. The plot of ln C(r) vs. ln(r) in Fig. 4a shows the behaviour of the correlation function C(r) against the radius r for increasing embedding dimension m. In general, the gradient at the beginning of the curves increases from left (m = 1) to right (m = 10). The correlation exponent v for different values of m is shown in Fig. 4b. The value of the correlation dimension increased with m up to a scaling region, where the correlation dimension saturated. Such saturation may indicate the existence of deterministic dynamics in the system. The saturation value of $d_2$ lies in the interval (2.5, 3); this saturation value is known as the correlation dimension of the attractor (Sivakumar, 2000). In general, a sufficient condition for the smallest integer m is that m is greater than $2D_2$ (Wu and Chau, 2010). Thus, m = 6 is used for the Langat River flow time series at Kajang. The correlation dimension $d_2$ is finite and low, so the Langat River (Sungai Langat) is a chaotic and deterministic system. Model I uses the preliminary parameter pair (1, 6) in the phase space reconstruction.
Model II uses FNN to calculate m for the phase space reconstruction. Figure 5 shows the percentage of false nearest neighbours vs. m, from which the optimal embedding dimension is identified as m = 14. Model II therefore uses the parameter pair (1, 14) for the phase space reconstruction.
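As a worked check of how the rule that m must exceed $2D_2$ turns the reported saturation interval into m = 6 (this only restates the arithmetic above and adds no new result):

$$2.5 < D_2 < 3 \;\Longrightarrow\; 5 < 2D_2 < 6 \;\Longrightarrow\; m = \min\{\, m \in \mathbb{Z} : m > 2D_2 \,\} = 6$$

Had $D_2$ reached 3 exactly, $2D_2 = 6$ and the same rule would give m = 7, so the choice m = 6 relies on $D_2$ staying strictly below 3.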

River flow prediction for Model I and Model II
The preliminary parameter pair is (1, 6) for Model I and (1, 14) for Model II, and these pairs are used to reconstruct the phase space of each model. Figure 6 and Table 2 summarize the river flow prediction results in terms of MAE, RMSE and CC. Overall, the results show that chaos theory performs well in predicting future values of the river flow in the downstream area. According to Table 2, Model II gives better predictions than Model I: its correlation coefficient (0.6360) is slightly higher than that of Model I (0.6103). The analysis and prediction of the Langat River flow thus show that the choice of preliminary parameters for the phase space reconstruction is essential for better prediction results. In this study, using FNN to calculate the embedding dimension m (Model II) is more appropriate than using the correlation dimension method.

Conclusions
Daily river flow data recorded at the Kajang station on the Langat River, Selangor, Malaysia, have been analysed to test for chaotic behaviour, and the flow has been predicted. The station is located in the downstream area, which is flood prone. The analysis covered four years of river flow data (2002 to 2005). The focus of this study was to identify chaotic behaviour in the river flow data in the downstream area and to determine whether the river flow can be predicted when such behaviour exists. Chaos theory, together with NLP, was used in the analysis. The reconstructed phase space clearly shows the existence of a chaotic attractor; hence, the data involved in this analysis are chaotic.
Next, a one-year-ahead prediction was made from the observed data using the phase space reconstructed from three years of data. Two combinations of preliminary parameters were used: Model I used τ = 1 with m obtained from the correlation dimension, and Model II used τ = 1 with m obtained from FNN. The resulting combinations were (1, 6) for Model I and (1, 14) for Model II. Overall, both models gave good predictions of the river flow downstream. However, Model II, which used the FNN algorithm, provided better predictions than Model I, which used the correlation dimension. The results show that the Langat River at Kajang, in the downstream area, is chaotic and predictable using NLP. The analysis and prediction of river flow in the downstream area could therefore provide the authorities with information for the appropriate control of downstream flooding.

Several steps are required to determine the value of the correlation dimension. First, ln C(r) is plotted against ln(r) for a given m. Then the gradient of each curve (the correlation exponent v) is determined, which can be done with the least squares method over the scaling region. For finite data, the curve becomes flat (saturated) once r exceeds the diameter of the attractor, so the gradient is measured over the linear scaling region below that limit. A better way to estimate the gradient is to use δ[ln C(r)]/δ[ln r]. To examine whether the system is chaotic, the correlation exponent v is then plotted against m.
In the performance evaluation, the correlation coefficient CC measures the correlation between the predicted and observed data.
Description of data
The Langat River is one of the longest rivers in Selangor, and its basin is transboundary, crossing three states: Selangor, Negeri Sembilan, and the Federal Territory of Kuala Lumpur and Putrajaya (Department of Irrigation and Drainage Malaysia, 2011). The Langat River flows from Mount Nuang in Hulu Langat district to the Straits of Malacca in Kuala Langat. The Langat River catchment covers a total of 1815 km² and lies between latitudes 2°40′152″ N and 3°16′15″ N and longitudes 101°19′20″ E and 102°1′10″ E (Juahir et al., 2011). There are two reservoirs in this area, the Langat Dam and the Semenyih Dam, with areas of 54 km² and 41 km², respectively. Both dams supply water for domestic and industrial use, and the Langat Dam also generates electricity for residents in the vicinity of the Langat Valley. Several towns and villages lie along the Langat River, including Cheras, Semenyih, Dengkil and Kajang. Since 1976, the Langat River has been recognized as an area that regularly suffers flooding. The variations of the daily river flow at Checkpoint 2 are shown in Fig. 2.
The irregular patterns in the data for the Langat River at Kajang show that the river in this area is a complex system. All data were obtained from the Department of Irrigation and Drainage Malaysia. Missing data constitute about 0.018 % of the record and were filled by linear interpolation. The statistical parameters of the data, covering four years (January 2002 to December 2005), are shown in Table 1.
In the results, Figure 4c shows the relationship between the correlation dimension $d_2$ and the embedding dimension m, as a graph of $d_2$ vs. m.
References
Tongal, H. and Berndtsson, R.: Phase-space reconstruction and self-exciting threshold modeling approach to forecast lake water levels, Stoch. Env. Res. Risk A., online first, doi:10.1007/s00477-013-0795-x, 2013.
Viessman, W. and Lewis, G. L.: Introduction to Hydrology, HarperCollins College Publishers, New York, 760 pp., 1996.
In the prediction step, the nearest neighbours of $Y_t$ are required to predict $Y_{t+1}$. Let $Y_M$ denote the nearest neighbours of $Y_t$, that is, the vectors at minimum distance from it. In the local linear approximation method, $Y_M$ and their successors $Y_{M+1}$ are used to fit the linear relation $Y_{M+1} = A Y_M + B$, where the constants A and B are calculated using the least squares method. The predicted value is then obtained as $Y_{t+1} = A Y_t + B$.

Table 1. Statistics for river flow series at Kajang station (Checkpoint 2).