COPULA AND ARMA BASED STUDY OF CONTROLLED OUTFLOW AT FARAKKA BARRAGE

Abstract. In this study, 25 years of mean monthly outflow discharge data for the Farakka barrage (1949 to 1973) were used. The Farakka barrage is located on the Ganga River. Spatial and temporal variation in flow rate is very common for any particular area because of various meteorological and other natural factors, but large variations in these factors cause extreme events (e.g., floods and droughts). Monthly outflow discharge for a particular critical month is predicted using statistical models (ARMA model and copula model). Different copulas (Normal, t, Frank, Clayton, Gumbel–Hougaard, Ali–Mikhail–Haq) are used for this purpose, and the copula model is selected based on distribution functions (normal, lognormal, extreme value type I, generalized extreme value, gamma, Weibull and exponential distributions). The distribution is selected based on the mean square error (MSE), Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). The model parameters were computed using the Maximum Likelihood (ML) estimation method.



Introduction
An accurate flood-frequency analysis is critical for the design of many civil infrastructures such as drainage systems and flood-proof walls. The word copula comes from Latin and means link. The concept was introduced in a mathematical and statistical manner by Sklar (1959), in a theorem that describes a copula as a function. Afterwards, many researchers, such as Genest and MacKay (1986), Genest and Rivest (1993), Nelsen (1999), Favre et al. (2004), Genest and Favre (2007) and Salvadori and De Michele (2007), used copulas, including in hydrological applications. Crucial steps in copula modelling are deriving the bivariate distributions of peak flow and volume, volume and duration, and peak flow and duration (Zhang and Singh, 2006). Archimedean copulas (Clayton, Frank, Gumbel-Hougaard, Ali-Mikhail-Haq, Independence and Joe) can be used for bivariate modelling of these pairs of variables. Copulas capture the dependence structure of a data set; they are therefore used to describe the dependence of extreme output values and are also useful for non-parametric measurement of dependence. Two copulas are used to model the statistical dependence among three random variables. The Archimedean copulas are constructed using Kendall's tau as the measure of association (Osorio et al., 2009).
In Twaróg (2016), the probability density function for the two-dimensional random variable representing volume and time is given in graphic form, with graphs for both the Clayton and Gumbel-Hougaard copulas. The Gumbel-Hougaard copula was best suited for that study because it gave a lower value of the selection criterion function and a better match between the empirical and theoretical distribution functions. The risk values obtained at the extreme analysed values of controlled discharge and flood control capacity were not monotonic, which reflects that the simulations were completed for sets of only 10000 cycle elements and only 10000 cycles (Twaróg, 2016).
Peak flow and hydrograph volume can be studied jointly by a bivariate approach (e.g., Goel et al., 1998; Yue et al., 1999; Favre et al., 2004; Shiau et al., 2007). Different criteria should be considered when selecting among the candidate copulas (Chowdhary et al., 2011; Requena et al., 2013). The first criterion is the goodness-of-fit test, which relates to the ability of the copula to characterize the data (Genest et al., 2009). The second criterion is the estimation of the Kendall's tau return period by the copula. It relates to the adequacy of the copula for a large copula value t ∈ [0, 1] and is based on the Kendall's function K_C(t) = P[C_θ(u_1, u_2) ≤ t] (Genest and Rivest, 1993). The third criterion is the Akaike Information Criterion (AIC) (e.g., Zhang and Singh, 2006). A distributed hydro-meteorological model and a copula-based model can be combined to extend the observed flood series (Requena et al., 2015).
A significant number of researchers have found the Gumbel-Hougaard copula to be the most suitable choice for modelling the dependence structure between peak flow discharge and flood volume (De Michele et al., 2005; Zhang and Singh, 2007; Karmakar and Simonovic, 2009; Li et al., 2013). A copula-based approach was used to derive a bivariate distribution function of two constituent flood variables in a real-world case study. It was found to provide an effective and straightforward strategy for inferring probability functions from multivariate sample data. Powerful tests developed inside the copula framework made it possible to investigate the empirical dependence structure accurately, especially with respect to the evaluation of tail dependencies (Balistrocchi, 2017). In a copula model of rainfall intensity and duration, both the properties of the marginal distributions and the dependence between intensity and storm duration were preserved. A copula represents the dependence between variables in the joint cumulative distribution function independently of their marginal distributions (Joe, 1997; Nelsen, 2006). A Gaussian copula was used to generate 1020 synthetic data sets. Of these, 21 data sets lay beyond the range of acceptance and were omitted. Because it is not possible to cover all input-output cases in trained models, extrapolation limits are required (Hooshyaripor et al., 2014). The best copula model can be selected by coarse-grid model selection with supposedly known marginal parameters, in which 15 families of copulas were divided into 4 categories, and by selection with uncertain marginal parameters (Parent et al., 2013).
A copula is a tool for modelling a multivariate distribution in which the marginal distributions are the inputs; it couples the multivariate distribution function to the corresponding marginal distributions (Poulin et al., 2007; Salvadori et al., 2007). For the monsoon rainfall of Assam, Meghalaya, Nagaland, Manipur, Mizoram and Tripura, the Gumbel-Hougaard copula model simulated rainfall well (Ghosh, 2010). Marginal distributions and correlation values are used to simulate the Gaussian model. Four case studies were taken to demonstrate its usefulness for field significance determination, regional risk analysis, frequency analysis and design hydrograph derivation with QdF models (Renard et al., 2007). Copulas are a very good tool for modelling multivariate data, and they are also very useful in financial economics and in the analysis of multivariate survival data. Copula models are very useful for Monte Carlo simulation of dependent variables. They capture the dependence structure of the data set and describe the dependence of extreme outcomes accurately (Muhaisen et al., 2006). With the introduction of copulas, multivariate probability distributions with arbitrary marginals can be constructed in a flexible manner (Wang et al., 2001). A major issue with copulas is compatibility across dimensions, although they have been successfully tested and applied to several hydrological problems (Kao and Govindaraju, 2008). Applying copulas to engineering problems needs minimal to moderate computational effort, and the accuracy of the output is satisfactory (Kao et al., 2012). With a two-copula approach, the spatial dependence of rainfall in sub-basins decreases by up to 18 %. Modelling the spatial dependence of rainfall with copulas could therefore be recommended to decrease the runoff prediction error (Razmkhah, 2016).
The aim of this paper is to generate outflow discharge data at Farakka barrage using copulas. In this study, the Normal, t, Frank, Clayton, Gumbel-Hougaard (GH) and Ali-Mikhail-Haq (AMH) copulas are used, and the best copula for generating discharge data is selected based on the copula parameters, mean square error (MSE), Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC).
An ARIMA model was developed to forecast monthly inflow discharge in a reservoir system (Mohan et al., 1955). The criteria for model selection are the residual variance (Katz et al., 1981), the Akaike information criterion (Akaike, 1974) and the posterior probability criterion (Kashyap, 1977).

Copulas used for study
Copulas are an alternative method for dealing with multivariate extremes and have become very popular in recent times. Consider a pair of random variables U and V, with distribution functions F(u) = P[U ≤ u] and G(v) = P[V ≤ v], respectively, and a joint distribution function H(u, v) = P[U ≤ u, V ≤ v]. A copula couples multivariate distribution functions to their corresponding marginal distribution functions (Poulin et al., 2007; Salvadori et al., 2007).
Following the definition given by Sklar (1959), if F is a p-dimensional distribution function, then F can be written as:
$$F(x_1, \ldots, x_p) = C\big(F_1(x_1), \ldots, F_p(x_p)\big)$$
where F_1, ..., F_p are the marginal distribution functions. If F_1, ..., F_p are continuous, then the copula C is unique and has the representation (Poulin et al., 2007):
$$C(u_1, \ldots, u_p) = F\big(F_1^{-1}(u_1), \ldots, F_p^{-1}(u_p)\big), \quad 0 \le u_1, \ldots, u_p \le 1$$
For two random variables U and V with CDFs F_u(u) and F_v(v), respectively, let X = F_u(u) and Y = F_v(v), where X and Y are uniformly distributed random variables with values x and y. The copulas, their equations and their generating functions are listed in Table 1.
Table 1. List of copulas with their equations, generating functions and relation with τ.

S. No. | Copula | Equation | Generating function | Relation with τ
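As an illustration of one entry of Table 1, the following minimal sketch evaluates the Frank copula and its relation with Kendall's τ through the first-order Debye function. The formulas used are the standard textbook forms for the Frank copula and are assumed here, since the body of Table 1 is not available in this text; the function names `frank_cdf`, `debye1` and `frank_tau` are hypothetical.

```python
import math


def frank_cdf(u, v, theta):
    """Frank copula C(u, v) for theta != 0 (standard textbook form, assumed)."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    den = math.expm1(-theta)
    return -math.log1p(num / den) / theta


def debye1(theta, steps=2000):
    """First-order Debye function D1(theta) for theta > 0.

    D1(theta) = (1/theta) * integral from 0 to theta of t / (e^t - 1) dt,
    approximated with the midpoint rule (avoids evaluating at t = 0).
    """
    h = theta / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t / math.expm1(t)
    return total * h / theta


def frank_tau(theta):
    """Kendall's tau of the Frank copula: tau = 1 - (4/theta) * (1 - D1(theta))."""
    return 1.0 - 4.0 / theta * (1.0 - debye1(theta))
```

A quick check of the boundary behaviour is `frank_cdf(u, 1.0, theta)`, which should return `u` for any `u` in [0, 1]. In practice τ is estimated from the data and the relation is inverted numerically to obtain θ.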
Where e_cdf = empirical cumulative distribution function and P_cdf = predicted cumulative distribution function.

Akaike Information Criterion (AIC)
For a given data set and a given set of models, AIC measures the relative quality of the statistical models: it estimates the quality of each model relative to the other models (   3). The two-sample K-S test is one of the most useful and general non-parametric methods for comparing two samples, as it is sensitive to differences in both the location and the shape of the empirical cumulative distribution functions of the two samples. In the

Copula parameter estimation
is taken from a non-parametric method (Table 5). The formula for the empirical CDF of the copula is given below. In Figure 10, the blue points show the data points in the calibration and validation stages for the Frank copula.
Where X_i = value of mean monthly discharge, X̄ = long-term average, σ = long-term standard deviation, i = 1 to N, and N is the total number of data points at the monthly step. Normalizing or differencing the data not only makes it stationary but also removes periodicity from the time series, where periodicity can be defined as correlation, i.e. a linear association of the data with its values at some previous lag. Because we are interested only in capturing the unknown information in a process, which is unknown due to the noise or random term (the stochastic factor in the process), the deterministic part (long-term mean, periodicity, seasonality, trend, sudden drop or jump) must be removed from the time series, since these deterministic terms already reflect known information about the process.

Spectral Analysis
The observed time series is analysed in the frequency domain to identify exactly which monthly periodicities are present in the data, which the correlogram only indicates. The frequency-domain analysis assumes that the time series is a random sample of a process over time and is made up of oscillations of all possible frequencies. The time series is approximated by a signal process containing a deterministic term in wave form and a noise or random term; the information extracted from the time series appears as prominent spikes in the variance spectrum plot.
The contributing equations for spectral analysis are given below.
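The spectral computation can be sketched as follows. Because the manuscript's own equations are not reproduced in this text, the sketch assumes the common textbook periodogram form: Fourier coefficients α_k and β_k at angular frequencies ω_k = 2πk/N, and spectral density I(ω_k) = (N/2)(α_k² + β_k²). The function names `line_spectrum` and `dominant_period` are hypothetical.

```python
import math


def line_spectrum(x):
    """Return (angular frequency, spectral density) pairs for k = 1 .. N/2.

    Assumed textbook form: alpha_k and beta_k are the cosine and sine
    Fourier coefficients of the mean-removed series.
    """
    n = len(x)
    mean = sum(x) / n
    spectrum = []
    for k in range(1, n // 2 + 1):
        omega = 2.0 * math.pi * k / n
        alpha = 2.0 / n * sum((x[t] - mean) * math.cos(omega * t) for t in range(n))
        beta = 2.0 / n * sum((x[t] - mean) * math.sin(omega * t) for t in range(n))
        spectrum.append((omega, n / 2.0 * (alpha ** 2 + beta ** 2)))
    return spectrum


def dominant_period(x):
    """Period (in time steps) at which the line spectrum has its largest spike."""
    omega, _ = max(line_spectrum(x), key=lambda pair: pair[1])
    return 2.0 * math.pi / omega
```

For a monthly series with an annual cycle, `dominant_period` is expected to return a value close to 12, which corresponds to the spike described in the text.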

Line spectrum
A spike in the line spectrum confirms the presence of a periodicity of a particular month in the data (Figure 13 and Table 6). The line spectrum, also known as the variance spectrum, is a plot of spectral density versus angular frequency. The line spectrum plots are drawn using the discharge data and the standardized discharge data.
Where, ω = Angular frequency and I N = Observation numbers,

Model Description
Autoregressive moving average models are developed using a white noise series. In the present study, the information from the observed time series has been captured not only by developing ARMA(p, q) models but also by pure AR(p) and MA(q) models. The block diagrams for the AR(p), MA(q) and ARMA(p, q) processes are shown below. Where L_i is the likelihood value and z represents a vector of the historical series, i.e. parameter vector,

Model Validation
In the present study, the ARMA(2,0) model (Table 9) has been selected as the one-time-step-ahead prediction model by the maximum likelihood criterion. The selected model is validated to examine whether the assumptions used in selecting the model are valid.
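To make the ARMA(2,0) structure concrete, the sketch below fits an AR(2) model, X_t = φ1 X_{t-1} + φ2 X_{t-2} + e_t, to a zero-mean (standardized) series and produces a one-step-ahead forecast. Note that it uses Yule-Walker (moment) estimates as a simple stand-in, not the maximum likelihood estimation used in the study; all function names are hypothetical.

```python
import random


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return ck / c0


def fit_ar2(x):
    """Yule-Walker estimates of (phi1, phi2); a stand-in for ML estimation."""
    r1, r2 = autocorr(x, 1), autocorr(x, 2)
    phi1 = r1 * (1.0 - r2) / (1.0 - r1 * r1)
    phi2 = (r2 - r1 * r1) / (1.0 - r1 * r1)
    return phi1, phi2


def forecast_one_step(x, phi1, phi2):
    """One-step-ahead forecast for a zero-mean series."""
    return phi1 * x[-1] + phi2 * x[-2]


def simulate_ar2(phi1, phi2, n, seed=0):
    """Generate n values of an AR(2) process driven by standard normal noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(phi1 * x[-1] + phi2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]
```

Fitting a long simulated series, for example `fit_ar2(simulate_ar2(0.5, -0.3, 5000))`, should recover coefficients close to the ones used in the simulation.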

Significance of residual mean
This test examines the validity of the assumption that the error series e(t) has zero mean. A statistic η(e) is defined as:
$$\eta(e) = \frac{N^{1/2}\, \bar{e}}{\hat{\rho}^{1/2}}$$
Where ē = estimated residual mean and ρ̂ = estimated residual variance.
The statistic η(e) is approximately distributed as t(α, N-1), where α is the significance level at which the test is carried out. If η(e) < t(α, N-1) (Table 10), the mean of the residual series is not significantly different from zero and the series passes the test.
Based on all of the above tests, the Frank copula is the best model for generating outflow discharge data at Farakka barrage (Figure 18 and Figure 19).
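The residual-mean test described in this section can be sketched as below. The statistic is taken as η(e) = N^(1/2) ē / ρ^(1/2), which is the usual t-type form and is an assumption here; the critical value t(α, N-1) is passed in by the caller (for example from a t-table), the sample variance uses N-1 in the denominator, and the comparison uses the absolute value of η(e), which is also an assumption. The function names are hypothetical.

```python
import math


def residual_mean_statistic(residuals):
    """eta(e) = sqrt(N) * mean(e) / sqrt(var(e)), with the sample variance (N - 1)."""
    n = len(residuals)
    mean = sum(residuals) / n
    var = sum((e - mean) ** 2 for e in residuals) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(var)


def passes_mean_test(residuals, t_critical):
    """True when |eta(e)| is below the supplied critical value t(alpha, N-1)."""
    return abs(residual_mean_statistic(residuals)) < t_critical
```

A residual series that oscillates symmetrically around zero gives η(e) = 0 and passes, while a series with a clear offset gives a large η(e) and fails.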
For the joint distribution function H(u, v) = P[U ≤ u, V ≤ v], each pair of real numbers (u, v) is associated with three numbers, F(u), G(v) and H(u, v), each of which lies in the interval [0, 1]. In other words, each pair of real numbers (u, v) leads to a point (F(u), G(v)) in the unit square [0, 1] × [0, 1], and this ordered pair in turn corresponds to a number H(u, v) in [0, 1]. This correspondence, which assigns the value of the joint distribution function to each ordered pair of values of the individual distribution functions, is a function; such functions are called copulas. A copula is used as a tool for modelling a multivariate distribution in which the marginal distributions are the input data, which avoids the restrictions mentioned in the previous text. The term copula means something that couples or joins.
Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2018-380. Manuscript under review for journal Hydrol. Earth Syst. Sci. Discussion started: 12 December 2018. © Author(s) 2018. CC BY 4.0 License.
Where θ = parameter controlling the dependence between x and y, and Φ = generator of the copula. The Debye function is expressed as follows:
$$D_n(x) = \frac{n}{x^n} \int_0^x \frac{t^n}{e^t - 1}\, dt$$
The data set at Farakka barrage covers about twenty-five years, from 1949 to 1973, and was taken from the Water Resources Information System of India at the Farakka barrage project, Farakka, West Bengal, India. The observed data set is divided into two parts. The first part, containing twenty years of data (1949 to 1968), is used for parameter estimation, i.e. model calibration; the remaining five years of data (1969 to 1973) are used for model validation and testing. The parameter estimation data are arranged so that the pre-monsoon (December to May) and post-monsoon (June to November) data are separated, giving two series of data for the copulas.
Selection of distribution for Copulas
For modelling the controlled outflow, a bivariate copula is used in this study. Because a copula takes the CDFs of the variables as input, the distribution functions of the two variables must be known. The distribution functions are chosen on the basis of AIC and BIC values, the K-S test and probability plots. The candidate parent distributions tested for the two variables are the normal, lognormal, extreme value type I, generalized extreme value, gamma, Weibull and exponential distributions. We used the data sets for the two seasons: Dec.-May 1949 to Dec.-May 1968 (Figures 1, 3 and 5) and Jun.-Nov. 1949 to Jun.-Nov. 1968 (Figures 2, 4 and 6). In Figures 1-6, the violet colour represents the data set, the red colour the normal distribution, the green colour the lognormal distribution, and so on. Figure 1 represents the cumulative distribution function of the data points along with all distributions in the pre-monsoon season (Dec.-May 1949 to Dec.-May 1968).
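The seasonal arrangement of the calibration data described above can be sketched as follows. The record layout `(year, month, discharge)` and the function name `split_seasons` are hypothetical; only the month groupings (December to May and June to November) and the calibration period (1949 to 1968) come from the text.

```python
PRE_MONSOON_MONTHS = {12, 1, 2, 3, 4, 5}    # December to May
POST_MONSOON_MONTHS = {6, 7, 8, 9, 10, 11}  # June to November


def split_seasons(records, first_year=1949, last_year=1968):
    """Split (year, month, discharge) records of the calibration period
    into the two seasonal series used as inputs to the copulas."""
    pre, post = [], []
    for year, month, discharge in records:
        if not first_year <= year <= last_year:
            continue  # validation years are left out of calibration
        if month in PRE_MONSOON_MONTHS:
            pre.append(discharge)
        elif month in POST_MONSOON_MONTHS:
            post.append(discharge)
    return pre, post
```

Each calendar year contributes six values to each series, so twenty calibration years give two series of 120 values.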

Figure 2 represents the cumulative distribution function of the data points along with all distributions in

Figure 4. PDF of mean monthly discharge

In Figures 7 and 8, the green colour shows the empirical CDF and the red colour shows the generalized extreme value CDF. On the basis of Figure 7, the generalized extreme value distribution gives the best fit to the cumulative distribution function for Jun.-Nov. 1949 to Jun.-Nov. 1968. Similarly, on the basis of Figure 8, the generalized extreme value distribution gives the best fit to the cumulative distribution function for Dec.-May 1949 to Dec.-May 1968.

Figure 9. PDF and CDF for Frank copula.

Figure 10. CDF of observed and empirical copula in the calibration and validation stages.
Known information about the process does not need to be modelled. A monthly discharge time series generally shows periodicity and seasonality, which must be removed before calibrating the ARMA model (i.e. finding the model parameters), because this type of model is developed to capture the unknown information in the noise, i.e. the random process. The observed data set is divided into two parts. The first part, containing twenty years of data (1949 to 1968), is used for parameter estimation, i.e. model calibration; the remaining five years of data (1969 to 1973) are used for model validation and testing. The mean monthly discharge data used for model calibration may have serial correlation, i.e. the value at a particular time step depends on the adjacent previous value, and so on. The time series plot of the observed discharge depicts this serial correlation, seasonality or periodicity, in terms of the information contained in the series, by showing some regularity or similar oscillations of the series.
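The removal of the deterministic seasonal component before ARMA calibration can be sketched as a standardization Z = (X - X̄) / σ. The sketch assumes that the long-term mean and standard deviation are computed separately for each calendar month, which is a common choice but is not confirmed by the text; the function name `standardize_monthly` is hypothetical.

```python
import statistics


def standardize_monthly(values, period=12):
    """Standardize a chronological monthly series month by month.

    values[i] belongs to calendar position i % period; each value is
    centred on the long-term mean of its month and scaled by the
    long-term sample standard deviation of that month (assumption).
    """
    means, stdevs = [], []
    for m in range(period):
        month_values = values[m::period]
        means.append(statistics.mean(month_values))
        stdevs.append(statistics.stdev(month_values))
    return [(v - means[i % period]) / stdevs[i % period]
            for i, v in enumerate(values)]
```

After this step each calendar month has zero mean and unit variance, so the remaining structure is what the ARMA model is asked to capture.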
X_t = observed rainfall data, t = time step in months, P = periodicity in the data, X̄ = mean of the series (average monthly rainfall), α_k = cosine wave form and β_k = sine wave form of the time series, and M = maximum lag, typically taken as 0.25N. The values of α_k and β_k in equation 7 are valid up to k = N/2.

Figure 13. Plot of spectral density versus angular frequency.

Figure 17 describes the dependence structure for 1000 generated samples, i.e. Copula is a statistical

Figure 20 is a time series of discharge data; the blue colour represents observed data from Jan. 1968

Figure 18. Comparison of observed and predicted data for the Frank copula model.
(Table 2). MSE represents the risk function corresponding to the expected value of the squared error loss (quadratic loss). Differences in MSE arise because of randomness. The model with the lowest AIC value is preferred.

The lowest value of BIC is preferred (Table 2). BIC is based mainly on the likelihood function and is closely related to the Akaike information criterion (AIC).
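The three selection measures can be sketched as below, using the usual definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, and MSE as the mean squared difference between the empirical and predicted CDFs. For brevity only two of the candidate distributions (normal and exponential, both with closed-form ML estimates) are compared; the empirical CDF uses the plotting position i/(n+1), which is an assumption, and all function names are hypothetical.

```python
import math


def aic(log_lik, k):
    """Akaike Information Criterion for k fitted parameters."""
    return 2.0 * k - 2.0 * log_lik


def bic(log_lik, k, n):
    """Bayesian Information Criterion for k parameters and n observations."""
    return k * math.log(n) - 2.0 * log_lik


def mse(e_cdf, p_cdf):
    """Mean squared error between empirical and predicted CDF values."""
    return sum((e - p) ** 2 for e, p in zip(e_cdf, p_cdf)) / len(e_cdf)


def compare_distributions(sample):
    """Return {name: (MSE, AIC, BIC)} for ML-fitted normal and exponential."""
    x = sorted(sample)
    n = len(x)
    e_cdf = [(i + 1) / (n + 1) for i in range(n)]  # assumed plotting position

    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n  # ML variance estimate
    ll_norm = -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)
    cdf_norm = [0.5 * (1.0 + math.erf((v - mu) / math.sqrt(2.0 * var))) for v in x]

    rate = 1.0 / mu  # ML rate of the exponential distribution
    ll_exp = n * math.log(rate) - rate * sum(x)
    cdf_exp = [1.0 - math.exp(-rate * v) for v in x]

    return {
        "normal": (mse(e_cdf, cdf_norm), aic(ll_norm, 2), bic(ll_norm, 2, n)),
        "exponential": (mse(e_cdf, cdf_exp), aic(ll_exp, 1), bic(ll_exp, 1, n)),
    }
```

The distribution with the smallest MSE, AIC and BIC would be retained as the marginal; the same comparison extends to the other candidate distributions once their likelihoods are available.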

Table 4. Copulas and their parameters.
For the best copula model, the MLE value should be high and the MSE, AIC and BIC should be at a minimum. From the above data, the Frank copula is the best model for predicting the data (Table 4). Figure 9 shows the variation in probability density from green to red, with green having the lowest probability density and red the highest. It also presents the probability density function and cumulative distribution function for the Frank copula, which is the best for predicting the discharge data.
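Once the Frank copula is selected, dependent pairs (u, v) on the unit square can be generated by conditional inversion, as sketched below. This is a standard sampling technique for the Frank copula and is shown as an assumption, not as the procedure used in the study; the function names and the example value of θ are hypothetical. The generated pairs would then be mapped to discharge values through the inverse of the fitted marginal distributions.

```python
import math
import random


def sample_frank(theta, n, seed=0):
    """Draw n pairs (u, v) from a Frank copula by conditional inversion."""
    rng = random.Random(seed)
    g = math.expm1(-theta)  # e^(-theta) - 1
    pairs = []
    for _ in range(n):
        u = rng.random()
        w = rng.random()  # value of the conditional CDF of v given u
        a = math.exp(-theta * u)
        v = -math.log1p(w * g / (w + (1.0 - w) * a)) / theta
        pairs.append((u, v))
    return pairs


def kendall_tau(pairs):
    """Sample Kendall's tau from concordant and discordant pair counts."""
    n = len(pairs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            score += (s > 0) - (s < 0)
    return 2.0 * score / (n * (n - 1))
```

Comparing `kendall_tau(sample_frank(theta, n))` with the τ estimated from the observed data gives a simple check that the generated samples reproduce the observed dependence.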