A Robust calibration/validation protocol of a hydrological model using hidden Markov states

. The impacts of climate and land-use changes make the stationary assumption in hydrology obsolete. Moreover, there is still considerable uncertainty regarding the future evolution of the Earth’s climate and the extent of the alteration of ﬂow regimes. In that context, it is crucial to assess the performance of a hydrologic model over a wide range of climates and their corresponding hydrologic conditions. We propose a calibration/validation protocol based on the differential split-sample test and numerous, contrasted, climate sequences identiﬁed through a Hidden Markov Model (HMM) classiﬁcation. The proposed 5 protocol is tested on the Senegal River in West Africa. Results show that when the time series of river discharges does not exhibit a clear climate trend, or when it has multiple change points, classical rupture tests are useless and HMM classiﬁcation is a viable alternative as long as the climate sub-sequences are long enough.

section 5, the results are described and discussed. Finally, concluding remarks are given.

2 The Senegal River Basin and its sub-basins
The Senegal River drains a basin shared by four countries in West Africa : Guinea, Mali, Mauritania, and Senegal. There are three main tributaries: (i) the Bafing River contributing to ∼ 50% of the Senegal flows, (2) the Bakoye River (∼ 15%), and (iii) the Faleme River (35%). Flowing northward on 500 km, the Bafing River collects precipitation on the Fouta Djallon, a high Sub-basins River Area ( Indicative isohyets ranging are extracted from Faye et al. (2015).
perform well under contrasted hydrologic conditions.
To take advantage of the hydroclimatic specificities of the SRB and its heterogeneity, we have divided the SRB into three sub-basins (Figure 1.b,c,d and Table 1). This allows us to demonstrate the potential of the proposed protocol based on an HMM classification on basins with contrasting hydrologic characteristics. Sub-basins have been delimited using the GRASS-3.4 soft- 80 ware, and the Shuttle Radar Topography Mission (SRTM) 1arc sec elevation data set.
GR2M has only two parameters: X1 and X2 controlling the production and the transfer functions respectively ( Figure 2). We use the GR2M version included in the environment "airGR", developed by Coron et al. (2017). GR2M calibration/validation 90 phase requires three time-series: (i) a time series of monthly precipitations (P) in the basin, (ii) a time series of monthly potential evapotranspirations (PET), and (iii) a time series of monthly river discharges (q) at the outlet.

The calibration/validation protocol
The calibration/validation protocol follows 3 steps: (1) selecting the inputs, (2) choosing the adequate objective function, and

95
(3) identifying the climate subsequences. The optimization algorithm used for the calibration phase comes from Michel (1991).
This paper's contribution lies in step 3 and how HMM can handle complex hydrologic sequences to ultimately assess the ro- bustness of a calibrated hydrological model.

Hydrological data
100 Some authors (Paturel et al., 1995;Hossain et al., 2004;Huard and Mailhot, 2006;Kavetski et al., 2006;Huard and Mailhot, 2008) have pointed out that selecting the most accurate hydrological and meteorological inputs can significantly reduce the Bayesian error during the calibration/validation of a hydrological model. Based on a comparison with meteorological observations compiled by SIEREM, and details given by Bader et al. (2014) (Harris et al., 2020), and covers a period from 1901 to 2018; (3) Monthly river discharge at sub-basin outlets are naturalized flows extracted from Bader et al. (2014) for the 1903-2012 period. Based on the above datasets, we will analyse the period 1940-01 / 1998-12 (59 years), which is denoted by "the full historical record" (T 1940−1998 ) in the remaining of this paper.

Choosing adequate objective functions
Selecting an objective function to calibrate a conceptual hydrological model is one of the main concerns of the hydrological community (Garcia et al., 2017;Krause et al., 2005;Madsen, 2003). Here, we have selected two objective functions:(1) the Nash Sutcliffe efficiency (NSE) (Nash and Sutcliffe, 1970), and the (2) Kling-Gupta Efficiency criterion (KGE) (Gupta et al., 2009). The former is a popular criterion and since it mainly focuses on high flows, it is particularly relevant for rivers where 115 much of the annual discharge is generated during the high flow season, which is the case in the SRB. The latter allows for a multi-objective calibration that considers more components than just the errors; that is, correlation, bias and variability.
Mathematically, the NSE and KGE formulations can be written as: with q obs is the observed flow at time t, q sim is the simulated flow at time t, µ obs the mean of observed flows; β the ratio between the mean simulated flow and the mean observed flow value β = µ sim /µ obs ; α the ratio between the standard deviation of simulated flows and the standard deviation of observed flows α = σ sim /σ obs ; and

Identifying the climate subsequences
The calibration/validation protocol relies on the differential split-sample test proposed by Klemes (1986). The main idea is to split a period into a number of sub-periods with different hydro-climatic features. This amounts to detecting shifts in climate regimes.
To achieve this, we use two methods: 130 1. In the first method, we apply the non-parametric trend Pettitt test (section 4.3.1) to divide T into two climate subsequences: (1) a single dry sub-sequence noted T P ettitt.dry , and (2) a single wet sub-sequence noted T P ettitt.wet .
2. The second method relies on a HMM to river discharge data according to a fixed number of climate states. We apply two versions of the HMM classification: -The 2-states HMM classification, in which each year is labelled as "dry" or "wet". Thus, T is divided into numerous  For an given variable (q, referring to inflows for example) , the Pettitt test is defined as follow (Pettitt, 1979): Since they are considered as an integrative signal of the whole basin hydro-climatic conditions, Pettitt test has been carried out on the mean annual flows. K T gives the year of the change-point if the test is significant (p ≤ 0, 05) (Pettitt, 1979).

The Hidden Markov Model:
Hidden Markov Model is a class of probabilistic model that can be used to label the observations (Rabiner, 1989). The motiva-150 tion for adopting this type of model in hydrology is that the climate regime can be represented by a state variable that can take only a limited number of values (e.g. dry/wet for 2 states; dry/normal/wet for 3 states). In other words, in parallel to the time series of historical river discharges, there exists another time series with discrete climate states. Denote {q 1 , q 2 , ..., q T } the time series of annual flows and {Φ 1 , Φ 2 , ..., Φ T } the time series of climate states which can only take N possible values ( Figure 3).

155
The state variable is unobserved and is accordingly referred to as a hidden variable. The hidden climate state Φ t is modelled as a N state Markov chain (i.e the probability of a particular state depends only on the previous state) described by a transition probability matrix M with elements M ij : 160 With t = 2, ..., T i, j = 1, ..., N where M ij is the conditional probability of transitioning from a hidden climate state i to a hidden climate state j.

165
The observed variable q t is assumed to have been drawn from a probability distribution whose parameters are conditional upon the distinct state at time t such that, when Φ t is known, the distribution of q t depends only on the current state Φ t and not on previous states or observations.
Fitting a HMM to the observed sequence (here the time series of annual flows), requires evaluating the likelihood of ob-175 serving that sequence, as calculated under a N-state HMM (see Appendix A for more details). In this study, we use the Expectation-Maximization (EM) algorithm, which is an iterative method for finding the maximum-likelihood estimate of the parameters of an underlying distribution when some of the data are missing. In the context of HMM, the EM algorithm is known as the Baum-Welch algorithm (Welch, 2003) and the hidden climate states are treated as missing data (Bilmes, 1998;Zucchini et al., 2017). The EM algorithm consists of two main phases: an expectation phase called "E step", followed by a maximization phase called "M step". Given the current estimate of the HMM parameters θ, the following steps are repeated until acceptable convergence is achieved: The "E step" phase of the algorithm computes the expected value of unobserved data (i.e hidden climate states) using the current estimate of the parameters and the observed data. The "M step" phase of the algorithm then provides a 185 new estimate of the parameters by using the data from the "E step" phase as if they were actually measured data. These parameters are then used to calculate the distribution of unobserved data in the next "E step" phase of the algorithm. The resulting values of θ is then the stationary point of the likelihood of the observed data (Please refer to Appendix B for more details).
Given the observation sequence, we want to determine the sequence of hidden climate states {Φ 1 , Φ 2 , ..., Φ T } that has most 190 likely (under the fitted HMM) given rise to the time series of annual river discharges. In the literature, this is known as the decoding procedure. In this study we use the Viterbi algorithm (Viterbi, 1967) to unfold the sequence of hidden climate states (called the Viterbi path). This, in turn, enables us to divide the whole period into numerous climate sub-sequences.

Application of the calibration/validation protocol 195
Recall that the identification of change-points is done on the time series of annual flows while the hydrological model simulates monthly river discharges.
When applying the calibration/validation protocol described above, seven cases arise as shown in Figure 4. Indeed: -If relevant, the Pettitt test offers two calibration/ validation possibilities: calibration on T pettitt.dry and validation on 200 T pettitt.wet , and vice versa. In addition, even though the KGE is based on a decomposition of the NSE, the corresponding scores cannot be directly compared. Therefore, we will discuss the results obtained with NSE and KGE separately. During all calibration phases, the  first year is considered as warming-up period and not considered.  Table 2 and Figure 5a., while the corresponding parameters are given in Table 2  For the three sub-basins, the Pettitt test is significant and shows a rupture in 1970 or 1971 (Figure 5a. red vertical line). The 2-state HMM classification provides similar results with nearly aligned climate sub-sequences for all sub-basins. This is also true for the 3-states HMM classification.

225
With the 2-states or 3-states HMM classification, the states are clearly distinct, which enables us to divide the time series into numerous climate sub-sequences. The values close to one on the diagonal indicate that when the climate is in a particular state, it will likely remain in that state in the next time period (year). With a 2-states-HMM classification (Figure 5a. blue line), the dry is dominant in all sub-basins, whereas with a 3-states-HMM classification (Figure 5b. orange line), the wet state is dominant in both Daka Saidou and Oualia sub-basins. This is due to the fact that some years that were labelled as "dry" 230 with the 2-states HMM classification are now recognized as "normal" when using the 3-states HMM classification. Similarly, several years move from "wet" to "normal" with the 3-states HMM classification. However, since the number of years that are switching from dry to normal is larger than those that are switching from wet to normal, the result is a dominant wet state.
We note that Pettitt change-point is aligned with the 2 or 3-states-HMM transitions in 1970 (Daka Saidou) or in 1971 (Oualia 235 and Bakel). The length of climate sub-sequences are given in Table 3.  The full historical record (T 1940−1998   Depending on the length of period T, the climate sub-sequences can be short. This is the case with T 1945−1971 and T 1972−1998 , for which climate sub-sequences provided by the 3-states HMM classification could reach a length of 5 years. Thus, the ques-250 tion of the minimum of years required to ensure a reliable calibration or validation raises one more time. No consensus has been reached by the hydrologist community at this point, but a number from two to eight years could be enough depending on the "hydrological events" included (Razavi and Tolson, 2013;Juston et al., 2013;Singh and Bárdossy, 2012).
In addition, we want to underline the following two points: (1) According to our protocol, when a model is calibrated on a 255 specific climatic state, it will not be evaluated (validated) on this same state. To overcome this, the "split-sample test" of Klemes (1986) could be combined with the differential split-sample test, but a longer record could be required.
(2) Please recall that the principle of the differential split-sample test could be applied to detect complex climate sequences and to detect massive conversions in land uses when applied on flows.

Towards an enhancement of calibration and validations scores with HMM classifications?
For Daka Saidou, all seven cases have high scores (≥0.9), indicating that the model is robust. However, for Oualia and Bakel, calibration/validation scores are more scattered (Figure 5b.). scores. This is due to the fact that Daka-Saidou is a small upstream basin with relatively homogeneous precipitation and evapotranspiration (Table 1), whereas Oualia and Bakel are larger and heterogeneous basins.

275
In this article, we have shown how an HMM can deal with complex climates sequences, and how the resulting classification can be used to develop a robust calibration/validation protocol. The protocol has been implemented in the Senegal River basin using the GR2M model and the historical flow from 1940-1998.
The main concluding remarks are:

280
-When records display a single point change, a classical rupture trend (as Pettitt test) remains an adequate tool to divide the records into two climate sub-sequences.
-If the records contain multiple change points, HMM classifications are a good alternative to divide the records into several climate sub-sequences. However, records must be long enough (typically 20-25 years for a 2-states HMM classification, and 30-35 years for a 3-states HMM classification) to (i) have a sufficient number of usual and unusual hydrological 285 events (as mentioned by Singh and Bárdossy (2012)), and (ii) to have a minimum number of years for each climate state.
-Regardless of the division method used, the range of climate conditions over which the hydrological model can perform depends on the intrinsic variability of the series used during the calibration/validation phase.
-HMM classifications open up the range of possibilities for calibrate/validate a hydrological model, which can lead to an enhancement of the criterion function (but not necessarily).

290
Assuming the data belonging to each hidden state are characterized by a specific Gaussian probability distribution, the two terms on the right-hand side are: The complete data likelihood function ζ(θ|Z) can be calculated as: For a HMM which has the initial distribution δ and transition probability matrix M for the Markov chain, let us define the probability mass function of Q if the Markov chain is in state i at time t as: With i = 1, 2, ...N

305
The general form of likelihood function is then given by (Zucchini et al., 2017): where Γ(q) is defined as the diagonal matrix with i the diagonal element p i (q) and 1 is N dimensional vector of 1.
In order to set out the likelihood computation in the form of Baum-Welch algorithm (Welch, 2003), which involves the forward α(t) and backward β(t) probabilities, we define α(t) and β(t) as: and respectively. More specifically, α i (t) is the probability of observing the partial sequence q 1 , q 2 , ..., q t and ending up in state i at time t, and β i (t) is the probability of observing the remaining sequence. Numerical calculation of α i (t) and β i (t) is not trivial (Akintug and Rasmussen, 2005). Here we use the method suggested by Durbin et al. (1998) for scaling forward and backward probabilities to overcome this problem. Now let us define u j (t) and v jk (t) as (Zucchini et al., 2017): Where M jk is the probability of transition from hidden climate state j to climate state k, and ζ is the likelihood function.
With EM algorithm, we aim to maximize the log-likelihood of the parameters of interest θ, based on complete data (i.e. both the observed data and the hidden climate states). Now let us represent the sequence of climate states (missing data) by the Markov chain by the zero-one random variables. The complete data log-likelihood can be formulated as: where u j (t) = 1 if and only if Φ t = j(t = 1, 2, ..., T ), and transition probability v jk (t) = 1 if and only if Φ t−1 = j and Φ t = k(t = 2, 3, ..., T ), N is the number of hidden climate states,δ j is the initial transition of Markov chain, and p j (.) is the probability mass function if the Markov chain is in state j at time t. Maximization of the complete data log-likelihood function is performed with the EM algorithm through an iterative process presented in Figure B1. rep., Université Paris-Diderot, Paris, https://tel.archives-ouvertes.fr/tel-00920438/, 2013.