Articles | Volume 29, issue 2
https://doi.org/10.5194/hess-29-335-2025
Research article | 20 Jan 2025

State updating of the Xin'anjiang model: joint assimilating streamflow and multi-source soil moisture data via the asynchronous ensemble Kalman filter with enhanced error models

Junfu Gong, Xingwen Liu, Cheng Yao, Zhijia Li, Albrecht H. Weerts, Qiaoling Li, Satish Bastola, Yingchun Huang, and Junzeng Xu
Abstract

Assimilating either soil moisture or streamflow individually has been well demonstrated to enhance the simulation performance of hydrological models. However, the runoff routing process may introduce a lag between soil moisture and outlet discharge, presenting challenges in simultaneously assimilating the two types of observations into a hydrological model. The asynchronous ensemble Kalman filter (AEnKF), an adaptation of the ensemble Kalman filter (EnKF), is capable of utilizing observations from both the assimilation moment and the preceding periods, thus holding potential to address this challenge. Our study first merges soil moisture data collected from field soil moisture monitoring sites with China Meteorological Administration Land Data Assimilation System (CLDAS) soil moisture data. We then employ the AEnKF, equipped with improved error models, to assimilate both the observed outlet discharge and the merged soil moisture data into the Xin'anjiang model. This process updates the state variables of the model, aiming to enhance real-time flood forecasting performance. Tests involving both synthetic and real-world cases demonstrate that assimilating these two types of observations simultaneously substantially reduces the accumulation of past errors in the initial conditions at the start of the forecast, thereby improving the accuracy of flood forecasting. Moreover, the AEnKF with the enhanced error model consistently yields greater forecasting accuracy across various lead times compared to the standard EnKF.

1 Introduction

Floods, as one of the most frequent natural disasters, significantly affect infrastructure and agricultural yields and may even directly endanger the lives of local residents (Johnson et al., 2020). The destructiveness of flash floods is particularly notable. In recent decades, flash floods triggered by localized torrential rains have frequently resulted in significant human casualties (Pilon, 2002). Short-term flood forecasting, a vital non-structural approach to flood mitigation, plays a crucial role in facilitating emergency responses in flood-prone regions (Craninx et al., 2021). Hydrological models are instrumental in flood forecasting, utilizing mathematical and physical representations to analyze the various components of the catchment hydrological processes, including precipitation, evaporation, and runoff, as well as their interplay. This understanding aids in comprehending catchment hydrological characteristics and trends, which is crucial for simulating and forecasting hydrological processes. The Xin'anjiang model, extensively applied in operational short-term flood forecasting in China, stands as one of the well-known semi-distributed hydrological models. Its broad applicability, especially in the humid and semi-humid climate zones of the Yangtze River basin, has been substantiated by extensive studies (e.g., Fang et al., 2017; Gong et al., 2021; Zang et al., 2021).

However, in hydrological simulations, multiple sources of uncertainty, such as uncertainties in model inputs, structure, and parameters, can significantly affect the accuracy of the simulations (Beven, 1993; Ajami et al., 2007). In short-term flood forecasting, an additional process, often referred to as the real-time correction process, is typically employed to mitigate these uncertainties. A notable strategy in real-time correction involves the recursive adjustment of the hydrological model's state variables based on available real-time observational data. This adjustment helps reduce the error accumulation in the initial conditions of the hydrological model, a factor that has been identified as a primary source of uncertainty at the start of flood forecasting (Shukla and Lettenmaier, 2011; Yossef et al., 2013; Thiboult et al., 2016). This process is sometimes termed hydrological data assimilation in the literature (e.g., Clark et al., 2008). The ensemble Kalman filter (EnKF) (Evensen, 2003), which combines ensemble forecasting concepts with the Kalman filter and employs Monte Carlo methods to predict error statistics, effectively addresses the inability of the Kalman filter to handle non-linear systems. Its robustness, flexibility, and ease of use have led to its widespread application in hydrological data assimilation (Clark et al., 2008; Liu et al., 2012; Rakovec et al., 2012; Piazzi et al., 2021).

Data assimilation typically falls into two categories: synchronous and asynchronous methods. Synchronous methods depend solely on observational data at a specific update moment, while asynchronous methods broaden this scope by incorporating data over a time frame, including both current and preceding time steps (Sakov and Bocquet, 2018). This distinction is particularly crucial in sequential assimilation, where commonly employed sequential filters like the EnKF utilize a synchronous strategy. Conversely, the asynchronous strategy is predominantly used in smoothers, such as the ensemble Kalman smoother (EnKS) (Evensen and van Leeuwen, 2000). While the EnKS augments reanalysis by integrating future observational data backwards in time, its forecasting efficacy (including real-time forecasting) aligns with that of the EnKF (Evensen, 2009). The intrinsic difference between smoothers and filters is their focus: smoothers assimilate future observational data, while filters process past observational data (Rakovec et al., 2015). Hence, in hydrological data assimilation with a focus on forecasting, filters are generally the preferred choice over smoothers.

In recent years, researchers have made strides in integrating asynchronous strategies into filters for sequential assimilation. This is notably evident in the development of the four-dimensional ensemble Kalman filter (4D-EnKF) (Hunt et al., 2004) and the four-dimensional local ensemble transform Kalman filter (4D-LETKF) (Hunt et al., 2007). The 4D-EnKF stands out for its ability to synchronize the timing of observations with lower computational demands, which is particularly effective in linear dynamics. In contrast, the 4D-LETKF builds upon the 4D-EnKF by prioritizing spatial localization and refining the handling of non-linear observation operators. This enhancement renders it more effective and versatile in managing high-dimensional, chaotic systems, especially in meteorology and climatology. Building on this, Sakov et al. (2010) and Sakov and Bocquet (2018) introduced the asynchronous ensemble Kalman filter (AEnKF). Remarkably, the AEnKF and the 4D-LETKF are essentially equivalent (Sakov et al., 2010), both employing ensemble-based methods to update model states based on observational data. The 4D-LETKF processes asynchronous observations by amalgamating them and updating the state via ensemble transform matrices. Conversely, the AEnKF accomplishes this by advancing corrections along the forecast system trajectory, utilizing ensemble observations from the observation time, thereby efficiently assimilating both past and future data. The AEnKF is designed to be computationally efficient, which is noted for its relative simplicity in implementation compared to the 4D-LETKF. It modifies the standard EnKF using ensemble observations from the time of observations, a straightforward change that does not significantly complicate the assimilation process. The AEnKF technique was first applied by Krymskaya (2013) to the problem of history matching in reservoir engineering. 
The study revealed that the AEnKF outperforms the EnKF in parameter estimation and utilizes the data with similar efficiency. The AEnKF is recognized for its simplicity and high computational efficiency, offering significant potential in short-term flood forecasting applications. Despite its promise, the scope of research in this area is relatively limited. Among the few studies conducted, Mazzoleni et al. (2018) evaluated AEnKF assimilation in simplified flow routing models, highlighting its exceptional performance in both lumped and distributed flow routing. Tao et al. (2016) summarized the hydrological forecasting test conducted during the 2014 Intense Observing Period of the Integrated Precipitation and Hydrology Experiment (IPHEx-IOP) campaign, proposing a framework for improving flood prediction in mountainous regions through the assimilation of discharge data using the AEnKF method, with a focus on enhancing forecast accuracy and reducing uncertainty. In addition, Rakovec et al. (2015) and our earlier study (Gong et al., 2024) applied the AEnKF to the distributed HBV-96 model and the Xin'anjiang model, respectively. These studies examined the effectiveness of the AEnKF in real-time correction through the assimilation of observed discharge in distributed and semi-distributed hydrological models, revealing that the AEnKF outperforms the standard EnKF. However, these studies assimilate only a single type of observational data (e.g., observed discharge) using the AEnKF method, which does not take full advantage of the AEnKF.

In the context of real-time correction processes employing the AEnKF, the types of observations we assimilate constitute another key factor influencing the effectiveness. Popular observation types that are currently assimilated include discharge, soil moisture, and snow data (Gong et al., 2023). In rainfall–runoff modeling, soil moisture plays a pivotal role in driving the runoff generation process (Massari et al., 2014). A wealth of research has demonstrated that updating hydrological model states through the assimilation of soil moisture significantly enhances the precision of runoff simulations and forecasts (e.g., Wanders et al., 2014; Alvarez-Garreton et al., 2015; Chao et al., 2022). These studies typically rely on a single type of soil moisture dataset. One of the highlights of our study is that it simultaneously considers the advantages of site observation data and soil reanalysis datasets, enhancing both the timeliness and the spatial accuracy of the soil moisture data. Specifically, in the real-time correction process of flood forecasting, there is a high demand for the timeliness of observational data to swiftly respond to flood events. Satellite remote sensing data and reanalysis products often suffer from delays in data release or lengthy observational intervals. In contrast, ground-based soil moisture measurements offer high accuracy and timeliness but are limited to point-scale data, failing to capture the spatial distribution of soil moisture. To overcome this limitation, the weighted k-nearest-neighbor (WKNN) algorithm (Pedregosa et al., 2011; Jung and Lee, 2017) is employed to fuse ground soil moisture measurements with reanalysis soil moisture data. This approach involves establishing a regression relationship between historical ground and reanalysis data, subsequently generating real-time, spatially distributed fusion soil moisture data from current ground observations. 
On the other hand, discharge observations, due to their direct relevance to flow or water level predictions crucial in flood forecasting, are another valuable choice for assimilation. They provide a comprehensive view of the hydrological conditions of a catchment. Discharge measurements are often more accessible and offer more timely data than soil moisture readings, generally yielding greater reliability (Li et al., 2013). Numerous studies have concentrated on assimilating observed discharge data to enhance flood forecasting, showcasing the substantial potential and impressive effectiveness of this strategy across various regions (e.g., Clark et al., 2008; Sun et al., 2020; Gong et al., 2023). Given that assimilating soil moisture or discharge alone can provide acceptable results, exploring the simultaneous assimilation of both observation types warrants consideration. Previous studies have highlighted the benefits of concurrently assimilating various observation types. Techniques such as the EnKF (Meng et al., 2017), variational assimilation (VAR) (Lee et al., 2011), and tempered particle filter (TPF) (García-Alén et al., 2023) have consistently shown that joint assimilation generally surpasses the efficiency of single-type assimilation. Although these findings are encouraging, the advantage of joint assimilation may not always hold. This is partly because each observation type represents a specific hydrological process, with correlations among variables varying across different spatial and temporal scales. For instance, soil moisture immediately responds to rainfall, while streamflow responses are inherently delayed due to the time delay in the routing process (Meng et al., 2017). Such delays can lead to the accumulation of uncertainties in discharge predictions, which is an aspect often overlooked in synchronized assimilation methods. 
Contrarily, the AEnKF method considers all observational data within a specific time window rather than just a single observation at the update time, effectively considering the time delays in routing processes and offering a novel approach for the combined assimilation of diverse observation types. However, to our knowledge, there are no existing studies on the performance of the AEnKF in assimilating multiple types of observational datasets (such as soil moisture and discharge measurements), which could significantly improve the accuracy of short-term flood forecasting.

In AEnKF assimilation, ensemble dispersion is achieved by introducing pre-determined noise (commonly zero-mean Gaussian noise) into model state variables and forcing data. The models governing these perturbations are termed error models, and their associated parameters are known as hyperparameters (Thiboult and Anctil, 2015). Improper handling of these uncertainties can potentially impair the efficacy of ensemble-based Kalman filters (Crow and Van Loon, 2006; Pathiraja et al., 2018). The commonly adopted practice involves setting the hyperparameters of error models based on the empirical knowledge of hydrologists or forecasters (e.g., Weerts and El Serafy, 2006; Clark et al., 2008; Sun et al., 2020). This approach is highly subjective, resulting in forecast results that may significantly differ among practitioners. The maximum a posteriori estimation (MAP) method (Li et al., 2014; Gong et al., 2023) represents a Bayesian inference technique specifically designed for ensemble-based Kalman filters. This method leverages historical observational data to objectively estimate the hyperparameters in error models, thereby substantially mitigating the subjectivity associated with hyperparameter configuration. Notably, the strengths of MAP method, compared to alternatives like the kernel conditional density estimation method (Pathiraja et al., 2018), include its independence from the need for sequential observations to be independent themselves. Furthermore, it enables concurrent estimation of hyperparameters across diverse error models, making it particularly compatible with the error models employed in the AEnKF. Another challenge in AEnKF assimilation is reducing the systematic biases that arise from perturbations. When creating ensemble dispersion using error models, it is implicitly assumed that the introduction of noise will not lead to systematic biases in the model outputs (Ryu et al., 2009). 
Nevertheless, the strong non-linearity of hydrological models and the stringent physical limitations on some state variables mean that even zero-mean Gaussian perturbations may result in systematic biases (Alvarez-Garreton et al., 2015). A case in point is soil moisture, which must stay below saturation levels. During flooding, when soil moisture approaches saturation, perturbing this variable risks breaching these physical boundaries. Subsequent corrections made by the hydrological model to align with saturation levels may introduce truncation errors in the prediction of the background field. To counter this, our study incorporates the bias-corrected Gaussian error model (BGEM) (Ryu et al., 2009), which introduces an unperturbed model run in parallel with the ensemble. This unperturbed model is utilized to correct the biases induced by perturbations. Our prior research (Gong et al., 2024) has shown that the BGEM is effective in alleviating systematic biases caused by random perturbations in soil moisture state variables. However, the performance of the AEnKF with these enhanced error models when assimilating multiple types of observations has yet to be further tested.

This study developed an efficient joint data assimilation framework for real-time correction of short-term flood forecasting based on the AEnKF with improved error models. One of the main highlights of this study is the consideration of the inherent limitations of single-source soil moisture data. By fusing ground-based soil moisture measurements with reanalysis data from the China Meteorological Administration Land Data Assimilation System (CLDAS), the study generates a reliable, real-time spatial distribution dataset of soil moisture that aligns with the 8 h observation intervals of monitoring sites. The second highlight is that the AEnKF with improved error models fully accounts for the time delays in the routing process, enabling effective joint assimilation of soil moisture data and discharge observations. After establishing the appropriate assimilation time window for the AEnKF with improved error models, the study conducted a detailed comparison between the joint assimilation scheme and individual assimilation schemes (including the separate assimilation of soil moisture or discharge observation data) using synthetic and real-world cases. This comparison underscores the superior performance of the joint assimilation framework proposed in this study.

2 Methodology

2.1 Hydrological model

The Xin'anjiang model, conceptualized by Zhao (1992), is a commonly used hydrological model primarily based on a saturation-excess mechanism. Renowned for its straightforward structure and explicit parameter definitions, this model excels in simulating humid catchments, making it a popular tool for flood forecasting in China. To account for spatial variability, the model typically divides a catchment into sub-catchments. These sub-catchments act as computational units for runoff generation and routing.

The Xin'anjiang model demands relatively simple forcing data; the key inputs are the areal mean rainfall depth (P) and pan evaporation (EM) for each sub-catchment. The model typically comprises four main components: evapotranspiration, runoff production, runoff separation, and flow routing, involving the calibration of 16 distinct parameters. The flow chart of the Xin'anjiang model is presented in Fig. 1. Soil evaporation is derived from pan evaporation data using a three-layer soil moisture module. Runoff generation is based on a saturation-excess mechanism, where runoff is produced only when the soil moisture in the unsaturated zone reaches field capacity. The lag-and-route method calculates the outflow from each sub-catchment. Flow routing from the sub-catchment outlets to the total catchment outlet applies the Muskingum method to successive sub-reaches. It is implemented by dividing the channel from each sub-catchment outlet to the total catchment outlet into a number of sub-reaches that depends on the distance between the two outlets. In addition, the catchment inflow to the outlet is directly calculated by the Muskingum method.
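To make the routing step concrete, the following is a minimal sketch of Muskingum routing applied to successive sub-reaches. It uses the textbook Muskingum coefficients; the function names and the parameter names `k` (storage constant), `x` (weighting factor), and `dt` (time step) are illustrative assumptions and are not taken from the model's parameter table.

```python
def muskingum_coefficients(k, x, dt):
    """Textbook Muskingum coefficients (c0, c1, c2); they sum to 1."""
    denom = k - k * x + 0.5 * dt
    c0 = (0.5 * dt - k * x) / denom
    c1 = (0.5 * dt + k * x) / denom
    c2 = (k - k * x - 0.5 * dt) / denom
    return c0, c1, c2


def route_reach(inflow, k, x, dt):
    """Route an inflow hydrograph through a single sub-reach."""
    c0, c1, c2 = muskingum_coefficients(k, x, dt)
    outflow = [inflow[0]]  # assumption: initial outflow equals initial inflow
    for t in range(1, len(inflow)):
        outflow.append(c0 * inflow[t] + c1 * inflow[t - 1] + c2 * outflow[-1])
    return outflow


def route_channel(inflow, n_reaches, k, x, dt):
    """Apply the single-reach routing successively over n_reaches sub-reaches."""
    flow = list(inflow)
    for _ in range(n_reaches):
        flow = route_reach(flow, k, x, dt)
    return flow
```

In this sketch, a sub-catchment farther from the catchment outlet would simply be routed with a larger `n_reaches`, which delays and attenuates its hydrograph more strongly.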

https://hess.copernicus.org/articles/29/335/2025/hess-29-335-2025-f01

Figure 1. Flow chart of the Xin'anjiang model. The variables in the boxes indicate the model state, inputs, and outputs, and the symbols outside the corresponding blocks are model parameters.


Zhao (1992) categorized the parameters of the Xin'anjiang model into sensitive and non-sensitive groups. In real-world cases, non-sensitive parameters are assigned values based on expert judgment, while optimal values for sensitive parameters are derived from historical data using the shuffled complex evolution (SCE-UA) method (Duan et al., 1992). For synthetic cases, however, parameters are taken to have recommended default values. Table 1 summarizes these parameters.

Table 1. Parameters of the Xin'anjiang model.

* Bold and underlined parameters indicate that they are sensitive parameters.


2.2 Asynchronous ensemble Kalman filter

The asynchronous ensemble Kalman filter (AEnKF) is a straightforward enhancement of the ensemble Kalman filter (EnKF) that uses the same assimilation framework. We follow the notation of Ide et al. (1997) and Vetra-Carvalho et al. (2018) as closely as possible, aiming to make our paper accessible and practical for both data assimilation specialists and a broader audience interested in applying these methods. To this end, the dimensions of the state space and observation space are denoted as $N_x$ and $N_y$, respectively. Further, the time index is always denoted in parentheses to the right of the variable, i.e., $(\cdot)(t_i)$. Notably, the observational data are categorized into two types: the observed discharge at the catchment outlet and soil moisture across sub-catchments. During an ensemble run of the dynamic model, the assimilation process has two steps: the soil moisture observations are used to update the soil states, and the discharge observations are used to update the cumulative channel flow. Consequently, the values of $N_y$ can differ between assimilation steps, and the same applies to $N_x$.

2.2.1 Ensemble Kalman filter

At a given time $t_i$, we define the model state vector as $\mathbf{x}(t_i) \in \mathbb{R}^{N_x}$ and the observation vector as $\mathbf{y}(t_i) \in \mathbb{R}^{N_y}$. In the EnKF framework, it is crucial to generate a set of independent model state vectors. These vectors constitute an ensemble matrix, denoted as $\mathbf{X}(t_i) \in \mathbb{R}^{N_x \times N_e}$, where $N_e$ is the total number of ensemble members. The initial $\mathbf{X}(0)$ is obtained by the Monte Carlo method.

The state transfer equation at the forecast step is represented by

$$\mathbf{x}_j^f(t_{i+1}) = \mathcal{M}\left[\mathbf{x}_j^a(t_i), \mathbf{U}(t_i)\right] + \boldsymbol{\eta}(t_i), \tag{1}$$

where $\mathcal{M}[\cdot]: \mathbb{R}^{N_x} \to \mathbb{R}^{N_x}$ signifies the dynamic model, such as the Xin'anjiang model; $\mathbf{U}(t_i)$ represents the forcing data (including rainfall and evaporation); and $\boldsymbol{\eta}(t_i) \in \mathbb{R}^{N_x}$ symbolizes the process or system noise characterized by a mean of zero and covariance matrix $\mathbf{Q}(t_i)$. In addition, the subscript $j$ signifies the ensemble index, ranging from 1 to $N_e$. The forecasted values from the dynamic model are marked with a superscript $f$, while the analysis (updated) values from the filter are denoted by a superscript $a$.

During the analysis step, we create a set of new observation vectors by perturbing the original observation vector y(ti), as described by

$$\mathbf{y}_j^o(t_i) = \mathbf{y}(t_i) + \boldsymbol{\varepsilon}(t_i), \tag{2}$$

where $\mathbf{y}_j^o(t_i) \in \mathbb{R}^{N_y}$ represents the perturbed observation vector for the $j$th ensemble member and $\boldsymbol{\varepsilon}(t_i) \in \mathbb{R}^{N_y}$ is Gaussian noise characterized by covariance matrix $\mathbf{R}(t_i)$. We assume spatial independence of observation errors, thereby designating $\mathbf{R}(t_i)$ as a diagonal matrix. Furthermore, the state update equation is expressed as follows:

$$\mathbf{x}_j^a(t_i) = \mathbf{x}_j^f(t_i) + \mathbf{K}(t_i)\left(\mathbf{y}_j^o(t_i) - \mathcal{H}\left[\mathbf{x}_j^f(t_i)\right]\right), \tag{3}$$

where ℋ[.] is the measurement operator that maps the state space to the observation space, which is also the Xin'anjiang model in this study, and K(ti) is the Kalman gain matrix calculated by

$$\mathbf{K}(t_i) = \mathbf{P}^f(t_i)\mathbf{H}^T\left[\mathbf{H}\mathbf{P}^f(t_i)\mathbf{H}^T + \mathbf{R}(t_i)\right]^{-1}. \tag{4}$$

In scenarios where the state space dimensionality, $N_x$, is substantial, bypassing the direct computation of $\mathbf{P}^f(t_i)$ in favor of calculating $\mathbf{P}^f(t_i)\mathbf{H}^T$ and $\mathbf{H}\mathbf{P}^f(t_i)\mathbf{H}^T$ emerges as a strategy to enhance computational efficiency, as highlighted by Nerger and Hiller (2013).
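The analysis step in Eqs. (2)–(4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `enkf_analysis` and its argument layout are hypothetical, the ensemble mapped to observation space (`HX_f`) is assumed to have been produced by the model beforehand, and the products $\mathbf{P}^f\mathbf{H}^T$ and $\mathbf{H}\mathbf{P}^f\mathbf{H}^T$ are estimated from ensemble anomalies so that $\mathbf{P}^f$ is never formed explicitly.

```python
import numpy as np


def enkf_analysis(X_f, HX_f, y, R, rng):
    """Perturbed-observation EnKF update.

    X_f  : (Nx, Ne) forecast ensemble
    HX_f : (Ny, Ne) forecast ensemble mapped to observation space
    y    : (Ny,)    observation vector
    R    : (Ny, Ny) observation error covariance
    rng  : numpy.random.Generator
    """
    Ne = X_f.shape[1]
    A = X_f - X_f.mean(axis=1, keepdims=True)     # state anomalies
    HA = HX_f - HX_f.mean(axis=1, keepdims=True)  # observation-space anomalies

    PHt = A @ HA.T / (Ne - 1)        # sample estimate of P^f H^T
    S = HA @ HA.T / (Ne - 1) + R     # sample estimate of H P^f H^T + R
    K = np.linalg.solve(S, PHt.T).T  # Kalman gain, Eq. (4); S is symmetric

    # Eq. (2): one perturbed observation vector per ensemble member
    noise = rng.multivariate_normal(np.zeros(y.size), R, size=Ne).T
    Y_o = y[:, None] + noise

    # Eq. (3): update every member
    return X_f + K @ (Y_o - HX_f)
```

Solving the linear system with `np.linalg.solve` instead of inverting the innovation covariance is a common numerical choice and does not change the result of Eq. (4).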

2.2.2 Asynchronous variant

The AEnKF is based on the concept of joint state–observation space, where the ensemble is replaced by a joint ensemble that combines state and observation information. Updating model states involves considering observations from both current and previous time steps, controlled by the assimilation time window, tw. This window defines the duration over which observations are considered for the analysis, for instance, including data from the previous 5 h. Moreover, when assimilating only current observations (tw=0), the AEnKF reverts to the standard EnKF. In the AEnKF, the observation vector is altered to

$$\tilde{\mathbf{y}}(t_i) = \left[\mathbf{y}(t_i)^T, \mathbf{y}(t_{i-1})^T, \ldots, \mathbf{y}(t_{i-\mathrm{tw}})^T\right]^T \in \mathbb{R}^{(\mathrm{tw}+1)N_y}, \tag{5}$$

where ỹ(ti) is the joint observation vector and R̃(ti) denotes the covariance matrix of the associated observation noise, expressed as a diagonal matrix:

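$$\tilde{\mathbf{R}}(t_i) = \begin{bmatrix} \mathbf{R}(t_i) & & & \mathbf{0} \\ & \mathbf{R}(t_{i-1}) & & \\ & & \ddots & \\ \mathbf{0} & & & \mathbf{R}(t_{i-\mathrm{tw}}) \end{bmatrix}. \tag{6}$$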

Similarly, the model prediction vector from the prior tw time steps in the observation space is used to expand the state vector:

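$$\tilde{\mathbf{x}}_j^f(t_i) = \left[\mathbf{x}_j^f(t_{i+1})^T, \mathcal{H}\left[\mathbf{x}_j^f(t_{i-1})\right]^T, \mathcal{H}\left[\mathbf{x}_j^f(t_{i-2})\right]^T, \ldots, \mathcal{H}\left[\mathbf{x}_j^f(t_{i-\mathrm{tw}})\right]^T\right]^T \in \mathbb{R}^{N_x + \mathrm{tw}\,N_y}. \tag{7}$$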

Furthermore, the new state definition introduces an augmented observation operator H̃(ti):

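$$\tilde{\mathbf{H}} = \begin{bmatrix} \mathbf{H} & & & \mathbf{0} \\ & \mathbf{I}_{i} & & \\ & & \ddots & \\ \mathbf{0} & & & \mathbf{I}_{i-\mathrm{tw}} \end{bmatrix}, \tag{8}$$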

where I, with the corresponding subscript, stands for a series of identity elements on the diagonal, matching the dimensions in Eq. (7).

Following these augmented equations for $\tilde{\mathbf{x}}_j^f(t_i)$, $\tilde{\mathbf{y}}(t_i)$, $\tilde{\mathbf{R}}(t_i)$, and $\tilde{\mathbf{H}}$, we can directly apply these augmented variables in the EnKF process (Sect. 2.2.1) to implement AEnKF assimilation. Crucially, in the joint state vector $\tilde{\mathbf{x}}_j^f(t_i)$, model prediction vectors within the observation space, such as $\mathcal{H}[\mathbf{x}_j^f(t_{i-1})]$ and others, are considered diagnostic variables instead of state variables. As a result, they are not updated during the analysis step. Specifically, in Eq. (3), only the first $N_x$ elements of the vector $\tilde{\mathbf{x}}_j(t_i)$ are calculated, while the others are disregarded.
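The augmentation can be illustrated with a short sketch that stacks the stored predictions and observations over the window and then applies the same perturbed-observation update as in Sect. 2.2.1. It is an illustration under assumptions rather than the study's code: the function name `aenkf_analysis` and the list-based window layout are hypothetical, the state ensemble is taken at the update time, and the predicted observations at earlier times are assumed to have been saved while the ensemble was propagated.

```python
import numpy as np


def aenkf_analysis(X_f, HX_window, y_window, R_window, rng):
    """AEnKF update using observations from the current and previous steps.

    X_f       : (Nx, Ne) forecast ensemble at the update time
    HX_window : list of (Ny, Ne) predicted observations, newest first
    y_window  : list of (Ny,) observation vectors, same order
    R_window  : list of (Ny, Ny) observation error covariances, same order
    rng       : numpy.random.Generator
    """
    Nx, Ne = X_f.shape

    # Augmented state: model states plus stored past predictions (diagnostic)
    X_aug = np.vstack([X_f] + HX_window[1:])
    HX_aug = np.vstack(HX_window)      # augmented operator applied to X_aug
    y_aug = np.concatenate(y_window)   # joint observation vector, Eq. (5)

    # Block-diagonal joint observation error covariance, Eq. (6)
    n = y_aug.size
    R_aug = np.zeros((n, n))
    pos = 0
    for R in R_window:
        m = R.shape[0]
        R_aug[pos:pos + m, pos:pos + m] = R
        pos += m

    A = X_aug - X_aug.mean(axis=1, keepdims=True)
    HA = HX_aug - HX_aug.mean(axis=1, keepdims=True)
    PHt = A @ HA.T / (Ne - 1)
    S = HA @ HA.T / (Ne - 1) + R_aug
    K = np.linalg.solve(S, PHt.T).T

    Y_o = y_aug[:, None] + rng.multivariate_normal(np.zeros(n), R_aug, size=Ne).T
    X_a_aug = X_aug + K @ (Y_o - HX_aug)

    # Only the first Nx rows are model states; the rest are discarded
    return X_a_aug[:Nx]
```

With a window of length one (only the current observation), the stacked quantities reduce to the ordinary EnKF inputs, which mirrors the statement that the AEnKF reverts to the EnKF when tw = 0.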

2.3 Error estimation

Both the EnKF and its variant update model states by employing a weighted average of observational data and model forecasts. This process highlights the crucial role of model and observational errors in determining the effectiveness of the assimilation system. In rainfall–runoff modeling in particular, where uncertainties in both model and observations are inherently ambiguous, generalizing these uncertainties is instrumental in acquiring refined approximations of suboptimal model states. A common technique involves adding unbiased noise to observations, model forcing, and model states.

Observations involved in this study include discharge at the catchment outlet and observed soil moisture. We generalize the observational errors as Gaussian perturbations related to the corresponding observed values (Weerts and El Serafy, 2006; Clark et al., 2008; Alvarez-Garreton et al., 2015). Given that rainfall serves as the most critical input information for the hydrological model, we employ log-normal multiplicative perturbations to describe the errors associated with rainfall, thereby representing the uncertainty in model forcing (McMillan et al., 2011; DeChant and Moradkhani, 2012). Moreover, we introduce a first-order autoregressive model to represent the temporal correlation within the observational errors and the forcing errors.
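As an illustration of the forcing error model, the sketch below draws log-normal multiplicative rainfall perturbations whose underlying Gaussian noise follows a first-order autoregressive process. The exact formulation used in the study is given in its Supplement; the parameterization here (`sigma` for the log-scale spread, `rho` for the lag-1 correlation, and the mean-one correction term) is an assumption chosen for clarity.

```python
import math
import random


def ar1_noise(n_steps, rho, rng):
    """Standard-normal AR(1) series with lag-1 correlation rho."""
    z = [rng.gauss(0.0, 1.0)]
    for _ in range(n_steps - 1):
        innovation = rng.gauss(0.0, 1.0)
        z.append(rho * z[-1] + math.sqrt(1.0 - rho ** 2) * innovation)
    return z


def perturb_rainfall(rain, sigma, rho, rng):
    """Multiply rainfall by temporally correlated log-normal factors.

    The term -0.5 * sigma**2 gives each multiplier an expected value of 1,
    so the perturbation does not change the mean rainfall on average.
    """
    z = ar1_noise(len(rain), rho, rng)
    return [p * math.exp(sigma * zt - 0.5 * sigma ** 2) for p, zt in zip(rain, z)]
```

Each ensemble member would receive its own perturbed series, for example `perturb_rainfall(rain, 0.3, 0.8, random.Random(member_id))`, where the numerical values are placeholders and not calibrated hyperparameters.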

In the assimilation of observed discharge at the catchment outlet, the key model state variable updated is the cumulative channel flow. This variable represents the outflow from each sub-catchment on the routing calculation unit (sub-reaches in this study), denoted as QC. Following Li et al. (2014), this state variable is perturbed using a Gaussian function. When assimilating observed soil moisture, the model state variables representing soil moisture need to be updated. Specifically, these are the tension water storage (including upper- and lower-layer tension water) and the free water storage in the Xin'anjiang model. In the Xin'anjiang model, the soil moisture state variables are subject to physical constraints. The free water storage (denoted as S) reflects the soil moisture in the topsoil layer, specifically the humus layer (Yao et al., 2012). Therefore, it is assumed that the free water storage ranges between the saturation moisture content and the field capacity, with its upper limit controlled by the parameter SM and the lower limit set to zero. On the other hand, the tension water storage (denoted as W) represents the soil moisture throughout the entire soil profile, encompassing the whole unsaturated zone (Yao et al., 2012). Consequently, the tension water storage is considered to vary between the field capacity and the wilting point, with its upper limit governed by the parameter WM and the lower limit being zero. WU, WL, and WD represent the upper-, lower-, and deep-layer tension water storage, respectively, with their upper limits controlled by the parameters WUM, WLM, and WDM, where WM=WUM+WLM+WDM. When these variables approach their upper or lower limits, Gaussian perturbations may cause them to violate the physical constraints. If the hydrological model then corrects the violation, a truncation error is introduced in the background field predictions. We introduce the bias-corrected Gaussian error model (BGEM) proposed by Ryu et al. (2009), which is aimed at reducing the biases that emerge from enforcing these physical constraints.
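The truncation bias and its correction can be sketched as follows. The idea, following the description of the BGEM as using an unperturbed run in parallel with the ensemble, is to estimate the bias as the difference between the ensemble mean and the unperturbed value and to subtract it from every member. The function names, the clipping rule, and the numerical values are illustrative assumptions; the study's exact implementation is described in its Supplement.

```python
import random


def perturb_with_bounds(value, sigma, lower, upper, rng):
    """Add zero-mean Gaussian noise, then clip to the physical range."""
    return min(max(value + rng.gauss(0.0, sigma), lower), upper)


def bias_correct(ensemble, unperturbed):
    """Shift all members so that the ensemble mean matches the unperturbed run.

    Assumption: members are not clipped again after the shift, so a few may
    slightly exceed a bound; how to treat them is left open in this sketch.
    """
    bias = sum(ensemble) / len(ensemble) - unperturbed
    return [member - bias for member in ensemble]


# Hypothetical example: tension water storage of 118 mm with WM = 120 mm
rng = random.Random(0)
unperturbed = 118.0
ensemble = [perturb_with_bounds(unperturbed, 5.0, 0.0, 120.0, rng) for _ in range(1000)]
corrected = bias_correct(ensemble, unperturbed)
```

Because clipping at the upper limit removes only positive excursions, the clipped ensemble mean falls below the unperturbed value near saturation, and the shift removes that offset.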

The aforementioned error models are controlled by parameters known as hyperparameters (Thiboult and Anctil, 2015); for Gaussian perturbations, for example, the hyperparameters are the mean and standard deviation. We apply the maximum a posteriori estimation method (MAP) to identify the globally optimal values of these hyperparameters (Gong et al., 2023). The MAP method aims to maximize the probability density of the hyperparameters given the observed historical flood events. Section S1 in the Supplement provides a comprehensive introduction to the implementation of error estimation in this study.

2.4 Multi-source soil moisture data fusion

The soil moisture reanalysis data are sourced from the China Meteorological Administration Land Data Assimilation System (CLDAS) near-real-time dataset (National Meteorological Information Centre, 2017). While the CLDAS dataset demonstrates a reasonable level of accuracy within China, with a regional average correlation coefficient of 0.89, a root mean squared error of 0.02 m³ m⁻³, and a bias of 0.01 m³ m⁻³ (Wang and Li, 2020), it faces limitations due to missing values in some areas and data latency (published with a 2 d lag), restricting its application in real-time flood forecasting in small and medium-sized catchments. On the other hand, ground station measurements offer high precision and timeliness (real-time data) but represent point-scale soil moisture, while the Xin'anjiang model simulates soil moisture as areal averages for sub-catchments, necessitating the consideration of spatial-scale effects. To bridge this gap and assimilate soil moisture observations into the Xin'anjiang model, this study employs the weighted k-nearest-neighbor (WKNN) algorithm (Pedregosa et al., 2011) to merge CLDAS soil moisture data (hereinafter referred to as CLDAS) with in situ soil moisture data collected from monitoring sites (hereinafter referred to as IN SITU). This method generates real-time, spatially distributed soil moisture data based on in situ observations and the spatial distribution from the CLDAS dataset, ensuring compatibility with the tension water storage and free water storage in the Xin'anjiang model.
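For readers unfamiliar with WKNN, the following generic sketch shows inverse-distance-weighted k-nearest-neighbor regression in plain Python. It is not the study's fusion procedure: the function name, the choice of Euclidean distance, and the idea of using historical in situ readings as predictors and the CLDAS value of one grid cell as the target are assumptions made only to illustrate the algorithm.

```python
import math


def wknn_predict(train_X, train_y, query, k=3, eps=1e-12):
    """Inverse-distance-weighted k-nearest-neighbor regression.

    train_X : list of predictor tuples (e.g. historical in situ readings)
    train_y : list of target values (e.g. CLDAS value of one grid cell)
    query   : predictor tuple for the current time step
    eps     : small constant so an exact match does not divide by zero
    """
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in neighbours]
    total = sum(w * y for w, (_, y) in zip(weights, neighbours))
    return total / sum(weights)
```

Applied cell by cell, such a regression would map the current in situ readings to a full grid, which is the sense in which the fused product inherits its timeliness from the stations and its spatial pattern from the reanalysis.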

The Harmonized World Soil Database (HWSD) provides a soil texture map for two layers: 0–30 cm (topsoil layer, T) and 30–100 cm (subsoil layer, S). Initially, using the technique by Reynolds et al. (2000), soil transfer functions are applied to the grid of the soil texture map. This process involves estimating the wilting point, θwp; field capacity, θfc; and saturation moisture content, θs, for each grid layer based on its soil clay and sand percentage contents along with the United States Department of Agriculture (USDA) soil texture classification. In this study, we assume that the soil moisture constants for each sub-catchment are the arithmetic average of the grid-scale soil moisture constants within the corresponding areas.

In the Xin'anjiang model, tension water capacity (WM), corresponding to available water capacity, is defined as the moisture content between the wilting point and field capacity, thus representing the thickness of the unsaturated zone. Free water capacity (SM) is defined as the moisture content between field capacity and saturation moisture content, relating to the thickness of the humus soil layer. Accordingly, we define a conceptual soil profile in the Xin'anjiang model, where the soil profile of tension water is divided into upper, lower, and deep layers. The capacity of each layer is calculated as

(9) $\mathbf{WL} = \dfrac{\mathbf{WWM}}{\theta_{\mathrm{fc}} - \theta_{\mathrm{wp}}}$,

where WL is the soil profile thickness matrix of tension water, WL=(WUL,WLL,WWL), representing the thickness of the upper, lower, and entire soil profile of tension water in millimeters, respectively, and WWM=(WUM,WLM,WM). Similarly, the thickness of the conceptual soil profile of free water is calculated as

(10) $\mathrm{SL} = \dfrac{\mathrm{SM}}{\theta_{\mathrm{s}} - \theta_{\mathrm{fc}}}$.

Subsequently, linear interpolation is used to adjust the in situ data and CLDAS reanalysis soil moisture data, both at varying depths, to match the thickness of the conceptual soil profile. This step is followed by the calculation of tension and free water storage, derived from the transformed in situ and CLDAS data. The calculation formula is as follows:

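(11a) $\mathbf{WOB}_i = \left(\boldsymbol{\theta}_{\mathrm{WL},i} - \boldsymbol{\theta}_{\mathrm{wp}}\right)\mathbf{WL}, \quad \mathbf{WL} = \begin{pmatrix} \mathrm{WUL} & 0 & 0 \\ 0 & \mathrm{WLL} & 0 \\ 0 & 0 & \mathrm{WWL} \end{pmatrix}$,
(11b) $\mathrm{SOB}_i = \left(\theta_{\mathrm{SL},i} - \theta_{\mathrm{fc}}\right)\mathrm{SL}$,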

where SOB and WOB=(WUOB,WLOB,WWOB), respectively, represent the free water storage and tension water storage at various layers, derived from observation data. These are referred to as the observed free water storage and observed tension water storage. θ_WL=(θ_WUL,θ_WLL,θ_WWL) and θ_SL indicate the soil moisture contents after linear interpolation to the respective conceptual soil profile thicknesses. θwp=(θwp,θwp,θwp). The subscript i indicates different datasets – namely, IN SITU or CLDAS.
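As a rough illustration of Eqs. (9) to (11), the sketch below converts capacities into conceptual profile thicknesses and then converts interpolated moisture contents into observed storages. All numerical values and the function names `profile_thickness` and `observed_storage` are hypothetical and chosen only for the example; they are not taken from the study.

```python
def profile_thickness(capacity_mm, theta_upper, theta_lower):
    """Eqs. (9)/(10): thickness (mm) = capacity (mm) / (theta_upper - theta_lower)."""
    span = theta_upper - theta_lower
    if span <= 0:
        raise ValueError("upper moisture constant must exceed the lower one")
    return capacity_mm / span


def observed_storage(theta_obs, theta_lower, thickness_mm):
    """Eq. (11): storage (mm) = (observed moisture - lower constant) * thickness."""
    return (theta_obs - theta_lower) * thickness_mm


# Hypothetical soil moisture constants for one sub-catchment (m3 m-3).
theta_wp, theta_fc, theta_s = 0.12, 0.30, 0.45
# Hypothetical tension water capacities (upper, lower, total) and free water capacity (mm).
WUM, WLM, WM = 20.0, 60.0, 120.0
SM = 30.0

# Conceptual profile thicknesses: (WUL, WLL, WWL) and SL.
WL = [profile_thickness(c, theta_fc, theta_wp) for c in (WUM, WLM, WM)]
SL = profile_thickness(SM, theta_s, theta_fc)

# Hypothetical moisture contents already interpolated to each profile thickness.
theta_WL = [0.25, 0.22, 0.21]
theta_SL = 0.36

WOB = [observed_storage(t, theta_wp, d) for t, d in zip(theta_WL, WL)]
SOB = observed_storage(theta_SL, theta_fc, SL)
```

By construction, a moisture content equal to field capacity maps back to the full tension water capacity, which is a quick consistency check on the two conversions.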

Finally, using the WKNN method, soil moisture data from the IN SITU dataset is integrated with the CLDAS dataset. The specific implementation steps are as follows:

  1. Normalize the observed free water content and observed tension water content from the dataset using the min–max normalization method. Denote the normalized observation vector as PSM:

    (12) $\mathbf{PSM}_i = (\mathbf{WOB}_i, \mathrm{SOB}_i)$.
  2. The Minkowski distance is used to measure the proximity between the IN SITU data under evaluation and historical samples. A smaller distance indicates a closer match between the evaluated soil moisture content and the historical sample. The distance is calculated as follows:

    (13) $d = \left( \sum_{j=1}^{n} \left| \mathrm{psm}_{\mathrm{IN\,SITU},j}^{\mathrm{RTD}} - \mathrm{psm}_{\mathrm{IN\,SITU},j}^{\mathrm{HD}} \right|^{p} \right)^{1/p}$,

    where psmIN SITU,j represents the jth element of the vector PSMIN SITU and n is the dimension of PSMIN SITU. The superscript RTD stands for the data under evaluation, and HD denotes historical data. The distances between the data under evaluation and each historical sample are ranked in ascending order. The K nearest historical samples are then selected as reference indices based on this principle.

  3. The inverse distance weighting method is used to calculate the final observed free water storage and observed tension water storage based on the K nearest historical samples:

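    (14a) $\omega_m = \dfrac{1/d_m}{\sum_{m=1}^{K} 1/d_m}$,
    (14b) $\mathbf{PSM}_{\mathrm{RGC}} = \sum_{m=1}^{K} \omega_m \mathbf{PSM}_{\mathrm{CLDAS},m}$,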

    where PSMRGC is the normalized merged observational soil moisture vector, ω represents the inverse distance weights, and PSMCLDAS,m is the normalized CLDAS observation data vector corresponding to the mth sample. The merged observed tension water storage, WOBRGC=(WUOBRGC,WLOBRGC,WWOBRGC), and merged observed free water storage, SOBRGC, are obtained after denormalization.
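The three steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in the study: the sample values are hypothetical, the in situ and CLDAS vectors are assumed to be normalized separately with their own minima and maxima, and the small constant `eps` is an added safeguard against division by zero when a historical sample matches the query exactly.

```python
def fit_min_max(rows):
    """Column-wise minima and maxima used for min-max normalization."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def normalize(row, lo, hi):
    return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]


def denormalize(row, lo, hi):
    return [l + v * (h - l) for v, l, h in zip(row, lo, hi)]


def minkowski(a, b, p):
    """Eq. (13): Minkowski distance of order p."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def wknn_merge(query, hist_insitu, hist_cldas, k, p, eps=1e-12):
    """Eqs. (14a)-(14b): inverse-distance-weighted mean of the CLDAS vectors
    paired with the k historical in situ vectors nearest to the query."""
    nearest = sorted((minkowski(query, h, p), m)
                     for m, h in enumerate(hist_insitu))[:k]
    inv = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(inv)
    dim = len(hist_cldas[0])
    return [sum(w / total * hist_cldas[m][j] for w, (_, m) in zip(inv, nearest))
            for j in range(dim)]


# Hypothetical historical samples: (WUOB, WLOB, WWOB, SOB) in mm.
hist_insitu = [[10.0, 40.0, 80.0, 5.0], [15.0, 50.0, 100.0, 12.0], [18.0, 55.0, 110.0, 20.0]]
hist_cldas = [[11.0, 42.0, 85.0, 6.0], [14.0, 48.0, 98.0, 11.0], [17.0, 56.0, 112.0, 18.0]]

in_lo, in_hi = fit_min_max(hist_insitu)
cl_lo, cl_hi = fit_min_max(hist_cldas)
hist_in_n = [normalize(r, in_lo, in_hi) for r in hist_insitu]
hist_cl_n = [normalize(r, cl_lo, cl_hi) for r in hist_cldas]

# Real-time in situ vector under evaluation (here identical to sample 1).
query_n = normalize([15.0, 50.0, 100.0, 12.0], in_lo, in_hi)
merged = denormalize(wknn_merge(query_n, hist_in_n, hist_cl_n, k=2, p=2), cl_lo, cl_hi)
```

Because the query coincides with a historical sample, almost all of the weight falls on that sample and `merged` is essentially the paired CLDAS vector.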

In this study, the grid search (GS) method (Bergstra and Bengio, 2012; Alibrahim and Ludwig, 2021) is employed to optimize the hyperparameters K and p, accompanied by a three-fold cross-validation. This approach ensures maximum R2 and minimum root mean squared error for the test set, balancing model generalizability with accuracy (Sect. S2, Table S1). For the multi-source soil moisture data fusion, 70 % of the historical dataset is used as the training set for model training, while the remaining 30 % serves as the test set to verify model generalization.
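The hyperparameter search can be illustrated with a small, dependency-free sketch. It assumes a scalar target, interleaved folds, and RMSE as the only selection criterion (the study also considers R2), and the data are synthetic placeholders. The study's actual tooling is not reproduced here, so every name below is hypothetical.

```python
from itertools import product


def knn_predict(x, train_x, train_y, k, p):
    """Inverse-distance-weighted k-nearest-neighbor prediction of a scalar."""
    nearest = sorted((sum(abs(a - b) ** p for a, b in zip(x, tx)) ** (1.0 / p), y)
                     for tx, y in zip(train_x, train_y))[:k]
    weights = [1.0 / (d + 1e-12) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def cv_rmse(xs, ys, k, p, folds=3):
    """Mean RMSE over `folds` interleaved cross-validation folds."""
    scores = []
    for f in range(folds):
        held_out = set(range(f, len(xs), folds))
        tr_x = [x for i, x in enumerate(xs) if i not in held_out]
        tr_y = [y for i, y in enumerate(ys) if i not in held_out]
        sq = [(knn_predict(xs[i], tr_x, tr_y, k, p) - ys[i]) ** 2 for i in sorted(held_out)]
        scores.append((sum(sq) / len(sq)) ** 0.5)
    return sum(scores) / folds


def grid_search(xs, ys, k_grid, p_grid, folds=3):
    """Return the (K, p) pair with the lowest cross-validated RMSE."""
    return min(product(k_grid, p_grid), key=lambda kp: cv_rmse(xs, ys, kp[0], kp[1], folds))


# Synthetic one-dimensional data standing in for normalized storages.
xs = [[i / 20.0] for i in range(21)]
ys = [x[0] ** 2 for x in xs]
train = [i for i in range(len(xs)) if i % 10 < 7]    # roughly 70 % for training
test = [i for i in range(len(xs)) if i % 10 >= 7]    # roughly 30 % for testing
tr_x, tr_y = [xs[i] for i in train], [ys[i] for i in train]

k_grid, p_grid = (1, 2, 3, 5), (1, 2)
best_k, best_p = grid_search(tr_x, tr_y, k_grid, p_grid)
test_rmse = (sum((knn_predict(xs[i], tr_x, tr_y, best_k, best_p) - ys[i]) ** 2
                 for i in test) / len(test)) ** 0.5
```

Evaluating `test_rmse` on data withheld from the grid search mirrors the 70 %/30 % split described above and gives a check on generalization that is independent of the cross-validation folds.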

2.5 Evaluation metrics

In this study, we use four metrics to assess the assimilation effectiveness, focusing on both optimal single-value and ensemble performance, as suggested by McInerney et al. (2020). The optimal single-value performance, indicating the highest simulation accuracy, is represented by the ensemble mean of the simulated discharge. The ensemble performance evaluation, in contrast, examines the simulated discharge ensemble through the lens of ensemble forecasting, covering both the overall performance of the ensemble and its reliability.

For quantitatively assessing the optimal single-value performance, we employ the normalized Nash–Sutcliffe efficiency coefficient (NNSE) (Nossent and Bauwens, 2012) and the root mean squared error (RMSE). The continuous ranked probability score (CRPS), introduced by Hersbach (2000), measures the overall performance of the ensemble. The reliability component of CRPS, denoted as RELI, focuses on assessing ensemble reliability. For these metrics, we use the ratios of the AEnKF to open loop (ensemble run without assimilation), represented as RRMSE, RCRPS, and RRELI. Moreover, the event-averaged values of these ratios are denoted as MRRMSE, MRCRPS, and MRRELI. The mean value of NNSE for multiple flood events is denoted as MNNSE. In synthetic cases, “synthetic true” values serve as the benchmark for all evaluation metrics, while observed values are used in real-world cases. Additional information about these metrics can be found in Sect. S3.
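For orientation, the single-value metrics and a simple ensemble CRPS can be sketched as below. The exact definitions used in the study are given in Sect. S3 and are not reproduced here, so this sketch makes two assumptions: NNSE is taken as 1/(2 − NSE), and CRPS is computed for one time step with the common sample-based estimator (mean absolute error of the members minus half the mean absolute difference between member pairs). The reliability component RELI is not computed.

```python
def rmse(sim, obs):
    return (sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs)) ** 0.5


def nse(sim, obs):
    """Nash-Sutcliffe efficiency."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def nnse(sim, obs):
    """Normalized NSE, mapped to the interval (0, 1]."""
    return 1.0 / (2.0 - nse(sim, obs))


def crps_ensemble(members, y):
    """Sample-based CRPS of an ensemble for a single verifying value y."""
    n = len(members)
    term1 = sum(abs(x - y) for x in members) / n
    term2 = sum(abs(a - b) for a in members for b in members) / (n * n)
    return term1 - 0.5 * term2


def ratio(metric_assim, metric_ol):
    """Ratio of an assimilation-run metric to the open-loop metric (e.g. RRMSE)."""
    return metric_assim / metric_ol


# Hypothetical discharge series (m3 s-1): observations, open loop, assimilation run.
obs = [100.0, 180.0, 260.0, 210.0, 150.0]
ol = [90.0, 150.0, 300.0, 240.0, 170.0]
da = [98.0, 172.0, 270.0, 215.0, 155.0]
rrmse = ratio(rmse(da, obs), rmse(ol, obs))   # values below 1.0 favor assimilation
```

With these ratios, a value below 1.0 means the assimilation run has a lower error than the open loop, which is how RRMSE, RCRPS, and RRELI are read in the results.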

3 Study area and data

The Wuqiangxi Catchment (Fig. 2) is located in the middle reaches of the Yuan River, the third-largest tributary of the Yangtze River. It covers an area of approximately 8033 km2, with elevations ranging from 42 to 1396 m above sea level. The geographical coordinates of the catchment extend from 109°44′ E to 111°01′ E and from 28°01′ N to 29°07′ N. Situated in the mid-subtropical monsoon humid climate zone, the Wuqiangxi Catchment experiences abundant rainfall and has rich water resources. The average annual precipitation is around 1400 mm and is unevenly distributed throughout the year, falling predominantly during the flood season (March to September). The catchment, located in the subtropical evergreen and deciduous broadleaf forest belt, features dense vegetation, predominantly forests and grasslands. The soil texture is primarily loamy. For this study, the Wuqiangxi Catchment is divided into 10 sub-catchments, each identified by red underlined numbers in Fig. 2b, ensuring at least one rain gauge in each sub-catchment. Among the three discharge stations in the study catchment, Wuqiangxibashang station provides the outflow data at the catchment outlet, while Hexi and Gaochetou provide inflow data for the study area. Due to the lack of soil moisture and rainfall data within their controlled areas, the control areas of Hexi and Gaochetou are not included in the study. For an overview of the data used in this study, please see Sect. S4.

https://hess.copernicus.org/articles/29/335/2025/hess-29-335-2025-f02

Figure 2. Study catchment. (a) Digital elevation map (DEM). (b) Sub-catchments and observation stations. (c) Soil texture (0 to 30 cm). (d) Soil texture (30 to 100 cm).

4 Experimental setup

4.1 Warming-up period

In China, hydrometeorological data are typically reported at sub-daily intervals during flood periods and on a daily basis otherwise to support flood forecasting and water resource management. The Xin'anjiang model operates in two modes to meet these needs: it uses hourly simulations for flood forecasting and daily simulations for managing water resources. The hourly simulations require initial soil moisture for each sub-catchment, which is derived from the daily simulations (Chen et al., 2023). Consequently, a daily simulation must be performed prior to the hourly simulation, and we recommend that this warming up (spin up) be at least 3 months long. This period enables the daily simulated soil moisture, driven by observed hydrometeorological data, to gradually approach the actual soil moisture (Kim et al., 2018). The influence of initial soil moisture on the daily simulation becomes minimal by the end of the warming-up period, allowing the soil moisture from the daily simulation to be used as the initial condition for the hourly simulation (Yao et al., 2012). The daily simulation in this study began on 10 February 2014. Testing showed that even in extreme cases where the initial soil moisture in the daily simulation is set to zero or fully saturated, there is almost no impact on the flood simulation results. The initial values can therefore be set arbitrarily within reason. In this study, the initial values for the daily simulation are set as follows: the soil moisture content is set to half of the saturation value, and the outflow of each sub-reach is set to the observed discharge at the catchment outlet on the start date divided by the total number of sub-reaches.

4.2 Synthetic cases

In the synthetic cases, the hydrological model operates on an hourly time step with a maximum lead time of 24 h, and ensemble simulations involve 100 members. The initial soil moisture is set to half of the maximum value. To ensure consistency in the length of forecast sequences and the comparability of results, the start time for forecasting the same flood event under different lead times is set to the same moment, specifically 24 h (the maximum lead time) after the flood start time. To capture peak flows even at the maximum lead time, the start of each flood event is advanced by 24 h. However, due to the lack of hourly observations prior to the actual onset of the flood, data for these initial 24 h are derived by interpolating from daily observations.

Synthetic data are generated as follows. Firstly, historical flood events are utilized to apply the MAP method, producing an optimal hyperparameter set, ψ^. Here, σ^lnp and α^lnp control the error model of forcing data (Sect. S1.1). This error model introduces random perturbations into hourly rainfall observations, creating a set of random rainfall data, referred to as synthetic true rainfall. Similarly, σ^yd and α^yd manage the observation error model (Sect. S1.2), perturbing catchment inflow to produce a dataset known as synthetic true inflow. Subsequently, the Xin'anjiang model, driven by the synthetic true rainfall and synthetic true inflow, along with the recommended parameters (see Table 1), outputs state variables (such as tension water storage) and discharge at the catchment outlet for each time step. These outputs are designated as the synthetic true state variables and synthetic true discharge. In the final phase, optimal hyperparameter sets (σ^ys,α^ys) and (σ^yd,α^yd) are applied to the observation error model. This step introduces random perturbations into the synthetic true state variables and synthetic true discharge, resulting in the creation of synthetic observed state variables and synthetic observed discharge.

Specifically, synthetic observations of tension and free water storage are employed to update the simulated values in the Xin'anjiang model, whereas synthetic discharge observations are utilized for updating cumulative channel flow. Both of these assimilation processes are conducted at an hourly interval.
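The error models themselves are specified in Sect. S1 and are not reproduced in this section, so the following is only an assumed form, offered to make the perturbation step concrete. It treats `sigma` as the standard deviation of a log-space rainfall multiplier and `alpha` as its lag-1 autocorrelation, which is a guess based on the hyperparameter names; the function name and all values are hypothetical.

```python
import math
import random


def perturb_rainfall(rain, sigma, alpha, seed=0):
    """Multiply each rainfall value by exp(e_t - sigma**2 / 2), where e_t is a
    zero-mean AR(1) Gaussian series with stationary standard deviation sigma
    and lag-1 autocorrelation alpha. This is an ASSUMED error model, not the
    one defined in Sect. S1 of the paper."""
    rng = random.Random(seed)
    e = rng.gauss(0.0, sigma)                  # start from the stationary distribution
    perturbed = []
    for p in rain:
        # The -sigma**2 / 2 shift keeps the multiplier's expected value at 1.
        perturbed.append(p * math.exp(e - 0.5 * sigma ** 2))
        e = alpha * e + math.sqrt(1.0 - alpha ** 2) * rng.gauss(0.0, sigma)
    return perturbed


# Hypothetical hourly rainfall (mm) and hyperparameters.
hourly_rain = [0.0, 2.0, 5.5, 0.0, 1.2]
synthetic_true_rain = perturb_rainfall(hourly_rain, sigma=0.3, alpha=0.6, seed=42)
```

A multiplicative form keeps dry hours dry and rainfall non-negative, which is one reason such models are often preferred over additive noise for precipitation.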

4.3 Real-world cases

In the real-world cases, the time step and number of ensemble members are the same as in the synthetic cases. As in the synthetic cases, to ensure the comparability of results, the forecast start time for all lead times is uniformly delayed from the flood onset (the earliest available hourly data) by a duration corresponding to the maximum lead time. For some flood events, high flow occurred as early as the ninth hour after onset. To avoid missing the peak flow, the maximum lead time is set to 8 h. The observed tension water storage, WOBRGC, and free water storage, SOBRGC, as introduced in Sect. 2.4, are assimilated to update the simulated tension and free water storage in the Xin'anjiang model, with an assimilation interval of 8 h. Additionally, discharge observations are assimilated into the cumulative channel flow at a 1 h interval. Note that in both synthetic and real-world cases in this study, we use historical rainfall data as a perfect proxy for rainfall prediction, with the aim of assessing the temporal persistence of the assimilation effect without introducing uncertainty from numerical weather prediction. Temporal persistence refers to the duration over which the updates applied to the state variables by the AEnKF at the start of forecasting continue to hold in the future.

When unbiased perturbations are introduced into the model forcing and states and the Xin'anjiang model is run in ensemble mode without assimilation, the run is referred to as open loop (OL). In contrast, an ensemble run integrated with AEnKF assimilation is referred to as the AEnKF. To reduce the effects of random perturbations on outcomes, each flood event in our study is subjected to five repeated ensemble simulations. We then select the simulation corresponding to the median RMSE in the forecasted discharge as our final outcome.
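The selection of the median run can be expressed in a few lines. The RMSE values below are hypothetical, and the helper name `select_median_run` is introduced only for this illustration.

```python
def select_median_run(rmse_by_run):
    """Return the index of the run whose RMSE is the median of all runs.

    With an odd number of runs (five in this study) the median is a single
    run; with an even number this picks the upper of the two middle runs.
    """
    order = sorted(range(len(rmse_by_run)), key=lambda i: rmse_by_run[i])
    return order[len(order) // 2]


# Hypothetical forecast-discharge RMSE (m3 s-1) of five repeated ensemble runs.
rmse_by_run = [3.1, 2.4, 2.9, 3.6, 2.7]
chosen = select_median_run(rmse_by_run)
```

Choosing the median rather than the best run avoids reporting a result that is favorable only because of one lucky draw of the random perturbations.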

5 Results

5.1 Synthetic cases

5.1.1 Hyperparameter estimation for error models

Most current assimilation methods, while suboptimal for complex hydrological processes, still yield reliable outcomes within a reasonably characterized uncertainty. Our approach to error characterization, widely adopted in hydrology, involves perturbing model forcing, observations, and states from an assumed distribution. We applied the MAP method for global hyperparameter optimization, with optimal parameters detailed in Table 2. These optimized hyperparameters are used in error models for both synthetic and real-world cases. However, given the limited number of flood events used for calibration, the hyperparameter optimization, akin to model parameter calibration, might exhibit uncertainty and parameter equifinality, leading to multiple hyperparameter combinations that may produce similar ensemble simulations.

Table 2. Hyperparameters estimated by the MAP method.


5.1.2 The time window of the AEnKF

The AEnKF employs observational data from both the current and the preceding time periods for assimilation, with the duration of the past interval defined by the time window, ω. Determining the optimal duration for assimilating past observations is critical for the effectiveness of the AEnKF. If the time window is set too narrowly, the system might fail to fully capitalize on historical data to enhance assimilation precision. On the other hand, an excessively broad time window could lead the non-linear system to incorporate irrelevant information from distant past periods, potentially undermining assimilation performance. Therefore, we conducted tests to assess the impact of varying time windows on discharge forecast accuracy. Specifically, for soil moisture observations, we explored three different time windows: ωs=1 h, ωs=3 h, and ωs=5 h. Similarly, for discharge observations, we examined time window, ωd, values of 1, 3, and 5 h. For clarity, these assimilation time windows are denoted using dual numerical subscripts. For instance, the AEnKF utilizing ωs=1 h and ωd=3 h is designated as AEnKF13, and similar nomenclature applies to other configurations.
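The sketch below illustrates which observation times would fall inside a given time window and how the nine tested configurations are labeled. It assumes that a window of ω hours spans the times from t − ω to t on the observation grid, which matches the description of the 8 h soil moisture window in the real-world cases; the helper names are hypothetical.

```python
from itertools import product


def window_times(t, window_h, obs_interval_h=1):
    """Observation times (h) inside [t - window_h, t], newest first.

    ASSUMPTION: the window includes both end points, so a 3 h window with
    hourly observations uses four observation times.
    """
    return [t - lag for lag in range(0, window_h + 1, obs_interval_h)]


def config_label(omega_s, omega_d):
    """Label such as AEnKF13 for omega_s = 1 h and omega_d = 3 h."""
    return f"AEnKF{omega_s}{omega_d}"


windows = (1, 3, 5)
labels = [config_label(ws, wd) for ws, wd in product(windows, windows)]

# Hourly discharge observations with a 3 h window at forecast start t = 10 h.
discharge_times = window_times(10, 3)
# Soil moisture observed every 8 h with an 8 h window at t = 16 h.
soil_times = window_times(16, 8, obs_interval_h=8)
```

In an asynchronous filter, the observations at these times and the corresponding model-predicted observations from the ensemble are stacked into one augmented vector, so the analysis at time t can draw on the whole window.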

The disparity in forecast discharge accuracy across different time windows is presented in Fig. 3, which shows the MNNSE and MRRMSE metrics for forecast discharge across lead times of 1 to 24 h under the various time window combinations. The performance of the AEnKF varies across these time windows. The most effective assimilation across all lead times is achieved with ωs=3 h and ωd=3 h. It is important to note, however, that even with the least effective time windows (ωs=1 h and ωd=5 h), the performance of the AEnKF still surpasses that of the EnKF. In more detail, the time windows for soil moisture and discharge have complex interactions that collectively influence the forecast results for catchment outlet discharge. For soil moisture assimilation, a 3 h window demonstrates the most significant benefits. In terms of NNSE, the 5 h window outperforms the 1 h window in most cases, except when ωd=3 h, where the reverse is true. In the assimilation of outlet discharge, the 3 h window generally proves most effective; with a larger soil moisture window (ωs=5 h), however, assimilating discharge data with a 1 h window yields the best results. Almost universally, the 1 h window performs as well as or surpasses the 5 h window. This indicates that longer assimilation windows do not necessarily yield better results. Therefore, for the subsequent synthetic-case experiments, the AEnKF utilizes assimilation time windows of ωs=3 h and ωd=3 h.

https://hess.copernicus.org/articles/29/335/2025/hess-29-335-2025-f03

Figure 3. The forecasted discharge accuracy for various time windows for (a) MNNSE and (b) MRRMSE.


5.1.3 Multivariate observation assimilation scheme

Upon determining the assimilation time window, we analyzed the differences between three AEnKF assimilation strategies: the assimilation of observed soil moisture only (labeled as AEnKFS), the assimilation of observed outlet discharge only (labeled as AEnKFQ), and the joint assimilation of both observation types (labeled as AEnKFSQ).

In our assessment of one-step (1 h) prediction of outlet discharge, we examined the optimal single-value performance and ensemble efficacy of the three schemes. The evaluation of the optimal single-value performance was conducted using NNSE and RRMSE as metrics. Figure 4a–h illustrate the NNSE values during eight flood events in 2023 and 2024 (refer to Table S3). Significantly, in events no. 2023062100 and no. 2023072516, the catchment experienced minimal rainfall (only 1 h of rainfall exceeded 3 mm), with the flood dynamics largely driven by catchment inflows. Consequently, updates to soil moisture within the catchment had no influence on flood progression, and while assimilating observed discharge data slightly enhanced flood forecasting accuracy, the improvement was minimal and could be considered negligible. Conversely, in the other six events where rainfall predominantly influenced the flood dynamics, all three assimilation schemes outperformed the OL mode in NNSE scores, indicating improvements in one-step prediction accuracy to varying extents. Among these, AEnKFSQ, simultaneously assimilating observed soil moisture and discharge data, notably surpassed the other two schemes. This superiority is further supported by the RRMSE statistics in Fig. 4i, where the MRRMSE for AEnKFSQ showed a decrease of 0.11 and 0.16 compared to AEnKFS and AEnKFQ, respectively. Moreover, soil moisture assimilation and discharge assimilation exhibited comparable performances, with only a marginal difference of 0.05 in MRRMSE.

https://hess.copernicus.org/articles/29/335/2025/hess-29-335-2025-f04

Figure 4. The optimal single-value performance of three AEnKF assimilation schemes for synthetic cases: (a–h) NNSE and (i) RRMSE.


Subsequently, we conducted an evaluation of the ensemble performance across eight flood events, specifically examining the overall ensemble performance as measured by CRPS and the ensemble reliability as indicated by the RELI metric. In Fig. 5a, the distribution of RCRPS values is showcased. For the AEnKFQ scheme, RCRPS values fluctuated between 0.8 and 1.0, averaging 0.89; for AEnKFS, the range is from 0.39 to 1.03 with an average of 0.84; and for AEnKFSQ, it varied from 0.29 to 1.01, averaging 0.74. This demonstrates an enhancement in the overall ensemble performance for all schemes over the OL model, particularly for AEnKFSQ, which significantly outshone AEnKFS and AEnKFQ. Further, AEnKFS slightly outperforms AEnKFQ. Figure 5b illustrates the RRELI scores, showing a similar trend of improved ensemble reliability for all three schemes over OL. Here, the reliability of AEnKFSQ is notably higher than that of both AEnKFS and AEnKFQ. On the other hand, AEnKFS is more reliable compared to AEnKFQ.

https://hess.copernicus.org/articles/29/335/2025/hess-29-335-2025-f05

Figure 5. The ensemble performance of three AEnKF assimilation schemes for synthetic cases: (a) RCRPS and (b) RRELI.


Within the context of one-step prediction, Fig. 6 presents the RRMSE for updated state variables of the Xin'anjiang model under three distinct assimilation schemes, involving free water storage; tension water storage across upper, lower, and total layers; and cumulative channel flow across all sub-catchments. As anticipated, in the AEnKFQ scheme, which solely updates the cumulative channel flow without involving the runoff generation process, the state variables indicative of soil moisture (S, W, WU, WL) remain unaffected. This is reflected in Fig. 6a–d, where the mean RRMSE values associated with the grey boxes hover around 1.0. In the context of cumulative channel flow, AEnKFQ generally achieves a reduction in RMSE relative to OL.

In the case of AEnKFS, updating the soil moisture impacts both runoff generation and subsequent flow routing processes, leading to a response in all state variables. Significantly, the AEnKFS scheme demonstrates the most substantial corrections in free water storage (S), consistently yielding lower RMSE values than the OL scheme in all instances. When updating the total tension water storage (W), AEnKFS usually attains lower RMSE values compared to OL. However, this effect is less marked than that for free water storage. This is illustrated in Fig. 6b, where, except for Event 6 and Event 8, the average values associated with the red boxes exceed those in Fig. 6a. Contrastingly, updates to the upper and lower layers of tension water storage in the AEnKFS scheme produced opposite outcomes, with the RMSE values for these post-update state variables exceeding those of OL. This phenomenon can be attributed to the following: firstly, the initial soil moisture values in this study were set at half of their maximum, and during flood periods, the saturation-excess runoff generation mechanism ensures rapid saturation of both the upper and the lower tension water, reaching the maximum limits (WUM and WLM). Thereafter, due to the physical upper bounds of these variables, the assimilation process is hindered in effectively updating WU and WL values. Consequently, this may lead to a systematic underestimation of these values compared to actual measurements and consequently higher RMSE values than OL. In contrast, free water storage, even during flood periods, may not persistently reach its maximum (SM), resulting in the most advantageous update effect for it. These results highlight the criticality of choosing appropriate state variables for updates in hydrological model state updating, particularly when utilizing methods such as AEnKF.

In the case of AEnKFSQ, the updates to soil moisture state variables show similarities to those in AEnKFS. However, when it comes to the updates of cumulative channel flow, AEnKFSQ effectively integrates the strengths of both AEnKFQ and AEnKFS, resulting in a more outstanding performance. This outcome suggests that the concurrent assimilation of both soil moisture and discharge observations can efficiently utilize the advantages of each, leading to a greater assimilation accuracy than the assimilation of a single observation source.

https://hess.copernicus.org/articles/29/335/2025/hess-29-335-2025-f06

Figure 6. Effects of three assimilation schemes on state variables. (a) Free water storage (S), (b) tension water storage (W), (c) upper tension water storage (WU), (d) lower tension water storage (WL), and (e) cumulative channel flow (QC).


We analyzed the discharge simulation accuracy of the three assimilation schemes over lead times ranging from 1 to 24 h, aiming to gauge the temporal persistence of the assimilation effect. Figure 7a–h present the NNSE for eight flood events. The events identified as no. 2023062100 (Fig. 7d) and no. 2023072516 (Fig. 7f) were mainly driven by inflows, exhibiting only slight improvements in state updates, and therefore are not included in further discussions. For the events identified as no. 2023050416, no. 2023052008, no. 2024040100, and no. 2024042900, the NNSE of each assimilation scheme exceeded that of OL across all lead times, indicating a consistent assimilation impact lasting up to 24 h. For the event labeled no. 2023040308, the temporal persistence of the AEnKFSQ, AEnKFS, and AEnKFQ schemes is 8, 8, and 2 h, respectively; in the case of the event marked as no. 2023063000, these durations are 5, 4, and 1 h. Notably, even in no. 2023063000, the least effective case, the NNSE discrepancy between AEnKFSQ and OL for lead times exceeding 8 h remains below 0.02. Furthermore, AEnKFSQ demonstrated superior performance in most flood events across all lead times compared to both AEnKFS and AEnKFQ. The notable exception is event no. 2023052008, where AEnKFSQ excelled within a 4 h lead time but slightly lagged behind the other two schemes beyond this duration. Nevertheless, the difference in NNSE for AEnKFSQ during this event stayed below 0.02. Figure 7i statistically illustrates the RRMSE values. Notably, the MRRMSE for each of the three assimilation schemes remains below 1.0 for all lead times, signifying that in terms of event averages, each scheme achieves a temporal persistence of up to 24 h. Additionally, the discharge forecast accuracy across nearly all lead times is ranked with AEnKFSQ surpassing AEnKFS, which itself exceeds AEnKFQ.
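One way to turn the notion of temporal persistence into a number is sketched below. It assumes that persistence is counted as the number of consecutive lead times, starting from the first, at which the assimilation run has a higher NNSE than OL, with lead times spaced 1 h apart. The text does not state this rule explicitly, and the NNSE values are hypothetical.

```python
def temporal_persistence(nnse_assim, nnse_ol, step_h=1):
    """Hours over which assimilation beats open loop without interruption.

    ASSUMPTION: entries are ordered by lead time (1 h, 2 h, ...), and the
    count stops at the first lead time where NNSE does not exceed OL.
    """
    hours = 0
    for a, o in zip(nnse_assim, nnse_ol):
        if a <= o:
            break
        hours += step_h
    return hours


# Hypothetical NNSE values for lead times of 1 to 6 h.
nnse_ol = [0.80, 0.79, 0.78, 0.77, 0.76, 0.75]
nnse_q = [0.86, 0.81, 0.78, 0.76, 0.75, 0.74]     # discharge-only style decay
nnse_sq = [0.90, 0.88, 0.85, 0.82, 0.79, 0.75]    # joint-assimilation style decay

persistence_q = temporal_persistence(nnse_q, nnse_ol)
persistence_sq = temporal_persistence(nnse_sq, nnse_ol)
```

Under this rule a scheme that matches OL exactly at some lead time is treated as having lost its advantage there, which is the stricter of the two reasonable conventions.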

https://hess.copernicus.org/articles/29/335/2025/hess-29-335-2025-f07

Figure 7. Assessment of forecast discharge accuracy across three assimilation schemes during 1 to 24 h lead times for (a–h) NNSE and (i) RRMSE.


5.2 Real-world cases

In real-world cases, the sensitive parameters of the Xin'anjiang model are calibrated based on historical flood events. After global optimization using the SCE-UA algorithm, the Xin'anjiang model with the optimized sensitive parameters exhibited an average NNSE of 0.89 for the calibration events and 0.8 for the validation events, demonstrating reliable and credible flood simulation and forecasting capabilities.

Soil moisture observations are available every 8 h in the real-world cases, as opposed to every 1 h in the synthetic cases. The synthetic results favor a soil moisture time window of 3 h; the closest window that the 8 h observation interval allows is 8 h. Therefore, ωs is set to 8 h, so that only the observations from the current time and those from 8 h earlier are used for assimilation. For the discharge assimilation, we set the time window to be consistent with the synthetic cases, i.e., ωd=3 h. Additionally, guided by the insights from synthetic cases, in real-world cases, we incorporate all available soil moisture observations but limit updates to the free water storage component of the Xin'anjiang model.

5.2.1 Fusion of in situ data with CLDAS soil moisture data

The soil moisture data fused using the WKNN model exhibit enhanced timeliness, with the soil moisture in each conceptual soil profile aligning closely with that of the CLDAS data (Table 3). Specifically, for the calibration set, the correlation coefficients with CLDAS data consistently exceed 0.9. Additionally, the correlation coefficients for the validation set are 0.85, 0.80, 0.84, and 0.75, respectively, indicating that the WKNN model is robust and generalizes well. It effectively captures the statistical relationship between point-scale and areal-scale soil moisture datasets.

Table 3. The correlation coefficient between the fused soil moisture data and CLDAS soil moisture data.


5.2.2 Multivariate observation assimilation scheme

Within the context of one-step prediction, Fig. 8 displays the impact of updating free water storage in three different assimilation schemes, quantified by RRMSE. For the AEnKFQ scheme, there is no update to free water storage, leading to expected RRMSE values oscillating near 1.0. Conversely, both AEnKFS and AEnKFSQ successfully updated free water states, with mean values of RRMSE for free water storage in flood event simulations lying between 0.48 and 0.74. This demonstrates an average reduction in RMSE of 26 % to 52 % for free water storage across various flood events in comparison to the OL mode. Moreover, Fig. 8 reveals that, in the vast majority of cases, the whiskers (representing 1.5 times the interquartile range) of the red and blue boxes remain below 1.0. This indicates that both AEnKFS and AEnKFSQ successfully updated the free water storage in most sub-catchments for most flood events. The effective updates to free water storage will further impact the discharge process at the catchment outlet, which is discussed in detail in the subsequent sections.

https://hess.copernicus.org/articles/29/335/2025/hess-29-335-2025-f08

Figure 8. Effects of three assimilation schemes on free water storage (S).


We evaluate the optimal single-value performance of the AEnKF in one-step (1 h) prediction. Figure 9 illustrates the NNSE and RRMSE values achieved through three AEnKF assimilation schemes. In OL, the mean NNSE stands at 0.75. Following assimilation, the mean NNSE improves to 0.79, 0.78, and 0.81 for AEnKFQ, AEnKFS, and AEnKFSQ, respectively. The RRMSE of AEnKFQ fluctuates between 0.78 and 1.0, with an average of 0.88; for AEnKFS, it ranges from 0.71 to 1.02, averaging 0.91; and for AEnKFSQ, it varies from 0.64 to 0.99, with an average of 0.84. These results show that all three AEnKF assimilation schemes enhance the optimal single-value performance, with AEnKFSQ outperforming AEnKFQ, which in turn exceeds AEnKFS. Moreover, AEnKFSQ achieves a higher improvement ceiling in certain flood events. For instance, the maximum reduction in RMSE reaches 22 % for AEnKFQ, 29 % for AEnKFS, and up to 36 % for AEnKFSQ.
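The percentage reductions quoted above follow directly from the RRMSE values, since RRMSE is the ratio of the assimilation RMSE to the OL RMSE. The short sketch below reproduces that arithmetic from the minimum RRMSE of each scheme; the helper name is introduced only for this illustration.

```python
def rmse_reduction_percent(rrmse):
    """Percentage reduction in RMSE relative to open loop: (1 - RRMSE) * 100."""
    return round((1.0 - rrmse) * 100.0, 6)


# Minimum RRMSE per scheme as reported in the text.
min_rrmse = {"AEnKFQ": 0.78, "AEnKFS": 0.71, "AEnKFSQ": 0.64}
max_reduction = {name: rmse_reduction_percent(v) for name, v in min_rrmse.items()}
```

The same conversion links the mean RRMSE range of 0.48 to 0.74 for free water storage to the 26 % to 52 % reduction reported for that state variable.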

https://hess.copernicus.org/articles/29/335/2025/hess-29-335-2025-f09

Figure 9. The optimal single-value performance of three AEnKF assimilation schemes for real-world cases for (a) NNSE and (b) RRMSE.


Figure 10 utilizes RCRPS and RRELI metrics to evaluate overall ensemble performance and reliability. The RCRPS values for AEnKFQ are in the range of 0.81 to 1.0, averaging at 0.90; for AEnKFS, they span from 0.71 to 1.02, averaging 0.92; and for AEnKFSQ, they vary from 0.66 to 0.98, averaging 0.86. Notably, AEnKFQ exhibits the narrowest box plot, indicating a more focused distribution of RCRPS for this scheme. The average RCRPS for AEnKFS closely aligns with that of AEnKFQ, yet its box plot shows greater breadth at both the top and the bottom, suggesting a higher potential for improvement in overall ensemble performance but with increased instability. In contrast, the average RCRPS for AEnKFSQ is lower than those of the first two. While the box plot width for AEnKFSQ is similar to that of AEnKFS, the upper boundary of the box plot aligns more closely with AEnKFQ, and the upper whisker is shorter than that of AEnKFQ, indicating a comprehensive superiority of AEnKFSQ in overall ensemble performance compared to both AEnKFS and AEnKFQ. Similar findings also emerge in the assessment of ensemble reliability. AEnKFS and AEnKFQ exhibit similar mean RRELI values, but the box plot for AEnKFQ is more constricted. In contrast, AEnKFSQ shows a thorough superiority in ensemble reliability compared to both AEnKFS and AEnKFQ.

https://hess.copernicus.org/articles/29/335/2025/hess-29-335-2025-f10

Figure 10. The ensemble performance of three AEnKF assimilation schemes for real-world cases: (a) RCRPS and (b) RRELI.


We extend the examination to the temporal persistence of assimilation effects for these schemes. Figure 11 displays discharge forecasting accuracy across various lead times, as measured by NNSE and RRMSE. Within a lead time range of 1 to 8 h, both AEnKFS and AEnKFSQ demonstrate improvements in forecasting performance, with AEnKFSQ exceeding AEnKFS throughout this range. AEnKFQ shows significantly shorter temporal persistence than the other two, slightly outperforming AEnKFS at a 1 h lead time but with a rapid decline in accuracy as lead time increases. Past a 5 h lead time, the assimilation effect of AEnKFQ vanishes, leading to accuracy slightly below OL. At all lead times, AEnKFSQ consistently outperforms AEnKFQ. This reveals that employing the AEnKF for updating cumulative channel flow may notably enhance discharge forecasting accuracy in shorter lead times. While updating free water storage may initially not be as effective as updating cumulative channel flow (AEnKFQ), it ensures a longer-lasting assimilation impact. The AEnKFSQ scheme merges these strengths, offering robust discharge corrections and an extended temporal persistence of assimilation effects.

https://hess.copernicus.org/articles/29/335/2025/hess-29-335-2025-f11

Figure 11. The accuracy of forecasted discharge under different lead times for (a) NNSE and (b) RRMSE.


6 Discussion

6.1 Discussion of AEnKF time window in synthetic cases

In the study of assimilation windows for the AEnKF in synthetic cases, we found that longer assimilation windows do not necessarily yield better results (Fig. 3). This is primarily because a longer time window includes too much historical information, which may have a weak correlation with the current state variables. Owing to the non-linearity of the hydrological model, overly long windows can result in the system assimilating excessive noise, which negates the benefits derived from incorporating past observations. Tao et al. (2016) obtained similar results when studying the assimilation window length (1–3 h) for the assimilation of observed discharge only. They found that the 2 h time window generally yielded better assimilation results than the 3 h time window, while the 1 h time window performed the worst.
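The role of the time window can be illustrated with a minimal sketch of an asynchronous analysis step, in which observations from every time level in the window are stacked into one vector and related to the current state through ensemble cross-covariances. The sketch assumes a stochastic (perturbed-observation) formulation and a diagonal observation error covariance; it does not reproduce the error models of this study or the PDAF and OpenDA implementations. The function name `aenkf_update` and all numbers are hypothetical.

```python
import numpy as np

def aenkf_update(X, HX_win, y_win, obs_std, rng):
    """One asynchronous EnKF analysis step (perturbed-observation sketch).

    X       : (n_state, n_ens) forecast ensemble at the analysis time
    HX_win  : (n_obs, n_ens) model-predicted observations, stacked over
              all time levels inside the assimilation window
    y_win   : (n_obs,) observations stacked in the same order
    obs_std : (n_obs,) observation error standard deviations
    """
    n_ens = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)            # state anomalies
    S = HX_win - HX_win.mean(axis=1, keepdims=True)  # predicted-obs anomalies
    P_xy = A @ S.T / (n_ens - 1)                     # state/obs cross-covariance
    P_yy = S @ S.T / (n_ens - 1) + np.diag(obs_std ** 2)
    K = np.linalg.solve(P_yy, P_xy.T).T              # K = P_xy P_yy^{-1} (P_yy symmetric)
    # Each member sees its own perturbed copy of the observations.
    Y = y_win[:, None] + obs_std[:, None] * rng.standard_normal(HX_win.shape)
    return X + K @ (Y - HX_win)

# Toy example: one state variable, a 2 h window with one observation per hour.
rng = np.random.default_rng(0)
X_f = rng.normal(10.0, 2.0, size=(1, 50))
# Stand-in for the model-predicted observations at the two window times;
# in practice these come from the stored ensemble trajectory.
HX_win = np.vstack([X_f[0], X_f[0]])
y_win = np.array([14.0, 14.0])
obs_std = np.array([0.5, 0.5])
X_a = aenkf_update(X_f, HX_win, y_win, obs_std, rng)
```

In this formulation, lengthening the window only adds rows to `HX_win` and `y_win`. Rows whose predicted observations correlate weakly with the current state contribute mostly sampling noise to the gain, which is one way to read the degradation observed for overly long windows.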

6.2 Discussion of two flood events in real-world cases

In flood simulation and forecasting, peak flow rates are a primary focus for researchers. Using the two flood events with the most significant peak flow errors in the OL mode in 2023 (no. 2023040308 and no. 2023052008) as case studies, we examined the variations in free water storage and discharge at the catchment outlet.

Figure 12 displays the hydrographs simulated for no. 2023040308. Black lines (and dots) signify observed values. Grey lines and bands represent the ensemble mean and range of OL, respectively. Similarly, green lines and bands illustrate the ensemble mean and range for the AEnKF. In examining the time series of free water storage, it is evident that observational data points almost never fall within the grey bands of the OL scheme. This indicates a notable difference between the soil moisture levels simulated by the Xin'anjiang model and those derived from observational data. Both AEnKFS and AEnKFQ exhibit similar update patterns, where the post-update ensemble mean values significantly shift towards observational data. Concurrently, this adjustment expands the ensemble bands, indicating an increase in ensemble simulation accuracy for AEnKFS and AEnKFQ along with an increased ensemble spread. In the analysis of the discharge time series, it becomes evident that the ensemble distribution from the AEnKF aligns more closely with observational data and presents a narrower bandwidth than that of OL. This trend suggests that the ensemble accuracy with the AEnKF exceeds that of the OL scheme and also demonstrates a reduced ensemble spread. Furthermore, the ensemble distribution observed during peak periods is more expansive than during the onset and recession periods of flood. This is attributed to the error models applied. These models introduce larger perturbations in the assimilation system during peak periods, leading to a broader ensemble distribution, which, in turn, ensures a more effective assimilation during these critical periods. In examining the time series of discharge, it is noted that both AEnKFQ and AEnKFS significantly reduced the height of the simulated flood peak. 
The AEnKFQ scheme shows effectiveness around the 20th hour, following the assimilation of approximately 20 discharge observations, achieving a relative error of 17 % in the simulated flood peak (maximum instantaneous flow) compared to the observed peak. AEnKFS started effectively updating the discharge following the assimilation of the third group of soil moisture observations at the 17th hour, which led to a flood peak relative error of 13 %. The AEnKFSQ scheme successfully amalgamates the strengths of both, culminating in a reduced flood peak relative error of merely 8 %.
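For reference, the flood peak relative error quoted above can be computed as in the following short sketch. It assumes that the error is the absolute difference between the simulated and observed maximum instantaneous flows, divided by the observed peak; the discharge values are hypothetical and are not taken from event no. 2023040308.

```python
import numpy as np

def peak_relative_error(sim, obs):
    """Relative error of the simulated flood peak against the observed peak."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return abs(sim.max() - obs.max()) / obs.max()

# Hypothetical hydrographs (m3/s), for illustration only.
obs = np.array([50.0, 120.0, 400.0, 260.0, 90.0])
sim = np.array([55.0, 150.0, 432.0, 300.0, 110.0])
err = peak_relative_error(sim, obs)   # (432 - 400) / 400
```

The measure compares peak magnitudes only, so a timing error in the simulated peak does not affect it.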

https://hess.copernicus.org/articles/29/335/2025/hess-29-335-2025-f12

Figure 12. Hydrograph during flood event labeled no. 2023040308 for the (a, b) AEnKFQ scheme, (c, d) AEnKFS scheme, and (e, f) AEnKFSQ scheme. The left panels show the discharge at the catchment outlet, and the right panels display the free water storage in sub-catchment 1.


In the case of the flood event labeled no. 2023052008, as illustrated in Fig. 13, the time series exhibits a similar pattern to no. 2023040308. The peak flooding occurred between the 25th and 33rd hours, which corresponds to the period between the fourth and fifth sets of soil moisture observations. During this interval, there is a notable and rapid increase in free water storage. Figure 13c indicates that the AEnKFS fails to effectively adjust the discharge volumes around the peak period. Conversely, the AEnKFQ scheme, which focused on updating cumulative channel flow, successfully rectified the peak flooding. Owing to the ineffectiveness of free water content updates in discharge correction, the assimilation impact of AEnKFSQ closely matched that of AEnKFQ. In summary, it is apparent that AEnKFSQ effectively integrates the strengths of both the AEnKFS and AEnKFQ schemes. Even when one of these strategies fails to update effectively, AEnKFSQ still manages to enhance the precision of discharge predictions.

https://hess.copernicus.org/articles/29/335/2025/hess-29-335-2025-f13

Figure 13. Hydrograph during flood event labeled no. 2023052008 for the (a, b) AEnKFQ scheme, (c, d) AEnKFS scheme, and (e, f) AEnKFSQ scheme. The left panels show the discharge at the catchment outlet, and the right panels display the free water storage in sub-catchment 1.


6.3 Limitations

The Xin'anjiang model is a conceptual hydrological model that generalizes the rainfall–runoff process. Its most prominent feature is performing runoff production calculations based on the saturation-excess runoff mechanism, meaning net rainfall is first entirely used to replenish soil water, and once the soil moisture content in the unsaturated zone reaches field capacity, all subsequent net rainfall is used to generate runoff. Therefore, the Xin'anjiang model is mostly suitable for humid and semi-humid regions where the saturation-excess runoff mechanism is dominant and is less or not applicable to arid and semi-arid regions. However, it is important to note that the state updating method proposed in this study is not limited to coupling with the Xin'anjiang model. In fact, this method can be easily coupled with any lumped or semi-distributed hydrological model that includes state variables related to soil moisture and channel storage. When coupled with hydrological models suitable for semi-arid and arid regions, it can be effectively applied in those areas.

Semi-distributed hydrological models, like the Xin'anjiang model used in this study, have smaller state variable dimensions, allowing for the direct application of the proposed state updating scheme. However, in distributed models where each computational grid (e.g., DEM-based grids) has its own state variables, the state dimension becomes large, making direct application inefficient or prone to spurious correlations from distant observations. To resolve this, we recommend applying covariance localization (Janjić et al., 2011) to the AEnKF or other localization techniques (Khaniya et al., 2022). For instance, in covariance localization, a localization radius (RL) is set, and the forecast error covariance matrix is tapered by taking its Schur (element-wise) product with a distance-dependent correlation matrix; the Schur product theorem guarantees that the result remains a valid covariance matrix. This study focuses on jointly assimilating soil moisture and streamflow using the AEnKF, and performing localization on the AEnKF is beyond the scope of this research. We will explore this further in future work.
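A minimal sketch of covariance localization is given below. It assumes the fifth-order compactly supported correlation function of Gaspari and Cohn as the taper and grid cells placed along one dimension; neither choice comes from this study. In this sketch the taper reaches zero at twice the value passed as `radius`, which is a convention of the function and not necessarily how RL would be defined in an operational setup.

```python
import numpy as np

def gaspari_cohn(dist, radius):
    """Fifth-order compactly supported correlation; zero at and beyond 2 * radius."""
    r = np.abs(np.asarray(dist, float)) / radius
    taper = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    taper[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                    - (5.0 / 3.0) * ri**2 + 1.0)
    taper[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                    + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return taper

def localize_covariance(P, coords, radius):
    """Schur (element-wise) product of P with a distance-based correlation matrix."""
    coords = np.asarray(coords, float)
    dist = np.abs(coords[:, None] - coords[None, :])  # pairwise 1-D distances
    return P * gaspari_cohn(dist, radius)

# Hypothetical example: three grid cells with a fully correlated sample covariance.
P = np.ones((3, 3))
coords = np.array([0.0, 1.0, 5.0])
P_loc = localize_covariance(P, coords, radius=1.0)
weights = gaspari_cohn(np.array([0.0, 1.0, 2.0, 3.0]), 1.0)
```

Variances on the diagonal are left unchanged, while the covariance between the first and third cells, which are farther apart than twice the radius, is set to zero. This is how spurious long-range correlations are suppressed.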

Real-time flood forecasting is a dynamic prediction system based on real-time monitoring data combined with hydrological and/or hydrodynamic models to predict the evolution of flood processes. It provides critical information such as the time of peak flow, water levels, and discharge when a flood occurs. This type of forecasting is characterized by its high timeliness and short forecasting window, with the lead time generally set to several hours (e.g., Toth et al., 2000; Liu et al., 2016). The method proposed in this study is particularly suited for state updating within real-time flood forecasting as it dynamically updates the state variables of the hydrological model using real-time observational data, reducing the accumulation of errors. In real-world cases, we set the maximum lead time to 8 h, which sufficiently meets the requirements for real-time flood forecasting in medium-sized catchments. This provides reliable real-time and near-real-time information for emergency responses, assisting government and flood control agencies in organizing evacuations, resource allocation, and reservoir operations, thereby minimizing casualties and property damage caused by floods. Moreover, to test the temporal persistence of the state updating method, we used historical observed rainfall as a perfect proxy for numerical weather forecasts, thereby avoiding the introduction of uncertainties from numerical weather predictions. As the lead time increases, uncertainties in numerical weather predictions may gradually replace the accumulation of errors in hydrological model state variables as the primary source of uncertainty in flood forecasting (Weerts and El Serafy, 2006; Yossef et al., 2013; Thiboult et al., 2016). For medium- to long-term flood forecasts, greater attention may need to be given to uncertainties stemming from numerical weather predictions.

7 Conclusions

This study uses the asynchronous ensemble Kalman filter (AEnKF) with enhanced error models for assimilating two types of observational data into the Xin'anjiang model. The data include observed discharge at the catchment outlet and soil moisture gathered from multiple sources. The objective is to diminish error accumulation in the initial conditions of the Xin'anjiang model at the start of flood forecasting, thereby improving forecast accuracy. The assimilation framework includes advanced error models, such as the BGEM model to reduce systematic biases from perturbed soil moisture and the MAP method for the objective estimation of hyperparameters in the error model. The study specifically contrasts three AEnKF assimilation strategies: (1) the AEnKFQ scheme, which updates cumulative channel flow in the Xin'anjiang model by assimilating observed outlet discharge; (2) the AEnKFS scheme, which focuses on updating soil moisture variables in the model by assimilating fused soil moisture observations; and (3) the AEnKFSQ, a joint assimilation scheme, which combines both discharge and soil moisture assimilation processes.

Generally, the AEnKF is considered an effective approach for updating hydrological model states. It integrates a greater amount of observational data while barely increasing the computational burden, making it highly suitable for flood forecasting. The effectiveness of assimilation with the AEnKF relates to the assimilation time window. Results for synthetic data cases indicate that an appropriate setting involves a 3 h time window for assimilating observed soil moisture and outlet discharge. Moreover, in lead times ranging from 1 to 24 h, this method consistently outperforms the EnKF approach.

In synthetic case studies, while updating soil moisture state variables of the Xin'anjiang model, it is observed that effective updates are limited to free water storage and total tension water storage. This underscores the significance of choosing appropriate state variables for updates in the application of the AEnKF method. Further analysis revealed that with high-quality hourly available observational data, all three assimilation schemes maintained their effectiveness for up to 24 h lead time. Notably, AEnKFSQ demonstrated enhanced optimal single-value performance, overall ensemble performance, and ensemble reliability, surpassing both AEnKFS and AEnKFQ. Specifically, in the one-step forecast, the MRRMSE for AEnKFSQ decreased by 0.11 and 0.16 compared to AEnKFS and AEnKFQ, respectively; the MRCRPS for AEnKFSQ decreased by 0.10 and 0.15, and the MRRELI decreased by 0.20 and 0.15 compared to AEnKFS and AEnKFQ, respectively. AEnKFSQ's advantage in optimal single-value performance persists up to a 24 h lead time.

In the real-world case studies, we merged soil moisture data from in situ monitoring sites with the near-real-time CLDAS soil moisture data. This fusion produces spatially distributed data characterized by high temporal immediacy while addressing the point-scale limitation of in situ soil moisture data. In contrast to experiments using synthetic data, extending soil moisture observation intervals to 8 h impacts the performance of the AEnKFS scheme. In one-step prediction, the AEnKFSQ scheme exhibits the highest level of accuracy, with an MRRMSE of 0.84. Concurrently, the simulation precision of the AEnKFQ scheme exceeds that observed in AEnKFS, with MRRMSE values of 0.88 and 0.91, respectively. Variations in results are observed under different lead times. AEnKFSQ and AEnKFS consistently demonstrate an assimilation effect duration of 8 h in contrast to the 5 h temporal persistence of the assimilation effect of AEnKFQ. The use of the AEnKF for updating cumulative channel flow markedly enhances the accuracy of discharge forecasting at short lead times. In contrast, the adjustment extent of discharge by updating free water storage in a single-step forecast might be lower than that achieved with AEnKFQ. Nevertheless, it guarantees a more sustained assimilation effect. The AEnKFSQ integrates the strengths of the previous two strategies, thereby improving discharge forecasting accuracy even when a particular strategy does not update effectively and prolonging the temporal persistence of the assimilation effect.

Code and data availability

Data will be made available on request. The code of the AEnKF was developed using the Parallel Data Assimilation Framework (PDAF) (https://doi.org/10.5281/zenodo.7861829, Nerger, 2023) and OpenDA (https://doi.org/10.5281/zenodo.8018104, Kramer et al., 2023). The SCE-UA algorithm was implemented via the Uncertainty Quantification Python Laboratory (UQ-PyL) (http://www.uq-pyl.com/, Wang et al., 2022). The soil moisture reanalysis data are downloaded from the China Meteorological Administration Land Data Assimilation System (CLDAS) near-real-time dataset (http://data.cma.cn/data/detail/dataCode/NAFP_CLDAS2.0_NRT.html, National Meteorological Information Centre, 2017). We thank these organizations for granting us permission to use their data and software.

Supplement

The supplement related to this article is available online at: https://doi.org/10.5194/hess-29-335-2025-supplement.

Author contributions

JG: conceptualization, methodology, software, visualization, and writing (original draft and review and editing), funding acquisition. XL: methodology, software. CY: data curation, software, funding acquisition, and writing (review and editing). ZL: project administration, funding acquisition, and writing (review and editing). AW: conceptualization and writing (review and editing). QL: supervision and writing (review and editing). SB: writing (review and editing). YH: validation and funding acquisition. JX: supervision and resources.

Competing interests

At least one of the (co-)authors is a member of the editorial board of Hydrology and Earth System Sciences. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.

Disclaimer

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements

We would like to express our heartfelt gratitude to Liaofan Lin (Cooperative Institute for Research in the Atmosphere, Colorado State University, and Global Systems Laboratory, Oceanic and Atmospheric Research, NOAA) for his constructive comments and invaluable assistance in enhancing the quality of this research. The hydrometeorological data, including evaporation, precipitation, and discharge, were provided by the Hydrology Bureau of Hunan Province.

Financial support

This research has been supported by the Xinjiang Uygur Autonomous Region Key Research and Development Project (grant no. 2023B02044-2), the Postdoctoral Fellowship Program of CPSF (China Postdoctoral Science Foundation) (grant no. GZC20240377), the Fundamental Research Funds for the Central Universities (grant nos. B240201181 and B240203007), the China Postdoctoral Science Foundation (grant no. 2024M760743), the National Natural Science Foundation of China (grant no. 52079035), the Natural Science Foundation of Fujian Province (grant no. 2024J01267), and the Anhui Provincial Natural Science Foundation (grant no. 2208085US06).

Review statement

This paper was edited by Yi He and reviewed by two anonymous referees.

References

Ajami, N. K., Duan, Q. Y., and Sorooshian, S.: An integrated hydrologic Bayesian multimodel combination framework: Confronting input, parameter, and model structural uncertainty in hydrologic prediction, Water Resour. Res., 43, W01403, https://doi.org/10.1029/2005wr004745, 2007. 

Alibrahim, H. and Ludwig, S. A.: Hyperparameter optimization: Comparing genetic algorithm against grid search and bayesian optimization, in: 2021 IEEE Congress on Evolutionary Computation (CEC), Kraków, Poland, 28 June–1 July 2021, 1551–1559, https://doi.org/10.1109/CEC45853.2021.9504761, 2021. 

Alvarez-Garreton, C., Ryu, D., Western, A. W., Su, C.-H., Crow, W. T., Robertson, D. E., and Leahy, C.: Improving operational flood ensemble prediction by the assimilation of satellite soil moisture: comparison between lumped and semi-distributed schemes, Hydrol. Earth Syst. Sci., 19, 1659–1676, https://doi.org/10.5194/hess-19-1659-2015, 2015. 

Bergstra, J. and Bengio, Y.: Random search for hyper-parameter optimization, J. Mach. Learn. Res., 13, 281–305, 2012. 

Beven, K.: Prophecy, reality and uncertainty in distributed hydrological modelling, Adv. Water Resour., 16, 41–51, https://doi.org/10.1016/0309-1708(93)90028-E, 1993. 

Chao, L. J., Zhang, K., Wang, S., Gu, Z., Xu, J. Z., and Bao, H. J.: Assimilation of surface soil moisture jointly retrieved by multiple microwave satellites into the WRF-Hydro model in ungauged regions: Towards a robust flood simulation and forecasting, Environ. Modell. Softw., 154, 105421, https://doi.org/10.1016/j.envsoft.2022.105421, 2022. 

Chen, X. Y., Zhang, K., Luo, Y. N., Zhang, Q. N., Zhou, J. Q., Fan, Y. Z., Huang, P. N., Yao, C., Chao, L. J., and Bao, H. H.: A distributed hydrological model for semi-humid watersheds with a thick unsaturated zone under strong anthropogenic impacts: A case study in Haihe River Basin, J. Hydrol., 623, 129765, https://doi.org/10.1016/j.jhydrol.2023.129765, 2023. 

Clark, M. P., Rupp, D. E., Woods, R. A., Zheng, X., Ibbitt, R. P., Slater, A. G., Schmidt, J., and Uddstrom, M. J.: Hydrological data assimilation with the ensemble Kalman filter: Use of streamflow observations to update states in a distributed hydrological model, Adv. Water Resour., 31, 1309–1324, https://doi.org/10.1016/j.advwatres.2008.06.005, 2008. 

Craninx, M., Hilgersom, K., Dams, J., Vaes, G., Danckaert, T., and Bronders, J.: Flood4castRTF: A real-time urban flood forecasting model, Sustainability-Basel, 13, 5651, https://doi.org/10.3390/su13105651, 2021. 

Crow, W. T. and Van Loon, E.: Impact of incorrect model error assumptions on the sequential assimilation of remotely sensed surface soil moisture, J. Hydrometeorol., 7, 421–432, https://doi.org/10.1175/JHM499.1, 2006. 

DeChant, C. M. and Moradkhani, H.: Examining the effectiveness and robustness of sequential data assimilation methods for quantification of uncertainty in hydrologic forecasting, Water Resour. Res., 48, W04518, https://doi.org/10.1029/2011WR011011, 2012. 

Duan, Q. Y., Sorooshian, S., and Gupta, V.: Effective and efficient global optimization for conceptual rainfall-runoff models, Water Resour. Res., 28, 1015–1031, https://doi.org/10.1029/91WR02985, 1992. 

Evensen, G.: The ensemble Kalman filter: Theoretical formulation and practical implementation, Ocean Dyn., 53, 343–367, https://doi.org/10.1007/s10236-003-0036-9, 2003. 

Evensen, G.: Data assimilation: the ensemble Kalman filter, Springer, Berlin, https://doi.org/10.1007/978-3-642-03711-5, 2009. 

Evensen, G. and van Leeuwen, P. J.: An ensemble Kalman smoother for nonlinear dynamics, Mon. Weather Rev., 128, 1852–1867, https://doi.org/10.1175/1520-0493(2000)128<1852:AEKSFN>2.0.CO;2, 2000. 

Fang, Y.-H., Zhang, X., Corbari, C., Mancini, M., Niu, G.-Y., and Zeng, W.: Improving the Xin'anjiang hydrological model based on mass–energy balance, Hydrol. Earth Syst. Sci., 21, 3359–3375, https://doi.org/10.5194/hess-21-3359-2017, 2017. 

García-Alén, G., Hostache, R., Cea, L., and Puertas, J.: Joint assimilation of satellite soil moisture and streamflow data for the hydrological application of a two-dimensional shallow water model, J. Hydrol., 621, 129667, https://doi.org/10.1016/j.jhydrol.2023.129667, 2023. 

Gong, J. F., Yao, C., Li, Z. J., Chen, Y. F., Huang, Y. C., and Tong, B. X.: Improving the flood forecasting capability of the Xinanjiang model for small- and medium-sized ungauged catchments in South China, Nat. Hazard., 106, 2077–2109, https://doi.org/10.1007/s11069-021-04531-0, 2021. 

Gong, J. F., Weerts, A. H., Yao, C., Li, Z. J., Huang, Y. C., Chen, Y. F., Chang, Y. F., and Huang, P. N.: State updating in a distributed hydrological model by ensemble Kalman filtering with error estimation, J. Hydrol., 620, 129450, https://doi.org/10.1016/j.jhydrol.2023.129450, 2023. 

Gong, J. F., Yao, C., Weerts, A. H., Li, Z. J., Wang, X. Y., Xu, J. Z., and Huang, Y. C.: State updating in Xin'anjiang model by Asynchronous ensemble Kalman filtering with enhanced error models, J. Hydrol., 640, 131726, https://doi.org/10.1016/j.jhydrol.2024.131726, 2024.

Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems, Weather Forecast., 15, 559–570, https://doi.org/10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2, 2000. 

Hunt, B. R., Kalnay, E., Kostelich, E. J., Ott, E., Patil, D. J., Sauer, T., Szunyogh, I., Yorke, J. A., and Zimin, A. V.: Four-dimensional ensemble Kalman filtering, Tellus A, 56, 273–277, https://doi.org/10.3402/tellusa.v56i4.14424, 2004. 

Hunt, B. R., Kostelich, E. J., and Szunyogh, I.: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter, Physica D, 230, 112–126, https://doi.org/10.1016/j.physd.2006.11.008, 2007. 

Ide, K., Courtier, P., Ghil, M., and Lorenc, A. C.: Unified notation for data assimilation: Operational, sequential and variational, J. Meteorolog. Soc. Jpn., 75, 181–189, https://doi.org/10.2151/jmsj1965.75.1B_181, 1997. 

Janjić, T., Nerger, L., Albertella, A., Schröter, J., and Skachko, S.: On domain localization in ensemble-based Kalman filter algorithms, Mon. Weather Rev., 139, 2046–2060, https://doi.org/10.1175/2011MWR3552.1, 2011. 

Johnson, K. A., Wing, O. E. J., Bates, P. D., Fargione, J., Kroeger, T., Larson, W. D., Sampson, C. C., and Smith, A. M.: A benefit-cost analysis of floodplain land acquisition for US flood damage reduction, Nat. Sustainability, 3, 56–62, https://doi.org/10.1038/s41893-019-0437-5, 2020. 

Jung, W. H. and Lee, S. G.: An Arrhythmia Classification Method in Utilizing the Weighted KNN and the Fitness Rule, IRBM, 38, 138–148, https://doi.org/10.1016/j.irbm.2017.04.002, 2017. 

Khaniya, M., Tachikawa, Y., Ichikawa, Y., and Yorozu, K.: Impact of assimilating dam outflow measurements to update distributed hydrological model states: Localization for improving ensemble Kalman filter performance, J. Hydrol., 608, 127651, https://doi.org/10.1016/j.jhydrol.2022.127651, 2022. 

Kim, K. B., Kwon, H. H., and Han, D. W.: Exploration of warm-up period in conceptual hydrological modelling, J. Hydrol., 556, 194–210, https://doi.org/10.1016/j.jhydrol.2017.11.015, 2018. 

Kramer, W., van Velzen, N., Pelgrim, E., theavuik, aljavrieling, verlaanm, carbonatezero, Ridler, M., sumihar, Drost, N., robprev, michael-topsom-deltares, van Meeuwen, D., Michiel, Guarneri, H., vt-max, Robot 144, Mourits, A., Spee, E., and huibtanis: OpenDA-Association/OpenDA: OpenDA 3.1.1, Zenodo [code], https://doi.org/10.5281/zenodo.8018104, 2023.

Krymskaya, M. V.: Quantification of the impact of data in reservoir modeling, TU Delft, https://doi.org/10.4233/uuid:deffd661-aa01-43f2-bbac-acff15e7ccc6, 2013. 

Lee, H., Seo, D. J., and Koren, V.: Assimilation of streamflow and in situ soil moisture data into operational distributed hydrologic models: Effects of uncertainties in the data and initial model soil moisture states, Adv. Water Resour., 34, 1597–1615, https://doi.org/10.1016/j.advwatres.2011.08.012, 2011. 

Li, Y., Ryu, D., Western, A. W., and Wang, Q. J.: Assimilation of stream discharge for flood forecasting: The benefits of accounting for routing time lags, Water Resour. Res., 49, 1887–1900, https://doi.org/10.1002/wrcr.20169, 2013. 

Li, Y., Ryu, D., Western, A. W., Wang, Q. J., Robertson, D. E., and Crow, W. T.: An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting, J. Hydrol., 519, 2722–2736, https://doi.org/10.1016/j.jhydrol.2014.08.009, 2014. 

Liu, Y., Weerts, A. H., Clark, M., Hendricks Franssen, H.-J., Kumar, S., Moradkhani, H., Seo, D.-J., Schwanenberg, D., Smith, P., van Dijk, A. I. J. M., van Velzen, N., He, M., Lee, H., Noh, S. J., Rakovec, O., and Restrepo, P.: Advancing data assimilation in operational hydrologic forecasting: progresses, challenges, and emerging opportunities, Hydrol. Earth Syst. Sci., 16, 3863–3887, https://doi.org/10.5194/hess-16-3863-2012, 2012. 

Liu, Z., Guo, S., Zhang, H., Liu, D., and Yang, G.: Comparative study of three updating procedures for real-time flood forecasting, Water Resour. Manag., 30, 2111–2126, https://doi.org/10.1007/s11269-016-1275-0, 2016. 

Massari, C., Brocca, L., Moramarco, T., Tramblay, Y., and Lescot, J. F. D.: Potential of soil moisture observations in flood modelling: Estimating initial conditions and correcting rainfall, Adv. Water Resour., 74, 44–53, https://doi.org/10.1016/j.advwatres.2014.08.004, 2014. 

Mazzoleni, M., Noh, S. J., Lee, H., Liu, Y. Q., Seo, D. J., Amaranto, A., Alfonso, L., and Solomatine, D. P.: Real-time assimilation of streamflow observations into a hydrological routing model: effects of model structures and updating methods, Hydrolog. Sci. J., 63, 386–407, https://doi.org/10.1080/02626667.2018.1430898, 2018. 

McInerney, D., Thyer, M., Kavetski, D., Laugesen, R., Tuteja, N., and Kuczera, G.: Multi-temporal Hydrological Residual Error Modeling for Seamless Subseasonal Streamflow Forecasting, Water Resour. Res., 56, e2019WR026979, https://doi.org/10.1029/2019WR026979, 2020. 

McMillan, H., Jackson, B., Clark, M., Kavetski, D., and Woods, R.: Rainfall uncertainty in hydrological modelling: An evaluation of multiplicative error models, J. Hydrol., 400, 83–94, https://doi.org/10.1016/j.jhydrol.2011.01.026, 2011. 

Meng, S. S., Xie, X. H., and Liang, S. L.: Assimilation of soil moisture and streamflow observations to improve flood forecasting with considering runoff routing lags, J. Hydrol., 550, 568–579, https://doi.org/10.1016/j.jhydrol.2017.05.024, 2017. 

National Meteorological Information Centre: The China Meteorological Administration Land Data Assimilation System (CLDAS-V2.0) near-real-time product dataset, China Meteorological Data Service Centre [data set], http://data.cma.cn/data/detail/dataCode/NAFP_CLDAS2.0_NRT.html (last access: 16 January 2025), 2017. 

Nerger, L. and Hiller, W.: Software for ensemble-based data assimilation systems-Implementation strategies and scalability, Comput. Geosci., 55, 110–118, https://doi.org/10.1016/j.cageo.2012.03.026, 2013. 

Nerger, L.: The Parallel Data Assimilation Framework (PDAF) (v2.1), Zenodo [code], https://doi.org/10.5281/zenodo.7861829, 2023. 

Nossent, J. and Bauwens, W.: Application of a normalized Nash-Sutcliffe efficiency to improve the accuracy of the Sobol' sensitivity analysis of a hydrological model, in: EGU General Assembly 2012, Vienna, Austria, 22–27 April, 237, https://ui.adsabs.harvard.edu/abs/2012EGUGA..14..237N/abstract (last access: 16 January 2025), 2012.

Pathiraja, S., Moradkhani, H., Marshall, L., Sharma, A., and Geenens, G.: Data-driven model uncertainty estimation in hydrologic data assimilation, Water Resour. Res., 54, 1252–1280, https://doi.org/10.1002/2018WR022627, 2018. 

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., and Dubourg, V.: Scikit-learn: Machine learning in Python, J. Mach. Learn. Res., 12, 2825–2830, 2011. 

Piazzi, G., Thirel, G., Perrin, C., and Delaigue, O.: Sequential data assimilation for streamflow forecasting: Assessing the sensitivity to uncertainties and updated variables of a conceptual hydrological model at basin scale, Water Resour. Res., 57, e2020WR028390, https://doi.org/10.1029/2020WR028390, 2021. 

Pilon, P. J.: Guidelines for reducing flood losses, Tech. Rep., United Nations International Strategy for Disaster Reduction (UNISDR), 87 pp., http://www.unisdr.org/files/558_7639.pdf (last access: 16 January 2025), 2002. 

Rakovec, O., Weerts, A. H., Hazenberg, P., Torfs, P. J. J. F., and Uijlenhoet, R.: State updating of a distributed hydrological model with Ensemble Kalman Filtering: effects of updating frequency and observation network density on forecast accuracy, Hydrol. Earth Syst. Sci., 16, 3435–3449, https://doi.org/10.5194/hess-16-3435-2012, 2012. 

Rakovec, O., Weerts, A. H., Sumihar, J., and Uijlenhoet, R.: Operational aspects of asynchronous filtering for flood forecasting, Hydrol. Earth Syst. Sci., 19, 2911–2924, https://doi.org/10.5194/hess-19-2911-2015, 2015. 

Reynolds, C. A., Jackson, T. J., and Rawls, W. J.: Estimating soil water-holding capacities by linking the Food and Agriculture Organization soil map of the world with global pedon databases and continuous pedotransfer functions, Water Resour. Res., 36, 3653–3662, https://doi.org/10.1029/2000WR900130, 2000. 

Ryu, D., Crow, W. T., Zhan, X. W., and Jackson, T. J.: Correcting Unintended Perturbation Biases in Hydrologic Data Assimilation, J. Hydrometeorol., 10, 734–750, https://doi.org/10.1175/2008JHM1038.1, 2009.

Sakov, P. and Bocquet, M.: Asynchronous data assimilation with the EnKF in presence of additive model error, Tellus A, 70, 1414545, https://doi.org/10.1080/16000870.2017.1414545, 2018. 

Sakov, P., Evensen, G., and Bertino, L.: Asynchronous data assimilation with the EnKF, Tellus A, 62, 24–29, https://doi.org/10.1111/j.1600-0870.2009.00417.x, 2010. 

Shukla, S. and Lettenmaier, D. P.: Seasonal hydrologic prediction in the United States: understanding the role of initial hydrologic conditions and seasonal climate forecast skill, Hydrol. Earth Syst. Sci., 15, 3529–3538, https://doi.org/10.5194/hess-15-3529-2011, 2011. 

Sun, Y., Bao, W., Valk, K., Brauer, C. C., Sumihar, J., and Weerts, A. H.: Improving forecast skill of lowland hydrological models using ensemble Kalman filter and unscented Kalman filter, Water Resour. Res., 56, e2020WR027468, https://doi.org/10.1029/2020WR027468, 2020. 

Tao, J., Wu, D., Gourley, J., Zhang, S. Q., Crow, W., Peters-Lidard, C., and Barros, A. P.: Operational hydrological forecasting during the IPHEx-IOP campaign – Meet the challenge, J. Hydrol., 541, 434–456, https://doi.org/10.1016/j.jhydrol.2016.02.019, 2016. 

Thiboult, A. and Anctil, F.: On the difficulty to optimally implement the Ensemble Kalman filter: An experiment based on many hydrological models and catchments, J. Hydrol., 529, 1147–1160, https://doi.org/10.1016/j.jhydrol.2015.09.036, 2015. 

Thiboult, A., Anctil, F., and Boucher, M.-A.: Accounting for three sources of uncertainty in ensemble hydrological forecasting, Hydrol. Earth Syst. Sci., 20, 1809–1825, https://doi.org/10.5194/hess-20-1809-2016, 2016. 

Toth, E., Brath, A., and Montanari, A.: Comparison of short-term rainfall prediction models for real-time flood forecasting, J. Hydrol., 239, 132–147, https://doi.org/10.1016/S0022-1694(00)00344-9, 2000. 

Vetra-Carvalho, S., Van Leeuwen, P. J., Nerger, L., Barth, A., Altaf, M. U., Brasseur, P., Kirchgessner, P., and Beckers, J. M.: State-of-the-art stochastic data assimilation methods for high-dimensional non-Gaussian problems, Tellus A, 70, 1445364, https://doi.org/10.1080/16000870.2018.1445364, 2018. 

Wanders, N., Karssenberg, D., de Roo, A., de Jong, S. M., and Bierkens, M. F. P.: The suitability of remotely sensed soil moisture for improving operational flood forecasting, Hydrol. Earth Syst. Sci., 18, 2343–2357, https://doi.org/10.5194/hess-18-2343-2014, 2014. 

Wang, Y. Y. and Li, G. C.: Evaluation of simulated soil moisture from China Land Data Assimilation System (CLDAS) land surface models, Remote Sens. Lett., 11, 1060–1069, https://doi.org/10.1080/2150704X.2020.1820614, 2020. 

Wang, C., Duan, Q., Tong, C. H., Di, Z., and Gong, W.: Uncertainty Quantification Python Laboratory (UQ-PyL) (v1.0 Windows Binary), UQ-PyL (Uncertainty Quantification Python Laboratory) [software], http://www.uq-pyl.com/ (last access: 16 January 2025), 2022. 

Weerts, A. H. and El Serafy, G. Y. H.: Particle filtering and ensemble Kalman filtering for state updating with hydrological conceptual rainfall-runoff models, Water Resour. Res., 42, W09403, https://doi.org/10.1029/2005WR004093, 2006.  

Yao, C., Li, Z. J., Yu, Z. B., and Zhang, K.: A priori parameter estimates for a distributed, grid-based Xinanjiang model using geographically based information, J. Hydrol., 468, 47–62, https://doi.org/10.1016/j.jhydrol.2012.08.025, 2012. 

Yossef, N. C., Winsemius, H., Weerts, A. H., van Beek, R., and Bierkens, M. F. P.: Skill of a global seasonal streamflow forecasting system, relative roles of initial conditions and meteorological forcing, Water Resour. Res., 49, 4687–4699, https://doi.org/10.1002/wrcr.20350, 2013. 

Zang, S. H., Li, Z. J., Zhang, K., Yao, C., Liu, Z. Y., Wang, J. F., Huang, Y. C., and Wang, S.: Improving the flood prediction capability of the Xin'anjiang model by formulating a new physics-based routing framework and a key routing parameter estimation method, J. Hydrol., 603, 126867, https://doi.org/10.1016/j.jhydrol.2021.126867, 2021. 

Zhao, R. J.: The Xinanjiang model applied in China, J. Hydrol., 135, 371–381, https://doi.org/10.1016/0022-1694(92)90096-E, 1992. 

Short summary
Our study introduces a new method to improve flood forecasting by combining soil moisture and streamflow data using an advanced data assimilation technique. By merging field and reanalysis soil moisture data and assimilating the merged data together with streamflow measurements, we aim to enhance the accuracy of flood predictions. This approach reduces the accumulation of past errors in the initial conditions at the start of the forecast, helping to better prepare for and respond to floods.