Improving operational flood ensemble prediction by the assimilation of satellite soil moisture: comparison between lumped and semi-distributed schemes

Assimilation of remotely sensed soil moisture data (SM-DA) to correct soil water stores of rainfall-runoff models has shown skill in improving streamflow prediction. In the case of large and sparsely monitored catchments, SM-DA is a particularly attractive tool. Within this context, we assimilate satellite soil moisture (SM) retrievals from the Advanced Microwave Scanning Radiometer (AMSR-E), the Advanced Scatterometer (ASCAT) and the Soil Moisture and Ocean Salinity (SMOS) instrument, using an Ensemble Kalman filter to improve operational flood prediction within a large (> 40 000 km²) semi-arid catchment in Australia. We assess the importance of accounting for channel routing and the spatial distribution of forcing data by applying SM-DA to a lumped and a semi-distributed scheme of the probability distributed model (PDM). Our scheme also accounts for model error representation by explicitly correcting bias in soil moisture and streamflow in the ensemble generation process, and for seasonal biases and errors in the satellite data. Before assimilation, the semi-distributed model provided a more accurate streamflow prediction (Nash–Sutcliffe efficiency, NSE = 0.77) than the lumped model (NSE = 0.67) at the catchment outlet. However, this did not ensure good performance at the “ungauged” inner catchments (two of them with NSE below 0.3). After SM-DA, the streamflow ensemble prediction at the outlet was improved in both the lumped and the semi-distributed schemes: the root mean square error of the ensemble was reduced by 22 and 24 %, respectively; the false alarm ratio was reduced by 9 % in both cases; the peak volume error was reduced by 58 and 1 %, respectively; the ensemble skill was improved (evidenced by 12 and 13 % reductions in the continuous ranked probability scores, respectively); and the ensemble reliability was increased in both cases (expressed by flatter rank histograms). SM-DA did not improve NSE.
Our findings imply that even when rainfall is the main driver of flooding in semi-arid catchments, adequately processed satellite SM can be used to reduce errors in the model soil moisture, which in turn provides better streamflow ensemble prediction. We demonstrate that SM-DA efficacy is enhanced when the spatial distribution in forcing data and routing processes are accounted for. At ungauged locations, SM-DA is effective at improving some characteristics of the streamflow ensemble prediction; however, the updated prediction is still poor since SM-DA does not address the systematic errors found in the model prior to assimilation.


Introduction
Floods have large costs to society, causing destruction of infrastructure and crops, erosion, and in the worst cases, injury and loss of life (Thielen et al., 2009). To reduce flood impacts on public safety and the economy, early and accurate alert systems are needed. These systems rely on hydrologic models, whose accuracy in turn is highly dependent on the quality of the data used to force and calibrate them. Therefore, in the case of sparsely monitored and ungauged catchments, flood prediction suffers from large uncertainties.

Published by Copernicus Publications on behalf of the European Geosciences Union.
C. Alvarez-Garreton et al.: Assimilation of satellite soil moisture to improve flood prediction
A plausible approach to reduce model uncertainties in sparsely monitored catchments is to exploit remotely sensed hydro-meteorological observations to correct the states or parameters of the model in a data assimilation framework. Within this context, satellite soil moisture (SM) products are appealing given the vital role of SM in runoff generation. SM influences the partitioning of energy and water (rainfall, infiltration and evapotranspiration) between the land surface and the atmosphere (Western et al., 2002). Satellite SM observations provide global scale information and can be obtained in near real time at regular and reasonably frequent time intervals. This makes them valuable for improving the representation of catchment wetness. The accuracy of these observations has been assessed by a number of studies (Albergel et al., 2009, 2010, 2012; Draper et al., 2009; Gruhier et al., 2010; Brocca et al., 2011; Su et al., 2013). In general, they have shown promising performance with moderate correlation between satellite SM and ground data, but with significant bias at some locations.
In the last decade a large number of studies have explored satellite SM data assimilation (SM-DA) to correct the soil water states of models. These studies can be categorised into two main groups. The first, and larger group, has focused on the improvement of the SM predicted by the model (generally working with land surface models, e.g. Crow and van Loon, 2006; Crow and Reichle, 2008; Crow and Van den Berg, 2010; Reichle et al., 2008; Ryu et al., 2009). The second, and smaller group (where our study fits), has focused on the improvement of streamflow prediction in rainfall-runoff models (Francois et al., 2003; Brocca et al., 2010b, 2012; Alvarez-Garreton et al., 2013, 2014; Chen et al., 2014; Wanders et al., 2014).
Studies from the first group evaluate the prediction improvement of the same variable that is updated in the assimilation scheme (SM). Improvements in streamflow predictions investigated by studies in the second group are not exclusively influenced by better representation of SM. The potential improvement of streamflow predictions in the latter case is constrained by the particular runoff mechanisms operating within a catchment. Accordingly, even when a model structure and parametrisation are capable of representing the runoff mechanisms, improving streamflow prediction by reducing error in soil moisture depends on the error covariance between these two components. This error covariance (which in the model space will be defined by the representation of the different sources of uncertainty) may become marginal when the errors in streamflow come mainly from errors in rainfall input data (Crow and Ryu, 2009). This physical constraint is case specific and determines the potential skill of SM-DA for improving streamflow prediction. To understand and assess this skill, further studies focusing on the improvement of streamflow prediction are needed with different model characteristics, such as structure, parametrisation and performance before assimilation; and with different catchment characteristics, such as climate, scale, soils, geology, land cover and density of monitoring network. Among the latter, semi-arid catchments present distinct rainfall-runoff processes which have rarely been studied in SM-DA.
Here we address this gap by studying the Warrego River catchment in Australia, a large and sparsely monitored semi-arid basin. We set up the probability distributed model (PDM) within the catchment, and assimilate passive and active satellite SM products using an ensemble Kalman filter (EnKF) (Evensen, 2003) for the purpose of improving operational flood prediction. We devise an operational SM-DA scheme to answer three main questions. (1) While rainfall is presumably the main driver of flood generation in semi-arid catchments, can we effectively improve streamflow prediction by correcting the antecedent soil water state of the model? (2) What is the impact of accounting for channel routing and the spatial distribution of forcing data on SM-DA performance? (3) What are the prospects for improving streamflow prediction within ungauged sub-catchments using satellite SM?
A series of SM-DA experiments using a lumped version of PDM have already been undertaken in this study catchment by Alvarez-Garreton et al. (2014). They found that assimilating passive microwave satellite SM improved flood prediction, while highlighting specific limitations in their scheme. This paper expands on this previous result in a number of key ways. We improve the representation of model error by explicitly treating forcing, parameter and structural errors. We devise a more robust ensemble generation process by correcting biases in soil moisture and streamflow predictions. We incorporate additional satellite products and apply instrumental variable regression techniques for seasonal rescaling and observation error estimation. Furthermore, we employ a semi-distributed scheme to evaluate the advantages of accounting for channel routing and the spatial distribution of forcing data.
In this paper, Sect. 2 presents a description of the study catchment and the data used. Section 3 presents the methodology, including a description of the rainfall-runoff model, the EnKF formulation and the specific steps for setting up the SM-DA scheme. These include the error model estimation, estimation of profile SM based on the satellite surface data, the rescaling of satellite observations and observation error estimation. Section 4 presents the results and discussion. Section 5 summarises the main conclusions of the study.

Study area and data
The study area is the semi-arid Warrego catchment (42 870 km²) located in Queensland, Australia (Fig. 1). The catchment has an important flooding history, with at least three major floods within the last 15 years. The study area also features geographical and climatological conditions that enable satellite SM retrievals to have higher accuracy than in other areas. These conditions include the size of the catchment, the semi-arid climate and the low vegetation cover. Moreover, the ground-monitoring network within the catchment is sparse, thus satellite data is likely to be more valuable than in well-instrumented catchments. The catchment has summer-dominated rainfall with mean monthly rainfall accumulation of 80 mm in January, and 20 mm in August. Mean maximum daily temperature is above 30 °C in January and below 20 °C in July. The runoff seasonality is characterised by peaks in summer months and minimum values in winter and spring. The mean annual precipitation over the catchment is 520 mm. Regarding the governing runoff mechanisms within the study catchment, Alvarez-Garreton et al. (2014) showed that streamflow has a negligible baseflow component and the surface runoff is generated only when a wetness threshold is exceeded. They concluded that soil moisture exerts an important control on the runoff generation mechanisms. In this work, the analysis of runoff mechanisms is deepened by looking at model predictions (Sect. 3.1).
Daily rainfall data was computed from the Australian Water Availability Project (AWAP), which has a grid resolution of 0.05° (Jones et al., 2009). Hourly streamflow records were collected from the State of Queensland, Department of Natural Resources and Mines (http://watermonitoring.dnrm.qld.gov.au) (Fig. 1). Daily discharge was calculated based on the daily AWAP time convention (9.00 a.m. to 9.00 a.m. local time, UTC +10 h). The flood classification for the study catchment (at the catchment outlet, N7) was provided by the Australian Bureau of Meteorology as river height threshold values, corresponding to minor, moderate and major floods. These threshold values expressed as streamflow (mm day⁻¹) are 0.06, 0.55 and 2.05, respectively, and relate to flood impact rather than recurrence interval. The associated annual exceedance probabilities for the minor, moderate and major floods at N7 are 15.7, 3.1 and 0.95 %, respectively (calculated using the complete daily streamflow record period). Potential evapotranspiration was obtained from the Australian Data Archive for Meteorology database. Daily values were estimated by assuming a uniform daily distribution within a month.
Three satellite products were used here. The first was the Advanced Microwave Scanning Radiometer - Earth Observing System (AMS hereafter) version 5 VUA-NASA Land Parameter Retrieval Model Level 3 gridded product (Owe et al., 2008). AMS uses C- (6.9 GHz) and X-band (10.65 and 18.7 GHz) radiance observations to derive near-surface soil moisture (2-3 cm depth) using a land-surface radiative transfer model. The product used is in units of volumetric water content (m³ m⁻³) and has a regular grid of 0.25°.
The second product was the TU-WIEN (Vienna University of Technology) ASCAT (ASC hereafter) data produced using the change-detection algorithm (Water Retrieval Package, version 5.4) (Naeimi et al., 2009). ASC transmits electromagnetic waves in C-band (5.3 GHz) and measures the backscattered microwave signal. The change-detection algorithm assumes that land surface characteristics are relatively static over long time periods. Based on this, the differences between instantaneous backscatter coefficients and the historical highest and lowest values for a given incidence angle are related to changes in soil moisture (Wagner et al., 1999). The final SM estimate is provided in relative terms as the degree of saturation and has a nominal spatial resolution varying from 25 to 50 km.
The third satellite product was the Soil Moisture and Ocean Salinity satellite (SMO hereafter), version RE01 (Reprocessed 1-day global soil moisture product) SM provided by the Centre Aval de Traitement des Donnees. SMO uses L-band (1.4 GHz) detectors to measure microwave radiation emitted from depths of up to approximately 5 cm. Near-surface soil moisture is obtained in units of volumetric water content (m³ m⁻³) at a spatial resolution of approximately 43 km by using the forward physical model inversion described by Kerr et al. (2012). The overpass times of the AMS, ASC and SMO satellites over the study catchment are 1.30, 10.00 and 6.00 a.m./p.m. local time (UTC +10 h), respectively. Figure 2 summarises the period of record of the different data sets.
For each satellite data set, a daily averaged SM was calculated for the complete catchment (or sub-catchment in the case of the semi-distributed scheme). The areal estimate of satellite SM over the catchment was given by averaging the values of ascending and descending satellite passes on days when more than 50 % of the pixels had valid data. For the case of the passive sensors (AMS and SMO), we subtracted the long-term temporal mean of the ascending and descending data sets to remove the systematic bias between them (Brocca et al., 2011; Draper et al., 2009). Then, daily satellite SM was calculated as the average between the mean-removed ascending and descending passes (if both were available) or directly as the mean-removed available pass. For ASC retrievals, given the unbiased ascending and descending measurements, daily satellite SM was calculated from the actual ascending and descending values averaged over the catchment.
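The daily areal averaging described above can be sketched in Python as follows. This is a minimal illustration, not the processing chain used in the study: the function name, the array layout (days by pixels, with NaN for missing retrievals) and the choice to apply the 50 % validity threshold to each pass separately are assumptions made for the example.

```python
import numpy as np


def daily_areal_sm(asc, desc, min_valid_fraction=0.5, remove_mean=True):
    """Daily catchment-average SM from ascending and descending passes.

    asc, desc: arrays of shape (n_days, n_pixels), NaN where no valid retrieval.
    remove_mean=True mimics the treatment of the passive sensors (AMS, SMO);
    remove_mean=False mimics the treatment of ASC.
    Assumption: the validity threshold is applied to each pass separately.
    """
    def areal_mean(passes):
        valid_fraction = np.mean(~np.isnan(passes), axis=1)
        out = np.full(passes.shape[0], np.nan)
        ok = valid_fraction > min_valid_fraction
        out[ok] = np.nanmean(passes[ok], axis=1)
        return out

    a = areal_mean(np.asarray(asc, dtype=float))
    d = areal_mean(np.asarray(desc, dtype=float))
    if remove_mean:
        # Remove the long-term temporal mean of each pass series.
        a = a - np.nanmean(a)
        d = d - np.nanmean(d)

    both = np.vstack([a, d])
    n_available = np.sum(~np.isnan(both), axis=0)
    daily = np.full(a.shape, np.nan)
    has_data = n_available > 0
    # Average of both passes if both exist, otherwise the single available pass.
    daily[has_data] = np.nansum(both[:, has_data], axis=0) / n_available[has_data]
    return daily
```

With `remove_mean=True` the output is an anomaly series, so any absolute level is reintroduced only at the later rescaling step.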

Lumped and semi-distributed model schemes
The probability distributed model (PDM) is a conceptual rainfall-runoff model that has been widely used in hydrologic research and applications (Moore, 2007), mainly over temperate and humid environments. The model was selected from amongst the set of models available within the flood forecasting system managed by the Australian Bureau of Meteorology. This selection was based on both the suitability of PDM to simulate ephemeral rivers (Moore and Bell, 2002) and preliminary analysis comparing PDM against other models such as the Sacramento soil moisture accounting model, which did not perform as well as PDM.

Figure caption (fragment): The lumped PDM scheme is set up over the entire catchment, while the semi-distributed scheme divides the total catchment into seven sub-catchments (SC1-SC7).
PDM is a parsimonious model where the runoff production is controlled by the absorption capacity of the soil (including canopy and surface detention). This process is conceptualised by a store with a distribution of capacities across the catchment, and the spatial distribution of these capacities is described by a probability distribution (Moore, 2007). The spatial variability of store capacities can be related to different soil depths, which was identified as the most dominant factor governing runoff variability in a semi-arid catchment (Jothityangkoon et al., 2001).
In the current formulation, the model treats the soil moisture store ($S_1$ in Fig. 3) over the entire catchment as a distributed variable with capacities ($c$) following a Pareto distribution function, $F(c)$. At a given time, the different stores receive water from rainfall and lose water by evaporation and groundwater recharge (drainage). The shallower stores with less capacity than a critical capacity, $C^*$, start to generate direct runoff while the rest accumulate the water as soil moisture. The proportion of the catchment that generates runoff can therefore be expressed in terms of the Pareto density function, $f(c)$, as

$$F(C^*) = \int_0^{C^*} f(c)\,\mathrm{d}c. \quad (1)$$

In this way, for a time $t$, the soil moisture over the entire catchment, $\theta$ (water content of $S_1$), can be expressed as the summation of all the store capacities greater than $C^*(t)$:

$$\theta(t) = \int_0^{C^*(t)} \left[1 - F(c)\right]\mathrm{d}c. \quad (2)$$

Note that the critical capacity $C^*$ varies in a time interval $\Delta t$ based on the net rainfall rate during that time, $P$,

$$C^*(t + \Delta t) = C^*(t) + P\,\Delta t. \quad (3)$$

Direct runoff is calculated based on Eq. (1) and routed through a cascade of two reservoirs ($S_{21}$ and $S_{22}$ in Fig. 3, with time constants $k_1$ and $k_2$, respectively). Subsurface runoff is estimated based on the drainage from $S_1$ and transformed into baseflow by using a storage reservoir ($S_3$ in Fig. 3 with time constant $k_b$). These are then combined as total runoff, or streamflow. A detailed description of the model conceptualisation and the formulation of the different rainfall-runoff processes is presented in Moore (2007).
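To make the store dynamics concrete, the following Python sketch tracks the critical capacity, the catchment soil moisture and the direct runoff during a wet time step. It assumes the Pareto form commonly used with PDM, F(c) = 1 - (1 - c/c_max)^b, where the maximum store capacity `c_max`, the shape parameter `b` and all numerical values are illustrative assumptions rather than calibrated values from this study. Evaporation and drainage are ignored.

```python
def pareto_cdf(c, c_max, b):
    """Fraction of stores with capacity <= c (assumed Pareto form)."""
    c = min(max(c, 0.0), c_max)
    return 1.0 - (1.0 - c / c_max) ** b


def catchment_storage(c_star, c_max, b):
    """Catchment soil moisture for a critical capacity c_star.

    Closed-form integral of 1 - F(c) from 0 to c_star for the assumed
    Pareto form.
    """
    c_star = min(max(c_star, 0.0), c_max)
    s_max = c_max / (b + 1.0)
    return s_max * (1.0 - (1.0 - c_star / c_max) ** (b + 1.0))


def wetting_step(c_star, net_rain, dt, c_max, b):
    """Advance the critical capacity under non-negative net rainfall.

    Returns the new critical capacity, the new storage and the direct
    runoff (the rainfall depth that could not be stored).
    """
    c_new = min(c_star + net_rain * dt, c_max)
    s_old = catchment_storage(c_star, c_max, b)
    s_new = catchment_storage(c_new, c_max, b)
    runoff = net_rain * dt - (s_new - s_old)
    return c_new, s_new, runoff


# Hypothetical example: uniform capacities (b = 1) up to 100 mm.
c_new, s_new, runoff = wetting_step(
    c_star=50.0, net_rain=10.0, dt=1.0, c_max=100.0, b=1.0
)
```

In this example the saturated fraction grows from 0.5 to 0.6 during the step, so roughly half of the 10 mm of net rainfall becomes direct runoff, which illustrates why the antecedent soil moisture state matters for runoff generation.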
PDM was set up using both a lumped scheme and a semi-distributed scheme (see Fig. 1). The semi-distributed scheme was configured with seven sub-catchments (SC1-SC7), each using the lumped version of PDM. The area and mean annual rainfall of each sub-catchment are summarised in Table 1. The river routing between upstream and downstream sub-catchments in the semi-distributed scheme was represented by a linear Muskingum method (Gill, 1978):

$$S = k_m \left[x I + (1 - x)\,O\right], \quad (4)$$

where $S$ is the storage within the routing reach, $k_m$ is the storage time constant, $I$ and $O$ are the streamflow at the beginning and end of the reach, respectively, and $x$ is a weighting factor parameter. The time constant parameters of the storages $S_{21}$, $S_{22}$ and $S_3$ ($k_1$, $k_2$ and $k_b$, respectively) were scaled by the area of each sub-catchment, and $k_m$ from the Muskingum routing was scaled by the length of the river channel between corresponding nodes. The remaining model and routing parameters of the semi-distributed scheme were treated as homogeneous.
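A linear Muskingum reach can be sketched in Python using the standard finite-difference coefficients that follow from combining the storage relation with the continuity equation. The function name, the initial condition (outflow equal to the first inflow) and the parameter values are assumptions for illustration and are not taken from the study.

```python
def muskingum_route(inflow, k_m, x, dt=1.0):
    """Route an inflow series through one linear Muskingum reach.

    k_m and dt must share the same time unit; x is the weighting factor.
    Assumption: the reach starts in steady state (outflow = first inflow).
    """
    denom = 2.0 * k_m * (1.0 - x) + dt
    c0 = (dt - 2.0 * k_m * x) / denom
    c1 = (dt + 2.0 * k_m * x) / denom
    c2 = (2.0 * k_m * (1.0 - x) - dt) / denom  # c0 + c1 + c2 == 1
    outflow = [inflow[0]]
    for t in range(1, len(inflow)):
        outflow.append(c0 * inflow[t] + c1 * inflow[t - 1] + c2 * outflow[-1])
    return outflow


# Hypothetical flood pulse of 10 units entering the reach on day 1.
pulse = [0.0, 10.0] + [0.0] * 300
routed = muskingum_route(pulse, k_m=2.0, x=0.2)
```

Because the three coefficients sum to one, the routed hydrograph conserves volume: the pulse is delayed and attenuated but not lost.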
The lumped and the semi-distributed models were calibrated by using a genetic algorithm (Chipperfield and Fleming, 1995) with an objective function based on the Nash-Sutcliffe model efficiency (NSE) (Nash and Sutcliffe, 1970). The models were calibrated for the period 1 January 1967 to 31 May 2003 and evaluated for the period 1 June 2003 to 2 March 2014. To make fair comparisons between the two model setups in a scenario where the inner catchments are ungauged, the semi-distributed scheme was calibrated using only the outlet gauge (N7 in Fig. 1). The performance of the calibrated models was evaluated based on the NSE at the catchment outlet (N7, Fig. 1) and, in the case of the semi-distributed scheme, at inner nodes N1 and N3.
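For reference, the NSE used as the calibration objective can be computed as in the short Python sketch below. The function name and the handling of missing values (pairs with a NaN in either series are dropped) are assumptions for illustration.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    ok = ~np.isnan(obs) & ~np.isnan(sim)  # assumption: ignore missing pairs
    obs, sim = obs[ok], sim[ok]
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
```

A simulation that always predicts the observed mean scores 0, so NSE values of 0.67 and 0.77 indicate that the models explain a substantial part of the observed streamflow variance.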
To analyse the runoff mechanisms simulated by the lumped and the semi-distributed schemes, we calculated the lag-correlation between rainfall and streamflow, and between antecedent SM and streamflow. This enables further understanding of the improvement in streamflow that can be expected by improving the simulated SM content through SM-DA.
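A lag-correlation analysis of this kind can be sketched in Python as below. The function name and the use of the Pearson coefficient are assumptions, since the text does not specify the correlation measure.

```python
import numpy as np


def lag_correlation(driver, response, max_lag):
    """Pearson correlation between driver(t - lag) and response(t).

    Returns a dict mapping each lag (in time steps, 0..max_lag) to the
    correlation coefficient.
    """
    driver = np.asarray(driver, dtype=float)
    response = np.asarray(response, dtype=float)
    n = len(driver)
    result = {}
    for lag in range(max_lag + 1):
        x = driver[: n - lag]
        y = response[lag:]
        result[lag] = np.corrcoef(x, y)[0, 1]
    return result
```

Applied to rainfall (or antecedent SM) as the driver and streamflow as the response, the lag with the highest correlation gives an indication of the catchment response time.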

EnKF formulation
The EnKF proposed by Evensen (2003) has been widely used in hydrologic applications given the nonlinear nature of runoff processes. In the EnKF, the error covariance between the model and observations is calculated from Monte Carlo-based ensemble realisations. In this way, the model and observation uncertainties are propagated and the streamflow prediction is treated as an ensemble of equally likely realisations. The uncertainty of the streamflow prediction can be derived from the ensemble, which provides valuable information for operational flood alert systems.
In a state-updating assimilation approach, the state ensemble is created by perturbing forcing data, parameters and/or states of the model with unbiased errors. As we will see in Sect. 3.3, an $N$-member ensemble of model soil moisture, $\theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$, was generated by perturbing rainfall forcing data, the model parameter $k_1$, and $\theta$. Then, the soil water error of member $i$ at time $t$ was estimated as

$$\theta_i'^{-}(t) = \theta_i^{-}(t) - \frac{1}{N}\sum_{j=1}^{N}\theta_j^{-}(t), \quad (5)$$

where the superscript "−" denotes the state prediction prior to the assimilation step. The error vector for time step $t$ was defined as $\theta'^{-}(t) = \{\theta_1'^{-}(t), \theta_2'^{-}(t), \ldots, \theta_N'^{-}(t)\}$ and the error covariance of the model state ($P^{-}$) was estimated at each time step as:

$$P^{-}(t) = \frac{1}{N-1}\,\theta'^{-}(t)\,\theta'^{-}(t)^{T}. \quad (6)$$

When a daily SM observation from AMS, ASC or SMO was available, each member of the background prediction ($\theta^{-}$) was updated. Before being assimilated, each of the three observation data sets was transformed to represent a profile SM and then rescaled to remove systematic differences between the model and the transformed observations (details in Sects. 3.5 and 3.6). We sequentially assimilated an $N$-member ensemble of the transformed and rescaled AMS, ASC and SMO (named $\theta_{ams}$, $\theta_{asc}$ and $\theta_{smo}$, respectively) and updated each member of $\theta^{-}$ with the following three steps:

1. If $\theta_{ams}$ was available at time $t$, we updated the model soil moisture with

$$\theta_i^{+}(t) = \theta_i^{-}(t) + K_1(t)\left[\theta_{ams,i}(t) - H\,\theta_i^{-}(t)\right], \quad (7)$$

where $H$ is an operator that transforms the model state to the measurement space. Since the additive and multiplicative biases between the model predictions and the microwave retrievals were removed via rescaling in a separate step (see Sect. 3.6), $H$ reduced to a unit matrix. The Kalman gain $K_1(t)$ was calculated as

$$K_1(t) = \frac{P^{-}(t)\,H^{T}}{H\,P^{-}(t)\,H^{T} + R_1(t)}, \quad (8)$$

where $R_1(t)$ is the error variance of $\theta_{ams}$ estimated in the rescaling procedure (Sect. 3.6). If $\theta_{ams}$ was not available, $\theta^{+}(t) = \theta^{-}(t)$.

2. If $\theta_{asc}$ was available at time $t$, we updated the model soil moisture with

$$\theta_i^{++}(t) = \theta_i^{+}(t) + K_2(t)\left[\theta_{asc,i}(t) - H\,\theta_i^{+}(t)\right], \quad (9)$$

where $K_2(t)$ was calculated as

$$K_2(t) = \frac{P^{-}(t)\,H^{T}}{H\,P^{-}(t)\,H^{T} + R_2(t)}, \quad (10)$$

$R_2(t)$ is the error variance of $\theta_{asc}$ and $P^{-}$ is the model error covariance re-calculated by applying Eq. (6) to the updated soil moisture $\theta^{+}(t)$. If $\theta_{asc}$ was not available, $\theta^{++}(t) = \theta^{+}(t)$.

3. If $\theta_{smo}$ was available at time $t$, we updated the model soil moisture with

$$\theta_i^{+++}(t) = \theta_i^{++}(t) + K_3(t)\left[\theta_{smo,i}(t) - H\,\theta_i^{++}(t)\right], \quad (11)$$

where $K_3(t)$ was calculated as

$$K_3(t) = \frac{P^{-}(t)\,H^{T}}{H\,P^{-}(t)\,H^{T} + R_3(t)}, \quad (12)$$

$R_3(t)$ is the error variance of $\theta_{smo}$ and $P^{-}$ is the model error covariance re-calculated by applying Eq. (6) to the updated soil moisture $\theta^{++}(t)$. If $\theta_{smo}$ was not available, $\theta^{+++}(t) = \theta^{++}(t)$.
In the case of the semi-distributed scheme, during the updating steps described above, each sub-catchment was treated independently and no spatial cross-correlation in the satellite measurements was considered. The order of the products assimilated in steps 1-3 was arbitrary; however, we checked that different orders did not significantly affect the SM-DA results.
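The sequential update for a single (scalar) soil moisture state can be sketched in Python as follows. This is an illustrative sketch and not the authors' code: it takes H = 1, as stated above, and assumes that the observation ensemble is generated by adding zero-mean Gaussian noise with the estimated observation error variance, which the text does not spell out. Function names and numbers are hypothetical.

```python
import numpy as np


def enkf_update(theta, obs, obs_err_var, rng):
    """EnKF update of a scalar-state ensemble with one observation (H = 1).

    theta: background ensemble, shape (N,).
    Assumption: the observation is perturbed with N(0, obs_err_var) noise.
    """
    p = np.var(theta, ddof=1)          # ensemble estimate of the state error variance
    gain = p / (p + obs_err_var)       # Kalman gain for H = 1
    obs_ens = obs + rng.normal(0.0, np.sqrt(obs_err_var), size=theta.shape)
    return theta + gain * (obs_ens - theta)


def sequential_update(theta, observations, rng):
    """Assimilate the available products one after another.

    observations: list of (value, error_variance) in assimilation order,
    with value set to None when a product is missing on that day.
    The state error variance is re-estimated before each update.
    """
    for value, err_var in observations:
        if value is None:
            continue  # state passes through unchanged
        theta = enkf_update(theta, value, err_var, rng)
    return theta


rng = np.random.default_rng(0)
background = rng.normal(0.30, 0.05, size=500)
# Hypothetical day: first product available, second missing, third available.
analysis = sequential_update(
    background, [(0.40, 0.0025), (None, 0.01), (0.38, 0.01)], rng
)
```

The gain weights each observation by the ratio of model to total error variance, so a product with a large estimated error variance moves the ensemble only slightly.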

Error model representation
The main sources of uncertainty in hydrologic models are the errors in the forcing data, the model structure and the incorrect specification of model parameters (Liu and Gupta, 2007). Generally, these errors are represented by adding unbiased synthetic noise to forcing variables, model state variables and/or model parameters.
The estimation of model errors is among the most crucial challenges in data assimilation, as it determines the value of the Kalman gain. In the case of a state-updating SM-DA, the ability of the scheme to improve streamflow prediction will mainly depend on the covariance between the errors in SM states and modelled streamflow, which directly depends on the specific representation and estimation of the model errors.
To represent the forcing uncertainty, we adopted a multiplicative error model for the rainfall data (McMillan et al., 2011; Tian et al., 2013). In particular, we followed the scheme used in various SM-DA studies (e.g. Chen et al., 2011; Brocca et al., 2012; Alvarez-Garreton et al., 2014) and represented a spatially homogeneous rainfall error ($\epsilon_p$) as

$$\epsilon_p \sim \ln N(1, \sigma_p^2), \quad (13)$$

where $\sigma_p$ is the standard deviation of the log-normal distribution. The above representation assumes a spatially homogeneous fraction of the error to the rainfall intensity, which could be an over-simplification in a large area like the study catchment. However, it avoids the estimation of additional error parameters (e.g. a spatial correlation parameter) in an already highly underdetermined problem (see Sect. 3.4).
The parameter uncertainty was represented by perturbing the time constant parameter ($k_1$) for store $S_{21}$, a highly sensitive parameter of the model that directly affects the streamflow generation by influencing the water stored in both surface storages $S_{21}$ and $S_{22}$ (note that in the PDM formulation used, the time constant $k_2$ is calculated as a function of $k_1$). Given the lack of prior information about the structure of the parameter error ($\epsilon_k$), we adopted a normally distributed multiplicative error with unit mean and standard deviation of $\sigma_k$, following previous SM-DA studies working with rainfall-runoff models (Brocca et al., 2010b, 2012).
Following the scheme used in most SM-DA experiments (e.g. Reichle et al., 2008; Crow and Van den Berg, 2010; Chen et al., 2011; Hain et al., 2012), the model structural error was represented by perturbing the SM prediction ($\theta$) with a spatially homogeneous additive random error ($\epsilon_s$),

$$\epsilon_s \sim N(0, \sigma_s^2), \quad (14)$$

where $\sigma_s$ is the standard deviation of the normal distribution.
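The three perturbations can be generated as in the Python sketch below. It assumes that the log-normal rainfall multiplier has unit mean and that sigma_p is the standard deviation of the multiplier itself (not of its logarithm), which requires converting to the parameters of the underlying normal distribution. Function names and the example values are hypothetical.

```python
import numpy as np


def rainfall_multiplier(sigma_p, size, rng):
    """Log-normal multiplicative rainfall error with mean 1 and std sigma_p.

    Assumption: sigma_p refers to the multiplier, so the underlying normal
    has variance ln(1 + sigma_p^2) and mean -0.5 * ln(1 + sigma_p^2).
    """
    var_log = np.log(1.0 + sigma_p ** 2)
    return rng.lognormal(mean=-0.5 * var_log, sigma=np.sqrt(var_log), size=size)


def parameter_multiplier(sigma_k, size, rng):
    """Normal multiplicative error for k1 with unit mean and std sigma_k."""
    return rng.normal(1.0, sigma_k, size=size)


def state_error(sigma_s, size, rng):
    """Zero-mean additive normal error for the soil moisture state."""
    return rng.normal(0.0, sigma_s, size=size)


rng = np.random.default_rng(0)
n_members = 100                      # hypothetical ensemble size
rain = 12.0                          # hypothetical daily rainfall (mm)
perturbed_rain = rain * rainfall_multiplier(0.5, n_members, rng)
perturbed_k1 = 8.0 * parameter_multiplier(0.1, n_members, rng)
theta_noise = state_error(2.0, n_members, rng)
```

A multiplicative rainfall error leaves dry days dry, which is a convenient property in a semi-arid catchment with many zero-rainfall days.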
The physical limits of SM (porosity as an upper bound and residual water content as a lower bound) are represented by the model through the storage capacity of $S_1$. When $\theta$ approaches the limits of $S_1$, applying unbiased perturbation to $\theta$ can lead to truncation bias in the background prediction. This can then result in mass balance errors and degrade the performance of the EnKF (Ryu et al., 2009). Moreover, the Kalman filter assumes unbiased state variables. This issue is of particular importance in arid regions like the study area, where the soil water content can be rapidly depleted by evapotranspiration and transmission losses, thus approaching the residual water content of the soil. To ensure that the state ensemble remained unbiased after perturbation we implemented the bias correction scheme proposed by Ryu et al. (2009).
The truncation bias correction consisted of running a single unperturbed model prediction ($\theta^{-0}$) in parallel with the perturbed model prediction ($\tilde{\theta}_i^{-}$). At each time step, the mean bias, $\delta(t)$, of the $N$-member ensemble prediction was calculated by subtracting $\theta^{-0}(t)$ from the ensemble mean, as follows (Ryu et al., 2009):

$$\delta(t) = \frac{1}{N}\sum_{i=1}^{N}\tilde{\theta}_i^{-}(t) - \theta^{-0}(t). \quad (15)$$

Then, a bias-corrected ensemble of state variables, $\theta_i^{-}(t)$, was obtained by subtracting $\delta(t)$ from each member of the perturbed ensemble, $\tilde{\theta}_i^{-}(t)$. Although the latter resulted in unbiased state ensembles, some important but subtle effects remain that arise from the highly non-linear nature of hydrologic models. These need to be guarded against in SM-DA. Representing model errors by adding unbiased perturbation to forcing, model parameters and/or model states can lead to a biased streamflow ensemble prediction (e.g. Ryu et al., 2009; Plaza et al., 2012), compared with the unperturbed model run. This biased streamflow ensemble prediction (open-loop hereafter) is degraded compared with the streamflow predicted by the unperturbed calibrated model. As a consequence, improvement of the open-loop after SM-DA will in part be due to the correction of bias introduced during the assimilation process itself.
To avoid overstating the SM-DA efficacy due to the above issue, we applied the bias correction scheme directly to the streamflow prediction (in both the open-loop and the assimilation runs). We used the unperturbed model run to estimate a mean bias in the streamflow (following Eq. 15, but using streamflow instead of soil moisture) and then corrected each ensemble member by subtracting this mean bias. This practical tool ensures that the streamflow ensemble mean maintains the performance skill of the unperturbed (calibrated) model run, thus avoiding artificial degradation of the unperturbed model run by bias. To our knowledge, this approach has not been applied in previous SM-DA studies.

Error model parameters calibration
To calibrate the error model parameters ($\sigma_p$, $\sigma_k$ and $\sigma_s$), we evaluated the open-loop ensemble prediction ($Q_{ol}$) against the observed streamflow at the catchment outlet. In this study we used a maximum a posteriori (MAP) scheme, a Bayesian inference procedure detailed by Wang et al. (2009) that maximises the probability of observing historical events given the model and error parameters. In other words, it maximises the probability of having the streamflow observation within the open-loop streamflow ensemble.
Member $i$ of the $N$-member open-loop, $Q_{ol,i}$, can be expressed (Eq. 16) in terms of $Q_T$, the (unknown) true streamflow, and $\epsilon_m$, the error of the streamflow prediction, which consists of forcing, parameter and state errors (Eq. 17). The observed streamflow at N7 ($Q_{obs}$) can be expressed as a function of the same (unknown) truth and the streamflow observation error ($\epsilon_{obs}$) (Eq. 18). Combining Eqs. (16) and (18) gives the model ensemble prediction of the observed streamflow, $\hat{Q}_{obs}$ (Eq. 19). Following Li et al. (2014), $\epsilon_{obs}$ was assumed to be a serially independent multiplicative error following a normal distribution (mean 1 and standard deviation of 0.2). Then, the likelihood function ($L$) defining the probability of observing the historical streamflow data given the calibrated model parameters ($x$), and the error model parameters ($\sigma_p$, $\sigma_k$ and $\sigma_s$), was expressed as

$$L(x, \sigma_p, \sigma_k, \sigma_s) = \prod_{t} p\left(Q_{obs}(t) \mid x, \sigma_p, \sigma_k, \sigma_s\right). \quad (20)$$

To maximise $L$, we applied a logarithm transformation to it and maximised the sum over time of the transformed function. The probability density function ($p$) at each time step was estimated by assuming that the ensemble prediction of the observed streamflow, $\hat{Q}_{obs}(t)$, follows a Gaussian distribution, with its mean and standard deviation computed using the ensemble members. The period used to calibrate the error model parameters was 1 January 1998 to 31 May 2003.
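The log-transformed likelihood can be evaluated from an ensemble as in the Python sketch below, which follows the Gaussian assumption stated above. The function name, the array layout and the small floor on the ensemble standard deviation (to avoid division by zero on days when all members predict the same flow) are assumptions added for the example.

```python
import numpy as np


def ensemble_log_likelihood(q_obs, q_hat_ens, sd_floor=1e-6):
    """Sum over time of the Gaussian log-density of the observed streamflow.

    q_obs: observed streamflow, shape (n_steps,).
    q_hat_ens: ensemble prediction of the observed streamflow,
               shape (n_members, n_steps).
    sd_floor: hypothetical lower bound on the ensemble standard deviation.
    """
    q_obs = np.asarray(q_obs, dtype=float)
    q_hat_ens = np.asarray(q_hat_ens, dtype=float)
    mu = q_hat_ens.mean(axis=0)
    sd = np.maximum(q_hat_ens.std(axis=0, ddof=1), sd_floor)
    log_p = -0.5 * np.log(2.0 * np.pi * sd ** 2) - 0.5 * ((q_obs - mu) / sd) ** 2
    return float(np.sum(log_p))
```

A calibration routine would regenerate the open-loop ensemble for each candidate set of error parameters and retain the set with the highest value of this function.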
An important aspect to highlight about this error parameter calibration is that it is a highly underdetermined problem. Only one data set (streamflow at N7) is used to calibrate the error parameters, while there might be many combinations of error parameters that can generate similar streamflow ensembles (equifinality on the error parameters).

Profile soil moisture estimation
The aim of the stochastic assimilation detailed in Sect. 3.2 is to correct $\theta$, which is a profile average SM representing a soil layer depth determined by calibration. By assuming a porosity of 0.46 (A-horizon information reported in McKenzie et al., 2000), and the model $S_1$ storage capacity of 396 mm (420 mm) for the lumped (semi-distributed) scheme, this profile SM roughly represents the upper 1 m of the soil. On the other hand, the satellite SM observations represent only the few top centimetres of the soil column (see Sect. 2). To provide the model with information about more realistic dynamics of $\theta$, we applied the exponential filter proposed by Wagner et al. (1999) to the satellite SM to estimate the soil wetness index (SWI) of the root-zone. SWI has been widely used to represent deeper layer SM based on satellite observations (e.g. Albergel et al., 2008; Brocca et al., 2009, 2010b, 2012; Ford et al., 2014; Qiu et al., 2014). SWI was recursively calculated as:

$$\mathrm{SWI}(t) = \mathrm{SWI}(t-1) + G(t)\left[\mathrm{SSM}(t) - \mathrm{SWI}(t-1)\right], \quad (21)$$

where $\mathrm{SSM}(t)$ is the satellite SM observation and $G(t)$ is a gain term varying between 0 and 1 as:

$$G(t) = \frac{G(t-1)}{G(t-1) + e^{-1/T}}. \quad (22)$$

$T$ is a calibrated parameter that implicitly accounts for several physical parameters (Albergel et al., 2008). $T$ was calibrated by maximising the correlation between SWI and the unperturbed model soil moisture ($\theta$) during the first year of satellite data. This calibration period was selected to maximise the independent evaluation period (see Sect. 3.7); however, more representative values are likely to be obtained if a longer period is used for calibration. SWI was calculated independently for each of the AMS, ASC and SMO data sets (named $\mathrm{SWI}_{\mathrm{AMS}}$, $\mathrm{SWI}_{\mathrm{ASC}}$ and $\mathrm{SWI}_{\mathrm{SMO}}$, respectively) and then rescaled to remove systematic differences with the model prediction (Sect. 3.6).
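The recursive exponential filter can be sketched in Python as follows. The initialisation (SWI equal to the first observation, gain equal to 1) and the treatment of data gaps (the exponent uses the number of days since the previous observation) follow the recursive formulation of Albergel et al. (2008) and are assumptions here, as the text does not state them. Days without a retrieval are left as NaN.

```python
import numpy as np


def exponential_filter(ssm, t_param):
    """Soil wetness index from a daily surface SM series.

    ssm: daily satellite SM, NaN where no observation is available.
    t_param: characteristic time length T, in days.
    """
    ssm = np.asarray(ssm, dtype=float)
    swi = np.full(ssm.shape, np.nan)
    gain = 1.0
    previous_swi = None
    previous_day = None
    for day, value in enumerate(ssm):
        if np.isnan(value):
            continue  # no update without an observation
        if previous_swi is None:
            current = value  # assumed initialisation
        else:
            elapsed = day - previous_day
            gain = gain / (gain + np.exp(-elapsed / t_param))
            current = previous_swi + gain * (value - previous_swi)
        swi[day] = current
        previous_swi, previous_day = current, day
    return swi
```

Larger values of `t_param` give a smoother, more slowly varying SWI, which is consistent with a deeper soil layer. The calibration described above would evaluate the correlation between this output and the model soil moisture over a range of `t_param` values.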

Rescaling and observation error estimation
The systematic differences (e.g. biases) between θ and the SWI derived from each satellite product must be removed prior to applying a bias-blind data assimilation scheme (Dee and Da Silva, 1998). We applied instrumental variable (IV) regression to resolve the biases and estimate the measurement errors simultaneously (Su et al., 2014a). In three-data IV regression analysis, also known as triple collocation (TC) analysis (Stoffelen, 1998; Yilmaz and Crow, 2013), the model θ, the passive SWI and the active SWI are used as the data triplet. As the sample size requirement for TC is stringent (Zwieback et al., 2012), a pragmatic threshold of 100 triplet samples was imposed (Scipal et al., 2008). During periods when only one satellite product was available (i.e. before ASC) or when the sample threshold for TC was not met, a two-data set IV regression using lagged variables (LV) was applied as a practical substitute (Su et al., 2014a). The LV analysis was performed on the model θ and a single satellite SWI, with the lagged variable coming from the model. In most SM-DA experiments, the error in satellite SM has been treated as time-invariant (e.g. Reichle et al., 2008; Ryu et al., 2009; Crow and Van den Berg, 2010; Brocca et al., 2010b, 2012; Alvarez-Garreton et al., 2014); however, studies evaluating satellite SM products have shown an important temporal variability in the measurement errors (Loew and Schlenz, 2011; Su et al., 2014a). Since a data assimilation scheme explicitly updates the model prediction based on the relative weights of the model and the observation errors, assuming a constant observation error may lead to overcorrection of the model state if the actual error is higher than assumed, and to undercorrection if it is lower.
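The effect of a mis-specified observation error can be illustrated with a scalar Kalman update, used here as a simplified stand-in for the ensemble scheme. The function name and all numbers are hypothetical and chosen only to show how the weight given to the observation changes with the assumed error variance.

```python
def kalman_update(prior, prior_var, obs, obs_var):
    """Scalar Kalman update: returns the updated state and the gain."""
    gain = prior_var / (prior_var + obs_var)
    return prior + gain * (obs - prior), gain

# Hypothetical case: model state 100 mm (variance 400 mm^2), observation 140 mm.
# Using the actual observation error variance (400 mm^2) gives equal weights.
state_true_err, gain_true_err = kalman_update(100.0, 400.0, 140.0, 400.0)

# Assuming a smaller constant error (100 mm^2) pulls the state further
# towards the observation than the actual error justifies.
state_small_err, gain_small_err = kalman_update(100.0, 400.0, 140.0, 100.0)
```

With these numbers the gain rises from 0.5 to 0.8 and the updated state from 120 mm to 132 mm, which is the overcorrection described above.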
Temporal characterisation of the observation error can be achieved by applying TC (or LV) to specific time windows of the observations and model predictions (for example, by grouping the triplets or doublets by month of the year). There is, however, a trade-off between the sampling window (which defines the temporal characterisation of the error) and the sample size (number of triplets in each subset). In an operational context this trade-off becomes more critical, since only past observations are available. After analysing the temporal variability of the observation errors using the complete period of record (not shown here), we found that a 4-month sampling window can reproduce seasonality in errors while ensuring sufficient data samples for the TC and LV schemes. With this analysis we also assessed the suitability of using LV, which yielded similar results to TC, although some positive bias in the LV error variance estimates relative to TC was noted (not shown here).
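A covariance-based form of triple collocation can be sketched as follows, assuming the usual linear error model with mutually uncorrelated errors and taking the model θ as the reference. This is a generic textbook formulation rather than the exact implementation used in the study; the function and key names are assumptions. The LV variant would use the lagged model state θ(t − 1) in place of the third data set.

```python
import numpy as np

def triple_collocation(x, y, z):
    """Covariance-based TC with x as the reference data set.

    Assumes x = t + e_x, y = a_y + b_y * t + e_y, z = a_z + b_z * t + e_z,
    with errors uncorrelated with each other and with the signal t.
    Error variances are returned in each data set's own units.
    """
    c = np.cov(np.vstack([x, y, z]))
    cxy, cxz, cyz = c[0, 1], c[0, 2], c[1, 2]
    return {
        "beta_y": cyz / cxz,                      # scaling of y relative to x
        "beta_z": cyz / cxy,                      # scaling of z relative to x
        "err_var_x": c[0, 0] - cxy * cxz / cyz,
        "err_var_y": c[1, 1] - cxy * cyz / cxz,
        "err_var_z": c[2, 2] - cxz * cyz / cxy,
    }

def rescale_to_reference(y, x, beta_y):
    """Map y into the reference (x) space using the TC scaling factor."""
    y = np.asarray(y, dtype=float)
    return np.mean(x) + (y - np.mean(y)) / beta_y
```

The error variance of the rescaled observation, expressed in reference units, is then `err_var_y / beta_y**2`. Applying the function to each seasonal pool separately gives the seasonal error characterisation discussed in the text.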
Summarising, the procedure for rescaling and error estimation consists of the following steps:
1. From the start of the AMS data set, we grouped LV triplets (SWI AMS (t), θ (t) and θ (t − 1)) into three subsets: December-March, April-July and August-November.
2. We applied LV, and thus estimated the observation error variance and rescaling factors for a given 4-month subset, only when a minimum of 100 samples was reached (after one year of the AMS data set). After the first year of AMS, new seasonal triplets were added to the corresponding 4-month data pool (retaining all earlier triplets) and LV was applied to the updated subset.
3. When ASC was available, LV triplet (SWI ASC (t), θ (t) and θ (t − 1)) subsets were formed following the step 1 criteria, and LV was applied after the 4-month data pools had more than 100 samples, following step 2.
4. In parallel with step 3, TC triplets were formed using the two available satellite data sets (SWI AMS (t), SWI ASC (t) and θ (t)) and grouped into the 4-month subsets defined in step 1. TC was applied only when the 4-month data pools contained more than 100 samples (after approximately 3 years of ASC data).
5. Steps 3 and 4 were repeated when SMO was available. The triplets for TC in this case were given by SWI ASC (t), SWI SMO (t) and θ (t).
6. Once steps 1-5 were complete, a single time series of observation error variance and rescaling factors was constructed for each satellite-derived SWI by selecting TC results when available, and LV results if not. This criterion was adopted because LV is susceptible to bias due to auto-correlated errors in the model SM (Su et al., 2014a).
The rescaled observations from AMS, ASC and SMO were named θ ams, θ asc and θ smo, respectively.
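The bookkeeping behind the procedure above (seasonal pooling, the 100-sample threshold and the preference for TC over LV) can be sketched as follows. The class and function names are hypothetical, and treating the threshold as "at least 100" is an assumption, since the text uses both "a minimum of 100" and "more than 100".

```python
from collections import defaultdict

MIN_SAMPLES = 100  # pragmatic threshold quoted in the text

def season_of(month):
    """Map a calendar month (1-12) to its 4-month pool."""
    if month in (12, 1, 2, 3):
        return "Dec-Mar"
    if month in (4, 5, 6, 7):
        return "Apr-Jul"
    return "Aug-Nov"

class SeasonalPools:
    """Accumulates triplets per season, retaining all earlier samples."""

    def __init__(self):
        self.pools = defaultdict(list)

    def add(self, month, triplet):
        self.pools[season_of(month)].append(triplet)

    def ready(self, season):
        # Estimation is only attempted once the pool is large enough.
        return len(self.pools[season]) >= MIN_SAMPLES

def choose_estimate(tc_result, lv_result):
    """Prefer the TC estimate; fall back to LV when TC is unavailable."""
    return tc_result if tc_result is not None else lv_result
```

Because the pools only ever grow with past data, the scheme can be run operationally: at any date, the estimate for the current season uses only triplets observed up to that date.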

Evaluation metrics
To evaluate the SM-DA results, we used six different metrics. Firstly, the normalised root mean square error (NRMSE) was calculated as the ratio of the root mean square error (RMSE) between the updated streamflow ensemble (Q up) and the observed streamflow to the RMSE between the open-loop ensemble (streamflow prediction without assimilation, Q ol) and the observed streamflow, with both RMSE values computed over the N = 1000 ensemble members. The NRMSE provides information about both the spread of the ensemble and the performance of the ensemble mean, which is considered the best estimate of the ensemble prediction. Moreover, as it is calculated in linear streamflow space, it gives more weight to high flows.
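A possible reading of the NRMSE is sketched below. It assumes that the RMSE is computed for each ensemble member over time and then averaged over the N members, which is one way of making the metric sensitive to both the spread and the ensemble mean; the paper's exact equation is not reproduced here, and the function names are illustrative.

```python
import numpy as np

def ensemble_rmse(ens, obs):
    """Mean over members of each member's RMSE against the observations.

    ens : array of shape (n_members, n_times)
    obs : array of shape (n_times,)
    """
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.mean(np.sqrt(np.mean((ens - obs) ** 2, axis=1))))

def nrmse(ens_updated, ens_open_loop, obs):
    """Values below 1 indicate that the updated ensemble is closer to obs."""
    return ensemble_rmse(ens_updated, obs) / ensemble_rmse(ens_open_loop, obs)
```

Under this definition, two ensembles with the same mean but different spread give different NRMSE values, so a narrower ensemble is rewarded even when its mean does not change.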
To further evaluate the performance of the ensemble mean, we calculated the Nash-Sutcliffe efficiency (NSE) for the entire evaluation period (Eq. 24, written for the open-loop case, in which the simulated series is the open-loop ensemble mean, Q ol). Similarly, NSE up was calculated by applying Eq. (24) to the updated ensemble mean (Q up). We also estimated the probability of detection (POD) of daily flow rates (not flood events) exceeding minor, moderate and major flood levels, for the open-loop and the updated ensemble mean (Eq. 25, written for the open-loop case), where the symbol # represents the number of times a condition is met and Q 15.7 % obs is the observed streamflow corresponding to a minor flood classification. This corresponds to a flow (not flood) frequency of 15.7 % (see Sect. 2). Similarly, POD up was calculated by applying Eq. (25) to the updated ensemble mean (Q up). We estimated the false alarm ratio (FAR) for daily flows in the same way (Eq. 26, written for the open-loop case), and FAR up was calculated by applying Eq. (26) to the updated ensemble mean. Finally, we calculated the aggregated peak volume error (PVE, in mm) of the ensemble mean for days when the observed streamflow was above the minor flood classification (t * days in Eq. 27).
To evaluate the skill of the streamflow ensemble prediction before and after SM-DA, we calculated the continuous ranked probability score (CRPS; Robertson et al., 2013). CRPS is used as a measure of the ensemble errors. In the case of the deterministic unperturbed run, CRPS reduces to the mean absolute error. The reliability of the ensembles was also evaluated by inspecting the rank histograms of the ensemble following Anderson (1996). A reliable ensemble should have a uniform histogram, while a u-shaped (n-shaped) histogram indicates that the ensemble spread is too small (large) (De Lannoy et al., 2006).
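The deterministic metrics applied to the ensemble mean can be sketched with their standard definitions. The threshold argument stands for the flood-level flow (for example the minor flood level), and the function names are illustrative; the FAR is taken here as false alarms divided by all predicted exceedances, which is the common convention and is assumed to match the paper.

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency of a simulated series."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pod(sim, obs, threshold):
    """Fraction of observed exceedance days that were also simulated."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    observed = obs > threshold
    return np.sum((sim > threshold) & observed) / np.sum(observed)

def far(sim, obs, threshold):
    """Fraction of simulated exceedance days that were not observed."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    predicted = sim > threshold
    return np.sum(predicted & ~(obs > threshold)) / np.sum(predicted)
```

Both POD and FAR are computed on daily flows rather than on flood events, so a multi-day flood contributes one count per day above the threshold.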
The evaluation period for the SM-DA was 1 June 2003 to 2 March 2014. This period is independent of all scheme component calibration periods (see Sects. 3.1, 3.4 and 3.5).
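The ensemble skill and reliability measures used over this evaluation period can be sketched as follows. The CRPS uses the standard empirical-ensemble estimator (mean absolute error minus half the mean absolute difference between members), and the rank histogram counts how many members fall below each observation. Function names are illustrative, and ties between members and observations (frequent on zero-flow days) are not randomised in this sketch.

```python
import numpy as np

def crps_ensemble(ens, obs):
    """Time-averaged empirical CRPS.

    ens : array of shape (n_members, n_times); a 1-D array is treated
          as a single deterministic member
    obs : array of shape (n_times,)
    """
    ens = np.atleast_2d(np.asarray(ens, dtype=float))
    obs = np.asarray(obs, dtype=float)
    m = ens.shape[0]
    abs_err = np.mean(np.abs(ens - obs), axis=0)
    # Mean absolute difference between all member pairs, computed from
    # the sorted ensemble to avoid an (m x m) array per time step.
    weights = 2 * np.arange(1, m + 1) - m - 1
    spread = 2.0 / m**2 * np.sum(weights[:, None] * np.sort(ens, axis=0), axis=0)
    return float(np.mean(abs_err - 0.5 * spread))

def rank_histogram(ens, obs):
    """Counts of the observation's rank (0..n_members) within the ensemble."""
    ens = np.atleast_2d(np.asarray(ens, dtype=float))
    obs = np.asarray(obs, dtype=float)
    ranks = np.sum(ens < obs, axis=0)
    return np.bincount(ranks, minlength=ens.shape[0] + 1)
```

With a single member the spread term vanishes and the score equals the mean absolute error, consistent with the statement that CRPS reduces to the mean absolute error for the deterministic unperturbed run.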

Model calibration
The streamflow at the outlet of the study catchment (N7 in Fig. 1) features long periods of zero-flow, a negligible baseflow component and sharp flow peaks after rainfall events, when the catchment has reached a threshold level of wetness (see observed streamflow in Fig. 4).
The simulated streamflows from the lumped and the semi-distributed schemes are presented in Fig. 4. To help visualisation of these time series, the calibration and evaluation periods were plotted separately. The evaluation period was further separated into two sub-periods: evaluation sub-period 1 (1 June 2003 to 30 April 2007), characterised by having only moderate and minor floods, and evaluation sub-period 2 (1 May 2007 to 2 March 2014), which had three major flooding events. The plots show that both the lumped and the semi-distributed models are generally able to capture the hydrologic behaviour of the catchment. As expected, the spatial distribution of forcing data and the channel routing accounted for by the semi-distributed scheme enhanced the overall performance of the model, with lower residual values through time (panels a.2, b.2 and c.2 in Fig. 4) and a consistently improved simulation of peak flows.
Table 2 presents the evaluation statistics for the streamflow prediction in the calibration and evaluation periods, for both the catchment outlet and the inner catchments (note that N1 does not have data in the calibration period). The statistics show that, at the catchment outlet, the semi-distributed scheme consistently performs better than the lumped scheme in terms of RMSE, NSE, PVE and CRPS. Both schemes show better statistics in the evaluation period due to the higher flows over that period.
The good performance of the semi-distributed scheme at the catchment outlet was not reflected at the inner catchments. To explore the reasons for this poor performance, we separately calibrated the model parameters in those sub-catchments by using all the available N7, N1 and N3 observations. The results (not shown here) revealed that, in this case, the model was able to adequately simulate streamflow in those sub-catchments (NSE in the evaluation period of 0.78, 0.69 and 0.84 at the N1, N3 and N7 nodes, respectively). Based on this, we argue that the poor model performance in the "ungauged" inner catchments is most likely due to sub-optimal parameter estimation (due to the limited information about catchment heterogeneity provided by the integrated catchment streamflow response) and unlikely to be due to errors in the input data or model structure.
To focus the analysis of the catchment runoff mechanisms on periods with flood events, the lag-correlation between the daily streamflow simulated at N7 and θ (Fig. 5), and between daily streamflow and the daily rainfall (Fig. 6), was calculated for daily streamflow values greater than Q 15.7 % obs , or minor flood level.The lumped scheme indicates a stronger link between θ and streamflow than the semi-distributed scheme.This is represented by higher r values in panel (a) compared with panels (b)-(h) in Fig. 5. Conversely the link between rainfall and streamflow is weaker in the lumped scheme (lower r values in panel (a) compared with panels (b)-(h) in Fig. 6).These different representations of the catchment runoff response will have a direct impact on the skill of SM-DA to improve streamflow prediction.A strong relationship between θ and streamflow prediction suggests strong correlation between their errors, and therefore, greater potential improvement of streamflow resulting from an improved representation of θ .
If we assume that the semi-distributed scheme provides a better representation of runoff response within the entire catchment (based on its better model performance at the outlet), Figs. 5 and 6 also suggest that daily rainfall is the main control on runoff generation and thus has a stronger impact on the streamflow prediction than soil moisture. Figure 5 shows that flood prediction strongly depends on antecedent soil moisture for up to the preceding 3 days. The strong correlation found at lag-0 suggests that the real-time SM correction given by the proposed SM-DA would be a good strategy to improve flood prediction.
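The lag-correlation analysis discussed above, restricted to days on which streamflow exceeds the minor flood level, can be sketched as follows. The function name and arguments are illustrative; `x` stands for either the model soil moisture θ or the daily rainfall.

```python
import numpy as np

def lag_correlation_above_threshold(q, x, thr, max_lag):
    """Correlation between q(t) and x(t - lag) for days with q(t) > thr.

    q       : daily streamflow series
    x       : daily driver series (soil moisture or rainfall), same length
    thr     : streamflow threshold, e.g. the minor flood level
    max_lag : largest lag in days
    """
    q = np.asarray(q, dtype=float)
    x = np.asarray(x, dtype=float)
    flood_days = np.flatnonzero(q > thr)
    result = {}
    for lag in range(max_lag + 1):
        idx = flood_days[flood_days >= lag]  # drop days without a lagged value
        result[lag] = float(np.corrcoef(q[idx], x[idx - lag])[0, 1])
    return result
```

Conditioning on flood days focuses the correlation on the runoff-generating periods, at the cost of a much smaller sample than the full record.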

Error model parameters and ensemble prediction
The calibrated error parameters for the lumped and the semi-distributed schemes are σ p = 1.286 mm and 0.977 mm; σ s = 0.099 and 0.03; and σ k = 0.084 and 0.018, respectively. σ s is expressed as a percentage of the total storage capacity (396 mm in the lumped scheme and 420 mm in the semi-distributed scheme) and σ k is expressed as a percentage of the calibrated parameter k 1.
The rank histograms of the generated ensemble prediction (open-loop) are presented in Fig. 7. The histograms at the catchment outlet (N7) are either n-shaped or displaced to one side, for both the lumped and semi-distributed model schemes (Fig. 7a and b, respectively). This suggests that the open-loop ensembles are slightly biased (with respect to the observed streamflow) and feature wider spread than an ideal ensemble. The width of the spread will be critical for the evaluation of SM-DA (Sect. 4.4), since any decrease of the spread would be considered as an improvement of the ensemble prediction.
The wider spread of the open-loop ensembles at the catchment outlet could be due to factors such as an over-prediction of error parameters by the MAP calibration algorithm, or the representation of the model error with time-constant error parameters.The latter becomes critical given the distinct behaviour of the intermittent streamflow response within the catchment, which could indicate distinct behaviour in the model errors as well.
The ensemble predictions at the inner nodes N1 and N3 (Fig. 7c and d, respectively) feature high bias with respect to the observed streamflow (note that observations at N1 and N3 were not used to calibrate the error parameters). The large bias at these inner nodes results from the large errors in the calibrated model in SC1 and SC3 (see Sect. 4.1).

SWI estimation and rescaling
The satellite SM derived from AMS, ASC and SMO is presented in Fig. 8a for the lumped model. The satellite data sets feature significantly higher noise than the modelled θ. This can be explained by factors such as random errors in the satellite retrievals (Su et al., 2014b) and the rapid variation of water content in the surface layer of soil due to infiltration and evapotranspiration losses. Figure 8b presents the SWI derived from the satellite products, after seasonal rescaling (θ ams, θ asc and θ smo). This plot shows better agreement between model and observations due to SWI filtering/transformation, although higher noise is still present in the rescaled SWI time series.
Figure 8c shows the seasonal observation error variance, and reveals a clear variation in the error with time. The variation of the seasonal error values is due to the alternating use of TC or LV and to the increasing sample size of each seasonal pool (see Sect. 3.6), which should reduce the uncertainties coming from finite sample size. One limitation of this procedure is its assumption that the errors vary seasonally without inter-annual variability. Since there are inter-annual cycles (wet and dry years), one may also expect the errors to vary with year. Ideally, moving-window estimation with windows smaller than 3 months should be considered, but that would cause greater sampling uncertainties for the TC and LV estimates. The inverse relationships between the θ ams and θ asc error variances at some times could be due to the passive retrieval by AMS compared with the active ASC, among other factors.
A common error standard deviation value used in previous SM-DA studies is 3 % m³ m⁻³ (e.g. Chen et al., 2011). This constant error, when transformed according to the soil moisture storage capacity of the model and the soil porosity (see Sect. 3.5), gives an error variance of 667 (750) mm² for the lumped (semi-distributed) scheme. As a simple comparison, these values are within the range of the error variance estimated through seasonal LV/TC; however, a comprehensive analysis of the impacts of accounting for seasonality in SM-DA is beyond the scope of this work.
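The conversion behind these two values can be checked with a short calculation. It assumes that the volumetric error is first expressed as a fraction of saturation (divided by the porosity of 0.46) and then scaled by the storage capacity; this reading is inferred from the numbers quoted in the text and reproduces them. The function name is illustrative.

```python
def sm_error_variance_mm2(sigma_vol, porosity, capacity_mm):
    """Convert a volumetric SM error (m3 m-3) to a storage error variance (mm^2)."""
    sigma_mm = sigma_vol / porosity * capacity_mm
    return sigma_mm ** 2

# Lumped scheme: 0.03 / 0.46 * 396 mm is about 25.8 mm, i.e. about 667 mm^2.
var_lumped = sm_error_variance_mm2(0.03, 0.46, 396.0)

# Semi-distributed scheme: 0.03 / 0.46 * 420 mm is about 27.4 mm, i.e. about 750 mm^2.
var_semidist = sm_error_variance_mm2(0.03, 0.46, 420.0)
```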
Table 3 summarises the results of the SWI calibration and seasonal rescaling for the lumped model, showing the T parameter for each SWI and the correlation coefficient (r) between θ and the satellite SM before and after SWI transformation and rescaling (θ obs ).These results confirm the visual assessment of plots in Fig. 8 by showing an important increase in the linear correlation coefficient with θ when satellite SM is transformed into SWI.The correlation is further increased after rescaling, which illustrates that there is clear benefit from performing seasonal bias correction.Note that applying a constant rescaling factor would have no impact on the correlation between θ and θ obs .
The optimal T values (Table 3) are difficult to validate since there is no ground data to compare with and, given that they have been shown to depend strongly on the physical processes of the study site (Ceballos et al., 2005), direct comparison with other studies cannot be made reliably. Indeed, previous studies have shown a wide range of optimal T values for soil depths ranging between 10 and 100 cm. As an example, in Fig. 9 we have summarised the optimal T found in five different studies (Albergel et al., 2008; Brocca et al., 2009, 2010a; Ford et al., 2014; Wagner et al., 1999).
Previous studies have shown that the optimal T value increases with layer depth (e.g. Brocca et al., 2010a). Results presented here show an increased T value for SMO, which would be inconsistent with L-band having a deeper penetration than AMS C-band (to limit the comparison within passive retrievals). We speculate that these differences might be due to various factors, including the different retrieval methods (which have quite different assumptions pertaining to spatial heterogeneity) and the influence of radio-frequency interference noise. Moreover, to the best of our knowledge, the existing studies examining the dependence of T on soil depth are usually based on a single satellite product evaluated against in situ measurements at variable depths. Hence it is difficult to compare our results against these studies, given the added complexity of different sensing and retrieval methods.
There are some key theoretical issues that should be considered when using SWI as a profile SM estimator. Firstly, the parameter T in Eq. (22) was estimated by maximising the correlation between SWI and θ, which could introduce cross-correlated errors between them. This would violate the IV regression assumption of no correlation between the errors among the triplets (Sect. 3.6). A way to overcome this issue, if data requirements are met, would be to estimate a profile SM independently of the rainfall-runoff model prediction, for example by using a physically based model to transfer surface SM into deeper layers (e.g. Richards, 1931; Beven and Germann, 1982; Manfreda et al., 2014). Secondly, the SWI formulation explicitly incorporates autocorrelation terms, which would result in autocorrelated observation errors and thereby violate an EnKF assumption: independence between observation and prediction errors. The autocorrelation in the observation error can be transferred to the updated θ + during the SM-DA updating step. In that case, the θ − background prediction error covariance at time t + 1 would be correlated with the error of the rescaled SWI at time t + 1. In contrast with the first issue listed above, the violation of the EnKF assumption cannot be avoided by replacing SWI with a physically based model, since the latter would result in a profile SM strongly correlated with previous states as well. Indeed, given the physical mechanisms of water flux in the unsaturated soil, this problem will be present whenever a profile SM estimated from satellite SM is used as an observation in an EnKF-based data assimilation framework. A way to overcome this could be to work with models that explicitly account for the water in the top few centimetres of soil and therefore can directly assimilate a (rescaled) satellite retrieval. However, the errors in satellite SM retrievals are probably already autocorrelated (Crow and Van den Berg, 2010).
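The second issue can be made concrete with a small numerical experiment: passing serially independent noise through a recursive exponential filter yields a strongly autocorrelated series. The filter form, the daily time step, T = 10 days and the random seed are assumptions made for this illustration only.

```python
import numpy as np

def exponential_filter(series, T):
    """Recursive exponential filter with a daily time step (sketch)."""
    series = np.asarray(series, dtype=float)
    out = np.empty_like(series)
    gain = 1.0
    out[0] = series[0]
    for n in range(1, len(series)):
        gain = gain / (gain + np.exp(-1.0 / T))
        out[n] = out[n - 1] + gain * (series[n] - out[n - 1])
    return out

def lag1_autocorr(e):
    """Lag-1 autocorrelation of a series."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e * e))

rng = np.random.default_rng(0)
white = rng.normal(size=5000)                 # uncorrelated retrieval errors
filtered = exponential_filter(white, T=10.0)  # errors after SWI filtering
```

Once the gain settles, the filter behaves like a first-order autoregressive process with coefficient exp(−1/T), so the lag-1 autocorrelation of the filtered errors approaches about 0.9 for T = 10 days, whereas that of the input noise stays near zero.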
Breaching some of the EnKF-based scheme and/or the IV-based rescaling assumptions could theoretically degrade the performance of the SM-DA scheme when the variable analysed is soil moisture (Crow and Van den Berg, 2010; Reichle et al., 2008; Ryu et al., 2009). In this context, the performance of SM-DA with respect to the improvement in streamflow has been under-investigated. Alvarez-Garreton et al. (2013, 2014) show that, in terms of streamflow prediction, SM-DA seems to be less sensitive to violation of these assumptions. Both the lower sensitivity and the apparent contradiction with previous studies analysing soil moisture prediction performance highlight the need for further studies focusing on SM-DA for the purposes of improving streamflow prediction from rainfall-runoff models.

Figure 9. Optimal T parameter (d) against soil depth (cm) found in previous studies (Wagner et al., 1999; Albergel et al., 2008; Brocca et al., 2009, 2010a; Ford et al., 2014).

Satellite soil moisture data assimilation
The ensemble predictions of streamflow and θ, before and after SM-DA, for both the lumped and the semi-distributed schemes at N7, are presented in Fig. 10. The truncation bias correction (Sect. 3.3) was successful in creating an unbiased θ ensemble when the unperturbed model approached the soil water storage bounds (Fig. 10a.2 and b.2).
The rank histograms at N7, N1 and N3 are presented in Fig. 7. For all the evaluated nodes, the ensemble predictions are more reliable after SM-DA (flatter histograms compared with the open-loop). The consistent overestimation of the observed streamflow in the open-loop ensembles (diagonal histograms displaced towards the higher ensemble percentiles) is partially addressed by the SM-DA.
The evaluation statistics for the SM-DA are summarised in Table 4. The streamflow data of the inner catchments (N1 and N3) are used only for evaluation purposes in the semi-distributed scheme; therefore, they are representative of "ungauged" inner catchments.
The NRMSE in Table 4 (all values below 1) demonstrates that the SM-DA was effective in reducing the streamflow prediction uncertainty (RMSE) across all gauged and ungauged catchments. The reductions in the RMSE ranged from 17 to 24 % for the different evaluation nodes. The NRMSE combines precision improvement (i.e. reduction of ensemble spread) with prediction accuracy improvement (i.e. enhancement of ensemble mean performance) resulting from the SM-DA. Given that the open-loop ensemble spread was larger than that of an ideal ensemble (based on the n-shaped rank histograms in Fig. 7), the reduction of the ensemble spread may be in part artificial. The performance of the ensemble mean was assessed by computing NSE ol and NSE up (Table 4). At the catchment outlet, the NSE of the ensemble mean after SM-DA only improved for the semi-distributed scheme. At the ungauged catchments, SM-DA was effective at improving the performance of the ensemble mean only at N3, compared with the open-loop. However, the performance of the model in that catchment was still poor. This can be explained by the systematic errors present in the model for those catchments before assimilation, which were not addressed by the SM-DA.
The POD values at the catchment outlet (N7) show that, before and after SM-DA, the model is consistently capable of detecting minor floods. Although this does not demonstrate an advantage of the SM-DA scheme proposed here, it does reflect the adequacy of the model ensemble prediction for simulating minor (and larger) floods. Consistent with previous results, the prediction of the semi-distributed model at the inner catchments is poorer in terms of detecting minor floods. The lower FAR values after SM-DA demonstrate the efficacy of the scheme in reducing the number of times the model predicted an unobserved minor flood, at both the gauged and the ungauged catchments.
The open-loop PVE was improved (lower PVE values) after SM-DA at N7 (for both the lumped and the semi-distributed schemes) and at N3. This was not the case, however, for inner node N1, at which the PVE was higher after SM-DA than in the open-loop. When compared to the unperturbed model run (Table 2), the assimilation of satellite soil moisture improved the performance of the model in terms of PVE at all the nodes and for both the lumped and semi-distributed schemes.
The skill of the ensembles after SM-DA was improved at the catchment outlet by 12 and 13 % (expressed by a reduction in CRPS) for the lumped and semi-distributed schemes, respectively, and by 17 % at N1. The skill of the updated ensemble was also consistently higher than that of the unperturbed model run (Table 2).
To summarise the efficacy of the SM-DA, we take into account the characteristics of the ensemble predictions (open-loop and updated) in terms of their mean, skill and reliability. Overall, SM-DA was effective at improving streamflow ensemble predictions in the gauged and the ungauged catchments. By accounting for rainfall spatial distribution and routing processes within the large study catchment, we improved the model performance at the outlet compared with a lumped homogeneous scheme. This led to greater improvements from the SM-DA for the semi-distributed model. The latter was achieved even though the relationship between θ and the streamflow prediction was weaker in the semi-distributed scheme (Fig. 5). The proposed SM-DA scheme therefore has the merit of improving streamflow ensemble predictions by correcting the SM state of the model, even when rainfall appears to be the main driver of the runoff mechanism (see Sect. 4.1).

Conclusions
This paper presents an evaluation of the assimilation of passive and active satellite soil moisture observations (SM-DA) into a conceptual rainfall-runoff model (PDM) for the purpose of reducing flood prediction uncertainty in a sparsely monitored catchment. We set up the experiments in the large semi-arid Warrego River basin (> 40 000 km²) in south central Queensland, Australia. Within this context, we explore the advantages of accounting for the forcing data spatial distribution and the routing processes within the catchment.
The framework proposed here rigorously addressed the two main stages of a SM-DA scheme: model error representation and satellite data processing. We applied the different methods in the context of a sparsely monitored large catchment (i.e. limited data), under operational streamflow and flood forecasting scenarios (i.e. no future information is used in any of the presented methods).
The model error representation was the most critical step in the SM-DA scheme, since it determined the error covariance between observations and model state, and thus the potential efficacy of SM-DA. Moreover, the SM-DA evaluation was done against the open-loop ensemble prediction. We addressed key issues of the ensemble generation process by correcting truncation biases in soil moisture and streamflow predictions. This prevented an unintended degradation of the open-loop ensembles coming from perturbing a highly non-linear model. The open-loop ensembles at the catchment outlet provide key information about prediction uncertainty, which is required for assessing risks associated with water management decisions (Robertson et al., 2013). These ensembles showed a slight bias with respect to the observed streamflow and featured a wide spread. Further exploration of model error representation (sources of error and the structure of those errors) and error parameter estimation is required to improve the characteristics of the open-loop ensemble prediction.
In the satellite data processing, we highlighted that the use of an exponential filter to transfer surface information into deeper layers may potentially lead to violation of some of the TC and EnKF assumptions (Sect. 4.3). Possible solutions to overcome this would be to use more physically based methods to transfer satellite SM into deeper layers, or to use a rainfall-runoff model that explicitly accounts for the surface soil layer and can therefore directly assimilate a (rescaled) satellite SM product. However, both solutions are constrained by the ancillary data available for satisfactory implementation of a physically based model. In the rescaling and error estimation procedure, we applied seasonal TC and LV to avoid error-in-variable biases. Applying these to correct biases in the SWI showed improved agreement between observed and modelled SM. This seasonal approach is novel in the context of SM-DA and tends to lead to closer agreement between model and observations. Further investigation is required to assess the impacts and importance of accounting for seasonality in rescaling and error estimation.
The evaluation of the SM-DA results led to several insights. (1) The SM-DA was successful at improving the open-loop ensemble prediction at the catchment outlet, for both the lumped and the semi-distributed case. (2) Accounting for spatial distribution in the model forcing data and for the routing processes within the large study catchment improved the skill of the SM-DA at the catchment outlet. (3) The SM-DA was effective at improving streamflow prediction at the ungauged locations, compared with the open-loop. However, the updated prediction in those catchments was still poor, because the systematic errors before assimilation are not addressed by a SM-DA scheme.
This work provides new evidence of the efficacy of SM-DA in improving streamflow ensemble predictions within sparsely instrumented catchments.We demonstrate that SM-DA skill can be enhanced if the spatial distribution of forcing data and routing processes within the catchment are accounted for in large catchments.We show that SM-DA performance is directly related to the model quality before assimilation.Therefore we recommend that efforts should be focused on ensuring adequate models, while evaluating the trade-offs between more complex models and data availability.

Figure 1. The Warrego River basin located in Queensland, Australia (left panel). A close-up of the area is presented in the right panel. The lumped PDM scheme is set up over the entire catchment, while the semi-distributed scheme divides the total catchment into seven subcatchments (SC1-SC7).

Figure 2. Periods of record of the different data sets. The initial date of the plot was set as the beginning of the streamflow data record.

Figure 7. Rank histograms of the open-loop and updated streamflow ensemble predictions. (a) presents the results from the lumped scheme at node N7. (b)-(d) present the results from the semi-distributed (semidist) scheme at nodes N7, N1 and N3.

Figure 10. Streamflow (Q in mm d−1) and soil moisture (θ in mm d−1) ensemble prediction at the catchment outlet, before and after SM-DA for evaluation sub-period 2 (1 May 2007 to 2 March 2014), which had three major flooding events. (a.1) and (a.2) present the results for the lumped model. (b.1) and (b.2) present the results for the semi-distributed model.

Table 1. Area and mean annual rainfall of the catchments used in the lumped and semi-distributed schemes.

Table 2. Model evaluation at the catchment outlet (N7) and at the inner catchments (N1 and N3), for calibration and evaluation periods. RMSE and PVE statistics are in units of mm.

Table 3. Parameter T and correlation coefficient between model SM (θ) and satellite SM, before and after SWI transformation and rescaling. Results are presented for the entire catchment.

Table 4. SM-DA evaluation statistics calculated at the catchment outlet (N7) and at the inner catchments (N1 and N3).