The success of real-time estimation and forecasting applications based on geophysical models has been made possible by the two main existing frameworks for determining the models' initial conditions: Bayesian data assimilation and variational data assimilation. However, while there have been efforts to unify these two paradigms, existing attempts struggle to fully leverage the advantages of both when facing the challenges posed by modern high-resolution models, which relate mainly to model indeterminacy and steep computational requirements. In this article we introduce a hybrid algorithm called OPTIMISTS (Optimized PareTo Inverse Modeling through Integrated STochastic Search), which is targeted at non-linear high-resolution problems and brings together ideas from particle filters (PFs), four-dimensional variational methods (4D-Var), evolutionary Pareto optimization, and kernel density estimation in a unique way. Streamflow forecasting experiments were conducted to test which specific configurations of OPTIMISTS led to higher predictive accuracy. The experiments were carried out on two watersheds: the Blue River (low resolution) using the VIC (Variable Infiltration Capacity) model and the Indiantown Run (high resolution) using the DHSVM (Distributed Hydrology Soil Vegetation Model). By selecting kernel-based non-parametric sampling, non-sequential evaluation of candidate particles, and the multi-objective minimization of departures from both the streamflow observations and the background states, OPTIMISTS was shown to efficiently produce probabilistic forecasts with accuracy comparable to that obtained from a particle filter. Moreover, the experiments demonstrated that OPTIMISTS scales well to high-resolution cases without imposing a significant computational overhead. With the combined advantages of allowing for fast, non-Gaussian, non-linear, high-resolution prediction, the algorithm shows the potential to increase the efficiency of operational prediction systems.

Decision support systems that rely on model-based forecasting of natural phenomena are invaluable to society (Adams et al., 2003; Penning-Rowsell et al., 2000; Ziervogel et al., 2005). However, despite the increasing availability of Earth-sensing data, the problem of estimation or prediction in geophysical systems remains as underdetermined as ever because of the growing complexity of such models (Clark et al., 2017). For example, taking advantage of distributed physics and the mounting availability of computational power, modern models have the potential to more accurately represent the impacts of heterogeneities on eco-hydrological processes (Koster et al., 2017). This is achieved by replacing lumped representations with distributed ones, which entails the inclusion of numerous parameters and state variables. These additional unknowns have the downside of increasing the level of uncertainty in their estimation. Therefore, in order to rely on these high-resolution models for critical real-time and forecast applications, considerable improvements in parameter and initial state estimation techniques must be made with two main goals: first, to allow for efficient management of the huge number of unknowns; and second, to mitigate the harmful effects of overfitting, i.e. the loss of forecast skill due to an over-reliance on the calibration and training data (Hawkins, 2004). Given the numerous degrees of freedom associated with these high-resolution distributed models, overfitting is a much bigger threat because of the phenomenon of equifinality (Beven, 2006).

There exists a plethora of techniques to initialize the state variables of a model through the incorporation of available observations, and they possess overlapping features that make it difficult to develop clear-cut classifications. However, two main schools can fairly be identified: Bayesian data assimilation and variational data assimilation. Bayesian data assimilation creates probabilistic estimates of the state variables in an attempt to also capture their uncertainty. These state probability distributions are adjusted sequentially to better match the observations using Bayes' theorem. While the Kalman filter (KF) is constrained to linear dynamics and Gaussian distributions, ensemble Kalman filters (EnKFs) can support non-linear models (Evensen, 2009), and particle filters (PFs) can additionally manage non-Gaussian estimates for added accuracy (Smith et al., 2013). The stochastic nature of these Bayesian filters is highly valuable because equifinality can rarely be avoided and because of the benefits of quantifying uncertainty in forecasting applications (Verkade and Werner, 2011; Zhu et al., 2002). While superior in accuracy, PFs are usually regarded as impractical for high-dimensional applications (Snyder et al., 2008), and thus recent research has focused on improving their efficiency (van Leeuwen, 2015).
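As a point of reference, the sequential Bayesian update that PFs implement can be sketched as a minimal bootstrap filter for a scalar state. This is a generic illustration, not the article's implementation: the model dynamics, noise levels, and observation error below are hypothetical.

```python
# Minimal bootstrap particle filter for a scalar state (illustrative sketch).
import math
import random

random.seed(42)

def propagate(x):
    # Hypothetical non-linear dynamics with additive process noise.
    return 0.9 * x + 0.5 * math.sin(x) + random.gauss(0.0, 0.1)

def likelihood(x, y, obs_std=0.2):
    # Gaussian observation likelihood p(y | x), up to a constant factor.
    return math.exp(-0.5 * ((y - x) / obs_std) ** 2)

def pf_step(particles, y):
    # 1. Propagate each particle through the model.
    particles = [propagate(x) for x in particles]
    # 2. Weight each particle by the likelihood of the new observation
    #    (the Bayes update).
    weights = [likelihood(x, y) for x in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    # 3. Resample proportionally to the weights to avoid degeneracy.
    return random.choices(particles, weights=weights, k=len(particles))

particles = [random.gauss(1.0, 0.5) for _ in range(500)]
for obs in [1.2, 1.1, 0.9]:  # synthetic streamflow-like observations
    particles = pf_step(particles, obs)
mean_est = sum(particles) / len(particles)
```

The resampling step is the usual source of the degeneracy and impoverishment issues mentioned later for high-dimensional cases, since it repeatedly concentrates the ensemble on a few highly weighted particles.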

On the other hand, variational data assimilation is more akin to traditional calibration approaches (Efstratiadis and Koutsoyiannis, 2010) because of its use of optimization methods. It seeks to find a single, deterministic initial-state-variable combination that minimizes the departures (or variations) of the modelled values from the observations (Reichle et al., 2001) and, commonly, from their history. One- to three-dimensional variants are also employed sequentially, but the paradigm lends itself easily to evaluating the performance of candidate solutions throughout an extended time window in four-dimensional versions (4D-Var). If the model's dynamics are linearized, the optimum can be found very efficiently in the resulting convex search space through the use of gradient methods. While this feature has made 4D-Var very popular in meteorology and oceanography (Ghil and Malanotte-Rizzoli, 1991), its application in hydrology has been less widespread because of the difficulty of linearizing land-surface physics (Liu and Gupta, 2007). Moreover, variational data assimilation requires the inclusion of computationally expensive adjoint models if one wishes to account for the uncertainty of the state estimates (Errico, 1997).
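For reference, the strong-constraint 4D-Var objective commonly takes the following form (standard notation from the data assimilation literature, not this article's own symbols):

```latex
J(\mathbf{x}_0) = \frac{1}{2}\,(\mathbf{x}_0 - \mathbf{x}^{b})^{\mathrm{T}} \mathbf{B}^{-1} (\mathbf{x}_0 - \mathbf{x}^{b})
  + \frac{1}{2}\sum_{k=0}^{K} \left[ H_k(\mathbf{x}_k) - \mathbf{y}_k \right]^{\mathrm{T}} \mathbf{R}_k^{-1} \left[ H_k(\mathbf{x}_k) - \mathbf{y}_k \right],
\qquad \mathbf{x}_k = M_{0 \rightarrow k}(\mathbf{x}_0)
```

where $\mathbf{x}^{b}$ is the background state, $\mathbf{B}$ and $\mathbf{R}_k$ are the background and observation error covariances, $H_k$ is the observation operator, $\mathbf{y}_k$ are the observations at step $k$, and $M_{0 \rightarrow k}$ is the model propagator over the assimilation window. Linearizing $H_k$ and $M$ makes $J$ quadratic, which is what enables the efficient gradient-based solution described above.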

Traditional implementations from both schools have interesting characteristics and thus the development of hybrid methods has received considerable attention (Bannister, 2016). For example, Bayesian filters have been used as adjoints in 4D-Var to enable probabilistic estimates (Zhang et al., 2009). Moreover, some Bayesian approaches have been coupled with optimization techniques to select ensemble members (Dumedah and Coulibaly, 2013; Park et al., 2009). The fully hybridized algorithm 4DEnVar (Buehner et al., 2010) is gaining increasing attention for weather prediction (Desroziers et al., 2014; Lorenc et al., 2015). It is especially interesting that some algorithms have defied the traditional choice between sequential and extended-time evaluations. Weakly constrained 4D-Var allows state estimates to be determined at several time steps within the assimilation time window and not only at the beginning (Ning et al., 2014; Trémolet, 2006). Conversely, modifications to EnKFs and PFs have been proposed to extend the analysis of candidate members/particles to span multiple time steps (Evensen and van Leeuwen, 2000; Noh et al., 2011). The success of these hybrids demonstrates that there is a balance to be sought between the allowed number of degrees of freedom and the amount of information to be assimilated at once.

Following these promising paths, in this article we introduce OPTIMISTS (Optimized PareTo Inverse Modelling through Integrated STochastic Search), a hybrid data assimilation algorithm whose design was guided by the two stated goals: (i) to allow for practical scalability to high-dimensional models and (ii) to enable balancing the imperfect observations and the imperfect model estimates to minimize overfitting. Table 1 summarizes the main characteristics of typical Bayesian and variational approaches and contrasts them with those of OPTIMISTS. Our algorithm incorporates the features that the literature has found to be the most valuable from both Bayesian and variational methods while mitigating the deficiencies associated with the original approaches (e.g. the linearity and determinism of 4D-Var and the limited scalability of PFs). Non-Gaussian probabilistic estimation and support for non-linear model dynamics have long been held as advantageous over their alternatives (Gordon et al., 1993; van Leeuwen, 2009) and, similarly, meteorologists favour extended-period evaluations over sequential ones (Gauthier et al., 2007; Rawlins et al., 2007; Yang et al., 2009). As shown in the table, OPTIMISTS can readily adopt these proven strategies.

Comparison between the main features of standard Bayesian data assimilation algorithms (KF: Kalman filter, EnKF: ensemble KF, PF: particle filter), variational data assimilation (one- to four-dimensional), and OPTIMISTS.

However, there are other aspects of the assimilation problem for which no single combination of features has demonstrated its superiority. For example, is consistency with previous states better achieved through the minimization of a cost function that includes a background error term (Fisher, 2003), as in variational methods, or through limiting the exploration to samples drawn from that background state distribution, as in Bayesian methods? Table 1 shows that in these cases OPTIMISTS allows for flexible configurations, and it is an additional objective of this study to test which set of feature interactions allows for more accurate forecasts when using highly distributed models. While many of the concepts utilized within the algorithm have been proposed in the literature before, their combination and broad range of available configurations are unlike those of other methods, including existing hybrids, which have mostly been developed around ensemble Kalman filters and convex optimization techniques (Bannister, 2016) and are therefore limited to Gaussian distributions and linear dynamics.

In this section we describe OPTIMISTS, our proposed data assimilation
algorithm which combines advantageous features from several Bayesian and
variational methods. As will be explained in detail for each of the steps of
the algorithm, these features were selected with the intent of mitigating the
limitations of existing methods. OPTIMISTS allows selecting a flexible data
assimilation time step

State probability distributions

This process is repeated iteratively each assimilation time step

List of global parameters in OPTIMISTS.

Steps in OPTIMISTS, to be repeated for each assimilation time
step

Let a “particle”

1. Drawing: draw root samples

2. Sampling: randomly sample

3. Simulation: compute

4. Evaluation: compute the fitness values

5. Optimization: create additional samples using evolutionary algorithms and return to 3 (if number of samples is below

6. Ranking: assign ranks

7. Weighting: compute the weight

While traditional PFs draw all the root (or base) samples from

In this step the set of root samples drawn is complemented with random
samples. The distinction between root samples and random samples is that the
former are those that define the probability distribution

In this step, the algorithm uses the model to compute the resulting state
vector

In order to determine which initial state

In OPTIMISTS any such fitness metric could be used and, most importantly, the algorithm allows defining several of them. Moreover, users can determine whether each function is to be minimized (e.g. costs or errors) or maximized (e.g. likelihoods). We expect these features to be helpful if one wishes to separate errors when multiple types of observations are available (Montzka et al., 2012) and as a more natural way to consider different fitness criteria (lumping them together in a single function as in Eq. (1) can lead to balancing and "apples and oranges" complications). Additionally, it might prove beneficial to take the consistency with the state history into account both by explicitly defining such an objective here and by allowing states to be sampled from the previous distribution (thus compounding the individual mechanisms of Bayesian and variational methods). Functions to measure this consistency are proposed in Sect. 2.2. With the set of objective functions defined by the user, the algorithm computes the vector of fitness metrics

The optimization step is optional and is used to generate additional particles by exploiting the knowledge encoded in the fitness values of the current particle ensemble. In a twist on the signature characteristic of variational data assimilation, OPTIMISTS incorporates evolutionary multi-objective optimization algorithms (Deb, 2014) instead of the established gradient-based, single-objective methods. Evolutionary optimizers compensate for their slower convergence speed with the capability of efficiently navigating non-convex solution spaces (i.e. the models and the fitness functions do not need to be linear with respect to the observations and the states). This feature effectively opens the door for variational methods to be used in disciplines where the linearization of the driving dynamics is either impractical, inconvenient, or undesirable. Whereas any traditional multi-objective global optimization method would work, our implementation of OPTIMISTS features a state-of-the-art adaptive ensemble algorithm, similar to AMALGAM (Vrugt and Robinson, 2007), that allows model simulations to be run in parallel (Crainic and Toulouse, 2010). The optimizer ensemble includes a genetic algorithm (Deb et al., 2002) and a hybrid approach that combines ant colony optimization (Socha and Dorigo, 2008) and Metropolis–Hastings sampling (Haario et al., 2001).

During the optimization step, the group of optimizers is used to generate

A fundamental aspect of OPTIMISTS is the way in which it provides a probabilistic interpretation of the results of the multi-objective evaluation, thus bridging the gap between Bayesian and variational assimilation. Such a method has been used before (Dumedah et al., 2011) and is based on non-dominated sorting (Deb, 2014), another technique from the multi-objective optimization literature, which is used to balance the potential tensions between the various objectives. Instead of organizing all particles from best to worst, this sorting approach is centred on the concept of dominance. A particle dominates another if it outperforms it according to at least one of the criteria/objectives while not being outperformed according to any of the others. Following this principle, in the ranking step particles are grouped in fronts comprised of members which are mutually non-dominated; that is, none of them is dominated by any of the rest. Particles in a front, therefore, represent the effective trade-offs between the competing criteria.
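The dominance relation and the successive peeling of fronts can be sketched as follows (a generic illustration in which all objectives are assumed to be minimized; the objective values are hypothetical):

```python
# Sketch of Pareto dominance and non-dominated sorting for particle ranking.

def dominates(a, b):
    # a dominates b if it is no worse in every objective and strictly
    # better in at least one (objectives are minimized here).
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated_fronts(objs):
    # Repeatedly peel off the set of mutually non-dominated particles.
    remaining = list(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# Two objectives per particle, e.g. (deviation from the observations,
# negative log-likelihood given the background distribution).
objectives = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
fronts = non_dominated_fronts(objectives)
```

In this example the first three particles are mutually non-dominated and form the first front, illustrating how particles representing different trade-offs receive the same rank.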

Figure 1c illustrates the result of non-dominated sorting applied to nine
particles being analysed under two objectives: minimum deviation from
observations and maximum likelihood given the background state distribution

In this final step, OPTIMISTS assigns weights

As mentioned before, OPTIMISTS uses kernel density probability distributions
(West, 1993) to model the stochastic estimates of the state-variable vectors.
The algorithm requires two computations related to the state-variable
probability distribution
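As a minimal sketch of one of these computations, drawing a sample from a kernel density estimate amounts to selecting a root sample according to its weight and then perturbing it with the kernel. The scalar case is shown below; the Gaussian kernel and the Silverman-style bandwidth rule are assumptions for illustration and not necessarily the choices made in OPTIMISTS.

```python
# Sampling from a weighted Gaussian kernel density estimate (scalar case).
import math
import random

random.seed(7)

def kde_sample(roots, weights, n):
    m = len(roots)
    mean = sum(w * x for x, w in zip(roots, weights))
    var = sum(w * (x - mean) ** 2 for x, w in zip(roots, weights))
    # Silverman-style bandwidth, used here only as a simple default.
    bandwidth = 1.06 * math.sqrt(var) * m ** (-0.2)
    # Picking a root by weight and adding kernel noise is equivalent to
    # sampling from the KDE mixture.
    return [random.gauss(random.choices(roots, weights=weights)[0], bandwidth)
            for _ in range(n)]

roots = [0.8, 1.0, 1.2, 1.5]          # hypothetical root samples
weights = [0.1, 0.4, 0.4, 0.1]        # hypothetical particle weights
samples = kde_sample(roots, weights, 2000)
sample_mean = sum(samples) / len(samples)
```

The sample mean converges to the weighted mean of the root samples, while the kernel perturbation keeps the ensemble from collapsing onto the discrete root set.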

When the state vector of the model becomes large (i.e.

With only the diagonal terms of matrix

In this section we prepare the elements to investigate whether OPTIMISTS can help improve the forecasting skill of hydrologic models. More specifically, the experiments seek to answer the following questions. Which characteristics of Bayesian and variational methods are the most advantageous? How can OPTIMISTS be configured to take advantage of these characteristics? How does the algorithm compare to established data assimilation methods? And how does it perform in high-dimensional applications? To help answer these questions, this section first introduces the two case studies and then describes a traditional PF that was used for comparison purposes.

We coupled a Java implementation of OPTIMISTS with two popular open-source distributed hydrologic modelling engines: the Variable Infiltration Capacity (VIC) model (Liang et al., 1994, 1996a, b; Liang and Xie, 2001, 2003) and the Distributed Hydrology Soil Vegetation Model (DHSVM) (Wigmosta et al., 1994, 2002). VIC is targeted at large watersheds by focusing on vertical subsurface dynamics while also enabling intra-cell precipitation, soil, and vegetation heterogeneity. The DHSVM, on the other hand, was conceived for high-resolution representations of the Earth's surface, allowing for saturated and unsaturated subsurface flow routing and 1-D or 2-D surface routing (Zhang et al., 2018). Both engines needed several modifications so that they could be executed in a non-continuous fashion, as required for sequential assimilation. Given the non-Markovian nature of the surface routing schemes coupled with VIC, which are based either on multiscale approaches (Guo et al., 2004; Wen et al., 2012) or on the unit hydrograph concept (Lohmann et al., 1998), a simplified routing routine was developed that treats the model cells as channels, albeit with longer retention times. In the simplified method, the direct run-off and baseflow produced by each model cell are partly routed through an assumed equivalent channel (slow component) and partly discharged directly into the channel network (fast component). Both the channel network and the equivalent channels representing overland flow hydraulics are modelled using the Muskingum method. In addition, several important bugs in version 3.2.1 of the DHSVM, mostly related to the initialization of state variables but also pertaining to routing data and physics, were fixed.
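The Muskingum scheme used for both the channel network and the equivalent overland-flow channels can be sketched in its standard textbook form (the reach parameters and the inflow hydrograph below are hypothetical):

```python
# Sketch of Muskingum channel routing (standard formulation).

def muskingum_coeffs(K, X, dt):
    # K: reach travel time, X: weighting factor (0-0.5), dt: routing step
    # (same time units as K). The three coefficients sum to 1.
    denom = 2 * K * (1 - X) + dt
    c0 = (dt - 2 * K * X) / denom
    c1 = (dt + 2 * K * X) / denom
    c2 = (2 * K * (1 - X) - dt) / denom
    return c0, c1, c2

def route(inflow, K=2.0, X=0.2, dt=1.0, q0=0.0):
    # O_t = c0 * I_t + c1 * I_{t-1} + c2 * O_{t-1}
    c0, c1, c2 = muskingum_coeffs(K, X, dt)
    out = [q0]
    for i in range(1, len(inflow)):
        out.append(c0 * inflow[i] + c1 * inflow[i - 1] + c2 * out[-1])
    return out

hydrograph = [0.0, 5.0, 20.0, 15.0, 8.0, 3.0, 1.0, 0.0]  # m^3/s, synthetic
routed = route(hydrograph)
```

Routing attenuates and delays the inflow peak, which is the behaviour the simplified scheme exploits to emulate the longer retention times of overland flow.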

We selected two watersheds to perform streamflow forecasting tests using
OPTIMISTS: one with the VIC model running at a

Characteristics of the two test watersheds: Blue River and Indiantown Run. US hydrologic units are defined in Seaber et al. (1987). Elevation information was obtained from the Shuttle Radar Topography Mission (Rodríguez et al., 2006); land cover and impervious percentage from the National Land Cover Database, NLCD (Homer et al., 2012); soil type from CONUS-SOIL (Miller and White, 1998); and precipitation, evapotranspiration, and temperature from NLDAS-2 (Cosgrove et al., 2003). The streamflow and temperature ranges cover 90 % of the observed variation (the 5 % tails at the high and low ends are excluded).

Maps of the two test watersheds in the United States displaying the
30 m resolution land cover distribution from the NLCD (Homer et al., 2012).

These optimal parameter sets, together with additional sets produced in the
optimization process, were used to run the models and determine a set of
time-lagged state-variable vectors

Set-up of the three factorial experiments, including the watershed,
the total number of configurations (conf.), the values assigned to OPTIMISTS'
parameters, and which objectives (objs.) were used (one objective: minimize
MAE given the streamflow observations; two objectives: minimize MAE and
maximize likelihood given the source or background state distribution

Three diverse scenarios were selected for the Blue River, each comprising a 2-week assimilation period (when streamflow observations are assimilated) and a 2-week forecasting period (when the model is run in an open loop using the states obtained at the end of the assimilation period). Scenario 1, starting on 15 October 1996, is rainy through the entire 4 weeks. Scenario 2, which starts on 15 January 1997, has a dry assimilation period and a mildly rainy forecast period. Scenario 3, starting on 1 June 1997, has a relatively rainy assimilation period and a mostly dry forecast period. Two scenarios, also spanning 4 weeks, were selected for the Indiantown Run, one starting on 26 July 2009 and the other on 26 August 2009.

We used factorial experiments (Montgomery, 2012) to test different configurations of OPTIMISTS on each of these scenarios, by first assimilating the streamflow and then measuring the forecasting skill. In this type of experimental design, a set of assignments is established for each parameter and then all possible assignment combinations are tried. The design allows us to establish the statistical significance of altering several parameters simultaneously, providing an adequate framework for determining, for example, whether using a short or a long assimilation time step
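A full factorial design of this kind amounts to taking the Cartesian product of the chosen parameter levels. The sketch below illustrates the mechanics; the parameter names echo those varied in the experiments, but the specific levels are hypothetical placeholders, not the experiments' actual assignments:

```python
# Sketch of a full factorial design: every combination of the chosen
# parameter levels is evaluated once per scenario.
from itertools import product

levels = {
    "time_step_days": [1, 5, 10],      # hypothetical assimilation steps
    "n_particles": [30, 100],          # hypothetical ensemble sizes
    "n_objectives": [1, 2],            # one- vs. two-objective evaluation
    "use_optimization": [False, True], # optional optimization step
}

configurations = [dict(zip(levels, combo))
                  for combo in product(*levels.values())]
n_configs = len(configurations)        # 3 * 2 * 2 * 2 combinations
```

Running all combinations (rather than varying one parameter at a time) is what allows an ANOVA to separate main effects from interactions between parameters.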

Observation errors are usually taken into account in traditional assimilation
algorithms by assuming a probability distribution for the observations at
each time step and then performing a probabilistic evaluation of the
predicted value of each particle/member against that distribution. As
mentioned in Sect. 2, such a fitness metric, like the likelihood utilized in
PFs to weight candidate particles, is perfectly compatible with OPTIMISTS.
However, since it is difficult to estimate the magnitude of the observation
error in general, and fitness metrics

For the Blue River scenarios, a secondary likelihood objective or metric was
used in some cases to select for particles with higher consistency with the
state history. It was computed using either Eq. (

Comparing the performance of different configurations of OPTIMISTS can shed light on the adequacy of individual strategies utilized by traditional Bayesian and variational methods. For example, producing all particles with the optimization algorithms (

Additionally, the comparison is performed using a continuous forecasting experiment set-up instead of a scenario-based one. In this continuous test, forecasts are performed every time step and compiled, for different forecast lead times, in series that span several months. Forecast lead times are 1, 3, 6, and 12 days for the Blue River and 6 h and 1, 4, and 16 days for the Indiantown Run. Before each forecast, both OPTIMISTS and the PF assimilate streamflow observations for the assimilation time step of each algorithm (daily for the PF). The assimilation is performed cumulatively, meaning that the initial state distribution

This section summarizes the forecasting results obtained from the three
scenario-based experiments and the continuous forecasting experiments on the
Blue River and the Indiantown Run model applications. The scenario-based
experiments were performed to explore the effects of multiple
parameterizations of OPTIMISTS, and the performance was analysed as follows.
The model was run for the duration of the forecast period (2 weeks) using
the state configuration encoded in each root state

The Supplement includes the performance metrics for all the tested configurations on all scenarios and for all scenario-based experiments. Figure 3 summarizes the results for Experiment 1 with the VIC model application for the Blue River watershed, showing the distributions of the changes in MARE after marginalizing the results for each scenario and for each of the parameter assignments. That is, each box (and pair of whiskers) represents the distribution of the change in MARE of all cases in the specified scenario or for which the specified parameter assignment was used. Negative values on the vertical axis indicate that OPTIMISTS decreased the error, while positive values indicate it increased the error. It can be seen that, on average, OPTIMISTS improves the precision of the forecast in most cases, except for several of the configurations in Scenario 1 (for this scenario the control already produces a good forecast) and when using an assimilation step
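For reference, the two deterministic scores discussed here and below can be computed with what are assumed to be their standard definitions (MARE: mean absolute relative error; NSE: Nash–Sutcliffe efficiency). The series in this sketch are illustrative:

```python
# Standard definitions of the Nash-Sutcliffe efficiency (NSE) and the mean
# absolute relative error (MARE) for scoring deterministic forecasts.

def nse(obs, sim):
    # 1 is a perfect fit; 0 means no better than the observed mean.
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def mare(obs, sim):
    # Average relative deviation; 0 is a perfect fit.
    return sum(abs(o - s) / o for o, s in zip(obs, sim)) / len(obs)

observed = [10.0, 12.0, 20.0, 15.0, 11.0]   # illustrative streamflow series
simulated = [9.0, 13.0, 18.0, 16.0, 10.0]
score_nse = nse(observed, simulated)
score_mare = mare(observed, simulated)
```

Because the NSE squares the residuals, it emphasizes errors on the flow peaks, whereas the MARE weighs all time steps by their relative deviation; this difference underlies the tension between the two metrics noted in Experiment 2.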

Box plots of the changes in forecasting error (MARE) achieved while
using OPTIMISTS on Experiment 1 (Blue River). Changes are relative to an
open-loop control run where no assimilation was performed. Each column
corresponds to the distribution of the error changes on the specified
scenario or assignment to the indicated parameter. Positive values indicate
that OPTIMISTS increased the error, while negative values indicate it
decreased the error. Outliers are noted as asterisks and values were limited
to 100 %. For the one-objective case, the particles' MAE was to be
minimized; for the two-objective case, the likelihood given the background
was to be maximized in addition. No optimization (“false”) corresponds to

A

On the other hand, we found it counterintuitive that neither using a larger particle ensemble nor taking state-variable dependencies into account through the use of F-class kernels leads to improved results. In the first case it could be hypothesized that using too many particles could lead to overfitting, since there would be more chances of generating particles that happen to match the observations better but for the wrong reasons. In the second case, the non-parametric nature of kernel density estimation could be sufficient for encoding the raw dependencies between variables, especially in low-resolution cases like this one, in which correlations between variables in adjacent cells are not expected to be strong. Both results deserve further investigation, especially concerning the impact of D- vs. F-class kernels in high-dimensional models.

Interestingly, the ANOVA also yielded small

Continuous daily streamflow forecast performance metrics for the
Blue River application using OPTIMISTS (

Comparison of 6-day lead time probabilistic streamflow

Based on these results, we recommend the use of both objectives and no optimization as the preferred configuration of OPTIMISTS for the Blue River application. A time step of around 5 days appears to be adequate for this specific model application. Also, without strong evidence for their advantages, we recommend using more particles or kernels of class F only if there is no pressure for computational frugality. However, the number of particles should still be large enough to provide an appropriate sample size.

Table 5 shows the results of the 5-month-long continuous forecasting
experiment on the Blue River using a 30-particle PF and a configuration of
OPTIMISTS with a 7-day assimilation time step

Box plots of the changes in forecasting performance
(NSE

Figure 4 shows the probabilistic streamflow forecasts for both algorithms for a lead time of 6 days. The portrayed evolution of the density, in which the mean does not necessarily correspond to the centre of the ensemble spread, demonstrates the non-Gaussian nature of both estimates. Both the selected configuration of OPTIMISTS and the PF show relatively good performance for all lead times (1, 3, 6, and 12 days) based on the performance metrics. However, the PF generally outperforms OPTIMISTS.

We offer three possible explanations for this result. First, the relatively low dimensionality of this test case does not allow OPTIMISTS to showcase its real strength, especially since the large scale of the watershed does not allow for tight spatial interactions between state variables. Second, OPTIMISTS finds solutions based on multiple objectives rather than a single one, which could be advantageous when multiple types of observations are available (e.g. of streamflow, evapotranspiration, and soil moisture). The solutions are thus likely not the best for each individual objective; instead, the algorithm balances their overall behaviour across the multiple objectives. Because of the lack of observations of multiple variables, only streamflow observations are used in these experiments even though more than one objective is used. Since these objectives are, to a large extent, consistent with each other for the studied watershed, the strengths of using multiple objectives within the Pareto approach of OPTIMISTS cannot be fully evidenced. Third, additional efforts might be needed to find a configuration of the algorithm, together with a set of objectives, that best suits the specific conditions of the tested watershed.

While PFs remain easier to use out of the box because of their simpler configuration, the fact that adjusting the parameters of OPTIMISTS allowed us to trade off deterministic and probabilistic accuracy points to the adaptability of the algorithm: its configuration can probe the spectrum between exploration and exploitation of candidate particles, which usually lead to higher and lower diversity of the ensemble, respectively.

Box plots of the changes in forecasting performance (NSE

Figure 5 summarizes the changes in performance when using OPTIMISTS in Experiment 2. In this case, the more uniform forcing and streamflow conditions of the two scenarios allowed us to statistically analyse all three performance metrics. For Scenario 1 we can see that OPTIMISTS produces a general increase in the Nash–Sutcliffe coefficients but a deterioration in the MARE, evidencing the tension between fitting the peaks and the inter-peak periods simultaneously. For both scenarios there are configurations that performed very poorly, and we can look at the marginalized results in the box plots for clues as to which parameters might have caused this. Similar to the Blue River case, the use of a 1 h time step significantly reduced the forecast skill, while the longer steps almost always improved it; and the inclusion of the secondary history-consistency objective (two objectives) also resulted in improved performance. Not only does it seem that for this watershed the secondary objective mitigated the effects of overfitting, but it was also interesting to note some configurations in which using it actually helped to achieve a better fit during the assimilation period.

While the ANOVA also provided evidence against the use of optimization algorithms, we are reluctant to rule them out immediately because there were statistically significant interactions with other parameters (see the ANOVA table in the Supplement). The optimizers led to poor results in cases with 1 h time steps or when only the first objective was used. Other statistically significant results point to the benefits of using the root samples more intensively (as opposed to using random samples) and, to a lesser extent, to the benefits of maintaining an ensemble of moderate size.

Figure 6 shows the summarized changes in Experiment 3, where the effect of
the time step

Table 6 and Fig. 7 show the results from comparing continuous forecasts from the PF and from a configuration of OPTIMISTS with a time step of 1 week, two objectives, 50 particles, and no optimization. Both algorithms display overconfidence in their estimations, which is evidenced in Fig. 7 by the bias and narrowness of the ensembles' spread. It is possible that a more realistic incorporation of uncertainties pertaining to model parameters and forcings (which, as mentioned, are trivialized in these tests) would help to compensate for overconfidence. For the time being, these experiments help characterize the performance of OPTIMISTS in contrast with the PF, as both algorithms are deployed under the same circumstances. In this sense, while the forecasts obtained using the PF show slightly better results for lead times of 6 h and 1 day, OPTIMISTS shows a better characterization of the ensemble's uncertainty for the longer lead times.

Continuous hourly streamflow forecast performance metrics for the
Indiantown Run application using OPTIMISTS (

Comparison of 4-day lead time probabilistic streamflow

OPTIMISTS' improved results in the high-resolution test case over those in the low-resolution one suggest that the strengths of the hybrid method might become more apparent as the dimensionality, and therefore the difficulty, of the assimilation problem increases. However, while OPTIMISTS was able to produce comparable results to those of the PF, it was not able to provide definite advantages in terms of accuracy. As suggested before, additional efforts might be needed to find the configurations of OPTIMISTS that better match the characteristics of the individual case studies and, as with the Blue River, the limitation related to the lack of observations of multiple variables also applies here. Moreover, the implemented version of the PF did not present the particle degeneracy or impoverishment problems usually associated with these filters when dealing with high dimensionality, which also prompts further investigation.

It is worth noting that the longer the assimilation time step, the faster the entire process is. Even though the total number of hydrological calculations is ultimately the same, for every assimilation time step the model input files need to be generated and read, and the result files written and read back, and this whole process takes a considerable amount of time. Therefore, everything else being constant, sequential assimilation (as with PFs) automatically imposes additional computational requirements. In our tests we used RAM drive software to accelerate the process of running the models sequentially and, even then, the overhead imposed by OPTIMISTS was consistently below 10 % of the total computation time. Most of the computational effort remained with running the model, both for VIC and the DHSVM. In this sense, model developers may consider enabling their engines to receive input data directly from main memory, where possible, to facilitate data assimilation and similar processes.
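The per-step file-handling cost described above can be illustrated with a back-of-the-envelope calculation; the 5 s of I/O per assimilation step is a purely hypothetical figure chosen only to show how the overhead scales with the number of steps:

```python
# Illustrative only: sequential assimilation pays a fixed file-I/O cost per
# assimilation step, even when the total number of model calculations is equal.
def total_overhead_s(window_days, step_hours, io_per_step_s):
    """File generation/read/write overhead for one assimilation window."""
    steps = (window_days * 24) // step_hours
    return steps * io_per_step_s

# Assimilating a 28-day window with a hypothetical 5 s of file I/O per step:
hourly = total_overhead_s(28, 1, 5.0)       # 672 steps -> 3360 s of overhead
weekly = total_overhead_s(28, 24 * 7, 5.0)  #   4 steps ->   20 s of overhead
print(hourly, weekly)
```

Under these stand-in numbers, hourly (PF-like) stepping spends two orders of magnitude more time on file handling than weekly stepping, which matches the qualitative behaviour observed in the experiments.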

Finally, here we summarize the recommended choices for the parameters in OPTIMISTS based on the results of the experiments. In the first place, given their low observed effect on forecast skill, default values can be used for the least sensitive parameters.

Even though only two objective functions were tested, one measuring the departures from the observations being assimilated and the other measuring the compatibility of initial samples with the initial distribution, the results clearly show that it is beneficial to evaluate candidate particles using both criteria simultaneously. While traditional cost functions, like the one in Eq. (…), aggregate both criteria into a single scalar, treating them as separate objectives preserves the trade-off between fitting the observations and remaining consistent with the background states.

Our results demonstrated that the assimilation time step is the most sensitive parameter and, therefore, its selection should be made with the greatest care. Taking the results together, we recommend trying multiple step durations for any new case study, looking to strike a balance between the amount of information being assimilated and the number of degrees of freedom. This empirical selection should also be performed with a rough sense of which range of forecasting lead times is considered the most important. Lastly, more work is required to provide guidelines for selecting the number of particles.

In this article we introduced OPTIMISTS, a flexible, model-independent data assimilation algorithm that effectively combines the signature elements of Bayesian and variational methods. By employing essential features from particle filters, it produces probabilistic, non-Gaussian estimates of state variables by filtering a set of particles drawn from a prior distribution to better match the available observations. Adding critical features from variational methods, OPTIMISTS grants its users the option of exploring the state space using optimization techniques and of evaluating candidate states over a time window of arbitrary length. The algorithm fuses a multi-objective or Pareto analysis of candidate particles with kernel density probability distributions to effectively bridge the gap between the probabilistic and the variational perspectives. Moreover, the use of evolutionary optimization algorithms enables its efficient application to highly non-linear models such as those commonly found in the geosciences. This unique combination of features represents a clear differentiation from the existing hybrid assimilation methods in the literature (Bannister, 2016), which are limited to Gaussian distributions and linear dynamics.
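As a rough illustration of the Pareto analysis just described, the sketch below selects non-dominated particles under two minimization objectives. The particle states, background vector, and misfit values are toy stand-ins for the model-driven quantities OPTIMISTS actually computes over the assimilation window:

```python
import numpy as np

rng = np.random.default_rng(0)

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(objectives):
    """Indices of the non-dominated particles."""
    return [i for i, a in enumerate(objectives)
            if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)]

# Toy setup: each particle is a candidate initial-state vector; the two
# objectives (departure from the observations and departure from the
# background states) are random stand-ins for the real metrics.
particles = rng.normal(size=(50, 8))   # 50 particles, 8 state variables
background = np.zeros(8)               # hypothetical background mean
obs_misfit = rng.random(50)            # stand-in: misfit vs. observations
bg_misfit = np.linalg.norm(particles - background, axis=1)

objs = list(zip(obs_misfit, bg_misfit))
front = pareto_front(objs)
print(len(front), "non-dominated particles out of", len(particles))
```

Keeping the whole non-dominated set, rather than collapsing both criteria into one scalar cost, is what preserves the observation-versus-background trade-off that the multi-objective formulation exploits.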

We conducted a set of hydrologic forecasting factorial experiments on two watersheds, the Blue River with 812 state variables and the Indiantown Run with 33 455, at two distinct modelling resolutions using two different modelling engines: VIC and the DHSVM, respectively. Capitalizing on the flexible configurations available for OPTIMISTS, these tests allowed us to determine which individual characteristics of traditional algorithms prove to be the most advantageous for forecasting applications. For example, while there is a general consensus in the literature favouring extended time steps (4-D) over sequential ones (1-D–3-D), the results from assimilating streamflow data in our experiments suggest that there is an ideal duration of the assimilation time step that is dependent on the case study under consideration, on the spatiotemporal resolution of the corresponding model application, and on the desired forecast length. Sequential time steps not only required considerably longer computational times but also produced the worst results – perhaps given the overwhelming number of degrees of freedom in contrast with the scarce observations available. Similarly, there was a drop in the performance of the forecast ensemble when the algorithm was set to use overly long time steps.

Ensuring the consistency of candidate particles, not only with the observations but also with the state history, led to significant gains in predictive skill. OPTIMISTS can be configured both to perform Bayesian sampling and to find Pareto-optimal particles that trade off deviations from the observations against deviations from the prior conditions. This Bayesian, multi-objective formulation of the optimization problem was especially beneficial in the high-resolution watershed application, as it helped the model overcome the risk of overfitting produced by the amplified effect of equifinality.

On the other hand, our experiments did not produce enough evidence to recommend exploring the state space with optimization algorithms instead of simple probabilistic sampling, using more particles than the established baseline of 100, or the computationally intensive use of full covariance matrices to encode the dependencies between variables in the kernel-based state distributions. Nevertheless, strong interactions between several of these parameters suggest that some specific combinations could still yield strong outcomes. Together with OPTIMISTS' observed high sensitivity to its parameters, these results indicate that self-adaptive strategies (Karafotias et al., 2014) could prove useful for their selection in the future. With these experiments, we were able to configure the algorithm to consistently improve the forecasting skill of the models compared to control open-loop runs. Additionally, comparative tests showed that OPTIMISTS reliably produced forecasts comparable to those obtained by assimilating the observations with a particle filter in the high-resolution application. While it did not provide consistent accuracy advantages over the implemented particle filter, OPTIMISTS does offer considerable gains in computational efficiency given its ability to analyse multiple model time steps at once.
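The diagonal-versus-full-covariance choice in the kernel-based state distributions can be sketched as follows. This is an assumption-laden illustration using a simple Silverman-style bandwidth rule, not OPTIMISTS' exact sampling scheme; the particle set and weights are synthetic:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_kernel_density(particles, weights, n, full_cov=False):
    """Draw n new states from a Gaussian-kernel mixture centred on the
    weighted particles (a sketch of kernel-based resampling)."""
    m, d = particles.shape
    idx = rng.choice(m, size=n, p=weights)        # pick kernel centres
    h = (4.0 / (m * (d + 2))) ** (2.0 / (d + 4))  # Silverman-style factor
    if full_cov:
        # Full covariance captures dependencies between state variables,
        # but is costly to estimate and sample from in high dimensions.
        cov = h * np.cov(particles, rowvar=False, aweights=weights)
        noise = rng.multivariate_normal(np.zeros(d), cov, size=n)
    else:
        # Diagonal bandwidth: much cheaper, ignores cross-variable dependence.
        var = h * np.var(particles, axis=0)
        noise = rng.normal(scale=np.sqrt(var), size=(n, d))
    return particles[idx] + noise

states = rng.normal(size=(100, 5))   # synthetic particle set: 100 particles, 5 variables
w = np.full(100, 0.01)               # uniform weights
new = sample_kernel_density(states, w, 200)
print(new.shape)  # (200, 5)
```

The full-covariance branch is what the experiments found computationally intensive without clear accuracy gains; the diagonal branch is the cheaper default.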

Moreover, in this article we offered several alternatives in the implementation of the components of OPTIMISTS whenever there were tensions between prediction accuracy and computational efficiency. In the future, we will focus on incorporating additional successful ideas from diverse assimilation algorithms and on improving components in such a way that both of these goals are attained with ever-smaller compromises. For instance, the estimation of initial states should not be overburdened with the responsibility of compensating for structural and calibration deficiencies in the model. In this sense, we embrace the vision of a unified framework for the joint probabilistic estimation of structures, parameters, and state variables (Liu and Gupta, 2007), where it is important to address challenges associated with approaches that would increase the indeterminacy of the problem by adding unknowns without providing additional information or additional means of relating existing variables. We expect that with continued efforts OPTIMISTS will be a worthy candidate framework to be deployed in operational settings for hydrologic prediction and beyond.

All the data utilized to construct the models are publicly available on the internet from the corresponding US government agencies' websites. The Java implementations of OPTIMISTS and of the particle filter are available through GitHub (2018).

The supplement related to this article is available online at:

FH designed and implemented OPTIMISTS, implemented the particle filter, performed the experiments, and drafted the manuscript. XL identified problems for study; provided guidance; supervised the investigation, including experiment design; and finalized the manuscript.

The authors declare that they have no conflict of interest.

The authors are thankful to the two referees and the editor for their valuable comments and suggestions. This work was supported in part by the United States Department of Transportation through award no. OASRTRS-14-H-PIT to the University of Pittsburgh and by the William Kepler Whiteford Professorship from the University of Pittsburgh.

Edited by: Dimitri Solomatine
Reviewed by: Maurizio Mazzoleni and one anonymous referee