This is a re-review of the paper by Berg et al. on "Covariance resampling for particle filter – state and parameter estimation for soil hydrology". I have carefully read the responses of the authors' to my earlier comments and those of the other referee. The authors' responses are professional and clearly indicate how and where changes were made and why. This certainly makes reviewing easier (as not all authors write a detailed response letter). In my letter to the editor I wrote the following:
"...I can entertain/live with some of their answers to my comments, yet, I remain skeptical that the proposed method is going to work for more complex problems - and/or enlarged parameter space (current ranges are rather tight). The authors' responses to my more theoretical questions I more or less expected. They do not deny/refute important weaknesses/difficulties of their proposed resampling method. In my view the paper focuses too much on practical application without going in depth about the underlying theory. Some changes have been made to the paper to mention problems with PDF approximation, yet, I feel that the method is not properly demonstrated with the Richards' type flow model. Authors emphasize in their response letter their interest in the application - and that the PF is designed for soil hydrology (not mentioned this way) - and thus state/parameter dimensionality is not of their concern. In my view this really limits the potential impact of the paper. Right now, the paper is very much an Engineering solution - which may work well for simple problems with a model that tracks the data closely (transition density is more than adequate) and low state and parameter dimensionality. Beyond this simple case nothing else is demonstrated. What is more, the authors agree with my comment on the subjectivity of the inflation factor, gamma (see equation 21) - but their response that this variable can be estimated on the fly during assimilation is problematic as several different values of gamma will work. Some value of gamma will provide a nicer convergence of the multivariate parameter and state distribution than others. But all those gamma values are equally likely; in other words, there is no metric that will help determine what value of gamma to use; so what value to use in practice? Personally, I find this troublesome - hence why I used the wording "Engineering Solution". I can live with this but this really diminishes the impact of the work. Of course, HESS is not a statistics journal, yet, research focused on methodological improvements should in my view be held to higher standards than more application oriented papers. Ideally, we would want people from other fields use our methodologies - that requires that improvements satisfy underlying statistical principles. This is not the case. I will not use this to say that the paper should not be published. I just want to express that I think that the paper could be much better if:
1) the authors were more concerned with "statistics", that is, satisfying/honoring important principles of convergence and approximation. The resampling method does not leave the target PDF invariant. This is something that I think we should care about as others may apply the method and present the results as if they are exactly right. This will not be the case. I see a parallel here with the poor numerics of many models used in surface hydrology. For years, this issue was denied/not investigated as people believed that the numerical error would be small compared to, say, measurement errors. Then, from 2004 papers by Kavetski, Clark, Schoups, etc. have demonstrated the effect of poor numerics on parameter and state estimates. Turned out to have a major effect; what is more, this work more or less refuted the need for global optimization methods as mass-conservative solvers turned out to produce much smoother response surfaces with a much better defined global optimum. Thus, a deeper focus on the methodology may proof very useful - and will certainly enhance tremendously the impact of the paper.
2) The application in the presented paper is rather weak - that is - the model (transition density) is (essentially) perfect and the parameter/state space is rather low. This is not realistic for real-world applications. This begs the question on how well the method is going to do in practice. In my view this is an important shortcoming.
For a hydrology paper, I can certainly accept weakness (1) above, yet then I would expect the authors to carefully examine their method; this is not the case - weakness (2) has not been addressed in the revision. I think it is important to do so - but will not object against publication if the authors do not address this. I do not want to hold up publication - as I have great respect for Prof. Roth and his co-authors. Hence my recommendation for a minor revision."
Then some other remaining comments:
1. Table 1 - alpha values should have units of 1/m? Or are authors presenting values of 1/alpha instead? That explains the very large values of -12.4 and -7.5 for alpha1 and alpha2 as the air-entry value?
2. Parameter ranges are rather narrow. What would happen if we enlarge n to say 1.1 - 6?
3. Table 1: Are the log(Kw) values natural log-values? I do not think so - otherwise we get values of exp(-4) * 60 * 60 * 24 ~ 1600 m/day. So they must be log10 right? With log10(Kw) values between -7 and -4 the range of Kw is between 0.0086 and 8.64 m/day. The lower bound of 0.86 cm/day may still be considered large for say fine textured soils. Therefore I would suggest using a range of log10(Kw) between -10 and -4 or so. Point is - the parameter space is narrow - this simplifies tremendously inference - but may not necessarily demonstrate convergence properly; how does the filter work with enlarged parameter spaces that encompass a larger ranges of soils?
4. Line 18: generic algorithm? Or genetic algorithm? Generic algorithm would not make sense as this has been presented before in other papers - for example Particle-DREAM. A genetic algorithm on the contrary will not maintain detailed balance unless they use the Differential Evolution variant with Metropolis step as done in DE-MC and DREAM. I have to read this new paper to understand/see what has been done and how this differs from previous DE-MC/DREAM MCMC work.
5. Figure 11: Those 100 ensemble members - are they randomly chosen? Or are these the "best" members; I would be interested to see all the members as "visibility" is not an argument here. You cannot see the 100 different lines anyway (much fewer in practice as particles have many copies). Instead of using Lines you can color the 95% ranges of all particles.
6. In section 6.3, the authors attempt to address the model error by perturbing the forcing ( = boundary conditions). This is not really a model error but measurement error. Indeed, the PF should rapidly address ( = correct for) those errors in a few assimilation steps as this entails conservation of mass. What is much more difficult is to address actual model errors due to an incorrect/absent process representation. Those errors lead to much more complicated and predictable residual patterns, making state and certainly parameter estimation more difficult. Will produce parameter estimates that may compensate at least somewhat for the model errors. I think it is important to address, at least in writing, this difference between forcing data errors and model errors. These two errors are not equal.
Certainly, the authors have taken the time to digest the PF and find ways to improve its efficiency/effectiveness. This is not easy and involves lots of statistics. I hope I encouraged the authors to explore further the methodology they proposed. The comments are certainly intended to do so.
Jasper A. Vrugt |