Comment on hess-2021-64

The manuscript “Spatiotemporal development of the 2018–2019 groundwater drought in the Netherlands: a data-based approach” by Brakkee et al. applies time series modeling to simulate groundwater levels (GWL) for the study of groundwater droughts, and I found it interesting to read. The authors deal with the common problem of irregular time steps between GWL observations: time series models are used to obtain GWL time series that can be used to study the droughts of 2018 and 2019 in the Netherlands. I should acknowledge that my response may not be entirely impartial, as I am one of the authors of the Pastas software, and I apologize in advance for any self-referencing. Nevertheless, I would like to offer the Authors some suggestions to strengthen confidence in the reported results and to improve the reproducibility of the manuscript. I will restrict myself to the time series modeling approach, as I think others have already provided excellent reviews of the manuscript in its entirety.


Effect of use of linear model
An important assumption underlying this study is that a linear recharge model can be used to accurately simulate the effects of precipitation and evaporation on the GWL. Previous studies (e.g., Berendrecht et al., 2005; Peterson and Western, 2014; Collenteur et al., 2020) have shown that this assumption may not always be valid, particularly during drought periods, when non-linear unsaturated zone processes become more important (e.g., evaporation limited by the availability of soil moisture). This matters here because these drought events are precisely the periods of interest in this study. This model deficiency may partly explain the large RMSE and ME values in Table 3 of the manuscript and the results shown in Figure 8. The linear model could still be an appropriate choice here, but in my opinion this assumption needs to be justified. The impact of this assumption on the estimated SGI values and on the results in general could also be addressed in the discussion section (e.g., in lines 450-455).
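
To illustrate the concern, here is a minimal sketch contrasting a linear recharge flux with a toy soil-moisture bucket during a dry spell. It assumes the linear model has the form R = P - f * E. The bucket model, its parameter values (capacity, storage) and the flux values are hypothetical and chosen only for illustration. The bucket is not one of the non-linear models implemented in Pastas, nor the method used in the manuscript.

```python
def linear_recharge(prec, evap, f=1.0):
    """Linear recharge R = P - f * E; negative whenever f * E exceeds P."""
    return [p - f * e for p, e in zip(prec, evap)]


def bucket_recharge(prec, evap, capacity=100.0, storage=50.0):
    """Toy soil-moisture bucket (hypothetical): evaporation is limited by storage."""
    recharge = []
    for p, e in zip(prec, evap):
        storage += p
        actual_evap = min(e, storage)  # cannot evaporate more water than is stored
        storage -= actual_evap
        drainage = max(storage - capacity, 0.0)  # overflow becomes recharge
        storage -= drainage
        recharge.append(drainage)
    return recharge


# Hypothetical 60-day dry spell (mm/d): no rain, constant potential evaporation
prec = [0.0] * 60
evap = [4.0] * 60

linear = linear_recharge(prec, evap, f=0.8)
limited = bucket_recharge(prec, evap)

print("cumulative linear recharge (mm): ", round(sum(linear), 1))
print("cumulative limited recharge (mm):", round(sum(limited), 1))
```

In this toy setting the linear flux stays negative throughout the dry spell, whereas the bucket stops losing water once its storage is exhausted. This is the mechanism by which a linear model could produce simulated GWL that keep declining during a prolonged drought.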

Uncertainty in time series modeling and its impact on the SGI
Time series models are used to obtain regular GWL time series, comparable to the approach presented by Marchant and Bloomfield (2018). In that study the uncertainty of the simulated GWL was also quantified and used to compute the uncertainty of the SGI values. I think it would be interesting to do this in this study as well (or discuss why this is not done), given that, despite generally good fits, the simulated GWL time series and thus the SGI may still have considerable uncertainties.
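
As a rough sketch of how such uncertainty could be propagated, the example below computes the SGI as a normal-scores transform for an ensemble of perturbed simulations and reports an interval for one year. Everything here is an assumption made for illustration: the head values, the error standard deviation `sigma`, the plotting position (rank - 0.5) / n, and the use of independent Gaussian errors. A real analysis would also have to account for autocorrelated residuals and parameter uncertainty, and this is not the procedure of Marchant and Bloomfield (2018).

```python
import random
from statistics import NormalDist


def sgi(levels):
    """Normal-scores transform: rank -> plotting position -> standard normal quantile."""
    n = len(levels)
    order = sorted(range(n), key=lambda i: levels[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return scores


random.seed(0)

# Hypothetical simulated heads (m) for one calendar month in 10 different years
simulated = [1.20, 1.35, 1.10, 1.42, 0.95, 1.30, 1.25, 0.80, 0.90, 1.15]
sigma = 0.05  # assumed standard deviation of the simulation error (m)

# Ensemble of plausible head series, each transformed to SGI separately
ensemble = [[h + random.gauss(0.0, sigma) for h in simulated] for _ in range(1000)]
sgi_members = [sgi(member) for member in ensemble]

# Approximate 95% interval of the SGI for the year with the lowest simulated head
driest_year = simulated.index(min(simulated))
values = sorted(member[driest_year] for member in sgi_members)
lower, upper = values[25], values[974]
print(f"SGI in driest year: {sgi(simulated)[driest_year]:.2f} "
      f"(approx. 95% interval {lower:.2f} to {upper:.2f})")
```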

Reproducibility of results
Some of the claims made in the manuscript depend directly on how the time series modeling was done, which is described only very briefly in Sections 3.1 and 3.2. From the information in the manuscript it is not possible to reproduce the results of this study or to verify these claims. I therefore think some work is required to improve the reproducibility of the presented work. This could take the form of a much more detailed description of the modeling process (e.g., model structure, model and calibration settings, software versions), but an easier route may be to upload all scripts and data (if allowed) to an online repository (e.g., Zenodo) and assign a DOI. This would enable other researchers to build upon this work more easily and make it a more valuable contribution.
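
As one small example of what could accompany the scripts, the sketch below records the Python version and the installed versions of a list of packages using only the standard library. The package list is my assumption about what the workflow depends on and should be adapted to the actual scripts.

```python
import json
import platform
from importlib import metadata


def collect_versions(packages):
    """Return the Python version and the installed version of each package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Assumed dependency list; replace with the packages the scripts actually import
versions = collect_versions(["pastas", "numpy", "scipy", "pandas"])
print(json.dumps(versions, indent=2))
```

Storing this output next to the scripts and data in the repository would document the software environment without adding length to the manuscript.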

Specific Line Comments:
Please find a few specific line comments below.
L168. I would kindly ask the Authors to change "PASTAS" to "Pastas" throughout the manuscript.
L168-170. I think it would be good to rephrase this sentence, because it reads as if this was the goal of developing Pastas, which is incorrect. The use of (Python) scripting is what allows the models to be applied in larger workflows.

L170. It is unclear from the manuscript or the reference what "the basic settings of Pastas" are. These should either be described in the manuscript, or the scripts and data should be provided in an Appendix or an external repository. Sharing the scripts would improve reproducibility without requiring the Authors to go into detail about the modeling in the manuscript itself and thereby increase its length.

L172. Pastas has multiple non-linear recharge models, based on Berendrecht et al. (2005) and Collenteur et al. (2020). It is not clear which of these was tried here, and this would be valuable information. Moreover, the statement is not supported by any data presented in the manuscript, so such a general statement is hard to verify. Since the non-linear models have more parameters with which to fit the data than the linear model, I find this result somewhat surprising. However, it could be that the linear model really does work better, perhaps because evaporation of groundwater occurs at most monitoring wells (which would be an interesting finding in itself).
L173-174. "Small minority" could be quantified (e.g., XX wells). Also, the current phrasing of this sentence seems to suggest that a non-linear model was used for some locations, but in that case no parameter "f" would be available for those models (L185).
L462. The overestimation of the drought may also result from the use of a linear model, in which evaporation is not limited by soil water availability and may therefore cause the simulated GWL to decline too far.