Reply on RC2

Anonymous Referee 2: In this work, Almeida et al. compared the performance of 2-D lake models with/without accounting for lateral flow, 1-D lake models (Hostetler-based and FLake) and data-based ANN models in simulating the thermal regimes of 24 reservoirs in Portugal. They demonstrated that for reservoirs with short WRT, it is important to represent the effect of lateral flow and water level fluctuation in the lake models of GCMs and RCMs. Although the importance of lateral flow in the thermal regimes of reservoirs has been investigated by previous studies, the work of Almeida et al. is novel in three aspects: 1) the investigation of a large set of reservoirs, 2) the inclusion of ML methods, and 3) the comparison of multiple 1-D lake models. The manuscript is well written and easy to follow. I agree with the comments of the first reviewer and provide additional comments. I recommend the publication of this work after these comments are addressed.

reproduce the behavior of the Hostetler model, as implemented in various systems including WRF, and to additionally refine the model eddy diffusivity parameterization.
COMMENT: Second, it looks like the 1-D lake models were not calibrated in the study, but the ANN model, because it is based on the 2-D reservoir model, was implicitly calibrated. Thus, in my view, the comparison of their performance in the current format is unfair. According to my own experience, by calibration, 1-D lake models can also mimic some effect of lateral flow and water level change. But whether this is physically sound is another story. However, my point is that the current experiment design does not convince me of the superiority of ANN over 1-D models in representing lake thermal dynamics for GCMs and RCMs, because when we have data to train ANN we can also use the data to calibrate 1-D models.

RESPONSE:
Thank you for this comment. We understand the reviewer's concern. In our opinion, the comparison of the models' results (1-D models versus ANN) would be unfair only if the terms of the comparison were unknown. Simple models like FLake and Hostetler are coupled with numerical weather prediction models because of their computational efficiency, but also because their parameters should not need to be re-evaluated when the model is applied to a specific lake. This is the principle that guided the development of both models. It is also the reason why the parameterization of eddy diffusion described by Hostetler and Bartlein (1990) followed the Henderson-Sellers (1985) method instead of parameterizations requiring individual model calibration (e.g., Sundaram and Rehm, 1973). It is not feasible to calibrate dozens or hundreds of lakes for numerical climate prediction. Therefore, to estimate the performance of 1-D models in the way they are applied in regional and global models, we did not calibrate them during the development of this work.
At this point, in our opinion, a question needs to be answered: what is the way forward when it comes to improving on or reducing the impact of all of the above-mentioned limitations, in particular the neglect of horizontal transport process?
We agree with the reviewer. Through the calibration of 1-D models it is possible to "mimic" some effect of lateral flow, but in our opinion this is not the best way to address the issue. We think that, by forcing other parameters or constants, we are probably unbalancing the model's response in certain specific conditions. Moreover, as the reviewer says, we do not ensure a physically sound response. Could the solution for improving the parameterization of lake inflows and outflows lie in the use of a simplified hydrological model? Reducing this approach to its basics: we would compute inflows from precipitation, taking into consideration a constant runoff coefficient and a constant lake outflow. In our opinion, considering that we need to avoid the calibration of the model, this solution could also substantially increase the errors associated with surface-water temperature predictions.
This was the reason why we included the ANN in our study. We think that progress in improving the parameterization of lakes in the climate system can be obtained by a combination of both approaches, process-based physical models and machine-learning solutions, once the limitations and advantages of each have been considered. It is true that the use of machine-learning approaches relies on the existence of training data that can sometimes be difficult to obtain. We think that, with the constant development of remote-sensing technologies, this limitation can be considerably diminished. It is also important to mention that, after the initial work of defining the neural network and all its components is done, the ANN needs to be trained, not calibrated, which is different. In our study we show that this approach can be a good solution for this problem.
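To make the simplified hydrological approach mentioned above concrete, the following is a minimal sketch of a daily reservoir water balance with a constant runoff coefficient and a constant outflow. It is an illustration only, not the scheme used in the manuscript: the function name, the parameter names, the daily time step and the units are all assumptions introduced here.

```python
def simulate_storage(precip_mm, catchment_area_km2, runoff_coeff,
                     outflow_m3_per_day, initial_volume_m3):
    """Daily storage series from a constant-coefficient water balance.

    Hypothetical helper: every name and unit here is an assumption made
    for illustration, not part of the manuscript.
    """
    volumes = []
    volume = initial_volume_m3
    for p in precip_mm:
        # 1 mm of rain over 1 km2 equals 1e-3 m * 1e6 m2 = 1000 m3.
        inflow_m3 = runoff_coeff * p * catchment_area_km2 * 1000.0
        # Constant outflow; storage is not allowed to become negative.
        volume = max(volume + inflow_m3 - outflow_m3_per_day, 0.0)
        volumes.append(volume)
    return volumes


# Two days: 10 mm of rain, then a dry day, over a 100 km2 catchment.
storage = simulate_storage(
    precip_mm=[10.0, 0.0],
    catchment_area_km2=100.0,
    runoff_coeff=0.3,
    outflow_m3_per_day=1.0e5,
    initial_volume_m3=1.0e6,
)
```

Because the runoff coefficient and the outflow are fixed, such a scheme cannot follow the seasonal or operational variability of real reservoirs, which is consistent with the concern expressed above about the resulting errors.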

Specific comments
COMMENT: L22-24: as indicated above, I do not think the current results can make such a statement. Further, there is another difficulty for ANN models to replace 1-D lake models in GCMs and RCMs. Compared with ANN models, 1-D lake models are much more generalized because they are physically based. For example, due to the limitation of model resolutions, usually the lake grid cells in GCMs and RCMs do not directly correspond to real lakes. We still do not know whether ANN models trained by data from real lakes can be extended to artificial lake grid cells.

RESPONSE:
Thank you for your comment. We understand the reviewer's concerns. To clarify our point, we have included the following sentence in the revised version of the manuscript in line 24.
"Overall, results suggest that the combined use of process-based physical models and machine-learning models will considerably improve the modeling of air-lake heat and moisture fluxes." We think that the result is quite balanced because we say that: "Our findings also highlight the efficiency of the machine-learning approach, which may overperform process-based physical models both in accuracy and in computational requirements, if applied to reservoirs with long-term observations available." We do not say that machine-learning approaches are better, only that they may perform better in certain conditions. The heat fluxes retrieved from the output of an ANN will affect the near-surface atmospheric layer in the same way as those from a physically based model. The type of output of both models is precisely the same. In our opinion, the mismatch of the lake grid cells in GCMs and RCMs with the real lake dimensions is indeed a problem, but it is a problem for the physically based model, whose performance is greatly affected by the quantification of the lake maximum and mean depths. Our concern regarding the coupling of an ANN with a GCM or RCM lies more in the implementation of the training phase of the ANN. Nonetheless, we believe that this constraint can be overcome with time.
COMMENT: Table 1: Did you use the bathymetry data of the 24 reservoirs to set up the models? Or did you only use mean depth, maximum depth and surface area to construct ideal bathymetry for these reservoirs? Sometimes, the uncertainty in bathymetry can introduce large uncertainty in 2-D lake modeling.

RESPONSE:
Thank you for this question. Yes, we used the bathymetry data retrieved from 1:25000 topographic charts of the watershed areas to be flooded, drawn up prior to the dams' construction. We understand the reviewer's concern, as uncertainty in bathymetry can indeed considerably affect 2-D model results. The majority of the 2-D models considered here were also used for water-quality research studies which were finalized before the development of this manuscript. Therefore, they were thoroughly tested. In order to address a comment by reviewer 1, we have included the above-mentioned information and a table with the grid dimensions of each reservoir.
COMMENT: L152-154: please rewrite this sentence. It is difficult to understand.

RESPONSE:
Thank you for pointing this out. This sentence has been rewritten.
Lines 152-153: "SWT time series were compared using statistic error measures (see Sect. 3.3 for more details), which allowed the assessment of the relation between reservoir WRT and the error that results when the advection due to inflows and outflows is neglected (as mentioned in the introduction, a common feature of contemporary GCMs and RCMs)." was replaced with "SWT time series obtained with both scenarios, W2 hydrology and W2, were compared using statistic error measures (see Sect. 3.3 for more details), assessing the relationship between the reservoir WRT and the error resulting from the neglect of advection due to inflows and outflows (as mentioned in the introduction, a common feature of contemporary GCMs and RCMs)."
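
As an illustration of the comparison described in the rewritten sentence, the sketch below computes three common error measures between two SWT time series, for example one from the scenario with inflows and outflows and one from the scenario without them. The choice of bias, MAE and RMSE is an assumption made here for illustration; the measures actually used are those defined in Sect. 3.3 of the manuscript. The function name and the sample values are hypothetical.

```python
import math


def error_measures(reference, candidate):
    """Bias, MAE and RMSE of `candidate` relative to `reference`.

    Hypothetical helper: both inputs are equal-length sequences of
    surface water temperatures (deg C) on the same dates.
    """
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("series must be non-empty and of equal length")
    diffs = [c - r for r, c in zip(reference, candidate)]
    n = len(diffs)
    return {
        "bias": sum(diffs) / n,                           # mean signed error
        "mae": sum(abs(d) for d in diffs) / n,            # mean absolute error
        "rmse": math.sqrt(sum(d * d for d in diffs) / n)  # root mean square error
    }


# Made-up values, for illustration only.
swt_with_flows = [10.0, 12.0, 14.0]
swt_without_flows = [11.0, 12.0, 16.0]
measures = error_measures(swt_with_flows, swt_without_flows)
```

Computing these measures per reservoir and plotting them against the reservoir WRT would give the kind of relationship the sentence refers to.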