Exploring Long-term Monthly Prediction of Precipitation Isotopes over Southeast Asia: A Comparative Analysis of Machine-Learning Models

Heydarizad, Mojtaba; Zhongfang, Liu; Pumijumnong, Nathsuda; Minaei, Masoud; Salari, Pouya; Sori, Rogert; Ghalibaf Mohammadabadi, Hamid

DOI: https://doi.org/10.5194/hess-2023-299

Preprints

23 Jan 2024

Status: this discussion paper is a preprint. It has been under review for the journal Hydrology and Earth System Sciences (HESS). The manuscript was not accepted for further review after discussion.

Abstract. Stable isotope methods are essential for studying tropical hydrology and climatology. This research investigated the influence of large-scale climate modes (teleconnection indices) and local meteorological parameters on the stable isotope contents of precipitation at six stations in Southeast Asia: Bangkok, Kuala Lumpur, Jakarta, Kota Bharu, Jayapura, and Singapore. To this end, several machine learning (ML) techniques were employed, including shallow neural network (SNN), deep neural network (DNN), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). XGBoost demonstrated the highest accuracy at the majority of the studied stations, with R² = 0.91, VNS = 0.90, AIC = 405, BIC = 410, and RMSE = 0.76. DNN exhibited superior accuracy in specific cases, achieving R² = 0.87, VNS = 0.87, AIC = 445, BIC = 460, and RMSE = 1.10. Furthermore, a bootstrap analysis was conducted to assess the uncertainty of the simulated data at each station. The results of this analysis demonstrated acceptable accuracy, as the majority of simulated data points fell within the 95 % confidence intervals. Finally, stable isotope contents in precipitation were forecasted for one year using Vector Autoregression (VAR) and ML techniques. This study underscores the efficacy of ML techniques in both simulating and forecasting stable isotope contents with high precision. The inclusion of specific accuracy metrics strengthens the validity of claims in this study and provides a clearer picture of the quantitative outcomes of this research.

Received: 18 Dec 2023 – Discussion started: 23 Jan 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: closed

CC1: 'Comment on hess-2023-299', Ali Mobaraki, 28 Jan 2024

This work presents an interesting and novel application of machine learning techniques to simulate and forecast the stable isotope contents in precipitation over Southeast Asia. The authors use various teleconnection indices and meteorological parameters as predictors and compare the performance of different machine learning models. They also conduct a bootstrap analysis to assess the uncertainty of the simulated data and use vector autoregression to forecast the stable isotope contents for one year.
I have a question about the methods section of the abstract. The authors mention that they used several machine learning techniques, such as shallow neural network, deep neural network, decision tree, random forest, and extreme gradient boosting. However, they do not explain comprehensively how they selected and optimized these techniques for their data. How did the authors choose the appropriate hyperparameters, such as the number of layers, nodes, and activation functions for the neural networks, or the number of trees, depth, and splitting criteria for the decision tree, random forest, and extreme gradient boosting? How did they avoid overfitting or underfitting their models? What criteria did they use to evaluate and compare the performance of different models?
I would appreciate it if the authors could provide more details on their methods and results. Thank you for your attention.

Citation: https://doi.org/10.5194/hess-2023-299-CC1
- AC1: 'Reply on CC1', Mojtaba Heydarizad, 02 Feb 2024
  
  Dear reader,
  
  Thank you for your interest and feedback on our manuscript. We (the authors) clarify our methods and results in more detail below. Here is our point-by-point response to your comment:
  Comment: The authors mention that they used several machine learning techniques, such as shallow neural network, deep neural network, decision tree, random forest, and extreme gradient boosting. However, they do not explain comprehensively how they selected and optimized these techniques for their data. How did the authors choose the appropriate hyperparameters, such as the number of layers, nodes, and activation functions for the neural networks, or the number of trees, depth, and splitting criteria for the decision tree, random forest, and extreme gradient boosting? How did they avoid overfitting or underfitting their models? What criteria did they use to evaluate and compare the performance of different models?
  Response: We (the authors) apologize for not providing enough detail about the models we developed using machine learning techniques. We will revise our manuscript and include the following sentences in the revised version:
  Firstly, to select and optimize the machine learning techniques for our data, we performed a grid search with cross-validation to find the best combination of hyperparameters for each technique. For the neural networks, we varied the number of layers (from 1 to 3), the number of nodes (from 10 to 100), and the activation function (sigmoid, tanh, or relu). For the decision tree, random forest, and extreme gradient boosting, we varied the number of trees (from 10 to 100), the maximum depth (from 2 to 10), and the splitting criterion (gini or entropy).
  Secondly, we avoided overfitting and underfitting by using regularization techniques, such as dropout and L1 and L2 penalties, and by monitoring the training and validation errors. We evaluated and compared the performance of the different models using the root mean square error (RMSE), the coefficient of determination (R²), and the Nash-Sutcliffe efficiency (NSE).
  We hope this clarifies our methods and results.
  
  Citation: https://doi.org/10.5194/hess-2023-299-AC1
  - CC5: 'Reply on AC1', Ali Mobaraki, 24 May 2024
    
    Dear Author,
    Thank you for responding to our comment. Could you please upload the code you developed for the simulation of stable isotopes along with your manuscript?
    
    Citation: https://doi.org/10.5194/hess-2023-299-CC5
    
    AC5: 'Reply on CC5', Mojtaba Heydarizad, 02 Jun 2024
    
    Dear Community,
    Thank you for your interest in our research and for your comment.
    Currently, our paper is in the preprint stage and undergoing review. As such, we have not yet uploaded the original figures and code scripts associated with our study. However, we understand the importance of transparency and reproducibility in scientific research.
    To assist you in the meantime, you can find similar code scripts developed for the simulation of Tehran precipitation on the ResearchGate page of the first author of this preprint. These scripts are part of a corresponding paper titled “Stable Isotope Signatures in Tehran’s Precipitation: Insights from Artificial Neural Networks, Stepwise Regression, Wavelet Coherence, and Ensemble Machine Learning Approaches,” which provides a step-by-step explanation of the code's structure. This should help you understand the procedure we used in our preprint.
    If this preprint is published, we will ensure that all relevant materials, including the code, are made available in accordance with the journal's policies. This will allow you and other researchers to fully reproduce and build upon our work.
    Best regards,
    
    Citation: https://doi.org/10.5194/hess-2023-299-AC5
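
The grid search with cross-validation described in AC1 can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the data are synthetic, the model is a deliberately simple one-predictor ridge regression standing in for the neural-network and tree models, and the names `k_fold_indices`, `fit_ridge`, and `grid_search_cv` are hypothetical. Because the target is a continuous isotope value, the sketch scores each candidate with a squared-error measure (RMSE); Gini and entropy are criteria for classification targets.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ridge(x, y, alpha):
    """Fit y ~ a + b*x with an L2 penalty alpha on the slope b only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / (sxx + alpha)
    return my - b * mx, b


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def grid_search_cv(x, y, alphas, k=5):
    """Return (best_alpha, scores); scores maps alpha -> mean validation RMSE."""
    folds = k_fold_indices(len(x), k)
    scores = {}
    for alpha in alphas:
        fold_errors = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(x)) if i not in held]
            a, b = fit_ridge([x[i] for i in train], [y[i] for i in train], alpha)
            preds = [a + b * x[i] for i in held_out]
            fold_errors.append(rmse([y[i] for i in held_out], preds))
        scores[alpha] = sum(fold_errors) / k
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic example: a temperature-like predictor and a noisy linear response.
rng = random.Random(42)
x = [rng.uniform(24.0, 30.0) for _ in range(120)]
y = [-0.4 * xi + 5.0 + rng.gauss(0.0, 0.5) for xi in x]
best_alpha, cv_scores = grid_search_cv(x, y, alphas=[0.0, 1.0, 10.0, 100.0, 1000.0])
```

Each candidate value is scored only on folds it was not fitted on, which is what guards the selection against overfitting. With a library such as scikit-learn the same loop is typically delegated to `GridSearchCV`, with the grid extended to the layer, node, tree, and depth ranges listed in the reply.
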
CC2: 'Comment on hess-2023-299', Ville Järvinen, 15 Feb 2024

This manuscript presents a thorough investigation into stable isotope signatures in precipitation across Southeast Asia, employing a range of advanced statistical and machine learning techniques. However, while the incorporation of teleconnection indices as regional parameters provides valuable insight into large-scale climatic influences, I am curious about the potential influence of local land-sea interactions on precipitation isotopic composition. Given the proximity of several study sites to coastal regions, have the authors considered incorporating variables related to sea surface temperatures or coastal proximity in their analysis? These factors can play a significant role in modulating local precipitation patterns through mechanisms such as sea breeze circulation and moisture transport. Could the authors discuss whether the omission of such coastal-related variables might introduce biases or limitations to their findings, and if so, how they propose to address or account for these factors in their analysis?

Citation: https://doi.org/10.5194/hess-2023-299-CC2
- AC2: 'Reply on CC2', Mojtaba Heydarizad, 18 Feb 2024
  
  We (the authors) thank you for your comment and interest in our manuscript. We appreciate your constructive suggestions to improve our study.
  We (authors) agree that local land-sea interactions can have an influence on precipitation isotopic composition, especially in coastal regions where sea surface temperatures (SST) and coastal proximity can affect the local hydrological cycle and atmospheric circulation. Previous studies have shown that SST and coastal proximity can affect the isotopic fractionation, evaporation, condensation, and precipitation processes, as well as the moisture sources and transport pathways, of precipitation in different regions. Therefore, we acknowledge that incorporating these variables in our analysis could potentially improve our understanding of the spatial and temporal variations of stable isotopes in precipitation across Southeast Asia. However, we also note that there are some challenges and limitations in obtaining and using these variables in our analysis.
  First, SST and coastal proximity data of consistent availability and quality do not exist for the study sites and period. We used the Global Network of Isotopes in Precipitation (GNIP) database to obtain the stable isotope and meteorological data for the studied stations in Southeast Asia, but this database does not provide SST or coastal proximity data. We searched for alternative sources, such as the NOAA Optimum Interpolation Sea Surface Temperature (OISST) [1] and the Global Self-consistent, Hierarchical, High-resolution Geography Database (GSHHG) [2], but these data sets differ from the GNIP data in spatial and temporal resolution, coverage, and accuracy. For example, the OISST data are gridded at 0.25°×0.25° with daily or monthly time steps, whereas the GNIP data are station-level records with monthly or annual time steps. The GSHHG data have a spatial resolution of 0.01°×0.01° and are static in time. Therefore, these higher-resolution data sets cannot be integrated with the GNIP data without introducing uncertainties and errors into the analysis.
  In addition, we used various statistical and machine learning techniques to simulate the stable isotope content in precipitation based on local and regional variables, such as potential air evaporation, wind speed, vapor pressure, air temperature, relative humidity, precipitation amount, and teleconnection indices. These variables were selected based on previous studies and theoretical considerations, and they were found to correlate significantly with, and to influence, the stable isotope content in precipitation. Adding SST and coastal proximity variables would increase the complexity of the analysis and of the developed models. It would also require additional data processing, model selection, parameter tuning, and validation steps, which would increase the computational cost and time, as well as the risk of overfitting and multicollinearity. Moreover, the results would become harder to interpret and explain, as the interactions and effects of the variables would become more complicated and nonlinear.
  Therefore, we decided to omit the SST and coastal proximity variables from our analysis, and focus on the local and regional variables that we considered to be more relevant and reliable for our study. We acknowledge that this may introduce some biases or limitations to our findings, especially for the coastal stations where the local land-sea interactions may have a stronger impact on the precipitation isotopic composition. However, we believe that our analysis still provides a comprehensive and robust investigation into the stable isotope signatures in precipitation across Southeast Asia, and that our results are consistent and comparable with previous studies in the region. Finally, we suggest that future studies could explore the use of SST and coastal proximity variables in the analysis, if more accurate and consistent data sets become available, and if more advanced and efficient statistical and machine learning techniques are developed. We hope that this answer addresses your comment and clarifies why we omit some parameters such as the SST and coastal proximity variables from our analysis.
  
  References
  [1] https://link.springer.com/article/10.1007/s12665-016-6081-8
  [2] https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1002/2017JD026751
  
  Citation: https://doi.org/10.5194/hess-2023-299-AC2
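
For readers who want to experiment with the resolution mismatch discussed in AC2, one common way to pair a gridded daily product with monthly station records is to take the grid cell that contains the station and average its daily values by month. The sketch below shows only that alignment step under stated assumptions: the grid layout (cell edges on multiples of 0.25°), the station coordinates, and the daily values are all hypothetical, and the functions `nearest_cell_center` and `monthly_means` are illustrative names, not part of any data product's API. It does not address the accuracy and coverage concerns raised in the reply.

```python
from collections import defaultdict
from datetime import date, timedelta


def nearest_cell_center(lat, lon, res=0.25):
    """Return the centre of the res-degree grid cell that contains (lat, lon).

    Assumes cell edges fall on multiples of res, so centres sit at
    res/2, 3*res/2, ... (a hypothetical layout chosen for illustration).
    """
    def centre(v):
        return (v // res) * res + res / 2

    return centre(lat), centre(lon)


def monthly_means(daily):
    """Average (date, value) pairs into a {(year, month): mean} mapping."""
    groups = defaultdict(list)
    for day, value in daily:
        groups[(day.year, day.month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Synthetic daily series for one grid cell: 28.0 in January, 29.0 in February 2001.
start = date(2001, 1, 1)
daily_sst = [(start + timedelta(days=i), 28.0 if i < 31 else 29.0) for i in range(59)]
sst_monthly = monthly_means(daily_sst)

# Approximate coordinates used purely as an example station location.
cell = nearest_cell_center(13.73, 100.50)
```

The monthly means can then be joined to the station's monthly isotope records on the `(year, month)` key. Whether a single 0.25° cell is representative of a coastal station is a separate question that this step does not answer.
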
CC3: 'Comment on hess-2023-299', Edward Thakur, 15 Mar 2024

Your methodology demonstrates a comprehensive approach to understanding stable isotope signatures in precipitation across Southeast Asia. However, I noticed that while you incorporated a range of local and regional variables, such as potential air evaporation, wind speed, teleconnection indices, etc., there was no mention of the potential influence of land cover or land use on precipitation isotopic composition. Given that land surface characteristics can significantly impact precipitation patterns and isotopic signatures, especially in urban areas or regions experiencing land use changes, could you provide insight into why land cover variables were not included in your analysis? Additionally, do you anticipate any potential limitations or biases in your results due to the omission of these variables? If so, how do you suggest addressing or mitigating these limitations in future studies?

Citation: https://doi.org/10.5194/hess-2023-299-CC3
- AC3: 'Reply on CC3', Mojtaba Heydarizad, 22 Mar 2024
  Dear Sir,
  Thank you for your great comments regarding potential influences on our research methodology. We (authors) would like to address the points you've raised regarding the omission of land cover variables in our analysis.
  While we recognize the importance of considering land cover variables in studies of precipitation isotopic composition, it's important to acknowledge the limitations we faced in this regard. Unfortunately, due to data constraints and the specific focus of our study, we were unable to incorporate land cover effects into our analysis.
  Regarding data availability, we encountered challenges in accessing comprehensive and up-to-date land cover maps for the study region. Despite efforts to locate suitable datasets, we found that such information was either outdated or not sufficiently detailed for our purposes. This limitation significantly constrained our ability to integrate land cover variables into our analysis. We would like to address this important aspect in more detail:
  Contextualizing the Current Study: While our current study focuses on large-scale climate modes and local meteorological parameters, we understand the dominant influence that land cover variables can have. The decision to focus on the current variables stems from their well-documented impact on the stable isotope signatures within Southeast Asia.
  
  Land Cover Stability in Study Area: Importantly, the specific regions within Southeast Asia covered in this analysis have experienced relatively stable land cover patterns over the study period. There were some changes during the last 50 years, but they were not large enough to be captured by stable isotope proxies in precipitation.
  
  Data Availability, Limitations, and Solutions: Unfortunately, readily accessible and reliable land cover maps at the temporal and spatial resolution required for such a detailed study were unavailable. An accurate scientific conclusion would require high-resolution land cover/land use maps for the whole study period, and such maps are not available for the whole study region or the whole study period.
  
  We completely agree that the omission of land cover variables presents a limitation of the current research. In future work, incorporating accurate land cover data would likely refine the understanding of isotopic signatures in precipitation within Southeast Asia. This could be achieved by creating land cover maps for the region through comprehensive remote-sensing projects.
  Alternative Factors: Although the current study does not explicitly address land cover, several included variables intrinsically capture some aspects of land-atmosphere interaction that might indirectly reflect land cover influences. These include potential air evaporation, wind speed, and to some extent, teleconnection indices.
  
  Given these constraints, we made the decision to focus our analysis on the variables that were readily available and directly relevant to our research objectives, namely, the influence of large-scale climate modes and local meteorological parameters on stable isotopic composition in precipitation.
  Thanks again for your deep insights, and we will be happy to receive further suggestions and comments from your side to enhance the robustness of our work.
  
  Citation: https://doi.org/10.5194/hess-2023-299-AC3
CC4: 'Comment on hess-2023-299', Aamir Ali, 26 Apr 2024

Hello,
Thank you for your manuscript. I am interested in knowing whether it would be logical to apply the same machine learning models used in your study to a different area or region. Specifically, I am considering using machine learning models to simulate stable isotope contents in the southern part of Pakistan. However, I am uncertain about which techniques would be most appropriate for this purpose.
Thank you once again.

Citation: https://doi.org/10.5194/hess-2023-299-CC4
- AC4: 'Reply on CC4', Mojtaba Heydarizad, 28 Apr 2024
  
  Thank you for your interest in our manuscript and for considering the application of machine learning models to simulate stable isotope contents in the southern part of Pakistan.
  The applicability of machine learning models to a different area or region depends on various parameters including the availability and quality of data, environmental conditions, and the specific research objectives. While the models used in our study may serve as a starting point (mainly for tropical regions), it's essential to assess their suitability and adaptability to the unique characteristics of the study area in Pakistan.
  Given the complexity of stable isotope dynamics and environmental processes, it may be beneficial to explore a range of machine learning techniques, including but not limited to models used in our study. Other techniques such as LSTM, Support Vector Machines, or Convolutional Neural Networks (CNN) could also be considered based on the nature of the data and the research questions.
  We recommend conducting a thorough literature review to identify studies that have applied machine learning techniques to similar research questions or environmental settings. Additionally, collaborating with experts familiar with both machine learning and stable isotope analysis could provide valuable insights and guidance in selecting appropriate models and methodologies.
  Thank you again for your interest and good luck with your research.
  Sincerely,
  Dr. Mojtaba Heydarizad
  
  Citation: https://doi.org/10.5194/hess-2023-299-AC4
RC1: 'Comment on hess-2023-299', Alice Hill, 21 May 2024

general comments
In general, this preprint needs to be vetted editorially before being sent out for scientific review. There are several basic issues whose correction would streamline review by volunteer scientists, who are otherwise wasting their time. For example, the tables need to be reformatted with some obvious changes to make them clearer and to fit better on the page so they are readable. Text in the figures is universally too small to be read on a standard page size, with unhelpful axis labels/formats (“month number”) and poor narration in the figure captions (no description of what is shown in the legend, and no title for the legend). There are typos throughout the text.

While the English is all readable, the quality of the discussion and the boldness or vagueness of many statements does not demonstrate that the authors understand the processes being represented behind the modelling. This is further elaborated on and can be seen in many examples in the specific comments below.

A small discussion on the architecture of XGBoost is warranted, given this is the preferred model that is pursued for the final analysis.

Page 10 – re. important predictors – a significant part of the front matter is devoted to discussing the importance of regional circulation patterns, and then they are deemed unimportant by the variable importance analysis, with no discussion of why these results arise or what processes may demote the regional circulation patterns relative to other predictors.

Elaborate more on the evaluation statistics used. You’ve listed several, but then don’t describe anything about what each one evaluates. Some of these may be common enough for the reader to know (eg R2) but others would benefit from explanation (eg AIC, BIC).
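
As an illustration of what such an explanation could cover, the sketch below computes five common statistics on a small made-up series. RMSE is the typical error in the units of the data (‰ for isotope values); R² is taken here as the squared correlation between observed and simulated values; NSE compares the model with a baseline that always predicts the observed mean (1 is perfect, 0 is no better than the mean); AIC and BIC add a penalty for the number of fitted parameters, so lower values are better and BIC penalizes extra parameters more heavily once there are more than about seven observations. The AIC and BIC expressions are the usual least-squares (Gaussian error) forms up to an additive constant; the manuscript's exact definitions are not stated here, so these are assumptions, as is the function name `evaluation_metrics`.

```python
import math


def evaluation_metrics(observed, simulated, n_params):
    """Return RMSE, R² (squared correlation), NSE, AIC and BIC for one model."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))  # residual sum of squares
    sst = sum((o - mean_obs) ** 2 for o in observed)              # spread of the observations
    ssm = sum((s - mean_sim) ** 2 for s in simulated)             # spread of the simulations
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    return {
        "rmse": math.sqrt(sse / n),
        "r2": cov ** 2 / (sst * ssm),
        "nse": 1.0 - sse / sst,
        # Least-squares forms, assuming Gaussian errors (additive constants dropped).
        "aic": n * math.log(sse / n) + 2 * n_params,
        "bic": n * math.log(sse / n) + n_params * math.log(n),
    }


# Made-up values for illustration only (not data from the manuscript).
observed = [-6.0, -5.0, -7.5, -4.0, -8.0]
simulated = [-5.5, -5.5, -7.0, -4.5, -7.5]
metrics = evaluation_metrics(observed, simulated, n_params=2)
```

AIC and BIC are only meaningful as differences between models fitted to the same observations, so a single reported value says little without the competing models' values beside it.
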

Apologies for this harsh review but this manuscript required more work before being ready to be sent out for review, especially when asking scientists to do this on their own time. It’s a waste of time that would otherwise be able to be spent on the science content and expertise.

specific comments
Line 37/38 – “numerous surveys” but only one cited – give broader and/or more complete range of citations here.
Line 46 – “the most crucial shortcoming” – this is subjective. For example, I disagree this is the most crucial shortcoming. Suggest rewording “A major shortcoming…” Along this same train of thought – you do not mention spatial representation of sites and how some regions are more representative in the IAEA network than others. This is where modelling really steps up – a prediction for places that have never been measured.

Line 57 – a bold assertion that requires citation: “machine learning (ML) techniques have been demonstrated to be remarkably successful in a variety of applications, including hydroclimate …”

Line 58 – start new paragraph when diving into the specifics of ML
Line 135 – suggest re-working this paragraph as it is a very long run on sentence.
Line 158 – also, what are “v” separate subsets – does “v” signify a number or a technical element? This needs explanation.
Line 164 – what does model “hardness” refer to?
Fig 4 – hard to read text. Figure out how to make this figure usable for the reader instead of taking default plot formats and pasting into manuscript. Suggest choosing a representative panel (Kota Bharu?) that demonstrates the point of the figure, move the rest to the appendix.
Fig 4 – caption mentions asterisk – I don’t see any “*” in figure.
Table 1- suggest using gridlines to help reader follow fields across
Lines 250-259 - this is confusing – at tropical stations temperature is important but then you say “However, it is also important at non-tropical stations.” So temperature is important everywhere, right? This paragraph needs re-phrasing/clarifying. Also which stations are considered tropical vs. non-tropical for this study? I would have considered all of this region tropical?

Figure 5- text is too small – this is unreadable. Figure out how to make this figure usable for the reader instead of taking default plot formats and pasting into manuscript.
Figure 5 – add title to legend. I also suggest in the caption for the figure clarifying what the ML abbreviations are in the legend so that if this were copied into a slide the viewer could understand what they are looking at. I am not sure if this is required editorially but believe it is good practice for your work being able to be communicated easily.

Line 263 – “Previous studies have mentioned the influence of ENSO teleconnection indices on the stable isotope composition of precipitation across Southeast Asia” -- ok, fine, but what do they say about them?

Line 273 – “This is due to a much more complicated procedure for processing the data in ML models than regression models.” This statement is vague and perhaps inaccurate. Either explain what you mean by “Complicated” or use more specific statements that describe the difference between ML approaches and regression approaches. For example, ML approaches can honor the interactions between variables in ways that regression approaches do not. This level of specificity gives the reader a much better understanding of why ML models may lead to a more accurate model, rather than just saying it’s more “Complicated.”

Table 2 is poorly formatted – for a start, single words in the headings are split across lines. There are multiple easy things to reformat here – consider landscape alignment; instead of separate columns for the isotope (this doesn’t change as you go down the table but takes up 2 separate columns), the isotope specification can be part of the heading. “Method” is also duplicated and takes up 2 columns – why? Is “Method” an accurate title for the column – wouldn’t “evaluation metric” be more accurately descriptive? There are a lot of numbers in this table – most readers will just glaze over them. I suggest finding a way to highlight the best model for each site so that the reader is led to the data you want them to see. This is sloppy work. Think through these things before just cutting/pasting into the manuscript. Rework this table.

Tables are generally presented before being mentioned in the text, so move this to before line 273.

Line 287 – “acceptable” is subjective, this implies there is a boundary that defines acceptable results vs. not acceptable. I suggest reporting R2 between predicted and actual, and let the reader define if this is good enough.

Fig 6 – is this data comparison between all data (training and test), or just the test data? Specify this. If you have not evaluated just on the test data, that should be done as that is where the actual capabilities of the model are demonstrated.
Fig 6- why is the x axis not shown on the bottom of the plot? Unless there is a good reason not to, this should be amended.
I suggest displaying the equations for the lines, as this gives the reader a quick comparison with the GMWL slope and intercept, which says a lot to a savvy isotope scientist.
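
A fitted line equation of the kind suggested here can be produced with an ordinary least-squares fit of δ²H against δ¹⁸O and set beside the Global Meteoric Water Line, δ²H = 8 δ¹⁸O + 10. The sketch below is illustrative only: the points are synthetic and constructed to lie exactly on a known line, and `fit_line` and `line_label` are hypothetical helper names.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Global Meteoric Water Line: d2H = 8 * d18O + 10
GMWL_SLOPE, GMWL_INTERCEPT = 8.0, 10.0


def line_label(slope, intercept):
    """Format a line equation for a figure panel or legend entry."""
    sign = "+" if intercept >= 0 else "-"
    return f"d2H = {slope:.2f} d18O {sign} {abs(intercept):.2f}"


# Synthetic points constructed to lie exactly on d2H = 7.5 * d18O + 6.
d18o = [-8.0, -6.0, -4.0, -2.0]
d2h = [7.5 * v + 6.0 for v in d18o]

slope, intercept = fit_line(d18o, d2h)
label = line_label(slope, intercept)
slope_offset = slope - GMWL_SLOPE  # negative: flatter than the GMWL
```

Fitting the observed and the simulated values separately and printing both labels in each panel would let a reader compare the two lines with each other and with the GMWL at a glance.
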

Figure 6 – text is too small – this is unreadable. Figure out how to make this figure usable for the reader instead of taking default plot formats and pasting into manuscript.

Discuss aspects of Figure 6 that are notable – eg, Jayapura looks to be underpredicted in the mid-range of isotope values. Why could this be?

Figure 7 – the confidence interval is hard to see across the upper and lower envelope. The dotted line format for upper and lower is indistinguishable and often looks to be either upper or lower bound, but not on both sides. Is this a plotting or calculation error? Why not use the standard “translucent ribbon” to show the envelope? Also, the X axis “Months” number is unhelpful – you can’t tell what season it is (and this is important b/c of the big discussion around regional seasonal influences early in the paper). This is a lazy approach to not re-formatting – we’ve all been there and know this is a pain but it needs to be done for professional publishing.

Line 305 and surrounding text – A more nuanced and humble discussion around the certainty is required. For example, “Most stable isotope data fit within the confidence intervals, suggesting that the ML model precisely estimated the stable isotope contents…” – how do you define “precisely” here? Better to say “xx% of the data can be predicted within 95% confidence intervals” because that is exactly what it shows.
Or “the upper limit of the confidence interval, showing that the model significantly underestimated …” – how do you define significance here?
Line 335 – again, phrasing is both vague and bold: “The stable isotope composition of precipitation depends mainly on the vapor pressure, precipitation amount, temperature, and potential evaporation” – it is more accurate to say “predictor variables that were evaluated to have substantial influence on isotope values are …”
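
The “xx% of the data within the 95% interval” statement suggested in the comment on Line 305 is a direct count. The sketch below shows one way to obtain it, under assumptions: the manuscript's bootstrap procedure is not described here, so a simple residual bootstrap is used as a stand-in, the data are synthetic, and the function names are hypothetical. Residuals come from a separate calibration set so that the coverage is not computed on the same points that built the bands.

```python
import math
import random


def percentile_interval(replicates, level=0.95):
    """Lower and upper percentile bounds of a list of bootstrap replicates."""
    ordered = sorted(replicates)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    # Floor/ceil make the interval slightly conservative (never too narrow).
    return ordered[math.floor(tail * last)], ordered[math.ceil((1.0 - tail) * last)]


def bootstrap_bands(simulated, residuals, n_boot=1000, level=0.95, seed=0):
    """Residual bootstrap: add resampled residuals to each simulated value
    and take the percentile interval of the resulting replicates."""
    rng = random.Random(seed)
    bands = []
    for s in simulated:
        replicates = [s + rng.choice(residuals) for _ in range(n_boot)]
        bands.append(percentile_interval(replicates, level))
    return bands


def coverage(observed, bands):
    """Percentage of observations that fall inside their (lower, upper) band."""
    inside = sum(lo <= o <= hi for o, (lo, hi) in zip(observed, bands))
    return 100.0 * inside / len(observed)


# Synthetic illustration: calibration residuals and a separate evaluation set.
rng = random.Random(1)
calib_residuals = [rng.gauss(0.0, 0.5) for _ in range(200)]
simulated = [-6.0 + 0.1 * i for i in range(50)]
observed = [s + rng.gauss(0.0, 0.5) for s in simulated]

bands = bootstrap_bands(simulated, calib_residuals)
pct_inside = coverage(observed, bands)
```

Reporting `pct_inside` next to the nominal 95 % lets the reader judge the result without an adjective such as “precisely”. The `bands` pairs are also the inputs a shaded ribbon plot would need.
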

technical corrections
In general this manuscript should have been carefully reviewed for typos and grammatical errors (misplaced commas, conjoined words without a space in between, etc.) prior to going out for scientific review. A partial list of errors is below, but I stopped collating these errors as they became numerous:
Line 16 – contents – should be composition?
Line 26 – missing space
Line 35 – missing “a”
line 99 delete “the” before NOAA
line 157 – should be “training” not train
Line 158 – typo in split
Line 179 – typo “demonstrated”

Line 220 – is “Lasso” capitalized? It was not earlier in the paper – whichever way is accurate, you need to be consistent.
Line 246 – typo – extra “a”
Line 305 - typo

Citation: https://doi.org/10.5194/hess-2023-299-RC1
- AC6: 'Reply on RC1', Mojtaba Heydarizad, 08 Jun 2024
  
  The answer to this reviewer has been added as a file titled 'Response to Reviewer 1'.
  
  Citation: https://doi.org/10.5194/hess-2023-299-AC6
RC2: 'Comment on hess-2023-299', Anonymous Referee #2, 22 May 2024
Dear Authors, Dear Editor,

The manuscript (hess-2023-299) compares the performance of a bunch of machine learning models in simulating the variation of precipitation stable isotope composition using monthly precipitation stable isotope records from six GNIP stations in SE Asia. The application of machine learning methods in hydrological modelling is a rapidly developing research direction. This is also true for the modelling of precipitation stable isotope compositions. Thus the work is timely and of interest. However, the manuscript still needs considerable revision to reach publication. One of my main problems with the manuscript is the lack of a scientific discussion. Section 4 in its current state hardly goes beyond the description of the results. The other critical issue is the illustration material. Most of the figures and tables need additional careful editing.

General comments
source of the meteorological data: The manuscript vaguely refers to the NOAA web site as the source for evaporation and wind speed data in line 139. This is not acceptable, since the reader has no way of knowing which database was used. The actual source should be cited, not the web server via which the data were accessed. In the next sentence (lines 140-141) it is written that vapor pressure, precipitation amount and air temperature were taken from the GNIP. It would be advisable to retrieve all meteorological variables from the same source, for instance to avoid resolution problems. In addition, I strongly suggest not using meteo data from the GNIP. Please keep in mind that GNIP is an archive of precipitation isotope data, not of meteorological data. If meteorological data are corrected for measurement inhomogeneity by the national meteo services or agencies, the corrections are not transferred to the GNIP. I have my own experience of this.

Structural problems:
The methodological description from line 218 to 232 should be moved to Section 3.

If I understand correctly, the authors consider VAR as the “gold standard” in the forecasting exercise and rank the ML predictions according to their accuracy compared to the VAR forecasts. However, it is not clear from the text why the VAR forecast can serve as a reference. Instead, a year covered by all six station records could be withheld from the ML training and used as a reference to compare the performance of the models.
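The hold-out idea in the comment above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the record layout (a dict mapping a station name to `{(year, month): value}`), the function names, and the toy values are all assumptions made for the example.

```python
from math import sqrt

def common_complete_years(records):
    """Return the years for which every station has all 12 monthly values."""
    per_station = []
    for series in records.values():
        years = {year for (year, _month) in series}
        complete = {y for y in years
                    if all((y, m) in series for m in range(1, 13))}
        per_station.append(complete)
    return sorted(set.intersection(*per_station)) if per_station else []

def split_holdout(series, holdout_year):
    """Split one station's series into training data and the withheld year."""
    train = {k: v for k, v in series.items() if k[0] != holdout_year}
    test = {k: v for k, v in series.items() if k[0] == holdout_year}
    return train, test

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

# Toy data (hypothetical values): two stations with partly overlapping records.
records = {
    "A": {(y, m): -5.0 - 0.1 * m for y in (2001, 2002, 2003) for m in range(1, 13)},
    "B": {(y, m): -6.0 - 0.2 * m for y in (2002, 2003) for m in range(1, 13)},
}
records["B"][(2001, 1)] = -6.2  # station B has only one month in 2001

years = common_complete_years(records)   # years usable as a common reference
holdout = years[-1]                      # e.g. withhold the most recent one
train, test = split_holdout(records["A"], holdout)
```

Each model (ML or VAR) would then be fitted on `train` only and scored with `rmse` against the withheld observations in `test`, so that all forecasts are ranked against measurements rather than against one another.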

The annotations are unreadable in Figs 4, 5, 6, and 7. I suggest changing the layout from the current 2×3 to 3×2 in Figs 4, 5 and 6. This will allow the authors to enlarge the panels. In addition, I strongly suggest increasing the font size in each panel. Finally, in the legend of Fig 1 “Bangkog” should be corrected to “Bangkok” and “Kota Bahura” should be corrected to “Kota Bharu”.

Specific comments:
line 26: insert a space after the full stop.
line 27: No need to introduce the abbreviation “(VAR)” here since it is not used elsewhere in the abstract.
line 39: Besides (or instead of) the classic Clark & Fritz book, a more recent review should be cited, e.g. Bowen et al., 2019 (https://doi.org/10.1146/annurev-earth-053018-060220 )
lines 42 and 44: I suggest moving the citation “IAEA/GNIP 2018” from line 44 to the end of the sentence in line 42 and citing the most recent review from the IAEA Hydrology group in line 44: Vystavna et al., 2021 (https://doi.org/10.1038/s41598-021-98094-6 )
line 79: I suggest replacing “Am” with “monsoon climate”
line 128: Please correct the superscript formats in the equation. In addition, 1 should be deleted from the exponent.
line 131: The sentence needs revision. The mentioned analytical uncertainties surely refer to the delta values rather than the heavy isotopes.
line 135 (and also elsewhere): “potential air evaporation” sounds strange. Probably “air” should be omitted?
line 149: “M.H.” seems to be a mistake in the citation.
line 158: “spilited” should be changed to “splitted”
line 210: It is unclear from the current text which results are referred to at the beginning of the sentence.
line 251: Please check the text. Is it possible that you meant to write “fairly weak” instead of “fairly strong”?

Suggestions on Table 1 and Table 2
The layout of both tables could be improved.
Table 1: If you introduce the abbreviation LR for Lasso regression in the table title, then you can use it in the table, which will help readability. In addition, “δ¹⁸O (VSMOW‰)” and “δ²H (VSMOW‰)” should go to the header of the first and second part of the table, respectively, to eliminate the current “Isotope” column, which again could help to make this table more compact and readable.
Table 2: Similar suggestion as above. “δ¹⁸O” and “δ²H” should go to the header above the methods to eliminate the current “Isotope” columns and make this table more compact and readable.
Citation: https://doi.org/10.5194/hess-2023-299-RC2
- AC7: 'Reply on RC2', Mojtaba Heydarizad, 08 Jun 2024
  
  The answer to this reviewer has been added as a file titled 'Response to Reviewer 2'.
  
  Citation: https://doi.org/10.5194/hess-2023-299-AC7

Status: closed

CC1:
'Comment on hess-2023-299', Ali Mobaraki, 28 Jan 2024

This work presents an interesting and novel application of machine learning techniques to simulate and forecast the stable isotope contents in precipitation over Southeast Asia. The authors use various teleconnection indices and meteorological parameters as predictors and compare the performance of different machine learning models. They also conduct a bootstrap analysis to assess the uncertainty of the simulated data and use vector autoregression to forecast the stable isotope contents for one year.
I have a question about the methods section of the abstract. The authors mention that they used several machine learning techniques, such as shallow neural network, deep neural network, decision tree, random forest, and extreme gradient boosting. However, they do not explain comprehensively how they selected and optimized these techniques for their data. How did the authors choose the appropriate hyperparameters, such as the number of layers, nodes, and activation functions for the neural networks, or the number of trees, depth, and splitting criteria for the decision tree, random forest, and extreme gradient boosting? How did they avoid overfitting or underfitting their models? What criteria did they use to evaluate and compare the performance of different models?
I would appreciate it if the authors could provide more details on their methods and results. Thank you for your attention.

Citation: https://doi.org/10.5194/hess-2023-299-CC1
- AC1:
  'Reply on CC1', Mojtaba Heydarizad, 02 Feb 2024
  
  Dear reader,
  
  Thank you for your interest and feedback on our manuscript. Below, we (the authors) describe our methods and results in more detail. Here is our point-by-point response to your comment:
  Comment: The authors mention that they used several machine learning techniques, such as shallow neural network, deep neural network, decision tree, random forest, and extreme gradient boosting. However, they do not explain comprehensively how they selected and optimized these techniques for their data. How did the authors choose the appropriate hyperparameters, such as the number of layers, nodes, and activation functions for the neural networks, or the number of trees, depth, and splitting criteria for the decision tree, random forest, and extreme gradient boosting? How did they avoid overfitting or underfitting their models? What criteria did they use to evaluate and compare the performance of different models?
  Response: We (the authors) apologize for not providing enough detail on the models we developed using machine learning techniques. We will revise our manuscript and include the following sentences in the revised manuscript:
  Firstly, to select and optimize the machine learning techniques for our data, we performed a grid search with cross-validation to find the best combination of hyperparameters for each technique. For the neural networks, we varied the number of layers (from 1 to 3), the number of nodes (from 10 to 100), and the activation function (sigmoid, tanh, or relu). For the decision tree, random forest, and extreme gradient boosting, we varied the number of trees (from 10 to 100), the maximum depth (from 2 to 10), and the splitting criterion (gini or entropy).
  Secondly, we avoided overfitting or underfitting our models by using regularization techniques, such as dropout, L1, and L2 penalties, and by monitoring the training and validation errors. We evaluated and compared the performance of different models using the root mean square error (RMSE), the coefficient of determination (R²), and the Nash-Sutcliffe efficiency (NSE) as the criteria.
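A grid search with cross-validation, as described in the reply above, can be sketched in a few lines. The sketch below is an illustration only and is not the authors' code: a one-dimensional k-nearest-neighbour regressor stands in for the neural networks and tree ensembles, the single hyperparameter `k` stands in for the grids listed above, and all function names and the toy data are assumptions.

```python
from itertools import product
from math import sqrt

def knn_predict(train_x, train_y, x, k):
    """Predict y at x as the mean of the k nearest training targets."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def kfold_indices(n, n_folds):
    """Yield (train_idx, val_idx) pairs; fold f takes every n_folds-th sample."""
    for f in range(n_folds):
        val = [i for i in range(n) if i % n_folds == f]
        train = [i for i in range(n) if i % n_folds != f]
        yield train, val

def cv_rmse(x, y, k, n_folds):
    """Mean validation RMSE of the k-NN model over all folds."""
    scores = []
    for train, val in kfold_indices(len(x), n_folds):
        tx, ty = [x[i] for i in train], [y[i] for i in train]
        errs = [(knn_predict(tx, ty, x[i], k) - y[i]) ** 2 for i in val]
        scores.append(sqrt(sum(errs) / len(errs)))
    return sum(scores) / len(scores)

def grid_search(x, y, grid, n_folds=5):
    """Try every combination in the grid; return (best_params, best_score)."""
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_rmse(x, y, n_folds=n_folds, **params)
        if best is None or score < best[1]:
            best = (params, score)
    return best

# Toy data (hypothetical): a smooth, noise-free curve sampled at 30 points.
x = [i / 10 for i in range(30)]
y = [xi ** 2 for xi in x]
best_params, best_score = grid_search(x, y, {"k": [1, 3, 9]})
```

The key design point is that each candidate is scored only on samples withheld from its own fit, so the selected hyperparameters reflect validation error rather than training error.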
  We hope this clarifies our methods and results.
  
  Citation: https://doi.org/10.5194/hess-2023-299-AC1
  - CC5: 'Reply on AC1', Ali Mobaraki, 24 May 2024
    
    Dear Author,
    Thank you for responding to our comment. Could you please upload the codes you developed for the simulation of stable isotopes along with your manuscript?
    
    Citation: https://doi.org/10.5194/hess-2023-299-CC5
    
    AC5: 'Reply on CC5', Mojtaba Heydarizad, 02 Jun 2024
    
    Dear Community,
    Thank you for your interest in our research and for your comment.
    Currently, our paper is in the preprint stage and undergoing review. As such, we have not yet uploaded the original figures and code scripts associated with our study. However, we understand the importance of transparency and reproducibility in scientific research.
    To assist you in the meantime, you can find similar code scripts developed for the simulation of Tehran precipitation on the ResearchGate page of the first author of this preprint. These scripts are part of a corresponding paper titled “Stable Isotope Signatures in Tehran’s Precipitation: Insights from Artificial Neural Networks, Stepwise Regression, Wavelet Coherence, and Ensemble Machine Learning Approaches,” which provides a step-by-step explanation of the code's structure. This should help you understand the procedure we used in our preprint.
    In the case of publication of this preprint, we will ensure that all relevant materials, including the code, are made available in accordance with the journal's policies. This will allow you and other researchers to fully reproduce and build upon our work.
    Best regards,
    
    Citation: https://doi.org/10.5194/hess-2023-299-AC5
CC2:
'Comment on hess-2023-299', Ville Järvinen, 15 Feb 2024

This manuscript presents a thorough investigation into stable isotope signatures in precipitation across Southeast Asia, employing a range of advanced statistical and machine learning techniques. However, while the incorporation of teleconnection indices as regional parameters provides valuable insight into large-scale climatic influences, I am curious about the potential influence of local land-sea interactions on precipitation isotopic composition. Given the proximity of several study sites to coastal regions, have the authors considered incorporating variables related to sea surface temperatures or coastal proximity in their analysis? These factors can play a significant role in modulating local precipitation patterns through mechanisms such as sea breeze circulation and moisture transport. Could the authors discuss whether the omission of such coastal-related variables might introduce biases or limitations to their findings, and if so, how they propose to address or account for these factors in their analysis?

Citation: https://doi.org/10.5194/hess-2023-299-CC2
- AC2: 'Reply on CC2', Mojtaba Heydarizad, 18 Feb 2024
  
  We (the authors) thank you for your comment and interest in our manuscript. We appreciate your constructive suggestions to improve our study.
  We (authors) agree that local land-sea interactions can have an influence on precipitation isotopic composition, especially in coastal regions where sea surface temperatures (SST) and coastal proximity can affect the local hydrological cycle and atmospheric circulation. Previous studies have shown that SST and coastal proximity can affect the isotopic fractionation, evaporation, condensation, and precipitation processes, as well as the moisture sources and transport pathways, of precipitation in different regions. Therefore, we acknowledge that incorporating these variables in our analysis could potentially improve our understanding of the spatial and temporal variations of stable isotopes in precipitation across Southeast Asia. However, we also note that there are some challenges and limitations in obtaining and using these variables in our analysis.
  First, the SST and coastal proximity data available for the study sites and period are not consistent or reliable. We used the Global Network of Isotopes in Precipitation (GNIP) database to obtain the stable isotope and meteorological data for the studied stations in Southeast Asia, but this database does not provide SST and coastal proximity data. We searched for alternative sources of SST and coastal proximity data, such as the NOAA Optimum Interpolation Sea Surface Temperature (OISST) [1] and the Global Self-consistent, Hierarchical, High-resolution Geography Database (GSHHG) [2], but we found that these data sets differ from the GNIP data in spatial and temporal resolution, coverage, and accuracy. For example, the OISST data have a spatial resolution of 0.25°×0.25° and a daily or monthly temporal resolution, while the GNIP data are station-level records with a monthly or annual temporal resolution. The GSHHG data have a spatial resolution of 0.01°×0.01° and are static. Therefore, we do not consider it sound to integrate these higher-resolution data sets with the GNIP data, as doing so would introduce uncertainties and errors into the analysis.
  In addition, in our study, we used various statistical and machine learning techniques to simulate the stable isotope content in precipitation based on local and regional variables, such as potential air evaporation, wind speed, vapor pressure, air temperature, relative humidity, precipitation amount, and teleconnection indices. These variables were selected based on previous studies and theoretical considerations, and they were found to correlate significantly with, and to influence, the stable isotope content in precipitation. Adding SST and coastal proximity variables would increase the complexity of the analysis and of the developed models. It would also require additional data processing, model selection, parameter tuning, and validation steps, which would increase the computational cost and time, as well as the risk of overfitting and multicollinearity. Moreover, the interpretation and explanation of the results would become more challenging and less intuitive, as the interactions and effects of the variables would become more complicated and nonlinear.
  Therefore, we decided to omit the SST and coastal proximity variables from our analysis, and focus on the local and regional variables that we considered to be more relevant and reliable for our study. We acknowledge that this may introduce some biases or limitations to our findings, especially for the coastal stations where the local land-sea interactions may have a stronger impact on the precipitation isotopic composition. However, we believe that our analysis still provides a comprehensive and robust investigation into the stable isotope signatures in precipitation across Southeast Asia, and that our results are consistent and comparable with previous studies in the region. Finally, we suggest that future studies could explore the use of SST and coastal proximity variables in the analysis, if more accurate and consistent data sets become available, and if more advanced and efficient statistical and machine learning techniques are developed. We hope that this answer addresses your comment and clarifies why we omit some parameters such as the SST and coastal proximity variables from our analysis.
  
  References
  [1] https://link.springer.com/article/10.1007/s12665-016-6081-8
  [2] https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1002/2017JD026751
  
  Citation: https://doi.org/10.5194/hess-2023-299-AC2
CC3:
'Comment on hess-2023-299', Edward Thakur, 15 Mar 2024

Your methodology demonstrates a comprehensive approach to understanding stable isotope signatures in precipitation across Southeast Asia. However, I noticed that while you incorporated a range of local and regional variables, such as potential air evaporation, wind speed, teleconnection indices, etc., there was no mention of the potential influence of land cover or land use on precipitation isotopic composition. Given that land surface characteristics can significantly impact precipitation patterns and isotopic signatures, especially in urban areas or regions experiencing land use changes, could you provide insight into why land cover variables were not included in your analysis? Additionally, do you anticipate any potential limitations or biases in your results due to the omission of these variables? If so, how do you suggest addressing or mitigating these limitations in future studies?

Citation: https://doi.org/10.5194/hess-2023-299-CC3
- AC3:
  'Reply on CC3', Mojtaba Heydarizad, 22 Mar 2024
  Dear Sir,
  Thank you for your great comments regarding potential influences on our research methodology. We (authors) would like to address the points you've raised regarding the omission of land cover variables in our analysis.
  While we recognize the importance of considering land cover variables in studies of precipitation isotopic composition, it's important to acknowledge the limitations we faced in this regard. Unfortunately, due to data constraints and the specific focus of our study, we were unable to incorporate land cover effects into our analysis.
  Regarding data availability, we encountered challenges in accessing comprehensive and up-to-date land cover maps for the study region. Despite efforts to locate suitable datasets, we found that such information was either outdated or not sufficiently detailed for our purposes. This limitation significantly constrained our ability to integrate land cover variables into our analysis. We address this important aspect in more detail below:
  Contextualizing the Current Study: While our current study focuses on large-scale climate modes and local meteorological parameters, we understand the dominant influence that land cover variables can have. The decision to focus on the current variables stems from their well-documented impact on the stable isotope signatures within Southeast Asia.
  
  Land Cover Stability in Study Area: Importantly, the specific regions within Southeast Asia covered in this analysis have experienced relatively stable land cover patterns over the study period. (There were some changes during the last 50 years, but these were not large enough to be captured by stable isotope proxies in precipitation.)
  
  Data Availability, limitations, and solutions: Unfortunately, readily accessible and reliable land cover maps at the temporal and spatial resolution required for such a detailed study were unavailable. High-resolution land cover/land use maps covering the whole study period are needed for accurate scientific conclusions, and such maps are not available for the whole study region or the whole study period.
  
  We completely agree that the omission of land cover variables is a limitation of the current research. In future work, incorporating accurate land cover data would likely refine the understanding of isotopic signatures in precipitation within Southeast Asia. This could be achieved by creating land cover maps through comprehensive remote-sensing projects in Southeast Asia.
  Alternative Factors: Although the current study does not explicitly address land cover, several included variables intrinsically capture some aspects of land-atmosphere interaction that might indirectly reflect land cover influences. These include potential air evaporation, wind speed, and to some extent, teleconnection indices.
  
  Given these constraints, we made the decision to focus our analysis on the variables that were readily available and directly relevant to our research objectives, namely, the influence of large-scale climate modes and local meteorological parameters on stable isotopic composition in precipitation.
  Thanks again for your deep insights, and we will be happy to receive further suggestions and comments from your side to enhance the robustness of our work.
  
  Citation: https://doi.org/10.5194/hess-2023-299-AC3
CC4:
'Comment on hess-2023-299', Aamir Ali, 26 Apr 2024

Hello,
Thank you for your manuscript. I am interested in knowing whether it would be logical to apply the same machine learning models used in your study to a different area or region. Specifically, I am considering using machine learning models to simulate stable isotope contents in the southern part of Pakistan. However, I am uncertain about which techniques would be most appropriate for this purpose.
Thank you once again.

Citation: https://doi.org/10.5194/hess-2023-299-CC4
- AC4: 'Reply on CC4', Mojtaba Heydarizad, 28 Apr 2024
  
  Thank you for your interest in our manuscript and for considering the application of machine learning models to simulate stable isotope contents in the southern part of Pakistan.
  The applicability of machine learning models to a different area or region depends on various parameters including the availability and quality of data, environmental conditions, and the specific research objectives. While the models used in our study may serve as a starting point (mainly for tropical regions), it's essential to assess their suitability and adaptability to the unique characteristics of the study area in Pakistan.
  Given the complexity of stable isotope dynamics and environmental processes, it may be beneficial to explore a range of machine learning techniques, including but not limited to models used in our study. Other techniques such as LSTM, Support Vector Machines, or Convolutional Neural Networks (CNN) could also be considered based on the nature of the data and the research questions.
  We recommend conducting a thorough literature review to identify studies that have applied machine learning techniques to similar research questions or environmental settings. Additionally, collaborating with experts familiar with both machine learning and stable isotope analysis could provide valuable insights and guidance in selecting appropriate models and methodologies.
  Thank you again for your interest and good luck with your research.
  Sincerely,
  Dr. Mojtaba Heydarizad
  
  Citation: https://doi.org/10.5194/hess-2023-299-AC4
RC1:
'Comment on hess-2023-299', Alice Hill, 21 May 2024

general comments
In general, this preprint needs to be vetted editorially before being sent out for scientific review. There are several basic issues whose correction would streamline review by volunteer scientists, who are otherwise wasting their time. For example, the tables need to be reformatted with some obvious changes to make them clearer and to fit better on the page so they are readable. Text in the figures is universally too small to be read on a standard page size, with unhelpful axis labels/formats (“month number”) and poor narration in the figure captions (no description of what is shown in the legend, and no title for the legend). There are typos throughout the text.

While the English is all readable, the quality of the discussion and the boldness or vagueness of many statements does not demonstrate that the authors understand the processes being represented behind the modelling. This is further elaborated on and can be seen in many examples in the specific comments below.

A small discussion on the architecture of XGBoost is warranted, given this is the preferred model that is pursued for the final analysis.

Page 10 – re. important predictors – a significant part of the front matter is devoted to discussing the importance of regional circulation patterns, and then they are deemed to be unimportant as per the variable importance analysis, with no discussion as to why these results arise or what processes may demote the regional circulation patterns in favor of other predictors.

Elaborate more on the evaluation statistics used. You’ve listed several, but then don’t describe anything about what each one evaluates. Some of these may be common enough for the reader to know (e.g. R²) but others would benefit from explanation (e.g. AIC, BIC).
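For orientation, the statistics named in this comment can be written out as below. This is a hedged sketch: the manuscript's exact formulations are not given in this discussion, so the AIC and BIC shown here are the common Gaussian least-squares forms with additive constants dropped, and `n_params` (the number of fitted parameters) and the toy values are assumptions.

```python
from math import log, sqrt

def rss(observed, predicted):
    """Residual sum of squares."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def rmse(observed, predicted):
    """Root mean square error, in the units of the data (here per mil)."""
    return sqrt(rss(observed, predicted) / len(observed))

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 is no better than the mean."""
    mean_obs = sum(observed) / len(observed)
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - rss(observed, predicted) / sst

def aic(observed, predicted, n_params):
    """Akaike information criterion: fit term plus 2 per fitted parameter."""
    n = len(observed)
    return n * log(rss(observed, predicted) / n) + 2 * n_params

def bic(observed, predicted, n_params):
    """Bayesian information criterion: fit term plus ln(n) per parameter."""
    n = len(observed)
    return n * log(rss(observed, predicted) / n) + n_params * log(n)

# Toy values (hypothetical): predictions offset from observations by 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
```

RMSE and NSE describe goodness of fit only, whereas AIC and BIC also penalize model size; lower values are better for both, and the BIC penalty per parameter exceeds that of AIC once the sample has eight or more observations.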

Apologies for this harsh review but this manuscript required more work before being ready to be sent out for review, especially when asking scientists to do this on their own time. It’s a waste of time that would otherwise be able to be spent on the science content and expertise.

specific comments
Line 37/38 – “numerous surveys” but only one cited – give broader and/or more complete range of citations here.
Line 46 – “the most crucial shortcoming” – this is subjective. For example, I disagree this is the most crucial shortcoming. Suggest rewording “A major shortcoming…” Along this same train of thought – you do not mention spatial representation of sites and how some regions are more representative in the IAEA network than others. This is where modelling really steps up – a prediction for places that have never been measured.

Line 57 – a bold assertion that requires citation: “machine learning (ML) techniques have been demonstrated to be remarkably successful in a variety of applications, including hydroclimate …”

Line 58 – start new paragraph when diving into the specifics of ML
Line 135 – suggest re-working this paragraph as it is a very long run-on sentence.
Line 158 – also, what is “v” in “v separate subsets”? Does this signify a number or a technical element? It needs explanation.
Line 164 – what does model “hardness” refer to?
Fig 4 – hard to read text. Figure out how to make this figure usable for the reader instead of taking default plot formats and pasting into manuscript. Suggest choosing a representative panel (Kota Bharu?) that demonstrates the point of the figure, move the rest to the appendix.
Fig 4 – caption mentions asterisk – I don’t see any “*” in figure.
Table 1- suggest using gridlines to help reader follow fields across
Lines 250-259 - this is confusing – at tropical stations temperature is important but then you say “However, it is also important at non-tropical stations.” So temperature is important everywhere, right? This paragraph needs re-phrasing/clarifying. Also which stations are considered tropical vs. non-tropical for this study? I would have considered all of this region tropical?

Figure 5- text is too small – this is unreadable. Figure out how to make this figure usable for the reader instead of taking default plot formats and pasting into manuscript.
Figure 5 – add title to legend. I also suggest in the caption for the figure clarifying what the ML abbreviations are in the legend so that if this were copied into a slide the viewer could understand what they are looking at. I am not sure if this is required editorially but believe it is good practice for your work being able to be communicated easily.

Line 263 – “Previous studies have mentioned the influence of ENSO teleconnection indices on the stable isotope composition of precipitation across Southeast Asia” -- ok, fine, but what do they say about them?

Line 273 – “This is due to a much more complicated procedure for processing the data in ML models than regression models.” This statement is vague and perhaps inaccurate. Either explain what you mean by “Complicated” or use more specific statements that describe the difference between ML approaches and regression approaches. For example, ML approaches can honor the interactions between variables in ways that regression approaches do not. This level of specificity gives the reader a much better understanding of why ML models may lead to a more accurate model, rather than just saying it’s more “Complicated.”
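The point about interactions in the comment above can be made concrete with a toy case. The sketch below is purely illustrative and uses assumed data: the response is the product of two predictors, an additive linear regression without an explicit interaction term is compared with a stand-in for a depth-2 decision tree, and none of it comes from the manuscript.

```python
from itertools import product

# Toy design (hypothetical): two predictors at two levels each, fully crossed.
X = [(x1, x2) for x1, x2 in product((-1.0, 1.0), repeat=2)]
y = [x1 * x2 for x1, x2 in X]  # response is a pure interaction

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SSE/SST."""
    m = mean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - m) ** 2 for o in observed)
    return 1.0 - sse / sst

# Additive linear fit y ~ b0 + b1*x1 + b2*x2. Because the predictors are
# centred and mutually orthogonal in this design, each least-squares
# coefficient reduces to a simple ratio of means.
b0 = mean(y)
b1 = mean(x1 * yi for (x1, _), yi in zip(X, y)) / mean(x1 ** 2 for x1, _ in X)
b2 = mean(x2 * yi for (_, x2), yi in zip(X, y)) / mean(x2 ** 2 for _, x2 in X)
additive_pred = [b0 + b1 * x1 + b2 * x2 for x1, x2 in X]

# Depth-2 tree stand-in: split on the sign of x1, then on the sign of x2,
# and predict the mean response inside each of the four leaves.
leaves = {}
for (x1, x2), yi in zip(X, y):
    leaves.setdefault((x1 > 0, x2 > 0), []).append(yi)
tree_pred = [mean(leaves[(x1 > 0, x2 > 0)]) for x1, x2 in X]

r2_additive = r_squared(y, additive_pred)
r2_tree = r_squared(y, tree_pred)
```

In this constructed case the additive fit explains none of the variance while the nested splits reproduce the response exactly. A regression can of course recover the same signal if the product term is added by hand; the difference is that tree-based models find such structure without it being specified.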

Table 2 is poorly formatted – for a start, single words in the headings are split across lines. There are multiple easy things to reformat here – consider landscape alignment; instead of separate columns for the isotope (this doesn’t change as you go down the table but takes up 2 separate columns), the isotope specification can be part of the heading. “Method” is also duplicated and taking up 2 columns – why? Is “Method” an accurate title for the column – shouldn’t it be “evaluation metric”, which is more accurately descriptive? There are a lot of numbers in this table – most readers will just glaze over this. I suggest finding a way to highlight the best model for each site so that the reader is led to the data you want them to see. This is sloppy work. Think through these things before just cutting/pasting into the manuscript. Rework this table.

Tables are generally presented before being mentioned in the text, so move this to before line 273.

Line 287 – “acceptable” is subjective; this implies there is a boundary that defines acceptable results vs. not acceptable. I suggest reporting R² between predicted and actual, and letting the reader decide if this is good enough.

Fig 6 – is this data comparison between all data (training and test), or just the test data? Specify this. If you have not evaluated just on the test data, that should be done as that is where the actual capabilities of the model are demonstrated.
Fig 6- why is the x axis not shown on the bottom of the plot? Unless there is a good reason not to, this should be amended.
I suggest displaying the equations for the lines, as this gives the reader a quick comparison with the GMWL slope and intercept, which says a lot to a savvy isotope scientist.
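The suggestion to display the line equations can be sketched as below. The global meteoric water line (GMWL) is taken in its usual form, δ²H = 8 δ¹⁸O + 10; the station values, variable names, and label format are assumptions made for illustration and do not come from the manuscript.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Reference line for comparison: δ²H = 8 δ¹⁸O + 10.
GMWL_SLOPE, GMWL_INTERCEPT = 8.0, 10.0

# Toy values (hypothetical): δ¹⁸O and δ²H in per mil for one station.
d18o = [-8.0, -6.0, -4.0, -2.0]
d2h = [7.5 * v + 8.0 for v in d18o]

slope, intercept = fit_line(d18o, d2h)
label = f"δ²H = {slope:.2f} δ¹⁸O {intercept:+.2f}"  # text for the plot annotation
slope_offset = slope - GMWL_SLOPE                   # departure from the GMWL slope
```

Placing `label` on each panel, for both the observed and the simulated data, would let a reader compare the two lines with each other and with the GMWL at a glance.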

Figure 6 – text is too small – this is unreadable. Figure out how to make this figure usable for the reader instead of taking default plot formats and pasting into manuscript.

Discuss aspects of Figure 6 that are notable – eg, Jayapura looks to be underpredicted in the mid-range of isotope values. Why could this be?

Figure 7 – the confidence interval is hard to see across the upper and lower envelope. The dotted line format for upper and lower is indistinguishable and often looks to be either upper or lower bound, but not on both sides. Is this a plotting or calculation error? Why not use the standard “translucent ribbon” to show the envelope? Also, the X axis “Months” number is unhelpful – you can’t tell what season it is (and this is important b/c of the big discussion around regional seasonal influences early in the paper). This is a lazy approach to not re-formatting – we’ve all been there and know this is a pain but it needs to be done for professional publishing.

Line 305 and surrounding text – A more nuanced and humble discussion around the certainty is required. E.g. “Most stable isotope data fit within the confidence intervals, suggesting that the ML model precisely estimated the stable isotope contents…” – how do you define “precisely” here? Better to say “xx% of the data can be predicted within 95% confidence intervals”, because that is exactly what it does.
Or “the upper limit of the confidence interval, showing that the model significantly underestimated …” – how do you define significance here?
Line 335 – again, phrasing is both vague and bold: “The stable isotope composition of precipitation depends mainly on the vapor pressure, precipitation amount, temperature, and potential evaporation” – more accurate to say “predictor variables that were evaluated to have substantial influence on isotope values are …”
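The reporting style suggested for line 305 above (stating the share of observations that fall inside the interval) amounts to a one-line calculation. The sketch below assumes per-month lower and upper bounds are already available, for example from the bootstrap; the function name and the toy numbers are hypothetical.

```python
def coverage(observed, lower, upper):
    """Fraction of observations lying inside [lower, upper], bounds inclusive."""
    inside = sum(lo <= o <= hi for o, lo, hi in zip(observed, lower, upper))
    return inside / len(observed)

# Toy values (hypothetical): five monthly observations with interval bounds.
observed = [-5.1, -6.3, -4.0, -7.9, -5.5]
lower = [-5.5, -6.0, -4.5, -8.5, -6.0]
upper = [-4.5, -5.0, -3.5, -7.0, -5.0]

share = coverage(observed, lower, upper)
summary = f"{share:.0%} of observations fall within the 95% confidence interval"
```

Reporting `summary` per station states what the analysis actually measured and leaves the judgement of whether that is good enough to the reader.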

technical corrections
In general this manuscript should have been carefully reviewed for typos and grammatical errors (misplaced commas, conjoined words without a space in between, etc.) prior to going out for scientific review. A partial list of errors is below, but I stopped collating these errors as they became numerous:
Line 16 – contents – should be composition?
Line 26 – missing space
Line 35 – missing “a”
line 99 delete “the” before NOAA
line 157 – should be “training” not train
Line 158 – typo in split
Line 179 – typo “demonstrated”

Line 220 – is “Lasso” capitalized? It was not earlier in the paper – whichever way is accurate, you need to be consistent.
Line 246 – typo – extra “a”
Line 305 - typo

Citation: https://doi.org/10.5194/hess-2023-299-RC1
- AC6: 'Reply on RC1', Mojtaba Heydarizad, 08 Jun 2024
  
  The answer to this reviewer has been added as a file titled 'Response to Reviewer 1'.
  
  Citation: https://doi.org/10.5194/hess-2023-299-AC6
RC2:
'Comment on hess-2023-299', Anonymous Referee #2, 22 May 2024
Dear Authors, Dear Editor,

The manuscript (hess-2023-299) compares the performance of a bunch of machine learning models in simulating the variation of precipitation stable isotope composition using monthly precipitation stable isotope records from six GNIP stations from SE Asia. The application of machine learning methods in hydrological modelling is a rapidly developing research direction. This is also true for the modelling of precipitation stable isotope compositions. Thus the work is timely and of interest. However, the manuscript still needs considerable revision to reach publication. One of my main problems with the manuscript is the lack of a scientific discussion. Section 4 in the current stage hardly goes beyond the description of the results. The other critical issue is the illustration material. Most of the figures and tables needs additional careful editing.

General comments
Source of the meteorological data: The manuscript vaguely refers to the NOAA web site as the source for evaporation and wind speed data in line 139. This is not acceptable, since the reader cannot tell which database was used. The actual source should be cited, not the web server via which the data were accessed. In the next sentence (lines 140-141) it is written that vapor pressure, precipitation amount and air temperature were taken from the GNIP. It is advisable to retrieve all meteorological variables from the same source, for instance to avoid resolution problems. In addition, I strongly suggest not using meteorological data from the GNIP. Please keep in mind that GNIP is an archive of precipitation isotope data, not of meteorological data. If meteorological data are corrected for measurement inhomogeneity by the national meteorological services or agencies, the correction is not transferred to the GNIP. I have my own experience of this.

Structural problems:
The methodological description from line 218 to 232 should be moved to Section 3.

If I understand correctly, the Authors consider VAR as the “gold standard” in the forecasting exercise and rank the ML predictions according to their accuracy compared to the VAR forecasts. However, it is not clear from the text why the VAR forecast can serve as a reference. Instead, a year covered by all six station records could be withheld from the ML training and used as a reference to compare the performance of the models.
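The hold-out comparison suggested here can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' workflow: the station names, years, δ¹⁸O values and model predictions are hypothetical placeholders, and the helper names (`common_years`, `split_holdout_year`, `rank_models`) are invented for this example.

```python
import math

def common_years(station_years):
    """Years present in every station record (candidates for the hold-out year)."""
    year_sets = [set(years) for years in station_years.values()]
    return sorted(set.intersection(*year_sets))

def split_holdout_year(years, holdout_year):
    """Return (train_idx, test_idx): positions outside / inside the withheld year."""
    train_idx = [i for i, y in enumerate(years) if y != holdout_year]
    test_idx = [i for i, y in enumerate(years) if y == holdout_year]
    return train_idx, test_idx

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def rank_models(observed, predictions):
    """Rank models by RMSE against the withheld observations (best first)."""
    scores = {name: rmse(observed, pred) for name, pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical sample years per station (one entry per monthly sample).
station_years = {
    "station_a": [2001, 2001, 2002, 2002, 2003],
    "station_b": [2002, 2002, 2003, 2004],
}
shared = common_years(station_years)      # years available at all stations
holdout = shared[0]                       # assumption: withhold the first shared year

years = station_years["station_a"]
train_idx, test_idx = split_holdout_year(years, holdout)

# Hypothetical withheld observations and forecasts from two models.
observed = [-5.0, -6.0]
predictions = {"model_x": [-5.5, -6.5], "model_y": [-5.0, -8.0]}
ranking = rank_models(observed, predictions)
print(holdout, train_idx, test_idx, ranking)
```

With this setup every model, VAR included, is scored against observations that none of them saw during fitting, so no forecast has to serve as the reference for the others. In practice the withheld year should also be checked for complete monthly coverage at each station.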

The annotations are unreadable in Figs 4, 5, 6, and 7. I suggest changing the layout from the current 2×3 to 3×2 in Figs 4, 5 and 6. This will allow the authors to enlarge the panels. In addition, I strongly suggest increasing the font size in each panel. Finally, in the legend of Fig 1 “Bangkog” should be corrected to “Bangkok” and “Kota Bahura” should be corrected to “Kota Bharu”.

Specific comments:
line 26: insert a space after the full stop.
line 27: No need to introduce the abbreviation “(VAR)” here since it is not used elsewhere in the abstract.
line 39: Besides (or instead of) the classic Clark & Fritz book, a more recent review should be cited, e.g. Bowen et al., 2019 (https://doi.org/10.1146/annurev-earth-053018-060220)
lines 42 and 44: I suggest moving the citation “IAEA/GNIP 2018” from line 44 to the end of the sentence in line 42 and citing the most recent review from the IAEA Hydrology group in line 44: Vystavna et al., 2021 (https://doi.org/10.1038/s41598-021-98094-6)
line 79: I suggest replacing “Am” with “monsoon climate”
line 128: Please correct the superscript formats in the equation. In addition, 1 should be deleted from the exponent.
line 131: The sentence needs revision. The mentioned analytical uncertainties surely refer to the delta values rather than the heavy isotopes.
line 135 (and also elsewhere): “potential air evaporation” sounds strange. Probably “air” should be omitted?
line 149: “M.H.” seems to be a mistake in the citation.
line 158: “spilited” should be changed to “split”
line 210: It is unclear from the current text which results are referred to at the beginning of the sentence.
line 251: Please check the text. Is it possible that you meant to write “fairly weak” instead of “fairly strong”?

Suggestions on Table 1 and Table 2
The layout of both tables could be improved.
Table 1: If you introduce the abbreviation LR for Lasso regression in the table title, then you can use it in the table, which will improve readability. In addition, “δ¹⁸O (VSMOW‰)” and “δ²H (VSMOW‰)” should go to the header of the first and second part of the table, respectively, to eliminate the current “Isotope” column, which again would make this table more compact and readable.
Table 2: Similar suggestion as above. “δ¹⁸O” and “δ²H” should go to the header above the methods to eliminate the current “Isotope” columns and make this table more compact and readable.
Citation: https://doi.org/10.5194/hess-2023-299-RC2
- AC7: 'Reply on RC2', Mojtaba Heydarizad, 08 Jun 2024
  
  The answer to this reviewer has been added as a file titled 'Response to Reviewer 2'.
  
  Citation: https://doi.org/10.5194/hess-2023-299-AC7

Latest update: 18 Jul 2025

Short summary

This research showed how various factors affect ¹⁸O and ²H isotopes in precipitation in Southeast Asia. Various machine learning (ML) models were used to analyze the data. The reliability of the predictions was also tested, which confirmed their accuracy. In addition, a model called VAR was used alongside the ML models to forecast the stable isotopes.


1 citation as recorded by Crossref.