the Creative Commons Attribution 4.0 License.
Exploring Long-term Monthly Prediction of Precipitation Isotopes over Southeast Asia: A Comparative Analysis of Machine-Learning Models
Abstract. Stable isotope methods are essential for studying tropical hydrology and climatology. The purpose of this research was to investigate the influence of large-scale climate modes (teleconnection indices) and local meteorological parameters on the stable isotope contents of precipitation at six stations in Southeast Asia: Bangkok, Kuala Lumpur, Jakarta, Kota Bharu, Jayapura, and Singapore. To achieve this goal, several machine learning (ML) techniques were employed: shallow neural network (SNN), deep neural network (DNN), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). XGBoost demonstrated the highest accuracy at the majority of the studied stations, with R2 = 0.91, VNS = 0.90, AIC = 405, BIC = 410, and RMSE = 0.76. Additionally, DNN exhibited superior accuracy in specific cases, achieving R2 = 0.87, VNS = 0.87, AIC = 445, BIC = 460, and RMSE = 1.10. Furthermore, a bootstrap analysis was conducted to assess the uncertainty of the simulated data at each station. The results of this analysis demonstrated acceptable accuracy, as the majority of simulated data points fell within the 95 % confidence intervals. Finally, stable isotope contents in precipitation were forecasted for one year using Vector Autoregression (VAR) and ML techniques. This study underscores the efficacy of ML techniques in both simulating and forecasting stable isotope contents with high precision. The inclusion of specific accuracy metrics strengthens the validity of the claims in this study and provides a clearer picture of its quantitative outcomes.
Status: open (until 24 May 2024)
CC1: 'Comment on hess-2023-299', Ali Mobaraki, 28 Jan 2024
This work presents an interesting and novel application of machine learning techniques to simulate and forecast the stable isotope contents in precipitation over Southeast Asia. The authors use various teleconnection indices and meteorological parameters as predictors and compare the performance of different machine learning models. They also conduct a bootstrap analysis to assess the uncertainty of the simulated data and use vector autoregression to forecast the stable isotope contents for one year.
I have a question about the methods section of the abstract. The authors mention that they used several machine learning techniques, such as shallow neural network, deep neural network, decision tree, random forest, and extreme gradient boosting. However, they do not explain comprehensively how they selected and optimized these techniques for their data. How did the authors choose the appropriate hyperparameters, such as the number of layers, nodes, and activation functions for the neural networks, or the number of trees, depth, and splitting criteria for the decision tree, random forest, and extreme gradient boosting? How did they avoid overfitting or underfitting their models? What criteria did they use to evaluate and compare the performance of different models?
I would appreciate it if the authors could provide more details on their methods and results. Thank you for your attention.
Citation: https://doi.org/10.5194/hess-2023-299-CC1
AC1: 'Reply on CC1', Mojtaba Heydarizad, 02 Feb 2024
Dear reader,
Thank you for your interest in and feedback on our manuscript. Below we clarify our methods and results in more detail. Here is our point-by-point response to your comment:
Comment: The authors mention that they used several machine learning techniques, such as shallow neural network, deep neural network, decision tree, random forest, and extreme gradient boosting. However, they do not explain comprehensively how they selected and optimized these techniques for their data. How did the authors choose the appropriate hyperparameters, such as the number of layers, nodes, and activation functions for the neural networks, or the number of trees, depth, and splitting criteria for the decision tree, random forest, and extreme gradient boosting? How did they avoid overfitting or underfitting their models? What criteria did they use to evaluate and compare the performance of different models?
Response: We apologize for not providing enough detail about the models we developed using machine learning techniques. We will revise the manuscript to include the following text:
Firstly, to select and optimize the machine learning techniques for our data, we performed a grid search with cross-validation to find the best combination of hyperparameters for each technique. For the neural networks, we varied the number of layers (from 1 to 3), the number of nodes (from 10 to 100), and the activation function (sigmoid, tanh, or relu). For the decision tree, random forest, and extreme gradient boosting, we varied the number of trees (from 10 to 100), the maximum depth (from 2 to 10), and the splitting criterion (gini or entropy).
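As a rough illustration of the grid search with cross-validation described above, the following sketch uses scikit-learn's `GridSearchCV` with a random forest regressor. The choice of scikit-learn, the synthetic data, the five-fold split, and the specific grid values are all assumptions made for the example; the reply does not state which library or fold count was used. The split criterion is left at the regressor's default, because the gini and entropy criteria named above are offered by scikit-learn only for classifiers, while the target here is a continuous isotope value.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in data (assumption): 120 monthly samples, 6 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.3, size=120)

# Illustrative grid inside the ranges quoted in the reply
# (10 to 100 trees, maximum depth 2 to 10).
param_grid = {
    "n_estimators": [10, 50, 100],
    "max_depth": [2, 6, 10],
}

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_  # the score is negated RMSE, so flip the sign
print(best_params, round(best_rmse, 3))
```

For monthly time series, `TimeSeriesSplit` may be a more suitable splitter than shuffled `KFold`, since it keeps every validation fold later in time than its training data.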
Secondly, we avoided overfitting or underfitting our models by using regularization techniques, such as dropout, L1, and L2 penalties, and by monitoring the training and validation errors. We evaluated and compared the performance of different models using the root mean square error (RMSE), the coefficient of determination (R2), and the Nash-Sutcliffe efficiency (NSE) as the criteria.
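The three evaluation criteria named above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' code: R2 is taken here as the squared Pearson correlation between observed and simulated values (one common convention, which is why it can differ from NSE), and the sample values are hypothetical.

```python
import math

def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squares of obs about its mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def r2(obs, sim):
    """Coefficient of determination, assumed here to be squared Pearson r."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)

# Hypothetical delta-18O values (per mil), for illustration only.
observed = [-6.0, -5.0, -7.0, -4.0]
simulated = [-5.5, -5.5, -6.5, -4.5]

print(round(rmse(observed, simulated), 3))  # 0.5
print(round(nse(observed, simulated), 3))   # 0.8
print(round(r2(observed, simulated), 3))    # 0.9
```

In this toy case R2 (0.9) is higher than NSE (0.8) because the squared correlation ignores errors in amplitude, while NSE penalizes them.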
We hope this clarifies our methods and results.
Citation: https://doi.org/10.5194/hess-2023-299-AC1
CC2: 'Comment on hess-2023-299', Ville Järvinen, 15 Feb 2024
This manuscript presents a thorough investigation into stable isotope signatures in precipitation across Southeast Asia, employing a range of advanced statistical and machine learning techniques. However, while the incorporation of teleconnection indices as regional parameters provides valuable insight into large-scale climatic influences, I am curious about the potential influence of local land-sea interactions on precipitation isotopic composition. Given the proximity of several study sites to coastal regions, have the authors considered incorporating variables related to sea surface temperatures or coastal proximity in their analysis? These factors can play a significant role in modulating local precipitation patterns through mechanisms such as sea breeze circulation and moisture transport. Could the authors discuss whether the omission of such coastal-related variables might introduce biases or limitations to their findings, and if so, how they propose to address or account for these factors in their analysis?
Citation: https://doi.org/10.5194/hess-2023-299-CC2
AC2: 'Reply on CC2', Mojtaba Heydarizad, 18 Feb 2024
We thank you for your comment and your interest in our manuscript. We appreciate your constructive suggestions for improving our study.
We agree that local land-sea interactions can influence precipitation isotopic composition, especially in coastal regions where sea surface temperatures (SST) and coastal proximity can affect the local hydrological cycle and atmospheric circulation. Previous studies have shown that SST and coastal proximity can affect isotopic fractionation, evaporation, condensation, and precipitation processes, as well as the moisture sources and transport pathways of precipitation in different regions. Therefore, we acknowledge that incorporating these variables in our analysis could potentially improve our understanding of the spatial and temporal variations of stable isotopes in precipitation across Southeast Asia. However, we also note that there are some challenges and limitations in obtaining and using these variables in our analysis.
First, SST and coastal proximity data for the study sites and period are neither consistently available nor of uniform quality. We used the Global Network of Isotopes in Precipitation (GNIP) database to obtain the stable isotope and meteorological data for the studied stations in Southeast Asia, but this database does not provide SST or coastal proximity data. We searched for alternative sources, such as the NOAA Optimum Interpolation Sea Surface Temperature (OISST) [1] and the Global Self-consistent, Hierarchical, High-resolution Geography Database (GSHHG) [2], but these data sets differ from the GNIP data in spatial and temporal resolution, coverage, and accuracy. For example, the OISST data are gridded at 0.25°×0.25° with daily or monthly time steps, and the GSHHG data are static with a spatial resolution of 0.01°×0.01°, whereas the GNIP data are station-level (point) records reported monthly or annually. Therefore, these higher-resolution data sets cannot be integrated with the GNIP data without introducing uncertainties and errors into the analysis.
In addition, we used various statistical and machine learning techniques to simulate the stable isotope content in precipitation based on local and regional variables, such as potential air evaporation, wind speed, vapor pressure, air temperature, relative humidity, precipitation amount, and teleconnection indices. These variables were selected based on previous studies and theoretical considerations, and they were found to correlate significantly with, and to influence, the stable isotope content in precipitation. Adding SST and coastal proximity variables would increase the complexity of the analysis and of the developed models. It would also require additional data processing, model selection, parameter tuning, and validation steps, which would increase the computational cost and time, as well as the risk of overfitting and multicollinearity. Moreover, the results would become harder to interpret and explain, as the interactions and effects of the variables would become more complicated and nonlinear.
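The multicollinearity risk mentioned above is often screened with variance inflation factors (VIF). The sketch below is a minimal illustration on synthetic predictors and is not part of the authors' workflow; the variable names, the synthetic relationship between air temperature and SST, and the use of NumPy are assumptions made for the example.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - resid.var() / target.var()  # R^2 of column j on the others
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic predictors (assumption): SST is built to track air temperature.
rng = np.random.default_rng(1)
air_temp = rng.normal(27.0, 1.5, size=200)
sst = air_temp + rng.normal(0.0, 0.3, size=200)
wind = rng.normal(3.0, 1.0, size=200)

factors = vif(np.column_stack([air_temp, sst, wind]))
print(np.round(factors, 1))
```

In this synthetic setup the two temperature-like predictors receive large VIF values while the independent wind predictor stays close to 1, which is the pattern that would signal redundant inputs.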
Therefore, we decided to omit the SST and coastal proximity variables from our analysis and to focus on the local and regional variables that we considered more relevant and reliable for our study. We acknowledge that this may introduce some biases or limitations to our findings, especially for the coastal stations, where local land-sea interactions may have a stronger impact on the precipitation isotopic composition. However, we believe that our analysis still provides a comprehensive and robust investigation into the stable isotope signatures in precipitation across Southeast Asia, and that our results are consistent and comparable with previous studies in the region. Finally, we suggest that future studies could explore the use of SST and coastal proximity variables if more accurate and consistent data sets become available and if more advanced and efficient statistical and machine learning techniques are developed. We hope that this answer addresses your comment and clarifies why we omitted parameters such as the SST and coastal proximity variables from our analysis.
References
[1] https://link.springer.com/article/10.1007/s12665-016-6081-8
[2] https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1002/2017JD026751
Citation: https://doi.org/10.5194/hess-2023-299-AC2
CC3: 'Comment on hess-2023-299', Edward Thakur, 15 Mar 2024
Your methodology demonstrates a comprehensive approach to understanding stable isotope signatures in precipitation across Southeast Asia. However, I noticed that while you incorporated a range of local and regional variables, such as potential air evaporation, wind speed, teleconnection indices, etc., there was no mention of the potential influence of land cover or land use on precipitation isotopic composition. Given that land surface characteristics can significantly impact precipitation patterns and isotopic signatures, especially in urban areas or regions experiencing land use changes, could you provide insight into why land cover variables were not included in your analysis? Additionally, do you anticipate any potential limitations or biases in your results due to the omission of these variables? If so, how do you suggest addressing or mitigating these limitations in future studies?
Citation: https://doi.org/10.5194/hess-2023-299-CC3
AC3: 'Reply on CC3', Mojtaba Heydarizad, 22 Mar 2024
Dear Sir,
Thank you for your valuable comments on our research methodology. We would like to address the points you have raised regarding the omission of land cover variables from our analysis.
While we recognize the importance of considering land cover variables in studies of precipitation isotopic composition, it's important to acknowledge the limitations we faced in this regard. Unfortunately, due to data constraints and the specific focus of our study, we were unable to incorporate land cover effects into our analysis.
Regarding data availability, we encountered challenges in accessing comprehensive and up-to-date land cover maps for the study region. Despite efforts to locate suitable datasets, we found that such information was either outdated or not sufficiently detailed for our purposes. This limitation significantly constrained our ability to integrate land cover variables into our analysis. We would like to address this important aspect in more detail:
- Contextualizing the Current Study: While our current study focuses on large-scale climate modes and local meteorological parameters, we understand the dominant influence that land cover variables can have. The decision to focus on the current variables stems from their well-documented impact on the stable isotope signatures within Southeast Asia.
- Land Cover Stability in Study Area: Importantly, the specific regions within Southeast Asia covered in this analysis have experienced relatively stable land cover patterns over the study period. There were some changes during the last 50 years, but they were not large enough to be captured by stable isotope proxies in precipitation.
- Data Availability, Limitations, and Solutions: Unfortunately, reliable land cover maps at the temporal and spatial resolution required for such a detailed study were not readily accessible. High-resolution land cover/land use maps covering the whole study period are needed for sound scientific conclusions, and such maps are not available for the whole study region or the whole study period.
We completely agree that the omission of land cover variables is a limitation of the current research. In future work, incorporating accurate land cover data would likely refine the understanding of isotopic signatures in precipitation within Southeast Asia. This could be achieved by creating land cover maps through comprehensive remote-sensing projects in Southeast Asia.
- Alternative Factors: Although the current study does not explicitly address land cover, several included variables intrinsically capture some aspects of land-atmosphere interaction that might indirectly reflect land cover influences. These include potential air evaporation, wind speed, and to some extent, teleconnection indices.
Given these constraints, we made the decision to focus our analysis on the variables that were readily available and directly relevant to our research objectives, namely, the influence of large-scale climate modes and local meteorological parameters on stable isotopic composition in precipitation.
Thanks again for your deep insights, and we will be happy to receive further suggestions and comments from your side to enhance the robustness of our work.
Citation: https://doi.org/10.5194/hess-2023-299-AC3
CC4: 'Comment on hess-2023-299', Aamir Ali, 26 Apr 2024
Hello,
Thank you for your manuscript. I am interested in knowing whether it would be logical to apply the same machine learning models used in your study to a different area or region. Specifically, I am considering using machine learning models to simulate stable isotope contents in the southern part of Pakistan. However, I am uncertain about which techniques would be most appropriate for this purpose.
Thank you once again.
Citation: https://doi.org/10.5194/hess-2023-299-CC4
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
319 | 74 | 30 | 423 | 14 | 13 |