Reply on RC1

Thank you very much for your support and constructive suggestions regarding our manuscript. The comments are valuable and helpful for revising and improving the paper, and they provide important guidance for our research. We have done our best to revise the paper carefully in line with your comments and suggestions. Our point-by-point responses are given below. We hope the revised manuscript meets your requirements. If any concerns remain, we will be happy to address them once we hear from you.

Major improvements:
Comment 1: GSDNN is critical in enhancing the GSARNN model's fitting accuracy. This highlights the significance of obtaining the correct spatial correlation between sample points by taking into account both the dataset's local and global qualities. I'm simply wondering if using GSDNN with traditional approaches will result in better interpolation results.
[Response]: Thanks for your comment. The GSDNN unit plays an important role in the GSARNN model because it takes into account how geographical elements vary in different directions. However, because there is no recognized true value of the generalized spatial distance to train against, the calculation cannot be carried out if the GSDNN unit is applied to traditional methods. The GSDNN unit can therefore only be embedded in a neural network-based method, where it participates in the overall training and calculation process. In other words, the generalized spatial distance is determined by the spatial characteristics of the elements to be interpolated, and its meaning is specific to the context of those elements. Additional explanation has been added in the last paragraph of Section 2.3.1.
"Note that since there is no recognized true value of generalized spatial distance for training process, the GSDNN unit can only be embedded in the neural network-based method and participates in its overall training and calculation process. In other words, the generalized spatial distance is determined by the spatial characteristics of the elements to be interpolated, owning a specific connotation based on specific context of spatial elements."
Comment 2: Section 2.1.2 should go over the kriging approach in greater detail. Please explain how the weight coefficient λ is calculated.
[Response]: Thanks for your helpful advice. We have expanded the description of the interpolation calculation in the Kriging method, including how the weight coefficient λ is calculated. However, since the focus of this paper is the GSARNN model, the derivations of some formulas in the Kriging method have been omitted to keep Section 2.1.2 concise.

See supplement for more revision details.
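For illustration only, and assuming ordinary Kriging, the sketch below shows how the weights λ are commonly obtained: a linear system is built from a variogram, and a Lagrange multiplier enforces that the weights sum to one. The exponential variogram, its parameters, the sample points, and all function names are assumptions chosen for this example and are not taken from the manuscript.

```python
import math


def variogram(h, sill=1.0, range_=2.0):
    # Assumed exponential variogram without nugget; the manuscript's model may differ.
    return sill * (1.0 - math.exp(-h / range_))


def solve(a, b):
    # Solve a x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def kriging_weights(points, target):
    # Ordinary Kriging system:
    #   sum_j lambda_j * gamma(x_i, x_j) + mu = gamma(x_i, x_0)  for every sample i
    #   sum_j lambda_j = 1
    n = len(points)
    a = [[variogram(math.dist(p, q)) for q in points] + [1.0] for p in points]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(p, target)) for p in points] + [1.0]
    sol = solve(a, b)
    return sol[:n], sol[n]  # weights lambda and Lagrange multiplier mu


# Hypothetical sample locations and target point.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
weights, mu = kriging_weights(samples, (0.5, 0.5))
print(weights, sum(weights))
```

In this symmetric layout the target is equidistant from all four samples, so each weight is 0.25. The interpolated value is then the weighted sum of the sampled values.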
Comment 3: The phrase "weight matrix" appears for the first time on Page 5, Line 6. Please provide some context.
[Response]: Thanks for your instructive advice. The description of the spatial weights in the original manuscript was not sufficiently coherent and could confuse readers. The presentation of this part (Section 2.2) has been reorganized to make it easier to understand.

See supplement for more revision details.
Comment 4: You say in section 2.3.3 that you utilize variable learning rate for network training and explain how it changes during the training process. You should also consider the benefits of this customized learning rate.
[Response]: Thanks for your very helpful advice. The learning rate starts from αstart and increases gradually at the rate of k1 until it reaches αmax. A relatively small initial learning rate prevents excessive fluctuation and convergence difficulties, and the subsequent increase in the learning rate keeps the convergence rate from being too low in the early stage of training. The maximum learning rate is then maintained for n epochs, during which the model can stably learn the spatial characteristics of the elements. After that, the learning rate decreases exponentially at the rate of k2, ensuring that the model can converge sufficiently near the optimal position. A description of these benefits has been added in Section 2.3.3, Paragraph 3, to better justify this strategy.

See supplement for more revision details.
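As a minimal sketch of the three-stage schedule described above (not the code used in the paper), the function below returns the learning rate for a given epoch. The linear form of the warm-up (adding k1 per epoch), the multiplicative form of the decay (multiplying by k2 per epoch), and all names and numeric values are assumptions made for illustration; the manuscript defines the exact formulas.

```python
import math


def learning_rate(epoch, alpha_start, alpha_max, k1, n_hold, k2):
    """Illustrative warm-up / hold / exponential-decay schedule (assumed form)."""
    # Stage 1: increase from alpha_start by k1 per epoch (assumed linear) up to alpha_max.
    # The small tolerance guards against floating-point round-off in the division.
    warmup_epochs = math.ceil((alpha_max - alpha_start) / k1 - 1e-9)
    if epoch < warmup_epochs:
        return min(alpha_start + k1 * epoch, alpha_max)
    # Stage 2: hold the maximum learning rate for n_hold epochs.
    if epoch < warmup_epochs + n_hold:
        return alpha_max
    # Stage 3: exponential decay, multiplying by k2 (0 < k2 < 1) each epoch.
    decay_steps = epoch - (warmup_epochs + n_hold) + 1
    return alpha_max * k2 ** decay_steps


# Hypothetical parameter values, chosen only to make the three stages visible.
for epoch in range(7):
    print(epoch, learning_rate(epoch, alpha_start=0.25, alpha_max=1.0,
                               k1=0.25, n_hold=2, k2=0.5))
```

With these hypothetical values the rate rises over the first three epochs, stays at the maximum for two epochs, and then halves every epoch.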
Comment 5: The difference between all of these interpolation solutions is difficult to notice in Figure 11. I recommend graphing the difference between the interpolated and real values (interpolation error) and modifying the color scheme accordingly.
[Response]: Thanks for your very instructive advice. Following your recommendation, we now plot the cross-validation interpolation results together with the interpolation errors of the four methods in Figure 10 (Figure 11 in the original manuscript). The differences in interpolation performance among the four methods are now easier to distinguish. The caption and the paragraph corresponding to this figure have also been revised accordingly.

See supplement for more revision details.
Comment 6: It is unclear what the x-axis signifies in Figures 8 and 12. Please include some text and figure descriptions.
[Response]: Thanks for your helpful advice. The x-axis in Figures 8 and 12 represents the identifier number of each sampled point after the points are sorted in ascending order of real value. We agree that readers may be confused without this explanation, so descriptions have been added to Figure 8 and Figure 12.
See supplement for more revision details.
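To make the x-axis convention concrete, the sketch below sorts sampled points in ascending order of real value and assigns each one an identifier number starting from 1. The function name and the sample values are hypothetical and serve only as an illustration.

```python
def sort_by_real_value(real, predicted):
    """Return (identifier, real, predicted) tuples ordered by ascending real value."""
    order = sorted(range(len(real)), key=lambda i: real[i])
    return [(rank + 1, real[i], predicted[i]) for rank, i in enumerate(order)]


# Hypothetical real and interpolated values for three sampled points.
real_values = [3.0, 1.0, 2.0]
interpolated = [2.9, 1.2, 2.1]
for identifier, r, p in sort_by_real_value(real_values, interpolated):
    print(identifier, r, p)
```

Plotting the real and interpolated values against this identifier gives a monotonically increasing curve for the real values, which makes deviations of the interpolated values easy to see.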

Minor suggestions:
Comment 1: In section 3.1.1, from Line 19 to Line 22, I suggest not overly emphasizing the benefits of this experiment using the simulated data. This point has been mentioned in earlier paragraphs.
[Response]: Thanks for your helpful advice. The redundant statements have been removed. To keep Section 3.1.1 consistent with Section 3.2.1, the following sentence has been added at the end of Section 3.1.1.
"This case compares the interpolation abilities of the GSARNN model and the other three models in three-dimensional space using the simulated dataset above."
Comment 2: In Formula 10, since kij is 1 for the situation i≠j, then should the off-diagonal elements simply be written as ρij?
[Response]: Thanks for your helpful advice. The issue has been corrected.

See supplement for more revision details.
Comment 3: Multiline formulas, such as Formula 6, Formula 19, Formula 30, should be left aligned. The "int" in Formula 26 and Formula 27 seems to be redundant.
[Response]: Thanks for your helpful advice. Formula 6, Formula 19, and Formula 30 are now left aligned. The "int" in Formula 26 and Formula 27 denotes taking the integer part of the result, which gives the coordinate of the point in the cube.
See supplement for more revision details.